How to keep original file creation date

marc1911
Champ in-the-making
Hi there,

When I upload a Word document through the web client or the CIFS interface, the file's original creation date is automatically changed to today's date. This is not what I expected. Is there any way to keep the original file date when (bulk) uploading? I need to upload financial documents from last year. If I continue like this, it will look as though all these documents are from July 2007, which is really not what we want.

Regards,

Marc
54 REPLIES

tschiller
Champ in-the-making
Ok… I see how to override the content-services-context.xml file by using the external configuration file custom-metadata-extractors-context.xml. However, it seems that in content-services-context.xml, a bean is already defined for extracting Office metadata
 <bean id="extracter.Office" class="org.alfresco.repo.content.metadata.OfficeMetadataExtracter" parent="baseMetadataExtracter" />
I don't want to override this, so much as have it do what it is supposed to do. It seems (and please let me know if I am incorrect) that by having the OfficeMetadataExtracter bean in the content-services-context.xml file, the bean should populate creationDate, creator, description, and title by default (according to the Metadata Extractor wiki). So why, when I import files into Alfresco, is the creation date set to the date/time of upload instead of the actual file creation date?

Or, is it that the beans listed under the <!-- Content Metadata Extracters --> line are just there as placeholders, and then you still have to define the mapping properties in the custom-metadata-extractors-context.xml file? However, that file says it shows how to modify the mapping properties of the Metadata Extractors. I just want them to work with their default mappings! 😕  Any ideas?

tschiller
Champ in-the-making
I just saw this thread: http://forums.alfresco.com/viewtopic.php?t=3552
Is this happening because I'm uploading through CIFS? I will try uploading through the web portal and see what happens…
Has there been any progress with this issue?

tschiller
Champ in-the-making
Nope, even when I upload the file through the web portal, and even though I have a rule telling the system to extract common metadata, it still does not work. 😛

derek
Star Contributor
"I don't want to override this, so much as have it do what it is supposed to do"
It does what it is meant to do, according to many other people.  If you need it to behave differently, modify the config.
"Has there been any progress with this issue?"
No, because the progress needs to be in your config files.
"I just want them to work with their default mappings!"
They do work, and they've been working for ages.  What you need is for the behaviour to match your requirements.  If enough of our customers request different default behaviour then we will gladly change it.  Of course, we can't please everyone, which is why it is configurable.  It wasn't always configurable: not too long ago you would have had to write your own implementation of the extractors.

Regards

tschiller
Champ in-the-making
Ok, it's quite possible I'm doing it wrong; I am more than willing to admit that.

"What you need is for the behaviour to match your requirements."
Correct. That is where the problem lies.

This is what I want to happen:
When files (let's just limit it to MS Office files for now) are uploaded into the system, they populate with the original creation date/time of the file, not the upload date/time.

This is my view on extractors:
They are defined in the <WEB-INF>/classes/alfresco/content-services-context.xml file.  The default set includes PdfBoxMetadataExtracter, OfficeMetadataExtracter, MailMetadataExtracter, HtmlMetadataExtracter, MP3MetadataExtracter, OpenDocumentMetadataExtracter, and OpenOfficeMetadataExtracter. If you want to override the default set of extractors, you need to do it in the <extension-config>/alfresco/extension/custom-repository-context.xml file.

The Metadata Extraction wiki says:
By default, the extractor will not overwrite any properties already present in the document's metadata, but this can be changed by overriding the extractor's bean definition.
The Javadocs for the extractor give the list (on the left) of values extracted from the document. All these extracted values are put into a map, ready for conversion to model-specific properties. By default, the following will be populated by the extractor:

  creationDate:           --      cm:created
  creator:                --      cm:author
  description:            --      cm:description
  title:                  --      cm:title

Have I been confused by the wiki saying both "By default, the following will be populated by the extractor…" AND "By default, the extractor will not overwrite any properties already present in the document's metadata…"? Does this mean that, although the OfficeMetadataExtracter bean is defined in content-services-context.xml, it doesn't actually do anything until you "override" the bean definition in custom-repository-context.xml?

Also, where does the rule come into play? If I change custom-repository-context.xml, is the metadata extraction automatic? Or do I still have to define a rule for the space, telling it to extract common metadata from an incoming file? As an aside, I have defined the rule in the top space and applied it to subspaces. However, when I go to those subspaces it says that (0) rules are applied to that subspace, but when I go to Manage Content Rules the rule is there (and Local=no). The only time I actually see that (1) rules is applied to the space is at that top space.

Please let me know if you can see where I am going wrong. Thank you so much for your help with this!

derek
Star Contributor
Hi,

"the extractor will not overwrite any properties already present in the document's metadata"

The creation date is automatically populated by the system.  If you want the extractor to overwrite the value, set the overwritePolicy property.

As for the triggering of the extraction, you will have to do this using the rules.  I'm assuming that we're not working in Web Projects, but rather in the normal document management spaces.  The best way to play with this is to upload a new document and check that the file's description is set.  When you upload via the Web Client, one of the wizard steps will actually do an extraction and present the values for changing.  If you get the correct description there, then you know that the extractors are working.  That should be out of the box.

To get automatic application of the extractors, the rules will need to be firing.  Once again, the basic non-overwriting extractors will be available to the rule.

Once it's working, override the beans and set the overwrite policy to have the creation date set.

Regards
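
Derek's suggestion above could be sketched roughly as follows for the Office extractor. This is only an illustration under several assumptions: the bean id and class are taken from the definition quoted earlier in the thread, the property names (inheritDefaultMapping, overwritePolicy, mappingProperties) are the ones that appear in the configurations posted in this thread, EAGER is assumed to be the overwrite policy that replaces existing values, and the file is assumed to live in the extension directory as custom-metadata-extractors-context.xml. Because cm:created is populated by the system, the thread does not confirm that this alone is enough to change it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd">
<beans>
   <!-- Assumption: reusing the id "extracter.Office" replaces the default definition -->
   <bean id="extracter.Office"
         class="org.alfresco.repo.content.metadata.OfficeMetadataExtracter"
         parent="baseMetadataExtracter">
      <!-- Keep the default mappings (creator, description, title, ...) -->
      <property name="inheritDefaultMapping">
         <value>true</value>
      </property>
      <!-- Assumption: EAGER lets extracted values overwrite existing properties -->
      <property name="overwritePolicy">
         <value>EAGER</value>
      </property>
      <property name="mappingProperties">
         <props>
            <prop key="namespace.prefix.cm">http://www.alfresco.org/model/content/1.0</prop>
            <!-- Map the extracted creation date onto cm:created -->
            <prop key="creationDate">cm:created</prop>
         </props>
      </property>
   </bean>
</beans>
```

A rule that extracts common metadata would still need to fire on the space for this configuration to be applied to incoming files.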

finner
Champ in-the-making
Hi,
I have overridden the metadata extractors in order to extract the creation date.
Below is the open office bean:

custom-metadata-extractors-context.xml



   <bean class="org.alfresco.repo.content.metadata.OpenOfficeMetadataExtracter" parent="baseMetadataExtracter" init-method="init">
      <property name="overwritePolicy">
         <value>CAUTIOUS</value>
      </property>
      <property name="mappingProperties">
         <props>
            <prop key="namespace.prefix.cm">http://www.alfresco.org/model/content/1.0</prop>
            <prop key="creationDate">cm:created</prop>
         </props>
      </property>
      <property name="connection">
         <ref bean="openOfficeConnection" />
      </property>
   </bean>



and a rule on Company Home to extract common metadata from all inbound items.
However, the date on the file is still today's date. Is my config file correct, or am I missing a step?

Thanks in advance.
Finner

derek
Star Contributor
You should set inheritDefaultMapping (unless creationDate is the only property you want extracted) and use the EAGER overwrite policy.

finner
Champ in-the-making
Hi Derek,
Much obliged for the quick response. I still can't keep the original date of a document.



   <bean class="org.alfresco.repo.content.metadata.OpenOfficeMetadataExtracter" parent="baseMetadataExtracter" init-method="init">
      <property name="inheritDefaultMapping">
         <value>true</value>
      </property>
      <property name="overwritePolicy">
         <value>EAGER</value>
      </property>
      <property name="mappingProperties">
         <props>
            <prop key="namespace.prefix.cm">http://www.alfresco.org/model/content/1.0</prop>
            <prop key="creationDate">cm:created</prop>
         </props>
      </property>
      <property name="connection">
         <ref bean="openOfficeConnection" />
      </property>
   </bean>




I'm copying an .odt file to the repository, and the created and modified dates are both today's date.
Does the above look correct?

derek
Star Contributor
Hint: It's time to read the javadocs for the extractors.  You'll see which extractors handle the different mimetypes and which properties they pull out.  So you're using an odt document.  You're overriding the incorrect extractor for that.

Regards
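
Following Derek's hint, the extractor that handles .odt files is presumably OpenDocumentMetadataExtracter (it is in the default set listed earlier in the thread) rather than OpenOfficeMetadataExtracter. A minimal sketch of the same override applied to that class is shown below. The bean id "extracter.OpenDocument" is an assumption modelled on the "extracter.Office" id quoted earlier and should be checked against content-services-context.xml; that this extractor exposes a creationDate key should likewise be confirmed in its Javadocs.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd">
<beans>
   <!-- Assumption: "extracter.OpenDocument" is the id of the default bean being overridden -->
   <bean id="extracter.OpenDocument"
         class="org.alfresco.repo.content.metadata.OpenDocumentMetadataExtracter"
         parent="baseMetadataExtracter">
      <!-- Keep the default mappings in addition to the one added below -->
      <property name="inheritDefaultMapping">
         <value>true</value>
      </property>
      <!-- Allow extracted values to replace properties that are already set -->
      <property name="overwritePolicy">
         <value>EAGER</value>
      </property>
      <property name="mappingProperties">
         <props>
            <prop key="namespace.prefix.cm">http://www.alfresco.org/model/content/1.0</prop>
            <!-- Assumption: this extractor produces a creationDate value -->
            <prop key="creationDate">cm:created</prop>
         </props>
      </property>
   </bean>
</beans>
```

Unlike the OpenOffice bean posted above, this sketch sets no connection property, on the assumption that the OpenDocument extractor reads the file directly and does not need an OpenOffice connection.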