How to preserve original document create and modified date during upload

sanjaybandhaniya
Elite Collaborator

Hi,

I want to preserve a document's original created and modified dates during upload. How can I achieve that?

If this is possible, will the dates also be preserved during FTP upload?

3 ACCEPTED ANSWERS

OK. If the log output of AbstractMappingMetadataExtracter/PdfBoxMetadataExtracter shows the 'created/modified' metadata under Found: {..............} but not under Mapped and Accepted: {............}, then here is what could be happening.

  • When a file is uploaded, the cm:auditable aspect is applied during node creation. It contains the "cm:created" and "cm:modified" properties, which the repository sets at that point. Both are defined as protected and mandatory properties in the OOTB contentModel.xml (see the details below). A "protected" property is maintained by the repository and is read-only to clients, so a value coming from elsewhere, such as a metadata extractor, is not applied.

         So these values are set by the repository when the node is created, and the extractor cannot overwrite them.

<property name="cm:created">
	<title>Created</title>
	<type>d:datetime</type>
	<protected>true</protected>
	<mandatory enforced="true">true</mandatory>
	<index enabled="true">
		<atomic>true</atomic>
		<stored>false</stored> 
		<tokenised>both</tokenised>
		<facetable>true</facetable>
	</index>
</property>

<property name="cm:modified">
	<title>Modified</title>
	<type>d:datetime</type>
	<protected>true</protected>
	<mandatory enforced="true">true</mandatory>
	<index enabled="true">
		<atomic>true</atomic>
		<stored>false</stored> 
		<tokenised>both</tokenised>
		<facetable>true</facetable>
	</index>
</property>

The alternative is to create custom properties in your own content model and keep the extracted created/modified metadata values on those properties. The other option is to override the default behavior of the auditable aspect properties, which I believe would not be a good idea.

For example:

Create the following properties in your custom content model:

<aspect name="demo:customAuditMetadata">
	<title>Custom Audit Metadata</title>
	<description>Custom Audit Metadata</description>
	<properties>
		<property name="demo:originCreatedDate">
			<title>Original Created Date</title>
			<description>Created date of files based on incoming metadata extracted from metadata extractor</description>
			<type>d:text</type>
		</property>	
		<property name="demo:originModifiedDate">
			<title>Original Modified Date</title>
			<description>Modified date of files based on incoming metadata extracted from metadata extractor</description>
			<type>d:text</type>
		</property>	
	</properties>
</aspect>
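
For context, the aspect has to live inside a model file that declares the "demo" prefix. The following is only a minimal sketch: the model name, author-less header, and file layout are assumptions, and the namespace URI is the one used in the mappingProperties of this answer. Whatever URI you pick, it must be identical in the model and in the extractor mapping.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical model wrapper; "demo:contentModel" is an assumed name -->
<model name="demo:contentModel" xmlns="http://www.alfresco.org/model/dictionary/1.0">
	<description>Demo model holding the original created/modified dates</description>
	<version>1.0</version>

	<imports>
		<!-- Data types (d:text, d:datetime, ...) -->
		<import uri="http://www.alfresco.org/model/dictionary/1.0" prefix="d" />
		<!-- Standard content model (cm:*) -->
		<import uri="http://www.alfresco.org/model/content/1.0" prefix="cm" />
	</imports>

	<namespaces>
		<!-- Must match the namespace.prefix.demo value used in the extractor mapping -->
		<namespace uri="http://www.github.com/model/demo/1.0" prefix="demo" />
	</namespaces>

	<aspects>
		<!-- Place the demo:customAuditMetadata aspect definition here -->
	</aspects>
</model>
```

The model file also needs to be registered with the repository (for example through a dictionary bootstrap bean in your module context) before the extractor can map values onto its properties.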

Add the following bean definition and list the above properties in mappingProperties:

<bean id="extracter.PDFBox" class="org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter" parent="baseMetadataExtracter">
	<property name="documentSelector" ref="pdfBoxEmbededDocumentSelector" />
	<property name="inheritDefaultMapping">
		<value>false</value>
	</property>
	<property name="overwritePolicy">
		<!-- EAGER: an extracted value always overwrites the existing property value (e.g. when content is updated or a new version is uploaded). -->
		<value>EAGER</value>
	</property>
	<property name="mappingProperties">
		<props>
			<prop key="namespace.prefix.demo">http://www.github.com/model/demo/1.0</prop>
			<prop key="created">demo:originCreatedDate</prop>
			<prop key="modified">demo:originModifiedDate</prop>
		</props>
	</property>
</bean>

- Update the Share config to display the newly added properties on the document-details page as needed.
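
As a rough sketch of that Share change, something like the following could go into share-config-custom.xml. The file location and the choice to show the fields only in view mode are assumptions; the aspect and property names are the ones defined above.

```xml
<alfresco-config>
	<!-- Assumption: list the aspect in Share's "Manage Aspects" dialog -->
	<config evaluator="string-compare" condition="DocumentLibrary">
		<aspects>
			<visible>
				<aspect name="demo:customAuditMetadata" />
			</visible>
		</aspects>
	</config>

	<!-- Show the two properties on nodes that carry the aspect -->
	<config evaluator="aspect" condition="demo:customAuditMetadata">
		<forms>
			<form>
				<field-visibility>
					<!-- Read-only display; the values come from the metadata extractor -->
					<show id="demo:originCreatedDate" for-mode="view" />
					<show id="demo:originModifiedDate" for-mode="view" />
				</field-visibility>
			</form>
		</forms>
	</config>
</alfresco-config>
```

Field labels default to the property titles from the model, so no extra label configuration should be needed for a first test.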

~Abhinav
(ACSCE, AWS SAA, Azure Admin)


@sanjaybandhniya  Find the demo project here:

https://github.com/abhinavmishra14/alfresco-metadataextraction-demo

I noticed a difference between the Community and Enterprise editions. The examples I gave above work perfectly fine on Enterprise 5.2.x (I used 5.2.6) and 6.1.x (I used 6.1), but on Community editions the properties files are not picked up reliably (the behavior is intermittent).

The only change I made for Community edition is shown below, and with it the file is always picked up correctly.

<property name="mappingProperties">
    <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
	<property name="location">
	   <value>classpath:alfresco/module/${project.artifactId}/metadataextraction/TikaAutoMetadataExtracter.properties</value>
	</property>
    </bean>
</property>

On Enterprise, both the path above and the path given below work fine:

<value>classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties</value>

This one also works on both editions:

<value>classpath:alfresco/extension/metadata/TikaAutoMetadataExtracter.properties</value>

https://github.com/abhinavmishra14/alfresco-metadataextraction-demo/blob/master/metadata-extractor-d...

I am not sure how the two editions (Community and Enterprise) differ in terms of extension points. I looked at the source code but found no clues. The good news is that the other path I shared above (used in the demo project) works fine on both Community and Enterprise.

Hope this helps narrow down your issue.

~Abhinav
(ACSCE, AWS SAA, Azure Admin)


I have created an alternative deployment for Docker that allows you to apply this configuration.

A sample project is available at https://github.com/aborroy/alfresco-custom-metadata-extractor

Hyland Developer Evangelist


24 REPLIES

bip1989
Star Contributor

I had a similar requirement in my project. To keep the original created/creator/modified/modifier values from the initial upload, we added a custom auditable aspect to our custom model. A document-creation behavior applies that aspect when a document is created, with the same values that Alfresco's auditable aspect has at that moment.

Alfresco then keeps updating its own auditable aspect on further updates to the document, while our custom aspect remains unchanged.

I am not talking about the Alfresco upload date. I want to preserve the document's original creation date during upload.

Please check the image below.

image

The OOTB metadata extractor does map the extracted created-date metadata. If you look at the PDFBox metadata extractor properties, you will notice that the "created" metadata is mapped to "cm:created".

# Mappings
author=cm:author
title=cm:title
description=cm:description
created=cm:created

https://github.com/Alfresco/alfresco-repository/blob/master/src/main/resources/alfresco/metadata/Pdf...

This is the class that parses newly uploaded PDF files, extracts their available metadata, and maps it to content model properties:

https://github.com/Alfresco/alfresco-repository/blob/master/src/main/java/org/alfresco/repo/content/...

You can enable the following loggers to see whether metadata is being extracted:

log4j.logger.org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter=DEBUG
log4j.logger.org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter=DEBUG
log4j.logger.org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter=DEBUG

Have a look at this test class as well, which tests the "createdate" metadata:

https://github.com/Alfresco/alfresco-repository/blob/master/src/test/java/org/alfresco/repo/content/...

You can also look at the auto metadata extractor implementation for reference: https://github.com/Alfresco/alfresco-repository/blob/master/src/main/java/org/alfresco/repo/content/...

https://github.com/Alfresco/alfresco-repository/blob/master/src/main/resources/alfresco/metadata/Tik...

~Abhinav
(ACSCE, AWS SAA, Azure Admin)

Hi,

I have checked all these classes, but I still do not understand what customization I need to make to enable mapping of the document's original created date to cm:created.

Hi,

I have tried the given approach and it works for Alfresco upload and CMIS sync. Is there any way to achieve the same thing for FTP?

Hi,

I want to enable this for PDF, Office documents, and image files. Which classes do I need to use other than

org.alfresco.repo.content.metadata.PoiMetadataExtracter,

org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter

If you look at https://github.com/Alfresco/alfresco-repository/blob/alfresco-repository-6.8/src/main/resources/alfr...

you will see extractors configured for specific mimetypes. Their mappings are in the properties files here: https://github.com/Alfresco/alfresco-repository/tree/alfresco-repository-6.8/src/main/resources/alfr...

TikaAutoMetadataExtracter takes care of the mimetypes that have no specific extractor, e.g. images. It uses AutoDetectParser for parsing and extraction.

Identify all the file types you want to cover and add the appropriate bean config (copy it from content-services-context.xml for reference) to inject your custom properties. Copy the OOTB properties files and keep them under the "alfresco/metadata/" classpath in your project, e.g. alfresco/metadata/TikaAutoMetadataExtracter.properties.

Or use a mapping like this:

<property name="mappingProperties">
	<props>
		<prop key="namespace.prefix.cm">http://www.alfresco.org/model/content/1.0</prop>
		<prop key="namespace.prefix.demo">http://www.github.com/model/demo/1.0</prop>
		<prop key="author">cm:author</prop>
		<prop key="title">cm:title</prop>
		<prop key="subject">cm:description</prop>
		<prop key="created">demo:originCreatedDate</prop>
		<prop key="modified">demo:originModifiedDate</prop>
	</props>
</property>

The TikaAutoMetadataExtracter.properties file contains a large number of mappings, so for this extractor it is better to reference the properties file directly than to list the mappings inside the bean definition.

An example for Images, PDF, Office :

<bean id="extracter.TikaAuto" class="org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter"
		parent="baseMetadataExtracter">
		<constructor-arg>
			<ref bean="tikaConfig" />
		</constructor-arg>

		<property name="overwritePolicy">
			<value>EAGER</value>
		</property>

		<property name="mappingProperties">
			<bean
				class="org.springframework.beans.factory.config.PropertiesFactoryBean">
				<property name="location">
					<value>classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties</value>
				</property>
			</bean>
		</property>
	</bean>

	<bean id="extracter.PDFBox"
		class="org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter"
		parent="baseMetadataExtracter">
		<property name="documentSelector" ref="pdfBoxEmbededDocumentSelector" />
		<property name="overwritePolicy">
			<value>EAGER</value>
		</property>

		<property name="mappingProperties">
			<bean
				class="org.springframework.beans.factory.config.PropertiesFactoryBean">
				<property name="location">
					<value>classpath:alfresco/metadata/PdfBoxMetadataExtracter.properties</value>
				</property>
			</bean>
		</property>
	</bean>

	<bean id="extracter.Poi"
		class="org.alfresco.repo.content.metadata.PoiMetadataExtracter"
		parent="baseMetadataExtracter">
		<property name="poiFootnotesLimit" value="${content.transformer.Poi.poiFootnotesLimit}" />
		<property name="poiExtractPropertiesOnly" value="${content.transformer.Poi.poiExtractPropertiesOnly}" />
		<property name="poiAllowableXslfRelationshipTypes">
			<list>
				<!-- These values are valid for Office 2007, 2010 and 2013 -->
				<value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/presProps</value>
				<value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/viewProps</value>
			</list>
		</property>
		
		<property name="overwritePolicy">
			<value>EAGER</value>
		</property>

		<property name="mappingProperties">
			<bean
				class="org.springframework.beans.factory.config.PropertiesFactoryBean">
				<property name="location">
					<value>classpath:alfresco/metadata/PoiMetadataExtracter.properties</value>
				</property>
			</bean>
		</property>
	</bean>

If you choose to use properties files, add the custom namespace and property mappings to the properties file of each selected extractor.

For example:

In alfresco/metadata/TikaAutoMetadataExtracter.properties add:

namespace.prefix.demo=http://www.github.com/abhinavmishra14/model/demo/1.0

# Custom Properties to be mapped
created=demo:originCreatedDate
modified=demo:originModifiedDate

You can add the same mappings to the properties files of the other selected extractors.

For reference: https://docs.alfresco.com/6.0/references/dev-extension-points-custom-metadata-extractor.html

~Abhinav
(ACSCE, AWS SAA, Azure Admin)

Hi, @abhinavmishra14 

I applied the above configuration and it works for PDF and Office documents.

Can you guide me on what configuration I need for image files?