How to preserve original document create and modified date during upload

sanjaybandhaniya
Elite Collaborator

Hi,

I want to preserve a document's original created and modified dates during upload. How can I achieve that?

If this is possible, will the dates also be preserved during FTP upload?

3 ACCEPTED ANSWERS

OK, if you can locate the extracted metadata in the log written by AbstractMappingMetadataExtracter/PdfBoxMetadataExtracter, and the Found: {..............} entry contains the 'created/modified' metadata but the Mapped and Accepted: {............} entry does not, then here is what could be happening.

  • When a file is uploaded, the cm:auditable aspect is applied during node creation. It contains the "cm:created" and "cm:modified" properties, which are set at that point. Both are protected, mandatory properties (see the definitions below) in the out-of-the-box contentModel.xml. When a property is defined as "protected", its value cannot be updated once it has been set, i.e. it becomes read-only.

         So these values are set at the time of node creation and are read-only after that.

<property name="cm:created">
	<title>Created</title>
	<type>d:datetime</type>
	<protected>true</protected>
	<mandatory enforced="true">true</mandatory>
	<index enabled="true">
		<atomic>true</atomic>
		<stored>false</stored> 
		<tokenised>both</tokenised>
		<facetable>true</facetable>
	</index>
</property>

<property name="cm:modified">
	<title>Modified</title>
	<type>d:datetime</type>
	<protected>true</protected>
	<mandatory enforced="true">true</mandatory>
	<index enabled="true">
		<atomic>true</atomic>
		<stored>false</stored> 
		<tokenised>both</tokenised>
		<facetable>true</facetable>
	</index>
</property>

An alternative is to create custom properties in your own content model and store the created/modified metadata values there for your use. You could instead override the default behavior of the auditable aspect properties, but I believe that would not be a good idea.

For example:

Create the following properties in your custom content model:

<aspect name="demo:customAuditMetadata">
	<title>Custom Audit Metadata</title>
	<description>Custom Audit Metadata</description>
	<properties>
		<property name="demo:originCreatedDate">
			<title>Original Created Date</title>
			<description>Created date of files based on incoming metadata extracted from metadata extractor</description>
			<type>d:text</type>
		</property>	
		<property name="demo:originModifiedDate">
			<title>Original Modified Date</title>
			<description>Modified date of files based on incoming metadata extracted from metadata extractor</description>
			<type>d:text</type>
		</property>	
	</properties>
</aspect>
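
For context, the aspect above has to sit inside a model file that declares the demo prefix. Here is a minimal sketch of that wrapper. The model name demo:contentModel is an assumed, illustrative value, and the namespace URI is the one used in the extractor mapping in this answer; adjust both to your project. The model also needs to be registered with the repository in the usual way for custom models.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the model name "demo:contentModel" is an assumed, illustrative value -->
<model name="demo:contentModel" xmlns="http://www.alfresco.org/model/dictionary/1.0">
	<imports>
		<!-- Data types (d:text, d:datetime, ...) and the standard content model -->
		<import uri="http://www.alfresco.org/model/dictionary/1.0" prefix="d" />
		<import uri="http://www.alfresco.org/model/content/1.0" prefix="cm" />
	</imports>
	<namespaces>
		<!-- Must match the namespace.prefix.demo value used in the extractor mapping -->
		<namespace uri="http://www.github.com/model/demo/1.0" prefix="demo" />
	</namespaces>
	<aspects>
		<!-- Place the demo:customAuditMetadata aspect definition here -->
	</aspects>
</model>
```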

Add the following bean definition and add the above properties to mappingProperties:

<bean id="extracter.PDFBox" class="org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter" parent="baseMetadataExtracter">
 <property name="documentSelector" ref="pdfBoxEmbededDocumentSelector" />
 <property name="inheritDefaultMapping">
	 <value>false</value>
 </property>
 <property name="overwritePolicy">
        <!-- Run extraction every time (e.g. when content is updated or a new version is uploaded). -->
	<value>EAGER</value>
 </property>
 <property name="mappingProperties">
	  <props>
		 <prop key="namespace.prefix.demo">http://www.github.com/model/demo/1.0</prop>
		 <prop key="created">demo:originCreatedDate</prop>
		 <prop key="modified">demo:originModifiedDate</prop>
	</props>
 </property>
</bean>

- Update the Share config to display the newly added properties on the document-details page as needed.
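
As a rough sketch of that Share change, something along these lines could go into share-config-custom.xml. It assumes the aspect and property names from the example model above; the exact form configuration you need depends on your Share setup, so treat it as a starting point rather than the definitive config.

```xml
<!-- Sketch only: assumes the demo:customAuditMetadata aspect defined above -->
<config evaluator="string-compare" condition="DocumentLibrary">
	<aspects>
		<!-- Makes the aspect selectable in "Manage Aspects" -->
		<visible>
			<aspect name="demo:customAuditMetadata" />
		</visible>
	</aspects>
</config>

<!-- Shows both properties on the details/edit forms of any node that has the aspect -->
<config evaluator="aspect" condition="demo:customAuditMetadata">
	<forms>
		<form>
			<field-visibility>
				<show id="demo:originCreatedDate" />
				<show id="demo:originModifiedDate" />
			</field-visibility>
		</form>
	</forms>
</config>
```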

~Abhinav
(ACSCE, AWS SAA, Azure Admin)

@sanjaybandhniya  Find the demo project here:

https://github.com/abhinavmishra14/alfresco-metadataextraction-demo

I noticed a difference between the Community and Enterprise editions. The examples I gave above work perfectly with Enterprise 5.2.x (I used 5.2.6) and 6.1.x (I used 6.1), but on Community editions the properties files are not picked up reliably (the behavior is intermittent).

The only change I made for the Community edition is shown below, and with it the file is always picked up correctly.

<property name="mappingProperties">
    <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
	<property name="location">
	   <value>classpath:alfresco/module/${project.artifactId}/metadataextraction/TikaAutoMetadataExtracter.properties</value>
	</property>
    </bean>
</property>

On Enterprise both paths work, the one above and the one below:

<value>classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties</value>

This one also works on both editions:

<value>classpath:alfresco/extension/metadata/TikaAutoMetadataExtracter.properties</value>

https://github.com/abhinavmishra14/alfresco-metadataextraction-demo/blob/master/metadata-extractor-d...

I am not sure how the two editions (Community and Enterprise) differ in terms of extension points; I looked at the source code but found no clues. The good news is that the other path I shared above (available in the demo project) works for both Community and Enterprise.
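
Putting the pieces from this thread together, a complete TikaAuto bean using the alfresco/extension path might look like the sketch below. It only combines the bean definition and the path already quoted in this thread; I have not verified it beyond that, so check it against the demo project before relying on it.

```xml
<!-- Sketch only: bean definition from this thread combined with the extension path -->
<bean id="extracter.TikaAuto" class="org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter"
	parent="baseMetadataExtracter">
	<constructor-arg>
		<ref bean="tikaConfig" />
	</constructor-arg>
	<!-- Run extraction on every content update -->
	<property name="overwritePolicy">
		<value>EAGER</value>
	</property>
	<!-- Load the mappings from the path reported to work on both editions -->
	<property name="mappingProperties">
		<bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
			<property name="location">
				<value>classpath:alfresco/extension/metadata/TikaAutoMetadataExtracter.properties</value>
			</property>
		</bean>
	</property>
</bean>
```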

Hope this helps narrow down your issue.

~Abhinav
(ACSCE, AWS SAA, Azure Admin)

I have created an alternative deployment for Docker that allows this configuration to be applied.

A sample project is available at https://github.com/aborroy/alfresco-custom-metadata-extractor

Hyland Developer Evangelist

24 REPLIES

@sanjaybandhniya  Hope you have read the information shared above.

"TikaAutoMetadataExtracter takes care of other mimetypes which don't have specific extractors. It uses AutoDetectParser for parsing and extraction, e.g. for images."

I also gave examples for TikaAutoMetadataExtracter and the others, under the bold heading "An example for Images, PDF, Office".

Look at this bean definition, which was provided in the above response as well:

<bean id="extracter.TikaAuto" class="org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter" parent="baseMetadataExtracter">

~Abhinav
(ACSCE, AWS SAA, Azure Admin)

This is my bean.

<bean id="extracter.TikaAuto" class="org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter"
		parent="baseMetadataExtracter">
		<constructor-arg>
			<ref bean="tikaConfig" />
		</constructor-arg>
		<property name="overwritePolicy">
			<value>EAGER</value>
		</property>
		<property name="mappingProperties">
			<bean
				class="org.springframework.beans.factory.config.PropertiesFactoryBean">
				<property name="location">
					<value>classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties</value>
				</property>
			</bean>
		</property>
	</bean>

	<bean id="extracter.PDFBox"
		class="org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter"
		parent="baseMetadataExtracter">
		<property name="documentSelector" ref="pdfBoxEmbededDocumentSelector" />
		<property name="overwritePolicy">
			<value>EAGER</value>
		</property>
		<property name="mappingProperties">
			<bean
				class="org.springframework.beans.factory.config.PropertiesFactoryBean">
				<property name="location">
					<value>classpath:alfresco/metadata/PdfBoxMetadataExtracter.properties</value>
				</property>
			</bean>
		</property>
	</bean>

	<bean id="extracter.Poi"
		class="org.alfresco.repo.content.metadata.PoiMetadataExtracter"
		parent="baseMetadataExtracter">
		<property name="poiFootnotesLimit" value="${content.transformer.Poi.poiFootnotesLimit}" />
		<property name="poiExtractPropertiesOnly" value="${content.transformer.Poi.poiExtractPropertiesOnly}" />
		<property name="poiAllowableXslfRelationshipTypes">
			<list>
				<!-- These values are valid for Office 2007, 2010 and 2013 -->
				<value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/presProps</value>
				<value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/viewProps</value>
			</list>
		</property>
		<property name="overwritePolicy">
			<value>EAGER</value>
		</property>
		<property name="mappingProperties">
			<bean
				class="org.springframework.beans.factory.config.PropertiesFactoryBean">
				<property name="location">
					<value>classpath:alfresco/metadata/PoiMetadataExtracter.properties</value>
				</property>
			</bean>
		</property>
	</bean>

Your bean looks correct. What is the config in these files?

PdfBoxMetadataExtracter.properties
PoiMetadataExtracter.properties
TikaAutoMetadataExtracter.properties

~Abhinav
(ACSCE, AWS SAA, Azure Admin)

The properties file with my custom properties:

namespace.prefix.ks=http://www.alfresco.com/model/custom-model/1.0
created=ks:originalCreationDate
modified=ks:originalModificationDate

My content Model

<aspects>
		<aspect name="ks:importedDoc">
			<properties>
				<property name="ks:originalCreationDate">
					<type>d:date</type>
				</property>
				<property name="ks:originalModificationDate">
					<type>d:date</type>
				</property>
			</properties>
		</aspect>
	</aspects>

It is working for PDF and Office files.

Hmm, kind of weird. It should work, I think. Let me try at my end and see what I get.

~Abhinav
(ACSCE, AWS SAA, Azure Admin)

It seems to work perfectly. Try re-checking the configs and logs and see what you get.

Here is the test I did:

<bean id="extracter.TikaAuto" class="org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter"
parent="baseMetadataExtracter">
<constructor-arg>
	<ref bean="tikaConfig" />
</constructor-arg>
<property name="overwritePolicy">
	<value>EAGER</value>
</property>
<property name="mappingProperties">
	<bean
		class="org.springframework.beans.factory.config.PropertiesFactoryBean">
		<property name="location">
			<value>classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties</value>
		</property>
	</bean>
</property>
</bean>

<bean id="extracter.PDFBox"
class="org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter"
parent="baseMetadataExtracter">
<property name="documentSelector" ref="pdfBoxEmbededDocumentSelector" />
<property name="overwritePolicy">
	<value>EAGER</value>
</property>
<!-- Including custom properties -->
<property name="mappingProperties">
	<bean
		class="org.springframework.beans.factory.config.PropertiesFactoryBean">
		<property name="location">
			<value>classpath:alfresco/metadata/PdfBoxMetadataExtracter.properties</value>
		</property>
	</bean>
</property>
</bean>

<bean id="extracter.Poi"
class="org.alfresco.repo.content.metadata.PoiMetadataExtracter"
parent="baseMetadataExtracter">
<property name="poiFootnotesLimit" value="${content.transformer.Poi.poiFootnotesLimit}" />
<property name="poiExtractPropertiesOnly" value="${content.transformer.Poi.poiExtractPropertiesOnly}" />
<property name="poiAllowableXslfRelationshipTypes">
	<list>
		<!-- These values are valid for Office 2007, 2010 and 2013 -->
		<value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/presProps</value>
		<value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/viewProps</value>
	</list>
</property>
<property name="overwritePolicy">
	<value>EAGER</value>
</property>
<!-- Including custom properties -->
<property name="mappingProperties">
	<bean
		class="org.springframework.beans.factory.config.PropertiesFactoryBean">
		<property name="location">
			<value>classpath:alfresco/metadata/PoiMetadataExtracter.properties</value>
		</property>
	</bean>
</property>
</bean>

TikaAutoMetadataExtracter.properties

# Namespaces
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0
namespace.prefix.exif=http://www.alfresco.org/model/exif/1.0
namespace.prefix.audio=http://www.alfresco.org/model/audio/1.0
# Custom model namespace
namespace.prefix.demo=http://www.github.com/abhinavmishra14/model/demo/1.0
# OOTB Default Mappings
author=cm:author
title=cm:title
description=cm:description
created=cm:created
# Custom Properties to be mapped
created=demo:originCreatedDate
modified=demo:originModifiedDate
geo\:lat=cm:latitude
geo\:long=cm:longitude
tiff\:ImageWidth=exif:pixelXDimension
tiff\:ImageLength=exif:pixelYDimension
tiff\:Make=exif:manufacturer
tiff\:Model=exif:model
tiff\:Software=exif:software
tiff\:Orientation=exif:orientation
tiff\:XResolution=exif:xResolution
tiff\:YResolution=exif:yResolution
tiff\:ResolutionUnit=exif:resolutionUnit
exif\:Flash=exif:flash
exif\:ExposureTime=exif:exposureTime
exif\:FNumber=exif:fNumber
exif\:FocalLength=exif:focalLength
exif\:IsoSpeedRatings=exif:isoSpeedRatings
exif\:DateTimeOriginal=exif:dateTimeOriginal
xmpDM\:album=audio:album
xmpDM\:artist=audio:artist
xmpDM\:composer=audio:composer
xmpDM\:engineer=audio:engineer
xmpDM\:genre=audio:genre
xmpDM\:trackNumber=audio:trackNumber
xmpDM\:releaseDate=audio:releaseDate
#xmpDM:logComment
xmpDM\:audioSampleRate=audio:sampleRate
xmpDM\:audioSampleType=audio:sampleType
xmpDM\:audioChannelType=audio:channelType
xmpDM\:audioCompressor=audio:compressor

PdfBoxMetadataExtracter.properties

# Namespaces
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0
# Custom model namespace
namespace.prefix.demo=http://www.github.com/abhinavmishra14/model/demo/1.0
# OOTB Default Mappings
author=cm:author
title=cm:title
subject=cm:description
# Custom Properties to be mapped
created=demo:originCreatedDate
modified=demo:originModifiedDate

PoiMetadataExtracter.properties

# Namespaces
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0
# Custom model namespace
namespace.prefix.demo=http://www.github.com/abhinavmishra14/model/demo/1.0
# OOTB Default Mappings
author=cm:author
title=cm:title
description=cm:description
# Custom Properties to be mapped
created=demo:originCreatedDate
modified=demo:originModifiedDate

ContentModel:

<aspect name="demo:testAuditMetadata">
<title>Test Audit Metadata</title>
<description>Test Audit Metadata</description>
<properties>
<property name="demo:originCreatedDate">
<title>Original Created Date</title>
<description>Original Created Date</description>
<type>d:text</type>
</property>
<property name="demo:originModifiedDate">
<title>Original Modified Date</title>
<description>Original Modified Date</description>
<type>d:text</type>
</property>
</properties>
</aspect>

Log:

Image Extraction:
Mapped and Accepted: {{http://www.alfresco.org/model/exif/1.0}focalLength=4.5, {http://www.alfresco.org/model/exif/1.0}model=TG-5, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/exif/1.0}flash=false, {http://www.alfresco.org/model/exif/1.0}fNumber=8.0, {http://www.alfresco.org/model/exif/1.0}isoSpeedRatings=100, {http://www.alfresco.org/model/content/1.0}description={en_US=OLYMPUS DIGITAL CAMERA}, {http://www.alfresco.org/model/exif/1.0}dateTimeOriginal=Sun May 07 13:23:51 EDT 2017, {http://www.alfresco.org/model/exif/1.0}manufacturer=OLYMPUS CORPORATION, {http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2017-05-07T13:23:51, {http://www.alfresco.org/model/exif/1.0}pixelXDimension=590, {http://www.alfresco.org/model/exif/1.0}pixelYDimension=442, {http://www.alfresco.org/model/content/1.0}author=null, {http://www.alfresco.org/model/exif/1.0}exposureTime=0.005}
2020-05-22 09:58:00,041 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-14] Completed metadata extraction: 
reader:    ContentAccessor[ contentUrl=store://2020/5/22/9/57/db13881d-4caf-4a72-a481-054bb9246b63.bin, mimetype=image/jpeg, size=94399, encoding=UTF-8, locale=en_US]
extracter: org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter@126bd574
changed:   {{http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2017-05-07T13:23:51,{http://www.alfresco.org/model/exif/1.0}focalLength=4.5, {http://www.alfresco.org/model/exif/1.0}model=TG-5, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/exif/1.0}flash=false, {http://www.alfresco.org/model/exif/1.0}fNumber=8.0, {http://www.alfresco.org/model/exif/1.0}isoSpeedRatings=100, {http://www.alfresco.org/model/content/1.0}description={en_US=OLYMPUS DIGITAL CAMERA}, {http://www.alfresco.org/model/exif/1.0}dateTimeOriginal=Sun May 07 13:23:51 EDT 2017, {http://www.alfresco.org/model/exif/1.0}manufacturer=OLYMPUS CORPORATION, {http://www.alfresco.org/model/exif/1.0}pixelXDimension=590, {http://www.alfresco.org/model/exif/1.0}pixelYDimension=442, {http://www.alfresco.org/model/content/1.0}author=null, {http://www.alfresco.org/model/exif/1.0}exposureTime=0.005}

PDF Extraction:
Mapped and Accepted: {{http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2018-10-26T20:36:24Z, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=null, {http://www.github.com/abhinavmishra14/model/demo/1.0}originModifiedDate=2018-10-26T20:36:28Z}
2020-05-22 09:58:11,676 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-2] Completed metadata extraction: 
reader:    ContentAccessor[ contentUrl=store://2020/5/22/9/58/262e3dc1-5cfc-4558-9f01-fae20c5cae2d.bin, mimetype=application/pdf, size=3104712, encoding=UTF-8, locale=en_US]
extracter: org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter@8414655
changed:   {{http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2018-10-26T20:36:24Z, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=null, {http://www.github.com/abhinavmishra14/model/demo/1.0}originModifiedDate=2018-10-26T20:36:28Z}

Office Extraction:
Mapped and Accepted: {{http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2020-02-10T16:13:00Z, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=Abhinav, {http://www.github.com/abhinavmishra14/model/demo/1.0}originModifiedDate=2020-02-10T20:05:00Z}
2020-05-22 09:58:22,021 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-11] Completed metadata extraction: 
reader:    ContentAccessor[ contentUrl=store://2020/5/22/9/58/f3281f14-7ffb-4d91-a3b2-d0fc8de305d5.bin, mimetype=application/vnd.openxmlformats-officedocument.wordprocessingml.document, size=3075453, encoding=UTF-8, locale=en_US]
extracter: org.alfresco.repo.content.metadata.PoiMetadataExtracter@2752d52e
changed:   {{http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2020-02-10T16:13:00Z, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=Abhinav, {http://www.github.com/abhinavmishra14/model/demo/1.0}originModifiedDate=2020-02-10T20:05:00Z}

Image metadata on share view details:

[Image: image metadata and original created/modified dates extracted via TikaAutoMetadataExtracter]

PDF And Office metadata on share view details:

[Image: metadata extracted via PdfBoxMetadataExtracter]
[Image: metadata extracted via PoiMetadataExtracter]

~Abhinav
(ACSCE, AWS SAA, Azure Admin)

Hi,

If possible, can you share the demo you created? For images it is not working even though I have used your code.

@sanjaybandhniya  Please share your content model, Share config, bean definition, extractor properties and log here.

~Abhinav
(ACSCE, AWS SAA, Azure Admin)
