Preserve Original Create and Modified Date

sanjaybandhniya
Elite Collaborator

Hi,

I am trying to preserve the original created/modified dates of documents, and it is working for PDF and Office documents.

I want the same behaviour for images as well.

Content model:

<?xml version="1.0" encoding="UTF-8"?>
<model name="demo:custom-model"
	xmlns="http://www.alfresco.org/model/dictionary/1.0">
	<description>Sample model for original creation and modification dates
	</description>
	<author>Sanjay</author>
	<version>1.0</version>
	<imports>
		<import uri="http://www.alfresco.org/model/dictionary/1.0"
			prefix="d" />
	</imports>
	<namespaces>
		<namespace
			uri="http://www.alfresco.com/model/custom-model/1.0" prefix="demo" />
	</namespaces>
	<aspects>
		<aspect name="demo:testAuditMetadata">
			<title>Test Audit Metadata</title>
			<description>Test Audit Metadata</description>
			<properties>
				<property name="demo:originalCreatedDate">
					<title>Original Created Date</title>
					<description>Original Created Date</description>
					<type>d:text</type>
				</property>
				<property name="demo:originalModifiedDate">
					<title>Original Modified Date</title>
					<description>Original Modified Date</description>
					<type>d:text</type>
				</property>
			</properties>
		</aspect>
	</aspects>
</model>

Bean:

<?xml version='1.0' encoding='UTF-8'?>
<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.springframework.org/schema/beans
		http://www.springframework.org/schema/beans/spring-beans-2.0.xsd">
	<bean id="extracter.TikaAuto"
		class="org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter"
		parent="baseMetadataExtracter">
		<constructor-arg>
			<ref bean="tikaConfig" />
		</constructor-arg>
		<property name="overwritePolicy">
			<value>EAGER</value>
		</property>
		<property name="mappingProperties">
			<bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
				<property name="location">
					<value>classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties</value>
				</property>
			</bean>
		</property>
	</bean>
	<bean id="extracter.PDFBox"
		class="org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter"
		parent="baseMetadataExtracter">
		<property name="documentSelector"
			ref="pdfBoxEmbededDocumentSelector" />
		<property name="overwritePolicy">
			<value>EAGER</value>
		</property>
		<!-- Including custom properties -->
		<property name="mappingProperties">
			<bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
				<property name="location">
					<value>classpath:alfresco/metadata/PdfBoxMetadataExtracter.properties</value>
				</property>
			</bean>
		</property>
	</bean>
	<bean id="extracter.Poi"
		class="org.alfresco.repo.content.metadata.PoiMetadataExtracter"
		parent="baseMetadataExtracter">
		<property name="poiFootnotesLimit"
			value="${content.transformer.Poi.poiFootnotesLimit}" />
		<property name="poiExtractPropertiesOnly"
			value="${content.transformer.Poi.poiExtractPropertiesOnly}" />
		<property name="poiAllowableXslfRelationshipTypes">
			<list>
				<!-- These values are valid for Office 2007, 2010 and 2013 -->
				<value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/presProps</value>
				<value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/viewProps</value>
			</list>
		</property>
		<property name="overwritePolicy">
			<value>EAGER</value>
		</property>
		<!-- Including custom properties -->
		<property name="mappingProperties">
			<bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
				<property name="location">
					<value>classpath:alfresco/metadata/PoiMetadataExtracter.properties</value>
				</property>
			</bean>
		</property>
	</bean>
</beans>

PdfBoxMetadataExtracter.properties and PoiMetadataExtracter.properties

# Namespaces
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0
#Custom model namespace
namespace.prefix.demo=http://www.alfresco.com/model/custom-model/1.0

# OOTB Default Mappings
author=cm:author
title=cm:title
description=cm:description

# Custom Properties to be mapped
created=demo:originalCreatedDate
modified=demo:originalModifiedDate

TikaAutoMetadataExtracter.properties

# Namespaces
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0
namespace.prefix.exif=http://www.alfresco.org/model/exif/1.0
namespace.prefix.audio=http://www.alfresco.org/model/audio/1.0
#Custom model namespace
namespace.prefix.demo=http://www.alfresco.com/model/custom-model/1.0

# OOTB Default Mappings
author=cm:author
title=cm:title
description=cm:description
created=cm:created

# Custom Properties to be mapped
created=demo:originalCreatedDate
modified=demo:originalModifiedDate

geo\:lat=cm:latitude
geo\:long=cm:longitude
tiff\:ImageWidth=exif:pixelXDimension
tiff\:ImageLength=exif:pixelYDimension
tiff\:Make=exif:manufacturer
tiff\:Model=exif:model
tiff\:Software=exif:software
tiff\:Orientation=exif:orientation
tiff\:XResolution=exif:xResolution
tiff\:YResolution=exif:yResolution
tiff\:ResolutionUnit=exif:resolutionUnit
exif\:Flash=exif:flash
exif\:ExposureTime=exif:exposureTime
exif\:FNumber=exif:fNumber
exif\:FocalLength=exif:focalLength
exif\:IsoSpeedRatings=exif:isoSpeedRatings
exif\:DateTimeOriginal=exif:dateTimeOriginal
xmpDM\:album=audio:album
xmpDM\:artist=audio:artist
xmpDM\:composer=audio:composer
xmpDM\:engineer=audio:engineer
xmpDM\:genre=audio:genre
xmpDM\:trackNumber=audio:trackNumber
xmpDM\:releaseDate=audio:releaseDate
#xmpDM:logComment
xmpDM\:audioSampleRate=audio:sampleRate
xmpDM\:audioSampleType=audio:sampleType
xmpDM\:audioChannelType=audio:channelType
xmpDM\:audioCompressor=audio:compressor
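
A side note on the listing above: the key `created` appears twice (`created=cm:created` and `created=demo:originalCreatedDate`). The sketch below is only an illustration of how plain `java.util.Properties` treats such a file, assuming the mapping file ends up being read with standard `Properties` semantics (which is what a `PropertiesFactoryBean` is expected to do). The class name `DuplicateKeyDemo` and the helper `load` are hypothetical names introduced for this example; they are not part of Alfresco.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical demo: how java.util.Properties handles repeated and escaped keys. */
class DuplicateKeyDemo {

    /** Parses the given text with the standard .properties rules. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Cannot realistically happen for an in-memory reader.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // A few lines copied from the TikaAutoMetadataExtracter.properties shown above.
        String text = String.join("\n",
                "created=cm:created",
                "created=demo:originalCreatedDate",
                "modified=demo:originalModifiedDate",
                "geo\\:lat=cm:latitude");

        Properties props = load(text);

        // The later entry replaces the earlier one for the same key.
        System.out.println(props.getProperty("created")); // prints demo:originalCreatedDate

        // "\:" is an escaped colon, so the key is looked up as "geo:lat".
        System.out.println(props.getProperty("geo:lat")); // prints cm:latitude
    }
}
```

Under that assumption, only the last `created` entry survives loading, which would be consistent with `cm:created` not showing up among the mapped properties in the logs later in this thread.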

Let me know what I am missing for the image metadata extractor.

1 ACCEPTED ANSWER

abhinavmishra14
World-Class Innovator

@sanjaybandhniya I already replied here: https://hub.alfresco.com/t5/alfresco-content-services-forum/how-to-preserve-original-document-create...

Sharing the response on this thread for clarity.

Find the demo project here:

https://github.com/abhinavmishra14/alfresco-metadataextraction-demo

I noticed a difference between the Community and Enterprise editions. The examples I gave above work perfectly on Enterprise 5.2.x (I used 5.2.6) and 6.1.x (I used 6.1), but on Community editions the properties files are not picked up reliably (the behaviour is intermittent).

The only change I made for the Community edition is shown below; with it, the file is always picked up correctly.

<property name="mappingProperties">
	<bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
		<property name="location">
			<value>classpath:alfresco/module/${project.artifactId}/metadataextraction/TikaAutoMetadataExtracter.properties</value>
		</property>
	</bean>
</property>

On the Enterprise edition both paths work, the one above and the one below:

<value>classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties</value>

This one also works on both versions:

<value>classpath:alfresco/extension/metadata/TikaAutoMetadataExtracter.properties</value>

https://github.com/abhinavmishra14/alfresco-metadataextraction-demo/blob/master/metadata-extractor-d...

I am not sure how the Community and Enterprise editions differ in terms of extension points; I looked at the source code but found no clues. The good news is that the other path I shared above (available in the demo project) works on both Community and Enterprise editions.

Hope this helps. 

~Abhinav
(ACSCE, AWS SAA, Azure Admin)

7 REPLIES

Thank you @abhinavmishra14

I don't know what the issue could be; for me, the results for images are inconsistent.

Is it working for all images for you?

Is there any way we can enable this for FTP?

I tested with the JPEG and PNG images I have, and it works fine for them. If Tika is able to extract the "created" metadata, it will be applied. For the images where you do not see the created/modified metadata, check what the DEBUG output of org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter shows for:

Found: {....}

Mapped and Accepted: {....}

changed: {...}

Like this one:

Mapped and Accepted: {{http://www.alfresco.org/model/exif/1.0}focalLength=4.5, {http://www.alfresco.org/model/exif/1.0}model=TG-5, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/exif/1.0}flash=false, {http://www.alfresco.org/model/exif/1.0}fNumber=8.0, {http://www.alfresco.org/model/exif/1.0}isoSpeedRatings=100, {http://www.alfresco.org/model/content/1.0}description={en_US=OLYMPUS DIGITAL CAMERA}, {http://www.alfresco.org/model/exif/1.0}dateTimeOriginal=Sun May 07 13:23:51 EDT 2017, {http://www.alfresco.org/model/exif/1.0}manufacturer=OLYMPUS CORPORATION, {http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2017-05-07T13:23:51, {http://www.alfresco.org/model/exif/1.0}pixelXDimension=590, {http://www.alfresco.org/model/exif/1.0}pixelYDimension=442, {http://www.alfresco.org/model/content/1.0}author=null, {http://www.alfresco.org/model/exif/1.0}exposureTime=0.005}
2020-05-22 09:58:00,041 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-14] Completed metadata extraction: 
reader:    ContentAccessor[ contentUrl=store://2020/5/22/9/57/db13881d-4caf-4a72-a481-054bb9246b63.bin, mimetype=image/jpeg, size=94399, encoding=UTF-8, locale=en_US]
extracter: org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter@126bd574
changed:   {{http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2017-05-07T13:23:51,{http://www.alfresco.org/model/exif/1.0}focalLength=4.5, {http://www.alfresco.org/model/exif/1.0}model=TG-5, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/exif/1.0}flash=false, {http://www.alfresco.org/model/exif/1.0}fNumber=8.0, {http://www.alfresco.org/model/exif/1.0}isoSpeedRatings=100, {http://www.alfresco.org/model/content/1.0}description={en_US=OLYMPUS DIGITAL CAMERA}, {http://www.alfresco.org/model/exif/1.0}dateTimeOriginal=Sun May 07 13:23:51 EDT 2017, {http://www.alfresco.org/model/exif/1.0}manufacturer=OLYMPUS CORPORATION, {http://www.alfresco.org/model/exif/1.0}pixelXDimension=590, {http://www.alfresco.org/model/exif/1.0}pixelYDimension=442, {http://www.alfresco.org/model/content/1.0}author=null, {http://www.alfresco.org/model/exif/1.0}exposureTime=0.005}
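
To make the difference between "Found" and "Mapped and Accepted" concrete, here is a deliberately simplified model of the mapping step. It is a sketch under stated assumptions, not Alfresco's actual implementation: it ignores namespaces, type conversion, multi-valued mappings and the overwrite policy, and the names `MappingTraceDemo` and `mapFound` are hypothetical. It only shows why a raw key that has no entry in the mapping properties never reaches the node.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, simplified model of "Found" -> "Mapped and Accepted". */
class MappingTraceDemo {

    /**
     * Keeps only the raw keys that have a mapping entry and renames them
     * to the mapped property name. Unmapped raw keys are dropped.
     */
    static Map<String, String> mapFound(Map<String, String> found,
                                        Map<String, String> mapping) {
        Map<String, String> mapped = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : found.entrySet()) {
            String target = mapping.get(entry.getKey());
            if (target != null) {
                mapped.put(target, entry.getValue());
            }
        }
        return mapped;
    }

    public static void main(String[] args) {
        // Raw keys as Tika might report them for a JPEG (illustrative values).
        Map<String, String> found = new LinkedHashMap<>();
        found.put("created", "2020-05-26T17:19:21");
        found.put("modified", "2020-05-26T17:19:21");
        found.put("Date/Time", "2020:05:26 17:19:21");
        found.put("tiff:Orientation", "6");

        // Entries taken from the mapping properties posted in this thread.
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("created", "demo:originalCreatedDate");
        mapping.put("modified", "demo:originalModifiedDate");
        mapping.put("tiff:Orientation", "exif:orientation");

        // "Date/Time" has no mapping entry, so it is absent from the result.
        mapFound(found, mapping).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

So when a date is missing on a node, the first thing to compare is whether the raw key in "Found" matches a key in the mapping properties exactly.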

Ideally, it should also work for images/files uploaded via FTP, because extraction happens after the nodes have already been created in the repository. I haven't tested FTP, though.

~Abhinav
(ACSCE, AWS SAA, Azure Admin)

For an image, I am getting this:

2020-05-27 12:33:33,923  DEBUG [content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-3] Starting metadata extraction: 
   reader: ContentAccessor[ contentUrl=store://2020/5/27/12/33/751e874b-3f94-4cfa-ae75-2965f94505aa.bin, mimetype=image/jpeg, size=138468, encoding=UTF-8, locale=en_US]
   extracter: org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter@58e342ff
 2020-05-27 12:33:33,949  DEBUG [content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-3] Converted extracted raw values to system values: 
   Raw Properties:    {date=2020-05-26T17:19:21, Compression Type=Progressive, Huffman, Data Precision=8 bits, Number of Components=3, tiff:ImageLength=720, Component 2=Cb component: Quantization table 1, Sampling factors 1 horiz/1 vert, dcterms:created=2020-05-26T17:19:21, Component 1=Y component: Quantization table 0, Sampling factors 2 horiz/2 vert, dcterms:modified=2020-05-26T17:19:21, Last-Modified=2020-05-26T17:19:21, title=null, X Resolution=96 dots, Last-Save-Date=2020-05-26T17:19:21, Component 3=Cr component: Quantization table 1, Sampling factors 1 horiz/1 vert, meta:save-date=2020-05-26T17:19:21, modified=2020-05-26T17:19:21, tiff:BitsPerSample=8, Content-Type=image/jpeg, Resolution Units=inch, comments=null, meta:creation-date=2020-05-26T17:19:21, author=null, created=2020-05-26T17:19:21, Date/Time=2020:05:26 17:19:21, Creation-Date=2020-05-26T17:19:21, Image Height=720 pixels, Unknown tag (0x000b)=Windows Photo Editor 10.0.10011.16384, Orientation=Right side, top (Rotate 90 CW), tiff:Orientation=6, Image Width=1280 pixels, tiff:Software=Windows Photo Editor 10.0.10011.16384, Unknown tag (0xea1c)=[2060 bytes], Software=Windows Photo Editor 10.0.10011.16384, tiff:ImageWidth=1280, Y Resolution=96 dots}
   System Properties: {{http://www.alfresco.com/model/custom-model/1.0}originalModifiedDate=2020-05-26T17:19:21, {http://www.alfresco.org/model/exif/1.0}orientation=6, {http://www.alfresco.org/model/exif/1.0}pixelYDimension=720, {http://www.alfresco.com/model/custom-model/1.0}originalCreatedDate=2020-05-26T17:19:21, {http://www.alfresco.org/model/exif/1.0}pixelXDimension=1280, {http://www.alfresco.org/model/exif/1.0}software=Windows Photo Editor 10.0.10011.16384, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=null}
 2020-05-27 12:33:33,950  DEBUG [content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-3] Extracted Metadata from ContentAccessor[ contentUrl=store://2020/5/27/12/33/751e874b-3f94-4cfa-ae75-2965f94505aa.bin, mimetype=image/jpeg, size=138468, encoding=UTF-8, locale=en_US]
  Found: {date=2020-05-26T17:19:21, Compression Type=Progressive, Huffman, Data Precision=8 bits, Number of Components=3, tiff:ImageLength=720, Component 2=Cb component: Quantization table 1, Sampling factors 1 horiz/1 vert, dcterms:created=2020-05-26T17:19:21, Component 1=Y component: Quantization table 0, Sampling factors 2 horiz/2 vert, dcterms:modified=2020-05-26T17:19:21, Last-Modified=2020-05-26T17:19:21, title=null, X Resolution=96 dots, Last-Save-Date=2020-05-26T17:19:21, Component 3=Cr component: Quantization table 1, Sampling factors 1 horiz/1 vert, meta:save-date=2020-05-26T17:19:21, modified=2020-05-26T17:19:21, tiff:BitsPerSample=8, Content-Type=image/jpeg, Resolution Units=inch, comments=null, meta:creation-date=2020-05-26T17:19:21, author=null, created=2020-05-26T17:19:21, Date/Time=2020:05:26 17:19:21, Creation-Date=2020-05-26T17:19:21, Image Height=720 pixels, Unknown tag (0x000b)=Windows Photo Editor 10.0.10011.16384, Orientation=Right side, top (Rotate 90 CW), tiff:Orientation=6, Image Width=1280 pixels, tiff:Software=Windows Photo Editor 10.0.10011.16384, Unknown tag (0xea1c)=[2060 bytes], Software=Windows Photo Editor 10.0.10011.16384, tiff:ImageWidth=1280, Y Resolution=96 dots}
  Mapped and Accepted: {{http://www.alfresco.org/model/exif/1.0}software=Windows Photo Editor 10.0.10011.16384, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/exif/1.0}orientation=6, {http://www.alfresco.com/model/custom-model/1.0}originalCreatedDate=2020-05-26T17:19:21, {http://www.alfresco.com/model/custom-model/1.0}originalModifiedDate=2020-05-26T17:19:21, {http://www.alfresco.org/model/exif/1.0}pixelXDimension=1280, {http://www.alfresco.org/model/exif/1.0}pixelYDimension=720, {http://www.alfresco.org/model/content/1.0}author=null}
 2020-05-27 12:33:33,950  DEBUG [content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-3] Completed metadata extraction: 
   reader:    ContentAccessor[ contentUrl=store://2020/5/27/12/33/751e874b-3f94-4cfa-ae75-2965f94505aa.bin, mimetype=image/jpeg, size=138468, encoding=UTF-8, locale=en_US]
   extracter: org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter@58e342ff
   changed:   {{http://www.alfresco.org/model/exif/1.0}software=Windows Photo Editor 10.0.10011.16384, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/exif/1.0}orientation=6, {http://www.alfresco.com/model/custom-model/1.0}originalCreatedDate=2020-05-26T17:19:21, {http://www.alfresco.com/model/custom-model/1.0}originalModifiedDate=2020-05-26T17:19:21, {http://www.alfresco.org/model/exif/1.0}pixelXDimension=1280, {http://www.alfresco.org/model/exif/1.0}pixelYDimension=720, {http://www.alfresco.org/model/content/1.0}author=null}
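
Since the custom properties in this thread are declared as `d:text`, the extracted dates are stored as strings such as `2020-05-26T17:19:21`. A minimal sketch of turning those strings back into date objects with `java.time` follows. The class and method names (`ExtractedDateDemo`, `parseIso`, `parseExif`) are hypothetical, the second pattern is an assumption based on the raw `Date/Time=2020:05:26 17:19:21` value in the log above, and no time zone is assumed because the logged values carry none.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for parsing the date strings seen in the extraction log. */
class ExtractedDateDemo {

    // Pattern assumed from the raw "Date/Time" value in the log (colon-separated date).
    private static final DateTimeFormatter EXIF_STYLE =
            DateTimeFormatter.ofPattern("uuuu:MM:dd HH:mm:ss");

    /** Parses values like 2020-05-26T17:19:21 (ISO local date-time). */
    static LocalDateTime parseIso(String value) {
        return LocalDateTime.parse(value);
    }

    /** Parses values like 2020:05:26 17:19:21. */
    static LocalDateTime parseExif(String value) {
        return LocalDateTime.parse(value, EXIF_STYLE);
    }

    public static void main(String[] args) {
        LocalDateTime created = parseIso("2020-05-26T17:19:21");
        LocalDateTime rawDateTime = parseExif("2020:05:26 17:19:21");

        System.out.println(created);                     // prints 2020-05-26T17:19:21
        System.out.println(created.equals(rawDateTime)); // prints true
    }
}
```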
 

For FTP, it is not working, and I am not getting any log output either.