05-23-2020 11:08 PM
Hi,
I am trying to preserve the original created/modified dates of uploaded documents. This is working for PDF and Office documents.
I want the same behaviour for images as well.
Content Model :
<?xml version="1.0" encoding="UTF-8"?>
<model name="demo:custom-model" xmlns="http://www.alfresco.org/model/dictionary/1.0">
    <description>Sample model for original creation and modification dates</description>
    <author>Sanjay</author>
    <version>1.0</version>
    <imports>
        <import uri="http://www.alfresco.org/model/dictionary/1.0" prefix="d" />
    </imports>
    <namespaces>
        <namespace uri="http://www.alfresco.com/model/custom-model/1.0" prefix="demo" />
    </namespaces>
    <aspects>
        <aspect name="demo:testAuditMetadata">
            <title>Test Audit Metadata</title>
            <description>Test Audit Metadata</description>
            <properties>
                <property name="demo:originalCreatedDate">
                    <title>Original Created Date</title>
                    <description>Original Created Date</description>
                    <type>d:text</type>
                </property>
                <property name="demo:originalModifiedDate">
                    <title>Original Modified Date</title>
                    <description>Original Modified Date</description>
                    <type>d:text</type>
                </property>
            </properties>
        </aspect>
    </aspects>
</model>
Bean :
<?xml version='1.0' encoding='UTF-8'?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd">

    <bean id="extracter.TikaAuto" class="org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter" parent="baseMetadataExtracter">
        <constructor-arg>
            <ref bean="tikaConfig" />
        </constructor-arg>
        <property name="overwritePolicy">
            <value>EAGER</value>
        </property>
        <property name="mappingProperties">
            <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
                <property name="location">
                    <value>classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties</value>
                </property>
            </bean>
        </property>
    </bean>

    <bean id="extracter.PDFBox" class="org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter" parent="baseMetadataExtracter">
        <property name="documentSelector" ref="pdfBoxEmbededDocumentSelector" />
        <property name="overwritePolicy">
            <value>EAGER</value>
        </property>
        <!-- Including custom properties -->
        <property name="mappingProperties">
            <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
                <property name="location">
                    <value>classpath:alfresco/metadata/PdfBoxMetadataExtracter.properties</value>
                </property>
            </bean>
        </property>
    </bean>

    <bean id="extracter.Poi" class="org.alfresco.repo.content.metadata.PoiMetadataExtracter" parent="baseMetadataExtracter">
        <property name="poiFootnotesLimit" value="${content.transformer.Poi.poiFootnotesLimit}" />
        <property name="poiExtractPropertiesOnly" value="${content.transformer.Poi.poiExtractPropertiesOnly}" />
        <property name="poiAllowableXslfRelationshipTypes">
            <list>
                <!-- These values are valid for Office 2007, 2010 and 2013 -->
                <value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/presProps</value>
                <value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/viewProps</value>
            </list>
        </property>
        <property name="overwritePolicy">
            <value>EAGER</value>
        </property>
        <!-- Including custom properties -->
        <property name="mappingProperties">
            <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
                <property name="location">
                    <value>classpath:alfresco/metadata/PoiMetadataExtracter.properties</value>
                </property>
            </bean>
        </property>
    </bean>
</beans>
PdfBoxMetadataExtracter.properties and PoiMetadataExtracter.properties
# Namespaces
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0

#Custom model namespace
namespace.prefix.demo=http://www.alfresco.com/model/custom-model/1.0

# OOTB Default Mappings
author=cm:author
title=cm:title
description=cm:description

# Custom Properties to be mapped
created=demo:originalCreatedDate
modified=demo:originalModifiedDate
TikaAutoMetadataExtracter.properties
# Namespaces
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0
namespace.prefix.exif=http://www.alfresco.org/model/exif/1.0
namespace.prefix.audio=http://www.alfresco.org/model/audio/1.0

#Custom model namespace
namespace.prefix.demo=http://www.alfresco.com/model/custom-model/1.0

# OOTB Default Mappings
author=cm:author
title=cm:title
description=cm:description
created=cm:created

# Custom Properties to be mapped
created=demo:originalCreatedDate
modified=demo:originalModifiedDate

geo\:lat=cm:latitude
geo\:long=cm:longitude

tiff\:ImageWidth=exif:pixelXDimension
tiff\:ImageLength=exif:pixelYDimension
tiff\:Make=exif:manufacturer
tiff\:Model=exif:model
tiff\:Software=exif:software
tiff\:Orientation=exif:orientation
tiff\:XResolution=exif:xResolution
tiff\:YResolution=exif:yResolution
tiff\:ResolutionUnit=exif:resolutionUnit

exif\:Flash=exif:flash
exif\:ExposureTime=exif:exposureTime
exif\:FNumber=exif:fNumber
exif\:FocalLength=exif:focalLength
exif\:IsoSpeedRatings=exif:isoSpeedRatings
exif\:DateTimeOriginal=exif:dateTimeOriginal

xmpDM\:album=audio:album
xmpDM\:artist=audio:artist
xmpDM\:composer=audio:composer
xmpDM\:engineer=audio:engineer
xmpDM\:genre=audio:genre
xmpDM\:trackNumber=audio:trackNumber
xmpDM\:releaseDate=audio:releaseDate
#xmpDM:logComment
xmpDM\:audioSampleRate=audio:sampleRate
xmpDM\:audioSampleType=audio:sampleType
xmpDM\:audioChannelType=audio:channelType
xmpDM\:audioCompressor=audio:compressor
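A side note on the file above: it defines the key `created` twice (`created=cm:created` and `created=demo:originalCreatedDate`). When a `.properties` file is loaded through `java.util.Properties`, a later entry for the same key replaces the earlier one, so only the custom mapping survives. The sketch below shows just that behaviour with plain Java; the class name `DuplicateKeyDemo` and its helper are hypothetical and not part of Alfresco. If both targets are wanted, Alfresco mapping files are generally described as accepting a comma-separated list on one key (for example `created=cm:created, demo:originalCreatedDate`), but treat that as an assumption to verify against your version.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class DuplicateKeyDemo {

    /** Loads properties text and returns the value that survives for the given key. */
    static String resolve(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrow unchecked to keep the helper simple.
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Same two entries as in the TikaAutoMetadataExtracter.properties shown above.
        String text = "created=cm:created\n"
                    + "created=demo:originalCreatedDate\n";
        // The second entry replaces the first one.
        System.out.println("created -> " + resolve(text, "created"));
    }
}
```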
Let me know what I am missing for the image metadata extractor.
05-25-2020 02:56 PM
@sanjaybandhniya I already replied here: https://hub.alfresco.com/t5/alfresco-content-services-forum/how-to-preserve-original-document-create...
Sharing the response on this thread as well, for clarity.
Find the demo project here:
https://github.com/abhinavmishra14/alfresco-metadataextraction-demo
I noticed a difference between the Community and Enterprise editions. The examples I gave above work perfectly on Enterprise 5.2.x (I used 5.2.6) and 6.1.x (I used 6.1), but on Community editions the properties files are not picked up reliably (the behaviour is intermittent).
The only change I made for Community edition is the location shown below, and with it the file is always picked up correctly.
<property name="mappingProperties">
    <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
        <property name="location">
            <value>classpath:alfresco/module/${project.artifactId}/metadataextraction/TikaAutoMetadataExtracter.properties</value>
        </property>
    </bean>
</property>
On Enterprise, both the path above and the path below work fine:
<value>classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties</value>
This one also works on both editions:
<value>classpath:alfresco/extension/metadata/TikaAutoMetadataExtracter.properties</value>
I am not sure how the two editions (Community and Enterprise) differ in terms of extension points. I looked at the source code but found no clues. The good news is that the module path I shared above (available in the demo project) works for both Community and Enterprise.
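One way to narrow down an intermittent pick-up like this is to check whether a given location resolves on the classpath at all. The following is a minimal sketch using a plain class loader lookup; the class `ClasspathCheck` and its helpers are hypothetical names, and inside Alfresco the check would have to run within the repository web application so that the same class loader is used.

```java
import java.net.URL;

public class ClasspathCheck {

    /** Strips an optional Spring-style "classpath:" prefix and a leading slash. */
    static String toResourceName(String location) {
        String name = location.trim();
        if (name.startsWith("classpath:")) {
            name = name.substring("classpath:".length());
        }
        return name.startsWith("/") ? name.substring(1) : name;
    }

    /** Returns true if the current class loader can find the resource. */
    static boolean isOnClasspath(String location) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        URL url = loader.getResource(toResourceName(location));
        return url != null;
    }

    public static void main(String[] args) {
        // The two fixed locations discussed in this thread.
        String[] candidates = {
            "classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties",
            "classpath:alfresco/extension/metadata/TikaAutoMetadataExtracter.properties"
        };
        for (String candidate : candidates) {
            System.out.println(candidate + " -> "
                    + (isOnClasspath(candidate) ? "found" : "NOT found"));
        }
    }
}
```

Run standalone, both locations will normally be reported as not found, because the files only exist on the repository's classpath.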
Hope this helps.
05-26-2020 01:12 AM
Thank you @abhinavmishra14
I don't know what the issue could be; for me the result for images is inconsistent.
Is it working for all images for you?
Is there any way we can enable this for FTP?
05-26-2020 09:18 AM
I tested with the JPEG and PNG images I have, and it works fine for them. If Tika is able to extract the "created" metadata, it will be applied. For the images where you are not seeing the created/modified metadata, check what you get in the log for:
Found: {....}
Mapped and Accepted: {....}
changed: {...}
Like this one:
Mapped and Accepted: {{http://www.alfresco.org/model/exif/1.0}focalLength=4.5, {http://www.alfresco.org/model/exif/1.0}model=TG-5, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/exif/1.0}flash=false, {http://www.alfresco.org/model/exif/1.0}fNumber=8.0, {http://www.alfresco.org/model/exif/1.0}isoSpeedRatings=100, {http://www.alfresco.org/model/content/1.0}description={en_US=OLYMPUS DIGITAL CAMERA}, {http://www.alfresco.org/model/exif/1.0}dateTimeOriginal=Sun May 07 13:23:51 EDT 2017, {http://www.alfresco.org/model/exif/1.0}manufacturer=OLYMPUS CORPORATION, {http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2017-05-07T13:23:51, {http://www.alfresco.org/model/exif/1.0}pixelXDimension=590, {http://www.alfresco.org/model/exif/1.0}pixelYDimension=442, {http://www.alfresco.org/model/content/1.0}author=null, {http://www.alfresco.org/model/exif/1.0}exposureTime=0.005}
2020-05-22 09:58:00,041 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-14] Completed metadata extraction:
   reader: ContentAccessor[ contentUrl=store://2020/5/22/9/57/db13881d-4caf-4a72-a481-054bb9246b63.bin, mimetype=image/jpeg, size=94399, encoding=UTF-8, locale=en_US]
   extracter: org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter@126bd574
   changed: {{http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2017-05-07T13:23:51, {http://www.alfresco.org/model/exif/1.0}focalLength=4.5, {http://www.alfresco.org/model/exif/1.0}model=TG-5, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/exif/1.0}flash=false, {http://www.alfresco.org/model/exif/1.0}fNumber=8.0, {http://www.alfresco.org/model/exif/1.0}isoSpeedRatings=100, {http://www.alfresco.org/model/content/1.0}description={en_US=OLYMPUS DIGITAL CAMERA}, {http://www.alfresco.org/model/exif/1.0}dateTimeOriginal=Sun May 07 13:23:51 EDT 2017, {http://www.alfresco.org/model/exif/1.0}manufacturer=OLYMPUS CORPORATION, {http://www.alfresco.org/model/exif/1.0}pixelXDimension=590, {http://www.alfresco.org/model/exif/1.0}pixelYDimension=442, {http://www.alfresco.org/model/content/1.0}author=null, {http://www.alfresco.org/model/exif/1.0}exposureTime=0.005}
Ideally it should also work for images and files uploaded via FTP, because extraction happens after the nodes have already been created in the repository. I haven't checked FTP myself, though.
05-27-2020 03:05 AM
For an image, I am getting this:
2020-05-27 12:33:33,923 DEBUG [content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-3] Starting metadata extraction:
   reader: ContentAccessor[ contentUrl=store://2020/5/27/12/33/751e874b-3f94-4cfa-ae75-2965f94505aa.bin, mimetype=image/jpeg, size=138468, encoding=UTF-8, locale=en_US]
   extracter: org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter@58e342ff
2020-05-27 12:33:33,949 DEBUG [content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-3] Converted extracted raw values to system values:
   Raw Properties: {date=2020-05-26T17:19:21, Compression Type=Progressive, Huffman, Data Precision=8 bits, Number of Components=3, tiff:ImageLength=720, Component 2=Cb component: Quantization table 1, Sampling factors 1 horiz/1 vert, dcterms:created=2020-05-26T17:19:21, Component 1=Y component: Quantization table 0, Sampling factors 2 horiz/2 vert, dcterms:modified=2020-05-26T17:19:21, Last-Modified=2020-05-26T17:19:21, title=null, X Resolution=96 dots, Last-Save-Date=2020-05-26T17:19:21, Component 3=Cr component: Quantization table 1, Sampling factors 1 horiz/1 vert, meta:save-date=2020-05-26T17:19:21, modified=2020-05-26T17:19:21, tiff:BitsPerSample=8, Content-Type=image/jpeg, Resolution Units=inch, comments=null, meta:creation-date=2020-05-26T17:19:21, author=null, created=2020-05-26T17:19:21, Date/Time=2020:05:26 17:19:21, Creation-Date=2020-05-26T17:19:21, Image Height=720 pixels, Unknown tag (0x000b)=Windows Photo Editor 10.0.10011.16384, Orientation=Right side, top (Rotate 90 CW), tiff:Orientation=6, Image Width=1280 pixels, tiff:Software=Windows Photo Editor 10.0.10011.16384, Unknown tag (0xea1c)=[2060 bytes], Software=Windows Photo Editor 10.0.10011.16384, tiff:ImageWidth=1280, Y Resolution=96 dots}
   System Properties: {{http://www.alfresco.com/model/custom-model/1.0}originalModifiedDate=2020-05-26T17:19:21, {http://www.alfresco.org/model/exif/1.0}orientation=6, {http://www.alfresco.org/model/exif/1.0}pixelYDimension=720, {http://www.alfresco.com/model/custom-model/1.0}originalCreatedDate=2020-05-26T17:19:21, {http://www.alfresco.org/model/exif/1.0}pixelXDimension=1280, {http://www.alfresco.org/model/exif/1.0}software=Windows Photo Editor 10.0.10011.16384, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=null}
2020-05-27 12:33:33,950 DEBUG [content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-3] Extracted Metadata from ContentAccessor[ contentUrl=store://2020/5/27/12/33/751e874b-3f94-4cfa-ae75-2965f94505aa.bin, mimetype=image/jpeg, size=138468, encoding=UTF-8, locale=en_US]
   Found: {date=2020-05-26T17:19:21, Compression Type=Progressive, Huffman, Data Precision=8 bits, Number of Components=3, tiff:ImageLength=720, Component 2=Cb component: Quantization table 1, Sampling factors 1 horiz/1 vert, dcterms:created=2020-05-26T17:19:21, Component 1=Y component: Quantization table 0, Sampling factors 2 horiz/2 vert, dcterms:modified=2020-05-26T17:19:21, Last-Modified=2020-05-26T17:19:21, title=null, X Resolution=96 dots, Last-Save-Date=2020-05-26T17:19:21, Component 3=Cr component: Quantization table 1, Sampling factors 1 horiz/1 vert, meta:save-date=2020-05-26T17:19:21, modified=2020-05-26T17:19:21, tiff:BitsPerSample=8, Content-Type=image/jpeg, Resolution Units=inch, comments=null, meta:creation-date=2020-05-26T17:19:21, author=null, created=2020-05-26T17:19:21, Date/Time=2020:05:26 17:19:21, Creation-Date=2020-05-26T17:19:21, Image Height=720 pixels, Unknown tag (0x000b)=Windows Photo Editor 10.0.10011.16384, Orientation=Right side, top (Rotate 90 CW), tiff:Orientation=6, Image Width=1280 pixels, tiff:Software=Windows Photo Editor 10.0.10011.16384, Unknown tag (0xea1c)=[2060 bytes], Software=Windows Photo Editor 10.0.10011.16384, tiff:ImageWidth=1280, Y Resolution=96 dots}
   Mapped and Accepted: {{http://www.alfresco.org/model/exif/1.0}software=Windows Photo Editor 10.0.10011.16384, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/exif/1.0}orientation=6, {http://www.alfresco.com/model/custom-model/1.0}originalCreatedDate=2020-05-26T17:19:21, {http://www.alfresco.com/model/custom-model/1.0}originalModifiedDate=2020-05-26T17:19:21, {http://www.alfresco.org/model/exif/1.0}pixelXDimension=1280, {http://www.alfresco.org/model/exif/1.0}pixelYDimension=720, {http://www.alfresco.org/model/content/1.0}author=null}
2020-05-27 12:33:33,950 DEBUG [content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-3] Completed metadata extraction:
   reader: ContentAccessor[ contentUrl=store://2020/5/27/12/33/751e874b-3f94-4cfa-ae75-2965f94505aa.bin, mimetype=image/jpeg, size=138468, encoding=UTF-8, locale=en_US]
   extracter: org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter@58e342ff
   changed: {{http://www.alfresco.org/model/exif/1.0}software=Windows Photo Editor 10.0.10011.16384, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/exif/1.0}orientation=6, {http://www.alfresco.com/model/custom-model/1.0}originalCreatedDate=2020-05-26T17:19:21, {http://www.alfresco.com/model/custom-model/1.0}originalModifiedDate=2020-05-26T17:19:21, {http://www.alfresco.org/model/exif/1.0}pixelXDimension=1280, {http://www.alfresco.org/model/exif/1.0}pixelYDimension=720, {http://www.alfresco.org/model/content/1.0}author=null}
For FTP, it is not working, and I am not getting any log output either.
05-27-2020 07:30 AM
Looking at the log you shared, I can see that the created and modified metadata is extracted, mapped and accepted, and changed as well (see the originalCreatedDate and originalModifiedDate entries below). Did you check the image in the Node Browser to see whether these "changed" properties were applied? Based on the log they should be there, so I am not sure why you can't see these changes.
Found: { date=2020-05-26T17:19:21, Compression Type=Progressive, Huffman, Data Precision=8 bits, Number of Components=3, tiff:ImageLength=720, Component 2=Cb component: Quantization table 1, Sampling factors 1 horiz/1 vert, dcterms:created=2020-05-26T17:19:21, Component 1=Y component: Quantization table 0, Sampling factors 2 horiz/2 vert, dcterms:modified=2020-05-26T17:19:21, Last-Modified=2020-05-26T17:19:21, title=null, X Resolution=96 dots, Last-Save-Date=2020-05-26T17:19:21, Component 3=Cr component: Quantization table 1, Sampling factors 1 horiz/1 vert, meta:save-date=2020-05-26T17:19:21, modified=2020-05-26T17:19:21, tiff:BitsPerSample=8, Content-Type=image/jpeg, Resolution Units=inch, comments=null, meta:creation-date=2020-05-26T17:19:21, author=null, created=2020-05-26T17:19:21, Date/Time=2020:05:26 17:19:21, Creation-Date=2020-05-26T17:19:21, Image Height=720 pixels, Unknown tag (0x000b)=Windows Photo Editor 10.0.10011.16384, Orientation=Right side, top (Rotate 90 CW), tiff:Orientation=6, Image Width=1280 pixels, tiff:Software=Windows Photo Editor 10.0.10011.16384, Unknown tag (0xea1c)=[2060 bytes], Software=Windows Photo Editor 10.0.10011.16384, tiff:ImageWidth=1280, Y Resolution=96 dots }
Mapped and Accepted: { {http://www.alfresco.org/model/exif/1.0}software=Windows Photo Editor 10.0.10011.16384, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/exif/1.0}orientation=6, {http://www.alfresco.com/model/custom-model/1.0}originalCreatedDate=2020-05-26T17:19:21, {http://www.alfresco.com/model/custom-model/1.0}originalModifiedDate=2020-05-26T17:19:21, {http://www.alfresco.org/model/exif/1.0}pixelXDimension=1280, {http://www.alfresco.org/model/exif/1.0}pixelYDimension=720, {http://www.alfresco.org/model/content/1.0}author=null }
changed: { {http://www.alfresco.org/model/exif/1.0}software=Windows Photo Editor 10.0.10011.16384, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/exif/1.0}orientation=6, {http://www.alfresco.com/model/custom-model/1.0}originalCreatedDate=2020-05-26T17:19:21, {http://www.alfresco.com/model/custom-model/1.0}originalModifiedDate=2020-05-26T17:19:21, {http://www.alfresco.org/model/exif/1.0}pixelXDimension=1280, {http://www.alfresco.org/model/exif/1.0}pixelYDimension=720, {http://www.alfresco.org/model/content/1.0}author=null }
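Note that the content model in this thread declares both custom properties as `d:text`, so the extracted values are stored as plain strings such as `2020-05-26T17:19:21`. If code later needs them as dates, they have to be parsed. A minimal sketch, assuming the stored value keeps the ISO local date-time shape seen in the log; the class `OriginalDateParser` is a hypothetical name for illustration only.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class OriginalDateParser {

    /** Parses a stored value such as "2020-05-26T17:19:21" (no time zone information). */
    static LocalDateTime parse(String storedValue) {
        return LocalDateTime.parse(storedValue.trim(), DateTimeFormatter.ISO_LOCAL_DATE_TIME);
    }

    public static void main(String[] args) {
        // Value copied from the "changed:" entry in the log above.
        LocalDateTime created = parse("2020-05-26T17:19:21");
        System.out.println("year=" + created.getYear()
                + ", month=" + created.getMonthValue()
                + ", hour=" + created.getHour());
    }
}
```

The value carries no time zone, so the sketch deliberately uses `LocalDateTime` rather than guessing an offset.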
I am not sure why metadata extraction is not triggered when you upload via FTP; I have not checked it. Try debugging it.
11-05-2020 01:32 AM
@abhinavmishra14 Thank you very much for your help.
I just want to confirm: will this solution work for FTP upload?
11-05-2020 09:45 AM
@sanjaybandhniya wrote:
@abhinavmishra14 Thank you very much for your help.
I just want to confirm: will this solution work for FTP upload?
@sanjaybandhniya Theoretically it should, but I have not tried it with FTP. Try it and see if it works; if it doesn't, open a new thread.