05-17-2020 01:30 AM
Hi,
I want to preserve the original created and modified dates of documents during upload. How can I achieve that?
If this is possible, will the dates also be preserved for uploads over FTP?
05-18-2020 11:44 AM
OK, if you can locate the extracted metadata in the log written by AbstractMappingMetadataExtracter/PdfBoxMetadataExtracter, and Found: {..............} contains the created/modified metadata but Mapped and Accepted: {............} does not, then here is what could be happening.
The cm:created and cm:modified values are set when the node is created and are marked read-only (protected) after that, as their definitions in the cm:auditable aspect show:
<property name="cm:created">
    <title>Created</title>
    <type>d:datetime</type>
    <protected>true</protected>
    <mandatory enforced="true">true</mandatory>
    <index enabled="true">
        <atomic>true</atomic>
        <stored>false</stored>
        <tokenised>both</tokenised>
        <facetable>true</facetable>
    </index>
</property>
<property name="cm:modified">
    <title>Modified</title>
    <type>d:datetime</type>
    <protected>true</protected>
    <mandatory enforced="true">true</mandatory>
    <index enabled="true">
        <atomic>true</atomic>
        <stored>false</stored>
        <tokenised>both</tokenised>
        <facetable>true</facetable>
    </index>
</property>
The alternative is to create custom properties in your own content model and keep the extracted created/modified metadata values in those properties. The other option would be to override the default behavior of the cm:auditable aspect properties, which I believe is not a good idea.
For example:
Create the following properties in your custom content model:
<aspect name="demo:customAuditMetadata">
    <title>Custom Audit Metadata</title>
    <description>Custom Audit Metadata</description>
    <properties>
        <property name="demo:originCreatedDate">
            <title>Original Created Date</title>
            <description>Created date of files based on incoming metadata extracted from metadata extractor</description>
            <type>d:text</type>
        </property>
        <property name="demo:originModifiedDate">
            <title>Original Modified Date</title>
            <description>Modified date of files based on incoming metadata extracted from metadata extractor</description>
            <type>d:text</type>
        </property>
    </properties>
</aspect>
Then add the following bean definition, mapping the extracted metadata to the above properties in mappingProperties:
<bean id="extracter.PDFBox" class="org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter" parent="baseMetadataExtracter">
    <property name="documentSelector" ref="pdfBoxEmbededDocumentSelector" />
    <property name="inheritDefaultMapping">
        <value>false</value>
    </property>
    <property name="overwritePolicy">
        <!-- Allow extraction to happen every time (e.g. when content is updated or a new version is uploaded). -->
        <value>EAGER</value>
    </property>
    <property name="mappingProperties">
        <props>
            <prop key="namespace.prefix.demo">http://www.github.com/model/demo/1.0</prop>
            <prop key="created">demo:originCreatedDate</prop>
            <prop key="modified">demo:originModifiedDate</prop>
        </props>
    </property>
</bean>
- Update the Share config to display the newly added properties on the document-details page as needed.
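To illustrate what the mappingProperties entries do, here is a minimal Java sketch. It is not Alfresco's AbstractMappingMetadataExtracter code, and the class MappingResolver is hypothetical. It assumes the usual key/value format shared by the <props> block and the extractor .properties files: namespace.prefix.<p>=<uri> declares a prefix, and every other entry maps an extracted metadata key to <p>:<localName>, which is expanded to the {uri}localName form that appears in the extractor's DEBUG log. A target with a missing or undeclared prefix cannot be resolved, so the sketch rejects it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical helper: a simplified model of how a mapping such as the <props> block
// turns "prefix:localName" targets into "{namespaceUri}localName" names.
// Illustration only, not Alfresco's implementation.
class MappingResolver {
    private static final String NS_PREFIX = "namespace.prefix.";

    static Map<String, String> resolve(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        // Pass 1: collect "namespace.prefix.<p>=<uri>" declarations.
        Map<String, String> namespaces = new LinkedHashMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(NS_PREFIX)) {
                namespaces.put(key.substring(NS_PREFIX.length()), props.getProperty(key).trim());
            }
        }

        // Pass 2: expand every other entry's "p:localName" target (sorted for stable output).
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            if (key.startsWith(NS_PREFIX)) {
                continue;
            }
            String target = props.getProperty(key).trim();
            int colon = target.indexOf(':');
            String uri = colon > 0 ? namespaces.get(target.substring(0, colon)) : null;
            if (uri == null) {
                throw new IllegalArgumentException(
                        "Cannot resolve target '" + target + "' for key '" + key + "'");
            }
            mapping.put(key, "{" + uri + "}" + target.substring(colon + 1));
        }
        return mapping;
    }

    public static void main(String[] args) {
        String props = String.join("\n",
                "namespace.prefix.demo=http://www.github.com/model/demo/1.0",
                "created=demo:originCreatedDate",
                "modified=demo:originModifiedDate");
        // Prints each extracted key with its fully qualified target property.
        resolve(props).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

The two-pass design mirrors the file format: prefixes can be declared anywhere in the file, so they are collected before any target is expanded.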
05-25-2020 02:54 PM
@sanjaybandhniya Find the demo project here:
https://github.com/abhinavmishra14/alfresco-metadataextraction-demo
I noticed a difference between the Community and Enterprise editions. The examples I gave above work fine on Enterprise 5.2.x (I used 5.2.6) and 6.1.x (I used 6.1), but on Community editions the properties files are not picked up reliably (the behavior is intermittent).
For Community edition, the only change I made is the properties file location shown below, and with it the file is always picked up correctly:
<property name="mappingProperties">
    <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
        <property name="location">
            <value>classpath:alfresco/module/${project.artifactId}/metadataextraction/TikaAutoMetadataExtracter.properties</value>
        </property>
    </bean>
</property>
On Enterprise, both the path above and the one below work fine:
<value>classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties</value>
This one also works on both versions:
<value>classpath:alfresco/extension/metadata/TikaAutoMetadataExtracter.properties</value>
I am not sure how the two editions (Community and Enterprise) differ in terms of extension points; I looked at the source code but found no clues. The good news is that the module path shown above (available in the demo project) works for both Community and Enterprise.
Hope this helps trim down your issue.
08-01-2024 10:55 AM
I have created an alternative deployment for Docker that allows applying this configuration.
A sample project is available at https://github.com/aborroy/alfresco-custom-metadata-extractor
05-22-2020 07:55 AM
@sanjaybandhniya I hope you have read the information shared above:
"TikaAutoMetadataExtracter takes care of other mimetypes which don't have specific extractors. It uses AutoDetectParser for parsing and extraction, e.g. for images."
I also gave examples for TikaAutoMetadataExtracter and the other extractors, marked in bold: "An example for Images, PDF, Office"
Look at this bean definition, which is also provided in the response above:
<bean id="extracter.TikaAuto" class="org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter" parent="baseMetadataExtracter">
05-22-2020 08:02 AM
This is my bean.
<bean id="extracter.TikaAuto" class="org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter" parent="baseMetadataExtracter">
    <constructor-arg>
        <ref bean="tikaConfig" />
    </constructor-arg>
    <property name="overwritePolicy">
        <value>EAGER</value>
    </property>
    <property name="mappingProperties">
        <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="location">
                <value>classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties</value>
            </property>
        </bean>
    </property>
</bean>

<bean id="extracter.PDFBox" class="org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter" parent="baseMetadataExtracter">
    <property name="documentSelector" ref="pdfBoxEmbededDocumentSelector" />
    <property name="overwritePolicy">
        <value>EAGER</value>
    </property>
    <property name="mappingProperties">
        <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="location">
                <value>classpath:alfresco/metadata/PdfBoxMetadataExtracter.properties</value>
            </property>
        </bean>
    </property>
</bean>

<bean id="extracter.Poi" class="org.alfresco.repo.content.metadata.PoiMetadataExtracter" parent="baseMetadataExtracter">
    <property name="poiFootnotesLimit" value="${content.transformer.Poi.poiFootnotesLimit}" />
    <property name="poiExtractPropertiesOnly" value="${content.transformer.Poi.poiExtractPropertiesOnly}" />
    <property name="poiAllowableXslfRelationshipTypes">
        <list>
            <!-- These values are valid for Office 2007, 2010 and 2013 -->
            <value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/presProps</value>
            <value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/viewProps</value>
        </list>
    </property>
    <property name="overwritePolicy">
        <value>EAGER</value>
    </property>
    <property name="mappingProperties">
        <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="location">
                <value>classpath:alfresco/metadata/PoiMetadataExtracter.properties</value>
            </property>
        </bean>
    </property>
</bean>
05-22-2020 08:52 AM
Your bean looks correct. What is the configuration in these files?
PdfBoxMetadataExtracter.properties
PoiMetadataExtracter.properties
TikaAutoMetadataExtracter.properties
05-22-2020 09:16 AM
Here is the properties file with my custom properties:
namespace.prefix.ks=http://www.alfresco.com/model/custom-model/1.0
created=ks:originalCreationDate
modified=ks:originalModificationDate
My content model:
<aspects>
    <aspect name="ks:importedDoc">
        <properties>
            <property name="ks:originalCreationDate">
                <type>d:date</type>
            </property>
            <property name="ks:originalModificationDate">
                <type>d:date</type>
            </property>
        </properties>
    </aspect>
</aspects>
It works for PDF and Office files, but not for images.
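Before digging into the logs, it can help to cross-check that every custom target in a mapping file exactly matches a property declared in the aspect, since a single typo in a target name is easy to miss. Below is a minimal, hypothetical Java sketch of such a check. MappingModelCheck and its inputs are assumptions for illustration, not an Alfresco API; you would copy the set of declared property names by hand from your content model.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical consistency check between an extractor mapping file and the
// property names declared in a custom aspect. Not part of any Alfresco API.
class MappingModelCheck {

    static List<String> problems(String mappingText, String modelPrefix, Set<String> modelProperties) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(mappingText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> found = new ArrayList<>();
        // Sorted iteration keeps the report order stable.
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            if (key.startsWith("namespace.prefix.")) {
                continue; // namespace declarations are not mappings
            }
            String target = props.getProperty(key).trim();
            if (target.indexOf(':') < 0) {
                found.add(key + "=" + target + " (no namespace prefix)");
            } else if (target.startsWith(modelPrefix + ":") && !modelProperties.contains(target)) {
                found.add(key + "=" + target + " (not declared in the model)");
            }
            // Targets in other namespaces (e.g. cm:author) are not checked here.
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> model = new HashSet<>(Arrays.asList(
                "ks:originalCreationDate", "ks:originalModificationDate"));
        String mapping = String.join("\n",
                "namespace.prefix.ks=http://www.alfresco.com/model/custom-model/1.0",
                "created=ks:originalCreationDate",
                "modified=ks:originalModificationDate");
        // An empty list means every ks: target exists in the aspect.
        System.out.println(problems(mapping, "ks", model));
    }
}
```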
05-22-2020 09:29 AM
Hmm, that's odd; I think it should work. Let me try it on my end and see what I get.
05-22-2020 12:51 PM
It works fine for me. Try rechecking your configs and logs and see what you get.
Here is the test I ran:
<bean id="extracter.TikaAuto" class="org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter" parent="baseMetadataExtracter">
    <constructor-arg>
        <ref bean="tikaConfig" />
    </constructor-arg>
    <property name="overwritePolicy">
        <value>EAGER</value>
    </property>
    <property name="mappingProperties">
        <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="location">
                <value>classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties</value>
            </property>
        </bean>
    </property>
</bean>

<bean id="extracter.PDFBox" class="org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter" parent="baseMetadataExtracter">
    <property name="documentSelector" ref="pdfBoxEmbededDocumentSelector" />
    <property name="overwritePolicy">
        <value>EAGER</value>
    </property>
    <!-- Including custom properties -->
    <property name="mappingProperties">
        <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="location">
                <value>classpath:alfresco/metadata/PdfBoxMetadataExtracter.properties</value>
            </property>
        </bean>
    </property>
</bean>

<bean id="extracter.Poi" class="org.alfresco.repo.content.metadata.PoiMetadataExtracter" parent="baseMetadataExtracter">
    <property name="poiFootnotesLimit" value="${content.transformer.Poi.poiFootnotesLimit}" />
    <property name="poiExtractPropertiesOnly" value="${content.transformer.Poi.poiExtractPropertiesOnly}" />
    <property name="poiAllowableXslfRelationshipTypes">
        <list>
            <!-- These values are valid for Office 2007, 2010 and 2013 -->
            <value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/presProps</value>
            <value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/viewProps</value>
        </list>
    </property>
    <property name="overwritePolicy">
        <value>EAGER</value>
    </property>
    <!-- Including custom properties -->
    <property name="mappingProperties">
        <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="location">
                <value>classpath:alfresco/metadata/PoiMetadataExtracter.properties</value>
            </property>
        </bean>
    </property>
</bean>
TikaAutoMetadataExtracter.properties
# Namespaces
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0
namespace.prefix.exif=http://www.alfresco.org/model/exif/1.0
namespace.prefix.audio=http://www.alfresco.org/model/audio/1.0

# Custom model namespace
namespace.prefix.demo=http://www.github.com/abhinavmishra14/model/demo/1.0

# OOTB Default Mappings
author=cm:author
title=cm:title
description=cm:description
created=cm:created

# Custom Properties to be mapped
created=demo:originCreatedDate
modified=demo:originModifiedDate

geo\:lat=cm:latitude
geo\:long=cm:longitude
tiff\:ImageWidth=exif:pixelXDimension
tiff\:ImageLength=exif:pixelYDimension
tiff\:Make=exif:manufacturer
tiff\:Model=exif:model
tiff\:Software=exif:software
tiff\:Orientation=exif:orientation
tiff\:XResolution=exif:xResolution
tiff\:YResolution=exif:yResolution
tiff\:ResolutionUnit=exif:resolutionUnit
exif\:Flash=exif:flash
exif\:ExposureTime=exif:exposureTime
exif\:FNumber=exif:fNumber
exif\:FocalLength=exif:focalLength
exif\:IsoSpeedRatings=exif:isoSpeedRatings
exif\:DateTimeOriginal=exif:dateTimeOriginal
xmpDM\:album=audio:album
xmpDM\:artist=audio:artist
xmpDM\:composer=audio:composer
xmpDM\:engineer=audio:engineer
xmpDM\:genre=audio:genre
xmpDM\:trackNumber=audio:trackNumber
xmpDM\:releaseDate=audio:releaseDate
#xmpDM:logComment
xmpDM\:audioSampleRate=audio:sampleRate
xmpDM\:audioSampleType=audio:sampleType
xmpDM\:audioChannelType=audio:channelType
xmpDM\:audioCompressor=audio:compressor
PdfBoxMetadataExtracter.properties
# Namespaces
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0

# Custom model namespace
namespace.prefix.demo=http://www.github.com/abhinavmishra14/model/demo/1.0

# OOTB Default Mappings
author=cm:author
title=cm:title
subject=cm:description

# Custom Properties to be mapped
created=demo:originCreatedDate
modified=demo:originModifiedDate
PoiMetadataExtracter.properties
# Namespaces
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0

# Custom model namespace
namespace.prefix.demo=http://www.github.com/abhinavmishra14/model/demo/1.0

# OOTB Default Mappings
author=cm:author
title=cm:title
description=cm:description

# Custom Properties to be mapped
created=demo:originCreatedDate
modified=demo:originModifiedDate
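A few rules of the .properties format matter for these mapping files, assuming they are loaded with standard java.util.Properties semantics (PropertiesFactoryBean produces a java.util.Properties). A colon inside a key must be escaped (geo\:lat is read as the key geo:lat). If a key appears twice, as created does in the TikaAutoMetadataExtracter.properties file above, the later entry replaces the earlier one, so only created=demo:originCreatedDate takes effect. And everything on a line that starts with # is a comment, so each entry needs its own line. The small Java sketch below demonstrates these rules; the class name MappingFileRules is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo of java.util.Properties rules that apply to extractor mapping files.
class MappingFileRules {

    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties p = load(String.join("\n",
                "# OOTB Default Mappings",
                "created=cm:created",
                "# Custom Properties to be mapped",
                "created=demo:originCreatedDate", // same key again: this value wins
                "geo\\:lat=cm:latitude",           // escaped colon: key is "geo:lat"
                "#xmpDM:logComment"));             // comment line: ignored
        System.out.println(p.getProperty("created")); // demo:originCreatedDate
        System.out.println(p.getProperty("geo:lat")); // cm:latitude
        System.out.println(p.size());                 // 2

        // A mapping written on the same line as a comment is swallowed by the comment.
        System.out.println(load("# Custom Properties to be mapped created=demo:originCreatedDate").isEmpty()); // true
    }
}
```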
ContentModel:
<aspect name="demo:testAuditMetadata">
<title>Test Audit Metadata</title>
<description>Test Audit Metadata</description>
<properties>
<property name="demo:originCreatedDate">
<title>Original Created Date</title>
<description>Original Created Date</description>
<type>d:text</type>
</property>
<property name="demo:originModifiedDate">
<title>Original Modified Date</title>
<description>Original Modified Date</description>
<type>d:text</type>
</property>
</properties>
</aspect>
Log:
Image extraction:
Mapped and Accepted: {{http://www.alfresco.org/model/exif/1.0}focalLength=4.5, {http://www.alfresco.org/model/exif/1.0}model=TG-5, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/exif/1.0}flash=false, {http://www.alfresco.org/model/exif/1.0}fNumber=8.0, {http://www.alfresco.org/model/exif/1.0}isoSpeedRatings=100, {http://www.alfresco.org/model/content/1.0}description={en_US=OLYMPUS DIGITAL CAMERA}, {http://www.alfresco.org/model/exif/1.0}dateTimeOriginal=Sun May 07 13:23:51 EDT 2017, {http://www.alfresco.org/model/exif/1.0}manufacturer=OLYMPUS CORPORATION, {http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2017-05-07T13:23:51, {http://www.alfresco.org/model/exif/1.0}pixelXDimension=590, {http://www.alfresco.org/model/exif/1.0}pixelYDimension=442, {http://www.alfresco.org/model/content/1.0}author=null, {http://www.alfresco.org/model/exif/1.0}exposureTime=0.005}
2020-05-22 09:58:00,041 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-14] Completed metadata extraction: reader: ContentAccessor[ contentUrl=store://2020/5/22/9/57/db13881d-4caf-4a72-a481-054bb9246b63.bin, mimetype=image/jpeg, size=94399, encoding=UTF-8, locale=en_US] extracter: org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter@126bd574 changed: {{http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2017-05-07T13:23:51,{http://www.alfresco.org/model/exif/1.0}focalLength=4.5, {http://www.alfresco.org/model/exif/1.0}model=TG-5, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/exif/1.0}flash=false, {http://www.alfresco.org/model/exif/1.0}fNumber=8.0, {http://www.alfresco.org/model/exif/1.0}isoSpeedRatings=100, {http://www.alfresco.org/model/content/1.0}description={en_US=OLYMPUS DIGITAL CAMERA}, {http://www.alfresco.org/model/exif/1.0}dateTimeOriginal=Sun May 07 13:23:51 EDT 2017, {http://www.alfresco.org/model/exif/1.0}manufacturer=OLYMPUS CORPORATION, {http://www.alfresco.org/model/exif/1.0}pixelXDimension=590, {http://www.alfresco.org/model/exif/1.0}pixelYDimension=442, {http://www.alfresco.org/model/content/1.0}author=null, {http://www.alfresco.org/model/exif/1.0}exposureTime=0.005}
PDF extraction:
Mapped and Accepted: {{http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2018-10-26T20:36:24Z, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=null, {http://www.github.com/abhinavmishra14/model/demo/1.0}originModifiedDate=2018-10-26T20:36:28Z}
2020-05-22 09:58:11,676 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-2] Completed metadata extraction: reader: ContentAccessor[ contentUrl=store://2020/5/22/9/58/262e3dc1-5cfc-4558-9f01-fae20c5cae2d.bin, mimetype=application/pdf, size=3104712, encoding=UTF-8, locale=en_US] extracter: org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter@8414655 changed: {{http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2018-10-26T20:36:24Z, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=null, {http://www.github.com/abhinavmishra14/model/demo/1.0}originModifiedDate=2018-10-26T20:36:28Z}
Office extraction:
Mapped and Accepted: {{http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2020-02-10T16:13:00Z, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=Abhinav, {http://www.github.com/abhinavmishra14/model/demo/1.0}originModifiedDate=2020-02-10T20:05:00Z}
2020-05-22 09:58:22,021 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-11] Completed metadata extraction: reader: ContentAccessor[ contentUrl=store://2020/5/22/9/58/f3281f14-7ffb-4d91-a3b2-d0fc8de305d5.bin, mimetype=application/vnd.openxmlformats-officedocument.wordprocessingml.document, size=3075453, encoding=UTF-8, locale=en_US] extracter: org.alfresco.repo.content.metadata.PoiMetadataExtracter@2752d52e changed: {{http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2020-02-10T16:13:00Z, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=Abhinav, {http://www.github.com/abhinavmishra14/model/demo/1.0}originModifiedDate=2020-02-10T20:05:00Z}
Image metadata on the Share "View Details" page:
PDF and Office metadata on the Share "View Details" page:
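Because the demo aspect stores these values as d:text, the string format is whatever each extractor produced: in the log above the image value has no UTC offset (2017-05-07T13:23:51), while the PDF and Office values end in Z. If you later need them as real timestamps, for example to sort or compare, a minimal Java sketch using java.time could look like this. OriginDateParser is hypothetical, and the zone used for offset-less values is an assumption you must choose yourself, because the value carries no zone information; America/New_York below is only an example, picked because the server log above prints times in EDT.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeParseException;

// Hypothetical helper for reading back the d:text values stored in
// demo:originCreatedDate / demo:originModifiedDate.
class OriginDateParser {

    static Instant parse(String value, ZoneId fallbackZone) {
        try {
            // Values with an offset or 'Z', e.g. "2018-10-26T20:36:24Z" (PDF/Office in the log).
            return OffsetDateTime.parse(value).toInstant();
        } catch (DateTimeParseException noOffset) {
            // Values without an offset, e.g. "2017-05-07T13:23:51" (image in the log).
            // ASSUMPTION: interpret them in the caller-supplied zone.
            return LocalDateTime.parse(value).atZone(fallbackZone).toInstant();
        }
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("America/New_York"); // assumed zone for offset-less values
        System.out.println(parse("2018-10-26T20:36:24Z", zone)); // 2018-10-26T20:36:24Z
        System.out.println(parse("2017-05-07T13:23:51", zone));  // 2017-05-07T17:23:51Z
    }
}
```

Using d:date or d:datetime in the model instead of d:text would avoid this step, at the cost of having to make sure every extractor's value converts cleanly.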
05-23-2020 12:06 AM
Hi,
If possible, can you share the demo you created? For images it is not working, even though I used your code.
05-23-2020 10:20 AM
@sanjaybandhniya Please share your content model, Share config, bean definition, extractor properties and log here.
05-23-2020 11:11 PM
@abhinavmishra14 I have created a new thread. Please check it.