05-17-2020 01:30 AM
Hi,
I want to preserve the original document's created and modified dates during upload. How can I achieve that?
If this is possible, will the dates also be preserved during FTP upload?
05-18-2020 11:44 AM
Ok, if you are able to locate the extracted metadata in the log from AbstractMappingMetadataExtracter/PdfBoxMetadataExtracter, and Found: {..............} contains the 'created/modified' metadata but Mapped and Accepted: {............} does not, then here is what could be happening.
The cm:created and cm:modified values are set at node creation time and are marked as protected (read-only) after that, as their definitions show:
<property name="cm:created">
   <title>Created</title>
   <type>d:datetime</type>
   <protected>true</protected>
   <mandatory enforced="true">true</mandatory>
   <index enabled="true">
      <atomic>true</atomic>
      <stored>false</stored>
      <tokenised>both</tokenised>
      <facetable>true</facetable>
   </index>
</property>
<property name="cm:modified">
   <title>Modified</title>
   <type>d:datetime</type>
   <protected>true</protected>
   <mandatory enforced="true">true</mandatory>
   <index enabled="true">
      <atomic>true</atomic>
      <stored>false</stored>
      <tokenised>both</tokenised>
      <facetable>true</facetable>
   </index>
</property>
The alternative is to create custom properties in your custom content model and keep the extracted created/modified metadata values on those properties for your own use. The other option would be to override the default behavior of the auditable aspect properties, which I believe would not be a good idea.
For example:
Create the following properties in your custom content model:
<aspect name="demo:customAuditMetadata">
   <title>Custom Audit Metadata</title>
   <description>Custom Audit Metadata</description>
   <properties>
      <property name="demo:originCreatedDate">
         <title>Original Created Date</title>
         <description>Created date of files based on incoming metadata extracted from metadata extractor</description>
         <type>d:text</type>
      </property>
      <property name="demo:originModifiedDate">
         <title>Original Modified Date</title>
         <description>Modified date of files based on incoming metadata extracted from metadata extractor</description>
         <type>d:text</type>
      </property>
   </properties>
</aspect>
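For context, an aspect like this has to live inside a model file that declares the demo prefix. The following is only a sketch of such a file: the model name demo:contentModel is a made-up placeholder, and the namespace URI is the one used in the extractor mapping in this post. Whatever URI you declare in the model must match the namespace.prefix.demo value you give to the extractor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: "demo:contentModel" is a hypothetical model name. -->
<model name="demo:contentModel" xmlns="http://www.alfresco.org/model/dictionary/1.0">
   <imports>
      <!-- Needed for the d:text data type -->
      <import uri="http://www.alfresco.org/model/dictionary/1.0" prefix="d" />
      <import uri="http://www.alfresco.org/model/content/1.0" prefix="cm" />
   </imports>
   <namespaces>
      <!-- Must match namespace.prefix.demo in the extractor mapping -->
      <namespace uri="http://www.github.com/model/demo/1.0" prefix="demo" />
   </namespaces>
   <aspects>
      <aspect name="demo:customAuditMetadata">
         <title>Custom Audit Metadata</title>
         <properties>
            <property name="demo:originCreatedDate">
               <title>Original Created Date</title>
               <type>d:text</type>
            </property>
            <property name="demo:originModifiedDate">
               <title>Original Modified Date</title>
               <type>d:text</type>
            </property>
         </properties>
      </aspect>
   </aspects>
</model>
```

The model file still has to be registered with the repository in the usual way for your project type before the properties can be mapped.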
Add the following bean definition and map the above properties in mappingProperties:
<bean id="extracter.PDFBox" class="org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter" parent="baseMetadataExtracter">
   <property name="documentSelector" ref="pdfBoxEmbededDocumentSelector" />
   <property name="inheritDefaultMapping">
      <value>false</value>
   </property>
   <property name="overwritePolicy">
      <!-- Allow extraction to happen every time (e.g. when content is updated or a new version is uploaded). -->
      <value>EAGER</value>
   </property>
   <property name="mappingProperties">
      <props>
         <prop key="namespace.prefix.demo">http://www.github.com/model/demo/1.0</prop>
         <prop key="created">demo:originCreatedDate</prop>
         <prop key="modified">demo:originModifiedDate</prop>
      </props>
   </property>
</bean>
- Update the Share config to display the newly added properties on the document-details page as needed.
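As a rough illustration of that Share step, a share-config-custom.xml fragment along these lines could expose the aspect and show its two properties on nodes that carry it. This is a sketch under assumptions, not taken from the demo project: it only covers the default view form, and you may need edit-form or field-appearance settings depending on your Share setup.

```xml
<alfresco-config>
   <!-- Make the aspect selectable under "Manage Aspects" -->
   <config evaluator="string-compare" condition="DocumentLibrary">
      <aspects>
         <visible>
            <aspect name="demo:customAuditMetadata" />
         </visible>
      </aspects>
   </config>

   <!-- Show the two properties for any node that has the aspect -->
   <config evaluator="aspect" condition="demo:customAuditMetadata">
      <forms>
         <form>
            <field-visibility>
               <show id="demo:originCreatedDate" />
               <show id="demo:originModifiedDate" />
            </field-visibility>
         </form>
      </forms>
   </config>
</alfresco-config>
```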
05-25-2020 02:54 PM
@sanjaybandhniya Find the demo project here:
https://github.com/abhinavmishra14/alfresco-metadataextraction-demo
I noticed a difference between the Community and Enterprise versions. The examples I gave above work perfectly fine on Enterprise 5.2.x (I used 5.2.6) and 6.1.x (I used 6.1), but on Community editions the properties files are not picked up reliably (the behavior is intermittent).
The only change I made for Community edition is shown below, and with it the file is always picked up correctly.
<property name="mappingProperties">
   <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
      <property name="location">
         <value>classpath:alfresco/module/${project.artifactId}/metadataextraction/TikaAutoMetadataExtracter.properties</value>
      </property>
   </bean>
</property>
On Enterprise, both work fine: the path above and the path given below:
<value>classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties</value>
This one also works on both versions:
<value>classpath:alfresco/extension/metadata/TikaAutoMetadataExtracter.properties</value>
I am not sure what difference there is between the two editions (Community and Enterprise) in terms of extension points; I tried looking at the source code but found no clues. The good news is that the other path I shared above (available in the demo project) works fine for both Community and Enterprise.
Hope this helps trim down your issue.
08-01-2024 10:55 AM
I have created an alternative deployment for Docker that allows applying this configuration.
A sample project is available at https://github.com/aborroy/alfresco-custom-metadata-extractor
05-17-2020 10:52 AM
I had a similar requirement in my project. To keep the original created/creator/modified/modifier values from the time documents are first uploaded, we added a custom auditable aspect to our custom model. We use a document creation behavior to apply that aspect with the same values that Alfresco's auditable aspect has at the moment the document is created.
Alfresco then keeps updating its own auditable aspect as the document is updated further, while our custom aspect remains unchanged.
05-17-2020 11:51 PM
I am not talking about the Alfresco upload date. I want to preserve the document's original creation date during upload.
Please check the image below.
05-18-2020 09:19 AM
The OOTB metadata extractor does handle extraction and mapping of the created-date metadata. If you look at the PdfBox metadata extractor properties, you will notice that the "created" metadata is mapped to "cm:created".
# Mappings
author=cm:author
title=cm:title
description=cm:description
created=cm:created
This is the class that is likely parsing the newly uploaded PDF files, extracting their available metadata, and mapping it to content model properties:
You can enable the following loggers to see whether metadata is being extracted:
log4j.logger.org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter=DEBUG
log4j.logger.org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter=DEBUG
log4j.logger.org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter=DEBUG
Have a look at this test class as well, which tests the "createdate" metadata:
You can also look at the auto metadata extractor implementation for reference: https://github.com/Alfresco/alfresco-repository/blob/master/src/main/java/org/alfresco/repo/content/...
05-18-2020 10:50 AM
Hi,
I have checked all these classes, but I still don't understand what customization I have to make to enable mapping of the original document created date to cm:created.
05-18-2020 11:42 PM
Hi,
I have tried the given approach and it is working for Alfresco upload and CMIS sync. Is there any way to achieve the same thing for FTP?
05-21-2020 08:49 AM
Hi,
I want to enable this for PDF/Office documents and image files, so which other classes do I need to use besides
org.alfresco.repo.content.metadata.PoiMetadataExtracter ,
org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter
05-21-2020 02:21 PM
If you look at https://github.com/Alfresco/alfresco-repository/blob/alfresco-repository-6.8/src/main/resources/alfr...
you will see extractors configured for specific mimetypes. Their mappings are in the properties files here: https://github.com/Alfresco/alfresco-repository/tree/alfresco-repository-6.8/src/main/resources/alfr...
TikaAutoMetadataExtracter takes care of the other mimetypes that don't have specific extractors. It uses AutoDetectParser for parsing and extraction, e.g. for images.
Identify all the file types you want to extend, and add the appropriate bean config (copy it from content-service-context.xml for reference) to inject your custom properties. Copy the OOTB properties files and keep them under the "alfresco/metadata/" classpath in your project, e.g. alfresco/metadata/TikaAutoMetadataExtracter.properties.
Or define the mapping inline, like this:
<property name="mappingProperties">
   <props>
      <prop key="namespace.prefix.cm">http://www.alfresco.org/model/content/1.0</prop>
      <prop key="namespace.prefix.demo">http://www.github.com/model/demo/1.0</prop>
      <prop key="author">cm:author</prop>
      <prop key="title">cm:title</prop>
      <prop key="subject">cm:description</prop>
      <prop key="created">demo:originCreatedDate</prop>
      <prop key="modified">demo:originModifiedDate</prop>
   </props>
</property>
The TikaAutoMetadataExtracter.properties file contains many mappings, so for this extractor you should use the properties file directly instead of mapping the properties within the bean definition.
An example for Images, PDF, Office :
<bean id="extracter.TikaAuto" class="org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter" parent="baseMetadataExtracter">
   <constructor-arg>
      <ref bean="tikaConfig" />
   </constructor-arg>
   <property name="overwritePolicy">
      <value>EAGER</value>
   </property>
   <property name="mappingProperties">
      <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
         <property name="location">
            <value>classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties</value>
         </property>
      </bean>
   </property>
</bean>

<bean id="extracter.PDFBox" class="org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter" parent="baseMetadataExtracter">
   <property name="documentSelector" ref="pdfBoxEmbededDocumentSelector" />
   <property name="overwritePolicy">
      <value>EAGER</value>
   </property>
   <property name="mappingProperties">
      <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
         <property name="location">
            <value>classpath:alfresco/metadata/PdfBoxMetadataExtracter.properties</value>
         </property>
      </bean>
   </property>
</bean>

<bean id="extracter.Poi" class="org.alfresco.repo.content.metadata.PoiMetadataExtracter" parent="baseMetadataExtracter">
   <property name="poiFootnotesLimit" value="${content.transformer.Poi.poiFootnotesLimit}" />
   <property name="poiExtractPropertiesOnly" value="${content.transformer.Poi.poiExtractPropertiesOnly}" />
   <property name="poiAllowableXslfRelationshipTypes">
      <list>
         <!-- These values are valid for Office 2007, 2010 and 2013 -->
         <value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/presProps</value>
         <value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/viewProps</value>
      </list>
   </property>
   <property name="overwritePolicy">
      <value>EAGER</value>
   </property>
   <property name="mappingProperties">
      <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
         <property name="location">
            <value>classpath:alfresco/metadata/PoiMetadataExtracter.properties</value>
         </property>
      </bean>
   </property>
</bean>
If you choose to use a properties file, add the custom namespace and property mappings to the properties file mapped to each selected extractor.
For example:
In alfresco/metadata/TikaAutoMetadataExtracter.properties add:
namespace.prefix.demo=http://www.github.com/abhinavmishra14/model/demo/1.0
# Custom Properties to be mapped
created=demo:originCreatedDate
modified=demo:originModifiedDate
In the same way, you can add the above mappings to the properties files of the other selected extractors.
For reference: https://docs.alfresco.com/6.0/references/dev-extension-points-custom-metadata-extractor.html
05-22-2020 12:18 AM
Hi, @abhinavmishra14
I did the above configuration and it is working for PDF and Office Document.
Can you guide what configuration I have to do for image files?