05-17-2020 01:30 AM
Hi,
I want to preserve a document's original created and modified dates during upload. How can I achieve that?
If this is possible, will the dates also be preserved during an FTP upload?
05-18-2020 11:44 AM
OK. If you can find the extracted metadata in the log output of AbstractMappingMetadataExtracter/PdfBoxMetadataExtracter, check whether Found: {..............} contains the 'created'/'modified' metadata while Mapped and Accepted: {............} does not. If so, here is what could be happening.
The cm:created and cm:modified values are set when the node is created and are read-only after that, because both properties are declared as protected:
<property name="cm:created">
    <title>Created</title>
    <type>d:datetime</type>
    <protected>true</protected>
    <mandatory enforced="true">true</mandatory>
    <index enabled="true">
        <atomic>true</atomic>
        <stored>false</stored>
        <tokenised>both</tokenised>
        <facetable>true</facetable>
    </index>
</property>
<property name="cm:modified">
    <title>Modified</title>
    <type>d:datetime</type>
    <protected>true</protected>
    <mandatory enforced="true">true</mandatory>
    <index enabled="true">
        <atomic>true</atomic>
        <stored>false</stored>
        <tokenised>both</tokenised>
        <facetable>true</facetable>
    </index>
</property>
An alternative is to create custom properties in your own content model and store the extracted created/modified metadata values there. The other option is to override the default behavior of the auditable aspect properties, which I believe would not be a good idea.
For example:
Create the following properties in your custom content model:
<aspect name="demo:customAuditMetadata">
    <title>Custom Audit Metadata</title>
    <description>Custom Audit Metadata</description>
    <properties>
        <property name="demo:originCreatedDate">
            <title>Original Created Date</title>
            <description>Created date of files based on incoming metadata extracted from metadata extractor</description>
            <type>d:text</type>
        </property>
        <property name="demo:originModifiedDate">
            <title>Original Modified Date</title>
            <description>Modified date of files based on incoming metadata extracted from metadata extractor</description>
            <type>d:text</type>
        </property>
    </properties>
</aspect>
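For context, the aspect has to live inside a content model that declares the demo prefix. The sketch below shows roughly where it would sit. The model name demo:contentModel and the description text are assumptions made for illustration; the namespace URI is the one used in the extracter mapping in this thread. Treat it as a starting point, not as the demo project's actual model file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical model wrapper; only the aspect and the namespace URI come from this thread. -->
<model name="demo:contentModel" xmlns="http://www.alfresco.org/model/dictionary/1.0">
    <description>Demo content model (name assumed for illustration)</description>
    <version>1.0</version>

    <imports>
        <!-- Dictionary types such as d:text and d:datetime -->
        <import uri="http://www.alfresco.org/model/dictionary/1.0" prefix="d" />
        <!-- Standard content model (cm:content, cm:auditable, ...) -->
        <import uri="http://www.alfresco.org/model/content/1.0" prefix="cm" />
    </imports>

    <namespaces>
        <!-- Must match the namespace.prefix.demo value in the extracter mapping -->
        <namespace uri="http://www.github.com/model/demo/1.0" prefix="demo" />
    </namespaces>

    <aspects>
        <aspect name="demo:customAuditMetadata">
            <title>Custom Audit Metadata</title>
            <properties>
                <property name="demo:originCreatedDate">
                    <title>Original Created Date</title>
                    <type>d:text</type>
                </property>
                <property name="demo:originModifiedDate">
                    <title>Original Modified Date</title>
                    <type>d:text</type>
                </property>
            </properties>
        </aspect>
    </aspects>
</model>
```

The model file still has to be registered with the repository (for example through a dictionary bootstrap bean in your module context) before the properties can be mapped.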
Add the following bean definition, with the above properties listed in mappingProperties:
<bean id="extracter.PDFBox" class="org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter" parent="baseMetadataExtracter">
    <property name="documentSelector" ref="pdfBoxEmbededDocumentSelector" />
    <property name="inheritDefaultMapping">
        <value>false</value>
    </property>
    <property name="overwritePolicy">
        <!-- Allow extraction happens all the time (e.g. when content is updated or new version is uploaded).-->
        <value>EAGER</value>
    </property>
    <property name="mappingProperties">
        <props>
            <prop key="namespace.prefix.demo">http://www.github.com/model/demo/1.0</prop>
            <prop key="created">demo:originCreatedDate</prop>
            <prop key="modified">demo:originModifiedDate</prop>
        </props>
    </property>
</bean>
- Update the share config to display the newly added properties on document-details page as needed.
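As a rough illustration of that last step, a Share form configuration along these lines could expose the two properties on nodes that carry the aspect. This is a minimal sketch, assuming it goes into your Share module's share-config-custom.xml; the labels are made up, and the demo project may configure this differently.

```xml
<alfresco-config>
    <!-- Applies to any node that has the demo:customAuditMetadata aspect -->
    <config evaluator="aspect" condition="demo:customAuditMetadata">
        <forms>
            <form>
                <field-visibility>
                    <show id="demo:originCreatedDate" />
                    <show id="demo:originModifiedDate" />
                </field-visibility>
                <appearance>
                    <!-- Labels are illustrative assumptions -->
                    <field id="demo:originCreatedDate" label="Original Created Date" />
                    <field id="demo:originModifiedDate" label="Original Modified Date" />
                </appearance>
            </form>
        </forms>
    </config>
</alfresco-config>
```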
05-25-2020 02:54 PM
@sanjaybandhniya Find the demo project here:
https://github.com/abhinavmishra14/alfresco-metadataextraction-demo
I noticed a difference between the Community and Enterprise editions. The examples I gave above work perfectly on Enterprise 5.2.x (I used 5.2.6) and 6.1.x (I used 6.1), but on Community editions the properties files are not picked up reliably (the behavior is intermittent).
The only change I made for Community edition is shown below, and with it the file is always picked up correctly.
<property name="mappingProperties">
    <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
        <property name="location">
            <value>classpath:alfresco/module/${project.artifactId}/metadataextraction/TikaAutoMetadataExtracter.properties</value>
        </property>
    </bean>
</property>
On Enterprise, both the path above and the path below work:
<value>classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties</value>
This one also works on both editions:
<value>classpath:alfresco/extension/metadata/TikaAutoMetadataExtracter.properties</value>
I am not sure how the two editions (Community and Enterprise) differ in terms of extension points. I looked at the source code but found no clues. The good news is that the other path I shared above (available in the demo project) works on both Community and Enterprise.
Hope this helps trim down your issue.
07-30-2024 10:13 AM
Hello.
Were you able to achieve this using the t-engine way of doing things?
07-31-2024 12:02 PM
This documentation seems to describe the steps, but it is too confusing:
https://docs.alfresco.com/content-services/latest/develop/repo-ext-points/metadata-extractors/
For one, it asks you to implement the org.alfresco.repo.content.metadata.MetadataExtractorPropertyMappingOverride interface and register the bean under metadataExtractorPropertyMappingOverrides:
<bean id="extractor.Asynchronous" class="org.alfresco.repo.content.metadata.AsynchronousExtractor" parent="baseMetadataExtracter">
    <property name="nodeService" ref="nodeService" />
    <property name="namespacePrefixResolver" ref="namespaceService" />
    <property name="transformerDebug" ref="transformerDebug" />
    <property name="renditionService2" ref="renditionService2" />
    <property name="renditionDefinitionRegistry2" ref="renditionDefinitionRegistry2" />
    <property name="contentService" ref="ContentService" />
    <property name="transactionService" ref="transactionService" />
    <property name="transformServiceRegistry" ref="transformServiceRegistry" />
    <property name="taggingService" ref="taggingService" />
    <property name="metadataExtractorPropertyMappingOverrides">
        <list>
            <ref bean="extracter.RFC822" /> <!-- The RM AMP overrides this bean, extending the base class -->
            <ref bean="extracter.custom" />
        </list>
    </property>
</bean>
However, I could only get it to work by overriding this properties file in a custom image build: https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/tika/src/main/resources/Tika...
Ideally there should be a way to configure it here: https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/tika/src/main/resources/tika... with something like:
{
  "extractMapping": {
    "author": ["{http://www.xyz.org/model/customcontent/1.0}author"]
  },
  "timeout": 20000,
  "sourceEncoding": "UTF-8"
}
but it's not clear from the documentation how this can be done.
07-31-2024 12:59 PM
Indeed, it isn't very clear.
I've tried creating a custom image, but it didn't work in my case.
Can you please share what and how you did it?
Thank you for taking the time to read this.
07-31-2024 03:09 PM
Yes, I just built the image using this: https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/tika/Dockerfile
and added the custom property here: https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/tika/src/main/resources/Tika...
Then I used the newly built image.
To test, I changed:
author=demo:author
08-01-2024 10:55 AM
I have created an alternative deployment for Docker that allows applying this configuration.
Sample project available at https://github.com/aborroy/alfresco-custom-metadata-extractor