
How to run a metadata extraction during bulk upload

boneill
Star Contributor

Hi Guys,

We are upgrading to Alfresco 7.0.1. The new architecture uses the Alfresco Transformation Services AIO T-Engines. By default, these are configured to extract metadata whenever an item is added to Alfresco.

What I need to do is run a metadata extraction that maps additional values from the source document to the document's metadata in Alfresco, and I only want to do this during a bulk upload. Our previous solution was to use a rule on a folder to run a custom metadata extraction on document creation, which allowed us to turn the rule on or off during migrations. The new Alfresco Transform Service framework now handles metadata extraction. Therefore:

1) If I want to extract extra fields, is it possible to call a metadata extraction transform on request from a rule, using either Java or JavaScript?

2) Should I create a new T-Engine specifically to extract the existing fields, or update the configuration of the existing T-Engine (which, as I understand it, always gets called)?

3) Is there a way to do this other than using a metadata extractor?

Regards

Brian


afaust
Legendary Innovator

1) You can call the generic "extract-metadata" action to trigger an extraction from a rule. Alternatively, you can build a custom action that calls RenditionService2 (not part of the supported Java API) to perform a rendition to the magic target mimetype "alfresco-metadata-extract". If there are multiple possible T-Engine extractions from the same source mimetype, you can't really select which one is used, except by passing transform options that apply only to the engine you intend to run.

2) The default T-Engines should use the overwrite policy PRAGMATIC, which sets metadata fields only if they don't exist, are null or the empty string, or are EXIF/audio model-related. I have not yet found a way to configure the engines to use PRUDENT instead, which removes the special handling for EXIF/audio, or even CAUTIOUS (which sets a property only if it is not set yet, and won't set it if it has been explicitly set to null). If you find a way to specify the overwrite policy via configuration, that would be my way to go; otherwise, a custom engine.

3) There is always a different way, though none that makes use of default Alfresco features...
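To make the point in answer 1) about selecting an engine through transform options more concrete, here is a simplified Java model of how a router could choose among T-Engines that all offer the "alfresco-metadata-extract" target for the same source mimetype. It is a sketch under stated assumptions, not Alfresco's transform registry: the EngineSpec record, the ExtractRouter class, the engine names and the "migrationMapping" option are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical description of a T-Engine's capabilities (simplified model,
// not Alfresco's real transform registry).
record EngineSpec(String name, Set<String> sourceMimetypes, String targetMimetype,
                  Set<String> supportedOptions) {

    // An engine qualifies only if it handles the source/target pair
    // and declares every option name that the request passes.
    boolean supports(String source, String target, Map<String, String> options) {
        return sourceMimetypes.contains(source)
                && targetMimetype.equals(target)
                && supportedOptions.containsAll(options.keySet());
    }
}

final class ExtractRouter {
    // Target mimetype used for metadata extraction renditions.
    static final String METADATA_EXTRACT = "alfresco-metadata-extract";

    private ExtractRouter() {}

    // Returns the names of all engines that could serve the request, in list order.
    static List<String> candidates(List<EngineSpec> engines, String source,
                                   Map<String, String> options) {
        return engines.stream()
                .filter(e -> e.supports(source, METADATA_EXTRACT, options))
                .map(EngineSpec::name)
                .collect(Collectors.toList());
    }
}
```

In this model, a PDF request with no options matches both an "aio" engine and a "custom-extractor" engine, so you cannot control which one runs; adding an option that only the custom engine declares (here the hypothetical "migrationMapping") narrows the candidates to that engine alone.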
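The three overwrite policies described in answer 2) can be summarised as a small decision function. The following is a minimal Java sketch, assuming properties are held in a map keyed by hypothetical "prefix:name" strings and that EXIF/audio properties are recognised by their "exif:" and "audio:" prefixes. It models the behaviour as described in this thread and is not Alfresco's own OverwritePolicy code.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the overwrite policies discussed in this thread.
// Not Alfresco's OverwritePolicy implementation; property keys are
// hypothetical "prefix:name" strings.
enum Policy {
    PRAGMATIC, PRUDENT, CAUTIOUS;

    // Decides whether an extracted value may be written for this key.
    boolean shouldSet(Map<String, Object> existing, String key) {
        switch (this) {
            case CAUTIOUS:
                // Only if the key was never set; an explicit null still blocks it.
                return !existing.containsKey(key);
            case PRUDENT:
                return isMissingOrEmpty(existing, key);
            default: // PRAGMATIC
                // As PRUDENT, but EXIF/audio properties are always overwritten.
                return isMissingOrEmpty(existing, key)
                        || key.startsWith("exif:") || key.startsWith("audio:");
        }
    }

    // Returns a copy of 'existing' with the permitted extracted values applied.
    Map<String, Object> apply(Map<String, Object> existing, Map<String, Object> extracted) {
        Map<String, Object> result = new HashMap<>(existing);
        extracted.forEach((key, value) -> {
            if (value != null && shouldSet(existing, key)) {
                result.put(key, value);
            }
        });
        return result;
    }

    // True if the key is absent, mapped to null, or mapped to "".
    private static boolean isMissingOrEmpty(Map<String, Object> existing, String key) {
        Object current = existing.get(key);
        return current == null || "".equals(current);
    }
}
```

Given an existing empty title, an explicitly null description, a populated author and an EXIF property, PRAGMATIC fills the title and description and refreshes the EXIF value, PRUDENT fills the title and description but leaves the EXIF value alone, and CAUTIOUS only adds properties whose keys are absent altogether.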

boneill
Star Contributor

Hi Axel,

Time got away from me and I forgot to thank you for your response. Many thanks for answering this question and clarifying it for me.

Regards

Brian

merinas2
Champ in-the-making

Alfresco provides content rules that can be applied at the folder level. These rules can specify actions to be taken when documents are added to or removed from a folder. You can create a custom content rule that triggers metadata extraction using a custom T-Engine when documents are added to a specific folder, which gives you fine-grained control over when the extraction occurs.

If you want the Alfresco Transformation Services AIO T-Engines to handle metadata extraction, you can create a custom metadata extractor. This would involve developing a custom transformation or T-Engine that performs the specific metadata mapping you need. You can then configure this T-Engine to run as part of the transformation process when documents are uploaded.
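The folder-rule approach, which is also what let the original solution be switched on and off around migrations, can be illustrated with a toy Java model. Folder, Rule and Trigger are hypothetical names used only for illustration; Alfresco's real rule and action services are configured differently, and in a real rule the action would be something like the "extract-metadata" action or a custom action.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy model of folder-level content rules (hypothetical names; not
// Alfresco's RuleService API).
final class Folder {
    enum Trigger { INBOUND, OUTBOUND }

    static final class Rule {
        final Trigger trigger;
        final Consumer<String> action; // e.g. a metadata extraction step
        boolean enabled = true;        // toggled on/off around a migration

        Rule(Trigger trigger, Consumer<String> action) {
            this.trigger = trigger;
            this.action = action;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    private final List<String> documents = new ArrayList<>();

    void addRule(Rule rule) { rules.add(rule); }

    // Adding a document fires every enabled INBOUND rule.
    void add(String doc) {
        documents.add(doc);
        fire(Trigger.INBOUND, doc);
    }

    // Removing a document fires every enabled OUTBOUND rule.
    void remove(String doc) {
        if (documents.remove(doc)) {
            fire(Trigger.OUTBOUND, doc);
        }
    }

    private void fire(Trigger trigger, String doc) {
        for (Rule r : rules) {
            if (r.enabled && r.trigger == trigger) {
                r.action.accept(doc);
            }
        }
    }
}
```

In this model, a document added while the inbound rule is disabled is stored but never passed to the extraction action, which mirrors turning the rule off outside a migration window.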