How to create a new metadata extractor - MAGE-ML tag

sergio
Champ in-the-making
Hi all.

I am going to use Alfresco for a bioinformatics project. We want to use Alfresco's capabilities to store, manage and retrieve microarray data. To do this, we want to store MAGE-ML files (XML files) containing many tags that we want to extract at run time, during the upload step. For each file we have to extract a set of tags and their corresponding values. The extracted metadata will then be available for queries in the advanced search function.

My question is: what is the best way to approach this?

I am thinking of creating some new Java classes along the lines of HtmlMetadataExtracter and the other extractors in the repo\content\metadata directory of the project. For example, a class named MAGEMLExtracter.java would implement the extraction of the specific metadata we need.
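A minimal sketch of the tag-extraction core such a class might contain, written as plain Java against the JDK's built-in DOM parser only. It assumes the tags of interest are listed up front, using a hypothetical `Element@attribute` syntax for values stored in attributes; the class name `MageMlTagExtractor` and the element names used with it are illustrative, not taken from the MAGE-ML specification. The input is an `InputStream`, which in an Alfresco extractor would come from the content being uploaded. The Alfresco-side wiring (extractor base class, Spring registration, mapping the values to content-model properties so they appear in advanced search) is not shown and depends on the Alfresco version.

```java
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: pulls a fixed set of tag values out of an XML document. */
class MageMlTagExtractor {
    // Keys are "Element" (read its text content) or "Element@attribute" (read that attribute).
    private final List<String> keys;

    MageMlTagExtractor(List<String> keys) {
        this.keys = keys;
    }

    Map<String, String> extract(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Do not fetch an external DTD named in a DOCTYPE; it may not be reachable at upload time.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = factory.newDocumentBuilder().parse(in);

        Map<String, String> result = new LinkedHashMap<>();
        for (String key : keys) {
            String[] parts = key.split("@", 2);
            // "*" matches the element's local name in any namespace (or none).
            NodeList nodes = doc.getElementsByTagNameNS("*", parts[0]);
            if (nodes.getLength() == 0) {
                continue; // tag not present in this file
            }
            Element first = (Element) nodes.item(0);
            String value = parts.length == 2
                    ? first.getAttribute(parts[1])   // "" if the attribute is absent
                    : first.getTextContent().trim();
            if (!value.isEmpty()) {
                result.put(key, value);
            }
        }
        return result;
    }
}
```

Only the first matching element is read for each key, which suits single-valued properties; tags that repeat within one file would need a multi-valued property instead.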

Is this the correct way? What steps do I need to take to create this kind of extension for Alfresco?

Any suggestions will be appreciated.
18 replies

sergio
Champ in-the-making
Hi Paul!

Sorry, but I don't understand your kind reply. Did you perhaps mean to reply to another post?

Please let me know if I am mistaken.

Best regards,

Sergio

rwetherall
Confirmed Champ
Hi Sergio,

Yes, creating a metadata extractor sounds like the correct approach.

Perhaps you would like to contact me directly so we can discuss how I can help get this project off the ground.

Many thanks,
Roy

sergio
Champ in-the-making
Ok.

I have sent an email to your personal address.

Many thanks,

Sergio

benjamin_doughe
Champ in-the-making
Like Sergio, we too need to automatically extract metadata from bespoke XML files when they are uploaded, for search and so on. I understand that writing a MetadataExtracter is the approach to take, but I was wondering if you could send me, or post here, the same information about the steps involved.

We have to handle multiple schemas, so the extractor code will need to be more involved. In addition, we will need to handle XMP/IPTC information from images and other binary files (see http://www.iptc.org/IPTC4XMP/).
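One way to keep multiple schemas manageable, sketched under the assumption that each schema can be recognised by the local name of its root element (a namespace URI would work just as well): keep a registry from root element to the fields to extract, and dispatch on it. The class name `SchemaDispatchingExtractor` and the registry contents are hypothetical, and only element text is read. XMP/IPTC extraction from binary files is a separate problem and is not covered here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical extractor that picks its field list based on the document's root element. */
class SchemaDispatchingExtractor {
    // Registry: root element local name -> element names whose text should be extracted.
    private final Map<String, List<String>> fieldsByRoot;

    SchemaDispatchingExtractor(Map<String, List<String>> fieldsByRoot) {
        this.fieldsByRoot = fieldsByRoot;
    }

    Map<String, String> extract(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // so getLocalName() ignores any namespace prefix
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();

        Map<String, String> result = new LinkedHashMap<>();
        List<String> fields = fieldsByRoot.get(root.getLocalName());
        if (fields == null) {
            return result; // unknown schema: extract nothing rather than guess
        }
        for (String field : fields) {
            NodeList nodes = root.getElementsByTagNameNS("*", field);
            if (nodes.getLength() > 0) {
                result.put(field, nodes.item(0).getTextContent().trim());
            }
        }
        return result;
    }
}
```

Returning an empty map for unknown roots means files from unsupported schemas are stored without extracted metadata instead of with wrong values; adding a schema is then a registry entry rather than a new code path.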

Thanks,
Ben.

sergio
Champ in-the-making
Hi Benjamin!

At the moment I am developing an Alfresco extension to extract MAGE-ML metadata from .xml files produced by microarray experiments, and I think the problem is similar to yours. I am going to request the creation of a new Forge project, on which I will be supervised by an Alfresco developer.

I am finishing some documentation on the problem for the wiki, so if you are interested I can provide you with some specs on how to create the extractor.

However, I am not sure that the approach suggested to me is the only one or, above all, the best one for every possible extractor. In my case the extractor is very specialized and will only work for specific types of XML files.

From your short explanation your need seems very similar to mine, so we can talk about it a bit if you like.

Please let me know.

All the best,

Sergio

benjamin_doughe
Champ in-the-making
It seems an IPTC extractor is already being developed at http://forge.alfresco.com/projects/iptc-exif/

sergio
Champ in-the-making
Ok.

I will take a look.

Cheers.

Sergio

rgauss
Champ in-the-making
Not sure if this is still of interest to anyone here, but the IPTC/EXIF project mentioned above (http://forge.alfresco.com/projects/iptc-exif/) now supports reading IPTC/EXIF in XMP.

It does so by extending the XMP Metadata Extensions project (http://forge.alfresco.com/projects/xmp/) which should be able to read XMP from a variety of file types.

Thanks again to Ben for the bug input on IPTC.

sergio
Champ in-the-making
Many thanks.

I have been working on a new extractor for clinical-genomics metadata (the MAGE-ML standard), which will also be available soon as an Alfresco extension.

Cheers,

Sergio