How to create a new metadata extractor - MAGE-ML tag
‎09-18-2006 09:55 AM
I am going to use Alfresco for a bio-informatics project. We want to use the capabilities of Alfresco to store, manage and retrieve microarray data. In order to reach this goal we want to store MAGE-ML files - XML files - in which there are many tags we want to extract at run-time during the upload step. For each file we have to extract a set of tags and their corresponding values. The extracted metadata will be available for queries in the advanced search function.
The question is: what is the best way to approach this?
I am thinking of creating some new Java classes, modeled on HtmlMetadataExtracter and the others in the repo\content\metadata directory of the project: for example, a class named MAGEMLExtracter.java that implements the extraction of some specific metadata.
Is this the correct way? What steps do I have to take in order to create this kind of extension for Alfresco?
Any suggestions will be appreciated.
‎09-19-2006 03:26 AM
Sorry, but I cannot understand your kind reply. Perhaps you meant to reply to another post?
Please let me know if it is my mistake.
Best regards,
Sergio

‎09-19-2006 03:45 AM
Yes, creating a metadata extractor sounds like the correct approach.
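As a rough illustration, here is a minimal sketch of the parsing half of such an extractor, using only the JDK's SAX parser. The class name MageMlTagSketch, the "Element.attribute" key format, and the sample element names are assumptions made for this example, not part of Alfresco or a statement about the MAGE-ML schema. Wiring the result into Alfresco's extracter classes and content model is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: streams through an XML file and collects the
// attributes of the elements we care about.
public class MageMlTagSketch {

    /**
     * Returns "Element.attribute" -> value for the first occurrence of each
     * attribute on the wanted elements. Key format is an assumption.
     */
    public static Map<String, String> extract(InputStream xml, final Set<String> wanted)
            throws Exception {
        final Map<String, String> found = new LinkedHashMap<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(xml, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) {
                if (!wanted.contains(qName)) {
                    return;
                }
                for (int i = 0; i < atts.getLength(); i++) {
                    // Keep only the first value seen for each key.
                    found.putIfAbsent(qName + "." + atts.getQName(i), atts.getValue(i));
                }
            }

            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Substitute an empty document for any external DTD so that
                // parsing does not depend on the DTD file being reachable.
                return new InputSource(new StringReader(""));
            }
        });
        return found;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative input only; real files are far larger.
        String sample = "<MAGE-ML identifier=\"demo\">"
                + "<Experiment identifier=\"E1\" name=\"test\"/>"
                + "</MAGE-ML>";
        Set<String> wanted = new HashSet<>(Arrays.asList("Experiment"));
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(extract(in, wanted));
    }
}
```

SAX is used here rather than DOM because microarray files can be large and a streaming parser avoids loading the whole document into memory.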
Perhaps you would like to contact me directly and we can discuss how I can help get this project off the ground.
Many thanks,
Roy
‎09-19-2006 04:50 AM
I have sent an email to your personal address.
Many thanks,
Sergio

‎11-02-2006 09:48 AM
We have to handle multiple schemas, so the extractor code will need to be more involved. In addition we will need to handle XMP/IPTC information from image and other binary files (see http://www.iptc.org/IPTC4XMP/).
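One possible way to cope with several schemas is to peek at the root element and pick an extractor from it. The sketch below does this with the JDK's StAX reader. The class name SchemaSniffSketch, the extractor labels, and the root-to-label mapping are hypothetical. It only covers standalone XML input; locating an XMP packet embedded in a binary file is a separate problem and is not attempted here.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical dispatcher: chooses an extractor label from the root element.
public class SchemaSniffSketch {

    // Assumed mapping from root element local name to an extractor label.
    private static final Map<String, String> EXTRACTOR_BY_ROOT = new HashMap<>();
    static {
        EXTRACTOR_BY_ROOT.put("MAGE-ML", "mage-ml-extractor");
        EXTRACTOR_BY_ROOT.put("xmpmeta", "xmp-extractor");
    }

    /** Returns the local name of the first element, or null if none is found. */
    public static String rootElement(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs while sniffing.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getLocalName();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    /** Returns the label of the matching extractor, or "none". */
    public static String chooseExtractor(String xml) throws Exception {
        String root = rootElement(xml);
        return root == null ? "none" : EXTRACTOR_BY_ROOT.getOrDefault(root, "none");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(chooseExtractor("<MAGE-ML identifier=\"demo\"/>"));
    }
}
```

Only the first start tag is read, so the check stays cheap even for large files.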
Thanks,
Ben.
‎11-02-2006 10:03 AM
At the moment I am developing an Alfresco extension to extract MAGE-ML metadata from .xml files coming from microarray experiments, and I think the problem is similar to yours. I am going to request the creation of a new Forge project, on which I will be supervised by an Alfresco developer.
I am completing documentation of the problem on the wiki, so if you are interested I can provide you with some specs on how to create the extractor.
However, I am not sure the approach that was suggested to me is the only one or, above all, the best one for all possible extractors. In my case the extractor is very specialized and will work only for specific types of XML files.
From your short explanation your need seems very similar to mine, so we can talk about it a little if you want.
Please let me know.
All the best,
Sergio

‎11-02-2006 11:00 AM
‎11-02-2006 11:22 AM
I will take a look.
Cheers.
Sergio
‎05-24-2007 06:41 PM
It does so by extending the XMP Metadata Extensions project (http://forge.alfresco.com/projects/xmp/) which should be able to read XMP from a variety of file types.
Thanks again to Ben for the bug input on IPTC.
‎05-25-2007 03:04 AM
I have been working on a new extractor for clinical-genomics metadata (MAGE-ML standard) that will soon be available as an Alfresco extension, too.
Cheers,
Sergio
