
Import Metadata to DB directly or ACP

jlabuelo
Champ on-the-rise
Good Morning

We would like to know whether it is possible to upload a file and insert its metadata directly into the DB. Let me explain.

We have a set of documents that we have scanned with OCR software to extract information, and we would like to store both in Alfresco (scanned image + metadata).

We are studying the possible ways to import this data and the images into Alfresco. We have read that this could be done with an ACP file that includes all the scanned tif documents and an XML file containing the metadata. Our understanding is that once Alfresco gets the ACP file, its contents are unpacked into the space and the metadata from the XML file is automatically applied to each of the tif documents. Are we wrong?

We would also like to know if there is any other way, for example passing the tif documents to Alfresco using FTP or CIFS and then adding the metadata directly through an ODBC connection.

Would this second option be possible? If so, could you please point us to how to do it, or to where we can find information about it?

Thanks a lot.
4 REPLIES

mrogers
Star Contributor
The first option is possible, but the second is a big no! You shouldn't access the database directly, and certainly not to write data.

What you should probably do is use one of Alfresco's remote interfaces to upload your content and set whatever properties are required. Web Scripts are the preferred way to do that.
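
To make the Web Script route a bit more concrete, here is a rough client-side sketch in Python. It only builds a multipart POST carrying the scanned file plus its metadata as form fields. The endpoint URL, the `filedata` field name and the property names are assumptions for illustration, not a documented Alfresco API: they stand for a custom web script you would write and deploy yourself.

```python
import uuid
import urllib.request


def build_upload_request(url, filename, file_bytes, properties):
    """Build (but do not send) a multipart/form-data POST request.

    `url` points at a hypothetical custom web script; `properties`
    is a dict of metadata names and values sent as plain form fields.
    """
    boundary = uuid.uuid4().hex
    parts = []

    # One plain form field per metadata property.
    for name, value in properties.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )

    # The scanned image, under the assumed field name "filedata".
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="filedata"; '
            f'filename="{filename}"\r\n'
            f"Content-Type: image/tiff\r\n\r\n"
        ).encode("utf-8")
        + file_bytes
        + b"\r\n"
    )

    # Closing boundary marks the end of the form.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    body = b"".join(parts)

    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header(
        "Content-Type", f"multipart/form-data; boundary={boundary}"
    )
    request.add_header("Content-Length", str(len(body)))
    return request
```

Sending it would then be a matter of passing the request to `urllib.request.urlopen` with whatever authentication your server requires. The web script on the server side receives the fields and sets the node properties through the repository API, so the database is never touched directly.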

Or you can invent your own datatype that includes content and metadata, and upload that to Alfresco with a rule to unpack it.

Or you can use a format like ACP.  However your next question will probably be what format is ACP …

There are lots more options that others may chime in with.

jlabuelo
Champ on-the-rise
Hi there, and thanks for the answer.

Yes, our first choice was the ACP format. However, please correct me if my understanding is wrong:

a) We create the ACP file with the *.tif image files inside, plus the XML with the metadata for each tif file. We set up a rule on the space that fires every time a new ACP file is added and runs the import to unpack it, assigning the metadata values stored in the XML file to each of the tif files. Are we right, or is there anything else we should keep in mind?

b) How can we create an ACP file on Windows or Linux, packing the tif files and the metadata XML into it, so that it has the right format and is recognized by Alfresco once we upload it?

Thanks once more for the quick answer.

Cheers

invictus9
Champ in-the-making
I found this project on the Alfresco Forge that does something similar to what you want:

http://forge.alfresco.com/projects/acpgenerator/
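
For anyone who wants to try packing one by hand: as far as I understand it, an ACP file is a ZIP archive holding an XML descriptor plus the content files it refers to. The sketch below, in Python, builds such an archive from a list of tif files and titles. The descriptor layout (the `view:` and `cm:` elements, the `contentUrl=...|mimetype=...` value) is written from memory of exported ACP files and should be treated as an assumption: export a small space from your own repository and compare before relying on it. `build_acp`, the `scans` package name and the use of `cm:title` as the only metadata field are made up for the example.

```python
import os
import zipfile
from xml.sax.saxutils import escape, quoteattr

# Namespaces as they appear in ACP files exported from Alfresco (assumed).
VIEW_NS = "http://www.alfresco.org/view/repository/1.0"
CM_NS = "http://www.alfresco.org/model/content/1.0"


def build_acp(acp_path, documents, package_name="scans"):
    """Pack tif files and a metadata descriptor into one ZIP archive.

    `documents` is a list of (tif_path, title) tuples.
    """
    node_xml = []
    with zipfile.ZipFile(acp_path, "w", zipfile.ZIP_DEFLATED) as acp:
        for tif_path, title in documents:
            name = os.path.basename(tif_path)
            arcname = f"{package_name}/{name}"
            # Store the image inside the archive under the package folder.
            acp.write(tif_path, arcname)
            size = os.path.getsize(tif_path)
            child_name = quoteattr("cm:" + name)
            # One node per document, pointing back at the stored image.
            node_xml.append(
                f"  <cm:content view:childName={child_name}>\n"
                f"    <view:aspects><cm:titled/></view:aspects>\n"
                f"    <view:properties>\n"
                f"      <cm:name>{escape(name)}</cm:name>\n"
                f"      <cm:title>{escape(title)}</cm:title>\n"
                f"      <cm:content>contentUrl={escape(arcname)}"
                f"|mimetype=image/tiff|size={size}|encoding=UTF-8"
                f"</cm:content>\n"
                f"    </view:properties>\n"
                f"  </cm:content>\n"
            )
        descriptor = (
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            f'<view:view xmlns:view="{VIEW_NS}" xmlns:cm="{CM_NS}">\n'
            + "".join(node_xml)
            + "</view:view>\n"
        )
        # The descriptor sits at the root of the archive.
        acp.writestr(f"{package_name}.xml", descriptor)
```

Calling `build_acp("scans.acp", [("scan001.tif", "Invoice 2009-001")])` would produce a `scans.acp` containing `scans.xml` and `scans/scan001.tif`. Custom OCR fields would need their own content model and namespace declared in the descriptor.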

mdutoo
Champ on-the-rise
Hi jlabuelo,

(Disclaimer: I'm the contributor.)

There is another, more flexible option (HOWEVER, there are bugs with Alfresco 3.0, so it would need a migration and recompile against 3.1+):

use the Talend Open Source ETL along with the Alfresco ETL Connector.

Looking at your "second option", it would for instance allow you to first upload files using CIFS or FTP, then update their metadata by addressing them by their "name path". Actually, you could even let your files sit on a (mounted remote) filesystem that your Alfresco server can see, and let Talend + ETL Connector manage the upload.

It also lets you set associations, rights, any types and aspects, and create folder trees; it works in batches and outputs individual import errors… And it's an ETL, so it works with any kind of input data. By the way, it uses the ACP format under the hood but goes beyond its limits (such as upload size and failing on the first error).

More information:
http://knowledge.openwide.fr/Main/AlfrescoETLConnector
http://forge.alfresco.com/projects/etlconnector/
http://www.talend.com

Regards,
Marc