
Importing files from xml that describes file structure

ebogaard
Champ on-the-rise
Recently I received a zip containing a file structure and an XML file that describes those files. The XML is an export from an MS SQL database describing the binaries in the file structure. As this file structure is completely 'random' (comparable to the way Alfresco stores its binaries), the metadata is in the XML file. One problem though: it doesn't describe the folders (parents) to be created or which files (children) belong in those folders. So I'm wondering how I can import this into Alfresco.

The XML looks like this:

    <document>
        <document_naam>Versie II - RenD-uitgaven private sector.doc</document_naam>
        <pk_document_id>3489</pk_document_id>
        <gepubliceerd>1</gepubliceerd>
        <document_actief>1</document_actief>
        <URI>W:\websites\dms\CheckedIn\00\00\00\51431C7B-AC23-4C9C-AB00-D69738B8077A.doc</URI>
        <file_created/>
        <file_modified/>
        <created>2008-01-07T15:53:56+01:00</created>
        <modified>2008-01-07T15:53:56+01:00</modified>
        <eigenaar_persoon_id>1559218</eigenaar_persoon_id>
        <eigenaar_persoon_formal_naam>De heer ABC</eigenaar_persoon_formal_naam>
        <doc_ident_id>3489_5304_58946</doc_ident_id>
        <tabelnaam>tbl_werkgroep</tabelnaam>
        <PK_tabel_ID>4617</PK_tabel_ID>
        <context_naam>Beleidscommissie Innovatie &amp; Kennis</context_naam>
        <html_label>Vergadering 14 maart 2008</html_label>
        <tree_path>/Vergadering 14 maart 2008</tree_path>
        <identifier>VergaderingenBijWerkgroep</identifier>
        <PK_tree_node_locator_id>5304</PK_tree_node_locator_id>
        <PK_tree_node_id>58946</PK_tree_node_id>
    </document>

This XML is completely 'flat', as all files are specified the same way and on the same 'level'.

The most important elements are:
- document_naam: document name
- URI: location of the binary in the current file structure
- tree_path: location in the target ('to be') folder structure. This can be multiple folders deep.

It would also be nice to import the 'created' and 'modified' dates, but those aren't necessary.
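
To make the three elements above concrete, here is a minimal Python sketch that reads them (plus the two dates) out of the flat XML. The `<documents>` wrapper element and the names `read_documents`, `FIELDS` and `SAMPLE` are assumptions for illustration only; the real root element of the export is not shown in this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element: the export's real root name is not shown above.
SAMPLE = r"""<documents>
  <document>
    <document_naam>Versie II - RenD-uitgaven private sector.doc</document_naam>
    <URI>W:\websites\dms\CheckedIn\00\00\00\51431C7B-AC23-4C9C-AB00-D69738B8077A.doc</URI>
    <created>2008-01-07T15:53:56+01:00</created>
    <modified>2008-01-07T15:53:56+01:00</modified>
    <tree_path>/Vergadering 14 maart 2008</tree_path>
  </document>
</documents>"""

FIELDS = ("document_naam", "URI", "tree_path", "created", "modified")


def read_documents(xml_text):
    """Return one dict per <document>, keeping only the fields listed in FIELDS."""
    root = ET.fromstring(xml_text)
    records = []
    for doc in root.iter("document"):
        # findtext returns None when a child element is missing
        records.append({name: doc.findtext(name) for name in FIELDS})
    return records


if __name__ == "__main__":
    for record in read_documents(SAMPLE):
        print(record["tree_path"], "->", record["document_naam"])
```

Because every `<document>` sits on the same level, a single loop is enough; the folder hierarchy only appears once `tree_path` is split on `/`.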

Can someone help me with this? I know how to import ACP files, but I'm not sure how to create the folder structure when the folder information is stored in each file's metadata rather than specified as a separate node.
7 REPLIES

mitpatoliya
Star Collaborator
I do not fully understand your requirement. A few questions.

1) Do you expect the XML to attach the metadata to the imported files? I do not think that is possible out of the box.

You can write custom code that reads the XML file, attaches the metadata, and moves the files to the given folder path.

ebogaard
Champ on-the-rise
The thing I want to accomplish is that the files from the location 'W:\websites\dms\CheckedIn\' are put in a new structure, based on the attributes in the xml file. So in this example, the file 'W:\websites\dms\CheckedIn\00\00\00\51431C7B-AC23-4C9C-AB00-D69738B8077A.doc' should end up in the folder + filename: '/Vergadering 14 maart 2008/Versie II - RenD-uitgaven private sector.doc'.

As the end target is Alfresco, I see two possibilities to get to this end:
1. Create the folder structure under Linux or Windows with a tool that can use the attributes in the XML to build that structure, then import this folder structure with all the files into Alfresco.
2. Use the XML to import the files directly into Alfresco, creating the right folder structure and filenames 'on the fly'.

In both cases it would be nice to be able to set the created and modified dates, but it's not a must.
And in both cases: I'm not sure how to do it.
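
For option 1, a rough Python sketch of the staging step could look like the following. It assumes the old `W:\websites\dms\CheckedIn` store has been unpacked to a local directory; `stage_document`, `source_root` and `staging_root` are hypothetical names, and the sketch does not handle document names that contain characters the target file system or Alfresco rejects.

```python
import shutil
from pathlib import Path, PureWindowsPath

# Root of the old store as it appears in the <URI> values.
OLD_ROOT = PureWindowsPath(r"W:\websites\dms\CheckedIn")


def stage_document(uri, tree_path, name, source_root, staging_root):
    """Copy one binary from the old store to <staging_root>/<tree_path>/<name>."""
    # Path of the binary relative to the old store, e.g. 00\00\00\<guid>.doc
    relative = PureWindowsPath(uri).relative_to(OLD_ROOT)
    source = Path(source_root).joinpath(*relative.parts)
    # tree_path may be several folders deep; drop the empty segment
    # produced by the leading slash
    folders = [part for part in tree_path.split("/") if part]
    target_dir = Path(staging_root).joinpath(*folders)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    # copy2 also carries over the file system timestamps of the source file
    shutil.copy2(source, target)
    return target
```

Called once per `<document>`, this builds a tree such as `staging/Vergadering 14 maart 2008/Versie II - RenD-uitgaven private sector.doc`, which can then be zipped or handed to an importer.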

mitpatoliya
Star Collaborator
So in your XML file, one document tag holds the metadata of one particular document, right?
Then you need to do some coding: create a web script that reads your XML file and does the necessary work in Alfresco.
For the coding you will need the Alfresco SDK:

http://wiki.alfresco.com/wiki/Alfresco_SDK

For creating the web script, you can refer to this:
http://wiki.alfresco.com/wiki/Java-backed_Web_Scripts_Samples

ebogaard
Champ on-the-rise
Thanks for your suggestion, but this goes a little bit over my head.
I was hoping for a far easier tool or procedure.

mitpatoliya
Star Collaborator
Alfresco provides a bulk import feature: you put all your content in a zip file, laid out in the folder structure it should have in Alfresco.
If you then import that file, it uploads all the files into Alfresco and creates the same structure.

In your case you also want to import metadata, which will require either manual effort or some coding.

Or else you can try one of the add-ons available for Alfresco:
http://addons.alfresco.com/addons/importexport-acpzip-share

ebogaard
Champ on-the-rise
The bulk import feature is great, but there's one problem (at least, that's what I think): the XML doesn't describe the folders (parents) or which files are in each folder (children). That information is in a child element (tree_path) of each document. That's why I was hoping for another tool to either transform the XML or create a file structure that the bulk importer understands.

Or do you think the bulk importer would understand this XML file anyway? Maybe I'm just going to try it and see what happens.
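
As far as I know, the bulk import tooling does not read an arbitrary export like this one. It expects a Java-properties-style XML file per document, placed next to the binary and named `<filename>.metadata.properties.xml`. Whether that convention applies, and whether `cm:created` and `cm:modified` are honoured, depends on the importer and Alfresco version, so treat the Python sketch below as an assumption to verify. `metadata_xml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

DOCTYPE = '<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">'


def metadata_xml(title, created=None, modified=None):
    """Build the text of a properties-style XML metadata file for one document."""
    properties = ET.Element("properties")
    entries = {"type": "cm:content", "cm:title": title}
    # Assumption: the importer accepts ISO 8601 values for these audit
    # properties, which is the format the export already uses.
    if created:
        entries["cm:created"] = created
    if modified:
        entries["cm:modified"] = modified
    for key, value in entries.items():
        entry = ET.SubElement(properties, "entry", key=key)
        entry.text = value
    body = ET.tostring(properties, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + DOCTYPE + "\n" + body
```

The returned text would be written next to each staged file, so the transformation from the flat export to the importer's layout stays a small scripting job rather than a web script.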

mdutoo
Champ on-the-rise
"I was hoping for another tool to either transform the xml"

=> that's an ETL such as Talend, which is incidentally integrated with Alfresco through the Alfresco ETL Connector:

http://knowledge.openwide.fr/Main/AlfrescoETLConnector

With an ETL like Talend, you can visually configure the transformation of your XML (and mash it up with other sources, or even with existing metadata from Alfresco using the CMIS Connector) into an Alfresco file & folder tree structure. And you can configure the ETL to do it in several passes, e.g. 1. create the tree structure from the XML and 2. fill it with the imported documents.
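
Independent of the tool, the first pass boils down to deriving the set of folders from the `tree_path` values. A minimal Python sketch of that step follows; it is not Talend configuration, and `folders_to_create` is a hypothetical name.

```python
from pathlib import PurePosixPath


def folders_to_create(tree_paths):
    """Return every folder implied by the tree_path values, parents before children."""
    needed = set()
    for tree_path in tree_paths:
        path = PurePosixPath(tree_path)
        # Add the folder itself and each ancestor, stopping at the root
        while path != path.parent:
            needed.add(path)
            path = path.parent
    # Sorting by depth guarantees a parent is created before its children
    return sorted(needed, key=lambda p: (len(p.parts), str(p)))
```

The second pass then walks the documents again and places each one in its already existing folder.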

Regards