Hyland Connect

cornekloppers · ‎02-27-2007

Hi all

I'm currently investing a lot of time looking into using JCR to
get a huge amount of content under control (legislative and
parliamentary data).

Background information:
I'm busy implementing XSDs to define the document content structures for
each document type found within a legislative context: documents like
Gazettes, Case Law, Journals, Bills, etc.
Currently the legislative content is in Folio format, so one of my first planned tasks is to convert the Folio content into XML. The XSDs will then be used to validate the resulting legislative XML documents.

My questions:
1. How can XSDs be used within Alfresco, and how does XSD relate to the Data Dictionary?

2. Can node structures within the repository be set up automatically by reading/importing existing XSDs?

3. Is there any import functionality for importing XML files (content) into an Alfresco repository so that each tag in the XML file is represented as a node?
To my understanding, this would give the most flexibility for cross-referencing the data?
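
To make question 3 concrete, here is a minimal sketch of what "each tag becomes a node" could look like. It uses only Python's standard library and plain dictionaries, not the Alfresco or JCR API; the sample document, the `element_to_node` helper, and the dictionary keys are all hypothetical and only illustrate the mapping.

```python
import xml.etree.ElementTree as ET

def element_to_node(elem, parent_path=""):
    """Map one XML element to a plain dict standing in for a repository node."""
    node_path = f"{parent_path}/{elem.tag}"
    return {
        "name": elem.tag,                      # tag name becomes the node name
        "path": node_path,                     # position in the node tree
        "properties": dict(elem.attrib),       # attributes become node properties
        "text": (elem.text or "").strip(),     # element text becomes node content
        "children": [element_to_node(child, node_path) for child in elem],
    }

# Hypothetical legislative document, for illustration only.
SAMPLE = """<bill number="12" year="2007">
  <title>Example Amendment Bill</title>
  <section id="1"><heading>Definitions</heading></section>
</bill>"""

root_node = element_to_node(ET.fromstring(SAMPLE))
print(root_node["children"][1]["children"][0]["path"])
```

A real import would have to create repository nodes instead of dictionaries and would also need a rule for mixed content, which this sketch ignores.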

My apologies for the million questions :) But any insight will be much
appreciated.

Kind regards,
Corné Kloppers
Cape Town, South Africa

kbpair · ‎02-27-2007

We are in a similar situation and have been evaluating Alfresco. I think we will end up with two products: Alfresco for coarse-grained queries and a native XML database like MarkLogic for fine-grained queries.

Here is my thinking, and I would love for someone from Alfresco to tell me I am wrong or just comment on this topic.

If you want your XML indexed so that you can run fine-grained queries with XPath or XQuery against elements inside the XML document, you cannot do it with Alfresco. Alfresco will let you do text-based queries, resulting in a Google-like search, and you can run XPath queries against the JCR nodes only (not the content of the nodes).
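
As an illustration of what I mean by a fine-grained query, here is a small sketch that selects elements inside a document by an attribute value. It uses the limited XPath subset in Python's standard `xml.etree.ElementTree`, not Alfresco or MarkLogic; the sample gazette and its element names are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical gazette document, for illustration only.
SAMPLE = """<gazette>
  <notice type="proclamation"><title>Notice A</title></notice>
  <notice type="regulation"><title>Notice B</title></notice>
  <notice type="proclamation"><title>Notice C</title></notice>
</gazette>"""

doc = ET.fromstring(SAMPLE)

# Fine-grained: reach inside the document and pick only the titles of
# notices whose "type" attribute is "proclamation".
titles = [t.text for t in doc.findall("./notice[@type='proclamation']/title")]
print(titles)
```

A coarse-grained repository search would return the whole gazette as one hit; the query above returns only the matching pieces inside it.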

Alfresco does give you great coarse grained access, meaning if you want to find the whole document and you know the main search criteria, you can create your repository accordingly. You can also create different views of the data with templates but users cannot dynamically reformat the data as you can with XQuery. Users access content via defined 'views' and search via defined properties about the content (plus the text based search mentioned above).

Your question takes the logical step of breaking up the XML document into JCR nodes with the help of a schema. I am not aware of an automated way to do this, and I think the number of nodes it would create would be very large.
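
A rough way to estimate that volume is to count the elements in a representative document and multiply by the number of documents. The sketch below does this with Python's standard library; the sample document and the figure of one million documents are assumptions for illustration, not numbers from either of our projects.

```python
import xml.etree.ElementTree as ET

def count_nodes(xml_text):
    """Count elements (root included), assuming one repository node per element."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter())

# Hypothetical, very small document; real legislative documents are far larger.
SAMPLE = (
    "<bill><title>T</title>"
    "<section><heading>H</heading><para>P</para></section></bill>"
)

nodes_per_document = count_nodes(SAMPLE)
document_count = 1_000_000  # assumed corpus size, for illustration only
estimated_nodes = nodes_per_document * document_count
print(nodes_per_document, estimated_nodes)
```

Even this toy document yields five nodes, so a corpus of large documents multiplies up quickly.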

I thought I read somewhere Alfresco is hoping to one day be able to handle 1 billion nodes, but I think that may require multiple repositories and federated search. I do not know enough to really speak on this.

Our situation requires millions of updates each night and I do not think we could do this with Alfresco at that level.

I really wish there was a product that combined the fine-grained access of an XML Database with the coarse-grained access of a JCR implementation but I know of no product that does this today.

I think it may be on Alfresco's roadmap though.

Hope that helps and I would love to hear from Alfresco on the subject.


XML Schema and XML Content