Hyland Connect

peterwilson · 08-29-2006

Hi

I am considering Alfresco for our company as a store for about 40 million documents. Most of these are really index records (at most 10 pieces of metadata of about 10 characters each) for physical documents we have in storage. Fewer than 1 million are scanned PDFs, TIFFs and some Word documents. The documents are split fairly evenly across about 600 departments, and access is restricted to each department's document set.
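For a sense of scale, the figures quoted above can be turned into a rough estimate. This is only a back-of-envelope sketch: it assumes every record uses all 10 fields at 10 single-byte characters each and ignores all database, index and per-node overhead, so real storage needs will be higher.

```python
# Back-of-envelope estimate of raw index metadata from the figures above.
# Assumption: every record uses all 10 fields of 10 characters, one byte per
# character; database/repository overhead is ignored.
RECORDS = 40_000_000
FIELDS_PER_RECORD = 10
CHARS_PER_FIELD = 10
DEPARTMENTS = 600

raw_bytes = RECORDS * FIELDS_PER_RECORD * CHARS_PER_FIELD
records_per_department = RECORDS // DEPARTMENTS  # assumes the even split described

print(f"raw metadata: {raw_bytes / 1e9:.1f} GB")  # 4.0 GB
print(f"records per department: {records_per_department:,}")  # 66,666
```

The per-department figure relies on the even split across departments described above.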

Would it be best to use a separate database to store the indexes (a more complex solution for us), or should we put all the documents into Alfresco and, where the document is only a reference to a physical document, store an empty document carrying just the metadata?

Any advice or experiences would be greatly appreciated.

Regards

Peter

derek · 09-05-2006

Hi,

I am sure that we can put together a few achievable ideas, but some more details would be good.

The level of effort will be determined by the extent to which you are able to use Alfresco as your main storage mechanism and application. Keeping the raw files outside of Alfresco will not be difficult, if that is what you require. Storing the metadata externally will be much more effort. Hopefully, the functionality you require is supported by Alfresco (perhaps with a few extensions), and the problem boils down to importing your existing data and educating end-users.

In case it is relevant, we support linked documents, full user authentication via several mechanisms, user groups, automatic full-text and metadata indexing, automatic document metadata extraction, automatic document conversion, FTP and CIFS (Windows Explorer) interaction, etc. With version 1.4, you get complex workflows, JavaScript-enabled actions, powerful templates and so on.

I look forward to your response.
Regards

peterwilson · 09-20-2006

Hi

The problem is that we have a number of warehouses holding files in boxes. The boxes are owned by different departments, and a department can of course own more than one box.

Against each box we hold metadata, normally something like:

Description
Date for Destruction
Cost Code
ID

No more than 10 pieces of metadata.

Against a file we hold much the same:

Person
Date for Destruction
Description (text, up to 255 characters)
ID
Box
Name
Surname
Date of Birth
Location in warehouse

Again, no more than 10 pieces.

Some of these files have been scanned, some as TIFF and some as PDF, and the paper original may or may not still exist.

Currently users have to search at least two systems: one holding the indexes of the physical documents in the warehouse and one holding their electronic copies. In reality there are many electronic indexes, as many departments have their own electronic document store (a bunch of CDs).

For the physical warehouse I have a database with three main tables (department, box and file), each holding the metadata for its own entity.

So I was wondering whether it would be a good idea, and whether I could achieve it with Alfresco, to convert my three tables into spaces for departments (with attributes), spaces for boxes (with attributes) and a document for each file. If the electronic file exists, the document would be a TIFF or a PDF; if it does not, it would be an empty document acting as a placeholder for the physical file (again with attributes).
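The mapping described above (department spaces containing box spaces containing file documents, with an empty placeholder where no scan exists) can be sketched as a plain in-memory model. This is not Alfresco's API: `DepartmentSpace`, `BoxSpace`, `FileNode` and `build_tree` are hypothetical names, the rows are made up, and the sketch assumes box IDs are unique and that a (box ID, file ID) pair identifies one file, since file IDs alone are not unique.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

MAX_FIELDS = 10  # "no more than 10 pieces" of metadata per record


@dataclass
class FileNode:
    """A file document: scanned content if it exists, else an empty placeholder."""
    file_id: str
    metadata: dict[str, str]
    content: Optional[bytes] = None  # None or b"" means paper-only placeholder

    @property
    def is_placeholder(self) -> bool:
        return not self.content


@dataclass
class BoxSpace:
    box_id: str
    metadata: dict[str, str]
    files: list[FileNode] = field(default_factory=list)


@dataclass
class DepartmentSpace:
    name: str
    metadata: dict[str, str]
    boxes: dict[str, BoxSpace] = field(default_factory=dict)


def build_tree(departments, boxes, files, scans):
    """Nest rows from the department, box and file tables into spaces.

    scans maps (box id, file id) -> TIFF/PDF bytes for files that were scanned;
    any file without an entry becomes an empty placeholder document.
    """
    tree = {d["name"]: DepartmentSpace(d["name"], dict(d)) for d in departments}
    box_index = {}  # hypothetical assumption: box IDs are unique across departments
    for b in boxes:
        space = BoxSpace(b["id"], dict(b))
        tree[b["department"]].boxes[b["id"]] = space
        box_index[b["id"]] = space
    for f in files:
        if len(f) > MAX_FIELDS:
            raise ValueError(f"file {f['id']} has more than {MAX_FIELDS} fields")
        content = scans.get((f["box"], f["id"]))
        box_index[f["box"]].files.append(FileNode(f["id"], dict(f), content))
    return tree


# Example with made-up rows: one scanned file, one paper-only file.
tree = build_tree(
    departments=[{"name": "Finance", "cost_code": "F100"}],
    boxes=[{"id": "B1", "department": "Finance", "description": "2004 invoices"}],
    files=[
        {"id": "42", "box": "B1", "surname": "Smith"},
        {"id": "43", "box": "B1", "surname": "Jones"},
    ],
    scans={("B1", "42"): b"%PDF-1.4 ..."},
)
for node in tree["Finance"].boxes["B1"].files:
    print(node.file_id, "placeholder" if node.is_placeholder else "scanned")
```

The sketch only shows the shape of the hierarchy and where placeholders arise; loading it into a repository would go through whatever import mechanism is chosen.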

Documents are normally searched for by ID, Surname or something in the Description, but users who have electronic archives would like to be able to search across and within the documents themselves (for text-based PDFs). The IDs are not unique, so a list is returned and the user browses it to find what they think is the best match.
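The search behaviour described above (lookup by ID, surname or description text, non-unique IDs returning a list, and access limited to each department's set) can be sketched as a simple filter. `IndexRecord` and `search` are hypothetical names and the records are made up. This linear scan only illustrates the metadata side; searching within the text of PDFs would depend on a repository's full-text indexing rather than on code like this.

```python
from dataclasses import dataclass


@dataclass
class IndexRecord:
    department: str
    file_id: str       # not unique, so a lookup can return several records
    surname: str
    description: str


def search(records, allowed_departments, *, file_id=None, surname=None, text=None):
    """Return all records matching every given criterion, restricted to the
    departments the user is allowed to see. The matching rules (exact ID,
    case-insensitive surname, case-insensitive substring in description)
    are assumptions, not taken from any product."""
    results = []
    for r in records:
        if r.department not in allowed_departments:
            continue
        if file_id is not None and r.file_id != file_id:
            continue
        if surname is not None and r.surname.lower() != surname.lower():
            continue
        if text is not None and text.lower() not in r.description.lower():
            continue
        results.append(r)
    return results


records = [
    IndexRecord("Finance", "42", "Smith", "Payroll 2004"),
    IndexRecord("Finance", "42", "Jones", "Payroll 2005"),
    IndexRecord("Legal", "42", "Smith", "Contract file"),
]
# A Finance user searching by ID gets both Finance matches, not the Legal one.
for r in search(records, {"Finance"}, file_id="42"):
    print(r.surname, r.description)
```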

I am not at all married to the notion of index documents, but it felt right (admittedly, with no experience).

There are too many physical documents to import them all, and the departments have differing budgets, so some could be scanned but not all of them. I am therefore always going to have to account for physical paper documents.

If everything can be put into Alfresco, then great. I am guessing we would just have entries containing zero-length files, with metadata describing what they are and where they are located in the warehouses.

Hopefully this explains my problem in more detail.

Regards

Peter


Advice on empty documents