Advice on empty documents

peterwilson
Champ in-the-making
Hi

I am considering Alfresco for our company to use as a store for about 40 million documents. Most of these are really index entries (at most 10 pieces of metadata of about 10 characters each) for physical documents we have stored. Fewer than 1 million are scanned PDFs, TIFFs and some Word documents. The documents are split fairly evenly over about 600 departments, and access is restricted to each department's own document set.

Would it be best to use a separate database to store the indexes, which is a more complex solution for us? Or should we put all the documents into Alfresco and, where a document is only a link to a physical document, store an empty document carrying just the metadata?

Any advice or experiences would be greatly appreciated.

Regards

Peter
2 REPLIES

derek
Star Contributor
Hi,

I am sure that we can put together a few achievable ideas, but some more details would be good.
    1. Are the indexes documents in their own right or are they more like links?
    2. What type of metadata is found in the indexes?
    3. Could you give a summary of the type of document access that typically occurs, i.e. a summary of end-user interaction and how this is met by the server, or how the documents are accessed by client code, etc.?
    4. Are you married to the notion of index documents?
    5. Would you consider an import of the physical documents into Alfresco, or are you bound to keeping them in a separate location?
The level of effort will be determined by the extent to which you are able to use Alfresco as your main storage mechanism and application.  Keeping the raw files outside of Alfresco will not be difficult, if that is what you require.  Storing the metadata externally will be much more effort.  Hopefully, the type of functionality you require is supported by Alfresco (perhaps with a few extensions) and the problem boils down to importing your existing data and educating end-users.

In case it is relevant, we support linked documents, full user authentication via several mechanisms, user groups, automatic full-text and metadata indexing, automatic document metadata extraction, automatic document conversion, FTP and CIFS (Windows Explorer) interaction, etc.  With 1.4, you get complex workflows, JavaScript-enabled actions, powerful templates and so on.

I look forward to your response.
Regards

peterwilson
Champ in-the-making
Hi

The problem is a number of warehouses with files in boxes. The boxes are owned by different departments, and a department can of course own more than one box.

Against a box we have metadata, normally something like:

Description
Date for Destruction
Cost Code
ID

No more than 10 pieces of metadata.

Against a file we have pretty much the same:

Person
Date for Destruction
Description (255 text)
ID
Box
Name
Surname
Date of Birth
Location in warehouse

Again, no more than 10 pieces.

Now, some of these files have been scanned, some as TIFF and some as PDF, and the paper may or may not exist anymore.

Currently users have to search at least two systems: one holding the physical indexes for the warehouse and one holding their electronic copies. In reality there are many electronic indexes, as many departments have their own electronic document store (a bunch of CDs).

For the physical warehouse I have a database with three main tables (department, box and file), with the metadata for each held within its table.

So I was wondering whether it would be a good idea, and whether I could achieve it with Alfresco, to convert my three tables into spaces for departments (with attributes), spaces for boxes (with attributes) and a document for each file. If the file existed electronically it would be a TIFF or a PDF; if it did not, it would be an empty document acting as a placeholder for the physical file (again with attributes).
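The mapping described above can be sketched in plain Python to make the shape of the hierarchy concrete. This is only an illustration under stated assumptions: it does not use any Alfresco API, and the `Node` class, the `build_tree` function, and the `pk`, `department_pk` and `box_pk` column names are all hypothetical. A surrogate key is assumed because the business IDs are not unique.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A generic node: a space (folder) or a document. Hypothetical type."""
    name: str
    kind: str  # "space" or "document"
    properties: dict = field(default_factory=dict)
    content: Optional[bytes] = None  # None = no scan, paper only
    children: list = field(default_factory=list)

    @property
    def is_placeholder(self) -> bool:
        # A document with no content stands in for a physical file.
        return self.kind == "document" and self.content is None


def build_tree(departments, boxes, files, scans):
    """Turn three flat tables into department -> box -> file nodes.

    `scans` maps a file row's surrogate key to the bytes of its scan;
    rows without an entry become empty placeholder documents.
    """
    root = Node("warehouse", "space")

    dept_nodes = {}
    for dept in departments:
        node = Node(dept["name"], "space")
        dept_nodes[dept["pk"]] = node
        root.children.append(node)

    box_nodes = {}
    for box in boxes:
        # Everything except the keys becomes metadata on the box space.
        props = {k: v for k, v in box.items()
                 if k not in ("pk", "department_pk")}
        node = Node(f"box-{box['pk']}", "space", props)
        box_nodes[box["pk"]] = node
        dept_nodes[box["department_pk"]].children.append(node)

    for row in files:
        props = {k: v for k, v in row.items() if k not in ("pk", "box_pk")}
        node = Node(f"file-{row['pk']}", "document", props,
                    scans.get(row["pk"]))
        box_nodes[row["box_pk"]].children.append(node)

    return root
```

In this sketch the metadata travels with every node regardless of whether content is present, so a placeholder and a scanned file would be found by the same attribute search.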

Documents are normally searched for by ID, by Surname, or by something in the Description, but users who have electronic archives would also like to be able to search across and within the documents themselves (when they are text-based PDFs). The IDs are not unique, so a list is returned and the user browses it to find what they think is the best match.
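As a rough illustration of the metadata lookup described above, the following Python sketch returns every matching record rather than a single one, since IDs are not unique. The `search` function and the lower-case field names `id`, `surname` and `description` are hypothetical; full-text search inside PDFs is not covered here.

```python
def search(records, term):
    """Return all records matching `term` on ID, surname or description.

    ID and surname are compared exactly (case-insensitive); description
    is matched as a case-insensitive substring.
    """
    needle = term.strip().lower()
    if not needle:
        return []
    hits = []
    for rec in records:
        if (str(rec.get("id", "")).lower() == needle
                or rec.get("surname", "").lower() == needle
                or needle in rec.get("description", "").lower()):
            hits.append(rec)
    return hits
```

The caller would present the returned list for browsing, which mirrors how users currently pick the best match by eye.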

I am not at all married to the notion of index documents, but it felt right (granted, I have no experience).

There are too many physical documents to import, and the departments have differing budgets, so some could be scanned but not all of them. I am therefore always going to have to consider physical paper documents.

If everything can be put into Alfresco then great. I am guessing we would just have entries that contain zero-length files plus metadata describing what they are and where they are located in the warehouses.

Hopefully this explains my problem in more detail.

Regards

Peter