Recommendations for developing heavy content store.

javed_afroz
Champ in-the-making
I have to develop a solution based on Alfresco Share. The plan is to store 5 TB of data with custom metadata in Alfresco. There will be approximately 5 million documents in different formats. I have developed Alfresco-based solutions before, but I have never handled this much data. I have the following concerns:

1. What are the best practices in Alfresco for developing a solution around this much data?
2. What rules should I keep in mind while designing the content model and repository architecture to optimize search response time?
3. What is the recommended hardware/software configuration?

Early help is deeply appreciated.
5 REPLIES

abarisone
Star Contributor
Hi,
first of all you should consider the overall number of users and the number of concurrent users.
The total size of your archive is important, but its growth rate over time is more important.
Also try to estimate the approximate document modification rate.
Don't file more than 1000 objects per folder.
Another important aspect is full-text search: try to understand how important it is to you and whether you can avoid it for some content. Remember that the indexing process is expensive in terms of performance and disk space.
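
One way to picture the folder-size guideline above is a small sketch that maps a running document number to a nested folder path. This is only an illustration, not Alfresco API code: the function name `folder_path`, the sequential `doc_index`, and the zero-padded folder names are all assumptions made for the example.

```python
def folder_path(doc_index: int, max_children: int = 1000, levels: int = 2) -> str:
    """Map a sequential document number to a nested folder path.

    Hypothetical helper: each leaf folder receives at most `max_children`
    documents and each intermediate folder at most `max_children` subfolders.
    """
    if doc_index < 0:
        raise ValueError("doc_index must be non-negative")
    if doc_index >= max_children ** (levels + 1):
        raise ValueError("doc_index exceeds capacity; add another level")

    width = len(str(max_children - 1))   # zero-pad so names sort naturally
    bucket = doc_index // max_children   # number of the leaf folder
    parts = []
    for _ in range(levels):
        parts.append(f"{bucket % max_children:0{width}d}")
        bucket //= max_children
    # Parts were collected leaf-first, so reverse them to get root-first order.
    return "/".join(reversed(parts))
```

With the defaults, 5 million documents numbered from 0 would end up in 5,000 leaf folders spread over 5 top-level folders. A real layout would more likely follow something meaningful to users (date, customer, document type), with bucketing like this only as a fallback.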

For the content model, try to specialize types as much as you can and make wide use of aspects, but avoid making them mandatory.

For development, try to use the Maven archetypes and build an AMP module so that your code stays separate from the original alfresco.war.
Web Services and WebScripts are well suited to the application, since Alfresco is built on top of the Spring Framework, which gives you a lot of manageability and extensibility.

Hope this helps.
Regards,
Andrea

javed_afroz
Champ in-the-making
Thanks, Andrea, for your suggestion. If I follow your advice of not allowing more than 1000 nodes under one node, I will end up with deeply nested nodes. How does this affect search performance from the root node? What is the optimum depth for the best search performance in Alfresco?

Thanks,
Javed

abarisone
Star Contributor
Hi,
as far as I know there is no 'optimum depth'; it depends on how your repository is structured.
Remember that it is always a trade-off between the number of children per node and the depth of the node tree.
If performance is not satisfactory, you could think about scaling up your environment.
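
To put rough numbers on that trade-off, here is a minimal sketch that computes how many folder levels are needed below the root for a given cap on children per node. The function name `min_folder_levels` and the assumption of a perfectly balanced tree are mine; a real repository will be less even.

```python
def min_folder_levels(total_docs: int, max_children: int) -> int:
    """Smallest number of folder levels below the root such that no node
    holds more than `max_children` children (assumes a balanced tree)."""
    if total_docs < 1:
        raise ValueError("total_docs must be at least 1")
    if max_children < 2:
        raise ValueError("max_children must be at least 2")

    levels = 0
    capacity = max_children  # documents filed directly under the root
    while capacity < total_docs:
        capacity *= max_children  # each extra level multiplies the capacity
        levels += 1
    return levels


print(min_folder_levels(5_000_000, 1000))  # 2
print(min_folder_levels(5_000_000, 100))   # 3
```

So for the 5 million documents mentioned in the question, a cap of 1000 children per node needs only two folder levels, which is not very deep.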

Regards,
Andrea

mrogers
Star Contributor
This obsolete "rule of thumb" of 1000 docs was always dubious. Yes, certain operations used to be a problem, for example using Explorer to list thousands of documents when you just wanted to read the first one.

You should structure your content store in a way that makes sense to your users or that lets your app access the data.

However if you expect your users to page through 1000 docs to read document 1001 they won't be happy!

@mrogers

We had serious performance issues when we did not follow that "obsolete rule of thumb". Load several thousand documents into one space, and each new document inserted into that space will block all of Alfresco (read/insert/update) for a minute or two for all users.