Performance problems as the number of repository nodes increases.

sidi
Champ in-the-making
Via web services we are inserting scanned documents into a specific space. At the beginning of the process the rate was 100 documents per minute, for nodes of 34 KB with 13 searchable properties. Now, after 150,000 documents have been inserted into Alfresco, the rate has dropped to 10 documents per minute. We use an Oracle database, and the last thing we tried was a full index rebuild, but the rate remains the same. We expected to load 700,000 documents in 45 days, inserting five hours per day, but our plans collapsed because of the new rate. Is there some adjustment needed in Lucene, the working space, or any other Alfresco configuration? It does not seem to be an Oracle problem.
Can anybody suggest something?
Thanks in advance.
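To put the figures in the post in perspective, here is a back-of-the-envelope check. It is only a sketch and assumes a constant ingestion rate, which is exactly what the thread shows does not hold. Loading 700,000 documents in 45 days at five hours per day needs roughly 52 documents per minute; at the initial 100 per minute that is about 23 working days, while at 10 per minute it becomes about 233 working days. The function names are illustrative only.

```python
MINUTES_PER_HOUR = 60


def required_rate(total_docs: int, days: int, hours_per_day: float) -> float:
    """Documents per minute needed to load total_docs within the time window."""
    return total_docs / (days * hours_per_day * MINUTES_PER_HOUR)


def days_needed(total_docs: int, docs_per_minute: float, hours_per_day: float) -> float:
    """Working days needed to load total_docs at a constant ingestion rate."""
    return total_docs / (docs_per_minute * hours_per_day * MINUTES_PER_HOUR)


if __name__ == "__main__":
    # Figures from the post: 700,000 docs, 45 days, 5 hours per day.
    print(f"needed: {required_rate(700_000, 45, 5):.1f} docs/min")
    for rate in (100, 10):  # initial and current observed rates
        print(f"at {rate} docs/min: {days_needed(700_000, rate, 5):.0f} working days")
```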
5 replies

theorbix
Confirmed Champ
How many documents do you have, on average, in each Space?

Or are you archiving all documents in the same Space?

sidi
Champ in-the-making
They are mostly in one space. When index.recovery.mode=VALIDATE is set, what exactly does that mean? If I stop and restart Alfresco, will the index be rebuilt?
Thanks for your reply.

theorbix
Confirmed Champ
Mmmm… I might be wrong, but I think I see where your problem is.

Most document management systems have problems when the quantity of documents contained in a virtual "folder" (or space, to use Alfresco's terminology) increases.

I would also expect performance problems in doing queries on very large spaces.

In the past I worked on commercial products that showed noticeable performance degradation (mostly in searching and browsing content in folders) when the number of documents in a folder went over 500-1000 items.

Maybe Alfresco's architecture is different and it can easily handle spaces with hundreds of thousands of documents, or even millions of documents.

But in designing your application, I would suggest trying to implement an automatic "space splitting" algorithm that keeps the number of items in each space below a reasonable limit, and then checking whether this improves the performance of your application.
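One way the "space splitting" idea could look is as a pure function that maps a running document counter to a two-level path of sub-spaces, so that no space ever receives more than a fixed number of children. This is a minimal sketch under stated assumptions: `space_path`, the zero-padded folder names and the default limit of 1,000 (taken from the 500-1000 items figure above) are all hypothetical, and actually creating the spaces through the Alfresco web services API is not shown. With these defaults, 700,000 documents would land in 700 child spaces under a single parent, and 10 million documents in 10,000 child spaces spread over 10 parents.

```python
from typing import Tuple


def space_path(seq: int, max_items: int = 1000) -> Tuple[str, str]:
    """Return (parent_space, child_space) names for the seq-th document (0-based).

    Every child space receives at most max_items documents and every parent
    space holds at most max_items child spaces. Only the number of parent
    spaces under the root keeps growing (one per max_items**2 documents).
    """
    if seq < 0:
        raise ValueError("seq must be non-negative")
    if max_items < 1:
        raise ValueError("max_items must be positive")
    child = seq // max_items     # global index of the child space
    parent = child // max_items  # parent space that contains this child
    # Zero-padded names sort correctly in listings (up to 10,000 per level).
    return f"{parent:04d}", f"{child % max_items:04d}"


if __name__ == "__main__":
    for seq in (0, 999, 1_000, 149_999, 1_000_000):
        print(seq, "/".join(space_path(seq)))
```

A counter-based split suits a single sequential loader; if several clients ingest in parallel, a split by date or by a hash of a document identifier would avoid having to share the counter.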

From a broader point of view, a comment from Alfresco's engineers here would be greatly appreciated:
1) what is the "reasonable" number of items that, according to the product architecture and the tests you've done so far (I'm thinking about the Unisys benchmark, for instance), can be stored in a Space without incurring performance degradation?
2) what "repository design" practices are recommended for applications that need to store more than one million documents?

sidi
Champ in-the-making
I ran an index recovery: the index went from 23 folders down to three, it took five hours, and the rate went up to 54 docs per minute. That is still less than expected. A few questions for anybody who knows:

1.- How can I check that the automatic process that merges Lucene index segments is working?
2.- Why, after inserting only 760 docs right after the index recovery, were 15 folders created in lucene-index/SpaceStore?
3.- Maybe this is obvious, but not to me: if searching for docs uses the Lucene indexes, what is the use of the Oracle indexes?
      Are we duplicating information for nothing?
4.- Is there any tuning in Alfresco for Lucene purposes, or whatever else is needed, to get performance back?
5.- Is Alfresco a good repository for millions of small documents with lots of custom properties?
6.- Has anybody had performance problems when inserting approx. 15,000 docs daily, and how did they solve it?
7.- Curiously, searching is really fast while inserting is very slow.
8.- Where can I find documentation on how Alfresco uses Lucene, if the problem is Lucene?

Thanks in advance.
I'm a little scared imagining the repository with 10 million documents, running an index recovery every week and spending another week doing it.

colindstephenso
Champ in-the-making
I am doing something similar. Currently the ingestion process I am using (via web services) creates folders in a space. We have noticed a couple of performance issues and were looking for recommendations.

1. On ingestion, I search for the existence of a folder. If I search using an XPATH query, the web service call appears to take between 15 and 20 seconds to return. If I use a query.getChildren() approach, then I need to page through the result sets, as these are limited to 1000 records per set. This approach starts out fast, then slows over time as the number of folders increases. Any ideas why the XPATH query takes so long to return?

2. What is the suggested limit of folders in a space before there is noticeable performance degradation? In the DM, it seems that once there are more than 1000 folders in a space, it takes some time to complete the query and render the page.
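One mitigation for the first point that does not depend on how fast the XPATH query or getChildren() paging is: look each folder up at most once per ingestion run and keep its reference in memory. This is a minimal sketch, assuming the ingestion client is the only process creating these folders. `FolderCache`, `lookup` and `create` are hypothetical names; the two callables stand in for whatever web service calls the client already makes and are not Alfresco APIs.

```python
from typing import Callable, Dict, Optional


class FolderCache:
    """Remember folder references so each folder hits the server at most twice.

    lookup(name) -> reference or None  (hypothetical wrapper around a search call)
    create(name) -> reference          (hypothetical wrapper around a create call)
    """

    def __init__(self, lookup: Callable[[str], Optional[str]],
                 create: Callable[[str], str]) -> None:
        self._lookup = lookup
        self._create = create
        self._refs: Dict[str, str] = {}
        self.remote_calls = 0  # number of calls that went to the server

    def get_or_create(self, name: str) -> str:
        ref = self._refs.get(name)
        if ref is None:
            # First time this folder is needed: search once on the server.
            self.remote_calls += 1
            ref = self._lookup(name)
            if ref is None:
                # Not found: create it, also exactly once.
                self.remote_calls += 1
                ref = self._create(name)
            self._refs[name] = ref
        return ref
```

For example, with an in-memory dict standing in for the repository, `FolderCache(repo.get, lambda n: repo.setdefault(n, "ref-" + n))` issues two remote calls for a new folder and none for any later request of the same name. The cache is never invalidated, so a folder deleted or moved by another user would leave a stale reference; the insert that fails on it would then need to evict the entry and retry.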

I am using Alfresco 3.2 in Tomcat 6.0.18 with MySQL 5 on the backend.

Thanks,
Colin.