
Amount of files per space recommendation

tonyc
Champ in-the-making
Hello all,

I have created a bulk uploader to load thousands of XML files (usually around 2 KB each). Performance takes a huge hit when there is a large number of files in a single space, and I have been experimenting with different amounts.

I was just wondering if anyone out there had any recommendations for the number of files to have in a given space. Is there a threshold at which performance really falls off?
6 REPLIES

loftux
Star Contributor
How is performance affected?
Does your loading slow down, or is it the folder browsing that slows down?

As for browsing, I don't know of any exact threshold, but in practice, finding documents by browsing isn't really useful once you have more than about 100 in a space. Beyond that, it is easier to find them using search.

Loading should not be affected by the number of documents in a space as far as I know, but if loading is your problem, you need to tell us exactly how you load them.

tonyc
Champ in-the-making
The performance hit that I'm concerned with is the amount of time it takes to load the files into the repository. My rough calculations have it at about 7 minutes for 500 files (importing from zip and extracting metadata) and about 19 minutes for 1000.

Are there tweaks or anything else I can do to help the performance?

loftux
Star Contributor
Compared to normal file operations, each upload also triggers database inserts/updates and indexing, so it will always take somewhat longer.
What you can do, if you haven't already: put the database on a separate server, make sure you have plenty of available memory, and keep your content store (alf_data) on a fast disk; this is especially important for the Lucene index.

tonyc
Champ in-the-making
Thank you for the feedback. The current configuration I have is dual 2.8 GHz Xeons with 4 GB of RAM running the Tomcat application servers. On the back end I have dual 3.2 GHz Xeons with 4 GB of RAM running the MySQL server, which also hosts the data on SCSI-320 arrays. I'm using NFS on the Tomcat servers to access the alf_data mount.

Do you think this setup is an ideal one for what I am trying to do? Any other recommendations?

Also, I have a production Tomcat and a development Tomcat running. I have tuned the JVM on the production server but left the defaults on the development one. Do you think this will help performance substantially? I have not yet done testing on the production side.

Once again I thank you for your feedback.

Tony

loftux
Star Contributor
Hi,

I would try comparing alf_data stored on the local file system against alf_data stored on the NFS volume. I do not think NFS performance is good enough for this purpose. The best way to find out is to actually test.
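One quick way to run that comparison is a raw write-throughput test with dd, once against the local disk and once against the NFS mount. This is just a sketch; the paths are examples, so point TARGET at your actual directories.

```shell
# Hypothetical paths -- set TARGET to the directory you want to measure,
# e.g. local /opt/alfresco/alf_data vs an NFS mount like /mnt/nfs/alf_data.
TARGET=/tmp

# Write 64 MiB in 1 MiB blocks; conv=fdatasync forces the data to disk
# before dd exits, so the reported MB/s reflects real write throughput
# rather than the page cache.
dd if=/dev/zero of="$TARGET/ddtest.bin" bs=1M count=64 conv=fdatasync

# Clean up the test file afterwards.
rm "$TARGET/ddtest.bin"
```

Run it a few times per location and compare the MB/s figures dd prints; a large gap between local and NFS numbers would confirm the storage is the bottleneck.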

Have you found the JVM tuning page, http://wiki.alfresco.com/wiki/JVM_Tuning?
I also try to set Xms and Xmx equal, as suggested here: http://docs.sun.com/app/docs/doc/819-3681/abeii?a=view
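As a concrete illustration, a JAVA_OPTS line along these lines would apply that advice; the 2 GB figure is an assumption for a 4 GB host, not a recommendation from the pages above, so size it to your own machine and measurements.

```shell
# Hypothetical sizing for a 4 GB host -- adjust to what your testing shows.
# Setting -Xms equal to -Xmx avoids heap-resizing pauses during a long bulk load.
export JAVA_OPTS="-server -Xms2g -Xmx2g"
```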

loftux
Star Contributor
Peter Monks has a blog post on bulk loading that you might find worth reading:
http://blogs.alfresco.com/wp/pmonks/2009/10/22/bulk-import-from-a-filesystem/