Amount of files per space recommendation

Champ in-the-making
Hello all,

I have created a bulk uploader to load thousands of XML files (usually around 2k each). Performance takes a huge hit when a single space holds a large number of files, and I have been experimenting with different counts.

I was just wondering if anyone out there had any recommendations for the number of files to keep in a given space. Is there a threshold at which performance really falls off?

Star Contributor
How is performance affected?
Does your loading slow down, or is it the folder browsing that slows down?

As for browsing, I don't know of any exact threshold, but in practice, finding documents by browsing isn't really useful if you have more than 100 in a space. At that point it is easier to find them using search.

As far as I know, loading should not be affected by the number of documents in a space, but if this is your problem you need to tell us exactly how you load them.

Champ in-the-making
The performance hit that I'm concerned with is the time it takes to load the files into the repository. My rough calculations put it at about 7 minutes for 500 files (importing from a zip and extracting metadata) and about 19 minutes for 1000.
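
To make the slowdown concrete, here is a rough sketch that turns the two timings quoted above into per-file figures. Only the two (files, minutes) pairs come from this thread; the function names are just illustrative.

```python
# Timings quoted in this thread: (number of files, minutes taken).
runs = [(500, 7.0), (1000, 19.0)]


def seconds_per_file(files, minutes):
    """Average load time per file, in seconds."""
    return minutes * 60.0 / files


def marginal_seconds_per_file(first, second):
    """Average time per file for the extra files in the larger run."""
    (files_a, minutes_a), (files_b, minutes_b) = first, second
    return (minutes_b - minutes_a) * 60.0 / (files_b - files_a)


for files, minutes in runs:
    print(f"{files} files: {seconds_per_file(files, minutes):.2f} s/file")

extra = marginal_seconds_per_file(runs[0], runs[1])
print(f"files 501-1000: {extra:.2f} s/file")
```

With these numbers the average is 0.84 s per file for the first 500 and 1.14 s per file over all 1000, which means the second 500 files cost about 1.44 s each. So the load time grows faster than linearly with the number of files.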

Are there any tweaks or anything else I can do to help the performance?

Star Contributor
Compared to normal file operations, you also have DB inserts/updates and indexing kicking in, so it will always take a little longer.
Here is what you can do, if you haven't already: put the database on a separate server, make sure you have lots of available memory, and keep your files (alf_data) on a fast disk, which is especially important for the Lucene index.

Champ in-the-making
Thank you for the feedback. My current configuration is dual 2.8 GHz Xeons with 4 GB of RAM running the Tomcat application servers. On the back end I have dual 3.2 GHz Xeons with 4 GB of RAM running the MySQL server and hosting the data on SCSI320 arrays. The Tomcat servers access the alf_data mount over NFS.

Do you think this setup is an ideal one for what I am trying to do? Any other recommendations?

Also, I have a production Tomcat and a development Tomcat running. I have tuned the JVM on the production server but left the defaults on the development one. Do you think this will help performance substantially? I have not yet done any testing on the production side.

Once again I thank you for your feedback.


Star Contributor

I would compare storing alf_data on the local file system with storing it on the NFS volume. I do not think that NFS performance is good enough for this purpose. The best way to find out is to actually test.
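
One rough way to run that comparison outside of Alfresco is to time how long it takes to write a batch of small files to each location. This is only a sketch: the two directory paths are hypothetical placeholders, the 2048-byte file size assumes "2k" means about 2 KB, and raw file writes will not capture the database and indexing overhead of a real import.

```python
import os
import tempfile
import time


def time_small_file_writes(directory, count=200, size=2048):
    """Write `count` files of `size` bytes under `directory`; return elapsed seconds."""
    payload = b"x" * size
    # The scratch directory is removed automatically when the block exits.
    with tempfile.TemporaryDirectory(dir=directory) as workdir:
        start = time.perf_counter()
        for i in range(count):
            path = os.path.join(workdir, f"sample_{i}.xml")
            with open(path, "wb") as handle:
                handle.write(payload)
                handle.flush()
                os.fsync(handle.fileno())  # force the data out to storage
        return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical locations: adjust to your local disk and your NFS mount.
    candidates = {"local": "/var/tmp", "nfs": "/mnt/alf_data"}
    for label, directory in candidates.items():
        if not os.path.isdir(directory):
            print(f"{label}: {directory} not found, skipping")
            continue
        elapsed = time_small_file_writes(directory)
        print(f"{label}: {elapsed:.2f} s for 200 files")
```

If the NFS figure is many times the local one, that points at the mount rather than at the number of files in the space.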

Have you found this page about tuning?
I also try to set -Xms and -Xmx to the same value, as per this page.

Star Contributor
Peter Monks has a blog post on bulk loading that you might find worth reading.