performance problems when too many files in one directory

lagreen
Champ in-the-making
Hello,

I have a real performance problem when users try to open a directory (either through CIFS or through the web interface) with more than 1000 files in it (it takes about 20 seconds). I have a MySQL DB and Alfresco 3.2.
I did some monitoring tests, and Tomcat and the JVM seem to be running OK. So I think the weak point is the MySQL DB. Maybe I have to rebuild the table indexes. I searched the forum but did not find anything.

Does somebody have some ideas what to optimize or what to do?

Thanks.
31 REPLIES 31

fracat71
Champ on-the-rise
(15 minutes and it is still working…… until now).
The delete is still running ……

By mistake I killed the process :(

All the folders are still there after 1 hour (more or less)… now I don't know what to do; probably re-create the DB and ALF_IND.
So do I have to wait for a new version of the Community edition? My version (3.2.r2) is difficult to use in a production environment with this performance.
The Explorer view seems to work up to 1000 folders on MySQL (250 on Oracle), but the delete does not work.

Thanks in advance

derek
Star Contributor
What sort of production machine are you using?

fracat71
Champ on-the-rise
Hi,
I have a similar problem: a space (named root) contains 40000 spaces (created by a Java main program external to the Alfresco application).
Let me explain. I have two environments (named A and B).

Env. A:
- ALFRESCO: Community Current version 3.2.0 (r2 2440) schema 3300
- OS: Windows xp 32 bit , RAM 2,5G , Intel Core Duo 2,1 GHz
- AS: Tomcat 6.0.18
- JAVA: Jdk 6.0_11. Heap 986,125MB
- DB: MySQL 5.1.35 Community

Env.B:
- ALFRESCO: Community Current version 3.2.0 (r2 2440) schema 3300
- OS: Linux Oracle  64 bit (RedHat 5.4), RAM 6G
- AS: JBOSS 4.2.2 GA
- JAVA: Jdk 6.0_18.  Heap 4GB
- DB: Oracle 11g

Problems:

I don't know what is happening; the behaviour is always strange.
Bye

I have updated Env.B to:
- ALFRESCO: Community Current version 3.2.0 (r2 2440) schema 3300 (on Oracle DB)
- OS: Linux Oracle 64 bit (RedHat 5.4), RAM 18G, 2xCPU QuadCore
- AS: Tomcat 6.0.18
- JAVA: JRockit 4 (64 bit), Heap 8GB
- DB: Oracle 10g Rel 2 (10.2.0.3) on different server

by derek » 26 May 2010, 12:23

Hi,
1: I'm not convinced on the behaviour of 1 (it says a distinct entity per row, if I remember) and I don't know if it's rolling the related properties and aspects up correctly. We don't have that in our tested code.
2: Yes. That eliminates the duplicates and prevents blow-outs of the subsequent queries.
3: Yes. That prevents blow-out of the resultset from left joins to nodes AND aspects in the same query. We have a Criteria query for nodes-and-properties and a Criteria query for nodes-and-aspects.
I have news (I have applied 1 and 2 but not derek's latest changes; it is not simple to run tests with a lot of data).
Env A (Development):
- Explorer view of a folder with 1000 subfolders (every subfolder has 5 further sub-subfolders): OK
- Explorer view of a folder with 10000 subfolders (every subfolder has 5 further sub-subfolders): OK
- Explorer delete of a folder with 1000 subfolders (every subfolder has 5 further sub-subfolders): KO (failed)

Env B (Test):
- Explorer view of a folder with 250 subfolders: OK

I also have an Env C (Production), but I have not run tests on this environment.
This is an HA Alfresco installation with the standard configuration; see: http://wiki.alfresco.com/wiki/Cluster_Configuration_V2.1.3_and_Later
[img]http://wiki.alfresco.com/w/images/9/91/Alfresco_LB_Diagram.png[/img]
Web Server A: digit1 and Web Server:digit2
-ALFRESCO:         Community Current version 3.2.0 (r2 2440) schema 3300(on OracleDB)
-OS:               Enterprise Linux 5.4 x86_64, RAM 18G, 2xCPU QuadCore
-Software Cluster: Oracle Grid Infrastructure
-AS:               Tomcat 6 (no session replication)
-JAVA:              JRockit 4 ( 64 Bit)  Heap 8GB
-DB:               Oracle  10g Rel 2 (10.2.0.3) on different server
I have changed (after a lot of tests) alfresco-global.properties and ehcache-custom.xml to get the cluster working.


Regards

fracat71
Champ on-the-rise
Hi derek,
I see this in SVN, so 4 classes/xml must be updated.


Revision: 20226
Author: derekh
Date: 17:59:22, Thursday 13 May 2010
Message:
Merged BRANCHES/V3.3 to HEAD:
   20192: Merged PATCHES/V3.1.2 to BRANCHES/V3.3:
        20182: Fixed ALF-2712: Performance degradation from 3.1.0 to 3.1.2
   20207: Merged PATCHES/V3.1.2 to BRANCHES/V3.3:
        20203: Fix fallout from ALF-2712 … move back to no results rather than AccessDeniedException
   20222: Merged PATCHES/V3.2.1 to BRANCHES/V3.3:
        20212: Fix ALF-2719: 'patch.convertContentUrls' can result in "No ContentData value exists for ID" errors


/alfresco/HEAD
/alfresco/HEAD/root/projects/repository/source/java/org/alfresco/repo/admin/patch/impl/ContentUrlConverterPatch.java
/alfresco/HEAD/root/projects/repository/source/java/org/alfresco/repo/domain/hibernate/Node.hbm.xml
/alfresco/HEAD/root/projects/repository/source/java/org/alfresco/repo/domain/patch/AbstractPatchDAOImpl.java
/alfresco/HEAD/root/projects/repository/source/java/org/alfresco/repo/jscript/ScriptNode.java
/alfresco/HEAD/root/projects/repository/source/java/org/alfresco/repo/node/db/hibernate/HibernateNodeDaoServiceImpl.java

Can you tell me the best way to work on my version (3.2.r2, SVN revision 17458) to solve this problem?
A possible solution is to upgrade to 3.3.

Thanks in advance

derek
Star Contributor
This topic has been dealing with:
   ALF-2839: Node pre-loading generates needless resultset rows (a blocker for 3.3)
Your svn logs are mainly related to:
   ALF-2712: Performance degradation from 3.1.0 to 3.1.2 (a critical for 3.3)
Both have been solved for 3.3 Enterprise but only 2712 made it into 3.3g before cut-off.

You should proceed by applying the fixes for ALF-2839 if you are affected by it (it depends on the product of aspect count and property count).  At the simplest level just remove the call to cacheNodes, but you will need to decide if you can live with ETHREEOH-2657.
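
To make the "product of aspect count and property count" concrete, here is a rough Java sketch of the row arithmetic. It assumes every node has the same number of properties and aspects, and that a left join returns at least one row per node even when nothing matches. The names JoinRowEstimate, combinedQueryRows and splitQueryRows are made up for illustration and are not Alfresco code.

```java
/**
 * Back-of-the-envelope estimate of resultset sizes. Hypothetical helper,
 * not part of Alfresco; it only illustrates why joining properties AND
 * aspects in a single query multiplies the row count.
 */
final class JoinRowEstimate {

    /** Rows when properties and aspects are both left-joined to nodes in one query. */
    static long combinedQueryRows(long nodes, long propsPerNode, long aspectsPerNode) {
        return nodes * Math.max(1L, propsPerNode) * Math.max(1L, aspectsPerNode);
    }

    /** Rows when one query fetches nodes-and-properties and a second fetches nodes-and-aspects. */
    static long splitQueryRows(long nodes, long propsPerNode, long aspectsPerNode) {
        return nodes * (Math.max(1L, propsPerNode) + Math.max(1L, aspectsPerNode));
    }

    public static void main(String[] args) {
        long nodes = 1000, props = 20, aspects = 5; // illustrative numbers only
        System.out.println("combined: " + combinedQueryRows(nodes, props, aspects));
        System.out.println("split:    " + splitQueryRows(nodes, props, aspects));
    }
}
```

With these made-up numbers the combined query returns 100000 rows against 25000 for the two separate queries, so the gap widens as either count grows.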

You can look at the bug discussions and diffs for ALF-2712 to decide if you want to patch your version.  Once again, at the simplest you can remove the call to cacheNodes.

Regards

fracat71
Champ on-the-rise
Hi all,
I have also found a possible solution for a related problem:
http://forums.alfresco.com/en/viewtopic.php?f=14&t=26355&p=88594#p88594

Thanks derek

albertoferrini
Champ in-the-making
In my opinion the problem is how Alfresco 3.2 and below handles the PATH query via its custom Lucene classes PathQuery, PathScorer and LeafScorer.
The algorithms have a performance problem (excessive time and CPU use) that appears when a parent node contains thousands of child nodes.

A possible (really sad, I know…) workaround that I have implemented is to use a Lucene search to find the parent node while avoiding PATH: (search instead on an attribute such as name, type, etc.) and then use the method getChildAssociations to find the child nodes (this method uses the DB).

Regards.

Alberto Ferrini

zs_b
Champ in-the-making
Hi,

We had the same problem in several areas of Alfresco. For example, in a previous version, when we had too many completed tasks, the "My completed tasks" view killed the server with an out-of-memory error.
Because of that I submitted a patch, which is at https://issues.alfresco.com/jira/browse/ALFCOM-3405

This is the same problem as with "too many files". I added functions to the NodeService to support a PagedListDataModel. Normally, when you open a directory in Alfresco, the metadata of all its subdirectories and files is loaded into memory. This does not scale, because more files and folders need more memory and more time to render. With enough memory a large directory is tolerable; however, when there were around 10000 files in one directory and somebody opened the page (and pressed refresh several times, because it took minutes to load), the server was killed even with 2 GB of memory for the JVM.

So the solution is to add, to all of the listing functions in NodeService, a variant like listNodes(…, int firstResult, int maxResults). That would make it possible to load only a small number of records into memory at a time. The managed beans then have to make sure that only one page is loaded when one page is shown.
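
As a minimal sketch of the firstResult/maxResults idea, the Java class below pages over an in-memory list that stands in for the repository. PagedChildLister, listChildren and countChildren are hypothetical names chosen for this example and are not part of Alfresco's NodeService; a real implementation would presumably push both values down into the database query instead of slicing a list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical paged child listing. The in-memory list is only a stand-in
 * for the repository; the point is that callers receive one page at a time.
 */
final class PagedChildLister {

    // Child names of one folder, kept in a stable order so pages do not overlap.
    private final List<String> children;

    PagedChildLister(List<String> children) {
        this.children = new ArrayList<>(children);
    }

    /** Total number of children, which the UI needs in order to render a pager. */
    int countChildren() {
        return children.size();
    }

    /** Returns at most maxResults children, starting at index firstResult. */
    List<String> listChildren(int firstResult, int maxResults) {
        if (firstResult < 0 || maxResults < 0) {
            throw new IllegalArgumentException("firstResult and maxResults must be >= 0");
        }
        if (firstResult >= children.size()) {
            return Collections.emptyList();
        }
        // Compute the end index in long arithmetic to avoid int overflow.
        int end = (int) Math.min((long) children.size(), (long) firstResult + maxResults);
        return new ArrayList<>(children.subList(firstResult, end));
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            names.add("file-" + i);
        }
        PagedChildLister lister = new PagedChildLister(names);
        // Only the 25 entries of the displayed page are handed to the caller.
        List<String> page = lister.listChildren(50, 25);
        System.out.println(page.size() + " of " + lister.countChildren()
                + ", first=" + page.get(0));
    }
}
```

A managed bean would call listChildren once per displayed page and countChildren once for the pager, so memory use depends on the page size rather than on the folder size.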

The left tree view also has to be modified for listing subdirectories, because opening a branch that contains 10000 subdirs freezes it as well. The tree should show only ten subdirs at a time, with, for example, "…" as the first entry in the subtree and "…" as the last. With this, a pager could be shown in the tree view.
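
The windowed tree could look roughly like the Java sketch below. TreeWindow and visibleEntries are hypothetical names, not Alfresco or JSF APIs, and the marker is written as three ASCII dots only to keep the source encoding-neutral. The full list is passed in here for simplicity; in practice only the current window would be fetched.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that decides which subtree entries a tree view shows. */
final class TreeWindow {

    // Marker entry meaning "more subdirs exist in this direction".
    static final String MORE = "...";

    /**
     * Returns the entries to display for one window of subdirs: a leading
     * marker if entries were skipped, up to windowSize subdirs, and a
     * trailing marker if more entries follow.
     */
    static List<String> visibleEntries(List<String> subdirs, int offset, int windowSize) {
        if (offset < 0 || windowSize <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and windowSize > 0");
        }
        int start = Math.min(offset, subdirs.size());
        int end = (int) Math.min((long) subdirs.size(), (long) start + windowSize);
        List<String> visible = new ArrayList<>();
        if (start > 0) {
            visible.add(MORE);
        }
        visible.addAll(subdirs.subList(start, end));
        if (end < subdirs.size()) {
            visible.add(MORE);
        }
        return visible;
    }

    public static void main(String[] args) {
        List<String> subdirs = new ArrayList<>();
        for (int i = 0; i < 35; i++) {
            subdirs.add("dir-" + i);
        }
        // Second window of ten: markers on both sides, dir-10 to dir-19 between them.
        System.out.println(visibleEntries(subdirs, 10, 10));
    }
}
```

Clicking a marker would move the offset by one window in that direction, which gives the tree the same pager behaviour as the main listing.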

There is an article on the MyFaces wiki that may describe what I wanted to express in a better way: http://wiki.apache.org/myfaces/WorkingWithLargeTables

Without these modifications Alfresco cannot handle directories with many items. We had to hack this logic into Alfresco in some places, such as the one in the JIRA issue at the beginning of my post. After that it worked great; however, as new versions came out, I did not find the time to port it to the newest versions as well (based on an svn checkout).

Regards,
Balazs

derek
Star Contributor
This has been resolved.  Retry using a 3.4 release.