Performance problems with too many files in one directory

04-29-2010 08:29 AM
I have a real performance problem when users try to open a directory (either through CIFS or through the web interface) with more than 1000 files in it: it takes around 20 seconds. I am running Alfresco 3.2 with a MySQL database.
I did some monitoring tests, and Tomcat and the JVM seem to be running fine, so I think the weak point is the MySQL database. Maybe I have to rebuild the table indexes. I searched the forum but did not find anything.
Does somebody have ideas about what to optimize, or what to do?
Thanks.
05-25-2010 05:39 AM
As a workaround, we have changed the ResultTransformer in /Repository/source/java/org/alfresco/repo/node/db/hibernate/HibernateNodeDaoServiceImpl.java, around line 3126:
```java
private void cacheNodesNoBatch(Store store, List<String> uuids)
{
    Criteria criteria = getSession().createCriteria(NodeImpl.class, "node");
    // ----------------------------------------------------------------
    // Kludge AW 2010-05-07: Discard duplicate node ids
    //
    // criteria.setResultTransformer(Criteria.ROOT_ENTITY);
    criteria.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY);
    // ----------------------------------------------------------------
    // ... (rest of the method unchanged)
```
This is perhaps essentially the same as the fix proposed by Alfresco - the problem with OUTER JOIN fetching is a known weakness in Hibernate, so it has its own workaround :mrgreen:
Results are quite promising: a folder with ~600 documents, which previously took 40 seconds to display, now shows up in 4…
Cheers
Gyro
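For context on what that change does (this is general Hibernate behaviour, not Alfresco-specific): DISTINCT_ROOT_ENTITY deduplicates the root entities in memory after the SQL result set has been fetched, so it hides the duplicate rows but does not avoid transferring them from the database.

```java
// General Hibernate behaviour, not Alfresco code: DISTINCT_ROOT_ENTITY removes
// duplicate root entities in memory, after the (still blown-up) SQL resultset
// has been fetched and mapped; the database still returns the duplicate rows.
criteria.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY);
List<Node> nodes = criteria.list(); // one Node instance per distinct node id
```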
05-25-2010 07:00 AM
We've done a similar fix: https://issues.alfresco.com/jira/browse/ALF-2839. There are also a few other related tweaks to speed things up. You'll still be getting 2n+1 performance against cold caches, but it's better than the blow-out of the resultset caused by the X*Y*Z rows returned by the original query.
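To make the X*Y*Z blow-out concrete, here is a toy illustration of the row counts (just the arithmetic, not Alfresco code): a single query that outer-joins both the properties and the aspects collections returns props x aspects rows per node, whereas two split queries return props + aspects rows per node.

```java
// Toy illustration of the resultset blow-out described above; the per-node
// property and aspect counts are made up for the example.
public class JoinBlowOut
{
    public static void main(String[] args)
    {
        long nodes = 1000, propsPerNode = 20, aspectsPerNode = 5;

        // One query joining both collections: rows = nodes * props * aspects
        long oneJoinedQuery = nodes * propsPerNode * aspectsPerNode;          // 100,000 rows

        // Two queries, one per collection: rows = nodes * (props + aspects)
        long twoSplitQueries = nodes * propsPerNode + nodes * aspectsPerNode; //  25,000 rows

        System.out.println("single joined query: " + oneJoinedQuery + " rows");
        System.out.println("two split queries:   " + twoSplitQueries + " rows");
    }
}
```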

05-25-2010 07:49 AM
Sorry for my question, but which changes solve the problem?
This is my version (revision 17458):
```java
private void cacheNodesNoBatch(Store store, List<String> uuids)
{
    Criteria criteria = getSession().createCriteria(NodeImpl.class, "node");
    criteria.setResultTransformer(Criteria.ROOT_ENTITY);
    criteria.add(Restrictions.eq("store.id", store.getId()));
    criteria.add(Restrictions.in("uuid", uuids));
    criteria.setCacheMode(CacheMode.PUT);
    criteria.setFlushMode(FlushMode.MANUAL);
    List<Node> nodeList = criteria.list();
    List<Long> nodeIds = new ArrayList<Long>(nodeList.size());
    for (Node node : nodeList)
    {
        Long nodeId = node.getId();
        this.storeAndNodeIdCache.put(node.getNodeRef(), nodeId);
        nodeIds.add(nodeId);
    }
    if (nodeIds.size() == 0)
    {
        return;
    }
    criteria = getSession().createCriteria(ChildAssocImpl.class, "parentAssoc");
    criteria.setResultTransformer(Criteria.ROOT_ENTITY);
    criteria.add(Restrictions.in("child.id", nodeIds));
    criteria.setCacheMode(CacheMode.PUT);
    criteria.setFlushMode(FlushMode.MANUAL);
    List<ChildAssoc> parentAssocs = criteria.list();
    Map<Long, List<ChildAssoc>> parentAssocMap = new HashMap<Long, List<ChildAssoc>>(nodeIds.size() * 2);
    for (ChildAssoc parentAssoc : parentAssocs)
    {
        Long nodeId = parentAssoc.getChild().getId();
        List<ChildAssoc> parentAssocsOfNode = parentAssocMap.get(nodeId);
        if (parentAssocsOfNode == null)
        {
            parentAssocsOfNode = new ArrayList<ChildAssoc>(3);
            parentAssocMap.put(nodeId, parentAssocsOfNode);
        }
        parentAssocsOfNode.add(parentAssoc);
        if (this.isDebugParentAssocCacheEnabled)
        {
            loggerParentAssocsCache.debug("\n" +
                    "Parent associations cache - Adding entry: \n" +
                    "   Node:   " + nodeId + "\n" +
                    "   Assocs: " + parentAssocsOfNode);
        }
    }
    for (Node node : nodeList)
    {
        Long nodeId = node.getId();
        List<ChildAssoc> parentAssocsOfNode = parentAssocMap.get(nodeId);
        if (parentAssocsOfNode == null)
        {
            parentAssocsOfNode = Collections.emptyList();
        }
        this.parentAssocsCache.put(nodeId, new NodeInfo(node, this.qnameDAO, parentAssocsOfNode));
    }
}
```
This is the workaround posted above; see the comment // Kludge AW 2010-05-07: Discard duplicate node ids
```java
private void cacheNodesNoBatch(Store store, List<String> uuids)
{
    Criteria criteria = getSession().createCriteria(NodeImpl.class, "node");
    // ----------------------------------------------------------------
    // Kludge AW 2010-05-07: Discard duplicate node ids
    //
    // criteria.setResultTransformer(Criteria.ROOT_ENTITY);
    criteria.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY);
    // ----------------------------------------------------------------
    // ... (rest of the method unchanged)
```
This is the 3.3 version (revision 20392); see the comment // We have duplicate nodes, so make sure we only process each node once
```java
private void cacheNodesNoBatch(Store store, List<String> uuids)
{
    Criteria criteria = getSession().createCriteria(NodeImpl.class, "node");
    criteria.setResultTransformer(Criteria.ROOT_ENTITY);
    criteria.add(Restrictions.eq("store.id", store.getId()));
    criteria.add(Restrictions.in("uuid", uuids));
    criteria.setCacheMode(CacheMode.PUT);
    criteria.setFlushMode(FlushMode.MANUAL);
    List<Node> nodeList = criteria.list();
    Set<Long> nodeIds = new HashSet<Long>(nodeList.size() * 2);
    for (Node node : nodeList)
    {
        // We have duplicate nodes, so make sure we only process each node once
        Long nodeId = node.getId();
        if (!nodeIds.add(nodeId))
        {
            // Already processed
            continue;
        }
        storeAndNodeIdCache.put(node.getNodeRef(), nodeId);
    }
    if (nodeIds.size() == 0)
    {
        // Can't query
        return;
    }
    criteria = getSession().createCriteria(ChildAssocImpl.class, "parentAssoc");
    criteria.setResultTransformer(Criteria.ROOT_ENTITY);
    criteria.add(Restrictions.in("child.id", nodeIds));
    criteria.setCacheMode(CacheMode.PUT);
    criteria.setFlushMode(FlushMode.MANUAL);
    List<ChildAssoc> parentAssocs = criteria.list();
    Map<Long, List<ChildAssoc>> parentAssocMap = new HashMap<Long, List<ChildAssoc>>(nodeIds.size() * 2);
    for (ChildAssoc parentAssoc : parentAssocs)
    {
        Long nodeId = parentAssoc.getChild().getId();
        List<ChildAssoc> parentAssocsOfNode = parentAssocMap.get(nodeId);
        if (parentAssocsOfNode == null)
        {
            parentAssocsOfNode = new ArrayList<ChildAssoc>(3);
            parentAssocMap.put(nodeId, parentAssocsOfNode);
        }
        parentAssocsOfNode.add(parentAssoc);
        if (isDebugParentAssocCacheEnabled)
        {
            loggerParentAssocsCache.debug("\n" +
                    "Parent associations cache - Adding entry: \n" +
                    "   Node:   " + nodeId + "\n" +
                    "   Assocs: " + parentAssocsOfNode);
        }
    }
    // Cache NodeInfo for each node
    for (Node node : nodeList)
    {
        Long nodeId = node.getId();
        List<ChildAssoc> parentAssocsOfNode = parentAssocMap.get(nodeId);
        if (parentAssocsOfNode == null)
        {
            parentAssocsOfNode = Collections.emptyList();
        }
        parentAssocsCache.put(nodeId, new NodeInfo(node, qnameDAO, parentAssocsOfNode));
    }
}
```
Regards

05-25-2010 10:25 AM
I have applied both changes to my project.
I have two environments:
- Env. A:
  - Alfresco: Community 3.2.0 (r2 2440), schema 3300
  - OS: Windows XP 32-bit, 2.5 GB RAM, Intel Core Duo 2.1 GHz
  - App server: Tomcat 6.0.18
  - Java: JDK 6.0_11, heap 986 MB
  - DB: MySQL 5.1.35 Community
- Env. B:
  - Alfresco: Community 3.2.0 (r2 2440), schema 3300
  - OS: Oracle Linux 64-bit (RedHat 5.4), 6 GB RAM
  - App server: JBoss 4.2.2 GA
  - Java: JDK 6.0_18, heap 4 GB
  - DB: Oracle 11g
I have not yet applied and tested the changes on Env. A, where I have problems with performance and returned values.
I have applied them on Env. B, and now no exception is raised up to 247 returned elements (before the changes, every Oracle query against the DB failed with ORA-01795, even with 247; see the note on that error below). On Env. B I have not yet tested with a lot of data (40,000 items) as in my previous post.
Regards
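For reference, ORA-01795 ("maximum number of expressions in a list is 1000") is Oracle's hard limit on IN-list size. A common general workaround, sketched below with a hypothetical helper (this is not the fix discussed in this thread), is to split the values into chunks and OR the IN restrictions together:

```java
import java.util.List;

import org.hibernate.criterion.Criterion;
import org.hibernate.criterion.Disjunction;
import org.hibernate.criterion.Restrictions;

// Hypothetical helper, not part of Alfresco: builds an IN restriction in chunks
// of at most 1000 values, since Oracle rejects longer lists with ORA-01795.
public final class OracleInBatcher
{
    private static final int ORACLE_IN_LIMIT = 1000;

    public static Criterion inBatches(String property, List<String> values)
    {
        Disjunction or = Restrictions.disjunction();
        for (int i = 0; i < values.size(); i += ORACLE_IN_LIMIT)
        {
            int end = Math.min(i + ORACLE_IN_LIMIT, values.size());
            or.add(Restrictions.in(property, values.subList(i, end)));
        }
        return or;
    }
}
```

A call site would then use criteria.add(OracleInBatcher.inBatches("uuid", uuids)) instead of criteria.add(Restrictions.in("uuid", uuids)).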
05-25-2010 12:58 PM
```java
// Get nodes and properties
Criteria criteria = getSession().createCriteria(NodeImpl.class, "node");
criteria.setResultTransformer(Criteria.ROOT_ENTITY);
criteria.add(Restrictions.eq("store.id", store.getId()));
criteria.add(Restrictions.in("uuid", uuids));
criteria.setFetchMode("aspects", FetchMode.SELECT);     // Don't join to aspects
criteria.setCacheMode(CacheMode.PUT);
criteria.setFlushMode(FlushMode.MANUAL);
criteria.list();
// Get nodes and aspects
criteria = getSession().createCriteria(NodeImpl.class, "node");
criteria.setResultTransformer(Criteria.ROOT_ENTITY);
criteria.add(Restrictions.eq("store.id", store.getId()));
criteria.add(Restrictions.in("uuid", uuids));
criteria.setFetchMode("properties", FetchMode.SELECT);  // Don't join to properties
criteria.setCacheMode(CacheMode.PUT);
criteria.setFlushMode(FlushMode.MANUAL);
List<Node> nodeList = criteria.list();
Set<Long> nodeIds = new HashSet<Long>(nodeList.size() * 2);
for (Node node : nodeList)
{
    // We have duplicate nodes, so make sure we only process each node once
    Long nodeId = node.getId();
    if (!nodeIds.add(nodeId))
    {
        // Already processed
        continue;
    }
    storeAndNodeIdCache.put(node.getNodeRef(), nodeId);
}
if (nodeIds.size() == 0)
{
    // Can't query
    return;
}
```

05-26-2010 03:33 AM
Now I have a lot of versions of the software…
1. Gyro.Gearless: replace
```java
criteria.setResultTransformer(Criteria.ROOT_ENTITY);
```
with
```java
criteria.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY);
```
2. SVN head: replace
```java
for (Node node : nodeList)
{
    Long nodeId = node.getId();
    this.storeAndNodeIdCache.put(node.getNodeRef(), nodeId);
    nodeIds.add(nodeId);
}
```
with
```java
for (Node node : nodeList)
{
    // We have duplicate nodes, so make sure we only process each node once
    Long nodeId = node.getId();
    if (!nodeIds.add(nodeId))
    {
        // Already processed
        continue;
    }
    storeAndNodeIdCache.put(node.getNodeRef(), nodeId);
}
```
3. derek: add
```java
criteria.setFetchMode("aspects", FetchMode.SELECT);
// ...
criteria.setFetchMode("properties", FetchMode.SELECT);
```
Do I have to apply all three? I believe this is a blocking bug.
Regards
05-26-2010 07:23 AM
1: I'm not convinced about the behaviour of option 1 (it gives a distinct entity per row, if I remember), and I don't know if it's rolling the related properties and aspects up correctly. We don't have that in our tested code.
2: Yes. That eliminates the duplicates and prevents blow-outs of the subsequent queries.
3: Yes. That prevents blow-out of the resultset from left joins to both properties AND aspects in the same query. We have one Criteria query for nodes-and-properties and another for nodes-and-aspects.

05-26-2010 08:29 AM
Thanks for your support.
But I don't understand your code: it seems the code is written twice, with a criteria.list() whose return value is never used. Is that OK?
```java
// Get nodes and properties
Criteria criteria = getSession().createCriteria(NodeImpl.class, "node");
criteria.setResultTransformer(Criteria.ROOT_ENTITY);
criteria.add(Restrictions.eq("store.id", store.getId()));
criteria.add(Restrictions.in("uuid", uuids));
criteria.setFetchMode("aspects", FetchMode.SELECT); // Don't join to aspects
criteria.setCacheMode(CacheMode.PUT);
criteria.setFlushMode(FlushMode.MANUAL);
criteria.list();
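// (Note: the returned list is deliberately unused; presumably this first query
// runs only for its side effect of loading the nodes and their properties into
// the session and, via CacheMode.PUT, the second-level cache.)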
// Get nodes and aspects
criteria = getSession().createCriteria(NodeImpl.class, "node");
criteria.setResultTransformer(Criteria.ROOT_ENTITY);
criteria.add(Restrictions.eq("store.id", store.getId()));
criteria.add(Restrictions.in("uuid", uuids));
criteria.setFetchMode("properties", FetchMode.SELECT); // Don't join to properties
criteria.setCacheMode(CacheMode.PUT);
criteria.setFlushMode(FlushMode.MANUAL);
List<Node> nodeList = criteria.list();
Set<Long> nodeIds = new HashSet<Long>(nodeList.size()*2);
for (Node node : nodeList)
{
// We have duplicate nodes, so make sure we only process each node once
Long nodeId = node.getId();
if (!nodeIds.add(nodeId))
{
// Already processed
continue;
}
storeAndNodeIdCache.put(node.getNodeRef(), nodeId);
}
if (nodeIds.size() == 0)
{
// Can't query
return;
}
```
Note that I'm testing 1+2 on Oracle with 250 elements, and it seems to work (fast and without errors).
Now I'm testing 1+2 on MySQL with a folder of 1000 subfolders (each subfolder containing 5 further subfolders). The Explorer view works (2-3 seconds to display), but deleting is very, very slow (15 minutes and still running… until now).
The log4j output says:
14:18:29,218 User:admin WARN [org.alfresco.storeAndNodeIdTransactionalCache] Transactional update cache 'org.alfresco.storeAndNodeIdTransactionalCache' is full (10000).
Regards
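About that "Transactional update cache is full" warning: the limit can be raised by overriding the cache's bean definition. The snippet below is a hypothetical sketch only, assuming the stock Alfresco 3.x cache-context.xml layout; the remaining properties should be copied from the bean definition shipped with your version:

```xml
<!-- Hypothetical override of the transactional cache named in the warning.
     Assumes the stock Alfresco 3.x cache-context.xml layout; copy the other
     properties (sharedCache, name, ...) from the original bean definition. -->
<bean name="org.alfresco.storeAndNodeIdTransactionalCache"
      class="org.alfresco.repo.cache.TransactionalCache">
    <!-- ... other properties as in the stock definition ... -->
    <property name="maxCacheSize">
        <value>50000</value> <!-- raised from the default of 10000 -->
    </property>
</bean>
```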