performance problems when too many files in one directory

lagreen
Champ in-the-making
Hello,

I have a serious performance problem when a user opens a directory (either through CIFS or through the web interface) containing more than 1000 files: it takes around 20 seconds. I am running Alfresco 3.2 on a MySQL database.
I did some monitoring, and Tomcat and the JVM seem to be running fine, so I suspect the weak point is the MySQL database. Maybe I have to rebuild the table indexes. I searched the forum but did not find anything.

Does anybody have ideas on what to optimize or what to do?

Thanks.
31 REPLIES

kgeis
Champ on-the-rise
I've filed ALF-2960 in response to this.  I changed the node->properties and node->aspects associations to fetch via subselect, and that has helped some.  Further tuning is in order.  It's still not returning results within ten seconds, so I might need to bump up my system.acl.maxPermissionCheckTimeMillis.
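For anyone wanting to try the same tuning: Alfresco's Hibernate mappings live in .hbm.xml files, where the change amounts to fetch="subselect" on the properties and aspects collections. In annotation form the same idea looks roughly like the sketch below; the entity and field names are invented for illustration and are not the real Alfresco classes.

    import java.util.ArrayList;
    import java.util.List;
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.persistence.ManyToOne;
    import javax.persistence.OneToMany;
    import org.hibernate.annotations.Fetch;
    import org.hibernate.annotations.FetchMode;

    @Entity
    class NodePropertyExample
    {
        @Id @GeneratedValue Long id;
        @ManyToOne NodeExample node;
    }

    @Entity
    class NodeExample
    {
        @Id @GeneratedValue Long id;

        // SUBSELECT loads the properties of every node returned by the original
        // query in one extra "SELECT ... WHERE node_id IN (<original query>)",
        // instead of an outer join that multiplies the node rows.
        @OneToMany(mappedBy = "node")
        @Fetch(FetchMode.SUBSELECT)
        List<NodePropertyExample> properties = new ArrayList<NodePropertyExample>();
    }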

gyro_gearless
Champ in-the-making
Hi all,

As a workaround, we changed the ResultTransformer in /Repository/source/java/org/alfresco/repo/node/db/hibernate/HibernateNodeDaoServiceImpl.java, around line 3126:

    private void cacheNodesNoBatch(Store store, List<String> uuids)
    {
        Criteria criteria = getSession().createCriteria(NodeImpl.class, "node");

        // -----------------------------------------------------------------
        // Kludge AW 2010-05-07: Discard duplicate node ids
        //
        // criteria.setResultTransformer(Criteria.ROOT_ENTITY);
        criteria.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY);
        // -----------------------------------------------------------------

This is perhaps essentially the same as the fix proposed by Alfresco - the problem with OUTER JOIN fetching is a known weakness in Hibernate, so it has its own workaround  :mrgreen:
The results are quite promising: a folder with ~600 documents, which previously took 40 seconds to display, now shows up in 4…
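Note that DISTINCT_ROOT_ENTITY does not add DISTINCT to the SQL: the join still returns one row per node/property combination, and Hibernate merely drops the duplicate root entities in memory afterwards. Roughly like this (my illustration, not Alfresco code):

    import java.util.ArrayList;
    import java.util.LinkedHashSet;
    import java.util.List;

    public class DistinctRootEntitySketch
    {
        // Duplicates are removed in memory while the original row order is kept;
        // entities loaded in one session are identical instances, so a Set works.
        public static <T> List<T> distinctRootEntities(List<T> rawRows)
        {
            return new ArrayList<T>(new LinkedHashSet<T>(rawRows));
        }
    }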

Cheers
Gyro

derek
Star Contributor
Hi.
We've done a similar fix: https://issues.alfresco.com/jira/browse/ALF-2839.  There are also a few other related tweaks to speed things up.  You'll still be getting 2n+1 performance against cold caches, but it's better than the blow-out of the resultset caused by the X*Y*Z rows returned by the original query.
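To put rough numbers on the blow-out (illustrative figures, not measurements): if a folder holds 1000 nodes and each node has 10 property rows and 3 aspect rows, a single query that outer-joins both collections returns 1000 * 10 * 3 = 30,000 rows, while two split queries return 1000 * 10 + 1000 * 3 = 13,000 rows.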

fracat71
Champ on-the-rise
Hi all,
sorry for my question, but which changes solve the problem?
This is my version (revision 17458):

  private void cacheNodesNoBatch(Store store, List<String> uuids)
  {
    Criteria criteria = getSession().createCriteria(NodeImpl.class, "node");
    criteria.setResultTransformer(Criteria.ROOT_ENTITY);
    criteria.add(Restrictions.eq("store.id", store.getId()));
    criteria.add(Restrictions.in("uuid", uuids));
    criteria.setCacheMode(CacheMode.PUT);
    criteria.setFlushMode(FlushMode.MANUAL);

    List<Node> nodeList = criteria.list();
    List<Long> nodeIds = new ArrayList<Long>(nodeList.size());
    for (Node node : nodeList)
    {
      Long nodeId = node.getId();
      this.storeAndNodeIdCache.put(node.getNodeRef(), nodeId);
      nodeIds.add(nodeId);
    }

    if (nodeIds.size() == 0)
    {
      return;
    }

    criteria = getSession().createCriteria(ChildAssocImpl.class, "parentAssoc");
    criteria.setResultTransformer(Criteria.ROOT_ENTITY);
    criteria.add(Restrictions.in("child.id", nodeIds));
    criteria.setCacheMode(CacheMode.PUT);
    criteria.setFlushMode(FlushMode.MANUAL);
    List<ChildAssoc> parentAssocs = criteria.list();
    Map<Long, List<ChildAssoc>> parentAssocMap = new HashMap<Long, List<ChildAssoc>>(nodeIds.size() * 2);
    for (ChildAssoc parentAssoc : parentAssocs)
    {
      Long nodeId = parentAssoc.getChild().getId();
      List<ChildAssoc> parentAssocsOfNode = parentAssocMap.get(nodeId);
      if (parentAssocsOfNode == null)
      {
        parentAssocsOfNode = new ArrayList<ChildAssoc>(3);
        parentAssocMap.put(nodeId, parentAssocsOfNode);
      }
      parentAssocsOfNode.add(parentAssoc);
      if (this.isDebugParentAssocCacheEnabled)
      {
        loggerParentAssocsCache.debug("\nParent associations cache - Adding entry: \n   Node:   " + nodeId + "\n" + "   Assocs: " + parentAssocsOfNode);
      }
    }

    for (Node node : nodeList)
    {
      Long nodeId = node.getId();
      List<ChildAssoc> parentAssocsOfNode = parentAssocMap.get(nodeId);
      if (parentAssocsOfNode == null)
      {
        parentAssocsOfNode = Collections.emptyList();
      }
      this.parentAssocsCache.put(nodeId, new NodeInfo(node, this.qnameDAO, parentAssocsOfNode));
    }
  }


This is the workaround solution posted above. See the comment // Kludge AW 2010-05-07: Discard duplicate node ids:

    private void cacheNodesNoBatch(Store store, List<String> uuids)
    {
        Criteria criteria = getSession().createCriteria(NodeImpl.class, "node");

        // -----------------------------------------------------------------
        // Kludge AW 2010-05-07: Discard duplicate node ids
        //
        // criteria.setResultTransformer(Criteria.ROOT_ENTITY);
        criteria.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY);
        // -----------------------------------------------------------------

This is the 3.3 version (revision 20392). See the comment // We have duplicate nodes, so make sure we only process each node once:

    private void cacheNodesNoBatch(Store store, List<String> uuids)
    {
        Criteria criteria = getSession().createCriteria(NodeImpl.class, "node");
        criteria.setResultTransformer(Criteria.ROOT_ENTITY);
        criteria.add(Restrictions.eq("store.id", store.getId()));
        criteria.add(Restrictions.in("uuid", uuids));
        criteria.setCacheMode(CacheMode.PUT);
        criteria.setFlushMode(FlushMode.MANUAL);

        List<Node> nodeList = criteria.list();
        Set<Long> nodeIds = new HashSet<Long>(nodeList.size()*2);
        for (Node node : nodeList)
        {
            // We have duplicate nodes, so make sure we only process each node once
            Long nodeId = node.getId();
            if (!nodeIds.add(nodeId))
            {
                // Already processed
                continue;
            }
            storeAndNodeIdCache.put(node.getNodeRef(), nodeId);
        }       
       
        if (nodeIds.size() == 0)
        {
            // Can't query
            return;
        }
       
        criteria = getSession().createCriteria(ChildAssocImpl.class, "parentAssoc");
        criteria.setResultTransformer(Criteria.ROOT_ENTITY);
        criteria.add(Restrictions.in("child.id", nodeIds));
        criteria.setCacheMode(CacheMode.PUT);
        criteria.setFlushMode(FlushMode.MANUAL);
        List<ChildAssoc> parentAssocs = criteria.list();
        Map<Long, List<ChildAssoc>> parentAssocMap = new HashMap<Long, List<ChildAssoc>>(nodeIds.size() * 2);
        for (ChildAssoc parentAssoc : parentAssocs)
        {
            Long nodeId = parentAssoc.getChild().getId();
            List<ChildAssoc> parentAssocsOfNode = parentAssocMap.get(nodeId);
            if (parentAssocsOfNode == null)
            {
                parentAssocsOfNode = new ArrayList<ChildAssoc>(3);
                parentAssocMap.put(nodeId, parentAssocsOfNode);
            }
            parentAssocsOfNode.add(parentAssoc);
            if (isDebugParentAssocCacheEnabled)
            {
                loggerParentAssocsCache.debug("\n" +
                        "Parent associations cache - Adding entry: \n" +
                        "   Node:   " + nodeId + "\n" +
                        "   Assocs: " + parentAssocsOfNode);
            }
        }
        // Cache NodeInfo for each node
        for (Node node : nodeList)
        {
            Long nodeId = node.getId();
            List<ChildAssoc> parentAssocsOfNode = parentAssocMap.get(nodeId);
            if (parentAssocsOfNode == null)
            {
                parentAssocsOfNode = Collections.emptyList();
            }
            parentAssocsCache.put(nodeId, new NodeInfo(node, qnameDAO, parentAssocsOfNode));
        }       
    }

Regards

fracat71
Champ on-the-rise
Hi all,
I have applied both changes to my project.
I have two environments.

Env. A:
- ALFRESCO: Community 3.2.0 (r2 2440), schema 3300
- OS: Windows XP 32-bit, 2.5 GB RAM, Intel Core Duo 2.1 GHz
- AS: Tomcat 6.0.18
- JAVA: JDK 6.0_11, heap 986.125 MB
- DB: MySQL 5.1.35 Community

Env. B:
- ALFRESCO: Community 3.2.0 (r2 2440), schema 3300
- OS: Oracle Linux 64-bit (RedHat 5.4), 6 GB RAM
- AS: JBoss 4.2.2 GA
- JAVA: JDK 6.0_18, heap 4 GB
- DB: Oracle 11g

I have not yet applied or tested the changes on Env. A, where I have problems with performance and with the returned values.
I have applied them on Env. B, and now no exception is raised with up to 247 elements returned (before the changes, every Oracle query failed with ORA-01795, even with 247 elements). On Env. B I have not yet tested with a lot of data (40,000 items) as in my previous post.
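For reference, ORA-01795 is Oracle rejecting an IN list with more than 1000 expressions. A generic guard, independent of the patches above, is to split the ID list into chunks of at most 1000 - an untested sketch:

    import java.util.ArrayList;
    import java.util.List;

    public class InListChunker
    {
        private static final int ORACLE_IN_LIST_LIMIT = 1000;  // ORA-01795 threshold

        // Split a list of IDs into sub-lists small enough for an Oracle IN clause.
        public static <T> List<List<T>> chunk(List<T> ids)
        {
            List<List<T>> chunks = new ArrayList<List<T>>();
            for (int i = 0; i < ids.size(); i += ORACLE_IN_LIST_LIMIT)
            {
                int end = Math.min(i + ORACLE_IN_LIST_LIMIT, ids.size());
                chunks.add(new ArrayList<T>(ids.subList(i, end)));
            }
            return chunks;
        }
    }

Each chunk would then get its own Restrictions.in("uuid", chunk) criterion, combined with Restrictions.disjunction().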

Regards

derek
Star Contributor

        // Get nodes and properties
        Criteria criteria = getSession().createCriteria(NodeImpl.class, "node");
        criteria.setResultTransformer(Criteria.ROOT_ENTITY);
        criteria.add(Restrictions.eq("store.id", store.getId()));
        criteria.add(Restrictions.in("uuid", uuids));
        criteria.setFetchMode("aspects", FetchMode.SELECT);                 // Don't join to aspects
        criteria.setCacheMode(CacheMode.PUT);
        criteria.setFlushMode(FlushMode.MANUAL);
        criteria.list();    // executed only to populate the caches; the returned list is deliberately ignored

        // Get nodes and aspects
        criteria = getSession().createCriteria(NodeImpl.class, "node");
        criteria.setResultTransformer(Criteria.ROOT_ENTITY);
        criteria.add(Restrictions.eq("store.id", store.getId()));
        criteria.add(Restrictions.in("uuid", uuids));
        criteria.setFetchMode("properties", FetchMode.SELECT);              // Don't join to properties
        criteria.setCacheMode(CacheMode.PUT);
        criteria.setFlushMode(FlushMode.MANUAL);

        List<Node> nodeList = criteria.list();
        Set<Long> nodeIds = new HashSet<Long>(nodeList.size()*2);
        for (Node node : nodeList)
        {
            // We have duplicate nodes, so make sure we only process each node once
            Long nodeId = node.getId();
            if (!nodeIds.add(nodeId))
            {
                // Already processed
                continue;
            }
            storeAndNodeIdCache.put(node.getNodeRef(), nodeId);
        }       
       
        if (nodeIds.size() == 0)
        {
            // Can't query
            return;
        }

fracat71
Champ on-the-rise
Hi all,
now I have a lot of versions of the software…

1 - Gyro.Gearless:
replace

criteria.setResultTransformer(Criteria.ROOT_ENTITY);

with

criteria.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY);

2 - SVN Head:
replace

for (Node node : nodeList)
{
    Long nodeId = node.getId();
    this.storeAndNodeIdCache.put(node.getNodeRef(), nodeId);
    nodeIds.add(nodeId);
}

with

for (Node node : nodeList)
{
    // We have duplicate nodes, so make sure we only process each node once
    Long nodeId = node.getId();
    if (!nodeIds.add(nodeId))
    {
        // Already processed
        continue;
    }
    storeAndNodeIdCache.put(node.getNodeRef(), nodeId);
}

3 - derek:
add

…
criteria.setFetchMode("aspects", FetchMode.SELECT);
…
criteria.setFetchMode("properties", FetchMode.SELECT);
…

Do I have to apply all of them?
I believe this is a blocking bug.

Regards

derek
Star Contributor
Hi,
1: I'm not convinced about the behaviour of 1 (it returns a distinct entity per row, if I remember) and I don't know if it rolls the related properties and aspects up correctly.  We don't have that in our tested code.
2: Yes. That eliminates the duplicates and prevents blow-outs of the subsequent queries.
3: Yes. That prevents blow-out of the resultset from left joins to properties AND aspects in the same query.  We have one Criteria query for nodes-and-properties and another for nodes-and-aspects.

fracat71
Champ on-the-rise
Hi derek,
thanks for your support.
But I don't understand your code: the query seems to be written twice, with a criteria.list() whose return value is discarded. Is that OK?

        // Get nodes and properties
        Criteria criteria = getSession().createCriteria(NodeImpl.class, "node");
        criteria.setResultTransformer(Criteria.ROOT_ENTITY);
        criteria.add(Restrictions.eq("store.id", store.getId()));
        criteria.add(Restrictions.in("uuid", uuids));
        criteria.setFetchMode("aspects", FetchMode.SELECT);                 // Don't join to aspects
        criteria.setCacheMode(CacheMode.PUT);
        criteria.setFlushMode(FlushMode.MANUAL);
        criteria.list();

        // Get nodes and aspects
        criteria = getSession().createCriteria(NodeImpl.class, "node");
        criteria.setResultTransformer(Criteria.ROOT_ENTITY);
        criteria.add(Restrictions.eq("store.id", store.getId()));
        criteria.add(Restrictions.in("uuid", uuids));
        criteria.setFetchMode("properties", FetchMode.SELECT);              // Don't join to properties
        criteria.setCacheMode(CacheMode.PUT);
        criteria.setFlushMode(FlushMode.MANUAL);

        List<Node> nodeList = criteria.list();
        Set<Long> nodeIds = new HashSet<Long>(nodeList.size()*2);
        for (Node node : nodeList)
        {
            // We have duplicate nodes, so make sure we only process each node once
            Long nodeId = node.getId();
            if (!nodeIds.add(nodeId))
            {
                // Already processed
                continue;
            }
            storeAndNodeIdCache.put(node.getNodeRef(), nodeId);
        }       
       
        if (nodeIds.size() == 0)
        {
            // Can't query
            return;
        }


Note that I'm testing 1+2 on Oracle with 250 elements, and it seems to work (fast and without errors).
Now I'm testing 1+2 on MySQL with a folder of 1000 subfolders (each subfolder has 5 further subfolders): the Explorer view works (2-3 seconds to display), but the delete is very, very slow (15 minutes and still running…).
The log4j says:

14:18:29,218 User:admin WARN  [org.alfresco.storeAndNodeIdTransactionalCache] Transactional update cache 'org.alfresco.storeAndNodeIdTransactionalCache' is full (10000).

Regards

derek
Star Contributor
You might be better off without the batch code completely.  On the Enterprise release we have also added code to check for the presence of nodes in the cache before adding them to the batch-fetch list.  The cache being full is a problem, and you should increase the cache size to accommodate the types of operations you wish to perform.
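If I remember the 3.x configuration correctly, the limit in that warning comes from the maxCacheSize property of the storeAndNodeIdTransactionalCache bean in the repository's cache-context.xml, with a matching cache size in ehcache-custom.xml if you use one; verify the exact file and bean names against your own installation before changing them.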