Hyland Connect

mlagneaux · 06-16-2011

Hello,

I developed a patch that replaces the values of a "Status" property: each old value is mapped to a new one.
The patch scans all "content" nodes of the store (retrieved via a Lucene query), gets the current status value, determines the new value and sets the "Status" property of the node to it. All actions on the nodes go through the nodeService.

In this patch, I also:
- disable the behaviours associated with the auditable aspect so that the modification date of processed nodes does not change (the behaviours are re-enabled at the end of the patch);
- log progress every 1000 nodes processed.

I'm currently running this patch on a store containing 300,000 documents. It has been running for over a day, and the logs show that performance degrades over time (the time to process 1000 nodes keeps increasing).

What could explain this drop in performance?
Are there settings in Alfresco that could solve my problem?

gyro_gearless · 06-17-2011

So when you say "performance decreases", does the execution time per 1000 nodes keep increasing? Some numbers would help to judge whether this is normal or sub-optimal performance…
Do you fetch all your nodes at once or in batches (I guess Lucene will return at most 1000 nodes per search by default)? And by what criterion?

Maybe it would be faster to gather all nodes using FileFolderService, recursively scanning your document tree with listFiles() and listFolders().

Cheers
Gyro
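
The folder-by-folder scan suggested above can be sketched in plain Java. This is only an illustration: `Lister` is a hypothetical stand-in for the two listing calls (it is not an Alfresco interface), and `T` plays the role of a node reference. The walk uses an explicit stack instead of recursion, so deep trees do not grow the call stack, and each file is handed to a callback as soon as its folder is listed.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

/** Sketch only: Lister is a hypothetical stand-in, not an Alfresco interface. */
final class FolderWalkSketch {

    /** Mirrors the two listing calls named above; T plays the role of a node reference. */
    interface Lister<T> {
        List<T> listFiles(T folder);
        List<T> listFolders(T folder);
    }

    /** Visits every file below root, one folder listing at a time; returns the file count. */
    static <T> int forEachFile(Lister<T> lister, T root, Consumer<? super T> action) {
        int visited = 0;
        Deque<T> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            T folder = pending.pop();
            // Process the files of this folder before descending further.
            for (T file : lister.listFiles(folder)) {
                action.accept(file);
                visited++;
            }
            // Queue sub-folders for later visits.
            for (T sub : lister.listFolders(folder)) {
                pending.push(sub);
            }
        }
        return visited;
    }
}
```

Whether this is actually faster than a single Lucene query depends on the repository; the thread does not report a measurement for it.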

mlagneaux · 06-17-2011

At the beginning of the patch, the execution time per 1000 nodes is about 30 seconds.
After 100000 nodes processed, it is 2 minutes.
After 200000 nodes processed, it is 9 minutes.
After 290000 nodes processed, it is 30 minutes.

I fetch all the nodes at once with a Lucene query on type "content" with the status property set to "*" (status:*). The query returns all 300000 nodes to process.

I'm going to try with FileFolderService.

mrogers · 06-17-2011

I suspect you are scanning through more records as your patch progresses. (So your first batch reads 1000, processes 1000, second reads 2000, processes 1000, third reads 3000, processes 1000 etc…)

However without details of your code that's just a guess.
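
To make the arithmetic of this guess concrete, here is a small cost model in plain Java. It only illustrates the hypothesis (batch k re-reads k × 1000 records); it is not a measurement of Alfresco or Lucene behaviour, and the class and method names are made up for the example.

```java
/** Hypothetical cost model for the rescanning guess; not measured behaviour. */
final class RescanCostSketch {

    /** Total records read if batch k (1-based) re-reads k * batchSize records. */
    static long totalReads(int batches, int batchSize) {
        long total = 0;
        for (int k = 1; k <= batches; k++) {
            total += (long) k * batchSize;
        }
        return total;
    }

    public static void main(String[] args) {
        // 300,000 nodes in batches of 1000 gives 300 batches.
        System.out.println(totalReads(300, 1000));
    }
}
```

Under this model, 300 batches of 1000 would read 45,150,000 records in total, against 300,000 for a single pass, so the cost per batch would grow linearly as the patch progresses.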

mlagneaux · 06-20-2011

Here is the code of my patch:


public int translateStatus()
{
   int nbStatus = 0;
   
   buildStatusMap();

   SearchParameters sp = new SearchParameters();
   sp.addStore(new StoreRef(StoreRef.PROTOCOL_WORKSPACE, "SpacesStore"));
   sp.setLanguage(SearchService.LANGUAGE_LUCENE);
   sp.addLocale(Locale.FRENCH);
   String query = "(TYPE:\"cm:content\""  
         + " OR TYPE:\"" + CeaModel.TYPE_DOCUMENT_PAPIER + "\")"
         + " AND   @cm\\:status:*";
   sp.setQuery(query);

   ResultSet results = null;
   if (logger.isDebugEnabled())
   {
      logger.debug("Query:\r\n" + query);
   }
       
   try
   {
      results = this.searchService.query(sp);
      if (results.length() > 0) {
         int total = results.length();
         int nbNodesProcessed = 0;
         int percent = 0;
         logger.info("Number of nodes to process : "+total);
            
         List<NodeRef> nodeRefList = results.getNodeRefs();
            
         for(NodeRef nodeRef : nodeRefList){
            if(nbNodesProcessed != 0 && nbNodesProcessed % 1000 == 0 ){
               percent = (100*nbNodesProcessed) / total;
               logger.info(nbNodesProcessed+" nodes processed out of "+total+". "+percent+"% complete.");
            }
            nbNodesProcessed++;
               
               
            try{
               if(this.nodeService.exists(nodeRef)){
                  logger.debug("NodeRef processed : " + nodeRef);
                  String oldDocStatus = (String)this.nodeService.getProperty(nodeRef, CeaModel.PROP_STATUS);
                  String newDocStatus = statusMap.get(oldDocStatus);
                  if (newDocStatus == null){
                     if (!statusMap.containsValue(oldDocStatus))
                        logger.error("Status \"" + oldDocStatus + "\" cannot be renamed (" + nodeRef + ")");
                  } else {
                     try{                  
                        this.nodeService.setProperty(nodeRef, CeaModel.PROP_STATUS, newDocStatus);
                        nbStatus++;
                        logger.debug("Status \"" + oldDocStatus 
                              + "\" was renamed \"" + newDocStatus + "\"");
                     } catch (Exception e){
                        // Pass the exception to the logger so the cause is not lost
                        logger.error("Status \"" + oldDocStatus + "\" cannot be renamed (" + nodeRef + ")", e);
                     }
                  }
               }
               else{
                  logger.error("NodeRef does not exist : " + nodeRef);
               }
            }
            catch(DataIntegrityViolationException e){
               // Log the stack trace through the logger instead of printing it to stderr
               logger.error("NodeRef ignored : "+nodeRef, e);
            }
         }
         
         percent = (100*nbNodesProcessed) / total;
         logger.info(nbNodesProcessed+" nodes processed out of "+total+". "+percent+"% complete.");
      }
   } catch (Exception e){
      // Log the stack trace through the logger instead of printing it to stderr
      logger.error("Status translations failed.", e);
   } finally {
      if (results != null)
      {
         results.close();
      }
   }
   
   return nbStatus;
}

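The three-way decision in the middle of the loop above (rename, skip silently because the value is already a new status, or report an unknown value) can be isolated as a small pure function. This is a sketch in plain Java, not part of the original patch: the `Outcome` enum and the class name are hypothetical, and the map is assumed to be a `HashMap`-like map that tolerates `null` lookups, as the original code relies on.

```java
import java.util.Map;

/** Sketch of the lookup logic only; names are hypothetical. */
final class StatusTranslationSketch {

    enum Outcome { RENAME, ALREADY_NEW, UNKNOWN }

    /** Classifies a current status against the old-to-new status map. */
    static Outcome classify(Map<String, String> statusMap, String currentStatus) {
        if (statusMap.get(currentStatus) != null) {
            // An old value with a known replacement: the property gets rewritten.
            return Outcome.RENAME;
        }
        // No mapping: either the node already carries a new value, or the value is unexpected.
        return statusMap.containsValue(currentStatus) ? Outcome.ALREADY_NEW : Outcome.UNKNOWN;
    }
}
```

Because values that are already new are skipped, the patch can in principle be re-run over nodes it has processed before without changing them again.
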
mlagneaux · 06-22-2011

I've modified my patch, following the example of FixNameCrcValuesPatch. I now get the list of ids to process (from the alf_node database table) with a Hibernate query.
For each id, I create an object of type Node, get its nodeRef and update the status using the nodeService.

A test on a store of 20,000 documents shows that the new patch is about twice as fast. However, what now takes a lot of time is the transaction commit (which happens after the patch has run).

Is it possible to save time on the commit?
Is there a faster way than nodeService to update the property? This property is indexed, so the nodes need to be reindexed after the status update.

Thank you in advance for your help.
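
One direction that might be worth trying for the slow final commit is to split the ids into fixed-size batches and commit each batch separately, so that no single transaction has to carry all the updates. The thread does not confirm that this helps here, so the following is only a sketch in plain Java. `batchWork` is a hypothetical callback; in an Alfresco patch it would presumably run each batch in its own transaction (for example through RetryingTransactionHelper), which is an assumption and is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch only: splits a list of ids into batches and hands each one to a callback. */
final class BatchedUpdateSketch {

    /** Calls batchWork once per batch of at most batchSize ids; returns the number of batches. */
    static <T> int processInBatches(List<T> ids, int batchSize, Consumer<List<T>> batchWork) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int from = 0; from < ids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, ids.size());
            // Copy the slice so the callback gets an independent list.
            batchWork.accept(new ArrayList<>(ids.subList(from, to)));
            batches++;
        }
        return batches;
    }
}
```

The trade-off is that a failure in a late batch leaves the earlier batches committed, so the update has to be safe to re-run.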

mlagneaux · 07-05-2011

I consider this post solved, since changing the way I get the nodes to process (from a Lucene query to an SQL query) improved the performance of my patch.
For a store of 300,000 nodes, it took less than 7 hours, against more than 48 hours before.

Running a patch : performance issue