Script Node move

smcardle
Champ in-the-making
Hi All

I'm trying to get to the bottom of an issue in a JavaScript webscript controller.

We have thousands of spaces to move to a new parent structure. The current structure has maybe 50 parent spaces each of which has several thousand sub spaces to move under a new top level space. The following is an example of the current structure…


Branches —
     AB —
          10001 —
          10002 —
          …
     CD —
          200002 —
          210001 —
          …
     EF —
     …

And we are moving to this structure

Clients —
    10001 —
    10002 —
    10003 —
    ….

i.e. the clients under each branch are being flattened into a dedicated Clients space.

As I said, each branch has several thousand clients and in total there are tens of thousands to move. There should not be any overlap of clients, i.e. a client space should not exist in more than one branch. However, there are exceptions.

So, I created a JavaScript web script to move the clients in each branch to the new Clients space (one branch at a time only): /api/move-branch/{branchid}

Now my problems. There are two.

As this is a webscript, everything is done in one transaction, so if the webscript produces an exception everything rolls back (this works). The problem is that the script node move is supposed to return true or false. However, if the space already exists under the Clients structure it throws an exception, and even if I catch and handle that exception, none of the client moves take effect. It seems that the exception itself triggers a transaction rollback, which I would only have expected had I exited the web script via an exception. I'm pretty sure this is not correct.

The only way I could get this to work was to check the new Clients parent space for the existence of the client prior to doing the move, and to attempt the move only if it does not exist. This adds extra checking that should not really be needed.

My second problem: it takes over an HOUR just to move about two thousand spaces. Why does this method take so long? It means our migration script could run for DAYS.

Here is the webscript….



function moveBranch() {

    var notMovedClients = new Array();       
    var movedClients = new Array();       

   var branchid = url.templateArgs["branchid"];
   if(branchid) {
      try {
         var branch = companyhome.childByNamePath("/Branches/" + branchid);
         if(!branch) {
            status.setCode(status.STATUS_OK, "Branch [" + branchid + "] does not exist. Nothing to do.");
            return;
         }
         var clients = companyhome.childByNamePath("/Clients");
         var remove = true;

         // Take the child list once so the loop indexes a single array reference
         var children = branch.children;
         var len = children.length;
         for(var i = 0; i < len; i++) {
            var child = children[i];
            try {
               if(clients.childByNamePath(child.name)) {
                  // Should not be required as move should return true or false.
                  // This does stop move throwing exceptions and causing the whole transaction to abort
                  remove = false;
                  notMovedClients.push(child.name);
                  continue;
               }
               var moved = child.move(clients);
               if(!moved) {
                  remove = false;
                  notMovedClients.push(child.name);
               } else {
                  movedClients.push(child.name);
               }
            }catch(e) {
               // The move should not throw an exception but it does so catch it
               // and treat it as a failure to move
               remove = false;
               notMovedClients.push(child.name);
            }
         }
         if(remove) {
            branch.remove();
         }
         var message = "Finished processing Branch [" + branchid + "].";

         if(notMovedClients.length != 0) {
            message += " The following clients were not able to be moved automatically and will need to be moved by hand [" + notMovedClients + "].";
         }
         if(movedClients.length != 0) {
            message += " The following clients were moved [" + movedClients + "].";
         }
      
         status.setCode(status.STATUS_OK, message);
         return;
      }catch(e) {
         status.setCode(status.STATUS_INTERNAL_SERVER_ERROR, e.toString());
         return;
      }

   } else {
      status.setCode(status.STATUS_INTERNAL_SERVER_ERROR, "Branch must be defined");   
      return;
   }
}

moveBranch();


Any insights here or an alternative method to achieve the same thing would be good..



Steve
11 REPLIES

afaust
Legendary Innovator
Hello,

which version of Alfresco are you using? Are you using Lucene indexing?

Regards
Axel

smcardle
Champ in-the-making
Hi Axel

Thanks for responding.

Yes we are using Lucene indexing.

Regards

afaust
Legendary Innovator
Hello,

OK, this explains some of the duration. When you are using Lucene, keep in mind that changes you make in a transaction are indexed before the transaction completes. If you move a couple of thousand client folders (with how many documents?) in one web script call, the indexer takes quite a bit of time to reindex all moved folders AND the contained structures. The move itself probably completed in just a few minutes, but the indexing part is the one that really takes a long time. You can't optimize this away by rewriting your script, but you may want to think about disabling indexing while you perform the move and then running a re-index afterwards (which should be faster, since it can run in parallel).

As far as the structure of your folders goes: if you stick to DB-based paged navigation or direct ID lookups, you should be fine concerning navigation performance. But if remotely possible, I'd personally try to find a way to sub-divide the clients further, e.g. have subfolders for clients whose ID begins with 101, 102, etc. This is not so much for performance's sake as for administration and navigation in Alfresco views.
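
To illustrate the sub-division idea, here is a minimal sketch in plain JavaScript. The helper names `bucketFor` and `groupByBucket` are made up for this example and are not part of any Alfresco API, and the prefix length of 3 is only an assumption based on the "101, 102" example above.

```javascript
// Hypothetical helper: derive a sub-folder ("bucket") name from the
// leading characters of a client ID. Not an Alfresco API.
function bucketFor(clientId, prefixLength) {
  var id = String(clientId);
  var n = prefixLength || 3; // assumed default, taken from the "101, 102" example
  // IDs shorter than the prefix length form their own bucket.
  return id.length <= n ? id : id.substring(0, n);
}

// Group a list of client IDs by bucket name.
function groupByBucket(clientIds, prefixLength) {
  var buckets = {};
  for (var i = 0; i < clientIds.length; i++) {
    var key = bucketFor(clientIds[i], prefixLength);
    if (!buckets[key]) {
      buckets[key] = [];
    }
    buckets[key].push(String(clientIds[i]));
  }
  return buckets;
}

// Example: four client IDs end up in three buckets ("100", "101", "210").
var grouped = groupByBucket(["10001", "10002", "10103", "210001"]);
```

In a real migration the bucket name would be used to find or create an intermediate folder under Clients before moving each client into it. That part is left out here because it depends on how the existing webscripts resolve client paths.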

Regards
Axel

smcardle
Champ in-the-making
Thanks Axel

I had considered the Lucene indexing and was in fact working on switching it off prior to migration, just to test that.

We don't have an issue with Alfresco views as we never use the Alfresco UI in deployment. The whole repository is accessed via webscripts and we never do general queries, e.g. get all clients.

I'll get back with results for disabling the indexing

Thanks

Steve

smcardle
Champ in-the-making
Hi Axel.

How do I completely disable indexing ?

lucene.indexer.contentIndexingEnabled=false
This property controls whether or not the content of the documents is indexed. If false, content is not indexed.

The content may not be indexed, but it seems that everything else is.

This has had little effect on the time it takes to run the migration of clients from the branch structure, as the documents currently stored under a client are very small: text files on the order of 100-200 bytes only.

So, my webscript is throwing thousands of exceptions during the move of a branch containing thousands of clients.

Basically, I want to switch indexing off completely, move the clients from the branches into their new client structure, re-enable indexing, stop Alfresco, set the properties for a full re-index, and voila.

Only I can't do this, because of the large number of failures during the move, which then roll back the transaction.

What in the webscript could be causing the exceptions to occur?

Regards

Steve

smcardle
Champ in-the-making
Sorry Axel.

I also forgot to mention that this is a 4.2.c community install.

lista
Star Contributor
I don't think that's the right approach. Having that many nodes under a single node will result in poor performance. You should really rethink whether you need this.

smcardle
Champ in-the-making
Hi Lista

Unfortunately we can't change this structure and I would not have thought that this would cause performance issues.

Each client has multiple child folders containing different document types. It's just that we have thousands of Clients under a Clients node.

There is no other logical grouping we can apply without significant rework in the underlying webscripts.

Steve

mitpatoliya
Star Collaborator
Steve
I think you can optimize it like this:

try {
   var existingclient = clients.childByNamePath(child.name);

   if(null != existingclient) {
      // Should not be required as move should return true or false.
      // This does stop move throwing exceptions and causing the whole transaction to abort
      remove = false;
      notMovedClients.push(child.name);
      continue;
   }
   else {
      var moved = child.move(clients);
      movedClients.push(child.name);
   }
}catch(e) {
   // The move should not throw an exception but it does so catch it
   // and treat it as a failure to move
   remove = false;
   notMovedClients.push(child.name);
}


Also, keeping track of the clients that have been moved adds overhead to this script.

One more suggestion: if it gets much more complicated, you can always go for a Java-backed webscript.
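
For anyone who wants to try the check-then-move loop outside Alfresco, below is a rough sketch that runs in any JavaScript engine. `makeFolder`, `addChild` and `moveChildren` are invented stand-ins for this example only. The stand-in mimics just the `name`, `children`, `childByNamePath` and `move` members used in the script above, and it does not reproduce Alfresco's transaction or indexing behaviour.

```javascript
// In-memory stand-in for a folder node (hypothetical, for illustration only).
function makeFolder(name) {
  return {
    name: name,
    parent: null,
    children: [],
    // Look up a direct child by name; returns null when there is none.
    childByNamePath: function (childName) {
      for (var i = 0; i < this.children.length; i++) {
        if (this.children[i].name === childName) {
          return this.children[i];
        }
      }
      return null;
    },
    // Detach from the current parent and attach to the target.
    move: function (target) {
      if (this.parent) {
        var idx = this.parent.children.indexOf(this);
        this.parent.children.splice(idx, 1);
      }
      target.children.push(this);
      this.parent = target;
      return true;
    }
  };
}

// Attach a child to a parent in the stand-in model.
function addChild(parent, child) {
  parent.children.push(child);
  child.parent = parent;
  return child;
}

// Move every child of `branch` into `clients`, skipping name clashes.
function moveChildren(branch, clients) {
  var moved = [];
  var notMoved = [];
  // Copy the child list first, so removing moved children from the
  // branch does not shift the indexes used by the loop.
  var snapshot = branch.children.slice();
  for (var i = 0; i < snapshot.length; i++) {
    var child = snapshot[i];
    if (clients.childByNamePath(child.name) !== null) {
      notMoved.push(child.name); // a client with this name already exists
      continue;
    }
    if (child.move(clients)) {
      moved.push(child.name);
    } else {
      notMoved.push(child.name);
    }
  }
  return { moved: moved, notMoved: notMoved };
}

// Example: client 10002 already exists under Clients, so only 10001 moves.
var branch = makeFolder("AB");
var clients = makeFolder("Clients");
addChild(branch, makeFolder("10001"));
addChild(branch, makeFolder("10002"));
addChild(clients, makeFolder("10002"));
var result = moveChildren(branch, clients);
```

The point of the sketch is the shape of the loop: take the child list once, check for a name clash before calling move, and record each outcome. Whether the pre-check is fast enough against a Clients space holding tens of thousands of children is a separate question that would need measuring on the real repository.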