
Bulk uploading using web scripts

mrsaqib
Champ in-the-making
Hi,

I have written a web script that is called every minute by a cron job. The script uploads the files from one sub-folder of a directory I have defined; each sub-folder contains around 2000 files to process.
Because the script is called automatically every minute, regardless of whether the previous run has finished, the job is effectively parallelized: each minute's call starts processing its own sub-folder.

My question is: is this approach of running the same script every minute preferable, or should I restrict it to a single execution of the script and do the parallel processing with Java threads?

I hope you got my point. :)

Regards
3 replies

openpj
Elite Collaborator
I suggest working in a transactional context in your WebScripts. That is what you get by default when the "transaction" property in the descriptor file is set to "required" or "requiresnew".

If you would like to work in a way similar to Java threads, you could set "requiresnew", and the Spring context will create a new transaction for each WebScript call.

Remember that setting the "transaction" property to "none" means you are working without any transactional support, so some critical operations could corrupt the repository. This setting can still be useful if you need to manage transactions explicitly with Alfresco's TransactionService, deciding yourself when the container must roll back all the operations. This is only possible with the Alfresco Java API (Foundation Services API).

By default, WebScripts are transactional (when the transaction property is "requiresnew" or "required") for ALL the operations executed inside them: if even one operation fails, all the changes are rolled back.
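To make the all-or-nothing behaviour concrete, here is a plain-Java model of it. It is not Alfresco's TransactionService or any Alfresco API; the class and method names are hypothetical. Uploads are staged and only published to a stand-in "repository" list if every file succeeds. With "requiresnew" each WebScript call gets its own transaction, so in the setup from the question a failing file would roll back only that call's sub-folder rather than everything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java model of transaction semantics; NOT an Alfresco API.
final class TransactionModel {

    /**
     * One transaction around a whole list of files. upload returns true if a
     * file's upload succeeded; each success is staged, and the staged changes
     * reach the repository only if every upload succeeds.
     * Returns the number of files committed (0 after a rollback).
     */
    static int runInOneTransaction(List<String> files, Predicate<String> upload,
                                   List<String> repository) {
        List<String> staged = new ArrayList<>();
        for (String file : files) {
            if (!upload.test(file)) {
                return 0; // a single failure discards everything staged so far
            }
            staged.add(file);
        }
        repository.addAll(staged); // commit
        return staged.size();
    }

    /**
     * One transaction per call, modelling "requiresnew" with one sub-folder
     * per WebScript call: a failure rolls back only that call's sub-folder.
     */
    static int runOneTransactionPerCall(List<List<String>> subFolders,
                                        Predicate<String> upload,
                                        List<String> repository) {
        int committed = 0;
        for (List<String> folder : subFolders) {
            committed += runInOneTransaction(folder, upload, repository);
        }
        return committed;
    }
}
```

For example, with sub-folders {a1, a2, a3} and {b1, bad, b3} and an upload that fails only for "bad", one transaction per call commits the three files of the first folder, while wrapping both folders in a single transaction commits nothing.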

Hope this helps ;)

mrogers
Star Contributor
The problem with adding a new job every minute is that if each job takes longer than a minute, the jobs pile up and you will eventually have a problem. You may also get contention between the concurrent jobs, so your throughput drops. However, some degree of parallel processing is essential for the highest throughput.

If the traffic tends to even out then this may not be a problem.   
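One way to keep per-minute jobs from piling up is to cap how many may overlap: each cron-triggered invocation first tries to take a permit and, if none is free, returns immediately instead of starting yet another job. This is a minimal sketch using java.util.concurrent.Semaphore; the class name and the cap are hypothetical, and it only works if the same throttle instance is shared by all invocations (for example held by a singleton bean, which is an assumption about how the web script is wired).

```java
import java.util.concurrent.Semaphore;

final class UploadThrottle {
    // Hypothetical cap on how many upload jobs may run at the same time.
    private final Semaphore permits;

    UploadThrottle(int maxConcurrentJobs) {
        this.permits = new Semaphore(maxConcurrentJobs);
    }

    /**
     * Called on every cron tick. Runs the job on the calling thread only if a
     * permit is free; otherwise skips this tick instead of adding another
     * overlapping job. Returns true if the job ran.
     */
    boolean runIfCapacity(Runnable job) {
        if (!permits.tryAcquire()) {
            return false; // too many jobs still running: skip this minute
        }
        try {
            job.run();
            return true;
        } finally {
            permits.release(); // free the slot even if the job throws
        }
    }
}
```

With a cap of, say, 3, at most three jobs overlap, and later ticks are skipped until one of them finishes.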

If I were building it, I'd go for a single multi-threaded process or consider some sort of "throttle" somewhere.
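A single multi-threaded process can be sketched with a fixed-size ExecutorService: one task per sub-folder, with the pool size acting as the throttle on how many folders are processed at once. uploadFolder is a hypothetical stand-in for the per-folder upload logic. In Alfresco, each worker thread would likely also need its own transaction and authentication context, which this sketch does not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class BulkUploader {
    /**
     * Processes every sub-folder with at most poolSize folders in flight and
     * waits for all of them. uploadFolder (hypothetical) returns the number
     * of files it uploaded; the method returns the total.
     */
    static int uploadAll(List<String> subFolders, int poolSize,
                         Function<String, Integer> uploadFolder)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (String folder : subFolders) {
                results.add(pool.submit(() -> uploadFolder.apply(folder)));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // rethrows a folder's failure as ExecutionException
            }
            return total;
        } finally {
            pool.shutdown(); // no new tasks; already-submitted ones still finish
        }
    }
}
```

For example, uploadAll(subFolders, 4, folder -> uploadFolder(folder)) keeps at most four sub-folders in progress; the pool size is the knob to tune against contention in the repository.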

mrsaqib
Champ in-the-making
Yes, my job (one execution of the script) takes more than a minute. It depends on the number of files, but in most cases it takes many minutes. Each execution of the job has different data to process, but they all eventually upload to the same Alfresco repository.
So far I have had no issues with conflicts between jobs, but I am definitely concerned about the approach and want to make it better (more efficient).
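Since every run must work on different data, one way to guarantee that two overlapping runs never pick the same sub-folder is to claim a folder by atomically renaming it before processing. This is a minimal sketch assuming the sub-folders live on a local file system that supports atomic renames; the ".processing" suffix and the class name are hypothetical, and the thread does not say how folders are currently assigned to runs.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class FolderClaimer {
    // Hypothetical marker appended to a sub-folder once a run has claimed it.
    static final String CLAIM_SUFFIX = ".processing";

    /**
     * Claims the first unclaimed sub-folder of inbox by renaming it with
     * CLAIM_SUFFIX. An atomic rename of the same source can succeed for only
     * one caller, so overlapping runs never get the same sub-folder.
     * Returns the claimed (renamed) path, or empty if nothing is left.
     */
    static Optional<Path> claimNext(Path inbox) throws IOException {
        List<Path> candidates;
        try (Stream<Path> children = Files.list(inbox)) {
            candidates = children
                    .filter(Files::isDirectory)
                    .filter(p -> !p.getFileName().toString().endsWith(CLAIM_SUFFIX))
                    .sorted()
                    .collect(Collectors.toList());
        }
        for (Path dir : candidates) {
            Path claimed = dir.resolveSibling(dir.getFileName() + CLAIM_SUFFIX);
            try {
                Files.move(dir, claimed, StandardCopyOption.ATOMIC_MOVE);
                return Optional.of(claimed);
            } catch (NoSuchFileException | FileAlreadyExistsException e) {
                // Another run claimed this folder first; try the next one.
            }
        }
        return Optional.empty();
    }
}
```

Each per-minute call would start with claimNext(inbox) and simply return if the result is empty, so a slow run never causes a later run to reprocess its folder.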