Performance fluctuation

ustraub
Champ in-the-making
Hi,

We are using Alfresco Community Edition 4.0.e on Linux (Red Hat). Alfresco, Tomcat and PostgreSQL were freshly installed from the Alfresco Community Edition 4.0.e distribution.

We want to migrate several million files (file size 1 KB to 1 MB or so) from another DMS to this new Alfresco installation. We access Alfresco through CMIS (Apache OpenCMIS 0.7).

When archiving, I see extreme performance variations. In about two thirds of the uploads, creating a CMIS object takes on the order of 0.4 s; in the remaining third it takes 3 to 6 s.
More precisely: either session.getObjectFactory().createContentStream() or parent.createDocument() takes 3 to 6 s in about one third of all calls.

When retrieving, I see similar variations. Retrieving a file usually takes 0.1 s, but regularly, about every 4 s, accessing the corresponding CMIS object takes about 3 s.
More precisely: either session.getObjectByPath() or doc.getContentStreamOfDocument() sometimes takes 3 s.
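
Timings like the ones above can be collected per call with a small wrapper. The following is only a minimal sketch in plain Java, not the code used for the measurements above: the class name `CallTimer` and its methods are hypothetical, and the CMIS call is passed in as a `Callable` so the sketch does not depend on OpenCMIS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical helper: times individual calls and counts the slow ones. */
final class CallTimer {
    private final long slowThresholdMillis;
    private final List<Long> durationsMillis = new ArrayList<>();

    CallTimer(long slowThresholdMillis) {
        this.slowThresholdMillis = slowThresholdMillis;
    }

    /** Runs the call, records its wall-clock duration, and returns its result. */
    <T> T time(Callable<T> call) throws Exception {
        long start = System.nanoTime();
        try {
            return call.call();
        } finally {
            // Recorded even if the call throws, so failures are not hidden.
            record((System.nanoTime() - start) / 1_000_000);
        }
    }

    /** Records an already measured duration in milliseconds. */
    void record(long millis) {
        durationsMillis.add(millis);
    }

    int totalCalls() {
        return durationsMillis.size();
    }

    /** Number of recorded calls that took at least the threshold. */
    long slowCalls() {
        return durationsMillis.stream().filter(d -> d >= slowThresholdMillis).count();
    }

    /** Fraction of recorded calls that were slow, or 0.0 if nothing was recorded. */
    double slowFraction() {
        return durationsMillis.isEmpty() ? 0.0 : (double) slowCalls() / totalCalls();
    }
}
```

For example, `timer.time(() -> parent.createDocument(properties, contentStream, VersioningState.MAJOR))` would time one upload, with `parent`, `properties` and `contentStream` being the caller's own OpenCMIS objects. Using one timer per CMIS call makes it visible which of the calls accounts for the slow cases.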

Questions:
What is the reason for these fluctuations?
How can they be avoided?
Is concurrent archiving into the same folder possible, or does one archive operation block another concurrent archive operation on the same folder?

Regards
U.Straub
1 REPLY

iblanco
Confirmed Champ
I think that in the scenario you describe, several causes may be contributing to this fluctuation at the same time. There is not enough information to determine exactly which ones apply, but I would consider at least these:

<ul>
<li>Disk I/O on the destination machine. The system memory cache might make the first writes fast and later ones slow.</li>
<li>Database throughput might change over time because of statistics calculations related to the newly generated load. This used to happen to me with MySQL, and it was "solved" by re-analyzing/optimizing the tables.</li>
<li>Database transaction resolution time might change. If you are doing the load in parallel, it is possible that transactions compete with each other.</li>
<li>Lucene index locking. I think writing to a Lucene index is a serialized operation, so there might be a bottleneck there. Disabling indexing during the load and regenerating the index afterwards might be an option to consider.</li>
</ul>
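
To check whether competing parallel transactions play a role, one could run the same load with different numbers of worker threads and compare the timings. A minimal sketch, assuming each upload is wrapped in a `Callable`; the class `BoundedLoader` and the method `runAll` are hypothetical names, and nothing here is Alfresco or CMIS specific.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: runs tasks with a fixed upper bound on parallelism. */
final class BoundedLoader {

    /** Runs all tasks on at most {@code workers} threads; results are in task order. */
    static <T> List<T> runAll(List<Callable<T>> tasks, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every task has completed and keeps the input order.
            for (Future<T> future : pool.invokeAll(tasks)) {
                results.add(future.get()); // rethrows a task failure as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Running with `workers = 1` gives a serial baseline; if the slow calls disappear there and come back with more workers, contention between parallel transactions is a likely contributor.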

With the information you have provided, we can't be more precise.

But anyway, if you are doing such a big file load, why are you using CMIS? Wouldn't it be much better to use one of the specialized tools, like <a href="https://code.google.com/p/alfresco-bulk-filesystem-import/">the bulk filesystem import tool</a>?

This tool batches the content so that fewer transactions are generated. It can use side files with metadata if required, and it can even do in-place ingestion, so that you can "easily load" files into Alfresco once they are accessible to your server in the content store.
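
The batching idea can be sketched as follows. This is not the bulk import tool's actual implementation, only an illustration of splitting a list of items into fixed-size batches so that each batch could be handled in one transaction; the class `Batcher` and the method `partition` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a list into consecutive batches. */
final class Batcher {

    /** Returns consecutive batches of at most {@code batchSize} items each. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sublist so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

With several million files and a batch size of, say, 100, the number of transactions drops by roughly two orders of magnitude compared with one transaction per file.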

I think the import tool is already part of Alfresco 4.0, but the one in the Community version does not allow in-place ingestion. If you install the module from Google Code instead, it does support in-place loading.

Hope it helps.