Store Transactions and Performance (FTP+CIFS)

lessrandom
Champ in-the-making
hello,

we plan to use alfresco as a media store and have large amounts of files transferred via cifs and ftp.
unfortunately our tests show a rather bad performance using these protocols.
i tried to find out what is the reason for this and used a profiling tool to find the methods using most of the cpu time.

we tested this by transferring a directory with many files of about 100k each, comparing it to other protocols.
plain ftp took around 6s
ftp to alfresco took 60s (which is more like scp performance)
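
A comparison like this can be reproduced with a small script that uploads the same set of files to each server and records how long every STOR takes. This is only a minimal sketch using Python's standard ftplib; the file names, the file count and the `run_benchmark` helper are made up for illustration and are not part of the original test.

```python
import io
import time
from ftplib import FTP


def time_uploads(ftp, files):
    """Upload each (name, data) pair with STOR and return (name, seconds) pairs."""
    timings = []
    for name, data in files:
        start = time.perf_counter()
        ftp.storbinary(f"STOR {name}", io.BytesIO(data))
        timings.append((name, time.perf_counter() - start))
    return timings


def summarize(timings):
    """Return the file count, total seconds and mean seconds per file."""
    total = sum(seconds for _, seconds in timings)
    mean = total / len(timings) if timings else 0.0
    return {"files": len(timings), "total_s": total, "mean_s": mean}


def run_benchmark(host, user, password, count=100, size=100 * 1024):
    """Hypothetical helper: upload `count` files of `size` bytes to one server."""
    payload = b"x" * size
    files = [(f"bench_{i:04d}.bin", payload) for i in range(count)]
    with FTP(host) as ftp:
        ftp.login(user, password)
        return summarize(time_uploads(ftp, files))
```

Calling `run_benchmark` once against the plain ftp server and once against the alfresco ftp endpoint, with the same count and size, gives two summaries that can be compared directly. Keeping the per-file timings also shows whether uploads get slower as the run goes on.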

a lot of time is consumed in the AlfrescoDiskDriver, in the methods
closeFile
fileExists
openFile
within closeFile, the time goes to
doInWriteTransaction and RetryingTransactionManager.

so here comes my question:
Is there a way to improve this?
Is a write transaction necessary for each ftp'ed file?
Is there a way to make transactions more responsive for cifs and ftp?

kind regards

randolph
8 REPLIES

mrogers
Star Contributor
First of all 3.4 is much faster, try that over 3.3 to see if it makes a big difference for you.

And CIFS is unlikely to be a good choice to upload content since it is very "chatty".

Mike Farman gave a presentation on this sort of issue at DevCon.

Here's some points from that presentation:

FTP or WebScript would be a better choice since content can simply be streamed to the server.

Also make sure that any processing for example indexing and thumbnailing is running asynchronously rather than in transaction.

For the highest-performance upload, there's a "back door": use some sort of UDP-based protocol to transfer your files into the content store and then use the node service directly to create a node for your content.

kevinr
Star Contributor
FTP seems "slow" because of indexing of content - FTP to a basic file server will obviously not be doing this, Alfresco is doing a LOT more! So yes async indexing will reduce this a lot.

lessrandom
Champ in-the-making
hi again,

thx for your replies.
We moved from CIFS to FTP for our tests to eliminate the chattiness & overhead of CIFS so we could see how big the influence is. Actually performance is a little bit better via FTP but not getting us any closer to the speed of normal FTP.

Also 3.4a and 3.4b did not show much better results (I installed both out of the box, empty, with the bundled mysql-DB and ran the tests). But also in these setups, Alfresco is 6.5-7.5 times slower than direct FTP into the same system (although it was the best performance I was able to achieve so far).

How can I make sure that indexing and thumbnailing are done asynchronously?
What about the transactions that are running on the Alfresco side?

thanks and kind regards,

randolph.

kevinr
Star Contributor
Alfresco will never be as fast as a dumb filesystem, because it does a lot, lot more! FTP is a route into Alfresco, it is never going to be as fast as pure FTP filesystem copy. I'm glad that the move from 3.3->3.4 plus a few tweaks have at least taken you from 10x slower to 5x slower than a filesystem - good news for 3.4 performance.

Start with this global property to enable background indexing of all docs: lucene.maxAtomicTransformationTime=0
The recent Alfresco Devcon covered bulk upload techniques in great detail, including this setting.
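
As a sketch, the setting would go into the global properties file. The file name alfresco-global.properties is an assumption here (the usual place for global overrides), so check where your install keeps it. A restart is normally needed for the change to be picked up.

```properties
# Assumed location: alfresco-global.properties
# 0 moves content indexing into the background instead of doing it in the upload transaction
lucene.maxAtomicTransformationTime=0
```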

Kev

lessrandom
Champ in-the-making
Hi Kev,

thx for the pointer! Setting lucene.maxAtomicTransformationTime=0 improved the performance a little bit, not massively, though. What I can see copying in from Win7 via CIFS into Alfresco is that the speed varies a lot and the copy sometimes even stalls for some (precious) seconds. So, I think there's still quite a bit left for me to do here.

Where can I pick up the presentations from the DevCon, for example the one from Mike Farman?
Working through these documents and trying tweaks would certainly get me ahead …

Thanks and kind regards,

randolph.

norbertveenbrin
Champ in-the-making
Hi there,

did you ever get more info on this? I am actually in the middle of a conversion of legacy documents to Alfresco. I am using FTP as the upload method, however it just gets slower and slower. At this rate it'll take far too long. I have about 10 years of scan-archives to upload in a matter of 2 months, while not interfering with the current new uploads and searches done by users. Funny how managers always assume it all just magically works out according to an imaginary schedule.

At first glance it doesn't seem to be high CPU usage, database load or maxed-out memory. It actually went pretty fast when we started out: approx. 100,000 files in about 6-10 hours, just right for a nightly batch. Now it takes more than 3 secs per file, which means it has to run day and night. Fortunately it doesn't seem to bother the overall performance much, other uploads just pass right through without having to pause the bulk upload, and no user has complained so far about slow performance or unresponsiveness.
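
To put those numbers side by side, here is a quick back-of-the-envelope calculation. It assumes "100.000" means one hundred thousand files and takes 3 secs per file as the degraded rate (the post says "more than", so this is a lower bound).

```python
def seconds_per_file(files, hours):
    """Average seconds spent per file for a batch that took `hours`."""
    return hours * 3600 / files


def hours_for(files, secs_per_file):
    """Hours needed to upload `files` at a fixed per-file cost."""
    return files * secs_per_file / 3600


FILES = 100_000
early_best = seconds_per_file(FILES, 6)    # batch finished in 6 hours
early_worst = seconds_per_file(FILES, 10)  # batch finished in 10 hours
degraded_hours = hours_for(FILES, 3.0)     # same batch at 3 secs per file

print(f"early rate: {early_best:.3f} to {early_worst:.3f} s per file")
print(f"degraded run: {degraded_hours:.1f} hours for {FILES} files")
print(f"slowdown: {3.0 / early_worst:.1f}x to {3.0 / early_best:.1f}x")
```

So the early runs worked out to roughly 0.2 to 0.4 secs per file, and at 3 secs per file the same batch needs more than 83 hours, which is about 8 to 14 times slower than at the start.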

I am at a loss as to where the slowdown may be originating. Possibly the indexing has something to do with it, but I couldn't be sure. Any info on tweaking performance in Bulk uploads would be very welcome.

kevinr
Star Contributor
Can I ask what version of Alfresco you are trying this on? I know for a fact there have been some additional improvements for 4.0, and even more coming.

Cheers,

Kev

norbertveenbrin
Champ in-the-making
Hi Kevin,
we're on version 3.3.4 and will of course be upgrading. However I am at a loss where this performance degradation is coming from. It even went back to the old performance recently, without any changes on our side, only to slowly degrade in the same way. I am now actually almost waiting for it to spontaneously come back to life again. This all feels like there is a leak somewhere that, when pushed hard and long enough, just comes to a point where it starts cleaning up. No idea if this is a garbage-collection issue, a transactional cache problem or something else.

Perhaps you could give me some pointers? Immediate software-upgrade is not an option right now for several reasons. Funny thing is, this could become possible just after the bulk-upload should be completed, thereby making it less urgent. The bulk-upload however was scheduled around a certain performance.