
Indexing questions...

jonnycat26
Champ in-the-making
Hi,

We've been using Alfresco as our school CMS, and yesterday we experienced a problem that I suspect is related to indexing. I was hoping for some input.

We had a user upload a bit over 800MB of files via SMB, and immediately afterwards the CPU spiked and the machine became unresponsive. I suspect it's because Alfresco/Lucene was busy indexing the files, but I wanted to see if anyone else had any guesses.

Are there some settings I can use to mitigate the impact of a massive upload like this?  Teachers will continue to do mass migrations like this, and I'd like to have the indexer act a little less aggressively when new files are introduced.

Any input?

ebo
Champ in-the-making
I'm not an expert myself, but this may be useful:

C:\Alfresco\tomcat\webapps\alfresco\WEB-INF\classes\alfresco\repository.properties

This file has a bunch of lucene index properties in it. Also, check out

http://wiki.alfresco.com/wiki/Data_Dictionary_Guide

and search for the word "index". You can set atomic to false so that Lucene indexing runs in the background.
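For illustration, here is a minimal Python sketch of that change. It assumes the content-model XML layout described in the Data Dictionary Guide (a property's index element containing an atomic child) and the dictionary namespace http://www.alfresco.org/model/dictionary/1.0; the function name make_non_atomic and the sample model are hypothetical, so check the element names against your own model file before relying on it.

```python
import xml.etree.ElementTree as ET

# Namespace of Alfresco content-model XML files (assumed; confirm against your model's root element).
DICT_NS = "http://www.alfresco.org/model/dictionary/1.0"
NS = {"dd": DICT_NS}


def make_non_atomic(model_xml: str, property_name: str) -> str:
    """Hypothetical helper: set <index><atomic> to false for one property so its
    content is indexed in the background instead of inside the upload transaction."""
    ET.register_namespace("", DICT_NS)  # keep the default namespace on output
    root = ET.fromstring(model_xml)
    found = False
    for prop in root.iterfind(".//dd:property", NS):
        if prop.get("name") != property_name:
            continue
        atomic = prop.find("dd:index/dd:atomic", NS)
        if atomic is None:
            raise ValueError(f"{property_name} has no <index><atomic> element")
        atomic.text = "false"
        found = True
    if not found:
        raise ValueError(f"property {property_name} not found")
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal model, trimmed to the parts this sketch touches.
SAMPLE_MODEL = """<model name="my:sample" xmlns="http://www.alfresco.org/model/dictionary/1.0">
  <types><type name="my:doc"><properties>
    <property name="cm:content">
      <type>d:content</type>
      <index enabled="true"><atomic>true</atomic><stored>false</stored><tokenised>true</tokenised></index>
    </property>
  </properties></type></types>
</model>"""

if __name__ == "__main__":
    print("<atomic>false</atomic>" in make_non_atomic(SAMPLE_MODEL, "cm:content"))  # True
```

Note that ElementTree discards XML comments when it parses a file, so for a heavily commented model, editing the atomic element by hand may be the safer option.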

Hope this helps and puts you on the right path.

andy
Champ on-the-rise
Hi

CIFS indexes as it goes (it does not treat the upload as one big transaction and index everything at the end). Content will most likely be indexed in the background, file by file, after a delay of up to a minute.

So the load is more likely to come from things like converting content to text in the background or running rules. Background indexing is fine unless you have some big conversion that eats all your available memory.
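To make "background" concrete: work is queued when a file arrives, and a worker picks it up later. The Python sketch below is a generic illustration of that pattern, with a pause between documents as one way of making background work less aggressive. It is not Alfresco's implementation; ThrottledBackgroundIndexer, index_fn and min_interval are hypothetical names.

```python
import queue
import threading
import time
from typing import Callable


class ThrottledBackgroundIndexer:
    """Hypothetical sketch: one worker thread drains a queue of documents,
    pausing between jobs so conversion/indexing does not saturate the CPU."""

    def __init__(self, index_fn: Callable[[str], None], min_interval: float = 0.5):
        self._index_fn = index_fn          # converts and indexes one document
        self._min_interval = min_interval  # seconds to pause after each document
        self._queue: "queue.Queue[str]" = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, doc_id: str) -> None:
        # The upload returns immediately; the work happens later in the background.
        self._queue.put(doc_id)

    def wait_until_idle(self) -> None:
        # Blocks until every submitted document has been processed.
        self._queue.join()

    def _run(self) -> None:
        while True:
            doc_id = self._queue.get()
            try:
                self._index_fn(doc_id)
            except Exception as exc:  # one bad file must not stop the worker
                print(f"indexing {doc_id} failed: {exc}")
            finally:
                self._queue.task_done()
            time.sleep(self._min_interval)  # throttle: yield the CPU between jobs


if __name__ == "__main__":
    indexed = []
    indexer = ThrottledBackgroundIndexer(indexed.append, min_interval=0.01)
    for name in ["a.doc", "b.doc", "c.doc"]:
        indexer.submit(name)
    indexer.wait_until_idle()
    print(indexed)  # ['a.doc', 'b.doc', 'c.doc']
```

A single worker keeps documents in arrival order and caps the work to one conversion at a time; the trade-off is that a large upload takes longer to become searchable.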

Are you using OpenOffice for conversions?
What kind of docs were loaded?
Can you report the memory, CPU and IO load?
Have you got any rules/actions set up?
Can you get a snapshot of what is going on using jstack?
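For the jstack question, a small Python sketch like the one below can summarize a thread dump. It runs the JDK's jstack against a JVM process id and counts threads per java.lang.Thread.State; RUNNABLE threads are the likeliest candidates for a CPU spike, though RUNNABLE also covers threads blocked in native I/O. The script and function names are hypothetical, and taking a few dumps several seconds apart gives a more reliable picture than a single one.

```python
import re
import subprocess
import sys
from collections import Counter
from typing import List, Tuple

# A thread header line such as:  "worker-1" #12 prio=5 tid=0x... nid=0x... runnable
# is followed by an indented line:   java.lang.Thread.State: RUNNABLE
THREAD_RE = re.compile(r'^"([^"]+)".*\n\s+java\.lang\.Thread\.State: (\w+)', re.MULTILINE)


def take_thread_dump(pid: int) -> str:
    """Run the JDK's jstack tool against a running JVM and return the dump text."""
    result = subprocess.run(["jstack", str(pid)], capture_output=True, text=True, check=True)
    return result.stdout


def parse_threads(dump: str) -> List[Tuple[str, str]]:
    """Return (thread name, state) pairs found in a jstack dump."""
    return THREAD_RE.findall(dump)


def summarize(dump: str) -> Tuple[Counter, List[str]]:
    """Count threads per state and list the names of RUNNABLE threads."""
    threads = parse_threads(dump)
    states = Counter(state for _, state in threads)
    runnable = [name for name, state in threads if state == "RUNNABLE"]
    return states, runnable


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python thread_states.py <pid of the Alfresco/Tomcat JVM>
    counts, busy = summarize(take_thread_dump(int(sys.argv[1])))
    for state, n in counts.most_common():
        print(f"{state:15} {n}")
    print("RUNNABLE:", ", ".join(busy))
```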

Andy

jonnycat26
Champ in-the-making

Andy,

Yes, we are using OpenOffice for conversions, and these were all Word docs. We had about 800MB uploaded in roughly 12 minutes. I can't report specifically on the CPU and IO load because we're running Alfresco in an ESX cluster, and I find the reported figures less than totally reliable.
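For scale, those figures work out to a sustained rate of a bit over 1 MB/s, and every one of those Word documents also has to go through OpenOffice for text extraction. The quick Python arithmetic below shows this; the average document size is a hypothetical placeholder, not a number from the thread.

```python
# Figures from the post above: roughly 800 MB of Word documents in about 12 minutes.
uploaded_mb = 800
upload_seconds = 12 * 60

rate_mb_s = uploaded_mb / upload_seconds
print(f"{rate_mb_s:.2f} MB/s")  # 1.11 MB/s

# Hypothetical average document size; substitute the real figure from the upload.
avg_doc_mb = 0.2
docs_per_second = rate_mb_s / avg_doc_mb
print(f"{docs_per_second:.1f} documents/s to convert")  # 5.6 documents/s to convert
```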

We have 768MB of RAM on the box, and we're running Alfresco with its default out-of-the-box config. I'm starting to think we need to allocate more memory to the box and the JVM, but that's just a guess, and any input is appreciated.

-jon

andy
Champ on-the-rise
Hi

Is the database on the same machine/vmware image?

I think you need more memory and, if possible, a few more CPUs, especially if the DB is on the same machine. If you only have 768M on the machine or VMWare image, you will likely end up swapping in VMWare.

VMWare IO access will be slow compared with native IO. The DB and lucene indexing will be affected by this. If you go over your memory size and end up swapping then my guess is everything will stop.
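A rough memory budget makes the swapping risk easier to see. The Python sketch below subtracts the JVM heap, non-heap JVM memory, the OpenOffice conversion process and the OS from the VM's RAM; a negative result means the image will swap. Only the 768 MB and 2 GB / 1.2 GB figures come from this thread; the 512 MB heap and every overhead number are hypothetical estimates to replace with measured values.

```python
from dataclasses import dataclass


@dataclass
class MemoryBudget:
    """Rough RAM budget for a single Alfresco VM. All overhead figures are estimates."""
    physical_mb: int    # RAM assigned to the VM
    jvm_heap_mb: int    # the JVM's -Xmx setting
    jvm_other_mb: int   # non-heap JVM memory: permgen, thread stacks, native buffers (estimate)
    openoffice_mb: int  # the OpenOffice process doing document conversions (estimate)
    os_mb: int          # operating system and any other services (estimate)

    def headroom_mb(self) -> int:
        """RAM left after everything is resident; negative means the VM will swap."""
        used = self.jvm_heap_mb + self.jvm_other_mb + self.openoffice_mb + self.os_mb
        return self.physical_mb - used

    def will_swap(self) -> bool:
        return self.headroom_mb() < 0


# 768 MB box from the thread; the 512 MB heap and all overheads are hypothetical.
small = MemoryBudget(physical_mb=768, jvm_heap_mb=512, jvm_other_mb=128, openoffice_mb=200, os_mb=256)
# 2 GB box with a ~1.2 GB heap, as described later in the thread; overheads hypothetical.
larger = MemoryBudget(physical_mb=2048, jvm_heap_mb=1200, jvm_other_mb=128, openoffice_mb=200, os_mb=256)

print(small.headroom_mb(), small.will_swap())    # -328 True
print(larger.headroom_mb(), larger.will_swap())  # 264 False
```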

Andy

jonnycat26
Champ in-the-making

Andy,

No, the database is on another machine. We just reconfigured the machine, upped the RAM to 2GB and gave the JVM 1.2GB of memory. FYI, it's configured as a 2-CPU machine and the rest of the cluster is pretty idle, so this machine isn't being starved for CPU.

We're still having issues, though, when uploading large numbers of files via CIFS. The CPU spikes for each file we upload, and after a while the entire JVM comes down. Is there any way I can tune the indexing (I have already set atomic to false) to make it much, much, much less aggressive?

bejelith
Champ in-the-making
Same problem here!!