Problem with index recovery time and 32000 file limit

cariou
Champ in-the-making
Hi,

My Lucene index folder has reached the limit of 32,000 subdirectories, which is a system limit (there are other posts on this subject).

As I cannot change this limit, I wanted to rebuild my indexes in order to merge all the Lucene indexes into one segment.

So I removed the Lucene index folders, set the recovery mode to FULL (instead of VALIDATE) and started Alfresco (2.1).

The reindexing process started, but it is very, very slow (3 days in and not even halfway…).

So my questions are:
* Why are there so many subfolders in my Lucene index directory? How can I end up with only a few subfolders?
* Why is FULL index recovery so slow? How can I speed it up?

Details:
* I have an LDAP authentication synchronization running every 10 minutes (for users and groups). Could this be the reason there are so many indexes?
* The alf_transaction table has more than 1 million rows. Is that normal?
11 replies

lotharm
Champ on-the-rise
This wiki page explains very well how the index is designed and how it is built on top of Lucene:
http://wiki.alfresco.com/wiki/Index_Version_2

To sum it up:
Basically, there is one Lucene index per transaction, holding that transaction's small index delta. That in turn means one directory per transaction in the alf_data/lucene-indexes/workspace directory. A background job merges the deltas into the "main" Lucene index and removes each delta index and its directory. Perhaps an Alfresco developer could tell us more about this?
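A toy Java model can make the growth problem concrete. The class below (DeltaMergeSimulation, hypothetical, not Alfresco code) assumes each transaction creates exactly one delta directory and the background job removes a fixed number of merged deltas per time step. Under those assumptions the directory count only stays bounded while merging keeps pace with transactions; if merging falls behind or stops, the backlog grows linearly, which is one way to run into a 32,000 subdirectory limit.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Toy model (hypothetical, not Alfresco code): each committed transaction
 * adds one delta index directory, and a background merger folds the oldest
 * deltas into the main index and deletes their directories.
 */
public class DeltaMergeSimulation {

    /**
     * Returns how many delta directories are left on disk after the run.
     *
     * @param ticks         number of time steps to simulate
     * @param txPerTick     transactions committed (deltas created) per tick
     * @param mergedPerTick deltas the merger can fold into the main index per tick
     * @param mergerRunning false models a background merge that has failed
     */
    public static int simulate(int ticks, int txPerTick, int mergedPerTick, boolean mergerRunning) {
        Deque<Integer> deltaDirs = new ArrayDeque<>();
        int nextTxId = 0;
        for (int t = 0; t < ticks; t++) {
            // Each transaction writes its own delta directory.
            for (int i = 0; i < txPerTick; i++) {
                deltaDirs.addLast(nextTxId++);
            }
            // The merger removes the oldest deltas once they are in the main index.
            if (mergerRunning) {
                for (int i = 0; i < mergedPerTick && !deltaDirs.isEmpty(); i++) {
                    deltaDirs.removeFirst();
                }
            }
        }
        return deltaDirs.size();
    }

    public static void main(String[] args) {
        // Merger keeps up: no backlog.
        System.out.println(simulate(1000, 5, 10, true));
        // Merger too slow: backlog grows by 2 directories per tick.
        System.out.println(simulate(1000, 5, 3, true));
        // Merger stopped: every transaction leaves a directory behind.
        System.out.println(simulate(1000, 5, 10, false));
    }
}
```

The three runs in main print 0, 2000 and 5000 respectively.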

All index directories in use are referenced from the IndexInfo file. I believe any directories not listed there can be removed safely. If such directories exist, there is an issue: leftovers from stopped transactions or the like are not being cleaned up.
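If you can obtain the list of directory names that IndexInfo references (by whatever means you dump it), a small report-only Java sketch can count the subdirectories and list the unreferenced ones. The class name and the input file format (one referenced name per line) are hypothetical; the program deletes nothing, so the list can be reviewed before any manual cleanup.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnreferencedIndexDirs {

    /** Sorted names of the immediate subdirectories of indexDir (files are ignored). */
    public static List<String> listSubdirectories(File indexDir) {
        List<String> names = new ArrayList<>();
        File[] children = indexDir.listFiles();
        if (children != null) {
            for (File child : children) {
                if (child.isDirectory()) {
                    names.add(child.getName());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Subdirectories whose names are not in the referenced set. */
    public static List<String> findUnreferenced(File indexDir, Set<String> referenced) {
        List<String> result = new ArrayList<>();
        for (String name : listSubdirectories(indexDir)) {
            if (!referenced.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: UnreferencedIndexDirs <index-dir> <referenced-names.txt>");
            System.exit(1);
        }
        File indexDir = new File(args[0]);
        // Hypothetical input: one referenced directory name per line,
        // e.g. transcribed from a dump of the IndexInfo file.
        Set<String> referenced = new HashSet<>();
        for (String line : Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8)) {
            String name = line.trim();
            if (!name.isEmpty()) {
                referenced.add(name);
            }
        }
        List<String> all = listSubdirectories(indexDir);
        List<String> orphans = findUnreferenced(indexDir, referenced);
        System.out.println("subdirectories: " + all.size());
        System.out.println("not referenced: " + orphans.size());
        for (String name : orphans) {
            System.out.println("  " + name);
        }
    }
}
```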

Sadly, I do not know of a small CLI tool to dump the contents of the binary IndexInfo file (well, hexdump might work…).

lothar

andy
Champ on-the-rise
Hi

The IndexInfo class has a main method that shows how to dump the index info.

Index rebuild is now multi-threaded.
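As a generic illustration of a multi-threaded rebuild (not Alfresco's actual reindexing code), the sketch below spreads per-transaction work over a fixed thread pool from java.util.concurrent. reindexTransaction is a hypothetical placeholder that only counts calls, and splitting the work by transaction is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelReindexSketch {

    // Hypothetical stand-in for re-indexing the nodes touched by one transaction.
    static void reindexTransaction(long txId, AtomicInteger done) {
        done.incrementAndGet();
    }

    /** Reindexes every transaction ID on a fixed-size pool; returns how many were processed. */
    public static int reindexAll(List<Long> txIds, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger done = new AtomicInteger();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Long txId : txIds) {
                futures.add(pool.submit(() -> reindexTransaction(txId, done)));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for completion and rethrow any worker failure
            }
        } finally {
            pool.shutdown();
        }
        return done.get();
    }

    public static void main(String[] args) throws Exception {
        List<Long> txIds = new ArrayList<>();
        for (long id = 1; id <= 1000; id++) {
            txIds.add(id);
        }
        System.out.println(reindexAll(txIds, 4)); // prints 1000
    }
}
```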

Do you have any custom code that uses ResultSets? If so, make sure they get closed.
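The usual way to guarantee that is to close the result set in a finally block. The sketch uses a hypothetical SearchResults interface, TrackedResults class and query method so that it compiles on its own; with real search code the same try/finally shape would wrap whatever result set the search returns. Why unclosed result sets matter here is not spelled out above; presumably an open result set keeps index resources in use, so the delta directories behind it cannot be cleaned up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CloseResultSetSketch {

    /** Hypothetical stand-in for a search result set that holds index resources. */
    interface SearchResults {
        List<String> rows();
        void close();
    }

    /** Hypothetical implementation that tracks how many result sets are still open. */
    static class TrackedResults implements SearchResults {
        static int open = 0;
        private final List<String> rows;
        private boolean closed = false;

        TrackedResults(List<String> rows) {
            this.rows = rows;
            open++;
        }

        public List<String> rows() {
            return rows;
        }

        public void close() {
            if (!closed) {
                closed = true;
                open--;
            }
        }
    }

    // Hypothetical query method; a real one would search the Lucene index.
    static SearchResults query(String q) {
        return new TrackedResults(Arrays.asList(q + "-1", q + "-2"));
    }

    /** Copies the needed data out of the results and always closes them. */
    public static List<String> search(String q) {
        SearchResults results = query(q);
        try {
            return new ArrayList<>(results.rows());
        } finally {
            // Runs even if reading the rows throws.
            results.close();
        }
    }

    public static void main(String[] args) {
        System.out.println(search("doc"));
        System.out.println("still open: " + TrackedResults.open);
    }
}
```

Running main prints [doc-1, doc-2] followed by still open: 0.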

It is also possible that background merging has failed (it does not recover well in some versions), or that you do not have enough disk space for the next index merge.
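To check the disk-space explanation, a quick Java sketch can compare the size of the index directory with the usable space on its volume. The safety factor (how many times the current index size a merge might need) is an assumption you choose, not a documented Alfresco or Lucene figure, and the class and method names are hypothetical.

```java
import java.io.File;

public class MergeSpaceCheck {

    /** Total size in bytes of all regular files under dir, recursively. */
    public static long directorySize(File dir) {
        File[] children = dir.listFiles();
        if (children == null) {
            return 0;
        }
        long total = 0;
        for (File child : children) {
            if (child.isDirectory()) {
                total += directorySize(child);
            } else {
                total += child.length();
            }
        }
        return total;
    }

    /**
     * True if the volume holding indexDir has at least safetyFactor times the
     * current index size available. The factor is an assumed heuristic.
     */
    public static boolean enoughSpaceForMerge(File indexDir, double safetyFactor) {
        long needed = (long) (directorySize(indexDir) * safetyFactor);
        return indexDir.getUsableSpace() >= needed;
    }

    public static void main(String[] args) {
        File indexDir = new File(args.length > 0 ? args[0] : ".");
        System.out.println("index size (bytes): " + directorySize(indexDir));
        System.out.println("usable space (bytes): " + indexDir.getUsableSpace());
        System.out.println("enough for merge (2x assumed): " + enoughSpaceForMerge(indexDir, 2.0));
    }
}
```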

Andy