Problem with index recovery time and 32000 file limit

cariou
Champ in-the-making
Hi,

My Lucene index folder has reached the limit of 32000 subdirectories, which is a file system limit (you can find other posts on this subject).

Since I cannot change this limit, I wanted to rebuild my indexes in order to merge all the Lucene indexes into one segment.

So I removed the Lucene index folders, set the recovery mode to FULL (instead of VALIDATE) and started Alfresco (2.1).

The reindexing process starts, but it is extremely slow (3 days in and it has not reached the halfway point…).

So my questions are:
* Why are there so many subfolders in my Lucene index repository? How can I keep only a few subfolders?
* Why is the index recovery so slow? How can I speed it up?

Additional details:
* I have an LDAP authentication synchronization running every 10 minutes (for users and groups). Could that be why the number of index directories is so large? (See the counting sketch below.)
* The alf_transaction table has more than 1 million rows. Is that normal?
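For reference, here is a quick check to see how close each index directory is to the ext3 limit. This is only a rough sketch: the INDEX_ROOT path is from a typical Tomcat bundle layout and is an assumption, so adjust it to your own install.

```python
import os

# Assumed path to the Lucene index root; adjust to your install.
INDEX_ROOT = "/opt/alfresco/alf_data/lucene-indexes"
EXT3_LIMIT = 32000  # ext3 caps a directory at 32000 subdirectories

# Walk every directory under the index root and flag the crowded ones.
for dirpath, dirnames, filenames in os.walk(INDEX_ROOT):
    n = len(dirnames)
    if n > EXT3_LIMIT * 0.5:  # report anything past 50% of the limit
        print("%s: %d subdirectories (%.0f%% of the ext3 limit)"
              % (dirpath, n, 100.0 * n / EXT3_LIMIT))
```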
11 Replies

kumbach
Champ in-the-making
We hit the 32000 file limit too. We're running version 2.2.0 Enterprise on Solaris. There is a bug in that version that causes old Lucene directories to hang around instead of being deleted. We were told it was fixed in 2.2.1E, but when we went to upgrade to that version we discovered that while the Lucene bug was indeed fixed, another bug had been introduced elsewhere that stopped us from upgrading.

For now I go in and manually delete any Lucene directory older than 7 days (a rough sketch of the idea is below). Is that crazy or what? Rebuilding the indexes is not an option for us because it would probably take an entire day, and we can't have the production server down that long. We have a lot of folders, but not a lot of documents.
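In case it helps, here is roughly that idea as a script. It is only a sketch: the INDEX_ROOT path is a placeholder for your own index store, and it runs as a dry run by default, since you should verify a directory really is abandoned (not referenced by the live index) before removing anything.

```python
import os
import shutil
import time

# Placeholder: point this at the index store you are cleaning up.
INDEX_ROOT = "/opt/alfresco/alf_data/lucene-indexes/workspace/SpacesStore"
MAX_AGE_DAYS = 7

cutoff = time.time() - MAX_AGE_DAYS * 24 * 3600

for name in os.listdir(INDEX_ROOT):
    path = os.path.join(INDEX_ROOT, name)
    if not os.path.isdir(path):
        continue
    # Age by last-modified time; directories Alfresco still writes to
    # will have a recent mtime and be skipped.
    if os.path.getmtime(path) < cutoff:
        print("would remove stale index dir: %s" % path)
        # Uncomment only once you have verified the directory is unused:
        # shutil.rmtree(path)
```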

I don't know how such a huge bug could have been released. I'm not too impressed with Alfresco's quality control.

paul_lahitte
Champ in-the-making
I am running Community 2.1 and I have the same problem, and since my repository is 100 GB I can no longer afford reindexing (I tried it over a weekend and after 48 hours it was still not done!). My Lucene index contains 15000 directories and I can't reindex. Could you explain more precisely how you delete the older directories, so I can keep my system running?
I am in the process of upgrading to Labs 3c, but I still get errors (a duplicate row in the database during the schema upgrade)…

rudischmitz
Champ in-the-making
Maybe you could go with ext4 or another less limiting file system for your lucene-indexes folder?

http://www.ibm.com/developerworks/linux/library/l-ext4/

More subdirectories: If you've ever felt constrained by the fact that a directory can only hold 32,000 subdirectories in ext3, you'll be relieved to know that this limit has been eliminated in ext4.

So maybe in the future put your lucene folders on an ext4 partition? 

http://kernelnewbies.org/Ext4#head-97cbed179e6bcc48e47e645e06b95205ea832a68

2.3. Subdirectory scalability
Right now the maximum possible number of subdirectories contained in a single directory in ext3 is 32000. Ext4 breaks that limit and allows an unlimited number of subdirectories.


Ext4 will be the default FS for Linux setups in the future. Ubuntu 9.04 (Jaunty Jackalope) already has it.
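Before repartitioning, you can check which filesystem your lucene-indexes folder sits on today. A minimal sketch (Linux only, reads /proc/mounts; the Alfresco path is just an example):

```python
import os

def fs_type_for(path):
    """Return the filesystem type of the mount that holds `path`."""
    path = os.path.realpath(path)
    best, fstype = "", None
    with open("/proc/mounts") as mounts:
        for line in mounts:
            device, mountpoint, kind = line.split()[:3]
            # Keep the longest mountpoint that is a prefix of our path.
            prefix = mountpoint.rstrip("/") + "/"
            if ((path == mountpoint or path.startswith(prefix))
                    and len(mountpoint) > len(best)):
                best, fstype = mountpoint, kind
    return fstype

# Example path; adjust to your install.
print(fs_type_for("/opt/alfresco/alf_data/lucene-indexes"))
```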

paul_lahitte
Champ in-the-making
Thanks a lot, but I chose Red Hat on purpose as a professional server operating system; I am afraid my system will reach the 32000-directory limit before ext4 is released for it.
I can't believe there is no other solution to this ridiculous bug. The best part is that I know my users rarely, if ever, use the index search!!

Do I have to migrate to SharePoint??

rudischmitz
Champ in-the-making
SharePoint = all your data in a BLOB in the DB. Unless you hack it not to. Forget about your ISO files.

http://sharepointandbeyond.com/2008/04/10/storing-data-outside-sql-server/

jcarrasco
Champ in-the-making
We have the same problem on one of our stacks. Our stack has the following features:

- Cluster mounted on OCFS2
- High volume
- Custom model with custom indexable metadata
- A folder-based hierarchy (4 levels)
- Red Hat
- Alfresco 2.1 E
- JDK 1.5.0_07

Could you tell us your stack's features, so we can try to pin down which one might be the culprit behind so many Lucene files?

paul_lahitte
Champ in-the-making
My stack is (not sure I understand exactly what's needed):

VMware ESX Server on EMC
High volume, 100 GB
No custom model
A folder-based hierarchy, probably more than 10 levels
Red Hat EL5
Alfresco Tomcat Community 2.1.0
JDK 1.6.0


I am quite fed up: I tried to upgrade (it fails due to an unclear duplicate entry in the database), then I tried to export the full repository (it's huge) and import it into Labs 3c, but that failed after 24 hours (I read in the forum that there is a bug importing packages bigger than 4 GB…).
Now I am trying to install this on a release supporting ext4, but so far nothing is working and the Lucene directories keep growing and growing…. So far I haven't found any way out other than reindexing (which would take days that we can't afford on a production system), and even that only buys more time…

I have found that ReiserFS can handle about 65000 subdirectories, so to buy more time I will back up, reformat and restore this ext3 file system…

ra74
Champ in-the-making
I believe there should be a way to solve this issue. We have a system that reads all the content from Alfresco, and that content is indexed by Lucene + Compass. I'd have to check, but as I remember there are about 10 index files in total. Tools like Luke can be used to view and manage the indexes. I don't know how it works in Alfresco, but there are a lot of directories and I couldn't use Luke.
Moreover, we have a severe problem right now: a copy of the production system was made and the backup Lucene directory was renamed to lucene-indexes, but Alfresco cannot start. Reindexing the content is not an option as it takes a few days, and this is our test system. From what I learnt, the startup checker uses Lucene to search for all the stores' roots, and they cannot be found. I would say the Alfresco setup itself is correct.

paul_lahitte
Champ in-the-making
The only workaround that I have found is to migrate the machine from Red Hat to SUSE (which supports ReiserFS), recreate the file system with ReiserFS and restore the data!