Problem with index recovery time and 32000 file limit
‎12-01-2008 08:49 AM
My Lucene index folder has reached the limit of 32000 subdirectories, which is a system limit (you can find other posts about this subject).
As I cannot, and do not intend to, change this limit, I wanted to rebuild my indexes in order to merge all the Lucene indexes into one segment.
So I removed the Lucene index folders, set the recovery mode to FULL (instead of VALIDATE) and started Alfresco (2.1).
The reindexing process started but it is very, very slow (3 days and it has not reached the halfway point…).
So my questions are:
* Why are there so many subfolders in my Lucene index repository? How can I keep it down to only a few subfolders?
* Why is index recovery so slow? How can I speed it up?
Details:
* I have an LDAP authentication synchronization running every 10 minutes (for users and groups). Could that be the reason why the number of indexes is so large?
* The alf_transaction table has more than 1 million rows. Is that normal?
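
For anyone who wants to see how close they are to the limit before it bites, here is a minimal sketch in Python. It only counts the immediate subdirectories of a folder and compares that to the 32000 figure quoted in this thread. The function names and the limit constant are my own illustration, not anything Alfresco ships, and you would have to pass in your own index path.

```python
import os

# Subdirectory limit as quoted in this thread for ext3 (assumption: adjust for your filesystem).
EXT3_SUBDIR_LIMIT = 32000


def count_subdirs(path):
    """Count the immediate subdirectories of `path` (symlinks are not followed)."""
    with os.scandir(path) as entries:
        return sum(1 for entry in entries if entry.is_dir(follow_symlinks=False))


def headroom(path, limit=EXT3_SUBDIR_LIMIT):
    """Return how many more subdirectories fit under `path` before reaching `limit`."""
    return limit - count_subdirs(path)
```

Calling `headroom()` on the index folder from a cron job and alerting when the number gets small would at least give some warning, though it does nothing about the cause.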
‎12-01-2008 04:59 PM
For now I go in and manually delete any Lucene directory older than 7 days. Is that crazy or what? Rebuilding the indexes is not an option for us because it would probably take an entire day and we can't have the production server down that long. We have a lot of folders, but not a lot of documents.
I don't know how such a huge bug like that could have been released. I'm not too impressed with Alfresco's quality control.
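
A rough sketch of the "older than 7 days" check described above, in Python. It only lists candidate directories by modification time and deletes nothing. Whether removing them is actually safe depends on how Alfresco tracks its index directories, which this thread does not establish, so treat it as a dry run. The function name and parameters are hypothetical.

```python
import os
import time

SECONDS_PER_DAY = 86400


def stale_subdirs(path, max_age_days=7, now=None):
    """Return the immediate subdirectories of `path` last modified more than
    `max_age_days` ago, sorted by path. Nothing is deleted (dry run only)."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    with os.scandir(path) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue  # skip plain files and symlinks
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                stale.append(entry.path)
    return sorted(stale)
```

Printing the result and reviewing it by hand before touching anything seems the least risky way to use this on a production box.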

‎02-05-2009 11:22 AM
I am in the process of upgrading to Labs 3c but I still get errors (a duplicate row in the database during the schema upgrade)…

‎02-06-2009 11:26 AM
http://www.ibm.com/developerworks/linux/library/l-ext4/
More subdirectories. If you've ever felt constrained by the fact that a directory can only hold 32,000 subdirectories in ext3, you'll be relieved to know that this limit has been eliminated in ext4.
So maybe in the future put your lucene folders on an ext4 partition?
http://kernelnewbies.org/Ext4#head-97cbed179e6bcc48e47e645e06b95205ea832a68
2.3. Sub directory scalability
Right now the maximum possible number of sub directories contained in a single directory in Ext3 is 32000. Ext4 breaks that limit and allows a unlimited number of sub directories.
Ext4 will be the default FS for Linux setups in the future. Ubuntu 9.04 (Jaunty Jackalope) already has it.


‎02-06-2009 11:40 AM
I can't believe there is no other solution to stop this ridiculous bug. The best part is that I know my users never, or very rarely, use index search!!
Do I have to migrate to SharePoint??

‎02-06-2009 02:05 PM

http://sharepointandbeyond.com/2008/04/10/storing-data-outside-sql-server/

‎02-07-2009 04:30 PM
- Cluster mounted on OCFS2
- High volume
- Custom model with custom indexable metadata
- A folder based hierarchy (4 levels)
- Red Hat
- Alfresco 2.1 E
- Jdk 1.5 07
Could you tell us the features of your stack, so we can try to pin down which one might be responsible for so many Lucene archives?

‎02-07-2009 04:46 PM
VMware ESX server on EMC
High volume, 100 GB
No custom model
A folder-based hierarchy (must be more than 10 levels)
Red Hat EL5
Alfresco Tomcat Community 2.1.0
jdk1.6.0
I am quite fed up, since I tried to upgrade (it fails due to an unclear duplicate entry in the database), then I tried to export the full repository (it's huge) and import it into Labs 3c, but it failed after 24 hours (I read in the forum that there is a bug importing packages bigger than 4 GB…).
Now I am trying to install this on a release supporting ext4, but so far nothing is working and the Lucene dirs keep growing and growing…. So far I haven't found any way out other than reindexing (which will take days that we can't afford on a production system), and that is just a way of getting more time…
I have found that reiserfs is capable of managing about 65000 subdirs, and to get more time I will back up, format and restore this ext3 filesystem…
‎04-08-2009 07:14 PM
at all. Tools like Luke can be used to view and manage the indexes. I don't know how it works in Alfresco, but there are a lot of directories and I couldn't use Luke.
Moreover, we have a severe problem right now: a copy of the production system was made and the backup Lucene directory was renamed to lucene-indexes, but Alfresco cannot start. Reindexing the content is not an option as it takes a few days, and this is our test system. From what I have learnt, the checker uses Lucene to search for all the stores' roots and they cannot be found. I would say the Alfresco setup is correct.

‎04-09-2009 04:29 AM
