32000 lucene index folder

aktif
Champ in-the-making
Hello,
I am using Alfresco Community 4.0.d on Red Hat Linux with Tomcat and MySQL. I have 3.3 TB of data in the production environment.
When I reach the limit of 32000 folders in the lucene-indexes/workspace/SpacesStore folder, I restart Tomcat, and a merge process runs while Alfresco is restarting.
The 32000 folders decrease to ~20 after the restart. I have to restart the application every time I reach the 32000 limit.

Obviously there is an index merge job in the Alfresco restart process.
Does anybody know the name of this job? I think it is turned off during normal online operation. Where is it in the configuration?
I want to run this index merge job while Alfresco is running. I don't want to restart the application periodically.
Thanks.
3 REPLIES

afaust
Legendary Innovator
Hello,

The index merge job is not only active during startup but active all the time. The main problem is that it needs to be able to merge, which isn't the case when the index files are "apparently" still in use. There is a programming error that can cause the Lucene index to swell up like this: application code that does not clean up its search results properly. This keeps index files marked as "in use", which causes Alfresco to create copies whenever new nodes are indexed. The longer the system runs and the longer those result sets aren't closed by application code, the more of those folders / copies accumulate. When you restart the system, those "in use" marks are all gone and Alfresco is able to remove all those redundant copies.

Since Lucene is not used in current Alfresco versions (4.0 and up have SOLR, 5.0 removed Lucene completely), there isn't much current documentation on that matter. But <a href="https://alfrescoshare.wordpress.com/2009/11/19/coding-best-practice-lucene-search-query-resultset-cl...">an older blog post</a> highlights the relevant pattern of closing result sets.
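
To illustrate the pattern of closing result sets, here is a minimal sketch, not code taken from Alfresco. The `ResultSet` and `SearchService` interfaces below are simplified, hypothetical stand-ins so the example runs on its own; in a real Alfresco module you would use `org.alfresco.service.cmr.search.SearchService` and `ResultSet` instead, and the query call also takes a store reference and a query language. `CountingSearchService` and `findNodeRefs` are made-up names that exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ResultSetCloseSketch {

    /** Hypothetical, simplified stand-in for Alfresco's ResultSet. */
    interface ResultSet {
        List<String> getNodeRefs();
        void close();
    }

    /** Hypothetical, simplified stand-in for Alfresco's SearchService. */
    interface SearchService {
        ResultSet query(String luceneQuery);
    }

    /** In-memory fake that counts how many result sets are still open. */
    static class CountingSearchService implements SearchService {
        int openResultSets = 0;

        @Override
        public ResultSet query(String luceneQuery) {
            openResultSets++;
            return new ResultSet() {
                private boolean closed = false;

                @Override
                public List<String> getNodeRefs() {
                    List<String> refs = new ArrayList<>();
                    refs.add("workspace://SpacesStore/example-node");
                    return refs;
                }

                @Override
                public void close() {
                    // Only decrement once, even if close() is called twice.
                    if (!closed) {
                        closed = true;
                        openResultSets--;
                    }
                }
            };
        }
    }

    /** Runs a query and always closes the result set, even on exceptions. */
    static List<String> findNodeRefs(SearchService searchService, String luceneQuery) {
        ResultSet results = null;
        try {
            results = searchService.query(luceneQuery);
            // Copy out what is needed while the result set is still open.
            return new ArrayList<>(results.getNodeRefs());
        } finally {
            if (results != null) {
                results.close();
            }
        }
    }

    public static void main(String[] args) {
        CountingSearchService search = new CountingSearchService();
        List<String> refs = findNodeRefs(search, "TYPE:\"cm:content\"");
        System.out.println(refs.size() + " result(s), open result sets: "
                + search.openResultSets);
    }
}
```

The important part is the `try` / `finally` in `findNodeRefs`: the data is copied out of the result set and `close()` is called on every code path, so in this fake the open counter is back at 0 after the call. Custom code that returns or throws before reaching `close()` is the kind of leak described above.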

Regards
Axel

aktif
Champ in-the-making
Thank you for your answer, Axel. I found several old Alfresco issues about this topic after your post.
I think this problem is not fixed in version 4.0.d.
Apparently there are many result sets in the Alfresco source code that are not closed properly. This leads to 32000 Lucene index directories!

Is there a list of class/method names to patch for version 4.0.d?
Or should I review the source code on my own?

https://alfrescoshare.wordpress.com/2009/11/27/coding-best-practice-lucene-search-query-resultset-cl...
https://issues.alfresco.com/jira/browse/ALF-1588
https://issues.alfresco.com/jira/browse/ETHREEOH-3229
https://issues.alfresco.com/jira/browse/MNT-1500
https://issues.alfresco.com/jira/browse/ALFCOM-3683

afaust
Legendary Innovator
It might be wise to upgrade to Alfresco 4.2.f (the last version which still has Lucene) before you spend your time looking through code to fix things yourself. While SOLR has been the preferred search subsystem since 4.0, Lucene issues still received fixes up to 4.2, so you may be chasing issues that have already been addressed.

Regards
Axel