Disable Full Text Indexing

jjf
Champ in-the-making
Is there a way to disable full-text indexing of content? If so, will this improve the performance of a rebuild run with "index.recovery.mode=FULL"? Right now a full index rebuild takes hours to run and uses over 1 GB of memory. We have no need for full-text indexing in Alfresco, so if it can be disabled, we'd like to pursue that. Thanks.
24 REPLIES

marco_sindoni
Champ in-the-making
Same problem. Any suggestion is greatly appreciated.

Bye

Marco

msporled
Champ in-the-making
I'm not entirely sure if it will work, but you could try setting:
lucene.indexer.maxFieldLength=0
in custom-repository.properties

Also, if you use versioning, I would highly recommend running at least 2.1.6, since it speeds up a full index rebuild a lot.

derek
Star Contributor
Hi,
Full text indexing is not the problem since this is done in the background anyway.  The problem is index rebuilding.
msporled is quite right: 2.1 SP6 improves the index rebuild speed significantly by batch-processing transactions using a thread pool.  If you are running the community edition, then you could upgrade to Labs 3b to get the faster, multithreaded version of the rebuild.
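
For readers wondering what "batch-processing transactions using a thread pool" looks like in general, here is a minimal sketch. It is not Alfresco's actual rebuild code: the class name, the `reindexBatch` placeholder, and the batch and thread counts are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: a generic batched, multithreaded loop, not Alfresco code.
class BatchedRebuildSketch {

    /** Splits the transaction IDs into consecutive batches of at most batchSize. */
    static List<List<Long>> partition(List<Long> txnIds, int batchSize) {
        List<List<Long>> batches = new ArrayList<>();
        for (int i = 0; i < txnIds.size(); i += batchSize) {
            int end = Math.min(i + batchSize, txnIds.size());
            batches.add(new ArrayList<>(txnIds.subList(i, end)));
        }
        return batches;
    }

    /** Hypothetical placeholder for the real per-batch work; returns the number of transactions handled. */
    static int reindexBatch(List<Long> batch) {
        return batch.size();
    }

    /** Submits every batch to a fixed-size pool and sums the per-batch results. */
    static int rebuild(List<Long> txnIds, int batchSize, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<Long> batch : partition(txnIds, batchSize)) {
                Callable<Integer> task = () -> reindexBatch(batch);
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // waits for the batch to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Long> txnIds = new ArrayList<>();
        for (long id = 1; id <= 10; id++) {
            txnIds.add(id);
        }
        System.out.println("Transactions processed: " + rebuild(txnIds, 3, 4));
    }
}
```

The point of the pattern is that batches are independent, so several can be processed at once instead of one transaction at a time.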

jjf
Champ in-the-making
What about changing the lucene.indexer.maxFieldLength setting as suggested above? Will that improve performance?

t_broyer
Champ in-the-making
jjf wrote: "Is there a way to disable full-text indexing of content? If so, will this improve the performance of a rebuild run with "index.recovery.mode=FULL"? Right now a full index rebuild takes hours to run and uses over 1 GB of memory. We have no need for full-text indexing in Alfresco, so if it can be disabled, we'd like to pursue that. Thanks."

You should be able to override the definition of the luceneIndexerAndSearcher to pass it a custom ContentService (proxying the "original" one) that'll always return a "null" ContentReader: if Alfresco cannot read a document's content, it won't be able to index it.
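
A rough sketch of the proxying idea, using a JDK dynamic proxy. The `ContentService` and `ContentReader` interfaces below are simplified, hypothetical stand-ins rather than Alfresco's real interfaces, and the Spring wiring needed to hand the proxy to luceneIndexerAndSearcher is not shown, so treat this as an illustration of the pattern only.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical, simplified stand-in for Alfresco's ContentReader.
interface ContentReader {
    String getContentString();
}

// Hypothetical, simplified stand-in for Alfresco's ContentService.
interface ContentService {
    ContentReader getReader(String nodeRef, String propertyName);
    String describe();
}

class NoContentProxyDemo {

    /** Wraps a ContentService so that getReader always returns null; every other call is delegated. */
    static ContentService withoutReaders(ContentService original) {
        return (ContentService) Proxy.newProxyInstance(
                ContentService.class.getClassLoader(),
                new Class<?>[] { ContentService.class },
                (proxy, method, args) -> {
                    if (method.getName().equals("getReader")) {
                        return null; // pretend the content cannot be read
                    }
                    try {
                        return method.invoke(original, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow the delegate's own exception
                    }
                });
    }

    public static void main(String[] args) {
        ContentService real = new ContentService() {
            public ContentReader getReader(String nodeRef, String propertyName) {
                return () -> "full text of " + nodeRef;
            }
            public String describe() {
                return "real content service";
            }
        };
        ContentService proxied = withoutReaders(real);
        System.out.println("Reader from proxy: " + proxied.getReader("node-1", "cm:content"));
        System.out.println("Delegated call: " + proxied.describe());
    }
}
```

Only the indexer would be given the proxy; the rest of the repository would keep using the original service so that content stays readable everywhere else.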

derek
Star Contributor
Hi,
As I said before, disabling full text indexing will not be a magic bullet.  Stick a profiler against the system, if you wish.  The fundamental issue is that the full text indexing was single-threaded.
Regards

jjf
Champ in-the-making
If it's not full-text indexing that's causing the issue, then I guess the question is: what is taking up memory and time during an index rebuild? And is there any way to tweak the system to improve memory usage and rebuild time? We've tried tweaking the Lucene settings in repository.properties, but that hasn't shown any benefit.

Is upgrading to 2.1 SP6/Labs 3b the only option?

msporled
Champ in-the-making
jjf wrote: "Is upgrading to 2.1 SP6/Labs 3b the only option?"

Upgrading to 2.1.6 took our rebuild from hours to minutes (or from days to hours, depending on how much data we had and the size of the box).

lee
Champ in-the-making
I'm using 3.1 SP1 and have noticed that there IS a problem indexing large Excel files (300+ MB). During indexing, our system grinds to a halt and we can run into OOM (out-of-memory) errors.

I've profiled the app, and Alfresco uses POI to open and index these files. This takes upwards of 1 GB of memory while indexing. When indexing finishes, the memory is released and the system goes back to normal.

We would like to disable indexing of content for only one of our content types.

Can we override the cm:content indexing property in our custom property?

If not, how can we get around this?

Thanks for any help!