Hyland Connect

jjf · 10-21-2008

Is there a way to disable full-text indexing of content? If so, would this improve the performance of running with "index.recovery.mode=FULL"? Right now a full index rebuild takes hours and uses over 1 GB of memory. We have no need for full-text indexing in Alfresco, so if it can be disabled, we'd like to pursue that. Thanks.

marco_sindoni · 11-13-2008

Same problem. Any suggestion is greatly appreciated.

Bye

Marco

msporled · 11-14-2008

I'm not entirely sure it will work, but you could try setting the following in custom-repository.properties:
lucene.indexer.maxFieldLength=0
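A simplified model of what a per-field term cap such as lucene.indexer.maxFieldLength does may help frame the question. This is an assumption-laden sketch, not Lucene's or Alfresco's code: the whitespace tokenizer, the lowercasing and the class name FieldLengthCapDemo are made up for illustration, and whether Alfresco accepts a value of 0 is exactly what msporled is unsure about. Note that in this model the cap is applied to text that has already been extracted, so the extraction work itself is not avoided.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of a per-field term cap (like Lucene's maxFieldLength).
// Not Lucene's or Alfresco's implementation; the tokenizer here is a plain whitespace split.
final class FieldLengthCapDemo {

    // Returns at most maxFieldLength terms from text that has ALREADY been extracted.
    static List<String> indexableTerms(String extractedText, int maxFieldLength) {
        List<String> terms = new ArrayList<>();
        if (maxFieldLength <= 0) {
            return terms; // in this model a cap of 0 lets no terms reach the index
        }
        for (String token : extractedText.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // "".split(...) yields one empty token; skip it
            }
            if (terms.size() >= maxFieldLength) {
                break; // stop once the cap is reached; the rest of the text is ignored
            }
            terms.add(token.toLowerCase()); // crude stand-in for an analyzer
        }
        return terms;
    }

    public static void main(String[] args) {
        String text = "Quarterly Sales Report for Region North";
        System.out.println(indexableTerms(text, 3)); // [quarterly, sales, report]
        System.out.println(indexableTerms(text, 0)); // []
    }
}
```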

Also, if you use versioning, I would highly recommend using at least 2.1.6, since it speeds up a full index rebuild considerably.

derek · 11-14-2008

Hi,
Full-text indexing is not the problem, since it is done in the background anyway. The problem is index rebuilding.
msporled is quite right: 2.1 SP6 improves index rebuild speed significantly by batch-processing transactions using a thread pool. If you are running Community Edition, you could upgrade to Labs 3b to get the faster, multithreaded version of the rebuild.
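The batch-plus-thread-pool approach described above can be sketched in plain Java. Everything here is hypothetical: reindexTransaction stands in for whatever work the repository does per transaction, the batch size and thread count are arbitrary, and the sketch ignores any ordering constraints a real rebuild might have between transactions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of batch-processing transactions on a thread pool during an index rebuild.
// reindexTransaction is a hypothetical placeholder, not an Alfresco API.
final class BatchedRebuildSketch {

    // Hypothetical per-transaction work; here it only counts how many were processed.
    static void reindexTransaction(long txnId, AtomicInteger counter) {
        counter.incrementAndGet();
    }

    // Splits transaction IDs into fixed-size batches and runs each batch as one pool task.
    static int rebuild(List<Long> txnIds, int batchSize, int threads) {
        if (batchSize <= 0 || threads <= 0) {
            throw new IllegalArgumentException("batchSize and threads must be positive");
        }
        AtomicInteger reindexed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int start = 0; start < txnIds.size(); start += batchSize) {
                List<Long> batch = txnIds.subList(start, Math.min(start + batchSize, txnIds.size()));
                pending.add(pool.submit(() -> {
                    for (long id : batch) {
                        reindexTransaction(id, reindexed);
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for every batch and surface any failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Rebuild interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A batch failed", e.getCause());
        } finally {
            pool.shutdown(); // all tasks have finished (or failed) by this point
        }
        return reindexed.get();
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 1; i <= 1000; i++) {
            ids.add(i);
        }
        System.out.println(rebuild(ids, 50, 4)); // 1000
    }
}
```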

jjf · 11-14-2008

What about changing lucene.indexer.maxFieldLength as suggested above? Will that improve performance?

t_broyer · 11-17-2008

You should be able to override the definition of the luceneIndexerAndSearcher bean to pass it a custom ContentService (proxying the "original" one) that always returns a null ContentReader: if Alfresco cannot read a document's content, it cannot index it.
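A minimal sketch of the proxying idea using java.lang.reflect.Proxy. The ContentService and ContentReader interfaces below are trimmed-down hypothetical stand-ins (the real Alfresco interfaces have different signatures and many more methods, and getStoreName is invented purely to show pass-through), and wiring the proxy into the luceneIndexerAndSearcher definition is not shown.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

final class NullReaderProxyDemo {

    // Hypothetical stand-ins; NOT the real Alfresco interfaces.
    interface ContentReader {
        String getContentString();
    }

    interface ContentService {
        ContentReader getReader(String nodeRef, String propertyName);
        String getStoreName(); // invented method, used only to show pass-through
    }

    // Wraps a ContentService so getReader always returns null;
    // every other call is forwarded unchanged to the original service.
    static ContentService withoutReaders(ContentService original) {
        return (ContentService) Proxy.newProxyInstance(
                ContentService.class.getClassLoader(),
                new Class<?>[] {ContentService.class},
                (proxy, method, args) -> {
                    if (method.getName().equals("getReader")) {
                        return null; // no reader, so there is no content to full-text index
                    }
                    try {
                        return method.invoke(original, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow the original service's own exception
                    }
                });
    }

    public static void main(String[] args) {
        ContentService original = new ContentService() {
            public ContentReader getReader(String nodeRef, String propertyName) {
                return () -> "document text";
            }
            public String getStoreName() {
                return "workspace://SpacesStore";
            }
        };
        ContentService proxied = withoutReaders(original);
        System.out.println(proxied.getReader("node-1", "cm:content")); // null
        System.out.println(proxied.getStoreName()); // workspace://SpacesStore
    }
}
```

A dynamic proxy keeps the wrapper independent of how many methods the real interface has; only the one call that matters is intercepted.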

derek · 11-17-2008

Hi,
As I said before, disabling full-text indexing will not be a magic bullet. Attach a profiler to the system if you wish. The fundamental issue is that the full-text indexing was single-threaded.
Regards
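If attaching a full profiler is inconvenient, a rough first measurement of time and heap usage around a unit of work can be taken with the standard java.lang.management API. RebuildProbe is a hypothetical helper, not part of Alfresco. It measures the whole JVM, so allocations by other threads are included, and summing per-pool peaks can overstate the true peak because pools do not necessarily peak at the same moment. It tells you how much, not which objects, so a real profiler is still needed to find the cause.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Rough timing and heap-peak probe around a unit of work (hypothetical helper).
final class RebuildProbe {

    static final class Result {
        final long elapsedMillis;
        final long heapPeakBytes; // sum of per-pool peaks: an approximation, may overstate

        Result(long elapsedMillis, long heapPeakBytes) {
            this.elapsedMillis = elapsedMillis;
            this.heapPeakBytes = heapPeakBytes;
        }
    }

    static Result measure(Runnable work) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP && pool.isValid()) {
                pool.resetPeakUsage(); // start peak tracking from the current usage
            }
        }
        long start = System.nanoTime();
        work.run();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;

        long peakSum = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP && pool.isValid()) {
                MemoryUsage peak = pool.getPeakUsage();
                if (peak != null) {
                    peakSum += peak.getUsed();
                }
            }
        }
        return new Result(elapsedMillis, peakSum);
    }

    public static void main(String[] args) {
        Result r = measure(() -> {
            byte[][] chunks = new byte[16][];
            for (int i = 0; i < chunks.length; i++) {
                chunks[i] = new byte[1 << 20]; // roughly 16 MB in total
            }
        });
        System.out.printf("elapsed=%d ms, heap peak (sum of pools)=%d MB%n",
                r.elapsedMillis, r.heapPeakBytes >> 20);
    }
}
```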

jjf · 11-17-2008

If it's not full-text indexing that's causing the issue, then I guess the question is: what is taking up memory and time during an index rebuild? And is there any way to tweak the system to improve memory usage and rebuild time? We've tried tweaking the Lucene settings in repository.properties, but that hasn't shown any benefit.

Is upgrading to 2.1 SP6 / Labs 3b the only option?

msporled · 11-18-2008

Upgrading to 2.1.6 took our rebuild from hours to minutes (or from days to hours, depending on how much data we had and the size of the box).

lee · 11-24-2009

I'm using 3.1 SP1 and have noticed that there IS a problem indexing large Excel files (300+ MB). During indexing, our system grinds to a halt and we can run into out-of-memory (OOM) errors.

I've profiled the app: Alfresco uses POI to open and index these files, which takes upwards of 1 GB of memory while indexing. When indexing finishes, the memory is released and the system goes back to normal.

We would like to disable indexing of content for only one of our content types.

Can we override the indexing setting of cm:content in our custom content model?

If not, how can we get around this?

Thanks for any help!
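One possible workaround shape is a guard consulted before text extraction that skips a chosen content type, or any file above a size limit. This sketch covers only the decision logic: ContentIndexGuard, the type name my:bigSpreadsheet and the 50 MB limit are hypothetical, and it is not an Alfresco extension point. Whether 3.1 SP1 lets a custom type override the indexing of cm:content is still the open question above.

```java
import java.util.Set;

// Hypothetical guard deciding whether a node's content should go through text
// extraction (for example POI for Excel) before full-text indexing.
// Type names and limits are illustrative assumptions, not Alfresco configuration.
final class ContentIndexGuard {

    private final Set<String> excludedTypes;
    private final long maxBytes;

    ContentIndexGuard(Set<String> excludedTypes, long maxBytes) {
        this.excludedTypes = Set.copyOf(excludedTypes); // defensive, immutable copy
        this.maxBytes = maxBytes;
    }

    // True only when the type is not excluded and the content is within the size cap.
    boolean shouldExtractText(String contentType, long sizeBytes) {
        if (excludedTypes.contains(contentType)) {
            return false; // the whole type opts out of content indexing
        }
        return sizeBytes <= maxBytes; // very large files skip extraction to protect the heap
    }

    public static void main(String[] args) {
        ContentIndexGuard guard =
                new ContentIndexGuard(Set.of("my:bigSpreadsheet"), 50L * 1024 * 1024);
        System.out.println(guard.shouldExtractText("my:bigSpreadsheet", 1_000));        // false
        System.out.println(guard.shouldExtractText("cm:content", 300L * 1024 * 1024)); // false
        System.out.println(guard.shouldExtractText("cm:content", 2L * 1024 * 1024));   // true
    }
}
```

Metadata properties would still be indexable under this approach; only the expensive content extraction is skipped.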

Disable Full Text Indexing