
Lucene indexes are around 5 times larger than contentstore?

damonrand
Champ in-the-making
We've loaded around 13,000 Word and RTF documents into an Alfresco 2.1 instance on Linux, and we're finding that the lucene-indexes directory is around five times larger than the content we are indexing. Has anyone else seen this?

    -> 2.8 GB contentstore
    -> 15.4 GB lucene-indexes
   
Regards,
Damon.
6 REPLIES

damonrand
Champ in-the-making
We've now done some further testing on index sizes to see why they are so much larger than our content store.

Below are the tests and the resulting index size for each.
They were run sequentially. The last one is the most interesting, in that
deleting the indexes and rebuilding them results in a large decrease. It seems
old, presumably unused, index files are hanging around?

Damon.


After bootstrapping Alfresco:

4.3M    live/lucene-indexes
4.0K    live/contentstore.deleted
4.6M    live/contentstore
4.0K    live/audit.contentstore
8.8M    live/

After migrating 9 folders with a few hundred files:

27M     live/lucene-indexes
4.0K    live/contentstore.deleted
13M     live/contentstore
4.0K    live/audit.contentstore
40M     live/

lucene-indexes directory breakdown:

80K     ./lucene-indexes/user
4.0K    ./lucene-indexes/locks
48K     ./lucene-indexes/archive
100K    ./lucene-indexes/system
27M     ./lucene-indexes/workspace

After a server restart (the index size actually went down a little):

26M     ./lucene-indexes
4.0K    ./contentstore.deleted
13M     ./contentstore
4.0K    ./audit.contentstore

Set index.recovery.mode=FULL and restarted the server:

36M     ./lucene-indexes
4.0K    ./contentstore.deleted
13M     ./contentstore
4.0K    ./audit.contentstore
49M     .

Deleted the index directory, set index.recovery.mode=FULL and restarted the
server:

9.6M    ./lucene-indexes
4.0K    ./contentstore.deleted
13M     ./contentstore
4.0K    ./audit.contentstore
23M     .

andy
Champ on-the-rise
Hi

How are you loading this data?

Andy

chatch
Champ in-the-making
The data is loaded by a migration script that uses a combination of the following calls:

nodeService.createNode followed by a contentWriter.putContent into that node
fileFolderService.create
fileFolderService.copy    (from space templates)

Chris

andy
Champ on-the-rise
Hi

Do you do any queries? Do you make sure you close the result sets?

Andy

chatch
Champ in-the-making
Yes I do, and no I don't! I'll close all handles, re-run, and see what happens … cheers.

chatch
Champ in-the-making
Sweet, that's got it.

After a load (without closes):
25456   contentstore
51460   lucene-indexes

After a load (with closes):
15164   contentstore
16296   lucene-indexes
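
For anyone landing on this thread later, the fix confirmed above is to close every search result set once it has been read. Below is a minimal sketch of that pattern in Java. The `ResultSet` and `Searcher` interfaces are hypothetical stand-ins so the example runs without Alfresco on the classpath; in a real migration script they would presumably be Alfresco's own search service and its result set, which expose `length()` and `close()`. `CountingSearcher` is a fake added only to show that nothing is left open. The explanation that unclosed result sets keep old index files from being cleaned up is an assumption consistent with the numbers in this thread, not something confirmed here.

```java
public class CloseResultSetSketch {

    /** Hypothetical stand-in for a search result set; only the methods used here. */
    interface ResultSet {
        int length();
        void close();
    }

    /** Hypothetical stand-in for the search call made by the migration script. */
    interface Searcher {
        ResultSet query(String luceneQuery);
    }

    /** Runs one query and always closes the result set, even if reading it throws. */
    static int countMatches(Searcher searcher, String luceneQuery) {
        ResultSet results = null;
        try {
            results = searcher.query(luceneQuery);
            return results.length();
        } finally {
            if (results != null) {
                results.close();
            }
        }
    }

    /** Fake searcher that tracks how many result sets are still open. */
    static class CountingSearcher implements Searcher {
        int open = 0;

        public ResultSet query(String luceneQuery) {
            open++;
            return new ResultSet() {
                public int length() { return 3; }
                public void close() { open--; }
            };
        }
    }

    public static void main(String[] args) {
        CountingSearcher searcher = new CountingSearcher();
        // Simulate a load that issues one query per migrated document.
        for (int i = 0; i < 100; i++) {
            countMatches(searcher, "TYPE:\"cm:content\"");
        }
        System.out.println("open result sets after load: " + searcher.open);
    }
}
```

The `try`/`finally` form is used rather than try-with-resources because it also works on the older Java versions that Alfresco 2.1 ran on. The key design point is that the close happens in `finally`, so a failure while reading results does not leak the handle.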