Solr indexing stops

rjohnson
Star Contributor
I have an issue with Solr indexing: no commits are being made to the main index (the archive index continues to commit normally).

When you look at the Solr stats you can see documents constantly being added, but the commit count never goes up.

It seems to be connected with the index entry for a certain folder, which is clearly present in a site when you look at the document library in Share but is not found by a Lucene search. Searches found it for over a year, but now they don't.

If you run a FIX on Solr and then restart Tomcat, indexing starts committing properly again, but if you then update the folder name (to try to get the search to find it) the commits just dry up.

Run a FIX and restart again, and the indexing seems to be committing again.

I get no errors in any logs, just a steadily climbing added-document count in the Solr admin forms but no commits.

It looks like Alfresco issues its own commit for every "batch", so I am guessing that for whatever reason something fails and the commits stop.

Is there anything I can turn on to try and debug this issue?

Can I safely turn on autocommit in the solr configuration? Would that help?
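
One way to confirm the symptom described above, rather than eyeballing the admin page, is to note the two counters at intervals and flag the case where the added-document count keeps climbing while the commit count stays flat. The sketch below is only an illustration: the function name and the sample readings are made up, and how the counters are collected (for example, by copying them from the Solr stats page) is deliberately left out.

```python
from typing import List, Tuple


def commits_stalled(samples: List[Tuple[int, int]], min_samples: int = 3) -> bool:
    """Return True if adds kept growing while commits stayed flat.

    samples: (documents_added, commits) readings, oldest first.
    Only the most recent `min_samples` readings are examined.
    """
    if len(samples) < min_samples:
        return False  # not enough data to judge
    recent = samples[-min_samples:]
    adds = [added for added, _ in recent]
    commits = [committed for _, committed in recent]
    adds_growing = all(later > earlier for earlier, later in zip(adds, adds[1:]))
    commits_flat = len(set(commits)) == 1
    return adds_growing and commits_flat


# Hypothetical readings, not taken from a real server.
readings = [(1000, 40), (1500, 41), (2100, 41), (2800, 41), (3600, 41)]
print(commits_stalled(readings))
```

A check like this only tells you when the stall began; it says nothing about the cause, so it is best paired with a note of what was changed in the repository around that time.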
5 REPLIES

kwang1
Champ in-the-making
I am experiencing the same problem in this environment: Alfresco Enterprise v4.1.5 / Solr version 1.4.1.

Solr indexing hangs at the same point (the same Last Index commit date) each time I regenerate the index, and the index keeps growing until it has used almost all the disk space (100G). I then have to regenerate the index again, even though the underlying indexing issue has not been fixed.

Could this be related to a lack of memory on the Solr server, or to the large disk usage of the content store? What should I check and investigate? Any comments would be greatly appreciated.


Content store space:

/app/alf_data/contentstore                               260G
/app/alf_data/contentstore.deleted                       224K

Solr index server(space/memory):

/app/solr/alf_data/solr]
193M    archive
244K    archive-SpacesStore
8.9G    workspace
244K    workspace-SpacesStore


$ free -m
             total       used       free     shared    buffers     cached
Mem:         15951      15151        800          0        103       5438
-/+ buffers/cache:       9609       6341
Swap:         3999        145       3854


The index report for me is below.

<lst name="alfresco">
<long name="DB transaction count">125265</long>
<long name="DB acl transaction count">194</long>
<long name="Count of duplicated transactions in the index">0</long>
<long name="Count of duplicated acl transactions in the index">0</long>
<long name="Count of transactions in the index but not the DB">0</long>
<long name="Count of acl transactions in the index but not the DB">0</long>
<long name="Count of missing transactions from the Index">46067</long>
<long name="First transaction missing from the Index">1979308</long>
<long name="Count of missing acl transactions from the Index">0</long>
<long name="Index transaction count">79198</long>
<long name="Index acl transaction count">194</long>
<long name="Index unique transaction count">79198</long>
<long name="Index unique acl transaction count">194</long>
<long name="Index leaf count">72264</long>
<long name="Count of duplicate leaves in the index">0</long>
<long name="Index aux count">72264</long>
<long name="Count of duplicate aux docs in the index">0</long>
<long name="Index error count">0</long>
<long name="Count of duplicate error docs in the index">0</long>
<long name="Index unindexed count">49645</long>
<long name="Count of duplicate unindexed docs in the index">0</long>
<long name="Last index commit time">1427905765306</long>
<str name="Last Index commit date">2015-04-01T09:29:25</str>
<long name="Last TX id before holes">-1</long>
</lst>
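
As a quick sanity check on the report above, the gap between the DB and index transaction counts can be compared with the reported number of missing transactions. This is a minimal sketch that parses a trimmed copy of the report with Python's standard library; the helper name `read_counts` is made up for illustration, and only four of the report's fields are included.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the report above (values unchanged).
REPORT = """<lst name="alfresco">
<long name="DB transaction count">125265</long>
<long name="Count of missing transactions from the Index">46067</long>
<long name="Index transaction count">79198</long>
<long name="Index unindexed count">49645</long>
</lst>"""


def read_counts(xml_text):
    """Map each <long name="..."> element to its integer value."""
    root = ET.fromstring(xml_text)
    return {el.get("name"): int(el.text) for el in root.iter("long")}


counts = read_counts(REPORT)
db_tx = counts["DB transaction count"]
index_tx = counts["Index transaction count"]
missing = counts["Count of missing transactions from the Index"]

gap = db_tx - index_tx
percent_indexed = 100.0 * index_tx / db_tx

print(gap, gap == missing)        # 46067 True
print(round(percent_indexed, 1))  # 63.2
```

The numbers are self-consistent: the index holds roughly 63% of the DB transactions, and the shortfall matches the reported missing count exactly.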

kwang1
Champ in-the-making
Can I also ask whether there is any potential problem if I found 30+ documents in the content store that are not in the DB? Is a file size of about 60M considered large? Do I have to remove the non-synced documents?

rjohnson
Star Contributor
Do you mean in Postgres or in Solr?

Does anyone have any suggestions for this query? We are facing a similar issue with a Solr reindex.

1.   We rebuilt the Solr index. After 40 hours we noticed from the logs that indexing had stopped, and the Solr search returned empty results:
{
"responseHeader":{
  "status":0,
  "QTime":382},
"response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
}}

2.   We rebuilt the index again on another server with the same DB and data store, and again indexing stopped at the same point. From the Alfresco access logs, the last successfully indexed node ID was exactly the same.

3.   The index report for us is below:

<lst name="alfresco">
<long name="DB transaction count">442076</long>
<long name="DB acl transaction count">1203</long>
<long name="Count of duplicated transactions in the index">0</long>
<long name="Count of duplicated acl transactions in the index">0</long>
<long name="Count of transactions in the index but not the DB">0</long>
<long name="Count of acl transactions in the index but not the DB">0</long>
<long name="Count of missing transactions from the Index">442076</long>
<long name="First transaction missing from the Index">1</long>
<long name="Count of missing acl transactions from the Index">0</long>
<long name="Index transaction count">0</long>
<long name="Index acl transaction count">1203</long>
<long name="Index unique transaction count">0</long>
<long name="Index unique acl transaction count">1203</long>
<long name="Index leaf count">0</long>
<long name="Count of duplicate leaves in the index">0</long>
<long name="Last index commit time">0</long>
<str name="Last Index commit date">1970-01-01T07:30:00</str>
<long name="Last TX id before holes">1963223</long>
</lst>
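
The "Last index commit time" field in these reports looks like a Unix timestamp in milliseconds, so a value of 0 would simply be the epoch, which suggests that no commit has been recorded. The sketch below converts such a value to UTC; treating 0 as "no commit recorded" is an assumption based on the reports in this thread, not documented behaviour, and the function name is made up.

```python
from datetime import datetime, timezone


def commit_time_utc(millis):
    """Convert a 'Last index commit time' value (ms since epoch) to UTC.

    Returns None for 0, which is assumed here to mean that no commit
    has been recorded.
    """
    if millis == 0:
        return None
    # Integer division drops the milliseconds and avoids float rounding.
    return datetime.fromtimestamp(millis // 1000, tz=timezone.utc)


print(commit_time_utc(0))              # None
print(commit_time_utc(1427905765306))  # value from the earlier report
```

The "Last Index commit date" strings in the reports differ from UTC by a fixed offset, which is consistent with them being rendered in each server's local time zone.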


4.   After restarting the server, Solr search works, but the number of results is less than the number of documents in the database.

5.   However, the Solr incremental index on another server is working well.

6.   System environment
a.   Alfresco Community - v4.2.0 (4428)
b.   Redhat 64bit OS
c.   tomcat-6.0.29
d.   heap.maxsize: 96GB


We managed to find out why the index appeared to stop.
It did not actually stop; it was just slow while indexing the folders with the most recent modified date.
We are still trying to find out why, when the bulk ingestion tool updates a content item, the parent folder's modified date is also updated.
Once the modified date of the parent folder is updated, Solr may be reindexing all the contents of that folder, regardless of whether the individual content files were updated.
If anyone has any input, please reply.