Solr 4 indexing lag

joe_l3
Confirmed Champ

Hello, 

Has anyone experienced Solr 4 indexing lag with a huge content store?

I am facing a problem with Solr and NRT (near-real-time) indexing. In my environment Solr takes too long to sync its indexes with the DB data: documents become searchable only after 8-10 minutes.

Here is my stack:

  • Alfresco Community 5.2 - 1 server - 12 vCPU - 40 GB RAM - JVM heap 20 GB
  • Solr 4 - 1 server - 12 vCPU - 30 GB RAM - I/O throughput 1843.93 MB/s - JVM heap 18 GB
  • MySQL 5.7 - 1 server - 12 vCPU - 20 GB RAM

Sizing and settings worth mentioning:

  • Size on disk of content repository: 5 TB
  • Size on disk of Solr indexes: 300 GB
  • Number of documents in Solr: 140 million
  • Content indexing disabled
  • Solr suggester disabled
  • Alfresco tracking every 8 secs
  • 11 indexing threads for each tracking transaction
2 REPLIES

angelborroy
Community Manager

Initially, it appears that the database might be the bottleneck. Do you have any metrics on the performance of the database queries?

Hyland Developer Evangelist

Hi Angel, thank you for your response.

On the DB side, I enabled slow-query logging and observed the environment for about 10-15 minutes. Everything looks fine:

  • 15-25 JDBC connections (on average)
  • no slow queries logged (logging queries with a duration greater than 5 seconds)

I think Solr takes too long to index even a small commit. I noticed that the index folder has a lot of 2 GB files:

## /solr4/index/workspace/SpacesStore/index
-rw-r--r-- 1 alfresco alfresco 2.2G Mar 10 2023 _256j_Lucene41_0.tim
-rw-r--r-- 1 alfresco alfresco 2.2G Mar 10 2023 _2gzk_Lucene41_0.tim
-rw-r--r-- 1 alfresco alfresco 2.1G Mar 10 2023 _37iw_Lucene41_0.tim
-rw-r--r-- 1 alfresco alfresco 2.1G Mar 10 2023 _2tvr_Lucene41_0.tim
-rw-r--r-- 1 alfresco alfresco 2.0G Mar 9 2023 _1s66_Lucene41_0.tim
....
-rw-r--r-- 1 alfresco alfresco 1.2M Jul 4 10:07 _ifkj.nvd
-rw-r--r-- 1 alfresco alfresco 6.9M Jul 4 10:07 _ifkj_Lucene410_0.dvd
-rw-r--r-- 1 alfresco alfresco 3.6M Jul 4 10:06 _ifkj_Lucene41_0.doc
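To see how the index is split across segments, rather than reading individual file sizes, the files can be grouped by segment name. The following is only a minimal sketch: it assumes the usual Lucene naming, where every file of a segment starts with the same `_<name>` prefix (as in `_256j_Lucene41_0.tim` and `_ifkj.nvd` above). The function names and the path in the usage note are illustrative, not part of Solr or Alfresco.

```python
import os
import re
from collections import defaultdict

# Lucene index files look like "_<segment>[_<suffix>].<ext>",
# e.g. "_256j_Lucene41_0.tim" or "_ifkj.nvd". Files such as
# "segments_1" or "write.lock" do not match and are skipped.
SEGMENT_RE = re.compile(r"^(_[0-9a-z]+)[_.]")


def segment_sizes(index_dir):
    """Return {segment_name: total_bytes} for the files in index_dir."""
    totals = defaultdict(int)
    with os.scandir(index_dir) as entries:
        for entry in entries:
            match = SEGMENT_RE.match(entry.name)
            if match and entry.is_file():
                totals[match.group(1)] += entry.stat().st_size
    return dict(totals)


def largest_segments(index_dir, top=10):
    """Return the `top` largest segments as (name, bytes), biggest first."""
    sizes = segment_sizes(index_dir)
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

For example, `largest_segments("/solr4/index/workspace/SpacesStore/index")` would list the ten biggest segments. A handful of very large, old segments next to many small, recent ones would match the listing above.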

In addition, there's another folder named "content" with small gz files and a lot of numbered subfolders that contain many more gz files. That folder looks very heavy as well, and I'm unable to list all its files within a reasonable time. Even "ls -l" executed from a terminal takes too long to respond.

## solr4/content/_DEFAULT_/db
drwxrwxr-x 2 solr solr 264K Jul  6 17:09 1962
drwxrwxr-x 2 solr solr 264K Jul  6 14:34 1963
drwxrwxr-x 2 solr solr 264K Jul  6 15:10 2086
drwxrwxr-x 2 solr solr 260K Jul 15 08:33 1105
drwxrwxr-x 2 solr solr 260K Jul 10 10:54 1106
....
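Part of the slowness of `ls -l` on a folder like this is that it sorts the names and stats every entry before printing anything. As a rough sketch (not an Alfresco or Solr tool; the function name and the `max_entries` parameter are made up for illustration), the entries can be counted while streaming through the tree with `os.scandir`, which usually gets the file type from the directory entry itself:

```python
import os


def count_entries(root, max_entries=None):
    """Count files and directories below root without sorting.

    Returns (files, dirs, truncated). If max_entries is given, the walk
    stops early once that many entries have been seen and truncated is True.
    """
    files = dirs = 0
    stack = [root]  # explicit stack instead of recursion
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    dirs += 1
                    stack.append(entry.path)
                else:
                    files += 1
                if max_entries is not None and files + dirs >= max_entries:
                    return files, dirs, True
    return files, dirs, False
```

Calling `count_entries("solr4/content/_DEFAULT_/db", max_entries=1_000_000)` gives a quick feel for how many gz files the content folder holds, and the cap keeps the walk from running indefinitely on a very large tree.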

It looks like Solr is always doing something with those huge files, and that takes quite a long time. The curious thing is that searches take 3-7 seconds (acceptable for paginated queries of 15 items on a huge repository), but indexing is 7-10 minutes behind the DB (the system clocks of both servers are synced).