07-19-2024 12:10 PM
Hello,
Has anyone experienced Solr 4 indexing lag with a huge content store?
I am facing a problem with Solr and near-real-time (NRT) indexing. In my environment, Solr takes too long to sync its indexes with the DB data: documents become searchable only after 8-10 minutes.
Here is my stack:
Sizing and settings worth mentioning:
07-22-2024 03:23 AM
Initially, it appears that the database might be the bottleneck. Do you have any metrics on the performance of the database queries?
07-22-2024 04:50 AM - edited 11-05-2024 07:36 AM
Hi Angel, thank you for your response.
On the DB side, I enabled slow-query logging and observed the environment for about 10-15 minutes. Everything appears fine:
I think Solr takes too long to index even a small commit. I noticed that the index folder has a lot of files of about 2 GB each:
## /solr4/index/workspace/SpacesStore/index
-rw-r--r-- 1 alfresco alfresco 2.2G Mar 10 2023 _256j_Lucene41_0.tim
-rw-r--r-- 1 alfresco alfresco 2.2G Mar 10 2023 _2gzk_Lucene41_0.tim
-rw-r--r-- 1 alfresco alfresco 2.1G Mar 10 2023 _37iw_Lucene41_0.tim
-rw-r--r-- 1 alfresco alfresco 2.1G Mar 10 2023 _2tvr_Lucene41_0.tim
-rw-r--r-- 1 alfresco alfresco 2.0G Mar 9 2023 _1s66_Lucene41_0.tim
....
-rw-r--r-- 1 alfresco alfresco 1.2M Jul 4 10:07 _ifkj.nvd
-rw-r--r-- 1 alfresco alfresco 6.9M Jul 4 10:07 _ifkj_Lucene410_0.dvd
-rw-r--r-- 1 alfresco alfresco 3.6M Jul 4 10:06 _ifkj_Lucene41_0.doc
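To see how the index is split between the old multi-gigabyte segments and the recent small ones, the file sizes can be summed per segment. This is only a minimal sketch, assuming Python 3 is available on the Solr host; the function names `segment_sizes` and `largest_segments` are made up for illustration, and the grouping relies on Lucene's convention of naming segment files with a `_<base-36 counter>` prefix.

```python
import os
import re
from collections import defaultdict

# Lucene segment files start with "_<base36 counter>", followed by "." or "_"
# (for example "_ifkj.nvd" or "_256j_Lucene41_0.tim").
SEGMENT_RE = re.compile(r"^(_[0-9a-z]+)[._]")


def segment_sizes(index_dir):
    """Return {segment_name: total_bytes} for the segment files in index_dir."""
    totals = defaultdict(int)
    with os.scandir(index_dir) as entries:
        for entry in entries:
            match = SEGMENT_RE.match(entry.name)
            # Files such as "segments_N" do not match and are skipped.
            if match and entry.is_file(follow_symlinks=False):
                size = entry.stat(follow_symlinks=False).st_size
                totals[match.group(1)] += size
    return dict(totals)


def largest_segments(index_dir, top=10):
    """Return the `top` largest segments as (name, bytes), biggest first."""
    sizes = segment_sizes(index_dir)
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

Calling `largest_segments("/solr4/index/workspace/SpacesStore/index")` would list the heaviest segments. It only reads directory metadata, so it should be safe to run while Solr is up.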
In addition, there's another folder named "content" with small gz files and a lot of numbered subfolders that contain many more gz files. That folder looks very heavy as well, and I'm unable to list all its files within a reasonable time; even the command "ls -l" executed from a terminal takes too long to respond.
## solr4/content/_DEFAULT_/db
drwxrwxr-x 2 solr solr 264K Jul 6 17:09 1962
drwxrwxr-x 2 solr solr 264K Jul 6 14:34 1963
drwxrwxr-x 2 solr solr 264K Jul 6 15:10 2086
drwxrwxr-x 2 solr solr 260K Jul 15 08:33 1105
drwxrwxr-x 2 solr solr 260K Jul 10 10:54 1106
....
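Part of the reason "ls -l" is so slow on a folder like this is that it reads every entry, calls stat on each one, and sorts the result before printing anything. A rough way to count the entries without that overhead is to stream the directory instead. This is a minimal sketch, assuming Python 3 on the Solr host; `count_entries` is a hypothetical helper name, not part of any Alfresco or Solr tooling.

```python
import os
from itertools import islice


def count_entries(path, limit=None):
    """Count files and subdirectories directly under path.

    Entries are streamed with os.scandir, so nothing is sorted and,
    on most Linux filesystems, no extra stat call is made per entry.
    Pass limit to stop after that many entries on very large folders.
    """
    files = dirs = 0
    with os.scandir(path) as entries:
        for entry in islice(entries, limit):
            if entry.is_dir(follow_symlinks=False):
                dirs += 1
            else:
                files += 1
    return files, dirs
```

For example, `count_entries("solr4/content/_DEFAULT_/db/1962")` returns a `(files, dirs)` tuple for one of the numbered subfolders.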
It looks like Solr is always doing something with those huge files, and that takes quite a long time. The curious thing is that searches take 3-7 seconds (acceptable for paginated queries of 15 items on a huge repository), but indexing is 7-10 minutes behind the DB (the system clocks of both servers are synced).