04-24-2018 08:35 AM
I have an Alfresco Community (201707) installation, which I am using to compare the default Solr 4 against Solr 6 from the alfresco-search-services-1.1.0 install.
After a full index with Solr 4, I get the following info from the solr4 admin page:
Num Docs: 163458
Max Doc: 163458
...
Deleted Docs: 0
...
Master (Searching): version 1524504594659, gen 159, size 6.5 GB
...
Nodes in Index: 70921
Transactions in Index: 80844
Approx transactions remaining: 0
...
Unindexed Nodes: 11441
Error Nodes in Index: 0
In the Solr 4 SUMMARY report, I can see that indexing is complete:
Node count with FTSStatus Clean 69165
Node count with FTSStatus Dirty 0
Node count with FTSStatus New 0
To test the Solr 6 setup, I stop the Alfresco app, make the changes to the Alfresco install for Solr 6, start the Solr server and the Alfresco server, and let it reindex. It plugs along for a few hours and then completes with the following stats:
Num Docs: 164357
Max Doc: 164357
...
Deleted Docs: 0
...
Master (Searching): version 1524581958240, gen 586, size 2.48 GB
And in the SUMMARY report:
Alfresco Nodes in Index 70937
Alfresco Transactions in Index 81470
Alfresco Unindexed Nodes 11698
Alfresco Error Nodes in Index 0
Node count with FTSStatus Clean 69181
Node count with FTSStatus Dirty 0
Node count with FTSStatus New 0
When I run the ERROR query, I get no results:
{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"ERROR*",
      "wt":"json"}},
  "response":{"numFound":0,"start":0,"docs":[]
  }}
So the indexer looks finished and comparable in volume to the Solr 4 setup.
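To put "comparable in volume" into numbers, here is a small sketch that diffs the counts copied from the two admin/SUMMARY pages above. The dictionary keys are shortened labels of my own, not Solr field names; the script only does arithmetic on the reported figures and does not query either server.

```python
# Counts copied by hand from the Solr 4 and Solr 6 admin/SUMMARY pages.
# The key names are shortened, hypothetical labels, not Solr field names.
SOLR4 = {
    "Num Docs": 163458,
    "Nodes in Index": 70921,
    "Transactions in Index": 80844,
    "Unindexed Nodes": 11441,
    "FTSStatus Clean": 69165,
}
SOLR6 = {
    "Num Docs": 164357,
    "Nodes in Index": 70937,
    "Transactions in Index": 81470,
    "Unindexed Nodes": 11698,
    "FTSStatus Clean": 69181,
}


def compare(old: dict[str, int], new: dict[str, int]) -> dict[str, tuple[int, float]]:
    """Return (absolute difference, percent difference) for each metric in `old`."""
    result = {}
    for key, old_value in old.items():
        diff = new[key] - old_value
        result[key] = (diff, 100.0 * diff / old_value)
    return result


if __name__ == "__main__":
    for key, (diff, pct) in compare(SOLR4, SOLR6).items():
        print(f"{key:<22} {diff:+6d} ({pct:+.2f}%)")
```

Every metric differs by less than about 2.3%, with Unindexed Nodes showing the largest relative change, so document counts alone do not explain the much smaller Solr 6 index.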
What first concerned me was the significantly smaller index size after a complete reindex: 6.5 GB for Solr 4 versus 2.5 GB for Solr 6, when I was expecting roughly a 15% size increase from the introduction of fingerprints.
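The size gap is easier to see in numbers. This sketch uses only the two sizes reported by the admin pages (6.5 GB and 2.48 GB) and the ~15% growth I expected. It assumes that growth applies uniformly to the whole Solr 4 index, which is a simplification.

```python
SOLR4_GB = 6.5          # Solr 4 index size from the admin page
SOLR6_GB = 2.48         # Solr 6 index size from the admin page
EXPECTED_GROWTH = 0.15  # assumed ~15% increase from fingerprints

# Size Solr 6 would have if it were simply Solr 4 plus ~15%
expected_gb = SOLR4_GB * (1 + EXPECTED_GROWTH)

# Actual Solr 6 size as a fraction of the Solr 4 size
ratio = SOLR6_GB / SOLR4_GB

if __name__ == "__main__":
    print(f"expected Solr 6 size: {expected_gb:.2f} GB")
    print(f"actual Solr 6 size:   {SOLR6_GB:.2f} GB ({ratio:.1%} of Solr 4)")
    print(f"shortfall vs expected: {expected_gb - SOLR6_GB:.2f} GB")
```

So instead of roughly 7.5 GB, the Solr 6 index is about 38% of the Solr 4 size, which is much too large a gap to be noise.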
Some documents do not appear in full-text search results, even though they have the index aspect attached. I tried reindexing one of those documents with the following request, but had no luck:
http://[myip]:8983/solr/admin/cores?action=reindex&query=sys%5C%3Anode%5C-dbid%3A135156
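To retry this for other missing documents, the sketch below rebuilds the same request for any node DBID. It reproduces the percent-encoding used above, with the backslash-escaped field name `sys\:node\-dbid`. `reindex_url` is a hypothetical helper, and `[myip]` is still a placeholder host. The sketch only builds the URL; it does not send it or check what Solr returns.

```python
from urllib.parse import quote, urlencode


def reindex_url(host: str, node_dbid: int, port: int = 8983) -> str:
    """Build the same cores?action=reindex request as above for one node DBID.

    Hypothetical helper: it only formats the URL and does not contact Solr.
    """
    # ':' and '-' in the field name are backslash-escaped, as in the original request
    query = f"sys\\:node\\-dbid:{node_dbid}"
    # quote_via=quote with urlencode's default safe='' encodes '\' as %5C and ':' as %3A
    params = urlencode({"action": "reindex", "query": query}, quote_via=quote)
    return f"http://{host}:{port}/solr/admin/cores?{params}"


if __name__ == "__main__":
    print(reindex_url("[myip]", 135156))
```

For DBID 135156 this prints exactly the URL shown above, so the same helper can be reused for other documents that are missing from search results.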
During the reindex I saw a few
"FlateFilter: stop reading corrupt stream due to a DataFormatException"
and
"An error occured when reading table hmtx"
But no more than I saw with the Solr 4 setup.
Any thoughts on how best to troubleshoot the inconsistencies?
Also, I know I can't upgrade to PDFBox 2.0.x in 5.2, but has anyone been able to replace the 1.8.10 PDFBox jars (pdfbox-1.8.10.jar) with their 1.8.13 equivalents (pdfbox-1.8.13.jar) to get around the PDFBox problems?
04-25-2018 05:55 AM
I re-read a previous question that Cesar Capillas had answered, and I think his answer explains the likely cause of the size discrepancy (https://community.alfresco.com/message/830710-request-for-solr-6-search-services-troubleshooting-adv...). I needed to look at my shared.properties, so thanks for the previous answer.