Our colleague @aitseitz reached out to me to point out a missing feature related to Content Indexing. Although we are still considering adding this feature to the product, this blog post describes alternative approaches to get the required information.
Alfresco Search Services indexes metadata and content using different trackers. Once the metadata has been indexed, some components (like the Enterprise Admin Web Console for the Enterprise version) will report that no indexation is happening.
However, even though all the metadata has been created in the SOLR index, the content of the documents may still be being indexed by the ContentTracker component. While we improve the feature, this blog post gives you some tools to check the status of content indexing.
Search Services 2.0.x
Since Search Services 2.0 there is no content store on the SOLR side, so two new fields, LATEST_APPLIED_CONTENT_VERSION_ID and LAST_INCOMING_CONTENT_VERSION_ID, are used to identify whether the content of a SOLR Document needs to be indexed or updated.
Additional details can be found in:
The following SOLR Query will return the number of SOLR Documents with a different value in those *_CONTENT_VERSION_ID fields:
http://localhost:8983/solr/alfresco/select
?q=*
&fq={!frange l=1 u=1 v=$equals}
&equals=if(not(eq(LATEST_APPLIED_CONTENT_VERSION_ID,LAST_INCOMING_CONTENT_VERSION_ID)),1,0)
&indent=on
&wt=json
{
  "responseHeader":{ ... },
  "_original_parameters_":{ ... },
  "lastIndexedTx":574,
  "lastIndexedTxTime":1652429863621,
  "txRemaining":0,
  "response":{"numFound":76,"start":0,"docs":[ ... ]},
  "processedDenies":false
}
In this sample, the content of 76 documents is still pending to be indexed or updated. Meanwhile, lastIndexedTx points to the latest transaction in the database and txRemaining is 0, which indicates that no metadata is pending to be indexed.
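The same check can be scripted. The following is a minimal Python sketch, assuming the core is reachable at the URL used in the query above. The function names build_pending_content_url and pending_content_count are illustrative and not part of Search Services, and the rows=0 parameter is an addition to the original query so that only the count is returned.

```python
import json
from urllib.parse import urlencode

# Function query from the post: 1 when the two version ids differ, 0 otherwise.
NOT_IN_SYNC = (
    "if(not(eq(LATEST_APPLIED_CONTENT_VERSION_ID,"
    "LAST_INCOMING_CONTENT_VERSION_ID)),1,0)"
)


def build_pending_content_url(core_url="http://localhost:8983/solr/alfresco"):
    """Return the select URL matching documents whose content is out of sync."""
    params = {
        "q": "*",
        "fq": "{!frange l=1 u=1 v=$equals}",
        "equals": NOT_IN_SYNC,
        "rows": 0,  # assumption: not in the original query, only the count is needed
        "wt": "json",
    }
    return core_url + "/select?" + urlencode(params)


def pending_content_count(response_text):
    """Extract numFound from the JSON body returned by the query above."""
    return json.loads(response_text)["response"]["numFound"]
```

The URL can be fetched with any HTTP client, for instance urllib.request.urlopen(build_pending_content_url()).read(), and the body passed to pending_content_count.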
If you want to know the status of the nodes pending to be indexed or updated, you can add those *_CONTENT_VERSION_ID fields to the "fl" parameter:
http://localhost:8983/solr/alfresco/select
?q=*
&fl=[cached]LATEST_APPLIED_CONTENT_VERSION_ID,LAST_INCOMING_CONTENT_VERSION_ID
&fq={!frange l=1 u=1 v=$equals}
&equals=if(not(eq(LATEST_APPLIED_CONTENT_VERSION_ID,LAST_INCOMING_CONTENT_VERSION_ID)),1,0)
&indent=on
&wt=json
{
  "responseHeader":{ ... },
  "_original_parameters_":{ ... },
  "lastIndexedTx":574,
  "lastIndexedTxTime":1652429863621,
  "txRemaining":0,
  "response":{"numFound":7,"start":0,"docs":[
    { "LATEST_APPLIED_CONTENT_VERSION_ID":591, "LAST_INCOMING_CONTENT_VERSION_ID":-10},
    { "LATEST_APPLIED_CONTENT_VERSION_ID":393, "LAST_INCOMING_CONTENT_VERSION_ID":-10},
    { "LATEST_APPLIED_CONTENT_VERSION_ID":573, "LAST_INCOMING_CONTENT_VERSION_ID":-10},
    { "LATEST_APPLIED_CONTENT_VERSION_ID":579, "LAST_INCOMING_CONTENT_VERSION_ID":-10},
    { "LATEST_APPLIED_CONTENT_VERSION_ID":606, "LAST_INCOMING_CONTENT_VERSION_ID":-10},
    { "LATEST_APPLIED_CONTENT_VERSION_ID":582, "LAST_INCOMING_CONTENT_VERSION_ID":-10},
    { "LATEST_APPLIED_CONTENT_VERSION_ID":585, "LAST_INCOMING_CONTENT_VERSION_ID":-10}]
  },
  "processedDenies":false
}
In this sample, we can see that the content of 7 documents is still pending to be indexed, since their LAST_INCOMING_CONTENT_VERSION_ID value is set to SolrInformationServer.CONTENT_OUTDATED_MARKER (-10).
Alternatively, the Search Services Admin REST API can be used to get this information.
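As a rough illustration of how that response could be inspected from a script, the Python sketch below filters the returned documents by the marker value. The function name outdated_docs is hypothetical, and the constant simply mirrors the -10 value quoted above rather than being read from Search Services.

```python
import json

# Mirrors SolrInformationServer.CONTENT_OUTDATED_MARKER as quoted in the post.
CONTENT_OUTDATED_MARKER = -10


def outdated_docs(response_text):
    """Return the docs whose LAST_INCOMING_CONTENT_VERSION_ID equals the marker."""
    docs = json.loads(response_text)["response"]["docs"]
    return [
        doc for doc in docs
        if doc.get("LAST_INCOMING_CONTENT_VERSION_ID") == CONTENT_OUTDATED_MARKER
    ]
```

Documents returned by the query that do not carry the marker still have differing version ids, so they are pending an update rather than a first indexing pass; this reading is an assumption based on the query filter, not something the response states explicitly.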
http://localhost:8983/solr/admin/cores?action=summary&core=alfresco
<lst name="FTS">
<long name="Node count whose content is in sync">170</long>
<long name="Node count whose content needs to be updated">7</long>
</lst>
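For monitoring purposes, the FTS counters can be pulled out of the summary report with a few lines of Python. This is only a sketch: the function name fts_summary is hypothetical, and because the rest of the report is not shown here, the code looks for the FTS element at any depth instead of assuming a fixed path.

```python
import xml.etree.ElementTree as ET


def fts_summary(xml_text):
    """Map each <long name="..."> under <lst name="FTS"> to its integer value."""
    root = ET.fromstring(xml_text)
    # The FTS element may be the root itself or nested at any depth (assumption).
    if root.tag == "lst" and root.get("name") == "FTS":
        fts = root
    else:
        fts = root.find(".//lst[@name='FTS']")
    if fts is None:
        return {}
    return {item.get("name"): int(item.text) for item in fts.findall("long")}
```

A non-zero value for "Node count whose content needs to be updated" then signals that the ContentTracker still has work to do.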
Search Services 1.4.x
When using Search Services 1.3.x / 1.4.x the logic is quite different.
There is a field in the Solr schema called FTSSTATUS, which takes one of the values New, Dirty or Clean.
The following SOLR Query will return the number of SOLR Documents faceted by FTSSTATUS field:
http://localhost:8080/solr/alfresco/select?facet.field=FTSSTATUS&facet=on&indent=on&q=*&wt=json
{
  "responseHeader":{ ... },
  "_original_parameters_":{ ... },
  "lastIndexedTx":136,
  "lastIndexedTxTime":1652430806500,
  "txRemaining":0,
  "response":{"numFound":874,"start":0,"docs":[ ... ]},
  "facet_counts":{
    "facet_fields":{
      "FTSSTATUS":[ "New",152, "Clean",125, "Dirty",3]}},
  "processedDenies":false
}
In this sample, we can see that the content of 152 New documents and 3 Dirty documents still needs to be indexed. The content of 125 documents is in sync (Clean), so no further content indexing is required for them.
Alternatively, the Search Services Admin REST API can be used to get this information.
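Solr returns field facets as a flat list that alternates values and counts, which is awkward to read from a script. The Python sketch below, with hypothetical function names fts_status_counts and pending_content, pairs them up into a dictionary and adds New and Dirty together.

```python
import json


def fts_status_counts(response_text):
    """Turn the flat [value, count, value, count, ...] facet list into a dict."""
    body = json.loads(response_text)
    flat = body["facet_counts"]["facet_fields"]["FTSSTATUS"]
    return dict(zip(flat[0::2], flat[1::2]))


def pending_content(counts):
    """Documents whose content still has to be indexed: New plus Dirty."""
    return counts.get("New", 0) + counts.get("Dirty", 0)
```

With the sample response shown above, pending_content returns 155 (152 New plus 3 Dirty).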
http://localhost:8983/solr/admin/cores?action=summary&core=alfresco
<lst name="FTS">
<long name="Node count with FTSStatus Clean">125</long>
<long name="Node count with FTSStatus Dirty">3</long>
<long name="Node count with FTSStatus New">152</long>
</lst>