This blog post describes how to remove a number of transactions from the Alfresco SOLR index so they can be re-indexed from scratch. This procedure can help in upgrade and re-indexing scenarios.
Alfresco SOLR cores (alfresco, archive) contain documents of several types, identified by the DOC_TYPE field. The number of documents of each type can be inspected in the SOLR Admin UI using the following URL:
http://localhost:8983/solr/#/alfresco/schema?field=DOC_TYPE
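If you prefer the command line, the same counts can be sketched with a facet query. This is a minimal sketch that assumes the core's /select handler is reachable without mutual TLS (for instance with secureComms set to none) and accepts a standard Lucene query; the helper names doc_type_counts_url and doc_type_counts are hypothetical.

```shell
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Hypothetical helper: URL of a facet query counting documents per DOC_TYPE.
doc_type_counts_url() {
  # $1: core name (alfresco or archive); rows=0 returns only the counts
  printf '%s/%s/select?q=*:*&rows=0&facet=true&facet.field=DOC_TYPE&wt=json\n' \
    "$SOLR_URL" "$1"
}

# Run the query; assumes /select is reachable without mutual TLS.
doc_type_counts() {
  curl --silent --location "$(doc_type_counts_url "$1")"
}
```

For example, `doc_type_counts alfresco` should return the DOC_TYPE counts in the facet_fields section of the JSON response.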
The following steps are required to remove and re-index the latest transactions in a SOLR core.
Documents of type "State" store the latest indexed transaction (TX) and the latest indexed ACL (permission) transaction (ACLTX):
[ { "id":"TRACKER!STATE!ACLTX", "_version_":1736233537222213632, "S_ACLTXID":9, "S_INACLTXID":9, "S_ACLTXCOMMITTIME":1655799514574, "DOC_TYPE":"State", "LAST_INCOMING_CONTENT_VERSION_ID":-10}, { "id":"TRACKER!STATE!TX", "_version_":1736234575631220736, "S_TXID":43, "S_INTXID":43, "S_TXCOMMITTIME":1655799968144, "DOC_TYPE":"State", "LAST_INCOMING_CONTENT_VERSION_ID":-10 } ]
In the sample above, the latest transaction indexed in the alfresco core is 43, with commit time 1655799968144.
To remove transactions 41, 42 and 43 from the SOLR core, first retrieve the properties of transaction 40 (the last transaction that will be kept), since they are needed later to reset the tracker state:
{ "id":"TRACKER!TX!8000000000000028", "_version_":1736233557160886272, "TXID":40, "INTXID":40, "TXCOMMITTIME":1655799960307, "DOC_TYPE":"Tx", "int@s_@cascade":0, "LAST_INCOMING_CONTENT_VERSION_ID":-10 }
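To read these values from a script, a small extraction helper is enough. The sketch below is hypothetical (json_long_field and tracker_state are illustrative names), avoids jq by using grep and sed, and assumes plain /select access to the core; it only handles flat "NAME":number pairs like the ones in these tracker documents.

```shell
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Hypothetical helper: print the first value of a numeric JSON field
# (e.g. S_TXID, S_TXCOMMITTIME) found in the text read on stdin.
# The leading quote in the pattern keeps S_TXID from matching S_INTXID.
json_long_field() {
  # $1: field name
  grep -o "\"$1\":-\{0,1\}[0-9]*" | head -n 1 | sed 's/.*://'
}

# Fetch the State documents; assumes /select accepts a Lucene query.
tracker_state() {
  # $1: core name
  curl --silent --get "$SOLR_URL/$1/select" \
    --data-urlencode 'q=DOC_TYPE:State' \
    --data-urlencode 'wt=json'
}
```

For instance, `tracker_state alfresco | json_long_field S_TXID` should print 43 for the sample above.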
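The id of the Tx tracker document seems to encode the transaction id: in the samples in this post, TXID 40 maps to TRACKER!TX!8000000000000028 and TXID 43 maps to TRACKER!TX!800000000000002b, that is, the id as 16 hex digits with the top bit set. The hypothetical tx_doc_id helper below builds that id from a TXID; the encoding is inferred from these samples rather than from documentation, and the tx_doc query again assumes plain /select access.

```shell
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Hypothetical helper: build the Tx tracker document id for a transaction.
# Encoding inferred from the samples in this post (40 -> 8000000000000028,
# 43 -> 800000000000002b): "8" followed by the id as 15 lowercase hex digits,
# which is only valid for ids below 2^60.
tx_doc_id() {
  printf 'TRACKER!TX!8%015x\n' "$1"
}

# Fetch that document; assumes the /select handler accepts a Lucene query.
tx_doc() {
  # $1: core name, $2: TXID
  curl --silent --get "$SOLR_URL/$1/select" \
    --data-urlencode "q=id:\"$(tx_doc_id "$2")\"" \
    --data-urlencode 'wt=json'
}
```

With these assumptions, `tx_doc alfresco 40` should return the document shown above.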
Remove the transactions from SOLR Core using the SOLR Admin REST API:
$ curl --location --request GET 'http://localhost:8983/solr/admin/cores?action=purge&txid=43'
$ curl --location --request GET 'http://localhost:8983/solr/admin/cores?action=purge&txid=42'
$ curl --location --request GET 'http://localhost:8983/solr/admin/cores?action=purge&txid=41'
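When more than a handful of transactions are involved, the purge commands can be generated in a loop. This is a hypothetical helper (purge_txids is an illustrative name) that purges from the newest transaction down to the oldest, as in the commands above; DRY_RUN=1 prints the URLs instead of calling SOLR so you can review them first.

```shell
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Hypothetical helper: purge every transaction in [first, last], newest first.
# Set DRY_RUN=1 to print the URLs instead of calling SOLR.
purge_txids() {
  # $1: first TXID to remove, $2: last TXID to remove
  first=$1
  txid=$2
  while [ "$txid" -ge "$first" ]; do
    url="$SOLR_URL/admin/cores?action=purge&txid=$txid"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf '%s\n' "$url"
    else
      curl --silent --location --request GET "$url" || return 1
    fi
    txid=$((txid - 1))
  done
}
```

`DRY_RUN=1 purge_txids 41 43` prints the three purge URLs for this example; drop DRY_RUN to execute them.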
The purge operation may take a while. Before moving forward, verify that the latest transaction in the SOLR core is the expected one (40 in this example):
{ "id":"TRACKER!TX!8000000000000028", "_version_":1736233557160886272, "TXID":40, "INTXID":40, "TXCOMMITTIME":1655799960307, "DOC_TYPE":"Tx", "int@s_@cascade":0, "LAST_INCOMING_CONTENT_VERSION_ID":-10 }
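Since the purge may take a while, a polling loop can wait for the expected transaction. As a sketch, assuming plain /select access and that TXID can be used for sorting (both assumptions), the hypothetical wait_for_latest_tx helper below checks the newest Tx document every 10 seconds, using latest_tx to query it and first_txid to read its TXID.

```shell
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Hypothetical helper: extract TXID from the first document in a JSON response.
# The leading quote in the pattern keeps TXID from matching INTXID or S_TXID.
first_txid() {
  grep -o '"TXID":[0-9]*' | head -n 1 | sed 's/.*://'
}

# Ask SOLR for the Tx document with the highest TXID (assumes TXID is sortable).
latest_tx() {
  # $1: core name
  curl --silent --get "$SOLR_URL/$1/select" \
    --data-urlencode 'q=DOC_TYPE:Tx' \
    --data-urlencode 'sort=TXID desc' \
    --data-urlencode 'rows=1' \
    --data-urlencode 'wt=json'
}

# Poll until the latest indexed transaction equals the expected one.
wait_for_latest_tx() {
  # $1: core, $2: expected TXID, $3: max attempts (default 30, 10 s apart)
  attempts=${3:-30}
  while [ "$attempts" -gt 0 ]; do
    current=$(latest_tx "$1" | first_txid)
    [ "$current" = "$2" ] && return 0
    attempts=$((attempts - 1))
    sleep 10
  done
  return 1
}
```

`wait_for_latest_tx alfresco 40` returns success once transaction 40 is the latest one, or fails after 30 attempts.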
Once the transactions have been removed from the SOLR core, the status document TRACKER!STATE!TX needs to be modified. Before performing this update, stop Alfresco Search Services and add the following property to solrcore.properties to disable the tracking process. The property must be set in both cores, alfresco and archive:
enable.alfresco.tracking=false
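Editing both files by hand is easy to get wrong, so here is a hypothetical set_tracking helper that sets the property in both cores. The SOLR_HOME default and the <core>/conf/solrcore.properties layout are assumptions about a typical Alfresco Search Services installation; adjust them to yours, and run the helper only while the service is stopped.

```shell
# Assumed location of the SOLR home; override SOLR_HOME for your installation.
SOLR_HOME="${SOLR_HOME:-/opt/alfresco-search-services/solrhome}"

# Hypothetical helper: set enable.alfresco.tracking in both cores.
set_tracking() {
  # $1: true or false
  for core in alfresco archive; do
    props="$SOLR_HOME/$core/conf/solrcore.properties"
    [ -f "$props" ] || { echo "missing $props" >&2; return 1; }
    if grep -q '^enable\.alfresco\.tracking=' "$props"; then
      # Rewrite the existing line via a temp file, then copy it back with cat
      # so the original file keeps its ownership and permissions.
      sed "s/^enable\.alfresco\.tracking=.*/enable.alfresco.tracking=$1/" \
        "$props" > "$props.tmp" &&
        cat "$props.tmp" > "$props" && rm "$props.tmp"
    else
      # Property not present yet: append it.
      printf 'enable.alfresco.tracking=%s\n' "$1" >> "$props"
    fi
  done
}
```

Run `set_tracking false` at this step and `set_tracking true` when reverting the configuration later.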
Start Alfresco Search Services again and, once it is up and running, use the following command to update the status document with the properties of transaction 40:
$ curl --location --request POST \
  'http://localhost:8983/solr/alfresco/update?commitWithin=1000&overwrite=true&wt=json' \
  --header 'Content-Type: application/json' \
  --data-raw '[ { "id":"TRACKER!STATE!TX", "_version_":1, "S_TXID":40, "S_INTXID":40, "S_TXCOMMITTIME":1655799960307, "DOC_TYPE":"State", "LAST_INCOMING_CONTENT_VERSION_ID":-10 } ]'
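The payload can also be generated from the two values of transaction 40 instead of being edited by hand. In this sketch (state_tx_payload and reset_state_tx are hypothetical names), the JSON mirrors the command above. Note that "_version_":1 relies on SOLR optimistic concurrency: with that value the update only succeeds if a document with the same id already exists, which avoids creating a stray state document by mistake.

```shell
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Hypothetical helper: build the TRACKER!STATE!TX document from the
# properties of the last transaction to keep.
state_tx_payload() {
  # $1: TXID, $2: TXCOMMITTIME
  printf '[{"id":"TRACKER!STATE!TX","_version_":1,"S_TXID":%s,"S_INTXID":%s,"S_TXCOMMITTIME":%s,"DOC_TYPE":"State","LAST_INCOMING_CONTENT_VERSION_ID":-10}]\n' \
    "$1" "$1" "$2"
}

# Send the update; "_version_":1 makes SOLR reject it if the document is missing.
reset_state_tx() {
  # $1: core, $2: TXID, $3: TXCOMMITTIME
  curl --silent --location --request POST \
    "$SOLR_URL/$1/update?commitWithin=1000&overwrite=true&wt=json" \
    --header 'Content-Type: application/json' \
    --data-raw "$(state_tx_payload "$2" "$3")"
}
```

`reset_state_tx alfresco 40 1655799960307` sends the same update as the command above.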
Stop Alfresco Search Services again and revert the previous change in both solrcore.properties files:
enable.alfresco.tracking=true
Once Alfresco Search Services is up and running again, the regular tracking process resumes from transaction 40 and re-indexes the removed transactions. After a while, the latest transaction should be 43 again in both the Tx document and the TRACKER!STATE!TX document:
{ "id":"TRACKER!STATE!TX", "_version_":1736237510552453120, "S_TXID":43, "S_INTXID":43, "S_TXCOMMITTIME":1655799968144, "DOC_TYPE":"State", "LAST_INCOMING_CONTENT_VERSION_ID":-10 }
{ "id":"TRACKER!TX!800000000000002b", "_version_":1736237530581303296, "TXID":43, "INTXID":43, "TXCOMMITTIME":1655799968144, "DOC_TYPE":"Tx", "int@s_@cascade":0, "LAST_INCOMING_CONTENT_VERSION_ID":-10 }
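Comparing both documents can be scripted too. The hypothetical check below (json_field and state_matches_latest_tx are illustrative names) takes the TRACKER!STATE!TX document and the newest Tx document as JSON text, for example saved SOLR responses, and succeeds only when they report the same transaction id and commit time; the grep/sed extraction assumes flat "NAME":number pairs.

```shell
# Hypothetical helper: first numeric value of a field in a JSON text.
json_field() {
  # $1: JSON text, $2: field name
  printf '%s\n' "$1" | grep -o "\"$2\":[0-9]*" | head -n 1 | sed 's/.*://'
}

# Succeeds when the state document and the newest Tx document agree.
state_matches_latest_tx() {
  # $1: TRACKER!STATE!TX document, $2: newest Tx document
  state_txid=$(json_field "$1" S_TXID)
  [ -n "$state_txid" ] &&
    [ "$state_txid" = "$(json_field "$2" TXID)" ] &&
    [ "$(json_field "$1" S_TXCOMMITTIME)" = "$(json_field "$2" TXCOMMITTIME)" ]
}
```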
Additional notes
An alternative approach to disable indexing, contributed by @morganp1, is the "disable-indexing" action of the SOLR Admin REST API:
http://localhost:8983/solr/admin/cores?action=disable-indexing
<response> <lst name="action"> <lst name="alfresco"> <bool name="CASCADE">false</bool> <bool name="CONTENT">false</bool> <bool name="ACL">false</bool> <bool name="METADATA">false</bool> </lst> <lst name="archive"> <bool name="CASCADE">false</bool> <bool name="CONTENT">false</bool> <bool name="ACL">false</bool> <bool name="METADATA">false</bool> </lst> </lst> </response>
This operation does not require restarting the SOLR server, which may be preferable in some use cases.
To restore the indexing process, use the opposite action, enable-indexing:
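The response can be checked from a script as well. In this sketch, set_indexing and all_flags_are are hypothetical helpers: the first calls the disable-indexing or enable-indexing action from the SOLR Admin REST API, the second succeeds only when every <bool> flag in the XML response has the expected value.

```shell
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Hypothetical helper: call disable-indexing or enable-indexing, print the XML.
set_indexing() {
  # $1: disable-indexing or enable-indexing
  curl --silent --location "$SOLR_URL/admin/cores?action=$1"
}

# Succeeds when every <bool> flag in the response has the given value.
all_flags_are() {
  # $1: XML response, $2: true or false
  total=$(printf '%s\n' "$1" | grep -o '<bool name="[A-Z]*">' | wc -l | tr -d ' ')
  matching=$(printf '%s\n' "$1" | grep -o "<bool name=\"[A-Z]*\">$2</bool>" | wc -l | tr -d ' ')
  [ "$total" -gt 0 ] && [ "$total" -eq "$matching" ]
}
```

For example, `all_flags_are "$(set_indexing disable-indexing)" false` confirms that indexing is off for both cores, and the same check with enable-indexing and true confirms it is back on.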
http://localhost:8983/solr/admin/cores?action=enable-indexing
<response> <lst name="action"> <lst name="alfresco"> <bool name="CASCADE">true</bool> <bool name="CONTENT">true</bool> <bool name="ACL">true</bool> <bool name="METADATA">true</bool> </lst> <lst name="archive"> <bool name="CASCADE">true</bool> <bool name="CONTENT">true</bool> <bool name="ACL">true</bool> <bool name="METADATA">true</bool> </lst> </lst> </response>