11-19-2021 05:00 AM
Hi Community:
Sometimes I face a painful requirement when dealing with large repositories (say, more than 10M documents).
I have to apply an aspect (cm:indexControl) and some properties (cm:isIndexed=true and cm:isContentIndexed=false) to every document in the repository. What strategies would you use in a very large repository? Is there a safer or more controlled way of doing this?
In the past I did this in smaller repositories with a basic script. It was useful, but I don't think it is enough for this case:
- I used the REST API to obtain the full set of nodeRefs to update. Basically, I ran TYPE-based paginated searches for every document type.
- Then I iterated over that set of custom-type nodeRefs, calling a simple custom webscript that applied the aspect and properties to each node.
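The first step above (collecting nodeRefs through paginated searches) boils down to a paging loop. This is a minimal Java sketch, not the poster's actual code: `PageFetcher` and `Page` are hypothetical stand-ins for one paginated TYPE search (for example a REST search call paged with `skipCount`/`maxItems`), and the in-memory fake in `main` replaces the real service.

```java
import java.util.ArrayList;
import java.util.List;

public class PagedSearch {

    /** One page of results (hypothetical type, loosely mirroring a paged REST response). */
    public static final class Page {
        final List<String> nodeRefs;
        final boolean hasMoreItems;

        public Page(List<String> nodeRefs, boolean hasMoreItems) {
            this.nodeRefs = nodeRefs;
            this.hasMoreItems = hasMoreItems;
        }
    }

    /** Hypothetical abstraction over one paginated search call, e.g. a TYPE query. */
    public interface PageFetcher {
        Page fetch(int skipCount, int maxItems);
    }

    /** Walks every page of one query, advancing skipCount by the number of items returned. */
    public static List<String> fetchAll(PageFetcher fetcher, int maxItems) {
        List<String> all = new ArrayList<>();
        int skipCount = 0;
        while (true) {
            Page page = fetcher.fetch(skipCount, maxItems);
            all.addAll(page.nodeRefs);
            skipCount += page.nodeRefs.size();
            // Stop when the server says there is nothing more, or returns an empty page.
            if (!page.hasMoreItems || page.nodeRefs.isEmpty()) {
                return all;
            }
        }
    }

    public static void main(String[] args) {
        // Fake backend with 25 nodes, standing in for the real search service.
        List<String> backend = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            backend.add("workspace://SpacesStore/node-" + i);
        }
        PageFetcher fake = (skip, max) -> {
            int end = Math.min(skip + max, backend.size());
            return new Page(new ArrayList<>(backend.subList(skip, end)), end < backend.size());
        };
        System.out.println(fetchAll(fake, 10).size()); // 25
    }
}
```

One caveat with offset paging: if the query excluded nodes that already have cm:indexControl, applying the aspect while paging would shift later pages and silently skip nodes. Large `skipCount` values also tend to get progressively more expensive on the search side.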
Surely this is not the most efficient or fastest way to do it. What do you think? Is there a way to avoid doing this one node at a time? How would you improve each part?
I use Alfresco 5.2 EE and Alfresco Search Services 1.3.
Kind regards and thanks in advance.
--C.
P.S.: Yes, the idea is to reindex SOLR afterwards, to get a smaller SOLR content store and smaller indices.
11-22-2021 02:56 AM
I guess the safest way is to build something on the repository side, using the Java API.
Developing a scheduled job that applies the aspect to the nodes using a paginated search will be faster than using the external API.
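A repository-side scheduled job like this could process nodes in fixed-size batches, one transaction per batch, and skip nodes that already carry the aspect so that a failed run can simply be restarted. The sketch below is an outline under assumptions: `Repo` is a hypothetical interface, and in real Alfresco code the per-node step would presumably be `NodeService.addAspect(nodeRef, ContentModel.ASPECT_INDEX_CONTROL, props)`, with each batch wrapped by something like `RetryingTransactionHelper`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IndexControlJob {

    /** Hypothetical stand-in for the repository (NodeService + transactions in real code). */
    public interface Repo {
        boolean hasIndexControl(String nodeRef);
        void applyIndexControl(String nodeRef, Map<String, Object> props);
        /** Runs the work as one unit, e.g. one transaction per batch. */
        void inTransaction(Runnable work);
    }

    /** The property values from the question: metadata indexed, content not indexed. */
    public static Map<String, Object> indexControlProps() {
        Map<String, Object> props = new HashMap<>();
        props.put("cm:isIndexed", Boolean.TRUE);
        props.put("cm:isContentIndexed", Boolean.FALSE);
        return props;
    }

    /** Applies the aspect in fixed-size batches; returns how many nodes were changed. */
    public static int run(Repo repo, List<String> nodeRefs, int batchSize) {
        Map<String, Object> props = indexControlProps();
        int[] changed = {0};
        for (int from = 0; from < nodeRefs.size(); from += batchSize) {
            List<String> batch = nodeRefs.subList(from, Math.min(from + batchSize, nodeRefs.size()));
            repo.inTransaction(() -> {
                for (String ref : batch) {
                    // Skipping already-processed nodes makes reruns after a failure cheap.
                    if (!repo.hasIndexControl(ref)) {
                        repo.applyIndexControl(ref, props);
                        changed[0]++;
                    }
                }
            });
        }
        return changed[0];
    }

    public static void main(String[] args) {
        Set<String> withAspect = new HashSet<>();
        withAspect.add("n1"); // pretend an earlier run already handled n1
        int[] transactions = {0};
        Repo repo = new Repo() {
            public boolean hasIndexControl(String ref) { return withAspect.contains(ref); }
            public void applyIndexControl(String ref, Map<String, Object> p) { withAspect.add(ref); }
            public void inTransaction(Runnable work) { transactions[0]++; work.run(); }
        };
        int changed = run(repo, List.of("n0", "n1", "n2", "n3", "n4"), 2);
        System.out.println(changed + " changed in " + transactions[0] + " batches");
    }
}
```

With the in-memory repository in `main`, this prints `4 changed in 3 batches`. Keeping batches small bounds the size of each transaction, which is the main point of running the job inside the repository rather than node by node over HTTP.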
11-26-2021 04:16 AM
Thanks for the idea, Angel.
Developing a scheduled job seems reasonable. It is a bit reminiscent of the SOLR cron job strategy (but in this case on the repository side).
But do you know how to query all live, relevant nodes efficiently?
Regards.
--C.
11-26-2021 04:44 AM
You can use either the DB or the Search Service to get each batch of nodes to update. Using the DB will be more efficient, but it depends on your requirements.
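On the DB side, one common way to walk millions of rows in batches is keyset pagination: remember the last ID processed and ask for the next N IDs greater than it, instead of using an ever-growing offset. The Java sketch below assumes node IDs are numeric and can be read in ascending order; `IdSource` is a hypothetical interface for whatever query the job would run, since the thread does not specify that query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.function.Consumer;

public class KeysetBatches {

    /** Hypothetical query, conceptually: ids WHERE id > lastId ORDER BY id LIMIT limit. */
    public interface IdSource {
        List<Long> idsAfter(long lastId, int limit);
    }

    /** Feeds every id to the worker in batches; returns the number of batches. */
    public static int forEachBatch(IdSource source, int batchSize, Consumer<List<Long>> worker) {
        long lastId = Long.MIN_VALUE;
        int batches = 0;
        while (true) {
            List<Long> batch = source.idsAfter(lastId, batchSize);
            if (batch.isEmpty()) {
                return batches;
            }
            worker.accept(batch);
            batches++;
            // Resume point: could be persisted so the next scheduled run continues from here.
            lastId = batch.get(batch.size() - 1);
        }
    }

    /** In-memory IdSource backed by a sorted set, standing in for the DB. */
    public static IdSource fromSet(NavigableSet<Long> ids) {
        return (lastId, limit) -> {
            List<Long> out = new ArrayList<>();
            for (Long id : ids.tailSet(lastId, false)) {
                if (out.size() == limit) {
                    break;
                }
                out.add(id);
            }
            return out;
        };
    }

    public static void main(String[] args) {
        NavigableSet<Long> ids = new TreeSet<>(List.of(3L, 7L, 8L, 15L, 42L));
        List<Long> seen = new ArrayList<>();
        int batches = forEachBatch(fromSet(ids), 2, seen::addAll);
        System.out.println(batches + " batches: " + seen);
    }
}
```

This prints `3 batches: [3, 7, 8, 15, 42]`. Because each query only depends on `lastId`, its cost stays roughly constant as the job advances, and nodes updated along the way do not shift the remaining batches.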
If you need some inspiration, take a look at the implementation of the TrashcanCleaner add-on: