Hyland Connect

miroslav · 05-18-2021

Hello,

We have a queue of events (import actions) that run one after another. The events use data from Solr 6 to pair documents, so the second event must not start until Solr has finished indexing the data from the first action; otherwise there is a risk that Solr returns only a subset of the searched data used for document pairing.

Is it possible, or appropriate, to use Solr for this purpose? I need to ensure/verify that Solr is in sync with the data imported by the previous action before starting the next one. Should I implement this logic in a separate relational database instead?

Thank you for your answer!

afaust · 05-19-2021

You cannot rely on SOLR 6 (as it is used by Alfresco) for transactional operations. It would technically be possible to verify that a specific transaction has been indexed, but that does not necessarily mean that all indexing work for the nodes in that transaction is done, as the work is split among different trackers (ACL, metadata and content). Nor does it mean that all work for that transaction completed without errors. You would have to check each and every node against the index separately, using undocumented, technical check queries.

I find it is generally quite easy to perform such operations without relying on SOLR or a secondary/separate DB. As part of an import, I would flag nodes that need further processing with additional, temporary aspects, and use Alfresco's TMQ (transactional metadata query) capabilities to query those nodes directly from Alfresco's database in an independent, second-stage action/job, removing the aspects once that processing is done. With proper handling of concurrency/locking and batch processing, this is typically sufficient for the import use cases I often encounter.
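A minimal sketch of the two-stage approach described above, in Java since Alfresco extensions are written in Java. It deliberately does not use the Alfresco API: `Node`, `FlaggedImportPipeline`, the aspect name `imp:pendingProcessing` and the property `imp:processed` are all hypothetical, and the in-memory map only stands in for a TMQ query against the repository database. Real code would also need cluster-wide locking (for example via Alfresco's `JobLockService`) and proper transaction handling, both omitted here.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for a repository node (NOT the Alfresco NodeService API).
class Node {
    final String id;
    final Map<String, String> properties = new LinkedHashMap<>();
    final Set<String> aspects = new HashSet<>();

    Node(String id) {
        this.id = id;
    }
}

class FlaggedImportPipeline {
    // Hypothetical name for the temporary marker aspect.
    static final String PENDING_ASPECT = "imp:pendingProcessing";

    // Insertion-ordered map so batches are picked up in import order.
    final Map<String, Node> repository = new LinkedHashMap<>();

    // Stage 1: the import creates the node and flags it with the temporary aspect.
    Node importNode(String id, Map<String, String> props) {
        Node node = new Node(id);
        node.properties.putAll(props);
        node.aspects.add(PENDING_ASPECT);
        repository.put(id, node);
        return node;
    }

    // Stand-in for a TMQ-style query ("nodes with the pending aspect"):
    // it reads the current repository state, so there is no index lag.
    List<Node> findPending(int batchSize) {
        return repository.values().stream()
                .filter(n -> n.aspects.contains(PENDING_ASPECT))
                .limit(batchSize)
                .collect(Collectors.toList());
    }

    // Stage 2: an independent job handles one batch and removes the marker
    // aspect from each node after its processing step. "synchronized" only
    // guards this single in-memory object; it is not cluster-wide locking.
    synchronized int processBatch(int batchSize) {
        List<Node> batch = findPending(batchSize);
        for (Node node : batch) {
            node.properties.put("imp:processed", "true"); // hypothetical processing step
            node.aspects.remove(PENDING_ASPECT);
        }
        return batch.size();
    }
}
```

The second stage only looks for the marker aspect, so in the real setup it does not depend on how far SOLR indexing has progressed: a database-backed query sees the flagged nodes once the import transaction has committed.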

miroslav · 05-19-2021

@afaust Thank you for your answer! These actions are triggered by multiple users and are placed in a sequential queue because of potential data collisions (associations between documents, cancellation of actions, rollback). I don't want to import a secondary document that does not have a primary one. Pairing is based on a key stored in a property of the primary document (defined in an aspect). I don't know whether it is efficient to read the data directly from the Alfresco DB using SQL (I will have to try it). I need all the documents (their IDs) with a given property value. What do you think about it?

afaust · 05-20-2021

I am not talking about using direct SQL to deal with data in Alfresco - more like using Alfresco's search capabilities to perform the necessary queries on the actual nodes, instead of setting up some secondary data source just for queueing.
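A sketch of the pairing check from the question, assuming the key-property lookup is served by the repository's own search (ideally a TMQ-capable query) rather than by direct SQL. `Doc`, `PairingService` and `pairingKey` are hypothetical names, and the list scan only stands in for that query; it shows the rule "no primary with this key, no import of the secondary".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical in-memory document standing in for a repository node.
class Doc {
    final String id;
    final boolean primary;
    final String pairingKey; // hypothetical property holding the pairing key

    Doc(String id, boolean primary, String pairingKey) {
        this.id = id;
        this.primary = primary;
        this.pairingKey = pairingKey;
    }
}

class PairingService {
    final List<Doc> repository = new ArrayList<>();

    // Stand-in for a repository query: IDs of all primary documents
    // whose pairing-key property equals the given value.
    List<String> findPrimaryIds(String key) {
        return repository.stream()
                .filter(d -> d.primary && d.pairingKey.equals(key))
                .map(d -> d.id)
                .collect(Collectors.toList());
    }

    // Imports the secondary document only if at least one primary document
    // with the same key already exists; otherwise it is rejected.
    boolean importSecondary(String id, String key) {
        if (findPrimaryIds(key).isEmpty()) {
            return false;
        }
        repository.add(new Doc(id, false, key));
        return true;
    }
}
```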

Normally, I try to keep users out of import handling. But when user-triggered import actions need to be processed in a specific sequence, I would store them as (metadata-ordered) placeholder/parameter nodes in a specific structure within Alfresco, and have whatever asynchronous process handles them work through them in the given order (e.g. an FTS search for action placeholder/parameter nodes, sorted by the metadata field used to define the sequence).
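The sequencing idea can be sketched as follows, assuming each user-triggered action is stored as a placeholder with a numeric sequence field. `ActionPlaceholder`, `SequentialActionQueue` and their fields are hypothetical, and the `min` over the list stands in for a search sorted by the sequence field. The processor handles one placeholder at a time and marks it done before looking for the next, so a later action never starts before an earlier one has finished.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical placeholder/parameter node for one user-triggered import action.
class ActionPlaceholder {
    final long sequence;      // hypothetical metadata field defining the order
    final String parameters;  // hypothetical action parameters
    boolean done;

    ActionPlaceholder(long sequence, String parameters) {
        this.sequence = sequence;
        this.parameters = parameters;
    }
}

class SequentialActionQueue {
    final List<ActionPlaceholder> placeholders = new ArrayList<>();
    private long nextSequence = 1;

    // Users enqueue actions; each one receives the next sequence number.
    synchronized ActionPlaceholder enqueue(String parameters) {
        ActionPlaceholder p = new ActionPlaceholder(nextSequence++, parameters);
        placeholders.add(p);
        return p;
    }

    // Stand-in for a search for pending placeholders sorted by the sequence field.
    synchronized Optional<ActionPlaceholder> nextPending() {
        return placeholders.stream()
                .filter(p -> !p.done)
                .min(Comparator.comparingLong((ActionPlaceholder a) -> a.sequence));
    }

    // Processes exactly one action (here: records its parameters in the log)
    // and marks it done; the next action is only picked up by a later call.
    synchronized Optional<ActionPlaceholder> processNext(List<String> log) {
        Optional<ActionPlaceholder> next = nextPending();
        next.ifPresent(p -> {
            log.add(p.parameters);
            p.done = true;
        });
        return next;
    }
}
```

`synchronized` only serializes access within one JVM; in a clustered repository, the processing job itself would have to hold a cluster-wide lock.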

Transactional operations with Apache Solr 6