Hyland Connect

andy1 · 06-19-2017

Introduction

Last week, eventual consistency cropped up more than usual: what it means, how to understand it and how to deal with it. This is not about the pros and cons of eventual consistency, when you may want transactional behaviour, etc. This post describes what eventual consistency is and its foibles in the context of the Alfresco Index Engine. So here are the answers to last week's questions.

Background

Back in the day, Alfresco 3.x supported a transactional index of metadata using Apache Lucene. Alfresco 4.0 introduced an eventually consistent index based on Apache SOLR 1.4. Alfresco 5.0 moved to SOLR 4 and also introduced transactional metadata query (TMDQ). TMDQ was added specifically to support the transactional use cases that used to be addressed by the Lucene index in previous versions. TMDQ uses the database and adds a bunch of required indexes as optional patches. Alfresco 5.1 supports a later version of SOLR 4 and made improvements to TMDQ. Alfresco Content Services 5.2 supports SOLR 4, SOLR 6 and TMDQ.

When changes are made to the repository they are picked up by SOLR via a polling mechanism. The required updates are made to the Index Engine to keep the two in sync. This takes some time. The Index Engine may well be in a state that reflects some previous version of the repository. It will eventually catch up and be consistent with the repository - assuming it is not forever changing.

When a query is executed it can happen in one of two ways. By default, if the query can be executed against the database it is; if not, it goes to the Index Engine. There are some subtle differences between the results, for example in collation and in how permissions are applied. Some queries are just not supported by TMDQ, for example facets, full text, "in tree" and structure. If a query is not supported by TMDQ it can only go to the Index Engine.
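The default routing described above can be sketched roughly as follows. This is only an illustration of the rule "database if TMDQ can answer it, otherwise the Index Engine". The function name, the feature labels and the string return values are made up for this sketch and are not Alfresco APIs.

```python
# Query features that the post lists as not supported by TMDQ.
# The labels themselves are hypothetical names chosen for this sketch.
UNSUPPORTED_BY_TMDQ = {"facets", "full_text", "in_tree", "structure"}


def route_query(features):
    """Return "database" if TMDQ could answer the query, else "index".

    `features` is an iterable of labels describing what the query uses.
    """
    if set(features) & UNSUPPORTED_BY_TMDQ:
        # At least one feature needs the Index Engine.
        return "index"
    # Nothing unsupported: the default behaviour is to use the database.
    return "database"


metadata_only = route_query({"property_equals", "type"})
with_facets = route_query({"property_equals", "facets"})
```

Here `metadata_only` ends up as "database" and `with_facets` as "index", which is the default behaviour in miniature.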

What does eventual consistency mean?

If the Index Engine is up to date, a query against the database or the Index Engine will see the same state. The results may still be subtly different - this will be the next topic! If the Index Engine is behind the repository then a query may produce results that do not, as yet, reflect all the changes that have been made to the repository.

Nodes may have been deleted

Nodes are present in the index but deleted from the repository
- Deleted nodes are filtered from the results when they are returned from the query
  - As a result you may see a "short page" of results even though there are more results
  - (we used to leave in a "this node has been deleted" place holder but this annoyed people more)
- The result count may be lower than the facet counts
- Faceting will include the "to be deleted nodes" in the counts
  - There is no sensible post fix for this other than re-querying to filter stuff out and someone could have deleted more....
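To make the "short page" effect concrete, here is a toy model, assuming the page is cut from the index hits first and deleted nodes are dropped afterwards. The node ids and the helper name are invented for the example.

```python
def page_of_results(index_hits, live_in_repository, skip, max_items):
    """Cut a page from the index hits, then drop nodes deleted from the repository."""
    page = index_hits[skip:skip + max_items]
    return [node for node in page if node in live_in_repository]


# Seven nodes match in the index, but n2 was deleted after it was indexed.
index_hits = ["n1", "n2", "n3", "n4", "n5", "n6", "n7"]
live_in_repository = {"n1", "n3", "n4", "n5", "n6", "n7"}

first_page = page_of_results(index_hits, live_in_repository, skip=0, max_items=5)

# Faceting is computed inside the index, so it still counts the deleted node.
facet_count = len(index_hits)
live_count = len([n for n in index_hits if n in live_in_repository])
```

The first page holds four nodes even though more results exist, and the facet count (7) is higher than the number of nodes that can actually be returned (6).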

Nodes may have been added

Nodes have been added to the repository but are not yet in the index at all
- These new nodes will not be found in the results or included in faceting
Nodes have been added to the repository but only the metadata is present in the index
- These nodes cannot be found by content

Node metadata has changed

The index reflects out of date metadata
- Some out of date nodes may be in the results when they should not be
- Some out of date nodes may be missing from the results when they should not be
- Some nodes may be counted in the wrong facets due to out of date metadata
- Some nodes may be ordered using out of date metadata

Node content has changed

The index reflects out of date content but the metadata is up to date
- Some out of date nodes may be in the results when they should not be
- Some out of date nodes may be missing from the results when they should not be

Node content and metadata have changed

The index reflects the out of date metadata and content
The index reflects out of date content (the metadata is updated first)
- Some out of date nodes may be in the results when they should not be
- Some out of date nodes may be missing from the results when they should not be
- Some nodes may be counted in the wrong facets due to out of date metadata

An update has been made to an ACL (adding an access control entry to a node)

The old ACL is reflected in queries
- Some out of date nodes may be in the results when they should not be
- Some out of date nodes may be missing from the results when they should not be
- The ACLs that are enforced may be out of date but are consistent with the repository state when the node was added to the index. Again, to be clear, the node and ACL may be out of date but permission for the content and metadata is consistent with this prior state. For nodes in the version index, they are assigned the ACL of the "live" node when the version was added to the index.

A node may be continually updated

It is possible that such a node may never appear in the index.
By default, when the Index Engine tracks the repository it only picks up changes that are older than one second. This is configurable. If we are indexing node 27 in state 120, we only add information for node 27 if it is still in that state. If it has moved on to state 236, say, we will skip node 27 until we are indexing state 236 - assuming it has not moved on again. This avoids pulling "later" information into the index which may have an updated ACE or present an overall view inconsistent with a repository state. Any out of date-ness means we have older information in the index - never newer information.
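The skipping rule can be sketched like this. It is a simplified model, not the tracker implementation: the `Change` record, the `latest_state` map and the function name are assumptions made for the example, and only the one second default and the node 27 scenario come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    node_id: int
    state: int           # repository state that produced this change
    committed_at: float  # commit time in seconds


def select_for_indexing(changes, latest_state, now, min_age=1.0):
    """Pick the changes that are safe to add to the index on this pass."""
    selected = []
    for change in changes:
        if now - change.committed_at < min_age:
            continue  # too recent: pick it up on a later pass
        if latest_state[change.node_id] != change.state:
            continue  # node has moved on: wait until that later state is indexed
        selected.append(change)
    return selected


changes = [
    Change(node_id=27, state=120, committed_at=10.0),
    Change(node_id=28, state=120, committed_at=10.0),
    Change(node_id=29, state=121, committed_at=14.5),
]
latest_state = {27: 236, 28: 120, 29: 121}  # node 27 was updated again later

selected = select_for_indexing(changes, latest_state, now=15.0)
```

Only node 28 is selected: node 27 has already moved on to state 236, and the change to node 29 is less than one second old. A node that keeps moving on before it can be indexed is skipped on every pass, which is how it may never appear in the index.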

How do I deal with eventual consistency?

To a large extent this depends on your use case. If you do need a transactional answer, the default behaviour will give you one if it can. For some queries it is not possible to get a transactional answer. You can force a transactional query in the Java API and this will be coming soon in the public API.

If you are using SOLR 6, the response from the search public API will return some information to help. It will report the index state consistent with the query.

...
"context": {
    "consistency": {
        "lastTxId": 18
    }
},
....

This can be compared with the last transaction on the repository. If they are equal the query was consistent.
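A minimal sketch of that comparison, assuming you have already pulled the `context` object out of the search response and obtained the repository's last transaction id by some other means. Neither step is covered in this post, so both are left to the caller, and the function names are made up.

```python
import json


def index_last_tx_id(context):
    """Read the transaction id that the index state reflected for this query."""
    return context["consistency"]["lastTxId"]


def query_was_consistent(context, repository_last_tx_id):
    """True if the index had caught up with the repository for this query."""
    return index_last_tx_id(context) == repository_last_tx_id


# The "context" object from the fragment above, on its own.
context = json.loads('{"consistency": {"lastTxId": 18}}')

up_to_date = query_was_consistent(context, repository_last_tx_id=18)
behind = query_was_consistent(context, repository_last_tx_id=25)
```

With the repository at transaction 18 the query was consistent. With the repository at 25 the index was behind and any of the effects described earlier could apply.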

In fact, we know the repository state for each node when we added it to the index. In the future we may check if the index state for a node reflects the repository state for the same node - we can mark nodes as potentially out of date - but only for the page of results. Faceting and aggregation are much more of a pain. Marking potentially out of date nodes and providing other indicators of consistency are on the backlog for the public API.

If your query goes to the Index Engine and it is not up to date you could see any of the issues described above under "What does eventual consistency mean?".

Using the Index Engine based on SOLR 6 gives better consistency for metadata updates. Some update operations that infrequently require many nodes to be updated are now done in the background - these are mostly move and rename operations that affect structure. So a node is now renamed quickly. Any structural information that is consequently changed on all of its children is updated afterwards. Alfresco Search Services 1.0.0 also includes improved commit coordination and concurrency improvements. These both reduce the time for changes to be reflected in the index. Some of the delay also comes from the work that SOLR does before an index goes live. This can be reduced by tuning. The cost is usually a query performance hit later.

Hybrid Query?

Surely we can take the results from the Index Engine for transactions 1-1076 and add 1077 - 2012 from TMDQ?

It's not quite that simple. TMDQ does not support all queries, it does not currently support faceting and aggregation, scoring does not really exist, and collation is neither as flexible nor the same. You would also have to reinvent the query coordination that is already in SOLR to combine the two result sets. It turns out to be a difficult but not forgotten problem.

Summary

For most use cases eventual consistency is perfectly fine. For transactional use cases TMDQ is the only solution unless the index and repository are in sync. The foibles of eventual consistency are well known and hopefully clearer, particularly in the context of the Alfresco Index Engine.