Hyland Connect

asirika · ‎11-13-2024

Overview

Currently, ACS (Alfresco Content Services) customers are encouraged to migrate to Elasticsearch. Naturally, customers, partners and other interested parties have concerns about this migration process. This blog post shares the experience and knowledge I gained while assisting one of my customers who successfully adopted Elasticsearch.

Q1. What is the role of ActiveMQ in Alfresco Search Enterprise (Elasticsearch)?

Alfresco Search Enterprise consists of two main components: re-indexing and live indexing. Please refer to sections 1 & 2 of this blog post for a better understanding.

Live indexing: relies entirely on message queues, so ActiveMQ is a MUST.

Re-indexing: ActiveMQ is NOT required during re-indexing unless content indexing is performed. However, there is no harm in having it either.

Q2. Which architecture suits best?

Unlike with Solr, ACS communicates with Elasticsearch for indexing via ActiveMQ and the Elasticsearch Connector. Below is the integration diagram for Alfresco Search Enterprise.

The ESC (Elasticsearch Connector) also requires the ATS (Alfresco Transform Service) when indexing content.

Below are two options you may consider when designing the solution.

Option 1: Simple (coupled) architecture, in which ActiveMQ, ATS and ESC (Elasticsearch Connector) are configured to communicate over localhost.

Some customers prefer this architecture because it aligns with their current setup, considering:

ActiveMQ already running on each ACS node, and
Security concerns about communicating with a common, decoupled ActiveMQ node.

However, this approach definitely requires the customer to run an additional monitoring process for the Transform Service, ActiveMQ and Elasticsearch Connector processes on each running node, to detect whether any of these services was down for any period, in which case indexing has to be triggered manually.
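
The monitoring requirement for Option 1 could be sketched as a small script. This is only a minimal sketch, not something shipped with Alfresco: the function names, the PROBE hook and the port in the usage comment are assumptions, so substitute the hosts and ports of your own deployment. It reads "name host port" lines and prints the name of every service whose TCP port does not answer, using nc -z.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a TCP port accepts connections (needs netcat).
probe_tcp() {
  nc -z -w 2 "$1" "$2" </dev/null >/dev/null 2>&1
}

# Read "name host port" lines from stdin and print the name of every
# service whose probe fails. PROBE can be overridden, e.g. for testing.
report_down() {
  while read -r name host port; do
    "${PROBE:-probe_tcp}" "$host" "$port" || echo "$name"
  done
}

# Example usage (the port is an assumption, adjust to your nodes):
#   printf 'activemq localhost 61616\n' | report_down
```

Any name printed by report_down points at a service that needs attention, after which the missed period may have to be re-indexed manually.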

Option 2: Decoupled architecture with an independent ActiveMQ, Transform Service stack and Elasticsearch Connector stack

Configure a message-queue service such as Amazon MQ or a self-managed ActiveMQ instance
Auto Scaling group for the Elasticsearch Connectors
Auto Scaling group for ATS (Alfresco Transform Service)
Configure security groups to allow cross-service access among ESC, ATS and ActiveMQ.
Note that all components here communicate through ActiveMQ over TCP.

Q3. Can Elasticsearch/OpenSearch be sharded?

Yes, it is possible, and the shard count has to be determined at index creation. For more details, refer to: https://aws.amazon.com/blogs/database/get-started-with-amazon-elasticsearch-service-how-many-shards-...

Q4. Is dynamic sharding allowed in Elasticsearch?

Dynamic sharding is not allowed in Elasticsearch, so we need to forecast growth when creating the Elasticsearch cluster. While forecasting for the near future is not very hard, it is hard to forecast many years ahead and create shards accordingly. Consequently, at some point it becomes necessary to recreate the Elasticsearch cluster from scratch and perform re-indexing.

Thanks to parallel re-indexing and its indexing speed, this task is no longer a nightmare.

Guide to setting up parallel re-indexing: https://connect.hyland.com/t5/alfresco-blog/offline-parallel-re-indexing-with-elasticsearch/ba-p/125...

Guide to re-indexing at scale (see Section 3.2): https://connect.hyland.com/t5/alfresco-blog/alfresco-search-enterprise-3-2-deploying-at-scale/ba-p/1...

Q5. What is the best way to determine the shard count?

As mentioned earlier, because dynamic sharding is not allowed in Elasticsearch, customers must forecast for the near future. The ES documentation recommends a shard size of 10GB to 50GB. Thus, when calculating the shard count, it may be best to cap the shard size at 40GB.
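
As a worked example of this sizing rule, the shard count can be estimated by dividing the projected index size by the per-shard cap and rounding up. The sketch below assumes the 40GB cap suggested here; the function name shard_count and the sample sizes are hypothetical, and the projected size is whatever your own growth forecast yields.

```shell
#!/bin/sh
# Hypothetical helper: estimate primary shards as ceil(projected_gb / cap_gb).
# $1 = projected index size in GB (your forecast)
# $2 = cap per shard in GB (defaults to the 40GB suggested in this post)
shard_count() {
  projected_gb="$1"
  cap_gb="${2:-40}"
  count=$(( (projected_gb + cap_gb - 1) / cap_gb ))
  # An index always needs at least one primary shard.
  [ "$count" -lt 1 ] && count=1
  echo "$count"
}

shard_count 100      # 100GB forecast with the 40GB cap -> 3
shard_count 400 50   # 400GB forecast with a 50GB cap   -> 8
```

The result is the value to use for number_of_shards when the index is created.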

Q6. What should be considered during Elasticsearch index creation?

The default field limit in ES is 1,000. Therefore, this value must be increased if the index has more fields.

curl -XPUT 'http://localhost:9200/alfresco?pretty' -H 'Content-Type: application/json' -d'
{ "settings" :{…..,
"index.mapping.total_fields.limit":2000 } }'

Limitation on searching items beyond 10,000

The default value of max_result_window is 10,000, so requests for results beyond the first 10,000 fail with an error. To search beyond the default, we need to increase this value.

curl -XPUT "http://localhost:9200/alfresco/_settings" -H "Content-Type: application/json" -d '
{ "index" : { "max_result_window" : 450000 } }'

Alfresco API

POST /alfresco/api/-default-/public/search/versions/1/search
{ "query": { "query": "cm:name:FILE-F*"},
"limits": { "trackTotalHitsLimit": -1 }}

The trackTotalHitsLimit value is interpreted as follows:

-1 → Track total hits without limit, up to the maximum value allowed by Elastic (TRACK_TOTAL_HITS_ACCURATE, Integer max)
0 → Use the default settings (Elastic constant DEFAULT_TRACK_TOTAL_HITS_UP_TO, which tracks total hits up to 10,000)
Between 1 and Elastic’s constant TRACK_TOTAL_HITS_ACCURATE → Track total hits up to this value
Any other number → Use the default settings (DEFAULT_TRACK_TOTAL_HITS_UP_TO, which tracks total hits up to 10,000)
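
To make these rules concrete, here is a minimal sketch that maps a trackTotalHitsLimit value to the limit that would effectively apply. It only restates the rules listed here and is not Alfresco's actual implementation; the function name is hypothetical, and the constants assume Integer max is 2147483647 and the default is 10,000.

```shell
#!/bin/sh
# Assumed values of the Elastic constants named in the list above.
TRACK_TOTAL_HITS_ACCURATE=2147483647      # Integer max
DEFAULT_TRACK_TOTAL_HITS_UP_TO=10000

# Hypothetical helper: print the effective total-hits limit for a given
# trackTotalHitsLimit value, following the four rules in the list.
effective_track_total_hits() {
  limit="$1"
  if [ "$limit" -eq -1 ]; then
    echo "$TRACK_TOTAL_HITS_ACCURATE"
  elif [ "$limit" -ge 1 ] && [ "$limit" -le "$TRACK_TOTAL_HITS_ACCURATE" ]; then
    echo "$limit"
  else
    # 0 and any other number fall back to the default.
    echo "$DEFAULT_TRACK_TOTAL_HITS_UP_TO"
  fi
}
```

For instance, effective_track_total_hits 50000 prints 50000, while both 0 and -7 fall back to 10000.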

It is recommended to set number_of_replicas to 0 during bulk re-indexing. This speeds up the indexing process, whereas having replicas can negatively impact indexing speed.

curl -XPUT 'http://localhost:9200/alfresco?pretty' -H 'Content-Type: application/json' -d'
{ "settings" :{ "number_of_shards":3, "number_of_replicas":0, "index.mapping.total_fields.limit":2000 }}'
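
Since number_of_replicas can be changed on an existing index, replicas can be added back once bulk re-indexing has finished. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the target of 1 replica is only an example, and the URL and index name are taken from the commands in this post.

```shell
#!/bin/sh
# Endpoint and index name as used in this post; override via environment.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="${INDEX:-alfresco}"

# Hypothetical helper: build the settings body for a given replica count.
replica_settings_payload() {
  printf '{ "index" : { "number_of_replicas" : %d } }' "$1"
}

# Hypothetical helper: apply the replica count to the existing index.
set_replicas() {
  curl -XPUT "$ES_URL/$INDEX/_settings" \
       -H 'Content-Type: application/json' \
       -d "$(replica_settings_payload "$1")"
}

# Usage once bulk re-indexing has finished (1 is an example value):
#   set_replicas 1
```

Choose the replica count according to your own availability requirements.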

"dynamic": false and correct field mappings are a must in the OpenSearch/Elasticsearch Alfresco index to make the fields indexable and accurately searchable.

curl -XPUT "http://localhost:9200/alfresco?pretty" -H 'Content-Type: application/json' -d' {
"mappings": { "dynamic": false},
"settings" :{ ………..}}'

Elasticsearch Frequently Asked Question