Overview
ACS (Alfresco Content Services) customers are currently encouraged to migrate to Elasticsearch, and customers, partners and other interested parties have raised concerns about the migration process. This blog post shares the experience and knowledge I gained while assisting one of my customers who successfully adopted Elasticsearch.
Q1. What is the role of ActiveMQ in Alfresco Search Enterprise (Elasticsearch)?
Alfresco Search Enterprise consists of two main components: re-indexing and live indexing. Please refer to sections 1 and 2 of this blog post for a better understanding.
Live indexing: relies entirely on message queues, so ActiveMQ is MANDATORY.
Re-indexing: ActiveMQ is NOT required unless content indexing is performed. However, there is no harm in having it.
Q2. Which architecture is the best fit?
Unlike Solr, ACS communicates with Elasticsearch for indexing via ActiveMQ and the Elasticsearch Connector. Below is the integration diagram for Alfresco Search Enterprise.
The ESC (Elasticsearch Connector) also requires the ATS (Alfresco Transform Service) when indexing content, to extract the text to be indexed.
Below are two options you may consider when designing the solution.
Some customers prefer this architecture as this aligns with their current architecture considering
However, selecting this approach definitely
Option 2: Decoupled architecture with independent ActiveMQ, Transform Service and Elasticsearch Connector stacks
Q3. Can Elasticsearch/OpenSearch be sharded?
Yes, and the shard count has to be determined when the index is created. For more details, refer to: https://aws.amazon.com/blogs/database/get-started-with-amazon-elasticsearch-service-how-many-shards-...
Q4. Is dynamic sharding allowed in Elasticsearch?
It is no secret that Elasticsearch does not allow dynamic sharding: the number of primary shards of an index cannot be changed after the index is created. Therefore, growth has to be forecast when creating the Elasticsearch index. Forecasting for the near future is not very hard, but forecasting many years ahead and sizing the shards accordingly is difficult. As a result, the index eventually has to be recreated from scratch and fully re-indexed.
Thanks to parallel re-indexing and fast indexing speeds, this task is no longer a nightmare.
Guide to setting up parallel re-indexing: https://connect.hyland.com/t5/alfresco-blog/offline-parallel-re-indexing-with-elasticsearch/ba-p/125...
A guide to re-indexing at scale can be found in Section 3.2: https://connect.hyland.com/t5/alfresco-blog/alfresco-search-enterprise-3-2-deploying-at-scale/ba-p/1...
Q5. What is the best way to determine the shard count?
As mentioned earlier, because dynamic sharding is not allowed in Elasticsearch, customers must forecast index growth for the near future. The Elasticsearch documentation recommends a shard size between 10 GB and 50 GB. Thus, when calculating the shard count, it may be best to use a cap of 40 GB per shard.
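With a per-shard cap, the shard count becomes simple arithmetic: divide the projected index size by the cap and round up. The sketch below assumes the projected primary index size (in whole GB) at the end of the planning horizon has already been forecast; the function name `shard_count` and its arguments are hypothetical and not part of any Alfresco or Elasticsearch tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate the primary shard count for an index.
# $1 = projected primary index size in GB (whole number, assumed forecast)
# $2 = per-shard size cap in GB (defaults to the 40 GB cap suggested above)
shard_count() {
  local projected_gb=$1
  local cap_gb=${2:-40}
  # Integer ceiling division: (a + b - 1) / b
  local shards=$(( (projected_gb + cap_gb - 1) / cap_gb ))
  # An index always has at least one primary shard
  if (( shards < 1 )); then
    shards=1
  fi
  echo "$shards"
}
```

For example, a forecast of 500 GB gives `shard_count 500` = 13 primary shards (500 / 40 = 12.5, rounded up). Sizing against the size expected at the end of the planning horizon, rather than today's size, delays the point at which a full re-index becomes necessary.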
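Q6. What should be considered when creating the Elasticsearch index?
Raise index.mapping.total_fields.limit (Elasticsearch default: 1,000) at creation time if the content models define many properties, so that the index mapping does not hit the field limit: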
curl -XPUT 'http://localhost:9200/alfresco?pretty' -H 'Content-Type: application/json' -d'
{ "settings" :{…..,
"index.mapping.total_fields.limit":2000 } }'
The default value of max_result_window is 10,000, so a search that pages beyond the first 10,000 hits (from + size greater than 10,000) is rejected with an error. To page beyond this default, increase the value:
curl -XPUT "http://localhost:9200/alfresco/_settings" -d '
{ "index" : { "max_result_window" : 450000 } }' -H "Content-Type: application/json"
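Elasticsearch rejects a from/size search whose `from + size` exceeds `index.max_result_window`, which is why deep pages fail at the 10,000 default. A small shell sketch of that rule; the function name `fits_result_window` and its arguments are hypothetical and only illustrate the check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (status 0) if a from/size page fits inside
# the index's max_result_window, fails (status 1) otherwise.
# $1 = from (offset of the first hit), $2 = size (hits per page)
# $3 = max_result_window (defaults to the Elasticsearch default of 10000)
fits_result_window() {
  local from=$1 size=$2 window=${3:-10000}
  (( from + size <= window ))
}

# Example: a page starting at hit 9,990 with 20 hits per page (ends at 10,010)
if fits_result_window 9990 20; then
  echo "page fits"
else
  echo "page exceeds max_result_window"
fi
```

With the default window the example prints "page exceeds max_result_window"; with the value of 450000 set above, `fits_result_window 9990 20 450000` succeeds.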
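Alfresco API: when searching through the Alfresco Search API, set trackTotalHitsLimit in the limits section so that the total hit count is tracked beyond the default of 10,000 hits, for example: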
POST /alfresco/api/-default-/public/search/versions/1/search
{ "query": { "query": "cm:name:FILE-F*"},
"limits": { "trackTotalHitsLimit": -1 }}
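The request above can be sent with curl. This is a minimal sketch that assumes ACS is reachable at `http://localhost:8080` and uses placeholder credentials in `ACS_USER`/`ACS_PASSWORD`; the variable and function names are hypothetical. The query string is inserted without JSON escaping, so it must not contain double quotes.

```shell
#!/usr/bin/env bash
# Assumed ACS base URL and placeholder credentials (override via environment)
ACS_URL=${ACS_URL:-http://localhost:8080}
ACS_USER=${ACS_USER:-admin}
ACS_PASSWORD=${ACS_PASSWORD:-admin}

# Build the search request body.
# $1 = query string, $2 = trackTotalHitsLimit value
# NOTE: no JSON escaping is done, so $1 must not contain double quotes.
search_body() {
  printf '{"query":{"query":"%s"},"limits":{"trackTotalHitsLimit":%s}}' "$1" "$2"
}

# POST the body to the public Search API (requires a running ACS).
run_search() {
  curl -s -u "$ACS_USER:$ACS_PASSWORD" \
    -X POST "$ACS_URL/alfresco/api/-default-/public/search/versions/1/search" \
    -H 'Content-Type: application/json' \
    -d "$(search_body "$1" "$2")"
}
```

Usage: `run_search 'cm:name:FILE-F*' -1` sends the same body as the request shown above.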
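Set the number of primary shards (and replicas) when the index is created, since the primary shard count cannot be changed afterwards:
curl -XPUT 'http://localhost:9200/alfresco?pretty' -H 'Content-Type: application/json' -d'
{ "settings" :{ "number_of_shards":3, "number_of_replicas":0, "index.mapping.total_fields.limit":2000 }}'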
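Each primary shard is copied `number_of_replicas` times, so the cluster has to host `number_of_shards × (1 + number_of_replicas)` shard copies for the index. A small shell sketch of that arithmetic; the function name `total_shard_copies` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total shard copies (primaries + replicas) for an index.
# $1 = number_of_shards, $2 = number_of_replicas
total_shard_copies() {
  echo $(( $1 * (1 + $2) ))
}
```

With the settings above (3 shards, 0 replicas) the index has 3 shard copies; adding one replica doubles that to 6. Unlike the primary shard count, number_of_replicas is a dynamic setting and can be changed later through the _settings API.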
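Disable dynamic mapping so that fields which are not explicitly mapped are not added to the mapping automatically (they are kept in _source but are not indexed or searchable):
curl -XPUT "http://localhost:9200/alfresco?pretty" -H 'Content-Type: application/json' -d' {
"mappings": { "dynamic": false},
"settings" :{ ………..}}'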