amitsingh
Champ in-the-making

AWS Elasticsearch Setup With ACS

Elasticsearch 7.10 deployment using AWS Elasticsearch service

  • Elasticsearch was deployed in the same VPC as ACS, with a security group allowing incoming traffic from all the necessary services: ACS, the database, the Transform Service, the Shared File Store, etc.

  • Elasticsearch infrastructure sizing depends on factors such as data volume, expected user load, and cost

Certificate generation for Elasticsearch

  • Allow port 443 access between the security groups of ACS and Elasticsearch

  • Connect to the ACS instance and generate the elasticsearch-certificate.cer file in /home/ec2-user using the command below

  • echo | openssl s_client -servername <domain name of ES without https://> -connect <domain name of ES without https://>:443 2>/dev/null | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > elasticsearch-certificate.cer
  • Import the generated certificate into /opt/alfresco-content-services/jdk-11.0.2/lib/security/cacerts and /opt/alfresco-share-services/jdk-11.0.2/lib/security/cacerts
  • sudo /opt/alfresco-content-services/jdk-11.0.2/bin/keytool -import -alias elasticsearch -keystore /opt/alfresco-content-services/jdk-11.0.2/lib/security/cacerts -file  /home/ec2-user/elasticsearch-certificate.cer -storepass changeit -noprompt
    sudo /opt/alfresco-share-services/jdk-11.0.2/bin/keytool -import -alias elasticsearch -keystore /opt/alfresco-share-services/jdk-11.0.2/lib/security/cacerts -file  /home/ec2-user/elasticsearch-certificate.cer -storepass changeit -noprompt
  • Update the cacerts path in alfresco-global.properties (under both /opt/alfresco-content-services and /opt/alfresco-share-services) in the keystore section, as stated below
  • encryption.ssl.truststore.location=/opt/alfresco-content-services/jdk-11.0.2/lib/security/cacerts
  • Update the cacerts path in setenv.sh (/opt/alfresco-content-services/tomcat/bin) in variable JAVA_OPTS= after -Dalfresco.home
  • -Djavax.net.ssl.trustStore=/opt/alfresco-content-services/jdk-11.0.2/lib/security/cacerts
  • Update the cacerts path in setenv.sh (/opt/alfresco-share-services/tomcat/bin) in variable JAVA_OPTS= after -Dalfresco.home
  • -Djavax.net.ssl.trustStore=/opt/alfresco-share-services/jdk-11.0.2/lib/security/cacerts
  • Restart tomcat and share_tomcat (we run the Content Service and Share Service on Tomcat servers on the EC2 instance)
  • sudo service tomcat restart
    sudo service share_tomcat restart
  • Run a curl command from the ACS instance to verify that ACS and Elasticsearch can communicate
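The certificate-extraction step above can be sketched as a small reusable helper. The ES_HOST value below is a placeholder, not a real endpoint; substitute your own Elasticsearch domain:

```shell
# Placeholder endpoint -- substitute your Elasticsearch domain (no https:// prefix).
ES_HOST="vpc-example-es.eu-west-2.es.amazonaws.com"

# Keep only the -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- block
# from an openssl s_client transcript, exactly as the sed expression above does.
extract_cert() {
  sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p'
}

# With network access to the endpoint, the full pipeline would be:
# echo | openssl s_client -servername "$ES_HOST" -connect "$ES_HOST:443" 2>/dev/null \
#   | extract_cert > /home/ec2-user/elasticsearch-certificate.cer
```

Isolating the sed filter this way makes it easy to test against a saved transcript before pointing it at the live endpoint.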

Changes in Alfresco Global Properties

  • Solr settings are retained so the Search Service remains accessible from the ACS admin UI
  • ### Solr ###
    index.subsystem.name=solr6
    #dir.keystore=${dir.root}/keystore/
    dir.keystore=/opt/alfresco-content-services/keystore/metadata-keystore
  • Include the Elasticsearch subsystem
  • # Set the Elasticsearch subsystem
    index.subsystem.name=elasticsearch
    # Elasticsearch index properties
    elasticsearch.indexName=alfresco
    elasticsearch.createIndexIfNotExists=true
    # Elasticsearch server properties
    #elasticsearch.protocol=https
    elasticsearch.host=https://<elasticsearch host name>.amazonaws.com
    elasticsearch.port=443
    elasticsearch.baseUrl=/
  • Update the keystore properties so the truststore points at the cacerts file containing the generated ES certificate
  • ### Keystore Properties ###
    encryption.keystore.type=JCEKS
    encryption.ssl.truststore.location=/opt/alfresco-content-services/jdk-11.0.2/lib/security/cacerts
  • Restart tomcat and share_tomcat (we run the Content Service and Share Service on Tomcat servers on the EC2 instance)
  • sudo service tomcat restart
    sudo service share_tomcat restart
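One caveat with keeping both the Solr and Elasticsearch blocks: index.subsystem.name now appears twice in alfresco-global.properties, and (assuming standard Java .properties loading, where later entries overwrite earlier ones) the last occurrence wins. A quick way to confirm which value is effective:

```shell
# Simulated file content; point the same awk at your real
# alfresco-global.properties to check an actual installation.
props='index.subsystem.name=solr6
index.subsystem.name=elasticsearch'

# Print the value of the last index.subsystem.name entry.
active=$(printf '%s\n' "$props" | awk -F= '$1=="index.subsystem.name"{v=$2} END{print v}')
echo "$active"   # → elasticsearch
```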

ACS Admin Console Update

The following changes are to be made in the Search Service (Admin Console) after all the steps above are completed. Then restart tomcat and share_tomcat (we run the Content Service and Share Service on Tomcat servers on the EC2 instance):

sudo service tomcat restart
sudo service share_tomcat restart

Search Service in Use: Select Elasticsearch

Elasticsearch Hostname: Enter the Elasticsearch domain endpoint with the https:// prefix removed

Port: 443 is used for HTTPS connections

Secure Communications: Select https

Click Save and restart the ACS service to apply the changes

Indexing in Elasticsearch

  • Run a curl command from the ACS instance to verify whether an index named alfresco was created in Elasticsearch
  • curl https://vpc-env-acs-large82-nodes3-3chcmyl2bamxwyxyyhor4yar7i.eu-west-2.es.amazonaws.com/_cat/indices?v
  • Run a curl command to manually create an index named alfresco in Elasticsearch with the desired number of shards
  • curl -XPUT 'https://vpc-env-acs-large82-nodes3-3chcmyl2bamxwyxyyhor4yar7i.eu-west-2.es.amazonaws.com:443/alfresco?pretty' -H 'Content-Type: application/json' -d'
    {
      "settings" :{
        "number_of_shards":10,
            "number_of_replicas":0
      }
    }'
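The index-creation call can be parameterised so the shard count is easy to change per environment. The endpoint below is a placeholder for your own domain:

```shell
# Hypothetical endpoint -- replace with your own Elasticsearch domain.
ES_URL="https://vpc-example-es.eu-west-2.es.amazonaws.com:443"
SHARDS=10

# Same settings document as in the curl body above, with the shard count injected.
payload=$(printf '{"settings":{"number_of_shards":%d,"number_of_replicas":0}}' "$SHARDS")
echo "$payload"

# With network access to the cluster:
# curl -XPUT "$ES_URL/alfresco?pretty" -H 'Content-Type: application/json' -d "$payload"
```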

Indexing Pre-populated data

  • Create an EC2 instance running Linux with a 2-core CPU and 8 GiB of RAM

  • Attach it to the security groups of ACS, Elasticsearch, the Transform Service, and the database

  • Install Java 11 using the command below (the exact command may vary by OS version)

  • sudo amazon-linux-extras install java-openjdk11
  • Copy the alfresco-elasticsearch-connector-distribution-3.1.0-A2 bundle from the Nexus repo and browse to the folder containing alfresco-elasticsearch-reindexing-3.1.0-A2-app.jar

  • Run the following command to start indexing the un-indexed data in a newly deployed environment

  • nohup java -Xmx6G -jar alfresco-elasticsearch-reindexing-3.1.0-A2-app.jar \
    --alfresco.reindex.jobName=reindexByIds \
    --spring.elasticsearch.rest.uris=https://vpc-env-acs-large82-nodes3-3chcmyl2bamxwyxyyhor4yar7i.eu-west-2.es.amazonaws.com:443 \
    --spring.datasource.url=jdbc:postgresql://env-acs-large82-cluster.cluster-cd9ifkuhgqhi.eu-west-2.rds.amazonaws.com:5432/alfresco \
    --spring.datasource.username=alfresco \
    --spring.datasource.password=admin2019 \
    --alfresco.accepted-content-media-types-cache.enabled=false \
    --spring.activemq.broker-url=failover:\(ssl://b-044009cc-074a-4f4d-9223-a50260e5ad30-1.mq.eu-west-2.amazonaws.com:61617,ssl://b-044009cc-074a-4f4d-9223-a50260e5ad30-2.mq.eu-west-2.amazonaws.com:61617\) \
    --spring.activemq.user=alfresco \
    --spring.activemq.password='!Alfresco2019' \
    --alfresco.reindex.fromId=0 \
    --alfresco.reindex.toId=80000000 \
    --alfresco.reindex.multithreadedStepEnabled=true \
    --alfresco.reindex.concurrentProcessors=10 \
    --alfresco.reindex.metadataIndexingEnabled=true \
    --alfresco.reindex.contentIndexingEnabled=false \
    --alfresco.reindex.pathIndexingEnabled=true \
    --alfresco.reindex.pagesize=100 \
    --alfresco.reindex.batchSize=100  &
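For very large repositories it can be convenient to break the fromId/toId range above into smaller windows and run the reindexing app once per window. This is a sketch with illustrative values (the window count is an assumption, not something the connector prescribes); it only prints the flag pairs for each pass:

```shell
# Split an ID range into equal windows for separate reindexing passes.
FROM=0
TO=80000000      # matches the toId used above; adjust to your repository
WINDOWS=4

STEP=$(( (TO - FROM + WINDOWS - 1) / WINDOWS ))   # ceiling division
i=0
while [ "$i" -lt "$WINDOWS" ]; do
  start=$(( FROM + i * STEP ))
  end=$(( start + STEP ))
  if [ "$end" -gt "$TO" ]; then end=$TO; fi
  # Each printed pair can be appended to the java command shown above.
  echo "--alfresco.reindex.fromId=$start --alfresco.reindex.toId=$end"
  i=$(( i + 1 ))
done
```

Running the windows one at a time also makes it easier to resume after a failure, since only the last incomplete window needs to be repeated.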

Scaling Up With Elasticsearch

Once Elasticsearch is configured with ACS, an index must be created that all search operations will use. The index is divided into shards, which split the indexed data into smaller chunks; AWS Elasticsearch recommends keeping each shard between 25 GB and 50 GB to keep search operations fast. Once the shards are created and indexed data is stored in them, the shard count cannot be altered, so it must be planned up front based on data volume. For example, 1 billion files (with metadata and path indexed) occupy roughly 1.3 TB; at about 40 GB per shard, that comes to roughly 32 shards. To leave room to scale to an additional 500 million files, the total size would be around 2 TB, which at 40 GB per shard requires 50 shards.
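The shard arithmetic above is simple ceiling division; for the 2 TB scaling case:

```shell
# Worked example: ~2 TB (2000 GB) of index data at ~40 GB per shard.
TOTAL_GB=2000
SHARD_GB=40
SHARDS=$(( (TOTAL_GB + SHARD_GB - 1) / SHARD_GB ))   # ceiling division
echo "$SHARDS shards"   # → 50 shards
```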

Optimizing Elasticsearch For Optimum Performance And Cost

AWS recommends having at least one replica shard for each primary shard. Replica shards duplicate the content of the primary shards; they provide resilience and can absorb very high traffic, serving search requests when the primary shards become overwhelmed. However, creating 1, 2, or 3 replica shards per primary requires 2x, 3x, or 4x the disk space respectively. In performance testing with up to 1 billion files, replica shards showed very little impact on search query performance and were not worth the extra cost, although they remain useful for giving Elasticsearch resilience.
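The storage multiplier for replicas can be made concrete. Taking the ~1.3 TB single-copy index from the sizing example above:

```shell
# Total storage = primary size x (replicas + 1). Work in GB to stay in integers.
PRIMARY_GB=1300   # ~1.3 TB of primary shards
REPLICAS=1
TOTAL_GB=$(( PRIMARY_GB * (REPLICAS + 1) ))
echo "${TOTAL_GB} GB"   # → 2600 GB
```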

AWS Elasticsearch offers two approaches for scaling to a higher user load, for example from an initial 50 concurrent users to 100. The expensive approach is to increase the number of data nodes that host the shards; this raises cost for the extra nodes and has not shown satisfactory performance improvements. The second approach is to increase the IOPS capability of the data nodes by selecting EBS General Purpose (SSD) gp3 instead of General Purpose (SSD) gp2. Increasing the IOPS capability of the data nodes has shown considerably better performance results than adding data nodes, and gp3 is also reported to be up to 20% cheaper per GB than gp2, which does not support scalable IOPS at a fixed EBS volume size.