Elasticsearch was deployed in the same VPC as ACS, with a security group allowing all incoming traffic from the necessary services (ACS, Database, Transform Service, Share File Store, etc.).
The Elasticsearch infrastructure sizing will depend on factors such as data volume, expected user load, and cost.
Allow port 443 access between the security groups of ACS and Elasticsearch.
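If the rule is added from the command line rather than the console, it would look roughly like the following; the security group IDs are placeholders for the Elasticsearch and ACS groups in your account:
aws ec2 authorize-security-group-ingress --group-id <Elasticsearch security group id> --protocol tcp --port 443 --source-group <ACS security group id>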
Connect to the ACS instance and generate the elasticsearch-certificate.cer file in /home/ec2-user using the command below:
sudo echo | openssl s_client -servername <domain name of ES without https://> -connect <domain name of ES without https://>:443 2>/dev/null | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > elasticsearch-certificate.cer
sudo /opt/alfresco-content-services/jdk-11.0.2/bin/keytool -import -alias elasticsearch -keystore /opt/alfresco-content-services/jdk-11.0.2/lib/security/cacerts -file /home/ec2-user/elasticsearch-certificate.cer -storepass changeit -noprompt
sudo /opt/alfresco-share-services/jdk-11.0.2/bin/keytool -import -alias elasticsearch -keystore /opt/alfresco-share-services/jdk-11.0.2/lib/security/cacerts -file /home/ec2-user/elasticsearch-certificate.cer -storepass changeit -noprompt
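To confirm the import succeeded, the truststore can be checked for the elasticsearch alias (same keystore path and default changeit password as above):
sudo /opt/alfresco-content-services/jdk-11.0.2/bin/keytool -list -alias elasticsearch -keystore /opt/alfresco-content-services/jdk-11.0.2/lib/security/cacerts -storepass changeit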
encryption.ssl.truststore.location=/opt/alfresco-content-services/jdk-11.0.2/lib/security/cacerts
-Djavax.net.ssl.trustStore=/opt/alfresco-content-services/jdk-11.0.2/lib/security/cacerts
-Djavax.net.ssl.trustStore=/opt/alfresco-share-services/jdk-11.0.2/lib/security/cacerts
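The two -Djavax.net.ssl.trustStore options above are JVM arguments for the ACS and Share Tomcat instances respectively. Assuming each Tomcat picks up its options from a setenv.sh script (the exact file may differ in your installation), they could be appended along these lines:
# ACS Tomcat setenv.sh (path is an assumption; adjust to wherever JAVA_OPTS is set)
export JAVA_OPTS="$JAVA_OPTS -Djavax.net.ssl.trustStore=/opt/alfresco-content-services/jdk-11.0.2/lib/security/cacerts"
# Share Tomcat setenv.sh
export JAVA_OPTS="$JAVA_OPTS -Djavax.net.ssl.trustStore=/opt/alfresco-share-services/jdk-11.0.2/lib/security/cacerts"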
sudo service tomcat restart
sudo service share_tomcat restart
### Solr ###
index.subsystem.name=solr6
dir.keystore=${dir.root}/keystore/
dir.keystore=/opt/alfresco-content-services/keystore/metadata-keystore
# Set the Elasticsearch subsystem
index.subsystem.name=elasticsearch
# Elasticsearch index properties
elasticsearch.indexName=alfresco
elasticsearch.createIndexIfNotExists=true
# Elasticsearch server properties
#elasticsearch.protocol=https
elasticsearch.host=https://<elasticsearch host name>.amazonaws.com
elasticsearch.port=443
elasticsearch.baseUrl=/
### Keystore Properties ###
encryption.keystore.type=JCEKS
encryption.ssl.truststore.location=/opt/alfresco-content-services/jdk-11.0.2/lib/security/cacerts
sudo service tomcat restart
sudo service share_tomcat restart
The following changes are to be made in the Search Service (Admin Console) after all of the steps above are completed. Restart tomcat and share_tomcat (we are running the Content Service and Share Service on Tomcat servers in an EC2 instance):
sudo service tomcat restart
sudo service share_tomcat restart
Search Service in Use: Select Elasticsearch
Elasticsearch Hostname: Enter the Elasticsearch domain endpoint after removing https://, as shown in the screenshot above
Port: 443 is used for HTTPS connections
Secure Communications: Select https
Click Save and restart the ACS service to apply the changes
curl https://vpc-env-acs-large82-nodes3-3chcmyl2bamxwyxyyhor4yar7i.eu-west-2.es.amazonaws.com/_cat/indices?v
curl -XPUT 'https://vpc-env-acs-large82-nodes3-3chcmyl2bamxwyxyyhor4yar7i.eu-west-2.es.amazonaws.com:443/alfresco?pretty' -H 'Content-Type: application/json' -d' { "settings" :{ "number_of_shards":10, "number_of_replicas":0 } }'
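The shard and replica settings applied above can be verified afterwards by querying the index settings:
curl 'https://vpc-env-acs-large82-nodes3-3chcmyl2bamxwyxyyhor4yar7i.eu-west-2.es.amazonaws.com/alfresco/_settings?pretty'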
Create an EC2 instance running Linux with a 2-core CPU and 8 GiB of RAM
Attach it to the security groups of ACS, Elasticsearch, the Transform Service and the database
Install Java 11 using the command below (the command may change based on the OS version):
sudo amazon-linux-extras install java-openjdk11
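Once installed, the Java version can be confirmed before running the re-indexing JAR:
java -version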
Copy the alfresco-elasticsearch-connector-distribution-3.1.0-A2 distribution from the Nexus repository and browse to the folder where alfresco-elasticsearch-reindexing-3.1.0-A2-app.jar is present
Run the following command to start indexing the un-indexed data in a newly deployed environment:
nohup java -Xmx6G -jar alfresco-elasticsearch-reindexing-3.1.0-A2-app.jar \
  --alfresco.reindex.jobName=reindexByIds \
  --spring.elasticsearch.rest.uris=https://vpc-env-acs-large82-nodes3-3chcmyl2bamxwyxyyhor4yar7i.eu-west-2.es.amazonaws.com:443 \
  --spring.datasource.url=jdbc:postgresql://env-acs-large82-cluster.cluster-cd9ifkuhgqhi.eu-west-2.rds.amazonaws.com:5432/alfresco \
  --spring.datasource.username=alfresco \
  --spring.datasource.password=admin2019 \
  --alfresco.accepted-content-media-types-cache.enabled=false \
  --spring.activemq.broker-url=failover:\(ssl://b-044009cc-074a-4f4d-9223-a50260e5ad30-1.mq.eu-west-2.amazonaws.com:61617,ssl://b-044009cc-074a-4f4d-9223-a50260e5ad30-2.mq.eu-west-2.amazonaws.com:61617\) \
  --spring.activemq.user=alfresco \
  --spring.activemq.password='!Alfresco2019' \
  --alfresco.reindex.fromId=0 \
  --alfresco.reindex.toId=80000000 \
  --alfresco.reindex.multithreadedStepEnabled=true \
  --alfresco.reindex.concurrentProcessors=10 \
  --alfresco.reindex.metadataIndexingEnabled=true \
  --alfresco.reindex.contentIndexingEnabled=false \
  --alfresco.reindex.pathIndexingEnabled=true \
  --alfresco.reindex.pagesize=100 \
  --alfresco.reindex.batchSize=100 &
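Because the job runs in the background via nohup, its progress can be followed from the log file and by watching the document count grow in the alfresco index:
tail -f nohup.out
curl 'https://vpc-env-acs-large82-nodes3-3chcmyl2bamxwyxyyhor4yar7i.eu-west-2.es.amazonaws.com/_cat/indices?v'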
Once we have ES configured with ACS, an index needs to be created; all search operations refer to this index. The index also needs shards, which divide the indexed data into small chunks of 25 GB to 50 GB and are essential for fast search operations; AWS Elasticsearch recommends keeping the size of each shard in the 25 GB to 50 GB range. Once these shards are created and the indexed data is stored in them, the shards cannot be altered, so their count and size must be planned up front based on data volume. For example, the size of 1 billion files (with metadata and path indexed) is ~1.3 TB; keeping each shard at 40 GB, the shard count comes to 32. If we want scope to scale to an additional 500 million files, the total size would be around 2 TB, and to keep each shard at 40 GB we need 50 shards.
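As a quick back-of-the-envelope check of the figures above (sizes in GB, integer division):
# primary shard count ≈ total index size / target shard size
echo $(( 1300 / 40 ))   # ~32 shards for ~1.3 TB (1 billion files)
echo $(( 2000 / 40 ))   # 50 shards for ~2 TB (headroom for a further 500 million files)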
AWS recommends having at least 1 replica shard for each primary shard. Replica shards duplicate the content of the primary shards; they provide resilience and help cater to very high traffic, i.e. they can serve search operations whenever the primary shards become overwhelmed with search requests. However, creating 1, 2 or 3 replica shards per primary requires 2x, 3x or 4x the disk size respectively. In performance testing with data volumes of up to 1 billion files, replica shards were seen to have very little impact on search query performance and were not worth the extra spend for that purpose, although they are helpful in providing resilience to Elasticsearch.
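Unlike the shard count, the replica count can be changed on an existing index, so it can be enabled later purely for resilience. A sketch of raising it to 1 on the alfresco index created earlier (same endpoint as above):
curl -XPUT 'https://vpc-env-acs-large82-nodes3-3chcmyl2bamxwyxyyhor4yar7i.eu-west-2.es.amazonaws.com/alfresco/_settings' -H 'Content-Type: application/json' -d' { "index": { "number_of_replicas": 1 } }'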
AWS Elasticsearch offers good scalability for user load; to enable the application to support, say, 100 users where it initially supported 50, there are two approaches. The expensive one is to increase the number of data nodes that host the shards; this increases cost because of the extra nodes and has not been seen to yield satisfactory performance improvements. The second approach is to increase the IOPS capability of the data nodes by selecting EBS General Purpose (SSD) gp3 volumes instead of General Purpose (SSD) gp2. Increasing the IOPS capability of the data nodes has been seen to give considerably better performance results than adding data nodes. gp3 is also believed to be up to 20% cheaper per GB than gp2, which does not support scalable IOPS at a fixed EBS volume size.
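If the domain's EBS volumes are switched to gp3 from the command line rather than the console, the newer opensearch CLI namespace can be used (existing AWS Elasticsearch domains are managed by the same service); the domain name, volume size, IOPS and throughput below are placeholders and should be sized for your own workload:
aws opensearch update-domain-config --domain-name <domain name> --ebs-options EBSEnabled=true,VolumeType=gp3,VolumeSize=512,Iops=3000,Throughput=125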