We use three EC2 instances. One is a master-eligible node that acts as both a master and a data node, and the other two act as data nodes only. You can use any number of EC2 instances; this guide uses three. Each instance has the following configuration:
AMI ID: ami-08153220276a5d89b (RHEL 8)
EC2 instance type: r5.xlarge
Attached EBS volume: 1500 GiB
(Note: the EBS volume size depends on how much data will be indexed. Use 1.5 TB for 80 million documents, 10 TB for 500 million, and 20 TB for 1,000 million.)
Security group: use the same security group as ACS, or another security group whose open ports allow connections to and from ACS. The nodes must also be able to reach each other on ports 9200 (HTTP) and 9300 (transport).
Private IPs of the three launched instances: 10.0.2.85, 10.0.2.81, 10.0.2.68
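The sizing note can be expressed as a small lookup. This is a minimal sketch that assumes the figures in the note are document counts, rounds a count up to the next listed tier, and treats 1 TB as 1000 GiB to match the 1500 GiB volume above. The function name ebs_size_gib is hypothetical.

```shell
# Hypothetical helper: print the EBS volume size (GiB) for a document count,
# using the tiers from the sizing note. Counts between tiers are rounded up
# to the next tier; 1 TB is treated as 1000 GiB (1.5 TB -> 1500 GiB).
ebs_size_gib() {
  local docs=$1
  case $docs in
    ''|*[!0-9]*) echo "usage: ebs_size_gib <document-count>" >&2; return 2 ;;
  esac
  if [ "$docs" -le 80000000 ]; then
    echo 1500
  elif [ "$docs" -le 500000000 ]; then
    echo 10000
  elif [ "$docs" -le 1000000000 ]; then
    echo 20000
  else
    # The note gives no sizing guidance beyond 1,000 million documents.
    echo "no sizing tier for $docs documents" >&2
    return 1
  fi
}
```

For example, ebs_size_gib 300000000 prints 10000, i.e. the 10 TB tier.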
Install and Configure Elasticsearch on the Instances
Of the three EC2 instances, one acts as both a master and a data node, and the other two act as data-only nodes.
Node 1 (master and data node): in this example its private IP is 10.0.2.68
Install Elasticsearch with the following command:
sudo rpm -i https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.5.1-x86_64.rpm
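The rpm -i command above installs the package straight from the URL without a separate integrity check. Elastic's RPM installation instructions also describe verifying the published SHA-512 checksum and importing Elastic's signing key; a sketch of that variant follows. The function names es_rpm_url and install_es_rpm are hypothetical, and the script assumes curl, sha512sum, and sudo are available on the instance.

```shell
# Build the download URL for an Elasticsearch RPM (default: 7.5.1, as in this guide).
es_rpm_url() {
  echo "https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-${1:-7.5.1}-x86_64.rpm"
}

# Download the RPM plus the .sha512 file Elastic publishes next to it, verify
# the checksum, import Elastic's signing key, then install the package.
install_es_rpm() {
  local url rpm dir rc
  url=$(es_rpm_url "$1")
  rpm=${url##*/}                      # file name part of the URL
  dir=$(mktemp -d) || return 1
  (
    cd "$dir" &&
    curl -fsSLO "$url" &&
    curl -fsSLO "$url.sha512" &&
    sha512sum -c "$rpm.sha512" &&     # fails if the download is corrupted
    sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch &&
    sudo rpm -i "$rpm"
  )
  rc=$?
  rm -rf "$dir"                       # clean up the temporary download directory
  return "$rc"
}
```

Running install_es_rpm (or install_es_rpm 7.5.1) would take the place of the rpm -i command; the remaining steps stay the same.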
Reload the systemd manager configuration so that it picks up the new Elasticsearch unit file. This reruns all generators, reloads all unit files, and recreates the entire dependency tree:
sudo systemctl daemon-reload
Enable the Elasticsearch service so that it starts automatically at boot:
sudo systemctl enable elasticsearch.service
Open elasticsearch.yml for editing:
sudo vi /etc/elasticsearch/elasticsearch.yml
Add or modify the following settings in elasticsearch.yml:
cluster.name: my-application
node.name: node-1
network.host: 10.0.2.68
http.port: 9200
discovery.seed_hosts: ["10.0.2.85", "10.0.2.81", "10.0.2.68"]
cluster.initial_master_nodes: ["10.0.2.68"]
node.master: true
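Editing the file by hand on every node is easy to get slightly wrong. As an alternative, the settings can be generated per node. The sketch below is a hypothetical helper (es_node_config is not part of Elasticsearch) that prints a complete elasticsearch.yml for the three-node layout in this guide. Because it replaces the whole file, it also repeats the path.data and path.logs defaults that the RPM package ships in elasticsearch.yml, and it sets node.master: false for data-only nodes, since node.master defaults to true in Elasticsearch 7.x.

```shell
# Hypothetical helper: print a complete elasticsearch.yml for one node.
#   $1 node name   (e.g. node-1)
#   $2 private IP  (e.g. 10.0.2.68)
#   $3 role        "master" for master + data, "data" for data only
es_node_config() {
  local name=$1 ip=$2 role=$3 master
  case $role in
    master) master=true ;;
    data)   master=false ;;
    *) echo "role must be 'master' or 'data'" >&2; return 2 ;;
  esac
  cat <<EOF
cluster.name: my-application
node.name: ${name}
network.host: ${ip}
http.port: 9200
discovery.seed_hosts: ["10.0.2.85", "10.0.2.81", "10.0.2.68"]
cluster.initial_master_nodes: ["10.0.2.68"]
node.master: ${master}
node.data: true
# Keep the RPM package's default data and log locations.
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
EOF
}
```

For example, to back up the packaged file and write the configuration for node 1:
sudo cp /etc/elasticsearch/elasticsearch.yml /etc/elasticsearch/elasticsearch.yml.bak
es_node_config node-1 10.0.2.68 master | sudo tee /etc/elasticsearch/elasticsearch.yml > /dev/null
Nodes 2 and 3 would use es_node_config node-2 10.0.2.81 data and es_node_config node-3 10.0.2.85 data.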
Start Elasticsearch:
sudo systemctl start elasticsearch.service
Check the status of the Elasticsearch service:
sudo systemctl status elasticsearch.service
Expected response:
elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2022-12-06 12:50:25 UTC; 22h ago
     Docs: http://www.elastic.co
 Main PID: 5528 (java)
    Tasks: 84 (limit: 201139)
   Memory: 28.1G
   CGroup: /system.slice/elasticsearch.service
           ├─5528 /usr/share/elasticsearch/jdk/bin/java -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF>
           └─5622 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller

Dec 06 12:50:14 ip-10-0-2-68.eu-west-2.compute.internal systemd[1]: Starting Elasticsearch...
Dec 06 12:50:15 ip-10-0-2-68.eu-west-2.compute.internal elasticsearch[5528]: OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a fut>
Dec 06 12:50:25 ip-10-0-2-68.eu-west-2.compute.internal systemd[1]: Started Elasticsearch.
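An active (running) status means the Java process has started, but the HTTP interface on port 9200 can take a few more seconds to accept requests. The sketch below polls a node until it answers; wait_for_es is a hypothetical helper and its 60-second default timeout is an arbitrary choice.

```shell
# Poll an Elasticsearch HTTP endpoint until it answers or the timeout expires.
#   $1 base URL (e.g. http://10.0.2.68:9200)
#   $2 timeout in seconds (default 60)
wait_for_es() {
  local url=$1 timeout=${2:-60} waited=0
  until curl -fsS --max-time 2 "$url" > /dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "no response from $url after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

For example, wait_for_es http://10.0.2.68:9200 120. If it times out, sudo journalctl -u elasticsearch and /var/log/elasticsearch/my-application.log (the log file is named after cluster.name) are good places to start looking.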
Node 2 (data node): in this example its private IP is 10.0.2.81
Install Elasticsearch with the following command:
sudo rpm -i https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.5.1-x86_64.rpm
Reload the systemd manager configuration so that it picks up the new Elasticsearch unit file. This reruns all generators, reloads all unit files, and recreates the entire dependency tree:
sudo systemctl daemon-reload
Enable the Elasticsearch service so that it starts automatically at boot:
sudo systemctl enable elasticsearch.service
Open elasticsearch.yml for editing:
sudo vi /etc/elasticsearch/elasticsearch.yml
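Add or modify the following settings in elasticsearch.yml. Because node.master defaults to true in Elasticsearch 7.x, node.master: false is set so that this node stays data-only:
cluster.name: my-application
node.name: node-2
network.host: 10.0.2.81
http.port: 9200
discovery.seed_hosts: ["10.0.2.85", "10.0.2.81", "10.0.2.68"]
cluster.initial_master_nodes: ["10.0.2.68"]
node.master: false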
Start Elasticsearch:
sudo systemctl start elasticsearch.service
Check the status of the Elasticsearch service:
sudo systemctl status elasticsearch.service
Expected response: output of the same form as shown for node 1, including Active: active (running). The hostname, PIDs, and timestamps will be those of this instance.
Node 3 (data node): in this example its private IP is 10.0.2.85
Install Elasticsearch with the following command:
sudo rpm -i https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.5.1-x86_64.rpm
Reload the systemd manager configuration so that it picks up the new Elasticsearch unit file. This reruns all generators, reloads all unit files, and recreates the entire dependency tree:
sudo systemctl daemon-reload
Enable the Elasticsearch service so that it starts automatically at boot:
sudo systemctl enable elasticsearch.service
Open elasticsearch.yml for editing:
sudo vi /etc/elasticsearch/elasticsearch.yml
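Add or modify the following settings in elasticsearch.yml. Because node.master defaults to true in Elasticsearch 7.x, node.master: false is set so that this node stays data-only:
cluster.name: my-application
node.name: node-3
network.host: 10.0.2.85
http.port: 9200
discovery.seed_hosts: ["10.0.2.85", "10.0.2.81", "10.0.2.68"]
cluster.initial_master_nodes: ["10.0.2.68"]
node.master: false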
Start Elasticsearch:
sudo systemctl start elasticsearch.service
Check the status of the Elasticsearch service:
sudo systemctl status elasticsearch.service
Expected response: output of the same form as shown for node 1, including Active: active (running). The hostname, PIDs, and timestamps will be those of this instance.
The cluster is now complete: it has three data nodes, one of which (node-1, 10.0.2.68) is also the master node, while the other two are data-only nodes.
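Before creating an index, it is worth confirming that all three nodes have joined the same cluster. The _cat/nodes API lists one line per node; in its node.role column, m marks a master-eligible node, and in the master column, * marks the elected master. The helpers below (hypothetical names) count the listed nodes and fail if the count differs from the expected one.

```shell
# Count the nodes listed in _cat/nodes output read from stdin (one per line).
es_node_count() {
  grep -c . || true   # grep prints 0 but exits 1 on empty input; keep status 0
}

# Succeed only if the cluster at $1 reports exactly $2 nodes.
es_expect_nodes() {
  local url=$1 expected=$2 actual
  actual=$(curl -fsS "$url/_cat/nodes?h=ip,node.role,master,name" | es_node_count)
  if [ "$actual" -ne "$expected" ]; then
    echo "expected $expected nodes, found $actual" >&2
    return 1
  fi
}
```

From the bastion host, run es_expect_nodes http://10.0.2.68:9200 3, or inspect the table directly with curl 'http://10.0.2.68:9200/_cat/nodes?v&h=ip,node.role,master,name'.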
Create the alfresco index with the desired number of primary and replica shards using the following curl command:
curl -XPUT 'http://10.0.2.68:9200/alfresco?pretty' -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 24,
    "number_of_replicas": 0
  }
}'
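The shard settings are easier to reason about with the arithmetic spelled out: each primary gets number_of_replicas extra copies, so the index holds primaries * (1 + replicas) shards, which Elasticsearch tries to spread evenly across the data nodes. With 24 primaries, 0 replicas, and 3 nodes, that is 24 shards, about 8 per node. shard_layout is a hypothetical helper that assumes an even spread.

```shell
# Hypothetical helper: total shard copies for an index, and the most that land
# on one node if they are spread evenly (ceiling division).
#   $1 primaries  $2 replicas per primary  $3 data nodes
shard_layout() {
  local primaries=$1 replicas=$2 nodes=$3 total
  total=$((primaries * (1 + replicas)))
  echo "total=$total per_node=$(( (total + nodes - 1) / nodes ))"
}
```

For example, shard_layout 24 0 3 prints total=24 per_node=8, and shard_layout 24 1 3 prints total=48 per_node=16. Note that number_of_shards is fixed once the index exists (changing it requires the split or shrink APIs, or a reindex), whereas number_of_replicas can be raised later, for example to 1 so that losing a single node does not make part of the index unavailable:
curl -X PUT 'http://10.0.2.68:9200/alfresco/_settings' -H 'Content-Type: application/json' -d '{"index": {"number_of_replicas": 1}}'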
From the bastion host, run the following curl command to check the cluster health:
curl -X GET 'http://10.0.2.68:9200/_cluster/health?pretty'
Expected response:
{
  "cluster_name" : "my-application",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 24,
  "active_shards" : 24,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
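A green status means every primary and replica shard is assigned; yellow means all primaries are assigned but some replicas are not, and red means at least one primary is unassigned. For scripted checks, the cluster health API can wait for a target status through its wait_for_status and timeout parameters. The sketch below (hypothetical helper names) uses those parameters and extracts the status field with sed, so it does not need jq.

```shell
# Extract the value of "status" from _cluster/health JSON read on stdin.
es_health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Ask the cluster to wait (up to 60 s) for green health, then check the result.
# curl runs without -f so a yellow/red body returned on timeout is still parsed.
es_require_green() {
  local url=$1 status
  status=$(curl -sS "$url/_cluster/health?wait_for_status=green&timeout=60s" | es_health_status)
  if [ "$status" != green ]; then
    echo "cluster status is '${status:-unknown}', expected green" >&2
    return 1
  fi
}
```

For example, es_require_green http://10.0.2.68:9200 && echo ready.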
With these steps complete, the cluster is ready to index the metadata, content, and paths of files that are already in the repository or are yet to be uploaded; these indexes are later used to serve search results.