Configure Hadoop and start cluster services using Ansible Playbook!!!

Kanishka Shakya
Mar 21, 2021

TASK DESCRIPTION:-

🖋️ 11.1 Configure Hadoop and start cluster services using Ansible Playbook.

🖋️ 11.3 Restarting the HTTPD service is not idempotent in nature and also consumes more resources. Suggest a way to rectify this challenge in an Ansible playbook.

What is Apache Hadoop?

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

Hadoop Cluster:-

A cluster is a set of connected computers that work together as a single system. Similarly, a Hadoop cluster is a computer cluster that we use to store and process huge volumes of data in a distributed way. Each machine in the cluster is a node, and the nodes take on different roles, described below.

NameNode:-

  • NameNode is the centerpiece of HDFS.
  • NameNode is also known as the Master
  • NameNode only stores the metadata of HDFS — the directory tree of all files in the file system, and tracks the files across the cluster.

DataNode:-

  • DataNode is responsible for storing the actual data in HDFS.
  • DataNode is also known as the Slave.
  • NameNode and DataNode are in constant communication.

TargetNode:-

In Ansible, a target node (also called a managed node) is a machine that the controller node configures remotely, usually over SSH. In this setup, the hosts that will become the NameNode and the DataNode are the target nodes.

LET’S START:-

11.1 Configure Hadoop and start cluster services using Ansible Playbook.

First, we install Ansible on our controller node.

#sudo amazon-linux-extras install ansible2

Now we check whether Ansible is installed, for example by running ansible --version.

Next, we set up an inventory on the controller node so that Ansible can manage and configure the other nodes.

Now we create the inventory file itself.
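As a rough illustration, an inventory for this kind of setup might look like the sketch below, written in Ansible's YAML inventory format. The file name, IP addresses, group names, user, and key path are all assumptions made for the example and are not taken from the original setup.

```yaml
# inventory.yml (hypothetical file name)
# All addresses, the user, and the key path are placeholders.
all:
  vars:
    ansible_user: ec2-user                                 # assumed login user
    ansible_ssh_private_key_file: ~/.ssh/hadoop-key.pem    # hypothetical key path
  children:
    namenode:            # group for the master node
      hosts:
        192.0.2.10:
    datanode:            # group for the worker nodes
      hosts:
        192.0.2.11:
        192.0.2.12:
```

A file like this can be passed to Ansible with the -i option, or set as the default inventory in ansible.cfg.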

Now we check the list of hosts and the connectivity to them, for example with ansible all --list-hosts and ansible all -m ping.
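The connectivity check can also be written as a tiny playbook. This is only a sketch; it assumes the inventory is already in place and that the controller can reach the hosts over SSH.

```yaml
---
# Sketch: verify that every host in the inventory is reachable.
- name: Check connectivity to all target nodes
  hosts: all
  gather_facts: false
  tasks:
    - name: Ping each host through Ansible (not ICMP)
      ansible.builtin.ping:
```

The ping module logs in to each host and confirms that a usable Python is available, so a successful run means Ansible can manage that node.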

Now we can configure the DataNode in the same way as the NameNode. The files involved are core-site.xml and hdfs-site.xml.

Here is the playbook for configuring the target nodes as NameNode and DataNode.
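For readers without the screenshot, a minimal sketch of such a playbook could look like the following. It is not the original playbook. It assumes that Hadoop and Java are already installed, that the hdfs and jps commands are on the PATH, that the inventory has groups named namenode and datanode, and that the Hadoop release uses the Hadoop 2+ property names and the Hadoop 3 style hdfs --daemon command. The configuration directory, storage directory, IP address, and port are hypothetical values.

```yaml
---
# Sketch only: every path, address, and group name here is an assumption.
- name: Configure HDFS on the NameNode and DataNode hosts
  hosts: namenode:datanode
  become: true
  vars:
    hadoop_conf_dir: /etc/hadoop      # assumed Hadoop config directory
    namenode_ip: 192.0.2.10           # hypothetical NameNode address
    namenode_port: 9001               # hypothetical port
    hdfs_dir: /hadoop_data            # hypothetical local storage directory
    # Pick role-specific values from the host's inventory group.
    hdfs_role: "{{ 'namenode' if 'namenode' in group_names else 'datanode' }}"
    hdfs_process: "{{ 'NameNode' if 'namenode' in group_names else 'DataNode' }}"
    hdfs_dir_property: "{{ 'dfs.namenode.name.dir' if 'namenode' in group_names else 'dfs.datanode.data.dir' }}"
  tasks:
    - name: Create the local storage directory
      ansible.builtin.file:
        path: "{{ hdfs_dir }}"
        state: directory
        mode: "0755"

    - name: Write core-site.xml (points every node at the NameNode)
      ansible.builtin.copy:
        dest: "{{ hadoop_conf_dir }}/core-site.xml"
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>fs.defaultFS</name>
              <value>hdfs://{{ namenode_ip }}:{{ namenode_port }}</value>
            </property>
          </configuration>

    - name: Write hdfs-site.xml (storage directory for this role)
      ansible.builtin.copy:
        dest: "{{ hadoop_conf_dir }}/hdfs-site.xml"
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>{{ hdfs_dir_property }}</name>
              <value>{{ hdfs_dir }}</value>
            </property>
          </configuration>

    - name: Format the NameNode only if it has not been formatted yet
      ansible.builtin.command:
        cmd: hdfs namenode -format -nonInteractive
        creates: "{{ hdfs_dir }}/current/VERSION"
      when: "'namenode' in group_names"

    - name: List running Java processes
      ansible.builtin.command: jps
      register: jps_out
      changed_when: false

    - name: Start the HDFS daemon if it is not already running
      ansible.builtin.command:
        cmd: "hdfs --daemon start {{ hdfs_role }}"
      when: hdfs_process not in jps_out.stdout
```

The creates argument and the jps check keep the format and start steps from repeating on later runs. On older Hadoop releases the start command would be hadoop-daemon.sh start followed by the role name, and the property names may differ.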

Here is the output from configuring the nodes.

So the Hadoop cluster has been set up successfully using Ansible.

📝 11.3 Restarting the HTTPD service is not idempotent in nature and also consumes more resources. Suggest a way to rectify this challenge in an Ansible playbook.

Idempotent:-

Idempotence is “the property of certain operations in mathematics and computer science that can be applied multiple times without changing the result beyond the initial application”. For Ansible it means after 1 run of a playbook to set things to a desired state, further runs of the same playbook should result in 0 changes. In simplest terms, idempotency means you can be sure of a consistent state in your environment.
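To make the problem concrete, the sketch below contrasts two tasks that use the service module. It assumes the httpd package is already installed on the target hosts; the play and task names are only illustrative.

```yaml
---
# Sketch: a restart task versus a state-based task.
- name: Compare a restart task with a state-based task
  hosts: all
  become: true
  tasks:
    # Not idempotent: the service is bounced on every run,
    # so this task reports "changed" every time.
    - name: Restart httpd unconditionally
      ansible.builtin.service:
        name: httpd
        state: restarted

    # Idempotent: reports "changed" only if httpd was stopped.
    - name: Ensure httpd is running
      ansible.builtin.service:
        name: httpd
        state: started
```

The first task is the one that the challenge describes: it interrupts the web server and consumes resources even when nothing in the configuration has changed.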

Handlers:-

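A Handler is written exactly like a Task, but it runs only when another Task notifies it, and only if that Task actually reported a change. Notified Handlers run once, at the end of the play, no matter how many Tasks notified them. This is what solves the restart problem: the HTTPD service is restarted only when its configuration really changed, and a second run of the same playbook restarts nothing.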

This is useful for secondary actions that might be required after running a Task, such as starting a new service after installation or reloading a service after a configuration change.
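A minimal sketch of this pattern for HTTPD is shown below. The configuration file path and its content are hypothetical and serve only to give the handler something to react to.

```yaml
---
# Sketch: restart httpd only when its configuration changes.
- name: Configure the web server
  hosts: all
  become: true
  tasks:
    - name: Install httpd
      ansible.builtin.package:
        name: httpd
        state: present

    - name: Deploy an extra httpd configuration file
      ansible.builtin.copy:
        dest: /etc/httpd/conf.d/custom.conf   # hypothetical file
        content: |
          Listen 8080
      notify: Restart httpd                   # fires only if the file changed

    - name: Ensure httpd is started and enabled
      ansible.builtin.service:
        name: httpd
        state: started
        enabled: true

  handlers:
    - name: Restart httpd
      ansible.builtin.service:
        name: httpd
        state: restarted
```

On the first run the copy task reports a change and the handler restarts the service once at the end of the play. On later runs the file is unchanged, the handler is never notified, and the play reports no changes.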

HTTPD Service:-

HTTPd stands for “Hypertext Transfer Protocol daemon” (i.e. Web server).

HTTP Daemon is a software program that runs in the background of a web server and waits for the incoming server requests. The daemon answers the request automatically and serves the hypertext and multimedia documents over the Internet using HTTP.

I have used variables to make the playbook dynamic, and for this I have used the vars_files keyword, which loads variables from a separate file.
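The idea can be sketched with two small files, shown here as two YAML documents. The file names, variable names, and values are assumptions chosen for the example, not the ones used in the original playbook.

```yaml
---
# File 1: vars.yml (hypothetical name), holds the values that may change.
web_package: httpd
doc_root: /var/www/html
page_text: "Hello from Ansible"

---
# File 2: web.yml (hypothetical name), the playbook that loads vars.yml.
- name: Deploy a web page using variables from a file
  hosts: all
  become: true
  vars_files:
    - vars.yml
  tasks:
    - name: Install the web server package
      ansible.builtin.package:
        name: "{{ web_package }}"
        state: present

    - name: Deploy the web page
      ansible.builtin.copy:
        dest: "{{ doc_root }}/index.html"
        content: "{{ page_text }}\n"
```

Keeping the values in vars.yml means the package name, document root, or page content can be changed without editing the playbook itself.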

Results:-

Here, this is my OUTPUT:

My Web Page has been deployed.

This clearly shows that my server is running correctly and that the configuration was successful!
