Summary
This topic describes how to set up a multi-node Hadoop cluster with the following nodes.
Node          | IP Address  | Hostname
Hadoop Master | 192.168.1.1 | hadoop-master
Hadoop Slave1 | 192.168.1.2 | hadoop-slave1
Hadoop Slave2 | 192.168.1.3 | hadoop-slave2
Installing/verifying Java
Java is the main prerequisite for installing and running Hadoop. The latest version of the Java JDK should be installed on the system. The installed version can be checked with the command below.
$ java -version
If Java is installed on the machine, the command produces output similar to the following.
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
If Java is not installed, install it with the steps below.
a.1:
Download the JDK (jdk-8u<latest-version>-linux-x64.tar.gz) from the following link: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Choose jdk-8u<latest-version>-linux-x64.tar.gz from the list for a 64-bit system; for a different architecture, choose the matching package.
For example, if the latest version is 161, the file to download would be jdk-8u161-linux-x64.tar.gz.
a.2:
Change to the target directory where Java is to be installed and move the .tar.gz file into it.
a.3:
Unpack the tarball to install the JDK.
$ tar zxvf jdk-8u<latest-version>-linux-x64.tar.gz
For example, if the latest version is 161, the command would be
$ tar zxvf jdk-8u161-linux-x64.tar.gz
The Java Development Kit files are unpacked into a directory called jdk1.8.0_<latest-version> under the current directory.
a.4:
Delete the .tar.gz file to save disk space.
$ rm jdk-8u<latest-version>-linux-x64.tar.gz
Now verify the Java version with the -version option from the terminal.
$ java -version
This produces the output below.
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
a.5:
To make Java available to all users, move it to the location /usr/local/. Switch to the root user and type the following commands.
$ su
password:
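The move itself is not shown in the steps above; a minimal sketch as root, assuming the JDK was unpacked as jdk1.8.0_161 in the current directory:
# mv jdk1.8.0_161 /usr/local/
# exit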
a.6:
To set up the PATH and JAVA_HOME variables, add the following lines to the ~/.bashrc file.
export JAVA_HOME=/usr/local/jdk1.8.0_161
export PATH=$PATH:$JAVA_HOME/bin
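Reload the shell configuration and confirm the variables took effect (a quick check, assuming the paths above):
$ source ~/.bashrc
$ echo $JAVA_HOME
$ java -version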
Follow the above process to install Java on all of your cluster nodes.
Creating User Account
Creating a Hadoop user group and user is not mandatory, but it is recommended before installing Hadoop.
Open the Linux terminal and type the below command to create the user group.
$ sudo addgroup hadoop
If the command is successful, you will see the messages below and the command prompt will be displayed again.
$ sudo addgroup hadoop
Adding group 'hadoop' (GID 1001) ...
Done.
$
Type the below command to create the user.
$ sudo adduser --ingroup hadoop hdpuser
If the command is successful, you will be prompted to enter the details shown below.
$ sudo adduser --ingroup hadoop hdpuser
Adding user 'hdpuser' ...
Adding new user 'hdpuser' (1002) with group 'hadoop' ...
Creating home directory '/home/hdpuser' ...
Copying files from '/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hdpuser
Enter the new value or press enter for the default
Full Name[]:
Room Number[]:
Work Phone[]:
Home Phone[]:
Other[]:
Is the information correct? [Y/n] Y
$
Once the command prompt appears again, the user has been created successfully.
Create a system user account on both the master and slave systems to use for the Hadoop installation.
# useradd hadoop
# passwd hadoop
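As a quick check (not part of the original steps), confirm the account exists on each machine:
# id hadoop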
Mapping the Nodes
Edit the hosts file in the /etc/ folder on all nodes and specify the IP address of each system followed by its host name.
$ vi /etc/hosts
# enter the following lines in the /etc/hosts file.
192.168.1.1 hadoop-master
192.168.1.2 hadoop-slave1
192.168.1.3 hadoop-slave2
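To verify that name resolution works on every node (a quick check, not in the original steps):
$ ping -c 1 hadoop-master
$ ping -c 1 hadoop-slave1
$ ping -c 1 hadoop-slave2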
Configuring Key Based Login
Set up ssh on every node so the nodes can communicate with one another without prompting for a password.
# su hadoop
$ ssh-keygen -t rsa
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-master
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave2
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
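Before proceeding, verify that passwordless login works from the master (assuming the hadoop user set up above):
$ ssh hadoop@hadoop-slave1 hostname
$ ssh hadoop@hadoop-slave2 hostname
Each command should print the slave's host name without asking for a password.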
Installing Hadoop
On the master server, download and extract Hadoop 3.1.0 from the Apache Software Foundation using the following commands.
$ mkdir /opt/hadoop
$ cd /opt/hadoop/
$ wget http://www-us.apache.org/dist/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz
$
Once the download completes, untar the archive.
$ tar -xzf hadoop-3.1.0.tar.gz
Rename the extracted folder to hadoop to avoid confusion, and give the hadoop user ownership.
$ mv hadoop-3.1.0 hadoop
$ chown -R hadoop /opt/hadoop
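As a quick sanity check (assuming Java is configured as above), confirm the unpacked distribution runs:
$ /opt/hadoop/hadoop/bin/hadoop version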
Configuring Hadoop
Configure the Hadoop server by making the changes given below.
core-site.xml
Open the core-site.xml file and edit it as shown below.
<configuration>
   <property>
      <name>fs.default.name</name>
      <value>hdfs://hadoop-master:9000/</value>
   </property>
   <property>
      <name>dfs.permissions</name>
      <value>false</value>
   </property>
</configuration>
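A note on the key: fs.default.name is the legacy property name; Hadoop 2.x and later deprecate it in favor of fs.defaultFS, although the old name is still accepted as an alias. The equivalent modern entry would be:
<name>fs.defaultFS</name>
<value>hdfs://hadoop-master:9000/</value>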
hdfs-site.xml
Open the hdfs-site.xml file and edit it as shown below.
<configuration>
   <property>
      <name>dfs.data.dir</name>
      <value>/opt/hadoop/hadoop/dfs/name/data</value>
      <final>true</final>
   </property>
   <property>
      <name>dfs.name.dir</name>
      <value>/opt/hadoop/hadoop/dfs/name</value>
      <final>true</final>
   </property>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
</configuration>
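Make sure the directories referenced above exist and are writable by the hadoop user before the NameNode is formatted; a minimal sketch, assuming the /opt/hadoop/hadoop layout used throughout:
$ mkdir -p /opt/hadoop/hadoop/dfs/name/data
$ chown -R hadoop /opt/hadoop/hadoop/dfs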
mapred-site.xml
Open the mapred-site.xml file and edit it as shown below.
<configuration>
   <property>
      <name>mapred.job.tracker</name>
      <value>hadoop-master:9001</value>
   </property>
</configuration>
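The YARN services started later read yarn-site.xml, which this walkthrough does not otherwise configure. A minimal sketch pointing the nodes at the master's ResourceManager (yarn.resourcemanager.hostname is a standard YARN configuration key; the value is an assumption matching this cluster):
<configuration>
   <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>hadoop-master</value>
   </property>
</configuration>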
hadoop-env.sh
Open the hadoop-env.sh file and edit JAVA_HOME, HADOOP_CONF_DIR, and HADOOP_OPTS as shown below.
$ vi /opt/hadoop/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_161
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export HADOOP_CONF_DIR=/opt/hadoop/hadoop/etc/hadoop
:wq
$
Installing Hadoop on Slave Servers
Install Hadoop on all the slave servers by running the following commands from the master.
$ su hadoop
$ cd /opt/hadoop
$ scp -r hadoop hadoop-slave1:/opt/hadoop
$ scp -r hadoop hadoop-slave2:/opt/hadoop
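To confirm the copy landed on each slave (passwordless ssh was configured earlier):
$ ssh hadoop-slave1 ls /opt/hadoop
$ ssh hadoop-slave2 ls /opt/hadoop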
Configuring Hadoop on Master Server
Log in to the master server and configure it by running the following commands.
$ su hadoop
$ cd /opt/hadoop/hadoop
Configuring Master Node
$ vi etc/hadoop/masters
hadoop-master
Configuring Slave Nodes
Note that Hadoop 3.x reads the list of worker nodes from etc/hadoop/workers; the file was named slaves in Hadoop 2.x.
$ vi etc/hadoop/workers
hadoop-slave1
hadoop-slave2
Format NameNode on Hadoop Master
$ su hadoop
$ cd /opt/hadoop/hadoop
$ bin/hdfs namenode -format
Starting Hadoop Services
Start Hadoop
The below commands start all the Hadoop services on hadoop-master.
$ cd $HADOOP_HOME/sbin
$ ./start-all.sh
Start HDFS Services
Run the below command on hadoop-master from $HADOOP_HOME.
$ sbin/start-dfs.sh
Start YARN Services
Run the below command on hadoop-master from $HADOOP_HOME.
$ sbin/start-yarn.sh
Check for Hadoop Services
Check daemons on Master
$ jps
NameNode
ResourceManager
Check daemons on Slaves
$ jps
DataNode
NodeManager
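If any daemon is missing, the slaves can also be checked from the master over ssh (assuming jps is on the hadoop user's PATH):
$ ssh hadoop-slave1 jps
$ ssh hadoop-slave2 jps
On Hadoop 3.x, the NameNode web UI is also available at http://hadoop-master:9870 for a cluster overview.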