Summary
This topic describes how to set up a multi-node Hadoop cluster with the following nodes.
Node          | IP Address  | Hostname
Hadoop Master | 192.168.1.1 | hadoop-master
Hadoop Slave1 | 192.168.1.2 | hadoop-slave1
Hadoop Slave2 | 192.168.1.3 | hadoop-slave2
Installing/verifying Java
Java is the main prerequisite for installing and running Hadoop. The latest version of the Java JDK should be installed on the system. The installed version can be checked with the command below.
$ java -version
If Java is installed on the machine, the command produces output similar to the following.
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
If Java is not installed, install it with the steps below.
a.1:
Download the JDK (jdk-8u<latest-version>-linux-x64.tar.gz) from the following link: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Choose jdk-8u<latest-version>-linux-x64.tar.gz from the list for a 64-bit system; for a different architecture, choose the matching package.
For example, if the latest version is 161, the file to download would be jdk-8u161-linux-x64.tar.gz.
a.2:
Change to the target directory where Java is to be installed and move the .tar.gz file into it.
a.3:
Unpack the tarball to install the JDK.
$ tar zxvf jdk-8u<latest-version>-linux-x64.tar.gz
For example, if the latest version is 161, the command would be
$ tar zxvf jdk-8u161-linux-x64.tar.gz
The Java Development Kit files are unpacked into a directory called jdk1.8.0_<latest-version> under the current directory.
a.4:
Delete the .tar.gz file to save disk space.
$ rm jdk-8u<latest-version>-linux-x64.tar.gz
Now verify the Java version with the -version option from the terminal.
$ java -version
This produces the output below.
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
a.5:
To make Java available to all users, move it to the location /usr/local/. Switch to the root user and type the following commands.
$ su
password:
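The move itself is not shown in the steps above; a minimal sketch as root, assuming the JDK was unpacked as jdk1.8.0_161 in the current directory:
# mv jdk1.8.0_161 /usr/local/
# exit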
a.6:
To set up the PATH and JAVA_HOME variables, add the following lines to the ~/.bashrc file.
export JAVA_HOME=/usr/local/jdk1.8.0_161
export PATH=$PATH:$JAVA_HOME/bin
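Reload the shell configuration and confirm the variables took effect (a quick check, assuming the paths above):
$ source ~/.bashrc
$ echo $JAVA_HOME
$ java -version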
Follow the above process to install Java on all of your cluster nodes.
Creating User Account
Creating a Hadoop user group and user is not mandatory, but it is recommended before installing Hadoop.
Open the Linux terminal and type the below command to create the user group.
$ sudo addgroup hadoop
If the command is successful, you will see the messages below and the command prompt will be displayed again.
$ sudo addgroup hadoop
Adding group 'hadoop' (GID 1001) ...
Done.
$
Type the below command to create the user.
$ sudo adduser --ingroup hadoop hdpuser
If the command is successful, you will be prompted to enter the details shown below.
$ sudo adduser --ingroup hadoop hdpuser
Adding user 'hdpuser' ...
Adding new user 'hdpuser' (1002) with group 'hadoop' ...
Creating home directory '/home/hdpuser' ...
Copying files from '/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hdpuser
Enter the new value or press enter for the default
Full Name[]:
Room Number[]:
Work Phone[]:
Home Phone[]:
Other[]:
Is the information correct? [Y/n] Y
$
Once the command prompt appears again, the user has been created successfully.
Create a system user account on both the master and slave systems to use for the Hadoop installation.
# useradd hadoop
# passwd hadoop
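As a quick check (not part of the original steps), confirm the account exists on each machine:
# id hadoop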
Mapping the Nodes
Edit the hosts file in the /etc/ folder on all nodes and specify the IP address of each system followed by its host name.
$ vi /etc/hosts
# enter the following lines in the /etc/hosts file.
192.168.1.1 hadoop-master
192.168.1.2 hadoop-slave1
192.168.1.3 hadoop-slave2
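To verify that name resolution works on every node (a quick check, not in the original steps):
$ ping -c 1 hadoop-master
$ ping -c 1 hadoop-slave1
$ ping -c 1 hadoop-slave2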
Configuring Key Based Login
Set up ssh on every node so the nodes can communicate with one another without prompting for a password.
# su hadoop
$ ssh-keygen -t rsa
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-master
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave2
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
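Before proceeding, verify that passwordless login works from the master (assuming the hadoop user set up above):
$ ssh hadoop@hadoop-slave1 hostname
$ ssh hadoop@hadoop-slave2 hostname
Each command should print the slave's host name without asking for a password.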
Installing Hadoop
On the master server, download and extract Hadoop 3.1.0 from the Apache Software Foundation using the following commands.
$ mkdir /opt/hadoop
$ cd /opt/hadoop/
$ wget http://www-us.apache.org/dist/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz
$
Once the download completes, untar the archive.
$ tar -xzf hadoop-3.1.0.tar.gz
Rename the extracted folder to hadoop to avoid confusion, and give the hadoop user ownership.
$ mv hadoop-3.1.0 hadoop
$ chown -R hadoop /opt/hadoop
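As a quick sanity check (assuming Java is configured as above), confirm the unpacked distribution runs:
$ /opt/hadoop/hadoop/bin/hadoop version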
Configuring Hadoop
Configure the Hadoop server by making the changes given below.
core-site.xml
Open the core-site.xml file and edit it as shown below.
<configuration>
   <property>
      <name>fs.default.name</name>
      <value>hdfs://hadoop-master:9000/</value>
   </property>
   <property>
      <name>dfs.permissions</name>
      <value>false</value>
   </property>
</configuration>
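A note on the key: fs.default.name is the legacy property name; Hadoop 2.x and later deprecate it in favor of fs.defaultFS, although the old name is still accepted as an alias. The equivalent modern entry would be:
<name>fs.defaultFS</name>
<value>hdfs://hadoop-master:9000/</value>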
hdfs-site.xml
Open the hdfs-site.xml file and edit it as shown below.
<configuration>
   <property>
      <name>dfs.data.dir</name>
      <value>/opt/hadoop/hadoop/dfs/name/data</value>
      <final>true</final>
   </property>
   <property>
      <name>dfs.name.dir</name>
      <value>/opt/hadoop/hadoop/dfs/name</value>
      <final>true</final>
   </property>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
</configuration>
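Make sure the directories referenced above exist and are writable by the hadoop user before the NameNode is formatted; a minimal sketch, assuming the /opt/hadoop/hadoop layout used throughout:
$ mkdir -p /opt/hadoop/hadoop/dfs/name/data
$ chown -R hadoop /opt/hadoop/hadoop/dfs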
mapred-site.xml
Open the mapred-site.xml file and edit it as shown below.
<configuration>
   <property>
      <name>mapred.job.tracker</name>
      <value>hadoop-master:9001</value>
   </property>
</configuration>
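The YARN services started later read yarn-site.xml, which this walkthrough does not otherwise configure. A minimal sketch pointing the nodes at the master's ResourceManager (yarn.resourcemanager.hostname is a standard YARN configuration key; the value is an assumption matching this cluster):
<configuration>
   <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>hadoop-master</value>
   </property>
</configuration>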
hadoop-env.sh
Open the hadoop-env.sh file and edit JAVA_HOME, HADOOP_CONF_DIR, and HADOOP_OPTS as shown below.
$ vi /opt/hadoop/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_161
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export HADOOP_CONF_DIR=/opt/hadoop/hadoop/etc/hadoop
:wq
$
Installing Hadoop on Slave Servers
Install Hadoop on all the slave servers by running the following commands from the master.
$ su hadoop
$ cd /opt/hadoop
$ scp -r hadoop hadoop-slave1:/opt/hadoop
$ scp -r hadoop hadoop-slave2:/opt/hadoop
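To confirm the copy landed on each slave (passwordless ssh was configured earlier):
$ ssh hadoop-slave1 ls /opt/hadoop
$ ssh hadoop-slave2 ls /opt/hadoop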
Configuring Hadoop on Master Server
Log in to the master server and configure it by running the following commands.
$ su hadoop
$ cd /opt/hadoop/hadoop
Configuring Master Node
$ vi etc/hadoop/masters
hadoop-master
Configuring Slave Nodes
Note that Hadoop 3.x reads the list of worker nodes from etc/hadoop/workers; the file was named slaves in Hadoop 2.x.
$ vi etc/hadoop/workers
hadoop-slave1
hadoop-slave2
Format NameNode on Hadoop Master
$ su hadoop
$ cd /opt/hadoop/hadoop
$ bin/hdfs namenode -format
Starting Hadoop Services
Start Hadoop
The below commands start all the Hadoop services on hadoop-master.
$ cd $HADOOP_HOME/sbin
$ ./start-all.sh
Start HDFS Services
Run the below command on hadoop-master from $HADOOP_HOME.
$ sbin/start-dfs.sh
Start YARN Services
Run the below command on hadoop-master from $HADOOP_HOME.
$ sbin/start-yarn.sh
Check for Hadoop Services
Check daemons on Master
$ jps
NameNode
ResourceManager
Check daemons on Slaves
$ jps
DataNode
NodeManager
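If any daemon is missing, the slaves can also be checked from the master over ssh (assuming jps is on the hadoop user's PATH):
$ ssh hadoop-slave1 jps
$ ssh hadoop-slave2 jps
On Hadoop 3.x, the NameNode web UI is also available at http://hadoop-master:9870 for a cluster overview.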