Summary -
In this topic, we describe the sections below -
Hadoop is supported on the GNU/Linux platform, and installation is straightforward in a Linux environment. On any operating system other than Linux, Hadoop needs to be installed inside a Linux virtual machine such as VirtualBox.
We will discuss Hadoop installation in both Linux and VirtualBox environments.
Installing/verifying Java -
Java is the main prerequisite for installing and running Hadoop. The latest version of the Java JDK should be installed on the system. The installed version can be checked with the command below.
$ java -version
If java is installed on the machine, it will give you the following output.
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
If Java is not installed, install it with the steps below.
a.1 -
Download Java (JDK <latest version> - X64.tar.gz) from the following link: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Download jdk-8u<latest-version>-linux-x64.tar.gz from the list for a 64-bit system; for a different architecture, choose the corresponding file.
For example, if the latest-version is 161, the file to download would be jdk-8u161-linux-x64.tar.gz.
a.2 -
Change to the target directory where Java is to be installed and move the downloaded .tar.gz file there, for example as sketched below.
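A minimal sketch of this step, assuming the archive was downloaded to ~/Downloads and /usr/local/java is the chosen target directory (both paths are examples, not requirements):
$ sudo mkdir -p /usr/local/java
$ sudo mv ~/Downloads/jdk-8u<latest-version>-linux-x64.tar.gz /usr/local/java
$ cd /usr/local/java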
a.3 -
Unpack the tarball and install the JDK.
$ tar zxvf jdk-8u<latest-version>-linux-x64.tar.gz
For example, if the latest-version is 161, the command would be
$ tar zxvf jdk-8u161-linux-x64.tar.gz
The Java Development Kit files are installed in a directory called jdk1.8.0_<latest-version> in the current directory.
a.4 -
Delete the .tar.gz file to save disk space.
$ rm jdk-8u<latest-version>-linux-x64.tar.gz
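For the java command to be found from any terminal, JAVA_HOME and PATH should point at the unpacked JDK. A minimal sketch for ~/.bashrc, assuming the JDK was unpacked to /usr/local/java; adjust the path to wherever the tarball was actually extracted (later sections assume an Oracle-packaged JDK under /usr/lib/jvm/java-8-oracle, so use whichever path matches your installation):
# Point JAVA_HOME at the unpacked JDK and put its bin/ directory on the PATH
export JAVA_HOME=/usr/local/java/jdk1.8.0_161
export PATH=$PATH:$JAVA_HOME/bin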
Now verify the Java version with the -version option from the terminal.
$ java -version
This produces the output below -
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
Pre-installation Setup -
A working Linux installation must be in place before proceeding with the Hadoop installation. Some additional pre-installation setup is also required in the Linux environment. The steps below describe how to set up the Linux environment -
Creating a User Group and User -
Creating a dedicated Hadoop user group and user is not mandatory, but it is recommended before installing Hadoop. Open the Linux terminal and type the command below to create the user group.
$ sudo addgroup hadoop
If the command is successful, you will see the messages below and the command prompt will appear again.
$ sudo addgroup hadoop
Adding group ‘hadoop’ (GID 1001) …
Done.
$
Type the command below to create the user.
$ sudo adduser --ingroup hadoop hdpuser
If the command is successful, you will be prompted to enter the details shown below.
$ sudo adduser --ingroup hadoop hdpuser
Adding user ‘hdpuser’ ...
Adding new user ‘hdpuser’ (1002) with the group ‘hadoop’ ...
Creating home directory ‘/home/hdpuser’ ...
Copying files from ‘/etc/skel’ ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hdpuser
Enter the new value, or press ENTER for the default
Full Name[]:
Room Number[]:
Work Phone[]:
Home Phone[]:
Other[]:
Is the information correct? [Y/n] Y
$
Once the command prompt appears again, the user has been created successfully.
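To double-check the new account, the id command shows the user and its group membership (the UID/GID values below simply mirror the adduser output above):
$ id hdpuser
uid=1002(hdpuser) gid=1001(hadoop) groups=1001(hadoop)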
SSH Server installation -
SSH is used for remote login. Hadoop requires SSH to manage its nodes, i.e. remote machines as well as the local machine. Hadoop needs an SSH server to log in to localhost, so installing the SSH server is a mandatory step and must complete successfully.
b.2.1 -
Installation of SSH: Type the commands below to install the SSH server (pdsh is a parallel shell utility used by Hadoop's control scripts)
$ sudo apt-get install ssh
$ sudo apt-get install pdsh
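Before continuing, it is worth confirming that the SSH daemon is actually running. A sketch assuming a systemd-based distribution where the service is named ssh (on some distributions it is sshd):
$ sudo systemctl status ssh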
b.2.2 -
After installing the SSH server, log in as the newly created user. This step is optional, but it ensures the SSH keys generated in the next step belong to that user. Type the command below to log in as the user created earlier
$ su - hdpuser
b.2.3 -
Generate Key Pairs: As a next step, create the SSH key pair using the ssh-keygen command. The -P "" option sets an empty passphrase so that logins will not prompt for one.
$ ssh-keygen -t rsa -P ""
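The generated key pair is placed in ~/.ssh/ under the ssh-keygen default file names, which can be confirmed with a quick listing:
$ ls ~/.ssh
id_rsa  id_rsa.pub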
b.2.4 -
After successful key generation, the public key needs to be appended to the authorized_keys file to enable login without a password prompt.
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Or
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
b.2.5 -
Change the permissions of the file that contains the keys.
$ chmod 0600 ~/.ssh/authorized_keys
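To verify that passwordless SSH now works, connect to localhost; the login should succeed without a password prompt (a one-time host-key confirmation may still appear on the very first connection):
$ ssh localhost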
Downloading Hadoop
Download and extract Hadoop 3.1.0 from the Apache Software Foundation mirrors using the following commands.
$ wget http://www-us.apache.org/dist/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz
Once the download completes, untar the archive
$ tar -xzf hadoop-3.1.0.tar.gz
Rename the extracted folder to hadoop to avoid confusion
$ mv hadoop-3.1.0 hadoop
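As a quick sanity check that the distribution unpacked correctly, the bundled hadoop command can print its version (run from the directory containing the renamed hadoop folder; the exact output will vary):
$ hadoop/bin/hadoop version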
Installing Hadoop in Single-Node Mode
The steps below show the installation of Hadoop.
Setting Up Hadoop
c.1.1 -
As a first step, update the .bashrc file with a few environment variables.
$ vi .bashrc
Copy the information below into .bashrc and save it
# Set Hadoop-related environment variables.
# This one points to the Hadoop home directory.
export HADOOP_PREFIX=/home/hdpuser/hadoop
# This one points to the Oracle Java home directory.
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later).
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
# The last ones update the PATH to include the Hadoop bin/ and sbin/ directories.
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}
:wq
$
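The new variables only take effect in fresh shells; to apply them to the current session, re-read the file with the standard source builtin and confirm one of the values with echo:
$ source ~/.bashrc
$ echo $HADOOP_PREFIX
/home/hdpuser/hadoop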
c.1.2 -
Update the hadoop-env.sh file with the Java home directory. In this file, only JAVA_HOME needs to be set.
$ vi /home/hdpuser/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
:wq
$
c.1.3 -
Update the core-site.xml file to set the temporary directory and the default file system name.
$ mkdir /home/hdpuser/tmp
$ vi /home/hdpuser/hadoop/etc/hadoop/core-site.xml
Copy the information below between the configuration tags (<configuration></configuration>).
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hdpuser/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
c.1.4 -
Update the mapred-site.xml file to define the job tracker property.
$ vi /home/hdpuser/hadoop/etc/hadoop/mapred-site.xml
Copy the information below between the configuration tags (<configuration></configuration>)
<configuration>
  <property>
    <name>mapreduce.job.tracker</name>
    <value>localhost:9000</value>
  </property>
</configuration>
c.1.5 -
Update the hdfs-site.xml file to define the replication property. A replication factor of 1 is appropriate for a single-node setup.
$ vi /home/hdpuser/hadoop/etc/hadoop/hdfs-site.xml
Copy the information below between the configuration tags (<configuration></configuration>)
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
The configuration is now complete.
Verify the Hadoop Installation
c.2.1 -
Format the NameNode before running Hadoop for the first time.
$ hadoop namenode -format
Remember, this is a one-time activity; it should be done only before the first run of Hadoop and is not required every time.
c.2.2: Start Hadoop -
There are two scripts to start Hadoop
$ start-dfs.sh
$ start-yarn.sh
c.2.3 -
To check which Hadoop daemons are running, use the command below
$ jps
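On a healthy single-node setup, jps typically lists the daemons below (the process IDs are illustrative and will differ on every run):
2961 NameNode
3115 DataNode
3380 SecondaryNameNode
3601 ResourceManager
3755 NodeManager
4012 Jps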
Accessing Hadoop in the Browser
On Hadoop 3.x the default port number of the NameNode web interface is 9870 (earlier releases used 50070); use the following URL to reach the Hadoop services in a browser.
http://localhost:9870/
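The same endpoint can also be probed from the terminal, which is handy on headless machines (assuming curl is installed):
$ curl -s http://localhost:9870/ | head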
Verify All Applications of the Cluster
The default port number to access all applications of the cluster is 8088 (the YARN ResourceManager web UI); use the following URL to visit this service.
http://localhost:8088/
To stop Hadoop, use the commands below
$ stop-dfs.sh
$ stop-yarn.sh