Summary -
This topic covers the following steps: installing Apache Hive, installing an external database server (Apache Derby), configuring the Hive metastore, and running and verifying Hive.
Hadoop must be installed before proceeding with the Hive installation. Refer to the Hadoop installation steps here.
Step-1: Apache Hive installation
Below are the steps to install Apache Hive -
Download Apache Hive
Download the most recent stable Hive release from the Apache download mirrors: https://hive.apache.org/downloads.html
One of the suggested mirror sites is http://mirror.fibergrid.in/apache/hive/
To download the release with wget, use the command below:
$ wget http://mirror.fibergrid.in/apache/hive/hive-x.y.z/hive-x.y.z.tar.gz
Alternatively, if you downloaded the archive from the web page, check the downloaded file to find the version:
$ cd Downloads
$ ls
The ls command displays the downloaded file: apache-hive-x.y.z-bin.tar.gz
Installing Apache Hive
Verify and untar the downloaded archive file.
The commands below untar the downloaded file and list the result:
$ tar -xzvf apache-hive-x.y.z-bin.tar.gz
$ ls
The ls command displays both the downloaded archive and the extracted directory:
apache-hive-x.y.z-bin apache-hive-x.y.z-bin.tar.gz
Move the extracted directory to /usr/local/hive:
$ cd /home/user/Downloads
$ mv apache-hive-x.y.z-bin /usr/local/hive
Set the environment variable HIVE_HOME to point to the installation directory
$ cd /usr/local/hive
$ export HIVE_HOME=/usr/local/hive
Add $HIVE_HOME/bin to your PATH
$ export PATH=$HIVE_HOME/bin:$PATH
Add the Hadoop and Hive library directories to CLASSPATH:
export CLASSPATH=$CLASSPATH:/usr/local/hadoop/lib/*:.
export CLASSPATH=$CLASSPATH:/usr/local/hive/lib/*:.
Configuring Apache Hive
To configure Hive with Hadoop, edit the hive-env.sh file in the $HIVE_HOME/conf directory. Go to the directory and copy the template:
$ cd $HIVE_HOME/conf
$ cp hive-env.sh.template hive-env.sh
Edit hive-env.sh and append the line below:
export HADOOP_HOME=/usr/local/hadoop
The Hive installation is now complete. Hive requires an external database server to configure its metastore.
Step-2: External Database Server installation
Download the latest Apache Derby distribution from the Derby web site at http://db.apache.org/derby/derby_downloads.html.
The table below lists the latest Apache Derby distribution for each operating system at the time this tutorial was written.
| Operating System | Download File |
|---|---|
| Windows | db-derby-10.12.1.1-bin.zip |
| UNIX, Linux, and Mac | db-derby-10.12.1.1-bin.tar.gz |
Installing Derby
Choose a directory where the user has write permissions and install the Derby software there. The procedure differs slightly between Windows and UNIX/Linux/Mac.
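A minimal sketch for UNIX, Linux, and Mac, assuming the archive was downloaded to the current directory and /usr/local/derby is chosen as the install location (on Windows, extract the zip archive into a folder such as C:\Derby in the same way):
$ mkdir /usr/local/derby
$ mv db-derby-10.12.1.1-bin.tar.gz /usr/local/derby
$ cd /usr/local/derby
$ tar -xzvf db-derby-10.12.1.1-bin.tar.gz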
Set DERBY_INSTALL
Set the DERBY_INSTALL environment variable to the location where Derby was installed.
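For example, with the install location used above on UNIX, Linux, and Mac (on Windows, use set DERBY_INSTALL=C:\Derby\db-derby-10.12.1.1-bin instead):
$ export DERBY_INSTALL=/usr/local/derby/db-derby-10.12.1.1-bin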
Configure Embedded Derby
To use Derby in embedded mode, set CLASSPATH to include the jar files derby.jar and derbytools.jar:
- derby.jar: contains the Derby engine and the Derby Embedded JDBC driver
- derbytools.jar: optional; provides the Derby tools, such as the ij utility
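A sketch for UNIX, Linux, and Mac, assuming DERBY_INSTALL is set as above (on Windows, use set with %DERBY_INSTALL%\lib\... and a semicolon separator):
$ export CLASSPATH=$DERBY_INSTALL/lib/derby.jar:$DERBY_INSTALL/lib/derbytools.jar:$CLASSPATH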
Change directory into the DERBY_INSTALL/bin directory. For Derby embedded usage, the setEmbeddedCP.bat (Windows) and setEmbeddedCP (UNIX) scripts use the DERBY_INSTALL variable to set the CLASSPATH.
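For example, on UNIX, Linux, and Mac the script can be sourced so the CLASSPATH change applies to the current shell (on Windows, run setEmbeddedCP.bat from the same directory):
$ cd $DERBY_INSTALL/bin
$ . ./setEmbeddedCP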
Verify Derby
Echo CLASSPATH and double-check each entry to verify that the jar files are where you expect them:
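For example, on UNIX, Linux, and Mac (on Windows, use echo %CLASSPATH%):
$ echo $CLASSPATH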
Step-3: Configuring Hive metastore
The metastore needs to be configured to specify where the database is stored. This requires a change in the hive-site.xml file, which is in the $HIVE_HOME/conf directory. As a first step, copy the template file using the commands below:
$ cd $HIVE_HOME/conf
$ cp hive-default.xml.template hive-site.xml
Edit hive-site.xml and append the following lines between the <configuration> and </configuration> tags.
Path: $HIVE_HOME/conf/hive-site.xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://hadoop1:1527/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.ClientDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
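Note that this connection URL points at a Derby Network Server on host hadoop1, port 1527, so that server must be running before Hive can reach the metastore, and the Derby client driver jar (derbyclient.jar from the Derby lib directory) must be visible to Hive, for example by copying it into $HIVE_HOME/lib. A minimal sketch for starting the server, assuming Derby was installed and DERBY_INSTALL was set as in Step-2:
$ cd $DERBY_INSTALL/bin
$ ./startNetworkServer &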
JPOX properties can also be specified in a jpox.properties file. The changes required are shown in the file below.
Path: $HIVE_HOME/conf/jpox.properties
javax.jdo.PersistenceManagerFactoryClass=org.jpox.PersistenceManagerFactoryImpl
org.jpox.autoCreateSchema=false
org.jpox.validateTables=false
org.jpox.validateColumns=false
org.jpox.validateConstraints=false
org.jpox.storeManagerType=rdbms
org.jpox.autoCreateSchema=true
org.jpox.autoStartMechanismMode=checked
org.jpox.transactionIsolation=read_committed
javax.jdo.option.DetachAllOnCommit=true
javax.jdo.option.NontransactionalRead=true
javax.jdo.option.ConnectionDriverName=org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionURL=jdbc:derby://hadoop1:1527/metastore_db;create=true
javax.jdo.option.ConnectionUserName=APP
javax.jdo.option.ConnectionPassword=mine
Step-4: Running and Verifying Hive
Hadoop must be available on the path, or set HADOOP_HOME using the command below:
export HADOOP_HOME=<hadoop-install-dir>
Create the /tmp and /user/hive/warehouse directories in HDFS and set them to chmod g+w before creating a table in Hive.
Commands to perform this setup:
$ $HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
Use the commands below to verify the Hive installation:
$ cd $HIVE_HOME
$ bin/hive
On a successful start, the Hive prompt is shown:
hive>
The metastore is not created until the first query hits it, so trigger the query below:
hive> show tables;
OK
You can now run multiple Hive instances working on the same data simultaneously and remotely.
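As a quick end-to-end sanity check that the metastore is working, you can create a table and list it again (the table name demo_table and its columns are only an illustration):
hive> CREATE TABLE demo_table (id INT, name STRING);
hive> SHOW TABLES;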