Hive Insert into Table

The INSERT INTO command in Hive is used to add new data to a table. This can be data from:

  • Another table in Hive (using a SELECT statement).
  • Or manually written values (in some cases).

In Hive, since it is mostly used for big data and batch processing, the most common way to insert data is by reading from existing tables or external files.

Why Do We Insert Data?

There are many reasons why you’d need to insert data into Hive tables:

  • To load new batches of data coming in daily, weekly, or monthly.
  • To transform data from one table into another (for reporting or analysis).
  • To create summary tables based on calculations.
Syntax -

INSERT INTO TABLE (Adding Data): used when we want to add more rows to an existing table without deleting the existing data.

INSERT INTO TABLE table_name 
VALUES (value1, value2, value3, ...);

INSERT OVERWRITE TABLE (Replacing Data) or Data Using SELECT (From Another Table): used when we want to replace the existing data with new data. That’s when you use INSERT OVERWRITE.

INSERT OVERWRITE TABLE table_name SELECT ...;
Note! INSERT will be used to do bulk insert from one table to another table.
Normally INSERT will be used to copy the data from one database to another database for the same table resides in both databases.
Examples -

Scenario1: Adds a new row with employee details to the existing table employee_data.:

INSERT INTO TABLE employee_data 
VALUES (101, 'Alice', 'IT', 65000);

INSERT INTO appends the new rows to the existing ones and no existing data is lost.

Scenario2: Replace the existing data with new data:

INSERT OVERWRITE TABLE monthly_sales 
SELECT * FROM daily_sales WHERE sale_date = '2025-07-05';

Remove all previous data from monthly_sales. Insert only the selected data from daily_sales where the date matches.

Scenario3: Inserting Data Using SELECT (From Another Table):

INSERT INTO TABLE high_value_orders 
SELECT * FROM orders WHERE order_amount > 10000;

Here you are selecting records from the orders table where the order amount is greater than 10,000 and inserting them into the high_value_orders table.

Scenario4: Inserting Data Using SELECT (From Another Table):

INSERT INTO TABLE sales_partition PARTITION (region='East') 
SELECT order_id, product, amount FROM sales_temp WHERE region = 'East';

This way, data goes into the correct partition in Hive for better performance and organization.