SQL DELETE Duplicate Rows

Summary -

In this topic, we described about the below sections -

DELETE Duplicate Rows
- By using temp table
- By using direct DELETE query

A table may contain duplicate rows if the key not defined. To define a key on the table, we have to delete the duplicate rows from the table. So as a first step, we have to check the table has a duplicate row or not. To do this, sort the table using the column on which the key to define. If multiple rows found with the same data, then follow the below process to delete the duplicates before defining the key.

Note :- Sometimes even though the primary key defined on the table, the table may contain duplicates.

There are two ways to delete the duplicate rows from the table. Those are –

By using temp table - This method is useful when all columns have duplicates and no unique column exists.
By using direct DELETE query – This method is useful when atleast one unique column exists in the table.

Let us discuss about both processes in detail –

By using temp table

In this process, we need to create a temp table. To do that, we need to have admin privileges.

If you have it, then we can proceed with this process. Otherwise, we should proceed with option2.

Let us assume, we have admin privileges and the process steps are specified below -

Create a temp table with same structure as base table
Copy the distinct rows from base table
Delete all rows from base table
Copy distinct rows from temp table to base table.

The syntaxes for the above process is -

Syntax for create a temporary table as base table -

CREATE TABLE temp_table AS
SELECT * FROM base_table WHERE 1=2;

Syntax for moving DISTICT rows from base table to a temporary table –

INSERT INTO temp_table
SELECT DISTINCT * FROM base_table;

Syntax for deleting duplicate data in base table –

DELETE FROM base_table;

Syntax for moving DISTICT rows from a temporary table to actual table –

INSERT INTO base_table
SELECT DISTINCT * FROM temp_table;

Example –

Let us consider below table(s) as an example table(s) to frame the SQL query for getting the desired results.

employee_details -

emp_id	emp_name	designation	manager_id	date_of_hire	salary	dept_id
001	Employee1	Director		2019-07-11	45000.00	1000
002	Employee2	Director		2019-07-11	40000.00	2000
003	Employee3	Manager	Employee1	2019-07-11	27000.00	1000
004	Employee4	Manager	Employee2	2019-10-08	25000.00	2000
005	Employee5	Analyst	Employee3	2019-07-11	20000.00	1000
006	Employee6	Analyst	Employee3	2019-10-08	18000.00	1000
007	Employee7	Clerk	Employee3	2019-07-11	15000.00	1000
008	Employee8	Salesman	Employee4	2019-09-09	14000.00	2000
009	Employee9	Salesman	Employee4	2019-10-08	13000.00	2000
010	Employee6	Analyst	Employee3	2019-10-08	18000.00	1000
011	Employee7	Clerk	Employee3	2019-07-11	15000.00	1000
012	Employee8	Salesman	Employee4	2019-09-09	14000.00	2000
013	Employee9	Salesman	Employee4	2019-10-08	13000.00	2000

Step1 – Create table as employee_details.

CREATE TABLE temp_edetails AS
SELECT * FROM employee_details
WHERE 1=2;

By executing above query, we can create temp_edetails table with the same structure as employee_details table.

Step2 – Copy distinct rows from base table to temp table.

INSERT INTO temp_edetails
SELECT DISTINCT * FROM employee_details;

By executing above query, we can copy unique rows from employee_detail table to temp_edetails table. The temp_edetails table now contains the below rows -

Temp_edetails -

emp_id	emp_name	designation	manager_id	date_of_hire	salary	dept_id
001	Employee1	Director		2019-07-11	45000.00	1000
002	Employee2	Director		2019-07-11	40000.00	2000
003	Employee3	Manager	Employee1	2019-07-11	27000.00	1000
004	Employee4	Manager	Employee2	2019-10-08	25000.00	2000
005	Employee5	Analyst	Employee3	2019-07-11	20000.00	1000
006	Employee6	Analyst	Employee3	2019-10-08	18000.00	1000
007	Employee7	Clerk	Employee3	2019-07-11	15000.00	1000
008	Employee8	Salesman	Employee4	2019-09-09	14000.00	2000
009	Employee9	Salesman	Employee4	2019-10-08	13000.00	2000

Step3 – Delete all rows from the base table.

DELETE FROM employee_details;

By executing above query, we can delete all data present in the employee_details table. The output was as follows –

employee_details -

emp_id	emp_name	designation	manager_id	date_of_hire	salary	dept_id

Step4 – Copy distinct rows from temp table to base table

INSERT INTO employee_details
SELECT DISTINCT * FROM temp_edetails;

By executing above query, we can INSERT DISTINCT details from temp_edetails to employee_details. Now, employee_details table contains unique rows without duplicates –

employee_details -

emp_id	emp_name	designation	manager_id	date_of_hire	salary	dept_id
001	Employee1	Director		2019-07-11	45000.00	1000
002	Employee2	Director		2019-07-11	40000.00	2000
003	Employee3	Manager	Employee1	2019-07-11	27000.00	1000
004	Employee4	Manager	Employee2	2019-10-08	25000.00	2000
005	Employee5	Analyst	Employee3	2019-07-11	20000.00	1000
006	Employee6	Analyst	Employee3	2019-10-08	18000.00	1000
007	Employee7	Clerk	Employee3	2019-07-11	15000.00	1000
008	Employee8	Salesman	Employee4	2019-09-09	14000.00	2000
009	Employee9	Salesman	Employee4	2019-10-08	13000.00	2000

By using direct DELETE query

Execute DELETE query to delete the duplicate rows. Let us see how to prepare the DELETE query. The process should follow the below steps –

Prepare query to get only unique rows
Verify the query as a subquery to get duplicate rows
Execute DELETE query

Let us consider the above employee_details table with duplicates.

emp_id	emp_name	designation	manager_id	date_of_hire	salary	dept_id
001	Employee1	Director		2019-07-11	45000.00	1000
002	Employee2	Director		2019-07-11	40000.00	2000
003	Employee3	Manager	Employee1	2019-07-11	27000.00	1000
004	Employee4	Manager	Employee2	2019-10-08	25000.00	2000
005	Employee5	Analyst	Employee3	2019-07-11	20000.00	1000
006	Employee6	Analyst	Employee3	2019-10-08	18000.00	1000
007	Employee7	Clerk	Employee3	2019-07-11	15000.00	1000
008	Employee8	Salesman	Employee4	2019-09-09	14000.00	2000
009	Employee9	Salesman	Employee4	2019-10-08	13000.00	2000
010	Employee6	Analyst	Employee3	2019-10-08	18000.00	1000
011	Employee7	Clerk	Employee3	2019-07-11	15000.00	1000
012	Employee8	Salesman	Employee4	2019-09-09	14000.00	2000
013	Employee9	Salesman	Employee4	2019-10-08	13000.00	2000

Step1 – Prepare query to get only unique rows.

In the above example, emp_id has unique values. However, the table contains duplicates. So prepare the query to get the latest unique rows emp_id from the table.

SELECT MAX(emp_id) FROM employee_details GROUP BY emp_name,
designation, manager_id, date_of_hire, salary, dept_id;

By executing above query, we can get the DISTINCT rows emp_id from employee_details table. The result was as follows –

emp_id
001
002
003
004
005
006
007
008
009
010
011
012
013

Note :- The columns in the GROUP BY are the columns having the duplicate values.

Step2 – Verify the above query as a subquery to get duplicate rows

SELECT * FROM employee_details WHERE emp_id NOT IN (SELECT MAX(emp_id)
FROM employee_details GROUP BY emp_name, designation, manager_id,
date_of_hire, salary, dept_id);

By executing above query, we can get only duplicate rows from employee_details table. The output was as follows –

emp_id	emp_name	designation	manager_id	date_of_hire	salary	dept_id
006	Employee6	Analyst	Employee3	2019-10-08	18000.00	1000
007	Employee7	Clerk	Employee3	2019-07-11	15000.00	1000
008	Employee8	Salesman	Employee4	2019-09-09	14000.00	2000
009	Employee9	Salesman	Employee4	2019-10-08	13000.00	2000

Step3 - Execute DELETE query

DELETE FROM employee_details WHERE emp_id NOT IN (SELECT MAX(emp_id)
FROM employee_details GROUP BY emp_id, emp_name, designation, manager_id,
date_of_hire, salary, dept_id);

By executing above query, we can delete the duplicate rows from employee_details table.

Once the DELETE statement executed, verify whether duplicate rows are deleted or not by using below query –

SELECT * FROM employee_details;

emp_id	emp_name	designation	manager_id	date_of_hire	salary	dept_id
001	Employee1	Director		2019-07-11	45000.00	1000
002	Employee2	Director		2019-07-11	40000.00	2000
003	Employee3	Manager	Employee1	2019-07-11	27000.00	1000
004	Employee4	Manager	Employee2	2019-10-08	25000.00	2000
005	Employee5	Analyst	Employee3	2019-07-11	20000.00	1000
010	Employee6	Analyst	Employee3	2019-10-08	18000.00	1000
011	Employee7	Clerk	Employee3	2019-07-11	15000.00	1000
012	Employee8	Salesman	Employee4	2019-09-09	14000.00	2000
013	Employee9	Salesman	Employee4	2019-10-08	13000.00	2000