Summary -
In this topic, we described about the below sections -
A table may contain duplicate rows if the key not defined. To define a key on the table, we have to delete the duplicate rows from the table. So as a first step, we have to check the table has a duplicate row or not. To do this, sort the table using the column on which the key to define. If multiple rows found with the same data, then follow the below process to delete the duplicates before defining the key.
There are two ways to delete the duplicate rows from the table. Those are –
- By using temp table - This method is useful when all columns have duplicates and no unique column exists.
- By using direct DELETE query – This method is useful when atleast one unique column exists in the table.
Let us discuss about both processes in detail –
By using temp table
In this process, we need to create a temp table. To do that, we need to have admin privileges.
If you have it, then we can proceed with this process. Otherwise, we should proceed with option2.
Let us assume, we have admin privileges and the process steps are specified below -
- Create a temp table with same structure as base table
- Copy the distinct rows from base table
- Delete all rows from base table
- Copy distinct rows from temp table to base table.
The syntaxes for the above process is -
Syntax for create a temporary table as base table -
CREATE TABLE temp_table AS
SELECT * FROM base_table WHERE 1=2;
Syntax for moving DISTICT rows from base table to a temporary table –
INSERT INTO temp_table
SELECT DISTINCT * FROM base_table;
Syntax for deleting duplicate data in base table –
DELETE FROM base_table;
Syntax for moving DISTICT rows from a temporary table to actual table –
INSERT INTO base_table
SELECT DISTINCT * FROM temp_table;
Example –
Let us consider below table(s) as an example table(s) to frame the SQL query for getting the desired results.
employee_details -
emp_id | emp_name | designation | manager_id | date_of_hire | salary | dept_id |
---|---|---|---|---|---|---|
001 | Employee1 | Director | 2019-07-11 | 45000.00 | 1000 | |
002 | Employee2 | Director | 2019-07-11 | 40000.00 | 2000 | |
003 | Employee3 | Manager | Employee1 | 2019-07-11 | 27000.00 | 1000 |
004 | Employee4 | Manager | Employee2 | 2019-10-08 | 25000.00 | 2000 |
005 | Employee5 | Analyst | Employee3 | 2019-07-11 | 20000.00 | 1000 |
006 | Employee6 | Analyst | Employee3 | 2019-10-08 | 18000.00 | 1000 |
007 | Employee7 | Clerk | Employee3 | 2019-07-11 | 15000.00 | 1000 |
008 | Employee8 | Salesman | Employee4 | 2019-09-09 | 14000.00 | 2000 |
009 | Employee9 | Salesman | Employee4 | 2019-10-08 | 13000.00 | 2000 |
010 | Employee6 | Analyst | Employee3 | 2019-10-08 | 18000.00 | 1000 |
011 | Employee7 | Clerk | Employee3 | 2019-07-11 | 15000.00 | 1000 |
012 | Employee8 | Salesman | Employee4 | 2019-09-09 | 14000.00 | 2000 |
013 | Employee9 | Salesman | Employee4 | 2019-10-08 | 13000.00 | 2000 |
Step1 – Create table as employee_details.
CREATE TABLE temp_edetails AS
SELECT * FROM employee_details
WHERE 1=2;
By executing above query, we can create temp_edetails table with the same structure as employee_details table.
Step2 – Copy distinct rows from base table to temp table.
INSERT INTO temp_edetails
SELECT DISTINCT * FROM employee_details;
By executing above query, we can copy unique rows from employee_detail table to temp_edetails table. The temp_edetails table now contains the below rows -
Temp_edetails -
emp_id | emp_name | designation | manager_id | date_of_hire | salary | dept_id |
---|---|---|---|---|---|---|
001 | Employee1 | Director | 2019-07-11 | 45000.00 | 1000 | |
002 | Employee2 | Director | 2019-07-11 | 40000.00 | 2000 | |
003 | Employee3 | Manager | Employee1 | 2019-07-11 | 27000.00 | 1000 |
004 | Employee4 | Manager | Employee2 | 2019-10-08 | 25000.00 | 2000 |
005 | Employee5 | Analyst | Employee3 | 2019-07-11 | 20000.00 | 1000 |
006 | Employee6 | Analyst | Employee3 | 2019-10-08 | 18000.00 | 1000 |
007 | Employee7 | Clerk | Employee3 | 2019-07-11 | 15000.00 | 1000 |
008 | Employee8 | Salesman | Employee4 | 2019-09-09 | 14000.00 | 2000 |
009 | Employee9 | Salesman | Employee4 | 2019-10-08 | 13000.00 | 2000 |
Step3 – Delete all rows from the base table.
DELETE FROM employee_details;
By executing above query, we can delete all data present in the employee_details table. The output was as follows –
employee_details -
emp_id | emp_name | designation | manager_id | date_of_hire | salary | dept_id |
---|
Step4 – Copy distinct rows from temp table to base table
INSERT INTO employee_details
SELECT DISTINCT * FROM temp_edetails;
By executing above query, we can INSERT DISTINCT details from temp_edetails to employee_details. Now, employee_details table contains unique rows without duplicates –
employee_details -
emp_id | emp_name | designation | manager_id | date_of_hire | salary | dept_id |
---|---|---|---|---|---|---|
001 | Employee1 | Director | 2019-07-11 | 45000.00 | 1000 | |
002 | Employee2 | Director | 2019-07-11 | 40000.00 | 2000 | |
003 | Employee3 | Manager | Employee1 | 2019-07-11 | 27000.00 | 1000 |
004 | Employee4 | Manager | Employee2 | 2019-10-08 | 25000.00 | 2000 |
005 | Employee5 | Analyst | Employee3 | 2019-07-11 | 20000.00 | 1000 |
006 | Employee6 | Analyst | Employee3 | 2019-10-08 | 18000.00 | 1000 |
007 | Employee7 | Clerk | Employee3 | 2019-07-11 | 15000.00 | 1000 |
008 | Employee8 | Salesman | Employee4 | 2019-09-09 | 14000.00 | 2000 |
009 | Employee9 | Salesman | Employee4 | 2019-10-08 | 13000.00 | 2000 |
By using direct DELETE query
Execute DELETE query to delete the duplicate rows. Let us see how to prepare the DELETE query. The process should follow the below steps –
- Prepare query to get only unique rows
- Verify the query as a subquery to get duplicate rows
- Execute DELETE query
Let us consider the above employee_details table with duplicates.
emp_id | emp_name | designation | manager_id | date_of_hire | salary | dept_id |
---|---|---|---|---|---|---|
001 | Employee1 | Director | 2019-07-11 | 45000.00 | 1000 | |
002 | Employee2 | Director | 2019-07-11 | 40000.00 | 2000 | |
003 | Employee3 | Manager | Employee1 | 2019-07-11 | 27000.00 | 1000 |
004 | Employee4 | Manager | Employee2 | 2019-10-08 | 25000.00 | 2000 |
005 | Employee5 | Analyst | Employee3 | 2019-07-11 | 20000.00 | 1000 |
006 | Employee6 | Analyst | Employee3 | 2019-10-08 | 18000.00 | 1000 |
007 | Employee7 | Clerk | Employee3 | 2019-07-11 | 15000.00 | 1000 |
008 | Employee8 | Salesman | Employee4 | 2019-09-09 | 14000.00 | 2000 |
009 | Employee9 | Salesman | Employee4 | 2019-10-08 | 13000.00 | 2000 |
010 | Employee6 | Analyst | Employee3 | 2019-10-08 | 18000.00 | 1000 |
011 | Employee7 | Clerk | Employee3 | 2019-07-11 | 15000.00 | 1000 |
012 | Employee8 | Salesman | Employee4 | 2019-09-09 | 14000.00 | 2000 |
013 | Employee9 | Salesman | Employee4 | 2019-10-08 | 13000.00 | 2000 |
Step1 – Prepare query to get only unique rows.
In the above example, emp_id has unique values. However, the table contains duplicates. So prepare the query to get the latest unique rows emp_id from the table.
SELECT MAX(emp_id) FROM employee_details GROUP BY emp_name,
designation, manager_id, date_of_hire, salary, dept_id;
By executing above query, we can get the DISTINCT rows emp_id from employee_details table. The result was as follows –
emp_id |
---|
001 |
002 |
003 |
004 |
005 |
006 |
007 |
008 |
009 |
010 |
011 |
012 |
013 |
Step2 – Verify the above query as a subquery to get duplicate rows
SELECT * FROM employee_details WHERE emp_id NOT IN (SELECT MAX(emp_id)
FROM employee_details GROUP BY emp_name, designation, manager_id,
date_of_hire, salary, dept_id);
By executing above query, we can get only duplicate rows from employee_details table. The output was as follows –
emp_id | emp_name | designation | manager_id | date_of_hire | salary | dept_id |
---|---|---|---|---|---|---|
006 | Employee6 | Analyst | Employee3 | 2019-10-08 | 18000.00 | 1000 |
007 | Employee7 | Clerk | Employee3 | 2019-07-11 | 15000.00 | 1000 |
008 | Employee8 | Salesman | Employee4 | 2019-09-09 | 14000.00 | 2000 |
009 | Employee9 | Salesman | Employee4 | 2019-10-08 | 13000.00 | 2000 |
Step3 - Execute DELETE query
DELETE FROM employee_details WHERE emp_id NOT IN (SELECT MAX(emp_id)
FROM employee_details GROUP BY emp_id, emp_name, designation, manager_id,
date_of_hire, salary, dept_id);
By executing above query, we can delete the duplicate rows from employee_details table.
Once the DELETE statement executed, verify whether duplicate rows are deleted or not by using below query –
SELECT * FROM employee_details;
emp_id | emp_name | designation | manager_id | date_of_hire | salary | dept_id |
---|---|---|---|---|---|---|
001 | Employee1 | Director | 2019-07-11 | 45000.00 | 1000 | |
002 | Employee2 | Director | 2019-07-11 | 40000.00 | 2000 | |
003 | Employee3 | Manager | Employee1 | 2019-07-11 | 27000.00 | 1000 |
004 | Employee4 | Manager | Employee2 | 2019-10-08 | 25000.00 | 2000 |
005 | Employee5 | Analyst | Employee3 | 2019-07-11 | 20000.00 | 1000 |
010 | Employee6 | Analyst | Employee3 | 2019-10-08 | 18000.00 | 1000 |
011 | Employee7 | Clerk | Employee3 | 2019-07-11 | 15000.00 | 1000 |
012 | Employee8 | Salesman | Employee4 | 2019-09-09 | 14000.00 | 2000 |
013 | Employee9 | Salesman | Employee4 | 2019-10-08 | 13000.00 | 2000 |