Tech Point Fundamentals

Sunday, February 6, 2022

Different Ways to Check and Delete Duplicate Records in SQL

Different Ways to Check and Delete Duplicate Records in SQL

deleting-duplicate-records-in-sql

In the SQL Interview, this question is very common that how can you check and delete the duplicate records from the table. Some interviewers ask this question in a more specific way i.e what you have to use and which record should be kept as the original record. So in this article, we will walk through all the possible ways to check and delete duplicate records in SQL. 


Please visit our Youtube Channel for more SQL Interview Questions and Answers videos by the below link:




Below are some Common Interview Questions which are for identifying and deleting duplicate records from the SQL Table:


Q01. What are the different ways to DELETE the duplicate records from a table in SQL?


Q02. What are the different ways to DELETE the duplicate records from a table which contains UNIQUE Index or Primary Key in SQL?


Q03. What are the different ways to DELETE the duplicate records from a table which does not contain any UNIQUE Index or Primary Key in SQL?


Q04. How can you DELETE the duplicate records by keeping the first inserted record as the original record?


Q05. How can you DELETE the duplicate records by keeping the last inserted record as the original record?




Q06. How can you check and DELETE the duplicate records using SQL SubQuery?


Q07. How can you check and DELETE the duplicate records using SQL CoRelated SubQuery?


Q08. How can you check and DELETE the duplicate records using IN  Clause?


Q09. How can you check and DELETE the duplicate records using SQL JOIN?


Q10. How can you check and DELETE the duplicate records using the Derived Table(DT)?


Q11. How can you check and DELETE the duplicate records using the Common Table Expression (CTE)?


Q12. How can you DELETE the duplicate records using SQL Server integration service(SSIS) in SQL?


Please visit here to know the different ways to get the 3rd Highest Salary in SQL.







Introduction


We have a table called Customer, which contains the below records:


customer-table




As we can see that this table contains some duplicate records. 


Please watch the Count(*) vs COUNT(1) video here.


duplicate-records



We will use this Customer table for our query.  We will see all the possible ways to check and delete the duplicate records from this table. 


For each delete query demo, here I have kept the first inserted record as the original record. If you want to keep the last inserted record as the original record, you can change the condition accordingly.


After each and every delete query I have restored the data before executing the next delete query behind the scene. So please don't confuse that after deleting the duplicate how can we delete the duplicate records again and again.


Please watch the DELETE vs TRUNCATE video here.




Deleting Duplicate Records Using IN Clause or SubQuery


We can use the IN Clause to check and delete the duplicate records from a table. Here in the Subquery, we have used the GROUP BY Clause to get the minimum record id which we have to keep as the original record.


delete-using-subquery

Please watch the SubQuery video here and IN vs NOT IN Clause video here for more details.



Deleting Duplicate Records Using CoRelated SubQuery


We can also use the CoRelated Subquery to check and delete the duplicate records from a table. Here in the Subquery, we have used the MIN or MAX function to get the minimum or maximum record id which we have to keep as the original record.


removing-duplicate-by-corelated-subquery


Please watch the CoRelated SubQuery video here for more details.



Deleting Duplicate Records Using JOIN


We can use the JOIN (Self Join) as well to check and delete the duplicate records from a table.  


Please watch the INNER JOIN vs IN Clause video here.


Here in the Subquery, we have used the GROUP BY and HAVING clause to exclude the record having minimum record id which we have to keep as the original record. 


Please watch the HAVING vs WHERE Clause video here.


removing-duplicate-using-join

Please watch the INNER JOIN video here and the SELF JOIN video here for more details.




Deleting Duplicate Records Using JOIN and RANK Function


Similarly, we can use the SQL Rank Function instead of the GROUP BY and HAVING clause to exclude the record having minimum RankNo which we have to keep as the original record.


Please watch the WHERE Clause vs ON Clause video here.


removing-duplicate-using-join-rank


Please watch the RANK Function video here for more details.




Deleting Duplicate Records Using JOIN and ROW NUMBER Function


Similarly, we can use the SQL Row Number Function as well instead of  Rank Function to exclude the record having minimum RowNo which we have to keep as the original record. 

Please watch the Row Number vs Rank Function video here.

removing-duplicate-using-join-rowno

Please watch the ROW NUMBER Function video here.




Deleting Duplicate Records Using CTE and RANK Function


We can use the Common Table Expression (CTE) as well for identifying and removing the duplicate data from the table. 


The SQL Rank Function is generally used to generate unique row numbers for the table records which do not have any Primary Key or Unique Key or Unique Index to identify the records uniquely.   


Please watch the Primary Key video here, Unique Key video here, and Unique Index video here.


Here we are using the SQL Rank Function to generate and get the minimum Record Id which we have to keep as the original record.


removing-duplicate-by-cte-rank

Please watch the CTE video here and the Rank Function video here for more details.




Deleting Duplicate Records Using CTE and ROW NUMBER Function


We can use the Common Table Expression (CTE) with SQL Row Number Function as well for identifying and removing the duplicate data from the table. 


The SQL Row Number Function is generally used to generate unique rank numbers for the table records which do not have any Primary Key or Unique Key or Unique Index to identify the records uniquely.   


Please watch the RowNumber vs Rank vs Dense Rank video here for more details.


Here we are using the SQL Row Number Function to generate and get the minimum Record Id which we have to keep as the original record.


removing-duplicates-using-cte-rowno

Please watch the CTE video here and the Row Number function video here for more details.




Deleting Duplicate Records Using DT and RANK Function


We can also use the Derived Table (DT) for identifying and removing the duplicate records from the table.  


Here we are using the SQL Rank Function again to generate and get the minimum Record Id which we have to keep as the original record.


removing-duplicate-by-dt

Please watch the DT video here and the Rank Function video here for more details.




Deleting Duplicate Records Using DT and ROW NUMBER Function


Similarly, we can use the Derived Table (DT) with SQL Row Number Function as well for identifying and removing the duplicate records from the table.  


Here we are using the SQL Row Number Function again to generate and get the minimum Record Id which we have to keep as the original record.


removing-duplicate-by-dt-rowno

Please watch the DT video here and the Row Number Function video here for more details.




Live Demo


Watch our related videos here




SQL Scripts 


 

-- Creating Table
CREATE TABLE Customer
( 
	[ID] INT IDENTITY(1,1) PRIMARY KEY, 
	[Name] Varchar(100), 
	[Email] Varchar(100)		
) 

-- Inserting Data into the Table
INSERT INTO Customer(Name, Email)
	VALUES ('James', 'james@gmail.com'),
		   ('Thomas', 'thomas@gmail.com'),
		   ('James', 'james@gmail.com'),
		   ('Thomas', 'thomas@gmail.com'),
		   ('James', 'james@gmail.com'),
		   ('Peter', 'peter@gmail.com'),
		   ('Alex', 'alex@gmail.com'),
		   ('Peter', 'peter@gmail.com');

-- Selecting Data from the Table
SELECT * FROM Customer;



-- Deleting Duplicate Records using IN Clause
-- Keeping First Inserted Record as Original Record

DELETE FROM Customer
WHERE ID NOT IN
(
	SELECT MIN(ID) AS MinRecordID
	FROM Customer
	GROUP BY Name, Email
);	

-- Table Records after Deleting the duplicate Records
SELECT * FROM Customer ORDER BY Name, ID;

-- Deleting Duplicate Records using CoRelated Subquery
-- Keeping First Inserted Record as Original Record

DELETE C1 FROM Customer C1 
WHERE ID > 
	( 
		SELECT MIN (ID) FROM Customer C2 WHERE C1.Email = C2.Email
	);

-- Records after Deleting Duplicate Records
SELECT * FROM Customer;


-- Deleting Duplicate Records using JOIN
-- Keeping First Inserted Record as Original Record

DELETE C FROM Customer C
	INNER JOIN
	(   SELECT MIN(ID) AS MinID, Name, Email,
		COUNT(1) AS TotalRecordCount
		FROM Customer
		GROUP BY Name, Email
		HAVING COUNT(1) > 1 
	) TMP ON C.ID > TMP.MinID AND (C.Name = TMP.Name AND C.Email = TMP.Email);	

-- Table Records after Deleting the Duplicate Records
SELECT * FROM Customer ORDER BY Name, ID;



-- Deleting Duplicate Records using JOIN and RANK
-- Keeping First Inserted Record as Original Record

DELETE C FROM Customer C
	INNER JOIN
	(   SELECT	 
		ID, 
		RANK() OVER(PARTITION BY Name, Email ORDER BY ID) AS RankNo
		FROM Customer		
	) TMP ON TMP.ID = C.ID	
	WHERE TMP.RankNo > 1;

-- Table Records after Deleting the Duplicate Records
SELECT * FROM Customer ORDER BY Name, ID;


-- Deleting Duplicate Records using JOIN and RowNumber
-- Keeping First Inserted Record as Original Record

DELETE C FROM Customer C
	INNER JOIN
	(   SELECT	 
		ID, 
		Row_Number() OVER(PARTITION BY Name, Email ORDER BY ID) AS RowNo
		FROM Customer		
	) TMP ON TMP.ID = C.ID	
	WHERE TMP.RowNo > 1;

-- Table Records after Deleting the Duplicate Records
SELECT * FROM Customer ORDER BY Name, ID;


-- Deleting Duplicate Records using CTE and RowNumber
-- Keeping First Inserted Record as Original Record

WITH CTE 
AS 
(	
	SELECT Id,Name,	Email,
	ROW_NUMBER() OVER(PARTITION BY Name, Email ORDER BY ID) AS RowNo                                         
	FROM Customer
)
DELETE FROM CTE WHERE RowNo > 1;

-- Table Records after Deleting the Duplicate Records
SELECT * FROM Customer ORDER BY Name, ID;


-- Deleting Duplicate Records using CTE and Rank
-- Keeping First Inserted Record as Original Record

WITH CTE 
AS 
(	
	SELECT Id,Name,	Email,
	Rank() OVER(PARTITION BY Name, Email ORDER BY ID) AS RankNo                                         
	FROM Customer
)
DELETE FROM CTE WHERE RankNo > 1;

-- Table Records after Deleting the Duplicate Records
SELECT * FROM Customer ORDER BY Name, ID;


-- Deleting Duplicate Records using DT and Rank
-- Keeping First Inserted Record as Original Record

DELETE FROM DT 
	FROM 
	(
		SELECT 	Id, Name, Email,
			RANK() OVER(PARTITION BY Name, Email ORDER BY ID) AS RankNo                                         
		FROM Customer
	) DT 		
WHERE RankNo > 1;

-- Table Records after Deleting the Duplicate Records
SELECT * FROM Customer ORDER BY Name, ID;


-- Deleting Duplicate Records using DT and RowNumber
-- Keeping First Inserted Record as Original Record

DELETE FROM DT 
	FROM 
	(
		SELECT 	Id, Name, Email,
			Row_Number() OVER(PARTITION BY Name, Email ORDER BY ID) AS RowNo                                         
		FROM Customer
	) DT 		
WHERE RowNo > 1;

-- Table Records after Deleting the Duplicate Records
SELECT * FROM Customer ORDER BY Name, ID;






Recommended Articles


Most Frequently Asked SQL Interview Questions
Most Frequently Asked C# Interview Questions
Foreign Key Interview Questions
Unique Key Interview Questions
Primary Key Interview Questions
Derived Table (DT) Interview Questions
Unicode vs Non-Unicode Data Types Interview Questions
Table Variable (TV) Interview Questions
Table-Valued Type (TVT) Interview Questions
Common Table Expression(CTE) Interview Questions
Temp Table Interview Questions
Static Class Interview Questions and Answers
Sealed Class Interview Questions and Answers
Sealed Method Interview Questions and Answers
Abstract Class vs Interface
Interface Interview Questions and Answers






Thanks for visiting this page. Please follow us on Twitter, Facebook, LinkedIn, Telegram, WhatsApp, Youtube, and Quora for regular updates.

1 comment:

  1. Oops! There are no duplicate records in your Customers table.

    ReplyDelete

Please do not enter any HTML. JavaScript or spam link in the comment box.