2LeggedSpider

Deleting duplicate records/rows in SQL Server

Posted in SQL by Sumit Thomas on June 28, 2007

[tweetmeme style=”compact”]A duplicate record in a table is redundant data and is a violation of table integrity. Utmost care should be taken to avoid such situations. But, there might be situations where there is a loosely designed table which holds duplicate records. Removing duplicate records is of high priority in such situations.

Consider the following table design…

CREATE TABLE [dbo].[Categories](
	[CategoryId] [int] IDENTITY(1,1) NOT NULL,
	[Category] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
 CONSTRAINT [PK_Categories] PRIMARY KEY CLUSTERED 
(
	[CategoryId] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

Lets insert some records in this table

INSERT INTO Categories VALUES('Books')
INSERT INTO Categories VALUES('Books')
INSERT INTO Categories VALUES('Books')
INSERT INTO Categories VALUES('Toys')
INSERT INTO Categories VALUES('Automobiles')
INSERT INTO Categories VALUES('Gadgets')
INSERT INTO Categories VALUES('Toys')
INSERT INTO Categories VALUES('Toys')

As you can see we have inserted some duplicate data in this table. Imagine if we had thousands of records in the Categories table with lots of duplicate records. Fixing the table would be a pain. To solve this problem we need to first identify the duplicate records. The following query will help us do that.

SELECT Category, Count(Category) AS Occurence FROM Categories 
GROUP BY Category
HAVING COUNT(Category) > 1

We get the following result…

Category            Occurence
----------------  -----------
Books                 4
Toys                  3

As you can see it displays the duplicate records and the number of times it occurs in the table. Now how do we delete the duplicates?

Lets try another query on the table.

SELECT 	MAX(CategoryID), Category
FROM Categories
GROUP BY Category

The result is…

            Category
-------- --------------------------------------------------
6           Automobiles
4           Books
7           Gadgets
9           Toys

This is how our table should look like, no duplicates! The SQL query to delete the duplicate records is now simple…

DELETE FROM Categories
WHERE 	CategoryID NOT IN
	(SELECT MAX(CategoryID)
        FROM Categories
        GROUP BY Category)

and finally when we fetch all records from the Categories table, it gives us the desired result…

CategoryId  Category
----------- --------------------------------------------------
4           Books
6           Automobiles
7           Gadgets
9           Toys

This is a simple method to remove duplicates from a table. Please note that the table should have an Identity column for this method to work. There are several methods available to remove duplicate records from a table. If you have come across a better alternative please share it.

Advertisements
Tagged with:

3 Responses

Subscribe to comments with RSS.

  1. pinaldave said, on June 28, 2007 at 5:22 pm

    Hi,

    I have similar article as yours which is written while ago.
    SQL SERVER – Delete Duplicate Records – Rows

    Regards,
    Pinal Dave (SQLAuthority.com)

  2. 2leggedspider said, on June 29, 2007 at 8:05 am

    Ah! Thats true!

    Thanks for letting me know Dave! I went through your blog just now. Its very informative and I’ve added it to my blogroll 🙂

  3. Yogesh said, on January 20, 2009 at 2:52 pm

    The below article showed me a simple way
    Delete Duplicate Rows in Sql Server


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: