How to Delete Duplicate Rows in SQL (Complete Guide)

How to Delete Duplicate Rows in SQL (Complete Guide)

As you know Duplicate rows are a common issue in databases. They usually come due to incorrect data imports, sometimes human errors, missing constraints, or application glitches and these issues are common. If duplicates are not cleaned on exact time, they will affect analytics, increase storage cost, and lead to wrong reporting.
In this guide, you’ll learn all the 5 major methods to delete duplicate rows in SQL, with clear examples suitable for beginners and professionals. Let’s start

Why Do Duplicate Rows Occur?

Before deleting duplicates, it’s important to understand why they appear:

  • Missing PRIMARY KEY or UNIQUE constraints

  • Importing data from Excel/CSV without validation

  • Application bugs creating identical entries

  • Users submitting forms multiple times

  • Poor database design

Once you identify the cause, you can fix it. But first—let’s remove the duplicate rows.

Method 1: Delete Duplicates Using ROW_NUMBER () (Best Method)

This method works in MySQL 8+, PostgreSQL, SQL Server, and Oracle because these platforms support window functions.

Step 1: Identify duplicate rows

SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY name, email ORDER BY id
) AS rn
FROM users
) t
WHERE t.rn > 1;

Here:

  • PARTITION BY → columns that define duplicates

  • rn > 1 → fetch duplicate rows while keeping the first one

Step 2: Delete duplicate rows

DELETE FROM users
WHERE id IN (
SELECT id FROM (
SELECT id,
ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY id) AS rn
FROM users
) x
WHERE rn > 1
);

Method 2: Delete Duplicates Using GROUP BY and MIN (id)

This method is most popular in MySQL.

You keep the row with the smallest id and delete the rest.

DELETE t1
FROM users t1
JOIN users t2
ON t1.name = t2.name
AND t1.email = t2.email
AND t1.id > t2.id;

  • t1.id > t2.id ensures only duplicates are deleted

  • The lowest id remains

This method is fast and widely used.

Method 3: Delete Duplicates Using DISTINCT (Safe Method)

If your table has no unique ID and you want all rows unique:

CREATE TABLE users_new AS
SELECT DISTINCT * FROM users;

DROP TABLE users;

ALTER TABLE users_new RENAME TO users;

When to use this method?

  • Small database

  • No primary key

  • Safe data export/import

Avoid this method on large tables or where foreign key relationships exist.

Method 4: Delete Duplicates Using a Temporary Table

This method works on all SQL databases.

CREATE TABLE temp_users AS
SELECT DISTINCT * FROM users;

TRUNCATE TABLE users;

INSERT INTO users SELECT * FROM temp_users;

This is cleaner than dropping the table and useful when:

  • You need a simple fix

  • You want to preserve columns

  • Your table has limited data

Method 5: Delete Duplicates in SQL Server Using CTE

SQL Server supports CTE (Common Table Expressions) with window functions.

WITH CTE AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY name, email ORDER BY id
) AS rn
FROM users
)
DELETE FROM CTE WHERE rn > 1;

This is one of the easiest and cleanest ways in SQL Server.

How to Check for Duplicate Rows Before Deleting?

Always check duplicates before deleting to avoid data loss.

SELECT name, email, COUNT(*)
FROM users
GROUP BY name, email
HAVING COUNT(*) > 1;

Preventing Duplicate Rows (Very Important!)

After cleaning duplicates, ensure they don’t come back.

1. Add a UNIQUE constraint

ALTER TABLE users
ADD CONSTRAINT unique_user UNIQUE (email);

2. Use PRIMARY KEY

ALTER TABLE users
ADD PRIMARY KEY (id);

3. Validate data at application level

  • Add server-side validation

  • Block duplicate form submissions

  • Use transactions properly

Best Method to Use?

Database Best Method
MySQL 8+ ROW_NUMBER() or MIN(id)
MySQL < 8 MIN(id) or temporary table
PostgreSQL ROW_NUMBER()
SQL Server CTE with ROW_NUMBER()
Oracle ROW_NUMBER()
No ID column DISTINCT or temporary table

To make it Delete duplicate rows in SQL is very easy when you know the correct and simple method. The safest and most flexible method is using ROW_NUMBER (), as it allows you to target specific duplicates accurately. For MySQL users, the self-join with MIN (id) method is extremely efficient.
Once duplicates are cleaned, always add constraints to prevent them from happening again.
With these methods, you can keep your database clean, fast, and reliable.


Author

  • Urvarshi Sharma is a writer specializing in IT services, focusing on creating insightful content about technology, innovation, and industry trends. With a keen understanding of the IT landscape, she writes engaging articles that simplify complex topics, helping businesses stay informed and make strategic decisions in the ever-evolving tech world.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top