How to check for duplicates in SQL

Summary

There are 2 main methods to check for duplicates in SQL. The first method uses the keyword DISTINCT and the other one uses GROUP BY.

DISTINCT method

GROUP BY method

The advantage of the GROUP BY method is that you can see the number of duplicates. The drawback is more code.

Finally, to check for duplicates you just have to look at the number of rows before and after you used either of the 2 methods above. If the count before and after is not the same, then you have duplicates.

In-Depth

Sometimes you will notice that your query returned duplicates. Other times, often for big queries, you will think that there’s no duplicates even when there is.

For that reason it is good practice to check if your view contains duplicates, especially for complex queries.

Theory

Where do duplicates come from? They can come from a bug in the ETL process, when data is inserted. But most often than not, they will come from the way you designed your query.

When the relationship in between 2 tables is not 1 to 1 and you join them you can end up with duplicates.

How to check for duplicates in SQL

One easy way to check if you have duplicates or not in SQL is to count the number of rows before you join your table, and after you do. If you see that the count is different before and after then it means you have duplicates.

Practice

To practice, create the following 2 tables:

Let’s say you work as an inventory analyst and you’re trying to identify items that have inventory. You can run the following query:

And this will return 3 rows: itemA, itemB, itemB.

We have duplicates because the second table can have multiple values for the same item.

Using one of the 2 methods above (DISTINCT or GROUP BY) would return the correct result: itemA, itemB.

Get The Job You Want – Free Content

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.