How to Delete Duplicate Record?

This article does not talk about the “duplicate” that you are used to hearing. We aren't talking about the knock-off products AKA "duplicate" that you see in China-Town either. This is for a data enthusiast who happened to see duplicate records in their tables. This article will answer you; how to find the duplicate records or check if duplicate record exists? How do you remove them and prevent the duplicate records? If you are looking for something else, sorry pal, you took a wrong exit!! The answer to above questions can be best explained with a lab exercise. Let’s begin our exercise!!

Drop Table If Exist:

--Drop if exist already.
DROP TABLE SCOTT.MY_DUPLICATE_TABLE;

Create Table:

--Create table
CREATE TABLE SCOTT.MY_DUPLICATE_TABLE
  (
    EMPLOYEE_ID    NUMBER,
    EMPLOYEE_FNAME VARCHAR2(25),
    EMPLOYEE_LNAME  VARCHAR2(25)
  );

Verify Table:

--Verify that your table exist
DESC SCOTT.MY_DUPLICATE_TABLE;

Insert Records:

--insert duplicate records
BEGIN
INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1054, 'John', 'Smith');
INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1054, 'John', 'Smith');
INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1054, 'John', 'Smith');
INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1055, 'Pat', 'Peterson');
INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1055, 'Pat', 'Peterson');
INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1056, 'Jack', 'Johnson');
INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1056, 'Jack', 'Johnson');
INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1057, 'Ray', 'Ramano');
INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1057, 'Ray', 'Ramano');
INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1057, 'Ray', 'Ramano');
INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1058, 'Jennifer','Harold');
INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1058, 'Jennifer','Harlod');
INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1059, 'King','Kong');
COMMIT;
END;
/

Record Count:

--Let’s count the record.
SELECT COUNT(*) FROM MY_DUPLICATE_TABLE;

How To Check Duplicate Records?

--How to check if duplicate record exist?
SELECT employee_id,
  COUNT(employee_id)
FROM my_duplicate_table
GROUP BY employee_id
ORDER BY employee_id;

The above SQL statement will tell you if there are duplicates. Hey, do you see with this query? I will leave that up to you but this query gets the job done,correct? Here's a solid way to capture duplicates.

SELECT employee_id,
       Count(employee_id) AS dup
FROM   my_duplicate_table
GROUP  BY employee_id
HAVING Count(employee_id) > 1
ORDER  BY employee_id;

Now, lets delete the duplicate, why? cause I don't need them!! The SAFE way to delete from a table is to write a select statement to display the unwanted records then convert the “select statement” to “delete from” statement once you verify your select statement. This is the best practices that database junkies follow before they say good-bye to unwanted data! And you should follow it as well!!

SELECT *
FROM MY_DUPLICATE_TABLE
WHERE ROWID NOT IN
  (SELECT MAX(rowid)
  FROM my_duplicate_table
  GROUP BY employee_id,
    employee_fname,
    employee_lname
  );

Lets try and understand what this query is doing? The outer query is just a plain old select statement.What is the sub-query is doing ? It is selecting the max ROWID from duplicate records and which is what you want to keep correct?. The outer query will display everything from the table Minus the result from sub-query (what you want in a table).We are not done yet. We now need to convert thee above “SELECT” into DELETE statement? See the conversion below, Easy Huh!!!

DELETE
FROM scott.my_duplicate_table
WHERE rowid IN (SELECT rowid
 FROM scott.my_duplicate_table
 WHERE ROWID NOT IN
  (SELECT MIN(ROWID)
  FROM scott.my_duplicate_table
  GROUP BY employee_id
  )
 );

Tips: Always, verify your table records and then commit/rollback the transaction. This applies to any change you are making on DDL, DML or in application development. Group by operation should be on the columns which identify the duplicate. The more columns you add, the strong the query become ( this is what i think and believe!!) There are several ways to get rid of the duplicate and this is one of them. Hey, duplicates are gone, but we didn't do a jack to prevent this from happening again? How do you prevent this from happening again? You can alter table to add primary key or unique column. As a DBA, I will not let any table push to production without a primary key or unique column unless they come with an Xception…..

Titulo

How to Delete Duplicate Record?

No comments

Archive

Certification

Facebook

Categories

Database Career Tips

Popular Topics