This article does not talk about the “duplicate” that you are used to hearing. We aren't talking about the knock-off products AKA "duplicate" that you see in China-Town either. This is for a data enthusiast who happened to see duplicate records in their tables. This article will answer you; how to find the duplicate records or check if duplicate record exists? How do you remove them and prevent the duplicate records?
If you are looking for something else, sorry pal, you took a wrong exit!! The answer to above questions can be best explained with a lab exercise. Let’s begin our exercise!!
Drop Table If Exist:
Create Table:
Verify Table:
Insert Records:
Record Count:
How To Check Duplicate Records?
The above SQL statement will tell you if there are duplicates. Hey, do you see with this query? I will leave that up to you but this query gets the job done,correct? Here's a solid way to capture duplicates.
Now, lets delete the duplicate, why? cause I don't need them!! The SAFE way to delete from a table is to write a select statement to display the unwanted records then convert the “select statement” to “delete from” statement once you verify your select statement. This is the best practices that database junkies follow before they say good-bye to unwanted data! And you should follow it as well!!
Lets try and understand what this query is doing? The outer query is just a plain old select statement.What is the sub-query is doing ? It is selecting the max ROWID from duplicate records and which is what you want to keep correct?. The outer query will display everything from the table Minus the result from sub-query (what you want in a table).We are not done yet. We now need to convert thee above “SELECT” into DELETE statement? See the conversion below, Easy Huh!!!
Tips: Always, verify your table records and then commit/rollback the transaction. This applies to any change you are making on DDL, DML or in application development. Group by operation should be on the columns which identify the duplicate. The more columns you add, the strong the query become ( this is what i think and believe!!) There are several ways to get rid of the duplicate and this is one of them. Hey, duplicates are gone, but we didn't do a jack to prevent this from happening again? How do you prevent this from happening again? You can alter table to add primary key or unique column. As a DBA, I will not let any table push to production without a primary key or unique column unless they come with an Xception…..
Drop Table If Exist:
--Drop if exist already. DROP TABLE SCOTT.MY_DUPLICATE_TABLE;
Create Table:
--Create table CREATE TABLE SCOTT.MY_DUPLICATE_TABLE ( EMPLOYEE_ID NUMBER, EMPLOYEE_FNAME VARCHAR2(25), EMPLOYEE_LNAME VARCHAR2(25) );
Verify Table:
--Verify that your table exist DESC SCOTT.MY_DUPLICATE_TABLE;
Insert Records:
--insert duplicate records BEGIN INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1054, 'John', 'Smith'); INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1054, 'John', 'Smith'); INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1054, 'John', 'Smith'); INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1055, 'Pat', 'Peterson'); INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1055, 'Pat', 'Peterson'); INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1056, 'Jack', 'Johnson'); INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1056, 'Jack', 'Johnson'); INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1057, 'Ray', 'Ramano'); INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1057, 'Ray', 'Ramano'); INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1057, 'Ray', 'Ramano'); INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1058, 'Jennifer','Harold'); INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1058, 'Jennifer','Harlod'); INSERT INTO MY_DUPLICATE_TABLE (EMPLOYEE_ID, EMPLOYEE_FNAME, EMPLOYEE_LNAME) VALUES ( 1059, 'King','Kong'); COMMIT; END; /
Record Count:
--Let’s count the record. SELECT COUNT(*) FROM MY_DUPLICATE_TABLE;
How To Check Duplicate Records?
--How to check if duplicate record exist? SELECT employee_id, COUNT(employee_id) FROM my_duplicate_table GROUP BY employee_id ORDER BY employee_id;
The above SQL statement will tell you if there are duplicates. Hey, do you see with this query? I will leave that up to you but this query gets the job done,correct? Here's a solid way to capture duplicates.
SELECT employee_id, Count(employee_id) AS dup FROM my_duplicate_table GROUP BY employee_id HAVING Count(employee_id) > 1 ORDER BY employee_id;
Now, lets delete the duplicate, why? cause I don't need them!! The SAFE way to delete from a table is to write a select statement to display the unwanted records then convert the “select statement” to “delete from” statement once you verify your select statement. This is the best practices that database junkies follow before they say good-bye to unwanted data! And you should follow it as well!!
SELECT * FROM MY_DUPLICATE_TABLE WHERE ROWID NOT IN (SELECT MAX(rowid) FROM my_duplicate_table GROUP BY employee_id, employee_fname, employee_lname );
Lets try and understand what this query is doing? The outer query is just a plain old select statement.What is the sub-query is doing ? It is selecting the max ROWID from duplicate records and which is what you want to keep correct?. The outer query will display everything from the table Minus the result from sub-query (what you want in a table).We are not done yet. We now need to convert thee above “SELECT” into DELETE statement? See the conversion below, Easy Huh!!!
DELETE FROM scott.my_duplicate_table WHERE rowid IN (SELECT rowid FROM scott.my_duplicate_table WHERE ROWID NOT IN (SELECT MIN(ROWID) FROM scott.my_duplicate_table GROUP BY employee_id ) );
Tips: Always, verify your table records and then commit/rollback the transaction. This applies to any change you are making on DDL, DML or in application development. Group by operation should be on the columns which identify the duplicate. The more columns you add, the strong the query become ( this is what i think and believe!!) There are several ways to get rid of the duplicate and this is one of them. Hey, duplicates are gone, but we didn't do a jack to prevent this from happening again? How do you prevent this from happening again? You can alter table to add primary key or unique column. As a DBA, I will not let any table push to production without a primary key or unique column unless they come with an Xception…..
Interested in working with me? I can be reached at pbaniya04[at]gmail.com for any questions, consulting opportunities or you may drop a line to say HELLO. Thank your again for visiting my blog and looking forward to serving you more.
Have a Database-ious Day!
Have a Database-ious Day!
No comments