Possible 1: We have employee table and we are going to check
and display duplicate rows. We will call data duplicate if the data from all
the columns from employee table occurs more than once.
Query:
SELECT empno, ename, job, mgr, hrideate, sal, comm, deptno, COUNT(*) -- Use all the columns from employee table FROM hr.employee GROUP BY empno, ename, job, mgr, hrideate, sal, comm, deptno HAVING COUNT(*) > 1;
The select query here selects the row that has occurred more
than once. Here, the data must match from all the columns from employee table.
Possible 2: The other possibility is that the request is
only concerns if two or more employees have same empno. No two employees should
have same employee number, don’t you agree? Here, we are only concerns about the employee
number even though the rest of the details are different. How do you find the
duplicates empno here?
Query:
SELECT empno, COUNT(*) FROM hr.employee GROUP BY empno HAVING COUNT(*) > 1;
Do you want duplicate records? No, you don’t, how do you
resolve this? The first step is to remove duplicate records and then add Primary
Key or Unique Constraint on empno. Adding constraints resolves most of the problem
regarding duplicate but not all. We will discuss what constraints solve and
what it does not next.
Now the employee table has PK on empno column. We can’t have
duplicate employee number. Primary Key ensures no duplicate data on the column
and also adds Index automatically. To verify if the PK is working/exist or not, you can use
the above query or just check on empno column instead of all the columns to display
duplicate. Well this isn’t the right way to check PK there are other solid and
fast way which we are interested here.
What PK does not prevent is other columns. You will have unique employee number but all other information may be duplicated. This is possible when someone is doing a data entry from an application or loading employee data from a file. This is something we can’t prevent it. Therefore, I would check to ensure PK is there and do a duplicate check on all the columns except the one with PK constraint. The query output should always be NONE. If the query displays some output either your query is wrong or you have duplicate records.
What PK does not prevent is other columns. You will have unique employee number but all other information may be duplicated. This is possible when someone is doing a data entry from an application or loading employee data from a file. This is something we can’t prevent it. Therefore, I would check to ensure PK is there and do a duplicate check on all the columns except the one with PK constraint. The query output should always be NONE. If the query displays some output either your query is wrong or you have duplicate records.
Interested in working with me? I can be reached at pbaniya04[at]gmail.com for any questions, consulting opportunities or you may drop a line to say HELLO. Thank your again for visiting my blog and looking forward to serving you more.
Have a Database-ious Day!
Have a Database-ious Day!
No comments