Recently, I was working on a big project where I needed to find new data by comparing two tables. The number of new data came out pretty high number. For simplicity consider there are two tables A & B. The Table A has the latest and greatest data which updates frequently via API. Table B is on different database than table A therefore, someone goes to table B and updates manually every month with the new data from table A or on demand. I am a big fan of a-u-t-o-m-a-t-i-o-n and hates to repeat the same task often because I believe anywhere data moves, you can script it to automate. I have done a complete article on AUTOMATION if you haven't read it ready. I volunteered to write a script to automate the data sync almost in a real time or before client finds the issue.
I started by writing a script to compare data between tables to find the differences. I was shocked when I select everything from Table A MINUS from Table B. The differences came out a huge number. The number is same as the total number of records in table A. What is this result telling you? your answer is either the table B is empty or the table refresh never happened. The table B isn't empty
but I don't know when the last refreshed was done.
What is wrong here? after scrolling data on both tables, I cannot see any issue that I can see visually see. I then verified my SQL query on demo tables. It suddenly strikes to my mind to check the length of data on both that table. The data length does not match and the length is higher on table B. Why are those length not matching? Because the data on table B was loaded with either a leading or trailing spaces. After using TRIM Function on all the columns, I was able to get the actual difference or at least more reasonable differences.
It seems like whoever loaded data on table B, loaded with leading or trailing spaces on all data which cause the problem when getting the difference. Always use TRIM function to load data whenever you are reading data from a table or file unless you are confident on that data you are using are clean ones.
TRIM function is one of the simple Function that comes with Oracle. TRIM removes the leading or trailing characters or both from a character string. If you do not specify anything to leading or trailing character, it defaults to space. Let's see few examples of TRIM, LTRIM and RTRIM
TRIM() : Removes any leading or trailing space.
LTRIM(): Removes from the left end of char all of the characters contained in set. By default, it removes single blank.
RTRIM(): Removes from the right end of char all of the characters contained in a set. By default, it removes a sing blank.
A common problem is items with leading or trailing spaces. You can’t see them on the screen, but after you’ve encountered them a few times, you learn to check for them. The LENGTH function is a great help if you suspect there are hidden space characters. We discussed the importance of TRIM Function and the syntax of TRIM, LTRIM and RTRIM. Using these Functions are costly operations, therefore I would be extra careful when loading data to avoid using TRIM functions on every column later.
I started by writing a script to compare data between tables to find the differences. I was shocked when I select everything from Table A MINUS from Table B. The differences came out a huge number. The number is same as the total number of records in table A. What is this result telling you? your answer is either the table B is empty or the table refresh never happened. The table B isn't empty
but I don't know when the last refreshed was done.
What is wrong here? after scrolling data on both tables, I cannot see any issue that I can see visually see. I then verified my SQL query on demo tables. It suddenly strikes to my mind to check the length of data on both that table. The data length does not match and the length is higher on table B. Why are those length not matching? Because the data on table B was loaded with either a leading or trailing spaces. After using TRIM Function on all the columns, I was able to get the actual difference or at least more reasonable differences.
It seems like whoever loaded data on table B, loaded with leading or trailing spaces on all data which cause the problem when getting the difference. Always use TRIM function to load data whenever you are reading data from a table or file unless you are confident on that data you are using are clean ones.
TRIM function is one of the simple Function that comes with Oracle. TRIM removes the leading or trailing characters or both from a character string. If you do not specify anything to leading or trailing character, it defaults to space. Let's see few examples of TRIM, LTRIM and RTRIM
TRIM() : Removes any leading or trailing space.
--lenghth witout trim SELECT LENGTH(' hello dba ') FROM dual; --12 --lenght with trim SELECT LENGTH(trim(' hello dba ')) FROM dual; --9
LTRIM(): Removes from the left end of char all of the characters contained in set. By default, it removes single blank.
SELECT, product_serial, LTRIM(product_serial, 'EMR') as Number_only_Serial FROM tbl_product; EMR1234567890, 1234567890
RTRIM(): Removes from the right end of char all of the characters contained in a set. By default, it removes a sing blank.
SELECT RTRIM('RANDOM: 123456789', '123456789') "RTRIM example" FROM DUAL; RANDOM:
A common problem is items with leading or trailing spaces. You can’t see them on the screen, but after you’ve encountered them a few times, you learn to check for them. The LENGTH function is a great help if you suspect there are hidden space characters. We discussed the importance of TRIM Function and the syntax of TRIM, LTRIM and RTRIM. Using these Functions are costly operations, therefore I would be extra careful when loading data to avoid using TRIM functions on every column later.
Interested in working with me? I can be reached at pbaniya04[at]gmail.com for any questions, consulting opportunities or you may drop a line to say HELLO. Thank your again for visiting my blog and looking forward to serving you more.
Have a Database-ious Day!
Have a Database-ious Day!
No comments