Toad World® Forums

How to test exact key match between 2 tables?


#1

This is related to Steven’s previous question about comparing datasets,but I’m looking for an actual suggestion on how to test this usingQCT4O. (BTW, what do you think of that acronym for the product??)

Let’s say you have 2 tables linked by a key in both tables (don’t ask why all data isn’t in a single table). The common key for both is the id_number column. I know CORE_TABLE has distinct id_numbers. I need to test that CHILD_TABLE has one-and-only-one row with id_number matching CORE_TABLE. That is, there are no duplicate rows in CHILD_TABLE.

Thinking the data streams should be the same, I chose the QCT4O test:

“Program dataset contains the same rows as the expected dataset?”
My queries were:

SELECT id_number FROM CORE_TABLE;

and

SELECT id_number FROM CHILD_TABLE;

That generates the assertion code:

SELECT 1
FROM DUAL
WHERE EXISTS (
((SELECT id_number
FROM CORE_TABLE)
MINUS
(SELECT id_number
FROM CHILD_TABLE))
UNION
((SELECT id_number
FROM CHILD_TABLE)
MINUS
(SELECT id_number
FROM CORE_TABLE));

But this will succeed even if CHILD_TABLE includes multiple rows with the same id_number.

I worked around this by defining 2 tests. The above test was the first. The second was “Queried scalar matches expected value?”

Where the query is:

SELECT COUNT()
FROM (SELECT id_number, COUNT(
)
FROM CHILD_TABLE
GROUP BY id_number
HAVING COUNT(*) > 1);

Expected value = 0.

I also thought of a more complex set of queries to compare, appending the row_number() analytic to the id_number, like this:

SELECT TO_CHAR(to_number(id_number)) || ‘-’ || ROW_NUMBER() OVER(PARTITION BY id_number ORDER BY id_number ASC)
FROM CORE_TABLE;

SELECT TO_CHAR(to_number(id_number)) || ‘-’ || ROW_NUMBER() OVER(PARTITION BY id_number ORDER BY id_number ASC)
FROM CHILD_TABLE;

But this seems ugly.

Is there a better way to do this with a single test/outcome?

Thanks,

Stew


#2

Dang, nice catch, Stew!

What if we added to the test for equality of datasets that we also check the count. Clearly, if the number of rows are different, they cannot be the same!

Do you think that would meet your needs?

SF


#3

Code Tester 1.6, which will be out around mid-year, will check counts before checking contents, so duplicates are checked that way (one possible impact of duplicates, anyway). In addition, we are changing UNION to UNION ALL so that dups are preserved. I think this addresses the issue.

Do you?

SF


#4

Steven,

I think the count solution would work pretty well in 99.5% of the cases. I think it could still fail if you had multiple records for some matches but they were dupes on different records, but that doesn’t seem very likely.

I don’t see any difference in the results changing UNION to UNION ALL. What’s the thinking there?

As I might have said in my original note, I tend to use a query comparison that I found on Tom Kyte’s AskTom website, which handles this duplicate row situation just fine. It was a little hard for me to understand at first, but I’m sure you’ll catch on what’s happening right away!

SELECT COUNT(src1) AS in_clipboard_ids, COUNT(src2) AS in_temp_ids, id_number
FROM (SELECT id_number,
1 AS src1,
to_number(NULL) AS src2
FROM clipboard_ids
UNION ALL
SELECT to_number(id_number),
to_number(NULL) AS src1,
2 AS src2
FROM temp_ids
)
GROUP BY id_number
HAVING COUNT(src1) <> COUNT(src2);

I added it to a template in PL/SQL Developer, and when I invoke it, it prompts me for the table names and the column names to compare. So it’s very easy to invoke. :slight_smile:

Unfortunately it would require parsing each query to determine which columns to group by, which I don’t see how could be done dynamically! :frowning:

I guess the record count will have to suffice here!

Thanks for keeping this in mind,

Stew


#5

Thanks, Stew. I will work really hard at understanding that SQL (it’s not my strong suit, but we have others better at it on the dev team).

My idea with UNION ALL is that UNION will get rid of duplicates while UNION ALL preserves them, but thinking about it a bit more, I guess the question is whether MINUS has already obliterated the duplicates.

By the way, in terms of COUNT: if they are different we stop. If they are the same, we continue with the diff analysis.

SF