Toad World® Forums

Data duplication issue


#1

HI!
I have data that is duplicated by a field that should be unique but is not. The whole record is distinct due to date field values and other CHARVARs.

Mem_ID is expected to be duplicative where appropriate, i.e. more than one uniqueID by Mem_Id.

Would like to keep just one record by UniqueID (earliest).

Ideally, min(admit) and set discharge date to max(discharge)

Have tried combinations of SELECT DISTINCT & MIN & MAX, but keep getting same number of records, obviously doing something wrong.

Thanks for any insight!

Here are some examples.

For example, UniqueID 5459110 is just bad data due to influences beyond the scope of my current needs and only need one record.

UniqueID Mem_Id admit discharge Status
5459110 135802 7/20/2010 7/27/2010 Some Description 1
5459110 135802 7/22/2010 7/27/2010 Some Description 1
13776129 478821 6/8/2011 6/12/2011 Some Description 4
13776129 478821 6/11/2011 6/12/2011 Some Description 1
14844766 478914 4/28/2011 5/2/2011 Some Description 2
14844766 478914 4/29/2011 5/2/2011 Some Description 1
14844766 478914 4/30/2011 5/2/2011 Some Description 1
14844766 478914 5/1/2011 5/2/2011 Some Description 1
14844766 478914 5/2/2011 5/2/2011 Some Description 3


#2

Will this query give you the solution you need?

create table hospital_stay
(unique_id number, mem_id number, admit date, discharge date, status varchar2 (20)) ;

insert into hospital_stay (unique_id, mem_id, admit, discharge, status)
values (5459110, 135802, DATE ‘2010-07-20’, DATE ‘2010-07-27’, ‘Some Description 1’) ;
insert into hospital_stay (unique_id, mem_id, admit, discharge, status)
values (5459110, 135802, DATE ‘2010-07-22’, DATE ‘2010-07-27’, ‘Some Description 1’) ;
insert into hospital_stay (unique_id, mem_id, admit, discharge, status)
values (13776129, 478821, DATE ‘2010-06-8’, DATE ‘2010-06-12’, ‘Some Description 4’) ;
insert into hospital_stay (unique_id, mem_id, admit, discharge, status)
values (13776129, 478821, DATE ‘2010-06-11’, DATE ‘2010-06-12’, ‘Some Description 1’) ;
insert into hospital_stay (unique_id, mem_id, admit, discharge, status)
values (14844766, 478914, DATE ‘2010-04-28’, DATE ‘2010-05-2’, ‘Some Description 2’) ;
insert into hospital_stay (unique_id, mem_id, admit, discharge, status)
values (14844766, 478914, DATE ‘2010-04-29’, DATE ‘2010-05-2’, ‘Some Description 1’) ;
insert into hospital_stay (unique_id, mem_id, admit, discharge, status)
values (14844766, 478914, DATE ‘2010-04-30’, DATE ‘2010-05-2’, ‘Some Description 1’) ;
insert into hospital_stay (unique_id, mem_id, admit, discharge, status)
values (14844766, 478914, DATE ‘2010-05-1’, DATE ‘2010-05-2’, ‘Some Description 1’) ;
insert into hospital_stay (unique_id, mem_id, admit, discharge, status)
values (14844766, 478914, DATE ‘2010-05-2’, DATE ‘2010-05-2’, ‘Some Description 3’) ;

commit ;

select * from
(
select unique_id, mem_id,
min (admit) over (partition by mem_id) as first_admit,
max (discharge) over (partition by mem_id) as last_discharge,
status,
row_number () over (partition by mem_id order by unique_id) as row_counter
from hospital_stay
)
where row_counter = 1 ;

UNIQUE_ID MEM_ID FIRST_ADM LAST_DISC STATUS ROW_COUNTER


5459110 135802 20-JUL-10 27-JUL-10 Some Description 1 1
13776129 478821 08-JUN-10 12-JUN-10 Some Description 1 1
14844766 478914 28-APR-10 02-MAY-10 Some Description 1 1


#3

Good morning Jacques.

Thanks for your help Sir.

I actually came up with an “almost perfect” solution using what I learned earlier from you and adding the Row_number function to create counts similar to your latest example.

Then performed a delete as you can see below.

The problem is that it eliminates all records that are duplicative by the CNT2 and I do not get the earliest record, this is only the case for a handful of records, but still not perfect.

When I try to substitute my actual fields into your latest example, (which I think is exacly what I need), I get Teradata error dialog box

“Syntax error, expected something like a name or a unicode delimited identifier or an ‘UDFCALLNAME’ keyword between ‘)’ and the ‘WHERRE’ keyword”

Here is my almost perfect logic:

CREATE TABLE ar_asrh_dm_etl_beacon.Flg_ReAdAlgo1a1 AS
(SELECT Admits1.AdmitClaim_id AS AdmitClaim_id2
,Admits1.AdmitMem_id AS AdmitMem_id2
,MIN(Admits1.AdmitDte_Admit_id) AS ADte
,MAX(Admits1.AdmitDte_Disch_id) AS DDte
,Admits1.AdmitPatStat_dl AS AdmitPatStat_dl2
,Admits1.AdmitPatStat_id AS AdmitPatStat_id2
,Admits1.AdmitProvRend_id AS AdmitProvRend_id2
,Admits1.AdmitProvRend_LastName AS AdmitProvRend_LastName2
,
row_number() over (partition BY Admits1.AdmitMem_id ORDER BY Admits1.AdmitDte_Admit_id) AS CNT1,
COUNT(*) over (partition BY Admits1.AdmitClaim_id) AS CNT2

FROM ar_asrh_dm_etl_beacon.Flg_ReAdAlgo1 AS Admits1

GROUP BY Admits1.AdmitMem_id, Admits1.AdmitClaim_id, Admits1.AdmitDte_Admit_id, Admits1.AdmitDte_Disch_id,
Admits1.AdmitPatStat_dl, Admits1.AdmitPatStat_id, Admits1.AdmitProvRend_id, Admits1.AdmitProvRend_LastName)

WITH DATA;

/Checking progress/
SELECT COUNT() AS NumberOf FROM ar_asrh_dm_etl_beacon.Flg_ReAdAlgo1a1; / = 161,705 Obs*/

DELETE FROM ar_asrh_dm_etl_beacon.Flg_ReAdAlgo1a1
WHERE CNT2 >1 and CNT1 >1;

/Checking progress/
SELECT COUNT() AS NumberOf FROM ar_asrh_dm_etl_beacon.Flg_ReAdAlgo1a1; / = 160,719 Obs*/

Here is my substitution of your logic that is erroring out:

CREATE TABLE ar_asrh_dm_etl_beacon.Flg_ReAdAlgo1a1a AS

(SELECT * FROM

(SELECT AdmitClaim_id2, AdmitMem_id2,
MIN (ADte) over (partition BY AdmitMem_id2) AS first_admit,
MAX (DDte) over (partition BY AdmitMem_id2) AS last_discharge, AdmitPatStat_dl2, AdmitPatStat_id2,
AdmitProvRend_LastName2, AdmitProvRend_id2,

row_number () over (partition by mem_id ORDER BY AdmitClaim_id2) AS row_counter
FROM ar_asrh_dm_etl_beacon.Flg_ReAdAlgo1a1)

WHERE row_counter = 1

GROUP BY AdmitMem_id2, AdmitClaim_id, ADte, DDte, AdmitPatStat_dl2, AdmitPatStat_id2,
AdmitProvRend_id, AdmitProvRend_LastName)

WITH DATA;

I would prefer to evaluate your example for completeness verses mine, but the error is a barrier.

Thanks for any advice!

WHP