by Ludek Bob Jankovsky, 28-Aug-2008 (ETL PATTERNS)
ETL Pattern means standard functional task defining what has to happen despite technical implementation.
Basic insert-wise patterns:
INSERT - just inserts records from source (simpliest)
DIFF_INSERT - inserts new records only, existing records (based on matching key) are ignored
SCD2_INSERT - inserts new and changed records. Changed records are inserted as duplicates. That way supports load of SCD2 (slowly changing dimensions).
Examples in the article have been built using Simple ETL module generator. SCD2_INSERT pattern is not directly supported by the generator. Worth workaround is usage of DIFF_INSERT pattern with Matchig flag over all meaning columns.
INSERT INTO A2.PARTY TRG$$
(
TRG$$.PT_SEQ
,TRG$$.PT_SID
,TRG$$.PT_ID
,TRG$$.PT_FIRST_NAME
,TRG$$.PT_LAST_NAME
,TRG$$.PT_BIRTH_DATE
,TRG$$.PT_CODE
,TRG$$.PT_DELETED_FLAG
,TRG$$.UPD_EFF_DATE
,TRG$$.INS_DT
,TRG$$.UPD_DT
,TRG$$.INS_PROCESS
,TRG$$.UPD_PROCESS
)
Select
PT_SEQ.nextval
,SRC$$.PT_SID
,SRC$$.PT_ID
,SRC$$.PT_FIRST_NAME
,SRC$$.PT_LAST_NAME
,SRC$$.PT_BIRTH_DATE
,SRC$$.PT_CODE
,SRC$$.PT_DELETED_FLAG
,SRC$$.UPD_EFF_DATE
,SRC$$.INS_DT
,SRC$$.UPD_DT
,SRC$$.INS_PROCESS
,SRC$$.UPD_PROCESS
from(
SELECT
'SR1' as PT_SID
,to_char(CUST.ID) as PT_ID
,CUST.CUST_FIRST_NAME as PT_FIRST_NAME
,CUST.CUST_LAST_NAME as PT_LAST_NAME
,CUST.CUST_BIRTH_DATE as PT_BIRTH_DATE
,CUST.CUST_CODE as PT_CODE
,CUST.CUST_DELETED_FLAG as PT_DELETED_FLAG
,p_effective_date as UPD_EFF_DATE
,sysdate as INS_DT
,sysdate as UPD_DT
,p_process_key as INS_PROCESS
,p_process_key as UPD_PROCESS
FROM A2.CUSTOMER CUST
) SRC$$
;
Implementation of DIFF_INSERT pattern based on the same metadata:
MERGE INTO A2.PARTY TRG$$
USING(
SELECT
'SR1' as PT_SID
,to_char(CUST.ID) as PT_ID
,CUST.CUST_FIRST_NAME as PT_FIRST_NAME
,CUST.CUST_LAST_NAME as PT_LAST_NAME
,CUST.CUST_BIRTH_DATE as PT_BIRTH_DATE
,CUST.CUST_CODE as PT_CODE
,CUST.CUST_DELETED_FLAG as PT_DELETED_FLAG
,p_effective_date as UPD_EFF_DATE
,sysdate as INS_DT
,sysdate as UPD_DT
,p_process_key as INS_PROCESS
,p_process_key as UPD_PROCESS
FROM A2.CUSTOMER CUST
) SRC$$
ON(
TRG$$.PT_SID=SRC$$.PT_SID
AND TRG$$.PT_ID=SRC$$.PT_ID
)
WHEN NOT MATCHED THEN INSERT
(
TRG$$.PT_SEQ
,TRG$$.PT_SID
,TRG$$.PT_ID
,TRG$$.PT_FIRST_NAME
,TRG$$.PT_LAST_NAME
,TRG$$.PT_BIRTH_DATE
,TRG$$.PT_CODE
,TRG$$.PT_DELETED_FLAG
,TRG$$.UPD_EFF_DATE
,TRG$$.INS_DT
,TRG$$.UPD_DT
,TRG$$.INS_PROCESS
,TRG$$.UPD_PROCESS
)VALUES(
PT_SEQ.nextval
,SRC$$.PT_SID
,SRC$$.PT_ID
,SRC$$.PT_FIRST_NAME
,SRC$$.PT_LAST_NAME
,SRC$$.PT_BIRTH_DATE
,SRC$$.PT_CODE
,SRC$$.PT_DELETED_FLAG
,SRC$$.UPD_EFF_DATE
,SRC$$.INS_DT
,SRC$$.UPD_DT
,SRC$$.INS_PROCESS
,SRC$$.UPD_PROCESS
)
;
SCD2_INSERT pattern requires in our case modification of metadata.
Metadata of the workaroun - mention MatchFlag="Y" for all meaningful columns:
Implementation of the SCD2_INSERT pattern (mention form of the merge usage):
Just another technical algorithm of DIFF_INSERT - DIFF_INSERT(MINUS).
It's more efficient with transformation thru DB-link. The algorithm supports DirectFlag option for all non-meaningful columns.
Implementation:
MERGE INTO A2.PARTY TRG$$
USING(
SELECT
'SR1' as PT_SID
,to_char(CUST.ID) as PT_ID
,CUST.CUST_FIRST_NAME as PT_FIRST_NAME
,CUST.CUST_LAST_NAME as PT_LAST_NAME
,CUST.CUST_BIRTH_DATE as PT_BIRTH_DATE
,CUST.CUST_CODE as PT_CODE
,CUST.CUST_DELETED_FLAG as PT_DELETED_FLAG
,p_effective_date as VALID_FROM
,to_date('29991231') as VALID_THRU
,p_effective_date as UPD_EFF_DATE
,sysdate as INS_DT
,sysdate as UPD_DT
,p_process_key as INS_PROCESS
,p_process_key as UPD_PROCESS
FROM A2.CUSTOMER CUST
) SRC$$
ON(
TRG$$.PT_SID=SRC$$.PT_SID
AND TRG$$.PT_ID=SRC$$.PT_ID
AND TRG$$.PT_FIRST_NAME=SRC$$.PT_FIRST_NAME
AND TRG$$.PT_LAST_NAME=SRC$$.PT_LAST_NAME
AND TRG$$.PT_BIRTH_DATE=SRC$$.PT_BIRTH_DATE
AND TRG$$.PT_CODE=SRC$$.PT_CODE
AND TRG$$.PT_DELETED_FLAG=SRC$$.PT_DELETED_FLAG
)
WHEN NOT MATCHED THEN INSERT
(
TRG$$.PT_SEQ
,TRG$$.PT_SID
,TRG$$.PT_ID
,TRG$$.PT_FIRST_NAME
,TRG$$.PT_LAST_NAME
,TRG$$.PT_BIRTH_DATE
,TRG$$.PT_CODE
,TRG$$.PT_DELETED_FLAG
,TRG$$.VALID_FROM
,TRG$$.VALID_THRU
,TRG$$.UPD_EFF_DATE
,TRG$$.INS_DT
,TRG$$.UPD_DT
,TRG$$.INS_PROCESS
,TRG$$.UPD_PROCESS
)VALUES(
PT_SEQ.nextval
,SRC$$.PT_SID
,SRC$$.PT_ID
,SRC$$.PT_FIRST_NAME
,SRC$$.PT_LAST_NAME
,SRC$$.PT_BIRTH_DATE
,SRC$$.PT_CODE
,SRC$$.PT_DELETED_FLAG
,SRC$$.VALID_FROM
,SRC$$.VALID_THRU
,SRC$$.UPD_EFF_DATE
,SRC$$.INS_DT
,SRC$$.UPD_DT
,SRC$$.INS_PROCESS
,SRC$$.UPD_PROCESS
)
;