A Slowly Changing Dimension (SCD) tracks how dimension data changes over time.
ü SCD Type 1 (Overwrite old data)
Definition
No history tracked.
If dimension attributes change, update the existing row.
Useful for non-historical or correction-based fields.
Real-world example
Changing customer email ID or phone number where history is not important.
ü SCD Type 1 – Job Design (DataStage)
Job Stages
Source (DB extract / file)
Lookup Stage
Lookup on Dimension Key (e.g., Customer_ID)
Transformer
Compare columns → Identify changed rows
If no match → INSERT
If match but attributes changed → UPDATE
Two Output Links
Insert link → Dimension table
Update link → Dimension table
Target (DB Connector)
Change comparison logic example
IF Lkp_CustID IS NULL THEN 'I'ELSE IF Inp.Email <> Lkp.Email THEN 'U'ELSE 'R' -- no change
Sample Input / Output – SCD Type 1
Source Input
Cust_ID | Name | Email |
101 | John | john@gmail.com |
Existing Dimension Table
Cust_ID | Name | Email |
101 | John | john.old@gmail.com |
Expected Output (after SCD1)
Cust_ID | Name | Email |
101 | John | john@gmail.com |
Old email is overwritten, no history kept.
SCD Type 2 (Preserve full history)
Definition
Creates a new row when the attribute changes.
Maintains full history with date ranges or active flag.
Common SCD2 Fields
Column | Purpose |
Start_Date | When record became active |
End_Date | When record stopped being active |
Current_Flag | Y/N |
Version_No | Optional incremental version |
SCD Type 2 – Job Design (DataStage)
Job Stages
Source Stage
Lookup Stage on business key
Transformer Stage
Compare incoming vs existing attribute(s)
Identify:
New record → INSERT
Changed record → EXPIRE existing + INSERT new
Two Output Links
Expired-Update Link → Update old row
Current_Flag = 'N'End_Date = CurrentTimestamp()
New-Insert Link → Insert new row
Current_Flag = 'Y'Start_Date = CurrentTimestamp()Version_No = previous+1
Target (DB Connector)
Sample Input / Output – SCD Type 2
Source Input
Cust_ID | Name | City |
101 | John | Chennai |
Existing Dimension Table
Dim_Key | Cust_ID | Name | City | Start_Date | End_Date | Curr_Flag | Version |
1 | 101 | John | Mumbai | 2020-01-01 | 9999-12-31 | Y | 1 |
Incoming change:
City changed
Mumbai → Chennai
Expected Output
Expire old row
Dim_Key | Cust_ID | Name | City | Start_Date | End_Date | Curr_Flag | Version |
1 | 101 | John | Mumbai | 2020-01-01 | 2024-12-03 17:00 | N | 1 |
Insert new row
Dim_Key | Cust_ID | Name | City | Start_Date | End_Date | Curr_Flag | Version |
2 | 101 | John | Chennai | 2024-12-03 | 9999-12-31 | Y | 2 |
No comments:
Post a Comment