What is a Surrogate Key ?

A Surrogate Key (SK) is an artificial, system-generated unique identifier assigned to dimension records in a data warehouse.
It has no business meaning and is used instead of natural/business keys.


🔹 Why Surrogate Keys are Used

✔️ Handles Slowly Changing Dimensions (SCD)
✔️ Improves join performance
✔️ Avoids issues with changing business keys
✔️ Ensures uniqueness across history


🔹 Example

Source (Business Key)

Customer_ID | Name | City

1001        | Ravi | Pune

Dimension Table

Cust_Key | Customer_ID | Name | City

1        | 1001        | Ravi | Pune

2        | 1001        | Ravi | Mumbai   ← Type 2 change

 Cust_Key is the surrogate key.


🔹 How Surrogate Keys are Handled in DataStage

DataStage provides multiple ways to generate and manage surrogate keys.


1️ Surrogate Key Generator Stage.

🔸 What it does

  • Automatically generates sequential unique numbers
  • Thread-safe in parallel jobs

🔸 Job Flow

Source → Transformer → Surrogate Key Generator → Dimension Table

🔸 Configuration

  • Specify:
    • Key column name
    • Starting value
    • Increment value

2️ Using Sequence in Database

🔸 How it works

  • Database sequence generates key
  • DataStage reads the generated value

🔸 Example

          NEXTVAL(customer_seq)


3️ Using Max + 1 Logic

        SELECT MAX(CUST_KEY) + 1 FROM DIM_CUSTOMER


🔹 Surrogate Keys in SCD Handling

🔸 Type 1

  • Same surrogate key
  • Record updated

🔸 Type 2

  • New surrogate key generated
  • Old record expired

🔸 Type 3

  • Same surrogate key
  • Limited history columns updated

🔹 Surrogate Key Lookup During Fact Load

Fact table stores only surrogate keys, not business keys.

Fact Load Flow

Source → Lookup (Dimension) → Fact Table

Example

Fact Table

----------

Cust_Key | Amount

1        | 5000


🔹 Best Practices in DataStage

✔️ Always use Surrogate Key Generator stage
✔️ Maintain natural key + surrogate key
✔️ Never expose surrogate keys to business users
✔️ Use lookup caching for performance


 

No comments:

Post a Comment

Most Recent posts

Transformer stage scenario based questions.

  Question: Design a data stage job to get a target output as below . Source:         Target  Eno Ename    Eno Ename  1   ...