Monday, 1 December 2025

What is the Change Capture Stage in DataStage?

The Change Capture stage in DataStage compares old and new datasets and identifies delta changes such as inserts, updates, and deletes. It outputs change codes that help in performing incremental data loads efficiently.


Why Do We Use the Change Capture Stage?

In real-world ETL processes, reloading the entire dataset every day is inefficient.
It is better to load only the delta: the records that changed since the previous load.

The Change Capture stage does this by comparing:

  • Old Dataset (Before)

  • New Dataset (After)

and generating a list of records that have changed.


How It Works

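The stage compares both datasets row by row using:

  • Key columns, to match each record in the Before dataset with its counterpart in the After dataset
  • Value columns, to detect whether a matched record has changed

It then outputs records with specific change codes to indicate what type of change occurred.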
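
The comparison can be sketched in plain Python. This is only an illustration of the idea, not how DataStage implements the stage: the function name capture_changes, the sample rows, and the use of in-memory dictionaries are all assumptions made for the example. It matches rows on a key column, compares the listed value columns, and emits the letter codes I, D, and C; unchanged rows are dropped.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def capture_changes(before: List[Row], after: List[Row],
                    key: str, value_cols: List[str]) -> List[Tuple[object, str]]:
    """Return (key, change_code) pairs; unchanged rows are dropped."""
    # Index both datasets by the key column so rows can be matched.
    before_by_key = {row[key]: row for row in before}
    after_by_key = {row[key]: row for row in after}

    changes = []
    for k in sorted(before_by_key.keys() | after_by_key.keys()):
        old, new = before_by_key.get(k), after_by_key.get(k)
        if old is None:
            changes.append((k, "I"))   # only in After: insert
        elif new is None:
            changes.append((k, "D"))   # only in Before: delete
        elif any(old[c] != new[c] for c in value_cols):
            changes.append((k, "C"))   # in both, a value column differs
    return changes


# Hypothetical sample data, used only for this sketch.
before = [{"id": 10, "city": "Pune"}, {"id": 20, "city": "Oslo"}, {"id": 30, "city": "Lima"}]
after = [{"id": 10, "city": "Pune"}, {"id": 20, "city": "Rome"}, {"id": 40, "city": "Cairo"}]

print(capture_changes(before, after, "id", ["city"]))
# [(20, 'C'), (30, 'D'), (40, 'I')]
```

Record 10 is identical in both datasets, so it does not appear in the output.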


Change Codes (Key Output)

Change Code   Meaning
I or 1        Insert (new record in After dataset, not found in Before dataset)
D or 2        Delete (record exists in Before dataset, missing in After dataset)
C or 3        Change (record exists in both, but one or more column values changed)
E or 0        Copy (record unchanged), normally filtered out

Most ETL loads use only I, D, and C, because unchanged records do not need reprocessing.


Example Scenario

Before Dataset (Day 1)

ID   Name   Salary
1    John   3000
2    Mary   4000

After Dataset (Day 2)

ID   Name   Salary
1    John   3200
3    Alex   3500

Change Capture Output

ID   Change Code   Description
1    C             Salary updated
2    D             Record deleted
3    I             New record inserted

This delta is used for incremental loading into the target table.
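
As a rough illustration of that last step, the sketch below applies the delta from this example to an in-memory copy of the Day 1 data. The apply_delta function and the dictionary-based "target table" are hypothetical stand-ins for whatever load logic a real job would use; the change_code field is assumed to carry the letter codes I, D, and C.

```python
from typing import Dict, List

Row = Dict[str, object]


def apply_delta(target: Dict[object, Row], delta: List[Row]) -> Dict[object, Row]:
    """Apply change-coded rows to a copy of the target, keyed by ID."""
    result = dict(target)
    for rec in delta:
        code, key = rec["change_code"], rec["ID"]
        if code in ("I", "C"):
            # Inserts and changes both carry the After image of the row.
            result[key] = {c: v for c, v in rec.items() if c != "change_code"}
        elif code == "D":
            result.pop(key, None)
        else:
            raise ValueError(f"unexpected change code: {code!r}")
    return result


# Target table as loaded on Day 1.
target = {
    1: {"ID": 1, "Name": "John", "Salary": 3000},
    2: {"ID": 2, "Name": "Mary", "Salary": 4000},
}

# Delta rows from the example, with a change code attached to each.
delta = [
    {"ID": 1, "Name": "John", "Salary": 3200, "change_code": "C"},
    {"ID": 2, "Name": "Mary", "Salary": 4000, "change_code": "D"},
    {"ID": 3, "Name": "Alex", "Salary": 3500, "change_code": "I"},
]

loaded = apply_delta(target, delta)
print(loaded)
# {1: {'ID': 1, 'Name': 'John', 'Salary': 3200}, 3: {'ID': 3, 'Name': 'Alex', 'Salary': 3500}}
```

After the three delta rows are applied, the target matches the Day 2 dataset without the unchanged data being reloaded.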

When Do We Use Change Capture Stage?

  • Daily incremental/delta loads
  • CDC (Change Data Capture) processes
  • Slowly Changing Dimensions (SCD)
  • Synchronizing two data repositories
  • Large datasets where full reload is expensive
