Monday, 1 December 2025

What is a Delta Load in DataStage?

 

A delta load (also called an incremental load) loads only the rows that were added or changed since the previous run, instead of reloading the full table every time.

It saves time, reduces load on the source database, and improves job performance.

Why Delta Load is Important

  • Faster ETL jobs

  • Less resource usage

  • Avoids reloading rows that are already in the target, which helps prevent duplicate data

  • Ideal for daily or hourly jobs


Types of Delta Loads

  1. Insert Only – load only new rows

  2. Insert + Update – load new and changed rows

  3. CDC (Change Data Capture) – read the changes that the database itself records

  4. Date-based filtering – select rows using a last-modified timestamp column


How Delta Load Works in DataStage

You can design a delta load in DataStage using any of the following:

1. Change Capture Stage

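  • Compares the old (before) dataset with the new (after) dataset on key columns

  • Outputs each changed row flagged as an insert, update, or delete through a change code column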
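
The comparison idea behind this stage can be sketched outside DataStage. The snippet below is plain Python, not DataStage code: the function name capture_changes and the sample rows are made up for illustration, and it assumes each snapshot fits in memory as a dictionary keyed by primary key.

```python
def capture_changes(before, after):
    """Compare two snapshots keyed by primary key.

    before, after: dicts mapping key -> row (hypothetical sample structure).
    Returns (inserts, updates, deletes) as sorted lists of keys.
    """
    # Keys present only in the new snapshot are inserts.
    inserts = sorted(k for k in after if k not in before)
    # Keys present only in the old snapshot are deletes.
    deletes = sorted(k for k in before if k not in after)
    # Keys present in both snapshots with different row values are updates.
    updates = sorted(k for k in after if k in before and after[k] != before[k])
    return inserts, updates, deletes


before = {1: ("Alice", "Paris"), 2: ("Bob", "Rome"), 3: ("Cara", "Oslo")}
after = {1: ("Alice", "Paris"), 2: ("Bob", "Milan"), 4: ("Dan", "Lima")}

inserts, updates, deletes = capture_changes(before, after)
print(inserts, updates, deletes)  # [4] [2] [3]
```

Row 1 is unchanged, so it appears in none of the three lists and would not be loaded again.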

2. CDC from database (Oracle / SQL Server)

  • Reads the change tables or transaction logs maintained by the database

  • Captures only the changed rows directly, without comparing full datasets

3. Modified Timestamp Filtering

Example SQL, where :LAST_RUN_DATE is the timestamp of the previous successful run:

SELECT * FROM CUSTOMER WHERE LAST_UPDATED > :LAST_RUN_DATE;
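
To try the filter outside DataStage, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The NAME column, the sample rows, and the helper fetch_delta are assumptions made for the example; timestamps are stored as ISO-formatted text so that string comparison matches chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMER (ID INTEGER PRIMARY KEY, NAME TEXT, LAST_UPDATED TEXT)"
)
conn.executemany(
    "INSERT INTO CUSTOMER VALUES (?, ?, ?)",
    [
        (1, "Alice", "2025-11-29 08:00:00"),
        (2, "Bob", "2025-11-30 09:30:00"),
        (3, "Cara", "2025-12-01 07:15:00"),
    ],
)


def fetch_delta(conn, last_run_date):
    # The named parameter mirrors :LAST_RUN_DATE in the query above.
    sql = (
        "SELECT ID, NAME FROM CUSTOMER "
        "WHERE LAST_UPDATED > :LAST_RUN_DATE ORDER BY ID"
    )
    return conn.execute(sql, {"LAST_RUN_DATE": last_run_date}).fetchall()


print(fetch_delta(conn, "2025-11-30 00:00:00"))  # [(2, 'Bob'), (3, 'Cara')]
```

Because the comparison is strictly greater than, a row stamped exactly at the last run time is not picked up again.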

Real-Life Example

A sales table holds about 1 million rows and receives 10,000 new records every day.

Without Delta Load

  • The full table of 1 million rows is reloaded every day

  • Job takes 20–30 minutes

With Delta Load

  • Only 10,000 new or changed rows are loaded

  • Job finishes in 2–3 minutes


Simple Delta Load Job Design in DataStage

  1. Extract the source table with the filter LAST_UPDATED > LAST_RUN_DATE

  2. Apply transformations

  3. Load into the target

  4. Update the run control table with the timestamp of this run, so the next run starts from it
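
The four steps above can be sketched end to end in Python with sqlite3. This is an illustration under stated assumptions, not a DataStage job: the table names SRC_CUSTOMER, TGT_CUSTOMER and RUN_CONTROL, the job name customer_delta, and the upper-casing transformation are all hypothetical. The extract also adds an upper bound at the current run timestamp, an assumption beyond the steps above, so that rows stamped after it are left for the next run.

```python
import sqlite3


def run_delta_job(conn, run_ts):
    """One delta run: extract changed rows, transform, load, record the run."""
    (last_run,) = conn.execute(
        "SELECT LAST_RUN_TS FROM RUN_CONTROL WHERE JOB_NAME = 'customer_delta'"
    ).fetchone()
    # 1. Extract only rows changed after the previous run (up to this run).
    rows = conn.execute(
        "SELECT ID, NAME, LAST_UPDATED FROM SRC_CUSTOMER "
        "WHERE LAST_UPDATED > ? AND LAST_UPDATED <= ?",
        (last_run, run_ts),
    ).fetchall()
    # 2. Transform: trim and upper-case the name (placeholder transformation).
    transformed = [(id_, name.strip().upper(), ts) for id_, name, ts in rows]
    # 3. Load: replace by primary key, which covers inserts and updates.
    conn.executemany(
        "INSERT OR REPLACE INTO TGT_CUSTOMER (ID, NAME, LAST_UPDATED) "
        "VALUES (?, ?, ?)",
        transformed,
    )
    # 4. Record this run's timestamp for the next run.
    conn.execute(
        "UPDATE RUN_CONTROL SET LAST_RUN_TS = ? WHERE JOB_NAME = 'customer_delta'",
        (run_ts,),
    )
    conn.commit()
    return len(transformed)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SRC_CUSTOMER (ID INTEGER PRIMARY KEY, NAME TEXT, LAST_UPDATED TEXT);
CREATE TABLE TGT_CUSTOMER (ID INTEGER PRIMARY KEY, NAME TEXT, LAST_UPDATED TEXT);
CREATE TABLE RUN_CONTROL (JOB_NAME TEXT PRIMARY KEY, LAST_RUN_TS TEXT);
INSERT INTO RUN_CONTROL VALUES ('customer_delta', '1900-01-01 00:00:00');
INSERT INTO SRC_CUSTOMER VALUES (1, 'alice', '2025-11-30 08:00:00');
INSERT INTO SRC_CUSTOMER VALUES (2, 'bob', '2025-11-30 09:00:00');
""")

print(run_delta_job(conn, "2025-12-01 00:00:00"))  # 2 (initial load)

# Simulate one changed row and one new row in the source.
conn.execute(
    "UPDATE SRC_CUSTOMER SET NAME = 'robert', "
    "LAST_UPDATED = '2025-12-01 10:00:00' WHERE ID = 2"
)
conn.execute("INSERT INTO SRC_CUSTOMER VALUES (3, 'cara', '2025-12-01 11:00:00')")

print(run_delta_job(conn, "2025-12-02 00:00:00"))  # 2 (one update, one insert)
```

On the second run, row 1 is skipped because its timestamp is older than the value stored in the control table.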


