Learn ETL Datastage faster: Head Stage

Saturday, 20 December 2025

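The Head stage in IBM DataStage is an active stage, found in the Development/Debug section of the palette, that limits the number of rows passed to the next stage.

It is mainly used to keep only the first N records arriving from a dataset or other source. In a parallel job, this means the first N records of each partition.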
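To make the idea concrete outside DataStage, here is a minimal Python sketch of what Head does to a single, sequential stream of rows. It is not DataStage code. The function name `head` and the generated sample rows are hypothetical.

```python
from itertools import islice
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def head(rows: Iterable[T], n: int) -> List[T]:
    """Return the first n rows of a sequential input, in arrival order."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # islice stops after n items, so the rest of the input is never touched here.
    return list(islice(rows, n))


# Hypothetical source: one million generated customer IDs.
source = (f"CUST{i:07d}" for i in range(1_000_000))
print(head(source, 3))  # ['CUST0000000', 'CUST0000001', 'CUST0000002']
```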

What does the Head stage do?

Key characteristics

🔹 Active stage (controls row flow)

🔹 Reduces row count

🔹 Works in parallel jobs

🔹 Simple and fast

Common use cases

1️⃣ Testing jobs with sample data

Read only the first 100 records instead of millions.

2️⃣ Data validation

Check column values and transformations on a limited dataset.

3️⃣ Performance tuning

Run jobs quickly on a limited number of rows during development.

Configuration

In the Head stage properties:

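· Rows to copy → the number of rows to copy from each input partition (e.g., 10, 100, 1000). In the stage editor this is the Number of Rows (per Partition) property, and the default is 10.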

Example:

Rows to copy = 10

Example job flow

Source → Head (Rows to copy = 5) → Transformer → Target

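Only the first 5 records of each partition reach the target: 5 records in total when the Head stage runs sequentially or on a single partition, and more when it runs on several partitions (for example, up to 20 on four partitions).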
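Because the limit is applied per partition, the row count at the target depends on how many partitions the job runs on. The Python sketch below models Source → Head (Rows to copy = 5) → Transformer → Target. The round-robin partitioner, the partition counts and the upper-casing "transformation" are assumptions chosen for illustration, not DataStage internals.

```python
from typing import List


def round_robin(rows: List[str], partitions: int) -> List[List[str]]:
    """Deal rows to partitions in turn, like a round-robin partitioner."""
    parts: List[List[str]] = [[] for _ in range(partitions)]
    for i, row in enumerate(rows):
        parts[i % partitions].append(row)
    return parts


def head_per_partition(parts: List[List[str]], rows_to_copy: int) -> List[List[str]]:
    """Keep only the first rows_to_copy rows of every partition."""
    return [part[:rows_to_copy] for part in parts]


def transformer(parts: List[List[str]]) -> List[List[str]]:
    """Stand-in for a Transformer stage: upper-case each row."""
    return [[row.upper() for row in part] for part in parts]


def run_job(rows: List[str], partitions: int, rows_to_copy: int = 5) -> List[str]:
    """Source -> Head -> Transformer -> Target, collected into one list."""
    parts = round_robin(rows, partitions)
    parts = transformer(head_per_partition(parts, rows_to_copy))
    return [row for part in parts for row in part]  # the "target"


source = [f"row{i}" for i in range(100)]
print(len(run_job(source, partitions=1)))  # 5
print(len(run_job(source, partitions=4)))  # 20
```

With 100 input rows, each of the four partitions receives 25 rows and keeps 5, so 20 rows reach the target.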

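Head vs Tail stage

The Tail stage is the mirror image of Head: instead of the first N rows of each partition, it passes the last N rows.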
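A minimal Python model of how the two stages treat one partition, assuming a partition is simply a list of rows in arrival order. `head_rows` and `tail_rows` are hypothetical helpers, not DataStage APIs. The tail helper uses a bounded deque, so it holds at most n rows while scanning but still has to read to the end.

```python
from collections import deque
from typing import Iterable, List


def head_rows(partition: Iterable[int], n: int) -> List[int]:
    """First n rows of one partition (what Head keeps)."""
    out: List[int] = []
    for row in partition:
        if len(out) == n:
            break  # no need to look at the remaining rows
        out.append(row)
    return out


def tail_rows(partition: Iterable[int], n: int) -> List[int]:
    """Last n rows of one partition (what Tail keeps); reads the whole input."""
    # A deque with maxlen=n discards older rows as newer ones arrive.
    return list(deque(partition, maxlen=n))


partition = list(range(1, 11))  # rows 1..10
print(head_rows(partition, 3))  # [1, 2, 3]
print(tail_rows(partition, 3))  # [8, 9, 10]
```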

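Important points

· The Head stage does not sort data; it takes rows in the order they arrive

· The selected rows therefore depend on the input order

· In parallel jobs, the limit applies per partition, so both the total row count and which rows are selected depend on the partitioning method and the number of partitions

· It is usually used only in development, not in production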
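The sketch below illustrates the first three points with the same ten input rows and Rows to copy = 2. Partitioning is modelled as round robin, which is an assumption; `head_parallel` is a hypothetical helper. Changing the partition count or the arrival order changes how many rows come out and which ones.

```python
from typing import List


def head_parallel(rows: List[int], partitions: int, n: int) -> List[int]:
    """Round-robin the rows, keep the first n per partition, then collect."""
    parts = [rows[p::partitions] for p in range(partitions)]
    # Sorted only to make the printouts easy to compare; Head itself does not sort.
    return sorted(row for part in parts for row in part[:n])


rows = list(range(10))
print(head_parallel(rows, partitions=2, n=2))        # [0, 1, 2, 3]
print(head_parallel(rows, partitions=3, n=2))        # [0, 1, 2, 3, 4, 5]
print(head_parallel(rows[::-1], partitions=2, n=2))  # [6, 7, 8, 9]
```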

Can the Head stage guarantee the same records every run?
Only if the input order is fixed (e.g., after a Sort stage) and, in a parallel job, the data is partitioned the same way on every run.
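A small Python check of this under simplified assumptions: a single partition, rows that arrive in a different order on each "run" (simulated with a seeded shuffle), and a hypothetical `customer_id` sort key. It compares Head on the raw input with Head after sorting on that key.

```python
import random
from typing import List, Tuple

Row = Tuple[int, str]  # (customer_id, name)


def head(rows: List[Row], n: int) -> List[Row]:
    """First n rows in the order they arrive."""
    return rows[:n]


def one_run(rows: List[Row], seed: int, sort_first: bool, n: int = 3) -> List[Row]:
    """Simulate one job run in which the source delivers rows in a new order."""
    arrived = rows[:]
    random.Random(seed).shuffle(arrived)
    if sort_first:
        arrived.sort(key=lambda r: r[0])  # like a Sort stage on customer_id
    return head(arrived, n)


data = [(i, f"name{i}") for i in range(20)]
unsorted_runs = {tuple(one_run(data, s, sort_first=False)) for s in range(5)}
sorted_runs = {tuple(one_run(data, s, sort_first=True)) for s in range(5)}
print(len(sorted_runs))  # 1: the same three rows on every run
print(len(unsorted_runs))  # how many distinct selections the unsorted runs produced
```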
