Saturday, 20 December 2025

Head Stage

The Head stage in IBM DataStage is a Development/Debug stage (an active stage) used in parallel jobs to limit the number of rows passed to the next stage.

It is mainly used to read only the first N records from a dataset or source.


What does the Head stage do?

  • Passes only the first specified number of rows
  • Stops passing rows downstream once that count is reached
  • Helps in testing, sampling, and performance checks
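
To make the "first N rows, then stop" idea concrete, here is a small plain-Python analogy. It is not DataStage code: the source generator, the head function, and the pulled list are hypothetical names made up for this sketch, and it says nothing about how the DataStage engine itself schedules reads.

```python
from itertools import islice


def source(total_rows, pulled):
    """Hypothetical source: yields row dicts and records each row it hands out."""
    for i in range(1, total_rows + 1):
        pulled.append(i)
        yield {"id": i}


def head(rows, n):
    """Pass along only the first n rows, then stop asking for more."""
    return islice(rows, n)


pulled = []
first_rows = list(head(source(1_000_000, pulled), 100))

# Only 100 rows were requested from the source, not one million.
print(len(first_rows), len(pulled))
```

In this sketch the limit is applied lazily, so everything after the head step only ever sees the first 100 rows.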

Key characteristics

🔹 Active stage (controls row flow)

🔹 Reduces row count

🔹 Works in parallel jobs

🔹 Simple and fast


Common use cases

1. Testing jobs with sample data

Read only the first 100 records instead of millions.

2. Data validation

Check column values and transformations on a limited dataset.

3. Performance tuning

Run jobs quickly using limited rows during development.


Configuration

In the Head stage properties:

  • Rows to copy (shown as Number of Rows (Per Partition) in the parallel Head stage) → specify the number of rows (e.g., 10, 100, 1000)

Example:

Rows to copy = 10


Example job flow

Source

  |

Head (Rows to copy = 5)

  |

Transformer

  |

Target

Only 5 records reach the target (5 per partition if the job runs on more than one partition).
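
The same flow can be imitated in plain Python to see what each step receives. This is only an illustrative sketch on a single stream of rows: the source rows, the column names, and the upper-casing done in transformer are all invented for the example.

```python
from itertools import islice


def source():
    """Hypothetical source with 20 rows."""
    for i in range(1, 21):
        yield {"id": i, "name": f"cust_{i}"}


def head(rows, n):
    """Keep only the first n rows."""
    return islice(rows, n)


def transformer(rows):
    """Example transformation: upper-case the name column."""
    for row in rows:
        yield {**row, "name": row["name"].upper()}


# Source -> Head (5 rows) -> Transformer -> Target
target = list(transformer(head(source(), 5)))

for row in target:
    print(row)
```

Because the limit sits before the transformer, the transformation logic runs on 5 rows only, which is what makes this layout handy while developing.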


Head vs Tail stage

Head                  Tail

Takes first N rows    Takes last N rows

Used for sampling     Used for recent/latest data

Stops early           Needs full read
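
The "stops early" versus "needs full read" difference is easy to see in a small Python sketch. The counting helper and the head and tail functions below are hypothetical stand-ins written for this comparison, not DataStage APIs.

```python
from collections import deque
from itertools import islice


def counting(rows, counter):
    """Wrap an input and count how many rows are actually read from it."""
    for row in rows:
        counter["read"] += 1
        yield row


def head(rows, n):
    """First n rows: reading can stop as soon as n rows have arrived."""
    return list(islice(rows, n))


def tail(rows, n):
    """Last n rows: the whole input must be read to know which rows are last."""
    return list(deque(rows, maxlen=n))


head_reads = {"read": 0}
tail_reads = {"read": 0}

first3 = head(counting(range(1, 11), head_reads), 3)
last3 = tail(counting(range(1, 11), tail_reads), 3)

print(first3, head_reads["read"])  # [1, 2, 3] 3
print(last3, tail_reads["read"])   # [8, 9, 10] 10
```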


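Important points

  • Head stage does not sort data

  • Output depends on input order

  • In parallel jobs, the row count applies to each partition, so the total output depends on the number of partitions and on how rows are partitioned (for example, 10 rows on 4 partitions can give up to 40 rows)

  • Often used only in development, not production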
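
The effect of partitioning can be sketched in plain Python. This assumes a simple round-robin split and a row limit applied separately to each partition; round_robin and head_per_partition are hypothetical helpers for the illustration, not DataStage functions.

```python
from itertools import islice


def round_robin(rows, partitions):
    """Distribute rows across partitions in turn (assumed partitioning method)."""
    buckets = [[] for _ in range(partitions)]
    for i, row in enumerate(rows):
        buckets[i % partitions].append(row)
    return buckets


def head_per_partition(buckets, n):
    """Apply the first-n limit to each partition independently."""
    return [list(islice(bucket, n)) for bucket in buckets]


rows = list(range(1, 101))

out = head_per_partition(round_robin(rows, 4), 10)
total = sum(len(partition) for partition in out)

print(total)       # 40 rows in total, 10 from each of the 4 partitions
print(out[0][:3])  # [1, 5, 9]: not simply the first rows of the input
```

With a single partition the same call returns exactly 10 rows, which is why results can differ between a one-node and a multi-node configuration.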


Can the Head stage guarantee the same records every run?
    Only if the input order is fixed (e.g., after a Sort stage) and the partitioning does not change between runs.
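
A short Python sketch of why sorting first makes the result repeatable. The two shuffled lists stand in for two runs where the source happens to deliver rows in a different order; the id column and the sort key are assumptions made for the example.

```python
import random
from itertools import islice


def head(rows, n):
    """Keep only the first n rows."""
    return list(islice(rows, n))


rows = [{"id": i} for i in range(1, 51)]

# Two hypothetical runs that receive the same rows in different orders.
run_a = rows[:]
random.Random(1).shuffle(run_a)
run_b = rows[:]
random.Random(2).shuffle(run_b)

# Sorting on a key before taking the head fixes the order, so both runs agree.
sorted_a = head(sorted(run_a, key=lambda r: r["id"]), 5)
sorted_b = head(sorted(run_b, key=lambda r: r["id"]), 5)

print(sorted_a == sorted_b)
print([r["id"] for r in sorted_a])
```

Without the sort, each run would simply return whichever 5 rows arrived first.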


 

 
