The Head stage in IBM DataStage is a Processing (active) stage used to limit the number of rows passed to the next stage.
It is mainly used to read only the first N records from a dataset or source.
What does Head stage do?
- Passes only the first specified number of rows
- Passes no further rows downstream once that count is reached
- Helps in testing, sampling, and performance checks
Key characteristics
🔹 Active stage (controls row flow)
🔹 Reduces row count
🔹 Works in parallel jobs
🔹 Simple and fast
Common use cases
1. Testing jobs with sample data
Read only the first 100 records instead of millions.
2. Data validation
Check column values and transformations on a limited dataset.
3. Performance tuning
Run jobs quickly using limited rows during development.
Configuration
In the Head stage properties:
· Rows to copy → specify number (e.g., 10, 100, 1000)
Example:
Rows to copy = 10
Example job flow
Source
|
Head (Rows to copy = 5)
|
Transformer
|
Target
Only 5 records reach the target (5 per partition if the job runs on more than one partition).
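The flow above can be imitated in plain Python to see how a row limit in the middle of a pipeline behaves. This is only an analogy, not DataStage code: the `source`, `head`, and `transformer` functions and the `counter` dictionary are hypothetical names made up for this sketch, and it models a single partition.

```python
from itertools import islice

def source(counter):
    """Hypothetical source: yields rows and counts how many were read."""
    for i in range(1, 1_000_001):
        counter["read"] += 1
        yield {"id": i}

def head(rows, n):
    """Pass through only the first n rows (single-partition analogy)."""
    return islice(rows, n)

def transformer(rows):
    """Hypothetical transformation: add a derived column to each row."""
    for row in rows:
        yield {**row, "id_squared": row["id"] ** 2}

counter = {"read": 0}
# Source -> Head (5 rows) -> Transformer -> Target
target = list(transformer(head(source(counter), 5)))

print(len(target))        # number of rows that reached the target
print(counter["read"])    # number of rows pulled from the source
```

In this Python sketch the generator chain is lazy, so the source is asked for only as many rows as the limit allows. How much of the input a real DataStage job reads depends on the engine and the source stage.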
Head vs Tail stage
| Head | Tail |
| Takes first N rows | Takes last N rows |
| Used for sampling | Used for recent/latest data |
| Stops early | Needs full read |
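The difference between "stops early" and "needs full read" can be illustrated with a small Python analogy. The helper names `head_rows` and `tail_rows` are hypothetical and exist only for this sketch; they are not DataStage functions.

```python
from collections import deque
from itertools import islice

def head_rows(rows, n):
    """First n rows: can stop as soon as n rows have been seen."""
    return list(islice(rows, n))

def tail_rows(rows, n):
    """Last n rows: must consume every row, keeping only the latest n."""
    return list(deque(rows, maxlen=n))

rows = range(1, 11)  # sample input: rows 1 to 10

first_three = head_rows(iter(rows), 3)
last_three = tail_rows(iter(rows), 3)

print(first_three)
print(last_three)
```

The tail helper has no way of knowing which rows are last until the input is exhausted, which is why a Tail-style operation cannot finish early.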
Important points
· Head stage does not sort data
· Output depends on input order
· In parallel jobs, the row limit applies to each partition, so the total row count and the rows selected depend on partitioning
· Often used only in development, not production
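The effect of partitioning can be sketched in Python. This is a simplified model, assuming round-robin partitioning across 4 partitions and a limit applied to each partition separately; `round_robin_partition` and `head_per_partition` are hypothetical helper names, not DataStage APIs.

```python
from itertools import islice

def round_robin_partition(rows, num_partitions):
    """Distribute rows across partitions in turn (assumed partitioning method)."""
    partitions = [[] for _ in range(num_partitions)]
    for index, row in enumerate(rows):
        partitions[index % num_partitions].append(row)
    return partitions

def head_per_partition(partitions, n):
    """Apply the row limit to each partition independently."""
    return [list(islice(partition, n)) for partition in partitions]

rows = list(range(1, 21))                  # 20 input rows
partitions = round_robin_partition(rows, 4)
result = head_per_partition(partitions, 2)  # limit of 2 rows per partition
total_rows = sum(len(partition) for partition in result)

print(result)
print(total_rows)
```

With a limit of 2 and 4 partitions, this model emits 8 rows rather than 2, and they are not the first 8 rows of the original input order in any single partition. A different partitioning method would select different rows.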
Can Head stage guarantee the same records every run?
Only if the input data order is fixed (e.g., after a Sort stage).
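A short Python analogy shows why a fixed order matters. The two lists stand in for the same data arriving in a different order on two runs; the `head` helper and the `id` sort key are assumptions made for this sketch.

```python
from itertools import islice

def head(rows, n):
    """Return the first n rows in whatever order they arrive."""
    return list(islice(rows, n))

# Same three rows, arriving in a different order on each run
run_1 = [{"id": 3}, {"id": 1}, {"id": 2}]
run_2 = [{"id": 2}, {"id": 3}, {"id": 1}]

# Without sorting, the selected rows differ between runs
unsorted_1 = head(run_1, 2)
unsorted_2 = head(run_2, 2)

# Sorting on a key first makes the selection repeatable
sorted_1 = head(sorted(run_1, key=lambda row: row["id"]), 2)
sorted_2 = head(sorted(run_2, key=lambda row: row["id"]), 2)

print(unsorted_1 == unsorted_2)
print(sorted_1 == sorted_2)
```

The sort key needs to be unique for the result to be fully repeatable; rows with equal keys can still change places between runs.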