Learn ETL Datastage faster: 2025-12-14

Saturday, 20 December 2025

Head Stage

The Head stage in IBM DataStage is a Processing (active) stage used to limit the number of rows passed to the next stage.

It is mainly used to read only the first N records from a dataset or source.

What does Head stage do?

Passes only the first specified number of rows
Passes no further rows downstream once that count is reached
Helps in testing, sampling, and performance checks
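
As a rough analogy only (plain Python, not DataStage code), the row-limiting behaviour can be sketched with `itertools.islice`. The `source` generator and the `head` helper are hypothetical names invented for this illustration.

```python
from itertools import islice


def head(rows, n):
    """Return an iterator over at most the first n rows of `rows`."""
    return islice(rows, n)


def source():
    # Hypothetical source that would yield a million records if fully read.
    for i in range(1, 1_000_001):
        yield {"id": i}


# Only the first 100 records are passed on; the rest are never requested.
first_rows = list(head(source(), 100))
print(len(first_rows))         # 100
print(first_rows[0]["id"])     # 1
print(first_rows[-1]["id"])    # 100
```

The sketch models a single stream of rows; it says nothing about how a real job schedules its upstream stages.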

Key characteristics

🔹 Active stage (controls row flow)

🔹 Reduces row count

🔹 Works in parallel jobs

🔹 Simple and fast

Common use cases

1. Testing jobs with sample data

Read only the first 100 records instead of millions.

2. Data validation

Check column values and transformations on a limited dataset.

3. Performance tuning

Run jobs quickly using limited rows during development.

Configuration

In the Head stage properties:

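· Number of Rows (Per Partition) → specify number (e.g., 10, 100, 1000)

Example:

Number of Rows (Per Partition) = 10

Example job flow

Source → Head (Number of Rows (Per Partition) = 5) → Transformer → Target

At most 5 records per partition reach the target, so a job running on a single partition delivers exactly 5 (provided the source has at least 5 rows).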

Head vs Tail stage

Head	Tail
Takes first N rows	Takes last N rows
Used for sampling	Used for recent/latest data
Stops early	Needs full read
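
The "stops early" versus "needs full read" difference can be illustrated with a small Python analogy (not DataStage code). The helper names `head` and `tail` are made up for this sketch.

```python
from collections import deque
from itertools import islice


def head(rows, n):
    # Takes the first n rows and then stops asking for more.
    return list(islice(rows, n))


def tail(rows, n):
    # A deque with maxlen keeps only the latest n rows, but it has to
    # consume every row before it knows which ones are last.
    return list(deque(rows, maxlen=n))


rows = list(range(1, 11))          # ten input rows: 1..10

head_rows = head(iter(rows), 3)
tail_rows = tail(iter(rows), 3)

print(head_rows)   # [1, 2, 3]
print(tail_rows)   # [8, 9, 10]
```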

Important points

· Head stage does not sort data

· Output depends on input order

· In parallel jobs, the row limit applies to each partition, so the output depends on how the data is partitioned

· Often used only in development, not production
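
The effect of partitioning can be sketched in plain Python. This is an illustration under stated assumptions: the data is split round-robin across 4 partitions and the limit is applied per partition. `round_robin` and `head_per_partition` are hypothetical helpers, not DataStage functions.

```python
from itertools import islice


def round_robin(rows, num_partitions):
    """Distribute rows across partitions in round-robin order."""
    partitions = [[] for _ in range(num_partitions)]
    for index, row in enumerate(rows):
        partitions[index % num_partitions].append(row)
    return partitions


def head_per_partition(partitions, n):
    """Keep at most the first n rows of every partition."""
    return [list(islice(partition, n)) for partition in partitions]


rows = list(range(1, 21))              # 20 input rows: 1..20
parts = round_robin(rows, 4)           # 4 partitions of 5 rows each
heads = head_per_partition(parts, 2)   # limit of 2 rows per partition

total = sum(len(h) for h in heads)
print(heads)   # [[1, 5], [2, 6], [3, 7], [4, 8]]
print(total)   # 8
```

With a limit of 2 and 4 partitions, 8 rows come out, and they are not simply the first 8 rows of the original input order.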

Can Head stage guarantee the same records every run?
Only if the input data order is fixed (e.g., after a Sort stage).
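
A small Python analogy of this point: two runs that receive the same records in a different arrival order give the same first N rows once the input is sorted. The `id` key and the `head` helper are assumptions made for the example.

```python
import random
from itertools import islice


def head(rows, n):
    return list(islice(rows, n))


records = [{"id": i} for i in range(1, 51)]

# Two "runs" in which the same records arrive in a different order.
run_a = random.sample(records, len(records))
run_b = random.sample(records, len(records))

# Sorting first fixes the order, so both runs keep the same five records.
sorted_head_a = head(sorted(run_a, key=lambda r: r["id"]), 5)
sorted_head_b = head(sorted(run_b, key=lambda r: r["id"]), 5)

print(sorted_head_a == sorted_head_b)        # True
print([r["id"] for r in sorted_head_a])      # [1, 2, 3, 4, 5]
```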

Debugging Stages

What is Row Generator in IBM DataStage?

Row Generator is a DataStage stage used to generate dummy or test data.
It does not read data from any source; instead, it creates rows internally based on values you define.

Key Purpose of Row Generator

✔ Create test data
✔ Generate sequence numbers
✔ Produce constant or derived values
✔ Used in unit testing, debugging, and job validation

How Row Generator Works

You define:

Number of rows to generate
Column metadata
Derivations for each column

The stage then produces that many rows and sends them to the next stage.

Important Properties

1. Rows per partition

Set through the Number of Records property; defines how many rows each partition generates
Total rows = Rows per partition × number of partitions

Example:

Rows per partition = 100
Partitions = 4
➡ Total rows = 400
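
The arithmetic can be checked with a short Python sketch. It is an analogy only: `generate_partition` and `run_row_generator` are hypothetical names, and each partition is simulated as a simple loop.

```python
def generate_partition(partition_index, rows_per_partition):
    """Simulate one partition producing its own share of rows."""
    return [
        {"partition": partition_index, "row_in_partition": r}
        for r in range(rows_per_partition)
    ]


def run_row_generator(rows_per_partition, num_partitions):
    """Collect the rows produced by every partition."""
    rows = []
    for p in range(num_partitions):
        rows.extend(generate_partition(p, rows_per_partition))
    return rows


rows = run_row_generator(rows_per_partition=100, num_partitions=4)
print(len(rows))   # 400
```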

2. Column Derivations

You can use:

Constants
Functions
System variables

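Example (derivations as written in a Transformer placed after the Row Generator, where @INROWNUM is available):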

ID = @INROWNUM

NAME = "TEST"

LOAD_DT = CurrentDate()
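
To make the three kinds of values concrete, here is a plain-Python analogy of the example: a running row number, a constant, and the current date. `generate_rows` is a hypothetical helper, and `row_num` stands in for @INROWNUM.

```python
from datetime import date


def generate_rows(count):
    """Yield `count` rows with a row number, a constant, and today's date."""
    for row_num in range(1, count + 1):    # plays the role of @INROWNUM
        yield {
            "ID": row_num,                 # sequence value
            "NAME": "TEST",                # constant
            "LOAD_DT": date.today(),       # function result
        }


rows = list(generate_rows(3))
for row in rows:
    print(row["ID"], row["NAME"], row["LOAD_DT"])
```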

Common Use Cases

🔹 1. Generate Sequence Numbers

EMP_ID = @INROWNUM

🔹 2. Create Dummy Test Data

Useful when:

Source system not available
Testing job flow

CUST_ID = @INROWNUM

CUST_NM = "Customer_" : @INROWNUM

🔹 3. Debugging / Unit Testing

Test transformer logic
Test lookup logic
Validate target table mappings

🔹 4. Control Table Initialization

Used to:

Load initial control / parameter tables
Generate static reference data

🆚 Row Generator vs Sequential File

Feature	Row Generator	Sequential File
Reads source data	❌ No	✅ Yes
Generates data	✅ Yes	❌ No
Used for testing	✅	⚠️ Limited
Requires file	❌	✅

Important Points

✔ No input link
✔ Total rows depend on partitions
✔ Commonly used with @INROWNUM
✔ Mostly for testing, debugging, POC
✔ Not used in production loads normally

🎯 Sample Job Flow

Row Generator → Transformer → Sequential File / DB Target


Thursday, 18 December 2025

Transformer stage scenario-based questions

Question: Design a DataStage job to produce the target output below.

Source:

Eno	Ename
1	a,b
2	c,d
3	e,f

Target:

Eno	Ename
1	a
2	b
3	c

Design:

Sequential File → Transformer → Row Generator → Target

Transformer – Split the Ename Column

Create two new columns:

ename1 = Field(Ename, ",", 1)

ename2 = Field(Ename, ",", 2)

Row Generator Stage

Purpose: Convert columns → rows

Row Generator Properties

· Number of rows to generate: 2

· Row number column: row_num

Transformer – Output Logic

· Derive Ename:

If row_num = 1 Then ename1 Else ename2

· Derive Eno using a sequential counter:

NextValue()

✔️ This produces:

a
b
c

Final Output Produced

Eno	Ename
1	a
2	b
3	c
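
The row-splitting logic itself can be modelled in a few lines of Python. This is a sketch of the intended result, not of the DataStage job: `split_names` is a hypothetical helper, and taking only the first three rows is an assumption made to match the target shown in the question.

```python
def split_names(source_rows, delimiter=","):
    """Emit one output row per name and renumber Eno with a running counter."""
    eno = 0
    for _, ename in source_rows:
        for name in ename.split(delimiter):
            eno += 1
            yield (eno, name)


source_rows = [(1, "a,b"), (2, "c,d"), (3, "e,f")]

all_rows = list(split_names(source_rows))
target_rows = all_rows[:3]   # assumption: keep only the rows shown in the target

print(all_rows)      # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'), (6, 'f')]
print(target_rows)   # [(1, 'a'), (2, 'b'), (3, 'c')]
```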
