Saturday, 20 December 2025

Head Stage

The Head stage in IBM DataStage is an active stage, listed under Development/Debug in the parallel job palette, that limits the number of rows passed to the next stage.

It is mainly used to read only the first N records from a dataset or source.


What does Head stage do?

  • Passes only the first specified number of rows
  • Stops downstream processing after reaching that count
  • Helps in testing, sampling, and performance checks

Key characteristics

🔹 Active stage (controls row flow)

🔹 Reduces row count

🔹 Works in parallel jobs

🔹 Simple and fast


Common use cases

1. Testing jobs with sample data

Read only the first 100 records instead of millions.

2. Data validation

Check column values and transformations on a limited dataset.

3. Performance tuning

Run jobs quickly using limited rows during development.


Configuration

In the Head stage properties:

  • Rows to copy → specify a number (e.g., 10, 100, 1000)

Example:

Rows to copy = 10
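
The effect of this setting can be sketched outside DataStage. The Python snippet below is only an illustration, not DataStage code: `head` is a hypothetical helper that mimics the row limit on a single partition by taking the first N items from a row stream and reading nothing further.

```python
from itertools import islice

def head(rows, rows_to_copy):
    """Return the first rows_to_copy rows and stop reading after that."""
    return list(islice(rows, rows_to_copy))

# A lazily generated source: rows that are never requested are never produced.
source = ({"id": i} for i in range(1, 1_000_001))

sample = head(source, 10)
print(len(sample), sample[0], sample[-1])
```

Because the source is lazy, only the first 10 of the million rows are ever built, which is the idea behind using a row limit for quick test runs.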


Example job flow

Source
  |
Head (Rows to copy = 5)
  |
Transformer
  |
Target

Only 5 records reach the target (5 per partition if the job runs on more than one partition).


Head vs Tail stage

Head                  Tail
Takes first N rows    Takes last N rows
Used for sampling     Used for recent/latest data
Stops early           Needs full read
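
The last row of the comparison can be illustrated with a small Python sketch. It is not DataStage code and says nothing about how the engine is implemented; `CountingSource` is a hypothetical class that simply counts how many rows a "first N" reader and a "last N" reader pull from the same kind of source.

```python
from collections import deque
from itertools import islice

class CountingSource:
    """Yields 1..total and records how many rows were actually read."""
    def __init__(self, total):
        self.total = total
        self.rows_read = 0

    def __iter__(self):
        for i in range(1, self.total + 1):
            self.rows_read += 1
            yield i

# "Head": take the first 3 rows, then stop pulling from the source.
head_src = CountingSource(1000)
first_three = list(islice(iter(head_src), 3))

# "Tail": a bounded deque keeps only the last 3 rows, but must see every row.
tail_src = CountingSource(1000)
last_three = list(deque(iter(tail_src), maxlen=3))

print(first_three, head_src.rows_read)
print(last_three, tail_src.rows_read)
```

In this sketch the head reader touches 3 rows while the tail reader touches all 1000, since the last rows are only known once the input ends.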


Important points

  • Head stage does not sort data
  • Output depends on input order
  • In parallel jobs, the row limit applies to each partition, so the rows returned depend on how the data is partitioned
  • Often used only in development, not production


Can the Head stage guarantee the same records on every run?
    Only if the input order and the partitioning are the same on every run (e.g., after a Sort stage).
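
To see why partitioning matters, here is a minimal Python sketch (not DataStage code). It assumes round-robin partitioning and a row limit applied separately to each partition; `round_robin` and `head_per_partition` are hypothetical helper names used only for this illustration.

```python
def round_robin(rows, partitions):
    """Distribute rows across partitions in turn: row 0 to p0, row 1 to p1, ..."""
    parts = [[] for _ in range(partitions)]
    for i, row in enumerate(rows):
        parts[i % partitions].append(row)
    return parts

def head_per_partition(parts, n):
    """Take the first n rows of every partition and collect them."""
    return [row for part in parts for row in part[:n]]

rows = list(range(1, 21))  # 20 input rows, already in a fixed order

two_way = head_per_partition(round_robin(rows, 2), 3)
four_way = head_per_partition(round_robin(rows, 4), 3)

print(two_way)
print(four_way)
```

The same input and the same limit of 3 give 6 rows on two partitions and 12 rows on four, and neither result is simply "rows 1 to 3". A fixed input order alone is therefore not enough; the partitioning has to be fixed too.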


 

 

Debugging Stages

What is Row Generator in IBM DataStage?

Row Generator is a DataStage stage used to generate dummy or test data.
It does not read data from any source; instead, it creates rows internally based on values you define.

Key Purpose of Row Generator

  • Create test data
  • Generate sequence numbers
  • Produce constant or derived values
  • Support unit testing, debugging, and job validation

How Row Generator Works

You define:

  1. Number of rows to generate
  2. Column metadata
  3. Derivations for each column

The stage then produces that many rows and sends them to the next stage.


 Important Properties

1. Rows per partition

  • Defines how many rows each partition generates
  • Total rows = Rows per partition × number of partitions

Example:

  • Rows per partition = 100
  • Partitions = 4
    Total rows = 400
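
The arithmetic above can be checked with a short Python sketch. This is an illustration only, not DataStage code: `row_generator` is a hypothetical function in which every partition independently produces the same number of rows.

```python
def generate_partition(partition_id, rows_per_partition):
    """Rows produced by one partition, numbered from 1 within that partition."""
    return [
        {"partition": partition_id, "row": r}
        for r in range(1, rows_per_partition + 1)
    ]

def row_generator(rows_per_partition, partitions):
    """Collect the output of all partitions into one list."""
    out = []
    for p in range(partitions):
        out.extend(generate_partition(p, rows_per_partition))
    return out

rows = row_generator(rows_per_partition=100, partitions=4)
print(len(rows))  # 100 x 4 = 400
```

Note that each partition numbers its own rows from 1, so a per-partition row number is not unique across the whole output.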

2. Column Derivations

You can use:

  • Constants
  • Functions
  • System variables

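Example, written in Transformer derivation syntax (in the Row Generator itself, values are set through each column's generator options in the column metadata, such as Cycle or Random):

ID        = @INROWNUM
NAME      = "TEST"
LOAD_DT   = CurrentDate()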


 Common Use Cases

🔹 1. Generate Sequence Numbers

EMP_ID = @INROWNUM


🔹 2. Create Dummy Test Data

Useful when:

  • Source system not available
  • Testing job flow

CUST_ID = @INROWNUM

CUST_NM = "Customer_" : StringFromInt(@INROWNUM)
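
For readers who want to see the resulting rows, the same dummy data can be sketched in Python. This is not DataStage code: `generate_customers` is a hypothetical function, the loop counter stands in for @INROWNUM (which starts at 1), and the LOAD_DT column is borrowed from the earlier derivation example.

```python
from datetime import date

def generate_customers(count):
    """Build count dummy customer rows with a running ID and a derived name."""
    today = date.today()
    return [
        {
            "CUST_ID": n,                     # plays the role of @INROWNUM
            "CUST_NM": "Customer_" + str(n),  # constant prefix plus the row number
            "LOAD_DT": today,                 # same load date on every row
        }
        for n in range(1, count + 1)
    ]

customers = generate_customers(3)
for row in customers:
    print(row["CUST_ID"], row["CUST_NM"], row["LOAD_DT"])
```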


🔹 3. Debugging / Unit Testing

  • Test transformer logic
  • Test lookup logic
  • Validate target table mappings

🔹 4. Control Table Initialization

Used to:

  • Load initial control / parameter tables
  • Generate static reference data

🆚 Row Generator vs Sequential File

Feature              Row Generator    Sequential File
Reads source data    No               Yes
Generates data       Yes              No
Used for testing     ⚠️ Limited       Requires file


Important Points

  • No input link
  • Total rows depend on the number of partitions
  • Commonly paired with @INROWNUM in a downstream Transformer
  • Mostly for testing, debugging, and POCs
  • Not normally used in production loads


🎯 Sample Job Flow

Row Generator → Transformer → Sequential File / DB Target


 


 

 


Thursday, 18 December 2025

Transformer stage scenario-based questions

 

Question: Design a DataStage job that produces the target output shown below.

Source:          Target:
Eno  Ename       Eno  Ename
1    a,b         1    a
2    c,d         2    b
3    e,f         3    c

 

Design:

Sequential File -> Transformer -> Row Generator -> Target

 

Transformer – Split the Ename Column

Create two new columns:

ename1 = Field(Ename, ",", 1)

ename2 = Field(Ename, ",", 2)
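
What these two derivations do can be mimicked in Python. The sketch below is an illustration, not DataStage code: `field` is a hypothetical helper that returns the Nth delimited piece using 1-based counting, and returning an empty string for a missing piece is an assumption of this sketch.

```python
def field(value, delimiter, occurrence):
    """Return the occurrence-th piece of value (1-based), or "" if it is missing."""
    parts = value.split(delimiter)
    if 1 <= occurrence <= len(parts):
        return parts[occurrence - 1]
    return ""

ename = "a,b"
ename1 = field(ename, ",", 1)
ename2 = field(ename, ",", 2)
print(ename1, ename2)
```

For the source row with Ename = "a,b", this gives ename1 = "a" and ename2 = "b".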

Row Generator Stage

Purpose: Convert columns → rows

Row Generator Properties

  • Number of rows to generate: 2
  • Row number column: row_num

Transformer – Output Logic

  • Derive Ename: If row_num = 1 Then ename1 Else ename2
  • Derive Eno using a sequential counter: NextValue()

✔️ This produces:

a
b
c

Final Output Produced

Eno  Ename
1    a
2    b
3    c
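
The columns-to-rows logic can also be sketched in Python to check the expected values. This is not a DataStage job, only an illustration under simple assumptions: `split_to_rows` is a hypothetical function that emits one output row per comma-separated name and numbers the output rows with a running counter.

```python
def split_to_rows(source_rows):
    """Turn (Eno, "x,y") rows into one (new_eno, name) row per name."""
    out = []
    eno = 0  # running counter used as the new Eno
    for _, ename in source_rows:
        for name in ename.split(","):
            eno += 1
            out.append((eno, name))
    return out

source = [(1, "a,b"), (2, "c,d"), (3, "e,f")]
result = split_to_rows(source)

for eno, name in result[:3]:
    print(eno, name)
```

For the three source rows in the question, the full split yields six rows; the first three are the ones listed in the target (1 a, 2 b, 3 c).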

 

 

 

 
