Friday, 9 January 2026

What is OSH?

OSH (Orchestrate Shell) is the underlying execution engine and scripting language used by IBM DataStage's parallel engine. When you run a Parallel Job, DataStage internally converts the job design into an OSH script and executes it. The generated script controls how stages run, how data is partitioned, and how processing is distributed across nodes.


Why OSH Is Important

  • Controls job execution
  • Manages parallelism
  • Handles partitioning
  • Manages data flow between stages
  • Executes on multiple nodes

Simple Flow

DataStage Job Design → Generated OSH Script → APT Engine executes OSH


Where OSH Exists

  • OSH scripts are created temporarily during job execution
  • Location (example):
    • $APT_TMPDIR
  • Usually auto-deleted after job completion (unless debug enabled)
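A minimal simulated sketch of how you might look for leftover generated scripts after a debug run. The temp directory, the `osh_*` filename pattern, and the stand-in file are all illustrative assumptions; the real location depends on your installation's $APT_TMPDIR and engine version.

```shell
#!/bin/sh
# Simulation only: create a throwaway dir standing in for $APT_TMPDIR.
tmpdir=$(mktemp -d)
touch "$tmpdir/osh_12345.sh"   # stand-in for a generated OSH script (hypothetical name)

# Count files matching the assumed generated-script pattern.
found=$(find "$tmpdir" -name 'osh_*' -type f | grep -c 'osh_')
echo "leftover OSH scripts: $found"

rm -rf "$tmpdir"               # clean up the simulation
```

On a real server you would point `find` at the actual $APT_TMPDIR instead of a throwaway directory.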

What OSH Contains

  • Stage operators
  • Link definitions
  • Partitioning logic
  • Sorting logic
  • File paths
  • Node allocations

Example (Conceptual)

ds_operator input | ds_transform | ds_aggregator | ds_operator output


OSH vs Unix Shell

| Aspect | OSH | Unix Shell |
| --- | --- | --- |
| Purpose | DataStage job execution | OS command execution |
| Used by | DataStage engine | Users / scripts |
| Parallelism | Built-in | Manual |
| User writes it? | No (auto-generated) | Yes |


When You See OSH (Real Projects)

  • Job failure analysis
  • Performance tuning
  • Debugging parallel jobs
  • DS_SUPPORT / DSENGINE logs

 

Tuesday, 6 January 2026

How do you identify whether a DataStage job is running in parallel or sequentially?

 

1. Check the Job Type in DataStage Designer

This is the first and simplest check.

  • Parallel Job → runs in parallel
  • Server Job / Sequence Job → runs sequentially

📌 If it’s a Server Job, it cannot run in parallel.


2. Check Stage Type Used

Some stages are always sequential.

Sequential-only stages:

  • Server Sequential File
  • Server Transformer
  • Server Lookup
  • Server Join

📌 If your job mainly uses Server stages, the job is sequential.


3. Look at the Job Log (Very Important)

Open Director → Job Log.

Parallel job log shows:

Operator: pxfunnel

Operator: pxpartition

Operator: pxsort

Number of nodes = 4

Sequential execution indicators:

  • No mention of px operators
  • No mention of nodes
  • Single process messages only

📌 If you don’t see px* operators → job is behaving sequentially.
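The log check above can be scripted. This is a minimal sketch that scans a saved log extract for px* operators; the log text in the here-doc is a made-up sample in the shape the post describes, not real Director output.

```shell
#!/bin/sh
# Write a fabricated sample of a parallel job's log messages.
log=$(mktemp)
cat > "$log" <<'EOF'
Operator: pxpartition
Operator: pxsort
Number of nodes = 4
EOF

# Count px* operator lines; 0 would suggest sequential execution.
px_count=$(grep -c 'Operator: px' "$log")
echo "px operators found: $px_count"

rm -f "$log"
```

Against a real job, you would export the log from Director and grep that file instead.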


4. Check Environment Variable: $APT_CONFIG_FILE

This controls parallelism.

  • If not set or invalid → job runs on 1 node

  • If it points to a valid multi-node config file → parallel execution

📌 Verify in:

Job Properties → Parameters → Environment
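A quick way to see how many logical nodes a config file defines is to count its node entries. This is a sketch against a minimal made-up two-node config; real APT configuration files carry more resource settings, and the hostnames here are assumptions.

```shell
#!/bin/sh
# Fabricated minimal two-node APT configuration file.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
{
  node "node1" { fastname "etlhost" pools "" }
  node "node2" { fastname "etlhost" pools "" }
}
EOF

# Count node definitions; 1 node means effectively sequential execution.
nodes=$(grep -c 'node "' "$cfg")
echo "logical nodes: $nodes"

rm -f "$cfg"
```

In practice you would run the grep against the file that $APT_CONFIG_FILE points to.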


5. Check Number of Partitions on Links

In Designer:

  • Right-click link → Properties

  • Check Partitioning

Sequential behavior if:

  • Partition count = 1

  • Partitioning = Entire / Same

Parallel behavior:

  • Hash / Range / Round-Robin with multiple partitions


6. CPU & Process Monitoring (OS Level)

On the DataStage server:

  • Parallel job → multiple osh / dsapi_slave processes

  • Sequential job → single process

Commands:

ps -ef | grep dsapi

top
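The process check can be reduced to a count. This sketch counts dsapi_slave processes in a fabricated `ps -ef` capture; the process names follow the post, but the PIDs and user are made up.

```shell
#!/bin/sh
# Fabricated ps -ef capture: one osh conductor plus two slave processes.
ps_out='dsadm 101   1 osh
dsadm 102 101 dsapi_slave
dsadm 103 101 dsapi_slave'

# Count slave processes; more than one indicates parallel execution.
slaves=$(printf '%s\n' "$ps_out" | grep -c 'dsapi_slave')
echo "dsapi_slave processes: $slaves"
```

On a live server, use `ps -ef | grep '[d]sapi_slave'` (the bracket trick stops grep from matching its own process line).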


7. Dataset vs Sequential File

  • Dataset (.ds) → supports parallelism
  • Sequential File (.txt, .dat) → often forces serialization (unless multiple readers/writers)

📌 Heavy use of Sequential Files can make a parallel job behave sequentially.


8. Peek / Debug Mode

If you enable Peek and see only one data stream, the job is not parallel.


I identify whether a DataStage job runs in parallel or sequentially by checking the job type, the stage types, the partitioning on links, $APT_CONFIG_FILE, and especially the job log for px operators and the node count.


 
