Different DB connector stages

 

ODBC Connector, Oracle Connector and DB2 Connector stages

·        ODBC Connector: Generic connector; can connect to any DB supporting ODBC drivers.

·        Oracle Connector: Optimized specifically for Oracle; supports array insert, bulk load, and Oracle-specific features (like direct path load).

·        DB2 Connector: Optimized for IBM DB2; supports DB2 CLI features and better pushdown optimization.
Tip: Use native connectors (Oracle/DB2) instead of ODBC for better performance.


Configuring connection parameters dynamically

·        Use Job Parameters or Parameter Sets for DB name, user, password, and table name.

·        In the connector properties, use syntax like:

·        Server = #DB_SERVER#

·        User = #DB_USER#

·        Password = #DB_PASSWORD#

·        This allows reuse across environments (Dev, QA, Prod).

Properties for reading data

·        Table name / SQL Query

·        Column list

·        Read Mode (Sequential / Parallel)

·        Fetch Buffer Size / Array Size

·        Before / After SQL

·        Isolation Level
These properties control what data is extracted and how it is fetched.


Difference between Read, Write, and User-defined SQL

·        Read → Automatically generates a SELECT * FROM table.

·        Write → Automatically generates INSERT INTO table VALUES(...).

·        User-defined SQL → Lets you write custom SQL (e.g., joins, filters, stored procedure calls).
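As an illustration, here is roughly what each mode produces. The table and column names (EMPLOYEES, DEPARTMENTS) are hypothetical; the actual generated SQL depends on the table name and column list defined in the stage:

```sql
-- Read mode: the connector generates the SELECT for you
SELECT * FROM EMPLOYEES;

-- Write mode: the connector generates an INSERT with one
-- parameter marker per column in the stage's column list
INSERT INTO EMPLOYEES (ENO, ENAME, DEPT_ID) VALUES (?, ?, ?);

-- User-defined SQL: logic the generated SQL cannot express,
-- such as a join with a filter
SELECT e.ENO, e.ENAME, d.DNAME
FROM EMPLOYEES e
JOIN DEPARTMENTS d ON e.DEPT_ID = d.DEPT_ID
WHERE e.STATUS = 'ACTIVE';
```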


Parameter sets for DB connection

·        Create a Parameter Set with DB parameters (server, user, pwd, schema).

·        Reference it in each DB Connector.
Makes migration easy and avoids hardcoding connection values.

Array Size / Batch Size

·        Defines how many rows are fetched or inserted per database round trip.

·        Higher value = fewer DB calls = better performance.
A value that is too large can cause memory pressure; common values are 1,000–5,000.


Commit frequency

·        Defines how often a transaction is committed (e.g., every 10,000 rows).

·        Helps control rollback size and improve performance.
   
If set too high, a failure means a long rollback; if set too low, frequent commits slow the job down.


Auto-Commit = Yes

·        Each record is committed individually.
Useful for small updates or CDC jobs.
Avoid for bulk loads (very slow).

Capturing rejected records / SQL errors

·        Enable Reject link or Reject file in the stage.

·        Properties: “Reject link”, “Write to log”, “Abort on error”.

·        You can also output SQLSTATE, SQLCODE, DBMSCODE for debugging.


Transaction Size

·        Similar to commit frequency — defines number of rows per commit.

·        Example: Set Transaction Size = 10,000 → commits every 10k rows.


Performance tuning for large inserts

·        Use Array Size and Transaction Size appropriately.

·        Disable Auto-Commit.

·        Use bulk load mode if supported (e.g., Oracle direct path load).

·        Avoid index updates during load — rebuild indexes afterward.

·        Tune commit interval based on rollback segment.
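To avoid index maintenance during the load, a common pattern is to disable the index before loading and rebuild it afterward, for example via the connector's Before/After SQL. A sketch assuming Oracle syntax and a hypothetical index name IDX_SALES_DATE:

```sql
-- Before SQL: mark the index unusable so the load
-- does not maintain it row by row
ALTER INDEX IDX_SALES_DATE UNUSABLE;

-- After SQL: rebuild the index once the bulk load completes
ALTER INDEX IDX_SALES_DATE REBUILD;
```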

Handling deadlocks / ORA-00060

·        Identify conflicting transactions (using DB logs).

·        Add retry logic in the job, or re-run the failed batch.

·        Use smaller commit intervals.

·        For update operations, maintain a consistent order of access to rows.
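Deadlocks typically occur when two sessions lock the same rows in opposite orders. One way to enforce a consistent order is to acquire the row locks sorted by the primary key before updating. A sketch with a hypothetical ACCOUNTS table (Oracle-style SQL):

```sql
-- Lock both rows in primary-key order, so a concurrent job
-- doing the same cannot lock them in the opposite sequence
SELECT BALANCE
FROM ACCOUNTS
WHERE ACCOUNT_ID IN (1001, 2002)
ORDER BY ACCOUNT_ID
FOR UPDATE;

UPDATE ACCOUNTS SET BALANCE = BALANCE - 100 WHERE ACCOUNT_ID = 1001;
UPDATE ACCOUNTS SET BALANCE = BALANCE + 100 WHERE ACCOUNT_ID = 2002;
COMMIT;
```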


Implementing UPSERT

Options:

·        Use User-defined SQL with a MERGE statement.

·        Or, in Transformer, split links into INSERT and UPDATE paths using lookup to check existing records.
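A user-defined MERGE for the first option might look like the following. The table and column names (EMP_TARGET, EMP_STAGE) are placeholders; in the job, the source rows would typically come from a staging table or the input link:

```sql
-- UPSERT: update matching rows, insert the rest
MERGE INTO EMP_TARGET t
USING EMP_STAGE s
ON (t.ENO = s.ENO)
WHEN MATCHED THEN
  UPDATE SET t.ENAME = s.ENAME, t.SALARY = s.SALARY
WHEN NOT MATCHED THEN
  INSERT (ENO, ENAME, SALARY)
  VALUES (s.ENO, s.ENAME, s.SALARY);
```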


Enabling SQL tracing / logging

·        Enable Trace Level or Verbose Logging in connector advanced properties.

·        Oracle: set TraceSQL=Yes.

·        Or use environment variables like DS_LOG_LEVEL=DEBUG to capture SQL in job log.


Before SQL / After SQL

·        Before SQL: Executes before main operation (e.g., truncate table).

·        After SQL: Executes after main operation (e.g., update status table).

Useful for pre-/post-load logic within the same job.
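For example, a load job might truncate its staging table up front and stamp an audit table on completion. The table names (SALES_STG, LOAD_STATUS) and job name are hypothetical:

```sql
-- Before SQL: empty the staging table before the load
TRUNCATE TABLE SALES_STG;

-- After SQL: record a successful load in an audit table
UPDATE LOAD_STATUS
SET STATUS = 'COMPLETE', LOAD_TS = CURRENT_TIMESTAMP
WHERE JOB_NAME = 'LOAD_SALES_STG';
```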

In DB Connector, I enable the Reject link to capture any records that fail during insert or update operations.
I also include columns like SQLCODE, SQLSTATE, and DBMSCODE in the reject link for debugging.
For example, if a record violates a constraint, it gets routed to the reject link with the corresponding SQL error code.
This helps in reprocessing or analysis without failing the entire job.

 What is a Database Round Trip?

A database round trip means one complete request–response cycle between the DataStage job and the database server.

 It includes:

1.     Sending a query or batch of records from DataStage → Database

2.     Database processing the request

3.     Database sending results or acknowledgment → back to DataStage

Each time this happens, it’s a round trip.

⚙️ Example:

If you insert 1,000,000 rows and your Array Size = 1,
➡️ DataStage sends 1 row per request → 1,000,000 round trips

If you set Array Size = 1000,
➡️ DataStage sends 1,000 rows per request → only 1,000 round trips

Fewer round trips = less network overhead = faster performance.


📈 Key Performance Tip:

·        Array Size or Batch Size controls how many rows go in a single round trip.

·        Increasing it reduces communication between DataStage and DB → huge speed gain.

·        But too large can consume more memory.


🧮 Example Scenario

You have a DataStage job inserting 1,000,000 rows into an Oracle table using the DB Connector stage.


Case 1: Array Size = 1

·        DataStage sends 1 row per request

·        So → 1,000,000 database round trips

·        Suppose each trip takes 5 milliseconds (network + DB time)

🕒 Total time = 1,000,000 × 5 ms = 5,000,000 ms = ~83 minutes


Case 2: Array Size = 1,000

·        DataStage sends 1,000 rows per request

·        So → 1,000 round trips

·        Same 5 ms per trip

🕒 Total time = 1,000 × 5 ms = 5,000 ms = 5 seconds

Real-World Practice

·        In production, we usually set Array Size = 1000–5000

·        It’s a trade-off:

o   Too small → too many round trips (slow)

o   Too large → memory usage increases
