Different DB connector stages

 

ODBC Connector, Oracle Connector and DB2 Connector stages

·        ODBC Connector: Generic connector; can connect to any DB supporting ODBC drivers.

·        Oracle Connector: Optimized specifically for Oracle; supports array insert, bulk load, and Oracle-specific features (like direct path load).

·        DB2 Connector: Optimized for IBM DB2; supports DB2 CLI features and better pushdown optimization.
Tip: Use native connectors (Oracle/DB2) instead of ODBC for better performance.


Configuring connection parameters dynamically

·        Use Job Parameters or Parameter Sets for DB name, user, password, and table name.

·        In the connector properties, use syntax like:

·        Server = #DB_SERVER#

·        User = #DB_USER#

·        Password = #DB_PASSWORD#

·        This allows reuse across environments (Dev, QA, Prod).

Properties for reading data

·        Table name / SQL Query

·        Column list

·        Read Mode (Sequential / Parallel)

·        Fetch Buffer Size / Array Size

·        Before / After SQL

·        Isolation Level
These properties control what data is extracted and how it is fetched.


Difference between Read, Write, and User-defined SQL

·        Read → Automatically generates a SELECT * FROM table.

·        Write → Automatically generates INSERT INTO table VALUES(...).

·        User-defined SQL → Lets you write custom SQL (e.g., joins, filters, stored procedure calls).
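As an illustration, here is roughly what each mode produces. The table and column names (EMPLOYEES, DEPARTMENTS) are hypothetical; the actual generated SQL depends on the table name and column list defined in the stage:

```sql
-- Read mode: the connector generates the SELECT for you
SELECT * FROM EMPLOYEES;

-- Write mode: the connector generates an INSERT with one
-- parameter marker per column in the stage's column list
INSERT INTO EMPLOYEES (ENO, ENAME, DEPT_ID) VALUES (?, ?, ?);

-- User-defined SQL: logic the generated SQL cannot express,
-- such as a join with a filter
SELECT e.ENO, e.ENAME, d.DNAME
FROM EMPLOYEES e
JOIN DEPARTMENTS d ON e.DEPT_ID = d.DEPT_ID
WHERE e.STATUS = 'ACTIVE';
```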


Parameter sets for DB connection

·        Create a Parameter Set with DB parameters (server, user, pwd, schema).

·        Reference it in each DB Connector.
Makes migration easy and avoids hardcoding connection values.

Array Size / Batch Size

·        Defines how many rows are fetched or inserted per database round trip.

·        Higher value = fewer DB calls = better performance.
A value that is too large can cause memory pressure; common values are 1,000–5,000.


Commit frequency

·        Defines how often a transaction is committed (e.g., every 10,000 rows).

·        Helps control rollback size and improve performance.
   
If set too high, a failure means a long rollback; if set too low, frequent commits slow the job down.


Auto-Commit = Yes

·        Each record is committed individually.
Useful for small updates or CDC jobs.
Avoid for bulk loads (very slow).

Capturing rejected records / SQL errors

·        Enable Reject link or Reject file in the stage.

·        Properties: “Reject link”, “Write to log”, “Abort on error”.

·        You can also output SQLSTATE, SQLCODE, DBMSCODE for debugging.


Transaction Size

·        Similar to commit frequency — defines number of rows per commit.

·        Example: Set Transaction Size = 10,000 → commits every 10k rows.


Performance tuning for large inserts

·        Use Array Size and Transaction Size appropriately.

·        Disable Auto-Commit.

·        Use bulk load mode if supported (e.g., Oracle direct path load).

·        Avoid index updates during load — rebuild indexes afterward.

·        Tune commit interval based on rollback segment.
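To avoid index maintenance during the load, a common pattern is to disable the index before loading and rebuild it afterward, for example via the connector's Before/After SQL. A sketch assuming Oracle syntax and a hypothetical index name IDX_SALES_DATE:

```sql
-- Before SQL: mark the index unusable so the load
-- does not maintain it row by row
ALTER INDEX IDX_SALES_DATE UNUSABLE;

-- After SQL: rebuild the index once the bulk load completes
ALTER INDEX IDX_SALES_DATE REBUILD;
```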

Handling deadlocks / ORA-00060

·        Identify conflicting transactions (using DB logs).

·        Add retry logic in the job, or re-run the failed batch.

·        Use smaller commit intervals.

·        For update operations, maintain a consistent order of access to rows.
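Deadlocks typically occur when two sessions lock the same rows in opposite orders. One way to enforce a consistent order is to acquire the row locks sorted by the primary key before updating. A sketch with a hypothetical ACCOUNTS table (Oracle-style SQL):

```sql
-- Lock both rows in primary-key order, so a concurrent job
-- doing the same cannot lock them in the opposite sequence
SELECT BALANCE
FROM ACCOUNTS
WHERE ACCOUNT_ID IN (1001, 2002)
ORDER BY ACCOUNT_ID
FOR UPDATE;

UPDATE ACCOUNTS SET BALANCE = BALANCE - 100 WHERE ACCOUNT_ID = 1001;
UPDATE ACCOUNTS SET BALANCE = BALANCE + 100 WHERE ACCOUNT_ID = 2002;
COMMIT;
```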


Implementing UPSERT

Options:

·        Use User-defined SQL with a MERGE statement.

·        Or, in Transformer, split links into INSERT and UPDATE paths using lookup to check existing records.
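A user-defined MERGE for the first option might look like the following. The table and column names (EMP_TARGET, EMP_STAGE) are placeholders; in the job, the source rows would typically come from a staging table or the input link:

```sql
-- UPSERT: update matching rows, insert the rest
MERGE INTO EMP_TARGET t
USING EMP_STAGE s
ON (t.ENO = s.ENO)
WHEN MATCHED THEN
  UPDATE SET t.ENAME = s.ENAME, t.SALARY = s.SALARY
WHEN NOT MATCHED THEN
  INSERT (ENO, ENAME, SALARY)
  VALUES (s.ENO, s.ENAME, s.SALARY);
```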


Enabling SQL tracing / logging

·        Enable Trace Level or Verbose Logging in connector advanced properties.

·        Oracle: set TraceSQL=Yes.

·        Or use environment variables like DS_LOG_LEVEL=DEBUG to capture SQL in job log.


Before SQL / After SQL

·        Before SQL: Executes before main operation (e.g., truncate table).

·        After SQL: Executes after main operation (e.g., update status table).

Useful for pre-/post-load logic within the same job.
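For example, a load job might truncate its staging table up front and stamp an audit table on completion. The table names (SALES_STG, LOAD_STATUS) and job name are hypothetical:

```sql
-- Before SQL: empty the staging table before the load
TRUNCATE TABLE SALES_STG;

-- After SQL: record a successful load in an audit table
UPDATE LOAD_STATUS
SET STATUS = 'COMPLETE', LOAD_TS = CURRENT_TIMESTAMP
WHERE JOB_NAME = 'LOAD_SALES_STG';
```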

In DB Connector, I enable the Reject link to capture any records that fail during insert or update operations.
I also include columns like SQLCODE, SQLSTATE, and DBMSCODE in the reject link for debugging.
For example, if a record violates a constraint, it gets routed to the reject link with the corresponding SQL error code.
This helps in reprocessing or analysis without failing the entire job.

 What is a Database Round Trip?

A database round trip means one complete request–response cycle between the DataStage job and the database server.

 It includes:

1.     Sending a query or batch of records from DataStage → Database

2.     Database processing the request

3.     Database sending results or acknowledgment → back to DataStage

Each time this happens, it’s a round trip.

⚙️ Example:

If you insert 1,000,000 rows and your Array Size = 1,
➡️ DataStage sends 1 row per request → 1,000,000 round trips

If you set Array Size = 1000,
➡️ DataStage sends 1,000 rows per request → only 1,000 round trips

Fewer round trips = less network overhead = faster performance.


📈 Key Performance Tip:

·        Array Size or Batch Size controls how many rows go in a single round trip.

·        Increasing it reduces communication between DataStage and DB → huge speed gain.

·        But too large can consume more memory.


🧮 Example Scenario

You have a DataStage job inserting 1,000,000 rows into an Oracle table using the DB Connector stage.


Case 1: Array Size = 1

·        DataStage sends 1 row per request

·        So → 1,000,000 database round trips

·        Suppose each trip takes 5 milliseconds (network + DB time)

🕒 Total time = 1,000,000 × 5 ms = 5,000,000 ms = ~83 minutes


Case 2: Array Size = 1,000

·        DataStage sends 1,000 rows per request

·        So → 1,000 round trips

·        Same 5 ms per trip

🕒 Total time = 1,000 × 5 ms = 5,000 ms = 5 seconds

Real-World Practice

·        In production, we usually set Array Size = 1000–5000

·        It’s a trade-off:

o   Too small → too many round trips (slow)

o   Too large → memory usage increases
