ODBC Connector, Oracle Connector and DB2 Connector stages
· ODBC Connector: Generic connector; can connect to any DB supporting ODBC drivers.
· Oracle Connector: Optimized specifically for Oracle; supports array insert, bulk load, and Oracle-specific features (like direct path load).
· DB2 Connector: Optimized for IBM DB2; supports DB2 CLI features and better pushdown optimization.
Tip: Use native connectors (Oracle/DB2) instead of ODBC for better performance.
1. Configuring connection parameters dynamically
· Use Job Parameters or Parameter Sets for DB name, user, password, and table name.
· In the connector properties, use syntax like:
· Server = #DB_SERVER#
· User = #DB_USER#
· Password = #DB_PASSWORD#
· This allows reuse across environments (Dev, QA, Prod).
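DataStage resolves the #...# references at run time from the job parameters. As a rough illustration of the same idea outside DataStage, here is a minimal Python sketch using pyodbc; the driver name and the environment variable names are assumptions standing in for #DB_SERVER#, #DB_USER#, and #DB_PASSWORD#:

```python
import os
import pyodbc

# Hypothetical environment variables standing in for DataStage job
# parameters such as #DB_SERVER#, #DB_USER#, #DB_PASSWORD#.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"   # assumed driver name
    f"SERVER={os.environ['DB_SERVER']};"
    f"UID={os.environ['DB_USER']};"
    f"PWD={os.environ['DB_PASSWORD']};"
)
```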
Properties for reading data
· Table name / SQL Query
· Column list
· Read Mode (Sequential / Parallel)
· Fetch Buffer Size / Array Size
· Before / After SQL
· Isolation Level
These properties control what data is extracted and how.
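The Fetch Buffer Size / Array Size property corresponds to the DB-API notion of cursor.arraysize. A minimal pyodbc sketch of a buffered read, with the DSN and table name as hypothetical placeholders:

```python
import pyodbc

conn = pyodbc.connect("DSN=MYDSN")  # hypothetical DSN
cur = conn.cursor()
cur.arraysize = 1000  # rows per fetch round trip, like the connector's Array Size

cur.execute("SELECT cust_id, name FROM customers")  # hypothetical table
while True:
    rows = cur.fetchmany()  # fetches cur.arraysize rows at a time
    if not rows:
        break
    for row in rows:
        pass  # downstream stage logic would consume the row here
```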
2. Difference between Read, Write, and User-defined SQL
· Read → Automatically generates a SELECT * FROM table.
· Write → Automatically generates INSERT INTO table VALUES(...).
· User-defined SQL → Lets you write custom SQL (e.g., joins, filters, stored procedure calls).
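To make the distinction concrete, here are the three flavors side by side as they might look; the table and column names are made up for illustration:

```python
# Read mode: the connector generates a plain SELECT for you.
read_sql = "SELECT * FROM customers"

# Write mode: the connector generates a parameterized INSERT for you.
write_sql = "INSERT INTO customers (cust_id, name, city) VALUES (?, ?, ?)"

# User-defined SQL: whatever you write yourself, e.g. a join with a filter.
user_sql = """
SELECT c.cust_id, c.name, SUM(o.amount) AS total_amount
FROM customers c
JOIN orders o ON o.cust_id = c.cust_id
WHERE o.order_date >= DATE '2024-01-01'
GROUP BY c.cust_id, c.name
"""
```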
3. Parameter sets for DB connection
· Create a Parameter Set with DB parameters (server, user, pwd, schema).
· Reference it in each DB Connector.
✅ Makes migration easy and avoids hardcoding connection values.
Array Size / Batch Size
· Defines how many rows are fetched or inserted per database round trip.
· Higher value = fewer DB calls = better performance.
· Too large can cause memory issues; a common range is 1000–5000 (see the sketch below).
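DataStage applies the array size inside the connector; as a rough illustration of the same batching mechanism, here is a pyodbc sketch (the DSN, table, and sizes are assumptions):

```python
import pyodbc

conn = pyodbc.connect("DSN=MYDSN", autocommit=False)  # hypothetical DSN
cur = conn.cursor()
cur.fast_executemany = True  # pyodbc: ship each batch in one round trip

ARRAY_SIZE = 1000  # plays the role of the connector's Array Size property
rows = [(i, f"name{i}") for i in range(10_000)]  # sample data

for start in range(0, len(rows), ARRAY_SIZE):
    # One executemany call ≈ one database round trip for the whole batch
    cur.executemany(
        "INSERT INTO customers (cust_id, name) VALUES (?, ?)",
        rows[start:start + ARRAY_SIZE],
    )
conn.commit()
```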
4. Commit frequency
· Defines how often a transaction is committed (e.g., every 10,000 rows).
· Helps control rollback size and improve performance.
· If set too high → long rollback; too low → frequent commits = slow.
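The connector manages this transparently; the logic is roughly equivalent to the following sketch, where the table name and commit interval are illustrative:

```python
import pyodbc

conn = pyodbc.connect("DSN=MYDSN")  # hypothetical DSN
conn.autocommit = False             # we control the transaction boundaries
cur = conn.cursor()

COMMIT_EVERY = 10_000  # the commit frequency / transaction size
rows = [(i, f"name{i}") for i in range(25_000)]  # sample data

for i, row in enumerate(rows, start=1):
    cur.execute("INSERT INTO customers (cust_id, name) VALUES (?, ?)", row)
    if i % COMMIT_EVERY == 0:
        conn.commit()  # bounded rollback size, far fewer commits than row-by-row
conn.commit()  # flush the final partial transaction
```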
5. Auto-Commit = Yes
· Each record is committed individually.
✅ Useful for small updates or CDC jobs.
❌ Avoid for bulk loads (very slow).
Capturing rejected records / SQL errors
· Enable Reject link or Reject file in the stage.
· Properties: “Reject link”, “Write to log”, “Abort on error”.
· You can also output SQLSTATE, SQLCODE, DBMSCODE for debugging.
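Outside the connector, the same reject-handling idea can be sketched as follows; pyodbc surfaces the driver's SQLSTATE as the first exception argument, and the file and table names are hypothetical:

```python
import csv
import pyodbc

conn = pyodbc.connect("DSN=MYDSN")  # hypothetical DSN
cur = conn.cursor()

rows = [(1, "Alice"), (1, "Duplicate key"), (2, "Bob")]  # sample input

with open("rejects.csv", "w", newline="") as f:
    reject = csv.writer(f)
    reject.writerow(["cust_id", "name", "sqlstate", "error"])  # like reject-link columns
    for row in rows:
        try:
            cur.execute("INSERT INTO customers (cust_id, name) VALUES (?, ?)", row)
        except pyodbc.Error as e:
            # Route the failing record to the reject file instead of aborting the job
            reject.writerow([*row, e.args[0], str(e)])
conn.commit()
```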
6. Transaction Size
· Similar to commit frequency; defines the number of rows per commit.
· Example: Set Transaction Size = 10,000 → commits every 10k rows.
1. Performance tuning for large inserts
· Use Array Size and Transaction Size appropriately.
· Disable Auto-Commit.
· Use bulk load mode if supported (Oracle Direct Load).
· Avoid index updates during load — rebuild indexes afterward.
· Tune commit interval based on rollback segment.
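Put together, a tuned bulk insert outside DataStage might look like the sketch below. The Oracle index DDL and all object names are assumptions, and marking an index UNUSABLE only works this way for non-unique indexes (Oracle skips them during DML when SKIP_UNUSABLE_INDEXES is enabled):

```python
import pyodbc

conn = pyodbc.connect("DSN=ORADSN", autocommit=False)  # hypothetical Oracle DSN
cur = conn.cursor()

# Skip per-row index maintenance during the load (non-unique index assumed)
cur.execute("ALTER INDEX customers_name_ix UNUSABLE")

cur.fast_executemany = True
ARRAY_SIZE = 5000
rows = [(i, f"name{i}") for i in range(100_000)]  # sample data

for batch_no, start in enumerate(range(0, len(rows), ARRAY_SIZE), start=1):
    cur.executemany(
        "INSERT INTO customers (cust_id, name) VALUES (?, ?)",
        rows[start:start + ARRAY_SIZE],
    )
    if batch_no % 10 == 0:
        conn.commit()  # commit interval sized to the rollback segment
conn.commit()

# Rebuild the index once, after the load, instead of row by row
cur.execute("ALTER INDEX customers_name_ix REBUILD")
```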
2. Handling Deadlock / ORA-00060
· Identify conflicting transactions (using DB logs).
· Retry logic in job or re-run the batch.
· Use smaller commit intervals.
· For update operations, maintain a consistent order of access to rows.
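A retry wrapper of the kind described above might look like this sketch; the error-matching string, table, and key names are assumptions:

```python
import time
import pyodbc

MAX_RETRIES = 3

def apply_updates(conn, updates):
    """Apply a batch of (name, cust_id) updates, retrying on deadlock."""
    # Touch rows in a consistent key order so concurrent jobs cannot
    # acquire the same locks in opposite orders.
    updates = sorted(updates, key=lambda u: u[1])
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            cur = conn.cursor()
            cur.executemany(
                "UPDATE customers SET name = ? WHERE cust_id = ?", updates
            )
            conn.commit()
            return
        except pyodbc.Error as e:
            conn.rollback()
            if "ORA-00060" in str(e) and attempt < MAX_RETRIES:
                time.sleep(2 ** attempt)  # back off before retrying the batch
            else:
                raise
```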
3. Implementing UPSERT
Options:
· Use User-defined SQL with a MERGE statement.
· Or, in Transformer, split links into INSERT and UPDATE paths using lookup to check existing records.
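A minimal MERGE-based upsert of the kind you would place in the connector's user-defined SQL; Oracle syntax, with all table and column names hypothetical:

```python
import pyodbc

conn = pyodbc.connect("DSN=ORADSN")  # hypothetical Oracle DSN
cur = conn.cursor()

incoming = [(1, "Alice", "Pune"), (2, "Bob", "Delhi")]  # sample data

merge_sql = """
MERGE INTO customers tgt
USING (SELECT ? AS cust_id, ? AS name, ? AS city FROM dual) src
ON (tgt.cust_id = src.cust_id)
WHEN MATCHED THEN
    UPDATE SET tgt.name = src.name, tgt.city = src.city
WHEN NOT MATCHED THEN
    INSERT (cust_id, name, city) VALUES (src.cust_id, src.name, src.city)
"""
cur.executemany(merge_sql, incoming)  # one statement covers insert and update
conn.commit()
```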
4. Enable SQL tracing / logging
· Enable Trace Level or Verbose Logging in connector advanced properties.
· Oracle: set TraceSQL=Yes.
· Or use environment variables like DS_LOG_LEVEL=DEBUG to capture SQL in job log.
5. Before SQL / After SQL
· Before SQL: Executes before main operation (e.g., truncate table).
· After SQL: Executes after main operation (e.g., update status table).
Useful for pre/post-load logic within the same job.
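The same before/main/after sequence, sketched outside DataStage; the staging and status table names are made up:

```python
import pyodbc

conn = pyodbc.connect("DSN=MYDSN")  # hypothetical DSN
cur = conn.cursor()

# Before SQL: prepare the target ahead of the main load
cur.execute("TRUNCATE TABLE stg_customers")

# Main operation: the batched insert the connector would perform
rows = [(1, "Alice"), (2, "Bob")]
cur.executemany("INSERT INTO stg_customers (cust_id, name) VALUES (?, ?)", rows)

# After SQL: record the outcome in a status table
cur.execute(
    "UPDATE load_status SET loaded_at = CURRENT_TIMESTAMP WHERE job_name = ?",
    ("stg_customers_load",),
)
conn.commit()
```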
In the DB Connector, I enable the Reject link to capture any records that fail during insert or update operations. I also include columns like SQLCODE, SQLSTATE, and DBMSCODE in the reject link for debugging. For example, if a record violates a constraint, it gets routed to the reject link with the corresponding SQL error code. This helps in reprocessing or analysis without failing the entire job.
What is a Database Round Trip?
A database round trip means one complete request–response cycle between the DataStage job and the database server.
It includes:
1. Sending a query or batch of records from DataStage → Database
2. Database processing the request
3. Database sending results or acknowledgment → back to DataStage
Each time this happens, it’s a round trip.
⚙️ Example:
If you insert 1,000,000 rows and your Array Size = 1,
➡️ DataStage sends 1 row per request → 1,000,000 round trips.
If you set Array Size = 1000,
➡️ DataStage sends 1,000 rows per request → only 1,000 round trips.
✅ Fewer round trips = less network overhead = faster performance.
📈 Key Performance Tip:
· Array Size or Batch Size controls how many rows go in a single round trip.
· Increasing it reduces communication between DataStage and DB → huge speed gain.
· But too large can consume more memory.
🧮 Example Scenario
You have a DataStage job inserting 1,000,000 rows into an Oracle table using the DB Connector stage.
Case 1: Array Size = 1
· DataStage sends 1 row per request
· So → 1,000,000 database round trips
· Suppose each trip takes 5 milliseconds (network + DB time)
🕒 Total time = 1,000,000 × 5 ms = 5,000,000 ms = ~83 minutes
Case 2: Array Size = 1,000
· DataStage sends 1,000 rows per request
· So → 1,000 round trips
· Same 5 ms per trip
🕒 Total time = 1,000 × 5 ms = 5,000 ms = 5 seconds
Real-World Practice
· In production, we usually set Array Size = 1000–5000
· It’s a trade-off:
o Too small → too many round trips (slow)
o Too large → memory usage increases