What is the Filter Stage in DataStage?
The Filter stage is a processing stage that filters rows based on one or more conditions.
It passes only the records that satisfy a condition; the rest are dropped unless a reject link is configured to capture them.
1. Basic Filter Stage Diagram
    Source Data
         |
         v
  +------------+
  |   FILTER   |
  |  salary >  |
  |   50000    |
  +------------+
         |
         v
  Filtered Output
Explanation
- Reads input rows
- Applies condition (salary > 50000)
- Outputs only matching rows
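
The three steps above can be sketched outside DataStage in a few lines of Python. This is only an illustration of the row-level behaviour, not DataStage code; the sample rows and the `filter_rows` helper are hypothetical.

```python
# Illustrative only: mimics the Filter stage's row-level behaviour in plain Python.
# The sample rows and the helper name are assumptions, not part of DataStage.

def filter_rows(rows, predicate):
    """Return only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]

rows = [
    {"name": "Asha", "salary": 62000},
    {"name": "Ravi", "salary": 48000},
    {"name": "Meera", "salary": 50000},
]

# Same condition as in the diagram: salary > 50000
high_paid = filter_rows(rows, lambda row: row["salary"] > 50000)
print(high_paid)  # [{'name': 'Asha', 'salary': 62000}]
```

Note that a salary of exactly 50000 does not pass, because the condition is strictly greater than.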
2. Filter Stage with Reject Link
       Source
         |
         v
   +-----------+
   |  FILTER   |
   |  status=  |
   | 'ACTIVE'  |
   +-----------+
      |     |
      v     v
   Output  Reject
  (ACTIVE) (INACTIVE)
Explanation
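- Output link → rows meeting the condition
- Reject link → rows failing the condition (the stage's Output Rejects option must be set to True)
- Useful for data validation, because failed rows can be inspected instead of silently dropped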
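
As a rough Python analogy, a reject link turns the filter into a two-way split. The `split_rows` helper and the sample data below are hypothetical and only illustrate the idea.

```python
# Illustrative only: a two-way split that keeps rejected rows instead of dropping them.
# split_rows and the sample data are assumptions made for this sketch.

def split_rows(rows, predicate):
    """Return (matched, rejected) lists based on predicate(row)."""
    matched, rejected = [], []
    for row in rows:
        if predicate(row):
            matched.append(row)
        else:
            rejected.append(row)
    return matched, rejected

rows = [
    {"id": 1, "status": "ACTIVE"},
    {"id": 2, "status": "INACTIVE"},
    {"id": 3, "status": "ACTIVE"},
]

output, reject = split_rows(rows, lambda row: row["status"] == "ACTIVE")
print([r["id"] for r in output])  # [1, 3]
print([r["id"] for r in reject])  # [2]
```

Every input row ends up on exactly one of the two links, which is what makes the pattern useful for validation.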
3. Filter Stage in Parallel Processing
               Input Data
                    |
       ---------------------------
       |            |            |
  Partition 1  Partition 2  Partition 3
       |            |            |
   +-------+    +-------+    +-------+
   |FILTER |    |FILTER |    |FILTER |
   +-------+    +-------+    +-------+
       |            |            |
       ---------------------------
                    |
             Filtered Output
Explanation
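- Filter runs independently on each partition
- Scales with the number of partitions, because each row is evaluated on its own
- No repartitioning is needed, so rows do not move between nodes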
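
The partition-wise behaviour can be imitated in Python with a thread pool. This is a minimal sketch under stated assumptions: the round-robin split, the three workers, and the helper names are all hypothetical, and the DataStage parallel engine works differently internally.

```python
# Illustrative only: apply the same filter to each partition independently.
# round_robin, filter_partition, and the sample data are assumptions for this sketch.
from concurrent.futures import ThreadPoolExecutor

def round_robin(rows, n):
    """Split rows into n partitions, dealing them out in turn."""
    return [rows[i::n] for i in range(n)]

def filter_partition(partition):
    """Filter one partition; no other partition is consulted."""
    return [row for row in partition if row["salary"] > 50000]

# Salaries: 40000, 45000, 50000, 55000, 60000, 65000
rows = [{"id": i, "salary": 40000 + 5000 * i} for i in range(6)]
partitions = round_robin(rows, 3)

with ThreadPoolExecutor(max_workers=3) as pool:
    filtered_partitions = list(pool.map(filter_partition, partitions))

# Collect the per-partition results into one output.
filtered = [row for part in filtered_partitions for row in part]
print(sorted(row["id"] for row in filtered))  # [3, 4, 5]
```

Because the condition looks at one row at a time, the result is the same however the rows are partitioned.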
4. Filter vs Transformer
   Simple Condition
     (status='A')
          |
          v
     FILTER STAGE

    Complex Logic
(if-else, derivations)
          |
          v
  TRANSFORMER STAGE
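
The difference can be pictured in Python as selecting rows versus deriving new columns. The `derive_band` function and the `band` column are hypothetical, chosen only to show if-else logic of the kind a Transformer would handle.

```python
# Illustrative only: filter-style selection versus transformer-style derivation.
# derive_band and the "band" column are assumptions made for this sketch.

rows = [
    {"id": 1, "status": "A", "salary": 60000},
    {"id": 2, "status": "I", "salary": 30000},
]

# Filter-style: a simple condition keeps or drops whole rows.
active = [row for row in rows if row["status"] == "A"]

# Transformer-style: if-else logic derives a new column for every row.
def derive_band(row):
    out = dict(row)  # copy so the input row is left unchanged
    out["band"] = "HIGH" if row["salary"] > 50000 else "LOW"
    return out

banded = [derive_band(row) for row in rows]
print([row["id"] for row in active])    # [1]
print([row["band"] for row in banded])  # ['HIGH', 'LOW']
```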
5. Real-World Example
Use case: Load only today’s transactions
     Source Table
          |
          v
 +------------------+
 |      FILTER      |
 |    txn_date =    |
 |   CURRENT_DATE   |
 +------------------+
          |
          v
     Target Table
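
A minimal Python sketch of the same use case follows. It assumes the run date is passed in as a parameter rather than computed inside the filter; `todays_transactions` and the sample rows are hypothetical.

```python
# Illustrative only: keep the transactions whose date equals the run date.
# todays_transactions and the sample rows are assumptions made for this sketch.
from datetime import date, timedelta

def todays_transactions(rows, run_date):
    """Return the rows whose txn_date equals run_date."""
    return [row for row in rows if row["txn_date"] == run_date]

run_date = date.today()
rows = [
    {"txn_id": 1, "txn_date": run_date},
    {"txn_id": 2, "txn_date": run_date - timedelta(days=1)},
    {"txn_id": 3, "txn_date": run_date},
]

to_load = todays_transactions(rows, run_date)
print([row["txn_id"] for row in to_load])  # [1, 3]
```

Passing the date in explicitly keeps the filter repeatable, for example when a failed load has to be rerun for an earlier day.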
6. Filter Stage vs WHERE Clause (DB)
Database
   |
   |-- WHERE condition --> (BEST)
   |
   |-- No WHERE
          |
          v
     FILTER STAGE
Explanation
- Filtering with a WHERE clause in the source SQL is usually faster, because fewer rows leave the database
- Use the Filter stage when:
  - The source is a file
  - The logic is dynamic
  - A reusable job design is needed
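
The trade-off can be seen with a small SQLite example in Python. The `emp` table and its rows are hypothetical; the point is only that both approaches give the same result, while the WHERE version fetches fewer rows from the database.

```python
# Illustrative only: filter in the database versus filter after reading.
# The emp table and its contents are assumptions made for this sketch.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("Asha", 62000), ("Ravi", 48000), ("Meera", 75000)],
)

# Option 1: the database applies the condition, so only matching rows are fetched.
pushed_down = conn.execute(
    "SELECT name, salary FROM emp WHERE salary > ? ORDER BY name", (50000,)
).fetchall()

# Option 2: fetch everything, then filter downstream (what a Filter stage would do).
all_rows = conn.execute("SELECT name, salary FROM emp ORDER BY name").fetchall()
filtered_later = [row for row in all_rows if row[1] > 50000]

conn.close()
print(len(pushed_down), len(all_rows))  # 2 3
print(pushed_down == filtered_later)    # True
```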
7. Common Filter Conditions
salary > 50000
country = 'INDIA'
status <> 'INACTIVE'
order_date >= '2025-01-01'
email IS NULL
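
For readers more at home in Python, the conditions above map roughly onto the expressions below. The sample row is hypothetical, and the date comparison relies on the dates being ISO-formatted strings (YYYY-MM-DD), which sort in calendar order.

```python
# Illustrative only: Python equivalents of the conditions listed above.
# The sample row is an assumption made for this sketch.

row = {
    "salary": 55000,
    "country": "INDIA",
    "status": "ACTIVE",
    "order_date": "2025-03-15",  # ISO date string, so string comparison follows date order
    "email": None,               # None stands in for a NULL value
}

checks = {
    "salary > 50000": row["salary"] > 50000,
    "country = 'INDIA'": row["country"] == "INDIA",
    "status <> 'INACTIVE'": row["status"] != "INACTIVE",
    "order_date >= '2025-01-01'": row["order_date"] >= "2025-01-01",
    "email is null": row["email"] is None,
}

for condition, result in checks.items():
    print(f"{condition}: {result}")
```

All five checks evaluate to True for this particular row.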
8. Filter vs Sort
FILTER → Removes rows
SORT → Reorders rows
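
The same contrast in Python, using hypothetical sample rows: filtering changes how many rows there are, sorting only changes their order.

```python
# Illustrative only: filtering removes rows, sorting reorders them.
# The sample rows are assumptions made for this sketch.

rows = [
    {"id": 3, "salary": 70000},
    {"id": 1, "salary": 45000},
    {"id": 2, "salary": 52000},
]

filtered = [row for row in rows if row["salary"] > 50000]
sorted_rows = sorted(rows, key=lambda row: row["salary"])

print([row["id"] for row in filtered])     # [3, 2]
print([row["id"] for row in sorted_rows])  # [1, 2, 3]
```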