Filter Stage

What is the Filter Stage in DataStage?

The Filter stage is a processing stage that filters rows based on one or more conditions.
It passes the records that satisfy a condition to the output link unmodified; the remaining records are dropped, or sent to a reject link if one is defined.


1. Basic Filter Stage Diagram

  Source Data
      |
      v
+------------+
|   FILTER   |
|  salary >  |
|   50000    |
+------------+
      |
      v
Filtered Output

Explanation

  • Reads input rows
  • Applies condition (salary > 50000)
  • Outputs only matching rows
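
The three steps above can be sketched in plain Python. This is only an illustration of the logic, not DataStage code: the sample rows, the column names, and the filter_rows helper are all hypothetical.

```python
# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"name": "Asha", "salary": 62000},
    {"name": "Ravi", "salary": 48000},
    {"name": "Meena", "salary": 50000},
]


def filter_rows(rows, predicate):
    """Return the rows for which predicate(row) is true, unmodified."""
    return [row for row in rows if predicate(row)]


# Condition from the diagram: salary > 50000 (strictly greater).
filtered = filter_rows(rows, lambda r: r["salary"] > 50000)
print(filtered)
```

Note that the row with a salary of exactly 50000 is not passed, because the condition uses > rather than >=.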

2. Filter Stage with Reject Link

   Source
      |
      v
+-----------+
|  FILTER   |
| status =  |
| 'ACTIVE'  |
+-----------+
   |      |
   v      v
Output   Reject
(ACTIVE) (not ACTIVE)

Explanation

  • Output link → rows that meet the condition
  • Reject link → rows that fail the condition (enable the stage's Output Rejects option; otherwise these rows are dropped)
  • Useful for data validation
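
As a rough Python sketch of the output/reject split (not DataStage code; the filter_with_reject helper and the sample rows are hypothetical), every input row ends up on exactly one of the two links:

```python
def filter_with_reject(rows, predicate):
    """Split rows into (output, reject) according to predicate(row)."""
    output, reject = [], []
    for row in rows:
        if predicate(row):
            output.append(row)
        else:
            reject.append(row)
    return output, reject


# Hypothetical sample data, including a row with a missing status.
rows = [
    {"id": 1, "status": "ACTIVE"},
    {"id": 2, "status": "INACTIVE"},
    {"id": 3, "status": None},
    {"id": 4, "status": "ACTIVE"},
]

output, reject = filter_with_reject(rows, lambda r: r["status"] == "ACTIVE")
print("output:", [r["id"] for r in output])
print("reject:", [r["id"] for r in reject])
```

In this sketch the row with a missing status also lands on the reject side, which is why a reject link is handy for validation: nothing disappears silently.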

3. Filter Stage in Parallel Processing

                Input Data
                     |
       +-------------+-------------+
       |             |             |
  Partition 1   Partition 2   Partition 3
       |             |             |
   +--------+    +--------+    +--------+
   | FILTER |    | FILTER |    | FILTER |
   +--------+    +--------+    +--------+
       |             |             |
       +-------------+-------------+
                     |
              Filtered Output

Explanation

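  • The Filter stage runs independently on each partition
  • Each row is evaluated on its own, so throughput scales with the number of partitions
  • No repartitioning is needed, so no data moves between nodes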
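
The idea can be imitated in Python with a thread pool. This is a minimal sketch under stated assumptions: the round-robin partitioning, the three workers, and the sample rows are all made up for illustration, and a real parallel engine distributes the work across nodes rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_round_robin(rows, n):
    """Deal rows into n partitions, one row at a time."""
    return [rows[i::n] for i in range(n)]


def filter_partition(partition):
    """Apply the same condition to one partition, independently of the others."""
    return [r for r in partition if r["salary"] > 50000]


# Hypothetical input: salaries 40000, 45000, ..., 80000.
rows = [{"id": i, "salary": 40000 + i * 5000} for i in range(9)]
partitions = partition_round_robin(rows, 3)

with ThreadPoolExecutor(max_workers=3) as pool:
    filtered_partitions = list(pool.map(filter_partition, partitions))

# Collect the per-partition results into one output.
filtered = [r for part in filtered_partitions for r in part]
print(sorted(r["id"] for r in filtered))
```

Because each row is tested on its own, no partition needs to see another partition's rows. The combined output follows partition order, not the original input order.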

4. Filter vs Transformer

Simple Condition
(status='A')
     |
     v
FILTER STAGE

Complex Logic
(if-else, derivations)
     |
     v
TRANSFORMER STAGE
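
To make the distinction concrete, here is a small Python analogy (not DataStage code; the rows, the band column, and the derive helper are hypothetical). Filter-style logic only decides whether a row is kept; Transformer-style logic computes new values with if-else rules.

```python
rows = [
    {"id": 1, "status": "A", "salary": 60000},
    {"id": 2, "status": "I", "salary": 30000},
]

# Filter-style: a yes/no test per row; kept rows are unchanged.
kept = [r for r in rows if r["status"] == "A"]


# Transformer-style: derive a new column using if-else logic.
def derive(row):
    out = dict(row)  # copy so the input row is not modified
    out["band"] = "HIGH" if row["salary"] > 50000 else "LOW"
    return out


transformed = [derive(r) for r in rows]
print(kept)
print(transformed)
```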


5. Real-World Example

Use case: Load only today’s transactions

   Source Table
         |
         v
+------------------+
|      FILTER      |
|    txn_date =    |
|   CURRENT_DATE   |
+------------------+
         |
         v
   Target Table
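
The same use case as a Python sketch (an illustration only; the todays_transactions helper and the sample rows are hypothetical). The date is passed in as an optional argument, in the same spirit as supplying the run date to a job from outside, which also makes the logic easy to test.

```python
from datetime import date, timedelta


def todays_transactions(rows, today=None):
    """Keep only the rows whose txn_date equals today (defaults to the system date)."""
    today = today or date.today()
    return [r for r in rows if r["txn_date"] == today]


# Hypothetical source rows relative to a fixed run date.
run_date = date(2025, 1, 15)
rows = [
    {"txn_id": 101, "txn_date": run_date},
    {"txn_id": 102, "txn_date": run_date - timedelta(days=1)},
    {"txn_id": 103, "txn_date": run_date},
]

target = todays_transactions(rows, today=run_date)
print([r["txn_id"] for r in target])
```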


6. Filter Stage vs WHERE Clause (DB)

Database
   |
   |-- WHERE condition --> (BEST)
   |
   |-- No WHERE
           |
           v
    FILTER STAGE

Explanation

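  • A WHERE clause in the source SQL is usually faster, because the database discards rows before they reach the job
  • Use the Filter stage when:
    • The source is a file
    • The logic is dynamic
    • A reusable job design is needed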
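
The difference can be seen with Python's built-in sqlite3 module. This is a sketch with a made-up emp table: both approaches return the same rows, but the first one transfers only the matching rows out of the database, while the second transfers everything and filters afterwards.

```python
import sqlite3

# In-memory database with a hypothetical emp table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("Asha", 62000), ("Ravi", 48000), ("Meena", 71000)],
)

# Option 1: filter in the database with a WHERE clause.
pushed_down = conn.execute(
    "SELECT name, salary FROM emp WHERE salary > ? ORDER BY name", (50000,)
).fetchall()

# Option 2: read every row, then filter downstream (like a Filter stage).
all_rows = conn.execute("SELECT name, salary FROM emp ORDER BY name").fetchall()
filtered_later = [row for row in all_rows if row[1] > 50000]

conn.close()
print(len(pushed_down), "rows read with WHERE;", len(all_rows), "rows read without")
```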

7. Common Filter Conditions

salary > 50000
country = 'INDIA'
status <> 'INACTIVE'
order_date >= '2025-01-01'
email IS NULL
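
For readers who think in code, the conditions listed above correspond roughly to the Python tests below. This is only an analogy: the sample rows are hypothetical, and Python's None stands in for a database NULL.

```python
from datetime import date

# Hypothetical rows covering each kind of condition.
rows = [
    {"id": 1, "salary": 60000, "country": "INDIA", "status": "ACTIVE",
     "order_date": date(2025, 2, 10), "email": None},
    {"id": 2, "salary": 45000, "country": "INDIA", "status": "INACTIVE",
     "order_date": date(2024, 12, 31), "email": "b@example.com"},
    {"id": 3, "salary": 52000, "country": "USA", "status": "ACTIVE",
     "order_date": date(2025, 1, 1), "email": "c@example.com"},
]

# Each label is the filter condition; each lambda is its Python counterpart.
conditions = {
    "salary > 50000": lambda r: r["salary"] > 50000,
    "country = 'INDIA'": lambda r: r["country"] == "INDIA",
    "status <> 'INACTIVE'": lambda r: r["status"] != "INACTIVE",
    "order_date >= '2025-01-01'": lambda r: r["order_date"] >= date(2025, 1, 1),
    "null check on email": lambda r: r["email"] is None,
}

matches = {
    label: [r["id"] for r in rows if test(r)]
    for label, test in conditions.items()
}
for label, ids in matches.items():
    print(label, "->", ids)
```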


8. Filter vs Sort

FILTER → Removes rows
SORT   → Reorders rows


 

