Tuesday, 23 December 2025

What are LAG () and LEAD ()?

 


LAG () and LEAD () are window (analytic) functions used to access data from another row without using self-joins.

·        LAG () → gets value from a previous row

·        LEAD () → gets value from a next row


LAG() and LEAD() allow comparison of a row with its previous or next row within a window.


🔹 Syntax

LAG (column, offset, default)

OVER (PARTITION BY column ORDER BY column)

 

LEAD (column, offset, default)

OVER (PARTITION BY column ORDER BY column)

·        offset → how many rows before/after (default = 1)

  • default → value if row doesn’t exist (optional)

🔹 Example Table: sales

month

amount

Jan

1000

Feb

1200

Mar

1100

Apr

1500


🔹 LAG() Example (Previous Value)

SELECT month, amount,

       LAG(amount) OVER (ORDER BY month) AS prev_amount

FROM sales;

Output:

month

amount

prev_amount

Jan

1000

NULL

Feb

1200

1000

Mar

1100

1200

Apr

1500

1100


🔹 LEAD() Example (Next Value)

SELECT month, amount,

       LEAD(amount) OVER (ORDER BY month) AS next_amount

FROM sales;

Output:

month

amount

next_amount

Jan

1000

1200

Feb

1200

1100

Mar

1100

1500

Apr

1500

NULL


🔹 Very Common Interview Use Case: Difference Calculation

SELECT month, amount,

       amount - LAG(amount) OVER (ORDER BY month) AS diff

FROM sales;

Shows month-to-month change


🔹 Using Default Value

LAG (amount, 1, 0) OVER (ORDER BY month)

First row gets 0 instead of NULL


🔹 PARTITION BY

SELECT emp, dept, salary,

       LAG(salary) OVER (

         PARTITION BY dept

         ORDER BY salary

       ) AS prev_salary

FROM emp;

Comparison happens within each department


🔹 ETL / Data Engineering Use Cases

Delta / incremental load detection
Change data capture logic
SCD Type-2 comparison
Trend analysis
Data quality checks

Bottom of Form

 

ROW_NUMBER vs RANK vs DENSE_RANK

 

ROW_NUMBER vs RANK vs DENSE_RANK

All three are window (analytic) functions used for ranking rows.


🔹 Sample Data

emp

dept

salary

A

IT

9000

B

IT

9000

C

IT

8000

D

IT

7000


🔹 ROW_NUMBER()

  • Assigns unique number to each row
  • No ties (even if values are same)

SELECT emp, salary,

       ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn

FROM emp_table;

Output:

emp

salary

rn

A

9000

1

B

9000

2

C

8000

3

D

7000

4

Always unique numbering.


🔹 RANK()

  • Same values get same rank
  • Gaps appear in ranking

SELECT emp, salary,

       RANK() OVER (ORDER BY salary DESC) AS rnk

FROM emp_table;

Output:

emp

salary

rnk

A

9000

1

B

9000

1

C

8000

3

D

7000

4

Rank 2 is skipped


🔹 DENSE_RANK()

  • Same values get same rank
  • No gaps in ranking

SELECT emp, salary,

       DENSE_RANK() OVER (ORDER BY salary DESC) AS drnk

FROM emp_table;

Output:

emp

salary

drnk

A

9000

1

B

9000

1

C

8000

2

D

7000

3

No missing rank numbers


🔹 Key Differences (Quick Table)

Function

Handles Ties

Gaps in Rank

Unique Values

ROW_NUMBER

No

No

Yes

RANK

Yes

Yes

No

DENSE_RANK

Yes

No

No


🔹 When to Use What

Use ROW_NUMBER

  • Remove duplicates
  • Pick exactly one record
  • Deduplication logic

Use RANK

  • Competition-style ranking
  • Reports where rank gaps matter

Use DENSE_RANK

  • Top-N per group
  • Business ranking without gaps

🔹 ETL / Data Engineering Example

Top 2 Salaries per Department:

SELECT *

FROM (

  SELECT emp, dept, salary,

         DENSE_RANK () OVER (

           PARTITION BY dept

           ORDER BY salary DESC

         ) AS dr

  FROM emp_table

) t

WHERE dr <= 2;


ROW_NUMBER gives unique numbers, RANK allows gaps, and DENSE_RANK removes gaps while handling ties.


Top of Form

 

Bottom of Form

 

What are Window Functions in SQL?

 

What are Window Functions?

Window functions perform calculations across a set of rows related to the current row, without collapsing rows (unlike GROUP BY).

  You get aggregated values + row-level data together.


Window functions calculate values over a window (set of rows) while keeping each row visible.


🔹 Why Window Functions Are Important

  • Needed for ranking
  • Needed for running totals
  • Needed for top N per group
  • Used heavily in ETL transformations
  • Avoids complex subqueries

🔹 Basic Syntax

function_name (expression)

OVER (

  PARTITION BY column

  ORDER BY column

  ROWS / RANGE clause

)


🔹 Example Table: sales

emp

dept

amount

A

HR

5000

B

HR

7000

C

IT

9000

D

IT

6000


🔹 Example 1: Running Total

SELECT emp, dept, amount,

       SUM(amount) OVER (

         PARTITION BY dept

         ORDER BY amount

       ) AS running_total

FROM sales;

Calculates cumulative sum per department


🔹 Example 2: Ranking Employees by Salary

SELECT emp, dept, amount,

       RANK () OVER (

         PARTITION BY dept

         ORDER BY amount DESC

       ) AS rank_in_dept

FROM sales;


🔹 Example 3: Top 1 Salary per Department

SELECT *

FROM (

  SELECT emp, dept, amount,

         ROW_NUMBER() OVER (

           PARTITION BY dept

           ORDER BY amount DESC

         ) AS rn

  FROM sales

) t

WHERE rn = 1;


🔹 Common Window Functions :

🔸 Ranking Functions

·        ROW_NUMBER()

·        RANK()

·        DENSE_RANK()

🔸 Aggregate Window Functions

·        SUM()

·        AVG()

  • COUNT()
  • MAX(), MIN()

🔸 Analytical Functions

  • LAG()
  • LEAD()
  • FIRST_VALUE()
  • LAST_VALUE()

🔹 LAG & LEAD (ETL Usage)

SELECT emp, amount,

       LAG(amount) OVER (ORDER BY emp) AS prev_amount

FROM sales;

Used for change detection / delta logic


🔹 GROUP BY vs Window Functions (Important)

GROUP BY

Window Function

Aggregates rows

Keeps all rows

Reduces output rows

Same number of rows

Cannot show row-level + aggregate

Can show both


🔹 ETL / DataStage Context

  • Replace Aggregator stage logic in SQL
  • Used in delta load, SCD, ranking
  • Improves performance vs subqueries

 

Most Recent posts

What are LAG () and LEAD ()?

  LAG () and LEAD () are window (analytic) functions used to access data from another row without using self-joins. ·         LAG () →...