Sql Indexes

 

An index is a pointer structure created on a table column to speed up searching, filtering, and sorting.

Without an index → database scans the entire table (slow).
With an index → database directly jumps to the required rows (fast).

Example

Suppose you have a table with 10 lakh records:

SELECT * FROM employees WHERE employee_id = 1203;

Without index → scans all 10 lakh rows.

With index on employee_id → jumps directly to row 120

 How an Index Works :  

Internally, most databases use B-Tree structure.

It stores sorted values

Allows binary search

             Speeds up data retrieval

 

 Types of Indexes

1️ Primary Index

Automatically created on primary key.

2️ .Unique Index

Ensures values are unique (like email, SSN).

3️. Non-Unique Index

Created on columns frequently used in:

WHERE

JOIN

ORDER BY

GROUP BY

4️ Composite Index

Index on multiple columns.

Example:

CREATE INDEX idx_emp_dept ON employees(department_id, salary);

5️ Bitmap Index (Oracle)

Best for low-cardinality columns like gender (M/F), status (Y/N).

6️ Full-text Index

Used for searching long text like articles, comments.


Advantages of Indexes

Faster SELECT queries
Faster searching
Faster sorting & grouping
Reduces full table scans


⚠️ Disadvantages of Indexes

Indexes speed up SELECT, but slow down:

INSERT

UPDATE

DELETE

Because the index also needs to be updated.

Indexes also take extra disk space.


When to Create Indexes? (Important)

Create index on columns used in:

WHERE
JOIN
ORDER BY
GROUP BY
Foreign keys

Do NOT create index on:

Columns with high updates
Very small tables
Columns with very few distinct values (except bitmap index)


Simple Real-Time Example

Scenario:
You frequently run:

SELECT * FROM orders WHERE order_date = '2024-01-01';

Solution:

CREATE INDEX idx_order_date ON orders(order_date);


§  Clustered vs Non-Clustered Index
B-Tree vs Bitmap Index
 How to check index usage
 How indexes affect performance

-------------------------------------------------------

1. How Indexes IMPROVE Performance

1) Faster Searching (Main Benefit)

Without index → Full Table Scan (scans all rows)
With index → Index Range Scan (jumps directly to matching rows)

Example:

SELECT * FROM employees WHERE emp_id = 120;

If emp_id is indexed → result in milliseconds.

2) Faster JOIN Operations

JOIN uses indexed columns to match rows quickly.

Example:

SELECT * FROM emp e JOIN dept d ON e.dept_id = d.dept_id;

Index on dept_id improves join speed 10x–50x.


3) Faster Sorting (ORDER BY)

If the column in ORDER BY is indexed, sorting becomes faster because index stores values in sorted form (B-tree).


4) Faster GROUP BY / Aggregations

If grouping column is indexed, database avoids scanning everything.

Example:

SELECT dept_id, count(*) FROM emp GROUP BY dept_id;

If dept_id has index → performance boost.


Summary of Positive Impact:

Query Type

Speed Up with Index

WHERE filters

Yes

JOIN

Yes

ORDER BY

Yes

GROUP BY

Yes

DISTINCT

Yes


2. How Indexes DECREASE Performance (Negative Impact)

Indexes do NOT always help.
Sometimes they hurt performance.


 1) Slow INSERT operations

Every time a new row is inserted, DB must also insert the value into the index tree.

So:

More indexes → Slower inserts


2) Slow UPDATE

If you update an indexed column, DB must reorder the index.

Example:

UPDATE emp SET salary = 60000 WHERE emp_id = 101;

If salary has an index → index must be updated too.


3) Slow DELETE

Deleting a row means:

Removing from table

Removing from index

More indexes → slower deletes.


4) Extra Storage

Indexes consume disk space (B-tree structure).

Big tables → big indexes.


5) Index not used – overhead remains

Sometimes the query is written in a way that prevents index usage:

Examples:

WHERE UPPER(name) = 'BINDU'   -- index on name won't be used

WHERE salary + 1000 > 5000    -- index not used

WHERE TO_CHAR(order_date) = '2025-01-01' -- index not used

These functions break index usage → DB does full scan → slower.


3. When Indexes Help (Use Cases)

1. Large tables
2. Columns frequently used in filters
3.Columns used in joins
4. Columns with high selectivity (distinct values)
5. Columns used in sorting or grouping


4. When Indexes DO NOT Help

Very small tables
Columns with only 2–3 values (gender, status) Except bitmap
Columns rarely used in WHERE or JOIN
Columns heavily updated
Expressions on indexed columns


5. Real-Time Example

🧪 Scenario:

You have a table EMP with 10 million records.

Query:

SELECT * FROM emp WHERE email = 'abc@x.com';

With index on email:

            10–50 ms

Without index:

5–10 seconds

Full table scan

High CPU usage

Slows down whole DB


Summary: Good Index = High Performance

Good SELECT performance
Good JOIN performance
Less workload on DB

But…

Too many indexes = Slow write operations (insert, update, delete)




 


No comments:

Post a Comment

Most Recent posts

Copy and Modify Stages

In IBM Infosphere DataStage , both Copy Stage and Modify Stage are simple processing stages used in parallel jobs , but their purpose i...