Fact and Dimension tables

Dimension tables store descriptive attributes, while fact tables store measurable business data and foreign keys referencing dimensions. Fact data is loaded after dimensions using surrogate keys generated during ETL.

Business Scenario (Sales Data)

A retail company wants to analyze:

  • Total sales

  • Sales by product

  • Sales by customer

  • Sales by date

  • Sales by store

    Dimension Tables (Descriptive Data)

    📌 DIM_CUSTOMER

    Stores customer details.

    Customer_Key (SK)Customer_ID (BK)Customer_NameCityCountry
    101C001RahulDelhiIndia
    102C002AnithaChennaiIndia

    👉 Customer_Key = Surrogate Key (generated in ETL)
    👉 Customer_ID = Business Key (from source)


    📌 DIM_PRODUCT

    Product_KeyProduct_IDProduct_NameCategory
    201P1001LaptopElectronics
    202P1002MobileElectronics

    📌 DIM_DATE

    Date_KeyFull_DateMonthYear
    202401012024-01-01Jan2024

    📌 DIM_STORE

    Store_Key  Store_ID  Store_Name City
    301S01  BigBazaar     Delhi

    3️⃣ Fact Table (Transactional / Measurable Data)

    📌 FACT_SALES

    Date_KeyCustomer_KeyProduct_KeyStore_KeyQuantitySales_Amount
    202401011012013012120000

    👉 Fact table contains only keys + measures
    👉 No descriptive text


    4️⃣ Source Data (OLTP / Flat File Example)

    Order_IDOrder_DateCustomer_IDProduct_IDStore_IDQtyAmount
    O10012024-01-01C001P1001S012120000

    5️⃣ How Data is Loaded (ETL Flow – Data Stage Style)

    🔹 Step 1: Load Dimension Tables

    Example: Customer Dimension

    1. Read source customer data

    2. Check if Customer_ID already exists

    3. If new → generate Surrogate Key

    4. Insert into DIM_CUSTOMER

    📌 Tools used:

    • Lookup stage

    • Transformer (for surrogate key)

    • Slowly Changing Dimension (if applicable)


    🔹 Step 2: Load Fact Table

    1. Read sales transaction data

    2. Lookup:

      • Customer_Key from DIM_CUSTOMER

      • Product_Key from DIM_PRODUCT

      • Date_Key from DIM_DATE

      • Store_Key from DIM_STORE

    3. Populate measures (Quantity, Sales_Amount)

    4. Insert into FACT_SALES

    📌 Tools used:

    • Multiple Lookup stages

    • Transformer

    • Fact load job


    6️⃣ DataStage Job Design (High Level)

    Source Sales File | v Transformer | Lookups (Customer, Product, Date, Store) | v FACT_SALES Table

No comments:

Post a Comment

Most Recent posts

IBM Cloud Pak for Datastage