Data Modeling Foundations

  • Data Modeling is the art of structuring data so it’s both analytically powerful and operationally maintainable
  • Dimensional Modeling denormalize intentionally, trading storage space for query speed and analytical clarity. The result is a structure purpose-built for the questions the business actually asks. 

Fact Tables vs Dimensional Tables

Grain Definitions

  • Grain is the single most important decision in dimensional modeling.
  • Grain defines exactly what one row in your fact table represents

Star Schema vs Snowflake Schema

Slowly Changing Dimensions (SCDs)

Data Warehouse Architectures

Inmon – Top-down

  • Decide on:
    • The naming
    • The definition
    • Which data is valid if there are conflicts
    • And all the data cleaning operations before it enters the warehouse
  • Data is in a normalized form
  • Data moves to a department-focused data marts where end users and application can query it.
ProsCons
Single source of truth for the organizationNormalization = less storageEasy to change data marts to support reporting changesMore joins = slower response timeLengthy upfront workHigher startup cost

Kimball – bottom up

  • Once the data has been brought in, it is denormalized into a star schema
  • The goal is to get the data to reporting as fast as possible
    • This is done by first organizing and defining data definitions of one department of the organization, placing that data into the data mart, and making it available for reporting
    • After completing one department, a new department is chosen, and the cycle repeats
  • Various data attributes connect the data marts
  • Data marts are integrated into a data warehouse
ProsCons
How fast it can get up and running by taking an incremental approach resulting in lower upfront costs on warehousing projectsDenormalized data model makes the data easy to consume for usersDenormalization increases the processing time within the ETL process and can create duplicate data when used across different data marts.

ERD Design + SQL ETL

ERD Design

  • Before writing a single table, you always answer three questions:
    • What are the core entities?
    • What are the relationships between them?
    • What is the grain of each fact table?