Data Denormalization

Transform and Analyze Data with Microsoft Fabric

Luis Silva

Solution Architect - Data & AI

What is Normalization?

  • Organize data to reduce redundancy and increase integrity
  • Based on principles set in 1970 by computer scientist Tedd Codd, the inventor of the relational model
  • Third Normal Form (3NF): Non-key attributes depend only on the primary key
Transform and Analyze Data with Microsoft Fabric

What is Normalization?

  • Normalization is achieved by using keys and new tables to replace attributes that otherwise would lead to data redundancy.

 

Diagram illustrating the concept of splitting a table in three to reduce redundancy

Transform and Analyze Data with Microsoft Fabric

Normalization example

Sample table containing video games titles along with their publishers and genres. Example row: Game title Gran Turismo, Publisher Sony, genre Racing

  • Lots of replicated text entries for the values of publisher and genre.
Transform and Analyze Data with Microsoft Fabric

Normalization example

Sample table containing video games titles along with their publishers and genres. The text descriptions for publishers and genres have been moved to separate tables and the text entries in the games table have been replaced with numeric IDs referencing the separate publishers and genres master tables

  • Text entries for the values of publisher and genre now appear only once.
  • Numeric keys replacing publisher and genre in the Games table take up less space
Transform and Analyze Data with Microsoft Fabric

Denormalization

  • Denormalization uses redundancy to flatten the data model; it is the opposite of normalization
  • Denormalization leads to fewer tables at the expense of increased redundancy

Diagram illustrating the concept of joining multiple tables into one

Transform and Analyze Data with Microsoft Fabric

When should you use Normalization?

  • OLTP transactional systems

    • Optimizing for data writes (individual inserts, updates and deletes)
    • Ensuring data integrity
  • Fact tables

    • Millions of rows
    • Reducing storage space
    • Making the model simpler to understand
    • Basis for the star schema

 

Diagram of a star schema, highlighting the fact table

Transform and Analyze Data with Microsoft Fabric

When should you use Denormalization?

  • Dimension tables
    • Generally much smaller than fact tables
    • Redundancy = Fewer joins = Faster queries
    • Improved query performance outweighs the cost of storing redundant data
    • Star schema is simpler than snowflake schema

 

Diagram of a star schema, highlighting the dimension tables

Transform and Analyze Data with Microsoft Fabric

Implementing denormalization

 

 

Icons representing three tools: SQL, Spark and Dataflows

Transform and Analyze Data with Microsoft Fabric

Implementing denormalization with SQL

  • SELECT + JOIN statement
-- [dim_videogames]: Videogames table
-- [dim_genres]: Genres table
-- [dim_publishers]: Publishers table

SELECT game_id, title, gen.genre, pub.publisher
FROM dim_videogames_norm vidg
JOIN dim_genres gen
  ON vidg.genre_id = gen.genre_id
JOIN dim_publishers pub
  ON vidg.publisher_id = pub.publisher_id
Transform and Analyze Data with Microsoft Fabric

Implementing denormalization with Spark

  • DataFrame join( ) and select( )
# [videogamesDF]: Videogames DataFrame
# [genresDF]: Genres DataFrame
# [publishersDF]: Publishers DataFrame

videogames1DF = videogamesDF.join(genresDF, ["genre_id"])
videogamesdenormDF = videogames1DF.join(publishersDF, ["publisher_id"])
videogamesdenormDF.select("game_id", "title", "genre_id", "publisher_id").show()
Transform and Analyze Data with Microsoft Fabric

Implementing denormalization with Dataflows

  • Use queries to load data
  • Use Merge queries or Merge queries as new transformations to join the queries

Screenshot of a Dataflow that uses the Merge queries as new to join the queries videogames, genres and publishers

Transform and Analyze Data with Microsoft Fabric

Let's practice!

Transform and Analyze Data with Microsoft Fabric

Preparing Video For Download...