Data Cleaning and Preparation

Introduction to Power Query in Excel

Lyndsay Girard

Performance Analytics Consultant

ETL_Schematic_1bof4.jpg

Introduction to Power Query in Excel

ETL_Schematic_2bof4.jpg

Introduction to Power Query in Excel

ETL_Schematic_3bof4.png

Introduction to Power Query in Excel

ETL_Schematic_3cof4.png

Introduction to Power Query in Excel

ETL_Schematic_4of4.png

Introduction to Power Query in Excel

Importance of clean data

  • Ensures accuracy and reliability
  • Improves analysis efficiency and effectiveness
  • Enhances consistency and comparability
  • Fosters integrity

Image of woman holding cleaning supplies over laptop screen

Introduction to Power Query in Excel

Missing Data

Handling missing data

  • Data Importance/Relevance

Ch2_TableWithMissingData.png

Introduction to Power Query in Excel

Missing Data

Handling missing data

  • Data Importance/Relevance
  • Data Volume

Ch2_TableWithMissingData.png

Ch2_TableWithMissingData_highlighted.png

Introduction to Power Query in Excel

Missing Data

Handling missing data

  • Data Importance/Relevance
  • Data Volume
  • Data Source

Ch2_TableWithMissingData.png

Ch2_TableWithMissingData_highlighted.png

Introduction to Power Query in Excel

Missing Data

Handling missing data

  • Data Importance/Relevance
  • Data Volume
  • Data Source

Ch2_TableWithMissingData.png

Ch2_TableWithMissingData_Imputed.png

Introduction to Power Query in Excel

Outliers and entry errors

Outliers

  • Data point(s) that significantly deviate from the majority of values in a dataset.
  • Due to typo (entry error) or true value.

Handling outliers

  • Consider the impact on analysis
  • Imputation

Ch2_Scatter_Outlier_with_arrow.png

Introduction to Power Query in Excel

Duplicate Data

Why is duplicate data a problem?

  • Data inaccuracy.
  • Increased Storage and Processing Overhead.
  • Data Integrity.

Ch2_TableWithDuplicateData_highlighted.png

Introduction to Power Query in Excel

Duplicate Data

Why is duplicate data a problem?

  • Data inaccuracy.
  • Increased Storage and Processing Overhead.
  • Data Integrity.

Ch2_TableWithDuplicateData_strikethrough.png

Ch2_TableWithDuplicateData_deleted.png

Introduction to Power Query in Excel

Let's practice!

Introduction to Power Query in Excel

Preparing Video For Download...