Data Exploration and Transformation

Transform and Analyze Data with Microsoft Fabric

Luis Silva

Solution Architect - Data & AI

Data transformation scenarios

Why do we need to transform the data?

  • Remove or clean incorrect data from source systems

Image of data table with some invalid entries highlighted in red

Transform and Analyze Data with Microsoft Fabric

Data transformation scenarios

Why do we need to transform the data?

  • Remove or clean incorrect data from source systems.
  • Convert data types to ensure consistency.

Image representing a data column storing numbers as text being converted to actual numbers

Transform and Analyze Data with Microsoft Fabric

Data transformation scenarios

Why do we need to transform the data?

  • Remove or clean incorrect data from source systems.
  • Convert data types to ensure consistency.
  • Filter out irrelevant data.

Image that shows a table  before and after filtering out rows that do not meet a given criteria, in this case keeping only records where a certain field is not equal to 'X'

Transform and Analyze Data with Microsoft Fabric

Data transformation scenarios

Why do we need to transform the data?

  • Remove or clean incorrect data from source systems.
  • Convert data types to ensure consistency.
  • Filter out irrelevant data.
  • Combine data from multiple source systems.

Diagram representing two tables that are combined into one-

Transform and Analyze Data with Microsoft Fabric

Data transformation scenarios

Why do we need to transform the data?

  • Remove or clean incorrect data from source systems.
  • Convert data types to ensure consistency.
  • Filter out irrelevant data.
  • Combine data from multiple source systems.
  • Put the data in a format more suitable for analysis and reporting.

Diagram representing one large table that is transformed into a star schema

Transform and Analyze Data with Microsoft Fabric

Cleansing data

  • Reduce irrelevant rows (remove top rows, remove bottom rows)
  • Remove duplicates
  • Remove blank rows
  • Remove error rows
  • Convert data types

Image of source data table with some invalid entries highlighted in red and an arrow indicating how these invalid records are removed from the destination table

Transform and Analyze Data with Microsoft Fabric

Filtering data

  • Narrowing down datasets

    • Select only a subset of the data
    • Discard records that do no meet certain criteria
  • Example: SQL WHERE clause

Screenshot of a query that selects data from the Orders table where the State column equals NY

Diagram that shows a table filtering out any row not related to the state of NY

Transform and Analyze Data with Microsoft Fabric

Merging and Joining data

  • Join two or more datasets
  • Useful for building star schemas

Diagram illustrating the merge of two tables

Transform and Analyze Data with Microsoft Fabric

Data Aggregation and De-aggregation

  • Aggregate: compile and summarize data

Table with sales orders per state aggregated into a new table with the total number of orders and total sales amount per state

  • Disaggregate: split data, usually single column into multiple columns

Table with branch locations where the location has city, state deaggregated into a new table where city and state are in separate columns

Transform and Analyze Data with Microsoft Fabric

Data exploration and transformation tools

  • Dataflows
  • Data Wrangler

Two screenshots side by side, showing the graphical interface of dataflows and data wrangler

  • Spark notebooks
  • SQL scripts

Two screenshots side by side, showing a sample Spark notebook and a SQL script

Transform and Analyze Data with Microsoft Fabric

Dataflows

  • Low-code graphical interface
  • Cleansing
  • Filtering
  • Aggregating

Screenshot of the Dataflow designer with a couple of sample queries and showing some of the transformation operations available

Transform and Analyze Data with Microsoft Fabric

Data Wrangler

  • Notebook-based tool for exploratory data analysis
  • Data display grid
  • Dynamic summary statistics
  • Built-in visualizations
  • Library of common data-cleaning operations

Screenshot of Data Wrangler interface showing some sample operations

Transform and Analyze Data with Microsoft Fabric

Spark notebooks

  • Variety of languages
  • Python libraries for data manipulation
    • PySpark for large datasets
    • Pandas for smaller datasets

Screenshot of a Spark notebooks with some sample PySpark code

Transform and Analyze Data with Microsoft Fabric

SQL scripts

  • T-SQL statements for data manipulation: SELECT, INSERT, UPDATE, DELETE
  • JOIN to merge data
  • CAST and CONVERT to convert data types

Screenshot of the SQL editor with a sample SELECT query

Transform and Analyze Data with Microsoft Fabric

Let's practice!

Transform and Analyze Data with Microsoft Fabric

Preparing Video For Download...