Data Engineering in Microsoft Fabric

Transform and Analyze Data with Microsoft Fabric

Luis Silva

Solution Architect - Data & AI

Data Analytics End To End

Diagram showing the components of an end-to-end data analytics solution, including data sources, ingestion, prepare and transform, store, query and visualize and analyze

  • Ingest data from source and store it in a data lake
  • Prepare and transform the data
  • Visualize and analyze the data

Data Factory

Diagram showing the components of an end-to-end data analytics solution, highlighting Data Pipelines in the Ingest component and Dataflows in the Prepare and Transform component

  • Ingest, prepare and transform data
  • Dataflows and Data Pipelines

Dataflows

  • Low-code interface for data ingestion and transformation
  • Power Query transformation engine

Screenshot of the Dataflow designer showing a sample ingestion dataflow

Data Pipelines

  • Collection of activities that perform a task
  • Types of activities:
    • Data movement (Copy activity, Dataflow)
    • Data transformation (Notebook, Stored Procedure, Script)
    • Control (Switch, If, ForEach, Wait)
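
The three activity types above can be sketched as a small model. This is a minimal illustration in plain Python, not the Fabric API: the `Activity` and `Pipeline` classes and the helper functions are hypothetical names introduced here, and the in-memory dictionary stands in for real sources and sinks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model for illustration only; this is not the Fabric API.
Context = Dict[str, object]


@dataclass
class Activity:
    name: str
    run: Callable[[Context], None]


@dataclass
class Pipeline:
    name: str
    activities: List[Activity] = field(default_factory=list)

    def execute(self, context: Context) -> List[str]:
        """Run the activities in order and return the names that ran."""
        executed = []
        for activity in self.activities:
            activity.run(context)
            executed.append(activity.name)
        return executed


def copy_activity(source_key: str, sink_key: str) -> Activity:
    """Data movement: copy rows from a source to a sink unchanged."""
    def run(ctx: Context) -> None:
        ctx[sink_key] = list(ctx[source_key])
    return Activity(f"Copy {source_key} -> {sink_key}", run)


def transform_activity(key: str, fn: Callable[[dict], dict]) -> Activity:
    """Data transformation: stands in for a Notebook or Script activity."""
    def run(ctx: Context) -> None:
        ctx[key] = [fn(row) for row in ctx[key]]
    return Activity(f"Transform {key}", run)


def if_activity(condition: Callable[[Context], bool], when_true: Activity) -> Activity:
    """Control: run the inner activity only when the condition holds."""
    def run(ctx: Context) -> None:
        if condition(ctx):
            when_true.run(ctx)
    return Activity(f"If -> {when_true.name}", run)


# Sample rows standing in for a real data source.
context: Context = {"source": [{"amount": "10"}, {"amount": "25"}]}

pipeline = Pipeline("ingest_sales", [
    copy_activity("source", "staging"),
    if_activity(
        lambda ctx: len(ctx["staging"]) > 0,
        transform_activity("staging", lambda row: {"amount": int(row["amount"])}),
    ),
])

executed = pipeline.execute(context)
```

In this sketch the pipeline is just an ordered collection of activities, and the control activity wraps another activity rather than touching data itself.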

Screenshot of the Data Pipeline designer showing a sample ingestion pipeline with a Copy activity and a Dataflow activity

Synapse Data Engineering

Diagram showing the components of an end-to-end data analytics solution, highlighting Lakehouse items in the Store component, and Notebooks and Spark Job Definitions in the Prepare and Transform component

  • Lakehouses
  • Notebooks
  • Apache Spark Job Definitions

Lakehouses

  • Structured data (tables)
  • Unstructured data (files)

Screenshot of the Lakehouse Explorer in the Fabric portal, showing a Lakehouse containing Tables and Files

Notebooks

  • Interactive web interface
    • Data manipulation code
    • Data visualizations
    • Comments / Explanations
  • Multi-language support:
    • PySpark (Python)
    • Spark (Scala)
    • Spark SQL (SQL)
    • SparkR (R)

Screenshot of the Notebook editor, showing a sample notebook containing text descriptions, Python code and a histogram chart

Apache Spark Job Definitions

  • Submit batch/streaming jobs to Spark clusters
  • Alternative or complementary to Notebooks:
    • Notebooks for data exploration, prototyping and collaborative development
    • Spark Job Definition for automation of production-ready data processing code
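
The shape of code suited to an unattended job, as opposed to notebook cells, can be sketched as follows. This is a plain-Python stand-in rather than Spark code: the `aggregate_sales` function, the `--min-amount` parameter, and the in-memory sample data are all assumptions made for illustration. The point is the structure: explicit parameters, a single entry point, and no reliance on interactive state.

```python
import argparse
import csv
import io
from collections import defaultdict


def aggregate_sales(rows):
    """Sum the amount per region (the step a Spark job would run at scale)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += float(row["amount"])
    return dict(totals)


def main(argv=None):
    """Single entry point: parse parameters, read input, return the result."""
    parser = argparse.ArgumentParser(description="Hypothetical batch job")
    parser.add_argument("--min-amount", type=float, default=0.0)
    args = parser.parse_args(argv)

    # In-memory sample standing in for a file read from storage.
    sample = io.StringIO("region,amount\nEMEA,10\nAMER,5\nEMEA,2.5\n")
    rows = [
        row for row in csv.DictReader(sample)
        if float(row["amount"]) >= args.min_amount
    ]
    return aggregate_sales(rows)


# Arguments are passed explicitly, as a scheduler would pass them.
result = main(["--min-amount", "3"])
```

The same logic could be prototyped cell by cell in a notebook first, then moved into a script of this shape once it is ready to run on a schedule.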

Screenshot of a Spark Job Definition showing configuration parameters

Synapse Data Warehouse

Diagram showing the components of an end-to-end data analytics solution, highlighting Warehouse items in the Store component

  • Behaves like a traditional relational data warehouse
  • Stores data in OneLake using the open Delta Lake format
  • Enables interoperability with other Fabric workloads
  • No need to create multiple copies of data

Choosing a Data Store

  • Lakehouse
    • Unstructured data (files)
    • Spark as the primary development interface

  • Warehouse
    • Structured data (tables)
    • T-SQL as the primary development interface
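
The two criteria above can be written as a small decision helper. This is a deliberately simplified sketch: the function name `suggest_data_store` and its parameters are hypothetical, and the rule that unstructured files take priority over the preferred interface is an assumption, not something the slide states.

```python
def suggest_data_store(has_unstructured_files: bool, primary_interface: str) -> str:
    """Suggest a Fabric data store from the two criteria on the slide.

    Assumption: unstructured files point to a Lakehouse even when the
    team prefers T-SQL, because the Warehouse criterion is structured tables.
    """
    interface = primary_interface.strip().lower()
    if interface not in {"spark", "t-sql"}:
        raise ValueError(f"Unknown interface: {primary_interface!r}")

    if has_unstructured_files or interface == "spark":
        return "Lakehouse"
    return "Warehouse"


lakehouse_choice = suggest_data_store(True, "Spark")
warehouse_choice = suggest_data_store(False, "T-SQL")
```

Real decisions usually weigh more factors than these two, so treat the output as a starting point rather than a rule.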

Choosing a Data Copy Tool

Table summarizing some aspects to consider when choosing between Pipeline Copy Activity, Dataflow, and Spark. Aspects include Amount of code required, developer skills, data sources and transformation complexity


Let's practice!
