Data Engineering in Microsoft Fabric

Transform and Analyze Data with Microsoft Fabric

Luis Silva

Solution Architect - Data & AI

Data Analytics End To End

Diagram showing the components of an end-to-end data analytics solution, including data sources, ingestion, prepare and transform, store, query and visualize and analyze

Ingest data from source and store it in a data lake
Prepare and transform the data
Visualize and analyze the data

Data Factory

Diagram showing the components of an end-to-end data analytics solution, highlighting Data Pipelines in the Ingest component and Dataflows in the Prepare and Transform component

Ingest, prepare and transform data
Dataflows and Data Pipelines

Dataflows

Low-code interface for data ingestion and transformation
Power Query transformation engine

Screenshot of the Dataflow designer showing a sample ingestion dataflow

Data Pipelines

Collection of activities that perform a task
Types of activities:
- Data movement (Copy activity, Dataflow)
- Data transformation (Notebook, Stored Procedure, Script)
- Control (Switch, If, ForEach, Wait)

Screenshot of the Data Pipeline designer showing a sample ingestion pipeline with a Copy activity and a Dataflow activity
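
To make "collection of activities" concrete, here is a minimal Python sketch that models a pipeline as named activities with dependencies and works out a valid run order. It is an illustration only, not the Fabric pipeline definition format or any Fabric API; the `Activity` class, the `run_order` helper, and the activity names are all hypothetical.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Activity:
    """Hypothetical stand-in for one pipeline activity."""
    name: str
    kind: str  # e.g. "Copy", "Dataflow", "Notebook"
    depends_on: list[str] = field(default_factory=list)


def run_order(activities):
    """Return activity names in an order that respects every dependency."""
    # Map each activity to the set of activities that must finish first.
    graph = {a.name: set(a.depends_on) for a in activities}
    return list(TopologicalSorter(graph).static_order())


# Illustrative pipeline: copy the data, reshape it, then aggregate it.
pipeline = [
    Activity("Copy raw sales", "Copy"),
    Activity("Clean sales", "Dataflow", depends_on=["Copy raw sales"]),
    Activity("Aggregate sales", "Notebook", depends_on=["Clean sales"]),
]

for step in run_order(pipeline):
    print(step)
```

In the real designer, dependencies are drawn as connectors between activities, and control activities such as If or ForEach decide which branches run.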

Synapse Data Engineering

Diagram showing the components of an end-to-end data analytics solution, highlighting Lakehouse items in the Store component, and Notebooks and Spark job definitions in the Prepare and Transform component

Lakehouses
Notebooks
Apache Spark Job Definitions

Lakehouses

Structured data (tables)
Unstructured data (files)

Screenshot of the Lakehouse Explorer in the Fabric portal, showing a Lakehouse containing Tables and Files
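
The split between Files and Tables can be sketched with PySpark: read a raw file from the Files area, then save it as a Delta table so it appears under Tables. This is a minimal sketch that assumes it runs in a notebook attached to a lakehouse, where a `spark` session is already available. The function names, the file path, and the table name are hypothetical.

```python
def load_csv(spark, path):
    """Read a CSV file with a header row into a Spark DataFrame."""
    return (
        spark.read.format("csv")
        .option("header", "true")
        .load(path)
    )


def save_as_table(df, table_name):
    """Save the DataFrame as a Delta table, replacing any existing table."""
    df.write.format("delta").mode("overwrite").saveAsTable(table_name)


# Example use inside a notebook (path and table name are assumptions):
# orders = load_csv(spark, "Files/raw/orders.csv")
# save_as_table(orders, "orders")
```

Passing `spark` in as a parameter keeps the helpers free of session setup, so the same functions could be reused from a notebook or from a script that creates its own session.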

Notebooks

Interactive web interface
- Data manipulation code
- Data visualizations
- Comments / Explanations
Multi-language support:
- PySpark (Python)
- Spark (Scala)
- Spark SQL (SQL)
- SparkR (R)

Screenshot of the Notebook editor, showing a sample notebook containing text descriptions, Python code and a histogram chart

Apache Spark Job Definitions

Submit batch/streaming jobs to Spark clusters
Alternative or complementary to Notebooks:
- Notebooks for data exploration, prototyping and collaborative development
- Spark Job Definitions for automating production-ready data processing code

Screenshot of a Spark Job Definition showing configuration parameters

Synapse Data Warehouse

Diagram showing the components of an end-to-end data analytics solution, highlighting Warehouse items in the Store component

Behaves like a traditional relational data warehouse
Stores data in OneLake using the open Delta Lake format
Enables interoperability with other Fabric workloads
No need to create multiple copies of data

Choosing a Data Store

Lakehouse
- Structured and unstructured data (tables and files)
- Spark as the primary development interface

Warehouse
- Structured data (tables)
- T-SQL as the primary development interface

Choosing a Data Copy Tool

Table summarizing some aspects to consider when choosing between Pipeline Copy Activity, Dataflow, and Spark. Aspects include Amount of code required, developer skills, data sources and transformation complexity

Let's practice!
