Optimizing Dataflow Performance

Data Ingestion and Semantic Models with Microsoft Fabric

Alex Kuntz

Head of Cloud Curriculum, DataCamp

Staging in Dataflows Gen2

Temporarily holds data during transformation to optimize performance

  • Staging Artifacts:
    • Hidden internal Lakehouse storage used during data transformations
    • Automatically managed by Dataflows; not for direct user access
  • When to Use Staging:
    • Enabled by default for improved SQL endpoint performance
    • Disabled for direct Lakehouse and non-warehouse loading (can be re-enabled)
  • Removing Staging Data:
    • Disable staging and refresh the dataflow (staged data is cleared after 30 days)
    • Delete the dataflow or workspace to remove immediately

Accelerating Data Ingestion with Fast Copy

A high-speed data ingestion feature that scales to handle large datasets efficiently

  • Architecture: Redistributes heavy workloads from Power Query to a high-performance pipeline for faster processing
  • Benefit: Minimizes processing time by leveraging scalable backend resources for large data

Dataflows Gen2 Fast Copy Architecture

Optimizing Fast Copy: Prerequisites and Key Settings

Prerequisites:

  • Files: 100 MB+ (CSV/Parquet)
  • Databases: 5M+ rows (Azure SQL DB, PostgreSQL)
  • Supported Connectors: ADLS Gen2, Blob Storage, SQL DB, Lakehouse, PostgreSQL, On-premises SQL Server, Warehouse, Oracle
  • Supported Transformations: Combine files, Select columns, Change data types, Rename/Remove columns
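
The prerequisites above can be read as a simple eligibility check. The sketch below is illustrative only: Source, fast_copy_eligible, and the constants are hypothetical names, not part of any Fabric API, and it assumes "100 MB" means 100 × 1024 × 1024 bytes. The connector check is left out to keep it short.

```python
from dataclasses import dataclass

# Thresholds come from the prerequisites listed above; every name here is
# hypothetical and exists only for illustration.
MIN_FILE_BYTES = 100 * 1024 * 1024  # assumption: 100 MB counted in binary units
MIN_DB_ROWS = 5_000_000
FILE_FORMATS = {"csv", "parquet"}
SUPPORTED_TRANSFORMS = {
    "combine files",
    "select columns",
    "change data types",
    "rename columns",
    "remove columns",
}


@dataclass
class Source:
    kind: str              # "file" or "database"
    size: int              # bytes for files, row count for databases
    file_format: str = ""  # only meaningful for files


def fast_copy_eligible(source: Source, transforms: list[str]) -> bool:
    """Return True when the source and every transformation fit the listed criteria."""
    # Any transformation outside the supported set rules Fast Copy out.
    if any(t.lower() not in SUPPORTED_TRANSFORMS for t in transforms):
        return False
    if source.kind == "file":
        return (
            source.file_format.lower() in FILE_FORMATS
            and source.size >= MIN_FILE_BYTES
        )
    if source.kind == "database":
        return source.size >= MIN_DB_ROWS
    return False
```

For example, a 200 MB CSV file with only a "Select columns" step passes the check, while a 6M-row table with an unsupported step does not.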

Require Fast Copy Option:

  • Forces the use of Fast Copy; the refresh fails immediately if the criteria are not met
  • Saves time by failing early instead of waiting on slower processing
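
The effect of the Require Fast Copy option can be sketched as a small decision function. This is a rough model, not Fabric's implementation: choose_engine and FastCopyRequiredError are hypothetical names, and the fallback to a "standard" engine when the option is off is an assumption drawn from the bullets above.

```python
class FastCopyRequiredError(RuntimeError):
    """Hypothetical error: Fast Copy was required but the criteria were not met."""


def choose_engine(eligible: bool, require_fast_copy: bool) -> str:
    """Pick the processing path for a query (illustrative only)."""
    if eligible:
        return "fast_copy"
    if require_fast_copy:
        # Fail right away rather than run the slower path.
        raise FastCopyRequiredError("Fast Copy criteria not met; refresh stopped")
    # Assumed behavior when the option is off: fall back to standard processing.
    return "standard"
```

With the option on, an ineligible query stops at once, so the problem surfaces before a long refresh begins.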

Dataflow Gen2 Default Destination

  • Create stand-alone Dataflows for specific data destinations (Lakehouse, Warehouse, or KQL Database).

  • Preset data destination settings are applied automatically, speeding up development!

Preset Behaviors: The following are default behaviors and cannot be changed

  • Lakehouse: Replace update method, Dynamic schema
  • Warehouse/KQL Database: Append update method, Fixed schema
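
The preset behaviors above amount to a fixed lookup table. A minimal sketch, assuming hypothetical names (PRESETS, default_destination_settings) that do not correspond to any Fabric API:

```python
# Preset behaviors per default destination, as listed above.
# The dictionary and function names are hypothetical.
PRESETS = {
    "lakehouse": {"update_method": "replace", "schema": "dynamic"},
    "warehouse": {"update_method": "append", "schema": "fixed"},
    "kql database": {"update_method": "append", "schema": "fixed"},
}


def default_destination_settings(destination: str) -> dict:
    """Return a copy of the preset settings for a destination type."""
    key = destination.strip().lower()
    if key not in PRESETS:
        raise ValueError(f"No preset for destination: {destination!r}")
    # Return a copy so callers cannot alter the presets.
    return dict(PRESETS[key])
```

Returning a copy mirrors the point that the presets themselves cannot be changed.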

Default Data Destination in Dataflows Gen2

Let's practice!
