Delta Sharing: types and trade-offs

Introduction to Databricks Lakehouse

Gang Wang

Senior Data Scientist

Two approaches to sharing

$$

comparison: Native Sharing vs Open Protocol Sharing

$$

  • Native sharing - Databricks to Databricks
    • Seamless Unity Catalog integration
    • Full governance on both sides
  • Open protocol - any platform with a Delta Sharing client
    • Spark, pandas, Power BI, Snowflake
    • More setup, but wider reach

Databricks-native sharing

$$

  • Both parties are Databricks customers
  • Shared data appears in the recipient's Unity Catalog
  • Full governance on both sides
  • Minimal setup - the provider creates a share and a recipient and grants access; the recipient mounts the share as a catalog

$$

image: Two buildings connected by a bridge, representing a direct connection between two Databricks workspaces
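To make "create the share and grant access" concrete, here is a minimal sketch that only builds the Databricks SQL statements for a Databricks-to-Databricks share. The helper functions and every object name (`sales_share`, `main.sales.orders`, `partner_co`, `shared_sales`, `acme_provider`, the sharing identifier) are hypothetical placeholders. The assumption is that the recipient has sent the provider their metastore sharing identifier, and that in a Databricks notebook each statement would be passed to `spark.sql`.

```python
# Hypothetical helpers: build (but do not run) the SQL for native sharing.
# All share, table, recipient, catalog, and provider names are placeholders.


def provider_statements(share, table, recipient, sharing_id):
    """Statements the provider runs in their own workspace."""
    return [
        f"CREATE SHARE IF NOT EXISTS {share}",
        f"ALTER SHARE {share} ADD TABLE {table}",
        # sharing_id is the recipient's metastore sharing identifier
        f"CREATE RECIPIENT IF NOT EXISTS {recipient} USING ID '{sharing_id}'",
        f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}",
    ]


def recipient_statements(catalog, provider, share):
    """Statement the recipient runs to mount the share as a catalog."""
    return [f"CREATE CATALOG IF NOT EXISTS {catalog} USING SHARE {provider}.{share}"]


for stmt in provider_statements(
    "sales_share", "main.sales.orders", "partner_co", "aws:us-west-2:<metastore-uuid>"
):
    # In a Databricks notebook: spark.sql(stmt)
    print(stmt + ";")

for stmt in recipient_statements("shared_sales", "acme_provider", "sales_share"):
    print(stmt + ";")
```

Keeping the statements as plain strings makes the sequence easy to review before running it against a real metastore.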


Open protocol sharing

$$

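# Recipient's code (using pandas)
import delta_sharing

# Credential (profile) file downloaded from the provider's activation link
profile = "config.share"

# List every table the provider has shared with this recipient
client = delta_sharing.SharingClient(profile)
tables = client.list_all_tables()
print(tables)

# Load one shared table as a pandas DataFrame
# URL format: <profile-file>#<share>.<schema>.<table>
df = delta_sharing.load_as_pandas(
    f"{profile}#share.schema.table"
)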

$$

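  • Recipient can use any platform that supports the Delta Sharing protocol
  • Recipient installs a Delta Sharing client and authenticates with a credential (profile) file from the provider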
  • Spark, pandas, Power BI, Snowflake, Tableau
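The `config.share` file used above is a small JSON credential file. The sketch below, written with the standard library only, parses such a file and checks for the fields it is generally understood to carry (`shareCredentialsVersion`, `endpoint`, `bearerToken`, with `expirationTime` optional). The `read_profile` helper is hypothetical, and the endpoint and token values are placeholders, not a real server.

```python
import json

# Fields a Delta Sharing profile file is expected to carry.
# expirationTime is optional, so it is not checked here.
REQUIRED_FIELDS = ("shareCredentialsVersion", "endpoint", "bearerToken")


def read_profile(text):
    """Hypothetical helper: parse profile JSON and check required fields."""
    profile = json.loads(text)
    missing = [field for field in REQUIRED_FIELDS if field not in profile]
    if missing:
        raise ValueError(f"profile is missing fields: {missing}")
    return profile


# Placeholder values only; a real file comes from the provider
example = """{
  "shareCredentialsVersion": 1,
  "endpoint": "https://<sharing-server>/delta-sharing/",
  "bearerToken": "<token>"
}"""

profile = read_profile(example)
print(profile["endpoint"])
```

Because the bearer token grants read access to the share, the file should be stored like any other secret.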

Cost considerations

$$

  • Same region - typically no egress charges
  • Cross-region - cloud provider charges for data transfer
  • Cross-cloud (e.g., Azure to AWS) - highest egress fees
  • Keep shared data in the same cloud and region as your main recipients

$$

layers: Same Region - Free or cheap, Cross-Region - Egress charges, Cross-Cloud - Highest fees
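The three cost layers depend only on whether the provider and recipient share a cloud and a region. A minimal sketch of that classification follows; the `Location` type and `egress_tier` function are hypothetical, and no prices are modeled because actual egress rates depend on the cloud provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    cloud: str   # e.g. "aws", "azure", "gcp"
    region: str  # e.g. "us-west-2"


def egress_tier(provider: Location, recipient: Location) -> str:
    """Classify a provider/recipient pair into one of the three cost layers."""
    if provider.cloud != recipient.cloud:
        return "cross-cloud"   # highest egress fees
    if provider.region != recipient.region:
        return "cross-region"  # cloud provider charges for data transfer
    return "same-region"       # typically no egress charges


print(egress_tier(Location("aws", "us-west-2"), Location("aws", "us-west-2")))
print(egress_tier(Location("aws", "us-west-2"), Location("aws", "eu-west-1")))
print(egress_tier(Location("azure", "westeurope"), Location("aws", "eu-west-1")))
```

The cloud is checked before the region, so a cross-cloud pair is never misreported as merely cross-region.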


Choosing the right approach

$$

decision tree: Is the recipient on Databricks? Yes - use Native Sharing (full governance, low setup, audit logging). No - use Open Protocol (any platform, provider-only governance, moderate setup). In either case - check egress costs based on where the data and the recipient are located.
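The decision tree can be written as a small function. This is a sketch: the `recommend` function and `Recommendation` type are hypothetical, and the governance and setup labels are copied from the tree rather than drawn from any Databricks API.

```python
from typing import NamedTuple


class Recommendation(NamedTuple):
    approach: str
    governance: str
    setup: str
    egress: str


def recommend(recipient_on_databricks: bool, same_cloud: bool, same_region: bool) -> Recommendation:
    """Hypothetical encoding of the decision tree above."""
    if recipient_on_databricks:
        approach, governance, setup = "native", "full, both sides", "low"
    else:
        approach, governance, setup = "open protocol", "provider side only", "moderate"
    # Either way, egress depends on where the data sits relative to the recipient
    if not same_cloud:
        egress = "cross-cloud: highest fees"
    elif not same_region:
        egress = "cross-region: transfer charges"
    else:
        egress = "same region: typically none"
    return Recommendation(approach, governance, setup, egress)


print(recommend(True, True, True))
print(recommend(False, True, False))
```

The egress check runs for both branches, mirroring the tree's point that cost depends on location regardless of the sharing type.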


Summary

$$

  • Native sharing - Databricks-to-Databricks, seamless, full governance
  • Open protocol - any platform, broader reach, more setup
  • Egress charges increase with cross-region and cross-cloud sharing
  • Choose based on the recipient's platform and data location

Let's practice!
