Getting started with the Databricks SDK

Databricks with the Python SDK

Avi Steinberg

Software/Data Engineer

Why learn Databricks?


  • One of the leading cloud platforms for ML, AI, and data engineering
  • Used by over 60% of Fortune 500 companies


1 https://www.databricks.com/company/newsroom/press-releases/databricks-raising-10b-series-j-investment-62b-valuation

What is a Databricks Workspace?

  • A deployment of Databricks on cloud resources
  • The environment for accessing Databricks assets
  • Example workspace assets: jobs, clusters, and notebooks

Install Databricks SDK

Install the SDK

pip install databricks-sdk

Instantiate WorkspaceClient

from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
1 https://docs.databricks.com/en/dev-tools/sdk-python.html#language-venv
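One way to confirm that the install succeeded before instantiating a client is to ask Python's packaging metadata for the installed version. The following is a minimal sketch using only the standard library; the helper name `installed_sdk_version` is hypothetical and not part of the SDK.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_sdk_version(package="databricks-sdk"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_sdk_version()
    if found is None:
        print("databricks-sdk is not installed; run: pip install databricks-sdk")
    else:
        print(f"databricks-sdk {found} is installed")
```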

Authentication environment variables

  • Environment variable = key/value pair used to configure application behavior
  • DATABRICKS_CLIENT_SECRET
  • DATABRICKS_CLIENT_ID
  • DATABRICKS_HOST=<workspace_id>.cloud.databricks.com

Screenshot of Databricks Workspace Web Url
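Because authentication depends on all three variables being present, it can help to check for them before creating a client. The sketch below uses only the standard library; `REQUIRED_VARS` and `missing_auth_vars` are hypothetical names chosen for illustration.

```python
import os

# The three variables named above; this tuple is an illustrative constant.
REQUIRED_VARS = (
    "DATABRICKS_HOST",
    "DATABRICKS_CLIENT_ID",
    "DATABRICKS_CLIENT_SECRET",
)


def missing_auth_vars(env=None):
    """Return the required variable names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_auth_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All authentication variables are set.")
```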


Authenticate using a Service Principal

  • Service Principal = security identity within a cloud platform representing an application
  • Create a Service Principal
  • Assign permissions to a Service Principal
  • Generate secret for Service Principal

Image of Secrets page for Databricks Workspace Service Principal

1 https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html#step-1-create-a-service-principal

Default authentication

from databricks.sdk import WorkspaceClient
import os

os.environ['DATABRICKS_CLIENT_SECRET'] = "<Your-Client-Secret>"
os.environ['DATABRICKS_CLIENT_ID'] = "<Your-Databricks-Client-Id>"
os.environ['DATABRICKS_HOST'] = "<Your-Databricks-Host>"
w = WorkspaceClient()
  • WorkspaceClient() automatically reads the values of the three environment variables to authenticate
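The same three values can also be passed to `WorkspaceClient` explicitly as the keyword arguments `host`, `client_id`, and `client_secret`, rather than being picked up implicitly. The sketch below shows that variant; the helper names `client_kwargs_from_env` and `make_client` are hypothetical, and the SDK import is deferred so the first helper works even where the SDK is not installed.

```python
import os


def client_kwargs_from_env(env=None):
    """Map the three authentication variables to WorkspaceClient keyword arguments."""
    env = os.environ if env is None else env
    # A KeyError here means one of the required variables is not set.
    return {
        "host": env["DATABRICKS_HOST"],
        "client_id": env["DATABRICKS_CLIENT_ID"],
        "client_secret": env["DATABRICKS_CLIENT_SECRET"],
    }


def make_client(env=None):
    """Build a WorkspaceClient from explicit arguments (requires databricks-sdk)."""
    # Imported lazily so client_kwargs_from_env can be used without the SDK.
    from databricks.sdk import WorkspaceClient

    return WorkspaceClient(**client_kwargs_from_env(env))
```

Calling `make_client()` with no arguments reads from `os.environ`, so it behaves like the default authentication shown above while making the inputs visible in code.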

Interacting with our Databricks workspace

# Authenticate WorkspaceClient
from databricks.sdk import WorkspaceClient
import os
os.environ['DATABRICKS_CLIENT_SECRET'] = "<Your-Client-Secret>"
os.environ['DATABRICKS_CLIENT_ID'] = "<Your-Databricks-Client-Id>"
os.environ['DATABRICKS_HOST'] = "<Your-Databricks-Host>"
w = WorkspaceClient()

# Create a cluster
cluster = w.clusters.create(cluster_name="Test-Cluster")

# List clusters and print each cluster's name
for c in w.clusters.list():
    print(f"Cluster Name: {c.cluster_name}")

ClusterId=0113-13328-woj98c32
1 https://databricks-sdk-py.readthedocs.io/en/latest/clients/workspace.html
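Once clusters can be listed, a common next step is looking one up by name. The sketch below is a hypothetical helper, `cluster_ids_by_name`, that works on any iterable of objects exposing `cluster_name` and `cluster_id` attributes, as the objects yielded by `w.clusters.list()` do. The sample data uses stand-in objects with made-up IDs so the example runs without a workspace.

```python
from types import SimpleNamespace


def cluster_ids_by_name(clusters, name):
    """Return the cluster_id of every cluster whose cluster_name equals `name`."""
    return [c.cluster_id for c in clusters if c.cluster_name == name]


# Stand-in objects for illustration only; the IDs are invented.
sample = [
    SimpleNamespace(cluster_id="id-1", cluster_name="Test-Cluster"),
    SimpleNamespace(cluster_id="id-2", cluster_name="Other-Cluster"),
]

print(cluster_ids_by_name(sample, "Test-Cluster"))  # ['id-1']
```

With an authenticated client, the same helper could be called as `cluster_ids_by_name(w.clusters.list(), "Test-Cluster")`.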

Let's practice!
