Basic operations with Databricks Python SDK

Databricks with the Python SDK

Avi Steinberg

Senior Software Engineer

Databricks Clusters API

What is a Databricks Cluster?

  • A set of cloud compute resources and configurations
  • Typically used to run data-intensive workloads
  • Shared by multiple users for collaborative analysis

Examples:

  • Production ETL pipelines
  • Streaming analytics
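The "resources and configurations" that make up a cluster can be made concrete as a request body. Below is a minimal sketch, assuming the field names of the Clusters API create endpoint (POST /api/2.0/clusters/create). The helper build_cluster_spec is hypothetical, and the runtime key 15.4.x-scala2.12 and AWS node type i3.xlarge are example values that may not be available in your workspace.

```python
from typing import Any, Optional


def build_cluster_spec(
    cluster_name: str,
    spark_version: str,
    node_type_id: str,
    num_workers: Optional[int] = None,
    autoscale: Optional[tuple[int, int]] = None,
    autotermination_minutes: int = 30,
) -> dict[str, Any]:
    """Build a Clusters API create payload with either a fixed size or autoscaling."""
    # A cluster is either fixed-size or autoscaling, never both
    if (num_workers is None) == (autoscale is None):
        raise ValueError("Specify exactly one of num_workers or autoscale")

    spec: dict[str, Any] = {
        "cluster_name": cluster_name,
        "spark_version": spark_version,  # Databricks Runtime version key
        "node_type_id": node_type_id,  # cloud-specific instance type
        "autotermination_minutes": autotermination_minutes,  # stop when idle
    }
    if autoscale is not None:
        min_workers, max_workers = autoscale
        spec["autoscale"] = {"min_workers": min_workers, "max_workers": max_workers}
    else:
        spec["num_workers"] = num_workers
    return spec


# Example values only: use a runtime and node type available in your workspace
spec = build_cluster_spec(
    "etl-cluster", "15.4.x-scala2.12", "i3.xlarge", autoscale=(1, 4)
)
print(spec)
```

Autoscaling suits shared, bursty workloads such as collaborative analysis, while a fixed num_workers keeps the cost of a production ETL run predictable.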

Databricks Cluster Images

The Databricks SDK Clusters API allows you to create, start, edit, list and delete clusters
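These lifecycle operations map onto methods of w.clusters. The sketch below assumes w is a WorkspaceClient and walks a single cluster through create, edit, terminate, start, and permanent delete. The function name cluster_lifecycle and the 15-minute auto-termination are illustrative choices, and running it against a real workspace launches billable compute.

```python
from typing import Any


def cluster_lifecycle(w: Any, cluster_name: str, spark_version: str, node_type_id: str) -> str:
    """Walk one cluster through create -> edit -> terminate -> start -> permanent delete.

    w is expected to be a databricks.sdk.WorkspaceClient. Every step except the
    final permanent delete returns a waiter whose .result() blocks until the
    cluster reaches the target state.
    """
    # Settings shared by create and edit; edit replaces the whole configuration,
    # so the same fields are sent again
    settings = {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "autotermination_minutes": 15,
    }

    # Create a one-worker cluster and wait until it is running
    details = w.clusters.create(num_workers=1, **settings).result()
    cluster_id = details.cluster_id

    # Resize to two workers (editing a running cluster restarts it)
    w.clusters.edit(cluster_id=cluster_id, num_workers=2, **settings).result()

    # delete() terminates the cluster but keeps its configuration
    w.clusters.delete(cluster_id=cluster_id).result()

    # Start the terminated cluster again from its saved configuration
    w.clusters.start(cluster_id=cluster_id).result()

    # Remove the cluster and its configuration permanently
    w.clusters.permanent_delete(cluster_id=cluster_id)
    return cluster_id
```

Note the naming: in the SDK, clusters.delete() terminates a cluster, and permanent_delete() is what removes it. With the SDK installed, the function could be called as cluster_lifecycle(w, "sdk-demo", w.clusters.select_spark_version(latest=True), w.clusters.select_node_type(local_disk=True)), using the SDK's helpers for choosing a runtime version and node type.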

Listing clusters

Let's walk through an example of listing the clusters in a Databricks workspace

from databricks.sdk import WorkspaceClient
w = WorkspaceClient()

# List every cluster in the workspace and print its ID
clusters = w.clusters.list()
for cluster in clusters:
    print(f"ClusterId={cluster.cluster_id}")
ClusterId=0113-13328-woj98c32
1. https://databricks-sdk-py.readthedocs.io/en/stable/workspace/compute/clusters.html#databricks.sdk.service.compute.ClustersExt.list
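Printing IDs is only a start: each listed cluster object also carries its state. A small sketch, assuming the SDK's compute State enum (whose .value is a string such as "RUNNING" or "TERMINATED"), that groups the listed clusters by state; the helper name cluster_ids_by_state is hypothetical.

```python
from collections import defaultdict
from typing import Any, Iterable


def cluster_ids_by_state(clusters: Iterable[Any]) -> dict[str, list[str]]:
    """Group cluster IDs by lifecycle state (e.g. RUNNING, TERMINATED)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for cluster in clusters:
        # In the SDK, cluster.state is a compute.State enum; .value is its string name
        state = cluster.state.value if cluster.state is not None else "UNKNOWN"
        groups[state].append(cluster.cluster_id)
    return dict(groups)
```

It would be called as cluster_ids_by_state(w.clusters.list()). Since list() returns a lazy iterator that pages through results, wrap it in list() first if you need to iterate over the clusters more than once.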

Databricks Jobs API

What is a Databricks job?

  • Run code stored in a Databricks notebook on a Databricks cluster with scalable resources
  • Can be a single task or a large, multi-task workflow with complex dependencies
  • Can be scheduled to run at specific times using Quartz cron syntax

The Jobs API allows you to create, edit, and delete jobs

1. https://docs.databricks.com/api/workspace/jobs
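To make the multi-task and scheduling points concrete, here is a sketch of a job definition as a plain request body, assuming the field names of the Jobs API create endpoint (POST /api/2.1/jobs/create): each task has a task_key, a notebook_task, and optional depends_on entries, and schedule takes a Quartz cron expression. The helper build_job_spec, the notebook paths, and the cluster ID are hypothetical.

```python
import json
from typing import Any


def build_job_spec(
    name: str,
    notebooks: dict[str, str],
    depends_on: dict[str, list[str]],
    cluster_id: str,
    quartz_cron: str,
    timezone_id: str = "UTC",
) -> dict[str, Any]:
    """Build a Jobs API create payload for a scheduled multi-task notebook workflow.

    notebooks maps task_key -> notebook path;
    depends_on maps task_key -> list of upstream task_keys.
    """
    # Reject dependencies on tasks that are not part of this job
    unknown = {up for ups in depends_on.values() for up in ups} - set(notebooks)
    if unknown:
        raise ValueError(f"depends_on references unknown tasks: {sorted(unknown)}")

    tasks = []
    for task_key, path in notebooks.items():
        task: dict[str, Any] = {
            "task_key": task_key,
            "notebook_task": {"notebook_path": path},
            "existing_cluster_id": cluster_id,  # run on an already-created cluster
        }
        upstream = depends_on.get(task_key, [])
        if upstream:
            task["depends_on"] = [{"task_key": key} for key in upstream]
        tasks.append(task)

    return {
        "name": name,
        "tasks": tasks,
        "schedule": {
            "quartz_cron_expression": quartz_cron,
            "timezone_id": timezone_id,
            "pause_status": "UNPAUSED",
        },
    }


# Hypothetical paths and cluster ID; Quartz "0 0 6 * * ?" means 06:00 every day
spec = build_job_spec(
    name="daily-etl",
    notebooks={
        "ingest": "/Shared/etl/ingest",
        "transform": "/Shared/etl/transform",
        "report": "/Shared/etl/report",
    },
    depends_on={"transform": ["ingest"], "report": ["transform"]},
    cluster_id="<cluster-id>",
    quartz_cron="0 0 6 * * ?",
)
print(json.dumps(spec, indent=2))
```

Quartz cron expressions start with a seconds field, so they have one more field than classic Unix cron. With the Python SDK, w.jobs.create() takes typed objects from databricks.sdk.service.jobs (such as Task, NotebookTask, TaskDependency, and CronSchedule) rather than plain dictionaries, but the field names correspond.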

Listing jobs

Use the .list() method to list the jobs in the authenticated user's workspace

from databricks.sdk import WorkspaceClient
w = WorkspaceClient()

# List every job visible to the authenticated user and print its ID
jobs = w.jobs.list()
for job in jobs:
    print(f"JobId={job.job_id}")
JobId=888763141802192
JobId=37050453972815
JobId=681550316180975
JobId=629994089852037
1. https://databricks-sdk-py.readthedocs.io/en/stable/workspace/jobs/jobs.html
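Job IDs alone are hard to read; each listed job also carries its settings, including its name. The sketch below, assuming w is a WorkspaceClient, maps job IDs to names and then triggers a run of a job looked up by name. It relies on the name filter of w.jobs.list() and on w.jobs.run_now() returning a waiter whose .result() blocks until the run finishes; the helper names job_names and run_job_by_name are hypothetical.

```python
from typing import Any, Optional


def job_names(w: Any, name: Optional[str] = None) -> dict[int, Optional[str]]:
    """Map job_id -> job name for jobs visible to the authenticated user."""
    # name= asks the API to filter by exact (case-insensitive) job name
    jobs = w.jobs.list(name=name) if name else w.jobs.list()
    return {
        job.job_id: job.settings.name if job.settings is not None else None
        for job in jobs
    }


def run_job_by_name(w: Any, job_name: str) -> Any:
    """Trigger a run of the single job called job_name and wait for it to finish."""
    matches = list(job_names(w, name=job_name))
    if len(matches) != 1:
        raise LookupError(f"Expected one job named {job_name!r}, found {len(matches)}")
    # run_now returns a waiter; .result() blocks until the run terminates
    return w.jobs.run_now(job_id=matches[0]).result()
```

Job names are not required to be unique, which is why the lookup refuses to guess when several jobs share a name. Inspecting the returned run's state (for example run.state.result_state) shows how the run finished.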

Databricks Jobs Dashboard

The resources for a workspace can be visualized online at https://<workspace-deployment-name>.cloud.databricks.com

Databricks Workspace Dashboard (UI)

Databricks Notebooks

Run Python code in a Databricks notebook

Screenshot of Databricks Notebook

Let's try it out!
