Job creation and management

Databricks with the Python SDK

Avi Steinberg

Senior Software Engineer

Databricks Notebook path

  • Databricks Jobs can run code written in a Databricks Notebook
  • WorkspaceClient.current_user.me().user_name gets the username of the logged in user
  • The notebook path for a notebook is /Users/${username}/${notebook_name}
from databricks.sdk import WorkspaceClient

# Assume there is a notebook in our workspace called My_Notebook
w = WorkspaceClient()
notebook_path = f'/Users/{w.current_user.me().user_name}/My_Notebook'
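The path rule above can be wrapped in a small helper. This is a minimal sketch, not part of the SDK: build_notebook_path is a hypothetical name, the email address is a placeholder, and the helper assumes the notebook sits directly in the user's home folder.

```python
import posixpath


def build_notebook_path(user_name: str, notebook_name: str) -> str:
    """Return /Users/<user_name>/<notebook_name> (hypothetical helper)."""
    # Workspace paths use forward slashes, so posixpath is used on every OS.
    return posixpath.join("/Users", user_name, notebook_name)


# With a live client, the user name would come from w.current_user.me().user_name.
print(build_notebook_path("someone@example.com", "My_Notebook"))
```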

Creating a Databricks job

WorkspaceClient.jobs.create(createParams)

Pass the name and tasks parameters to describe the job being created.

createParams:
  name: str
  tasks: List[Task]
Task:
  description: str
  notebook_task: NotebookTask
  task_key: str
NotebookTask:
  notebook_path: str
1 https://databricks-sdk-py.readthedocs.io/en/latest/workspace/jobs/jobs.html
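To make the nesting concrete, here is the same structure written as a plain dictionary. This is only an illustration of the shape: the field names come from the outline above, the values are placeholders, and the SDK call itself expects jobs.Task and jobs.NotebookTask objects rather than dictionaries.

```python
import json

# Placeholder values; only the nesting mirrors the outline above.
job_spec = {
    "name": "sdk-dc-project-task",
    "tasks": [
        {
            "task_key": "my-key",
            "description": "create_notebook_test",
            "notebook_task": {
                "notebook_path": "/Users/someone@example.com/My_Notebook",
            },
        }
    ],
}

# One job holds a list of tasks; each task points at one notebook.
print(json.dumps(job_spec, indent=2))
```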

Creating and running a Databricks job

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

# Create notebook path pointing to notebook called "My_Notebook"
w = WorkspaceClient()
notebook_path = f'/Users/{w.current_user.me().user_name}/My_Notebook'

# Create a job that runs My_Notebook
new_job = w.jobs.create(
    name='sdk-dc-project-task',
    tasks=[jobs.Task(description="create_notebook_test",
                     notebook_task=jobs.NotebookTask(notebook_path=notebook_path),
                     task_key="my-key")])
print(f"New Job Id={new_job.job_id}")
w.jobs.run_now(job_id=new_job.job_id).result()  # Run the created job and wait for it to finish
1 https://docs.databricks.com/en/dev-tools/sdk-python.html
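The value returned by .result() can be inspected once the run finishes. The sketch below assumes the returned run exposes state.result_state, as the SDK's Run object does in the versions I am aware of. run_job_and_get_result and make_fake_client are hypothetical names, and the fake client exists only so the snippet runs without a workspace.

```python
from types import SimpleNamespace


def run_job_and_get_result(w, job_id: int):
    """Trigger a job, wait for it to finish, and return its result state."""
    run = w.jobs.run_now(job_id=job_id).result()  # blocks until the run ends
    return run.state.result_state


def make_fake_client(result_state: str):
    """Stand-in for WorkspaceClient, for illustration only."""
    finished_run = SimpleNamespace(state=SimpleNamespace(result_state=result_state))
    waiter = SimpleNamespace(result=lambda: finished_run)
    return SimpleNamespace(jobs=SimpleNamespace(run_now=lambda job_id: waiter))


fake = make_fake_client("SUCCESS")
print(run_job_and_get_result(fake, job_id=123))
```

With a real WorkspaceClient, pass w and new_job.job_id instead of the fake client.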

Listing Databricks jobs

from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
jobs = w.jobs.list()
for job in jobs:
    print(f"JobId={job.job_id}")
JobId=888763141802192
JobId=37050453972815
JobId=681550316180975
JobId=629994089852037
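Job IDs alone are hard to read, so it can help to group them by job name. This sketch assumes each listed job carries its name in settings.name. job_ids_by_name is a hypothetical helper, and the fake client and job names are placeholders so the snippet runs offline.

```python
from types import SimpleNamespace


def job_ids_by_name(w) -> dict:
    """Map each job name to the list of job IDs that use it."""
    index = {}
    for job in w.jobs.list():
        # settings may be missing on a job, so fall back to None as the name.
        name = job.settings.name if job.settings else None
        index.setdefault(name, []).append(job.job_id)
    return index


def fake_job(job_id: int, name: str):
    return SimpleNamespace(job_id=job_id, settings=SimpleNamespace(name=name))


# Stand-in for WorkspaceClient, for illustration only.
fake = SimpleNamespace(jobs=SimpleNamespace(list=lambda: [
    fake_job(888763141802192, "sdk-dc-project-task"),
    fake_job(37050453972815, "sdk-dc-project-task"),
    fake_job(681550316180975, "nightly-report"),
]))

for name, ids in job_ids_by_name(fake).items():
    print(f"{name}: {ids}")
```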

Deleting a Databricks job

WorkspaceClient.jobs.delete(job_id: int)
1 https://databricks-sdk-py.readthedocs.io/en/latest/workspace/jobs/jobs.html

Deleting a Databricks job

# Prerequisites: the notebook already exists in the workspace, and
# cluster_id holds the ID of an existing cluster
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()
notebook_path = f'/Users/{w.current_user.me().user_name}/My_Notebook'
# Create a job that runs My_Notebook
new_job = w.jobs.create(name='sdk-dc-project-task',
                        tasks=[
                          jobs.Task(description="create_notebook_test",
                                    existing_cluster_id=cluster_id,
                                    notebook_task=jobs.NotebookTask(notebook_path=notebook_path),
                                    task_key="my-key")
                        ])

# Delete the job that was just created
w.jobs.delete(job_id=new_job.job_id)
1 https://databricks-sdk-py.readthedocs.io/en/latest/workspace/jobs/jobs.html
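Job names are not unique, so running the create call several times leaves several jobs with the same name. A cleanup pass can combine listing and deleting. This is a sketch under assumptions: delete_jobs_named and FakeJobsClient are hypothetical names, and the fake stands in for w.jobs so the snippet runs without a workspace.

```python
from types import SimpleNamespace


def delete_jobs_named(w, name: str) -> list:
    """Delete every job whose name matches and return the deleted job IDs."""
    deleted = []
    # Materialize the listing first so deletion does not disturb iteration.
    for job in list(w.jobs.list()):
        if job.settings and job.settings.name == name:
            w.jobs.delete(job_id=job.job_id)
            deleted.append(job.job_id)
    return deleted


class FakeJobsClient:
    """Stand-in for w.jobs, for illustration only."""

    def __init__(self, named_jobs: dict):
        self._jobs = dict(named_jobs)  # job_id -> job name

    def list(self):
        return [SimpleNamespace(job_id=i, settings=SimpleNamespace(name=n))
                for i, n in self._jobs.items()]

    def delete(self, job_id: int):
        del self._jobs[job_id]


fake = SimpleNamespace(jobs=FakeJobsClient({
    101: "sdk-dc-project-task",
    102: "nightly-report",
    103: "sdk-dc-project-task",
}))
print(delete_jobs_named(fake, "sdk-dc-project-task"))
```

Deletion cannot be undone from this code, so it is worth printing the matching IDs and checking them before calling delete against a real workspace.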

Cron syntax

Cron Expression to Schedule Job at:

  1. 3:30:00 AM Every Day = 0 30 3 * * ?
  2. 2:45:00 PM Every Day = 0 45 14 * * ?
1 https://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html
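Quartz cron fields run in the order seconds, minutes, hours, day of month, month, day of week, and hours use the 24-hour clock. The helper below builds the daily expressions shown above. daily_quartz_cron is a hypothetical name and only covers the "every day at a fixed time" case.

```python
def daily_quartz_cron(hour: int, minute: int = 0, second: int = 0) -> str:
    """Build a Quartz cron expression that fires once a day at the given time."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59 and 0 <= second <= 59):
        raise ValueError("hour must be 0-23, minute and second must be 0-59")
    # "*" for day of month and month means every day of every month;
    # "?" leaves the day of week unspecified.
    return f"{second} {minute} {hour} * * ?"


print(daily_quartz_cron(3, 30))   # 3:30:00 AM every day
print(daily_quartz_cron(14, 45))  # 2:45:00 PM every day
```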

Scheduling a job

We can schedule a job to run a notebook every day at 3 AM.

# Create a job that runs My_Notebook every day at 3:00 AM
cron_expression = "0 0 3 * * ?"

created_job = w.jobs.create(
    name='sdk-dc-project-task',
    tasks=[jobs.Task(description="test",
                     notebook_task=jobs.NotebookTask(notebook_path=notebook_path),
                     task_key="my-key")],
    timeout_seconds=3600,
    schedule=jobs.CronSchedule(quartz_cron_expression=cron_expression,
                               timezone_id="America/New_York"))
1 https://databricks-sdk-py.readthedocs.io/en/latest/workspace/jobs/jobs.html

Let's practice!

