Querying with the Databricks SQL agent

Databricks with the Python SDK

Avi Steinberg

Senior Software Engineer

Databricks Unity Catalog

  • Primary method to store data in Databricks
  • Contains schemas, which can contain multiple tables
  • Create tables in schema from a wide variety of sources
  • Multiple open-source schemas in the samples catalog

Databricks Unity Catalog

1 https://docs.databricks.com/aws/en/catalogs/
Databricks with the Python SDK

LangChain

  • Library that allows us to build and deploy LLM applications
  • Allows you to create AI agents that use SQL to query data LangChain
Databricks with the Python SDK

Python LangChain dependencies

Install Python package dependencies:

pip install --upgrade databricks-langchain langchain-community langchain 
                      databricks-sql-connector databricks-sqlalchemy

Import libraries:

from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.sql_database import SQLDatabase
from databricks_langchain import ChatDatabricks
Databricks with the Python SDK

Databricks SQL Warehouse ID

Databricks SQL Warehouse Connection Details

Databricks with the Python SDK

Authenticate LangChain to Databricks Workspace

Export environment variables:

os.environ['DATABRICKS_TOKEN'] = '<Your-Access-Token>'
os.environ["DATABRICKS_HOST"] = "<your-workspace-id>.cloud.databricks.com"
Databricks with the Python SDK

Databricks SQL agent

  • Ask an AI agent questions about data in a Databricks catalog schema
  • AI agent can answer questions about the data that require SQL
  • Specify verbose=True to have the AI agent show its work

LangChain Query output

Databricks with the Python SDK

Use Databricks SQL agent to query Catalog

db = SQLDatabase.from_databricks(
    catalog="samples", 
    schema="nyctaxi",
    warehouse_id=warehouse_id)
llm = ChatDatabricks(
    endpoint="databricks-meta-llama-3-3-70b-instruct",
    temperature=0.1,
    max_tokens=100)
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
agent = create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)
# Query the Databricks SQL Agent
result = agent.run("What's the time and distance of the longest trip?")
print(result)
1 https://docs.databricks.com/aws/en/large-language-models/langchain#databricks-sql-agent
Databricks with the Python SDK

Create a LangChain SQL database from a Databricks Catalog

db = SQLDatabase.from_databricks(
    catalog="samples", 
    schema="nyctaxi",
    warehouse_id=<your-warehouse-id>
)
1 api.python.langchain.com/en/latest/utilities/langchain_community.utilities.sql_database.SQLDatabase.html
Databricks with the Python SDK

Create a LangChain LLM using a foundational model

llm = ChatDatabricks(

endpoint="databricks-meta-llama-3-3-70b-instruct",
temperature=0.1, max_tokens=100, )
  • temperature: a float between 0 and 1 used to specify the randomness in model responses
  • max_tokens: specify the maximum number of tokens in the response
1 https://docs.databricks.com/aws/en/large-language-models/langchain
Databricks with the Python SDK

Create a LangChain SQL database toolkit

toolkit = SQLDatabaseToolkit(db=db, llm=llm)
1 https://docs.databricks.com/aws/en/large-language-models/langchain
Databricks with the Python SDK

Create a Databricks SQL Agent

agent = create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)

result = agent.run("What's the time and distance of the longest trip?") display(result)
The average trip takes approximately 15 minutes.
1 https://docs.databricks.com/aws/en/large-language-models/langchain
Databricks with the Python SDK

Let's query Databricks data!

Databricks with the Python SDK

Preparing Video For Download...