Benchmarking and continuous improvement

Introduction to Databricks Genie

Gang Wang

Senior Data Scientist

What is benchmarking

Gold Standard questions and answers

  • Calculates accuracy by checking Genie's answers against the expected answers
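
As a rough illustration of how accuracy falls out of a set of Gold Standard questions, the sketch below scores a handful of cases by exact match. The `BenchmarkCase` class, the sample questions, and the exact-match rule are all assumptions made for this example; they are not how Genie itself scores a benchmark.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkCase:
    """One Gold Standard question (hypothetical structure for illustration)."""
    question: str
    expected_answer: str
    actual_answer: str


def accuracy(cases):
    """Return the share of cases whose actual answer matches the expected one."""
    if not cases:
        return 0.0
    passed = sum(
        1 for c in cases
        if c.actual_answer.strip() == c.expected_answer.strip()
    )
    return passed / len(cases)


# Made-up sample data: three passes and one failure.
cases = [
    BenchmarkCase("How many orders were placed?", "1200", "1200"),
    BenchmarkCase("Total sales by city?", "Austin: 50", "Austin: 50"),
    BenchmarkCase("Total revenue by municipality?", "Austin: 50", ""),
    BenchmarkCase("Average order value?", "41.5", "41.5"),
]

print(f"Accuracy: {accuracy(cases):.0%}")  # Accuracy: 75%
```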

Benchmarking frequency

  • 10-20 Gold Standard questions covering simple counts to complex Trusted KPIs
  • During development: run daily or after every significant curation change
  • After schema changes: run the full suite immediately whenever you add new tables or columns
  • In production: run weekly or monthly as a health check
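
The cadence rules above can be written down as a small decision helper. This is only a sketch: the function name `benchmark_cadence`, its parameters, and the returned strings are hypothetical and are not part of any Genie API.

```python
def benchmark_cadence(stage, schema_changed=False, curation_changed=False):
    """Suggest when to run the benchmark, following the slide's rules of thumb."""
    # A schema change (new tables or columns) always triggers the full suite.
    if schema_changed:
        return "run the full suite immediately"
    if stage == "development":
        # Significant curation changes warrant a run straight away.
        return "run now" if curation_changed else "run daily"
    if stage == "production":
        return "run weekly or monthly as a health check"
    raise ValueError(f"unknown stage: {stage!r}")


print(benchmark_cadence("development", curation_changed=True))
print(benchmark_cadence("production"))
print(benchmark_cadence("production", schema_changed=True))
```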

Interpreting results

  • Pass rate gives you a headline score
  • Real value is in interpreting failures
  • Example: "Total sales by city" passes, but "Total revenue by municipality" fails
  • That pattern means Genie needs synonyms, not a logic change
  • Compare Expected SQL vs Generated SQL to find the gap
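
One way to find the gap between Expected SQL and Generated SQL outside the Genie UI is a plain line diff. The sketch below uses Python's standard `difflib`; the `sql_gap` helper and the `sales` table and its columns are hypothetical, chosen only for this example.

```python
import difflib


def sql_gap(expected_sql, generated_sql):
    """Return only the lines that differ between two SQL statements."""

    def normalize(sql):
        # Collapse whitespace and lowercase so formatting differences are ignored.
        # Note: this also lowercases string literals, so it is for inspection only.
        return [
            " ".join(line.split()).lower()
            for line in sql.strip().splitlines()
            if line.strip()
        ]

    diff = difflib.ndiff(normalize(expected_sql), normalize(generated_sql))
    # ndiff prefixes removed lines with "- " and added lines with "+ ".
    return [line for line in diff if line.startswith(("- ", "+ "))]


expected = """
SELECT city, SUM(revenue) AS total_revenue
FROM sales
GROUP BY city
"""

generated = """
SELECT city,   COUNT(*) AS total_revenue
FROM sales
GROUP BY city
"""

for line in sql_gap(expected, generated):
    print(line)
```

Here the diff isolates the aggregate (`SUM(revenue)` versus `COUNT(*)`), which points to a logic problem rather than a missing synonym.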

Prioritizing fixes

Priority matrix

  • Identify failure patterns in your benchmark results
  • Jump from result to the Curation menu to add synonyms and examples
  • Prioritize by frequency and business impact
  • Fix high-frequency, high-impact failures first
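
A minimal sketch of ranking failure patterns by frequency and business impact follows. Multiplying the two into a single score is an assumption made for illustration, as are the `FailurePattern` class, the 1 to 5 impact scale, and the sample patterns.

```python
from dataclasses import dataclass


@dataclass
class FailurePattern:
    """A recurring failure seen in benchmark results (hypothetical structure)."""
    name: str
    frequency: int        # number of failing questions showing this pattern
    business_impact: int  # 1 (low) to 5 (high), assigned by the team


def prioritize(patterns):
    """Sort patterns so high-frequency, high-impact failures come first."""
    return sorted(
        patterns,
        key=lambda p: p.frequency * p.business_impact,
        reverse=True,
    )


# Made-up sample data.
patterns = [
    FailurePattern("date filter off by one", frequency=1, business_impact=2),
    FailurePattern("missing synonym", frequency=4, business_impact=3),
    FailurePattern("wrong join", frequency=2, business_impact=5),
]

for p in prioritize(patterns):
    print(p.name, p.frequency * p.business_impact)
```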

Continuous improvement cycle

Improvement cycle

Step 1: Build your test suite of Gold Standard questions.

Step 2: Execute and interpret results. What failed and why?

Step 3: Prioritize and curate fixes using synonyms, relationships, or SQL examples.

Step 4: Verify improvement by re-running the benchmark. Confirm accuracy went up before deploying.
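
Step 4 can be sketched as a comparison of two benchmark runs. The `verify_improvement` function and the result dictionaries below are hypothetical; listing regressions (questions that passed before and fail now) is an extra check added here and is not something the slide prescribes.

```python
def verify_improvement(before, after):
    """Compare two runs, each a dict mapping question -> True (pass) / False (fail)."""
    if not before or not after:
        raise ValueError("both runs must contain at least one result")
    acc_before = sum(before.values()) / len(before)
    acc_after = sum(after.values()) / len(after)
    # Questions that passed before the curation change but fail afterwards.
    regressions = sorted(
        q for q, ok in before.items() if ok and not after.get(q, False)
    )
    return {
        "accuracy_before": acc_before,
        "accuracy_after": acc_after,
        "regressions": regressions,
        # Following the slide: deploy only once accuracy has gone up.
        "deploy": acc_after > acc_before,
    }


# Made-up results for four Gold Standard questions.
before = {"q1": True, "q2": False, "q3": True, "q4": False}
after = {"q1": True, "q2": True, "q3": True, "q4": False}

print(verify_improvement(before, after))
```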

Scaling across the organization

Scaling organization

  • Centralize data and permissions with Unity Catalog
  • Use Trusted Assets for company-wide KPIs
  • Reserve Editor access for trained Data Stewards
  • Mandate thumbs-up/thumbs-down feedback
  • Run regular benchmarks to catch accuracy drift
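
To make "accuracy drift" concrete, the sketch below flags benchmark runs whose pass rate falls noticeably below a baseline. The `detect_drift` name, the 5-point tolerance, and the sample pass rates are assumptions for illustration only.

```python
def detect_drift(pass_rates, baseline, tolerance=0.05):
    """Return the indices of runs that fell more than `tolerance` below `baseline`."""
    return [
        i for i, rate in enumerate(pass_rates)
        if baseline - rate > tolerance
    ]


# Made-up weekly pass rates against a 90 percent baseline.
weekly_pass_rates = [0.90, 0.88, 0.80, 0.86]
drifted = detect_drift(weekly_pass_rates, baseline=0.90)

print(drifted)  # [2]
```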

Let's practice!
