Databricks Concepts
Kevin Barlow
Data Practitioner


What do I have?

What do I want?

scikit-learn, SparkML, TensorFlow, MLflow
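import pandas as pd
# pandas DataFrame
df.describe()
# Spark DataFrame
df.summary()
# Databricks notebook utility
dbutils.data.summarize(df)
# bamboolib: displaying a pandas DataFrame opens the interactive UI
import bamboolib as bam
df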

| count | category | price | shelf_loc | rating |
|---|---|---|---|---|
| 4 | horror | 12.50 | end | 3 |
| 6 | romance | 13.99 | top | 4.5 |
| 12 | sci-fi | 16.50 | bottom | 5 |
| 31 | romance | 9.99 | bottom | 3.5 |
| 23 | fantasy | 24.99 | top | 4 |
| 18 | horror | 19.99 | end | 2.5 |
| 19 | cooking | 17.50 | end | 4.5 |
| 7 | fantasy | 12.99 | top | 3 |
| 37 | sci-fi | 14.99 | bottom | 5 |

The same data with category and shelf_loc encoded as integers:

| count | category | price | shelf_loc | rating |
|---|---|---|---|---|
| 4 | 1 | 12.50 | 1 | 3 |
| 6 | 2 | 13.99 | 2 | 4.5 |
| 12 | 3 | 16.50 | 3 | 5 |
| 31 | 2 | 9.99 | 3 | 3.5 |
| 23 | 4 | 24.99 | 2 | 4 |
| 18 | 1 | 19.99 | 1 | 2.5 |
| 19 | 5 | 17.50 | 1 | 4.5 |
| 7 | 4 | 12.99 | 2 | 3 |
| 37 | 3 | 14.99 | 3 | 5 |
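The integer codes in the second table can be reproduced with a few lines of pandas. This is a minimal sketch, not necessarily the method used for the slide: it assumes codes are assigned in order of first appearance and start at 1, which matches the values shown. `pd.factorize` returns 0-based codes, so 1 is added. The name `encoded` is only illustrative.

```python
import pandas as pd

# Book inventory from the first table.
df = pd.DataFrame({
    "count": [4, 6, 12, 31, 23, 18, 19, 7, 37],
    "category": ["horror", "romance", "sci-fi", "romance", "fantasy",
                 "horror", "cooking", "fantasy", "sci-fi"],
    "price": [12.50, 13.99, 16.50, 9.99, 24.99, 19.99, 17.50, 12.99, 14.99],
    "shelf_loc": ["end", "top", "bottom", "bottom", "top",
                  "end", "end", "top", "bottom"],
    "rating": [3, 4.5, 5, 3.5, 4, 2.5, 4.5, 3, 5],
})

# Replace each categorical column with integer codes.
# Assumption: codes follow order of first appearance, shifted to start at 1.
encoded = df.copy()
for col in ["category", "shelf_loc"]:
    codes, labels = pd.factorize(df[col])
    encoded[col] = codes + 1

print(encoded)
```

Integer codes like these imply an ordering between categories, so they suit tree-based models better than linear ones.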

from databricks import feature_store
fs = feature_store.FeatureStoreClient()
fs.create_table(
    name=table_name,
    primary_keys=["wine_id"],
    df=features_df,
    schema=features_df.schema,
    description="wine features"
)