Databricks Concepts
Kevin Barlow
Data Practitioner
What do I have?
What do I want?
scikit-learn, SparkML, TensorFlow
MLflow
import pandas as pd
# pandas DF
df.describe()
# Spark DF
df.summary()
dbutils.data.summarize(df)
import bamboolib as bam
df
count | category | price | shelf_loc | rating |
---|---|---|---|---|
4 | horror | 12.50 | end | 3 |
6 | romance | 13.99 | top | 4.5 |
12 | sci-fi | 16.50 | bottom | 5 |
31 | romance | 9.99 | bottom | 3.5 |
23 | fantasy | 24.99 | top | 4 |
18 | horror | 19.99 | end | 2.5 |
19 | cooking | 17.50 | end | 4.5 |
7 | fantasy | 12.99 | top | 3 |
37 | sci-fi | 14.99 | bottom | 5 |
count | category | price | shelf_loc | rating |
---|---|---|---|---|
4 | 1 | 12.50 | 1 | 3 |
6 | 2 | 13.99 | 2 | 4.5 |
12 | 3 | 16.50 | 3 | 5 |
31 | 2 | 9.99 | 3 | 3.5 |
23 | 4 | 24.99 | 2 | 4 |
18 | 1 | 19.99 | 1 | 2.5 |
19 | 5 | 17.50 | 1 | 4.5 |
7 | 4 | 12.99 | 2 | 3 |
37 | 3 | 14.99 | 3 | 5 |
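The two tables above show the same rows before and after the text columns are replaced with integer codes. A minimal sketch of one way to reproduce that encoding with pandas follows. It assumes codes are assigned in order of first appearance, starting at 1, which matches the values shown; the names `books`, `encoded`, and `encode_by_first_appearance` are hypothetical and not from the original slides.

```python
import pandas as pd

# Rows copied from the first table above.
books = pd.DataFrame({
    "count": [4, 6, 12, 31, 23, 18, 19, 7, 37],
    "category": ["horror", "romance", "sci-fi", "romance", "fantasy",
                 "horror", "cooking", "fantasy", "sci-fi"],
    "price": [12.50, 13.99, 16.50, 9.99, 24.99, 19.99, 17.50, 12.99, 14.99],
    "shelf_loc": ["end", "top", "bottom", "bottom", "top",
                  "end", "end", "top", "bottom"],
    "rating": [3, 4.5, 5, 3.5, 4, 2.5, 4.5, 3, 5],
})


def encode_by_first_appearance(column):
    """Map each distinct value to 1, 2, 3, ... in order of first appearance."""
    # pd.factorize returns 0-based codes in order of appearance by default.
    codes, _ = pd.factorize(column)
    return codes + 1


# Encode only the text columns; numeric columns are left unchanged.
encoded = books.copy()
for name in ["category", "shelf_loc"]:
    encoded[name] = encode_by_first_appearance(books[name])

print(encoded)
```

Integer codes like these imply an ordering between categories (for example, cooking > horror), so for many models a one-hot encoding such as `pd.get_dummies` may be a better fit.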
from databricks import feature_store
fs = feature_store.FeatureStoreClient()
fs.create_table(
    name=table_name,
    primary_keys=["wine_id"],
    df=features_df,
    schema=features_df.schema,
    description="wine features"
)