Basic Column Expectations

Introduction to Data Quality with Great Expectations

Davina Moossazadeh

Data Scientist

The Shein Footwear Dataset

The first five rows of Kaggle's Shein Footwear Dataset, loaded into pandas. The DataFrame has the following columns: "name", "link", "price_usd", "mark_price_usd", "star_rating", "colour", "seller_name", "review_count", "sku_id", and "hero_image".

1 https://www.kaggle.com/datasets/atharvataras/shein-footwear-dataset

Row-level Expectations

  • Row-level Expectations are applied to each row independently
    • An Expectation succeeds only if the condition holds for every row
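
To make the "every row" rule concrete, here is a minimal plain-Python sketch of the idea. It is not how Great Expectations is implemented; the sample rows and the helper `row_level_check` are hypothetical and exist only for illustration.

```python
# Hypothetical sample rows; not taken from the actual dataset.
rows = [
    {"name": "Sandal A", "colour": "Khaki"},
    {"name": "Boot B", "colour": None},
    {"name": "Sneaker C", "colour": "Grey"},
]


def row_level_check(rows, condition):
    """Apply `condition` to each row independently; succeed only if all rows pass."""
    failing_rows = [i for i, row in enumerate(rows) if not condition(row)]
    return {"success": not failing_rows, "failing_rows": failing_rows}


# A not-null check on "colour": one missing value is enough to fail.
result = row_level_check(rows, lambda row: row["colour"] is not None)
print(result)
```

A single failing row (index 1 here) makes the whole check fail, which is the behaviour the bullet describes.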

Row-level Expectations

Missingness Expectation

gx.expectations.ExpectColumnValuesToNotBeNull(
    column="colour"
)

Type Expectation

gx.expectations.ExpectColumnValuesToBeOfType(
    column="review_count", type_="str"
)

Aggregate-level Expectations: distinct values

Distinct values Expectation

gx.expectations.ExpectColumnDistinctValuesToEqualSet(
    column="seller_name", value_set={"Womens Shoes"}
)
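
As a rough illustration of what this check looks at, the sketch below compares the set of distinct values in a column with an expected set, in plain Python. The sample values and the helper `distinct_values_equal_set` are hypothetical, and skipping missing values is an assumption of this sketch rather than a statement about the library.

```python
# Hypothetical column values for illustration only.
seller_name = ["Womens Shoes", "Womens Shoes", "Womens Shoes"]


def distinct_values_equal_set(values, value_set):
    """Return True when the distinct non-missing values match `value_set` exactly."""
    observed = {v for v in values if v is not None}  # assumption: skip missing values
    return observed == set(value_set)


matches = distinct_values_equal_set(seller_name, {"Womens Shoes"})
# One unexpected extra value is enough to break the equality.
extra_value = distinct_values_equal_set(seller_name + ["Mens Shoes"], {"Womens Shoes"})
print(matches, extra_value)
```

The comparison is made on the column as a whole, not row by row, which is what makes this an aggregate-level check.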

Aggregate-level Expectations: unique value count

Unique value count Expectation

gx.expectations.ExpectColumnUniqueValueCountToBeBetween(
    column="review_count", min_value=5, max_value=101
)
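
The idea can be sketched in plain Python: count the distinct values in the column and test that count against the bounds. The sample data and the helper `unique_value_count_between` are hypothetical, and treating both bounds as inclusive is an assumption of this sketch.

```python
# Hypothetical review counts for illustration only.
review_count = [12, 40, 12, 7, 99, 40, 3]


def unique_value_count_between(values, min_value, max_value):
    """Return True when the number of distinct values lies within the bounds."""
    n_unique = len(set(values))
    return min_value <= n_unique <= max_value  # assumption: inclusive bounds


n_unique = len(set(review_count))  # distinct values: 3, 7, 12, 40, 99
in_range = unique_value_count_between(review_count, min_value=5, max_value=101)
too_few = unique_value_count_between(review_count, min_value=6, max_value=101)
print(n_unique, in_range, too_few)
```

Note that the check is about how many different values appear, not about the values themselves.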

Aggregate-level Expectations: uniqueness

Uniqueness Expectation

gx.expectations.ExpectColumnValuesToBeUnique(
    column="sku_id"
)
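
A plain-Python sketch of the same idea: a column passes a uniqueness check when no value appears more than once. The sample SKU values and the helpers `duplicated_values` and `values_are_unique` are hypothetical.

```python
from collections import Counter

# Hypothetical SKU identifiers; "sk002" appears twice on purpose.
sku_id = ["sk001", "sk002", "sk003", "sk002"]


def duplicated_values(values):
    """Return the values that occur more than once, sorted for stable output."""
    counts = Counter(values)
    return sorted(value for value, count in counts.items() if count > 1)


def values_are_unique(values):
    """Return True when no value is repeated."""
    return not duplicated_values(values)


print(values_are_unique(sku_id))
print(duplicated_values(sku_id))
```

Listing the duplicated values, rather than returning only a boolean, makes it easier to see which records need attention.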

Aggregate-level Expectations: mode

Mode Expectation

gx.expectations.ExpectColumnMostCommonValueToBeInSet(
    column="colour", value_set={"Khaki", "Purple", "Grey"}
)
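
The mode check can be sketched in plain Python as follows: find the most common value in the column and test whether it belongs to the allowed set. The sample colours and both helpers are hypothetical. Requiring a single most common value (so that ties fail) is a simplifying choice of this sketch, not a claim about the library's tie handling.

```python
from collections import Counter

# Hypothetical colour values for illustration only.
colour = ["Khaki", "Black", "Khaki", "Grey", "Khaki", "Black"]


def most_common_values(values):
    """Return the set of values that share the highest frequency."""
    counts = Counter(v for v in values if v is not None)
    top = max(counts.values())
    return {value for value, count in counts.items() if count == top}


def most_common_value_in_set(values, value_set):
    """Return True when there is one mode and it belongs to `value_set`."""
    modes = most_common_values(values)
    return len(modes) == 1 and modes <= set(value_set)  # sketch choice: ties fail


modes = most_common_values(colour)
passes = most_common_value_in_set(colour, {"Khaki", "Purple", "Grey"})
fails = most_common_value_in_set(colour, {"Black"})
print(modes, passes, fails)
```

Only the most frequent value matters here: other colours may appear in the column without affecting the result.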

Cheat sheet

Row-level Expectations:

ExpectColumnValuesToNotBeNull(
    column: str
)
ExpectColumnValuesToBeOfType(
    column: str, type_: str
)

Aggregate-level Expectations:

ExpectColumnDistinctValuesToEqualSet(
    column: str, value_set: set
)
ExpectColumnUniqueValueCountToBeBetween(
    column: str, 
    min_value: int, max_value: int
)
ExpectColumnValuesToBeUnique(column: str)
ExpectColumnMostCommonValueToBeInSet(
    column: str, value_set: set
)

Let's practice!
