Basic Column Expectations

Introduction to Data Quality with Great Expectations

Davina Moossazadeh

Data Scientist

The Shein Footwear Dataset

The first five rows of Kaggle's Shein Footwear Dataset¹, loaded into pandas. The DataFrame has the following columns: "name", "link", "price_usd", "mark_price_usd", "star_rating", "colour", "seller_name", "review_count", "sku_id", and "hero_image".

¹ https://www.kaggle.com/datasets/atharvataras/shein-footwear-dataset

Row-level Expectations

  • Row-level Expectations are applied to each row independently
    • by default, they succeed only if the condition holds for every row
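The row-level rule above can be sketched in plain Python: a predicate is evaluated on each value on its own, and the check succeeds only if no value fails. This is a minimal illustration of the idea, not how Great Expectations implements it; the helper `check_each_row`, its result keys, and the sample prices are all hypothetical.

```python
from typing import Any, Callable, Sequence


def check_each_row(values: Sequence[Any], predicate: Callable[[Any], bool]) -> dict:
    """Apply `predicate` to every value independently and collect failures.

    Hypothetical helper that mimics the idea of a row-level Expectation;
    it is not part of the Great Expectations API.
    """
    failing_rows = [i for i, v in enumerate(values) if not predicate(v)]
    return {
        # Success requires the condition to hold for every single row
        "success": len(failing_rows) == 0,
        "failing_rows": failing_rows,
    }


# Made-up prices, not taken from the Shein dataset
prices = [12.5, 8.0, -1.0, 20.0]
print(check_each_row(prices, lambda p: p >= 0))
# {'success': False, 'failing_rows': [2]}
```

A single failing row (index 2) is enough to make the whole check fail, because each row is judged on its own and all of them must pass.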

Row-level Expectations

Missingness Expectation

import great_expectations as gx

gx.expectations.ExpectColumnValuesToNotBeNull(
    column="colour"
)
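What the missingness Expectation checks can be approximated in plain Python: every value in the column must be present. In this sketch, `None` and float `NaN` both count as missing, an assumption chosen to resemble pandas; the `colour` values are made up and the helper names are hypothetical.

```python
import math
from typing import Any, Sequence


def is_missing(value: Any) -> bool:
    # Assumption: None and float NaN both count as missing, as in pandas
    return value is None or (isinstance(value, float) and math.isnan(value))


def not_null_check(values: Sequence[Any]) -> bool:
    """Succeeds only if no row in the column is missing."""
    return not any(is_missing(v) for v in values)


# Made-up "colour" values for illustration
colour = ["Black", "Khaki", None, "Grey"]
print(not_null_check(colour))             # False: row 2 is missing
print(not_null_check(["Black", "Grey"]))  # True
```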

Type Expectation

gx.expectations.ExpectColumnValuesToBeOfType(
    column="review_count", type_="str"
)
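The type check can be sketched the same way: each value is tested against the expected type on its own. The mapping from the name `"str"` to Python's `str` is an assumption made for this illustration; Great Expectations' own type resolution depends on the data backend and may differ. The `review_count` values and the helper names are hypothetical.

```python
from typing import Any, List, Sequence

# Hypothetical lookup from type names to Python types
TYPE_NAMES = {"str": str, "int": int, "float": float}


def type_check(values: Sequence[Any], type_: str) -> List[int]:
    """Return the indices of rows whose value is not of the named type."""
    expected = TYPE_NAMES[type_]
    return [i for i, v in enumerate(values) if not isinstance(v, expected)]


# Made-up review counts: one was stored as an int instead of a string
review_count = ["100+", "35", 12, "1000+"]
failing = type_check(review_count, type_="str")
print(failing)            # [2]
print(len(failing) == 0)  # False: the Expectation would fail
```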

Aggregate-level Expectations: distinct values

Distinct values Expectation

gx.expectations.ExpectColumnDistinctValuesToEqualSet(
    column="seller_name", value_set={"Womens Shoes"}
)
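Unlike the row-level checks, this Expectation looks at the column as a whole: its set of distinct values must match `value_set` exactly, with no members missing and no extras. A plain-Python sketch follows; ignoring `None` is an assumption of the sketch, and the seller names other than "Womens Shoes" are made up.

```python
from typing import Any, Iterable, Set


def distinct_values_equal_set(values: Iterable[Any], value_set: Set[Any]) -> bool:
    """Succeeds only if the column's distinct values are exactly value_set."""
    # Assumption for this sketch: missing values (None) are left out
    distinct = {v for v in values if v is not None}
    return distinct == set(value_set)


sellers = ["Womens Shoes", "Womens Shoes", "Womens Shoes"]
print(distinct_values_equal_set(sellers, {"Womens Shoes"}))                   # True
print(distinct_values_equal_set(sellers + ["Mens Shoes"], {"Womens Shoes"}))  # False: extra value
print(distinct_values_equal_set(sellers, {"Womens Shoes", "Kids Shoes"}))     # False: missing value
```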

Aggregate-level Expectations: unique value count

Unique value count Expectation

gx.expectations.ExpectColumnUniqueValueCountToBeBetween(
    column="review_count", min_value=5, max_value=101
)
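This Expectation counts how many distinct values the column holds and checks that the count lies between `min_value` and `max_value`. The sketch below treats both bounds as inclusive, skips `None`, and reads a bound of `None` as "no limit"; all three are assumptions for this illustration. The review counts are made up.

```python
from typing import Any, Iterable, Optional


def unique_value_count_between(
    values: Iterable[Any], min_value: Optional[int], max_value: Optional[int]
) -> bool:
    """Check the number of distinct non-missing values against inclusive bounds."""
    count = len({v for v in values if v is not None})
    # A bound of None means "no limit" on that side (assumption)
    if min_value is not None and count < min_value:
        return False
    if max_value is not None and count > max_value:
        return False
    return True


review_count = ["100+", "35", "100+", "7", "1000+", "2", "35"]
print(unique_value_count_between(review_count, min_value=5, max_value=101))  # True: 5 distinct values
```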

Aggregate-level Expectations: uniqueness

Uniqueness Expectation

gx.expectations.ExpectColumnValuesToBeUnique(
    column="sku_id"
)
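Uniqueness suits an identifier such as `sku_id`: no value may appear more than once. Because a duplicate can only be spotted by comparing rows with each other, the sketch below counts values across the whole column first. The SKU values are made up, the helper name is hypothetical, and skipping `None` is an assumption.

```python
from collections import Counter
from typing import Any, Iterable, List


def duplicated_values(values: Iterable[Any]) -> List[Any]:
    """Return every non-missing value that occurs more than once."""
    counts = Counter(v for v in values if v is not None)
    return [v for v, n in counts.items() if n > 1]


sku_id = ["sku1", "sku2", "sku3", "sku2"]
dupes = duplicated_values(sku_id)
print(dupes)      # ['sku2']
print(not dupes)  # False: the column is not unique
```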

Aggregate-level Expectations: mode

Mode Expectation

gx.expectations.ExpectColumnMostCommonValueToBeInSet(
    column="colour", value_set={"Khaki", "Purple", "Grey"}
)
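The mode Expectation finds the column's most frequent value and checks that it belongs to `value_set`. When several values tie for most common, this sketch requires every tied value to be in the set; that tie rule is an assumption made here, and Great Expectations handles ties through its own `ties_okay` option. An empty column fails in this sketch, also an assumption. The `colour` values are made up.

```python
from collections import Counter
from typing import Any, Iterable, Set


def most_common_in_set(values: Iterable[Any], value_set: Set[Any]) -> bool:
    """Check that the most frequent non-missing value belongs to value_set."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return False  # assumption: an empty column has no mode, so the check fails
    top = max(counts.values())
    modes = {v for v, n in counts.items() if n == top}
    # Assumption: with a tie, every tied value must be in the set
    return modes <= set(value_set)


colour = ["Khaki", "Black", "Khaki", "Grey", "Khaki"]
allowed = {"Khaki", "Purple", "Grey"}
print(most_common_in_set(colour, allowed))                  # True: "Khaki" appears 3 times
print(most_common_in_set(colour + ["Black"] * 3, allowed))  # False: "Black" now appears 4 times
```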

Cheat sheet

Row-level Expectations:

ExpectColumnValuesToNotBeNull(
    column: str
)
ExpectColumnValuesToBeOfType(
    column: str, type_: str
)

Aggregate-level Expectations:

ExpectColumnDistinctValuesToEqualSet(
    column: str, value_set: set
)
ExpectColumnUniqueValueCountToBeBetween(
    column: str, 
    min_value: int, max_value: int
)
ExpectColumnValuesToBeUnique(column: str)
ExpectColumnMostCommonValueToBeInSet(
    column: str, value_set: set
)

Let's practice!
