Conditional Expectations

Introduction to Data Quality with Great Expectations

Davina Moossazadeh

Data Scientist

What are Conditional Expectations?

Conditional Expectations - Expectations for a subset of the data

Why? Because some variables are dependent on the values of other variables

For example:

  • An Expectation that the value of a column star_rating must be 0 for all rows with a value of 0 for review_count
1 https://docs.greatexpectations.io/docs/core/customize_expectations/expectation_conditions/
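
To make the idea concrete, here is a minimal plain-Python sketch of what a Conditional Expectation checks. It only illustrates the semantics and is not how Great Expectations implements them; the sample rows and the helper name `check_conditional` are hypothetical.

```python
# Hypothetical sample data: each dict is one row of the dataset.
rows = [
    {"review_count": 0, "star_rating": 0.0},
    {"review_count": 12, "star_rating": 4.5},
    {"review_count": 0, "star_rating": 3.0},  # violates the rule
]


def check_conditional(rows, condition, expectation):
    """Apply `expectation` only to the rows for which `condition` holds.

    Returns a (success, unexpected_rows) tuple.
    """
    subset = [row for row in rows if condition(row)]
    unexpected = [row for row in subset if not expectation(row)]
    return len(unexpected) == 0, unexpected


# star_rating must be 0 wherever review_count is 0.
success, unexpected = check_conditional(
    rows,
    condition=lambda row: row["review_count"] == 0,
    expectation=lambda row: row["star_rating"] == 0,
)
print(success)     # False
print(unexpected)  # [{'review_count': 0, 'star_rating': 3.0}]
```

The row with 12 reviews is never evaluated, because it falls outside the subset selected by the condition.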

Syntax for Conditional Expectations

Dataset Expectations can be converted into Conditional Expectations with two additional arguments:

  1. row_condition
    • a boolean expression string that defines the subset of data to which to apply the Conditional Expectation
  2. condition_parser
    • a string that defines the syntax of the row_condition

The condition parser

When implementing Conditional Expectations with pandas, this argument must be set to "pandas"

expectation = gx.expectations.Expect...(
    **kwargs,
    condition_parser="pandas",
    row_condition=...
)
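
As a rough illustration of the "two additional arguments" idea, the sketch below builds the keyword arguments for a Conditional Expectation from those of an ordinary one. The helper `as_conditional` and the example values are hypothetical, and the snippet deliberately uses plain dictionaries so that it runs without Great Expectations installed.

```python
# Keyword arguments for an ordinary (unconditional) Expectation.
base_kwargs = {"column": "star_rating", "min_value": 0, "max_value": 0}


def as_conditional(expectation_kwargs, row_condition, condition_parser="pandas"):
    """Return a copy of the kwargs with the two conditional arguments added."""
    return {
        **expectation_kwargs,
        "condition_parser": condition_parser,
        "row_condition": row_condition,
    }


conditional_kwargs = as_conditional(base_kwargs, "review_count == 0")
print(conditional_kwargs)

# With Great Expectations installed, these kwargs could then be unpacked
# into an Expectation class, for example:
#   gx.expectations.ExpectColumnValuesToBeBetween(**conditional_kwargs)
```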

The row condition

pandas Syntax                               Great Expectations row_condition
df["foo"] == 'Two Two'                      'foo == "Two Two"'
df["foo"].notnull()                         'foo.notnull()'
df["foo"] <= datetime.date(2023, 3, 13)     'foo <= datetime.date(2023, 3, 13)'
(df["foo"] > 5) & (df["foo"] <= 3.14)       '(foo > 5) & (foo <= 3.14)'
                                            'foo > 5 and foo <= 3.14'
df["foo"].str.startswith("bar")             'foo.str.startswith("bar")'

The row condition

Rules
  1. Don't use single quotes inside

    • row_condition="foo=='Two Two'" (incorrect)
    • row_condition='foo=="Two Two"' (correct)
  2. Don't use line breaks inside

    • row_condition="""
      foo=="Two Two"
      """ (incorrect)
    • row_condition='foo=="Two Two"' (correct)
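
Both rules are easy to check before a condition string is handed to an Expectation. The helper below is a hypothetical pre-check written for illustration; it is not part of the Great Expectations API.

```python
def validate_row_condition(row_condition: str) -> str:
    """Return the string unchanged if it follows both rules, else raise ValueError."""
    if "'" in row_condition:
        raise ValueError("row_condition must not contain single quotes")
    if "\n" in row_condition or "\r" in row_condition:
        raise ValueError("row_condition must not contain line breaks")
    return row_condition


# Follows both rules: double quotes inside, no line breaks.
print(validate_row_condition('foo=="Two Two"'))

# Each of these breaks one rule and is rejected.
for bad in ["foo=='Two Two'", '\nfoo=="Two Two"\n']:
    try:
        validate_row_condition(bad)
    except ValueError as err:
        print(f"rejected: {err}")
```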

Example Expectation: price

No condition
import great_expectations as gx

# `batch` is assumed to be a Batch created earlier from the dataset.
expectation = gx.expectations.ExpectColumnValuesToBeBetween(
    column="price_usd",
    max_value=10,
)
validation_results = batch.validate(
    expect=expectation
)
print(validation_results.success)
False
Conditional
expectation = gx.expectations.ExpectColumnValuesToBeBetween(
    column="price_usd",
    max_value=10,
    # Only validate rows whose mark_price_usd is below 10.
    condition_parser="pandas",
    row_condition='mark_price_usd < 10',
)
validation_results = batch.validate(
    expect=expectation
)
print(validation_results.success)
True
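
One way to see why the result flips from False to True is to replay the check on a few rows in plain Python. The rows and the helper `all_at_most` below are hypothetical, since the slides do not show the underlying dataset, and the sketch only mimics the upper-bound check rather than calling Great Expectations.

```python
# Hypothetical rows; the real dataset is not shown in the slides.
rows = [
    {"price_usd": 4.99, "mark_price_usd": 6.99},
    {"price_usd": 8.50, "mark_price_usd": 9.99},
    {"price_usd": 24.00, "mark_price_usd": 30.00},
]


def all_at_most(rows, column, max_value, row_filter=None):
    """True if every (optionally filtered) row has row[column] <= max_value."""
    selected = rows if row_filter is None else [r for r in rows if row_filter(r)]
    return all(r[column] <= max_value for r in selected)


# Unconditional: the 24.00 row exceeds the limit, so the check fails.
print(all_at_most(rows, "price_usd", 10))  # False

# Conditional: only rows with mark_price_usd < 10 are evaluated.
print(all_at_most(rows, "price_usd", 10, lambda r: r["mark_price_usd"] < 10))  # True
```

In this sample the expensive row is excluded by the condition, so every remaining row satisfies the upper bound.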

Let's practice!
