Create Expectations

Introduction to Data Quality with Great Expectations

Davina Moossazadeh

Data Scientist

Expectations

Expectation - A verifiable assertion about data

  • Column Expectations
  • Shape and schema Expectations
    • schema - the blueprint of a dataset's structure
1 https://docs.greatexpectations.io/docs/reference/learn/terms/expectation
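
To make the definition concrete, here is a plain-Python sketch of an Expectation as a verifiable assertion. It does not use Great Expectations: the class `RowCountToEqual`, its `validate` method, and the sample rows are hypothetical names invented for illustration. The returned dictionary only loosely mirrors the success / observed-value shape of a GX validation result.

```python
from dataclasses import dataclass


@dataclass
class RowCountToEqual:
    """Hypothetical stand-in for an Expectation: asserts an exact row count."""

    value: int

    def validate(self, rows: list) -> dict:
        # Compare what was observed in the data with what was asserted.
        observed = len(rows)
        return {
            "success": observed == self.value,
            "result": {"observed_value": observed},
        }


# Made-up sample data, not taken from the course dataset.
rows = [{"temp": 1.6}, {"temp": 1.7}, {"temp": 1.9}]

passing = RowCountToEqual(value=3).validate(rows)
failing = RowCountToEqual(value=5).validate(rows)
print(passing)
print(failing)
```

The assertion is "verifiable" in the sense that it can be checked mechanically against any dataset and yields a clear pass or fail together with the observed value.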

The Renewable Power Generation dataset

A pandas DataFrame containing the Renewable Power Generation data, with the following columns: "Time", "Energy delta[Wh]", "GHI", "temp", "pressure", "humidity", "wind_speed", "rain_1h", "snow_1h" and "clouds_all". The DataFrame has 118,066 rows.

1 https://www.kaggle.com/datasets/pythonafroz/renewable-power-generation-and-weather-conditions

Creating an Expectation

gx.expectations.Expect...(...)

GX classes (Expectations): PascalCase

GX functions / methods: snake_case

1 https://docs.greatexpectations.io/docs/core/define_expectations/create_an_expectation/
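
The two naming styles are related: the validation output printed later in these slides identifies the `ExpectTableRowCountToEqual` class by the snake_case type `expect_table_row_count_to_equal`. The helper below is a small sketch of that conversion. `pascal_to_snake` is a hypothetical function written for this example, not part of GX, and the mapping is inferred only from the output shown in this deck.

```python
import re


def pascal_to_snake(name: str) -> str:
    """Insert an underscore before every capital letter except the first, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


snake_name = pascal_to_snake("ExpectTableRowCountToEqual")
print(snake_name)  # expect_table_row_count_to_equal
```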

Creating an Expectation

row_count_expectation = gx.expectations.ExpectTableRowCountToEqual(
    value=118000
)
validation_results = batch.validate(
    expect=row_count_expectation
)
1 https://docs.greatexpectations.io/docs/core/define_expectations/create_an_expectation/ https://docs.greatexpectations.io/docs/core/define_expectations/test_an_expectation/

Assessing an Expectation

print(validation_results)
{ 
    "success": false,
    "expectation_config": {
        "type": "expect_table_row_count_to_equal",
        "kwargs": {"batch_id": "my_pandas_datasource-my_dataframe_asset", "value": 118000},
        "meta": {},
        "rendered_content": [{"name": "atomic.prescriptive.summary", "value": {"schema": {"type": "com.superconductive.rendered.string"}, "template": "Must have exactly $value rows.", "params": {"value": {"schema": {"type": "number"}, "value": 118000}}}, "value_type": "StringValueType"}]
    },
    "result": {"observed_value": 118066},
    "meta": {},
    "exception_info": {"raised_exception": false, "exception_traceback": null, "exception_message": null},
    "rendered_content": [{"name": "atomic.diagnostic.observed_value", "value": {"schema": {"type": "com.superconductive.rendered.string"}, "template": "118066", "params": {}}, "value_type": "StringValueType"}]
}
1 https://docs.greatexpectations.io/docs/core/run_validations/run_a_validation_definition/

Assessing an Expectation

print(validation_results.describe())
{
    "expectation_type": "expect_table_row_count_to_equal",
    "success": false,
    "kwargs": {
        "batch_id": "my_pandas_datasource-my_dataframe_asset",
        "value": 118000
    },
    "result": {
        "observed_value": 118066
    }
}
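
Because the printed summary is JSON, it can be parsed to work out how far the data is from the expected value. The sketch below uses only the standard library and a copy of the text shown above. It assumes that `describe()` produces JSON text like this; with a live result, that text would come from `validation_results.describe()` rather than a literal.

```python
import json

# Text copied from the describe() output shown on the slide.
described = """
{
    "expectation_type": "expect_table_row_count_to_equal",
    "success": false,
    "kwargs": {
        "batch_id": "my_pandas_datasource-my_dataframe_asset",
        "value": 118000
    },
    "result": {
        "observed_value": 118066
    }
}
"""

summary = json.loads(described)
expected = summary["kwargs"]["value"]
observed = summary["result"]["observed_value"]
difference = observed - expected

print(summary["success"])  # False
print(difference)          # 66
```

The Expectation fails because the DataFrame holds 66 more rows than the asserted 118000.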

Assessing an Expectation

print(validation_results.success)
False
print(validation_results["success"])
False

Assessing an Expectation

print(validation_results.result)
{'observed_value': 118066}
print(validation_results["result"])
{'observed_value': 118066}

Other common Expectations

Shape Expectations:

ExpectTableRowCountToEqual(value: int)
ExpectTableRowCountToBeBetween(
    min_value: int, max_value: int
)
ExpectTableColumnCountToEqual(
    value: int
)
ExpectTableColumnCountToBeBetween(
    min_value: int, max_value: int
)

Column name Expectations:

ExpectTableColumnsToMatchSet(
    column_set: set
)
ExpectColumnToExist(column: str)
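
As a rough guide to what these Expectations assert, the sketch below spells out comparable checks in plain Python against the column names and row count of the Renewable Power Generation dataset. It does not call GX. The ranges and the `expected_columns` set are made-up illustration values, and treating "match set" as exact set equality is an assumption about the default behaviour.

```python
# Column names and row count as described for the dataset in these slides.
columns = [
    "Time", "Energy delta[Wh]", "GHI", "temp", "pressure",
    "humidity", "wind_speed", "rain_1h", "snow_1h", "clouds_all",
]
row_count = 118066

# Hypothetical expected set that deliberately omits "clouds_all".
expected_columns = set(columns) - {"clouds_all"}

checks = {
    "row count equals 118066": row_count == 118066,
    "row count between 100000 and 120000": 100000 <= row_count <= 120000,
    "column count equals 10": len(columns) == 10,
    "column count between 5 and 15": 5 <= len(columns) <= 15,
    "columns match expected set": set(columns) == expected_columns,
    "column 'GHI' exists": "GHI" in columns,
}

for description, passed in checks.items():
    print(f"{description}: {passed}")
```

Every check passes except the set comparison, which fails because the data contains a column that the expected set leaves out.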

Cheat sheet

Create an Expectation:

gx.expectations.Expect...(...)

Validate an Expectation:

validation_results = batch.validate(
    expect=expectation
)

Create a row count Expectation:

expectation = gx.expectations.ExpectTableRowCountToEqual(
    value=118000  # value: int
)

Check Validation Results:

validation_results.describe()
validation_results.success
validation_results.result

Let's practice!
