Introduction to Data Quality with Great Expectations
Davina Moossazadeh
Data Scientist
Expectation - A verifiable assertion about data
gx.expectations.Expect...(...)
GX classes (Expectations): PascalCase
GX functions / methods: snake_case
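The two naming styles are linked. The PascalCase class ExpectTableRowCountToEqual appears in validation results as the snake_case type string expect_table_row_count_to_equal. The following minimal sketch converts one into the other with a regular expression. It only illustrates the convention and is not how GX itself derives the string. The helper name pascal_to_snake is hypothetical, and it would split acronyms letter by letter.

```python
import re


def pascal_to_snake(class_name: str) -> str:
    """Hypothetical helper: turn a PascalCase class name into snake_case."""
    # Insert "_" before every capital letter except the first character, then lowercase
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


print(pascal_to_snake("ExpectTableRowCountToEqual"))
# expect_table_row_count_to_equal
```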
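import great_expectations as gx

# Assumes `batch` is a GX Batch created earlier (here from a pandas DataFrame asset)
row_count_expectation = gx.expectations.ExpectTableRowCountToEqual(
    value=118000
)
validation_results = batch.validate(
    expect=row_count_expectation
)
print(validation_results)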
{
"success": false,
"expectation_config": {
"type": "expect_table_row_count_to_equal",
"kwargs": {"batch_id": "my_pandas_datasource-my_dataframe_asset", "value": 118000},
"meta": {},
"rendered_content": [{"name": "atomic.prescriptive.summary", "value": {"schema": {"type": "com.superconductive.rendered.string"}, "template": "Must have exactly $value rows.", "params": {"value": {"schema": {"type": "number"}, "value": 118000}}}, "value_type": "StringValueType"}]
},
"result": {"observed_value": 118066},
"meta": {},
"exception_info": {"raised_exception": false, "exception_traceback": null, "exception_message": null},
"rendered_content": [{"name": "atomic.diagnostic.observed_value", "value": {"schema": {"type": "com.superconductive.rendered.string"}, "template": "118066", "params": {}}, "value_type": "StringValueType"}]
}
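The validation failed because the observed row count (118066) differs from the expected value (118000). The pass/fail logic of an "equal" check fits in a few lines of plain Python. The function below is a hypothetical stand-in, not GX's implementation. It returns only the "success" and "result" keys from the output above.

```python
def check_row_count_to_equal(row_count: int, value: int) -> dict:
    """Hypothetical plain-Python stand-in for ExpectTableRowCountToEqual."""
    return {
        # Passes only when the observed count matches the expected value exactly
        "success": row_count == value,
        "result": {"observed_value": row_count},
    }


result = check_row_count_to_equal(row_count=118066, value=118000)
print(result)
# {'success': False, 'result': {'observed_value': 118066}}
```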
print(validation_results.describe())
{
"expectation_type": "expect_table_row_count_to_equal",
"success": false,
"kwargs": {
"batch_id": "my_pandas_datasource-my_dataframe_asset",
"value": 118000
},
"result": {
"observed_value": 118066
}
}
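The describe() output is JSON text, so it can be parsed back into a dictionary when you need individual fields. The sketch below assumes describe() returns a string, which is consistent with the printed output. It parses that exact output with the standard json module. Note that JSON false becomes Python False.

```python
import json

# The text printed by validation_results.describe() above, copied verbatim
describe_output = """
{
  "expectation_type": "expect_table_row_count_to_equal",
  "success": false,
  "kwargs": {
    "batch_id": "my_pandas_datasource-my_dataframe_asset",
    "value": 118000
  },
  "result": {
    "observed_value": 118066
  }
}
"""

summary = json.loads(describe_output)
print(summary["success"])  # False
# How far the observed row count is from the expected value
print(summary["result"]["observed_value"] - summary["kwargs"]["value"])  # 66
```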
print(validation_results.success)
False
print(validation_results["success"])
False
print(validation_results.result)
{'observed_value': 118066}
print(validation_results["result"])
{'observed_value': 118066}
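Both attribute access (validation_results.success) and key access (validation_results["success"]) return the same values. A common way to support both is to delegate `__getitem__` to attribute lookup. The sketch below shows that pattern. The DotDictResult class is hypothetical, and GX's actual result class may be built differently.

```python
from dataclasses import dataclass, field


@dataclass
class DotDictResult:
    """Hypothetical result object supporting both obj.key and obj["key"]."""

    success: bool
    result: dict = field(default_factory=dict)

    def __getitem__(self, key: str):
        # Delegate subscript access to attribute lookup; report unknown keys as KeyError
        try:
            return getattr(self, key)
        except AttributeError:
            raise KeyError(key) from None


res = DotDictResult(success=False, result={"observed_value": 118066})
print(res.success, res["success"])  # False False
print(res.result == res["result"])  # True
```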
Shape Expectations:
ExpectTableRowCountToEqual(value: int)
ExpectTableRowCountToBeBetween(
min_value: int, max_value: int
)
ExpectTableColumnCountToEqual(
value: int
)
ExpectTableColumnCountToBeBetween(
min_value: int, max_value: int
)
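The four shape Expectations compare either the row count or the column count against a single value or a range. The plain-Python sketch below runs those comparisons on a toy in-memory table. It assumes the bounds are inclusive and that a bound of None means "unbounded" on that side. All function names and the table data are hypothetical illustrations, not GX code.

```python
from typing import List, Optional, Tuple

# Toy "table": (column names, rows); a hypothetical stand-in for a GX Batch
Table = Tuple[List[str], List[tuple]]


def _between(observed: int, min_value: Optional[int], max_value: Optional[int]) -> bool:
    # Inclusive bounds; None leaves that side unbounded (assumption)
    return (min_value is None or observed >= min_value) and (
        max_value is None or observed <= max_value
    )


def row_count_to_equal(table: Table, value: int) -> bool:
    return len(table[1]) == value


def row_count_to_be_between(table: Table, min_value=None, max_value=None) -> bool:
    return _between(len(table[1]), min_value, max_value)


def column_count_to_equal(table: Table, value: int) -> bool:
    return len(table[0]) == value


def column_count_to_be_between(table: Table, min_value=None, max_value=None) -> bool:
    return _between(len(table[0]), min_value, max_value)


table = (["id", "price"], [(1, 9.5), (2, 3.0), (3, 7.25)])
print(row_count_to_equal(table, 3))                     # True
print(row_count_to_be_between(table, 4, 10))            # False
print(column_count_to_equal(table, 2))                  # True
print(column_count_to_be_between(table, max_value=1))   # False
```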
Column name Expectations:
ExpectTableColumnsToMatchSet(
column_set: set
)
ExpectColumnToExist(column: str)
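The column name Expectations check two different things. ExpectTableColumnsToMatchSet compares the full set of column names, ignoring order. ExpectColumnToExist only checks that one named column is present. The sketch below shows both checks on a plain list of column names. It assumes an exact set match and case-sensitive names. The helper names and column names are hypothetical.

```python
from typing import Iterable, List


def columns_to_match_set(columns: List[str], column_set: Iterable[str]) -> bool:
    # Exact match with order ignored: no extra columns and none missing
    return set(columns) == set(column_set)


def column_to_exist(columns: List[str], column: str) -> bool:
    # Case-sensitive membership test in this sketch
    return column in columns


columns = ["id", "price", "date"]
print(columns_to_match_set(columns, {"date", "id", "price"}))  # True
print(columns_to_match_set(columns, {"id", "price"}))          # False: "date" is extra
print(column_to_exist(columns, "price"))                        # True
print(column_to_exist(columns, "Price"))                        # False
```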
Create an Expectation:
gx.expectations.Expect...(...)
Validate an Expectation:
validation_results = batch.validate(
    expect=expectation
)
Create a row count Expectation:
expectation = gx.expectations.ExpectTableRowCountToEqual(
    value: int
)
Check Validation Results:
validation_results.describe()
validation_results.success
validation_results.result