Introduction to Data Quality with Great Expectations
Davina Moossazadeh
Data Scientist
Validation Definition - A reference that links an Expectation Suite to data that it describes
Create a Validation Definition with the ValidationDefinition class:
validation_definition = gx.ValidationDefinition(
name="my_validation_definition",
data=batch_definition,
suite=suite,
)
print(validation_definition)
name='my_validation_definition'
data=BatchDefinition(
id='1fcb36d6-fac6-4b9a-8ba6-a659978fd59e',
name='my_batch_definition',
partitioner=None
)
suite={
"name": "my_suite",
"id": "0a123b9c-e370-4b18-b703-785dde88732d",
"expectations": [],
"meta": {"great_expectations_version": "1.2.4"},
"notes": null
}
id=None
print(validation_definition.name)
'my_validation_definition'
print(validation_definition.data)
id='1fcb36d6-fac6-4b9a-8ba6-a659978fd59e'
name='my_batch_definition'
partitioner=None
print(validation_definition.suite)
{
"name": "my_suite",
"id": "0a123b9c-e370-4b18-b703-785dde88732d",
"expectations": [],
"meta": {"great_expectations_version": "1.2.4"},
"notes": null
}
print(validation_definition.id)
None
print(validation_definition.data_source)
assets:
- batch_definitions:
- name: my_batch_definition
partitioner: null
batch_metadata: {}
id: 83682084-3bc4-4898-a807-fadc0f911415
name: 'my_dataframe_asset'
type: dataframe
id: f71d275e-a5b2-402e-a53c-8dad6975cce5
name: 'my_pandas_data_source'
type: pandas
Run a Validation using the Validation Definition's .run() method, passing the DataFrame via batch_parameters:
validation_results = validation_definition.run(
batch_parameters={"dataframe": dataframe}
)
Note the error:
ValidationDefinitionRelatedResourcesFreshnessError:
ExpectationSuite 'my_suite' must be added to the DataContext before it can be
updated. Please call `context.suites.add(<SUITE_OBJECT>)`, then try your action
again.
Running a Validation Definition before adding the Expectation Suite to the Data Context raises an error
Add an Expectation Suite to the Data Context using .suites.add():
suite = context.suites.add(
suite=suite
)
validation_results = validation_definition.run(
batch_parameters={"dataframe": dataframe}
)
Calculating Metrics: 0/0 [00:00<?, ?it/s]
print(validation_results.success)
False
print(validation_results.describe())
{ "success": false,
"statistics": {
"evaluated_expectations": 1, "successful_expectations": 0,
"unsuccessful_expectations": 1, "success_percent": 0.0
},
"expectations": [{
"expectation_type": "expect_table_row_count_to_equal",
"success": false,
"kwargs": {"batch_id": "my_datasource-my_dataframe_asset", "value": 118000},
"result": {"observed_value": 11866}}
],
"result_url": "https://app.greatexpectations.io/organizations/my_org/data-assets/*/validations/expectation-suites/0a123b9c-e370-4b18-b703-785dde88732d/results/cb093105-6ede-47d4-a141-dee10c632e18"
}
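The describe() output above is JSON text, so it can be parsed to pull out only the failing Expectations. The following is a minimal sketch using Python's standard json module. It assumes the text has the shape printed above (copied here with result_url omitted for brevity), and summarize_failures is a hypothetical helper name, not part of Great Expectations.

```python
import json

# Text in the shape printed by validation_results.describe() above
# (result_url omitted for brevity).
described = """
{
  "success": false,
  "statistics": {
    "evaluated_expectations": 1, "successful_expectations": 0,
    "unsuccessful_expectations": 1, "success_percent": 0.0
  },
  "expectations": [{
    "expectation_type": "expect_table_row_count_to_equal",
    "success": false,
    "kwargs": {"batch_id": "my_datasource-my_dataframe_asset", "value": 118000},
    "result": {"observed_value": 11866}
  }]
}
"""


def summarize_failures(described_json):
    """Return one line per failed Expectation (hypothetical helper)."""
    report = json.loads(described_json)
    lines = []
    for expectation in report["expectations"]:
        if expectation["success"]:
            continue  # skip Expectations that passed
        # Not every Expectation has a "value" kwarg, so use .get()
        expected = expectation["kwargs"].get("value")
        observed = expectation["result"].get("observed_value")
        lines.append(
            f"{expectation['expectation_type']}: "
            f"expected {expected}, observed {observed}"
        )
    return lines


failures = summarize_failures(described)
for line in failures:
    print(line)
```

With a live result object, the same helper could presumably be called as summarize_failures(validation_results.describe()).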
Add Expectation Suite to Data Context:
context.suites.add(suite)
Create Validation Definition:
validation_definition = \
gx.ValidationDefinition(
name="my_validation_definition",
data=batch_definition,
suite=suite
)
Run Validation:
validation_results = \
validation_definition.run(
batch_parameters={"dataframe": dataframe}
)
Check Validation Results:
validation_results.success
validation_results.describe()