Introduction to Data Quality with Great Expectations
Davina Moossazadeh
Data Scientist
Validation Definition - A reference that links an Expectation Suite to the data it should validate

Create a Validation Definition with the ValidationDefinition class:
validation_definition = gx.ValidationDefinition(name="my_validation_definition", data=batch_definition, suite=suite)
print(validation_definition)
name='my_validation_definition'
data=BatchDefinition(
id='1fcb36d6-fac6-4b9a-8ba6-a659978fd59e',
name='my_batch_definition',
partitioner=None
)
suite={
"name": "my_suite",
"id": "0a123b9c-e370-4b18-b703-785dde88732d",
"expectations": [],
"meta": {"great_expectations_version": "1.2.4"},
"notes": null
}
id=None
print(validation_definition.name)
my_validation_definition
print(validation_definition.data)
id='1fcb36d6-fac6-4b9a-8ba6-a659978fd59e'
name='my_batch_definition'
partitioner=None
print(validation_definition.suite)
{
"name": "my_suite",
"id": "0a123b9c-e370-4b18-b703-785dde88732d",
"expectations": [],
"meta": {"great_expectations_version": "1.2.4"},
"notes": null
}
print(validation_definition.id)
None
print(validation_definition.data_source)
assets:
- batch_definitions:
  - name: my_batch_definition
    partitioner: null
  batch_metadata: {}
  id: 83682084-3bc4-4898-a807-fadc0f911415
  name: 'my_dataframe_asset'
  type: dataframe
id: f71d275e-a5b2-402e-a53c-8dad6975cce5
name: 'my_pandas_data_source'
type: pandas
Run a Validation with the Validation Definition's .run() method, passing the DataFrame in through batch_parameters:
validation_results = validation_definition.run(batch_parameters={"dataframe": dataframe})
Note the error:
ValidationDefinitionRelatedResourcesFreshnessError:
ExpectationSuite 'my_suite' must be added to the DataContext before it can be
updated. Please call `context.suites.add(<SUITE_OBJECT>)`, then try your action
again.
Running a Validation Definition before its Expectation Suite has been added to the Data Context raises an error.
Add the Expectation Suite to the Data Context with .suites.add():
suite = context.suites.add(suite=suite)
validation_results = validation_definition.run(
batch_parameters={"dataframe": dataframe}
)
Calculating Metrics: 0/0 [00:00<?, ?it/s]
print(validation_results.success)
False
print(validation_results.describe())
{ "success": false,
"statistics": {
"evaluated_expectations": 1, "successful_expectations": 0,
"unsuccessful_expectations": 1, "success_percent": 0.0
},
"expectations": [{
"expectation_type": "expect_table_row_count_to_equal",
"success": false,
"kwargs": {"batch_id": "my_datasource-my_dataframe_asset", "value": 118000},
"result": {"observed_value": 11866}}
],
"result_url": "https://app.greatexpectations.io/organizations/my_org/data-assets/*/validations/expectation-suites/0a123b9c-e370-4b18-b703-785dde88732d/results/cb093105-6ede-47d4-a141-dee10c632e18"
}
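The describe() output is easiest to act on when the failing expectations are pulled out programmatically. The sketch below assumes, as the printed output suggests, that describe() returns a JSON string with the keys shown above (it also accepts an already-parsed dict). summarize_validation is a hypothetical helper, not part of Great Expectations, and it reads the expected value from the "value" kwarg, which only some expectations (such as expect_table_row_count_to_equal) use.

```python
import json


def summarize_validation(describe_output):
    """Hypothetical helper: condense validation_results.describe() output.

    Accepts the JSON string printed above, or a dict with the same keys.
    """
    report = json.loads(describe_output) if isinstance(describe_output, str) else describe_output
    failures = [
        {
            "expectation_type": exp["expectation_type"],
            # Only expectations that take a "value" kwarg report an expected value here
            "expected": exp.get("kwargs", {}).get("value"),
            "observed": exp.get("result", {}).get("observed_value"),
        }
        for exp in report["expectations"]
        if not exp["success"]
    ]
    return {
        "success": report["success"],
        "success_percent": report["statistics"]["success_percent"],
        "failures": failures,
    }


# The describe() output from the example above (result_url omitted)
example = """{
  "success": false,
  "statistics": {"evaluated_expectations": 1, "successful_expectations": 0,
                 "unsuccessful_expectations": 1, "success_percent": 0.0},
  "expectations": [{
    "expectation_type": "expect_table_row_count_to_equal",
    "success": false,
    "kwargs": {"batch_id": "my_datasource-my_dataframe_asset", "value": 118000},
    "result": {"observed_value": 11866}
  }]
}"""

summary = summarize_validation(example)
for failure in summary["failures"]:
    print(f'{failure["expectation_type"]}: expected {failure["expected"]}, observed {failure["observed"]}')
```

Here the single failure is the row-count check: the suite expected 118000 rows, but the batch contained 11866.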
Add an Expectation Suite to the Data Context:
context.suites.add(suite)
Create a Validation Definition:
validation_definition = \
gx.ValidationDefinition(
name="my_validation_definition",
data=batch_definition,
suite=suite
)
Run a Validation:
validation_results = \
validation_definition.run(
batch_parameters={"dataframe": dataframe}
)
Check the Validation Results:
validation_results.success
validation_results.describe()