Introduction to Data Quality with Great Expectations
Davina Moossazadeh
Data Scientist
Validation Definition - A reference that links an Expectation Suite to the data it describes

Create a Validation Definition with the ValidationDefinition class:
validation_definition = gx.ValidationDefinition(
    name="my_validation_definition",
    data=batch_definition,
    suite=suite,
)
print(validation_definition)
name='my_validation_definition'
data=BatchDefinition(
id='1fcb36d6-fac6-4b9a-8ba6-a659978fd59e',
name='my_batch_definition',
partitioner=None
)
suite={
"name": "my_suite",
"id": "0a123b9c-e370-4b18-b703-785dde88732d",
"expectations": [],
"meta": {"great_expectations_version": "1.2.4"},
"notes": null
}
id=None
print(validation_definition.name)
my_validation_definition
print(validation_definition.data)
id='1fcb36d6-fac6-4b9a-8ba6-a659978fd59e'
name='my_batch_definition'
partitioner=None
print(validation_definition.suite)
{
"name": "my_suite",
"id": "0a123b9c-e370-4b18-b703-785dde88732d",
"expectations": [],
"meta": {"great_expectations_version": "1.2.4"},
"notes": null
}
print(validation_definition.id)
None
print(validation_definition.data_source)
assets:
- batch_definitions:
  - name: my_batch_definition
    partitioner: null
  batch_metadata: {}
  id: 83682084-3bc4-4898-a807-fadc0f911415
  name: 'my_dataframe_asset'
  type: dataframe
id: f71d275e-a5b2-402e-a53c-8dad6975cce5
name: 'my_pandas_data_source'
type: pandas
Run the validation with the Validation Definition's .run() method, passing the DataFrame through batch_parameters:
validation_results = validation_definition.run(batch_parameters={"dataframe": dataframe})
Note the following error:
ValidationDefinitionRelatedResourcesFreshnessError:
ExpectationSuite 'my_suite' must be added to the DataContext before it can be
updated. Please call `context.suites.add(<SUITE_OBJECT>)`, then try your action
again.
Running a Validation Definition before adding its Expectation Suite to the Data Context raises an error.
Add the Expectation Suite to the Data Context using .suites.add():
suite = context.suites.add(suite=suite)
validation_results = validation_definition.run(
batch_parameters={"dataframe": dataframe}
)
Calculating Metrics: 0/0 [00:00<?, ?it/s]
print(validation_results.success)
False
print(validation_results.describe())
{ "success": false,
"statistics": {
"evaluated_expectations": 1, "successful_expectations": 0,
"unsuccessful_expectations": 1, "success_percent": 0.0
},
"expectations": [{
"expectation_type": "expect_table_row_count_to_equal",
"success": false,
"kwargs": {"batch_id": "my_datasource-my_dataframe_asset", "value": 118000},
"result": {"observed_value": 11866}}
],
"result_url": "https://app.greatexpectations.io/organizations/my_org/data-assets/*/validations/expectation-suites/0a123b9c-e370-4b18-b703-785dde88732d/results/cb093105-6ede-47d4-a141-dee10c632e18"
}
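The describe() output above looks like a JSON string (note the lowercase false), so the failed Expectations can be pulled out with the standard json module. The following is a minimal sketch under that assumption. The function name failed_expectations and the sample string are hypothetical, introduced only for illustration, and the sample values are copied from the output above rather than produced by a live run.

```python
import json


def failed_expectations(description: str) -> list[dict]:
    """Summarize each failed Expectation in a describe()-style JSON string."""
    report = json.loads(description)
    failures = []
    for item in report.get("expectations", []):
        if not item.get("success", False):
            failures.append(
                {
                    "expectation_type": item.get("expectation_type"),
                    # "value" is only present for Expectations that take a
                    # value argument; other Expectations yield None here.
                    "expected": item.get("kwargs", {}).get("value"),
                    "observed": item.get("result", {}).get("observed_value"),
                }
            )
    return failures


# Sample trimmed from the output above (result_url omitted); not a live run.
sample = """
{
  "success": false,
  "statistics": {
    "evaluated_expectations": 1, "successful_expectations": 0,
    "unsuccessful_expectations": 1, "success_percent": 0.0
  },
  "expectations": [{
    "expectation_type": "expect_table_row_count_to_equal",
    "success": false,
    "kwargs": {"batch_id": "my_datasource-my_dataframe_asset", "value": 118000},
    "result": {"observed_value": 11866}
  }]
}
"""

for failure in failed_expectations(sample):
    print(failure)
```

With a live result, the same function could be called as failed_expectations(validation_results.describe()), provided the output keeps the shape shown above.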
Add the Expectation Suite to the Data Context:
context.suites.add(suite)
Create a Validation Definition:
validation_definition = gx.ValidationDefinition(
    name="my_validation_definition",
    data=batch_definition,
    suite=suite,
)
Run the validation:
validation_results = \
validation_definition.run(
batch_parameters={"dataframe": dataframe}
)
Check the validation results:
validation_results.success
validation_results.describe()
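In a pipeline, checking the results often means stopping when validation fails. The following is a minimal sketch of such a gate. It relies only on the .success attribute and the .describe() method shown in these slides; the names DataQualityError and require_success are hypothetical and not part of Great Expectations.

```python
class DataQualityError(Exception):
    """Raised when a validation result reports failure (hypothetical name)."""


def require_success(validation_results):
    """Return the results unchanged if validation passed, otherwise raise.

    Assumes the argument exposes .success (bool) and .describe() (readable
    summary), as the validation results in these slides do.
    """
    if not validation_results.success:
        # Put the readable summary in the exception so logs show what failed.
        raise DataQualityError(validation_results.describe())
    return validation_results
```

A possible use is to wrap the run call, for example require_success(validation_definition.run(batch_parameters={"dataframe": dataframe})), so that later steps execute only when every Expectation passes.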