Introduction to Data Quality with Great Expectations
Davina Moossazadeh
Data Scientist
Checkpoint - Object that groups and runs Validation Definitions with shared parameters

Actions - Components configured by Checkpoints that integrate GX with other tools based on the Validation Results
Reusability
Actions
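To make the two definitions concrete, here is a toy model in plain Python. It is only a sketch of the idea (one object runs several validations with the same parameters, then hands the results to each action); ToyCheckpoint, notify_on_failure and the two checks are hypothetical names and are not part of the Great Expectations API.

```python
from dataclasses import dataclass, field
from typing import Callable

# Toy stand-ins for illustration only; these are NOT Great Expectations classes.
Validation = Callable[[dict], bool]  # takes the shared parameters, returns pass/fail
Action = Callable[[dict], None]      # receives the collected results


@dataclass
class ToyCheckpoint:
    name: str
    validations: dict[str, Validation]
    actions: list[Action] = field(default_factory=list)

    def run(self, parameters: dict) -> dict:
        # Every validation receives the same (shared) parameters.
        outcomes = {label: check(parameters) for label, check in self.validations.items()}
        results = {
            "checkpoint": self.name,
            "success": all(outcomes.values()),
            "outcomes": outcomes,
        }
        # Actions run after validation and can react to the results.
        for action in self.actions:
            action(results)
        return results


messages: list[str] = []


def notify_on_failure(results: dict) -> None:
    # Hypothetical action: record a message only when a validation failed.
    if not results["success"]:
        messages.append(f"{results['checkpoint']} failed: {results['outcomes']}")


toy = ToyCheckpoint(
    name="my_checkpoint",
    validations={
        "row_count_is_3": lambda p: len(p["rows"]) == 3,
        "no_empty_rows": lambda p: all(p["rows"]),
    },
    actions=[notify_on_failure],
)

toy_results = toy.run({"rows": [[1], [2]]})
print(toy_results["success"])
print(messages)
```

Because the validations and actions are attached to the object once, the same checkpoint can be rerun on new data without reconfiguring anything, which is the reusability point above.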
Create a Checkpoint with a Slack notification using gx.Checkpoint():
checkpoint = gx.Checkpoint(
    name="my_checkpoint",
    validation_definitions=[validation_definition],
    actions=[SlackNotificationAction()],  # optional
)
Running a Checkpoint before its Validation Definition has been added to the Data Context raises an error:
CheckpointRelatedResourcesFreshnessError:
ValidationDefinition 'my_validation_definition' must be added to the DataContext
before it can be updated. Please call `context.validation_definitions.add(
<VALIDATION_DEFINITION_OBJECT>)`, then try your action again.
Add the Validation Definition to the Data Context with .validation_definitions.add():
validation_definition = context.validation_definitions.add(
    validation_definition
)
checkpoint_results = checkpoint.run(
    batch_parameters={"dataframe": dataframe}
)

print(checkpoint_results.success)
False
print(checkpoint_results.describe())
{ "success": false,
"statistics": {
"evaluated_expectations": 1, "successful_expectations": 0,
"unsuccessful_expectations": 1, "success_percent": 0.0
},
"expectations": [{
"expectation_type": "expect_table_row_count_to_equal",
"success": false,
"kwargs": {"batch_id": "my_datasource-my_dataframe_asset", "value": 118000},
"result": {"observed_value": 11866}}
],
"result_url": "https://app.greatexpectations.io/organizations/my_org/data-assets/*/validations/expectation-suites/0a123b9c-e370-4b18-b703-785dde88732d/results/cb093105-6ede-47d4-a141-dee10c632e18"
}
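The description is easier to act on once the failing Expectations are pulled out. The sketch below uses only the standard library and assumes that describe() returns a JSON string shaped like the output above; the sample string is a trimmed copy of that output, and failed_expectations is a hypothetical helper, not a GX function.

```python
import json

# Sample shaped like the describe() output above (result_url omitted).
description = """
{
  "success": false,
  "statistics": {
    "evaluated_expectations": 1, "successful_expectations": 0,
    "unsuccessful_expectations": 1, "success_percent": 0.0
  },
  "expectations": [{
    "expectation_type": "expect_table_row_count_to_equal",
    "success": false,
    "kwargs": {"batch_id": "my_datasource-my_dataframe_asset", "value": 118000},
    "result": {"observed_value": 11866}
  }]
}
"""


def failed_expectations(description: str) -> list[dict]:
    """Return expected vs. observed values for every failed Expectation."""
    report = json.loads(description)
    return [
        {
            "type": expectation["expectation_type"],
            "expected": expectation["kwargs"].get("value"),
            "observed": expectation["result"].get("observed_value"),
        }
        for expectation in report["expectations"]
        if not expectation["success"]
    ]


failures = failed_expectations(description)
for failure in failures:
    print(f"{failure['type']}: expected {failure['expected']}, observed {failure['observed']}")
```

In real use, the sample string would be replaced by checkpoint_results.describe(), under the same assumption about its return type.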
Data Docs - static websites generated from GX metadata
# Checkpoint with an Action that updates the Data Docs
gx.Checkpoint(
    name="my_checkpoint",
    validation_definitions=[validation_definition],
    actions=[
        gx.checkpoint.actions.UpdateDataDocsAction(
            name="update_my_site",
            site_names=["my_data_docs_site"],  # list of site names
        )
    ],
)

Add the Validation Definition to the Data Context:
context.validation_definitions.add(
    validation_definition
)
Create a Checkpoint:
checkpoint = gx.Checkpoint(
    name="my_checkpoint",  # str
    validation_definitions=[validation_definition],  # list
)
Run the Checkpoint:
checkpoint_results = checkpoint.run(
    batch_parameters={"dataframe": dataframe}
)
Check the Checkpoint results:
checkpoint_results.success
checkpoint_results.describe()