Introduction to Data Quality with Great Expectations
Davina Moossazadeh
Data Scientist
Checkpoint - An object that groups and runs Validation Definitions with shared parameters
Actions - Components configured by Checkpoints that integrate GX with other tools based on Validation Results
Reusability
Actions
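To make the two definitions above concrete, here is a toy model in plain Python. It is not the Great Expectations implementation: ToyCheckpoint, notify_on_failure, and every other name in it are hypothetical stand-ins. It only illustrates the idea of running several validations with shared parameters and then handing the results to actions.

```python
from dataclasses import dataclass, field
from typing import Callable

# Toy stand-ins, NOT the Great Expectations classes; all names are hypothetical.
Validation = Callable[[dict], bool]  # takes the shared parameters, returns pass/fail
Action = Callable[[dict], None]      # receives the collected results


@dataclass
class ToyCheckpoint:
    name: str
    validations: dict[str, Validation]
    actions: list[Action] = field(default_factory=list)

    def run(self, parameters: dict) -> dict:
        # Run every validation with the same shared parameters.
        outcomes = {label: check(parameters) for label, check in self.validations.items()}
        results = {"success": all(outcomes.values()), "outcomes": outcomes}
        # Hand the results to each action (for example: notify, publish a report).
        for action in self.actions:
            action(results)
        return results


notifications = []


def notify_on_failure(results: dict) -> None:
    # Stand-in for an integration such as a chat notification.
    if not results["success"]:
        failed = [label for label, ok in results["outcomes"].items() if not ok]
        notifications.append(f"Failed validations: {', '.join(failed)}")


toy_checkpoint = ToyCheckpoint(
    name="my_checkpoint",
    validations={
        "row_count_is_3": lambda params: len(params["rows"]) == 3,
        "has_rows": lambda params: len(params["rows"]) > 0,
    },
    actions=[notify_on_failure],
)

toy_results = toy_checkpoint.run({"rows": [1, 2]})
print(toy_results["success"])
print(notifications)
```

The same toy_checkpoint can be run again with different parameters, which is the reusability point: the grouping and the actions are defined once.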
Creating a Checkpoint with Slack Notification via gx.Checkpoint():
checkpoint = gx.Checkpoint(
    name="my_checkpoint",
    validation_definitions=[validation_definition],
    actions=[SlackNotificationAction()]  # optional
)
Running a Checkpoint before adding the Validation Definition to the Data Context raises an error:
CheckpointRelatedResourcesFreshnessError:
ValidationDefinition 'my_validation_definition' must be added to the DataContext
before it can be updated. Please call `context.validation_definitions.add(
<VALIDATION_DEFINITION_OBJECT>)`, then try your action again.
Add the Validation Definition to the Data Context using .validation_definitions.add():
validation_definition = context.validation_definitions.add(
    validation_definition
)
checkpoint_results = checkpoint.run(
    batch_parameters={"dataframe": dataframe}
)
print(checkpoint_results.success)
False
print(checkpoint_results.describe())
{
    "success": false,
    "statistics": {
        "evaluated_expectations": 1, "successful_expectations": 0,
        "unsuccessful_expectations": 1, "success_percent": 0.0
    },
    "expectations": [{
        "expectation_type": "expect_table_row_count_to_equal",
        "success": false,
        "kwargs": {"batch_id": "my_datasource-my_dataframe_asset", "value": 118000},
        "result": {"observed_value": 11866}
    }],
    "result_url": "https://app.greatexpectations.io/organizations/my_org/data-assets/*/validations/expectation-suites/0a123b9c-e370-4b18-b703-785dde88732d/results/cb093105-6ede-47d4-a141-dee10c632e18"
}
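A description like the one above can be turned into a short failure summary. The sketch below assumes the text is valid JSON with the fields shown on this slide; the exact layout may differ between GX versions. The summarize_failures helper is hypothetical and uses only the standard library, and the sample string stands in for a real checkpoint_results.describe() value (result_url omitted).

```python
import json

# Sample text shaped like the describe() output above (result_url omitted).
described = """
{
  "success": false,
  "statistics": {
    "evaluated_expectations": 1, "successful_expectations": 0,
    "unsuccessful_expectations": 1, "success_percent": 0.0
  },
  "expectations": [{
    "expectation_type": "expect_table_row_count_to_equal",
    "success": false,
    "kwargs": {"batch_id": "my_datasource-my_dataframe_asset", "value": 118000},
    "result": {"observed_value": 11866}
  }]
}
"""


def summarize_failures(description: str) -> list[str]:
    """Return one readable line per failed Expectation (hypothetical helper)."""
    report = json.loads(description)
    lines = []
    for expectation in report["expectations"]:
        if not expectation["success"]:
            expected = expectation["kwargs"].get("value")
            observed = expectation["result"].get("observed_value")
            lines.append(
                f"{expectation['expectation_type']}: expected {expected}, observed {observed}"
            )
    return lines


failure_lines = summarize_failures(described)
print(failure_lines)
```

Here the single failed Expectation shows that the table was expected to have 118000 rows but 11866 were observed.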
Data Docs - static websites generated from GX metadata
# Checkpoint with Action for Updating Data Docs
gx.Checkpoint(
    name="my_checkpoint",
    validation_definitions=[validation_definition],
    actions=[
        gx.checkpoint.actions.UpdateDataDocsAction(
            # site_names takes a list of Data Docs site names
            name="update_my_site", site_names=["my_data_docs_site"]
        )
    ],
)
Add Validation Definition to Data Context:
context.validation_definitions.add(
    validation_definition
)
Create Checkpoint:
checkpoint = gx.Checkpoint(
    name="my_checkpoint",  # str
    validation_definitions=[validation_definition],  # list
)
Run Checkpoint:
checkpoint_results = checkpoint.run(
    batch_parameters={"dataframe": dataframe}
)
Check Checkpoint Results:
checkpoint_results.success
checkpoint_results.describe()