Apply Expectations to New Data

Introduction to Data Quality with Great Expectations

Davina Moossazadeh

Data Scientist

Checkpoints

Checkpoint - An object that groups and runs Validation Definitions with shared parameters

A schematic showing a Checkpoint, which contains the following workflow: Batch Requests -> Data Source -> Validation Definition. The Validation Definition outputs Validation Results, which feed into an optional Action List, containing one or more Actions.

Actions - Components configured by Checkpoints that integrate GX with other tools based on Validation Results

1 https://docs.greatexpectations.io/docs/core/trigger_actions_based_on_results/create_a_checkpoint_with_actions/

Why use Checkpoints?

Reusability

  • Can run multiple Validation Definitions against a Batch

Actions

  • Can trigger Actions based on Validation Results
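The two benefits above can be illustrated with a toy model in plain Python. This is only a conceptual sketch, not how GX implements Checkpoints: the names ToyCheckpoint, ToyResult, row_count_is_three, no_missing_values, and record_failures are all hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyResult:
    """Outcome of one validation (stand-in for a Validation Result)."""
    name: str
    success: bool


@dataclass
class ToyCheckpoint:
    """Groups validations and passes their results to every action."""
    name: str
    validations: List[Callable[[list], ToyResult]]
    actions: List[Callable[[List[ToyResult]], None]] = field(default_factory=list)

    def run(self, batch: list) -> List[ToyResult]:
        # Reusability: every validation runs against the same batch.
        results = [validate(batch) for validate in self.validations]
        # Actions: each action receives the collected results.
        for action in self.actions:
            action(results)
        return results


def row_count_is_three(batch: list) -> ToyResult:
    return ToyResult("row_count_is_three", len(batch) == 3)


def no_missing_values(batch: list) -> ToyResult:
    return ToyResult("no_missing_values", all(row is not None for row in batch))


alerts: List[str] = []


def record_failures(results: List[ToyResult]) -> None:
    # Stand-in for a notification: remember the names of failed validations.
    alerts.extend(r.name for r in results if not r.success)


toy_checkpoint = ToyCheckpoint(
    name="toy_checkpoint",
    validations=[row_count_is_three, no_missing_values],
    actions=[record_failures],
)
toy_results = toy_checkpoint.run([1, None, 3])
print([r.success for r in toy_results])  # [True, False]
print(alerts)  # ['no_missing_values']
```

The action here only appends to a list; in GX the same slot is filled by Actions such as a Slack notification or a Data Docs update.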

Creating a Checkpoint

Create a Checkpoint with a Slack notification Action via gx.Checkpoint():

checkpoint = gx.Checkpoint(
    name="my_checkpoint",
    validation_definitions=[validation_definition],
    # Optional; the action's own arguments are omitted on the slide.
    actions=[SlackNotificationAction()],
)

Checkpoint errors

Running a Checkpoint before adding the Validation Definition to the Data Context raises an error:

CheckpointRelatedResourcesFreshnessError: 
ValidationDefinition 'my_validation_definition' must be added to the DataContext 
before it can be updated. Please call `context.validation_definitions.add(
<VALIDATION_DEFINITION_OBJECT>)`, then try your action again.

Adding a Validation Definition

Add the Validation Definition to the Data Context using .validation_definitions.add():

validation_definition = context.validation_definitions.add(
    validation_definition
)

Running a Checkpoint

checkpoint_results = checkpoint.run(
    batch_parameters={"dataframe": dataframe}
)

Checkpoint Results. The output is long and does not fit on the slide.
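Because the full results are long, a pipeline step often only needs to know whether the run passed. The sketch below shows one way to stop a pipeline on failure. It assumes only that the results object exposes a boolean success attribute; DataQualityError and require_success are hypothetical names, and SimpleNamespace objects stand in for real Checkpoint results.

```python
from types import SimpleNamespace


class DataQualityError(RuntimeError):
    """Hypothetical exception raised when a Checkpoint run fails."""


def require_success(checkpoint_results, checkpoint_name="my_checkpoint"):
    """Return the results unchanged, or raise if validation failed."""
    if not checkpoint_results.success:
        raise DataQualityError(f"Checkpoint '{checkpoint_name}' failed validation")
    return checkpoint_results


# Stand-ins for real results; only the `success` attribute is used.
passing = SimpleNamespace(success=True)
failing = SimpleNamespace(success=False)

require_success(passing)  # returns quietly
try:
    require_success(failing)
except DataQualityError as err:
    message = str(err)
    print(message)
```

In a real pipeline the call would wrap the object returned by checkpoint.run(...), so downstream steps only execute on validated data.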


Assessing Checkpoint Results

print(checkpoint_results.success)
False
print(checkpoint_results.describe())

Assessing Checkpoint Results

{ "success": false,
  "statistics": {
    "evaluated_expectations": 1, "successful_expectations": 0,
    "unsuccessful_expectations": 1, "success_percent": 0.0
  },
  "expectations": [{
    "expectation_type": "expect_table_row_count_to_equal",
    "success": false,
    "kwargs": {"batch_id": "my_datasource-my_dataframe_asset", "value": 118000},
    "result": {"observed_value": 11866}}
  ],
  "result_url": "https://app.greatexpectations.io/organizations/my_org/data-assets/*/validations/expectation-suites/0a123b9c-e370-4b18-b703-785dde88732d/results/cb093105-6ede-47d4-a141-dee10c632e18"
}
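A description like the one above can be condensed into one line per failed Expectation. The following is a minimal sketch, assuming describe() returns a JSON string with the shape shown on this slide (real output may nest or name fields differently); summarize_failures and sample_description are hypothetical names, and the sample is an abridged copy of the slide's output.

```python
import json

# Abridged copy of the output shown above, used here as sample input.
sample_description = """
{
  "success": false,
  "statistics": {
    "evaluated_expectations": 1, "successful_expectations": 0,
    "unsuccessful_expectations": 1, "success_percent": 0.0
  },
  "expectations": [{
    "expectation_type": "expect_table_row_count_to_equal",
    "success": false,
    "kwargs": {"batch_id": "my_datasource-my_dataframe_asset", "value": 118000},
    "result": {"observed_value": 11866}
  }]
}
"""


def summarize_failures(description: str) -> list:
    """Return one readable line per failed Expectation."""
    report = json.loads(description)
    lines = []
    for expectation in report.get("expectations", []):
        if expectation.get("success"):
            continue  # skip Expectations that passed
        expected = expectation.get("kwargs", {}).get("value")
        observed = expectation.get("result", {}).get("observed_value")
        lines.append(
            f"{expectation['expectation_type']}: expected {expected}, observed {observed}"
        )
    return lines


failure_lines = summarize_failures(sample_description)
for line in failure_lines:
    print(line)
```

For the sample input this prints a single line reporting that the row count was expected to be 118000 but 11866 was observed.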

Data Docs

Data Docs - static websites generated from GX metadata

# Checkpoint with an Action for updating Data Docs
checkpoint = gx.Checkpoint(
    name="my_checkpoint",
    validation_definitions=[validation_definition],
    actions=[
        gx.checkpoint.actions.UpdateDataDocsAction(
            name="update_my_site", site_names=["my_data_docs_site"]
        )
    ],
)
1 https://docs.greatexpectations.io/docs/core/configure_project_settings/configure_data_docs/

Data Docs

A screenshot of a Data Docs webpage, depicting a list of table level Expectations and their observed values. One Expectation for row count range has an expanded Validation History table, showing the run time, observed value, and min and max values. In the top right corner, a green box with a check mark reads "All Expectations met".


Cheat sheet

Add Validation Definition to Data Context:

context.validation_definitions.add(
    validation_definition
)

Create Checkpoint:

checkpoint = gx.Checkpoint(
    name="my_checkpoint",  # str
    validation_definitions=[validation_definition],  # list
)

Run Checkpoint:

checkpoint_results = checkpoint.run(
    batch_parameters={"dataframe": dataframe}
)

Check Checkpoint Results:

checkpoint_results.success
checkpoint_results.describe()

Let's practice!
