Apply Expectations to New Data

Introduction to Data Quality with Great Expectations

Davina Moossazadeh

Data Scientist

Checkpoints

Checkpoint - An object that groups and runs Validation Definitions with shared parameters

A schematic showing a Checkpoint, which contains the following workflow: Batch Requests -> Data Source -> Validation Definition. The Validation Definition outputs Validation Results, which feed into an optional Action List, containing one or more Actions.

Actions - Components configured by Checkpoints that integrate GX with other tools based on Validation Results

1 https://docs.greatexpectations.io/docs/core/trigger_actions_based_on_results/create_a_checkpoint_with_actions/

Why use Checkpoints?

Reusability

  • Can run multiple Validation Definitions against a Batch

Actions

  • Can trigger Actions based on Validation Results
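
To make the two benefits above concrete, here is a minimal conceptual sketch in plain Python. It does not use Great Expectations, and `ToyValidationDefinition`, `ToyCheckpoint`, and `notify_on_failure` are hypothetical stand-ins invented for illustration, not GX classes. It only shows the idea: one object groups several validations, runs them against the same batch, and passes the combined results to optional actions.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical stand-ins for illustration only; these are NOT GX classes.
@dataclass
class ToyValidationDefinition:
    name: str
    check: Callable[[list], bool]  # returns True when the batch passes


@dataclass
class ToyCheckpoint:
    name: str
    validation_definitions: list
    actions: list = field(default_factory=list)  # optional callables

    def run(self, batch: list) -> dict:
        # Reusability: every grouped validation runs against the same batch.
        results = {vd.name: vd.check(batch) for vd in self.validation_definitions}
        summary = {"success": all(results.values()), "results": results}
        # Actions: each action receives the combined results.
        for action in self.actions:
            action(summary)
        return summary


notifications = []


def notify_on_failure(summary: dict) -> None:
    # Stand-in for an integration such as a chat notification.
    if not summary["success"]:
        failed = [name for name, ok in summary["results"].items() if not ok]
        notifications.append("Failed: " + ", ".join(failed))


checkpoint = ToyCheckpoint(
    name="my_checkpoint",
    validation_definitions=[
        ToyValidationDefinition("row_count_is_3", lambda rows: len(rows) == 3),
        ToyValidationDefinition("no_missing_values", lambda rows: None not in rows),
    ],
    actions=[notify_on_failure],
)

first_run = checkpoint.run([1, 2, 3])   # passes both checks
second_run = checkpoint.run([1, None])  # fails both checks
print(first_run["success"], second_run["success"])
print(notifications)
```

The same checkpoint object is reused for both batches, and the action only records a message when a run fails.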

Creating a Checkpoint


Creating a Checkpoint with Slack Notification via gx.Checkpoint():

checkpoint = gx.Checkpoint(
    name="my_checkpoint",
    validation_definitions=[validation_definition],
    actions=[SlackNotificationAction()],  # optional
)

Checkpoint errors

Running a Checkpoint before adding the Validation Definition to the Data Context raises an error:

CheckpointRelatedResourcesFreshnessError: 
ValidationDefinition 'my_validation_definition' must be added to the DataContext 
before it can be updated. Please call `context.validation_definitions.add(
<VALIDATION_DEFINITION_OBJECT>)`, then try your action again.

Adding a Validation Definition

Add the Validation Definition to the Data Context using .validation_definitions.add():

validation_definition = context.validation_definitions.add(
    validation_definition
)

Running a Checkpoint

checkpoint_results = checkpoint.run(
    batch_parameters={"dataframe": dataframe}
)

Checkpoint Results. The output is long and does not fit on the slide.


Assessing Checkpoint Results

print(checkpoint_results.success)
False
print(checkpoint_results.describe())


{ "success": false,
  "statistics": {
    "evaluated_expectations": 1, "successful_expectations": 0,
    "unsuccessful_expectations": 1, "success_percent": 0.0
  },
  "expectations": [{
    "expectation_type": "expect_table_row_count_to_equal",
    "success": false,
    "kwargs": {"batch_id": "my_datasource-my_dataframe_asset", "value": 118000},
    "result": {"observed_value": 11866}}
  ],
  "result_url": "https://app.greatexpectations.io/organizations/my_org/data-assets/*/validations/expectation-suites/0a123b9c-e370-4b18-b703-785dde88732d/results/cb093105-6ede-47d4-a141-dee10c632e18"
}
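
The description above can also be inspected programmatically. The sketch below is a hedged example that uses only Python's standard `json` module and assumes the description is available as a JSON-formatted string with the same shape as the slide output. The string is an abridged copy of that output (the `result_url` is omitted), and the variable names are illustrative.

```python
import json

# Abridged copy of the description shown above (result_url omitted).
description = """
{ "success": false,
  "statistics": {
    "evaluated_expectations": 1, "successful_expectations": 0,
    "unsuccessful_expectations": 1, "success_percent": 0.0
  },
  "expectations": [{
    "expectation_type": "expect_table_row_count_to_equal",
    "success": false,
    "kwargs": {"batch_id": "my_datasource-my_dataframe_asset", "value": 118000},
    "result": {"observed_value": 11866}}
  ]
}
"""

report = json.loads(description)

# Collect one readable line per failed Expectation.
failed_lines = []
for expectation in report["expectations"]:
    if not expectation["success"]:
        expected = expectation["kwargs"]["value"]
        observed = expectation["result"]["observed_value"]
        failed_lines.append(
            f'{expectation["expectation_type"]}: expected {expected}, observed {observed}'
        )

# Recompute the success percentage from the raw counts as a sanity check.
stats = report["statistics"]
success_percent = (
    100.0 * stats["successful_expectations"] / stats["evaluated_expectations"]
)

for line in failed_lines:
    print(line)
print(f"success_percent: {success_percent}")
```

Here the single failed Expectation expected 118000 rows but observed 11866, which is why the overall `success` is false and the success percentage is 0.0.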

Data Docs

Data Docs - static websites generated from GX metadata

# Checkpoint with Action for Updating Data Docs
gx.Checkpoint(
    name,
    validation_definitions,
    actions=[
        gx.checkpoint.actions.UpdateDataDocsAction(
            name="update_my_site", site_names=["my_data_docs_site"]
        )
    ],
)
1 https://docs.greatexpectations.io/docs/core/configure_project_settings/configure_data_docs/


A screenshot of a Data Docs webpage, depicting a list of table level Expectations and their observed values. One Expectation for row count range has an expanded Validation History table, showing the run time, observed value, and min and max values. In the top right corner, a green box with a check mark reads "All Expectations met".


Cheat sheet

Add Validation Definition to Data Context:

context.validation_definitions.add(
    validation_definition
)

Create Checkpoint:

checkpoint = gx.Checkpoint(
    name: str, 
    validation_definitions: list,
)

Run Checkpoint:

checkpoint_results = checkpoint.run(
    batch_parameters={"dataframe": dataframe}
)

Check Checkpoint Results:

checkpoint_results.success
checkpoint_results.describe()

Let's practice!
