Validate Expectation Suites

Introduction to Data Quality with Great Expectations

Davina Moossazadeh

Data Scientist

Validation Definitions

Validation Definition - A reference that links an Expectation Suite to data that it describes

A schematic showing a Validation Definition with Data and Expectation Suite as inputs and Validation Results as the output.

1 https://docs.greatexpectations.io/docs/core/run_validations/create_a_validation_definition/
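
As a purely conceptual illustration of the definition above, the sketch below models a Validation Definition as an object that holds a reference to a data loader and a suite of checks, and produces results when run. Every name here (`ToySuite`, `ToyValidationDefinition`, `row_count_equals`, `load_data`) is a hypothetical stand-in; this is not how GX implements these classes.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-ins for illustration only; none of this is GX API.
Rows = list[dict]
Check = Callable[[Rows], bool]


@dataclass
class ToySuite:
    name: str
    checks: list[Check]


@dataclass
class ToyValidationDefinition:
    name: str
    load_data: Callable[[], Rows]  # reference to the data, resolved at run time
    suite: ToySuite                # the checks that describe that data

    def run(self) -> dict:
        rows = self.load_data()
        outcomes = [check(rows) for check in self.suite.checks]
        return {"success": all(outcomes), "evaluated_expectations": len(outcomes)}


def row_count_equals(expected: int) -> Check:
    """Build a check that passes when the data has exactly `expected` rows."""
    return lambda rows: len(rows) == expected


rows = [{"id": 1}, {"id": 2}, {"id": 3}]
definition = ToyValidationDefinition(
    name="toy_definition",
    load_data=lambda: rows,
    suite=ToySuite(name="toy_suite", checks=[row_count_equals(3)]),
)
results = definition.run()
print(results)
```

The point of the toy is the shape of the relationship: the definition itself stores no results, only the link between data and suite, and results exist only after it is run.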

Creating a Validation Definition

Create a Validation Definition with the ValidationDefinition class:

validation_definition = gx.ValidationDefinition(
    name="my_validation_definition",
    data=batch_definition,
    suite=suite,
)
print(validation_definition)

Viewing a Validation Definition

name='my_validation_definition' 
data=BatchDefinition(
    id='1fcb36d6-fac6-4b9a-8ba6-a659978fd59e', 
    name='my_batch_definition', 
    partitioner=None
) 
suite={
  "name": "my_suite",
  "id": "0a123b9c-e370-4b18-b703-785dde88732d",
  "expectations": [],
  "meta": {"great_expectations_version": "1.2.4"},
  "notes": null
} 
id=None

Viewing a Validation Definition

print(validation_definition.name)
'my_validation_definition' 
print(validation_definition.data)
id='1fcb36d6-fac6-4b9a-8ba6-a659978fd59e' 
name='my_batch_definition' 
partitioner=None
print(validation_definition.suite)
{
  "name": "my_suite",
  "id": "0a123b9c-e370-4b18-b703-785dde88732d",
  "expectations": [],
  "meta": {"great_expectations_version": "1.2.4"},
  "notes": null
} 
print(validation_definition.id)
None

Viewing a Validation Definition

print(validation_definition.data_source)
assets:
  - batch_definitions:
      - name: my_batch_definition
        partitioner: null
    batch_metadata: {}
    id: 83682084-3bc4-4898-a807-fadc0f911415
    name: 'my_dataframe_asset'
    type: dataframe
id: f71d275e-a5b2-402e-a53c-8dad6975cce5
name: 'my_pandas_data_source'
type: pandas

Running a Validation Definition

Run a Validation using the Validation Definition's .run() method, passing the DataFrame via batch_parameters:

validation_results = validation_definition.run(
    batch_parameters={"dataframe": dataframe}
)

Validation Definition errors

Note the error:

ValidationDefinitionRelatedResourcesFreshnessError:
ExpectationSuite 'my_suite' must be added to the DataContext before it can be 
updated. Please call `context.suites.add(<SUITE_OBJECT>)`, then try your action 
again.

Running a Validation Definition before adding the Expectation Suite to the Data Context raises an error


Adding an Expectation Suite

Add an Expectation Suite to the Data Context using .suites.add():

suite = context.suites.add(suite=suite)

Assessing a Validation Definition

validation_results = validation_definition.run(
    batch_parameters={"dataframe": dataframe}
)
Calculating Metrics: 0/0 [00:00<?, ?it/s]
print(validation_results.success)
False
print(validation_results.describe())

Assessing a Validation Definition

{
  "success": false,
  "statistics": {
    "evaluated_expectations": 1, "successful_expectations": 0,
    "unsuccessful_expectations": 1, "success_percent": 0.0
  },
  "expectations": [{
    "expectation_type": "expect_table_row_count_to_equal",
    "success": false,
    "kwargs": {"batch_id": "my_datasource-my_dataframe_asset", "value": 118000},
    "result": {"observed_value": 11866}
  }],
  "result_url": "https://app.greatexpectations.io/organizations/my_org/data-assets/*/validations/expectation-suites/0a123b9c-e370-4b18-b703-785dde88732d/results/cb093105-6ede-47d4-a141-dee10c632e18"
}
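
A report like the one above can be reduced to a short list of failures. The sketch below assumes `.describe()` returns JSON text shaped like the printed output; the helper name `summarize_failures` is hypothetical, and the JSON is a hard-coded copy of the slide's result (without `result_url`) so the example runs without GX installed.

```python
import json

# JSON text copied from the describe() output shown above (result_url omitted).
described = """
{
  "success": false,
  "statistics": {
    "evaluated_expectations": 1, "successful_expectations": 0,
    "unsuccessful_expectations": 1, "success_percent": 0.0
  },
  "expectations": [{
    "expectation_type": "expect_table_row_count_to_equal",
    "success": false,
    "kwargs": {"batch_id": "my_datasource-my_dataframe_asset", "value": 118000},
    "result": {"observed_value": 11866}
  }]
}
"""


def summarize_failures(described_json: str) -> list[str]:
    """Return one line per failed Expectation: type, expected value, observed value."""
    report = json.loads(described_json)
    return [
        f'{item["expectation_type"]}: expected {item["kwargs"].get("value")}, '
        f'observed {item["result"].get("observed_value")}'
        for item in report["expectations"]
        if not item["success"]
    ]


failures = summarize_failures(described)
for line in failures:
    print(line)
# prints: expect_table_row_count_to_equal: expected 118000, observed 11866
```

With a live result object, the same helper could be called as `summarize_failures(validation_results.describe())`, provided the assumption about the return type holds for the installed GX version.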

A note about GX

  • Great Expectations offers multiple workflows for similar tasks
    • e.g., Batch Definitions vs. Validation Definitions
  • This course provides a broad understanding, but alternative GX implementations may use different approaches

Cheat sheet

Add Expectation Suite to Data Context:

context.suites.add(suite)

Create Validation Definition:

validation_definition = gx.ValidationDefinition(
    name="my_validation_definition",
    data=batch_definition,
    suite=suite,
)

Run Validation:

validation_results = validation_definition.run(
    batch_parameters={"dataframe": dataframe}
)

Check Validation Results:

validation_results.success
validation_results.describe()
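
The cheat sheet steps can be strung together into one function. This is a minimal sketch that assumes GX Core 1.x and pandas are installed; the three-row DataFrame, the function name `run_validation_example`, and the single row-count Expectation are illustrative choices, and the Data Source, Data Asset, and Batch Definition setup is assumed from the object names printed earlier rather than shown in this section.

```python
def run_validation_example() -> bool:
    """Build the cheat-sheet objects end to end and return the overall success flag."""
    # Imported lazily so this file can be loaded without GX or pandas installed.
    import great_expectations as gx
    import pandas as pd

    # Hypothetical three-row DataFrame standing in for the course dataset.
    dataframe = pd.DataFrame({"id": [1, 2, 3]})

    # Prerequisites the cheat sheet assumes already exist (GX Core 1.x API).
    context = gx.get_context()
    data_source = context.data_sources.add_pandas(name="my_pandas_data_source")
    data_asset = data_source.add_dataframe_asset(name="my_dataframe_asset")
    batch_definition = data_asset.add_batch_definition_whole_dataframe(
        name="my_batch_definition"
    )

    # Add the suite to the Data Context first, then give it one Expectation.
    suite = context.suites.add(gx.ExpectationSuite(name="my_suite"))
    suite.add_expectation(gx.expectations.ExpectTableRowCountToEqual(value=3))

    # Link the suite to the data, then run against the in-memory DataFrame.
    validation_definition = gx.ValidationDefinition(
        name="my_validation_definition",
        data=batch_definition,
        suite=suite,
    )
    validation_results = validation_definition.run(
        batch_parameters={"dataframe": dataframe}
    )
    return validation_results.success
```

Calling `run_validation_example()` should report success for this DataFrame, since it has exactly three rows; changing `value` makes the run fail in the same way as the result shown earlier.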

Let's practice!
