Create a Data Context

Introduction to Data Quality with Great Expectations

Davina Moossazadeh

Data Scientist

What is data quality?

How fit a dataset is for its intended purpose

  • Completeness
  • Accuracy
  • Validity
  • Uniqueness
  • Timeliness
  • Integrity
  • Consistency
  • etc.
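
Two of these dimensions can be turned into simple numbers. The sketch below uses plain Python only; the sample records and the helper names `completeness` and `uniqueness` are hypothetical, chosen for illustration, and are not part of GX.

```python
# Hypothetical sample records; None marks a missing value.
records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "b@example.com"},
    {"id": 4, "email": "c@example.com"},
]

def completeness(rows, column):
    """Fraction of rows whose value in `column` is not missing."""
    present = sum(1 for row in rows if row.get(column) is not None)
    return present / len(rows)

def uniqueness(rows, column):
    """Fraction of non-missing values in `column` that are distinct."""
    values = [row[column] for row in rows if row.get(column) is not None]
    return len(set(values)) / len(values)

email_completeness = completeness(records, "email")  # 3 of 4 emails present
id_uniqueness = uniqueness(records, "id")  # 3 distinct ids among 4 values
print(email_completeness, id_uniqueness)
```

A score below 1.0 on either measure points to missing or duplicated values worth investigating.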

A scatter plot showing a cluster of values and an outlier.

1 https://nitin9809.medium.com/outlier-detection-and-treatment-part-1-aa0b09f60e50

Why is data quality important?

Garbage going in.

An amazing model receiving garbage as input.

Garbage going out.

A model can only be as good as the data going in!


What is Great Expectations?

Great Expectations (GX) - A platform for managing data quality

  • GX Cloud - web-based UI
  • GX Core - Python package

Expectations

Expectation - Verifiable assertion about data

  • Dataset shape
  • Null values
  • Duplicates
  • Value sets/ranges
  • String formatting
  • Data distributions
  • Data quality issues
  • etc.
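
The idea of a verifiable assertion can be sketched in plain Python. This is an illustration of the concept only, not the GX API: the function `expect_values_in_set`, the sample data, and the shape of the returned dictionary are all assumptions made for this example.

```python
def expect_values_in_set(values, allowed):
    """Check that every value belongs to `allowed` and report the offenders."""
    unexpected = [v for v in values if v not in allowed]
    return {"success": not unexpected, "unexpected_values": unexpected}

# Hypothetical column of clothing sizes
sizes = ["S", "M", "L", "XL", "M"]

result_ok = expect_values_in_set(sizes, {"S", "M", "L", "XL"})
result_bad = expect_values_in_set(sizes, {"S", "M", "L"})

print(result_ok)
print(result_bad)
```

The assertion either holds or fails for a given dataset, and a failure identifies the values that broke it.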


1 https://docs.greatexpectations.io/docs/core/define_expectations/create_an_expectation/
  https://mathbitsnotebook.com/Algebra2/Statistics/STnormalDistribution.html

Data Contexts

Data Context - The primary entry point for a GX deployment

  • Configurations and methods for all supporting GX components
    • Data Sources
    • Expectation Suites
    • Checkpoints
    • Data Docs
    • Validation Results
    • Metrics
1 https://docs.greatexpectations.io/docs/core/set_up_a_gx_environment/create_a_data_context/

Importing GX

Import Great Expectations with the alias gx:

import great_expectations as gx

Creating a Data Context

Use get_context() to create the Data Context:

context = gx.get_context()

print(context)
{ "analytics_enabled": true,
  "checkpoint_store_name": "default_checkpoint_store",
  "config_variables_file_path": "uncommitted/config_variables.yml",
  "config_version": 4.0,
  "data_context_id": "5b407294-b17c-43e3-aa5f-4f8a4741e772",
  "expectations_store_name": "default_expectations_store",
  "fluent_datasources": {},
  "plugins_directory": "plugins/",
  "stores": {},
  "validation_results_store_name": "default_validations_store" }

Let's practice!
