Connect to Data

Introduction to Data Quality with Great Expectations

Davina Moossazadeh

Data Scientist

Components

GX components - Python classes that represent data and data validation entities

  • Data Context ✅
  • Data Sources & Data Assets (✔)
  • Batch Definitions & Batches ☐
  • Expectations ☐
  • Expectation Suites ☐
  • Validation Definitions ☐
  • Checkpoints & Actions ☐
  • Data Docs ☐
1 https://docs.greatexpectations.io/docs/core/introduction/gx_overview

Data Sources

Data Source - An object that tells GX how to connect to a specific source of external data

[Logos: SQL, Spark, pandas (pandas highlighted)]

1 https://docs.greatexpectations.io/docs/core/connect_to_data/dataframes/

Creating a Data Source

Data Sources are managed through the Data Context's .data_sources attribute. To create one for pandas DataFrames, call its .add_pandas() method:

data_source = context.data_sources.add_pandas(
    name="my_pandas_datasource"
)

Note: The name parameter sets the name GX registers for the Data Source; it is independent of the Python variable that holds the returned object.

  • The two can differ, e.g., the GX name "my_pandas_datasource" vs. the Python variable data_source

Data Assets

Data Asset - A collection of records within a Data Source

data_asset = data_source.add_dataframe_asset(
    name="my_dataframe_asset"
)

Cheat sheet

Create Data Source from Data Context:

data_source = context.data_sources.add_pandas(
    name="my_pandas_datasource"  # name (str): the name GX registers
)

Create Data Asset from Data Source:

data_asset = data_source.add_dataframe_asset(
    name="my_dataframe_asset"  # name (str): the name GX registers
)

Let's practice!
