Introduction to Data Quality with Great Expectations
Davina Moossazadeh
Data Scientist
Missingness Expectation
gx.expectations.ExpectColumnValuesToNotBeNull(
    column="colour"
)
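As a rough illustration of what this check asserts, the plain-Python sketch below tests a column for missing values. The `colours` sample and the `values_not_null` helper are hypothetical, and the sketch treats only `None` as missing; it is not how Great Expectations implements the check.

```python
def values_not_null(values):
    """Return True when no entry in the column is missing (None)."""
    return all(v is not None for v in values)


# Hypothetical sample of the "colour" column
colours = ["Khaki", "Purple", None, "Grey"]

print(values_not_null(colours))            # False: one value is missing
print(values_not_null(["Khaki", "Grey"]))  # True
```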
Type Expectation
gx.expectations.ExpectColumnValuesToBeOfType(
    column="review_count", type_="str"
)
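The idea behind a type check can be sketched in plain Python by comparing each value's type name with the expected one. The `values_of_type` helper and the sample data are hypothetical, and matching on Python built-in type names is an assumption of this sketch; the type names the library accepts depend on the backend in use.

```python
def values_of_type(values, type_):
    """Return True when every non-missing value has the named Python type."""
    return all(type(v).__name__ == type_ for v in values if v is not None)


# Hypothetical sample: review counts that were stored as text
review_counts = ["12", "7", "103"]

print(values_of_type(review_counts, "str"))  # True
print(values_of_type(review_counts, "int"))  # False
```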
Distinct values Expectation
gx.expectations.ExpectColumnDistinctValuesToEqualSet(
    column="seller_name", value_set={"Womens Shoes"}
)
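In plain Python, this kind of check amounts to comparing the set of distinct column values with the expected set. The helper name and sample rows below are hypothetical, and skipping `None` values is an assumption of the sketch.

```python
def distinct_values_equal_set(values, value_set):
    """Return True when the distinct non-missing values match value_set exactly."""
    return {v for v in values if v is not None} == set(value_set)


# Hypothetical samples of the "seller_name" column
one_seller = ["Womens Shoes", "Womens Shoes", "Womens Shoes"]
two_sellers = ["Womens Shoes", "Mens Shoes"]

print(distinct_values_equal_set(one_seller, {"Womens Shoes"}))   # True
print(distinct_values_equal_set(two_sellers, {"Womens Shoes"}))  # False: extra value
```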
Unique value count Expectation
gx.expectations.ExpectColumnUniqueValueCountToBeBetween(
    column="review_count", min_value=5, max_value=101
)
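A minimal sketch of the same idea in plain Python: count the distinct values and test whether that count falls within the bounds. The helper and the sample data are hypothetical, and the sketch assumes both bounds are inclusive.

```python
def unique_value_count_between(values, min_value, max_value):
    """Return True when the number of distinct values is within [min_value, max_value]."""
    distinct_count = len(set(values))
    return min_value <= distinct_count <= max_value


# Hypothetical sample of the "review_count" column: 5 distinct values
review_counts = [3, 3, 10, 25, 40, 40, 101]

print(unique_value_count_between(review_counts, 5, 101))  # True
print(unique_value_count_between([3, 3, 10], 5, 101))     # False: only 2 distinct values
```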
Uniqueness Expectation
gx.expectations.ExpectColumnValuesToBeUnique(
    column="sku_id"
)
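Uniqueness can be sketched in plain Python by comparing the number of values with the number of distinct values. The helper and the sample identifiers are hypothetical and only illustrate the idea.

```python
def values_unique(values):
    """Return True when no value appears more than once."""
    return len(values) == len(set(values))


# Hypothetical samples of the "sku_id" column
print(values_unique(["A1", "B2", "C3"]))  # True
print(values_unique(["A1", "B2", "A1"]))  # False: "A1" is repeated
```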
Mode Expectation
gx.expectations.ExpectColumnMostCommonValueToBeInSet(
    column="colour", value_set={"Khaki", "Purple", "Grey"}
)
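One way to picture a mode check is the plain-Python sketch below, which finds the most frequent value with `collections.Counter` and tests it against the allowed set. The helper and sample data are hypothetical. How ties are handled here (every tied mode must be in the set) is an assumption of the sketch and may differ from the library's behaviour.

```python
from collections import Counter


def most_common_value_in_set(values, value_set):
    """Return True when the most frequent non-missing value(s) all lie in value_set."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return False  # no values to compute a mode from
    top = max(counts.values())
    modes = {v for v, n in counts.items() if n == top}
    return modes <= set(value_set)


allowed = {"Khaki", "Purple", "Grey"}

# Hypothetical samples of the "colour" column
print(most_common_value_in_set(["Khaki", "Grey", "Khaki", "Blue"], allowed))  # True
print(most_common_value_in_set(["Blue", "Blue", "Grey"], allowed))            # False
```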
Row-level Expectations:
ExpectColumnValuesToNotBeNull(
    column: str
)
ExpectColumnValuesToBeOfType(
    column: str, type_: str
)
Aggregate-level Expectations:
ExpectColumnDistinctValuesToEqualSet(
    column: str, value_set: set
)
ExpectColumnUniqueValueCountToBeBetween(
    column: str,
    min_value: int, max_value: int
)
ExpectColumnValuesToBeUnique(column: str)
ExpectColumnMostCommonValueToBeInSet(
    column: str, value_set: set
)