Introduction to Data Quality with Great Expectations
Davina Moossazadeh
Data Scientist
Batch Definition - A configuration for how a Data Asset should be divided for testing
batch_definition = data_asset.add_batch_definition_whole_dataframe(
    name="my_batch_definition"
)
print(batch_definition)
id='69e2a81d-1c28-4d1a-b66e-52cdc1198266'
name='my_batch_definition'
partitioner=None
Batch - A group of records that validations can be run on
batch = batch_definition.get_batch(
    batch_parameters={"dataframe": dataframe}
)
We can use .head(), as with pandas:
print(batch.head())
print(batch.head(fetch_all=True))
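To make the effect of fetch_all concrete, here is a minimal plain-Python sketch of the row-limiting behaviour. It is a toy stand-in, not Great Expectations code: the head function, its n_rows default of five (assumed to mirror pandas), and the readings list are all hypothetical and exist only for this illustration.

```python
# Toy stand-in for the row-limiting behaviour of batch.head(); not GX code.
def head(rows, n_rows=5, fetch_all=False):
    """Return every row if fetch_all is set, otherwise the first n_rows."""
    return list(rows) if fetch_all else list(rows[:n_rows])


# Hypothetical temperature readings, invented for this illustration.
readings = [12.1, 13.4, 11.8, 14.0, 15.2, 16.7, 13.9]

preview = head(readings)                     # default: a short preview
everything = head(readings, fetch_all=True)  # all rows, no limit

print(len(preview), len(everything))
```

The default call is the cheaper choice for a quick look at the data; fetch_all=True is meant for cases where every record is needed.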
.columns() shows all DataFrame columns (note the parentheses: it is a method call, unlike the pandas .columns attribute)
print(batch.columns())
['Location',
'Date_Time',
'Temperature_C',
'Humidity_pct',
'Precipitation_mm',
'Wind_Speed_kmh']
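One practical use of this column list is to confirm that the columns you plan to test are actually present. The sketch below does this with plain Python. The list is copied from the output above, standing in for a live batch.columns() call. The required_columns set and the missing_columns helper are hypothetical names introduced for this example.

```python
# Column names copied from the batch.columns() output shown above.
actual_columns = [
    "Location",
    "Date_Time",
    "Temperature_C",
    "Humidity_pct",
    "Precipitation_mm",
    "Wind_Speed_kmh",
]

# Hypothetical set of columns that later checks are assumed to depend on.
required_columns = {"Location", "Date_Time", "Temperature_C"}


def missing_columns(actual, required):
    """Return the required column names that are absent, sorted by name."""
    return sorted(set(required) - set(actual))


missing = missing_columns(actual_columns, required_columns)
print(missing)  # an empty list means every required column is present
```

With a real Batch, actual_columns would presumably be replaced by the result of batch.columns().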
Create Batch Definition from Data Asset:
batch_definition = data_asset.add_batch_definition_whole_dataframe(
    name="my_batch_definition"  # name: str
)
Create Batch from Batch Definition:
batch = batch_definition.get_batch(
    batch_parameters={"dataframe": dataframe}
)
Get Batch DataFrame rows:
batch.head(fetch_all=True)  # fetch_all: bool
Get Batch DataFrame column list:
batch.columns()