Creating Expectations

Introduction to Data Quality with Great Expectations

Davina Moossazadeh

Data Scientist

Expectations

Expectation: a verifiable assertion about data

  • Column Expectations
  • Shape and schema Expectations
    • schema: the blueprint of a dataset's structure
1 https://docs.greatexpectations.io/docs/reference/learn/terms/expectation

Renewable Power Generation dataset

A pandas DataFrame containing Renewable Power Generation data, with the columns "Time", "Energy delta[Wh]", "GHI", "temp", "pressure", "humidity", "wind_speed", "rain_1h", "snow_1h", and "clouds_all". The DataFrame has 118,066 rows.

1 https://www.kaggle.com/datasets/pythonafroz/renewable-power-generation-and-weather-conditions

Creating an Expectation

gx.expectations.Expect...(...)

GX classes (Expectations): PascalCase

GX functions/methods: snake_case

1 https://docs.greatexpectations.io/docs/core/define_expectations/create_an_expectation/
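The two conventions line up: the validation output for the class ExpectTableRowCountToEqual reports its type as expect_table_row_count_to_equal. The sketch below only illustrates that naming relationship; `to_snake_case` is a hypothetical helper, not part of GX, and makes no claim about how GX derives the name internally.

```python
import re


def to_snake_case(class_name: str) -> str:
    """Hypothetical helper: turn a PascalCase Expectation class name
    into the snake_case form seen in the Expectation type."""
    # Insert "_" before every uppercase letter except the first one, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


print(to_snake_case("ExpectTableRowCountToEqual"))
# expect_table_row_count_to_equal
print(to_snake_case("ExpectColumnToExist"))
# expect_column_to_exist
```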

Creating an Expectation

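import great_expectations as gx

# Expect the table to have exactly 118,000 rows
row_count_expectation = gx.expectations.ExpectTableRowCountToEqual(
    value=118000
)
# `batch` is assumed to be a GX Batch created earlier from the DataFrame
validation_results = batch.validate(
    expect=row_count_expectation
)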
1 https://docs.greatexpectations.io/docs/core/define_expectations/create_an_expectation/ https://docs.greatexpectations.io/docs/core/define_expectations/test_an_expectation/
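Conceptually, this Expectation compares the table's row count with `value`. The sketch below re-implements that comparison in plain Python with a hypothetical function `check_row_count_equal`; it is not the GX API and only mirrors the `success` and `result.observed_value` fields that GX reports.

```python
def check_row_count_equal(rows: list, value: int) -> dict:
    """Hypothetical stand-in for the logic of ExpectTableRowCountToEqual:
    compare the number of rows with the expected value."""
    observed = len(rows)
    return {"success": observed == value, "result": {"observed_value": observed}}


# 118,066 placeholder rows, matching the size of the renewable power DataFrame
rows = [{"row_id": i} for i in range(118066)]

outcome = check_row_count_equal(rows, value=118000)
print(outcome)
# {'success': False, 'result': {'observed_value': 118066}}
```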

Evaluating an Expectation

print(validation_results)
{ 
    "success": false,
    "expectation_config": {
        "type": "expect_table_row_count_to_equal",
        "kwargs": {"batch_id": "my_pandas_datasource-my_dataframe_asset", "value": 118000},
        "meta": {},
        "rendered_content": [{"name": "atomic.prescriptive.summary", "value": {"schema": {"type": "com.superconductive.rendered.string"}, "template": "Must have exactly $value rows.", "params": {"value": {"schema": {"type": "number"}, "value": 118000}}}, "value_type": "StringValueType"}]
    },
    "result": {"observed_value": 118066},
    "meta": {},
    "exception_info": {"raised_exception": false, "exception_traceback": null, "exception_message": null},
    "rendered_content": [{"name": "atomic.diagnostic.observed_value", "value": {"schema": {"type": "com.superconductive.rendered.string"}, "template": "118066", "params": {}}, "value_type": "StringValueType"}]
}
1 https://docs.greatexpectations.io/docs/core/run_validations/run_a_validation_definition/

Evaluating an Expectation

print(validation_results.describe())
{
    "expectation_type": "expect_table_row_count_to_equal",
    "success": false,
    "kwargs": {
        "batch_id": "my_pandas_datasource-my_dataframe_asset",
        "value": 118000
    },
    "result": {
        "observed_value": 118066
    }
}
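The printed summary is JSON (note the lowercase `false`), so it can be loaded back into Python with the standard `json` module, for example to log or compare fields. A minimal sketch that works on the text shown above; it assumes you already have that text as a string (here copied in by hand) and does not rely on any particular GX return type.

```python
import json

# The text printed by validation_results.describe() above, copied verbatim
described = """
{
    "expectation_type": "expect_table_row_count_to_equal",
    "success": false,
    "kwargs": {
        "batch_id": "my_pandas_datasource-my_dataframe_asset",
        "value": 118000
    },
    "result": {
        "observed_value": 118066
    }
}
"""

summary = json.loads(described)  # JSON false becomes Python False
expected = summary["kwargs"]["value"]
observed = summary["result"]["observed_value"]
print(summary["success"], expected, observed)
# False 118000 118066
```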

Evaluating an Expectation

print(validation_results.success)
False
print(validation_results["success"])
False
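Both forms return the same value because the result object exposes its fields both as attributes and as dictionary keys. A minimal sketch of that dual access, using a hypothetical `ResultView` class rather than GX's own implementation:

```python
class ResultView(dict):
    """Hypothetical stand-in for a validation result: a dict whose
    keys can also be read as attributes (not GX's actual class)."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails; fall back to the dict key.
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc


validation_results = ResultView(success=False, result={"observed_value": 118066})

print(validation_results.success)     # False
print(validation_results["success"])  # False
```

Attribute access reads more naturally in a guard such as `if not validation_results.success:`, while key access is convenient when the field name is held in a variable.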

Evaluating an Expectation

print(validation_results.result)
{'observed_value': 118066}
print(validation_results["result"])
{'observed_value': 118066}
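The `observed_value` explains why the Expectation failed once it is compared with the `value` that was passed in. A small plain-Python sketch using the numbers from the output above; in a real run you would read them from the validation results instead of typing them in.

```python
# Values taken from the printed output: expected from kwargs, observed from result
expected_rows = 118000
result = {"observed_value": 118066}

difference = result["observed_value"] - expected_rows
print(f"Observed {result['observed_value']} rows, expected {expected_rows}: {difference:+d}")
# Observed 118066 rows, expected 118000: +66
```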

Other common Expectations

Shape Expectations:

ExpectTableRowCountToEqual(value: int)
ExpectTableRowCountToBeBetween(
    min_value: int, max_value: int
)
ExpectTableColumnCountToEqual(
    value: int
)
ExpectTableColumnCountToBeBetween(
    min_value: int, max_value: int
)
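A range is more forgiving than an exact count: the exact check against 118,000 failed on 118,066 rows, while a range between 100,000 and 120,000 would pass. A plain-Python sketch of the range and column-count logic, using hypothetical helpers (not GX functions) and assuming both bounds are inclusive:

```python
def row_count_between(rows: list, min_value: int, max_value: int) -> bool:
    """Hypothetical check mirroring ExpectTableRowCountToBeBetween,
    assuming both bounds are inclusive."""
    return min_value <= len(rows) <= max_value


def column_count_equal(columns: list, value: int) -> bool:
    """Hypothetical check mirroring ExpectTableColumnCountToEqual."""
    return len(columns) == value


columns = ["Time", "Energy delta[Wh]", "GHI", "temp", "pressure",
           "humidity", "wind_speed", "rain_1h", "snow_1h", "clouds_all"]
rows = [{"row_id": i} for i in range(118066)]

print(row_count_between(rows, min_value=100000, max_value=120000))  # True
print(column_count_equal(columns, value=10))                         # True
```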

Column name Expectations:

ExpectTableColumnsToMatchSet(
    column_set: set
)
ExpectColumnToExist(column: str)
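A match-set check compares the table's whole set of column names, while a column-exists check looks for a single name. A plain-Python sketch of both ideas with hypothetical helpers (not GX functions); the set comparison here is an exact match, and names are compared case-sensitively in this sketch.

```python
def columns_match_set(columns: list, column_set: set) -> bool:
    """Hypothetical check mirroring ExpectTableColumnsToMatchSet with an
    exact match: the table's column names must equal column_set."""
    return set(columns) == set(column_set)


def column_exists(columns: list, column: str) -> bool:
    """Hypothetical check mirroring ExpectColumnToExist."""
    return column in columns


columns = ["Time", "Energy delta[Wh]", "GHI", "temp", "pressure",
           "humidity", "wind_speed", "rain_1h", "snow_1h", "clouds_all"]
expected = {"Time", "Energy delta[Wh]", "GHI", "temp", "pressure",
            "humidity", "wind_speed", "rain_1h", "snow_1h", "clouds_all"}

print(columns_match_set(columns, expected))         # True
print(columns_match_set(columns, {"Time", "GHI"}))  # False: other columns exist too
print(column_exists(columns, "GHI"))                # True
print(column_exists(columns, "ghi"))                # False: case differs
```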

Quick recap

Create an Expectation:

gx.expectations.Expect...(...)

Validate an Expectation:

validation_results = batch.validate(
    expect=expectation
)

Create a row count Expectation:

expectation = gx.expectations.ExpectTableRowCountToEqual(
    value: int
)

Inspect the validation results:

validation_results.describe()
validation_results.success
validation_results.result

Let's practice!
