Designing Forecasting Pipelines for Production
Rami Krispin
Senior Manager, Data Science and Engineering

print(raw)
                period  respondent         respondent-name  type  type-name   value  value-units
0  2025-05-01 00:00:00        US48  United States Lower 48     D     Demand  504242
1  2025-04-30 23:00:00        US48  United States Lower 48     D     Demand  508099
2  2025-04-30 22:00:00        US48  United States Lower 48     D     Demand  508323
3  2025-04-30 21:00:00        US48  United States Lower 48     D     Demand  500551
4  2025-04-30 20:00:00        US48  United States Lower 48     D     Demand  492240
import pointblank as pb

# Expected column names and dtypes; names follow the printed `raw` table
table_schema = pb.Schema(
    columns=[
        ("period", "datetime64[ns]"),
        ("respondent", "object"),
        ("respondent-name", "object"),
        ("type", "object"),
        ("type-name", "object"),
        ("value", "int64"),
        ("value-units", "object"),
    ]
)

# Build the validation plan, then run it with interrogate()
validation = (
    pb.Validate(
        data=raw,
        tbl_name="US48 Data Validation",
        label="Data Refresh",
        thresholds=pb.Thresholds(warning=0.2, error=0, critical=0.1),
    )
    .col_schema_match(schema=table_schema)                # columns and dtypes match the schema
    .col_vals_gt(columns="value", value=0)                # demand must be positive
    .col_vals_in_set(columns="respondent", set=["US48"])  # only the US48 respondent
    .col_vals_in_set(columns="type", set=["D"])           # only demand records
    .col_vals_not_null(columns=["period", "value"])       # no missing timestamps or values
    .rows_distinct()                                      # no duplicated rows
    .interrogate()
)

print(validation.all_passed())
True
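
To make explicit what the row-level checks above assert, here is a minimal plain-Python sketch of the same rules, using only the standard library. It is not how pointblank implements them. The sample `rows`, the helper names `failing_fractions` and `all_passed`, and the rule that only a repeated copy of a row counts as a duplicate are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical sample mirroring the first rows of the printed table.
rows = [
    {"period": datetime(2025, 5, 1, 0), "respondent": "US48", "type": "D", "value": 504242},
    {"period": datetime(2025, 4, 30, 23), "respondent": "US48", "type": "D", "value": 508099},
    {"period": datetime(2025, 4, 30, 22), "respondent": "US48", "type": "D", "value": 508323},
]

ALLOWED_RESPONDENTS = {"US48"}
ALLOWED_TYPES = {"D"}


def failing_fractions(rows):
    """Return, for each check, the fraction of rows that fail it."""
    n = len(rows)
    if n == 0:
        raise ValueError("no rows to validate")

    fails = {
        "value_gt_0": 0,
        "respondent_in_set": 0,
        "type_in_set": 0,
        "not_null": 0,
        "rows_distinct": 0,
    }
    seen = set()
    for r in rows:
        value = r.get("value")
        if value is None or value <= 0:
            fails["value_gt_0"] += 1
        if r.get("respondent") not in ALLOWED_RESPONDENTS:
            fails["respondent_in_set"] += 1
        if r.get("type") not in ALLOWED_TYPES:
            fails["type_in_set"] += 1
        if r.get("period") is None or value is None:
            fails["not_null"] += 1
        # Assumption: only the second and later copies of a row count as duplicates.
        key = tuple(sorted(r.items(), key=lambda kv: kv[0]))
        if key in seen:
            fails["rows_distinct"] += 1
        seen.add(key)

    return {name: count / n for name, count in fails.items()}


def all_passed(rows):
    """True only when no row fails any check."""
    return all(frac == 0 for frac in failing_fractions(rows).values())


print(all_passed(rows))
```

Returning fractions rather than a single boolean keeps the door open for threshold-style gating, where a small share of failing rows raises a warning and a larger share stops the refresh.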