Data quality and ingestion

MLOps Concepts

Folkert Stijnman

ML Engineer

Machine learning lifecycle: data acquisition

What is data quality?

  • Data quality is a measure of how well data serves its intended purpose
  • It is evaluated along several dimensions
  • The quality of an ML model depends on the quality of its data

Data quality dimensions

  • Accuracy
  • Completeness
  • Consistency
  • Timeliness

Data quality dimensions example

Dimension    | Example question to answer                                              | Example of poor quality
-------------|-------------------------------------------------------------------------|------------------------------------------------------------------------------
Accuracy     | Does our data correctly describe the customer?                          | The customer's age in the data is 18, but is actually 32.
Completeness | Is any customer data missing?                                           | For 80% of the customers, we don't have a last name.
Consistency  | Is the definition of the customer synchronized throughout the company? | The customer is stated as active in one database but not active in another.
Timeliness   | When is the customer ordering data available?                           | The customer orders are synchronized at the end of the day and are not available in real time.
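The example questions in the table can be turned into automated checks. What follows is a minimal sketch, assuming a hypothetical `Customer` record, two hypothetical systems holding the same customers (here called `crm` and `billing`), and a trusted reference for ages; none of these names come from the course. Each function measures one dimension: the share of filled values (completeness), disagreement with a reference (accuracy), disagreement between sources (consistency), and the age of the last synchronization (timeliness).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Set


@dataclass
class Customer:
    # Hypothetical customer record, used only to illustrate the checks.
    customer_id: int
    last_name: Optional[str]
    age: Optional[int]
    active: bool


def completeness(records: List[Customer], field: str) -> float:
    """Completeness: share of records where `field` is neither None nor empty."""
    if not records:
        return 1.0
    filled = sum(1 for r in records if getattr(r, field) not in (None, ""))
    return filled / len(records)


def accuracy_mismatches(records: List[Customer], reference_ages: Dict[int, int]) -> Set[int]:
    """Accuracy: IDs whose stored age disagrees with a trusted reference."""
    return {
        r.customer_id
        for r in records
        if r.customer_id in reference_ages and r.age != reference_ages[r.customer_id]
    }


def consistency_conflicts(source_a: List[Customer], source_b: List[Customer]) -> Set[int]:
    """Consistency: IDs whose `active` flag differs between two sources."""
    active_in_b = {r.customer_id: r.active for r in source_b}
    return {
        r.customer_id
        for r in source_a
        if r.customer_id in active_in_b and active_in_b[r.customer_id] != r.active
    }


def data_age(last_sync: datetime, now: datetime) -> timedelta:
    """Timeliness: how old the most recently synchronized data is."""
    return now - last_sync


if __name__ == "__main__":
    # Two hypothetical systems holding the same customers.
    crm = [Customer(1, "Rossi", 18, True), Customer(2, None, 45, True), Customer(3, None, 27, False)]
    billing = [Customer(1, "Rossi", 18, False), Customer(2, None, 45, True)]

    print(completeness(crm, "last_name"))       # 0.333... (1 of 3 has a last name)
    print(accuracy_mismatches(crm, {1: 32}))    # {1}
    print(consistency_conflicts(crm, billing))  # {1}
    print(data_age(datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 18, 0)))  # 18:00:00
```

Each check returns a number or a set of IDs rather than raising an error, so a team can decide per dimension what threshold is acceptable before the data is used for training.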

Low data quality is not the end of the project!

Data ingestion

Data pipeline
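A data pipeline is often structured as extract, transform, load (ETL). The sketch below assumes that shape and uses only Python's standard library: the raw CSV, its column names, and the `customers` table are hypothetical. Rows that fail type conversion are set aside instead of loaded, so a quality problem is surfaced at ingestion rather than silently reaching the model.

```python
import csv
import io
import sqlite3
from typing import Dict, List, Optional, Tuple

# Hypothetical raw export; in practice this might come from a file, an API, or a database.
RAW_CSV = """customer_id,last_name,age
1,Rossi,32
2,,45
3,Bianchi,not_a_number
"""

Record = Tuple[int, Optional[str], int]


def extract(raw: str) -> List[Dict[str, str]]:
    """Extract: read raw CSV text into dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: List[Dict[str, str]]) -> Tuple[List[Record], List[Dict[str, str]]]:
    """Transform: cast types; rows that cannot be cast are rejected."""
    valid: List[Record] = []
    rejected: List[Dict[str, str]] = []
    for row in rows:
        try:
            # An empty last name becomes None so it is stored as NULL.
            record = (int(row["customer_id"]), row["last_name"] or None, int(row["age"]))
        except ValueError:
            rejected.append(row)
            continue
        valid.append(record)
    return valid, rejected


def load(conn: sqlite3.Connection, records: List[Record]) -> None:
    """Load: write valid records into a SQLite table (re-runs replace existing IDs)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(customer_id INTEGER PRIMARY KEY, last_name TEXT, age INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(raw: str, conn: sqlite3.Connection) -> Tuple[int, int]:
    """Run extract -> transform -> load and return (loaded, rejected) counts."""
    valid, rejected = transform(extract(raw))
    load(conn, valid)
    return len(valid), len(rejected)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(RAW_CSV, connection))  # (2, 1)
```

Note that the customer with a missing last name is still loaded: a completeness gap is recorded as NULL for later measurement, while a value that cannot be parsed at all is rejected.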

Let's practice!