Data source limitations

Responsible AI Data Management

Maria Prokofieva

Lead ML Engineer

Data source and common limitations

  • Legal compliance
  • Bias
  • Methodology
  • Role of domain knowledge

Legal and access-based limitations

  • Restricted use of data for particular types of projects
  • Additional compliance
  • Prohibitive cost

 

Data compliance

1 Image by Streamline HQ

Bias in data sources

  • Systematic errors
  • Distorted perceptions
  • Uneven outcomes
  • Disadvantage certain groups

Types of bias

  • Historical bias:
    • Past patterns and outcomes that are no longer relevant
  • Selection bias:
    • Choosing which data points to include
  • Sampling bias:
    • Method used to draw the sample
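
One way to make selection or sampling bias visible is to compare each group's share in the collected data with its share in a trusted reference population. The sketch below is illustrative only: the function name `representation_gap`, the group labels, and all numbers are hypothetical and do not come from the course material.

```python
from collections import Counter


def representation_gap(sample_groups, reference_shares):
    """Return (sample share - reference share) for each reference group.

    A positive value means the group is over-represented in the sample,
    a negative value means it is under-represented.
    """
    if not sample_groups:
        raise ValueError("sample_groups must not be empty")
    counts = Counter(sample_groups)
    total = len(sample_groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in reference_shares.items()
    }


# Hypothetical data: group "A" makes up 70% of the sample
# but only 50% of the reference population.
sample = ["A"] * 70 + ["B"] * 30
reference = {"A": 0.5, "B": 0.5}
gaps = representation_gap(sample, reference)
```

A large gap does not say which type of bias caused it; it only signals that the data source needs a closer look.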


Bias and origin-based limitations

  • Restricted data coverage
  • Cultural and geographical constraints
  • Restricted in scope and inclusion

   


Bias and methodology-based limitations

  • Choice of data collection methods
  • Sampling approaches
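
To illustrate how the sampling approach affects coverage, the sketch below draws a stratified sample so that every group contributes in proportion to its size, rather than relying on a single simple random draw that may miss small groups. The function `stratified_sample`, the `area` field, and the record counts are assumptions made for this example only.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Sample the same fraction from every stratum defined by key(record).

    Each stratum contributes at least one record so that small groups
    are not dropped entirely.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical records: 90 urban and 10 rural observations.
records = [{"id": i, "area": "urban"} for i in range(90)]
records += [{"id": 90 + i, "area": "rural"} for i in range(10)]
sample = stratified_sample(records, key=lambda r: r["area"], fraction=0.2)
rural_in_sample = sum(1 for r in sample if r["area"] == "rural")
```

Stratification only helps for groups that are present in the source; groups that were never collected still need a different data source or method.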


Domain knowledge

  • Limitations or hidden biases
  • Engage with domain experts early
  • Mitigate limitations before modeling

Urban traffic flow project

Data sources:

  • Traffic count data
  • City council meeting notes
  • GPS tracking data


Urban traffic flow project

Historical traffic count data:

  • Historical bias
  • Past urban layouts
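
A simple first check for historical bias is to separate counts collected before a known change in the road layout from those collected afterwards, so that outdated records can be reviewed rather than silently mixed in. This is a minimal sketch under stated assumptions: the cutoff date, the record fields `date` and `count`, and the helper `split_by_cutoff` are all hypothetical.

```python
from datetime import date


def split_by_cutoff(records, cutoff):
    """Split records into (recent, outdated) around a cutoff date.

    Records dated on or after the cutoff count as recent.
    """
    recent = [r for r in records if r["date"] >= cutoff]
    outdated = [r for r in records if r["date"] < cutoff]
    return recent, outdated


# Hypothetical traffic counts; assume the layout changed on 2020-01-01.
traffic_counts = [
    {"date": date(2015, 6, 1), "count": 1200},
    {"date": date(2019, 6, 1), "count": 1500},
    {"date": date(2021, 6, 1), "count": 900},
]
recent, outdated = split_by_cutoff(traffic_counts, date(2020, 1, 1))
```

Whether the outdated records should be dropped, down-weighted, or kept is a judgment call that benefits from domain expertise.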

Meeting minutes data:

  • Selection bias
  • Disproportionate representation of some community members


Urban traffic flow project

GPS tracking data:

  • Sampling and selection biases
  • May not represent all commuters
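
If the share of each commuter group in the wider population is known, one common mitigation is to reweight the GPS sample so that under-represented groups count for more. The sketch below shows this idea (often called post-stratification weighting) and is not a method stated in the slides. The commute modes, counts, and population shares are invented for illustration, and `poststratification_weights` is a hypothetical helper.

```python
def poststratification_weights(sample_counts, reference_shares):
    """Return a weight per group: reference share / sample share.

    Groups with no observations in the sample are skipped, because
    no weight can compensate for data that was never collected.
    """
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample_counts must contain at least one observation")
    weights = {}
    for group, share in reference_shares.items():
        count = sample_counts.get(group, 0)
        if count == 0:
            continue
        weights[group] = share / (count / total)
    return weights


# Hypothetical GPS sample dominated by car trips.
gps_counts = {"car": 80, "bus": 15, "bike": 5}
commuter_shares = {"car": 0.6, "bus": 0.3, "bike": 0.1}
weights = poststratification_weights(gps_counts, commuter_shares)
```

Weights above 1 mark groups the GPS data under-represents; very large weights suggest the sample is too thin for that group to be corrected reliably.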


Let's practice!
