Data sources

Responsible AI Data Management

Maria Prokofieva

Lead ML engineer

Coming up...

  • Data source types
  • Limitations and selection
  • Integrating multiple data sources


Why the data source matters

  • Integrity
  • Diversity
  • Fair representation


1 Image by Streamline HQ

Types by origin

  • Primary

    • Data collected within the project
    • Compliance and consent
  • Secondary

    • Data acquired from existing resources
    • Licensing agreements


Types by nature

  • Quantitative

    • Numeric data
  • Qualitative

    • Non-numeric
  • Mixed

    • Combination of numeric and non-numeric
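
A minimal sketch of how the three natures above could be told apart in practice. The function name `classify_nature` and the sample values are hypothetical, chosen only for illustration; real datasets usually need per-column rules rather than a single pass over raw values.

```python
from numbers import Number


def classify_nature(values):
    """Label a collection of values as quantitative, qualitative, or mixed."""
    # bool is a subclass of int in Python, so exclude it from the numeric check.
    numeric = [isinstance(v, Number) and not isinstance(v, bool) for v in values]
    if not numeric:
        raise ValueError("values must not be empty")
    if all(numeric):
        return "quantitative"
    if not any(numeric):
        return "qualitative"
    return "mixed"


# Hypothetical sample values, not taken from any real dataset.
vehicle_counts = [120, 98, 143.5]
meeting_topics = ["road closures", "bus lanes"]
survey_rows = [42, "often delayed", 17]

for label, values in [
    ("vehicle_counts", vehicle_counts),
    ("meeting_topics", meeting_topics),
    ("survey_rows", survey_rows),
]:
    print(label, "->", classify_nature(values))
```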


Types by temporality

  • Static
    • Does not change over time
    • Census data
    • Corporate addresses
  • Dynamic
    • Updated in real time
    • Social media streams
    • APIs
    • Financial market feeds
    • Sensor data


Diversity and fairness in data sources

  • Primary: data collectors' direct biases
  • Quantitative: allows measurable bias checks
  • Static: may not accurately represent current realities (outdated biases)
  • Secondary: inherited biases from the original context
  • Qualitative: requires nuanced analysis
  • Dynamic: continually evolves, possibly introducing real-time biases
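
One possible form of a measurable bias check is comparing how often each group appears in a dataset with the share it is expected to have. The sketch below is an assumption-laden illustration: the function `representation_gaps`, the district names, the reference shares, and the 0.1 threshold are all made up for the example.

```python
from collections import Counter


def representation_gaps(sample_groups, reference_shares):
    """Return observed share minus reference share for each reference group."""
    counts = Counter(sample_groups)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample_groups must not be empty")
    return {
        group: counts.get(group, 0) / total - expected
        for group, expected in reference_shares.items()
    }


# Hypothetical data: the district of each traffic sensor, compared with the
# (assumed) share of the road network that each district accounts for.
sensor_districts = ["north"] * 6 + ["south"] * 3 + ["east"] * 1
road_share = {"north": 0.4, "south": 0.3, "east": 0.3}

gaps = representation_gaps(sensor_districts, road_share)

# The 0.1 threshold is an arbitrary illustration, not a recommended value.
flagged = [group for group, gap in gaps.items() if abs(gap) > 0.1]
print(flagged)
```

A positive gap means the group is over-represented relative to the reference, a negative gap means it is under-represented.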

Urban traffic flow project

Data sources:

  • Historical traffic data
  • City council meeting minutes
  • GPS tracking data


Historical traffic data

  • City's transportation department
  • Last 5 years
  • Includes vehicle counts by time of day and day of week

This is a primary static quantitative source


Council meeting minutes

  • Public records available on the council website
  • Urban planning and traffic management summaries

This is a qualitative secondary source


GPS data

  • Primary dynamic source
  • Immediate insights into current traffic conditions, speeds, and delays
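
The three project sources can be recorded in a small catalogue tagged by origin, nature, and temporality. This is a sketch only: the `DataSource` class and its field names are hypothetical, and a label is left as `None` wherever the slides do not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataSource:
    name: str
    origin: Optional[str] = None       # "primary" or "secondary"
    nature: Optional[str] = None       # "quantitative", "qualitative", or "mixed"
    temporality: Optional[str] = None  # "static" or "dynamic"

    def unlabelled(self):
        """Return the dimensions that have not been classified yet."""
        return [
            field
            for field in ("origin", "nature", "temporality")
            if getattr(self, field) is None
        ]


# Labels are copied from the slides; missing ones are left unset, not guessed.
SOURCES = [
    DataSource("Historical traffic data", "primary", "quantitative", "static"),
    DataSource("Council meeting minutes", origin="secondary", nature="qualitative"),
    DataSource("GPS data", origin="primary", temporality="dynamic"),
]

for source in SOURCES:
    print(source.name, "| still to classify:", source.unlabelled())

# Dynamic sources keep changing, so they are candidates for recurring bias checks.
dynamic_names = [s.name for s in SOURCES if s.temporality == "dynamic"]
print(dynamic_names)
```

Keeping the unlabelled dimensions explicit makes it easier to see which sources still need a decision before they are combined.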


Let's practice!
