Data source selection

Responsible AI Data Management

Maria Prokofieva

Lead ML engineer

Why select?

  • Ensure data quality
  • Legal compliance
  • Fairness

 

Step 1. Project relevance

  • Relevance to the project objectives
  • Check for alignment with:
    • Subject area
    • Scope
    • Anticipated outcomes

Step 2. Data source integrity

  • Assess integrity and trustworthiness
  • Reviews and testimonials
  • Transparency in data collection
  • Compliance and licensing
  • Regular updates as a sign of a high-quality source

Step 3. Legal compliance

  • Lawfulness of data
  • Legal compliance for the project
  • Legal counsel:
    • Applicable laws and restrictions
    • Data anonymization
    • Data security requirements
  • Approved by a legal team

Step 4. Technical quality

  • Structural integrity and usability
  • Complete
  • Consistent
  • Accurate
  • Timely
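
The four checks above can be sketched as simple ratios over a set of records. This is a minimal illustration, not a prescribed method: the record fields (`sensor_id`, `count`, `recorded_on`), the sample values, and the rules used for "consistent" and "accurate" are all assumptions made up for the example.

```python
from datetime import date

# Hypothetical traffic-count records; field names and values are illustrative only.
records = [
    {"sensor_id": "A1", "count": 120, "recorded_on": date(2024, 5, 1)},
    {"sensor_id": "A2", "count": None, "recorded_on": date(2024, 5, 1)},
    {"sensor_id": "a1", "count": -5, "recorded_on": date(2023, 1, 15)},
]


def quality_report(rows, required, as_of, max_age_days):
    """Return the share of rows passing each check (0.0 to 1.0)."""
    total = len(rows)
    # Complete: every required field is present and not None.
    complete = sum(all(r.get(f) is not None for f in required) for r in rows)
    # Consistent (assumed rule): sensor IDs use upper-case letters.
    consistent = sum(str(r.get("sensor_id", "")).isupper() for r in rows)
    # Accurate (assumed rule): counts are non-negative integers.
    accurate = sum(
        isinstance(r.get("count"), int) and r["count"] >= 0 for r in rows
    )
    # Timely: the record is no older than max_age_days on the as_of date.
    timely = sum(
        r.get("recorded_on") is not None
        and (as_of - r["recorded_on"]).days <= max_age_days
        for r in rows
    )
    return {
        "complete": complete / total,
        "consistent": consistent / total,
        "accurate": accurate / total,
        "timely": timely / total,
    }


report = quality_report(
    records,
    required=("sensor_id", "count", "recorded_on"),
    as_of=date(2024, 5, 31),
    max_age_days=90,
)
print(report)
```

What counts as an acceptable share for each check depends on the project; the sketch only reports the ratios.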

Step 5. Bias and representativeness

  • Demographic representation analysis
  • Protected characteristics
  • Analyze the distribution of groups
  • Fairness metrics
  • Data augmentation
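
As a rough illustration of the distribution analysis and fairness metrics listed above, the sketch below computes each group's share of the data and the gap in positive-label rates between groups (often called the demographic parity difference). The group names, labels, and sample rows are hypothetical, and this is only one of many possible fairness metrics.

```python
from collections import Counter

# Hypothetical rows of (group, label); "group" stands in for a protected
# characteristic and "label" for a binary outcome. Values are illustrative only.
samples = [
    ("group_a", 1), ("group_a", 1), ("group_a", 0), ("group_a", 1),
    ("group_b", 0), ("group_b", 1),
]


def group_shares(rows):
    """Share of all rows that belongs to each group."""
    counts = Counter(group for group, _ in rows)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def positive_rates(rows):
    """Share of rows with label 1 inside each group."""
    totals = Counter(group for group, _ in rows)
    positives = Counter(group for group, label in rows if label == 1)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(rows):
    """Largest gap in positive-label rate between any two groups."""
    rates = positive_rates(rows)
    return max(rates.values()) - min(rates.values())


shares = group_shares(samples)
rates = positive_rates(samples)
gap = demographic_parity_difference(samples)
print(shares, rates, gap)
```

A large gap or a very small group share is a prompt to investigate further, for example by collecting or augmenting data for the under-represented group.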

Step 6. Selection

  • Include if:
    • Consistently aligns with the criteria in Steps 1 to 5
    • Can be corrected with transformation, augmentation, or algorithms
  • Exclude if:
    • Is lacking in key areas that cannot be corrected
  • Consult domain experts
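
The include/exclude rule above could be written down as a small decision function. This is a sketch under assumptions: the criterion names, the `Assessment` structure, and the example sources are hypothetical, and the output is meant to support a discussion with domain experts rather than replace it.

```python
from dataclasses import dataclass, field

# Assumed criterion names, one per step covered earlier.
CRITERIA = (
    "relevance",
    "integrity",
    "legal",
    "technical_quality",
    "representativeness",
)


@dataclass
class Assessment:
    name: str
    passed: dict  # criterion name -> bool
    # Failed criteria that could be corrected with transformation,
    # augmentation, or algorithms.
    fixable: set = field(default_factory=set)


def decide(assessment):
    """Return 'include', 'include with corrections', or 'exclude'."""
    failed = [c for c in CRITERIA if not assessment.passed.get(c, False)]
    if not failed:
        return "include"
    if all(c in assessment.fixable for c in failed):
        return "include with corrections"
    return "exclude"


# Hypothetical sources, for illustration only.
all_pass = Assessment("source_x", {c: True for c in CRITERIA})
needs_fix = Assessment(
    "source_y",
    {**{c: True for c in CRITERIA}, "representativeness": False},
    fixable={"representativeness"},
)
not_usable = Assessment(
    "source_z",
    {**{c: True for c in CRITERIA}, "legal": False},
)

decisions = {a.name: decide(a) for a in (all_pass, needs_fix, not_usable)}
print(decisions)
```

A criterion missing from `passed` is treated as failed, so an incomplete assessment never results in a silent include.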

Urban traffic flow project

Data sources:

  1. Traffic count data
  2. Council meeting notes
  3. GPS tracking data
  4. Social media mentions of traffic conditions
  5. Commuter survey data

1 Images by Streamline HQ

Urban traffic flow project

Exclude:

  • Social media data
  • Commuter survey data

Alterations:

  • Council meeting notes
  • Traffic count data
    • Additional data from sensors
  • GPS data
    • Additional data from cameras

Let's practice!
