Mitigating bias in data collection

Conquering Data Bias

Konstantinos Kattidis

Data Analytics Lead

Identifying bias in data collection

  • Selection bias, historical bias and measurement bias
  • Understanding these biases creates awareness, enabling data experts to proactively identify them and take action

Comparison between two analyses

  • Sensitivity analysis involves exploring how different assumptions, alternative subgroups, or weighting strategies affect the analysis results
  • External validation compares data against independent sources to check for consistency and accuracy
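
A sensitivity analysis of the kind described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the survey responses, the group names, and the three scenarios are all hypothetical, and the idea is simply to recompute one estimate under different weighting and subgroup assumptions and see how far it moves.

```python
def weighted_mean(values, weights):
    """Weighted average; weights do not need to sum to 1."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical survey responses as (group, score): 8 urban, 2 rural.
responses = [("urban", 4.0)] * 8 + [("rural", 2.0)] * 2
values = [score for _, score in responses]

# Each scenario is one assumption about how observations should count.
scenarios = {
    # Every response counts equally.
    "unweighted": [1.0] * len(responses),
    # Assume the true population is split 50/50 between the groups.
    "equal_group_share": [0.5 / 8 if g == "urban" else 0.5 / 2 for g, _ in responses],
    # Alternative subgroup: restrict the analysis to urban responses.
    "urban_only": [1.0 if g == "urban" else 0.0 for g, _ in responses],
}

results = {name: weighted_mean(values, w) for name, w in scenarios.items()}

# A large spread suggests the conclusion depends heavily on the assumptions.
spread = max(results.values()) - min(results.values())

for name, estimate in results.items():
    print(f"{name}: {estimate:.2f}")
print(f"spread across scenarios: {spread:.2f}")
```

If the estimate barely changes across scenarios, the result is robust to those assumptions; if it shifts a lot, the sample composition is probably driving the conclusion.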

Random and stratified sampling

Diagram about random and stratified sampling

  • Choosing an appropriate sampling technique helps reduce selection bias
  • Random sampling selects individuals or data points from a population at random, giving each an equal chance of being chosen
  • Stratified sampling divides the population into subgroups (strata) and then selects samples from each subgroup
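
The two sampling techniques can be compared with a short Python sketch. The population, the `region` field, and the helper names `simple_random_sample` and `stratified_sample` are assumptions made for illustration; the stratified version here draws the same fraction from every subgroup, which is only one of several possible allocation rules.

```python
import random
from collections import defaultdict

def simple_random_sample(records, n, seed=0):
    """Draw n records without replacement; every record is equally likely."""
    rng = random.Random(seed)
    return rng.sample(records, n)

def stratified_sample(records, key, fraction, seed=0):
    """Split records into strata by key, then sample the same fraction from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for members in strata.values():
        # Take at least one record so that small strata are never dropped.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 90 urban and 10 rural records.
population = [
    {"id": i, "region": "urban" if i < 90 else "rural"} for i in range(100)
]

srs = simple_random_sample(population, 10)
strat = stratified_sample(population, key=lambda r: r["region"], fraction=0.1)

rural_in_srs = sum(r["region"] == "rural" for r in srs)
rural_in_strat = sum(r["region"] == "rural" for r in strat)
print(f"rural records, simple random sample: {rural_in_srs}")
print(f"rural records, stratified sample: {rural_in_strat}")
```

With simple random sampling the small rural group may be missed entirely by chance, whereas the stratified draw guarantees it is represented.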

Balancing subgroup representation

Diagram showing undersampling and oversampling

  • Oversampling involves deliberately increasing the representation of certain groups or classes in a dataset to balance the distribution
  • Undersampling involves reducing the representation of overrepresented groups to achieve a more balanced dataset
  • Weighting involves assigning different weights to observations based on their importance, compensating for any imbalances in the sample distribution
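
The three balancing approaches can be sketched in plain Python as follows. The labels and the function names are hypothetical, and the choices made here (oversampling by random duplication up to the largest group, undersampling down to the smallest group, and weights inversely proportional to group size) are common conventions rather than the only options.

```python
import random
from collections import Counter, defaultdict

def group_by(records, key):
    """Collect records into lists keyed by their group."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return groups

def oversample(records, key, seed=0):
    """Duplicate random records from smaller groups up to the largest group's size."""
    rng = random.Random(seed)
    groups = group_by(records, key)
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced

def undersample(records, key, seed=0):
    """Randomly drop records from larger groups down to the smallest group's size."""
    rng = random.Random(seed)
    groups = group_by(records, key)
    target = min(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(rng.sample(members, target))
    return balanced

def balancing_weights(records, key):
    """One weight per record so that each group carries equal total weight."""
    counts = Counter(key(record) for record in records)
    total, n_groups = len(records), len(counts)
    return [total / (n_groups * counts[key(record)]) for record in records]

# Hypothetical imbalanced dataset: 8 records in group "A", 2 in group "B".
data = ["A"] * 8 + ["B"] * 2
identity = lambda label: label

over_counts = Counter(oversample(data, identity))
under_counts = Counter(undersample(data, identity))
weights = balancing_weights(data, identity)

print("oversampled:", dict(over_counts))
print("undersampled:", dict(under_counts))
print("weight for A:", weights[0], "weight for B:", weights[-1])
```

Oversampling keeps all the original data but repeats minority records, undersampling discards majority records, and weighting leaves the dataset unchanged while adjusting each record's influence.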

Data augmentation

  • To address historical bias, this technique enriches the dataset with additional data points
  • The aim is to cover underrepresented periods or events
  • It includes:
    • Filling data gaps
    • Diversifying perspectives
    • Updating and correcting errors

Puzzle filling the gaps


Data measurement practices

Four data measurement practices

  • Standardization of measurement tools and protocols
  • Training and calibration of data collectors
  • Pilot testing can be used to assess the accuracy and consistency of data collection procedures
  • Regular quality assurance checks and automation of processes can further enhance data quality

Continuous monitoring and adjustment

Data monitoring dashboard

  • Continuous monitoring and adjustment are essential to address emerging biases
  • Regular reviews of data quality metrics and recurring bias assessments enable prompt identification of biases
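
A recurring bias assessment could look something like the Python sketch below. The reference shares, the weekly batches, and the 10-percentage-point tolerance are all hypothetical values chosen for illustration; in practice the reference would come from a trusted source and the threshold from the project's own requirements.

```python
from collections import Counter

def group_shares(labels):
    """Fraction of records belonging to each group."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

def flag_drift(labels, reference_shares, tolerance=0.10):
    """Return groups whose observed share differs from the reference by more than tolerance."""
    observed = group_shares(labels)
    flagged = {}
    for group, expected in reference_shares.items():
        gap = observed.get(group, 0.0) - expected
        if abs(gap) > tolerance:
            flagged[group] = round(gap, 3)
    return flagged

# Hypothetical reference distribution and incoming weekly batches.
reference = {"urban": 0.7, "rural": 0.3}
batches = {
    "week_1": ["urban"] * 7 + ["rural"] * 3,
    "week_2": ["urban"] * 9 + ["rural"] * 1,
}

report = {name: flag_drift(labels, reference) for name, labels in batches.items()}

for name, flagged in report.items():
    status = flagged if flagged else "within tolerance"
    print(f"{name}: {status}")
```

Running a check like this on every new batch turns the bias assessment into a routine metric that can be reviewed alongside other data quality indicators.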

Let's practice!
