Ethics issues across the data life cycle

Introduction to Data Ethics

Shalini Kurapati, PhD

Co-founder and CEO, Clearbox AI

The data life cycle

  • Ethical issues can arise at every stage of the data life cycle:
    • Data acquisition: collection, sourcing
    • Data preparation: cleaning, labeling, quality checks
    • Data storage: infrastructure, security, integrity
    • Analysis: AI, interpretation, decision-making
    • Retention/archival
    • Sharing

Illustration of the data life cycle.

1 https://www.ibm.com/topics/data-lifecycle-management#Data%20lifecycle%20management%20and%20IBM
2 Icon made by Flat Icons from www.flaticon.com

Data acquisition

Illustration of data from different sources flowing into a computer

  • Many collection methods: surveys, mobile apps, sensors, wearables, web scraping, third parties
  • Are you allowed to collect the data? Consider privacy and copyright
  • Purposeful collection: be clear about why you collect data and how much you need
  • Representative data, collected in a way that respects people's time
  • Informed consent is crucial
  • Vet your data suppliers
1 Icon made by Parzival 1997 from www.flaticon.com

Data preparation

Screenshot of a TIME magazine article on how Kenyan workers were exploited during data cleaning and data labeling to improve ChatGPT.

  • Cleaning, labeling, annotation: transcribing audio files, labeling text or images, flagging inappropriate content
  • Human annotators: inadequate training, exploitation (e.g., Kenyan workers labeling data for ChatGPT)
  • Data quality: inconsistencies, biased labels
1 https://time.com/6247678/openai-chatgpt-kenya-workers/

Data storage

Illustration of a secure data storage set-up.

  • Confidentiality and integrity: prevent data breaches or accidental loss
  • Data security: prevent unauthorized access
  • Technical:
    • Infrastructure, methods, techniques, and devices for data storage
  • Organizational:
    • Company policies, employee training
1 Icon made by HJ studio from www.flaticon.com

Data sharing

Map indicating the epicenters and spread of the Covid-19 outbreak

  • Data sharing is needed for innovation and collaboration, and sometimes for monetization
  • Responsible sharing leads to positive outcomes, e.g., Covid data sharing
  • Privacy regulations, individual rights
  • Data ownership, informed consent
  • Privacy-preserving sharing
1 https://www.ga4gh.org/news/regulatory-ethics-perspective-on-covid-19-data-sharing-an-interview-with-johan-ordish/

Let's practice!
