Design of experiments

Introduction to Statistics in Python

Maggie Matsui

Content Developer, DataCamp

Vocabulary

An experiment aims to answer: What is the effect of the treatment on the response?

  • Treatment: explanatory/independent variable
  • Response: response/dependent variable

 

E.g.: What is the effect of an advertisement on the number of products purchased?

  • Treatment: advertisement
  • Response: number of products purchased
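To make the vocabulary concrete, here is a minimal sketch that stores each participant's treatment and response side by side. It estimates the effect of the treatment as the difference in mean response between the two groups. The records are hypothetical toy values. A difference in means is one common effect measure, not the only one, and it only reflects causation if the groups are comparable.

```python
from statistics import mean

# Hypothetical toy records (made up for illustration):
# treatment = saw_ad (True/False), response = purchases.
records = [
    {"saw_ad": True, "purchases": 3},
    {"saw_ad": True, "purchases": 5},
    {"saw_ad": True, "purchases": 4},
    {"saw_ad": False, "purchases": 2},
    {"saw_ad": False, "purchases": 3},
    {"saw_ad": False, "purchases": 1},
]

def mean_response(rows, treated):
    """Average response among rows whose treatment value equals `treated`."""
    return mean(r["purchases"] for r in rows if r["saw_ad"] == treated)

# Estimated effect: mean purchases with the ad minus mean purchases without it.
effect = mean_response(records, True) - mean_response(records, False)
print(effect)  # 2
```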

Controlled experiments

  • Participants are assigned by researchers to either the treatment group or the control group
    • Treatment group sees advertisement
    • Control group does not
  • Groups should be comparable so that causation can be inferred
  • If groups are not comparable, this could lead to confounding (bias)
    • Treatment group average age: 25
    • Control group average age: 50
    • Age is a potential confounder
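One quick check for this kind of imbalance is to compare the groups' averages on a potential confounder before comparing their responses. The following minimal sketch uses Python's standard `statistics` module. The ages are hypothetical values chosen to mirror the 25 vs. 50 example above, and `average_gap` is an illustrative helper name.

```python
from statistics import mean

# Hypothetical ages, chosen so the group averages match the example (25 vs. 50).
treatment_ages = [22, 25, 28]
control_ages = [45, 50, 55]

def average_gap(treated, control):
    """Mean of a covariate in the control group minus its mean in the treatment group."""
    return mean(control) - mean(treated)

gap = average_gap(treatment_ages, control_ages)
# A large gap means the groups are not comparable on age, so a difference
# in purchases could be driven by age rather than by the advertisement.
print(gap)  # 25
```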

The gold standard of experiments will use...

  • Randomized controlled trial

    • Participants are assigned to treatment/control randomly, not based on any other characteristics
    • Choosing randomly helps ensure that groups are comparable
  • Placebo

    • Resembles treatment, but has no effect
    • Participants will not know which group they're in
    • In clinical trials, a sugar pill ensures that the effect of the drug is actually due to the drug itself and not the idea of receiving the drug
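Random assignment can be sketched with Python's `random` module: shuffle the participants, then split the shuffled list in half. The function name `randomly_assign` and the equal-split rule are assumptions made for illustration. Real trials often use more elaborate schemes, such as block or stratified randomization.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two (near) equal-sized groups.

    Group membership depends only on the random shuffle, never on any
    characteristic of the participants themselves.
    """
    rng = random.Random(seed)      # seeded generator for reproducibility
    shuffled = list(participants)  # copy so the caller's data is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

groups = randomly_assign(range(10), seed=42)
print(groups["treatment"], groups["control"])
```

Passing a `seed` makes the assignment reproducible for auditing. Omitting it gives a fresh random split each run.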

The gold standard of experiments will use...

  • Double-blind trial
    • Neither the participants nor the person administering the treatment/running the study knows whether the treatment is real or a placebo
    • Prevents bias in the response and/or analysis of results

 

Fewer opportunities for bias = more reliable conclusion about causation
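One simple way to hide the assignment from the people running the study is to label each participant's treatment with an opaque code. The code-to-treatment key is then kept by someone outside the day-to-day study until the analysis is done. The following sketch is a hypothetical illustration of that idea, not a prescribed trial protocol. The function `blind_assign` and the "A"/"B" codes are assumptions.

```python
import random

def blind_assign(participants, seed=None):
    """Randomly assign real treatment or placebo behind opaque codes.

    Returns (public, key):
      public: participant -> code ("A" or "B"); this is all that participants
              and study staff see
      key:    code -> "treatment" or "placebo"; held separately until analysis
    """
    rng = random.Random(seed)
    codes = ["A", "B"]
    rng.shuffle(codes)  # which letter means "real treatment" is itself random
    key = {codes[0]: "treatment", codes[1]: "placebo"}

    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    public = {p: codes[0] for p in shuffled[:half]}
    public.update({p: codes[1] for p in shuffled[half:]})
    return public, key

participants = [f"p{i}" for i in range(1, 7)]
public, key = blind_assign(participants, seed=7)
print(public)  # only codes are visible; the key is stored elsewhere
```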


Observational studies

  • Participants are not assigned randomly to groups

    • Participants assign themselves, usually based on pre-existing characteristics
  • Many research questions are not conducive to a controlled experiment

    • You can't force someone to smoke or have a disease
    • You can't make someone have certain past behavior
  • Establish association, not causation
    • Effects can be confounded by factors that got certain people into the control or treatment group
    • There are ways to control for confounders to get more reliable conclusions about association
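One common way to control for a measured confounder is stratification: compare the groups within each level of the confounder, then combine the within-stratum differences. The following sketch uses hypothetical data in which older people are both more likely to be exposed and have higher responses, so the naive comparison overstates the association. The helper `stratified_difference` and its size-weighted average are illustrative assumptions. This approach only adjusts for confounders that were actually measured, and the result is still an association.

```python
from collections import defaultdict
from statistics import mean

def stratified_difference(rows, group_key, response_key, stratum_key):
    """Mean response of group True minus group False, computed within each
    stratum and averaged with weights equal to stratum size.
    Strata that lack either group are skipped."""
    strata = defaultdict(list)
    for r in rows:
        strata[r[stratum_key]].append(r)
    total, weight = 0.0, 0
    for members in strata.values():
        in_group = [r[response_key] for r in members if r[group_key]]
        out_group = [r[response_key] for r in members if not r[group_key]]
        if not in_group or not out_group:
            continue
        total += (mean(in_group) - mean(out_group)) * len(members)
        weight += len(members)
    if weight == 0:
        raise ValueError("no stratum contains both groups")
    return total / weight

# Hypothetical observational data: (age_group, exposed, response).
raw = [("young", True, 2), ("young", False, 1), ("young", False, 1),
       ("young", False, 1), ("old", True, 6), ("old", True, 6),
       ("old", True, 6), ("old", False, 5)]
rows = [{"age_group": a, "exposed": e, "response": y} for a, e, y in raw]

naive = (mean(r["response"] for r in rows if r["exposed"])
         - mean(r["response"] for r in rows if not r["exposed"]))
adjusted = stratified_difference(rows, "exposed", "response", "age_group")
print(naive, adjusted)  # 3 1.0
```

Within each age group the exposed people respond 1 unit higher. Ignoring age inflates the gap to 3 because the exposed group is mostly older.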

Longitudinal vs. cross-sectional studies

Longitudinal study

  • Participants are followed over a period of time to examine the effect of the treatment on the response
  • Effect of age on height is not confounded by generation, because the same people are measured as they age
  • More expensive, results take longer

Cross-sectional study

  • Data on participants are collected at a single point in time (a snapshot)
  • Effect of age on height is confounded by generation, because people of different ages also belong to different generations
  • Cheaper, faster, more convenient
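The generation confound can be illustrated with hypothetical height data. The longitudinal estimate follows the same people as they age. The cross-sectional estimate compares different people of different ages measured once, so it also picks up any difference between generations. All names and numbers below are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Longitudinal: the same people measured at several ages.
# Hypothetical (participant_id, age, height_cm) records.
longitudinal = [("p1", 20, 170), ("p1", 60, 168),
                ("p2", 20, 180), ("p2", 60, 178)]

# Cross-sectional: different people, each measured once at the same time.
cross_sectional = [("p3", 20, 180), ("p4", 20, 180),
                   ("p5", 60, 168), ("p6", 60, 170)]

def within_person_change(rows):
    """Average change per person between their first and last measurement."""
    by_person = defaultdict(list)
    for pid, age, height in rows:
        by_person[pid].append((age, height))
    changes = []
    for visits in by_person.values():
        visits.sort()  # order each person's visits by age
        changes.append(visits[-1][1] - visits[0][1])
    return mean(changes)

def between_group_difference(rows, young_age, old_age):
    """Mean height of older people minus mean height of younger people."""
    old = [h for _, age, h in rows if age == old_age]
    young = [h for _, age, h in rows if age == young_age]
    return mean(old) - mean(young)

print(within_person_change(longitudinal))                   # -2
print(between_group_difference(cross_sectional, 20, 60))    # -11
```

Here each person loses 2 cm between ages 20 and 60. The cross-sectional comparison suggests 11 cm, because in this made-up data the older generation was shorter to begin with.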

Let's practice!

