Design of experiments

Introduction to Statistics in R

Maggie Matsui

Content Developer, DataCamp

Vocabulary

An experiment aims to answer: What is the effect of the treatment on the response?

  • Treatment: explanatory/independent variable
  • Response: response/dependent variable

 

What is the effect of an advertisement on the number of products purchased?

  • Treatment: advertisement
  • Response: number of products purchased
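
As a minimal sketch of this vocabulary in R, the hand-made data below stores the treatment and the response as two columns and compares the average response between groups. The `purchases` data frame, its column names, and its values are all hypothetical and only serve as illustration.

```r
# Hypothetical, hand-made data for illustration only (not from the course)
purchases <- data.frame(
  saw_ad      = c("yes", "yes", "yes", "no", "no", "no"),  # treatment
  n_purchased = c(3, 2, 4, 1, 2, 0)                        # response
)

# Average response for participants who saw the ad and for those who did not
mean_ad    <- mean(purchases$n_purchased[purchases$saw_ad == "yes"])
mean_no_ad <- mean(purchases$n_purchased[purchases$saw_ad == "no"])

# Difference in average purchases between the two groups
effect_estimate <- mean_ad - mean_no_ad
effect_estimate
```

A difference like this only estimates the effect of the treatment if the two groups are otherwise comparable.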

Controlled experiments

  • Participants are assigned by researchers to either a treatment group or a control group
    • Treatment group sees advertisement
    • Control group does not
  • Groups should be comparable so that causation can be inferred
  • If groups are not comparable, this could lead to confounding (bias)
    • Treatment group average age: 25
    • Control group average age: 50
    • Age is a potential confounder

The gold standard of experiments will use...

  • Randomized controlled trial
    • Participants are assigned to treatment/control randomly, not based on any other characteristics
    • Choosing randomly helps ensure that groups are comparable
  • Placebo
    • Resembles treatment, but has no effect
    • Participants will not know which group they're in
    • In clinical trials, a sugar pill ensures that the effect of the drug is actually due to the drug itself and not the idea of receiving the drug
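
Random assignment can be sketched in R with `sample()`. This is only one possible approach, under stated assumptions: the participant pool, the simulated ages, the seed, and the 50/50 split are all made up for illustration.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Hypothetical participant pool; ages are simulated, not real data
n <- 200
participants <- data.frame(
  id  = seq_len(n),
  age = round(runif(n, min = 18, max = 70))
)

# Shuffle an equal number of "treatment" and "control" labels,
# ignoring every participant characteristic
participants$group <- sample(rep(c("treatment", "control"), each = n / 2))

# Group sizes are equal by construction
group_sizes <- table(participants$group)
group_sizes

# With random assignment, average age should be similar in both groups
mean_age <- tapply(participants$age, participants$group, mean)
mean_age
```

Because the labels are shuffled without looking at age, any remaining age difference between the groups is due to chance alone and tends to shrink as the number of participants grows.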

The gold standard of experiments will use...

  • Double-blind trial
    • Person administering the treatment/running the study doesn't know whether the treatment is real or a placebo
    • Prevents bias in the response and/or analysis of results

 

Fewer opportunities for bias = more reliable conclusions about causation


Observational studies

  • Participants are not assigned randomly to groups
    • Participants assign themselves, usually based on pre-existing characteristics
  • Many research questions are not conducive to a controlled experiment
    • You can't force someone to smoke or have a disease
    • You can't make someone have certain past behavior
  • Establish association, not causation
    • Effects can be confounded by factors that got certain people into the control or treatment group
    • There are ways to control for confounders to get more reliable conclusions about association
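
One common way to control for a measured confounder is to include it as a covariate in a regression model. The sketch below uses simulated data in which age affects both who ends up in the treatment group and the response. The variable names, the true coefficient of 2, and every other number are assumptions chosen for illustration.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Simulated observational data (all numbers are assumptions for illustration)
n   <- 5000
age <- runif(n, min = 18, max = 70)

# Self-selection: older people are more likely to end up in the treatment group
treated <- rbinom(n, size = 1, prob = plogis((age - 44) / 10))

# Response depends on both treatment (true coefficient 2) and age
response <- 2 * treated + 0.1 * age + rnorm(n)

# Naive comparison ignores age, so the treatment coefficient
# absorbs part of the age effect
naive_fit <- lm(response ~ treated)

# Including age as a covariate controls for it
adjusted_fit <- lm(response ~ treated + age)

naive_estimate    <- coef(naive_fit)[["treated"]]
adjusted_estimate <- coef(adjusted_fit)[["treated"]]
c(naive = naive_estimate, adjusted = adjusted_estimate)
```

The adjusted coefficient should land much closer to the simulated value of 2 than the naive one. This kind of adjustment only handles confounders that were actually measured and included in the model.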

Longitudinal vs. cross-sectional studies

Longitudinal study

  • Participants are followed over a period of time to examine effect of treatment on response
  • Effect of age on height is not confounded by generation, because the same participants are measured as they age
  • More expensive, results take longer

Cross-sectional study

  • Data on participants is collected from a single snapshot in time
  • Effect of age on height is confounded by generation, because participants of different ages were born in different years
  • Cheaper, faster, more convenient
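
The age and height example can be illustrated with a small simulation. It rests on two simplifying assumptions that are not from the course: adult height stays constant as a person ages, and each later-born generation is slightly taller on average. The function `adult_height()` and all numbers are hypothetical.

```r
set.seed(7)  # arbitrary seed for reproducibility

# Assumption for this sketch: adult height does not change with age,
# but each later-born generation is slightly taller on average
adult_height <- function(birth_year) 160 + 0.1 * (birth_year - 1940)

# Cross-sectional: one snapshot in 2020 of people born between 1950 and 2000
birth_year <- sample(1950:2000, size = 1000, replace = TRUE)
cross <- data.frame(
  age    = 2020 - birth_year,
  height = adult_height(birth_year) + rnorm(1000, sd = 2)
)

# Longitudinal: 50 people all born in 1980, each measured yearly from age 20 to 40
ages <- 20:40
person_height <- adult_height(1980) + rnorm(50, sd = 2)  # fixed per person
long <- data.frame(
  id     = rep(1:50, each = length(ages)),
  age    = rep(ages, times = 50),
  height = rep(person_height, each = length(ages))
)

# Slope of height on age in each design
cross_slope <- coef(lm(height ~ age, data = cross))[["age"]]
long_slope  <- coef(lm(height ~ age, data = long))[["age"]]
c(cross_sectional = cross_slope, longitudinal = long_slope)
```

In the snapshot, older participants come from earlier generations, so height appears to fall with age even though nobody in the simulation shrinks. Following a single generation over time shows no such slope.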

Let's practice!
