Correlation

Introduction to Statistics

George Boorman

Curriculum Manager, DataCamp

Relationships between two variables

scatter_plot_displaying_monthly_gym_costs_vs_cost_of_a_bottle_of_water.png

Pearson correlation coefficient

  • Published by Karl Pearson in 1896!
  • Quantifies the strength of a relationship between two variables
  • Takes a value between -1 and 1, inclusive
  • Magnitude corresponds to strength of relationship
  • Sign (+ or -) corresponds to direction of relationship
1 https://royalsocietypublishing.org/doi/10.1098/rsta.1896.0007
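
To make the bullets above concrete, here is a minimal sketch of how the coefficient can be computed by hand. The function name `pearson_r` and the price lists are hypothetical, invented for illustration only; they are not the data behind the slides.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each observation from its variable's mean
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Sum of cross-products, and the two sums of squares
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical prices (assumption, not the course data)
water = [1.0, 1.5, 2.0, 2.5, 3.0]       # cost of a bottle of water
gym = [30.0, 45.0, 40.0, 60.0, 55.0]    # monthly gym cost

r = pearson_r(water, gym)
print(round(r, 2))  # 0.86
```

The result is positive (gym costs tend to rise with water costs) and fairly close to 1 (the points sit near a straight line).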

Linear relationships

  • Linear = a change in the independent variable is matched by a proportionate change in the dependent variable

scatter_plot_of_gym_vs_water_costs_with_annotations_for_observations_of_one_dollar_water_and_thirty_dollar_gym_costs_plus_one_dollar_fifty_water_and_forty_five_dollar_gym_costs_and_correlation_coefficient_equals_zero_point_three_six.png

Values = strength of the relationship

0.99 (very strong relationship)

Scatterplot with points very close to an invisible line.png

0.75 (strong relationship)

Scatterplot with points further from the invisible line.png

Values = strength of the relationship

0.56 (moderate relationship)

Scatterplot with points even further from the invisible line.png

0.21 (weak relationship)

Scatterplot with points that look almost totally randomly scattered.png

Values = strength of the relationship

0.04 (no relationship)

Scatterplot with points that look totally randomly scattered.png

  • Knowing the value of x doesn't tell us anything about y

Sign = direction

0.75: as x increases, y increases

Scatterplot where y increases as x increases.png

-0.75: as x increases, y decreases

Scatterplot where y decreases as x increases.png

Gym costs vs. water costs

scatter_plot_displaying_monthly_gym_costs_vs_cost_of_a_bottle_of_water.png

Adding a trendline

scatter_plot_displaying_monthly_gym_costs_vs_cost_of_a_bottle_of_water_with_trendline_and_annotated_p_equals_zero_point_three_five.png

Life expectancy vs. cost of a bottle of water

scater_plot_of_life_expectancy_vs_water_bottle_cost_showing_trendline_and_p_equals_zero_point_six_one.png

Correlation does not equal causation

  • Will increasing the cost of water result in an increase in life expectancy?

water_bottles.png

elderly_couple.png

  • Correlation does not equal causation
1 Image credit: https://unsplash.com/@micheile; https://unsplash.com/@jon_chng

Confounding variables

  • What else might be affecting life expectancy?

    • A bottle of water costs more in countries with strong economies
    • These countries generally offer access to high-quality healthcare
  • The strength of the economy could be a confounding variable

    • A confounding variable is often not measured, but it influences both of our variables and can create or distort the relationship we observe between them

doctor.jpg

Let's practice!
