Correlation

Introduction to Statistics

George Boorman

Curriculum Manager, DataCamp

Relationships between two variables

scatter_plot_displaying_monthly_gym_costs_vs_cost_of_a_bottle_of_water.png

Pearson correlation coefficient

  • Published by Karl Pearson in 1896!
  • Quantifies the strength of a relationship between two variables
  • A number between -1 and 1
  • Magnitude corresponds to strength of relationship
  • Sign (+ or -) corresponds to direction of relationship
1 https://royalsocietypublishing.org/doi/10.1098/rsta.1896.0007
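As a rough illustration of the properties listed above, the sketch below computes the coefficient from its standard definition (the sum of co-deviations from the means, divided by the product of the deviation magnitudes). The function name `pearson_r` and the price values are assumptions made up for this example; they are not taken from the course data.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Deviation of each observation from its variable's mean
    dx = x - x.mean()
    dy = y - y.mean()
    # Sum of co-deviations, scaled so the result lies between -1 and 1
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Hypothetical prices, made up for illustration only
water_cost = [0.5, 1.0, 1.5, 2.0, 2.5]
gym_cost = [20.0, 30.0, 28.0, 45.0, 40.0]

r = pearson_r(water_cost, gym_cost)
print(round(r, 2))
```

In practice a library routine such as `np.corrcoef(water_cost, gym_cost)[0, 1]` should give the same value; the hand-written version is only meant to show where the number comes from.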

Linear relationships

  • Linear = proportionate changes between dependent and independent variables

scatter_plot_of_gym_vs_water_costs_with_annotations_for_observations_of_one_dollar_water_and_thirty_dollar_gym_costs_plus_one_dollar_fifty_water_and_forty_five_dollar_gym_costs_and_correlation_coefficient_equals_zero_point_three_six.png
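The two annotated observations in the plot can be used to check what "proportionate" means here. The short sketch below uses only those two points ($1.00 water with $30 gym, and $1.50 water with $45 gym); the variable names are chosen for this example.

```python
# The two annotated observations: (water cost, gym cost) in dollars
water_a, gym_a = 1.00, 30.00
water_b, gym_b = 1.50, 45.00

# Relative change in each variable between the two observations
water_change = (water_b - water_a) / water_a
gym_change = (gym_b - gym_a) / gym_a

print(f"water: {water_change:.0%}, gym: {gym_change:.0%}")
# prints "water: 50%, gym: 50%"
```

Both variables rise by the same proportion between these two points, which is the pattern a linear relationship describes. Most other points in the plot will not line up this neatly, which is why the overall coefficient is well below 1.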

Values = strength of the relationship

0.99 (very strong relationship)

Scatterplot with points very close to an invisible line.png

0.75 (strong relationship)

Scatterplot with points further from the invisible line.png

Values = strength of the relationship

0.56 (moderate relationship)

Scatterplot with points even further from the invisible line.png

0.21 (weak relationship)

Scatterplot with points that look almost totally randomly scattered.png

Values = strength of the relationship

0.04 (no linear relationship)

Scatterplot with points that look totally randomly scattered.png

  • Knowing the value of x doesn't tell us anything about y
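The progression from very strong to no relationship can be reproduced with simulated data. The sketch below is an assumption-laden illustration, not the data behind the slides: it starts from the line y = x and adds increasing amounts of random noise, so the points drift further from the line and the coefficient shrinks toward zero.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable
x = rng.normal(size=1000)

results = {}
# Same underlying line y = x, with increasing amounts of random noise added
for noise_sd in [0.1, 1.0, 5.0]:
    y = x + rng.normal(scale=noise_sd, size=x.size)
    results[noise_sd] = np.corrcoef(x, y)[0, 1]
    print(f"noise sd = {noise_sd}: r = {results[noise_sd]:.2f}")
```

The noise levels are arbitrary; any increasing sequence should show the same pattern of a weakening correlation.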

Sign = direction

0.75: as x increases, y increases

Scatterplot where y increases as x increases.png

-0.75: as x increases, y decreases

Scatterplot where y decreases as x increases.png
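A small sketch of how the sign tracks direction: mirroring the y values turns an increasing pattern into a decreasing one, and the coefficient keeps its magnitude but changes sign. The numbers are made up for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = np.array([2.0, 4.5, 5.0, 8.0, 9.5])  # made-up values that rise with x
y_down = -y_up                               # the same values mirrored, so they fall with x

r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]
print(r_up > 0, r_down < 0)
# prints "True True"
```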

Gym costs vs. water costs

scatter_plot_displaying_monthly_gym_costs_vs_cost_of_a_bottle_of_water.png

Adding a trendline

scatter_plot_displaying_monthly_gym_costs_vs_cost_of_a_bottle_of_water_with_trendline_and_annotated_p_equals_zero_point_three_five.png
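One common way to obtain a trendline like this is a least-squares straight-line fit. The sketch below uses NumPy's `np.polyfit` on hypothetical prices (not the course data); whether the slide's trendline was produced this way is an assumption.

```python
import numpy as np

# Hypothetical data, made up for illustration only
water_cost = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
gym_cost = np.array([20.0, 30.0, 28.0, 45.0, 40.0])

# Degree-1 polynomial fit: returns the slope and intercept of the least-squares line
slope, intercept = np.polyfit(water_cost, gym_cost, deg=1)

# Points on the fitted line, one per observed water cost
trend = slope * water_cost + intercept
print(f"gym_cost ≈ {slope:.1f} * water_cost + {intercept:.1f}")
```

The `trend` values can then be drawn over the scatter plot with any plotting library to show the line.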

Life expectancy vs. cost of a bottle of water

scater_plot_of_life_expectancy_vs_water_bottle_cost_showing_trendline_and_p_equals_zero_point_six_one.png

Correlation does not equal causation

  • Will increasing the cost of water result in an increase in life expectancy?

water_bottles.png

elderly_couple.png

  • Correlation does not equal causation
1 Image credit: https://unsplash.com/@micheile; https://unsplash.com/@jon_chng

Confounding variables

  • What else might be affecting life expectancy?

    • A bottle of water costs more in countries with strong economies
    • These countries generally offer access to high-quality healthcare
  • The strength of the economy could be a confounding variable

    • A confounding variable is one we have not measured, but which may influence both of our variables and so affect the relationship between them

doctor.jpg
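The confounding argument can be sketched with a simulation. Everything below is hypothetical: a made-up `economy` score drives both water cost and life expectancy, and neither variable depends on the other. The two still end up correlated, and the correlation should largely disappear once the economy's contribution is removed from each.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical confounder: strength of each country's economy (arbitrary units)
economy = rng.normal(size=n)

# Both variables depend on the economy, but not on each other
water_cost = 1.0 + 0.5 * economy + rng.normal(scale=0.3, size=n)
life_expectancy = 70.0 + 5.0 * economy + rng.normal(scale=3.0, size=n)

r_raw = np.corrcoef(water_cost, life_expectancy)[0, 1]

def residuals(v, z):
    """What is left of v after removing a straight-line fit on z."""
    slope, intercept = np.polyfit(z, v, deg=1)
    return v - (slope * z + intercept)

# Correlation once the economy's contribution is removed from both variables
r_adjusted = np.corrcoef(residuals(water_cost, economy),
                         residuals(life_expectancy, economy))[0, 1]
print(f"raw r = {r_raw:.2f}, adjusted r = {r_adjusted:.2f}")
```

This adjustment is only possible here because the simulation records the confounder. With real data the confounder is often unmeasured, which is why a correlation alone cannot establish causation.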

Let's practice!
