Correlation tests

Foundations of Inference in Python

Paul Savala

Assistant Professor of Mathematics

Correlation

Statistical relationship between two variables

Two sinusoidal lines which are roughly correlated.


Rent prices in Chicago

A line graph with date on the x-axis, normalized rent on the y-axis, and a line dipping from 2011 to 2012, then climbing from 2012 to 2017.

¹ Source: https://www.zillow.com/research/data/

Rent prices in Chicago versus USA

A line graph with date on the x-axis, normalized rent on the y-axis, and two lines representing rents in Chicago and the USA average. Both lines are moving in unison.


Pearson's r in SciPy

from scipy import stats

# Pearson's r and the two-sided p-value for H0: no correlation
r, p_value = stats.pearsonr(chicago_rents, usa_rents)

print(r)
0.939
print(p_value < 0.05)
True

Conclusion: Rent prices in Chicago and the USA are significantly correlated (p < 0.05)
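As a rough sketch of what `stats.pearsonr` computes, the snippet below calculates r directly from its definition and derives the two-sided p-value from a t-distribution with n - 2 degrees of freedom (the standard test, which assumes roughly bivariate normal data). The Zillow series are not included here, so `chicago_rents` and `usa_rents` are replaced by hypothetical synthetic data, and the helper names `pearson_r` and `pearson_p_value` are illustrative only.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in data (NOT the Zillow series): two noisy series sharing a trend
rng = np.random.default_rng(0)
usa_rents = np.linspace(0.9, 1.2, 30) + rng.normal(0, 0.05, 30)
chicago_rents = 0.8 * usa_rents + rng.normal(0, 0.1, 30)


def pearson_r(x, y):
    """Hypothetical helper: sum of co-deviations divided by sqrt of the product of squared deviations."""
    x_dev = x - x.mean()
    y_dev = y - y.mean()
    return np.sum(x_dev * y_dev) / np.sqrt(np.sum(x_dev**2) * np.sum(y_dev**2))


def pearson_p_value(r, n):
    """Hypothetical helper: two-sided p-value using t = r * sqrt((n - 2) / (1 - r^2))."""
    t_stat = r * np.sqrt((n - 2) / (1 - r**2))
    return 2 * stats.t.sf(abs(t_stat), df=n - 2)


r_manual = pearson_r(usa_rents, chicago_rents)
p_manual = pearson_p_value(r_manual, len(usa_rents))

# Compare against SciPy's implementation
r_scipy, p_scipy = stats.pearsonr(chicago_rents, usa_rents)
print(np.isclose(r_manual, r_scipy), np.isclose(p_manual, p_scipy))
```

The p-value only answers "is r distinguishable from 0?"; it says nothing about how strong the relationship is, which is what r itself measures.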


Explained variance

$R^2$: Percentage of variation explained

print(r**2)
0.883
  • 88.3% of the variation in Chicago rents is explained by USA rents

A scatter plot with USA average rents on the x-axis, Chicago rents on the y-axis, and a positive linear trend between them.
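To see why squaring r gives the share of variation explained, one can fit a least-squares line of Chicago rents on USA rents and compute R² = 1 - SS_res / SS_tot directly; for simple linear regression with an intercept, this equals r². The sketch below uses `scipy.stats.linregress` on hypothetical synthetic data standing in for the Zillow series.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in data (NOT the Zillow series)
rng = np.random.default_rng(1)
usa_rents = np.linspace(0.9, 1.2, 30) + rng.normal(0, 0.05, 30)
chicago_rents = 0.8 * usa_rents + rng.normal(0, 0.05, 30)

# Least-squares line: chicago_rents ~ intercept + slope * usa_rents
fit = stats.linregress(usa_rents, chicago_rents)
predicted = fit.intercept + fit.slope * usa_rents
residuals = chicago_rents - predicted

# R^2 = 1 - (unexplained variation) / (total variation)
ss_res = np.sum(residuals**2)
ss_tot = np.sum((chicago_rents - chicago_rents.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

r, _ = stats.pearsonr(chicago_rents, usa_rents)
print(f"R^2 from the fit: {r_squared:.4f}, r^2: {r**2:.4f}")
print(f"Share of variation left unexplained: {ss_res / ss_tot:.4f}")
```

The second printed value, 1 - R², is the variation that factors other than USA rents would have to account for.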


Inference from correlation

  • Factors unique to Chicago?
    • Job prospects?
    • Weather?
    • Taxes?
    • Something else?
  • Goal: Explain the remaining variation (about 11.7%) not accounted for by USA rents

Drawbacks of correlation

  • Correlated observations are not independent
  • This violates the independence assumption of some hypothesis tests
  • Data can be correlated with itself

A line graph with date on the x-axis, normalized rent on the y-axis, and a line dipping from 2011 to 2012, then climbing from 2012 to 2017.


Autocorrelation

  • Autocorrelation: Correlation of a series with its own past values
  • Rent prices tend to be correlated with prices from one year earlier

A scatter plot showing Chicago rents on the x-axis, rent from one year prior in Chicago on the y-axis, and a generally positive linear trend.
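A minimal sketch of measuring the one-year autocorrelation, assuming the rents are stored as a monthly pandas Series (an assumption; the slides do not show the data layout). `Series.autocorr(lag=12)` correlates each value with the value 12 months earlier, which matches running `pearsonr` on the series and its 12-step shifted copy. The series below is hypothetical synthetic data, not Zillow's.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical monthly rent index, 2011-2017: a slow upward trend plus noise
rng = np.random.default_rng(2)
dates = pd.date_range("2011-01-01", "2017-12-01", freq="MS")
trend = np.linspace(1.0, 1.3, len(dates))
chicago_rents = pd.Series(trend + rng.normal(0, 0.02, len(dates)), index=dates)

# Built-in: correlation between each month and the same month one year earlier
lag_12 = chicago_rents.autocorr(lag=12)

# Equivalent by hand: pair each value with the one 12 months before, drop unmatched months
one_year_prior = chicago_rents.shift(12)
paired = pd.DataFrame({"now": chicago_rents, "prior": one_year_prior}).dropna()
r_manual, _ = stats.pearsonr(paired["now"], paired["prior"])

print(f"Lag-12 autocorrelation: {lag_12:.3f} (manual: {r_manual:.3f})")
```

A steady trend alone is enough to produce a high autocorrelation, which is one reason successive rent observations should not be treated as independent samples.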


Let's practice!

