Covariance and correlation

Practicing Statistics Interview Questions in R

Zuzanna Chmielewska

Actuary

Figure: a linear relationship

Covariance

Formula for a sample: $$ cov(X, Y) = \frac{\sum_{i = 1}^{n} (x_i - \overline{x}) \cdot (y_i - \overline{y})}{n-1} $$

Formula for a population: $$ cov(X, Y) = \frac{\sum_{i = 1}^{n} (x_i - \overline{x}) \cdot (y_i - \overline{y})}{n} $$
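
The two formulas differ only in the denominator, which can be sketched in R as follows. The helper names `cov_sample` and `cov_population` and the toy vectors are assumptions made for illustration; they are not part of the course material. Base R's `cov()` uses the n - 1 denominator, so it should agree with the sample version.

```r
# Hypothetical helpers illustrating the two formulas (names are assumptions).
cov_sample <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) > 1)
  sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)
}

cov_population <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) > 0)
  sum((x - mean(x)) * (y - mean(y))) / length(x)
}

# Toy data, chosen only for illustration.
x <- c(2, 4, 6, 8)
y <- c(1, 3, 2, 6)

cov_sample(x, y)       # sample covariance (divides by n - 1)
cov(x, y)              # base R, also divides by n - 1
cov_population(x, y)   # divides by n, i.e. cov(x, y) * (n - 1) / n
```

Since base R has no built-in population covariance, rescaling `cov()` by (n - 1) / n is one simple way to obtain it.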

Covariance - numerical example

$ x_1 = 3, x_2 = 5, x_3 = 7 $

$ y_1 = 6, y_2 = 11, y_3 = 13 $

$ \overline{x} = 5$

$ \overline{y} = 10$

$(x_1 - \overline{x}) \cdot (y_1 - \overline{y})= 8 $

$(x_2 - \overline{x}) \cdot (y_2 - \overline{y})= 0 $

$(x_3 - \overline{x}) \cdot (y_3 - \overline{y})= 6 $

$ \sum_{i=1}^{n} (x_i - \overline{x}) \cdot (y_i - \overline{y}) = 14$

$ \frac{\sum_{i=1}^{n} (x_i - \overline{x}) \cdot (y_i - \overline{y})}{n-1} = 7$
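
The same steps can be retraced in R as a quick check. This is a minimal sketch; the intermediate variable names (`dx`, `dy`, `products`, `total`, `manual_cov`) are chosen here for illustration.

```r
# Data from the numerical example.
x <- c(3, 5, 7)
y <- c(6, 11, 13)

# Deviations from the means (mean(x) is 5, mean(y) is 10).
dx <- x - mean(x)
dy <- y - mean(y)

products <- dx * dy                      # 8, 0, 6
total <- sum(products)                   # 14
manual_cov <- total / (length(x) - 1)    # 7

manual_cov
cov(x, y)                                # base R sample covariance, also 7
```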

Correlation coefficient

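$$ corr(X, Y) = \frac{cov(X, Y)}{\sigma_x \cdot \sigma_y} $$

where $\sigma_x$ and $\sigma_y$ are the standard deviations of $X$ and $Y$. The result always lies between $-1$ and $1$.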
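
As a hedged illustration, the formula can be applied in R to the x and y values from the covariance numerical example, using the sample covariance and sample standard deviations. The name `manual_corr` is an assumption made for this sketch.

```r
# Data from the covariance numerical example.
x <- c(3, 5, 7)
y <- c(6, 11, 13)

# Sample covariance divided by the product of the sample standard deviations.
manual_corr <- cov(x, y) / (sd(x) * sd(y))

manual_corr   # approximately 0.97
cor(x, y)     # base R Pearson correlation, should agree with manual_corr
```

Unlike the covariance of 7, this value has no units, which makes it easier to compare across data sets.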

Figure: perfectly positively correlated data points (correlation coefficient of 1)

Figure: perfectly negatively correlated data points (correlation coefficient of -1)
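
Perfect positive and perfect negative correlation can be reproduced with points that lie exactly on a straight line. The sketch below assumes arbitrary slopes and intercepts; any increasing or decreasing line would behave the same way.

```r
x <- 1:10

y_up   <- 2 * x + 1    # increasing straight line (slope and intercept are arbitrary)
y_down <- -2 * x + 1   # decreasing straight line

cor(x, y_up)     # 1, up to floating-point rounding
cor(x, y_down)   # -1, up to floating-point rounding
```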

Figure: examples of data points with some positive, some negative, and almost no linear relationship

Nonlinear relationships

Figure: nonlinear relationships
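
The correlation coefficient measures only linear association, so it can miss a strong nonlinear relationship. A minimal sketch, assuming a quadratic relationship on values symmetric around zero:

```r
x <- -5:5
y <- x^2   # y is completely determined by x, but not linearly

cor(x, y)  # 0: the linear measure misses the quadratic relationship
```

A scatter plot, for example `plot(x, y)`, would reveal the pattern that the coefficient hides.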

Correlation does not imply causation!

Figure: domino
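
One common way two variables end up correlated without either causing the other is a shared hidden cause. The simulation below is only an illustrative sketch: the variable `z`, the noise levels, the sample size, and the seed are all assumptions made here, not part of the course material.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
n <- 5000

z <- rnorm(n)                 # hidden common cause
x <- z + rnorm(n, sd = 0.5)   # driven by z, not by y
y <- z + rnorm(n, sd = 0.5)   # driven by z, not by x

r_xy <- cor(x, y)
r_xy      # clearly positive although neither variable causes the other

# Remove the part of each variable explained by z, then correlate what is left.
r_resid <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
r_resid   # close to 0
```

Once the common cause is accounted for, almost no association between `x` and `y` remains.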

Summary

  • covariance
  • correlation coefficient

Let's practice!
