Introduction to Statistics in R
Maggie Matsui
Content Developer, DataCamp

x doesn't tell us anything about y
x increases, y increases
x increases, y decreases
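The three patterns above can be illustrated with simulated data. This is a minimal sketch, not the course data: the seed, the sample size, and the variables `y_none`, `y_pos`, and `y_neg` are all made up for illustration.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible
n <- 200      # arbitrary sample size
x <- rnorm(n)

# Hypothetical variables, one per pattern
y_none <- rnorm(n)                      # generated independently of x
y_pos  <- 2 * x + rnorm(n, sd = 0.5)    # tends to rise as x rises
y_neg  <- -2 * x + rnorm(n, sd = 0.5)   # tends to fall as x rises

r_none <- cor(x, y_none)
r_pos  <- cor(x, y_pos)
r_neg  <- cor(x, y_neg)

print(c(none = r_none, positive = r_pos, negative = r_neg))
```

With data built this way, the first coefficient should land near 0, the second near 1, and the third near -1.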
library(ggplot2)
# Scatterplot of y against x
ggplot(df, aes(x, y)) +
  geom_point()

ggplot(df, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
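The trendline drawn with `method = "lm"` is a least-squares line, so its slope should have the same sign as the correlation. A small sketch of that link, assuming the built-in `mtcars` dataset as a stand-in for `df` (the course data is not available here):

```r
# mtcars ships with R; it is used here only as a stand-in for df
fit   <- lm(mpg ~ wt, data = mtcars)   # same kind of fit the trendline uses
slope <- unname(coef(fit)["wt"])
r     <- cor(mtcars$wt, mtcars$mpg)

print(slope)
print(r)
print(sign(slope) == sign(r))  # slope and correlation point the same way
```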

cor(df$x, df$y)
-0.7472765
cor(df$y, df$x)
-0.7472765
df$x
-3.2508382  -9.1599807   3.4515013   4.1505899          NA   11.9806140   ...
cor(df$x, df$y)
NA
cor(df$x, df$y, use = "pairwise.complete.obs")
-0.7471757
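For two vectors, `use = "pairwise.complete.obs"` should give the same result as dropping every pair in which either value is missing. A minimal sketch with made-up vectors (the values below are invented, not taken from `df`):

```r
# Hypothetical data with one missing x value
x <- c(1, 2, 3, 4, NA, 6)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2)

r_default  <- cor(x, y)                                  # NA: the missing value propagates
r_pairwise <- cor(x, y, use = "pairwise.complete.obs")   # ignores the incomplete pair

# Manual equivalent: keep only rows where both x and y are present
keep     <- complete.cases(x, y)
r_manual <- cor(x[keep], y[keep])

print(r_default)
print(all.equal(r_pairwise, r_manual))
```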
$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$
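The formula above can be checked against `cor()` by coding it term by term. In this sketch, `pearson_r` is a hypothetical helper written for illustration, and the built-in `mtcars` dataset stands in for `df`.

```r
# Hypothetical helper: Pearson correlation computed directly from the formula
pearson_r <- function(x, y) {
  dx <- x - mean(x)   # deviations of x from its mean
  dy <- y - mean(y)   # deviations of y from its mean
  sum(dx * dy) / (sqrt(sum(dx^2)) * sqrt(sum(dy^2)))
}

r_manual  <- pearson_r(mtcars$wt, mtcars$mpg)
r_builtin <- cor(mtcars$wt, mtcars$mpg)

print(all.equal(r_manual, r_builtin))
```

Because the numerator and denominator are both unchanged when `x` and `y` swap places, the helper also reflects the symmetry `cor(x, y) == cor(y, x)`.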