Introduction to correlations

A/B Testing in R

Lauryn Burleigh

Data Scientist

Correlation in A/B design

  • Relationship strength and direction
  • Two variables
  • Whether one variable tends to increase or decrease as the other increases
    • Enjoyment correlated with time to eat
  • Ignore groups

Scatter plot of a positive correlation with time on the x-axis and enjoyment on the y-axis with each point colored either pink if Pepperoni pizza and blue if Cheese pizza.
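
As a minimal sketch of "ignore groups", the correlation can be computed over every row without looking at the group column. The `pizza` data frame, its column names, and the simulated values below are assumptions made for illustration; they are not the course dataset.

```r
# Hypothetical pizza data: column names and values are illustrative only.
set.seed(1)
n <- 50
pizza <- data.frame(
  topping = rep(c("Cheese", "Pepperoni"), each = n),
  time = runif(2 * n, min = 5, max = 30)
)
# Enjoyment rises with time in both groups, plus random noise.
pizza$enjoyment <- 2 + 0.2 * pizza$time + rnorm(2 * n, sd = 1)

# Correlation across all rows; the topping column is never used.
overall_r <- cor(pizza$time, pizza$enjoyment)
overall_r
```

The result is a single number describing the pooled relationship, which is what the colored scatter plot shows when the colors are disregarded.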

Correlation in A/B design

  • Within groups

Scatter plot of a positive correlation with time on the x-axis and enjoyment on the y-axis of only Cheese pizza.

Scatter plot of a positive correlation with time on the x-axis and enjoyment on the y-axis of only Pepperoni pizza.
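
The within-group view can be sketched by splitting the data by group and computing one correlation per group. The data frame, the column names, and the assumption that Pepperoni has the steeper relationship are all hypothetical and chosen only so the two groups differ.

```r
# Hypothetical pizza data: column names and values are illustrative only.
set.seed(2)
n <- 50
pizza <- data.frame(
  topping = rep(c("Cheese", "Pepperoni"), each = n),
  time = runif(2 * n, min = 5, max = 30)
)
# Assumed for illustration: a steeper relationship for Pepperoni.
slope <- ifelse(pizza$topping == "Pepperoni", 0.3, 0.1)
pizza$enjoyment <- 2 + slope * pizza$time + rnorm(2 * n, sd = 1)

# Split the rows by group, then compute one correlation per group.
by_group <- sapply(
  split(pizza, pizza$topping),
  function(d) cor(d$time, d$enjoyment)
)
by_group
```

`by_group` is a named vector with one coefficient for Cheese and one for Pepperoni, so the two relationships can be compared directly.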

Correlation

  • Does NOT imply causation

  • Increase in drownings & ice cream sales

    • NOT causal
    • Likely: warm months
  • Using A/B to deduce causation
    • Make and test changes
  • A/B: cheese and pepperoni

    • Enjoyment and time to eat pizza
    • A change in group can change the relationship
  • Does not indicate dependency

library(ggplot2)

# Scatter plot of ice cream sales against drownings
ggplot(data, aes(x = drownings,
                 y = icecream)) +
  geom_point()

A scatter plot showing a positive correlation with number of drownings on the x-axis and ice cream sales on the y-axis.
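
The drownings and ice cream example can be simulated to show how a shared cause produces a correlation without any causal link. In this sketch all values are invented, and `temperature` stands in for "warm months". The second step, correlating the residuals after regressing each variable on temperature (a partial correlation), is an addition for illustration and is not part of the slides.

```r
set.seed(3)
days <- 200
temperature <- runif(days, min = 0, max = 35)  # hypothetical daily temperature

# Both outcomes depend on temperature but not on each other.
icecream  <- 20 + 3 * temperature + rnorm(days, sd = 10)
drownings <- 1 + 0.2 * temperature + rnorm(days, sd = 1.5)

# Raw correlation between the two outcomes.
raw_r <- cor(drownings, icecream)

# Remove what temperature explains in each variable, then correlate the rest.
resid_ice   <- resid(lm(icecream ~ temperature))
resid_drown <- resid(lm(drownings ~ temperature))
adjusted_r  <- cor(resid_drown, resid_ice)

c(raw = raw_r, adjusted = adjusted_r)
```

Under these assumptions the raw correlation is clearly positive, while the adjusted one sits close to zero, which is consistent with warm weather driving both variables.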

Correlation coefficient

  • Correlation coefficient (r) - degree of association
    • -1 to +1
  • More extreme values - stronger association
    • Better prediction

A scatter plot showing no correlation with points at the same level on the y-axis across the entirety of the x-axis.

A scatter plot showing a negative correlation with points high on the left of the x-axis and decreasing as moving to the right of the x-axis.

A scatter plot showing a positive correlation with points low on the left of the x-axis and increasing as moving to the right of the x-axis.
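
A short sketch of the range of r, using made-up vectors: perfectly linear data reaches the limits of +1 and -1, and Pearson's r computed by hand (covariance divided by the product of the standard deviations) agrees with `cor()`.

```r
x <- 1:10

# A perfectly increasing line gives +1; a perfectly decreasing line gives -1.
r_pos <- cor(x, 2 * x + 1)
r_neg <- cor(x, -3 * x + 5)

# Pearson's r by hand: covariance over the product of standard deviations.
y <- c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9)
r_manual  <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y)

c(r_pos = r_pos, r_neg = r_neg, r_manual = r_manual, r_builtin = r_builtin)
```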

Correlation values

  • Proportion of variation in time eating that can be attributed to enjoyment
    • R^2
corvalue <- cor(data$time, data$enjoyment)
corvalue
[1] 0.73
corvalue^2
[1] 0.5329
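
One way to check the R^2 interpretation, sketched on simulated data: with a single predictor, the squared correlation should match the R-squared reported by a linear model. The `data` frame here is a hypothetical stand-in, so the numbers will differ from the output above.

```r
set.seed(4)
# Hypothetical stand-in for the course dataset.
data <- data.frame(enjoyment = runif(100, min = 1, max = 10))
data$time <- 5 + 2 * data$enjoyment + rnorm(100, sd = 3)

corvalue  <- cor(data$time, data$enjoyment)
r_squared <- corvalue^2

# With one predictor, the model's R-squared equals the squared correlation.
model_r_squared <- summary(lm(time ~ enjoyment, data = data))$r.squared

c(r = corvalue, r_squared = r_squared, model_r_squared = model_r_squared)
```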

Correlation limitations

  • Particularly susceptible to outliers
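
The effect of an outlier can be sketched with a small made-up dataset: the points follow a near-perfect line, then a single point far from the trend is added at the bottom right.

```r
x <- 1:20
y <- 2 * x + c(1, -1)  # the +1/-1 wobble is recycled along a straight line
r_clean <- cor(x, y)

# Add one point far from the trend (large x, small y).
x_out <- c(x, 40)
y_out <- c(y, 0)
r_outlier <- cor(x_out, y_out)

c(clean = r_clean, with_outlier = r_outlier)
```

In this example one extra point out of 21 is enough to pull a correlation that was close to 1 down to a much weaker value.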

Correlation coefficients

  • Not line of best fit

    • Regression
  • Not indication of significance

    • The p-value depends on both the correlation coefficient and the sample size

Image of a positive correlation on a computer with a data point away from the linear correlation at the bottom right labeled as an outlier.

A scatter plot of a positive correlation with a red line of best fit through drownings on the x-axis and ice cream sales on the y-axis.
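
The two points above can be sketched with base R on simulated data (the variable names and values are assumptions). `cor.test()` reports the coefficient together with a p-value, which can be reproduced from r and the sample size alone, and `lm()` supplies the line of best fit, whose slope is r rescaled by the two standard deviations rather than r itself.

```r
set.seed(5)
n <- 30
eat_time  <- runif(n, min = 5, max = 30)          # hypothetical minutes
enjoyment <- 2 + 0.2 * eat_time + rnorm(n, sd = 2)

# Correlation coefficient with its significance test.
test <- cor.test(eat_time, enjoyment)
r <- unname(test$estimate)

# Same p-value by hand: a t statistic from r and n, with n - 2 degrees of freedom.
t_stat   <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_manual <- 2 * pt(-abs(t_stat), df = n - 2)

# The line of best fit comes from regression, not from r.
fit   <- lm(enjoyment ~ eat_time)
slope <- unname(coef(fit)["eat_time"])

c(r = r, p_value = test$p.value, p_manual = p_manual, slope = slope)
```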

Let's practice!
