Effect size

Foundations of Inference in Python

Paul Savala

Assistant Professor of Mathematics

What is effect size?

A doctor holding a broken cigarette.

  • Effect size: a measure of the strength of the relationship between two variables

A collection of junk food.


Why measure effect size?

  • Measures the strength of a relationship
  • Smoking: Large effect size
  • Poor diet: Small effect size

P-Values

 

  • Does a relationship exist?
  • Comes from a hypothesis test

Effect size

 

  • How strong is the relationship?
  • Computed separately from a hypothesis test
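A small simulation sketch of why a p-value and an effect size answer different questions. The group sizes, the 0.05 mean shift, and the seed are arbitrary choices for illustration, not course data: with very large samples, a tiny difference in means produces a very small p-value, yet Cohen's d stays near 0.05, a very small effect.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed so the run is reproducible

# Two large groups whose true means differ by only 0.05 standard deviations
group_a = rng.normal(loc=0.05, scale=1.0, size=100_000)
group_b = rng.normal(loc=0.00, scale=1.0, size=100_000)

# Hypothesis test: does a difference exist?
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Effect size: how big is the difference? (Cohen's d with the pooled SD)
n1, n2 = len(group_a), len(group_b)
s_pooled = np.sqrt(((n1 - 1) * group_a.var(ddof=1) + (n2 - 1) * group_b.var(ddof=1))
                   / (n1 + n2 - 2))
d = (group_a.mean() - group_b.mean()) / s_pooled

print(f"p-value: {p_value:.2e}")  # far below 0.05
print(f"Cohen's d: {d:.3f}")      # close to 0.05
```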

Effect size for means - Cohen's d

$n_1 = \text{Sample size of group one}$

$n_2 = \text{Sample size of group two}$

$s_1 = \text{Standard deviation of group one}$

$s_2 = \text{Standard deviation of group two}$

$\overline{x}_1 = \text{Mean of group one}$

$\overline{x}_2 = \text{Mean of group two}$

$s = \displaystyle\sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}$

Cohen's $d = \displaystyle\frac{\overline{x}_1 - \overline{x}_2}{s}$
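The formulas above translate directly into code. A minimal sketch, assuming two independent samples passed as array-like inputs; the function name `cohens_d` and the example values are hypothetical, made up for illustration:

```python
import numpy as np

def cohens_d(x1, x2):
    """Cohen's d for two independent samples, using the pooled standard deviation s."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    # Sample variances s_1^2 and s_2^2 (ddof=1 divides by n - 1)
    var1, var2 = x1.var(ddof=1), x2.var(ddof=1)
    # Pooled standard deviation s
    s = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    # Difference in means divided by the pooled standard deviation
    return (x1.mean() - x2.mean()) / s

group_one = [5.1, 4.9, 6.2, 5.8, 6.0]  # hypothetical example values
group_two = [4.2, 4.8, 5.0, 4.4, 4.6]
print(round(cohens_d(group_one, group_two), 2))  # prints 2.17
```

Here the means are 5.6 and 4.6, the pooled variance is 0.2125, so d = 1.0 / √0.2125 ≈ 2.17. Swapping the groups flips the sign of d but not its size.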


Interpreting Cohen's d

  • 0.01 - Very small
  • 0.20 - Small
  • 0.50 - Medium
  • 0.80 - Large
  • 1.20 - Very large

Cohen's $d = 0.6$

Between medium (0.50) and large (0.80): a medium-to-large effect size

1 https://books.google.com/books?id=2v9zDAsLvA0C&pg=PP1 https://doi.org/10.22237%2Fjmasm%2F1257035100
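A small helper can attach the labels above to a computed d. This is a sketch: the function name `interpret_cohens_d` is hypothetical, it uses |d| because the sign only reflects which group was listed first, and it assigns the largest cutoff that |d| reaches, so d = 0.6 comes out as "Medium" even though it sits between medium and large.

```python
def interpret_cohens_d(d):
    """Label |d| using the rule-of-thumb cutoffs for Cohen's d."""
    cutoffs = [
        (1.20, "Very large"),
        (0.80, "Large"),
        (0.50, "Medium"),
        (0.20, "Small"),
        (0.01, "Very small"),
    ]
    size = abs(d)  # the sign only says which group has the larger mean
    for cutoff, label in cutoffs:
        if size >= cutoff:
            return label
    return "Smaller than very small"  # label chosen for this sketch

print(interpret_cohens_d(0.6))  # Medium
```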

Effect size for correlation

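from scipy import stats

# btc_sp_df: DataFrame of Bitcoin and S&P 500 closing prices (loaded earlier)
# Pearson correlation between the two closing-price columns
r, p_value = stats.pearsonr(
    btc_sp_df['Close_BTC'],
    btc_sp_df['Close_SP500']
    )

# Square r to get R^2, rounded to two decimals
print(round(r**2, 2))
0.82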

$R^2$: the proportion of the variation in one variable explained by a linear relationship with the other ($R^2 = 0.82$ means about 82%)

A scatter plot with the S&P 500 closing price on the x-axis and the Bitcoin closing price on the y-axis. The points follow a roughly linear pattern with a positive slope.
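One way to see why $r^2$ measures explained variation: for a least-squares line with an intercept, $r^2$ equals $1 - SS_{res}/SS_{tot}$, the share of the variation in y that the line accounts for. A sketch using synthetic data; the course's `btc_sp_df` is not reproduced here, so the numbers are made up and will not be 0.82.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic, roughly linear data standing in for the S&P 500 / Bitcoin prices
x = rng.uniform(3000, 4500, size=200)
y = 15 * x + rng.normal(0, 5000, size=200)

r, p_value = stats.pearsonr(x, y)

# Fit a least-squares line y ≈ slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Share of the variation in y explained by the fitted line
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(r ** 2, r_squared)  # equal up to floating-point rounding
```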


Effect size for categorical variables

  • $\chi^2$ = Chi-squared statistic from contingency table
  • $n$ = total number of data points
  • $d = \min(\text{rows}-1, \text{cols}-1)$, the degrees of freedom used for Cramer's V (not the chi-squared test's $(\text{rows}-1)(\text{cols}-1)$)

Cramer's $V = \displaystyle\sqrt{\frac{\chi^2/n}{d}}$
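A minimal sketch of the formula as a reusable function, assuming the contingency table is a 2-D array of counts. The function name `cramers_v` and the example counts are hypothetical. It passes `correction=False` because `stats.chi2_contingency` applies Yates' continuity correction by default when the test has one degree of freedom (a 2 x 2 table), which would change $\chi^2$ and therefore V.

```python
import numpy as np
from scipy import stats

def cramers_v(table):
    """Cramer's V = sqrt((chi2 / n) / d), with d = min(rows - 1, cols - 1)."""
    table = np.asarray(table)
    # correction=False: use the plain chi-squared statistic
    chi2, p, test_dof, expected = stats.chi2_contingency(table, correction=False)
    n = table.sum()                     # total number of data points
    rows, cols = table.shape
    d = min(rows - 1, cols - 1)
    return np.sqrt((chi2 / n) / d)

# Hypothetical 3 x 2 table of counts
counts = [[30, 10],
          [20, 20],
          [10, 30]]
print(round(cramers_v(counts), 3))  # 0.408
```

For this table every expected count is 20, so $\chi^2 = 20$, $n = 120$, $d = 1$, and $V = \sqrt{20/120} \approx 0.41$.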


Calculating Cramer's V

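import numpy as np
from scipy import stats

# Returns the chi-squared statistic, p-value, test degrees of freedom, expected counts
chi2, p, test_dof, expected = stats.chi2_contingency(
    contingency_table)

# Cramer's V uses d = min(rows - 1, cols - 1) for this 6 x 2 table
dof = min(6 - 1, 2 - 1)  # = 1
n = 3394                 # total number of people in the table
v = np.sqrt((chi2 / n) / dof)
print(round(v, 2))
0.52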

A table of job titles by sex, showing how many males and how many females hold each job title.

1 https://en.wikipedia.org/wiki/Degrees_of_freedom_(statistics)

Interpreting Cramer's V

Cramer's V = 0.52 with degrees of freedom = 1, which is above the 0.5 cutoff for a large effect size

Cramer's V cutoffs for small, medium, and large effect sizes, by degrees of freedom:

Degrees of freedom | Small | Medium | Large
1                  | 0.10  | 0.30   | 0.50
2                  | 0.07  | 0.21   | 0.35
3                  | 0.06  | 0.17   | 0.29
4                  | 0.05  | 0.15   | 0.25
5                  | 0.04  | 0.13   | 0.22

1 https://www.statology.org/interpret-cramers-v
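The cutoffs for Cramer's V by degrees of freedom are consistent with dividing Cohen's rule-of-thumb cutoffs for the chi-squared effect size w (0.1, 0.3, 0.5) by $\sqrt{d}$, since $w = \sqrt{\chi^2/n}$ and $V = \sqrt{\chi^2/n}/\sqrt{d}$. A sketch that reproduces those cutoffs and labels a value of V; the function names are hypothetical, and the "negligible" label below the small cutoff is an assumption of this sketch.

```python
import math

def cramers_v_cutoffs(dof):
    """Small/medium/large cutoffs for Cramer's V: Cohen's w cutoffs / sqrt(dof)."""
    return {label: w / math.sqrt(dof)
            for label, w in [("small", 0.1), ("medium", 0.3), ("large", 0.5)]}

def interpret_cramers_v(v, dof):
    """Return the largest effect-size label whose cutoff v reaches."""
    cutoffs = cramers_v_cutoffs(dof)
    if v >= cutoffs["large"]:
        return "large"
    if v >= cutoffs["medium"]:
        return "medium"
    if v >= cutoffs["small"]:
        return "small"
    return "negligible"  # label chosen for this sketch

# Print the cutoff table for 1 to 5 degrees of freedom
for dof in range(1, 6):
    print(dof, {k: round(c, 2) for k, c in cramers_v_cutoffs(dof).items()})

print(interpret_cramers_v(0.52, dof=1))  # large
```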

Let's practice!

