Paired t-tests

Hypothesis Testing in Python

James Chapman

Curriculum Manager, DataCamp

US Republican presidents dataset

         state       county  repub_percent_08  repub_percent_12
0      Alabama         Hale         38.957877         37.139882
1     Arkansas       Nevada         56.726272         58.983452
2   California         Lake         38.896719         39.331367
3   California      Ventura         42.923190         45.250693
..         ...          ...               ...               ...
96   Wisconsin    La Crosse         37.490904         40.577038
97   Wisconsin    Lafayette         38.104967         41.675050
98     Wyoming       Weston         76.684241         83.983328
99      Alaska  District 34         77.063259         40.789626

[100 rows x 4 columns]

100 rows; each row gives one county's Republican vote percentage in the 2008 and 2012 presidential elections.

1 https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VOQCHQ

Hypotheses

Question: Was the percentage of Republican candidate votes lower in 2008 than 2012?

$H_{0}$: $\mu_{2008} - \mu_{2012} = 0$

$H_{A}$: $\mu_{2008} - \mu_{2012} < 0$

Set $\alpha = 0.05$ significance level.

  • Data is paired → each pair of vote percentages refers to the same county
    • We want the model to capture these within-county voting patterns

From two samples to one

import matplotlib.pyplot as plt
sample_data = repub_votes_potus_08_12.copy()  # copy so the original DataFrame is not modified
sample_data['diff'] = sample_data['repub_percent_08'] - sample_data['repub_percent_12']
sample_data['diff'].hist(bins=20)
plt.show()

Histogram of the diff variable - most values are between -10 and 10 with some outliers.
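
As a minimal sketch of the same step on a small scale, the snippet below builds a four-row DataFrame from the first rows of the table shown earlier and computes the difference column with `assign()`, which returns a new DataFrame instead of modifying the original. The names `repub_votes` and `sample` are illustrative assumptions, not part of the course code.

```python
import pandas as pd

# First four rows of the table shown earlier (values as displayed on the slide).
repub_votes = pd.DataFrame({
    "state": ["Alabama", "Arkansas", "California", "California"],
    "county": ["Hale", "Nevada", "Lake", "Ventura"],
    "repub_percent_08": [38.957877, 56.726272, 38.896719, 42.923190],
    "repub_percent_12": [37.139882, 58.983452, 39.331367, 45.250693],
})

# assign() returns a new DataFrame, so repub_votes itself gains no column.
sample = repub_votes.assign(
    diff=repub_votes["repub_percent_08"] - repub_votes["repub_percent_12"]
)

# Use bracket access: sample.diff would refer to the DataFrame.diff method.
print(sample[["county", "diff"]])
```

A negative `diff` means the Republican share was lower in 2008 than in 2012 for that county.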


Calculate sample statistics of the difference

xbar_diff = sample_data['diff'].mean()
-2.877109041242944

Revised hypotheses

Old hypotheses:

$H_{0}$: $\mu_{2008} - \mu_{2012} = 0$

$H_{A}$: $\mu_{2008} - \mu_{2012} < 0$

 

New hypotheses:

$H_{0}$: $\mu_{\text{diff}} = 0$

$H_{A}$: $ \mu_{\text{diff}} < 0$
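
The two sets of hypotheses are equivalent because the mean of the paired differences equals the difference of the means. A short derivation, where $x_{i,2008}$ and $x_{i,2012}$ are notation assumed here for county $i$'s percentages and $n$ is the number of counties:

$\bar{x}_{\text{diff}} = \dfrac{1}{n}\sum_{i=1}^{n}\left(x_{i,2008} - x_{i,2012}\right) = \dfrac{1}{n}\sum_{i=1}^{n}x_{i,2008} - \dfrac{1}{n}\sum_{i=1}^{n}x_{i,2012} = \bar{x}_{2008} - \bar{x}_{2012}$

The same identity holds for the population means, so $\mu_{\text{diff}} = \mu_{2008} - \mu_{2012}$.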

$t = \dfrac{\bar{x}_{\text{diff}} - \mu_{\text{diff}}}{\sqrt{\dfrac{s_{\text{diff}}^2}{n_{\text{diff}}}}}$

$df = n_{\text{diff}} - 1$
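
The formula above can be sketched as a small helper using only the standard library. The function name `paired_t_statistic` and the five-county data are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
import math
import statistics

def paired_t_statistic(first, second, mu_diff=0.0):
    """Return (t, df) for paired samples under H0: mean difference == mu_diff."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(first, second)]
    n_diff = len(diffs)
    xbar_diff = statistics.mean(diffs)
    s_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    t_stat = (xbar_diff - mu_diff) / math.sqrt(s_diff**2 / n_diff)
    return t_stat, n_diff - 1

# Hypothetical percentages for five counties (not taken from the dataset).
pct_08 = [38.0, 56.0, 39.0, 43.0, 50.0]
pct_12 = [40.0, 58.0, 39.5, 45.0, 53.0]

t_stat, dof = paired_t_statistic(pct_08, pct_12)
print(t_stat, dof)
```

For this toy data the differences are -2, -2, -0.5, -2, -3, giving a mean of -1.9, a sample variance of 0.8, and a standard error of 0.4, so the statistic is approximately -4.75 with 4 degrees of freedom.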


Calculating the p-value

import numpy as np

n_diff = len(sample_data)
100
s_diff = sample_data['diff'].std()
t_stat = (xbar_diff - 0) / np.sqrt(s_diff**2 / n_diff)
-5.601043121928489
degrees_of_freedom = n_diff - 1
99

$t = \dfrac{\bar{x}_{\text{diff}} - \mu_{\text{diff}}}{\sqrt{\dfrac{s_{\text{diff}}^2}{n_{\text{diff}}}}}$

$df = n_{\text{diff}} - 1$

 

from scipy.stats import t
p_value = t.cdf(t_stat, df=n_diff-1)
9.572537285272411e-08
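
One way to sanity-check the manual calculation is to compare it with SciPy's built-in paired test. The sketch below uses hypothetical five-county data and assumes SciPy 1.6 or later, where `scipy.stats.ttest_rel` accepts the `alternative` argument.

```python
import numpy as np
from scipy import stats

# Hypothetical percentages for five counties (not taken from the dataset).
pct_08 = np.array([38.0, 56.0, 39.0, 43.0, 50.0])
pct_12 = np.array([40.0, 58.0, 39.5, 45.0, 53.0])

# Manual route: one-sample t-test on the differences.
diff = pct_08 - pct_12
n_diff = len(diff)
# NumPy's std() defaults to ddof=0; ddof=1 matches the pandas default used on the slides.
s_diff = diff.std(ddof=1)
t_stat = (diff.mean() - 0) / np.sqrt(s_diff**2 / n_diff)
p_manual = stats.t.cdf(t_stat, df=n_diff - 1)  # left-tailed p-value

# Library route: paired t-test with a left-tailed alternative.
result = stats.ttest_rel(pct_08, pct_12, alternative="less")

print(t_stat, result.statistic)
print(p_manual, result.pvalue)
```

Both routes should agree up to floating-point error, since a paired t-test is a one-sample t-test on the differences.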

Testing differences between two means using ttest()

import pingouin

pingouin.ttest(x=sample_data['diff'],
               y=0,
               alternative="less")
               T  dof alternative         p-val          CI95%   cohen-d  \
T-test -5.601043   99        less  9.572537e-08  [-inf, -2.02]  0.560104   

             BF10  power  
T-test  1.323e+05    1.0
1 Details on the values returned by pingouin.ttest() are available in the pingouin API docs at https://pingouin-stats.org/generated/pingouin.ttest.html#pingouin.ttest.

ttest() with paired=True

pingouin.ttest(x=sample_data['repub_percent_08'],
               y=sample_data['repub_percent_12'],
               paired=True,
               alternative="less")
               T  dof alternative         p-val          CI95%   cohen-d  \
T-test -5.601043   99        less  9.572537e-08  [-inf, -2.02]  0.217364   

             BF10     power  
T-test  1.323e+05  0.696338

Unpaired ttest()

pingouin.ttest(x=sample_data['repub_percent_08'],
               y=sample_data['repub_percent_12'],
               paired=False, # The default
               alternative="less")
               T  dof alternative     p-val         CI95%   cohen-d   BF10  \
T-test -1.536997  198        less  0.062945  [-inf, 0.22]  0.217364  0.927   

           power  
T-test  0.454972  
  • Unpaired t-tests on paired data increase the chance of false negative errors: here the unpaired p-value (0.063) exceeds $\alpha = 0.05$, while the paired test gives 9.57e-08
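
The loss of power can be illustrated with simulated data. In the sketch below, all numbers (baseline mean 50, spread 15, shift of 3 points, noise of 5) are assumptions made up for illustration, and SciPy's `ttest_rel` and `ttest_ind` stand in for the paired and unpaired tests (SciPy 1.6 or later is assumed for the `alternative` argument).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100

# Each simulated county has its own baseline, which dominates the overall variation.
baseline = rng.normal(50, 15, size=n)
pct_08 = baseline
# The second election shifts each county up by about 3 points, plus noise.
pct_12 = baseline + 3 + rng.normal(0, 5, size=n)

# Paired test: the county baselines cancel out in the differences.
paired = stats.ttest_rel(pct_08, pct_12, alternative="less")
# Unpaired test: between-county variation stays in the standard error.
unpaired = stats.ttest_ind(pct_08, pct_12, alternative="less")

print("paired p-value:  ", paired.pvalue)
print("unpaired p-value:", unpaired.pvalue)
```

Both tests share the same numerator (the difference in means), but the unpaired standard error includes the large county-to-county spread, so its statistic is closer to zero and its p-value is larger.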

Let's practice!
