Hypothesis Testing in Python
James Chapman
Curriculum Manager, DataCamp
print(repub_votes_small)
            state      county  repub_percent_08  repub_percent_12
80          Texas   Red River         68.507522         69.944817
84          Texas      Walker         60.707197         64.971903
33       Kentucky      Powell         57.059533         61.727293
81          Texas  Schleicher         74.386503         77.384464
93  West Virginia      Morgan         60.857614         64.068711
- At least 30 pairs of observations across the samples.
alpha = 0.01
import pingouin

# Paired, left-tailed t-test: is the 2008 mean lower than the 2012 mean?
pingouin.ttest(x=repub_votes_potus_08_12_small['repub_percent_08'],
               y=repub_votes_potus_08_12_small['repub_percent_12'],
               paired=True,
               alternative="less")
                T  dof  alternative     p-val          CI95%   cohen-d    BF10     power
T-test  -5.875753    4         less  0.002096  [-inf, -2.11]  0.500068  26.468  0.239034
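As a cross-check on the pingouin output, the same paired t-test can be sketched with `scipy.stats.ttest_rel`. This is only a sketch under stated assumptions: the five rows are retyped from the printed table, the list names `repub_08` and `repub_12` are illustrative, and the `alternative` argument needs SciPy 1.6 or later.

```python
from scipy.stats import ttest_rel

# Values retyped from the printed table (assumed to match repub_votes_small)
repub_08 = [68.507522, 60.707197, 57.059533, 74.386503, 60.857614]
repub_12 = [69.944817, 64.971903, 61.727293, 77.384464, 64.068711]

alpha = 0.01

# Paired, left-tailed test: is the 2008 mean lower than the 2012 mean?
result = ttest_rel(repub_08, repub_12, alternative="less")
t_stat, p_value = result.statistic, result.pvalue

print(t_stat, p_value)
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

With these five pairs the t-statistic and p-value should agree with the T and p-val columns above, and the p-value falls below alpha = 0.01.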
x = [1, 15, 3, 10, 6]
from scipy.stats import rankdata
rankdata(x)
array([1., 5., 2., 4., 3.])
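The example above has no repeated values. As a small aside, here is a sketch of what `rankdata` does with ties under its default `method="average"`; the list `x_tied` is hypothetical data, not from the course.

```python
from scipy.stats import rankdata

x_tied = [1, 15, 3, 3, 6]  # hypothetical data containing one tie

# Default method="average": the two 3s would occupy ranks 2 and 3,
# so each receives the average rank 2.5
ranks = rankdata(x_tied)
print(ranks)  # [1.  5.  2.5 2.5 4. ]
```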
repub_votes_small['diff'] = (repub_votes_small['repub_percent_08']
                             - repub_votes_small['repub_percent_12'])
print(repub_votes_small)
            state      county  repub_percent_08  repub_percent_12       diff
80          Texas   Red River         68.507522         69.944817  -1.437295
84          Texas      Walker         60.707197         64.971903  -4.264705
33       Kentucky      Powell         57.059533         61.727293  -4.667760
81          Texas  Schleicher         74.386503         77.384464  -2.997961
93  West Virginia      Morgan         60.857614         64.068711  -3.211097
repub_votes_small['abs_diff'] = repub_votes_small['diff'].abs()
print(repub_votes_small)
            state      county  repub_percent_08  repub_percent_12       diff  abs_diff
80          Texas   Red River         68.507522         69.944817  -1.437295  1.437295
84          Texas      Walker         60.707197         64.971903  -4.264705  4.264705
33       Kentucky      Powell         57.059533         61.727293  -4.667760  4.667760
81          Texas  Schleicher         74.386503         77.384464  -2.997961  2.997961
93  West Virginia      Morgan         60.857614         64.068711  -3.211097  3.211097
from scipy.stats import rankdata
repub_votes_small['rank_abs_diff'] = rankdata(repub_votes_small['abs_diff'])
print(repub_votes_small)
            state      county  repub_percent_08  repub_percent_12       diff  abs_diff  rank_abs_diff
80          Texas   Red River         68.507522         69.944817  -1.437295  1.437295            1.0
84          Texas      Walker         60.707197         64.971903  -4.264705  4.264705            4.0
33       Kentucky      Powell         57.059533         61.727293  -4.667760  4.667760            5.0
81          Texas  Schleicher         74.386503         77.384464  -2.997961  2.997961            2.0
93  West Virginia      Morgan         60.857614         64.068711  -3.211097  3.211097            3.0
import numpy as np

# Every difference is negative, so all five ranks go to T_minus
T_minus = 1 + 4 + 5 + 2 + 3
T_plus = 0
W = np.min([T_minus, T_plus])
print(W)
0
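The hand calculation can also be sketched programmatically, summing the ranks by the sign of each difference rather than typing them in. This assumes the five rows retyped from the printed table; the DataFrame name `df` is illustrative.

```python
import pandas as pd
from scipy.stats import rankdata

# Values retyped from the printed table (assumed to match repub_votes_small)
df = pd.DataFrame({
    "repub_percent_08": [68.507522, 60.707197, 57.059533, 74.386503, 60.857614],
    "repub_percent_12": [69.944817, 64.971903, 61.727293, 77.384464, 64.068711],
})

# Differences and the ranks of their absolute values
df["diff"] = df["repub_percent_08"] - df["repub_percent_12"]
df["rank_abs_diff"] = rankdata(df["diff"].abs())

# Sum the ranks separately for negative and positive differences
T_minus = df.loc[df["diff"] < 0, "rank_abs_diff"].sum()
T_plus = df.loc[df["diff"] > 0, "rank_abs_diff"].sum()

# The test statistic is the smaller of the two sums
W = min(T_minus, T_plus)
print(T_minus, T_plus, W)
```

Because all five differences are negative, `T_plus` is an empty sum (zero) and `W` comes out as zero, matching the manual result.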
alpha = 0.01
pingouin.wilcoxon(x=repub_votes_potus_08_12_small['repub_percent_08'],
                  y=repub_votes_potus_08_12_small['repub_percent_12'],
                  alternative="less")
          W-val  alternative    p-val   RBC  CLES
Wilcoxon    0.0         less  0.03125  -1.0  0.72
Fail to reject $H_0$, since 0.03125 > 0.01
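The p-value of 0.03125 can be understood by counting: with five non-zero, untied differences there are 2^5 = 32 equally likely sign patterns under the null hypothesis, and only one of them (all negative) gives a positive-rank sum of 0, so the left-tailed p-value is 1/32 = 0.03125. A sketch of the same test using `scipy.stats.wilcoxon` follows; the data are retyped from the printed table, the list names are illustrative, and it assumes a SciPy version that uses the exact distribution for small samples.

```python
from scipy.stats import wilcoxon

# Values retyped from the printed table (assumed to match repub_votes_small)
repub_08 = [68.507522, 60.707197, 57.059533, 74.386503, 60.857614]
repub_12 = [69.944817, 64.971903, 61.727293, 77.384464, 64.068711]

alpha = 0.01

# Left-tailed Wilcoxon signed-rank test on the paired differences
result = wilcoxon(repub_08, repub_12, alternative="less")
p_value = result.pvalue

print(result.statistic, p_value)
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

With only five pairs, 1/32 is the smallest left-tailed p-value this test can produce, so it cannot reject at alpha = 0.01 no matter how extreme the data are.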