Hypothesis Testing in Python
James Chapman
Curriculum Manager, DataCamp
stack_overflow['job_sat'].value_counts()
Very satisfied 879
Slightly satisfied 680
Slightly dissatisfied 342
Neither 201
Very dissatisfied 159
Name: job_sat, dtype: int64
Is mean annual compensation different for different levels of job satisfaction?
import seaborn as sns
import matplotlib.pyplot as plt
sns.boxplot(x="converted_comp",
y="job_sat",
data=stack_overflow)
plt.show()
alpha = 0.2
pingouin.anova(data=stack_overflow, dv="converted_comp", between="job_sat")
Source ddof1 ddof2 F p-unc np2
0 job_sat 4 2256 4.480485 0.001315 0.007882
0.001315
$< \alpha$
Set significance level to $\alpha = 0.2$.
pingouin.pairwise_tests(data=stack_overflow,
dv="converted_comp",
between="job_sat",
padjust="none")
Contrast A B Paired Parametric ... dof alternative p-unc BF10 hedges
0 job_sat Slightly satisfied Very satisfied False True ... 1478.622799 two-sided 0.000064 158.564 -0.192931
1 job_sat Slightly satisfied Neither False True ... 258.204546 two-sided 0.484088 0.114 -0.068513
2 job_sat Slightly satisfied Very dissatisfied False True ... 187.153329 two-sided 0.215179 0.208 -0.145624
3 job_sat Slightly satisfied Slightly dissatisfied False True ... 569.926329 two-sided 0.969491 0.074 -0.002719
4 job_sat Very satisfied Neither False True ... 328.326639 two-sided 0.097286 0.337 0.120115
5 job_sat Very satisfied Very dissatisfied False True ... 221.666205 two-sided 0.455627 0.126 0.063479
6 job_sat Very satisfied Slightly dissatisfied False True ... 821.303063 two-sided 0.002166 7.43 0.173247
7 job_sat Neither Very dissatisfied False True ... 321.165726 two-sided 0.585481 0.135 -0.058537
8 job_sat Neither Slightly dissatisfied False True ... 367.730081 two-sided 0.547406 0.118 0.055707
9 job_sat Very dissatisfied Slightly dissatisfied False True ... 247.570187 two-sided 0.259590 0.197 0.119131
[10 rows x 11 columns]
pingouin.pairwise_tests(data=stack_overflow,
dv="converted_comp",
between="job_sat",
padjust="bonf")
Contrast A B ... p-unc p-corr p-adjust BF10 hedges
0 job_sat Slightly satisfied Very satisfied ... 0.000064 0.000638 bonf 158.564 -0.192931
1 job_sat Slightly satisfied Neither ... 0.484088 1.000000 bonf 0.114 -0.068513
2 job_sat Slightly satisfied Very dissatisfied ... 0.215179 1.000000 bonf 0.208 -0.145624
3 job_sat Slightly satisfied Slightly dissatisfied ... 0.969491 1.000000 bonf 0.074 -0.002719
4 job_sat Very satisfied Neither ... 0.097286 0.972864 bonf 0.337 0.120115
5 job_sat Very satisfied Very dissatisfied ... 0.455627 1.000000 bonf 0.126 0.063479
6 job_sat Very satisfied Slightly dissatisfied ... 0.002166 0.021659 bonf 7.43 0.173247
7 job_sat Neither Very dissatisfied ... 0.585481 1.000000 bonf 0.135 -0.058537
8 job_sat Neither Slightly dissatisfied ... 0.547406 1.000000 bonf 0.118 0.055707
9 job_sat Very dissatisfied Slightly dissatisfied ... 0.259590 1.000000 bonf 0.197 0.119131
[10 rows x 11 columns]
padjust : string
Method used for testing and adjustment of pvalues.
'none'
: no correction [default]'bonf'
: one-step Bonferroni correction'sidak'
: one-step Sidak correction'holm'
: step-down method using Bonferroni adjustments'fdr_bh'
: Benjamini/Hochberg FDR correction'fdr_by'
: Benjamini/Yekutieli FDR correctionHypothesis Testing in Python