Non-parametric tests

Hypothesis Testing in Python

James Chapman

Curriculum Manager, DataCamp

Parametric tests

  • z-test, t-test, and ANOVA are all parametric tests
  • Assume a normal distribution
  • Require sufficiently large sample sizes

Smaller Republican votes data

print(repub_votes_small)
            state      county  repub_percent_08  repub_percent_12
80          Texas   Red River         68.507522         69.944817
84          Texas      Walker         60.707197         64.971903
33       Kentucky      Powell         57.059533         61.727293
81          Texas  Schleicher         74.386503         77.384464
93  West Virginia      Morgan         60.857614         64.068711
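
To follow along without the original dataset, the five rows printed above can be rebuilt by hand. This is only a sketch: it assumes pandas is installed, and the values are copied from the printout, so they are rounded to six decimal places.

```python
import pandas as pd

# Values copied from the printed output above (rounded to 6 decimals)
repub_votes_small = pd.DataFrame(
    {
        "state": ["Texas", "Texas", "Kentucky", "Texas", "West Virginia"],
        "county": ["Red River", "Walker", "Powell", "Schleicher", "Morgan"],
        "repub_percent_08": [68.507522, 60.707197, 57.059533, 74.386503, 60.857614],
        "repub_percent_12": [69.944817, 64.971903, 61.727293, 77.384464, 64.068711],
    },
    index=[80, 84, 33, 81, 93],  # row labels as shown in the printout
)
print(repub_votes_small)
```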

Results with pingouin.ttest()

  • 5 pairs is too few to meet the sample size condition for the paired t-test
  • The condition: at least 30 pairs of observations across the samples
alpha = 0.01

import pingouin
pingouin.ttest(x=repub_votes_potus_08_12_small['repub_percent_08'],
               y=repub_votes_potus_08_12_small['repub_percent_12'],
               paired=True,
               alternative="less")
               T  dof alternative     p-val          CI95%   cohen-d    BF10     power
T-test -5.875753    4        less  0.002096  [-inf, -2.11]  0.500068  26.468  0.239034
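
As a sanity check on the T and dof columns above, the paired t statistic can be recomputed by hand as the mean of the pairwise differences divided by its standard error. The sketch below uses only the standard library and the rounded values from the printed table, so the result agrees with the reported T only up to rounding. The variable names are illustrative.

```python
import math
import statistics

# Rounded values copied from the printed repub_votes_small table
repub_percent_08 = [68.507522, 60.707197, 57.059533, 74.386503, 60.857614]
repub_percent_12 = [69.944817, 64.971903, 61.727293, 77.384464, 64.068711]

# Paired test: work on the per-county differences (2008 minus 2012)
diffs = [a - b for a, b in zip(repub_percent_08, repub_percent_12)]
n = len(diffs)

# t = mean difference / (sample standard deviation / sqrt(n))
t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
dof = n - 1

print(t_stat, dof)
```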

Non-parametric tests

  • Non-parametric tests avoid the parametric assumptions and conditions
  • Many non-parametric tests use ranks of the data
x = [1, 15, 3, 10, 6]
from scipy.stats import rankdata
rankdata(x)
array([1., 5., 2., 4., 3.])
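
The example above has no repeated values. When values are tied, rankdata() by default gives each tied value the average of the ranks they would otherwise occupy. A small illustration, using a made-up list that is not from the course data:

```python
from scipy.stats import rankdata

# Hypothetical data with a tie: the value 3 appears twice
x_with_tie = [1, 15, 3, 10, 3]

# Default method is "average": the two 3s would take ranks 2 and 3,
# so each receives (2 + 3) / 2 = 2.5
ranks = rankdata(x_with_tie)
print(ranks)
```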

Non-parametric tests

  • Non-parametric tests are more reliable than parametric tests for small sample sizes and when data isn't normally distributed

Wilcoxon signed-rank test
  • Developed by Frank Wilcoxon in 1945
  • One of the first non-parametric procedures

Wilcoxon signed-rank test (Step 1)

  • Works on the ranked absolute differences between the pairs of data
repub_votes_small['diff'] = (repub_votes_small['repub_percent_08'] -
                             repub_votes_small['repub_percent_12'])
print(repub_votes_small)
            state      county  repub_percent_08  repub_percent_12      diff
80          Texas   Red River         68.507522         69.944817 -1.437295
84          Texas      Walker         60.707197         64.971903 -4.264705
33       Kentucky      Powell         57.059533         61.727293 -4.667760
81          Texas  Schleicher         74.386503         77.384464 -2.997961
93  West Virginia      Morgan         60.857614         64.068711 -3.211097

Wilcoxon signed-rank test (Step 2)

  • Works on the ranked absolute differences between the pairs of data
repub_votes_small['abs_diff'] = repub_votes_small['diff'].abs()
print(repub_votes_small)
            state      county  repub_percent_08  repub_percent_12      diff  abs_diff
80          Texas   Red River         68.507522         69.944817 -1.437295  1.437295
84          Texas      Walker         60.707197         64.971903 -4.264705  4.264705
33       Kentucky      Powell         57.059533         61.727293 -4.667760  4.667760
81          Texas  Schleicher         74.386503         77.384464 -2.997961  2.997961
93  West Virginia      Morgan         60.857614         64.068711 -3.211097  3.211097

Wilcoxon signed-rank test (Step 3)

  • Works on the ranked absolute differences between the pairs of data
from scipy.stats import rankdata
repub_votes_small['rank_abs_diff'] = rankdata(repub_votes_small['abs_diff'])
print(repub_votes_small)
            state      county  repub_percent_08  repub_percent_12      diff  abs_diff  rank_abs_diff
80          Texas   Red River         68.507522         69.944817 -1.437295  1.437295            1.0
84          Texas      Walker         60.707197         64.971903 -4.264705  4.264705            4.0
33       Kentucky      Powell         57.059533         61.727293 -4.667760  4.667760            5.0
81          Texas  Schleicher         74.386503         77.384464 -2.997961  2.997961            2.0
93  West Virginia      Morgan         60.857614         64.068711 -3.211097  3.211097            3.0

Wilcoxon signed-rank test (Step 4)

            state      county  repub_percent_08  repub_percent_12      diff  abs_diff  rank_abs_diff
80          Texas   Red River         68.507522         69.944817 -1.437295  1.437295            1.0
84          Texas      Walker         60.707197         64.971903 -4.264705  4.264705            4.0
33       Kentucky      Powell         57.059533         61.727293 -4.667760  4.667760            5.0
81          Texas  Schleicher         74.386503         77.384464 -2.997961  2.997961            2.0
93  West Virginia      Morgan         60.857614         64.068711 -3.211097  3.211097            3.0
  • Incorporate the sum of the ranks for negative and positive differences
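import numpy as np
T_minus = 1 + 4 + 5 + 2 + 3  # ranks of the negative differences
T_plus = 0                   # there are no positive differences
W = np.min([T_minus, T_plus])
print(W)
0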
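
The rank sums above were added up by hand. They can also be derived from the signs of the differences, as in the sketch below. It assumes NumPy and SciPy are available, and it starts from the diff values printed in the table rather than the original DataFrame.

```python
import numpy as np
from scipy.stats import rankdata

# diff column copied from the printed table (2008 minus 2012)
diff = np.array([-1.437295, -4.264705, -4.667760, -2.997961, -3.211097])

# Rank the absolute differences, then split the ranks by the sign of diff
rank_abs_diff = rankdata(np.abs(diff))
T_minus = rank_abs_diff[diff < 0].sum()  # sum of ranks for negative differences
T_plus = rank_abs_diff[diff > 0].sum()   # sum of ranks for positive differences

# The test statistic is the smaller of the two rank sums
W = min(T_minus, T_plus)
print(T_minus, T_plus, W)
```

Because every county's Republican share rose from 2008 to 2012, all five ranks land in T_minus and T_plus is empty, which is why W is 0.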

Implementation with pingouin.wilcoxon()

alpha = 0.01
pingouin.wilcoxon(x=repub_votes_potus_08_12_small['repub_percent_08'],
                  y=repub_votes_potus_08_12_small['repub_percent_12'],
                  alternative="less")
          W-val alternative    p-val  RBC  CLES
Wilcoxon    0.0        less  0.03125 -1.0  0.72

Fail to reject $H_0$, since 0.03125 > 0.01
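
One way to see where 0.03125 comes from: if the null hypothesis holds and there are no ties or zero differences, each of the 5 ranks is equally likely to belong to a positive or a negative difference, giving 2^5 = 32 equally likely sign patterns. The sketch below enumerates them with the standard library only. It assumes an exact one-sided test and is not a description of pingouin's internals.

```python
from itertools import product

ranks = [1, 2, 3, 4, 5]
observed_T_plus = 0  # no positive differences in the sample

# Enumerate every way of labelling the ranks positive (True) or negative (False)
t_plus_values = [
    sum(r for r, is_positive in zip(ranks, signs) if is_positive)
    for signs in product([False, True], repeat=len(ranks))
]

# One-sided p-value: share of sign patterns with T_plus at most the observed value
p_exact = sum(t <= observed_T_plus for t in t_plus_values) / len(t_plus_values)
print(p_exact)
```

Only one of the 32 patterns has T_plus equal to 0, so the exact one-sided p-value is 1/32 = 0.03125. That is the smallest value this test can produce with 5 pairs, so with this sample size it can never reject at alpha = 0.01.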

Let's practice!
