Chi-square goodness of fit tests

Hypothesis Testing in Python

James Chapman

Curriculum Manager, DataCamp

Purple links

How do you feel when you discover that you've already visited the top resource?

purple_link_counts = stack_overflow['purple_link'].value_counts()
purple_link_counts = purple_link_counts.rename_axis('purple_link')\
                                       .reset_index(name='n')\
                                       .sort_values('purple_link')
         purple_link     n
2             Amused   368
3            Annoyed   263
0  Hello, old friend  1225
1        Indifferent   405

Declaring the hypotheses

hypothesized = pd.DataFrame({
  'purple_link': ['Amused', 'Annoyed', 'Hello, old friend', 'Indifferent'], 
  'prop': [1/6, 1/6, 1/2, 1/6]})
         purple_link      prop
0             Amused  0.166667
1            Annoyed  0.166667
2  Hello, old friend  0.500000
3        Indifferent  0.166667

$H_{0}$: The sample matches the hypothesized distribution

$H_{A}$: The sample does not match the hypothesized distribution

$\chi^{2}$ measures how far observed results are from expectations in each group
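
For reference, the statistic is commonly written as below. The symbols $O_{i}$ (observed count in group $i$), $E_{i}$ (hypothesized count in group $i$) and $k$ (number of groups) are notation introduced here for illustration, not taken from the slides.

$$\chi^{2} = \sum_{i=1}^{k} \frac{(O_{i} - E_{i})^{2}}{E_{i}}$$

Each squared difference is divided by the expected count, so a gap of a given size counts for more in a group where few answers were expected.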

alpha = 0.01

Hypothesized counts by category

n_total = len(stack_overflow)
hypothesized["n"] = hypothesized["prop"] * n_total
         purple_link      prop            n
0             Amused  0.166667   376.833333
1            Annoyed  0.166667   376.833333
2  Hello, old friend  0.500000  1130.500000
3        Indifferent  0.166667   376.833333
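
As a quick sanity check, the hypothesized counts should add up to the sample size, since SciPy's documentation asks that observed and expected frequencies share the same total. A minimal sketch, assuming the total of 2261 responses implied by the observed counts shown earlier (368 + 263 + 1225 + 405); plain Python lists stand in for the DataFrame columns:

```python
import math

# Observed counts copied from the purple_link_counts output (alphabetical order).
observed = [368, 263, 1225, 405]
# Hypothesized proportions from the `hypothesized` DataFrame, same order.
props = [1/6, 1/6, 1/2, 1/6]

n_total = sum(observed)
expected = [p * n_total for p in props]

# The proportions sum to 1, so the expected counts sum to n_total.
assert math.isclose(sum(props), 1.0)
assert math.isclose(sum(expected), n_total)

print(n_total, [round(e, 6) for e in expected])
```

If the proportions did not sum to 1, the two totals would disagree and the test would not be comparing like with like.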

Visualizing counts

import matplotlib.pyplot as plt

plt.bar(purple_link_counts['purple_link'], purple_link_counts['n'], 
        color='red', label='Observed')

plt.bar(hypothesized['purple_link'], hypothesized['n'], alpha=0.5,
        color='blue', label='Hypothesized')
plt.legend()
plt.show()

Bar plot of number of answers versus purple_link answer, with the observed counts in red and the hypothesized counts in blue.

Chi-square goodness of fit test

print(hypothesized)
         purple_link      prop            n
0             Amused  0.166667   376.833333
1            Annoyed  0.166667   376.833333
2  Hello, old friend  0.500000  1130.500000
3        Indifferent  0.166667   376.833333
from scipy.stats import chisquare
chisquare(f_obs=purple_link_counts['n'], f_exp=hypothesized['n'])
Power_divergenceResult(statistic=44.59840778416629, pvalue=1.1261810719413759e-09)
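
To see where the statistic comes from, it can be recomputed by hand from the counts on the slides. This is a sketch in plain Python, not how `chisquare` is implemented; the lists are copied from the outputs above, and the p-value is taken from the result shown rather than recomputed.

```python
# Counts and proportions copied from the slides, in the same category order.
observed = [368, 263, 1225, 405]
props = [1/6, 1/6, 1/2, 1/6]

n_total = sum(observed)
expected = [p * n_total for p in props]

# Squared difference in each group, scaled by that group's expected count.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom for a goodness of fit test: number of groups minus one.
dof = len(observed) - 1

alpha = 0.01
p_value = 1.1261810719413759e-09  # as reported by chisquare above
reject_null = p_value <= alpha

print(round(chi_sq, 4), dof, reject_null)  # prints: 44.5984 3 True
```

The p-value is far below alpha, so the null hypothesis is rejected: the sample does not match the hypothesized distribution. Sorting both tables by purple_link matters here, because observed and expected counts are paired by position.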

Let's practice!
