Choosing the right statistical test

Experimental Design in Python

James Chapman

Curriculum Manager, DataCamp

Selecting the right test

Dataset's features
- Data types
- Distributions → assumed to be normal in many tests!
- Number of variables
Hypotheses

Result: accurate and dependable conclusions!

t-tests, ANOVA, and chi-square
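
The selection logic above can be sketched as a small lookup. This is only an illustration: `suggest_test` and its arguments are hypothetical names, not part of the course or of any library, and a real choice also depends on the hypotheses and on checking each test's assumptions.

```python
def suggest_test(outcome_type: str, n_groups: int) -> str:
    """Hypothetical helper: map a simple data description to one of the three tests."""
    if outcome_type == "categorical":
        # Relationship between two categorical variables
        return "chi-square test of association"
    if outcome_type == "numeric" and n_groups == 2:
        # Means of exactly two groups
        return "independent samples t-test"
    if outcome_type == "numeric" and n_groups > 2:
        # Means of more than two groups
        return "one-way ANOVA"
    raise ValueError("No suggestion for this combination")


print(suggest_test("numeric", 2))
print(suggest_test("numeric", 3))
print(suggest_test("categorical", 3))
```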

The dataset: athletic performance

Impact of training programs and diets on athletic performance

athletic_perf.sample(n=5)

 Athlete_ID  Training_Program     Diet_Type  Initial_Fitness  Performance_Inc 
        167         Endurance   Plant-Based             High         9.113040               
        289         Endurance          Keto              Low        11.039744               
        164         Endurance   Plant-Based           Medium        11.614835              
         30          Strength          Keto           Medium         7.384686               
        186              HIIT  High-Protein              Low         6.776078

Independent samples t-test

Comparing means of two groups
Assumptions: normal distribution, equal variances

from scipy.stats import ttest_ind
group1 = athletic_perf[athletic_perf['Training_Program'] == 'HIIT']['Performance_Inc']
group2 = athletic_perf[athletic_perf['Training_Program'] == 'Endurance']['Performance_Inc']


t_stat, p_val = ttest_ind(group1, group2)
print(f"T-statistic: {t_stat}, P-value: {p_val}")

T-statistic: 0.20671020082911742, P-value: 0.8364563849070663

p_val > $\alpha$ → insufficient evidence of a difference in means
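
The two assumptions listed for the t-test can be checked before running it. The sketch below uses synthetic stand-in data (not the course dataset) and assumes `alpha = 0.05`; `shapiro` and `levene` are SciPy's Shapiro-Wilk and Levene tests. Falling back to Welch's t-test (`equal_var=False`) when variances look unequal is one common approach, not the only one.

```python
import numpy as np
from scipy.stats import levene, shapiro, ttest_ind

rng = np.random.default_rng(42)
# Synthetic stand-ins for the HIIT and Endurance groups (assumed values, not course data)
group1 = rng.normal(loc=10.0, scale=2.0, size=100)
group2 = rng.normal(loc=10.0, scale=2.0, size=100)

alpha = 0.05  # assumed significance level

# Normality: a small p-value suggests the sample departs from a normal distribution
_, p_norm1 = shapiro(group1)
_, p_norm2 = shapiro(group2)

# Equal variances: a small p-value suggests the group variances differ
_, p_var = levene(group1, group2)

# Use the standard t-test if variances look equal, otherwise Welch's t-test
equal_var = bool(p_var > alpha)
t_stat, p_val = ttest_ind(group1, group2, equal_var=equal_var)

print(f"Shapiro p-values: {p_norm1:.3f}, {p_norm2:.3f}")
print(f"Levene p-value: {p_var:.3f} (equal_var={equal_var})")
print(f"T-statistic: {t_stat:.3f}, P-value: {p_val:.3f}")
```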

One-way ANOVA

Comparing means across multiple (>2) groups
Assumption: equal variances among groups

from scipy.stats import f_oneway
program_types = ['HIIT', 'Endurance', 'Strength']
groups = [athletic_perf[athletic_perf['Training_Program'] == program]['Performance_Inc']
          for program in program_types]

f_stat, p_val = f_oneway(*groups)
print(f"F-statistic: {f_stat}, P-value: {p_val}")

F-statistic: 1.5270022393256704, P-value: 0.2188859009050602

p_val > $\alpha$ → insufficient evidence of a difference in means
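
The equal-variance assumption for ANOVA can be checked with Levene's test across all groups before calling `f_oneway`. This is a minimal sketch on a synthetic DataFrame named `demo` (a hypothetical stand-in that only mimics the column names of the course data), so the numbers it prints are not the course results.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway, levene

rng = np.random.default_rng(0)
program_types = ['HIIT', 'Endurance', 'Strength']

# Synthetic stand-in data with the same column names as the slides (assumed values)
demo = pd.DataFrame({
    'Training_Program': rng.choice(program_types, size=300),
    'Performance_Inc': rng.normal(loc=10.0, scale=2.0, size=300),
})

# One array of performance values per training program
groups = [g['Performance_Inc'].to_numpy()
          for _, g in demo.groupby('Training_Program')]

# Levene's test: a small p-value suggests unequal variances among groups
lev_stat, p_var = levene(*groups)

# One-way ANOVA on the same groups
f_stat, p_val = f_oneway(*groups)

print(f"Levene p-value: {p_var:.3f}")
print(f"F-statistic: {f_stat:.3f}, P-value: {p_val:.3f}")
```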

Chi-square test of association

Testing relationships between categorical variables
No assumptions about the distribution of the data (expected cell counts should not be too small, commonly at least 5)

from scipy.stats import chi2_contingency
import pandas as pd
contingency_table = pd.crosstab(athletic_perf['Training_Program'],
                                athletic_perf['Diet_Type'])

Diet_Type         High-Protein  Keto  Plant-Based
Training_Program                                 
Endurance                   33    28           33
HIIT                        27    32           40
Strength                    38    29           40

chi2_stat, p_val, dof, expected = chi2_contingency(contingency_table)
print(f"Chi2-statistic: {chi2_stat}, P-value: {p_val}")

Chi2-statistic: 2.154450885821988, P-value: 0.7073764021451127

p_val > $\alpha$ → insufficient evidence of an association
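
`chi2_contingency` also returns the degrees of freedom and the expected frequencies under independence, which are worth inspecting. The sketch below rebuilds the contingency table by hand from the counts shown above (so it runs without the original DataFrame) and looks at the smallest expected count; the threshold of 5 is a common rule of thumb, not a hard requirement.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Counts copied from the contingency table shown above
contingency_table = pd.DataFrame(
    {
        'High-Protein': [33, 27, 38],
        'Keto': [28, 32, 29],
        'Plant-Based': [33, 40, 40],
    },
    index=['Endurance', 'HIIT', 'Strength'],
)

chi2_stat, p_val, dof, expected = chi2_contingency(contingency_table)

# Expected counts if training program and diet type were independent
expected_df = pd.DataFrame(
    expected,
    index=contingency_table.index,
    columns=contingency_table.columns,
)

print(expected_df.round(2))
print(f"Degrees of freedom: {dof}")
print(f"Smallest expected count: {expected.min():.2f}")
print(f"Chi2-statistic: {chi2_stat:.4f}, P-value: {p_val:.4f}")
```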

Let's practice!
