Choosing the right statistical test

Experimental Design in Python

James Chapman

Curriculum Manager, DataCamp

Selecting the right test

 

  • Dataset's features
    • Data types
    • Distributions → assumed to be normal in many tests!
    • Number of variables
  • Hypotheses

Result: accurate and dependable conclusions!

  • t-tests, ANOVA, and chi-square
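
The selection criteria above can be sketched as a small lookup. This is only an illustration: `suggest_test` and its arguments are hypothetical names, not part of any library, and the sketch covers just the three tests in this lesson.

```python
def suggest_test(outcome_type: str, n_groups: int) -> str:
    """Hypothetical helper: pick one of the three tests covered here.

    outcome_type: "numeric" (comparing means) or "categorical" (comparing counts)
    n_groups: number of groups or categories being compared
    """
    if n_groups < 2:
        raise ValueError("Need at least two groups to compare")
    if outcome_type == "categorical":
        # Two categorical variables: test for an association
        return "chi-square test of association"
    if outcome_type == "numeric":
        # Numeric outcome: compare means across groups
        return "independent samples t-test" if n_groups == 2 else "one-way ANOVA"
    raise ValueError(f"Unknown outcome type: {outcome_type!r}")


print(suggest_test("numeric", 2))
print(suggest_test("numeric", 3))
print(suggest_test("categorical", 3))
```

The distribution checks each test relies on still have to be made separately; the lookup only narrows the choice.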



The dataset: athletic performance

  • Impact of training programs and diets on athletic performance
athletic_perf.sample(n=5)
 Athlete_ID  Training_Program     Diet_Type  Initial_Fitness  Performance_Inc 
        167         Endurance   Plant-Based             High         9.113040               
        289         Endurance          Keto              Low        11.039744               
        164         Endurance   Plant-Based           Medium        11.614835              
         30          Strength          Keto           Medium         7.384686               
        186              HIIT  High-Protein              Low         6.776078
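
Since data types drive the choice of test, it can help to separate numeric from categorical columns first. A minimal sketch, assuming only the five sampled rows shown above (the full `athletic_perf` dataset is not reproduced here):

```python
import pandas as pd

# The five sampled rows from the slide, rebuilt by hand for illustration
sample = pd.DataFrame({
    "Athlete_ID": [167, 289, 164, 30, 186],
    "Training_Program": ["Endurance", "Endurance", "Endurance", "Strength", "HIIT"],
    "Diet_Type": ["Plant-Based", "Keto", "Plant-Based", "Keto", "High-Protein"],
    "Initial_Fitness": ["High", "Low", "Medium", "Medium", "Low"],
    "Performance_Inc": [9.113040, 11.039744, 11.614835, 7.384686, 6.776078],
})

# Numeric columns are candidates for t-tests and ANOVA;
# categorical columns are candidates for grouping or chi-square tests
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("Numeric:", numeric_cols)
print("Categorical:", categorical_cols)
```

Note that `Athlete_ID` is stored as a number but is an identifier, not a measurement, so it should not be treated as an outcome.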

Independent samples t-test

  • Comparing means of two groups
  • Assumptions: normal distribution, equal variances
from scipy.stats import ttest_ind
group1 = athletic_perf[athletic_perf['Training_Program'] == 'HIIT']['Performance_Inc']
group2 = athletic_perf[athletic_perf['Training_Program'] == 'Endurance']['Performance_Inc']

t_stat, p_val = ttest_ind(group1, group2)
print(f"T-statistic: {t_stat}, P-value: {p_val}")
T-statistic: 0.20671020082911742, P-value: 0.8364563849070663

p_val > $\alpha$ → insufficient evidence of a difference in means
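
The two assumptions can be checked before running the test. The sketch below uses synthetic data as a stand-in for the two training groups (the values, group sizes, and seed are assumptions, not the course dataset) and uses `shapiro` and `levene` from `scipy.stats` as one possible pair of checks. If the variances look unequal, `ttest_ind(..., equal_var=False)` runs Welch's t-test instead.

```python
import numpy as np
from scipy.stats import levene, shapiro, ttest_ind

# Synthetic stand-ins for the two groups (assumed values, not the real data)
rng = np.random.default_rng(42)
group1 = rng.normal(loc=9.0, scale=2.0, size=100)
group2 = rng.normal(loc=9.0, scale=2.0, size=100)

alpha = 0.05

# Normality: a small p-value suggests the group is not normally distributed
_, p_norm1 = shapiro(group1)
_, p_norm2 = shapiro(group2)

# Equal variances: a small p-value suggests the variances differ
_, p_var = levene(group1, group2)

# Fall back to Welch's t-test when equal variances look doubtful
equal_var = bool(p_var > alpha)
t_stat, p_val = ttest_ind(group1, group2, equal_var=equal_var)

print(f"Shapiro p-values: {p_norm1:.3f}, {p_norm2:.3f}")
print(f"Levene p-value: {p_var:.3f} (equal_var={equal_var})")
print(f"T-statistic: {t_stat}, P-value: {p_val}")
```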


One-way ANOVA

  • Comparing means across multiple (>2) groups
  • Assumptions: normal distribution within each group, equal variances among groups
from scipy.stats import f_oneway
program_types = ['HIIT', 'Endurance', 'Strength']
groups = [athletic_perf[athletic_perf['Training_Program'] == program]['Performance_Inc']
          for program in program_types]

f_stat, p_val = f_oneway(*groups)
print(f"F-statistic: {f_stat}, P-value: {p_val}")
F-statistic: 1.5270022393256704, P-value: 0.2188859009050602

p_val > $\alpha$ → insufficient evidence of a difference in means
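
One way to see how ANOVA extends the t-test: with exactly two groups, the F-statistic equals the square of the equal-variance t-statistic, and the p-values match. A minimal sketch on synthetic data (the group values and seed are assumptions, not the course dataset):

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind

# Synthetic stand-ins for two training programs (assumed values)
rng = np.random.default_rng(0)
hiit = rng.normal(loc=9.0, scale=2.0, size=80)
endurance = rng.normal(loc=9.5, scale=2.0, size=80)

# Equal-variance t-test (the default) and one-way ANOVA on the same two groups
t_stat, t_p = ttest_ind(hiit, endurance)
f_stat, f_p = f_oneway(hiit, endurance)

print(f"t^2 = {t_stat**2}, F = {f_stat}")
print(f"t-test p = {t_p}, ANOVA p = {f_p}")
```

With three or more groups there is no single t-statistic to compare against, which is where `f_oneway` becomes necessary.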


Chi-square test of association

  • Testing relationships between categorical variables
  • No assumption that the data follow a normal distribution
from scipy.stats import chi2_contingency
import pandas as pd
contingency_table = pd.crosstab(athletic_perf['Training_Program'],
                                athletic_perf['Diet_Type'])
Diet_Type         High-Protein  Keto  Plant-Based
Training_Program                                 
Endurance                   33    28           33
HIIT                        27    32           40
Strength                    38    29           40


chi2_stat, p_val, dof, expected = chi2_contingency(contingency_table)
print(f"Chi2-statistic: {chi2_stat}, P-value: {p_val}")
Chi2-statistic: 2.154450885821988, P-value: 0.7073764021451127

p_val > $\alpha$ → insufficient evidence of an association
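
The `dof` and `expected` values returned by `chi2_contingency` are worth inspecting too. The sketch below rebuilds the contingency table from the counts printed above (typed in by hand, since the full dataset is not shown) and looks at the expected counts under independence. A commonly cited rule of thumb, not stated in the slides, is that expected counts should be at least 5.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Counts copied from the contingency table shown above
contingency_table = pd.DataFrame(
    {
        "High-Protein": [33, 27, 38],
        "Keto": [28, 32, 29],
        "Plant-Based": [33, 40, 40],
    },
    index=["Endurance", "HIIT", "Strength"],
)

chi2_stat, p_val, dof, expected = chi2_contingency(contingency_table)

# Expected counts if Training_Program and Diet_Type were independent
expected_table = pd.DataFrame(
    expected, index=contingency_table.index, columns=contingency_table.columns
)
print(expected_table.round(2))

# dof = (rows - 1) * (columns - 1)
print(f"Degrees of freedom: {dof}")
print(f"Smallest expected count: {expected.min():.2f}")
```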


Let's practice!
