Parametric tests

Foundations of Inference in Python

Paul Savala

Assistant Professor of Mathematics

ANOVA

  • ANOVA - Compares the mean response across the levels of a factor
  • Response - A numerical measured value
  • Factor - A categorical value defining groups

A table showing venture capital funding from several companies in several different markets.


ANOVA

investments_df.groupby('market')['funding_total_usd'].mean()
Market        Average funding
===========   ===============
Advertising      13806610
Analytics        14762930
Biotechnology    20838670
...              ...
  • Response: Funding
  • Factor: Market
  • ANOVA: Compare mean funding by market
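The response/factor split above can be tried on a small example. This is a minimal sketch, assuming a hypothetical `toy_df` with made-up values that stands in for `investments_df`; only the column names come from the slides.

```python
import pandas as pd

# Hypothetical toy data standing in for investments_df (values are made up)
toy_df = pd.DataFrame({
    'market': ['Advertising', 'Advertising', 'Analytics', 'Analytics'],
    'funding_total_usd': [1_000_000, 3_000_000, 2_000_000, 6_000_000],
})

# Factor: market (defines the groups); response: funding_total_usd
mean_by_market = toy_df.groupby('market')['funding_total_usd'].mean()
print(mean_by_market)
```

ANOVA then asks whether differences between these group means are larger than chance alone would explain.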

Assumptions of ANOVA

  • Responses for each factor level are normally distributed
    • Funding amounts within each market are normally distributed
  • Responses for each factor level have equal population variance
    • Funding amounts in each market have the same variance

Normally distributed response

health_df = investments_df[investments_df['market'] == 'Health and Wellness']
health_df['funding_total_usd'].plot(kind='hist')

A histogram with total funding by company on the x-axis, frequency on the y-axis, one very tall bar near zero, and several much smaller bars beyond that.


Log-transformations and normality

import numpy as np

health_log = np.log(health_df['funding_total_usd'])

health_log.plot(kind='hist')

A histogram with the log of total funding by company on the x-axis and frequency on the y-axis.
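To see why a log transformation can help with the normality assumption, here is a minimal sketch on synthetic data. The `funding` values are simulated from a log-normal distribution (an assumption for illustration, not the course dataset), and sample skewness is used as a rough numeric stand-in for eyeballing the histogram.

```python
import numpy as np
from scipy import stats

# Synthetic, heavily right-skewed "funding" values (assumption: log-normal)
rng = np.random.default_rng(0)
funding = rng.lognormal(mean=14, sigma=2, size=1000)

# Skewness near 0 suggests a roughly symmetric distribution
skew_raw = stats.skew(funding)
skew_log = stats.skew(np.log(funding))
print(skew_raw, skew_log)
```

The raw values are strongly right-skewed, while their logs are close to symmetric. Real funding data may be less well behaved, so the histogram is still worth checking.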


Equal variance

investments_df['log_funding'] = np.log(investments_df['funding_total_usd'])

investments_df.groupby('market')['log_funding'].std()
Advertising            2.254390
Analytics              2.152852
Biotechnology          1.946059
...                    ...

Levene's test of equal variance

$H_0:$ Populations have equal variance

$H_a:$ Populations have different variances


Equal variance

from scipy import stats

health_df = investments_df[investments_df['market'] == 'Health and Wellness']
analytics_df = investments_df[investments_df['market'] == 'Analytics']

s, p_value = stats.levene(health_df['log_funding'], analytics_df['log_funding'])
print(p_value < 0.05)
False

Conclusion: Fail to reject the null hypothesis. There is no evidence that the two markets differ in the variance of log funding.
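A minimal sketch of how `stats.levene` behaves, using synthetic samples rather than the course data. The group names, sizes, and spreads are assumptions chosen for illustration. Note that SciPy's `levene` centers on the median by default.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two synthetic groups with the same spread, and one much wider group
same_a = rng.normal(loc=14, scale=2, size=200)
same_b = rng.normal(loc=15, scale=2, size=200)
wide = rng.normal(loc=14, scale=6, size=200)

# Levene's test compares spread, so the differing means do not matter
_, p_same = stats.levene(same_a, same_b)
_, p_diff = stats.levene(same_a, wide)
print(p_same < 0.05, p_diff < 0.05)
```

A small p-value for the second comparison is evidence against equal variance. A large p-value only means the test found no evidence of a difference.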


ANOVA in SciPy

s, p_value = stats.f_oneway(health_df['log_funding'], 
                            analytics_df['log_funding'])

print(p_value < 0.05)
True

Conclusion: Reject the null hypothesis. Mean log funding differs significantly between the two markets.
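`stats.f_oneway` accepts any number of groups, not just two. The sketch below, on synthetic data, shows one way to pass every level of a factor at once. The DataFrame `df` and its values are hypothetical; only the column names mirror the slides.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(7)

# Synthetic log-funding for three markets (values are made up)
df = pd.DataFrame({
    'market': np.repeat(['Advertising', 'Analytics', 'Biotechnology'], 100),
    'log_funding': np.concatenate([
        rng.normal(14, 1, 100),
        rng.normal(14, 1, 100),
        rng.normal(16, 1, 100),  # this market has a higher mean
    ]),
})

# One array of responses per factor level
groups = [g['log_funding'].to_numpy() for _, g in df.groupby('market')]

# Unpack the list so each group is passed as a separate argument
f_stat, p_value = stats.f_oneway(*groups)
print(p_value < 0.05)
```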


Inference based on ANOVA

  • $H_0:$ All means are the same
  • $H_a:$ At least one mean is different
  • Can't conclude which mean is different without further analysis.
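The slides do not say what the further analysis is. One common option, sketched here as an assumption rather than the course's method, is to run pairwise t-tests and apply a Bonferroni correction. The group labels and data are synthetic.

```python
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic groups: C has a higher mean than A and B (values are made up)
samples = {
    'A': rng.normal(0.0, 1.0, 100),
    'B': rng.normal(0.0, 1.0, 100),
    'C': rng.normal(1.5, 1.0, 100),
}

pairs = list(combinations(samples, 2))
adjusted = {}
for first, second in pairs:
    _, p = stats.ttest_ind(samples[first], samples[second])
    # Bonferroni: multiply by the number of comparisons, cap at 1
    adjusted[(first, second)] = min(p * len(pairs), 1.0)

for pair, p_adj in adjusted.items():
    print(pair, p_adj < 0.05)
```

Pairs with an adjusted p-value below 0.05 point to which means differ. The correction keeps the chance of a false positive across all comparisons near the chosen level.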

Let's practice!
