Foundations of Inference in Python
Paul Savala
Assistant Professor of Mathematics
investments_df.groupby('market')['funding_total_usd'].mean()
Market         Average funding (USD)
=============  =====================
Advertising    13806610
Analytics      14762930
Biotechnology  20838670
...            ...
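import numpy as np
# Histogram of raw funding for the Health and Wellness market
health_df = investments_df[investments_df['market'] == 'Health and Wellness']
health_df['funding_total_usd'].plot(kind='hist')
# Histogram of the same funding values after a natural-log transform
health_log = np.log(health_df['funding_total_usd'])
health_log.plot(kind='hist')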
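One way to check what the log transform does, without looking at a plot, is to compare skewness before and after. The sketch below is only an illustration: it uses synthetic lognormal draws (the name `fake_funding` and its parameters are assumptions, not the course data) and `scipy.stats.skew`.

```python
import numpy as np
from scipy import stats

# Synthetic, made-up funding amounts (NOT the course data): lognormal draws
rng = np.random.default_rng(0)
fake_funding = rng.lognormal(mean=14, sigma=2, size=1000)

# Sample skewness of the raw values and of their natural logs
raw_skew = stats.skew(fake_funding)
log_skew = stats.skew(np.log(fake_funding))

print(f"skewness before log: {raw_skew:.2f}")
print(f"skewness after log:  {log_skew:.2f}")
```

For data like this, the raw skewness is large and positive while the log-transformed skewness is close to zero, which is why the second histogram looks far more symmetric.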
investments_df['log_funding'] = np.log(investments_df['funding_total_usd'])
investments_df.groupby('market')['log_funding'].std()
Market         Std. dev. of log funding
=============  ========================
Advertising    2.254390
Analytics      2.152852
Biotechnology  1.946059
...            ...
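The same group-wise computation can be tried on a tiny table where the answer is easy to verify by hand. This is a minimal sketch: `toy_df` and its values are hypothetical stand-ins for `investments_df`, chosen so that the log values in each market are evenly spaced.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for investments_df (values are made up)
toy_df = pd.DataFrame({
    "market": ["Analytics"] * 3 + ["Biotechnology"] * 3,
    "funding_total_usd": [1e5, 1e6, 1e7, 2e6, 4e6, 8e6],
})

# Add the log-transformed column, then take the sample standard deviation per market
toy_df["log_funding"] = np.log(toy_df["funding_total_usd"])
log_std = toy_df.groupby("market")["log_funding"].std()
print(log_std)
```

Because each market has three log values spaced evenly, the sample standard deviation equals the spacing: ln(10) for Analytics (funding grows tenfold each step) and ln(2) for Biotechnology (funding doubles each step). On the log scale, spread measures ratios rather than dollar differences.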
Levene test of equal variance
$H_0:$ Populations have equal variance
$H_a:$ Populations have different variances
from scipy import stats
health_df = investments_df[investments_df['market'] == 'Health and Wellness']
analytics_df = investments_df[investments_df['market'] == 'Analytics']
s, p_value = stats.levene(health_df['log_funding'], analytics_df['log_funding'])
print(p_value < 0.05)
False
Conclusion: Fail to reject the null hypothesis. At the 0.05 level there is no evidence that the two markets differ in the variance of log funding.
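To see how `stats.levene` behaves in both directions, it can help to run it on samples whose spread is known in advance. The sketch below uses synthetic normal draws (the names `base`, `shifted`, and `wide` are assumptions for illustration, not the investment data).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic samples (NOT the course data)
base = rng.normal(loc=13, scale=2, size=500)
shifted = base + 3.0                            # same spread, different center
wide = rng.normal(loc=13, scale=10, size=500)   # much larger spread

# Levene's test compares spread, so a pure shift in location should not matter
_, p_same = stats.levene(base, shifted)
_, p_diff = stats.levene(base, wide)

print(p_same < 0.05)  # shifted copy: no evidence of unequal variance
print(p_diff < 0.05)  # five-fold larger standard deviation: clear evidence
```

The first comparison should print False and the second True. The shifted copy has exactly the same deviations from its own center as the original, so the test has nothing to detect even though the means differ.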
s, p_value = stats.f_oneway(health_df['log_funding'], analytics_df['log_funding'])
print(p_value < 0.05)
True
Conclusion: Reject the null hypothesis of equal means. The two markets differ significantly in mean log funding at the 0.05 level.
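`stats.f_oneway` runs a one-way ANOVA, whose null hypothesis is that all groups share the same mean. With exactly two groups it should agree with the equal-variance two-sample t-test, which is a useful sanity check. The sketch below illustrates this on synthetic normal samples (`group_a` and `group_b` are hypothetical names, not the course data).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic samples (NOT the course data): equal spread, different means
group_a = rng.normal(loc=10, scale=1, size=200)
group_b = rng.normal(loc=12, scale=1, size=200)

# One-way ANOVA on two groups
f_stat, p_anova = stats.f_oneway(group_a, group_b)

# Equal-variance two-sample t-test on the same groups
t_stat, p_t = stats.ttest_ind(group_a, group_b)

print(p_anova < 0.05)
print(np.isclose(f_stat, t_stat ** 2))  # F statistic equals t statistic squared
print(np.isclose(p_anova, p_t))         # and the p-values coincide
```

ANOVA assumes the groups have similar variances, which is why the Levene check comes first in this workflow.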