Statistical inference and random sampling

Foundations of Inference in Python

Paul Savala

Assistant Professor of Mathematics

Descriptive statistics

  • Sample statistics meant to summarize the data
  • Descriptive statistics summarize our sample

Date        SP500 Close  Daily Change
2017-08-07  2480.91        6.14
2017-08-08  2474.92       -5.99
2017-08-09  2474.02       -0.90
2017-08-10  2438.21      -35.81

Average daily change: -$9.14
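
As a minimal sketch, the average above can be reproduced from the four daily changes listed in the table. The array name `daily_change` is only an illustrative choice, and NumPy is assumed to be available.

```python
import numpy as np

# Daily changes copied from the four rows in the table above.
daily_change = np.array([6.14, -5.99, -0.90, -35.81])

# Descriptive statistic: the mean of this sample.
average_daily_change = np.mean(daily_change)
print(round(average_daily_change, 2))
```

The result describes only these four days; it says nothing yet about other days.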

Inference

  • Infer something about our population
  • Descriptive statistics: Describe data
  • Inference: Make conclusions and decisions

Date        SP500 Close  Daily Change
2017-08-07  2480.91        6.14
2017-08-08  2474.92       -5.99
2017-08-09  2474.02       -0.90
2017-08-10  2438.21      -35.81

Average daily swing for any day: ~$9.14

Statistical inference process

A diagram showing a population leading to a sample, leading to inference, leading back to the population.
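
The population, sample, inference loop in the diagram can be sketched with simulated data. Everything below is an assumption made for illustration: the population is synthetic (normally distributed values, not market data), and the sizes, seed, and variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed fixed only for reproducibility

# Hypothetical population: 10,000 synthetic daily changes (not real market data).
population = rng.normal(loc=0.0, scale=20.0, size=10_000)

# Sample: 100 values drawn at random, without replacement.
sample = rng.choice(population, size=100, replace=False)

# Inference: use the sample mean as an estimate of the population mean.
sample_mean = sample.mean()
population_mean = population.mean()  # known here only because the data are simulated

print(f"sample mean: {sample_mean:.2f}, population mean: {population_mean:.2f}")
```

In practice the population mean is unknown, which is why the estimate from the sample is needed at all.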

Point estimates

  • Given by a single value
  • "Best guess" at an unknown population parameter

Point estimate of the BTC daily swing: 1158.95

The first five rows of the Bitcoin dataset

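import numpy as np

# btc_sp_df is assumed to be loaded already
btc_high = btc_sp_df['High_BTC']
btc_low = btc_sp_df['Low_BTC']

# Point estimate: mean daily swing (high minus low)
np.mean(btc_high - btc_low)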
1158.95
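
For readers without the dataset, the same calculation can be tried on a small stand-in table. This is a sketch only: `toy_df` and its prices are invented for illustration and do not come from the Bitcoin data; only the column names match the snippet above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for btc_sp_df; these prices are invented.
toy_df = pd.DataFrame({
    "High_BTC": [105.0, 112.0, 108.0, 120.0],
    "Low_BTC": [95.0, 100.0, 101.0, 104.0],
})

# Daily swing per row, then one number summarizing it: the point estimate.
daily_swing = toy_df["High_BTC"] - toy_df["Low_BTC"]
point_estimate = np.mean(daily_swing)
print(point_estimate)
```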

Sampling

Point estimates depend on the sample

# Point estimate from the first 100 rows
btc_sp_first100 = btc_sp_df.iloc[:100]

np.mean(btc_sp_first100['High_BTC'] - btc_sp_first100['Low_BTC'])
659.60

# Point estimate from 100 consecutive rows starting at a random row
initial_row = np.random.choice(btc_sp_df.shape[0]-100)

btc_sp_random_100 = btc_sp_df.iloc[initial_row:initial_row+100]
np.mean(btc_sp_random_100['High_BTC'] - btc_sp_random_100['Low_BTC'])
943.83
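
To see how much a point estimate can move from sample to sample, the random-window idea can be repeated many times. The sketch below uses synthetic swings rather than the Bitcoin data; the gamma distribution, the seed, the 200 repetitions, and all variable names are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # seed fixed only for reproducibility

# Hypothetical daily swings (synthetic, not the Bitcoin data used above).
swings = rng.gamma(shape=2.0, scale=500.0, size=1_000)

window = 100
estimates = []
for _ in range(200):
    # Pick a random starting row so that a full window still fits.
    start = rng.integers(0, len(swings) - window + 1)
    estimates.append(swings[start:start + window].mean())

estimates = np.array(estimates)
print(f"min: {estimates.min():.2f}, max: {estimates.max():.2f}, std: {estimates.std():.2f}")
```

Each entry in `estimates` is a point estimate from a different 100-row window, so the spread between the minimum and maximum gives a rough sense of how strongly the estimate depends on the sample.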

Let's practice!
