Sampling in Python
James Chapman
Curriculum Manager, DataCamp
[Figure: sampling distributions of the sample mean, one panel each for sample sizes 5, 20, 80, and 320]
Averages of independent samples have approximately normal distributions.
As the sample size increases:
The distribution of the averages gets closer to being normally distributed.
The width of the sampling distribution gets narrower.
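The two effects above can be checked with a small simulation. This is only a sketch under stated assumptions: the population is synthetic and deliberately skewed (it is not the coffee data), the 2000 replicates are an arbitrary choice, and `sampling_distribution` and `skewness` are hypothetical helper names. Values are drawn with replacement so that each draw is independent.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical, deliberately skewed population (NOT the coffee ratings data)
population = rng.exponential(scale=2.0, size=100_000)

def sampling_distribution(pop, sample_size, n_replicates=2000):
    """Means of n_replicates samples, each of sample_size independent draws."""
    draws = rng.choice(pop, size=(n_replicates, sample_size), replace=True)
    return draws.mean(axis=1)

def skewness(x):
    """Moment-based skewness; 0 for a perfectly symmetric distribution."""
    centered = x - x.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5

results = {}
for n in [5, 20, 80, 320]:
    means = sampling_distribution(population, n)
    # Width (std dev) and asymmetry (skewness) of the sampling distribution
    results[n] = {"std": means.std(ddof=1), "skew": skewness(means)}
    print(n, results[n])
```

With this setup the standard deviation and the skewness of the sample means should both shrink as the sample size grows, which is what "narrower" and "closer to normal" mean here.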
coffee_ratings['total_cup_points'].mean()
82.15120328849028
Use np.mean() on each approximate sampling distribution:
Sample size | Mean of sample means |
---|---|
5 | 82.18420719999999 |
20 | 82.1558634 |
80 | 82.14510154999999 |
320 | 82.154017925 |
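A table like the one above could be produced roughly as follows. This is a minimal sketch, not the course's actual code: `total_cup_points` here is a synthetic stand-in for `coffee_ratings['total_cup_points']`, and the population size (1300) and the number of replicates (1000) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for coffee_ratings['total_cup_points'] (assumed values)
total_cup_points = rng.normal(loc=82.0, scale=2.7, size=1300)
pop_mean = total_cup_points.mean()

mean_of_sample_means = {}
for sample_size in [5, 20, 80, 320]:
    # Approximate sampling distribution: 1000 simple random samples
    # (without replacement), keeping the mean of each one
    sample_means = [
        rng.choice(total_cup_points, size=sample_size, replace=False).mean()
        for _ in range(1000)
    ]
    mean_of_sample_means[sample_size] = np.mean(sample_means)

print("population mean:", pop_mean)
for size, value in mean_of_sample_means.items():
    print(size, value)
```

Whatever the sample size, the mean of the sample means should land close to the population mean.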
coffee_ratings['total_cup_points'].std(ddof=0)
2.685858187306438
Use ddof=0 when calling .std() on populations
Use ddof=1 when calling np.std() on samples or sampling distributions
Sample size | Std dev of sample means |
---|---|
5 | 1.1886358227738543 |
20 | 0.5940321141669805 |
80 | 0.2934024263916487 |
320 | 0.13095083089190876 |
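The ddof guidance used for the standard deviations above can be illustrated on a tiny array. The eight values are made up for the example. Note that np.std() defaults to ddof=0, whereas the pandas .std() method defaults to ddof=1, which is why the argument is passed explicitly in both cases.

```python
import numpy as np

# Made-up values: mean is 5, squared deviations sum to 32
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Population standard deviation: divide by N
pop_std = np.std(values, ddof=0)    # sqrt(32 / 8) = 2.0

# Sample standard deviation: divide by N - 1
samp_std = np.std(values, ddof=1)   # sqrt(32 / 7), about 2.138

print(pop_std, samp_std)
```

The sample version is always slightly larger, and the gap shrinks as the number of values grows.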
Sample size | Std dev of sample means | Calculation | Result |
---|---|---|---|
5 | 1.1886358227738543 | 2.685858187306438 / sqrt(5) | 1.201 |
20 | 0.5940321141669805 | 2.685858187306438 / sqrt(20) | 0.601 |
80 | 0.2934024263916487 | 2.685858187306438 / sqrt(80) | 0.300 |
320 | 0.13095083089190876 | 2.685858187306438 / sqrt(320) | 0.150 |
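The calculation column divides the population standard deviation by the square root of the sample size. The sketch below reruns that arithmetic on the figures quoted in the table; the variable names are illustrative only.

```python
import numpy as np

# Population standard deviation quoted earlier (computed with ddof=0)
pop_std = 2.685858187306438

# Observed standard deviations of the sample means, copied from the table
observed = {
    5: 1.1886358227738543,
    20: 0.5940321141669805,
    80: 0.2934024263916487,
    320: 0.13095083089190876,
}

estimates = {}
for sample_size, obs in observed.items():
    # Estimate: population std dev divided by sqrt(sample size)
    estimates[sample_size] = pop_std / np.sqrt(sample_size)
    print(f"{sample_size:>3} | observed {obs:.3f} | estimate {estimates[sample_size]:.3f}")
```

Quadrupling the sample size halves the estimate, since the square root of 4 is 2.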