Introduction to Statistics in Python
Maggie Matsui
Content Developer, DataCamp
import numpy as np
import pandas as pd

die = pd.Series([1, 2, 3, 4, 5, 6])
# Roll 5 times
samp_5 = die.sample(5, replace=True)
print(samp_5)
array([3, 1, 4, 1, 1])
np.mean(samp_5)
2.0

# Roll 5 times and take mean
samp_5 = die.sample(5, replace=True)
np.mean(samp_5)
4.4
samp_5 = die.sample(5, replace=True)
np.mean(samp_5)
3.8
Repeat 10 times:
sample_means = []
for i in range(10):
    samp_5 = die.sample(5, replace=True)
    sample_means.append(np.mean(samp_5))
print(sample_means)
[3.8, 4.0, 3.8, 3.6, 3.2, 4.8, 2.6,
3.0, 2.6, 2.0]
Sampling distribution of the sample mean

sample_means = []
for i in range(100):
    sample_means.append(np.mean(die.sample(5, replace=True)))

sample_means = []
for i in range(1000):
    sample_means.append(np.mean(die.sample(5, replace=True)))

The sampling distribution of a statistic becomes closer to the normal distribution as the number of trials increases.
![Histograms of 10, 100, and 1000 sample means, where a larger number of sample means gives a more bell-shaped distribution](https://assets.datacamp.com/production/repositories/5786/datasets/68c668ba8e7538984edc15be7f82f1855ad2dc41/Screen%20Shot%202020-07-16%20at%204.48.14%20PM.png)
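The effect described above can be checked without a plot. The following is a minimal sketch, not the course's own code: the helper name `simulate_sample_means`, the seed, and the choice of 10 bins are assumptions made for illustration. It repeats the 5-roll experiment 10, 100, and 1000 times and prints how many sample means fall into each bin.

```python
import numpy as np
import pandas as pd

die = pd.Series([1, 2, 3, 4, 5, 6])

def simulate_sample_means(n_reps, sample_size=5, seed=0):
    """Roll the die `sample_size` times, repeat `n_reps` times, return the means."""
    # Seeding NumPy's global generator makes die.sample() reproducible;
    # the seed value itself is arbitrary.
    np.random.seed(seed)
    means = []
    for i in range(n_reps):
        samp = die.sample(sample_size, replace=True)
        means.append(np.mean(samp))
    return np.array(means)

for n_reps in (10, 100, 1000):
    means = simulate_sample_means(n_reps)
    # Count the sample means in 10 equal-width bins covering [1, 6]
    counts, edges = np.histogram(means, bins=10, range=(1, 6))
    print(n_reps, counts)
```

With more repetitions, the bin counts should pile up around the middle bins and thin out toward both ends, which is the bell shape the histograms show.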
sample_sds = []
for i in range(1000):
    sample_sds.append(np.std(die.sample(5, replace=True)))

sales_team = pd.Series(["Amir", "Brian", "Claire", "Damian"])
sales_team.sample(10, replace=True)
array(['Claire', 'Damian', 'Brian', 'Damian', 'Damian', 'Amir', 'Amir', 'Amir',
'Amir', 'Damian'], dtype=object)
sales_team.sample(10, replace=True)
array(['Brian', 'Amir', 'Brian', 'Claire', 'Brian', 'Damian', 'Claire', 'Brian',
'Claire', 'Claire'], dtype=object)
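The slides do not show how the sample proportions are collected. One plausible way, sketched here under the assumption that each trial draws 10 names and records the share equal to "Claire", follows the same loop pattern as the sample means. The list name `sample_props`, the 1000 repetitions, and the seed are illustrative choices.

```python
import numpy as np
import pandas as pd

sales_team = pd.Series(["Amir", "Brian", "Claire", "Damian"])

np.random.seed(0)  # arbitrary seed, only for reproducibility

sample_props = []
for i in range(1000):
    samp = sales_team.sample(10, replace=True)
    # Comparing to "Claire" gives booleans; their mean is the proportion of matches
    sample_props.append(np.mean(samp == "Claire"))

print(np.mean(sample_props))
```

Because each of the four names is equally likely, the average of these proportions should land close to one quarter.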

# Estimate expected value of die
np.mean(sample_means)
3.48
# Estimate proportion of "Claire"s
np.mean(sample_props)
0.26
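For comparison, the exact values that these estimates approximate can be computed directly. This is a small sketch assuming a fair die and four equally likely names; the variable names `expected_roll` and `expected_prop` are introduced here for illustration.

```python
import pandas as pd

die = pd.Series([1, 2, 3, 4, 5, 6])
sales_team = pd.Series(["Amir", "Brian", "Claire", "Damian"])

# Expected value of a fair die: the mean of its six equally likely faces
expected_roll = die.mean()

# Probability of drawing "Claire": one name out of four equally likely names
expected_prop = (sales_team == "Claire").mean()

print(expected_roll)  # 3.5
print(expected_prop)  # 0.25
```

The simulated estimates of 3.48 and 0.26 are both close to these exact values.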