Unsupervised Learning in Python
Benjamin Wilson
Director of Research at lateral.io
species  setosa  versicolor  virginica
labels
0             0           2         36
1            50           0          0
2             0          48         14
Use the pandas library
Given the species of each sample as a list species
print(species)
['setosa', 'setosa', 'versicolor', 'virginica', ... ]
import pandas as pd
df = pd.DataFrame({'labels': labels, 'species': species})
print(df)
   labels     species
0       1      setosa
1       1      setosa
2       2  versicolor
3       2   virginica
4       1      setosa
...
ct = pd.crosstab(df['labels'], df['species'])
print(ct)
species  setosa  versicolor  virginica
labels
0             0           2         36
1            50           0          0
2             0          48         14
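The cross-tabulation steps above can be tried end to end on a tiny data set. This is only a sketch: the seven labels and species values below are made-up toy data, not the iris results shown in the slides, and the dominant-species lookup at the end is an optional extra step added for illustration.

```python
import pandas as pd

# Hypothetical toy data (not the iris results from the slides):
# cluster labels from some clustering run, and the known species
# of the same seven samples.
labels = [1, 1, 2, 2, 2, 0, 0]
species = ['setosa', 'setosa', 'versicolor', 'versicolor',
           'virginica', 'virginica', 'virginica']

df = pd.DataFrame({'labels': labels, 'species': species})

# Rows are cluster labels, columns are species, cells are sample counts.
ct = pd.crosstab(df['labels'], df['species'])
print(ct)

# Optional extra step: the most common species within each cluster.
dominant = ct.idxmax(axis=1)
print(dominant)
```

Each row of ct describes one cluster. A row with a single non-zero cell means that cluster contains only one species, while a row with several non-zero cells means the cluster mixes species.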
How can a clustering be evaluated when there is no species information?
Using only samples and their cluster labels
A good clustering has tight clusters
Samples in each cluster bunched together
After fit(), inertia is available as the attribute inertia_
from sklearn.cluster import KMeans
model = KMeans(n_clusters=3)
model.fit(samples)
print(model.inertia_)
78.9408414261
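To make the inertia value less of a black box, the sketch below recomputes it by hand as the sum of squared distances from each sample to the centroid of its assigned cluster, which is how scikit-learn documents inertia_. The data here is an assumption: three synthetic 2-D blobs stand in for samples, and the n_init and random_state arguments are added only to make the run repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for `samples`: three 2-D blobs of 50 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
samples = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# n_init and random_state are set explicitly for repeatability (assumption).
model = KMeans(n_clusters=3, n_init=10, random_state=0)
model.fit(samples)

# Look up the centroid assigned to each sample, then sum the squared
# distances between every sample and its centroid.
assigned = model.cluster_centers_[model.labels_]
manual_inertia = ((samples - assigned) ** 2).sum()

print(model.inertia_)
print(manual_inertia)
```

The two printed numbers should agree up to floating-point rounding. Tighter clusters give smaller squared distances, so a lower inertia indicates samples bunched more closely around their centroids.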