Clustering with multiple features

Cluster Analysis in Python

Shaumik Daityari

Business Analyst

Basic checks

# Cluster centers: mean of each scaled feature per cluster
print(fifa.groupby('cluster_labels')[['scaled_heading_accuracy', 
    'scaled_volleys', 'scaled_finishing']].mean())

cluster_labels	scaled_heading_accuracy	scaled_volleys	scaled_finishing
0	3.21	2.83	2.76
1	0.71	0.64	0.58

# Cluster sizes: number of players per cluster
print(fifa.groupby('cluster_labels')['ID'].count())

cluster_labels	count
0	886
1	114
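Both checks can be reproduced end to end on a small synthetic table. This is a minimal sketch, not the course dataset: the column names mirror the slide, while the `ID` values and attribute scores are randomly generated. Using SciPy's `whiten`, `kmeans`, and `vq` to produce `cluster_labels` is an assumption, since the slide does not show how the labels were created.

```python
import numpy as np
import pandas as pd
from scipy.cluster.vq import kmeans, vq, whiten

rng = np.random.default_rng(0)

# Hypothetical stand-in for the fifa table: 80 low-scoring and 20 high-scoring players
raw_cols = ['heading_accuracy', 'volleys', 'finishing']
low = rng.normal(loc=15, scale=3, size=(80, 3))
high = rng.normal(loc=70, scale=5, size=(20, 3))
fifa = pd.DataFrame(np.vstack([low, high]), columns=raw_cols)
fifa['ID'] = range(len(fifa))

# Scale each feature to unit standard deviation
scaled_features = ['scaled_' + c for c in raw_cols]
scaled = pd.DataFrame(whiten(fifa[raw_cols].to_numpy()), columns=scaled_features)
fifa = fifa.join(scaled)

# Fit two centroids, then assign every player to the nearest centroid
centroids, _ = kmeans(fifa[scaled_features].to_numpy(), 2)
labels, _ = vq(fifa[scaled_features].to_numpy(), centroids)
fifa['cluster_labels'] = labels

# Basic checks: cluster centers and cluster sizes
centers = fifa.groupby('cluster_labels')[scaled_features].mean()
sizes = fifa.groupby('cluster_labels')['ID'].count()
print(centers)
print(sizes)
```

Which cluster receives label 0 depends on the random initialization, so read the centers and the sizes together instead of relying on the label number.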

Visualizations

Visualize cluster centers
Visualize other variables for each cluster

# Plot cluster centers
import matplotlib.pyplot as plt

fifa.groupby('cluster_labels')[scaled_features] \
    .mean() \
    .plot(kind='bar')
plt.show()
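The second bullet, visualizing other variables for each cluster, has no code on the slide. The sketch below shows one possible way to do it next to the cluster-center plot. The miniature table and the `age` column are hypothetical and only serve to illustrate a variable that was not used for clustering.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical miniature of the clustered table; all values are made up
fifa = pd.DataFrame({
    'cluster_labels': [0, 0, 0, 1, 1, 1],
    'scaled_volleys': [2.9, 3.1, 2.6, 0.5, 0.7, 0.6],
    'scaled_finishing': [2.8, 3.0, 2.5, 0.4, 0.6, 0.7],
    'age': [31, 28, 25, 33, 27, 30],  # assumed column, not used for clustering
})
scaled_features = ['scaled_volleys', 'scaled_finishing']

fig, (ax_centers, ax_age) = plt.subplots(1, 2, figsize=(10, 4))

# Left: one group of bars per cluster, one bar per scaled feature
fifa.groupby('cluster_labels')[scaled_features].mean().plot(kind='bar', ax=ax_centers)
ax_centers.set_title('Cluster centers')

# Right: mean of a variable that did not take part in the clustering
fifa.groupby('cluster_labels')['age'].mean().plot(kind='bar', ax=ax_age)
ax_age.set_title('Mean age per cluster')

plt.show()
```

Passing explicit axes keeps the two plots apart; without `ax=`, a Series plot draws onto whatever axes are currently active.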

Top items in clusters

# Names of the first 5 players in each cluster (top 5 if fifa is sorted by rating)
for cluster in fifa['cluster_labels'].unique():
    print(cluster, fifa[fifa['cluster_labels'] == cluster]['name'].values[:5])

Cluster label	Top players
0	['Cristiano Ronaldo' 'L. Messi' 'Neymar' 'L. Suárez' 'R. Lewandowski']
1	['M. Neuer' 'De Gea' 'G. Buffon' 'T. Courtois' 'H. Lloris']
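The loop on the slide takes the first five rows of each cluster, so it only returns the top players if the table is already ordered by rating. When that order is not guaranteed, sorting first makes the intent explicit. The sketch below assumes a rating column named `overall`, which is hypothetical, and uses placeholder player names.

```python
import pandas as pd

# Hypothetical rows in arbitrary order; 'overall' is an assumed rating column
fifa = pd.DataFrame({
    'name': ['Player A', 'Player B', 'Player C', 'Player D', 'Player E', 'Player F'],
    'overall': [70, 94, 88, 91, 65, 90],
    'cluster_labels': [0, 0, 0, 1, 1, 1],
})

# Sort by rating so that the first rows of each cluster are its best players
ranked = fifa.sort_values('overall', ascending=False)

# Keep the two highest-rated players per cluster (the slide keeps five)
top = ranked.groupby('cluster_labels').head(2)

# Collect the names per cluster label
top_names = top.groupby('cluster_labels')['name'].apply(list).to_dict()
print(top_names)
```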

Feature reduction

Factor analysis
Multidimensional scaling
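The slide only names the two techniques. As a rough illustration of what reducing features before clustering can look like, the sketch below applies both to synthetic data. scikit-learn's `FactorAnalysis` and `MDS` are an assumption here, not something the slide prescribes, and the data is randomly generated.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.manifold import MDS

rng = np.random.default_rng(0)

# Hypothetical data: 60 players described by 6 correlated features
latent = rng.normal(size=(60, 2))
loadings = rng.normal(size=(2, 6))
X = latent @ loadings + 0.1 * rng.normal(size=(60, 6))

# Factor analysis: describe the 6 features with 2 underlying factors
X_fa = FactorAnalysis(n_components=2, random_state=0).fit_transform(X)

# Multidimensional scaling: 2-D layout that tries to preserve pairwise distances
X_mds = MDS(n_components=2, random_state=0).fit_transform(X)

print(X_fa.shape, X_mds.shape)
```

Either reduced array could then be scaled and clustered in place of the original features.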

Final exercises!
