Cluster Analysis in Python
Shaumik Daityari
Business Analyst
# Cluster centers
print(fifa.groupby('cluster_labels')[['scaled_heading_accuracy',
'scaled_volleys', 'scaled_finishing']].mean())
| cluster_labels | scaled_heading_accuracy | scaled_volleys | scaled_finishing |
|---|---|---|---|
| 0 | 3.21 | 2.83 | 2.76 |
| 1 | 0.71 | 0.64 | 0.58 |
# Cluster sizes
print(fifa.groupby('cluster_labels')['ID'].count())
| cluster_labels | count |
|---|---|
| 0 | 886 |
| 1 | 114 |
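The snippets above assume a `fifa` DataFrame that already has scaled columns and a `cluster_labels` column. As a minimal sketch of how such columns might be produced, the example below uses SciPy's `whiten`, `kmeans`, and `vq` on synthetic data. The `demo` DataFrame, its values, and the choice of two clusters are assumptions for illustration, not the actual FIFA dataset or necessarily the method used to build it.

```python
import numpy as np
import pandas as pd
from scipy.cluster.vq import kmeans, vq, whiten

rng = np.random.default_rng(0)

# Synthetic stand-in for the player data (hypothetical values):
# 80 rows with high attacking attributes, 20 rows with low ones.
high = rng.normal(loc=70, scale=5, size=(80, 3))
low = rng.normal(loc=15, scale=3, size=(20, 3))
raw_cols = ['heading_accuracy', 'volleys', 'finishing']
demo = pd.DataFrame(np.vstack([high, low]), columns=raw_cols)
demo['ID'] = range(len(demo))

# Scale each feature to unit standard deviation before clustering.
scaled_features = ['scaled_' + col for col in raw_cols]
scaled = pd.DataFrame(whiten(demo[raw_cols].to_numpy()),
                      columns=scaled_features)
demo = demo.join(scaled)

# Fit two cluster centers, then assign each row to its nearest center.
centers, _ = kmeans(demo[scaled_features].to_numpy(), 2)
labels, _ = vq(demo[scaled_features].to_numpy(), centers)
demo['cluster_labels'] = labels

# Same summaries as above: per-cluster means and per-cluster sizes.
cluster_means = demo.groupby('cluster_labels')[scaled_features].mean()
sizes = demo.groupby('cluster_labels')['ID'].count()
print(cluster_means)
print(sizes)
```

Which group receives label 0 or 1 depends on the random initialization, so interpret clusters by their centers rather than by the label number.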
# Plot cluster centers
import matplotlib.pyplot as plt

(fifa.groupby('cluster_labels')[scaled_features]
     .mean()
     .plot(kind='bar'))
plt.show()

# Names of top 5 players per cluster
for cluster in fifa['cluster_labels'].unique():
    print(cluster, fifa[fifa['cluster_labels'] == cluster]['name'].values[:5])
| Cluster label | Top players |
|---|---|
| 0 | ['Cristiano Ronaldo' 'L. Messi' 'Neymar' 'L. Suárez' 'R. Lewandowski'] |
| 1 | ['M. Neuer' 'De Gea' 'G. Buffon' 'T. Courtois' 'H. Lloris'] |
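Taking the first five rows per cluster yields the "top" players only if the DataFrame is already ordered by rating. As a hedged sketch of making that ordering explicit, the example below sorts before selecting. The `players` DataFrame and the `overall` rating column are hypothetical names chosen for illustration.

```python
import pandas as pd

# Toy data; 'overall' is an assumed rating column, not confirmed by the source.
players = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E', 'F'],
    'overall': [70, 90, 80, 60, 85, 75],
    'cluster_labels': [0, 0, 0, 1, 1, 1],
})

top_n = 2  # use 5 to mirror the slide

# Sort by rating first, then keep the first top_n rows of each cluster.
top = (players.sort_values('overall', ascending=False)
              .groupby('cluster_labels')
              .head(top_n))

# Collect the selected names per cluster.
top_names = top.groupby('cluster_labels')['name'].apply(list).to_dict()
print(top_names)
```

Sorting explicitly keeps the result independent of the row order in the source file.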