Cluster Analysis in R
Dmitriy Gorenshteyn
Lead Data Scientist, Memorial Sloan Kettering Cancer Center
print(players)
      x     y
  <dbl> <dbl>
1    -1     1
2    -2    -3
3     8     6
4     7    -8
5   -12     8
6   -15     0
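The slide prints `players` without showing how it was built. A minimal sketch that recreates it from the six printed rows, assuming a plain data frame is an acceptable stand-in (the `<dbl>` column headers suggest the original is a tibble):

```r
# Assumption: `players` is rebuilt by hand from the values printed on the slide.
# A base data.frame is used here; the original object is probably a tibble.
players <- data.frame(
  x = c(-1, -2, 8, 7, -12, -15),
  y = c(1, -3, 6, -8, 8, 0)
)
print(players)
```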
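# Pairwise Euclidean distances between all players
dist_players <- dist(players, method = 'euclidean')
# Hierarchical clustering with complete linkage
hc_players <- hclust(dist_players, method = 'complete')
# Cut the tree into k = 2 clusters
cluster_assignments <- cutree(hc_players, k = 2)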
print(cluster_assignments)
[1] 1 1 1 1 2 2
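To see why players 5 and 6 end up in their own cluster, it can help to look at the heights at which complete linkage merges groups. The sketch below assumes `players` is rebuilt as a plain data frame from the printed values. Cutting by height with `h` instead of by count with `k` is shown as an illustrative alternative; the threshold of 20 is a choice made for this example, not a value from the slides.

```r
# Assumption: `players` rebuilt from the values printed on the slide.
players <- data.frame(
  x = c(-1, -2, 8, 7, -12, -15),
  y = c(1, -3, 6, -8, 8, 0)
)

dist_players <- dist(players, method = "euclidean")
hc_players <- hclust(dist_players, method = "complete")

# Heights of the five successive merges, smallest first
print(round(hc_players$height, 2))
# [1]  4.12  8.54 12.04 14.04 24.84

# Cut the tree at a height instead of asking for a number of clusters.
# Any h between the last two merge heights leaves exactly two clusters.
clusters_by_height <- cutree(hc_players, h = 20)

# Same grouping as cutree(hc_players, k = 2)
print(all(clusters_by_height == cutree(hc_players, k = 2)))
```

The last merge happens at a much larger height than the earlier ones, which is why two clusters is a natural cut for this data.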
library(dplyr)
# Append the cluster assignment to each player as a new column
players_clustered <- mutate(players, cluster = cluster_assignments)
print(players_clustered)
      x     y cluster
  <dbl> <dbl>   <int>
1    -1     1       1
2    -2    -3       1
3     8     6       1
4     7    -8       1
5   -12     8       2
6   -15     0       2
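Once the assignments are attached, a per-cluster summary is a quick sanity check. The following is a sketch rather than part of the original slides: it assumes `players_clustered` is rebuilt by hand from the printed table, and the column names `n`, `mean_x`, and `mean_y` are chosen here for illustration.

```r
library(dplyr)

# Assumption: `players_clustered` rebuilt from the table printed on the slide.
players_clustered <- data.frame(
  x = c(-1, -2, 8, 7, -12, -15),
  y = c(1, -3, 6, -8, 8, 0),
  cluster = c(1L, 1L, 1L, 1L, 2L, 2L)
)

# Size and mean position of each cluster
cluster_summary <- players_clustered %>%
  group_by(cluster) %>%
  summarise(n = n(), mean_x = mean(x), mean_y = mean(y))

print(cluster_summary)
```

With this data, cluster 1 holds four players centered at (3, -1) and cluster 2 holds two players centered at (-13.5, 4).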
library(ggplot2)
# Scatter plot of player positions, colored by cluster
ggplot(players_clustered, aes(x = x, y = y, color = factor(cluster))) +
  geom_point()