Cluster Analysis in R
Dmitriy (Dima) Gorenshteyn
Lead Data Scientist, Memorial Sloan Kettering Cancer Center
Within Cluster Distance: C(i)
Closest Neighbor Distance: N(i)
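For reference, the two distances above combine into the silhouette width s(i) of observation i. This is the standard definition, written in the slide's C(i) and N(i) notation; the symbol s(i) is introduced here for illustration and does not appear on the slides:

```latex
s(i) =
\begin{cases}
  1 - \dfrac{C(i)}{N(i)}, & \text{if } C(i) < N(i) \\[6pt]
  0,                      & \text{if } C(i) = N(i) \\[6pt]
  \dfrac{N(i)}{C(i)} - 1, & \text{if } C(i) > N(i)
\end{cases}
```

Values near 1 suggest the observation sits well inside its cluster, values near 0 suggest it lies on the border between two clusters, and values near -1 suggest it may fit the neighboring cluster better.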
library(cluster)
pam_k3 <- pam(lineup, k = 3)
pam_k3$silinfo$widths
   cluster neighbor   sil_width
4        1        2 0.465320054
2        1        3 0.321729341
10       1        2 0.311385893
1        1        3 0.271890169
9        2        1 0.443606497
...    ...      ...         ...
sil_plot <- silhouette(pam_k3)
plot(sil_plot)
pam_k3$silinfo$avg.width
[1] 0.353414
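As a rough check on what these numbers mean, the sketch below recomputes silhouette widths by hand from C(i) and N(i) and compares them with the values the cluster package reports. It assumes a small simulated data frame named toy (hypothetical, standing in for lineup, which is not reproduced here), and the names manual_sil and ref are illustrative only.

```r
library(cluster)

# Hypothetical toy data: three well-separated groups of 2-D points.
# This is NOT the course's `lineup` data set.
set.seed(42)
toy <- data.frame(
  x = c(rnorm(5, 0), rnorm(5, 5), rnorm(5, 10)),
  y = c(rnorm(5, 0), rnorm(5, 5), rnorm(5, 0))
)

model <- pam(toy, k = 3)
cl <- model$clustering        # cluster assignment per observation
d <- as.matrix(dist(toy))     # Euclidean distances, pam()'s default metric

manual_sil <- sapply(seq_len(nrow(toy)), function(i) {
  own <- cl == cl[i]
  own[i] <- FALSE
  # C(i): mean distance to the other members of i's own cluster
  c_i <- mean(d[i, own])
  # N(i): smallest mean distance to the members of any other cluster
  others <- setdiff(unique(cl), cl[i])
  n_i <- min(sapply(others, function(k) mean(d[i, cl == k])))
  (n_i - c_i) / max(c_i, n_i)
})

# Reference values from the cluster package, in observation order
ref <- silhouette(cl, dist(toy))[, "sil_width"]

all.equal(as.numeric(manual_sil), as.numeric(ref))
all.equal(mean(manual_sil), model$silinfo$avg.width)
```

The sketch does not handle single-member clusters, for which the silhouette width is defined as 0; the toy data is built so that none should occur.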
library(purrr)
sil_width <- map_dbl(2:10, function(k){
  model <- pam(x = lineup, k = k)
  model$silinfo$avg.width
})
sil_df <- data.frame(
  k = 2:10,
  sil_width = sil_width
)
print(sil_df)
    k sil_width
1   2 0.4164141
2   3 0.3534140
3   4 0.3535534
4   5 0.3724115
... ...     ...
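The k with the highest average silhouette width can also be picked out programmatically rather than read off the table. The sketch below rebuilds only the four rows printed above (the elided rows are not filled in, so the result holds for k = 2 to 5 only); best_k is an illustrative name.

```r
# Only the four rows shown in the printed output; k = 6..10 are elided
# in the source and deliberately left out here.
sil_df <- data.frame(
  k = 2:5,
  sil_width = c(0.4164141, 0.3534140, 0.3535534, 0.3724115)
)

# Pick the k whose average silhouette width is largest
best_k <- sil_df$k[which.max(sil_df$sil_width)]
best_k
```

With the full sil_df from the map_dbl() call, the same line would consider every k from 2 to 10.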
library(ggplot2)
ggplot(sil_df, aes(x = k, y = sil_width)) +
  geom_line() +
  scale_x_continuous(breaks = 2:10)