The importance of scale

Cluster Analysis in R

Dmitriy (Dima) Gorenshteyn

Lead Data Scientist, Memorial Sloan Kettering Cancer Center

Distance between individuals

Observation  Height (feet)  Weight (lbs)
1            6.0            200
2            6.0            202
3            8.0            200
...          ...            ...
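
The comparison this table sets up can be sketched in a few lines of R. This is a minimal illustration, assuming Euclidean distance and using only the three observations shown; the data frame name `height_weight` is borrowed from the `scale()` slide later in the deck.

```r
# Three observations from the table above (height in feet, weight in lbs)
height_weight <- data.frame(
  Height = c(6.0, 6.0, 8.0),
  Weight = c(200, 202, 200)
)

# Pairwise Euclidean distances on the raw, unscaled features
d_mat <- as.matrix(dist(height_weight, method = "euclidean"))

# Observations 1 and 2 differ only by 2 lbs of weight
d_mat[1, 2]  # 2

# Observations 1 and 3 differ only by 2 feet of height
d_mat[1, 3]  # 2
```

On the raw values, a 2 lb difference and a 2 ft difference produce the same distance, even though 2 feet of height is a far larger difference between people than 2 pounds of weight.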

Scaling our features

   $$height_{scaled} = \frac{height - mean(height)}{sd(height)}$$
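
As a rough sketch, the formula can be applied by hand in R. Only the three heights visible in the table are used here, and `sd()` is R's sample standard deviation (dividing by n - 1).

```r
# Heights (feet) of the three observations shown in the table
height <- c(6.0, 6.0, 8.0)

# Standardize: subtract the mean, then divide by the standard deviation
height_scaled <- (height - mean(height)) / sd(height)
print(height_scaled)

# After scaling, the feature has mean 0 (up to floating-point error)
# and standard deviation 1
mean(height_scaled)
sd(height_scaled)
```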

scale() function

print(height_weight)
  Height Weight
1      6    200
2      6    202
3      8    200
...   ...    ...
scale(height_weight)
   Height   Weight
1    0.60    0.67
2    0.60    0.73
3    11.3    0.67
...   ...    ...
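
A possible way to combine `scale()` with `dist()` is sketched below. The slides elide most of the data, so rows 4 and 5 here are hypothetical values added purely for illustration; the results will not match the numbers printed above.

```r
# Rows 1-3 come from the slides; rows 4-5 are HYPOTHETICAL, added only so
# that each feature has a realistic spread
height_weight <- data.frame(
  Height = c(6.0, 6.0, 8.0, 5.5, 5.0),
  Weight = c(200, 202, 200, 150, 250)
)

# scale() centers each column to mean 0 and rescales it to sd 1
scaled <- scale(height_weight)

# The column means and sds that were used are stored as attributes
attr(scaled, "scaled:center")
attr(scaled, "scaled:scale")

# Euclidean distances computed on the scaled features
d_scaled <- as.matrix(dist(scaled))

# The 2 lb difference (obs 1 vs 2) now counts for much less than
# the 2 ft difference (obs 1 vs 3)
d_scaled[1, 2] < d_scaled[1, 3]  # TRUE
```

Because each feature is divided by its own standard deviation, a difference is measured relative to how much that feature varies across the data rather than in its original units.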

Let's practice!
