Clustering and cluster models

Discrete Event Simulation in Python

Diogo Costa (PhD, MSc)

Adjunct Professor, University of Saskatchewan, Canada & CEO of ImpactBLUE-Scientific

Histograms of model results

  • Explore model results
  • Identify tipping points and bottlenecks
  • Optimize the system

Histogram

  • Graph showing frequency distributions
  • Gives the number of observations in each interval (bin)

Matplotlib package

import matplotlib.pyplot as plt

Use: Create a histogram of the dataset data with 50 bins

plt.hist(data, bins=50)

A histogram with 50 bins showing two Gaussian distributions, suggesting two clusters of data.
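As a minimal sketch of the call above, the example below builds a synthetic dataset (an assumption standing in for real model results) from two Gaussian groups and bins it into 50 intervals. The array name `data` follows the slide; the means, spreads, and sample sizes are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for model results: two Gaussian groups of values
rng = np.random.default_rng(seed=0)
data = np.concatenate([
    rng.normal(loc=10.0, scale=1.0, size=1000),
    rng.normal(loc=20.0, scale=1.5, size=1000),
])

# plt.hist returns the counts per bin, the bin edges, and the drawn patches
counts, bin_edges, _ = plt.hist(data, bins=50)
plt.xlabel("Value")
plt.ylabel("Number of observations")
```

With 50 bins there are 51 bin edges, and the counts add up to the number of observations. In an interactive session, `plt.show()` would display the figure.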


Cluster analysis and application to models

  • Applications
    • Pattern recognition (e.g., model results)
    • Image analysis

A satellite image of night lights in North America.

  • Data compression
  • Computer graphics
  • Machine learning

    A pictorial image of a neural network inside a brain, alluding to this technique in machine learning and to the emergence of data patterns and clustering.

  • In discrete-event models

    • Identify model output patterns
    • More actionable information

k-means clustering

Our focus

  • k-means clustering (centroid model)
  • Partition of observations into k clusters
  • Each observation belongs to the cluster with the nearest mean
  • Cluster means are called "cluster centroids"

Observations and cluster centroids

Graph showing data with cluster centroids calculated using k-means clustering.
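To make the "nearest mean" rule concrete, here is a rough sketch of a single k-means iteration in NumPy: assign each observation to its nearest centroid, then recompute each centroid as the mean of its members. The helper names `assign_to_nearest` and `update_centroids`, the toy data, and the starting guess are all illustrative assumptions; this is not how SciPy implements it, and it does not handle empty clusters.

```python
import numpy as np

def assign_to_nearest(obs, centroids):
    """Return, for each observation, the index of the nearest centroid."""
    # distances[i, j] = Euclidean distance from observation i to centroid j
    diffs = obs[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    distances = np.linalg.norm(diffs, axis=2)
    return distances.argmin(axis=1)

def update_centroids(obs, labels, k):
    """Return the mean of the observations assigned to each of the k clusters."""
    return np.array([obs[labels == j].mean(axis=0) for j in range(k)])

# Four toy observations forming two obvious groups
obs = np.array([[1.0, 1.0], [1.0, 2.0], [8.0, 8.0], [9.0, 8.0]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])  # illustrative starting guess

labels = assign_to_nearest(obs, centroids)      # -> [0, 0, 1, 1]
centroids = update_centroids(obs, labels, k=2)  # -> [[1.0, 1.5], [8.5, 8.0]]
```

Repeating these two steps until the centroids stop moving is the basic idea behind k-means.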


k-means clustering with SciPy

SciPy method

scipy.cluster.vq.kmeans()

Implementation

import scipy.cluster.vq
scipy.cluster.vq.kmeans(
    obs, k_or_guess, iter=20, thresh=1e-05,
    check_finite=True, *, seed=None)
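  • obs is a NumPy array (one row per observation, one column per feature)
  • k_or_guess is the number of clusters k, or an array of initial centroid guesses
  • Returns:
    1. Cluster centroids
    2. Distortion (mean Euclidean distance between the observations and their nearest centroids)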
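A small usage sketch of the call and its two return values, on synthetic data. The two Gaussian blobs, their locations, and the seed values are assumptions chosen only so the result is easy to check; the features here already share a scale, so no whitening is applied.

```python
import numpy as np
from scipy.cluster.vq import kmeans

# Hypothetical observations: two well-separated 2-D groups
rng = np.random.default_rng(seed=1)
obs = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
])

# Ask for k = 2 clusters; seed makes the random initialisation repeatable
centroids, distortion = kmeans(obs, 2, seed=1)

print(centroids)   # one row per centroid, one column per feature
print(distortion)  # mean distance from each observation to its nearest centroid
```

For data like this, the centroids should land close to the two group centres, and the distortion should be of the same order as the spread within each group.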

Data whitening: Decorrelation and rescaling

Before running k-means: Data whitening

  1. Decorrelate obs data
  2. Rescale each dimension of obs by its standard deviation

The plot shows three panels: the first shows the correlated data, the second shows the decorrelated data, and the third shows the whitened data.

In SciPy

scipy.cluster.vq.whiten(
obs, check_finite=True)
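  • obs is a NumPy array (one row per observation, one column per feature)
  • Rescales each column by its standard deviation; it does not decorrelate the data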
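The rescaling step can be checked by hand. The sketch below, on a small made-up array, compares `whiten` with a manual division of each column by its standard deviation, which is what the SciPy documentation describes.

```python
import numpy as np
from scipy.cluster.vq import whiten

# Toy observations whose two features have very different scales
obs = np.array([[1.0, 100.0],
                [2.0, 300.0],
                [3.0, 500.0],
                [4.0, 700.0]])

white = whiten(obs)

# Manual equivalent: divide each column by its (population) standard deviation
manual = obs / obs.std(axis=0)

print(np.allclose(white, manual))  # the two results agree
print(white.std(axis=0))           # every column now has unit standard deviation
```

After rescaling, no single feature dominates the Euclidean distances that k-means relies on.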

Example of whitening and k-means

  • Manufacturing activity involving several processes
  • Let's examine the impact of Process 1

Plot showing the duration of Process 1 against the total duration, for the raw data and for the whitened data, with clusters calculated using k-means clustering.

Import package

import scipy.cluster.vq as scvq

Whiten model results

white_data = scvq.whiten(model_results)

Find 2 clusters (blue dots)

cluster_centroids, distortion = scvq.kmeans(white_data, 2)
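The steps above can be sketched end to end as follows. Since the slide's `model_results` is not shown, the array here is a synthetic stand-in (an assumption) with two columns, duration of Process 1 and total duration. The last two steps, labelling each observation with `scvq.vq` and converting the centroids back to the original units, are additions not shown on the slide.

```python
import numpy as np
import scipy.cluster.vq as scvq

# Hypothetical stand-in for model_results: columns are
# [duration of Process 1, total duration], with two operating regimes
rng = np.random.default_rng(seed=42)
model_results = np.vstack([
    rng.normal(loc=[5.0, 40.0], scale=[0.5, 2.0], size=(200, 2)),
    rng.normal(loc=[9.0, 60.0], scale=[0.5, 2.0], size=(200, 2)),
])

# Whiten, then find 2 clusters in the whitened space
white_data = scvq.whiten(model_results)
cluster_centroids, distortion = scvq.kmeans(white_data, 2, seed=42)

# Assign every observation to its nearest centroid
labels, _ = scvq.vq(white_data, cluster_centroids)

# Undo the rescaling so the centroids are expressed in the original units
centroids_original = cluster_centroids * model_results.std(axis=0)
print(centroids_original)
```

Reporting centroids in the original units tends to be easier to act on, for example "one regime has a Process 1 duration near 5 and the other near 9".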

Optimum number of clusters

Techniques

  • Simple method (max number of clusters)
  • Elbow method
  • Silhouette score (silhouette coefficient)
  • Gap statistic
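The elbow method is only named in the list above, so the following is a hedged sketch of one common way to apply it: compute the k-means distortion for several values of k and look for the point where adding clusters stops reducing it much. The synthetic two-group data and the range of k are assumptions.

```python
import numpy as np
from scipy.cluster.vq import kmeans, whiten

# Hypothetical observations with two underlying groups
rng = np.random.default_rng(seed=0)
obs = whiten(np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(150, 2)),
    rng.normal(loc=[6.0, 6.0], scale=0.5, size=(150, 2)),
]))

# Distortion for k = 1..5; the "elbow" is where the curve flattens
distortions = {k: kmeans(obs, k, seed=0)[1] for k in range(1, 6)}

for k, d in distortions.items():
    print(f"k = {k}: distortion = {d:.3f}")
```

For data like this, the distortion drops sharply from k = 1 to k = 2 and only slightly afterwards, which points to two clusters.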

Simple method

  • Determine maximum number of clusters
  • How to use: $\Big(\dfrac{nobs}{2}\Big)^{0.5}$
    • nobs = number of observations
num_clusters = int((model_results.shape[0]/2)**0.5)
  • Console output
    22
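As a worked example of the formula, the sketch below wraps it in a small helper. The function name `max_clusters` and the sample size of 1000 are assumptions; the slide does not state how many observations produced the value 22.

```python
import math

def max_clusters(nobs):
    """Rule-of-thumb upper bound on the number of clusters: (nobs / 2) ** 0.5."""
    return int(math.sqrt(nobs / 2))

# Illustrative sample size: sqrt(1000 / 2) = sqrt(500) ≈ 22.36, truncated to 22
print(max_clusters(1000))  # 22
```

Because of the truncation, any sample size from 968 to 1057 observations yields the same bound of 22.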
    

Optimum number of clusters: Silhouette-score method

  • Import libraries
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

Calculate silhouette scores for k = 2 to 6 clusters

for k in range(2, 7):  # k = 2, ..., 6
  model = KMeans(n_clusters=k)
  model.fit(model_results)
  pred = model.predict(model_results)
  score = silhouette_score(model_results, pred)
  print(f"Silhouette Score for k = {k}: {score:.3f}")

Console output

Silhouette Score for k = 2: 0.591
Silhouette Score for k = 3: 0.472
Silhouette Score for k = 4: 0.381
Silhouette Score for k = 5: 0.364
Silhouette Score for k = 6: 0.373

Interpret results

  • Best value: score = 1
  • Worst value: score = -1
  • Overlapping clusters: score near 0
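Following this interpretation, one way to pick the number of clusters is to keep the k with the highest silhouette score. The sketch below does this on a synthetic stand-in for `model_results` (an assumption); `n_init` and `random_state` are set only to make the run repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical model results with two well-separated groups
rng = np.random.default_rng(seed=0)
model_results = np.vstack([
    rng.normal(loc=[5.0, 40.0], scale=1.0, size=(150, 2)),
    rng.normal(loc=[15.0, 60.0], scale=1.0, size=(150, 2)),
])

# Silhouette score for each candidate number of clusters
scores = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    pred = model.fit_predict(model_results)
    scores[k] = silhouette_score(model_results, pred)

# Keep the k whose score is closest to the best value of 1
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

In the console output shown earlier, the same rule would select k = 2, which has the highest score (0.591).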

Let's practice!
