Visualize clusters

Cluster Analysis in Python

Shaumik Daityari

Business Analyst

Why visualize clusters?

  • Try to make sense of the clusters formed
  • An additional step in validation of clusters
  • Spot trends in data

An introduction to seaborn

  • seaborn: a Python data visualization library based on matplotlib
  • Ships with polished default styles that are easier to modify than matplotlib's
  • Contains functions that make data visualization tasks easy in the context of data analytics
  • Use case for clustering: the hue parameter colors each point by its cluster label

Visualize clusters with matplotlib

import pandas as pd
from matplotlib import pyplot as plt

df = pd.DataFrame({'x': [2, 3, 5, 6, 2],
                   'y': [1, 1, 5, 5, 2],
                   'labels': ['A', 'A', 'B', 'B', 'A']})

# Map each cluster label to a color
colors = {'A': 'red', 'B': 'blue'}

# Color every point according to its label
df.plot.scatter(x='x', y='y', c=df['labels'].apply(lambda x: colors[x]))
plt.show()

Visualize clusters with seaborn

import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

df = pd.DataFrame({'x': [2, 3, 5, 6, 2],
                   'y': [1, 1, 5, 5, 2],
                   'labels': ['A', 'A', 'B', 'B', 'A']})

# hue colors the points by label; no manual color mapping is needed
sns.scatterplot(x='x', y='y', hue='labels', data=df)
plt.show()
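
In practice the labels usually come from a clustering step rather than being typed in by hand. The following is a minimal sketch of that workflow, not the slides' own code: it assumes SciPy's hierarchical clustering (linkage and fcluster) with the Ward method and two clusters, and the column name cluster_labels is a hypothetical choice made for illustration.

```python
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import fcluster, linkage

# Same toy points as in the slides, without hand-assigned labels
df = pd.DataFrame({'x': [2, 3, 5, 6, 2],
                   'y': [1, 1, 5, 5, 2]})

# Assumption: Ward linkage and two clusters, chosen only for illustration
distance_matrix = linkage(df[['x', 'y']], method='ward')

# 'cluster_labels' is a hypothetical column name holding the assigned cluster
df['cluster_labels'] = fcluster(distance_matrix, 2, criterion='maxclust')

# Pass the computed labels to hue so each cluster gets its own color
ax = sns.scatterplot(x='x', y='y', hue='cluster_labels', data=df)
plt.show()
```

Any other clustering method would work the same way, as long as it yields one label per row that can be stored in a column and passed to hue.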

Comparison of both methods of visualization

matplotlib plot

seaborn plot
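
One visible difference between the two plots is the legend: seaborn adds one automatically when hue is set, while the matplotlib call shown earlier does not. As a rough sketch of how to get a comparable legend with matplotlib alone, each label can be plotted as its own group. The groupby loop below is an illustrative approach, not part of the original slides.

```python
import pandas as pd
from matplotlib import pyplot as plt

df = pd.DataFrame({'x': [2, 3, 5, 6, 2],
                   'y': [1, 1, 5, 5, 2],
                   'labels': ['A', 'A', 'B', 'B', 'A']})

colors = {'A': 'red', 'B': 'blue'}

fig, ax = plt.subplots()

# Draw one scatter call per cluster so each gets its own legend entry
for label, group in df.groupby('labels'):
    ax.scatter(group['x'], group['y'], color=colors[label], label=label)

ax.set_xlabel('x')
ax.set_ylabel('y')
ax.legend(title='labels')
plt.show()
```

This takes a few more lines than the seaborn version, which is the trade-off the comparison illustrates.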

Next up: Try some visualizations
