Unsupervised learning: basics

Cluster Analysis in Python

Shaumik Daityari

Business Analyst

Everyday example: Google news

  • How does Google News classify articles?
  • Unsupervised Learning Algorithm: Clustering
  • Match frequent terms in articles to find similarity

Cluster Analysis in Python

Labeled and unlabeled data

Data with no labels

  • Point 1: (1, 2)
  • Point 2: (2, 2)
  • Point 3: (3, 1)

Data with labels

  • Point 1: (1, 2), Label: Danger Zone
  • Point 2: (2, 2), Label: Normal Zone
  • Point 3: (3, 1), Label: Normal Zone
Cluster Analysis in Python

What is unsupervised learning?

  • A group of machine learning algorithms that find patterns in data
  • Data for algorithms has not been labeled, classified or characterized
  • The objective of the algorithm is to interpret any structure in the data
  • Common unsupervised learning algorithms: clustering, neural networks, anomaly detection
Cluster Analysis in Python

What is clustering?

  • The process of grouping items with similar characteristics
  • Items in groups similar to each other than in other groups
  • Example: distance between points on a 2D plane
Cluster Analysis in Python

Plotting data for clustering - Pokemon sightings

from matplotlib import pyplot as plt
x_coordinates = [80, 93, 86, 98, 86, 9, 15, 3, 10, 20, 44, 56, 49, 62, 44]
y_coordinates = [87, 96, 95, 92, 92, 57, 49, 47, 59, 55, 25, 2, 10, 24, 10]
plt.scatter(x_coordinates, y_coordinates)
plt.show()
Cluster Analysis in Python

Cluster Analysis in Python

Cluster Analysis in Python

Up next - some practice

Cluster Analysis in Python

Preparing Video For Download...