NMF learns interpretable parts

Unsupervised Learning in Python

Benjamin Wilson

Director of Research at lateral.io

Example: NMF learns interpretable parts

  • Word-frequency array articles (tf-idf)
  • 20,000 scientific articles (rows)
  • 800 words (columns)

Articles word frequency array

Unsupervised Learning in Python

Applying NMF to the articles

print(articles.shape)
(20000, 800)
from sklearn.decomposition import NMF
nmf = NMF(n_components=10)
nmf.fit(articles)
NMF(n_components=10)
print(nmf.components_.shape)
(10, 800)
Unsupervised Learning in Python

NMF components are topics

 

nmf.components_

Unsupervised Learning in Python

NMF components are topics

 

one row of nmf.components_ selected and a bar plot for each word where height represents the tfidf

Unsupervised Learning in Python

NMF components are topics

 

box with top words and values: species 2.95, plant 1.05, plants 0.78, etc.

Unsupervised Learning in Python

NMF components are topics

 

new box with top words from different row of nmf.components_: university 3.57, prof 1.19, college 0.88, etc

Unsupervised Learning in Python

NMF components

  • For documents:
    • NMF components represent topics
    • NMF features combine topics into documents
  • For images, NMF components are parts of images

 

box with 3 stripes approximately equals 0.98 times box with one stripe plus 0.91 times box with a different stripe plus 0.95 times box with a different stripe

Unsupervised Learning in Python

Grayscale images

  • "Grayscale" image = no colors, only shades of gray
  • Measure pixel brightness
  • Represent with value between 0 and 1 (0 is black)
  • Convert to 2D array

2 by 3 rectangle of pixels with varying degrees of white/black/gray

Unsupervised Learning in Python

Grayscale image example

  • An 8x8 grayscale image of the moon, written as an array

 

pixelated illustration of the moon with arrow pointing to array of numbers with the same dimension holding numbers between 0 to 1

Unsupervised Learning in Python

Grayscale images as flat arrays

  • Enumerate the entries
  • Row-by-row
  • From left to right, top to bottom

2 by 3 pixel box pointing to array with numbers

Unsupervised Learning in Python

Grayscale images as flat arrays

  • Enumerate the entries
  • Row-by-row
  • From left to right, top to bottom

2 by 3 pixel box pointing to 2 by 3 array with numbers pointing to a 1 by 6 array of numbers

Unsupervised Learning in Python

Encoding a collection of images

  • Collection of images of the same size
  • Encode as 2D array
  • Each row corresponds to an image
  • Each column corresponds to a pixel
  • ... can apply NMF!

One row of nmf.components_ that represents the 2x3 pixel image

Unsupervised Learning in Python

Visualizing samples

print(sample)
[ 0.   1.   0.5  1.   0.   1. ]
bitmap = sample.reshape((2, 3))
print(bitmap)
[[ 0.   1.   0.5]
 [ 1.   0.   1. ]]
from matplotlib import pyplot as plt
plt.imshow(bitmap, cmap='gray', interpolation='nearest')
plt.show()
Unsupervised Learning in Python

Let's practice!

Unsupervised Learning in Python

Preparing Video For Download...