Unsupervised Learning in Python
Benjamin Wilson
Director of Research at lateral.io
# articles is a word frequency array
from sklearn.decomposition import NMF
nmf = NMF(n_components=6)
nmf_features = nmf.fit_transform(articles)
from sklearn.preprocessing import normalize
norm_features = normalize(nmf_features)
# Suppose the current article has row index 23
current_article = norm_features[23, :]
# Rows have unit length, so each dot product is a cosine similarity
similarities = norm_features.dot(current_article)
print(similarities)
[ 0.7150569 0.26349967 ..., 0.20323616 0.05047817]
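The step above works because normalize scales each row to unit Euclidean length by default, so the dot product of two rows equals the cosine of the angle between them. A minimal standard-library sketch of that arithmetic follows; the feature values and the helper names normalize_rows and dot are made up for illustration and are not part of the articles data or of scikit-learn.

```python
import math

def normalize_rows(rows):
    """Scale each row to unit Euclidean (L2) length."""
    return [[x / math.sqrt(sum(v * v for v in row)) for x in row] for row in rows]

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical non-negative feature rows (made-up numbers)
features = [[3.0, 4.0], [6.0, 8.0], [4.0, 0.0]]
unit = normalize_rows(features)

# Compare every row with row 0, as the slide does with row 23
current = unit[0]
similarities = [dot(row, current) for row in unit]
print(similarities)
```

Rows 0 and 1 point in the same direction, so their similarity is 1 up to floating-point rounding even though their lengths differ; row 2 scores about 0.6.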
# titles is a list of article titles, one per row of articles
import pandas as pd
norm_features = normalize(nmf_features)
df = pd.DataFrame(norm_features, index=titles)
current_article = df.loc['Dog bites man']
similarities = df.dot(current_article)
print(similarities.nlargest())
Dog bites man 1.000000
Hound mauls cat 0.979946
Pets go wild! 0.979708
Dachshunds are dangerous 0.949641
Our streets are no longer safe 0.900474
dtype: float64
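The label-based lookup above can be wrapped in a small helper. This is only a sketch under stated assumptions: most_similar is a hypothetical name, the feature values are made up, the last two titles are invented for the example, and the rows are normalized with pandas arithmetic rather than sklearn's normalize so that the block depends on pandas alone.

```python
import pandas as pd

# Hypothetical features for four articles (made-up numbers, two components)
features = pd.DataFrame(
    [[0.9, 0.1], [0.8, 0.3], [0.1, 0.7], [0.0, 0.5]],
    index=["Dog bites man", "Hound mauls cat", "Budget passes", "Tax cut debated"],
)

def most_similar(df, title, n=3):
    """Return the n rows of df most cosine-similar to the row labelled title."""
    # Scale every row to unit length so dot products equal cosine similarities
    lengths = (df ** 2).sum(axis=1) ** 0.5
    unit = df.div(lengths, axis=0)
    # Dot every row with the chosen row, then keep the n largest scores
    return unit.dot(unit.loc[title]).nlargest(n)

top = most_similar(features, "Dog bites man")
print(top)
```

The queried article always ranks first with similarity 1, so a caller who wants only the other articles can drop the first entry of the result.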