Unsupervised Learning in Python
Benjamin Wilson
Director of Research at lateral.io
articles is a word frequency array
from sklearn.decomposition import NMF
nmf = NMF(n_components=6)
nmf_features = nmf.fit_transform(articles)




from sklearn.preprocessing import normalize
norm_features = normalize(nmf_features)
# if the current article has index 23
current_article = norm_features[23,:]
similarities = norm_features.dot(current_article)
print(similarities)
[ 0.7150569 0.26349967 ..., 0.20323616 0.05047817]
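Why the dot product works: once every row has unit length, the dot product of two rows equals the cosine of the angle between them. The sketch below checks this on a small made-up feature array (the values and the names features and explicit are illustrative assumptions, not from the course data). It uses plain NumPy to scale the rows, which is the same row-wise L2 scaling that normalize applies by default.

```python
import numpy as np

# Hypothetical NMF features: 4 articles x 3 components (made-up values).
features = np.array([
    [2.0, 0.0, 1.0],
    [4.0, 0.0, 2.0],   # same direction as row 0, twice as long
    [0.0, 3.0, 0.0],   # orthogonal to row 0
    [1.0, 1.0, 0.0],
])

# Scale each row to unit L2 length.
norms = np.linalg.norm(features, axis=1, keepdims=True)
norm_features = features / norms

# Dot every normalized row with the normalized current article (row 0).
current_article = norm_features[0]
similarities = norm_features.dot(current_article)

# Explicit cosine formula on the unnormalized rows, for comparison.
explicit = features.dot(features[0]) / (norms.ravel() * norms[0, 0])

print(similarities)
print(np.allclose(similarities, explicit))
```

Row 1 points in the same direction as row 0, so its similarity is 1 even though it is twice as long; row 2 shares no component with row 0, so its similarity is 0.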
titles is a list of article titles
import pandas as pd
norm_features = normalize(nmf_features)
df = pd.DataFrame(norm_features, index=titles)
current_article = df.loc['Dog bites man']
similarities = df.dot(current_article)
print(similarities.nlargest())
Dog bites man 1.000000
Hound mauls cat 0.979946
Pets go wild! 0.979708
Dachshunds are dangerous 0.949641
Our streets are no longer safe 0.900474
dtype: float64
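The output above has five rows because nlargest() returns the five largest values unless told otherwise. A minimal sketch of the same DataFrame pattern on made-up data follows; the titles and feature values are hypothetical stand-ins, and the rows are normalized with NumPy rather than taken from a fitted NMF model.

```python
import numpy as np
import pandas as pd

# Hypothetical article titles and 2-component features (made-up values).
titles = ['Article A', 'Article B', 'Article C', 'Article D',
          'Article E', 'Article F', 'Article G']
raw = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.5, 0.5],
    [0.1, 0.9],
    [0.0, 1.0],
    [0.8, 0.2],
    [0.3, 0.7],
])

# Scale each row to unit length, then label the rows with the titles.
norm_features = raw / np.linalg.norm(raw, axis=1, keepdims=True)
df = pd.DataFrame(norm_features, index=titles)

# Select the current article by title and compute all cosine similarities.
current_article = df.loc['Article A']
similarities = df.dot(current_article)

# nlargest() defaults to the top 5; pass n to change that.
top5 = similarities.nlargest()
top3 = similarities.nlargest(3)
print(top3)
```

The current article always ranks first with similarity 1, so in practice the recommendations start from the second row.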