Unsupervised Learning in Python
Benjamin Wilson
Director of Research at lateral.io
E.g. PCA(n_components=2) reduces the data to 2 features
samples = array of iris measurements (4 features)
species = list of iris species numbers
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
pca.fit(samples)
PCA(n_components=2)
transformed = pca.transform(samples)
print(transformed.shape)
(150, 2)
import matplotlib.pyplot as plt
xs = transformed[:,0]
ys = transformed[:,1]
plt.scatter(xs, ys, c=species)
plt.show()
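
The slides assume that samples and species already exist. A minimal sketch of one way to obtain them, assuming scikit-learn's bundled iris dataset (load_iris) is the data behind the example; fit_transform is used here as a shorthand for the separate fit and transform calls shown above:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Assumption: the slide's data is the standard iris dataset shipped with scikit-learn.
iris = load_iris()
samples = iris.data      # 150 samples x 4 measurements
species = iris.target    # integer species labels (0, 1, 2)

# Keep 2 PCA features, fitting and transforming in a single step.
pca = PCA(n_components=2)
transformed = pca.fit_transform(samples)

print(transformed.shape)  # (150, 2)

# Fraction of the total variance retained by the 2 kept components.
print(pca.explained_variance_ratio_.sum())
```

The resulting transformed and species arrays can be passed straight to the plt.scatter call above.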
Sparse data (mostly zeros): use scipy.sparse.csr_matrix instead of a NumPy array
csr_matrix remembers only the non-zero entries (saves space!)
PCA doesn't support csr_matrix (in older scikit-learn releases)
Use TruncatedSVD instead
from sklearn.decomposition import TruncatedSVD
model = TruncatedSVD(n_components=3)
model.fit(documents) # documents is csr_matrix
transformed = model.transform(documents)
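
The slide does not show where documents comes from. A small self-contained sketch, assuming a hypothetical toy array of 6 documents by 8 word counts (invented purely for illustration), shows how a mostly-zero array becomes a csr_matrix and is then reduced with TruncatedSVD:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

# Hypothetical toy data: 6 documents x 8 words, most entries zero.
dense = np.array([
    [2, 0, 0, 1, 0, 0, 0, 0],
    [0, 3, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 4, 0, 0, 1],
    [1, 0, 0, 2, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 2, 0, 0],
    [0, 0, 1, 0, 0, 0, 3, 0],
], dtype=float)

# csr_matrix stores only the non-zero entries.
documents = csr_matrix(dense)
print(documents.nnz, "stored values out of", dense.size)

# TruncatedSVD accepts the sparse matrix directly and returns a dense array.
# random_state is set only to make the sketch reproducible.
model = TruncatedSVD(n_components=3, random_state=0)
transformed = model.fit_transform(documents)
print(transformed.shape)  # (6, 3)
```

On real text data the sparse matrix would typically have thousands of columns, which is where the memory saving becomes significant.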