Designing Machine Learning Workflows in Python
Dr. Chris Anagnostopoulos
Honorary Associate Professor
from sklearn.metrics import DistanceMetric as dm  # older scikit-learn releases: from sklearn.neighbors
dist = dm.get_metric('euclidean')
X = [[0,1], [2,3], [0,6]]
dist.pairwise(X)
array([[0. , 2.82842712, 5. ],
[2.82842712, 0. , 3.60555128],
[5. , 3.60555128, 0. ]])
import numpy as np
X = np.array(X)  # a 2-D ndarray; np.matrix is discouraged in modern NumPy
np.sqrt(np.sum(np.square(X[0,:] - X[1,:])))
2.82842712
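The single-pair calculation above extends to the whole matrix. Here is a minimal NumPy sketch that assumes only the three toy points from this example. It broadcasts every pairwise difference at once and should reproduce the matrix returned by dist.pairwise(X):

```python
import numpy as np

# The three toy points used in the example above
X = np.array([[0, 1], [2, 3], [0, 6]], dtype=float)

# Shape (3, 3, 2): difference between every pair of rows, coordinate by coordinate
diff = X[:, None, :] - X[None, :, :]

# Square, sum over the coordinate axis, take the root -> (3, 3) Euclidean matrix
D_euclid = np.sqrt(np.sum(np.square(diff), axis=2))
print(D_euclid)
```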
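from sklearn.neighbors import LocalOutlierFactor
# X_train and X_test are assumed to be defined earlier
# novelty=True is required to call predict() on unseen data
clf = LocalOutlierFactor(
    novelty=True, metric='chebyshev')
clf.fit(X_train)
y_pred = clf.predict(X_test)  # +1 = inlier, -1 = outlier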
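The snippet above does not define X_train and X_test. The sketch below substitutes hypothetical synthetic data so the Chebyshev-based LOF detector can run end to end: a 2-D standard-normal cloud for training and two hand-picked test points. The choice of n_neighbors=10 is illustrative, not a recommended setting.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical data standing in for X_train / X_test
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))
X_test = np.array([[0.0, 0.0],     # centre of the training cloud
                   [10.0, 10.0]])  # far away from every training point

# Same detector as above, with an explicit (illustrative) neighbourhood size
clf = LocalOutlierFactor(n_neighbors=10, novelty=True, metric='chebyshev')
clf.fit(X_train)

# +1 = inlier, -1 = outlier
y_pred = clf.predict(X_test)
print(y_pred)
```

The point at the centre of the cloud should be labelled an inlier. The distant point should receive -1.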
dist = dm.get_metric('chebyshev')
X = [[0,1], [2,3], [0,6]]
dist.pairwise(X)
array([[0., 2., 5.],
[2., 0., 3.],
[5., 3., 0.]])
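Chebyshev distance is the largest absolute difference along any single coordinate. The short NumPy sketch below applies that definition to the same three points and gives the same matrix. It illustrates the definition only and is not how DistanceMetric computes it internally.

```python
import numpy as np

X = np.array([[0, 1], [2, 3], [0, 6]], dtype=float)

# Chebyshev distance: the maximum absolute coordinate difference for each pair
D_cheb = np.abs(X[:, None, :] - X[None, :, :]).max(axis=2)
print(D_cheb)
```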
Hamming distance matrix:
dist = dm.get_metric('hamming')
X = [[0,1], [2,3], [0,6]]
dist.pairwise(X)
array([[0. , 1. , 0.5],
[1. , 0. , 1. ],
[0.5, 1. , 0. ]])
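For Hamming distance, DistanceMetric reports the fraction of coordinates that differ. That is why two points sharing one of their two coordinates sit at 0.5. A NumPy sketch of that definition on the same points:

```python
import numpy as np

X = np.array([[0, 1], [2, 3], [0, 6]])

# Compare every pair coordinate by coordinate, then average the mismatches
D_ham = (X[:, None, :] != X[None, :, :]).mean(axis=2)
print(D_ham)
```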
from scipy.spatial.distance import pdist
X = [[0,1], [2,3], [0,6]]
pdist(X, 'cityblock')
array([4., 5., 5.])
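pdist returns a condensed vector rather than a square matrix. It holds one entry per unordered pair (i, j) with i < j, in row-major order. The pure-Python sketch below enumerates the pairs in that order and recovers the same three values. The cityblock helper is hypothetical, written only for illustration.

```python
from itertools import combinations

X = [[0, 1], [2, 3], [0, 6]]

def cityblock(u, v):
    """Manhattan (city-block) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

# combinations yields (0, 1), (0, 2), (1, 2): the same order pdist uses
condensed = [cityblock(X[i], X[j]) for i, j in combinations(range(len(X)), 2)]
print(condensed)  # [4, 5, 5]
```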
from scipy.spatial.distance import \
squareform
squareform(pdist(X, 'cityblock'))
array([[0., 4., 5.],
[4., 0., 5.],
[5., 5., 0.]])
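squareform converts in both directions. Given a condensed vector, it builds the symmetric square matrix. Given a symmetric matrix with a zero diagonal, it returns the condensed vector. A quick round-trip check on the same data:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

X = [[0, 1], [2, 3], [0, 6]]

d = pdist(X, 'cityblock')  # condensed: n*(n-1)/2 = 3 entries
D = squareform(d)          # expanded to the full symmetric 3x3 matrix
d_back = squareform(D)     # a square distance matrix goes back to condensed form
print(D)
print(d_back)
```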
The Hepatitis dataset:
Class AGE SEX STEROID ...
0 2.0 40.0 0.0 0.0 ...
1 2.0 30.0 0.0 0.0 ...
2 1.0 47.0 0.0 1.0 ...
Euclidean distance:
squareform(pdist(X_hep, 'euclidean'))
[[ 0. 127. 64.1]
[127. 0. 128.2]
[ 64.1 128.2 0. ]]
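The Euclidean values above run into the tens and hundreds. Columns measured on large numeric scales, such as AGE, contribute far more to a Euclidean distance than 0/1 columns such as SEX or STEROID. The sketch below illustrates the effect with three hypothetical rows that are not taken from the Hepatitis data:

```python
import numpy as np
from scipy.spatial.distance import pdist

# Hypothetical rows laid out as [age, flag_a, flag_b]; flags are 0/1
rows = np.array([
    [40.0, 0.0, 0.0],
    [41.0, 1.0, 1.0],  # one year older, both flags differ
    [60.0, 0.0, 0.0],  # same flags, twenty years older
])

# Condensed order: (0, 1), (0, 2), (1, 2)
d_euclid = pdist(rows, 'euclidean')
print(d_euclid)
```

Row 0 ends up much closer to row 1 (about 1.73) than to row 2 (20.0). Yet row 1 differs from row 0 in every column, while row 2 differs only in age.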
Hamming distance:
squareform(pdist(X_hep, 'hamming'))
[[0. 0.5 0.7]
[0.5 0. 0.6]
[0.7 0.6 0. ]]
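Hamming distance, by contrast, only asks whether each column matches, so it ignores the size of a difference. On the same hypothetical rows, it reverses the ranking:

```python
import numpy as np
from scipy.spatial.distance import pdist

# Same hypothetical rows as before: [age, flag_a, flag_b]
rows = np.array([
    [40.0, 0.0, 0.0],
    [41.0, 1.0, 1.0],
    [60.0, 0.0, 0.0],
])

# Fraction of columns that differ, for pairs (0, 1), (0, 2), (1, 2)
d_ham = pdist(rows, 'hamming')
print(d_ham)
```

Here a one-year age gap counts as a full mismatch, exactly like a twenty-year gap. Keep this in mind before applying Hamming distance to continuous columns.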