Distance-based learning

Designing Machine Learning Workflows in Python

Dr. Chris Anagnostopoulos

Honorary Associate Professor

Distance and similarity

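import numpy as np
# Newer scikit-learn releases expose DistanceMetric from sklearn.metrics;
# if this import fails, try that module instead.
from sklearn.neighbors import DistanceMetric as dm
dist = dm.get_metric('euclidean')

X = [[0,1], [2,3], [0,6]]
dist.pairwise(X)
array([[0.        , 2.82842712, 5.        ],
       [2.82842712, 0.        , 3.60555128],
       [5.        , 3.60555128, 0.        ]])
# Check the distance between rows 0 and 1 by hand
X = np.array(X)
np.sqrt(np.sum(np.square(X[0,:] - X[1,:])))
2.82842712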

Non-Euclidean Local Outlier Factor

from sklearn.neighbors import LocalOutlierFactor
clf = LocalOutlierFactor(
    novelty=True, metric='chebyshev')
clf.fit(X_train)
y_pred = clf.predict(X_test)
dist = dm.get_metric('chebyshev')
X = [[0,1], [2,3], [0,6]]
dist.pairwise(X)
array([[0., 2., 5.],
       [2., 0., 3.],
       [5., 3., 0.]])

Two clusters of black points with a few isolated red points.
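The slide code assumes X_train and X_test already exist. A minimal runnable sketch follows, using synthetic data as a stand-in: the two Gaussian clusters, the three test points and the random seed are all assumptions made for illustration, not the course data.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Hypothetical training data: two tight clusters of inliers
X_train = np.vstack([
    rng.normal(loc=[0, 0], scale=0.3, size=(50, 2)),
    rng.normal(loc=[5, 5], scale=0.3, size=(50, 2)),
])

# Hypothetical test data: one point near each cluster centre,
# and one point far away from both clusters
X_test = np.array([[0.1, -0.1], [5.1, 4.9], [2.5, 10.0]])

# novelty=True lets the fitted model score unseen points via predict()
clf = LocalOutlierFactor(novelty=True, metric='chebyshev')
clf.fit(X_train)

# predict() returns +1 for inliers and -1 for outliers
y_pred = clf.predict(X_test)
print(y_pred)
```

Swapping metric='chebyshev' for another metric name changes which neighbours count as close, so the same point may be flagged under one metric and not under another.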


Are all metrics similar?

Hamming distance matrix:

dist = dm.get_metric('hamming')
X = [[0,1], [2,3], [0,6]]
dist.pairwise(X)
array([[0. , 1. , 0.5],
       [1. , 0. , 1. ],
       [0.5, 1. , 0. ]])
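To see where these values come from: the Hamming distance between two rows is the fraction of positions in which they differ. The sketch below recomputes the matrix by hand; the helper name hamming is hypothetical and is not part of scikit-learn.

```python
import numpy as np

X = np.array([[0, 1], [2, 3], [0, 6]])

def hamming(u, v):
    # Fraction of coordinates in which the two vectors disagree
    return np.mean(u != v)

# Pairwise matrix built from the hand-written helper
D = np.array([[hamming(u, v) for v in X] for u in X])
print(D)
```

Rows 0 and 2 share their first coordinate, so they differ in one of two positions and get a distance of 0.5. Every other pair differs in both positions and gets 1, regardless of how far apart the values are.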

Are all metrics similar?

from scipy.spatial.distance import pdist

X = [[0,1], [2,3], [0,6]]
pdist(X, 'cityblock')
array([4., 5., 5.])
from scipy.spatial.distance import squareform
squareform(pdist(X, 'cityblock'))
array([[0., 4., 5.],
       [4., 0., 5.],
       [5., 5., 0.]])
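pdist returns a condensed vector with one entry per pair (i, j) where i < j, and squareform expands it into the full symmetric matrix. The sketch below recomputes the city-block values by hand to make the pair ordering explicit; the variable names pairs and manual are illustrative.

```python
import numpy as np
from itertools import combinations
from scipy.spatial.distance import pdist, squareform

X = np.array([[0, 1], [2, 3], [0, 6]])

# Pair order used by the condensed vector: (0,1), (0,2), (1,2)
pairs = list(combinations(range(len(X)), 2))

# City-block distance: sum of absolute coordinate differences
manual = np.array([np.abs(X[i] - X[j]).sum() for i, j in pairs], dtype=float)

condensed = pdist(X, 'cityblock')
square = squareform(condensed)

print(pairs)
print(manual)
print(square)
```

squareform also works in the other direction: passing it the square matrix returns the condensed vector again.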

A real-world example

The Hepatitis dataset:

   Class   AGE  SEX  STEROID    ...      
0    2.0  40.0  0.0      0.0    ...      
1    2.0  30.0  0.0      0.0    ...      
2    1.0  47.0  0.0      1.0    ...      
1 https://archive.ics.uci.edu/ml/datasets/Hepatitis

A real-world example

Euclidean distance:

squareform(pdist(X_hep, 'euclidean'))
[[  0.  127.   64.1]
 [127.    0.  128.2]
 [ 64.1 128.2   0. ]]
  • 1 is nearest to 3: wrong class

Hamming distance:

squareform(pdist(X_hep, 'hamming'))
[[0.  0.5 0.7]
 [0.5 0.  0.6]
 [0.7 0.6 0. ]]
  • 1 is nearest to 2: correct class
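The slides do not say why the two metrics disagree. One plausible reading, offered here as an assumption, is that a wide-ranging numeric column such as AGE dominates the Euclidean distance, while Hamming only counts mismatches. The sketch below illustrates this on a toy array; X_toy and the helper nearest_neighbor are hypothetical and do not use the Hepatitis values.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical rows: [age, flag_a, flag_b, flag_c]
X_toy = np.array([
    [40, 0, 0, 1],
    [30, 0, 0, 1],   # same flags as row 0, different age
    [41, 1, 1, 0],   # close in age to row 0, all flags differ
], dtype=float)

def nearest_neighbor(X, metric):
    # Index of the closest other row for every row under the given metric
    D = squareform(pdist(X, metric))
    np.fill_diagonal(D, np.inf)   # exclude each row's distance to itself
    return D.argmin(axis=1)

nn_euclid = nearest_neighbor(X_toy, 'euclidean')
nn_hamming = nearest_neighbor(X_toy, 'hamming')

print(nn_euclid[0])    # neighbour of row 0 under Euclidean distance
print(nn_hamming[0])   # neighbour of row 0 under Hamming distance
```

In this toy case the ten-year age gap outweighs three matching flags under Euclidean distance, so row 0 is paired with row 2. Under Hamming distance row 0 is paired with row 1. Which behaviour is preferable depends on the data, so the metric is worth treating as a tunable choice.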

A bigger toolbox
