Standardized data and modeling

Preprocessing for Machine Learning in Python

James Chapman

Curriculum Manager, DataCamp

K-nearest neighbors

  • Data leakage: information from outside the training set (for example, test-set statistics) is used to train the model
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Split first so the test set stays unseen during preprocessing
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
knn = KNeighborsClassifier()
scaler = StandardScaler()
# Fit the scaler on the training set only, then reuse it on the test set
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
knn.fit(X_train_scaled, y_train)
knn.score(X_test_scaled, y_test)
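
To see why the scaler is fit on the training set only, it may help to spell out what standardization does by hand. The following is a minimal sketch using only the Python standard library, not scikit-learn's implementation. The helper names `fit_scaler` and `transform` and the small feature lists are hypothetical, made up for illustration. It assumes a population standard deviation, which is what `StandardScaler` uses.

```python
from statistics import mean, pstdev


def fit_scaler(column):
    """Learn the mean and population standard deviation from the given values."""
    return mean(column), pstdev(column)


def transform(column, mu, sigma):
    """Standardize values using previously learned statistics."""
    return [(value - mu) / sigma for value in column]


# Hypothetical single-feature data, invented for this illustration
train_feature = [2.0, 4.0, 6.0, 8.0]
test_feature = [5.0, 11.0]

# Analogous to scaler.fit(X_train): statistics come from training rows only
mu, sigma = fit_scaler(train_feature)

# Analogous to scaler.transform(...): the same statistics are reused
train_scaled = transform(train_feature, mu, sigma)
test_scaled = transform(test_feature, mu, sigma)

# Leaky alternative, shown only for contrast: statistics computed on all rows,
# so the test values influence how the training data is scaled
leaky_mu, leaky_sigma = fit_scaler(train_feature + test_feature)

print(mu, sigma)
print(leaky_mu, leaky_sigma)
```

The two printed pairs differ, which is the leak in miniature: when the test rows take part in fitting, they shift the mean and spread that the model later trains on.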

Let's practice!
