Preprocessing for Machine Learning in Python
James Chapman
Curriculum Manager, DataCamp
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
   X_train y_train
0      1.0       n
1      4.0       n
...
5      5.0       n
6      6.0       n

   X_test y_test
0     9.0      y
1     1.0      n
2     4.0      n
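
The split above can be tried end to end with a minimal sketch like the one below. The toy arrays are hypothetical stand-ins for X and y (they are not the data shown on the slide); the sketch only illustrates how test_size controls the share of rows held out and how a fixed random_state makes the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data (an assumption, not the slide's dataset):
# 50 samples with 2 features each, and alternating binary labels.
X = np.arange(100).reshape(50, 2)
y = np.array([0, 1] * 25)

# Hold out 20% of the rows for testing; fix the seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)

# Repeating the call with the same random_state gives the same split.
X_train_again, X_test_again, _, _ = train_test_split(
    X, y, test_size=0.2, random_state=42
)
same_split = np.array_equal(X_test, X_test_again)
print(same_split)
```

Changing random_state, or leaving it unset, would generally produce a different assignment of rows to the two sets.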
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
y["labels"].value_counts()
class1    80
class2    20
Name: labels, dtype: int64

y_train["labels"].value_counts()
class1    60
class2    15
Name: labels, dtype: int64

y_test["labels"].value_counts()
class1    20
class2     5
Name: labels, dtype: int64
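
The class counts above can be reproduced with a small sketch. The DataFrames here are hypothetical (100 rows with an 80/20 class balance, built only to mirror the counts on the slide), and the sketch passes the label column itself to stratify, which is one way to express the same idea as stratify=y. Because test_size is not set, scikit-learn's default of 25% is used, which gives the 75/25 row split.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data (an assumption): 80 rows of class1 and 20 of class2.
X = pd.DataFrame({"feature": range(100)})
y = pd.DataFrame({"labels": ["class1"] * 80 + ["class2"] * 20})

# Stratify on the label column so both sets keep the 80/20 class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y["labels"], random_state=42
)

train_counts = y_train["labels"].value_counts()
test_counts = y_test["labels"].value_counts()
print(train_counts)
print(test_counts)
```

Without stratify, the class proportions in each set would depend on the random shuffle and could drift from 80/20, which matters most when one class is rare.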