Local validation

Winning a Kaggle Competition in Python

Yauhen Babakhin

Kaggle Grandmaster

Motivation

overfitting example with Public and Private leaderboards

Holdout set

holdout set scheme
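A minimal sketch of the holdout scheme, assuming scikit-learn's train_test_split and a hypothetical toy DataFrame standing in for the competition's train data. The 80/20 split ratio and the column names are illustrative assumptions, not taken from the slides.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the competition's train set
rng = np.random.default_rng(123)
train = pd.DataFrame({
    "feature": rng.normal(size=100),
    "target": rng.integers(0, 2, size=100),
})

# Hold out 20% of the rows for local validation; fit only on the rest
train_part, holdout = train_test_split(train, test_size=0.2, random_state=123)

print(train_part.shape, holdout.shape)
```

The model would be fit on train_part and scored on holdout, so the score comes from rows the model has never seen.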

K-fold cross-validation

split train data into four folds

k-fold cross validation scheme

# Import KFold
from sklearn.model_selection import KFold

# Create a KFold object
kf = KFold(n_splits=5, shuffle=True, random_state=123)

# Loop through each cross-validation split
# (train is assumed to be a pandas DataFrame loaded earlier)
for train_index, test_index in kf.split(train):
    # Get training and testing data for the corresponding split
    cv_train, cv_test = train.iloc[train_index], train.iloc[test_index]
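One way the loop above might be used for local validation is sketched below: fit a model on each training split, score it on the matching test split, and average the fold scores. The toy data, the LinearRegression model, and the MSE metric are all assumptions chosen for illustration; the slides do not prescribe them.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Hypothetical toy regression data standing in for the train set
rng = np.random.default_rng(123)
train = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
train["target"] = 3 * train["x1"] - 2 * train["x2"] + rng.normal(scale=0.5, size=200)

features = ["x1", "x2"]
kf = KFold(n_splits=5, shuffle=True, random_state=123)

fold_scores = []
for train_index, test_index in kf.split(train):
    cv_train, cv_test = train.iloc[train_index], train.iloc[test_index]

    # Fit on the training folds only
    model = LinearRegression()
    model.fit(cv_train[features], cv_train["target"])

    # Score on the held-out fold
    preds = model.predict(cv_test[features])
    fold_scores.append(mean_squared_error(cv_test["target"], preds))

mean_score = np.mean(fold_scores)
print("MSE per fold:", np.round(fold_scores, 3))
print("Mean MSE:", round(mean_score, 3))
```

Averaging over folds gives a steadier estimate than a single holdout split, at the cost of training the model once per fold.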

Stratified K-fold

  stratified k-fold cross validation scheme

# Import StratifiedKFold
from sklearn.model_selection import StratifiedKFold

# Create a StratifiedKFold object
str_kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)

# Loop through each cross-validation split, stratifying on the target column
for train_index, test_index in str_kf.split(train, train['target']):
    # Get training and testing data for the corresponding split
    cv_train, cv_test = train.iloc[train_index], train.iloc[test_index]
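To illustrate what stratification buys, the sketch below checks the share of positive targets in each test fold against the overall share. The imbalanced toy data (10% positives) is a hypothetical stand-in for a real train set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced toy data: 20 positives out of 200 rows
train = pd.DataFrame({
    "feature": np.arange(200),
    "target": [1] * 20 + [0] * 180,
})

str_kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)

fold_rates = []
for train_index, test_index in str_kf.split(train, train["target"]):
    cv_test = train.iloc[test_index]
    # Share of positive targets in this test fold
    fold_rates.append(cv_test["target"].mean())

print("Overall positive rate:", train["target"].mean())
print("Positive rate per fold:", fold_rates)
```

Each fold keeps roughly the same target distribution as the full data, which matters most for small or imbalanced datasets, where a plain KFold split could leave a fold with very few positives.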

Let's practice!
