Understand the problem

Winning a Kaggle Competition in Python

Yauhen Babakhin

Kaggle Grandmaster

Solution workflow

 

 

solution workflow


Understand the problem

  • Data type: tabular data, time series, images, text, etc.

data types

  • Problem type: classification, regression, ranking, etc.
  • Evaluation metric: ROC AUC, F1-Score, MAE, MSE, etc.

Metric definition

# Some classification and regression metrics
from sklearn.metrics import roc_auc_score, f1_score, mean_squared_error
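
As a quick illustration of how these imported functions are called, the sketch below scores a handful of made-up labels and predictions. The arrays and the 0.5 threshold are assumptions chosen for the example, not competition data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score, mean_squared_error

# Toy binary classification data (made up for illustration)
y_true_clf = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])  # predicted probabilities

# ROC AUC is computed from scores/probabilities, not hard labels
auc = roc_auc_score(y_true_clf, y_score)

# F1 needs hard labels, so threshold the probabilities (0.5 is an assumption)
y_label = (y_score >= 0.5).astype(int)
f1 = f1_score(y_true_clf, y_label)

# Toy regression data (made up for illustration)
y_true_reg = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_reg = np.array([2.5, 0.0, 2.0, 8.0])
mse = mean_squared_error(y_true_reg, y_pred_reg)

print(auc, f1, mse)
```

Note that the metric determines what a submission should contain: ROC AUC is scored on probabilities, while F1 is scored on class labels.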

$$RMSLE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}{(\log(y_i+1) - \log(\hat{y}_i+1))^2}}$$

import numpy as np

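def rmsle(y_true, y_pred):
    # Difference of log-transformed values; the +1 keeps the log defined at zero
    diffs = np.log(y_true + 1) - np.log(y_pred + 1)
    # Squared log differences
    squares = np.power(diffs, 2)
    # Root of the mean squared log difference
    err = np.sqrt(np.mean(squares))
    return err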
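
To see how the metric behaves, here is a small self-contained sketch. It restates the formula using `np.log1p` (which computes log(1 + x)) and `np.asarray` so that plain lists work; the name `rmsle_log1p` and the sample values are made up for illustration.

```python
import numpy as np

def rmsle_log1p(y_true, y_pred):
    # Hypothetical variant of the RMSLE formula: np.log1p(x) == log(1 + x)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diffs = np.log1p(y_true) - np.log1p(y_pred)
    return np.sqrt(np.mean(diffs ** 2))

# Perfect predictions give zero error
perfect = rmsle_log1p([10, 100, 1000], [10, 100, 1000])

# Same absolute error of 50, in opposite directions
under = rmsle_log1p([100], [50])   # under-prediction
over = rmsle_log1p([100], [150])   # over-prediction

print(perfect, under, over)
```

Because the error is measured on a log scale, under-predicting by 50 is penalized more heavily than over-predicting by 50 in this example.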

Let's practice!
