Winning a Kaggle Competition in Python
Yauhen Babakhin
Kaggle Grandmaster
Model | Validation RMSE | Public LB RMSE | Public LB Position |
---|---|---|---|
Simple mean | 9.986 | 9.409 | 1449 / 1500 |
Group mean | 9.978 | 9.407 | 1411 / 1500 |
Gradient Boosting | 5.996 | 4.595 | 1109 / 1500 |
Add hour feature | 5.553 | 4.352 | 1068 / 1500 |
Add distance feature | 5.268 | 4.103 | 1006 / 1500 |
... | ... | ... | ... |
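The first two rows of the table are baselines that need no model training. A minimal sketch of how their validation RMSE could be computed is given below. The data is synthetic, and the column names `passenger_count` and `fare_amount` are hypothetical placeholders, not the competition's actual schema.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical columns, not the competition dataset)
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({"passenger_count": rng.integers(1, 5, size=n)})
df["fare_amount"] = 8.0 + 1.5 * df["passenger_count"] + rng.normal(0, 3.0, size=n)

# Simple holdout split: first 80% for training, last 20% for validation
train, valid = df.iloc[:800], df.iloc[800:]


def rmse(y_true, y_pred):
    """Root mean squared error between two equally sized sequences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Baseline 1: predict the global training mean for every validation row
global_mean = train["fare_amount"].mean()
simple_pred = np.full(len(valid), global_mean)
simple_rmse = rmse(valid["fare_amount"], simple_pred)

# Baseline 2: predict the training mean of the row's group,
# falling back to the global mean for groups unseen in training
group_means = train.groupby("passenger_count")["fare_amount"].mean()
group_pred = valid["passenger_count"].map(group_means).fillna(global_mean)
group_rmse = rmse(valid["fare_amount"], group_pred)

print(f"Simple mean validation RMSE: {simple_rmse:.3f}")
print(f"Group mean validation RMSE:  {group_rmse:.3f}")
```

Each later row of the table would reuse the same `rmse` helper on the same validation split, so the scores stay comparable from one experiment to the next.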
Model | Validation RMSE | Public LB RMSE | Public LB Position |
---|---|---|---|
Simple mean | 9.986 | 9.409 | 1449 / 1500 |
Group mean | 9.978 | | |
Gradient Boosting | 5.996 | 4.595 | 1109 / 1500 |
Add hour feature | 5.553 | | |
Add distance feature | 5.268 | 4.103 | 1006 / 1500 |
... | ... | ... | ... |
Competition type | Feature engineering | Hyperparameter optimization |
---|---|---|
Classic Machine Learning | +++ | + |
Deep Learning | - | +++ |
$$Loss = \sum_{i=1}^{N}{(y_i - \hat{y}_i)^2} \to \min$$
$$Loss = \sum_{i=1}^{N}{(y_i - \hat{y}_i)^2} + \alpha\sum_{j=1}^{K}{w_j^2} \to \min$$
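The two losses can be compared numerically. The sketch below assumes a linear model without an intercept, so that $\hat{y} = Xw$. This is an assumption made for illustration, as are the synthetic data and the function names. It evaluates both losses at their closed-form minimizers.

```python
import numpy as np


def least_squares_loss(X, y, w):
    """Sum of squared residuals for the linear prediction X @ w."""
    residuals = y - X @ w
    return float(np.sum(residuals ** 2))


def ridge_loss(X, y, w, alpha):
    """Least squares loss plus alpha times the squared L2 norm of the weights."""
    return least_squares_loss(X, y, w) + alpha * float(np.sum(w ** 2))


# Synthetic data (illustrative only)
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=50)

alpha = 10.0

# Closed-form minimizers of each loss (no intercept term)
w_ols = np.linalg.solve(X.T @ X, X.T @ y)
w_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

print("Squared weight norm, least squares:", float(np.sum(w_ols ** 2)))
print("Squared weight norm, ridge:        ", float(np.sum(w_ridge ** 2)))
print("Ridge loss at ridge solution:", ridge_loss(X, y, w_ridge, alpha))
print("Ridge loss at OLS solution:  ", ridge_loss(X, y, w_ols, alpha))
```

The ridge solution has the smaller weight norm and a slightly worse fit on the training data. That trade-off is controlled by the hyperparameter $\alpha$.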
# Possible alpha values
alpha_grid = [0.01, 0.1, 1, 10]

from sklearn.linear_model import Ridge
results = {}

# For each value in the grid
for candidate_alpha in alpha_grid:
    # Create a model with a specific alpha value
    ridge_regression = Ridge(alpha=candidate_alpha)

    # Find the validation score for this model

    # Save the results for each alpha value
    results[candidate_alpha] = validation_score
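The loop above leaves the validation step open. One possible way to fill it in is sketched below, assuming a single holdout split and RMSE as the validation score. The synthetic data, the split, and the choice of metric are assumptions for illustration, not the course's actual setup.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=200)

# Assumed validation scheme: one 75/25 holdout split
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0
)

alpha_grid = [0.01, 0.1, 1, 10]
results = {}

for candidate_alpha in alpha_grid:
    # Create and fit a model with a specific alpha value
    ridge_regression = Ridge(alpha=candidate_alpha)
    ridge_regression.fit(X_train, y_train)

    # Validation score: RMSE on the holdout set (lower is better)
    predictions = ridge_regression.predict(X_valid)
    validation_score = float(np.sqrt(mean_squared_error(y_valid, predictions)))

    # Save the result for this alpha value
    results[candidate_alpha] = validation_score

# Pick the alpha with the lowest validation RMSE
best_alpha = min(results, key=results.get)
print("Validation RMSE per alpha:", results)
print("Best alpha:", best_alpha)
```

With K-fold cross-validation instead of a single split, the same loop would apply, with `validation_score` replaced by the mean score across folds.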