Winning a Kaggle Competition in Python
Yauhen Babakhin
Kaggle Grandmaster
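
Example dataset with one missing value in each feature (the categorical value for ID 5 and the numerical value for ID 4):

| ID | Categorical feature | Numerical feature | Binary target |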
|---|---|---|---|
| 1 | A | 5.1 | 1 |
| 2 | B | 7.2 | 0 |
| 3 | C | 3.4 | 0 |
| 4 | A | NaN | 1 |
| 5 | NaN | 2.6 | 0 |
| 6 | A | 5.3 | 0 |
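
Numerical feature, mean imputation (the missing value for ID 4 is replaced by the column mean, 4.72):

| ID | Categorical feature | Numerical feature | Binary target |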
|---|---|---|---|
| 1 | A | 5.1 | 1 |
| 2 | B | 7.2 | 0 |
| 3 | C | 3.4 | 0 |
| 4 | A | 4.72 | 1 |
| 5 | NaN | 2.6 | 0 |
| 6 | A | 5.3 | 0 |
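The imputed value 4.72 in the table above is the mean of the five observed numerical values; the missing entry is excluded from both the sum and the count:

```latex
\bar{x}_{\text{num}} = \frac{5.1 + 7.2 + 3.4 + 2.6 + 5.3}{5} = \frac{23.6}{5} = 4.72
```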

Numerical feature, constant imputation (the missing value for ID 4 is replaced by -999):

| ID | Categorical feature | Numerical feature | Binary target |
|---|---|---|---|
| 1 | A | 5.1 | 1 |
| 2 | B | 7.2 | 0 |
| 3 | C | 3.4 | 0 |
| 4 | A | -999 | 1 |
| 5 | NaN | 2.6 | 0 |
| 6 | A | 5.3 | 0 |
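
Categorical feature, most frequent imputation (the missing category for ID 5 is replaced by the most frequent category, A):

| ID | Categorical feature | Numerical feature | Binary target |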
|---|---|---|---|
| 1 | A | 5.1 | 1 |
| 2 | B | 7.2 | 0 |
| 3 | C | 3.4 | 0 |
| 4 | A | -999 | 1 |
| 5 | A | 2.6 | 0 |
| 6 | A | 5.3 | 0 |
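
Categorical feature, constant imputation (the missing category for ID 5 is replaced by a new category, MISS):

| ID | Categorical feature | Numerical feature | Binary target |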
|---|---|---|---|
| 1 | A | 5.1 | 1 |
| 2 | B | 7.2 | 0 |
| 3 | C | 3.4 | 0 |
| 4 | A | -999 | 1 |
| 5 | MISS | 2.6 | 0 |
| 6 | A | 5.3 | 0 |
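
`df.isnull()` returns a Boolean mask that is `True` wherever a value is missing:

```python
df.isnull().head(1)
```

```
      ID    cat    num  target
0  False  False  False   False
```

Summing the mask counts the missing values in each column:

```python
df.isnull().sum()
```

```
ID        0
cat       1
num       1
target    0
```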
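The missing-value check above can be reproduced end to end with the following sketch. It assumes `df` is the toy dataset from the tables, with the column names `ID`, `cat`, `num`, and `target` taken from the `df.isnull()` output; the DataFrame construction itself is an illustrative assumption, not code from the original slides.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the toy dataset shown in the tables above
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6],
    'cat': ['A', 'B', 'C', 'A', np.nan, 'A'],
    'num': [5.1, 7.2, 3.4, np.nan, 2.6, 5.3],
    'target': [1, 0, 0, 1, 0, 0],
})

# Boolean mask: True where a value is missing (first row has no gaps)
print(df.isnull().head(1))

# Count of missing values per column
missing_counts = df.isnull().sum()
print(missing_counts)
```

Counting per column like this shows which features need an imputation strategy before any model is trained.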
```python
# Import SimpleImputer
from sklearn.impute import SimpleImputer

# Different types of imputers
mean_imputer = SimpleImputer(strategy='mean')
constant_imputer = SimpleImputer(strategy='constant', fill_value=-999)

# Imputation
df[['num']] = mean_imputer.fit_transform(df[['num']])
```
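To see both numerical strategies side by side, here is a minimal sketch that applies each imputer to its own copy of the data. It assumes the same toy DataFrame as in the tables; the copies `df_mean` and `df_const` are hypothetical names introduced only for this comparison.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Assumed reconstruction of the toy dataset from the tables
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6],
    'cat': ['A', 'B', 'C', 'A', np.nan, 'A'],
    'num': [5.1, 7.2, 3.4, np.nan, 2.6, 5.3],
    'target': [1, 0, 0, 1, 0, 0],
})

mean_imputer = SimpleImputer(strategy='mean')
constant_imputer = SimpleImputer(strategy='constant', fill_value=-999)

# Mean imputation: the NaN becomes the mean of the observed values
df_mean = df.copy()
df_mean[['num']] = mean_imputer.fit_transform(df_mean[['num']])

# Constant imputation: the NaN becomes -999
df_const = df.copy()
df_const[['num']] = constant_imputer.fit_transform(df_const[['num']])

# Row with index 3 is ID 4, the one that was missing
print(df_mean.loc[3, 'num'])   # approximately 4.72
print(df_const.loc[3, 'num'])
```

Working on copies keeps the original `df` intact, so each strategy can be evaluated separately.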
```python
# Import SimpleImputer
from sklearn.impute import SimpleImputer

# Different types of imputers
frequent_imputer = SimpleImputer(strategy='most_frequent')
constant_imputer = SimpleImputer(strategy='constant', fill_value='MISS')

# Imputation
df[['cat']] = constant_imputer.fit_transform(df[['cat']])
```
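The two categorical strategies can be compared the same way. This sketch again assumes the toy DataFrame from the tables; `df_freq` and `df_miss` are hypothetical names for the imputed copies.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Assumed reconstruction of the toy dataset from the tables
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6],
    'cat': ['A', 'B', 'C', 'A', np.nan, 'A'],
    'num': [5.1, 7.2, 3.4, np.nan, 2.6, 5.3],
    'target': [1, 0, 0, 1, 0, 0],
})

frequent_imputer = SimpleImputer(strategy='most_frequent')
constant_imputer = SimpleImputer(strategy='constant', fill_value='MISS')

# Most frequent imputation: 'A' occurs 3 times among the observed values
df_freq = df.copy()
df_freq[['cat']] = frequent_imputer.fit_transform(df_freq[['cat']])

# Constant imputation: missing values become their own category 'MISS'
df_miss = df.copy()
df_miss[['cat']] = constant_imputer.fit_transform(df_miss[['cat']])

print(df_freq['cat'].tolist())
print(df_miss['cat'].tolist())
```

The constant strategy keeps the fact that a value was missing visible to the model as a separate category, while the most frequent strategy merges it into an existing one.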