Winning a Kaggle Competition in Python
Yauhen Babakhin
Kaggle Grandmaster

Example dataset with one missing value (NaN) in each feature:

ID | Categorical feature | Numerical feature | Binary target |
---|---|---|---|
1 | A | 5.1 | 1 |
2 | B | 7.2 | 0 |
3 | C | 3.4 | 0 |
4 | A | NaN | 1 |
5 | NaN | 2.6 | 0 |
6 | A | 5.3 | 0 |
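
The example table can be rebuilt as a pandas DataFrame so that the snippets in this section have a `df` to work on. This is a minimal sketch. It assumes the short column names `ID`, `cat`, `num` and `target` that appear in the `isnull()` output, and it stores missing entries as `np.nan`:

```python
import numpy as np
import pandas as pd

# Example table: one missing value in each feature column (assumed column names)
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6],
    'cat': ['A', 'B', 'C', 'A', np.nan, 'A'],
    'num': [5.1, 7.2, 3.4, np.nan, 2.6, 5.3],
    'target': [1, 0, 0, 1, 0, 0],
})

# Count missing values per column
print(df.isnull().sum())
```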
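
Mean imputation of the numerical feature: the missing value is replaced with the mean of the observed values, 4.72.

ID | Categorical feature | Numerical feature | Binary target |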
---|---|---|---|
1 | A | 5.1 | 1 |
2 | B | 7.2 | 0 |
3 | C | 3.4 | 0 |
4 | A | 4.72 | 1 |
5 | NaN | 2.6 | 0 |
6 | A | 5.3 | 0 |
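
The 4.72 in row 4 is the mean of the five observed values: (5.1 + 7.2 + 3.4 + 2.6 + 5.3) / 5 = 23.6 / 5. The same fill can be sketched in plain pandas. This is an illustrative alternative to the scikit-learn imputer used in the code below, not a replacement for it:

```python
import numpy as np
import pandas as pd

# Numerical feature from the example table (row 4 is missing)
num = pd.Series([5.1, 7.2, 3.4, np.nan, 2.6, 5.3])

# Series.mean() skips NaN by default, so only the 5 observed values count
fill_value = num.mean()
num_imputed = num.fillna(fill_value)

print(round(fill_value, 2))
```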

Constant imputation of the numerical feature: the missing value is replaced with -999.

ID | Categorical feature | Numerical feature | Binary target |
---|---|---|---|
1 | A | 5.1 | 1 |
2 | B | 7.2 | 0 |
3 | C | 3.4 | 0 |
4 | A | -999 | 1 |
5 | NaN | 2.6 | 0 |
6 | A | 5.3 | 0 |
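
Most frequent imputation of the categorical feature: the missing value is replaced with A, the most common category.

ID | Categorical feature | Numerical feature | Binary target |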
---|---|---|---|
1 | A | 5.1 | 1 |
2 | B | 7.2 | 0 |
3 | C | 3.4 | 0 |
4 | A | -999 | 1 |
5 | A | 2.6 | 0 |
6 | A | 5.3 | 0 |
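
Category A appears three times among the five observed values, so it is the most frequent category and fills the gap in row 5. A plain-pandas sketch of the same idea uses `Series.mode()`; this is again an illustration alongside the scikit-learn code below:

```python
import numpy as np
import pandas as pd

# Categorical feature from the example table (row 5 is missing)
cat = pd.Series(['A', 'B', 'C', 'A', np.nan, 'A'])

# mode() ignores NaN by default; on ties it returns several values, so take the first
most_frequent = cat.mode()[0]
cat_imputed = cat.fillna(most_frequent)

print(most_frequent)  # A
```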
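
Constant imputation of the categorical feature: the missing value is replaced with a new category, MISS.

ID | Categorical feature | Numerical feature | Binary target |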
---|---|---|---|
1 | A | 5.1 | 1 |
2 | B | 7.2 | 0 |
3 | C | 3.4 | 0 |
4 | A | -999 | 1 |
5 | MISS | 2.6 | 0 |
6 | A | 5.3 | 0 |
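
Here `df` holds the example table, with the short column names `ID`, `cat`, `num` and `target`. `isnull()` returns a Boolean mask that marks missing values:

```python
df.isnull().head(1)
```

```
      ID    cat    num  target
0  False  False  False   False
```

Summing the mask counts the missing values in each column:

```python
df.isnull().sum()
```

```
ID        0
cat       1
num       1
target    0
```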
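
Impute the numerical feature with scikit-learn's `SimpleImputer`:

```python
# Import SimpleImputer
from sklearn.impute import SimpleImputer

# Different types of imputers
mean_imputer = SimpleImputer(strategy='mean')
constant_imputer = SimpleImputer(strategy='constant', fill_value=-999)

# Imputation: fit_transform expects and returns 2D data, hence df[['num']]
df[['num']] = mean_imputer.fit_transform(df[['num']])
```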
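
The snippet above assumes an existing `df`. A self-contained sketch that rebuilds the example table and applies both numerical imputers, each to its own copy, should reproduce the 4.72 and -999 tables shown earlier. The `df_mean` and `df_const` names are illustrative, not from the original code:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Example table rebuilt from the slides (assumed column names)
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6],
    'cat': ['A', 'B', 'C', 'A', np.nan, 'A'],
    'num': [5.1, 7.2, 3.4, np.nan, 2.6, 5.3],
    'target': [1, 0, 0, 1, 0, 0],
})

mean_imputer = SimpleImputer(strategy='mean')
constant_imputer = SimpleImputer(strategy='constant', fill_value=-999)

# Impute the numerical column on separate copies to compare both strategies
df_mean = df.copy()
df_mean[['num']] = mean_imputer.fit_transform(df[['num']])

df_const = df.copy()
df_const[['num']] = constant_imputer.fit_transform(df[['num']])

# Index 3 is the row with ID 4, the one that was missing
print(df_mean.loc[3, 'num'], df_const.loc[3, 'num'])
```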

Impute the categorical feature in the same way:

```python
# Import SimpleImputer
from sklearn.impute import SimpleImputer

# Different types of imputers
frequent_imputer = SimpleImputer(strategy='most_frequent')
constant_imputer = SimpleImputer(strategy='constant', fill_value='MISS')

# Imputation
df[['cat']] = constant_imputer.fit_transform(df[['cat']])
```
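
`SimpleImputer` accepts string columns for the `most_frequent` and `constant` strategies. A self-contained sketch applying both categorical imputers to separate copies of the rebuilt table should reproduce the A and MISS tables. The `df_freq` and `df_miss` names are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Example table rebuilt from the slides (assumed column names)
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6],
    'cat': ['A', 'B', 'C', 'A', np.nan, 'A'],
    'num': [5.1, 7.2, 3.4, np.nan, 2.6, 5.3],
    'target': [1, 0, 0, 1, 0, 0],
})

frequent_imputer = SimpleImputer(strategy='most_frequent')
constant_imputer = SimpleImputer(strategy='constant', fill_value='MISS')

# Impute the categorical column on separate copies to compare both strategies
df_freq = df.copy()
df_freq[['cat']] = frequent_imputer.fit_transform(df[['cat']])

df_miss = df.copy()
df_miss[['cat']] = constant_imputer.fit_transform(df[['cat']])

# Index 4 is the row with ID 5, the one that was missing
print(df_freq.loc[4, 'cat'], df_miss.loc[4, 'cat'])  # A MISS
```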