Feature engineering

Winning a Kaggle Competition in Python

Yauhen Babakhin

Kaggle Grandmaster

Solution workflow

[Figure: solution workflow]

Modeling stage

[Figure: modeling stage components]

Feature engineering

[Figure: feature engineering scheme]

Feature types

 

  • Numerical
  • Categorical
  • Datetime
  • Coordinates
  • Text
  • Images
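As a rough first pass, pandas dtypes can sort columns into some of these groups. The sketch below uses a hypothetical toy DataFrame (column names and values are made up) and `DataFrame.select_dtypes`. Coordinates stored as floats land in the numerical group, and text or images cannot be told apart by dtype alone, so this only assists a manual review of the columns.

```python
import pandas as pd

# Hypothetical toy frame mixing several of the feature types listed above
df = pd.DataFrame({
    "price": [3000, 2500],                                 # numerical
    "interest_level": ["medium", "low"],                   # categorical
    "date": pd.to_datetime(["2017-12-01", "2017-12-02"]),  # datetime
    "latitude": [40.71, 40.79],                            # coordinate, stored as a float
})

# Group columns by dtype; column order follows the DataFrame
numerical = df.select_dtypes(include="number").columns.tolist()
categorical = df.select_dtypes(exclude=["number", "datetime"]).columns.tolist()
datetime_cols = df.select_dtypes(include="datetime").columns.tolist()

print(numerical)      # ['price', 'latitude']
print(categorical)    # ['interest_level']
print(datetime_cols)  # ['date']
```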

Creating features

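import pandas as pd

# Concatenate the train and test data
# (test has no target column, so its rows get NaN there)
data = pd.concat([train, test])
# Create new features for the data DataFrame...
# Get the train and test back (assumes id is unique across both sets)
train = data[data.id.isin(train.id)]
test = data[data.id.isin(test.id)]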
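A self-contained sketch of the concatenate-then-split pattern on hypothetical toy frames (`x`, `target` and `x_squared` are made-up names). It assumes `id` values never repeat across train and test, which is what the `isin` split relies on.

```python
import pandas as pd

# Hypothetical toy data: test has no target column, ids do not overlap
train = pd.DataFrame({"id": [1, 2, 3], "x": [10, 20, 30], "target": [0, 1, 0]})
test = pd.DataFrame({"id": [4, 5], "x": [40, 50]})

# Stack both sets so every feature is computed the same way on each
data = pd.concat([train, test], ignore_index=True)

# Example feature computed once for all rows
data["x_squared"] = data["x"] ** 2

# Split back by id; drop the all-NaN target from the test part
train = data[data.id.isin(train.id)]
test = data[data.id.isin(test.id)].drop(columns="target")

print(train.shape, test.shape)  # (3, 4) (2, 3)
```

`ignore_index=True` avoids duplicate index labels after stacking. Because the test rows have no target, `pd.concat` fills it with NaN, which also turns the training target into float64.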

Arithmetical features

# Two Sigma Connect competition
two_sigma.head(1)
   id  bathrooms  bedrooms  price interest_level
0  10        1.5         3   3000         medium
# Arithmetical features
# (listings with 0 bedrooms give inf in price_per_bedroom)
two_sigma['price_per_bedroom'] = two_sigma.price / two_sigma.bedrooms
two_sigma['rooms_number'] = two_sigma.bedrooms + two_sigma.bathrooms
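A runnable version of the two features, using the row shown above plus one hypothetical studio listing with zero bedrooms (made up to show the edge case). Replacing 0 with NaN before dividing is one possible choice, assumed here; it leaves the ratio missing instead of infinite.

```python
import numpy as np
import pandas as pd

# Row from the slide plus a hypothetical zero-bedroom listing
two_sigma = pd.DataFrame({
    "id": [10, 11],
    "bathrooms": [1.5, 1.0],
    "bedrooms": [3, 0],
    "price": [3000, 2000],
    "interest_level": ["medium", "low"],
})

# Ratio feature: 0 bedrooms -> NaN, so the result is missing rather than inf
two_sigma["price_per_bedroom"] = two_sigma.price / two_sigma.bedrooms.replace(0, np.nan)
# Sum feature: total number of rooms
two_sigma["rooms_number"] = two_sigma.bedrooms + two_sigma.bathrooms

print(two_sigma[["price_per_bedroom", "rooms_number"]])
```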

Datetime features

# Demand forecasting challenge
dem.head(1)
          id           date store  item  sales
0     100000     2017-12-01     1     1     19
# Convert the date strings to datetime values
dem['date'] = pd.to_datetime(dem['date'])
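Why the conversion matters: string dates do not support the `.dt` accessor. The sketch rebuilds the row shown above; the explicit `format` argument is an addition that assumes the ISO `YYYY-MM-DD` layout seen in the sample, which makes parsing stricter.

```python
import pandas as pd

# Row from the slide; after reading a CSV, dates arrive as plain strings
dem = pd.DataFrame({"id": [100000], "date": ["2017-12-01"],
                    "store": [1], "item": [1], "sales": [19]})

# Before conversion, the .dt accessor is unavailable on a string column
try:
    dem["date"].dt.year
except AttributeError as err:
    print("AttributeError:", err)

# Parse with an explicit format (assumed ISO year-month-day)
dem["date"] = pd.to_datetime(dem["date"], format="%Y-%m-%d")
print(dem["date"].dt.year.iloc[0])  # 2017
```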
Datetime features

# Year features
dem['year'] = dem['date'].dt.year
# Month features
dem['month'] = dem['date'].dt.month
# Week features (ISO week; dt.weekofyear was removed in pandas 2.0)
dem['week'] = dem['date'].dt.isocalendar().week
# Day features
dem['dayofyear'] = dem['date'].dt.dayofyear
dem['dayofmonth'] = dem['date'].dt.day
# Day of week: Monday=0, Sunday=6
dem['dayofweek'] = dem['date'].dt.dayofweek
date        year  month  week
2017-12-01  2017     12    48
2017-12-02  2017     12    48
2017-12-03  2017     12    48
2017-12-04  2017     12    49

date        dayofyear  dayofmonth  dayofweek
2017-12-01        335           1          4
2017-12-02        336           2          5
2017-12-03        337           3          6
2017-12-04        338           4          0
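The same assignments can also be driven by a mapping from output column to `.dt` attribute. This sketch (the `attrs` mapping is illustrative, not from the course) rebuilds both tables above from four dates; `dt.isocalendar().week` stands in for `dt.weekofyear`, which recent pandas versions no longer provide.

```python
import pandas as pd

# Four consecutive dates, as in the tables above
dem = pd.DataFrame({"date": pd.to_datetime(
    ["2017-12-01", "2017-12-02", "2017-12-03", "2017-12-04"])})

# Output column -> pandas .dt attribute (illustrative mapping)
attrs = {
    "year": "year",
    "month": "month",
    "dayofyear": "dayofyear",
    "dayofmonth": "day",
    "dayofweek": "dayofweek",  # Monday=0 ... Sunday=6
}
for col, attr in attrs.items():
    dem[col] = getattr(dem["date"].dt, attr)

# ISO week number, returned as a nullable UInt32 column
dem["week"] = dem["date"].dt.isocalendar().week

print(dem[["date", "year", "month", "week"]])
print(dem[["date", "dayofyear", "dayofmonth", "dayofweek"]])
```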