Winning a Kaggle Competition in Python
Yauhen Babakhin
Kaggle Grandmaster
import pandas as pd

# Concatenate the train and test data
data = pd.concat([train, test])
# Create new features for the data DataFrame...
# Get the train and test back
train = data[data.id.isin(train.id)]
test = data[data.id.isin(test.id)]
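The concatenate-then-split pattern above can be sketched end to end on toy data. The frames, their values, and the `price_thousands` feature below are made up for illustration and are not from the competition. The sketch also assumes that `id` values are unique across train and test, since the split relies on them.

```python
import pandas as pd

# Toy stand-ins for the competition files (hypothetical values)
train = pd.DataFrame({
    "id": [1, 2, 3],
    "price": [3000, 2500, 4000],
    "interest_level": ["medium", "low", "high"],  # target, present in train only
})
test = pd.DataFrame({"id": [4, 5], "price": [3200, 2800]})

# Concatenate so every feature is computed the same way for both sets
data = pd.concat([train, test], ignore_index=True)

# Hypothetical feature, created once on the combined frame
data["price_thousands"] = data["price"] / 1000

# Split back by id; test rows carry NaN in the target column, so drop it there
train_fe = data[data["id"].isin(train["id"])].copy()
test_fe = data[data["id"].isin(test["id"])].drop(columns=["interest_level"]).copy()
```

Calling `.copy()` on each slice is a design choice that avoids chained-assignment warnings if more columns are added to the split frames later.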
# Two Sigma Connect competition
two_sigma.head(1)
   id  bathrooms  bedrooms  price interest_level
0  10        1.5         3   3000         medium
# Arithmetical features
two_sigma['price_per_bedroom'] = two_sigma.price / two_sigma.bedrooms
two_sigma['rooms_number'] = two_sigma.bedrooms + two_sigma.bathrooms
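One caveat with ratio features: if a listing has zero bedrooms (a studio, for example), the division yields `inf`. The sketch below shows one possible guard, which treats zero bedrooms as missing before dividing. The second row is made-up data, and whether NaN is the right encoding for such listings is an assumption that depends on the model.

```python
import numpy as np
import pandas as pd

# First row mirrors the slide; the zero-bedroom row is hypothetical
two_sigma = pd.DataFrame({
    "id": [10, 11],
    "bathrooms": [1.5, 1.0],
    "bedrooms": [3, 0],
    "price": [3000, 2000],
})

# Replace 0 with NaN so the ratio is NaN rather than inf for such rows
bedrooms_safe = two_sigma["bedrooms"].replace(0, np.nan)
two_sigma["price_per_bedroom"] = two_sigma["price"] / bedrooms_safe

# The sum needs no guard
two_sigma["rooms_number"] = two_sigma["bedrooms"] + two_sigma["bathrooms"]
```

Tree-based libraries generally handle NaN natively, while linear models would need the gaps imputed first.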
# Demand forecasting challenge
dem.head(1)
       id        date  store  item  sales
0  100000  2017-12-01      1     1     19
# Convert date to the datetime object
dem['date'] = pd.to_datetime(dem['date'])
# Year features
dem['year'] = dem['date'].dt.year
# Month features
dem['month'] = dem['date'].dt.month
# Week features (.dt.weekofyear was removed in pandas 2.0)
dem['week'] = dem['date'].dt.isocalendar().week
# Day features
dem['dayofyear'] = dem['date'].dt.dayofyear
dem['dayofmonth'] = dem['date'].dt.day
dem['dayofweek'] = dem['date'].dt.dayofweek
date        year  month  week
2017-12-01  2017     12    48
2017-12-02  2017     12    48
2017-12-03  2017     12    48
2017-12-04  2017     12    49
date        dayofyear  dayofmonth  dayofweek
2017-12-01        335           1          4
2017-12-02        336           2          5
2017-12-03        337           3          6
2017-12-04        338           4          0
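The date features in the tables above can be reproduced with a small self-contained sketch. It assumes pandas 1.1 or newer, where `.dt.isocalendar()` is available. The four-row frame contains only the dates from the tables, not the full competition data.

```python
import pandas as pd

# The four dates shown in the tables above
dem = pd.DataFrame({"date": ["2017-12-01", "2017-12-02", "2017-12-03", "2017-12-04"]})

# Convert strings to datetime so the .dt accessor is available
dem["date"] = pd.to_datetime(dem["date"])

dem["year"] = dem["date"].dt.year
dem["month"] = dem["date"].dt.month
# ISO week number; cast from the nullable UInt32 dtype to a plain int
dem["week"] = dem["date"].dt.isocalendar().week.astype(int)

dem["dayofyear"] = dem["date"].dt.dayofyear
dem["dayofmonth"] = dem["date"].dt.day
# Monday is 0, Sunday is 6
dem["dayofweek"] = dem["date"].dt.dayofweek

print(dem)
```

The week number follows ISO 8601, so weeks start on Monday. This is why 2017-12-04, a Monday, moves to week 49 while the preceding Sunday is still in week 48.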