Winning a Kaggle Competition in Python
Yauhen Babakhin
Kaggle Grandmaster
import pandas as pd
# Read train data
taxi_train = pd.read_csv('taxi_train.csv')
taxi_train.columns.to_list()
['key',
'fare_amount',
'pickup_datetime',
'pickup_longitude',
'pickup_latitude',
'dropoff_longitude',
'dropoff_latitude',
'passenger_count']
# Read test data
taxi_test = pd.read_csv('taxi_test.csv')
taxi_test.columns.to_list()
['key',
'pickup_datetime',
'pickup_longitude',
'pickup_latitude',
'dropoff_longitude',
'dropoff_latitude',
'passenger_count']
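The two column listings differ by a single column, which is the value to predict. A minimal sketch of checking this programmatically is shown below. The `_demo` frames are hypothetical empty stand-ins carrying only the column names listed above, so the snippet runs without the CSV files.

```python
import pandas as pd

# Column names copied from the train listing above
train_cols = ['key', 'fare_amount', 'pickup_datetime',
              'pickup_longitude', 'pickup_latitude',
              'dropoff_longitude', 'dropoff_latitude',
              'passenger_count']

# Hypothetical empty stand-ins for taxi_train / taxi_test (assumption: no real data loaded)
taxi_train_demo = pd.DataFrame(columns=train_cols)
taxi_test_demo = pd.DataFrame(columns=[c for c in train_cols if c != 'fare_amount'])

# Columns present in train but absent from test are the target candidates
target_candidates = taxi_train_demo.columns.difference(taxi_test_demo.columns).to_list()
print(target_candidates)
```

With the real data, the same `difference` call on `taxi_train.columns` and `taxi_test.columns` should behave the same way.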
# Read sample submission
taxi_sample_sub = pd.read_csv('taxi_sample_submission.csv')
taxi_sample_sub.head()
                           key  fare_amount
0  2015-01-27 13:08:24.0000002        11.35
1  2015-01-27 13:08:24.0000003        11.35
2  2011-10-08 11:53:44.0000002        11.35
3  2012-12-01 21:12:12.0000002        11.35
4  2012-12-01 21:12:12.0000003        11.35
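The sample submission shows the expected file layout: one row per test `key` and a predicted `fare_amount`. The sketch below builds a file in that layout from a constant prediction. It is an illustration under stated assumptions, not the course's method: the `_demo` frames, their keys and fares, and the choice of the training mean as the prediction are all made up for the example.

```python
import pandas as pd

# Hypothetical toy data standing in for taxi_train / taxi_test (values are invented)
taxi_train_demo = pd.DataFrame({'key': ['a', 'b', 'c'],
                                'fare_amount': [4.5, 10.0, 15.5]})
taxi_test_demo = pd.DataFrame({'key': ['t1', 't2']})

# Assumed baseline: predict the mean training fare for every test row
baseline_fare = taxi_train_demo['fare_amount'].mean()

# Match the sample submission layout: key, fare_amount
submission = taxi_test_demo[['key']].copy()
submission['fare_amount'] = baseline_fare

# With no path argument, to_csv returns the CSV text; pass a filename to write a file
csv_text = submission.to_csv(index=False)
print(csv_text)
```

Passing `index=False` keeps the row index out of the file, so the output has only the two columns seen in the sample submission.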