Missing value imputation using transform()

Writing Efficient Code with pandas

Leonidas Souliotis

PhD Candidate

Counting missing values

prior_counts = restaurant.groupby('time')['total_bill'].count()
missing_counts = restaurant_nan.groupby('time')['total_bill'].count()
print(prior_counts - missing_counts)
time
Dinner    32
Lunch     13
Name: total_bill, dtype: int64
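
The subtraction above works because .count() skips NaN. When the original, complete DataFrame is not available, the missing values can also be counted per group directly. The following is a minimal sketch on a made-up DataFrame (the `df` data is hypothetical and is not the restaurant dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for restaurant_nan
df = pd.DataFrame({
    "time": ["Dinner", "Dinner", "Lunch", "Lunch", "Dinner"],
    "total_bill": [20.0, np.nan, 10.0, np.nan, np.nan],
})

# isna() marks missing entries as True; summing the booleans per group
# counts the missing values in each group
missing_per_group = df["total_bill"].isna().groupby(df["time"]).sum()
print(missing_per_group)
```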
Missing value imputation

missing_trans = lambda x: x.fillna(x.mean())
restaurant_nan_grouped = restaurant_nan.groupby('time')['total_bill']
restaurant_nan_grouped.transform(missing_trans)
Time using .transform(): 0.00368881225586 sec
0    20.676573
1    10.340000
2    21.010000
3    23.680000
4    24.590000
5    25.290000
6    20.676573
Name: total_bill, dtype: float64
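
To see what the transformation does, it can help to run it on a few rows where the group means are easy to verify by hand. This is a small sketch on invented data (`toy` and its values are assumptions for illustration). It also shows an equivalent form that passes the built-in "mean" aggregation to .transform() and fills from the result:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: Dinner mean is 20.0, Lunch mean is 8.0
toy = pd.DataFrame({
    "time": ["Dinner", "Dinner", "Dinner", "Lunch", "Lunch"],
    "total_bill": [10.0, np.nan, 30.0, 8.0, np.nan],
})

# Same pattern as on the slide: fill NaN with the mean of the row's group
missing_trans = lambda x: x.fillna(x.mean())
imputed = toy.groupby("time")["total_bill"].transform(missing_trans)

# Equivalent alternative: broadcast the group means, then fill from them
group_means = toy.groupby("time")["total_bill"].transform("mean")
imputed_alt = toy["total_bill"].fillna(group_means)

print(imputed)
```

Either way the result keeps the original index, so it can be assigned straight back to a column of the DataFrame.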
Comparison with native methods

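import time
import pandas as pd

start_time = time.time()
# Per-group means; .mean() skips NaN by default
mean_din = restaurant_nan.loc[restaurant_nan.time == 'Dinner', 'total_bill'].mean()
mean_lun = restaurant_nan.loc[restaurant_nan.time == 'Lunch', 'total_bill'].mean()

# Fill each missing bill with its group mean, one row at a time
# (assumes the default integer index, so positions and labels coincide)
for row in range(len(restaurant_nan)):
    if pd.isna(restaurant_nan.iloc[row]['total_bill']):
        if restaurant_nan.iloc[row]['time'] == 'Dinner':
            restaurant_nan.loc[row, 'total_bill'] = mean_din
        else:
            restaurant_nan.loc[row, 'total_bill'] = mean_lun
print("Time using native Python: %s sec" % (time.time() - start_time))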
Time using native Python: 0.172566890717 sec
Difference in time: 4,578.115%
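
The percentage above is consistent with taking the extra time of the loop relative to the .transform() time. The sketch below reproduces that arithmetic from the two timings printed on the slides; the formula is inferred from the numbers, not stated in the original:

```python
# Timings reported on the slides, in seconds
time_transform = 0.00368881225586
time_native = 0.172566890717

# Relative difference: extra time of the loop as a percentage
# of the .transform() time
difference_pct = (time_native - time_transform) / time_transform * 100
print("Difference in time: {:,.3f}%".format(difference_pct))
```

In other words, the loop took roughly 47 times as long as .transform() in this run.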
Let's do it!
