Writing Efficient Code with pandas
Leonidas Souliotis
PhD Candidate
prior_counts = restaurant.groupby('time')['total_bill'].count()
missing_counts = restaurant_nan.groupby('time')['total_bill'].count()
print(prior_counts - missing_counts)
time
Dinner 32
Lunch 13
Name: total_bill, dtype: int64
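The count comparison above can be tried on a small scale. This is a minimal sketch that assumes a hypothetical five-row frame (`toy`, `toy_nan`, and their values are illustrative, not the course dataset). It relies on `.count()` skipping NaN, so the difference between the two counts is the number of missing values per group.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the restaurant data (values are illustrative)
toy = pd.DataFrame({
    'time': ['Dinner', 'Dinner', 'Dinner', 'Lunch', 'Lunch'],
    'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59],
})

# Copy the frame and blank out one Dinner bill and one Lunch bill
toy_nan = toy.copy()
toy_nan.loc[[0, 3], 'total_bill'] = np.nan

# count() ignores NaN, so prior - after gives the missing values per group
prior = toy.groupby('time')['total_bill'].count()
after = toy_nan.groupby('time')['total_bill'].count()
missing = prior - after
print(missing)
```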
missing_trans = lambda x: x.fillna(x.mean())
restaurant_nan_grouped = restaurant_nan.groupby('time')['total_bill']
restaurant_nan_grouped.transform(missing_trans)
Time using .transform(): 0.00368881225586 sec
0 20.676573
1 10.340000
2 21.010000
3 23.680000
4 24.590000
5 25.290000
6 20.676573
Name: total_bill, dtype: float64
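A self-contained sketch of the same `.transform()` pattern, assuming a hypothetical six-row frame named `toy` in place of `restaurant_nan`. Each missing bill is replaced by the mean of the non-missing bills in its own `time` group, and the result keeps the original row order and index.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for restaurant_nan
toy = pd.DataFrame({
    'time': ['Dinner', 'Dinner', 'Dinner', 'Lunch', 'Lunch', 'Lunch'],
    'total_bill': [10.0, np.nan, 20.0, 8.0, 12.0, np.nan],
})

# Replace each missing value with the mean of its own group;
# Series.mean() skips NaN by default
fill_with_group_mean = lambda x: x.fillna(x.mean())
filled = toy.groupby('time')['total_bill'].transform(fill_with_group_mean)
print(filled)
```

Here the Dinner mean is 15.0 and the Lunch mean is 10.0, so rows 1 and 5 receive those values while the other rows are unchanged.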
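import time
import pandas as pd

start_time = time.time()
# Per-group means of the non-missing bills (mean() skips NaN by default)
mean_din = restaurant_nan.loc[restaurant_nan['time'] == 'Dinner', 'total_bill'].mean()
mean_lun = restaurant_nan.loc[restaurant_nan['time'] == 'Lunch', 'total_bill'].mean()
# Fill each missing total_bill with its group mean, one row at a time
# (assumes the default integer index, so .loc[row, ...] addresses row number `row`)
for row in range(len(restaurant_nan)):
    if pd.isna(restaurant_nan.iloc[row]['total_bill']):
        if restaurant_nan.iloc[row]['time'] == 'Dinner':
            restaurant_nan.loc[row, 'total_bill'] = mean_din
        else:
            restaurant_nan.loc[row, 'total_bill'] = mean_lun
print("Results from the above operation calculated in %s seconds" % (time.time() - start_time))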
Time using native Python: 0.172566890717 sec
Difference in time: 4,578.115%
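The percentage above appears to be the relative difference between the two timings, taken with the `.transform()` time as the baseline. A short sketch of that arithmetic, assuming this is the formula used (the variable names are illustrative):

```python
# Timings reported above, in seconds
time_transform = 0.00368881225586
time_loop = 0.172566890717

# Relative difference, with the faster .transform() time as the baseline
pct = (time_loop - time_transform) / time_transform * 100
print("Difference in time: {:,.3f}%".format(pct))
```

Put another way, the row-by-row loop took roughly 47 times as long as `.transform()` in this run.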