Writing Efficient Code with pandas
Leonidas Souliotis
PhD Candidate
| total_bill | tip | sex | smoker | day | time |
|--------------|-------|----------|----------|-------|----------|
| 16.99 | 1.01 | Female | No | Sun | Dinner |
| 10.34 | 1.66 | Male | No | Sun | Dinner |
restaurant_grouped = restaurant.groupby('smoker')
print(restaurant_grouped.count())
| | total_bill | tip | sex | day | time |
|------------------|------------|-------|--------|--------|--------|
| smoker | | | | | |
| No | 151 | 151 | 151 | 151 | 151 |
| Yes | 93 | 93 | 93 | 93 | 93 |
zscore = lambda x: (x - x.mean() ) / x.std()
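# Select the numeric columns explicitly: recent pandas versions raise an error
# instead of silently dropping string columns such as 'sex' or 'day'
restaurant_grouped = restaurant.groupby('time')[['total_bill', 'tip', 'size']]
restaurant_transformed = restaurant_grouped.transform(zscore)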
restaurant_transformed.head()
   total_bill       tip      size
0   -0.416446 -1.457045 -0.692873
1   -1.143855 -1.004475  0.405737
2    0.023282  0.276645  0.405737
3    0.315339  0.144355 -0.692873
4    0.414880  0.353234  1.504347
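To make the behaviour of .transform() easier to check by hand, here is a minimal sketch on a small made-up frame. The `toy` data and its values are assumptions for illustration only and are not taken from the restaurant dataset. It shows that the result keeps one row per input row, and that each group ends up with mean 0 and sample standard deviation 1.

```python
import pandas as pd

# Hypothetical toy data, NOT the restaurant dataset from the slides
toy = pd.DataFrame({
    'time': ['Dinner', 'Dinner', 'Dinner', 'Lunch', 'Lunch', 'Lunch'],
    'total_bill': [10.0, 20.0, 30.0, 8.0, 12.0, 16.0],
    'tip': [1.0, 2.0, 3.0, 1.5, 2.5, 3.5],
})

zscore = lambda x: (x - x.mean()) / x.std()

# transform() returns one row per input row, aligned on the original index
toy_transformed = toy.groupby('time')[['total_bill', 'tip']].transform(zscore)

# Within each group, the standardized values have mean 0 and sample std 1
check = toy_transformed.groupby(toy['time']).agg(['mean', 'std'])

print(toy_transformed)
print(check)
```

Because the output has the same index as the input, it can be assigned straight back to columns of the original DataFrame.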
restaurant.groupby('sex')[['total_bill', 'tip', 'size']].transform(zscore)
# Per-group statistics, computed on the numeric column only
mean_female = restaurant.groupby('sex')['total_bill'].mean()['Female']
mean_male = restaurant.groupby('sex')['total_bill'].mean()['Male']
std_female = restaurant.groupby('sex')['total_bill'].std()['Female']
std_male = restaurant.groupby('sex')['total_bill'].std()['Male']
# Row-by-row loop; column 0 is total_bill and column 2 is sex.
# iloc[i, j] writes to the DataFrame itself, whereas iloc[i][j] assigns to a temporary copy.
for i in range(len(restaurant)):
    if restaurant.iloc[i, 2] == 'Female':
        restaurant.iloc[i, 0] = (restaurant.iloc[i, 0] - mean_female) / std_female
    else:
        restaurant.iloc[i, 0] = (restaurant.iloc[i, 0] - mean_male) / std_male
Time using .groupby(): 0.016291141 seconds
Time using native Python: 3.937326908 seconds
Difference in time: 24,068.5145%
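The slides do not show how these timings were taken. One way to measure them is sketched below, assuming `time.perf_counter()` and a synthetic frame (`df`, `n`, and the random values are hypothetical stand-ins for the restaurant data). The percentage is computed as (loop time - groupby time) / groupby time * 100, which appears consistent with the figures above. Absolute timings will vary by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical synthetic data standing in for the restaurant dataset
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    'total_bill': rng.uniform(5, 50, size=n),
    'sex': rng.choice(['Female', 'Male'], size=n),
})

# Vectorized version: a single groupby + transform call
start = time.perf_counter()
fast = df.groupby('sex')['total_bill'].transform(lambda x: (x - x.mean()) / x.std())
time_groupby = time.perf_counter() - start

# Loop version: per-row lookups against precomputed group statistics
means = df.groupby('sex')['total_bill'].mean()
stds = df.groupby('sex')['total_bill'].std()
slow = df['total_bill'].copy()
start = time.perf_counter()
for i in range(len(df)):
    s = df.iloc[i, 1]  # value of the 'sex' column for row i
    slow.iloc[i] = (df.iloc[i, 0] - means[s]) / stds[s]
time_loop = time.perf_counter() - start

print(f"Time using .groupby(): {time_groupby:.6f} seconds")
print(f"Time using a row loop: {time_loop:.6f} seconds")
print(f"Difference in time: {(time_loop - time_groupby) / time_groupby * 100:,.4f}%")
```

Both versions produce the same standardized values; only the time taken differs.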