Data transformation using .groupby().transform

Writing Efficient Code with pandas

Leonidas Souliotis

PhD Candidate

The restaurant dataset

| total_bill | tip  | sex    | smoker | day | time   |
|------------|------|--------|--------|-----|--------|
| 16.99      | 1.01 | Female | No     | Sun | Dinner |
| 10.34      | 1.66 | Male   | No     | Sun | Dinner |

restaurant_grouped = restaurant.groupby('smoker')
print(restaurant_grouped.count())

| smoker | total_bill | tip | sex | day | time |
|--------|------------|-----|-----|-----|------|
| No     | 151        | 151 | 151 | 151 | 151  |
| Yes    | 93         | 93  | 93  | 93  | 93   |
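
As a small illustration of what `.count()` reports, the sketch below uses a hypothetical toy frame (not the restaurant data) with one missing value. It contrasts `.count()`, which counts non-null entries per column in each group, with `.size()`, which counts rows per group.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the restaurant dataset
toy = pd.DataFrame({
    "smoker": ["No", "No", "Yes", "Yes", "No"],
    "total_bill": [16.99, 10.34, 21.01, np.nan, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
})

grouped = toy.groupby("smoker")
counts = grouped.count()  # non-null values per column, per group
sizes = grouped.size()    # rows per group, missing values included

print(counts)
print(sizes)
```

In the slide's output every column shows the same count within a group, which is what you would expect when no column has missing values.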

Data transformation

zscore = lambda x: (x - x.mean()) / x.std()
restaurant_grouped = restaurant.groupby('time')
# Select the numeric columns explicitly: string columns cannot be z-scored
restaurant_transformed = restaurant_grouped[['total_bill', 'tip', 'size']].transform(zscore)
restaurant_transformed.head()

   total_bill       tip      size
0   -0.416446 -1.457045 -0.692873
1   -1.143855 -1.004475  0.405737
2    0.023282  0.276645  0.405737
3    0.315339  0.144355 -0.692873
4    0.414880  0.353234  1.504347
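
To make the behaviour of `.transform()` concrete, here is a minimal sketch on a hypothetical six-row frame (the values are made up for illustration). It shows two things: the result keeps the original row count and index, and after z-scoring each group has mean 0 and standard deviation 1.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names mirror the restaurant dataset
toy = pd.DataFrame({
    "time": ["Dinner", "Dinner", "Dinner", "Lunch", "Lunch", "Lunch"],
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
})

def zscore(x):
    # Standardize one column of one group
    return (x - x.mean()) / x.std()

transformed = toy.groupby("time")[["total_bill", "tip"]].transform(zscore)

# Unlike an aggregation, transform returns one row per input row
print(transformed.shape == toy[["total_bill", "tip"]].shape)

# Within each group the standardized values have mean 0 and std 1
group_means = transformed.groupby(toy["time"]).mean()
group_stds = transformed.groupby(toy["time"]).std()
print(group_means.round(10))
print(group_stds.round(10))
```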

Comparison with native methods

# Vectorized version: one call standardizes the numeric columns within each group
restaurant.groupby('sex')[['total_bill', 'tip', 'size']].transform(zscore)

# Loop version: standardize total_bill row by row
mean_female = restaurant.groupby('sex')['total_bill'].mean()['Female']
mean_male = restaurant.groupby('sex')['total_bill'].mean()['Male']
std_female = restaurant.groupby('sex')['total_bill'].std()['Female']
std_male = restaurant.groupby('sex')['total_bill'].std()['Male']

for i in range(len(restaurant)):
    # Use .iloc[row, col]: chained .iloc[i][0] = ... assigns to a temporary copy
    if restaurant.iloc[i, 2] == 'Female':
        restaurant.iloc[i, 0] = (restaurant.iloc[i, 0] - mean_female) / std_female
    else:
        restaurant.iloc[i, 0] = (restaurant.iloc[i, 0] - mean_male) / std_male

Time using .groupby(): 0.016291141 seconds
Time using native Python: 3.937326908 seconds

Difference in time: 24,068.5145%
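
The comparison can be reproduced in spirit with the sketch below. It assumes synthetic data (500 random rows, not the restaurant dataset) and hypothetical helper names `with_transform` and `with_loop`; absolute timings will differ by machine and pandas version, so no expected numbers are given.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the restaurant data (assumption, not the real dataset)
rng = np.random.default_rng(0)
n = 500
toy = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, size=n),
    "sex": rng.choice(["Female", "Male"], size=n),
})

def zscore(x):
    return (x - x.mean()) / x.std()

def with_transform(df):
    # Vectorized: pandas handles the per-group arithmetic
    return df.groupby("sex")["total_bill"].transform(zscore)

def with_loop(df):
    # Row-by-row: look up each row's group statistics by hand
    means = df.groupby("sex")["total_bill"].mean()
    stds = df.groupby("sex")["total_bill"].std()
    out = df["total_bill"].copy()
    for i in range(len(df)):
        sex = df.iloc[i, 1]
        out.iloc[i] = (df.iloc[i, 0] - means[sex]) / stds[sex]
    return out

t_transform = timeit.timeit(lambda: with_transform(toy), number=5)
t_loop = timeit.timeit(lambda: with_loop(toy), number=5)

print(f"Time using .groupby().transform(): {t_transform:.6f} seconds")
print(f"Time using a row loop: {t_loop:.6f} seconds")
print(f"Difference in time: {(t_loop - t_transform) / t_transform * 100:,.4f}%")
```

Both functions return the same standardized values; only the running time differs.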

Let's practice!
