Data filtration using the filter() function

Writing Efficient Code with pandas

Leonidas Souliotis

PhD Candidate

Purpose of filter()

Limit results based on an aggregate feature

  • Number of missing values
  • Mean of a specific feature
  • Number of occurrences of the group
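
As a minimal sketch, each of the three criteria above can be written as a predicate passed to groupby().filter(). The DataFrame here is made-up toy data, not the restaurant dataset, and the thresholds are arbitrary assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the restaurant dataset used on the slides)
df = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Fri', 'Sat', 'Sat', 'Sat'],
    'total_bill': [10.0, np.nan, 25.0, 30.0, 22.0, 18.0],
})
grouped = df.groupby('day')

# Number of missing values: keep groups with no missing total_bill
no_missing = grouped.filter(lambda g: g['total_bill'].isna().sum() == 0)

# Mean of a specific feature: keep groups whose mean total_bill exceeds 20
high_mean = grouped.filter(lambda g: g['total_bill'].mean() > 20)

# Number of occurrences of the group: keep groups with at least two rows
frequent = grouped.filter(lambda g: len(g) >= 2)

print(sorted(no_missing['day'].unique()))
print(sorted(high_mean['day'].unique()))
print(sorted(frequent['day'].unique()))
```

In every case the predicate receives one group at a time and returns a single True or False; rows of groups that return False are dropped.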

Filter using groupby().filter()

restaurant_grouped = restaurant.groupby('day')
filter_trans = lambda x : x['total_bill'].mean() > 20
restaurant_filtered = restaurant_grouped.filter(filter_trans)
Time using .filter(): 0.00414085388184 sec
print(restaurant_filtered['tip'].mean())
3.11527607362
print(restaurant['tip'].mean())
2.9982786885245902
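
The same pattern can be tried on a small self-contained example. This is only a sketch: the six-row DataFrame below is a hypothetical stand-in for the restaurant data, so its numbers differ from the output shown above.

```python
import pandas as pd

# Hypothetical stand-in for the restaurant data
restaurant = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Sat', 'Sat', 'Sun', 'Sun'],
    'total_bill': [12.0, 16.0, 24.0, 28.0, 21.0, 23.0],
    'tip': [2.0, 2.5, 3.5, 4.0, 3.0, 3.5],
})

restaurant_grouped = restaurant.groupby('day')
filter_trans = lambda x: x['total_bill'].mean() > 20
restaurant_filtered = restaurant_grouped.filter(filter_trans)

# The result keeps the original rows, columns and index;
# only the rows of groups that fail the test are dropped
print(restaurant_filtered['day'].unique())
print(restaurant_filtered['tip'].mean())  # mean tip over the surviving days
print(restaurant['tip'].mean())           # mean tip over all days
```

Here Thur has a mean total bill of 14 and is removed, while Sat and Sun pass the threshold, so the filtered mean tip is computed over four rows instead of six.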

Comparison with native methods

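import pandas as pd

# Collect the tips of every day whose mean total bill exceeds 20
t = [restaurant.loc[restaurant['day'] == i]['tip'] for i in restaurant['day'].unique()
     if restaurant.loc[restaurant['day'] == i]['total_bill'].mean() > 20]
restaurant_filtered = t[0]
for j in t[1:]:
    # Series.append was removed in pandas 2.0; pd.concat does the same job
    restaurant_filtered = pd.concat([restaurant_filtered, j], ignore_index=True)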
Time using native Python: 0.00663900375366 sec
print(restaurant_filtered.mean())
3.11527607362
Difference in time: 60.329341317157024%
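
The slides do not show how the timings were taken. One possible sketch, assuming time.perf_counter() and a hypothetical stand-in DataFrame, is given below; the helper pct_slower is an illustrative name, and the percentage formula is inferred from the two reported timings.

```python
import time
import pandas as pd

# Hypothetical stand-in for the restaurant data
restaurant = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Sat', 'Sat', 'Sun', 'Sun'],
    'total_bill': [12.0, 16.0, 24.0, 28.0, 21.0, 23.0],
    'tip': [2.0, 2.5, 3.5, 4.0, 3.0, 3.5],
})

# Time the groupby().filter() approach
start = time.perf_counter()
filtered = restaurant.groupby('day').filter(lambda x: x['total_bill'].mean() > 20)
time_filter = time.perf_counter() - start

# Time the list-comprehension approach
start = time.perf_counter()
parts = [restaurant.loc[restaurant['day'] == i]
         for i in restaurant['day'].unique()
         if restaurant.loc[restaurant['day'] == i]['total_bill'].mean() > 20]
native = pd.concat(parts)
time_native = time.perf_counter() - start


def pct_slower(slow, fast):
    """Relative difference of slow over fast, in percent (hypothetical helper)."""
    return (slow - fast) / fast * 100


# Applying the formula to the two timings reported on the slides
slide_pct = pct_slower(0.00663900375366, 0.00414085388184)
print(slide_pct)
```

On a DataFrame this small the measured times are dominated by overhead and vary from run to run, so time_filter and time_native are not printed here; repeating each measurement many times (for example with timeit) would give more stable numbers.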

Let's do it!
