Grouped summary statistics

Data Manipulation with pandas

Maggie Matsui

Senior Content Developer at DataCamp

Summaries by group

dogs[dogs["color"] == "Black"]["weight_kg"].mean()
dogs[dogs["color"] == "Brown"]["weight_kg"].mean()
dogs[dogs["color"] == "White"]["weight_kg"].mean()
dogs[dogs["color"] == "Gray"]["weight_kg"].mean()
dogs[dogs["color"] == "Tan"]["weight_kg"].mean()
26.5
24.0
74.0
17.0
2.0

Grouped summaries

dogs.groupby("color")["weight_kg"].mean()
color
Black    26.5
Brown    24.0
Gray     17.0
Tan       2.0
White    74.0
Name: weight_kg, dtype: float64
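
As a quick check that the manual filters and the grouped call agree, here is a minimal sketch. The `dogs` DataFrame in it is an assumption: it is rebuilt from the seven color/breed rows that appear in the grouped outputs on these slides, and is not the course's actual dataset.

```python
import pandas as pd

# Hypothetical stand-in for dogs, reconstructed from the outputs on these slides
dogs = pd.DataFrame({
    "color": ["Black", "Black", "Brown", "Brown", "Gray", "Tan", "White"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Labrador",
              "Schnauzer", "Chihuahua", "St. Bernard"],
    "weight_kg": [29, 24, 24, 24, 17, 2, 74],
})

# Manual approach: one boolean filter per color
manual = {
    color: dogs[dogs["color"] == color]["weight_kg"].mean()
    for color in dogs["color"].unique()
}

# Grouped approach: a single call covers every color
grouped = dogs.groupby("color")["weight_kg"].mean()

print(grouped)
```

The grouped version does not need to know the color values in advance, so it keeps working if a new color appears in the data.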

Multiple grouped summaries

dogs.groupby("color")["weight_kg"].agg(["min", "max", "sum"])
       min  max  sum
color               
Black   24   29   53
Brown   24   24   48
Gray    17   17   17
Tan      2    2    2
White   74   74   74
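
If more descriptive column names are wanted, pandas also accepts keyword arguments in `.agg()` (named aggregation). The sketch below is only an illustration: the `dogs` data is a hypothetical reconstruction from the outputs on these slides, and the column names `lightest`, `heaviest`, and `total` are made up for this example.

```python
import pandas as pd

# Hypothetical stand-in for dogs, reconstructed from the outputs on these slides
dogs = pd.DataFrame({
    "color": ["Black", "Black", "Brown", "Brown", "Gray", "Tan", "White"],
    "weight_kg": [29, 24, 24, 24, 17, 2, 74],
})

# Named aggregation: each keyword becomes a column in the result
stats = dogs.groupby("color")["weight_kg"].agg(
    lightest="min",
    heaviest="max",
    total="sum",
)

print(stats)
```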

Grouping by multiple variables

dogs.groupby(["color", "breed"])["weight_kg"].mean()
color  breed      
Black  Labrador       29.0
       Poodle         24.0
Brown  Chow Chow      24.0
       Labrador       24.0
Gray   Schnauzer      17.0
Tan    Chihuahua       2.0
White  St. Bernard    74.0
Name: weight_kg, dtype: float64
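
Grouping by two columns returns a Series with a two-level index. The sketch below shows a few ways to read values back out of it; the `dogs` data is again a hypothetical reconstruction from the outputs on these slides.

```python
import pandas as pd

# Hypothetical stand-in for dogs, reconstructed from the outputs on these slides
dogs = pd.DataFrame({
    "color": ["Black", "Black", "Brown", "Brown", "Gray", "Tan", "White"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Labrador",
              "Schnauzer", "Chihuahua", "St. Bernard"],
    "weight_kg": [29, 24, 24, 24, 17, 2, 74],
})

means = dogs.groupby(["color", "breed"])["weight_kg"].mean()

# Select every breed within one color (outer index level)
black_means = means.loc["Black"]

# Select a single color/breed pair with a tuple
brown_chow = means.loc[("Brown", "Chow Chow")]

# Turn the index levels back into ordinary columns
flat = means.reset_index()

print(black_means)
print(brown_chow)
print(flat)
```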

Many groups, many summaries

dogs.groupby(["color", "breed"])[["weight_kg", "height_cm"]].mean()
                   weight_kg  height_cm
color breed                            
Black Labrador          29.0       59.0
      Poodle            24.0       43.0
Brown Chow Chow         24.0       46.0
      Labrador          24.0       56.0
Gray  Schnauzer         17.0       49.0
Tan   Chihuahua          2.0       18.0
White St. Bernard       74.0       77.0
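
Several columns and several statistics can also be combined in one call. This is a minimal sketch, assuming a `dogs` DataFrame reconstructed from the outputs on these slides (a hypothetical stand-in, not the course dataset); the result has two-level column labels of the form (column, statistic).

```python
import pandas as pd

# Hypothetical stand-in for dogs, reconstructed from the outputs on these slides
dogs = pd.DataFrame({
    "color": ["Black", "Black", "Brown", "Brown", "Gray", "Tan", "White"],
    "weight_kg": [29, 24, 24, 24, 17, 2, 74],
    "height_cm": [59, 43, 56, 46, 49, 18, 77],
})

# Two columns, two statistics each: the result has (column, statistic) labels
summary = dogs.groupby("color")[["weight_kg", "height_cm"]].agg(["mean", "max"])

# Look up one cell with a (column, statistic) tuple
black_mean_height = summary.loc["Black", ("height_cm", "mean")]

print(summary)
print(black_mean_height)
```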

Let's practice!
