Summary statistics

Data Manipulation with pandas

Maggie Matsui

Senior Content Developer at DataCamp

Summarizing numerical data

dogs["height_cm"].mean()
49.714285714285715
  • .median(), .mode()
  • .min(), .max()
  • .var(), .std()
  • .sum()
  • .quantile()
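As a minimal sketch of the methods listed above, the snippet below rebuilds the weight_kg column from the values printed on the "Cumulative sum" slide and calls each summary method on it. The variable name `weights` is an assumption made for illustration; it is not part of the course data.

```python
import pandas as pd

# weight_kg values as printed on the "Cumulative sum" slide
weights = pd.Series([24, 24, 24, 17, 29, 2, 74], name="weight_kg")

print(weights.mean())                # arithmetic mean
print(weights.median())              # middle value
print(weights.mode())                # most frequent value(s), returned as a Series
print(weights.min(), weights.max())  # smallest and largest value
print(weights.var(), weights.std())  # sample variance and standard deviation (ddof=1)
print(weights.sum())                 # total
print(weights.quantile(0.5))         # 0.5 quantile, which equals the median
```

Note that .mode() returns a Series rather than a scalar, because a column can have more than one most frequent value.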

Summarizing dates

Oldest dog:

dogs["date_of_birth"].min()
'2011-12-11'

Youngest dog:

dogs["date_of_birth"].max()
'2018-02-27'
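The same idea can be sketched with a real datetime column. The dates below are hypothetical apart from the earliest and latest, which match the output shown above; converting with `pd.to_datetime` is an assumption, since the slide output suggests the column may be stored as strings.

```python
import pandas as pd

# Hypothetical birth dates; only the earliest and latest match the slide output
date_of_birth = pd.Series(
    pd.to_datetime(["2013-07-01", "2011-12-11", "2018-02-27", "2015-04-20"]),
    name="date_of_birth",
)

oldest = date_of_birth.min()    # earliest date, i.e. the oldest dog
youngest = date_of_birth.max()  # latest date, i.e. the youngest dog
print(oldest.date(), youngest.date())

# With datetimes (unlike strings) the two results can be subtracted
age_gap = youngest - oldest
print(age_gap.days)
```

ISO-formatted date strings also sort chronologically, so .min() and .max() give the right answer on strings too, but only datetimes support arithmetic such as the subtraction at the end.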

The .agg() method

def pct30(column):
    return column.quantile(0.3)
dogs["weight_kg"].agg(pct30)
22.599999999999998
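A self-contained version of the example above, as a sketch: .agg() passes the whole column to the function and returns whatever the function returns. The `iqr` function is a hypothetical second summary added for illustration, and `weights` is an assumed stand-in for dogs["weight_kg"].

```python
import pandas as pd

weights = pd.Series([24, 24, 24, 17, 29, 2, 74], name="weight_kg")

def pct30(column):
    # 30th percentile of the column that .agg() passes in
    return column.quantile(0.3)

def iqr(column):
    # Hypothetical custom summary: interquartile range (75th minus 25th percentile)
    return column.quantile(0.75) - column.quantile(0.25)

p30 = weights.agg(pct30)
spread = weights.agg(iqr)
print(p30, spread)
```

Calling `weights.agg(pct30)` gives the same value as `pct30(weights)`; the benefit of .agg() shows up when summarizing several columns or applying several functions at once.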

Summaries on multiple columns

dogs[["weight_kg", "height_cm"]].agg(pct30)
weight_kg    22.6
height_cm    45.4
dtype: float64
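A runnable sketch of the multi-column case follows. The weight_kg values come from the "Cumulative sum" slide; the height_cm values are assumed for illustration, since the slides do not print that column.

```python
import pandas as pd

dogs = pd.DataFrame({
    "weight_kg": [24, 24, 24, 17, 29, 2, 74],   # values printed in the slides
    "height_cm": [56, 43, 46, 49, 59, 18, 77],  # assumed illustrative values
})

def pct30(column):
    # 30th percentile of a single column
    return column.quantile(0.3)

# Double brackets select a DataFrame; .agg() then applies pct30 to each column
result = dogs[["weight_kg", "height_cm"]].agg(pct30)
print(result)
```

The result is a Series indexed by column name, with one value per selected column.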

Multiple summaries

def pct40(column):
    return column.quantile(0.4)
dogs["weight_kg"].agg([pct30, pct40])
pct30    22.6
pct40    24.0
Name: weight_kg, dtype: float64
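As a sketch of passing a list of functions, the snippet below reuses the weight_kg values printed in the slides; `weights` is an assumed stand-in for dogs["weight_kg"].

```python
import pandas as pd

weights = pd.Series([24, 24, 24, 17, 29, 2, 74], name="weight_kg")

def pct30(column):
    # 30th percentile
    return column.quantile(0.3)

def pct40(column):
    # 40th percentile
    return column.quantile(0.4)

# A list of functions yields one result per function, labeled by function name
summaries = weights.agg([pct30, pct40])
print(summaries)
```

Because the labels come from the function names, descriptive names such as pct30 make the output easier to read than anonymous lambdas would.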

Cumulative sum

dogs["weight_kg"]
0    24
1    24
2    24
3    17
4    29
5     2
6    74
Name: weight_kg, dtype: int64
dogs["weight_kg"].cumsum()
0     24
1     48
2     72
3     89
4    118
5    120
6    194
Name: weight_kg, dtype: int64
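The running total can be reproduced with the values printed above. This is a minimal sketch; `weights` is an assumed stand-in for dogs["weight_kg"].

```python
import pandas as pd

weights = pd.Series([24, 24, 24, 17, 29, 2, 74], name="weight_kg")

# Each entry is the sum of all values up to and including that row
running_total = weights.cumsum()
print(running_total)

# The last cumulative value equals the overall sum
print(running_total.iloc[-1] == weights.sum())
```

Unlike .sum(), which returns a single number, .cumsum() returns a column of the same length as the input.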

Cumulative statistics

  • .cummax()
  • .cummin()
  • .cumprod()
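The three methods listed above follow the same pattern as .cumsum(). A short sketch, again using the weight_kg values from the slides with `weights` as an assumed variable name:

```python
import pandas as pd

weights = pd.Series([24, 24, 24, 17, 29, 2, 74], name="weight_kg")

running_max = weights.cummax()    # largest value seen so far
running_min = weights.cummin()    # smallest value seen so far
running_prod = weights.cumprod()  # product of all values so far

# Show the original column next to its cumulative statistics
print(pd.DataFrame({
    "weight_kg": weights,
    "cummax": running_max,
    "cummin": running_min,
    "cumprod": running_prod,
}))
```

Each method returns a full column, so the results line up row by row with the original data.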

Walmart

sales.head()
  store type  dept       date  weekly_sales  is_holiday  temp_c  fuel_price  unemp
0     1    A     1 2010-02-05      24924.50       False    5.73       0.679  8.106
1     1    A     2 2010-02-05      50605.27       False    5.73       0.679  8.106
2     1    A     3 2010-02-05      13740.12       False    5.73       0.679  8.106
3     1    A     4 2010-02-05      39954.04       False    5.73       0.679  8.106
4     1    A     5 2010-02-05      32229.38       False    5.73       0.679  8.106

Let's practice!
