Counting

Data Manipulation with pandas

Maggie Matsui

Senior Content Developer at DataCamp

Avoiding double counting

Vet visits

print(vet_visits)
          date     name        breed  weight_kg
0   2018-09-02    Bella     Labrador      24.87
1   2019-06-07      Max     Labrador      28.35
2   2018-01-17   Stella    Chihuahua       1.51
3   2019-10-19     Lucy    Chow Chow      24.07
..         ...      ...          ...        ...
71  2018-01-20   Stella    Chihuahua       2.83
72  2019-06-07      Max    Chow Chow      24.01
73  2018-08-20     Lucy    Chow Chow      24.40
74  2019-04-22      Max     Labrador      28.54

Dropping duplicate names

vet_visits.drop_duplicates(subset="name")
          date     name        breed  weight_kg
0   2018-09-02    Bella     Labrador      24.87
1   2019-06-07      Max    Chow Chow      24.01
2   2019-03-19  Charlie       Poodle      24.95
3   2018-01-17   Stella    Chihuahua       1.51
4   2019-10-19     Lucy    Chow Chow      24.07
7   2019-03-30   Cooper    Schnauzer      16.91
10  2019-01-04   Bernie  St. Bernard      74.98

(Dropped because the name "Max" repeats: 6  2019-06-07  Max  Labrador  28.35)
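
Dropping on name alone merges different dogs that share a name. The sketch below shows this on a tiny, made-up table. The frame `visits` and its rows are illustrative assumptions, not the course's `vet_visits` data.

```python
import pandas as pd

# Hypothetical miniature of a vet-visits table (illustrative rows only).
visits = pd.DataFrame({
    "date": ["2018-09-02", "2019-06-07", "2019-03-13", "2019-04-22"],
    "name": ["Bella", "Max", "Max", "Max"],
    "breed": ["Labrador", "Labrador", "Chow Chow", "Labrador"],
    "weight_kg": [24.87, 28.35, 24.13, 28.54],
})

# By default (keep="first") the first row for each name survives,
# so Max the Chow Chow disappears along with Max's repeat visit.
by_name = visits.drop_duplicates(subset="name")
print(by_name)

# keep="last" keeps the last row for each name instead.
by_name_last = visits.drop_duplicates(subset="name", keep="last")
print(by_name_last)
```

Either way, only one "Max" remains, which undercounts the dogs when two of them share a name.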

Dropping duplicate pairs

unique_dogs = vet_visits.drop_duplicates(subset=["name", "breed"])
print(unique_dogs)
          date     name        breed  weight_kg
0   2018-09-02    Bella     Labrador      24.87
1   2019-03-13      Max    Chow Chow      24.13
2   2019-03-19  Charlie       Poodle      24.95
3   2018-01-17   Stella    Chihuahua       1.51
4   2019-10-19     Lucy    Chow Chow      24.07
6   2019-06-07      Max     Labrador      28.35
7   2019-03-30   Cooper    Schnauzer      16.91
10  2019-01-04   Bernie  St. Bernard      74.98
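
A minimal sketch of the same idea on made-up data: passing a list to `subset` treats a row as a duplicate only when every listed column matches. The frame `visits` and its values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical miniature of a vet-visits table (illustrative rows only).
visits = pd.DataFrame({
    "date": ["2018-09-02", "2019-06-07", "2019-03-13", "2019-04-22"],
    "name": ["Bella", "Max", "Max", "Max"],
    "breed": ["Labrador", "Labrador", "Chow Chow", "Labrador"],
    "weight_kg": [24.87, 28.35, 24.13, 28.54],
})

# A row is dropped only if both its name AND its breed were seen earlier.
unique_dogs = visits.drop_duplicates(subset=["name", "breed"])
print(unique_dogs)

# Both dogs named Max survive; only Max the Labrador's repeat visit is removed.
print(len(unique_dogs))
```

The original row labels are kept, so gaps in the index show where duplicates were removed.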

Easy as 1, 2, 3

unique_dogs["breed"].value_counts()
Labrador       2
Schnauzer      1
St. Bernard    1
Chow Chow      2
Poodle         1
Chihuahua      1
Name: breed, dtype: int64
unique_dogs["breed"].value_counts(sort=True)
Labrador       2
Chow Chow      2
Schnauzer      1
St. Bernard    1
Poodle         1
Chihuahua      1
Name: breed, dtype: int64
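
In current pandas, `sort=True` is already the default for `value_counts()`, so the two calls above should normally return the same result, largest count first. The sketch below checks this on a made-up Series. The name `breeds` and its values are illustrative assumptions.

```python
import pandas as pd

# Hypothetical breed column, one entry per unique dog (illustrative values).
breeds = pd.Series(
    ["Labrador", "Chow Chow", "Poodle", "Chihuahua",
     "Chow Chow", "Labrador", "Schnauzer", "St. Bernard"],
    name="breed",
)

# sort=True is the default, so these two calls agree.
default_counts = breeds.value_counts()
sorted_counts = breeds.value_counts(sort=True)
print(default_counts.equals(sorted_counts))

# ascending=True puts the smallest counts first instead.
ascending_counts = breeds.value_counts(ascending=True)
print(ascending_counts)
```

Pass `sort=False` to skip the ordering by count.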

Proportions

unique_dogs["breed"].value_counts(normalize=True)
Labrador       0.250
Chow Chow      0.250
Schnauzer      0.125
St. Bernard    0.125
Poodle         0.125
Chihuahua      0.125
Name: breed, dtype: float64
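
With `normalize=True`, each count is divided by the total number of non-missing values, so the results sum to 1. A minimal sketch on made-up data follows. The Series `breeds` is an illustrative assumption.

```python
import pandas as pd

# Hypothetical breed column, one entry per unique dog (illustrative values).
breeds = pd.Series(
    ["Labrador", "Chow Chow", "Poodle", "Chihuahua",
     "Chow Chow", "Labrador", "Schnauzer", "St. Bernard"],
    name="breed",
)

counts = breeds.value_counts()
proportions = breeds.value_counts(normalize=True)

# Each proportion equals count / total, e.g. 2 Labradors out of 8 dogs.
print(proportions["Labrador"])              # 0.25
print(counts["Labrador"] / counts.sum())    # 0.25
print(proportions.sum())                    # 1.0
```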

Let's practice!
