Data Manipulation with pandas
Maggie Matsui
Senior Content Developer at DataCamp
print(vet_visits)
date name breed weight_kg
0 2018-09-02 Bella Labrador 24.87
1 2019-06-07 Max Labrador 28.35
2 2018-01-17 Stella Chihuahua 1.51
3 2019-10-19 Lucy Chow Chow 24.07
.. ... ... ... ...
71 2018-01-20 Stella Chihuahua 2.83
72 2019-06-07 Max Chow Chow 24.01
73 2018-08-20 Lucy Chow Chow 24.40
74 2019-04-22 Max Labrador 28.54
vet_visits.drop_duplicates(subset="name")
date name breed weight_kg
0 2018-09-02 Bella Labrador 24.87
1 2019-06-07 Max Chow Chow 24.01
2 2019-03-19 Charlie Poodle 24.95
3 2018-01-17 Stella Chihuahua 1.51
4 2019-10-19 Lucy Chow Chow 24.07
7 2019-03-30 Cooper Schnauzer 16.91
10 2019-01-04 Bernie St. Bernard 74.98
(Row 6 is dropped because the name Max already appears at index 1: 6 2019-06-07 Max Labrador 28.35)
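A minimal sketch of how subset="name" decides which row survives. The small DataFrame here is a hypothetical stand-in for vet_visits, built from a few rows shown earlier; the variable names first_visits and last_visits are illustrative only.

```python
import pandas as pd

# Hypothetical miniature of vet_visits (not the full course dataset).
vet_visits = pd.DataFrame(
    {
        "date": ["2018-09-02", "2019-06-07", "2018-01-17", "2019-10-19", "2018-01-20"],
        "name": ["Bella", "Max", "Stella", "Lucy", "Stella"],
        "breed": ["Labrador", "Labrador", "Chihuahua", "Chow Chow", "Chihuahua"],
        "weight_kg": [24.87, 28.35, 1.51, 24.07, 2.83],
    }
)

# keep="first" is the default: the first row seen for each name is kept.
first_visits = vet_visits.drop_duplicates(subset="name")

# keep="last" keeps the final row seen for each name instead.
last_visits = vet_visits.drop_duplicates(subset="name", keep="last")

print(first_visits)
print(last_visits)
```

The original index labels are preserved, which is why gaps appear in the index after duplicates are dropped.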
unique_dogs = vet_visits.drop_duplicates(subset=["name", "breed"])
print(unique_dogs)
date name breed weight_kg
0 2018-09-02 Bella Labrador 24.87
1 2019-03-13 Max Chow Chow 24.13
2 2019-03-19 Charlie Poodle 24.95
3 2018-01-17 Stella Chihuahua 1.51
4 2019-10-19 Lucy Chow Chow 24.07
6 2019-06-07 Max Labrador 28.35
7 2019-03-30 Cooper Schnauzer 16.91
10 2019-01-04 Bernie St. Bernard 74.98
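A short sketch of why a list of columns is passed: two different dogs can share a name, so deduplicating on name alone loses one of them. The visits DataFrame below is a hypothetical example, not the course data.

```python
import pandas as pd

# Hypothetical visits: two dogs named Max with different breeds.
visits = pd.DataFrame(
    {
        "name": ["Max", "Max", "Max", "Lucy"],
        "breed": ["Chow Chow", "Labrador", "Chow Chow", "Chow Chow"],
        "weight_kg": [24.13, 28.35, 24.01, 24.07],
    }
)

# Deduplicating on name alone treats both Maxes as the same dog.
by_name = visits.drop_duplicates(subset="name")

# Deduplicating on the (name, breed) pair keeps one row per distinct dog.
by_pair = visits.drop_duplicates(subset=["name", "breed"])

print(len(by_name))
print(len(by_pair))
```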
unique_dogs["breed"].value_counts()
Labrador 2
Schnauzer 1
St. Bernard 1
Chow Chow 2
Poodle 1
Chihuahua 1
Name: breed, dtype: int64
unique_dogs["breed"].value_counts(sort=True)
Labrador 2
Chow Chow 2
Schnauzer 1
St. Bernard 1
Poodle 1
Chihuahua 1
Name: breed, dtype: int64
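A hedged sketch of the sort options. In current pandas releases, sort=True is already the default for value_counts(), so passing it explicitly mainly documents intent. The breeds Series is a hypothetical reconstruction of the breed column in unique_dogs.

```python
import pandas as pd

# Hypothetical breed column matching the eight unique dogs shown above.
breeds = pd.Series(
    [
        "Labrador", "Chow Chow", "Poodle", "Chihuahua",
        "Chow Chow", "Labrador", "Schnauzer", "St. Bernard",
    ],
    name="breed",
)

# Both calls sort by count, largest first.
default_counts = breeds.value_counts()
sorted_counts = breeds.value_counts(sort=True)

# ascending=True reverses the order: rarest breeds first.
ascending_counts = breeds.value_counts(ascending=True)

print(sorted_counts)
print(ascending_counts)
```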
unique_dogs["breed"].value_counts(normalize=True)
Labrador 0.250
Chow Chow 0.250
Schnauzer 0.125
St. Bernard 0.125
Poodle 0.125
Chihuahua 0.125
Name: breed, dtype: float64
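A small sketch of what normalize=True computes: each count divided by the total number of rows, so the results sum to 1. The breeds Series is again a hypothetical stand-in for unique_dogs["breed"].

```python
import pandas as pd

# Hypothetical breed column matching the eight unique dogs shown above.
breeds = pd.Series(
    [
        "Labrador", "Chow Chow", "Poodle", "Chihuahua",
        "Chow Chow", "Labrador", "Schnauzer", "St. Bernard",
    ],
    name="breed",
)

# Proportions: count for each breed divided by the number of dogs.
proportions = breeds.value_counts(normalize=True)

# The same result computed by hand, to show what normalize does.
manual = breeds.value_counts() / len(breeds)

print(proportions)
```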