Sorting and subsetting

Data Manipulation with pandas

Richie Cotton

Data Evangelist at DataCamp

Sorting

dogs.sort_values("weight_kg")
      name        breed  color  height_cm  weight_kg date_of_birth
5   Stella    Chihuahua    Tan         18          2    2015-04-20
3   Cooper    Schnauzer   Gray         49         17    2011-12-11
0    Bella     Labrador  Brown         56         24    2013-07-01
1  Charlie       Poodle  Black         43         24    2016-09-16
2     Lucy    Chow Chow  Brown         46         24    2014-08-25
4      Max     Labrador  Black         59         29    2017-01-20
6   Bernie  St. Bernard  White         77         74    2018-02-27
Data Manipulation with pandas

Sorting in descending order

dogs.sort_values("weight_kg", ascending=False)
      name        breed  color  height_cm  weight_kg date_of_birth
6   Bernie  St. Bernard  White         77         74    2018-02-27
4      Max     Labrador  Black         59         29    2017-01-20
0    Bella     Labrador  Brown         56         24    2013-07-01
1  Charlie       Poodle  Black         43         24    2016-09-16
2     Lucy    Chow Chow  Brown         46         24    2014-08-25
3   Cooper    Schnauzer   Gray         49         17    2011-12-11
5   Stella    Chihuahua    Tan         18          2    2015-04-20
Data Manipulation with pandas

Sorting by multiple variables

dogs.sort_values(["weight_kg", "height_cm"])
      name        breed  color  height_cm  weight_kg date_of_birth
5   Stella    Chihuahua    Tan         18          2    2015-04-20
3   Cooper    Schnauzer   Gray         49         17    2011-12-11
1  Charlie       Poodle  Black         43         24    2016-09-16
2     Lucy    Chow Chow  Brown         46         24    2014-08-25
0    Bella     Labrador  Brown         56         24    2013-07-01
4      Max     Labrador  Black         59         29    2017-01-20
6   Bernie  St. Bernard  White         77         74    2018-02-27
Data Manipulation with pandas

Sorting by multiple variables

dogs.sort_values(["weight_kg", "height_cm"], ascending=[True, False])
      name        breed  color  height_cm  weight_kg date_of_birth
5   Stella    Chihuahua    Tan         18          2    2015-04-20
3   Cooper    Schnauzer   Gray         49         17    2011-12-11
0    Bella     Labrador  Brown         56         24    2013-07-01
2     Lucy    Chow Chow  Brown         46         24    2014-08-25
1  Charlie       Poodle  Black         43         24    2016-09-16
4      Max     Labrador  Black         59         29    2017-01-20
6   Bernie  St. Bernard  White         77         74    2018-02-27

pandas with relevant values highlighted: Bella, Lucy and Charlie in descending order by height

Data Manipulation with pandas

Subsetting columns

dogs["name"]
0      Bella
1    Charlie
2       Lucy
3     Cooper
4        Max
5     Stella
6     Bernie
Name: name, dtype: object
Data Manipulation with pandas

Subsetting multiple columns

dogs[["breed", "height_cm"]]
         breed  height_cm
0     Labrador         56
1       Poodle         43
2    Chow Chow         46
3    Schnauzer         49
4     Labrador         59
5    Chihuahua         18
6  St. Bernard         77
cols_to_subset = ["breed", "height_cm"]
dogs[cols_to_subset]
         breed  height_cm
0     Labrador         56
1       Poodle         43
2    Chow Chow         46
3    Schnauzer         49
4     Labrador         59
5    Chihuahua         18
6  St. Bernard         77
Data Manipulation with pandas

Subsetting rows

dogs["height_cm"] > 50
0     True
1    False
2    False
3    False
4     True
5    False
6     True
Name: height_cm, dtype: bool
Data Manipulation with pandas

Subsetting rows

dogs[dogs["height_cm"] > 50]
     name        breed  color  height_cm  weight_kg date_of_birth
0   Bella     Labrador  Brown         56         24    2013-07-01
4     Max     Labrador  Black         59         29    2017-01-20
6  Bernie  St. Bernard  White         77         74    2018-02-27
Data Manipulation with pandas

Subsetting based on text data

dogs[dogs["breed"] == "Labrador"]
     name        breed  color  height_cm  weight_kg date_of_birth
0   Bella     Labrador  Brown         56         24    2013-07-01
4     Max     Labrador  Black         59         29    2017-01-20
Data Manipulation with pandas

Subsetting based on dates

dogs[dogs["date_of_birth"] < "2015-01-01"]
     name      breed  color  height_cm  weight_kg date_of_birth
0   Bella   Labrador  Brown         56         24    2013-07-01
2    Lucy  Chow Chow  Brown         46         24    2014-08-25
3  Cooper  Schnauzer   Gray         49         17    2011-12-11
Data Manipulation with pandas

Subsetting based on multiple conditions

is_lab = dogs["breed"] == "Labrador"

is_brown = dogs["color"] == "Brown"
dogs[is_lab & is_brown]
     name        breed  color  height_cm  weight_kg date_of_birth
0   Bella     Labrador  Brown         56         24    2013-07-01
dogs[ (dogs["breed"] == "Labrador") & (dogs["color"] == "Brown") ]
Data Manipulation with pandas

Subsetting using .isin()

is_black_or_brown = dogs["color"].isin(["Black", "Brown"])
dogs[is_black_or_brown]
      name      breed  color  height_cm  weight_kg date_of_birth
0    Bella   Labrador  Brown         56         24    2013-07-01
1  Charlie     Poodle  Black         43         24    2016-09-16
2     Lucy  Chow Chow  Brown         46         24    2014-08-25
4      Max   Labrador  Black         59         29    2017-01-20
Data Manipulation with pandas

Let's practice!

Data Manipulation with pandas

Preparing Video For Download...