Slicing and subsetting with .loc and .iloc

Data Manipulation with pandas

Richie Cotton

Data Evangelist at DataCamp

Slicing lists

breeds = ["Labrador", "Poodle", 
          "Chow Chow", "Schnauzer", 
          "Labrador", "Chihuahua", 
          "St. Bernard"]
['Labrador',
 'Poodle',
 'Chow Chow',
 'Schnauzer',
 'Labrador',
 'Chihuahua',
 'St. Bernard']
breeds[2:5]
['Chow Chow', 'Schnauzer', 'Labrador']
breeds[:3]
['Labrador', 'Poodle', 'Chow Chow']
breeds[:]
['Labrador', 'Poodle', 'Chow Chow', 'Schnauzer',
 'Labrador', 'Chihuahua', 'St. Bernard']

Sort the index before you slice

dogs_srt = dogs.set_index(["breed", "color"]).sort_index()
print(dogs_srt)
                      name  height_cm  weight_kg
breed       color                               
Chihuahua   Tan     Stella         18          2
Chow Chow   Brown     Lucy         46         22
Labrador    Black      Max         59         29
            Brown    Bella         56         25
Poodle      Black  Charlie         43         23
Schnauzer   Grey    Cooper         49         17
St. Bernard White   Bernie         77         74
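
For reference, a minimal sketch that rebuilds the dogs data and checks whether the index is sorted. The values are copied from the slides; the DataFrame construction and the is_monotonic_increasing check are assumptions added for illustration, not part of the original slides.

```python
import pandas as pd

# Values copied from the slides; how the course loads them is not shown.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["Brown", "Black", "Brown", "Grey", "Black", "Tan", "White"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [25, 23, 22, 17, 29, 2, 74],
})

unsorted = dogs.set_index(["breed", "color"])
dogs_srt = unsorted.sort_index()

# Label slicing relies on the index being sorted.
print(unsorted.index.is_monotonic_increasing)  # False
print(dogs_srt.index.is_monotonic_increasing)  # True
```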

Slicing the outer index level

dogs_srt.loc["Chow Chow":"Poodle"]
                    name  height_cm  weight_kg
breed     color                               
Chow Chow Brown     Lucy         46         22
Labrador  Black      Max         59         29
          Brown    Bella         56         25
Poodle    Black  Charlie         43         23

The final value "Poodle" is included

Full dataset

                      name  height_cm  weight_kg
breed       color                               
Chihuahua   Tan     Stella         18          2
Chow Chow   Brown     Lucy         46         22
Labrador    Black      Max         59         29
            Brown    Bella         56         25
Poodle      Black  Charlie         43         23
Schnauzer   Grey    Cooper         49         17
St. Bernard White   Bernie         77         74
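
The inclusive end point is the main difference from list slicing. A small sketch of the contrast, using a hypothetical Series named positions that is not part of the slides:

```python
import pandas as pd

breeds = ["Chihuahua", "Chow Chow", "Labrador", "Poodle", "Schnauzer"]

# Hypothetical Series with a sorted label index, for illustration only.
positions = pd.Series(range(len(breeds)), index=breeds)

# List slicing by position: the end position is excluded.
by_position = breeds[1:3]

# .loc slicing by label: the end label is included.
by_label = positions.loc["Chow Chow":"Poodle"]

print(by_position)             # ['Chow Chow', 'Labrador']
print(by_label.index.tolist()) # ['Chow Chow', 'Labrador', 'Poodle']
```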

Slicing the inner index levels badly

dogs_srt.loc["Tan":"Grey"]
Empty DataFrame
Columns: [name, height_cm, weight_kg]
Index: []

Full dataset

                      name  height_cm  weight_kg
breed       color                               
Chihuahua   Tan     Stella         18          2
Chow Chow   Brown     Lucy         46         22
Labrador    Black      Max         59         29
            Brown    Bella         56         25
Poodle      Black  Charlie         43         23
Schnauzer   Grey    Cooper         49         17
St. Bernard White   Bernie         77         74
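
A sketch of why this returns nothing, using a reduced copy of the data (values from the slides, construction assumed). A bare string slice is compared against the outer level only, and pandas returns an empty result rather than raising an error:

```python
import pandas as pd

dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["Brown", "Black", "Brown", "Grey", "Black", "Tan", "White"],
})
dogs_srt = dogs.set_index(["breed", "color"]).sort_index()

# "Tan" and "Grey" are colors, but the slice is applied to breed.
bad = dogs_srt.loc["Tan":"Grey"]
print(len(bad))

# Neither label exists in the outer level.
outer_labels = dogs_srt.index.get_level_values("breed")
print("Tan" in outer_labels, "Grey" in outer_labels)
```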

Slicing the inner index levels correctly

dogs_srt.loc[
    ("Labrador", "Brown"):("Schnauzer", "Grey")]
                    name  height_cm  weight_kg
breed     color                               
Labrador  Brown    Bella         56         25
Poodle    Black  Charlie         43         23
Schnauzer Grey    Cooper         49         17

Full dataset

                      name  height_cm  weight_kg
breed       color                               
Chihuahua   Tan     Stella         18          2
Chow Chow   Brown     Lucy         46         22
Labrador    Black      Max         59         29
            Brown    Bella         56         25
Poodle      Black  Charlie         43         23
Schnauzer   Grey    Cooper         49         17
St. Bernard White   Bernie         77         74
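
The tuple form can be tried on a reduced copy of the data (values from the slides, construction assumed). Each end point is an (outer, inner) pair, and both ends are included:

```python
import pandas as pd

dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["Brown", "Black", "Brown", "Grey", "Black", "Tan", "White"],
})
dogs_srt = dogs.set_index(["breed", "color"]).sort_index()

# Start and end are full (breed, color) tuples.
rows = dogs_srt.loc[("Labrador", "Brown"):("Schnauzer", "Grey")]

print(rows.index.tolist())
print(rows["name"].tolist())
```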

Slicing columns

dogs_srt.loc[:, "name":"height_cm"]
                      name  height_cm
breed       color                    
Chihuahua   Tan     Stella         18
Chow Chow   Brown     Lucy         46
Labrador    Black      Max         59
            Brown    Bella         56
Poodle      Black  Charlie         43
Schnauzer   Grey    Cooper         49
St. Bernard White   Bernie         77

Full dataset

                      name  height_cm  weight_kg
breed       color                               
Chihuahua   Tan     Stella         18          2
Chow Chow   Brown     Lucy         46         22
Labrador    Black      Max         59         29
            Brown    Bella         56         25
Poodle      Black  Charlie         43         23
Schnauzer   Grey    Cooper         49         17
St. Bernard White   Bernie         77         74

Slice twice

dogs_srt.loc[
    ("Labrador", "Brown"):("Schnauzer", "Grey"), 
    "name":"height_cm"]
                    name  height_cm
breed     color                    
Labrador  Brown    Bella         56
Poodle    Black  Charlie         43
Schnauzer Grey    Cooper         49

Full dataset

                      name  height_cm  weight_kg
breed       color                               
Chihuahua   Tan     Stella         18          2
Chow Chow   Brown     Lucy         46         22
Labrador    Black      Max         59         29
            Brown    Bella         56         25
Poodle      Black  Charlie         43         23
Schnauzer   Grey    Cooper         49         17
St. Bernard White   Bernie         77         74
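
A runnable sketch of slicing rows and columns in one call (values from the slides, construction assumed). The first argument to .loc slices the row index, the second slices the column labels:

```python
import pandas as pd

dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["Brown", "Black", "Brown", "Grey", "Black", "Tan", "White"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [25, 23, 22, 17, 29, 2, 74],
})
dogs_srt = dogs.set_index(["breed", "color"]).sort_index()

# Rows by (breed, color) tuples, columns by label; all end points included.
both = dogs_srt.loc[
    ("Labrador", "Brown"):("Schnauzer", "Grey"),
    "name":"height_cm",
]

print(both.columns.tolist())  # ['name', 'height_cm']
print(both.shape)             # (3, 2)
```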

Dog days

dogs = dogs.set_index("date_of_birth").sort_index()
print(dogs)
                  name        breed  color  height_cm  weight_kg
date_of_birth                                                   
2011-12-11      Cooper    Schnauzer   Grey         49         17
2013-07-01       Bella     Labrador  Brown         56         25
2014-08-25        Lucy    Chow Chow  Brown         46         22
2015-04-20      Stella    Chihuahua    Tan         18          2
2016-09-16     Charlie       Poodle  Black         43         23
2017-01-20         Max     Labrador  Black         59         29
2018-02-27      Bernie  St. Bernard  White         77         74

Slicing by dates

# Get dogs with date_of_birth between 2014-08-25 and 2016-09-16
dogs.loc["2014-08-25":"2016-09-16"]
                  name      breed  color  height_cm  weight_kg
date_of_birth                                                 
2014-08-25        Lucy  Chow Chow  Brown         46         22
2015-04-20      Stella  Chihuahua    Tan         18          2
2016-09-16     Charlie     Poodle  Black         43         23

Slicing by partial dates

# Get dogs with date_of_birth between 2014-01-01 and 2016-12-31
dogs.loc["2014":"2016"]
                 name      breed  color  height_cm  weight_kg
date_of_birth                                                
2014-08-25       Lucy  Chow Chow  Brown         46         22
2015-04-20     Stella  Chihuahua    Tan         18          2
2016-09-16    Charlie     Poodle  Black         43         23
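
A sketch of both date slices on a reduced copy of the data. It assumes date_of_birth is stored as datetimes (converted here with pd.to_datetime); the slides do not show the column's dtype, and partial-date slicing depends on it.

```python
import pandas as pd

dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    # Assumption: dates are parsed to datetimes, not kept as strings.
    "date_of_birth": pd.to_datetime([
        "2013-07-01", "2016-09-16", "2014-08-25", "2011-12-11",
        "2017-01-20", "2015-04-20", "2018-02-27",
    ]),
})
dogs = dogs.set_index("date_of_birth").sort_index()

# Full dates: both end points are included.
full = dogs.loc["2014-08-25":"2016-09-16"]

# Partial dates: "2016" covers the whole of 2016.
partial = dogs.loc["2014":"2016"]

print(full["name"].tolist())
print(partial["name"].tolist())
```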

Subsetting by row/column number

print(dogs.iloc[2:5, 1:4])
       breed  color  height_cm
2  Chow Chow  Brown         46
3  Schnauzer   Grey         49
4   Labrador  Black         59

Full dataset

      name        breed  color  height_cm  weight_kg
0    Bella     Labrador  Brown         56         25
1  Charlie       Poodle  Black         43         23
2     Lucy    Chow Chow  Brown         46         22
3   Cooper    Schnauzer   Grey         49         17
4      Max     Labrador  Black         59         29
5   Stella    Chihuahua    Tan         18          2
6   Bernie  St. Bernard  White         77         74
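
A sketch of the same .iloc call, assuming dogs is the original DataFrame with its default integer index (not the date-indexed version from the previous slides). Unlike .loc, the end positions are excluded, as with list slicing:

```python
import pandas as pd

dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["Brown", "Black", "Brown", "Grey", "Black", "Tan", "White"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [25, 23, 22, 17, 29, 2, 74],
})

# Rows at positions 2, 3, 4 and columns at positions 1, 2, 3.
subset = dogs.iloc[2:5, 1:4]

print(subset.index.tolist())    # [2, 3, 4]
print(subset.columns.tolist())  # ['breed', 'color', 'height_cm']
```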

Let's practice!
