Slicing and subsetting with .loc and .iloc

Data Manipulation with pandas

Richie Cotton

Data Evangelist at DataCamp

Slicing lists

breeds = ["Labrador", "Poodle", 
          "Chow Chow", "Schnauzer", 
          "Labrador", "Chihuahua", 
          "St. Bernard"]
['Labrador',
 'Poodle',
 'Chow Chow',
 'Schnauzer',
 'Labrador',
 'Chihuahua',
 'St. Bernard']
breeds[2:5]
['Chow Chow', 'Schnauzer', 'Labrador']
breeds[:3]
['Labrador', 'Poodle', 'Chow Chow']
breeds[:]
['Labrador','Poodle','Chow Chow','Schnauzer',
 'Labrador','Chihuahua','St. Bernard']

Sort the index before you slice

dogs_srt = dogs.set_index(["breed", "color"]).sort_index()
print(dogs_srt)
                      name  height_cm  weight_kg
breed       color                               
Chihuahua   Tan     Stella         18          2
Chow Chow   Brown     Lucy         46         22
Labrador    Black      Max         59         29
            Brown    Bella         56         25
Poodle      Black  Charlie         43         23
Schnauzer   Grey    Cooper         49         17
St. Bernard White   Bernie         77         74
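
The slides use `dogs` without showing how it is built. The sketch below rebuilds it, assuming the values shown in the "Full dataset" tables, and checks that the two-level index is only in sorted order after `sort_index()`. The name `unsorted` is introduced here for illustration.

```python
import pandas as pd

# Values copied from the "Full dataset" tables on the slides (an assumption
# about how the original `dogs` DataFrame was constructed).
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["Brown", "Black", "Brown", "Grey", "Black", "Tan", "White"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [25, 23, 22, 17, 29, 2, 74],
})

# Same index columns, with and without sorting.
unsorted = dogs.set_index(["breed", "color"])
dogs_srt = unsorted.sort_index()

print(unsorted.index.is_monotonic_increasing)  # False
print(dogs_srt.index.is_monotonic_increasing)  # True
```

Label slices on a multi-level index generally rely on this sorted order, which is why the sort comes first.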

Slicing the outer index level

dogs_srt.loc["Chow Chow":"Poodle"]
                    name  height_cm  weight_kg
breed     color                               
Chow Chow Brown     Lucy         46         22
Labrador  Black      Max         59         29
          Brown    Bella         56         25
Poodle    Black  Charlie         43         23

The end value **"Poodle"** is included

Full dataset

                      name  height_cm  weight_kg
breed       color                               
Chihuahua   Tan     Stella         18          2
Chow Chow   Brown     Lucy         46         22
Labrador    Black      Max         59         29
            Brown    Bella         56         25
Poodle      Black  Charlie         43         23
Schnauzer   Grey    Cooper         49         17
St. Bernard White   Bernie         77         74
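
The difference between list slicing and `.loc` slicing can be seen side by side. This is a minimal sketch, assuming `dogs` holds the values from the "Full dataset" table: a positional list slice stops before its end position, while a label slice with `.loc` keeps the end label.

```python
import pandas as pd

breeds = ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
          "Labrador", "Chihuahua", "St. Bernard"]

# Positional slice: position 5 ("Chihuahua") is NOT included.
print(breeds[2:5])

# Data assumed from the "Full dataset" table on the slides.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": breeds,
    "color": ["Brown", "Black", "Brown", "Grey", "Black", "Tan", "White"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [25, 23, 22, 17, 29, 2, 74],
})
dogs_srt = dogs.set_index(["breed", "color"]).sort_index()

# Label slice on the outer level: the end label "Poodle" IS included.
subset = dogs_srt.loc["Chow Chow":"Poodle"]
print(subset)
```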

Slicing the inner index levels badly

dogs_srt.loc["Tan":"Grey"]
Empty DataFrame
Columns: [name, height_cm, weight_kg]
Index: []

Full dataset

                      name  height_cm  weight_kg
breed       color                               
Chihuahua   Tan     Stella         18          2
Chow Chow   Brown     Lucy         46         22
Labrador    Black      Max         59         29
            Brown    Bella         56         25
Poodle      Black  Charlie         43         23
Schnauzer   Grey    Cooper         49         17
St. Bernard White   Bernie         77         74

Slicing the inner index levels correctly

dogs_srt.loc[
    ("Labrador", "Brown"):("Schnauzer", "Grey")]
                    name  height_cm  weight_kg
breed     color                               
Labrador  Brown    Bella         56         25
Poodle    Black  Charlie         43         23
Schnauzer Grey    Cooper         49         17

Full dataset

                      name  height_cm  weight_kg
breed       color                               
Chihuahua   Tan     Stella         18          2
Chow Chow   Brown     Lucy         46         22
Labrador    Black      Max         59         29
            Brown    Bella         56         25
Poodle      Black  Charlie         43         23
Schnauzer   Grey    Cooper         49         17
St. Bernard White   Bernie         77         74
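
The two attempts at inner-level slicing can be compared in one runnable sketch, assuming the data from the "Full dataset" table. Bare strings are matched against the outer level (`breed`), so color names find nothing. Tuples of `(breed, color)` give pandas a start and end position across both levels. The names `bad` and `good` are illustrative.

```python
import pandas as pd

# Data assumed from the "Full dataset" table on the slides.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["Brown", "Black", "Brown", "Grey", "Black", "Tan", "White"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [25, 23, 22, 17, 29, 2, 74],
})
dogs_srt = dogs.set_index(["breed", "color"]).sort_index()

# Colors are compared against breed names, so no rows fall in the range.
bad = dogs_srt.loc["Tan":"Grey"]
print(bad.empty)

# Tuples address both levels: from (Labrador, Brown) through (Schnauzer, Grey).
good = dogs_srt.loc[("Labrador", "Brown"):("Schnauzer", "Grey")]
print(good)
```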

Slicing columns

dogs_srt.loc[:, "name":"height_cm"]
                      name  height_cm
breed       color                    
Chihuahua   Tan     Stella         18
Chow Chow   Brown     Lucy         46
Labrador    Black      Max         59
            Brown    Bella         56
Poodle      Black  Charlie         43
Schnauzer   Grey    Cooper         49
St. Bernard White   Bernie         77

Full dataset

                      name  height_cm  weight_kg
breed       color                               
Chihuahua   Tan     Stella         18          2
Chow Chow   Brown     Lucy         46         22
Labrador    Black      Max         59         29
            Brown    Bella         56         25
Poodle      Black  Charlie         43         23
Schnauzer   Grey    Cooper         49         17
St. Bernard White   Bernie         77         74

Slice twice

dogs_srt.loc[
    ("Labrador", "Brown"):("Schnauzer", "Grey"), 
    "name":"height_cm"]
                    name  height_cm
breed     color                    
Labrador  Brown    Bella         56
Poodle    Black  Charlie         43
Schnauzer Grey    Cooper         49

Full dataset

                      name  height_cm  weight_kg
breed       color                               
Chihuahua   Tan     Stella         18          2
Chow Chow   Brown     Lucy         46         22
Labrador    Black      Max         59         29
            Brown    Bella         56         25
Poodle      Black  Charlie         43         23
Schnauzer   Grey    Cooper         49         17
St. Bernard White   Bernie         77         74
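
Column slicing and combined row/column slicing follow the same pattern: `.loc` takes the row slice first and the column slice second, and `:` alone keeps everything on that axis. This is a small sketch under the same assumed data, with illustrative names `cols` and `both`.

```python
import pandas as pd

# Data assumed from the "Full dataset" table on the slides.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["Brown", "Black", "Brown", "Grey", "Black", "Tan", "White"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [25, 23, 22, 17, 29, 2, 74],
})
dogs_srt = dogs.set_index(["breed", "color"]).sort_index()

# All rows, columns from "name" through "height_cm" (end label included).
cols = dogs_srt.loc[:, "name":"height_cm"]

# Row slice and column slice in a single .loc call.
both = dogs_srt.loc[("Labrador", "Brown"):("Schnauzer", "Grey"),
                    "name":"height_cm"]
print(cols.shape, both.shape)
```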

Dog days

dogs = dogs.set_index("date_of_birth").sort_index()
print(dogs)
                  name        breed  color  height_cm  weight_kg
date_of_birth                                                   
2011-12-11      Cooper    Schnauzer   Grey         49         17
2013-07-01       Bella     Labrador  Brown         56         25
2014-08-25        Lucy    Chow Chow  Brown         46         22
2015-04-20      Stella    Chihuahua    Tan         18          2
2016-09-16     Charlie       Poodle  Black         43         23
2017-01-20         Max     Labrador  Black         59         29
2018-02-27      Bernie  St. Bernard  White         77         74

Slicing by dates

# Get dogs with date_of_birth between 2014-08-25 and 2016-09-16
dogs.loc["2014-08-25":"2016-09-16"]
                  name      breed  color  height_cm  weight_kg
date_of_birth                                                 
2014-08-25        Lucy  Chow Chow  Brown         46         22
2015-04-20      Stella  Chihuahua    Tan         18          2
2016-09-16     Charlie     Poodle  Black         43         23

Slicing by partial dates

# Get dogs with date_of_birth between 2014-01-01 and 2016-12-31
dogs.loc["2014":"2016"]
                 name      breed  color  height_cm  weight_kg
date_of_birth                                                
2014-08-25       Lucy  Chow Chow  Brown         46         22
2015-04-20     Stella  Chihuahua    Tan         18          2
2016-09-16    Charlie     Poodle  Black         43         23
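
Both date slices can be reproduced with the sketch below. It assumes that `date_of_birth` is stored as a datetime column (here via `pd.to_datetime`), which the slides do not state. With a datetime index, a partial string such as `"2016"` is treated as the whole year, so the slice runs through 2016-12-31. The names `exact` and `partial` are illustrative.

```python
import pandas as pd

# Values assumed from the tables on the slides.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "date_of_birth": pd.to_datetime([
        "2013-07-01", "2016-09-16", "2014-08-25", "2011-12-11",
        "2017-01-20", "2015-04-20", "2018-02-27",
    ]),
})
dogs = dogs.set_index("date_of_birth").sort_index()

# Full dates: both endpoints are included.
exact = dogs.loc["2014-08-25":"2016-09-16"]

# Partial dates: "2014" starts at 2014-01-01, "2016" ends at 2016-12-31.
partial = dogs.loc["2014":"2016"]
print(partial)
```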

Subsetting by row/column number

print(dogs.iloc[2:5, 1:4])
       breed  color  height_cm
2  Chow Chow  Brown         46
3  Schnauzer   Grey         49
4   Labrador  Black         59

Full dataset

      name        breed  color  height_cm  weight_kg
0    Bella     Labrador  Brown         56         25
1  Charlie       Poodle  Black         43         23
2     Lucy    Chow Chow  Brown         46         22
3   Cooper    Schnauzer   Grey         49         17
4      Max     Labrador  Black         59         29
5   Stella    Chihuahua    Tan         18          2
6   Bernie  St. Bernard  White         77         74
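
Unlike `.loc`, `.iloc` slices by position and follows the list rule: the end position is excluded. This is a minimal sketch, assuming the default integer index shown in the "Full dataset" table.

```python
import pandas as pd

# Data assumed from the "Full dataset" table on the slides.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["Brown", "Black", "Brown", "Grey", "Black", "Tan", "White"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [25, 23, 22, 17, 29, 2, 74],
})

# Rows at positions 2, 3, 4 and columns at positions 1, 2, 3.
subset = dogs.iloc[2:5, 1:4]
print(subset)
```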

Let's practice!
