Handling missing data

Reshaping Data with pandas

Maria Eugenia Inzaugarat

Data Scientist

Review

  • Stack and unstack DataFrames:
    • All column index levels
    • A row index level
    • Choose which levels to stack or unstack
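
As a quick refresher on the points above, here is a minimal sketch. The frame, the column level names `group` and `action`, and the label `moves` are made up for illustration and are not part of the course data.

```python
import pandas as pd

# Hypothetical frame with a two-level row index and a two-level column index.
index = pd.MultiIndex.from_tuples(
    [("Mammalia", "Dog"), ("Mammalia", "Kangaroo"), ("Aves", "Avocet")],
    names=["class", "name"],
)
columns = pd.MultiIndex.from_tuples(
    [("moves", "jump"), ("moves", "run")], names=["group", "action"]
)
df = pd.DataFrame(
    [["No", "Yes"], ["Yes", "No"], ["No", "No"]], index=index, columns=columns
)

# Stack all column index levels: every cell becomes one row of a long result.
all_stacked = df.stack(level=["group", "action"])

# Unstack one row index level, chosen by name: 'class' moves into the columns.
by_class = df.unstack(level="class")

print(all_stacked)
print(by_class)
```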

Unstacking leads to missing values

Subgroups do not have the same set of labels

animals
                                   jump  run  fly
class    order           name
Mammalia carnivora       Dog         No  Yes   No
         Diprotodontia   Kangaroo   Yes   No   No
Aves     Charadriiformes Avocet      No   No  Yes

animals.unstack(level='class')
                         jump          run           fly
class                    Aves Mammalia Aves Mammalia Aves Mammalia
order           name
Diprotodontia   Kangaroo  NaN      Yes  NaN       No  NaN       No
carnivora       Dog       NaN       No  NaN      Yes  NaN       No
Charadriiformes Avocet     No      NaN   No      NaN  Yes      NaN
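
To see where the NaN values come from, the frame can be rebuilt and inspected. This is a sketch that assumes `animals` has the three rows shown in the output above; the construction code is not from the slides.

```python
import pandas as pd

# Assumed reconstruction of the animals DataFrame (three-level row index).
index = pd.MultiIndex.from_tuples(
    [
        ("Mammalia", "carnivora", "Dog"),
        ("Mammalia", "Diprotodontia", "Kangaroo"),
        ("Aves", "Charadriiformes", "Avocet"),
    ],
    names=["class", "order", "name"],
)
animals = pd.DataFrame(
    {
        "jump": ["No", "Yes", "No"],
        "run": ["Yes", "No", "No"],
        "fly": ["No", "No", "Yes"],
    },
    index=index,
)

unstacked = animals.unstack(level="class")

# Each (order, name) pair belongs to only one class, so the cells under the
# other class have nothing to show and are filled with NaN.
missing_per_column = unstacked.isna().sum()
print(unstacked)
print(missing_per_column)
```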

Handling NaN with unstack

animals.unstack(level='class', fill_value='No').sort_index(level=['order', 'name'], ascending=[True, False])
                         jump          run           fly
class                    Aves Mammalia Aves Mammalia Aves Mammalia
order           name
Diprotodontia   Kangaroo   No      Yes   No       No   No       No
carnivora       Dog        No       No   No      Yes   No       No
Charadriiformes Avocet     No       No   No       No  Yes       No
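
The same call can be tried end to end. This is a sketch on an assumed reconstruction of `animals`; only the `unstack` and `sort_index` calls come from the slide.

```python
import pandas as pd

# Assumed reconstruction of the animals DataFrame from the slides.
index = pd.MultiIndex.from_tuples(
    [
        ("Mammalia", "carnivora", "Dog"),
        ("Mammalia", "Diprotodontia", "Kangaroo"),
        ("Aves", "Charadriiformes", "Avocet"),
    ],
    names=["class", "order", "name"],
)
animals = pd.DataFrame(
    {
        "jump": ["No", "Yes", "No"],
        "run": ["Yes", "No", "No"],
        "fly": ["No", "No", "Yes"],
    },
    index=index,
)

# fill_value replaces the cells that unstack would otherwise leave as NaN.
filled = animals.unstack(level="class", fill_value="No")

# Sort the remaining row levels: 'order' ascending, 'name' descending.
# String labels sort by code point, so capitalised labels come before
# lowercase ones.
ordered = filled.sort_index(level=["order", "name"], ascending=[True, False])
print(ordered)
```

Note that `fill_value` only covers gaps created by the reshape itself, so a single value has to make sense for every column.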

Stack and missing values

Combinations of index and column values missing from the original DataFrame

flowers
     petals Stigma
     number   size
rose     40    NaN
Lily      8    5.0

flowers.stack()
            Stigma  petals
rose number    NaN    40.0
Lily number    NaN     8.0
     size      5.0     NaN

flowers.stack(dropna=True)
            Stigma  petals
rose number    NaN    40.0
Lily number    NaN     8.0
     size      5.0     NaN

flowers.stack(dropna=False)
            Stigma  petals
rose number    NaN    40.0
     size      NaN     NaN
Lily number    NaN     8.0
     size      5.0     NaN

Handling NaN with stack

flowers.stack(dropna=False).fillna(0)
            Stigma  petals
rose number    0.0    40.0
     size      0.0     0.0
Lily number    0.0     8.0
     size      5.0     0.0
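
The `dropna` argument belongs to the original `stack` implementation. pandas 2.1 added a `future_stack=True` option, and under it `dropna` must be left unspecified. The sketch below is one possible way to get the same three views without passing `dropna`. It swaps in `reindex` and `DataFrame.dropna`, which is not the approach used on the slides, and the construction of `flowers` is an assumed reconstruction.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the flowers DataFrame (two-level columns).
columns = pd.MultiIndex.from_tuples([("petals", "number"), ("Stigma", "size")])
flowers = pd.DataFrame(
    [[40, np.nan], [8, 5]], index=["rose", "Lily"], columns=columns
)

stacked = flowers.stack()

# Every (flower, measurement) pair, whether or not the original had a value.
all_pairs = pd.MultiIndex.from_product([flowers.index, ["number", "size"]])

kept = stacked.reindex(all_pairs)   # comparable to stack(dropna=False)
dropped = kept.dropna(how="all")    # comparable to stack(dropna=True)
filled = kept.fillna(0)             # replace the remaining NaN with 0

print(kept)
print(dropped)
print(filled)
```

Unlike `fill_value` in `unstack`, `fillna` also replaces values that were already missing in the original data, such as the stigma size of the rose.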

Let's practice!
