Handling missing data

Reshaping Data with pandas

Maria Eugenia Inzaugarat

Data Scientist

Review

  • Stack and unstack DataFrames:
    • All column index levels
    • A row index level
    • Choose which levels to stack or unstack

Unstacking leads to missing values

Subgroups do not have the same set of labels

animals
                                   jump  run  fly
class    order           name
Mammalia carnivora       Dog         No  Yes   No
         Diprotodontia   Kangaroo   Yes   No   No
Aves     Charadriiformes Avocet      No   No  Yes

animals.unstack(level='class')
                          jump             run             fly
class                     Aves  Mammalia  Aves  Mammalia  Aves  Mammalia
order           name
Diprotodontia   Kangaroo   NaN       Yes   NaN        No   NaN        No
carnivora       Dog        NaN        No   NaN       Yes   NaN        No
Charadriiformes Avocet      No       NaN    No       NaN   Yes       NaN
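
The effect above can be reproduced with a small sketch. The construction of `animals` below is an assumption, rebuilt from the labels and values printed in the unstacked output; the slides do not show how the DataFrame was created.

```python
import pandas as pd

# Hypothetical reconstruction of `animals` (not shown in the slides).
index = pd.MultiIndex.from_tuples(
    [
        ("Mammalia", "carnivora", "Dog"),
        ("Mammalia", "Diprotodontia", "Kangaroo"),
        ("Aves", "Charadriiformes", "Avocet"),
    ],
    names=["class", "order", "name"],
)
animals = pd.DataFrame(
    {
        "jump": ["No", "Yes", "No"],
        "run": ["Yes", "No", "No"],
        "fly": ["No", "No", "Yes"],
    },
    index=index,
)

# Moving 'class' into the columns creates one column per (ability, class) pair.
unstacked = animals.unstack(level="class")

# Dog only exists under Mammalia, so every Aves column is NaN in its row.
dog = unstacked.loc[("carnivora", "Dog")]
print(dog)

# Total number of cells that have no counterpart in the original data.
print(unstacked.isna().sum().sum())
```

Each animal belongs to a single class, so half of the cells in the unstacked result have no source value and come out as NaN.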

Handling NaN with unstack

animals.unstack(level='class', fill_value='No')

animals.unstack(level='class', fill_value='No').sort_index(level=['order', 'name'], ascending=[True, False])
                          jump             run             fly
class                     Aves  Mammalia  Aves  Mammalia  Aves  Mammalia
order           name
Diprotodontia   Kangaroo    No       Yes    No        No    No        No
carnivora       Dog         No        No    No       Yes    No        No
Charadriiformes Avocet      No        No    No        No   Yes        No
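
A minimal sketch of the `fill_value` argument in action. As before, the construction of `animals` is an assumption rebuilt from the printed output, and the variable names `filled` and `ordered` are illustrative only.

```python
import pandas as pd

# Hypothetical reconstruction of `animals` (not shown in the slides).
index = pd.MultiIndex.from_tuples(
    [
        ("Mammalia", "carnivora", "Dog"),
        ("Mammalia", "Diprotodontia", "Kangaroo"),
        ("Aves", "Charadriiformes", "Avocet"),
    ],
    names=["class", "order", "name"],
)
animals = pd.DataFrame(
    {
        "jump": ["No", "Yes", "No"],
        "run": ["Yes", "No", "No"],
        "fly": ["No", "No", "Yes"],
    },
    index=index,
)

# fill_value is used for the cells that unstacking would otherwise leave as NaN.
filled = animals.unstack(level="class", fill_value="No")

# Optionally order the rows: 'order' ascending, 'name' descending.
ordered = filled.sort_index(level=["order", "name"], ascending=[True, False])
print(ordered)

# No missing values remain after the reshape.
print(filled.isna().sum().sum())
```

Filling with `'No'` fits this data because an animal that is not in a class cannot have that class's ability. For other data, a neutral filler may hide the difference between "absent" and "false".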

Stack and missing values

Combinations of index and column values missing from the original DataFrame

flowers
     petals Stigma
     number   size
rose     40    NaN
Lily      8      5

flowers.stack()
            Stigma  petals
rose number    NaN    40.0
Lily number    NaN     8.0
       size      5     NaN

flowers.stack(dropna=True)
            Stigma  petals
rose number    NaN    40.0
Lily number    NaN     8.0
       size      5     NaN

flowers.stack(dropna=False)
            Stigma  petals
rose number    NaN    40.0
       size    NaN     NaN <--
Lily number    NaN     8.0
       size      5     NaN
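
A hedged sketch of keeping the all-NaN row. The construction of `flowers` is an assumption based on the printed table. Recent pandas releases (2.1 and later) deprecate the `dropna` argument in favour of a new stacking implementation that keeps such rows on its own, so the sketch falls back to a plain `stack()` if `dropna` is rejected; that fallback is also an assumption about the installed version.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical reconstruction of `flowers` (not shown in the slides).
columns = pd.MultiIndex.from_tuples([("petals", "number"), ("Stigma", "size")])
flowers = pd.DataFrame(
    [[40, np.nan], [8, 5]],
    index=["rose", "Lily"],
    columns=columns,
)

with warnings.catch_warnings():
    # Silence the deprecation notice that newer pandas emits for the old stack.
    warnings.simplefilter("ignore")
    try:
        # Keep rows in which every value is missing.
        stacked = flowers.stack(dropna=False)
    except (TypeError, ValueError):
        # Assumption: this pandas version no longer accepts `dropna`
        # and already keeps all rows by default.
        stacked = flowers.stack()

print(stacked)

# The (rose, size) row is kept even though both of its values are NaN.
print(stacked.loc[("rose", "size")].isna().all())
```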

Handling NaN with stack

flowers.stack(dropna=False).fillna(0)
            Stigma  petals
rose number      0    40.0
       size      0       0
Lily number      0     8.0
       size      5       0
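
A minimal sketch of the fill step, under the same assumptions as the rest of this section: `flowers` is rebuilt from the printed table, and the fallback to a plain `stack()` covers pandas versions that no longer accept `dropna`.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical reconstruction of `flowers` (not shown in the slides).
columns = pd.MultiIndex.from_tuples([("petals", "number"), ("Stigma", "size")])
flowers = pd.DataFrame(
    [[40, np.nan], [8, 5]],
    index=["rose", "Lily"],
    columns=columns,
)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    try:
        stacked = flowers.stack(dropna=False)
    except (TypeError, ValueError):
        # Assumption: newer pandas keeps all rows without `dropna`.
        stacked = flowers.stack()

# Replace every NaN, whatever its origin, with 0.
filled = stacked.fillna(0)
print(filled)
```

Note that `fillna(0)` does not distinguish between NaN introduced by stacking and NaN already present in the data: the rose's stigma size was missing in `flowers` itself and also becomes 0.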

Let's practice!
