Stacking DataFrames

Reshaping Data with pandas

Maria Eugenia Inzaugarat

Data Scientist

Row multi-indices

  Figure: a DataFrame with a row MultiIndex

Setting the index

churn
   credit_score age  country  num_products exited
0           619  43   France             1    Yes
1           608  34  Germany             0     No
2           502  23   France             1    Yes

Setting the index

churn.set_index(['country', 'age'], inplace=True)
churn
              credit_score num_products exited
country  age  
 France   43           619            1    Yes
Germany   34           608            0     No
 France   23           502            1    Yes
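
A minimal sketch of the same step, assuming the three rows shown on the slide are the whole of churn. It uses the non-inplace form of set_index, so the name churn_indexed is introduced here for illustration only.

```python
import pandas as pd

# churn rebuilt from the three rows printed on the slide (assumed to be complete)
churn = pd.DataFrame({
    "credit_score": [619, 608, 502],
    "age": [43, 34, 23],
    "country": ["France", "Germany", "France"],
    "num_products": [1, 0, 1],
    "exited": ["Yes", "No", "Yes"],
})

# Without inplace=True, set_index returns a new DataFrame and leaves churn unchanged
churn_indexed = churn.set_index(["country", "age"])

# The index levels follow the order of the list passed to set_index
print(list(churn_indexed.index.names))

# A row is addressed with one value per index level
print(churn_indexed.loc[("Germany", 34), "credit_score"])  # 608
```

The two columns moved into the index no longer appear among the regular columns.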

MultiIndex from array

new_array = [['yes', 'no', 'yes'], ['no', 'yes', 'yes']]

churn.index = pd.MultiIndex.from_arrays(new_array, names=['member', 'credit_card'])
churn
                    credit_score age  country  num_products exited
member credit_card           
   yes          no           619  43   France             1    Yes
    no         yes           608  34  Germany             0     No
   yes         yes           502  23   France             1    Yes
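
As a sketch of how from_arrays lines up with the rows: each inner list supplies one index level, and position i across the lists forms the label of row i. The churn frame below is rebuilt from the three rows on the slide, which is an assumption about its full contents.

```python
import pandas as pd

# churn rebuilt from the three rows printed on the slide (assumed to be complete)
churn = pd.DataFrame({
    "credit_score": [619, 608, 502],
    "age": [43, 34, 23],
    "country": ["France", "Germany", "France"],
    "num_products": [1, 0, 1],
    "exited": ["Yes", "No", "Yes"],
})

# One inner list per index level; each list has one entry per row of churn
new_array = [["yes", "no", "yes"], ["no", "yes", "yes"]]
churn.index = pd.MultiIndex.from_arrays(new_array, names=["member", "credit_card"])

print(churn.index.nlevels)   # 2
print(churn.index.tolist())  # one (member, credit_card) tuple per row
```

Assigning to churn.index replaces the default 0, 1, 2 labels and keeps every column in place.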

MultiIndex DataFrames

  Figure: a DataFrame with a MultiIndex on both rows and columns

MultiIndex DataFrames

index = pd.MultiIndex.from_arrays([['Wick', 'Wick', 'Shelley', 'Shelley'],
                                   ['John', 'Julien', 'Mary', 'Frank']], 
                      names=['last', 'first'])
columns = pd.MultiIndex.from_arrays([['2019', '2019', '2020', '2020'],
                                     ['age', 'weight', 'age', 'weight']],
                        names=['year', 'feature'])

patients = pd.DataFrame(data, index=index, columns=columns)
patients
            year        2019        2020
         feature  age weight  age weight
   last    first
   Wick     John   25     68   26     72
          Julien   31     72   32     73
Shelley     Mary   41     68   42     69
           Frank   32     75   33     74
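
The slide does not show how data is defined. The sketch below assumes a nested list whose values are read off the printed table, so that the construction can be run end to end.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["Wick", "Wick", "Shelley", "Shelley"], ["John", "Julien", "Mary", "Frank"]],
    names=["last", "first"],
)
columns = pd.MultiIndex.from_arrays(
    [["2019", "2019", "2020", "2020"], ["age", "weight", "age", "weight"]],
    names=["year", "feature"],
)

# Assumption: data is a 4x4 nested list matching the printed output, one inner list per patient
data = [[25, 68, 26, 72], [31, 72, 32, 73], [41, 68, 42, 69], [32, 75, 33, 74]]

patients = pd.DataFrame(data, index=index, columns=columns)

# Both axes take a tuple: (last, first) for the row and (year, feature) for the column
print(patients.loc[("Wick", "John"), ("2020", "weight")])  # 72
```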

The .stack() method

Figure: the stack method turns a DataFrame with a two-level row index into a DataFrame with a three-level row index

The .stack() method

Moves a level of the column labels into the row index, producing a reshaped DataFrame (or Series) whose row index has a new innermost level

Figure: squares highlighting how the index is rearranged
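
To make the rearrangement concrete, here is a small sketch on a hypothetical two-row frame. The name small and its contents are illustrative only, with values borrowed from the patients table.

```python
import pandas as pd

# Hypothetical frame: one row-index level ("first") and one level of column labels
small = pd.DataFrame(
    {"age": [25, 31], "weight": [68, 72]},
    index=pd.Index(["John", "Julien"], name="first"),
)

# The only column level moves into the row index as its innermost level
stacked = small.stack()

print(type(stacked).__name__)              # Series
print(stacked.index.nlevels)               # 2
print(stacked.loc[("Julien", "weight")])   # 72
```

With no column level left over, the result is a Series. A frame with two column levels keeps one of them and stays a DataFrame.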

Stack into a Series

churn
                    credit_score age  country  num_products exited
member credit_card           
   yes          no           619  43   France             1    Yes
    no         yes           608  34  Germany             0     No
   yes         yes           502  23   France             1    Yes
churned_stacked = churn.stack()
churned_stacked.head(10)
member  credit_card              
yes     no           credit_score        619
                     age                  43
                     country          France
                     num_products          1
                     exited              Yes
no      yes          credit_score        608
                     age                  34
                     country         Germany
                     num_products          0
                     exited               No
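
A runnable sketch of this step, assuming churn carries the member and credit_card MultiIndex built earlier and contains only the three rows shown. Because churn has a single level of column labels, stacking leaves no columns and the result is a Series.

```python
import pandas as pd

# churn rebuilt from the slide (assumed complete), with the two-level row index
churn = pd.DataFrame({
    "credit_score": [619, 608, 502],
    "age": [43, 34, 23],
    "country": ["France", "Germany", "France"],
    "num_products": [1, 0, 1],
    "exited": ["Yes", "No", "Yes"],
})
churn.index = pd.MultiIndex.from_arrays(
    [["yes", "no", "yes"], ["no", "yes", "yes"]],
    names=["member", "credit_card"],
)

churned_stacked = churn.stack()

# 3 rows x 5 columns become 15 entries, each labeled (member, credit_card, column name)
print(len(churned_stacked))
print(churned_stacked.index.nlevels)
print(churned_stacked.loc[("yes", "no", "exited")])  # Yes
```

Since the columns mix numbers and strings, the stacked Series holds generic Python objects rather than a numeric dtype.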

Stack into a DataFrame

patients

            year        2019        2020
         feature  age weight  age weight
   last    first
   Wick     John   25     68   26     72
          Julien   31     72   32     73
Shelley     Mary   41     68   42     69
           Frank   32     75   33     74
patients_stacked = patients.stack()
patients_stacked
                   year  2019 2020
  last   first  feature
  Wick    John      age    25   26 
                 weight    68   72
        Julien      age    31   32
                 weight    72   73
Shelley  Mary       age    41   42
                 weight    68   69
        Frank       age    32   33
                 weight    75   74
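
The same call as a self-contained sketch. The data list is an assumption reconstructed from the printed patients table. With no arguments, stack moves the innermost column level (here feature) into the row index.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["Wick", "Wick", "Shelley", "Shelley"], ["John", "Julien", "Mary", "Frank"]],
    names=["last", "first"],
)
columns = pd.MultiIndex.from_arrays(
    [["2019", "2019", "2020", "2020"], ["age", "weight", "age", "weight"]],
    names=["year", "feature"],
)
# Assumption: values read off the printed table
data = [[25, 68, 26, 72], [31, 72, 32, 73], [41, 68, 42, 69], [32, 75, 33, 74]]
patients = pd.DataFrame(data, index=index, columns=columns)

patients_stacked = patients.stack()

# 4 rows x 2 features become 8 rows; the year level remains as the columns
print(patients_stacked.shape)               # (8, 2)
print(list(patients_stacked.index.names))   # ['last', 'first', 'feature']
print(patients_stacked.loc[("Shelley", "Mary", "weight"), "2020"])  # 69
```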

Stack a level by number

patients

            year        2019        2020
         feature  age weight  age weight
   last    first
   Wick     John   25     68   26     72
          Julien   31     72   32     73
Shelley     Mary   41     68   42     69
           Frank   32     75   33     74
patients.stack(level=0)
               feature   age weight
   last  first    year
   Wick   John    2019    25     68 
                  2020    26     72
        Julien    2019    31     72
                  2020    32     73
Shelley   Mary    2019    41     68
                  2020    42     69
         Frank    2019    32     75
                  2020    33     74
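
A sketch of stacking by position, again assuming the data list reconstructed from the printed table. Column levels are counted from the outside in, so level 0 is year and level 1 is feature.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["Wick", "Wick", "Shelley", "Shelley"], ["John", "Julien", "Mary", "Frank"]],
    names=["last", "first"],
)
columns = pd.MultiIndex.from_arrays(
    [["2019", "2019", "2020", "2020"], ["age", "weight", "age", "weight"]],
    names=["year", "feature"],
)
# Assumption: values read off the printed table
data = [[25, 68, 26, 72], [31, 72, 32, 73], [41, 68, 42, 69], [32, 75, 33, 74]]
patients = pd.DataFrame(data, index=index, columns=columns)

# level=0 selects the outermost column level (year)
stacked_year = patients.stack(level=0)

print(list(stacked_year.index.names))  # ['last', 'first', 'year']
print(list(stacked_year.columns))      # ['age', 'weight']
print(stacked_year.loc[("Wick", "Julien", "2020"), "weight"])  # 73
```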

Stack a level by name

patients

            year        2019        2020
         feature  age weight  age weight
   last    first
   Wick     John   25     68   26     72
          Julien   31     72   32     73
Shelley     Mary   41     68   42     69
           Frank   32     75   33     74
patients.stack(level='year')
               feature   age weight
   last  first    year               
   Wick   John    2019    25     68 
                  2020    26     72
        Julien    2019    31     72
                  2020    32     73
Shelley   Mary    2019    41     68
                  2020    42     69
         Frank    2019    32     75
                  2020    33     74
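
As a quick check, the sketch below compares stacking by name with stacking by number on the same frame. The data list is an assumption reconstructed from the printed table.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["Wick", "Wick", "Shelley", "Shelley"], ["John", "Julien", "Mary", "Frank"]],
    names=["last", "first"],
)
columns = pd.MultiIndex.from_arrays(
    [["2019", "2019", "2020", "2020"], ["age", "weight", "age", "weight"]],
    names=["year", "feature"],
)
# Assumption: values read off the printed table
data = [[25, 68, 26, 72], [31, 72, 32, 73], [41, 68, 42, 69], [32, 75, 33, 74]]
patients = pd.DataFrame(data, index=index, columns=columns)

by_name = patients.stack(level="year")
by_number = patients.stack(level=0)

# The name 'year' and the position 0 refer to the same column level
print(by_name.equals(by_number))  # True

# Likewise, 'feature' is the innermost level, which stack() uses by default
print(patients.stack(level="feature").equals(patients.stack()))  # True
```

Names tend to be the more readable choice, since they keep working if the order of the column levels changes.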

Let's practice!
