Pivot tables

Reshaping Data with pandas

Maria Eugenia Inzaugarat

Data Scientist

Pivot method limitations

another_fifa.head()
                 name    variable  metric_system  imperial_system
0   Cristiano Ronaldo      weight             83           183.00
1            J. Oblak      weight             87           191.00
2   Cristiano Ronaldo      height            187             6.13
3     J. Oblak             height            188             6.16
4   Cristiano Ronaldo      height            187             6.14
another_fifa.pivot(index="name", columns="variable")
Traceback (most recent call last):
   ValueError: Index contains duplicate entries, cannot reshape
Reshaping Data with pandas

Pivot method limitations

  • General purpose pivoting
  • Index/column pair must be unique
  • Cannot aggregate values
Reshaping Data with pandas

Pivot table

  • A DataFrame containing statistics that summarizes the data of a larger DataFrame

 

A DataFrame showing summary statistics

Reshaping Data with pandas

Pivot table

A DataFrame in long format

Reshaping Data with pandas

Pivot table

Arrow pointing from a long to a wide format with summary statistics

 

The pivot table call

Reshaping Data with pandas

Pivot table

A long and summary DataFrame with highlighted columns and indexes

 

The pivot call with index and columns arguments

Reshaping Data with pandas

Pivot table

A long and summary DataFrame with highlighted column and values

 

The pivot call with values and aggregate function arguments

Reshaping Data with pandas

Pivot table

another_fifa.pivot_table(index="name", columns="variable", aggfunc="mean")
                     metric_system     imperial_system       
         variable   height  weight     height   weight
             name                                                         
Cristiano Ronaldo      187      83      6.135    183.0
         J. Oblak      188      87      6.160    191.0
Reshaping Data with pandas

Hierarchical indexes

fifa_players.head(6)
      first     last   movement   overall  attacking
0    Lionel    Messi   shooting        92         70
1 Cristiano  Ronaldo   shooting        93         89
2    Lionel    Messi    passing        92         92
3 Cristiano  Ronaldo    passing        82         83
4    Lionel    Messi    passing        96         88
5 Cristiano  Ronaldo    passing        89         84
Reshaping Data with pandas

Hierarchical indexes

fifa_players.head(6)
      first     last   movement   overall  attacking
0    Lionel    Messi   shooting        92         70
1 Cristiano  Ronaldo   shooting        93         89
2    Lionel    Messi    passing        92         92
3 Cristiano  Ronaldo    passing        82         83
4    Lionel    Messi    passing        96         88
5 Cristiano  Ronaldo    passing        89         84
fifa_players.pivot_table(index=                 , columns="movement", values=                        , aggfunc=     )
Reshaping Data with pandas

Hierarchical indexes

fifa_players.head(6)
      first     last   movement   overall  attacking
0    Lionel    Messi   shooting        92         70
1 Cristiano  Ronaldo   shooting        93         89
2    Lionel    Messi    passing        92         92
3 Cristiano  Ronaldo    passing        82         83
4    Lionel    Messi    passing        96         88
5 Cristiano  Ronaldo    passing        89         84
fifa_players.pivot_table(index=["first", "last"], columns="movement", values=                        , aggfunc=     )
Reshaping Data with pandas

Hierarchical indexes

fifa_players.head(6)
      first     last   movement   overall  attacking
0    Lionel    Messi   shooting        92         70
1 Cristiano  Ronaldo   shooting        93         89
2    Lionel    Messi    passing        92         92
3 Cristiano  Ronaldo    passing        82         83
4    Lionel    Messi    passing        96         88
5 Cristiano  Ronaldo    passing        89         84
fifa_players.pivot_table(index=["first", "last"], columns="movement", values=["overall", "attacking"], aggfunc="max")
                           attacking           overall
          movement  passing shooting  passing shooting
    first     last                
Cristiano  Ronaldo       84       89       89       93
   Lionel    Messi       92       70       96       92
Reshaping Data with pandas

Margins

fifa_players.pivot_table(index=["first", "last"], columns="movement", aggfunc="count",             )
Reshaping Data with pandas

Margins

fifa_players.pivot_table(index=["first", "last"], columns="movement", aggfunc="count", margins=True)
                                attacking                  overall
          movement  passing shooting  All    passing shooting  All
    First     Last                
Cristiano  Ronaldo        2        1    3          2        1    3
   Lionel    Messi        2        1    3          2        1    3
      All                 4        2    6          4        2    6
Reshaping Data with pandas

Pivot or pivot table?

 

Does the DataFrame have more than one value for each index/column pair?

Do you need to have a multi-index in your resulting pivoted DataFrame?

Do you need summary statistics of your large DataFrame?

Yes! Use .pivot_table()

Reshaping Data with pandas

Let's practice!

Reshaping Data with pandas

Preparing Video For Download...