Reshaping using the pivot method

Reshaping Data with pandas

Maria Eugenia Inzaugarat

Data Scientist

From long to wide

  • Demonstrating the relationship between two columns
  • Time series operations on the variables
  • Operations that require each column to hold a unique variable
1 https://pandas.pydata.org/docs/user_guide/reshaping.html

From long to wide

DataFrame with a long format

Pivot method

Arrow pointing from a long to a wide format

 

the pandas pivot method call

Pivot method

Arrow pointing from a long to a wide format

 

The pivot call with arguments

Pivot method

Long and wide DataFrames with column and index highlighted

 

Highlighted index argument with column name

Pivot method

Long and wide DataFrames with column and column names highlighted

 

Highlighted columns argument with column name

Pivot method

Long and wide DataFrames with column and values highlighted

 

Highlighted values argument with column name

Pivot method

A NaN cell value highlighted

 

Highlighted arguments with column names
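
The three arguments illustrated above can be sketched on a small made-up DataFrame. The `sales` data and its column names are hypothetical and are not part of the course material. The sketch also shows where a NaN comes from: an index/column combination that has no row in the long data is left empty in the wide result.

```python
import pandas as pd

# Hypothetical long-format data: one row per (store, month) pair.
# Store "B" has no row for "Feb".
sales = pd.DataFrame({
    "store": ["A", "A", "B"],
    "month": ["Jan", "Feb", "Jan"],
    "units": [10, 12, 7],
})

# index   -> values that become the row labels
# columns -> values that become the column labels
# values  -> column whose entries fill the cells
wide = sales.pivot(index="store", columns="month", values="units")

# The missing (B, Feb) combination appears as NaN
print(wide)
```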

Pivoting a dataset

fifa = pd.read_csv('fifa_players.csv')
fifa.head()
                 name    variable  metric_system  imperial_system
0   Cristiano Ronaldo      weight             83           183.00
1            J. Oblak      weight             87           191.00
2   Cristiano Ronaldo      height            187             6.13
3            J. Oblak      height            188             6.16

Pivoting a dataset

fifa.pivot(index='name'                                            )

Pivoting a dataset

fifa.pivot(index='name', columns='variable'                        )

Pivoting a dataset

fifa.pivot(index='name', columns='variable', values='metric_system')
         variable  height   weight
             name        
Cristiano Ronaldo     187       83
         J. Oblak     188       87
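
To try this without `fifa_players.csv`, a minimal sketch can build the same four rows shown by `fifa.head()` in memory. Constructing the DataFrame by hand is an assumption made here for reproducibility, and the name `fifa_wide` is only illustrative.

```python
import pandas as pd

# The four rows displayed by fifa.head(), typed in by hand
fifa = pd.DataFrame({
    "name": ["Cristiano Ronaldo", "J. Oblak", "Cristiano Ronaldo", "J. Oblak"],
    "variable": ["weight", "weight", "height", "height"],
    "metric_system": [83, 87, 187, 188],
    "imperial_system": [183.00, 191.00, 6.13, 6.16],
})

# One row per player, one column per variable, cells taken from metric_system
fifa_wide = fifa.pivot(index="name", columns="variable", values="metric_system")
print(fifa_wide)
```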

Pivoting multiple columns

fifa.pivot(index='name', columns='variable', values=['metric_system', 'imperial_system'])
                     metric_system     imperial_system       
         variable   height  weight     height   weight
             name                                                         
Cristiano Ronaldo      187      83       6.13    183.0
         J. Oblak      188      87       6.16    191.0

Pivoting multiple columns

 

Arrow pointing from a long to a wide format with hierarchical column index

 

Highlighted index and columns argument with column names

Pivoting multiple columns

fifa.pivot(index="name", columns="variable")
                     metric_system     imperial_system       
         variable   height  weight     height   weight
             name                                                         
Cristiano Ronaldo      187      83       6.13    183.0
         J. Oblak      188      87       6.16    191.0
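
When `values` is omitted, every remaining column is pivoted and the result has hierarchical (two-level) columns. The sketch below rebuilds the slide's data by hand (an assumption, since the CSV is not included here) and shows two ways of selecting from that result. The variable names are illustrative.

```python
import pandas as pd

# Same four rows as shown by fifa.head(), typed in by hand
fifa = pd.DataFrame({
    "name": ["Cristiano Ronaldo", "J. Oblak", "Cristiano Ronaldo", "J. Oblak"],
    "variable": ["weight", "weight", "height", "height"],
    "metric_system": [83, 87, 187, 188],
    "imperial_system": [183.00, 191.00, 6.13, 6.16],
})

# No values argument: both measurement columns are pivoted
wide = fifa.pivot(index="name", columns="variable")

# Selecting on the outer level returns an ordinary wide DataFrame
metric = wide["metric_system"]

# A tuple (outer, inner) selects a single column as a Series
imperial_height = wide[("imperial_system", "height")]

print(metric)
print(imperial_height)
```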

Duplicate entries error

another_fifa.head()
                 name    variable  metric_system  imperial_system
0   Cristiano Ronaldo      weight             83           183.00
1            J. Oblak      weight             87           191.00
2   Cristiano Ronaldo      height            187             6.13
3            J. Oblak      height            188             6.16
4   Cristiano Ronaldo      height            187             6.14

Duplicate entries error

another_fifa.head()
                 name    variable  metric_system  imperial_system
0   Cristiano Ronaldo      weight             83           183.00
1            J. Oblak      weight             87           191.00
  2   Cristiano Ronaldo      height            187             6.13 <--
3            J. Oblak      height            188             6.16
  4   Cristiano Ronaldo      height            187             6.14 <--

Duplicate entries error

another_fifa.pivot(index="name", columns="variable")
ValueError: Index contains duplicate entries, cannot reshape

 

another_fifa = another_fifa.drop(4, axis=0)
another_fifa.pivot(index="name", columns="variable")
                     metric_system     imperial_system       
         variable   height  weight     height   weight
             name                                                         
Cristiano Ronaldo      187      83       6.13    183.0
         J. Oblak      188      87       6.16    191.0
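
Dropping row 4 by label works when the offending row is already known. As an alternative sketch (not the approach shown on the slide), the repeated index/column pairs can be located with `duplicated` and removed with `drop_duplicates` before pivoting. The data is rebuilt by hand from the rows shown above, and keeping the first occurrence is an arbitrary choice made for illustration.

```python
import pandas as pd

# The five rows displayed by another_fifa.head(), typed in by hand
another_fifa = pd.DataFrame({
    "name": ["Cristiano Ronaldo", "J. Oblak", "Cristiano Ronaldo",
             "J. Oblak", "Cristiano Ronaldo"],
    "variable": ["weight", "weight", "height", "height", "height"],
    "metric_system": [83, 87, 187, 188, 187],
    "imperial_system": [183.00, 191.00, 6.13, 6.16, 6.14],
})

# keep=False flags every row of a repeated (name, variable) pair
dupes = another_fifa[another_fifa.duplicated(subset=["name", "variable"], keep=False)]
print(dupes)

# pivot cannot decide which of the repeated rows fills the cell
try:
    another_fifa.pivot(index="name", columns="variable")
except ValueError as err:
    print(err)

# Keep the first row of each pair (an arbitrary choice), then pivot
deduped = another_fifa.drop_duplicates(subset=["name", "variable"], keep="first")
wide = deduped.pivot(index="name", columns="variable")
print(wide)
```

Which duplicate to keep depends on the data. If the repeated rows should be combined rather than discarded, they need to be aggregated before reshaping.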

Let's practice!
