Reshaping using pivot method

Reshaping Data with pandas

Maria Eugenia Inzaugarat

Data Scientist

From long to wide

  • Demonstrate the relationship between two columns
  • Perform time series operations on the variables
  • Run operations that require each variable to be in its own column
1 https://pandas.pydata.org/docs/user_guide/reshaping.html
[Figure: a DataFrame in long format]

Pivot method

[Figure: an arrow pointing from a long-format DataFrame to a wide-format DataFrame, with the call df.pivot(index=..., columns=..., values=...)]

  • index: the column whose unique values become the row labels of the wide DataFrame
  • columns: the column whose unique values become the new column names
  • values: the column whose entries fill the cells
  • An index/columns combination with no matching row in the long DataFrame is filled with NaN
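
The roles of the three arguments, and where a NaN comes from, can be sketched on a tiny table. The data below (city, year, temp) is hypothetical and chosen only for illustration; it is not part of the course dataset.

```python
import pandas as pd

# Hypothetical long-format data: one row per (city, year) observation.
# The pair ("Lima", 2021) is deliberately missing.
long_df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima"],
    "year": [2020, 2021, 2020],
    "temp": [5.5, 6.0, 19.0],
})

# city values become row labels, year values become column names,
# and temp values fill the cells.
wide_df = long_df.pivot(index="city", columns="year", values="temp")
print(wide_df)

# The missing (Lima, 2021) combination shows up as NaN.
print(wide_df.isna())
```

Because no row exists for Lima in 2021, that cell has nothing to hold and pandas fills it with NaN.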

Pivoting a dataset

import pandas as pd

fifa = pd.read_csv('fifa_players.csv')
fifa.head()
                 name    variable  metric_system  imperial_system
0   Cristiano Ronaldo      weight             83           183.00
1            J. Oblak      weight             87           191.00
2   Cristiano Ronaldo      height            187             6.13
3            J. Oblak      height            188             6.16

fifa.pivot(index='name', columns='variable', values='metric_system')
         variable  height   weight
             name        
Cristiano Ronaldo     187       83
         J. Oblak     188       87
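
To try this call without the CSV file, the sketch below rebuilds the DataFrame from the rows printed by fifa.head(). Building it inline is an assumption made for reproducibility, and the name fifa_metric is introduced here only for illustration.

```python
import pandas as pd

# Rebuilt from the rows printed by fifa.head() on the slide.
fifa = pd.DataFrame({
    "name": ["Cristiano Ronaldo", "J. Oblak", "Cristiano Ronaldo", "J. Oblak"],
    "variable": ["weight", "weight", "height", "height"],
    "metric_system": [83, 87, 187, 188],
    "imperial_system": [183.00, 191.00, 6.13, 6.16],
})

# One row per player, one column per variable, cells taken from metric_system.
fifa_metric = fifa.pivot(index="name", columns="variable", values="metric_system")
print(fifa_metric)

# With each variable in its own column, a single value is a row/column lookup.
print(fifa_metric.loc["J. Oblak", "height"])
```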

Pivoting multiple columns

fifa.pivot(index='name', columns='variable', values=['metric_system', 'imperial_system'])
                     metric_system     imperial_system       
         variable   height  weight     height   weight
             name                                                         
Cristiano Ronaldo      187      83       6.13    183.0
         J. Oblak      188      87       6.16    191.0
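
Passing a list to values produces a two-level column index. A minimal sketch of how those columns might be selected afterwards follows; the DataFrame is rebuilt inline from the rows shown earlier, and the names fifa_wide, metric, and height_imperial are illustrative.

```python
import pandas as pd

# Rebuilt from the rows printed by fifa.head() on the slide.
fifa = pd.DataFrame({
    "name": ["Cristiano Ronaldo", "J. Oblak", "Cristiano Ronaldo", "J. Oblak"],
    "variable": ["weight", "weight", "height", "height"],
    "metric_system": [83, 87, 187, 188],
    "imperial_system": [183.00, 191.00, 6.13, 6.16],
})

fifa_wide = fifa.pivot(index="name", columns="variable",
                       values=["metric_system", "imperial_system"])

# The columns now have two levels: value column, then variable.
print(fifa_wide.columns.nlevels)

# A top-level key returns the sub-table for one measurement system.
metric = fifa_wide["metric_system"]
print(metric)

# A (level 0, level 1) tuple selects a single column.
height_imperial = fifa_wide[("imperial_system", "height")]
print(height_imperial)
```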

[Figure: a long-format DataFrame reshaped into a wide format with a hierarchical column index; the index and columns arguments are highlighted with their column names]

When values is omitted, pivot reshapes every remaining column:

fifa.pivot(index="name", columns="variable")
                     metric_system     imperial_system       
         variable   height  weight     height   weight
             name                                                         
Cristiano Ronaldo      187      83       6.13    183.0
         J. Oblak      188      87       6.16    191.0
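
As a quick check, the sketch below compares the call without values against the call that lists both value columns. The DataFrame is rebuilt inline from the rows shown earlier, and the names implicit and explicit are illustrative.

```python
import pandas as pd

# Rebuilt from the rows printed by fifa.head() on the slide.
fifa = pd.DataFrame({
    "name": ["Cristiano Ronaldo", "J. Oblak", "Cristiano Ronaldo", "J. Oblak"],
    "variable": ["weight", "weight", "height", "height"],
    "metric_system": [83, 87, 187, 188],
    "imperial_system": [183.00, 191.00, 6.13, 6.16],
})

# No values argument: every column other than index and columns is pivoted.
implicit = fifa.pivot(index="name", columns="variable")

# The same request with the value columns spelled out.
explicit = fifa.pivot(index="name", columns="variable",
                      values=["metric_system", "imperial_system"])

# Both results carry the same set of (value column, variable) pairs.
print(list(implicit.columns))
print(set(implicit.columns) == set(explicit.columns))
```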

Duplicate entries error

another_fifa.head()
                 name    variable  metric_system  imperial_system
0   Cristiano Ronaldo      weight             83           183.00
1            J. Oblak      weight             87           191.00
2   Cristiano Ronaldo      height            187             6.13
3            J. Oblak      height            188             6.16
4   Cristiano Ronaldo      height            187             6.14

Rows 2 and 4 both describe the height of Cristiano Ronaldo, so the pair (name, variable) is not unique.

another_fifa.pivot(index="name", columns="variable")
ValueError: Index contains duplicate entries, cannot reshape

 

another_fifa = another_fifa.drop(4, axis=0)
another_fifa.pivot(index="name", columns="variable")
                     metric_system     imperial_system       
         variable   height  weight     height   weight
             name                                                         
Cristiano Ronaldo      187      83       6.13    183.0
         J. Oblak      188      87       6.16    191.0
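
Rather than spotting the duplicate by eye and dropping it by row label, the offending rows can be located programmatically. The sketch below rebuilds another_fifa from the rows printed above and uses duplicated and drop_duplicates; keeping the first occurrence is an assumption that matches the slide's choice of dropping row 4, and may not suit other datasets.

```python
import pandas as pd

# Rebuilt from the rows printed by another_fifa.head() on the slide.
another_fifa = pd.DataFrame({
    "name": ["Cristiano Ronaldo", "J. Oblak", "Cristiano Ronaldo",
             "J. Oblak", "Cristiano Ronaldo"],
    "variable": ["weight", "weight", "height", "height", "height"],
    "metric_system": [83, 87, 187, 188, 187],
    "imperial_system": [183.00, 191.00, 6.13, 6.16, 6.14],
})

# keep=False marks every row whose (name, variable) pair occurs more than once.
mask = another_fifa.duplicated(subset=["name", "variable"], keep=False)
dupes = another_fifa[mask]
print(dupes)

# pivot cannot decide which duplicate belongs in the cell, so it raises.
error_message = None
try:
    another_fifa.pivot(index="name", columns="variable")
except ValueError as err:
    error_message = str(err)
    print(f"ValueError: {error_message}")

# Keep the first row of each (name, variable) pair, then pivot.
deduped = another_fifa.drop_duplicates(subset=["name", "variable"], keep="first")
wide = deduped.pivot(index="name", columns="variable")
print(wide)
```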

Let's practice!
