Reshaping and combining data

Reshaping Data with pandas

Maria Eugenia Inzaugarat

Data Scientist

Reshaping and statistical functions

sales
                    office supply           Technology
         shop     online        onsite     online      onsite
country  year      
Italy    2017      310           123          510       340
         2018      110           100          610       120
Spain    2017      229           200          300       240
         2018      120           220          190       210
Reshaping Data with pandas

Statistical functions

 

  • Sum: .sum()
  • Mean: .mean()
  • Median: .median()
  • Difference: .diff()
Reshaping Data with pandas

Stacking and stats

  • Total amount of online and on-site sales by year in the two countries
sales.stack().sum(axis=1)
 country  year   shop  
 Italy    2017  online    820
                onsite    463
          2018  online    720
                onsite    220
 Spain    2017  online    529
                onsite    440
          2018  online    310
                onsite    430
Reshaping Data with pandas

Stacking and stats

  • Total amount of online and on- site sales by year in the two countries
sales.stack().sum(axis=1).unstack()
         shop online onsite
country  year        
 Italy   2017   820    463
         2018   720    220
 Spain   2017   529    440
         2018   310    430
Reshaping Data with pandas

Unstacking and stats

  • Mean amount of product sales by year in both countries
sales.unstack(level=0).mean(axis=1)
year
2017    281.5
2018    210.0
Reshaping Data with pandas

Unstacking and stats

  • Difference in the amount of sales between years
sales["office supply"].unstack(level='country')
Reshaping Data with pandas

Unstacking and stats

  • Difference in the amount of sales between years
sales["office supply"].unstack(level='country').diff(axis=1, periods=2) 
                         office supply
   shop         online          onsite
country   Italy  Spain   Italy   Spain
year                
2017      NaN     NaN   -187.0   -29.0
2018      NaN     NaN    -10.0   100.0
Reshaping Data with pandas

Reshaping and grouping

  • Total amount of different products by online or on-site regardless of the country
sales.stack().head(4)
                      office supply   Technology
country   year  shop        
 Italy    2017 online       310          510
               onsite       123          340
          2018 online       110          610
               onsite       100          120
Reshaping Data with pandas

Reshaping and grouping

  • Total amount of different products by online or on-site regardless of the country
sales.stack().groupby(level='shop').sum()
        office supply    Technology
shop        
online       769             1610
onsite       643             910
Reshaping Data with pandas

Reshaping after grouping

  • Median amount of products by year
sales.groupby(level='year').median()
        office supply         Technology
shop    online  onsite    online   onsite
year                
2017     269.5  161.5      405.0   290.0
2018     115.0  160.0      400.0   165.0
Reshaping Data with pandas

Reshaping after grouping

  • Median amount of products by year
sales.groupby(level=1).median().stack(level=[0, 1]).unstack(level='year')
                   year    2017     2018
                   shop        
Technology        online   405.0    400.0
                  onsite   290.0    165.0
office supply     online   269.5    115.0
                  onsite   161.5    160.0
Reshaping Data with pandas

Let's practice!

Reshaping Data with pandas

Preparing Video For Download...