Reshaping Data with pandas
Maria Eugenia Inzaugarat
Data Scientist
churn
   credit_score  age  country  num_products exited
0           619   43   France             1    Yes
1           608   34  Germany             0     No
2           502   23   France             1    Yes
churn.set_index(['age', 'country'], inplace=True)
             credit_score  num_products exited
age country
43  France            619             1    Yes
34  Germany           608             0     No
23  France            502             1    Yes
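As a minimal sketch of the step above, the snippet below rebuilds the three-row churn table from the values printed on the slide and moves two columns into the row index. The variable name `churn_indexed` is an illustrative choice, and returning a new DataFrame (rather than `inplace=True`) is a stylistic assumption, not something the slides prescribe.

```python
import pandas as pd

# Rebuild the small churn table (values copied from the printed output).
churn = pd.DataFrame({
    "credit_score": [619, 608, 502],
    "age": [43, 34, 23],
    "country": ["France", "Germany", "France"],
    "num_products": [1, 0, 1],
    "exited": ["Yes", "No", "Yes"],
})

# Move 'age' and 'country' out of the columns and into a two-level row index.
# Without inplace=True, set_index returns a new DataFrame and leaves churn as is.
churn_indexed = churn.set_index(["age", "country"])

print(list(churn_indexed.index.names))
print(list(churn_indexed.columns))
```

The order of the list passed to `set_index` determines the order of the index levels, so `['age', 'country']` puts `age` on the outside.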
new_array = [['yes', 'no', 'yes'], ['no', 'yes', 'yes']]
churn.index = pd.MultiIndex.from_arrays(new_array, names=['member', 'credit_card'])
churn
                    credit_score  age  country  num_products exited
member credit_card
yes    no                    619   43   France             1    Yes
no     yes                   608   34  Germany             0     No
yes    yes                   502   23   France             1    Yes
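The index assignment above can be reproduced end to end with the sketch below. It assumes the same three-row churn table as the slide; the only requirement worth noting is that each inner list in `new_array` has one entry per row of the DataFrame.

```python
import pandas as pd

churn = pd.DataFrame({
    "credit_score": [619, 608, 502],
    "age": [43, 34, 23],
    "country": ["France", "Germany", "France"],
    "num_products": [1, 0, 1],
    "exited": ["Yes", "No", "Yes"],
})

# One inner list per index level; each list needs one value per row.
new_array = [["yes", "no", "yes"], ["no", "yes", "yes"]]

# Replace the default RangeIndex with a two-level MultiIndex.
churn.index = pd.MultiIndex.from_arrays(new_array, names=["member", "credit_card"])

print(churn.index.nlevels)
print(churn.index.tolist())
```

Unlike `set_index`, this approach leaves every original column in place, because the index values come from outside the DataFrame.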
index = pd.MultiIndex.from_arrays([['Wick', 'Wick', 'Shelley', 'Shelley'], ['John', 'Julien', 'Mary', 'Frank']], names=['last', 'first'])
columns = pd.MultiIndex.from_arrays([['2019', '2019', '2020', '2020'], ['age', 'weight', 'age', 'weight']], names=['year', 'feature'])
patients = pd.DataFrame(data, index=index, columns=columns)
patients
year            2019        2020
feature          age weight  age weight
last    first
Wick    John      25     68   26     72
        Julien    31     72   32     73
Shelley Mary      41     68   42     69
        Frank     32     75   33     74
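The slide code uses a `data` variable that is never defined. The sketch below fills that gap under a stated assumption: the rows of `data` are read directly off the printed table, one inner list per patient.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["Wick", "Wick", "Shelley", "Shelley"], ["John", "Julien", "Mary", "Frank"]],
    names=["last", "first"],
)
columns = pd.MultiIndex.from_arrays(
    [["2019", "2019", "2020", "2020"], ["age", "weight", "age", "weight"]],
    names=["year", "feature"],
)

# Assumption: `data` is not given in the slides; these values are taken
# from the printed output (columns: 2019 age, 2019 weight, 2020 age, 2020 weight).
data = [
    [25, 68, 26, 72],
    [31, 72, 32, 73],
    [41, 68, 42, 69],
    [32, 75, 33, 74],
]

patients = pd.DataFrame(data, index=index, columns=columns)

# Both axes are hierarchical, so a single cell is addressed by two tuples.
print(patients.loc[("Wick", "John"), ("2019", "age")])
```

Note that the year labels are strings (`'2019'`), not integers, so lookups must use the string form.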
The .stack() method rearranges a level of the columns into a new innermost level of the row index and returns the reshaped DataFrame (or a Series when the columns have only one level).
churn
                    credit_score  age  country  num_products exited
member credit_card
yes    no                    619   43   France             1    Yes
no     yes                   608   34  Germany             0     No
yes    yes                   502   23   France             1    Yes
churned_stacked = churn.stack()
churned_stacked.head(10)
member  credit_card
yes     no           credit_score        619
                     age                  43
                     country          France
                     num_products          1
                     exited              Yes
no      yes          credit_score        608
                     age                  34
                     country         Germany
                     num_products          0
                     exited               No
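To make the result above reproducible, here is a small sketch that stacks the churn table with its `member`/`credit_card` MultiIndex. It assumes the same three rows as the slides. Because the columns have a single level, stacking leaves no columns behind and the result is a Series.

```python
import pandas as pd

churn = pd.DataFrame({
    "credit_score": [619, 608, 502],
    "age": [43, 34, 23],
    "country": ["France", "Germany", "France"],
    "num_products": [1, 0, 1],
    "exited": ["Yes", "No", "Yes"],
})
churn.index = pd.MultiIndex.from_arrays(
    [["yes", "no", "yes"], ["no", "yes", "yes"]],
    names=["member", "credit_card"],
)

# The only column level moves into the row index as a new innermost level.
churned_stacked = churn.stack()

print(type(churned_stacked).__name__)
print(churned_stacked.index.nlevels)
print(churned_stacked.head(5))
```

Since the columns mix numbers and strings, the stacked values are stored with the generic `object` dtype.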
patients
year            2019        2020
feature          age weight  age weight
last    first
Wick    John      25     68   26     72
        Julien    31     72   32     73
Shelley Mary      41     68   42     69
        Frank     32     75   33     74
patients_stacked = patients.stack()
patients_stacked
year                    2019  2020
last    first  feature
Wick    John   age        25    26
               weight     68    72
        Julien age        31    32
               weight     72    73
Shelley Mary   age        41    42
               weight     68    69
        Frank  age        32    33
               weight     75    74
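A short sketch of the default behavior shown above: with no arguments, `.stack()` moves the innermost column level (here `feature`) into the row index. The `data` values are an assumption read off the printed table, since the slides do not define them.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["Wick", "Wick", "Shelley", "Shelley"], ["John", "Julien", "Mary", "Frank"]],
    names=["last", "first"],
)
columns = pd.MultiIndex.from_arrays(
    [["2019", "2019", "2020", "2020"], ["age", "weight", "age", "weight"]],
    names=["year", "feature"],
)
# Assumption: values taken from the printed patients table.
data = [[25, 68, 26, 72], [31, 72, 32, 73], [41, 68, 42, 69], [32, 75, 33, 74]]
patients = pd.DataFrame(data, index=index, columns=columns)

# Default level is the innermost column level, 'feature'.
patients_stacked = patients.stack()

print(list(patients_stacked.index.names))
print(list(patients_stacked.columns))
print(patients_stacked.shape)
```

Four rows with two features each become eight rows, while the remaining `year` level keeps two columns.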
patients
year            2019        2020
feature          age weight  age weight
last    first
Wick    John      25     68   26     72
        Julien    31     72   32     73
Shelley Mary      41     68   42     69
        Frank     32     75   33     74
patients.stack(level=0)
feature              age  weight
last    first  year
Wick    John   2019   25      68
               2020   26      72
        Julien 2019   31      72
               2020   32      73
Shelley Mary   2019   41      68
               2020   42      69
        Frank  2019   32      75
               2020   33      74
patients
year            2019        2020
feature          age weight  age weight
last    first
Wick    John      25     68   26     72
        Julien    31     72   32     73
Shelley Mary      41     68   42     69
        Frank     32     75   33     74
patients.stack(level='year')
feature              age  weight
last    first  year
Wick    John   2019   25      68
               2020   26      72
        Julien 2019   31      72
               2020   32      73
Shelley Mary   2019   41      68
               2020   42      69
        Frank  2019   32      75
               2020   33      74
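The two calls `stack(level=0)` and `stack(level='year')` select the same column level, once by position and once by name. The sketch below checks that equivalence on the patients table; the `data` values are again an assumption taken from the printed output, and the names `by_position` and `by_name` are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["Wick", "Wick", "Shelley", "Shelley"], ["John", "Julien", "Mary", "Frank"]],
    names=["last", "first"],
)
columns = pd.MultiIndex.from_arrays(
    [["2019", "2019", "2020", "2020"], ["age", "weight", "age", "weight"]],
    names=["year", "feature"],
)
# Assumption: values taken from the printed patients table.
data = [[25, 68, 26, 72], [31, 72, 32, 73], [41, 68, 42, 69], [32, 75, 33, 74]]
patients = pd.DataFrame(data, index=index, columns=columns)

# Level 0 is the outermost column level, which is named 'year'.
by_position = patients.stack(level=0)
by_name = patients.stack(level="year")

print(by_position.equals(by_name))
print(list(by_name.index.names))
```

Referring to the level by name tends to be more robust, since the code keeps working if the order of the column levels changes.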