Reshaping Data with pandas
Maria Eugenia Inzaugarat
Data Scientist
books
   title                             ratings2019  sold2019  ratings2020  sold2020
0  Mostly Harmless                           4.2       456          4.3       436
1  The Hitchhiker's Guide                    4.8       980          4.9       998
2  El restaurante del fin del mundo          4.5       678          4.6       638
pd.wide_to_long(books, stubnames=['ratings', 'sold'], i='title', j='year')
                                        ratings  sold
title                             year
Mostly Harmless                   2019      4.2   456
The Hitchhiker's Guide            2019      4.8   980
El restaurante del fin del mundo  2019      4.5   678
Mostly Harmless                   2020      4.3   436
The Hitchhiker's Guide            2020      4.9   998
El restaurante del fin del mundo  2020      4.6   638
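A minimal, self-contained sketch of the call above, assuming only that pandas is installed. It rebuilds the `books` table from the slide, gathers every column that starts with a stub name (`ratings`, `sold`), keeps `title` as the row identifier (`i`), and moves the leftover suffix into a new index level named by `j`. The variable name `long_books` is illustrative, not from the slides.

```python
import pandas as pd

# Wide table from the slide: one column per (measure, year) pair
books = pd.DataFrame({
    "title": ["Mostly Harmless", "The Hitchhiker's Guide",
              "El restaurante del fin del mundo"],
    "ratings2019": [4.2, 4.8, 4.5],
    "sold2019": [456, 980, 678],
    "ratings2020": [4.3, 4.9, 4.6],
    "sold2020": [436, 998, 638],
})

# stubnames: prefixes of the wide columns to gather
# i: column(s) that uniquely identify each row
# j: name of the new index level that receives the suffix (2019, 2020)
long_books = pd.wide_to_long(books, stubnames=["ratings", "sold"], i="title", j="year")
print(long_books)
```

Three books times two years gives six rows, each indexed by a (title, year) pair.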
books_with_index
                        author          ratings2019  sold2019
title
To Kill a Mockingbird   Harper Lee              4.7       456
The Hitchhiker's Guide  Douglas Adams           4.8       980
The Black Cat           Edgar Alan Poe          4.5       678
pd.wide_to_long(books_with_index, stubnames=['ratings', 'sold'], i='author', j='year')
                      ratings  sold
author          year
Harper Lee      2019      4.7   456
Douglas Adams   2019      4.8   980
Edgar Alan Poe  2019      4.5   678
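The output above has no `title` anywhere: `wide_to_long` works on columns, so an existing index is discarded. The sketch below, assuming pandas and the slide's data (the name `by_author` is illustrative), checks that the title information is gone from both the columns and the index.

```python
import pandas as pd

# 'title' is the index here, not a regular column
books_with_index = pd.DataFrame({
    "title": ["To Kill a Mockingbird", "The Hitchhiker's Guide", "The Black Cat"],
    "author": ["Harper Lee", "Douglas Adams", "Edgar Alan Poe"],
    "ratings2019": [4.7, 4.8, 4.5],
    "sold2019": [456, 980, 678],
}).set_index("title")

by_author = pd.wide_to_long(books_with_index, stubnames=["ratings", "sold"],
                            i="author", j="year")

# The original index did not survive the reshape
print("title" in by_author.columns, "title" in by_author.index.names)  # False False
```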
books_with_index.reset_index(drop=False, inplace=True)
pd.wide_to_long(books_with_index, stubnames=['ratings', 'sold'], i=['title', 'author'], j='year')
                                              ratings  sold
title                   author          year
To Kill a Mockingbird   Harper Lee      2019      4.7   456
The Hitchhiker's Guide  Douglas Adams   2019      4.8   980
The Black Cat           Edgar Alan Poe  2019      4.5   678
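To keep the titles, move them back into a column and pass both identifiers to `i`; the labels in `i` become the outer index levels, in the order given. This sketch assumes pandas and the slide's data, and calls `reset_index()` without `inplace=True`, a common alternative to the in-place call on the slide. The name `books_flat` is illustrative.

```python
import pandas as pd

books_with_index = pd.DataFrame({
    "title": ["To Kill a Mockingbird", "The Hitchhiker's Guide", "The Black Cat"],
    "author": ["Harper Lee", "Douglas Adams", "Edgar Alan Poe"],
    "ratings2019": [4.7, 4.8, 4.5],
    "sold2019": [456, 980, 678],
}).set_index("title")

# Turn the 'title' index back into an ordinary column
books_flat = books_with_index.reset_index()

# Together, 'title' and 'author' identify each row
by_book = pd.wide_to_long(books_flat, stubnames=["ratings", "sold"],
                          i=["title", "author"], j="year")
print(by_book)
```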
new_books
   title                  author              ratings_2019  sold_2019  ratings_2020  sold_2020
0  A Murder Is Announced  Agatha Christie              4.4        796           4.8        856
1  Sherlock Holmes        Sir A. Conan Doyle           4.5        780           4.8        818
2  The Sparrow            Mary Doria Russell           4.2        178           4.1        238
pd.wide_to_long(new_books, stubnames=['ratings', 'sold'], i=['title', 'author'], j='year')
sold_2020 ratings_2020 ratings_2019 sold_2019 ratings sold
title author year
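The result above is empty because no column matched. Per the pandas documentation, the wide columns are expected to look like stub, then `sep`, then a suffix matching the `suffix` regex, with defaults `sep=''` and `suffix=r'\d+'`; with those defaults pandas looks for names like `ratings2019`. The helper below is a hypothetical approximation of that rule using Python's `re` module, not pandas' own code.

```python
import re

def matches(column, stub, sep="", suffix=r"\d+"):
    """Hypothetical helper: True if `column` has the form stub + sep + suffix."""
    pattern = re.escape(stub) + re.escape(sep) + suffix
    return re.fullmatch(pattern, column) is not None

print(matches("ratings2019", "ratings"))            # True: default sep=''
print(matches("ratings_2019", "ratings"))           # False: the underscore is unexpected
print(matches("ratings_2019", "ratings", sep="_"))  # True once sep='_' is given
```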
pd.wide_to_long(new_books, stubnames=['ratings', 'sold'], i=['title', 'author'], j='year', sep='_')
                                                 ratings  sold
title                  author              year
A Murder Is Announced  Agatha Christie     2019      4.4   796
Sherlock Holmes        Sir A. Conan Doyle  2019      4.5   780
The Sparrow            Mary Doria Russell  2019      4.2   178
A Murder Is Announced  Agatha Christie     2020      4.8   856
Sherlock Holmes        Sir A. Conan Doyle  2020      4.8   818
The Sparrow            Mary Doria Russell  2020      4.1   238
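A runnable version of the corrected call, assuming pandas and the slide's `new_books` data (the name `long_new` is illustrative). Because every suffix left after stripping `sep` is numeric, pandas casts the `year` level to integers, as its documentation notes, so rows are looked up with `2020` rather than `'2020'`.

```python
import pandas as pd

new_books = pd.DataFrame({
    "title": ["A Murder Is Announced", "Sherlock Holmes", "The Sparrow"],
    "author": ["Agatha Christie", "Sir A. Conan Doyle", "Mary Doria Russell"],
    "ratings_2019": [4.4, 4.5, 4.2],
    "sold_2019": [796, 780, 178],
    "ratings_2020": [4.8, 4.8, 4.1],
    "sold_2020": [856, 818, 238],
})

# sep='_' tells pandas that an underscore sits between the stub and the year
long_new = pd.wide_to_long(new_books, stubnames=["ratings", "sold"],
                           i=["title", "author"], j="year", sep="_")

years = long_new.index.get_level_values("year")
print(pd.api.types.is_integer_dtype(years))                                # True
print(long_new.loc[("The Sparrow", "Mary Doria Russell", 2020), "sold"])  # 238
```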
another_books
   title                  ratings_one  sold_one  ratings_two  sold_two
0  A Murder Is Announced          4.4       796          4.8       856
1  Sherlock Holmes                4.5       780          4.8       818
2  The Sparrow                    4.2       178          4.1       238
pd.wide_to_long(another_books, stubnames=['ratings', 'sold'], i='title', j='edition', sep='_')
sold_one ratings_one ratings_two sold_two ratings sold
title edition
pd.wide_to_long(another_books, stubnames=['ratings', 'sold'], i='title', j='edition', sep='_', suffix=r'\w+')
                                ratings  sold
title                  edition
A Murder Is Announced  one          4.4   796
Sherlock Holmes        one          4.5   780
The Sparrow            one          4.2   178
A Murder Is Announced  two          4.8   856
Sherlock Holmes        two          4.8   818
The Sparrow            two          4.1   238
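A sketch contrasting the two calls, assuming pandas and the slide's `another_books` data (the names `digits_only` and `by_edition` are illustrative). With the default `suffix=r'\d+'` no column qualifies and the result is empty; `suffix=r'\w+'` accepts word suffixes such as `one` and `two`, which stay strings because they cannot be converted to numbers. The raw-string prefix `r` keeps Python from treating the backslash as an escape character.

```python
import pandas as pd

another_books = pd.DataFrame({
    "title": ["A Murder Is Announced", "Sherlock Holmes", "The Sparrow"],
    "ratings_one": [4.4, 4.5, 4.2],
    "sold_one": [796, 780, 178],
    "ratings_two": [4.8, 4.8, 4.1],
    "sold_two": [856, 818, 238],
})

# Default suffix r'\d+' expects digits after 'ratings_' and 'sold_', so nothing matches
digits_only = pd.wide_to_long(another_books, stubnames=["ratings", "sold"],
                              i="title", j="edition", sep="_")

# r'\w+' accepts word characters, so 'one' and 'two' are captured
by_edition = pd.wide_to_long(another_books, stubnames=["ratings", "sold"],
                             i="title", j="edition", sep="_", suffix=r"\w+")

print(digits_only.empty)                                          # True
print(sorted(set(by_edition.index.get_level_values("edition"))))  # ['one', 'two']
```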