Wide to long function

Reshaping Data with pandas

Maria Eugenia Inzaugarat

Data Scientist

Wide to long transformation


Reshaping data

books

                             title ratings2019 sold2019 ratings2020 sold2020
0                  Mostly Harmless         4.2      456         4.3      436
1           The Hitchhiker's Guide         4.8      980         4.9      998
2 El restaurante del fin del mundo         4.5      678         4.6      638

pd.wide_to_long() takes the DataFrame and three arguments: stubnames, the prefixes shared by the wide columns (here 'ratings' and 'sold'); i, the column that identifies each row (here 'title'); and j, the name of the new column that will hold the suffixes (here 'year').

pd.wide_to_long(books, stubnames=['ratings', 'sold'], i='title', j='year')

                                       ratings  sold
title                            year
Mostly Harmless                  2019      4.2   456
The Hitchhiker's Guide           2019      4.8   980
El restaurante del fin del mundo 2019      4.5   678
Mostly Harmless                  2020      4.3   436
The Hitchhiker's Guide           2020      4.9   998
El restaurante del fin del mundo 2020      4.6   638
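To try the call yourself, the sketch below rebuilds the books table from the values shown above and reshapes it with the same arguments. The DataFrame literal and the name long_books are illustrative, not taken from the slides.

```python
import pandas as pd

# Rebuild the wide books table from the slide: one column per metric and year
books = pd.DataFrame({
    "title": ["Mostly Harmless", "The Hitchhiker's Guide",
              "El restaurante del fin del mundo"],
    "ratings2019": [4.2, 4.8, 4.5],
    "sold2019": [456, 980, 678],
    "ratings2020": [4.3, 4.9, 4.6],
    "sold2020": [436, 998, 638],
})

# stubnames: shared column prefixes; i: row identifier; j: name for the suffix level
long_books = pd.wide_to_long(books, stubnames=["ratings", "sold"],
                             i="title", j="year")
print(long_books)
```

The numeric suffixes 2019 and 2020 end up as integers in the year level of the index.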

DataFrame with index

books_with_index

                                 author  ratings2019  sold2019
title
To Kill a Mockingbird        Harper Lee          4.7       456
The Hitchhiker's Guide    Douglas Adams          4.8       980
The Black Cat           Edgar Allan Poe          4.5       678

pd.wide_to_long(books_with_index, stubnames=['ratings', 'sold'], i='author', j='year')

                      ratings  sold
author          year
Harper Lee      2019      4.7   456
Douglas Adams   2019      4.8   980
Edgar Allan Poe 2019      4.5   678
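A quick check of what happens to the existing index. This sketch recreates books_with_index from the table above, assuming title is its index; the name result is illustrative.

```python
import pandas as pd

# books_with_index from the slide: title is the index, author a regular column
books_with_index = pd.DataFrame({
    "title": ["To Kill a Mockingbird", "The Hitchhiker's Guide", "The Black Cat"],
    "author": ["Harper Lee", "Douglas Adams", "Edgar Allan Poe"],
    "ratings2019": [4.7, 4.8, 4.5],
    "sold2019": [456, 980, 678],
}).set_index("title")

result = pd.wide_to_long(books_with_index, stubnames=["ratings", "sold"],
                         i="author", j="year")

# Only author and year identify the rows; the titles are not kept anywhere
print(result.index.names)
print("title" in result.columns)
```

In this run the new index is built from i and j only, so the titles disappear.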

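wide_to_long() ignores the existing index, so the titles are lost. To keep them, turn the index back into a column and pass it to i as well: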

books_with_index.reset_index(drop=False, inplace=True)

pd.wide_to_long(books_with_index, stubnames=['ratings', 'sold'], i=['author', 'title'], j='year')

                                             ratings  sold
author          title                  year
Harper Lee      To Kill a Mockingbird  2019      4.7   456
Douglas Adams   The Hitchhiker's Guide 2019      4.8   980
Edgar Allan Poe The Black Cat          2019      4.5   678
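The same steps as a self-contained sketch, again rebuilding books_with_index from the table (all names are illustrative). It calls reset_index() without inplace=True, which returns a new DataFrame instead of modifying the original; either style works here.

```python
import pandas as pd

books_with_index = pd.DataFrame({
    "title": ["To Kill a Mockingbird", "The Hitchhiker's Guide", "The Black Cat"],
    "author": ["Harper Lee", "Douglas Adams", "Edgar Allan Poe"],
    "ratings2019": [4.7, 4.8, 4.5],
    "sold2019": [456, 980, 678],
}).set_index("title")

# Move the title index back into a regular column
books_reset = books_with_index.reset_index()

# With both columns in i, author and title together identify each row
result = pd.wide_to_long(books_reset, stubnames=["ratings", "sold"],
                         i=["author", "title"], j="year")
print(result)
```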

sep argument

new_books

                  title              author  ratings_2019 sold_2019 ratings_2020 sold_2020
0 A Murder Is Announced     Agatha Christie           4.4       796          4.8       856
1       Sherlock Holmes  Sir A. Conan Doyle           4.5       780          4.8       818
2           The Sparrow  Mary Doria Russell           4.2       178          4.1       238

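These columns put an underscore between the stub and the year. Without the sep argument, the stubnames do not match them: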

pd.wide_to_long(new_books, stubnames=['ratings', 'sold'], i=['title', 'author'], j='year')

                    sold_2020 ratings_2020 ratings_2019 sold_2019  ratings sold
title  author year
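Why does the call above match nothing? The pandas documentation describes stubnames as column prefixes, sep as the separator and suffix as a regular expression (default r'\d+'). The hypothetical helper matching_columns below approximates that matching with the re module, assuming each column must read stub + sep + suffix from start to end; it is an illustration, not pandas' own code.

```python
import re

# Column names of new_books from the slide
columns = ["title", "author", "ratings_2019", "sold_2019",
           "ratings_2020", "sold_2020"]


def matching_columns(stub, sep="", suffix=r"\d+"):
    # Hypothetical helper: keep columns that read stub + sep + suffix in full.
    # The defaults mirror wide_to_long's defaults for sep and suffix.
    pattern = re.compile("^" + re.escape(stub) + re.escape(sep) + suffix + "$")
    return [col for col in columns if pattern.match(col)]


print(matching_columns("ratings"))           # → []
print(matching_columns("ratings", sep="_"))  # → ['ratings_2019', 'ratings_2020']
```

Without sep, the underscore sits where the digits are expected, so nothing matches.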

Passing sep='_' tells wide_to_long() what separates the stub from the suffix:

pd.wide_to_long(new_books, stubnames=['ratings', 'sold'], i=['title', 'author'], j='year', sep='_')

                                               ratings  sold
title                 author             year
A Murder Is Announced Agatha Christie    2019      4.4   796
                                         2020      4.8   856
Sherlock Holmes       Sir A. Conan Doyle 2019      4.5   780
                                         2020      4.8   818
The Sparrow           Mary Doria Russell 2019      4.2   178
                                         2020      4.1   238
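A runnable version of the call with sep='_', rebuilding new_books from the table above (long_new is an illustrative name):

```python
import pandas as pd

new_books = pd.DataFrame({
    "title": ["A Murder Is Announced", "Sherlock Holmes", "The Sparrow"],
    "author": ["Agatha Christie", "Sir A. Conan Doyle", "Mary Doria Russell"],
    "ratings_2019": [4.4, 4.5, 4.2],
    "sold_2019": [796, 780, 178],
    "ratings_2020": [4.8, 4.8, 4.1],
    "sold_2020": [856, 818, 238],
})

# sep="_": an underscore sits between each stub and its year
long_new = pd.wide_to_long(new_books, stubnames=["ratings", "sold"],
                           i=["title", "author"], j="year", sep="_")
print(long_new)
```

The separator is stripped, so the new columns are named plain ratings and sold.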

suffix argument

another_books

                  title  ratings_one  sold_one  ratings_two  sold_two
0 A Murder Is Announced          4.4       796          4.8       856
1       Sherlock Holmes          4.5       780          4.8       818
2           The Sparrow          4.2       178          4.1       238

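Here the suffixes are words, not numbers. By default, suffix only accepts digits, so no column matches even with sep='_':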

pd.wide_to_long(another_books, stubnames=['ratings', 'sold'], i='title', j='edition', sep='_')

            sold_one ratings_one ratings_two sold_two  ratings sold
title  edition
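The same kind of check explains this empty result: with the default suffix, only digits may follow 'sold_'. The helper below is hypothetical and approximates the matching, assuming a column must read stub + sep + suffix in full.

```python
import re

# Column names of another_books from the slide
columns = ["title", "ratings_one", "sold_one", "ratings_two", "sold_two"]


def matching_columns(stub, sep="_", suffix=r"\d+"):
    # Hypothetical helper: keep columns that read stub + sep + suffix in full
    pattern = re.compile("^" + re.escape(stub) + re.escape(sep) + suffix + "$")
    return [col for col in columns if pattern.match(col)]


print(matching_columns("sold"))                 # → []
print(matching_columns("sold", suffix=r"\w+"))  # → ['sold_one', 'sold_two']
```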

Setting suffix to a regular expression that also accepts letters, such as r'\w+', solves it:

pd.wide_to_long(another_books, stubnames=['ratings', 'sold'], i='title', j='edition', sep='_', suffix=r'\w+')

                               ratings  sold
title                 edition
A Murder Is Announced one          4.4   796
Sherlock Holmes       one          4.5   780
The Sparrow           one          4.2   178
A Murder Is Announced two          4.8   856
Sherlock Holmes       two          4.8   818
The Sparrow           two          4.1   238
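A runnable sketch of the suffix example, rebuilding another_books from the table above (long_editions is an illustrative name). Writing the pattern as a raw string, r'\w+', keeps Python from interpreting the backslash itself.

```python
import pandas as pd

another_books = pd.DataFrame({
    "title": ["A Murder Is Announced", "Sherlock Holmes", "The Sparrow"],
    "ratings_one": [4.4, 4.5, 4.2],
    "sold_one": [796, 780, 178],
    "ratings_two": [4.8, 4.8, 4.1],
    "sold_two": [856, 818, 238],
})

# suffix=r"\w+" accepts word suffixes such as "one" and "two"
long_editions = pd.wide_to_long(another_books, stubnames=["ratings", "sold"],
                                i="title", j="edition", sep="_", suffix=r"\w+")
print(long_editions)
```

Because 'one' and 'two' cannot be converted to numbers, the edition level in this sketch keeps them as strings.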

Let's practice!
