Wide to long function

Reshaping Data with pandas

Maria Eugenia Inzaugarat

Data Scientist

Wide to long transformation

Reshaping data

books
                             title ratings2019 sold2019 ratings2020 sold2020
0                  Mostly Harmless         4.2      456         4.3      436
1           The Hitchhiker's Guide         4.8      980         4.9      998
2 El restaurante del fin del mundo         4.5      678         4.6      638
pd.wide_to_long(books, stubnames=['ratings', 'sold'], i='title', j='year')
                                       ratings  sold
title                            year
Mostly Harmless                  2019      4.2   456
The Hitchhiker's Guide           2019      4.8   980
El restaurante del fin del mundo 2019      4.5   678
Mostly Harmless                  2020      4.3   436
The Hitchhiker's Guide           2020      4.9   998
El restaurante del fin del mundo 2020      4.6   638
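
The call above can be reproduced end to end with a small sketch. The DataFrame below is rebuilt by hand from the values in the `books` table, and the name `books_long` is only an illustrative choice, not something from the slides.

```python
import pandas as pd

# Rebuild the wide `books` table from the slide.
books = pd.DataFrame({
    "title": ["Mostly Harmless", "The Hitchhiker's Guide",
              "El restaurante del fin del mundo"],
    "ratings2019": [4.2, 4.8, 4.5],
    "sold2019": [456, 980, 678],
    "ratings2020": [4.3, 4.9, 4.6],
    "sold2020": [436, 998, 638],
})

# stubnames: column prefixes to collect into one column each
# i: identifier column(s); j: name given to the extracted suffix
books_long = pd.wide_to_long(books, stubnames=["ratings", "sold"],
                             i="title", j="year")

# The result is indexed by (title, year) and has one column per stub.
print(list(books_long.index.names))  # ['title', 'year']
print(list(books_long.columns))      # ['ratings', 'sold']
print(books_long.loc[("Mostly Harmless", 2020), "ratings"])  # 4.3
```

Because the numeric suffix can be parsed as a number, the `year` level holds integers, so the lookup uses `2020` rather than `'2020'`.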

DataFrame with index

books_with_index
                                author  ratings2019  sold2019
title
To Kill a Mockingbird       Harper Lee          4.7       456
The Hitchhiker's Guide   Douglas Adams          4.8       980
The Black Cat           Edgar Alan Poe          4.5       678
pd.wide_to_long(books_with_index, stubnames=['ratings', 'sold'], i='author', j='year')
                     ratings  sold
author         year
Harper Lee     2019      4.7   456
Douglas Adams  2019      4.8   980
Edgar Alan Poe 2019      4.5   678

books_with_index.reset_index(drop=False, inplace=True)

pd.wide_to_long(books_with_index, stubnames=['ratings', 'sold'], i=['author', 'title'], j='year')
                                            ratings  sold
author         title                  year
Harper Lee     To Kill a Mockingbird  2019      4.7   456
Douglas Adams  The Hitchhiker's Guide 2019      4.8   980
Edgar Alan Poe The Black Cat          2019      4.5   678
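
A minimal sketch of the index behavior, assuming `books_with_index` is built with `title` as its index. The variable names `lost`, `flat`, and `kept` are hypothetical; the slides modify the DataFrame in place instead.

```python
import pandas as pd

books_with_index = pd.DataFrame({
    "title": ["To Kill a Mockingbird", "The Hitchhiker's Guide", "The Black Cat"],
    "author": ["Harper Lee", "Douglas Adams", "Edgar Alan Poe"],
    "ratings2019": [4.7, 4.8, 4.5],
    "sold2019": [456, 980, 678],
}).set_index("title")

# title lives in the index, so it cannot be used in i and is dropped.
lost = pd.wide_to_long(books_with_index, stubnames=["ratings", "sold"],
                       i="author", j="year")
print(list(lost.index.names))  # ['author', 'year']

# Moving the index back into a column lets it act as an identifier.
flat = books_with_index.reset_index()
kept = pd.wide_to_long(flat, stubnames=["ratings", "sold"],
                       i=["author", "title"], j="year")
print(list(kept.index.names))  # ['author', 'title', 'year']
```

Using `reset_index()` without `inplace=True` leaves the original DataFrame untouched, which makes the before and after easier to compare.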

sep argument

new_books
                  title              author  ratings_2019 sold_2019 ratings_2020 sold_2020
0 A Murder Is Announced     Agatha Christie           4.4       796          4.8       856
1       Sherlock Holmes  Sir A. Conan Doyle           4.5       780          4.8       818
2           The Sparrow  Mary Doria Russell           4.2       178          4.1       238                    

pd.wide_to_long(new_books, stubnames=['ratings', 'sold'], i=['title', 'author'], j='year')
                    sold_2020 ratings_2020 ratings_2019 sold_2019  ratings sold
title  author year



pd.wide_to_long(new_books, stubnames=['ratings', 'sold'], i=['title', 'author'], j='year', sep='_')
                                               ratings  sold
title                 author             year
A Murder Is Announced Agatha Christie    2019      4.4   796
Sherlock Holmes       Sir A. Conan Doyle 2019      4.5   780
The Sparrow           Mary Doria Russell 2019      4.2   178
A Murder Is Announced Agatha Christie    2020      4.8   856
Sherlock Holmes       Sir A. Conan Doyle 2020      4.8   818
The Sparrow           Mary Doria Russell 2020      4.1   238
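
The effect of `sep` can be checked with a short sketch. The data is copied from the `new_books` table; `ids`, `stubs`, `no_sep`, and `with_sep` are illustrative names only.

```python
import pandas as pd

new_books = pd.DataFrame({
    "title": ["A Murder Is Announced", "Sherlock Holmes", "The Sparrow"],
    "author": ["Agatha Christie", "Sir A. Conan Doyle", "Mary Doria Russell"],
    "ratings_2019": [4.4, 4.5, 4.2],
    "sold_2019": [796, 780, 178],
    "ratings_2020": [4.8, 4.8, 4.1],
    "sold_2020": [856, 818, 238],
})

ids = ["title", "author"]
stubs = ["ratings", "sold"]

# The default sep='' expects names like 'ratings2019', so nothing matches.
no_sep = pd.wide_to_long(new_books, stubnames=stubs, i=ids, j="year")
print(no_sep.empty)  # True

# Naming the separator lets the stub and the suffix be split correctly.
with_sep = pd.wide_to_long(new_books, stubnames=stubs, i=ids, j="year", sep="_")
print(with_sep.shape)  # (6, 2)
```

An empty result from `wide_to_long` is usually a sign that the column names do not follow the `stub + sep + suffix` pattern being searched for.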

suffix argument

another_books
                  title  ratings_one  sold_one  ratings_two  sold_two
0 A Murder Is Announced          4.4       796          4.8       856
1       Sherlock Holmes          4.5       780          4.8       818
2           The Sparrow          4.2       178          4.1       238                    

pd.wide_to_long(another_books, stubnames=['ratings', 'sold'], i='title', j='edition', sep='_')
            sold_one ratings_one ratings_two sold_two  ratings sold
title  edition



pd.wide_to_long(another_books, stubnames=['ratings', 'sold'], i='title', j='edition', sep='_', suffix=r'\w+')
                               ratings  sold
title                 edition
A Murder Is Announced one          4.4   796
Sherlock Holmes       one          4.5   780
The Sparrow           one          4.2   178
A Murder Is Announced two          4.8   856
Sherlock Holmes       two          4.8   818
The Sparrow           two          4.1   238
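
A small sketch of the `suffix` argument, using the values from the `another_books` table. The name `editions` is an illustrative choice. The pattern is written as a raw string so that Python passes the backslash through to the regular expression unchanged.

```python
import pandas as pd

another_books = pd.DataFrame({
    "title": ["A Murder Is Announced", "Sherlock Holmes", "The Sparrow"],
    "ratings_one": [4.4, 4.5, 4.2],
    "sold_one": [796, 780, 178],
    "ratings_two": [4.8, 4.8, 4.1],
    "sold_two": [856, 818, 238],
})

# suffix is a regular expression; the default r'\d+' matches digits only,
# so word suffixes such as 'one' and 'two' need a broader pattern.
editions = pd.wide_to_long(another_books, stubnames=["ratings", "sold"],
                           i="title", j="edition", sep="_", suffix=r"\w+")

print(sorted(editions.index.get_level_values("edition").unique()))  # ['one', 'two']
print(editions.loc[("The Sparrow", "two"), "sold"])  # 238
```

Since these suffixes are not numeric, the `edition` level keeps them as strings.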

Let's practice!
