Reshaping Data with pandas
Maria Eugenia Inzaugarat
Data Scientist
books
title raitings_2015 sold_2015 raitings_2016 sold_2016
0 The Civil War:Vol. 1 4.3 234 4.2 254
1 The Civil War:Vol. 2 4.5 525 4.3 515
2 The Civil War:Vol. 3 4.1 242 4.2 251
books['title'].dtypes
dtype('O')
pandas
Series string processing methodsstr
attributebooks
title raitings_2015 sold_2015 raitings_2016 sold_2016
0 The Civil War:Vol. 1 4.3 234 4.2 254
1 The Civil War:Vol. 2 4.5 525 4.3 515
2 The Civil War:Vol. 3 4.1 242 4.2 251
books['title']
books
title raitings_2015 sold_2015 raitings_2016 sold_2016
0 The Civil War:Vol. 1 4.3 234 4.2 254
1 The Civil War:Vol. 2 4.5 525 4.3 515
2 The Civil War:Vol. 3 4.1 242 4.2 251
books['title'].str.split(':')
0 [The Civil War, Vol. 1]
1 [The Civil War, Vol. 2]
2 [The Civil War, Vol. 3]
books
title raitings_2015 sold_2015 raitings_2016 sold_2016
0 The Civil War:Vol. 1 4.3 234 4.2 254
1 The Civil War:Vol. 2 4.5 525 4.3 515
2 The Civil War:Vol. 3 4.1 242 4.2 251
books['title'].str.split(":").str.get(0)
0 The Civil War
1 The Civil War
2 The Civil War
books
title raitings_2015 sold_2015 raitings_2016 sold_2016
0 The Civil War:Vol. 1 4.3 234 4.2 254
1 The Civil War:Vol. 2 4.5 525 4.3 515
2 The Civil War:Vol. 3 4.1 242 4.2 251
books['title'].str.split(":", expand=True)
0 1
0 The Civil War Vol. 1
1 The Civil War Vol. 2
2 The Civil War Vol. 3
books[['main_title', 'subtitle']] = books['title'].str.split(":", expand=True)
books.drop('title', axis=1, inplace=True)
pd.wide_to_long(books , stubnames=['ratings', 'sold'], i=['main_title', 'subtitle'], j='year')
ratings sold
main_title subtitle year
The Civil War Vol. 1 2015 4.3 234
2016 4.2 254
Vol. 2 2015 4.5 525
2016 4.3 515
Vol. 3 2015 4.1 242
2016 4.2 251
books_new
name_author lastname_author nationality number_books
0 Virginia Wolf British 50
1 Margaret Atwood Canadian 40
2 Harper Lee American 2
books_new
name_author lastname_author nationality number_books
0 Virginia Wolf British 50
1 Margaret Atwood Canadian 40
2 Harper Lee American 2
books_new['name_author'].str.cat(books_new['lastname_author'], sep=' ')
0 Virginia Wolf
1 Margaret Atwood
2 Harper Lee
books_new
name_author lastname_author nationality number_books
0 Virginia Wolf British 50
1 Margaret Atwood Canadian 40
2 Harper Lee American 2
books_new['author'] = books_new['name_author'].str.cat(books_new['lastname_author'], sep=' ')
books_new
name_author lastname_author nationality number_books author
0 Virginia Wolf British 50 Virginia Wolf
1 Margaret Atwood Canadian 40 Margaret Atwood
2 Harper Lee American 2 Harper Lee
books_new
name_author lastname_author nationality number_books
0 Virginia Wolf British 50
1 Margaret Atwood Canadian 40
2 Harper Lee American 2
books_new.melt(id_vars='author', value_vars=['nationality', 'number_books'], var_name='feature', value_name='value')
author feature value
0 Virginia Wolf nationality British
1 Margaret Atwood nationality Canadian
2 Harper Lee nationality American
3 Virginia Wolf number_books 50
4 Margaret Atwood number_books 40
5 Harper Lee number_books 2
comics_marvel
subtitle year ratings sold
main_title
Avengers Next 1992 4.5 234
Avengers Forever 1998 4.6 224
Avengers 2099 1999 4.8 141
comics_marvel.head(2)
subtitle year ratings sold
main_title
Avengers Next 1992 4.5 234
Avengers Forever 1998 4.6 224
comics_marvel.index = comics_marvel.index.str.cat(comics_marvel['subtitle'], sep='-')
books
subtitle year ratings sold
main_title
Avengers-Next Next 1992 4.5 234
Avengers-Forever Forever 1998 4.6 224
Avengers-2099 2099 1999 4.8 141
comics_marvel.index = comics_marvel.index.str.split('-', expand=True)
comics_marvel
subtitle year ratings sold
Avengers Next Next 1992 4.5 234
Forever Forever 1998 4.6 224
2099 2099 1999 4.8 141
books_new['name_author']
0 Virginia
1 Margaret
2 Harper
new_list = ['Wolf', 'Atwood', 'Lee']
books_new['name_author'].str.cat(new_list, sep=' ')
0 Virginia Wolf
1 Margaret Atwood
2 Harper Lee
Reshaping Data with pandas