Transforming a list-like column

Reshaping Data with pandas

Maria Eugenia Inzaugarat

Data Scientist

List-like columns

DataFrame with a column containing lists


Transforming list-like columns

Arrow pointing from DataFrame to explode DataFrame


The .explode() method


Exploding a column

cities
          city  country               zip_code
0  Los Angeles      USA  [90001, 90004, 90008]
1       Madrid    Spain  [28001, 28004, 28005]
2        Rabat  Morocco         [10010, 10170]
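
To follow along, the example frame can be rebuilt by hand. This is a minimal sketch that copies the values shown on the slide; the variable `list_lengths` is only an illustrative name and is not part of the course material.

```python
import pandas as pd

# Rebuild the example frame; values are copied from the slide above.
cities = pd.DataFrame({
    "city": ["Los Angeles", "Madrid", "Rabat"],
    "country": ["USA", "Spain", "Morocco"],
    "zip_code": [
        [90001, 90004, 90008],
        [28001, 28004, 28005],
        [10010, 10170],
    ],
})

# Each cell of zip_code holds a Python list, so rows can differ in length.
list_lengths = cities["zip_code"].str.len()
print(list_lengths.tolist())
```

The differing list lengths are why the column cannot simply be split into a fixed number of new columns without introducing missing values.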

Exploding a column

cities_explode = cities['zip_code'].explode()
cities_explode
0    90001
0    90004
0    90008
1    28001
1    28004
1    28005
2    10010
2    10170

Exploding a column

cities[['city', 'country']]

Exploding a column

cities[['city', 'country']].merge(cities_explode,                                  )

Exploding a column

cities[['city', 'country']].merge(cities_explode, left_index=True, right_index=True)
          city  country zip_code
0  Los Angeles      USA    90001
0  Los Angeles      USA    90004
0  Los Angeles      USA    90008
1       Madrid    Spain    28001
1       Madrid    Spain    28004
1       Madrid    Spain    28005
2        Rabat  Morocco    10010
2        Rabat  Morocco    10170

Exploding a column in the DataFrame

cities_explode = cities.explode('zip_code')
cities_explode
          city  country zip_code
0  Los Angeles      USA    90001
0  Los Angeles      USA    90004
0  Los Angeles      USA    90008
1       Madrid    Spain    28001
1       Madrid    Spain    28004
1       Madrid    Spain    28005
2        Rabat  Morocco    10010
2        Rabat  Morocco    10170
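
The two routes shown so far (exploding the Series and merging it back, versus calling `.explode()` on the DataFrame) should give the same rows. The sketch below checks this on the slide's data; the names `via_merge` and `via_explode` are illustrative assumptions.

```python
import pandas as pd

cities = pd.DataFrame({
    "city": ["Los Angeles", "Madrid", "Rabat"],
    "country": ["USA", "Spain", "Morocco"],
    "zip_code": [
        [90001, 90004, 90008],
        [28001, 28004, 28005],
        [10010, 10170],
    ],
})

# Route 1: explode the Series, then merge it back on the shared index.
via_merge = cities[["city", "country"]].merge(
    cities["zip_code"].explode(), left_index=True, right_index=True
)

# Route 2: explode the column directly on the DataFrame.
via_explode = cities.explode("zip_code")

# Compare the exploded values and the accompanying city labels.
same_zips = via_merge["zip_code"].tolist() == via_explode["zip_code"].tolist()
same_cities = via_merge["city"].tolist() == via_explode["city"].tolist()
print(same_zips and same_cities)
```

The DataFrame method is usually the shorter choice, since it repeats the other columns without a separate merge step.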

Exploding a column in the DataFrame

cities_explode.reset_index(drop=True, inplace=True)
cities_explode
          city  country zip_code
0  Los Angeles      USA    90001
1  Los Angeles      USA    90004
2  Los Angeles      USA    90008
3       Madrid    Spain    28001
4       Madrid    Spain    28004
5       Madrid    Spain    28005
6        Rabat  Morocco    10010
7        Rabat  Morocco    10170
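
As an alternative to resetting the index afterwards, `.explode()` accepts an `ignore_index` argument in pandas 1.1 and later. The sketch below assumes such a version is installed; `renumbered` is an illustrative name.

```python
import pandas as pd

cities = pd.DataFrame({
    "city": ["Los Angeles", "Madrid", "Rabat"],
    "country": ["USA", "Spain", "Morocco"],
    "zip_code": [
        [90001, 90004, 90008],
        [28001, 28004, 28005],
        [10010, 10170],
    ],
})

# ignore_index=True relabels the exploded rows 0..n-1 in a single step
# (assumes pandas >= 1.1, where this parameter was introduced).
renumbered = cities.explode("zip_code", ignore_index=True)
print(renumbered.index.tolist())
```

This avoids `inplace=True` and keeps the operation in a single expression, which fits method chaining.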

Empty lists

cities_new
          city  country               zip_code
0  Los Angeles      USA  [90001, 90004, 90008]
1       Madrid    Spain                     []
2        Rabat  Morocco         [10010, 10170]
cities_new.explode('zip_code')
          city  country zip_code
0  Los Angeles      USA    90001
0  Los Angeles      USA    90004
0  Los Angeles      USA    90008
1       Madrid    Spain      NaN
2        Rabat  Morocco    10010
2        Rabat  Morocco    10170
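
Whether to keep the `NaN` row depends on the analysis. One possible follow-up, sketched here with the illustrative names `exploded` and `kept`, is to drop rows whose exploded value is missing.

```python
import pandas as pd

cities_new = pd.DataFrame({
    "city": ["Los Angeles", "Madrid", "Rabat"],
    "country": ["USA", "Spain", "Morocco"],
    "zip_code": [[90001, 90004, 90008], [], [10010, 10170]],
})

# The empty list for Madrid becomes one row holding NaN.
exploded = cities_new.explode("zip_code")

# Optional: discard rows that carry no zip code at all.
kept = exploded.dropna(subset=["zip_code"])
print(kept["city"].tolist())
```

Dropping the row removes Madrid from the result entirely, so keep the `NaN` when the city itself still matters downstream.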

Chaining operations

cities
          city  country             zip_code
0  Los Angeles      USA  90001, 90004, 90008
1       Madrid    Spain  28001, 28004, 28005
2        Rabat  Morocco         10010, 10170

Chaining operations

cities['zip_code'].str.split(',', expand=True)
       0       1       2
0  90001   90004   90008
1  28001   28004   28005
2  10010   10170    None
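
For exploding, the list form is the useful one, so `expand=True` is left out in the next step. The sketch below contrasts the two results on the slide's data; `wide` and `as_lists` are illustrative names. Note that splitting on `','` alone keeps the space that follows each comma.

```python
import pandas as pd

cities = pd.DataFrame({
    "city": ["Los Angeles", "Madrid", "Rabat"],
    "country": ["USA", "Spain", "Morocco"],
    "zip_code": [
        "90001, 90004, 90008",
        "28001, 28004, 28005",
        "10010, 10170",
    ],
})

# expand=True spreads the pieces across columns, padding short rows.
wide = cities["zip_code"].str.split(",", expand=True)

# Without expand, each cell becomes a list, which .explode() can consume.
as_lists = cities["zip_code"].str.split(",")

print(wide.shape)
print(as_lists.iloc[2])
```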

Chaining operations

cities.assign(zip_code=                                )

Chaining operations

cities.assign(zip_code=cities['zip_code'].str.split(','))

Chaining operations

cities.assign(zip_code=cities['zip_code'].str.split(',')).explode('zip_code')
          city  country zip_code
0  Los Angeles      USA    90001
0  Los Angeles      USA    90004
0  Los Angeles      USA    90008
1       Madrid    Spain    28001
1       Madrid    Spain    28004
1       Madrid    Spain    28005
2        Rabat  Morocco    10010
2        Rabat  Morocco    10170
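
Because the source strings separate codes with a comma and a space, the exploded values are strings that may carry leading whitespace. A possible extension of the chain, sketched here under that assumption, strips the whitespace and converts the codes to integers; `result` is an illustrative name and the extra `.assign()` step is not part of the original slides.

```python
import pandas as pd

cities = pd.DataFrame({
    "city": ["Los Angeles", "Madrid", "Rabat"],
    "country": ["USA", "Spain", "Morocco"],
    "zip_code": [
        "90001, 90004, 90008",
        "28001, 28004, 28005",
        "10010, 10170",
    ],
})

result = (
    cities
    # Turn each comma-separated string into a list of pieces.
    .assign(zip_code=cities["zip_code"].str.split(","))
    # One row per piece; other columns are repeated.
    .explode("zip_code")
    # Hypothetical clean-up step: trim spaces and cast to int.
    .assign(zip_code=lambda df: df["zip_code"].str.strip().astype(int))
)
print(result["zip_code"].tolist())
```

Casting to `int` would drop leading zeros, so zip codes that can start with `0` are better left as stripped strings.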

Let's practice!
