Reading nested data into a DataFrame

Reshaping Data with pandas

Maria Eugenia Inzaugarat

Data Scientist

Review

  • Reshape DataFrames and Series
  • Explode lists contained in columns
  • Split and concatenate strings
Reshaping Data with pandas

JSON format

  • JavaScript Object Notation
  • Data-interchange format
  • Easy for humans to read and write
  • Easy for machines to parse and generate
Reshaping Data with pandas

JSON format

my_writer
{
  "first" : "Mary",
  "last" : "Shelley",
  "country" : "England",
  "books" : 12
}
Reshaping Data with pandas

Nested JSON

writers
writers = [
            {
              "first": "Mary",
              "last": "Shelley",
              "books": {"title": "Frankenstein", "year": 1818}
            },
            {
              "first": "Ernest",
              "last": "Hemingway",
              "books": {"title": "The Old Man and the Sea", "year": 1951}
            }
          ]
Reshaping Data with pandas

Data normalization

from pandas import json_normalize
json_normalize(writers)
    first       last              books.title  books.year
0    Mary    Shelley             Frankenstein        1818
1  Ernest  Hemingway  The Old Man and the Sea        1951
Reshaping Data with pandas

Data normalization

writers_norm = json_normalize(writers, sep='_')
writers_norm
    first       last              books_title  books_year
0    Mary    Shelley             Frankenstein        1818
1  Ernest  Hemingway  The Old Man and the Sea        1951
Reshaping Data with pandas

Data normalization

pd.wide_to_long(writers_norm, stubnames=['books'], i=['first', 'last'], j='feature', sep='_', suffix='\w+')
                                            books
first  last      feature                         
Mary   Shelley   title               Frankenstein
                 year                        1818
Ernest Hemingway title    The Old Man and the Sea
                 year                        1951
Reshaping Data with pandas

Complex JSON

writers
[
    {'name': 'Mary',
     'last': 'Shelley',
     'books': [{'title': 'Frankestein', 'year': 1818},
                {'title': 'Mathilda ', 'year': 1819},
                {'title': 'The Last Man', 'year': 1826}]},
    {'name': 'Ernest',
     'last': 'Hemmingway',
     'books': [{'title': 'The Old Man and the Sea', 'year': 1951},
               {'title': 'The Sun Also Rises', 'year': 1927}]}
]
Reshaping Data with pandas

Complex JSON

json_normalize(writers)
     name        last                                              books
0    Mary     Shelley  [{'title': 'Frankestein', 'year': 1818}, {'tit...
1  Ernest  Hemmingway  [{'title': 'The Old Man and the Sea', 'year': ...
Reshaping Data with pandas

Record path

json_normalize(writers, record_path='books')
                     title  year
0              Frankestein  1818
1                Mathilda   1819
2             The Last Man  1826
3  The Old Man and the Sea  1951
4       The Sun Also Rises  1927
Reshaping Data with pandas

Metadata

json_normalize(writers, record_path='books', meta=['name', 'last'])
                     title  year    name        last
0              Frankestein  1818    Mary     Shelley
1                Mathilda   1819    Mary     Shelley
2             The Last Man  1826    Mary     Shelley
3  The Old Man and the Sea  1951  Ernest  Hemmingway
4       The Sun Also Rises  1927  Ernest  Hemmingway
Reshaping Data with pandas

Let's practice!

Reshaping Data with pandas

Preparing Video For Download...