Working with nested JSONs

Acquisizione dati semplificata con pandas

Amany Mahfouz

Instructor

Nested JSONs

  • JSONs contain objects with attribute-value pairs
  • A JSON is nested when the value itself is an object
Acquisizione dati semplificata con pandas

Example JSON response from the Yelp Business API documentation

Acquisizione dati semplificata con pandas

Example Yelp response data, with nested coordinate and location data highlighted

Acquisizione dati semplificata con pandas

Example Yelp response data, with nested category data highlighted

Acquisizione dati semplificata con pandas

Example Yelp response data, highlighting nested records under businesses

Acquisizione dati semplificata con pandas
# Print columns containing nested data
print(bookstores[["categories", "coordinates", "location"]].head(3))
                                          categories  \
0   [{'alias': 'bookstores', 'title': 'Bookstores'}]   
1  [{'alias': 'bookstores', 'title': 'Bookstores'...   
2   [{'alias': 'bookstores', 'title': 'Bookstores'}]     

                                         coordinates  \
0  {'latitude': 37.7975997924805, 'longitude': -1...   
1  {'latitude': 37.7885846793652, 'longitude': -1...   
2  {'latitude': 37.7589836120605, 'longitude': -1...    

                                            location  
0  {'address1': '261 Columbus Ave', 'address2': '...  
1  {'address1': '50 2nd St', 'address2': '', 'add...  
2  {'address1': '866 Valencia St', 'address2': ''...
Acquisizione dati semplificata con pandas

pandas.io.json

  • pandas.io.json submodule has tools for reading and writing JSON
    • Needs its own import statement
  • json_normalize()
    • Takes a dictionary/list of dictionaries (like pd.DataFrame() does)
    • Returns a flattened dataframe
    • Default flattened column name pattern: attribute.nestedattribute
    • Choose a different separator with the sep argument
Acquisizione dati semplificata con pandas

Loading Nested JSON Data

import pandas as pd
import requests

from pandas.io.json import json_normalize
# Set up headers, parameters, and API endpoint api_url = "https://api.yelp.com/v3/businesses/search" headers = {"Authorization": "Bearer {}".format(api_key)} params = {"term": "bookstore", "location": "San Francisco"}
# Make the API call and extract the JSON data response = requests.get(api_url, headers=headers, params=params) data = response.json()
Acquisizione dati semplificata con pandas
# Flatten data and load to dataframe, with _ separators
bookstores = json_normalize(data["businesses"], sep="_")
print(list(bookstores))
['alias', 
 'categories',
 'coordinates_latitude',
 'coordinates_longitude',
 ...
 'location_address1',
 'location_address2',
 'location_address3',
 'location_city',
 'location_country',
 'location_display_address',
 'location_state',
 'location_zip_code',
 ...
 'url']
Acquisizione dati semplificata con pandas

Deeply Nested Data

print(bookstores.categories.head())
0     [{'alias': 'bookstores', 'title': 'Bookstores'}]
1    [{'alias': 'bookstores', 'title': 'Bookstores'...
2     [{'alias': 'bookstores', 'title': 'Bookstores'}]
3     [{'alias': 'bookstores', 'title': 'Bookstores'}]
4    [{'alias': 'bookstores', 'title': 'Bookstores'...
Name: categories, dtype: object
Acquisizione dati semplificata con pandas

Deeply Nested Data

  • json_normalize()
    • record_path: string/list of string attributes to nested data
    • meta: list of other attributes to load to dataframe
    • meta_prefix: string to prefix to meta column names
Acquisizione dati semplificata con pandas

Deeply Nested Data

# Flatten categories data, bring in business details
df = json_normalize(data["businesses"],
                    sep="_",

record_path="categories",
meta=["name", "alias", "rating", ["coordinates", "latitude"], ["coordinates", "longitude"]],
meta_prefix="biz_")
Acquisizione dati semplificata con pandas
print(df.head(4))
        alias               title                biz_name  \
0  bookstores          Bookstores   City Lights Bookstore   
1  bookstores          Bookstores  Alexander Book Company   
2  stationery  Cards & Stationery  Alexander Book Company   
3  bookstores          Bookstores       Borderlands Books   

                              biz_alias  biz_rating  biz_coordinates_latitude  \
0   city-lights-bookstore-san-francisco         4.5                 37.797600   
1  alexander-book-company-san-francisco         4.5                 37.788585   
2  alexander-book-company-san-francisco         4.5                 37.788585   
3       borderlands-books-san-francisco         5.0                 37.758984   

   biz_coordinates_longitude  
0                -122.406578  
1                -122.400631  
2                -122.400631  
3                -122.421638
Acquisizione dati semplificata con pandas

Let's practice!

Acquisizione dati semplificata con pandas

Preparing Video For Download...