Working with nested JSONs

Streamlined Data Ingestion with pandas

Amany Mahfouz

Instructor

Nested JSONs

  • JSONs contain objects with attribute-value pairs
  • A JSON is nested when the value itself is an object
Streamlined Data Ingestion with pandas

Example JSON response from the Yelp Business API documentation

Streamlined Data Ingestion with pandas

Example Yelp response data, with nested coordinate and location data highlighted

Streamlined Data Ingestion with pandas

Example Yelp response data, with nested category data highlighted

Streamlined Data Ingestion with pandas

Example Yelp response data, highlighting nested records under businesses

Streamlined Data Ingestion with pandas
# Print columns containing nested data
print(bookstores[["categories", "coordinates", "location"]].head(3))
                                          categories  \
0   [{'alias': 'bookstores', 'title': 'Bookstores'}]   
1  [{'alias': 'bookstores', 'title': 'Bookstores'...   
2   [{'alias': 'bookstores', 'title': 'Bookstores'}]     

                                         coordinates  \
0  {'latitude': 37.7975997924805, 'longitude': -1...   
1  {'latitude': 37.7885846793652, 'longitude': -1...   
2  {'latitude': 37.7589836120605, 'longitude': -1...    

                                            location  
0  {'address1': '261 Columbus Ave', 'address2': '...  
1  {'address1': '50 2nd St', 'address2': '', 'add...  
2  {'address1': '866 Valencia St', 'address2': ''...
Streamlined Data Ingestion with pandas

pandas.io.json

  • pandas.io.json submodule has tools for reading and writing JSON
    • Needs its own import statement
  • json_normalize()
    • Takes a dictionary/list of dictionaries (like pd.DataFrame() does)
    • Returns a flattened dataframe
    • Default flattened column name pattern: attribute.nestedattribute
    • Choose a different separator with the sep argument
Streamlined Data Ingestion with pandas

Loading Nested JSON Data

import pandas as pd
import requests

from pandas.io.json import json_normalize
# Set up headers, parameters, and API endpoint api_url = "https://api.yelp.com/v3/businesses/search" headers = {"Authorization": "Bearer {}".format(api_key)} params = {"term": "bookstore", "location": "San Francisco"}
# Make the API call and extract the JSON data response = requests.get(api_url, headers=headers, params=params) data = response.json()
Streamlined Data Ingestion with pandas
# Flatten data and load to dataframe, with _ separators
bookstores = json_normalize(data["businesses"], sep="_")
print(list(bookstores))
['alias', 
 'categories',
 'coordinates_latitude',
 'coordinates_longitude',
 ...
 'location_address1',
 'location_address2',
 'location_address3',
 'location_city',
 'location_country',
 'location_display_address',
 'location_state',
 'location_zip_code',
 ...
 'url']
Streamlined Data Ingestion with pandas

Deeply Nested Data

print(bookstores.categories.head())
0     [{'alias': 'bookstores', 'title': 'Bookstores'}]
1    [{'alias': 'bookstores', 'title': 'Bookstores'...
2     [{'alias': 'bookstores', 'title': 'Bookstores'}]
3     [{'alias': 'bookstores', 'title': 'Bookstores'}]
4    [{'alias': 'bookstores', 'title': 'Bookstores'...
Name: categories, dtype: object
Streamlined Data Ingestion with pandas

Deeply Nested Data

  • json_normalize()
    • record_path: string/list of string attributes to nested data
    • meta: list of other attributes to load to dataframe
    • meta_prefix: string to prefix to meta column names
Streamlined Data Ingestion with pandas

Deeply Nested Data

# Flatten categories data, bring in business details
df = json_normalize(data["businesses"],
                    sep="_",

record_path="categories",
meta=["name", "alias", "rating", ["coordinates", "latitude"], ["coordinates", "longitude"]],
meta_prefix="biz_")
Streamlined Data Ingestion with pandas
print(df.head(4))
        alias               title                biz_name  \
0  bookstores          Bookstores   City Lights Bookstore   
1  bookstores          Bookstores  Alexander Book Company   
2  stationery  Cards & Stationery  Alexander Book Company   
3  bookstores          Bookstores       Borderlands Books   

                              biz_alias  biz_rating  biz_coordinates_latitude  \
0   city-lights-bookstore-san-francisco         4.5                 37.797600   
1  alexander-book-company-san-francisco         4.5                 37.788585   
2  alexander-book-company-san-francisco         4.5                 37.788585   
3       borderlands-books-san-francisco         5.0                 37.758984   

   biz_coordinates_longitude  
0                -122.406578  
1                -122.400631  
2                -122.400631  
3                -122.421638
Streamlined Data Ingestion with pandas

Let's practice!

Streamlined Data Ingestion with pandas

Preparing Video For Download...