Combining multiple datasets

Streamlined Data Ingestion with pandas

Amany Mahfouz

Instructor

Concatenating

  • Use case: adding rows from one dataframe to another
  • concat()
    • pandas function
    • Syntax: pd.concat([df1,df2])
    • Set ignore_index to True to renumber rows
Streamlined Data Ingestion with pandas

Concatenating

# Get first 20 bookstore results
params = {"term": "bookstore", 
          "location": "San Francisco"}  
first_results = requests.get(api_url,
                             headers=headers,
                             params=params).json()

first_20_bookstores = json_normalize(first_results["businesses"],
                                     sep="_")

print(first_20_bookstores.shape)
(20, 24)
Streamlined Data Ingestion with pandas
# Get the next 20 bookstores
params["offset"] = 20
next_results = requests.get(api_url,
                            headers=headers,
                            params=params).json()

next_20_bookstores = json_normalize(next_results["businesses"],
                                    sep="_")

print(next_20_bookstores.shape)
(20, 24)
Streamlined Data Ingestion with pandas
# Put bookstore datasets together, renumber rows
bookstores = pd.concat([first_20_bookstores, next_20_bookstores],
                        ignore_index=True)

print(bookstores.name)
0                             City Lights Bookstore
1                            Alexander Book Company
2                                 Borderlands Books
3                                   Alley Cat Books
4                                   Dog Eared Books
...                                             ...
35                                     Forest Books
36                San Francisco Center For The Book
37                           KingSpoke - Book Store
38                            Eastwind Books & Arts
39                                      My Favorite
Name: name, dtype: object
Streamlined Data Ingestion with pandas

Merging

  • Use case: combining datasets to add related columns
  • Datasets have key column(s) with common values
  • merge(): pandas version of a SQL join
Streamlined Data Ingestion with pandas

Merging

  • merge()
    • Both a pandas function and a dataframe method
  • df.merge() arguments
    • Second dataframe to merge
    • Columns to merge on
      • on if names are the same in both dataframes
      • left_on and right_on if key names differ
      • Key columns should be the same data type
Streamlined Data Ingestion with pandas
call_counts.head()
  created_date  call_counts
0   01/01/2018         4597
1   01/02/2018         4362
2   01/03/2018         3045
3   01/04/2018         3374
4   01/05/2018         4333
weather.head()
         date    tmax  tmin  
0  12/01/2017      52    42  
1  12/02/2017      48    39  
2  12/03/2017      48    42  
3  12/04/2017      51    40  
4  12/05/2017      61    50
Streamlined Data Ingestion with pandas

Merging

# Merge weather into call counts on date columns
merged = call_counts.merge(weather, 
                           left_on="created_date", 
                           right_on="date")

print(merged.head())
  created_date  call_counts        date  tmax  tmin
0   01/01/2018         4597  01/01/2018    19     7
1   01/02/2018         4362  01/02/2018    26    13
2   01/03/2018         3045  01/03/2018    30    16
3   01/04/2018         3374  01/04/2018    29    19
4   01/05/2018         4333  01/05/2018    19     9
Streamlined Data Ingestion with pandas

Merging

   created_date  call_counts        date  tmax  tmin
0    01/01/2018         4597  01/01/2018    19     7
1    01/02/2018         4362  01/02/2018    26    13
2    01/03/2018         3045  01/03/2018    30    16
3    01/04/2018         3374  01/04/2018    29    19
4    01/05/2018         4333  01/05/2018    19     9
  • Default merge() behavior: return only values that are in both datasets
  • One record for each value match between dataframes
    • Multiple matches = multiple records
Streamlined Data Ingestion with pandas

Let's practice!

Streamlined Data Ingestion with pandas

Preparing Video For Download...