Streamlined Data Ingestion with pandas
Amany Mahfouz
Instructor
concat()
pandas
functionpd.concat([df1,df2])
ignore_index
to True
to renumber rows# Get first 20 bookstore results
params = {"term": "bookstore",
"location": "San Francisco"}
first_results = requests.get(api_url,
headers=headers,
params=params).json()
first_20_bookstores = json_normalize(first_results["businesses"],
sep="_")
print(first_20_bookstores.shape)
(20, 24)
# Get the next 20 bookstores
params["offset"] = 20
next_results = requests.get(api_url,
headers=headers,
params=params).json()
next_20_bookstores = json_normalize(next_results["businesses"],
sep="_")
print(next_20_bookstores.shape)
(20, 24)
# Put bookstore datasets together, renumber rows bookstores = pd.concat([first_20_bookstores, next_20_bookstores], ignore_index=True)
print(bookstores.name)
0 City Lights Bookstore
1 Alexander Book Company
2 Borderlands Books
3 Alley Cat Books
4 Dog Eared Books
... ...
35 Forest Books
36 San Francisco Center For The Book
37 KingSpoke - Book Store
38 Eastwind Books & Arts
39 My Favorite
Name: name, dtype: object
merge()
: pandas
version of a SQL joinmerge()
pandas
function and a dataframe methoddf.merge()
argumentson
if names are the same in both dataframesleft_on
and right_on
if key names differcall_counts.head()
created_date call_counts
0 01/01/2018 4597
1 01/02/2018 4362
2 01/03/2018 3045
3 01/04/2018 3374
4 01/05/2018 4333
weather.head()
date tmax tmin
0 12/01/2017 52 42
1 12/02/2017 48 39
2 12/03/2017 48 42
3 12/04/2017 51 40
4 12/05/2017 61 50
# Merge weather into call counts on date columns merged = call_counts.merge(weather, left_on="created_date", right_on="date")
print(merged.head())
created_date call_counts date tmax tmin
0 01/01/2018 4597 01/01/2018 19 7
1 01/02/2018 4362 01/02/2018 26 13
2 01/03/2018 3045 01/03/2018 30 16
3 01/04/2018 3374 01/04/2018 29 19
4 01/05/2018 4333 01/05/2018 19 9
created_date call_counts date tmax tmin
0 01/01/2018 4597 01/01/2018 19 7
1 01/02/2018 4362 01/02/2018 26 13
2 01/03/2018 3045 01/03/2018 30 16
3 01/04/2018 3374 01/04/2018 29 19
4 01/05/2018 4333 01/05/2018 19 9
merge()
behavior: return only values that are in both datasetsStreamlined Data Ingestion with pandas