Joining Data with pandas
Aaren Stubberfield
Instructor
print(licenses.head())
account ward aid business address zip
0 307071 3 743 REGGIE'S BAR... 2105 S STATE ST 60616
1 10 10 829 HONEYBEERS 13200 S HOUS... 60633
2 10002 14 775 CELINA DELI 5089 S ARCHE... 60632
3 10005 12 nan KRAFT FOODS ... 2005 W 43RD ST 60609
4 10044 44 638 NEYBOUR'S TA... 3651 N SOUTH... 60613
print(wards.head())
ward alderman address zip
0 1 Proco "Joe" ... 2058 NORTH W... 60647
1 2 Brian Hopkins 1400 NORTH ... 60622
2 3 Pat Dowell 5046 SOUTH S... 60609
3 4 William D. B... 435 EAST 35T... 60616
4 5 Leslie A. Ha... 2325 EAST 71... 60649
grants = pd.read_csv('Small_Business_Grant_Agreements.csv')
print(grants.head())
address zip grant company
0 1000 S KOSTN... 60624 148914.50 NATIONWIDE F...
1 1000 W 35TH ST 60609 100000.00 SMALL BATCH,...
2 1000 W FULTO... 60612 34412.50 FULTON MARKE...
3 10008 S WEST... 60643 12285.32 LAW OFFICES ...
4 1002 W ARGYL... 60640 28998.75 MASALA'S IND...
address zip grant company
0 1031 N CICER... 60651 150000.00 1031 HANS LLC
1 1375 W LAKE ST 60612 150000.00 1375 W LAKE ...
2 1800 W LAKE ST 60612 47700.00 1800 W LAKE LLC
3 4311 S HALST... 60609 87350.63 4311 S. HALS...
4 1747 W CARRO... 60612 50000.00 ACE STYLINE ...
account ward aid business address zip
0 307071 3 743 REGGIE'S BAR... 2105 S STATE ST 60616
1 10 10 829 HONEYBEERS 13200 S HOUS... 60633
2 10002 14 775 CELINA DELI 5089 S ARCHE... 60632
3 10005 12 nan KRAFT FOODS ... 2005 W 43RD ST 60609
4 10044 44 638 NEYBOUR'S TA... 3651 N SOUTH... 60613
grants_licenses = grants.merge(licenses, on='zip')
print(grants_licenses.loc[grants_licenses['business']=="REGGIE'S BAR & GRILL",
['grant','company','account','ward','business']])
grant company account ward business
0 136443.07 CEDARS MEDIT... 307071 3 REGGIE'S BAR...
1 39943.15 DARRYL & FYL... 307071 3 REGGIE'S BAR...
2 31250.0 JGF MANAGEMENT 307071 3 REGGIE'S BAR...
3 143427.79 HYDE PARK AN... 307071 3 REGGIE'S BAR...
4 69500.0 ZBERRY INC 307071 3 REGGIE'S BAR...
grants.merge(licenses, on=['address','zip'])
address zip grant company account ward aid business
0 1020 N KOLMA... 60651 68309.8 TRITON INDUS... 7689 37 929 TRITON INDUS...
1 10241 S COMM... 60617 33275.5 SOUTH CHICAG... 246598 10 nan SOUTH CHICAG...
2 11612 S WEST... 60643 30487.5 BEVERLY RECO... 3705 19 nan BEVERLY RECO...
3 1600 S KOSTN... 60623 128513.7 CHARTER STEE... 293825 24 nan LEELO STEEL,...
4 1647 W FULTO... 60612 5634.0 SN PECK BUIL... 85595 27 673 S.N. PECK BU...
grants_licenses_ward = grants.merge(licenses, on=['address','zip']) \
.merge(wards, on='ward', suffixes=('_bus','_ward'))
grants_licenses_ward.head()
address_bus zip_bus grant company account ward aid business alderman address_ward zip_ward
0 1020 N KOLMA... 60651 68309.8 TRITON INDUS... 7689 37 929 TRITON INDUS... Emma M. Mitts 4924 West Ch... 60651
1 10241 S COMM... 60617 33275.5 SOUTH CHICAG... 246598 10 nan SOUTH CHICAG... Susan Sadlow... 10500 South ... 60617
2 11612 S WEST... 60643 30487.5 BEVERLY RECO... 3705 19 nan BEVERLY RECO... Matthew J. O... 10400 South ... 60643
3 3502 W 111TH ST 60655 50000.0 FACE TO FACE... 263274 19 704 FACE TO FACE Matthew J. O... 10400 South ... 60643
4 1600 S KOSTN... 60623 128513.7 CHARTER STEE... 293825 24 nan LEELO STEEL,... Michael Scot... 1158 South K... 60624
import matplotlib.pyplot as plt
grant_licenses_ward.groupby('ward').agg('sum').plot(kind='bar', y='grant')
plt.show()
Three tables:
df1.merge(df2, on='col') \
.merge(df3, on='col')
Four tables:
df1.merge(df2, on='col') \
.merge(df3, on='col') \
.merge(df4, on='col')
Joining Data with pandas