Merging multiple DataFrames

Joining Data with pandas

Aaren Stubberfield

Instructor

Merging multiple tables

Image of two DataFrames put together horizontally

Image of three DataFrames put together horizontally

Joining Data with pandas

Remembering the licenses table

print(licenses.head())
  account  ward  aid  business         address          zip
0 307071   3     743  REGGIE'S BAR...  2105 S STATE ST  60616
1 10       10    829  HONEYBEERS       13200 S HOUS...  60633
2 10002    14    775  CELINA DELI      5089 S ARCHE...  60632
3 10005    12    nan  KRAFT FOODS ...  2005 W 43RD ST   60609
4 10044    44    638  NEYBOUR'S TA...  3651 N SOUTH...  60613
Joining Data with pandas

Remembering the wards table

print(wards.head())
  ward  alderman         address          zip
0 1     Proco "Joe" ...  2058 NORTH W...  60647
1 2     Brian Hopkins    1400 NORTH  ...  60622
2 3     Pat Dowell       5046 SOUTH S...  60609
3 4     William D. B...  435 EAST 35T...  60616
4 5     Leslie A. Ha...  2325 EAST 71...  60649
Joining Data with pandas

Review new data

grants = pd.read_csv('Small_Business_Grant_Agreements.csv')
print(grants.head())
  address          zip    grant      company        
0 1000 S KOSTN...  60624  148914.50  NATIONWIDE F...
1 1000 W 35TH ST   60609  100000.00  SMALL BATCH,...
2 1000 W FULTO...  60612  34412.50   FULTON MARKE...
3 10008 S WEST...  60643  12285.32   LAW OFFICES ...
4 1002 W ARGYL...  60640  28998.75   MASALA'S IND...
Joining Data with pandas

Tables to merge

  address          zip    grant      company
0 1031 N CICER...  60651  150000.00  1031 HANS LLC
1 1375 W LAKE ST   60612  150000.00  1375 W LAKE ...
2 1800 W LAKE ST   60612  47700.00   1800 W LAKE LLC
3 4311 S HALST...  60609  87350.63   4311 S. HALS...
4 1747 W CARRO...  60612  50000.00   ACE STYLINE ...
  account  ward  aid  business         address          zip
0 307071   3     743  REGGIE'S BAR...  2105 S STATE ST  60616
1 10       10    829  HONEYBEERS       13200 S HOUS...  60633
2 10002    14    775  CELINA DELI      5089 S ARCHE...  60632
3 10005    12    nan  KRAFT FOODS ...  2005 W 43RD ST   60609
4 10044    44    638  NEYBOUR'S TA...  3651 N SOUTH...  60613
Joining Data with pandas

Theoretical merge

grants_licenses = grants.merge(licenses, on='zip')
print(grants_licenses.loc[grants_licenses['business']=="REGGIE'S BAR & GRILL", 
                          ['grant','company','account','ward','business']])
  grant      company          account  ward  business
0 136443.07  CEDARS MEDIT...  307071   3     REGGIE'S BAR...
1 39943.15   DARRYL & FYL...  307071   3     REGGIE'S BAR...
2 31250.0    JGF MANAGEMENT   307071   3     REGGIE'S BAR...
3 143427.79  HYDE PARK AN...  307071   3     REGGIE'S BAR...
4 69500.0    ZBERRY INC       307071   3     REGGIE'S BAR...
Joining Data with pandas

Single merge

grants.merge(licenses, on=['address','zip'])
  address          zip    grant     company          account  ward  aid  business
0 1020 N KOLMA...  60651  68309.8   TRITON INDUS...  7689     37    929  TRITON INDUS...
1 10241 S COMM...  60617  33275.5   SOUTH CHICAG...  246598   10    nan  SOUTH CHICAG...
2 11612 S WEST...  60643  30487.5   BEVERLY RECO...  3705     19    nan  BEVERLY RECO...
3 1600 S KOSTN...  60623  128513.7  CHARTER STEE...  293825   24    nan  LEELO STEEL,...
4 1647 W FULTO...  60612  5634.0    SN PECK BUIL...  85595    27    673  S.N. PECK BU...
Joining Data with pandas

Merging multiple tables

grants_licenses_ward = grants.merge(licenses, on=['address','zip']) \
                        .merge(wards, on='ward', suffixes=('_bus','_ward'))
grants_licenses_ward.head()
  address_bus      zip_bus  grant     company          account  ward  aid  business         alderman         address_ward     zip_ward  
0 1020 N KOLMA...  60651    68309.8   TRITON INDUS...  7689     37    929  TRITON INDUS...  Emma M. Mitts    4924 West Ch...  60651
1 10241 S COMM...  60617    33275.5   SOUTH CHICAG...  246598   10    nan  SOUTH CHICAG...  Susan Sadlow...  10500 South ...  60617
2 11612 S WEST...  60643    30487.5   BEVERLY RECO...  3705     19    nan  BEVERLY RECO...  Matthew J. O...  10400 South ...  60643
3 3502 W 111TH ST  60655    50000.0   FACE TO FACE...  263274   19    704  FACE TO FACE     Matthew J. O...  10400 South ...  60643
4 1600 S KOSTN...  60623    128513.7  CHARTER STEE...  293825   24    nan  LEELO STEEL,...  Michael Scot...  1158 South K...  60624
Joining Data with pandas

Results

import matplotlib.pyplot as plt
grant_licenses_ward.groupby('ward').agg('sum').plot(kind='bar', y='grant')
plt.show()

Bar plot of grants by ward

Joining Data with pandas

Merging even more...

Three tables:

df1.merge(df2, on='col') \
    .merge(df3, on='col')

Four tables:

df1.merge(df2, on='col') \
    .merge(df3, on='col') \
    .merge(df4, on='col')
Joining Data with pandas

Let's practice!

Joining Data with pandas

Preparing Video For Download...