Inner join

Joining Data with pandas

Aaren Stubberfield

Instructor

For clarity

Eye glasses looking at eye chart

Tables = DataFrames

Merging = Joining

1 Photo by David Travis on Unsplash

Chicago data portal dataset

Chicago skyline image

1 Photo by Pedro Lastra on Unsplash

Datasets for example

Map of Chicago Wards

Image of US Census Logo

1 Ward image By Alissapump, Own work, CC BY-SA 3.0

The ward data

import pandas as pd

# Load the ward office table: one row per ward
wards = pd.read_csv('Ward_Offices.csv')
print(wards.head())
print(wards.shape)
  ward  alderman         address          zip
0 1     Proco "Joe" ...  2058 NORTH W...  60647
1 2     Brian Hopkins    1400 NORTH  ...  60622
2 3     Pat Dowell       5046 SOUTH S...  60609
3 4     William D. B...  435 EAST 35T...  60616
4 5     Leslie A. Ha...  2325 EAST 71...  60649
(50, 4)

Census data

census = pd.read_csv('Ward_Census.csv')
print(census.head())
print(census.shape)
  ward  pop_2000  pop_2010  change  address          zip
0 1     52951     56149     6%      2765 WEST SA...  60647
1 2     54361     55805     3%      WM WASTE MAN...  60622
2 3     40385     53039     31%     17 EAST 38TH...  60653
3 4     51953     54589     5%      31ST ST HARB...  60653
4 5     55302     51455     -7%     JACKSON PARK...  60637
(50, 6)

Merging tables

wards
  ward  alderman         address          zip
0 1     Proco "Joe" ...  2058 NORTH W...  60647
1 2     Brian Hopkins    1400 NORTH  ...  60622
2 3     Pat Dowell       5046 SOUTH S...  60609
3 4     William D. B...  435 EAST 35T...  60616
4 5     Leslie A. Ha...  2325 EAST 71...  60649

census
  ward  pop_2000  pop_2010  change  address          zip
0 1     52951     56149     6%      2765 WEST SA...  60647
1 2     54361     55805     3%      WM WASTE MAN...  60622
2 3     40385     53039     31%     17 EAST 38TH...  60653
3 4     51953     54589     5%      31ST ST HARB...  60653
4 5     55302     51455     -7%     JACKSON PARK...  60637

Inner join

wards_census = wards.merge(census, on='ward')
print(wards_census.head(4))
  ward  alderman         address_x        zip_x  pop_2000  pop_2010  change  address_y        zip_y
0 1     Proco "Joe" ...  2058 NORTH W...  60647  52951     56149     6%      2765 WEST SA...  60647
1 2     Brian Hopkins    1400 NORTH  ...  60622  54361     55805     3%      WM WASTE MAN...  60622
2 3     Pat Dowell       5046 SOUTH S...  60609  40385     53039     31%     17 EAST 38TH...  60653
3 4     William D. B...  435 EAST 35T...  60616  51953     54589     5%      31ST ST HARB...  60653
print(wards_census.shape)
(50, 9)
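
The 9 columns in the merged shape follow from simple arithmetic: 4 columns from wards plus 6 from census, minus the one shared key column. The sketch below checks that on small toy tables; `wards_toy` and `census_toy` are hypothetical stand-ins with made-up values, not the real Chicago data. It also shows that `merge()` performs an inner join when `how` is not given.

```python
import pandas as pd

# Toy stand-ins for the two tables; all values are made up for illustration.
wards_toy = pd.DataFrame({
    "ward": [1, 2, 3],
    "alderman": ["A", "B", "C"],
    "address": ["addr 1", "addr 2", "addr 3"],
    "zip": ["60647", "60622", "60609"],
})
census_toy = pd.DataFrame({
    "ward": [1, 2, 3],
    "pop_2000": [100, 200, 300],
    "pop_2010": [110, 190, 330],
    "change": ["10%", "-5%", "10%"],
    "address": ["site 1", "site 2", "site 3"],
    "zip": ["60647", "60622", "60653"],
})

# merge() defaults to how='inner', so these two calls give the same result.
default_merge = wards_toy.merge(census_toy, on="ward")
explicit_inner = wards_toy.merge(census_toy, on="ward", how="inner")

# 4 columns + 6 columns - 1 shared key column = 9 columns.
expected_cols = wards_toy.shape[1] + census_toy.shape[1] - 1

print(default_merge.shape)
print(default_merge.equals(explicit_inner))
print(expected_cols)
```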

Inner join

Inner join Venn diagram
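
An inner join keeps only the rows whose key value appears in both tables, which is the overlapping region of the diagram. The wards and census tables happen to match on all 50 wards, so no rows are lost there. The minimal sketch below uses two hypothetical toy tables (`left` and `right`, with made-up values) whose keys only partly overlap, to show unmatched rows being dropped.

```python
import pandas as pd

# Hypothetical toy tables: wards 2 and 3 appear in both, 1 and 4 in only one.
left = pd.DataFrame({"ward": [1, 2, 3], "alderman": ["A", "B", "C"]})
right = pd.DataFrame({"ward": [2, 3, 4], "pop_2010": [500, 600, 700]})

# Inner join on 'ward': only keys present in both tables survive.
inner = left.merge(right, on="ward")

print(inner)
```

Ward 1 (left only) and ward 4 (right only) do not appear in the result.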

Suffixes

print(wards_census.columns)
Index(['ward', 'alderman', 'address_x', 'zip_x', 'pop_2000', 'pop_2010', 'change',
       'address_y', 'zip_y'],
      dtype='object')

Suffixes

wards_census = wards.merge(census, on='ward', suffixes=('_ward','_cen'))
print(wards_census.head())
print(wards_census.shape)
  ward  alderman         address_ward     zip_ward  pop_2000  pop_2010  change  address_cen      zip_cen
0 1     Proco "Joe" ...  2058 NORTH W...  60647     52951     56149     6%      2765 WEST SA...  60647   
1 2     Brian Hopkins    1400 NORTH  ...  60622     54361     55805     3%      WM WASTE MAN...  60622   
2 3     Pat Dowell       5046 SOUTH S...  60609     40385     53039     31%     17 EAST 38TH...  60653   
3 4     William D. B...  435 EAST 35T...  60616     51953     54589     5%      31ST ST HARB...  60653   
4 5     Leslie A. Ha...  2325 EAST 71...  60649     55302     51455     -7%     JACKSON PARK...  60637  
(50, 9)
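
Suffixes are applied only to non-key columns whose names appear in both tables. The key column and any column unique to one table keep their names. The sketch below illustrates this with hypothetical toy tables (made-up values), comparing the default `_x`/`_y` suffixes with the custom ones used above.

```python
import pandas as pd

# Hypothetical toy tables: 'zip' exists in both, 'ward' is the join key.
wards_toy = pd.DataFrame({
    "ward": [1, 2],
    "alderman": ["A", "B"],
    "zip": ["60647", "60622"],
})
census_toy = pd.DataFrame({
    "ward": [1, 2],
    "pop_2010": [100, 200],
    "zip": ["60647", "60622"],
})

# Default suffixes: overlapping columns become zip_x and zip_y.
default_cols = list(wards_toy.merge(census_toy, on="ward").columns)

# Custom suffixes: first entry is for the left table, second for the right.
custom_cols = list(
    wards_toy.merge(census_toy, on="ward", suffixes=("_ward", "_cen")).columns
)

print(default_cols)
print(custom_cols)
```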

Let's practice!
