Joining Data with pandas
Aaren Stubberfield
Instructor
id title popularity release_date
0 257 Oliver Twist 20.415572 2005-09-23
1 14290 Better Luck ... 3.877036 2002-01-12
2 38365 Grown Ups 38.864027 2010-06-24
3 9672 Infamous 3.680896 2006-11-16
4 12819 Alpha and Omega 12.300789 2010-09-17
title popularity release_date
id
257 Oliver Twist 20.415572 2005-09-23
14290 Better Luck ... 3.877036 2002-01-12
38365 Grown Ups 38.864027 2010-06-24
9672 Infamous 3.680896 2006-11-16
12819 Alpha and Omega 12.300789 2010-09-17
movies = pd.read_csv('tmdb_movies.csv', index_col=['id'])
print(movies.head())
title popularity release_date
id
257 Oliver Twist 20.415572 2005-09-23
14290 Better Luck ... 3.877036 2002-01-12
38365 Grown Ups 38.864027 2010-06-24
9672 Infamous 3.680896 2006-11-16
12819 Alpha and Omega 12.300789 2010-09-17
title popularity release_date
id
257 Oliver Twist 20.415572 2005-09-23
14290 Better Luck ... 3.877036 2002-01-12
38365 Grown Ups 38.864027 2010-06-24
9672 Infamous 3.680896 2006-11-16
tagline
id
19995 Enter the Wo...
285 At the end o...
206647 A Plan No On...
49026 The Legend Ends
movies_taglines = movies.merge(taglines, on='id', how='left')
print(movies_taglines.head())
title popularity release_date tagline
id
257 Oliver Twist 20.415572 2005-09-23 NaN
14290 Better Luck ... 3.877036 2002-01-12 Never undere...
38365 Grown Ups 38.864027 2010-06-24 Boys will be...
9672 Infamous 3.680896 2006-11-16 There's more...
12819 Alpha and Omega 12.300789 2010-09-17 A Pawsome 3D...
samuel = pd.read_csv('samuel.csv',
index_col=['movie_id',
'cast_id'])
print(samuel.head())
name
movie_id cast_id
184 3 Samuel L. Jackson
319 13 Samuel L. Jackson
326 2 Samuel L. Jackson
329 138 Samuel L. Jackson
393 21 Samuel L. Jackson
casts = pd.read_csv('casts.csv',
index_col=['movie_id',
'cast_id'])
print(casts.head())
character
movie_id cast_id
5 22 Jezebel
23 Diana
24 Athena
25 Elspeth
26 Eva
samuel_casts = samuel.merge(casts, on=['movie_id','cast_id'])
print(samuel_casts.head())
print(samuel_casts.shape)
name character
movie_id cast_id
184 3 Samuel L. Jackson Ordell Robbie
319 13 Samuel L. Jackson Big Don
326 2 Samuel L. Jackson Neville Flynn
329 138 Samuel L. Jackson Arnold
393 21 Samuel L. Jackson Rufus
(67, 2)
title popularity release_date
id
257 Oliver Twist 20.415572 2005-09-23
14290 Better Luck ... 3.877036 2002-01-12
38365 Grown Ups 38.864027 2010-06-24
9672 Infamous 3.680896 2006-11-16
genre
movie_id
5 Crime
5 Comedy
11 Science Fiction
11 Action
movies_genres = movies.merge(movie_to_genres, left_on='id', left_index=True,
right_on='movie_id', right_index=True)
print(movies_genres.head())
id title popularity release_date genre
5 5 Four Rooms 22.876230 1995-12-09 Crime
5 5 Four Rooms 22.876230 1995-12-09 Comedy
11 11 Star Wars 126.393695 1977-05-25 Science Fiction
11 11 Star Wars 126.393695 1977-05-25 Action
11 11 Star Wars 126.393695 1977-05-25 Adventure
Joining Data with pandas