Joining Data with pandas
Aaren Stubberfield
Instructor
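Possible merging issue: a relationship that is unintentionally one-to-many or many-to-many
Possible concatenating issue: duplicate records unintentionally introduced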
.merge(validate=None):
Checks whether the merge is of the specified type:
'one_to_one'
'one_to_many'
'many_to_one'
'many_to_many'
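The four options can be compared side by side. This is a minimal sketch, assuming two small hypothetical frames (`left`, `right`) and a hypothetical helper `passes`, none of which come from the slides; the key is unique on the left and repeated on the right.

```python
import pandas as pd

# Hypothetical frames: keys are unique on the left, repeated on the right.
left = pd.DataFrame({"key": [1, 2, 3], "a": ["x", "y", "z"]})
right = pd.DataFrame({"key": [1, 1, 2], "b": [10, 11, 20]})


def passes(validate):
    """Return True if the merge satisfies the requested relationship."""
    try:
        left.merge(right, on="key", validate=validate)
        return True
    except pd.errors.MergeError:
        return False


options = ["one_to_one", "one_to_many", "many_to_one", "many_to_many"]
results = {option: passes(option) for option in options}
print(results)
```

With these frames, only the options that allow repeated keys on the right side should pass.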
Table Name: tracks
   tid  name             aid  mtid  gid  u_price
0    2  Balls to the...    2     2    1     0.99
1    3  Fast As a Shark    3     2    1     0.99
2    4  Restless and...    3     2    1     0.99
Table Name: specs
   tid  milliseconds    bytes
0    2        342562  5510424
1    3        230619  3990994
2    2        252051  4331779
tracks.merge(specs, on='tid',
validate='one_to_one')
Traceback (most recent call last):
MergeError: Merge keys are not unique in right dataset; not a one-to-one merge
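The failing merge above can be reproduced with the rows shown in the two tables. This sketch rebuilds `tracks` and `specs` by hand and catches the error instead of letting it propagate; the variables `error_message` and `dup_tids` are illustrative names, not part of the slides.

```python
import pandas as pd

# Rows copied from the tracks and specs tables shown on the slide.
tracks = pd.DataFrame({
    "tid": [2, 3, 4],
    "name": ["Balls to the...", "Fast As a Shark", "Restless and..."],
    "aid": [2, 3, 3],
    "mtid": [2, 2, 2],
    "gid": [1, 1, 1],
    "u_price": [0.99, 0.99, 0.99],
})
specs = pd.DataFrame({
    "tid": [2, 3, 2],
    "milliseconds": [342562, 230619, 252051],
    "bytes": [5510424, 3990994, 4331779],
})

try:
    tracks.merge(specs, on="tid", validate="one_to_one")
    error_message = None
except pd.errors.MergeError as err:
    error_message = str(err)

print(error_message)

# Identify which tid values appear more than once in specs.
dup_tids = specs.loc[specs["tid"].duplicated(keep=False), "tid"].unique().tolist()
print(dup_tids)
```

Listing the repeated keys is one way to see which rows in the right table caused the check to fail.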
albums.merge(tracks, on='aid',
validate='one_to_many')
   aid  title            artid  tid  name             mtid  gid  u_price
0    2  Balls to the...      2    2  Balls to the...     2    1     0.99
1    3  Restless and...      2    3  Fast As a Shark     2    1     0.99
2    3  Restless and...      2    4  Restless and...     2    1     0.99
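The one-to-many merge above can be sketched the same way. The slides do not show the `albums` table, so the frame below is an assumption reconstructed from the `aid`, `title`, and `artid` columns of the merged output.

```python
import pandas as pd

# Assumed albums table, reconstructed from the merged output on the slide.
albums = pd.DataFrame({
    "aid": [2, 3],
    "title": ["Balls to the...", "Restless and..."],
    "artid": [2, 2],
})
# Rows copied from the tracks table shown earlier.
tracks = pd.DataFrame({
    "tid": [2, 3, 4],
    "name": ["Balls to the...", "Fast As a Shark", "Restless and..."],
    "aid": [2, 3, 3],
    "mtid": [2, 2, 2],
    "gid": [1, 1, 1],
    "u_price": [0.99, 0.99, 0.99],
})

# aid is unique in albums, so the one-to-many check passes silently.
merged = albums.merge(tracks, on="aid", validate="one_to_many")
print(merged)
```

When the check passes, `validate` has no visible effect: the result is the same as an ordinary merge.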
pd.concat(verify_integrity=False):
Checks whether the new concatenated index contains duplicates.
The default value is False.
Table Name: inv_feb
     cid invoice_date  total
iid
7     38   2009-02-01   1.98
8     40   2009-02-01   1.98
9     42   2009-02-02   3.96
Table Name: inv_mar
     cid invoice_date  total
iid
9     17   2009-03-04   1.98
15    19   2009-03-04   1.98
16    21   2009-03-05   3.96
pd.concat([inv_feb, inv_mar],
verify_integrity=True)
Traceback (most recent call last):
ValueError: Indexes have overlapping values: Int64Index([9], dtype='int64', name='iid')
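The failing concatenation can be reproduced from the two invoice tables. This sketch rebuilds `inv_feb` and `inv_mar` by hand, with dates kept as plain strings for simplicity (an assumption; the original dtype is not shown). The names `overlap_error` and `overlap` are illustrative.

```python
import pandas as pd

# Rows copied from the inv_feb and inv_mar tables; iid is the index.
inv_feb = pd.DataFrame(
    {"cid": [38, 40, 42],
     "invoice_date": ["2009-02-01", "2009-02-01", "2009-02-02"],
     "total": [1.98, 1.98, 3.96]},
    index=pd.Index([7, 8, 9], name="iid"),
)
inv_mar = pd.DataFrame(
    {"cid": [17, 19, 21],
     "invoice_date": ["2009-03-04", "2009-03-04", "2009-03-05"],
     "total": [1.98, 1.98, 3.96]},
    index=pd.Index([9, 15, 16], name="iid"),
)

try:
    pd.concat([inv_feb, inv_mar], verify_integrity=True)
    overlap_error = None
except ValueError as err:
    overlap_error = str(err)

print(overlap_error)

# Index labels present in both tables.
overlap = inv_feb.index.intersection(inv_mar.index).tolist()
print(overlap)
```

The exact wording of the error can differ between pandas versions, so the sketch only relies on a `ValueError` being raised.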
pd.concat([inv_feb, inv_mar],
verify_integrity=False)
     cid invoice_date  total
iid
7     38   2009-02-01   1.98
8     40   2009-02-01   1.98
9     42   2009-02-02   3.96
9     17   2009-03-04   1.98
15    19   2009-03-04   1.98
16    21   2009-03-05   3.96
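With the default setting the duplicate index label passes through silently, so it can be useful to inspect the result afterwards. The following is a sketch of one possible check, not something shown on the slides; it rebuilds the two tables by hand (dates as plain strings, an assumption) and uses `Index.duplicated` to flag the repeated labels.

```python
import pandas as pd

# Rows copied from the inv_feb and inv_mar tables; iid is the index.
inv_feb = pd.DataFrame(
    {"cid": [38, 40, 42],
     "invoice_date": ["2009-02-01", "2009-02-01", "2009-02-02"],
     "total": [1.98, 1.98, 3.96]},
    index=pd.Index([7, 8, 9], name="iid"),
)
inv_mar = pd.DataFrame(
    {"cid": [17, 19, 21],
     "invoice_date": ["2009-03-04", "2009-03-04", "2009-03-05"],
     "total": [1.98, 1.98, 3.96]},
    index=pd.Index([9, 15, 16], name="iid"),
)

# verify_integrity defaults to False, so no error is raised here.
combined = pd.concat([inv_feb, inv_mar])

# Flag every row whose index label occurs more than once.
dup_mask = combined.index.duplicated(keep=False)
print(combined[dup_mask])
```

Here both rows labelled 9 are flagged, which matches the overlap that `verify_integrity=True` reports.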
Why:
What to do: