Using merge_ordered()

Joining Data with pandas

Aaren Stubberfield

Instructor

merge_ordered()

Table example of merge_ordered()

Joining Data with pandas

Method comparison

.merge() method:

  • Column(s) to join on
    • on, left_on, and right_on
  • Type of join
    • how (left, right, inner, outer) {{@}}
    • default inner
  • Overlapping column names
    • suffixes
  • Calling the method
    • df1.merge(df2)

merge_ordered() method:

  • Column(s) to join on
    • on, left_on, and right_on
  • Type of join
    • how (left, right, inner, outer)
    • default outer
  • Overlapping column names
    • suffixes
  • Calling the function
    • pd.merge_ordered(df1, df2)
Joining Data with pandas

Financial dataset

Image of newspaper with dow jones chart

1 Photo by Markus Spiske on Unsplash
Joining Data with pandas

Stock data

Table Name: aapl

  date        close    
0 2007-02-01  12.087143
1 2007-03-01  13.272857
2 2007-04-01  14.257143
3 2007-05-01  17.312857
4 2007-06-01  17.434286

Table Name: mcd

  date        close    
0 2007-01-01  44.349998
1 2007-02-01  43.689999
2 2007-03-01  45.049999
3 2007-04-01  48.279999
4 2007-05-01  50.549999
Joining Data with pandas

Merging stock data

import pandas as pd
pd.merge_ordered(aapl, mcd, on='date', suffixes=('_aapl','_mcd'))
  date        close_aapl  close_mcd
0 2007-01-01  NaN         44.349998
1 2007-02-01  12.087143   43.689999
2 2007-03-01  13.272857   45.049999
3 2007-04-01  14.257143   48.279999
4 2007-05-01  17.312857   50.549999
5 2007-06-01  17.434286   NaN
Joining Data with pandas

Forward fill

Image of a before and after of a forward fill. Missing values in a column are filled with previous values of that column.

Joining Data with pandas

Forward fill example

pd.merge_ordered(aapl, mcd, on='date', 
                 suffixes=('_aapl','_mcd'), 
                 fill_method='ffill')
  date        close_aapl  close_mcd
0 2007-01-01  NaN         44.349998
1 2007-02-01  12.087143   43.689999
2 2007-03-01  13.272857   45.049999
3 2007-04-01  14.257143   48.279999
4 2007-05-01  17.312857   50.549999
5 2007-06-01  17.434286   50.549999
pd.merge_ordered(aapl, mcd, on='date', 
                 suffixes=('_aapl','_mcd'))
  date        close_aapl  close_mcd
0 2007-01-01  NaN         44.349998
1 2007-02-01  12.087143   43.689999
2 2007-03-01  13.272857   45.049999
3 2007-04-01  14.257143   48.279999
4 2007-05-01  17.312857   50.549999
5 2007-06-01  17.434286   NaN
Joining Data with pandas

When to use merge_ordered()?

  • Ordered data / time series
  • Filling in missing values
Joining Data with pandas

Let's practice!

Joining Data with pandas

Preparing Video For Download...