Simple Linear Regressions

Time Series Analysis in Python

Rob Reider

Adjunct Professor, NYU-Courant; Consultant, Quantopian

What is a Regression?

  • Simple linear regression:

    $y_t = \alpha + \beta x_t + \epsilon_t$

  • Ordinary Least Squares (OLS): choose $\alpha$ and $\beta$ to minimize the sum of squared residuals $\sum_t \epsilon_t^2$
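As a rough illustration of what OLS computes for a single regressor, the sketch below estimates the slope and intercept from the standard closed-form expressions (sample covariance over sample variance). The synthetic data, the seed, and the names `alpha`, `beta`, and `resid` are assumptions made for this example only.

```python
import numpy as np

# Hypothetical synthetic data: y_t = 0.5 + 2.0 * x_t + noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 + 2.0 * x + rng.normal(scale=0.1, size=200)

# Closed-form OLS estimates for one regressor:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha = y.mean() - beta * x.mean()

# Residuals are the estimated epsilon_t; OLS minimizes their squared sum
resid = y - (alpha + beta * x)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, SSR = {np.sum(resid**2):.3f}")
```

The estimates should land close to the values used to generate the data, and they should agree with what a library routine such as `np.polyfit(x, y, deg=1)` returns.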

Python Packages to Perform Regressions

  • In statsmodels:
    import statsmodels.api as sm
    sm.OLS(y, x).fit()
    
  • In numpy:
    import numpy as np
    np.polyfit(x, y, deg=1)
    
  • In pandas (versions before 0.20 only; pd.ols has since been removed):
    import pandas as pd
    pd.ols(y, x)
    
  • In scipy:
    from scipy import stats
    stats.linregress(x, y)
    

Warning: the order of x and y is not consistent across packages. statsmodels and pd.ols take y first; numpy and scipy take x first.

Example: Regression of Small Cap Returns on Large Cap

  • Import the statsmodels module
    import statsmodels.api as sm
    
  • As before, compute percentage changes in both series
    df['SPX_Ret'] = df['SPX_Price'].pct_change()
    df['R2000_Ret'] = df['R2000_Price'].pct_change()
    
  • Add a constant to the DataFrame for the regression intercept
    df = sm.add_constant(df)
    
Regression Example (continued)

  • Notice that the first row of returns is NaN
                  SPX_Price  R2000_Price   SPX_Ret  R2000_Ret
    Date
    2012-11-01  1427.589966   827.849976       NaN        NaN
    2012-11-02  1414.199951   814.369995 -0.009379  -0.016283
    
  • Delete the row of NaN
    df = df.dropna()
    
  • Run the regression
    results = sm.OLS(df['R2000_Ret'], df[['const', 'SPX_Ret']]).fit()
    print(results.summary())
    
Regression Example (continued)

  • Regression output (the table printed by results.summary())

  • Intercept in results.params[0] (label 'const')
  • Slope in results.params[1] (label 'SPX_Ret')

Relationship Between R-Squared and Correlation

  • $[\text{corr}(x,y)]^2 = R^2$ (or R-squared), for a regression on a single x with an intercept
  • $\text{sign}(\text{corr}) = \text{sign}(\text{regression slope})$
  • In last example:
    • R-squared = 0.753
    • Slope is positive
    • Correlation = $+\sqrt{0.753} \approx 0.868$
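The two relationships above can be checked numerically. The following is a minimal sketch on hypothetical synthetic data (the seed and coefficients are assumptions), computing R-squared directly from the residuals of a fitted line and comparing it with the squared correlation.

```python
import numpy as np

# Hypothetical synthetic data with a positive relationship
rng = np.random.default_rng(7)
x = rng.normal(size=500)
y = 0.2 + 0.9 * x + rng.normal(scale=0.5, size=500)

# Fit y = intercept + slope * x by least squares
slope, intercept = np.polyfit(x, y, deg=1)
resid = y - (intercept + slope * x)

# R-squared = 1 - (sum of squared residuals) / (total sum of squares)
r_squared = 1 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)

# Correlation, and its reconstruction from R-squared and the slope's sign
corr = np.corrcoef(x, y)[0, 1]
corr_from_r2 = np.sign(slope) * np.sqrt(r_squared)

print(r_squared, corr**2)
print(corr, corr_from_r2)
```

Each printed pair should agree up to floating-point error; with a negative slope, the reconstructed correlation would take the negative square root instead.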

Let's practice!
