Fitting a linear regression

Introduction to Regression with statsmodels in Python

Maarten Van den Broeck

Content Developer at DataCamp

Straight lines are defined by two things

Intercept

The $y$ value at the point when $x$ is zero.

Slope

The amount the $y$ value increases if you increase $x$ by one.

Equation

$y = \text{intercept} + \text{slope} \times x$
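
As a minimal sketch of the equation above, the following evaluates a straight line and checks both definitions. The function name `line` and the values 2.0 and 0.5 are illustrative assumptions, not taken from the course data.

```python
def line(x, intercept, slope):
    """Return the y value of a straight line at x."""
    return intercept + slope * x

# Hypothetical values chosen only for illustration
intercept, slope = 2.0, 0.5

# The intercept is the y value when x is zero
y_at_zero = line(0, intercept, slope)

# The slope is how much y increases when x increases by one
rise_per_unit = line(4, intercept, slope) - line(3, intercept, slope)

print(y_at_zero, rise_per_unit)
```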

Estimating the intercept

A scatter plot of total payment versus number of claims with a linear trend line. The payment increases linearly as the number of claims increases.

Estimating the intercept

The scatter plot of total payment versus number of claims, annotated with the point where the trend line crosses the y-axis.

Estimating the intercept

The scatter plot of total payment versus number of claims, annotated with the value when the number of claims is zero.

Estimating the slope

The scatter plot of total payment versus number of claims, annotated with two points on the trend line. One point is at 1500 krona and 40 claims; another point is at 3500 krona and 100 claims.

Estimating the slope

The scatter plot of total payment versus number of claims, annotated with the difference in payment between the two points. 3500 krona minus 1500 krona is 2000 krona.

Estimating the slope

The scatter plot of total payment versus number of claims, annotated with the difference in number of claims between the two points. 100 claims minus 40 claims is 60 claims.

Estimating the slope

The scatter plot of total payment versus number of claims, annotated with the ratio of the difference in payment to the difference in number of claims. 2000 divided by 60 is about 33.
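
The slope arithmetic from these slides can be reproduced in a few lines. This is only a sketch of the by-eye estimate, using the two points read off the trend line; the variable names are illustrative.

```python
# Two points read off the trend line: (number of claims, total payment)
x1, y1 = 40, 1500
x2, y2 = 100, 3500

diff_payment = y2 - y1   # difference in payment between the two points
diff_claims = x2 - x1    # difference in number of claims

# Slope is the change in y divided by the change in x
slope_estimate = diff_payment / diff_claims

print(round(slope_estimate, 1))  # 33.3
```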

Running a model

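from statsmodels.formula.api import ols

# Specify the model: response on the left of ~, explanatory variable on the right
mdl_payment_vs_claims = ols("total_payment_sek ~ n_claims", data=swedish_motor_insurance)
# Fit the model to the data
mdl_payment_vs_claims = mdl_payment_vs_claims.fit()
# Print the fitted intercept and slope
print(mdl_payment_vs_claims.params)

Intercept    19.994486
n_claims      3.413824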
dtype: float64
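
To give a sense of what the fit computes, here is a dependency-free sketch of ordinary least squares for a single explanatory variable, using the standard closed-form formulas. The helper `fit_line` and the toy data are hypothetical. This is not how statsmodels is implemented internally, only the same estimate for this simple case.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one explanatory variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical toy data lying exactly on y = 1 + 2x
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]

intercept, slope = fit_line(x, y)
print(intercept, slope)
```

Because the toy points lie exactly on a line, the fit recovers that line's intercept and slope. Real data such as the insurance dataset scatter around the line instead.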

Interpreting the model coefficients

Intercept    19.994486
n_claims      3.413824
dtype: float64

Equation

$\text{total\_payment\_sek} = 19.99 + 3.41 \times \text{n\_claims}$
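
One way to read the equation is to plug values into it. The sketch below uses the rounded coefficients from the slide; the function name `predict_total_payment` is an illustrative assumption, not part of statsmodels.

```python
# Coefficients as reported above, rounded to two decimals
intercept = 19.99
slope = 3.41

def predict_total_payment(n_claims):
    """Predicted total_payment_sek for a given number of claims."""
    return intercept + slope * n_claims

# With zero claims, the prediction is the intercept
at_zero = predict_total_payment(0)

# Each additional claim raises the prediction by the slope
extra_per_claim = predict_total_payment(11) - predict_total_payment(10)

print(at_zero, round(extra_per_claim, 2))
```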

Let's practice!
