Fitting a linear regression

Introduction to Regression in R

Richie Cotton

Data Evangelist at DataCamp

Straight lines are defined by two things

Intercept

The $y$ value at the point when $x$ is zero.

Slope

The amount the $y$ value increases if you increase $x$ by one.

Equation

$y = \text{intercept} + \text{slope} \times x$
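
As a minimal sketch of the two definitions, the line equation can be written as a small R function. The helper name `straight_line` and the values 2 and 0.5 are illustrative assumptions, not part of the course material.

```r
# Hypothetical helper: evaluate a straight line at x
straight_line <- function(x, intercept, slope) {
  intercept + slope * x
}

# Illustrative values only, not taken from the insurance data
y_at_zero <- straight_line(0, intercept = 2, slope = 0.5)
y_at_one  <- straight_line(1, intercept = 2, slope = 0.5)

y_at_zero             # the y value when x is zero, i.e. the intercept
y_at_one - y_at_zero  # the increase in y when x goes up by one, i.e. the slope
```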

Estimating the intercept

A scatter plot of total payment versus number of claims with a linear trend line. The payment increases linearly as the number of claims increases.

Estimating the intercept

The scatter plot of total payment versus number of claims, annotated with the point where the trend line crosses the y-axis.

Estimating the intercept

The scatter plot of total payment versus number of claims, annotated with the value when the number of claims is zero.

Estimating the slope

The scatter plot of total payment versus number of claims, annotated with two points on the trend line. One point is at 150 krona and 40 claims; another point is at 400 krona and 110 claims.

Estimating the slope

The scatter plot of total payment versus number of claims, annotated with the difference in payment between the two points. 400 krona minus 150 krona is 250 krona.

Estimating the slope

The scatter plot of total payment versus number of claims, annotated with the difference in number of claims between the two points. 110 claims minus 40 claims is 70 claims.

Estimating the slope

The scatter plot of total payment versus number of claims, annotated with the ratio of the difference in payment to the difference in number of claims. 250 divided by 70 is about 3.6.
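
The rise-over-run estimate can be checked with a few lines of R. This is a sketch that uses only the two annotated points on the trend line (150 krona at 40 claims, 400 krona at 110 claims); the variable names are illustrative.

```r
# Two points read off the trend line (values from the slide annotations)
claims  <- c(40, 110)
payment <- c(150, 400)

rise <- diff(payment)  # 400 - 150 = 250
run  <- diff(claims)   # 110 - 40 = 70

slope_estimate <- rise / run
slope_estimate  # roughly 3.57
```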

Introduction to Regression in R

Running a model

lm(total_payment_sek ~ n_claims, data = swedish_motor_insurance)
Call:
lm(formula = total_payment_sek ~ n_claims, data = swedish_motor_insurance)

Coefficients:
(Intercept)     n_claims  
     19.994        3.414
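
To try the same call without the course dataset, the sketch below fits `lm()` to a synthetic stand-in and pulls out the coefficients with `coef()`. The data frame `toy_insurance` and the numbers used to generate it are assumptions made for illustration; they are not the real `swedish_motor_insurance` data.

```r
# Synthetic stand-in data (an assumption, not the real swedish_motor_insurance dataset)
set.seed(1)
toy_insurance <- data.frame(n_claims = 1:50)
toy_insurance$total_payment_sek <-
  20 + 3.4 * toy_insurance$n_claims + rnorm(50, sd = 5)

# Same formula interface as on the slide: response ~ explanatory
mdl <- lm(total_payment_sek ~ n_claims, data = toy_insurance)

# Named numeric vector with elements "(Intercept)" and "n_claims"
coef(mdl)
```

Storing the fitted model in a variable, rather than printing it directly, makes it easy to reuse the coefficients later.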

Interpreting the model coefficients

Call:
lm(formula = total_payment_sek ~ n_claims, data = swedish_motor_insurance)

Coefficients:
(Intercept)     n_claims  
     19.994        3.414

Equation

$\text{total\_payment\_sek} = 19.994 + 3.414 \times \text{n\_claims}$
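
Read this way, the intercept is the expected total payment when there are zero claims, and the slope is the expected increase in payment for each additional claim. As a sketch, the equation can be evaluated directly in R; the coefficients are copied from the model output, while the claim counts are illustrative inputs.

```r
# Coefficients copied from the printed model output
intercept <- 19.994
slope     <- 3.414

# Illustrative numbers of claims (not taken from the dataset)
n_claims <- c(0, 10, 100)

predicted_payment <- intercept + slope * n_claims
predicted_payment  # 19.994, 54.134, 361.394
```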

Let's practice!
