Making predictions

Introduction to Regression with statsmodels in Python

Maarten Van den Broeck

Content Developer at DataCamp

The fish dataset: bream

bream = fish[fish["species"] == "Bream"]
print(bream.head())
  species  mass_g  length_cm
0   Bream   242.0       23.2
1   Bream   290.0       24.0
2   Bream   340.0       23.9
3   Bream   363.0       26.3
4   Bream   430.0       26.5

Common bream, _Abramis brama_

Plotting mass vs. length

import matplotlib.pyplot as plt
import seaborn as sns

sns.regplot(x="length_cm",
            y="mass_g",
            data=bream,
            ci=None)

plt.show()

A scatter plot of bream masses versus their lengths, with a linear trend line. The points all lie close to the trend line.

Running the model

from statsmodels.formula.api import ols

mdl_mass_vs_length = ols("mass_g ~ length_cm", data=bream).fit()

print(mdl_mass_vs_length.params)
Intercept   -1035.347565
length_cm      54.549981
dtype: float64

Data on explanatory values to predict

If I set the explanatory variables to these values,
what value would the response variable have?

import numpy as np
import pandas as pd

explanatory_data = pd.DataFrame({"length_cm": np.arange(20, 41)})
print(explanatory_data)
    length_cm
0          20
1          21
2          22
3          23
4          24
5          25
     ...

Call predict()

print(mdl_mass_vs_length.predict(explanatory_data))
0       55.652054
1      110.202035
2      164.752015
3      219.301996
4      273.851977
    ...
16     928.451749
17     983.001730
18    1037.551710
19    1092.101691
20    1146.651672
Length: 21, dtype: float64
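
For a simple linear model, each prediction is the intercept plus the slope times the explanatory value. The sketch below reproduces the predictions by hand, assuming the two coefficients are copied as literals from the printed parameters above rather than read from the fitted model. It uses plain Python so that it runs without the fish dataset; the name `manual_predictions` is illustrative only.

```python
# Coefficients copied from the printed output of mdl_mass_vs_length.params.
intercept = -1035.347565
slope = 54.549981

# Same explanatory values as np.arange(20, 41): 20, 21, ..., 40.
lengths_cm = list(range(20, 41))

# Prediction for a simple linear model: intercept + slope * x.
manual_predictions = [intercept + slope * length for length in lengths_cm]

# Show the first three predictions next to their lengths.
for length, mass in zip(lengths_cm[:3], manual_predictions[:3]):
    print(length, round(mass, 3))
```

The values should agree with the output of `predict()` up to rounding of the copied coefficients.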

Predicting inside a DataFrame

explanatory_data = pd.DataFrame(
  {"length_cm": np.arange(20, 41)}
)

prediction_data = explanatory_data.assign(
    mass_g=mdl_mass_vs_length.predict(explanatory_data)
)
print(prediction_data)
    length_cm         mass_g
0          20      55.652054
1          21     110.202035
2          22     164.752015
3          23     219.301996
4          24     273.851977
..        ...            ...
16         36     928.451749
17         37     983.001730
18         38    1037.551710
19         39    1092.101691
20         40    1146.651672

Showing predictions

import matplotlib.pyplot as plt
import seaborn as sns
fig = plt.figure()
sns.regplot(x="length_cm",
            y="mass_g",
            ci=None,
            data=bream)
sns.scatterplot(x="length_cm",
                y="mass_g",
                data=prediction_data, 
                color="red",
                marker="s")
plt.show()

The scatter plot of bream masses versus their lengths, with a linear trend line. The plot has been annotated with the points calculated using predict(). These points all follow the trend line exactly.

Extrapolating

Extrapolating means making predictions outside the range of observed data.

little_bream = pd.DataFrame({"length_cm": [10]})

pred_little_bream = little_bream.assign(
    mass_g=mdl_mass_vs_length.predict(little_bream))

print(pred_little_bream)
   length_cm      mass_g
0         10 -489.847756

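The scatter plot of bream masses versus their lengths, with a linear trend line. The plot has been annotated with a fictional 10 cm bream and its predicted mass. The predicted mass is negative, which is physically impossible, so the linear model cannot be trusted this far outside the observed data.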
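
One way to guard against this is to check whether a new value lies inside the observed range before trusting its prediction. The following is a minimal sketch, not part of the course material: `is_extrapolation` and `predict_mass` are hypothetical helpers, the coefficients are copied from the printed model parameters, and `observed_lengths` holds only the five rows printed by `bream.head()`, so the real observed range is wider than the one used here.

```python
# Coefficients copied from the printed output of mdl_mass_vs_length.params.
INTERCEPT = -1035.347565
SLOPE = 54.549981

# Assumption: only the five rows shown by bream.head(); the full data has more.
observed_lengths = [23.2, 24.0, 23.9, 26.3, 26.5]


def is_extrapolation(length_cm, observed):
    """Return True if length_cm falls outside the range of observed values."""
    return length_cm < min(observed) or length_cm > max(observed)


def predict_mass(length_cm):
    """Predicted mass in grams from the fitted straight line."""
    return INTERCEPT + SLOPE * length_cm


# Length at which the fitted line crosses zero mass.
zero_mass_length = -INTERCEPT / SLOPE

for length in (10, 25):
    print(length, is_extrapolation(length, observed_lengths),
          round(predict_mass(length), 2))
print(round(zero_mass_length, 2))
```

Under these coefficients the fitted line crosses zero mass at just under 19 cm, so any shorter length yields a negative predicted mass.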

Let's practice!
