Two numeric explanatory variables

Intermediate Regression with statsmodels in Python

Maarten Van den Broeck

Content Developer at DataCamp

Visualizing three numeric variables

  • 3D scatter plot
  • 2D scatter plot with response as color

Another column for the fish dataset

species  mass_g  length_cm  height_cm
Bream      1000       33.5      18.96
Bream       925       36.2      18.75
Roach       290       24.0       8.88
Roach       390       29.5       9.48
Perch      1100       39.0      12.80
Perch      1000       40.2      12.60
Pike       1250       52.0      10.69
Pike       1650       59.0      10.81
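
The snippets in this chapter assume a DataFrame called `fish` that holds the full dataset. For quick experiments, a minimal stand-in built from nothing but the eight rows shown above can be typed in directly. The name `fish_sample` is hypothetical, and any model fitted on it will not reproduce the coefficients printed later, because the full dataset has many more rows.

```python
import pandas as pd

# Hypothetical stand-in: only the eight example rows from the table above.
fish_sample = pd.DataFrame({
    "species": ["Bream", "Bream", "Roach", "Roach",
                "Perch", "Perch", "Pike", "Pike"],
    "mass_g": [1000, 925, 290, 390, 1100, 1000, 1250, 1650],
    "length_cm": [33.5, 36.2, 24.0, 29.5, 39.0, 40.2, 52.0, 59.0],
    "height_cm": [18.96, 18.75, 8.88, 9.48, 12.80, 12.60, 10.69, 10.81],
})

# One categorical column plus three numeric ones.
print(fish_sample.dtypes)
```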

3D scatter plot

3D scatter plot of all fish species, with length, height, and mass on the three axes. Once it is projected onto a flat screen or page, depth is hard to judge, so the plot is difficult to interpret.
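
One way to draw a plot like this, sketched here with matplotlib's `projection="3d"` axes (the slide does not say which tool produced its figure), puts length and height on the horizontal axes and mass on the vertical one. The data are only the eight sample rows from the table, used as a hypothetical stand-in for `fish`.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the full fish dataset: the eight sample rows.
fish = pd.DataFrame({
    "mass_g": [1000, 925, 290, 390, 1100, 1000, 1250, 1650],
    "length_cm": [33.5, 36.2, 24.0, 29.5, 39.0, 40.2, 52.0, 59.0],
    "height_cm": [18.96, 18.75, 8.88, 9.48, 12.80, 12.60, 10.69, 10.81],
})

fig = plt.figure()
# projection="3d" creates an Axes3D (mpl_toolkits.mplot3d ships with matplotlib).
ax = fig.add_subplot(projection="3d")
ax.scatter(fish["length_cm"], fish["height_cm"], fish["mass_g"])
ax.set_xlabel("length_cm")
ax.set_ylabel("height_cm")
ax.set_zlabel("mass_g")
plt.show()
```

Even with only eight points, it is hard to tell from a single static view which points are near and which are far, which is the motivation for the 2D alternative.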

2D scatter plot, color for response

import matplotlib.pyplot as plt
import seaborn as sns

sns.scatterplot(x="length_cm",
                y="height_cm",
                data=fish,
                hue="mass_g")
plt.show()

2D scatter plot of length versus height, with mass shown as color to visualize the third numeric variable.
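
With a numeric `hue`, seaborn's legend lists only a handful of representative mass values. If a continuous color scale is preferred, a sketch using plain matplotlib (an alternative, not what the slide uses) passes the response to `c=` and adds a colorbar. The data are again the eight sample rows, a hypothetical stand-in for `fish`.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the full fish dataset: the eight sample rows.
fish = pd.DataFrame({
    "mass_g": [1000, 925, 290, 390, 1100, 1000, 1250, 1650],
    "length_cm": [33.5, 36.2, 24.0, 29.5, 39.0, 40.2, 52.0, 59.0],
    "height_cm": [18.96, 18.75, 8.88, 9.48, 12.80, 12.60, 10.69, 10.81],
})

fig, ax = plt.subplots()
# c= maps each point's mass onto the colormap.
points = ax.scatter(fish["length_cm"], fish["height_cm"],
                    c=fish["mass_g"], cmap="viridis")
# The colorbar shows the full continuous mass scale instead of a few levels.
fig.colorbar(points, ax=ax, label="mass_g")
ax.set_xlabel("length_cm")
ax.set_ylabel("height_cm")
plt.show()
```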

Modeling with two numeric explanatory variables

from statsmodels.formula.api import ols

mdl_mass_vs_both = ols("mass_g ~ length_cm + height_cm",
                       data=fish).fit()

print(mdl_mass_vs_both.params)
Intercept   -622.150234
length_cm     28.968405
height_cm     26.334804
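
Because this model is additive, a prediction is just the intercept plus each coefficient times its variable:

mass_g = -622.150234 + 28.968405 * length_cm + 26.334804 * height_cm

The sketch below hard-codes the printed coefficients (an assumption for illustration; in practice read them from `mdl_mass_vs_both.params`) and evaluates that formula by hand. The two calls correspond to the first and last rows of the prediction grid that follows, and agree with them up to the rounding of the printed coefficients.

```python
# Coefficients copied by hand from the printed mdl_mass_vs_both.params.
intercept = -622.150234
coef_length = 28.968405
coef_height = 26.334804


def predict_mass(length_cm, height_cm):
    """Additive model: each variable contributes independently of the other."""
    return intercept + coef_length * length_cm + coef_height * height_cm


print(predict_mass(5, 2))    # first row of the prediction grid
print(predict_mass(60, 20))  # last row of the prediction grid
```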

The prediction flow

from itertools import product

import numpy as np
import pandas as pd

length_cm = np.arange(5, 61, 5)
height_cm = np.arange(2, 21, 2)

p = product(length_cm, height_cm)

explanatory_data = pd.DataFrame(p,
                                columns=["length_cm",
                                         "height_cm"])
prediction_data = explanatory_data.assign(
  mass_g = mdl_mass_vs_both.predict(explanatory_data))
print(prediction_data)
     length_cm  height_cm       mass_g
0            5          2  -424.638603
1            5          4  -371.968995
2            5          6  -319.299387
3            5          8  -266.629780
4            5         10  -213.960172
..         ...        ...          ...
115         60         12  1431.971694
116         60         14  1484.641302
117         60         16  1537.310909
118         60         18  1589.980517
119         60         20  1642.650125

[120 rows x 3 columns]
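
`itertools.product` is one way to build every combination of the two explanatory values. A common alternative, shown here as a sketch, is `np.meshgrid` with `indexing="ij"`. After flattening, it yields the same 120 rows in the same order (height varying fastest), so either grid can be passed to `.predict()`.

```python
from itertools import product

import numpy as np
import pandas as pd

length_cm = np.arange(5, 61, 5)   # 12 values: 5, 10, ..., 60
height_cm = np.arange(2, 21, 2)   # 10 values: 2, 4, ..., 20

# The slide's approach: every (length, height) pair via itertools.product.
grid_product = pd.DataFrame(product(length_cm, height_cm),
                            columns=["length_cm", "height_cm"])

# Alternative: with indexing="ij", row i holds length_cm[i] and column j holds
# height_cm[j]; ravel() reads row by row, so height varies fastest.
L, H = np.meshgrid(length_cm, height_cm, indexing="ij")
grid_meshgrid = pd.DataFrame({"length_cm": L.ravel(),
                              "height_cm": H.ravel()})

print(len(grid_product),
      np.array_equal(grid_product.to_numpy(), grid_meshgrid.to_numpy()))
```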

Plotting the predictions

sns.scatterplot(x="length_cm",
                y="height_cm",
                data=fish,
                hue="mass_g")
sns.scatterplot(x="length_cm",
                y="height_cm",
                data=prediction_data,
                hue="mass_g",
                legend=False,
                marker="s")
plt.show()

Scatter plot of fish length, height and mass, with the model's prediction grid overlaid as squares.
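
The overlaid squares show predictions only at the grid points. Another view, sketched below as an alternative to the slide's approach, pivots the grid into a 2D table and draws filled contours of predicted mass. To stay self-contained, it recomputes predictions from the printed coefficients instead of calling `mdl_mass_vs_both.predict` (an assumption for illustration). Because this model has no interaction term, the contour lines come out straight and parallel.

```python
from itertools import product

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

length_cm = np.arange(5, 61, 5)
height_cm = np.arange(2, 21, 2)
prediction_data = pd.DataFrame(product(length_cm, height_cm),
                               columns=["length_cm", "height_cm"])
# Coefficients copied by hand from mdl_mass_vs_both.params (assumption).
prediction_data["mass_g"] = (-622.150234
                             + 28.968405 * prediction_data["length_cm"]
                             + 26.334804 * prediction_data["height_cm"])

# Long grid -> 2D table: rows are heights, columns are lengths.
surface = prediction_data.pivot(index="height_cm", columns="length_cm",
                                values="mass_g")

fig, ax = plt.subplots()
# contourf expects Z with shape (len(y), len(x)), which matches the pivot.
filled = ax.contourf(surface.columns, surface.index, surface.to_numpy(),
                     levels=15, cmap="viridis")
fig.colorbar(filled, ax=ax, label="predicted mass_g")
ax.set_xlabel("length_cm")
ax.set_ylabel("height_cm")
plt.show()
```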

Including an interaction

mdl_mass_vs_both_inter = ols("mass_g ~ length_cm * height_cm",
                             data=fish).fit()
print(mdl_mass_vs_both_inter.params)
Intercept              159.107480
length_cm                0.301426
height_cm              -78.125178
length_cm:height_cm      3.545435
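
Writing b0 to b3 for the four printed coefficients (labels introduced here, not by statsmodels), the interaction model is

mass_g = b0 + b1 * length_cm + b2 * height_cm + b3 * length_cm * height_cm

so the effect of one extra centimeter of length is no longer constant: it is b1 + b3 * height_cm. The sketch below plugs in the printed values (copied by hand, an assumption) to show how that slope grows with height.

```python
# b1 and b3 copied by hand from the printed mdl_mass_vs_both_inter.params.
b_length = 0.301426   # coefficient on length_cm
b_inter = 3.545435    # coefficient on length_cm:height_cm


def length_slope(height_cm):
    """Change in predicted mass per extra cm of length, at a fixed height."""
    return b_length + b_inter * height_cm


for h in (2, 10, 20):
    print(f"height {h:>2} cm: {length_slope(h):.6f} g per cm of length")
```

In the additive model, by contrast, the length slope is 28.968405 at every height.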

The prediction flow with an interaction

length_cm = np.arange(5, 61, 5)
height_cm = np.arange(2, 21, 2)

p = product(length_cm, height_cm)

explanatory_data = pd.DataFrame(p,
                                columns=["length_cm",
                                         "height_cm"])

prediction_data = explanatory_data.assign(
  mass_g = mdl_mass_vs_both_inter.predict(explanatory_data))
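
The formula `length_cm * height_cm` is shorthand for both main effects plus their product, `length_cm:height_cm`. The sketch below checks this on the eight sample rows, a hypothetical stand-in for `fish`, so the fitted coefficients differ from the slide's. It lists the fitted parameter names and confirms that `.predict()` matches a manual calculation in which the interaction column is simply length times height. The two explanatory rows are made-up values for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical stand-in for the full fish dataset: the eight sample rows.
fish = pd.DataFrame({
    "mass_g": [1000, 925, 290, 390, 1100, 1000, 1250, 1650],
    "length_cm": [33.5, 36.2, 24.0, 29.5, 39.0, 40.2, 52.0, 59.0],
    "height_cm": [18.96, 18.75, 8.88, 9.48, 12.80, 12.60, 10.69, 10.81],
})

mdl = ols("mass_g ~ length_cm * height_cm", data=fish).fit()
print(list(mdl.params.index))

# Made-up explanatory values, for illustration only.
explanatory_data = pd.DataFrame({"length_cm": [30.0, 45.0],
                                 "height_cm": [10.0, 15.0]})

# Manual prediction: the interaction term multiplies length by height.
b = mdl.params
manual = (b["Intercept"]
          + b["length_cm"] * explanatory_data["length_cm"]
          + b["height_cm"] * explanatory_data["height_cm"]
          + b["length_cm:height_cm"]
          * explanatory_data["length_cm"] * explanatory_data["height_cm"])

print(np.allclose(mdl.predict(explanatory_data), manual))
```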

Plotting the predictions

sns.scatterplot(x="length_cm",
                y="height_cm",
                data=fish,
                hue="mass_g")

sns.scatterplot(x="length_cm",
                y="height_cm",
                data=prediction_data,
                hue="mass_g",
                legend=False,
                marker="s")
plt.show()

Scatter plot of fish length, height and mass, with predictions from the interaction model overlaid as a grid of squares.
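
To see how much the interaction changes the prediction surface, the sketch below evaluates both models over the same grid, using the coefficients printed earlier (copied by hand, an assumption), and reports the grid point where they disagree most. Keep in mind that the grid is a full rectangle, so it contains combinations, such as a fish 5 cm long and 20 cm tall, that no real fish has. Large disagreements there reflect extrapolation more than the data.

```python
from itertools import product

import numpy as np
import pandas as pd

# Coefficients copied by hand from the two printed params outputs.
additive = {"Intercept": -622.150234, "length_cm": 28.968405,
            "height_cm": 26.334804}
inter = {"Intercept": 159.107480, "length_cm": 0.301426,
         "height_cm": -78.125178, "length_cm:height_cm": 3.545435}

length_cm = np.arange(5, 61, 5)
height_cm = np.arange(2, 21, 2)
grid = pd.DataFrame(product(length_cm, height_cm),
                    columns=["length_cm", "height_cm"])

grid["mass_additive"] = (additive["Intercept"]
                         + additive["length_cm"] * grid["length_cm"]
                         + additive["height_cm"] * grid["height_cm"])
grid["mass_inter"] = (inter["Intercept"]
                      + inter["length_cm"] * grid["length_cm"]
                      + inter["height_cm"] * grid["height_cm"]
                      + inter["length_cm:height_cm"]
                      * grid["length_cm"] * grid["height_cm"])
grid["difference"] = grid["mass_inter"] - grid["mass_additive"]

# Row where the two models disagree most (in absolute terms).
print(grid.loc[grid["difference"].abs().idxmax()])
```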

Let's practice!