Supervised Learning in R: Regression
Nina Zumel and John Mount
Win-Vector LLC
outcome: has_dmd
inputs: CK, H
model <- lm(has_dmd ~ CK + H, data = train)
test$pred <- predict(model, newdata = test)
outcome: has_dmd $\in$ {0, 1}
A linear model (lm) predicts values outside the range [0, 1]
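A quick check makes the problem concrete (a minimal sketch, using the lm model and test frame fit above):
summary(test$pred)                    # min/max can fall outside [0, 1]
sum(test$pred < 0 | test$pred > 1)    # count of out-of-range predictions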
Logistic regression instead models the log-odds of the probability p = P(has_dmd = 1):
$$ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots $$
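Solving the log-odds equation for p gives the inverse logit (sigmoid), which always lands in (0, 1). A minimal sketch with made-up coefficient and input values (placeholders for illustration, not fit from data):
b0 <- -16.2; b1 <- 0.07; b2 <- 0.13   # hypothetical coefficients
CK <- 100; H <- 80                    # hypothetical inputs
log_odds <- b0 + b1 * CK + b2 * H
p <- 1 / (1 + exp(-log_odds))         # inverse logit maps any real number into (0, 1)
p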
glm(formula, data, family = binomial)
family = binomial: tells glm() to fit a logistic model (two-category outcome)
model <- glm(has_dmd ~ CK + H, data = train, family = binomial)
model
Call: glm(formula = has_dmd ~ CK + H, family = binomial, data = train)
Coefficients:
(Intercept)           CK            H
  -16.22046      0.07128      0.12552
Degrees of Freedom: 86 Total (i.e. Null); 84 Residual
Null Deviance: 110.8
Residual Deviance: 45.16 AIC: 51.16
predict(model, newdata, type = "response")
newdata: by default, the training data
type = "response": returns the predicted probabilities; without it, predict() returns the log-odds (the link scale)
model <- glm(has_dmd ~ CK + H, data = train, family = binomial)
test$pred <- predict(model, newdata = test, type = "response")
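For glm() models, predict() returns the log-odds (the link scale) unless type = "response" is given; the two are related by the inverse logit. A small sanity check, assuming the model and test frame above:
link_pred <- predict(model, newdata = test)                     # log-odds (the default)
prob_pred <- predict(model, newdata = test, type = "response")  # probabilities in (0, 1)
all.equal(prob_pred, 1 / (1 + exp(-link_pred)))                 # TRUE: response = inverse logit of link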
$$ R^2 = 1 - \frac{RSS}{SS_{Tot}} $$
$$ \text{pseudo-}R^2 = 1 - \frac{\text{deviance}}{\text{null deviance}} $$
Using broom::glance()
library(broom)   # glance()
library(dplyr)   # %>% and summarize()
glance(model) %>%
  summarize(pseudoR2 = 1 - deviance/null.deviance)
pseudoR2
1 0.5922402
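Equivalently, the deviance and null deviance are stored on the fitted glm object itself, so the same pseudo R-squared can be computed without broom:
1 - model$deviance / model$null.deviance   # same value as above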
Using sigr::wrapChiSqTest()
wrapChiSqTest(model)
"... pseudo-R2=0.59 ..."
# Pseudo R-squared on the test data
test %>%
  mutate(pred = predict(model, newdata = test, type = "response")) %>%
  wrapChiSqTest("pred", "has_dmd", TRUE)
Arguments: the data frame, the prediction column name, the outcome column name, and the plot title
GainCurvePlot(test, "pred","has_dmd", "DMD model on test")
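GainCurvePlot() comes from the WVPlots package, so load it with library(WVPlots) (or call WVPlots::GainCurvePlot()) before the line above. The closer the model's gain curve is to the ideal ("wizard") curve, the better sorting by the model's predictions concentrates the has_dmd cases at the top.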