Supervised Learning in R: Regression
Nina Zumel and John Mount
Win-Vector LLC

outcome: has_dmd; inputs: CK, H

model <- lm(has_dmd ~ CK + H, data = train)
test$pred <- predict(
    model, 
    newdata = test
)
outcome: has_dmd $\in \{0, 1\}$
Problem: the linear model predicts values outside the range $[0, 1]$
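To see the problem concretely, here is a minimal check (it assumes the train/test split of the DMD data and the model and test$pred built just above):

# The linear model's predictions are unbounded; the summary will
# typically show values below 0 and/or above 1, which cannot be
# read as probabilities
summary(test$pred)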

$$ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots $$
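Solving for $p$ inverts the log-odds into a probability (standard algebra for the logistic link):

$$ p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots)}} $$

This value always lies strictly between 0 and 1, which is exactly what the linear model above could not guarantee.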
glm(formula, data, family = binomial)
family = binomial: specifies logistic regression

model <- glm(has_dmd ~ CK + H, data = train, family = binomial)
model
Call:  glm(formula = has_dmd ~ CK + H, family = binomial, data = train)
Coefficients:
(Intercept)           CK            H  
  -16.22046      0.07128      0.12552  
Degrees of Freedom: 86 Total (i.e. Null);  84 Residual
Null Deviance:       110.8 
Residual Deviance: 45.16     AIC: 51.16
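Reading the fitted coefficients off the output above, the model's estimated log-odds are approximately:

$$ \log\left(\frac{p}{1-p}\right) \approx -16.22 + 0.0713 \, CK + 0.1255 \, H $$

Both coefficients are positive, so larger CK and H values increase the predicted probability that has_dmd = 1.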
predict(model, newdata, type = "response")
newdata: by default, the training data
type = "response": to return predicted probabilities

model <- glm(has_dmd ~ CK + H, data = train, family = binomial)
test$pred <- predict(model, newdata = test, type = "response")
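Why type = "response" matters: for a binomial glm, predict() without it returns predictions on the link (log-odds) scale. A small sketch, assuming the model and test frame from above:

# Default: link-scale predictions (log-odds), unbounded real numbers
log_odds <- predict(model, newdata = test)

# type = "response": probabilities in (0, 1)
probs <- predict(model, newdata = test, type = "response")

# The two are related by the logistic (sigmoid) transform
all.equal(probs, 1 / (1 + exp(-log_odds)))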

$$ R^2 = 1 - \frac{RSS}{SS_{Tot}} $$
$$ \text{pseudo-}R^2 = 1 - \frac{\text{deviance}}{\text{null deviance}} $$
Using broom::glance() 
library(broom)
library(dplyr)

glance(model) %>% 
  summarize(pseudoR2 = 1 - deviance / null.deviance)
   pseudoR2
1 0.5922402
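Equivalently, the same pseudo R-squared can be read directly off the glm object, since deviance and null.deviance are standard components of the fit (a base-R sketch, no extra packages):

# pseudo R-squared straight from the model object
1 - model$deviance / model$null.deviance

Values closer to 1 indicate that the model explains more of the deviance.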
Using sigr::wrapChiSqTest()
wrapChiSqTest(model)
"... pseudo-R2=0.59 ..."
# Test data
test %>% 
  mutate(pred = predict(model, newdata = test, type = "response")) %>%
  wrapChiSqTest("pred", "has_dmd", TRUE)
Arguments: the data frame, the prediction column name, the outcome column name, and the target outcome value (here TRUE)
GainCurvePlot(test, "pred", "has_dmd", "DMD model on test")
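GainCurvePlot() is from the WVPlots package; its arguments are the data frame, the prediction column name, the true-outcome column name, and a plot title. A minimal usage sketch, assuming the test frame with the pred column built above:

library(WVPlots)

# The gain curve compares sorting the rows by model score against the
# ideal ("wizard") sort by the true outcome; the closer the model's curve
# is to the wizard curve, the better the model ranks DMD carriers ahead
# of non-carriers
GainCurvePlot(test, "pred", "has_dmd", "DMD model on test")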
