Model validation

Experimental Design in R

Joanne Xiong

Data Scientist

Pre-modeling EDA

  • Mean and variance of outcome by variable of interest
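library(dplyr)  # provides %>%, group_by(), summarise()

lendingclub %>% summarise(median(loan_amnt),
                          mean(int_rate),
                          mean(annual_inc))

# Mean and variance of the outcome within each verification status
lendingclub %>%
  group_by(verification_status) %>%
  summarise(mean(funded_amnt), var(funded_amnt))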
# A tibble: 3 x 3
  verification_status `mean(funded_amnt)` `var(funded_amnt)`
  <chr>                             <dbl>              <dbl>
1 Not Verified                     114.15          349.41953
2 Source Verified                  156.14          723.53265
3 Verified                         166.08          848.54561

Pre-modeling EDA continued

  • Boxplot of outcome (y-axis) by variable of interest (x-axis).
library(ggplot2)

# One box per verification status, showing the spread of funded amounts
ggplot(data = lendingclub,
       aes(x = verification_status, y = funded_amnt)) +
  geom_boxplot()

Post-modeling model validation

  • Residual plot
  • QQ-plot for normality
  • Test ANOVA assumptions
    • Homogeneity of variances
  • Try non-parametric alternatives to ANOVA
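
The checks listed above can be sketched in base R as follows. This is a minimal illustration, not the course's own code: `loans` is a simulated stand-in for `lendingclub` (the group means and spreads are assumptions loosely based on the summary table earlier), and `bartlett.test()` and `kruskal.test()` are one reasonable choice each for the variance test and the non-parametric alternative.

```r
# Simulated stand-in for lendingclub (hypothetical values, not the real data)
set.seed(42)
loans <- data.frame(
  verification_status = factor(rep(
    c("Not Verified", "Source Verified", "Verified"), each = 50)),
  funded_amnt = c(rnorm(50, mean = 114, sd = 19),
                  rnorm(50, mean = 156, sd = 27),
                  rnorm(50, mean = 166, sd = 29))
)

# One-way ANOVA: the model whose assumptions are being checked
model <- aov(funded_amnt ~ verification_status, data = loans)

# Residual plot (residuals vs. fitted) and normal QQ-plot of the residuals
par(mfrow = c(1, 2))
plot(model, which = 1)
plot(model, which = 2)
par(mfrow = c(1, 1))

# Homogeneity of variances across groups (Bartlett's test)
bartlett_result <- bartlett.test(funded_amnt ~ verification_status,
                                 data = loans)
print(bartlett_result)

# Non-parametric alternative to one-way ANOVA (Kruskal-Wallis rank sum test)
kruskal_result <- kruskal.test(funded_amnt ~ verification_status,
                               data = loans)
print(kruskal_result)
```

A small p-value from Bartlett's test suggests the group variances differ, in which case the Kruskal-Wallis result may be the safer one to report.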

Let's practice!
