Model Building and Evaluation with tidymodels

Dimensionality Reduction in R

Matt Pickard

Owner, Pickard Predictives, LLC

Model fitting process

The first step of model fitting is splitting the data.

The second step is preparing the data.

The third step is fitting the model.

The fourth step is evaluating the model.


Model fitting with tidymodels

tidymodels has functions to split the data into training and test sets (rsample).

tidymodels recipes have functions to create steps for pre-processing the data (recipes).

tidymodels has functions to specify and fit a variety of models and to bundle a model with its recipe in a workflow (parsnip, workflows).


Splitting out train and test sets

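library(tidymodels)

# Hold out 20% of the rows for testing, sampling within each credit_score class
split <- initial_split(credit_df, prop = 0.8, strata = credit_score)

train <- split %>% training()
test <- split %>% testing()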
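The `strata = credit_score` argument makes `initial_split()` sample within each class of the outcome, so the training and test sets keep roughly the same class mix. The base R sketch below illustrates the idea on a small simulated data frame; `toy_df`, its `grade` column, and the helper `stratified_split()` are hypothetical stand-ins, not part of rsample, whose real implementation also handles numeric and very small strata.

```r
# Hypothetical data: 200 rows with an imbalanced three-class outcome
set.seed(42)
toy_df <- data.frame(
  grade = factor(rep(c("Good", "Standard", "Poor"), times = c(40, 100, 60))),
  x = rnorm(200)
)

# Hypothetical helper: draw `prop` of the row indices within each class
stratified_split <- function(df, strata_col, prop = 0.8) {
  idx_by_class <- split(seq_len(nrow(df)), df[[strata_col]])
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), size = round(prop * length(idx)))]
  }), use.names = FALSE)
  list(train = df[train_idx, , drop = FALSE],
       test  = df[-train_idx, , drop = FALSE])
}

parts <- stratified_split(toy_df, "grade", prop = 0.8)

# Both sets: Good 0.2, Poor 0.3, Standard 0.5
round(prop.table(table(parts$train$grade)), 2)
round(prop.table(table(parts$test$grade)), 2)
```

Without stratification, a purely random 80/20 split can leave a rare class under- or over-represented in the small test set, which makes the evaluation noisier.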

Creating a recipe and a model

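# Feature-selection recipe: each step is estimated from the training data
feature_selection_recipe <- 
  recipe(credit_score ~ ., data = train) %>%
  # remove predictors with more than 50% missing values
  step_filter_missing(all_predictors(), threshold = 0.5) %>%
  # divide numeric predictors by their standard deviation
  step_scale(all_numeric_predictors()) %>%
  # remove near-zero-variance predictors
  step_nzv(all_predictors()) %>%
  prep()

# Logistic regression model using the glm engine
lr_model <- logistic_reg() %>%
  set_engine("glm")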
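`step_nzv()` flags near-zero-variance predictors with two statistics: the ratio of the most common value's count to the second most common (`freq_cut`, default 95/5) and the number of distinct values as a percentage of rows (`unique_cut`, default 10). A column is flagged when the ratio is above `freq_cut` and the percentage is at most `unique_cut`, or when it has a single value. The base R sketch below mimics that rule; `is_near_zero_var()` and the `predictors` data are hypothetical illustrations, not recipes' internal code.

```r
# Hypothetical helper that mirrors the near-zero-variance criterion
is_near_zero_var <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  counts <- sort(table(x, useNA = "no"), decreasing = TRUE)
  if (length(counts) <= 1) return(TRUE)          # constant column: zero variance
  freq_ratio <- counts[[1]] / counts[[2]]         # most common vs. runner-up
  pct_unique <- 100 * length(counts) / length(x)  # distinct values as % of rows
  freq_ratio > freq_cut && pct_unique <= unique_cut
}

# Hypothetical predictors
set.seed(1)
n <- 1000
predictors <- data.frame(
  mostly_zero = c(rep(0, 990), 1:10),  # 990 zeros plus ten rare values
  constant    = rep(5, n),
  informative = rnorm(n)
)

sapply(predictors, is_near_zero_var)  # mostly_zero TRUE, constant TRUE, informative FALSE
```

Columns like `mostly_zero` carry almost no information for the model but can make coefficient estimates unstable, which is why the recipe drops them.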

Create and fit the workflow

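# Bundle the recipe and the model into one workflow
credit_wflow <- workflow() %>%
  add_recipe(feature_selection_recipe) %>%
  add_model(lr_model)

# Fit the pre-processing steps and the model on the training data
credit_fit <- credit_wflow %>% fit(data = train)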
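Bundling the recipe and model in a workflow means the pre-processing statistics (missing-value proportions, scaling standard deviations, near-zero-variance checks) come from the training data only, and prediction reuses those stored values on new data. The base R sketch below shows the same discipline for the scaling step alone, using simulated data and `glm()` directly; `make_data()`, `scale_x()`, and the toy data sets are hypothetical, and this is an illustration of the idea, not how workflows is implemented.

```r
set.seed(7)

# Hypothetical binary-outcome data standing in for the credit data
make_data <- function(n) {
  x <- rnorm(n, mean = 50, sd = 10)
  noisy <- x + rnorm(n, sd = 10)
  y <- factor(ifelse(noisy > 50, "yes", "no"), levels = c("no", "yes"))
  data.frame(x = x, y = y)
}
train_toy <- make_data(300)
test_toy  <- make_data(100)

# "prep": estimate the scaling factor from the training data only
x_sd <- sd(train_toy$x)

# "bake": divide by the stored training SD (no centering, like step_scale())
scale_x <- function(df, sd_value) {
  df$x <- df$x / sd_value
  df
}

# Fit on the pre-processed training data
fit_toy <- glm(y ~ x, data = scale_x(train_toy, x_sd), family = binomial)

# Pre-process the test data with the training SD, predict P(y = "yes"), label classes
prob_yes <- predict(fit_toy, newdata = scale_x(test_toy, x_sd), type = "response")
pred_class <- factor(ifelse(prob_yes > 0.5, "yes", "no"), levels = c("no", "yes"))
mean(pred_class == test_toy$y)  # test-set accuracy
```

Re-estimating the standard deviation on the test set would leak information about the test data into pre-processing; keeping the training value avoids that.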

Evaluate the model

# Predict test data
credit_pred_df <- predict(credit_fit, test) %>% 
  bind_cols(test %>% select(credit_score))

# Evaluate F score
f_meas(credit_pred_df, credit_score, .pred_class)
# A tibble: 1 × 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 f_meas  macro          0.519
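The `macro` estimator in the output indicates that `f_meas()` treated `credit_score` as a multiclass outcome: it computes a one-vs-all F1 score for each class and averages them with equal weight. The base R sketch below reproduces that calculation on hypothetical `truth` and `pred` labels so the number can be checked by hand; `per_class_f1()` is an illustrative helper, not a yardstick function.

```r
# Hypothetical true and predicted labels for a three-class outcome
lv <- c("Good", "Poor", "Standard")
truth <- factor(c("Good", "Good", "Poor", "Poor", "Poor",
                  "Standard", "Standard", "Standard"), levels = lv)
pred  <- factor(c("Good", "Standard", "Poor", "Poor", "Standard",
                  "Standard", "Standard", "Good"), levels = lv)

# One-vs-all F1 for each class: harmonic mean of precision and recall
per_class_f1 <- function(truth, pred) {
  sapply(levels(truth), function(cls) {
    tp <- sum(pred == cls & truth == cls)  # true positives for this class
    precision <- tp / sum(pred == cls)
    recall    <- tp / sum(truth == cls)
    2 * precision * recall / (precision + recall)
  })
}

f1_by_class <- per_class_f1(truth, pred)
f1_by_class        # Good 0.5, Poor 0.8, Standard 4/7 (about 0.571)
mean(f1_by_class)  # macro F1 = (0.5 + 0.8 + 4/7) / 3, about 0.624
```

With yardstick loaded, `f_meas_vec(truth, pred, estimator = "macro")` should return the same value. Because every class counts equally, a macro score like 0.519 can be pulled down by one poorly predicted class even when overall accuracy looks reasonable.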

Explore the recipe with tidy()

tidy(feature_selection_recipe, number = 1)
# A tibble: 2 × 2
  terms            id                  
  <chr>            <chr>               
1 age              filter_missing_gVVfc
2 outstanding_debt filter_missing_gVVfc
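`tidy(..., number = 1)` describes the first step, `step_filter_missing()`, and lists the columns it removed. With `threshold = 0.5`, a predictor is dropped when the proportion of missing values in the training data is above one half. A quick base R way to see which columns cross that line is to compute the missing-value proportion per column, sketched here on a hypothetical `toy_train` data frame.

```r
# Hypothetical training data with different amounts of missingness
toy_train <- data.frame(
  age     = c(NA, NA, NA, 41, NA, 35),  # 4 of 6 missing
  income  = c(52, NA, 61, 48, 70, 55),  # 1 of 6 missing
  balance = c(NA, 10, 9, 8, NA, 12)     # 2 of 6 missing
)

# Proportion of missing values in each column
missing_prop <- colMeans(is.na(toy_train))
round(missing_prop, 2)  # age 0.67, income 0.17, balance 0.33

# Columns above the 0.5 threshold
names(missing_prop)[missing_prop > 0.5]  # "age"
```

Running the same check on the real training data is a simple way to confirm why a column such as `age` appears in the recipe's tidy output.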

Explore the model with tidy()

# Display model estimates
tidy(credit_fit)
# A tibble: 44 × 5
   term                estimate std.error statistic p.value
   <chr>                  <dbl>     <dbl>     <dbl>   <dbl>
 1 (Intercept)           2.88       0.918    3.13   0.00173
 2 monthAugust          -0.449      0.236   -1.91   0.0565 
 3 monthFebruary        17.7      677.       0.0262 0.979  
 4 monthJanuary         17.7      661.       0.0268 0.979  
 ...                    ...       ...        ...    ... 
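The `estimate` column of a `glm` logistic regression is on the log-odds scale. For a factor outcome, `glm()` models the probability of the outcome not being the first level; a dummy term such as `monthAugust` is compared against the reference month level, and because `step_scale()` divided each numeric predictor by its standard deviation, a numeric coefficient refers to a one-standard-deviation change. Exponentiating an estimate gives an odds ratio. The sketch below uses the built-in `mtcars` data instead of the credit data.

```r
# Built-in mtcars data: model the probability of a manual transmission (am = 1)
# from car weight (wt, in 1000 lbs)
fit <- glm(am ~ wt, data = mtcars, family = binomial)

log_odds   <- coef(fit)      # estimates on the log-odds scale, as in tidy() output
odds_ratio <- exp(log_odds)  # multiplicative change in the odds per 1-unit increase

round(cbind(log_odds, odds_ratio), 3)
```

An odds ratio below 1 (as for `wt` here) means the odds of the modeled class fall as the predictor increases; above 1 means they rise.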

Let's practice!
