Machine learning on Spark

Introduction to Spark with sparklyr in R

Richie Cotton

Data Evangelist at DataCamp

Gradient boosted tree models and random forests

model_formula <- reformulate(features, response = "response")
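To see what `reformulate()` produces, here is a minimal sketch that runs in base R without a Spark connection. The feature names are hypothetical placeholders, not columns from the course dataset.

```r
# Hypothetical feature names, for illustration only
features <- c("feature_a", "feature_b", "feature_c")

# reformulate() comes from the stats package, which is loaded by default.
# It joins the feature names with `+` and puts the response on the left.
model_formula <- reformulate(features, response = "response")

print(model_formula)
```

The result is an ordinary formula object, equivalent to typing `response ~ feature_a + feature_b + feature_c` by hand, so it can be passed anywhere a formula is expected.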

# Pass the formula object created above (model_formula)
model <- training_data %>%
  ml_gradient_boosted_trees(model_formula)

predicted <- ml_predict(
    model,
    testing_data) %>% pull(prediction)

results <- testing_data %>%
  select(response) %>%
  collect() %>%
  mutate(predicted_response = predicted)
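
Once `results` has been collected into R, the predicted and actual values can be compared locally. The following is a rough sketch, not part of the slides: the toy data frame and its values are made up to stand in for `results`, and root mean squared error is just one possible summary.

```r
# Hypothetical toy values standing in for the collected `results` tibble
results <- data.frame(
  response           = c(10, 12, 15, 11),
  predicted_response = c(11, 12, 13, 11)
)

# Residual: predicted value minus actual value
results$residual <- results$predicted_response - results$response

# Root mean squared error across all rows
rmse <- sqrt(mean(results$residual^2))

print(results)
print(rmse)
```

Because `collect()` has already pulled the data out of Spark, this step uses plain R and needs no sparklyr functions.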

Let's practice!

