Interpreting topics

Introduction to Text Analysis in R

Maham Faisal Khan

Senior Data Science Content Developer

Two topics

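library(topicmodels)  # LDA()
library(tidytext)     # tidy() method for LDA models
library(dplyr)        # %>%, group_by(), slice_max()
library(forcats)      # fct_reorder()
library(ggplot2)      # plotting

# Fit a two-topic model and extract the per-topic word probabilities (beta)
lda_topics <- LDA(
  dtm_review,
  k = 2,
  method = "Gibbs",
  control = list(seed = 42)
) %>% 
  tidy(matrix = "beta")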

word_probs <- lda_topics %>% 
  group_by(topic) %>% 
  slice_max(beta, n = 15) %>% 
  ungroup() %>% 
  mutate(term2 = fct_reorder(term, beta))

Two topics

ggplot(
  word_probs, 
  aes(
    term2, 
    beta, 
    fill = as.factor(topic)
  )
) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  coord_flip()
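
One caveat with the plot above: fct_reorder() orders term2 across all topics at once, so a term that ranks highly in more than one topic may appear out of order within a facet. A minimal sketch of one way around this, assuming tidytext's reorder_within() and scale_x_reordered() helpers; toy_probs and its values are made up for illustration and are not from the course data:

```r
library(dplyr)
library(ggplot2)
library(tidytext)  # reorder_within(), scale_x_reordered()

# Hypothetical stand-in for word_probs; the beta values are invented.
toy_probs <- tibble(
  topic = c(1, 1, 1, 2, 2, 2),
  term  = c("clean", "vacuum", "hair", "hair", "battery", "clean"),
  beta  = c(0.05, 0.03, 0.01, 0.06, 0.04, 0.02)
)

toy_plot <- toy_probs %>%
  # Order terms by beta separately within each topic
  mutate(term2 = reorder_within(term, beta, topic)) %>%
  ggplot(aes(term2, beta, fill = as.factor(topic))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  coord_flip() +
  # Strip the internal topic suffix from the axis labels
  scale_x_reordered()
```

Printing toy_plot draws the chart. The same two changes (reorder_within() in place of fct_reorder(), plus scale_x_reordered()) should carry over to word_probs unchanged.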

Three topics

lda_topics2 <- LDA(
  dtm_review,
  k = 3,
  method = "Gibbs",
  control = list(seed = 42)
) %>% 
  tidy(matrix = "beta")

word_probs2 <- lda_topics2 %>% 
  group_by(topic) %>% 
  slice_max(beta, n = 15) %>% 
  ungroup() %>% 
  mutate(term2 = fct_reorder(term, beta))

Three topics

ggplot(
  word_probs2, 
  aes(
    term2, 
    beta, 
    fill = as.factor(topic)
  )
) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  coord_flip()

Four topics

The art of model selection

  • Adding a topic is worthwhile when it is clearly different from the existing ones
  • If new topics start repeating existing ones, we've gone too far
  • Name each topic based on its combination of high-probability words
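
Judging whether topics repeat is mostly done by eye, but a rough numeric check can support it. The sketch below is one possible heuristic, not a method from the course: it computes the Jaccard overlap between the top-term sets of each pair of topics. The helper names jaccard() and topic_overlap() are hypothetical, and toy_terms is invented for illustration.

```r
# Jaccard similarity between two character vectors of top terms
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Pairwise overlap between topics, given a named list of top-term vectors
topic_overlap <- function(top_terms) {
  n <- length(top_terms)
  out <- matrix(0, n, n, dimnames = list(names(top_terms), names(top_terms)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      out[i, j] <- jaccard(top_terms[[i]], top_terms[[j]])
    }
  }
  out
}

# Made-up top terms for three topics (illustration only)
toy_terms <- list(
  topic1 = c("clean", "vacuum", "floor", "hair"),
  topic2 = c("battery", "charge", "time", "app"),
  topic3 = c("clean", "vacuum", "floor", "pet")
)

overlap <- topic_overlap(toy_terms)
```

Here topic1 and topic3 share three of five distinct terms, which hints that the third topic may be repeating the first. With real output, a list of this shape can be built with split(word_probs2$term, word_probs2$topic). What overlap counts as "too much" is a judgment call.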

Let's practice!
