Interpreting topics

Introduction to Text Analysis in R

Maham Faisal Khan

Senior Data Science Content Developer

Two topics

lda_topics <- LDA(
  dtm_review,
  k = 2,
  method = "Gibbs",
  control = list(seed = 42)
) %>% 
  tidy(matrix = "beta")

word_probs <- lda_topics %>% 
  group_by(topic) %>% 
  slice_max(beta, n = 15) %>% 
  ungroup() %>%
  mutate(term2 = fct_reorder(term, beta))

Two topics

ggplot(
  word_probs, 
  aes(
    term2, 
    beta, 
    fill = as.factor(topic)
  )
) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  coord_flip()

Three topics

lda_topics2 <- LDA(
  dtm_review,
  k = 3,
  method = "Gibbs",
  control = list(seed = 42)
) %>% 
  tidy(matrix = "beta")

word_probs2 <- lda_topics2 %>% 
  group_by(topic) %>% 
  slice_max(beta, n = 15) %>% 
  ungroup() %>%
  mutate(term2 = fct_reorder(term, beta))

Three topics

ggplot(
  word_probs2, 
  aes(
    term2, 
    beta, 
    fill = as.factor(topic)
  )
) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  coord_flip()

Four topics

The art of model selection

Adding topics that are different is good
If we start repeating topics, we've gone too far
Name the topics based on the combination of high-probability words

Let's practice!

Introduction to Text Analysis in R