Visualize popular terms

Analyzing Social Media Data in R

Vivek Vijayaraghavan

Data Science Coach

Lesson Overview

Extract most frequent terms from the text corpus
Remove custom stop words and refine corpus
Visualize popular terms using bar plot and word cloud

Term frequency

Extract the term frequency, which is the number of occurrences of each word

# Extract term frequency
library(qdap)
term_count <- freq_terms(twt_corpus_final, 60)
term_count
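To make the idea of term frequency concrete, here is a minimal base R sketch that counts word occurrences without qdap. The `tweets` vector is hypothetical toy data, not the course corpus, and splitting on whitespace is a simplifying assumption rather than what `freq_terms()` does internally.

```r
# Toy tweets standing in for the cleaned corpus (hypothetical data)
tweets <- c("sugar drives obesity", "diet and exercise", "sugar and diet")

# Split each tweet on whitespace and flatten into one vector of words
words <- unlist(strsplit(tweets, "\\s+"))

# Count occurrences of each word and sort from most to least frequent
term_freq <- sort(table(words), decreasing = TRUE)

# Show the three most frequent terms
head(term_freq, 3)
```

The second argument of `freq_terms()` plays the role of the `3` in `head()` here: it limits the output to the top terms.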

Removing custom stop words

# Create a vector of custom stop words
custom_stop <- c("obesity", "can", "amp", "one", "like", "will", "just", 
                "many", "new", "know", "also", "need", "may", "now", 
                "get", "s", "t", "m", "re")

# Remove custom stop words (tm_map() and removeWords come from the tm package)
library(tm)
twt_corpus_refined <- tm_map(twt_corpus_final, removeWords, custom_stop)
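The effect of removing stop words can be sketched at the token level in base R. This is only an illustration of the idea, assuming the text is already split into words; it is not how `removeWords` is implemented in tm. The `tokens` vector and the shortened stop word list are hypothetical.

```r
# Hypothetical tokens from a single tweet
tokens <- c("obesity", "can", "increase", "health", "risk")

# A shortened stop word list for illustration
stop_words <- c("obesity", "can", "amp")

# Keep only the tokens that are not in the stop word list
kept <- tokens[!tokens %in% stop_words]
kept
```

The search term itself ("obesity") is on the list because it appears in nearly every tweet and would otherwise dominate the counts.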

Term count after refining corpus

# Term count after refining corpus
term_count_clean <- freq_terms(twt_corpus_refined, 20)
term_count_clean

Term frequency after refining corpus

A brand promoting an obesity management program can analyze these terms

Bar plot of popular terms

Create a bar plot of terms that occur more than 50 times
Bar plots summarize popular terms in an easily interpretable form

# Create a subset dataframe
term50 <- subset(term_count_clean, FREQ > 50)
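As a small check of what the filter does, the sketch below applies the same `subset()` call to a toy data frame. The data are hypothetical; only the column names `WORD` and `FREQ` are taken from the code above.

```r
# Hypothetical term counts using the WORD and FREQ column names
toy_counts <- data.frame(
  WORD = c("health", "diet", "sugar", "weight"),
  FREQ = c(120, 80, 50, 30),
  stringsAsFactors = FALSE
)

# Keep rows with a frequency strictly greater than 50
toy50 <- subset(toy_counts, FREQ > 50)
toy50$WORD
```

Because the comparison is strict, a term that occurs exactly 50 times ("sugar" here) is dropped.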

Bar plot of most popular terms

library(ggplot2)

# Create a bar plot of frequent terms
ggplot(term50, aes(x = reorder(WORD, -FREQ), y = FREQ)) +
       geom_bar(stat = "identity", fill = "blue") +
       theme(axis.text.x = element_text(angle = 45, hjust = 1))
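The `reorder(WORD, -FREQ)` call is what sorts the bars from most to least frequent. A minimal sketch with hypothetical words and counts shows how it sets the factor levels, which ggplot2 then uses for the x-axis order.

```r
# Hypothetical words and their frequencies
words <- c("diet", "health", "sugar")
freq  <- c(80, 120, 60)

# reorder() sorts levels by ascending score, so negating the
# frequency places the most frequent word first
ordered_words <- reorder(words, -freq)
levels(ordered_words)
```

Without `reorder()`, the bars would appear in alphabetical order of the words.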

Word cloud

Visualize the frequent terms using word clouds
A word cloud is an image made up of words
Size of each word indicates its frequency
Effective promotional image for campaigns
Communicates the brand messaging and highlights popular terms

Word cloud based on min frequency

The wordcloud() function helps create word clouds

# Create a word cloud based on min frequency
library(wordcloud)
wordcloud(twt_corpus_refined, min.freq = 20, colors = "red",
          scale = c(3, 0.5), random.order = FALSE)
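The roles of `min.freq` and `scale` can be sketched in base R. The frequencies below are hypothetical, and the linear mapping from relative frequency to text size is an assumption made for illustration; the wordcloud package may compute sizes differently.

```r
# Hypothetical term frequencies
freq <- c(health = 120, diet = 60, sugar = 20, weight = 10)

# min.freq: terms below this frequency are not plotted
min_freq <- 20
kept <- freq[freq >= min_freq]

# scale: largest and smallest text size
size_range <- c(3, 0.5)

# Assumed linear mapping from relative frequency to text size
size <- (size_range[1] - size_range[2]) * kept / max(kept) + size_range[2]
round(size, 2)
```

Under this assumption the most frequent term is drawn at the largest size and less frequent terms shrink toward the smallest size.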

Word cloud based on minimum frequency

Colorful word cloud

# Create a colorful word cloud
library(RColorBrewer)
wordcloud(twt_corpus_refined, max.words = 100,
          colors = brewer.pal(6, "Dark2"), scale = c(2.5, 0.5),
          random.order = FALSE)
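Where `min.freq` filters by a frequency threshold, `max.words` caps the number of terms plotted and drops the least frequent ones. A rough base R sketch of that selection, using hypothetical frequencies and a cap of 3 instead of 100:

```r
# Hypothetical term frequencies in arbitrary order
freq <- c(sugar = 20, health = 120, weight = 10, diet = 60)

# Cap on the number of terms to keep (100 in the call above)
max_words <- 3

# Sort from most to least frequent and keep the top terms
top_terms <- head(sort(freq, decreasing = TRUE), max_words)
names(top_terms)
```

This is only a sketch of the selection step; the colors from `brewer.pal(6, "Dark2")` are applied by `wordcloud()` itself.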

Word cloud with different colors

Let's practice!
