Intro to word clouds

Text Mining with Bag-of-Words in R

Ted Kwartler

Instructor

A simple word cloud

# Load the packages this slide relies on
library(tm)
library(wordcloud)

# Convert TDM to matrix
chardonnay_tdm <- TermDocumentMatrix(clean_chardonnay)
chardonnay_m <- as.matrix(chardonnay_tdm)

# Sum rows and sort by frequency
term_frequency <- sort(rowSums(chardonnay_m), decreasing = TRUE)
word_freqs <- data.frame(term = names(term_frequency),
                         num = term_frequency)

# Make word cloud
wordcloud(word_freqs$term, word_freqs$num,
          max.words = 100, colors = "red")
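
To see what the row sums contribute, here is a minimal sketch on a toy term-document matrix. The terms, document names, and counts are made up for illustration and are not taken from the chardonnay data; only base R is assumed.

```r
# Toy term-document matrix: rows are terms, columns are documents.
# All values here are hypothetical.
toy_m <- matrix(
  c(2, 0, 1,
    1, 1, 0,
    0, 3, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("oak", "crisp", "buttery"),
                  c("doc1", "doc2", "doc3"))
)

# Row sums give each term's total count across all documents;
# sorting puts the most frequent term first
toy_frequency <- sort(rowSums(toy_m), decreasing = TRUE)

# Same two-column layout as word_freqs
toy_freqs <- data.frame(term = names(toy_frequency),
                        num = toy_frequency)
print(toy_freqs)
```

The resulting data frame has one row per term, which is the shape passed to wordcloud() as words and frequencies.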

wordcloud.png

The impact of stop words

clean_corpus <- function(corpus){
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, stripWhitespace)
  corpus <- tm_map(corpus, removeNumbers)
  corpus <- tm_map(corpus, 
                   content_transformer(tolower))
  corpus <- tm_map(corpus, removeWords, 
                   c(stopwords("en"), "amp"))
  return(corpus)
}
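
One way to check the impact of stop words is to compare term frequencies before and after removing them. The sketch below assumes the tm package is installed and uses two made-up tweets rather than the course data; the object names are hypothetical.

```r
library(tm)

# Made-up example tweets, for illustration only
tweets <- c("The chardonnay is crisp and the finish is long",
            "I had a glass of chardonnay with the fish")

corpus <- VCorpus(VectorSource(tweets))
corpus <- tm_map(corpus, content_transformer(tolower))

# Term frequencies before removing stop words
before <- rowSums(as.matrix(TermDocumentMatrix(corpus)))

# Term frequencies after removing English stop words
no_stops <- tm_map(corpus, removeWords, stopwords("en"))
after <- rowSums(as.matrix(TermDocumentMatrix(no_stops)))

# Compare the two rankings
print(sort(before, decreasing = TRUE))
print(sort(after, decreasing = TRUE))
```

Common words such as "the" appear in the first ranking and drop out of the second, so the remaining terms carry more of the topic.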

wordcloud2.png

Removing uninformative words

clean_corpus <- function(corpus){
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, stripWhitespace)
  corpus <- tm_map(corpus, removeNumbers)
  corpus <- tm_map(corpus,
                   content_transformer(tolower))
  corpus <- tm_map(corpus, removeWords,
                   c(stopwords("en"), "amp",
                     "chardonnay", "wine", "glass"))
  return(corpus)
}
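
The custom word list can also be tried on a single string before it is applied to a whole corpus. This is a small sketch assuming the tm package; the sample sentence and the names custom_stops, tweet, and cleaned are hypothetical.

```r
library(tm)

# Standard English stop words plus the uninformative terms from the slide
custom_stops <- c(stopwords("en"), "amp", "chardonnay", "wine", "glass")

# Made-up sentence, already lowercased
tweet <- "drinking a glass of chardonnay wine amp it tastes buttery"

# removeWords() also accepts a plain character vector
cleaned <- removeWords(tweet, custom_stops)

# Collapse the gaps left behind by the removed words (base R)
cleaned <- trimws(gsub("\\s+", " ", cleaned))
print(cleaned)
```

Terms like "chardonnay" appear in nearly every tweet about chardonnay, so dropping them leaves room in the cloud for words that distinguish one tweet from another.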

wordcloud3.png

Let's practice!
