Text Mining with Bag-of-Words in R
Ted Kwartler
Instructor
# Use only the first 2 coffee tweets
tweets$text[1:2]
[1] @ayyytylerb that is so true drink lots of coffee
[2] RT @bryzy_brib: Senior March tmw morning at 7:25 A.M. in the SENIOR lot. Get up early, make yo coffee/breakfast, cus this will only happen…
# Make a unigram DTM on the first 2 coffee tweets
unigram_dtm <- DocumentTermMatrix(text_corp)
unigram_dtm
<<DocumentTermMatrix (documents: 2, terms: 18)>>
Non-/sparse entries: 18/18
Sparsity : 50%
Maximal term length: 15
Weighting : term frequency (tf)
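The sparsity figure in the summary above follows from the counts printed with it. The sketch below reproduces that arithmetic in base R; `sparsity_pct` is a hypothetical helper name introduced here for illustration and is not part of the tm package.

```r
# Hypothetical helper (not from tm): share of zero cells in a
# documents x terms matrix, expressed as a rounded percentage.
sparsity_pct <- function(n_docs, n_terms, non_sparse) {
  total_cells  <- n_docs * n_terms          # every document/term pair
  sparse_cells <- total_cells - non_sparse  # cells holding a zero count
  round(100 * sparse_cells / total_cells)
}

# Unigram DTM above: 2 documents, 18 terms, 18 non-sparse entries
unigram_sparsity <- sparsity_pct(n_docs = 2, n_terms = 18, non_sparse = 18)
unigram_sparsity
```

With 2 x 18 = 36 cells and 18 of them non-zero, the remaining 18 are zero, which gives the 50% shown in the printed summary.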
# Load the RWeka package
library(RWeka)
# Define the bigram tokenizer
tokenizer <- function(x) {
  NGramTokenizer(x, Weka_control(min = 2, max = 2))
}
# Make a bigram TDM
bigram_tdm <- TermDocumentMatrix(
  clean_corpus(text_corp),
  control = list(tokenize = tokenizer)
)
bigram_tdm
<<TermDocumentMatrix (terms: 21, documents: 2)>>
Non-/sparse entries: 21/21
Sparsity : 50%
Maximal term length: 19
Weighting : term frequency (tf)
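To make the idea of a bigram concrete, here is a minimal base-R sketch that pairs each word with its successor. It is an illustration only, not how RWeka works internally: `simple_bigrams` is a hypothetical helper that splits on whitespace, whereas `NGramTokenizer` applies its own delimiter rules, so the two may tokenize punctuation differently.

```r
# Hypothetical helper: build bigrams by pairing adjacent words.
# Assumes whitespace-separated input; punctuation is left untouched.
simple_bigrams <- function(x) {
  words <- strsplit(x, "\\s+")[[1]]
  words <- words[nzchar(words)]             # drop empty strings
  if (length(words) < 2) {
    return(character(0))                    # no pair can be formed
  }
  # Pair word i with word i + 1
  paste(head(words, -1), tail(words, -1))
}

simple_bigrams("drink lots of coffee")
```

A text of n words yields n - 1 bigrams, so the four-word phrase above produces three terms: "drink lots", "lots of", and "of coffee".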