Introduction to Text Analysis in R
Maham Faisal Khan
Senior Data Science Content Developer
sparse_review
     Terms
Docs admit ago albeit amazing angle awesome
   4     1   0      1       0     0       0
   5     0   1      0       1     1       0
   3     0   0      0       0     0       1
   2     0   0      0       0     0       0
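A matrix like the one above can be sketched on toy data using only base R. The data frame toy_review and its contents are hypothetical, and table() is used here as a stand-in for the tidytext workflow, not as the course's method. Each row is a document, each column is a term, and each cell counts how often the term appears in that document.

```r
# Hypothetical toy data: one row per (document id, word) occurrence.
toy_review <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3),
  word = c("amazing", "vacuum", "amazing", "awesome", "vacuum", "amazing"),
  stringsAsFactors = FALSE
)

# Cross-tabulate ids against words: rows are documents, columns are terms.
toy_matrix <- unclass(table(Docs = toy_review$id, Terms = toy_review$word))
print(toy_matrix)

# Share of cells that are zero, i.e. the sparsity of this toy matrix.
toy_sparsity <- mean(toy_matrix == 0)
print(toy_sparsity)
```

Even with three documents and three terms, several cells are zero. With thousands of terms, most words never appear in any given review, so nearly every cell is zero.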
tidy_review %>%
count(word, id) %>%
cast_dtm(id, word, n)
<<DocumentTermMatrix (documents: 1791, terms: 9669)>>
Non-/sparse entries: 62766/17252622
Sparsity : 100%
Maximal term length: NA
Weighting : term frequency (tf)
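The "Sparsity : 100%" line is a rounded figure. A minimal sketch of the arithmetic, using only the two entry counts printed above (the variable names are illustrative):

```r
# Counts copied from the "Non-/sparse entries" line of the printed output.
non_sparse <- 62766
sparse     <- 17252622

# Sparsity is the share of entries that are zero.
sparsity <- sparse / (non_sparse + sparse)

round(100 * sparsity)     # whole-percent rounding, as in the summary
round(100 * sparsity, 1)  # one decimal place shows it is just under 100
```

So the matrix is not entirely empty: roughly 99.6% of its entries are zero, which rounds to 100% in the summary.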
dtm_review <- tidy_review %>%
  count(word, id) %>%
  cast_dtm(id, word, n) %>%
  as.matrix()
dtm_review[1:4, 2000:2004]
     Terms
Docs consecutive consensus consequences considerable considerably
 223           0         0            0            0            0
 615           0         0            0            0            0
1069           0         0            0            0            0
 425           0         0            0            0            0