Introduction to Natural Language Processing in R
Kasey Jones
Research Data Scientist
# A tibble: 1,498 x 6
      X word         n     tf   idf tf_idf
  <int> <chr>    <int>  <dbl> <dbl>  <dbl>
1    20 january      4 0.0930  2.30  0.214
2    15 power        4 0.0690  3.00  0.207
3    19 futures      9 0.0643  3.00  0.193
4     8 8            6 0.0619  3.00  0.185
5     3 canada       2 0.0526  3.00  0.158
6     3 canadian     2 0.0526  3.00  0.158
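library(dplyr)
library(tidytext)

# Tokenize, drop stop words, count each word per article (X), then add tf-idf
crude_weights <- crude_tibble %>%
  unnest_tokens(output = "word", token = "words", input = text) %>%
  anti_join(stop_words, by = "word") %>%
  count(word, X) %>%
  bind_tf_idf(word, X, n)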
# A tibble: 1,498 x 6
      X word       n    tf   idf tf_idf
  <int> <chr>  <int> <dbl> <dbl>  <dbl>
1     1 1.50       1 0.25   3.25  0.812
2     1 16.00      1 1      3.25  3.25
3     1 barrel     2 0.133  3.25  0.433
...
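To make the tf, idf, and tf_idf columns concrete, here is a minimal base R sketch that computes them by hand. The `counts` data frame is hypothetical toy data, not the crude corpus, and the sketch assumes the usual definitions: tf is a word's share of its document's word count, and idf is the natural log of the number of documents divided by the number of documents containing the word.

```r
# Hypothetical toy counts (not taken from the crude corpus)
counts <- data.frame(
  doc  = c(1, 1, 2, 2, 3),
  word = c("oil", "price", "oil", "barrel", "oil"),
  n    = c(2, 2, 1, 3, 4)
)

n_docs <- length(unique(counts$doc))

# tf: a word's count divided by the total word count of its document
counts$tf <- counts$n / ave(counts$n, counts$doc, FUN = sum)

# idf: natural log of (number of documents / documents containing the word)
docs_with_word <- ave(counts$doc, counts$word,
                      FUN = function(d) length(unique(d)))
counts$idf <- log(n_docs / docs_with_word)

# tf-idf is the product of the two
counts$tf_idf <- counts$tf * counts$idf

print(counts)
```

In this toy example "oil" appears in every document, so its idf and tf_idf are 0, while "barrel" appears in only one document and gets the highest weight.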
pairwise_similarity(tbl, item, feature, value, ...)
library(widyr)

# Cosine similarity between every pair of articles, most similar first
crude_weights %>%
  pairwise_similarity(X, word, tf_idf) %>%
  arrange(desc(similarity))
# A tibble: 380 x 3
  item1 item2 similarity
  <int> <int>      <dbl>
1    17    16      0.663
2    16    17      0.663
3    13    10      0.311
4    10    13      0.311
...
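Each pair shows up twice in the output (17 and 16, then 16 and 17) because cosine similarity is symmetric. A minimal base R sketch of that calculation follows. The `weights` matrix and the `cosine_similarity()` helper are hypothetical, written for illustration only; this is not how widyr is implemented internally.

```r
# Hypothetical tf-idf weights: rows are documents, columns are words
weights <- rbind(
  doc1 = c(oil = 0.0, price = 0.5, barrel = 0.0),
  doc2 = c(oil = 0.0, price = 0.2, barrel = 0.8),
  doc3 = c(oil = 0.0, price = 0.0, barrel = 0.6)
)

# Cosine similarity: dot product divided by the product of the vector lengths
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# All ordered pairs of distinct documents, so each pair appears twice
pairs <- expand.grid(item1 = rownames(weights), item2 = rownames(weights),
                     stringsAsFactors = FALSE)
pairs <- pairs[pairs$item1 != pairs$item2, ]

pairs$similarity <- mapply(
  function(i, j) cosine_similarity(weights[i, ], weights[j, ]),
  pairs$item1, pairs$item2,
  USE.NAMES = FALSE
)

# Most similar pairs first
pairs <- pairs[order(-pairs$similarity), ]
print(pairs, row.names = FALSE)
```

Three documents give 3 x 2 = 6 ordered pairs. By the same count, the 380 rows above would be consistent with 20 articles (20 x 19 = 380).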