Sentiment analysis

Introduction to Natural Language Processing in R

Kasey Jones

Research Data Scientist

Sentiment analysis

  • Assess subjective information from text
  • Types of sentiment analysis:
    • positive vs negative
    • words that elicit emotions
  • Each word is given a meaning and sometimes a score
    • abandon -> fear
    • accomplish -> joy
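
The word-to-meaning idea above can be sketched with a tiny hand-made lexicon. This is only an illustration: `toy_lexicon`, its scores, and the example tokens are hypothetical and are not taken from any real lexicon.

```r
library(dplyr)

# Hypothetical mini-lexicon (assumption): labels follow the bullets above,
# the scores are invented for illustration
toy_lexicon <- tibble(
  word      = c("abandon", "accomplish"),
  sentiment = c("fear", "joy"),
  score     = c(-2L, 2L)
)

# Tokens from an invented sentence
tokens <- tibble(word = c("never", "abandon", "what", "you", "accomplish"))

# Keep only the tokens found in the lexicon, with label and score attached
tagged <- tokens %>%
  inner_join(toy_lexicon, by = "word")

tagged
```

Words that are missing from the lexicon are simply dropped by the inner join, which is why lexicon coverage matters.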

Sentiments in tidytext

library(tidytext)
sentiments
# A tibble: 27,314 x 4
   word        sentiment lexicon score
   <chr>       <chr>     <chr>   <int>
 1 abacus      trust     nrc        NA
 2 abandon     fear      nrc        NA
 3 abandon     negative  nrc        NA
 4 abandon     sadness   nrc        NA
 5 abandoned   anger     nrc        NA

3 lexicons

  • AFINN: scores words from -5 (extremely negative) to 5 (extremely positive)
  • bing: a positive/negative label for every word
  • nrc: emotion labels such as fear, joy, anger, etc.
library(tidytext)
get_sentiments("afinn")
# A tibble: 2,476 x 2
   word       score
   <chr>      <int>
 1 abandon       -2
 2 abandoned     -2
 3 abandons      -2
 ...

Prepare your data

library(dplyr)
library(tidytext)

# Read the data
animal_farm <- read.csv("animal_farm.csv", stringsAsFactors = FALSE)
animal_farm <- as_tibble(animal_farm)

# Tokenize and remove stop words
animal_farm_tokens <- animal_farm %>%
  unnest_tokens(output = "word", token = "words", input = text_column) %>%
  anti_join(stop_words, by = "word")

The afinn lexicon

animal_farm_tokens %>% 
  inner_join(get_sentiments("afinn"))
# A tibble: 1,175 x 3
   chapter   word        score
   <chr>     <chr>       <int>
 1 Chapter 1 drunk          -2
 2 Chapter 1 strange        -1
 3 Chapter 1 dream           1
 4 Chapter 1 agreed          1
 5 Chapter 1 safely          1

afinn continued

animal_farm_tokens %>% 
  inner_join(get_sentiments("afinn")) %>% 
  group_by(chapter) %>% 
  summarise(sentiment = sum(score)) %>%
  arrange(sentiment)
# A tibble: 10 x 2
   chapter    sentiment
   <chr>          <int>
 1 Chapter 7       -166
 2 Chapter 8       -158
 3 Chapter 4        -84
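
To see what the join-and-sum pipeline does row by row, here is a small sketch on made-up data. `toy_tokens` and `toy_afinn` are hypothetical stand-ins for `animal_farm_tokens` and `get_sentiments("afinn")`; the four scores reuse the values shown in the earlier output. Note that the score column name may differ across tidytext versions (recent releases name it `value`).

```r
library(dplyr)

# Hypothetical tokens (assumption): two short "chapters"
toy_tokens <- tibble(
  chapter = c("Chapter 1", "Chapter 1", "Chapter 2", "Chapter 2", "Chapter 2"),
  word    = c("dream", "drunk", "strange", "safely", "table")
)

# Hypothetical stand-in for the AFINN lexicon
toy_afinn <- tibble(
  word  = c("dream", "drunk", "strange", "safely"),
  score = c(1L, -2L, -1L, 1L)
)

chapter_scores <- toy_tokens %>%
  inner_join(toy_afinn, by = "word") %>%   # "table" has no score and is dropped
  group_by(chapter) %>%
  summarise(sentiment = sum(score)) %>%    # net score per chapter
  arrange(sentiment)                       # most negative chapter first

chapter_scores
```

Because the scores are summed, longer chapters can look more extreme simply because they contain more scored words.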

The bing lexicon

word_totals <- animal_farm_tokens %>% 
  group_by(chapter) %>% 
  count()
animal_farm_tokens %>% 
  inner_join(get_sentiments("bing")) %>% 
  group_by(chapter) %>% 
  count(sentiment) %>%
  filter(sentiment == 'negative') %>%
  transform(p = n / word_totals$n) %>%
  arrange(desc(p))
      chapter sentiment   n          p
1   Chapter 7  negative 154 0.11711027
2   Chapter 6  negative 106 0.10750507
3   Chapter 4  negative  68 0.10559006
4  Chapter 10  negative 117 0.10372340
5   Chapter 8  negative 155 0.10006456
6   Chapter 9  negative 121 0.09152799
7   Chapter 3  negative  65 0.08843537
8   Chapter 1  negative  77 0.08603352
9   Chapter 5  negative  93 0.08462238
10  Chapter 2  negative  67 0.07395143
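
The division `n / word_totals$n` above pairs rows by position, so it relies on both tables listing the chapters in the same order. A possible alternative, sketched here on made-up data, joins the totals by chapter instead. `toy_tokens` and `toy_bing` are hypothetical stand-ins, not the real data or lexicon.

```r
library(dplyr)

# Hypothetical tokens (assumption)
toy_tokens <- tibble(
  chapter = c("Chapter 1", "Chapter 1", "Chapter 1", "Chapter 1",
              "Chapter 2", "Chapter 2", "Chapter 2"),
  word    = c("bad", "enemy", "farm", "good", "terrible", "barn", "hay")
)

# Hypothetical stand-in for the bing lexicon
toy_bing <- tibble(
  word      = c("bad", "enemy", "good", "terrible"),
  sentiment = c("negative", "negative", "positive", "negative")
)

# Total number of tokens per chapter
word_totals <- toy_tokens %>%
  count(chapter, name = "total")

negative_share <- toy_tokens %>%
  inner_join(toy_bing, by = "word") %>%
  filter(sentiment == "negative") %>%
  count(chapter, name = "n") %>%
  left_join(word_totals, by = "chapter") %>%  # match totals by chapter name
  mutate(p = n / total) %>%                   # share of negative words
  arrange(desc(p))

negative_share
```

Joining by `chapter` keeps each count paired with the right total even if a chapter has no negative words or the row order changes.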

The nrc lexicon

as.data.frame(table(get_sentiments("nrc")$sentiment)) %>%
  arrange(desc(Freq))
           Var1 Freq
1      negative 3324
2      positive 2312
3          fear 1476
4         anger 1247
5         trust 1231
6       sadness 1191
...

nrc continued

fear <- get_sentiments("nrc") %>% 
  filter(sentiment == "fear")
animal_farm_tokens %>%
  inner_join(fear) %>%
  count(word, sort = TRUE)
# A tibble: 220 x 2
   word           n
   <chr>      <int>
 1 rebellion     29
 2 death         19
 3 gun           19
 4 terrible      15
 5 bad           14
 6 enemy         12
 7 broke         11
 ...

Time for sentiment analysis.
