How many words do YOU know? Zipf's law & subjectivity lexicon

Sentiment Analysis in R

Ted Kwartler

Data Dude

Subjectivity lexicon

library(qdap)
library(magrittr)

# %$% exposes the columns of text_df, so polarity() scores its text column
text_df %$% polarity(text)

Returns a "polarity" object with positive and negative scores.

A subjectivity lexicon is a predefined list of words associated with emotional context such as positive/negative, or specific emotions like "frustration" or "joy."
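
To make the idea of a lexicon lookup concrete, here is a minimal base R sketch. The words and scores in `toy_lexicon` and the helper `score_text()` are hypothetical, made up for illustration. This is not how `polarity()` computes its score; it only shows the basic step of matching words against a predefined list.

```r
# Toy subjectivity lexicon: hypothetical words and scores, for illustration only.
toy_lexicon <- c(good = 1, great = 1, joy = 1,
                 bad = -1, awful = -1, frustration = -1)

# Score a piece of text by summing the lexicon values of the words it contains.
score_text <- function(text, lexicon) {
  # Lowercase, then split on anything that is not a letter or apostrophe.
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  words <- words[nzchar(words)]
  # Words absent from the lexicon contribute nothing to the score.
  matched <- words[words %in% names(lexicon)]
  sum(lexicon[matched])
}

score_text("What a great day, full of joy!", toy_lexicon)
score_text("Awful service and pure frustration.", toy_lexicon)
```

A real lexicon works the same way in spirit but has thousands of entries, and functions such as `polarity()` layer extra handling on top of the plain lookup.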

Where to get subjectivity lexicons?

  • qdap's polarity() function uses a lexicon from hash_sentiment_huliu

  • tidytext has a sentiments tibble with

    • NRC - Words tagged with 8 emotions like "anger" or "joy", plus Pos/Neg
    • Bing - Words labeled positive or negative
    • AFINN - Words scored from -5 to 5

library(lexicon)

Name                    Description
dodds_sentiment         Mechanical Turk Sentiment Words
hash_emoticons          Translations of basic punctuation emoticons :)
hash_sentiment_huliu    U of IL @CHI Polarity (+/-) word research
hash_sentiment_jockers  A lexicon inherited from library(syuzhet)
hash_sentiment_nrc      5468 words crowdsourced scoring between -1 & 1
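
One way to get a feel for how small these lexicons are is to load one and count its rows. The sketch below assumes the lexicon package is installed and does nothing otherwise. It avoids relying on specific column names, since those are not shown in the table above.

```r
# Assumes the lexicon package is installed; the block is skipped otherwise.
if (requireNamespace("lexicon", quietly = TRUE)) {
  huliu <- lexicon::hash_sentiment_huliu

  # First few word/score pairs in the lexicon.
  print(head(huliu))

  # Number of entries: compare this with the size of your own vocabulary.
  print(nrow(huliu))
}
```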

No way! Too few words.

  • Zipf's Law
  • Principle of Least Effort

Zipf's Law in action

Rank  City          2010 Census Population  Actual %  Zipf's Expected %
1     New York      8,175,133               100%      ...
2     LA            3,792,621               46%       50%
3     Chicago       2,695,598               33%       33%
4     Houston       2,100,263               26%       25%
5     Philadelphia  1,526,006               19%       20%
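
The percentages in the table can be reproduced with a few lines of base R. This sketch assumes the simplest form of Zipf's law, where the item at rank r is expected to be about 1/r the size of the rank 1 item. The variable names are illustrative.

```r
# 2010 Census populations copied from the table above.
population <- c(`New York` = 8175133, LA = 3792621, Chicago = 2695598,
                Houston = 2100263, Philadelphia = 1526006)

rank <- seq_along(population)

# Actual share: each city's population relative to the rank 1 city.
actual_pct <- round(100 * population / population[1])

# Zipf's expectation: rank r is about 1/r the size of rank 1.
zipf_pct <- round(100 / rank)

data.frame(rank, city = names(population), actual_pct, zipf_pct,
           row.names = NULL)
```

The same calculation applies to word frequencies: a handful of top-ranked words accounts for a large share of all usage, which is why a lexicon of a few thousand words can still cover much of everyday text.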

Principle of Least Effort

If there are several ways of achieving the same goal, people will choose the least demanding course of action.

Up next...

twitter logo

football

Let's practice!
