Filtering tweets

Analyzing Social Media Data in R

Vivek Vijayaraghavan

Data Science Coach

Lesson Overview

  • Filtering based on tweet components
    • Extract original tweets
    • Language of the tweet
    • Popular tweets based on minimum number of retweets and favorites

Filtering for original tweets

  • An original tweet is a new post written by a Twitter user
  • It is not a retweet, quote, or reply
  • Original tweets ensure that content is not repetitive
  • This helps retain user engagement levels
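
The definition above can also be applied locally once tweets are in a data frame. This is a minimal sketch, assuming the rtweet-style columns is_retweet, is_quote, and reply_to_screen_name used in this lesson; the data frame tweets_demo and its rows are hypothetical stand-ins, not real Twitter data.

```r
# Hypothetical stand-in for a data frame returned by search_tweets()
tweets_demo <- data.frame(
  text = c("original post", "a retweet", "a quote", "a reply"),
  is_retweet = c(FALSE, TRUE, FALSE, FALSE),
  is_quote = c(FALSE, FALSE, TRUE, FALSE),
  reply_to_screen_name = c(NA, NA, NA, "some_user"),
  stringsAsFactors = FALSE
)

# Keep rows that are not retweets, not quotes, and not replies
is_original <- !tweets_demo$is_retweet &
  !tweets_demo$is_quote &
  is.na(tweets_demo$reply_to_screen_name)
tweets_original <- tweets_demo[is_original, ]

# Show the text of the remaining tweets
tweets_original$text
```

Filtering after download like this wastes part of the requested tweet quota on rows that are then discarded, so excluding them in the search query is usually preferable.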

Filtering for original tweets

  • The -filter operator is used in the search query to extract original tweets
  • -filter:retweets excludes all retweets
  • -filter:quote filters out quoted tweets
  • -filter:replies ensures reply type tweets are filtered out
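
The three operators are combined with the search term in a single query string. A minimal sketch of one way to assemble that string, assuming the operators are separated by single spaces; the variable names exclusions and query are illustrative only.

```r
# The three exclusion operators listed above
exclusions <- c("-filter:retweets", "-filter:quote", "-filter:replies")

# Join the search term and the operators into one space-separated query
query <- paste(c("digital marketing", exclusions), collapse = " ")

query
```

The resulting string could be passed as the first argument of search_tweets(). Building it with paste() avoids embedding line breaks and indentation inside the string literal.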

Extract tweets without filters

  • Extract tweets on "digital marketing" without any filters
# Extract 100 tweets on "digital marketing"
tweets_all <- search_tweets("digital marketing", n = 100)

Extract tweets without filters

  • Check count of values in columns reply_to_screen_name, is_quote, is_retweet
# Check for count of replies
library(plyr)
count(tweets_all$reply_to_screen_name)
x               freq
<fct>          <int>
blairaasmith      2            
javiergosende     1            
juanburgos        1            
WhutTheHale       2            
NA               94

Extract tweets without filters

# Check for count of quotes
count(tweets_all$is_quote)
x        freq
<lgl>    <int>
FALSE     98            
TRUE       2

Extract tweets without filters

# Check for count of retweets
count(tweets_all$is_retweet)
x         freq
<lgl>    <int>
FALSE      61            
TRUE       39
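
The same three checks can be done without plyr. This is a sketch using base R's table(), which is a substitute for the count() call used in this lesson, not the lesson's own method; tweets_demo is a hypothetical stand-in for the search result.

```r
# Hypothetical stand-in for a few columns of search_tweets() output
tweets_demo <- data.frame(
  reply_to_screen_name = c("user_a", NA, NA, "user_a", NA),
  is_quote = c(FALSE, FALSE, TRUE, FALSE, FALSE),
  is_retweet = c(TRUE, FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# table() drops NA by default; useNA = "ifany" keeps it as its own category
reply_counts <- table(tweets_demo$reply_to_screen_name, useNA = "ifany")
quote_counts <- table(tweets_demo$is_quote)
retweet_counts <- table(tweets_demo$is_retweet)

reply_counts
quote_counts
retweet_counts
```

Keeping the NA category matters for the reply column, because NA is what marks a tweet that is not a reply.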

Exclude retweets, quotes, and replies

  • Extract tweets on "digital marketing" applying the -filter
# Apply the '-filter'
tweets_org <- search_tweets("digital marketing 
                            -filter:retweets 
                            -filter:quote 
                            -filter:replies", 
                            n = 100)

Exclude retweets, quotes, and replies

  • Check output to see if replies, quotes, and retweets are excluded
# Check for count of replies
library(plyr)
count(tweets_org$reply_to_screen_name)
x         freq
<lgl>    <int>
NA         100

Exclude retweets, quotes, and replies

# Check for count of quotes
count(tweets_org$is_quote)
x         freq
<lgl>    <int>
FALSE     100
# Check for count of retweets
count(tweets_org$is_retweet)
x         freq
<lgl>    <int>
FALSE     100

Filtering tweets on language

  • The lang argument of search_tweets() filters tweets by language
  • It matches tweets written in a particular language, given as a language code such as "es" for Spanish

Twitter language codes for a few languages


Filtering tweets on language

# Filter and extract tweets posted in Spanish
tweets_lang <- search_tweets("brand marketing", lang = "es")

Filtering tweets on language

View(tweets_lang)

Tweets extracted in the Spanish language


Filtering tweets on language

head(tweets_lang$lang)
[1] "es" "es" "es" "es" "es" "es"
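
head() only inspects the first six rows. A minimal sketch of a check over every row, and of filtering locally when a result set mixes languages; tweets_demo is a hypothetical stand-in, and the lang column name follows the output shown in this lesson.

```r
# Hypothetical stand-in for the text and lang columns of a search result
tweets_demo <- data.frame(
  text = c("hola", "marca", "hello"),
  lang = c("es", "es", "en"),
  stringsAsFactors = FALSE
)

# Keep only rows tagged as Spanish
tweets_es <- tweets_demo[tweets_demo$lang == "es", ]

# Confirm that every remaining row carries the "es" code
all_spanish <- all(tweets_es$lang == "es")
all_spanish
```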

Filter by retweet and favorite counts

  • min_faves: keeps tweets with at least the given number of favorites
  • min_retweets: keeps tweets with at least the given number of retweets
  • Use the AND operator to require both conditions
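
The two operators and AND are written directly into the query string. A minimal sketch of building such a string from variables, so the thresholds can be changed in one place; the variable names and the search term "bitcoin" are illustrative.

```r
# Illustrative thresholds (integers, so that %d formatting applies)
min_faves <- 100L
min_retweets <- 100L

# Insert the thresholds into the operator syntax described above
query <- sprintf(
  "bitcoin min_faves:%d AND min_retweets:%d",
  min_faves, min_retweets
)

query
```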

Filter by retweet and favorite counts

# Extract tweets with at least 100 favorites and at least 100 retweets
tweets_pop <- search_tweets("bitcoin min_faves:100 AND
                            min_retweets:100")

Filter by retweet and favorite counts

# Create a data frame to check retweet and favorite counts
counts <- tweets_pop[c("retweet_count", "favorite_count")]
head(counts)
  retweet_count favorite_count
          <int>          <int>
1           162            833
2           141            894
3           164           1128
4           395           1346
5           475           2271
6           270           1654
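
Rather than reading the counts by eye, the thresholds can be checked programmatically. A minimal sketch that re-enters the six rows from the head(counts) output above; counts_demo is a hand-built copy for illustration, and with live data the same comparison would run on counts itself.

```r
# The six rows shown in the head(counts) output
counts_demo <- data.frame(
  retweet_count = c(162L, 141L, 164L, 395L, 475L, 270L),
  favorite_count = c(833L, 894L, 1128L, 1346L, 2271L, 1654L)
)

# TRUE for each row that meets both minimums
meets_threshold <- counts_demo$retweet_count >= 100 &
  counts_demo$favorite_count >= 100

# TRUE only if every row meets both minimums
all(meets_threshold)
```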

Filter by retweet and favorite counts

# View the tweets
head(tweets_pop$text)
text    
<chr>
1    As we continue to build the Bakkt Bitcoin Futures contract, we reached a
2    BREAKING: The United States is considering entering into a "currency pact"
3    REMINDER: The Bitcoin ETF will eventually get approved.\n\nNot a question
4    [New Post] Bitcoin is becoming much more important in Hong Kong and India.
5    Reports are surfacing that some Hong Kong ATMs have run out of cash as
6    Bitcoin is the most transparent currency ever created.

Let's practice!
