Plotting Twitter data over time

Analyzing Social Media Data in R

Vivek Vijayaraghavan

Data Science Coach

Lesson overview

Time series data
Create time series objects and plots
Visualize frequency of tweets over time
Compare brand salience of two brands

Definition of brand salience

Brand salience measured by volume of tweets

Time series data

Series of data points indexed over time
Visualize frequency of tweets

A time series plot

Extracting tweets for time series analysis

Extract tweets for time series analysis using search_tweets()

library(rtweet)

# Extract tweets on "#google" using search_tweets()
search_tweets("#google", n = 18000, include_rts = FALSE)

Extracted tweet data

status_id                 created_at           screen_name
<chr>                    <S3: POSIXct>            <chr>
1164921105066463232    2019-08-23 15:23:29    catapanoannal            
1164921037143699456    2019-08-23 15:23:13    STARBEXPLORE    
1164920927341039621    2019-08-23 15:22:46    indra_susanto    
1164920898475794435    2019-08-23 15:22:40    virfice    
1164920877940482048    2019-08-23 15:22:35    KnowledgeNile    
1164920647962832897    2019-08-23 15:21:40    mahomes_tech

created_at has the timestamp of the tweets
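Because created_at is stored as POSIXct, it supports date-time arithmetic directly. The following is a minimal sketch on a small hand-built data frame; the `tweets` object and its three rows are hypothetical stand-ins copied from the output above, not a real search_tweets() result.

```r
# Hypothetical sample mimicking the created_at and screen_name columns above
tweets <- data.frame(
  created_at = as.POSIXct(
    c("2019-08-23 15:23:29", "2019-08-23 15:23:13", "2019-08-23 15:22:46"),
    tz = "UTC"
  ),
  screen_name = c("catapanoannal", "STARBEXPLORE", "indra_susanto"),
  stringsAsFactors = FALSE
)

# Earliest and latest timestamps in the sample
time_span <- range(tweets$created_at)

# Seconds elapsed between the earliest and latest tweet
elapsed_secs <- as.numeric(difftime(time_span[2], time_span[1], units = "secs"))

print(time_span)
print(elapsed_secs)
```

Checking the time span this way is a quick sanity test of how much history a search actually returned before plotting it.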

Visualize frequency of tweets

Analyzing tweet frequencies

Monitor overall engagement for a product
Tweet frequencies: insights on interest level

Visualize tweet frequency

# Extract tweets on "#camry" using search_tweets()
camry_st <- search_tweets("#camry", n = 18000, include_rts = FALSE)

Visualize tweet frequency

created_at            screen_name                    text
<S3: POSIXct>           <chr>                        <chr>
2019-08-23 03:29:58    dromru        Toyota Camry 2019 год
2019-08-23 02:59:04    NusTrivia     Sportier 2020 Toyota Camry TRD to cost $31,995    
2019-08-22 18:09:06    NusTrivia     2020 Toyota Camry TRD Costs $31,995, It’s The     
2019-08-23 01:56:51    RaitisRides   ALL NEW 2020 Toyota Avalon is coming to R    
2019-08-23 01:17:36    jhooie        I have to say, when I finally settled down tod

Create time series plot

# Create a time series plot
ts_plot(camry_st, by = "hours", color = "blue")

Time series plot of tweets on Camry

Compare frequency of tweets

Volume of tweets posted is a strong indicator of brand salience
Compare the brand salience of Tesla and Camry

Tesla vs Camry

Compare frequency of tweets

Convert the tweets extracted on Camry into a time series object
Time series object contains aggregated frequency of tweets over a time interval

# Convert tweet data into a time series object
camry_ts <- ts_data(camry_st, by = 'hours')
head(camry_ts)

time                    n
<S3: POSIXct>         <int>
2019-08-13 14:00:00     12            
2019-08-13 15:00:00     34            
2019-08-13 16:00:00      1            
2019-08-13 17:00:00      2
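To make the idea of "aggregated frequency over a time interval" concrete, the sketch below reproduces an hourly count in base R. It is an illustration of the concept, not rtweet's internal implementation; the `created_at` vector is hypothetical sample data.

```r
# Hypothetical timestamps standing in for camry_st$created_at
created_at <- as.POSIXct(
  c("2019-08-13 14:05:10", "2019-08-13 14:47:31",
    "2019-08-13 15:12:00", "2019-08-13 17:59:59"),
  tz = "UTC"
)

# Truncate each timestamp to the start of its hour
hour_bin <- as.POSIXct(trunc(created_at, units = "hours"))

# Count how many tweets fall into each hour bin
counts <- table(format(hour_bin, "%Y-%m-%d %H:%M:%S"))

# Assemble a two-column data frame shaped like the output above
hourly <- data.frame(
  time = as.POSIXct(names(counts), tz = "UTC"),
  n = as.integer(counts)
)
print(hourly)
```

In this sketch, hours with no tweets (16:00 here) simply do not appear as rows.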

Compare frequency of tweets

# Rename the two columns in the time series object
names(camry_ts) <- c("time", "camry_n")

head(camry_ts)

time                  camry_n
<S3: POSIXct>          <int>
2019-08-13 14:00:00      12            
2019-08-13 15:00:00      34            
2019-08-13 16:00:00       1            
2019-08-13 17:00:00       2

Compare frequency of tweets

tesla_st <- search_tweets("#tesla", n = 18000, include_rts = FALSE)
tesla_ts <- ts_data(tesla_st, by = 'hours')

names(tesla_ts) <- c("time", "tesla_n")
head(tesla_ts)

time                 tesla_n
<S3: POSIXct>         <int>
2019-08-13 13:00:00    17            
2019-08-13 14:00:00    58            
2019-08-13 15:00:00    38            
2019-08-13 16:00:00    32            
2019-08-13 17:00:00    38

Compare frequency of tweets

# Merge the two time series objects and retain "time" column
merged_df <- merge(tesla_ts, camry_ts, by = "time", all = TRUE)
head(merged_df)

time                tesla_n    camry_n
<S3: POSIXct>         <int>     <int>
2019-08-13 13:00:00    17        NA    
2019-08-13 14:00:00    58        12    
2019-08-13 15:00:00    38        34    
2019-08-13 16:00:00    32         1
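The NA in the first row comes from all = TRUE, which keeps hours that appear in only one of the two objects. A small sketch with hypothetical two-row and one-row data frames shows the difference from the default merge, which keeps only hours present in both.

```r
# Hypothetical hourly counts mirroring the tables above
tesla_ts <- data.frame(
  time = as.POSIXct(c("2019-08-13 13:00:00", "2019-08-13 14:00:00"), tz = "UTC"),
  tesla_n = c(17L, 58L)
)
camry_ts <- data.frame(
  time = as.POSIXct("2019-08-13 14:00:00", tz = "UTC"),
  camry_n = 12L
)

# all = TRUE keeps every hour from either input; missing counts become NA
outer_df <- merge(tesla_ts, camry_ts, by = "time", all = TRUE)

# The default keeps only hours found in both inputs
inner_df <- merge(tesla_ts, camry_ts, by = "time")

print(outer_df)
print(inner_df)
```

Keeping all hours avoids silently dropping intervals where only one brand was tweeted about.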

Compare frequency of tweets

# Stack the tweet frequency columns using the melt() function from the reshape package
library(reshape)
melt_df <- melt(merged_df, na.rm = TRUE, id.vars = "time")
head(melt_df)

time                  variable       value
<S3: POSIXct>          <fct>        <int>
2019-08-13 13:00:00    tesla_n        17    
2019-08-13 14:00:00    tesla_n        58    
2019-08-13 15:00:00    tesla_n        38    
2019-08-13 16:00:00    tesla_n        32    
2019-08-13 17:00:00    tesla_n        38    
2019-08-13 18:00:00    tesla_n        34
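For readers who want to see what the stacking step does, here is a rough base R equivalent that needs no extra package. It is a sketch of the reshaping idea rather than how melt() is implemented; `merged_df` below is a hypothetical two-row sample, and `value_cols` and `long_df` are illustrative names.

```r
# Hypothetical merged data with one missing Camry count
merged_df <- data.frame(
  time = as.POSIXct(c("2019-08-13 13:00:00", "2019-08-13 14:00:00"), tz = "UTC"),
  tesla_n = c(17L, 58L),
  camry_n = c(NA, 12L)
)

# Columns to stack on top of each other
value_cols <- c("tesla_n", "camry_n")

# Build one block per column, then bind the blocks row-wise
long_df <- do.call(rbind, lapply(value_cols, function(col) {
  data.frame(
    time = merged_df$time,
    variable = factor(col, levels = value_cols),
    value = merged_df[[col]]
  )
}))

# Drop rows with missing counts, mirroring na.rm = TRUE
long_df <- long_df[!is.na(long_df$value), ]
print(long_df)
```

The long layout gives one row per time and brand, which is the shape needed to map the brand to a line color in a plot.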

Compare frequency of tweets

# Plot frequency of tweets on Camry and Tesla
library(ggplot2)
ggplot(data = melt_df, 
       aes(x = time, y = value, col = variable)) +
  geom_line(lwd = 0.8)

The comparison plot

Comparison time series plots of Camry and Tesla

Let's practice!
