Building a graph from raw data

Case Studies: Network Analysis in R

Edmund Hart

Instructor

Exploring the data

  • The data are all tweets mentioning #rstats over several days
  • Key attributes for building a graph are:
    • screen name
    • raw text of the tweet

Anatomy of a tweet

  1. ReecheshJC: "Hey #rstats, how do I do fct_lump but where I lump based on count values in a column?"
  2. kom_256: "RT @elenagbg: Retweeted R-Ladies Madrid (@RLadiesMAD):\n\nEn el #OCSummit17... Fast Talks sobre #rstats organizado por... https://t.co/CKY5aG…"
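
The second tweet shows the pattern that identifies a retweet: the text starts with "RT @" followed by the screen name of the original author. The slides do not show how that name is extracted, so the following is only a minimal sketch of a helper that could do it. The name find_rt matches the call used when building the graph, but this body, and the assumption that screen names contain only letters, digits, and underscores, are illustrative rather than the course's own implementation. Base R is used so the sketch has no package dependencies.

```r
# Hypothetical helper: return the screen name that a tweet retweets,
# or NULL when the tweet is not a retweet.
# Assumption: a retweet starts with "RT @name" and screen names consist
# of letters, digits, and underscores only.
find_rt <- function(tweet_text) {
  if (is.na(tweet_text)) {
    return(NULL)
  }
  hit <- regmatches(tweet_text,
                    regexec("^RT @([A-Za-z0-9_]+)", tweet_text))[[1]]
  if (length(hit) == 0) {
    return(NULL)
  }
  hit[2]  # first capture group: the name without the "@"
}

find_rt("RT @elenagbg: Retweeted R-Ladies Madrid (@RLadiesMAD)")  # "elenagbg"
find_rt("Hey #rstats, how do I do fct_lump?")                     # NULL
```

Returning NULL for non-retweets is a design choice that lets the caller test the result with is.null() before adding an edge.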
library(igraph)
library(stringr)
raw_tweets <- read.csv("datasets/rstatstweets.csv", 
  stringsAsFactors = FALSE)

Data sample, single row

user_name:    Karen Millidine
screen_name:    KJMillidine
tweet_text:    RT @Rbloggers: RStudio v1.1 Released https://t.co/kCMHc689nY #rstats #DataScience
favorites:    0
retweets:    96
location:    None
expanded_url:    https://wp.me/pMm6L-ExV
in_reply_to_tweet_id:    NA
in_reply_to_user_id:    NA
dt:    10/10/17

Building the graph

## Get all the screen names
all_sn <- unique(raw_tweets$screen_name)

## Create graph
retweet_graph <- make_empty_graph()  # current name for graph.empty()

## Add screen names as vertices
retweet_graph <- retweet_graph + vertices(all_sn)
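
To see what this step produces, here is a small sketch on made-up screen names (toy_sn is hypothetical, not part of the course data). It uses make_empty_graph(), the current igraph name for graph.empty(), and checks that the result is a directed graph with named vertices and no edges.

```r
library(igraph)

# Made-up screen names, for illustration only
toy_sn <- c("alice", "bob", "carol")

# An empty graph is directed by default; vertices() adds one vertex per name
g <- make_empty_graph() + vertices(toy_sn)

vcount(g)       # 3 vertices
ecount(g)       # 0 edges so far
V(g)$name       # the screen names become the vertex names
is_directed(g)  # TRUE
```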

Building the graph

## Extract name and add edges
for(i in seq_len(nrow(raw_tweets))){
  # Extract retweet name
  rt_name <- find_rt(raw_tweets$tweet_text[i])
  # If there is a name, add an edge
  if(!is.null(rt_name)){
    # Check that the vertex exists in the graph; if not, add it
    if(!rt_name %in% V(retweet_graph)$name){
      retweet_graph <- retweet_graph + vertices(rt_name)
    }
    # Add the edge from the retweeter to the retweeted account
    retweet_graph <- retweet_graph + edges(c(raw_tweets$screen_name[i], rt_name))
  }
}
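
The loop grows the graph one edge at a time. As an alternative, not the approach shown in the slides, the same kind of graph can be sketched by first collecting all retweet pairs into an edge list and then calling graph_from_data_frame() once. The data frame toy_tweets below is made up for illustration, and the retweet pattern assumes screen names consist of letters, digits, and underscores.

```r
library(igraph)

# Made-up tweets standing in for raw_tweets
toy_tweets <- data.frame(
  screen_name = c("alice", "bob", "carol", "bob"),
  tweet_text  = c("RT @dave: new #rstats post",
                  "RT @alice: RT @dave: new #rstats post",
                  "Anyone using igraph? #rstats",
                  "Trying out #rstats today"),
  stringsAsFactors = FALSE
)

# Flag retweets and pull out the retweeted name for those rows
is_rt   <- grepl("^RT @[A-Za-z0-9_]+", toy_tweets$tweet_text)
rt_name <- sub("^RT @([A-Za-z0-9_]+).*$", "\\1",
               toy_tweets$tweet_text[is_rt])

# One row per retweet: from the retweeter to the retweeted account
edge_df <- data.frame(from = toy_tweets$screen_name[is_rt],
                      to   = rt_name,
                      stringsAsFactors = FALSE)

# List every account as a vertex, including those with no retweets
all_names <- union(toy_tweets$screen_name, edge_df$to)
toy_graph <- graph_from_data_frame(edge_df, directed = TRUE,
                                   vertices = data.frame(name = all_names))

vcount(toy_graph)  # 4: alice, bob, carol, dave
ecount(toy_graph)  # 2: alice -> dave, bob -> alice
```

Building the edge list first avoids repeatedly copying the graph inside a loop, which may matter for larger tweet collections.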

Cleaning the graph

## Count the number of degree 0 vertices
sum(degree(retweet_graph) == 0) 

## Trim and simplify
retweet_graph <- simplify(retweet_graph)
retweet_graph <- delete_vertices(retweet_graph, 
  degree(retweet_graph) == 0)
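
The order of the two cleaning steps matters, which a toy graph can illustrate (the names below are made up). simplify() removes duplicate edges and self-loops, so an account that only retweeted itself becomes isolated only after simplifying and is then caught by the degree filter. This sketch uses delete_vertices(), the current name for delete.vertices().

```r
library(igraph)

# Toy graph: a duplicated edge, a self-loop, and an isolated vertex
g <- make_empty_graph() + vertices(c("alice", "bob", "carol", "dave"))
g <- g + edges(c("alice", "bob",
                 "alice", "bob",     # the same retweet relationship twice
                 "carol", "carol"))  # a self-retweet

sum(degree(g) == 0)  # 1: only dave is isolated at this point

g <- simplify(g)     # drops the duplicate edge and the self-loop
sum(degree(g) == 0)  # 2: carol is now isolated as well

g <- delete_vertices(g, V(g)[degree(g) == 0])
V(g)$name            # "alice" "bob"
```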

Let's practice!
