Counting words

Analyzing Social Media Data in Python

Alex Hanna

Computational Social Scientist

Why count words?

  • A basic step in automating text analysis
  • Shows how often a relevant keyword is mentioned in documents compared with other keywords
  • In exercises: #rstats vs #python

Counting with str.contains

  • str.contains
    • pandas Series string method
    • Returns boolean Series
    • case = False - Case insensitive search
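
As a minimal sketch of the behavior listed above, the snippet below runs str.contains on a small, made-up Series (the example texts are hypothetical, not from the course data) with and without case = False.

```python
import pandas as pd

# Hypothetical example texts, not taken from the course dataset
texts = pd.Series(["I love my Apple watch", "apple pie recipe", "Bananas are great"])

# The default search is case sensitive, so "Apple" does not match "apple"
case_sensitive = texts.str.contains("apple")

# case=False ignores capitalization
case_insensitive = texts.str.contains("apple", case=False)

print(case_sensitive.tolist())
print(case_insensitive.tolist())
```

Each result is a boolean Series aligned with the original index, so it can be summed, averaged, or used as a filter.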

Companies dataset

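import numpy as np
import pandas as pd

# flatten_tweets and companies_json are assumed to be defined already
tweets = pd.DataFrame(flatten_tweets(companies_json))
# Boolean Series: True where the tweet text mentions 'apple'
apple = tweets['text'].str.contains('apple', case = False)
# Proportion of tweets that mention 'apple'
print(np.sum(apple) / tweets.shape[0])
0.112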
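
To compare one keyword against another, the same proportion can be computed for each keyword in turn. The sketch below uses a hypothetical toy DataFrame in place of the companies dataset and relies on the fact that the mean of a boolean Series equals the share of True values.

```python
import pandas as pd

# Hypothetical toy data standing in for the tweets DataFrame
tweets = pd.DataFrame({"text": [
    "Apple unveils a new phone",
    "Google updates search",
    "I like apple products",
    "Nothing to see here",
]})

# .mean() on a boolean Series gives the proportion of True values,
# equivalent to np.sum(mask) / tweets.shape[0]
proportions = {
    keyword: tweets["text"].str.contains(keyword, case=False).mean()
    for keyword in ["apple", "google"]
}
print(proportions)
```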

Counting in multiple text fields

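# Start with matches in the main text field
apple = tweets['text'].str.contains('apple', 
                                     case = False) 
# Combine with matches in the other text fields using logical OR;
# na = False treats missing text as no match
for column in ['extended_tweet-full_text',
    'retweeted_status-text',
    'retweeted_status-extended_tweet-full_text']:
    apple = apple | tweets[column].str.contains('apple', 
                                                 case = False,
                                                 na = False)

print(np.sum(apple) / tweets.shape[0])
0.12866666666666668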
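
The loop above can also be wrapped in a small helper. This is only a sketch: the function name mentions and the toy DataFrame are hypothetical, and missing values are treated as "no match" via na=False.

```python
import pandas as pd

def mentions(df, keyword, columns):
    """Return a boolean Series: True where any listed column contains keyword."""
    found = pd.Series(False, index=df.index)
    for column in columns:
        # na=False counts missing text as "no match"
        found = found | df[column].str.contains(keyword, case=False, na=False)
    return found

# Hypothetical toy data; only the first row has retweeted text
toy = pd.DataFrame({
    "text": ["RT: new phone", "Apple event today", "hello"],
    "retweeted_status-text": ["The new Apple phone is out", None, None],
})

mask = mentions(toy, "apple", ["text", "retweeted_status-text"])
print(mask.tolist())
```

The first row is matched only through the retweeted text, which is why checking a single field can undercount mentions.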

Let's practice!
