Can you guess the language?

Sentiment Analysis in Python

Violeta Misheva

Data Scientist

Language of a string in Python

from langdetect import detect_langs 
foreign = 'Este libro ha sido uno de los mejores libros que he leido.'
detect_langs(foreign)
[es:0.9999945352697024]

Language of a column

  • Problem: Detect the language of each of the strings and capture the most likely language in a new column
import pandas as pd
from langdetect import detect_langs
# Load the product reviews and preview the first rows
reviews = pd.read_csv('product_reviews.csv')
reviews.head()

top 5 rows of the Amazon product reviews

Building a feature for the language

languages = []
# Detect the language of each review (column 1 holds the review text)
for row in range(len(reviews)):
    languages.append(detect_langs(reviews.iloc[row, 1]))
languages
[it:0.9999982541301151],
[es:0.9999954153640488],
[es:0.7142833997345875, en:0.2857160465706441],
[es:0.9999942365605781],
[es:0.999997956049055] ...
Building a feature for the language

# Transform the first list to a string and split on a colon
str(languages[0]).split(':') 
['[it', '0.9999982541301151]']
str(languages[0]).split(':')[0]
'[it'
str(languages[0]).split(':')[0][1:]
'it'
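
The slicing above relies on the printed form of a result, which starts with the most likely candidate. As a minimal sketch, the same steps can be wrapped in a helper and checked against printed results like the ones shown earlier, including the row with two candidates. The helper name `top_language` is hypothetical, and the inputs are plain strings copied from the slide output, so the sketch runs without langdetect installed.

```python
def top_language(printed_result):
    """Return the first language code from text like '[es:0.71, en:0.28]'."""
    # Everything before the first colon is '[' plus the language code.
    first_part = printed_result.split(':')[0]
    # Drop the leading '[' to keep only the code.
    return first_part[1:]


# Printed results copied from the slide output (plain strings, not langdetect objects).
single = '[it:0.9999982541301151]'
ambiguous = '[es:0.7142833997345875, en:0.2857160465706441]'

print(top_language(single))     # it
print(top_language(ambiguous))  # es
```

Because the candidates are listed from most to least likely, splitting on the first colon is enough even when several languages are returned.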
Building a feature for the language

# Keep only the most likely language code for each review
languages = [str(lang).split(':')[0][1:] for lang in languages]
# Store the codes in a new column
reviews['language'] = languages
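
A loop over real review data may hit text that a detector cannot classify, such as an empty string. The sketch below shows one possible way to keep the column aligned with the rows in that case by recording `None`. It is an illustration under assumptions, not part of the slides: `most_likely_languages`, `Candidate`, and `fake_detect` are hypothetical names, and the detector is passed in as an argument so the example runs with a stand-in instead of langdetect.

```python
def most_likely_languages(texts, detect_fn):
    """Return the top language code per text, or None when detection fails."""
    codes = []
    for text in texts:
        try:
            candidates = detect_fn(text)
        except Exception:
            # Assumption: the detector raises on text it cannot classify;
            # record a missing value so the list stays aligned with the rows.
            codes.append(None)
            continue
        # Same string handling as in the slides: '[es:0.99]' -> 'es'
        codes.append(str(candidates).split(':')[0][1:])
    return codes


class Candidate:
    """Stand-in for a detection result that prints as 'lang:prob'."""
    def __init__(self, lang, prob):
        self.lang = lang
        self.prob = prob

    def __repr__(self):
        return f'{self.lang}:{self.prob}'


def fake_detect(text):
    """Hypothetical detector: fails on blank text, otherwise answers 'es'."""
    if not text.strip():
        raise ValueError('no features in text')
    return [Candidate('es', 0.99)]


codes = most_likely_languages(['Este libro es bueno.', ''], fake_detect)
print(codes)  # ['es', None]
```

With langdetect installed, `detect_langs` could be passed as `detect_fn` and the review text column as `texts`.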
Let's practice!
