Sentiment Analysis in Python
Violeta Misheva
Data Scientist
Stemming is the process of transforming words to their root forms, even if the stem itself is not a valid word in the language.
staying, stays, stayed ----> stay
house, houses, housing ----> hous
Lemmatization is similar to stemming, but it reduces words to roots that are valid words in the language.
stay, stays, staying, stayed ----> stay
house, houses, housing ----> house
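The contrast above can be sketched with a toy example. This is only an illustration under stated assumptions: `toy_stem`, `toy_lemmatize`, `SUFFIXES`, and `LEMMA_TABLE` are hypothetical names, the suffix list is deliberately tiny, and the lookup table stands in for a real lexicon. Real stemmers and lemmatizers apply far more rules.

```python
# Toy illustration only: real stemmers (e.g. Porter) apply many more rules.
SUFFIXES = ("ing", "ed", "es", "s", "e")

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical hand-made table standing in for a real dictionary of valid words.
LEMMA_TABLE = {
    "stays": "stay", "staying": "stay", "stayed": "stay",
    "houses": "house", "housing": "house",
}

def toy_lemmatize(word):
    """Look the word up; fall back to the lowercased word itself."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)

for w in ["house", "houses", "housing"]:
    # The stem need not be a valid word; the lemma always is.
    print(w, toy_stem(w), toy_lemmatize(w))
```

The stemmer only manipulates characters, so it can produce a non-word such as `hous`, while the lemmatizer can only return entries that exist in its dictionary.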
Stemming
Lemmatization
from nltk.stem import PorterStemmer
porter = PorterStemmer()
porter.stem('wonderful')
'wonder'
Snowball Stemmer: Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish
from nltk.stem.snowball import SnowballStemmer
dutch_stemmer = SnowballStemmer("dutch")
dutch_stemmer.stem("beginnen")
'begin'
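When the language is only known at run time, it may help to check it against the languages NLTK reports before building a stemmer. A minimal sketch, assuming that falling back to the English Porter stemmer is acceptable; `get_stemmer` is a hypothetical helper, not part of NLTK.

```python
from nltk.stem import PorterStemmer
from nltk.stem.snowball import SnowballStemmer

def get_stemmer(language):
    """Return a Snowball stemmer for `language`, or a Porter stemmer as a fallback."""
    language = language.lower()
    # SnowballStemmer.languages is the tuple of language names NLTK supports.
    if language in SnowballStemmer.languages:
        return SnowballStemmer(language)
    # Assumption: an English Porter stemmer is an acceptable default here.
    return PorterStemmer()

print(SnowballStemmer.languages)
spanish_stemmer = get_stemmer("Spanish")
print(type(spanish_stemmer).__name__)
```

Checking first avoids the `ValueError` that `SnowballStemmer` raises for an unsupported language name.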
# stem() treats its argument as a single word, so this sentence is only lowercased
porter.stem('Today is a wonderful day!')
'today is a wonderful day!'
from nltk import word_tokenize
# Split the sentence into tokens first, then stem each token
tokens = word_tokenize('Today is a wonderful day!')
stemmed_tokens = [porter.stem(token) for token in tokens]
stemmed_tokens
['today', 'is', 'a', 'wonder', 'day', '!']
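The tokenize-then-stem pattern above can be wrapped in a small function. A sketch under assumptions: `stem_sentence` is a hypothetical helper, and it uses NLTK's regex-based `wordpunct_tokenize` as the default tokenizer, which is a swap from `word_tokenize` so that no extra tokenizer data is required. Any callable that maps a string to a list of tokens can be passed instead.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

porter = PorterStemmer()

def stem_sentence(sentence, tokenizer=wordpunct_tokenize):
    """Tokenize a sentence, then stem every token with the Porter stemmer."""
    return [porter.stem(token) for token in tokenizer(sentence)]

print(stem_sentence('Today is a wonderful day!'))
```

For this sentence both tokenizers yield the same tokens, so the stems match the list shown above.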
from nltk.stem import WordNetLemmatizer
WNlemmatizer = WordNetLemmatizer()
# pos='a' tells the lemmatizer to treat the word as an adjective
WNlemmatizer.lemmatize('wonderful', pos='a')
'wonderful'