Sentiment Analysis in Python
Violeta Misheva
Data Scientist
Stemming is the process of reducing a word to its root form (stem), even if the stem is not itself a valid word in the language.
staying, stays, stayed ----> stay
house, houses, housing ----> hous
Lemmatization is similar to stemming, but it reduces a word to a valid word in the language (its base form, or lemma).
stay, stays, staying, stayed ----> stay
house, houses, housing ----> house
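The contrast between the two approaches can be sketched in plain Python. This is a toy illustration only, not how NLTK works internally: the suffix list `SUFFIXES`, the mini-dictionary `LEMMAS`, and the functions `toy_stem` and `toy_lemmatize` are all hypothetical names invented for this example. The stemmer strips suffixes mechanically without checking the result, while the lemmatizer looks up a known base form.

```python
# Toy illustration only: real stemmers (e.g. Porter) use far more elaborate rules.
SUFFIXES = ("ing", "ed", "es", "s", "e")  # hypothetical rule list, checked in order

def toy_stem(word):
    """Strip the first matching suffix without checking that the result is a real word."""
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical mini-dictionary mapping inflected forms to their base forms.
LEMMAS = {
    "stays": "stay", "staying": "stay", "stayed": "stay",
    "houses": "house", "housing": "house",
}

def toy_lemmatize(word):
    """Return the dictionary base form, or the word itself if it is not listed."""
    return LEMMAS.get(word, word)

for w in ["house", "houses", "housing"]:
    print(w, "->", toy_stem(w), "|", toy_lemmatize(w))
```

With these assumed rules, all three forms of "house" stem to "hous", which is not a real word, while the dictionary lookup returns "house".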
Stemming
Lemmatization
from nltk.stem import PorterStemmer
porter = PorterStemmer()
porter.stem('wonderful')
'wonder'
Snowball Stemmer: Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish
from nltk.stem.snowball import SnowballStemmer
DutchStemmer = SnowballStemmer("dutch")
DutchStemmer.stem("beginnen")
'begin'
porter.stem('Today is a wonderful day!')
'today is a wonderful day!'
from nltk.tokenize import word_tokenize
tokens = word_tokenize('Today is a wonderful day!')
stemmed_tokens = [porter.stem(token) for token in tokens]
stemmed_tokens
['today', 'is', 'a', 'wonder', 'day', '!']
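The tokenize-then-stem pattern can be wrapped in a small reusable function. This is a minimal sketch, assuming NLTK is installed. The function name `stem_sentence` is hypothetical, and a simple regular expression stands in for `word_tokenize` so the example runs without downloading tokenizer data. The regex splits text into runs of word characters and single punctuation marks, which is cruder than `word_tokenize` (for example, it handles contractions differently).

```python
import re
from nltk.stem import PorterStemmer

porter = PorterStemmer()

def stem_sentence(text):
    """Split text into word and punctuation tokens, then stem each token."""
    # Assumption: this regex is a rough stand-in for nltk's word_tokenize.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return [porter.stem(token) for token in tokens]

print(stem_sentence("Today is a wonderful day!"))
# ['today', 'is', 'a', 'wonder', 'day', '!']
```

Stemming token by token is what changes 'wonderful' to 'wonder'. Passing the whole sentence to `stem` treats it as a single word, so its inner words are not stemmed.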
from nltk.stem import WordNetLemmatizer
WNlemmatizer = WordNetLemmatizer()
WNlemmatizer.lemmatize('wonderful', pos='a')
'wonderful'