Feature Engineering for NLP in Python
Rounak Banik
Data Scientist
| review | label |
|---|---|
| 'The movie was good and not boring' | positive |
| 'The movie was not good and boring' | negative |
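The two reviews in the table contain exactly the same words and differ only in word order. The following is a minimal sketch of why that matters for a plain word-count representation. It uses `collections.Counter` and whitespace tokenization as simplifying assumptions; neither is taken from the slides.

```python
from collections import Counter

reviews = [
    "The movie was good and not boring",   # labelled positive
    "The movie was not good and boring",   # labelled negative
]

# Assumed tokenization: lowercase and split on whitespace.
word_counts = [Counter(review.lower().split()) for review in reviews]

# Both reviews reduce to the same word counts, so a model that only
# sees these counts cannot tell the two reviews apart.
same_counts = word_counts[0] == word_counts[1]
print(same_counts)
```

Capturing short word sequences such as 'not good' or 'not boring' is one way to recover the lost ordering.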
'for you a thousand times over'
[
'for you',
'you a',
'a thousand',
'thousand times',
'times over'
]
'for you a thousand times over'
[
'for you a',
'you a thousand',
'a thousand times',
'thousand times over'
]
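The bigram and trigram lists above can be reproduced with a sliding window over the tokens. This is a small sketch, assuming whitespace tokenization; the helper name `word_ngrams` is hypothetical and not part of any library.

```python
def word_ngrams(text, n):
    """Return all contiguous word n-grams of `text` (hypothetical helper)."""
    tokens = text.split()  # assumed tokenization: split on whitespace
    # Slide a window of n tokens across the sentence.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "for you a thousand times over"
bigram_list = word_ngrams(sentence, 2)
trigram_list = word_ngrams(sentence, 3)

print(bigram_list)
print(trigram_list)
```

A sentence with k tokens yields k - n + 1 n-grams, which is why the six-word sentence gives five bigrams and four trigrams.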
from sklearn.feature_extraction.text import CountVectorizer
Generates only bigrams.
bigrams = CountVectorizer(ngram_range=(2,2))
Generates unigrams, bigrams, and trigrams.
ngrams = CountVectorizer(ngram_range=(1,3))