Feature Engineering for NLP in Python
Rounak Banik
Data Scientist
review | label |
---|---|
'The movie was good and not boring' | positive |
'The movie was not good and boring' | negative |
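The two reviews in the table use exactly the same words, only in a different order. The sketch below is one way to see why that matters: it builds a simple word-count representation with the standard library and checks whether the two reviews can be told apart. The helper name `bag_of_words` and the lowercase, whitespace-split tokenization are assumptions made for illustration, not part of the slides.

```python
from collections import Counter

# The two reviews from the table, keyed by their labels.
reviews = {
    "positive": "The movie was good and not boring",
    "negative": "The movie was not good and boring",
}

def bag_of_words(text):
    """Count each lowercased, whitespace-separated word (hypothetical helper)."""
    return Counter(text.lower().split())

bow_positive = bag_of_words(reviews["positive"])
bow_negative = bag_of_words(reviews["negative"])

# Word order is discarded, so the two opposite reviews look identical.
same_bags = bow_positive == bow_negative
print(same_bags)
```

Under these assumptions the two word-count representations compare equal, which suggests that single-word counts alone cannot separate these reviews. Features that keep some word order, such as n-grams, can.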
Bigrams (n = 2):
'for you a thousand times over'
[
'for you',
'you a',
'a thousand',
'thousand times',
'times over'
]
Trigrams (n = 3):
'for you a thousand times over'
[
'for you a',
'you a thousand',
'a thousand times',
'thousand times over'
]
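The two lists above can be reproduced with a sliding window over the words of the sentence. This is a minimal sketch, assuming plain whitespace tokenization; `word_ngrams` is a hypothetical helper name, not a library function.

```python
def word_ngrams(text, n):
    """Return all contiguous runs of n words in text (hypothetical helper)."""
    tokens = text.split()  # assumption: words are separated by whitespace
    # Slide a window of width n across the token list.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "for you a thousand times over"
bigrams = word_ngrams(sentence, 2)
trigrams = word_ngrams(sentence, 3)

print(bigrams)
print(trigrams)
```

A sentence of k words yields k - n + 1 n-grams, so the six-word sentence here gives five bigrams and four trigrams.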
Generates only bigrams.
bigrams = CountVectorizer(ngram_range=(2,2))
Generates unigrams, bigrams and trigrams.
ngrams = CountVectorizer(ngram_range=(1,3))