Feature Engineering for NLP in Python
Rounak Banik
Data Scientist
Dogs, dog
reduction, REDUCING, Reduce
don't, do not
won't, will not
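Each group above is a set of surface forms that usually should be treated as the same word. Below is a minimal sketch of two simple normalization steps: lowercasing ("REDUCING" becomes "reducing") and expanding contractions ("don't" becomes "do not"). The `CONTRACTIONS` table and the `normalize` function are hypothetical and for illustration only. Differences such as "Dogs" versus "dog" remain, and those are left to lemmatization.

```python
import re

# Hypothetical contraction table for illustration only; a real project
# would need a much larger mapping.
CONTRACTIONS = {
    "don't": "do not",
    "won't": "will not",
}


def normalize(text: str) -> str:
    """Lowercase text, collapse whitespace and expand known contractions."""
    text = text.lower()
    # Collapse runs of whitespace into single spaces
    text = re.sub(r"\s+", " ", text).strip()
    # Replace whole words found in the table; other words pass through
    words = [CONTRACTIONS.get(word, word) for word in text.split(" ")]
    return " ".join(words)


print(normalize("I  won't REDUCE it, Don't worry"))
```

The lookup only matches whole space-separated words, so a contraction with punctuation attached (for example "don't.") is not expanded. Running tokenization first avoids that problem.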
"I have a dog. His name is Hachi."
Tokens:
["I", "have", "a", "dog", ".", "His", "name", "is", "Hachi", "."]
"Don't do this."
Tokens:
["Do", "n't", "do", "this", "."]
import spacy
# Load the en_core_web_sm model
nlp = spacy.load('en_core_web_sm')
# Initialize string
string = "Hello! I don't know what I'm doing here."
# Create a Doc object
doc = nlp(string)
# Generate list of tokens
tokens = [token.text for token in doc]
print(tokens)
['Hello','!','I','do',"n't",'know','what','I',"'m",'doing','here','.']
reducing, reduces, reduced, reduction → reduce
am, are, is → be
n't → not
've → have
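At its simplest, the mappings above can be pictured as a lookup from each form to its base form. The sketch below shows only that lookup idea. `LEMMA_TABLE` and `lemmatize` are hypothetical and contain just the slide's examples. Real lemmatizers, including spaCy's, use far larger tables together with part-of-speech-aware rules.

```python
# Hypothetical lookup table built from the examples above.
LEMMA_TABLE = {
    "reducing": "reduce",
    "reduces": "reduce",
    "reduced": "reduce",
    "reduction": "reduce",
    "am": "be",
    "are": "be",
    "is": "be",
    "n't": "not",
    "'ve": "have",
}


def lemmatize(tokens: list[str]) -> list[str]:
    """Map each token to its base form; unknown tokens are just lowercased."""
    return [LEMMA_TABLE.get(token.lower(), token.lower()) for token in tokens]


print(lemmatize(["They", "are", "reducing", "costs"]))
```

Because unknown words pass through unchanged, "costs" is not reduced to "cost" here. Handling regular inflections like that is one reason rule-based lemmatizers go beyond a pure lookup.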
import spacy
# Load the en_core_web_sm model
nlp = spacy.load('en_core_web_sm')
# Initialize string
string = "Hello! I don't know what I'm doing here."
# Create a Doc object
doc = nlp(string)
# Generate list of lemmas
lemmas = [token.lemma_ for token in doc]
print(lemmas)
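['hello', '!', '-PRON-', 'do', 'not', 'know', 'what', '-PRON-', 'be', 'do', 'here', '.']
(Output from spaCy 2.x, which lemmatizes every pronoun to the placeholder '-PRON-'. spaCy 3.x no longer uses this placeholder.)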
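The '-PRON-' placeholder makes all pronouns identical. One common workaround under spaCy 2.x is to fall back to the lowercased token text wherever the lemma is '-PRON-'. With a real `Doc` that would be `token.lower_`. In the sketch below, the two lists are copied from the outputs above so the code runs without loading a spaCy model. `readable_lemmas` is a hypothetical helper and not part of spaCy.

```python
def readable_lemmas(tokens: list[str], lemmas: list[str]) -> list[str]:
    """Replace spaCy 2.x's '-PRON-' placeholder with the lowercased token."""
    return [
        token.lower() if lemma == "-PRON-" else lemma
        for token, lemma in zip(tokens, lemmas)
    ]


# Token texts and lemmas copied from the outputs shown above
tokens = ['Hello', '!', 'I', 'do', "n't", 'know', 'what', 'I', "'m", 'doing', 'here', '.']
lemmas = ['hello', '!', '-PRON-', 'do', 'not', 'know', 'what', '-PRON-', 'be', 'do', 'here', '.']
print(readable_lemmas(tokens, lemmas))
```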