Feature Engineering for NLP in Python
Rounak Banik
Data Scientist
'I am happy'
'I am joyous'
'I am sad'
King-Queen → Man-Woman
France-Paris → Russia-Moscow
import spacy
# Load the model and create a Doc object
nlp = spacy.load('en_core_web_lg')
doc = nlp('I am happy')
# Generate a word vector for each token
for token in doc:
    print(token.vector)
[-1.0747459e+00 4.8677087e-02 5.6630421e+00 1.6680446e+00
-1.3194644e+00 -1.5142369e+00 1.1940931e+00 -3.0168812e+00
...
doc = nlp("happy joyous sad")
for token1 in doc:
    for token2 in doc:
        print(token1.text, token2.text, token1.similarity(token2))
happy happy 1.0
happy joyous 0.63244456
happy sad 0.37338886
joyous happy 0.63244456
joyous joyous 1.0
joyous sad 0.5340932
...
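By default, spaCy's `similarity` reports the cosine similarity between the two word vectors. The sketch below shows that calculation in plain Python. The three-dimensional vectors are hypothetical toy values chosen for illustration, not real `en_core_web_lg` vectors, and `cosine_similarity` and `toy_vectors` are names introduced here.

```python
import math

def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors (assumption: illustration only, not spaCy output)
toy_vectors = {
    "happy": [0.9, 0.8, 0.1],
    "joyous": [0.8, 0.9, 0.2],
    "sad": [-0.7, -0.6, 0.3],
}

# Pairwise comparison, mirroring the nested loop over tokens
for word1, vec1 in toy_vectors.items():
    for word2, vec2 in toy_vectors.items():
        print(word1, word2, cosine_similarity(vec1, vec2))
```

A word compared with itself always scores 1.0 (up to floating-point rounding), and the score is symmetric, which matches the pattern in the spaCy output.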
# Create Doc objects
sent1 = nlp("I am happy")
sent2 = nlp("I am sad")
sent3 = nlp("I am joyous")
# Compute the similarity between sent1 and sent2
sent1.similarity(sent2)
0.9273363837282105
# Compute the similarity between sent1 and sent3
sent1.similarity(sent3)
0.9403554938594568
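Both sentence scores are high, even for "happy" versus "sad". One likely reason is that a Doc's vector defaults to the average of its token vectors, so the shared tokens "I am" dominate. The sketch below illustrates that effect with hypothetical toy vectors. The helper names and all numeric values are assumptions made for illustration, not spaCy output.

```python
import math

def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_vector(vectors):
    """Average a list of equal-length vectors component by component."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Hypothetical toy vectors (assumption: illustration only)
toy = {
    "I": [0.1, 0.2, 0.9],
    "am": [0.2, 0.1, 0.8],
    "happy": [0.9, 0.8, 0.1],
    "sad": [-0.7, -0.6, 0.3],
}

# Sentence vectors as the mean of their token vectors
happy_sentence = mean_vector([toy["I"], toy["am"], toy["happy"]])
sad_sentence = mean_vector([toy["I"], toy["am"], toy["sad"]])

word_sim = cosine_similarity(toy["happy"], toy["sad"])
sentence_sim = cosine_similarity(happy_sentence, sad_sentence)
print("word-level:", word_sim)
print("sentence-level:", sentence_sim)
```

With these toy values the sentence-level score comes out higher than the word-level score, because two of the three averaged tokens are identical.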