Linguistic features

Natural Language Processing with spaCy

Azadeh Mobasher

Principal Data Scientist

POS tagging

  • A token's part-of-speech (POS) tag depends on its context: the surrounding words and their tags
import spacy
nlp = spacy.load("en_core_web_sm")
text = "My cat will fish for a fish tomorrow in a fishy way."
print([(token.text, token.pos_, spacy.explain(token.pos_)) 
        for token in nlp(text)])

POS tagging example

What is the importance of POS?

 

  • Better accuracy for many NLP tasks

I will fish tomorrow. ("fish" is a verb)

I ate fish. ("fish" is a noun)

  • Translation use case: the Spanish translation of "fish" depends on its POS tag

 

verb -> pescaré

noun -> pescado
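As a rough illustration of the translation use case, the sketch below picks a Spanish form from a lookup keyed on the word and its POS tag. The table `TRANSLATIONS` and the helper `pick_translation` are hypothetical names introduced only for this example; in a real pipeline the tag would come from `token.pos_`.

```python
# Hypothetical lookup table: (English word, coarse POS tag) -> Spanish form.
# The two entries are the ones shown on the slide.
TRANSLATIONS = {
    ("fish", "VERB"): "pescaré",
    ("fish", "NOUN"): "pescado",
}


def pick_translation(word, pos):
    """Return the Spanish form for this word/POS pair, or None if unknown."""
    return TRANSLATIONS.get((word.lower(), pos))


print(pick_translation("fish", "VERB"))
print(pick_translation("fish", "NOUN"))
```

The same surface word maps to different translations, so the POS tag is what selects the right one.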

What is the importance of POS?

 

  • Word-sense disambiguation (WSD) is the problem of deciding in which sense a word is used in a sentence.
  • Determining the sense of a word is crucial in tasks such as machine translation.

WSD Example

Word-sense disambiguation

import spacy
nlp = spacy.load("en_core_web_sm")

verb_text = "I will fish tomorrow."
noun_text = "I ate fish."


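# Show each "fish" token with its POS tag and the tag's description
print([(token.text, token.pos_, spacy.explain(token.pos_))
       for token in nlp(verb_text) if "fish" in token.text], "\n")
print([(token.text, token.pos_, spacy.explain(token.pos_))
       for token in nlp(noun_text) if "fish" in token.text])
[('fish', 'VERB', 'verb')] 
[('fish', 'NOUN', 'noun')]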

Dependency parsing

  • Explores the syntactic structure of a sentence
  • Each dependency links two tokens: a head and its dependent
  • The links form a tree

Example of dependency parsing
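A minimal sketch of the tree structure, assuming spaCy v3. To keep the result independent of any trained model, the `Doc` is annotated by hand: the `heads` indices are an assumption written for this example, not parser output.

```python
import spacy
from spacy.tokens import Doc

# A blank English pipeline provides a vocabulary without a model download.
nlp = spacy.blank("en")

words = ["We", "understand", "the", "differences", "."]
# Assumed head index of each token; the root token points to itself.
heads = [1, 1, 3, 1, 1]
# Relation type of each token to its head.
deps = ["nsubj", "ROOT", "det", "dobj", "punct"]
doc = Doc(nlp.vocab, words=words, heads=heads, deps=deps)

# Every token has exactly one head, so the links form a tree.
links = [(token.text, token.head.text) for token in doc]

# The root is the only token that is its own head.
root = next(token for token in doc if token.head.i == token.i)
children_of_root = [child.text for child in root.children]

print(links)
print(root.text, children_of_root)
```

With a trained pipeline, the same `token.head` and `token.children` attributes are filled in by the parser instead of by hand.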

Dependency parsing and spaCy

 

  • A dependency label describes the type of syntactic relation between two tokens

 

Dependency label    Description
nsubj               Nominal subject
ROOT                Root
det                 Determiner
dobj                Direct object
aux                 Auxiliary
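The descriptions in the table can also be looked up programmatically. The sketch below uses `spacy.explain`, which reads spaCy's built-in glossary and needs no trained model; the `labels` list is just the set of labels from the table.

```python
import spacy

# Labels from the table above.
labels = ["nsubj", "ROOT", "det", "dobj", "aux"]

# spacy.explain returns a short description for a known label.
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(f"{label:<6} {description}")
```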

Dependency parsing and displaCy

  • displaCy can draw dependency trees
from spacy import displacy

doc = nlp("We understand the differences.")

# Start a local web server that shows the dependency tree in a browser
displacy.serve(doc, style="dep")

Example of dependency parser with displaCy
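When starting a web server is not convenient, `displacy.render` can return the drawing as SVG markup instead. This is a sketch assuming spaCy v3; the `Doc` is annotated by hand (the `heads` values are an assumption) so that no trained model is needed, and the output file name is arbitrary.

```python
import tempfile
from pathlib import Path

import spacy
from spacy import displacy
from spacy.tokens import Doc

# Hand-annotated parse, built on a blank pipeline (no model download).
nlp = spacy.blank("en")
doc = Doc(
    nlp.vocab,
    words=["We", "understand", "the", "differences", "."],
    heads=[1, 1, 3, 1, 1],  # assumed head index per token
    deps=["nsubj", "ROOT", "det", "dobj", "punct"],
)

# jupyter=False makes render() return the markup as a string.
svg = displacy.render(doc, style="dep", jupyter=False)

# Save the drawing; the path is chosen only for this example.
output_path = Path(tempfile.gettempdir()) / "dependency_tree.svg"
output_path.write_text(svg, encoding="utf-8")
print(output_path)
```

The saved file can be opened in a browser or embedded in a report.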

Dependency parsing and spaCy

  • Use the .dep_ attribute to access the dependency label of a token

 

doc = nlp("We understand the differences.")
print([(token.text, token.dep_, spacy.explain(token.dep_)) for token in doc])
[('We', 'nsubj', 'nominal subject'), ('understand', 'ROOT', 'root'),
('the', 'det', 'determiner'), ('differences', 'dobj', 'direct object'),
('.', 'punct', 'punctuation')]
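Dependency labels can be used to pull out who does what to what. The sketch below, assuming spaCy v3, rebuilds the parse shown above by hand (the `heads` values are an assumption) and filters tokens on `.dep_`.

```python
import spacy
from spacy.tokens import Doc

# Hand-annotated copy of the parse printed above; no trained model required.
nlp = spacy.blank("en")
doc = Doc(
    nlp.vocab,
    words=["We", "understand", "the", "differences", "."],
    heads=[1, 1, 3, 1, 1],  # assumed head index per token
    deps=["nsubj", "ROOT", "det", "dobj", "punct"],
)

# Pick tokens by their dependency label.
subject = next(token for token in doc if token.dep_ == "nsubj")
direct_object = next(token for token in doc if token.dep_ == "dobj")

# The subject's head is the verb it belongs to.
triple = (subject.text, subject.head.text, direct_object.text)

# The subtree of the object gives the full object phrase.
object_phrase = " ".join(token.text for token in direct_object.subtree)

print(triple)
print(object_phrase)
```

On real text the same filtering applies to a `doc` produced by a trained pipeline, with the caveat that a sentence may have no subject or object, or several.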

Let's practice!
