Rule-based matching

Advanced NLP with spaCy

Ines Montani

spaCy core developer

Why not just regular expressions?

Match on Doc objects, not just strings
Match on tokens and token attributes
Use the model's predictions
Example: "duck" (verb) vs. "duck" (noun)
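
To illustrate the "duck" point, here is a minimal sketch using only Python's standard `re` module. The two sentences are made up for illustration. A regular expression sees characters only, so it reports a hit in both sentences and cannot tell the noun from the verb.

```python
import re

# Two illustrative sentences: "duck" is a noun in the first, a verb in the second.
sentences = [
    "The duck swam across the pond.",
    "You should duck when the ball comes.",
]

# A plain regex matches the character sequence, regardless of part of speech.
pattern = re.compile(r"\bduck\b")
hits = [bool(pattern.search(sentence)) for sentence in sentences]

print(hits)
```

Both entries are `True`. Telling the two uses apart needs token-level information such as the part-of-speech tag predicted by a model, which is what token-based patterns can draw on.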

Match patterns

Lists of dictionaries, one per token
Match exact token texts
```
[{'ORTH': 'iPhone'}, {'ORTH': 'X'}]
```
Match lexical attributes
```
[{'LOWER': 'iphone'}, {'LOWER': 'x'}]
```
Match any token attributes
```
[{'LEMMA': 'buy'}, {'POS': 'NOUN'}]
```
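
The difference between `ORTH` and `LOWER` can be sketched as follows. This assumes spaCy v3, where `matcher.add` takes a list of patterns, and uses a blank English pipeline, which is enough because both attributes are lexical and need no trained model. The sample sentence and the pattern names are made up for illustration.

```python
import spacy
from spacy.matcher import Matcher

# A blank pipeline only tokenizes; no trained model is required here.
nlp = spacy.blank("en")

# ORTH compares the exact token text, including case.
exact = Matcher(nlp.vocab)
exact.add("IPHONE_EXACT", [[{"ORTH": "iPhone"}, {"ORTH": "X"}]])

# LOWER compares the lowercased token text, so case variants match too.
lower = Matcher(nlp.vocab)
lower.add("IPHONE_LOWER", [[{"LOWER": "iphone"}, {"LOWER": "x"}]])

doc = nlp("I compared the iPhone X with an older IPHONE x listing")

exact_hits = [doc[start:end].text for _, start, end in exact(doc)]
lower_hits = [doc[start:end].text for _, start, end in lower(doc)]

print(exact_hits)
print(lower_hits)
```

The `ORTH` pattern should find only the exactly cased "iPhone X", while the `LOWER` pattern should also pick up "IPHONE x".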

Using the Matcher (1)

import spacy

# Import the Matcher
from spacy.matcher import Matcher

# Load a model and create the nlp object
nlp = spacy.load('en_core_web_sm')

# Initialize the matcher with the shared vocab
matcher = Matcher(nlp.vocab)

# Add the pattern to the matcher
pattern = [{'ORTH': 'iPhone'}, {'ORTH': 'X'}]
# spaCy v3 takes a list of patterns; v2 used matcher.add('IPHONE_PATTERN', None, pattern)
matcher.add('IPHONE_PATTERN', [pattern])

# Process some text
doc = nlp("New iPhone X release date leaked")

# Call the matcher on the doc
matches = matcher(doc)

Using the Matcher (2)

# Process the text and call the matcher on the doc
doc = nlp("New iPhone X release date leaked")
matches = matcher(doc)

# Iterate over the matches
for match_id, start, end in matches:

    # Get the matched span
    matched_span = doc[start:end]
    print(matched_span.text)

iPhone X

match_id: hash value of the pattern name
start: start index of the matched span
end: end index of the matched span (exclusive)
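
Because `match_id` is a hash, it can be turned back into the pattern name through the vocab's string store. A minimal sketch, assuming spaCy v3 and a blank English pipeline (the pattern uses only `ORTH`, so no trained model is needed):

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
matcher.add("IPHONE_PATTERN", [[{"ORTH": "iPhone"}, {"ORTH": "X"}]])

doc = nlp("New iPhone X release date leaked")

results = []
for match_id, start, end in matcher(doc):
    # Look up the hash in the string store to recover the pattern name.
    name = nlp.vocab.strings[match_id]
    # start and end are token indices; end is exclusive, as in Python slices.
    results.append((name, start, end, doc[start:end].text))

print(results)
```

Here the single match covers tokens 1 and 2, so `start` is 1 and `end` is 3.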

Matching lexical attributes

pattern = [
    {'IS_DIGIT': True},
    {'LOWER': 'fifa'},
    {'LOWER': 'world'},
    {'LOWER': 'cup'},
    {'IS_PUNCT': True}
]

doc = nlp("2018 FIFA World Cup: France won!")

2018 FIFA World Cup:
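
The snippet above leaves out the matcher setup. A complete version might look like the sketch below, assuming spaCy v3 and a blank English pipeline (all attributes in this pattern are lexical, so no trained model is needed). The pattern name `FIFA_PATTERN` is made up for illustration.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

pattern = [
    {"IS_DIGIT": True},   # a token made up of digits, e.g. "2018"
    {"LOWER": "fifa"},
    {"LOWER": "world"},
    {"LOWER": "cup"},
    {"IS_PUNCT": True},   # any punctuation token, e.g. ":"
]
matcher.add("FIFA_PATTERN", [pattern])

doc = nlp("2018 FIFA World Cup: France won!")
matched = [doc[start:end].text for _, start, end in matcher(doc)]

print(matched)
```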

Matching other token attributes

pattern = [
    {'LEMMA': 'love', 'POS': 'VERB'},
    {'POS': 'NOUN'}
]

doc = nlp("I loved dogs but now I love cats more.")

loved dogs
love cats

Using operators and quantifiers (1)

pattern = [
    {'LEMMA': 'buy'},
    {'POS': 'DET', 'OP': '?'},  # optional: match 0 or 1 times
    {'POS': 'NOUN'}
]

doc = nlp("I bought a smartphone. Now I'm buying apps.")

bought a smartphone
buying apps

Using operators and quantifiers (2)

| Example | Description |
| --- | --- |
| `{'OP': '!'}` | Negation: match 0 times |
| `{'OP': '?'}` | Optional: match 0 or 1 times |
| `{'OP': '+'}` | Match 1 or more times |
| `{'OP': '*'}` | Match 0 or more times |
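
The `?` and `+` operators can be tried without a trained model by sticking to lexical attributes. This is a rough sketch assuming spaCy v3 and a blank English pipeline; the sentence and the pattern names are made up for illustration.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# '?' makes the article optional: "buy apps" and "buy the apps" both match.
matcher.add("BUY_APPS", [[
    {"LOWER": "buy"},
    {"LOWER": "the", "OP": "?"},
    {"LOWER": "apps"},
]])

# '+' requires at least one "very" before "good".
matcher.add("VERY_GOOD", [[
    {"LOWER": "very", "OP": "+"},
    {"LOWER": "good"},
]])

doc = nlp("I buy the apps and you buy apps that are very very good")

# Collect (pattern name, matched text) pairs; a set ignores the match order.
found = {
    (nlp.vocab.strings[match_id], doc[start:end].text)
    for match_id, start, end in matcher(doc)
}

print(found)
```

With `+`, the matcher may return more than one overlapping span for the same stretch of text, so the results are collected into a set rather than compared as an ordered list.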

Let's practice!
