Natural Language Processing with spaCy
Azadeh Mobasher
Principal Data Scientist
Pros:
Cons:
The first step in using the re package is to define a pattern.
import re
pattern = r"((\d){3}-(\d){3}-(\d){4})"
text = "Our phone number is 832-123-5555 and their phone number is 425-123-4567."
The .finditer() method from the re package returns an iterator over all matches:
iter_matches = re.finditer(pattern, text)
for match in iter_matches:
    start_char = match.start()
    end_char = match.end()
    print("Start character: ", start_char, "| End character: ", end_char, "| Matching text: ", text[start_char:end_char])
>>> Start character: 20 | End character: 32 | Matching text: 832-123-5555
Start character: 59 | End character: 71 | Matching text: 425-123-4567
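The same search can be sketched with a compiled pattern and named groups, which makes it easy to pull out individual parts of each number. This is a minimal illustration, not part of the original slides: the helper name find_phone_numbers and the group names area, prefix and line are assumptions chosen for readability.

```python
import re

# Named groups label each part of the number; the group names are illustrative.
PHONE_PATTERN = re.compile(r"(?P<area>\d{3})-(?P<prefix>\d{3})-(?P<line>\d{4})")


def find_phone_numbers(text):
    """Return (start, end, number, area_code) for every phone number in text."""
    return [
        (m.start(), m.end(), m.group(0), m.group("area"))
        for m in PHONE_PATTERN.finditer(text)
    ]


text = "Our phone number is 832-123-5555 and their phone number is 425-123-4567."
results = find_phone_numbers(text)
for start, end, number, area in results:
    print(f"{number} (area code {area}) spans characters {start} to {end}")
```

Compiling the pattern once avoids re-parsing it on every call, and named groups keep the extraction readable if the pattern later grows.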
spaCy supports pattern matching through three components: Matcher, PhraseMatcher and EntityRuler.
import spacy
text = "Our phone number is 832-123-5555 and their phone number is 425-123-4567."
nlp = spacy.blank("en")
# Each dict describes one token: three digits, a hyphen, three digits, a hyphen, four digits
patterns = [{"label": "PHONE_NUMBER", "pattern": [{"SHAPE": "ddd"}, {"ORTH": "-"}, {"SHAPE": "ddd"}, {"ORTH": "-"}, {"SHAPE": "dddd"}]}]
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(patterns)
doc = nlp(text)
print([(ent.text, ent.label_) for ent in doc.ents])
>>> [('832-123-5555', 'PHONE_NUMBER'), ('425-123-4567', 'PHONE_NUMBER')]
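For comparison, the same token pattern can be used with Matcher when only the matched spans are needed rather than entities on the Doc. This is a rough sketch assuming spaCy v3 is installed; the variable names and the rule name PHONE_NUMBER are chosen here for illustration.

```python
import spacy
from spacy.matcher import Matcher

text = "Our phone number is 832-123-5555 and their phone number is 425-123-4567."

# A blank English pipeline provides the tokenizer, which is all Matcher needs here.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Same token-level pattern as the EntityRuler example: ddd - ddd - dddd
phone_pattern = [
    {"SHAPE": "ddd"}, {"ORTH": "-"},
    {"SHAPE": "ddd"}, {"ORTH": "-"},
    {"SHAPE": "dddd"},
]
matcher.add("PHONE_NUMBER", [phone_pattern])

doc = nlp(text)
# Each match is (match_id, start_token, end_token); sort by start token for a stable order.
spans = sorted(matcher(doc), key=lambda m: m[1])
phone_numbers = [doc[start:end].text for _, start, end in spans]
print(phone_numbers)
```

Unlike EntityRuler, Matcher does not write to doc.ents; it only returns token offsets, which suits cases where the matches feed custom logic.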