Natural Language Processing with spaCy
Azadeh Mobasher
Principal Data Scientist
Pros:
Cons:
The first step in using the re package is to define a pattern.
import re
pattern = r"((\d){3}-(\d){3}-(\d){4})"
text = "Our phone number is 832-123-5555 and their phone number is 425-123-4567."
Use the .finditer() method from the re package to iterate over the matches:
iter_matches = re.finditer(pattern, text)
for match in iter_matches:
    start_char = match.start()
    end_char = match.end()
    # print() already separates its arguments with a space
    print("Start character:", start_char, "| End character:", end_char, "| Matching text:", text[start_char:end_char])
>>> Start character: 20 | End character: 32 | Matching text: 832-123-5555
Start character: 59 | End character: 71 | Matching text: 425-123-4567
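The same lookup can be wrapped in a small helper that returns the character spans instead of printing them. This is only a sketch: the names `PHONE_PATTERN` and `find_phone_spans` are made up for illustration, and the pattern drops the capture groups, which are not needed when only the full match is used.

```python
import re

# Same phone-number shape as above, without capture groups (illustrative name)
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")

def find_phone_spans(text):
    """Return (start_char, end_char, matched_text) for every match in text."""
    return [(m.start(), m.end(), m.group()) for m in PHONE_PATTERN.finditer(text)]

text = "Our phone number is 832-123-5555 and their phone number is 425-123-4567."
spans = find_phone_spans(text)
for start_char, end_char, matched in spans:
    print(start_char, end_char, matched)
```

The end offset is exclusive, so text[start_char:end_char] gives back the matched string.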
spaCy supports rule-based matching through Matcher, PhraseMatcher and EntityRuler.
import spacy
text = "Our phone number is 832-123-5555 and their phone number is 425-123-4567."
nlp = spacy.blank("en")
# Each phone number is tokenized as digits, hyphen, digits, hyphen, digits
patterns = [{"label": "PHONE_NUMBER", "pattern": [{"SHAPE": "ddd"}, {"ORTH": "-"}, {"SHAPE": "ddd"}, {"ORTH": "-"}, {"SHAPE": "dddd"}]}]
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(patterns)
doc = nlp(text)
print([(ent.text, ent.label_) for ent in doc.ents])
>>> [('832-123-5555', 'PHONE_NUMBER'), ('425-123-4567', 'PHONE_NUMBER')]