Natural Language Processing with spaCy
Azadeh Mobasher
Principal Data Scientist
spaCy offers a more readable, production-ready alternative: the Matcher class.
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
doc = nlp("Good morning, this is our first day on campus.")
matcher = Matcher(nlp.vocab)

# One dict per token: lowercase "good" followed by lowercase "morning"
pattern = [{"LOWER": "good"}, {"LOWER": "morning"}]
matcher.add("morning_greeting", [pattern])
matches = matcher(doc)
# Each match is a (match_id, start, end) tuple of token indices
for match_id, start, end in matches:
    print("Start token: ", start, " | End token: ", end, "| Matched text: ", doc[start:end].text)
>>> Start token: 0 | End token: 2 | Matched text: Good morning
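The match_id in each tuple is the hash of the key passed to matcher.add, and it can be turned back into a readable label through nlp.vocab.strings. The sketch below assumes a blank English pipeline (spacy.blank("en")) instead of en_core_web_sm, which should be enough here because LOWER only needs the tokenizer; the second key, evening_greeting, and the extended example sentence are hypothetical additions for illustration.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a tokenizer-only pipeline is sufficient for LOWER-based patterns
nlp = spacy.blank("en")
doc = nlp("Good morning, this is our first day on campus. Good evening!")

matcher = Matcher(nlp.vocab)
matcher.add("morning_greeting", [[{"LOWER": "good"}, {"LOWER": "morning"}]])
# Hypothetical second key, added only to show that each key gets its own ID
matcher.add("evening_greeting", [[{"LOWER": "good"}, {"LOWER": "evening"}]])

labels = []
# Sort by start token so the output follows the order of the text
for match_id, start, end in sorted(matcher(doc), key=lambda m: m[1]):
    label = nlp.vocab.strings[match_id]  # hash -> original key string
    labels.append(label)
    print(label, "|", doc[start:end].text)
```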
Matcher operators analogous to Python's in, not in, and comparison operators
| Operator | Value type | Description |
|---|---|---|
| IN | any type | Attribute value is a member of the given list |
| NOT_IN | any type | Attribute value is not a member of the given list |
| ==, >=, <=, >, < | int, float | Attribute value is equal to, greater than or equal to, less than or equal to, greater than, or less than the given value |
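The operators in the table can be combined inside a single token dict. The following sketch is illustrative only: it assumes a blank English pipeline, an invented example sentence, and uses the numeric LENGTH attribute (token text length) to show a comparison operator alongside NOT_IN.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # assumption: tokenizer-only pipeline
doc = nlp("Good morning, good night, good day and good evening.")

matcher = Matcher(nlp.vocab)
pattern = [
    {"LOWER": "good"},
    # NOT_IN rejects "night"; LENGTH > 3 rejects short words such as "day"
    {"LOWER": {"NOT_IN": ["night"]}, "LENGTH": {">": 3}},
]
matcher.add("greeting", [pattern])

# Sort by start token so results follow the order of the text
matched = [doc[start:end].text for _, start, end in sorted(matcher(doc), key=lambda m: m[1])]
print(matched)  # ['Good morning', 'good evening']
```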
Using IN to match "good morning" and "good evening"
doc = nlp("Good morning and good evening.")
matcher = Matcher(nlp.vocab)
pattern = [{"LOWER": "good"}, {"LOWER": {"IN": ["morning", "evening"]}}]
matcher.add("morning_greeting", [pattern])
matches = matcher(doc)
for match_id, start, end in matches:
    print("Start token: ", start, " | End token: ", end,
          "| Matched text: ", doc[start:end].text)
>>> Start token: 0 | End token: 2 | Matched text: Good morning
Start token: 3 | End token: 5 | Matched text: good evening
PhraseMatcher matches a long list of phrases against a text.
import spacy
from spacy.matcher import PhraseMatcher
nlp = spacy.load("en_core_web_sm")
matcher = PhraseMatcher(nlp.vocab)
terms = ["Bill Gates", "John Smith"]
patterns = [nlp.make_doc(term) for term in terms]
matcher.add("PeopleOfInterest", patterns)
doc = nlp("Bill Gates met John Smith for an important discussion regarding importance of AI.")
matches = matcher(doc)
for match_id, start, end in matches:
    print("Start token: ", start, " | End token: ", end, "| Matched text: ", doc[start:end].text)
>>> Start token: 0 | End token: 2 | Matched text: Bill Gates
Start token: 3 | End token: 5 | Matched text: John Smith
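In spaCy v3, the matcher can also return labeled Span objects directly by passing as_spans=True, which saves the manual doc[start:end] slicing and hash lookup. A minimal sketch, assuming a blank English pipeline is enough because the default PhraseMatcher attribute (ORTH, the exact token text) only needs the tokenizer:

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")  # assumption: exact-text matching needs only the tokenizer
matcher = PhraseMatcher(nlp.vocab)
terms = ["Bill Gates", "John Smith"]
# make_doc only tokenizes each term instead of running a full pipeline
matcher.add("PeopleOfInterest", [nlp.make_doc(term) for term in terms])

doc = nlp("Bill Gates met John Smith for an important discussion regarding importance of AI.")
# as_spans=True returns Span objects whose label_ is the key given to add()
spans = sorted(matcher(doc, as_spans=True), key=lambda span: span.start)
for span in spans:
    print(span.label_, "|", span.text)
```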
The attr parameter of the PhraseMatcher class
# attr="LOWER" compares lowercase forms, so matching ignores case
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
terms = ["Government", "Investment"]
patterns = [nlp.make_doc(term) for term in terms]
matcher.add("InvestmentTerms", patterns)
doc = nlp("It was interesting to the investment division of the government.")
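To see the effect of attr="LOWER", the snippet above can be completed by actually running the matcher. This is a sketch that assumes a blank English pipeline (LOWER is a lexical attribute, so no trained components are needed); the capitalized terms match their lowercase occurrences in the text.

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")  # assumption: tokenizer-only pipeline
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
terms = ["Government", "Investment"]
matcher.add("InvestmentTerms", [nlp.make_doc(term) for term in terms])

doc = nlp("It was interesting to the investment division of the government.")
# Lowercase forms are compared, so "Investment" also matches "investment"
found = sorted(doc[start:end].text for _, start, end in matcher(doc))
print(found)  # ['government', 'investment']
```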
# attr="SHAPE" compares token shapes (digits become "d") rather than the exact text
matcher = PhraseMatcher(nlp.vocab, attr="SHAPE")
terms = ["110.0.0.0", "101.243.0.0"]
patterns = [nlp.make_doc(term) for term in terms]
matcher.add("IPAddresses", patterns)
doc = nlp("The tracked IP address was 234.135.0.0.")
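With attr="SHAPE", an address that appears in neither term list can still be found, as long as its shape equals the shape of one of the terms. The sketch below assumes a blank English pipeline and prints each token's shape_ so the comparison is visible.

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")  # assumption: shape is a lexical attribute, tokenizer suffices
matcher = PhraseMatcher(nlp.vocab, attr="SHAPE")
terms = ["110.0.0.0", "101.243.0.0"]
matcher.add("IPAddresses", [nlp.make_doc(term) for term in terms])

# Inspect the shapes that the matcher actually compares
shapes = {t: nlp.make_doc(t)[0].shape_ for t in terms + ["234.135.0.0"]}
print(shapes)

doc = nlp("The tracked IP address was 234.135.0.0.")
found = [doc[start:end].text for _, start, end in matcher(doc)]
print(found)  # ['234.135.0.0']
```

Because 234.135.0.0 and 101.243.0.0 share the shape ddd.ddd.d.d, the tracked address is matched, while 110.0.0.0 (shape ddd.d.d.d) is not.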