Natural Language Processing with spaCy
Azadeh Mobasher
Principal Data Scientist
| POS | Description | Examples |
|---|---|---|
| VERB | Verb | run, eat, ate, take |
| NOUN | Noun | man, airplane, tree, flower |
| ADJ | Adjective | big, old, incompatible, conflicting |
| ADV | Adverb | very, down, there, tomorrow |
| CCONJ | Coordinating conjunction | and, or, but |
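`spacy.explain()` looks a tag name up in spaCy's built-in glossary, so the tags in the table can be described without loading a trained pipeline. The following is a minimal sketch that assumes only that the `spacy` package is installed. It includes both `CONJ`, the older Universal Dependencies name, and `CCONJ`, which is the label spaCy's English pipelines assign to words such as *and*, *or*, and *but*.

```python
import spacy

# Tag names from the POS table; CONJ is the older UD name, CCONJ the current one
pos_tags = ["VERB", "NOUN", "ADJ", "ADV", "CONJ", "CCONJ"]

# spacy.explain() only reads spaCy's glossary; no trained pipeline is loaded
descriptions = {tag: spacy.explain(tag) for tag in pos_tags}

for tag, description in descriptions.items():
    print(f"{tag:6} {description}")
```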
spaCy stores the POS tag of each token in its `pos_` attribute, which is filled in by the `nlp` pipeline. `spacy.explain()` returns a human-readable description of a given tag.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

verb_sent = "I watch TV."
print([(token.text, token.pos_, spacy.explain(token.pos_)) for token in nlp(verb_sent)])
```

```
[('I', 'PRON', 'pronoun'),
 ('watch', 'VERB', 'verb'),
 ('TV', 'NOUN', 'noun'),
 ('.', 'PUNCT', 'punctuation')]
```
```python
noun_sent = "I left without my watch."
print([(token.text, token.pos_, spacy.explain(token.pos_)) for token in nlp(noun_sent)])
```

```
[('I', 'PRON', 'pronoun'),
 ('left', 'VERB', 'verb'),
 ('without', 'ADP', 'adposition'),
 ('my', 'PRON', 'pronoun'),
 ('watch', 'NOUN', 'noun'),
 ('.', 'PUNCT', 'punctuation')]
```
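The two outputs above show that the tag depends on context: *watch* is tagged `VERB` in the first sentence and `NOUN` in the second. The sketch below wraps that comparison in a small helper. `pos_of` is a hypothetical name, and the `Doc` objects are annotated by hand with the tags printed above so that the example runs without downloading `en_core_web_sm`. If the model is installed, you would pass `nlp(verb_sent)` and `nlp(noun_sent)` instead.

```python
import spacy
from spacy.tokens import Doc

# Blank English pipeline: only its vocabulary is needed to build Doc objects
nlp_blank = spacy.blank("en")


def pos_of(doc, word):
    """Hypothetical helper: coarse POS tags (token.pos_) of every token equal to `word`."""
    return [token.pos_ for token in doc if token.text == word]


# Hand-annotated copies of the two tagged sentences (no trained model involved)
verb_doc = Doc(
    nlp_blank.vocab,
    words=["I", "watch", "TV", "."],
    spaces=[True, True, False, False],
    pos=["PRON", "VERB", "NOUN", "PUNCT"],
)
noun_doc = Doc(
    nlp_blank.vocab,
    words=["I", "left", "without", "my", "watch", "."],
    spaces=[True, True, True, True, False, False],
    pos=["PRON", "VERB", "ADP", "PRON", "NOUN", "PUNCT"],
)

print(pos_of(verb_doc, "watch"))  # ['VERB']
print(pos_of(noun_doc, "watch"))  # ['NOUN']
```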
| Entity type | Description |
|---|---|
| PERSON | Named person or family |
| ORG | Companies, institutions, etc. |
| GPE | Geopolitical entities: countries, cities, etc. |
| LOC | Non-GPE locations, mountain ranges, etc. |
| DATE | Absolute or relative dates or periods |
| TIME | Times smaller than a day |
spaCy extracts named entities with the NER component of the pipeline and stores them in `doc.ents`. It also assigns each entity a label (`.label_`).

```python
import spacy

nlp = spacy.load("en_core_web_sm")
text = "Albert Einstein was genius."
doc = nlp(text)
print([(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents])
```

```
[('Albert Einstein', 0, 15, 'PERSON')]
```
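Each item in `doc.ents` is a `Span`. In addition to character offsets, a `Span` carries token offsets (`start`, `end`). The following is a minimal sketch that replaces the NER component with a hand-annotated `Doc`, whose entity tags are passed as IOB strings through the `ents` argument. This way, no trained pipeline is required.

```python
import spacy
from spacy.tokens import Doc

nlp_blank = spacy.blank("en")

# Same sentence as above, with the PERSON entity annotated by hand in IOB format
doc = Doc(
    nlp_blank.vocab,
    words=["Albert", "Einstein", "was", "genius", "."],
    spaces=[True, True, True, False, False],
    ents=["B-PERSON", "I-PERSON", "O", "O", "O"],
)

for ent in doc.ents:
    # start/end are token indices; start_char/end_char are character offsets
    print(ent.text, ent.start, ent.end, ent.start_char, ent.end_char, ent.label_)
    # Slicing the raw text with the character offsets recovers the entity text
    assert doc.text[ent.start_char:ent.end_char] == ent.text
# Albert Einstein 0 2 0 15 PERSON
```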
Entity types are also available for each token of a `Doc`, through `token.ent_type_`:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
text = "Albert Einstein was genius."
doc = nlp(text)
print([(token.text, token.ent_type_) for token in doc])
```

```
[('Albert', 'PERSON'), ('Einstein', 'PERSON'),
 ('was', ''), ('genius', ''), ('.', '')]
```
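`token.ent_type_` alone does not show where an entity starts. For example, two adjacent PERSON entities would give every token the same type. `token.ent_iob_` adds that boundary information: `B` marks the beginning of an entity, `I` a token inside one, and `O` a token outside any entity. The sketch below uses a hand-annotated `Doc` in place of the NER component, so that no model is needed.

```python
import spacy
from spacy.tokens import Doc

nlp_blank = spacy.blank("en")

# Hand-annotated stand-in for nlp("Albert Einstein was genius.")
doc = Doc(
    nlp_blank.vocab,
    words=["Albert", "Einstein", "was", "genius", "."],
    spaces=[True, True, True, False, False],
    ents=["B-PERSON", "I-PERSON", "O", "O", "O"],
)

# ent_iob_ marks entity boundaries; ent_type_ is "" for tokens outside entities
token_info = [(token.text, token.ent_iob_, token.ent_type_) for token in doc]
print(token_info)
```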
spaCy comes with a modern visualizer, displaCy, which highlights named entities and their labels:

```python
import spacy
from spacy import displacy

text = "Albert Einstein was genius."
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
# Starts a local web server (port 5000 by default) that renders the entities
displacy.serve(doc, style="ent")
```
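`displacy.serve()` blocks until the server is stopped. In a script, it can be more convenient to get the markup as a string with `displacy.render()` and save it to a file. The following is a minimal sketch that uses a hand-annotated `Doc` in place of the NER output. The output file name `entities.html` is hypothetical.

```python
from pathlib import Path

import spacy
from spacy import displacy
from spacy.tokens import Doc

nlp_blank = spacy.blank("en")

# Hand-annotated stand-in for nlp("Albert Einstein was genius.")
doc = Doc(
    nlp_blank.vocab,
    words=["Albert", "Einstein", "was", "genius", "."],
    spaces=[True, True, True, False, False],
    ents=["B-PERSON", "I-PERSON", "O", "O", "O"],
)

# jupyter=False makes render() return the HTML string instead of displaying it;
# page=True wraps the markup in a complete HTML page
html = displacy.render(doc, style="ent", page=True, jupyter=False)

# Hypothetical output path; open the file in any browser
Path("entities.html").write_text(html, encoding="utf-8")
```

Inside a Jupyter notebook, calling `displacy.render(doc, style="ent")` without `jupyter=False` shows the same visualization inline.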