Spoken Language Processing in Python
Daniel Bourke
Machine Learning Engineer/YouTube Creator
# Install spaCy
$ pip install spacy
# Download the spaCy language model
$ python -m spacy download en_core_web_sm
import spacy
# Load the spaCy language model
nlp = spacy.load("en_core_web_sm")
# Create a spaCy doc
doc = nlp("I'd like to talk about a smartphone I ordered on July 31st from your "
          "Sydney store, my order number is 4093829. I spoke to one of your "
          "customer service team, Georgia, yesterday.")
# Show each token and its character offset
for token in doc:
    print(token.text, token.idx)
I 0
'd 1
like 4
to 9
talk 12
about 17
a 23
smartphone 25
...
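The number printed next to each token is `token.idx`, the character offset at which the token starts in the original text. As a rough illustration only, the offsets above can be reproduced with Python's standard `re` module. The pattern below is an assumption chosen for this one phrase and is not how spaCy's tokenizer works.

```python
import re

# First part of the example text from the slide
text = "I'd like to talk about a smartphone"

# Hypothetical pattern: a clitic such as 'd, a run of word characters,
# or a single punctuation mark. This is NOT spaCy's tokenizer.
TOKEN_PATTERN = re.compile(r"'\w+|\w+|[^\w\s]")

# Pair each token with the character offset where it starts
tokens = [(m.group(), m.start()) for m in TOKEN_PATTERN.finditer(text)]

for token_text, offset in tokens:
    print(token_text, offset)
    # Slicing the text at the offset recovers the token
    assert text[offset:offset + len(token_text)] == token_text
```

Because the offset points back into the original string, it can be used to highlight or cut out a token without re-searching the text.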
# Show the sentences in the doc
for sentence in doc.sents:
    print(sentence)
I'd like to talk about a smartphone I ordered on July 31st from your Sydney store,
my order number is 4093829.
I spoke to one of your customer service team, Georgia, yesterday.
Some of spaCy's built-in named entities:
# Find the named entities in the doc
for entity in doc.ents:
    print(entity.text, entity.label_)
July 31st DATE
Sydney GPE
4093829 CARDINAL
one CARDINAL
Georgia GPE
yesterday DATE
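The `(text, label_)` pairs printed above are often easier to work with when grouped by label. A minimal sketch, assuming the pairs have already been collected into a plain list (here copied from the output above rather than read from `doc.ents`). `group_by_label` is a hypothetical helper, not part of spaCy.

```python
from collections import defaultdict

# Entity (text, label) pairs copied from the output above
entities = [
    ("July 31st", "DATE"),
    ("Sydney", "GPE"),
    ("4093829", "CARDINAL"),
    ("one", "CARDINAL"),
    ("Georgia", "GPE"),
    ("yesterday", "DATE"),
]

def group_by_label(pairs):
    """Map each entity label to the list of texts tagged with it."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)

by_label = group_by_label(entities)
for label, texts in by_label.items():
    print(label, texts)
```

With a real doc, the same helper could be fed `[(ent.text, ent.label_) for ent in doc.ents]`. Grouping also makes it easy to spot that Georgia, a person in this text, was tagged GPE.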
# Import the EntityRuler class
from spacy.pipeline import EntityRuler
# Check the spaCy pipeline
print(nlp.pipeline)
[('tagger', <spacy.pipeline.pipes.Tagger at 0x1c3aa8a470>),
('parser', <spacy.pipeline.pipes.DependencyParser at 0x1c3bb60588>),
('ner', <spacy.pipeline.pipes.EntityRecognizer at 0x1c3bb605e8>)]
# Create an EntityRuler instance
ruler = EntityRuler(nlp)
# Add a pattern to the ruler (a string pattern matches that exact phrase)
ruler.add_patterns([{"label": "PRODUCT", "pattern": "smartphone"}])
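# Add the new rule to the pipeline before ner
# (spaCy v2 API, matching the output shown here; in spaCy v3, add_pipe takes the
# component name instead: ruler = nlp.add_pipe("entity_ruler", before="ner"))
nlp.add_pipe(ruler, before="ner")
# Check the updated pipeline
print(nlp.pipeline)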
[('tagger', <spacy.pipeline.pipes.Tagger at 0x1c1f9c9b38>),
('parser', <spacy.pipeline.pipes.DependencyParser at 0x1c3c9cba08>),
('entity_ruler', <spacy.pipeline.entityruler.EntityRuler at 0x1c1d834b70>),
('ner', <spacy.pipeline.pipes.EntityRecognizer at 0x1c3c9cba68>)]
# Test the new entity rule
# Re-process the text so the doc passes through the updated pipeline
doc = nlp(doc.text)
for entity in doc.ents:
    print(entity.text, entity.label_)
smartphone PRODUCT
July 31st DATE
Sydney GPE
4093829 CARDINAL
one CARDINAL
Georgia GPE
yesterday DATE
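Placement before `ner` matters because a pipeline runs its components in order, and spaCy's entity recognizer keeps entities that are already set on the doc. The toy sketch below mimics that behaviour with plain functions and a dict. `rule_component`, `toy_ner`, and `insert_before` are hypothetical stand-ins, not spaCy APIs.

```python
def rule_component(doc):
    """Rule-based step: label every 'smartphone' word as PRODUCT."""
    for word in doc["words"]:
        if word == "smartphone":
            doc["ents"][word] = "PRODUCT"
    return doc

def toy_ner(doc):
    """Stand-in for a statistical NER step, using a fixed lookup table.
    It keeps any label an earlier component has already set."""
    guesses = {"Sydney": "GPE", "Georgia": "GPE"}
    for word in doc["words"]:
        if word in guesses and word not in doc["ents"]:
            doc["ents"][word] = guesses[word]
    return doc

def insert_before(pipeline, name, component, before):
    """Return a new pipeline with (name, component) placed before `before`."""
    index = [n for n, _ in pipeline].index(before)
    return pipeline[:index] + [(name, component)] + pipeline[index:]

# Start with only the stand-in NER, then slot the rule step in front of it
pipeline = [("ner", toy_ner)]
pipeline = insert_before(pipeline, "entity_ruler", rule_component, before="ner")

doc = {"words": "a smartphone from your Sydney store".split(), "ents": {}}
for _, component in pipeline:
    doc = component(doc)

pipeline_names = [name for name, _ in pipeline]
print(pipeline_names)
print(doc["ents"])
```

Running the rule step first means its PRODUCT label is already in place when the later step runs, which mirrors why the ruler is added with `before="ner"`.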