Advanced NLP with spaCy
Ines Montani
spaCy core developer
Use nlp.pipe to process many texts: it processes them as a stream and yields Doc objects, which is faster than calling nlp on each text.
BAD:
docs = [nlp(text) for text in LOTS_OF_TEXTS]
GOOD:
docs = list(nlp.pipe(LOTS_OF_TEXTS))
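A minimal, self-contained sketch of nlp.pipe, assuming spaCy is installed. It uses spacy.blank("en") (a tokenizer-only pipeline) so no trained model has to be downloaded; the example texts and the batch_size value are illustrative choices, not taken from the slides.

```python
import spacy

# Blank English pipeline: only a tokenizer, no trained components
nlp = spacy.blank("en")
texts = ["This is a text", "And another text", "Hello world!"]

# nlp.pipe returns an iterator that yields one Doc per input text, in order;
# batch_size controls how many texts are buffered per batch
docs = list(nlp.pipe(texts, batch_size=2))

token_counts = [len(doc) for doc in docs]
print(token_counts)  # [4, 3, 3]
```

The Docs come back in the same order as the input texts, so they can be zipped with the original list if needed.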
Setting as_tuples=True on nlp.pipe lets you pass in (text, context) tuples. It then yields (doc, context) tuples, which is useful for associating metadata with the doc.
data = [
    ('This is a text', {'id': 1, 'page_number': 15}),
    ('And another text', {'id': 2, 'page_number': 16}),
]

for doc, context in nlp.pipe(data, as_tuples=True):
    print(doc.text, context['page_number'])
This is a text 15
And another text 16
To store the context on the Doc itself, register custom extension attributes and set them inside the loop:
from spacy.tokens import Doc

Doc.set_extension('id', default=None)
Doc.set_extension('page_number', default=None)

data = [
    ('This is a text', {'id': 1, 'page_number': 15}),
    ('And another text', {'id': 2, 'page_number': 16}),
]

for doc, context in nlp.pipe(data, as_tuples=True):
    doc._.id = context['id']
    doc._.page_number = context['page_number']
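A self-contained variant of the extension-attribute pattern that can be run as a script, assuming spaCy is installed. It swaps in spacy.blank("en") so no trained model is needed, passes force=True so re-running does not fail on an already registered attribute, and collects the Docs to show that the metadata is read back through doc._.

```python
import spacy
from spacy.tokens import Doc

# Blank English pipeline: tokenizer only, no model download needed
nlp = spacy.blank("en")

# force=True overwrites an existing registration, so re-running does not raise
Doc.set_extension("id", default=None, force=True)
Doc.set_extension("page_number", default=None, force=True)

data = [
    ("This is a text", {"id": 1, "page_number": 15}),
    ("And another text", {"id": 2, "page_number": 16}),
]

docs = []
for doc, context in nlp.pipe(data, as_tuples=True):
    # Copy the metadata from the context dict onto the Doc
    doc._.id = context["id"]
    doc._.page_number = context["page_number"]
    docs.append(doc)

# The metadata now travels with each Doc
for doc in docs:
    print(doc._.id, doc._.page_number, doc.text)
```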

If you only need the tokenizer, don't run the whole pipeline: use nlp.make_doc to turn a text into a Doc object.
BAD:
doc = nlp("Hello world!")
GOOD:
doc = nlp.make_doc("Hello world!")
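A small sketch of what nlp.make_doc returns, assuming spaCy is installed. It uses spacy.blank("en"), which has no pipeline components, so it only illustrates that make_doc produces the tokenized Doc (the same tokens as calling nlp.tokenizer directly); it does not measure the time saved on a full pipeline.

```python
import spacy

nlp = spacy.blank("en")
text = "Hello world!"

# make_doc only tokenizes: no pipeline components are applied
doc = nlp.make_doc(text)
tokens = [token.text for token in doc]
print(tokens)  # ['Hello', 'world', '!']

# Calling the tokenizer directly gives the same tokens
tokenizer_tokens = [token.text for token in nlp.tokenizer(text)]
```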
Use nlp.disable_pipes to temporarily disable one or more pipes:
# Disable tagger and parser
with nlp.disable_pipes('tagger', 'parser'):
    # Process the text and print the entities
    doc = nlp(text)
    print(doc.ents)
Inside the with block only the remaining components run; after the block, the disabled pipes are restored.
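A model-free sketch of the restore behaviour, assuming spaCy is installed. Instead of the tagger and parser, it adds the rule-based "sentencizer" to a blank pipeline, because that component needs no trained model. The version check is there because spaCy v2 and v3 add components differently; in v3, nlp.disable_pipes still works but is deprecated in favour of nlp.select_pipes.

```python
import spacy

nlp = spacy.blank("en")
# spaCy v2 expects a component object; v3 adds built-in components by name
if spacy.__version__.startswith("2."):
    nlp.add_pipe(nlp.create_pipe("sentencizer"))
else:
    nlp.add_pipe("sentencizer")

names_before = list(nlp.pipe_names)
with nlp.disable_pipes("sentencizer"):
    # Only the remaining (here: no) components are active inside the block
    names_during = list(nlp.pipe_names)
    doc = nlp("This is a sentence. This is another one.")
# Leaving the with block restores the disabled component
names_after = list(nlp.pipe_names)

print(names_before, names_during, names_after)
```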