Advanced NLP with spaCy
Ines Montani
spaCy core developer
Use nlp.pipe to process large volumes of text. It yields Doc objects and is much faster than calling nlp on each text.
BAD:
docs = [nlp(text) for text in LOTS_OF_TEXTS]
GOOD:
docs = list(nlp.pipe(LOTS_OF_TEXTS))
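The difference between the two forms can be sketched with a small runnable example. It assumes a blank English pipeline (`spacy.blank("en")`), so no trained model needs to be downloaded. The `texts` list is a hypothetical stand-in for `LOTS_OF_TEXTS`, and the `batch_size` value is an arbitrary choice for illustration.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only), no model download needed
nlp = spacy.blank("en")

# Hypothetical stand-in for LOTS_OF_TEXTS
texts = ["This is a text", "And another text", "One more text"]

# One call per text
docs_slow = [nlp(text) for text in texts]

# Streamed processing: nlp.pipe is a generator, so wrap it in list()
docs_fast = list(nlp.pipe(texts, batch_size=2))

# Both approaches produce the same Doc contents
same = [doc.text for doc in docs_slow] == [doc.text for doc in docs_fast]
print(same)
```

Both lists hold equivalent Doc objects. The speed benefit of nlp.pipe is expected to show up mainly on large inputs and on pipelines with trained components.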
Setting as_tuples=True on nlp.pipe lets you pass in (text, context) tuples. It then yields (doc, context) tuples, which is useful for associating metadata with each doc.
data = [
    ('This is a text', {'id': 1, 'page_number': 15}),
    ('And another text', {'id': 2, 'page_number': 16}),
]
for doc, context in nlp.pipe(data, as_tuples=True):
    print(doc.text, context['page_number'])
This is a text 15
And another text 16
from spacy.tokens import Doc

Doc.set_extension('id', default=None)
Doc.set_extension('page_number', default=None)

data = [
    ('This is a text', {'id': 1, 'page_number': 15}),
    ('And another text', {'id': 2, 'page_number': 16}),
]

for doc, context in nlp.pipe(data, as_tuples=True):
    doc._.id = context['id']
    doc._.page_number = context['page_number']
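To check that the metadata really ends up on the Doc objects, a self-contained sketch can read the custom attributes back after processing. It assumes a blank English pipeline. The `force=True` argument is added here only so the snippet can be re-run without an "extension already exists" error, and the `docs` and `pages` names are illustrative.

```python
import spacy
from spacy.tokens import Doc

# force=True lets the snippet be re-run if the extensions are already registered
Doc.set_extension("id", default=None, force=True)
Doc.set_extension("page_number", default=None, force=True)

# Assumption: a blank English pipeline is enough to illustrate the pattern
nlp = spacy.blank("en")

data = [
    ("This is a text", {"id": 1, "page_number": 15}),
    ("And another text", {"id": 2, "page_number": 16}),
]

docs = []
for doc, context in nlp.pipe(data, as_tuples=True):
    # Copy the context metadata onto the custom attributes
    doc._.id = context["id"]
    doc._.page_number = context["page_number"]
    docs.append(doc)

# Read the attributes back from the processed docs
pages = [(doc.text, doc._.id, doc._.page_number) for doc in docs]
print(pages)
```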

If you only need the tokenizer, use nlp.make_doc to turn a text into a Doc object instead of running the whole pipeline.
BAD:
doc = nlp("Hello world")
GOOD:
doc = nlp.make_doc("Hello world!")
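A minimal sketch of what nlp.make_doc returns, assuming a blank English pipeline so that it runs without a trained model. With a loaded model, calling nlp on the text would additionally run every pipeline component, while make_doc would still only tokenize.

```python
import spacy

# Assumption: a blank English pipeline; a loaded model would behave the same here
nlp = spacy.blank("en")

# make_doc only runs the tokenizer and returns a Doc
doc = nlp.make_doc("Hello world!")

tokens = [token.text for token in doc]
print(tokens)
```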
Use nlp.disable_pipes to temporarily disable one or more pipes.
# Disable tagger and parser
with nlp.disable_pipes('tagger', 'parser'):
    # Process the text and print the entities
    doc = nlp(text)
    print(doc.ents)
The disabled pipes are restored after the with block.
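The behaviour of the with block can be observed by inspecting nlp.pipe_names before, inside, and after it. The sketch below assumes spaCy v3, where components are added by string name. It uses a blank English pipeline with a rule-based "sentencizer" as a stand-in for the tagger and parser, so no trained model is needed. In v3, nlp.select_pipes(disable=[...]) is the newer equivalent of nlp.disable_pipes.

```python
import spacy

# Assumption: spaCy v3, where add_pipe accepts a component name as a string
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # stand-in for 'tagger' / 'parser'

before = list(nlp.pipe_names)

with nlp.disable_pipes("sentencizer"):
    # Inside the block the component is not part of the active pipeline
    inside = list(nlp.pipe_names)
    doc = nlp("Hello world. This is a text.")

# After the block the component is restored
after = list(nlp.pipe_names)

print(before, inside, after)
```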