Advanced NLP with spaCy
Ines Montani
spaCy core developer
Vocab: stores data shared across multiple documents
StringStore: available via nlp.vocab.strings
coffee_hash = nlp.vocab.strings['coffee']
coffee_string = nlp.vocab.strings[coffee_hash]
# Raises an error if we haven't seen the string before
string = nlp.vocab.strings[3197928453018144401]
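The two-way lookup above can be pictured with a small pure-Python model. This is only a sketch under stated assumptions, not spaCy's implementation: the class name ToyStringStore, its add method, and the use of a truncated BLAKE2 digest as the hash function are all hypothetical stand-ins chosen for illustration, so the hash values will not match spaCy's.

```python
import hashlib


class ToyStringStore:
    """Hypothetical two-way lookup table between strings and 64-bit hashes."""

    def __init__(self):
        # Maps hash -> string. The reverse direction needs no table,
        # because a string's hash can always be recomputed.
        self._by_hash = {}

    @staticmethod
    def hash_string(text):
        # Assumption: an 8-byte BLAKE2 digest stands in for the real hash function.
        digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
        return int.from_bytes(digest, "little")

    def add(self, text):
        # Remember the string so that its hash can be resolved later.
        key = self.hash_string(text)
        self._by_hash[key] = text
        return key

    def __getitem__(self, key):
        if isinstance(key, str):
            # String -> hash: computed on the fly.
            return self.hash_string(key)
        # Hash -> string: only works for strings that were added before.
        return self._by_hash[key]


store = ToyStringStore()
coffee_hash = store.add("coffee")
coffee_string = store[coffee_hash]

# "tea" was never added, so its hash cannot be turned back into a string.
tea_hash = store["tea"]
try:
    store[tea_hash]
    tea_found = True
except KeyError:
    tea_found = False

print(coffee_string, tea_found)
```

The sketch shows why an unseen hash raises an error: a hash is a one-way function of the string, so the string can only be recovered if the table has stored it.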
Look up the hash and the string in nlp.vocab.strings:
doc = nlp("I love coffee")
print('hash value:', nlp.vocab.strings['coffee'])
print('string value:', nlp.vocab.strings[3197928453018144401])
hash value: 3197928453018144401
string value: coffee
The doc also exposes the vocab and strings:
doc = nlp("I love coffee")
print('hash value:', doc.vocab.strings['coffee'])
hash value: 3197928453018144401
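A minimal, self-contained check that the doc and the nlp object share one vocab might look like the following. It assumes spaCy is installed and uses spacy.blank("en"), a blank English pipeline, as a stand-in for whichever nlp object the slides load, since a blank pipeline needs no trained model.

```python
import spacy

# Assumption: a blank English pipeline is enough here, because only
# the tokenizer and the shared vocab are involved.
nlp = spacy.blank("en")
doc = nlp("I love coffee")

# The doc holds a reference to the same Vocab object as the pipeline.
shares_vocab = doc.vocab is nlp.vocab

# Both routes to the string store therefore give the same hash,
# and the hash resolves back to the string once the text has been processed.
coffee_hash = doc.vocab.strings["coffee"]
coffee_string = nlp.vocab.strings[coffee_hash]

print(shares_vocab, coffee_string)
```

Because the vocab is shared rather than copied, a string seen while processing one doc can be resolved from any other doc created by the same pipeline.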
A Lexeme object is an entry in the vocabulary:
doc = nlp("I love coffee")
lexeme = nlp.vocab['coffee']
# Print the lexical attributes
print(lexeme.text, lexeme.orth, lexeme.is_alpha)
coffee 3197928453018144401 True
lexeme.text (the word text) and lexeme.orth (the hash)
lexeme.is_alpha (a lexical attribute)
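As a hedged illustration of how these attributes relate to the string store and to tokens, the sketch below again assumes spaCy is installed and uses spacy.blank("en") in place of the pipeline used in the slides. The second lexeme, for the string "123", is an added example that does not appear in the original.

```python
import spacy

# Assumption: blank English pipeline, no trained model required.
nlp = spacy.blank("en")
doc = nlp("I love coffee")

lexeme = nlp.vocab["coffee"]
token = doc[2]  # the token "coffee"

# lexeme.orth is the hash that the string store assigns to the word text.
orth_matches_store = lexeme.orth == nlp.vocab.strings["coffee"]

# The token in the doc carries the same hash as the vocabulary entry.
orth_matches_token = token.orth == lexeme.orth

# Lexical attributes depend only on the word itself, not on its context.
digits_are_alpha = nlp.vocab["123"].is_alpha

print(lexeme.text, lexeme.is_alpha, digits_are_alpha)
```

The comparison with the token shows that the doc does not store its own copy of the word: the token and the lexeme both refer to the same hash in the shared vocab.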