Data Structures: Vocab, Lexemes, and StringStore

Advanced NLP with spaCy

Ines Montani

spaCy core developer

Shared vocab and string store (1)

coffee_hash = nlp.vocab.strings['coffee']
coffee_string = nlp.vocab.strings[coffee_hash]

# Raises an error if the string has not been seen before
string = nlp.vocab.strings[3197928453018144401]

doc = nlp("I love coffee")
print('hash value:', nlp.vocab.strings['coffee'])

print('string value:', nlp.vocab.strings[3197928453018144401])

hash value: 3197928453018144401

string value: coffee
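The two-way lookup and the failure case can be checked together. The following is a minimal sketch, assuming spaCy is installed and using a blank English pipeline (`spacy.blank("en")`) so that no trained model is needed. The names `fresh_store`, `lookup_failed`, `added_hash`, and `recovered` are illustrative and not part of the slides.

```python
import spacy
from spacy.strings import StringStore

# Assumption: a blank English pipeline is enough here, because only the
# tokenizer and the shared vocab are involved.
nlp = spacy.blank("en")
doc = nlp("I love coffee")

# String -> hash, then hash -> string, through the shared string store
coffee_hash = nlp.vocab.strings["coffee"]
coffee_string = nlp.vocab.strings[coffee_hash]
print("hash value:", coffee_hash)
print("string value:", coffee_string)

# A separate, empty StringStore has never seen "coffee", so the hash
# cannot be resolved back to a string there.
fresh_store = StringStore()
try:
    fresh_store[coffee_hash]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print("lookup in empty store failed:", lookup_failed)

# Adding the string registers it; the same string yields the same hash.
added_hash = fresh_store.add("coffee")
recovered = fresh_store[added_hash]
print("after add:", recovered)
```

The hash alone carries no text. It can only be turned back into a string by a store that already contains that string, which is why the vocab has to be shared.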

doc = nlp("I love coffee")
print('hash value:', doc.vocab.strings['coffee'])

hash value: 3197928453018144401

doc = nlp("I love coffee")
lexeme = nlp.vocab['coffee']

# Print the lexical attributes
print(lexeme.text, lexeme.orth, lexeme.is_alpha)

coffee 3197928453018144401 True

A Lexeme contains the context-independent information about a word
- Word text: lexeme.text and lexeme.orth (the hash)
- Lexical attributes such as lexeme.is_alpha
- Not context-dependent part-of-speech tags, dependencies, or entity labels
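To make the split between context-independent and context-dependent information concrete, here is a small sketch, again assuming spaCy is installed and using a blank English pipeline. The second sentence and the variable names are made up for illustration. A blank pipeline assigns no part-of-speech tags, dependencies, or entities, so the token index `token.i` stands in here as the context-dependent attribute.

```python
import spacy

# Assumption: blank English pipeline (tokenizer only, no trained components)
nlp = spacy.blank("en")
doc_a = nlp("I love coffee")
doc_b = nlp("coffee is hot")  # hypothetical second document

# Both docs point at the same shared vocab object
print(doc_a.vocab is nlp.vocab, doc_b.vocab is nlp.vocab)

# One vocab entry for the word, regardless of where it occurs
lexeme = nlp.vocab["coffee"]
token_a = doc_a[2]  # "coffee" at the end of the first sentence
token_b = doc_b[0]  # "coffee" at the start of the second sentence

# Context-independent: identical for both occurrences
print(token_a.orth == lexeme.orth == token_b.orth)
print(lexeme.text, lexeme.is_alpha)

# Context-dependent: the position differs between the two documents
print(token_a.i, token_b.i)
```

Each token refers to the same vocab entry for its word type, while information such as position lives on the token in its document.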

Illustration of the words "I", "love" and "coffee" across the Doc, Vocab, and StringStore
