Machine Translation with Keras
Thushan Ganegedara
Data Scientist and Author
new jersey is sometimes quiet during autumn , and it is snowy in april .
the united states is usually chilly during july , and it is usually freezing ...
california is usually quiet during march , and it is usually hot in june .
new jersey est parfois calme pendant l' automne , et il est neigeux en avril .
les états-unis est généralement froid en juillet , et il gèle habituellement ...
california est généralement calme en mars , et il est généralement chaud en juin .
A mapping containing words and their corresponding indices
word2index = {"I":0, "like": 1, "cats": 2}
Converting words to IDs or indices
words = ["I", "like", "cats"]
word_ids = [word2index[w] for w in words]
print(word_ids)
[0, 1, 2]
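The mapping above is written out by hand. As a minimal sketch of how such a mapping could be built from raw sentences, the hypothetical helper `build_word2index` below (not part of the original slides) gives each distinct word the next free index, in order of first appearance. It assumes sentences are tokenized by splitting on whitespace.

```python
def build_word2index(sentences):
    """Hypothetical helper: map each distinct word to an integer index."""
    word2index = {}
    for sentence in sentences:
        # Assumption: words are separated by whitespace
        for word in sentence.split():
            if word not in word2index:
                # The next free index equals the current vocabulary size
                word2index[word] = len(word2index)
    return word2index

sentences = ["I like cats", "I like dogs"]
word2index = build_word2index(sentences)
print(word2index)

# Convert a sentence to IDs with the learned mapping
word_ids = [word2index[w] for w in "I like dogs".split()]
print(word_ids)
```

With the two example sentences, "dogs" is the fourth distinct word seen, so it receives index 3.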
One-hot encoding without specifying output vector length
from tensorflow.keras.utils import to_categorical
onehot_1 = to_categorical(word_ids)
print([(w,ohe.tolist()) for w,ohe in zip(words, onehot_1)])
[('I', [1.0, 0.0, 0.0]), ('like', [0.0, 1.0, 0.0]), ('cats', [0.0, 0.0, 1.0])]
One-hot encoding with a specified output vector length
onehot_2 = to_categorical(word_ids, num_classes=5)
print([(w,ohe.tolist()) for w,ohe in zip(words, onehot_2)])
[('I', [1.0, 0.0, 0.0, 0.0, 0.0]), ('like', [0.0, 1.0, 0.0, 0.0, 0.0]),
('cats', [0.0, 0.0, 1.0, 0.0, 0.0])]
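To make the encoding itself concrete, here is a pure-Python sketch of what one-hot encoding does, together with the reverse step of recovering words from vectors. The functions `one_hot` and `decode` and the `index2word` mapping are illustrative names, not part of Keras or the original slides. The sketch assumes that when no length is given, the vector length defaults to the largest ID plus one, which matches the output shown above.

```python
def one_hot(ids, num_classes=None):
    """Illustrative stand-in for one-hot encoding a list of integer IDs."""
    if num_classes is None:
        # Assumption: default length is the largest ID plus one
        num_classes = max(ids) + 1
    return [[1.0 if i == idx else 0.0 for i in range(num_classes)]
            for idx in ids]

def decode(vectors, index2word):
    """Recover each word from the position of the 1.0 in its vector."""
    return [index2word[vec.index(1.0)] for vec in vectors]

word2index = {"I": 0, "like": 1, "cats": 2}
# Reverse mapping from index back to word
index2word = {i: w for w, i in word2index.items()}

words = ["I", "like", "cats"]
word_ids = [word2index[w] for w in words]

onehot = one_hot(word_ids, num_classes=5)
print(onehot[2])
print(decode(onehot, index2word))
```

Fixing `num_classes` keeps every vector the same length even when a particular sentence does not contain the highest-indexed word in the vocabulary.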