Implementing the encoder

Machine Translation with Keras

Thushan Ganegedara

Data Scientist and Author

Understanding the data

Printing a few samples from the dataset

for en_sent, fr_sent in zip(en_text[:3], fr_text[:3]):
  print("English: ", en_sent)
  print("\tFrench: ", fr_sent)
English:  new jersey is sometimes quiet during autumn , and it is snowy in april .
    French:  new jersey est parfois calme pendant l' automne , et il est neigeux en avril .

English:  the united states is usually chilly during july , and it is usually freezing in november .
    French:  les états-unis est généralement froid en juillet , et il gèle habituellement en novembre .

English:  california is usually quiet during march , and it is usually hot in june .
    French:  california est généralement calme en mars , et il est généralement chaud en juin .

Tokenizing the sentences

Tokenization

  • The process of breaking a sentence or phrase into individual tokens (e.g. words)

Tokenizing the words in a sentence

first_sent = en_text[0]
print("First sentence: ", first_sent)
first_words = first_sent.split(" ")
print("\tWords: ", first_words)
First sentence:  new jersey is sometimes quiet during autumn , and it is snowy in april .
    Words:  ['new', 'jersey', 'is', 'sometimes', 'quiet', 'during', 'autumn', ',', 
             'and', 'it', 'is', 'snowy', 'in', 'april', '.']
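
One caveat about splitting on a single space, shown as a small sketch: `str.split(" ")` keeps an empty string wherever a sentence contains consecutive spaces, whereas `str.split()` with no argument splits on any run of whitespace. The sentence below is made up for illustration and is not taken from the dataset.

```python
# Hypothetical sentence with an accidental double space (not from the dataset)
sent = "new jersey  is quiet ."

# Splitting on a single space keeps an empty token for the extra space
tokens_single = sent.split(" ")

# Splitting with no argument collapses any run of whitespace
tokens_any = sent.split()

print(tokens_single)
print(tokens_any)
```

The sentences printed above appear to be separated by single spaces, in which case both calls give the same tokens.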

Computing sentence length

Computing the mean sentence length and vocabulary size (English)

import numpy as np
sent_lengths = [len(en_sent.split(" ")) for en_sent in en_text]
mean_length = np.mean(sent_lengths)
print('(English) Mean sentence length: ', mean_length)
(English) Mean sentence length:  13.20662

Computing the vocabulary size

all_words = []
for sent in en_text:
    all_words.extend(sent.split(" "))
vocab_size = len(set(all_words))
print("(English) Vocabulary size: ", vocab_size)
  • A set object contains only unique items, with no duplicates
(English) Vocabulary size:  228
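
As a minimal sketch of why the set gives the vocabulary size, the same steps can be run on a tiny corpus. The two sentences in `toy_text` are made up for illustration; the real `en_text` is much larger.

```python
# Toy corpus (made up for illustration; not the course dataset)
toy_text = [
    "new jersey is quiet .",
    "california is hot .",
]

# Collect every token, duplicates included
all_words = []
for sent in toy_text:
    all_words.extend(sent.split(" "))

# A set drops duplicates, so its length is the vocabulary size
vocab = set(all_words)

print(len(all_words))
print(len(vocab))
```

Here `is` and `.` each occur twice, so 9 tokens reduce to 7 unique words. Note that punctuation marks count as vocabulary entries because they are separate tokens.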

Encoder


Implementing the encoder with Keras

  • Input layer
    en_inputs = Input(shape=(en_len, en_vocab))
    
  • GRU layer
    en_gru = GRU(hsize, return_state=True)
    en_out, en_state = en_gru(en_inputs)
    
  • Keras model
    encoder = Model(inputs=en_inputs, outputs=en_state)
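
The three steps can be put together into one runnable sketch. The imports from `tensorflow.keras` and the sizes `en_len = 15`, `en_vocab = 150` and `hsize = 48` are assumptions (the sizes match the shapes in this section's model summary), and the random one-hot batch is a stand-in for real encoded sentences.

```python
import numpy as np
from tensorflow.keras.layers import Input, GRU
from tensorflow.keras.models import Model

# Assumed sizes, chosen to match the shapes in the model summary
en_len = 15     # words per sentence
en_vocab = 150  # length of each one-hot word vector
hsize = 48      # GRU hidden units

# Encoder: input layer -> GRU layer -> model that outputs the final state
en_inputs = Input(shape=(en_len, en_vocab))
en_gru = GRU(hsize, return_state=True)
en_out, en_state = en_gru(en_inputs)
encoder = Model(inputs=en_inputs, outputs=en_state)

# Hypothetical batch of 2 sentences with random word IDs, one-hot encoded
word_ids = np.random.randint(0, en_vocab, size=(2, en_len))
onehot_batch = np.eye(en_vocab, dtype="float32")[word_ids]

# The encoder maps each sentence to a single state vector
state = encoder.predict(onehot_batch, verbose=0)
print(onehot_batch.shape)
print(state.shape)
```

With `return_state=True` and no `return_sequences`, the GRU returns its last output and its final state, which are the same vector; the model keeps only the state because that is what summarizes the input sentence.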
    

Understanding the Keras model summary

encoder.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 15, 150)           0         
_________________________________________________________________
gru (GRU)                    [(None, 48), (None, 48)]  28656     
=================================================================
Total params: 28,656
Trainable params: 28,656
Non-trainable params: 0
_________________________________________________________________
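
The parameter count of 28,656 can be checked by hand. The sketch below assumes the classic GRU formulation with one bias vector per weight set, which is what the number in the summary is consistent with.

```python
# Sizes read off the model summary
en_vocab = 150  # input features per time step
hsize = 48      # GRU hidden units

# Each of the 3 GRU weight sets (update gate, reset gate, candidate state) has:
input_weights = en_vocab * hsize   # input-to-hidden kernel
recurrent_weights = hsize * hsize  # hidden-to-hidden kernel
biases = hsize                     # one bias per unit

gru_params = 3 * (input_weights + recurrent_weights + biases)
print(gru_params)
```

Recent Keras versions default to `reset_after=True`, which adds a second bias vector per weight set, so the same layer may report 28,800 parameters there instead.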

Let's practice!
