Building blocks to train LLMs

Large Language Models (LLMs) Concepts

Vidhi Chugh

AI strategist and ethicist

Where are we?

(Image: course learning progress, currently at the pre-training stage)


Generative pre-training

 

  • Trained using generative pre-training

    • Input: sequences of text tokens
    • Objective: predict tokens within the dataset

 

  • Types:
    • Next word prediction
    • Masked language modeling

Next word prediction

  • Supervised learning technique
    • Model trained on input-output pairs

 

  • Predicts next word and generates coherent text
  • Captures the dependencies between words

 

  • Training Data
    • Pairs of input and output examples

Auto-suggestion by a search engine


Training data for next word prediction

Input → Output

The quick brown → fox

The quick brown fox → jumps

The quick brown fox jumps → over

The quick brown fox jumps over → the

The quick brown fox jumps over the → lazy

The quick brown fox jumps over the lazy → dog

(Full sentence: "The quick brown fox jumps over the lazy dog.")
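The pairs above can be generated mechanically from any sentence. A minimal sketch, assuming simple whitespace tokenization (the function name is illustrative):

```python
def next_word_pairs(sentence):
    """Build (input, target) training pairs for next word prediction."""
    tokens = sentence.split()
    pairs = []
    for i in range(1, len(tokens)):
        # Input: all tokens before position i; target: the token at position i
        pairs.append((" ".join(tokens[:i]), tokens[i]))
    return pairs

pairs = next_word_pairs("The quick brown fox jumps over the lazy dog.")
for context, target in pairs:
    print(f"{context!r} -> {target!r}")
```

A real LLM works the same way at scale: every position in every training sequence yields one prediction target.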


Which word relates more to pizza?

 

  • More examples = better prediction

 

  • For example:
    • I love to eat pizza with _ _ _ _ _ _

 

  • "Cheese" is more strongly associated with pizza than any other word

Probabilities of different words' association with the word "pizza"
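These association probabilities can be estimated from how often each word follows a given context in the training data. A toy sketch, using an invented mini-corpus and whitespace tokenization:

```python
from collections import Counter

# Toy corpus (invented for illustration only)
corpus = [
    "I love to eat pizza with cheese",
    "I eat pizza with cheese and olives",
    "she had pizza with mushrooms",
    "we ordered pizza with cheese",
]

# Count which word follows the context "pizza with" in the corpus
context = ("pizza", "with")
counts = Counter()
for sentence in corpus:
    tokens = sentence.split()
    for i in range(len(tokens) - 2):
        if (tokens[i], tokens[i + 1]) == context:
            counts[tokens[i + 2]] += 1

# Normalize counts into probabilities
total = sum(counts.values())
probs = {word: n / total for word, n in counts.items()}
print(probs)  # "cheese" gets the highest probability
```

Real models learn these probabilities with neural networks rather than raw counts, but the intuition is the same: more examples sharpen the estimate.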


Masked language modeling

  • Hides a selected word in the input

  • The trained model predicts the masked word

 

  • Original Text: "The quick brown fox jumps over the lazy dog."

  • Masked Text: "The quick [MASK] fox jumps over the lazy dog."

 

  • Objective: predict the missing word

  • Based on patterns learned from the training data
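Creating a masked training example is straightforward. A minimal sketch, assuming whitespace tokenization and a single randomly chosen mask position (function and parameter names are illustrative):

```python
import random

def mask_token(sentence, mask="[MASK]", seed=0):
    """Replace one randomly chosen word with a mask token.

    Returns the masked sentence and the hidden word (the training label).
    """
    rng = random.Random(seed)
    tokens = sentence.split()
    i = rng.randrange(len(tokens))
    label = tokens[i]
    tokens[i] = mask
    return " ".join(tokens), label

masked, label = mask_token("The quick brown fox jumps over the lazy dog.")
print(masked)
print("label:", label)
```

During training, models such as BERT mask a fraction of tokens this way and learn to recover them from the surrounding context.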


Let's practice!

