Tuning Llama 3 parameters

Working with Llama 3

Imtihan Ahmed

Machine Learning Engineer

What are parameters for?

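from llama_cpp import Llama
# Load a local GGUF model file (replace the path with your own)
llm = Llama(model_path="path/to/model.gguf")
# Calling the model with a prompt returns a completion dictionary
output = llm("What are some ways to improve customer retention?")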

Control the quality, randomness, and length
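The `llm(...)` call returns a dictionary, not plain text. The sketch below assumes llama-cpp-python's OpenAI-style completion layout, where the generated text sits under `output["choices"][0]["text"]`. The dictionary is hand-built (its values are made up, not real model output) so the example runs without loading a model, and `completion_text` is a hypothetical helper name.

```python
# Hand-built stand-in for the dictionary that llm(prompt) returns. The field
# layout assumes llama-cpp-python's OpenAI-style completion format; the values
# are illustrative, not real model output.
sample_output = {
    "object": "text_completion",
    "choices": [
        {
            "text": " Offer loyalty rewards and personalized follow-ups.",
            "index": 0,
            "finish_reason": "stop",
        }
    ],
}


def completion_text(output: dict) -> str:
    """Return the first choice's generated text without surrounding whitespace."""
    return output["choices"][0]["text"].strip()


print(completion_text(sample_output))
```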

What are parameters for?

  • Example: generating product descriptions

Professional site

→ Should be factual and concise

Lifestyle-oriented site

→ Should be engaging and creative

Llama 3 decoding parameters

  • Adjust Llama's behavior
  • Use decoding parameters to match different tones
  • Transform raw output into readable text

Adjusting Llama behavior with parameters

Llama 3 decoding parameters

  • Temperature: controls randomness
  • Top-K: limits token selection to the most probable choices
  • Top-P: adjusts token selection based on cumulative probability
  • Max tokens: limits response length

Temperature

  • Values usually between 0 and 1

  • Low temperature (e.g., close to 0):

    • More predictable response
      A smartwatch with a heart rate monitor, GPS, and a long-lasting battery 
      for all-day tracking.
      
  • High temperature (e.g., close to 1):
    • More creative response
      Your personal fitness coach on your wrist - track every heartbeat, every step, 
      and every adventure without limits.
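Under the hood, temperature divides the model's raw next-token scores (logits) before a softmax turns them into probabilities. The following is a minimal pure-Python sketch with made-up logits for three candidate tokens. `softmax_with_temperature` is an illustrative helper, not part of llama_cpp, and this sketch simply rejects a temperature of 0.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Divide logits by the temperature, then normalize them into probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 1.0)
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At 0.2 the top token takes almost all of the probability mass. At 1.0 the alternatives keep a real chance of being picked, which is where the more creative wording comes from.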
      
Top-k

  • Limits Llama's choice of next token to the k most likely candidates

  • Low k value (e.g., 1):

    • More predictable response
      Track fitness, stream music, and receive notifications with our sleek smartwatch.
      
  • High k value (e.g., 50):

    • More diverse response
      Experience the future with our cutting-edge smartwatch, featuring fitness tracking, music streaming, customizable notifications, personalized insights, and seamless smartphone integration always.
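Top-k filtering can be sketched as follows: sort the candidate tokens by probability, keep the first k, and renormalize so that the kept probabilities sum to 1. The probabilities and the helper name `top_k_filter` are hypothetical. llama_cpp performs this step internally.

```python
def top_k_filter(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep the k most probable tokens and renormalize their probabilities."""
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical next-token probabilities after "Track fitness with our ...".
probs = {"sleek": 0.5, "smart": 0.3, "stylish": 0.15, "rugged": 0.05}
print(top_k_filter(probs, 1))  # only "sleek" remains
print(top_k_filter(probs, 3))
```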
      
Top-p

  • Samples from the smallest set of most likely tokens whose combined probability reaches p

  • High top-p value (e.g., close to 1):

    • More varied responses
      Stay connected with our sleek smartwatch, featuring fitness tracking,
      music, and customizable notifications, perfect for fitness 
      enthusiasts and busy professionals.
      
  • Low top-p value (e.g., close to 0):

    • Less variation
      Smartwatch with fitness tracking and music control, perfect for workouts.
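Top-p (nucleus) filtering walks down the same sorted list as top-k, but it stops once the running probability total reaches p. As a result, the number of kept tokens adapts to how confident the model is. This sketch uses made-up probabilities, and `top_p_filter` is a hypothetical helper.

```python
def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the fewest most-probable tokens whose cumulative probability reaches p."""
    kept = []
    cumulative = 0.0
    for token, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Hypothetical next-token probabilities.
probs = {"sleek": 0.5, "smart": 0.3, "stylish": 0.15, "rugged": 0.05}
print(top_p_filter(probs, 0.1))
print(top_p_filter(probs, 0.9))
```

With p = 0.1, only the single most likely token survives. With p = 0.9, three candidates remain.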
      
Max tokens

  • Limits response length
  • Measured in tokens, the units (whole words or pieces of words) the model generates

  • Low max_tokens value:

    Stay connected with our sleek smartwatch, featuring fitness tracking 
    and music control.
    
  • High max_tokens value:
    Stay connected with our sleek smartwatch, featuring fitness tracking, 
    music control, customizable notifications, and seamless smartphone 
    integration. Monitor your health, track your progress, and receive 
    alerts on your wrist. Perfect for fitness enthusiasts.
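A generation loop stops when the model emits an end-of-sequence token or when the token budget runs out, whichever happens first. This toy sketch makes two simplifying assumptions. First, it uses a scripted stand-in for the model. Second, it splits text on whitespace, whereas a real tokenizer often breaks words into smaller pieces. The `"stop"`/`"length"` labels are modeled on the `finish_reason` field of OpenAI-style completion responses. `generate` and `scripted_next_token` are hypothetical names.

```python
def generate(next_token, max_tokens: int, stop_token: str = "<eos>") -> tuple[list[str], str]:
    """Call next_token until it emits stop_token or max_tokens tokens exist."""
    tokens: list[str] = []
    while len(tokens) < max_tokens:
        token = next_token(tokens)
        if token == stop_token:
            return tokens, "stop"  # the model finished on its own
        tokens.append(token)
    return tokens, "length"  # cut off by the budget, possibly mid-sentence


# Hypothetical "model" that replays a fixed sentence one word-level token at a time.
SCRIPT = "Stay connected with our sleek smartwatch , featuring fitness tracking .".split() + ["<eos>"]


def scripted_next_token(tokens: list[str]) -> str:
    return SCRIPT[len(tokens)]


short, reason_short = generate(scripted_next_token, max_tokens=5)
full, reason_full = generate(scripted_next_token, max_tokens=50)
print(" ".join(short), "|", reason_short)
print(" ".join(full), "|", reason_full)
```

With `max_tokens=5`, the sentence is cut off after "sleek". A budget of 50 lets the scripted model finish the sentence on its own.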
    
Combining different parameters

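from llama_cpp import Llama

llm = Llama(model_path="path/to/model.gguf")

# Low temperature, top_k, and top_p with a small token budget: short, focused output
output_concise = llm(
    "Describe an electric car.",
    temperature=0.2,
    top_k=1,
    top_p=0.4,
    max_tokens=20,
)
print(output_concise["choices"][0]["text"])
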
A fast, eco-friendly electric car with a long range and cutting-edge technology.

Combining different parameters

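# Higher temperature, top_k, and top_p with a larger budget: longer, more varied output
output_creative = llm(
    "Describe an electric car.",
    temperature=0.8,
    top_k=10,
    top_p=0.9,
    max_tokens=100,
)
print(output_creative["choices"][0]["text"])
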
Glide into the future with an electric car that blends speed, luxury,
and sustainability. Silent yet powerful, it redefines the road ...
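The parameters act together on every generated token. The sketch below chains them in one assumed order (top-k, then top-p, then temperature) over made-up logits. llama.cpp implements its own sampler chain in C++, and its exact order and details may differ. `softmax` and `sample_next_token` are illustrative names.

```python
import math
import random


def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())
    exps = {tok: math.exp(s - peak) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_next_token(logits: dict[str, float], temperature: float, top_k: int,
                      top_p: float, rng: random.Random) -> str:
    """Apply top-k, then top-p, then temperature, and sample one token."""
    # 1. Top-k: keep the k highest-scoring tokens, best first.
    ranked = sorted(logits.items(), key=lambda item: item[1], reverse=True)[:top_k]
    # 2. Top-p: keep the shortest prefix whose cumulative probability reaches top_p.
    probs = softmax(dict(ranked))
    kept: dict[str, float] = {}
    cumulative = 0.0
    for tok, score in ranked:
        kept[tok] = score
        cumulative += probs[tok]
        if cumulative >= top_p:
            break
    # 3. Temperature: rescale the surviving logits, then draw one token.
    final = softmax({tok: score / temperature for tok, score in kept.items()})
    return rng.choices(list(final), weights=list(final.values()), k=1)[0]


# Hypothetical logits for the word after "A fast, eco-friendly electric ...".
logits = {"car": 3.0, "vehicle": 2.5, "ride": 1.5, "machine": 0.5}
rng = random.Random(0)
concise = [sample_next_token(logits, 0.2, 1, 0.4, rng) for _ in range(5)]
creative = [sample_next_token(logits, 0.8, 10, 0.9, rng) for _ in range(5)]
print(concise)  # top_k=1 leaves only "car", so every draw is "car"
print(creative)
```

Because `top_k=1` leaves a single candidate, the concise settings behave like greedy decoding and always pick the most likely token. This is why that configuration produces the more predictable description.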

Let's practice!
