Tuning Llama 3 parameters

Working with Llama 3

Imtihan Ahmed

Machine Learning Engineer

What are parameters for?

from llama_cpp import Llama
llm = Llama(model_path="path/to/model.gguf")
output = llm("What are some ways to improve customer retention?")

Control the quality, randomness, and length
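
The call above does not return a plain string. As a minimal sketch, assuming the completion-style dictionary that llama-cpp-python returns (generated text under `choices[0]["text"]`), the text can be pulled out as shown here. The `completion_text` helper and `sample_output` are hypothetical, hand-written so the sketch runs without a model file.

```python
def completion_text(output):
    """Return the generated text from a completion-style response dict."""
    # Assumed shape: {"choices": [{"text": "..."}], ...}
    return output["choices"][0]["text"].strip()


# Hand-written stand-in for a real response (not actual model output).
sample_output = {
    "choices": [
        {
            "text": "  Offer loyalty rewards and follow up after each purchase.\n",
            "finish_reason": "stop",
        }
    ]
}

print(completion_text(sample_output))
```

With a real model, the same helper would be applied to the `output` variable from the call above.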

Example: generating product descriptions

→ Should be factual and concise

Lifestyle-oriented site

→ Should be engaging and creative

Llama 3 decoding parameters

Adjust Llama's behavior

Use decoding parameters to match different tones

Transform raw output into readable text

Adjusting Llama's behavior with parameters

Temperature: controls randomness
Top-K: limits token selection to the most probable choices
Top-P: adjusts token selection based on cumulative probability
Max tokens: limits response length

Temperature

Values usually between 0 and 1

Low temperature (e.g., close to 0):

More predictable response

A smartwatch with a heart rate monitor, GPS, and a long-lasting battery 
for all-day tracking.

High temperature (e.g., close to 1):

More creative response

Your personal fitness coach on your wrist - track every heartbeat, every step, 
and every adventure without limits.
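
To see why a low temperature gives more predictable text, here is a minimal sketch of temperature scaling, assuming the common approach of dividing the model's raw scores by the temperature before converting them to probabilities. The function name and the scores are hypothetical, and the temperature must be above 0 in this sketch.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by a temperature > 0."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]

low = softmax_with_temperature(logits, 0.2)   # sharper distribution
high = softmax_with_temperature(logits, 1.0)  # flatter distribution

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At the lower temperature almost all of the probability lands on the top-scoring token, so sampling rarely picks anything else.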

Top-k

Limits how many of the most likely words Llama can choose from

Low k value (e.g., 1):

More predictable response

Track fitness, stream music, and receive notifications with our sleek smartwatch.

High k value (e.g., 50):

More diverse response

Experience the future with our cutting-edge smartwatch, featuring fitness tracking, music streaming, customizable notifications, personalized insights, and seamless smartphone integration always.
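
The effect of k can be sketched in a few lines, assuming top-k keeps only the k most probable candidates and renormalizes them before sampling. The `top_k_filter` function and the probabilities are hypothetical and only illustrate the idea.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical next-token probabilities.
probs = {"sleek": 0.5, "smart": 0.3, "bold": 0.15, "cosmic": 0.05}

greedy = top_k_filter(probs, 1)  # only the single most likely token remains
wider = top_k_filter(probs, 3)   # three candidates remain

print(greedy)
print(wider)
```

With k set to 1 there is nothing left to choose between, which is why the response becomes predictable.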

Top-p

Limits the choice of output words to the most likely ones whose combined probability reaches p

High top-p value (e.g., close to 1):

More varied responses

Stay connected with our sleek smartwatch, featuring fitness tracking,
music, and customizable notifications, perfect for fitness 
enthusiasts and busy professionals.

Low top-p value (e.g., close to 0):

Less variation

Smartwatch with fitness tracking and music control, perfect for workouts.
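
A minimal sketch of top-p selection, assuming the usual definition: rank candidates by probability and keep the smallest set whose cumulative probability reaches p. The `top_p_filter` function and the probabilities are hypothetical.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Hypothetical next-token probabilities.
probs = {"sleek": 0.5, "smart": 0.3, "bold": 0.15, "cosmic": 0.05}

narrow = top_p_filter(probs, 0.4)  # low top-p: few candidates
broad = top_p_filter(probs, 0.9)   # high top-p: more candidates

print(narrow)
print(broad)
```

Unlike top-k, the number of candidates kept here changes with how confident the model is about the next word.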

Max tokens

Used to limit response length
Measured in tokens, the units (whole words or word pieces) that make up the response

Low max_tokens value:

Stay connected with our sleek smartwatch, featuring fitness tracking 
and music control.

High max_tokens value:

Stay connected with our sleek smartwatch, featuring fitness tracking, 
music control, customizable notifications, and seamless smartphone 
integration. Monitor your health, track your progress, and receive 
alerts on your wrist. Perfect for fitness enthusiasts.
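
A low limit can cut a response off mid-sentence. As a sketch, assuming the completion-style response dictionary reports `finish_reason` as "length" when generation stopped at the limit and "stop" when it ended naturally, a truncated response can be detected like this. The `hit_token_limit` helper and both sample dictionaries are hypothetical stand-ins, not real model output.

```python
def hit_token_limit(output):
    """Return True if the response stopped because it reached max_tokens."""
    # Assumed shape: {"choices": [{"text": "...", "finish_reason": "..."}]}
    return output["choices"][0].get("finish_reason") == "length"


# Hand-written stand-ins for responses with a low and a high limit.
cut_off = {"choices": [{"text": "Stay connected with our sleek", "finish_reason": "length"}]}
complete = {"choices": [{"text": "Stay connected with our sleek smartwatch.", "finish_reason": "stop"}]}

print(hit_token_limit(cut_off))
print(hit_token_limit(complete))
```

If the check reports a cut-off response, raising `max_tokens` or asking for a shorter answer in the prompt are two options.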

Combining different parameters

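from llama_cpp import Llama

llm = Llama(model_path="path/to/model.gguf")

# Low randomness and a short length cap for a factual, concise description
output_concise = llm(
    "Describe an electric car.",
    temperature=0.2,
    top_k=1,
    top_p=0.4,
    max_tokens=20
)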

A fast, eco-friendly electric car with a long range and cutting-edge technology.

# Higher randomness and a longer length cap for an engaging, creative description
output_creative = llm(
    "Describe an electric car.",
    temperature=0.8,
    top_k=10,
    top_p=0.9,
    max_tokens=100
)

Glide into the future with an electric car that blends speed, luxury,
and sustainability. Silent yet powerful, it redefines the road ...
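
One way to reuse the two combinations above is to store them as named presets. This is a sketch rather than part of the library: `PRESETS`, `describe`, and `fake_llm` are hypothetical names, and `fake_llm` is a stand-in that only echoes what it was called with so the example runs without a model file.

```python
# Parameter values taken from the concise and creative examples above.
PRESETS = {
    "concise": {"temperature": 0.2, "top_k": 1, "top_p": 0.4, "max_tokens": 20},
    "creative": {"temperature": 0.8, "top_k": 10, "top_p": 0.9, "max_tokens": 100},
}


def describe(llm, prompt, style):
    """Call the model with the decoding parameters stored for the given style."""
    return llm(prompt, **PRESETS[style])


def fake_llm(prompt, **params):
    # Stand-in for a loaded model: returns its inputs instead of generated text.
    return {"prompt": prompt, "params": params}


result = describe(fake_llm, "Describe an electric car.", "concise")
print(result["params"])
```

With a real model, the `llm` object created earlier would be passed in place of `fake_llm`.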

Let's practice!
