Evaluating responses

Understanding Prompt Engineering

Alex Banks

Founder & Educator

Introduction to evaluating responses

Every tool has its limitations. ChatGPT has a knowledge cutoff. Solution: smart prompting.

1 Images source: DALLE-3

The four cornerstones of evaluation

LARF

  • Logical consistency
  • Accuracy
  • Relevance
  • Factual correctness
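As a rough illustration, the four LARF criteria can be recorded as a simple checklist for each response you review. This is a minimal sketch, not part of the course material: the `LarfReview` class and its method names are hypothetical, and the pass/fail verdicts still come from a human reader.

```python
from dataclasses import dataclass, fields


@dataclass
class LarfReview:
    """One reviewer's verdict on a single response (hypothetical helper)."""

    logical_consistency: bool
    accuracy: bool
    relevance: bool
    factual_correctness: bool

    def failed_checks(self) -> list[str]:
        # Names of the criteria that did not pass.
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def passed(self) -> bool:
        # A response passes only when all four criteria hold.
        return not self.failed_checks()


# Example verdicts entered by a reviewer (illustrative values).
review = LarfReview(
    logical_consistency=True,
    accuracy=False,
    relevance=True,
    factual_correctness=True,
)
print(review.passed())         # False
print(review.failed_checks())  # ['accuracy']
```

Keeping the four verdicts separate makes it clear which cornerstone a response failed on, rather than collapsing everything into a single score.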

Happy, smiling older person

Logical consistency - the coherence check

Person working on solar panel and the prompt "what are the benefits and drawbacks of solar energy"

List of benefits

List of drawbacks
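One small part of the coherence check can be sketched in code: when a prompt asks for both benefits and drawbacks, confirm that the response addresses both sides. The function name, the keyword patterns, and the sample response below are assumptions made for illustration. A keyword scan only catches a missing side; spotting contradictions still requires reading the response.

```python
import re

# Word-boundary patterns so that "disadvantage" is not mistaken for "advantage".
BENEFIT_PATTERN = re.compile(r"\b(benefits?|advantages?)\b", re.IGNORECASE)
DRAWBACK_PATTERN = re.compile(r"\b(drawbacks?|disadvantages?)\b", re.IGNORECASE)


def covers_both_sides(response: str) -> dict[str, bool]:
    """Report whether the response mentions benefits and drawbacks."""
    return {
        "benefits": bool(BENEFIT_PATTERN.search(response)),
        "drawbacks": bool(DRAWBACK_PATTERN.search(response)),
    }


# Illustrative response text, not taken from the slides.
sample = (
    "Benefits: renewable, low running costs.\n"
    "Drawbacks: high upfront cost, weather dependent."
)
print(covers_both_sides(sample))  # {'benefits': True, 'drawbacks': True}
```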

Accuracy and the hallucination tendency

Hallucination -> the model confidently states an incorrect answer.

Incorrect answer to the question "Who was the first person to walk on the moon?"
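For questions with a well-established answer, one way to catch a hallucination is to compare the response against a trusted reference. The sketch below assumes you supply the accepted answers yourself. `matches_reference` is a hypothetical helper, the two sample responses are made up for illustration, and substring matching is deliberately crude.

```python
def matches_reference(response: str, accepted_answers: list[str]) -> bool:
    """Return True if the response mentions any accepted answer (case-insensitive)."""
    text = response.lower()
    return any(answer.lower() in text for answer in accepted_answers)


# Reference answer taken from a trusted source, not from the model.
accepted = ["Neil Armstrong"]

good = "The first person to walk on the moon was Neil Armstrong."
bad = "The first person to walk on the moon was Buzz Aldrin."  # illustrative error

print(matches_reference(good, accepted))  # True
print(matches_reference(bad, accepted))   # False
```

A `False` result here only means the response needs a closer look. It cannot tell you how confident the model sounded, which is what makes hallucinations easy to miss.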

Relevance - meeting the context

Relevance -> the response aligns with the context and intent of the prompt.

Prompt: "What are the top tourist attractions in Paris?"

Response to "What are the top tourist attractions in Paris?" with the incorrect answer highlighted

Factual correctness beyond the cutoff date

Are universal basic income trials successful in reducing poverty? Provide your answer by only referencing and citing reliable sources.

ChatGPT cutoff dates
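The cutoff idea can be expressed as a simple date comparison: anything that happened after the model's cutoff should be verified against an external source. In this sketch the cutoff value is a placeholder assumption, not a date taken from the slide, and `needs_external_verification` is a hypothetical helper. Look up the actual cutoff for the model you are using.

```python
from datetime import date

# Placeholder cutoff for illustration only; replace with your model's real cutoff.
ASSUMED_CUTOFF = date(2023, 1, 1)


def needs_external_verification(event_date: date, cutoff: date = ASSUMED_CUTOFF) -> bool:
    """True when the topic postdates the model's knowledge cutoff."""
    return event_date > cutoff


print(needs_external_verification(date(2024, 6, 1)))   # True
print(needs_external_verification(date(2020, 3, 15)))  # False
```

Responses about topics before the cutoff can still be wrong, so this check complements the accuracy check and does not replace it.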

1 Images source: ChatGPT

Let's practice!
