Active learning

Reinforcement Learning from Human Feedback (RLHF)

Mina Parham

AI Engineer

Human in the loop systems

A diagram of an LLM with output evaluated by a human reviewer.

Human in the loop systems

A diagram of an LLM producing a large volume of output, all of it evaluated by a human reviewer.

Human in the loop systems

A diagram of an LLM whose output is randomly sampled for evaluation by a human reviewer.

Human in the loop systems

A diagram of an LLM whose output is actively selected for evaluation by a human reviewer.
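The review strategies in these diagrams differ only in which model outputs reach the human reviewer. A minimal sketch, assuming each output comes with a model confidence score between 0 and 1; the function names, scores, and review budget `k` are hypothetical:

```python
import numpy as np

def review_all(confidences):
    """Exhaustive review: every output goes to the human."""
    return np.arange(len(confidences))

def review_random(confidences, k, seed=0):
    """Random review: k outputs chosen uniformly, ignoring confidence."""
    rng = np.random.default_rng(seed)
    return rng.choice(len(confidences), size=k, replace=False)

def review_active(confidences, k):
    """Active review: the k outputs the model is least confident about."""
    return np.argsort(confidences)[:k]

# Hypothetical confidence scores for six model outputs
confidences = np.array([0.95, 0.40, 0.88, 0.55, 0.99, 0.62])
print(review_all(confidences))        # [0 1 2 3 4 5]
print(review_active(confidences, 2))  # [1 3]
```

With the same budget, random review spends human effort on outputs the model already handles well, while active review targets the two least confident ones.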

Active learning in RLHF

A diagram of the RLHF process, without the reward model.

Active learning in RLHF

A diagram of the full RLHF process, including the reward model.
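One way to bring active learning into the preference-labeling step of RLHF, sketched here as an assumption rather than the course's method: score both responses in each pair with the reward model, convert the score gap into a Bradley-Terry preference probability, and send human labelers the pairs whose probability is closest to 0.5. The reward values below are made up for illustration:

```python
import numpy as np

def preference_prob(reward_a, reward_b):
    """Bradley-Terry probability that response A is preferred over response B."""
    return 1.0 / (1.0 + np.exp(-(reward_a - reward_b)))

def select_pairs_for_humans(reward_a, reward_b, k):
    """Indices of the k pairs where the reward model is least sure which response wins."""
    p = preference_prob(np.asarray(reward_a), np.asarray(reward_b))
    # Distance from 0.5: small means the reward model cannot separate A from B
    return np.argsort(np.abs(p - 0.5))[:k]

# Hypothetical reward-model scores for four (A, B) response pairs
reward_a = [2.0, 0.1, -1.0, 0.5]
reward_b = [-1.0, 0.0, 1.5, 0.2]
print(select_pairs_for_humans(reward_a, reward_b, 2))  # [1 3]
```

Pairs with a large score gap (0 and 2) are left to the reward model; the human budget goes to the ambiguous ones.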

Active learning

An icon of documents representing input data.

Active learning

An icon of documents representing data going into a model.

Active learning

An icon of documents representing data going into a model, and an arrow with the label "model confident" going towards the output.

Active learning

An icon of documents representing data going into a model, an arrow with the label "model confident" going towards the output, and a parallel arrow going towards a human with labels "model unsure" and "human reviews and corrects".

Active learning

An icon of documents representing data going into a model, an arrow with the label "model confident" going towards the output, a parallel arrow going towards a human with labels "model unsure" and "human reviews and corrects", and a prediction output.
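The routing in this diagram can be sketched as a confidence threshold: predictions above it go straight to the output, the rest go to a human for review and correction. This is a minimal illustration; the `route` function and the 0.8 threshold are hypothetical choices, not values from the course:

```python
import numpy as np

def route(probs, threshold=0.8):
    """Split predictions into auto-accepted ones and ones a human should review.

    probs: array of shape (n_samples, n_classes) with predicted class probabilities.
    threshold: hypothetical cut-off on the top-class probability.
    """
    probs = np.asarray(probs)
    confidence = probs.max(axis=1)      # top-class probability per sample
    predictions = probs.argmax(axis=1)  # model's predicted class per sample
    confident = confidence >= threshold
    # "Model confident": keep the prediction as the output
    auto = {int(i): int(predictions[i]) for i in np.flatnonzero(confident)}
    # "Model unsure": indices a human reviews and corrects
    to_human = np.flatnonzero(~confident)
    return auto, to_human

probs = [[0.95, 0.05], [0.55, 0.45], [0.10, 0.90], [0.30, 0.70]]
auto, to_human = route(probs)
print(auto)      # {0: 0, 2: 1}
print(to_human)  # [1 3]
```

The human-corrected labels for `to_human` are what an active learner later adds back into its training data.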

Active learning pipeline with low confidence

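# modAL wraps scikit-learn estimators
from modAL.models import ActiveLearner
from modAL.uncertainty import uncertainty_sampling
from sklearn.linear_model import LogisticRegression

# Initialize the learner; it is fit on the labeled seed set right away
learner = ActiveLearner(
    estimator=LogisticRegression(),
    query_strategy=uncertainty_sampling,  # query the least confident points
    X_training=X_labeled, y_training=y_labeled
)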

Uncertainty sampling: query the points where the model's confidence is lowest
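A common way to score "confidence is lowest" is least confidence: uncertainty equals 1 minus the probability of the most likely class. The sketch below implements that score with NumPy; it is our own illustration of the idea (modAL's `uncertainty_sampling` is documented to rank points by this measure), and `pool_probs` is made-up data:

```python
import numpy as np

def least_confidence(probs):
    """Uncertainty = 1 - probability of the most likely class."""
    return 1.0 - np.asarray(probs).max(axis=1)

def uncertainty_query(probs, n_instances=1):
    """Indices of the n_instances pool points with the highest uncertainty."""
    scores = least_confidence(probs)
    return np.argsort(scores)[::-1][:n_instances]

# Hypothetical predicted probabilities for four unlabeled points
pool_probs = [[0.9, 0.1], [0.6, 0.4], [0.51, 0.49], [0.2, 0.8]]
print(uncertainty_query(pool_probs))  # [2]
```

Point 2 is queried because the model splits its probability almost evenly between the two classes.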

Active learning pipeline with low confidence

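import numpy as np

# Active learning loop: query, label, retrain (10 rounds)
for _ in range(10):
    # Pick the unlabeled point the model is least confident about
    query_idx, _ = learner.query(X_unlabeled)
    # y stands in for the human: labels for X_unlabeled, row-aligned
    X_new, y_new = X_unlabeled[query_idx], y[query_idx]
    # teach() adds only the new points to the training data and refits
    learner.teach(X_new, y_new)
    X_labeled = np.vstack((X_labeled, X_new))
    y_labeled = np.append(y_labeled, y_new)
    # Drop the queried points from the pool, keeping X_unlabeled and y aligned
    X_unlabeled = np.delete(X_unlabeled, query_idx, axis=0)
    y = np.delete(y, query_idx, axis=0)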
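The same loop can be sketched end to end with scikit-learn alone, which is handy for experimenting without modAL installed. This is an illustration under stated assumptions, not the course's code: the data come from `make_classification`, the seed set holds 5 points per class, uncertainty is least confidence, and `y_pool` plays the human annotator:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data standing in for a real task
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Small labeled seed set containing both classes; the rest is the unlabeled pool
seed = np.concatenate([np.flatnonzero(y == 0)[:5], np.flatnonzero(y == 1)[:5]])
mask = np.zeros(len(y), dtype=bool)
mask[seed] = True
X_labeled, y_labeled = X[mask], y[mask]
X_pool, y_pool = X[~mask], y[~mask]  # y_pool simulates the human's answers

model = LogisticRegression()
for _ in range(10):
    model.fit(X_labeled, y_labeled)
    # Least-confidence uncertainty for every pool point
    uncertainty = 1.0 - model.predict_proba(X_pool).max(axis=1)
    query_idx = int(np.argmax(uncertainty))
    # "Ask the human": reveal the label of the queried point and move it over
    X_labeled = np.vstack((X_labeled, X_pool[query_idx:query_idx + 1]))
    y_labeled = np.append(y_labeled, y_pool[query_idx])
    X_pool = np.delete(X_pool, query_idx, axis=0)
    y_pool = np.delete(y_pool, query_idx)

print(X_labeled.shape, X_pool.shape)  # (20, 5) (280, 5)
```

Each round moves exactly one point from the pool to the labeled set, so after 10 rounds the model has seen 20 labels instead of 300.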

Let's practice!
