Reinforcement Learning from Human Feedback (RLHF)
Mina Parham
AI Engineer
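import numpy as np
from sklearn.linear_model import LogisticRegression
from modAL.models import ActiveLearner
from modAL.uncertainty import uncertainty_sampling

# Initialize the learner on the labeled seed set
learner = ActiveLearner(
    estimator=LogisticRegression(),
    query_strategy=uncertainty_sampling,
    X_training=X_labeled,
    y_training=y_labeled,
)

# Active learning loop: query the pool, label the queried sample, retrain
for _ in range(10):
    query_idx, _ = learner.query(X_unlabeled)
    # Assumption: y holds the labels for X_unlabeled, row for row,
    # standing in for the human annotator
    learner.teach(X_unlabeled[query_idx], y[query_idx])  # adds the sample to the training set and refits
    # Remove the queried sample from the pool, keeping X_unlabeled and y aligned
    X_unlabeled = np.delete(X_unlabeled, query_idx, axis=0)
    y = np.delete(y, query_idx, axis=0)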
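The query step in the loop above depends on the uncertainty sampling strategy. As a rough illustration of the idea, here is a minimal sketch of the least-confidence criterion in plain Python: score each sample by one minus the probability of its most likely class, then pick the highest score. The function name `least_confidence_query` and the probability values are hypothetical, chosen only for this example; this is not modAL's implementation.

```python
def least_confidence_query(probabilities):
    """Return the index of the sample the model is least confident about.

    `probabilities` is a list of per-sample class probability lists,
    for example the rows returned by a classifier's predict_proba.
    """
    # Uncertainty of a sample = 1 - probability of its most likely class.
    uncertainties = [1.0 - max(row) for row in probabilities]
    # Pick the most uncertain sample (the first one wins on ties).
    return max(range(len(uncertainties)), key=uncertainties.__getitem__)


# Hypothetical predicted class probabilities for three unlabeled samples.
probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]
query_index = least_confidence_query(probs)
```

The second sample is selected because its top class probability (0.55) is the closest to a coin flip, so a human label there is expected to be the most informative.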