Reinforcement Learning from Human Feedback (RLHF)
Mina Parham
AI Engineer





Preference data preference_df with the sources 'Journalist', 'Social Media Influencer', and 'Marketing Professional':

This sample data can easily be aggregated by grouping on 'id' and taking a majority vote per prompt:

from collections import Counter

def majority_vote(df):
    # Count each (chosen, rejected) pair and return the most frequent one
    votes = Counter(zip(df['chosen'], df['rejected']))
    return max(votes, key=votes.get)

df_majority = preference_df.groupby('id').apply(majority_vote)
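As a minimal runnable sketch, the majority vote can be demonstrated on a small hypothetical preference_df; the ids, sources, and answer labels below are made up for illustration:

```python
from collections import Counter

import pandas as pd

# Hypothetical preference data: three sources vote on the same prompts (ids 1 and 2)
preference_df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'source': ['Journalist', 'Social Media Influencer', 'Marketing Professional'] * 2,
    'chosen': ['A', 'A', 'B', 'C', 'C', 'C'],
    'rejected': ['B', 'B', 'A', 'D', 'D', 'D'],
})

def majority_vote(df):
    # Count each (chosen, rejected) pair and return the most frequent one
    votes = Counter(zip(df['chosen'], df['rejected']))
    return max(votes, key=votes.get)

# One majority (chosen, rejected) pair per id
df_majority = preference_df.groupby('id').apply(majority_vote)
print(df_majority)
```

For id 1 two of the three sources prefer A over B, so the majority pair is ('A', 'B'); for id 2 all three agree on ('C', 'D').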
Preference data preference_df2 with the same three experts:

Using preference_df2 to find unreliable sources, count for each source how often its vote differs from the per-id majority:

df_majority = preference_df2.groupby('id').apply(majority_vote)

disagreements = {source: 0 for source in preference_df2['source'].unique()}
for _, row in preference_df2.iterrows():
    if (row['chosen'], row['rejected']) != df_majority[row['id']]:
        disagreements[row['source']] += 1

detect_unreliable_source = max(disagreements, key=disagreements.get)
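The disagreement count can be tested end to end on a small hypothetical preference_df2 in which one source consistently votes against the majority; the data values are invented for illustration:

```python
from collections import Counter

import pandas as pd

# Hypothetical data: the 'Social Media Influencer' disagrees with the majority on both ids
preference_df2 = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'source': ['Journalist', 'Social Media Influencer', 'Marketing Professional'] * 2,
    'chosen': ['A', 'B', 'A', 'C', 'D', 'C'],
    'rejected': ['B', 'A', 'B', 'D', 'C', 'D'],
})

def majority_vote(df):
    votes = Counter(zip(df['chosen'], df['rejected']))
    return max(votes, key=votes.get)

df_majority = preference_df2.groupby('id').apply(majority_vote)

# Count, per source, how often that source's vote differs from the majority
disagreements = {source: 0 for source in preference_df2['source'].unique()}
for _, row in preference_df2.iterrows():
    if (row['chosen'], row['rejected']) != df_majority[row['id']]:
        disagreements[row['source']] += 1

# The source with the most disagreements is flagged as least reliable
detect_unreliable_source = max(disagreements, key=disagreements.get)
print(detect_unreliable_source)
```

Here the influencer is outvoted on both prompts, so it accumulates two disagreements and is flagged, while the other two sources have zero.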