Reinforcement Learning from Human Feedback (RLHF)
Mina Parham
AI Engineer
Preference data preference_df with sources 'Journalist', 'Social Media Influencer', and 'Marketing Professional':
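As a minimal sketch, such a dataframe might look like the following. The column names ('id', 'source', 'chosen', 'rejected') are assumptions inferred from the code below, and the values are purely illustrative:

import pandas as pd

# Hypothetical sample: each row is one source's preference judgment for a prompt id
preference_df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'source': ['Journalist', 'Social Media Influencer', 'Marketing Professional'] * 2,
    'chosen': ['A', 'A', 'B', 'C', 'C', 'C'],
    'rejected': ['B', 'B', 'A', 'D', 'D', 'D'],
})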
This sample data can be integrated by grouping on 'id' and taking a majority vote across sources. First, define the vote-counting helper:

from collections import Counter

def majority_vote(df):
    # Tally how often each (chosen, rejected) pair was selected for this prompt
    votes = Counter(zip(df['chosen'], df['rejected']))
    # Return the most frequent preference pair
    return max(votes, key=votes.get)

Then apply it per prompt id:

df_majority = preference_df.groupby('id').apply(majority_vote)
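With the illustrative sample above, df_majority is a Series mapping each prompt id to its winning pair (a sketch under those assumed values):

df_majority[1]   # ('A', 'B') -- two of the three sources agreed
df_majority[2]   # ('C', 'D') -- unanimous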
Preference data preference_df2, collected from the same three sources, can be analyzed to identify unreliable sources. First, compute the per-prompt majority vote as before:

df_majority = preference_df2.groupby('id').apply(majority_vote)
Then count, for each source, how often its label differs from the majority:

disagreements = {source: 0 for source in preference_df2['source'].unique()}

for _, row in preference_df2.iterrows():
    # Count a disagreement whenever this source's pair differs from the majority vote
    if (row['chosen'], row['rejected']) != df_majority[row['id']]:
        disagreements[row['source']] += 1

# Source that disagreed with the majority most often
detect_unreliable_source = max(disagreements, key=disagreements.get)
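One plausible follow-up, not prescribed by the lesson itself, is to drop that source's rows before training a reward model on the preferences:

# Hypothetical filtering step: keep only labels from the more reliable sources
filtered_df = preference_df2[preference_df2['source'] != detect_unreliable_source]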