Reinforcement Learning from Human Feedback (RLHF)
Mina Parham
AI Engineer

Preference data preference_df contains annotations from three sources: 'Journalist', 'Social Media Influencer', and 'Marketing Professional':
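As a minimal sketch of what this data might look like (the rows below are invented for illustration; the column names 'id', 'source', 'chosen', and 'rejected' are taken from the code that follows):

import pandas as pd

# Hypothetical preference annotations: each 'id' is one prompt, and each
# source states which of two candidate responses it prefers.
preference_df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'source': ['Journalist', 'Social Media Influencer', 'Marketing Professional'] * 2,
    'chosen': ['A', 'A', 'B', 'C', 'D', 'C'],
    'rejected': ['B', 'B', 'A', 'D', 'C', 'D'],
})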

This sample data can be integrated by majority voting. First, define a helper that, given all annotations for one 'id', returns the most frequent (chosen, rejected) pair:

from collections import Counter

def majority_vote(df):
    # Tally each (chosen, rejected) pair within the group
    votes = Counter(zip(df['chosen'], df['rejected']))
    # Return the most common pair (ties break arbitrarily)
    return max(votes, key=votes.get)

Then group the annotations by 'id' and apply it:

df_majority = preference_df.groupby('id').apply(majority_vote)
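df_majority is then a Series of (chosen, rejected) tuples indexed by 'id'. As a small follow-up sketch (the name consolidated is ours), it can be unpacked into a clean preference table, one majority-voted pair per prompt, ready for downstream reward-model training:

# Unpack the tuples into separate 'chosen' and 'rejected' columns
consolidated = pd.DataFrame(
    df_majority.tolist(),
    index=df_majority.index,
    columns=['chosen', 'rejected'],
)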
Preference data preference_df2, collected from the same three sources:

We can use preference_df2 to identify unreliable sources: take the majority vote per 'id', then count how often each source's annotation contradicts it.

df_majority = preference_df2.groupby('id').apply(majority_vote)

# Count, per source, the annotations that disagree with the majority vote
disagreements = {source: 0 for source in preference_df2['source'].unique()}
for _, row in preference_df2.iterrows():
    if (row['chosen'], row['rejected']) != df_majority[row['id']]:
        disagreements[row['source']] += 1

# The source that disagrees with the majority most often
detect_unreliable_source = max(disagreements, key=disagreements.get)
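The same tally can also be computed without an explicit loop. A minimal sketch, assuming the snippet above has already run (the names pairs and disagrees are ours):

# Each row's (chosen, rejected) pair, aligned with preference_df2's index
pairs = pd.Series(
    list(zip(preference_df2['chosen'], preference_df2['rejected'])),
    index=preference_df2.index,
)
# True where a row's pair differs from the majority vote for its 'id'
disagrees = pairs != preference_df2['id'].map(df_majority)
disagreements = preference_df2.loc[disagrees, 'source'].value_counts()
detect_unreliable_source = disagreements.idxmax()

Note that, unlike the dictionary version, sources with zero disagreements simply do not appear in the value_counts result.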