Incorporating diverse feedback sources

Reinforcement Learning from Human Feedback (RLHF)

Mina Parham

AI Engineer

Improved model generalization

 

  • Represents different viewpoints and contexts
  • Helps the model generalize across preferences and values

An image of hands with different callout bubbles representing diverse opinions.


Reduced bias

  • Mitigates individual biases
  • Creates a more balanced and fair model output

A bar chart showing the difference between a dataset biased towards a male group, and a balanced dataset with equal distribution for men and women


Better alignment with human values

  • Captures complex human preferences
  • Represents a range of cultures and backgrounds

Icons representing a diverse group of people


Enhanced adaptability

  • Model responds to a wider range of user needs and preferences
  • Reflects different viewpoints in its outputs

An icon showing a person with emojis indicating different viewpoints.


Increased robustness

  • More resilient to different types of inputs
  • Improves performance across varied contexts

A diagram showing improved quality thanks to different inputs and contexts.


Integrating preference data from multiple sources

Preference data preference_df with sources 'Journalist', 'Social Media Influencer', and 'Marketing Professional':

A table showing structured data from three different sources
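
The exact rows are not shown here, so the sketch below uses hypothetical values; only the column names ('id', 'source', 'chosen', 'rejected') are taken from the code on the following slides:

import pandas as pd

# Hypothetical preference data: each source picks a chosen and a rejected
# response for every prompt id (sample values are illustrative only)
preference_df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'source': ['Journalist', 'Social Media Influencer', 'Marketing Professional'] * 2,
    'chosen': ['response_a', 'response_a', 'response_b',
               'response_c', 'response_c', 'response_c'],
    'rejected': ['response_b', 'response_b', 'response_a',
                 'response_d', 'response_d', 'response_d'],
})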


Majority voting

This sample data can be integrated by grouping by 'id' and applying a majority_vote function:

df_majority = preference_df.groupby(['id']).apply(majority_vote)

The majority_vote function counts how often each (chosen, rejected) pair appears within a group and returns the most common one:

from collections import Counter

def majority_vote(df):
    # Count each (chosen, rejected) pair within the group
    votes = Counter(zip(df['chosen'], df['rejected']))
    # Return the pair with the most votes
    return max(votes, key=votes.get)
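
As a usage sketch on the hypothetical preference_df above: df_majority is a Series mapping each id to its winning (chosen, rejected) pair, and it can be expanded back into a consolidated preference table (the expansion step is an assumption, not part of the original example):

# One winning (chosen, rejected) pair per prompt id
df_majority = preference_df.groupby(['id']).apply(majority_vote)

# Optionally expand the winning pairs into a tidy table again
df_consolidated = pd.DataFrame(
    df_majority.tolist(), columns=['chosen', 'rejected'], index=df_majority.index
).reset_index()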

Unreliable preference data sources

Preference data preference_df2 from the same three sources:

A table showing structured data from three different sources


Unreliable preference data sources

  • Iterate over the rows of preference_df2 and count how often each source disagrees with the majority vote, to flag the least reliable source:

df_majority = preference_df2.groupby('id').apply(majority_vote)

# Count each source's disagreements with the majority vote
disagreements = {source: 0 for source in preference_df2['source'].unique()}

for _, row in preference_df2.iterrows():
    if (row['chosen'], row['rejected']) != df_majority[row['id']]:
        disagreements[row['source']] += 1

# The source with the most disagreements is the least reliable
detect_unreliable_source = max(disagreements, key=disagreements.get)
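
One possible follow-up, not shown on the slide, is to drop the flagged source's rows before the preference data is used further; a minimal sketch assuming the variables defined above:

# Keep only rows from sources other than the flagged one
reliable_df = preference_df2[preference_df2['source'] != detect_unreliable_source]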

Let's practice!
