Categorical palettes

Improving Your Data Visualizations in Python

Nick Strayer

Instructor

Three panels containing countries, cities, and birds as examples of categorical data

Limits in perception

  • Try to limit plots to 10 or fewer categories
  • Keep color-blindness in mind
import seaborn as sns
sns.palplot(sns.color_palette('Set2', 11))

Eleven swatches from the Set2 palette, which are hard to tell apart because there are so many
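Part of the problem with eleven Set2 swatches is that Set2 is a ColorBrewer qualitative palette with only eight colors. When you request more, `sns.color_palette` appears to cycle back to the start of the palette (observed in recent seaborn versions; treat it as an assumption for yours), so some categories silently share a color. The sketch below counts the distinct colors returned; `count_distinct_colors` is a hypothetical helper, not part of seaborn.

```python
import seaborn as sns

def count_distinct_colors(palette_name, n_colors):
    """Hypothetical helper: return (colors requested, distinct colors received)."""
    palette = sns.color_palette(palette_name, n_colors)
    # Each color is an RGB tuple, so a set removes any repeated colors
    return len(palette), len(set(palette))

requested, distinct = count_distinct_colors('Set2', 11)
print(f"requested {requested} colors, got {distinct} distinct ones")
```

If `distinct` is smaller than `requested`, two or more categories would be drawn in the same color.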

# Keep the two cities of interest by name and lump every other city into 'other'
pollution['interesting cities'] = [x if x in ['Long Beach', 'Cincinnati']
                                   else 'other' for x in pollution['city']]

# December 2014 readings, colored by the new three-level column
sns.scatterplot(x="NO2", y="SO2", hue='interesting cities', palette='Set2',
                data=pollution.query('year == 2014 & month == 12'))

Scatter plot with Long Beach and Cincinnati given distinct point colors and all other cities lumped together under a single 'other' color
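The list comprehension above hard-codes which cities to highlight. If you would rather keep the most frequent categories and lump the rest, a minimal sketch with pandas `Series.value_counts` and `Series.where` might look like this. The `lump_rare` helper and the sample cities are illustrative, not part of pandas, seaborn, or the course data.

```python
import pandas as pd

def lump_rare(values: pd.Series, keep: int, other: str = "other") -> pd.Series:
    """Hypothetical helper: keep the `keep` most common labels, replace the rest with `other`."""
    top = values.value_counts().nlargest(keep).index
    # where() keeps entries where the condition is True and substitutes `other` elsewhere
    return values.where(values.isin(top), other)

cities = pd.Series(["Long Beach", "Cincinnati", "Long Beach", "Denver",
                    "Cincinnati", "Long Beach", "Houston"])
print(lump_rare(cities, keep=2).tolist())
```

Note that this picks categories by frequency, so on real data it will not necessarily select Long Beach and Cincinnati; to reproduce the slide exactly, use `pollution['city'].where(pollution['city'].isin(['Long Beach', 'Cincinnati']), 'other')`.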

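import matplotlib.pyplot as plt

# The eight ColorBrewer qualitative (categorical) palettes
colorbrewer_palettes = ['Set1',   'Set2',    'Set3',    'Accent',
                        'Paired', 'Pastel1', 'Pastel2', 'Dark2']

# Draw each palette and label it with its name
for pal in colorbrewer_palettes:
    sns.palplot(pal=sns.color_palette(pal))
    plt.title(pal, loc='left')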

A series of available categorical color palettes
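When the same categories show up in several plots, it can help to pin each category to one color so that, for example, Long Beach is always the same hue. seaborn's `palette` argument accepts a dictionary mapping hue levels to colors; the sketch below builds one from a ColorBrewer palette. The `make_color_map` helper and the category names are illustrative assumptions.

```python
import seaborn as sns

def make_color_map(categories, palette_name="Set2"):
    """Hypothetical helper: map each category to a fixed, distinct color."""
    colors = sns.color_palette(palette_name, len(categories))
    # Refuse to build a map if the palette would have to reuse a color
    if len(set(colors)) < len(categories):
        raise ValueError(f"{palette_name} cannot give {len(categories)} distinct colors")
    return dict(zip(categories, colors))

color_map = make_color_map(["Long Beach", "Cincinnati", "other"])
# Usage, e.g.: sns.scatterplot(..., hue='interesting cities', palette=color_map)
print(color_map)
```

Raising an error instead of silently cycling makes it obvious when a plot has outgrown its palette.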

Ordinal data

  • Has order between classes

  • A set number of distinct classes

Diagram showing four quartiles

Diagram showing the seven days of the week

Diagram showing a happy-to-sad scale using emoji faces
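In pandas, an ordinal variable such as the day of the week can be stored as an ordered categorical, so sorting and comparisons follow the class order rather than alphabetical order. A minimal sketch with made-up values:

```python
import pandas as pd

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
day_type = pd.CategoricalDtype(categories=days, ordered=True)

visits = pd.Series(["Sun", "Mon", "Fri", "Wed", "Mon"], dtype=day_type)

print(visits.sort_values().tolist())  # weekday order, not alphabetical
print((visits > "Wed").tolist())      # comparisons respect the declared order
print(visits.cat.codes.tolist())      # integer position of each class
```

When plotting, passing the same list as `hue_order=days` keeps the colors assigned in that order.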

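# ColorBrewer sequential palettes, suited to ordered classes
colorbrewer_palettes = ['Reds', 'Blues', 'YlOrBr', 'PuBuGn', 'GnBu', 'Greys']

# Draw each palette with a different number of colors (4 up to 9)
for i, pal in enumerate(colorbrewer_palettes):
    sns.palplot(pal=sns.color_palette(pal, n_colors=i+4))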

A series of available ordinal palettes
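What makes these palettes fit ordinal data is that brightness changes steadily from one end to the other. One rough way to check that, sketched below, is to compute a luma value for each color: the Rec. 709 weights applied directly to seaborn's gamma-encoded RGB values. This is an approximation, not true perceptual lightness, and `luma` is a helper introduced here.

```python
import seaborn as sns

def luma(rgb):
    """Approximate brightness of an RGB tuple in [0, 1] using Rec. 709 weights."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

reds = sns.color_palette("Reds", n_colors=5)
brightness = [round(luma(c), 3) for c in reds]
print(brightness)  # values fall from light to dark
```

A categorical palette such as Set2 shows no such trend, which is why it suits unordered classes instead.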

import pandas as pd

# Make a tertile column using qcut(); labels=False gives integer codes 0, 1, 2
pollution['NO2 Tertile'] = pd.qcut(pollution['NO2'], 3, labels=False)

# Plot colored by the computed tertiles
sns.scatterplot(x="CO", y="SO2", hue='NO2 Tertile', palette="OrRd",
                data=pollution.query("city == 'Long Beach' & year == 2014"))

Scatterplot of CO and SO2 values that encodes the NO2 tertiles in a red ordinal color scale
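`pd.qcut` splits a numeric column at its quantiles, so with `q=3` each tertile holds roughly a third of the rows, and `labels=False` returns integer bin codes instead of interval labels. A small check on synthetic values (not the pollution data); the second call shows that passing a list of names gives readable ordinal labels instead.

```python
import pandas as pd

no2 = pd.Series([5, 12, 7, 30, 18, 25, 3, 14, 22])

# Integer tertile codes: 0 = lowest third, 2 = highest third
tertile = pd.qcut(no2, 3, labels=False)
print(tertile.tolist())
print(tertile.value_counts().sort_index().tolist())  # rows per tertile

# Named tertiles, returned as a categorical in low -> high order
named = pd.qcut(no2, 3, labels=["low", "mid", "high"])
print(named.tolist())
```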

Let's color some categories
