Set theory

Writing Efficient Python Code

Logan Thomas

Scientific Software Technical Trainer, Enthought

Set theory

Branch of mathematics applied to collections of objects
- i.e., sets
Python has a built-in set data type with accompanying methods:
- intersection(): all elements that are in both sets
- difference(): all elements in the first set (the one the method is called on) but not in the other
- symmetric_difference(): all elements in exactly one of the two sets
- union(): all elements that are in either set
Fast membership testing
- Check whether a value exists in a collection
- Using the in operator
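A minimal sketch of the four methods on two small integer sets (the names evens and small and their values are illustrative, not from the slides). Each method also has an operator form (&, -, ^, |) that gives the same result when both operands are sets.

```python
# Two small example sets (illustrative values, not from the slides)
evens = {0, 2, 4, 6}
small = {0, 1, 2, 3}

both = evens.intersection(small)                  # {0, 2}
only_evens = evens.difference(small)              # {4, 6}
exactly_one = evens.symmetric_difference(small)   # {1, 3, 4, 6}
either = evens.union(small)                       # {0, 1, 2, 3, 4, 6}

# Operator equivalents (both operands must be sets)
assert both == evens & small
assert only_evens == evens - small
assert exactly_one == evens ^ small
assert either == evens | small

# Membership testing with the in operator
print(4 in evens)  # True
print(5 in evens)  # False
```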

Comparing objects with loops

list_a = ['Bulbasaur', 'Charmander', 'Squirtle']
list_b = ['Caterpie', 'Pidgey', 'Squirtle']

[Figure: a box titled List A holding Bulbasaur, Charmander, and Squirtle, and a separate box titled List B holding Caterpie, Pidgey, and Squirtle]

[Figure: the same List A and List B boxes, with Squirtle circled in both]

list_a = ['Bulbasaur', 'Charmander', 'Squirtle']
list_b = ['Caterpie', 'Pidgey', 'Squirtle']

in_common = []

for pokemon_a in list_a:
    for pokemon_b in list_b:
        if pokemon_a == pokemon_b:
            in_common.append(pokemon_a)

print(in_common)

['Squirtle']
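The nested loop compares every pair, so its work grows with len(list_a) × len(list_b). A possible middle ground, sketched below as one option rather than the course's method, converts list_b to a set once and then makes a single pass over list_a. The name lookup_b is introduced here for illustration. Unlike intersection(), the result is a list in list_a's order and keeps any repeats from list_a.

```python
list_a = ['Bulbasaur', 'Charmander', 'Squirtle']
list_b = ['Caterpie', 'Pidgey', 'Squirtle']

# Build a set from list_b once so each lookup is an average O(1) hash check
lookup_b = set(list_b)

# One pass over list_a instead of a nested loop; keeps list_a's order
in_common = [pokemon for pokemon in list_a if pokemon in lookup_b]

print(in_common)  # ['Squirtle']
```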

list_a = ['Bulbasaur', 'Charmander', 'Squirtle']
list_b = ['Caterpie', 'Pidgey', 'Squirtle']

set_a = set(list_a)
print(set_a)

{'Bulbasaur', 'Charmander', 'Squirtle'}

set_b = set(list_b)
print(set_b)

{'Caterpie', 'Pidgey', 'Squirtle'}

set_a.intersection(set_b)

{'Squirtle'}
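A small detail worth knowing: the method form of intersection() accepts any iterable, so the second list does not have to be converted to a set first. The operator form (&) only works between sets. The name common is introduced for this sketch.

```python
set_a = {'Bulbasaur', 'Charmander', 'Squirtle'}
list_b = ['Caterpie', 'Pidgey', 'Squirtle']

# The method form accepts any iterable, so list_b need not be converted first
common = set_a.intersection(list_b)
print(common)  # {'Squirtle'}

# The operator form requires sets on both sides: set_a & list_b raises TypeError
try:
    set_a & list_b
except TypeError as err:
    print('TypeError:', err)
```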

Efficiency gained with set theory

%%timeit
in_common = []

for pokemon_a in list_a:
    for pokemon_b in list_b:
        if pokemon_a == pokemon_b:
            in_common.append(pokemon_a)

601 ns ± 17.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit in_common = set_a.intersection(set_b)

137 ns ± 3.01 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
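%%timeit and %timeit are IPython magics. Outside IPython, a comparable measurement can be sketched with the standard-library timeit module, whose timeit() function accepts a callable and a number of repetitions and returns the total elapsed seconds. The helper names nested_loop and set_intersection are hypothetical, and both timings include function-call overhead, so absolute numbers will differ from the slide and from machine to machine.

```python
import timeit

list_a = ['Bulbasaur', 'Charmander', 'Squirtle']
list_b = ['Caterpie', 'Pidgey', 'Squirtle']
set_a, set_b = set(list_a), set(list_b)

def nested_loop():
    """Collect common items by comparing every pair."""
    in_common = []
    for pokemon_a in list_a:
        for pokemon_b in list_b:
            if pokemon_a == pokemon_b:
                in_common.append(pokemon_a)
    return in_common

def set_intersection():
    """Collect common items with a set method."""
    return set_a.intersection(set_b)

# Run each function n times; timeit returns the total time in seconds
n = 10_000
loop_time = timeit.timeit(nested_loop, number=n)
set_time = timeit.timeit(set_intersection, number=n)

print(f'nested loop:  {loop_time / n * 1e9:.0f} ns per call')
print(f'intersection: {set_time / n * 1e9:.0f} ns per call')
```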

Set method: difference

set_a = {'Bulbasaur', 'Charmander', 'Squirtle'}
set_b = {'Caterpie', 'Pidgey', 'Squirtle'}

set_a.difference(set_b)

{'Bulbasaur', 'Charmander'}

[Figure: boxes titled Set A (Bulbasaur, Charmander, Squirtle) and Set B (Caterpie, Pidgey, Squirtle); Bulbasaur and Charmander are circled in Set A]

Set method: difference

set_a = {'Bulbasaur', 'Charmander', 'Squirtle'}
set_b = {'Caterpie', 'Pidgey', 'Squirtle'}

set_b.difference(set_a)

{'Caterpie', 'Pidgey'}

[Figure: boxes titled Set A (Bulbasaur, Charmander, Squirtle) and Set B (Caterpie, Pidgey, Squirtle); Caterpie and Pidgey are circled in Set B]
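The two difference slides show that operand order matters. The sketch below makes that explicit and shows the - operator, the set-only shorthand for difference(). The names a_only and b_only are introduced for illustration.

```python
set_a = {'Bulbasaur', 'Charmander', 'Squirtle'}
set_b = {'Caterpie', 'Pidgey', 'Squirtle'}

a_only = set_a.difference(set_b)  # {'Bulbasaur', 'Charmander'}
b_only = set_b.difference(set_a)  # {'Caterpie', 'Pidgey'}

# Swapping the operands changes the result, so difference is not symmetric
print(a_only == b_only)  # False

# The - operator is the set-only shorthand for difference()
print(set_a - set_b == a_only)  # True
```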

Set method: symmetric difference

set_a = {'Bulbasaur', 'Charmander', 'Squirtle'}
set_b = {'Caterpie', 'Pidgey', 'Squirtle'}

set_a.symmetric_difference(set_b)

{'Bulbasaur', 'Caterpie', 'Charmander', 'Pidgey'}
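One way to see what symmetric_difference() returns is to rebuild it from the other methods: everything in either set, minus what the sets share. The names sym and rebuilt are introduced for this sketch.

```python
set_a = {'Bulbasaur', 'Charmander', 'Squirtle'}
set_b = {'Caterpie', 'Pidgey', 'Squirtle'}

sym = set_a.symmetric_difference(set_b)

# Everything in either set, minus what they share
rebuilt = set_a.union(set_b).difference(set_a.intersection(set_b))
print(sym == rebuilt)  # True

# Equivalently, the two one-sided differences combined
print(sym == set_a.difference(set_b).union(set_b.difference(set_a)))  # True

# Unlike difference(), operand order does not matter here
print(sym == set_b.symmetric_difference(set_a))  # True
```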

Set method: union

set_a = {'Bulbasaur', 'Charmander', 'Squirtle'}
set_b = {'Caterpie', 'Pidgey', 'Squirtle'}

set_a.union(set_b)

{'Bulbasaur', 'Caterpie', 'Charmander', 'Pidgey', 'Squirtle'}

[Figure: boxes titled Set A (Bulbasaur, Charmander, Squirtle) and Set B (Caterpie, Pidgey, Squirtle); every Pokémon is circled, with Squirtle circled only once]
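union() is not limited to two sets: it accepts several iterables in one call, and any value that appears more than once is kept only once. In this sketch, list_c is a hypothetical third group added purely for illustration.

```python
set_a = {'Bulbasaur', 'Charmander', 'Squirtle'}
set_b = {'Caterpie', 'Pidgey', 'Squirtle'}
# Hypothetical third group, added only for this example
list_c = ['Pikachu', 'Pidgey']

# union() accepts several iterables at once; duplicates collapse to one copy
everyone = set_a.union(set_b, list_c)
print(len(everyone))  # 6
```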

Membership testing with sets

# The same 720 total Pokémon in each data structure
names_list  = ['Abomasnow', 'Abra', 'Absol', ...]
names_tuple = ('Abomasnow', 'Abra', 'Absol', ...)
names_set   = {'Abomasnow', 'Abra', 'Absol', ...}

[Figure: Abomasnow, Abra, and Absol shown in three separate boxes titled List, Tuple, and Set]

[Figure: the same List, Tuple, and Set boxes, with Zubat connected by a line to each box to represent a membership test]

names_list  = ['Abomasnow', 'Abra', 'Absol', ...]
names_tuple = ('Abomasnow', 'Abra', 'Absol', ...)
names_set   = {'Abomasnow', 'Abra', 'Absol', ...}

%timeit 'Zubat' in names_list

7.63 µs ± 211 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit 'Zubat' in names_tuple

7.6 µs ± 394 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit 'Zubat' in names_set

37.5 ns ± 1.37 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
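The gap comes from how the lookup works: `in` on a list or tuple scans the elements one by one, while a set hashes the value and checks the matching slot of its hash table, which takes constant time on average. Building the set is itself a full pass over the data, so the conversion pays off when many values are tested against the same collection. The short names_list and the queries list below are hypothetical stand-ins for the 720-name data.

```python
# Hypothetical stand-in for the full list of names (the real list has 720 entries)
names_list = ['Abomasnow', 'Abra', 'Absol', 'Zubat']

# Converting to a set costs one pass over the list (O(n))...
names_set = set(names_list)

# ...after which each `in` check is an average O(1) hash lookup
queries = ['Zubat', 'Pikachu', 'Abra']
found = [name for name in queries if name in names_set]
print(found)  # ['Zubat', 'Abra']
```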

Uniques with sets

# The primary type of each of the 720 Pokémon
primary_types = ['Grass', 'Psychic', 'Dark', 'Bug', ...]

unique_types = []

for prim_type in primary_types:
    if prim_type not in unique_types:
        unique_types.append(prim_type)

print(unique_types)

['Grass', 'Psychic', 'Dark', 'Bug', 'Steel', 'Rock', 'Normal',
 'Water', 'Dragon', 'Electric', 'Poison', 'Fire', 'Fairy', 'Ice',
 'Ground', 'Ghost', 'Fighting', 'Flying']
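The loop above checks `not in unique_types` against a list, which rescans that list on every iteration. If first-seen order matters, a possible variant keeps the list for output but uses a helper set (named seen here, an illustrative choice) for the membership checks. The short primary_types list is a hypothetical sample, not the slide's 720 entries.

```python
# Hypothetical short sample; the slide's list holds 720 entries
primary_types = ['Grass', 'Psychic', 'Grass', 'Dark', 'Bug', 'Psychic']

unique_types = []
seen = set()  # set used only for fast "have we seen this?" checks

for prim_type in primary_types:
    if prim_type not in seen:  # average O(1) instead of scanning a list
        seen.add(prim_type)
        unique_types.append(prim_type)

print(unique_types)  # ['Grass', 'Psychic', 'Dark', 'Bug']
```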

Uniques with sets

# The primary type of each of the 720 Pokémon
primary_types = ['Grass', 'Psychic', 'Dark', 'Bug', ...]

unique_types_set = set(primary_types)

print(unique_types_set)  # element order is arbitrary and may differ between runs

{'Grass', 'Psychic', 'Dark', 'Bug', 'Steel', 'Rock', 'Normal',
 'Water', 'Dragon', 'Electric', 'Poison', 'Fire', 'Fairy', 'Ice',
 'Ground', 'Ghost', 'Fighting', 'Flying'}
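Because a set has no defined order, its printed form can change between runs. When a stable listing is needed, one simple option is to pass the set to sorted(), which accepts any iterable and returns a new alphabetically ordered list. The short primary_types list is a hypothetical sample, and unique_sorted is an illustrative name.

```python
# Hypothetical short sample; the slide's list holds 720 entries
primary_types = ['Grass', 'Psychic', 'Grass', 'Dark', 'Bug', 'Psychic']

unique_types_set = set(primary_types)

# sorted() returns a new list in a fixed (alphabetical) order
unique_sorted = sorted(unique_types_set)
print(unique_sorted)          # ['Bug', 'Dark', 'Grass', 'Psychic']
print(len(unique_types_set))  # 4
```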

Let's practice set theory!
