Set theory

Writing Efficient Python Code

Logan Thomas

Scientific Software Technical Trainer, Enthought

Set theory

  • Branch of mathematics applied to collections of objects
    • i.e., sets
  • Python has a built-in set datatype with accompanying methods:
    • intersection(): all elements that are in both sets
    • difference(): all elements in one set but not the other
    • symmetric_difference(): all elements in exactly one set
    • union(): all elements that are in either set
  • Fast membership testing
    • Check whether a value exists in a collection
    • Using the in operator
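
As a quick illustration of the methods listed above, here is a minimal sketch that uses the operator forms of the same four operations. The variable names (both, only_a, exactly_one, either) are illustrative and not part of the course material.

```python
# Two small example sets (the same Pokémon names used later in these slides)
set_a = {'Bulbasaur', 'Charmander', 'Squirtle'}
set_b = {'Caterpie', 'Pidgey', 'Squirtle'}

# Each method has an operator form; the operators require both operands to be sets
both = set_a & set_b          # same as set_a.intersection(set_b)
only_a = set_a - set_b        # same as set_a.difference(set_b)
exactly_one = set_a ^ set_b   # same as set_a.symmetric_difference(set_b)
either = set_a | set_b        # same as set_a.union(set_b)

# Membership testing with the in operator
has_squirtle = 'Squirtle' in set_a
has_pidgey = 'Pidgey' in set_a

print(both, only_a, exactly_one, either, has_squirtle, has_pidgey)
```

The method forms accept any iterable as an argument, while the operator forms raise a TypeError unless both sides are sets.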

Comparing objects with loops

list_a = ['Bulbasaur', 'Charmander', 'Squirtle']
list_b = ['Caterpie', 'Pidgey', 'Squirtle']

alt="The Pokémon named Bulbasaur, Charmander, and Squirtle enclosed in a box titled List A and the Pokémon Caterpie, Pidgey, and Squirtle enclosed in a separate box titled List B"


Comparing objects with loops

list_a = ['Bulbasaur', 'Charmander', 'Squirtle']
list_b = ['Caterpie', 'Pidgey', 'Squirtle'] 

alt="The Pokémon named Bulbasaur, Charmander, and Squirtle enclosed in a box titled List A and the Pokémon Caterpie, Pidgey, and Squirtle enclosed in a separate box titled List B; Squirtle is circled in both boxes"

list_a = ['Bulbasaur', 'Charmander', 'Squirtle']
list_b = ['Caterpie', 'Pidgey', 'Squirtle'] 
in_common = []

for pokemon_a in list_a:
    for pokemon_b in list_b:
        if pokemon_a == pokemon_b:
            in_common.append(pokemon_a)

print(in_common)
['Squirtle']
list_a = ['Bulbasaur', 'Charmander', 'Squirtle']
list_b = ['Caterpie', 'Pidgey', 'Squirtle'] 
set_a = set(list_a)
print(set_a)
{'Bulbasaur', 'Charmander', 'Squirtle'}
set_b = set(list_b)
print(set_b)
{'Caterpie', 'Pidgey', 'Squirtle'}
set_a.intersection(set_b)
{'Squirtle'}
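
Sets are unordered, so intersection() does not keep the order of the original lists. If the order of list_a matters, one option is to keep the list and use a set only for the lookups. This is a minimal sketch; lookup_b and in_common_ordered are hypothetical names.

```python
list_a = ['Bulbasaur', 'Charmander', 'Squirtle']
list_b = ['Caterpie', 'Pidgey', 'Squirtle']

# Build the lookup set once so each membership test is fast
lookup_b = set(list_b)

# Keep list_a's order while dropping names that are not in list_b
in_common_ordered = [name for name in list_a if name in lookup_b]

print(in_common_ordered)  # ['Squirtle']
```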

Efficiency gained with set theory

%%timeit
in_common = []

for pokemon_a in list_a:
    for pokemon_b in list_b:
        if pokemon_a == pokemon_b:
            in_common.append(pokemon_a)
601 ns ± 17.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit in_common = set_a.intersection(set_b)
137 ns ± 3.01 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
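
The %timeit and %%timeit magics above only work in IPython or Jupyter. To run a similar comparison in a plain Python script, a rough sketch with the standard-library timeit module could look like this. The function names and the repeat count n_runs are assumptions, and the timings will differ from the slide depending on the machine.

```python
import timeit

list_a = ['Bulbasaur', 'Charmander', 'Squirtle']
list_b = ['Caterpie', 'Pidgey', 'Squirtle']
set_a, set_b = set(list_a), set(list_b)


def common_with_loops():
    """Nested-loop comparison, as in the %%timeit cell above."""
    in_common = []
    for pokemon_a in list_a:
        for pokemon_b in list_b:
            if pokemon_a == pokemon_b:
                in_common.append(pokemon_a)
    return in_common


def common_with_sets():
    """Set-based comparison; the sets are built once, outside the timing."""
    return set_a.intersection(set_b)


n_runs = 100_000  # hypothetical repeat count
loop_seconds = timeit.timeit(common_with_loops, number=n_runs)
set_seconds = timeit.timeit(common_with_sets, number=n_runs)

print(f"loops: {loop_seconds / n_runs * 1e9:.0f} ns per call")
print(f"sets:  {set_seconds / n_runs * 1e9:.0f} ns per call")
```

As in the slide, the set timing excludes the cost of converting the lists to sets, so the conversion is worth timing separately when it happens on every call.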

Set method: difference

set_a = {'Bulbasaur', 'Charmander', 'Squirtle'}
set_b = {'Caterpie', 'Pidgey', 'Squirtle'}
set_a.difference(set_b)
{'Bulbasaur', 'Charmander'}

alt="The Pokémon named Bulbasaur, Charmander, and Squirtle enclosed in a box titled Set A and the Pokémon Caterpie, Pidgey, and Squirtle enclosed in a separate box titled Set B; Bulbasaur and Charmander are circled in the box titled Set A"


Set method: difference

set_a = {'Bulbasaur', 'Charmander', 'Squirtle'}
set_b = {'Caterpie', 'Pidgey', 'Squirtle'}
set_b.difference(set_a)
{'Caterpie', 'Pidgey'}

alt="The Pokémon named Bulbasaur, Charmander, and Squirtle enclosed in a box titled Set A and the Pokémon Caterpie, Pidgey, and Squirtle enclosed in a separate box titled Set B; Caterpie and Pidgey are circled in the box titled Set B"


Set method: symmetric difference

set_a = {'Bulbasaur', 'Charmander', 'Squirtle'}
set_b = {'Caterpie', 'Pidgey', 'Squirtle'}
set_a.symmetric_difference(set_b)
{'Bulbasaur', 'Caterpie', 'Charmander', 'Pidgey'}

alt="The Pokémon named Bulbasaur, Charmander, and Squirtle enclosed in a box titled Set A and the Pokémon Caterpie, Pidgey, and Squirtle enclosed in a separate box titled Set B; Bulbasaur, Charmander, Caterpie, and Pidgey are circled"
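
To connect symmetric_difference() with the other methods, the sketch below checks two equivalent ways of building the same result. The names via_union and via_differences are illustrative only.

```python
set_a = {'Bulbasaur', 'Charmander', 'Squirtle'}
set_b = {'Caterpie', 'Pidgey', 'Squirtle'}

sym_diff = set_a.symmetric_difference(set_b)

# Everything in either set, minus what the two sets share
via_union = set_a.union(set_b).difference(set_a.intersection(set_b))

# The two one-way differences combined
via_differences = set_a.difference(set_b).union(set_b.difference(set_a))

print(sym_diff == via_union == via_differences)  # True
```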


Set method: union

set_a = {'Bulbasaur', 'Charmander', 'Squirtle'}
set_b = {'Caterpie', 'Pidgey', 'Squirtle'}
set_a.union(set_b)
{'Bulbasaur', 'Caterpie', 'Charmander', 'Pidgey', 'Squirtle'}

alt="The Pokémon named Bulbasaur, Charmander, and Squirtle enclosed in a box titled Set A and the Pokémon Caterpie, Pidgey, and Squirtle enclosed in a separate box titled Set B; all Pokémon are circled, and Squirtle is circled only once"
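
The union() method is not limited to two sets. As a small sketch, it also accepts other iterables and more than one argument; list_b as a list and the third collection tuple_c are hypothetical additions for illustration.

```python
set_a = {'Bulbasaur', 'Charmander', 'Squirtle'}
list_b = ['Caterpie', 'Pidgey', 'Squirtle']   # a list, not a set
tuple_c = ('Abra', 'Squirtle')                # hypothetical third collection

# union() accepts any number of iterables and returns a new set
all_names = set_a.union(list_b, tuple_c)

# The duplicates collapse, and set_a itself is left unchanged
print(sorted(all_names))
print(len(set_a))  # 3
```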


Membership testing with sets

# The same 720 total Pokémon in each data structure
names_list  = ['Abomasnow', 'Abra', 'Absol', ...]
names_tuple = ('Abomasnow', 'Abra', 'Absol', ...)
names_set   = {'Abomasnow', 'Abra', 'Absol', ...}

alt="The Pokémon named Abomasnow, Abra, and Absol enclosed in three separate boxes titled List, Tuple, and Set, respectively"


Membership testing with sets

# The same 720 total Pokémon in each data structure
names_list  = ['Abomasnow', 'Abra', 'Absol', ...]
names_tuple = ('Abomasnow', 'Abra', 'Absol', ...)
names_set   = {'Abomasnow', 'Abra', 'Absol', ...}

alt="The Pokémon named Abomasnow, Abra, and Absol enclosed in three separate boxes titled List, Tuple, and Set, respectively; a line is drawn from the Pokémon named Zubat to each box, representing a membership test against each one"

names_list  = ['Abomasnow', 'Abra', 'Absol', ...]
names_tuple = ('Abomasnow', 'Abra', 'Absol', ...)
names_set   = {'Abomasnow', 'Abra', 'Absol', ...}
%timeit 'Zubat' in names_list
7.63 µs ± 211 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit 'Zubat' in names_tuple
7.6 µs ± 394 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit 'Zubat' in names_set
37.5 ns ± 1.37 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
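
The gap in these timings comes from how each structure is searched: a list or tuple is scanned element by element, while a set uses hashing to jump to the value. The sketch below reproduces the comparison outside IPython. Because the full name list is elided above, it uses 720 placeholder names (an assumption, not the real data), and the timings will vary by machine.

```python
import timeit

# Hypothetical stand-in data: 720 placeholder names, not the real Pokémon list
names_list = [f'pokemon_{i}' for i in range(720)]
names_tuple = tuple(names_list)
names_set = set(names_list)

target = names_list[-1]  # last element: the slowest case for a linear scan
n_runs = 10_000          # hypothetical repeat count

for label, container in [('list', names_list),
                         ('tuple', names_tuple),
                         ('set', names_set)]:
    # The lambda is timed immediately, so it sees the current container
    seconds = timeit.timeit(lambda: target in container, number=n_runs)
    print(f'{label:5s}: {seconds / n_runs * 1e9:.0f} ns per lookup')
```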

Uniques with sets

# The primary type of each of the 720 Pokémon
primary_types = ['Grass', 'Psychic', 'Dark', 'Bug', ...]
unique_types = []

for prim_type in primary_types:
    if prim_type not in unique_types:
        unique_types.append(prim_type)

print(unique_types)
['Grass', 'Psychic', 'Dark', 'Bug', 'Steel', 'Rock', 'Normal',
 'Water', 'Dragon', 'Electric', 'Poison', 'Fire', 'Fairy', 'Ice',
 'Ground', 'Ghost', 'Fighting', 'Flying']

Uniques with sets

# The primary type of each of the 720 Pokémon
primary_types = ['Grass', 'Psychic', 'Dark', 'Bug', ...]
unique_types_set = set(primary_types)
print(unique_types_set)
{'Grass', 'Psychic', 'Dark', 'Bug', 'Steel', 'Rock', 'Normal',
 'Water', 'Dragon', 'Electric', 'Poison', 'Fire', 'Fairy', 'Ice',
 'Ground', 'Ghost', 'Fighting', 'Flying'}
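
One caveat: a set does not guarantee any particular order, so the printed order can differ from the original list. If first-seen order matters, dict.fromkeys() is a common alternative, since dictionaries keep insertion order in Python 3.7 and later. This sketch uses a short hypothetical sample in place of the elided 720-item list.

```python
# Hypothetical short sample; the full 720-item list is elided in the slide
primary_types = ['Grass', 'Psychic', 'Dark', 'Bug', 'Grass', 'Dark']

# Unique values, order not guaranteed
unique_types_set = set(primary_types)

# Unique values in the order they first appear
unique_types_ordered = list(dict.fromkeys(primary_types))

print(len(unique_types_set))   # 4
print(unique_types_ordered)    # ['Grass', 'Psychic', 'Dark', 'Bug']
```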

Let's practice set theory!
