Writing Efficient Code with pandas
Leonidas Souliotis
PhD Candidate
| | S1 | R1 | S2 | R2 | S3 | R3 | S4 | R4 | S5 | R5 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ♦ | 10 | ♣ | Jack | ♣ | King | ♠ | 4 | ♥ | Ace |
| 2 | ♦ | Jack | ♦ | King | ♦ | 10 | ♦ | Queen | ♦ | Ace |
| 3 | ♣ | Queen | ♣ | Jack | ♣ | King | ♣ | 10 | ♣ | Ace |
| | S1 | R1 | S2 | R2 | S3 | R3 | S4 | R4 | S5 | R5 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 10 | 3 | 11 | 3 | 13 | 4 | 4 | 1 | 1 |
| 2 | 2 | 11 | 2 | 13 | 2 | 10 | 2 | 12 | 2 | 1 |
| 3 | 3 | 12 | 3 | 11 | 3 | 13 | 3 | 10 | 3 | 1 |
Sn: symbol (suit) of the n-th card
1 (Hearts), 2 (Diamonds), 3 (Clubs), 4 (Spades)
Rn: rank of the n-th card
1 (Ace), 2-10, 11 (Jack), 12 (Queen), 13 (King)
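The encoding above can be sketched in code. This is a minimal illustration, not the course's own code: the names `hands`, `SUITS`, `FACE_RANKS`, and `decode` are hypothetical, and the three rows are copied from the encoded table.

```python
import pandas as pd

# Column names follow the slides: Sn = symbol (suit), Rn = rank of the n-th card.
columns = ["S1", "R1", "S2", "R2", "S3", "R3", "S4", "R4", "S5", "R5"]

# The three encoded hands from the table above, indexed 1..3.
hands = pd.DataFrame(
    [
        [2, 10, 3, 11, 3, 13, 4, 4, 1, 1],
        [2, 11, 2, 13, 2, 10, 2, 12, 2, 1],
        [3, 12, 3, 11, 3, 13, 3, 10, 3, 1],
    ],
    columns=columns,
    index=[1, 2, 3],
)

# Hypothetical lookup tables built from the legend above.
SUITS = {1: "Hearts", 2: "Diamonds", 3: "Clubs", 4: "Spades"}
FACE_RANKS = {1: "Ace", 11: "Jack", 12: "Queen", 13: "King"}


def decode(encoded):
    """Return a copy with suit and rank codes replaced by readable names."""
    decoded = encoded.copy()
    for col in decoded.columns:
        if col.startswith("S"):
            decoded[col] = decoded[col].map(SUITS)
        else:
            # Ranks 2-10 have no special name, so keep them as strings.
            decoded[col] = decoded[col].map(lambda r: FACE_RANKS.get(r, str(r)))
    return decoded


decoded = decode(hands)
print(decoded)
```

Decoding is only for readability; the timing examples work on the numeric encoding.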
.loc[]: index name locator
import time

# data: the poker-hand DataFrame shown above
# Specify the range of rows to select
rows = range(0, 500)
# Time selecting rows using .loc[]
loc_start_time = time.time()
data.loc[rows]
loc_end_time = time.time()
print("Time using .loc[]: {} sec".format(
    loc_end_time - loc_start_time))
Time using .loc[]: 0.001951932 sec
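To make "index name locator" concrete, here is a small sketch on a hypothetical toy frame (`toy` and its string labels are made up for illustration, not taken from the poker data). It shows that `.loc[]` looks rows up by label, with the positional `.iloc[]` included only for contrast.

```python
import pandas as pd

# Hypothetical toy frame; the string labels are invented for illustration.
toy = pd.DataFrame(
    {"S1": [2, 2, 3, 1], "R1": [10, 11, 12, 5]},
    index=["a", "b", "c", "d"],
)

# .loc[] selects by index label; a label slice includes both endpoints.
by_label = toy.loc["a":"c"]
print(by_label.index.tolist())     # ['a', 'b', 'c']

# A list of labels returns the rows in the order requested.
picked = toy.loc[["d", "a"]]
print(picked["R1"].tolist())       # [5, 10]

# .iloc[] selects by integer position; the slice end is excluded.
by_position = toy.iloc[0:2]
print(by_position.index.tolist())  # ['a', 'b']
```

With the default integer index used in the timing example, labels and positions coincide, which is why `data.loc[rows]` and `data.iloc[rows]` return the same rows there.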
.iloc[]: index number locator
# Specify the range of rows to select
rows = range(0, 500)
# Time selecting rows using .iloc[]
iloc_start_time = time.time()
data.iloc[rows]
iloc_end_time = time.time()
print("Time using .iloc[]: {} sec".format(
    iloc_end_time - iloc_start_time))
Time using .iloc[]: 0.0007140636 sec
Difference in speed: 173.355592654%
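The slides do not show how the percentage is computed. One formula consistent with the two timings is the extra time of the slower call relative to the faster one. The sketch below assumes that formula; `percent_slower` is a hypothetical helper name.

```python
# Timings copied from the outputs above (seconds).
loc_time = 0.001951932
iloc_time = 0.0007140636


def percent_slower(slow, fast):
    """Return how much longer `slow` took than `fast`, as a percentage of `fast`."""
    return (slow - fast) / fast * 100


diff = percent_slower(loc_time, iloc_time)
print("Difference in speed: {:.2f}%".format(diff))
```

Under this reading, `.loc[]` took roughly 1.7 times longer again than `.iloc[]` for this selection. Single-run timings this small are noisy, so the exact figure will vary between runs and machines.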
.iloc[]: index number locator
# Time selecting the first three columns using .iloc[]
iloc_start_time = time.time()
data.iloc[:, :3]
iloc_end_time = time.time()
print("Time using .iloc[]: {} sec".format(
    iloc_end_time - iloc_start_time))
Time using .iloc[]: 0.00125193595886 sec
Locating columns by names
# Time selecting the same three columns by name
names_start_time = time.time()
data[['S1', 'R1', 'S2']]
names_end_time = time.time()
print("Time using selection by name: {} sec".format(
    names_end_time - names_start_time))
Time using selection by name: 0.000964879989624 sec
Difference in speed: 29.7504324188%
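A single `time.time()` measurement of a sub-millisecond operation is sensitive to noise. As a hedged alternative, the sketch below repeats each column selection with the standard-library `timeit` module and keeps the best run. The DataFrame here is a synthetic stand-in with random values (an assumption, not the real poker data), so the timings it prints are not comparable to the figures above.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the poker data (an assumption; values are random).
columns = ["S1", "R1", "S2", "R2", "S3", "R3", "S4", "R4", "S5", "R5"]
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(1, 5, size=(25_000, 10)), columns=columns)

# Both selections return the same first three columns.
by_position = data.iloc[:, :3]
by_name = data[["S1", "R1", "S2"]]
assert by_position.equals(by_name)

# Repeat each selection many times and keep the best run to dampen noise.
n = 1_000
iloc_time = min(
    timeit.repeat(lambda: data.iloc[:, :3], number=n, repeat=5)
) / n
name_time = min(
    timeit.repeat(lambda: data[["S1", "R1", "S2"]], number=n, repeat=5)
) / n

print("Time using .iloc[]: {:.3e} sec per call".format(iloc_time))
print("Time using selection by name: {:.3e} sec per call".format(name_time))
```

Which approach comes out ahead can depend on the pandas version and the data, so it is worth measuring on the actual DataFrame rather than relying on one run.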