Filtering data

Intermediate Python for Finance

Kennedy Behrman

Data Engineer, Author, Founder

Introducing the data

prices.head()
Intermediate Python for Finance

Introducing the data

prices.head()
Date Symbol High
0 2020-04-03 AAPL 245.70
1 2020-04-02 AAPL 245.15
2 2020-04-01 AAPL 248.72
3 2020-03-31 AAPL 262.49
4 2020-03-30 AAPL 255.52
Intermediate Python for Finance

Introducing the data

prices.describe()
Intermediate Python for Finance

Introducing the data

prices.describe()
High
count 378.000000
mean 881.593138
std 720.771922
min 227.490000
max 2185.950000
Intermediate Python for Finance

Introducing the data

prices.describe(include='object')
Symbol
count 378
unique 3
top AMZN
freq 126
Intermediate Python for Finance

Comparison operators

< <= > >= == !=

Intermediate Python for Finance

Column comparison

prices.High > 2160
Intermediate Python for Finance

Column comparison

prices.High > 2160
0      False
1      False
2      False
3      False
4      False
       ...  
374    False
375    False
376    False
377    False
Intermediate Python for Finance

Column comparison

prices.Symbol == 'AAPL'
Intermediate Python for Finance

Column comparison

prices.Symbol == 'AAPL'
0       True
1       True
2       True
3       True
4       True
       ...  
374    False
375    False
376    False
377    False
Intermediate Python for Finance

Masking by symbol

mask_symbol = prices.Symbol == 'AAPL'
aapl = prices.loc[mask_symbol]
Intermediate Python for Finance

Masking by symbol

mask_symbol = prices.Symbol == 'AAPL'
aapl = prices.loc[mask_symbol]
aapl.describe(include='object')
Symbol
count 126
unique 1
top AAPL
freq 126
Intermediate Python for Finance

Masking by price

mask_high = prices.High > 2160
big_price = prices.loc[mask_high]
Intermediate Python for Finance

Masking by price

big_price.describe()
High
count 6.000000
mean 2177.406567
std 7.999334
min 2166.070000
max 2185.95000
Intermediate Python for Finance

Pandas Boolean operators

  • And &
  • Or|
  • Not ~
Intermediate Python for Finance

Combining conditions

mask_prices = prices['Symbol'] != 'AMZN'
mask_date = historical_highs['Date'] > datetime(2020, 4, 1)
mask_amzn = mask_prices & mask_date
prices.loc[mask_amzn]
Intermediate Python for Finance

Combining conditions

Date Symbol High
0 2020-04-03 AAPL 245.7000
1 2020-04-02 AAPL 245.1500
252 2020-04-03 TSLA 515.4900
253 2020-04-02 TSLA 494.2599
Intermediate Python for Finance

Let's practice!

Intermediate Python for Finance

Preparing Video For Download...