Filtering data

Python intermedio per la finanza

Kennedy Behrman

Data Engineer, Author, Founder

Introducing the data

prices.head()
Python intermedio per la finanza

Introducing the data

prices.head()
Date Symbol High
0 2020-04-03 AAPL 245.70
1 2020-04-02 AAPL 245.15
2 2020-04-01 AAPL 248.72
3 2020-03-31 AAPL 262.49
4 2020-03-30 AAPL 255.52
Python intermedio per la finanza

Introducing the data

prices.describe()
Python intermedio per la finanza

Introducing the data

prices.describe()
High
count 378.000000
mean 881.593138
std 720.771922
min 227.490000
max 2185.950000
Python intermedio per la finanza

Introducing the data

prices.describe(include='object')
Symbol
count 378
unique 3
top AMZN
freq 126
Python intermedio per la finanza

Comparison operators

< <= > >= == !=

Python intermedio per la finanza

Column comparison

prices.High > 2160
Python intermedio per la finanza

Column comparison

prices.High > 2160
0      False
1      False
2      False
3      False
4      False
       ...  
374    False
375    False
376    False
377    False
Python intermedio per la finanza

Column comparison

prices.Symbol == 'AAPL'
Python intermedio per la finanza

Column comparison

prices.Symbol == 'AAPL'
0       True
1       True
2       True
3       True
4       True
       ...  
374    False
375    False
376    False
377    False
Python intermedio per la finanza

Masking by symbol

mask_symbol = prices.Symbol == 'AAPL'
aapl = prices.loc[mask_symbol]
Python intermedio per la finanza

Masking by symbol

mask_symbol = prices.Symbol == 'AAPL'
aapl = prices.loc[mask_symbol]
aapl.describe(include='object')
Symbol
count 126
unique 1
top AAPL
freq 126
Python intermedio per la finanza

Masking by price

mask_high = prices.High > 2160
big_price = prices.loc[mask_high]
Python intermedio per la finanza

Masking by price

big_price.describe()
High
count 6.000000
mean 2177.406567
std 7.999334
min 2166.070000
max 2185.95000
Python intermedio per la finanza

Pandas Boolean operators

  • And &
  • Or|
  • Not ~
Python intermedio per la finanza

Combining conditions

mask_prices = prices['Symbol'] != 'AMZN'
mask_date = historical_highs['Date'] > datetime(2020, 4, 1)
mask_amzn = mask_prices & mask_date
prices.loc[mask_amzn]
Python intermedio per la finanza

Combining conditions

Date Symbol High
0 2020-04-03 AAPL 245.7000
1 2020-04-02 AAPL 245.1500
252 2020-04-03 TSLA 515.4900
253 2020-04-02 TSLA 494.2599
Python intermedio per la finanza

Let's practice!

Python intermedio per la finanza

Preparing Video For Download...