Importing and Managing Financial Data in Python
Stefan Jansen
Instructor
pd.DataFrame()
is same as CSV source fileamex-listings.csv
dtype
)dtype
affects your calculation and visualizationpandas dtype |
Column characteristics |
---|---|
object |
Text, or a mix of text and numeric data |
int64 |
Numeric: whole numbers - 64 bits ($\le 2^{64}$) |
float64 |
Numeric: Decimals, or whole numbers with missing values |
datetime64 |
Date and time information |
import pandas as pd
amex = pd.read_csv('amex-listings.csv')
amex.info() # To inspect table structure & data types
RangeIndex: 360 entries, 0 to 359
Data columns (total 8 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Stock Symbol 360 non-null object
1 Company Name 360 non-null object
2 Last Sale 346 non-null float64
3 Market Capitalization 360 non-null float64
4 IPO Year 105 non-null float64
5 Sector 238 non-null object
6 Industry 238 non-null object
7 Last Update 360 non-null object
dtypes: float64(3), object(5)
# Replace 'n/a' with np.nan amex = pd.read_csv('amex-listings.csv', na_values='n/a')
amex.info()
RangeIndex: 360 entries, 0 to 359
Data columns (total 8 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Stock Symbol 360 non-null object
1 Company Name 360 non-null object
2 Last Sale 346 non-null float64
3 Market Capitalization 360 non-null float64
4 IPO Year 105 non-null float64
5 Sector 238 non-null object
6 Industry 238 non-null object
7 Last Update 360 non-null object
dtypes: float64(3), object(5)
amex = pd.read_csv('amex-listings.csv', na_values='n/a', parse_dates=['Last Update'])
amex.info()
RangeIndex: 360 entries, 0 to 359
Data columns (total 8 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Stock Symbol 360 non-null object
1 Company Name 360 non-null object
2 Last Sale 346 non-null float64
3 Market Capitalization 360 non-null float64
4 IPO Year 105 non-null float64
5 Sector 238 non-null object
6 Industry 238 non-null object
7 Last Update 360 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(3), object(4)
amex.head(2) # Show first n rows (default: 5)
Stock Symbol Company Name
0 XXII 22nd Century Group, Inc
1 FAX Aberdeen Asia-Pacific Income Fund Inc
Last Sale Market Capitalization IPO Year
0 1.3300 1.206285e+08 NaN
1 5.0000 1.266333e+09 1986.0
Sector Industry Last Update
0 Non-Durables Farming/Seeds/Milling 2017-04-26
1 NaN NaN 2017-04-25
Importing and Managing Financial Data in Python