Combine data from multiple worksheets

Importing and Managing Financial Data in Python

Stefan Jansen

Instructor

Combine DataFrames

  • Concatenate or "stack" a list of pd.DataFrames
  • Syntax: pd.concat([amex, nasdaq, nyse])

NASDAQ, NYSE, and AMEX tables with the same columns

Importing and Managing Financial Data in Python

Combine DataFrames

  • Concatenate or "stack" a list of pd.DataFrames
  • Syntax: pd.concat([amex, nasdaq, nyse])

tables with axis = 0

Importing and Managing Financial Data in Python

Combine DataFrames

  • Concatenate or "stack" a list of pd.DataFrames
  • Syntax: pd.concat([amex, nasdaq, nyse])

three tables combined into one long table

Importing and Managing Financial Data in Python

Concatenate two DataFrames

amex = pd.read_excel('listings.xlsx',
                     sheet_name='amex', 
                     na_values=['n/a'])

nyse = pd.read_excel('listings.xlsx', sheet_name='nyse', na_values=['n/a'])
pd.concat([amex, nyse]).info()
Int64Index: 3507 entries, 0 to 3146
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype  
 --  ------                 --------------  -----  
 0   Stock Symbol           3507 non-null   object 
...
Importing and Managing Financial Data in Python

Add a reference column

amex['Exchange'] = 'AMEX' # Add column to reference source
nyse['Exchange'] = 'NYSE'

listings = pd.concat([amex, nyse])
listings.head(2)
  Stock Symbol    ...      Exchange       
0         XXII    ...         AMEX    
1          FAX    ...         AMEX
Importing and Managing Financial Data in Python

Combine three DataFrames

xls = pd.ExcelFile('listings.xlsx')

exchanges = xls.sheet_names
# Create empty list to collect DataFrames listings = []
for exchange in exchanges: listing = pd.read_excel(xls, sheet_name=exchange) # Add reference col listing['Exchange'] = exchange # Add DataFrame to list listings.append(listing)
# List of DataFrames combined_listings = pd.concat(listings)
Importing and Managing Financial Data in Python

Combine three DataFrames

combined_listings.info()
Int64Index: 6674 entries, 0 to 3146
Data columns (total 8 columns):
 #   Column                 Non-Null Count  Dtype  
 --  ------                 --------------  -----  
 0   Stock Symbol           6674 non-null   object 
 1   Company Name           6674 non-null   object 
 2   Last Sale              6590 non-null   float64
 3   Market Capitalization  6674 non-null   float64
 4   IPO Year               2852 non-null   float64
 5   Sector                 5182 non-null   object 
 6   Industry               5182 non-null   object 
 7   Exchange               6674 non-null   object 
dtypes: float64(3), object(5)
Importing and Managing Financial Data in Python

Let's practice!

Importing and Managing Financial Data in Python

Preparing Video For Download...