Getting data from multiple worksheets

Streamlined Data Ingestion with pandas

Amany Mahfouz

Instructor

Selecting Sheets to Load

  • read_excel() loads the first sheet in an Excel file by default
  • Use the sheet_name keyword argument to load other sheets
  • Specify spreadsheets by name and/or (zero-indexed) position number
  • Pass a list of names/numbers to load more than one sheet at a time
  • Any arguments passed to read_excel() apply to all sheets read
Streamlined Data Ingestion with pandas

Selecting Sheets to Load

Screenshot of a spreadsheet program, showing tabs for two spreadsheets

Streamlined Data Ingestion with pandas

Loading Select Sheets

# Get the second sheet by position index
survey_data_sheet2 = pd.read_excel('fcc_survey.xlsx',
                                   sheet_name=1)

# Get the second sheet by name survey_data_2017 = pd.read_excel('fcc_survey.xlsx', sheet_name='2017')
print(survey_data_sheet2.equals(survey_data_2017))
True
Streamlined Data Ingestion with pandas

Loading All Sheets

  • Passing sheet_name=None to read_excel() reads all sheets in a workbook
survey_responses = pd.read_excel("fcc_survey.xlsx", sheet_name=None)

print(type(survey_responses))
<class 'collections.OrderedDict'>
for key, value in survey_responses.items():
    print(key, type(value))
2016 <class 'pandas.core.frame.DataFrame'>
2017 <class 'pandas.core.frame.DataFrame'>
Streamlined Data Ingestion with pandas

Putting It All Together

# Create empty dataframe to hold all loaded sheets
all_responses = pd.DataFrame()

# Iterate through dataframes in dictionary for sheet_name, frame in survey_responses.items(): # Add a column so we know which year data is from frame["Year"] = sheet_name
# Add the dataframe to all_responses all_responses = pd.concat([all_responses, frame])
# View years in data print(all_responses.Year.unique())
['2016' '2017']
Streamlined Data Ingestion with pandas

Let's practice!

Streamlined Data Ingestion with pandas

Preparing Video For Download...