Streamlined Data Ingestion with pandas
Amany Mahfouz
Instructor
read_excel() loads the first sheet in an Excel file by defaultsheet_name keyword argument to load other sheetsread_excel() apply to all sheets read
# Get the second sheet by position index survey_data_sheet2 = pd.read_excel('fcc_survey.xlsx', sheet_name=1)# Get the second sheet by name survey_data_2017 = pd.read_excel('fcc_survey.xlsx', sheet_name='2017')print(survey_data_sheet2.equals(survey_data_2017))
True
sheet_name=None to read_excel() reads all sheets in a workbooksurvey_responses = pd.read_excel("fcc_survey.xlsx", sheet_name=None)
print(type(survey_responses))
<class 'collections.OrderedDict'>
for key, value in survey_responses.items():
    print(key, type(value))
2016 <class 'pandas.core.frame.DataFrame'>
2017 <class 'pandas.core.frame.DataFrame'>
# Create empty dataframe to hold all loaded sheets all_responses = pd.DataFrame()# Iterate through dataframes in dictionary for sheet_name, frame in survey_responses.items(): # Add a column so we know which year data is from frame["Year"] = sheet_name# Add the dataframe to all_responses all_responses = pd.concat([all_responses, frame])# View years in data print(all_responses.Year.unique())
['2016' '2017']
Streamlined Data Ingestion with pandas