Streamlined Data Ingestion with pandas
Amany Mahfouz
Instructor
read_excel()
loads the first sheet in an Excel file by defaultsheet_name
keyword argument to load other sheetsread_excel()
apply to all sheets read# Get the second sheet by position index survey_data_sheet2 = pd.read_excel('fcc_survey.xlsx', sheet_name=1)
# Get the second sheet by name survey_data_2017 = pd.read_excel('fcc_survey.xlsx', sheet_name='2017')
print(survey_data_sheet2.equals(survey_data_2017))
True
sheet_name=None
to read_excel()
reads all sheets in a workbooksurvey_responses = pd.read_excel("fcc_survey.xlsx", sheet_name=None)
print(type(survey_responses))
<class 'collections.OrderedDict'>
for key, value in survey_responses.items():
print(key, type(value))
2016 <class 'pandas.core.frame.DataFrame'>
2017 <class 'pandas.core.frame.DataFrame'>
# Create empty dataframe to hold all loaded sheets all_responses = pd.DataFrame()
# Iterate through dataframes in dictionary for sheet_name, frame in survey_responses.items(): # Add a column so we know which year data is from frame["Year"] = sheet_name
# Add the dataframe to all_responses all_responses = pd.concat([all_responses, frame])
# View years in data print(all_responses.Year.unique())
['2016' '2017']
Streamlined Data Ingestion with pandas