Streamlined Data Ingestion with pandas
Amany Mahfouz
Instructor
pandas
: read_excel()
import pandas as pd # Read the Excel file survey_data = pd.read_excel("fcc_survey.xlsx")
# View the first 5 lines of data print(survey_data.head())
Age AttendedBootcamp ... SchoolMajor StudentDebtOwe
0 28.0 0.0 ... NaN 20000
1 22.0 0.0 ... NaN NaN
2 19.0 0.0 ... NaN NaN
3 26.0 0.0 ... Cinematography And Film 7000
4 20.0 0.0 ... NaN NaN
[5 rows x 98 columns]
read_excel()
has many keyword arguments in common with read_csv()
nrows
: limit number of rows to loadskiprows
: specify number of rows or row numbers to skipusecols
: choose columns by name, positional number, or letter (e.g. "A:P")# Read columns W-AB and AR of file, skipping metadata header survey_data = pd.read_excel("fcc_survey_with_headers.xlsx", skiprows=2, usecols="W:AB, AR")
# View data print(survey_data.head())
CommuteTime CountryCitizen ... EmploymentFieldOther EmploymentStatus Income
0 35.0 United States of America ... NaN Employed for wages 32000.0
1 90.0 United States of America ... NaN Employed for wages 15000.0
2 45.0 United States of America ... NaN Employed for wages 48000.0
3 45.0 United States of America ... NaN Employed for wages 43000.0
4 10.0 United States of America ... NaN Employed for wages 6000.0
[5 rows x 7 columns]
Streamlined Data Ingestion with pandas