Streamlined Data Ingestion with pandas
Amany Mahfouz
Instructor
True/False data
import pandas as pd
bootcamp_data = pd.read_excel("fcc_survey_booleans.xlsx")
print(bootcamp_data.dtypes)
ID.x object
AttendedBootcamp float64
AttendedBootCampYesNo object
AttendedBootcampTF float64
BootcampLoan float64
LoanYesNo object
LoanTF float64
dtype: object
# Count True values
print(bootcamp_data.sum())
AttendedBootcamp 38
AttendedBootcampTF 38
BootcampLoan 14
LoanTF 14
dtype: object
# Count NAs
print(bootcamp_data.isna().sum())
ID.x 0
AttendedBootcamp 0
AttendedBootCampYesNo 0
AttendedBootcampTF 0
BootcampLoan 964
LoanYesNo 964
LoanTF 964
dtype: int64
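Summing works as a count here because the flags are stored as 1.0 and 0.0, and sum() skips missing values by default. A minimal sketch of both counts, using a small made-up DataFrame rather than the survey file (the column names are borrowed from the output above; the values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the survey data: 1.0/0.0 flags with missing values
toy = pd.DataFrame({
    "AttendedBootcamp": [1.0, 0.0, 0.0, 1.0],
    "BootcampLoan": [1.0, np.nan, np.nan, 0.0],
})

# sum() skips NaN by default, so only the 1.0 entries add to the total
true_counts = toy.sum()

# isna() marks missing cells; summing those marks counts them per column
na_counts = toy.isna().sum()

print(true_counts)
print(na_counts)
```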
# Load data, casting True/False columns as Boolean
bool_data = pd.read_excel("fcc_survey_booleans.xlsx",
                          dtype={"AttendedBootcamp": bool,
                                 "AttendedBootCampYesNo": bool,
                                 "AttendedBootcampTF": bool,
                                 "BootcampLoan": bool,
                                 "LoanYesNo": bool,
                                 "LoanTF": bool})
print(bool_data.dtypes)
ID.x object
AttendedBootcamp bool
AttendedBootCampYesNo bool
AttendedBootcampTF bool
BootcampLoan bool
LoanYesNo bool
LoanTF bool
dtype: object
# Count True values
print(bool_data.sum())
AttendedBootcamp 38
AttendedBootCampYesNo 1000
AttendedBootcampTF 38
BootcampLoan 978
LoanYesNo 1000
LoanTF 978
dtype: object
# Count NA values
print(bool_data.isna().sum())
ID.x 0
AttendedBootcamp 0
AttendedBootCampYesNo 0
AttendedBootcampTF 0
BootcampLoan 0
LoanYesNo 0
LoanTF 0
dtype: int64
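The counts above are consistent with how a plain cast to bool behaves: the 964 missing loan values plus the 14 real True values give 978, and the Yes/No text columns come out entirely True. The sketch below reproduces that effect with astype(bool) on made-up Series. It is an assumption that this mirrors the conversion read_excel() applied to produce the output above; it is meant only to illustrate why NaN and text end up as True.

```python
import numpy as np
import pandas as pd

# Made-up float column with one missing value
loan_float = pd.Series([1.0, 0.0, np.nan])
# NaN is a nonzero float, so casting to bool turns it into True
loan_bool = loan_float.astype(bool)

# Made-up text column; object dtype is set explicitly for this illustration
yes_no = pd.Series(["Yes", "No"], dtype=object)
# Any non-empty string is truthy, so both "Yes" and "No" become True
yes_no_bool = yes_no.astype(bool)

print(loan_bool.tolist())
print(yes_no_bool.tolist())
# After the cast, no missing values remain to be counted
print(loan_bool.isna().sum())
```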
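pandas loads True/False columns as float data by default
Specify that a column should be bool with read_excel()'s dtype argument
Boolean columns can only hold True and False values
NA/missing values in Boolean columns are coded as True
pandas automatically recognizes some values as True/False in Boolean columns
Unrecognized values in a Boolean column are also changed to True
Use read_excel()'s true_values argument to set custom True values
Use false_values to set custom False values
Each takes a list of values to treat as True/False, respectively
Custom True/False values are only applied to columns set as Boolean
# Load data with Boolean dtypes and custom T/F values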
bool_data = pd.read_excel("fcc_survey_booleans.xlsx",
                          dtype={"AttendedBootcamp": bool,
                                 "AttendedBootCampYesNo": bool,
                                 "AttendedBootcampTF": bool,
                                 "BootcampLoan": bool,
                                 "LoanYesNo": bool,
                                 "LoanTF": bool},
                          true_values=["Yes"],
                          false_values=["No"])
print(bool_data.sum())
AttendedBootcamp 38
AttendedBootCampYesNo 38
AttendedBootcampTF 38
BootcampLoan 978
LoanYesNo 978
LoanTF 978
dtype: object
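For a version that runs without the Excel file, the sketch below applies the same three arguments to read_csv(), which also accepts dtype, true_values, and false_values. The CSV text is made up and contains no missing values; swapping read_csv() in for read_excel() is a convenience for illustration, and the two readers are not guaranteed to treat missing values in Boolean columns the same way.

```python
import io

import pandas as pd

# Made-up rows standing in for the survey file; no missing values here
csv_text = "ID,LoanYesNo\na,Yes\nb,No\nc,No\nd,Yes\n"

loans = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"LoanYesNo": bool},   # custom values only apply to Boolean columns
    true_values=["Yes"],         # strings to read as True
    false_values=["No"],         # strings to read as False
)

print(loans.dtypes)
# Counts only the rows that said "Yes"
print(loans["LoanYesNo"].sum())
```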
What would happen if a value were incorrectly coded as True?