Streamlined Data Ingestion with pandas
Amany Mahfouz
Instructor
True/False data

import pandas as pd

# Load the survey data with pandas' default type inference
bootcamp_data = pd.read_excel("fcc_survey_booleans.xlsx")
print(bootcamp_data.dtypes)
ID.x object
AttendedBootcamp float64
AttendedBootCampYesNo object
AttendedBootcampTF float64
BootcampLoan float64
LoanYesNo object
LoanTF float64
dtype: object
# Count True values
print(bootcamp_data.sum())
AttendedBootcamp 38
AttendedBootcampTF 38
BootcampLoan 14
LoanTF 14
dtype: object
# Count NAs
print(bootcamp_data.isna().sum())
ID.x 0
AttendedBootcamp 0
AttendedBootCampYesNo 0
AttendedBootcampTF 0
BootcampLoan 964
LoanYesNo 964
LoanTF 964
dtype: int64
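The survey workbook is not included here, so the sketch below uses a small hypothetical DataFrame to reproduce the two counting steps above. It assumes flags are coded 1.0 for True, 0.0 for False, and NaN for missing. sum() skips NaN, so it counts the 1.0 entries, while isna().sum() counts the missing cells in each column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for two survey columns: 1.0 = True, 0.0 = False, NaN = missing
survey = pd.DataFrame({
    "AttendedBootcamp": [1.0, 0.0, 0.0, 1.0],
    "BootcampLoan": [1.0, np.nan, np.nan, 0.0],
})

# sum() skips NaN, so it adds up the 1.0 (True) entries in each column
true_counts = survey.sum()

# isna() flags missing cells; summing those flags counts them per column
na_counts = survey.isna().sum()

print(true_counts)
print(na_counts)
```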
# Load data, casting True/False columns as Boolean
bool_data = pd.read_excel("fcc_survey_booleans.xlsx",
                          dtype={"AttendedBootcamp": bool,
                                 "AttendedBootCampYesNo": bool,
                                 "AttendedBootcampTF": bool,
                                 "BootcampLoan": bool,
                                 "LoanYesNo": bool,
                                 "LoanTF": bool})
print(bool_data.dtypes)
ID.x object
AttendedBootcamp bool
AttendedBootCampYesNo bool
AttendedBootcampTF bool
BootcampLoan bool
LoanYesNo bool
LoanTF bool
dtype: object
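Typing one dtype entry per column gets repetitive. A minimal sketch, assuming a hypothetical in-memory CSV in place of the Excel file, builds the mapping with a dict comprehension and passes it to read_csv(), which accepts the same dtype argument. The mixed-case "True"/"FALSE"/"true" strings are among the values pandas recognizes as Booleans on its own.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV reusing two column names from the survey example
csv_text = (
    "ID.x,AttendedBootcamp,BootcampLoan\n"
    "x1,True,FALSE\n"
    "x2,false,TRUE\n"
)

# Build the column -> dtype mapping once instead of listing each column by hand
bool_cols = ["AttendedBootcamp", "BootcampLoan"]
dtypes = {col: bool for col in bool_cols}

data = pd.read_csv(StringIO(csv_text), dtype=dtypes)
print(data.dtypes)
```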
# Count True values
print(bool_data.sum())
AttendedBootcamp 38
AttendedBootCampYesNo 1000
AttendedBootcampTF 38
BootcampLoan 978
LoanYesNo 1000
LoanTF 978
dtype: object
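The inflated counts above (1000 for the Yes/No columns, 978 for the loan columns) follow from how a plain cast to bool treats values: NaN and any non-empty string, including "No", are truthy. The sketch below reproduces this on hypothetical data with DataFrame.astype(bool), which is assumed here to mirror what the dtype argument does at load time.

```python
import numpy as np
import pandas as pd

# Hypothetical columns mimicking the survey's two encodings, each with a missing value
raw = pd.DataFrame({
    "LoanTF": [1.0, 0.0, np.nan],        # numeric flag
    "LoanYesNo": ["Yes", "No", np.nan],  # text flag
})

# astype(bool) uses truthiness: 0.0 -> False, but NaN and every non-empty string -> True
as_bool = raw.astype(bool)

print(as_bool)
print(as_bool.sum())
```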
# Count NA values
print(bool_data.isna().sum())
ID.x 0
AttendedBootcamp 0
AttendedBootCampYesNo 0
AttendedBootcampTF 0
BootcampLoan 0
LoanYesNo 0
LoanTF 0
dtype: int64
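The NA counts drop to zero because NumPy's bool type has no slot for missing values, so NaN is silently turned into True. As an alternative not covered on these slides, pandas' nullable "boolean" extension dtype keeps missing entries as <NA>. The sketch below compares the two on a hypothetical loan flag coded 1.0/0.0/NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical loan flag: 1.0 = True, 0.0 = False, NaN = missing
loan = pd.Series([1.0, 0.0, np.nan, np.nan])

# NumPy bool cannot store missing values, so NaN becomes True
plain = loan.astype(bool)

# The nullable "boolean" dtype keeps missing values as <NA>
nullable = loan.astype("boolean")

print(plain.sum(), plain.isna().sum())        # missing values were counted as True
print(nullable.sum(), nullable.isna().sum())  # missing values kept and skipped by sum()
```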
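pandas loads True/False columns as float data by default.
Specify that a column should be bool with read_excel()'s dtype argument.
Boolean columns can only have True and False values.
NA/missing values in Boolean columns are changed to True.
pandas automatically recognizes some values as True/False in Boolean columns.
Unrecognized values in a Boolean column are also changed to True.
Use read_excel()'s true_values argument to set custom True values.
Use false_values to set custom False values.
Each takes a list of values to treat as True/False, respectively.
Custom True/False values are only applied to columns set as Boolean.
# Load data with Boolean dtypes and custom T/F values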
bool_data = pd.read_excel("fcc_survey_booleans.xlsx",
                          dtype={"AttendedBootcamp": bool,
                                 "AttendedBootCampYesNo": bool,
                                 "AttendedBootcampTF": bool,
                                 "BootcampLoan": bool,
                                 "LoanYesNo": bool,
                                 "LoanTF": bool},
                          true_values=["Yes"],
                          false_values=["No"])
print(bool_data.sum())
AttendedBootcamp 38
AttendedBootCampYesNo 38
AttendedBootcampTF 38
BootcampLoan 978
LoanYesNo 978
LoanTF 978
dtype: object
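read_csv() accepts the same true_values and false_values arguments, so the mapping can be tried without the Excel file. A minimal sketch on a hypothetical CSV: "Yes" becomes True and "No" becomes False in the columns cast to bool. The sample deliberately has no missing values, because read_csv() raises an error when a column given dtype bool contains NA.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV with Yes/No flags and no missing values
csv_text = (
    "ID,AttendedYesNo,LoanYesNo\n"
    "a,Yes,No\n"
    "b,No,No\n"
    "c,Yes,Yes\n"
)

# Cast the flag columns to bool and map the custom strings onto True/False
flags = pd.read_csv(
    StringIO(csv_text),
    dtype={"AttendedYesNo": bool, "LoanYesNo": bool},
    true_values=["Yes"],
    false_values=["No"],
)

print(flags.dtypes)
print(flags[["AttendedYesNo", "LoanYesNo"]].sum())
```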