Modifying imports: true/false data

Acquisizione dati semplificata con pandas

Amany Mahfouz

Instructor

Boolean Data

  • True/False data

Spreadsheet of survey data with several true/false columns

Acquisizione dati semplificata con pandas

Boolean Data

Spreadsheet of survey data with columns about bootcamp attendance highlighted

Acquisizione dati semplificata con pandas

Boolean Data

Spreadsheet of survey data with columns about loan usage highlighted

Acquisizione dati semplificata con pandas

Boolean Data

Spreadsheet of survey data with several True/False columns

Acquisizione dati semplificata con pandas

Boolean Data

Spreadsheet of survey data with columns using zeros and ones highlighted

Acquisizione dati semplificata con pandas

Boolean Data

Spreadsheet of survey data with columns using Trues and Falses highlighted

Acquisizione dati semplificata con pandas

Boolean Data

Spreadsheet of survey data with columns using yeses and nos highlighted

Acquisizione dati semplificata con pandas

pandas and Booleans

bootcamp_data = pd.read_excel("fcc_survey_booleans.xlsx")
print(bootcamp_data.dtypes)
ID.x                      object
AttendedBootcamp         float64
AttendedBootCampYesNo     object
AttendedBootcampTF       float64
BootcampLoan             float64
LoanYesNo                 object
LoanTF                   float64
dtype: object
Acquisizione dati semplificata con pandas

pandas and Booleans

# Count True values
print(bootcamp_data.sum())
AttendedBootcamp                      38
AttendedBootcampTF                    38
BootcampLoan                          14
LoanTF                                14
dtype: object
# Count NAs
print(bootcamp_data.isna().sum())
ID.x                           0
AttendedBootcamp               0
AttendedBootCampYesNo          0
AttendedBootcampTF             0
BootcampLoan                 964
LoanYesNo                    964
LoanTF                       964
dtype: int64
Acquisizione dati semplificata con pandas
# Load data, casting True/False columns as Boolean
bool_data = pd.read_excel("fcc_survey_booleans.xlsx", 
                          dtype={"AttendedBootcamp": bool,
                                "AttendedBootCampYesNo": bool,
                                "AttendedBootcampTF":bool,
                                "BootcampLoan": bool,
                                "LoanYesNo": bool,
                                "LoanTF": bool})

print(bool_data.dtypes)
ID.x                      object
AttendedBootcamp            bool
AttendedBootCampYesNo       bool
AttendedBootcampTF          bool
BootcampLoan                bool
LoanYesNo                   bool
LoanTF                      bool
dtype: object
Acquisizione dati semplificata con pandas
# Count True values
print(bool_data.sum())
AttendedBootcamp                         38
AttendedBootCampYesNo                  1000
AttendedBootcampTF                       38
BootcampLoan                            978
LoanYesNo                              1000
LoanTF                                  978
dtype: object
# Count NA values
print(bool_data.isna().sum())
ID.x                       0
AttendedBootcamp           0
AttendedBootCampYesNo      0
AttendedBootcampTF         0
BootcampLoan               0
LoanYesNo                  0
LoanTF                     0
dtype: int64
Acquisizione dati semplificata con pandas

pandas and Booleans

  • pandas loads True/False columns as float data by default
  • Specify a column should be bool with read_excel()'s dtype argument
  • Boolean columns can only have True and False values
  • NA/missing values in Boolean columns are changed to True
  • pandas automatically recognizes some values as True/False in Boolean columns
  • Unrecognized values in a Boolean column are also changed to True
Acquisizione dati semplificata con pandas

Setting Custom True/False Values

  • Use read_excel()'strue_values argument to set custom True values
  • Use false_values to set custom False values
  • Each takes a list of values to treat as True/False, respectively
  • Custom True/False values are only applied to columns set as Boolean
Acquisizione dati semplificata con pandas

Setting Custom True/False Values

# Load data with Boolean dtypes and custom T/F values
bool_data = pd.read_excel("fcc_survey_booleans.xlsx", 
                          dtype={"AttendedBootcamp": bool,
                                "AttendedBootCampYesNo": bool,
                                "AttendedBootcampTF":bool,
                                "BootcampLoan": bool,
                                "LoanYesNo": bool,
                                "LoanTF": bool},
                          true_values=["Yes"],
                          false_values=["No"])
Acquisizione dati semplificata con pandas

Setting Custom True/False Values

print(bool_data.sum())
AttendedBootcamp                                  38
AttendedBootCampYesNo                             38
AttendedBootcampTF                                38
BootcampLoan                                     978
LoanYesNo                                        978
LoanTF                                           978
dtype: object
Acquisizione dati semplificata con pandas

Boolean Considerations

  • Are there missing values, or could there be in the future?
  • How will this column be used in analysis?
  • What would happen if a value were incorrectly coded as True?
  • Could the data be modeled another way (e.g., as floats or integers)?
Acquisizione dati semplificata con pandas

Let's practice!

Acquisizione dati semplificata con pandas

Preparing Video For Download...