Working With Data Types

Preprocessing for Machine Learning in Python

James Chapman

Curriculum Manager, DataCamp

Why are types important?

volunteer.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 665 entries, 0 to 664
Data columns (total 35 columns):
 #   Column              Non-Null Count  Dtype  
 --  ------              --------------  -----  
 0   opportunity_id      665 non-null    int64  
 1   content_id          665 non-null    int64  
 2   vol_requests        665 non-null    int64  
 3   event_time          665 non-null    int64  
 4   title               665 non-null    object 
 ..  ...                 ...             ...
 34  NTA                 0 non-null      float64
dtypes: float64(13), int64(8), object(14)
memory usage: 182.0+ KB

object: strings or mixed types
int64: 64-bit integers
float64: 64-bit floating-point numbers
datetime64: dates and times
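To see these dtypes side by side, here is a minimal sketch using a small, made-up DataFrame (the column names and values are hypothetical, not taken from the volunteer dataset). Dates stored as text arrive as object, so they need `pd.to_datetime` to become datetime64. `select_dtypes` can then pick out columns by type, for example to find the numeric features.

```python
import pandas as pd

# Hypothetical toy data: one column per dtype family
df = pd.DataFrame({
    "count": [1, 2, 3],                                  # integers -> int64
    "score": [0.5, 1.5, 2.5],                            # floats -> float64
    "title": ["a", "b", "c"],                            # strings -> object (newer pandas may use a string dtype)
    "date": ["2024-01-01", "2024-02-01", "2024-03-01"],  # text dates start out as strings
})

# Parse the text dates into a proper datetime64 column
df["date"] = pd.to_datetime(df["date"])
print(df.dtypes)

# select_dtypes filters columns by dtype; "number" matches int and float columns
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```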

Converting column types

print(df)

   A        B    C
0  1   string  1.0
1  2  string2  2.0
2  3  string3  3.0

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
 --  ------  --------------  ----- 
 0   A       3 non-null      int64 
 1   B       3 non-null      object
 2   C       3 non-null      object
dtypes: int64(1), object(2)
memory usage: 200.0+ bytes
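Column C looks numeric when printed, but `info()` reports it as object, which suggests the values are stored as text. The sketch below assumes C holds the strings "1.0", "2.0" and "3.0" (a guess consistent with the output above, not confirmed by the source) and shows why that matters: operations on the column do not behave numerically until it is converted.

```python
import pandas as pd

# Rebuild the example frame; C is assumed to hold numbers stored as text (object dtype)
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": ["string", "string2", "string3"],
    "C": pd.Series(["1.0", "2.0", "3.0"], dtype=object),
})

# Summing text concatenates the strings instead of adding numbers
text_total = df["C"].sum()
print(text_total)

# Arithmetic with a number fails outright on string values
try:
    df["C"] + 1
except TypeError as err:
    print("TypeError:", err)

# After conversion to float, the same sum is numeric
numeric_total = df["C"].astype("float").sum()
print(numeric_total)
```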

df["C"] = df["C"].astype("float")
print(df.dtypes)

A      int64
B     object
C    float64
dtype: object
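The `"float"` alias resolves to float64, as the dtypes output shows, and `astype` works here because every value in C parses as a number. A single unparseable entry makes `astype` raise a `ValueError` for the whole column. One common alternative, sketched below with hypothetical messy values, is `pd.to_numeric` with `errors="coerce"`, which turns entries that cannot be parsed into NaN so they can be handled as missing data.

```python
import pandas as pd

# Hypothetical column with one value that is not a number
messy = pd.Series(["1.0", "2.0", "n/a"], dtype=object)

# astype refuses the whole conversion if any value fails to parse
try:
    messy.astype("float")
except ValueError as err:
    print("astype failed:", err)

# to_numeric with errors="coerce" converts unparseable entries to NaN instead
cleaned = pd.to_numeric(messy, errors="coerce")
print(cleaned.dtype)
print(cleaned.isna().sum())
```

Coercing hides bad values as NaN, so it is worth checking how many entries became missing before imputing or dropping them.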

Let's practice!
