Predicting CTR with Machine Learning in Python
Kevin Huo
Instructor
print(df.columns)
['id', 'click', 'hour', 'C1', ... ]
print(df.dtypes)
id object
click int64
...
int: an integer: 1, 2, etc.
float: decimals: 3.02, 4.56, etc.
object: string: "hello", "world", etc.
datetime: datetime: 2018-01-01, etc.
df.select_dtypes(
include=['int', 'float'])
click int64
...
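A minimal sketch of the dtype checks above on a toy DataFrame. The rows and the 'bid' column are invented for illustration and are not from the course data. It uses include='number', which pandas accepts as a catch-all for integer and float columns, as an assumed alternative to listing 'int' and 'float' separately.

```python
import pandas as pd

# Hypothetical toy rows; 'bid' is an invented column, not from the course dataset.
df = pd.DataFrame({
    'id': ['a1', 'b2', 'c3'],      # strings
    'click': [0, 1, 0],            # integers
    'C1': [1005, 1002, 1005],      # integers
    'bid': [3.02, 4.56, 1.10],     # decimals
})

# One dtype per column
print(df.dtypes)

# Keep only the numeric columns (integers and floats of any width)
numeric = df.select_dtypes(include='number')
print(numeric.columns.tolist())
```

The string column 'id' is dropped and the three numeric columns remain.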
df.info()
Data columns (total 24 columns):
id 50000 non-null object
df['id'].isnull()
[False, False, False, False, ... ]
df.isnull().sum(axis = 0)
dtype: object
id 0
...
df.isnull().sum(axis = 0).sum()
0
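The missing-value checks above can be tried on a small example. This is only a sketch: the rows and the 'device_type' column are hypothetical, with one value deliberately left missing so the counts are not all zero.

```python
import pandas as pd

# Hypothetical toy rows; 'device_type' has one missing value on purpose.
df = pd.DataFrame({
    'id': ['a1', 'b2', 'c3'],
    'click': [0, 1, 0],
    'device_type': [1.0, None, 0.0],
})

# True where a value is missing, for a single column
print(df['device_type'].isnull())

# Missing values per column (sum down the rows)
per_column = df.isnull().sum(axis=0)
print(per_column)

# Total number of missing values in the whole DataFrame
total_missing = per_column.sum()
print(total_missing)
```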
df.groupby(['search_engine_type',
'click']).size()
search_engine_type click
1002 0 940
1 240
...
df.groupby(['search_engine_type',
'click']).size().unstack()
click 0 1
search_engine_type
1002 940 240
...
df.reset_index()
click search_engine_type 0 1
1002 940 240
df.rename(columns = {0: 'non_clicks', 1: 'clicks'}, inplace = True)
click search_engine_type non_clicks clicks
1002 940 240
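The groupby, unstack, reset_index and rename steps above can be chained into one expression. The sketch below uses invented rows, and the final 'ctr' column (clicks divided by total impressions) is an added assumption about how these counts might be used. The fill_value=0 argument is also an addition, so that a category with no clicks gets 0 instead of a missing value.

```python
import pandas as pd

# Hypothetical toy rows: three impressions for 1002, two for 1005.
df = pd.DataFrame({
    'search_engine_type': [1002, 1002, 1002, 1005, 1005],
    'click': [0, 0, 1, 0, 1],
})

counts = (
    df.groupby(['search_engine_type', 'click'])
      .size()                    # rows per (search_engine_type, click) pair
      .unstack(fill_value=0)     # click values 0 and 1 become columns
      .reset_index()             # search_engine_type back to a regular column
      .rename(columns={0: 'non_clicks', 1: 'clicks'})
)

# Assumed follow-up step: click-through rate per category
counts['ctr'] = counts['clicks'] / (counts['clicks'] + counts['non_clicks'])
print(counts)
```

Chaining returns a new DataFrame at each step, so no inplace = True is needed.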