Objective (loss) functions and base learners

Extreme Gradient Boosting with XGBoost

Sergey Fogelson

Head of Data Science, TelevisaUnivision

Objective Functions and Why We Use Them

  • Quantifies how far off a prediction is from the actual result
  • Measures the difference between estimated and true values for some collection of data
  • Goal: find the model that yields the minimum value of the loss function (see the small numeric sketch below)
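
As a quick numeric sketch (not from the slides; the values below are made up), squared-error loss can be computed directly with NumPy:

import numpy as np

y_true = np.array([3.0, 2.5, 4.0])   # hypothetical actual values
y_pred = np.array([2.8, 3.0, 3.5])   # hypothetical model predictions

# Mean squared error: average squared difference between prediction and truth;
# the best model is the one that minimizes this quantity
loss = np.mean((y_true - y_pred) ** 2)
print(loss)   # 0.18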

Common loss functions and XGBoost

  • Loss function names in xgboost:
    • reg:squarederror - use for regression problems
    • reg:logistic - use for classification problems when you want just the decision, not the probability
    • binary:logistic - use when you want the probability rather than just the decision (see the short sketch below)
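
As a minimal, hedged sketch (X and y below are randomly generated placeholders, not course data), the objective name is passed as a string when the model is constructed:

import numpy as np
import xgboost as xgb

# Placeholder data: 100 examples, 4 features, binary labels
X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=100)

# binary:logistic yields probabilities, available via predict_proba()
clf = xgb.XGBClassifier(objective="binary:logistic", n_estimators=10, seed=123)
clf.fit(X, y)
probs = clf.predict_proba(X)[:, 1]   # probability of the positive class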

Base learners and why we need them

  • XGBoost involves creating a meta-model composed of many individual models that combine to give a final prediction
  • Individual models = base learners
  • Want base learners that, when combined, create a final prediction that is non-linear
  • Each base learner should be good at distinguishing or predicting different parts of the dataset
  • Two kinds of base learners: tree and linear (see the sketch below)
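
As a hedged sketch of how this choice is expressed (the data below is a random placeholder, not from the course), the base learner is selected with the "booster" parameter:

import numpy as np
import xgboost as xgb

# Placeholder regression data
X = np.random.rand(50, 3)
y = np.random.rand(50)
dtrain = xgb.DMatrix(data=X, label=y)

# "gbtree" (the default) uses decision trees as base learners;
# "gblinear" uses linear models instead
params = {"booster": "gblinear", "objective": "reg:squarederror"}
model = xgb.train(params=params, dtrain=dtrain, num_boost_round=10)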

Trees as base learners example: Scikit-learn API

import xgboost as xgb
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

boston_data = pd.read_csv("boston_housing.csv")

X, y = boston_data.iloc[:, :-1], boston_data.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

xg_reg = xgb.XGBRegressor(objective='reg:squarederror', n_estimators=10, seed=123)
xg_reg.fit(X_train, y_train)
preds = xg_reg.predict(X_test)

Trees as base learners example: Scikit-learn API

rmse = np.sqrt(mean_squared_error(y_test, preds))

print("RMSE: %f" % (rmse))
RMSE: 129043.2314

Linear base learners example: learning API only

import xgboost as xgb
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

boston_data = pd.read_csv("boston_housing.csv")

X, y = boston_data.iloc[:,:-1],boston_data.iloc[:,-1]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

DM_train = xgb.DMatrix(data=X_train, label=y_train)
DM_test = xgb.DMatrix(data=X_test, label=y_test)

params = {"booster": "gblinear", "objective": "reg:squarederror"}

xg_reg = xgb.train(params=params, dtrain=DM_train, num_boost_round=10)
preds = xg_reg.predict(DM_test)

Linear base learners example: learning API only

rmse = np.sqrt(mean_squared_error(y_test, preds))

print("RMSE: %f" % (rmse))
RMSE: 124326.24465

Let's get to work!
