Introducing XGBoost

Extreme Gradient Boosting with XGBoost

Sergey Fogelson

Head of Data Science, TelevisaUnivision

What is XGBoost?

  • Optimized gradient-boosting machine learning library
  • Originally written in C++
  • Has APIs in several languages:
    • Python
    • R
    • Scala
    • Julia
    • Java

What makes XGBoost so popular?

  • Speed and performance
  • Core algorithm is parallelizable
  • Consistently outperforms single-algorithm methods
  • State-of-the-art performance in many ML tasks

Using XGBoost: a quick example

import xgboost as xgb
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

# load the data and split features from the target (last column)
class_data = pd.read_csv("classification_data.csv")
X, y = class_data.iloc[:, :-1], class_data.iloc[:, -1]

# hold out 20% of the data for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

# instantiate a classifier: 10 boosted trees, logistic objective for binary classification
# (random_state replaces the deprecated seed parameter)
xg_cl = xgb.XGBClassifier(objective='binary:logistic', n_estimators=10, random_state=123)

# fit on the training set, then predict on the held-out test set
xg_cl.fit(X_train, y_train)
preds = xg_cl.predict(X_test)

# accuracy: fraction of test examples predicted correctly
accuracy = float(np.sum(preds == y_test)) / y_test.shape[0]
print("accuracy: %f" % accuracy)
accuracy: 0.78333
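The manual accuracy computation above can equivalently use scikit-learn's accuracy_score. A minimal sketch with hypothetical prediction and label arrays (stand-ins for preds and y_test; not the course data):

import numpy as np
from sklearn.metrics import accuracy_score

# hypothetical predictions and ground-truth labels for illustration
preds = np.array([1, 0, 1, 1, 0, 1])
y_true = np.array([1, 0, 0, 1, 0, 1])

# same result as float(np.sum(preds == y_true)) / y_true.shape[0]
acc = accuracy_score(y_true, preds)
print("accuracy: %f" % acc)

accuracy_score handles the comparison and normalization in one call, which avoids subtle mistakes (e.g. index misalignment when y_test is a pandas Series).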

Let's begin using XGBoost!
