Extreme Gradient Boosting with XGBoost
Sergey Fogelson
Head of Data Science, TelevisaUnivision
import xgboost as xgb
import pandas as pd

boston_data = pd.read_csv("boston_data.csv")
X, y = boston_data.iloc[:, :-1], boston_data.iloc[:, -1]
boston_dmatrix = xgb.DMatrix(data=X, label=y)
params = {"objective": "reg:squarederror", "max_depth": 4}
l1_params = [1, 10, 100]
rmses_l1 = []
for reg in l1_params:
    params["alpha"] = reg
    cv_results = xgb.cv(dtrain=boston_dmatrix, params=params, nfold=4,
                        num_boost_round=10, metrics="rmse",
                        as_pandas=True, seed=123)
    rmses_l1.append(cv_results["test-rmse-mean"].tail(1).values[0])
print("Best rmse as a function of l1:")
print(pd.DataFrame(list(zip(l1_params, rmses_l1)), columns=["l1", "rmse"]))
Best rmse as a function of l1:
l1 rmse
0 1 69572.517742
1 10 73721.967141
2 100 82312.312413
pd.DataFrame(list(zip(list1, list2)), columns=["list1", "list2"])
zip creates a generator of parallel values:
zip([1,2,3], ["a","b","c"]) = [(1,"a"), (2,"b"), (3,"c")]
generators need to be completely instantiated before they can be used in DataFrame objects
list() instantiates the full generator; passing that into DataFrame converts the whole expression into a DataFrame
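The zip-then-list pattern above can be checked in isolation; a small self-contained sketch:

```python
import pandas as pd

# zip is lazy: it yields parallel tuples only as they are consumed
pairs = zip([1, 2, 3], ["a", "b", "c"])

# list() fully instantiates the generator, giving DataFrame concrete rows
df = pd.DataFrame(list(pairs), columns=["number", "letter"])
print(df)
```

Each tuple from zip becomes one row, and the columns argument names the two parallel lists.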