Extreme Gradient Boosting with XGBoost
Sergey Fogelson
Head of Data Science, TelevisaUnivision
import pandas as pd
import xgboost as xgb
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score
names = ["crime", "zone", "industry", "charles", "no", "rooms", "age",
         "distance", "radial", "tax", "pupil", "aam", "lower", "med_price"]
data = pd.read_csv("boston_housing.csv", names=names)
X, y = data.iloc[:, :-1], data.iloc[:, -1]
xgb_pipeline = Pipeline([("st_scaler", StandardScaler()),
                         ("xgb_model", xgb.XGBRegressor())])
scores = cross_val_score(xgb_pipeline, X, y,
                         scoring="neg_mean_squared_error", cv=10)
final_avg_rmse = np.mean(np.sqrt(np.abs(scores)))
print("Final XGB RMSE:", final_avg_rmse)
Final XGB RMSE: 4.02719593323
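A note on the `np.sqrt(np.abs(scores))` step: scikit-learn scorers follow a "higher is better" convention, so `scoring="neg_mean_squared_error"` returns the *negated* MSE for each fold. Taking the absolute value undoes the sign, and the square root converts each fold's MSE to an RMSE before averaging. A minimal sketch with hypothetical fold scores:

```python
import numpy as np

# Hypothetical per-fold scores, as returned by cross_val_score with
# scoring="neg_mean_squared_error" (negated MSE, higher is better).
neg_mse_scores = np.array([-16.0, -25.0, -9.0])

# Undo the negation, convert each fold's MSE to RMSE, then average.
rmse_per_fold = np.sqrt(np.abs(neg_mse_scores))
print(rmse_per_fold)           # [4. 5. 3.]
print(np.mean(rmse_per_fold))  # 4.0
```

Averaging the per-fold RMSEs (rather than taking the square root of the average MSE) matches the computation in the slide above.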
sklearn_pandas: DataFrameMapper
- Interoperability between pandas and scikit-learn
sklearn.impute: SimpleImputer
- Native imputation of numerical and categorical columns in scikit-learn
sklearn.pipeline: FeatureUnion
- Combine multiple pipelines of features into a single pipeline of features
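To make the last two tools concrete, here is a minimal sketch (on a small toy array, not the Boston housing data) showing `SimpleImputer` filling missing values and `FeatureUnion` concatenating the outputs of two transformers column-wise inside a `Pipeline`:

```python
import numpy as np
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Toy feature matrix with missing values
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])

# FeatureUnion runs both imputers on X and concatenates their
# outputs side by side (2 columns + 2 columns = 4 columns).
union = FeatureUnion([
    ("mean_imputed", SimpleImputer(strategy="mean")),
    ("median_imputed", SimpleImputer(strategy="median")),
])

pipe = Pipeline([("union", union), ("scale", StandardScaler())])
Xt = pipe.fit_transform(X)
print(Xt.shape)  # (3, 4)
```

In a real workflow you would typically point each branch of the union at different columns (for example, numeric vs. categorical features) rather than imputing the whole matrix twice; this sketch only illustrates how the pieces compose.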