Extreme Gradient Boosting with XGBoost
Sergey Fogelson
Head of Data Science, TelevisaUnivision
import pandas as pd
import xgboost as xgb
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

# Load the Boston housing data and split off the target column
names = ["crime", "zone", "industry", "charles", "no", "rooms", "age",
         "distance", "radial", "tax", "pupil", "aam", "lower", "med_price"]
data = pd.read_csv("boston_housing.csv", names=names)
X, y = data.iloc[:, :-1], data.iloc[:, -1]

# Pipeline: standardize the features, then fit an XGBoost regressor
xgb_pipeline = Pipeline([("st_scaler", StandardScaler()),
                         ("xgb_model", xgb.XGBRegressor())])

# 10-fold cross-validation; scikit-learn reports MSE as a negative score
scores = cross_val_score(xgb_pipeline, X, y,
                         scoring="neg_mean_squared_error", cv=10)

# Flip the sign, take the square root per fold, then average to get RMSE
final_avg_rmse = np.mean(np.sqrt(np.abs(scores)))
print("Final XGB RMSE:", final_avg_rmse)
Final XGB RMSE: 4.02719593323
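Because "neg_mean_squared_error" follows scikit-learn's higher-is-better scoring convention, the fold scores come back negative, which is why the code takes np.abs before the square root. Newer scikit-learn versions can compute RMSE per fold directly; a minimal sketch, assuming scikit-learn >= 0.22 (which added the "neg_root_mean_squared_error" scorer) and the pipeline and data defined above:

# The scorer returns RMSE per fold, still negated under the
# higher-is-better convention, so we only need to flip the sign.
rmse_scores = cross_val_score(xgb_pipeline, X, y,
                              scoring="neg_root_mean_squared_error", cv=10)
print("Final XGB RMSE:", np.abs(rmse_scores).mean())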
Additional components introduced for pipelines:

sklearn_pandas.DataFrameMapper - interoperability between pandas and scikit-learn
sklearn.impute.SimpleImputer - native imputation of numerical and categorical columns in scikit-learn
sklearn.pipeline.FeatureUnion - combine multiple pipelines of features into a single pipeline of features
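To show how SimpleImputer and FeatureUnion slot into an XGBoost pipeline, here is a minimal sketch. The column groupings (num_cols, cat_cols) and the ColumnSelector helper are hypothetical, included only to keep the example self-contained; DataFrameMapper from sklearn_pandas could play the same column-selection role directly on a pandas DataFrame.

import xgboost as xgb
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, Pipeline

class ColumnSelector(BaseEstimator, TransformerMixin):
    """Hypothetical helper: pass through only the listed DataFrame columns."""
    def __init__(self, cols):
        self.cols = cols
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X[self.cols]

num_cols = ["rooms", "age", "tax"]   # numeric columns, imputed with the median
cat_cols = ["charles"]               # categorical column, imputed with the mode

# FeatureUnion runs each sub-pipeline on the input and
# concatenates their outputs column-wise into one feature matrix
preprocess = FeatureUnion([
    ("num", Pipeline([("select", ColumnSelector(num_cols)),
                      ("impute", SimpleImputer(strategy="median"))])),
    ("cat", Pipeline([("select", ColumnSelector(cat_cols)),
                      ("impute", SimpleImputer(strategy="most_frequent"))])),
])

full_pipeline = Pipeline([("preprocess", preprocess),
                          ("xgb_model", xgb.XGBRegressor())])
# full_pipeline.fit(X, y) then behaves like any other scikit-learn estimator.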