Machine Learning with Tree-Based Models in Python
Elie Kawerk
Data Scientist
Chap 1: Classification And Regression Tree (CART)
Chap 2: The Bias-Variance Tradeoff
Chap 3: Bagging and Random Forests
Chap 4: Boosting
Chap 5: Model Tuning
Sequence of if-else questions about individual features.
Objective: infer class labels.
Able to capture non-linear relationships between features and labels.
Don't require feature scaling (ex: Standardization, ..)
# Import DecisionTreeClassifier from sklearn.tree import DecisionTreeClassifier # Import train_test_split from sklearn.model_selection import train_test_split # Import accuracy_score from sklearn.metrics import accuracy_score
# Split the dataset into 80% train, 20% test X_train, X_test, y_train, y_test= train_test_split(X, y, test_size=0.2, stratify=y, random_state=1)
# Instantiate dt dt = DecisionTreeClassifier(max_depth=2, random_state=1)
# Fit dt to the training set dt.fit(X_train,y_train) # Predict the test set labels y_pred = dt.predict(X_test)
# Evaluate the test-set accuracy accuracy_score(y_test, y_pred)
0.90350877192982459
Decision region: region in the feature space where all instances are assigned to one class label.
Decision Boundary: surface separating different decision regions.
Machine Learning with Tree-Based Models in Python