Machine Learning with Tree-Based Models in Python
Elie Kawerk
Data Scientist
Some instances may be sampled several times for one model,
while other instances may not be sampled at all.
On average, for each model, 63% of the training instances are sampled.
The remaining 37% constitute the out-of-bag (OOB) instances.
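The 63% / 37% split follows from sampling with replacement: when n instances are drawn from a training set of size n, a given instance is never drawn with probability (1 - 1/n)^n, which approaches 1/e ≈ 0.368 as n grows. The snippet below is a minimal sketch that checks this with one simulated bootstrap sample. The helper name `oob_fraction` and the sample size are illustrative assumptions, not part of the course code.

```python
import math
import random


def oob_fraction(n_samples, seed=0):
    """Draw one bootstrap sample and return the fraction of instances left out."""
    rng = random.Random(seed)
    # Sample n_samples indices with replacement; the set keeps the distinct ones
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1 - len(drawn) / n_samples


n = 10_000  # illustrative training-set size (assumption)

# Probability that a given instance is never drawn in n draws
theoretical_oob = (1 - 1 / n) ** n
# Fraction actually left out of one simulated bootstrap sample
simulated_oob = oob_fraction(n)

print('Theoretical OOB fraction: {:.3f}'.format(theoretical_oob))
print('Simulated OOB fraction:   {:.3f}'.format(simulated_oob))
print('Limit 1/e:                {:.3f}'.format(math.exp(-1)))
```

Both fractions should land close to 0.37, leaving roughly 63% of the instances in the bootstrap sample.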

# Import models and split utility function
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
# Set seed for reproducibility
SEED = 1
# Split data into 70% train and 30% test
# (X and y are assumed to hold the features and labels already)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.3,
                                                    stratify=y,
                                                    random_state=SEED)
# Instantiate a classification-tree 'dt'
dt = DecisionTreeClassifier(max_depth=4, min_samples_leaf=0.16, random_state=SEED)
# Instantiate a BaggingClassifier 'bc'; set oob_score = True
# (scikit-learn >= 1.2 names this parameter 'estimator'; older releases use 'base_estimator')
bc = BaggingClassifier(estimator=dt, n_estimators=300, oob_score=True, n_jobs=-1)
# Fit 'bc' to the training set
bc.fit(X_train, y_train)
# Predict the test set labels
y_pred = bc.predict(X_test)
# Evaluate test set accuracy
test_accuracy = accuracy_score(y_test, y_pred)
# Extract the OOB accuracy from 'bc'
oob_accuracy = bc.oob_score_
# Print test set accuracy
print('Test set accuracy: {:.3f}'.format(test_accuracy))
Test set accuracy: 0.936
# Print OOB accuracy
print('OOB accuracy: {:.3f}'.format(oob_accuracy))
OOB accuracy: 0.925