Median imputation

Machine Learning with caret in R

Max Kuhn

Software Engineer at RStudio and creator of caret

Dealing with missing values

  • Most models require numbers and can't handle missing data
  • Common approach: remove rows with missing data
    • Can lead to biases in the data
    • Can generate over-confident models
  • A better strategy: median imputation!
    • Replace missing values with the median of the observed values in that column
    • Works well if data are missing at random (MAR)
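To make the idea concrete, here is a minimal base-R sketch of median imputation done by hand. The helper name `impute_median` is hypothetical and not part of caret. It replaces each `NA` in a numeric column with the median of that column's observed values.

```r
# Hypothetical helper (not part of caret): replace NAs in each numeric
# column with the median of that column's non-missing values.
impute_median <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      med <- median(df[[col]], na.rm = TRUE)  # median of observed values only
      df[[col]][is.na(df[[col]])] <- med      # fill the gaps with it
    }
  }
  df
}

toy <- data.frame(a = c(1, NA, 3, 10), b = c(NA, 2, 2, 4))
impute_median(toy)
# a becomes 1, 3, 3, 10 and b becomes 2, 2, 2, 4
```

The `na.rm = TRUE` argument matters: without it, `median()` would itself return `NA` for any column that contains a missing value.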

Example: mtcars

# Generate some data with missing values
data(mtcars)
set.seed(42)
mtcars[sample(1:nrow(mtcars), 10), "hp"] <- NA
# Split target from predictors
Y <- mtcars$mpg
X <- mtcars[, 2:4]
# Try to fit a caret model
library(caret)
model <- train(X, Y)
Error in train.default(X, Y) : Stopping 
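Here is why dropping incomplete rows is costly in this example. The base-R check below rebuilds the slide's data and counts where the values are missing. It uses only base R, and `X` holds the predictors `cyl`, `disp` and `hp` (columns 2 to 4 of `mtcars`).

```r
# Rebuild the example data from the slide
data(mtcars)
set.seed(42)
mtcars[sample(1:nrow(mtcars), 10), "hp"] <- NA
X <- mtcars[, 2:4]   # predictors: cyl, disp, hp

# Count missing values per predictor column
colSums(is.na(X))          # cyl: 0, disp: 0, hp: 10

# Share of rows that listwise deletion would throw away
mean(!complete.cases(X))   # 0.3125, i.e. 10 of 32 rows
```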

A simple solution

# Now fit with median imputation
model <- train(X, Y, preProcess = "medianImpute")
print(model)
Random Forest 

32 samples
 3 predictor

Pre-processing: median imputation (3) 
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 32, 32, 32, 32, 32, 32, ... 
Resampling results across tuning parameters:

  mtry  RMSE      Rsquared 
  2     2.617096  0.8234652
  3     2.670550  0.8164535

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was mtry = 2. 
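The same imputation can also be run outside `train()` with caret's `preProcess()` function. It estimates the column medians from the data it is given, and `predict()` then applies those medians to any data set. This is a sketch that assumes caret is installed. The split into the first 24 rows and the last 8 rows is purely illustrative.

```r
library(caret)

# Rebuild the example data from the slide
data(mtcars)
set.seed(42)
mtcars[sample(1:nrow(mtcars), 10), "hp"] <- NA
X <- mtcars[, 2:4]

# Estimate medians from the first 24 rows only (illustrative split) ...
pp <- preProcess(X[1:24, ], method = "medianImpute")

# ... then fill missing values in both the fitting rows and the held-out rows
train_imp <- predict(pp, X[1:24, ])
new_imp   <- predict(pp, X[25:32, ])
```

Learning the medians on one subset and applying them to another mirrors what caret's documentation describes for `train()`. There, preprocessing is estimated within each resample, so the held-out rows do not influence the medians used to fill them.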

Let's practice!
