Handling Missing Data with Imputation in R
Michal Oleszak
Machine Learning Engineer

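Most statistical models estimate the conditional distribution of the response variable:
$p(y \mid X)$
For a single prediction, this distribution is summarized by one value, for example its expected value or, in classification, the most likely class.
Instead, we can draw a sample from this distribution to add variability to the imputed values.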
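To make the contrast concrete, here is a minimal R sketch using made-up probabilities (the vector `p` is hypothetical, not taken from the nhanes model). Summarizing keeps only the most likely class, while sampling with `rbinom()` draws each value from a Bernoulli distribution with success probability `p`.

```r
# Hypothetical predicted probabilities P(y = 1 | X) for five observations
# (made-up numbers, not from the nhanes data).
p <- c(0.9, 0.8, 0.7, 0.6, 0.55)

# Summarizing: keep only the most likely class for each observation.
summarized <- ifelse(p >= 0.5, 1, 0)

# Sampling: draw one 0/1 value per observation from Bernoulli(p).
set.seed(123)
sampled <- rbinom(length(p), size = 1, prob = p)

summarized  # 1 1 1 1 1, every probability is above 0.5
sampled     # random 0/1 values; each 0 occurs with probability 1 - p
```

Summarizing throws away the information that, for example, the last observation is almost a coin flip; sampling keeps it.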


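Task: impute PhysActive in the nhanes data using logistic regression.
library(VIM)  # provides hotdeck()
# Fill every missing value with hot-deck imputation first, so the predictors are complete
nhanes_imp <- hotdeck(nhanes)
# Remember which PhysActive values were originally missing
missing_physactive <- is.na(nhanes$PhysActive)
# Model the probability of being physically active
logreg_model <- glm(PhysActive ~ Age + Weight + Pulse,
                    data = nhanes_imp, family = binomial)
preds <- predict(logreg_model, type = "response")
# Convert each probability to the most likely class
preds <- ifelse(preds >= 0.5, 1, 0)
# Overwrite only the entries that were originally missing
nhanes_imp[missing_physactive, "PhysActive"] <- preds[missing_physactive]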
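Because `nhanes` and VIM's `hotdeck()` may not be at hand, the sketch below reproduces the same deterministic regression imputation on simulated data. Everything in it is an assumption for illustration: the data frame `sim`, its coefficients, and the 20 missing outcomes are made up, and the model is fitted on complete cases (the `glm()` default of dropping `NA` rows) rather than on a hot-deck-filled copy as in the slide.

```r
set.seed(1)
n <- 200
# Simulated stand-in for nhanes (hypothetical data, not the real survey).
sim <- data.frame(
  Age = round(runif(n, 20, 80)),
  Weight = round(rnorm(n, 80, 15), 1),
  Pulse = round(rnorm(n, 72, 10))
)
# Hypothetical true model: activity becomes less likely with age and weight.
lin <- 3 - 0.03 * sim$Age - 0.01 * sim$Weight
sim$PhysActive <- rbinom(n, size = 1, prob = plogis(lin))
# Remove 20 outcome values completely at random.
sim$PhysActive[sample(n, 20)] <- NA

missing_physactive <- is.na(sim$PhysActive)

# Fit on the complete cases (rows with NA outcome are dropped by default).
logreg_model <- glm(PhysActive ~ Age + Weight + Pulse,
                    data = sim, family = binomial)

# Predict probabilities only for rows with a missing outcome, then threshold.
probs <- predict(logreg_model, newdata = sim[missing_physactive, ],
                 type = "response")
sim_imp <- sim
sim_imp$PhysActive[missing_physactive] <- ifelse(probs >= 0.5, 1, 0)

# Distribution of the imputed values
table(sim_imp$PhysActive[missing_physactive])
```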
Variability of the imputed values:
table(preds[missing_physactive])
 1
26
Variability of the observed PhysActive values:
table(nhanes$PhysActive)
  0   1
181 610
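Why are all 26 imputed values equal to 1? A plausible explanation, sketched below with the observed counts and some hypothetical fitted probabilities (`fitted_probs` is made up, not output from `logreg_model`): about 77% of observed respondents are active, so if the model's predictions sit near that base rate, almost every one of them exceeds 0.5 and thresholding maps it to 1.

```r
# Observed counts reported above.
observed <- c("0" = 181, "1" = 610)
base_rate <- observed[["1"]] / sum(observed)  # share of active respondents
round(base_rate, 3)  # 0.771

# Hypothetical fitted probabilities scattered around the base rate.
fitted_probs <- c(0.62, 0.70, 0.77, 0.81, 0.88)
thresholded <- ifelse(fitted_probs >= 0.5, 1, 0)
thresholded  # 1 1 1 1 1: the minority class never gets imputed
```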
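Instead of thresholding at 0.5, draw each imputed value from a Bernoulli distribution whose success probability is the predicted value:
nhanes_imp <- hotdeck(nhanes)
missing_physactive <- is.na(nhanes$PhysActive)
logreg_model <- glm(PhysActive ~ Age + Weight + Pulse,
                    data = nhanes_imp, family = binomial)
preds <- predict(logreg_model, type = "response")
# Draw 0/1 at random with P(1) equal to the predicted probability
preds <- rbinom(length(preds), size = 1, prob = preds)
nhanes_imp[missing_physactive, "PhysActive"] <- preds[missing_physactive]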
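The key detail is that `rbinom()` accepts a vector for `prob` and uses the i-th probability for the i-th draw, so one call produces a separate Bernoulli draw for every row. A small standalone check (the probabilities here are illustrative only):

```r
# prob is applied element-wise: draw i uses probs[i].
probs <- c(0, 1, 0.5, 0.5)
set.seed(7)
draws <- rbinom(length(probs), size = 1, prob = probs)
draws[1:2]  # always 0 1, since probabilities 0 and 1 are deterministic

# Over many draws, the share of 1s approaches the probability.
set.seed(7)
many <- rbinom(10000, size = 1, prob = 0.8)
mean(many)  # close to 0.8
```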
Variability of the imputed values:
table(preds[missing_physactive])
 0  1
 5 21
Variability of the observed PhysActive values:
table(nhanes$PhysActive)
  0   1
181 610
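Turning the two tables above into proportions shows how much closer the stochastic imputations are to the observed data: the share of inactive respondents among the imputed values is about 19%, against about 23% in the observed data, instead of 0% with thresholding.

```r
# Counts reported above.
observed   <- c("0" = 181, "1" = 610)
stochastic <- c("0" = 5, "1" = 21)

round(prop.table(observed), 3)    # 0: 0.229, 1: 0.771
round(prop.table(stochastic), 3)  # 0: 0.192, 1: 0.808
```

With only 26 imputed values, and a random draw behind them, the two shares are not expected to match exactly.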