Handling Missing Data with Imputations in R
Michal Oleszak
Machine Learning Engineer
In almost half of the studies, key results disappear
Bootstrapping = sampling rows with replacement to obtain a data set of the same size as the original
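As a minimal sketch of the definition above, the snippet below draws one bootstrap sample in base R. The `toy` data frame and its columns are hypothetical stand-ins, not the course data.

```r
# Toy data frame (hypothetical; stands in for a real data set with missing values)
set.seed(42)
toy <- data.frame(id = 1:6, value = c(2.1, NA, 3.4, 5.0, NA, 4.2))

# Draw row indices with replacement, as many as there are rows
indices <- sample(nrow(toy), size = nrow(toy), replace = TRUE)

# The bootstrap sample: same number of rows, some repeated, some left out
toy_boot <- toy[indices, ]
nrow(toy_boot) == nrow(toy)
```

Because rows are drawn with replacement, some rows typically appear more than once while others are absent, and the sample keeps the original number of rows.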
Pros:
Cons:
calc_correlation <- function(data, indices) {
  # Return the correlation coefficient
  return(corr_coeff)
}

calc_correlation <- function(data, indices) {
  # Get bootstrap sample
  data_boot <- data[indices, ]
  # Return the correlation coefficient
  return(corr_coeff)
}

calc_correlation <- function(data, indices) {
  # Get bootstrap sample
  data_boot <- data[indices, ]
  # Impute with kNN imputation
  data_imp <- kNN(data_boot)
  # Return the correlation coefficient
  return(corr_coeff)
}
library(VIM)  # provides kNN()

calc_correlation <- function(data, indices) {
  # Get bootstrap sample
  data_boot <- data[indices, ]
  # Impute with kNN imputation
  data_imp <- kNN(data_boot)
  # Calculate correlation between Weight and TotChol
  corr_coeff <- cor(data_imp$Weight, data_imp$TotChol)
  # Return the correlation coefficient
  return(corr_coeff)
}
library(boot)
boot_results <- boot(nhanes, statistic = calc_correlation, R = 50)
print(boot_results)
ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = nhanes, statistic = calc_correlation, R = 50)

Bootstrap Statistics :
      original       bias    std. error
t1* 0.03028306  0.007385452  0.04207152
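The three columns in this printout can be recovered from the object returned by `boot()`. The sketch below illustrates this on the built-in `mtcars` data, which is used only as a stand-in (it has no missing values, so no imputation step is included); `cor_stat` is a hypothetical helper name.

```r
library(boot)

# Statistic: correlation between two columns of a bootstrap sample.
# mtcars is a stand-in data set, not the course's nhanes data.
cor_stat <- function(data, indices) {
  data_boot <- data[indices, ]
  cor(data_boot$mpg, data_boot$wt)
}

set.seed(1)
res <- boot(mtcars, statistic = cor_stat, R = 50)

original  <- res$t0                 # statistic computed on the full data
bias      <- mean(res$t) - res$t0   # mean of the replicates minus the original
std_error <- sd(res$t)              # standard deviation of the replicates

c(original = original, bias = bias, std_error = std_error)
```

Here `res$t` holds the 50 bootstrapped correlations and `res$t0` the estimate on the original data.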
plot(boot_results)
boot_ci <- boot.ci(boot_results, conf = 0.95, type = "norm")
print(boot_ci)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 50 bootstrap replicates

CALL :
boot.ci(boot.out = boot_results, conf = 0.95, type = "norm")

Intervals :
Level      Normal
95%   (-0.0596,  0.1054 )
Calculations and Intervals on Original Scale
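To see where the normal interval comes from, the sketch below recomputes it by hand from the bootstrap statistics printed earlier. It assumes the usual normal approximation: the bias-corrected estimate plus or minus a normal quantile times the bootstrap standard error.

```r
# Values copied from the printed bootstrap statistics
t0   <- 0.03028306   # original estimate
bias <- 0.007385452  # bootstrap bias
se   <- 0.04207152   # bootstrap standard error

# Normal-approximation interval: bias-corrected estimate +/- z * se
z     <- qnorm(0.975)            # about 1.96 for a 95% interval
lower <- (t0 - bias) - z * se
upper <- (t0 - bias) + z * se

round(c(lower, upper), 4)
```

Rounded to four decimals, this should reproduce the interval reported by `boot.ci()`. The interval contains zero, so the data give no clear evidence of a correlation between Weight and TotChol.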