Scalable Data Processing in R
Michael Kane
Assistant Professor, Yale University
Missing Completely at Random (MCAR)
Missing at Random (MAR)
Missing Not at Random (MNAR)
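To make the three mechanisms concrete, here is a minimal simulation sketch. It assumes the usual definitions: under MCAR the chance of a value being missing is unrelated to any data, under MAR it depends only on observed values, and under MNAR it depends on the unobserved value itself. The variable names, the 20% rate, and the logistic coefficients are all made up for illustration and are not from the course.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

n <- 1000
x <- rnorm(n)  # a fully observed covariate (hypothetical)
y <- rnorm(n)  # the variable that will receive missing values (hypothetical)

# MCAR: every value of y has the same 20% chance of being missing
miss_mcar <- rbinom(n, 1, 0.2) == 1

# MAR: the chance that y is missing depends only on the observed x
miss_mar <- rbinom(n, 1, plogis(-1.5 + x)) == 1

# MNAR: the chance that y is missing depends on the value of y itself
miss_mnar <- rbinom(n, 1, plogis(-1.5 + y)) == 1

# Apply each missingness pattern to a copy of y
y_mcar <- ifelse(miss_mcar, NA, y)
y_mar  <- ifelse(miss_mar, NA, y)
y_mnar <- ifelse(miss_mnar, NA, y)

# Proportion missing under each mechanism
c(mcar = mean(miss_mcar), mar = mean(miss_mar), mnar = mean(miss_mnar))
```

Only the first two mechanisms can be probed from the observed data: regressing the missingness indicator on x can reveal the MAR pattern, while the MNAR pattern depends on values that are never seen.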
Full treatment of missingness is beyond the scope of this course
We will check whether it is plausible that the data are MCAR and then drop missing values
# Our dependent variable
is_missing <- rbinom(1000, 1, 0.5)

# Our independent variables
data_matrix <- matrix(rnorm(1000 * 10), nrow = 1000, ncol = 10)

# A vector of p-values we'll fill in
p_vals <- rep(NA, ncol(data_matrix))

# Perform logistic regression (family must be passed to glm(), not summary())
for (j in 1:ncol(data_matrix)) {
  s <- summary(glm(is_missing ~ data_matrix[, j], family = binomial))
  # Row 2, column 4 of the coefficient table is the slope's p-value
  p_vals[j] <- s$coefficients[2, 4]
}

# Show the p-values
p_vals
0.5930082 0.7822695 0.7560343 0.3689330 0.8757048
0.8812320 0.8281008 0.4888898 0.4781299 0.5655739
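The same check can be run on a data set with real `NA` values by deriving the missingness indicator from `is.na()`. The sketch below is an illustration under stated assumptions, not the course's own code: the data frame `dat`, its column names, and the choice to blank out exactly 100 values of `y` are all hypothetical. Large p-values are consistent with MCAR but do not prove it.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical data: three predictors and a response y
dat <- data.frame(
  x1 = rnorm(1000),
  x2 = rnorm(1000),
  x3 = rnorm(1000),
  y  = rnorm(1000)
)

# Blank out 100 values of y completely at random (illustrative assumption)
dat$y[sample(nrow(dat), 100)] <- NA

# Missingness indicator: 1 where y is missing, 0 otherwise
is_missing <- as.integer(is.na(dat$y))

# One logistic regression per predictor, keeping the slope's p-value
predictors <- c("x1", "x2", "x3")
p_vals <- sapply(predictors, function(v) {
  fit <- glm(is_missing ~ dat[[v]], family = binomial)
  summary(fit)$coefficients[2, 4]
})
p_vals

# If no predictor appears related to missingness, drop the incomplete rows
dat_complete <- dat[complete.cases(dat), ]
nrow(dat_complete)
```

`complete.cases()` keeps only rows with no `NA` in any column, so `dat_complete` here has 900 rows.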