Determining dimensionality

Factor Analysis in R

Jennifer Brussow

Psychometrician

How many dimensions does your data have?

The bfi dataset

  • Big Five Inventory
  • 2,800 subjects
  • 25 questions
  • Data collected from the Synthetic Aperture Personality Assessment (SAPA)

Items are rated on a six-point scale: 1 = Very Inaccurate   ...   6 = Very Accurate

head(bfi)
      A1 A2 A3 A4 A5 C1 C2 C3 C4 C5 E1 E2 E3 E4 E5 N1 N2 N3 N4 N5 O1  ...
61617  2  4  3  4  4  2  3  3  4  4  3  3  3  4  4  3  4  2  2  3  3  ...
61618  2  4  5  2  5  5  4  4  3  4  1  1  6  4  3  3  3  3  5  5  4  ...
61620  5  4  5  4  4  4  5  4  2  5  2  4  4  4  5  4  5  4  2  3  4  ...
61621  4  4  6  5  5  4  4  3  5  5  5  3  4  4  4  2  5  2  4  1  3  ...
61622  2  3  3  4  5  4  4  5  3  2  2  2  5  4  5  2  3  4  4  3  3  ...
61623  6  6  5  6  5  6  6  6  1  3  2  1  6  5  6  3  5  2  2  3  4  ...
names(bfi)
"A1" "A2" "A3" "A4" "A5" "C1" "C2" "C3" "C4" "C5" "E1" "E2" 
"E3" "E4" "E5" "N1" "N2" "N3" "N4" "N5" "O1" "O2" "O3" "O4" "O5"

Setup: split your dataset

# Establish two sets of indices to split the dataset
N <- nrow(bfi)
indices <- seq_len(N)
# Randomly draw half of the row indices for the EFA...
indices_EFA <- sample(indices, floor(0.5 * N))
# ...and keep the remaining rows for the CFA
indices_CFA <- setdiff(indices, indices_EFA)
# Use those indices to split the dataset into halves for your EFA and CFA
bfi_EFA <- bfi[indices_EFA, ]
bfi_CFA <- bfi[indices_CFA, ]
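
Because sample() draws at random, the split above changes on every run. A minimal sketch of a repeatable split follows, assuming a small made-up data frame (toy, a hypothetical stand-in for bfi) and an arbitrary seed. Neither is part of the original slides.

```r
# Hypothetical stand-in for bfi: 10 respondents, 3 items scored 1-6
set.seed(42)  # arbitrary seed; any fixed value makes the split repeatable
toy <- data.frame(
  item1 = sample(1:6, 10, replace = TRUE),
  item2 = sample(1:6, 10, replace = TRUE),
  item3 = sample(1:6, 10, replace = TRUE)
)

# Same splitting logic as for bfi
N <- nrow(toy)
indices <- seq_len(N)
indices_EFA <- sample(indices, floor(0.5 * N))
indices_CFA <- setdiff(indices, indices_EFA)

toy_EFA <- toy[indices_EFA, ]
toy_CFA <- toy[indices_CFA, ]

# Sanity checks: the halves share no rows and together cover every row
length(intersect(indices_EFA, indices_CFA)) == 0
nrow(toy_EFA) + nrow(toy_CFA) == N
```

Calling set.seed() before sample() is one way to let others reproduce the same EFA and CFA halves.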

head(bfi_EFA, 2)
      A1 A2 A3 A4 A5 C1 C2 C3 C4 C5 E1 E2 E3 E4 E5 N1 N2 N3 N4 N5 O1  ...
65237  3  4  4  4  4  4  4  5  2  3  3  4 NA  4  4  4  3  1  3  2  4  ...
61825  3  1  2  2  2  2  1  2  6  6  6  6  1  1  1  3  5  4  4  4  5  ...
head(bfi_CFA, 2)
      A1 A2 A3 A4 A5 C1 C2 C3 C4 C5 E1 E2 E3 E4 E5 N1 N2 N3 N4 N5 O1  ...
61617  2  4  3  4  4  2  3  3  4  4  3  3  3  4  4  3  4  2  2  3  3  ...
61621  4  4  6  5  5  4  4  3  5  5  5  3  4  4  4  2  5  2  4  1  3  ...
...

An empirical approach to dimensionality

Imagine we have no theory...

With no theory to guide you, take an empirical approach: examine the eigenvalues of the correlation matrix
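
As a rough illustration of the eigenvalue idea, the sketch below simulates six items driven by two made-up latent variables and inspects the eigenvalues of their correlation matrix with base R's eigen(). The data, the variable names, and the "eigenvalues above 1" rule of thumb are assumptions for illustration, not part of the original slides.

```r
# Hypothetical data: two clusters of items, each driven by its own latent variable
set.seed(1)
n <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- data.frame(
  x1 = f1 + rnorm(n, sd = 0.5),
  x2 = f1 + rnorm(n, sd = 0.5),
  x3 = f1 + rnorm(n, sd = 0.5),
  y1 = f2 + rnorm(n, sd = 0.5),
  y2 = f2 + rnorm(n, sd = 0.5),
  y3 = f2 + rnorm(n, sd = 0.5)
)

# Eigenvalues of the correlation matrix, returned in decreasing order
toy_cor <- cor(toy)
eigenvalues <- eigen(toy_cor, symmetric = TRUE, only.values = TRUE)$values
eigenvalues

# The eigenvalues of a correlation matrix sum to the number of items
sum(eigenvalues)

# Common rule of thumb: count the eigenvalues greater than 1
sum(eigenvalues > 1)
```

With real questionnaire data the pattern is rarely this clean, so treat the count as a starting point rather than a final answer.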

Calculate the correlation matrix

# Calculate the correlation matrix first
bfi_EFA_cor <- cor(bfi_EFA, use = "pairwise.complete.obs")
           A1          A2          A3          A4          A5         C1 ...
A1  1.00000000 -0.31920397 -0.25651343 -0.12441523 -0.20083692  0.058252 
A2 -0.31920397  1.00000000  0.46698961  0.30599175  0.36599749  0.075002 
A3 -0.25651343  0.46698961  1.00000000  0.32762347  0.47616038  0.089720 
A4 -0.12441523  0.30599175  0.32762347  1.00000000  0.27182236  0.083987 
A5 -0.20083692  0.36599749  0.47616038  0.27182236  1.00000000  0.116890 
C1  0.05825219  0.07500228  0.08972097  0.08398741  0.11689059  1.000000 
C2  0.04236764  0.12843266  0.10471200  0.22697628  0.09639765  0.421518 
C3 -0.02289831  0.18618382  0.14009601  0.09975850  0.13797236  0.301556 
C4  0.09865372 -0.11178917 -0.11576273 -0.15035049 -0.10248897 -0.354081 
C5  0.04925038 -0.10820392 -0.15392300 -0.24998065 -0.15667123 -0.269701 
...
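
The use = "pairwise.complete.obs" argument matters because bfi_EFA contains missing responses (NA). A small sketch of the difference follows, assuming a made-up three-column data frame (toy, hypothetical). With pairwise deletion, each correlation uses every row where both of its columns are observed.

```r
# Hypothetical data with one missing value in a and one in b
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, NA),
  b = c(2, 1, 4, 3, NA, 6),
  c = c(6, 5, 3, 4, 2, 1)
)

# Default (use = "everything"): correlations involving an NA come back as NA
cor_default <- cor(toy)
cor_default

# Pairwise deletion: each pair of columns uses all rows where both are observed
cor_pairwise <- cor(toy, use = "pairwise.complete.obs")
cor_pairwise

# Listwise deletion: only rows with no NA in any column are used
cor_listwise <- cor(toy, use = "complete.obs")
cor_listwise
```

Pairwise deletion keeps more data per correlation, at the cost that different entries of the matrix may rest on different subsets of respondents.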

Eigenvalues

# scree() comes from the psych package
library(psych)

# Calculate the correlation matrix first
bfi_EFA_cor <- cor(bfi_EFA, use = "pairwise.complete.obs")

# Then use that correlation matrix to create the scree plot
scree(bfi_EFA_cor, factors = FALSE)
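
To see what a scree plot shows, it can help to draw one by hand. The sketch below uses only base R and a small made-up correlation matrix (toy_cor, hypothetical). It is an approximation of the idea, not a reproduction of what scree() does internally.

```r
# Hypothetical 4 x 4 correlation matrix standing in for bfi_EFA_cor
toy_cor <- matrix(
  c(1.0, 0.6, 0.5, 0.1,
    0.6, 1.0, 0.4, 0.1,
    0.5, 0.4, 1.0, 0.2,
    0.1, 0.1, 0.2, 1.0),
  nrow = 4, byrow = TRUE
)

# Eigenvalues in decreasing order
eigenvalues <- eigen(toy_cor, symmetric = TRUE, only.values = TRUE)$values

# Plot each eigenvalue against its rank and connect the points
plot(seq_along(eigenvalues), eigenvalues, type = "b",
     xlab = "Component number", ylab = "Eigenvalue",
     main = "Scree plot (toy example)")

# Dashed reference line at 1, a common cutoff when reading scree plots
abline(h = 1, lty = 2)
```

The point where the curve flattens out, or where it drops below the reference line, suggests how many dimensions are worth keeping.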

Scree plots

[Figure: scree plot produced by scree(bfi_EFA_cor, factors = FALSE)]

Let's practice!
