Checking normality of multivariate data

Multivariate Probability Distributions in R

Surajit Ray

Professor, University of Glasgow

Why check normality?

  • Classical statistical techniques that assume univariate/multivariate normality:
    • Multivariate regression
    • Discriminant analysis
    • Model-based clustering
    • Principal component analysis (PCA)
    • Multivariate analysis of variance (MANOVA)

Review: univariate normality tests

qqnorm(iris_raw[, 1])
qqline(iris_raw[, 1])

  • If the values lie along the reference line, the distribution is close to normal

  • Deviation from the line might indicate

    • heavier tails
    • skewness
    • outliers
    • clustered data
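
The deviation patterns listed above can be reproduced with simulated data. This is a minimal sketch in base R; the sample size, seed, and distributions (t with 3 degrees of freedom, exponential, two-component mixture) are illustrative assumptions, not part of the course material.

```r
# Simulate one sample per deviation pattern (all choices are illustrative).
set.seed(1)
n <- 200
samples <- list(
  normal      = rnorm(n),                            # reference case
  heavy_tails = rt(n, df = 3),                       # heavier tails than normal
  skewed      = rexp(n),                             # right-skewed
  clustered   = c(rnorm(n / 2, -3), rnorm(n / 2, 3)) # two separated groups
)

# Draw a normal Q-Q plot with reference line for each sample.
op <- par(mfrow = c(2, 2))
for (nm in names(samples)) {
  qqnorm(samples[[nm]], main = nm)
  qqline(samples[[nm]])
}
par(op)
```

Comparing each panel with the "normal" panel shows how the points bend away from the reference line in each case.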

Normal Q-Q plots of all variables

mvn(iris_raw[, 1:4], univariatePlot = "qqplot")


MVN library multivariate normality test functions

MVN version 5.9

  • Multivariate normality tests by

    • Mardia $\checkmark$
    • Henze-Zirkler $\checkmark$
    • Royston
  • Graphical approaches

    • chi-square Q-Q $\checkmark$
    • perspective
    • contour plots
  • $\checkmark$ = covered in this lesson

Using Mardia's test to check multivariate normality

mvn(iris_raw[, 1:4], mvnTest = "mardia")
$multivariateNormality 
             Test          Statistic              p value Result
1 Mardia Skewness   67.4305087780629 4.75799820400705e-07     NO
2 Mardia Kurtosis -0.230112114480775    0.818004651478188    YES
3             MVN               <NA>                 <NA>     NO
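
To see what the two Mardia rows measure, the statistics can be computed by hand. This is a sketch in base R using the standard large-sample form of Mardia's test; it assumes the built-in iris data as a stand-in for iris_raw, and the variable names are hypothetical. If iris_raw holds the same measurements, the results should be close to the values printed above.

```r
x <- as.matrix(iris[, 1:4])                  # assumed stand-in for iris_raw[, 1:4]
n <- nrow(x)
p <- ncol(x)

xc <- scale(x, center = TRUE, scale = FALSE) # center each column
S  <- crossprod(xc) / n                      # covariance estimate dividing by n
D  <- xc %*% solve(S) %*% t(xc)              # n x n matrix; diagonal = squared Mahalanobis distances

b1 <- sum(D^3) / n^2                         # multivariate skewness
b2 <- sum(diag(D)^2) / n                     # multivariate kurtosis

# Skewness: n * b1 / 6 is approximately chi-square with p(p+1)(p+2)/6 df
skew_stat <- n * b1 / 6
skew_df   <- p * (p + 1) * (p + 2) / 6
skew_p    <- pchisq(skew_stat, df = skew_df, lower.tail = FALSE)

# Kurtosis: standardized b2 is approximately standard normal (two-sided test)
kurt_stat <- (b2 - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)
kurt_p    <- 2 * pnorm(abs(kurt_stat), lower.tail = FALSE)

c(skew_stat = skew_stat, skew_p = skew_p, kurt_stat = kurt_stat, kurt_p = kurt_p)
```

A small skewness p-value with a large kurtosis p-value means the data are rejected as multivariate normal because of asymmetry rather than tail weight.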

Using the chi-square Q-Q plot with Mardia's test to check multivariate normality

mvn(iris_raw[, 1:4], mvnTest = "mardia", multivariatePlot = "qq")
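
The idea behind a chi-square Q-Q plot can be sketched in base R: under multivariate normality, squared Mahalanobis distances are approximately chi-square distributed with p degrees of freedom. The code below assumes the built-in iris data as a stand-in for iris_raw and is not necessarily how MVN draws its plot.

```r
x <- as.matrix(iris[, 1:4])   # assumed stand-in for iris_raw[, 1:4]
p <- ncol(x)

# Squared Mahalanobis distance of each observation from the sample mean
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))
n  <- length(d2)

# Theoretical chi-square quantiles with p degrees of freedom
theo <- qchisq(ppoints(n), df = p)

plot(theo, sort(d2),
     xlab = "Chi-square quantiles",
     ylab = "Squared Mahalanobis distance",
     main = "Chi-square Q-Q plot")
abline(0, 1)                  # points near this line suggest multivariate normality
```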


Using Henze-Zirkler's test to check multivariate normality

mvn(iris_raw[, 1:4], mvnTest = "hz")
$multivariateNormality
           Test  HZ p value MVN
1 Henze-Zirkler 2.3       0  NO

Testing multivariate normality by species

mvn(iris[iris$Species == "setosa", 1:4],
    mvnTest = "mardia",
    multivariatePlot = "qq")
$multivariateNormality
             Test        Statistic           p value Result
1 Mardia Skewness 25.6643445196298 0.177185884467652    YES
2 Mardia Kurtosis 1.29499223711605 0.195322907441935    YES
3             MVN             <NA>              <NA>    YES
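
The same check can be repeated for every species in one pass. This sketch uses only base R, with a hypothetical helper mardia_p() that returns the large-sample Mardia p-values; it does not call MVN, so its output format differs from the table above.

```r
# Hypothetical helper: large-sample Mardia skewness and kurtosis p-values
mardia_p <- function(x) {
  x <- scale(as.matrix(x), center = TRUE, scale = FALSE)
  n <- nrow(x)
  p <- ncol(x)
  D <- x %*% solve(crossprod(x) / n) %*% t(x)
  skew <- n * (sum(D^3) / n^2) / 6
  kurt <- (sum(diag(D)^2) / n - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)
  c(skewness_p = pchisq(skew, df = p * (p + 1) * (p + 2) / 6, lower.tail = FALSE),
    kurtosis_p = 2 * pnorm(abs(kurt), lower.tail = FALSE))
}

# Split the four measurements by species and test each group separately
by_species <- split(iris[, 1:4], iris$Species)
p_values   <- t(sapply(by_species, mardia_p))
p_values
```

Each row holds the two p-values for one species, so groups that depart from multivariate normality can be spotted at a glance.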


Let's make use of the tests for multivariate normality!
