Exploratory data analysis

R For SAS Users

Melinda Higgins, PhD

Research Professor/Senior Biostatistician, Emory University

Summary statistics

SAS PROC UNIVARIATE is similar to summary() in base R and to describe() in the Hmisc and psych packages.

SAS PROC UNIVARIATE is similar to the summary() function in R.

Comparing how the input data set is specified in SAS PROC UNIVARIATE and in the R code.

The SAS VAR statement works like the select() function from dplyr in R.
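A minimal sketch of the VAR-to-select() mapping, using a small made-up data frame called toy as a hypothetical stand-in for daviskeep (its values are invented for illustration). The SAS line in the comment is the rough equivalent, not taken from the slides.

```r
library(dplyr)

# Hypothetical toy data standing in for daviskeep (values are made up)
toy <- data.frame(
  id     = 1:4,
  sex    = factor(c("F", "M", "F", "M")),
  weight = c(52, 78, 60, 85),
  height = c(160, 180, 165, 185)
)

# SAS: proc univariate data=toy; var weight height; run;
# In R, select() plays the role of the VAR statement
toy_vars <- toy %>% select(weight, height)
summary(toy_vars)

# Base R equivalent without dplyr: index the columns by name
summary(toy[, c("weight", "height")])
```

Both calls summarise only the two chosen columns; id and sex are left out, just as variables not listed on VAR are.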

Summary statistics

# Summary statistics of weight, height, bmi of daviskeep
library(dplyr)  # provides %>% and select()
daviskeep %>%
  select(weight, height, bmi) %>%
  summary()

Result

     weight          height           bmi       
 Min.   : 39.0   Min.   :148.0   Min.   :15.82  
 1st Qu.: 55.0   1st Qu.:164.0   1st Qu.:20.22  
 Median : 63.0   Median :170.0   Median :21.80  
 Mean   : 65.3   Mean   :170.6   Mean   :22.26  
 3rd Qu.: 73.5   3rd Qu.:177.5   3rd Qu.:23.94  
 Max.   :119.0   Max.   :197.0   Max.   :36.73
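The Davis variables above have no missing values, so summary() prints no NA's row. When a column does contain missing values, summary() adds a count of them (much as SAS reports a missing-value count) and computes the other statistics on the non-missing values only. A small sketch on made-up numbers:

```r
# Hypothetical height vector with one missing value (made-up numbers)
ht <- c(150, 160, NA, 170, 180)

s <- summary(ht)
print(s)

# The extra "NA's" entry counts the missing values;
# Min., quartiles, Mean and Max. use the 4 non-missing values
s[["NA's"]]
```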

Descriptive statistics with Hmisc

# Load Hmisc, then run Hmisc::describe() on sex and bmi
# (attaching Hmisc masks dplyr::summarize(); summarise() is unaffected)
library(Hmisc)
daviskeep %>%
  select(sex, bmi) %>%
  Hmisc::describe()

Result

 2  Variables      199  Observations
-----------------------------------------------------
sex
       n  missing distinct
     199        0        2                       
Value          F     M
Frequency    111    88
Proportion 0.558 0.442

-----------------------------------------------------
bmi
       n  missing distinct     Info     Mean      Gmd
     199        0      176        1    22.26    3.303
     .05      .10      .25      .50      .75      .90
   18.05    18.84    20.22    21.80    23.94    26.30
     .95
   27.25
lowest : 15.82214 16.93703 17.09928 17.43285 17.50639
highest: 29.73704 29.80278 30.09496 30.15916 36.72840
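The Gmd column in the Hmisc::describe() output is Gini's mean difference: the average absolute difference between all pairs of observations, a robust measure of spread. A minimal base-R sketch of that definition; the helper name gini_md is hypothetical (Hmisc's own function for this is GiniMd()).

```r
# Hypothetical helper: Gini's mean difference from its definition
gini_md <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  # outer() builds the n x n matrix of pairwise differences; the diagonal
  # is zero, so dividing the total by n * (n - 1) averages over pairs i != j
  sum(abs(outer(x, x, "-"))) / (n * (n - 1))
}

# Pairs (1,2), (1,4), (2,4) differ by 1, 3 and 2, so the mean difference is 2
gini_md(c(1, 2, 4))
```

The outer() approach builds an n x n matrix, which is fine for 199 observations but memory-hungry for very large vectors.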

Descriptive statistics with psych

# Load the psych package, then run psych::describe() on weight, height and bmi
library(psych)
daviskeep %>%
  select(weight, height, bmi) %>%
  psych::describe()

Result

       vars   n   mean    sd median trimmed   mad    min    max range skew kurtosis   se
weight    1 199  65.30 13.34   63.0   64.12 11.86  39.00 119.00 80.00 0.91     0.84 0.95
height    2 199 170.59  8.95  170.0  170.40 10.38 148.00 197.00 49.00 0.21    -0.38 0.63
bmi       3 199  22.26  3.01   21.8   22.08  2.55  15.82  36.73 20.91 0.91     1.91 0.21
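Most columns of the psych::describe() table can be reproduced with base R. For example, se is the standard error of the mean, sd / sqrt(n): for weight, 13.34 / sqrt(199) is about 0.95. The sketch below uses a made-up vector and assumes psych's default 10% trim for trimmed and the usual 1.4826 scaling of mad(); skew and kurtosis are left out because psych computes them with its own estimator type.

```r
# Hypothetical toy vector (made-up values)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

stats <- list(
  n       = length(x),
  mean    = mean(x),
  sd      = sd(x),
  median  = median(x),
  trimmed = mean(x, trim = 0.1),      # 10% trimmed mean (assumed psych default)
  mad     = mad(x),                   # median absolute deviation, scaled by 1.4826
  min     = min(x),
  max     = max(x),
  range   = max(x) - min(x),
  se      = sd(x) / sqrt(length(x))   # standard error of the mean
)
str(stats)
```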

Specific statistic summaries

SAS PROC MEANS is similar to summarise() from the dplyr package in R.

1 https://dplyr.tidyverse.org/articles/colwise.html

Requesting specific statistics in SAS and R

Specific statistic summaries - one variable

# For height, get n, median, 5th and 95th percentiles, min and max
daviskeep %>%
  summarise(nht = n(),
            medianht = median(height),
            pt05 = quantile(height, probs = 0.05),
            pt95 = quantile(height, probs = 0.95),
            minht = min(height),
            maxht = max(height))

Result

  nht medianht pt05 pt95 minht maxht
1 199      170  157  185   148   197
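The 5th and 95th percentiles above come from quantile() with R's default type = 7 interpolation. SAS uses a different default percentile definition (PCTLDEF=5 in PROC UNIVARIATE, QNTLDEF=5 in PROC MEANS), commonly described as matching type = 2 in R, so the two programs can disagree slightly. A sketch on the integers 1 to 10; treat the SAS correspondence as an assumption to check against your own SAS output.

```r
x <- 1:10  # hypothetical data

# R's default (type 7) interpolates between order statistics
quantile(x, probs = c(0.05, 0.95))            # 1.45 and 9.55

# type = 2: inverse empirical distribution function with averaging,
# assumed here to correspond to the SAS default definition
quantile(x, probs = c(0.05, 0.95), type = 2)  # 1 and 10
```

Inside the summarise() call above, the same idea would read quantile(height, probs = 0.05, type = 2).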

The specific statistics requested in PROC MEANS correspond to summary functions called inside dplyr's summarise().
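One difference matters for SAS users: PROC MEANS skips missing values automatically, whereas mean() and median() return NA unless na.rm = TRUE, and n() counts all rows rather than non-missing values. n() therefore matches SAS N only when the column has no missing values, as is the case for height in daviskeep. A sketch on made-up data with one missing height:

```r
library(dplyr)

# Hypothetical data with one missing height (made-up values)
toy <- tibble(height = c(150, 160, NA, 170, 180))

# SAS: proc means data=toy n nmiss mean median; var height; run;
res <- toy %>%
  summarise(
    nrows    = n(),                         # all rows, including the NA
    nht      = sum(!is.na(height)),         # like SAS N (non-missing count)
    nmissht  = sum(is.na(height)),          # like SAS NMISS
    meanht   = mean(height, na.rm = TRUE),  # drop NA, as SAS does by default
    medianht = median(height, na.rm = TRUE)
  )
res
```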

Specific statistic summaries - multiple variables

# For weight, height and bmi, get mean, standard deviation
daviskeep %>%
  select(weight, height, bmi) %>%
  summarise(across(everything(), list(mean = ~ mean(.x), 
                                      sd = ~ sd(.x))))

Result

  weight_mean weight_sd height_mean height_sd bmi_mean   bmi_sd
1    65.29648  13.34346    170.5879  8.948848 22.25761 3.009239
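across() names its output columns with the pattern "{.col}_{.fn}" by default, which is why the result has weight_mean, weight_sd and so on. The .names argument changes that pattern. A sketch on made-up data that puts the statistic first and passes mean and sd directly instead of as ~ formulas:

```r
library(dplyr)

# Hypothetical data (made-up values)
toy <- tibble(weight = c(50, 60, 70), height = c(160, 170, 180))

res <- toy %>%
  summarise(across(everything(),
                   list(mean = mean, sd = sd),
                   .names = "{.fn}_{.col}"))  # default is "{.col}_{.fn}"
res
```

Columns come out as mean_weight, sd_weight, mean_height, sd_height: across() loops over columns first, then over the functions in the list.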

The CLASS statement in SAS works like group_by() in R.

Summary statistics - by group

# Get mean and sd for weight, height and bmi by sex group
daviskeep %>%
  group_by(sex) %>%
  select(sex, weight, height, bmi) %>%
  summarise(across(everything(), list(mean = ~ mean(.x), 
                                      sd = ~ sd(.x))))

Result

# A tibble: 2 × 7
  sex   weight_mean weight_sd height_mean height_sd bmi_mean bmi_sd
  <fct>       <dbl>     <dbl>       <dbl>     <dbl>    <dbl>  <dbl>
1 F            56.9      6.89        165.      5.68     21.0   2.18
2 M            75.9     11.9         178.      6.44     23.9   3.12
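Since dplyr 1.1.0, summarise() also accepts a .by argument that groups for that single call only, arguably closer to how a CLASS statement applies within one PROC step. A sketch on made-up data comparing it with group_by(); note that .by keeps groups in order of first appearance, while group_by() sorts them (both give F then M here).

```r
library(dplyr)  # the .by argument requires dplyr >= 1.1.0

# Hypothetical data (made-up values)
toy <- tibble(
  sex    = factor(c("F", "M", "F", "M")),
  weight = c(50, 80, 60, 90)
)

# SAS: proc means data=toy n mean; class sex; var weight; run;
by_group <- toy %>%
  group_by(sex) %>%
  summarise(weight_mean = mean(weight), n = n())

# Same summary with per-call grouping; the result is not grouped
by_arg <- toy %>%
  summarise(weight_mean = mean(weight), n = n(), .by = sex)
by_arg
```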

Let's summarise abalones!
