Using linear regression

HR Analytics: Exploring Employee Data in R

Ben Teusch

HR Analytics Consultant

Linear regression

  • Focus on testing differences between groups
  • Learn more about other uses and the math under the hood at DataCamp

lm(salary ~ new_hire, data = pay) %>%
    tidy()
         term  estimate std.error  statistic    p.value
1 (Intercept) 73424.603  577.2369 127.200112 0.00000000
2 new_hireYes  2649.672 1109.3568   2.388476 0.01704414
# A tibble: 2 x 2
  new_hire avg_salary
     <chr>      <dbl>
1       No   73424.60
2      Yes   76074.28
76074.28 - 73424.60
2649.68
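
The group averages above appear without the code that produced them. A minimal sketch of the same check in base R follows. The data frame `toy_pay` and its values are hypothetical stand-ins, not the course's `pay` data.

```r
# Hypothetical toy data standing in for the course's `pay` data set
toy_pay <- data.frame(
  salary   = c(70000, 72000, 74000, 76000, 75000, 79000),
  new_hire = c("No", "No", "No", "No", "Yes", "Yes")
)

# Average salary per group
group_means <- tapply(toy_pay$salary, toy_pay$new_hire, mean)

# Simple linear regression on the same grouping variable
fit <- lm(salary ~ new_hire, data = toy_pay)

# The intercept is the mean of the reference group ("No");
# the new_hireYes coefficient is the difference between the group means
coef(fit)
group_means["Yes"] - group_means["No"]
```

In this toy data the "No" group averages 73000 and the "Yes" group 77000, so both the `new_hireYes` coefficient and the subtraction give 4000. This is the same relationship as between 2649.672 and 76074.28 - 73424.60 above.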

Significance for linear regression

lm(salary ~ new_hire, data = pay) %>%
    tidy()
         term  estimate std.error  statistic    p.value
1 (Intercept) 73424.603  577.2369 127.200112 0.00000000
2 new_hireYes  2649.672 1109.3568   2.388476 0.01704414
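
The `p.value` in the `new_hireYes` row is the one to read for significance. Here 0.017 is below 0.05, assuming the conventional 0.05 cutoff. The sketch below pulls that value out with base R on hypothetical toy data (`toy_pay` is not the course's data set). It also shows that, with a single two-level predictor, the regression p-value matches an equal-variance two-sample t-test.

```r
# Hypothetical toy data standing in for the course's `pay` data set
toy_pay <- data.frame(
  salary   = c(70000, 72000, 74000, 76000, 75000, 79000, 78000, 81000),
  new_hire = c("No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes")
)

fit <- lm(salary ~ new_hire, data = toy_pay)

# Coefficient table as a matrix with columns
# Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit)$coefficients
p_lm <- coefs["new_hireYes", "Pr(>|t|)"]

# Equal-variance (pooled) two-sample t-test on the same two groups
p_t <- t.test(salary ~ new_hire, data = toy_pay, var.equal = TRUE)$p.value

alpha <- 0.05        # assumed significance level
p_lm < alpha         # is the group difference significant?
all.equal(p_lm, p_t) # regression and pooled t-test agree
```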

Multiple linear regression

lm(salary ~ new_hire + department, data = pay) %>% 
    tidy()
               term  estimate std.error  statistic    p.value
1       (Intercept) 72844.040  679.3007 107.233869 0.00000000
2       new_hireYes  2649.028 1108.9698   2.388728 0.01903265
3 departmentFinance  3092.807 2457.0717   1.258737 0.20832572
4   departmentSales  1477.215 1082.4749   1.364665 0.17256792
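
Adding `department` to the formula asks whether the new-hire difference remains once department is accounted for. A minimal sketch comparing the two models on hypothetical toy data follows. The department names and salaries are made up for illustration.

```r
# Hypothetical toy data; the department labels are illustrative only
toy_pay <- data.frame(
  salary     = c(70000, 72000, 74000, 76000, 75000, 79000, 78000, 81000),
  new_hire   = c("No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"),
  department = c("Engineering", "Sales", "Finance", "Sales",
                 "Engineering", "Finance", "Sales", "Sales")
)

simple   <- lm(salary ~ new_hire, data = toy_pay)
adjusted <- lm(salary ~ new_hire + department, data = toy_pay)

# new_hire estimate before and after accounting for department
coef(simple)["new_hireYes"]
coef(adjusted)["new_hireYes"]

# One row per term; the alphabetically first department level
# ("Engineering" here) is the reference level absorbed into the intercept
summary(adjusted)$coefficients
```

Each toy department has equal numbers of new hires and non-new hires, so the `new_hireYes` estimate is the same in both models. In the output above the estimate also barely moves, from 2649.672 to 2649.028.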

lm(salary ~ new_hire + department, data = pay) %>% summary()
Call:
lm(formula = salary ~ new_hire + department, data = pay)

Residuals:
   Min     1Q Median     3Q    Max 
-31674 -14446  -3629  10657  88580 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)        72844.0      679.3 107.234   <2e-16 ***
new_hireYes         2649.0     1109.0   2.389    0.017 *  
departmentFinance   3092.8     2457.1   1.259    0.208    
departmentSales     1477.2     1082.5   1.365    0.173    

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 18890 on 1466 degrees of freedom
Multiple R-squared:  0.005923,    Adjusted R-squared:  0.003889 
F-statistic: 2.912 on 3 and 1466 DF,  p-value: 0.03338
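
The figures printed at the bottom of `summary()` can also be read as list elements. The sketch below uses hypothetical toy data (`toy_pay` is not the course's data set). The element names `r.squared`, `adj.r.squared`, `sigma` and `fstatistic` are those of base R's `summary.lm` object.

```r
# Hypothetical toy data; the department labels are illustrative only
toy_pay <- data.frame(
  salary     = c(70000, 72000, 74000, 76000, 75000, 79000, 78000, 81000),
  new_hire   = c("No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"),
  department = c("Engineering", "Sales", "Finance", "Sales",
                 "Engineering", "Finance", "Sales", "Sales")
)

fit <- lm(salary ~ new_hire + department, data = toy_pay)
s <- summary(fit)

s$r.squared      # printed as "Multiple R-squared"
s$adj.r.squared  # printed as "Adjusted R-squared"
s$sigma          # printed as "Residual standard error"

# F statistic with numerator and denominator degrees of freedom
f <- s$fstatistic

# p-value of the overall F test; summary() prints it but does not store it
p_overall <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)
p_overall
```

In the output above, the Multiple R-squared of 0.005923 means the model accounts for well under 1% of the variation in salary, even though the `new_hireYes` term is significant.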

Let's practice!
