Quantifying model fit

Introduction to Regression in R

Richie Cotton

Data Evangelist at DataCamp

Bream and perch models

Bream

Scatter plot of bream mass versus length, with a trend line, as before.

Perch

Scatter plot of perch mass versus length, with a trend line, as before.

Coefficient of determination

Sometimes called "r-squared" or "R-squared".

The proportion of the variance in the response variable that is predictable from the explanatory variable.

  • 1 means a perfect fit
  • 0 means the worst possible fit
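
As a rough sketch of what this proportion means, R-squared can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The data frame `bream_head` and the model `mdl_head` are assumptions for illustration: they use only the six bream rows printed later in these slides, so the value will differ from the 0.878 reported for the full model.

```r
# Assumption: only the six bream rows printed on the slides, not the full dataset
bream_head <- data.frame(
  mass_g    = c(242, 290, 340, 363, 430, 450),
  length_cm = c(23.2, 24.0, 23.9, 26.3, 26.5, 26.8)
)
mdl_head <- lm(mass_g ~ length_cm, data = bream_head)

# Variation left unexplained by the model
resid_sum_of_sq <- sum(residuals(mdl_head) ^ 2)
# Total variation of the response around its mean
total_sum_of_sq <- sum((bream_head$mass_g - mean(bream_head$mass_g)) ^ 2)

# Proportion of the response variance that the model accounts for
r_squared <- 1 - resid_sum_of_sq / total_sum_of_sq
r_squared
```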

summary()

Look at the value titled "Multiple R-squared"

mdl_bream <- lm(mass_g ~ length_cm, data = bream)
summary(mdl_bream)
# Some lines of output omitted

Residual standard error: 74.15 on 33 degrees of freedom
Multiple R-squared:  0.8781,    Adjusted R-squared:  0.8744 
F-statistic: 237.6 on 1 and 33 DF,  p-value: < 2.2e-16

glance()

library(broom)
library(dplyr)
mdl_bream %>% 
  glance()
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.878         0.874  74.2      238. 1.22e-16     1  -199.  405.  409.
# ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
mdl_bream %>% 
  glance() %>% 
  pull(r.squared)
0.8780627

It's just correlation squared

bream %>% 
  summarize(
    coeff_determination = cor(length_cm, mass_g) ^ 2
  )
  coeff_determination
1           0.8780627
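
The match between the squared correlation and R-squared holds for simple linear regression, that is, one explanatory variable plus an intercept. A minimal check, assuming a hypothetical `bream_head` data frame built from the six bream rows printed later in these slides (so the value differs from the full-data 0.878):

```r
# Assumption: only the six bream rows printed on the slides, not the full dataset
bream_head <- data.frame(
  mass_g    = c(242, 290, 340, 363, 430, 450),
  length_cm = c(23.2, 24.0, 23.9, 26.3, 26.5, 26.8)
)
mdl_head <- lm(mass_g ~ length_cm, data = bream_head)

# Squared Pearson correlation between the two variables
cor_sq <- cor(bream_head$length_cm, bream_head$mass_g) ^ 2
# R-squared as stored in the model summary
mdl_r_sq <- summary(mdl_head)$r.squared

# Compare with a numerical tolerance rather than exact equality
all.equal(cor_sq, mdl_r_sq)
```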

Residual standard error (RSE)

A "typical" difference between a prediction and an observed response.

It has the same unit as the response variable.
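
Written as a formula, RSE is the square root of the residual sum of squares divided by the degrees of freedom. The notation is an assumption added here, not taken from the slides: $y_i$ is an observed response, $\hat{y}_i$ its prediction, $n$ the number of observations, and $p$ the number of model coefficients.

```latex
\[
\mathrm{RSE} = \sqrt{\frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{n - p}}
\]
```

For a simple linear regression with an intercept and one slope, $p = 2$.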

summary() again

Look at the value titled "Residual standard error"

summary(mdl_bream)
# Some lines of output omitted

Residual standard error: 74.15 on 33 degrees of freedom
Multiple R-squared:  0.8781,    Adjusted R-squared:  0.8744 
F-statistic: 237.6 on 1 and 33 DF,  p-value: < 2.2e-16

glance() again

library(broom)
library(dplyr)
mdl_bream %>% 
  glance()
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.878         0.874  74.2      238. 1.22e-16     1  -199.  405.  409.
# ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
mdl_bream %>% 
  glance() %>% 
  pull(sigma)
74.15224

Calculating RSE: residuals squared

bream %>% 
  mutate(
    residuals_sq = residuals(mdl_bream) ^ 2
  )
  species mass_g length_cm residuals_sq
1   Bream    242      23.2     138.9571
2   Bream    290      24.0     260.7586
3   Bream    340      23.9    5126.9926
4   Bream    363      26.3    1318.9197
5   Bream    430      26.5     390.9743
6   Bream    450      26.8     547.9380
...

Calculating RSE: sum of residuals squared

bream %>% 
  mutate(
    residuals_sq = residuals(mdl_bream) ^ 2
  ) %>% 
  summarize(
    resid_sum_of_sq = sum(residuals_sq)
  )
  resid_sum_of_sq
1        181452.3

Calculating RSE: degrees of freedom

Degrees of freedom equals the number of observations minus the number of model coefficients. This model has two coefficients, the intercept and the slope, hence the `- 2` below.

bream %>% 
  mutate(
    residuals_sq = residuals(mdl_bream) ^ 2
  ) %>% 
  summarize(
    resid_sum_of_sq = sum(residuals_sq),
    deg_freedom = n() - 2
  )
  resid_sum_of_sq deg_freedom
1        181452.3          33
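
Rather than hardcoding the number of coefficients, the degrees of freedom can also be read from the model object. A small sketch, assuming a hypothetical `bream_head` data frame with only the six bream rows printed earlier (so it gives 4 degrees of freedom, not the 33 of the full model):

```r
# Assumption: only the six bream rows printed on the slides, not the full dataset
bream_head <- data.frame(
  mass_g    = c(242, 290, 340, 363, 430, 450),
  length_cm = c(23.2, 24.0, 23.9, 26.3, 26.5, 26.8)
)
mdl_head <- lm(mass_g ~ length_cm, data = bream_head)

# Observations minus coefficients (intercept and slope)
deg_freedom_manual <- nrow(bream_head) - length(coef(mdl_head))
# The same quantity, extracted from the fitted model
deg_freedom_model <- df.residual(mdl_head)

c(manual = deg_freedom_manual, model = deg_freedom_model)
```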

Calculating RSE: square root of ratio

bream %>% 
  mutate(
    residuals_sq = residuals(mdl_bream) ^ 2
  ) %>% 
  summarize(
    resid_sum_of_sq = sum(residuals_sq),
    deg_freedom = n() - 2,
    rse = sqrt(resid_sum_of_sq / deg_freedom)
  )
  resid_sum_of_sq deg_freedom      rse
1        181452.3          33 74.15224
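
Base R's `sigma()` returns the same quantity directly from a fitted `lm` object, which makes a handy cross-check of the manual calculation. The sketch below assumes a hypothetical `bream_head` data frame with only the six bream rows printed earlier, so its RSE is not the 74.15 of the full model.

```r
# Assumption: only the six bream rows printed on the slides, not the full dataset
bream_head <- data.frame(
  mass_g    = c(242, 290, 340, 363, 430, 450),
  length_cm = c(23.2, 24.0, 23.9, 26.3, 26.5, 26.8)
)
mdl_head <- lm(mass_g ~ length_cm, data = bream_head)

# Manual RSE: square root of the residual sum of squares over degrees of freedom
rse_manual <- sqrt(sum(residuals(mdl_head) ^ 2) / df.residual(mdl_head))
# RSE as reported by the model itself
rse_builtin <- sigma(mdl_head)

all.equal(rse_manual, rse_builtin)
```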

Interpreting RSE

mdl_bream has an RSE of 74.

The difference between predicted bream masses and observed bream masses is typically about 74 g.

Root-mean-square error (RMSE)

Residual standard error

bream %>% 
  mutate(
    residuals_sq = residuals(mdl_bream) ^ 2
  ) %>% 
  summarize(
    resid_sum_of_sq = sum(residuals_sq),
    deg_freedom = n() - 2,
    rse = sqrt(resid_sum_of_sq / deg_freedom)
  )

Root-mean-square error

bream %>% 
  mutate(
    residuals_sq = residuals(mdl_bream) ^ 2
  ) %>% 
  summarize(
    resid_sum_of_sq = sum(residuals_sq),
    n_obs = n(),
    rmse = sqrt(resid_sum_of_sq / n_obs)
  )
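
The only difference between the two is the denominator: RSE divides by the degrees of freedom, RMSE by the number of observations, so RMSE is always the smaller of the two. As a worked comparison, the sketch below reuses the residual sum of squares and degrees of freedom printed on the earlier slides; the observation count of 35 is inferred from 33 degrees of freedom plus 2 coefficients.

```r
# Values taken from the output shown on the earlier slides
resid_sum_of_sq <- 181452.3
deg_freedom <- 33

# Inferred: degrees of freedom plus the two model coefficients
n_obs <- deg_freedom + 2

# RSE divides by the degrees of freedom
rse <- sqrt(resid_sum_of_sq / deg_freedom)
# RMSE divides by the number of observations
rmse <- sqrt(resid_sum_of_sq / n_obs)

c(rse = rse, rmse = rmse)  # roughly 74.2 and 72.0
```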

Let's practice!
