Intermediate Regression in R
Richie Cotton
Data Evangelist at DataCamp
library(tibble)
library(ggplot2)

# Gaussian (normal) probability density function (PDF)
gaussian_distn <- tibble(
  x = seq(-4, 4, 0.05),
  gauss_pdf_x = dnorm(x)
)
ggplot(gaussian_distn, aes(x, gauss_pdf_x)) +
  geom_line()
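To see what dnorm() computes, the sketch below writes out the standard normal density, $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$, by hand and compares it with the built-in function. It is a minimal check that uses base R only (no tibble or ggplot2); the variable names are illustrative.

```r
# Standard normal PDF written out by hand: exp(-x^2 / 2) / sqrt(2 * pi)
x <- seq(-4, 4, 0.05)
manual_pdf <- exp(-x^2 / 2) / sqrt(2 * pi)

# Largest absolute difference from the built-in dnorm(); should be essentially zero
max_diff_pdf <- max(abs(manual_pdf - dnorm(x)))
print(max_diff_pdf)

# The curve peaks at x = 0 with height 1 / sqrt(2 * pi), about 0.399
print(dnorm(0))
```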
# Add the cumulative distribution function (CDF) and plot it
gaussian_distn <- tibble(
  x = seq(-4, 4, 0.05),
  gauss_pdf_x = dnorm(x),
  gauss_cdf_x = pnorm(x)
)
ggplot(gaussian_distn, aes(x, gauss_cdf_x)) +
  geom_line()
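The CDF at a point q is the area under the PDF to the left of q. A minimal sketch, integrating dnorm() with base R's integrate() at a few q values chosen only for illustration, recovers pnorm() up to numerical-integration error:

```r
# Hand-picked points at which to evaluate the CDF (illustrative only)
q_values <- c(-1.96, 0, 1, 2.5)

# Area under the PDF from -Inf up to each q
area_under_pdf <- sapply(q_values, function(q) integrate(dnorm, lower = -Inf, upper = q)$value)

# Compare numerical integration with the built-in CDF
comparison <- data.frame(q = q_values, integrated = area_under_pdf, pnorm = pnorm(q_values))
print(comparison)
```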
# Inverse CDF: probabilities on the x-axis, quantiles on the y-axis
gaussian_distn_inv <- tibble(
  p = seq(0.001, 0.999, 0.001),
  gauss_inv_cdf_p = qnorm(p)
)
ggplot(gaussian_distn_inv, aes(p, gauss_inv_cdf_p)) +
  geom_line()
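The inverse CDF works in the other direction: given a probability, it returns the x value below which that share of the distribution lies. This sketch, with arbitrary example inputs, checks that qnorm() undoes pnorm() and prints a few familiar quantiles:

```r
# qnorm() undoes pnorm(): feeding a CDF value back in recovers the original x
x <- c(-2, -0.5, 0, 1.5)
round_trip <- qnorm(pnorm(x))
print(round_trip)

# Familiar quantiles: 97.5% of the standard normal lies below about 1.96
print(qnorm(c(0.025, 0.5, 0.975)))
```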
curve | prefix | normal | logistic | mnemonic |
---|---|---|---|---|
PDF | d | dnorm() | dlogis() | "d" for differentiate: you differentiate the CDF to get the PDF |
CDF | p | pnorm() | plogis() | "p" is backwards "q", so it's the inverse of the inverse CDF |
Inv. CDF | q | qnorm() | qlogis() | "q" for quantile |
lm(response ~ explanatory, data = dataset)                      # linear regression
glm(response ~ explanatory, data = dataset, family = gaussian)  # same model as the lm() call
glm(response ~ explanatory, data = dataset, family = binomial)  # logistic regression
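To see that the gaussian family reproduces lm(), here is a sketch that fits the same model both ways. It assumes R's built-in mtcars dataset as a stand-in for dataset, with mpg and wt as example variables; the binary am column (0 = automatic, 1 = manual) stands in for a yes/no response in the binomial fit.

```r
# mtcars is a built-in dataset; mpg (fuel efficiency) and wt (weight) are example variables
mdl_lm <- lm(mpg ~ wt, data = mtcars)
mdl_glm_gaussian <- glm(mpg ~ wt, data = mtcars, family = gaussian)

# Same model, same coefficients
print(coef(mdl_lm))
print(coef(mdl_glm_gaussian))
print(all.equal(coef(mdl_lm), coef(mdl_glm_gaussian)))

# Logistic regression: am is 0 (automatic) or 1 (manual), so use family = binomial
mdl_logistic <- glm(am ~ wt, data = mtcars, family = binomial)
print(coef(mdl_logistic))
```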
str(gaussian())
List of 11
$ family : chr "gaussian"
$ link : chr "identity"
$ linkfun :function (mu)
$ linkinv :function (eta)
$ variance :function (mu)
$ dev.resids:function (y, mu, wt)
$ aic :function (y, n, mu, wt, dev)
$ mu.eta :function (eta)
$ initialize: expression({ n <- rep.int(1, nobs) if (is.null(etastart) && is.null(start) &&
is.null(mustart) && ((family$link| __truncated__
$ validmu :function (mu)
$ valideta :function (eta)
- attr(*, "class")= chr "family"
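Because a family object is a list, its elements can be extracted with $. A small sketch comparing the Gaussian family with the binomial family used for logistic regression; the mu values are arbitrary means chosen for illustration:

```r
# A family object is a list, so elements can be extracted with $
print(gaussian()$family)   # "gaussian"
print(gaussian()$link)     # "identity"
print(binomial()$link)     # "logit": the default link for logistic regression

# The variance function says how the response's variance depends on its mean, mu
mu <- c(0.1, 0.5, 0.9)     # illustrative mean values
gaussian_var <- gaussian()$variance(mu)   # constant: 1 for every mu
binomial_var <- binomial()$variance(mu)   # mu * (1 - mu)
print(gaussian_var)
print(binomial_var)
```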
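The link function is a transformation of the response variable's mean. For the Gaussian family it is the identity function, so linkfun and its inverse, linkinv, both return their input unchanged.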
gaussian()$linkfun
function (mu)
mu
gaussian()$linkinv
function (eta)
eta
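The linkfun and linkinv elements are inverses of each other. This sketch, with illustrative mu values, shows that the identity link leaves values unchanged, while the binomial family's logit link agrees numerically with qlogis() and plogis() (compared with all.equal() to allow for floating-point differences):

```r
mu <- c(0.2, 0.5, 0.8)   # illustrative means (probabilities)

# Identity link (Gaussian family): both directions return the input unchanged
print(gaussian()$linkfun(mu))
print(gaussian()$linkinv(mu))

# Logit link (binomial family): linkfun maps probabilities onto the whole real line
logit_family <- binomial()
eta <- logit_family$linkfun(mu)
print(eta)

# linkfun matches the logistic inverse CDF, and linkinv matches the logistic CDF
print(all.equal(eta, qlogis(mu)))
print(all.equal(logit_family$linkinv(eta), plogis(eta)))
print(all.equal(logit_family$linkinv(eta), mu))
```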
# Logistic distribution probability density function (PDF)
logistic_distn <- tibble(
  x = seq(-6, 6, 0.05),
  logistic_pdf_x = dlogis(x)
)
ggplot(logistic_distn, aes(x, logistic_pdf_x)) +
  geom_line()
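The standard logistic PDF also has a closed form, $\frac{e^{-x}}{(1 + e^{-x})^2}$. The sketch below compares it with dlogis() and contrasts the logistic and normal densities at x = 4; the much larger logistic value there is consistent with the wider x range used in the plot above.

```r
# Standard logistic PDF by hand: exp(-x) / (1 + exp(-x))^2
x <- seq(-6, 6, 0.05)
manual_logis_pdf <- exp(-x) / (1 + exp(-x))^2
print(max(abs(manual_logis_pdf - dlogis(x))))

# Peak height at x = 0 is exactly 1/4
print(dlogis(0))

# Far from zero, the logistic density is much larger than the standard normal density
print(c(logistic = dlogis(4), normal = dnorm(4)))
```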
The logistic distribution CDF is also called the logistic function:

$\text{cdf}(x) = \frac{1}{1 + e^{-x}}$
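A quick numerical check that plogis() matches the closed-form logistic CDF, $\frac{1}{1 + e^{-x}}$, using the same x grid as the logistic PDF plot:

```r
# Logistic CDF from the formula 1 / (1 + exp(-x))
x <- seq(-6, 6, 0.05)
manual_logis_cdf <- 1 / (1 + exp(-x))
print(max(abs(manual_logis_cdf - plogis(x))))

# S-shaped curve: exactly 0.5 at x = 0, approaching 0 and 1 in the tails
print(plogis(c(-6, 0, 6)))
```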
The logistic distribution inverse CDF is also called the logit function: $\text{logit}(p) = \log\left(\frac{p}{1 - p}\right)$.
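The logit has the closed form $\log\left(\frac{p}{1 - p}\right)$, the log-odds of p. This sketch, reusing the probability grid from the Gaussian inverse-CDF plot, compares it with qlogis() and confirms that plogis() reverses it:

```r
# Logit: log-odds of a probability p
p <- seq(0.001, 0.999, 0.001)
manual_logit <- log(p / (1 - p))
print(max(abs(manual_logit - qlogis(p))))

# plogis() undoes qlogis(): applying one after the other returns the input
print(max(abs(plogis(qlogis(p)) - p)))

# Even odds (p = 0.5) give a log-odds of 0
print(qlogis(0.5))
```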