Linear predictor
\[\eta_i = \beta_{0} + \beta_{1}X_{i1} + \beta_{2}X_{i2} + \dots + \beta_{k}X_{ik}\]
Link function \(g(\cdot)\)
\[g(\mu_i) = \eta_i = \beta_{0} + \beta_{1}X_{i1} + \beta_{2}X_{i2} + \dots + \beta_{k}X_{ik}\] \[\mu_i = g^{-1}(\eta_i) = g^{-1}(\beta_{0} + \beta_{1}X_{i1} + \beta_{2}X_{i2} + \dots + \beta_{k}X_{ik})\]
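To make the relationship between \(\eta_i\), \(g(\cdot)\), and \(g^{-1}(\cdot)\) concrete, here is a minimal Python sketch. The coefficient and regressor values are hypothetical and chosen only so that \(\eta_i = 0\); the three links shown (identity, log, logit) are the ones used later in these notes.

```python
import math

# Each entry pairs a link g(mu) with its inverse g^{-1}(eta).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),
}

def linear_predictor(beta, x):
    """eta_i = beta_0 + beta_1 * x_1 + ... + beta_k * x_k."""
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))

beta = [0.5, -1.0, 2.0]   # hypothetical coefficients (beta_0, beta_1, beta_2)
x = [1.0, 0.25]           # hypothetical regressor values (X_i1, X_i2)
eta = linear_predictor(beta, x)   # 0.5 - 1.0 + 0.5 = 0.0

for name, (g, g_inv) in LINKS.items():
    mu = g_inv(eta)       # map the linear predictor to the mean
    print(name, mu, g(mu))  # g(mu) recovers eta
```

The same linear predictor yields a different mean under each link: 0 under the identity, 1 under the log, and 0.5 under the logit.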
Least squares regression
\[\hat{\boldsymbol{\beta}} = (\mathbf{X}^{'}\mathbf{X})^{-1}\mathbf{X}^{'}\mathbf{Y}\]
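As an illustration of the normal equations, the sketch below solves \((\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}\) for the special case of an intercept plus one regressor, where \(\mathbf{X}'\mathbf{X}\) is a 2x2 matrix that can be inverted by hand. The function name and the data are hypothetical; the data are constructed so that \(Y = 1 + 2X\) exactly.

```python
def ols_two_column(x, y):
    """Solve beta = (X'X)^{-1} X'Y for X = [1, x] using the explicit 2x2 inverse."""
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx            # determinant of X'X = [[n, sx], [sx, sxx]]
    b0 = (sxx * sy - sx * sxy) / det   # first row of (X'X)^{-1} times X'Y
    b1 = (n * sxy - sx * sy) / det     # second row of (X'X)^{-1} times X'Y
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical regressor
y = [3.0, 5.0, 7.0, 9.0, 11.0]     # hypothetical response, y = 1 + 2x
b0, b1 = ols_two_column(x, y)
print(b0, b1)
```

For more than one regressor a linear algebra routine would normally replace the hand-written inverse.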
OLS as a special case of a GLM
\[Pr(Y_i = y_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp \left[-\frac{(Y_i - \mu)^2}{2\sigma^2}\right]\]
\[ \begin{aligned} \log L(\mu, \sigma^2 | Y) &= \log \prod_{i = 1}^{N}{\frac{1}{\sqrt{2\pi\sigma^2}} \exp \left[-\frac{(Y_i - \mu)^2}{2\sigma^2}\right]} \\ \log L(\mu, \sigma^2 | Y) &= \sum_{i=1}^{N}{\log\left(\frac{1}{\sqrt{2\pi\sigma^2}} \exp \left[-\frac{(Y_i - \mu)^2}{2\sigma^2}\right]\right)} \\ \log L(\mu, \sigma^2 | Y) &= -\frac{N}{2} \log(2\pi) - \sum_{i = 1}^{N} \left[ \frac{1}{2} \log{\sigma^2} + \frac{1}{2\sigma^2} (Y_i - \mu)^2 \right] \end{aligned} \]
| id | salary |
|---|---|
| 1 | 60 |
| 2 | 55 |
| 3 | 65 |
| 4 | 50 |
| 5 | 70 |
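The five salaries in the table can be used to evaluate the normal log-likelihood directly. The sketch below assumes the per-observation log-density \(-\frac{1}{2}\log(2\pi) - \frac{1}{2}\log\sigma^2 - \frac{(Y_i - \mu)^2}{2\sigma^2}\) and uses the standard closed-form maximum likelihood estimates (the sample mean, and the mean squared deviation with divisor \(N\)). The function name is illustrative.

```python
import math

salary = [60, 55, 65, 50, 70]  # values from the table above

def normal_loglik(mu, sigma2, y):
    """Normal log-likelihood of the sample y at (mu, sigma2)."""
    n = len(y)
    ss = sum((yi - mu) ** 2 for yi in y)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * n * math.log(sigma2)
            - ss / (2 * sigma2))

n = len(salary)
mu_hat = sum(salary) / n                                  # 60.0
sigma2_hat = sum((yi - mu_hat) ** 2 for yi in salary) / n  # 250 / 5 = 50.0

ll_mle = normal_loglik(mu_hat, sigma2_hat, salary)
# Moving either parameter away from its estimate lowers the log-likelihood.
print(ll_mle > normal_loglik(58.0, sigma2_hat, salary))
print(ll_mle > normal_loglik(mu_hat, 80.0, salary))
```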
\[ \begin{aligned} E(Y_i) &\equiv \mu_i = \beta_0 + \beta_{1}X_{i} \\ \mathrm{Var}(Y_i) &= \sigma^2 \end{aligned} \]
\[ \begin{aligned} \log L(\beta_0, \beta_1, \sigma^2 | Y) &= \log \prod_{i = 1}^{N}{\frac{1}{\sqrt{2\pi\sigma^2}} \exp \left[-\frac{(Y_i - \beta_0 - \beta_{1}X_{i})^2}{2\sigma^2}\right]} \\ \log L(\beta_0, \beta_1, \sigma^2 | Y) &= -\frac{N}{2} \log(2\pi) \\ &\quad - \sum_{i = 1}^{N} \left[ \frac{1}{2} \log{\sigma^2} + \frac{1}{2\sigma^2} (Y_i - \beta_0 - \beta_{1}X_{i})^2 \right] \end{aligned} \]
Kernel of the log-likelihood
\[-\sum_{i = 1}^{N} \left[ \frac{1}{2} \log{\sigma^2} + \frac{1}{2\sigma^2} (Y_i - \beta_0 - \beta_{1}X_{i})^2 \right]\]
Residual sum of squared errors
\[RSS = \sum_{i = 1}^{N} (Y_i - \beta_0 - \beta_{1}X_{i})^2\]
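For any fixed \(\sigma^2\), the normal log-likelihood equals a constant minus \(RSS / (2\sigma^2)\), so the coefficients that minimize the RSS also maximize the likelihood. The sketch below checks this numerically on hypothetical data: it ranks a few candidate coefficient pairs by RSS and by log-likelihood and confirms the two rankings agree. All names and values are illustrative.

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical regressor
y = [2.2, 3.9, 6.1, 8.0, 9.8]   # hypothetical response

def rss(b0, b1):
    """Residual sum of squares for the line b0 + b1 * x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

def loglik(b0, b1, sigma2):
    """Normal log-likelihood with mean b0 + b1 * x and variance sigma2."""
    n = len(y)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * n * math.log(sigma2)
            - rss(b0, b1) / (2 * sigma2))

# Closed-form least squares estimates for a single regressor.
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
b1_ols = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
b0_ols = y_bar - b1_ols * x_bar

sigma2 = 1.0  # any fixed positive value gives the same ranking of coefficients
candidates = [(b0_ols, b1_ols), (b0_ols + 0.5, b1_ols), (b0_ols, b1_ols - 0.2)]
by_rss = sorted(candidates, key=lambda b: rss(*b))                  # smallest RSS first
by_loglik = sorted(candidates, key=lambda b: -loglik(*b, sigma2))   # largest log L first
print(by_rss == by_loglik)
```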
\[ \begin{aligned} \eta_i &= g(\mu_i) = \mu_i \\ \mu_i &= g^{-1}(\eta_i) = \eta_i \end{aligned} \]
The random component is the normal distribution:
\[Pr(Y_i = y_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp \left[-\frac{(Y_i - \mu_i)^2}{2\sigma^2}\right]\]
The linear predictor is:
\[\eta_{i} = \beta_0 + \beta_{1}X_{i}\]
The link function is the identity function:
\[\eta_i = \mu_i\]
\[Pr(Y_i = y_i | \pi_i) = \pi_i^{y_i} (1 - \pi_i)^{(1 - y_i)}\]
\[\pi_i = \eta_i\]
In the linear context
\[g(\pi_i) \equiv \eta_i = \beta_0 + \beta_{1}X_i\]
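Logit link function and its inverse
\[g(\pi_i) = \log \left( \frac{\pi_i}{1 - \pi_i} \right) = \eta_i, \qquad \pi_i = g^{-1}(\eta_i) = \frac{e^{\eta_i}}{1 + e^{\eta_i}}\]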
The random component is the Bernoulli distribution
\[Pr(Y_i = y_i | \pi_i) = \pi_i^{y_i} (1 - \pi_i)^{(1 - y_i)}\]
The linear predictor is:
\[\eta_i = \beta_0 + \beta_{1}X_i\]
The link function is the logit function:
\[\eta_i = \log \left( \frac{\pi_i}{1 - \pi_i} \right)\]
\[ \begin{aligned} L_i &= \left( \frac{e^{\eta_i}}{1 + e^{\eta_i}} \right) ^ {Y_i} \left[ 1 - \left( \frac{e^{\eta_i}}{1 + e^{\eta_i}} \right) \right]^{1 - Y_i} \\ L_i &= \left( \frac{e^{\beta_0 + \beta_{1}X_i}}{1 + e^{\beta_0 + \beta_{1}X_i}} \right) ^ {Y_i} \left[ 1 - \left( \frac{e^{\beta_0 + \beta_{1}X_i}}{1 + e^{\beta_0 + \beta_{1}X_i}} \right) \right]^{1 - Y_i} \\ L &= \prod_{i = 1}^{N} \left( \frac{e^{\beta_0 + \beta_{1}X_i}}{1 + e^{\beta_0 + \beta_{1}X_i}} \right) ^ {Y_i} \left[ 1 - \left( \frac{e^{\beta_0 + \beta_{1}X_i}}{1 + e^{\beta_0 + \beta_{1}X_i}} \right) \right]^{1 - Y_i} \\ \log L &= \sum_{i = 1}^{N} Y_i \log \left( \frac{e^{\beta_0 + \beta_{1}X_i}}{1 + e^{\beta_0 + \beta_{1}X_i}} \right) \\ &\quad + \sum_{i = 1}^{N}(1 - Y_i) \log \left[ 1 - \left( \frac{e^{\beta_0 + \beta_{1}X_i}}{1 + e^{\beta_0 + \beta_{1}X_i}} \right) \right] \end{aligned} \]
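The final line of the derivation translates directly into code. This is a minimal sketch that evaluates the Bernoulli log-likelihood under the logit link for given \(\beta_0\) and \(\beta_1\); the function name and the six observations are hypothetical.

```python
import math

def logit_loglik(b0, b1, x, y):
    """Sum of Y_i * log(pi_i) + (1 - Y_i) * log(1 - pi_i) with pi_i = e^eta / (1 + e^eta)."""
    total = 0.0
    for xi, yi in zip(x, y):
        eta = b0 + b1 * xi
        pi = math.exp(eta) / (1 + math.exp(eta))
        total += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total

x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]  # hypothetical regressor
y = [0, 0, 1, 0, 1, 1]                # hypothetical binary outcomes

ll_null = logit_loglik(0.0, 0.0, x, y)   # every pi_i = 0.5, so log L = N * log(0.5)
ll_slope = logit_loglik(0.0, 1.0, x, y)  # a positive slope fits these data better
print(ll_null, ll_slope)
```

Maximum likelihood estimation searches over \(\beta_0\) and \(\beta_1\) for the pair that makes this sum as large as possible; there is no closed-form solution, so the search is iterative.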
\(\pi_{ij}\) denotes the probability that the \(i\)th observation falls into the \(j\)th category of the response variable:
\[\pi_{ij} \equiv \text{Pr}(Y_i = j), \text{ for } j = 1, \dots, m\]
For a model with \(k\) regressors, \(X_1, \dots, X_k\):
\[\pi_{ij} = \frac{e^{[\gamma_{0j} + \gamma_{1j}X_{i1} + \dots + \gamma_{kj}X_{ik}]}}{1 + \sum_{l = 1}^{m-1} e^{[\gamma_{0l} + \gamma_{1l}X_{i1} + \dots + \gamma_{kl}X_{ik}]}}, \text{for } j = 1, \dots, m-1\]
\[\pi_{im} = 1 - \sum_{j = 1}^{m-1} \pi_{ij}\]
\[\sum_{j = 1}^{m} \pi_{ij} = 1\]
\[\log \frac{\pi_{ij}}{\pi_{im}} = \gamma_{0j} + \gamma_{1j}X_{i1} + \dots + \gamma_{kj}X_{ik}, \text{for } j = 1, \dots, m-1\]
Dichotomous logit model
\[\log \frac{\pi_{i1}}{\pi_{i2}} = \log \frac{\pi_{i1}}{1 - \pi_{i1}} = \gamma_{01} + \gamma_{11}X_{i1} + \dots + \gamma_{k1}X_{ik}\]
## Observations: 712
## Variables: 12
## $ PassengerId <int> 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16...
## $ Survived <int> 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0,...
## $ Pclass <int> 3, 1, 3, 1, 3, 1, 3, 3, 2, 3, 1, 3, 3, 3, 2, 3, 3,...
## $ Name <chr> "Braund, Mr. Owen Harris", "Cumings, Mrs. John Bra...
## $ Sex <chr> "male", "female", "female", "female", "male", "mal...
## $ Age <dbl> 22, 38, 26, 35, 35, 54, 2, 27, 14, 4, 58, 20, 39, ...
## $ SibSp <int> 1, 1, 0, 1, 0, 0, 3, 0, 1, 1, 0, 0, 1, 0, 0, 4, 1,...
## $ Parch <int> 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 5, 0, 0, 1, 0,...
## $ Ticket <chr> "A/5 21171", "PC 17599", "STON/O2. 3101282", "1138...
## $ Fare <dbl> 7.25, 71.28, 7.92, 53.10, 8.05, 51.86, 21.07, 11.1...
## $ Cabin <chr> "", "C85", "", "C123", "", "E46", "", "", "", "G6"...
## $ Embarked <chr> "S", "C", "S", "S", "S", "S", "S", "S", "C", "S", ...
## # weights: 9 (4 variable)
## initial value 782.211950
## iter 10 value 423.965155
## final value 422.773331
## converged
## Call:
## multinom(formula = Embarked ~ Fare, data = titanic_multi)
##
## Coefficients:
## (Intercept) Fare
## C -1.96 0.0126
## Q -2.64 -0.0159
##
## Std. Errors:
## (Intercept) Fare
## C 0.133 0.00202
## Q 0.278 0.01082
##
## Residual Deviance: 846
## AIC: 854
## (Intercept) Fare
## C -14.78 6.21
## Q -9.47 -1.47
## (Intercept) Fare
## C 0 5.24e-10
## Q 0 1.41e-01
## (Intercept) Fare
## C 0.1407 1.013
## Q 0.0717 0.984
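The fitted coefficients can be turned into predicted probabilities with the multinomial logit formula for \(\pi_{ij}\). The sketch below uses the rounded estimates from the `multinom()` output and assumes that the category with no coefficient row (S) is the baseline \(m\). The fare of 50 is an arbitrary illustrative value and the function name is hypothetical.

```python
import math

# Rounded (intercept, slope on Fare) estimates from the multinom() output above.
# Rows are the non-baseline categories; S is assumed to be the baseline.
coefs = {"C": (-1.96, 0.0126), "Q": (-2.64, -0.0159)}

def embark_probs(fare):
    """Predicted probability of each embarkation category at a given fare."""
    expo = {j: math.exp(g0 + g1 * fare) for j, (g0, g1) in coefs.items()}
    denom = 1 + sum(expo.values())
    probs = {j: e / denom for j, e in expo.items()}
    probs["S"] = 1 - sum(probs.values())  # baseline takes the remaining probability
    return probs

p = embark_probs(50)  # hypothetical fare
print(p)
# The log-odds against the baseline reproduce the linear predictor for C.
print(math.log(p["C"] / p["S"]), -1.96 + 0.0126 * 50)
```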
Latent continuous variable
\[\zeta_i = \beta_{0} + \beta_{1}X_{i1} + \dots + \beta_{k}X_{ik} + \epsilon_i\]
Divide into \(m\) regions using \(m-1\) thresholds
\[ Y_i = \begin{cases} 1 & \text{if } \zeta_{i} \leq \alpha_{1} \\ 2 & \text{if } \alpha_{1} \lt \zeta_{i} \leq \alpha_{2} \\ \vdots & \\ m - 1 & \text{if } \alpha_{m - 2} \lt \zeta_{i} \leq \alpha_{m - 1} \\ m & \text{if } \alpha_{m - 1} \lt \zeta_{i} \\ \end{cases} \]
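Parallel slopes assumption: the coefficients \(\beta_1, \dots, \beta_k\) are the same at every threshold; only the thresholds \(\alpha_j\) differ between levels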
## Call:
## MASS::polr(formula = Pclass ~ Fare, data = titanic_ord, Hess = TRUE)
##
## Coefficients:
## Value Std. Error t value
## Fare -0.0958 0.00682 -14.1
##
## Intercepts:
## Value Std. Error t value
## 1|2 -4.064 0.225 -18.102
## 2|3 -2.024 0.153 -13.257
##
## Residual Deviance: 970.38
## AIC: 976.38
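To read the `polr()` output, it helps to convert it into class probabilities. The sketch below assumes the parameterization documented for `MASS::polr`, \(\text{logit} \, \Pr(Y_i \leq j) = \alpha_j - \eta_i\) with no separate intercept in \(\eta_i\), and logistic errors (the `polr` default). The fares of 0 and 50 are arbitrary illustrative values and the function names are hypothetical.

```python
import math

def logistic(z):
    return 1 / (1 + math.exp(-z))

beta_fare = -0.0958            # Fare coefficient from the output above
cutpoints = [-4.064, -2.024]   # thresholds 1|2 and 2|3 from the output above

def class_probs(fare):
    """Return [P(Pclass = 1), P(Pclass = 2), P(Pclass = 3)] at a given fare."""
    eta = beta_fare * fare
    cum = [logistic(a - eta) for a in cutpoints] + [1.0]  # P(Y <= j) for j = 1, 2, 3
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

p0 = class_probs(0)    # hypothetical fare of 0
p50 = class_probs(50)  # hypothetical fare of 50
print(p0)
print(p50)
```

Under these assumptions, a higher fare shifts probability toward class 1, which matches the negative sign on `Fare` when class 1 is the lowest-numbered category.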
\[P(Y_i = y_i | \mu) = \frac{\mu^{y_i} e^{-\mu}}{y_i!}\]
The random component is the Poisson distribution:
\[P(Y_i = y_i | \mu) = \frac{\mu^{y_i} e^{-\mu}}{y_i!}\]
The linear predictor is
\[\eta_i = \beta_0 + \beta_{1}X_i\]
The canonical link function for the Poisson distribution is the log function
\[\eta_i = \log(\mu_i)\]
Estimation:
\[P(Y_i = y_i | \beta_0, \beta_1) = \frac{\left(e^{\beta_0 + \beta_{1}X_i}\right)^{y_i} \exp\left(-e^{\beta_0 + \beta_{1}X_i}\right)}{y_i!}\]
## Observations: 650
## Variables: 12
## $ ccode <dbl> 404, 404, 404, 404, 404, 404, 404, 404, 404, 404, 40...
## $ year <dbl> 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003...
## $ cabbr <chr> "", "", "GNB", "", "GNB", "", "GNB", "", "", "", "",...
## $ INTERNAL <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ latitude <dbl> 12.08, 12.08, 12.08, 12.08, 12.08, 12.08, 12.08, 12....
## $ longitude <dbl> -14.6, -14.6, -14.6, -14.6, -14.6, -14.6, -14.6, -14...
## $ literacy <dbl> 44.0, 42.5, 41.0, 39.5, 38.0, 40.0, 42.0, 44.0, 46.0...
## $ refugees <dbl> 15400, 15400, 16000, 6600, 7100, 7600, 7300, 7600, 7...
## $ lnRefs <dbl> 9.64, 9.64, 9.68, 8.79, 8.87, 8.94, 8.90, 8.94, 8.94...
## $ lnGDP <dbl> 6.45, 6.53, 6.57, 6.21, 6.26, 6.30, 6.27, 6.17, 6.13...
## $ lnTrade <dbl> 3.85, 3.75, 4.11, 3.92, 4.21, 4.42, 4.52, 4.41, 4.35...
## $ PolityLag <dbl> NA, 5, 5, 5, 0, 3, 5, 5, 5, -1, -1, 6, 6, NA, -5, -5...
##
## Call:
## glm(formula = INTERNAL ~ literacy, family = "poisson", data = africa)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.171 -0.684 -0.511 -0.411 2.808
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.000277 0.270673 0.00 1
## literacy -0.029070 0.005083 -5.72 1.1e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 405.45 on 563 degrees of freedom
## Residual deviance: 372.10 on 562 degrees of freedom
## (86 observations deleted due to missingness)
## AIC: 584.2
##
## Number of Fisher Scoring iterations: 6
## (Intercept) literacy
## 1.000 0.971
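Because the log link gives \(\mu_i = e^{\eta_i}\), the coefficients act multiplicatively on the expected count. The sketch below uses the estimates from the first Poisson model to compute an expected value of `INTERNAL` and the probability of observing a given count. The literacy value of 50 is an arbitrary illustrative input and the function names are hypothetical.

```python
import math

b0, b1 = -0.000277, -0.029070  # estimates from the glm() output above

def expected_count(literacy):
    """mu_i = exp(beta_0 + beta_1 * literacy) under the log link."""
    return math.exp(b0 + b1 * literacy)

def poisson_pmf(y, mu):
    """P(Y = y | mu) = mu^y * e^(-mu) / y!"""
    return mu ** y * math.exp(-mu) / math.factorial(y)

mu_50 = expected_count(50)  # hypothetical literacy value
print(mu_50)
print(poisson_pmf(0, mu_50))               # probability of a zero count
print(expected_count(51) / mu_50, math.exp(b1))  # one-unit change multiplies mu by exp(b1)
```

The ratio in the last line is the exponentiated coefficient reported in the output above (about 0.971).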
##
## Call:
## glm(formula = INTERNAL ~ literacy + PolityLag, family = "poisson",
## data = africa)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.380 -0.677 -0.517 -0.342 2.803
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.11181 0.28678 0.39 0.697
## literacy -0.03063 0.00539 -5.68 1.4e-08 ***
## PolityLag -0.04733 0.01946 -2.43 0.015 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 380.81 on 520 degrees of freedom
## Residual deviance: 341.96 on 518 degrees of freedom
## (129 observations deleted due to missingness)
## AIC: 544.1
##
## Number of Fisher Scoring iterations: 6
Test for over/under dispersion
\[V(Y_i | \eta_i) = \phi \mu_i\]
##
## Call:
## glm(formula = INTERNAL ~ literacy, family = "quasipoisson", data = africa)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.171 -0.684 -0.511 -0.411 2.808
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.000277 0.263033 0.00 1
## literacy -0.029070 0.004940 -5.89 6.8e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 0.944)
##
## Null deviance: 405.45 on 563 degrees of freedom
## Residual deviance: 372.10 on 562 degrees of freedom
## (86 observations deleted due to missingness)
## AIC: NA
##
## Number of Fisher Scoring iterations: 6
##
## Call:
## glm(formula = INTERNAL ~ literacy + PolityLag, family = "quasipoisson",
## data = africa)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.380 -0.677 -0.517 -0.342 2.803
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.11181 0.27866 0.40 0.688
## literacy -0.03063 0.00524 -5.84 9e-09 ***
## PolityLag -0.04733 0.01891 -2.50 0.013 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 0.944)
##
## Null deviance: 380.81 on 520 degrees of freedom
## Residual deviance: 341.96 on 518 degrees of freedom
## (129 observations deleted due to missingness)
## AIC: NA
##
## Number of Fisher Scoring iterations: 6
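Under the quasi-Poisson variance function \(V(Y_i | \eta_i) = \phi \mu_i\), the coefficient estimates are unchanged and the Poisson standard errors are multiplied by \(\sqrt{\phi}\). The sketch below checks this against the reported output for the `INTERNAL ~ literacy` models, using the rounded dispersion estimate of 0.944, so agreement is only expected to a few significant digits.

```python
import math

phi = 0.944  # dispersion parameter reported by the quasipoisson fit

# Standard errors copied from the two INTERNAL ~ literacy outputs above.
poisson_se = {"(Intercept)": 0.270673, "literacy": 0.005083}
quasi_se_reported = {"(Intercept)": 0.263033, "literacy": 0.004940}

# Scale each Poisson standard error by sqrt(phi).
quasi_se_scaled = {k: se * math.sqrt(phi) for k, se in poisson_se.items()}

for k in poisson_se:
    print(k, quasi_se_scaled[k], quasi_se_reported[k])
```

Because the estimated \(\phi\) is slightly below 1, the quasi-Poisson standard errors are slightly smaller than the Poisson ones, and the substantive conclusions are the same in both fits.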