Generalized linear models

Computational Mathematics and Statistics Camp University of Chicago September 2018

Generalized linear models

Flexible class of models to estimate linear regression for response variables that have error distribution models other than the normal distribution
Typically estimated via maximum likelihood estimation

Elements of a GLM

Random component - conditional distribution of the response variable \(Y_i\) given the values of the predictor variables
- Exponential family
Linear predictor

\[\eta_i = \beta_{0} + \beta_{1}X_{i1} + \beta_{2}X_{i2} + \dots + \beta_{k}X_{ik}\]
Link function \(g(\cdot)\)

\[g(\mu_i) = \eta_i = \beta_{0} + \beta_{1}X_{i1} + \beta_{2}X_{i2} + \dots + \beta_{k}X_{ik}\] \[\mu_i = g^{-1}(\eta_i) = g^{-1}(\beta_{0} + \beta_{1}X_{i1} + \beta_{2}X_{i2} + \dots + \beta_{k}X_{ik})\]
- Transforms the expectation of the response variable, \(\mu_i \equiv E(Y_i)\) to the linear predictor
- Canonical link functions

GLMs and ordinary least squares regression

Least squares regression

\[\boldsymbol{\beta} = \mathbf{(X^{'}X)^{-1}X^{'}Y}\]
OLS as a special case of a GLM

GLMs and ordinary least squares regression

\[Pr(Y_i = y_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp \left[\frac{(Y_i - \mu)^2}{2\sigma^2}\right]\]

Log-likelihood function

\[ \begin{aligned} \log L(\hat{\mu}, \hat{\sigma}^2 | Y) &= \log \prod_{i = 1}^{N}{\frac{1}{\sqrt{2\pi\sigma^2}} \exp \left[\frac{(Y_i - \mu)^2}{2\sigma^2}\right]} \\ \log L(\hat{\mu}, \hat{\sigma}^2 | Y) &= \sum_{i=1}^{N}{\log\left(\frac{1}{\sqrt{2\pi\sigma^2}} \exp \left[\frac{(Y_i - \mu)^2}{2\sigma^2}\right]\right)} \\ \log L(\hat{\mu}, \hat{\sigma}^2 | Y) &= -\frac{N}{2} \log(2\pi) - \left[ \sum_{i = 1}^{N} \log{\sigma^2 - \frac{1}{2\sigma^2}} (Y_i - \mu)^2 \right] \end{aligned} \]

Example data

Salaries of assistant professors
id	salary
1	60
2	55
3	65
4	50
5	70

Example loglikelihood function

Adding predictor variables

\[ \begin{aligned} E(Y) &\equiv \mu = \beta_0 + \beta_{1}X_{i} \\ \mathrm{Var}(Y) &= \sigma^2 \end{aligned} \]

\[ \begin{aligned} \log L(\beta_0, \beta_1, \sigma^2 | Y) &= \log \prod_{i = 1}^{N}{\frac{1}{\sqrt{2\pi\sigma^2}} \exp \left[\frac{(Y_i - \beta_0 - \beta_{1}X_{i})^2}{2\sigma^2}\right]} \\ \log L(\beta_0, \beta_1, \sigma^2 | Y) &= -\frac{N}{2} \log(2\pi) \\ &\quad - \left[ \sum_{i = 1}^{N} \log{\sigma^2 - \frac{1}{2\sigma^2}} (Y_i - \beta_0 - \beta_{1}X_{i})^2 \right] \end{aligned} \]

Connecting MLE estimator to OLS

Kernal of the log-likelihood

\[-\sum_{i = 1}^{N} \log{\sigma^2 - \frac{1}{2\sigma^2}} (Y_i - \beta_0 - \beta_{1}X_{i})^2\]
Residual sum of squared errors

\[RSS = \sum_{i = 1}^{N} (Y_i - \beta_0 - \beta_{1}X_{i})^2\]

Link function

\[ \begin{aligned} \eta_i = g(\mu_i) = &\mu_i \\ &\mu_i = g^{-1}(\eta_i) = \eta_i \end{aligned} \]

GLM for linear regression

The random component is the normal distribution:

\[Pr(Y_i = y_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp \left[\frac{(Y_i - \mu)^2}{2\sigma^2}\right]\]
The linear predictor is:

\[\eta_{i} = \beta_0 + \beta_{1}X_{i}\]
The link function is the identity function:

\[\eta_i = \mu_i\]

Why we use the log-likelihood

GLMs and logistic regression

\[Pr(Y_i = y_i | \pi_i) = \pi_i^{y_i} (1 - \pi_i)^{(1 - y_i)}\]

GLMs and logistic regression

\[\pi_i = \eta_i\]

In the linear context

\[g(\pi_i) \equiv \eta_i = \beta_0 + \beta_{1}X_i\]
Logit link function

\[g(\pi_i) = \frac{e^{\eta_i}}{1 + e^{\eta_i}}\]

GLMs and logistic regression

The random component is the Bernoulli distribution

\[Pr(Y_i = y_i | \pi) = \pi_i^{y_i} (1 - \pi_i)^{(1 - y_i)}\]
The linear predictor is:

\[\eta_i = \beta_0 + \beta_{1}X_i\]
The link function is the logit function:

\[\eta_i = \log \left( \frac{\pi_i}{1 - \pi_i} \right)\]

GLMs and logistic regression

\[ \begin{aligned} L_i &= \left( \frac{e^{\eta_i}}{1 + e^{\eta_i}} \right) ^ {Y_i} \left[ 1 - \left( \frac{e^{\eta_i}}{1 + e^{\eta_i}} \right) \right]^{1 - Y_i} \\ L_i &= \left( \frac{e^{\beta_0 + \beta_{1}X_i}}{1 + e^{\beta_0 + \beta_{1}X_i}} \right) ^ {Y_i} \left[ 1 - \left( \frac{e^{\beta_0 + \beta_{1}X_i}}{1 + e^{\beta_0 + \beta_{1}X_i}} \right) \right]^{1 - Y_i} \\ L &= \prod_{i = 1}^{N} \left( \frac{e^{\beta_0 + \beta_{1}X_i}}{1 + e^{\beta_0 + \beta_{1}X_i}} \right) ^ {Y_i} \left[ 1 - \left( \frac{e^{\beta_0 + \beta_{1}X_i}}{1 + e^{\beta_0 + \beta_{1}X_i}} \right) \right]^{1 - Y_i} \\ \log L &= \sum_{i = 1}^{N} Y_i \log \left( \frac{e^{\beta_0 + \beta_{1}X_i}}{1 + e^{\beta_0 + \beta_{1}X_i}} \right) \\ &\quad + \sum_{i = 1}^{N}(1 - Y_i) \log \left[ 1 - \left( \frac{e^{\beta_0 + \beta_{1}X_i}}{1 + e^{\beta_0 + \beta_{1}X_i}} \right) \right] \end{aligned} \]

Logistic regression for more than two response classes

Roll-call vote outcomes
Binary response
1. Yay
2. Nay
Multiclass response
1. Yay
2. Nay
3. Abstain
4. Not present

Multinomial logistic regression

\(Y\) can take on any of \(m\) discrete values from \(1,2,\dots,m\)
Ordering does not matter
\(\pi_{ij}\) denotes the probability that the \(i\)th observation falls into the \(j\)th category of the response variable:
\[\pi_{ij} \equiv \text{Pr}(Y_i = j), \text{ for } j = 1, \dots, m\]
For a model with \(k\) regressors, \(X_1, \dots, X_k\):

\[\pi_{ij} = \frac{e^{[\gamma_{0j} + \gamma_{1j}X_{i1} + \dots + \gamma_{kj}X_{ik}]}}{1 + \sum_{l = 1}^{m-1} e^{[\gamma_{0l} + \gamma_{1l}X_{i1} + \dots + \gamma_{kl}X_{ik}]}}, \text{for } j = 1, \dots, m-1\]

\[\pi_{im} = 1 - \sum_{i = 1}^{m-1} \pi_{ij}\]

\[\sum_{j = 1}^{m} = 1\]

Multinomial logistic regression

\[\log \frac{\pi_{ij}}{\pi_{im}} = \gamma_{0j} + \gamma_{1j}X_{i1} + \dots + \gamma_{kj}X_{ik}, \text{for } j = 1, \dots, m\]

Dichotomous logit model

\[\log \frac{\pi_{i1}}{\pi_{i2}} = \log \frac{\pi_{i1}}{1 - \pi_{i1}} = \gamma_{01} + \gamma_{11}X_{i1} + \dots + \gamma_{k1}X_{ik}\]

Applying multinomial logisitic regression

## Observations: 712
## Variables: 12
## $ PassengerId <int> 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16...
## $ Survived    <int> 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0,...
## $ Pclass      <int> 3, 1, 3, 1, 3, 1, 3, 3, 2, 3, 1, 3, 3, 3, 2, 3, 3,...
## $ Name        <chr> "Braund, Mr. Owen Harris", "Cumings, Mrs. John Bra...
## $ Sex         <chr> "male", "female", "female", "female", "male", "mal...
## $ Age         <dbl> 22, 38, 26, 35, 35, 54, 2, 27, 14, 4, 58, 20, 39, ...
## $ SibSp       <int> 1, 1, 0, 1, 0, 0, 3, 0, 1, 1, 0, 0, 1, 0, 0, 4, 1,...
## $ Parch       <int> 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 5, 0, 0, 1, 0,...
## $ Ticket      <chr> "A/5 21171", "PC 17599", "STON/O2. 3101282", "1138...
## $ Fare        <dbl> 7.25, 71.28, 7.92, 53.10, 8.05, 51.86, 21.07, 11.1...
## $ Cabin       <chr> "", "C85", "", "C123", "", "E46", "", "", "", "G6"...
## $ Embarked    <chr> "S", "C", "S", "S", "S", "S", "S", "S", "C", "S", ...

Port of embarkation
- C = Cherbourg
- Q = Queenstown
- S = Southampton

Port of embarkation

## # weights:  9 (4 variable)
## initial  value 782.211950 
## iter  10 value 423.965155
## final  value 422.773331 
## converged

## Call:
## multinom(formula = Embarked ~ Fare, data = titanic_multi)
## 
## Coefficients:
##   (Intercept)    Fare
## C       -1.96  0.0126
## Q       -2.64 -0.0159
## 
## Std. Errors:
##   (Intercept)    Fare
## C       0.133 0.00202
## Q       0.278 0.01082
## 
## Residual Deviance: 846 
## AIC: 854

##   (Intercept)  Fare
## C      -14.78  6.21
## Q       -9.47 -1.47

##   (Intercept)     Fare
## C           0 5.24e-10
## Q           0 1.41e-01

Port of embarkation: odds

##   (Intercept)  Fare
## C      0.1407 1.013
## Q      0.0717 0.984

Port of embarkation: probability

Port of embarkation: cumulative probability

Ordinal logistic regression

More than two ordered response values
Latent continuous variable

\[\zeta_i = \beta_{0} + \beta_{1}X_{i1} + \dots + \beta_{k}X_{ik} + \epsilon_i\]
Divide into \(m\) regions using \(m-1\) thresholds

\[ Y_i = \begin{cases} 1 & \text{if } \zeta_{i} \leq \alpha_{1} \\ 2 & \text{if } \alpha_{1} \lt \zeta_{i} \leq \alpha_{2} \\ \vdots & \\ m - 1 & \text{if } \alpha_{m - 2} \lt \zeta_{i} \leq \alpha_{m - 1} \\ m & \text{if } \alpha_{m - 1} \lt \zeta_{i} \\ \end{cases} \]
Log-odds of increasing to the next level
Difference between levels is constant

Passenger class

## Call:
## MASS::polr(formula = Pclass ~ Fare, data = titanic_ord, Hess = TRUE)
## 
## Coefficients:
##        Value Std. Error t value
## Fare -0.0958    0.00682   -14.1
## 
## Intercepts:
##     Value   Std. Error t value
## 1|2  -4.064   0.225    -18.102
## 2|3  -2.024   0.153    -13.257
## 
## Residual Deviance: 970.38 
## AIC: 976.38

Passenger class

Poisson distribution

\[P(Y_i = y_i | \mu) = \frac{\mu^{y_i} e^{-\mu}}{y_i!}\]

GLM and Poisson regression

The random component is the Poisson distribution:

\[P(Y_i = y_i | \mu) = \frac{\mu^{k} e^{-y_i}}{y_i!}\]
The linear predictor is

\[\eta_i = \beta_0 + \beta_{1}X_i\]
The canonical link function for the Poisson distribution is the log function

\[\eta_i = \log(\mu)\]
Estimation:

\[P(Y_i = y_i | \beta_0, \beta_1) = \frac{\log(\beta_0 + \beta_{1}X_i)^{k} e^{-y_i}}{y_i!}\]

Internal armed conflict in Africa

## Observations: 650
## Variables: 12
## $ ccode     <dbl> 404, 404, 404, 404, 404, 404, 404, 404, 404, 404, 40...
## $ year      <dbl> 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003...
## $ cabbr     <chr> "", "", "GNB", "", "GNB", "", "GNB", "", "", "", "",...
## $ INTERNAL  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ latitude  <dbl> 12.08, 12.08, 12.08, 12.08, 12.08, 12.08, 12.08, 12....
## $ longitude <dbl> -14.6, -14.6, -14.6, -14.6, -14.6, -14.6, -14.6, -14...
## $ literacy  <dbl> 44.0, 42.5, 41.0, 39.5, 38.0, 40.0, 42.0, 44.0, 46.0...
## $ refugees  <dbl> 15400, 15400, 16000, 6600, 7100, 7600, 7300, 7600, 7...
## $ lnRefs    <dbl> 9.64, 9.64, 9.68, 8.79, 8.87, 8.94, 8.90, 8.94, 8.94...
## $ lnGDP     <dbl> 6.45, 6.53, 6.57, 6.21, 6.26, 6.30, 6.27, 6.17, 6.13...
## $ lnTrade   <dbl> 3.85, 3.75, 4.11, 3.92, 4.21, 4.42, 4.52, 4.41, 4.35...
## $ PolityLag <dbl> NA, 5, 5, 5, 0, 3, 5, 5, 5, -1, -1, 6, 6, NA, -5, -5...

Count of internal armed conflicts
Measures of latitude and longitude
Adult literacy rate
Number of refugees living in the country
Lagged natural log of GDP and trade (in constant dollars)
Lagged polity score (ranging from -10 to +10, with 10 = democracy)

Internal armed conflict in Africa

## 
## Call:
## glm(formula = INTERNAL ~ literacy, family = "poisson", data = africa)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.171  -0.684  -0.511  -0.411   2.808  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.000277   0.270673    0.00        1    
## literacy    -0.029070   0.005083   -5.72  1.1e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 405.45  on 563  degrees of freedom
## Residual deviance: 372.10  on 562  degrees of freedom
##   (86 observations deleted due to missingness)
## AIC: 584.2
## 
## Number of Fisher Scoring iterations: 6

Interpreting estimated parameters

## (Intercept)    literacy 
##       1.000       0.971

Multiple predictors

## 
## Call:
## glm(formula = INTERNAL ~ literacy + PolityLag, family = "poisson", 
##     data = africa)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.380  -0.677  -0.517  -0.342   2.803  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.11181    0.28678    0.39    0.697    
## literacy    -0.03063    0.00539   -5.68  1.4e-08 ***
## PolityLag   -0.04733    0.01946   -2.43    0.015 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 380.81  on 520  degrees of freedom
## Residual deviance: 341.96  on 518  degrees of freedom
##   (129 observations deleted due to missingness)
## AIC: 544.1
## 
## Number of Fisher Scoring iterations: 6

Over or underdispersion

Poisson distribution assumes mean and variance are identical
Violation of that assumption
Test for over/under dispersion

\[V(Y_i | \eta_i) = \phi \mu_i\]

Over or underdispersion

## 
## Call:
## glm(formula = INTERNAL ~ literacy, family = "quasipoisson", data = africa)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.171  -0.684  -0.511  -0.411   2.808  
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.000277   0.263033    0.00        1    
## literacy    -0.029070   0.004940   -5.89  6.8e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 0.944)
## 
##     Null deviance: 405.45  on 563  degrees of freedom
## Residual deviance: 372.10  on 562  degrees of freedom
##   (86 observations deleted due to missingness)
## AIC: NA
## 
## Number of Fisher Scoring iterations: 6

Over or underdispersion

## 
## Call:
## glm(formula = INTERNAL ~ literacy + PolityLag, family = "quasipoisson", 
##     data = africa)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.380  -0.677  -0.517  -0.342   2.803  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.11181    0.27866    0.40    0.688    
## literacy    -0.03063    0.00524   -5.84    9e-09 ***
## PolityLag   -0.04733    0.01891   -2.50    0.013 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 0.944)
## 
##     Null deviance: 380.81  on 520  degrees of freedom
## Residual deviance: 341.96  on 518  degrees of freedom
##   (129 observations deleted due to missingness)
## AIC: NA
## 
## Number of Fisher Scoring iterations: 6