Ordinary least squares regression

Computational Mathematics and Statistics Camp University of Chicago September 2018

Motivation

  • \(Y\)
  • \(X\)
  • \(i \in \{1,2,\ldots, N \}\)

    \[Y_i = \mu_i + u_i\]

    • Systematic component
    • Stochastic component
  • Linear relationship

    \[\mu_i = \beta_0 + \beta_1 X_i\] \[Y_i = \beta_0 + \beta_1 X_i + u_i\]

  • Estimate
    • Point estimates: \(\hat{\beta}_0 \text{ and } \hat{\beta}_1\)
    • Variability: standard errors for \(\hat{\beta}_0 \text{ and } \hat{\beta}_1\)

Motivation

\[\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i\] \[ \begin{aligned} \hat{u}_i &= Y_i - \hat{Y}_i \\ &= Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \end{aligned} \]

Estimating \(\hat{\beta}_0 \text{ and } \hat{\beta}_1\)

Desired qualities of estimator

  • Unbiased
    • \(E(\hat{\beta}) = \beta\)
  • Efficient
    • \(\min(Var(\hat{\beta}))\)

Least squares regression

\[ \begin{aligned} &\min_{\beta_0, \beta_1} \hat{S} \\ \hat{S} &= \sum_{i=1}^n \hat{u}_i^2 \\ &= \sum_{i=1}^n (Y_i - (\beta_0 + \beta_1 X_i))^2\\ f(\beta_0, \beta_1 | x_i, y_i) & = \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i )^2\\ \dfrac{\partial{ f(\beta_0, \beta_1 | x_i, y_i)}}{\partial \beta_0} & = -2 (\sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i))\\ & = \sum_{i=1}^n -2Y_i + 2\beta_0 + 2\beta_1 X_i\\ 0 & = \sum_{i=1}^n -2Y_{i} + 2\beta_0 + 2\beta_1 X_i\\ 0 & = -2 \sum_{i=1}^n Y_{i} + 2\sum_{i=1}^n \beta_0 + 2\beta_1 \sum_{i=1}^n X_i\\ 0 & = -2 \sum_{i=1}^n Y_{i} + (n \times 2\beta_0) + 2\beta_1 \sum_{i=1}^n X_i\\ n \times 2\beta_0 & = 2 \sum_{i=1}^n Y_i - 2\beta_1 \sum_{i=1}^n X_i\\ \hat \beta_0 & = \dfrac{2 \sum_{i=1}^n Y_i}{2n} - \dfrac{2\beta_1 \sum_{i=1}^n X_i}{2n}\\ & = \dfrac{\sum_{i=1}^n Y_i}{n} - \beta_1\dfrac{ \sum_{i=1}^n X_i}{n}\\ \hat \beta_0 & = \bar{Y} - \beta_1 \bar{X} \end{aligned} \]

Least squares regression

\[ \begin{aligned} \dfrac{\partial{ f(\beta_0, \beta_1 | x_i, y_i)}}{\partial \beta_1} & = \sum_{i=1}^n -2X_i(Y_i - \beta_0 - \beta_1 X_i) \\ & = \sum_{i=1}^n -2Y_iX_i + 2\beta_0X_i + 2\beta_1 X_i^2\\ 0 & = \sum_{i=1}^n -2Y_iX_i + 2\beta_0 \sum_{i=1}^nX_i + 2\beta_1 \sum_{i=1}^n X_i^2\\ & = \sum_{i=1}^n -2Y_iX_i + 2 (\bar{Y} - \beta_1 \bar{X}) \sum_{i=1}^nX_i + 2\beta_1 \sum_{i=1}^n X_i^2\\ & = \sum_{i=1}^n -2Y_iX_i + 2\bar{Y} \sum_{i=1}^nX_i - 2\beta_1 \bar{X}\sum_{i=1}^nX_i + 2\beta_1 \sum_{i=1}^n X_i^2\\ 2\beta_1 \sum_{i=1}^n X_i^2 - 2\beta_1 \bar{X}\sum_{i=1}^nX_i & = \sum_{i=1}^n 2Y_iX_i - 2\bar{Y} \sum_{i=1}^nX_i\\ \beta_1 ( \sum_{i=1}^n X_i^2 - \bar{X}\sum_{i=1}^nX_i ) & = \sum_{i=1}^n Y_iX_i - \bar{Y} \sum_{i=1}^nX_i\\ \hat \beta_1 & = \dfrac{ \sum_{i=1}^n Y_iX_i - \bar{Y} \sum_{i=1}^nX_i}{ \sum_{i=1}^n X_i^2 - \bar{X}\sum_{i=1}^nX_i}\\ \hat \beta_0 & = \bar{Y} - \hat{\beta}_1 \bar{X} \end{aligned} \]
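
The closed-form solutions above can be checked numerically. The following is a minimal sketch on simulated data: the sample size, the true coefficients (2 and 0.5), and the variable names are illustrative assumptions, not part of the original slides. It computes \(\hat{\beta}_0\) and \(\hat{\beta}_1\) from the derived formulas and compares them with `lm()`.

```r
# Simulated data; the true coefficients (2 and 0.5) are illustrative assumptions
set.seed(1234)
n <- 100
x <- runif(n, min = 0, max = 10)
y <- 2 + 0.5 * x + rnorm(n, mean = 0, sd = 1)

# Closed-form estimates taken directly from the derivation
beta1_hat <- (sum(y * x) - mean(y) * sum(x)) / (sum(x^2) - mean(x) * sum(x))
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Reference fit from R's built-in least squares routine
fit <- lm(y ~ x)

# The two sets of estimates should agree up to floating-point error
c(beta0_hat, beta1_hat)
coef(fit)
```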

Inference with point estimates

  • Seek to make inferences about a population
  • Variability of our estimates
  • Sampling variability properties
  • Properties of a good estimator
    • Unbiased
    • Efficient

Properties of the estimators

  • Least squares regression model

    \[Y_i = \beta_0 + \beta_1 X_i + u_i\]
    • Where \(u_i \sim N(0, \sigma^2)\)
  • Variance of the least-squares estimates

    \[Var(\hat{\beta}_0) = \frac{\sigma^2 \sum X_i^2}{N\sum(X_i - \bar{X})^2}\] \[Var(\hat{\beta}_1) = \frac{\sigma^2}{\sum(X_i - \bar{X})^2}\] \[Cov(\hat{\beta}_0,\hat{\beta}_1) = \frac{-\sigma^2 \bar{X}}{\sum(X_i - \bar{X})^2}\]

Important notes

  • Variance of both estimates is proportional to \(\sigma^2\)
  • Variance of both estimates is inversely proportional to \(\sum(X_i - \bar{X})^2\)
  • As \(N\) increases, variability decreases
  • Covariance of the two estimates depends on the sign of \(\bar{X}\)
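
A small Monte Carlo experiment can illustrate the slope variance formula \(Var(\hat{\beta}_1) = \sigma^2 / \sum(X_i - \bar{X})^2\). This is a sketch under assumed values: the design points, \(\sigma = 2\), the true coefficients, and the number of replications are all hypothetical choices made for illustration.

```r
# Fixed design and assumed parameter values (all illustrative)
set.seed(2018)
n <- 50
sigma <- 2
x <- seq(1, 10, length.out = n)
ssx <- sum((x - mean(x))^2)

# Repeatedly draw new errors, keeping x fixed, and re-estimate the slope
beta1_draws <- replicate(5000, {
  y <- 1 + 0.5 * x + rnorm(n, mean = 0, sd = sigma)
  sum((x - mean(x)) * (y - mean(y))) / ssx
})

# Empirical variance of the slope estimates vs. the theoretical value
var_empirical <- var(beta1_draws)
var_theory <- sigma^2 / ssx
c(var_empirical, var_theory)

# The average estimate should sit close to the true slope of 0.5
mean(beta1_draws)
```

Increasing `n` or spreading `x` over a wider range raises `ssx` and shrinks both numbers, which is the sense in which variability decreases as \(N\) grows.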

Gauss-Markov theorem

Given the assumptions of the classical linear regression model, the least squares estimators have the minimum variance among the class of linear unbiased estimators. (They are BLUE [Best Linear Unbiased Estimator])

Unbiased

\[ \begin{aligned} \hat{\beta}_1 &= \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \\ &= \frac{\sum (X_i - \bar{X})Y_i}{\sum (X_i - \bar{X})^2} - \frac{\bar{Y}\sum (X_i - \bar{X})}{\sum (X_i - \bar{X})^2} \\ &= \frac{\sum (X_i - \bar{X})Y_i}{\sum (X_i - \bar{X})^2} \\ &= \sum k_i Y_i \end{aligned} \]

where \(k_i\) is the weight \(k_i = \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2}\); the second term vanishes because \(\sum (X_i - \bar{X}) = 0\)

Unbiased

  • New weight \(w_i\)

    \[\tilde{\beta}_1 = \sum w_i Y_i\]
  • What is the expected value of \(\tilde{\beta}_1\)?

    \[ \begin{aligned} E(\tilde{\beta}_1) &= \sum w_i E(Y_i) \\ &= \sum w_i (\beta_0 + \beta_1 X_i) \\ &= \beta_0 \sum w_i + \beta_1 \sum w_i X_i \end{aligned} \]

    • Unbiased only if \(\sum w_i = 0\) and \(\sum (w_iX_i) = 1\)
    • Otherwise \(E(\tilde{\beta}_1) \neq \beta_1\)
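
The least squares weights \(k_i = (X_i - \bar{X}) / \sum (X_i - \bar{X})^2\) satisfy both unbiasedness conditions. A quick numerical check, using a small hypothetical vector of predictor values chosen only for illustration:

```r
# Hypothetical predictor values (any non-constant vector would do)
x <- c(1, 3, 4, 7, 10)

# Least squares weights
k <- (x - mean(x)) / sum((x - mean(x))^2)

# Both conditions hold up to floating-point error
sum_k <- sum(k)        # should be 0
sum_kx <- sum(k * x)   # should be 1
c(sum_k, sum_kx)
```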

Efficient

\[ \begin{aligned} Var(\tilde{\beta}_1) &= Var \left(\sum w_i Y_i \right) \\ &= \sigma^2 \sum w_i^2 \\ &= \sigma^2 \sum \left[ w_i - \frac{X_i - \bar{X}}{\sum(X_i - \bar{X})^2} + \frac{X_i - \bar{X}}{\sum(X_i - \bar{X})^2} \right]^2 \\ &= \sigma^2 \sum \left[ w_i - \frac{X_i - \bar{X}}{\sum(X_i - \bar{X})^2}\right]^2 + \sigma^2 \left[\frac{1}{\sum(X_i - \bar{X})^2}\right] \\ \min(Var[\tilde{\beta}_1]) &= \sum \left[ w_i - \frac{X_i - \bar{X}}{\sum(X_i - \bar{X})^2}\right]^2 \\ \end{aligned} \]

  • The first term is nonnegative, so the variance is minimized when it equals 0

    \[w_i = \frac{X_i - \bar{X}}{\sum(X_i - \bar{X})^2}\]

  • Substitute back in

    \[Var(\tilde{\beta}_1) = \frac{\sigma^2}{\sum(X_i - \bar{X})^2}\]

    • Variance of the least-squares estimator
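
To make the efficiency argument concrete, the sketch below builds an alternative set of weights that still satisfies \(\sum w_i = 0\) and \(\sum w_i X_i = 1\) and compares the implied variances. The predictor values, the perturbation vector `z`, its scale factor, and \(\sigma^2 = 1\) are all hypothetical. The perturbation `d` is made orthogonal to both the constant and `x` by taking residuals from a regression of `z` on `x`.

```r
# Hypothetical predictor values and an arbitrary perturbation vector
x <- c(1, 3, 4, 7, 10)
z <- c(0.2, -0.1, 0.4, 0.0, -0.3)
sigma2 <- 1  # assumed error variance

# Least squares weights
k <- (x - mean(x)) / sum((x - mean(x))^2)

# Residuals from regressing z on x sum to zero and are orthogonal to x,
# so adding them to k preserves both unbiasedness conditions
d <- 0.1 * resid(lm(z ~ x))
w <- k + d
c(sum(w), sum(w * x))

# Variance of each linear unbiased estimator: sigma^2 * sum(weights^2)
var_ols <- sigma2 * sum(k^2)
var_alt <- sigma2 * sum(w^2)
c(var_ols, var_alt)
```

The alternative estimator's variance exceeds the least squares variance by exactly \(\sigma^2 \sum d_i^2\), the first term in the decomposition above.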

Inference

  • Assume \(u_i \sim N(0, \sigma^2)\)
  • Estimators \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are also normally distributed

    \[ \begin{aligned} \hat{\beta}_0 &\sim N(\beta_0, Var(\hat{\beta}_0)) \\ \hat{\beta}_1 &\sim N(\beta_1, Var(\hat{\beta}_1)) \\ \end{aligned} \]
  • \(z\) scores!

    \[ \begin{aligned} z_{\hat{\beta}_1} &= \frac{\hat{\beta}_1 - \beta_1}{\sqrt{Var(\hat{\beta}_1)}} \\ &= \frac{\hat{\beta}_1 - \beta_1}{se(\hat{\beta}_1)} \\ \end{aligned} \]

Inference

  • \(\sigma^2 = Var(u_i)\) is unknown
  • Estimating \(\sigma^2\): \(\hat{\sigma}^2\)
    • Estimated variance of the errors \(u_i\), where \(k\) is the number of estimated coefficients (\(k = 2\) here)

      \[\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{N - k}\]
  • Estimated variances

    \[ \begin{aligned} \widehat{Var(\hat{\beta}_1)} &= \frac{\hat{\sigma}^2}{\sum (X_i - \bar{X})^2} \\ \widehat{Var(\hat{\beta}_0)} &= \frac{\sum X_i^2}{N \sum (X_i - \bar{X})^2} \hat{\sigma}^2 \\ \end{aligned} \]
  • Estimated standard errors

    \[ \begin{aligned} \widehat{se(\hat{\beta}_1)} &= \sqrt{\widehat{Var(\hat{\beta}_1)}} \\ &= \frac{\hat{\sigma}}{\sqrt{\sum (X_i - \bar{X})^2}} \\ \end{aligned} \]
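
The estimated standard errors can be computed by hand and compared with what `summary()` reports. This is a minimal sketch on simulated data; the sample size and true coefficients are illustrative assumptions.

```r
# Simulated data (illustrative assumptions)
set.seed(1234)
n <- 100
x <- runif(n, min = 0, max = 10)
y <- 2 + 0.5 * x + rnorm(n, mean = 0, sd = 1)
fit <- lm(y ~ x)

# Estimated error variance: residual sum of squares over N - k, with k = 2
u_hat <- resid(fit)
sigma2_hat <- sum(u_hat^2) / (n - 2)

# Plug sigma2_hat into the variance formulas and take square roots
ssx <- sum((x - mean(x))^2)
se_beta1 <- sqrt(sigma2_hat / ssx)
se_beta0 <- sqrt(sigma2_hat * sum(x^2) / (n * ssx))

# Hand-computed values vs. the standard errors reported by summary()
c(se_beta0, se_beta1)
summary(fit)$coefficients[, "Std. Error"]
```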

Inference

\[ \begin{aligned} t_{\hat{\beta}_1} \equiv \frac{\hat{\beta}_1 - \beta_1}{\widehat{se(\hat{\beta}_1)}} &= \frac{\hat{\beta}_1 - \beta_1}{\frac{\hat{\sigma}}{\sqrt{\sum (X_i - \bar{X})^2}}} \\ &= \frac{(\hat{\beta}_1 - \beta_1)\sqrt{\sum (X_i - \bar{X})^2}}{\hat{\sigma}} \end{aligned} \]

  • \(t\) statistic, not a \(z\) score
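  • With the true \(\sigma\), \(\frac{\hat{\beta}_1 - \beta_1}{se(\hat{\beta}_1)}\) is standard normal
  • \(u_i / \sigma\) is standard normal
  • Squared standardized errors \((u_i / \sigma)^2\) are independent \(\chi^2\) variables with \(df = 1\) each
  • \(t\) distribution \(\equiv\) standard normal divided by the square root of an independent \(\chi^2\) variable over its degrees of freedom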

Hypothesis testing

  1. Plug a hypothesized value for \(\beta\) into the \(t\)-test formula
  2. Choose a significance level \(\alpha\)
  3. Evaluate how likely it is to draw an estimate at least as extreme as \(\hat{\beta}\) if the hypothesized value of \(\beta\) were true
  • Null hypothesis testing
  • Confidence intervals

Credit card balance

## 
## Call:
## lm(formula = Balance ~ Income, data = credit)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -803.6 -349.0  -54.4  331.7 1100.2 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  246.515     33.199    7.43  6.9e-13 ***
## Income         6.048      0.579   10.44  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 408 on 398 degrees of freedom
## Multiple R-squared:  0.215,  Adjusted R-squared:  0.213 
## F-statistic:  109 on 1 and 398 DF,  p-value: <2e-16
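
The hypothesis test and confidence interval for the `Income` coefficient can be reproduced from the rounded numbers printed in the table (estimate 6.048, standard error 0.579, 398 residual degrees of freedom). This is a sketch of the arithmetic only: because the inputs are rounded, the results match the printed output only approximately, and the 95% confidence level is an assumed choice.

```r
# Values copied from the printed coefficient table (already rounded)
estimate <- 6.048
std_error <- 0.579
df <- 398

# t statistic for the null hypothesis beta_1 = 0
t_stat <- (estimate - 0) / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# 95% confidence interval: estimate +/- critical value * standard error
ci <- estimate + c(-1, 1) * qt(0.975, df = df) * std_error

t_stat
p_value
ci
```

The interval excludes zero, which agrees with the very small p-value reported for `Income`.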

Multivariate regression

\[\mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{u}\]

  • \(\mathbf{Y}\): \(N\times 1\) vector
  • \(\mathbf{X}\): \(N \times K\) matrix
  • \(\boldsymbol{\beta}\): \(K \times 1\) vector
  • \(\mathbf{u}\): \(N\times 1\) vector
  • \(i \in \{1,\ldots,N \}\)
  • \(k \in \{1,\ldots,K \}\)

    \[Y_i = \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_K X_{Ki} + u_i, \quad X_{1i} = 1 \text{ for the intercept}\]

Estimation of \(\hat{\boldsymbol{\beta}}\)

\[ \begin{aligned} \mathbf{u} &= \mathbf{Y} - \mathbf{X}\boldsymbol{\beta} \\ \mathbf{u}'\mathbf{u} &= (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) \\ &= \mathbf{Y'Y} - 2 \boldsymbol{\beta}' \mathbf{X'Y} + \boldsymbol{\beta}' \mathbf{X'X} \boldsymbol{\beta} \end{aligned} \]

Estimation of \(\hat{\boldsymbol{\beta}}\)

\[ \begin{aligned} \frac{\partial\mathbf{u}' \mathbf{u}}{\partial \boldsymbol{\beta}} &= -2\mathbf{X'Y} + 2\mathbf{X'X}\boldsymbol{\beta} \\ 0 &= -2\mathbf{X'Y} + 2\mathbf{X'X} \boldsymbol{\beta} \\ 0 &= -\mathbf{X'Y} + \mathbf{X'X}\boldsymbol{\beta} \\ \mathbf{X'Y} &= \mathbf{X'X}\boldsymbol{\beta} \\ (\mathbf{X'X})^{-1}\mathbf{X'Y} &= (\mathbf{X'X})^{-1}\mathbf{X'X}\boldsymbol{\beta} \\ (\mathbf{X'X})^{-1}\mathbf{X'Y} &= \mathbf{I}\boldsymbol{\beta} \\ \hat{\boldsymbol{\beta}} &= (\mathbf{X'X})^{-1}\mathbf{X'Y} \\ \end{aligned} \]

Estimation of \(\hat{\boldsymbol{\beta}}\)

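  • \(\mathbf{X'Y}\): analogous to the covariance of \(\mathbf{X}\) and \(\mathbf{Y}\) (sums of cross products)
  • \(\mathbf{X'X}\): analogous to the variance of \(\mathbf{X}\) (sums of squares and cross products)
  • Premultiplying \(\mathbf{X'Y}\) by \((\mathbf{X'X})^{-1}\): the matrix analogue of dividing \(\mathbf{X'Y}\) by \(\mathbf{X'X}\)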
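
The matrix solution \((\mathbf{X'X})^{-1}\mathbf{X'Y}\) can be computed directly. The sketch below uses simulated data with two predictors; the sample size, true coefficients, and variable names are illustrative assumptions. The design matrix includes a column of ones for the intercept.

```r
# Simulated data with two predictors (illustrative assumptions)
set.seed(1234)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 2 * x1 - 3 * x2 + rnorm(n)

# Design matrix with a leading column of ones for the intercept
X <- cbind(1, x1, x2)

# Solve the normal equations X'X beta = X'Y
# (solve(A, b) avoids forming the explicit inverse)
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

# Reference fit from lm()
fit <- lm(y ~ x1 + x2)

as.vector(beta_hat)
coef(fit)
```

In practice `lm()` uses a QR decomposition rather than the normal equations, which is numerically more stable; the two approaches agree here because the simulated design is well conditioned.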

Inference in multivariate regression

  • Variance-covariance matrix

    \[ \begin{aligned} \mathbf{V}(\boldsymbol{\hat{\beta}}) &= E[\boldsymbol{\hat{\beta}} - E(\boldsymbol{\hat{\beta}})]^2 \\ &= E\{[\boldsymbol{\hat{\beta}} - E(\boldsymbol{\hat{\beta}})][\boldsymbol{\hat{\beta}} - E(\boldsymbol{\hat{\beta}})]' \} \\ &= E\{[\boldsymbol{\hat{\beta}} - \boldsymbol{\beta}][\boldsymbol{\hat{\beta}} - \boldsymbol{\beta}]' \} \\ &= E\{[(\mathbf{X'X})^{-1}\mathbf{X'u}][(\mathbf{X'X})^{-1}\mathbf{X'u}]' \} \\ &= E[(\mathbf{X'X})^{-1}\mathbf{X'u}\mathbf{u'X}(\mathbf{X'X})^{-1}] \\ &= (\mathbf{X'X})^{-1}\mathbf{X'}E[\mathbf{u}\mathbf{u}']\mathbf{X}(\mathbf{X'X})^{-1} \\ &= (\mathbf{X'X})^{-1}\mathbf{X'}\sigma^2 \mathbf{I} \mathbf{X}(\mathbf{X'X})^{-1} \\ &= \sigma^2 (\mathbf{X'X})^{-1} \\ \end{aligned} \]

  • Assumes \(E[\mathbf{u}\mathbf{u}'] = \sigma^2 \mathbf{I}\)
  • Variance-covariance matrix
    • Variability of the \(\mathbf{X}\)s and the covariance of the variables
    • Useful for interaction terms, multicollinearity, etc.

Estimation of \(\mathbf{V}(\boldsymbol{\hat{\beta}})\)

  • Need to know \(\sigma^2\), but it is unknown
  • \(\hat{\sigma}^2\)

    \[\hat{\sigma}^2 = \frac{\mathbf{\hat{u}' \hat{u}}}{N-K}\] \[\widehat{\mathbf{V}(\boldsymbol{\hat{\beta}})} = \hat{\sigma}^2 (\mathbf{X'X})^{-1}\]
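  • On-diagonal elements: estimated variances of the coefficients
  • Square roots of diagonal elements: estimated standard errors
  • Off-diagonal elements: estimated covariances between coefficients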
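
The estimated variance-covariance matrix \(\hat{\sigma}^2 (\mathbf{X'X})^{-1}\) can be assembled by hand and checked against `vcov()`. As before, this is a sketch on simulated data, and the sample size, coefficients, and variable names are assumptions made for illustration.

```r
# Simulated data with two predictors (illustrative assumptions)
set.seed(1234)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 2 * x1 - 3 * x2 + rnorm(n)
X <- cbind(1, x1, x2)

# Coefficient estimates and residuals
beta_hat <- solve(t(X) %*% X, t(X) %*% y)
u_hat <- y - X %*% beta_hat

# Estimated error variance: u'u / (N - K), where K = number of columns of X
sigma2_hat <- as.numeric(t(u_hat) %*% u_hat) / (n - ncol(X))

# Estimated variance-covariance matrix of the coefficients
V_hat <- sigma2_hat * solve(t(X) %*% X)

# Standard errors are the square roots of the diagonal
se_hat <- sqrt(diag(V_hat))

# Compare with the matrix reported for the lm() fit
fit <- lm(y ~ x1 + x2)
V_hat
vcov(fit)
```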

Credit card balance

  • Regression results

    ## 
    ## Call:
    ## lm(formula = Balance ~ Income + Rating, data = credit)
    ## 
    ## Residuals:
    ##    Min     1Q Median     3Q    Max 
    ## -278.6 -112.7  -36.2   57.9  575.2 
    ## 
    ## Coefficients:
    ##              Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept) -534.8122    21.6027   -24.8   <2e-16 ***
    ## Income        -7.6721     0.3785   -20.3   <2e-16 ***
    ## Rating         3.9493     0.0862    45.8   <2e-16 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 163 on 397 degrees of freedom
    ## Multiple R-squared:  0.875,  Adjusted R-squared:  0.874 
    ## F-statistic: 1.39e+03 on 2 and 397 DF,  p-value: <2e-16
  • Variance-covariance matrix

    ##             (Intercept)  Income   Rating
    ## (Intercept)      466.68  2.6877 -1.47035
    ## Income             2.69  0.1432 -0.02582
    ## Rating            -1.47 -0.0258  0.00743
  • Diagonal elements (variances), then their square roots (standard errors)

    ## (Intercept)      Income      Rating 
    ##    4.67e+02    1.43e-01    7.43e-03
    ## (Intercept)      Income      Rating 
    ##     21.6027      0.3785      0.0862
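
The link between the printed variance-covariance matrix and the reported standard errors can be verified from the rounded values shown above. The sketch below re-enters the printed matrix by hand, so results match only up to rounding. The implied correlation between the `Income` and `Rating` coefficient estimates is an added illustration of how the off-diagonal elements can be used.

```r
# Variance-covariance matrix re-entered from the printed (rounded) output
V <- matrix(
  c(466.68,  2.6877, -1.47035,
      2.69,  0.1432, -0.02582,
     -1.47, -0.0258,  0.00743),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("(Intercept)", "Income", "Rating"),
                  c("(Intercept)", "Income", "Rating"))
)

# Standard errors: square roots of the diagonal elements
se <- sqrt(diag(V))
se

# Correlation between the Income and Rating coefficient estimates
cor_income_rating <- V["Income", "Rating"] / (se["Income"] * se["Rating"])
cor_income_rating
```

The strongly negative correlation between the two estimates is consistent with `Income` and `Rating` being positively correlated predictors.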