Ordinary least squares regression

Computational Mathematics and Statistics Camp University of Chicago September 2018

Motivation

\(Y\)
\(X\)
\(i \in \{1,2,\ldots, N \}\)

\[Y_i = \mu_i + u_i\]
- Systematic component: \(\mu_i\)
- Stochastic component: \(u_i\)
Linear relationship

\[\mu_i = \beta_0 + \beta_1 X_i\] \[Y_i = \beta_0 + \beta_1 X_i + u_i\]
Estimate
- Point estimates: \(\hat{\beta}_0 \text{ and } \hat{\beta}_1\)
- Variability: standard errors for \(\hat{\beta}_0 \text{ and } \hat{\beta}_1\)

Motivation

\[\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i\] \[ \begin{aligned} \hat{u}_i &= Y_i - \hat{Y}_i \\ &= Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \end{aligned} \]

Estimating \(\hat{\beta}_0 \text{ and } \hat{\beta}_1\)

Desired qualities of estimator

Unbiased
- \(E(\hat{\beta}) = \beta\)
Efficient
- \(\min(Var(\hat{\beta}))\)

Least squares regression

\[ \begin{aligned} &\min_{\beta_0, \beta_1} \hat{S} \\ \hat{S} &= \sum_{i=1}^n \hat{u}_i^2 \\ &= \sum_{i=1}^n (Y_i - (\beta_0 + \beta_1 X_i))^2\\ f(\beta_0, \beta_1 | x_i, y_i) & = \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i )^2\\ \dfrac{\partial{ f(\beta_0, \beta_1 | x_i, y_i)}}{\partial \beta_0} & = -2 (\sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i))\\ & = \sum_{i=1}^n (-2Y_i + 2\beta_0 + 2\beta_1 X_i)\\ 0 & = \sum_{i=1}^n (-2Y_{i} + 2\beta_0 + 2\beta_1 X_i)\\ 0 & = -2 \sum_{i=1}^n Y_{i} + 2\sum_{i=1}^n \beta_0 + 2\beta_1 \sum_{i=1}^n X_i\\ 0 & = -2 \sum_{i=1}^n Y_{i} + (n \times 2\beta_0) + 2\beta_1 \sum_{i=1}^n X_i\\ n \times 2\beta_0 & = 2 \sum_{i=1}^n Y_i - 2\beta_1 \sum_{i=1}^n X_i\\ \hat \beta_0 & = \dfrac{2 \sum_{i=1}^n Y_i}{2n} - \dfrac{2\beta_1 \sum_{i=1}^n X_i}{2n}\\ & = \dfrac{\sum_{i=1}^n Y_i}{n} - \beta_1\dfrac{ \sum_{i=1}^n X_i}{n}\\ \hat \beta_0 & = \bar{Y} - \beta_1 \bar{X} \end{aligned} \]

Least squares regression

\[ \begin{aligned} \dfrac{\partial{ f(\beta_0, \beta_1 | x_i, y_i)}}{\partial \beta_1} & = \sum_{i=1}^n -2X_i(Y_i - \beta_0 - \beta_1 X_i) \\ & = \sum_{i=1}^n -2Y_iX_i + 2\beta_0X_i + 2\beta_1 X_i^2\\ 0 & = \sum_{i=1}^n -2Y_iX_i + 2\beta_0 \sum_{i=1}^nX_i + 2\beta_1 \sum_{i=1}^n X_i^2\\ & = \sum_{i=1}^n -2Y_iX_i + 2 (\bar{Y} - \beta_1 \bar{X}) \sum_{i=1}^nX_i + 2\beta_1 \sum_{i=1}^n X_i^2\\ & = \sum_{i=1}^n -2Y_iX_i + 2\bar{Y} \sum_{i=1}^nX_i - 2\beta_1 \bar{X}\sum_{i=1}^nX_i + 2\beta_1 \sum_{i=1}^n X_i^2\\ 2\beta_1 \sum_{i=1}^n X_i^2 - 2\beta_1 \bar{X}\sum_{i=1}^nX_i & = \sum_{i=1}^n 2Y_iX_i - 2\bar{Y} \sum_{i=1}^nX_i\\ \beta_1 ( \sum_{i=1}^n X_i^2 - \bar{X}\sum_{i=1}^nX_i ) & = \sum_{i=1}^n Y_iX_i - \bar{Y} \sum_{i=1}^nX_i\\ \hat \beta_1 & = \dfrac{ \sum_{i=1}^n Y_iX_i - \bar{Y} \sum_{i=1}^nX_i}{ \sum_{i=1}^n X_i^2 - \bar{X}\sum_{i=1}^nX_i}\\ \hat \beta_0 & = \bar{Y} - \hat{\beta}_1 \bar{X} \end{aligned} \]
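The two closed-form results can be checked numerically. The following is a minimal sketch in plain Python, assuming a small invented data set (`x` and `y` are hypothetical, not from the credit data); the function name `ols_simple` is also just an illustrative choice.

```python
def ols_simple(x, y):
    """Closed-form least-squares estimates for Y = b0 + b1 * X + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: (sum X_i Y_i - Ybar * sum X_i) / (sum X_i^2 - Xbar * sum X_i)
    numerator = sum(xi * yi for xi, yi in zip(x, y)) - y_bar * sum(x)
    denominator = sum(xi ** 2 for xi in x) - x_bar * sum(x)
    b1 = numerator / denominator
    # Intercept: Ybar - b1 * Xbar
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical toy data, invented for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_simple(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print("intercept:", b0)
print("slope:", b1)
# Both first-order conditions imply these sums are (numerically) zero
print("sum of residuals:", sum(residuals))
print("sum of X_i * residual_i:", sum(xi * ri for xi, ri in zip(x, residuals)))
```

The last two printed values restate the first-order conditions: setting each partial derivative to zero forces the residuals to sum to zero and to be uncorrelated with \(X\).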

Inference with point estimates

Seek to make inferences about a population
Variability of our estimates
Sampling variability properties
Properties of a good estimator
- Unbiased
- Efficient

Properties of the estimators

Least squares regression model
\[Y_i = \beta_0 + \beta_1 X_i + u_i\]
- Where \(u_i \sim N(0, \sigma^2)\)
Variance of the least-squares estimates

\[Var(\hat{\beta}_0) = \frac{\sigma^2 \sum X_i^2}{N\sum(X_i - \bar{X})^2}\] \[Var(\hat{\beta}_1) = \frac{\sigma^2}{\sum(X_i - \bar{X})^2}\] \[Cov(\hat{\beta}_0,\hat{\beta}_1) = \frac{-\bar{X} \sigma^2}{\sum(X_i - \bar{X})^2}\]

Important notes

Variance of both estimates is proportional to \(\sigma^2\)
Variance of both estimates is inversely proportional to \(\sum(X_i - \bar{X})^2\)
As \(N\) increases, variability decreases
Sign of the covariance between the two estimates is opposite to the sign of \(\bar{X}\)
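These notes can be illustrated by evaluating the variance formulas directly. This is a minimal sketch, assuming the standard forms with \(\sigma^2\) in each numerator and treating \(\sigma^2\) as known; the `x` values and the helper name `ols_variances` are invented for illustration.

```python
def ols_variances(x, sigma2):
    """Return Var(b0), Var(b1), Cov(b0, b1) for simple regression with known sigma^2."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)  # sum of squared deviations of X
    var_b0 = sigma2 * sum(xi ** 2 for xi in x) / (n * sxx)
    var_b1 = sigma2 / sxx
    cov_b0_b1 = -x_bar * sigma2 / sxx
    return var_b0, var_b1, cov_b0_b1


# Hypothetical design points, invented for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]

base = ols_variances(x, sigma2=4.0)
doubled_noise = ols_variances(x, sigma2=8.0)                   # larger error variance
wider_x = ols_variances([2 * xi for xi in x], sigma2=4.0)      # more spread in X
negative_mean = ols_variances([-xi for xi in x], sigma2=4.0)   # mean of X is negative

print("base:", base)
print("doubled sigma^2:", doubled_noise)
print("X spread doubled:", wider_x)
print("negative mean of X:", negative_mean)
```

Doubling \(\sigma^2\) doubles every entry, doubling the spread of \(X\) cuts the slope variance to a quarter, and flipping the sign of \(\bar{X}\) flips the sign of the covariance.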

Gauss-Markov theorem

Given the assumptions of the classical linear regression model, the least squares estimators are the minimum variance estimators among the class of unbiased estimators. (They are BLUE [Best Linear Unbiased Estimator])

Unbiased

\[ \begin{aligned} \hat{\beta}_1 &= \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \\ &= \frac{\sum (X_i - \bar{X})Y_i}{\sum (X_i - \bar{X})^2} - \frac{\bar{Y} \sum (X_i - \bar{X})}{\sum (X_i - \bar{X})^2} \\ &= \frac{\sum (X_i - \bar{X})Y_i}{\sum (X_i - \bar{X})^2} \\ &= \sum k_i Y_i \end{aligned} \]

where \(k_i\) is the weight \(\frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2}\), and the second term drops out because \(\sum (X_i - \bar{X}) = 0\)

Unbiased

New weight \(w_i\)
\[\tilde{\beta}_1 = \sum w_i Y_i\]
What is the expected value of \(\tilde{\beta}_1\)?

\[ \begin{aligned} E(\tilde{\beta}_1) &= \sum w_i E(Y_i) \\ &= \sum w_i (\beta_0 + \beta_1 X_i) \\ &= \beta_0 \sum w_i + \beta_1 \sum w_i X_i \end{aligned} \]
- Unbiased only if \(\sum w_i = 0\) and \(\sum (w_iX_i) = 1\)
- Otherwise \(E(\tilde{\beta}_1) \neq \beta_1\)

Efficient

\[ \begin{aligned} Var(\tilde{\beta}_1) &= Var \left(\sum w_i Y_i \right) \\ &= \sigma^2 \sum w_i^2 \\ &= \sigma^2 \sum \left[ w_i - \frac{X_i - \bar{X}}{\sum(X_i - \bar{X})^2} + \frac{X_i - \bar{X}}{\sum(X_i - \bar{X})^2} \right]^2 \\ &= \sigma^2 \sum \left[ w_i - \frac{X_i - \bar{X}}{\sum(X_i - \bar{X})^2}\right]^2 + \sigma^2 \left[\frac{1}{\sum(X_i - \bar{X})^2}\right] \\ \min(Var[\tilde{\beta}_1]) &= \sum \left[ w_i - \frac{X_i - \bar{X}}{\sum(X_i - \bar{X})^2}\right]^2 \\ \end{aligned} \]

Only the first term depends on \(w_i\); it is minimized when it equals 0

\[w_i = \frac{X_i - \bar{X}}{\sum(X_i - \bar{X})^2}\]
Substitute back in

\[Var(\tilde{\beta}_1) = \frac{\sigma^2}{\sum(X_i - \bar{X})^2}\]
- Variance of the least-squares estimator
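The efficiency argument can be made concrete by comparing the least-squares weights with another set of weights that also satisfies \(\sum w_i = 0\) and \(\sum w_i X_i = 1\). The sketch below assumes one such alternative chosen purely for illustration: the slope of the line through the first and last observations. The data values and \(\sigma^2\) are invented.

```python
# Hypothetical design points and error variance, invented for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
sigma2 = 4.0

n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# Least-squares weights k_i = (X_i - Xbar) / sum (X_i - Xbar)^2
k = [(xi - x_bar) / sxx for xi in x]

# Alternative linear unbiased estimator: (Y_n - Y_1) / (X_n - X_1)
w = [0.0] * n
w[0] = -1.0 / (x[-1] - x[0])
w[-1] = 1.0 / (x[-1] - x[0])

# Both weight vectors satisfy the unbiasedness conditions
print("sum w:", sum(w), " sum w*X:", sum(wi * xi for wi, xi in zip(w, x)))
print("sum k:", sum(k), " sum k*X:", sum(ki * xi for ki, xi in zip(k, x)))

# Var = sigma^2 * sum of squared weights
var_ols = sigma2 * sum(ki ** 2 for ki in k)
var_alt = sigma2 * sum(wi ** 2 for wi in w)

# Decomposition: Var(alt) = sigma^2 * sum (w_i - k_i)^2 + sigma^2 / sxx
extra = sigma2 * sum((wi - ki) ** 2 for wi, ki in zip(w, k))

print("Var(OLS slope):", var_ols)
print("Var(alternative):", var_alt)
print("extra variance from deviating from k_i:", extra)
```

The alternative estimator is unbiased but its variance exceeds the least-squares variance by exactly the first term of the decomposition, which is zero only when \(w_i = k_i\).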

Inference

Assume \(u_i \sim N(0, \sigma^2)\)
Estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are also normally distributed
\[ \begin{aligned} \hat{\beta}_0 &\sim N(\beta_0, Var(\hat{\beta}_0)) \\ \hat{\beta}_1 &\sim N(\beta_1, Var(\hat{\beta}_1)) \\ \end{aligned} \]
\(z\) scores!

\[ \begin{aligned} z_{\hat{\beta}_1} &= \frac{\hat{\beta}_1 - \beta_1}{\sqrt{Var(\hat{\beta}_1)}} \\ &= \frac{\hat{\beta}_1 - \beta_1}{se(\hat{\beta}_1)} \\ \end{aligned} \]

Inference

\(\sigma^2 = Var(u_i)\) is unknown
Estimating \(\sigma^2\): \(\hat{\sigma}^2\)
- Estimated variance of the errors \(u_i\)
  \[\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{N - k}\]
Estimated variances
\[ \begin{aligned} \widehat{Var(\hat{\beta}_1)} &= \frac{\hat{\sigma}^2}{\sum (X_i - \bar{X})^2} \\ \widehat{Var(\hat{\beta}_0)} &= \frac{\sum X_i^2}{N \sum (X_i - \bar{X})^2} \hat{\sigma}^2 \\ \end{aligned} \]
Estimated standard errors

\[ \begin{aligned} \widehat{se(\hat{\beta}_1)} &= \sqrt{\widehat{Var(\hat{\beta}_1)}} \\ &= \frac{\hat{\sigma}}{\sqrt{\sum (X_i - \bar{X})^2}} \\ \end{aligned} \]
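A short numerical sketch of these plug-in estimates follows. It assumes \(k = 2\) estimated coefficients (intercept and slope) in the degrees-of-freedom correction, and uses a small invented data set; the function name `simple_ols_with_se` is hypothetical.

```python
import math


def simple_ols_with_se(x, y, k=2):
    """Fit Y = b0 + b1 * X + u and estimate the standard errors.

    k is the number of estimated coefficients (assumed to be 2 here:
    the intercept and the slope).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Estimated error variance: sum of squared residuals over N - k
    sigma2_hat = sum(r ** 2 for r in residuals) / (n - k)
    # Plug sigma2_hat into the variance formulas
    var_b1 = sigma2_hat / sxx
    var_b0 = sigma2_hat * sum(xi ** 2 for xi in x) / (n * sxx)
    return {
        "b0": b0,
        "b1": b1,
        "sigma2_hat": sigma2_hat,
        "se_b0": math.sqrt(var_b0),
        "se_b1": math.sqrt(var_b1),
    }


# Hypothetical toy data, invented for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

fit = simple_ols_with_se(x, y)
for name, value in fit.items():
    print(name, value)
```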

Inference

\[ \begin{aligned} t_{\hat{\beta}_1} \equiv \frac{\hat{\beta}_1 - \beta_1}{\widehat{se(\hat{\beta}_1)}} &= \frac{\hat{\beta}_1 - \beta_1}{\frac{\hat{\sigma}}{\sqrt{\sum (X_i - \bar{X})^2}}} \\ &= \frac{(\hat{\beta}_1 - \beta_1)\sqrt{\sum (X_i - \bar{X})^2}}{\hat{\sigma}} \end{aligned} \]

\(t\) statistic, not a \(z\) score
Numerator divided by \(\sigma\) is standard normal
\(u_i / \sigma\) is standard normal
Squared standardized errors \((u_i / \sigma)^2\) are independent \(\chi^2\) variables with \(df = 1\) each
\(t\) distribution \(\equiv\) standard normal divided by the square root of an independent \(\chi^2\) variable scaled by its degrees of freedom

Hypothesis testing

Plug a hypothesized value for \(\beta\) into the \(t\)-statistic formula
Choose a significance level \(\alpha\)
Evaluate how likely it is to draw a sample with an estimate at least as extreme as \(\hat{\beta}\) if the hypothesized value of \(\beta\) were true

Null hypothesis testing
Confidence intervals

Credit card balance

## 
## Call:
## lm(formula = Balance ~ Income, data = credit)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -803.6 -349.0  -54.4  331.7 1100.2 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  246.515     33.199    7.43  6.9e-13 ***
## Income         6.048      0.579   10.44  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 408 on 398 degrees of freedom
## Multiple R-squared:  0.215,  Adjusted R-squared:  0.213 
## F-statistic:  109 on 1 and 398 DF,  p-value: <2e-16
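The `t value` column can be reproduced from the estimate and standard error, and the same two numbers give an approximate confidence interval. This is a rough sketch: it uses the rounded values printed above, tests the null \(\beta_1 = 0\), and assumes a standard normal approximation to the \(t\) distribution, which is reasonable only because there are 398 degrees of freedom.

```python
from statistics import NormalDist

# Values copied from the printed output for the Income coefficient (rounded)
estimate = 6.048
std_error = 0.579
null_value = 0.0  # H0: beta_1 = 0

# t statistic: (estimate - hypothesized value) / standard error
t_stat = (estimate - null_value) / std_error

# Assumption: normal approximation to the t distribution with 398 df
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Approximate 95% confidence interval
ci = (estimate - z_crit * std_error, estimate + z_crit * std_error)

# Approximate two-sided p-value
p_value_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))
reject = abs(t_stat) > z_crit

print("t statistic:", round(t_stat, 2))
print("approximate 95% CI:", ci)
print("approximate p-value:", p_value_approx)
print("reject H0 at alpha = 0.05:", reject)
```

The computed statistic differs slightly from the printed 10.44 because the inputs here are already rounded. The interval excludes zero, which matches the conclusion of the null hypothesis test.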

Multivariate regression

\[\mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{u}\]

\(\mathbf{Y}\): \(N\times 1\) vector
\(\mathbf{X}\): \(N \times K\) matrix
\(\boldsymbol{\beta}\): \(K \times 1\) vector
\(\mathbf{u}\): \(N\times 1\) vector
\(i \in \{1,\ldots,N \}\)
\(k \in \{1,\ldots,K \}\)

\[Y_i = \beta_0 + \beta_1X_{1i} + \beta_2 X_{2i} + \ldots + \beta_K X_{Ki} + u_i\]

Estimation of \(\hat{\boldsymbol{\beta}}\)

\[ \begin{aligned} \mathbf{u} &= \mathbf{Y} - \mathbf{X}\boldsymbol{\beta} \\ \mathbf{u}'\mathbf{u} &= (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) \\ &= \mathbf{Y'Y} - 2 \boldsymbol{\beta}' \mathbf{X'Y} + \boldsymbol{\beta}' \mathbf{X'X} \boldsymbol{\beta} \end{aligned} \]

Estimation of \(\hat{\boldsymbol{\beta}}\)

\[ \begin{aligned} \frac{\partial\mathbf{u}' \mathbf{u}}{\partial \boldsymbol{\beta}} &= -2\mathbf{X'Y} + 2\mathbf{X'X}\boldsymbol{\beta} \\ 0 &= -2\mathbf{X'Y} + 2\mathbf{X'X} \boldsymbol{\beta} \\ 0 &= -\mathbf{X'Y} + \mathbf{X'X}\boldsymbol{\beta} \\ \mathbf{X'Y} &= \mathbf{X'X}\boldsymbol{\beta} \\ (\mathbf{X'X})^{-1}\mathbf{X'Y} &= (\mathbf{X'X})^{-1}\mathbf{X'X}\boldsymbol{\beta} \\ (\mathbf{X'X})^{-1}\mathbf{X'Y} &= \mathbf{I}\boldsymbol{\beta} \\ \hat{\boldsymbol{\beta}} &= (\mathbf{X'X})^{-1}\mathbf{X'Y} \end{aligned} \]

Estimation of \(\hat{\boldsymbol{\beta}}\)

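\(\mathbf{X'Y}\): cross-products of \(\mathbf{X}\) and \(\mathbf{Y}\), analogous to the covariance of \(\mathbf{X}\) and \(\mathbf{Y}\)
\(\mathbf{X'X}\): sums of squares and cross-products of \(\mathbf{X}\), analogous to the variance of \(\mathbf{X}\)
Premultiplying \(\mathbf{X'Y}\) by \((\mathbf{X'X})^{-1}\): the matrix analogue of dividing \(\mathbf{X'Y}\) by \(\mathbf{X'X}\)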
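The matrix solution can be sketched without any external library by forming \(\mathbf{X'X}\) and \(\mathbf{X'Y}\) and solving the normal equations \(\mathbf{X'X}\boldsymbol{\beta} = \mathbf{X'Y}\), which is equivalent to premultiplying by \((\mathbf{X'X})^{-1}\). The helper functions and the data below are invented for illustration; the outcome is constructed without noise so the true coefficients are known.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Hypothetical predictors, invented for illustration only
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
true_beta = [1.0, 2.0, -3.0]

# Design matrix with a leading column of ones for the intercept
X = [[1.0, a, b] for a, b in zip(x1, x2)]
# Noise-free outcome, so least squares should recover true_beta
Y = [sum(bk * xk for bk, xk in zip(true_beta, row)) for row in X]

Xt = transpose(X)
XtX = matmul(Xt, X)
XtY = [row[0] for row in matmul(Xt, [[yi] for yi in Y])]

beta_hat = solve(XtX, XtY)
print("beta_hat:", beta_hat)
```

Solving the linear system rather than explicitly inverting \(\mathbf{X'X}\) is a common numerical choice; both give the same \(\hat{\boldsymbol{\beta}}\) when \(\mathbf{X'X}\) is invertible.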

Inference in multivariate regression

Variance-covariance matrix

\[ \begin{aligned} \mathbf{V}(\boldsymbol{\hat{\beta}}) &= E[\boldsymbol{\hat{\beta}} - E(\boldsymbol{\hat{\beta}})]^2 \\ &= E\{[\boldsymbol{\hat{\beta}} - E(\boldsymbol{\hat{\beta}})][\boldsymbol{\hat{\beta}} - E(\boldsymbol{\hat{\beta}})]' \} \\ &= E\{[\boldsymbol{\hat{\beta}} - \boldsymbol{\beta}][\boldsymbol{\hat{\beta}} - \boldsymbol{\beta}]' \} \\ &= E\{[(\mathbf{X'X})^{-1}\mathbf{X'u}][(\mathbf{X'X})^{-1}\mathbf{X'u}]' \} \\ &= E[(\mathbf{X'X})^{-1}\mathbf{X'u}\mathbf{u'X}(\mathbf{X'X})^{-1}] \\ &= (\mathbf{X'X})^{-1}\mathbf{X'}E[\mathbf{u}\mathbf{u}']\mathbf{X}(\mathbf{X'X})^{-1} \\ &= (\mathbf{X'X})^{-1}\mathbf{X'}\sigma^2 \mathbf{I} \mathbf{X}(\mathbf{X'X})^{-1} \\ &= \sigma^2 (\mathbf{X'X})^{-1} \\ \end{aligned} \]
Assumes \(E[\mathbf{u}\mathbf{u}'] = \sigma^2 \mathbf{I}\)
Variance-covariance matrix
- Variability of the \(\mathbf{X}\)s and the covariance of the variables
- Useful for interaction terms, multicollinearity, etc.
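As a worked check, the general result can be specialized to simple regression, assuming \(\mathbf{X}\) has a column of ones and a single predictor \(X_i\):

\[ \mathbf{X'X} = \begin{bmatrix} N & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}, \qquad (\mathbf{X'X})^{-1} = \frac{1}{N\sum X_i^2 - (\sum X_i)^2} \begin{bmatrix} \sum X_i^2 & -\sum X_i \\ -\sum X_i & N \end{bmatrix} \]

Because \(N\sum X_i^2 - (\sum X_i)^2 = N\sum (X_i - \bar{X})^2\),

\[ \sigma^2 (\mathbf{X'X})^{-1} = \frac{\sigma^2}{\sum (X_i - \bar{X})^2} \begin{bmatrix} \frac{\sum X_i^2}{N} & -\bar{X} \\ -\bar{X} & 1 \end{bmatrix} \]

The diagonal entries are \(Var(\hat{\beta}_0)\) and \(Var(\hat{\beta}_1)\) and the off-diagonal entry is \(Cov(\hat{\beta}_0, \hat{\beta}_1)\) for the simple regression model.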

Estimation of \(\mathbf{V}(\boldsymbol{\hat{\beta}})\)

Need to know \(\sigma^2\), but it is unknown
\(\hat{\sigma}^2\)
\[\hat{\sigma}^2 = \frac{\mathbf{\hat{u}' \hat{u}}}{N-K}\] \[\widehat{\mathbf{V}(\boldsymbol{\hat{\beta}})} = \hat{\sigma}^2 (\mathbf{X'X})^{-1}\]
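On-diagonal elements: estimated variances of the coefficients
Square roots of diagonal elements: estimated standard errors
Off-diagonal elements: estimated covariances between pairs of coefficients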

Credit card balance

Regression results

## 
## Call:
## lm(formula = Balance ~ Income + Rating, data = credit)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -278.6 -112.7  -36.2   57.9  575.2 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -534.8122    21.6027   -24.8   <2e-16 ***
## Income        -7.6721     0.3785   -20.3   <2e-16 ***
## Rating         3.9493     0.0862    45.8   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 163 on 397 degrees of freedom
## Multiple R-squared:  0.875,  Adjusted R-squared:  0.874 
## F-statistic: 1.39e+03 on 2 and 397 DF,  p-value: <2e-16

Variance-covariance matrix

##             (Intercept)  Income   Rating
## (Intercept)      466.68  2.6877 -1.47035
## Income             2.69  0.1432 -0.02582
## Rating            -1.47 -0.0258  0.00743

Diagonal elements (estimated variances)

## (Intercept)      Income      Rating 
##    4.67e+02    1.43e-01    7.43e-03

Square roots of diagonal elements (estimated standard errors)

## (Intercept)      Income      Rating 
##     21.6027      0.3785      0.0862
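The link between the variance-covariance matrix and the standard errors can be verified by hand. The sketch below copies the rounded matrix entries printed above, so the results agree with the reported standard errors only up to rounding; the correlation between two coefficient estimates is an extra illustration, not part of the original output.

```python
import math

names = ["(Intercept)", "Income", "Rating"]

# Entries copied from the printed variance-covariance matrix (rounded)
vcov = [
    [466.68, 2.6877, -1.47035],
    [2.69, 0.1432, -0.02582],
    [-1.47, -0.0258, 0.00743],
]

# Diagonal elements: estimated variances of each coefficient
variances = [vcov[i][i] for i in range(len(names))]

# Square roots of the diagonal: estimated standard errors
std_errors = [math.sqrt(v) for v in variances]

for name, se in zip(names, std_errors):
    print(name, round(se, 4))

# Off-diagonal elements rescaled to a correlation between two estimates
corr_income_rating = vcov[1][2] / (std_errors[1] * std_errors[2])
print("corr(Income, Rating estimates):", round(corr_income_rating, 3))
```

The strongly negative correlation between the Income and Rating estimates is the kind of information the off-diagonal elements carry, which is relevant when predictors are collinear.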