Standard error is the same thing as random sampling error
\[ \begin{equation} \begin{split} \text{Random sampling error}& =\frac{\sigma}{\sqrt{n}} \\ \text{Standard error of a sample mean}& =\frac{\sigma}{\sqrt{n}} \\ \end{split} \end{equation} \]
The central limit theorem always applies to sample statistics, regardless of the shape of the underlying variable of interest
Feelings of warmth towards atheists
\[ \begin{equation} \begin{split} \text{Observed sample mean } \bar{x} &=40 \\ \sigma &=25 \\ n &= 100 \\ \text{Standard error of } \bar{x} &=2.5 \\ \text{Hypothesized population mean } \mu &=37 \end{split} \end{equation} \]
95% confidence intervals
\[ \begin{equation} \begin{split} \text{Lower confidence interval}&=\bar{x} - 2 \text{ standard errors} \\ & =40 - 2(2.5) \\ & =40 - 5 \\ & =35 \\ \text{Upper confidence interval}&=\bar{x} + 2 \text{ standard errors} \\ & =40 + 2(2.5) \\ & =40 + 5 \\ & =45 \end{split} \end{equation} \]
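The interval arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `confidence_interval` and the default multiplier of 2 are assumptions chosen to mirror the slide, not part of the original material.

```python
import math

def confidence_interval(mean, sigma, n, multiplier=2.0):
    """Return (lower, upper) as mean -/+ multiplier * standard error."""
    se = sigma / math.sqrt(n)  # standard error of the sample mean
    return mean - multiplier * se, mean + multiplier * se

# Values from the feelings-of-warmth example
lower, upper = confidence_interval(mean=40, sigma=25, n=100)
print(lower, upper)  # 35.0 45.0

# The hypothesized population mean of 37 falls inside the interval
print(lower <= 37 <= upper)  # True
```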
Calculate the difference between the observed sample mean and the hypothetical population mean
\[ \begin{equation} \begin{split} \text{Sample mean minus hypothetical mean} &=\bar{x} - \mu \\ &=40-37 \\ &=3 \end{split} \end{equation} \]
Calculating sample standard deviation
\[ \begin{equation} \begin{split} s^2 &=\frac{\text{TSS}}{(n-1)} \\ s &=\sqrt{\frac{\text{TSS}}{(n-1)}} \end{split} \end{equation} \]
\(\mu = 2050\) \[2051, 2053, 2055, 2050, 2051\]
Sample average \(\hat{\mu}\) \[\frac{1}{5}\left(2051 + 2053 + 2055 + 2050 + 2051\right) = 2052\]
Population variance if \(\mu\) is known
\[ \begin{align} {} & \frac{1}{5}[(2051 - 2050)^2 + (2053 - 2050)^2 \\ &\quad\, + (2055 - 2050)^2 + (2050 - 2050)^2 \\ &\quad\, + (2051 - 2050)^2] \\ = {} & \frac{36}{5} = 7.2 \end{align} \]
Population variance if \(\mu\) is unknown
\[ \begin{align} {} & \frac{1}{5}[(2051 - 2052)^2 + (2053 - 2052)^2 \\ &\quad\, + (2055 - 2052)^2 + (2050 - 2052)^2 \\ &\quad\, + (2051 - 2052)^2] \\ = {} & \frac{16}{5} = 3.2 \end{align} \]
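A short Python sketch reproduces both calculations from the five observations. The variable names are illustrative, and the last line applies the \(n-1\) divisor from the sample variance formula \(s^2 = \text{TSS}/(n-1)\) for comparison.

```python
values = [2051, 2053, 2055, 2050, 2051]
mu = 2050          # population mean, when it is known
n = len(values)

sample_mean = sum(values) / n  # 2052.0

# Average squared deviation from the known population mean
var_known_mu = sum((x - mu) ** 2 for x in values) / n

# Average squared deviation from the sample mean (divides by n)
var_sample_mean = sum((x - sample_mean) ** 2 for x in values) / n

# Same total sum of squares, divided by n - 1 instead of n
var_n_minus_1 = sum((x - sample_mean) ** 2 for x in values) / (n - 1)

print(var_known_mu, var_sample_mean, var_n_minus_1)  # 7.2 3.2 4.0
```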
Estimate of variance using the sample mean is never larger than the estimate using the population mean (and is strictly smaller unless \(\hat{\mu} = \mu\))
\[(a + b)^2 = a^2 + 2ab + b^2\]
\[ \begin{align} {[}\,\underbrace{2053 - 2050}_{\begin{smallmatrix} \text{Deviation from} \\ \text{the population} \\ \text{mean} \end{smallmatrix}}\,]^2 & = [\,\overbrace{(\,\underbrace{2053 - 2052}_{\begin{smallmatrix} \text{Deviation from} \\ \text{the sample mean} \end{smallmatrix}}\,)}^{\text{This is }a.} + \overbrace{(2052 - 2050)}^{\text{This is }b.}\,]^2 \\ & = \overbrace{(2053 - 2052)^2}^{\text{This is }a^2.} \\ &\quad + \overbrace{2(2053 - 2052)(2052 - 2050)}^{\text{This is }2ab.} \\ &\quad + \overbrace{(2052 - 2050)^2}^{\text{This is }b^2.} \end{align} \]
\[ \begin{alignat}{2} \overbrace{(2051 - 2052)^2}^{\text{This is }a^2.}\ &+\ \overbrace{2(2051 - 2052)(2052 - 2050)}^{\text{This is }2ab.}\ &&+\ \overbrace{(2052 - 2050)^2}^{\text{This is }b^2.} \\ (2053 - 2052)^2\ &+\ 2(2053 - 2052)(2052 - 2050)\ &&+\ (2052 - 2050)^2 \\ (2055 - 2052)^2\ &+\ 2(2055 - 2052)(2052 - 2050)\ &&+\ (2052 - 2050)^2 \\ (2050 - 2052)^2\ &+\ 2(2050 - 2052)(2052 - 2050)\ &&+\ (2052 - 2050)^2 \\ (2051 - 2052)^2\ &+\ \underbrace{2(2051 - 2052)(2052 - 2050)}_{\begin{smallmatrix} \text{The sum of the entries in this} \\ \text{middle column must be 0.} \end{smallmatrix}}\ &&+\ (2052 - 2050)^2 \end{alignat} \]
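Summing each of the three columns separately makes the comparison explicit. The following is a worked check using only the numbers above: the deviations from the sample mean are \(-1, 1, 3, -2, -1\), which sum to 0, so the middle column vanishes.

\[ \begin{align} \underbrace{36}_{\text{Sum of squares about } \mu} &= \underbrace{16}_{\text{Sum of the } a^2 \text{ column}} + \underbrace{2(2052 - 2050)(0)}_{\text{Sum of the } 2ab \text{ column}} + \underbrace{5(2052 - 2050)^2}_{\text{Sum of the } b^2 \text{ column}} \\ &= 16 + 0 + 20 \end{align} \]

Because the \(b^2\) column can never be negative, the sum of squares about the sample mean (16) cannot exceed the sum of squares about the population mean (36).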
\(T^2, S^2\)
\[E((T^2 - \sigma^2)^2) < E((S^2 - \sigma^2)^2)\]
\[f(t) = \frac{\Gamma(\frac{\nu+1}{2})} {\sqrt{\nu\pi}\,\Gamma(\frac{\nu}{2})} \left(1+\frac{t^2}{\nu} \right)^{\!-\frac{\nu+1}{2}}\]
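The density above can be evaluated directly with the standard library. This is a minimal sketch: the function names `t_pdf` and `normal_pdf` are illustrative, and `math.lgamma` is used in place of the gamma function itself only to avoid overflow for large \(\nu\).

```python
import math

def t_pdf(t, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_coef = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    coef = math.exp(log_coef) / math.sqrt(nu * math.pi)
    return coef * (1 + t * t / nu) ** (-(nu + 1) / 2)

def normal_pdf(z):
    """Density of the standard normal distribution, for comparison."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# With few degrees of freedom the t distribution has heavier tails
print(t_pdf(3, 5) > normal_pdf(3))  # True

# With many degrees of freedom it is close to the normal
print(abs(t_pdf(0, 1000) - normal_pdf(0)) < 1e-3)  # True
```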
95% confidence intervals for campaign donation amount using the \(t\) distribution \[ \begin{equation} \begin{split} \text{Degrees of freedom} &=\text{Sample size} - \text{Number of estimated parameters} \\ &=30 - 1 \\ &=29 \end{split} \end{equation} \]
\[ \begin{equation} \begin{split} \text{Lower confidence interval}&=\bar{x} - 2.045 \text{ standard errors} \\ & =1500 - 2.045(128) \\ & \approx 1500 - 262 \\ & = 1238 \\ \text{Upper confidence interval}&=\bar{x} + 2.045 \text{ standard errors} \\ & =1500 + 2.045(128) \\ & \approx 1500 + 262 \\ & = 1762 \end{split} \end{equation} \]
95% confidence intervals for campaign donation amount using the normal distribution \[ \begin{equation} \begin{split} \text{Lower confidence interval}&=\bar{x} - 1.96 \text{ standard errors} \\ & =1500 - 1.96(128) \\ & \approx 1500 - 251 \\ & = 1249 \\ \text{Upper confidence interval}&=\bar{x} + 1.96 \text{ standard errors} \\ & =1500 + 1.96(128) \\ & \approx 1500 + 251 \\ & = 1751 \end{split} \end{equation} \]
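The two intervals can be compared side by side in Python. This sketch takes the \(t\) critical value 2.045 (for 29 degrees of freedom) from the slide as a constant, since the standard library has no \(t\) quantile function; the normal critical value comes from `statistics.NormalDist`. The variable names are illustrative.

```python
from statistics import NormalDist

mean, se = 1500, 128

t_crit = 2.045                         # taken from the text, df = 29
z_crit = NormalDist().inv_cdf(0.975)   # about 1.96

t_interval = (mean - t_crit * se, mean + t_crit * se)
z_interval = (mean - z_crit * se, mean + z_crit * se)

print([round(b) for b in t_interval])  # [1238, 1762]
print([round(b) for b in z_interval])  # [1249, 1751]
```

The \(t\) interval is slightly wider, reflecting the extra uncertainty from estimating the standard deviation with a sample of 30.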
Did you vote in the last election? 72 students answer yes, 28 answer no. \[ \begin{equation} \begin{split} \text{Sample proportion of voters}&=\frac{\text{Number Answering ``Yes''}}{\text{Sample size}} \\ & =\frac{72}{100} \\ & =.72 \end{split} \end{equation} \]
\[ \begin{equation} \begin{split} \text{Sample proportion of nonvoters}&=\frac{\text{Number Answering ``No''}}{\text{Sample size}} \\ & =\frac{28}{100} \\ & =.28 \end{split} \end{equation} \]
What is the standard error of the observed sample statistic, \(.72\)? \[ \begin{equation} \begin{split} \text{Sample proportion of voters } p&=.72 \\ \text{Sample proportion of nonvoters } q&= 1-p=.28 \\ \text{Sample size } n&=100 \end{split} \end{equation} \]
General formula for calculating the random sampling error is \[ \begin{equation} \begin{split} \text{Random sampling error} &=\frac{\text{Variation component}}{\text{Sample size component}} \\ \text{Sample size component} &= \sqrt{n} \\ \text{Variation component} &= \sqrt{pq} \end{split} \end{equation} \]
SE of a sample proportion \(\hat{p}\), where \(\hat{q} = 1 - \hat{p}\)
\[ \begin{equation} \text{Standard error}=\frac{\sqrt{\hat{p}\hat{q}}}{\sqrt{n}} \end{equation} \]
Plug in the numbers and we get \[ \begin{equation} \begin{split} \text{Standard error} &=\frac{\sqrt{(.72)(.28)}}{\sqrt{100}} \\ &= \frac{\sqrt{.20}}{\sqrt{100}} \\ &= \frac{.45}{10} \\ &= .045 \end{split} \end{equation} \]
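The same plug-in calculation, sketched in Python. The helper name `proportion_se` is an assumption made for illustration.

```python
import math

def proportion_se(p, n):
    """Standard error of a sample proportion: sqrt(p * q) / sqrt(n)."""
    q = 1 - p  # complement of the sample proportion
    return math.sqrt(p * q) / math.sqrt(n)

se = proportion_se(0.72, 100)
print(round(se, 3))  # 0.045
```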
Calculate the standard error of a difference in sample means \[ \begin{equation} \begin{split} \text{Standard error of a difference} &=\sqrt{{se_F}^2 + {se_M}^2} \\ &=\sqrt{1.00^2 + .98^2} \\ &=\sqrt{1.00 + .96} \\ &=\sqrt{1.96} \\ &=1.40 \end{split} \end{equation} \]
Calculate the test statistic
\[ \begin{equation} \begin{split} \text{Test statistic} &= \frac{(H_A - H_0)}{se_{1-2}} \\ Z &= \frac{(H_A - H_0)}{se_{1-2}} \\ t &= \frac{(H_A - H_0)}{se_{1-2}} \end{split} \end{equation} \]
\(Z\) score for females and males \[ \begin{equation} \begin{split} Z &= \frac{(H_A - H_0)}{se_{1-2}} \\ &= \frac{(4.6-0)}{1.40} \\ &= 3.30 \end{split} \end{equation} \]
\(t\) statistic for females and males \[ \begin{equation} \begin{split} t &= \frac{(H_A - H_0)}{se_{1-2}} \\ &= \frac{(4.6-0)}{1.40} \\ &= 3.30 \end{split} \end{equation} \]
\[ \begin{equation} \begin{split} \text{Degrees of freedom} &=\text{Sample size} - \text{Number of estimated parameters} \\ &=(625+553) - 2 \\ &=1176 \end{split} \end{equation} \]
\(p\)-value for \(t=3.30\) and \(df=1176\) is .0005
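The steps for the difference in means can be sketched as follows. The function name `se_difference` is illustrative. Note that the rounded inputs shown on the slides give a statistic of roughly 3.29, slightly below the reported 3.30, which may reflect unrounded inputs in the original calculation.

```python
import math

def se_difference(se_1, se_2):
    """Standard error of a difference between two independent sample means."""
    return math.sqrt(se_1 ** 2 + se_2 ** 2)

se_diff = se_difference(1.00, 0.98)   # about 1.40
observed_difference = 4.6             # H_A in the slides' notation
null_difference = 0                   # H_0

test_statistic = (observed_difference - null_difference) / se_diff
degrees_of_freedom = (625 + 553) - 2  # two estimated means

print(round(se_diff, 2), degrees_of_freedom)  # 1.4 1176
```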
Everything else is the same
\[ \begin{equation} \begin{split} se_{1-2} &= \sqrt{\frac{p_{1}q_{1}}{n_1}+\frac{p_{2}q_{2}}{n_2}} \\ \min(p_{1} - p_{2}) &= |p_{1} - p_{2}| - 1.645(se_{1-2}) \end{split} \end{equation} \]
\[ \begin{equation} \begin{split} H_A &= p_{1} - p_{2} \\ H_0 &= 0 \\ se_{1-2} &=\sqrt{\frac{p_{1}q_{1}}{n_1}+\frac{p_{2}q_{2}}{n_2}} \end{split} \end{equation} \]
\[ \begin{equation} Z = \frac{(H_A - H_0)}{se_{1-2}} , t = \frac{(H_A - H_0)}{se_{1-2}} \end{equation} \]
CBS News/60 Minutes/Vanity Fair National Poll, September #2, 2012
When you hear the name Kim Kardashian (Kar-dash-ian), which of the following comes to mind first? 1. A self-made businesswoman, 2. A reality television star, 3. A perfume line, 4. Certain physical traits, or 5. A sex tape.
Who is more likely to associate Kim Kardashian with “sex tape”? Men or women?
| | Frequency | Percent |
|---|---|---|
| A self-made businesswoman | 74 | 6.715 |
| Reality television star | 562 | 50.998 |
| A perfume line | 13 | 1.180 |
| Certain physical traits | 76 | 6.897 |
| A sex tape | 176 | 15.971 |
| DK/NA | 201 | 18.240 |
| Total | 1102 | 100.000 |
| | Sample proportion (\(p\)) | Complement of sample proportion (\(q\)) | Squared standard error \(\frac{pq}{n}\) |
|---|---|---|---|
| Male | .227 | .773 | .000441 |
| (406) | | | |
| Female | .170 | .830 | .000289 |
| (495) | | | |
| Mean difference | .057 | | |
| Sum of squared standard errors | .00073 | | |
| Standard error of the mean difference | .027 | | |
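The table's bottom line can be reproduced with the formula \(se_{1-2} = \sqrt{p_1 q_1 / n_1 + p_2 q_2 / n_2}\). This sketch assumes the parenthesized figures (406) and (495) are the male and female sample sizes; the function name is illustrative. Intermediate values differ slightly from the table, which appears to round each standard error before squaring.

```python
import math

def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of a difference between two sample proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

p_male, n_male = 0.227, 406      # assumed: 406 is the male sample size
p_female, n_female = 0.170, 495  # assumed: 495 is the female sample size

difference = p_male - p_female
se = se_diff_proportions(p_male, n_male, p_female, n_female)

z = (difference - 0) / se                # test statistic against H_0 = 0
lower_bound = abs(difference) - 1.645 * se  # minimum difference, one-tailed

print(round(difference, 3), round(se, 3))  # 0.057 0.027
```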
| Right to Abortion | Liberal | Moderate | Conservative | Total |
|---|---|---|---|---|
| Yes | 40.8% | 40.8% | 40.8% | 40.8% |
| | (206.45) | (289.68) | (271.32) | (768) |
| No | 59.2% | 59.2% | 59.2% | 59.2% |
| | (299.55) | (420.32) | (393.68) | (1113) |
| Total | 26.9% | 37.7% | 35.4% | 100% |
| | (506) | (710) | (665) | (1881) |
| Right to Abortion | Liberal | Moderate | Conservative | Total |
|---|---|---|---|---|
| Yes | 62.6% | 36.6% | 28.7% | 40.8% |
| | (317) | (260) | (191) | (768) |
| No | 37.4% | 63.4% | 71.3% | 59.2% |
| | (189) | (450) | (474) | (1113) |
| Total | 26.9% | 37.7% | 35.4% | 100% |
| | (506) | (710) | (665) | (1881) |
| Right to Abortion | | Liberal | Moderate | Conservative |
|---|---|---|---|---|
| Yes | Observed Frequency (\(f_o\)) | 317.0 | 260.0 | 191.0 |
| | Expected Frequency (\(f_e\)) | 206.6 | 289.9 | 271.5 |
| | \(f_o - f_e\) | 110.4 | -29.9 | -80.5 |
| | \((f_o - f_e)^2\) | 12188.9 | 893.3 | 6482.7 |
| | \(\frac{(f_o - f_e)^2}{f_e}\) | 59.0 | 3.1 | 23.9 |
| No | Observed Frequency (\(f_o\)) | 189.0 | 450.0 | 474.0 |
| | Expected Frequency (\(f_e\)) | 299.4 | 420.1 | 393.5 |
| | \(f_o - f_e\) | -110.4 | 29.9 | 80.5 |
| | \((f_o - f_e)^2\) | 12188.9 | 893.3 | 6482.7 |
| | \(\frac{(f_o - f_e)^2}{f_e}\) | 40.7 | 2.1 | 16.5 |
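The cell-by-cell calculation in the table can be sketched in Python. This assumes the usual definition of expected frequency (row total times column total, divided by the grand total) and sums the final row of each panel to get the chi-squared statistic; the variable names are illustrative.

```python
# Observed frequencies: columns are Liberal, Moderate, Conservative
observed = {
    "Yes": [317, 260, 191],
    "No": [189, 450, 474],
}

col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(col_totals)

chi_squared = 0.0
for row in observed.values():
    row_total = sum(row)
    for f_o, col_total in zip(row, col_totals):
        f_e = row_total * col_total / grand_total  # expected frequency
        chi_squared += (f_o - f_e) ** 2 / f_e

# Degrees of freedom: (rows - 1) * (columns - 1)
degrees_of_freedom = (len(observed) - 1) * (len(col_totals) - 1)

print(col_totals, grand_total, degrees_of_freedom)  # [506, 710, 665] 1881 2
```

Here `chi_squared` comes out near 145, matching the sum of the six \(\frac{(f_o - f_e)^2}{f_e}\) cells up to rounding.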
Expected outcomes
| Outcome | Higher Seed | Lower Seed | Total |
|---|---|---|---|
| Win | 312 | 0 | 312 |
| Loss | 0 | 312 | 312 |
| Total | 312 | 312 | 624 |
Actual outcomes from NCAA tournament games (2010-14)
| Outcome | Higher Seed | Lower Seed | Total |
|---|---|---|---|
| Win | 214 | 98 | 312 |
| Loss | 98 | 214 | 312 |
| Total | 312 | 312 | 624 |
Accuracy rate = 68.6%
======================================================
gss$polviews
gss$abany Liberal Moderate Conservative Total
------------------------------------------------------
Yes 154 116 93 363
62.857 34.627 29.245
------------------------------------------------------
No 91 219 225 535
37.143 65.373 70.755
------------------------------------------------------
Total 245 335 318 898
27.283 37.305 35.412
======================================================
Statistics for All Table Factors
Pearson's Chi-squared test
------------------------------------------------------------
Chi^2 = 72.37185 d.f. = 2 p = 1.925987e-16
Minimum expected frequency: 99.03675
This will happen when the within-category modes are the same as the overall mode
Interpreted the same way as \(\lambda\) \[ \begin{equation} \begin{split} \text{Weak} &= d_{yx} \leq .1 \\ \text{Moderate} &= .1 < d_{yx} \leq .2 \\ \text{Moderately Strong} &= .2 < d_{yx} \leq .3 \\ \text{Strong} &= d_{yx} > .3 \end{split} \end{equation} \]
\[ \begin{equation} \begin{split} \text{Weak} &= d_{yx} \geq -.1 \\ \text{Moderate} &= -.2 \leq d_{yx} < -.1 \\ \text{Moderately Strong} &= -.3 \leq d_{yx} < -.2 \\ \text{Strong} &= d_{yx} < -.3 \end{split} \end{equation} \]
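Since the negative cutoffs mirror the positive ones, both sets of rules reduce to a single rule on the absolute value of \(d_{yx}\). A small sketch, with the function name `strength_of_association` chosen for illustration:

```python
def strength_of_association(d_yx):
    """Label a Somers' d value using the cutoffs above, ignoring sign."""
    magnitude = abs(d_yx)
    if magnitude <= 0.1:
        return "Weak"
    if magnitude <= 0.2:
        return "Moderate"
    if magnitude <= 0.3:
        return "Moderately Strong"
    return "Strong"

print(strength_of_association(0.05))   # Weak
print(strength_of_association(-0.15))  # Moderate
print(strength_of_association(-0.35))  # Strong
```

The sign of \(d_{yx}\) still carries the direction of the relationship; only the strength label ignores it.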