Inference for numerical and categorical data

Computational Mathematics and Statistics Camp University of Chicago September 2018

Review: Inference and the Normal Distribution

  • Standard error is the same thing as random sampling error

    \[ \begin{equation} \begin{split} \text{Random sampling error}& =\frac{\sigma}{\sqrt{n}} \\ \text{Standard error of a sample mean}& =\frac{\sigma}{\sqrt{n}} \\ \end{split} \end{equation} \]
  • Central limit theorem
  • Normal distribution
  • The central limit theorem applies to sample means regardless of the shape of the underlying variable of interest, provided the sample is large enough and the variable has finite variance
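
A minimal simulation sketch of the review points above, assuming a skewed Exponential(1) population (mean 1, \(\sigma = 1\)). The sample size, number of replications, and seed are illustrative choices, not values from the slides. It checks that the spread of repeated sample means is close to \(\sigma/\sqrt{n}\).

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

n = 50          # size of each sample (illustrative choice)
reps = 2000     # number of repeated samples (illustrative choice)

# Exponential(1) population: strongly skewed, with mean 1 and sigma 1.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

theoretical_se = 1.0 / math.sqrt(n)            # sigma / sqrt(n)
observed_se = statistics.stdev(sample_means)   # spread of the simulated means

print(f"theoretical SE: {theoretical_se:.4f}")
print(f"observed SE:    {observed_se:.4f}")
```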

Comparing sample means

  1. Calculate the difference between the observed sample mean and the hypothetical population mean
  2. Convert the untransformed number to a \(Z\) score
  3. Look up the \(Z\) score in the normal distribution probability table

Example: atheists

  • Feelings of warmth towards atheists

    \[ \begin{equation} \begin{split} \text{Observed sample mean } \bar{x} &=40 \\ \sigma &=25 \\ n &= 100 \\ \text{Standard error of } \bar{x} &=2.5 \\ \text{Hypothesized population mean } \mu &=37 \end{split} \end{equation} \]

  • 95% confidence intervals

    \[ \begin{equation} \begin{split} \text{Lower confidence interval}&=\bar{x} - 2 \text{ standard errors} \\ & =40 - 2(2.5) \\ & =40 - 5 \\ & =35 \\ \text{Upper confidence interval}&=\bar{x} + 2 \text{ standard errors} \\ & =40 + 2(2.5) \\ & =40 + 5 \\ & =45 \end{split} \end{equation} \]

Example: atheists

  • Reject the hypothesized population mean?
    1. Calculate the difference between the observed sample mean and the hypothetical population mean

      \[ \begin{equation} \begin{split} \text{Sample mean minus hypothetical mean} &=\bar{x} - \mu \\ &=40-37 \\ &=3 \end{split} \end{equation} \]
    2. Convert the untransformed number to a \(Z\) score \[ \begin{equation} \begin{split} Z &=\frac{\text{Sample mean minus hypothetical mean}}{\text{Standard error}} \\ &=\frac{40-37}{2.5} \\ &=\frac{3}{2.5} \\ &=1.2 \end{split} \end{equation} \]
    3. Look up the \(Z\) score in the normal distribution probability table
      • \(P(Z > 1.20)=0.1151\), which is above .05, so we do not reject the hypothesized population mean
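
The three steps can be sketched in a few lines of Python. This is only an illustration using the numbers from the example; the variable names are assumptions, and `statistics.NormalDist` stands in for the printed probability table.

```python
import math
from statistics import NormalDist

x_bar, mu_0 = 40.0, 37.0   # observed sample mean, hypothesized population mean
sigma, n = 25.0, 100       # population standard deviation, sample size

se = sigma / math.sqrt(n)              # standard error of the sample mean
z = (x_bar - mu_0) / se                # step 2: convert the difference to a Z score
p_upper = 1.0 - NormalDist().cdf(z)    # step 3: upper-tail probability P(Z > z)

print(f"SE = {se}, Z = {z:.2f}, P(Z > z) = {p_upper:.4f}")
```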

Using the sample standard deviation

  • Generally, we do not know the population standard deviation
  • A reasonable estimate for this population parameter is the sample standard deviation \[ \begin{equation} \begin{split} \text{SE of the sample mean} &=\frac{\text{Sample standard deviation}}{\text{Square root of the sample size}} \\ &=\sqrt{\frac{s^2}{n}} \end{split} \end{equation} \]
  • Calculating population standard deviation \[ \begin{equation} \begin{split} \sigma^2 &=\frac{\text{TSS}}{N} \\ \sigma &=\sqrt{\frac{\text{TSS}}{N}} \end{split} \end{equation} \]
  • Calculating sample standard deviation

    \[ \begin{equation} \begin{split} s^2 &=\frac{\text{TSS}}{(n-1)} \\ s &=\sqrt{\frac{\text{TSS}}{(n-1)}} \end{split} \end{equation} \]

Bessel’s correction

  • \(\mu = 2050\) \[2051, 2053, 2055, 2050, 2051\]

  • Sample average \(\hat{\mu}\) \[\frac{1}{5}\left(2051 + 2053 + 2055 + 2050 + 2051\right) = 2052\]

  • Population variance if \(\mu\) is known

    \[ \begin{align} {} & \frac{1}{5}[(2051 - 2050)^2 + (2053 - 2050)^2 \\ &\quad\, + (2055 - 2050)^2 + (2050 - 2050)^2 \\ &\quad\, + (2051 - 2050)^2] \\ = {} & \frac{36}{5} = 7.2 \end{align} \]

  • Population variance if \(\mu\) is unknown

    \[ \begin{align} {} & \frac{1}{5}[(2051 - 2052)^2 + (2053 - 2052)^2 \\ &\quad\, + (2055 - 2052)^2 + (2050 - 2052)^2 \\ &\quad\, + (2051 - 2052)^2] \\ = {} & \frac{16}{5} = 3.2 \end{align} \]

  • The variance estimate computed around the sample mean is never larger than the one computed around the population mean (it is strictly smaller unless the two means coincide)
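
The arithmetic on this slide can be reproduced with a short sketch. The variable names are illustrative; the last line shows the Bessel-corrected estimate that divides by \(n-1\) instead of \(n\).

```python
data = [2051, 2053, 2055, 2050, 2051]
mu = 2050                      # population mean, known in this example
n = len(data)

x_bar = sum(data) / n          # sample mean

# Sum of squared deviations around each centre.
ss_pop_mean = sum((x - mu) ** 2 for x in data)
ss_sample_mean = sum((x - x_bar) ** 2 for x in data)

var_known_mu = ss_pop_mean / n            # centred on the population mean
var_uncorrected = ss_sample_mean / n      # centred on the sample mean, divides by n
var_bessel = ss_sample_mean / (n - 1)     # centred on the sample mean, divides by n - 1

print(x_bar, var_known_mu, var_uncorrected, var_bessel)
```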

Rationale

\[(a + b)^2 = a^2 + 2ab + b^2\]

  • \(a=\) the deviation from an individual to the sample mean
  • \(b=\) the deviation from the sample mean to the population mean

Rationale

\[ \begin{align} {[}\,\underbrace{2053 - 2050}_{\begin{smallmatrix} \text{Deviation from} \\ \text{the population} \\ \text{mean} \end{smallmatrix}}\,]^2 & = [\,\overbrace{(\,\underbrace{2053 - 2052}_{\begin{smallmatrix} \text{Deviation from} \\ \text{the sample mean} \end{smallmatrix}}\,)}^{\text{This is }a.} + \overbrace{(2052 - 2050)}^{\text{This is }b.}\,]^2 \\ & = \overbrace{(2053 - 2052)^2}^{\text{This is }a^2.} \\ &\quad + \overbrace{2(2053 - 2052)(2052 - 2050)}^{\text{This is }2ab.} \\ &\quad + \overbrace{(2052 - 2050)^2}^{\text{This is }b^2.} \end{align} \]

Rationale

\[ \begin{alignat}{2} \overbrace{(2051 - 2052)^2}^{\text{This is }a^2.}\ &+\ \overbrace{2(2051 - 2052)(2052 - 2050)}^{\text{This is }2ab.}\ &&+\ \overbrace{(2052 - 2050)^2}^{\text{This is }b^2.} \\ (2053 - 2052)^2\ &+\ 2(2053 - 2052)(2052 - 2050)\ &&+\ (2052 - 2050)^2 \\ (2055 - 2052)^2\ &+\ 2(2055 - 2052)(2052 - 2050)\ &&+\ (2052 - 2050)^2 \\ (2050 - 2052)^2\ &+\ 2(2050 - 2052)(2052 - 2050)\ &&+\ (2052 - 2050)^2 \\ (2051 - 2052)^2\ &+\ \underbrace{2(2051 - 2052)(2052 - 2050)}_{\begin{smallmatrix} \text{The sum of the entries in this} \\ \text{middle column must be 0.} \end{smallmatrix}}\ &&+\ (2052 - 2050)^2 \end{alignat} \]
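
Summing the three columns over all observations gives the general identity behind this table. The symbols \(x_i\), \(\bar{x}\), and \(n\) (observations, sample mean, sample size) are assumed notation, not taken from the slide. The middle term vanishes because deviations from the sample mean always sum to zero.

\[ \begin{align} \sum_{i=1}^{n}(x_i-\mu)^2 &= \sum_{i=1}^{n}(x_i-\bar{x})^2 + 2(\bar{x}-\mu)\sum_{i=1}^{n}(x_i-\bar{x}) + n(\bar{x}-\mu)^2 \\ &= \sum_{i=1}^{n}(x_i-\bar{x})^2 + n(\bar{x}-\mu)^2 \\ 36 &= 16 + 5(2052-2050)^2 = 16 + 20 \end{align} \]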

Why we do not correct the sample standard deviation

  • Inflates the mean squared error of the estimator \(sd_n\)
  • \(MSE_{\text{estimator}} = E([\text{estimator}] - [\text{quantity to be estimated}])^2\)
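    • Two estimators of the variance: \(T^2\) (uncorrected, divides by \(n\)) and \(S^2\) (Bessel-corrected, divides by \(n-1\))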

      \[E((T^2 - \sigma^2)^2) < E((S^2 - \sigma^2)^2)\]
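
A simulation sketch of the inequality, under two assumptions that the slide does not state: \(T^2\) is read as the estimator dividing by \(n\) and \(S^2\) as the one dividing by \(n-1\), and the population is standard normal. Sample size, replication count, and seed are illustrative.

```python
import random
import statistics

random.seed(1)   # fixed seed for reproducibility

n = 5            # small sample size (illustrative)
reps = 20000     # number of simulated samples (illustrative)
sigma2 = 1.0     # true variance of the standard normal population

sq_err_n, sq_err_n1 = [], []
for _ in range(reps):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    x_bar = sum(sample) / n
    ss = sum((x - x_bar) ** 2 for x in sample)
    sq_err_n.append((ss / n - sigma2) ** 2)         # uncorrected estimator
    sq_err_n1.append((ss / (n - 1) - sigma2) ** 2)  # Bessel-corrected estimator

mse_uncorrected = statistics.fmean(sq_err_n)
mse_corrected = statistics.fmean(sq_err_n1)

print(f"MSE dividing by n:     {mse_uncorrected:.3f}")
print(f"MSE dividing by n - 1: {mse_corrected:.3f}")
```

The corrected estimator is unbiased but more variable, which is why its mean squared error comes out larger in this setting.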

When not to use the normal distribution

  • The properties of the normal distribution are only accurate if the sample size is large enough
  • \(n>100\)

Student’s \(T\)-Distribution

  • William Sealy Gosset
  • Beermaking
  • \(N=3\)
  • New distribution for low \(N\)
  • Pseudonym “Student”

Student’s \(T\)-Distribution

\[f(t) = \frac{\Gamma(\frac{\nu+1}{2})} {\sqrt{\nu\pi}\,\Gamma(\frac{\nu}{2})} \left(1+\frac{t^2}{\nu} \right)^{\!-\frac{\nu+1}{2}}\]

  • \(\nu\) is the number of degrees of freedom
  • \(\Gamma\) is the Gamma function; for a positive integer \(n\), \(\Gamma(n) = (n-1)!\)

Differences from the Normal distribution

  • The normal distribution always has the same shape
  • The shape of the Student’s \(t\)-distribution changes depending on the sample size
  • Low \(N \leadsto\) expands the boundaries on random sampling error
  • As sample size increases, the confidence bounds shrink
  • As the sample size approaches infinity, the Student’s \(t\)-distribution takes on the same shape as the normal distribution
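
A small sketch of these points, evaluating the density formula for \(f(t)\) given earlier at \(t = 3\) for several degrees of freedom. The function name `t_pdf` and the chosen values of \(\nu\) are illustrative; `math.lgamma` is used instead of the Gamma function itself only to avoid overflow for large \(\nu\).

```python
import math
from statistics import NormalDist

def t_pdf(t: float, nu: float) -> float:
    """Density of Student's t with nu degrees of freedom."""
    # Work with log-Gamma values so large nu does not overflow.
    log_ratio = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    scale = math.exp(log_ratio) / math.sqrt(nu * math.pi)
    return scale * (1 + t * t / nu) ** (-(nu + 1) / 2)

normal_tail = NormalDist().pdf(3.0)
for nu in (2, 29, 1000):
    print(f"nu={nu:>4}: f(3) = {t_pdf(3.0, nu):.5f}  (normal: {normal_tail:.5f})")
```

The tail density is largest for small \(\nu\) and moves toward the normal value as \(\nu\) grows.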

Example: calculate CIs for campaign spending

  • How much do candidates for state legislature receive from each donor? \[ \begin{equation} \begin{split} \bar{x} &= 1500 \\ s &= 700 \\ n &= 30 \end{split} \end{equation} \]
  • What is the standard error? \[ \begin{equation} \begin{split} \text{SE of the sample mean} &=\frac{\text{Sample standard deviation}}{\text{Square root of the sample size}} \\ &=\frac{700}{\sqrt{30}} \\ &\approx 128 \end{split} \end{equation} \]

Example: calculate CIs for campaign spending

95% confidence intervals for campaign donation amount using the \(t\) distribution \[ \begin{equation} \begin{split} \text{Degrees of freedom} &=\text{Sample size} - \text{Number of estimated parameters} \\ &=30 - 1 \\ &=29 \end{split} \end{equation} \]

\[ \begin{equation} \begin{split} \text{Lower confidence interval}&=\bar{x} - 2.045 \text{ standard errors} \\ & =1500 - 2.045(128) \\ & \approx 1500 - 262 \\ & = 1238 \\ \text{Upper confidence interval}&=\bar{x} + 2.045 \text{ standard errors} \\ & =1500 + 2.045(128) \\ & \approx 1500 + 262 \\ & = 1762 \end{split} \end{equation} \]

Example: calculate CIs for campaign spending

95% confidence intervals for campaign donation amount using the normal distribution \[ \begin{equation} \begin{split} \text{Lower confidence interval}&=\bar{x} - 1.96 \text{ standard errors} \\ & =1500 - 1.96(128) \\ & \approx 1500 - 251 \\ & = 1249 \\ \text{Upper confidence interval}&=\bar{x} + 1.96 \text{ standard errors} \\ & =1500 + 1.96(128) \\ & \approx 1500 + 251 \\ & = 1751 \end{split} \end{equation} \]
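
Both intervals can be compared side by side in a short sketch. The \(t\) critical value 2.045 is taken from the slide (Python's standard library has no \(t\) quantile function), while the normal critical value comes from `statistics.NormalDist`. Because the standard error is not rounded to 128 here, the bounds differ from the slide values by about a dollar.

```python
import math
from statistics import NormalDist

x_bar, s, n = 1500.0, 700.0, 30

se = s / math.sqrt(n)                  # standard error of the sample mean
t_crit = 2.045                         # t critical value for 29 df, as quoted on the slide
z_crit = NormalDist().inv_cdf(0.975)   # normal critical value, about 1.96

t_interval = (x_bar - t_crit * se, x_bar + t_crit * se)
z_interval = (x_bar - z_crit * se, x_bar + z_crit * se)

print(f"SE = {se:.1f}")
print(f"t interval:      {t_interval[0]:.0f} to {t_interval[1]:.0f}")
print(f"normal interval: {z_interval[0]:.0f} to {z_interval[1]:.0f}")
```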

Inference for Sample Proportions

  • Number of cases falling into one category of the variable divided by the number of cases in the sample
  • What percentage of Americans support the legalization of marijuana?
  • What percentage of Americans voted for Barack Obama in 2012?
  • What percentage of Americans prefer cats as opposed to dogs?
  • The same principles of statistical inference for population means also apply to population proportions

Example - Voter Turnout

  • Did you vote in the last election? 72 students answer yes, 28 answer no. \[ \begin{equation} \begin{split} \text{Sample proportion of voters}&=\frac{\text{Number Answering ``Yes''}}{\text{Sample size}} \\ & =\frac{72}{100} \\ & =.72 \end{split} \end{equation} \]

    \[ \begin{equation} \begin{split} \text{Sample proportion of nonvoters}&=\frac{\text{Number Answering ``No''}}{\text{Sample size}} \\ & =\frac{28}{100} \\ & =.28 \end{split} \end{equation} \]

Example - Voter Turnout

  • What is the standard error of the observed sample statistic, \(.72\)? \[ \begin{equation} \begin{split} \text{Sample proportion of voters } p&=.72 \\ \text{Sample proportion of nonvoters } q&= 1-p=.28 \\ \text{Sample size } n&=100 \end{split} \end{equation} \]

  • General formula for calculating the random sampling error is \[ \begin{equation} \begin{split} \text{Random sampling error} &=\frac{\text{Variation component}}{\text{Sample size component}} \\ \text{Sample size component} &= \sqrt{n} \\ \text{Variation component} &= \sqrt{pq} \end{split} \end{equation} \]

  • SE of a sample proportion \(p\)

    \[ \begin{equation} \text{Standard error}=\frac{\sqrt{pq}}{\sqrt{n}} \end{equation} \]

  • Plug in the numbers and we get \[ \begin{equation} \begin{split} \text{Standard error} &=\frac{\sqrt{(.72)(.28)}}{\sqrt{100}} \\ &= \frac{\sqrt{.20}}{\sqrt{100}} \\ &= \frac{.45}{10} \\ &= .045 \end{split} \end{equation} \]

Example - Voter Turnout

  • In large samples, sample proportions are approximately normally distributed, so we use the same method to calculate 95% confidence intervals \[ \begin{equation} \begin{split} \text{Lower confidence interval} &=p - 1.96 \text{ standard errors} \\ & =.72 - 1.96(0.045) \\ & =.72 - 0.09 \\ & =.63 \\ \text{Upper confidence interval} &=p + 1.96 \text{ standard errors} \\ & =.72 + 1.96(0.045) \\ & =.72 + 0.09 \\ & =.81 \end{split} \end{equation} \]
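
The standard error and interval for the turnout example, as a minimal sketch with illustrative variable names:

```python
import math

p, n = 0.72, 100
q = 1 - p

se = math.sqrt(p * q / n)        # same as sqrt(pq) / sqrt(n)
lower = p - 1.96 * se
upper = p + 1.96 * se

print(f"SE = {se:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```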

Two sample means

Calculate the standard error of a difference in sample means \[ \begin{equation} \begin{split} \text{Standard error of a difference} &=\sqrt{{se_F}^2 + {se_M}^2} \\ &=\sqrt{1.00^2 + .98^2} \\ &=\sqrt{1.00 + .96} \\ &=\sqrt{1.96} \\ &=1.40 \end{split} \end{equation} \]

One and Two-Tailed Tests of Significance

Example - Gender and Democratic Party Ratings

  • Procedure for using the confidence interval approach to assess statistical significance (one-tailed test at the .05 level)
    1. Multiply the standard error of the difference by 1.645
    2. Subtract this number from the absolute value of the sample difference
    3. If the result is greater than 0, then reject \(H_0\). If the result is not greater than 0, do not reject \(H_0\)
  • Difference in means for females and males \[ \begin{equation} \begin{split} \text{Lowest plausible difference in means} &= |\bar{x}_F - \bar{x}_M| - 1.645(se_{F-M}) \\ &=4.6 - 1.645(1.40) \\ &=2.30 \end{split} \end{equation} \]
  • Value is greater than 0, so we reject \(H_0\)
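
The procedure above, sketched with the numbers from the females and males comparison. Variable names are illustrative.

```python
import math

se_f, se_m = 1.00, 0.98     # standard errors of the two sample means
diff = 4.6                  # absolute difference in sample means

se_diff = math.sqrt(se_f ** 2 + se_m ** 2)    # standard error of the difference
lowest_plausible = diff - 1.645 * se_diff     # steps 1 and 2
reject_h0 = lowest_plausible > 0              # step 3

print(f"SE of difference = {se_diff:.2f}")
print(f"lowest plausible difference = {lowest_plausible:.2f}, reject H0: {reject_h0}")
```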

\(p\)-Values - the Formal Approach

  • Need to know three things to determine the exact probability of obtaining a given sample difference if the true population difference is 0 \[ \begin{equation} \begin{split} H_A &= \bar{x}_1 - \bar{x}_2 \\ H_0 &= 0 \\ se_{1-2} &=\sqrt{{se_1}^2 + {se_2}^2} \end{split} \end{equation} \]
  • Calculate the test statistic

    \[ \begin{equation} \begin{split} \text{Test statistic} &= \frac{(H_A - H_0)}{se_{1-2}} \\ Z &= \frac{(H_A - H_0)}{se_{1-2}} \\ t &= \frac{(H_A - H_0)}{se_{1-2}} \end{split} \end{equation} \]

Revisit - Gender and Democratic Party Ratings

  • \(Z\) score for females and males \[ \begin{equation} \begin{split} Z &= \frac{(H_A - H_0)}{se_{1-2}} \\ &= \frac{(4.6-0)}{1.40} \\ &\approx 3.29 \end{split} \end{equation} \]

  • \(p\)-value for \(Z=3.29\) is .0005
  • \(t\) statistic for females and males \[ \begin{equation} \begin{split} t &= \frac{(H_A - H_0)}{se_{1-2}} \\ &= \frac{(4.6-0)}{1.40} \\ &\approx 3.29 \end{split} \end{equation} \]

    \[ \begin{equation} \begin{split} \text{Degrees of freedom} &=\text{Sample size} - \text{Number of estimated parameters} \\ &=(625+553) - 2 \\ &=1176 \end{split} \end{equation} \]
  • \(p\)-value for \(t=3.29\) and \(df=1176\) is .0005
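
A sketch of the \(Z\)-based \(p\)-value, using the normal distribution from Python's standard library. With 1176 degrees of freedom the \(t\) distribution is very close to the normal, so only the normal calculation is shown; the one-tailed reading of the \(p\)-value is an assumption consistent with the 1.645 critical value used earlier.

```python
from statistics import NormalDist

diff, h0, se_diff = 4.6, 0.0, 1.40

z = (diff - h0) / se_diff                   # test statistic, roughly 3.3
p_one_tailed = 1.0 - NormalDist().cdf(z)    # P(Z > z)
p_two_tailed = 2.0 * p_one_tailed           # for comparison only

print(f"Z = {z:.2f}, one-tailed p = {p_one_tailed:.4f}, two-tailed p = {p_two_tailed:.4f}")
```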

Comparing Two Sample Proportions

  • Similar to comparing two sample means \[ \begin{equation} \begin{split} \text{SE of the diff in props} &= \sqrt{\sqrt{\frac{p_{1}q_{1}}{n_1}}^2+\sqrt{\frac{p_{2}q_{2}}{n_2}}^2} \\ &= \sqrt{\frac{p_{1}q_{1}}{n_1}+\frac{p_{2}q_{2}}{n_2}} \end{split} \end{equation} \]
  • Everything else is the same

    \[ \begin{equation} \begin{split} se_{1-2} &= \sqrt{\frac{p_{1}q_{1}}{n_1}+\frac{p_{2}q_{2}}{n_2}} \\ \min(p_{1} - p_{2}) &= |p_{1} - p_{2}| - 1.645(se_{1-2}) \end{split} \end{equation} \]

    \[ \begin{equation} \begin{split} H_A &= p_{1} - p_{2} \\ H_0 &= 0 \\ se_{1-2} &=\sqrt{\frac{p_{1}q_{1}}{n_1}+\frac{p_{2}q_{2}}{n_2}} \end{split} \end{equation} \]

    \[ \begin{equation} Z = \frac{(H_A - H_0)}{se_{1-2}} , t = \frac{(H_A - H_0)}{se_{1-2}} \end{equation} \]

Example: Kim Kardashian

  • CBS News/60 Minutes/Vanity Fair National Poll, September #2, 2012

    When you hear the name Kim Kardashian (Kar-dash-ian), which of the following comes to mind first? 1. A self-made businesswoman, 2. A reality television star, 3. A perfume line, 4. Certain physical traits, or 5. A sex tape.

Example: Kim Kardashian

  • Who is more likely to associate Kim Kardashian with “sex tape”? Men or women?

    Response                      Frequency   Percent
    A self made businesswoman            74     6.715
    Reality television star             562    50.998
    A perfume line                       13     1.180
    Certain physical traits              76     6.897
    A sex tape                          176    15.971
    DK/NA                               201    18.240
    Total                              1102   100.000

Example: Kim Kardashian

Columns: sample proportion (\(p\)), complement of sample proportion (\(q\)), squared standard error (\(\frac{pq}{n}\)). Group sizes are in parentheses.

Group                                     \(p\)   \(q\)   \(\frac{pq}{n}\)
Male (406)                                .227    .773    .000441
Female (495)                              .170    .830    .000289

Mean difference                           .057
Sum of squared standard errors            .00073
Standard error of the mean difference     .027
  • Calculate the \(Z\) score test statistic under the null hypothesis \[ \begin{equation} \begin{split} Z &= \frac{(H_A - H_0)}{se_{1-2}} \\ &= \frac{(.057-0)}{.027} \\ &= 2.11 \end{split} \end{equation} \]
  • \(p\)-value is .0174
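
The same test can be sketched directly from the rounded proportions in the table. Variable names are illustrative. Because the inputs are rounded to three decimals, the result differs slightly from the slide values (\(Z = 2.11\), \(p = .0174\)), which were presumably computed from unrounded proportions.

```python
import math
from statistics import NormalDist

p_m, n_m = 0.227, 406    # men: sample proportion and group size
p_f, n_f = 0.170, 495    # women: sample proportion and group size

# Standard error of the difference in proportions.
se_diff = math.sqrt(p_m * (1 - p_m) / n_m + p_f * (1 - p_f) / n_f)
z = (p_m - p_f) / se_diff
p_one_tailed = 1.0 - NormalDist().cdf(z)

print(f"SE = {se_diff:.3f}, Z = {z:.2f}, one-tailed p = {p_one_tailed:.4f}")
```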

\(\chi^2\) Test of Significance

  • Mean and proportion comparisons between two variables
    • What if our variables have more than two values?
  • \(\chi^2\) test of significance
  • Works whenever your independent and dependent variables are nominal or ordinal
  • Can handle more than two values for either/both variables

Theoretical Explanation: Abortion Attitudes

  • \(H_A\) - In a comparison of individuals, liberals are more likely to favor allowing a woman to obtain an abortion for any reason than conservatives
  • \(H_0\) - There is no difference in support between liberals and conservatives for allowing a woman to obtain an abortion for any reason. Any difference is the result of random sampling error.
  • Say the null hypothesis is correct - there are no differences between ideological groups and attitudes towards abortion. What would the table look like?

If null hypothesis is correct

Right to Abortion   Liberal    Moderate   Conservative   Total
Yes                 40.8%      40.8%      40.8%          40.8%
                    (206.45)   (289.68)   (271.32)       (768)
No                  59.2%      59.2%      59.2%          59.2%
                    (299.55)   (420.32)   (393.68)       (1113)
Total               26.9%      37.7%      35.4%          100%
                    (506)      (710)      (665)          (1881)

Observed data

Right to Abortion   Liberal    Moderate   Conservative   Total
Yes                 62.6%      36.6%      28.7%          40.8%
                    (317)      (260)      (191)          (768)
No                  37.4%      63.4%      71.3%          59.2%
                    (189)      (450)      (474)          (1113)
Total               26.9%      37.7%      35.4%          100%
                    (506)      (710)      (665)          (1881)
  • \(\chi^2\) - based on the difference between
    1. Observed frequency
    2. Expected frequency
  • Distribution looks different from other distributions we’ve seen before, but is interpreted in the same way

\(\chi^2\) distribution

\(\chi^2\) Test of Significance

Right to Abortion                          Liberal   Moderate   Conservative
Yes   Observed Frequency (\(f_o\))           317.0      260.0          191.0
      Expected Frequency (\(f_e\))           206.6      289.9          271.5
      \(f_o - f_e\)                          110.4      -29.9          -80.5
      \((f_o - f_e)^2\)                    12188.9      893.3         6482.7
      \(\frac{(f_o - f_e)^2}{f_e}\)           59.0        3.1           23.9
No    Observed Frequency (\(f_o\))           189.0      450.0          474.0
      Expected Frequency (\(f_e\))           299.4      420.1          393.5
      \(f_o - f_e\)                         -110.4       29.9           80.5
      \((f_o - f_e)^2\)                    12188.9      893.3         6482.7
      \(\frac{(f_o - f_e)^2}{f_e}\)           40.7        2.1           16.5
  • Calculating test statistic
    • \(\chi^2=\sum{\frac{(f_o - f_e)^2}{f_e}}=145.27\)
    • \(\text{Degrees of freedom} = (\text{number of rows}-1)(\text{number of columns}-1)=(2-1)(3-1)=2\)
  • Interpretation
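
A sketch that recomputes the test statistic from the observed counts. Variable names are illustrative. The closing \(p\)-value line relies on a special case: with exactly 2 degrees of freedom, the upper-tail probability of the \(\chi^2\) distribution is \(e^{-\chi^2/2}\), so this shortcut does not apply to tables of other sizes.

```python
import math

observed = [
    [317, 260, 191],   # Yes: Liberal, Moderate, Conservative
    [189, 450, 474],   # No:  Liberal, Moderate, Conservative
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, f_o in enumerate(row):
        f_e = row_totals[i] * col_totals[j] / n   # expected count under H0
        chi2 += (f_o - f_e) ** 2 / f_e

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid for df = 2 only: chi-square upper-tail probability is exp(-chi2 / 2).
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.3g}")
```

The \(p\)-value is far below .05, so the null hypothesis of no relationship between ideology and abortion attitudes would be rejected.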

Why hypothesis testing is not enough

  • Statistical significance \(\neq\) substantive significance
  • A relationship can be statistically significant, even if in reality the actual difference is small or meaningless
  • “Statistical significance is usually the main thing you want to know about a relationship”
  • Measures of association extend hypothesis testing: beyond whether the relationship is representative of the population, they tell us how strong the relationship is

Theory

  • Statistical significance is based in part on sample size \[ \begin{equation} \begin{split} H_A &= p_{1} - p_{2} \\ H_0 &= 0 \\ se_p &=\frac{\sqrt{pq}}{\sqrt{n}} \\ se_{1-2} &=\sqrt{\frac{p_{1}q_{1}}{n_1}+\frac{p_{2}q_{2}}{n_2}} \\ Z &= \frac{(H_A - H_0)}{se_{1-2}} \end{split} \end{equation} \]
  • For a given observed difference, as sample size increases the standard error decreases and the \(p\)-value decreases
  • Easier to meet the .05 threshold

Proportional Reduction in Error

  • How much better can you predict the dependent variable by knowing the independent variable than by not knowing the independent variable?

Basic Example - Predicting NCAA Tournament Wins

  • Two possible outcomes
    • Win
    • Loss
  • What is the best strategy for predicting the outcome for any randomly selected game?
    • Always predict a team will win (or lose)
    • Guaranteed 50% accuracy
  • Now let’s say you also know another fact - whether the team is the higher seed or lower seed
  • What is the best strategy for predicting the outcome for any randomly selected game?
    • If the team is the higher seed, predict it will win
    • If the team is the lower seed, predict it will lose

Basic Example - Predicting NCAA Tournament Wins

  • Expected outcomes

    Outcome   Higher Seed   Lower Seed   Total
    Win               312            0     312
    Loss                0          312     312
    Total             312          312     624
  • Actual outcomes from NCAA tournament games (2010-14)

    Outcome   Higher Seed   Lower Seed   Total
    Win               214           98     312
    Loss               98          214     312
    Total             312          312     624
  • Correct predictions = 428
  • Incorrect predictions = 196
  • Accuracy rate = 428/624 = 68.6%

Lambda

  • Measures the strength of a relationship between two categorical variables, at least one of which is nominal
  • Based upon two values \[ \begin{equation} \begin{split} E_1 &= \text{Prediction error without knowledge of the independent variable} \\ E_2 &= \text{Prediction error with knowledge of the independent variable} \\ \lambda &= \frac{(E_1 - E_2)}{E_1} \end{split} \end{equation} \]

Lambda

  • Calculation for NCAA tournament prediction \[ \begin{equation} \begin{split} E_1 &= 312 \\ E_2 &= 196 \\ \lambda &= \frac{(312-196)}{312} \\ &= .37 \end{split} \end{equation} \]
  • Interpreting lambda \[ \begin{equation} \begin{split} \text{Weak} &= \lambda \leq .1 \\ \text{Moderate} &= .1 < \lambda \leq .2 \\ \text{Moderately Strong} &= .2 < \lambda \leq .3 \\ \text{Strong} &= \lambda > .3 \end{split} \end{equation} \]
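
The calculation and the interpretation bands can be sketched as two small helpers. The function names `pre_lambda` and `strength_label` are hypothetical, and the cut-offs are the ones listed on this slide.

```python
def pre_lambda(e1: float, e2: float) -> float:
    """Proportional reduction in error: (E1 - E2) / E1."""
    return (e1 - e2) / e1

def strength_label(value: float) -> str:
    """Map a measure of association to the labels used on the slide."""
    if value <= 0.1:
        return "Weak"
    if value <= 0.2:
        return "Moderate"
    if value <= 0.3:
        return "Moderately Strong"
    return "Strong"

ncaa_lambda = pre_lambda(312, 196)
print(round(ncaa_lambda, 2), strength_label(ncaa_lambda))
```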

Example - Abortion

======================================================
             gss$polviews
gss$abany    Liberal   Moderate   Conservative   Total
------------------------------------------------------
Yes              154        116             93     363
              62.857     34.627         29.245
------------------------------------------------------
No                91        219            225     535
              37.143     65.373         70.755
------------------------------------------------------
Total            245        335            318     898
              27.283     37.305         35.412
======================================================

Statistics for All Table Factors

Pearson's Chi-squared test
------------------------------------------------------------
Chi^2 = 72.37185      d.f. = 2      p = 1.925987e-16

        Minimum expected frequency: 99.03675
  • Strategy if you don’t know an individual’s political views - “No”
  • Strategy if you do know an individual’s political views
    • Liberals - Yes
    • Moderates - No
    • Conservatives - No
  • Calculate \(\lambda\) \[ \begin{equation} \begin{split} E_1 &= 363 \\ E_2 &= 91+116+93=300 \\ \lambda &= \frac{(363-300)}{363} \\ &= .17 \end{split} \end{equation} \]
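
The two error counts can also be derived directly from the cell counts in the cross-tabulation, which makes the prediction strategy explicit. This is a sketch with illustrative names, not the procedure used to produce the output above.

```python
# Rows: Yes, No. Columns: Liberal, Moderate, Conservative.
table = [
    [154, 116, 93],
    [91, 219, 225],
]

row_totals = [sum(row) for row in table]
n = sum(row_totals)

# E1: errors when always guessing the overall modal row ("No").
e1 = n - max(row_totals)

# E2: errors when guessing the modal row within each column.
e2 = sum(sum(col) - max(col) for col in zip(*table))

lam = (e1 - e2) / e1
print(e1, e2, round(lam, 2))
```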

Cramer’s V

  • Sometimes \(\lambda\) will be zero, even if a relationship exists in the data
  • This will happen when the within-category modes are the same as the overall mode

Cramer’s V

  • Yet \(\lambda\) is 0 for both races
    • Whites \[ \begin{equation} \begin{split} E_1 &= 389 \\ E_2 &= 198+191=389 \\ \lambda &= \frac{(389-389)}{389} \\ &= 0 \end{split} \end{equation} \]
    • Blacks \[ \begin{equation} \begin{split} E_1 &= 68 \\ E_2 &= 35+33=68 \\ \lambda &= \frac{(68-68)}{68} \\ &= 0 \end{split} \end{equation} \]
  • In this situation, a better measure is Cramer’s V
    • Based on the \(\chi^2\) test statistic and sample size
    • Value of 0 means no relationship
    • Value of 1 means perfect relationship
  • Interpreting Cramer’s V \[ \begin{equation} \begin{split} \text{Weak} &= V \leq .1 \\ \text{Moderate} &= .1 < V \leq .2 \\ \text{Moderately Strong} &= .2 < V \leq .3 \\ \text{Strong} &= V > .3 \end{split} \end{equation} \]
  • For blacks: \(V=.192\)
  • For whites: \(V=.067\)
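
The slides do not give the formula for Cramer's V. The sketch below assumes the usual definition, \(V = \sqrt{\chi^2 / (n \cdot \min(r-1, c-1))}\), and applies it to the abortion-by-ideology output shown earlier (\(\chi^2 = 72.37185\), \(n = 898\), 2 rows, 3 columns), since the race-specific tables are not reproduced here.

```python
import math

chi2 = 72.37185        # Pearson chi-squared from the abortion-by-ideology output
n = 898                # total number of respondents in that table
n_rows, n_cols = 2, 3

# Assumed standard definition of Cramer's V.
cramers_v = math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))
print(round(cramers_v, 3))
```

Under the interpretation bands above, this value would fall in the moderately strong range.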

Somers’ \(d_{yx}\)

  • Appropriate for examining the relationship between two ordinal variables
  • Accounts for the directionality of the variables
  • Interpreted the same way as \(\lambda\) \[ \begin{equation} \begin{split} \text{Weak} &= d_{yx} \leq .1 \\ \text{Moderate} &= .1 < d_{yx} \leq .2 \\ \text{Moderately Strong} &= .2 < d_{yx} \leq .3 \\ \text{Strong} &= d_{yx} > .3 \end{split} \end{equation} \]

    \[ \begin{equation} \begin{split} \text{Weak} &= -.1 \leq d_{yx} < 0 \\ \text{Moderate} &= -.2 \leq d_{yx} < -.1 \\ \text{Moderately Strong} &= -.3 \leq d_{yx} < -.2 \\ \text{Strong} &= d_{yx} < -.3 \end{split} \end{equation} \]
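
The slides do not show how \(d_{yx}\) is computed. The sketch below assumes the usual pair-counting definition, \(d_{yx} = (C - D)/(C + D + T_y)\), where \(C\) and \(D\) are the numbers of concordant and discordant pairs and \(T_y\) counts pairs tied on the dependent variable but not on the independent variable. The function name and the data are hypothetical.

```python
from itertools import combinations

def somers_d_yx(x, y):
    """Somers' d with y as the dependent variable (pair-counting definition)."""
    concordant = discordant = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        if x1 == x2:
            continue                      # pairs tied on x are excluded
        if y1 == y2:
            tied_y += 1                   # tied on y only
        elif (x1 < x2) == (y1 < y2):
            concordant += 1               # both variables move the same way
        else:
            discordant += 1               # variables move in opposite directions
    return (concordant - discordant) / (concordant + discordant + tied_y)

# Hypothetical ordinal codes (1 = low, 3 = high), invented for illustration.
x = [1, 1, 2, 2, 3, 3]
y = [1, 2, 2, 3, 3, 3]
d = somers_d_yx(x, y)
print(d)
```

A positive value means higher values of the independent variable go with higher values of the dependent variable, and a negative value means the reverse, which is why the interpretation bands are given for both signs.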