Inference for numerical and categorical data

Computational Mathematics and Statistics Camp University of Chicago September 2018

Review: Inference and the Normal Distribution

Standard error is the same thing as random sampling error
\[ \begin{equation} \begin{split} \text{Random sampling error}& =\frac{\sigma}{\sqrt{n}} \\ \text{Standard error of a sample mean}& =\frac{\sigma}{\sqrt{n}} \\ \end{split} \end{equation} \]
Central limit theorem
Normal distribution
The central limit theorem applies to sample means from sufficiently large samples, regardless of the shape of the underlying variable of interest

Comparing sample means

Calculate the difference between the observed sample mean and the hypothetical population mean
Convert the untransformed number to a \(Z\) score
Look up the \(Z\) score in the normal distribution probability table

Example: atheists

Feelings of warmth towards atheists

\[ \begin{equation} \begin{split} \text{Observed sample mean } \bar{x} &=40 \\ \sigma &=25 \\ n &= 100 \\ \text{Standard error of } \bar{x} &=2.5 \\ \text{Hypothesized population mean } \mu &=37 \end{split} \end{equation} \]
95% confidence intervals

\[ \begin{equation} \begin{split} \text{Lower confidence interval}&=\bar{x} - 2 \text{ standard errors} \\ & =40 - 2(2.5) \\ & =40 - 5 \\ & =35 \\ \text{Upper confidence interval}&=\bar{x} + 2 \text{ standard errors} \\ & =40 + 2(2.5) \\ & =40 + 5 \\ & =45 \end{split} \end{equation} \]

Example: atheists

Reject the hypothesized population mean?
1. Calculate the difference between the observed sample mean and the hypothetical population mean
  \[ \begin{equation} \begin{split} \text{Sample mean minus hypothetical mean} &=\bar{x} - \mu \\ &=40-37 \\ &=3 \end{split} \end{equation} \]
2. Convert the untransformed number to a \(Z\) score \[ \begin{equation} \begin{split} Z &=\frac{\text{Sample mean minus hypothetical mean}}{\text{Standard error}} \\ &=\frac{40-37}{2.5} \\ &=\frac{3}{2.5} \\ &=1.2 \end{split} \end{equation} \]
3. Look up the \(Z\) score in the normal distribution probability table
  - \(P(Z > 1.20)=0.1151\)
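
The three steps above can be reproduced numerically. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and the upper-tail probability comes from `statistics.NormalDist` rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the example above
x_bar, sigma, n, mu_0 = 40, 25, 100, 37

se = sigma / sqrt(n)                     # standard error of the sample mean
ci_lower = x_bar - 2 * se                # the slides use 2 (not 1.96) standard errors
ci_upper = x_bar + 2 * se
z = (x_bar - mu_0) / se                  # Z score of the observed difference
p_upper_tail = 1 - NormalDist().cdf(z)   # area to the right of Z

print(se, (ci_lower, ci_upper), round(z, 2), round(p_upper_tail, 4))
```

The hypothesized mean of 37 lies inside the interval from 35 to 45, which agrees with an upper-tail probability well above .05.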

Using the sample standard deviation

Generally, we do not know the population standard deviation
A reasonable estimate for this population parameter is the sample standard deviation \[ \begin{equation} \begin{split} \text{SE of the sample mean} &=\frac{\text{Sample standard deviation}}{\text{Square root of the sample size}} \\ &=\sqrt{\frac{s^2}{n}} \end{split} \end{equation} \]
Calculating population standard deviation \[ \begin{equation} \begin{split} \sigma^2 &=\frac{\text{TSS}}{N} \\ \sigma &=\sqrt{\frac{\text{TSS}}{N}} \end{split} \end{equation} \]
Calculating sample standard deviation

\[ \begin{equation} \begin{split} s^2 &=\frac{\text{TSS}}{(n-1)} \\ s &=\sqrt{\frac{\text{TSS}}{(n-1)}} \end{split} \end{equation} \]

Bessel’s correction

\(\mu = 2050\) \[2051, 2053, 2055, 2050, 2051\]
Sample average \(\hat{\mu}\) \[\frac{1}{5}\left(2051 + 2053 + 2055 + 2050 + 2051\right) = 2052\]
Population variance if \(\mu\) is known

\[ \begin{align} {} & \frac{1}{5}[(2051 - 2050)^2 + (2053 - 2050)^2 \\ &\quad\, + (2055 - 2050)^2 + (2050 - 2050)^2 \\ &\quad\, + (2051 - 2050)^2] \\ = {} & \frac{36}{5} = 7.2 \end{align} \]
Population variance if \(\mu\) is unknown

\[ \begin{align} {} & \frac{1}{5}[(2051 - 2052)^2 + (2053 - 2052)^2 \\ &\quad\, + (2055 - 2052)^2 + (2050 - 2052)^2 \\ &\quad\, + (2051 - 2052)^2] \\ = {} & \frac{16}{5} = 3.2 \end{align} \]
The variance estimate computed around the sample mean is never larger than the one computed around the population mean (here 3.2 versus 7.2)
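
As a quick check of the arithmetic, the sketch below recomputes both variances and also shows what dividing by \(n-1\) does to the estimate based on the sample mean. The variable names are illustrative only.

```python
data = [2051, 2053, 2055, 2050, 2051]
mu = 2050                      # known population mean from the example
n = len(data)
x_bar = sum(data) / n          # sample mean

# Squared deviations around the population mean, divided by n
var_known_mu = sum((x - mu) ** 2 for x in data) / n
# Squared deviations around the sample mean, divided by n (biased downward)
var_biased = sum((x - x_bar) ** 2 for x in data) / n
# Same sum of squares, divided by n - 1 (Bessel's correction)
var_bessel = sum((x - x_bar) ** 2 for x in data) / (n - 1)

print(x_bar, var_known_mu, var_biased, var_bessel)
```

Dividing by \(n-1\) raises the estimate from 3.2 to 4.0, moving it toward the value obtained when \(\mu\) is known.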

Rationale

\[(a + b)^2 = a^2 + 2ab + b^2\]

\(a=\) the deviation from an individual to the sample mean
\(b=\) the deviation from the sample mean to the population mean

Rationale

\[ \begin{align} {[}\,\underbrace{2053 - 2050}_{\begin{smallmatrix} \text{Deviation from} \\ \text{the population} \\ \text{mean} \end{smallmatrix}}\,]^2 & = [\,\overbrace{(\,\underbrace{2053 - 2052}_{\begin{smallmatrix} \text{Deviation from} \\ \text{the sample mean} \end{smallmatrix}}\,)}^{\text{This is }a.} + \overbrace{(2052 - 2050)}^{\text{This is }b.}\,]^2 \\ & = \overbrace{(2053 - 2052)^2}^{\text{This is }a^2.} \\ &\quad + \overbrace{2(2053 - 2052)(2052 - 2050)}^{\text{This is }2ab.} \\ &\quad + \overbrace{(2052 - 2050)^2}^{\text{This is }b^2.} \end{align} \]

Rationale

\[ \begin{alignat}{2} \overbrace{(2051 - 2052)^2}^{\text{This is }a^2.}\ &+\ \overbrace{2(2051 - 2052)(2052 - 2050)}^{\text{This is }2ab.}\ &&+\ \overbrace{(2052 - 2050)^2}^{\text{This is }b^2.} \\ (2053 - 2052)^2\ &+\ 2(2053 - 2052)(2052 - 2050)\ &&+\ (2052 - 2050)^2 \\ (2055 - 2052)^2\ &+\ 2(2055 - 2052)(2052 - 2050)\ &&+\ (2052 - 2050)^2 \\ (2050 - 2052)^2\ &+\ 2(2050 - 2052)(2052 - 2050)\ &&+\ (2052 - 2050)^2 \\ (2051 - 2052)^2\ &+\ \underbrace{2(2051 - 2052)(2052 - 2050)}_{\begin{smallmatrix} \text{The sum of the entries in this} \\ \text{middle column must be 0.} \end{smallmatrix}}\ &&+\ (2052 - 2050)^2 \end{alignat} \]
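
The column-by-column decomposition can be verified directly. The sketch below sums each of the three columns for the five observations; the names `a` and `b` follow the definitions given earlier, and everything else is illustrative.

```python
data = [2051, 2053, 2055, 2050, 2051]
mu, x_bar = 2050, 2052

a = [x - x_bar for x in data]   # deviation of each observation from the sample mean
b = x_bar - mu                  # deviation of the sample mean from the population mean

sum_a2 = sum(ai ** 2 for ai in a)          # first column
sum_2ab = sum(2 * ai * b for ai in a)      # middle column: zero because the a's sum to zero
sum_b2 = len(data) * b ** 2                # last column: the same b^2 in every row
total = sum((x - mu) ** 2 for x in data)   # squared deviations from the population mean

print(sum_a2, sum_2ab, sum_b2, total)
```

Because the middle column vanishes, the sum of squares around the population mean equals the sum of squares around the sample mean plus \(n b^2\), so the latter can never be the larger of the two.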

Why we do not correct the sample standard deviation

Inflates the mean squared error of the estimator \(sd_n\)
\(MSE_{\text{estimator}} = E([\text{estimator}] - [\text{quantity to be estimated}])^2\)
- \(T^2, S^2\)
  
  \[E((T^2 - \sigma^2)^2) < E((S^2 - \sigma^2)^2)\]

When not to use the normal distribution

The properties of the normal distribution are only accurate if the sample size is large enough
\(n>100\)

Student’s \(T\)-Distribution

William Sealy Gosset
Beermaking
\(N=3\)
New distribution for low \(N\)
Pseudonym “Student”

Student’s \(T\)-Distribution

\[f(t) = \frac{\Gamma(\frac{\nu+1}{2})} {\sqrt{\nu\pi}\,\Gamma(\frac{\nu}{2})} \left(1+\frac{t^2}{\nu} \right)^{\!-\frac{\nu+1}{2}}\]

\(\nu\) is the number of degrees of freedom
\(\Gamma\) is the Gamma function; for a positive integer \(n\), \(\Gamma(n) = (n-1)!\)
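
The density above can be evaluated directly. This is a sketch rather than a reference implementation: the function name `t_pdf` is hypothetical, and it works with the logarithm of the Gamma function (`math.lgamma`) because \(\Gamma\) itself overflows for large \(\nu\).

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist

def t_pdf(t, nu):
    """Density of Student's t with nu degrees of freedom, evaluated at t."""
    # Ratio of Gamma functions computed on the log scale to avoid overflow
    gamma_ratio = exp(lgamma((nu + 1) / 2) - lgamma(nu / 2))
    return gamma_ratio / sqrt(nu * pi) * (1 + t * t / nu) ** (-(nu + 1) / 2)

# Compare the tail density at t = 3 with the standard normal density
for nu in (2, 29, 1000):
    print(nu, t_pdf(3.0, nu), NormalDist().pdf(3.0))
```

With few degrees of freedom the density at \(t=3\) is several times the normal density, and the gap closes as \(\nu\) grows.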

Differences from the Normal distribution

The normal distribution always has the same shape
The shape of Student’s \(t\)-distribution changes depending on the sample size
Low \(N \leadsto\) expands the boundaries on random sampling error
As sample size increases, the confidence bounds shrink
As the sample size approaches infinity, Student’s \(t\)-distribution takes on the same shape as the normal distribution


Example: calculate CIs for campaign spending

How much do candidates for state legislature receive from each donor? \[ \begin{equation} \begin{split} \bar{x} &= 1500 \\ s &= 700 \\ n &= 30 \end{split} \end{equation} \]
What is the standard error? \[ \begin{equation} \begin{split} \text{SE of the sample mean} &=\frac{\text{Sample standard deviation}}{\text{Square root of the sample size}} \\ &=\frac{700}{\sqrt{30}} \\ &\approx 128 \end{split} \end{equation} \]

Example: calculate CIs for campaign spending

95% confidence intervals for campaign donation amount using the \(t\) distribution \[ \begin{equation} \begin{split} \text{Degrees of freedom} &=\text{Sample size} - \text{Number of estimated parameters} \\ &=30 - 1 \\ &=29 \end{split} \end{equation} \]

\[ \begin{equation} \begin{split} \text{Lower confidence interval}&=\bar{x} - 2.045 \text{ standard errors} \\ & =1500 - 2.045(128) \\ & \approx 1500 - 262 \\ & = 1238 \\ \text{Upper confidence interval}&=\bar{x} + 2.045 \text{ standard errors} \\ & =1500 + 2.045(128) \\ & \approx 1500 + 262 \\ & = 1762 \end{split} \end{equation} \]

Example: calculate CIs for campaign spending

95% confidence intervals for campaign donation amount using the normal distribution \[ \begin{equation} \begin{split} \text{Lower confidence interval}&=\bar{x} - 1.96 \text{ standard errors} \\ & =1500 - 1.96(128) \\ & \approx 1500 - 251 \\ & = 1249 \\ \text{Upper confidence interval}&=\bar{x} + 1.96 \text{ standard errors} \\ & =1500 + 1.96(128) \\ & \approx 1500 + 251 \\ & = 1751 \end{split} \end{equation} \]
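
The two intervals can be compared side by side. In this sketch the critical values 2.045 and 1.96 are copied from the slides rather than computed, and the standard error is left unrounded, so the endpoints may differ by about a dollar from the hand calculation that rounds it to 128.

```python
from math import sqrt

x_bar, s, n = 1500, 700, 30
t_crit = 2.045   # t critical value for 29 degrees of freedom, as quoted above
z_crit = 1.96    # normal critical value

se = s / sqrt(n)
ci_t = (x_bar - t_crit * se, x_bar + t_crit * se)
ci_z = (x_bar - z_crit * se, x_bar + z_crit * se)

print(round(se, 1))
print([round(v) for v in ci_t])
print([round(v) for v in ci_z])
```

The \(t\)-based interval is wider on both sides, which reflects the extra uncertainty from estimating the standard deviation with only 30 observations.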

Inference for Sample Proportions

Number of cases falling into one category of the variable divided by the number of cases in the sample
What percentage of Americans support the legalization of marijuana?
What percentage of Americans voted for Barack Obama in 2012?
What percentage of Americans prefer cats as opposed to dogs?
The same principles of statistical inference for population means also apply to population proportions

Example - Voter Turnout

Did you vote in the last election? 72 students answer yes, 28 answer no. \[ \begin{equation} \begin{split} \text{Sample proportion of voters}&=\frac{\text{Number Answering ``Yes''}}{\text{Sample size}} \\ & =\frac{72}{100} \\ & =.72 \end{split} \end{equation} \]

\[ \begin{equation} \begin{split} \text{Sample proportion of nonvoters}&=\frac{\text{Number Answering ``No''}}{\text{Sample size}} \\ & =\frac{28}{100} \\ & =.28 \end{split} \end{equation} \]

Example - Voter Turnout

What is the standard error of the observed sample statistic, \(.72\)? \[ \begin{equation} \begin{split} \text{Sample proportion of voters } p&=.72 \\ \text{Sample proportion of nonvoters } q&= 1-p=.28 \\ \text{Sample size } n&=100 \end{split} \end{equation} \]
General formula for calculating the random sampling error is \[ \begin{equation} \begin{split} \text{Random sampling error} &=\frac{\text{Variation component}}{\text{Sample size component}} \\ \text{Sample size component} &= \sqrt{n} \\ \text{Variation component} &= \sqrt{pq} \end{split} \end{equation} \]
SE of a sample proportion \(p\)

\[ \begin{equation} \text{Standard error}=\frac{\sqrt{pq}}{\sqrt{n}} \end{equation} \]
Plug in the numbers and we get \[ \begin{equation} \begin{split} \text{Standard error} &=\frac{\sqrt{(.72)(.28)}}{\sqrt{100}} \\ &= \frac{\sqrt{.20}}{\sqrt{100}} \\ &= \frac{.45}{10} \\ &= .045 \end{split} \end{equation} \]

Example - Voter Turnout

Sample proportions are normally distributed, so we use the same method to calculate 95% confidence intervals \[ \begin{equation} \begin{split} \text{Lower confidence interval} &=p - 1.96 \text{ standard errors} \\ & =.72 - 1.96(0.045) \\ & =.72 - 0.09 \\ & =.63 \\ \text{Upper confidence interval} &=p + 1.96 \text{ standard errors} \\ & =.72 + 1.96(0.045) \\ & =.72 + 0.09 \\ & =.81 \end{split} \end{equation} \]
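
The standard error and interval for the turnout example can be recomputed in a few lines. This is a minimal sketch with illustrative variable names, using the same 1.96 multiplier as above.

```python
from math import sqrt

yes, n = 72, 100
p = yes / n              # sample proportion of voters
q = 1 - p                # sample proportion of nonvoters

se = sqrt(p * q / n)     # identical to sqrt(p * q) / sqrt(n)
lower = p - 1.96 * se
upper = p + 1.96 * se

print(round(se, 3), round(lower, 2), round(upper, 2))
```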

Two sample means

Calculate the standard error of a difference in sample means \[ \begin{equation} \begin{split} \text{Standard error of a difference} &=\sqrt{{se_F}^2 + {se_M}^2} \\ &=\sqrt{1.00^2 + .98^2} \\ &=\sqrt{1.00 + .96} \\ &=\sqrt{1.96} \\ &=1.40 \end{split} \end{equation} \]

One and Two-Tailed Tests of Significance

Example - Gender and Democratic Party Ratings

Procedure for using the confidence interval approach to assess statistical significance
1. Multiply the standard error of the difference by 1.645
2. Subtract this number from the absolute value of the sample difference
3. If the result is greater than 0, then reject \(H_0\). If the result is not greater than 0, do not reject \(H_0\)
Difference in means for females and males \[ \begin{equation} \begin{split} \text{Lowest plausible difference in means} &= |\bar{x}_F - \bar{x}_M| - 1.645(se_{F-M}) \\ &=4.6 - 1.645(1.40) \\ &=2.30 \end{split} \end{equation} \]
Value is greater than 0, so we reject \(H_0\)
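
The three-step procedure translates directly into code. The sketch below takes the two standard errors and the observed difference of 4.6 from the slides; the variable names are illustrative.

```python
from math import sqrt

se_f, se_m = 1.00, 0.98   # standard errors of the female and male sample means
diff = 4.6                # observed difference in means

se_diff = sqrt(se_f ** 2 + se_m ** 2)            # standard error of the difference
lowest_plausible = abs(diff) - 1.645 * se_diff   # steps 1 and 2
reject_h0 = lowest_plausible > 0                 # step 3

print(round(se_diff, 2), round(lowest_plausible, 2), reject_h0)
```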

\(p\)-Values - the Formal Approach

Need to know three things to determine the exact probability of obtaining a given sample difference if the true population difference is 0 \[ \begin{equation} \begin{split} H_A &= \bar{x}_1 - \bar{x}_2 \\ H_0 &= 0 \\ se_{1-2} &=\sqrt{{se_1}^2 + {se_2}^2} \end{split} \end{equation} \]
Calculate the test statistic

\[ \begin{equation} \begin{split} \text{Test statistic} &= \frac{(H_A - H_0)}{se_{1-2}} \\ Z &= \frac{(H_A - H_0)}{se_{1-2}} \\ t &= \frac{(H_A - H_0)}{se_{1-2}} \end{split} \end{equation} \]

Revisit - Gender and Democratic Party Ratings

\(Z\) score for females and males \[ \begin{equation} \begin{split} Z &= \frac{(H_A - H_0)}{se_{1-2}} \\ &= \frac{(4.6-0)}{1.40} \\ &\approx 3.29 \end{split} \end{equation} \]
\(p\)-value for \(Z=3.29\) is .0005
\(t\) statistic for females and males \[ \begin{equation} \begin{split} t &= \frac{(H_A - H_0)}{se_{1-2}} \\ &= \frac{(4.6-0)}{1.40} \\ &\approx 3.29 \end{split} \end{equation} \]
\[ \begin{equation} \begin{split} \text{Degrees of freedom} &=\text{Sample size} - \text{Number of estimated parameters} \\ &=(625+553) - 2 \\ &=1176 \end{split} \end{equation} \]
\(p\)-value for \(t=3.29\) and \(df=1176\) is .0005
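
The test statistic and its tail probability can be computed without a table. This sketch uses the normal distribution from the Python standard library; with 1176 degrees of freedom the \(t\) distribution is nearly identical to it, and the standard library has no \(t\) distribution function, so the normal is used here as an approximation.

```python
from math import sqrt
from statistics import NormalDist

diff, se_f, se_m = 4.6, 1.00, 0.98

se_diff = sqrt(se_f ** 2 + se_m ** 2)
z = (diff - 0) / se_diff                  # (H_A - H_0) / se
p_one_tailed = 1 - NormalDist().cdf(z)    # probability of a difference this large or larger
p_two_tailed = 2 * p_one_tailed           # probability of a difference this large in either direction

print(round(z, 2), round(p_one_tailed, 4), round(p_two_tailed, 4))
```

The reported \(p\)-value of .0005 corresponds to the one-tailed probability.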

Comparing Two Sample Proportions

Similar to comparing two sample means \[ \begin{equation} \begin{split} \text{SE of the diff in props} &= \sqrt{\sqrt{\frac{p_{1}q_{1}}{n_1}}^2+\sqrt{\frac{p_{2}q_{2}}{n_2}}^2} \\ &= \sqrt{\frac{p_{1}q_{1}}{n_1}+\frac{p_{2}q_{2}}{n_2}} \end{split} \end{equation} \]
Everything else is the same

\[ \begin{equation} \begin{split} se_{1-2} &= \sqrt{\frac{p_{1}q_{1}}{n_1}+\frac{p_{2}q_{2}}{n_2}} \\ \min(p_{1} - p_{2}) &= |p_{1} - p_{2}| - 1.645(se_{1-2}) \end{split} \end{equation} \]

\[ \begin{equation} \begin{split} H_A &= p_{1} - p_{2} \\ H_0 &= 0 \\ se_{1-2} &=\sqrt{\frac{p_{1}q_{1}}{n_1}+\frac{p_{2}q_{2}}{n_2}} \end{split} \end{equation} \]

\[ \begin{equation} Z = \frac{(H_A - H_0)}{se_{1-2}} , t = \frac{(H_A - H_0)}{se_{1-2}} \end{equation} \]

Example: Kim Kardashian

CBS News/60 Minutes/Vanity Fair National Poll, September #2, 2012

When you hear the name Kim Kardashian (Kar-dash-ian), which of the following comes to mind first? 1. A self-made businesswoman, 2. A reality television star, 3. A perfume line, 4. Certain physical traits, or 5. A sex tape.

Example: Kim Kardashian

Who is more likely to associate Kim Kardashian with “sex tape”? Men or women?

	Frequency	Percent
A self-made businesswoman	74	6.715
Reality television star	562	50.998
A perfume line	13	1.180
Certain physical traits	76	6.897
A sex tape	176	15.971
DK/NA	201	18.240
Total	1102	100.000

Example: Kim Kardashian

	Sample proportion (\(p\))	Complement of sample proportion (\(q\))	Squared standard error \(\frac{pq}{n}\)
Male	.227	.773	.000441
	(406)
Female	.170	.830	.000289
	(495)
Difference in proportions	.057
Sum of squared standard errors			.00073
Standard error of the difference in proportions			.027

Calculate the \(Z\) score test statistic under the null hypothesis \[ \begin{equation} \begin{split} Z &= \frac{(H_A - H_0)}{se_{1-2}} \\ &= \frac{(.057-0)}{.027} \\ &= 2.11 \end{split} \end{equation} \]
\(p\)-value is .0174
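
The same calculation can be wrapped in a small function. This is a sketch with a hypothetical function name, `two_proportion_z`; it starts from the rounded proportions in the table and keeps full precision afterwards, so its \(Z\) score and \(p\)-value differ slightly from the hand calculation, which rounds the standard error to .027.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(p1, n1, p2, n2):
    """Return (difference, standard error, Z) under H0 of no difference."""
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, se, diff / se

# Men: p = .227, n = 406; women: p = .170, n = 495
diff, se, z = two_proportion_z(0.227, 406, 0.170, 495)
p_one_tailed = 1 - NormalDist().cdf(z)

print(round(diff, 3), round(se, 3), round(z, 2), round(p_one_tailed, 4))
```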

\(\chi^2\) Test of Significance

Mean and proportion comparisons between two variables
- What if our variables have more than two values?
\(\chi^2\) test of significance
Works whenever your independent and dependent variables are nominal or ordinal
Can handle more than two values for either/both variables

Theoretical Explanation: Abortion Attitudes

\(H_A\) - In a comparison of individuals, liberals are more likely to favor allowing a woman to obtain an abortion for any reason than conservatives
\(H_0\) - There is no difference in support between liberals and conservatives for allowing a woman to obtain an abortion for any reason. Any difference is the result of random sampling error.
Say the null hypothesis is correct - there are no differences between ideological groups and attitudes towards abortion. What would the table look like?

If null hypothesis is correct

Right to Abortion	Liberal	Moderate	Conservative	Total
Yes	40.8%	40.8%	40.8%	40.8%
	(206.45)	(289.68)	(271.32)	(768)
No	59.2%	59.2%	59.2%	59.2%
	(299.55)	(420.32)	(393.68)	(1113)
Total	26.9%	37.7%	35.4%	100%
	(506)	(710)	(665)	(1881)

Observed data

Right to Abortion	Liberal	Moderate	Conservative	Total
Yes	62.6%	36.6%	28.7%	40.8%
	(317)	(260)	(191)	(768)
No	37.4%	63.4%	71.3%	59.2%
	(189)	(450)	(474)	(1113)
Total	26.9%	37.7%	35.4%	100%
	(506)	(710)	(665)	(1881)

\(\chi^2\) - based on the difference between
1. Observed frequency
2. Expected frequency
Distribution looks different from other distributions we’ve seen before, but is interpreted in the same way

\(\chi^2\) distribution

\(\chi^2\) Test of Significance

Right to Abortion		Liberal	Moderate	Conservative
Yes	Observed Frequency (\(f_o\))	317.0	260.0	191.0
	Expected Frequency (\(f_e\))	206.6	289.9	271.5
	\(f_o - f_e\)	110.4	-29.9	-80.5
	\((f_o - f_e)^2\)	12188.9	893.3	6482.7
	\(\frac{(f_o - f_e)^2}{f_e}\)	59.0	3.1	23.9
No	Observed Frequency (\(f_o\))	189.0	450.0	474.0
	Expected Frequency (\(f_e\))	299.4	420.1	393.5
	\(f_o - f_e\)	-110.4	29.9	80.5
	\((f_o - f_e)^2\)	12188.9	893.3	6482.7
	\(\frac{(f_o - f_e)^2}{f_e}\)	40.7	2.1	16.5

Calculating test statistic
- \(\chi^2=\sum{\frac{(f_o - f_e)^2}{f_e}}=145.27\)
- \(\text{Degrees of freedom} = (\text{number of rows}-1)(\text{number of columns}-1)=(2-1)(3-1)=2\)
Interpretation
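
The test statistic can be rebuilt from the observed counts alone. The sketch below derives the expected frequencies from the row and column totals and sums the six cell contributions. The \(p\)-value line relies on the fact that, for exactly 2 degrees of freedom, the upper-tail probability of the \(\chi^2\) distribution is \(e^{-\chi^2/2}\); it would not be valid for other tables.

```python
from math import exp

observed = [
    [317, 260, 191],   # Yes: liberal, moderate, conservative
    [189, 450, 474],   # No:  liberal, moderate, conservative
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, f_o in enumerate(row):
        f_e = row_totals[i] * col_totals[j] / n   # expected count if H0 is correct
        chi2 += (f_o - f_e) ** 2 / f_e

df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = exp(-chi2 / 2)   # exact only because df == 2

print(round(chi2, 2), df, p_value)
```

A \(p\)-value this small means differences of this size would almost never arise from random sampling error alone, so \(H_0\) is rejected.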

Why hypothesis testing is not enough

Statistical significance \(\neq\) substantive significance
A relationship can be statistically significant, even if in reality the actual difference is small or meaningless
“Statistical significance is usually the main thing you want to know about a relationship”
Measures of association extend hypothesis testing: beyond asking whether the relationship in the sample is representative of the population, they ask how strong the relationship is

Theory

Statistical significance is based in part on sample size \[ \begin{equation} \begin{split} H_A &= p_{1} - p_{2} \\ H_0 &= 0 \\ se_p &=\frac{\sqrt{pq}}{\sqrt{n}} \\ se_{1-2} &=\sqrt{\frac{p_{1}q_{1}}{n_1}+\frac{p_{2}q_{2}}{n_2}} \\ Z &= \frac{(H_A - H_0)}{se_{1-2}} \end{split} \end{equation} \]
As sample size increases, standard error decreases and the \(p\)-value decreases
Easier to meet the .05 threshold
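
To illustrate the point, the sketch below holds a small difference in proportions fixed and varies only the sample size. The proportions .52 and .50 and the function name are hypothetical, chosen purely for illustration.

```python
from math import sqrt

def z_for_difference(p1, p2, n_per_group):
    """Z score for a difference in proportions with equal group sizes."""
    se = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    return (p1 - p2) / se

# Hypothetical two-point difference, evaluated at increasing sample sizes
for n in (100, 1000, 10000, 100000):
    print(n, round(z_for_difference(0.52, 0.50, n), 2))
```

The same two-point gap is far from significant with 100 cases per group and clearly significant with 10,000, even though its substantive size has not changed.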

Proportional Reduction in Error

How much better can you predict the dependent variable by knowing the independent variable than by not knowing the independent variable

Basic Example - Predicting NCAA Tournament Wins

Two possible outcomes
- Win
- Loss
What is the best strategy for predicting the outcome for any randomly selected game?
- Always predict a team will win (or lose)
- Guaranteed 50% accuracy
Now let’s say you also know another fact - whether the team is the higher seed or lower seed
What is the best strategy for predicting the outcome for any randomly selected game?
- If the team is the higher seed, predict it will win
- If the team is the lower seed, predict it will lose

Basic Example - Predicting NCAA Tournament Wins

Expected outcomes

Outcome	Higher Seed	Lower Seed	Total
Win	312	0	312
Loss	0	312	312
Total	312	312	624

Actual outcomes from NCAA tournament games (2010-14)

Outcome	Higher Seed	Lower Seed	Total
Win	214	98	312
Loss	98	214	312
Total	312	312	624
Correct predictions = 428
Incorrect predictions = 196
Accuracy rate = 68.6%

Lambda

Measures the strength of a relationship between two categorical variables, at least one of which is nominal
Based upon two values \[ \begin{equation} \begin{split} E_1 &= \text{Prediction error without knowledge of the independent variable} \\ E_2 &= \text{Prediction error with knowledge of the independent variable} \\ \lambda &= \frac{(E_1 - E_2)}{E_1} \end{split} \end{equation} \]

Lambda

Calculation for NCAA tournament prediction \[ \begin{equation} \begin{split} E_1 &= 312 \\ E_2 &= 196 \\ \lambda &= \frac{(312-196)}{312} \\ &= .37 \end{split} \end{equation} \]
Interpreting lambda \[ \begin{equation} \begin{split} \text{Weak} &= \lambda \leq .1 \\ \text{Moderate} &= .1 < \lambda \leq .2 \\ \text{Moderately Strong} &= .2 < \lambda \leq .3 \\ \text{Strong} &= \lambda > .3 \end{split} \end{equation} \]
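
The calculation and the interpretation cutoffs can be combined in a short sketch. Both function names are hypothetical. The labelling function works on the absolute value so that the same cutoffs also cover negative measures of association.

```python
def lambda_pre(e1, e2):
    """Proportional reduction in error: (E1 - E2) / E1."""
    return (e1 - e2) / e1

def strength(value):
    """Label a measure of association using the cutoffs listed above."""
    size = abs(value)
    if size <= 0.1:
        return "Weak"
    if size <= 0.2:
        return "Moderate"
    if size <= 0.3:
        return "Moderately Strong"
    return "Strong"

lam = lambda_pre(312, 196)   # NCAA example: errors without and with seed information
print(round(lam, 2), strength(lam))
```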

Example - Abortion

======================================================
             gss$polviews
gss$abany    Liberal   Moderate   Conservative   Total
------------------------------------------------------
Yes              154        116             93     363
              62.857     34.627         29.245
------------------------------------------------------
No                91        219            225     535
              37.143     65.373         70.755
------------------------------------------------------
Total            245        335            318     898
              27.283     37.305         35.412
======================================================

Statistics for All Table Factors

Pearson's Chi-squared test
------------------------------------------------------------
Chi^2 = 72.37185      d.f. = 2      p = 1.925987e-16

        Minimum expected frequency: 99.03675

Strategy if you don’t know an individual’s political views - “No”
Strategy if you do know an individual’s political views
- Liberals - Yes
- Moderates - No
- Conservatives - No
Calculate \(\lambda\) \[ \begin{equation} \begin{split} E_1 &= 363 \\ E_2 &= 91+116+93=300 \\ \lambda &= \frac{(363-300)}{363} \\ &= .17 \end{split} \end{equation} \]
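
The two prediction strategies can also be derived mechanically from a cross-tabulation. In this sketch the function name `lambda_from_table` is hypothetical; each inner list holds the counts of the dependent variable for one category of the independent variable.

```python
def lambda_from_table(columns):
    """Return (E1, E2, lambda) for a table given as one list of counts per column."""
    n_rows = len(columns[0])
    row_totals = [sum(col[i] for col in columns) for i in range(n_rows)]
    n = sum(row_totals)
    # E1: errors made by always guessing the overall modal outcome
    e1 = n - max(row_totals)
    # E2: errors made by guessing the modal outcome within each column
    e2 = sum(sum(col) - max(col) for col in columns)
    return e1, e2, (e1 - e2) / e1

# Counts are [Yes, No] for liberals, moderates, conservatives
abortion = [[154, 91], [116, 219], [93, 225]]
# Counts are [Win, Loss] for higher and lower seeds
ncaa = [[214, 98], [98, 214]]

print(lambda_from_table(abortion))
print(lambda_from_table(ncaa))
```

The function reproduces the hand-computed error counts for both the abortion table and the NCAA table.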

Cramer’s V

Sometimes \(\lambda\) will be zero, even if a relationship exists in the data
This will happen when the within-category modes are the same as the overall mode

Cramer’s V

Yet \(\lambda\) is 0 for both races
- Whites \[ \begin{equation} \begin{split} E_1 &= 389 \\ E_2 &= 198+191=389 \\ \lambda &= \frac{(389-389)}{389} \\ &= 0 \end{split} \end{equation} \]
- Blacks \[ \begin{equation} \begin{split} E_1 &= 68 \\ E_2 &= 35+33=68 \\ \lambda &= \frac{(68-68)}{68} \\ &= 0 \end{split} \end{equation} \]
In this situation, a better measure is Cramer’s V
- Based on the \(\chi^2\) test statistic and sample size
- Value of 0 means no relationship
- Value of 1 means perfect relationship
Interpreting Cramer’s V \[ \begin{equation} \begin{split} \text{Weak} &= V \leq .1 \\ \text{Moderate} &= .1 < V \leq .2 \\ \text{Moderately Strong} &= .2 < V \leq .3 \\ \text{Strong} &= V > .3 \end{split} \end{equation} \]
For blacks: \(V=.192\)
For whites: \(V=.067\)
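
The slides do not give the formula for Cramer’s V. The sketch below assumes the standard textbook definition, \(V = \sqrt{\chi^2 / (n \cdot \min(r-1, c-1))}\), and applies it to the abortion by political views table shown earlier (\(\chi^2 = 72.37185\), \(n = 898\)) purely as an illustration. The function name is hypothetical.

```python
from math import sqrt

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V from a chi-squared statistic, sample size, and table dimensions."""
    return sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))

# Abortion attitudes (2 rows) by political views (3 columns)
v = cramers_v(72.37185, 898, 2, 3)
print(round(v, 3))
```

Under the cutoffs above, this value would count as a moderately strong relationship.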

Somers’ \(d_{yx}\)

Appropriate for examining the relationship between two ordinal variables
Accounts for the directionality of the variables
Interpreted the same way as \(\lambda\) \[ \begin{equation} \begin{split} \text{Weak} &= d_{yx} \leq .1 \\ \text{Moderate} &= .1 < d_{yx} \leq .2 \\ \text{Moderately Strong} &= .2 < d_{yx} \leq .3 \\ \text{Strong} &= d_{yx} > .3 \end{split} \end{equation} \]

\[ \begin{equation} \begin{split} \text{Weak} &= d_{yx} \geq -.1 \\ \text{Moderate} &= -.2 \leq d_{yx} < -.1 \\ \text{Moderately Strong} &= -.3 \leq d_{yx} < -.2 \\ \text{Strong} &= d_{yx} < -.3 \end{split} \end{equation} \]