Probability

Computational Mathematics and Statistics Camp University of Chicago September 2018

Model of probability

Sample space - set of all things that could happen
Events - subsets of the sample space
Probability - chance of an event

Sample space

Set of all things that can occur
All distinct outcomes are collected into the set \(S\)

House of Representatives - elections every 2 years
- One incumbent: \(S = \{W, N\}\)
- Two incumbents: \(S = \{(W,W), (W,N), (N,W), (N,N)\}\)
- 435 incumbents: \(|S| = 2^{435}\) possible outcomes
Number of countries signing treaties
- \(S = \{0, 1, 2, \ldots, 194\}\)
Duration of cabinets
- All non-negative real numbers: \([0, \infty)\)
- \(S = \{x : 0 \leq x < \infty\}\)

The sample space must define all possible realizations
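A small sketch of building the congressional sample spaces by enumeration. Representing each outcome as a tuple of "W"/"N" labels (one entry per incumbent) is an assumption made for illustration; the helper name `sample_space` is hypothetical.

```python
from itertools import product

def sample_space(n_incumbents):
    """All win (W) / not win (N) outcomes for n incumbents, as tuples."""
    return set(product("WN", repeat=n_incumbents))

S1 = sample_space(1)   # one incumbent: 2 outcomes
S2 = sample_space(2)   # two incumbents: the 4 outcomes listed above
print(sorted(S2))
print(len(S2))

# Listing all 2**435 outcomes is infeasible, but counting them is easy.
print(len(str(2 ** 435)))  # number of decimal digits in |S|
```

Enumeration only works for tiny spaces; for 435 incumbents we reason about the count \(2^{435}\) rather than the list.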

Events

Subset of the sample space \(E \subset S\)
Congressional election example
- One incumbent
  - \(E = \{W\}\)
  - \(F = \{N\}\)
- Two incumbents
  - \(E = \{(W, N), (W, W) \}\)
  - \(F = \{(N, N)\}\)
- 435 incumbents
  - Outcome of 2016 election - one event
  - All outcomes where Dems retake control of the House - one event
\(x\) is an element of a set \(E\) \[x \in E\] \[(W, N) \in E\]

Event operations

\(E\) is a set
Perform operations on sets to create new sets
- \(E = \{ (W,W), (W,N) \}\)
- \(F = \{ (N, N), (W,N) \}\)
- \(S = \{(W,W), (W,N), (N,W), (N,N) \}\)
Operations determine what lies in the new set \(E^{\text{new}}\)

Union: \(\cup\)
- All objects that appear in either set (OR)
- \(E^{\text{new}} = E \cup F = \{(W,W), (W,N), (N,N) \}\)
Intersection: \(\cap\)
- All objects that appear in both sets (AND)
- \(E^{\text{new}} = E \cap F = \{(W,N)\}\)
Complement of set \(E\): \(E^{c}\)
- All objects in \(S\) that are not in \(E\)
- \(E^{c} = \{(N, W) , (N, N) \}\)
- \(F^{c} = \{(N, W) , (W, W) \}\)
- What is \(S^{c}\)? The empty set, \(\emptyset\)
- Suppose \(E = \{W\}\), \(F = \{N\}\). Then \(E \cap F = \emptyset\)
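Python's built-in sets support these operations directly, so the two-incumbent example can be checked in a few lines. This is a sketch; encoding outcomes as tuples is an assumption, and complement is computed relative to \(S\) as set difference.

```python
# Two-incumbent example: each outcome is a tuple (incumbent 1, incumbent 2).
S = {("W", "W"), ("W", "N"), ("N", "W"), ("N", "N")}
E = {("W", "W"), ("W", "N")}
F = {("N", "N"), ("W", "N")}

union = E | F          # E union F: in either set (OR)
intersection = E & F   # E intersect F: in both sets (AND)
E_c = S - E            # complement of E: in S but not in E
F_c = S - F
S_c = S - S            # complement of S: the empty set

# One-incumbent example: {W} and {N} share no outcomes.
disjoint = {"W"} & {"N"}

print(union, intersection, E_c, F_c, S_c, disjoint)
```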

Probability

Probability is the chance of an event occurring
\(P\) is a function
The domain contains all events \(E\)

Three axioms

For all events \(E\), \(0 \leq P(E) \leq 1\)
\(P(S) = 1\)
For all sequences of mutually exclusive events \(E_{1}, E_{2}, \ldots,E_{N}\) (where \(N\) can go to infinity):

\[P\left(\cup_{i=1}^{N} E_{i} \right) = \sum_{i=1}^{N} P(E_{i} )\]

Basic examples

Suppose we are flipping a fair coin
- \(P(H) = P(T) = 1/2\)
Suppose we are rolling a six-sided die
- \(P(1) = 1/6\)
Suppose we are flipping a pair of fair coins
- \(P(H, H) = 1/4\)

Basic examples

One candidate example
- \(P(W)\): probability incumbent wins
- \(P(N)\): probability incumbent loses (does not win)
Two candidate example
- \(P(\{(W,W)\})\): probability both incumbents win
- \(P(\{(W,W), (W,N)\})\): probability incumbent \(1\) wins
Full House example:
- \(P( \{ \text{All Democrats Win}\} )\)
We’ll use data to infer these things

Birthday problem

Probabilistic thinking is sometimes difficult, but enlightening
Suppose we have a room full of \(N\) people. What is the probability at least 2 people have the same birthday?
Counting February 29, there are 366 possible birthdays, so \(N = 367\) guarantees at least two people with the same birthday
What about \(N < 367\)?

Birthday problem
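For \(N < 367\), it is easier to work with the complement: \(P(\text{at least one shared birthday}) = 1 - P(\text{all } N \text{ birthdays distinct})\), and the \(i\)-th person avoids the previous \(i\) birthdays with probability \((\text{days} - i)/\text{days}\). A minimal sketch, assuming birthdays are independent and uniform over `days` days (365 by default, which ignores February 29); the function name is hypothetical.

```python
def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), assuming independent
    birthdays that are uniform over `days` days."""
    if n > days:
        return 1.0  # pigeonhole: a shared birthday is guaranteed
    p_all_distinct = 1.0
    for i in range(n):
        # person i+1 must avoid the i birthdays already taken
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct

for n in (10, 23, 50):
    print(n, round(p_shared_birthday(n), 3))
```

Under these assumptions the probability passes one half at just \(N = 23\), far below the 367 needed for a guarantee.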

E-Harmony problem

eHarmony matches you based on compatibility in the most important areas of life – like values, character, intellect, sense of humor, and 25 other dimensions

E-Harmony problem

Suppose (for example) 29 dimensions are binary (0,1)
Suppose dimensions are independent
On each dimension, \(\Pr(\text{2 people agree}) = 0.5\)

\[ \begin{eqnarray} \text{Pr(Exact)} & = & \text{Pr(Agree)}_{1} \times \text{Pr(Agree)}_{2}\times \ldots \times \text{Pr(Agree)}_{29} \\ & = & 0.5 \times 0.5 \times \ldots \times 0.5 \\ & = & 0.5^{29} \\ & \approx & 1.86 \times 10^{-9} \end{eqnarray} \]
1 in 536,870,912 people
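The same product can be computed exactly with rational arithmetic. This sketch keeps the slide's assumptions (29 independent binary dimensions, agreement probability 0.5 on each).

```python
from fractions import Fraction

n_dims = 29
p_agree = Fraction(1, 2)       # assumed per-dimension agreement probability
p_exact = p_agree ** n_dims    # independence: multiply the 29 probabilities

print(p_exact)                 # exact value as a fraction
print(float(p_exact))          # decimal approximation
```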

Conditional probability

Social scientists almost always examine conditional relationships
- Given opposite party ID, probability of a date
- Given low interest rates, probability of high inflation
- Given “economic anxiety,” probability of voting for Trump
Intuition
- Some event has occurred: an outcome was realized
- We know that this outcome has already happened
- What is the probability that something in another set happens?

Conditional probability

Suppose we have two events, \(E\) and \(F\), and that \(P(F)>0\). Then,

\[ \begin{eqnarray} P(E|F) & = & \frac{P(E\cap F ) } {P(F) } \end{eqnarray} \]
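When every outcome is equally likely, the definition reduces to counting: \(P(E|F) = |E \cap F| / |F|\). A sketch using the fair six-sided die from the basic examples; the events \(E = \{1, 2\}\) ("roll at most 2") and \(F = \{2, 4, 6\}\) ("roll is even") are chosen here for illustration, and the helper names are hypothetical.

```python
from fractions import Fraction

def prob(event, space):
    """P(event), assuming every outcome in `space` is equally likely."""
    return Fraction(len(event & space), len(space))

def cond_prob(E, F, space):
    """P(E | F) = P(E and F) / P(F); only defined when P(F) > 0."""
    p_F = prob(F, space)
    if p_F == 0:
        raise ValueError("P(F) must be positive")
    return prob(E & F, space) / p_F

die = set(range(1, 7))
F = {2, 4, 6}   # roll is even
E = {1, 2}      # roll is at most 2

print(cond_prob(E, F, die))  # (1/6) / (1/2) = 1/3
```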

Examples

Example 1
- \(F = \{\text{All Democrats Win} \}\)
- \(E = \{\text{Nancy Pelosi Wins (D-CA)} \}\)
- If \(F\) occurs then \(E\) must occur, \(P(E|F) = 1\)
Example 2
- \(F = \{\text{All Democrats Win} \}\)
- \(E = \{ \text{Ted Cruz Wins (R-TX)} \}\)
- \(F \cap E = \emptyset \Rightarrow P(E|F) = \frac{P(F \cap E) }{P(F)} = \frac{P(\emptyset)}{P(F)} = 0\)
Example 3: incumbency advantage
- \(I = \{ \text{Candidate is an incumbent} \}\)
- \(D = \{ \text{Candidate Defeated} \}\)
- \(P(D|I) = \frac{P(D \cap I)}{P(I) }\)

Difference between \(P(A|B)\) and \(P(B|A)\)

\[ \begin{eqnarray} P(A|B) & = & \frac{P(A\cap B)}{P(B)} \\ P(B|A) & = & \frac{P(A \cap B) } {P(A)} \end{eqnarray} \]

Type of person who attends football games
- \(P(\text{Attending a football game}| \text{Drunk}) = 0.01\)
- \(P(\text{Drunk}| \text{Attending a football game}) \approx 1\)

Law of total probability

Suppose that we have a set of events \(F_{1}, F_{2}, \ldots, F_{N}\) that are mutually exclusive and together comprise the entire sample space, \(\cup_{i=1}^{N} F_{i} = S\)
Then, for any event \(E\)

\[ \begin{eqnarray} P(E) & = & \sum_{i=1}^{N} P(E | F_{i} ) \times P(F_{i}) \end{eqnarray} \]

Example

Infer \(P(\text{vote})\) after mobilization campaign
- \(P(\text{vote}|\text{mobilized} ) = 0.75\)
- \(P(\text{vote}| \text{not mobilized} ) = 0.25\)
- \(P(\text{mobilized}) = 0.6 ; P(\text{not mobilized} ) = 0.4\)
- What is \(P(\text{vote})\)?
Sample space (one person) = \(\{\) (mobilized, vote), (mobilized, not vote), (not mobilized, vote) , (not mobilized, not vote) \(\}\)
- Mobilization partitions the space (mutually exclusive and exhaustive)
- We can use the law of total probability \[ \begin{eqnarray} P(\text{vote} ) & = & P(\text{vote}| \text{mob.} ) \times P(\text{mob.} ) + P(\text{vote} | \text{not mob} ) \times P(\text{not mob}) \\ & = & 0.75 \times 0.6 + 0.25 \times 0.4 \\ & = & 0.55 \end{eqnarray} \]
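The law of total probability is a weighted sum, so the mobilization example fits in a short function. A sketch; `total_probability` is a hypothetical helper, and the probabilities are the ones given on the slide.

```python
def total_probability(cond_probs, partition_probs):
    """P(E) = sum_i P(E | F_i) * P(F_i) for a partition F_1, ..., F_N."""
    if abs(sum(partition_probs) - 1.0) > 1e-9:
        raise ValueError("partition probabilities must sum to 1")
    return sum(c * p for c, p in zip(cond_probs, partition_probs))

# Partition order: [mobilized, not mobilized]
p_vote = total_probability([0.75, 0.25], [0.6, 0.4])
print(round(p_vote, 2))  # 0.55
```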

Bayes’ Rule

\(P(B|A)\) may be easy to obtain
\(P(A|B)\) may be harder to determine
A method to move from \(P(B|A)\) to \(P(A|B)\)
Bayes’ Rule: For two events \(A\) and \(B\),

\[ \begin{eqnarray} P(A|B) & = & \frac{P(A)\times P(B|A)}{P(B)} \end{eqnarray} \]

Intuition behind Bayes’ Rule

\[ \begin{eqnarray} P(A|B) & = & \frac{P(A \cap B) }{P(B) } \\ & = & \frac{P(B|A)P(A) } {P(B) } \end{eqnarray} \]

Identifying racial groups from lists of names

\(P(\text{black}) = 0.126\)
\(P(\text{not black}) = 1 - P(\text{black}) = 0.874\)
\(P(\text{Washington} | \text{black}) = 0.00378\)
\(P(\text{Washington} | \text{not black}) = 0.000060615\)
\(P(\text{black} | \text{Washington})\) = ???

\[ \begin{eqnarray} P(\text{black}|\text{Wash} ) & = & \frac{P(\text{black}) P(\text{Wash}| \text{black}) }{P(\text{Wash} ) } \\ & = & \frac{P(\text{black}) P(\text{Wash}| \text{black}) }{P(\text{black})P(\text{Wash}|\text{black}) + P(\text{nb})P(\text{Wash}| \text{nb}) } \\ & = & \frac{0.126 \times 0.00378}{0.126\times 0.00378 + 0.874 \times 0.000060615} \\ & \approx & 0.9 \end{eqnarray} \]
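The calculation above is Bayes' Rule with the denominator expanded by the law of total probability over {black, not black}. A sketch with a hypothetical helper name, using the slide's numbers.

```python
def bayes_binary(prior, lik_given_a, lik_given_not_a):
    """P(A | B) for a binary A, with P(B) expanded by total probability."""
    numerator = prior * lik_given_a                      # P(A) P(B | A)
    p_b = numerator + (1 - prior) * lik_given_not_a      # P(B)
    return numerator / p_b

p = bayes_binary(prior=0.126, lik_given_a=0.00378, lik_given_not_a=0.000060615)
print(round(p, 3))
```

The surname moves the probability from a prior of 0.126 to roughly 0.9 because "Washington" is about 60 times more common among black individuals in these figures.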

The Monty Hall problem

You blew it, and you blew it big! Since you seem to have difficulty grasping the basic principle at work here, I’ll explain. After the host reveals a goat, you now have a one-in-two chance of being correct. Whether you change your selection or not, the odds are the same. There is enough mathematical illiteracy in this country, and we don’t need the world’s highest IQ propagating more. Shame! – Scott Smith, Ph.D. University of Florida (From Wikipedia)

The Monty Hall problem

Suppose we have three doors: \(A, B, C\)
Behind one door there is a car. Behind each of the other two is a goat
- A contestant guesses a door
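- The host, who knows where the car is, opens a different door that hides a goat (choosing at random if both unchosen doors hide goats); the contestant then has the option to switch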
- Should the contestant switch?
Contestant guesses \(A\)
\(P(A) = 1/3 \leadsto\) chance of winning without switch
If \(C\) is revealed to not have a car:

\[ \begin{eqnarray} P(B| C \text{ revealed} ) & = & \frac{P(B)P(C \text{ revealed} | B)}{P(B)P(C \text{ revealed} | B) + P(A) P(C \text{ revealed} | A) } \\ & = & \frac{1/3 \times 1}{1/3 \times 1 + 1/3 \times 1/2 } = \frac{1/3}{1/2} = \frac{2}{3} \\ P(A| C \text{ revealed} ) & = & \frac{P(A) P(C \text{ revealed} | A)}{ P(B)P(C \text{ revealed} | B) + P(A) P(C \text{ revealed} | A) } \\ & = & \frac{1/3 \times 1/2}{1/3 \times 1 + 1/3 \times 1/2} = \frac{1}{3} \end{eqnarray} \]
Switching doubles the chance of winning, from \(1/3\) to \(2/3\)
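A Monte Carlo check of the 1/3 vs. 2/3 result. This is a sketch that assumes the host behaves as the calculation requires: he never opens the contestant's door or the car door, and picks at random when two goat doors are available. The contestant's first guess is random here rather than fixed at \(A\); by symmetry the win rates are the same.

```python
import random

def monty_hall(n_trials, switch, seed=0):
    """Fraction of games won over n_trials under the assumed host rules."""
    rng = random.Random(seed)
    doors = ["A", "B", "C"]
    wins = 0
    for _ in range(n_trials):
        car = rng.choice(doors)
        guess = rng.choice(doors)
        # Host opens a goat door the contestant did not pick.
        opened = rng.choice([d for d in doors if d != guess and d != car])
        if switch:
            # Move to the one remaining unopened door.
            guess = next(d for d in doors if d != guess and d != opened)
        wins += guess == car
    return wins / n_trials

print(monty_hall(100_000, switch=False))  # close to 1/3
print(monty_hall(100_000, switch=True))   # close to 2/3
```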

Independence of probabilities

Independence: Two events \(E\) and \(F\) are independent if

\[ \begin{eqnarray} P(E\cap F ) & = & P(E)P(F) \end{eqnarray} \]
Independence is symmetric: if \(E\) is independent of \(F\), then \(F\) is independent of \(E\)

Independence of probabilities

Flip a fair coin twice
- \(E = \text{first flip heads}\)
- \(F = \text{second flip heads}\)
  
  \[ \begin{eqnarray} P(E \cap F ) & = & P( \{ (H, H) , (H, T) \} \cap \{ (H, H), (T, H) \} ) \\ & =& P( \{(H, H)\} ) \\ & = & \frac{1}{4} \\ P(E ) & = & \frac{1} {2} \\ P(F) & = & \frac{1}{2} \\ P(E)P(F) & =& \frac{1}{2} \times \frac{1}{2} = \frac{1}{4} =P(E \cap F ) \end{eqnarray} \]
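The same check by enumeration: list the four equally likely outcomes of two fair flips and compare \(P(E \cap F)\) with \(P(E)P(F)\). A sketch; encoding outcomes as tuples is an assumption.

```python
from fractions import Fraction
from itertools import product

S = set(product("HT", repeat=2))   # four equally likely outcomes


def P(event):
    """Probability under equally likely outcomes."""
    return Fraction(len(event), len(S))


E = {s for s in S if s[0] == "H"}  # first flip heads
F = {s for s in S if s[1] == "H"}  # second flip heads

print(P(E & F), P(E) * P(F))
print(P(E & F) == P(E) * P(F))     # independence holds
```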

Independence of probabilities

Suppose \(E\) and \(F\) are independent. Then,

\[ \begin{eqnarray} P(E|F ) & = & \frac{P(E \cap F) }{P(F) } \\ & = & \frac{P(E)P(F)}{P(F)} \\ & = & P(E) \end{eqnarray} \]
- Conditioning on the event \(F\) does not modify the probability of \(E\)
- \(F\) carries no information about \(E\)

Independence and Causal Inference

Selection and Observational Studies
- We often want to infer the effect of some treatment
  - Incumbency on vote return
  - College education and job earnings
- Observational studies: make inferences from what we observe, without assigning treatment ourselves
- Problem: units select into treatment
  - Simple example: enroll in job training if I think it will help
  - P(job\(|\)training in study) \(\neq\) P(job\(|\)forced training)
- Background characteristics differ between treatment and control groups
Experiments: random assignment makes background characteristics and treatment status independent

Random variables

A random process or variable with a numerical outcome
A random variable \(X\) is a function on the sample space \[ \begin{eqnarray} X:\text{Sample Space} \rightarrow \mathbb{R} \end{eqnarray} \]
- Number of incumbents who win
- An indicator whether a country defaults on a loan (1 if a default, 0 otherwise)
- Number of casualties in a war (rather than all possible outcomes)

Examples of random variables

Suppose we have \(3\) units and flip a fair coin (probability \(\frac{1}{2}\)) to assign each unit
Assign to \(T=\)treatment or \(C=\)control
\(X\) = number of units that received treatment
Defining the function

\[ \begin{equation} X = \left \{ \begin{array} {ll} 0 \text{ if } (C, C, C) \\ 1 \text{ if } (T, C, C) \text{ or } (C, T, C) \text{ or } (C, C, T) \\ 2 \text{ if } (T, T, C) \text{ or } (T, C, T) \text{ or } (C, T, T) \\ 3 \text{ if } (T, T, T) \end{array} \right. \end{equation} \]

Examples of random variables

In other words:

\[ \begin{eqnarray} X( (C, C, C) ) & = & 0 \\ X( (T, C, C)) & = & 1 \\ X((T, C, T)) & = & 2 \\ X((T, T, T)) & = & 3 \end{eqnarray} \]
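Since \(X\) is just a function on the sample space, it can be written as one. Because the 8 assignments are equally likely under fair coin flips, counting how many outcomes map to each value also gives \(P(X = x)\). A sketch; the tuple encoding of outcomes is an assumption.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: every T/C assignment of 3 units (8 equally likely outcomes).
S = list(product("TC", repeat=3))


def X(outcome):
    """Random variable: number of treated units in an outcome."""
    return outcome.count("T")


# P(X = x) = (number of outcomes mapped to x) / 8
counts = Counter(X(s) for s in S)
pmf = {x: Fraction(c, len(S)) for x, c in sorted(counts.items())}
print(pmf)
```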

Examples of random variables

\(X\) = number of calls into a congressional office in some period \(p\)
- \(X(c) = c\)
Outcome of Election
- Define \(v\) as the proportion of vote the candidate receives
- Define \(X = 1\) if \(v>0.50\)
- Define \(X = 0\) if \(v \leq 0.50\)
- For example, if \(v = 0.48\), then \(X(v) = 0\)
How do we compute \(P(X=1)\), \(P(X=0)\), etc.? Come back tomorrow

Expectation

What can we expect from a trial? What is the expected outcome?
The value of the random variable for each outcome, weighted by the probability of observing that outcome, where \(p(x) = P(X = x)\)

\[ \begin{eqnarray} E[X] & = & \sum_{x:p(x)>0} x p(x) \end{eqnarray} \]

Example of expected value

Suppose again \(X\) is the number of units assigned to treatment, as in our previous example.
What is \(E[X]\)?

\[ \begin{eqnarray} E[X] & = & 0\times \frac{1}{8} + 1 \times \frac{3}{8} + 2 \times \frac{3}{8} + 3 \times \frac{1}{8} \\ & = & 1.5 \end{eqnarray} \]
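The expectation formula translates directly into a sum over the distribution of \(X\). A sketch; the optional argument `g` (hypothetical) lets the same helper compute \(E[g(X)]\), such as \(E[X^2]\).

```python
from fractions import Fraction

# P(X = x) for the number of treated units among 3 fair-coin assignments
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum over x with p(x) > 0 of g(x) * p(x)."""
    return sum(g(x) * p for x, p in pmf.items() if p > 0)

print(expectation(pmf))  # 3/2, i.e. 1.5
```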
Measure of central tendency

Variance

Measure of spread
For each value, we might measure distance from center
- Squared Euclidean distance: \(d(x, E[X])^{2} = (x - E[X])^2\)
Then, we might take weighted average of these distances,

\[ \begin{eqnarray} E[(X - E[X])^2] & = & \sum_{x:p(x)>0} (x - E[X])^2p(x) \\ & = & \sum_{x:p(x)>0} \left(x^2 p(x)\right) - 2 E[X]\sum_{x:p(x)>0} \left(x p(x)\right) \\ & \quad & + E[X]^2\sum_{x:p(x)>0} p(x) \\ & = & E[X^2] - 2E[X]^2 + E[X]^2 \\ & = & E[X^2] - E[X]^2 \\ & = & \text{Var}(X) \end{eqnarray} \]

Variance

The variance of a random variable \(X\), var\((X)\), is

\[ \begin{eqnarray} \text{var}(X) & = & E[(X - E[X])^2] \\ & = & E[X^2] - E[X]^2 \end{eqnarray} \]
We will define the standard deviation of \(X\), sd\((X) = \sqrt{\text{var}(X)}\)
var\((X) \geq 0\)

Example of variance

Three person experiment: \(P(T) = P(C) = 1/2\)
What is Var(\(X\))?
We have two components to our variance calculation:

\[ \begin{eqnarray} E[X^2] & = & 0^2\times \frac{1}{8} + 1^2 \times \frac{3}{8} + 2^2 \times \frac{3}{8} + 3^2 \times \frac{1}{8} = 3 \\ E[X]^2 & = & 1.5^2 = 2.25 \\ \text{Var}(X) & = & E[X^2] - E[X]^2 \\ & = & 3 - 2.25 = 0.75 \end{eqnarray} \]
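Both forms of the variance, the definition \(E[(X - E[X])^2]\) and the shortcut \(E[X^2] - E[X]^2\), can be computed from the same distribution to confirm they agree. A sketch using exact fractions.

```python
import math
from fractions import Fraction

# P(X = x) for the three-person experiment
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mean = sum(x * p for x, p in pmf.items())               # E[X]
second_moment = sum(x ** 2 * p for x, p in pmf.items())  # E[X^2]

var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())  # E[(X - E[X])^2]
var_shortcut = second_moment - mean ** 2                            # E[X^2] - E[X]^2
sd = math.sqrt(float(var_shortcut))                                 # sd(X)

print(var_definition, var_shortcut, sd)
```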