Probability

Computational Mathematics and Statistics Camp University of Chicago September 2018

Model of probability

  1. Sample space - set of all things that could happen
  2. Events - subsets of the sample space
  3. Probability - chance of an event

Sample space

  • Set of all things that can occur
  • Collect all distinct outcomes into the set \(S\)
  1. House of Representatives - elections every 2 years
    • One incumbent: \(S = \{W, N\}\)
    • Two incumbents: \(S = \{(W,W), (W,N), (N,W), (N,N)\}\)
    • 435 incumbents: \(|S| = 2^{435}\) possible outcomes
  2. Number of countries signing treaties
    • \(S = \{0, 1, 2, \ldots, 194\}\)
  3. Duration of cabinets
    • All non-negative real numbers: \([0, \infty)\)
    • \(S = \{x : 0 \leq x < \infty\}\)
  • The sample space must define all possible realizations
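
As a minimal sketch, the election sample spaces above can be enumerated in Python. The labels `W` and `N` follow the slides; the helper name `sample_space` is an illustrative assumption, not part of the original material.

```python
from itertools import product

def sample_space(n_incumbents):
    """Enumerate every win/lose outcome for n incumbents (hypothetical helper)."""
    return list(product("WN", repeat=n_incumbents))

one = sample_space(1)  # [('W',), ('N',)]
two = sample_space(2)  # [('W', 'W'), ('W', 'N'), ('N', 'W'), ('N', 'N')]

# With 435 incumbents the outcomes are too many to list,
# but the size of the sample space is still easy to state.
size_435 = 2 ** 435
```

Listing outcomes only works for small cases; for the full House we reason about the size and structure of \(S\) instead.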

Events

  • Subset of the sample space \(E \subset S\)
  • Congressional election example
    • One incumbent
      • \(E = \{W\}\)
      • \(F = \{N\}\)
    • Two incumbents
      • \(E = \{(W, N), (W, W) \}\)
      • \(F = \{(N, N)\}\)
    • 435 incumbents
      • Outcome of 2016 election - one event
      • All outcomes where Dems retake control of the House - one event
  • \(x\) is an element of a set \(E\) \[x \in E\] For example, with two incumbents and \(F = \{(N, N)\}\): \[(N, N) \in F\]

Event operations

  • \(E\) is a set
  • Perform operations on sets to create new sets
    • \(E = \{ (W,W), (W,N) \}\)
    • \(F = \{ (N, N), (W,N) \}\)
    • \(S = \{(W,W), (W,N), (N,W), (N,N) \}\)
  • Operations determine what lies in the new set \(E^{\text{new}}\)
  1. Union: \(\cup\)
    • All objects that appear in either set (OR)
    • \(E^{\text{new}} = E \cup F = \{(W,W), (W,N), (N,N) \}\)
  2. Intersection: \(\cap\)
    • All objects that appear in both sets (AND)
    • \(E^{\text{new}} = E \cap F = \{(W,N)\}\)
  3. Complement of set \(E\): \(E^{c}\)
    • All objects in \(S\) that are not in \(E\)
    • \(E^{c} = \{(N, W) , (N, N) \}\)
    • \(F^{c} = \{(N, W) , (W, W) \}\)
    • What is \(S^{c}\)? - an empty set \(\emptyset\)
    • Suppose \(E = \{W\}\), \(F = \{N\}\). Then \(E \cap F = \emptyset\)
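
The set operations above map directly onto Python's built-in `set` type. This is only a sketch using the two-incumbent sets from this slide; the variable names are chosen for illustration.

```python
# Two-incumbent sample space and the events E and F from the slide.
S = {("W", "W"), ("W", "N"), ("N", "W"), ("N", "N")}
E = {("W", "W"), ("W", "N")}
F = {("N", "N"), ("W", "N")}

union = E | F          # outcomes in E or F
intersection = E & F   # outcomes in both E and F
E_complement = S - E   # outcomes in S that are not in E
F_complement = S - F
S_complement = S - S   # the empty set
```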

Probability

  • Probability is the chance of an event occurring
  • \(P\) is a function
  • The domain contains all events \(E\)

Three axioms

  1. For all events \(E\), \(0 \leq P(E) \leq 1\)
  2. \(P(S) = 1\)
  3. For all sequences of mutually exclusive events \(E_{1}, E_{2}, \ldots,E_{N}\) (where \(N\) can go to infinity):

    \[P\left(\cup_{i=1}^{N} E_{i} \right) = \sum_{i=1}^{N} P(E_{i} )\]

Basic examples

  • Suppose we are flipping a fair coin
    • \(P(H) = P(T) = 1/2\)
  • Suppose we are rolling a six-sided die
    • \(P(1) = 1/6\)
  • Suppose we are flipping a pair of fair coins
    • \(P(H, H) = 1/4\)

Basic examples

  • One candidate example
    • \(P(W)\): probability incumbent wins
    • \(P(N)\): probability incumbent loses (does not win)
  • Two candidate example
    • \(P(\{(W,W)\})\): probability both incumbents win
    • \(P( \{(W,W), (W, N)\} )\): probability incumbent \(1\) wins
  • Full House example:
    • \(P( \{ \text{All Democrats Win}\} )\)
  • We’ll use data to infer these things

Birthday problem

  • Probabilistic thinking is sometimes difficult, but enlightening
  • Suppose we have a room full of \(N\) people. What is the probability at least 2 people have the same birthday?
  • Counting February 29 (366 possible birthdays), \(N = 367\) guarantees that at least two people share a birthday
  • For \(N < 367?\)
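
For smaller \(N\), one common approach is to compute the probability that all birthdays are distinct and subtract it from 1. The sketch below assumes 365 equally likely birthdays and independence across people (ignoring February 29, unlike the bound above); the function name `p_shared_birthday` is hypothetical.

```python
def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), assuming uniform, independent birthdays."""
    if n > days:
        return 1.0  # more people than days: a match is guaranteed
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

p_23 = p_shared_birthday(23)  # roughly 0.507
```

Under these assumptions, 23 people are already enough for the probability of a shared birthday to exceed one half.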

E-Harmony problem

eHarmony matches you based on compatibility in the most important areas of life – like values, character, intellect, sense of humor, and 25 other dimensions

E-Harmony problem

  • Suppose (for example) 29 dimensions are binary (0,1)
  • Suppose dimensions are independent
  • \(\Pr(\text{2 people agree on a given dimension}) = 0.5\)

    \[ \begin{eqnarray} \text{Pr(Exact)} & = & \text{Pr(Agree)}_{1} \times \text{Pr(Agree)}_{2}\times \ldots \times \text{Pr(Agree)}_{29} \\ & = & 0.5 \times 0.5 \times \ldots \times 0.5 \\ & = & 0.5^{29} \\ & \approx & 1.8 \times 10^{-9} \end{eqnarray} \]

  • 1 in 536,870,912 people
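
The arithmetic can be checked in a few lines. This sketch simply restates the slide's assumptions (29 independent binary dimensions, agreement probability 0.5 on each); the variable names are illustrative.

```python
n_dimensions = 29
p_agree = 0.5  # assumed probability of agreeing on any one dimension

# Independence lets us multiply the per-dimension probabilities.
p_exact = p_agree ** n_dimensions

# Equivalent "1 in ..." figure.
one_in = 2 ** n_dimensions  # 536870912
```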

Conditional probability

  • Social scientists almost always examine conditional relationships
    • Given opposite Party ID, probability of date
    • Given low interest rates, probability of high inflation
    • Given “economic anxiety”, probability of voting for Trump
  • Intuition
    • Some event has occurred: an outcome was realized
    • We know that this outcome has already happened
    • What is the probability that something in another set happens?

Conditional probability

  • Suppose we have two events, \(E\) and \(F\), and that \(P(F)>0\). Then,

    \[ \begin{eqnarray} P(E|F) & = & \frac{P(E\cap F ) } {P(F) } \end{eqnarray} \]

Examples

  • Example 1
    • \(F = \{\text{All Democrats Win} \}\)
    • \(E = \{\text{Nancy Pelosi Wins (D-CA)} \}\)
    • If \(F\) occurs then \(E\) must occur, \(P(E|F) = 1\)
  • Example 2
    • \(F = \{\text{All Democrats Win} \}\)
    • \(E = \{ \text{Ted Cruz Wins (R-TX)} \}\)
    • \(F \cap E = \emptyset \Rightarrow P(E|F) = \frac{P(F \cap E) }{P(F)} = \frac{P(\emptyset)}{P(F)} = 0\)
  • Example 3: incumbency advantage
    • \(I = \{ \text{Candidate is an incumbent} \}\)
    • \(D = \{ \text{Candidate Defeated} \}\)
    • \(P(D|I) = \frac{P(D \cap I)}{P(I) }\)
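
To make the definition concrete, here is a small sketch on the two-incumbent sample space. It assumes, purely for illustration, that all four outcomes are equally likely; the helpers `prob` and `cond_prob` are hypothetical names.

```python
from fractions import Fraction

S = [("W", "W"), ("W", "N"), ("N", "W"), ("N", "N")]

def prob(event):
    """P(event) when every outcome in S is equally likely (an assumption)."""
    return Fraction(len(event), len(S))

def cond_prob(E, F):
    """P(E | F) = P(E and F) / P(F), defined only when P(F) > 0."""
    if prob(F) == 0:
        raise ValueError("P(F) must be positive")
    return prob(E & F) / prob(F)

first_wins = {s for s in S if s[0] == "W"}
both_win = {("W", "W")}
nobody_wins = {("N", "N")}

p1 = cond_prob(both_win, first_wins)     # 1/2
p2 = cond_prob(first_wins, both_win)     # 1: both winning implies the first wins
p3 = cond_prob(nobody_wins, first_wins)  # 0: the events are disjoint
```

The second and third values mirror Examples 1 and 2: conditioning on a subset gives probability 1, and conditioning on a disjoint event gives probability 0.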

Difference between \(P(A|B)\) and \(P(B|A)\)

\[ \begin{eqnarray} P(A|B) & = & \frac{P(A\cap B)}{P(B)} \\ P(B|A) & = & \frac{P(A \cap B) } {P(A)} \end{eqnarray} \]

  • Type of person who attends football games
    • \(P(\text{Attending a football game}| \text{Drunk}) = 0.01\)
    • \(P(\text{Drunk}| \text{Attending a football game}) \approx 1\)

Law of total probability

  • Suppose that we have a set of events \(F_{1}, F_{2}, \ldots, F_{N}\) such that the events are mutually exclusive and together comprise the entire sample space \(\cup_{i=1}^{N} F_{i} = \text{Sample Space}\)
  • Then, for any event \(E\)

    \[ \begin{eqnarray} P(E) & = & \sum_{i=1}^{N} P(E | F_{i} ) \times P(F_{i}) \end{eqnarray} \]

Example

  • Infer \(P(\text{vote})\) after mobilization campaign
    • \(P(\text{vote}|\text{mobilized} ) = 0.75\)
    • \(P(\text{vote}| \text{not mobilized} ) = 0.25\)
    • \(P(\text{mobilized}) = 0.6 ; P(\text{not mobilized} ) = 0.4\)
    • What is \(P(\text{vote})\)?
  • Sample space (one person) = \(\{\) (mobilized, vote), (mobilized, not vote), (not mobilized, vote) , (not mobilized, not vote) \(\}\)
    • Mobilization partitions the space (mutually exclusive and exhaustive)
    • We can use the law of total probability \[ \begin{eqnarray} P(\text{vote} ) & = & P(\text{vote}| \text{mob.} ) \times P(\text{mob.} ) + P(\text{vote} | \text{not mob} ) \times P(\text{not mob}) \\ & = & 0.75 \times 0.6 + 0.25 \times 0.4 \\ & = & 0.55 \end{eqnarray} \]
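
The same calculation as a short sketch, with the partition stored in dictionaries. The numbers come from the slide; the dictionary and variable names are illustrative.

```python
# P(F_i): the partition of the sample space by mobilization status.
p_group = {"mobilized": 0.6, "not mobilized": 0.4}

# P(vote | F_i) for each piece of the partition.
p_vote_given = {"mobilized": 0.75, "not mobilized": 0.25}

# Law of total probability: sum of P(vote | F_i) * P(F_i).
p_vote = sum(p_vote_given[g] * p_group[g] for g in p_group)
```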

Bayes’ Rule

  • \(P(B|A)\) may be easy to obtain
  • \(P(A|B)\) may be harder to determine
  • A method to move from \(P(B|A)\) to \(P(A|B)\)
  • Bayes’ Rule: For two events \(A\) and \(B\),

    \[ \begin{eqnarray} P(A|B) & = & \frac{P(A)\times P(B|A)}{P(B)} \end{eqnarray} \]

Intuition behind Bayes’ Rule

\[ \begin{eqnarray} P(A|B) & = & \frac{P(A \cap B) }{P(B) } \\ & = & \frac{P(B|A)P(A) } {P(B) } \end{eqnarray} \]

Identifying racial groups from lists of names

  • P(black)= 0.126
  • P(not black) = 1 - P(black) = 0.874
  • P(Washington\(|\)black) = 0.00378
  • P(Washington\(|\)not black) = 0.000060615
  • P(black\(|\)Washington) = ???

    \[ \begin{eqnarray} P(\text{black}|\text{Wash} ) & = & \frac{P(\text{black}) P(\text{Wash}| \text{black}) }{P(\text{Wash} ) } \\ & = & \frac{P(\text{black}) P(\text{Wash}| \text{black}) }{P(\text{black})P(\text{Wash}|\text{black}) + P(\text{nb})P(\text{Wash}| \text{nb}) } \\ & = & \frac{0.126 \times 0.00378}{0.126\times 0.00378 + 0.874 \times 0.000060615} \\ & \approx & 0.9 \end{eqnarray} \]
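
A sketch of the same computation for a two-group partition, where the denominator is expanded with the law of total probability. The function name `posterior` and its parameter names are assumptions made for illustration; the inputs are the figures listed above.

```python
def posterior(prior, likelihood, likelihood_complement):
    """P(A | B) from P(A), P(B | A), and P(B | not A) via Bayes' Rule."""
    numerator = prior * likelihood
    # P(B) = P(A) P(B | A) + P(not A) P(B | not A)
    evidence = numerator + (1 - prior) * likelihood_complement
    return numerator / evidence

p_black_given_washington = posterior(
    prior=0.126,
    likelihood=0.00378,
    likelihood_complement=0.000060615,
)
```

The result is close to 0.9 even though the prior is only 0.126, because the surname is far more common in one group than in the other.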

The Monty Hall problem

You blew it, and you blew it big! Since you seem to have difficulty grasping the basic principle at work here, I’ll explain. After the host reveals a goat, you now have a one-in-two chance of being correct. Whether you change your selection or not, the odds are the same. There is enough mathematical illiteracy in this country, and we don’t need the world’s highest IQ propagating more. Shame! – Scott Smith, Ph.D. University of Florida (From Wikipedia)

The Monty Hall problem

  • Suppose we have three doors: \(A, B, C\)
  • Behind one door there is a car. Behind each of the other two is a goat
    • A contestant guesses a door
    • The host opens a different door, revealing a goat, and the contestant then has the option to switch
    • Should the contestant switch?
  • Contestant guesses \(A\)
  • \(P(A) = 1/3 \leadsto\) chance of winning without switch
  • If \(C\) is revealed to not have a car:

    \[ \begin{eqnarray} P(B| C \text{ revealed} ) & = & \frac{P(B)P(C \text{ revealed} | B)}{P(B)P(C \text{ revealed} | B) + P(A) P(C \text{ revealed} | A) } \\ & = & \frac{1/3 \times 1}{1/3 \times 1 + 1/3 \times 1/2 } = \frac{1/3}{1/2} = \frac{2}{3} \\ P(A| C \text{ revealed} ) & = & \frac{P(A) P(C \text{ revealed} | A)}{ P(B)P(C \text{ revealed} | B) + P(A) P(C \text{ revealed} | A) } \\ & = & \frac{1/3 \times 1/2}{1/3 \times 1 + 1/3 \times 1/2} = \frac{1}{3} \end{eqnarray} \]

  • Switching doubles the chance of winning
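
A quick simulation offers an independent check on the Bayes' Rule calculation. This is only a sketch: it assumes the contestant always starts with door \(A\), that the host never opens the contestant's door or the car's door, and that the host chooses at random when two goat doors are available. The function names, the seed, and the number of trials are arbitrary choices.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = ["A", "B", "C"]
    car = rng.choice(doors)
    guess = "A"
    # The host opens a door that is neither the guess nor the car.
    opened = rng.choice([d for d in doors if d != guess and d != car])
    if switch:
        # Move to the one remaining unopened door.
        guess = next(d for d in doors if d != guess and d != opened)
    return guess == car

def win_rate(switch, trials=100_000, seed=0):
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

stay = win_rate(switch=False)  # close to 1/3
swap = win_rate(switch=True)   # close to 2/3
```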

Independence of probabilities

  • Independence: Two events \(E\) and \(F\) are independent if

    \[ \begin{eqnarray} P(E\cap F ) & = & P(E)P(F) \end{eqnarray} \]

  • Independence is symmetric

Independence of probabilities

  • Flip a fair coin twice
    • \(E = \text{first flip heads}\)
    • \(F = \text{second flip heads}\)

      \[ \begin{eqnarray} P(E \cap F ) & = & P( \{ (H, H) , (H, T) \} \cap \{ (H, H), (T, H) \} ) \\ & =& P( \{(H, H)\} ) \\ & = & \frac{1}{4} \\ P(E ) & = & \frac{1} {2} \\ P(F) & = & \frac{1}{2} \\ P(E)P(F) & =& \frac{1}{2} \times \frac{1}{2} = \frac{1}{4} =P(E \cap F ) \end{eqnarray} \]
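
The same check can be done by enumeration. This sketch assumes the four outcomes of two fair flips are equally likely; the event `G` (both flips heads) is added here only as a contrasting, non-independent example and is not from the slides.

```python
from fractions import Fraction
from itertools import product

S = set(product("HT", repeat=2))  # {(H,H), (H,T), (T,H), (T,T)}

def prob(event):
    """P(event) when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))

def independent(A, B):
    return prob(A & B) == prob(A) * prob(B)

E = {s for s in S if s[0] == "H"}  # first flip heads
F = {s for s in S if s[1] == "H"}  # second flip heads
G = {("H", "H")}                   # both flips heads (illustrative extra event)

e_f_independent = independent(E, F)  # True
e_g_independent = independent(E, G)  # False: knowing G tells us E occurred
```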

Independence of probabilities

  • Suppose \(E\) and \(F\) are independent. Then,

    \[ \begin{eqnarray} P(E|F ) & = & \frac{P(E \cap F) }{P(F) } \\ & = & \frac{P(E)P(F)}{P(F)} \\ & = & P(E) \end{eqnarray} \]

    • Conditioning on the event \(F\) does not modify the probability of \(E\)
    • No information about \(E\) in \(F\)

Independence and Causal Inference

  • Selection and Observational Studies
    • We often want to infer the effect of some treatment
      • Incumbency on vote return
      • College education and job earnings
    • Observational studies: observe the world as it is, without intervening, to make inferences
    • Problem: units select into treatment
      • Simple example: enroll in job training if I think it will help
      • P(job\(|\)training in study) \(\neq\) P(job\(|\)forced training)
    • Background characteristic: difference between treatment and control groups
  • Experiments: make background characteristics and treatment status independent

Random variables

  • A random process or variable with a numerical outcome
  • A random variable \(X\) is a function on the sample space that assigns a real number to each outcome \[ \begin{eqnarray} X:\text{Sample Space} \rightarrow \mathcal{R} \end{eqnarray} \]

    • Number of incumbents who win
    • An indicator whether a country defaults on a loan (1 if a default, 0 otherwise)
    • Number of casualties in a war (rather than all possible outcomes)

Examples of random variables

  • Suppose we have \(3\) units and flip a fair coin (probability \(\frac{1}{2}\)) to assign each unit
  • Assign to \(T=\)Treatment or \(C=\)control
  • \(X\) = Number of units that receive treatment
  • Defining the function

    \[ \begin{equation} X = \left \{ \begin{array} {ll} 0 \text{ if } (C, C, C) \\ 1 \text{ if } (T, C, C) \text{ or } (C, T, C) \text{ or } (C, C, T) \\ 2 \text{ if } (T, T, C) \text{ or } (T, C, T) \text{ or } (C, T, T) \\ 3 \text{ if } (T, T, T) \end{array} \right. \end{equation} \]

Examples of random variables

  • In other words:

    \[ \begin{eqnarray} X( (C, C, C) ) & = & 0 \\ X( (T, C, C)) & = & 1 \\ X((T, C, T)) & = & 2 \\ X((T, T, T)) & = & 3 \end{eqnarray} \]
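
Because a random variable is just a function on the sample space, it can be written as an ordinary Python function. The sketch below uses the three-unit assignment example; the name `outcomes_by_value` is illustrative.

```python
from itertools import product

# All 8 assignments of three units to Treatment (T) or Control (C).
outcomes = list(product("TC", repeat=3))

def X(outcome):
    """Number of units assigned to treatment in this outcome."""
    return outcome.count("T")

# Group the outcomes by the value X assigns to them.
outcomes_by_value = {}
for o in outcomes:
    outcomes_by_value.setdefault(X(o), []).append(o)
```

Each value of \(X\) corresponds to an event, namely the set of outcomes that map to it.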

Examples of random variables

  • \(X\) = Number of Calls into congressional office in some period \(p\)
    • \(X(c) = c\)
  • Outcome of Election
    • Define \(v\) as the proportion of vote the candidate receives
    • Define \(X = 1\) if \(v>0.50\)
    • Define \(X = 0\) if \(v \leq 0.50\)
    • For example, if \(v = 0.48\), then \(X(v) = 0\)
  • How do we compute P(X=1), P(X=0), etc? Come back tomorrow

Expectation

  • What can we expect from a trial? What is the expected outcome?
  • Value of random variable for any outcome weighted by the probability of observing that outcome

    \[ \begin{eqnarray} E[X] & = & \sum_{x:p(x)>0} x p(x) \end{eqnarray} \]

Example of expected value

  • Suppose again that \(X\) is the number of units assigned to treatment, as in our previous example.
  • What is \(E[X]\)?

    \[ \begin{eqnarray} E[X] & = & 0\times \frac{1}{8} + 1 \times \frac{3}{8} + 2 \times \frac{3}{8} + 3 \times \frac{1}{8} \\ & = & 1.5 \end{eqnarray} \]

  • Measure of central tendency
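
A sketch of the same expectation in code, building the probability mass function by enumerating the eight equally likely assignments and then applying the definition of \(E[X]\). The names `pmf` and `expected_value` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("TC", repeat=3))  # 8 equally likely assignments

# p(x): share of outcomes in which exactly x units are treated.
counts = Counter(o.count("T") for o in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in counts.items()}

# E[X] = sum over x of x * p(x)
expected_value = sum(x * p for x, p in pmf.items())  # 3/2
```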

Variance

  • Measure of spread
  • For each value, we might measure distance from center
    • Squared Euclidean distance: \(d(x, E[X])^{2} = (x - E[X])^2\)
  • Then, we might take weighted average of these distances,

    \[ \begin{eqnarray} E[(X - E[X])^2] & = & \sum_{x:p(x)>0} (x - E[X])^2p(x) \\ & = & \sum_{x:p(x)>0} \left(x^2 p(x)\right) - 2 E[X]\sum_{x:p(x)>0} \left(x p(x)\right) \\ & \quad & + E[X]^2\sum_{x:p(x)>0} p(x) \\ & = & E[X^2] - 2E[X]^2 + E[X]^2 \\ & = & E[X^2] - E[X]^2 \\ & = & \text{Var}(X) \end{eqnarray} \]

Variance

  • The variance of a random variable \(X\), var\((X)\), is

    \[ \begin{eqnarray} \text{var}(X) & = & E[(X - E[X])^2] \\ & = & E[X^2] - E[X]^2 \end{eqnarray} \]

  • We will define the standard deviation of \(X\), sd\((X) = \sqrt{\text{var}(X)}\)
  • var\((X) \geq 0\)

Example of variance

  • Three person experiment: \(P(T) = P(C) = 1/2\)
  • What is Var(\(X\))?
  • We have two components to our variance calculation:

    \[ \begin{eqnarray} E[X^2] & = & 3 \\ E[X]^2 & = & 1.5^2 = 2.25 \\ \text{Var}(X) & = & E[X^2] - E[X]^2 \\ & = & 3 - 2.25 = 0.75 \end{eqnarray} \]
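
Both forms of the variance can be verified numerically. This sketch hard-codes the probability mass function from the three-unit example and uses exact fractions; the helper `expectation` is a hypothetical name.

```python
from fractions import Fraction

# p(x) for X = number of treated units among three fair coin flips.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum over x of g(x) * p(x)."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expectation(pmf)                          # E[X]   = 3/2
second_moment = expectation(pmf, lambda x: x**2) # E[X^2] = 3

variance_shortcut = second_moment - mean**2                     # E[X^2] - E[X]^2
variance_direct = expectation(pmf, lambda x: (x - mean) ** 2)   # E[(X - E[X])^2]
```

Both routes give \(3/4 = 0.75\), matching the calculation above.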