Random Variables

Computational Mathematics and Statistics Camp, University of Chicago, September 2018

Probability mass function

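  • Return to our experiment example (three units, each assigned to treatment \(T\) or control \(C\)): the probability of each value of \(X\) comes from the probabilities of the outcomes that produce it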
  • \(P(C, T, C) = P(C)P(T)P(C) = \frac{1}{2}\frac{1}{2}\frac{1}{2} = \frac{1}{8}\)
  • Every outcome has probability \(\frac{1}{8}\), so we sum over the outcomes that produce each value of \(X\):

    \[ \begin{eqnarray} p(X = 0) & = & P(C, C, C) = \frac{1}{8}\\ p(X = 1) & = & P(T, C, C) + P(C, T, C) + P(C, C, T) = \frac{3}{8} \\ p(X = 2) & = & P(T, T, C) + P(T, C, T) + P(C, T, T) = \frac{3}{8} \\ p(X = 3) & = & P(T, T, T) = \frac{1}{8} \end{eqnarray} \]
  • \(p(X = a) = 0\) for all \(a \notin \{0, 1, 2, 3\}\)
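A minimal Python sketch (Python is our choice; the slides contain no code) that rebuilds this pmf by enumerating all eight outcomes. It assumes, as the slides do, that each unit is independently assigned to \(T\) or \(C\) with probability 1/2, and that \(X\) counts the treated units.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Each of three units is assigned to treatment (T) or control (C) with probability 1/2.
p_unit = {"T": Fraction(1, 2), "C": Fraction(1, 2)}

pmf = defaultdict(Fraction)  # Fraction() is 0, so missing keys start at 0
for outcome in product("TC", repeat=3):
    # Independence: P(outcome) is the product of the unit-level probabilities.
    prob = Fraction(1)
    for unit in outcome:
        prob *= p_unit[unit]
    # X maps each outcome to the number of treated units.
    x = outcome.count("T")
    pmf[x] += prob

for x in sorted(pmf):
    print(x, pmf[x])
```

Exact fractions avoid floating-point noise, so the printed values can be compared directly with \(\frac{1}{8}, \frac{3}{8}, \frac{3}{8}, \frac{1}{8}\).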

Probability mass function

  • Consider the outcome of an election, where \(v\) is the proportion of the vote a candidate receives
    • \(X(v)=1\) if \(v>0.5\); otherwise \(X(v) = 0\)
    • Then \(P(X = 1) = P(v>0.5)\)

Probability mass function

  • If \(X\) takes on values in a discrete (countable) set, we call \(X\) a discrete random variable
  • Probability Mass Function: For a discrete random variable \(X\), define the probability mass function \(p(x)\) as

    \[ \begin{eqnarray} p(x) & = & P(X = x) \end{eqnarray} \]

Topic model example

  • Topics: distinct concepts (war in Afghanistan, national debt, fire department grants)
  • Mathematically, a topic is a probability mass function on words
    • The probability of using each word when discussing that topic
  • Suppose we have a set of words:
    • (afghanistan, fire, department, soldier, troop, war, grant)
  • Topic 1 (war)
    • P(afghanistan) = 0.3; P(fire) = 0.0001; P(department) = 0.0001; P(soldier) = 0.2; P(troop) = 0.2; P(war)=0.2997; P(grant)=0.0001
  • Topic 2 (fire departments):
    • P(afghanistan) = 0.0001; P(fire) = 0.3; P(department) = 0.2; P(soldier) = 0.0001; P(troop) = 0.0001; P(war)=0.0001; P(grant)=0.2997
  • Given a set of documents, we estimate the topics (the word pmfs) from the data
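To make "a pmf on words" concrete, here is a small Python sketch using the Topic 1 (war) probabilities. It checks the pmf conditions and then draws a short "document" word by word. Drawing each word independently from one topic, the seed, and the 20-word length are all our own simplifying assumptions for illustration, not part of the topic model described in the slides.

```python
import random

# Topic 1 (war) as a probability mass function over the seven-word vocabulary.
topic_war = {
    "afghanistan": 0.3, "fire": 0.0001, "department": 0.0001,
    "soldier": 0.2, "troop": 0.2, "war": 0.2997, "grant": 0.0001,
}

# A valid pmf is nonnegative and sums to 1 (up to floating-point rounding).
assert all(p >= 0 for p in topic_war.values())
assert abs(sum(topic_war.values()) - 1) < 1e-9

# Hypothetical 20-word document: each word drawn independently from the topic.
rng = random.Random(2018)
words = list(topic_war)
document = rng.choices(words, weights=[topic_war[w] for w in words], k=20)
print(document)
```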

Cumulative mass function

  • For a random variable \(X\), define the cumulative mass function \(F(x)\) as, \[ \begin{eqnarray} F(x) & = & P(X \leq x) \end{eqnarray} \]
  • Characterizes how probability accumulates as \(x\) gets larger
  • \(F(x) \in [0,1]\)
  • \(F(x)\) is non-decreasing

Three person experiment

  • Consider the three person experiment: \(P(T) = P(C) = 1/2\)
  • What is \(F(2)\)?

    \[ \begin{eqnarray} F(2) & = & P(X = 0) + P(X = 1) + P(X = 2) \\ & = & \frac{1}{8} + \frac{3}{8} + \frac{3}{8} \\ & = & \frac{7}{8} \end{eqnarray} \]
  • What is \(F(2) - F(1)\)?

    \[ \begin{eqnarray} F(2) - F(1) & = & [P(X = 0) + P(X = 1) + P(X = 2)] \nonumber \\ && -[P(X = 0) + P(X = 1)] \\ F(2) - F(1) & = & P(X = 2) \end{eqnarray} \]
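The same relationship can be sketched in Python: the cmf is a running sum of the pmf, and differencing the cmf recovers the pmf. The pmf values are the ones computed for the three person experiment; the helper names are our own.

```python
from fractions import Fraction
from itertools import accumulate

# pmf of X = number of treated units in the three person experiment.
support = [0, 1, 2, 3]
pmf = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

# Cumulative mass function: F(x) = P(X <= x) is a running sum of the pmf.
cmf = list(accumulate(pmf))

# Differencing recovers the pmf: p(x) = F(x) - F(x - 1), with F(-1) = 0.
recovered = [f - f_prev for f, f_prev in zip(cmf, [Fraction(0)] + cmf[:-1])]

for x, F in zip(support, cmf):
    print(f"F({x}) = {F}")
```

For example, `cmf[2] - cmf[1]` is \(F(2) - F(1) = P(X = 2) = \frac{3}{8}\).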

Close relationship between pmf’s and cmf’s

Functions of random variables

  • We often apply a function \(g\) to a random variable, giving \(g(X)\)
  • How do we compute \(E[g(X)]\)?
  • Expected value of a function of a random variable: Suppose \(X\) is a discrete random variable that takes on values \(x_{i}\), \(i = 1, 2, \ldots\), with probabilities \(p(x_{i})\). If \(g: \Re \rightarrow \Re\), then the expected value of \(g(X)\) is,

    \[ \begin{eqnarray} E[g(X)] & = & \sum_{i} g(x_{i}) p(x_{i} ) \end{eqnarray} \]

Example

  • Let’s suppose that \(X\) is the number of observations assigned to treatment (from our previous example).
  • Suppose that \(g(X) = X^2\). What is \(E[g(X)]\)?

    \[ \begin{eqnarray} E[g(X)] = E[X^2]& = & 0^2 \times \frac{1}{8} + 1^2 \times \frac{3}{8} + 2^2 \times \frac{3}{8} + 3^2 \times \frac{1}{8} \\ & = & 0 + \frac{3}{8} + \frac{12}{8} + \frac{9}{8} \\ & = & \frac{24}{8} = 3 \end{eqnarray} \]
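A minimal Python sketch of \(E[g(X)] = \sum_i g(x_i) p(x_i)\), using the pmf of the number of treated units; the `expect` helper is our own name, not from the slides.

```python
from fractions import Fraction

pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def expect(g, pmf):
    """E[g(X)] = sum over the support of g(x) * p(x) for a discrete pmf."""
    return sum(g(x) * p for x, p in pmf.items())

e_x = expect(lambda x: x, pmf)        # E[X]
e_x2 = expect(lambda x: x ** 2, pmf)  # E[X^2]
print(e_x, e_x2)
```

This gives \(E[X] = 3/2\) and \(E[X^2] = 3\), which also shows that \(E[X^2] \neq E[X]^2 = 9/4\) in general.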

Bernoulli distribution

  • Suppose \(X\) is a random variable with \(X \in \{0, 1\}\) and \(P(X = 1) = \pi\). Then we say that \(X\) is a Bernoulli random variable,

    \[ \begin{eqnarray} p(k) & = & \pi^{k} (1- \pi)^{1 - k} \nonumber \end{eqnarray} \]

    for \(k \in \{0,1\}\) and \(p(k) = 0\) otherwise.

  • We will (equivalently) say that

    \[ \begin{eqnarray} X & \sim & \text{Bernoulli}(\pi) \nonumber \end{eqnarray} \]

Bernoulli distribution

  • Suppose we flip a fair coin and \(Y = 1\) if the outcome is Heads

    \[ \begin{eqnarray} Y & \sim & \text{Bernoulli}(1/2) \nonumber \\ p(1) & = & (1/2)^{1} (1- 1/2)^{ 1- 1} = 1/2 \nonumber \\ p(0) & = & (1/2)^{0} (1- 1/2)^{1 - 0} = 1/2 \nonumber \end{eqnarray} \]

Expected value and variance

  • Suppose \(Y \sim \text{Bernoulli}(\pi)\)

    \[ \begin{eqnarray} E[Y] & = & 1 \times P(Y = 1) + 0 \times P(Y = 0) \nonumber \\ & = & \pi + 0 (1 - \pi) = \pi \nonumber \\ \text{var}(Y) & = & E[Y^2] - E[Y]^2 \nonumber \\ E[Y^2] & = & 1^{2} P(Y = 1) + 0^{2} P(Y = 0) \nonumber \\ & = & \pi \nonumber \\ \text{var}(Y) & = & \pi - \pi^{2} \nonumber \\ & = & \pi(1 - \pi ) \nonumber \end{eqnarray} \]

  • \(E[Y] = \pi\)
  • Var\((Y) = \pi(1- \pi)\)
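  • What is the maximum variance? Setting the derivative of \(\pi - \pi^{2}\) to zero gives \(1 - 2\pi = 0\), so \(\pi = 1/2\); the second derivative, \(-2\), is negative, so this is a maximum

    \[ \begin{eqnarray} \text{var}(Y) & = & \pi(1 - \pi) \nonumber \\ & = & 0.5(1 - 0.5 ) \\ & = & 0.25 \end{eqnarray} \]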
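As a numerical check on the calculus, this Python sketch computes \(\text{var}(Y) = E[Y^2] - E[Y]^2\) directly from the Bernoulli pmf and searches a grid of \(\pi\) values for the largest variance. The grid spacing of 0.01 is our arbitrary choice.

```python
def bernoulli_pmf(k, pi):
    """p(k) = pi^k (1 - pi)^(1 - k) for k in {0, 1}, and 0 otherwise."""
    if k not in (0, 1):
        return 0.0
    return pi ** k * (1 - pi) ** (1 - k)

def bernoulli_var(pi):
    """var(Y) = E[Y^2] - E[Y]^2, computed directly from the pmf."""
    e_y = sum(k * bernoulli_pmf(k, pi) for k in (0, 1))
    e_y2 = sum(k ** 2 * bernoulli_pmf(k, pi) for k in (0, 1))
    return e_y2 - e_y ** 2

# Evaluate the variance on a grid of pi values and find where it peaks.
grid = [i / 100 for i in range(101)]
best_pi = max(grid, key=bernoulli_var)
print(best_pi, bernoulli_var(best_pi))
```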

Example: Winning a war

  • Suppose country \(1\) is engaged in a conflict and can either win or lose.
  • Define \(Y = 1\) if the country wins and \(Y = 0\) otherwise.
  • Then,

    \[ \begin{eqnarray} Y &\sim & \text{Bernoulli}(\pi) \end{eqnarray} \]

  • Suppose country \(1\) is deciding whether to fight a war.
  • Engaging in the war will cost the country \(c\).
  • If they win, country \(1\) receives \(B\).
  • What is \(1\)’s expected utility from fighting a war?

    \[ \begin{eqnarray} E[U(\text{war})] & = & (\text{Utility}|\text{win})\times P(\text{win}) + (\text{Utility}| \text{lose})\times P(\text{lose}) \\ &= & (B - c) P(Y = 1) + (- c) P(Y = 0 ) \\ & = & B \times P(Y = 1) - c(P(Y = 1) + P(Y = 0)) \\ & = & B \times \pi - c \end{eqnarray} \]
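A small Python sketch of the expected-utility calculation. The values \(B = 10\), \(c = 3\), \(\pi = 0.4\) are hypothetical, chosen only to show that the term-by-term expectation matches the simplified form \(B\pi - c\).

```python
def expected_utility_war(B, c, pi):
    """E[U(war)] = (B - c) P(win) + (-c) P(lose), with P(win) = pi."""
    return (B - c) * pi + (-c) * (1 - pi)

# Hypothetical values: benefit B = 10, cost c = 3, win probability pi = 0.4.
B, c, pi = 10, 3, 0.4
eu = expected_utility_war(B, c, pi)
print(eu, B * pi - c)  # equal up to floating-point rounding

# The expected utility of fighting is positive exactly when B * pi > c.
print(eu > 0)
```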

Binomial distribution

  • A model to count the number of successes across \(N\) trials
  • Suppose \(X\) is a random variable that counts the number of successes in \(N\) independent and identically distributed Bernoulli trials. Then \(X\) is a Binomial random variable,

    \[ \begin{eqnarray} p(k) & = & {{N}\choose{k}}\pi^{k} (1- \pi)^{N-k} \nonumber \end{eqnarray} \]

  • for \(k \in \{0, 1, 2, \ldots, N\}\) and \(p(k) = 0\) otherwise
  • \(\binom{N}{k} = \frac{N!}{(N-k)! k!}\).
  • Equivalently,

    \[ \begin{eqnarray} X & \sim & \text{Binomial}(N, \pi) \nonumber \end{eqnarray} \]

Example

  • Recall our experiment example:
    • \(P(T) = P(C) = 1/2\)
    • \(Z =\) number of units assigned to treatment

      \[ \begin{eqnarray} Z & \sim & \text{Binomial}(3, 1/2)\\ p(0) & = & {{3}\choose{0}} (1/2)^{0} (1- 1/2)^{3-0} = 1 \times \frac{1}{8}\\ p(1) & = & {{3}\choose{1}} (1/2)^{1} (1 - 1/2)^{2} = 3 \times \frac{1}{8} \\ p(2) & = & {{3}\choose{2}} (1/2)^{2} (1- 1/2)^1 = 3 \times \frac{1}{8} \\ p(3) & = & {{3}\choose{3}} (1/2)^{3} (1 - 1/2)^{0} = 1 \times \frac{1}{8} \end{eqnarray} \]
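A minimal Python sketch of the Binomial pmf, evaluated with exact fractions for \(N = 3\), \(\pi = 1/2\); it reproduces the four values above. `math.comb` (Python 3.8+) computes \(\binom{N}{k}\).

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, N, pi):
    """p(k) = C(N, k) pi^k (1 - pi)^(N - k) for k in {0, ..., N}, and 0 otherwise."""
    if k < 0 or k > N:
        return 0
    return comb(N, k) * pi ** k * (1 - pi) ** (N - k)

half = Fraction(1, 2)
pmf = [binomial_pmf(k, 3, half) for k in range(4)]
print(pmf)
```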

Expected value and variance

\[Z = \sum_{i=1}^{N} Y_{i} \text{ where the } Y_{i} \text{ are independent and } Y_{i} \sim \text{Bernoulli}(\pi)\]

\[ \begin{eqnarray} E[Z] & = & E[Y_{1} + Y_{2} + Y_{3} + \ldots + Y_{N} ] \\ & = & \sum_{i=1}^{N} E[Y_{i} ] \\ & = & N \pi \\ \text{var}(Z) & = & \sum_{i=1}^{N} \text{var}(Y_{i}) \\ & = & N \pi (1-\pi) \end{eqnarray} \]
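The derivation treats \(Z\) as a sum of independent Bernoulli draws. This Python sketch simulates exactly that and compares the sample mean and variance with \(N\pi\) and \(N\pi(1-\pi)\). The values \(N = 10\), \(\pi = 0.3\), the seed, and the number of replications are hypothetical choices.

```python
import random
import statistics

def simulate_binomial(N, pi, reps, rng):
    """Draw Z = Y_1 + ... + Y_N, with Y_i independent Bernoulli(pi), `reps` times."""
    return [sum(1 if rng.random() < pi else 0 for _ in range(N)) for _ in range(reps)]

rng = random.Random(42)
N, pi = 10, 0.3  # hypothetical parameter values
draws = simulate_binomial(N, pi, reps=50_000, rng=rng)

print(statistics.fmean(draws), N * pi)                 # close to E[Z] = 3
print(statistics.pvariance(draws), N * pi * (1 - pi))  # close to var(Z) = 2.1
```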

Poisson

  • We are often interested in counting the number of events that occur:
    • Number of wars started
    • Number of speeches made
    • Number of bribes offered
    • Number of people waiting for a license
  • These are generally referred to as event counts

Poisson

  • Suppose \(X\) is a random variable that takes on values \(X \in \{0, 1, 2, \ldots\}\) and that \(P(X = k) = p(k)\) is,

    \[ \begin{eqnarray} p(k) & = & e^{-\lambda} \frac{\lambda^{k}}{k!} \nonumber \end{eqnarray} \]

    for \(k \in \{0, 1, \ldots\}\) and \(0\) otherwise. Then we say that \(X\) follows a Poisson distribution with rate parameter \(\lambda\)

    \[ \begin{eqnarray} X & \sim & \text{Poisson}(\lambda) \nonumber \end{eqnarray} \]
  • \(E[X] = \text{var}(X) = \lambda\)
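A quick numerical check of \(E[X] = \text{var}(X) = \lambda\) in Python: the infinite sums are truncated at `k_max = 100`, an arbitrary cutoff of ours that is far into the negligible tail for \(\lambda = 5\).

```python
import math

def poisson_pmf(k, lam):
    """p(k) = exp(-lam) * lam^k / k! for k = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_moments(lam, k_max=100):
    """Approximate E[X] and var(X) by truncating the infinite sums at k_max."""
    ks = range(k_max + 1)
    mean = sum(k * poisson_pmf(k, lam) for k in ks)
    second = sum(k ** 2 * poisson_pmf(k, lam) for k in ks)
    return mean, second - mean ** 2

print(poisson_moments(5))  # both entries are (numerically) 5
```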

Example

  • Suppose the number of threats a president makes in a term is given by \(X \sim \text{Poisson}(5)\)
  • What is the probability the president will make ten or more threats?

    \[ \begin{eqnarray} P(X \geq 10) & = & e^{-5} \sum_{k=10}^{\infty} \frac{5^{k}}{k!} \\ & = & 1 - P(X< 10 ) \\ & = & 1 - e^{-5} \sum_{k=0}^{9} \frac{5^{k}}{k!} \\ & \approx & 0.0318 \end{eqnarray} \]
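The complement rule turns the infinite tail sum into a finite one. A minimal Python sketch:

```python
import math

def poisson_pmf(k, lam):
    """p(k) = exp(-lam) * lam^k / k!"""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 5
# Complement rule: P(X >= 10) = 1 - P(X <= 9), which needs only a finite sum.
p_tail = 1 - sum(poisson_pmf(k, lam) for k in range(10))
print(round(p_tail, 4))  # 0.0318
```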

Example

Continuous random variables

  • Random variables that are not discrete
    • Approval ratings
    • GDP
    • Wait time between wars: \(X(t) = t\) for all \(t\)
    • Proportion of vote received: \(X(v) = v\) for all \(v\)
  • Many analogues to discrete probability distributions
  • We need calculus to answer questions about probability

Probability density function

  • What is the area under the curve \(f(x)\) between \(1/2\) and \(2\)?

    \[\int_{1/2}^{2} f(x)dx = F(2) - F(1/2)\]

Definition

  • \(X\) is a continuous random variable if there exists a nonnegative function \(f\), defined for all \(x \in \Re\), such that for any (measurable) set of real numbers \(B\),

    \[ \begin{eqnarray} P(X \in B) & = & \int_{B} f(x)dx \nonumber \end{eqnarray} \]

    • Nonnegative means \(f(x) \geq 0\) for all \(x\)
  • We’ll call \(f(\cdot)\) the density function for \(X\)

Uniform Random Variable

  • \(X \sim \text{Uniform}(0,1)\) if

    \[ \begin{eqnarray} f(x) & = & 1 \text{ if } x \in [0,1] \\ f(x) & = & 0 \text{ otherwise } \end{eqnarray} \]

Uniform Random Variable

\[ \begin{eqnarray} P(X \in [0.2, 0.5]) & = & \int_{0.2}^{0.5} 1 \, dx \nonumber \\ & = & x \big|^{0.5}_{0.2} \nonumber \\ & = & 0.5 - 0.2 \nonumber \\ & = & 0.3\nonumber \end{eqnarray} \]

Uniform Random Variable

\[ \begin{eqnarray} P(X \in [0, 1] ) & = & \int_{0}^{1} 1 \, dx \nonumber \\ & = & x \big|^{1}_{0} \nonumber \\ & = & 1 - 0 \nonumber \\ & = & 1 \nonumber \end{eqnarray} \]

Uniform Random Variable

\[ \begin{eqnarray} P(X \in [0.5, 0.5]) & = & \int_{0.5}^{0.5} 1 \, dx \nonumber \\ & = & x \big|^{0.5}_{0.5} \nonumber \\ & = & 0.5 - 0.5 \nonumber \\ & = & 0 \nonumber \end{eqnarray} \]

Uniform Random Variable

\[ \begin{eqnarray} P(X \in [0, 0.2]\cup[0.5, 1]) & = & \int_{0}^{0.2} 1 \, dx + \int_{0.5}^{1} 1 \, dx \nonumber \\ & = & x \big|_{0}^{0.2} + x \big|_{0.5}^{1} \nonumber \\ & = & 0.2 - 0 + 1 - 0.5 \nonumber \\ & = & 0.7 \nonumber \end{eqnarray} \]
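The four uniform calculations can be checked numerically in Python. This sketch approximates each integral with the midpoint rule (our choice of numerical method; with a constant density it is essentially exact).

```python
def uniform_pdf(x):
    """Density of Uniform(0, 1): 1 on [0, 1] and 0 elsewhere."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

p1 = integrate(uniform_pdf, 0.2, 0.5)  # about 0.3
p2 = integrate(uniform_pdf, 0.0, 1.0)  # about 1
p3 = integrate(uniform_pdf, 0.5, 0.5)  # 0: a single point has no area
p4 = integrate(uniform_pdf, 0.0, 0.2) + integrate(uniform_pdf, 0.5, 1.0)  # about 0.7
print(p1, p2, p3, p4)
```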

Uniform Random Variable

  • To summarize
    • \(P(X = a) = 0\): the probability at any single point is 0
    • \(P(X \in (-\infty, \infty) ) = 1\): the probability over the entire real number line is 1
    • If \(F\) is an antiderivative of \(f\), then \(P(X \in [c,d]) = F(d) - F(c)\) (Fundamental Theorem of Calculus)

Cumulative distribution function

  • The probability density function \(f\) characterizes the distribution of a continuous random variable
  • Equivalently, the cumulative distribution function characterizes a continuous random variable
  • For a continuous random variable \(X\) define its cumulative distribution function \(F(x)\) as,

    \[ \begin{eqnarray} F(t) & = & P(X \leq t) = \int_{-\infty} ^{t} f(x) dx \nonumber \end{eqnarray} \]

  • pdf integrated \(\leadsto\) cdf
  • cdf differentiated \(\leadsto\) pdf

Example: uniform distribution

  • Suppose \(X \sim \text{Uniform}(0,1)\), then

    \[ \begin{eqnarray} F(t) & = & P(X\leq t) \\ & = & 0 \text{, if $t< 0$ } \\ & = & 1 \text{, if $t >1$ } \\ & = & t \text{, if $t \in [0,1]$} \end{eqnarray} \]
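A Python sketch of the uniform cdf together with "cdf differentiated gives pdf": a central finite difference (step `h = 1e-6`, our arbitrary choice) returns about 1 inside \((0, 1)\) and 0 outside.

```python
def uniform_cdf(t):
    """F(t) = P(X <= t) for X ~ Uniform(0, 1)."""
    if t < 0:
        return 0.0
    if t > 1:
        return 1.0
    return t

def derivative(F, t, h=1e-6):
    """Central finite-difference approximation of F'(t)."""
    return (F(t + h) - F(t - h)) / (2 * h)

# Differentiating the cdf recovers the pdf: about 1 inside (0, 1), 0 outside.
for t in (-0.5, 0.25, 0.75, 1.5):
    print(t, uniform_cdf(t), round(derivative(uniform_cdf, t), 6))
```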

Uniform Random Variable

Expectations of continuous random variables

  • If \(X\) is a continuous random variable then,

    \[ \begin{eqnarray} E[X] & = & \int_{-\infty}^{\infty} x f(x) dx \nonumber \end{eqnarray} \]

  • Suppose \(X \sim \text{Uniform}(0,1)\). What is \(E[X]\)?

    \[ \begin{eqnarray} E[X] & = & \int_{-\infty}^{\infty} xf(x)dx \\ & = & \int_{-\infty}^{0} x \cdot 0 \, dx + \int_{0}^{1} x \cdot 1 \, dx + \int_{1}^{\infty} x \cdot 0 \, dx \\ & = & 0 + \frac{x^{2}}{2} \Big|^{1}_{0} + 0 \\ & = & 0 + \frac{1}{2} + 0 \\ & = & \frac{1}{2} \end{eqnarray} \]

Expectations of functions

  • Suppose \(X\) is a continuous random variable and \(g:\Re \rightarrow \Re\)
  • Then,

    \[ \begin{eqnarray} E[g(X)] & = & \int_{-\infty}^{\infty} g(x)f(x)dx \nonumber \end{eqnarray} \]

  • Suppose \(g(X) = X^2\) and \(X \sim \text{Uniform}(0,1)\). What is \(E[g(X)]\)?

    \[ \begin{eqnarray} E[g(X)] & = & \int_{-\infty}^{\infty} g(x)f(x)dx \\ & = & \int_{0}^{1} x^2dx \\ & = & \frac{x^3}{3}|^{1}_{0} \\ & = & \frac{1}{3} \end{eqnarray} \]
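A minimal Python sketch of \(E[g(X)] = \int g(x) f(x)\,dx\) for the uniform density, using the midpoint rule (our choice) to recover \(E[X] = 1/2\) and \(E[X^2] = 1/3\). Because \(f\) is zero outside \([0, 1]\), integrating over \([0, 1]\) is enough.

```python
def uniform_pdf(x):
    """Density of Uniform(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def expect(g, pdf, a, b, n=10_000):
    """Midpoint-rule approximation of E[g(X)] = integral of g(x) f(x) dx over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) * pdf(a + (i + 0.5) * h) for i in range(n))

e_x = expect(lambda x: x, uniform_pdf, 0.0, 1.0)
e_x2 = expect(lambda x: x ** 2, uniform_pdf, 0.0, 1.0)
print(e_x, e_x2)  # close to 1/2 and 1/3
```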

Variance of functions

  • If \(X\) is a continuous random variable, define its variance, \(Var(X)\),

    \[ \begin{eqnarray} Var(X) & = & E[(X- E[X])^2] \nonumber \\ & = & \int_{-\infty}^{\infty} (x - E[X])^2f(x) dx \nonumber \\ & = & E[X^2] - E[X]^2 \nonumber \end{eqnarray} \]

Variance of functions

  • \(X \sim \text{Uniform}(0,1)\). What is \(Var(X)\)?

    \[ \begin{eqnarray} E[X^2] & = & \frac{1}{3} \\ E[X]^2 & = & \left(\frac{1}{2}\right)^2 \\ & = & \frac{1}{4} \end{eqnarray} \]

    \[ \begin{eqnarray} Var(X) & =& E[X^2] - E[X]^2 \\ & = & \frac{1}{3} - \frac{1}{4} = \frac{1}{12} \end{eqnarray} \]
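A Monte Carlo check in Python: the sample mean and variance of many Uniform(0,1) draws should be near \(1/2\) and \(1/12 \approx 0.0833\). The seed and sample size are arbitrary choices of ours.

```python
import random
import statistics

rng = random.Random(2018)
draws = [rng.random() for _ in range(200_000)]  # rng.random() returns a float in [0.0, 1.0)

print(statistics.fmean(draws))      # close to E[X] = 1/2
print(statistics.pvariance(draws))  # close to Var(X) = 1/12
```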

Normal (Gaussian)

  • Suppose \(X\) is a random variable with \(X \in \Re\) and density

    \[ \begin{eqnarray} f(x) & = & \frac{1}{\sqrt{2\pi \sigma^2}}\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) \nonumber \end{eqnarray} \]
  • Then \(X\) is a normally distributed random variable with parameters \(\mu\) and \(\sigma^2\)
  • Equivalently, we’ll write

    \[ \begin{eqnarray} X & \sim & \text{Normal}(\mu, \sigma^2) \nonumber \end{eqnarray} \]
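A Python sketch of the normal density, checking numerically that it integrates to 1 and agrees with the standard library's `statistics.NormalDist`. Note that `NormalDist` takes the standard deviation \(\sigma\), not the variance \(\sigma^2\). The parameters \(\mu = 6\), \(\sigma^2 = 4\) are hypothetical.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma2):
    """f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

mu, sigma2 = 6.0, 4.0  # hypothetical parameters (sigma = 2)
sigma = math.sqrt(sigma2)

# Midpoint-rule integral over mu +/- 10 sigma; the mass outside is negligible.
a, b, n = mu - 10 * sigma, mu + 10 * sigma, 20_000
h = (b - a) / n
total = h * sum(normal_pdf(a + (i + 0.5) * h, mu, sigma2) for i in range(n))
print(total)  # close to 1

# Cross-check one value against the standard library (which takes sigma, not sigma^2).
print(normal_pdf(7.0, mu, sigma2), NormalDist(mu, sigma).pdf(7.0))
```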

Support for President Obama

Suppose we are interested in modeling presidential approval

  • Let \(Y\) be the random variable giving the proportion of the population who “approves of the job the president is doing”
  • Individual responses are independent and identically distributed, and \(Y\) is the average of those individual responses
  • We observe many responses (\(N\rightarrow \infty\))
  • Then, by the Central Limit Theorem, \(Y\) is approximately Normally distributed, or

    \[ \begin{eqnarray} Y& \sim & \text{Normal}(\mu, \sigma^2) \\ f(y) & = & \frac{\exp\left(-\frac{(y-\mu)^2}{2\sigma^2} \right)}{\sqrt{2\pi \sigma^2}} \end{eqnarray} \]
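To see the Central Limit Theorem at work, this Python sketch simulates many polls, each averaging iid Bernoulli responses, and compares the spread of the poll averages with the normal approximation's standard deviation \(\sqrt{p(1-p)/n}\). The approval rate 0.39, 500 respondents per poll, 2,000 polls, and the seed are all hypothetical.

```python
import math
import random
import statistics

rng = random.Random(2018)
p_true = 0.39    # hypothetical population approval rate
n_resp = 500     # hypothetical number of respondents per poll
n_polls = 2_000  # number of simulated polls

# Each poll averages n_resp iid Bernoulli(p_true) responses (True = approves).
props = [
    sum(rng.random() < p_true for _ in range(n_resp)) / n_resp
    for _ in range(n_polls)
]

# CLT: the poll average is approximately Normal(p, p(1 - p) / n).
print(statistics.fmean(props), p_true)
print(statistics.stdev(props), math.sqrt(p_true * (1 - p_true) / n_resp))
```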

Expected Value/Variance of Normal Distribution

  • \(Z\) has a standard normal distribution if

    \[ \begin{eqnarray} Z & \sim & \text{Normal}(0,1) \nonumber \end{eqnarray} \]

  • We write the cumulative distribution function of \(Z\) as

    \[ \begin{eqnarray} F_{Z}(x) & = & \frac{1}{\sqrt{2\pi} }\int_{-\infty}^{x} \exp(-z^2/2) dz \end{eqnarray} \]

Expected Value/Variance of Normal Distribution

  • Suppose \(Z \sim \text{Normal}(0,1)\)

    • \(Y = 2Z + 6\)
    • \(Y \sim \text{Normal}(6, 4)\)

Expected Value/Variance of Normal Distribution

  • Scale/Location: If \(Z \sim N(0,1)\), then \(X = aZ + b\) is,

    \[ \begin{eqnarray} X & \sim & \text{Normal} (b, a^2) \nonumber \end{eqnarray} \]

  • Assume we know:

    \[ \begin{eqnarray} E[Z] & = & 0 \nonumber \\ Var(Z) & = & 1 \nonumber \end{eqnarray} \]

  • Any \(Y \sim \text{Normal}(\mu, \sigma^2)\) can be written as \(Y = \sigma Z + \mu\) (scale/location with \(a = \sigma\), \(b = \mu\)), which implies

    \[ \begin{eqnarray} E[Y] & = & E[\sigma Z + \mu] \\ & = & \sigma E[Z] + \mu \nonumber \\ & = & \mu \nonumber \\ Var(Y) & = & Var(\sigma Z + \mu) \\ & = & \sigma^2 Var(Z) + Var(\mu) \\ & = & \sigma^2 + 0 \\ & =& \sigma^2 \end{eqnarray} \]
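A simulation sketch of scale/location in Python, using \(\sigma = 2\), \(\mu = 6\) to match \(Y = 2Z + 6\) from the earlier slide: transform standard normal draws and check the sample mean and variance. The seed and sample size are arbitrary.

```python
import random
import statistics

rng = random.Random(2018)
mu, sigma = 6.0, 2.0  # Y = 2Z + 6

z = [rng.gauss(0.0, 1.0) for _ in range(200_000)]  # draws of Z ~ Normal(0, 1)
y = [sigma * zi + mu for zi in z]                  # Y = sigma * Z + mu

print(statistics.fmean(y), statistics.pvariance(y))  # close to mu = 6 and sigma^2 = 4
```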

Why rely on the standard normal distribution

  • The normal distribution is commonly used in statistical analysis
  • Standardizing makes it easier to compare variables with different ranges/variances
  • Standardized values are unitless
  • Saves time on the calculus: probabilities for any normal variable can be computed from the single cdf \(F_{Z}\)

Back To Obama

  • Suppose \(\mu = 0.39\) and \(\sigma^2 = 0.0025\), so \(\sigma = 0.05\)
  • What is \(P(Y\geq 0.45)\)? (That is, what is the probability it isn’t that bad?)

    \[ \begin{eqnarray} P(Y \geq 0.45) & = & 1 - P(Y \leq 0.45 ) \\ & = & 1 - P(0.05 Z + 0.39 \leq 0.45) \\ & = & 1 - P(Z \leq \frac{0.45-0.39 }{0.05} ) \\ & = & 1 - P(Z \leq \frac{6 }{5} ) \\ & = & 1 - \frac{1}{\sqrt{2\pi} } \int_{-\infty}^{6/5} \exp(-z^2/2) dz \\ & = & 1 - F_{Z} (\frac{6}{5} ) \\ & = & 0.1150697 \end{eqnarray} \]
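The same probability in Python, two ways: by standardizing and writing \(F_Z\) through the error function, \(F_Z(x) = \frac{1}{2}\left(1 + \text{erf}(x/\sqrt{2})\right)\), and directly with `statistics.NormalDist` (which takes \(\sigma\), not \(\sigma^2\)). Both should print about 0.1150697.

```python
import math
from statistics import NormalDist

mu, sigma2 = 0.39, 0.0025
sigma = math.sqrt(sigma2)  # 0.05

# Standardize: P(Y >= 0.45) = 1 - F_Z((0.45 - mu) / sigma), with F_Z written via erf.
z = (0.45 - mu) / sigma
F_Z = 0.5 * (1 + math.erf(z / math.sqrt(2)))
print(1 - F_Z)

# The same probability without standardizing by hand.
print(1 - NormalDist(mu, sigma).cdf(0.45))
```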