Random Variables

Computational Mathematics and Statistics Camp University of Chicago September 2018

Probability mass function

Go back to our experiment example – probability comes from probability of outcomes
$P(C, T, C) = P(C)P(T)P(C) = \frac{1}{2}\frac{1}{2}\frac{1}{2} = \frac{1}{8}$
This applies to all outcomes
\[ \begin{eqnarray} p(X = 0) & = & P(C, C, C) = \frac{1}{8}\\ p(X = 1) & = & P(T, C, C) + P(C, T, C) + P(C, C, T) = \frac{3}{8} \\ p(X = 2) & = & P(T, T, C) + P(T, C, T) + P(C, T, T) = \frac{3}{8} \\ p(X = 3) & = & P(T, T, T) = \frac{1}{8} \end{eqnarray} \]
$p(X = a) = 0$, for all $a \notin \{0, 1, 2, 3\}$
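As a quick check, this pmf can be reproduced by enumerating all eight equally likely assignments. This is only an illustrative sketch; the names `outcomes` and `pmf` are assumptions, not part of the original slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each of the three units is assigned T or C with probability 1/2,
# so each of the 2**3 = 8 sequences has probability 1/8.
outcomes = list(product("TC", repeat=3))
prob_each = Fraction(1, len(outcomes))

# X counts the number of units assigned to treatment (T).
pmf = Counter()
for outcome in outcomes:
    pmf[outcome.count("T")] += prob_each

for x in sorted(pmf):
    print(x, pmf[x])
```

Values of $X$ that never occur, such as 4, are simply absent from `pmf`, which matches $p(X = a) = 0$ outside the four listed values.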

Probability mass function

Consider outcome of election
- $X(v)=1$ if $v>0.5$ otherwise $X(v) = 0$
- $P(X = 1)$ then is equal to $P(v>0.5)$

Probability mass function

If $X$ is defined on an outcome space that is discrete (countable), we’ll call it discrete
Probability Mass Function: For a discrete random variable $X$, define the probability mass function $p(x)$ as

\[ \begin{eqnarray} p(x) & = & P(X = x) \end{eqnarray} \]

Topic model example

Topics: distinct concepts (war in Afghanistan, national debt, fire department grants)
Mathematically: Probability Mass Function on Words
- Probability of using word, when discussing a topic
Suppose we have a set of words:
- (afghanistan, fire, department, soldier, troop, war, grant)
Topic 1 (war)
- P(afghanistan) = 0.3; P(fire) = 0.0001; P(department) = 0.0001; P(soldier) = 0.2; P(troop) = 0.2; P(war)=0.2997; P(grant)=0.0001
Topic 2 (fire departments ):
- P(afghanistan) = 0.0001; P(fire) = 0.3; P(department) = 0.2; P(soldier) = 0.0001; P(troop) = 0.0001; P(war)=0.0001; P(grant)=0.2997
Take a set of documents and estimate topics
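A minimal sketch of how a topic, viewed as a pmf over words, assigns probability to a short document. It assumes words are drawn independently given the topic, and it rescales each listed topic to sum to one (the Topic 2 values as listed sum to 0.8001). The example document and the helper names `normalize` and `log_likelihood` are hypothetical. The sketch scores a document under fixed topics; it does not estimate topics from documents.

```python
import math

words = ["afghanistan", "fire", "department", "soldier", "troop", "war", "grant"]
topic_war = dict(zip(words, [0.3, 0.0001, 0.0001, 0.2, 0.2, 0.2997, 0.0001]))
topic_fire = dict(zip(words, [0.0001, 0.3, 0.2, 0.0001, 0.0001, 0.0001, 0.2997]))

def normalize(pmf):
    """Rescale the word probabilities so they sum to one (an assumption of this sketch)."""
    total = sum(pmf.values())
    return {word: p / total for word, p in pmf.items()}

def log_likelihood(document, pmf):
    """Log probability of the document if each word is drawn independently from pmf."""
    return sum(math.log(pmf[word]) for word in document)

# Hypothetical document, used only for illustration.
document = ["fire", "department", "grant", "fire"]
ll_war = log_likelihood(document, normalize(topic_war))
ll_fire = log_likelihood(document, normalize(topic_fire))
print("war topic:", ll_war, "fire topic:", ll_fire)
```

The document is far more probable under the fire department topic because every word in it has high mass there and almost none under the war topic.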

Cumulative mass function

For a random variable $X$, define the cumulative mass function $F(x)$ as, \[ \begin{eqnarray} F(x) & = & P(X \leq x) \end{eqnarray} \]
Characterizes how probability accumulates as $x$ gets larger
$F(x) \in [0,1]$
$F(x)$ is non-decreasing

Three person experiment

Consider the three person experiment: $P(T) = P(C) = 1/2$
What is $F(2)$?
\[ \begin{eqnarray} F(2) & = & P(X = 0) + P(X = 1) + P(X = 2) \\ & = & \frac{1}{8} + \frac{3}{8} + \frac{3}{8} \\ & = & \frac{7}{8} \end{eqnarray} \]
What is $F(2) - F(1)$?

\[ \begin{eqnarray} F(2) - F(1) & = & [P(X = 0) + P(X = 1) + P(X = 2)] \nonumber \\ && -[P(X = 0) + P(X = 1)] \\ F(2) - F(1) & = & P(X = 2) \end{eqnarray} \]

Close relationship between pmf’s and cmf’s
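The relationship can be sketched in a few lines: the cmf is a running sum of the pmf, and differences of the cmf recover the pmf. The function name `cmf` is an assumption made for illustration.

```python
from fractions import Fraction

# pmf of the three person experiment with P(T) = P(C) = 1/2.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def cmf(x):
    """F(x) = P(X <= x): add up the pmf over all values no larger than x."""
    return sum(p for value, p in pmf.items() if value <= x)

print(cmf(2))           # 7/8
print(cmf(2) - cmf(1))  # 3/8, which equals P(X = 2)
```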

Functions of random variables

We often apply a function $g$ to a random variable $X$, giving $g(X)$
How do we compute $E[g(X)]$?
Expected value of a function of a random variable: Suppose $X$ is a discrete random variable that takes on values $x_{i}$, $i \in \{1, 2, \ldots\}$, with probabilities $p(x_{i})$. If $g:\Re \rightarrow \Re$, then the expected value $E[g(X)]$ is

\[ \begin{eqnarray} E[g(X)] & = & \sum_{i} g(x_{i}) p(x_{i} ) \end{eqnarray} \]

Example

Let’s suppose that $X$ is the number of observations assigned to treatment (from our previous example).
Suppose that $g(X) = X^2$. What is $E[g(X)]$?

\[ \begin{eqnarray} E[g(X)] = E[X^2]& = & 0^2 \times \frac{1}{8} + 1^2 \times \frac{3}{8} + 2^2 \times \frac{3}{8} + 3^2 \times \frac{1}{8} \\ & = & 0 + \frac{3}{8} + \frac{12}{8} + \frac{9}{8} \\ & = & \frac{24}{8} = 3 \end{eqnarray} \]
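The same sum can be written as a short function. This is a sketch under the pmf of the three unit experiment; the helper name `expectation` is an assumption.

```python
from fractions import Fraction

pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def expectation(g, pmf):
    """E[g(X)] = sum over x of g(x) * p(x) for a discrete random variable."""
    return sum(g(x) * p for x, p in pmf.items())

e_x = expectation(lambda x: x, pmf)        # E[X]
e_x2 = expectation(lambda x: x ** 2, pmf)  # E[X^2]
print(e_x, e_x2)
```

Note that $E[X^2] = 3$ differs from $E[X]^2 = 9/4$; in general $E[g(X)] \neq g(E[X])$.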

Bernoulli distribution

Suppose $X$ is a random variable, with $X \in \{0, 1\}$ and $P(X = 1) = \pi$. Then we will say that $X$ is a Bernoulli random variable with probability mass function

\[ \begin{eqnarray} p(k) & = & \pi^{k} (1- \pi)^{1 - k} \nonumber \end{eqnarray} \]

for $k \in \{0,1\}$ and $p(k) = 0$ otherwise.
We will (equivalently) say that

\[ \begin{eqnarray} X & \sim & \text{Bernoulli}(\pi) \nonumber \end{eqnarray} \]

Bernoulli distribution

Suppose we flip a fair coin and $Y = 1$ if the outcome is Heads

\[ \begin{eqnarray} Y & \sim & \text{Bernoulli}(1/2) \nonumber \\ p(1) & = & (1/2)^{1} (1- 1/2)^{ 1- 1} = 1/2 \nonumber \\ p(0) & = & (1/2)^{0} (1- 1/2)^{1 - 0} = (1- 1/2) = 1/2 \nonumber \end{eqnarray} \]

Expected value and variance

Suppose $Y \sim \text{Bernoulli}(\pi)$

\[ \begin{eqnarray} E[Y] & = & 1 \times P(Y = 1) + 0 \times P(Y = 0) \nonumber \\ & = & \pi + 0 (1 - \pi) \nonumber = \pi \\ \text{var}(Y) & = & E[Y^2] - E[Y]^2 \nonumber \\ E[Y^2] & = & 1^{2} P(Y = 1) + 0^{2} P(Y = 0) \nonumber \\ & = & \pi \nonumber \\ \text{var}(Y) & = & \pi - \pi^{2} \nonumber \\ & = & \pi(1 - \pi ) \nonumber \end{eqnarray} \]
$E[Y] = \pi$
Var$(Y) = \pi(1- \pi)$
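What is the maximum variance? Setting the derivative of $\pi - \pi^{2}$ with respect to $\pi$ to zero gives $1 - 2\pi = 0$, so the variance is largest at $\pi = 0.5$:

\[ \begin{eqnarray} \max_{\pi} \text{var}(Y) & = & 0.5(1 - 0.5 ) \\ & = & 0.25 \end{eqnarray} \]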
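A numerical check of the maximum, as a sketch: evaluate $\pi(1 - \pi)$ on a grid of values in $[0, 1]$ and pick the largest. The grid spacing of 0.01 is an arbitrary choice for illustration.

```python
def bernoulli_variance(pi):
    """var(Y) = pi * (1 - pi) for Y ~ Bernoulli(pi)."""
    return pi * (1 - pi)

# Candidate success probabilities 0.00, 0.01, ..., 1.00.
grid = [i / 100 for i in range(101)]
best_pi = max(grid, key=bernoulli_variance)
print(best_pi, bernoulli_variance(best_pi))
```

The variance is zero at $\pi = 0$ and $\pi = 1$, where the outcome is certain, and peaks at $\pi = 0.5$, where the outcome is hardest to predict.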

Example: Winning a war

Suppose country $1$ is engaged in a conflict and can either win or lose.
Define $Y = 1$ if the country wins and $Y = 0$ otherwise.
Then,

\[ \begin{eqnarray} Y &\sim & \text{Bernoulli}(\pi) \end{eqnarray} \]
Suppose country $1$ is deciding whether to fight a war.
Engaging in the war will cost the country $c$.
If they win, country $1$ receives $B$.
What is $1$’s expected utility from fighting a war?

\[ \begin{eqnarray} E[U(\text{war})] & = & (\text{Utility}|\text{win})\times P(\text{win}) + (\text{Utility}| \text{lose})\times P(\text{lose}) \\ &= & (B - c) P(Y = 1) + (- c) P(Y = 0 ) \\ & = & B \times P(Y = 1) - c(P(Y = 1) + P(Y = 0)) \\ & = & B \times \pi - c \end{eqnarray} \]
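The derivation can be checked numerically. In this sketch the function name `expected_utility_war` and the values of `B`, `c`, and `pi` are hypothetical, chosen only to illustrate the formula.

```python
import math

def expected_utility_war(B, c, pi):
    """Expected utility of fighting: win (B - c) with probability pi, lose (-c) otherwise."""
    utility_if_win = B - c
    utility_if_lose = -c
    return utility_if_win * pi + utility_if_lose * (1 - pi)

# Hypothetical benefit, cost, and probability of winning.
B, c, pi = 10.0, 3.0, 0.5
eu = expected_utility_war(B, c, pi)
print(eu, B * pi - c)
```

Both printed numbers agree, reflecting that the cost $c$ is paid whether or not the country wins.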

Binomial distribution

A model to count the number of successes across $N$ trials
Suppose $X$ is a random variable that counts the number of successes in $N$ independent and identically distributed Bernoulli trials. Then $X$ is a Binomial random variable,

\[ \begin{eqnarray} p(k) & = & {{N}\choose{k}}\pi^{k} (1- \pi)^{N-k} \nonumber \end{eqnarray} \]
for $k \in \{0, 1, 2, \ldots, N\}$ and $p(k) = 0$ otherwise
$\binom{N}{k} = \frac{N!}{(N-k)! k!}$.
Equivalently,

\[ \begin{eqnarray} X & \sim & \text{Binomial}(N, \pi) \nonumber \end{eqnarray} \]

Example

Recall our experiment example:
- $P(T) = P(C) = 1/2$
- $Z =$ number of units assigned to treatment
  
  \[ \begin{eqnarray} Z & \sim & \text{Binomial}(3, 1/2)\\ p(0) & = & {{3}\choose{0}} (1/2)^{0} (1- 1/2)^{3-0} = 1 \times \frac{1}{8}\\ p(1) & = & {{3}\choose{1}} (1/2)^{1} (1 - 1/2)^{2} = 3 \times \frac{1}{8} \\ p(2) & = & {{3}\choose{2}} (1/2)^{2} (1- 1/2)^1 = 3 \times \frac{1}{8} \\ p(3) & = & {{3}\choose{3}} (1/2)^{3} (1 - 1/2)^{0} = 1 \times \frac{1}{8} \end{eqnarray} \]
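These four probabilities can be computed directly. The sketch below assumes $N = 3$ trials with success probability $1/2$; the function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(k, n, pi):
    """P(Z = k) for Z counting successes in n independent Bernoulli(pi) trials."""
    if k < 0 or k > n:
        return 0.0
    # comb(n, k) counts the orderings with k successes; each has
    # probability pi**k * (1 - pi)**(n - k).
    return comb(n, k) * pi ** k * (1 - pi) ** (n - k)

probs = [binomial_pmf(k, 3, 0.5) for k in range(4)]
print(probs)
```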

Expected value and variance

\[Z = \sum_{i=1}^{N} Y_{i} \text{ where the } Y_{i} \text{ are independent and } Y_{i} \sim \text{Bernoulli}(\pi)\]

\[ \begin{eqnarray} E[Z] & = & E[Y_{1} + Y_{2} + Y_{3} + \ldots + Y_{N} ] \\ & = & \sum_{i=1}^{N} E[Y_{i} ] \\ & = & N \pi \\ \text{var}(Z) & = & \sum_{i=1}^{N} \text{var}(Y_{i}) \\ & = & N \pi (1-\pi) \end{eqnarray} \]
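As a sanity check, the mean and variance can also be computed directly from the binomial pmf and compared with $N\pi$ and $N\pi(1 - \pi)$. The values `n = 10` and `pi = 0.3` are hypothetical, picked only for illustration.

```python
from math import comb, isclose

def binomial_pmf(k, n, pi):
    """P(Z = k) for Z ~ Binomial(n, pi)."""
    return comb(n, k) * pi ** k * (1 - pi) ** (n - k)

n, pi = 10, 0.3  # hypothetical parameters

# E[Z] and E[Z^2] as probability-weighted sums over k = 0, ..., n.
mean = sum(k * binomial_pmf(k, n, pi) for k in range(n + 1))
second_moment = sum(k ** 2 * binomial_pmf(k, n, pi) for k in range(n + 1))
variance = second_moment - mean ** 2

print(mean, n * pi)
print(variance, n * pi * (1 - pi))
```

The variance step relies on the trials being independent; the expectation step does not.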

Poisson

Often interested in counting number of events that occur:
- Number of wars started
- Number of speeches made
- Number of bribes offered
- Number of people waiting for license
Generally referred to as event counts

Poisson

Suppose $X$ is a random variable that takes on values $X \in \{0, 1, 2, \ldots\}$ and that $P(X = k) = p(k)$ is,

\[ \begin{eqnarray} p(k) & = & e^{-\lambda} \frac{\lambda^{k}}{k!} \nonumber \end{eqnarray} \]

for $k \in \{0, 1, \ldots\}$ and $0$ otherwise. Then we will say that $X$ follows a Poisson distribution with rate parameter $\lambda$
\[ \begin{eqnarray} X & \sim & \text{Poisson}(\lambda) \nonumber \end{eqnarray} \]
$E[X] = \text{var}(X) = \lambda$

Example

Suppose the number of threats a president makes in a term is given by $X \sim \text{Poisson}(5)$
What is the probability the president will make ten or more threats?

\[ \begin{eqnarray} P(X \geq 10) & = & e^{-5} \sum_{k=10}^{\infty} \frac{5^{k}}{k!} \\ & = & 1 - P(X< 10 ) \\ & = & 1 - e^{-5} \sum_{k=0}^{9} \frac{5^{k}}{k!} \approx 0.0318 \end{eqnarray} \]
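The complement form is convenient because it replaces an infinite sum with ten terms. A minimal sketch, with the function name `poisson_pmf` as an assumption:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam ** k / factorial(k)

lam = 5
# P(X < 10) = P(X = 0) + ... + P(X = 9)
p_less_than_10 = sum(poisson_pmf(k, lam) for k in range(10))
p_at_least_10 = 1 - p_less_than_10
print(round(p_at_least_10, 4))
```

The result is roughly 0.03, so ten or more threats in a term would be unusual under this model.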

Continuous random variables

Random variables that are not discrete
- Approval ratings
- GDP
- Wait time between wars: $X(t) = t$ for all $t$
- Proportion of vote received: $X(v) = v$ for all $v$
Many analogues to discrete probability distributions
We need calculus to answer questions about probability

Probability density function

What is the area under the curve $f(x)$ between $0.5$ and $2$?

\[\int_{1/2}^{2} f(x)dx = F(2) - F(1/2)\]

Definition

$X$ is a continuous random variable if there exists a nonnegative function $f(x)$, defined for all $x \in \Re$, such that for any (measurable) set of real numbers $B$,

\[ \begin{eqnarray} P(X \in B) & = & \int_{B} f(x)dx \nonumber \end{eqnarray} \]
- Non-negative meaning $f(x)$ is never negative
We’ll call $f(\cdot)$ the density function for $X$

Uniform Random Variable

$X \sim \text{Uniform}(0,1)$ if

\[ \begin{eqnarray} f(x) & = & 1 \text{ if } x \in [0,1] \\ f(x) & = & 0 \text{ otherwise } \end{eqnarray} \]

Uniform Random Variable

\[ \begin{eqnarray} P(X \in [0.2, 0.5]) & = & \int_{0.2}^{0.5} 1 dx \nonumber \\ & = & x |^{0.5}_{0.2} \nonumber \\ & = & 0.5 - 0.2 \nonumber \\ & = & 0.3\nonumber \end{eqnarray} \]

Uniform Random Variable

\[ \begin{eqnarray} P(X \in [0, 1] ) & = & \int_{0}^{1} 1 dx \nonumber \\ & = & x |^{1}_{0} \nonumber \\ & = & 1 - 0 \nonumber \\ & = & 1 \nonumber \end{eqnarray} \]

Uniform Random Variable

\[ \begin{eqnarray} P(X \in [0.5, 0.5]) & = & \int_{0.5}^{0.5} 1dx \nonumber \\ & = & x|^{0.5}_{0.5} \nonumber \\ & = & 0.5 - 0.5 \nonumber \\ & = & 0 \nonumber \end{eqnarray} \]

Uniform Random Variable

\[ \begin{eqnarray} P(X \in \{[0, 0.2]\cup[0.5, 1]\}) & = & \int_{0}^{0.2} 1dx + \int_{0.5}^{1} 1dx \nonumber \\ & = & x|_{0}^{0.2} + x|_{0.5}^{1} \nonumber \\ & = & (0.2 - 0) + (1 - 0.5) \nonumber \\ & = & 0.7 \nonumber \end{eqnarray} \]

Uniform Random Variable

To summarize
- $P(X = a) = 0$ – probability at any point is 0 (instantaneous)
- $P(X \in (-\infty, \infty) ) = 1$ – probability over the entire real number line is 1
- If $F$ is an antiderivative of $f$, then $P(X \in [c,d]) = F(d) - F(c)$ (Fundamental theorem of calculus)
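The uniform examples can be reproduced with $F(d) - F(c)$. This sketch assumes $X \sim \text{Uniform}(0,1)$, for which $F(t) = t$ on $[0,1]$, $0$ below, and $1$ above; the function names are illustrative.

```python
def uniform_cdf(t):
    """F(t) = P(X <= t) for X ~ Uniform(0, 1): clamp t to the interval [0, 1]."""
    return min(max(t, 0.0), 1.0)

def prob_interval(c, d):
    """P(X in [c, d]) = F(d) - F(c)."""
    return uniform_cdf(d) - uniform_cdf(c)

p_middle = prob_interval(0.2, 0.5)                           # interval [0.2, 0.5]
p_all = prob_interval(0.0, 1.0)                              # whole support
p_point = prob_interval(0.5, 0.5)                            # a single point
p_union = prob_interval(0.0, 0.2) + prob_interval(0.5, 1.0)  # disjoint union
print(round(p_middle, 10), p_all, p_point, round(p_union, 10))
```

Probabilities of disjoint intervals add, and a single point has probability zero because the interval has no width.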

Cumulative distribution function

Probability density function ($f$) characterizes distribution of continuous random variable
Equivalently, cumulative distribution function characterizes continuous random variables
For a continuous random variable $X$ define its cumulative distribution function $F(x)$ as,

\[ \begin{eqnarray} F(t) & = & P(X \leq t) = \int_{-\infty} ^{t} f(x) dx \nonumber \end{eqnarray} \]
pdf integrated $\leadsto$ cdf
cdf differentiated $\leadsto$ pdf

Example: uniform distribution

Suppose $X \sim \text{Uniform}(0,1)$, then

\[ \begin{eqnarray} F(t) & = & P(X\leq t) \\ & = & 0 \text{, if $t< 0$ } \\ & = & 1 \text{, if $t >1$ } \\ & = & t \text{, if $t \in [0,1]$} \end{eqnarray} \]

Expectations of continuous random variables

If $X$ is a continuous random variable then,

\[ \begin{eqnarray} E[X] & = & \int_{-\infty}^{\infty} x f(x) dx \nonumber \end{eqnarray} \]
Suppose $X \sim \text{Uniform}(0,1)$. What is $E[X]$?

\[ \begin{eqnarray} E[X] & = & \int_{-\infty}^{\infty} xf(x)dx \\ & = & \int_{-\infty}^{0} x \cdot 0 \, dx + \int_{0}^{1} x \cdot 1 \, dx + \int_{1}^{\infty} x \cdot 0 \, dx \\ & = & 0 + \frac{x^{2}}{2} |^{1}_{0} + 0 \\ & = & 0 + \frac{1}{2} + 0 \\ & = & \frac{1}{2} \end{eqnarray} \]

Expectations of functions

Suppose $X$ is a continuous random variable and $g:\Re \rightarrow \Re$
Then,

\[ \begin{eqnarray} E[g(X)] & = & \int_{-\infty}^{\infty} g(x)f(x)dx \nonumber \end{eqnarray} \]
Suppose $g(X) = X^2$ and $X \sim \text{Uniform}(0,1)$. What is $E[g(X)]$?

\[ \begin{eqnarray} E[g(X)] & = & \int_{-\infty}^{\infty} g(x)f(x)dx \\ & = & \int_{0}^{1} x^2dx \\ & = & \frac{x^3}{3}|^{1}_{0} \\ & = & \frac{1}{3} \end{eqnarray} \]

Variance of functions

If $X$ is a continuous random variable, define its variance, $Var(X)$,

\[ \begin{eqnarray} Var(X) & = & E[(X- E[X])^2] \nonumber \\ & = & \int_{-\infty}^{\infty} (x - E[X])^2f(x) dx \nonumber \\ & = & E[X^2] - E[X]^2 \nonumber \end{eqnarray} \]

Variance of functions

$X \sim \text{Uniform}(0,1)$. What is $Var(X)$?

\[ \begin{eqnarray} E[X^2] & = & \frac{1}{3} \\ E[X]^2 & = & \left(\frac{1}{2}\right)^2 \\ & = & \frac{1}{4} \end{eqnarray} \]

\[ \begin{eqnarray} Var(X) & =& E[X^2] - E[X]^2 \\ & = & \frac{1}{3} - \frac{1}{4} = \frac{1}{12} \end{eqnarray} \]
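These integrals can be approximated numerically as a check on the calculus. The sketch below uses a midpoint Riemann sum, which is a choice made here for simplicity; `integrate` and `steps` are hypothetical names.

```python
def integrate(h, a, b, steps=10_000):
    """Approximate the integral of h over [a, b] with a midpoint Riemann sum."""
    width = (b - a) / steps
    return sum(h(a + (i + 0.5) * width) for i in range(steps)) * width

def f(x):
    """Density of X ~ Uniform(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

# The density is zero outside [0, 1], so it suffices to integrate over [0, 1].
e_x = integrate(lambda x: x * f(x), 0.0, 1.0)
e_x2 = integrate(lambda x: x ** 2 * f(x), 0.0, 1.0)
var_x = e_x2 - e_x ** 2
print(e_x, e_x2, var_x)
```

The printed values should be close to $1/2$, $1/3$, and $1/12$.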

Normal (Gaussian)

Suppose $X$ is a random variable with $X \in \Re$ and density
\[ \begin{eqnarray} f(x) & = & \frac{1}{\sqrt{2\pi \sigma^2}}\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) \nonumber \end{eqnarray} \]
Then $X$ is a normally distributed random variable with parameters $\mu$ and $\sigma^2$
Equivalently, we’ll write

\[ \begin{eqnarray} X & \sim & \text{Normal}(\mu, \sigma^2) \nonumber \end{eqnarray} \]

Support for President Obama

Suppose we are interested in modeling presidential approval

Let $Y$ represent random variable: proportion of population who “approves job president is doing”
Individual responses (that constitute proportion) are independent and identically distributed and we take the average of those individual responses
Observe many responses ($N\rightarrow \infty$)
Then (by the Central Limit Theorem) $Y$ is approximately Normally distributed, or

\[ \begin{eqnarray} Y& \sim & \text{Normal}(\mu, \sigma^2) \\ f(y) & = & \frac{\exp\left(-\frac{(y-\mu)^2}{2\sigma^2} \right)}{\sqrt{2\pi \sigma^2}} \end{eqnarray} \]

Expected Value/Variance of Normal Distribution

$Z$ has a standard normal distribution if

\[ \begin{eqnarray} Z & \sim & \text{Normal}(0,1) \nonumber \end{eqnarray} \]
We’ll write the cumulative distribution function of $Z$ as

\[ \begin{eqnarray} F_{Z}(x) & = & \frac{1}{\sqrt{2\pi} }\int_{-\infty}^{x} \exp(-z^2/2) dz \end{eqnarray} \]

Expected Value/Variance of Normal Distribution

Suppose $Z \sim \text{Normal}(0,1)$
- $Y = 2Z + 6$
- $Y \sim \text{Normal}(6, 4)$

Expected Value/Variance of Normal Distribution

Scale/Location: If $Z \sim N(0,1)$, then $X = aZ + b$ is,

\[ \begin{eqnarray} X & \sim & \text{Normal} (b, a^2) \nonumber \end{eqnarray} \]
Assume we know:

\[ \begin{eqnarray} E[Z] & = & 0 \nonumber \\ Var(Z) & = & 1 \nonumber \end{eqnarray} \]
This implies that, for $Y \sim \text{Normal}(\mu, \sigma^2)$

\[ \begin{eqnarray} E[Y] & = & E[\sigma Z + \mu] \\ & = & \sigma E[Z] + \mu \nonumber \\ & = & \mu \nonumber \\ Var(Y) & = & Var(\sigma Z + \mu) \\ & = & \sigma^2 Var(Z) + Var(\mu) \\ & = & \sigma^2 + 0 \\ & =& \sigma^2 \end{eqnarray} \]
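The scale/location rule can be illustrated with Python's standard library class `statistics.NormalDist` (Python 3.8 or later), which supports multiplying by and adding a constant. Note that `NormalDist` is parameterized by the standard deviation $\sigma$, not the variance $\sigma^2$. The example reuses $Y = 2Z + 6$.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)

# Scale by a = 2 and shift by b = 6; the result is again a normal distribution.
y = z * 2 + 6

print(y.mean, y.stdev, y.variance)
```

The mean shifts by $b$ and the variance scales by $a^2$, so $Y \sim \text{Normal}(6, 4)$.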

Why rely on the standard normal distribution

Normal distribution is commonly used in statistical analysis
Standardizing this makes it easier to make comparisons across variables with different ranges/variances
Unitless measurement
Saves time on the calculus

Back To Obama

Suppose $\mu = 0.39$ and $\sigma^2 = 0.0025$
What is $P(Y\geq 0.45)$ (the probability that approval isn’t that bad)?

\[ \begin{eqnarray} P(Y \geq 0.45) & = & 1 - P(Y \leq 0.45 ) \\ & = & 1 - P(0.05 Z + 0.39 \leq 0.45) \\ & = & 1 - P\left(Z \leq \frac{0.45-0.39 }{0.05} \right) \\ & = & 1 - P\left(Z \leq \frac{6 }{5} \right) \\ & = & 1 - \frac{1}{\sqrt{2\pi} } \int_{-\infty}^{6/5} \exp(-z^2/2) dz \\ & = & 1 - F_{Z} \left(\frac{6}{5} \right) \\ & \approx & 0.1150697 \end{eqnarray} \]
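The final number comes from evaluating the standard normal cdf, which has no closed form. A minimal sketch using `statistics.NormalDist` from the Python standard library, computed both directly and via standardization; the variable names are illustrative.

```python
from statistics import NormalDist

mu = 0.39
sigma = 0.0025 ** 0.5  # standard deviation is the square root of the variance

# Directly from the distribution of Y.
approval = NormalDist(mu, sigma)
p_direct = 1 - approval.cdf(0.45)

# Equivalently, standardize and use the standard normal cdf F_Z.
z_score = (0.45 - mu) / sigma
p_standardized = 1 - NormalDist(0, 1).cdf(z_score)

print(round(p_direct, 7), round(p_standardized, 7))
```

Both routes give about 0.115, so there is roughly an 11.5 percent chance that approval is at least 45 percent under this model.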