Definition 20.1 (Expectation) The expectation of a random variable X is its probability-weighted average value
\mathbb{E}_{X} [X]
= \sum_{x \in \mathbb{R}} x \mathbb{P}_{X} (x)
= \sum_{\omega \in \Omega} X (\omega) \mathbb{P} (\omega).
By definition, the expectation of a scalar a is the scalar itself
\mathbb{E}_{X} [a] = a.
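For example, if X is the outcome of a fair six-sided die, so that \mathbb{P}_{X} (x) = \frac{1}{6} for x \in \{1, \dots, 6\}, then
\mathbb{E}_{X} [X] = \sum_{x = 1}^{6} x \cdot \frac{1}{6} = \frac{21}{6} = 3.5.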
Corollary 20.1 (Linearity of expectation) For any random variables X, Y: \Omega \to \mathbb{R} that are defined on the same probability space (\Omega, \mathcal{F}, \mathbb{P}), we have
\mathbb{E}_{X, Y} [X + Y] = \mathbb{E}_{X} [X] + \mathbb{E}_{Y} [Y],
and for any scalars a, b \in \mathbb{R}
\mathbb{E}_{X} [a X + b] = a \mathbb{E}_{X} [X] + b.
For the first property, we have
\begin{aligned}
\mathbb{E}_{X, Y} [X + Y] & = \sum_{\omega \in \Omega} (X + Y) (\omega) \mathbb{P} (\omega)
\\
& = \sum_{\omega \in \Omega} (X (\omega) + Y (\omega)) \mathbb{P} (\omega)
\\
& = \sum_{\omega \in \Omega} X (\omega) \mathbb{P} (\omega) + \sum_{\omega \in \Omega} Y (\omega) \mathbb{P} (\omega)
\\
& = \mathbb{E} [X] + \mathbb{E} [Y],
\end{aligned}
and for the second property
\begin{aligned}
\mathbb{E} [aX + b]
& = \sum_{\omega \in \Omega} (aX + b) (\omega) \mathbb{P} (\omega)
\\
& = \sum_{\omega \in \Omega} (aX (\omega) + b) \mathbb{P} (\omega)
\\
& = \sum_{\omega \in \Omega} aX (\omega) \mathbb{P} (\omega)
+ \sum_{\omega \in \Omega} b \mathbb{P} (\omega)
\\
& = a \sum_{\omega \in \Omega} X (\omega) \mathbb{P} (\omega)
+ b \sum_{\omega \in \Omega} \mathbb{P} (\omega)
\\
& = a\mathbb{E} [X] + b
& [\sum_{\omega \in \Omega} \mathbb{P} (\omega) = 1].
\end{aligned}
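As a concrete check with the fair-die example above, for two fair dice X, Y we have \mathbb{E}_{X, Y} [X + Y] = 3.5 + 3.5 = 7, and \mathbb{E}_{X} [2 X + 3] = 2 \cdot 3.5 + 3 = 10.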
The “law of the unconscious statistician” states that the expectation of a transformed random variable can be computed without first deriving the distribution of the transformed variable: one simply applies the probability weights of the original random variable to the transformed values.
Corollary 20.2 (Law of the unconscious statistician (LOTUS)) The expectation of a random variable Y = g (X) that is a function of the random variable X is
\mathbb{E}_{Y} [Y]
= \sum_{y \in \mathbb{R}} y \mathbb{P}_{Y} (y)
= \sum_{x \in \mathbb{R}} g (x) \mathbb{P}_{X} (x)
= \mathbb{E}_{X} [g (X)].
Since the probability distribution of Y is \mathbb{P}_{Y} (y) = \sum_{x \in \Omega_X : g(x) = y} \mathbb{P}_{X} (x), where \Omega_X and \Omega_Y denote the ranges of X and Y, we have
\begin{aligned}
\sum_{y \in \Omega_Y} y \mathbb{P}_Y (y)
& = \sum_{y \in \Omega_Y} y \sum_{x \in \Omega_X : g(x)=y} \mathbb{P}_X (x)
\\
& = \sum_{y \in \Omega_Y} \sum_{x \in \Omega_X : g(x)=y} y \mathbb{P}_X (x)
\\
& = \sum_{y \in \Omega_Y} \sum_{x \in \Omega_X : g(x)=y} g(x) \mathbb{P}_X (x)
\\
& = \sum_{x \in \Omega_X} g(x) \mathbb{P}_X (x)
\\
& = \mathbb{E} [g(X)]
\end{aligned}
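For instance, applying LOTUS to the fair-die example with g (x) = x^{2} gives
\mathbb{E}_{X} [X^{2}] = \sum_{x = 1}^{6} x^{2} \cdot \frac{1}{6} = \frac{1 + 4 + 9 + 16 + 25 + 36}{6} = \frac{91}{6},
without ever computing the probability distribution of Y = X^{2}.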
Corollary 20.3 The expectation of the product of independent random variables is the product of their expectations
\mathbb{E}_{X, Y} [X Y] = \mathbb{E}_{X} [X] \mathbb{E}_{Y} [Y].
By Definition 20.1 and Corollary 20.2, we have
\begin{aligned}
\mathbb{E}_{X, Y} [X Y]
& = \sum_{x \in \mathbb{R}} \sum_{y \in \mathbb{R}} x y \mathbb{P}_{X, Y} (x, y)
\\
& = \sum_{x \in \mathbb{R}} \sum_{y \in \mathbb{R}} x y \mathbb{P}_{X} (x) \mathbb{P}_{Y} (y)
& [X, Y \text{ independent}]
\\
& = \sum_{x \in \mathbb{R}} x \mathbb{P}_{X} (x) \sum_{y \in \mathbb{R}} y \mathbb{P}_{Y} (y)
\\
& = \mathbb{E}_{X} [X] \mathbb{E}_{Y} [Y].
\end{aligned}
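For example, for two independent fair dice X and Y,
\mathbb{E}_{X, Y} [X Y] = \mathbb{E}_{X} [X] \mathbb{E}_{Y} [Y] = 3.5 \times 3.5 = 12.25.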
Conditional expectation
Definition 20.2 (Conditional expectation) Let X, Y be jointly distributed random variables. Then the conditional expectation of X given the event that Y = y is
\mathbb{E}_{X \mid Y} [X \mid y] = \sum_{x \in \mathbb{R}} x \mathbb{P}_{X \mid Y} (x \mid y),
which is a function of y.
Using Corollary 20.2, the conditional expectation of a transformed random variable g (X) is
\mathbb{E}_{X \mid Y} [g (X) \mid y] = \sum_{x \in \mathbb{R}} g (x) \mathbb{P}_{X \mid Y} (x \mid y).
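For example, let X be a fair die and let Y = 1 if X is even and Y = 0 otherwise. Then \mathbb{P}_{X \mid Y} (x \mid 1) = \frac{1}{3} for x \in \{2, 4, 6\}, so
\mathbb{E}_{X \mid Y} [X \mid 1] = \frac{2 + 4 + 6}{3} = 4,
and similarly \mathbb{E}_{X \mid Y} [X \mid 0] = 3.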
Theorem 20.1 (Law of total expectation (LTE)) Let X, Y be jointly distributed random variables. The expectation of g (X) can be calculated from its conditional expectation
\mathbb{E}_{X} [g (X)] = \sum_{y \in \mathbb{R}} \mathbb{E}_{X \mid Y} [g (X) \mid y] \mathbb{P}_{Y} (y).
By expanding \mathbb{E}_{X \mid Y} [g (X) \mid y], we have
\begin{aligned}
\sum_{y \in \mathbb{R}} \mathbb{E}_{X \mid Y} [g(X) \mid y] \mathbb{P}_{Y} (y)
& = \sum_{y \in \mathbb{R}} \left(
\sum_{x \in \mathbb{R}} g(x) \mathbb{P}_{X \mid Y} (x \mid y)
\right) \mathbb{P}_{Y} (y)
\\
& = \sum_{x \in \mathbb{R}} \sum_{y \in \mathbb{R}} g (x) \mathbb{P}_{X \mid Y} (x \mid y) \mathbb{P}_{Y} (y)
\\
& = \sum_{x \in \mathbb{R}} g (x) \sum_{y \in \mathbb{R}} \mathbb{P}_{X, Y} (x, y)
\\
& = \sum_{x \in \mathbb{R}} g (x) \mathbb{P}_{X} (x)
\\
& = \mathbb{E}_{X} [g (X)].
\end{aligned}
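Continuing the even/odd die example with g (x) = x, the law of total expectation gives
\mathbb{E}_{X} [X] = \mathbb{E}_{X \mid Y} [X \mid 1] \mathbb{P}_{Y} (1) + \mathbb{E}_{X \mid Y} [X \mid 0] \mathbb{P}_{Y} (0) = 4 \cdot \frac{1}{2} + 3 \cdot \frac{1}{2} = 3.5,
which agrees with the direct calculation of \mathbb{E}_{X} [X].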
Variance
The variance summarizes how much a random variable deviates from its mean on average.
Definition 20.3 The variance of a random variable X is defined to be
\mathrm{Var} [X] = \mathbb{E}_{X} [(X - \mathbb{E}_{X} [X])^{2}].
Corollary 20.4 The variance can also be calculated as
\mathrm{Var} [X] = \mathbb{E}_{X} [X^{2}] - \mathbb{E}_{X} [X]^{2}.
Let \mu = \mathbb{E}_{X} [X]. By Definition 20.3, we have
\begin{aligned}
\mathrm{Var} [X]
& = \mathbb{E}_{X} [(X - \mu)^2]
\\
& = \mathbb{E}_{X} [X^2 - 2 \mu X + \mu^2]
\\
& = \mathbb{E}_{X} [X^2] - 2 \mu\mathbb{E} [X] + \mu^2
& [\text{linearity of expectation}]
\\
& = \mathbb{E}_{X} [X^2] - 2 \mathbb{E} [X]^2 + \mathbb{E} [X]^2
\\
& = \mathbb{E}_{X} [X^2] - \mathbb{E} [X]^2
\end{aligned}
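For the fair-die example, Corollary 20.4 gives
\mathrm{Var} [X] = \mathbb{E}_{X} [X^{2}] - \mathbb{E}_{X} [X]^{2} = \frac{91}{6} - 3.5^{2} = \frac{35}{12} \approx 2.92.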
Corollary 20.5 For any scalars a, b \in \mathbb{R},
\mathrm{Var} [a X + b] = a^{2} \mathrm{Var} [X].
First we show that the variance is invariant to shifting. By Definition 20.3,
\begin{aligned}
\mathrm{Var} [X + b]
& = \mathbb{E} [((X + b) - \mathbb{E} [X + b])^{2}]
\\
& = \mathbb{E} [(X + b - \mathbb{E} [X] - b)^2]
\\
& = \mathbb{E} [(X - \mathbb{E} [X])^2]
\\
& = \mathrm{Var} [X].
\end{aligned}
Then we show that scaling a random variable by a scales its variance by a^{2}. By Corollary 20.4,
\begin{aligned}
\mathrm{Var} [aX]
& = \mathbb{E} [(aX)^2] - (\mathbb{E} [aX])^2
\\
& = \mathbb{E} [a^2 X^2] - (a \mathbb{E} [X])^2
\\
& = a^2 \mathbb{E} [X^2] - a^2 \mathbb{E} [X]^2
\\
& = a^2 (\mathbb{E} [X^2] - \mathbb{E} [X]^2)
\\
& = a^2 \mathrm{Var} [X].
\end{aligned}
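Combining the two parts gives \mathrm{Var} [a X + b] = a^{2} \mathrm{Var} [X]; for the fair die, \mathrm{Var} [2 X + 3] = 4 \cdot \frac{35}{12} = \frac{35}{3}.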
Standard deviation
Another measure of a random variable X’s spread is the standard deviation.
Definition 20.4 The standard deviation of a random variable X is
\sigma_{X} = \sqrt{\mathrm{Var} [X]}.
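For the fair die, \sigma_{X} = \sqrt{35 / 12} \approx 1.71. Unlike the variance, the standard deviation is expressed in the same units as X itself.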
Covariance
Given two random variables X, Y with a joint distribution \mathbb{P}_{X, Y} (x, y), the covariance describes how they vary together. If the covariance is positive, larger values of one variable tend to occur with larger values of the other; if it is negative, larger values of one tend to occur with smaller values of the other.
Definition 20.5 (Covariance) Let X, Y be random variables. The covariance between X and Y is
\mathrm{Cov} [X, Y] = \mathbb{E}_{X, Y} [(X - \mathbb{E}_{X} [X]) (Y - \mathbb{E}_{Y} [Y])].
Corollary 20.6 The covariance can also be calculated as
\mathrm{Cov} [X, Y] = \mathbb{E}_{X, Y} [X Y] - \mathbb{E}_{X} [X] \mathbb{E}_{Y} [Y].
Let \mu_{X} = \mathbb{E}_{X} [X] and \mu_{Y} = \mathbb{E}_{Y} [Y]. By Definition 20.5, we have
\begin{aligned}
\mathrm{Cov} [X, Y]
& = \mathbb{E}_{X, Y} [(X - \mu_{X}) (Y - \mu_{Y})]
\\
& = \mathbb{E}_{X, Y} [X Y - X \mu_{Y} - \mu_{X} Y + \mu_{X} \mu_{Y}]
\\
& = \mathbb{E}_{X, Y} [X Y] - \mu_{X} \mu_{Y} - \mu_{X} \mu_{Y} + \mu_{X} \mu_{Y}
\\
& = \mathbb{E}_{X, Y} [X Y] - \mathbb{E}_{X} [X] \mathbb{E}_{Y} [Y].
\end{aligned}
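As a small example, let X take values 0 and 1 with equal probability (a fair coin) and let Y = X. Then \mathbb{E}_{X, Y} [X Y] = \mathbb{E}_{X} [X^{2}] = \frac{1}{2} and \mathbb{E}_{X} [X] \mathbb{E}_{Y} [Y] = \frac{1}{4}, so \mathrm{Cov} [X, Y] = \frac{1}{2} - \frac{1}{4} = \frac{1}{4}, which is exactly \mathrm{Var} [X].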
Corollary 20.7 The covariance has the following properties.
Invariant to shifting
\mathrm{Cov} [X + a, Y] = \mathrm{Cov} [X, Y].
Linear transformation
\mathrm{Cov} [a X + b Y, Z] = a \mathrm{Cov} [X, Z] + b \mathrm{Cov} [Y, Z].
Covariance of sum of random variables
\mathrm{Cov} [X + A, Y + B] = \mathrm{Cov} [X, Y] + \mathrm{Cov} [X, B] + \mathrm{Cov} [A, Y] + \mathrm{Cov} [A, B].
All three properties can be proved from the definition of the covariance (Definition 20.5) using Corollary 20.1.
\begin{aligned}
\mathrm{Cov} [X + a, Y]
& = \mathbb{E}_{X, Y} [(X + a - \mathbb{E} [X + a]) (Y - \mathbb{E} [Y])]
\\
& = \mathbb{E}_{X, Y} [(X + a - \mathbb{E} [X] - a) (Y - \mathbb{E} [Y])]
\\
& = \mathbb{E}_{X, Y} [(X - \mathbb{E} [X]) (Y - \mathbb{E} [Y])]
\\
& = \mathrm{Cov} [X, Y].
\end{aligned}
\begin{aligned}
\mathrm{Cov} [aX + bY, Z]
& = \mathbb{E}_{X, Y, Z} [(aX + bY - \mathbb{E}_{X, Y} [aX + bY]) (Z - \mathbb{E}_{Z} [Z])]
\\
& = \mathbb{E}_{X, Y, Z} [a(X - \mathbb{E}_{X} [X]) (Z - \mathbb{E}_{Z} [Z]) + b(Y - \mathbb{E}_{Y} [Y]) (Z - \mathbb{E}_{Z} [Z])]
\\
& = a\mathbb{E}_{X, Z} [(X - \mathbb{E}_{X} [X]) (Z - \mathbb{E}_{Z} [Z])] + b\mathbb{E}_{Y, Z} [(Y - \mathbb{E}_{Y} [Y]) (Z - \mathbb{E}_{Z} [Z])]
\\
& = a\mathrm{Cov} [X, Z] + b\mathrm{Cov} [Y, Z].
\end{aligned}
\begin{aligned}
\mathrm{Cov} [X + A, Y + B] &
= \mathbb{E}_{X, A, Y, B} [(X + A - \mathbb{E}_{X, A} [X + A]) (Y + B - \mathbb{E}_{Y, B} [Y + B])]
\\
& = \mathbb{E}_{X, A, Y, B} [(X - \mathbb{E}_{X} [X] + A - \mathbb{E}_{A} [A]) (Y - \mathbb{E}_{Y} [Y] + B - \mathbb{E}_{B} [B])]
\\
& = \mathbb{E}_{X, Y} [(X - \mathbb{E}_{X} [X]) (Y - \mathbb{E}_{Y} [Y])] + \mathbb{E}_{X, B} [(X - \mathbb{E}_{X} [X]) (B - \mathbb{E}_{B} [B])]
\\
& \quad + \mathbb{E}_{A, Y} [(A - \mathbb{E}_{A} [A]) (Y - \mathbb{E}_{Y} [Y])] + \mathbb{E}_{A, B} [(A - \mathbb{E}_{A} [A]) (B - \mathbb{E}_{B} [B])]
\\
& = \mathrm{Cov} [X, Y] + \mathrm{Cov} [X, B] + \mathrm{Cov} [A, Y] + \mathrm{Cov} [A, B].
\end{aligned}
Corollary 20.8 If X and Y are independent, their covariance is 0
\mathrm{Cov} [X, Y] = 0.
According to Corollary 20.3, we have that
\mathbb{E}_{X, Y} [X Y] = \mathbb{E}_{X} [X] \mathbb{E}_{Y} [Y].
Therefore according to Corollary 20.6, we have
\mathrm{Cov} [X, Y] = \mathbb{E}_{X, Y} [X Y] - \mathbb{E}_{X} [X] \mathbb{E}_{Y} [Y] = 0.
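For the two independent fair dice above, \mathrm{Cov} [X, Y] = 12.25 - 3.5 \times 3.5 = 0, as expected.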
Corollary 20.9 For any random variables X and Y
\mathrm{Var} [X + Y] = \mathrm{Var} [X] + \mathrm{Var} [Y] + 2 \mathrm{Cov} [X, Y].
Since the variance of a random variable is its covariance with itself,
\begin{aligned}
\mathrm{Var} [X + Y]
& = \mathrm{Cov} [X + Y, X + Y]
\\
& = \mathrm{Cov} [X, X] + \mathrm{Cov} [X, Y] + \mathrm{Cov} [Y, X] + \mathrm{Cov} [Y, Y]
\\
& = \mathrm{Var} [X] + \mathrm{Var} [Y] + 2 \mathrm{Cov} [X, Y].
\end{aligned}
where we used the third property in Corollary 20.7 in the second equality.
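For two independent fair dice, \mathrm{Cov} [X, Y] = 0 by Corollary 20.8, so \mathrm{Var} [X + Y] = \frac{35}{12} + \frac{35}{12} = \frac{35}{6}.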
Correlation
The problem with covariance as a description of the relation between two random variables is that its value is affected by the scale (variance) of each individual variable. The correlation coefficient is a normalized covariance that is invariant to the scaling of the individual random variables.
Definition 20.6 Let X, Y be random variables. The correlation coefficient between X and Y is
\rho (X, Y) = \frac{ \mathrm{Cov} [X, Y] }{ \sqrt{\mathrm{Var} [X]} \sqrt{\mathrm{Var} [Y]} }.
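For example, if Y = a X + b with a \neq 0, then by the symmetry of Definition 20.5 and Corollary 20.7, \mathrm{Cov} [X, Y] = a \mathrm{Var} [X], while \mathrm{Var} [Y] = a^{2} \mathrm{Var} [X], so
\rho (X, Y) = \frac{a \mathrm{Var} [X]}{\sqrt{\mathrm{Var} [X]} \sqrt{a^{2} \mathrm{Var} [X]}} = \frac{a}{|a|} = \pm 1,
regardless of the scale a and the shift b.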