17 Probability
Definition 17.1 The probability space (\Omega, \mathcal{F}, \mathbb{P}) of a random experiment consists of the following quantities.
The sample space \Omega is the set of all possible outcomes of the random experiment.
The set of events \mathcal{F}, where each event A is a subset of the sample space \Omega and we say that A occurs if the outcome \omega \in \Omega of the random experiment is an element of A. By definition, \mathcal{F} is a \sigma-algebra, which means it must satisfy the following requirements.
Contains the sample space: \Omega \in \mathcal{F}.
Closed under complement: if A \in \mathcal{F}, then \Omega \setminus A \in \mathcal{F}.
Closed under countable unions: if A_{1}, A_{2}, \dots \in \mathcal{F}, then \bigcup_{i = 1}^{\infty} A_{i} \in \mathcal{F}.
A probability measure \mathbb{P}: \mathcal{F} \to [0, 1] is a function that assigns probabilities to the events in \mathcal{F} and satisfies the following axioms.
Non-negativity: \mathbb{P} (A) \geq 0, \forall A \in \mathcal{F}.
Normalization: \mathbb{P} (\Omega) = 1.
Countable additivity: if A_{1}, A_{2}, \dots \in \mathcal{F} are mutually disjoint, then \mathbb{P} \left( \bigcup_{i = 1}^{\infty} A_{i} \right) = \sum_{i = 1}^{\infty} \mathbb{P} (A_{i}).
If the sample space \Omega is finite, the probability space (\Omega, \mathcal{F}, \mathbb{P}) defined in Definition 17.1 is said to be discrete.
If the sample space \Omega is infinite, the probability space (\Omega, \mathcal{B}, \mathbb{P}) is said to be continuous if the set of events \mathcal{B} is a Borel \sigma-algebra. If the sample space \Omega is the real line \mathbb{R}, the Borel \sigma-algebra is the smallest \sigma-algebra that contains all open intervals in \mathbb{R}.
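To make Definition 17.1 concrete, the following is a minimal sketch of a discrete probability space. It assumes a single roll of a fair six-sided die, the power set as the set of events, and a uniform measure; these choices and the names `omega`, `power_set`, and `prob` are illustrative and not part of the definition. On a finite sample space, countable unions reduce to finite unions, so pairwise checks are enough.

```python
from fractions import Fraction
from itertools import chain, combinations

# Assumed sample space: one roll of a fair six-sided die.
omega = frozenset(range(1, 7))


def power_set(s):
    """Return every subset of s as a frozenset (the largest sigma-algebra on s)."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(c) for c in subsets}


def prob(event):
    """Uniform probability measure: |A| / |Omega|."""
    return Fraction(len(event), len(omega))


events = power_set(omega)

# Sigma-algebra requirements.
contains_omega = omega in events
closed_under_complement = all(omega - a in events for a in events)
closed_under_union = all(a | b in events for a in events for b in events)

# Probability axioms (additivity is checked on every disjoint pair).
non_negative = all(prob(a) >= 0 for a in events)
normalized = prob(omega) == 1
additive = all(
    prob(a | b) == prob(a) + prob(b)
    for a in events
    for b in events
    if not (a & b)
)

print(contains_omega, closed_under_complement, closed_under_union)
print(non_negative, normalized, additive)
```

Exact rational arithmetic via `Fraction` avoids floating-point rounding when comparing probabilities for equality.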
Basic probability properties
Corollary 17.1 The probability of the union of two events A and B is
\mathbb{P} (A \cup B) = \mathbb{P} (A) + \mathbb{P} (B) - \mathbb{P} (A \cap B).
Corollary 17.2 (Union bound) The probability of an event that is a union of a finite set of events A_{1}, \dots, A_{n} is no greater than the sum of the probabilities of the individual events:
\mathbb{P} \left( \bigcup_{i = 1}^{n} A_{i} \right) \leq \sum_{i = 1}^{n} \mathbb{P} (A_{i}).
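Both corollaries can be checked numerically on a small example. The sketch below assumes a fair six-sided die with a uniform measure; the events `a` (even roll) and `b` (roll of at least 4) are arbitrary illustrative choices.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
omega = frozenset(range(1, 7))


def prob(event):
    """Uniform probability measure: |A| / |Omega|."""
    return Fraction(len(event), len(omega))


a = frozenset({2, 4, 6})  # the roll is even
b = frozenset({4, 5, 6})  # the roll is at least 4

# Corollary 17.1: P(A or B) = P(A) + P(B) - P(A and B).
union_prob = prob(a | b)
inclusion_exclusion = prob(a) + prob(b) - prob(a & b)

# Corollary 17.2: the plain sum over-counts the overlap, so it is an upper bound.
union_bound = prob(a) + prob(b)

print(union_prob, inclusion_exclusion, union_bound)  # 2/3 2/3 1
```

The bound is loose here because the two events overlap in {4, 6}; it is tight exactly when the events are disjoint up to probability zero.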
Joint probability
We usually refer to the probability that the events A and B both occur as the joint probability of A and B, written
\mathbb{P} (A \cap B) = \mathbb{P} (A, B).
Conditional probability
Definition 17.2 (Conditional probability) Let B be an event such that \mathbb{P} (B) > 0. The conditional probability of the event A given B is defined to be
\mathbb{P} (A \mid B) = \frac{ \mathbb{P} (A \cap B) }{ \mathbb{P} (B) }.
Note that, for a fixed event B with \mathbb{P} (B) > 0, \mathbb{P} (\cdot \mid B) is also a probability measure and therefore satisfies the axioms of the probability measure in Definition 17.1.
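As an illustration of Definition 17.2, the sketch below computes a conditional probability on an assumed fair-die sample space and spot-checks that conditioning on a fixed event B behaves like a probability measure. The helper name `cond_prob` and the chosen events are hypothetical.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
omega = frozenset(range(1, 7))


def prob(event):
    """Uniform probability measure: |A| / |Omega|."""
    return Fraction(len(event), len(omega))


def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    if prob(b) == 0:
        raise ValueError("conditioning event must have positive probability")
    return prob(a & b) / prob(b)


a = frozenset({2, 4, 6})  # the roll is even
b = frozenset({4, 5, 6})  # the roll is at least 4

p_a_given_b = cond_prob(a, b)
print(p_a_given_b)  # 2/3

# Spot checks that P(. | B) satisfies normalization and additivity.
normalized = cond_prob(omega, b) == 1
additive = cond_prob(frozenset({4, 5}), b) == (
    cond_prob(frozenset({4}), b) + cond_prob(frozenset({5}), b)
)
print(normalized, additive)
```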
Corollary 17.3 (Chain rule) Given a set of events A_{1}, \dots, A_{n}, the probability of their intersection can be calculated as
\mathbb{P} \left( \bigcap_{i = 1}^{n} A_{i} \right) = \prod_{j = 1}^{n} \mathbb{P} \left( A_{j} \;\middle|\; \bigcap_{k = j + 1}^{n} A_{k} \right)
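where every conditioning event is assumed to have nonzero probability and, for j = n, the empty intersection \bigcap_{k = n + 1}^{n} A_{k} is taken to be \Omega, so that \mathbb{P} \left( A_{n} \;\middle|\; \bigcap_{k = n + 1}^{n} A_{k} \right) = \mathbb{P} (A_{n}).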
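The chain rule can be verified by brute-force enumeration. The sketch below assumes a hypothetical urn with three red and two blue balls, from which three balls are drawn in order without replacement; A_i is the event that the i-th draw is red. The empty intersection is treated as \Omega, matching the convention in the corollary.

```python
from fractions import Fraction
from itertools import permutations

# Assumed experiment: draw 3 balls in order, without replacement,
# from an urn holding 3 red and 2 blue balls.
balls = ["r1", "r2", "r3", "b1", "b2"]
omega = list(permutations(balls, 3))  # 60 equally likely ordered draws


def prob(pred):
    """Probability of the event {w : pred(w)} under the uniform measure."""
    return Fraction(sum(1 for w in omega if pred(w)), len(omega))


def is_red(i):
    """Event A_{i+1}: the draw at position i is red."""
    return lambda w: w[i].startswith("r")


def all_of(preds):
    """Intersection of events; an empty list yields the whole sample space."""
    return lambda w: all(p(w) for p in preds)


events = [is_red(0), is_red(1), is_red(2)]

# Left-hand side: probability that all three draws are red.
joint = prob(all_of(events))

# Right-hand side: product of P(A_j | A_{j+1}, ..., A_n) over j.
chain = Fraction(1)
for j in range(len(events)):
    numerator = prob(all_of(events[j:]))
    denominator = prob(all_of(events[j + 1:]))  # equals 1 when j is the last index
    chain *= numerator / denominator

print(joint, chain)  # 1/10 1/10
```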
Theorem 17.1 (Bayes’s theorem) Let A, B be events with nonzero probability. Then,
\mathbb{P} (A \mid B) = \frac{ \mathbb{P} (B \mid A) \mathbb{P} (A) }{ \mathbb{P} (B) }.
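A quick numerical check of Bayes's theorem: the sketch below computes \mathbb{P} (A \mid B) directly from the definition and again through the right-hand side of the theorem. The fair-die sample space and the events `a` and `b` are assumptions made for illustration.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
omega = frozenset(range(1, 7))


def prob(event):
    """Uniform probability measure: |A| / |Omega|."""
    return Fraction(len(event), len(omega))


def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B); assumes P(B) > 0."""
    return prob(a & b) / prob(b)


a = frozenset({2, 4, 6})  # the roll is even
b = frozenset({5, 6})     # the roll is at least 5

direct = cond_prob(a, b)                           # from Definition 17.2
via_bayes = cond_prob(b, a) * prob(a) / prob(b)    # from Theorem 17.1

print(direct, via_bayes)  # 1/2 1/2
```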
Marginal probability
Theorem 17.2 (Law of total probability) Let A_{1}, \dots, A_{n} be events with nonzero probability that partition \Omega, that is, A_{1}, \dots, A_{n} are mutually disjoint and their union is \Omega. Then, for any event B,
\mathbb{P} (B) = \sum_{i = 1}^{n} \mathbb{P} (B \cap A_{i}) = \sum_{i = 1}^{n} \mathbb{P} (B \mid A_{i}) \mathbb{P} (A_{i}).
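The law of total probability is often used to obtain the denominator in Bayes's theorem (Theorem 17.1). The sketch below uses a diagnostic-test scenario in which every number is made up for illustration: a 1% prior for "disease", a 95% chance of a positive test given disease, and a 5% chance of a positive test given no disease. The two hypotheses partition the sample space.

```python
from fractions import Fraction

# Hypothetical numbers, chosen only for illustration.
prior = {
    "disease": Fraction(1, 100),
    "healthy": Fraction(99, 100),
}
# P(positive test | hypothesis)
likelihood = {
    "disease": Fraction(95, 100),
    "healthy": Fraction(5, 100),
}

# Law of total probability: P(B) = sum_i P(B | A_i) P(A_i).
p_positive = sum(likelihood[h] * prior[h] for h in prior)

# Bayes's theorem with the marginal computed above.
posterior_disease = likelihood["disease"] * prior["disease"] / p_positive

print(p_positive)         # 59/1000
print(posterior_disease)  # 19/118
```

Under these assumed numbers, a positive test raises the probability of disease from 1% to roughly 16%, because most positives come from the much larger healthy group.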
Independence
Definition 17.3 (Independence) Two events A and B are said to be independent if any of the following holds (the second condition requires \mathbb{P} (B) > 0 and the third requires \mathbb{P} (A) > 0).
\mathbb{P} (A \cap B) = \mathbb{P} (A) \mathbb{P} (B).
\mathbb{P} (A \mid B) = \mathbb{P} (A).
\mathbb{P} (B \mid A) = \mathbb{P} (B).
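The product condition in Definition 17.3 is straightforward to test by enumeration. The sketch below assumes two fair six-sided dice and compares one independent pair of events with one dependent pair; the specific events are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: ordered outcomes of two fair six-sided dice.
omega = frozenset(product(range(1, 7), repeat=2))


def prob(event):
    """Uniform probability measure: |A| / |Omega|."""
    return Fraction(len(event), len(omega))


def independent(a, b):
    """Product form of independence: P(A and B) == P(A) P(B)."""
    return prob(a & b) == prob(a) * prob(b)


first_even = frozenset(w for w in omega if w[0] % 2 == 0)
sum_is_7 = frozenset(w for w in omega if w[0] + w[1] == 7)
sum_is_8 = frozenset(w for w in omega if w[0] + w[1] == 8)

print(independent(first_even, sum_is_7))  # True
print(independent(first_even, sum_is_8))  # False
```

A sum of 7 is reachable from every value of the first die with exactly one value of the second, which is why knowing the first die's parity carries no information about it; a sum of 8 does not have this symmetry.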
Definition 17.4 (Conditional independence) Two events A and B are said to be conditionally independent given an event C if any of the following holds.
\mathbb{P} (A \cap B \mid C) = \mathbb{P} (A \mid C) \mathbb{P} (B \mid C).
\mathbb{P} (A \mid B \cap C) = \mathbb{P} (A \mid C).
\mathbb{P} (B \mid A \cap C) = \mathbb{P} (B \mid C).
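Conditional independence does not imply independence. The sketch below assumes a hypothetical experiment: a coin is picked at random (fair, or biased with heads probability 9/10) and then flipped twice. With A = "first flip is heads", B = "second flip is heads", and C = "the fair coin was picked", it checks \mathbb{P} (A \cap B \mid C) = \mathbb{P} (A \mid C) \mathbb{P} (B \mid C) and \mathbb{P} (A \mid B \cap C) = \mathbb{P} (A \mid C), and shows that A and B are not independent without conditioning.

```python
from fractions import Fraction
from itertools import product

# Hypothetical coins and their heads probabilities.
heads_prob = {"fair": Fraction(1, 2), "biased": Fraction(9, 10)}

# Weighted sample space: (coin, first flip, second flip) -> probability.
weights = {}
for coin, f1, f2 in product(heads_prob, "HT", "HT"):
    w = Fraction(1, 2)  # each coin is picked with probability 1/2
    for flip in (f1, f2):
        w *= heads_prob[coin] if flip == "H" else 1 - heads_prob[coin]
    weights[(coin, f1, f2)] = w


def prob(pred):
    """Probability of the event {o : pred(o)}."""
    return sum((w for o, w in weights.items() if pred(o)), Fraction(0))


def cond(pred_a, pred_b):
    """P(A | B) = P(A and B) / P(B); assumes P(B) > 0."""
    return prob(lambda o: pred_a(o) and pred_b(o)) / prob(pred_b)


def a(o):
    return o[1] == "H"  # first flip is heads


def b(o):
    return o[2] == "H"  # second flip is heads


def c(o):
    return o[0] == "fair"  # the fair coin was picked


def a_and_b(o):
    return a(o) and b(o)


def b_and_c(o):
    return b(o) and c(o)


product_form = cond(a_and_b, c) == cond(a, c) * cond(b, c)
conditional_form = cond(a, b_and_c) == cond(a, c)
marginally_independent = prob(a_and_b) == prob(a) * prob(b)

print(product_form, conditional_form, marginally_independent)  # True True False
```

Without knowing which coin was picked, a first heads makes the biased coin more likely, which in turn makes a second heads more likely; conditioning on C removes that link.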
When events are independent, the probability of their intersection can be calculated as the product of their individual probabilities.