Discrete Probability
Asymptotic Notations
Let \(f,g : \mathbb{N} \to \mathbb{R}\) be functions defined on \(\mathbb{N}\). We write \(f(n) = \mathcal{O}(g(n))\) if there exist constants \(C > 0\) and \(N \in \mathbb{N}\) such that \[ \begin{equation*} |f(n)| \le Cg(n), \quad \forall n \ge N. \end{equation*} \]
\[ \begin{equation*} \mathcal{O}(g) := \{f : \exists C > 0, \exists N \in \mathbb{N} \text{ s.t. } |f(n)| \le Cg(n), \text{ for all } n \ge N\}. \end{equation*} \]
Let \(f, g : \mathbb{N} \to \mathbb{R}\) and \(c \in \mathbb{R}\). If \(f(n), g(n) \ge 0\) for all \(n\), then \[\begin{align*} f(n) &= \mathcal{O}(f(n)) \\ c \cdot f(n) &= \mathcal{O}(f(n)) \\ \mathcal{O}(\mathcal{O}(f(n))) &= \mathcal{O}(f(n)) \\ \mathcal{O}(f(n)) + \mathcal{O}(g(n)) &= \mathcal{O}(f(n) + g(n)) \\ \mathcal{O}(f(n)) + \mathcal{O}(g(n)) &= \mathcal{O}(\max\{f(n), g(n)\}) \\ \mathcal{O}(f(n)) \cdot \mathcal{O}(g(n)) &= \mathcal{O}(f(n) \cdot g(n)) \\ \mathcal{O}(f(n)g(n)) &= f(n) \cdot \mathcal{O}(g(n)). \\ \end{align*}\]
Theorem 1.1 (Big-\(\mathcal{O}\) notation defined using limsup) Let \(f, g : \mathbb{N} \to \mathbb{R}\), where \(g(n) > 0\) for all sufficiently large \(n\). We write \(f(n) = \mathcal{O}(g(n))\) if and only if \[ \begin{equation*} \limsup_{n \to \infty} \frac{|f(n)|}{g(n)} < +\infty. \end{equation*} \]
Let \(f,g : \mathbb{N} \to \mathbb{R}\), where \(g(n) > 0\) for all sufficiently large \(n\).
- We write \(f(n) = \Omega(g(n))\) if \(\liminf\limits_{n \to \infty} \frac{|f(n)|}{g(n)} > 0\).
- We write \(f(n) = \Theta(g(n))\) if \(f(n) = \mathcal{O}(g(n))\) and \(f(n) = \Omega(g(n))\).
- We write \(f(n) \sim g(n)\) if \(\lim\limits_{n \to \infty} \frac{|f(n)|}{g(n)} = 1\).
- We write \(f(n) = o(g(n))\) if \(\lim\limits_{n \to \infty} \frac{|f(n)|}{g(n)} = 0\).
- We write \(f(n) = \omega(g(n))\) if \(\lim\limits_{n \to \infty} \frac{|f(n)|}{g(n)} = +\infty\).
Theorem 1.2 Let \(f,g : \mathbb{N} \to \mathbb{R}\) with \(g(n) > 0\) for all \(n \ge 1\). If \[ \begin{equation*} f(k) = \mathcal{O}(g(k)) \quad \text{as } k \to \infty, \end{equation*} \] then \[ \begin{equation*} \sum_{k=1}^n f(k) = \mathcal{O}\left(\sum_{k=1}^n g(k)\right) \quad \text{as } n \to \infty. \end{equation*} \]
Theorem 1.3 Let \(f,g : \mathbb{N}^2 \to \mathbb{R}\) with \(g(m,n) > 0\) for all \(m,n \ge 1\). If for some function \(U(m) \ge 1\), we have \[ \begin{equation*} f(m,k) = \mathcal{O}(g(m,k)), \quad \text{for } 1 \le k \le U(m), \text{ as } m \to \infty, \end{equation*} \] Then, \[ \begin{equation*} \sum_{k=1}^n f(m,k) = \mathcal{O}\left(\sum_{k=1}^n g(m,k)\right), \quad \text{for } 1 \le n \le U(m), \text{ as } m \to \infty. \end{equation*} \]
Let \(f,g : \mathbb{N}^d \to \mathbb{R}\) be functions defined on \(\mathbb{N}^d\). We write \[ \begin{equation*} f(n_1,\dots,n_d) = \mathcal{O}(g(n_1,\dots,n_d)) \end{equation*} \] if there exist constants \(C > 0\) and \(N \in \mathbb{N}\) such that \[ \begin{equation*} |f(n_1,\dots,n_d)| \le Cg(n_1,\dots,n_d), \quad \forall n_1,\dots,n_d \ge N. \end{equation*} \]
Probability Spaces and Event
A discrete probability space is a pair \(\mathcal{P} = (\Omega, p)\) consisting of:
- A non-empty finite set \(\Omega\), called the sample space, whose elements represent all possible outcomes of an experiment.
- A function \(p : \Omega \to [0, 1]\), called the probability mass function, which assigns probabilities to outcomes and satisfies \(\sum_{\omega \in \Omega} p(\omega) = 1\).
An event \(A\) is a subset of \(\Omega\) (\(A \subseteq \Omega\)), and its probability is defined as \[ \begin{equation*} \Pr[A] := \sum_{\omega \in A} p(\omega). \end{equation*} \]
Uniform Probability Means Counting
Threshold Strategy
For a fixed integer \(k\) satisfying \(1 \le k \le n - 1\), the \(k\)-strategy is an online decision strategy defined as follows:
- Reject the first \(k\) options and record the maximum value observed;
- Starting from step \(k + 1\), accept the first option whose value exceeds the recorded maximum.
Conclusion
For \(n \ge 2\), the following two statements hold:
- the optimal success probability satisfies
\[ \begin{align*} p_n = 1/e + \mathcal{O}\left(\frac{1}{n}\right); \end{align*} \]
- the optimal threshold satisfies
\[ \begin{align*} k^* = n/e + \mathcal{O}(1). \end{align*} \]
Pointer Chasing Strategy
Before opening any locker, students discuss and agree on a bijection between student names and \([n]\), \[ \begin{align*} \phi : \{\text{student names}\} \to [n], \end{align*} \] which assigns each student a unique ID in \([n]\).
For each student \(s\), let \(x \leftarrow \phi(s)\). The search rule is:
- open locker \(x\);
- if the revealed name tag is \(s\), stop (success);
- otherwise, let \(x \leftarrow \phi(t)\), where \(t\) is the revealed student name, and continue.
If no success occurs, stop after \(n/2\) opens (failure).
Conclusion
For all even \(n \ge 2\), the success probability of the pointer chasing strategy is \[ \begin{align*} \Pr[\text{success}] = 1 - \sum_{j=n/2+1}^n \frac{1}{j} = 1 - \ln 2 + \mathcal{O}\left(\frac{1}{n}\right) \approx 0.307. \end{align*} \]
Intersection and Union of Events
For events \(A, B \subseteq \Omega\):
The intersection of two events \(A\) and \(B\), denoted by \(A \cap B\), is the event that both \(A\) and \(B\) are happen.
The union of two events \(A\) and \(B\), denoted by \(A \cup B\), is the event that at least one of \(A, B\) happens.
As events are subsets of the sample space \(\Omega\), the operations \(\cap\) and \(\cup\) are the same as set operations.
For any events \(A, B \subseteq \Omega\), \[ \begin{align*} \Pr[A \cup B] = \Pr[A] + \Pr[B] - \Pr[A \cap B]. \end{align*} \]
For ans two events \(A, B \subseteq \Omega\), \[ \begin{align*} \Pr [A \cup B] \leq \Pr[A] + \Pr[B]. \end{align*} \]
For any sequence of events \(A_1, \dots, A_m \subseteq \Omega\) with \(m \geq 2\), \[ \begin{align*} \Pr\left[\bigcup_{i=1}^m A_i\right] \le \sum_{i=1}^m \Pr[A_i]. \end{align*} \]
Linear Hash Function
Here \[ \begin{align*} h_{a,b,p,m} : K \to \mathbb{Z}_m, \quad x \mapsto ((ax + b) \bmod p) \bmod m, \end{align*} \]
The corresponding linear hash family is defined as follows: \[ \begin{align*} \mathcal{H}_{p,m} := \{h_{a,b,p,m} : a \in \mathbb{Z}_p \setminus \{0\}, b \in \mathbb{Z}_p\}. \end{align*} \] That is, \(\mathcal{H}_{p,m}\) is a family of linear hash functions with fixed prime \(p\) and bucket number \(m\).
Conclusion
For any two distinct integers \(k, k' \in K\), if a hash function \(h\) is sampled uniformly at random from \(\mathcal{H}_{p,m}\), then \[ \begin{align*} \Pr[h(k) = h(k')] \le \frac{1}{m}. \end{align*} \]
For any universal hash family \(\mathcal{H}\) and any \(n\) distinct keys \(k_1, \dots, k_n \in K\), the collision probability admits the upper bound \[ \begin{align*} \Pr[C] \le \frac{1}{m} \binom{n}{2}. \end{align*} \]
Definition of Ramsey number
For every integer \(k \ge 2\), there exists an integer \(N > 0\) such that every red/blue coloring of the edges of \(K_N\) contains a monochromatic \(K_k\). The Ramsey number \(R(k)\) is the smallest integer \(N\) such that every red/blue coloring of the edges of \(K_N\) contains a monochromatic \(K_k\).
For any integers \(r \ge 1\) and \(k \ge m \ge 1\), there exists an integer \(N\) such that for every coloring of the \(m\)-element subsets of an \(N\)-element set with \(r\) colors, there exists a \(k\)-element subset whose \(m\)-element subsets are all assigned the same color.
Conclusion
For all \(k \ge 3\), \[ \begin{align*} 2 ^ {k / 2} < R(k) < \binom{2k - 2}{k - 1} < 4 ^ k. \end{align*} \] Thus \(R(k) = 2^{\Theta(k)}\).
For all \(r, k, m \geq 2\), the multicolor Ramsey number \(R_r^{(m)}(k)\) satisfies the following lower bound: \[ R_r^{(m)}(k) > r^{\frac{1}{m} \left( \binom{k}{m} - 1 \right)}. \]
Conditional Probability and Independence
Let \(A, B\) be events with \(\Pr[B] > 0\). The conditional probability of \(A\) given \(B\) is defined as:
\[ \Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]} \]
We leave \(\Pr[A \mid B]\) undefined if \(\Pr[B] = 0\).
Two events \(A, B\) are said to be independent if
\[ \Pr[A \cap B] = \Pr[A] \Pr[B]. \]
If \(\Pr[B] > 0\), this is equivalent to
\[ \Pr[A \mid B] = \Pr[A]. \]
Events \(A_1, \dots, A_m\) are mutually independent if for every non-empty subset \(I \subseteq [m]\),
\[ \Pr \left[ \bigcap_{i \in I} A_i \right] = \prod_{i \in I} \Pr[A_i]. \]
Theorem 1.4 Let \(B_1, \dots, B_m\) be mutually exclusive events that satisfy \(\bigcup_{i=1}^m B_i = \Omega\) and \(\Pr[B_i] > 0\) for all \(i \in [m]\). Then for any event \(A\),
\[ \Pr[A] = \sum_{i=1}^m \Pr[A \mid B_i] \Pr[B_i]. \]
Random Variables
Let \(\mathcal{P} = (\Omega, p)\) be a discrete probability space. A random variable is a function
\[ X : \Omega \to \mathbb{R}. \]
The indicator function of a condition \(C\) is:
\[ \mathbb{1}_{[C]} = \begin{cases} 1 & \text{if the condition } C \text{ holds}, \\ 0 & \text{otherwise}. \end{cases} \]
For a set \(A \subseteq U\), where \(U\) is some universe, the indicator function of \(A\) is:
\[ \mathbb{1}_A : U \to \{0, 1\}, \quad \mathbb{1}_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{otherwise}. \end{cases} \]
For a random variable \(X\), its probability mass function \(p_X : \mathbb{R} \to [0, 1]\) is defined as
\[ p_X(x) := \Pr[X = x], \quad \forall x \in \mathbb{R}. \]
Let \(X\) be a random variable on a discrete probability space \((\Omega, p)\). The expectation, expected value or mean of \(X\) is defined as
\[ \mathbb{E}[X] = \sum_{\omega \in \Omega} X(\omega)p(\omega). \]
Theorem 1.5 (Linearity of expectation) For any random variables \(X_1, \dots, X_n\) and constants \(c_1, \dots, c_n \in \mathbb{R}\),
\[ \mathbb{E} \left[ \sum_{i=1}^n c_i X_i \right] = \sum_{i=1}^n c_i \mathbb{E}[X_i]. \]
Theorem 1.6 If \(X\) is a random variable that takes values in non-negative integers, then
\[ \mathbb{E}[X] = \sum_{t \geq 0} \Pr[X > t]. \]
Let \(X\) be a random variable and \(B\) be an event with \(\Pr[B] > 0\). The conditional expectation of \(X\) given \(B\) is defined as
\[ \mathbb{E}[X \mid B] = \frac{\mathbb{E}[\mathbb{1}_B \cdot X]}{\Pr[B]}. \]
Let \(X\) and \(Y\) be random variables. The conditional expectation of \(X\) given \(Y\) is defined as the following random variable:
\[ \mathbb{E}[X \mid Y] : \Omega \to \mathbb{R}, \quad \omega \mapsto \mathbb{E}[X \mid Y = Y(\omega)]. \]
Theorem 1.7 Let \(X\) be a random variable and \(B_1, \dots, B_m\) be mutually exclusive events that satisfy \(\bigcup_{i=1}^m B_i = \Omega\) and \(\Pr[B_i] > 0\) for all \(i \in [m]\). Then
\[ \mathbb{E}[X] = \sum_{i=1}^m \mathbb{E}[X \mid B_i] \Pr[B_i]. \]
Theorem 1.8 Let \(X\) and \(Y\) be random variables. Then
\[ \mathbb{E}[X] = \mathbb{E}[\mathbb{E}[X \mid Y]]. \]
We say that random variables \(X\) and \(Y\) are independent if for all sets \(A, B \subseteq \mathbb{R}\),
\[ \Pr[X \in A, Y \in B] = \Pr[X \in A] \Pr[Y \in B]. \]
Theorem 1.9 If \(X, Y\) are independent random variables, then
\[ \mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y]. \]
We say that random variables \(X_1, \dots, X_n\) are mutually independent if for all sets \(A_1, \dots, A_n \subseteq \mathbb{R}\),
\[ \Pr[X_1 \in A_1, \dots, X_n \in A_n] = \prod_{i=1}^n \Pr[X_i \in A_i]. \]
Theorem 1.10 If \(X_1, \dots, X_n\) are mutually independent random variables, then
\[ \mathbb{E} \left[ \prod_{i=1}^n X_i \right] = \prod_{i=1}^n \mathbb{E}[X_i]. \]
We say that random variables \(X_1, \dots, X_n\) are independent and identically distributed, abbreviated i.i.d., if
- \(X_1, \dots, X_n\) are mutually independent, and
- \(X_1, \dots, X_n\) have the same probability distribution.
Appendix
Theorem 1.11 (Nested conditional expectations) Let \(X\) and \(Y\) be two random variables, and let \(f : \mathbb{R} \to \mathbb{R}\) be a measurable function. Then the following simplified expressions hold for nested conditional expectations:
\(\mathbb{E}\left[ \mathbb{E}[X \mid f(Y)] \;\middle|\; Y \right] = \mathbb{E}[X \mid f(Y)]\)
\(\mathbb{E}\left[ \mathbb{E}[X \mid Y] \;\middle|\; f(Y) \right] = \mathbb{E}[X \mid f(Y)]\)