Discrete Probability

Asymptotic Notations

Big-\(\mathcal{O}\) notation

Let \(f,g : \mathbb{N} \to \mathbb{R}\) be functions defined on \(\mathbb{N}\). We write \(f(n) = \mathcal{O}(g(n))\) if there exist constants \(C > 0\) and \(N \in \mathbb{N}\) such that \[ \begin{equation*} |f(n)| \le Cg(n), \quad \forall n \ge N. \end{equation*} \]

\[ \begin{equation*} \mathcal{O}(g) := \{f : \exists C > 0, \exists N \in \mathbb{N} \text{ s.t. } |f(n)| \le Cg(n), \text{ for all } n \ge N\}. \end{equation*} \]

Big-\(\mathcal{O}\) notation properties

Let \(f, g : \mathbb{N} \to \mathbb{R}\) and \(c \in \mathbb{R}\). If \(f(n), g(n) \ge 0\) for all \(n\), then \[\begin{align*} f(n) &= \mathcal{O}(f(n)) \\ c \cdot f(n) &= \mathcal{O}(f(n)) \\ \mathcal{O}(\mathcal{O}(f(n))) &= \mathcal{O}(f(n)) \\ \mathcal{O}(f(n)) + \mathcal{O}(g(n)) &= \mathcal{O}(f(n) + g(n)) \\ \mathcal{O}(f(n)) + \mathcal{O}(g(n)) &= \mathcal{O}(\max\{f(n), g(n)\}) \\ \mathcal{O}(f(n)) \cdot \mathcal{O}(g(n)) &= \mathcal{O}(f(n) \cdot g(n)) \\ \mathcal{O}(f(n)g(n)) &= f(n) \cdot \mathcal{O}(g(n)). \\ \end{align*}\]

Theorem 1.1 (Big-\(\mathcal{O}\) notation defined using limsup) Let \(f, g : \mathbb{N} \to \mathbb{R}\), where \(g(n) > 0\) for all sufficiently large \(n\). We write \(f(n) = \mathcal{O}(g(n))\) if and only if \[ \begin{equation*} \limsup_{n \to \infty} \frac{|f(n)|}{g(n)} < +\infty. \end{equation*} \]

\(\Omega, \Theta, \sim, o, \omega\) notations

Let \(f,g : \mathbb{N} \to \mathbb{R}\), where \(g(n) > 0\) for all sufficiently large \(n\).

We write \(f(n) = \Omega(g(n))\) if \(\liminf\limits_{n \to \infty} \frac{|f(n)|}{g(n)} > 0\).
We write \(f(n) = \Theta(g(n))\) if \(f(n) = \mathcal{O}(g(n))\) and \(f(n) = \Omega(g(n))\).
We write \(f(n) \sim g(n)\) if \(\lim\limits_{n \to \infty} \frac{|f(n)|}{g(n)} = 1\).
We write \(f(n) = o(g(n))\) if \(\lim\limits_{n \to \infty} \frac{|f(n)|}{g(n)} = 0\).
We write \(f(n) = \omega(g(n))\) if \(\lim\limits_{n \to \infty} \frac{|f(n)|}{g(n)} = +\infty\).

Theorem 1.2 Let \(f,g : \mathbb{N} \to \mathbb{R}\) with \(g(n) > 0\) for all \(n \ge 1\). If \[ \begin{equation*} f(k) = \mathcal{O}(g(k)) \quad \text{as } k \to \infty, \end{equation*} \] then \[ \begin{equation*} \sum_{k=1}^n f(k) = \mathcal{O}\left(\sum_{k=1}^n g(k)\right) \quad \text{as } n \to \infty. \end{equation*} \]

Theorem 1.3 Let \(f,g : \mathbb{N}^2 \to \mathbb{R}\) with \(g(m,n) > 0\) for all \(m,n \ge 1\). If for some function \(U(m) \ge 1\), we have \[ \begin{equation*} f(m,k) = \mathcal{O}(g(m,k)), \quad \text{for } 1 \le k \le U(m), \text{ as } m \to \infty, \end{equation*} \] Then, \[ \begin{equation*} \sum_{k=1}^n f(m,k) = \mathcal{O}\left(\sum_{k=1}^n g(m,k)\right), \quad \text{for } 1 \le n \le U(m), \text{ as } m \to \infty. \end{equation*} \]

Big-\(\mathcal{O}\) notation for multivariate functions

Let \(f,g : \mathbb{N}^d \to \mathbb{R}\) be functions defined on \(\mathbb{N}^d\). We write \[ \begin{equation*} f(n_1,\dots,n_d) = \mathcal{O}(g(n_1,\dots,n_d)) \end{equation*} \] if there exist constants \(C > 0\) and \(N \in \mathbb{N}\) such that \[ \begin{equation*} |f(n_1,\dots,n_d)| \le Cg(n_1,\dots,n_d), \quad \forall n_1,\dots,n_d \ge N. \end{equation*} \]

Probability Spaces and Event

Discrete probability space

A discrete probability space is a pair \(\mathcal{P} = (\Omega, p)\) consisting of:

A non-empty finite set \(\Omega\), called the sample space, whose elements represent all possible outcomes of an experiment.
A function \(p : \Omega \to [0, 1]\), called the probability mass function, which assigns probabilities to outcomes and satisfies \(\sum_{\omega \in \Omega} p(\omega) = 1\).

Event

An event \(A\) is a subset of \(\Omega\) (\(A \subseteq \Omega\)), and its probability is defined as \[ \begin{equation*} \Pr[A] := \sum_{\omega \in A} p(\omega). \end{equation*} \]

Uniform Probability Means Counting

The Secretary Problem

Problem Description

Threshold Strategy

For a fixed integer \(k\) satisfying \(1 \le k \le n - 1\), the \(k\)-strategy is an online decision strategy defined as follows:

Reject the first \(k\) options and record the maximum value observed;
Starting from step \(k + 1\), accept the first option whose value exceeds the recorded maximum.

Conclusion

For \(n \ge 2\), the following two statements hold:

the optimal success probability satisfies

\[ \begin{align*} p_n = 1/e + \mathcal{O}\left(\frac{1}{n}\right); \end{align*} \]

the optimal threshold satisfies

\[ \begin{align*} k^* = n/e + \mathcal{O}(1). \end{align*} \]

The Locker Puzzle

Problem Description

Pointer Chasing Strategy

Before opening any locker, students discuss and agree on a bijection between student names and \([n]\), \[ \begin{align*} \phi : \{\text{student names}\} \to [n], \end{align*} \] which assigns each student a unique ID in \([n]\).

For each student \(s\), let \(x \leftarrow \phi(s)\). The search rule is:

open locker \(x\);
if the revealed name tag is \(s\), stop (success);
otherwise, let \(x \leftarrow \phi(t)\), where \(t\) is the revealed student name, and continue.

If no success occurs, stop after \(n/2\) opens (failure).

Conclusion

For all even \(n \ge 2\), the success probability of the pointer chasing strategy is \[ \begin{align*} \Pr[\text{success}] = 1 - \sum_{j=n/2+1}^n \frac{1}{j} = 1 - \ln 2 + \mathcal{O}\left(\frac{1}{n}\right) \approx 0.307. \end{align*} \]

Intersection and Union of Events

Intersection and union of event

For events \(A, B \subseteq \Omega\):

The intersection of two events \(A\) and \(B\), denoted by \(A \cap B\), is the event that both \(A\) and \(B\) are happen.
The union of two events \(A\) and \(B\), denoted by \(A \cup B\), is the event that at least one of \(A, B\) happens.

As events are subsets of the sample space \(\Omega\), the operations \(\cap\) and \(\cup\) are the same as set operations.

Properties of probability measures

For any events \(A, B \subseteq \Omega\), \[ \begin{align*} \Pr[A \cup B] = \Pr[A] + \Pr[B] - \Pr[A \cap B]. \end{align*} \]
For ans two events \(A, B \subseteq \Omega\), \[ \begin{align*} \Pr [A \cup B] \leq \Pr[A] + \Pr[B]. \end{align*} \]
For any sequence of events \(A_1, \dots, A_m \subseteq \Omega\) with \(m \geq 2\), \[ \begin{align*} \Pr\left[\bigcup_{i=1}^m A_i\right] \le \sum_{i=1}^m \Pr[A_i]. \end{align*} \]

Hashing Collision

Linear Hash Function

Here \[ \begin{align*} h_{a,b,p,m} : K \to \mathbb{Z}_m, \quad x \mapsto ((ax + b) \bmod p) \bmod m, \end{align*} \]

The corresponding linear hash family is defined as follows: \[ \begin{align*} \mathcal{H}_{p,m} := \{h_{a,b,p,m} : a \in \mathbb{Z}_p \setminus \{0\}, b \in \mathbb{Z}_p\}. \end{align*} \] That is, \(\mathcal{H}_{p,m}\) is a family of linear hash functions with fixed prime \(p\) and bucket number \(m\).

Conclusion

For any two distinct integers \(k, k' \in K\), if a hash function \(h\) is sampled uniformly at random from \(\mathcal{H}_{p,m}\), then \[ \begin{align*} \Pr[h(k) = h(k')] \le \frac{1}{m}. \end{align*} \]
For any universal hash family \(\mathcal{H}\) and any \(n\) distinct keys \(k_1, \dots, k_n \in K\), the collision probability admits the upper bound \[ \begin{align*} \Pr[C] \le \frac{1}{m} \binom{n}{2}. \end{align*} \]

Ramsey Numbers

Problem Description

Definition of Ramsey number

For every integer \(k \ge 2\), there exists an integer \(N > 0\) such that every red/blue coloring of the edges of \(K_N\) contains a monochromatic \(K_k\). The Ramsey number \(R(k)\) is the smallest integer \(N\) such that every red/blue coloring of the edges of \(K_N\) contains a monochromatic \(K_k\).
For any integers \(r \ge 1\) and \(k \ge m \ge 1\), there exists an integer \(N\) such that for every coloring of the \(m\)-element subsets of an \(N\)-element set with \(r\) colors, there exists a \(k\)-element subset whose \(m\)-element subsets are all assigned the same color.

Conclusion

For all \(k \ge 3\), \[ \begin{align*} 2 ^ {k / 2} < R(k) < \binom{2k - 2}{k - 1} < 4 ^ k. \end{align*} \] Thus \(R(k) = 2^{\Theta(k)}\).
For all \(r, k, m \geq 2\), the multicolor Ramsey number \(R_r^{(m)}(k)\) satisfies the following lower bound: \[ R_r^{(m)}(k) > r^{\frac{1}{m} \left( \binom{k}{m} - 1 \right)}. \]

Conditional Probability and Independence

Conditional probability

Let \(A, B\) be events with \(\Pr[B] > 0\). The conditional probability of \(A\) given \(B\) is defined as:

\[ \Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]} \]

We leave \(\Pr[A \mid B]\) undefined if \(\Pr[B] = 0\).

Independence

Two events \(A, B\) are said to be independent if

\[ \Pr[A \cap B] = \Pr[A] \Pr[B]. \]

If \(\Pr[B] > 0\), this is equivalent to

\[ \Pr[A \mid B] = \Pr[A]. \]

Mutual independence

Events \(A_1, \dots, A_m\) are mutually independent if for every non-empty subset \(I \subseteq [m]\),

\[ \Pr \left[ \bigcap_{i \in I} A_i \right] = \prod_{i \in I} \Pr[A_i]. \]

Theorem 1.4 Let \(B_1, \dots, B_m\) be mutually exclusive events that satisfy \(\bigcup_{i=1}^m B_i = \Omega\) and \(\Pr[B_i] > 0\) for all \(i \in [m]\). Then for any event \(A\),

\[ \Pr[A] = \sum_{i=1}^m \Pr[A \mid B_i] \Pr[B_i]. \]

Random Variables

Random variable

Let \(\mathcal{P} = (\Omega, p)\) be a discrete probability space. A random variable is a function

\[ X : \Omega \to \mathbb{R}. \]

Indicator function

The indicator function of a condition \(C\) is:

\[ \mathbb{1}_{[C]} = \begin{cases} 1 & \text{if the condition } C \text{ holds}, \\ 0 & \text{otherwise}. \end{cases} \]

For a set \(A \subseteq U\), where \(U\) is some universe, the indicator function of \(A\) is:

\[ \mathbb{1}_A : U \to \{0, 1\}, \quad \mathbb{1}_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{otherwise}. \end{cases} \]

Probability mass function

For a random variable \(X\), its probability mass function \(p_X : \mathbb{R} \to [0, 1]\) is defined as

\[ p_X(x) := \Pr[X = x], \quad \forall x \in \mathbb{R}. \]

Expectation

Let \(X\) be a random variable on a discrete probability space \((\Omega, p)\). The expectation, expected value or mean of \(X\) is defined as

\[ \mathbb{E}[X] = \sum_{\omega \in \Omega} X(\omega)p(\omega). \]

Theorem 1.5 (Linearity of expectation) For any random variables \(X_1, \dots, X_n\) and constants \(c_1, \dots, c_n \in \mathbb{R}\),

\[ \mathbb{E} \left[ \sum_{i=1}^n c_i X_i \right] = \sum_{i=1}^n c_i \mathbb{E}[X_i]. \]

Theorem 1.6 If \(X\) is a random variable that takes values in non-negative integers, then

\[ \mathbb{E}[X] = \sum_{t \geq 0} \Pr[X > t]. \]

Conditional expectation given an event

Let \(X\) be a random variable and \(B\) be an event with \(\Pr[B] > 0\). The conditional expectation of \(X\) given \(B\) is defined as

\[ \mathbb{E}[X \mid B] = \frac{\mathbb{E}[\mathbb{1}_B \cdot X]}{\Pr[B]}. \]

Conditional expectation given a random variable

Let \(X\) and \(Y\) be random variables. The conditional expectation of \(X\) given \(Y\) is defined as the following random variable:

\[ \mathbb{E}[X \mid Y] : \Omega \to \mathbb{R}, \quad \omega \mapsto \mathbb{E}[X \mid Y = Y(\omega)]. \]

Theorem 1.7 Let \(X\) be a random variable and \(B_1, \dots, B_m\) be mutually exclusive events that satisfy \(\bigcup_{i=1}^m B_i = \Omega\) and \(\Pr[B_i] > 0\) for all \(i \in [m]\). Then

\[ \mathbb{E}[X] = \sum_{i=1}^m \mathbb{E}[X \mid B_i] \Pr[B_i]. \]

Theorem 1.8 Let \(X\) and \(Y\) be random variables. Then

\[ \mathbb{E}[X] = \mathbb{E}[\mathbb{E}[X \mid Y]]. \]

Independence of random variables

We say that random variables \(X\) and \(Y\) are independent if for all sets \(A, B \subseteq \mathbb{R}\),

\[ \Pr[X \in A, Y \in B] = \Pr[X \in A] \Pr[Y \in B]. \]

Theorem 1.9 If \(X, Y\) are independent random variables, then

\[ \mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y]. \]

Mutual independence of random variables

We say that random variables \(X_1, \dots, X_n\) are mutually independent if for all sets \(A_1, \dots, A_n \subseteq \mathbb{R}\),

\[ \Pr[X_1 \in A_1, \dots, X_n \in A_n] = \prod_{i=1}^n \Pr[X_i \in A_i]. \]

Theorem 1.10 If \(X_1, \dots, X_n\) are mutually independent random variables, then

\[ \mathbb{E} \left[ \prod_{i=1}^n X_i \right] = \prod_{i=1}^n \mathbb{E}[X_i]. \]

Independent and identically distributed random variables

We say that random variables \(X_1, \dots, X_n\) are independent and identically distributed, abbreviated i.i.d., if

\(X_1, \dots, X_n\) are mutually independent, and
\(X_1, \dots, X_n\) have the same probability distribution.

Appendix

Theorem 1.11 (Nested conditional expectations) Let \(X\) and \(Y\) be two random variables, and let \(f : \mathbb{R} \to \mathbb{R}\) be a measurable function. Then the following simplified expressions hold for nested conditional expectations:

\(\mathbb{E}\left[ \mathbb{E}[X \mid f(Y)] \;\middle|\; Y \right] = \mathbb{E}[X \mid f(Y)]\)
\(\mathbb{E}\left[ \mathbb{E}[X \mid Y] \;\middle|\; f(Y) \right] = \mathbb{E}[X \mid f(Y)]\)