
1 Discrete Probability

1.1 Probability Spaces and Events

Discrete Probability Space

A discrete probability space is a pair $\mathcal{P} = (\Omega, p)$ consisting of:

  • A non-empty finite set $\Omega$, called the sample space, whose elements represent all possible outcomes of an experiment.
  • A function $p : \Omega \to [0, 1]$, called the probability mass function, which assigns probabilities to outcomes and satisfies $\sum_{\omega \in \Omega} p(\omega) = 1$. That is, the total probability of all outcomes is 1.

Event

An event $A$ is a subset of $\Omega$ ($A \subseteq \Omega$), and its probability is defined as

\begin{equation*} \Pr[A] := \sum_{\omega \in A} p(\omega). \end{equation*}
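As a quick illustration (the fair six-sided die is an assumed toy example, not from the text), both defining conditions can be checked directly in code:

```python
from fractions import Fraction

# Sample space and probability mass function for a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]
p = {w: Fraction(1, 6) for w in omega}

# The pmf must sum to 1 over the whole sample space.
assert sum(p.values()) == 1

# Event A = "the roll is even"; Pr[A] = sum of p(w) over w in A.
A = {w for w in omega if w % 2 == 0}
pr_A = sum(p[w] for w in A)
print(pr_A)  # 1/2
```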

1.2 Asymptotic Notations

Big-$\mathcal{O}$ notation

Let $f,g : \mathbb{N} \to \mathbb{R}$ be functions defined on $\mathbb{N}$. We write $f(n) = \mathcal{O}(g(n))$ if there exist constants $C > 0$ and $N \in \mathbb{N}$ such that

\begin{equation*} |f(n)| \le Cg(n), \quad \forall n \ge N. \end{equation*}

Equivalently, viewed as a class of functions,

\begin{equation*} \mathcal{O}(g) := \{f : \exists C > 0, \exists N \in \mathbb{N} \text{ s.t. } |f(n)| \le Cg(n) \text{ for all } n \ge N\}. \end{equation*}
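The definition can be verified concretely by exhibiting witness constants. Here the polynomial and the witnesses $C = 4$, $N = 6$ are my own illustrative choices (checked on a finite range, which is all code can do):

```python
# f(n) = 3n^2 + 5n + 2 is O(n^2): the witnesses C = 4, N = 6 satisfy
# |f(n)| <= C * n^2 for all n >= N (checked here up to 10^4).
def f(n):
    return 3 * n * n + 5 * n + 2

C, N = 4, 6
ok = all(abs(f(n)) <= C * n * n for n in range(N, 10_000))
print(ok)  # True
```

Note that $N = 6$ matters: at $n = 5$ we have $f(5) = 102 > 4 \cdot 25$, so the bound only kicks in beyond the threshold.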

Theorem 1.1. Let $f, g : \mathbb{N} \to \mathbb{R}$ and $c \in \mathbb{R}$. If $f(n), g(n) \ge 0$ for all $n$, then

\begin{align*} f(n) &= \mathcal{O}(f(n)) \\ c \cdot f(n) &= \mathcal{O}(f(n)) \\ \mathcal{O}(\mathcal{O}(f(n))) &= \mathcal{O}(f(n)) \\ \mathcal{O}(f(n)) + \mathcal{O}(g(n)) &= \mathcal{O}(f(n) + g(n)) \\ \mathcal{O}(f(n)) + \mathcal{O}(g(n)) &= \mathcal{O}(\max\{f(n), g(n)\}) \\ \mathcal{O}(f(n)) \cdot \mathcal{O}(g(n)) &= \mathcal{O}(f(n) \cdot g(n)) \\ \mathcal{O}(f(n) \cdot g(n)) &= f(n) \cdot \mathcal{O}(g(n)). \end{align*}

Big-$\mathcal{O}$ notation over a domain

Let $f,g : D \to \mathbb{R}$ be functions defined on a domain $D$. We say that $f(n) = \mathcal{O}(g(n))$ for all $n \in D$ if there exists a constant $C > 0$ such that

\begin{equation*} |f(n)| \le Cg(n), \quad \forall n \in D. \end{equation*}

Big-$\mathcal{O}$ notation for multivariate functions

Let $f,g : \mathbb{N}^d \to \mathbb{R}$ be functions defined on $\mathbb{N}^d$. We write

\begin{equation*} f(n_1,\dots,n_d) = \mathcal{O}(g(n_1,\dots,n_d)) \end{equation*}

if there exist constants $C > 0$ and $N \in \mathbb{N}$ such that

\begin{equation*} |f(n_1,\dots,n_d)| \le Cg(n_1,\dots,n_d), \quad \forall n_1,\dots,n_d \ge N. \end{equation*}

Big-$\mathcal{O}$ notation for $x \to 0$

Let $f,g : \mathbb{R} \to \mathbb{R}$ be functions defined on $\mathbb{R}$. We say that $f(x) = \mathcal{O}(g(x))$ as $x \to 0$ if there exist constants $C > 0$ and $\delta > 0$ such that

\begin{equation*} |f(x)| \le Cg(x), \quad \forall x \in \mathbb{R} \text{ with } 0 < |x| < \delta. \end{equation*}

Big-$\mathcal{O}$ notation defined using $\limsup$

Let $f, g : \mathbb{N} \to \mathbb{R}$, where $g(n) > 0$ for all sufficiently large $n$. We write $f(n) = \mathcal{O}(g(n))$ if

\begin{equation*} \limsup_{n \to \infty} \frac{|f(n)|}{g(n)} < +\infty. \end{equation*}

$\Omega, \Theta, \sim$ Notations

Let $f,g : \mathbb{N} \to \mathbb{R}$, where $g(n) > 0$ for all sufficiently large $n$.

  • We write $f(n) = \Omega(g(n))$ if $\liminf\limits_{n \to \infty} \frac{|f(n)|}{g(n)} > 0$.
  • We write $f(n) = \Theta(g(n))$ if $f(n) = \mathcal{O}(g(n))$ and $f(n) = \Omega(g(n))$.
  • We write $f(n) \sim g(n)$ if $\lim\limits_{n \to \infty} \frac{f(n)}{g(n)} = 1$.

Little-$o$ and Little-$\omega$ Notations

Let $f,g : \mathbb{N} \to \mathbb{R}$, where $g(n) > 0$ for all sufficiently large $n$.

  • We write $f(n) = o(g(n))$ if $\lim\limits_{n \to \infty} \frac{|f(n)|}{g(n)} = 0$.
  • We write $f(n) = \omega(g(n))$ if $\lim\limits_{n \to \infty} \frac{|f(n)|}{g(n)} = +\infty$.

Theorem 1.2. Let $f,g : \mathbb{N} \to \mathbb{R}$ with $g(n) > 0$ for all $n \ge 1$. If

\begin{equation*} f(k) = \mathcal{O}(g(k)) \quad \text{as } k \to \infty, \end{equation*}

then

\begin{equation*} \sum_{k=1}^n f(k) = \mathcal{O}\left(\sum_{k=1}^n g(k)\right) \quad \text{as } n \to \infty. \end{equation*}

Theorem 1.3. Let $f,g : \mathbb{N}^2 \to \mathbb{R}$ with $g(m,n) > 0$ for all $m,n \ge 1$. If for some function $U(m) \ge 1$ we have

\begin{equation*} f(m,k) = \mathcal{O}(g(m,k)), \quad \text{for } 1 \le k \le U(m), \text{ as } m \to \infty, \end{equation*}

then

\begin{equation*} \sum_{k=1}^n f(m,k) = \mathcal{O}\left(\sum_{k=1}^n g(m,k)\right), \quad \text{for } 1 \le n \le U(m), \text{ as } m \to \infty. \end{equation*}

1.3 Uniform Probability Means Counting

Theorem 1.4 (Addition Principle). If a set $A$ is the union of $m$ disjoint sets $S_1, \dots, S_m$, i.e., $A = S_1 \sqcup \dots \sqcup S_m$, then

\begin{align*} |A| = |S_1 \sqcup \dots \sqcup S_m| = \sum_{i=1}^m |S_i|, \end{align*}

where $\sqcup$ denotes the union of two disjoint sets.

Theorem 1.5 (Multiplication Principle). If a set $A$ is the Cartesian product of $m$ sets $S_1, \dots, S_m$, i.e., $A = S_1 \times S_2 \times \dots \times S_m$, then

\begin{align*} |A| = |S_1 \times S_2 \times \dots \times S_m| = \prod_{i=1}^m |S_i|, \end{align*}

where $\times$ denotes the Cartesian product of two sets, i.e., $A \times B = \{(a, b) : a \in A, b \in B\}$.
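A minimal sketch of both principles on an assumed toy menu (the sets `soups`, `mains`, `desserts` are illustrative, not from the text):

```python
from itertools import product

# Addition principle: choosing exactly one item overall from three
# disjoint courses gives 3 + 4 + 2 options.
soups = {"s1", "s2", "s3"}
mains = {"m1", "m2", "m3", "m4"}
desserts = {"d1", "d2"}
one_item = soups | mains | desserts  # disjoint union
assert len(one_item) == len(soups) + len(mains) + len(desserts)

# Multiplication principle: choosing one of each course gives 3 * 4 * 2 meals.
meals = set(product(soups, mains, desserts))
print(len(meals))  # 24
```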

The Secretary Problem

  • There are exactly nn options, each associated with a distinct value viRv_i \in \mathbb{R}. The options are presented in a random order σ\sigma, where σ\sigma is a permutation of [n][n] sampled uniformly at random.

  • At step tt, you observe the value of the current option vσtv_{\sigma_t} and only know the values of the first tt observed options vσ1,,vσtv_{\sigma_1}, \dots, v_{\sigma_t}. You must make an immediate decision: accept or reject the current option.

  • If an option is rejected, it cannot be recalled. If an option is accepted, the process terminates immediately and no subsequent options can be inspected or accepted.

  • The final goal is to maximize the probability of accepting the global best option, namely the option with the highest value
vmax:=maxi[n]{vi}\begin{align*} v_{\max} := \max_{i \in [n]} \{v_i\} \end{align*}

Definition (Threshold Strategy). For a fixed integer $k$ satisfying $1 \le k \le n - 1$, the $k$-strategy is an online decision strategy defined as follows:

  • Reject the first $k$ options and record the maximum value observed;
  • Starting from step $k + 1$, accept the first option whose value exceeds the recorded maximum.
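The $k$-strategy is easy to simulate. The following Monte Carlo sketch estimates its success probability at $k \approx n/e$; the parameters $n = 100$, 20000 trials, and the fixed seed are my own illustrative choices:

```python
import random

def k_strategy_succeeds(values, k):
    """Run the k-strategy on one uniformly random arrival order.

    Returns True iff the strategy accepts the globally best option."""
    order = values[:]
    random.shuffle(order)
    threshold = max(order[:k])          # best value seen in the rejection phase
    for v in order[k:]:
        if v > threshold:               # first option beating the record
            return v == max(values)
    return False                        # never accepted anything

random.seed(0)
n, trials = 100, 20000
k = round(n / 2.718281828)              # k ~ n/e, as Theorem 1.6 suggests
values = list(range(n))                 # distinct values; only ranks matter
p_hat = sum(k_strategy_succeeds(values, k) for _ in range(trials)) / trials
print(p_hat)  # close to 1/e ~ 0.368
```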

Theorem 1.6. For $n \ge 2$, the following two statements hold:

  1. the optimal success probability satisfies
\begin{align*} p_n = 1/e + \mathcal{O}\left(\frac{1}{n}\right); \end{align*}
  2. the optimal threshold satisfies
\begin{align*} k^* = n/e + \mathcal{O}(1). \end{align*}

The Locker Puzzle

  • Suppose nn students attend U.S. visa interviews together, and all students have distinct names. Each student has one bag, and the student's own name tag is attached to that bag. They are not allowed to bring personal belongings into the embassy, so everyone stores one bag in one of nn lockers. After they come out, everyone forgets which locker contains their own bag. So we have a random one-to-one assignment between name tags and lockers.

  • Students search one by one (not simultaneously): when one student finishes, all opened lockers are closed again and the locker state is restored before the next student starts. Each student may open at most n/2n/2 lockers (assume nn is an even number), and no communication is allowed during the search. The goal is to design a strategy that maximizes the probability that all students succeed in finding their own bags.

Definition (Pointer Chasing Strategy). Before opening any locker, students discuss and agree on a bijection between student names and $[n]$,

\begin{align*} \phi : \{\text{student names}\} \to [n], \end{align*}

which assigns each student a unique ID in $[n]$.

For each student $s$, let $x \leftarrow \phi(s)$. The search rule is:

  • open locker $x$;
  • if the revealed name tag is $s$, stop (success);
  • otherwise, let $x \leftarrow \phi(t)$, where $t$ is the revealed student name, and continue.

If no success occurs, stop after $n/2$ opens (failure).
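Every student succeeds exactly when the random assignment, viewed as a permutation, has no cycle longer than $n/2$: each student's pointer chase walks the cycle containing their own ID. For small $n$ this probability can be computed exactly by brute force (here $n = 8$, an assumed toy size):

```python
from itertools import permutations
from fractions import Fraction

def max_cycle_length(perm):
    """Length of the longest cycle of a permutation given as a tuple (0-indexed)."""
    seen, longest = set(), 0
    for start in range(len(perm)):
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = perm[x]
            length += 1
        longest = max(longest, length)
    return longest

# Pointer chasing succeeds for everyone iff no cycle is longer than n/2.
n = 8
wins = sum(max_cycle_length(p) <= n // 2 for p in permutations(range(n)))
exact = Fraction(wins, 40320)                                  # 8! = 40320
formula = 1 - sum(Fraction(1, j) for j in range(n // 2 + 1, n + 1))
print(wins, exact == formula)  # 14736 True
```

The exact count matches the closed form in Theorem 1.7 because a permutation can contain at most one cycle of length $j > n/2$, and such a cycle occurs with probability exactly $1/j$.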

Theorem 1.7. For all even $n \ge 2$, the success probability of the pointer chasing strategy is

\begin{align*} \Pr[\text{success}] = 1 - \sum_{j=n/2+1}^n \frac{1}{j} = 1 - \ln 2 + \mathcal{O}\left(\frac{1}{n}\right) \approx 0.307. \end{align*}

1.4 Intersection and Union of Events

Intersection and Union of Events

For events $A, B \subseteq \Omega$:

  • The intersection of two events $A$ and $B$, denoted by $A \cap B$, is the event that both $A$ and $B$ happen.

  • The union of two events $A$ and $B$, denoted by $A \cup B$, is the event that at least one of $A, B$ happens.

As events are subsets of the sample space $\Omega$, the operations $\cap$ and $\cup$ are the same as set operations.

Lemma. For any events $A, B \subseteq \Omega$,

\begin{align*} \Pr[A \cup B] = \Pr[A] + \Pr[B] - \Pr[A \cap B]. \end{align*}

Theorem 1.8. For any two events $A, B \subseteq \Omega$,

\begin{align*} \Pr[A \cup B] \le \Pr[A] + \Pr[B]. \end{align*}

Theorem 1.9. For any sequence of events $A_1, \dots, A_m \subseteq \Omega$ with $m \ge 2$,

\begin{align*} \Pr\left[\bigcup_{i=1}^m A_i\right] \le \sum_{i=1}^m \Pr[A_i]. \end{align*}

Hashing Collision

Given a set $S = \{(k_i, v_i)\}_{i=1}^n$ of $n$ key-value pairs, a dictionary is a data structure that stores these pairs and supports queries of the form: given a key $k$, return its corresponding value $v$ (if it exists). We assume that all the given keys are distinct and are taken from a key space $K$. Each key is represented as an integer in the range $\{0, 1, \dots, |K| - 1\}$.

Definition (Hash Table). Choose a hash function $h : K \to \{0, 1, \dots, m - 1\}$ and allocate an array of $m$ buckets to store the key-value pairs, where the $i$-th key-value pair is stored in the $h(k_i)$-th bucket. To answer a query for a key $k$, compute $h(k)$ and look up the value in the $h(k)$-th bucket. This yields $\mathcal{O}(1)$ query time while using only $\Theta(m)$ space for storing the dictionary.

Definition (Linear Hash Function). A linear hash function is a map

\begin{align*} h_{a,b,p,m} : K \to \mathbb{Z}_m, \quad x \mapsto ((ax + b) \bmod p) \bmod m, \end{align*}

where $a, b$ are integers, $p$ is a large prime number such that $K \subseteq \mathbb{Z}_p$, and $m$ is a positive integer with $m \le p$. Here $\mathbb{Z}_m$ denotes the integers modulo $m$, i.e., $\mathbb{Z}_m = \{0, 1, \dots, m - 1\}$, and $z \bmod p$ denotes the remainder of $z$ modulo $p$, which takes values in $\mathbb{Z}_p$. Since the final result is taken modulo $m$, the hash function always produces a value in $\mathbb{Z}_m$.

The corresponding linear hash family is defined as follows:

\begin{align*} \mathcal{H}_{p,m} := \{h_{a,b,p,m} : a \in \mathbb{Z}_p \setminus \{0\}, b \in \mathbb{Z}_p\}. \end{align*}

That is, $\mathcal{H}_{p,m}$ is the family of linear hash functions with fixed prime $p$ and bucket count $m$.

Theorem 1.10. For any two distinct keys $k, k' \in K$, if a hash function $h$ is sampled uniformly at random from $\mathcal{H}_{p,m}$, then

\begin{align*} \Pr[h(k) = h(k')] \le \frac{1}{m}. \end{align*}

Definition. A hash family $\mathcal{H}$ is a set of hash functions $h : K \to \{0, 1, \dots, m - 1\}$. We say that $\mathcal{H}$ is universal if for any two distinct keys $k, k' \in K$, when $h$ is sampled uniformly at random from $\mathcal{H}$,

\begin{align*} \Pr[h(k) = h(k')] \le \frac{1}{m}. \end{align*}

Theorem 1.11. For any universal hash family $\mathcal{H}$ and any $n$ distinct keys $k_1, \dots, k_n \in K$, the probability of the event $C$ that some two of the keys collide admits the upper bound

\begin{align*} \Pr[C] \le \frac{1}{m} \binom{n}{2}. \end{align*}

Ramsey Numbers via the Probabilistic Method

Theorem 1.12. Among any 6 people, there always exist either 3 mutual friends or 3 mutual strangers.

Definition. $K_n$ denotes the complete graph on $n$ vertices, and we say that a complete graph $K_n$ is monochromatic if all its edges have the same color.

Theorem 1.13 (Ramsey's theorem). For every integer $k \ge 2$, there exists an integer $N > 0$ such that every red/blue coloring of the edges of $K_N$ contains a monochromatic $K_k$.

Definition. The Ramsey number $R(k)$ is the smallest integer $N$ such that every red/blue coloring of the edges of $K_N$ contains a monochromatic $K_k$.

Theorem 1.14 (Ramsey's theorem for hypergraphs). For any integers $r \ge 1$ and $k \ge m \ge 1$, there exists an integer $N$ such that for every coloring of the $m$-element subsets of an $N$-element set with $r$ colors, there exists a $k$-element subset whose $m$-element subsets are all assigned the same color.

Theorem 1.15. For all $k \ge 3$,

\begin{align*} 2^{k/2} < R(k) < \binom{2k - 2}{k - 1} < 4^k. \end{align*}

Thus $R(k) = 2^{\Theta(k)}$.

1.5 Conditional Probability and Independence

Conditional Probability

Let $A, B$ be events with $\Pr[B] > 0$. The conditional probability of $A$ given $B$ is defined as

\begin{align*} \Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]}. \end{align*}

We leave $\Pr[A \mid B]$ undefined if $\Pr[B] = 0$.

Independence of Events

Two events $A, B$ are said to be independent if

\begin{align*} \Pr[A \cap B] &= \Pr[A] \Pr[B]. \end{align*}

If $\Pr[B] > 0$, this is equivalent to

\begin{align*} \Pr[A \mid B] &= \Pr[A]. \end{align*}

Mutual Independence

Events $A_1, \ldots, A_m$ are mutually independent if for every non-empty subset $I \subseteq [m]$,

\begin{align*} \Pr\left[\bigcap_{i \in I} A_i\right] &= \prod_{i \in I} \Pr[A_i]. \end{align*}

Theorem 1.16. For $m \ge 2$, let $A_1, \ldots, A_m$ be events with all relevant conditional probabilities well-defined. Then

\begin{align*} \Pr\left[\bigcap_{i=1}^m A_i\right] &= \Pr[A_1] \prod_{i=2}^m \Pr\left[A_i \mid \bigcap_{j=1}^{i-1} A_j\right]. \end{align*}

Theorem 1.17. Let $B_1, \dots, B_m$ be mutually exclusive events with $\bigcup_{i=1}^m B_i = \Omega$ and $\Pr[B_i] > 0$ for all $i \in [m]$. Then for any event $A$,

\begin{align*} \Pr[A] &= \sum_{i=1}^m \Pr[A \mid B_i]\Pr[B_i]. \end{align*}

1.6 Random Variables

Random Variable

Let $\mathcal{P} = (\Omega, p)$ be a discrete probability space. A random variable is a function

\begin{align*} X : \Omega \to \mathbb{R}. \end{align*}

Probability Distribution

A random variable defined on a probability space induces a probability distribution on the real numbers. Here, the probability distribution of $X$ is defined as a function that assigns a probability to each subset of possible values of $X$. That is, for any subset $A \subseteq \mathbb{R}$,

\begin{align*} \Pr[X \in A] &= \sum_{\omega \in \Omega} \mathbb{1}_{[X(\omega) \in A]}\, p(\omega). \end{align*}

Indicator Function

The indicator function of a condition $C$ is

\begin{align*} \mathbb{1}_{[C]} &= \begin{cases} 1 & \text{if the condition } C \text{ holds}, \\ 0 & \text{otherwise}. \end{cases} \end{align*}

For a set $A \subseteq U$, where $U$ is some universe, the indicator function of $A$ is

\begin{align*} \mathbb{1}_A : U \to \{0, 1\}, \quad \mathbb{1}_A(x) &= \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{otherwise}. \end{cases} \end{align*}

Probability Mass Function

For a random variable $X$, its probability mass function $p_X : \mathbb{R} \to [0, 1]$ is defined as

\begin{align*} p_X(x) &:= \Pr[X = x], \quad \forall x \in \mathbb{R}. \end{align*}

Expectation

Let $X$ be a random variable on a discrete probability space $(\Omega, p)$. Its expectation is

\begin{align*} \mathbb{E}[X] &= \sum_{\omega \in \Omega} X(\omega) p(\omega). \end{align*}

Theorem 1.18. For any random variables $X_1, \dots, X_n$ and constants $c_1, \dots, c_n \in \mathbb{R}$,

\begin{align*} \mathbb{E}\left[\sum_{i=1}^n c_i X_i\right] &= \sum_{i=1}^n c_i \mathbb{E}[X_i]. \end{align*}

Theorem 1.19. If $X$ is a random variable that takes values in the non-negative integers, then

\begin{align*} \mathbb{E}[X] &= \sum_{t \ge 0} \Pr[X > t]. \end{align*}
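The tail-sum formula of Theorem 1.19 can be verified exactly on a small example (the fair six-sided die is an assumed toy case):

```python
from fractions import Fraction

# E[X] computed directly, and again via E[X] = sum_{t >= 0} Pr[X > t].
omega = range(1, 7)
p = Fraction(1, 6)

expectation = sum(w * p for w in omega)
tail_sum = sum(
    sum(p for w in omega if w > t)   # Pr[X > t]
    for t in range(0, 6)             # all terms with t >= 6 vanish
)
print(expectation, tail_sum)  # 7/2 7/2
```

Concretely, the tail probabilities are $1, \frac{5}{6}, \frac{4}{6}, \frac{3}{6}, \frac{2}{6}, \frac{1}{6}$, which sum to $\frac{21}{6} = \frac{7}{2}$.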

Conditional Expectation

Let $X$ be a random variable and $B$ be an event with $\Pr[B] > 0$. The conditional expectation of $X$ given $B$ is defined as

\begin{align*} \mathbb{E}[X \mid B] &= \frac{\mathbb{E}[\mathbb{1}_B \cdot X]}{\Pr[B]}. \end{align*}

Conditional Expectation Given a Random Variable

Let $X$ and $Y$ be random variables. The conditional expectation of $X$ given $Y$ is defined as the following random variable:

\begin{align*} \mathbb{E}[X \mid Y] : \Omega \to \mathbb{R}, \quad \omega \mapsto \mathbb{E}[X \mid Y = Y(\omega)]. \end{align*}

Theorem 1.20. Let $X$ be a random variable and $B_1, \dots, B_m$ be mutually exclusive events with $\bigcup_{i=1}^m B_i = \Omega$ and $\Pr[B_i] > 0$ for all $i \in [m]$. Then

\begin{align*} \mathbb{E}[X] &= \sum_{i=1}^m \mathbb{E}[X \mid B_i] \Pr[B_i]. \end{align*}

Theorem 1.21. Let $X$ and $Y$ be random variables. Then

\begin{align*} \mathbb{E}[X] &= \mathbb{E}[\mathbb{E}[X \mid Y]]. \end{align*}
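The tower property of Theorem 1.21 can be checked exactly on a small product space. The setup below (two fair dice, $Y$ the first die, $X$ the sum) is an assumed toy example:

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of two fair dice, each outcome has mass 1/36.
omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)
X = {w: w[0] + w[1] for w in omega}   # X = sum of the two dice
Y = {w: w[0] for w in omega}          # Y = first die

def cond_exp_X_given(y):
    """E[X | Y = y] = E[1_{Y=y} * X] / Pr[Y = y]."""
    num = sum(X[w] * p for w in omega if Y[w] == y)
    den = sum(p for w in omega if Y[w] == y)
    return num / den

lhs = sum(X[w] * p for w in omega)                    # E[X]
rhs = sum(cond_exp_X_given(Y[w]) * p for w in omega)  # E[E[X | Y]]
print(lhs, rhs)  # 7 7
```

Here $\mathbb{E}[X \mid Y = y] = y + 7/2$, and averaging over $Y$ recovers $\mathbb{E}[X] = 7$.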

Independence of Random Variables

We say that random variables $X$ and $Y$ are independent if for all sets $A, B \subseteq \mathbb{R}$,

\begin{align*} \Pr[X \in A, Y \in B] &= \Pr[X \in A]\Pr[Y \in B]. \end{align*}

Equivalently, $X$ and $Y$ are independent if for all $a, b$,

\begin{align*} \Pr[X = a, Y = b] &= \Pr[X = a]\Pr[Y = b]. \end{align*}

Theorem 1.22. If $X, Y$ are independent random variables, then

\begin{align*} \mathbb{E}[XY] &= \mathbb{E}[X]\mathbb{E}[Y]. \end{align*}

Mutual Independence of Random Variables

We say that random variables $X_1, \dots, X_n$ are mutually independent if for all sets $A_1, \dots, A_n \subseteq \mathbb{R}$,

\begin{align*} \Pr[X_1 \in A_1, \dots, X_n \in A_n] &= \prod_{i=1}^n \Pr[X_i \in A_i]. \end{align*}

Equivalently, $X_1, \dots, X_n$ are mutually independent if for all $a_1, \dots, a_n$,

\begin{align*} \Pr[X_1 = a_1, \dots, X_n = a_n] &= \prod_{i=1}^n \Pr[X_i = a_i]. \end{align*}

Data Deduplication via Hashing

Definition. Let $S(D)$ be the set of 5-grams of a document $D$. Define the Jaccard similarity between two documents $D_1$ and $D_2$ as

\begin{align*} J(D_1, D_2) &= \frac{|S(D_1) \cap S(D_2)|}{|S(D_1) \cup S(D_2)|}. \end{align*}

Fix a threshold $\tau \in (0, 1)$ and a tolerance $\epsilon \in (0, 1)$. Given a corpus of $n$ documents $D_1, \dots, D_n$, we build a graph where each document is a node. For any two documents $D_1$ and $D_2$, we compute their Jaccard similarity $J(D_1, D_2)$ and distinguish three cases:

  • If $J(D_1, D_2) \ge \tau + \epsilon$, we add an edge between $D_1$ and $D_2$.
  • If $J(D_1, D_2) \le \tau - \epsilon$, we do nothing.
  • If $\tau - \epsilon < J(D_1, D_2) < \tau + \epsilon$, it is acceptable either to add an edge between $D_1$ and $D_2$ or not.

Definition (MinHash). First, we collect all 5-grams appearing in the $n$ documents of the corpus, and denote this set by $\mathcal{G}$. Next, we sample a random permutation of $\mathcal{G}$ and construct a hash function $h$ by mapping each 5-gram to its rank in the permutation. Finally, for each document $D$, the MinHash value of $D$ is defined as

\begin{align*} \text{MinHash}(D; h) &= \min\{h(x) : x \in S(D)\}. \end{align*}

Theorem 1.23. For any two documents $D, D'$ in the corpus,

\begin{align*} \Pr[\text{MinHash}(D; h) = \text{MinHash}(D'; h)] &= J(D, D'). \end{align*}
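Because the probability in Theorem 1.23 is over the random permutation, it can be verified exactly by enumerating all permutations of a tiny universe. The universe of size 6 and the two "documents" below are assumed toy data:

```python
from itertools import permutations

# Two toy 5-gram sets over a 6-element universe; J = |{1,2}| / |{0,1,2,3}| = 0.5.
universe = list(range(6))
S1, S2 = {0, 1, 2}, {1, 2, 3}
jaccard = len(S1 & S2) / len(S1 | S2)

matches = 0
for perm in permutations(universe):
    rank = {x: i for i, x in enumerate(perm)}          # the hash function h
    if min(rank[x] for x in S1) == min(rank[x] for x in S2):
        matches += 1

print(matches / 720, jaccard)  # 0.5 0.5
```

The MinHash values agree exactly when the first element of $S_1 \cup S_2$ in the permutation lies in $S_1 \cap S_2$, which happens for $|S_1 \cap S_2| / |S_1 \cup S_2|$ of the permutations.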

Definition (MinHash Signature). Fix integers $b$ (the number of bands) and $r$ (the number of rows per band), and sample $b \cdot r$ independent random hash functions $h_j^{(i)}$. Define

\begin{align*} H_j^{(i)}(D) &= \text{MinHash}(D; h_j^{(i)}) \in \mathbb{Z} \quad \text{for } i \in [b] \text{ and } j \in [r]. \end{align*}

We call this vector $H(D)$ the MinHash signature of $D$.

For two documents $D$ and $D'$, we say that their MinHash signatures collide if there exists an index $i \in [b]$ such that

\begin{align*} H^{(i)}(D) &= H^{(i)}(D'). \end{align*}

We write $H(D) \sim H(D')$ when the two signatures collide.

Theorem 1.24. For any two documents $D, D'$ in the corpus,

\begin{align*} \Pr[H(D) \sim H(D')] &= 1 - (1 - J(D, D')^r)^b. \end{align*}

Definition (The Banding Method).

The following algorithm clusters documents with colliding MinHash signatures into the same group.

  • Compute the MinHash signature $H(D)$ for each document $D$.
  • Build an empty graph $G$ with $n$ vertices.
  • For each band $i \in [b]$:
    • Build an empty hash table $\mathcal{T}_i$ with $m = \Theta(n)$ buckets.
    • For each document $D$:
      • If the key $H^{(i)}(D)$ is already present in $\mathcal{T}_i$, select one stored document $D'$ and add an edge between $D$ and $D'$ in $G$.
      • Otherwise, insert $D$ into $\mathcal{T}_i$ with key $H^{(i)}(D)$.
  • Identify all connected components of $G$ using breadth-first search (BFS).
  • Return the documents grouped by these connected components.
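The whole pipeline above can be sketched compactly. The toy corpus, the band parameters $b = 4$, $r = 2$, and the use of Python dicts in place of the hash tables $\mathcal{T}_i$ are illustrative assumptions of mine, not choices from the text:

```python
import random
from collections import defaultdict

def five_grams(text):
    return {text[i:i + 5] for i in range(len(text) - 4)}

def signature(grams, bands):
    """MinHash signature: one min-rank per hash function, grouped into bands."""
    return [tuple(min(rank[g] for g in grams) for rank in band) for band in bands]

random.seed(1)
docs = {
    "d1": "the quick brown fox jumps over the lazy dog",
    "d2": "the quick brown fox jumps over the lazy dog",  # exact duplicate
    "d3": "AAAABBBBCCCCDDDDAAAABBBB",                     # shares no 5-gram
}
grams = {name: five_grams(t) for name, t in docs.items()}
universe = sorted(set().union(*grams.values()))

# Sample b * r independent random permutations (rank dicts) of the universe.
b, r = 4, 2
bands = []
for _ in range(b):
    band = []
    for _ in range(r):
        order = random.sample(universe, len(universe))
        band.append({g: i for i, g in enumerate(order)})
    bands.append(band)

sigs = {name: signature(g, bands) for name, g in grams.items()}

# Band-wise bucketing: documents sharing a band key get an edge.
adj = defaultdict(set)
for i in range(b):
    buckets = {}
    for name in docs:
        key = sigs[name][i]
        if key in buckets:
            adj[name].add(buckets[key])
            adj[buckets[key]].add(name)
        else:
            buckets[key] = name

# Connected components of the collision graph.
groups, seen = [], set()
for name in docs:
    if name in seen:
        continue
    comp, queue = set(), [name]
    while queue:
        u = queue.pop()
        if u in seen:
            continue
        seen.add(u)
        comp.add(u)
        queue.extend(adj[u])
    groups.append(comp)

print(sorted(map(sorted, groups)))  # d1 and d2 grouped together; d3 alone
```

The exact duplicates collide in every band, while `d3`, whose 5-gram set is disjoint from the others, can never share a MinHash value, so the grouping is deterministic here despite the random permutations.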