How to use martingales

a repository of mathematical know-how

How to use martingales

Title: *

Area of mathematics: *

A comma-separated list of areas of mathematics to which this article applies. Use ">" to tag in a subcategory. Example: Analysis > Harmonic analysis, Combinatorics

Keywords:

A comma-separated list of keywords associated with this article. Example: free group

Used in:

A comma-separated list of examples of where this technique is used. Example: Cauchy-Schwarz inequality

Parent articles:		Order

Body:

[QUICK DESCRIPTION]
An important general principle in probability is that a sum of bounded independent random variables, each of mean 0, has a very small probability of having a large modulus, where "large" means substantially larger than its standard deviation. For example, if you toss a coin 1000 times, then the probability that the number of heads does not lie between 400 and 600 is tiny. This principle can be proved rigorously -- see Example 3 of [[bounding probabilities by expectations]]. Also, and this is the main theme of this article, the assumption of independence can be weakened in a way that makes the principle much more widely applicable.

[PREREQUISITES]
Basic concepts of probability, graph theory, metric spaces.

[note contributions wanted] More examples wanted. [/note]

[EXAMPLE permutations]
Let $S_n$ be the symmetric group on $n$ elements. There is a natural graph with $S_n$ as its vertex set, where two permutations $\rho$ and $\sigma$ are joined if and only if $\rho\sigma^{-1}$ is a transposition, or equivalently $\rho=\tau\sigma$ for some transposition $\tau.$ This graph gives us a natural metric on $S_n$: the distance between two permutations $\rho$ and $\sigma$ is the smallest $k$ such that we can find $k$ transpositions $\tau_1,\dots,\tau_k$ with $\rho=\tau_1\dots\tau_k\sigma.$

Now let $\phi:S_n\rightarrow\mathbb{R}$ be a function that is 1-Lipschitz with respect to this distance. This is equivalent to the assumption that $|\phi(\rho)-\phi(\sigma)|\leq 1$ whenever there is a transposition $\tau$ such that $\rho=\tau\sigma.$ (An obvious example of such a function is the function $\phi$ where $\phi(\rho)$ is defined to be the smallest $k$ such that $\rho$ is a product of $k$ transpositions.) Since every permutation can be written as a product of at most $n-1$ transpositions and an $n$-cycle needs exactly $n-1$ transpositions, the diameter of $S_n$ is $n-1$, so $|\phi(\rho)-\phi(\sigma)|$ cannot exceed $n-1.$ We shall show that $\phi$ is "highly concentrated", in the sense that for almost all pairs $(\rho,\sigma)$ $|\phi(\rho)-\phi(\sigma)|$ is far smaller than $n$ -- it is rarely much more than $\sqrt{n}.$

[GENERAL DISCUSSION]

Before we get on to the proof, we need to state Azuma's inequality, and before we do that we need to say what a martingale is. The latter is simple. Let $Z_0,Z_1,\dots,Z_n$ be a sequence of random variables. They are a ''martingale'' if for every $k$ the expectation of $Z_k$ given the values of $Z_0,Z_1,\dots,Z_{k-1}$ is $Z_{k-1}.$

It is perhaps clearer to express this in terms of the ''martingale difference sequence'' $X_1,\dots,X_n,$ where $X_k=Z_k-Z_{k-1}$ for each $k$. Then the martingale condition is that the expectation of $X_k$ given the values of $Z_0$ and $X_1,\dots,X_{k-1}$ is always $0$. An obvious example is when the $X_i$ are independent random variables with mean $0$, but as we shall see there is an easy way of generating many more examples than this.

Suppose that each variable $X_i$ takes values between $-1$ and $1$. Then Azuma's inequality states that the probability that $|X_1+\dots+X_n|\geq t$ is at most $2\exp(-t^2/4n).$ More generally, if $X_i$ takes values between $-\lambda_i$ and $\lambda_i$, then the probability that $|X_1+\dots+X_n|\geq t$ is at most $2\exp(-t^2/4\sum \lambda_i^2).$

Now let us return to our example, and see how we can define a martingale in a very natural way.

[EXAMPLE permutations]
Recall that we have a function $\phi$ defined on $S_n$, with the property that if $\rho$ and $\sigma$ are partitions such that $\rho\sigma^{-1}$ is a transposition, then $|\phi(\rho)-\phi(\sigma)|\leq 1.$ In order to define a martingale difference sequence, we choose a random permutation $\rho$ and "reveal information about $\rho$ little by little".

What does this mean? Well, we look at the values of $\rho(1)$, $\rho(2)$, $\rho(3)$, and so on, and at the $k$th stage we define $Z_k$ to be the expected value of $\phi(\rho)$ given the values $\rho(1),\dots,\rho(k).$ In other words, $Z_0$ is just $\mathbb{E}\phi,$ $Z_1$ is the expected value of $\mathbb{E}\phi(\sigma)$ given that $\sigma(1)=\rho(1),$ and in general $Z_k$ is the expected value of $\mathbb{E}\phi(\sigma)$ given that $\sigma(i)=\rho(i)$ for every $i\in\{1,2,\dots,k\}.$

What can we say about $X_k=Z_k-Z_{k-1}$? We would like to prove an upper bound for the maximum absolute value that $X_k$ can take, and we would also like to prove that the expectation of $X_k$ given the values of $X_1,\dots,X_{k-1}$ is $0$.

$X_k$ is the amount by which the expectation of $\phi(\sigma)$ can change if we add to our existing knowledge of the values of $\sigma(1),\dots,\sigma(k-1)$ the knowledge that $\sigma(k)=\rho(k).$ To get an upper bound for this, let us define $A_r$, for each $r$ not in the set $\{\rho(1),\dots,\rho(k-1)\}$, to be the set of all $\sigma$ such that $\sigma(i)=\rho(i)$ for every $i<k$ and $\sigma(k)=r.$ Now let us pick $s$ and $t$ and define a bijection $\beta_{st}$ between $A_s$ and $A_t$ as follows. Given a permutation $\sigma\in A_s$, let $\beta_{st}(\sigma)=\tau$ agree with $\sigma$ except that $\tau(k)=t$ and $\tau(\sigma^{-1}(t))=s.$ To put this another way, $\beta_{st}(\sigma)=(st)\sigma,$ where $(st)$ is the transposition that exchanges $s$ and $t$.

Now the distance between $\sigma$ and $\beta_{rs}\sigma$ is at most 1 (and exactly 1 if $r\ne s$). Therefore, $|\phi(\sigma)-\phi(\beta_{rs}\sigma)|\leq 1$ for every $\sigma\in A_s.$ It follows that $\mathbb{E}[\phi(\sigma)|\sigma\in A_s]$ differs from $\mathbb{E}[\phi(\sigma)|\sigma\in A_t]$ by at most 1.

In other words, once we know what $\rho(k)$ is, the difference it can make to our expectation of $\phi(\rho)$ is at most 1. Also, the average of $\phi(\sigma)$ given that $\sigma(i)=\rho(i)$ for every $i<k$ is the average over $s$ of the averages of $\phi(\sigma)$ over $A_s$ (since all the sets $A_s$ have the same size), so the expectation of $Z_k$ given $Z_{k-1}$ is $Z_{k-1}$, as we wanted. (Equivalently, the expectation of $X_k$ given $X_1,\dots,X_{k-1}$ is $0.$)

We have now checked the facts we wanted to check: we have a martingale and the random variables $X_i$ in the martingale difference sequence take values in $[-1,1]$. It follows from Azuma's inequality that the probability that $|\phi(\rho)-\mathbb{E}\phi|\geq t$ is at most $2\exp(-t^2/4n).$ From this we see that even though the diameter of $S_n$ is $n-1$, for the vast majority of pairs $(\rho,\sigma)$ the difference $|\phi(\rho)-\phi(\sigma)|$ is at most $100\sqrt{n}.$

[GENERAL DISCUSSION]

The proof of Azuma's inequality is a relatively straightforward generalization of the proof given in Example 3 of [[bounding probabilities by expectations]], which deals with the special case where the martingale difference sequence consists of independent random variables that take values in $[-1,1].$

This is a stub

A stub is an article that is not sufficiently complete to be interesting.

Notifications

File attachments

Changes made to the attachments are not permanent until you save this post. The first "listed" file will be included in RSS feeds.

Attach new file:

Images are larger than 640x480 will be resized. The maximum upload size is 1 MB. Only files with the following extensions may be uploaded: jpg jpeg gif png svg.

Revision information

Log message:

An explanation of the additions or updates being made to help other authors understand your motivations.

How to use martingales

Recent articles

Active forum topics

Recent comments