Herman Chernoff described a method for deriving exponential bounds on the tail probabilities of sums of random variables. All the bounds derived using his method are now known as Chernoff bounds.
Chernoff's method centers on bounding a random variable $X = \sum_{i=1}^n X_i$, which aggregates a sequence of random variables $X_1, X_2, \ldots, X_n$, by studying the random variable $e^{tX}$ rather than $X$ itself. There are many flavors of Chernoff bounds; we present a bound on the relative error for a series of experiments as well as a bound on the absolute error.
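For intuition, here is a minimal numeric sketch of the method (assuming Bernoulli summands and illustrative values of $n$, $p$, and the threshold $a$): it minimizes the Markov bound $e^{-ta}\,\operatorname{E}[e^{tX}]$ over a grid of $t > 0$ and compares it with the exact binomial tail probability.

# Illustrative sketch of Chernoff's method for X = X_1 + ... + X_n, X_i ~ Bernoulli(p).
# The values n, p, a below are assumed examples, not taken from the text.
import math

def chernoff_bound(n, p, a, t_grid_size=10_000, t_max=10.0):
    """Minimize exp(-t*a) * (p*exp(t) + 1 - p)**n over a grid of t > 0 (in log space)."""
    best_log = 0.0  # log of the trivial bound Pr(...) <= 1
    for k in range(1, t_grid_size + 1):
        t = t_max * k / t_grid_size
        log_bound = -t * a + n * math.log(p * math.exp(t) + 1 - p)
        best_log = min(best_log, log_bound)
    return math.exp(best_log)

def exact_upper_tail(n, p, a):
    """Exact Pr(X >= a) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(a, n + 1))

n, p, a = 100, 0.5, 70                 # illustrative values
print(chernoff_bound(n, p, a))         # roughly 2.7e-4, the Chernoff upper bound
print(exact_upper_tail(n, p, a))       # roughly 3.9e-5, the true tail (never larger)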
Theorem (absolute error)
The following theorem is due to Wassily Hoeffding.
Assume random variables $X_1, X_2, \ldots, X_n$ are i.i.d.
Let $p = \operatorname{E}[X_i]$, $X_i \in \{0, 1\}$, and $\varepsilon > 0$.
Then

$\Pr\!\left(\frac{1}{n}\sum_{i=1}^n X_i \ge p + \varepsilon\right) \le \left(\left(\frac{p}{p+\varepsilon}\right)^{p+\varepsilon} \left(\frac{1-p}{1-p-\varepsilon}\right)^{1-p-\varepsilon}\right)^{n} = e^{-n\,D(p+\varepsilon \,\|\, p)}$

and

$\Pr\!\left(\frac{1}{n}\sum_{i=1}^n X_i \le p - \varepsilon\right) \le e^{-n\,D(p-\varepsilon \,\|\, p)},$

where

$D(x \,\|\, y) = x \ln\frac{x}{y} + (1-x)\ln\frac{1-x}{1-y}$

is the Kullback-Leibler divergence between Bernoulli distributed random variables with parameters $x$ and $y$ respectively.
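The two expressions for the upper-tail bound are equal; the following quick check (with illustrative values of $n$, $p$, and $\varepsilon$) confirms numerically that the product form matches $e^{-n\,D(p+\varepsilon\,\|\,p)}$.

# Check (illustrative values assumed) that the product form of the bound equals
# exp(-n * D(p + eps || p)).
import math

def kl_bernoulli(x, y):
    """Kullback-Leibler divergence D(x || y) between Bernoulli(x) and Bernoulli(y)."""
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))

n, p, eps = 50, 0.3, 0.1
q = p + eps
product_form = ((p / q) ** q * ((1 - p) / (1 - q)) ** (1 - q)) ** n
kl_form = math.exp(-n * kl_bernoulli(q, p))
print(product_form, kl_form)  # identical up to floating-point rounding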
Proof
The proof of this result relies on Markov's inequality for positive-valued random variables. First, let us set $q = p + \varepsilon$ in our bound for ease of notation. Letting $t$ be an arbitrary positive real, we see that

$\Pr\!\left(\frac{1}{n}\sum_{i=1}^n X_i \ge q\right) = \Pr\!\left(e^{t\sum_{i=1}^n X_i} \ge e^{tnq}\right).$

Applying Markov's inequality to the last expression, we see that

$\Pr\!\left(e^{t\sum_{i=1}^n X_i} \ge e^{tnq}\right) \le e^{-tnq}\,\operatorname{E}\!\left[e^{t\sum_{i=1}^n X_i}\right] = e^{-tnq}\left(\operatorname{E}\!\left[e^{tX_1}\right]\right)^{n},$

where the equality follows from the independence of the $X_i$'s. Now, knowing that $\Pr(X_i = 1) = p$ and $\Pr(X_i = 0) = 1 - p$, we have

$e^{-tnq}\left(\operatorname{E}\!\left[e^{tX_1}\right]\right)^{n} = \left(e^{-tq}\left(p e^{t} + 1 - p\right)\right)^{n}.$

Because $t$ is arbitrary, we can minimize the above expression with respect to $t$, which is easily done using calculus and some logarithms. Thus,

$\frac{d}{dt}\ln\!\left(e^{-tq}\left(p e^{t} + 1 - p\right)\right) = -q + \frac{p e^{t}}{p e^{t} + 1 - p}.$

Setting the last equation to zero and solving, we have

$\frac{p e^{t}}{p e^{t} + 1 - p} = q, \qquad p e^{t}(1-q) = q(1-p),$

so that $e^{t} = \frac{(1-p)q}{(1-q)p}$. Thus, $t = \ln\!\left(\frac{(1-p)q}{(1-q)p}\right)$. As $q = p + \varepsilon > p$, we see that $t > 0$, so our bound is satisfied on $t$. Having solved for $t$, we can plug back into the equations above (using $p e^{t} + 1 - p = \frac{(1-p)q}{1-q} + 1 - p = \frac{1-p}{1-q}$) to find that

$\ln\!\left(e^{-tq}\left(p e^{t} + 1 - p\right)\right) = -q\ln\frac{(1-p)q}{(1-q)p} + \ln\frac{1-p}{1-q} = -q\ln\frac{q}{p} - (1-q)\ln\frac{1-q}{1-p} = -D(q \,\|\, p).$

We now have our desired result, that

$\Pr\!\left(\frac{1}{n}\sum_{i=1}^n X_i \ge p + \varepsilon\right) \le e^{-n\,D(p+\varepsilon \,\|\, p)}.$

To complete the proof for the symmetric case, we simply define the random variable $Y_i = 1 - X_i$, apply the same proof, and plug $Y_i$ into our bound.
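The optimization step can be sanity-checked numerically. The sketch below (with illustrative values of $p$ and $\varepsilon$) verifies that the choice $t = \ln\!\left(\frac{(1-p)q}{(1-q)p}\right)$ attains the value $e^{-D(q\,\|\,p)}$ and is not beaten by any $t$ on a grid.

# Numeric sanity check (illustrative p, eps assumed) of the minimizing t in the proof.
import math

def kl_bernoulli(x, y):
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))

p, eps = 0.3, 0.1
q = p + eps
g = lambda t: math.exp(-t * q) * (p * math.exp(t) + 1 - p)

t_star = math.log((1 - p) * q / ((1 - q) * p))
print(g(t_star), math.exp(-kl_bernoulli(q, p)))                  # the two values agree
print(all(g(k / 1000.0) >= g(t_star) for k in range(1, 5001)))   # True: t_star minimizes g on the grid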
Simpler bounds
A simpler bound follows by relaxing the theorem using

$D(p + \varepsilon \,\|\, p) \ge 2\varepsilon^{2},$

which follows from the convexity of $D(p + \varepsilon \,\|\, p)$ and the fact that

$\frac{d^{2}}{d\varepsilon^{2}} D(p+\varepsilon \,\|\, p) = \frac{1}{(p+\varepsilon)(1-p-\varepsilon)} \ge 4 = \frac{d^{2}}{d\varepsilon^{2}}\left(2\varepsilon^{2}\right).$

This results in a special case of Hoeffding's inequality.
Sometimes, the bound

$D\bigl((1+x)p \,\|\, p\bigr) \ge \tfrac{1}{4}x^{2}p, \qquad -\tfrac{1}{2} \le x \le \tfrac{1}{2},$

which is stronger for $p < \tfrac{1}{8}$, is also used.
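A grid check of the relaxation $D(p+\varepsilon\,\|\,p) \ge 2\varepsilon^{2}$ is given below; it is purely illustrative (a finite grid, not a proof), with the grid resolution chosen arbitrarily.

# Grid check (illustrative, not a proof) that D(p + eps || p) >= 2 * eps**2
# over valid parameter pairs.
import math

def kl_bernoulli(x, y):
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))

ok = True
for i in range(1, 100):
    p = i / 100.0
    for j in range(1, 100):
        eps = j / 100.0
        if p + eps < 1.0:                                   # only valid parameters
            ok = ok and kl_bernoulli(p + eps, p) >= 2 * eps * eps
print(ok)  # True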
Rudolf Ahlswede and Andreas Winter introduced a Chernoff bound for matrix-valued random variables.
Theorem (relative error)
Let $X_1, X_2, \ldots, X_n$ be independent random variables taking on values 0 or 1. Further, assume that $\Pr(X_i = 1) = p_i$. Then, if we let $X = \sum_{i=1}^n X_i$ and $\mu$ be the expectation of $X$, for any $\delta > 0$

$\Pr\bigl(X > (1+\delta)\mu\bigr) < \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}.$
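As a concrete illustration (with assumed example values of the $p_i$ and $\delta$), the bound can be compared against a Monte Carlo estimate of $\Pr\bigl(X > (1+\delta)\mu\bigr)$ for heterogeneous success probabilities.

# Illustrative comparison (assumed parameters) of the multiplicative Chernoff bound
# with a Monte Carlo estimate of Pr(X > (1 + delta) * mu).
import math, random

random.seed(0)
p = [0.1, 0.2, 0.3, 0.4, 0.5] * 20            # 100 independent Bernoulli parameters
mu = sum(p)                                     # mu = E[X] = 30
delta = 0.5

bound = (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu
trials = 200_000
hits = sum(
    1 for _ in range(trials)
    if sum(1 for pi in p if random.random() < pi) > (1 + delta) * mu
)
print(bound, hits / trials)  # the bound exceeds the estimated probability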
Proof
For any $t > 0$, we have that $\Pr\bigl(X > (1+\delta)\mu\bigr) = \Pr\bigl(e^{tX} > e^{t(1+\delta)\mu}\bigr)$. Applying Markov's inequality to the right-hand side of the previous formula (noting that $e^{tX}$ is always a positive random variable), we have

$\Pr\bigl(X > (1+\delta)\mu\bigr) < \frac{\operatorname{E}\!\left[e^{tX}\right]}{e^{t(1+\delta)\mu}}.$

Noting that $X = \sum_{i=1}^n X_i$, we can begin to bound $\operatorname{E}\!\left[e^{tX}\right]$. We have

$\operatorname{E}\!\left[e^{t\sum_{i=1}^n X_i}\right] = \operatorname{E}\!\left[\prod_{i=1}^n e^{tX_i}\right]$
$= \prod_{i=1}^n \operatorname{E}\!\left[e^{tX_i}\right]$
$= \prod_{i=1}^n \left(p_i e^{t} + (1 - p_i)\right).$

The second line above follows because of the independence of the $X_i$'s, and the third line follows because $e^{tX_i}$ takes the value $e^{t}$ with probability $p_i$ and the value 1 with probability $1 - p_i$. Re-writing $p_i e^{t} + (1 - p_i)$ as $p_i(e^{t} - 1) + 1$ and recalling that $1 + x \le e^{x}$ (with strict inequality if $x > 0$), we set $x = p_i(e^{t} - 1)$. Thus

$\operatorname{E}\!\left[e^{tX}\right] \le \prod_{i=1}^n e^{p_i(e^{t}-1)} = e^{(e^{t}-1)\sum_{i=1}^n p_i} = e^{(e^{t}-1)\mu}.$

If we simply set $t = \ln(1+\delta)$ so that $t > 0$ for $\delta > 0$, we can substitute and find

$\frac{\operatorname{E}\!\left[e^{tX}\right]}{e^{t(1+\delta)\mu}} \le \frac{e^{(e^{t}-1)\mu}}{e^{t(1+\delta)\mu}} = \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}.$

This proves the desired result. A similar proof strategy can be used to show that, for $0 < \delta < 1$,

$\Pr\bigl(X < (1-\delta)\mu\bigr) < \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.$
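The key moment-generating-function step, $\prod_i\bigl(p_i e^{t} + 1 - p_i\bigr) \le e^{(e^{t}-1)\mu}$, can also be checked numerically; the sketch below uses assumed illustrative values of the $p_i$ and $\delta$.

# Check (illustrative values assumed) of the MGF step and of the final substitution
# t = ln(1 + delta).
import math

p = [0.05, 0.2, 0.35, 0.6, 0.8]
mu = sum(p)
delta = 0.5
t = math.log(1 + delta)

exact_mgf = math.prod(pi * math.exp(t) + (1 - pi) for pi in p)
mgf_bound = math.exp((math.exp(t) - 1) * mu)
print(exact_mgf <= mgf_bound)  # True, since 1 + x <= e^x term by term

# Dividing the MGF bound by e^{t (1 + delta) mu} recovers the stated bound:
print(mgf_bound / math.exp(t * (1 + delta) * mu),
      (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu)  # equal up to rounding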