Herman Chernoff described a method for deriving exponential bounds on the tail probabilities of sums of random variables. All the bounds derived using his method are now known as Chernoff bounds.
Chernoff's method centers on bounding a random variable $X = \sum_{i=1}^n X_i$, which aggregates a sequence of random variables $X_1, X_2, \ldots, X_n$, by studying the random variable $e^{tX}$ rather than $X$ itself. There are many flavors of Chernoff bounds; we present a bound on the relative error for a series of experiments as well as a bound on the absolute error.
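For intuition, here is a minimal numeric sketch of the method (assuming Bernoulli summands and illustrative values of $n$, $p$, and the threshold $a$): it minimizes the Markov bound $e^{-ta}\,\operatorname{E}[e^{tX}]$ over a grid of $t > 0$ and compares it with the exact binomial tail probability.

# Illustrative sketch of Chernoff's method for X = X_1 + ... + X_n, X_i ~ Bernoulli(p).
# The values n, p, a below are assumed examples, not taken from the text.
import math

def chernoff_bound(n, p, a, t_grid_size=10_000, t_max=10.0):
    """Minimize exp(-t*a) * (p*exp(t) + 1 - p)**n over a grid of t > 0 (in log space)."""
    best_log = 0.0  # log of the trivial bound Pr(...) <= 1
    for k in range(1, t_grid_size + 1):
        t = t_max * k / t_grid_size
        log_bound = -t * a + n * math.log(p * math.exp(t) + 1 - p)
        best_log = min(best_log, log_bound)
    return math.exp(best_log)

def exact_upper_tail(n, p, a):
    """Exact Pr(X >= a) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(a, n + 1))

n, p, a = 100, 0.5, 70                 # illustrative values
print(chernoff_bound(n, p, a))         # roughly 2.7e-4, the Chernoff upper bound
print(exact_upper_tail(n, p, a))       # roughly 3.9e-5, the true tail (never larger)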
Theorem (absolute error)
The following theorem is due to Wassily Hoeffding.
Assume random variables $X_1, X_2, \ldots, X_n$ are i.i.d.
Let $p = \operatorname{E}[X_i]$, $X_i \in \{0, 1\}$, and $\varepsilon > 0$.
Then

$\Pr\!\left(\frac{1}{n}\sum_{i=1}^n X_i \ge p + \varepsilon\right) \le \left(\left(\frac{p}{p+\varepsilon}\right)^{p+\varepsilon} \left(\frac{1-p}{1-p-\varepsilon}\right)^{1-p-\varepsilon}\right)^{n} = e^{-n\,D(p+\varepsilon \,\|\, p)}$

and

$\Pr\!\left(\frac{1}{n}\sum_{i=1}^n X_i \le p - \varepsilon\right) \le e^{-n\,D(p-\varepsilon \,\|\, p)},$

where

$D(x \,\|\, y) = x \ln\frac{x}{y} + (1-x)\ln\frac{1-x}{1-y}$

is the Kullback-Leibler divergence between Bernoulli distributed random variables with parameters $x$ and $y$ respectively.
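The two expressions for the upper-tail bound are equal; the following quick check (with illustrative values of $n$, $p$, and $\varepsilon$) confirms numerically that the product form matches $e^{-n\,D(p+\varepsilon\,\|\,p)}$.

# Check (illustrative values assumed) that the product form of the bound equals
# exp(-n * D(p + eps || p)).
import math

def kl_bernoulli(x, y):
    """Kullback-Leibler divergence D(x || y) between Bernoulli(x) and Bernoulli(y)."""
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))

n, p, eps = 50, 0.3, 0.1
q = p + eps
product_form = ((p / q) ** q * ((1 - p) / (1 - q)) ** (1 - q)) ** n
kl_form = math.exp(-n * kl_bernoulli(q, p))
print(product_form, kl_form)  # identical up to floating-point rounding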
Proof
The proof of this result relies on Markov's inequality for positive-valued random variables. First, let us set $q = p + \varepsilon$ in our bound for ease of notation. Letting $t$ be an arbitrary positive real, we see that

$\Pr\!\left(\frac{1}{n}\sum_{i=1}^n X_i \ge q\right) = \Pr\!\left(e^{t\sum_{i=1}^n X_i} \ge e^{tnq}\right).$

Applying Markov's inequality to the last expression, we see that

$\Pr\!\left(e^{t\sum_{i=1}^n X_i} \ge e^{tnq}\right) \le e^{-tnq}\,\operatorname{E}\!\left[e^{t\sum_{i=1}^n X_i}\right] = e^{-tnq}\left(\operatorname{E}\!\left[e^{tX_1}\right]\right)^{n},$

where the equality follows from the independence of the $X_i$'s. Now, knowing that $\Pr(X_i = 1) = p$ and $\Pr(X_i = 0) = 1 - p$, we have

$e^{-tnq}\left(\operatorname{E}\!\left[e^{tX_1}\right]\right)^{n} = \left(e^{-tq}\left(p e^{t} + 1 - p\right)\right)^{n}.$

Because $t$ is arbitrary, we can minimize the above expression with respect to $t$, which is easily done using calculus and some logarithms. Thus,

$\frac{d}{dt}\ln\!\left(e^{-tq}\left(p e^{t} + 1 - p\right)\right) = -q + \frac{p e^{t}}{p e^{t} + 1 - p}.$

Setting the last equation to zero and solving, we have

$\frac{p e^{t}}{p e^{t} + 1 - p} = q, \qquad p e^{t}(1-q) = q(1-p),$

so that $e^{t} = \frac{(1-p)q}{(1-q)p}$. Thus, $t = \ln\!\left(\frac{(1-p)q}{(1-q)p}\right)$. As $q = p + \varepsilon > p$, we see that $t > 0$, so our bound is satisfied on $t$. Having solved for $t$, we can plug back into the equations above (using $p e^{t} + 1 - p = \frac{(1-p)q}{1-q} + 1 - p = \frac{1-p}{1-q}$) to find that

$\ln\!\left(e^{-tq}\left(p e^{t} + 1 - p\right)\right) = -q\ln\frac{(1-p)q}{(1-q)p} + \ln\frac{1-p}{1-q} = -q\ln\frac{q}{p} - (1-q)\ln\frac{1-q}{1-p} = -D(q \,\|\, p).$

We now have our desired result, that

$\Pr\!\left(\frac{1}{n}\sum_{i=1}^n X_i \ge p + \varepsilon\right) \le e^{-n\,D(p+\varepsilon \,\|\, p)}.$

To complete the proof for the symmetric case, we simply define the random variable $Y_i = 1 - X_i$, apply the same proof, and plug $Y_i$ into our bound.
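The optimization step can be sanity-checked numerically. The sketch below (with illustrative values of $p$ and $\varepsilon$) verifies that the choice $t = \ln\!\left(\frac{(1-p)q}{(1-q)p}\right)$ attains the value $e^{-D(q\,\|\,p)}$ and is not beaten by any $t$ on a grid.

# Numeric sanity check (illustrative p, eps assumed) of the minimizing t in the proof.
import math

def kl_bernoulli(x, y):
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))

p, eps = 0.3, 0.1
q = p + eps
g = lambda t: math.exp(-t * q) * (p * math.exp(t) + 1 - p)

t_star = math.log((1 - p) * q / ((1 - q) * p))
print(g(t_star), math.exp(-kl_bernoulli(q, p)))                  # the two values agree
print(all(g(k / 1000.0) >= g(t_star) for k in range(1, 5001)))   # True: t_star minimizes g on the grid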
Simpler bounds
A simpler bound follows by relaxing the theorem using

$D(p + \varepsilon \,\|\, p) \ge 2\varepsilon^{2},$

which follows from the convexity of $D(p + \varepsilon \,\|\, p)$ and the fact that

$\frac{d^{2}}{d\varepsilon^{2}} D(p+\varepsilon \,\|\, p) = \frac{1}{(p+\varepsilon)(1-p-\varepsilon)} \ge 4 = \frac{d^{2}}{d\varepsilon^{2}}\left(2\varepsilon^{2}\right).$

This results in a special case of Hoeffding's inequality.
Sometimes, the bound

$D\bigl((1+x)p \,\|\, p\bigr) \ge \tfrac{1}{4}x^{2}p, \qquad -\tfrac{1}{2} \le x \le \tfrac{1}{2},$

which is stronger for $p < \tfrac{1}{8}$, is also used.
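A grid check of the relaxation $D(p+\varepsilon\,\|\,p) \ge 2\varepsilon^{2}$ is given below; it is purely illustrative (a finite grid, not a proof), with the grid resolution chosen arbitrarily.

# Grid check (illustrative, not a proof) that D(p + eps || p) >= 2 * eps**2
# over valid parameter pairs.
import math

def kl_bernoulli(x, y):
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))

ok = True
for i in range(1, 100):
    p = i / 100.0
    for j in range(1, 100):
        eps = j / 100.0
        if p + eps < 1.0:                                   # only valid parameters
            ok = ok and kl_bernoulli(p + eps, p) >= 2 * eps * eps
print(ok)  # True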
Rudolf Ahlswede and Andreas Winter introduced a Chernoff bound for matrix-valued random variables.
Theorem (relative error)
Let $X_1, X_2, \ldots, X_n$ be independent random variables taking on values 0 or 1. Further, assume that $\Pr(X_i = 1) = p_i$. Then, if we let $X = \sum_{i=1}^n X_i$ and $\mu$ be the expectation of $X$, for any $\delta > 0$

$\Pr\bigl(X > (1+\delta)\mu\bigr) < \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}.$
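As a concrete illustration (with assumed example values of the $p_i$ and $\delta$), the bound can be compared against a Monte Carlo estimate of $\Pr\bigl(X > (1+\delta)\mu\bigr)$ for heterogeneous success probabilities.

# Illustrative comparison (assumed parameters) of the multiplicative Chernoff bound
# with a Monte Carlo estimate of Pr(X > (1 + delta) * mu).
import math, random

random.seed(0)
p = [0.1, 0.2, 0.3, 0.4, 0.5] * 20            # 100 independent Bernoulli parameters
mu = sum(p)                                     # mu = E[X] = 30
delta = 0.5

bound = (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu
trials = 200_000
hits = sum(
    1 for _ in range(trials)
    if sum(1 for pi in p if random.random() < pi) > (1 + delta) * mu
)
print(bound, hits / trials)  # the bound exceeds the estimated probability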
Proof
For any $t > 0$, we have that $\Pr\bigl(X > (1+\delta)\mu\bigr) = \Pr\bigl(e^{tX} > e^{t(1+\delta)\mu}\bigr)$. Applying Markov's inequality to the right-hand side of the previous formula (noting that $e^{tX}$ is always a positive random variable), we have

$\Pr\bigl(X > (1+\delta)\mu\bigr) < \frac{\operatorname{E}\!\left[e^{tX}\right]}{e^{t(1+\delta)\mu}}.$

Noting that $X = \sum_{i=1}^n X_i$, we can begin to bound $\operatorname{E}\!\left[e^{tX}\right]$. We have

$\operatorname{E}\!\left[e^{t\sum_{i=1}^n X_i}\right] = \operatorname{E}\!\left[\prod_{i=1}^n e^{tX_i}\right]$
$= \prod_{i=1}^n \operatorname{E}\!\left[e^{tX_i}\right]$
$= \prod_{i=1}^n \left(p_i e^{t} + (1 - p_i)\right).$

The second line above follows because of the independence of the $X_i$'s, and the third line follows because $e^{tX_i}$ takes the value $e^{t}$ with probability $p_i$ and the value 1 with probability $1 - p_i$. Re-writing $p_i e^{t} + (1 - p_i)$ as $p_i(e^{t} - 1) + 1$ and recalling that $1 + x \le e^{x}$ (with strict inequality if $x > 0$), we set $x = p_i(e^{t} - 1)$. Thus

$\operatorname{E}\!\left[e^{tX}\right] \le \prod_{i=1}^n e^{p_i(e^{t}-1)} = e^{(e^{t}-1)\sum_{i=1}^n p_i} = e^{(e^{t}-1)\mu}.$

If we simply set $t = \ln(1+\delta)$ so that $t > 0$ for $\delta > 0$, we can substitute and find

$\frac{\operatorname{E}\!\left[e^{tX}\right]}{e^{t(1+\delta)\mu}} \le \frac{e^{(e^{t}-1)\mu}}{e^{t(1+\delta)\mu}} = \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}.$

This proves the desired result. A similar proof strategy can be used to show that, for $0 < \delta < 1$,

$\Pr\bigl(X < (1-\delta)\mu\bigr) < \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.$
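The key moment-generating-function step, $\prod_i\bigl(p_i e^{t} + 1 - p_i\bigr) \le e^{(e^{t}-1)\mu}$, can also be checked numerically; the sketch below uses assumed illustrative values of the $p_i$ and $\delta$.

# Check (illustrative values assumed) of the MGF step and of the final substitution
# t = ln(1 + delta).
import math

p = [0.05, 0.2, 0.35, 0.6, 0.8]
mu = sum(p)
delta = 0.5
t = math.log(1 + delta)

exact_mgf = math.prod(pi * math.exp(t) + (1 - pi) for pi in p)
mgf_bound = math.exp((math.exp(t) - 1) * mu)
print(exact_mgf <= mgf_bound)  # True, since 1 + x <= e^x term by term

# Dividing the MGF bound by e^{t (1 + delta) mu} recovers the stated bound:
print(mgf_bound / math.exp(t * (1 + delta) * mu),
      (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu)  # equal up to rounding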