Chebyshev's inequality
In probability theory, Chebyshev's inequality (also called the Bienaymé–Chebyshev inequality) helps us understand how likely it is for a random variable to be far away from its average value, or mean. It tells us that the chance of a value being more than a certain number of standard deviations away from the mean is limited: that chance is at most one divided by the square of that number of standard deviations.
This rule is useful because it works for any kind of data distribution, as long as the mean and variance are known and finite. For example, it can help prove important results in statistics, like the weak law of large numbers. Unlike the 68–95–99.7 rule, which only applies to normal distributions, Chebyshev's inequality gives a general idea about where most values lie. It says that at least 75% of values will be within two standard deviations of the mean, and at least 8/9 (about 88.9%) will be within three standard deviations, no matter what the shape of the distribution is.
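As a quick check, both percentages come straight from the 1/k² bound:

```latex
% At least 1 - 1/k^2 of values lie within k standard deviations:
1 - \tfrac{1}{2^2} = \tfrac{3}{4} = 75\%,
\qquad
1 - \tfrac{1}{3^2} = \tfrac{8}{9} \approx 88.9\%.
```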
The idea is closely related to Markov's inequality, and sometimes the two are confused. But Chebyshev's inequality is especially important because it gives a clear bound on how spread out data can be, and it is exact for certain special cases.
History
This important rule in statistics is named after the Russian mathematician Pafnuty Chebyshev. However, it was first formulated by his friend Irénée-Jules Bienaymé in 1853, and Chebyshev proved it in 1867. Another mathematician, Andrey Markov, who was Chebyshev's student, gave another proof of the rule in his doctoral thesis in 1884.
Statement
Chebyshev's inequality helps us understand how likely a value is to be close to the average in a set of data. It says that for any random variable with a known mean and a finite, non-zero variance, the chance that a value is far away from the average can be bounded: for any number k greater than zero, the probability of being at least k standard deviations away from the mean is at most 1/k².
For example, if we know the average and how spread out the numbers are, we can say that it's unlikely for a number to be very far from this average, no matter what the distribution looks like. This idea is useful in many areas where we deal with uncertainty or variability in data.
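Written as a formula, with μ for the mean, σ for the standard deviation, and any real k > 0:

```latex
% Chebyshev's inequality: for a random variable X with mean \mu
% and standard deviation \sigma > 0, and any real k > 0,
\Pr\bigl(|X - \mu| \geq k\sigma\bigr) \;\leq\; \frac{1}{k^{2}}.
```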
| k | Min. % within k standard deviations of mean | Max. % beyond k standard deviations from mean |
|---|---|---|
| 1 | 0% | 100% |
| √2 | 50% | 50% |
| 1.5 | 55.56% | 44.44% |
| 2 | 75% | 25% |
| 2√2 | 87.5% | 12.5% |
| 3 | 88.8889% | 11.1111% |
| 4 | 93.75% | 6.25% |
| 5 | 96% | 4% |
| 6 | 97.2222% | 2.7778% |
| 7 | 97.9592% | 2.0408% |
| 8 | 98.4375% | 1.5625% |
| 9 | 98.7654% | 1.2346% |
| 10 | 99% | 1% |
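A minimal Python sketch (the variable names are my own, not from the source) that reproduces the table directly from the 1/k² bound:

```python
import math

# Chebyshev bound: at most 1/k^2 of values lie k or more standard
# deviations from the mean, so at least 1 - 1/k^2 lie within.
for k in [1, math.sqrt(2), 1.5, 2, 2 * math.sqrt(2), 3, 4, 5, 6, 7, 8, 9, 10]:
    beyond = 1 / k**2      # maximum fraction beyond k standard deviations
    within = 1 - beyond    # minimum fraction within k standard deviations
    print(f"k = {k:6.4f}: within >= {within:8.4%}, beyond <= {beyond:8.4%}")
```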
Example
Imagine we pick a journal article at random. Suppose these articles have an average length of 1,000 words with a standard deviation of 200 words. Chebyshev's inequality helps us understand the chances that an article's word count stays within a certain range. If we look at articles within 2 standard deviations of the average, meaning between 600 and 1,400 words, there's at least a 75% chance the article will fall in this range, because Chebyshev's rule tells us that no more than 1/2² = 25% of articles can be outside it. If we also knew the word counts followed a normal distribution, we could narrow the range down further and say there's a 75% chance the count is between about 770 and 1,230 words.
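A short simulation sketch showing that the 75% guarantee holds even for a lopsided distribution (the choice of a shifted exponential for the word counts is an assumption for illustration, not from the source):

```python
import random

random.seed(0)
MEAN, SD = 1000, 200

# Assumption for illustration: word counts follow a shifted exponential
# distribution, which has the stated mean and standard deviation but is
# strongly skewed, unlike a normal distribution.
counts = [MEAN - SD + random.expovariate(1 / SD) for _ in range(100_000)]

within = sum(abs(c - MEAN) < 2 * SD for c in counts) / len(counts)
print(f"fraction within 2 sd: {within:.3f} (Chebyshev guarantees >= 0.75)")
```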
Sharpness of bounds
Chebyshev's inequality often gives loose bounds, meaning it doesn't always give the exact probability. However, these bounds are the best possible for an arbitrary distribution. There is a standard example where Chebyshev's inequality is exact. In this example, for a fixed k > 1, a random variable takes the value −1 with probability 1/(2k²), the value +1 with probability 1/(2k²), and the value 0 with probability 1 − 1/k². For this case, the mean is 0 and the standard deviation is exactly 1/k. The probability that the variable differs from the mean by at least k times the standard deviation (that is, by at least 1) is precisely 1/k², showing that Chebyshev's inequality can be exact for certain distributions. The other exact cases are scaled and shifted versions of this example.
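A quick numeric check of this extremal example (a sketch; the choice k = 2 is just a sample value):

```python
k = 2.0
p = 1 / (2 * k**2)                  # probability of each of -1 and +1

values = [-1.0, 0.0, 1.0]
probs = [p, 1 - 2 * p, p]

mean = sum(v * q for v, q in zip(values, probs))               # expect 0
var = sum((v - mean) ** 2 * q for v, q in zip(values, probs))  # expect 1/k^2
sd = var ** 0.5                                                # expect 1/k

# Probability of being at least k standard deviations from the mean:
tail = sum(q for v, q in zip(values, probs) if abs(v - mean) >= k * sd)
print(f"sd = {sd} (expect {1/k}), tail = {tail} (Chebyshev bound = {1/k**2})")
```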
Proof
Chebyshev's inequality is a powerful tool in probability theory. It helps us understand how likely a random variable is to stray far from its average value.
One way to prove the inequality uses another important result called Markov's inequality, which bounds the probability that a non-negative random variable is large. The trick is to apply Markov's inequality not to X itself but to the non-negative random variable (X − μ)². Being at least kσ away from the mean is the same event as (X − μ)² being at least k²σ², and Markov's inequality bounds the probability of that event by E[(X − μ)²] / (k²σ²) = σ² / (k²σ²) = 1/k².
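The whole derivation in symbols:

```latex
% Apply Markov's inequality to the non-negative variable (X - \mu)^2:
\Pr\bigl(|X - \mu| \geq k\sigma\bigr)
  = \Pr\bigl((X - \mu)^2 \geq k^2\sigma^2\bigr)
  \leq \frac{\mathbb{E}\bigl[(X - \mu)^2\bigr]}{k^2\sigma^2}
  = \frac{\sigma^2}{k^2\sigma^2}
  = \frac{1}{k^2}.
```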
Finite samples
Chebyshev's inequality can be adjusted for situations where we don't know the exact mean and variance of a population but have data from a sample. Researchers Saw, Yang, and Mo derived a version of the inequality that works with sample data instead of population data. This version helps us understand how much a new observation might differ from the sample mean, using the sample's own measurements.
Another important idea is Samuelson's inequality, which tells us that every value in a sample of size N must lie within √(N − 1) standard deviations of the sample mean (with the standard deviation computed using the denominator N). This gives a stricter bound than Chebyshev's inequality in some cases, especially for small samples. These tools help statisticians make better guesses about where future data points might fall, even when they don't have complete information about the whole population.
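A small sketch checking Samuelson's bound on a sample (the data values here are made up purely for illustration):

```python
import math

sample = [2.0, 3.5, 3.9, 4.1, 4.4, 5.0, 9.8]   # arbitrary illustrative data
n = len(sample)
mean = sum(sample) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)  # denominator n

# Samuelson's inequality: every sample value lies within
# sqrt(n - 1) standard deviations of the sample mean.
limit = math.sqrt(n - 1) * sd
assert all(abs(x - mean) <= limit for x in sample)
print(f"max deviation = {max(abs(x - mean) for x in sample):.3f}, "
      f"Samuelson limit = {limit:.3f}")
```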
Related inequalities
Several other inequalities are connected to Chebyshev's inequality. For unimodal distributions (those with a single peak), sharper versions exist, such as Gauss's inequality and the Vysochanskij–Petunin inequality. The Paley–Zygmund inequality provides a lower bound on probabilities, complementing Chebyshev's inequality, which gives an upper bound.
Haldane's transformation uses Chebyshev's inequality to help create confidence intervals for data whose distribution is unknown. It can transform data into a form that looks more nearly normal, which can be useful in some situations.
There is also He, Zhang and Zhang's inequality, which applies to collections of non-negative independent random variables and provides a specific probability bound.
Integral Chebyshev inequality
The integral Chebyshev inequality is another rule named after Chebyshev. It works with two monotonic functions, meaning functions that either only increase or only decrease over a certain range.
If these two functions move in the same direction — both increasing or both decreasing — a special math rule connects their averages. This rule says that the average of their product is at least as large as the product of their separate averages.
If the two functions move in opposite directions — one increasing while the other decreases — the rule flips, and the average of their product becomes smaller than the product of their separate averages.
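In symbols, for functions f and g that are both increasing (or both decreasing) on an interval [a, b]:

```latex
% Integral Chebyshev inequality for similarly ordered f and g:
\frac{1}{b-a}\int_a^b f(x)\,g(x)\,dx
  \;\geq\;
  \left(\frac{1}{b-a}\int_a^b f(x)\,dx\right)
  \left(\frac{1}{b-a}\int_a^b g(x)\,dx\right),
% with the inequality reversed when one function increases
% and the other decreases.
```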
This article is a child-friendly adaptation of the Wikipedia article on Chebyshev's inequality, available under CC BY-SA 4.0.