Chebyshev's inequality
In probability theory, Chebyshev's inequality helps us understand how likely a random variable is to be far from its average value, or mean. It tells us that only a limited fraction of values can be far from the mean: at most 1/k² of the probability can sit k or more standard deviations away.
This rule is useful because it works for any kind of data, as long as we know the mean and variance. For example, it can help prove important results in statistics, like the weak law of large numbers. Unlike the 68–95–99.7 rule, which only applies to normal distributions, Chebyshev's inequality gives a general idea about where most values lie.
The idea is closely related to Markov's inequality, from which it can be derived. Chebyshev's inequality is important because it turns knowledge of the mean and variance into a concrete limit on how spread out the data can be.
History
This important rule in statistics is named after the Russian mathematician Pafnuty Chebyshev. It was first formulated by his friend Irénée-Jules Bienaymé in 1853. Chebyshev himself published a proof in 1867. Another mathematician, Andrey Markov, who was Chebyshev's student, gave another proof in his doctoral thesis in 1884.
Statement
Chebyshev's inequality helps us understand how likely a value is to be close to the average in a set of data. It says that for any distribution with a known mean μ and standard deviation σ, the chance that a value falls k or more standard deviations away from the mean is at most 1/k²:

P(|X − μ| ≥ kσ) ≤ 1/k², for any k > 0.

For example, if we know the average and how spread out the numbers are, we can say that it is unlikely for a number to be very far from this average. This idea is useful in many areas where we deal with uncertainty or variability in data.
| k | Min. % within k standard deviations of mean | Max. % beyond k standard deviations from mean |
|---|---|---|
| 1 | 0% | 100% |
| √2 | 50% | 50% |
| 1.5 | 55.56% | 44.44% |
| 2 | 75% | 25% |
| 2√2 | 87.5% | 12.5% |
| 3 | 88.8889% | 11.1111% |
| 4 | 93.75% | 6.25% |
| 5 | 96% | 4% |
| 6 | 97.2222% | 2.7778% |
| 7 | 97.9592% | 2.0408% |
| 8 | 98.4375% | 1.5625% |
| 9 | 98.7654% | 1.2346% |
| 10 | 99% | 1% |
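The percentages in the table come straight from the formula: at most 1/k² of values lie beyond k standard deviations, so at least 1 − 1/k² lie within. A minimal Python sketch that reproduces a few rows:

```python
# Reproduce a few rows of the table: for each k, at least 1 - 1/k^2 of
# values lie within k standard deviations of the mean, and at most 1/k^2
# lie beyond.
import math

for k in [1, math.sqrt(2), 1.5, 2, 2 * math.sqrt(2), 3, 4, 5, 10]:
    beyond = 1 / k**2      # maximum fraction beyond k standard deviations
    within = 1 - beyond    # minimum fraction within k standard deviations
    print(f"k = {k:.4g}: within >= {within:.2%}, beyond <= {beyond:.2%}")
```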
Example
Imagine we pick a journal article at random. Suppose these articles average about 1,000 words, with a standard deviation of 200 words. Chebyshev's inequality helps us understand the chances that an article's word count stays within a certain range.
If we look at articles within 2 standard deviations of the average, meaning between 600 and 1,400 words, there is at least a 75% chance the article will fall in this range. This is because Chebyshev's rule tells us that no more than 1/2² = 25% of articles can be outside this range. If we also know the word counts follow a normal pattern, we can narrow it down even more.
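Here is a small Python sketch of this example. The standard deviation of 200 words is an assumption implied by the 600-to-1,400 range, not a measured figure.

```python
# Word-count example: mean of 1,000 words, assumed standard deviation of
# 200 words, so k = 2 standard deviations spans 600 to 1,400 words.
mean, sd, k = 1000, 200, 2

low, high = mean - k * sd, mean + k * sd   # 600 and 1,400 words
max_outside = 1 / k**2                     # Chebyshev's upper bound: 25%
min_inside = 1 - max_outside               # at least 75%

print(f"At least {min_inside:.0%} of articles have {low}-{high} words")
print(f"At most {max_outside:.0%} fall outside that range")
```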
Sharpness of bounds
Chebyshev's inequality sometimes gives loose bounds, meaning it does not always give the exact probability. But the bounds cannot be improved upon: no smaller bound works for every possible distribution.
There is a special example where Chebyshev's inequality gives an exact result. Let a random variable take the value −1 with probability 1/(2k²), the value +1 with probability 1/(2k²), and the value 0 with probability 1 − 1/k². For this case, the mean is 0, and the standard deviation is exactly 1/k. The probability that the variable differs from the mean by at least k times the standard deviation is precisely 1/k². This shows that Chebyshev's inequality can be exact for certain distributions. All the exact cases are shifts and rescalings of this example.
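A short Python sketch can verify this for a chosen value of k; the value k = 3 below is just for illustration.

```python
# Verify the sharp example: X takes the values -1, 0, +1 with
# probabilities 1/(2k^2), 1 - 1/k^2, 1/(2k^2).
import math

k = 3  # any k > 1 works; this value is just for illustration
p = 1 / (2 * k**2)
values = [-1, 0, 1]
probs = [p, 1 - 2 * p, p]

mean = sum(v * q for v, q in zip(values, probs))               # 0
var = sum((v - mean) ** 2 * q for v, q in zip(values, probs))  # 1/k^2
sd = math.sqrt(var)                                            # 1/k

# Probability of being at least k standard deviations from the mean:
prob_far = sum(q for v, q in zip(values, probs) if abs(v - mean) >= k * sd)
print(prob_far, 1 / k**2)  # both equal 1/k^2, so the bound is attained
```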
Proof
Chebyshev's inequality is a useful idea in probability. It helps us understand how often a random value might be far from its average.
One way to prove it uses Markov's inequality, which says that a non-negative random variable is at least some threshold a with probability at most its expected value divided by a. Applying Markov's inequality to the non-negative variable (X − μ)², with the threshold k²σ², gives exactly the Chebyshev bound of 1/k².
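Written out, this standard argument takes only two lines. In LaTeX notation:

```latex
% Apply Markov's inequality to the non-negative variable (X - \mu)^2:
\[
  \Pr\!\left(|X - \mu| \ge k\sigma\right)
  = \Pr\!\left((X - \mu)^2 \ge k^2\sigma^2\right)
  \le \frac{\mathrm{E}\!\left[(X - \mu)^2\right]}{k^2\sigma^2}
  = \frac{\sigma^2}{k^2\sigma^2}
  = \frac{1}{k^2}.
\]
```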
Finite samples
Chebyshev's inequality can be adapted to work when we only have data from a sample, not the whole group. Researchers Saw, Yang, and Mo derived a version that uses the sample mean and sample standard deviation. This helps us estimate how much a new measurement might differ from the sample mean.
Samuelson's inequality is another idea. It says that every value in a sample of size n stays within √(n − 1) standard deviations of the sample mean. This can give a stricter rule than Chebyshev's inequality for small samples. These tools help statisticians make better guesses about where future data might fall, even when they don't know everything about the whole group.
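A small Python sketch of Samuelson's bound on made-up data; the sample values are illustrative only, and the standard deviation uses the 1/n (population-style) formula that Samuelson's inequality assumes.

```python
# Samuelson's inequality: every value in a sample of size n lies within
# sqrt(n - 1) standard deviations of the sample mean, where the standard
# deviation is computed with the 1/n denominator.
import math

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up data
n = len(sample)
mean = sum(sample) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)

bound = math.sqrt(n - 1) * sd
assert all(abs(x - mean) <= bound for x in sample)
print(f"every value is within {math.sqrt(n - 1):.3f} sd of the mean")
```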
Related inequalities
Main article: Gauss's inequality
Main article: Vysochanskij–Petunin inequality
Several other inequalities are connected to Chebyshev's inequality. The Paley–Zygmund inequality complements it: it gives a lower bound on the probability that a non-negative random variable is large, where Chebyshev's inequality gives an upper bound.
Haldane's transformation uses Chebyshev's inequality to help make confidence intervals for data with unknown patterns. It can change data into a form that looks more normal, which can be useful sometimes.
There is also He, Zhang and Zhang's inequality, which works with collections of non-negative independent random variables and gives a specific probability bound.
Integral Chebyshev inequality
The Integral Chebyshev inequality is another math rule named after Chebyshev. It works with two special kinds of functions called monotonic functions. These functions either always go up or always go down over a certain range.
If the two functions move the same way — both going up or both going down — there is a rule about their averages. This rule says that the average of what you get when you multiply them together is at least as big as what you get when you multiply their averages.
If the two functions move in opposite ways, one going up while the other goes down, the rule flips: the average of their product is at most the product of their averages.
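A numeric Python sketch can illustrate the first case on the interval [0, 1]; the increasing functions f(x) = x and g(x) = x² are arbitrary choices for illustration.

```python
# Integral Chebyshev inequality, same-direction case: for two increasing
# functions on [0, 1], the average of the product is at least the product
# of the averages. Averages are approximated with a midpoint grid.
N = 10_000
xs = [(i + 0.5) / N for i in range(N)]  # midpoint grid on [0, 1]

def f(x):
    return x

def g(x):
    return x * x

avg_f = sum(f(x) for x in xs) / N          # about 1/2
avg_g = sum(g(x) for x in xs) / N          # about 1/3
avg_fg = sum(f(x) * g(x) for x in xs) / N  # about 1/4

print(avg_fg, avg_f * avg_g)  # roughly 0.25 >= 0.1667, as the rule predicts
assert avg_fg >= avg_f * avg_g
```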
This article is a child-friendly adaptation of the Wikipedia article on Chebyshev's inequality, available under CC BY-SA 4.0.