{{About|the term used in probability theory and statistics}}
{{Redirect|E(X)|the <math>e^x</math> function|Exponential function}}
{{Redirect|E value||E-Series (disambiguation)}}
{{Probability fundamentals}}

In [[probability theory]], the '''expected value''' (also called '''expectation''', '''expectancy''', '''expectation operator''', '''mathematical expectation''', '''mean''', '''expectation value''', or '''first [[Moment (mathematics)|moment]]''') is a generalization of the [[weighted average]]. Informally, the expected value is the [[arithmetic mean]] of the possible values a [[random variable]] can take, weighted by the probability of those outcomes. Since it is obtained through arithmetic, the expected value sometimes may not even be included in the sample data set; it is not necessarily a value one would "expect" to observe in reality.

The expected value of a random variable with a finite number of outcomes is a weighted average of all possible outcomes. In the case of a continuum of possible outcomes, the expectation is defined by [[Integral|integration]]. In the axiomatic foundation for probability provided by [[measure theory]], the expectation is given by [[Lebesgue integration]].

The expected value of a random variable {{mvar|X}} is often denoted by {{math|E(''X'')}}, {{math|E[''X'']}}, or {{math|E''X''}}, with {{math|E}} also often stylized as <math>\mathbb{E}</math> or {{math|''E''}}.<ref>{{Cite web|title=Expectation {{!}} Mean {{!}} Average|url=https://www.probabilitycourse.com/chapter3/3_2_2_expectation.php|access-date=2020-09-11|website=www.probabilitycourse.com}}</ref><ref>{{Cite web|last=Hansen|first=Bruce|title=PROBABILITY AND STATISTICS FOR ECONOMISTS|url=https://ssc.wisc.edu/~bhansen/probability/Probability.pdf|access-date=2021-07-20|archive-date=2022-01-19|archive-url=https://web.archive.org/web/20220119041716/https://ssc.wisc.edu/~bhansen/probability/Probability.pdf|url-status=dead}}</ref><ref>{{cite book |last1=Wasserman |first1=Larry |title=All of Statistics: a concise course in statistical inference |date=December 2010 |publisher=Springer texts in statistics |isbn=9781441923226 |page=47}}</ref>

{{TOC limit|3}}

==History==
The idea of the expected value originated in the middle of the 17th century from the study of the so-called [[problem of points]], which seeks to divide the stakes ''in a fair way'' between two players, who have to end their game before it is properly finished.<ref>{{Cite book|title=History of Probability and Statistics and Their Applications before 1750|language=en|doi=10.1002/0471725161|series=Wiley Series in Probability and Statistics|year=1990|isbn=9780471725169}}</ref> This problem had been debated for centuries. Many conflicting proposals and solutions had been suggested over the years when it was posed to [[Blaise Pascal]] by French writer and amateur mathematician [[Antoine Gombaud|Chevalier de Méré]] in 1654. Méré claimed that this problem could not be solved and that it showed just how flawed mathematics was when it came to its application to the real world. Pascal, being a mathematician, was provoked and determined to solve the problem once and for all.

He began to discuss the problem in the famous series of letters to [[Pierre de Fermat]]. Soon enough, they both independently came up with a solution. They solved the problem in different computational ways, but their results were identical because their computations were based on the same fundamental principle. The principle is that the value of a future gain should be directly proportional to the chance of getting it. This principle seemed to have come naturally to both of them. They were very pleased by the fact that they had found essentially the same solution, and this in turn made them absolutely convinced that they had solved the problem conclusively; however, they did not publish their findings. They only informed a small circle of mutual scientific friends in Paris about it.<ref>{{cite journal |title=Ore, Pascal and the Invention of Probability Theory |journal=The American Mathematical Monthly |volume=67 |issue=5 |year=1960 |pages=409–419 |doi=10.2307/2309286|jstor=2309286 |last1=Ore |first1=Oystein }}</ref>

{{Blockquote|text=It should be said, also, that for some time some of the best mathematicians of France have occupied themselves with this kind of calculus so that no one should attribute to me the honour of the first invention. This does not belong to me. But these savants, although they put each other to the test by proposing to each other many questions difficult to solve, have hidden their methods. I have had therefore to examine and go deeply for myself into this matter by beginning with the elements, and it is impossible for me for this reason to affirm that I have even started from the same principle. But finally I have found that my answers in many cases do not differ from theirs.|sign=|source=Edwards (2002)}}

During his visit to France in 1655, Huygens learned about [[de Méré's Problem]]. From his correspondence with Carcavine a year later (in 1656), he realized his method was essentially the same as Pascal's. Therefore, he knew about Pascal's priority in this subject before his book went to press in 1657.<ref>{{Cite book|last=Mckay|first=Cain|title=Probability and Statistics|year=2019|isbn=9781839473302|pages=257}}</ref>


In the mid-nineteenth century, [[Pafnuty Chebyshev]] became the first person to think systematically in terms of the expectations of [[random variables]].<ref>{{cite journal|journal=Bulletin of the American Mathematical Society |series=New Series|volume=3|number=1|date=July 1980|title=HARMONIC ANALYSIS AS THE EXPLOITATION OF SYMMETRY - A HISTORICAL SURVEY|author=George Mackey|page=549}}</ref>
More than a hundred years later, in 1814, [[Pierre-Simon Laplace]] published his tract "''Théorie analytique des probabilités''", where the concept of expected value was defined explicitly:<ref>{{Cite book|title=A philosophical essay on probabilities|last=Laplace, Pierre Simon, marquis de, 1749-1827.|date=1952| orig-year=1951|publisher=Dover Publications|oclc=475539}}</ref>


{{quote|... this advantage in the theory of chance is the product of the sum hoped for by the probability of obtaining it; it is the partial sum which ought to result when we do not wish to run the risks of the event in supposing that the division is made proportional to the probabilities. This division is the only equitable one when all strange circumstances are eliminated; because an equal degree of probability gives an equal right for the sum hoped for. We will call this advantage ''mathematical hope.''}}


==Notations==
The use of the letter {{math|E}} to denote "expected value" goes back to [[William Allen Whitworth|W. A. Whitworth]] in 1901.<ref>Whitworth, W.A. (1901) ''Choice and Chance with One Thousand Exercises.'' Fifth edition. Deighton Bell, Cambridge. [Reprinted by Hafner Publishing Co., New York, 1959.]</ref> The symbol has since become popular for English writers. In German, {{math|E}} stands for ''Erwartungswert'', in Spanish for ''esperanza matemática'', and in French for ''espérance mathématique.''<ref>{{cite web|title=Earliest uses of symbols in probability and statistics|url=http://jeff560.tripod.com/stat.html}}</ref>


When "E" is used to denote "expected value", authors use a variety of stylizations: the expectation operator can be stylized as {{math|E}} (upright), {{mvar|E}} (italic), or <math>\mathbb{E}</math> (in [[blackboard bold]]), while a variety of bracket notations (such as {{math|E(''X'')}}, {{math|E[''X'']}}, and {{math|E''X''}}) are all used.


Another popular notation is {{math|μ<sub>''X''</sub>}}, whereas {{math|⟨''X''⟩}}, {{math|⟨''X''⟩<sub>av</sub>}}, and <math>\overline{X}</math> are commonly used in physics,{{sfnm|1a1=Feller|1y=1968|1p=221}} and {{math|M(''X'')}} in Russian-language literature.


===Random variables with finitely many outcomes===
Consider a random variable {{mvar|X}} with a ''finite'' list {{math|''x''<sub>1</sub>, ..., ''x''<sub>''k''</sub>}} of possible outcomes, each of which (respectively) has probability {{math|''p''<sub>1</sub>, ..., ''p''<sub>''k''</sub>}} of occurring. The expectation of {{mvar|X}} is defined as{{sfnm|1a1=Billingsley|1y=1995|1p=76}}
<math display="block">\operatorname{E}[X] =x_1p_1 + x_2p_2 + \cdots + x_kp_k.</math>

Since the probabilities must satisfy {{math|1=''p''<sub>1</sub> + ⋅⋅⋅ + ''p''<sub>''k''</sub> = 1}}, it is natural to interpret {{math|E[''X'']}} as a [[weighted average]] of the {{math|''x''<sub>''i''</sub>}} values, with weights given by their probabilities {{math|''p''<sub>''i''</sub>}}.

In the special case that all possible outcomes are [[equiprobable]] (that is, {{math|1=''p''<sub>1</sub> = ⋅⋅⋅ = ''p''<sub>''k''</sub>}}), the weighted average is given by the standard [[arithmetic mean|average]]. In the general case, the expected value takes into account the fact that some outcomes are more likely than others.
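For a concrete sense of this formula, note that it translates directly into code. The following is a minimal Python sketch (the helper name <code>expectation</code> is ours, not a standard library routine):
<syntaxhighlight lang="python">
def expectation(outcomes, probabilities):
    """Return E[X] = x_1*p_1 + ... + x_k*p_k for a finite distribution."""
    assert abs(sum(probabilities) - 1.0) < 1e-12, "probabilities must sum to 1"
    return sum(x * p for x, p in zip(outcomes, probabilities))

# Equiprobable outcomes reduce to the ordinary average, e.g. a fair die:
print(expectation([1, 2, 3, 4, 5, 6], [1/6] * 6))  # 3.5
</syntaxhighlight>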


====Examples====
[[File:Largenumbers.svg|thumb|An illustration of the convergence of sequence averages of rolls of a {{dice}} to the expected value of 3.5 as the number of rolls (trials) grows]]

* Let <math>X</math> represent the outcome of a roll of a fair six-sided {{dice}}. More specifically, <math>X</math> will be the number of [[Pip (counting)|pips]] showing on the top face of the {{dice}} after the toss. The possible values for <math>X</math> are 1, 2, 3, 4, 5, and 6, all of which are equally likely with a probability of {{frac2|1|6}}. The expectation of <math>X</math> is <math display="block"> \operatorname{E}[X] = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3\cdot\frac{1}{6} + 4\cdot\frac{1}{6} + 5\cdot\frac{1}{6} + 6\cdot\frac{1}{6} = 3.5.</math> If one rolls the {{dice}} <math>n</math> times and computes the average ([[arithmetic mean]]) of the results, then as <math>n</math> grows, the average will [[almost surely]] [[Convergent sequence|converge]] to the expected value, a fact known as the [[strong law of large numbers]]. (A simulation sketch follows this list.)
* The [[roulette]] game consists of a small ball and a wheel with 38 numbered pockets around the edge. As the wheel is spun, the ball bounces around randomly until it settles down in one of the pockets. Suppose random variable <math>X</math> represents the (monetary) outcome of a $1 bet on a single number ("straight up" bet). If the bet wins (which happens with probability {{frac2|1|38}} in American roulette), the payoff is $35; otherwise the player loses the bet. The expected profit from such a bet will be <math display="block"> \operatorname{E}[\,\text{gain from }\$1\text{ bet}\,] = -\$1 \cdot \frac{37}{38} + \$35 \cdot \frac{1}{38} = -\$\frac{1}{19}.</math> That is, the expected value to be won from a $1 bet is −${{frac2|1|19}}. Thus, in 190 bets, the net loss will probably be about $10.
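Both computations above can be checked numerically; a minimal Python simulation sketch (seed and sample sizes are arbitrary choices):
<syntaxhighlight lang="python">
import random

# Sample means of fair die rolls approach the expected value 3.5
# as the number of rolls grows (strong law of large numbers).
random.seed(1)  # arbitrary seed, for reproducibility
for n in (100, 10_000, 1_000_000):
    rolls = [random.randint(1, 6) for _ in range(n)]
    print(n, sum(rolls) / n)

# Expected gain of a $1 straight-up bet in American roulette:
print(-1 * 37 / 38 + 35 * 1 / 38)  # -1/19 ≈ -0.0526
</syntaxhighlight>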


===Random variables with countably infinitely many outcomes===
Informally, the expectation of a random variable with a [[countable set|countably infinite set]] of possible outcomes is defined analogously as the weighted average of all possible outcomes, where the weights are given by the probabilities of realizing each given value. This is to say that
<math display="block">\operatorname{E}[X] = \sum_{i=1}^\infty x_i\, p_i,</math>
where {{math|''x''<sub>1</sub>, ''x''<sub>2</sub>, ...}} are the possible outcomes of the random variable {{mvar|X}} and {{math|''p''<sub>1</sub>, ''p''<sub>2</sub>, ...}} are their corresponding probabilities. In many non-mathematical textbooks, this is presented as the full definition of expected values in this context.{{sfnm|1a1=Ross|1y=2019|1loc=Section 2.4.1}}


====Examples====
* Suppose <math>x_i = i</math> and <math>p_i = \tfrac{c}{i \cdot 2^i}</math> for <math>i = 1, 2, 3, \ldots,</math> where <math>c = \tfrac{1}{\ln 2}</math> is the scaling factor which makes the probabilities sum to 1. Then we have <math display="block">\operatorname{E}[X] \,= \sum_i x_i p_i = 1(\tfrac{c}{2})
+ 2(\tfrac{c}{8}) + 3 (\tfrac{c}{24}) + \cdots
\,= \, \tfrac{c}{2} + \tfrac{c}{4} + \tfrac{c}{8} + \cdots \,=\, c \,=\, \tfrac{1}{\ln 2}.</math>
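The convergence of these partial sums to <math>\tfrac{1}{\ln 2} \approx 1.4427</math> can be verified directly; a minimal Python sketch:
<syntaxhighlight lang="python">
import math

# Partial sums of E[X] = Σ i · c/(i·2^i) = c · Σ 2^(−i), which converge
# to c = 1/ln 2 ≈ 1.4427 as more terms are included.
c = 1 / math.log(2)
for n in (5, 10, 20, 50):
    print(n, sum(i * c / (i * 2**i) for i in range(1, n + 1)))
print(c)  # the limit of the series
</syntaxhighlight>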


===Random variables with density===
Now consider a random variable {{mvar|X}} which has a [[probability density function]] given by a function {{mvar|f}} on the [[real number line]]. This means that the probability of {{mvar|X}} taking on a value in any given [[open interval]] is given by the [[integral]] of {{mvar|f}} over that interval. The expectation of {{mvar|X}} is then given by the integral{{sfnm|1a1=Papoulis|1a2=Pillai|1y=2002|1loc=Section 5-3|2a1=Ross|2y=2019|2loc=Section 2.4.2}}
<math display="block">\operatorname{E}[X] = \int_{-\infty}^\infty x f(x)\, dx.</math>
A general and mathematically precise formulation of this definition uses [[measure theory]] and [[Lebesgue integration]], and the corresponding theory of ''absolutely continuous random variables'' is described in the next section. The density functions of many common distributions are [[piecewise continuous]], and as such the theory is often developed in this restricted setting.{{sfnm|1a1=Feller|1y=1971|1loc=Section I.2}} For such functions, it is sufficient to only consider the standard [[Riemann integration]]. Sometimes ''continuous random variables'' are defined as those corresponding to this special class of densities, although the term is used differently by various authors.
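For such densities, the defining integral can be approximated by a truncated Riemann sum. A minimal Python sketch, taking for concreteness the exponential density {{math|''f''(''x'') {{=}} ''λ''e<sup>−''λx''</sup>}} with the arbitrary choice {{math|''λ'' {{=}} 2}}, for which the exact expectation is {{math|1/''λ'' {{=}} 0.5}}:
<syntaxhighlight lang="python">
import math

# Approximate E[X] = ∫ x f(x) dx by a left Riemann sum for the
# exponential density f(x) = lam·exp(−lam·x) on x ≥ 0.
lam = 2.0      # assumed rate parameter; exact mean is 1/lam = 0.5
dx = 1e-4      # step size (arbitrary)
upper = 50.0   # truncation of the infinite integration region
mean = sum(x * lam * math.exp(-lam * x) * dx
           for x in (i * dx for i in range(int(upper / dx))))
print(mean)  # ≈ 0.5
</syntaxhighlight>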


Analogously to the countably-infinite case above, there are subtleties with this expression due to the infinite region of integration. Such subtleties can be seen concretely if the distribution of {{mvar|X}} is given by the [[Cauchy distribution]] {{math|Cauchy(0, π)}}, so that {{math|''f''(''x'') {{=}} (''x''<sup>2</sup> + π<sup>2</sup>)<sup>−1</sup>}}. It is straightforward to compute in this case that
<math display="block">\int_a^b xf(x)\,dx=\int_a^b \frac{x}{x^2+\pi^2}\,dx=\frac{1}{2}\ln\frac{b^2+\pi^2}{a^2+\pi^2}.</math>
The limit of this expression as {{math|''a'' → −∞}} and {{math|''b'' → ∞}} does not exist: if the limits are taken so that {{math|''a'' {{=}} −''b''}}, then the limit is zero, while if the constraint {{math|2''a'' {{=}} −''b''}} is taken, then the limit is {{math|ln(2)}}.
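The dependence on how the two endpoints are coupled is easy to observe numerically from the closed form just derived; a minimal Python sketch:
<syntaxhighlight lang="python">
import math

# Truncated integral ∫_a^b x/(x² + π²) dx in closed form; its limit
# depends on how a → −∞ and b → ∞ are coupled.
def truncated(a, b):
    return 0.5 * math.log((b**2 + math.pi**2) / (a**2 + math.pi**2))

for b in (10.0, 1e3, 1e6):
    print(truncated(-b, b), truncated(-b / 2, b))
# With a = −b the value is exactly 0; with 2a = −b it tends to ln 2 ≈ 0.693.
</syntaxhighlight>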


===Arbitrary real-valued random variables===
All definitions of the expected value may be expressed in the language of [[measure theory]]. In general, if {{mvar|X}} is a real-valued [[random variable]] defined on a [[probability space]] {{math|(Ω, Σ, P)}}, then the expected value of {{mvar|X}}, denoted by {{math|E[''X'']}}, is defined as the [[Lebesgue integration|Lebesgue integral]]{{sfnm|1a1=Billingsley|1y=1995|1p=273}}
<math display="block">\operatorname{E} [X] = \int_\Omega X\,d\operatorname{P}.</math>
Despite the newly abstract situation, this definition is extremely similar in nature to the very simplest definition of expected values, given above, as certain weighted averages. This is because, in measure theory, the value of the Lebesgue integral of {{mvar|X}} is defined via weighted averages of ''approximations'' of {{mvar|X}} which take on finitely many values.{{sfnm|1a1=Billingsley|1y=1995|1loc=Section 15}} Moreover, if given a random variable with finitely or countably many possible values, the Lebesgue theory of expectation is identical to the summation formulas given above. However, the Lebesgue theory clarifies the scope of the theory of probability density functions. A random variable {{mvar|X}} is said to be ''absolutely continuous'' if any of the following conditions are satisfied:
* there is a nonnegative [[measurable function]] {{mvar|f}} on the real line such that <math display="block">\operatorname{P}(X \in A) = \int_A f(x) \, dx,</math> for any [[Borel set]] {{mvar|A}}, in which the integral is Lebesgue.
* the [[cumulative distribution function]] of {{mvar|X}} is [[absolutely continuous]].
* for any Borel set {{mvar|A}} of real numbers with [[Lebesgue measure]] equal to zero, the probability of {{mvar|X}} being valued in {{mvar|A}} is also equal to zero.
* for any positive number {{math|ε}} there is a positive number {{math|δ}} such that: if {{mvar|A}} is a Borel set with Lebesgue measure less than {{math|δ}}, then the probability of {{mvar|X}} being valued in {{mvar|A}} is less than {{math|ε}}.
These conditions are all equivalent, although this is nontrivial to establish.{{sfnm|1a1=Billingsley|1y=1995|1loc=Theorems 31.7 and 31.8 and p. 422}} In this definition, {{mvar|f}} is called the ''probability density function'' of {{mvar|X}} (relative to Lebesgue measure). According to the change-of-variables formula for Lebesgue integration,{{sfnm|1a1=Billingsley|1y=1995|1loc=Theorem 16.13}} combined with the [[law of the unconscious statistician]],{{sfnm|1a1=Billingsley|1y=1995|1loc=Theorem 16.11}} it follows that
<math display="block">\operatorname{E}[X] \equiv \int_\Omega X\,d\operatorname{P} = \int_\Reals x f(x)\, dx</math>
for any absolutely continuous random variable {{mvar|X}}. The above discussion of continuous random variables is thus a special case of the general Lebesgue theory, due to the fact that every piecewise-continuous function is measurable.

[[File:Roland Uhl 2023 Charakterisierung des Erwartungswertes Bild1.svg|upright=1.15|frameless|right|border|Expected value {{mvar|μ}} and median {{mvar|𝑚}}]]
The expected value of any real-valued random variable <math>X</math> can also be defined on the graph of its [[cumulative distribution function]] <math>F</math> by a nearby equality of areas. In fact, <math>\operatorname{E}[X] = \mu</math> with a real number <math>\mu</math> if and only if the two surfaces in the <math>x</math>-<math>y</math>-plane, described by
<math display="block">
x \le \mu, \;\, 0\le y \le F(x) \quad\text{or}\quad x \ge \mu, \;\, F(x) \le y \le 1
</math>
respectively, have the same finite area, i.e. if
<math display="block">
\int_{-\infty}^\mu F(x)\,dx = \int_\mu^\infty \big(1 - F(x)\big)\,dx
</math>
and both [[improper integral|improper Riemann integrals]] converge. Finally, this is equivalent to the representation
{{anchor|EX as difference of integrals}}
<math display="block">
\operatorname{E}[X]
= \int_0^\infty \bigl(1 - F(x)\bigr) \, dx - \int_{-\infty}^0 F(x) \, dx,
</math>
also with convergent integrals.<ref>{{cite book |last1=Uhl |first1=Roland |title=Charakterisierung des Erwartungswertes am Graphen der Verteilungsfunktion |date=2023 |publisher=Technische Hochschule Brandenburg |doi=10.25933/opus4-2986 |doi-access=free |url=https://opus4.kobv.de/opus4-fhbrb/files/2986/Uhl2023.pdf}} pp. 2–4.</ref>
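This characterization can be checked numerically. A minimal Python sketch for a normal random variable, writing the CDF via the error function; the parameters {{math|''μ'' {{=}} 0.7}} and {{math|''σ'' {{=}} 1.3}}, the step size, and the truncation of the improper integrals are arbitrary illustrative choices:
<syntaxhighlight lang="python">
import math

# Check E[X] = ∫_0^∞ (1 − F(x)) dx − ∫_{−∞}^0 F(x) dx for X ~ N(mu, sigma²).
mu, sigma = 0.7, 1.3

def F(x):  # normal cumulative distribution function
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

dx, L = 1e-3, 20.0  # step size and truncation length
upper = sum((1 - F(i * dx)) * dx for i in range(int(L / dx)))
lower = sum(F(-i * dx) * dx for i in range(1, int(L / dx)))
print(upper - lower)  # ≈ mu = 0.7
</syntaxhighlight>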


===Infinite expected values===
Expected values as defined above are automatically finite numbers. However, in many cases it is fundamental to be able to consider expected values of {{math|±∞}}. This is intuitive, for example, in the case of the [[St. Petersburg paradox]], in which one considers a random variable with possible outcomes {{math|''x''<sub>''i''</sub> {{=}} 2<sup>''i''</sup>}}, with associated probabilities {{math|''p''<sub>''i''</sub> {{=}} 2<sup>−''i''</sup>}}, for {{mvar|i}} ranging over all positive integers. According to the summation formula in the case of random variables with countably many outcomes, one has
<math display="block"> \operatorname{E}[X]= \sum_{i=1}^\infty x_i\,p_i = 2\cdot \frac{1}{2}+4\cdot\frac{1}{4} + 8\cdot\frac{1}{8}+ 16\cdot\frac{1}{16}+ \cdots = 1 + 1 + 1 + 1 + \cdots.</math>
It is natural to say that the expected value equals {{math|+∞}}.
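The divergence also shows up in simulation: sample means of repeated plays of the game do not settle down but keep drifting upward. A minimal Python sketch (seed and sample sizes arbitrary):
<syntaxhighlight lang="python">
import random

random.seed(0)  # arbitrary seed

def play():
    """One St. Petersburg game: payoff 2**i if the first head is on flip i."""
    i = 1
    while random.random() < 0.5:  # tails with probability 1/2; keep flipping
        i += 1
    return 2**i

for n in (10**3, 10**5, 10**6):
    print(n, sum(play() for _ in range(n)) / n)  # sample means keep growing
</syntaxhighlight>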


There is a rigorous mathematical theory underlying such ideas, which is often taken as part of the definition of the Lebesgue integral.{{sfnm|1a1=Billingsley|1y=1995|1loc=Section 15}} The first fundamental observation is that, whichever of the above definitions are followed, any ''nonnegative'' random variable whatsoever can be given an unambiguous expected value; whenever absolute convergence fails, then the expected value can be defined as {{math|+∞}}. The second fundamental observation is that any random variable can be written as the difference of two nonnegative random variables. Given a random variable {{mvar|X}}, one defines the [[positive and negative parts]] by {{math|''X''<sup> +</sup> {{=}} max(''X'', 0)}} and {{math|''X''<sup> −</sup> {{=}} −min(''X'', 0)}}. These are nonnegative random variables, and it can be directly checked that {{math|''X'' {{=}} ''X''<sup> +</sup> − ''X''<sup> −</sup>}}. Since {{math|E[''X''<sup> +</sup>]}} and {{math|E[''X''<sup> −</sup>]}} are both then defined as either nonnegative numbers or {{math|+∞}}, it is then natural to define:
<math display="block">
\operatorname{E}[X] = \begin{cases}
\operatorname{E}[X^+] - \operatorname{E}[X^-] & \text{if } \operatorname{E}[X^+] < \infty \text{ and } \operatorname{E}[X^-] < \infty;\\
+\infty & \text{if } \operatorname{E}[X^+] = \infty \text{ and } \operatorname{E}[X^-] < \infty;\\
-\infty & \text{if } \operatorname{E}[X^+] < \infty \text{ and } \operatorname{E}[X^-] = \infty;\\
\text{undefined} & \text{if } \operatorname{E}[X^+] = \infty \text{ and } \operatorname{E}[X^-] = \infty.
\end{cases}
</math>

According to this definition, {{math|E[''X'']}} exists and is finite if and only if {{math|E[''X''<sup> +</sup>]}} and {{math|E[''X''<sup> −</sup>]}} are both finite. Due to the formula {{math|{{!}}''X''{{!}} {{=}} ''X''<sup> +</sup> + ''X''<sup> −</sup>}}, this is the case if and only if {{math|E{{!}}''X''{{!}}}} is finite, and this is equivalent to the absolute convergence conditions in the definitions above. As such, the present considerations do not define finite expected values in any cases not previously considered; they are only useful for infinite expectations.
* In the case of the St. Petersburg paradox, one has {{math|''X''<sup> −</sup> {{=}} 0}} and so {{math|E[''X''] {{=}} +∞}} as desired.
* Suppose the random variable {{mvar|X}} takes values {{math|1, −2, 3, −4, ...}} with respective probabilities {{math|6π<sup>−2</sup>, 6(2π)<sup>−2</sup>, 6(3π)<sup>−2</sup>, 6(4π)<sup>−2</sup>, ...}}. Then it follows that {{math|''X''<sup> +</sup>}} takes value {{math|2''k''−1}} with probability {{math|6((2''k''−1)π)<sup>−2</sup>}} for each positive integer {{mvar|k}}, and takes value {{math|0}} with remaining probability. Similarly, {{math|''X''<sup> −</sup>}} takes value {{math|2''k''}} with probability {{math|6(2''k''π)<sup>−2</sup>}} for each positive integer {{mvar|k}} and takes value {{math|0}} with remaining probability. Using the definition for non-negative random variables, one can show that both {{math|E[''X''<sup> +</sup>] {{=}} ∞}} and {{math|E[''X''<sup> −</sup>] {{=}} ∞}} (see [[harmonic series (mathematics)|Harmonic series]]). Hence, in this case the expectation of {{mvar|X}} is undefined. (A numerical sketch of the diverging partial sums follows this list.)
* Similarly, the Cauchy distribution, as discussed above, has undefined expectation.
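For the second example above, the diverging partial sums for {{math|E[''X''<sup> +</sup>]}} and {{math|E[''X''<sup> −</sup>]}} can be computed directly; a minimal Python sketch:
<syntaxhighlight lang="python">
import math

# X⁺ takes the value 2k−1 with probability 6/((2k−1)²π²), so the partial
# sums of E[X⁺] are (6/π²)·Σ 1/(2k−1); similarly for E[X⁻] with 2k.
# Both grow without bound, like the harmonic series.
for n in (10**2, 10**4, 10**6):
    pos = sum((2*k - 1) * 6 / ((2*k - 1)**2 * math.pi**2) for k in range(1, n + 1))
    neg = sum(2*k * 6 / ((2*k)**2 * math.pi**2) for k in range(1, n + 1))
    print(n, pos, neg)  # both columns keep increasing with n
</syntaxhighlight>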


==Expected values of common distributions==
The following table gives the expected values of some commonly occurring [[probability distribution]]s. The third column gives the expected values both in the form immediately given by the definition, as well as in the simplified form obtained by computation therefrom. The details of these computations, which are not always straightforward, can be found in the indicated references.
{| class="wikitable"
|[[Uniform distribution (continuous)|Uniform]]{{sfnm|1a1=Casella|1a2=Berger|1y=2001|1p=99|2a1=Ross|2y=2019|2loc=Example 2.20}}
|<math>X\sim U(a,b)</math>
|<math>\int_a^b \frac{x}{b-a}\,dx=\frac{a+b}{2}</math>
|-
|[[Exponential distribution|Exponential]]{{sfnm|1a1=Billingsley|1y=1995|1loc=Example 21.3|2a1=Casella|2a2=Berger|2y=2001|2loc=Example 2.2.2|3a1=Ross|3y=2019|3loc=Example 2.21}}
|<math>X\sim \operatorname{Exp}(\lambda)</math>
|<math>\int_0^\infty \lambda x e^{-\lambda x}\,dx=\frac{1}{\lambda}</math>
|-
|[[Normal distribution|Normal]]{{sfnm|1a1=Casella|1a2=Berger|1y=2001|1p=103|2a1=Ross|2y=2019|2loc=Example 2.22}}
|<math>X\sim N(\mu,\sigma^2)</math>
|<math>\frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^\infty x\, e^{-\frac12\left(\frac{x-\mu}{\sigma}\right)^2} \,dx = \mu</math>
|-
|[[Standard normal|Standard Normal]]{{sfnm|1a1=Billingsley|1y=1995|1loc=Example 21.1|2a1=Casella|2a2=Berger|2y=2001|2p=103}}
|<math>X\sim N(0,1)</math>
|<math>\frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty x e^{-x^2/2}\,dx = 0</math>
|-
|[[Pareto distribution|Pareto]]{{sfnm|1a1=Johnson|1a2=Kotz|1a3=Balakrishnan|1y=1994|1loc=Chapter 20}}
|<math>X\sim \mathrm{Par}(\alpha, k)</math>
|<math>\int_k^\infty\alpha k^\alpha x^{-\alpha}\,dx
= \begin{cases} \frac{\alpha k}{\alpha-1} &\text{if } \alpha > 1\\ \infty &\text{if } 0 < \alpha \leq 1\end{cases}</math>
|-
|[[Cauchy distribution|Cauchy]]{{sfnm|1a1=Feller|1y=1971|1loc=Section II.4}}
|<math>X\sim \operatorname{Cauchy}(x_0,\gamma)</math>
|<math>\int_{-\infty}^\infty \frac{x\gamma}{\pi\big((x-x_0)^2+\gamma^2\big)}\,dx</math> is undefined
|}
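Several rows of the table can be checked against Monte Carlo sample means using the Python standard library's samplers; a minimal sketch (seed, sample size, and parameter values are arbitrary):
<syntaxhighlight lang="python">
import random

random.seed(0)  # arbitrary seed
n = 10**6

a, b = 2.0, 5.0        # Uniform(a, b): expected value (a + b)/2 = 3.5
print(sum(random.uniform(a, b) for _ in range(n)) / n)

lam = 2.0              # Exponential(λ): expected value 1/λ = 0.5
print(sum(random.expovariate(lam) for _ in range(n)) / n)

mu, sigma = 1.0, 2.0   # Normal(μ, σ²): expected value μ = 1.0
print(sum(random.gauss(mu, sigma) for _ in range(n)) / n)
</syntaxhighlight>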


==Properties==
The basic properties below replicate or follow immediately from those of the [[Lebesgue integral]]. Note that the letters "a.s." stand for "[[almost surely]]"—a central property of the Lebesgue integral. Basically, one says that an inequality like <math>X \geq 0</math> is true almost surely, when the probability measure attributes zero-mass to the complementary event <math>\left\{ X < 0 \right\}.</math>
* '''Non-negativity:''' If <math>X \geq 0 </math> (a.s.), then <math> \operatorname{E}[ X] \geq 0</math>.
* Non-negativity: If <math>X \geq 0</math> (a.s.), then <math>\operatorname{E}[X] \geq 0.</math>
* {{vanchor|Linearity}} of expectation:<ref name=":1">{{Cite web|last=Weisstein|first=Eric W.|title=Expectation Value|url=https://mathworld.wolfram.com/ExpectationValue.html|access-date=2020-09-11|website=mathworld.wolfram.com|language=en}}</ref> The expected value operator (or ''expectation operator'') <math>\operatorname{E}[\cdot]</math> is [[linear operator|linear]] in the sense that, for any random variables <math>X</math> and <math>Y,</math> and a constant <math>a,</math> <math display="block">\begin{align}
{{anchor|Linearity}}
* '''Linearity of expectation:'''<ref name=":1">{{Cite web|last=Weisstein|first=Eric W.|title=Expectation Value|url=https://mathworld.wolfram.com/ExpectationValue.html|access-date=2020-09-11|website=mathworld.wolfram.com|language=en}}</ref> The expected value operator (or '''expectation operator''') <math>\operatorname{E}[\cdot]</math> is [[linear operator|linear]] in the sense that, for any random variables <math>X</math> and <math>Y</math>, and a constant <math>a</math>, <math display="block">\begin{align}
\operatorname{E}[X + Y] &= \operatorname{E}[X] + \operatorname{E}[Y], \\
\operatorname{E}[X + Y] &= \operatorname{E}[X] + \operatorname{E}[Y], \\
\operatorname{E}[aX] &= a \operatorname{E}[X],
\operatorname{E}[aX] &= a \operatorname{E}[X],
\end{align}
\end{align}
</math> whenever the right-hand side is well-defined. By [[mathematical induction|induction]], this means that the expected value of the sum of any finite number of random variables is the sum of the expected values of the individual random variables, and the expected value scales linearly with a multiplicative constant. Symbolically, for <math>N</math> random variables <math>X_{i}</math> and constants <math>a_{i} (1\leq i \leq N),</math> we have <math display="inline"> \operatorname{E}\left[\sum_{i=1}^{N}a_{i}X_{i}\right] = \sum_{i=1}^{N}a_{i}\operatorname{E}[X_{i}].</math> If we think of the set of random variables with finite expected value as forming a vector space, then the linearity of expectation implies that the expected value is a [[linear form]] on this vector space.
</math>
* Monotonicity: If <math>X\leq Y</math> [[almost surely|(a.s.)]], and both <math>\operatorname{E}[X]</math> and <math>\operatorname{E}[Y]</math> exist, then <math>\operatorname{E}[X]\leq\operatorname{E}[Y].</math> {{pb}} Proof follows from the linearity and the non-negativity property for <math>Z=Y-X,</math> since <math>Z\geq 0</math> (a.s.).
:whenever the right-hand side is well-defined. By [[mathematical induction|induction]], this means that the expected value of the sum of any finite number of random variables is the sum of the expected values of the individual random variables, and the expected value scales linearly with a multiplicative constant. Symbolically, for <math>N</math> random variables <math>X_{i}</math> and constants <math>a_{i} (1\leq i \leq N)</math>, we have <math display="inline"> \operatorname{E}\left[\sum_{i=1}^{N}a_{i}X_{i}\right] = \sum_{i=1}^{N}a_{i}\operatorname{E}[X_{i}]
* Non-degeneracy: If <math>\operatorname{E}[|X|]=0,</math> then <math>X=0</math> (a.s.).
</math>. If we think of the set of random variables with finite expected value as forming a vector space, then the linearity of expectation implies that the expected value is a [[linear form]] on this vector space.
* '''Monotonicity:''' If <math>X\leq Y</math> [[almost surely|(a.s.)]], and both <math>\operatorname{E}[X]</math> and <math>\operatorname{E}[Y]</math> exist, then <math>\operatorname{E}[X]\leq\operatorname{E}[Y]</math>. {{pb}} Proof follows from the linearity and the non-negativity property for <math>Z=Y-X</math>, since <math>Z\geq 0</math> (a.s.).
* If <math>X = Y</math> [[almost surely|(a.s.)]], then <math>\operatorname{E}[X] = \operatorname{E}[ Y].</math> In other words, if X and Y are random variables that take different values with probability zero, then the expectation of X will equal the expectation of Y.
* If <math>X = c</math> [[almost surely|(a.s.)]] for some real number {{mvar|c}}, then <math>\operatorname{E}[X] = c.</math> In particular, for a random variable <math>X</math> with well-defined expectation, <math>\operatorname{E}[\operatorname{E}[X]] = \operatorname{E}[X].</math> A well defined expectation implies that there is one number, or rather, one constant that defines the expected value. Thus follows that the expectation of this constant is just the original expected value.
* '''Non-degeneracy:''' If <math>\operatorname{E}[|X|]=0</math>, then <math>X=0</math> (a.s.).
* As a consequence of the formula {{math|1={{abs|''X''}} = ''X''{{isup|+}} + ''X''{{isup|−}}}} as discussed above, together with the [[triangle inequality]], it follows that for any random variable <math>X</math> with well-defined expectation, one has <math>|\operatorname{E}[X]| \leq \operatorname{E}|X|.</math>
* If <math>X = Y</math> [[almost surely|(a.s.)]], then <math> \operatorname{E}[ X] = \operatorname{E}[ Y]</math>. In other words, if X and Y are random variables that take different values with probability zero, then the expectation of X will equal the expectation of Y.
* If <math>X=c</math> [[almost surely|(a.s.)]] for some real number {{mvar|c}}, then <math>\operatorname{E}[X] = c</math>. In particular, for a random variable <math>X</math> with well-defined expectation, <math>\operatorname{E}[\operatorname{E}[X]] = \operatorname{E}[X]</math>. A well defined expectation implies that there is one number, or rather, one constant that defines the expected value. Thus follows that the expectation of this constant is just the original expected value.
* As a consequence of the formula {{math|{{!}}''X''{{!}} {{=}} ''X''<sup> +</sup> + ''X''<sup> −</sup>}} as discussed above, together with the [[triangle inequality]], it follows that for any random variable <math>X</math> with well-defined expectation, one has <math> |\operatorname{E}[X]| \leq \operatorname{E}|X| </math>.
* Let {{math|'''1'''<sub>''A''</sub>}} denote the [[indicator function]] of an [[Event (probability theory)|event]] {{mvar|A}}, then {{math|E['''1'''<sub>''A''</sub>]}} is given by the probability of {{mvar|A}}. This is nothing but a different way of stating the expectation of a [[Bernoulli random variable]], as calculated in the table above.
* Let {{math|'''1'''<sub>''A''</sub>}} denote the [[indicator function]] of an [[Event (probability theory)|event]] {{mvar|A}}, then {{math|E['''1'''<sub>''A''</sub>]}} is given by the probability of {{mvar|A}}. This is nothing but a different way of stating the expectation of a [[Bernoulli random variable]], as calculated in the table above.
* '''Formulas in terms of CDF:''' If <math>F(x)</math> is the [[cumulative distribution function]] of a random variable {{mvar|X}}, then <math display="block">\operatorname{E}[X] = \int_{-\infty}^\infty x\,dF(x),</math> where the values on both sides are well defined or not well defined simultaneously, and the integral is taken in the sense of [[Lebesgue-Stieltjes integral|Lebesgue-Stieltjes]]. As a consequence of [[integration by parts]] as applied to this representation of {{math|E[''X'']}}, it can be proved that <math display="block"> \operatorname{E}[X] = \int_0^\infty (1-F(x))\,dx - \int^0_{-\infty} F(x)\,dx,</math> with the integrals taken in the sense of Lebesgue.{{sfnm|1a1=Feller|1y=1971|1loc=Section V.6}} As a special case, for any random variable {{mvar|X}} valued in the nonnegative integers {{math|{0, 1, 2, 3, ...}<nowiki/>}}, one has <math display="block"> \operatorname{E}[X]=\sum _{n=0}^\infty \operatorname{P}(X>n),</math> where {{mvar|P}} denotes the underlying probability measure. (A numerical check of this tail-sum formula follows the list.)
* '''Non-multiplicativity:''' In general, the expected value is not multiplicative, i.e. <math>\operatorname{E}[XY]</math> is not necessarily equal to <math>\operatorname{E}[X]\cdot \operatorname{E}[Y]</math>. If <math>X</math> and <math>Y</math> are [[independent random variables|independent]], then one can show that <math>\operatorname{E}[XY]=\operatorname{E}[X] \operatorname{E}[Y]</math>. If the random variables are [[Dependent and independent variables|dependent]], then generally <math>\operatorname{E}[XY] \neq \operatorname{E}[X] \operatorname{E}[Y]</math>, although in special cases of dependency the equality may hold.
* '''[[Law of the unconscious statistician]]:''' The expected value of a measurable function of <math>X</math>, <math>g(X)</math>, given that <math>X</math> has a probability density function <math>f(x)</math>, is given by the [[inner product]] of <math>f</math> and <math>g</math>:<ref name=":1" /> <math display="block">\operatorname{E}[g(X)] = \int_{\R} g(x) f(x)\, dx .</math> This formula also holds in the multidimensional case, when <math>g</math> is a function of several random variables, and <math>f</math> is their [[Probability density function#Densities associated with multiple variables|joint density]].<ref name=":1" />{{sfnm|1a1=Papoulis|1a2=Pillai|1y=2002|1loc=Section 6-4}}
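A numerical check of the tail-sum formula (a minimal sketch, assuming Python with NumPy; the geometric distribution is an illustrative choice) compares the direct computation of {{math|E[''X'']}} with the sum of the tail probabilities:

<syntaxhighlight lang="python">
import numpy as np

# Geometric distribution on {0, 1, 2, ...}: P(X = k) = (1 - q) * q**k
q = 0.6
k = np.arange(2000)                  # truncation is harmless: the tail decays geometrically
pmf = (1 - q) * q**k

direct = np.sum(k * pmf)             # E[X] as the weighted average sum_k k * P(X = k)
tails = np.sum(1 - np.cumsum(pmf))   # sum_n P(X > n), since cumsum(pmf)[n] = F(n)

print(direct, tails, q / (1 - q))    # all three agree (exact value 1.5)
</syntaxhighlight>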


===Inequalities===
[[Concentration inequalities]] control the likelihood of a random variable taking on large values. [[Markov's inequality]] is among the best-known and simplest to prove: for a ''nonnegative'' random variable {{mvar|X}} and any positive number {{mvar|a}}, it states that{{sfnm|1a1=Feller|1y=1968|1loc=Section IX.6|2a1=Feller|2y=1971|2loc=Section V.7|3a1=Papoulis|3a2=Pillai|3y=2002|3loc=Section 5-4|4a1=Ross|4y=2019|4loc=Section 2.8}} <math display="block">
\operatorname{P}(X\geq a)\leq\frac{\operatorname{E}[X]}{a}.
</math>
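
As an illustration (a minimal simulation sketch, assuming Python with NumPy; the exponential distribution is an illustrative choice for which {{math|P(''X'' ≥ ''a'')}} is known exactly), both sides of Markov's inequality can be compared directly:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=10**6)   # nonnegative X with E[X] = 1

for a in (1.0, 2.0, 5.0):
    empirical = np.mean(x >= a)              # estimates P(X >= a); exact value is exp(-a)
    bound = np.mean(x) / a                   # Markov bound E[X]/a
    print(a, empirical, bound)               # the bound always holds, though it can be loose
</syntaxhighlight>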

If {{mvar|X}} is any random variable with finite expectation, then Markov's inequality may be applied to the random variable {{math|{{!}}''X''−E[''X'']{{!}}<sup>2</sup>}} to obtain [[Chebyshev's inequality]] <math display="block">
\operatorname{P}(|X-\text{E}[X]|\geq a)\leq\frac{\operatorname{Var}[X]}{a^2},
</math> where {{math|Var}} is the [[variance]].{{sfnm|1a1=Feller|1y=1968|1loc=Section IX.6|2a1=Feller|2y=1971|2loc=Section V.7|3a1=Papoulis|3a2=Pillai|3y=2002|3loc=Section 5-4|4a1=Ross|4y=2019|4loc=Section 2.8}} These inequalities are significant for their nearly complete lack of conditional assumptions. For example, for any random variable with finite expectation, the Chebyshev inequality implies that there is at least a 75% probability of an outcome being within two standard deviations of the expected value. However, in special cases the Markov and Chebyshev inequalities often give much weaker information than is otherwise available. For example, in the case of an unweighted die, Chebyshev's inequality says that the odds of rolling between 1 and 6 are at least 53%; in reality, the odds are of course 100%. The [[Kolmogorov's inequality|Kolmogorov inequality]] extends the Chebyshev inequality to the context of sums of random variables.


The following three inequalities are of fundamental importance in the field of [[mathematical analysis]] and its applications to probability theory.
* [[Jensen's inequality]]: Let {{math|''f'': '''R''' → '''R'''}} be a [[convex function]] and {{mvar|X}} a random variable with finite expectation. Then{{sfnm|1a1=Feller|1y=1971|1loc=Section V.8}} <math display="block">
f(\operatorname{E}(X)) \leq \operatorname{E} (f(X)).
</math> Part of the assertion is that the [[positive and negative parts|negative part]] of {{math|''f''(''X'')}} has finite expectation, so that the right-hand side is well-defined (possibly infinite). Convexity of {{mvar|f}} can be phrased as saying that the output of the weighted average of ''two'' inputs under-estimates the same weighted average of the two outputs; Jensen's inequality extends this to the setting of completely general weighted averages, as represented by the expectation. In the special case that {{math|1=''f''(''x'') = {{abs|''x''}}<sup>''t''/''s''</sup>}} for positive numbers {{math|''s'' &lt; ''t''}}, one obtains the Lyapunov inequality{{sfnm|1a1=Billingsley|1y=1995|1pp=81,277}} <math display="block">
\left(\operatorname{E}|X|^s\right)^{1/s} \leq \left(\operatorname{E}|X|^t\right)^{1/t}.
</math> This can also be proved by the Hölder inequality.{{sfnm|1a1=Feller|1y=1971|1loc=Section V.8}} In measure theory, this is particularly notable for proving the inclusion {{math|L<sup>''s''</sup> ⊂ L<sup>''t''</sup>}} of [[Lp space|{{math|L<sup>''p''</sup> spaces}}]], in the special case of [[probability space]]s. (A numerical illustration of these inequalities is sketched below.)
* [[Hölder's inequality]]: if {{math|''p'' &gt; 1}} and {{math|''q'' &gt; 1}} are numbers satisfying {{math|''p''<sup> −1</sup> + ''q''<sup> −1</sup> {{=}} 1}}, then <math display="block">
\operatorname{E}|XY|\leq(\operatorname{E}|X|^p)^{1/p}(\operatorname{E}|Y|^q)^{1/q}
</math> for any random variables {{mvar|X}} and {{mvar|Y}}.{{sfnm|1a1=Feller|1y=1971|1loc=Section V.8}} The special case of {{math|''p'' {{=}} ''q'' {{=}} 2}} is called the [[Cauchy–Schwarz inequality]], and is particularly well-known.{{sfnm|1a1=Feller|1y=1971|1loc=Section V.8}}
* [[Minkowski inequality]]: given any number {{math|''p'' ≥ 1}}, for any random variables {{mvar|X}} and {{mvar|Y}} with {{math|E{{!}}''X''{{!}}<sup>''p''</sup>}} and {{math|E{{!}}''Y''{{!}}<sup>''p''</sup>}} both finite, it follows that {{math|E{{!}}''X'' + ''Y''{{!}}<sup>''p''</sup>}} is also finite and{{sfnm|1a1=Billingsley|1y=1995|1loc=Section 19}} <math display="block">
\Bigl(\operatorname{E}|X+Y|^p\Bigr)^{1/p}\leq\Bigl(\operatorname{E}|X|^p\Bigr)^{1/p}+\Bigl(\operatorname{E}|Y|^p\Bigr)^{1/p}.
</math>
The Hölder and Minkowski inequalities can be extended to general [[measure space]]s, and are often given in that context. By contrast, the Jensen inequality is special to the case of probability spaces.
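
These inequalities are straightforward to probe numerically. The following minimal sketch (assuming Python with NumPy; the distributions are arbitrary illustrative choices) checks Jensen's inequality for the convex function {{math|1=''f''(''x'') = ''x''<sup>2</sup>}}, the Lyapunov inequality with {{math|''s'' {{=}} 1}} and {{math|''t'' {{=}} 2}}, and the Cauchy–Schwarz inequality on simulated data:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10**6)
y = rng.normal(size=10**6)

# Jensen with convex f(x) = x**2: f(E[X]) <= E[f(X)]
print(np.mean(x)**2, np.mean(x**2))

# Lyapunov with s = 1 < t = 2: (E|X|)^(1/s) <= (E|X|^t)^(1/t)
print(np.mean(np.abs(x)), np.mean(np.abs(x)**2)**0.5)

# Cauchy–Schwarz (Hölder with p = q = 2): E|XY| <= sqrt(E[X^2] * E[Y^2])
print(np.mean(np.abs(x * y)), np.sqrt(np.mean(x**2) * np.mean(y**2)))
</syntaxhighlight>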


===Expectations under convergence of random variables===
In general, it is not the case that <math>\operatorname{E}[X_n] \to \operatorname{E}[X]</math> even if <math>X_n\to X</math> pointwise. Thus, one cannot interchange limits and expectation without additional conditions on the random variables. To see this, let <math>U</math> be a random variable distributed uniformly on <math>[0,1].</math> For <math>n\geq 1,</math> define a sequence of random variables
<math display="block">X_n = n \cdot \mathbf{1}\left\{ U \in \left(0,\tfrac{1}{n}\right)\right\},</math>
with <math>\mathbf{1}\{A\}</math> being the indicator function of the event <math>A.</math> Then, it follows that <math>X_n \to 0</math> pointwise. But, <math>\operatorname{E}[X_n] = n \cdot \Pr\left(U \in \left[ 0, \tfrac{1}{n}\right] \right) = n \cdot \tfrac{1}{n} = 1</math> for each <math>n.</math> Hence, <math>\lim_{n \to \infty} \operatorname{E}[X_n] = 1 \neq 0 = \operatorname{E}\left[ \lim_{n \to \infty} X_n \right].</math>

Analogously, for a general sequence of random variables <math>\{ Y_n : n \geq 0\},</math> the expected value operator is not <math>\sigma</math>-additive, i.e.
<math display="block">\operatorname{E}\left[\sum^\infty_{n=0} Y_n\right] \neq \sum^\infty_{n=0}\operatorname{E}[Y_n].</math>

An example is easily obtained by setting <math>Y_0 = X_1</math> and <math>Y_n = X_{n+1} - X_n</math> for <math>n \geq 1,</math> where <math>X_n</math> is as in the previous example.
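
The failure of the interchange is visible in simulation as well. The sketch below (assuming Python with NumPy) estimates <math>\operatorname{E}[X_n]</math> for the sequence above: every estimate stays near 1, even though <math>X_n \to 0</math> pointwise:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
u = rng.uniform(size=10**6)          # U ~ Uniform[0, 1]

for n in (10, 100, 1000):
    x_n = n * (u < 1.0 / n)          # X_n = n * 1{U in (0, 1/n)}
    print(n, x_n.mean())             # close to 1 for every n, yet X_n -> 0 pointwise
</syntaxhighlight>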


A number of convergence results specify exact conditions which allow one to interchange limits and expectations, as specified below.
* [[Monotone convergence theorem]]: Let <math>\{X_n : n \geq 0\}</math> be a sequence of random variables, with <math>0 \leq X_n \leq X_{n+1}</math> (a.s.) for each <math>n \geq 0.</math> Furthermore, let <math>X_n \to X</math> pointwise. Then, the monotone convergence theorem states that <math>\lim_n\operatorname{E}[X_n]=\operatorname{E}[X]</math> (a numerical sketch follows this list). {{pb}} Using the monotone convergence theorem, one can show that expectation indeed satisfies countable additivity for non-negative random variables. In particular, let <math>\{X_i\}_{i=0}^\infty</math> be non-negative random variables. It follows from the [[Monotone convergence theorem|monotone convergence theorem]] that <math display="block">
\operatorname{E}\left[\sum^\infty_{i=0}X_i\right] = \sum^\infty_{i=0}\operatorname{E}[X_i].
</math>
* [[Fatou's lemma]]: Let <math>\{ X_n \geq 0 : n \geq 0\}</math> be a sequence of non-negative random variables. Fatou's lemma states that <math display="block">\operatorname{E}[\liminf_n X_n] \leq \liminf_n \operatorname{E}[X_n].</math> {{pb}} '''Corollary.''' Let <math>X_n \geq 0</math> with <math>\operatorname{E}[X_n] \leq C</math> for all <math>n \geq 0.</math> If <math>X_n \to X</math> (a.s), then <math>\operatorname{E}[X] \leq C.</math> {{pb}} '''Proof''' is by observing that <math display="inline"> X = \liminf_n X_n</math> (a.s.) and applying Fatou's lemma.
* [[Dominated convergence theorem]]: Let <math>\{X_n : n \geq 0 \}</math> be a sequence of random variables. If <math>X_n\to X</math> [[pointwise convergence|pointwise]] (a.s.), <math>|X_n|\leq Y \leq +\infty</math> (a.s.), and <math>\operatorname{E}[Y]<\infty,</math> then, according to the dominated convergence theorem,
** <math>\operatorname{E}|X| \leq \operatorname{E}[Y] <\infty</math>;
** <math>\lim_n\operatorname{E}[X_n]=\operatorname{E}[X]</math>
** <math>\lim_n\operatorname{E}|X_n - X| = 0.</math>
* [[Uniform integrability]]: In some cases, the equality <math>\lim_n\operatorname{E}[X_n]=\operatorname{E}[\lim_n X_n]</math> holds when the sequence <math>\{X_n\}</math> is ''uniformly integrable.''
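
For instance, the monotone convergence theorem applies to the truncations <math>X_n = \min(X, n)</math> of a non-negative random variable <math>X</math>, which increase pointwise to <math>X</math>. A quick numerical sketch (assuming Python with NumPy; the exponential distribution is an illustrative choice):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=10**6)   # X >= 0 with E[X] = 1

for n in (1, 2, 5, 10):
    x_n = np.minimum(x, n)                   # 0 <= X_n <= X_{n+1} and X_n -> X pointwise
    print(n, x_n.mean())                     # E[X_n] = 1 - exp(-n) increases toward E[X] = 1
</syntaxhighlight>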


===Relationship with characteristic function===
The probability density function <math>f_X</math> of a scalar random variable <math>X</math> is related to its [[characteristic function (probability)|characteristic function]] <math>\varphi_X</math> by the inversion formula:
<math display="block">f_X(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\varphi_X(t) \, dt.</math>


For the expected value of <math>g(X)</math> (where <math>g:{\mathbb R}\to{\mathbb R}</math> is a [[Measurable function|Borel function]]), we can use this inversion formula to obtain
<math display="block">\operatorname{E}[g(X)] = \frac{1}{2\pi} \int_\Reals g(x) \left[ \int_\Reals e^{-itx}\varphi_X(t) \, dt \right] dx.</math>



If <math>\operatorname{E}[g(X)]</math> is finite, changing the order of integration, we get, in accordance with [[Fubini theorem|Fubini–Tonelli theorem]],
<math display="block">\operatorname{E}[g(X)] = \frac{1}{2\pi} \int_\Reals G(t) \varphi_X(t) \, dt,</math>


where
<math display="block">G(t) = \int_\Reals g(x) e^{-itx} \, dx</math>
is the [[Fourier transform]] of <math>g(x).</math> The expression for <math>\operatorname{E}[g(X)]</math> also follows directly from the [[Plancherel theorem]].
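
As a concrete check of the inversion formula, the following sketch (assuming Python with NumPy and SciPy) recovers the standard normal density from its characteristic function <math>\varphi_X(t) = e^{-t^2/2}</math>; the imaginary part of the integrand integrates to zero, so only the real part contributes:

<syntaxhighlight lang="python">
import numpy as np
from scipy.integrate import quad

def phi(t):
    return np.exp(-t**2 / 2)      # characteristic function of N(0, 1)

def density(x):
    # f(x) = (1 / 2*pi) * integral of exp(-itx) * phi(t) dt;
    # keeping only the real part cos(t*x) * phi(t) of the integrand
    value, _ = quad(lambda t: np.cos(t * x) * phi(t), -np.inf, np.inf)
    return value / (2 * np.pi)

print(density(0.0), 1 / np.sqrt(2 * np.pi))             # both approximately 0.39894
print(density(1.0), np.exp(-0.5) / np.sqrt(2 * np.pi))  # both approximately 0.24197
</syntaxhighlight>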


==Uses and applications==
The expectation of a random variable plays an important role in a variety of contexts.


In [[statistics]], where one seeks [[Estimator|estimates]] for unknown [[Statistical parameter|parameters]] based on available data gained from [[Sampling (statistics)|samples]], the [[sample mean]] serves as an estimate for the expectation, and is itself a random variable. In such settings, the sample mean is considered to meet the desirable criterion for a "good" estimator in being [[Bias of an estimator|unbiased]]; that is, the expected value of the estimate is equal to the [[true value]] of the underlying parameter. {{See also|Estimation theory}}


For a different example, in [[decision theory]], an agent making an optimal choice in the context of incomplete information is often assumed to maximize the expected value of their [[von Neumann–Morgenstern utility function|utility function]].


It is possible to construct an expected value equal to the probability of an event by taking the expectation of an [[indicator function]] that is one if the event has occurred and zero otherwise. This relationship can be used to translate properties of expected values into properties of probabilities, e.g. using the [[law of large numbers]] to justify estimating probabilities by [[Frequency (statistics)|frequencies]].


The expected values of the powers of ''X'' are called the [[moment (mathematics)|moments]] of ''X''; the [[moment about the mean|moments about the mean]] of ''X'' are expected values of powers of {{math|''X'' − E[''X'']}}. The moments of some random variables can be used to specify their distributions, via their [[moment generating function]]s.


To empirically estimate the expected value of a random variable, one repeatedly measures observations of the variable and computes the [[arithmetic mean]] of the results. If the expected value exists, this procedure estimates the true expected value in an unbiased manner and has the property of minimizing the sum of the squares of the [[errors and residuals in statistics|residuals]] (the sum of the squared differences between the observations and the estimate). The law of large numbers demonstrates (under fairly mild conditions) that, as the [[Sample size|size]] of the sample gets larger, the [[variance]] of this estimate gets smaller.



This property is often exploited in a wide variety of applications, including general problems of [[Estimation theory|statistical estimation]] and [[machine learning]], to estimate (probabilistic) quantities of interest via [[Monte Carlo methods]], since most quantities of interest can be written in terms of expectation, e.g. <math>\operatorname{P}({X \in \mathcal{A}}) = \operatorname{E}[{\mathbf 1}_{\mathcal{A}}],</math> where <math>{\mathbf 1}_{\mathcal{A}}</math> is the indicator function of the set <math>\mathcal{A}.</math>
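
A minimal Monte Carlo sketch of this identity (assuming Python with NumPy; the event is an illustrative choice) estimates <math>\operatorname{P}(X \in \mathcal{A})</math> as the sample mean of the indicator <math>{\mathbf 1}_{\mathcal{A}}</math>:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=10**6)       # X ~ N(0, 1)

indicator = np.abs(x) <= 1.96    # 1_A for the event A = {|X| <= 1.96}
print(indicator.mean())          # approximately 0.95 = P(|X| <= 1.96)
</syntaxhighlight>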


[[File:Beta first moment.svg|thumb|The mass of probability distribution is balanced at the expected value, here a Beta(α,β) distribution with expected value α/(α+β).]]
In [[classical mechanics]], the [[center of mass]] is an analogous concept to expectation. For example, suppose ''X'' is a discrete random variable with values ''x<sub>i</sub>'' and corresponding probabilities ''p<sub>i</sub>.'' Now consider a weightless rod on which are placed weights, at locations ''x<sub>i</sub>'' along the rod and having masses ''p<sub>i</sub>'' (whose sum is one). The point at which the rod balances is E[''X''].


Expected values can also be used to compute the variance, by means of the computational formula for the variance
<math display="block">\operatorname{Var}(X)= \operatorname{E}[X^2] - (\operatorname{E}[X])^2.</math>


A very important application of the expectation value is in the field of [[quantum mechanics]]. The [[Expectation value (quantum mechanics)|expectation value of a quantum mechanical operator]] <math>\hat{A}</math> operating on a [[quantum state]] vector <math>|\psi\rangle</math> is written as <math>\langle\hat{A}\rangle = \langle\psi|\hat{A}|\psi\rangle.</math> The [[uncertainty principle|uncertainty]] in <math>\hat{A}</math> can be calculated by the formula <math>(\Delta A)^2 = \langle\hat{A}^2\rangle - \langle \hat{A} \rangle^2</math>.
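
For a finite-dimensional illustration (a sketch assuming Python with NumPy; the observable and state are arbitrary choices, not taken from the literature), both quantities can be computed directly:

<syntaxhighlight lang="python">
import numpy as np

# Pauli-Z observable and an equal-superposition qubit state (illustrative choices)
A = np.array([[1, 0], [0, -1]], dtype=complex)
psi = np.array([1, 1], dtype=complex) / np.sqrt(2)

exp_A = np.vdot(psi, A @ psi).real           # <A> = <psi|A|psi>, equal to 0 here
exp_A2 = np.vdot(psi, (A @ A) @ psi).real    # <A^2>, equal to 1 here
print(exp_A, np.sqrt(exp_A2 - exp_A**2))     # uncertainty Delta A = 1
</syntaxhighlight>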


==See also==
* [[Center of mass]]
* [[Central tendency]]
* [[Chebyshev's inequality]] – an inequality on location and scale parameters
* [[Conditional expectation]]
* [[Expectation (epistemic)]]
* [[Expectation value (quantum mechanics)]]
* [[Expectile]] – related to expectations in a way analogous to that in which quantiles are related to medians
* [[Law of total expectation]] – the expected value of the conditional expected value of ''X'' given ''Y'' is the same as the expected value of ''X''
* [[Moment (mathematics)]]
* [[Nonlinear expectation]] – a generalization of the expected value
* [[Population mean]]
* [[Predicted value]]
* [[Sample mean]]
* [[Wald's equation]] – an equation for calculating the expected value of a random number of random variables


==References==
{{Reflist}}


==Bibliography==
{{refbegin}}
* {{cite book | last = Edwards | first = A.W.F | title = Pascal's arithmetical triangle: the story of a mathematical idea | year = 2002 | edition = 2nd | publisher = JHU Press | isbn = 0-8018-6946-3 }}
* {{cite book | last = Huygens | first = Christiaan | title = De ratiociniis in ludo aleæ | format = English translation, published in 1714 | url = http://www.york.ac.uk/depts/maths/histstat/huygens.pdf | year = 1657 }}
* {{cite book|last1=Billingsley|first1=Patrick|title=Probability and measure|edition=Third edition of 1979 original|series=Wiley Series in Probability and Mathematical Statistics|publisher=John Wiley & Sons, Inc.|location=New York|year=1995|isbn=0-471-00710-2|mr=1324786|author-link1=Patrick Billingsley}}
* {{cite book|last1=Casella|first1=George|last2=Berger|first2=Roger L.|title=Statistical inference|series=Duxbury Advanced Series|publisher=Duxbury|location=Pacific Grove, CA|year=2001|edition=Second edition of 1990 original|isbn=0-534-11958-1|author-link1=George Casella|author-link2=Roger Lee Berger}}
* {{cite book|last1=Feller|first1=William|title=An introduction to probability theory and its applications. Volume I|edition=Third edition of 1950 original|publisher=John Wiley & Sons, Inc.|location=New York–London–Sydney|year=1968|author-link1=William Feller|mr=0228020}}
* {{cite book|last1=Feller|first1=William|title=An introduction to probability theory and its applications. Volume II|edition=Second edition of 1966 original|publisher=John Wiley & Sons, Inc.|location=New York–London–Sydney|year=1971|author-link1=William Feller|mr=0270403}}
* {{cite book|last1=Johnson|first1=Norman L.|author-link1=Norman Johnson (mathematician)|author-link2=Samuel Kotz|last2=Kotz|first2=Samuel|last3=Balakrishnan|first3=N.|title=Continuous univariate distributions. Volume 1|edition=Second edition of 1970 original|series=Wiley Series in Probability and Mathematical Statistics|publisher=John Wiley & Sons, Inc.|location=New York|year=1994|isbn=0-471-58495-9|mr=1299979}}
* {{cite book|last1=Papoulis|first1=Athanasios|last2=Pillai|first2=S. Unnikrishna|title=Probability, random variables, and stochastic processes|edition=Fourth edition of 1965 original|author-link1=Athanasios Papoulis|author-link2=Unnikrishna Pillai|year=2002|publisher=McGraw-Hill|location=New York|isbn=0-07-366011-6}} {{erratum|https://www.mhhe.com/engcs/electrical/papoulis/graphics/eratta.pdf|checked=yes}}
* {{cite book|last1=Ross|first1=Sheldon M.|title=Introduction to probability models|edition=Twelfth edition of 1972 original|publisher=Academic Press|location=London|year=2019|isbn=978-0-12-814346-9|mr=3931305|author-link1=Sheldon M. Ross|doi=10.1016/C2017-0-01324-1}}
{{refend}}



Latest revision as of 17:03, 24 June 2024

In probability theory, the expected value (also called expectation, expectancy, expectation operator, mathematical expectation, mean, expectation value, or first moment) is a generalization of the weighted average. Informally, the expected value is the arithmetic mean of the possible values a random variable can take, weighted by the probability of those outcomes. Since it is obtained through arithmetic, the expected value sometimes may not even be included in the sample data set; it is not the value you would "expect" to get in reality.

The expected value of a random variable with a finite number of outcomes is a weighted average of all possible outcomes. In the case of a continuum of possible outcomes, the expectation is defined by integration. In the axiomatic foundation for probability provided by measure theory, the expectation is given by Lebesgue integration.

The expected value of a random variable X is often denoted by E(X), E[X], or EX, with E also often stylized as or E.[1][2][3]

History[edit]

The idea of the expected value originated in the middle of the 17th century from the study of the so-called problem of points, which seeks to divide the stakes in a fair way between two players, who have to end their game before it is properly finished.[4] This problem had been debated for centuries. Many conflicting proposals and solutions had been suggested over the years when it was posed to Blaise Pascal by French writer and amateur mathematician Chevalier de Méré in 1654. Méré claimed that this problem could not be solved and that it showed just how flawed mathematics was when it came to its application to the real world. Pascal, being a mathematician, was provoked and determined to solve the problem once and for all.

He began to discuss the problem in the famous series of letters to Pierre de Fermat. Soon enough, they both independently came up with a solution. They solved the problem in different computational ways, but their results were identical because their computations were based on the same fundamental principle. The principle is that the value of a future gain should be directly proportional to the chance of getting it. This principle seemed to have come naturally to both of them. They were very pleased by the fact that they had found essentially the same solution, and this in turn made them absolutely convinced that they had solved the problem conclusively; however, they did not publish their findings. They only informed a small circle of mutual scientific friends in Paris about it.[5]

In Dutch mathematician Christiaan Huygens' book, he considered the problem of points, and presented a solution based on the same principle as the solutions of Pascal and Fermat. Huygens published his treatise in 1657, (see Huygens (1657)) "De ratiociniis in ludo aleæ" on probability theory just after visiting Paris. The book extended the concept of expectation by adding rules for how to calculate expectations in more complicated situations than the original problem (e.g., for three or more players), and can be seen as the first successful attempt at laying down the foundations of the theory of probability.

In the foreword to his treatise, Huygens wrote:

It should be said, also, that for some time some of the best mathematicians of France have occupied themselves with this kind of calculus so that no one should attribute to me the honour of the first invention. This does not belong to me. But these savants, although they put each other to the test by proposing to each other many questions difficult to solve, have hidden their methods. I have had therefore to examine and go deeply for myself into this matter by beginning with the elements, and it is impossible for me for this reason to affirm that I have even started from the same principle. But finally I have found that my answers in many cases do not differ from theirs.

— Edwards (2002)

In the mid-nineteenth century, Pafnuty Chebyshev became the first person to think systematically in terms of the expectations of random variables.[6]

Etymology[edit]

Neither Pascal nor Huygens used the term "expectation" in its modern sense. In particular, Huygens writes:[7]

That any one Chance or Expectation to win any thing is worth just such a Sum, as wou'd procure in the same Chance and Expectation at a fair Lay. ... If I expect a or b, and have an equal chance of gaining them, my Expectation is worth (a+b)/2.

More than a hundred years later, in 1814, Pierre-Simon Laplace published his tract "Théorie analytique des probabilités", where the concept of expected value was defined explicitly:[8]

... this advantage in the theory of chance is the product of the sum hoped for by the probability of obtaining it; it is the partial sum which ought to result when we do not wish to run the risks of the event in supposing that the division is made proportional to the probabilities. This division is the only equitable one when all strange circumstances are eliminated; because an equal degree of probability gives an equal right for the sum hoped for. We will call this advantage mathematical hope.

Notations[edit]

The use of the letter E to denote "expected value" goes back to W. A. Whitworth in 1901.[9] The symbol has since become popular for English writers. In German, E stands for Erwartungswert, in Spanish for esperanza matemática, and in French for espérance mathématique.[10]

When "E" is used to denote "expected value", authors use a variety of stylizations: the expectation operator can be stylized as E (upright), E (italic), or (in blackboard bold), while a variety of bracket notations (such as E(X), E[X], and EX) are all used.

Another popular notation is μX, whereas X, Xav, and are commonly used in physics,[11] and M(X) in Russian-language literature.

Definition[edit]

As discussed above, there are several context-dependent ways of defining the expected value. The simplest and original definition deals with the case of finitely many possible outcomes, such as in the flip of a coin. With the theory of infinite series, this can be extended to the case of countably many possible outcomes. It is also very common to consider the distinct case of random variables dictated by (piecewise-)continuous probability density functions, as these arise in many natural contexts. All of these specific definitions may be viewed as special cases of the general definition based upon the mathematical tools of measure theory and Lebesgue integration, which provide these different contexts with an axiomatic foundation and common language.

Any definition of expected value may be extended to define an expected value of a multidimensional random variable, i.e. a random vector X. It is defined component by component, as E[X]i = E[Xi]. Similarly, one may define the expected value of a random matrix X with components Xij by E[X]ij = E[Xij].

Random variables with finitely many outcomes[edit]

Consider a random variable X with a finite list x1, ..., xk of possible outcomes, each of which (respectively) has probability p1, ..., pk of occurring. The expectation of X is defined as[12]

Since the probabilities must satisfy p1 + ⋅⋅⋅ + pk = 1, it is natural to interpret E[X] as a weighted average of the xi values, with weights given by their probabilities pi.

In the special case that all possible outcomes are equiprobable (that is, p1 = ⋅⋅⋅ = pk), the weighted average is given by the standard average. In the general case, the expected value takes into account the fact that some outcomes are more likely than others.

Examples[edit]

An illustration of the convergence of sequence averages of rolls of a dice to the expected value of 3.5 as the number of rolls (trials) grows
  • Let represent the outcome of a roll of a fair six-sided die. More specifically, will be the number of pips showing on the top face of the die after the toss. The possible values for are 1, 2, 3, 4, 5, and 6, all of which are equally likely with a probability of 1/6. The expectation of is If one rolls the die times and computes the average (arithmetic mean) of the results, then as grows, the average will almost surely converge to the expected value, a fact known as the strong law of large numbers.
  • The roulette game consists of a small ball and a wheel with 38 numbered pockets around the edge. As the wheel is spun, the ball bounces around randomly until it settles down in one of the pockets. Suppose random variable represents the (monetary) outcome of a $1 bet on a single number ("straight up" bet). If the bet wins (which happens with probability 1/38 in American roulette), the payoff is $35; otherwise the player loses the bet. The expected profit from such a bet will be That is, the expected value to be won from a $1 bet is −$1/19. Thus, in 190 bets, the net loss will probably be about $10.

Random variables with countably infinitely many outcomes[edit]

Informally, the expectation of a random variable with a countably infinite set of possible outcomes is defined analogously as the weighted average of all possible outcomes, where the weights are given by the probabilities of realizing each given value. This is to say that where x1, x2, ... are the possible outcomes of the random variable X and p1, p2, ... are their corresponding probabilities. In many non-mathematical textbooks, this is presented as the full definition of expected values in this context.[13]

However, there are some subtleties with infinite summation, so the above formula is not suitable as a mathematical definition. In particular, the Riemann series theorem of mathematical analysis illustrates that the value of certain infinite sums involving positive and negative summands depends on the order in which the summands are given. Since the outcomes of a random variable have no naturally given order, this creates a difficulty in defining expected value precisely.

For this reason, many mathematical textbooks only consider the case that the infinite sum given above converges absolutely, which implies that the infinite sum is a finite number independent of the ordering of summands.[14] In the alternative case that the infinite sum does not converge absolutely, one says the random variable does not have finite expectation.[14]

Examples[edit]

  • Suppose and for where is the scaling factor which makes the probabilities sum to 1. Then we have

Random variables with density[edit]

Now consider a random variable X which has a probability density function given by a function f on the real number line. This means that the probability of X taking on a value in any given open interval is given by the integral of f over that interval. The expectation of X is then given by the integral[15] A general and mathematically precise formulation of this definition uses measure theory and Lebesgue integration, and the corresponding theory of absolutely continuous random variables is described in the next section. The density functions of many common distributions are piecewise continuous, and as such the theory is often developed in this restricted setting.[16] For such functions, it is sufficient to only consider the standard Riemann integration. Sometimes continuous random variables are defined as those corresponding to this special class of densities, although the term is used differently by various authors.

Analogously to the countably-infinite case above, there are subtleties with this expression due to the infinite region of integration. Such subtleties can be seen concretely if the distribution of X is given by the Cauchy distribution Cauchy(0, π), so that f(x) = (x2 + π2)−1. It is straightforward to compute in this case that The limit of this expression as a → −∞ and b → ∞ does not exist: if the limits are taken so that a = −b, then the limit is zero, while if the constraint 2a = −b is taken, then the limit is ln(2).

To avoid such ambiguities, in mathematical textbooks it is common to require that the given integral converges absolutely, with E[X] left undefined otherwise.[17] However, measure-theoretic notions as given below can be used to give a systematic definition of E[X] for more general random variables X.

Arbitrary real-valued random variables[edit]

All definitions of the expected value may be expressed in the language of measure theory. In general, if X is a real-valued random variable defined on a probability space (Ω, Σ, P), then the expected value of X, denoted by E[X], is defined as the Lebesgue integral[18] Despite the newly abstract situation, this definition is extremely similar in nature to the very simplest definition of expected values, given above, as certain weighted averages. This is because, in measure theory, the value of the Lebesgue integral of X is defined via weighted averages of approximations of X which take on finitely many values.[19] Moreover, if given a random variable with finitely or countably many possible values, the Lebesgue theory of expectation is identical to the summation formulas given above. However, the Lebesgue theory clarifies the scope of the theory of probability density functions. A random variable X is said to be absolutely continuous if any of the following conditions are satisfied:

  • there is a nonnegative measurable function f on the real line such that for any Borel set A, in which the integral is Lebesgue.
  • the cumulative distribution function of X is absolutely continuous.
  • for any Borel set A of real numbers with Lebesgue measure equal to zero, the probability of X being valued in A is also equal to zero
  • for any positive number ε there is a positive number δ such that: if A is a Borel set with Lebesgue measure less than δ, then the probability of X being valued in A is less than ε.

These conditions are all equivalent, although this is nontrivial to establish.[20] In this definition, f is called the probability density function of X (relative to Lebesgue measure). According to the change-of-variables formula for Lebesgue integration,[21] combined with the law of the unconscious statistician,[22] it follows that for any absolutely continuous random variable X. The above discussion of continuous random variables is thus a special case of the general Lebesgue theory, due to the fact that every piecewise-continuous function is measurable.

Expected value μ and median 𝑚
Expected value μ and median 𝑚

The expected value of any real-valued random variable can also be defined on the graph of its cumulative distribution function by a nearby equality of areas. In fact, with a real number if and only if the two surfaces in the --plane, described by respectively, have the same finite area, i.e. if and both improper Riemann integrals converge. Finally, this is equivalent to the representation also with convergent integrals.[23]

Infinite expected values[edit]

Expected values as defined above are automatically finite numbers. However, in many cases it is fundamental to be able to consider expected values of ±∞. This is intuitive, for example, in the case of the St. Petersburg paradox, in which one considers a random variable with possible outcomes xi = 2i, with associated probabilities pi = 2i, for i ranging over all positive integers. According to the summation formula in the case of random variables with countably many outcomes, one has It is natural to say that the expected value equals +∞.

There is a rigorous mathematical theory underlying such ideas, which is often taken as part of the definition of the Lebesgue integral.[19] The first fundamental observation is that, whichever of the above definitions are followed, any nonnegative random variable whatsoever can be given an unambiguous expected value; whenever absolute convergence fails, then the expected value can be defined as +∞. The second fundamental observation is that any random variable can be written as the difference of two nonnegative random variables. Given a random variable X, one defines the positive and negative parts by X + = max(X, 0) and X = −min(X, 0). These are nonnegative random variables, and it can be directly checked that X = X +X. Since E[X +] and E[X] are both then defined as either nonnegative numbers or +∞, it is then natural to define:

According to this definition, E[X] exists and is finite if and only if E[X +] and E[X] are both finite. Due to the formula |X| = X + + X, this is the case if and only if E|X| is finite, and this is equivalent to the absolute convergence conditions in the definitions above. As such, the present considerations do not define finite expected values in any cases not previously considered; they are only useful for infinite expectations.

  • In the case of the St. Petersburg paradox, one has X = 0 and so E[X] = +∞ as desired.
  • Suppose the random variable X takes values 1, −2,3, −4, ... with respective probabilities −2, 6(2π)−2, 6(3π)−2, 6(4π)−2, .... Then it follows that X + takes value 2k−1 with probability 6((2k−1)π)−2 for each positive integer k, and takes value 0 with remaining probability. Similarly, X takes value 2k with probability 6(2kπ)−2 for each positive integer k and takes value 0 with remaining probability. Using the definition for non-negative random variables, one can show that both E[X +] = ∞ and E[X] = ∞ (see Harmonic series). Hence, in this case the expectation of X is undefined.
  • Similarly, the Cauchy distribution, as discussed above, has undefined expectation.

Expected values of common distributions[edit]

The following table gives the expected values of some commonly occurring probability distributions. The third column gives the expected values both in the form immediately given by the definition, as well as in the simplified form obtained by computation therefrom. The details of these computations, which are not always straightforward, can be found in the indicated references.

Distribution Notation Mean E(X)
Bernoulli[24]
Binomial[25]
Poisson[26]
Geometric[27]
Uniform[28]
Exponential[29]
Normal[30]
Standard Normal[31]
Pareto[32]
Cauchy[33] is undefined

Properties[edit]

The basic properties below (and their names in bold) replicate or follow immediately from those of Lebesgue integral. Note that the letters "a.s." stand for "almost surely"—a central property of the Lebesgue integral. Basically, one says that an inequality like is true almost surely, when the probability measure attributes zero-mass to the complementary event

  • Non-negativity: If (a.s.), then
  • Linearity of expectation:[34] The expected value operator (or expectation operator) is linear in the sense that, for any random variables and and a constant whenever the right-hand side is well-defined. By induction, this means that the expected value of the sum of any finite number of random variables is the sum of the expected values of the individual random variables, and the expected value scales linearly with a multiplicative constant. Symbolically, for random variables and constants we have If we think of the set of random variables with finite expected value as forming a vector space, then the linearity of expectation implies that the expected value is a linear form on this vector space.
  • Monotonicity: If (a.s.), and both and exist, then
    Proof follows from the linearity and the non-negativity property for since (a.s.).
  • Non-degeneracy: If then (a.s.).
  • If (a.s.), then In other words, if X and Y are random variables that take different values with probability zero, then the expectation of X will equal the expectation of Y.
  • If (a.s.) for some real number c, then In particular, for a random variable with well-defined expectation, A well defined expectation implies that there is one number, or rather, one constant that defines the expected value. Thus follows that the expectation of this constant is just the original expected value.
  • As a consequence of the formula |X| = X+ + X as discussed above, together with the triangle inequality, it follows that for any random variable with well-defined expectation, one has
  • Let 1A denote the indicator function of an event A, then E[1A] is given by the probability of A. This is nothing but a different way of stating the expectation of a Bernoulli random variable, as calculated in the table above.
  • Formulas in terms of CDF: If is the cumulative distribution function of a random variable X, then where the values on both sides are well defined or not well defined simultaneously, and the integral is taken in the sense of Lebesgue-Stieltjes. As a consequence of integration by parts as applied to this representation of E[X], it can be proved that with the integrals taken in the sense of Lebesgue.[35] As a special case, for any random variable X valued in the nonnegative integers {0, 1, 2, 3, ...}, one has where P denotes the underlying probability measure.
  • Non-multiplicativity: In general, the expected value is not multiplicative, i.e. is not necessarily equal to If and are independent, then one can show that If the random variables are dependent, then generally although in special cases of dependency the equality may hold.
  • Law of the unconscious statistician: The expected value of a measurable function of given that has a probability density function is given by the inner product of and :[34] This formula also holds in multidimensional case, when is a function of several random variables, and is their joint density.[34][36]

Inequalities[edit]

Concentration inequalities control the likelihood of a random variable taking on large values. Markov's inequality is among the best-known and simplest to prove: for a nonnegative random variable X and any positive number a, it states that[37]

If X is any random variable with finite expectation, then Markov's inequality may be applied to the random variable |X−E[X]|2 to obtain Chebyshev's inequality where Var is the variance.[37] These inequalities are significant for their nearly complete lack of conditional assumptions. For example, for any random variable with finite expectation, the Chebyshev inequality implies that there is at least a 75% probability of an outcome being within two standard deviations of the expected value. However, in special cases the Markov and Chebyshev inequalities often give much weaker information than is otherwise available. For example, in the case of an unweighted dice, Chebyshev's inequality says that odds of rolling between 1 and 6 is at least 53%; in reality, the odds are of course 100%.[38] The Kolmogorov inequality extends the Chebyshev inequality to the context of sums of random variables.[39]

The following three inequalities are of fundamental importance in the field of mathematical analysis and its applications to probability theory.

  • Jensen's inequality: Let f: ℝ → ℝ be a convex function and X a random variable with finite expectation. Then[40] f(E[X]) ≤ E[f(X)]. Part of the assertion is that the negative part of f(X) has finite expectation, so that the right-hand side is well-defined (possibly infinite). Convexity of f can be phrased as saying that the output of the weighted average of two inputs under-estimates the same weighted average of the two outputs; Jensen's inequality extends this to the setting of completely general weighted averages, as represented by the expectation. (A numerical sketch appears below.) In the special case that f(x) = |x|^{t/s} for positive numbers s < t, one obtains the Lyapunov inequality[41] (E[|X|^s])^{1/s} ≤ (E[|X|^t])^{1/t}. This can also be proved by the Hölder inequality.[40] In measure theory, this is particularly notable for proving the inclusion L^t ⊂ L^s of L^p spaces, in the special case of probability spaces.
  • Hölder's inequality: if p > 1 and q > 1 are numbers satisfying 1/p + 1/q = 1, then E[|XY|] ≤ (E[|X|^p])^{1/p} (E[|Y|^q])^{1/q} for any random variables X and Y.[40] The special case p = q = 2 is called the Cauchy–Schwarz inequality, and is particularly well-known.[40]
  • Minkowski inequality: given any number p ≥ 1, for any random variables X and Y with E[|X|^p] and E[|Y|^p] both finite, it follows that E[|X + Y|^p] is also finite and[42] (E[|X + Y|^p])^{1/p} ≤ (E[|X|^p])^{1/p} + (E[|Y|^p])^{1/p}.

The Hölder and Minkowski inequalities can be extended to general measure spaces, and are often given in that context. By contrast, the Jensen inequality is special to the case of probability spaces.
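
A minimal numerical sketch of Jensen's inequality; the convex function f(x) = exp(x/2) and the Exponential(1) distribution are illustrative choices:

    import math
    import random

    random.seed(1)
    xs = [random.expovariate(1.0) for _ in range(100_000)]   # X ~ Exp(1), E[X] = 1

    lhs = math.exp(sum(xs) / len(xs) / 2)               # f(E[X]) ~ exp(1/2) ~ 1.65
    rhs = sum(math.exp(x / 2) for x in xs) / len(xs)    # E[f(X)] -> 2 for Exp(1)
    print(lhs, rhs)   # lhs < rhs, as Jensen's inequality guarantees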

Expectations under convergence of random variables

In general, it is not the case that E[X_n] → E[X] even if X_n → X pointwise. Thus, one cannot interchange limits and expectation without additional conditions on the random variables. To see this, let U be a random variable distributed uniformly on [0, 1]. For n ≥ 1, define a sequence of random variables X_n = n · 1{U ∈ (0, 1/n)}, with 1{A} being the indicator function of the event A. Then, it follows that X_n → 0 pointwise. But, E[X_n] = n · P(U ∈ (0, 1/n)) = n · (1/n) = 1 for each n. Hence, lim_{n→∞} E[X_n] = 1 ≠ 0 = E[lim_{n→∞} X_n].

Analogously, for a general sequence of random variables {Y_n : n ≥ 0}, the expected value operator is not σ-additive, i.e. E[∑_{n=0}^∞ Y_n] ≠ ∑_{n=0}^∞ E[Y_n] in general.

An example is easily obtained by setting Y_0 = X_1 and Y_n = X_{n+1} − X_n for n ≥ 1, where X_n is as in the previous example.
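
The counterexample is easy to reproduce numerically; the following sketch (with illustrative sample sizes) estimates E[X_n] by sampling U:

    import random

    # X_n = n * 1{U in (0, 1/n)} with U uniform on [0, 1]: X_n -> 0 pointwise,
    # yet E[X_n] = n * (1/n) = 1 for every n.
    random.seed(2)
    us = [random.random() for _ in range(100_000)]

    for n in (1, 10, 100, 1000):
        estimate = sum(n for u in us if 0 < u < 1 / n) / len(us)
        print(n, estimate)   # each estimate stays near 1, while E[lim X_n] = 0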

A number of convergence results specify exact conditions which allow one to interchange limits and expectations, as specified below.

  • Monotone convergence theorem: Let {X_n : n ≥ 0} be a sequence of random variables, with 0 ≤ X_n ≤ X_{n+1} (a.s.) for each n ≥ 0. Furthermore, let X_n → X pointwise. Then, the monotone convergence theorem states that lim_{n→∞} E[X_n] = E[X]. (A numerical sketch appears after this list.)
    Using the monotone convergence theorem, one can show that expectation indeed satisfies countable additivity for non-negative random variables. In particular, let {X_i}_{i=0}^∞ be non-negative random variables. It follows from the monotone convergence theorem that E[∑_{i=0}^∞ X_i] = ∑_{i=0}^∞ E[X_i].
  • Fatou's lemma: Let {X_n ≥ 0 : n ≥ 0} be a sequence of non-negative random variables. Fatou's lemma states that E[lim inf_{n→∞} X_n] ≤ lim inf_{n→∞} E[X_n].
    Corollary. Let X_n ≥ 0 with E[X_n] ≤ C for all n ≥ 0. If X_n → X (a.s.), then E[X] ≤ C.
    The proof is by observing that X = lim inf_{n→∞} X_n (a.s.) and applying Fatou's lemma.
  • Dominated convergence theorem: Let {X_n : n ≥ 0} be a sequence of random variables. If X_n → X pointwise (a.s.), |X_n| ≤ Y (a.s.) for every n, and E[Y] < ∞, then, according to the dominated convergence theorem,
    • X is integrable, with E[|X|] ≤ E[Y] < ∞;
    • lim_{n→∞} E[X_n] = E[X];
    • lim_{n→∞} E[|X_n − X|] = 0.
  • Uniform integrability: In some cases, the equality lim_{n→∞} E[X_n] = E[lim_{n→∞} X_n] holds when the sequence {X_n} is uniformly integrable.
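
A small numerical sketch of the monotone convergence theorem, using the truncations min(X, n), which increase pointwise to X (the Exponential(1/2) distribution is an illustrative choice):

    import random

    random.seed(3)
    xs = [random.expovariate(0.5) for _ in range(100_000)]   # X ~ Exp(1/2), E[X] = 2

    # E[min(X, n)] increases to E[X] as n grows, by monotone convergence.
    for n in (1, 2, 5, 10, 50):
        print(n, sum(min(x, n) for x in xs) / len(xs))   # climbs toward ~2.0 from below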

Relationship with characteristic function

The probability density function f_X of a scalar random variable X is related to its characteristic function φ_X by the inversion formula: f_X(x) = (1/2π) ∫_ℝ e^{−itx} φ_X(t) dt.

For the expected value of g(X) (where g: ℝ → ℝ is a Borel function), we can use this inversion formula to obtain E[g(X)] = ∫_ℝ g(x)f_X(x) dx = (1/2π) ∫_ℝ g(x) [∫_ℝ e^{−itx} φ_X(t) dt] dx.

If E[g(X)] is finite, changing the order of integration, we get, in accordance with the Fubini–Tonelli theorem, E[g(X)] = (1/2π) ∫_ℝ G(t) φ_X(t) dt, where G(t) = ∫_ℝ g(x) e^{−itx} dx is the Fourier transform of g(x). The expression for E[g(X)] also follows directly from the Plancherel theorem.
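
A numerical sketch of the inversion formula for a standard normal random variable, whose characteristic function φ(t) = exp(−t²/2) is known in closed form; the quadrature parameters are illustrative choices:

    import cmath
    import math

    def phi(t):
        # Characteristic function of the standard normal distribution.
        return math.exp(-t ** 2 / 2)

    def density(x, T=10.0, steps=400):
        # f(x) = (1 / (2*pi)) * integral of exp(-i*t*x) * phi(t) dt,
        # approximated by a midpoint rule on [-T, T].
        dt = 2 * T / steps
        total = sum(cmath.exp(-1j * t * x) * phi(t)
                    for t in (-T + (k + 0.5) * dt for k in range(steps)))
        return (total * dt / (2 * math.pi)).real

    # Recover E[g(X)] for g(x) = x^2 by integrating g against the density.
    dx = 0.02
    second_moment = sum((k * dx) ** 2 * density(k * dx) * dx
                        for k in range(-500, 501))
    print(second_moment)   # ~1.0, the second moment E[X^2] of a standard normal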

Uses and applications

The expectation of a random variable plays an important role in a variety of contexts.

In statistics, where one seeks estimates for unknown parameters based on data obtained from samples, the sample mean serves as an estimate for the expectation, and is itself a random variable. In such settings, the sample mean is considered to meet the desirable criterion for a "good" estimator in being unbiased; that is, the expected value of the estimate is equal to the true value of the underlying parameter.

For a different example, in decision theory, an agent making an optimal choice in the context of incomplete information is often assumed to maximize the expected value of their utility function.

It is possible to construct an expected value equal to the probability of an event by taking the expectation of an indicator function that is one if the event has occurred and zero otherwise. This relationship can be used to translate properties of expected values into properties of probabilities, e.g. using the law of large numbers to justify estimating probabilities by frequencies.

The expected values of the powers of X are called the moments of X; the moments about the mean of X are expected values of powers of X − E[X]. The moments of some random variables can be used to specify their distributions, via their moment generating functions.

To empirically estimate the expected value of a random variable, one repeatedly measures observations of the variable and computes the arithmetic mean of the results. If the expected value exists, this procedure estimates the true expected value in an unbiased manner and has the property of minimizing the sum of the squares of the residuals (the sum of the squared differences between the observations and the estimate). The law of large numbers demonstrates (under fairly mild conditions) that, as the size of the sample gets larger, the variance of this estimate gets smaller.

This property is often exploited in a wide variety of applications, including general problems of statistical estimation and machine learning, to estimate (probabilistic) quantities of interest via Monte Carlo methods, since most quantities of interest can be written in terms of expectation, e.g. P(X ∈ A) = E[1_A(X)], where 1_A(X) is the indicator function of the set A.
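
For instance, such a probability can be estimated by Monte Carlo sampling as in the following sketch; the standard normal distribution and the event A = [1, 2] are illustrative choices:

    import random

    # Monte Carlo estimate of P(X in A) = E[1_A(X)] for X ~ N(0, 1), A = [1, 2].
    random.seed(4)
    N = 1_000_000
    hits = sum(1 for _ in range(N) if 1.0 <= random.gauss(0.0, 1.0) <= 2.0)
    print(hits / N)   # ~0.1359 = Phi(2) - Phi(1), with Phi the standard normal CDF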

The mass of a probability distribution is balanced at the expected value; here, a Beta(α, β) distribution with expected value α/(α + β).

In classical mechanics, the center of mass is an analogous concept to expectation. For example, suppose X is a discrete random variable with values xi and corresponding probabilities pi. Now consider a weightless rod on which are placed weights, at locations xi along the rod and having masses pi (whose sum is one). The point at which the rod balances is E[X].

Expected values can also be used to compute the variance, by means of the computational formula for the variance: Var(X) = E[X²] − (E[X])².
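
A quick numerical sketch of this computational formula, with X uniform on [0, 1] (an illustrative choice, for which Var(X) = 1/12):

    import random

    random.seed(5)
    xs = [random.random() for _ in range(1_000_000)]   # X ~ Uniform[0, 1]

    mean = sum(xs) / len(xs)                      # E[X] ~ 1/2
    mean_sq = sum(x * x for x in xs) / len(xs)    # E[X^2] ~ 1/3
    print(mean_sq - mean ** 2)                    # ~0.0833 = 1/12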

A very important application of the expectation value is in the field of quantum mechanics. The expectation value of a quantum mechanical operator Â operating on a quantum state vector |ψ⟩ is written as ⟨Â⟩ = ⟨ψ|Â|ψ⟩. The uncertainty in Â can be calculated by the formula (ΔA)² = ⟨Â²⟩ − ⟨Â⟩².

See also

References

  1. ^ "Expectation | Mean | Average". www.probabilitycourse.com. Retrieved 2020-09-11.
  2. ^ Hansen, Bruce. "PROBABILITY AND STATISTICS FOR ECONOMISTS" (PDF). Archived from the original (PDF) on 2022-01-19. Retrieved 2021-07-20.
  3. ^ Wasserman, Larry (December 2010). All of Statistics: a concise course in statistical inference. Springer texts in statistics. p. 47. ISBN 9781441923226.
  4. ^ History of Probability and Statistics and Their Applications before 1750. Wiley Series in Probability and Statistics. 1990. doi:10.1002/0471725161. ISBN 9780471725169.
  5. ^ Ore, Oystein (1960). "Ore, Pascal and the Invention of Probability Theory". The American Mathematical Monthly. 67 (5): 409–419. doi:10.2307/2309286. JSTOR 2309286.
  6. ^ George Mackey (July 1980). "Harmonic analysis as the exploitation of symmetry – a historical survey". Bulletin of the American Mathematical Society. New Series. 3 (1): 549.
  7. ^ Huygens, Christian. "The Value of Chances in Games of Fortune. English Translation" (PDF).
  8. ^ Laplace, Pierre Simon, marquis de (1952) [1951]. A Philosophical Essay on Probabilities. Dover Publications. OCLC 475539.
  9. ^ Whitworth, W.A. (1901) Choice and Chance with One Thousand Exercises. Fifth edition. Deighton Bell, Cambridge. [Reprinted by Hafner Publishing Co., New York, 1959.]
  10. ^ "Earliest uses of symbols in probability and statistics".
  11. ^ Feller 1968, p. 221.
  12. ^ Billingsley 1995, p. 76.
  13. ^ Ross 2019, Section 2.4.1.
  14. ^ a b Feller 1968, Section IX.2.
  15. ^ Papoulis & Pillai 2002, Section 5-3; Ross 2019, Section 2.4.2.
  16. ^ Feller 1971, Section I.2.
  17. ^ Feller 1971, p. 5.
  18. ^ Billingsley 1995, p. 273.
  19. ^ a b Billingsley 1995, Section 15.
  20. ^ Billingsley 1995, Theorems 31.7 and 31.8 and p. 422.
  21. ^ Billingsley 1995, Theorem 16.13.
  22. ^ Billingsley 1995, Theorem 16.11.
  23. ^ Uhl, Roland (2023). Charakterisierung des Erwartungswertes am Graphen der Verteilungsfunktion (PDF). Technische Hochschule Brandenburg. doi:10.25933/opus4-2986. pp. 2–4.
  24. ^ Casella & Berger 2001, p. 89; Ross 2019, Example 2.16.
  25. ^ Casella & Berger 2001, Example 2.2.3; Ross 2019, Example 2.17.
  26. ^ Billingsley 1995, Example 21.4; Casella & Berger 2001, p. 92; Ross 2019, Example 2.19.
  27. ^ Casella & Berger 2001, p. 97; Ross 2019, Example 2.18.
  28. ^ Casella & Berger 2001, p. 99; Ross 2019, Example 2.20.
  29. ^ Billingsley 1995, Example 21.3; Casella & Berger 2001, Example 2.2.2; Ross 2019, Example 2.21.
  30. ^ Casella & Berger 2001, p. 103; Ross 2019, Example 2.22.
  31. ^ Billingsley 1995, Example 21.1; Casella & Berger 2001, p. 103.
  32. ^ Johnson, Kotz & Balakrishnan 1994, Chapter 20.
  33. ^ Feller 1971, Section II.4.
  34. ^ a b c Weisstein, Eric W. "Expectation Value". mathworld.wolfram.com. Retrieved 2020-09-11.
  35. ^ Feller 1971, Section V.6.
  36. ^ Papoulis & Pillai 2002, Section 6-4.
  37. ^ a b Feller 1968, Section IX.6; Feller 1971, Section V.7; Papoulis & Pillai 2002, Section 5-4; Ross 2019, Section 2.8.
  38. ^ Feller 1968, Section IX.6.
  39. ^ Feller 1968, Section IX.7.
  40. ^ a b c d Feller 1971, Section V.8.
  41. ^ Billingsley 1995, pp. 81, 277.
  42. ^ Billingsley 1995, Section 19.

Bibliography