EXPECTED VALUE of a RV


EXPECTED VALUE of a RV corresponds to the average value one would get for the RV when repeating the experiment, independently, infinitely many times. More accurately, consider a Random Independent Sample (RIS) of $n$ values of $X$ (e.g. $n = 10$ observations consisting of three 0's, five 1's and two 2's) and the corresponding Sample Mean

$\bar X = \frac{\sum_{j=1}^{n} X_j}{n} = \sum_{\text{All } i} i \cdot f_i = 0\cdot\tfrac{3}{10} + 1\cdot\tfrac{5}{10} + 2\cdot\tfrac{2}{10} = 0.9$

where the $f_i$ are the observed relative frequencies (of each possible value). We know that, as $n$ (the sample size) increases, each $f_i$ tends to the corresponding $\Pr(X=i)$. This implies that, in the same limit, $\bar X$ tends to $\sum_{\text{All } i} i\cdot\Pr(X=i)$, which we denote $E(X)$ and call the expected (mean) value of $X$ (so, in the end, the expected value is computed from the theoretical probabilities, not by doing the experiment). Note that $\sum_{\text{All } i} i\cdot\Pr(X=i)$ is a weighted average of all possible values of $X$, weighted by their probabilities. And, as we have two names for it, we also have two alternate notations: $E(X)$ is sometimes denoted $\mu_x$ (the Greek "mu").

EXAMPLES:

When $X$ is the number of dots in a single roll of a die, we get $E(X) = \frac{1+2+3+4+5+6}{6} = 3.5$ (for a symmetric distribution, the mean is at the centre of symmetry; for any other distribution, it is at the "centre of mass"). Note that the name "expected value" is somewhat misleading (we never expect to actually roll 3.5 dots).

Let $Y$ be the larger of the two numbers when rolling two dice. Then $E(Y) = 1\cdot\frac{1}{36} + 2\cdot\frac{3}{36} + 3\cdot\frac{5}{36} + 4\cdot\frac{7}{36} + 5\cdot\frac{9}{36} + 6\cdot\frac{11}{36} = \frac{1+6+15+28+45+66}{36} = \frac{161}{36} = 4.472$.

Let $U$ have the following (arbitrarily chosen) distribution:

    U:     0     1     2     3     4
    Prob:  0.3   0.2   0.4   0     0.1

Then $E(U) = 1\cdot 0.2 + 2\cdot 0.4 + 4\cdot 0.1 = 0.2 + 0.8 + 0.4 = 1.4$.
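These hand calculations (and the frequency interpretation of $E$) are easy to check numerically. The following Python sketch is mine, not part of the original notes; it computes $E(U)$ from the theoretical probabilities and compares it with the mean of a large simulated sample.

```python
import numpy as np

# distribution of U from the example above
values = np.array([0, 1, 2, 3, 4])
probs  = np.array([0.3, 0.2, 0.4, 0.0, 0.1])

# expected value = probability-weighted average of the possible values
print("E(U) =", np.sum(values * probs))                # 1.4

# frequency interpretation: the mean of a large random independent sample
rng = np.random.default_rng(1)
sample = rng.choice(values, size=1_000_000, p=probs)
print("sample mean =", sample.mean())                  # close to 1.4
```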

What happens to the expected value of $X$ when we transform it, i.e. define a new RV by $U \equiv \frac{X}{1+X}$, or $g(X)$ in general? The main thing to remember is that, in general, the mean does not transform accordingly, i.e. $\mu_u \ne \frac{\mu_x}{1+\mu_x}$, etc. This is also true for a transformation of $X$ and $Y$, i.e. $E[h(X,Y)] \ne h(\mu_x, \mu_y)$.

But (at least one piece of good news): to find the expected value of $g(X)$, $h(X,Y)$, ..., we can bypass constructing the new distribution (which was a tedious process) and use

$E[g(X)] = \sum_{\text{All } i} g(i)\,\Pr(X=i)$

$E[h(X,Y)] = \sum_{\text{All } i,j} h(i,j)\,\Pr(X=i \cap Y=j)$

EXAMPLE: Based on the $U$ of the previous example, we define $W = |U-2|^3$ and compute $E(W) = 8\cdot 0.3 + 1\cdot 0.2 + 0\cdot 0.4 + 8\cdot 0.1 = 3.4$. Clearly, it would have been a mistake to use $|1.4-2|^3 = 0.216$ (totally wrong). We can also get the correct answer the long way (just to verify the short answer), by first finding the distribution of $W$:

    W:     8     1     0     8
    U:     0     1     2     4
    Prob:  0.3   0.2   0.4   0.1

implying

    W:     0     1     8
    Prob:  0.4   0.2   0.4

Based on this, $E(W) = 0.2 + 3.2 = 3.4$ (check).
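Both routes to $E(W)$, the shortcut $\sum_i g(i)\Pr(U=i)$ and the long way through the distribution of $W$, can be compared in a few lines of Python. This is my own illustrative sketch, assuming the $U$ and $W = |U-2|^3$ of the example above.

```python
import numpy as np

u_values = np.array([0, 1, 2, 3, 4])
u_probs  = np.array([0.3, 0.2, 0.4, 0.0, 0.1])

g = lambda u: abs(u - 2) ** 3            # the transformation W = |U - 2|^3

# shortcut: weight g(i) by Pr(U = i); no need for the distribution of W
e_w_short = np.sum(g(u_values) * u_probs)

# long way: build the distribution of W explicitly, then average
w_dist = {}
for u, p in zip(u_values, u_probs):
    w_dist[g(u)] = w_dist.get(g(u), 0.0) + p
e_w_long = sum(w * p for w, p in w_dist.items())

print(e_w_short, e_w_long)               # both 3.4
print(g(np.sum(u_values * u_probs)))     # 0.216 -- the "totally wrong" answer g(E(U))
```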

Exception: Linear Transformations

Only for these can we find the mean of the new RV by simply replacing $X$ with $\mu_x$, thus:

$E(aX + c) = a\,\mu_x + c$

Proof: $E(aX + c) = \sum_{\text{All } i} (a\,i + c)\,f_x(i) = a\sum_{\text{All } i} i\,f_x(i) + c\sum_{\text{All } i} f_x(i) = a\,\mu_x + c$

EXAMPLE: $E(2U - 3) = 2\times 1.4 - 3 = -0.2$.

Expected values related to a bivariate distribution

When a bivariate distribution is given, the easiest way to compute the individual expected values (of $X$ and $Y$) is through the marginals.

EXAMPLE: For

              X = 1    2     3   | f_y
    Y = 0      0.1    0     0.3  | 0.4
        1      0.3    0.1   0.2  | 0.6
    ----------------------------------
    f_x        0.4    0.1   0.5

we compute $E(X) = 1\cdot 0.4 + 2\cdot 0.1 + 3\cdot 0.5 = 2.1$ and $E(Y) = 0\cdot 0.4 + 1\cdot 0.6 = 0.6$. This is also how we deal with an expected value of $g_1(X)$ and/or $g_2(Y)$. Only when the new variable is defined by $h(X,Y)$ do we have to weight-average over the whole table. For example:

$E(X\cdot Y) = 1\cdot 0\cdot 0.1 + 2\cdot 0\cdot 0 + 3\cdot 0\cdot 0.3 + 1\cdot 1\cdot 0.3 + 2\cdot 1\cdot 0.1 + 3\cdot 1\cdot 0.2 = 1.1$

More EXAMPLES:

$E[(X-1)^2] = 0\cdot 0.4 + 1\cdot 0.1 + 4\cdot 0.5 = 2.1$

$E\left[\frac{1}{1+Y}\right] = \frac{1}{1+0}\cdot 0.4 + \frac{1}{1+1}\cdot 0.6 = 0.7$

$E\left[\frac{(X-1)^2}{1+Y}\right]$: multiplying the last two results would be wrong. Here it may help to first build the corresponding table of the $\frac{(X-1)^2}{1+Y}$ values:

              X = 1    2     3
    Y = 0       0     1     4
        1       0     0.5   2

Answer: $4\cdot 0.3 + 0.5\cdot 0.1 + 2\cdot 0.2 = 1.2 + 0.05 + 0.4 = 1.65$.
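The same table manipulations look like this in code; a small sketch of my own, with the joint table stored as a matrix, recovering the marginals and the expected values just computed.

```python
import numpy as np

# joint probability table: rows are Y = 0, 1; columns are X = 1, 2, 3
joint = np.array([[0.1, 0.0, 0.3],
                  [0.3, 0.1, 0.2]])
x_vals = np.array([1, 2, 3])
y_vals = np.array([0, 1])

px = joint.sum(axis=0)            # marginal of X: [0.4, 0.1, 0.5]
py = joint.sum(axis=1)            # marginal of Y: [0.4, 0.6]

print("E(X) =", np.sum(x_vals * px))      # 2.1
print("E(Y) =", np.sum(y_vals * py))      # 0.6

# a function of both variables must be weight-averaged over the whole table
X, Y = np.meshgrid(x_vals, y_vals)        # value grids with the same shape as `joint`
print("E(XY) =", np.sum(X * Y * joint))                               # 1.1
print("E[(X-1)^2/(1+Y)] =", np.sum((X - 1) ** 2 / (1 + Y) * joint))   # 1.65
```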

For a Linear Transformation of two RVs we get:

$E(aX + bY + c) = a\,\mu_x + b\,\mu_y + c$

Proof: $E(aX + bY + c) = \sum_i \sum_j (a\,i + b\,j + c)\,f_{xy}(i,j) = a\sum_i i\,f_x(i) + b\sum_j j\,f_y(j) + c = a\,E(X) + b\,E(Y) + c$

Note that $X$ and $Y$ need not be independent!

EXAMPLE: Using the previous bivariate distribution, $E(2X - 3Y + 4)$ is simply $2\times 2.1 - 3\times 0.6 + 4 = 6.4$.

The previous formula easily extends to any number of RVs (again, not necessarily independent!):

$E(a_1X_1 + a_2X_2 + \dots + a_kX_k + c) = a_1E(X_1) + a_2E(X_2) + \dots + a_kE(X_k) + c$

Can Independence help (in other cases of expected value)? Yes: the expected value of a product of RVs equals the product of the individual expected values, but ONLY when these RVs are independent:

$E(X\cdot Y) = E(X)\cdot E(Y)$

Proof: $E(X\cdot Y) = \sum_i \sum_j i\,j\,f_x(i)\,f_y(j) = \left(\sum_i i\,f_x(i)\right)\left(\sum_j j\,f_y(j)\right) = E(X)\cdot E(Y)$

The statement can actually be made more general: when $X$ and $Y$ are independent,

$E[g_1(X)\cdot g_2(Y)] = E[g_1(X)]\cdot E[g_2(Y)]$
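The "only when independent" caveat is easy to demonstrate numerically. In the sketch below (mine), $E(XY)$ is compared with $E(X)E(Y)$ for the dependent joint table used earlier and for an independent table built from the same marginals.

```python
import numpy as np

joint = np.array([[0.1, 0.0, 0.3],    # the dependent joint table (rows: Y = 0, 1)
                  [0.3, 0.1, 0.2]])   # columns: X = 1, 2, 3
x_vals, y_vals = np.array([1, 2, 3]), np.array([0, 1])
X, Y = np.meshgrid(x_vals, y_vals)

px, py = joint.sum(axis=0), joint.sum(axis=1)
ex, ey = np.sum(x_vals * px), np.sum(y_vals * py)

print(np.sum(X * Y * joint), ex * ey)         # 1.1 vs 1.26: E(XY) != E(X)E(Y)

independent = np.outer(py, px)                # product of marginals => independent X, Y
print(np.sum(X * Y * independent), ex * ey)   # both 1.26: now E(XY) = E(X)E(Y)
```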

Moments of a RV

There are two types of moments: simple moments $E(X^k)$ and central moments $E\big[(X-\mu_x)^k\big]$, where $k$ is an integer.

Special cases: The $0^{\text{th}}$ moment is identically equal to 1. The first simple moment is $\mu_x$ (yet another name for it!); the second simple moment is $E(X^2)$, etc. The first central moment is identically equal to 0. The second central moment $E\big[(X-\mu_x)^2\big]$ must be $\ge 0$ (averaging non-negative quantities cannot result in a negative number). It is of such importance that it gets its own name: the variance of $X$, denoted $\mathrm{Var}(X)$. When doing the computation by hand, it helps to realize that $E\big[(X-\mu_x)^2\big] = E(X^2) - \mu_x^2$. As a measure of the spread of the distribution of $X$ values, we take $\sigma_x \equiv \sqrt{\mathrm{Var}(X)}$ and call it the standard deviation of $X$. For all distributions, the $(\mu - \sigma,\ \mu + \sigma)$ interval should contain the "bulk" of the distribution, i.e. anywhere from 50 to 90% of it (in terms of the corresponding histogram).

Finally, skewness is defined as $\frac{E[(X-\mu_x)^3]}{\sigma_x^3}$ (it measures to what extent the distribution is non-symmetric, or "skewed"), and kurtosis as $\frac{E[(X-\mu_x)^4]}{\sigma_x^4}$ (it measures the degree of "flatness", 3 being its typical value, higher for "peaked", smaller for "flat" distributions). These are a lot less important than the mean and variance (later, we will understand why).
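For the worked examples that follow, it may help to see these moment formulas spelled out programmatically; here is a small Python helper of my own (not part of the notes).

```python
import numpy as np

def moments(values, probs):
    """Mean, variance, skewness and kurtosis of a discrete distribution."""
    values, probs = np.asarray(values, float), np.asarray(probs, float)
    mu   = np.sum(values * probs)                     # first simple moment
    var  = np.sum((values - mu) ** 2 * probs)         # second central moment
    sd   = np.sqrt(var)
    skew = np.sum((values - mu) ** 3 * probs) / sd ** 3
    kurt = np.sum((values - mu) ** 4 * probs) / sd ** 4
    return mu, var, skew, kurt

# one fair die: mean 3.5, variance 35/12, skewness 0, kurtosis about 1.73
print(moments(range(1, 7), [1/6] * 6))
```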

EXAMPLES:

$X$ is the number of dots when rolling one die: $\mu_x = \frac{7}{2}$, $\mathrm{Var}(X) = \frac{1^2+2^2+3^2+4^2+5^2+6^2}{6} - \left(\frac{7}{2}\right)^2 = \frac{35}{12}$, implying $\sigma_x = \sqrt{\frac{35}{12}} = 1.7078$. Note that $3.5 \pm 1.708$ contains 66.7% of the distribution. Skewness, for a symmetric distribution, is always equal to 0; kurtosis can be computed from $E\big[(X-\mu)^4\big] = \frac{(-2.5)^4 + (-1.5)^4 + (-0.5)^4 + 0.5^4 + 1.5^4 + 2.5^4}{6} = 14.729$, so the kurtosis equals $\frac{14.729}{(35/12)^2} = 1.731$ ("flat").

For the $U$ of our earlier example (with $\mu_u = 1.4$):

    U:     0     1     2     4
    Prob:  0.3   0.2   0.4   0.1

$\mathrm{Var}(U) = 0^2\cdot 0.3 + 1^2\cdot 0.2 + 2^2\cdot 0.4 + 4^2\cdot 0.1 - 1.4^2 = 1.44$, so $\sigma_u = \sqrt{1.44} = 1.2$. From $E\big[(U-\mu_u)^3\big] = (-1.4)^3\cdot 0.3 + (-0.4)^3\cdot 0.2 + 0.6^3\cdot 0.4 + 2.6^3\cdot 0.1 = 1.008$, the skewness is $\frac{1.008}{1.2^3} = 0.583$ (long right tail), and from $E\big[(U-\mu_u)^4\big] = (-1.4)^4\cdot 0.3 + (-0.4)^4\cdot 0.2 + 0.6^4\cdot 0.4 + 2.6^4\cdot 0.1 = 5.779$, the kurtosis equals $\frac{5.779}{1.2^4} = 2.787$.

When $X$ is transformed to $Y = g(X)$, we already know that there is no general shortcut for computing $E(Y)$. This applies (even more so) to the variance of $Y$, which also needs to be computed from scratch. But we did manage to simplify the expected value of a linear transformation of $X$ (i.e., of $Y = aX + c$). Is there any direct conversion of $\mathrm{Var}(X)$ into $\mathrm{Var}(Y)$ in this (linear) case? The answer is yes, and we can easily derive the corresponding formula:

$\mathrm{Var}(aX+c) = E\big[(aX+c)^2\big] - (a\mu_x + c)^2 = E\big[a^2X^2 + 2acX + c^2\big] - (a\mu_x + c)^2 = a^2E(X^2) - a^2\mu_x^2 = a^2\,\mathrm{Var}(X)$

This implies that $\sigma_{aX+c} = |a|\,\sigma_X$.
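A quick numerical confirmation of $\mathrm{Var}(aX+c) = a^2\,\mathrm{Var}(X)$, using the distribution of $U$ above; the constants $a = -3$, $c = 5$ in this sketch of mine are arbitrary.

```python
import numpy as np

u     = np.array([0, 1, 2, 4], float)
probs = np.array([0.3, 0.2, 0.4, 0.1])

def var(values):
    mu = np.sum(values * probs)
    return np.sum((values - mu) ** 2 * probs)

a, c = -3, 5                     # an arbitrary linear transformation Y = aU + c
print(var(a * u + c))            # 12.96
print(a ** 2 * var(u))           # 12.96, i.e. Var(aU + c) = a^2 Var(U)
```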

Moments - the bivariate case

Firstly, there are the individual moments of $X$ (and, separately, of $Y$), which can be established from the corresponding marginal distribution. Are there any other (joint) moments? Yes: again we have the simple moments $E(X^kY^m)$ and the central moments $E\big[(X-\mu_x)^k(Y-\mu_y)^m\big]$. The most important of these is the first, first central moment (i.e. $k = m = 1$), called the covariance of $X$ and $Y$:

$\mathrm{Cov}(X,Y) = E\big[(X-\mu_x)(Y-\mu_y)\big] \equiv E(X\cdot Y) - \mu_x\,\mu_y$

It is obviously symmetric, i.e. $\mathrm{Cov}(X,Y) = \mathrm{Cov}(Y,X)$, and it becomes zero when $X$ and $Y$ are independent (but not necessarily the other way round). Based on $\mathrm{Cov}(X,Y)$, one can define the Correlation Coefficient between $X$ and $Y$ by

$\rho_{xy} = \frac{\mathrm{Cov}(X,Y)}{\sigma_x\,\sigma_y}$

(Greek letter "rho"). Its value must always be between $-1$ and $1$.

Proof: Obviously,

$\mathrm{Var}(X - bY) = \mathrm{Var}(X) - 2b\,\mathrm{Cov}(X,Y) + b^2\,\mathrm{Var}(Y) \ge 0$

for any value of $b$, including $b = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(Y)}$. This implies that

$\mathrm{Var}(X) - 2\,\frac{\mathrm{Cov}(X,Y)^2}{\mathrm{Var}(Y)} + \frac{\mathrm{Cov}(X,Y)^2}{\mathrm{Var}(Y)} = \mathrm{Var}(X) - \frac{\mathrm{Cov}(X,Y)^2}{\mathrm{Var}(Y)} \ge 0$

which, after dividing by $\mathrm{Var}(X)$, yields $1 - \frac{\mathrm{Cov}(X,Y)^2}{\mathrm{Var}(X)\,\mathrm{Var}(Y)} \ge 0$, i.e. $\rho_{xy}^2 \le 1$.
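The bound $|\rho_{xy}| \le 1$ can also be sanity-checked numerically; the sketch below (my own) computes $\rho$ for a few randomly generated joint tables on the same supports and confirms it always lands in $[-1, 1]$.

```python
import numpy as np

rng = np.random.default_rng(0)
x_vals, y_vals = np.arange(1, 4), np.arange(0, 2)

for _ in range(5):
    joint = rng.random((2, 3))
    joint /= joint.sum()                      # a random 2x3 joint distribution
    px, py = joint.sum(axis=0), joint.sum(axis=1)
    ex, ey = x_vals @ px, y_vals @ py
    X, Y = np.meshgrid(x_vals, y_vals)
    cov = np.sum(X * Y * joint) - ex * ey
    rho = cov / np.sqrt((x_vals**2 @ px - ex**2) * (y_vals**2 @ py - ey**2))
    print(round(rho, 4))                      # always lands between -1 and 1
```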

EXAMPLE: For one of our previous distributions,

              X = 1    2     3
    Y = 0      0.1    0     0.3
        1      0.3    0.1   0.2

we get $\mu_x = 2.1$, $\mu_y = 0.6$, $\mathrm{Var}(X) = 5.3 - 2.1^2 = 0.89$, $\mathrm{Var}(Y) = 0.6 - 0.6^2 = 0.24$, $\mathrm{Cov}(X,Y) = (0.3 + 0.2 + 0.6) - 2.1\times 0.6 = -0.16$ (it may be negative), and $\rho_{xy} = \frac{-0.16}{\sqrt{0.89}\,\sqrt{0.24}} = -0.346$.

One can also show that

$\rho_{aX+c,\,bY+d} = \frac{\mathrm{Cov}(aX+c,\,bY+d)}{\sigma_{aX+c}\,\sigma_{bY+d}} = \frac{a\,b\,\mathrm{Cov}(X,Y)}{|a|\,|b|\,\sigma_X\,\sigma_Y} = \pm\rho_{xy}$

(a linear transformation does not change the magnitude of $\rho$, but it may change its sign - can you tell when?).

Linear Combination of RVs

Starting with $X$ and $Y$, let's see whether we can simplify the expression for $\mathrm{Var}(aX + bY + c)$:

$\mathrm{Var}(aX+bY+c) = E\big[(aX+bY+c)^2\big] - (a\mu_x + b\mu_y + c)^2 = a^2E(X^2) + b^2E(Y^2) + 2ab\,E(X\cdot Y) - a^2\mu_x^2 - b^2\mu_y^2 - 2ab\,\mu_x\mu_y = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X,Y)$

(the terms involving $c$ cancel). Independence eliminates the last term. This result can be easily extended to a linear combination of any number of random variables:

$\mathrm{Var}(a_1X_1 + a_2X_2 + \dots + a_kX_k + c) = a_1^2\,\mathrm{Var}(X_1) + a_2^2\,\mathrm{Var}(X_2) + \dots + a_k^2\,\mathrm{Var}(X_k) + 2a_1a_2\,\mathrm{Cov}(X_1,X_2) + 2a_1a_3\,\mathrm{Cov}(X_1,X_3) + \dots + 2a_{k-1}a_k\,\mathrm{Cov}(X_{k-1},X_k)$

Mutual independence (if present) eliminates the last row of $\binom{k}{2}$ covariances.
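The numbers above, as well as the variance-of-a-linear-combination formula, can be verified with a few lines of Python; this sketch is mine, and the constants $a = 2$, $b = -1$, $c = 3$ are arbitrary test values.

```python
import numpy as np

joint = np.array([[0.1, 0.0, 0.3],    # rows: Y = 0, 1; columns: X = 1, 2, 3
                  [0.3, 0.1, 0.2]])
x_vals, y_vals = np.array([1, 2, 3]), np.array([0, 1])
X, Y = np.meshgrid(x_vals, y_vals)

def E(expr):                          # expectation of any function of (X, Y)
    return np.sum(expr * joint)

var_x, var_y = E(X**2) - E(X)**2, E(Y**2) - E(Y)**2
cov = E(X * Y) - E(X) * E(Y)
print(cov, cov / np.sqrt(var_x * var_y))                # -0.16 and about -0.346

a, b, c = 2, -1, 3                                      # arbitrary linear combination
direct  = E((a*X + b*Y + c)**2) - E(a*X + b*Y + c)**2   # Var(aX + bY + c) from scratch
formula = a**2 * var_x + b**2 * var_y + 2*a*b*cov       # via the formula
print(direct, formula)                                  # both 4.44
```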

And finally, a formula for the covariance of one linear combination of RVs against another:

$\mathrm{Cov}(a_1X_1 + a_2X_2 + \dots,\ b_1Y_1 + b_2Y_2 + \dots) = a_1b_1\,\mathrm{Cov}(X_1,Y_1) + a_1b_2\,\mathrm{Cov}(X_1,Y_2) + a_2b_1\,\mathrm{Cov}(X_2,Y_1) + a_2b_2\,\mathrm{Cov}(X_2,Y_2) + \dots$

(I will call this the "distributive law" of covariance).

Sample Mean and its distribution

Consider a random independent sample of size $n$ from an arbitrary distribution. We know that the sample mean $\bar X = \frac{\sum_{i=1}^{n} X_i}{n}$ is itself a RV with its own distribution. Regardless of how that distribution looks, its expected value must be the same as the expected value of the distribution from which we sample.

Proof: $E(\bar X) = \frac{1}{n}E(X_1) + \frac{1}{n}E(X_2) + \dots + \frac{1}{n}E(X_n) = \frac{\mu}{n} + \frac{\mu}{n} + \dots + \frac{\mu}{n} = \mu$

That's why $\bar X$ is often used as an estimator of $\mu$, when its value is unknown. Similarly,

$\mathrm{Var}(\bar X) = \left(\tfrac{1}{n}\right)^2\mathrm{Var}(X_1) + \left(\tfrac{1}{n}\right)^2\mathrm{Var}(X_2) + \dots + \left(\tfrac{1}{n}\right)^2\mathrm{Var}(X_n) = \left(\tfrac{1}{n}\right)^2\sigma^2 + \left(\tfrac{1}{n}\right)^2\sigma^2 + \dots + \left(\tfrac{1}{n}\right)^2\sigma^2 = \frac{\sigma^2}{n}$

where $\sigma$ is the standard deviation of the original distribution. This implies that

$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}$

The standard deviation of $\bar X$ (sometimes called the standard error of $\bar X$) is $\sqrt{n}$ times smaller than that of the original distribution. Note that the standard error tends to zero as the sample size increases.
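Both facts show up clearly in a simulation. The sketch below is my own; it samples from the $U$ distribution used earlier (so $\mu = 1.4$, $\sigma = 1.2$) and estimates the mean and standard deviation of $\bar X$ for $n = 25$.

```python
import numpy as np

values, probs = np.array([0, 1, 2, 4]), np.array([0.3, 0.2, 0.4, 0.1])
mu    = np.sum(values * probs)                        # 1.4
sigma = np.sqrt(np.sum((values - mu) ** 2 * probs))   # 1.2

n, repeats = 25, 200_000
rng = np.random.default_rng(2)
samples = rng.choice(values, size=(repeats, n), p=probs)
xbar = samples.mean(axis=1)               # 200,000 realizations of the sample mean

print(xbar.mean(), mu)                    # both about 1.4
print(xbar.std(), sigma / np.sqrt(n))     # both about 0.24
```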

Now, how about the shape of the $\bar X$-distribution - how does it relate to the shape of the sampled distribution? The surprising answer is: it doesn't. For $n$ bigger than, say, 25, the distribution of $\bar X$ quickly approaches the same "regular" shape, regardless of how the original distribution looked.

Now, consider a random independent sample of size $n$ from a bivariate distribution, where $(X_1,Y_1), (X_2,Y_2), \dots, (X_n,Y_n)$ are the individual pairs of observations. Then

$\mathrm{Var}(X_1 + X_2 + \dots + X_n) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n) \equiv n\,\mathrm{Var}(X)$

and, similarly, $\mathrm{Var}\!\left(\sum_{i=1}^{n} Y_i\right) = n\,\mathrm{Var}(Y)$. Similarly,

$\mathrm{Cov}\!\left(\sum_{i=1}^{n} X_i,\ \sum_{i=1}^{n} Y_i\right) = \mathrm{Cov}(X_1,Y_1) + \mathrm{Cov}(X_2,Y_2) + \dots + \mathrm{Cov}(X_n,Y_n) = n\,\mathrm{Cov}(X,Y)$

(all the "cross" terms, such as $\mathrm{Cov}(X_1,Y_2)$, are zero, since different observations are independent of each other). All this implies that the correlation coefficient between $\sum_{i=1}^{n} X_i$ and $\sum_{i=1}^{n} Y_i$ equals

$\frac{n\,\mathrm{Cov}(X,Y)}{\sqrt{n\,\mathrm{Var}(X)}\,\sqrt{n\,\mathrm{Var}(Y)}} = \rho_{xy}$

(the correlation between a single $X$ and $Y$ pair). The same is true for the corresponding sample means $\bar X = \frac{\sum_{i=1}^{n}X_i}{n}$ and $\bar Y = \frac{\sum_{i=1}^{n}Y_i}{n}$ - why?
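A simulation makes the last claim concrete; in this sketch of mine, pairs are repeatedly drawn from the dependent joint table used earlier, and the correlation between the two sums is compared with $\rho_{xy} \approx -0.346$.

```python
import numpy as np

joint = np.array([[0.1, 0.0, 0.3],        # rows: Y = 0, 1; columns: X = 1, 2, 3
                  [0.3, 0.1, 0.2]])
pairs = [(x, y) for y in (0, 1) for x in (1, 2, 3)]   # same (row-major) order as joint
probs = joint.flatten()

rng = np.random.default_rng(3)
n, repeats = 20, 100_000
idx = rng.choice(len(pairs), size=(repeats, n), p=probs)
xy = np.array(pairs)[idx]                 # shape (repeats, n, 2): n pairs per experiment
sum_x = xy[:, :, 0].sum(axis=1)
sum_y = xy[:, :, 1].sum(axis=1)

print(np.corrcoef(sum_x, sum_y)[0, 1])    # about -0.346, same as for a single pair
```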

Conditional expected value is, simply put, an expected value computed using the corresponding conditional distribution, e.g.

$E(X \mid Y=1) = \sum_i i\,f_x(i \mid Y=1)$

etc.

EXAMPLE: Using our old bivariate distribution

              X = 1    2     3
    Y = 0      0.1    0     0.3
        1      0.3    0.1   0.2

$E(X \mid Y=1)$ is constructed based on the corresponding conditional distribution

    X (given Y = 1):  1     2     3
    Prob:             3/6   1/6   2/6

by the usual process: $1\cdot\frac{3}{6} + 2\cdot\frac{1}{6} + 3\cdot\frac{2}{6} = 1.8\overline{3}$ (note that this is different from $E(X) = 2.1$ calculated previously). Similarly, $E(X^2 \mid Y=1) = 1\cdot\frac{3}{6} + 4\cdot\frac{1}{6} + 9\cdot\frac{2}{6} = 4.1\overline{6}$. These imply that $\mathrm{Var}(X \mid Y=1) = 4.1\overline{6} - 1.8\overline{3}^2 = 0.8056$. Also, $E\!\left(\frac{1}{X} \,\middle|\, Y=1\right) = 1\cdot\frac{3}{6} + \frac{1}{2}\cdot\frac{1}{6} + \frac{1}{3}\cdot\frac{2}{6} = 0.69\overline{4}$.

When the values of a RV are only integers (the case of most of our examples so far), we define its probability generating function (PGF) by

$P(z) = \sum_{\text{All } i} z^i\,\Pr(X=i)$

where $z$ is a parameter. Differentiating $k$ times and substituting $z=1$ yields

$\left.\frac{d^kP(z)}{dz^k}\right|_{z=1} = E\big[X(X-1)\cdots(X-k+1)\big]$

(the so-called $k^{\text{th}}$ factorial moment). For mutually independent RVs, we also get

$P_{X+Y+Z}(z) = P_X(z)\cdot P_Y(z)\cdot P_Z(z)$

Furthermore, given $P(z)$, we can recover the distribution by Taylor-expanding $P(z)$: the coefficient of $z^i$ is $\Pr(X=i)$.

EXAMPLES:

For the distribution of our last example, where $\Pr(X=i) = \left(\frac{1}{2}\right)^i$ for $i = 1, 2, 3, \dots$, we get

$P(z) = \sum_{i=1}^{\infty} z^i\left(\tfrac{1}{2}\right)^i = \frac{\frac{z}{2}}{1-\frac{z}{2}} = \frac{z}{2-z}$

(note that this can also be obtained from the MGF by substituting $e^t \to z$). Then

$P'(z) = \frac{2}{(2-z)^2} \;\Rightarrow\; P'(1) = 2 \qquad\text{and}\qquad P''(z) = \frac{4}{(2-z)^3} \;\Rightarrow\; P''(1) = 4$

giving us the mean of 2 (check) and the variance of $4 + 2 - 2^2 = 2$ (check).
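The factorial-moment property of the PGF can be verified symbolically. The notes use Maple; the following is my own short alternative using Python's sympy.

```python
import sympy as sp

z = sp.symbols('z')
P = z / (2 - z)                  # PGF of Pr(X = i) = (1/2)**i, i = 1, 2, 3, ...

mean = sp.diff(P, z).subs(z, 1)                   # P'(1) = E(X)
fact2 = sp.diff(P, z, 2).subs(z, 1)               # P''(1) = E[X(X - 1)]
variance = fact2 + mean - mean**2

print(mean, variance)                             # 2 2

# recovering the probabilities: the coefficient of z**i is Pr(X = i)
print(sp.series(P, z, 0, 6))     # z/2 + z**2/4 + z**3/8 + z**4/16 + z**5/32 + O(z**6)
```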

What is the distribution of the total number of dots when rolling 3 dice? Well, the PGF of $X_1$, $X_2$ and $X_3$ (each) is

$\frac{z + z^2 + z^3 + z^4 + z^5 + z^6}{6}$

Since they are independent RVs, the PGF of their sum is simply

$\left(\frac{z + z^2 + z^3 + z^4 + z^5 + z^6}{6}\right)^3$

Expanding (easy in Maple), we get

$\tfrac{1}{216}z^3 + \tfrac{1}{72}z^4 + \tfrac{1}{36}z^5 + \tfrac{5}{108}z^6 + \tfrac{5}{72}z^7 + \tfrac{7}{72}z^8 + \tfrac{25}{216}z^9 + \tfrac{1}{8}z^{10} + \tfrac{1}{8}z^{11} + \tfrac{25}{216}z^{12} + \tfrac{7}{72}z^{13} + \tfrac{5}{72}z^{14} + \tfrac{5}{108}z^{15} + \tfrac{1}{36}z^{16} + \tfrac{1}{72}z^{17} + \tfrac{1}{216}z^{18}$

Note that a RV can have an infinite expected value (not too common, but it can happen).

EXAMPLE: Suppose you flip a coin till the first head appears, and you win \$2 if this takes only one flip, \$4 if it takes two flips, \$8 if it takes 3 flips, etc. (the amount doubles with each additional flip required). What is the expected value of your win? Clearly, you win $Y = 2^X$ dollars, so

$E(Y) = 2\cdot\tfrac{1}{2} + 4\cdot\tfrac{1}{2^2} + 8\cdot\tfrac{1}{2^3} + \dots = 1 + 1 + 1 + \dots = \infty$
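The same expansion is just as easy with sympy; this sketch of mine is offered as an alternative to the Maple computation mentioned above.

```python
import sympy as sp

z = sp.symbols('z')
die_pgf = sum(z**k for k in range(1, 7)) / 6    # PGF of a single fair die
total_pgf = sp.expand(die_pgf ** 3)             # PGF of the total of 3 independent dice

# Pr(total = s) is the coefficient of z**s in the expansion
for s in range(3, 19):
    print(s, total_pgf.coeff(z, s))
# prints 1/216, 1/72, 1/36, 5/108, ... matching the expansion above
```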