CS70: Jean Walrand: Lecture 19.


CS70: Jean Walrand: Lecture 19. Random Variables: Expectation

1. Random Variables: Brief Review
2. Expectation
3. Important Distributions

Random Variables: Definitions

Definition. A random variable X, for a random experiment with sample space Ω, is a function X : Ω → R. Thus, X(·) assigns a real number X(ω) to each ω ∈ Ω.

Definitions.
(a) For a ∈ R, one defines X^{-1}(a) := {ω ∈ Ω : X(ω) = a}.
(b) For A ⊆ R, one defines X^{-1}(A) := {ω ∈ Ω : X(ω) ∈ A}.
(c) The probability that X = a is defined as Pr[X = a] = Pr[X^{-1}(a)].
(d) The probability that X ∈ A is defined as Pr[X ∈ A] = Pr[X^{-1}(A)].
(e) The distribution of a random variable X is {(a, Pr[X = a]) : a ∈ A}, where A is the range of X. That is, A = {X(ω) : ω ∈ Ω}.

Random Variables: Definitions

Definition. Let X, Y, Z be random variables on Ω and g : R^3 → R a function. Then g(X, Y, Z) is the random variable that assigns the value g(X(ω), Y(ω), Z(ω)) to ω. Thus, if V = g(X, Y, Z), then V(ω) := g(X(ω), Y(ω), Z(ω)).

Examples: X^k; (X - a)^2; a + bX + cX^2 + (Y - Z)^2; (X - Y)^2; X cos(2πY + Z).

Expectation - Definition

Definition: The expected value (or mean, or expectation) of a random variable X is
E[X] = Σ_a a Pr[X = a].
Theorem:
E[X] = Σ_ω X(ω) Pr[ω].

An Example

Flip a fair coin three times. Ω = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}. X = number of H's: {3, 2, 2, 2, 1, 1, 1, 0}. Thus,
Σ_ω X(ω) Pr[ω] = (3 + 2 + 2 + 2 + 1 + 1 + 1 + 0) × (1/8) = 3/2.
Also,
Σ_a a Pr[X = a] = 3 × (1/8) + 2 × (3/8) + 1 × (3/8) + 0 × (1/8) = 3/2.
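A quick numeric sketch (not from the lecture) that checks the two formulas for E[X] on this three-coin example:

```python
from collections import Counter
from itertools import product

# X = number of heads among three fair coin flips.
outcomes = ["".join(t) for t in product("HT", repeat=3)]  # the 8 equally likely outcomes
pr = 1 / len(outcomes)                                    # Pr[ω] = 1/8 for each ω

# E[X] = Σ_ω X(ω) Pr[ω]: sum over outcomes.
e_by_outcomes = sum(w.count("H") * pr for w in outcomes)

# E[X] = Σ_a a Pr[X = a]: sum over the distribution of X.
dist = Counter(w.count("H") for w in outcomes)            # a -> number of ω with X(ω) = a
e_by_distribution = sum(a * cnt * pr for a, cnt in dist.items())

print(e_by_outcomes, e_by_distribution)  # both 1.5
```

Both sums regroup the same terms, which is exactly why the theorem holds.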

Win or Lose.

Expected winnings for heads/tails games, with 3 flips? Recall the definition of the random variable X:
{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} → {3, 1, 1, -1, 1, -1, -1, -3}.
E[X] = 3 × (1/8) + 1 × (3/8) - 1 × (3/8) - 3 × (1/8) = 0.
Can you ever win 0? Apparently not: the expected value is not a common value, by any means. The expected value of X is not the value that you expect! It is the average value per experiment, if you perform the experiment many times:
(X_1 + ··· + X_n)/n, when n ≫ 1.
The fact that this average converges to E[X] is a theorem: the Law of Large Numbers. (See later.)

Law of Large Numbers An Illustration: Rolling Dice
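A minimal simulation sketch (not part of the lecture) of the same phenomenon: the running average of fair die rolls drifts toward E[X] = 3.5 as the number of rolls grows.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def running_average(n):
    """Average of n independent fair six-sided die rolls."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

for n in (10, 1000, 100000):
    print(n, running_average(n))  # drifts toward 3.5 as n grows
```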

Indicators

Definition. Let A be an event. The random variable X defined by
X(ω) = 1 if ω ∈ A, and X(ω) = 0 if ω ∉ A,
is called the indicator of the event A.
Note that Pr[X = 1] = Pr[A] and Pr[X = 0] = 1 - Pr[A]. Hence,
E[X] = 1 × Pr[X = 1] + 0 × Pr[X = 0] = Pr[A].
This random variable X(ω) is sometimes written as 1{ω ∈ A} or 1_A(ω). Thus, we will write X = 1_A.

Linearity of Expectation

Theorem: Expectation is linear:
E[a_1 X_1 + ··· + a_n X_n] = a_1 E[X_1] + ··· + a_n E[X_n].
Proof:
E[a_1 X_1 + ··· + a_n X_n]
= Σ_ω (a_1 X_1 + ··· + a_n X_n)(ω) Pr[ω]
= Σ_ω (a_1 X_1(ω) + ··· + a_n X_n(ω)) Pr[ω]
= a_1 Σ_ω X_1(ω) Pr[ω] + ··· + a_n Σ_ω X_n(ω) Pr[ω]
= a_1 E[X_1] + ··· + a_n E[X_n].
Note: If we had defined Y = a_1 X_1 + ··· + a_n X_n and had tried to compute E[Y] = Σ_y y Pr[Y = y], we would have been in trouble!

Using Linearity - 1: Pips (dots) on dice

Roll a die n times. X_m = number of pips on roll m. X = X_1 + ··· + X_n = total number of pips in n rolls.
E[X] = E[X_1 + ··· + X_n]
= E[X_1] + ··· + E[X_n], by linearity
= n E[X_1], because the X_m have the same distribution.
Now,
E[X_1] = 1 × (1/6) + ··· + 6 × (1/6) = (6 × 7 / 2) × (1/6) = 7/2.
Hence, E[X] = 7n/2.
Note: Computing Σ_x x Pr[X = x] directly is not easy!

Using Linearity - 2: Fixed point.

Hand out assignments at random to n students. X = number of students that get their own assignment back. X = X_1 + ··· + X_n, where X_m = 1{student m gets his/her own assignment back}. One has
E[X] = E[X_1 + ··· + X_n]
= E[X_1] + ··· + E[X_n], by linearity
= n E[X_1], because all the X_m have the same distribution
= n Pr[X_1 = 1], because X_1 is an indicator
= n (1/n), because student 1 is equally likely to get any one of the n assignments
= 1.
Note that linearity holds even though the X_m are not independent (whatever that means).
Note: What is Pr[X = m]? Tricky...

Using Linearity - 3: Binomial Distribution.

Flip n coins with heads probability p. X = number of heads.
Binomial distribution: Pr[X = i], for each i:
Pr[X = i] = (n choose i) p^i (1 - p)^{n-i}.
Direct computation:
E[X] = Σ_i i Pr[X = i] = Σ_i i (n choose i) p^i (1 - p)^{n-i}. Uh oh...
Or... a better approach: Let
X_i = 1 if the ith flip is heads, and X_i = 0 otherwise.
E[X_i] = 1 × Pr[heads] + 0 × Pr[tails] = p.
Moreover, X = X_1 + ··· + X_n and
E[X] = E[X_1] + E[X_2] + ··· + E[X_n] = n E[X_i] = np.
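A sketch (not from the lecture) confirming that the "uh oh" sum and the linearity shortcut agree:

```python
from math import comb

# Compare Σ_i i (n choose i) p^i (1-p)^(n-i) with the linearity answer np.
n, p = 10, 0.3
direct = sum(i * comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1))
print(direct, n * p)  # both 3.0, up to floating point
```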

Using Linearity - 4

Assume A and B are disjoint events. Then
1_{A ∪ B}(ω) = 1_A(ω) + 1_B(ω).
Taking expectation, we get
Pr[A ∪ B] = E[1_{A ∪ B}] = E[1_A + 1_B] = E[1_A] + E[1_B] = Pr[A] + Pr[B].
In general,
1_{A ∪ B}(ω) = 1_A(ω) + 1_B(ω) - 1_{A ∩ B}(ω).
Taking expectation, we get
Pr[A ∪ B] = Pr[A] + Pr[B] - Pr[A ∩ B].
Observe that if Y(ω) = b for all ω, then E[Y] = b. Thus,
E[X + b] = E[X] + b.
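A toy check (my own example, on an assumed uniform 10-point sample space) of inclusion-exclusion via indicators:

```python
# Uniform sample space of 10 outcomes; two overlapping events A and B.
omega = range(10)
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}
pr = 1 / len(omega)

ind = lambda S, w: 1 if w in S else 0  # indicator 1_S(w)

# Pr[A ∪ B] computed directly from the indicator of the union...
lhs = sum(max(ind(A, w), ind(B, w)) * pr for w in omega)
# ...and via E[1_A + 1_B - 1_{A ∩ B}].
rhs = sum((ind(A, w) + ind(B, w) - ind(A, w) * ind(B, w)) * pr for w in omega)
print(lhs, rhs)  # both ≈ 0.6, since A ∪ B has 6 of the 10 outcomes
```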

Calculating E[g(X)]

Let Y = g(X). Assume that we know the distribution of X. We want to calculate E[Y].
Method 1: We calculate the distribution of Y:
Pr[Y = y] = Pr[X ∈ g^{-1}(y)], where g^{-1}(y) = {x ∈ R : g(x) = y}.
This is typically rather tedious!
Method 2: We use the following result.
Theorem: E[g(X)] = Σ_x g(x) Pr[X = x].
Proof:
E[g(X)] = Σ_ω g(X(ω)) Pr[ω]
= Σ_x Σ_{ω ∈ X^{-1}(x)} g(X(ω)) Pr[ω]
= Σ_x Σ_{ω ∈ X^{-1}(x)} g(x) Pr[ω]
= Σ_x g(x) Σ_{ω ∈ X^{-1}(x)} Pr[ω]
= Σ_x g(x) Pr[X = x].

An Example

Let X be uniform in {-2, -1, 0, 1, 2, 3}. Let also g(X) = X^2. Then (method 2)
E[g(X)] = Σ_{x=-2}^{3} x^2 × (1/6) = (4 + 1 + 0 + 1 + 4 + 9) × (1/6) = 19/6.
Method 1 - We find the distribution of Y = X^2:
Y = 4 w.p. 2/6; 1 w.p. 2/6; 0 w.p. 1/6; 9 w.p. 1/6.
Thus,
E[Y] = 4 × (2/6) + 1 × (2/6) + 0 × (1/6) + 9 × (1/6) = 19/6.
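Both methods for this example can be checked exactly (a sketch, not from the lecture) using rational arithmetic:

```python
from collections import Counter
from fractions import Fraction

# X uniform on {-2, -1, 0, 1, 2, 3}, g(X) = X^2.
support = [-2, -1, 0, 1, 2, 3]
p = Fraction(1, 6)

# Method 2: Σ_x g(x) Pr[X = x], no distribution of Y needed.
method2 = sum(x**2 * p for x in support)

# Method 1: first build the distribution of Y = X^2, then sum y Pr[Y = y].
dist_y = Counter(x**2 for x in support)
method1 = sum(y * cnt * p for y, cnt in dist_y.items())

print(method1, method2)  # both 19/6
```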

Calculating E[g(X, Y, Z)]

We have seen that E[g(X)] = Σ_x g(x) Pr[X = x]. Using a similar derivation, one can show that
E[g(X, Y, Z)] = Σ_{x,y,z} g(x, y, z) Pr[X = x, Y = y, Z = z].
An Example. Let (X, Y) have the joint distribution
(X, Y) = (0, 0) w.p. 0.1; (1, 0) w.p. 0.4; (0, 1) w.p. 0.2; (1, 1) w.p. 0.3.
Then
E[cos(2πX + πY)] = 0.1 cos(0) + 0.4 cos(2π) + 0.2 cos(π) + 0.3 cos(3π)
= 0.1 × 1 + 0.4 × 1 + 0.2 × (-1) + 0.3 × (-1) = 0.
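The same joint-distribution sum, as a one-line check (not from the lecture):

```python
from math import cos, pi

# Joint distribution of (X, Y) from the example above.
joint = {(0, 0): 0.1, (1, 0): 0.4, (0, 1): 0.2, (1, 1): 0.3}

# E[g(X, Y)] = Σ_{x,y} g(x, y) Pr[X = x, Y = y] for g(x, y) = cos(2πx + πy).
e = sum(p * cos(2 * pi * x + pi * y) for (x, y), p in joint.items())
print(e)  # essentially 0 (up to floating-point roundoff)
```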

Best Guess: Least Squares

If you only know the distribution of X, it seems that E[X] is a good guess for X. The following result makes that idea precise.
Theorem: The value of a that minimizes E[(X - a)^2] is a = E[X].
Proof 1:
E[(X - a)^2] = E[(X - E[X] + E[X] - a)^2]
= E[(X - E[X])^2 + 2(X - E[X])(E[X] - a) + (E[X] - a)^2]
= E[(X - E[X])^2] + 2(E[X] - a) E[X - E[X]] + (E[X] - a)^2
= E[(X - E[X])^2] + 0 + (E[X] - a)^2
≥ E[(X - E[X])^2].

Best Guess: Least Squares

If you only know the distribution of X, it seems that E[X] is a good guess for X. The following result makes that idea precise.
Theorem: The value of a that minimizes E[(X - a)^2] is a = E[X].
Proof 2: Let
g(a) := E[(X - a)^2] = E[X^2 - 2aX + a^2] = E[X^2] - 2a E[X] + a^2.
To find the minimizer of g(a), we set the derivative dg(a)/da to zero. We get
0 = dg(a)/da = -2 E[X] + 2a.
Hence, the minimizer is a = E[X].

Best Guess: Least Absolute Deviation

Thus E[X] minimizes E[(X - a)^2]. It must be noted that the measure of the quality of the approximation matters. The following result illustrates that point.
Theorem: The value of a that minimizes E[|X - a|] is the median of X. The median ν of X is any real number such that Pr[X ≤ ν] = Pr[X ≥ ν].
Proof:
g(a) := E[|X - a|] = Σ_{x ≤ a} (a - x) Pr[X = x] + Σ_{x ≥ a} (x - a) Pr[X = x].
Thus, if 0 < ε ≪ 1,
g(a + ε) = g(a) + ε Pr[X ≤ a] - ε Pr[X ≥ a].
Hence, no small change in a can reduce g(a) only if Pr[X ≤ a] = Pr[X ≥ a], i.e., only at a median of X.

Best Guess: Illustration

X ∈ {1, 2, 3, 11, 13} with equal probabilities. [Figure: E[|X - a|] and E[(X - a)^2] plotted against a; the first is minimized at the median, the second at the mean E[X].]
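A numeric sketch (not from the lecture) of this illustration: scan candidate guesses a for X uniform on {1, 2, 3, 11, 13} and see where each loss is minimized.

```python
# X uniform on {1, 2, 3, 11, 13}: mean = 6, median = 3.
xs = [1, 2, 3, 11, 13]

def mse(a):
    """E[(X - a)^2]."""
    return sum((x - a) ** 2 for x in xs) / len(xs)

def mad(a):
    """E[|X - a|]."""
    return sum(abs(x - a) for x in xs) / len(xs)

grid = [i / 10 for i in range(0, 151)]   # candidate guesses 0.0, 0.1, ..., 15.0
best_sq = min(grid, key=mse)             # lands at the mean, 6.0
best_abs = min(grid, key=mad)            # lands at the median, 3.0
print(best_sq, best_abs)
```

Squared loss chases the outliers 11 and 13, pulling the best guess up to the mean; absolute loss does not, and stays at the median.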

Best Guess: Another Illustration

X ∈ {1, 2, 11, 13} with equal probabilities. [Figure: E[|X - a|] and E[(X - a)^2] plotted against a, with the median and the mean E[X] marked.]

Center of Mass

The expected value has a center of mass interpretation: place mass p_n at each point a_n; the center of mass µ is the point where the masses balance, i.e.,
Σ_n p_n (a_n - µ) = 0, so that µ = Σ_n a_n p_n = E[X].
[Figure: masses 0.5 and 0.5 at positions 0 and 1 balance at 0.5; masses 0.3 and 0.7 at 0 and 1 balance at 0.7; masses p_1, p_2, p_3 at a_1, a_2, a_3 balance at µ.]

Monotonicity

Definition. Let X, Y be two random variables on Ω. We write X ≤ Y if X(ω) ≤ Y(ω) for all ω ∈ Ω, and similarly for X ≥ Y and X ≥ a for some constant a.
Facts:
(a) If X ≥ 0, then E[X] ≥ 0.
(b) If X ≤ Y, then E[X] ≤ E[Y].
Proof:
(a) If X ≥ 0, every value a of X is nonnegative. Hence,
E[X] = Σ_a a Pr[X = a] ≥ 0.
(b) X ≤ Y ⟹ Y - X ≥ 0 ⟹ E[Y] - E[X] = E[Y - X] ≥ 0.
Example: B = ∪_m A_m ⟹ 1_B(ω) ≤ Σ_m 1_{A_m}(ω) ⟹ Pr[∪_m A_m] ≤ Σ_m Pr[A_m].

Uniform Distribution

Roll a six-sided balanced die. Let X be the number of pips (dots). Then X is equally likely to take any of the values {1, 2, ..., 6}. We say that X is uniformly distributed in {1, 2, ..., 6}.
More generally, we say that X is uniformly distributed in {1, 2, ..., n} if Pr[X = m] = 1/n for m = 1, 2, ..., n. In that case,
E[X] = Σ_{m=1}^{n} m Pr[X = m] = Σ_{m=1}^{n} m × (1/n) = (1/n) × n(n + 1)/2 = (n + 1)/2.
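The closed form (n + 1)/2 can be checked exactly (a sketch, not from the lecture) with rational arithmetic:

```python
from fractions import Fraction

def uniform_mean(n):
    """E[X] for X uniform on {1, ..., n}, computed from the definition."""
    p = Fraction(1, n)
    return sum(m * p for m in range(1, n + 1))

for n in (6, 10, 100):
    assert uniform_mean(n) == Fraction(n + 1, 2)
print(uniform_mean(6))  # 7/2
```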

Geometric Distribution

Let's flip a coin with Pr[H] = p until we get H. For instance:
ω_1 = H, or ω_2 = T H, or ω_3 = T T H, or ω_n = T T ··· T H (n - 1 tails, then H).
Note that Ω = {ω_n, n = 1, 2, ...}. Let X be the number of flips until the first H. Then X(ω_n) = n. Also,
Pr[X = n] = (1 - p)^{n-1} p, n ≥ 1.

Geometric Distribution

Pr[X = n] = (1 - p)^{n-1} p, n ≥ 1.

Geometric Distribution

Pr[X = n] = (1 - p)^{n-1} p, n ≥ 1. Note that
Σ_{n=1}^{∞} Pr[X = n] = Σ_{n=1}^{∞} (1 - p)^{n-1} p = p Σ_{n=1}^{∞} (1 - p)^{n-1} = p Σ_{n=0}^{∞} (1 - p)^n.
Now, if |a| < 1, then S := Σ_{n=0}^{∞} a^n = 1/(1 - a). Indeed,
S = 1 + a + a^2 + a^3 + ···
aS = a + a^2 + a^3 + a^4 + ···
(1 - a)S = 1 + a - a + a^2 - a^2 + ··· = 1.
Hence,
Σ_{n=1}^{∞} Pr[X = n] = p × 1/(1 - (1 - p)) = 1.

Geometric Distribution: Expectation

X =_D G(p), i.e., Pr[X = n] = (1 - p)^{n-1} p, n ≥ 1. One has
E[X] = Σ_{n=1}^{∞} n Pr[X = n] = Σ_{n=1}^{∞} n (1 - p)^{n-1} p.
Thus,
E[X] = p + 2(1 - p)p + 3(1 - p)^2 p + 4(1 - p)^3 p + ···
(1 - p)E[X] = (1 - p)p + 2(1 - p)^2 p + 3(1 - p)^3 p + ···
By subtracting the previous two identities,
p E[X] = p + (1 - p)p + (1 - p)^2 p + (1 - p)^3 p + ··· = Σ_{n=1}^{∞} Pr[X = n] = 1.
Hence,
E[X] = 1/p.
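A simulation sketch (not from the lecture): generate G(p) samples by flipping until the first heads, and compare the empirical mean with 1/p.

```python
from random import random, seed

seed(2)  # fixed seed for reproducibility

def geometric_sample(p):
    """Number of flips until the first heads, each flip heads w.p. p."""
    n = 1
    while random() >= p:  # this flip came up tails
        n += 1
    return n

p = 0.2
trials = 100000
avg = sum(geometric_sample(p) for _ in range(trials)) / trials
print(avg)  # near 1/p = 5
```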

Geometric Distribution: Memoryless

Let X be G(p). Then, for n ≥ 0,
Pr[X > n] = Pr[first n flips are T] = (1 - p)^n.
Theorem: Pr[X > n + m | X > n] = Pr[X > m], m, n ≥ 0.
Proof:
Pr[X > n + m | X > n] = Pr[X > n + m and X > n] / Pr[X > n]
= Pr[X > n + m] / Pr[X > n]
= (1 - p)^{n+m} / (1 - p)^n
= (1 - p)^m = Pr[X > m].

Geometric Distribution: Memoryless - Interpretation

Pr[X > n + m | X > n] = Pr[X > m], m, n ≥ 0.
Let A = {the m flips after flip n are all T} and B = {the first n flips are all T} = {X > n}. Since the flips are independent,
Pr[X > n + m | X > n] = Pr[A | B] = Pr[A] = Pr[X > m].
The coin is memoryless; therefore, so is X.

Geometric Distribution: Yet another look

Theorem: For a r.v. X that takes values in {0, 1, 2, ...}, one has
E[X] = Σ_{i=1}^{∞} Pr[X ≥ i].
[See later for a proof.]
If X = G(p), then Pr[X ≥ i] = Pr[X > i - 1] = (1 - p)^{i-1}. Hence,
E[X] = Σ_{i=1}^{∞} (1 - p)^{i-1} = Σ_{i=0}^{∞} (1 - p)^i = 1/(1 - (1 - p)) = 1/p.

Expected Value of Integer RV

Theorem: For a r.v. X that takes values in {0, 1, 2, ...}, one has
E[X] = Σ_{i=1}^{∞} Pr[X ≥ i].
Proof: One has
E[X] = Σ_{i=1}^{∞} i Pr[X = i]
= Σ_{i=1}^{∞} i {Pr[X ≥ i] - Pr[X ≥ i + 1]}
= Σ_{i=1}^{∞} {i Pr[X ≥ i] - i Pr[X ≥ i + 1]}
= Σ_{i=1}^{∞} {i Pr[X ≥ i] - (i - 1) Pr[X ≥ i]}
= Σ_{i=1}^{∞} Pr[X ≥ i].
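The tail-sum formula is easy to sanity-check numerically (a sketch, not from the lecture; here on my own choice of a binomial distribution, which is nonnegative-integer-valued):

```python
from math import comb

# X ~ Binomial(8, 0.4): pmf[k] = Pr[X = k].
n, p = 8, 0.4
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# E[X] the usual way: Σ_k k Pr[X = k].
direct = sum(k * pmf[k] for k in range(n + 1))
# E[X] via the tail sum: Σ_{i ≥ 1} Pr[X ≥ i].
tail_sum = sum(sum(pmf[i:]) for i in range(1, n + 1))
print(direct, tail_sum)  # both n*p = 3.2, up to floating point
```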

Poisson

Experiment: flip a coin n times. The coin is such that Pr[H] = λ/n.
Random Variable: X = number of heads. Thus, X = B(n, λ/n).
The Poisson distribution is the distribution of X for large n.
We expect X ≪ n. For m ≪ n, one has
Pr[X = m] = (n choose m) p^m (1 - p)^{n-m}, with p = λ/n
= [n(n-1)···(n-m+1) / m!] (λ/n)^m (1 - λ/n)^{n-m}
= [n(n-1)···(n-m+1) / n^m] (λ^m / m!) (1 - λ/n)^{n-m}
≈ (λ^m / m!) (1 - λ/n)^{n-m}    (1)
≈ (λ^m / m!) (1 - λ/n)^n         (1)
≈ (λ^m / m!) e^{-λ}.             (2)
For (1) we used m ≪ n; for (2) we used (1 - a/n)^n ≈ e^{-a}.
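A numeric sketch (not from the lecture) of this limit: the B(n, λ/n) probabilities approach the Poisson(λ) probabilities as n grows.

```python
from math import comb, exp, factorial

lam = 2.0

def binom_pmf(n, m):
    """Pr[X = m] for X ~ B(n, λ/n)."""
    p = lam / n
    return comb(n, m) * p**m * (1 - p)**(n - m)

def poisson_pmf(m):
    """Pr[X = m] for X ~ P(λ)."""
    return lam**m / factorial(m) * exp(-lam)

for n in (10, 100, 10000):
    print(n, abs(binom_pmf(n, 3) - poisson_pmf(3)))  # gap shrinks with n
```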

Poisson Distribution: Definition and Mean

Definition: Poisson distribution with parameter λ > 0: X = P(λ) if
Pr[X = m] = (λ^m / m!) e^{-λ}, m ≥ 0.
Fact: E[X] = λ.
Proof:
E[X] = Σ_{m=1}^{∞} m (λ^m / m!) e^{-λ}
= e^{-λ} Σ_{m=1}^{∞} λ^m / (m - 1)!
= e^{-λ} λ Σ_{m=1}^{∞} λ^{m-1} / (m - 1)!
= e^{-λ} λ Σ_{m=0}^{∞} λ^m / m!
= e^{-λ} λ e^{λ} = λ.
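A quick numerical check (not from the lecture) of E[X] = λ, truncating the series where the terms are negligible:

```python
from math import exp, factorial

# E[X] = Σ_m m (λ^m / m!) e^{-λ}; terms beyond m = 100 are vanishingly small
# for λ = 3.5, so a truncated sum recovers λ to high precision.
lam = 3.5
mean = sum(m * lam**m / factorial(m) * exp(-lam) for m in range(1, 101))
print(mean)  # very close to 3.5
```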

Simeon Poisson

The Poisson distribution is named after Simeon Poisson. [Portrait.]

Equal Time: B. Geometric The geometric distribution is named after: I could not find a picture of D. Binomial, sorry.

Summary

Random Variables
A random variable X is a function X : Ω → R.
Pr[X = a] := Pr[X^{-1}(a)] = Pr[{ω : X(ω) = a}].
Pr[X ∈ A] := Pr[X^{-1}(A)].
The distribution of X is the list of possible values and their probabilities: {(a, Pr[X = a]) : a ∈ A}, where A is the range of X.
g(X, Y, Z) assigns the value g(X(ω), Y(ω), Z(ω)) to ω.
E[X] := Σ_a a Pr[X = a].
Expectation is linear.
Distributions seen: B(n, p), U[1 : n], G(p), P(λ).