
Expectation. DS GA 1002 Probability and Statistics for Data Science. Carlos Fernandez-Granda. http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall17

Aim: describe random variables with a few numbers: mean, variance, covariance.

Outline: expectation operator, mean and variance, covariance, conditional expectation.

Discrete random variables. The expectation is the average of the values of a function weighted by the pmf: $E(g(X)) = \sum_{x \in R_X} g(x)\, p_X(x)$, $E(g(X,Y)) = \sum_{x \in R_X} \sum_{y \in R_Y} g(x,y)\, p_{X,Y}(x,y)$, and for a random vector $\vec{X} = (X_1, X_2, \dots, X_n)$, $E(g(\vec{X})) = \sum_{\vec{x} \in R_{\vec{X}}} g(\vec{x})\, p_{\vec{X}}(\vec{x})$.

Continuous random variables. The expectation is the average of the values of a function weighted by the pdf: $E(g(X)) = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx$, $E(g(X,Y)) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\, f_{X,Y}(x,y)\, dx\, dy$, and $E(g(\vec{X})) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(\vec{x})\, f_{\vec{X}}(\vec{x})\, dx_1\, dx_2 \cdots dx_n$.

Discrete and continuous random variables. If $C$ is continuous and $D$ is discrete, $E(g(C,D)) = \sum_{d \in R_D} \int_{c=-\infty}^{\infty} g(c,d)\, f_C(c)\, p_{D|C}(d|c)\, dc = \sum_{d \in R_D} \int_{c=-\infty}^{\infty} g(c,d)\, p_D(d)\, f_{C|D}(c|d)\, dc$.

St Petersburg paradox. A casino offers you a game: flip an unbiased coin until it lands on heads; you receive $2^k$ dollars, where $k$ is the number of flips. What is the expected gain?

St Petersburg paradox. $E(\text{Gain}) = \sum_{k=1}^{\infty} 2^k \cdot \frac{1}{2^k} = \sum_{k=1}^{\infty} 1 = \infty$.
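The divergence is easy to see empirically. Below is a minimal Monte Carlo sketch (assuming Python with numpy; the seed and sample sizes are arbitrary choices): the average payoff keeps growing with the number of simulated games instead of stabilizing.

```python
import numpy as np

rng = np.random.default_rng(0)

def st_petersburg_payoffs(n_games):
    # Number of flips until the first head is geometric with p = 1/2;
    # the payoff of each game is 2**k dollars, where k is that number of flips.
    k = rng.geometric(0.5, size=n_games)
    return 2.0 ** k

for n in [10**3, 10**5, 10**7]:
    print(n, st_petersburg_payoffs(n).mean())
# The empirical average keeps growing with n, consistent with an infinite expectation.
```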

Linearity of expectation. For any constants $a$ and $b$ and any functions $g_1$ and $g_2$, $E(a\, g_1(X,Y) + b\, g_2(X,Y)) = a\, E(g_1(X,Y)) + b\, E(g_2(X,Y))$. This follows from linearity of sums and integrals: $\sum_{x \in R_X} \sum_{y \in R_Y} \left(a\, g_1(x,y) + b\, g_2(x,y)\right) p_{X,Y}(x,y) = a \sum_{x \in R_X} \sum_{y \in R_Y} g_1(x,y)\, p_{X,Y}(x,y) + b \sum_{x \in R_X} \sum_{y \in R_Y} g_2(x,y)\, p_{X,Y}(x,y)$.

Example: coffee beans. A company buys coffee beans from two local producers: beans from Colombia, $C$ tons/year, and beans from Vietnam, $V$ tons/year. Model: $C$ uniform between 0 and 1, $V$ uniform between 0 and 2, $C$ and $V$ independent. What is the expected total amount of beans $B = C + V$?

Example: coffee beans. $E(C + V) = E(C) + E(V) = 0.5 + 1 = 1.5$ tons. This holds even if $C$ and $V$ are not independent.
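A quick simulation (a sketch assuming numpy; the sample size is an illustrative choice) confirms the computation, and also that linearity does not require independence:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
C = rng.uniform(0, 1, n)       # Colombian beans, uniform on [0, 1]
V = rng.uniform(0, 2, n)       # Vietnamese beans, uniform on [0, 2], independent of C
print((C + V).mean())          # close to E(C) + E(V) = 0.5 + 1 = 1.5

V_dep = 2 * C                  # fully dependent on C, but still uniform on [0, 2]
print((C + V_dep).mean())      # still close to 1.5: linearity needs no independence
```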

Independence. If $X$ and $Y$ are independent, then $E(g(X)\, h(Y)) = E(g(X))\, E(h(Y))$: $E(g(X)\, h(Y)) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x)\, h(y)\, f_{X,Y}(x,y)\, dx\, dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x)\, h(y)\, f_X(x)\, f_Y(y)\, dx\, dy = E(g(X))\, E(h(Y))$.

Mean and variance

Mean. The mean or first moment of $X$ is $E(X)$. It is the center of mass of the distribution.

Bernoulli. $E(X) = 0 \cdot p_X(0) + 1 \cdot p_X(1) = p$.

Binomial. A binomial random variable is a sum of $n$ Bernoulli random variables, $X = \sum_{i=1}^{n} B_i$, so $E(X) = E\left(\sum_{i=1}^{n} B_i\right) = \sum_{i=1}^{n} E(B_i) = np$.
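As a sanity check, the same identity can be verified by simulation (a sketch assuming numpy; $n$, $p$ and the number of draws are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 0.3
bernoullis = rng.random((10**5, n)) < p   # 10**5 rows of n independent Bernoulli(p) variables
X = bernoullis.sum(axis=1)                # each row sum is a Binomial(n, p) draw
print(X.mean(), n * p)                    # empirical mean vs. np
```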

Mean of important random variables:
Random variable | Parameters | Mean
Bernoulli | p | p
Geometric | p | 1/p
Binomial | n, p | np
Poisson | λ | λ
Uniform | a, b | (a+b)/2
Exponential | λ | 1/λ
Gaussian | µ, σ | µ

Cauchy random variable. $f_X(x) = \frac{1}{\pi(1 + x^2)}$. [Plot of the pdf for $-10 \le x \le 10$.]

Cauchy random variable. $E(X) = \int_{-\infty}^{\infty} \frac{x}{\pi(1+x^2)}\, dx = \int_{0}^{\infty} \frac{x}{\pi(1+x^2)}\, dx - \int_{0}^{\infty} \frac{x}{\pi(1+x^2)}\, dx$, but with the change of variables $t = x^2$, $\int_{0}^{\infty} \frac{x}{\pi(1+x^2)}\, dx = \int_{0}^{\infty} \frac{1}{2\pi(1+t)}\, dt = \lim_{t \to \infty} \frac{\log(1+t)}{2\pi} = \infty$, so the expression is of the form $\infty - \infty$ and the mean is not well defined.
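The undefined mean has a practical consequence: averages of Cauchy samples do not settle down as the sample size grows. A minimal sketch (assuming numpy; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_cauchy(10**6)
# Running averages of Cauchy samples never stabilize, because there is
# no mean for them to converge to (the law of large numbers does not apply).
for n in [10**2, 10**4, 10**6]:
    print(n, x[:n].mean())
```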

Mean of a random vector. The mean of a random vector is the vector formed by the means of its components: $E(\vec{X}) := \left(E(X_1), E(X_2), \dots, E(X_n)\right)^T$. By linearity of expectation, for any matrix $A \in \mathbb{R}^{m \times n}$ and $\vec{b} \in \mathbb{R}^m$, $E(A\vec{X} + \vec{b}) = A\, E(\vec{X}) + \vec{b}$.

The mean as a typical value. The mean is a typical value of the random variable, but the probability that $X$ equals $E(X)$ can be zero, and the mean can be severely distorted by a subset of extreme values.

Density with a subset of extreme values. Uniform random variable $X$ with support $[-4.5, 4.5] \cup [99.5, 100.5]$. [Plot of the pdf: $f_X(x) = 0.1$ on the support, 0 elsewhere.]

Density with a subset of extreme values. $E(X) = \int_{-4.5}^{4.5} x\, f_X(x)\, dx + \int_{99.5}^{100.5} x\, f_X(x)\, dx = 0 + \frac{1}{10} \cdot \frac{100.5^2 - 99.5^2}{2} = 10$.


Median. The median is a midpoint of the distribution: a number $m$ such that $P(X \le m) \ge \frac{1}{2}$ and $P(X \ge m) \ge \frac{1}{2}$. For continuous random variables, $F_X(m) = \int_{-\infty}^{m} f_X(x)\, dx = \frac{1}{2}$.

Density with a subset of extreme values. $F_X(m) = \int_{-4.5}^{m} f_X(x)\, dx = \frac{m + 4.5}{10} = \frac{1}{2}$, so $m = 0.5$.

Density with a subset of extreme values. [Plot of the pdf with the mean (10) and the median (0.5) marked: the median sits in the bulk of the distribution, while the mean is pulled toward the extreme values.]
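The gap between the two summaries is easy to reproduce by sampling from this density (a sketch assuming numpy; the mixture construction mirrors the support described above):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
# X is uniform on [-4.5, 4.5] with probability 9/10 and on [99.5, 100.5] with
# probability 1/10, matching a density equal to 0.1 on the union of the intervals.
extreme = rng.random(n) < 0.1
x = np.where(extreme, rng.uniform(99.5, 100.5, n), rng.uniform(-4.5, 4.5, n))
print(x.mean())      # close to 10
print(np.median(x))  # close to 0.5
```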

Variance. The mean square or second moment of $X$ is $E(X^2)$. The variance of $X$ is $\mathrm{Var}(X) := E\left((X - E(X))^2\right) = E\left(X^2 - 2X\,E(X) + E^2(X)\right) = E(X^2) - E^2(X)$. The standard deviation of $X$ is $\sigma_X := \sqrt{\mathrm{Var}(X)}$.

Bernoulli. $E(X^2) = 0 \cdot p_X(0) + 1 \cdot p_X(1) = p$, so $\mathrm{Var}(X) = E(X^2) - E^2(X) = p - p^2 = p(1-p)$.

Variance of common random variables:
Random variable | Parameters | Variance
Bernoulli | p | p(1-p)
Geometric | p | (1-p)/p²
Binomial | n, p | np(1-p)
Poisson | λ | λ
Uniform | a, b | (b-a)²/12
Exponential | λ | 1/λ²
Gaussian | µ, σ | σ²

[Plots of the pmf or pdf of the following distributions: Geometric (p = 0.2), Binomial (n = 20, p = 0.5), Poisson (λ = 25), Uniform on [0, 1], Exponential (λ = 1), Gaussian (µ = 0, σ = 1).]

Variance. The variance operator is not linear, but $\mathrm{Var}(aX + b) = E\left((aX + b - E(aX + b))^2\right) = E\left((aX + b - a\,E(X) - b)^2\right) = a^2\, E\left((X - E(X))^2\right) = a^2\, \mathrm{Var}(X)$.

Bounding probabilities using expectations. Aim: characterize the behavior of $X$ to some extent using $E(X)$ and $\mathrm{Var}(X)$.

Markov's inequality. For any nonnegative random variable $X$ and any $a > 0$, $P(X \ge a) \le \frac{E(X)}{a}$.

Markov's inequality. Consider the indicator variable $1_{X \ge a}$. Since $X$ is nonnegative, $X - a\, 1_{X \ge a} \ge 0$, so $E(X) \ge a\, E(1_{X \ge a}) = a\, P(X \ge a)$.

Age of students at NYU. Mean: 20 years. How many are younger than 30? By Markov's inequality, $P(A \ge 30) \le \frac{E(A)}{30} = \frac{2}{3}$, so at least 1/3 are younger than 30.
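Markov's inequality holds for any nonnegative age distribution with mean 20. The sketch below (assuming numpy; the Gamma distribution is a hypothetical choice, not part of the example) compares the actual tail probability of one such distribution with the bound:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical nonnegative age distribution with mean shape * scale = 20.
ages = rng.gamma(shape=10.0, scale=2.0, size=10**6)
print((ages >= 30).mean())   # empirical P(A >= 30), well below the bound
print(20 / 30)               # Markov bound E(A) / 30 = 2/3
```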

Chebyshev's inequality. For any positive constant $a > 0$, $P(|X - E(X)| \ge a) \le \frac{\mathrm{Var}(X)}{a^2}$. Corollary: if $\mathrm{Var}(X) = 0$ then $P(X \ne E(X)) = 0$, since for any $\epsilon > 0$, $P(|X - E(X)| \ge \epsilon) \le \frac{\mathrm{Var}(X)}{\epsilon^2} = 0$.

Chebyshev's inequality. Define $Y := (X - E(X))^2$. By Markov's inequality, $P(|X - E(X)| \ge a) = P(Y \ge a^2) \le \frac{E(Y)}{a^2} = \frac{\mathrm{Var}(X)}{a^2}$.

Age of students at NYU. Mean: 20 years, standard deviation: 3 years. How many are younger than 30? By Chebyshev's inequality, $P(A \ge 30) \le P(|A - 20| \ge 10) \le \frac{\mathrm{Var}(A)}{100} = \frac{9}{100}$, so at least 91% are younger than 30.
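Again, the bound applies to any distribution with this mean and standard deviation; the sketch below (assuming numpy; the Gaussian is a hypothetical stand-in for the age distribution) shows that it is usually quite loose:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical ages with mean 20 and standard deviation 3.
ages = rng.normal(20, 3, 10**6)
print((np.abs(ages - 20) >= 10).mean())   # empirical P(|A - 20| >= 10)
print(9 / 100)                            # Chebyshev bound Var(A) / 10^2
```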

Covariance

Covariance. The covariance of $X$ and $Y$ is $\mathrm{Cov}(X,Y) := E\left((X - E(X))(Y - E(Y))\right) = E\left(XY - Y\,E(X) - X\,E(Y) + E(X)\,E(Y)\right) = E(XY) - E(X)\,E(Y)$. If $\mathrm{Cov}(X,Y) = 0$, $X$ and $Y$ are uncorrelated.

Covariance. [Scatter plots of samples from pairs of random variables with $\mathrm{Cov}(X,Y)$ equal to 0.5, 0.9, 0.99, 0, $-0.9$ and $-0.99$.]

Variance of the sum. $\mathrm{Var}(X+Y) = E\left((X + Y - E(X+Y))^2\right) = E\left((X - E(X))^2\right) + E\left((Y - E(Y))^2\right) + 2\,E\left((X - E(X))(Y - E(Y))\right) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X,Y)$. If $X$ and $Y$ are uncorrelated, then $\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.

Independence implies uncorrelation. $\mathrm{Cov}(X,Y) = E(XY) - E(X)\,E(Y) = E(X)\,E(Y) - E(X)\,E(Y) = 0$.

Uncorrelation does not imply independence. Let $X$ and $Y$ be independent Bernoulli random variables with parameter $\frac{1}{2}$, and let $U = X + Y$ and $V = X - Y$. Are $U$ and $V$ independent? Are they uncorrelated?

Uncorrelation does not imply independence. $p_U(0) = P(X=0, Y=0) = \frac{1}{4}$, $p_V(0) = P(X=1, Y=1) + P(X=0, Y=0) = \frac{1}{2}$, and $p_{U,V}(0,0) = P(X=0, Y=0) = \frac{1}{4}$. Since $p_U(0)\, p_V(0) = \frac{1}{8} \ne p_{U,V}(0,0)$, $U$ and $V$ are not independent.

Uncorrelation does not imply independence. $\mathrm{Cov}(U,V) = E(UV) - E(U)\,E(V) = E\left((X+Y)(X-Y)\right) - E(X+Y)\,E(X-Y) = E(X^2) - E(Y^2) - E^2(X) + E^2(Y) = 0$, so $U$ and $V$ are uncorrelated.
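A short simulation (a sketch assuming numpy) reproduces both conclusions: the empirical covariance of $U$ and $V$ is essentially zero, while the joint pmf at $(0,0)$ does not factor into the product of the marginals:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
X = rng.integers(0, 2, n)     # Bernoulli(1/2)
Y = rng.integers(0, 2, n)     # Bernoulli(1/2), independent of X
U, V = X + Y, X - Y
print(np.cov(U, V)[0, 1])                  # close to 0: uncorrelated
print(((U == 0) & (V == 0)).mean())        # close to 1/4
print((U == 0).mean() * (V == 0).mean())   # close to 1/8: not independent
```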

Correlation coefficient. The Pearson correlation coefficient of $X$ and $Y$ is $\rho_{X,Y} := \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$. It is the covariance between $X/\sigma_X$ and $Y/\sigma_Y$.

Correlation coefficient. [Scatter plots for $\sigma_X = 1$ and: $\sigma_Y = 1$, $\mathrm{Cov}(X,Y) = 0.9$, $\rho_{X,Y} = 0.9$; $\sigma_Y = 3$, $\mathrm{Cov}(X,Y) = 0.9$, $\rho_{X,Y} = 0.3$; $\sigma_Y = 3$, $\mathrm{Cov}(X,Y) = 2.7$, $\rho_{X,Y} = 0.9$.]

Cauchy-Schwarz inequality. For any $X$ and $Y$, $|E(XY)| \le \sqrt{E(X^2)\,E(Y^2)}$. Moreover, $E(XY) = \sqrt{E(X^2)\,E(Y^2)}$ if and only if $Y = \sqrt{\frac{E(Y^2)}{E(X^2)}}\, X$, and $E(XY) = -\sqrt{E(X^2)\,E(Y^2)}$ if and only if $Y = -\sqrt{\frac{E(Y^2)}{E(X^2)}}\, X$.

Cauchy-Schwarz inequality. It follows that $|\mathrm{Cov}(X,Y)| \le \sigma_X \sigma_Y$, or equivalently $|\rho_{X,Y}| \le 1$. In addition, $|\rho_{X,Y}| = 1$ if and only if $Y = cX + d$, where $c := \sigma_Y/\sigma_X$ if $\rho_{X,Y} = 1$, $c := -\sigma_Y/\sigma_X$ if $\rho_{X,Y} = -1$, and $d := E(Y) - c\,E(X)$.

Covariance matrix of a random vector. The covariance matrix of $\vec{X}$ is defined as
$\Sigma_{\vec{X}} = \begin{bmatrix} \mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_n) \\ \mathrm{Cov}(X_2, X_1) & \mathrm{Var}(X_2) & \cdots & \mathrm{Cov}(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_n, X_1) & \mathrm{Cov}(X_n, X_2) & \cdots & \mathrm{Var}(X_n) \end{bmatrix} = E\left(\vec{X}\vec{X}^T\right) - E\left(\vec{X}\right) E\left(\vec{X}\right)^T$.

Covariance matrix after a linear transformation.
$\Sigma_{A\vec{X}+\vec{b}} = E\left((A\vec{X}+\vec{b})(A\vec{X}+\vec{b})^T\right) - E\left(A\vec{X}+\vec{b}\right) E\left(A\vec{X}+\vec{b}\right)^T = A\,E\left(\vec{X}\vec{X}^T\right) A^T + \vec{b}\,E\left(\vec{X}\right)^T A^T + A\,E\left(\vec{X}\right)\vec{b}^T + \vec{b}\vec{b}^T - A\,E\left(\vec{X}\right)E\left(\vec{X}\right)^T A^T - A\,E\left(\vec{X}\right)\vec{b}^T - \vec{b}\,E\left(\vec{X}\right)^T A^T - \vec{b}\vec{b}^T = A\left(E\left(\vec{X}\vec{X}^T\right) - E\left(\vec{X}\right)E\left(\vec{X}\right)^T\right) A^T = A\,\Sigma_{\vec{X}}\,A^T$.
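The identity is easy to check numerically. The sketch below (assuming numpy; the covariance matrix, $A$ and $\vec{b}$ are arbitrary illustrative choices) compares the empirical covariance of $A\vec{X}+\vec{b}$ with $A\,\Sigma_{\vec{X}}\,A^T$:

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma_X = np.array([[2.0, 0.5, 0.0],
                    [0.5, 1.0, 0.3],
                    [0.0, 0.3, 1.5]])
X = rng.multivariate_normal(np.zeros(3), Sigma_X, size=10**6)
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])
b = np.array([5.0, -2.0])
Y = X @ A.T + b                   # samples of A X + b
print(np.cov(Y, rowvar=False))    # empirical covariance of A X + b
print(A @ Sigma_X @ A.T)          # theoretical A Sigma_X A^T (b plays no role)
```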

Variance in a fixed direction. For any unit vector $\vec{u}$, $\mathrm{Var}\left(\vec{u}^T \vec{X}\right) = \vec{u}^T \Sigma_{\vec{X}}\, \vec{u}$.

Direction of maximum variance. To find the direction of maximum variance we must solve $\arg\max_{\|\vec{u}\|_2 = 1} \vec{u}^T \Sigma_{\vec{X}}\, \vec{u}$.

Linear algebra. Symmetric matrices have orthogonal eigenvectors: $\Sigma_{\vec{X}} = U \Lambda U^T = \begin{bmatrix} \vec{u}_1 & \vec{u}_2 & \cdots & \vec{u}_n \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \begin{bmatrix} \vec{u}_1 & \vec{u}_2 & \cdots & \vec{u}_n \end{bmatrix}^T$.

Linear algebra. For a symmetric matrix $A$ with eigenvalues ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, $\lambda_1 = \max_{\|\vec{u}\|_2 = 1} \vec{u}^T A \vec{u}$ and $\vec{u}_1 = \arg\max_{\|\vec{u}\|_2 = 1} \vec{u}^T A \vec{u}$, while for $k > 1$, $\lambda_k = \max_{\|\vec{u}\|_2 = 1,\ \vec{u} \perp \vec{u}_1, \dots, \vec{u}_{k-1}} \vec{u}^T A \vec{u}$ and $\vec{u}_k = \arg\max_{\|\vec{u}\|_2 = 1,\ \vec{u} \perp \vec{u}_1, \dots, \vec{u}_{k-1}} \vec{u}^T A \vec{u}$.

Direction of maximum variance. [Scatter plots of two-dimensional samples for three covariance matrices with eigenvalues $\lambda_1 = 1.22$, $\lambda_2 = 0.71$; $\lambda_1 = 1$, $\lambda_2 = 1$; and $\lambda_1 = 1.38$, $\lambda_2 = 0.32$.]
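Putting the last few slides together: the eigenvector associated with the largest eigenvalue of the covariance matrix is the direction of maximum variance, and the variance along it equals that eigenvalue. A minimal sketch (assuming numpy; the covariance matrix is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 1.2],
                  [1.2, 1.0]])
samples = rng.multivariate_normal(np.zeros(2), Sigma, size=10**5)
eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigh returns eigenvalues in ascending order
u1 = eigvecs[:, -1]                        # eigenvector of the largest eigenvalue
print(eigvals[-1])                         # lambda_1
print(np.var(samples @ u1))                # sample variance along u1, close to lambda_1
```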

Coloring. Goal: transform uncorrelated samples with unit variance so that they have a prescribed covariance matrix $\Sigma$. 1. Compute the eigendecomposition $\Sigma = U \Lambda U^T$. 2. Set $\vec{y} := U \sqrt{\Lambda}\, \vec{x}$, where $\sqrt{\Lambda} := \begin{bmatrix} \sqrt{\lambda_1} & 0 & \cdots & 0 \\ 0 & \sqrt{\lambda_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{\lambda_n} \end{bmatrix}$.

Coloring. $\Sigma_{\vec{Y}} = U \sqrt{\Lambda}\, \Sigma_{\vec{X}} \sqrt{\Lambda}^T U^T = U \sqrt{\Lambda}\, I\, \sqrt{\Lambda}^T U^T = \Sigma$.

Coloring. [Scatter plots of the samples $\vec{x}$, $\sqrt{\Lambda}\,\vec{x}$ and $U\sqrt{\Lambda}\,\vec{x}$.]

Generating Gaussian random vectors. Goal: sample from an $n$-dimensional Gaussian random vector with mean $\vec{\mu}$ and covariance matrix $\Sigma$. 1. Generate a vector $\vec{x}$ of $n$ independent standard Gaussian samples. 2. Compute the eigendecomposition $\Sigma = U \Lambda U^T$. 3. Set $\vec{y} := U \sqrt{\Lambda}\, \vec{x} + \vec{\mu}$. For non-Gaussian random vectors, coloring does not necessarily preserve the distribution.
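The procedure above translates directly into a few lines of code. A sketch (assuming numpy; the mean and covariance are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
# Coloring: eigendecomposition Sigma = U Lambda U^T, then y = U sqrt(Lambda) x + mu.
eigvals, U = np.linalg.eigh(Sigma)
coloring = U @ np.diag(np.sqrt(eigvals))
x = rng.standard_normal((10**6, 2))   # independent standard Gaussian samples
y = x @ coloring.T + mu
print(y.mean(axis=0))                 # close to mu
print(np.cov(y, rowvar=False))        # close to Sigma
```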

For Gaussian random vectors, uncorrelation implies mutual independence. Uncorrelation implies $\Sigma_{\vec{X}} = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix}$, which in turn implies $f_{\vec{X}}(\vec{x}) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \exp\left(-\frac{1}{2}(\vec{x}-\vec{\mu})^T \Sigma^{-1} (\vec{x}-\vec{\mu})\right) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}\right) = \prod_{i=1}^{n} f_{X_i}(x_i)$.

Conditional expectation

Conditional expectation. The expectation of $g(X,Y)$ given $X = x$ is $E(g(X,Y) \mid X = x) = \int_{y=-\infty}^{\infty} g(x,y)\, f_{Y|X}(y|x)\, dy$. It can be interpreted as a function $h(x) := E(g(X,Y) \mid X = x)$. The conditional expectation of $g(X,Y)$ given $X$ is $E(g(X,Y) \mid X) := h(X)$; it is a random variable.

Iterated expectation. For any $X$ and $Y$ and any function $g: \mathbb{R}^2 \to \mathbb{R}$, $E(g(X,Y)) = E\left(E(g(X,Y) \mid X)\right)$.

Iterated expectation. Let $h(x) := E(g(X,Y) \mid X = x) = \int_{y=-\infty}^{\infty} g(x,y)\, f_{Y|X}(y|x)\, dy$. Then $E\left(E(g(X,Y) \mid X)\right) = E(h(X)) = \int_{x=-\infty}^{\infty} h(x)\, f_X(x)\, dx = \int_{x=-\infty}^{\infty} \int_{y=-\infty}^{\infty} f_X(x)\, f_{Y|X}(y|x)\, g(x,y)\, dy\, dx = E(g(X,Y))$.

Example: desert. A car is traveling through the desert. Time until the car breaks down: $T$. State of the motor: $M$. State of the road: $R$. Model: $M$ uniform between 0 (no problem) and 1 (very bad), $R$ uniform between 0 (no problem) and 1 (very bad), $M$ and $R$ independent, and $T$ exponential with parameter $M + R$.

Example: desert. $E(T) = E\left(E(T \mid M, R)\right) = E\left(\frac{1}{M+R}\right) = \int_0^1 \int_0^1 \frac{1}{m+r}\, dm\, dr = \int_0^1 \left(\log(r+1) - \log(r)\right) dr = \log 4 \approx 1.39$.
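A Monte Carlo check of the iterated-expectation calculation (a sketch assuming numpy; note that numpy's exponential sampler is parameterized by its scale, the inverse of the rate):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
M = rng.uniform(0, 1, n)                    # state of the motor
R = rng.uniform(0, 1, n)                    # state of the road
T = rng.exponential(scale=1.0 / (M + R))    # exponential with parameter (rate) M + R
print(T.mean(), np.log(4))                  # both close to 1.386
```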

Grizzlies in Yellowstone. Model for the weight $W$ of grizzly bears in Yellowstone: males are Gaussian with $\mu := 240$ kg and $\sigma := 40$ kg, females are Gaussian with $\mu := 140$ kg and $\sigma := 20$ kg, and there are about the same number of females and males.

Grizzlies in Yellowstone. Letting $S$ denote the sex of a bear chosen uniformly at random, $E(W) = E\left(E(W \mid S)\right) = \frac{E(W \mid S = 0) + E(W \mid S = 1)}{2} = \frac{240 + 140}{2} = 190$ kg.

Bayesian coin flip. Bayesian methods often endow the parameters of discrete distributions with a continuous marginal distribution. You suspect a coin is biased. You are uncertain about the bias $B$, so you model it as a random variable with pdf $f_B(b) = 2b$ for $b \in [0, 1]$. What is the expected value of the coin flip $X$?

Bayesian coin flip. $E(X) = E\left(E(X \mid B)\right) = E(B) = \int_0^1 b \cdot 2b\, db = \int_0^1 2b^2\, db = \frac{2}{3}$.
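A quick simulation of the two-stage experiment (a sketch assuming numpy; the bias is sampled by inverting its cdf $F_B(b) = b^2$):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
B = np.sqrt(rng.random(n))   # inverse-cdf sampling: f_B(b) = 2b on [0, 1]
X = rng.random(n) < B        # flip a coin with (random) bias B
print(X.mean())              # close to E(X) = 2/3
```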