
Expectation
DS GA 1002 Statistical and Mathematical Models
http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16
Carlos Fernandez-Granda

Aim
Describe random variables with a few numbers: mean, variance, covariance

Expectation operator
Mean and variance
Covariance
Conditional expectation

Discrete random variables
Average of the values of a function weighted by the pmf:
$E(g(X)) = \sum_{x \in R_X} g(x)\, p_X(x)$
$E(g(X,Y)) = \sum_{x \in R_X} \sum_{y \in R_Y} g(x,y)\, p_{X,Y}(x,y)$
$E(g(\vec{X})) = \sum_{x_1} \sum_{x_2} \cdots \sum_{x_n} g(\vec{x})\, p_{\vec{X}}(\vec{x})$

Continuous random variables
Average of the values of a function weighted by the pdf:
$E(g(X)) = \int_{x=-\infty}^{\infty} g(x)\, f_X(x)\, dx$
$E(g(X,Y)) = \int_{x=-\infty}^{\infty} \int_{y=-\infty}^{\infty} g(x,y)\, f_{X,Y}(x,y)\, dx\, dy$
$E(g(\vec{X})) = \int_{x_1=-\infty}^{\infty} \cdots \int_{x_n=-\infty}^{\infty} g(\vec{x})\, f_{\vec{X}}(\vec{x})\, dx_1 \cdots dx_n$

Discrete and continuous random variables
$E(g(C,D)) = \sum_{d \in R_D} \int_{c=-\infty}^{\infty} g(c,d)\, f_C(c)\, p_{D|C}(d|c)\, dc = \sum_{d \in R_D} p_D(d) \int_{c=-\infty}^{\infty} g(c,d)\, f_{C|D}(c|d)\, dc$
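To make these formulas concrete, here is a minimal Python sketch (not part of the original slides) that evaluates $E(g(X))$ for a discrete pmf as a weighted sum and for a continuous pdf as a Riemann sum; the binomial and uniform examples are assumptions chosen for the demo.

```python
import numpy as np

g = lambda x: x**2

# Discrete: X ~ Binomial(3, 1/2); E(g(X)) = sum over x of g(x) p_X(x)
xs = np.array([0, 1, 2, 3])
pmf = np.array([1, 3, 3, 1]) / 8.0
print(np.sum(g(xs) * pmf))         # 3.0 (= Var + mean^2 = 0.75 + 2.25)

# Continuous: X ~ Uniform[0, 1]; E(g(X)) = integral of g(x) f_X(x) dx
dx = 1e-5
grid = np.arange(dx / 2, 1, dx)    # midpoint grid on [0, 1]
print(np.sum(g(grid) * 1.0) * dx)  # ~ 1/3, with f_X(x) = 1 on [0, 1]
```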

St Petersburg paradox
A casino offers you a game: flip an unbiased coin until it lands on heads. You get $2^k$ dollars, where $k$ = number of flips. Expected gain?

St Petersburg paradox
$E(\text{Gain}) = \sum_{k=1}^{\infty} 2^k \cdot \frac{1}{2^k} = \sum_{k=1}^{\infty} 1 = \infty$
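A simulation makes the divergence tangible. This is a sketch under the game as described (fair coin, payoff $2^k$), not part of the slides: the running average of the payoffs keeps creeping upward instead of settling near any finite value.

```python
import numpy as np

rng = np.random.default_rng(0)

def play_once(rng):
    k = 1
    while rng.random() < 0.5:   # tails with probability 1/2, keep flipping
        k += 1
    return 2.0 ** k             # payoff 2^k, where k = number of flips

payoffs = np.array([play_once(rng) for _ in range(10**6)])
for n in [10**2, 10**4, 10**6]:
    print(n, payoffs[:n].mean())  # averages grow (slowly) without bound
```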

Linearity of expectation
For any constants $a$ and $b$ and any functions $g_1$ and $g_2$,
$E(a\, g_1(X,Y) + b\, g_2(X,Y)) = a\, E(g_1(X,Y)) + b\, E(g_2(X,Y))$
Follows from linearity of sums and integrals

Example: Coffee beans
A company buys coffee beans from two local producers. Beans from Colombia: $C$ tons/year. Beans from Vietnam: $V$ tons/year.
Model: $C$ uniform between 0 and 1; $V$ uniform between 0 and 2; $C$ and $V$ independent.
What is the expected total amount of beans $B$?

Example: Coffee beans
$E(C + V) = E(C) + E(V) = 0.5 + 1 = 1.5$ tons
Holds even if $C$ and $V$ are not independent
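As a sanity check, the following sketch (an assumed example, not from the slides) verifies the computation by simulation, and then repeats it with a $V$ that depends strongly on $C$ to show that linearity does not require independence.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10**6
C = rng.uniform(0, 1, n)        # Colombia: uniform on [0, 1]
V = rng.uniform(0, 2, n)        # Vietnam: uniform on [0, 2], independent of C
print(np.mean(C + V))           # ~ 1.5

V_dep = 2 * C                   # still uniform on [0, 2], but fully dependent on C
print(np.mean(C + V_dep))       # still ~ 0.5 + 1 = 1.5
```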

Independence
If $X$, $Y$ are independent then $E(g(X)\, h(Y)) = E(g(X))\, E(h(Y))$

Independence
$E(g(X)\, h(Y)) = \int_{x=-\infty}^{\infty} \int_{y=-\infty}^{\infty} g(x)\, h(y)\, f_{X,Y}(x,y)\, dx\, dy = \int_{x=-\infty}^{\infty} \int_{y=-\infty}^{\infty} g(x)\, h(y)\, f_X(x)\, f_Y(y)\, dx\, dy = E(g(X))\, E(h(Y))$

Expectation operator
Mean and variance
Covariance
Conditional expectation

Mean
The mean or first moment of $X$ is $E(X)$. It is the center of mass of the distribution.

Bernoulli
$E(X) = 0 \cdot p_X(0) + 1 \cdot p_X(1) = p$

Binomial
A binomial random variable is a sum of $n$ Bernoulli random variables: $X = \sum_{i=1}^n B_i$
$E(X) = E\left(\sum_{i=1}^n B_i\right) = \sum_{i=1}^n E(B_i) = np$

Mean of important random variables

Random variable   Parameters   Mean
Bernoulli         p            p
Geometric         p            1/p
Binomial          n, p         np
Poisson           λ            λ
Uniform           a, b         (a + b)/2
Exponential       λ            1/λ
Gaussian          µ, σ         µ
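The table rows can be checked empirically. This is a quick sketch using numpy's samplers (the parameter values are arbitrary choices for the demo).

```python
import numpy as np

rng = np.random.default_rng(10)
n = 10**6
print(np.mean(rng.geometric(0.2, n)))      # ~ 1/p = 5
print(np.mean(rng.binomial(20, 0.5, n)))   # ~ np = 10
print(np.mean(rng.poisson(25, n)))         # ~ lambda = 25
print(np.mean(rng.exponential(0.5, n)))    # ~ 1/lambda = 0.5 (numpy takes the scale 1/lambda)
print(np.mean(rng.uniform(2, 6, n)))       # ~ (a + b)/2 = 4
```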

Cauchy random variable
[Figure: pdf of the standard Cauchy random variable]
$f_X(x) = \frac{1}{\pi(1+x^2)}$

Cauchy random variable
$E(X) = \int_{-\infty}^{\infty} \frac{x}{\pi(1+x^2)}\, dx = \int_{0}^{\infty} \frac{x}{\pi(1+x^2)}\, dx - \int_{0}^{\infty} \frac{x}{\pi(1+x^2)}\, dx$
With the change of variables $t = x^2$,
$\int_{0}^{\infty} \frac{x}{\pi(1+x^2)}\, dx = \int_{0}^{\infty} \frac{1}{2\pi(1+t)}\, dt = \lim_{t \to \infty} \frac{\log(1+t)}{2\pi} = \infty$
so $E(X)$ has the form $\infty - \infty$: the mean does not exist
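The lack of a mean shows up empirically: sample averages of Cauchy draws never stabilize. A short sketch (not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_cauchy(10**6)   # f_X(x) = 1 / (pi (1 + x^2))
for n in [10**2, 10**4, 10**6]:
    print(n, x[:n].mean())       # erratic values; no convergence as n grows
```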

Mean of a random vector
Vector formed by the means of its components:
$E(\vec{X}) := \begin{bmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_n) \end{bmatrix}$
By linearity of expectation, for any matrix $A \in \mathbb{R}^{m \times n}$ and $\vec{b} \in \mathbb{R}^m$,
$E(A\vec{X} + \vec{b}) = A\, E(\vec{X}) + \vec{b}$

The mean as a typical value
The mean is a typical value of the random variable
The probability that $X$ equals $E(X)$ can be zero
The mean can be severely distorted by a subset of extreme values

Density with subset of extreme values
[Figure: pdf of a uniform random variable $X$ with support $[-4.5, 4.5] \cup [99.5, 100.5]$]

Density with subset of extreme values
$E(X) = \int_{x=-4.5}^{4.5} x\, f_X(x)\, dx + \int_{x=99.5}^{100.5} x\, f_X(x)\, dx = \frac{1}{10} \cdot \frac{100.5^2 - 99.5^2}{2} = 10$

Median
Midpoint of the distribution: a number $m$ such that $P(X \le m) \ge \frac{1}{2}$ and $P(X \ge m) \ge \frac{1}{2}$
For continuous random variables, $F_X(m) = \int_{-\infty}^{m} f_X(x)\, dx = \frac{1}{2}$

Density with subset of extreme values
$F_X(m) = \int_{-4.5}^{m} f_X(x)\, dx = \frac{m + 4.5}{10} = \frac{1}{2} \implies m = 0.5$

Density with subset of extreme values
[Figure: the same pdf with the mean (10) and the median (0.5) marked]
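A simulation sketch of this density (an illustration, not from the slides): draw uniformly over the two pieces of the support and compare the sample mean with the sample median.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10**6
u = rng.uniform(0, 10, n)                    # total support length is 10
# map 9/10 of the mass to [-4.5, 4.5] and the remaining 1/10 to [99.5, 100.5]
x = np.where(u < 9, u - 4.5, u - 9 + 99.5)
print(np.mean(x))     # ~ 10
print(np.median(x))   # ~ 0.5
```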

Variance
The mean square or second moment of $X$ is $E(X^2)$
The variance of $X$ is
$\mathrm{Var}(X) := E\left((X - E(X))^2\right) = E\left(X^2 - 2X\,E(X) + E(X)^2\right) = E(X^2) - E(X)^2$
The standard deviation of $X$ is $\sigma_X := \sqrt{\mathrm{Var}(X)}$

Bernoulli
$E(X^2) = 0 \cdot p_X(0) + 1 \cdot p_X(1) = p$
$\mathrm{Var}(X) = E(X^2) - E(X)^2 = p - p^2 = p(1-p)$

Variance of common random variables

Random variable   Parameters   Variance
Bernoulli         p            p(1 − p)
Geometric         p            (1 − p)/p²
Binomial          n, p         np(1 − p)
Poisson           λ            λ
Uniform           a, b         (b − a)²/12
Exponential       λ            1/λ²
Gaussian          µ, σ         σ²

Geometric ($p = 0.2$): [Figure: pmf]

Binomial ($n = 20$, $p = 0.5$): [Figure: pmf]

Poisson ($\lambda = 25$): [Figure: pmf]

Uniform $[0, 1]$: [Figure: pdf]

Exponential ($\lambda = 1$): [Figure: pdf]

Gaussian ($\mu = 0$, $\sigma = 1$): [Figure: pdf]

Variance
The variance operator is not linear, but
$\mathrm{Var}(aX + b) = E\left((aX + b - E(aX + b))^2\right) = E\left((aX + b - a\,E(X) - b)^2\right) = a^2\, E\left((X - E(X))^2\right) = a^2\, \mathrm{Var}(X)$

Bounding probabilities using expectations
Aim: characterize the behavior of $X$ to some extent using $E(X)$ and $\mathrm{Var}(X)$

Markov's inequality
For any nonnegative random variable $X$ and any $a > 0$,
$P(X \ge a) \le \frac{E(X)}{a}$

Markov's inequality
Consider the indicator variable $1_{X \ge a}$. Since $X$ is nonnegative, $X - a\, 1_{X \ge a} \ge 0$, so
$E(X) \ge a\, E(1_{X \ge a}) = a\, P(X \ge a)$

Age of students at NYU
Mean: 20 years. How many are younger than 30?
$P(A \ge 30) \le \frac{E(A)}{30} = \frac{2}{3}$
So at least 1/3 are younger than 30

Chebyshev's inequality
For any positive constant $a > 0$,
$P(|X - E(X)| \ge a) \le \frac{\mathrm{Var}(X)}{a^2}$
Corollary: if $\mathrm{Var}(X) = 0$ then $P(X \ne E(X)) = 0$, since for any $\epsilon > 0$
$P(|X - E(X)| \ge \epsilon) \le \frac{\mathrm{Var}(X)}{\epsilon^2} = 0$

Chebyshev's inequality
Define $Y := (X - E(X))^2$. By Markov's inequality,
$P(|X - E(X)| \ge a) = P(Y \ge a^2) \le \frac{E(Y)}{a^2} = \frac{\mathrm{Var}(X)}{a^2}$

Age of students at NYU
Mean: 20 years, standard deviation: 3 years. How many are younger than 30?
$P(A \ge 30) \le P(|A - 20| \ge 10) \le \frac{\mathrm{Var}(A)}{100} = \frac{9}{100}$
So at least 91% are younger than 30
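To see how loose or tight these bounds can be, here is a hedged sketch comparing both bounds with an empirical frequency. The gamma age model is an assumption invented for the demo, chosen only to match the stated mean of 20 and standard deviation of 3.

```python
import numpy as np

rng = np.random.default_rng(4)
mean, sd = 20.0, 3.0
shape, scale = (mean / sd) ** 2, sd**2 / mean   # gamma with the given mean and sd
ages = rng.gamma(shape, scale, 10**6)           # assumed model, not from the slides

print("empirical P(A >= 30):", np.mean(ages >= 30))
print("Markov bound:        ", mean / 30)                  # 2/3
print("Chebyshev bound:     ", sd**2 / (30 - mean) ** 2)   # 9/100
```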

Expectation operator Mean and variance Covariance Conditional expectation

Covariance
The covariance of $X$ and $Y$ is
$\mathrm{Cov}(X,Y) := E\left((X - E(X))(Y - E(Y))\right) = E\left(XY - Y\,E(X) - X\,E(Y) + E(X)\,E(Y)\right) = E(XY) - E(X)\,E(Y)$
If $\mathrm{Cov}(X,Y) = 0$, $X$ and $Y$ are uncorrelated

Covariance
[Figure: scatterplots of samples with $\mathrm{Cov}(X,Y) = 0.5$, $0.9$, $0.99$ (top row) and $0$, $-0.9$, $-0.99$ (bottom row)]

Variance of the sum
$\mathrm{Var}(X+Y) = E\left((X + Y - E(X+Y))^2\right) = E\left((X - E(X))^2\right) + E\left((Y - E(Y))^2\right) + 2\,E\left((X - E(X))(Y - E(Y))\right) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X,Y)$
If $X$ and $Y$ are uncorrelated, then $\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$

Independence implies uncorrelation
$\mathrm{Cov}(X,Y) = E(XY) - E(X)\,E(Y) = E(X)\,E(Y) - E(X)\,E(Y) = 0$

Uncorrelation does not imply independence
$X$, $Y$ are independent Bernoulli random variables with parameter $\frac{1}{2}$. Let $U = X + Y$ and $V = X - Y$. Are $U$ and $V$ independent? Are they uncorrelated?

Uncorrelation does not imply independence
$p_U(0) = P(X=0, Y=0) = \frac{1}{4}$
$p_V(0) = P(X=1, Y=1) + P(X=0, Y=0) = \frac{1}{2}$
$p_{U,V}(0,0) = P(X=0, Y=0) = \frac{1}{4}$
$p_U(0)\, p_V(0) = \frac{1}{8} \ne p_{U,V}(0,0)$, so $U$ and $V$ are not independent

Uncorrelation does not imply independence
$\mathrm{Cov}(U,V) = E(UV) - E(U)\,E(V) = E((X+Y)(X-Y)) - E(X+Y)\,E(X-Y) = E(X^2) - E(Y^2) - E(X)^2 + E(Y)^2 = 0$
So $U$ and $V$ are uncorrelated but not independent
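The following sketch (an assumed example, not from the slides) confirms both facts numerically: the sample covariance of $(U, V)$ is near zero, while the joint pmf visibly fails to factorize.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10**6
X = rng.integers(0, 2, n)   # Bernoulli(1/2)
Y = rng.integers(0, 2, n)
U, V = X + Y, X - Y

print(np.cov(U, V)[0, 1])                    # ~ 0: uncorrelated
print(np.mean((U == 0) & (V == 0)))          # ~ 1/4 = p_{U,V}(0,0)
print(np.mean(U == 0) * np.mean(V == 0))     # ~ 1/8: joint pmf does not factorize
```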

Correlation coefficient
The Pearson correlation coefficient of $X$ and $Y$ is
$\rho_{X,Y} := \frac{\mathrm{Cov}(X,Y)}{\sigma_X\, \sigma_Y}$
It equals the covariance between $X/\sigma_X$ and $Y/\sigma_Y$

Correlation coefficient
[Figure: scatterplots with $\sigma_Y = 1$, $\mathrm{Cov}(X,Y) = 0.9$, $\rho_{X,Y} = 0.9$; $\sigma_Y = 3$, $\mathrm{Cov}(X,Y) = 0.9$, $\rho_{X,Y} = 0.3$; and $\sigma_Y = 3$, $\mathrm{Cov}(X,Y) = 2.7$, $\rho_{X,Y} = 0.9$]
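The figure's point can be reproduced in a few lines. In this sketch (an assumed construction, not from the slides), rescaling $Y$ by 3 triples the covariance but leaves $\rho_{X,Y}$ at 0.9.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10**6
X = rng.standard_normal(n)
Y = 0.9 * X + np.sqrt(1 - 0.81) * rng.standard_normal(n)  # sigma_Y = 1, rho = 0.9

for scale in [1.0, 3.0]:
    Ys = scale * Y
    cov = np.cov(X, Ys)[0, 1]          # 0.9, then 2.7
    rho = np.corrcoef(X, Ys)[0, 1]     # 0.9 both times
    print(scale, round(cov, 2), round(rho, 2))
```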

Cauchy-Schwarz inequality
For any $X$ and $Y$,
$|E(XY)| \le \sqrt{E(X^2)}\, \sqrt{E(Y^2)}$
with $E(XY) = \sqrt{E(X^2)\,E(Y^2)}$ if and only if $Y = \sqrt{\frac{E(Y^2)}{E(X^2)}}\, X$, and $E(XY) = -\sqrt{E(X^2)\,E(Y^2)}$ if and only if $Y = -\sqrt{\frac{E(Y^2)}{E(X^2)}}\, X$

Cauchy-Schwarz inequality
We have $|\mathrm{Cov}(X,Y)| \le \sigma_X\, \sigma_Y$, or equivalently $|\rho_{X,Y}| \le 1$
In addition, $|\rho_{X,Y}| = 1$ if and only if $Y = cX + d$, where
$c := \begin{cases} \frac{\sigma_Y}{\sigma_X} & \text{if } \rho_{X,Y} = 1 \\ -\frac{\sigma_Y}{\sigma_X} & \text{if } \rho_{X,Y} = -1 \end{cases}, \quad d := E(Y) - c\,E(X)$

Covariance matrix of a random vector
The covariance matrix of $\vec{X}$ is defined as
$\Sigma_{\vec{X}} = \begin{bmatrix} \mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_n) \\ \mathrm{Cov}(X_2, X_1) & \mathrm{Var}(X_2) & \cdots & \mathrm{Cov}(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_n, X_1) & \mathrm{Cov}(X_n, X_2) & \cdots & \mathrm{Var}(X_n) \end{bmatrix} = E\left(\vec{X}\vec{X}^T\right) - E(\vec{X})\, E(\vec{X})^T$

Covariance matrix after a linear transformation
$\Sigma_{A\vec{X} + \vec{b}} = E\left((A\vec{X} + \vec{b})(A\vec{X} + \vec{b})^T\right) - E(A\vec{X} + \vec{b})\, E(A\vec{X} + \vec{b})^T$
$= A\,E(\vec{X}\vec{X}^T)A^T + \vec{b}\,E(\vec{X})^T A^T + A\,E(\vec{X})\,\vec{b}^T + \vec{b}\vec{b}^T - A\,E(\vec{X})\,E(\vec{X})^T A^T - A\,E(\vec{X})\,\vec{b}^T - \vec{b}\,E(\vec{X})^T A^T - \vec{b}\vec{b}^T$
$= A\left(E(\vec{X}\vec{X}^T) - E(\vec{X})\,E(\vec{X})^T\right) A^T = A\, \Sigma_{\vec{X}}\, A^T$

Variance in a fixed direction
For any unit vector $\vec{u}$,
$\mathrm{Var}(\vec{u}^T \vec{X}) = \vec{u}^T\, \Sigma_{\vec{X}}\, \vec{u}$

Direction of maximum variance
To find the direction of maximum variance we must solve
$\arg\max_{\|\vec{u}\|_2 = 1} \vec{u}^T\, \Sigma_{\vec{X}}\, \vec{u}$

Linear algebra
Symmetric matrices have orthogonal eigenvectors:
$\Sigma_{\vec{X}} = U \Lambda U^T = \begin{bmatrix} \vec{u}_1 & \vec{u}_2 & \cdots & \vec{u}_n \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \begin{bmatrix} \vec{u}_1 & \vec{u}_2 & \cdots & \vec{u}_n \end{bmatrix}^T$

Linear algebra
For a symmetric matrix $A$ with eigendecomposition as above,
$\lambda_1 = \max_{\|\vec{u}\|_2=1} \vec{u}^T A \vec{u}, \quad \vec{u}_1 = \arg\max_{\|\vec{u}\|_2=1} \vec{u}^T A \vec{u}$
$\lambda_k = \max_{\|\vec{u}\|_2=1,\ \vec{u} \perp \vec{u}_1, \ldots, \vec{u}_{k-1}} \vec{u}^T A \vec{u}, \quad \vec{u}_k = \arg\max_{\|\vec{u}\|_2=1,\ \vec{u} \perp \vec{u}_1, \ldots, \vec{u}_{k-1}} \vec{u}^T A \vec{u}$

Direction of maximum variance
[Figure: scatterplots with principal directions for $\lambda_1 = 1.22$, $\lambda_2 = 0.71$; $\lambda_1 = 1$, $\lambda_2 = 1$; and $\lambda_1 = 1.38$, $\lambda_2 = 0.32$]

Whitening
Let $\Sigma_{\vec{X}} = U \Lambda U^T$ be full rank. All the entries of $\Lambda^{-1/2} U^T \vec{X}$, where
$\Lambda^{-1/2} := \begin{bmatrix} \frac{1}{\sqrt{\lambda_1}} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sqrt{\lambda_2}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\sqrt{\lambda_n}} \end{bmatrix}$,
are uncorrelated

Whitening
$\Sigma_{\Lambda^{-1/2} U^T \vec{X}} = \Lambda^{-1/2} U^T \Sigma_{\vec{X}}\, U \Lambda^{-1/2} = \Lambda^{-1/2} U^T U \Lambda U^T U \Lambda^{-1/2} = \Lambda^{-1/2} \Lambda \Lambda^{-1/2} = I$ because $U^T U = I$
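A minimal numpy sketch of this computation (an illustration, not from the slides): estimate $\Sigma_{\vec{X}}$ from samples, form $\Lambda^{-1/2} U^T$, and check that the whitened samples have identity covariance. The specific $\Sigma$ is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(7)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
X = rng.multivariate_normal([0, 0], Sigma, 10**5)   # rows are samples

lam, U = np.linalg.eigh(np.cov(X.T))   # Sigma_X = U diag(lam) U^T
W = np.diag(1 / np.sqrt(lam)) @ U.T    # Lambda^{-1/2} U^T
Z = X @ W.T                            # whitened samples

print(np.cov(Z.T).round(2))            # ~ identity matrix
```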

Whitening
[Figure: scatterplots of samples of $\vec{X}$, $U^T \vec{X}$, and $\Lambda^{-1/2} U^T \vec{X}$]

For Gaussian random variables, uncorrelation implies mutual independence
Uncorrelation implies
$\Sigma_{\vec{X}} = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix}$
which in turn implies
$f_{\vec{X}}(\vec{x}) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \exp\left(-\frac{1}{2}(\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu})\right) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right) = \prod_{i=1}^n f_{X_i}(x_i)$

Expectation operator
Mean and variance
Covariance
Conditional expectation

Conditional expectation
Expectation of $g(X,Y)$ given $X = x$:
$E(g(X,Y) \mid X = x) = \int_{y=-\infty}^{\infty} g(x,y)\, f_{Y|X}(y|x)\, dy$
This can be interpreted as a function $h(x) := E(g(X,Y) \mid X = x)$
The conditional expectation of $g(X,Y)$ given $X$ is $E(g(X,Y) \mid X) := h(X)$. It is a random variable

Iterated expectation
For any $X$ and $Y$ and any function $g: \mathbb{R}^2 \to \mathbb{R}$,
$E(g(X,Y)) = E(E(g(X,Y) \mid X))$

Iterated expectation
$h(x) := E(g(X,Y) \mid X = x) = \int_{y=-\infty}^{\infty} g(x,y)\, f_{Y|X}(y|x)\, dy$
$E(E(g(X,Y) \mid X)) = E(h(X)) = \int_{x=-\infty}^{\infty} h(x)\, f_X(x)\, dx = \int_{x=-\infty}^{\infty} \int_{y=-\infty}^{\infty} f_X(x)\, f_{Y|X}(y|x)\, g(x,y)\, dy\, dx = E(g(X,Y))$

Example: Desert
A car is traveling through the desert. Time until the car breaks down: $T$. State of the motor: $M$. State of the road: $R$.
Model: $M$ uniform between 0 (no problem) and 1 (very bad); $R$ uniform between 0 (no problem) and 1 (very bad); $M$ and $R$ independent; $T$ exponential with parameter $M + R$

Example: Desert
$E(T) = E(E(T \mid M, R)) = E\left(\frac{1}{M+R}\right) = \int_0^1 \int_0^1 \frac{1}{m+r}\, dm\, dr = \int_0^1 \left(\log(r+1) - \log(r)\right) dr = \log 4 \approx 1.39$
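A Monte Carlo sketch of the same computation (not from the slides): sample $(M, R)$, then $T$ given $(M, R)$, and compare the empirical mean with $\log 4$.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 10**6
M = rng.uniform(0, 1, n)
R = rng.uniform(0, 1, n)
T = rng.exponential(1.0 / (M + R))   # numpy takes the scale 1/(M + R), i.e. rate M + R

print(np.mean(T), np.log(4))         # both ~ 1.386
```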

Grizzlies in Yellowstone
Model for the weight of grizzly bears in Yellowstone:
Males: Gaussian with $\mu := 240$ kg and $\sigma := 40$ kg
Females: Gaussian with $\mu := 140$ kg and $\sigma := 20$ kg
There are about the same number of females and males

Grizzlies in Yellowstone
$E(W) = E(E(W \mid S)) = \frac{E(W \mid S = \text{male}) + E(W \mid S = \text{female})}{2} = \frac{240 + 140}{2} = 190$ kg

Bayesian coin flip
Bayesian methods often endow the parameters of discrete distributions with a continuous marginal distribution. You suspect a coin is biased. You are uncertain about the bias, so you model it as a random variable $B$ with pdf
$f_B(b) = 2b$ for $b \in [0, 1]$
What is the expected value of the coin flip $X$?

Bayesian coin flip
$E(X) = E(E(X \mid B)) = E(B) = \int_0^1 2b^2\, db = \frac{2}{3}$
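A Monte Carlo sketch (not from the slides): sample $B$ by inverting its cdf $F_B(b) = b^2$, flip the coin with that bias, and compare with $2/3$.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 10**6
B = np.sqrt(rng.uniform(0, 1, n))          # F_B(b) = b^2, so B = sqrt(U) has pdf 2b
X = (rng.uniform(0, 1, n) < B).astype(float)  # flip with probability of heads B

print(np.mean(X))                          # ~ 2/3
```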