STAT:5100 (22S:193) Statistical Inference I

Week 10. Luke Tierney, University of Iowa, Fall 2015.

Monday, October 26, 2015. Recap

- Multivariate normal distribution
- Linear combinations of random variables
- Copula models
- Conditional distributions
- Conditional PMFs and PDFs
- Conditional expectation
- Law of total expectation

Monday, October 26, 2015. Conditional Expectations

Some useful properties of conditional expectation:

E[g(Y) + X | Y] = g(Y) + E[X | Y]
E[g(Y) | Y] = g(Y)
E[g(Y) X | Y] = g(Y) E[X | Y]

This shows that E[g(Y) X] = E[g(Y) E[X | Y]], since

E[g(Y) X] = E[E[g(Y) X | Y]] = E[g(Y) E[X | Y]].

This property can be taken as the definition of E[X | Y].

Monday, October 26, 2015. Conditional Expectations

A formal definition of conditional expectation:

Definition. Let X, Y be random variables, and assume that E[|X|] < ∞. A conditional expectation E[X | Y] of X given Y is any random variable Z such that

Z = h(Y) for some measurable function h,

and

E[X g(Y)] = E[Z g(Y)] for all bounded measurable functions g.

Theorem. A version of the conditional expectation E[X | Y] exists, and any two versions are almost surely equal.

Monday, October 26, 2015. Conditional Expectations

A useful consequence of the properties of conditional expectation:

Theorem. Suppose X and Y are random variables and g is a function such that E[X^2] < ∞ and E[g(Y)^2] < ∞. Then

E[(X - g(Y))^2] = E[(X - E[X | Y])^2] + E[(E[X | Y] - g(Y))^2].

Notes:
- E[(X - g(Y))^2] is the mean squared error for using g(Y) to predict X.
- The function of Y with the smallest mean square prediction error is E[X | Y].

Monday, October 26, 2015. Conditional Expectations

Proof.

E[(X - g(Y))^2] = E[(X - E[X | Y] + E[X | Y] - g(Y))^2]
               = E[(X - E[X | Y])^2] + E[(E[X | Y] - g(Y))^2]
                 + 2 E[(X - E[X | Y])(E[X | Y] - g(Y))]

To complete the proof we need to show that the cross product term is zero.

Monday, October 26, 2015. Conditional Expectations

Proof (continued). For any function h with E[h(Y)^2] < ∞,

E[(X - E[X | Y]) h(Y)] = E[E[(X - E[X | Y]) h(Y) | Y]] = E[h(Y) E[X - E[X | Y] | Y]],

and

E[X - E[X | Y] | Y] = E[X | Y] - E[X | Y] = 0.

So

E[(X - E[X | Y]) h(Y)] = 0.

The cross product expectation corresponds to

h(Y) = E[X | Y] - g(Y)

and is therefore zero.

Monday, October 26, 2015. Conditional Expectations

Corollary. If X, Y are random variables with E[X^2] < ∞, then

Var(X) = E[Var(X | Y)] + Var(E[X | Y]).

Proof. Take g(Y) = E[X]:

Var(X) = E[(X - E[X])^2]
       = E[(X - E[X | Y])^2] + E[(E[X | Y] - E[X])^2]
       = E[E[(X - E[X | Y])^2 | Y]] + E[(E[X | Y] - E[E[X | Y]])^2]
       = E[Var(X | Y)] + Var(E[X | Y])

Monday, October 26, 2015. Conditional Expectations

Example. Consider again the example of N customers making total purchases

S = ∑_{i=1}^N X_i.

Suppose the independent purchase amounts have finite variance σ². The conditional variance of S given N = n is

Var(S | N = n) = Var(∑_{i=1}^N X_i | N = n)
              = Var(∑_{i=1}^n X_i | N = n)   (fix the upper limit)
              = Var(∑_{i=1}^n X_i)           (independence of the X_i and N)
              = n σ²                          (mutual independence of the X_i)

Monday, October 26, 2015. Conditional Expectations

Example (continued). The conditional variance of S given N is the random variable

Var(S | N) = N σ².

The variance of S is therefore (with μ the mean purchase amount and N ~ Poisson(λ), as in the earlier example)

Var(S) = E[Var(S | N)] + Var(E[S | N])
       = E[N σ²] + Var(N μ)
       = E[N] σ² + Var(N) μ²
       = λ σ² + λ μ²
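
As a quick check of this calculation, here is a small simulation sketch; the choice of Exponential purchase amounts with mean μ (so that σ² = μ²) is an assumption made only for illustration.

## Simulation sketch (illustration only): N ~ Poisson(lambda), purchase amounts
## assumed Exponential with mean mu, so sigma^2 = mu^2.
set.seed(1)
lambda <- 4; mu <- 10
S <- replicate(100000, sum(rexp(rpois(1, lambda), rate = 1 / mu)))
c(simulated = var(S), theory = lambda * (mu^2 + mu^2))   # lambda * (sigma^2 + mu^2)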

Monday, October 26, 2015. Conditional Expectations

Corollary. Let X, Y be random variables with E[X^2] < ∞. Then for any g such that E[g(Y)^2] < ∞,

E[(X - g(Y))^2] ≥ E[(X - E[X | Y])^2].

Proof. From the theorem,

E[(X - g(Y))^2] = E[(X - E[X | Y])^2] + E[(E[X | Y] - g(Y))^2] ≥ E[(X - E[X | Y])^2].

Monday, October 26, 2015. Conditional Expectations

Conditional Expectation and Orthogonality

A distance among random variables can be defined as

‖X - Y‖ = sqrt(E[(X - Y)^2]).

In terms of this distance, E[X | Y] is the closest function of Y to X. Put another way: among all functions of Y, the function with the lowest mean squared error for predicting X is E[X | Y].

The result that E[(X - E[X | Y]) h(Y)] = 0 for all h can be interpreted as: the prediction error X - E[X | Y] is orthogonal to the set of all functions of Y.

E[X | Y] can be viewed as the orthogonal projection of X onto the set of all random variables that are functions of Y.

There are strong parallels to least squares fitting in regression analysis.
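
A numerical illustration of this orthogonality property (a sketch only; the standard bivariate normal pair with correlation rho, for which E[X | Y] = rho Y, and the particular h are assumptions chosen just for the check):

## Check that E[(X - E[X|Y]) h(Y)] is approximately 0 for an arbitrary h.
set.seed(1)
rho <- 0.6
Y <- rnorm(1e6)
X <- rho * Y + sqrt(1 - rho^2) * rnorm(1e6)   # standard bivariate normal pair
ex_given_y <- rho * Y                          # closed-form E[X | Y]
h <- function(y) sin(y) + y^2                  # an arbitrary square-integrable h
mean((X - ex_given_y) * h(Y))                  # approximately 0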

Monday, October 26, 2015. Hierarchical Models

Hierarchical Models

Often it is useful to build up models from conditional distributions.

Example. Each customer visiting a store buys something with probability p. Customers make their purchasing decisions independently. Let

X = number who buy something
N = number who come to the store

Then X | N ~ Binomial(N, p).

Monday, October 26, 2015. Hierarchical Models

Example (continued). Suppose the number of customers who arrive in a given period has a Poisson(λ) distribution. This is a two-stage hierarchical model:

X | N ~ Binomial(N, p)
N ~ Poisson(λ)

If the store is chosen at random, then λ would vary from store to store. This produces a three-stage hierarchical model:

X | N, λ ~ Binomial(N, p)
N | λ ~ Poisson(λ)
λ ~ f

p might also vary from store to store.

Monday, October 26, 2015. Hierarchical Models

Example (continued). The marginal distribution of X is a mixture distribution. In the two-stage model, X is a Poisson mixture of binomials:

f_X(x) = P(X = x) = ∑_n P(X = x, N = n) = ∑_n P(X = x | N = n) P(N = n)
       = ∑_{n=x}^∞ (n choose x) p^x (1 - p)^{n-x} (λ^n / n!) e^{-λ}
       = ((pλ)^x / x!) e^{-λ} ∑_{n=x}^∞ [(1 - p)λ]^{n-x} / (n - x)!
       = ((pλ)^x / x!) e^{-λ + (1-p)λ}
       = ((pλ)^x / x!) e^{-pλ}

So X ~ Poisson(pλ).
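
A minimal simulation sketch of this result (the values of λ and p are arbitrary):

## X | N ~ Binomial(N, p) with N ~ Poisson(lambda) should give X ~ Poisson(p * lambda).
set.seed(1)
lambda <- 6; p <- 0.3
N <- rpois(1e6, lambda)
X <- rbinom(length(N), size = N, prob = p)
rbind(simulated = as.numeric(table(factor(X, levels = 0:5))) / length(X),
      poisson   = dpois(0:5, p * lambda))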

Monday, October 26, 2015. Hierarchical Models

Example (continued). With a third stage,

P(X = x) = ∫_0^∞ P(X = x | λ) f(λ) dλ = ∫_0^∞ ((pλ)^x / x!) e^{-pλ} f(λ) dλ.

What forms of f are both flexible and convenient? A Gamma distribution is a natural choice:

f(λ) = (1 / (Γ(α) β^α)) λ^{α-1} e^{-λ/β} 1_{(0,∞)}(λ)

Monday, October 26, 2015. Hierarchical Models

Example (continued). Then the marginal PMF of X is

P(X = x) = ∫_0^∞ ((pλ)^x / x!) e^{-pλ} (1 / (Γ(α) β^α)) λ^{α-1} e^{-λ/β} dλ
         = (p^x / (x! Γ(α) β^α)) ∫_0^∞ λ^{x+α-1} e^{-λ(p + 1/β)} dλ
         = (p^x / (x! Γ(α) β^α)) Γ(x + α) / (p + 1/β)^{x+α}
         = (Γ(x + α) / (Γ(α) x!)) (p / (p + 1/β))^x ((1/β) / (p + 1/β))^α

For α a positive integer this is a negative binomial distribution. For non-integer α this is also called a negative binomial distribution; this can be taken as the definition.
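
A quick numerical check of this PMF against R's dnbinom, with size = α and prob = (1/β)/(p + 1/β) (the parameter values here are arbitrary):

## Compare the derived marginal PMF with dnbinom.
alpha <- 2.5; beta <- 3; p <- 0.4
x <- 0:5
prob <- (1 / beta) / (p + 1 / beta)
pmf <- gamma(x + alpha) / (gamma(alpha) * factorial(x)) * (1 - prob)^x * prob^alpha
rbind(derived = pmf, dnbinom = dnbinom(x, size = alpha, prob = prob))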

Wednesday, October 28, 2015. Recap

- Properties of conditional expectation
- Variance decomposition
- Geometry of conditional expectation
- Hierarchical models
- Poisson mixture of binomial distributions
- Gamma mixture of Poisson distributions

Wednesday, October 28, 2015. Hierarchical Models

Example. Suppose we observe N = n customers visiting the store. What is the conditional distribution of the store's λ value?

The joint density/PMF of λ and N is

f(λ, n) = f_{N|λ}(n | λ) f(λ)
        = (λ^n / n!) e^{-λ} (1 / (Γ(α) β^α)) λ^{α-1} e^{-λ/β}
        = (λ^{n+α-1} / (n! Γ(α) β^α)) e^{-λ(1 + 1/β)}.

The conditional density of λ given N = n is therefore

f_{λ|N}(λ | n) = f(λ, n) / f_N(n) ∝ λ^{n+α-1} e^{-λ(1 + 1/β)} = λ^{n+α-1} e^{-λ(β+1)/β}.

This corresponds to a Gamma(n + α, β/(1 + β)) distribution (shape n + α, scale β/(1 + β)).
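
A simulation sketch of this result (the parameter values and the observed count n are arbitrary): draw (λ, N) pairs from the two-stage model, keep those with N = n, and compare the conditional mean with the Gamma(n + α, β/(1 + β)) mean (n + α)β/(1 + β).

## Conditional distribution of lambda given N = n in the Gamma-Poisson model.
set.seed(1)
alpha <- 2; beta <- 3; n <- 5
lam <- rgamma(1e6, shape = alpha, scale = beta)
N <- rpois(length(lam), lam)
c(simulated_mean = mean(lam[N == n]),
  theory_mean    = (n + alpha) * beta / (1 + beta))   # Gamma mean = shape * scale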

Wednesday, October 28, 2015. Hierarchical Models

Example. A poll of size n is to be taken to see whether voters prefer Candidate A or Candidate B. Assuming sampling with replacement, the number X in the sample who favor A is Binomial(n, p), with p the population proportion who favor A.

The race is close, and we are fairly confident that p is between 0.4 and 0.6. We can capture this by thinking of p as a random variable with a distribution that puts about 90% probability between 0.4 and 0.6. This is our prior probability distribution on p.

Once we have collected our data we can compute a posterior probability like P(p > 0.5 | data). This is an example of Bayesian inference.

Wednesday, October 28, 2015. Hierarchical Models

Example (continued). A convenient form for the prior distribution is a Beta(α, β) distribution. The joint density/PMF of X, p is then of the form

f(x, p) = f_{X|p}(x | p) f(p)
        = (n choose x) p^x (1 - p)^{n-x} (1 / B(α, β)) p^{α-1} (1 - p)^{β-1}.

The posterior density of p given X = x is

f(p | x) = f(x, p) / f_X(x) ∝ p^{x+α-1} (1 - p)^{n-x+β-1}.

This is the density of a Beta(x + α, n - x + β) distribution.

Wednesday, October 28, 2015. Hierarchical Models

Example (continued). A Beta distribution with α = β = 33 is symmetric about 0.5 and assigns approximately 0.9 probability to the interval between 0.4 and 0.6.

Results for some possible sample sizes and observed counts:

x       n       x/n     P(p > 0.5 | data)
15      20      0.75    0.86
115     200     0.575   0.97
1,046   2,000   0.523   0.98

All three scenarios produce approximately the same p-value for testing H_0: p ≤ 0.5 against H_1: p > 0.5.

Some plots:

p <- seq(0, 1, len = 101)
plot(p, dbeta(p, 33, 33), type = "l", ylim = c(0, 40))
abline(v = 0.5, lty = 2)
lines(p, dbeta(p, 33 + 15, 33 + 20 - 15), col = "red")
lines(p, dbeta(p, 33 + 115, 33 + 200 - 115), col = "forestgreen")
lines(p, dbeta(p, 33 + 1046, 33 + 2000 - 1046), col = "blue")
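
The posterior probabilities in the table can be reproduced directly with pbeta:

## P(p > 0.5 | data) for the Beta(33, 33) prior.
post <- function(x, n) 1 - pbeta(0.5, 33 + x, 33 + n - x)
round(c(post(15, 20), post(115, 200), post(1046, 2000)), 2)   # 0.86, 0.97, 0.98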

Wednesday, October 28, 2015. Hierarchical Models

Example. Suppose X, Y are bivariate normal with standard normal marginals and correlation ρ. The joint density is

f(x, y) = (1 / (2π √(1 - ρ²))) exp{ -(x² + y² - 2ρxy) / (2(1 - ρ²)) }.

The conditional density of Y given X = x is

f_{Y|X}(y | x) ∝ exp{ -(y² - 2ρxy) / (2(1 - ρ²)) }.

This is a N(ρx, 1 - ρ²) density. For a general bivariate normal distribution,

Y | X = x ~ N( μ_Y + ρ σ_Y (x - μ_X) / σ_X, σ_Y² (1 - ρ²) ).

Wednesday, October 28, 2015. Hierarchical Models

Example. Suppose X_1, X_2, given M = m, are independent N(m, 1), and the marginal distribution of M is N(0, δ²). Then X_1, X_2 are bivariate normal with

E[X_1] = E[X_2] = E[E[X_1 | M]] = E[M] = 0
Var(X_1) = Var(X_2) = E[Var(X_1 | M)] + Var(E[X_1 | M]) = 1 + Var(M) = 1 + δ².

The covariance of X_1, X_2 is

Cov(X_1, X_2) = E[X_1 X_2] = E[E[X_1 X_2 | M]] = E[E[X_1 | M] E[X_2 | M]] = E[M²] = δ².

So the correlation is

ρ = δ² / (1 + δ²).
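
A brief simulation sketch of this hierarchical normal model (the value of δ is arbitrary):

## Correlation of X1, X2 when X1, X2 | M are iid N(M, 1) and M ~ N(0, delta^2).
set.seed(1)
delta <- 2
M <- rnorm(1e6, 0, delta)
X1 <- rnorm(length(M), mean = M, sd = 1)
X2 <- rnorm(length(M), mean = M, sd = 1)
c(simulated = cor(X1, X2), theory = delta^2 / (1 + delta^2))   # 4/5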

Wednesday, October 28, 2015. Change of Variables

Change of Variables

Suppose X, Y are jointly continuous on a set A. Let U, V be defined as

U = g_1(X, Y)
V = g_2(X, Y)

The image of A under the transformation g is

B = {(u, v) : u = g_1(x, y) and v = g_2(x, y) for some (x, y) ∈ A}.

Assume g = (g_1, g_2) is one-to-one on A. Then g has an inverse h = (h_1, h_2) defined on B, and

X = h_1(U, V)
Y = h_2(U, V)

Wednesday, October 28, 2015. Change of Variables

A small rectangle [u, u + du] × [v, v + dv] has area du dv. The image of this rectangle under h is (approximately) a parallelogram.

[Figure: a small rectangle in the (u, v) plane and its approximately parallelogram-shaped image in the (x, y) plane.]

Wednesday, October 28, 2015. Change of Variables

The area of this parallelogram is approximately |J(u, v)| du dv, where

J(u, v) = ∂(x, y)/∂(u, v) = det( ∂x/∂u  ∂x/∂v
                                 ∂y/∂u  ∂y/∂v )
        = (∂x/∂u)(∂y/∂v) - (∂x/∂v)(∂y/∂u).

J is called the Jacobian determinant of the transformation. This generalizes to three or more variables in the obvious way.

Wednesday, October 28, 2015. Change of Variables

The density of U, V for (u, v) ∈ B can be derived from

f_{U,V}(u, v) du dv = f_{X,Y}(x, y) dx dy.

Then

f_{U,V}(u, v) = f_{X,Y}(x, y) |dx dy / du dv| = f_{X,Y}(x, y) |J(u, v)|.

Alternatively,

f_{U,V}(u, v) = f_{X,Y}(h_1(u, v), h_2(u, v)) |J(u, v)|,

where

J(u, v) = det( ∂h_1(u, v)/∂u  ∂h_1(u, v)/∂v
               ∂h_2(u, v)/∂u  ∂h_2(u, v)/∂v ).

Friday, October 30, 2015. Recap

- Simple Bayesian inference examples
- Gaussian hierarchical models
- Change of variables for jointly continuous random variables

Friday, October 30, 2015. Change of Variables

Example. Let X, Y be independent with

X ~ Gamma(α, 1)
Y ~ Gamma(β, 1)

Define

U = X / (X + Y)
V = X + Y

Then

A = (0, ∞) × (0, ∞)
B = (0, 1) × (0, ∞).

Friday, October 30, 2015. Change of Variables

Example (continued). The inverse transformation is

x = h_1(u, v) = uv
y = h_2(u, v) = v - uv = (1 - u)v

The Jacobian is

J(u, v) = det(  v      u
               -v   1 - u ) = v(1 - u) + vu = v.

Friday, October 30, 2015. Change of Variables

Example (continued). The joint density of U and V for 0 < u < 1 and v > 0 is therefore

f_{U,V}(u, v) = f_X(uv) f_Y((1 - u)v) v
             = (1 / Γ(α)) (uv)^{α-1} e^{-uv} (1 / Γ(β)) ((1 - u)v)^{β-1} e^{-(1-u)v} v
             = (1 / (Γ(α)Γ(β))) u^{α-1} (1 - u)^{β-1} v^{α+β-1} e^{-v}.

Incorporating indicator functions for B gives

f_{U,V}(u, v) = (1 / (Γ(α)Γ(β))) [u^{α-1} (1 - u)^{β-1} 1_{(0,1)}(u)] [v^{α+β-1} e^{-v} 1_{(0,∞)}(v)].

So U, V are independent with

U ~ Beta(α, β)
V ~ Gamma(α + β, 1)
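
A simulation sketch of this factorization (α and β are arbitrary):

## U = X/(X+Y) and V = X+Y should be independent Beta(alpha, beta) and Gamma(alpha+beta, 1).
set.seed(1)
alpha <- 2; beta <- 5
X <- rgamma(1e5, shape = alpha, rate = 1)
Y <- rgamma(1e5, shape = beta, rate = 1)
U <- X / (X + Y); V <- X + Y
c(cor_UV = cor(U, V),                                     # approximately 0
  ks_U   = ks.test(U, "pbeta", alpha, beta)$p.value,      # U vs. Beta(alpha, beta)
  ks_V   = ks.test(V, "pgamma", alpha + beta, 1)$p.value) # V vs. Gamma(alpha+beta, 1)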

Friday, October 30, 2015. Change of Variables

Example. Suppose X, Y are uniformly distributed on the region

A = {(x, y) : -1 ≤ x ≤ 1, -1 ≤ y ≤ 1}.

The joint density of X, Y is

f_{X,Y}(x, y) = 1/4 if -1 ≤ x ≤ 1 and -1 ≤ y ≤ 1, and 0 otherwise.

Let

U = X + Y
V = X - Y.

Friday, October 30, 2015. Change of Variables

Example (continued). The inverse transformation is

x = (u + v)/2
y = (u - v)/2.

The range of the transformation is

B = {(u, v) : -2 ≤ u + v ≤ 2, -2 ≤ u - v ≤ 2}.

The Jacobian determinant of the inverse transformation is

J(u, v) = det( 1/2   1/2
               1/2  -1/2 ) = -1/4 - 1/4 = -1/2.

Friday, October 30, 2015. Change of Variables

Example (continued). The joint density of U, V is therefore

f_{U,V}(u, v) = f_{X,Y}((u + v)/2, (u - v)/2) |J(u, v)|
             = 1/8 if -2 ≤ u + v ≤ 2 and -2 ≤ u - v ≤ 2, and 0 otherwise.

This is a uniform distribution on the square B.

[Figure: the square B in the (u, v) plane, bounded by the lines u + v = ±2 and u - v = ±2.]

Friday, October 30, 2015. Change of Variables

Example (continued). We can compute the marginal density of U by integrating out v from the joint density:

f_U(u) = ∫ f_{U,V}(u, v) dv
       = ∫_{-(2+u)}^{2+u} (1/8) dv = (1/8) 2(2 + u) = (2 + u)/4   if -2 ≤ u ≤ 0
       = ∫_{-(2-u)}^{2-u} (1/8) dv = (1/8) 2(2 - u) = (2 - u)/4   if 0 ≤ u ≤ 2
       = 0                                                         otherwise

so

f_U(u) = (2 - |u|)/4 if |u| ≤ 2, and 0 otherwise.
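
A quick simulation sketch of this triangular density, in the same plotting style as the earlier Beta example:

## Histogram of U = X + Y for (X, Y) uniform on the square, with the density (2 - |u|)/4.
set.seed(1)
X <- runif(1e5, -1, 1); Y <- runif(1e5, -1, 1)
U <- X + Y
hist(U, breaks = 50, freq = FALSE)
curve((2 - abs(x)) / 4, from = -2, to = 2, add = TRUE, col = "red")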

Friday, October 30, 2015. Change of Variables

Example. Suppose X, Y are uniformly distributed on the unit disk

A = {(x, y) : x² + y² ≤ 1}.

The joint density of X and Y is

f_{X,Y}(x, y) = (1/π) 1_A(x, y).

Let

U = X/Y
V = Y

Friday, October 30, 2015. Change of Variables

Example (continued). The inverse transformation is

x = uv
y = v.

The range of the transformation is

B = {(u, v) : u²v² + v² ≤ 1} = {(u, v) : |v| ≤ 1/√(1 + u²)}.

The Jacobian determinant of the inverse transformation is

J(u, v) = det( v  u
               0  1 ) = v.

Friday, October 30, 2015. Change of Variables

Example (continued). The joint density of U, V is therefore

f_{U,V}(u, v) = f_{X,Y}(uv, v) |J(u, v)|
             = |v|/π if |v| ≤ 1/√(1 + u²), and 0 otherwise.

The marginal density of U is

f_U(u) = ∫ f_{U,V}(u, v) dv
       = ∫_{-1/√(1+u²)}^{1/√(1+u²)} (|v|/π) dv
       = 2 ∫_0^{1/√(1+u²)} (v/π) dv
       = (1/π) (1/(1 + u²)).

This is a Cauchy density.
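
A simulation sketch of the Cauchy result, using rejection sampling to draw uniformly from the disk:

## U = X/Y for (X, Y) uniform on the unit disk should be standard Cauchy.
set.seed(1)
X <- runif(1e5, -1, 1); Y <- runif(1e5, -1, 1)
keep <- X^2 + Y^2 <= 1                        # keep points inside the disk
U <- X[keep] / Y[keep]
qqplot(qcauchy(ppoints(200)), quantile(U, ppoints(200)),
       xlab = "Cauchy quantiles", ylab = "sample quantiles")
abline(0, 1, col = "red")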

Friday, October 30, 2015. Change of Variables

Two problems:

1. X, Y are continuous and independent. Find the density of U = X + Y.
2. X ~ N(0, 1), Y ~ χ²_p, independent. Find the density of U = X/√(Y/p) ~ t_p.

Two possible approaches:

a. Identify the region {U ≤ u} in the (x, y) plane, integrate the joint density over this region, and differentiate the result with respect to u.
b. Add another variable V so that (X, Y) → (U, V) is one-to-one, find the joint density of U, V, and integrate out V.

The second approach is often easier since it only requires a one-dimensional integral, and V can be chosen to make this integral easier. Often taking V = X or V = Y is sufficient.

Friday, October 30, 2015. Change of Variables

Example. For the first problem with U = X + Y take V = X. The inverse transformation is

X = V
Y = U - V

The Jacobian determinant of the inverse is

J(u, v) = det( 0   1
               1  -1 ) = -1,

so |J(u, v)| = 1 and

f_{U,V}(u, v) = f_X(v) f_Y(u - v).

The marginal density of U is therefore

f_U(u) = ∫ f_{U,V}(u, v) dv = ∫ f_X(v) f_Y(u - v) dv.

This is called the convolution of f_X and f_Y. Transforms (e.g. moment generating functions) turn convolutions into products.
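
A numerical sketch of the convolution formula; the choice X, Y ~ Exponential(1), for which the convolution is the Gamma(2, 1) density, is an assumption made only for illustration:

## Numerical convolution of two Exponential(1) densities vs. the Gamma(2, 1) density.
conv <- function(u)
    integrate(function(v) dexp(v) * dexp(u - v), lower = 0, upper = u)$value
u <- c(0.5, 1, 2, 4)
rbind(convolution = sapply(u, conv), gamma = dgamma(u, shape = 2, rate = 1))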

Friday, October 30, 2015. Change of Variables

Example. For the second problem with U = X/√(Y/p) set V = Y:

U = X/√(Y/p)
V = Y

Then the inverse transformation is

X = U √(V/p)
Y = V

The Jacobian determinant of the inverse transformation is

J(u, v) = det( √(v/p)   u/(2p√(v/p))
               0        1            ) = √(v/p).

The range of the transformation is

B = {(u, v) : -∞ < u < ∞, 0 < v < ∞}.

Friday, October 30, 2015. Change of Variables

Example (continued). So the joint density of U and V is

f_{U,V}(u, v) = f_X(u√(v/p)) f_Y(v) √(v/p)
             = (1/√(2π)) exp{-(1/2) u² v/p} (1/(Γ(p/2) 2^{p/2})) v^{p/2 - 1} e^{-v/2} √(v/p)   for v > 0,

and 0 for v ≤ 0.

Friday, October 30, 2015. Change of Variables

Example (continued). The marginal density of U is therefore

f_U(u) = ∫_0^∞ (1/√(2π)) exp{-(1/2) u² v/p} (1/(Γ(p/2) 2^{p/2})) v^{p/2 - 1} e^{-v/2} √(v/p) dv
       = (1/(√(2πp) Γ(p/2) 2^{p/2})) ∫_0^∞ v^{(p+1)/2 - 1} e^{-(1/2) v (1 + u²/p)} dv
       = (Γ((p+1)/2) 2^{(p+1)/2}) / (√(2πp) Γ(p/2) 2^{p/2} (1 + u²/p)^{(p+1)/2})
       = (Γ((p+1)/2) / (Γ(p/2) (pπ)^{1/2})) (1 + u²/p)^{-(p+1)/2}

This is the density of Student's t distribution with p degrees of freedom. For p = 1 this is a Cauchy distribution.
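
A final numerical check comparing the derived density with R's dt (the value of p is arbitrary):

## The derived density should equal dt(u, df = p).
p <- 5
f_U <- function(u)
    gamma((p + 1) / 2) / (gamma(p / 2) * sqrt(p * pi)) * (1 + u^2 / p)^(-(p + 1) / 2)
u <- c(-2, -0.5, 0, 1, 3)
rbind(derived = f_U(u), dt = dt(u, df = p))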