
Joint Distributions

A bivariate normal distribution generalizes the concept of a normal distribution to bivariate random variables. It requires a matrix formulation of quadratic forms, and it is later studied in relation with a linear transformation of joint densities. An important application of bivariate normal distributions is discussed along with the introduction of conditional expectation and prediction.

Operations on matrices. Matrices are usually denoted by capital letters, $A, B, C, \ldots$, while scalars (real numbers) are denoted by lower case letters, $a, b, c, \ldots$

(a) Scalar multiplication:
$$k\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} ka & kb \\ kc & kd \end{bmatrix}$$

(b) Product of two matrices:
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} e & f \\ g & h \end{bmatrix} = \begin{bmatrix} ae+bg & af+bh \\ ce+dg & cf+dh \end{bmatrix}$$

(c) The transpose of a matrix:
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{T} = \begin{bmatrix} a & c \\ b & d \end{bmatrix}$$

Determinant and inverse of a matrix.

(a) Determinant of a matrix:
$$\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc$$

(b) The inverse of a matrix: If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and $\det A \neq 0$, then the inverse $A^{-1}$ of $A$ is given by
$$A^{-1} = \frac{1}{\det A}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \frac{1}{ad-bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix},$$
and satisfies $A A^{-1} = A^{-1} A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$.

Quadratic forms. If a 2-by-2 matrix $A$ satisfies $A^{T} = A$, then it is of the form $A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$, and is said to be symmetric. A 2-by-1 matrix $\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}$ is called a column vector, usually denoted by boldface letters $\mathbf{x}, \mathbf{y}, \ldots$, and the transpose $\mathbf{x}^{T} = \begin{bmatrix} x & y \end{bmatrix}$ is called a row vector. Then the bivariate function
$$Q(x,y) = \mathbf{x}^{T} A\,\mathbf{x} = \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} a & b \\ b & c \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = a x^{2} + 2 b x y + c y^{2} \tag{5.1}$$
is called a quadratic form.
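The 2-by-2 identities above are easy to sanity-check numerically. The following is a minimal sketch added for illustration (it is not part of the original notes, and the numeric entries are arbitrary); it verifies the determinant, the inverse, and the quadratic-form expansion (5.1) with numpy.

```python
import numpy as np

a, b, c = 2.0, 0.5, 3.0                      # entries of a symmetric matrix A = [[a, b], [b, c]]
A = np.array([[a, b], [b, c]])

# Determinant and inverse agree with the closed forms (here d = c, so ad - bc = ac - b^2).
assert np.isclose(np.linalg.det(A), a * c - b * b)
assert np.allclose(np.linalg.inv(A), np.array([[c, -b], [-b, a]]) / (a * c - b * b))

# Quadratic form Q(x, y) = x^T A x = a*x^2 + 2*b*x*y + c*y^2, as in (5.1).
x, y = 1.3, -0.7
v = np.array([x, y])
assert np.isclose(v @ A @ v, a * x**2 + 2 * b * x * y + c * y**2)
print("2-by-2 identities verified")
```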

Cholesky decomposition. If $ac \ge b^{2}$ and $ac \neq 0$, then the symmetric matrix can be decomposed into the following forms, known as Cholesky decompositions:
$$\begin{bmatrix} a & b \\ b & c \end{bmatrix} = \begin{bmatrix} \sqrt{a} & 0 \\ b/\sqrt{a} & \sqrt{(ac-b^{2})/a} \end{bmatrix}\begin{bmatrix} \sqrt{a} & b/\sqrt{a} \\ 0 & \sqrt{(ac-b^{2})/a} \end{bmatrix} \tag{5.2}$$
$$\phantom{\begin{bmatrix} a & b \\ b & c \end{bmatrix}} = \begin{bmatrix} \sqrt{(ac-b^{2})/c} & b/\sqrt{c} \\ 0 & \sqrt{c} \end{bmatrix}\begin{bmatrix} \sqrt{(ac-b^{2})/c} & 0 \\ b/\sqrt{c} & \sqrt{c} \end{bmatrix} \tag{5.3}$$

Decomposition of quadratic form. Corresponding to the Cholesky decompositions of the matrix $A$, the quadratic form $Q(x,y)$ can also be decomposed as follows:
$$Q(x,y) = \left(\sqrt{a}\,x + \frac{b}{\sqrt{a}}\,y\right)^{2} + \left(\sqrt{\frac{ac-b^{2}}{a}}\,y\right)^{2} \tag{5.4}$$
$$\phantom{Q(x,y)} = \left(\sqrt{\frac{ac-b^{2}}{c}}\,x\right)^{2} + \left(\frac{b}{\sqrt{c}}\,x + \sqrt{c}\,y\right)^{2} \tag{5.5}$$

Bivariate normal density. The function
$$f(x,y) = \frac{1}{2\pi\,\sigma_x\sigma_y\sqrt{1-\rho^{2}}}\exp\!\left(-\frac{1}{2}Q(x,y)\right) \tag{5.6}$$
with the quadratic form
$$Q(x,y) = \frac{1}{1-\rho^{2}}\left[\left(\frac{x-\mu_x}{\sigma_x}\right)^{2} + \left(\frac{y-\mu_y}{\sigma_y}\right)^{2} - 2\rho\,\frac{(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right] \tag{5.7}$$
gives the joint density function of a bivariate normal distribution.

Quadratic form of bivariate normal density. Note that the parameters $\sigma_x^{2}$, $\sigma_y^{2}$, and $\rho$ must satisfy $\sigma_x^{2} > 0$, $\sigma_y^{2} > 0$, and $-1 < \rho < 1$. By defining the 2-by-2 symmetric matrix (also known as the covariance matrix)
$$\Sigma = \begin{bmatrix} \sigma_x^{2} & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^{2} \end{bmatrix}$$
and the two column vectors
$$\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix} \quad\text{and}\quad \boldsymbol{\mu} = \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix},$$
the quadratic form can be expressed as
$$Q(x,y) = (\mathbf{x}-\boldsymbol{\mu})^{T}\,\Sigma^{-1}\,(\mathbf{x}-\boldsymbol{\mu}) = \begin{bmatrix} x-\mu_x & y-\mu_y \end{bmatrix}\begin{bmatrix} \sigma_x^{2} & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^{2} \end{bmatrix}^{-1}\begin{bmatrix} x-\mu_x \\ y-\mu_y \end{bmatrix}. \tag{5.8}$$
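As a quick numerical check of (5.2) and of the density formula (5.6)-(5.7), the sketch below (an added illustration with arbitrary parameter values, not part of the notes) multiplies out one Cholesky factorisation and compares the hand-written density with scipy's bivariate normal pdf.

```python
import numpy as np
from scipy.stats import multivariate_normal

a, b, c = 4.0, 1.0, 2.0                                  # needs a > 0 and ac - b^2 > 0
L = np.array([[np.sqrt(a), 0.0],
              [b / np.sqrt(a), np.sqrt((a * c - b**2) / a)]])
assert np.allclose(L @ L.T, [[a, b], [b, c]])            # this is exactly (5.2)

mu_x, mu_y, sx, sy, rho = 1.0, -2.0, 1.5, 0.8, 0.6
Sigma = np.array([[sx**2, rho * sx * sy],
                  [rho * sx * sy, sy**2]])

def f(x, y):
    """Bivariate normal density written exactly as in (5.6)-(5.7)."""
    Q = ((x - mu_x)**2 / sx**2 + (y - mu_y)**2 / sy**2
         - 2 * rho * (x - mu_x) * (y - mu_y) / (sx * sy)) / (1 - rho**2)
    return np.exp(-Q / 2) / (2 * np.pi * sx * sy * np.sqrt(1 - rho**2))

pt = (0.3, -1.1)
assert np.isclose(f(*pt), multivariate_normal(mean=[mu_x, mu_y], cov=Sigma).pdf(pt))
print("Cholesky factorisation and density formula verified")
```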

Marginal density functions. Suppose that two random variables $X$ and $Y$ have the bivariate normal distribution (5.6). Then their marginal densities $f_X(x)$ and $f_Y(y)$ respectively become
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x}\exp\!\left\{-\frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^{2}\right\} \quad\text{and}\quad f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_y}\exp\!\left\{-\frac{1}{2}\left(\frac{y-\mu_y}{\sigma_y}\right)^{2}\right\}.$$
Thus, $X$ and $Y$ are normally distributed with respective parameters $(\mu_x, \sigma_x^{2})$ and $(\mu_y, \sigma_y^{2})$.

Independence condition for X and Y. If $\rho = 0$, then the quadratic form (5.7) becomes
$$Q(x,y) = \left(\frac{x-\mu_x}{\sigma_x}\right)^{2} + \left(\frac{y-\mu_y}{\sigma_y}\right)^{2},$$
and consequently we have $f(x,y) = f_X(x)f_Y(y)$. Thus, $X$ and $Y$ are independent if and only if $\rho = 0$.

Conditional density functions. If $X$ and $Y$ are not independent (that is, $\rho \neq 0$), we can compute the conditional density function $f_{Y|X}(y\mid x)$ given $X = x$ as
$$f_{Y|X}(y\mid x) = \frac{1}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^{2}}}\exp\!\left(-\frac{1}{2}\left[\frac{y-\mu_y-\rho\frac{\sigma_y}{\sigma_x}(x-\mu_x)}{\sigma_y\sqrt{1-\rho^{2}}}\right]^{2}\right),$$
which is the normal density function with parameter $\left(\mu_y + \rho\frac{\sigma_y}{\sigma_x}(x-\mu_x),\ \sigma_y^{2}(1-\rho^{2})\right)$. Similarly, the conditional density function $f_{X|Y}(x\mid y)$ given $Y = y$ becomes
$$f_{X|Y}(x\mid y) = \frac{1}{\sqrt{2\pi}\,\sigma_x\sqrt{1-\rho^{2}}}\exp\!\left(-\frac{1}{2}\left[\frac{x-\mu_x-\rho\frac{\sigma_x}{\sigma_y}(y-\mu_y)}{\sigma_x\sqrt{1-\rho^{2}}}\right]^{2}\right),$$
which is the normal density function with parameter $\left(\mu_x + \rho\frac{\sigma_x}{\sigma_y}(y-\mu_y),\ \sigma_x^{2}(1-\rho^{2})\right)$.

Covariance calculation. Let $(X,Y)$ be bivariate normal random variables with parameters $(\mu_x, \mu_y, \sigma_x^{2}, \sigma_y^{2}, \rho)$. Recall that $f(x,y) = f(x)f(y\mid x)$, and that $f(y\mid x)$ is the normal density with mean $\mu_y + \rho\frac{\sigma_y}{\sigma_x}(x-\mu_x)$ and variance $\sigma_y^{2}(1-\rho^{2})$. First we can compute
$$\int_{-\infty}^{\infty}(y-\mu_y)\,f(y\mid x)\,dy = \rho\,\frac{\sigma_y}{\sigma_x}(x-\mu_x).$$
Then we can apply it to obtain
$$\mathrm{Cov}(X,Y) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}(x-\mu_x)(y-\mu_y)\,f(x,y)\,dy\,dx = \int_{-\infty}^{\infty}(x-\mu_x)\left[\int_{-\infty}^{\infty}(y-\mu_y)\,f(y\mid x)\,dy\right]f(x)\,dx = \rho\,\frac{\sigma_y}{\sigma_x}\int_{-\infty}^{\infty}(x-\mu_x)^{2}f(x)\,dx = \rho\,\sigma_x\sigma_y.$$

Correlation coefficient ρ. The correlation coefficient of $X$ and $Y$ is defined by
$$\rho = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}.$$
It implies that the parameter $\rho$ of the bivariate normal distribution represents the correlation coefficient of $X$ and $Y$.
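The marginal, conditional, and covariance formulas above can be illustrated by simulation. The following hedged sketch (my own example; the parameter values are arbitrary) draws bivariate normal pairs and compares the sample covariance, and the mean and variance of Y over a thin slice X ≈ x₀, with ρσₓσᵧ, μᵧ + ρ(σᵧ/σₓ)(x₀ − μₓ), and σᵧ²(1 − ρ²).

```python
import numpy as np

rng = np.random.default_rng(0)
mu_x, mu_y, sx, sy, rho = 2.0, -1.0, 1.0, 2.0, 0.7
Sigma = [[sx**2, rho * sx * sy], [rho * sx * sy, sy**2]]
X, Y = rng.multivariate_normal([mu_x, mu_y], Sigma, size=500_000).T

print("sample Cov(X, Y):", np.cov(X, Y)[0, 1], "  vs  rho*sx*sy =", rho * sx * sy)

x0 = 3.0
near_x0 = np.abs(X - x0) < 0.05                 # condition on X close to x0
print("E(Y | X ~ x0)  :", Y[near_x0].mean(), "  vs ", mu_y + rho * (sy / sx) * (x0 - mu_x))
print("Var(Y | X ~ x0):", Y[near_x0].var(), "  vs ", sy**2 * (1 - rho**2))
```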

Conditional expectation. Let $f(x,y)$ be a joint density function for random variables $X$ and $Y$, and let $f_{Y|X}(y\mid x)$ be the conditional density function of $Y$ given $X = x$ (see the earlier note on conditional density functions). Then the conditional expectation of $Y$ given $X = x$, denoted by $E(Y\mid X=x)$, is defined as the function $h(x)$ of $x$:
$$E(Y\mid X=x) := h(x) = \int_{-\infty}^{\infty} y\,f_{Y|X}(y\mid x)\,dy.$$
In particular, $h(X)$ is a function of the random variable $X$, and is called the conditional expectation of $Y$ given $X$, denoted by $E(Y\mid X)$.

Conditional expectation of a function g(Y). Similarly, we can define the conditional expectation for a function $g(Y)$ of the random variable $Y$ by
$$E(g(Y)\mid X=x) := \int_{-\infty}^{\infty} g(y)\,f_{Y|X}(y\mid x)\,dy.$$
Since $E(g(Y)\mid X=x)$ is a function, say $h(x)$, of $x$, we can define $E(g(Y)\mid X)$ as the function $h(X)$ of the random variable $X$.

Linearity properties of conditional expectation. Let $a$ and $b$ be constants. By the definition of conditional expectation, it clearly follows that
$$E(aY + b\mid X=x) = a\,E(Y\mid X=x) + b.$$
Consequently, we obtain $E(aY + b\mid X) = a\,E(Y\mid X) + b$.

Law of total expectation. Since $E(g(Y)\mid X)$ is a function $h(X)$ of the random variable $X$, we can consider the expectation $E[E(g(Y)\mid X)]$ of the conditional expectation $E(g(Y)\mid X)$, and compute it as follows:
$$E[E(g(Y)\mid X)] = \int_{-\infty}^{\infty} h(x)\,f_X(x)\,dx = \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} g(y)\,f_{Y|X}(y\mid x)\,dy\right]f_X(x)\,dx = \int_{-\infty}^{\infty} g(y)\left[\int_{-\infty}^{\infty} f_{Y|X}(y\mid x)\,f_X(x)\,dx\right]dy = \int_{-\infty}^{\infty} g(y)\,f_Y(y)\,dy = E[g(Y)].$$
Thus, we have $E[E(g(Y)\mid X)] = E[g(Y)]$.
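A small simulation makes the law of total expectation concrete. In the assumed construction below (an illustration, not from the notes), Y given X = x is normal with mean 1 + 0.5x and standard deviation 0.3, so E(g(Y) | X) has a closed form for g(y) = y², and its expectation should match E[g(Y)] up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000
X = rng.normal(0.0, 1.0, n)
Y = 1.0 + 0.5 * X + rng.normal(0.0, 0.3, n)     # Y | X = x is N(1 + 0.5x, 0.3^2)

g = lambda y: y**2
h = (1.0 + 0.5 * X)**2 + 0.3**2                 # E(g(Y) | X) = (conditional mean)^2 + conditional variance

print("E[E(g(Y)|X)] :", h.mean())
print("E[g(Y)]      :", g(Y).mean())            # the two agree up to simulation error
```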

Conditional variance formula. The conditional variance, denoted by $\mathrm{Var}(Y\mid X=x)$, can be defined by
$$\mathrm{Var}(Y\mid X=x) = E\big((Y - E(Y\mid X=x))^{2}\,\big|\, X=x\big) = E(Y^{2}\mid X=x) - \big(E(Y\mid X=x)\big)^{2},$$
where the second equality can be obtained from the linearity property. Since $\mathrm{Var}(Y\mid X=x)$ is a function, say $h(x)$, of $x$, we can define $\mathrm{Var}(Y\mid X)$ as the function $h(X)$ of the random variable $X$.

Conditional variance formula, continued. Now compute the variance $\mathrm{Var}(E(Y\mid X))$ of the conditional expectation $E(Y\mid X)$ and the expectation $E[\mathrm{Var}(Y\mid X)]$ of the conditional variance $\mathrm{Var}(Y\mid X)$ as follows:
$$\mathrm{Var}(E(Y\mid X)) = E[(E(Y\mid X))^{2}] - \big(E[E(Y\mid X)]\big)^{2} = E[(E(Y\mid X))^{2}] - (E[Y])^{2} \tag{1}$$
$$E[\mathrm{Var}(Y\mid X)] = E[E(Y^{2}\mid X)] - E[(E(Y\mid X))^{2}] = E[Y^{2}] - E[(E(Y\mid X))^{2}] \tag{2}$$
By adding (1) and (2) together, we can obtain
$$\mathrm{Var}(E(Y\mid X)) + E[\mathrm{Var}(Y\mid X)] = E[Y^{2}] - (E[Y])^{2} = \mathrm{Var}(Y).$$

A function g1(X) in conditional expectation. In calculating the conditional expectation $E(g_1(X)g_2(Y)\mid X=x)$ given $X = x$, we can treat $g_1(X)$ as the constant $g_1(x)$; thus, we have
$$E(g_1(X)g_2(Y)\mid X=x) = g_1(x)\,E(g_2(Y)\mid X=x).$$
This justifies $E(g_1(X)g_2(Y)\mid X) = g_1(X)\,E(g_2(Y)\mid X)$.

Prediction. Suppose that we have two random variables $X$ and $Y$. Having observed $X = x$, one may attempt to predict the value of $Y$ as a function $g(x)$ of $x$. The function $g(X)$ of the random variable $X$ can be constructed for this purpose, and is called a predictor of $Y$. The criterion for the best predictor $g(X)$ is to find a function $g$ that minimizes the expected square distance $E[(Y - g(X))^{2}]$ between $Y$ and the predictor $g(X)$.

Prediction, continued. We can compute $E[(Y - g(X))^{2}]$ by applying all the properties of conditional expectation:
$$E[(Y-g(X))^{2}] = E\big[(Y - E(Y\mid X) + E(Y\mid X) - g(X))^{2}\big]$$
$$= E[(Y - E(Y\mid X))^{2}] + 2\,E[(Y - E(Y\mid X))(E(Y\mid X) - g(X))] + E[(E(Y\mid X) - g(X))^{2}]$$
$$= E\big[E\big((Y - E(Y\mid X))^{2}\,\big|\,X\big)\big] + 2\,E\big[(E(Y\mid X) - g(X))\,E\big(Y - E(Y\mid X)\,\big|\,X\big)\big] + E[(E(Y\mid X) - g(X))^{2}]$$
$$= E[\mathrm{Var}(Y\mid X)] + E[(E(Y\mid X) - g(X))^{2}],$$
where the middle term vanishes because $E\big(Y - E(Y\mid X)\mid X\big) = 0$. This is minimized when $g(x) = E(Y\mid X=x)$. Thus, we can find $g(X) = E(Y\mid X)$ as the best predictor of $Y$.
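Both the decomposition Var(Y) = Var(E(Y|X)) + E[Var(Y|X)] and the optimality of E(Y|X) can be checked numerically. The sketch below is an assumed example (E(Y|X) = X² and Var(Y|X) = 0.04 by construction, not a model from the notes); it also shows the conditional mean achieving a smaller mean squared error than the best straight line.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000
X = rng.uniform(0.0, 1.0, n)
Y = X**2 + rng.normal(0.0, 0.2, n)              # E(Y|X) = X^2, Var(Y|X) = 0.04

cond_mean = X**2
print("Var(Y)                    :", Y.var())
print("Var(E(Y|X)) + E[Var(Y|X)] :", cond_mean.var() + 0.04)

slope, intercept = np.polyfit(X, Y, 1)          # best linear predictor fitted by least squares
print("MSE of E(Y|X)             :", np.mean((Y - cond_mean)**2))
print("MSE of best linear fit    :", np.mean((Y - (slope * X + intercept))**2))
```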

The best predictor for bivariate normal distributions. When $X$ and $Y$ have the bivariate normal distribution with parameter $(\mu_x, \mu_y, \sigma_x^{2}, \sigma_y^{2}, \rho)$, we can find the best predictor
$$E(Y\mid X) = \mu_y + \rho\,\frac{\sigma_y}{\sigma_x}(X - \mu_x),$$
which minimizes the expected square distance $E[(Y-g(X))^{2}]$ to $\sigma_y^{2}(1-\rho^{2})$.

The best linear predictor. Except for the case of the bivariate normal distribution, the conditional expectation $E(Y\mid X)$ could be a complicated nonlinear function of $X$. So we can instead consider the best linear predictor $g(X) = \alpha + \beta X$ of $Y$ by minimizing $E[(Y - (\alpha + \beta X))^{2}]$. First we can observe that
$$E[(Y - (\alpha + \beta X))^{2}] = E[Y^{2}] - 2\alpha\mu_y - 2\beta E[XY] + \alpha^{2} + 2\alpha\beta\mu_x + \beta^{2}E[X^{2}],$$
where $\mu_x = E[X]$ and $\mu_y = E[Y]$.

The best linear predictor, continued. We can obtain the desired values $\alpha$ and $\beta$ by solving
$$\frac{\partial}{\partial\alpha}E[(Y - (\alpha + \beta X))^{2}] = -2\mu_y + 2\alpha + 2\beta\mu_x = 0$$
$$\frac{\partial}{\partial\beta}E[(Y - (\alpha + \beta X))^{2}] = -2E[XY] + 2\alpha\mu_x + 2\beta E[X^{2}] = 0.$$
The solution can be expressed by using $\sigma_x^{2} = \mathrm{Var}(X)$, $\sigma_y^{2} = \mathrm{Var}(Y)$, and $\rho = \mathrm{Cov}(X,Y)/(\sigma_x\sigma_y)$, and we obtain
$$\beta = \frac{E[XY] - E[X]E[Y]}{E[X^{2}] - (E[X])^{2}} = \rho\,\frac{\sigma_y}{\sigma_x}, \qquad \alpha = \mu_y - \beta\mu_x = \mu_y - \rho\,\frac{\sigma_y}{\sigma_x}\mu_x.$$
In summary, the best linear predictor $g(X)$ becomes
$$g(X) = \mu_y + \rho\,\frac{\sigma_y}{\sigma_x}(X - \mu_x).$$
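The closed form β = Cov(X, Y)/Var(X), α = μᵧ − βμₓ can be compared with an ordinary least-squares fit on data. The sketch below is an added illustration (the gamma/sine model is an arbitrary choice used only to make E(Y|X) nonlinear).

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.gamma(shape=2.0, scale=1.0, size=200_000)
Y = np.sin(X) + rng.normal(0.0, 0.1, X.size)    # deliberately nonlinear in X

beta = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)
alpha = Y.mean() - beta * X.mean()
slope, intercept = np.polyfit(X, Y, 1)          # least-squares line fitted to the same data

print("formula alpha, beta     :", alpha, beta)
print("polyfit intercept, slope:", intercept, slope)
```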

Transformation of variable. For an interval $A = [a, b]$ on the real line with $a < b$, the integration of a function $f(x)$ over $A$ is formally written as
$$\int_A f(x)\,dx = \int_a^b f(x)\,dx. \tag{5.9}$$
Let $h(u)$ be a differentiable and strictly monotone function (that is, either strictly increasing or strictly decreasing) on the real line. Then there is the inverse function $g = h^{-1}$, so that we can define the inverse image $h^{-1}(A)$ of $A = [a,b]$ by
$$h^{-1}(A) = g(A) = \{\,g(x) : a \le x \le b\,\}.$$
The change of variable in the integral (5.9) provides the formula
$$\int_A f(x)\,dx = \int_{g(A)} f(h(u))\left|\frac{d}{du}h(u)\right|du. \tag{5.10}$$

Transformation theorem of random variable.
Theorem. Let $X$ be a random variable with pdf $f_X(x)$. Suppose that a function $g(x)$ is differentiable and strictly monotone. Then the random variable $Y = g(X)$ has the pdf
$$f_Y(y) = f_X\big(g^{-1}(y)\big)\left|\frac{d}{dy}g^{-1}(y)\right|. \tag{5.11}$$

A sketch of proof. Since $g(x)$ is strictly monotone, we can find the inverse function $h = g^{-1}$ and apply the formula (5.10). Thus, by letting $A = (-\infty, t]$, we can rewrite the cdf for $Y$ as
$$F_Y(t) = P(Y \le t) = P(Y \in A) = P(g(X) \in A) = P(X \in h(A)) = \int_{h(A)} f_X(x)\,dx = \int_{g(h(A))} f_X(h(y))\left|\frac{d}{dy}h(y)\right|dy = \int_A f_X\big(g^{-1}(y)\big)\left|\frac{d}{dy}g^{-1}(y)\right|dy.$$
By comparing it with the form $F_Y(t) = \int_{-\infty}^{t} f_Y(y)\,dy$, we can obtain (5.11).

Example. Let $X$ be a uniform random variable on $[0, 1]$, so that $X$ takes nonnegative values and $f(x) = 1$ for $0 \le x \le 1$. Find the pdf $f_Y(y)$ for the random variable $Y = X^{n}$.

Solution. Let $g(x) = x^{n}$. Then we obtain $h(y) = g^{-1}(y) = y^{1/n}$, and $h'(y) = \frac{1}{n}y^{(1/n)-1}$. Thus, we obtain
$$f_Y(y) = \begin{cases}\dfrac{1}{n}\,y^{(1/n)-1}\,f_X\big(y^{1/n}\big) = \dfrac{1}{n}\,y^{(1/n)-1} & \text{if } 0 \le y \le 1;\\[1ex] 0 & \text{otherwise.}\end{cases}$$

Linear transform of normal random variable. Let $X$ be a normal random variable with parameter $(\mu, \sigma^{2})$. Then we can define the linear transform $Y = aX + b$ with real values $a \neq 0$ and $b$. To compute the density function $f_Y(y)$ of the random variable $Y$, we can apply the transformation theorem and obtain
$$f_Y(y) = \frac{1}{|a|\,\sigma\sqrt{2\pi}}\exp\!\left[-\frac{\big(y-(a\mu+b)\big)^{2}}{2(a\sigma)^{2}}\right].$$
This is the normal density function with parameter $(a\mu + b,\ (a\sigma)^{2})$.
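The worked example Y = Xⁿ is easy to verify by simulation. This short check (not part of the notes) compares the empirical CDF of Xⁿ for uniform X with the CDF F_Y(y) = y^{1/n} implied by the derived pdf.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
Y = rng.uniform(0.0, 1.0, 1_000_000) ** n       # Y = X^n with X uniform on [0, 1]

for y in (0.1, 0.4, 0.8):                       # F_Y(y) = y**(1/n) from f_Y(y) = (1/n) y**(1/n - 1)
    print(f"y = {y}:  empirical {np.mean(Y <= y):.4f}   theoretical {y**(1/n):.4f}")
```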

Transformation of two variables. For a rectangle $A = \{(x,y) : a_1 \le x \le b_1,\ a_2 \le y \le b_2\}$ on a plane, the integration of a function $f(x,y)$ over $A$ is formally written as
$$\iint_A f(x,y)\,dx\,dy = \int_{a_2}^{b_2}\!\int_{a_1}^{b_1} f(x,y)\,dx\,dy. \tag{5.12}$$
Suppose that a transformation $(g_1(x,y),\,g_2(x,y))$ is differentiable and has the inverse transformation $(h_1(u,v),\,h_2(u,v))$ satisfying
$$\begin{cases} x = h_1(u,v)\\ y = h_2(u,v)\end{cases} \quad\text{if and only if}\quad \begin{cases} u = g_1(x,y)\\ v = g_2(x,y).\end{cases} \tag{5.13}$$

Transformation of two variables, continued. We can state the change of variable in the double integral (5.12) as follows: (i) define the inverse image of $A$ by
$$h^{-1}(A) = g(A) = \{(g_1(x,y),\,g_2(x,y)) : (x,y)\in A\};$$
(ii) calculate the Jacobian
$$J_h(u,v) = \det\begin{bmatrix}\dfrac{\partial}{\partial u}h_1(u,v) & \dfrac{\partial}{\partial v}h_1(u,v)\\[1ex] \dfrac{\partial}{\partial u}h_2(u,v) & \dfrac{\partial}{\partial v}h_2(u,v)\end{bmatrix}; \tag{5.14}$$
(iii) finally, if it does not vanish for all $u, v$, then we can obtain
$$\iint_A f(x,y)\,dx\,dy = \iint_{g(A)} f\big(h_1(u,v),\,h_2(u,v)\big)\,\big|J_h(u,v)\big|\,du\,dv. \tag{5.15}$$

Transformation theorem of bivariate random variables.
Theorem 3. Suppose that (i) random variables $X$ and $Y$ have a joint density function $f_{X,Y}(x,y)$, (ii) a transformation $(g_1(x,y),\,g_2(x,y))$ is differentiable, and (iii) its inverse transformation $(h_1(u,v),\,h_2(u,v))$ satisfies (5.13). Then the random variables $U = g_1(X,Y)$ and $V = g_2(X,Y)$ have the joint density function
$$f_{U,V}(u,v) = f_{X,Y}\big(h_1(u,v),\,h_2(u,v)\big)\,\big|J_h(u,v)\big|. \tag{5.16}$$

A sketch of proof. Define $A = \{(x,y) : x \le s,\ y \le t\}$ and the image of $A$ under the inverse transformation $(h_1, h_2)$ by $h(A) = \{(h_1(u,v),\,h_2(u,v)) : (u,v)\in A\}$. Then we can rewrite the joint distribution function $F_{U,V}$ for $(U,V)$ by applying the formula (5.15):
$$F_{U,V}(s,t) = P((U,V)\in A) = P\big((g_1(X,Y),\,g_2(X,Y))\in A\big) = P\big((X,Y)\in h(A)\big) = \iint_{h(A)} f_{X,Y}(x,y)\,dx\,dy = \iint_{g(h(A))} f_{X,Y}\big(h_1(u,v),h_2(u,v)\big)\,\big|J_h(u,v)\big|\,du\,dv = \iint_A f_{X,Y}\big(h_1(u,v),h_2(u,v)\big)\,\big|J_h(u,v)\big|\,du\,dv,$$
which is equal to $\int_{-\infty}^{t}\!\int_{-\infty}^{s} f_{U,V}(u,v)\,du\,dv$; thus, we obtain (5.16).
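Theorem 3 can also be illustrated numerically. The hedged sketch below (my own example) pushes two independent standard normals through the invertible map (u, v) = (x + y, x − y), whose inverse (x, y) = ((u+v)/2, (u−v)/2) has |J_h| = 1/2, and compares formula (5.16) with a histogram estimate of the joint density at one point.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
N = 2_000_000
X = rng.normal(size=N)
Y = rng.normal(size=N)
U, V = X + Y, X - Y                             # U = g1(X, Y), V = g2(X, Y)

def f_UV(u, v):
    x, y = (u + v) / 2, (u - v) / 2             # inverse transformation h1, h2
    return norm.pdf(x) * norm.pdf(y) * 0.5      # f_{X,Y}(h1, h2) * |Jacobian|, i.e. (5.16)

u0, v0, half = 1.0, 0.5, 0.05
inside = (np.abs(U - u0) < half) & (np.abs(V - v0) < half)
print("histogram estimate:", inside.mean() / (2 * half)**2)
print("formula (5.16)    :", f_UV(u0, v0))
```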

Exercises

Problem 1. Verify the Cholesky decompositions by multiplying the matrices in each of (5.2) and (5.3).

Problem 2. Show that the right-hand side of (5.4) equals (5.1) by calculating
$$Q(x,y) = \begin{bmatrix} x & y\end{bmatrix}\begin{bmatrix}\sqrt{a} & 0\\ b/\sqrt{a} & \sqrt{(ac-b^{2})/a}\end{bmatrix}\begin{bmatrix}\sqrt{a} & b/\sqrt{a}\\ 0 & \sqrt{(ac-b^{2})/a}\end{bmatrix}\begin{bmatrix}x\\ y\end{bmatrix}.$$
Similarly show that the right-hand side of (5.5) equals (5.1).

Problem 3. Show that (5.8) has the following decomposition:
$$Q(x,y) = \left[\frac{x-\mu_x-\rho\frac{\sigma_x}{\sigma_y}(y-\mu_y)}{\sigma_x\sqrt{1-\rho^{2}}}\right]^{2} + \left(\frac{y-\mu_y}{\sigma_y}\right)^{2} \tag{E.1}$$
$$\phantom{Q(x,y)} = \left(\frac{x-\mu_x}{\sigma_x}\right)^{2} + \left[\frac{y-\mu_y-\rho\frac{\sigma_y}{\sigma_x}(x-\mu_x)}{\sigma_y\sqrt{1-\rho^{2}}}\right]^{2} \tag{E.2}$$
by completing the following: (a) Show that (5.7) is equal to (5.8). (b) Find the Cholesky decompositions of $\Sigma^{-1}$. (c) Verify (E.1) and (E.2) by using the Cholesky decompositions of $\Sigma^{-1}$.

Problem 4. Compute $f_X(x)$ by completing (i) and (ii). Then find $f_{Y|X}(y\mid x)$ given $X = x$.
(i) Show that $\int_{-\infty}^{\infty} f(x,y)\,dy$ becomes
$$\frac{1}{\sqrt{2\pi}\,\sigma_x}\exp\!\left\{-\frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^{2}\right\}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^{2}}}\exp\!\left(-\frac{1}{2}\left[\frac{y-\mu_y-\rho\frac{\sigma_y}{\sigma_x}(x-\mu_x)}{\sigma_y\sqrt{1-\rho^{2}}}\right]^{2}\right)dy.$$
(ii) Show that
$$\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^{2}}}\exp\!\left(-\frac{1}{2}\left[\frac{y-\mu_y-\rho\frac{\sigma_y}{\sigma_x}(x-\mu_x)}{\sigma_y\sqrt{1-\rho^{2}}}\right]^{2}\right)dy = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\!\left\{-\frac{u^{2}}{2}\right\}du = 1.$$

Problem 5. Suppose that the joint density function $f(x,y)$ of $X$ and $Y$ is given by
$$f(x,y) = \begin{cases} 2 & \text{if } 0 \le x \le y \le 1;\\ 0 & \text{otherwise.}\end{cases}$$
(a) Find the covariance and the correlation of $X$ and $Y$.
(b) Find the conditional expectations $E(X\mid Y=y)$ and $E(Y\mid X=x)$.
(c) Find the best linear predictor of $Y$ in terms of $X$.
(d) Find the best predictor of $Y$.

Problem 6. Suppose that we have the random variables $X$ and $Y$ having the joint density
$$f(x,y) = \begin{cases}\dfrac{6}{7}(x+y)^{2} & \text{if } 0 \le x \le 1 \text{ and } 0 \le y \le 1;\\[1ex] 0 & \text{otherwise.}\end{cases}$$
(a) Find the covariance and correlation of $X$ and $Y$.
(b) Find the conditional expectation $E(Y\mid X=x)$.
(c) Find the best linear predictor of $Y$.

Problem 7. Let $X$ be the age of the pregnant woman, and let $Y$ be the age of the prospective father at a certain hospital. Suppose that the distribution of $X$ and $Y$ can be approximated by a bivariate normal distribution with parameters $\mu_x = 28$, $\mu_y = 32$, $\sigma_x = 6$, $\sigma_y = 8$ and $\rho = 0.8$.
(a) Find the probability that the prospective father is over 30.
(b) When the pregnant woman is 31 years old, what is the best guess of the prospective father's age?
(c) When the pregnant woman is 31 years old, find the probability that the prospective father is over 30.

Problem 8. Suppose that $(X_1, X_2)$ has a bivariate normal distribution with mean vector $\boldsymbol{\mu} = \begin{bmatrix}\mu_x\\ \mu_y\end{bmatrix}$ and covariance matrix $\Sigma = \begin{bmatrix}\sigma_x^{2} & \rho\sigma_x\sigma_y\\ \rho\sigma_x\sigma_y & \sigma_y^{2}\end{bmatrix}$. Consider the linear transformation
$$Y_1 = g_1(X_1, X_2) = a_1 X_1 + b_1; \qquad Y_2 = g_2(X_1, X_2) = a_2 X_2 + b_2.$$
Then answer (a)-(c) in the following.
(a) Find the inverse transformation $(h_1(u,v),\,h_2(u,v))$, and compute the Jacobian $J_h(u,v)$.

(b) Find the joint density function $f_{Y_1,Y_2}(u,v)$ for $(Y_1, Y_2)$.
(c) Conclude that the joint density function $f_{Y_1,Y_2}(u,v)$ is the bivariate normal density with mean vector
$$\begin{bmatrix} a_1\mu_x + b_1\\ a_2\mu_y + b_2\end{bmatrix}\quad\text{and covariance matrix}\quad\begin{bmatrix}(a_1\sigma_x)^{2} & \rho(a_1\sigma_x)(a_2\sigma_y)\\ \rho(a_1\sigma_x)(a_2\sigma_y) & (a_2\sigma_y)^{2}\end{bmatrix}.$$

Problem 9. Let $X_1$ and $X_2$ be iid standard normal random variables. Consider the linear transformation
$$Y_1 = g_1(X_1, X_2) = a_{11}X_1 + a_{12}X_2 + b_1; \qquad Y_2 = g_2(X_1, X_2) = a_{21}X_1 + a_{22}X_2 + b_2,$$
where $a_{11}a_{22} - a_{12}a_{21} \neq 0$. Find the joint density function $f_{Y_1,Y_2}(u,v)$ for $(Y_1, Y_2)$, and conclude that it is the bivariate normal density with mean vector $\boldsymbol{\mu} = \begin{bmatrix} b_1\\ b_2\end{bmatrix}$ and covariance matrix $\Sigma = \begin{bmatrix}\sigma_x^{2} & \rho\sigma_x\sigma_y\\ \rho\sigma_x\sigma_y & \sigma_y^{2}\end{bmatrix}$, where $\sigma_x^{2} = a_{11}^{2} + a_{12}^{2}$, $\sigma_y^{2} = a_{21}^{2} + a_{22}^{2}$, and
$$\rho = \frac{a_{11}a_{21} + a_{12}a_{22}}{\sqrt{(a_{11}^{2}+a_{12}^{2})(a_{21}^{2}+a_{22}^{2})}}.$$

Problem 10. Suppose that $X$ and $Y$ are independent random variables with respective pdf's $f_X(x)$ and $f_Y(y)$. Consider the transformation
$$U = g_1(X,Y) = X + Y; \qquad V = g_2(X,Y) = \frac{X}{X+Y}.$$
Then answer (a)-(c) in the following. The density function (5.17) for $V$ is called the beta distribution with parameter $(\alpha_1, \alpha_2)$.
(a) Find the inverse transformation $(h_1(u,v),\,h_2(u,v))$, and compute the Jacobian $J_h(u,v)$.
(b) Find the joint density function $f_{X,Y}(x,y)$ for $(X,Y)$.
(c) Now let $f_X(x)$ and $f_Y(y)$ be the gamma density functions with respective parameters $(\alpha_1, \lambda)$ and $(\alpha_2, \lambda)$. Then show that $U = X + Y$ has the gamma distribution with parameter $(\alpha_1+\alpha_2, \lambda)$ and that $V = \dfrac{X}{X+Y}$ has the density function
$$f_V(v) = \frac{\Gamma(\alpha_1+\alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)}\,v^{\alpha_1-1}(1-v)^{\alpha_2-1}. \tag{5.17}$$

Problem 11. Let $X$ and $Y$ be random variables. The joint density function of $X$ and $Y$ is given by
$$f(x,y) = \begin{cases} x e^{-(x+y)} & \text{if } x \ge 0 \text{ and } y \ge 0;\\ 0 & \text{otherwise.}\end{cases}$$
(a) Are $X$ and $Y$ independent? Describe the marginal distributions for $X$ and $Y$ with specific parameters.

(b) Find the distribution of the random variable $U = X + Y$.

Problem 12. Consider an investment for three years in which you plan to invest \$500 on average at the beginning of every year. The fund yields a fixed annual interest of 10%; thus, the fund invested at the beginning of the first year yields 33.1% at the end of term (that is, at the end of the third year), the fund of the second year yields 21%, and the fund of the third year yields 10%. Then answer (a)-(c) in the following.
(a) How much do you expect to have in your investment account at the end of term?
(b) From your past spending behavior, you assume that the annual investment you decide on is independent every year, and that its standard deviation is \$200. (That is, the standard deviation for the amount of money you invest each year is \$200.) Find the standard deviation for the total amount of money you have in the account at the end of term.
(c) Furthermore, suppose that your annual investment is normally distributed. (That is, the amount of money you invest each year is normally distributed.) Then what is the probability that you have more than \$2000 in the account at the end of term?

Problem 13. Let $X$ be an exponential random variable with parameter $\lambda$, and define $Y = X^{n}$, where $n$ is a positive integer.
(a) Compute the pdf $f_Y(y)$ for $Y$.
(b) Find $c$ so that
$$f(x) = \begin{cases} c\,x^{n}e^{-\lambda x} & \text{if } x \ge 0;\\ 0 & \text{otherwise}\end{cases}$$
is a pdf. Hint: You should know the name of the pdf $f(x)$.
(c) By using (b), find the expectation $E[Y]$.

Problem 14. Let $X$ and $Y$ be independent normal random variables with mean zero and the respective variances $\sigma_x^{2}$ and $\sigma_y^{2}$. Then consider the change of variables via
$$\begin{cases} U = a_{11}X + a_{12}Y\\ V = a_{21}X + a_{22}Y\end{cases} \tag{5.18}$$
where $D = a_{11}a_{22} - a_{12}a_{21} \neq 0$. (i) Find the variances $\sigma_u^{2}$ and $\sigma_v^{2}$ of $U$ and $V$, respectively. (ii) Find the inverse transformation $(h_1(u,v),\,h_2(u,v))$ so that $X$ and $Y$ can be expressed in terms of $U$ and $V$ by $X = h_1(U,V)$ and $Y = h_2(U,V)$. And (iii) compute the Jacobian $J_h(u,v)$. Then continue to answer (a)-(d) in the following.

(a) Find $\rho$ so that the joint density function $f_{U,V}(u,v)$ of $U$ and $V$ can be expressed as
$$f_{U,V}(u,v) = \frac{1}{2\pi\,\sigma_u\sigma_v\sqrt{1-\rho^{2}}}\exp\!\left(-\frac{1}{2}Q(u,v)\right).$$
(b) Express $Q(u,v)$ by using $\sigma_u$, $\sigma_v$, and $\rho$ to conclude that the joint density is the bivariate normal density with parameters $\mu_u = \mu_v = 0$, $\sigma_u^{2}$, $\sigma_v^{2}$, and $\rho$.
(c) Let $X' = X + \mu_x$ and $Y' = Y + \mu_y$. Then consider the change of variables via
$$\begin{cases} U' = a_{11}X' + a_{12}Y'\\ V' = a_{21}X' + a_{22}Y'.\end{cases}$$
Express $U'$ and $V'$ in terms of $U$ and $V$, and describe the joint distribution for $U'$ and $V'$ with specific parameters.
(d) Now suppose that $X$ and $Y$ are independent normal random variables with the respective parameters $(\mu_x, \sigma_x^{2})$ and $(\mu_y, \sigma_y^{2})$. Then what is the joint distribution for $U$ and $V$ given by (5.18)?

Problem 15. Suppose that a real-valued data value $X$ is transmitted through a noisy channel from location A to location B, and that the value $U$ received at location B is given by $U = X + Y$, where $Y$ is a normal random variable with mean zero and variance $\sigma_y^{2}$. Furthermore, we assume that $X$ is normally distributed with mean $\mu$ and variance $\sigma_x^{2}$, and is independent of $Y$.
(a) Find the expectation $E[U]$ and the variance $\mathrm{Var}(U)$ of the received value $U$ at location B.
(b) Describe the joint distribution for $X$ and $U$ with specific parameters.
(c) Provided that $U = t$ is observed at location B, find the best predictor $g(t)$ of $X$.
(d) Suppose that we have $\mu = 0$, $\sigma_x^{2} = 1$, and $\sigma_y^{2} = 0.25$. When $U = 3$ was observed, find the best predictor of $X$ given $U = 3$.

To complete Problem 15, you may use the following theorem.

Linear transformation theorem of bivariate normal random variables. Let $X$ and $Y$ be independent normal random variables with the respective parameters $(\mu_x, \sigma_x^{2})$ and $(\mu_y, \sigma_y^{2})$. If $a_{11}a_{22} - a_{12}a_{21} \neq 0$, then the random variables
$$U = a_{11}X + a_{12}Y, \qquad V = a_{21}X + a_{22}Y$$
have the bivariate normal distribution with parameters $\mu_u = a_{11}\mu_x + a_{12}\mu_y$, $\mu_v = a_{21}\mu_x + a_{22}\mu_y$, $\sigma_u^{2} = a_{11}^{2}\sigma_x^{2} + a_{12}^{2}\sigma_y^{2}$, $\sigma_v^{2} = a_{21}^{2}\sigma_x^{2} + a_{22}^{2}\sigma_y^{2}$, and
$$\rho = \frac{a_{11}a_{21}\sigma_x^{2} + a_{12}a_{22}\sigma_y^{2}}{\sqrt{(a_{11}^{2}\sigma_x^{2} + a_{12}^{2}\sigma_y^{2})(a_{21}^{2}\sigma_x^{2} + a_{22}^{2}\sigma_y^{2})}}.$$

Optional problems

Objective of assignment. A sequence of values $X_1, \ldots, X_n$ is governed by the discrete dynamical system
$$X_k = a_k X_{k-1} + B_k, \qquad k = 1, \ldots, n, \tag{5.19}$$
where the $a_k$'s are known constants, and the $B_k$'s are independent and normally distributed random variables with mean $0$ and variance $Q$. It is assumed that initially the random variable $X_0$ is normally distributed with mean $\hat{x}_0$ and variance $P_0$. Otherwise, the information about the value $X_k$ of interest is obtained by observing
$$Z_k = h_k X_k + W_k, \qquad k = 1, \ldots, n, \tag{5.20}$$
where the $h_k$'s are known constants, and the $W_k$'s are independent and normally distributed random variables with mean $0$ and variance $R$.

Objective of assignment, continued. The data $Z_k = z_k$, $k = 1, \ldots, n$, are collected sequentially. By completing Problems 16 and 17 you will find an algorithm to recursively compute the best estimates $\hat{x}_1, \ldots, \hat{x}_n$ for the sequence of unobserved values $X_1, \ldots, X_n$.

Problem 16. The density function
$$f(y_1, y_2) = \frac{1}{2\pi\,\sigma_1\sigma_2\sqrt{1-\rho^{2}}}\exp\!\left(-\frac{1}{2}Q(y_1,y_2)\right)$$
with the quadratic form
$$Q(y_1,y_2) = \frac{1}{1-\rho^{2}}\left[\left(\frac{y_1-\mu_1}{\sigma_1}\right)^{2} + \left(\frac{y_2-\mu_2}{\sigma_2}\right)^{2} - 2\rho\,\frac{(y_1-\mu_1)(y_2-\mu_2)}{\sigma_1\sigma_2}\right]$$
gives a bivariate normal distribution for a pair $(Y_1, Y_2)$ of random variables. Then answer (a)-(c) in the following.
(a) The marginal distribution for $Y_1$ has the normal density function
$$f_1(y_1) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\exp\!\left\{-\frac{1}{2}\left(\frac{y_1-\mu_1}{\sigma_1}\right)^{2}\right\}.$$
Find the expectation $E[Y_1]$ and the variance $\mathrm{Var}(Y_1)$ for $Y_1$.
(b) Find $f_{2|1}(y_2\mid y_1)$ satisfying $f(y_1,y_2) = f_{2|1}(y_2\mid y_1)\,f_1(y_1)$. The function $f_{2|1}(y_2\mid y_1)$ of $y_2$ is called a conditional density function given $Y_1 = y_1$.
(c) Argue that $f_{2|1}(y_2\mid y_1)$ is a normal density function of $y_2$ given a fixed value $y_1$, and find the expectation $E(Y_2\mid Y_1=y_1)$ and the variance $\mathrm{Var}(Y_2\mid Y_1=y_1)$ given $Y_1 = y_1$.

Remark on Problem 16. Note that the parameters $\sigma_1^{2}$, $\sigma_2^{2}$, and $\rho$ must satisfy $\sigma_1^{2} > 0$, $\sigma_2^{2} > 0$, and $-1 < \rho < 1$. The distribution is determined by the 2-by-2 symmetric matrix
$$\begin{bmatrix}\mathrm{Var}(Y_1) & \mathrm{Cov}(Y_1,Y_2)\\ \mathrm{Cov}(Y_1,Y_2) & \mathrm{Var}(Y_2)\end{bmatrix} = \begin{bmatrix}\sigma_1^{2} & \rho\sigma_1\sigma_2\\ \rho\sigma_1\sigma_2 & \sigma_2^{2}\end{bmatrix}$$
and the mean column vector
$$\begin{bmatrix}E[Y_1]\\ E[Y_2]\end{bmatrix} = \begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix}.$$
The quadratic form $Q(y_1,y_2)$ has the following decomposition:
$$Q(y_1,y_2) = \left[\frac{y_2-\mu_2-\rho\frac{\sigma_2}{\sigma_1}(y_1-\mu_1)}{\sigma_2\sqrt{1-\rho^{2}}}\right]^{2} + \left(\frac{y_1-\mu_1}{\sigma_1}\right)^{2}.$$

Problem 17. Suppose that the variable $X_{k-1}$ has a normal distribution with mean $\hat{x}_{k-1}$ and variance $P_{k-1}$ at the $(k-1)$-th iteration of the algorithm. In order to complete this problem you may use the linear transformation theorem of bivariate normal random variables in Problem 15. Answer (a)-(e) in the following.
(a) Argue that $X_k$ has a normal distribution, and find $\bar{x}_k = E[X_k]$ and $\bar{P}_k = \mathrm{Var}(X_k)$.
(b) Find the joint density function for $(X_k, Z_k)$ in terms of $\bar{x}_k$, $\bar{P}_k$, $h_k$ and $R$. HINT: Let $U = X = X_k$, $V = Z_k$, and $Y = W_k$, and apply the linear transformation theorem with $a_{11} = a_{22} = 1$, $a_{12} = 0$, and $a_{21} = h_k$.
(c) Find the best estimate $\hat{x}_k = E(X_k\mid Z_k = z_k)$ and the variance $P_k = \mathrm{Var}(X_k\mid Z_k = z_k)$ in terms of $\bar{x}_k$, $\bar{P}_k$, $h_k$ and $R$.
(d) Argue that, given $Z_k = z_k$, $X_k$ has a normal distribution with mean $\hat{x}_k$ and variance $P_k$.
(e) Continue from (c). Find the formula for $K_k$ in order to express $\hat{x}_k = \bar{x}_k + K_k(z_k - h_k\bar{x}_k)$.

Concluding remark. The formula $K_k$ in Problem 17(e) is called the Kalman gain. The best estimate $\hat{x}_k$ and the variance $P_k$ are then used recursively, and the computation procedure of Problem 17(a) and (c) is applied repeatedly. The resulting sequence $\hat{x}_1, \ldots, \hat{x}_n$ is the best estimate for $X_1, \ldots, X_n$. This iterative procedure became known as the discrete Kalman filter algorithm.
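For completeness, here is a compact sketch of the recursion that Problems 16 and 17 lead to, i.e. the discrete Kalman filter for the scalar model (5.19)-(5.20). It is my own illustration: the constants a_k, h_k, the noise variances Q and R, and the simulated data are arbitrary choices, and the prediction/update steps follow Problem 17(a), (c) and (e).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50
a = np.full(n, 0.95)            # known constants a_k
h = np.full(n, 1.0)             # known constants h_k
Q, R = 0.1, 0.5                 # variances of the noises B_k and W_k
x0_hat, P0 = 0.0, 1.0           # prior mean and variance of X_0

# Simulate the system (5.19)-(5.20).
x_true = np.empty(n)
z = np.empty(n)
x_prev = rng.normal(x0_hat, np.sqrt(P0))
for k in range(n):
    x_true[k] = a[k] * x_prev + rng.normal(0.0, np.sqrt(Q))
    z[k] = h[k] * x_true[k] + rng.normal(0.0, np.sqrt(R))
    x_prev = x_true[k]

# Filter: predict (Problem 17(a)), then update with the Kalman gain (Problem 17(c), (e)).
x_hat, P = x0_hat, P0
estimates = np.empty(n)
for k in range(n):
    x_bar = a[k] * x_hat                              # mean of X_k before seeing z_k
    P_bar = a[k]**2 * P + Q                           # variance of X_k before seeing z_k
    K = P_bar * h[k] / (h[k]**2 * P_bar + R)          # Kalman gain K_k
    x_hat = x_bar + K * (z[k] - h[k] * x_bar)         # best estimate E(X_k | Z_k = z_k)
    P = (1 - K * h[k]) * P_bar                        # conditional variance P_k
    estimates[k] = x_hat

print("RMS error of filtered estimates:", np.sqrt(np.mean((estimates - x_true)**2)))
print("RMS error of raw observations  :", np.sqrt(np.mean((z / h - x_true)**2)))
```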

Answers to exercises

Problems 1 and 2. Discussion in class should be sufficient in order to answer all the questions.

Problems 3 and 4. The following decompositions of $\Sigma^{-1}$ are sufficient in order to answer all the questions:
$$\Sigma^{-1} = \begin{bmatrix}\sigma_x^{2} & \rho\sigma_x\sigma_y\\ \rho\sigma_x\sigma_y & \sigma_y^{2}\end{bmatrix}^{-1} = \frac{1}{\sigma_x^{2}\sigma_y^{2}(1-\rho^{2})}\begin{bmatrix}\sigma_y^{2} & -\rho\sigma_x\sigma_y\\ -\rho\sigma_x\sigma_y & \sigma_x^{2}\end{bmatrix}$$
$$= \begin{bmatrix}\dfrac{1}{\sigma_x\sqrt{1-\rho^{2}}} & 0\\ \dfrac{-\rho}{\sigma_y\sqrt{1-\rho^{2}}} & \dfrac{1}{\sigma_y}\end{bmatrix}\begin{bmatrix}\dfrac{1}{\sigma_x\sqrt{1-\rho^{2}}} & \dfrac{-\rho}{\sigma_y\sqrt{1-\rho^{2}}}\\ 0 & \dfrac{1}{\sigma_y}\end{bmatrix} = \begin{bmatrix}\dfrac{1}{\sigma_x} & \dfrac{-\rho}{\sigma_x\sqrt{1-\rho^{2}}}\\ 0 & \dfrac{1}{\sigma_y\sqrt{1-\rho^{2}}}\end{bmatrix}\begin{bmatrix}\dfrac{1}{\sigma_x} & 0\\ \dfrac{-\rho}{\sigma_x\sqrt{1-\rho^{2}}} & \dfrac{1}{\sigma_y\sqrt{1-\rho^{2}}}\end{bmatrix}.$$

Problems 5.
(a) $\mathrm{Cov}(X,Y) = E[XY] - E[X]E[Y] = (1/4) - (1/3)(2/3) = 1/36$, and since $\mathrm{Var}(X) = \mathrm{Var}(Y) = 1/18$, $\rho = (1/36)/(1/18) = 1/2$.
(b) Since $f(x\mid y) = 1/y$ for $0 \le x \le y$ and $f(y\mid x) = 1/(1-x)$ for $x \le y \le 1$, we obtain $E(X\mid Y=y) = y/2$ and $E(Y\mid X=x) = (x+1)/2$.
(c) The best linear predictor $g(X)$ of $Y$ is given by $g(X) = \mu_y + \rho\dfrac{\sigma_y}{\sigma_x}(X-\mu_x) = \dfrac{X+1}{2}$.
(d) The best predictor of $Y$ is $E(Y\mid X) = \dfrac{X+1}{2}$. [This is the same as (c). Why?]

Problems 6.
(a) $\mathrm{Cov}(X,Y) = E[XY] - E[X]E[Y] = 17/42 - (9/14)^{2} = -5/588$ and $\rho = (-5/588)/(199/2940) = -25/199$.
(b) $E(Y\mid X=x) = \dfrac{6x^{2}+8x+3}{4(3x^{2}+3x+1)}$.
(c) The best linear predictor $g(X)$ of $Y$ is given by $g(X) = -\dfrac{25}{199}X + \dfrac{144}{199}$.
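The numbers in Problems 5(a) can be cross-checked by simulation, under the reading that f(x, y) = 2 on the triangle 0 ≤ x ≤ y ≤ 1, in which case (X, Y) is distributed as the minimum and maximum of two independent uniforms. This check is an added illustration, not part of the original answers.

```python
import numpy as np

rng = np.random.default_rng(7)
U = rng.uniform(size=(2, 2_000_000))
X, Y = U.min(axis=0), U.max(axis=0)     # (min, max) of two uniforms has density 2 on the triangle

print("Cov(X, Y):", np.cov(X, Y)[0, 1], "  vs  1/36 =", 1 / 36)
print("rho      :", np.corrcoef(X, Y)[0, 1], "  vs  1/2")
```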

Problems 7.
(a) $P(Y \ge 30) = P\!\left(\dfrac{Y-32}{8} \ge -\dfrac{1}{4}\right) = \Phi(0.25) \approx 0.60$.
(b) $E(Y\mid X=31) = (32) + (0.8)\left(\dfrac{8}{6}\right)(31-28) = 35.2$.
(c) Since the conditional density $f_{Y|X}(y\mid x)$ is the normal density with mean $\mu = 35.2$ and variance $\sigma^{2} = (8)^{2}\big(1-(0.8)^{2}\big) = 23.04$, we can obtain
$$P(Y \ge 30\mid X=31) = P\!\left(\frac{Y-35.2}{4.8} \ge \frac{30-35.2}{4.8}\,\Big|\, X=31\right) = \Phi(1.08) \approx 0.86.$$

Problems 8.
(a) $h_1(u,v) = \dfrac{u-b_1}{a_1}$ and $h_2(u,v) = \dfrac{v-b_2}{a_2}$. Thus, $J_h(u,v) = \det\begin{bmatrix}1/a_1 & 0\\ 0 & 1/a_2\end{bmatrix} = \dfrac{1}{a_1a_2}$.
(b)
$$f_{Y_1,Y_2}(u,v) = f_{X_1,X_2}\!\left(\frac{u-b_1}{a_1}, \frac{v-b_2}{a_2}\right)\frac{1}{|a_1a_2|} = \frac{1}{2\pi\,|a_1\sigma_x|\,|a_2\sigma_y|\sqrt{1-\rho^{2}}}\exp\!\left(-\frac{1}{2}Q(u,v)\right),$$
where
$$Q(u,v) = \frac{1}{1-\rho^{2}}\left[\left(\frac{u-(a_1\mu_x+b_1)}{a_1\sigma_x}\right)^{2} + \left(\frac{v-(a_2\mu_y+b_2)}{a_2\sigma_y}\right)^{2} - 2\rho\,\frac{\big(u-(a_1\mu_x+b_1)\big)\big(v-(a_2\mu_y+b_2)\big)}{(a_1\sigma_x)(a_2\sigma_y)}\right].$$
(c) From (b), $f_{Y_1,Y_2}(u,v)$ is a bivariate normal distribution with parameters $\mu_u = a_1\mu_x + b_1$, $\mu_v = a_2\mu_y + b_2$, $\sigma_u = |a_1|\sigma_x$, $\sigma_v = |a_2|\sigma_y$, and $\rho$ if $a_1a_2 > 0$ (or $-\rho$ if $a_1a_2 < 0$).

Problems 9.
(i) With $D = a_{11}a_{22} - a_{12}a_{21}$,
$$h_1(u,v) = \frac{a_{22}(u-b_1) - a_{12}(v-b_2)}{D}, \qquad h_2(u,v) = \frac{-a_{21}(u-b_1) + a_{11}(v-b_2)}{D}.$$
Thus, $J_h(u,v) = \dfrac{1}{D}$.
(ii) Note that $f_{X_1,X_2}(x,y) = \dfrac{1}{2\pi}\exp\!\left(-\dfrac{1}{2}(x^{2}+y^{2})\right)$. Then we obtain
$$f_{Y_1,Y_2}(u,v) = f_{X_1,X_2}\big(h_1(u,v), h_2(u,v)\big)\,\frac{1}{|D|} = \frac{1}{2\pi\,|D|}\exp\!\left(-\frac{1}{2}Q(u,v)\right),$$
where
$$Q(u,v) = \frac{1}{D^{2}}\begin{bmatrix}u-b_1 & v-b_2\end{bmatrix}\begin{bmatrix}a_{21}^{2}+a_{22}^{2} & -(a_{11}a_{21}+a_{12}a_{22})\\ -(a_{11}a_{21}+a_{12}a_{22}) & a_{11}^{2}+a_{12}^{2}\end{bmatrix}\begin{bmatrix}u-b_1\\ v-b_2\end{bmatrix}.$$
(iii) By comparing $f_{Y_1,Y_2}(u,v)$ with a bivariate normal density, we can find that $f_{Y_1,Y_2}(u,v)$ is a bivariate normal distribution with parameters $\mu_u = b_1$, $\mu_v = b_2$, $\sigma_u^{2} = a_{11}^{2}+a_{12}^{2}$, $\sigma_v^{2} = a_{21}^{2}+a_{22}^{2}$, and
$$\rho = \frac{a_{11}a_{21}+a_{12}a_{22}}{\sqrt{(a_{11}^{2}+a_{12}^{2})(a_{21}^{2}+a_{22}^{2})}}.$$

Problems 10.
(a) $h_1(u,v) = uv$ and $h_2(u,v) = u - uv$. Thus, $J_h(u,v) = \det\begin{bmatrix}v & u\\ 1-v & -u\end{bmatrix} = -u$.
(b) $f_{X,Y}(x,y) = f_X(x)f_Y(y)$.

(c) We can obtain
$$f_{U,V}(u,v) = u\,f_X(uv)\,f_Y(u-uv) = u\,\frac{\lambda^{\alpha_1}}{\Gamma(\alpha_1)}e^{-\lambda(uv)}(uv)^{\alpha_1-1}\,\frac{\lambda^{\alpha_2}}{\Gamma(\alpha_2)}e^{-\lambda(u-uv)}(u-uv)^{\alpha_2-1} = f_U(u)\,f_V(v),$$
where
$$f_U(u) = \frac{\lambda^{\alpha_1+\alpha_2}}{\Gamma(\alpha_1+\alpha_2)}e^{-\lambda u}u^{\alpha_1+\alpha_2-1} \quad\text{and}\quad f_V(v) = \frac{\Gamma(\alpha_1+\alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)}\,v^{\alpha_1-1}(1-v)^{\alpha_2-1}.$$
Thus, $f_U(u)$ has a gamma distribution with parameter $(\alpha_1+\alpha_2, \lambda)$, and $f_V(v)$ has a beta distribution with parameter $(\alpha_1, \alpha_2)$.

Problems 11.
(a) Observe that $f(x,y) = (xe^{-x})(e^{-y})$ and that $xe^{-x}$, $x \ge 0$, is the gamma density with parameter $(2, 1)$ and $e^{-y}$, $y \ge 0$, is the exponential density with $\lambda = 1$. Thus, we can obtain
$$f_X(x) = \int_0^{\infty} f(x,y)\,dy = xe^{-x}\int_0^{\infty}e^{-y}\,dy = xe^{-x},\ x \ge 0; \qquad f_Y(y) = \int_0^{\infty} f(x,y)\,dx = e^{-y}\int_0^{\infty}xe^{-x}\,dx = e^{-y},\ y \ge 0.$$
Since $f(x,y) = f_X(x)f_Y(y)$, $X$ and $Y$ are independent.
(b) Notice that the exponential density $f_Y(y)$ with $\lambda = 1$ can be seen as the gamma distribution with $(1, 1)$. Since the sum of independent gamma random variables (with a common $\lambda$) again has a gamma distribution, $U$ has the gamma distribution with $(3, 1)$.

Problems 12. Let $X_1$ be the amount of money you invest in the first year, let $X_2$ be the amount in the second year, and let $X_3$ be the amount in the third year. Then the total amount of money you have at the end of term, say $Y$, becomes $Y = 1.331X_1 + 1.21X_2 + 1.1X_3$.
(a) $E(1.331X_1 + 1.21X_2 + 1.1X_3) = 1.331E(X_1) + 1.21E(X_2) + 1.1E(X_3) = 3.641 \times 500 = 1820.5$.
(b) $\mathrm{Var}(1.331X_1 + 1.21X_2 + 1.1X_3) = (1.331)^{2}\mathrm{Var}(X_1) + (1.21)^{2}\mathrm{Var}(X_2) + (1.1)^{2}\mathrm{Var}(X_3) = 4.446 \times (200)^{2} \approx 177{,}826$. Thus, the standard deviation is $\sqrt{\mathrm{Var}(Y)} \approx 421.7$.
(c) $P(Y \ge 2000) = P\!\left(\dfrac{Y-1820.5}{421.7} \ge \dfrac{2000-1820.5}{421.7}\right) \approx 1 - \Phi(0.43) \approx 0.334$.

Problems 13.
(a) By using the change-of-variable formula with $g(x) = x^{n}$, we can obtain
$$f_Y(y) = f_X\big(g^{-1}(y)\big)\left|\frac{d}{dy}g^{-1}(y)\right| = f_X\big(y^{1/n}\big)\,\frac{1}{n}\,y^{(1/n)-1} = \frac{\lambda}{n}\,y^{(1/n)-1}\exp\!\left(-\lambda y^{1/n}\right), \quad y \ge 0.$$
(b) Since the pdf $f(x)$ is of the form of the gamma density with $(n+1, \lambda)$, we must have
$$f(x) = \frac{\lambda^{n+1}}{\Gamma(n+1)}\,x^{n}e^{-\lambda x},\ x \ge 0. \qquad\text{Therefore, } c = \frac{\lambda^{n+1}}{\Gamma(n+1)}\ \left(= \frac{\lambda^{n+1}}{n!},\ \text{since $n$ is an integer}\right).$$
(c) Since $f(x)$ in (b) is a pdf, we have $\displaystyle\int_0^{\infty} f(x)\,dx = \frac{\lambda^{n+1}}{\Gamma(n+1)}\int_0^{\infty}x^{n}e^{-\lambda x}\,dx = 1$. Thus, we can obtain
$$E[Y] = E[X^{n}] = \int_0^{\infty}x^{n}\big(\lambda e^{-\lambda x}\big)\,dx = \frac{\Gamma(n+1)}{\lambda^{n}}\ \left(= \frac{n!}{\lambda^{n}},\ \text{since $n$ is an integer}\right).$$

Problems 14.
(i) $\sigma_u^{2} = \mathrm{Var}(U) = a_{11}^{2}\sigma_x^{2} + a_{12}^{2}\sigma_y^{2}$ and $\sigma_v^{2} = \mathrm{Var}(V) = a_{21}^{2}\sigma_x^{2} + a_{22}^{2}\sigma_y^{2}$.
(ii) $h_1(u,v) = \dfrac{1}{D}(a_{22}u - a_{12}v)$ and $h_2(u,v) = \dfrac{1}{D}(-a_{21}u + a_{11}v)$.
(iii) We have $J_h(u,v) = \dfrac{1}{D}$.
(a) $\rho = \dfrac{a_{11}a_{21}\sigma_x^{2} + a_{12}a_{22}\sigma_y^{2}}{\sqrt{(a_{11}^{2}\sigma_x^{2} + a_{12}^{2}\sigma_y^{2})(a_{21}^{2}\sigma_x^{2} + a_{22}^{2}\sigma_y^{2})}}$.
(b) $Q(u,v)$ can be expressed as
$$Q(u,v) = \frac{1}{1-\rho^{2}}\left[\left(\frac{u}{\sigma_u}\right)^{2} + \left(\frac{v}{\sigma_v}\right)^{2} - 2\rho\,\frac{uv}{\sigma_u\sigma_v}\right].$$
Thus, the joint density function $f_{U,V}(u,v)$ is the bivariate normal density with parameters $\mu_u = \mu_v = 0$ together with $\sigma_u^{2}$, $\sigma_v^{2}$, $\rho$ as calculated in (a).
(c) $U' = U + a_{11}\mu_x + a_{12}\mu_y$ and $V' = V + a_{21}\mu_x + a_{22}\mu_y$. Then $U'$ and $V'$ have the bivariate normal density with parameters $\mu_{u'} = a_{11}\mu_x + a_{12}\mu_y$, $\mu_{v'} = a_{21}\mu_x + a_{22}\mu_y$, $\sigma_u^{2}$, $\sigma_v^{2}$, and $\rho$.
(d) $(U,V)$ has a bivariate normal distribution with parameters $\mu_u = a_{11}\mu_x + a_{12}\mu_y$, $\mu_v = a_{21}\mu_x + a_{22}\mu_y$, $\sigma_u^{2} = a_{11}^{2}\sigma_x^{2} + a_{12}^{2}\sigma_y^{2}$, $\sigma_v^{2} = a_{21}^{2}\sigma_x^{2} + a_{22}^{2}\sigma_y^{2}$, and
$$\rho = \frac{a_{11}a_{21}\sigma_x^{2} + a_{12}a_{22}\sigma_y^{2}}{\sqrt{(a_{11}^{2}\sigma_x^{2} + a_{12}^{2}\sigma_y^{2})(a_{21}^{2}\sigma_x^{2} + a_{22}^{2}\sigma_y^{2})}}.$$

Problems 15.
(a) $E[U] = E[X] + E[Y] = \mu$ and $\mathrm{Var}(U) = \mathrm{Var}(X) + \mathrm{Var}(Y) = \sigma_x^{2} + \sigma_y^{2}$.
(b) By using the linear transformation theorem, we can find that $X$ and $U$ have the bivariate normal distribution with means $\mu_x = \mu$ and $\mu_u = \mu$, variances $\sigma_x^{2}$ and $\sigma_u^{2} = \sigma_x^{2} + \sigma_y^{2}$, and $\rho = \dfrac{\sigma_x}{\sqrt{\sigma_x^{2}+\sigma_y^{2}}}$.
(c) $E(X\mid U=t) = \mu_x + \rho\,\dfrac{\sigma_x}{\sigma_u}(t-\mu_u) = \mu + \dfrac{\sigma_x^{2}}{\sigma_x^{2}+\sigma_y^{2}}(t-\mu)$.
(d) $E(X\mid U=3) = 0 + \dfrac{1}{1+0.25}(3-0) = 2.4$.

Answers to exercises (b) Since the pdf f(x) is of the form of gamma density with (n +, λ), we must have f(x) λ n+ Γ(n + ) xn e λx, x Therefore, c λn+ λn+ (, since n is an integer) n! Γ(n + ) (c) Since f(x) in (b) is a pdf, we have f(x) dx λ n+ Γ(n+) xn e λx dx Thus, we can obtain E[Y E[X n x n ( λe λx) dx Γ(n+) λ n ( n! λ n, since n is an integer) Problems 4 (i) σ u Var(U) a σ x + a σ y and σ v Var(V ) a σ x + a σ y (ii) h (u, v) D (a u a v) and h (u, v) D ( a u + a v) (iii) We have J h (u, v) D (a) ρ a a σ x +a a σ y (a σ x+a )(a +a ) (b) Q(u, v) can be expressed as Q(u, v) ρ [ ( ) ( ) u v + ρ uv σ u σ v σ u σ v Thus, the joint density function f U,V (u, v) is the bivariate normal density with parameters µ u µ v together with σ u, σ v, ρ as calculated in (a) and (c) (c) U U + a µ x + a µ y and V V + a µ x + a µ y Then U and V has the bivariate normal density with parameters µ u a µ x + a µ y, µ v a µ x + a µ y, σ u, σ v, and ρ (d) (U, V ) has a bivariate normal distribution with parameter µ u a µ x + a µ y, µ v a µ x + a µ y, σu a σx + a σy, σv a σx + a σy, a and ρ a σx+a a σy (a σx +a )(a +a σ) y Problems 5 (a) E[U E[X + E[Y µ and Var(U) Var(X) + Var(Y ) σ x + σ y (b) By using linear transformation theorem, we can find that X and U have the bivariate normal distribution with means µ x µ and µ u µ, variances σx σx and σu σx + σy, and ρ σ x + σy (c) E(X U t) µ x + ρ σ u (t µ u ) µ + (d) E(X U 3) + (t µ) σx + σy () (3 ) 4 () + (5) Page 9 Probability and Statistics I/December 4, 7