ECE534, Spring 08: Solutions for Problem Set #3

1 Jointly Gaussian Random Variables and MMSE Estimation

Suppose that $X, Y$ are jointly Gaussian random variables with $\mu_X = \mu_Y = 0$ and $\sigma_X = \sigma_Y = 1$. Let their correlation coefficient be $\rho$, with $|\rho| < 1$. Based on $X, Y$, we define the following random variables:
\[
W = \frac{1}{2}\left(\frac{1}{\sqrt{1+\rho}} + \frac{1}{\sqrt{1-\rho}}\right) X + \frac{1}{2}\left(\frac{1}{\sqrt{1+\rho}} - \frac{1}{\sqrt{1-\rho}}\right) Y,
\]
\[
Z = \frac{1}{2}\left(\frac{1}{\sqrt{1+\rho}} - \frac{1}{\sqrt{1-\rho}}\right) X + \frac{1}{2}\left(\frac{1}{\sqrt{1+\rho}} + \frac{1}{\sqrt{1-\rho}}\right) Y.
\]

(a) Are $W, Z$ jointly Gaussian? Justify your answer.
(b) Calculate $f_{WZ}(w,z)$.
(c) Find the MMSE estimator of $Z$ given $W$.
(d) Find the linear MMSE estimator of $X$ given $W$.

Solution:

(a) Let $\alpha = \frac{1}{2}\big(\frac{1}{\sqrt{1+\rho}} + \frac{1}{\sqrt{1-\rho}}\big)$ and $\beta = \frac{1}{2}\big(\frac{1}{\sqrt{1+\rho}} - \frac{1}{\sqrt{1-\rho}}\big)$, so that $W = \alpha X + \beta Y$ and $Z = \beta X + \alpha Y$. Note that any linear combination of $W, Z$ corresponds to a linear combination of $X, Y$:
\[
aW + bZ = (a\alpha + b\beta)X + (a\beta + b\alpha)Y,
\]
which is Gaussian since $X, Y$ are jointly Gaussian. Therefore, $W$ and $Z$ are jointly Gaussian.

(b) From (a), $W$ and $Z$ have a bivariate Gaussian density, determined by the mean vector and the covariance matrix. Clearly, $\mu_W = \mu_Z = 0$. For the variances and the correlation coefficient, using $\alpha^2 + \beta^2 = \frac{1}{1-\rho^2}$ and $2\alpha\beta = -\frac{\rho}{1-\rho^2}$, we have:
\[
\sigma_W^2 = \mathrm{Cov}(W,W) = \alpha^2 + \beta^2 + 2\alpha\beta\rho = \frac{1}{1-\rho^2} - \frac{\rho^2}{1-\rho^2} = 1 = \sigma_Z^2,
\]
\[
\rho_{W,Z} = \mathrm{Cov}(W,Z) = 2\alpha\beta + (\alpha^2+\beta^2)\rho = -\frac{\rho}{1-\rho^2} + \frac{\rho}{1-\rho^2} = 0.
\]
Therefore, $W, Z$ are uncorrelated, hence independent, being jointly Gaussian. This shows that for $(w,z) \in \mathbb{R}^2$:
\[
f_{WZ}(w,z) = f_W(w)\, f_Z(z) = \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2} \cdot \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} = \frac{1}{2\pi}\, e^{-(w^2+z^2)/2}.
\]

(c) The MMSE estimator of $Z$ given $W$ is the conditional mean $E[Z \mid W]$. Since $Z, W$ are independent, we have $E[Z \mid W] = E[Z] = \mu_Z = 0$.
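Before moving to part (d), the claims of parts (a)-(c) are easy to check numerically. The following is a minimal Monte Carlo sketch (assuming NumPy; the value $\rho = 0.6$ and the sample size are arbitrary choices, not part of the problem): the empirical covariance matrix of $(W, Z)$ should be close to the $2 \times 2$ identity.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.6          # any value with |rho| < 1; 0.6 is an arbitrary choice
n = 1_000_000

# Draw (X, Y): zero means, unit variances, correlation rho.
cov = np.array([[1.0, rho], [rho, 1.0]])
X, Y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# The coefficients alpha, beta from the solution of part (a).
alpha = 0.5 * (1 / np.sqrt(1 + rho) + 1 / np.sqrt(1 - rho))
beta = 0.5 * (1 / np.sqrt(1 + rho) - 1 / np.sqrt(1 - rho))

W = alpha * X + beta * Y
Z = beta * X + alpha * Y

# Empirical covariance of (W, Z): should be close to the 2x2 identity,
# i.e., unit variances and Cov(W, Z) = 0.
print(np.cov(np.vstack([W, Z])))
```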
(d) A straightforward computation gives $\mathrm{Cov}(X,W) = \alpha + \beta\rho = \frac{1}{2}\big(\sqrt{1+\rho} + \sqrt{1-\rho}\big)$. We now have:
\[
\hat{E}[X \mid W] = \mu_X + \frac{\mathrm{Cov}(X,W)}{\sigma_W^2}\,(W - \mu_W) = \frac{1}{2}\big(\sqrt{1+\rho} + \sqrt{1-\rho}\big)\, W.
\]

2 Interplay between Information Theory and Estimation

Let $P, Q$ be two distributions defined on $\mathbb{R}^k$ with densities $f_P, f_Q$, respectively. Assume that the support of the densities coincides with $\mathbb{R}^k$, i.e., $f_P(x) > 0$ and $f_Q(x) > 0$ for any $x \in \mathbb{R}^k$. Their Kullback-Leibler divergence is then defined as follows:
\[
D(P\,\|\,Q) = \int_{\mathbb{R}^k} f_P(x) \log \frac{f_P(x)}{f_Q(x)}\, dx.
\]
$D(P\,\|\,Q)$ corresponds to a measure of dissimilarity between $P$ and $Q$. Consider now two continuous random variables $X$ and $Y$ with joint distribution $P_{XY}$ and marginals $P_X, P_Y$, respectively. Suppose that the support of $P_{XY}$ is $\mathbb{R}^2$ and also that the support of each marginal is $\mathbb{R}$. Then, the mutual information $I(X;Y)$ between $X, Y$ is defined as:
\[
I(X;Y) = D(P_{XY}\,\|\,P_X P_Y).
\]
Assume that two continuous random variables $X$ and $Y$ are related by the following relationship:
\[
Y = aX + N, \tag{1}
\]
where $a \neq 0$ is a deterministic parameter, $X \sim \mathcal{N}(0,1)$, and $N \sim \mathcal{N}(0,1)$ is independent of $X$. We already know that $\hat{X} = \hat{X}(a) = E[X \mid Y]$ is the MMSE estimator of $X$ given $Y$, while
\[
\mathrm{MMSE} = \mathrm{MMSE}(a) = E\big[(X - \hat{X})^2\big]
\]
is the achievable mean square error. Note that we have explicitly shown the dependence of the estimator and the MMSE on the parameter $a$. Show that
\[
\frac{d}{da}\, I(X;Y) = a \cdot \mathrm{MMSE}(a) \tag{2}
\]
by proving the following steps:

(a) $I(X;Y) = \frac{1}{2}\log(1 + a^2)$.
(b) $\hat{X} = \frac{a}{1+a^2}\, Y$.
(c) $\mathrm{MMSE}(a) = \frac{1}{1+a^2}$.
(d) Combine the above steps to conclude that (2) holds.

Note: All logarithms in this exercise are natural (base $e$).
Solution:

(a) Note that the mutual information can be written in expectation form:
\[
I(X;Y) = E_{X,Y}\left[\log \frac{f_{XY}(X,Y)}{f_X(X)\, f_Y(Y)}\right].
\]
For the involved densities, we note that $Y \sim \mathcal{N}(0, 1+a^2)$ and therefore
\[
f_Y(y) = \frac{1}{\sqrt{2\pi(1+a^2)}}\, e^{-\frac{y^2}{2(1+a^2)}}.
\]
For the joint density we have $f_{Y|X=x}(y) = f_N(y - ax)$ (independence of $X, N$), hence
\[
f_{X,Y}(x,y) = f_{Y|X=x}(y)\, f_X(x) = f_N(y - ax)\, f_X(x)
\]
and
\[
\frac{f_{X,Y}(x,y)}{f_X(x)\, f_Y(y)} = \frac{f_N(y - ax)}{f_Y(y)} = \frac{e^{-(y-ax)^2/2}/\sqrt{2\pi}}{e^{-\frac{y^2}{2(1+a^2)}}/\sqrt{2\pi(1+a^2)}} = \sqrt{1+a^2}\; e^{-\frac{(y-ax)^2}{2} + \frac{y^2}{2(1+a^2)}}.
\]
Using the previous expressions in $I(X;Y)$, we obtain:
\[
I(X;Y) = E_{X,Y}\left[-\frac{(Y - aX)^2}{2} + \frac{Y^2}{2(1+a^2)} + \frac{1}{2}\log(1+a^2)\right]
= \frac{1}{2}\log(1+a^2) + \frac{1}{2}\left[\frac{E_Y[Y^2]}{1+a^2} - E_Y[Y^2] - a^2 E_X[X^2] + 2a\, E_{X,Y}[XY]\right].
\]
Moreover, $E_{X,Y}[XY] = E[X(N + aX)] = a$, $E_X[X^2] = 1$, and $E_Y[Y^2] = 1 + a^2$, thus
\[
I(X;Y) = \frac{1}{2}\log(1+a^2) + \frac{1}{2}\big[1 - (1+a^2) - a^2 + 2a^2\big] = \frac{1}{2}\log(1+a^2).
\]

(b) The MMSE estimator is linear ($X, Y$ are jointly Gaussian) and it is given by
\[
\hat{X} = \mu_X + \underbrace{\frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(Y)}}_{=\,a/(1+a^2)}\,(Y - \mu_Y) = \frac{a}{1+a^2}\, Y,
\]
since $\mathrm{Cov}(X,Y) = a$ and $\mathrm{Var}(Y) = 1 + a^2$.

(c)
\[
\mathrm{MMSE} = E\big[(X - \hat{X})^2\big] = \mathrm{Var}(X) - \frac{\mathrm{Cov}(X,Y)^2}{\mathrm{Var}(Y)} = 1 - \frac{a^2}{1+a^2} = \frac{1}{1+a^2}.
\]
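Before combining the steps in part (d), here is a quick Monte Carlo check of parts (b) and (c); a minimal sketch assuming NumPy, with an arbitrary gain $a = 1.5$.

```python
import numpy as np

rng = np.random.default_rng(1)
a = 1.5            # arbitrary nonzero gain
n = 1_000_000

X = rng.standard_normal(n)
N = rng.standard_normal(n)
Y = a * X + N      # the observation model (1)

X_hat = (a / (1 + a**2)) * Y          # estimator from part (b)

mse_empirical = np.mean((X - X_hat)**2)
mse_theory = 1 / (1 + a**2)           # claim of part (c)
print(mse_empirical, mse_theory)      # the two values should nearly match
```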
(d) Combining the above steps, (2) follows by differentiating the mutual information:
\[
\frac{d}{da}\, I(X;Y) = \frac{d}{da}\left[\frac{1}{2}\log(1+a^2)\right] = \frac{a}{1+a^2} = a \cdot \mathrm{MMSE}(a).
\]

3 Data Reduction, Sufficient Statistics and the MVUE

Assume that we want to estimate an unknown parameter $\theta$. Treating $\theta$ as a deterministic variable, a good estimator $\hat\theta$ of $\theta$ is one which takes values close to the true parameter $\theta$ when attempting to estimate it. If $\hat\theta$ is an unbiased estimator of $\theta$, then $\mathrm{MSE} = E[(\hat\theta - \theta)^2] = \mathrm{Var}(\hat\theta)$, as we have proved in class. If the bias of $\hat\theta$, $b(\theta) = E[\hat\theta] - \theta$, is nonzero, the MSE is still a good measure to evaluate the performance of a biased estimator $\hat\theta$.

Given a set of observations $\mathcal{X} = \{X_1, X_2, \ldots, X_n\}$ containing information about the unknown parameter $\theta$, a sufficient statistic is a function of the data $T(\mathcal{X}) = T(X_1, X_2, \ldots, X_n)$ containing all information that the data brings about $\theta$ (any measurable function of the data $t(\mathcal{X})$ is a statistic, but not necessarily a sufficient one). More rigorously, and assuming for simplicity that the data has a joint pdf, $T(\mathcal{X})$ is a sufficient statistic for $\theta$ if the conditional pdf
\[
p(X_1, X_2, \ldots, X_n \mid T(\mathcal{X});\, \theta) = p(\mathcal{X} \mid T(\mathcal{X});\, \theta)
\]
does not depend on $\theta$. Clearly, when this holds, $T(\mathcal{X})$ provides all information hidden in the data about $\theta$; therefore, the initial set of data $\mathcal{X}$ can be discarded and only $T(\mathcal{X})$ stored (hence the name sufficient statistic). Moreover, sufficient statistics are not unique.

Relevant to the notion of a sufficient statistic, the following two theorems are important:

Theorem 1 (Fisher-Neyman factorization theorem): $T(\mathcal{X})$ is a sufficient statistic if and only if the joint pdf of the data can be factored as
\[
p(\mathcal{X};\, \theta) = h(\mathcal{X})\, g(T(\mathcal{X}), \theta).
\]
That is, the joint pdf is factored into two parts: one part that depends only on the statistic and the parameter $\theta$, and a second part that is independent of $\theta$.

Theorem 2 (Rao-Blackwell theorem): Let $\hat\theta_1$ be an estimator of $\theta$ with $E[\hat\theta_1^2] < \infty$ for all $\theta$. Suppose that $T(\mathcal{X})$ is a sufficient statistic for $\theta$ and let $\hat\theta_2 = E[\hat\theta_1 \mid T(\mathcal{X})]$. Then for all $\theta$,
\[
E\big[(\hat\theta_2 - \theta)^2\big] \le E\big[(\hat\theta_1 - \theta)^2\big].
\]
The inequality is strict unless $\hat\theta_1$ is a function of the sufficient statistic $T(\mathcal{X})$.
As is clear from the theorem, $\hat\theta_2$ is an estimator of $\theta$, and if $\hat\theta_1$ is unbiased, then so is $\hat\theta_2$, since
\[
E[\hat\theta_2] = E\big[E[\hat\theta_1 \mid T(\mathcal{X})]\big] = E[\hat\theta_1] = \theta.
\]
Moreover, if there is a unique function $q(T(\mathcal{X}))$ which is an unbiased estimator of $\theta$, then $q(T(\mathcal{X}))$ is the MVUE (minimum variance unbiased estimator).

To finish the above introductory material, we note that a statistic $t(\mathcal{X})$ is complete if, for any measurable function $\gamma$, the condition $E[\gamma(t(\mathcal{X}))] = 0$ for all $\theta$ implies that $\gamma \equiv 0$ almost surely. If the sufficient statistic $T(\mathcal{X})$ is also complete, then there will be at most one function $q(T(\mathcal{X}))$ that corresponds to an unbiased estimator for $\theta$.

With the above in mind, an approach to identify the MVUE is the following:

(i) Find a sufficient statistic $T(\mathcal{X})$.
(ii) Argue that this statistic is complete.
(iii) Find an unbiased estimator $q(T(\mathcal{X}))$ of $\theta$. This can be done via Rao-Blackwellization, by considering any unbiased estimator $\hat\theta_1$ of $\theta$ and setting $q(T(\mathcal{X})) = E[\hat\theta_1 \mid T(\mathcal{X})]$.
(iv) The MVUE is $\hat\theta = q(T(\mathcal{X}))$. Moreover, the MVUE is unique.

The above approach is based on the Lehmann-Scheffé theorem, which is a direct consequence of the previous two theorems and the completeness of the statistic $T(\mathcal{X})$.

Let $\mathcal{X}$ be a set of $n$ i.i.d. $\mathcal{N}(\mu, \sigma^2)$ observations.

(a) Assume that the unknown parameter $\theta$ is the mean value $\mu$. Use Theorem 1 to show that $S_n = \sum_{i=1}^n X_i$ is a sufficient statistic for $\mu$.

(b) Assume that $\mu$ is known but $\theta = \sigma^2$ is the unknown parameter. Use Theorem 1 to show that $\sum_{i=1}^n (X_i - \mu)^2$ is a sufficient statistic for $\sigma^2$.

(c) Assume that both $\mu$ and $\sigma^2$ are unknown, i.e., $\theta = (\mu, \sigma^2)$. Use Theorem 1 to show that the pair $\big(\sum_{i=1}^n X_i,\, \sum_{i=1}^n X_i^2\big)$ is a sufficient statistic in this case.

(d) Given the set of data $\mathcal{X} = \{X_1, X_2, \ldots, X_n\}$ with $\theta = \mu$, a possible statistic is $T(\mathcal{X}) = (X_1, X_2)$. Argue that $T(\mathcal{X})$ is not complete by exhibiting a suitable nonzero function $\gamma$.

(e) Assume that the unknown parameter is $\mu$ and note that $\hat\theta = X_1$ is an unbiased estimator for $\mu$. Argue that the MVUE (which is unique) for $\mu$ is $\frac{S_n}{n} = \frac{1}{n}\sum_{i=1}^n X_i$.
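Before the analytical solution, a small simulation illustrating the Rao-Blackwell improvement behind part (e): conditioning the crude unbiased estimator $X_1$ on the sufficient statistic $S_n$ yields $S_n/n$, whose variance is smaller by a factor of $n$. This is a sketch assuming NumPy; the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n = 2.0, 1.0, 10    # arbitrary true parameters and sample size
trials = 200_000

X = rng.normal(mu, sigma, size=(trials, n))

crude = X[:, 0]                # the unbiased estimator X_1
rb = X.mean(axis=1)            # its Rao-Blackwellization S_n / n

# Both estimators are unbiased, but conditioning on the sufficient
# statistic S_n reduces the variance from sigma^2 to sigma^2 / n.
print(crude.mean(), crude.var())   # ~ mu, ~ sigma^2
print(rb.mean(), rb.var())         # ~ mu, ~ sigma^2 / n
```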
Solution:

(a) Consider an arbitrary realization $x_1, x_2, \ldots, x_n$ of $\mathcal{X}$. The joint density of $\mathcal{X}$ (all random variables are i.i.d.) can be factored as follows:
\[
p(x_1, \ldots, x_n;\, \mu) = \prod_{i=1}^n f_{X_i}(x_i) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}\right)
\]
\[
= (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}\Big(\sum_i x_i^2 - 2\mu \sum_i x_i + n\mu^2\Big)\right)
= (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}\Big(\sum_i x_i^2 - 2\mu S_n + n\mu^2\Big)\right)
\]
\[
= \underbrace{(2\pi\sigma^2)^{-n/2} \exp\left(-\frac{\sum_i x_i^2}{2\sigma^2}\right)}_{h(\mathbf{x})} \cdot \underbrace{\exp\left(\frac{2\mu S_n - n\mu^2}{2\sigma^2}\right)}_{g(T(\mathbf{x}),\, \mu)}.
\]
Therefore, by Theorem 1, $T(\mathcal{X}) = S_n$ is a sufficient statistic. (Note: for notational conservatism, here we use $S_n$ to also denote the realized value of the statistic $S_n$.)

(b) Let $Q_n = \sum_{i=1}^n (X_i - \mu)^2$. Again, we consider the joint density, but this time with $\theta = \sigma^2$:
\[
p(x_1, \ldots, x_n;\, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{\sum_i (x_i - \mu)^2}{2\sigma^2}\right) = \underbrace{1}_{h(\mathbf{x})} \cdot \underbrace{(2\pi\sigma^2)^{-n/2} \exp\left(-\frac{Q_n}{2\sigma^2}\right)}_{g(T(\mathbf{x}),\, \sigma^2)}.
\]

(c) Let $R_n := \sum_{i=1}^n x_i^2$, so that $T(\mathcal{X}) = (S_n, R_n)$. This gives rise to
\[
p(x_1, \ldots, x_n;\, \theta) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}\Big(\sum_i x_i^2 - 2\mu \sum_i x_i + n\mu^2\Big)\right)
= \underbrace{1}_{h(\mathbf{x})} \cdot \underbrace{(2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}\big(R_n - 2\mu S_n + n\mu^2\big)\right)}_{g(T(\mathbf{x}),\, \theta)}.
\]

(d) We seek some nonzero function $\gamma$ for which $E[\gamma(X_1, X_2)] = 0$ for all $\mu$. Consider a linear function of the form $\gamma(x_1, x_2) = a x_1 + b x_2$ with $a, b \neq 0$. Then
\[
E[\gamma(X_1, X_2)] = (a + b)\mu.
\]
Clearly, whenever $a + b = 0$ we get $E[\gamma(X_1, X_2)] = 0$ for all $\mu$, whereas
\[
P(\gamma(X_1, X_2) = 0) = P(X_1 = X_2) = 0 \quad \text{for all } \mu.
\]
Therefore, $\gamma(T(\mathcal{X})) \neq 0$ almost surely, and hence $T(\mathcal{X})$ is not complete.

(e) We follow the Lehmann-Scheffé steps described above.

(i) We already established that $S_n$ is a sufficient statistic for $\mu$.

(ii) [bonus part] To argue that $S_n$ is complete, note that $S_n \sim \mathcal{N}(n\mu, n\sigma^2)$. Suppose that for all $\mu \in \mathbb{R}$ we have:
\[
E[\gamma(S_n)] = \int \gamma(s)\, \frac{1}{\sqrt{2\pi n\sigma^2}}\, e^{-\frac{(s - n\mu)^2}{2n\sigma^2}}\, ds = 0. \tag{3}
\]
We need to show that $\gamma \equiv 0$ almost surely. Expanding the square and discarding the nonzero factor $e^{-n\mu^2/(2\sigma^2)}$, we observe that the integral in (3) is a bilateral Laplace transform:
\[
E[\gamma(S_n)] = 0 \iff \int \tilde\gamma(s)\, e^{s\mu/\sigma^2}\, ds = \mathcal{B}\{\tilde\gamma\}\Big(-\frac{\mu}{\sigma^2}\Big) = 0, \quad \forall \mu \in \mathbb{R}. \tag{4}
\]
Here, $\tilde\gamma(s) := \frac{1}{\sqrt{2\pi n\sigma^2}}\, e^{-\frac{s^2}{2n\sigma^2}}\, \gamma(s)$. Assuming that $\gamma$ is continuous, we deduce that $\tilde\gamma$ is continuous. By the uniqueness properties of the bilateral Laplace transform, (4) implies that $\tilde\gamma \equiv 0$ almost surely and, thus, $\gamma \equiv 0$ almost surely.

(iii) Using Rao-Blackwellization, we begin with some unbiased estimator of $\mu$, which in this case is $\hat\mu = X_1$, and seek the $q$ given by $q(S_n) = E[\hat\mu \mid S_n] = E[X_1 \mid S_n]$. Note that $q(S_n) = E[X_i \mid S_n]$ for all $i = 1, \ldots, n$ by the i.i.d. assumption on $X_1, X_2, \ldots, X_n$. By summing, we obtain:
\[
n\, q(S_n) = \sum_{i=1}^n E[X_i \mid S_n] = E\left[\sum_{i=1}^n X_i \,\Big|\, S_n\right] = E[S_n \mid S_n] = S_n.
\]
Therefore, $q(S_n) = \frac{1}{n} S_n$, which is the (unique) MVUE for $\mu$.

4 Maximum of a Finite Set of Sub-Gaussian Random Variables

A random variable $X \in \mathbb{R}$ is called sub-Gaussian with variance proxy $\sigma^2$ if $E[X] = 0$ and its moment generating function satisfies:
\[
m_X(u) = E\big[e^{uX}\big] \le e^{\frac{u^2\sigma^2}{2}}, \quad \forall u \in \mathbb{R}.
\]
We write $X \sim \mathrm{subG}(\sigma^2)$.

(a) Use the Chernoff bound to show that when $X \sim \mathrm{subG}(\sigma^2)$,
\[
P(X > t) \le e^{-\frac{t^2}{2\sigma^2}}, \quad \forall t > 0.
\]
(b) Let $X_1, X_2, \ldots, X_n$ be $\mathrm{subG}(\sigma^2)$ random variables, not necessarily independent. Show that the expectation of the maximum can be bounded as
\[
E\Big[\max_{1 \le i \le n} X_i\Big] \le \sigma\sqrt{2\log n}.
\]
Hint: Start your derivation by noting that
\[
E\Big[\max_{1 \le i \le n} X_i\Big] = \frac{1}{\lambda}\, E\Big[\log e^{\lambda \max_{1 \le i \le n} X_i}\Big], \quad \forall \lambda > 0.
\]

(c) With the same assumptions as in the previous part, show that
\[
P\Big(\max_{1 \le i \le n} X_i > t\Big) \le n\, e^{-\frac{t^2}{2\sigma^2}}, \quad \forall t > 0.
\]

Solution:

(a) By applying the Chernoff bound, we obtain:
\[
P(X > t) = P\big(e^{uX} > e^{ut}\big) \le \underbrace{E\big[e^{uX}\big]}_{m_X(u)}\, e^{-ut}, \tag{5}
\]
which holds for any $u > 0$. By sub-Gaussianity,
\[
m_X(u) \le e^{\sigma^2 u^2 / 2}. \tag{6}
\]
Plugging (6) into (5) gives
\[
P(X > t) \le e^{\sigma^2 u^2/2 \,-\, ut} = e^{\phi(u)}, \tag{7}
\]
where $\phi(u) := \frac{\sigma^2 u^2}{2} - ut$. Choose $u = \frac{t}{\sigma^2}$, which minimizes $\phi(u)$, to obtain $\phi\big(\frac{t}{\sigma^2}\big) = -\frac{t^2}{2\sigma^2}$. Therefore, as required,
\[
P(X > t) \le e^{\phi(t/\sigma^2)} = e^{-\frac{t^2}{2\sigma^2}}.
\]

(b) Using the hint,
\[
E\big[\max_i X_i\big] = \frac{1}{\lambda}\, E\Big[\log e^{\lambda \max_i X_i}\Big]
\le \frac{1}{\lambda} \log E\Big[e^{\lambda \max_i X_i}\Big] \quad \text{(Jensen, concavity of log)}
\]
\[
= \frac{1}{\lambda} \log E\Big[\max_i e^{\lambda X_i}\Big] \quad \text{(monotonicity of } e^x\text{)}
\le \frac{1}{\lambda} \log \sum_{i=1}^n E\big[e^{\lambda X_i}\big]
\le \frac{1}{\lambda} \log\big(n\, e^{\lambda^2\sigma^2/2}\big) \quad \text{(sub-Gaussianity)}
= \underbrace{\frac{\log n}{\lambda} + \frac{\lambda\sigma^2}{2}}_{g(\lambda)}.
\]
Minimizing $g(\lambda)$ by setting its derivative to zero, we obtain $\lambda = \frac{\sqrt{2\log n}}{\sigma}$, for which
\[
E\big[\max_i X_i\big] \le \sigma\sqrt{2\log n}.
\]

(c) Invoking the union bound and part (a), we obtain:
\[
P\Big(\max_{1 \le i \le n} X_i > t\Big) = P\left(\bigcup_{i=1}^n \{X_i > t\}\right) \le \sum_{i=1}^n P(X_i > t) \le n\, e^{-\frac{t^2}{2\sigma^2}}.
\]

5 Erdős-Rényi Graphs

The distance between any two nodes in a graph is the length of the shortest path connecting them. The diameter of a graph is the maximum distance between any two nodes of the graph. Consider again the ensemble $G(n,p)$ of Erdős-Rényi graphs of Problem 3 of HW#. Suppose that $p \in (0,1)$ is fixed and let $D_n$ be the diameter of a graph drawn from this ensemble. Show that
\[
P(D_n \le 2) \to 1 \quad \text{as } n \to \infty.
\]
Note: If $p$ is allowed to scale with $n$, then $p(n) = \sqrt{2\log n / n}$ is a sharp threshold for the aforementioned property, meaning that if $p$ grows slower than $\sqrt{2\log n / n}$, then $P(D_n > 2) \to 1$ as $n \to \infty$.

Hint: Define the random variable $X_n$ as the number of node pairs $(u,v)$ in the graph $G \sim G(n,p)$ that are neither adjacent nor have a common neighbor. Show that $P(X_n = 0) \to 1$ as $n \to \infty$.

Solution: Using the provided hint, we note that $X_n = 0 \iff D_n \le 2$; i.e., the diameter of a graph is at most $2$ if and only if all pairs of nodes are either connected directly or separated by at most one intermediate neighbor. Equivalently, $X_n \ge 1 \iff D_n > 2$. By employing Markov's inequality, we have:
\[
P(D_n > 2) = P(X_n \ge 1) \le E[X_n].
\]
We need to show that $E[X_n] \to 0$ as $n \to \infty$, which in turn will guarantee that $P(D_n \le 2) \to 1$.
Let $(u,v)$ be a pair of nodes with $u \neq v$. Define the event $A_{u,v}$:
\[
A_{u,v} := \{d(u,v) > 2\} = \{u, v \text{ are neither neighbors nor share a common neighbor}\}.
\]
Here, $d(u,v)$ is the distance (shortest path length) between $u$ and $v$. The nodes $u, v$ are not adjacent (i.e., not neighbors) with probability $1 - p$; each of the remaining $n - 2$ nodes is adjacent to both $u$ and $v$ with probability $p^2$, hence fails to be a common neighbor with probability $1 - p^2$, independently across nodes. Therefore,
\[
P(A_{u,v}) = (1 - p)(1 - p^2)^{n-2}.
\]
Clearly $A_{u,v} = A_{v,u}$, since the graph is undirected. Therefore, we consider only the pairs $S_n := \{(u,v) : 1 \le u < v \le n\}$. We further note that the cardinality of $S_n$ is $\binom{n}{2}$. We now have:
\[
X_n = \sum_{(u,v) \in S_n} \mathbf{1}_{A_{u,v}},
\]
and thus
\[
E[X_n] = \sum_{(u,v) \in S_n} P(A_{u,v}) = \binom{n}{2}\, (1-p)(1-p^2)^{n-2} \le n^2 (1-p)\, e^{-(n-2)p^2} \to 0 \quad \text{as } n \to \infty,
\]
where the elementary inequality $e^x \ge 1 + x$, $x \in \mathbb{R}$, has been used.
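As a closing sanity check (a sketch assuming NumPy; the values of $n$, $p$, and the number of trials are arbitrary), one can sample adjacency matrices from $G(n,p)$ with fixed $p$ and verify that $D_n \le 2$ by checking that $A + A^2$ has no zero off-diagonal entries.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, trials = 200, 0.3, 100   # fixed p, moderately large n (arbitrary)

count = 0
for _ in range(trials):
    # Symmetric 0/1 adjacency matrix of G(n, p), no self-loops.
    upper = np.triu(rng.random((n, n)) < p, k=1)
    A = (upper | upper.T).astype(int)
    # d(u, v) <= 2 iff u, v are adjacent or share a neighbor,
    # i.e., iff (A + A @ A)[u, v] > 0.
    reach = (A + A @ A) > 0
    np.fill_diagonal(reach, True)
    count += bool(reach.all())

print(count / trials)          # should be 1.0: every sample has D_n <= 2
```

For $n = 200$ and $p = 0.3$, the bound derived above gives $E[X_n] \le n^2(1-p)\,e^{-(n-2)p^2} \approx 5 \times 10^{-4}$, so essentially every sampled graph should pass the check.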