ECE534, Spring 2018: Solutions for Problem Set #3

1 Jointly Gaussian Random Variables and MMSE Estimation

Suppose that $X, Y$ are jointly Gaussian random variables with $\mu_X = \mu_Y = 0$ and $\sigma_X = \sigma_Y = 1$. Let their correlation coefficient be $\rho$ with $|\rho| < 1$. Based on $X, Y$, we define the following random variables:
$$W = \frac{X+Y}{2\sqrt{1+\rho}} + \frac{X-Y}{2\sqrt{1-\rho}}, \qquad Z = \frac{X+Y}{2\sqrt{1+\rho}} - \frac{X-Y}{2\sqrt{1-\rho}}.$$
(a) Are $W, Z$ jointly Gaussian? Justify your answer.
(b) Calculate $f_{W,Z}(w,z)$.
(c) Find the MMSE estimator of $Z$ given $W$.
(d) Find the linear MMSE estimator of $X$ given $W$.

Solution:

(a) Let $\alpha = \frac{1}{2\sqrt{1+\rho}} + \frac{1}{2\sqrt{1-\rho}}$ and $\beta = \frac{1}{2\sqrt{1+\rho}} - \frac{1}{2\sqrt{1-\rho}}$, so that $W = \alpha X + \beta Y$ and $Z = \beta X + \alpha Y$. Note that any linear combination of $W, Z$ corresponds to a linear combination of $X, Y$:
$$aW + bZ = (a\alpha + b\beta)X + (a\beta + b\alpha)Y,$$
which is Gaussian since $X, Y$ are jointly Gaussian. Therefore, $W$ and $Z$ are jointly Gaussian.

(b) From (a), $W$ and $Z$ have a bivariate Gaussian density, determined by the mean vector and the covariance matrix. Clearly, $\mu_W = \mu_Z = 0$. For the variances and the correlation coefficient, using $\mathrm{Cov}(X+Y, X-Y) = \sigma_X^2 - \sigma_Y^2 = 0$, we have:
$$\sigma_W^2 = \sigma_Z^2 = \mathrm{Cov}(W,W) = \frac{1+\rho}{2(1+\rho)} + \frac{1-\rho}{2(1-\rho)} = 1,$$
$$\rho_{W,Z} = \mathrm{Cov}(W,Z) = \frac{1+\rho}{2(1+\rho)} - \frac{1-\rho}{2(1-\rho)} = 0.$$
Therefore, $W, Z$ are uncorrelated, hence independent, being jointly Gaussian. This shows that for $(w,z) \in \mathbb{R}^2$:
$$f_{W,Z}(w,z) = f_W(w) f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-w^2/2} \cdot \frac{1}{\sqrt{2\pi}} e^{-z^2/2} = \frac{1}{2\pi}\, e^{-(w^2+z^2)/2}.$$

(c) The MMSE estimator of $Z$ given $W$ is the conditional mean $E[Z \mid W]$. Since $Z, W$ are independent, we have $E[Z \mid W] = E[Z] = \mu_Z = 0$.

(d) A straightforward computation gives $\mathrm{Cov}(X,W) = \frac{\sqrt{1+\rho} + \sqrt{1-\rho}}{2}$. We now have:
$$\hat{E}[X \mid W] = \mu_X + \frac{\mathrm{Cov}(X,W)}{\sigma_W^2}\,(W - \mu_W) = \frac{\sqrt{1+\rho} + \sqrt{1-\rho}}{2}\, W.$$
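For a quick numerical sanity check of parts (b) and (d), the relevant moments can be estimated by Monte Carlo. The short Python sketch below assumes NumPy is available; the value $\rho = 0.7$ and the sample size are arbitrary choices.

```python
# Monte Carlo check of parts (b) and (d); illustrative sketch, assuming NumPy.
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.7, 1_000_000

# Draw jointly Gaussian (X, Y) with zero means, unit variances, correlation rho.
cov = np.array([[1.0, rho], [rho, 1.0]])
X, Y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# W and Z as defined in the problem statement.
W = (X + Y) / (2 * np.sqrt(1 + rho)) + (X - Y) / (2 * np.sqrt(1 - rho))
Z = (X + Y) / (2 * np.sqrt(1 + rho)) - (X - Y) / (2 * np.sqrt(1 - rho))

print("Var(W), Var(Z):", W.var(), Z.var())           # both close to 1
print("Corr(W, Z):    ", np.corrcoef(W, Z)[0, 1])     # close to 0
print("Cov(X, W):     ", np.cov(X, W)[0, 1])          # close to the theory value below
print("theory:        ", (np.sqrt(1 + rho) + np.sqrt(1 - rho)) / 2)
```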

2 Interplay between Information Theory and Estimation

Let $P, Q$ be two distributions defined on $\mathbb{R}^k$ with densities $f_P, f_Q$, respectively. Assume that the support of the densities coincides with $\mathbb{R}^k$, i.e., $f_P(x) > 0$ and $f_Q(x) > 0$ for any $x \in \mathbb{R}^k$. Their Kullback-Leibler divergence is then defined as follows:
$$D(P \,\|\, Q) = \int_{\mathbb{R}^k} f_P(x) \log \frac{f_P(x)}{f_Q(x)}\, dx.$$
$D(P \,\|\, Q)$ corresponds to a measure of dissimilarity between $P$ and $Q$. Consider now two continuous random variables $X$ and $Y$ with joint distribution $P_{XY}$ and marginals $P_X, P_Y$, respectively. Suppose that the support of $P_{XY}$ is $\mathbb{R}^2$ and also the support of the marginals is $\mathbb{R}$. Then, the mutual information $I(X;Y)$ between $X, Y$ is defined as:
$$I(X;Y) = D(P_{XY} \,\|\, P_X P_Y).$$
Assume that two continuous random variables $X$ and $Y$ are related by the following relationship:
$$Y = aX + N,$$
where $a \neq 0$ is a deterministic parameter, $X \sim \mathcal{N}(0,1)$, and $N \sim \mathcal{N}(0,1)$ is independent of $X$. We already know that $\hat{X} = \hat{X}(a) = E[X \mid Y]$ is the MMSE estimator of $X$ given $Y$, while
$$\mathrm{MMSE} = \mathrm{MMSE}(a) = E\big[(X - \hat{X})^2\big]$$
is the achievable mean square error. Note that we have explicitly shown the dependence of the estimator and the MMSE on the parameter $a$. Show that
$$\frac{d}{da} I(X;Y) = a\, \mathrm{MMSE}(a) \qquad (1)$$
by proving the following steps:
(a) $I(X;Y) = \frac{1}{2}\log(1+a^2)$.
(b) $\hat{X} = \frac{a}{1+a^2}\, Y$.
(c) $\mathrm{MMSE}(a) = \frac{1}{1+a^2}$.
Combine the above steps to conclude that (1) holds.
Note: All logarithms in this exercise are natural (base $e$).

Solution:

(a) Note that the mutual information can be written in expectation form:
$$I(X;Y) = E_{X,Y}\left[\log \frac{f_{XY}(X,Y)}{f_X(X) f_Y(Y)}\right].$$
For the involved densities, we note that $Y \sim \mathcal{N}(0, 1+a^2)$ and therefore
$$f_Y(y) = \frac{1}{\sqrt{2\pi(1+a^2)}}\, e^{-\frac{y^2}{2(1+a^2)}}.$$
For the joint density we have $f_{Y|X=x}(y) = f_N(y - ax)$, hence
$$f_{X,Y}(x,y) = f_{Y|X=x}(y)\, f_X(x) = f_N(y - ax)\, f_X(x) \quad \text{(independence of $X, N$)}$$
and
$$\frac{f_{X,Y}(x,y)}{f_X(x) f_Y(y)} = \frac{f_N(y - ax)\, f_X(x)}{f_X(x) f_Y(y)} = \frac{e^{-(y-ax)^2/2}/\sqrt{2\pi}}{e^{-\frac{y^2}{2(1+a^2)}}/\sqrt{2\pi(1+a^2)}} = \sqrt{1+a^2}\; e^{-\frac{(y-ax)^2}{2} + \frac{y^2}{2(1+a^2)}}.$$
Using the previous expressions in $I(X;Y)$, we obtain:
$$I(X;Y) = E_{X,Y}\left[-\frac{(Y-aX)^2}{2} + \frac{Y^2}{2(1+a^2)} + \frac{1}{2}\log(1+a^2)\right] = \frac{1}{2}\log(1+a^2) + \frac{1}{2}\left[\frac{E[Y^2]}{1+a^2} - E[Y^2] - a^2 E[X^2] + 2a\, E[XY]\right].$$
Moreover, $E[XY] = E[X(N + aX)] = a$, $E[X^2] = 1$, and $E[Y^2] = 1 + a^2$, thus
$$I(X;Y) = \frac{1}{2}\log(1+a^2) + \frac{1}{2}\big[1 - (1+a^2) - a^2 + 2a^2\big] = \frac{1}{2}\log(1+a^2).$$

(b) The MMSE estimator is linear and it is given by
$$\hat{X} = \mu_X + \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(Y)}\,(Y - \mu_Y) = \frac{a}{1+a^2}\, Y,$$
since $\mathrm{Cov}(X,Y) = E[XY] = a$ and $\mathrm{Var}(Y) = 1 + a^2$.

(c)
$$\mathrm{MMSE} = E\big[(X - \hat{X})^2\big] = \mathrm{Var}(X) - \frac{\mathrm{Cov}(X,Y)^2}{\mathrm{Var}(Y)} = 1 - \frac{a^2}{1+a^2} = \frac{1}{1+a^2}.$$

Combining the above steps, (1) follows by differentiating the mutual information:
$$\frac{d}{da} I(X;Y) = \frac{d}{da}\left[\frac{1}{2}\log(1+a^2)\right] = \frac{a}{1+a^2} = a\, \mathrm{MMSE}(a).$$
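The relation (1) can also be checked numerically by comparing a finite-difference derivative of $\frac{1}{2}\log(1+a^2)$ against $a\,\mathrm{MMSE}(a)$; a minimal Python sketch (assuming NumPy, with an arbitrary test point $a = 1.3$) is given below.

```python
# Numerical check of d/da I(X;Y) = a * MMSE(a); illustrative sketch, assuming NumPy.
import numpy as np

def mutual_info(a):
    return 0.5 * np.log(1.0 + a**2)   # part (a)

def mmse(a):
    return 1.0 / (1.0 + a**2)         # part (c)

a, h = 1.3, 1e-6                      # arbitrary test point and step size
lhs = (mutual_info(a + h) - mutual_info(a - h)) / (2 * h)   # central difference
rhs = a * mmse(a)
print(lhs, rhs)                       # both approximately 0.483
```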

3 Data Reduction, Sufficient Statistics and the MVUE

Assume that we want to estimate an unknown parameter $\theta$. Treating $\theta$ as a deterministic variable, a good estimator $\hat{\theta}$ of $\theta$ is one which takes values close to the true parameter $\theta$. If $\hat{\theta}$ is an unbiased estimator of $\theta$, then $\mathrm{MSE} = E[(\hat{\theta} - \theta)^2] = \mathrm{Var}(\hat{\theta})$, as we have proved in class. If the bias of $\hat{\theta}$, $b(\theta) = E[\hat{\theta}] - \theta$, is nonzero, the MSE is still a good measure to evaluate the performance of the biased estimator $\hat{\theta}$.

Given a set of observations $\mathcal{X} = \{X_1, X_2, \ldots, X_n\}$ containing information about the unknown parameter $\theta$, a sufficient statistic is a function of the data $T(\mathcal{X}) = T(X_1, X_2, \ldots, X_n)$ containing all the information that the data brings about $\theta$ (any measurable function of the data $t(\mathcal{X})$ is a statistic, but not necessarily a sufficient one). More rigorously, and assuming for simplicity that the data has a joint pdf, $T(\mathcal{X})$ is a sufficient statistic for $\theta$ if the conditional pdf
$$p(X_1, X_2, \ldots, X_n \mid T(\mathcal{X}); \theta) = p(\mathcal{X} \mid T(\mathcal{X}); \theta)$$
does not depend on $\theta$. Clearly, when this holds, $T(\mathcal{X})$ provides all the information hidden in the data about $\theta$; therefore, the initial set of data $\mathcal{X}$ can be discarded and only $T(\mathcal{X})$ stored (hence the name sufficient statistic). Moreover, sufficient statistics are not unique.

Relevant to the notion of a sufficient statistic, the following two theorems are important:

Theorem 1 (Fisher-Neyman factorization theorem): $T(\mathcal{X})$ is a sufficient statistic if and only if the joint pdf of the data can be factored as
$$p(\mathcal{X}; \theta) = h(\mathcal{X})\, g(T(\mathcal{X}), \theta).$$
That is, the joint pdf is factored into two parts: one part that depends only on the statistic and the parameter $\theta$, and a second part that is independent of the parameter $\theta$.

Theorem 2 (Rao-Blackwell theorem): Let $\hat{\theta}_1$ be an estimator of $\theta$ with $E[\hat{\theta}_1^2] < \infty$ for all $\theta$. Suppose that $T(\mathcal{X})$ is a sufficient statistic for $\theta$ and let $\hat{\theta}_2 = E[\hat{\theta}_1 \mid T(\mathcal{X})]$. Then for all $\theta$,
$$E\big[(\hat{\theta}_2 - \theta)^2\big] \le E\big[(\hat{\theta}_1 - \theta)^2\big].$$
The inequality is strict unless $\hat{\theta}_1$ is a function of the sufficient statistic $T(\mathcal{X})$.

As is clear from the theorem, $\hat{\theta}_2$ is an estimator of $\theta$, and if $\hat{\theta}_1$ is unbiased, then so is $\hat{\theta}_2$, since
$$E[\hat{\theta}_2] = E\big[E[\hat{\theta}_1 \mid T(\mathcal{X})]\big] = E[\hat{\theta}_1] = \theta.$$
Moreover, if there is a unique function $q(T(\mathcal{X}))$ which is an unbiased estimator of $\theta$, then $q(T(\mathcal{X}))$ is the MVUE.

To finish the above introductory material, we note that a statistic $t(\mathcal{X})$ is complete if, for any measurable function $\gamma$, the condition $E[\gamma(t(\mathcal{X}))] = 0$ for all $\theta$ implies that $\gamma \equiv 0$ almost surely. If the sufficient statistic $T(\mathcal{X})$ is also complete, then there will be at most one function $q(T(\mathcal{X}))$ that corresponds to an unbiased estimator of $\theta$.

With the above in mind, an approach to identify the MVUE is the following:
(i) Find a sufficient statistic $T(\mathcal{X})$.
(ii) Argue that this statistic is complete.
(iii) Find an unbiased estimator $q(T(\mathcal{X}))$ of $\theta$. This can be done via Rao-Blackwellization, by considering any unbiased estimator $\hat{\theta}_1$ of $\theta$ and setting $q(T(\mathcal{X})) = E[\hat{\theta}_1 \mid T(\mathcal{X})]$.
(iv) The MVUE is $\hat{\theta} = q(T(\mathcal{X}))$. Moreover, the MVUE is unique.
The above approach is based on the Lehmann-Scheffé theorem, which is a direct consequence of the previous two theorems and the completeness of the statistic $T(\mathcal{X})$.

Let $\mathcal{X}$ be a set of $n$ i.i.d. $\mathcal{N}(\mu, \sigma^2)$ observations.
(a) Assume that the unknown parameter $\theta$ is the mean value $\mu$. Use Theorem 1 to show that $S_n = \sum_{i=1}^n X_i$ is a sufficient statistic for $\mu$.
(b) Assume that $\mu$ is known but $\theta = \sigma^2$ is the unknown parameter. Use Theorem 1 to show that $\sum_{i=1}^n (X_i - \mu)^2$ is a sufficient statistic for $\sigma^2$.
(c) Assume that both $\mu$ and $\sigma^2$ are unknown, i.e., $\theta = (\mu, \sigma^2)$. Use Theorem 1 to show that the pair $\big(\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2\big)$ is a sufficient statistic in this case.
(d) Given the set of data $\mathcal{X} = \{X_1, X_2, \ldots, X_n\}$ with $\theta = \mu$, a possible statistic is $T(\mathcal{X}) = \{X_1, X_2\}$. Argue that $T(\mathcal{X})$ is not complete.
(e) Assume that the unknown parameter is $\mu$ and note that $\hat{\theta}_1 = X_1$ is an unbiased estimator for $\mu$. Argue that the MVUE (which is unique) for $\mu$ is $\frac{S_n}{n} = \frac{1}{n}\sum_{i=1}^n X_i$.

Solution:

(a) Consider an arbitrary realization $x_1, x_2, \ldots, x_n$ of $\mathcal{X}$. The joint density of $\mathcal{X}$ (all random variables are i.i.d.) can be factored as follows:
$$p(x_1, \ldots, x_n; \mu) = \prod_{i=1}^n f_{X_i}(x_i) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}\right) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2}\Big[\sum_{i=1}^n x_i^2 - 2\mu \sum_{i=1}^n x_i + n\mu^2\Big]\right)$$
$$= \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2}\Big[\sum_{i=1}^n x_i^2 - 2\mu S_n + n\mu^2\Big]\right) = \underbrace{\frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{\sum_{i=1}^n x_i^2}{2\sigma^2}\right)}_{h(\mathcal{X})} \cdot \underbrace{\exp\left(\frac{2\mu S_n - n\mu^2}{2\sigma^2}\right)}_{g(T(\mathcal{X}),\, \mu)}.$$
Therefore, by Theorem 1, $T(\mathcal{X}) = S_n$ is a sufficient statistic. (Note: for notational conservatism, here we use $S_n$ to also denote the realized value of the statistic $S_n$.)

(b) Let $Q_n = \sum_{i=1}^n (X_i - \mu)^2$. Again, we consider the joint density, but this time with $\theta = \sigma^2$:
$$p(x_1, \ldots, x_n; \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}\right) = \underbrace{1}_{h(\mathcal{X})} \cdot \underbrace{\frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{Q_n}{2\sigma^2}\right)}_{g(T(\mathcal{X}),\, \sigma^2)}.$$

(c) Let $R_n := \sum_{i=1}^n X_i^2$, so that $T(\mathcal{X}) = \{S_n, R_n\}$. This gives rise to
$$p(x_1, \ldots, x_n; \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}\right) = \underbrace{1}_{h(\mathcal{X})} \cdot \underbrace{\frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2}\big[R_n - 2\mu S_n + n\mu^2\big]\right)}_{g(T(\mathcal{X}),\, \theta)}.$$

(d) We seek some nonzero function $\gamma$ for which $E[\gamma(X_1, X_2)] = 0$ for all $\mu$. Consider a nonzero linear function of the form $\gamma(X_1, X_2) = aX_1 + bX_2$ with $a, b \neq 0$. Then,
$$E[\gamma(X_1, X_2)] = (a + b)\mu.$$
Clearly, whenever $a + b = 0$ we get $E[\gamma(X_1, X_2)] = 0$ for all $\mu$, whereas
$$P(\gamma(X_1, X_2) = 0) = P(X_1 = X_2) = 0, \quad \forall \mu.$$
Therefore, $\gamma(T(\mathcal{X})) \neq 0$ almost surely and hence $T(\mathcal{X})$ is not complete.

(e) We will follow the Lehmann-Scheffé steps described above.
(i) We already established that $S_n$ is a sufficient statistic for $\mu$.
(ii) [bonus part] To argue that $S_n$ is complete, note that $S_n \sim \mathcal{N}(n\mu, n\sigma^2)$. Suppose that for all $\mu \in \mathbb{R}$ we have
$$E[\gamma(S_n)] = \int \gamma(s)\, \frac{1}{\sqrt{2\pi n \sigma^2}}\, e^{-\frac{(s - n\mu)^2}{2n\sigma^2}}\, ds = 0. \qquad (3)$$
We need to show that $\gamma \equiv 0$ almost surely. Observe that, after pulling out the positive factor $e^{-\frac{n\mu^2}{2\sigma^2}}$, the integral in (3) is a bilateral Laplace transform:
$$E[\gamma(S_n)] = e^{-\frac{n\mu^2}{2\sigma^2}} \int \tilde{\gamma}(s)\, e^{\frac{s\mu}{\sigma^2}}\, ds = e^{-\frac{n\mu^2}{2\sigma^2}}\, \mathcal{B}\{\tilde{\gamma}\}\Big(-\frac{\mu}{\sigma^2}\Big) = 0, \quad \forall \mu \in \mathbb{R}. \qquad (4)$$
Here, $\tilde{\gamma}(s) := \frac{1}{\sqrt{2\pi n \sigma^2}}\, e^{-\frac{s^2}{2n\sigma^2}}\, \gamma(s)$. Assuming that $\gamma$ is continuous, we deduce that $\tilde{\gamma}$ is continuous. By the properties of the bilateral Laplace transform, (4) implies that $\tilde{\gamma} \equiv 0$ almost surely and, thus, $\gamma \equiv 0$ almost surely.
(iii) Using Rao-Blackwellization, we begin with some unbiased estimator of $\mu$, which in this case is $\hat{\mu} = X_1$, and seek $q$ given by $q(S_n) = E[\hat{\mu} \mid S_n] = E[X_1 \mid S_n]$. Note that $q(S_n) = E[X_i \mid S_n]$ for all $i = 1, \ldots, n$, by the i.i.d. assumption on $X_1, X_2, \ldots, X_n$. By summing, we obtain
$$n\, q(S_n) = \sum_{i=1}^n E[X_i \mid S_n] = E\Big[\sum_{i=1}^n X_i \,\Big|\, S_n\Big] = E[S_n \mid S_n] = S_n \;\Longrightarrow\; q(S_n) = \frac{S_n}{n}.$$
Therefore, the (unique) MVUE of $\mu$ is $q(S_n) = \frac{1}{n} S_n$.
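Rao-Blackwellization in part (e) can be illustrated empirically: the crude unbiased estimator $X_1$ has MSE $\sigma^2$, while $q(S_n) = S_n/n$ has MSE $\sigma^2/n$. The Python sketch below (assuming NumPy; the values of $\mu$, $\sigma$, and $n$ are arbitrary) compares the two by simulation.

```python
# Monte Carlo comparison of X_1 and q(S_n) = S_n / n; illustrative sketch, assuming NumPy.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, trials = 2.0, 3.0, 10, 200_000

X = rng.normal(mu, sigma, size=(trials, n))
crude = X[:, 0]            # unbiased estimator X_1
rb    = X.mean(axis=1)     # Rao-Blackwellized estimator q(S_n) = S_n / n

print("MSE of X_1:    ", np.mean((crude - mu) ** 2))   # about sigma^2 = 9
print("MSE of S_n / n:", np.mean((rb - mu) ** 2))      # about sigma^2 / n = 0.9
```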

4 Maximum of a Finite Set of Sub-Gaussian Random Variables

A random variable $X \in \mathbb{R}$ is called sub-Gaussian with variance proxy $\sigma^2$ if $E[X] = 0$ and its moment generating function satisfies
$$m_X(u) = E\big[e^{uX}\big] \le e^{\frac{u^2 \sigma^2}{2}}, \quad \forall u \in \mathbb{R}.$$
We write $X \sim \mathrm{subG}(\sigma^2)$.

(a) Use the Chernoff bound to show that when $X \sim \mathrm{subG}(\sigma^2)$,
$$P(X > t) \le e^{-\frac{t^2}{2\sigma^2}}, \quad \forall t > 0.$$

b Let X, X,, X n be subgσ random variables, not necessarily independent Show that the expectation of the maximum can be bounded as E[ max i n X i] σ log n Hint: Start your derivation by noting that E[ max X i] = [ ] i n λ E log e λ max i n X i, λ > 0 c With the same assumptions as in the previous part, show that P max X i > t ne t σ, t > 0 i n Solution: a By applying the Chernoff bound, we obtain: PX > t = P e ux > e ut which holds for any u > 0 By subgaussianity, Plugging 6 into 5 gives m X u {{ E e ux e ut, 5 m X u e σ u / 6 PX > t e σ u ut = e φu, 7 where φu := σ u ut Choose u = t σ, which minimizes φu to obtain φu = t σ Therefore, as required PX > t e φu = e t σ b Using the hint, Emax X i = log e i λ E λ max i X i E e λ log λ max i X i Jensen, concavity of log = E λ log max e λx i Monotonicity of e x i E λ log e λx i max X i X i i i λ log ne λ σ subgaussianity = log n λ + λσ {{ gλ 8

5 Erdős-Rényi Graphs

The distance between any two nodes in a graph is the length of the shortest path connecting them. The diameter of a graph is the maximum distance between any two nodes of the graph. Consider again the ensemble $\mathcal{G}(n, p)$ of Erdős-Rényi graphs in Problem 3 of HW#. Suppose that $p$ is fixed and let $D_n$ be the diameter of a graph drawn from this ensemble. Show that
$$P(D_n \le 2) \to 1 \quad \text{as } n \to \infty.$$
Note: If $p$ is allowed to scale with $n$, then $p = \sqrt{\frac{2 \log n}{n}}$ is a sharp threshold for the aforementioned property, meaning that if $p < \sqrt{\frac{2 \log n}{n}}$, then $P(D_n > 2) \to 1$ as $n \to \infty$.
Hint: Define the random variable $X_n$, which is the number of node pairs $(u, v)$ in the graph $G \sim \mathcal{G}(n, p)$ that are neither adjacent nor share a common neighbor. Show that $P(X_n = 0) \to 1$ as $n \to \infty$.

Let u, v be a pair of nodes with u v Define the event A u,v : A u,v := {du, v > = {u, v are neither neigbhors nor share neighbors Here, du, v is the distance shortest path between u and v The nodes u, v are not adjacent ie, neighbors with probability p; the rest of the n nodes are not adjacent to both u and v with probability p n Hence, PA u,v = p n = p p n Clearly A u,v = A v,u since the graph is undirected Therefore, we consider only the nodes S n := {u, v u < v n We further note that the cardinality of S n is n We now have: X n = and thus, EX n = u,v S n PA u,v = S n p n u,v S n Au,v n pe n p 0 as n, where the elementary inequality e x + x, x R has been used 0