ECE 275A Homework 6 Solutions


1. The notation used in the solutions for the concentration (hyper)ellipsoid problems is defined in the lecture supplement on concentration ellipsoids. Note that
$$\tilde\theta^T \Sigma^{-1} \tilde\theta = \tilde\pi^T \Lambda^{-1} \tilde\pi = k^2.$$
Thus $k$-level surfaces in $\tilde\theta$-space uniquely correspond to $k$-level surfaces in $\tilde\pi$-space and vice versa. Note also that the canonical coordinate axes in $\tilde\pi$-space, $e_i = (0, \dots, 0, 1, 0, \dots, 0)^T$ where the $1$ is in the $i$-th position, correspond to the orthonormal directions $u_i$ in $\tilde\theta$-space according to $e_i \leftrightarrow u_i$, $i = 1, \dots, n$. In $\tilde\pi$-space we have
$$\tilde\pi^T \Lambda^{-1} \tilde\pi = \frac{\tilde\pi_{[1]}^2}{\lambda_1^2} + \cdots + \frac{\tilde\pi_{[n]}^2}{\lambda_n^2} = k^2
\qquad\text{or}\qquad
\frac{\tilde\pi_{[1]}^2}{(k\lambda_1)^2} + \cdots + \frac{\tilde\pi_{[n]}^2}{(k\lambda_n)^2} = 1.$$
This shows that each axis $e_i$ in $\tilde\pi$-space is a semi-major axis of a hyperellipsoid with length $k\lambda_i$. Thus in the original $\tilde\theta$-space each direction $u_i$ is a semi-major axis of length $k\lambda_i$. Note that $\lambda_i$ is the standard deviation of the error in the direction of the $i$-th semi-major axis and that $k$ gives the number of error standard deviations that we might happen to be interested in.

Let $0 < k' < k$ and consider the $k'$-level surface defined by $\tilde\theta^T \Sigma^{-1} \tilde\theta = k'^2 < k^2$. This defines a hyperellipsoid with semi-major axes of length $k'\lambda_i$, $i = 1, \dots, n$. Since $k'\lambda_i < k\lambda_i$ for all $i$, this shows that a $k'$-level surface lies entirely within the $k$-level surface for any $k' < k$. Also note that
$$(\alpha\tilde\theta)^T \Sigma^{-1} (\alpha\tilde\theta) = k^2, \qquad\text{where}\qquad \alpha = \frac{k}{k'} > 1.$$
Thus any vector $\tilde\theta$ on the $k'$-level surface has to be expanded by a factor $\alpha > 1$ in order to lie on the $k$-level surface. We have shown that the region defined by $\tilde\theta^T \Sigma^{-1} \tilde\theta < k^2$ is the interior of the $k$-level surface of the hyperellipsoid.
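The semi-axis claim above is easy to check numerically. The following is a minimal sketch (not part of the original solution set), assuming NumPy and an arbitrary illustrative 2x2 error covariance: the point $k\lambda_i u_i$ should lie exactly on the $k$-level surface.

```python
# Sketch: verify that k*lambda_i*u_i lies on the k-level surface
# theta~^T Sigma^{-1} theta~ = k^2, where Sigma = U diag(lambda_i^2) U^T.
import numpy as np

k = 2.0                                    # number of error standard deviations
Sigma = np.array([[4.0, 1.0],              # illustrative error covariance (assumed)
                  [1.0, 2.0]])

eigvals, U = np.linalg.eigh(Sigma)         # Sigma = U diag(lambda_i^2) U^T
lam = np.sqrt(eigvals)                     # per-direction error standard deviations

Sigma_inv = np.linalg.inv(Sigma)
for i in range(len(lam)):
    endpoint = k * lam[i] * U[:, i]        # tip of the i-th semi-major axis
    level = endpoint @ Sigma_inv @ endpoint
    print(f"axis {i}: length {k * lam[i]:.4f}, level value {level:.4f} (expect {k**2})")
```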

2. (i) Note that for $i = 1, 2$, $E_\theta\{a^T\tilde\theta_i\} = a^T E_\theta\{\tilde\theta_i\} = 0$ for all $a$. Therefore, from the fact that $\Sigma_1 \le \Sigma_2$, for all $a$ we have
$$a^T \Sigma_1 a \le a^T \Sigma_2 a
\iff a^T E_\theta\{\tilde\theta_1\tilde\theta_1^T\}\,a \le a^T E_\theta\{\tilde\theta_2\tilde\theta_2^T\}\,a
\iff E_\theta\{(a^T\tilde\theta_1)^2\} \le E_\theta\{(a^T\tilde\theta_2)^2\}
\iff \mathrm{Cov}_\theta\{a^T\tilde\theta_1\} \le \mathrm{Cov}_\theta\{a^T\tilde\theta_2\}.$$

(ii) The form of the Tchebycheff inequality relevant to this problem is
$$P(|e| \ge L) \le \frac{E\{e^2\}}{L^2} \triangleq 1 - p.$$
Setting $L = L_i$ and $e = a^T\tilde\theta_i$, $i = 1, 2$, for a specified lower-bound probability $p$ we have
$$\frac{\mathrm{Cov}_\theta\{a^T\tilde\theta_i\}}{1 - p} = L_i^2, \qquad i = 1, 2.$$
This yields
$$L_1 = \sqrt{\frac{\mathrm{Cov}_\theta\{a^T\tilde\theta_1\}}{1 - p}}, \qquad
L_2 = \sqrt{\frac{\mathrm{Cov}_\theta\{a^T\tilde\theta_2\}}{1 - p}}.$$
We can also determine that
$$\frac{L_1}{L_2} = \sqrt{\frac{\mathrm{Cov}_\theta\{a^T\tilde\theta_1\}}{\mathrm{Cov}_\theta\{a^T\tilde\theta_2\}}} \le 1.$$
This shows that in any direction in $\tilde\theta$-space, the at-least-probability-$p$ upper error bound $L_i$, $i = 1, 2$, on the estimate is smaller for $\tilde\theta_1$ than for $\tilde\theta_2$.

3. Given that $0 < \Sigma_1 \le \Sigma_2$ we want to prove that $0 < \Sigma_2^{-1} \le \Sigma_1^{-1}$. We will do this by figuring out how to simultaneously diagonalize $\Sigma_1$ and $\Sigma_2$. Note that if $\Sigma > 0$ it is easy to show that $\Sigma^{-1} > 0$. Convince yourself of this fact.
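Before the formal argument, here is a quick numerical sanity check of the claim (a sketch only, assuming NumPy): random pairs $\Sigma_1 \le \Sigma_2$ are generated and the ordering of the inverses is tested through the eigenvalues of $\Sigma_1^{-1} - \Sigma_2^{-1}$.

```python
# Sketch: generate Sigma1 > 0 and Sigma2 = Sigma1 + P with P >= 0 (so Sigma1 <= Sigma2),
# then check that Sigma1^{-1} - Sigma2^{-1} is positive semidefinite.
import numpy as np

rng = np.random.default_rng(0)
n = 4
for _ in range(1000):
    B = rng.standard_normal((n, n))
    Sigma1 = B @ B.T + n * np.eye(n)       # positive definite
    C = rng.standard_normal((n, n))
    Sigma2 = Sigma1 + C @ C.T              # Sigma2 - Sigma1 = C C^T >= 0
    diff = np.linalg.inv(Sigma1) - np.linalg.inv(Sigma2)
    assert np.linalg.eigvalsh(diff).min() > -1e-9, "inverse ordering violated"
print("Sigma1 <= Sigma2 implied Sigma2^{-1} <= Sigma1^{-1} in every trial")
```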

Why diagonalize? We first diagonalize because we can easily show that the desired result holds for positive definite diagonal matrices. First note that diagonal matrices commute, i.e. $D_1 D_2 = D_2 D_1$ for all diagonal matrices $D_1$ and $D_2$. Also note that the square root of a positive definite matrix is a positive definite matrix. Thus for two positive definite diagonal matrices such that $0 < D_1 \le D_2$ we have, for all $x \ne 0$,
$$x^T\!\left(D_2^{-1} - D_1^{-1}\right)x
= x^T D_1^{-1}\left(D_1 - D_2\right)D_2^{-1} x
= x^T D_1^{-\frac12} D_2^{-\frac12}\left(D_1 - D_2\right)D_2^{-\frac12} D_1^{-\frac12} x
= y^T\left(D_1 - D_2\right)y \le 0$$
for all $y = D_2^{-\frac12} D_1^{-\frac12} x$. Therefore $D_2^{-1} - D_1^{-1} \le 0$, or
$$0 < D_2^{-1} \le D_1^{-1}.$$

Simultaneous Diagonalization. We diagonalize $\Sigma_1$ as
$$\Sigma_1 = U \Lambda U^T,$$
which yields
$$\Lambda^{-\frac12} U^T \Sigma_1 U \Lambda^{-\frac12} = I. \tag{1}$$
We can then form the following positive definite matrix from $\Sigma_2$,
$$\Lambda^{-\frac12} U^T \Sigma_2 U \Lambda^{-\frac12},$$
which, of course, can also be diagonalized, yielding
$$\Lambda^{-\frac12} U^T \Sigma_2 U \Lambda^{-\frac12} = V \Pi V^T, \qquad
V^T \Lambda^{-\frac12} U^T \Sigma_2 U \Lambda^{-\frac12} V = \Pi. \tag{2}$$
Now premultiply (1) by $V^T$ and then postmultiply the result by $V$. This yields
$$V^T \Lambda^{-\frac12} U^T \Sigma_1 U \Lambda^{-\frac12} V = I. \tag{3}$$
Inspection of (2) and (3) shows that we have simultaneously diagonalized $\Sigma_1$ and $\Sigma_2$ as
$$A^T \Sigma_1 A = I, \qquad A^T \Sigma_2 A = \Pi, \qquad\text{where}\quad A = U \Lambda^{-\frac12} V$$

with $A$ invertible (but not generally orthogonal). Note that this can be rewritten as
$$\Sigma_1 = A^{-T} A^{-1}, \qquad \Sigma_2 = A^{-T} \Pi A^{-1},$$
which, when inverted, yields
$$\Sigma_1^{-1} = A A^T, \qquad \Sigma_2^{-1} = A \Pi^{-1} A^T.$$

Completion of the Proof. We have that for all $x \ne 0$
$$x^T(\Sigma_2 - \Sigma_1)x = (A^{-1}x)^T(\Pi - I)(A^{-1}x) = y^T(\Pi - I)y \ge 0, \qquad\text{for all } y = A^{-1}x.$$
Therefore $0 < I \le \Pi$, which yields (from our earlier results for diagonal matrices)
$$0 < \Pi^{-1} \le I.$$
Finally, note that for all $x$,
$$x^T(\Sigma_1^{-1} - \Sigma_2^{-1})x = (A^T x)^T(I - \Pi^{-1})(A^T x) = y^T(I - \Pi^{-1})y \ge 0, \qquad\text{for all } y = A^T x,$$
showing that $\Sigma_2^{-1} \le \Sigma_1^{-1}$ as claimed.

4. (i) This was essentially done in Problem (1) above.

(ii) The form of the Tchebycheff inequality most convenient to this problem is
$$P(y \ge \epsilon) \le \frac{E\{y\}}{\epsilon}, \qquad y \ge 0, \ \epsilon > 0.$$
In particular we take $\epsilon = k^2$ and $y = \tilde\theta^T \Sigma^{-1} \tilde\theta$. Note that
$$E_\theta\{y\} = E_\theta\{\tilde\theta^T \Sigma^{-1} \tilde\theta\}
= E_\theta\{\operatorname{trace}(\Sigma^{-1}\tilde\theta\tilde\theta^T)\}
= \operatorname{trace}(\Sigma^{-1}\Sigma) = \operatorname{trace} I = n.$$
Then
$$P\!\left(\tilde\theta^T\Sigma^{-1}\tilde\theta \ge k^2\right) \le \frac{n}{k^2} \triangleq 1 - p.$$
If we set a value for the probability level $p$, this sets a value for $k$ of
$$k = \sqrt{\frac{n}{1 - p}}.$$

(iii) Note that $\Sigma_2^{-1} \le \Sigma_1^{-1}$. For the same vector $\tilde\theta$, define $k_1 > 0$ and $k_2 > 0$ by
$$k_1^2 \triangleq \tilde\theta^T \Sigma_1^{-1} \tilde\theta
\qquad\text{and}\qquad
k_2^2 \triangleq \tilde\theta^T \Sigma_2^{-1} \tilde\theta.$$
We have
$$k_2^2 = \tilde\theta^T \Sigma_2^{-1} \tilde\theta \le \tilde\theta^T \Sigma_1^{-1} \tilde\theta = k_1^2.$$
This shows that
$$(\alpha\tilde\theta)^T \Sigma_2^{-1} (\alpha\tilde\theta) = k_1^2, \qquad\text{where}\qquad \alpha = \frac{k_1}{k_2} \ge 1.$$
This shows that every $\tilde\theta$ on a $k$-level surface of $\Sigma_1$ has to be grown by a factor of $\alpha \ge 1$ in order to then lie on the $k$-level surface (same $k$) of $\Sigma_2$. Thus the $k$-level surface of $\Sigma_1$ lies entirely within the $k$-level surface (same $k$) of $\Sigma_2$. An equivalent statement is that along any direction in error space, $k$ standard deviations of the error $\tilde\theta_1$ is no larger than $k$ standard deviations of the error $\tilde\theta_2$.

(iv) If two error covariances can be ordered as $\Sigma_1 \le \Sigma_2$, then the $k$-level error-concentration ellipsoid of $\Sigma_1$ lies entirely within the $k$-level (same $k$) error-concentration ellipsoid of $\Sigma_2$. If the two error covariances cannot be ordered in this manner, then their $k$-level (same $k$) concentration ellipsoids cannot be properly nested one inside the other.

5. (a) i. $M$ has $n$ eigenvalue-eigenvector pairs $(\lambda_i, x_i)$,
$$M x_i = \lambda_i x_i, \qquad i = 1, \dots, n,$$
where the $n$ eigenvectors are linearly independent.²

Footnote 2: This is a nontrivial assumption. A sufficient condition for it to be true is that all $n$ eigenvalues are distinct. Another is that the matrix $M$ be normal, $MM^H = M^H M$. Since a hermitian matrix, $M = M^H$, is normal, an $n \times n$ hermitian matrix has $n$ linearly independent eigenvectors. Note that this implies that a real symmetric matrix has $n$ linearly independent eigenvectors. Normalizing the eigenvectors to have unit length, it is not hard to show that a hermitian matrix also has real eigenvalues:
$$\bar\lambda_i = \bar\lambda_i\, x_i^H x_i = \overline{x_i^H M x_i} = x_i^T \bar M \bar x_i = x_i^H M^H x_i = x_i^H M x_i = \lambda_i.$$
Although we don't prove it, it is also the case that the $n$ eigenvectors of a hermitian matrix can be taken to be mutually orthonormal.

Thus,
$$M \underbrace{\begin{bmatrix} x_1 & x_2 & x_3 & \cdots & x_n \end{bmatrix}}_{X}
= \underbrace{\begin{bmatrix} x_1 & x_2 & x_3 & \cdots & x_n \end{bmatrix}}_{X}
\underbrace{\begin{bmatrix}
\lambda_1 & 0 & 0 & \cdots & 0 \\
0 & \lambda_2 & 0 & \cdots & 0 \\
0 & 0 & \lambda_3 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \lambda_n
\end{bmatrix}}_{\Lambda}$$

or $MX = X\Lambda$. With the assumption that the $n$ eigenvectors are linearly independent, the so-called modal matrix $X$ is invertible, so that $\Lambda = X^{-1} M X$.³ It is also evident that $M = X \Lambda X^{-1}$. Thus,
$$\operatorname{tr} M = \operatorname{tr}\left(X\Lambda X^{-1}\right) = \operatorname{tr}\left(\Lambda X^{-1}X\right) = \operatorname{tr}\Lambda = \sum_{i=1}^{n}\lambda_i. \tag{4}$$
Because a real $n \times n$ matrix can be viewed as a special type of complex matrix (one for which all elements have zero imaginary parts), the result (4) also holds for real, symmetric matrices $M = M^H = M^T$.

Footnote 3: This shows that an $n \times n$ matrix with $n$ linearly independent eigenvectors is diagonalizable. In particular, hermitian and symmetric matrices are diagonalizable.

ii. As mentioned in Footnote 2, an $n \times n$ hermitian matrix is guaranteed to have a full set of $n$ linearly independent eigenvectors $x_i$, $i = 1, \dots, n$. Therefore, given the result (4), which holds for such matrices, to show that a complex $n \times n$ matrix $M$ that is hermitian and positive semidefinite satisfies
$$\operatorname{tr} M = \sum_{i=1}^{n} \lambda_i \ge 0, \tag{5}$$
it is enough to show that each eigenvalue $\lambda_i$ is real and nonnegative. This is straightforward to do because, by definition of being positive semidefinite, we have⁴
$$\mathbb{R} \ni x^H M x \ge 0 \quad\text{for all } x \in \mathbb{C}^n.$$
If in particular we take $x = x_i$, with $x_i$ normalized, we have
$$0 \le x_i^H M x_i = \lambda_i\, x_i^H x_i = \lambda_i.$$
The result (5) also holds for real symmetric positive semidefinite matrices, as such matrices can be viewed as special forms of hermitian positive semidefinite matrices.

Footnote 4: The matrix $M$ being hermitian is sufficient for the quadratic form $x^H M x$ to be real (just show that $\overline{x^H M x} = x^H M x$). $M$ being positive semidefinite imposes the additional condition that this quadratic form be nonnegative.

(b) From the definition
$$\mathrm{MSE}_\theta(\hat\theta) = E_\theta\{\tilde\theta\tilde\theta^T\}, \qquad \tilde\theta = \hat\theta - \theta,$$
the matrix $\mathrm{MSE}_\theta(\hat\theta)$ is symmetric positive semidefinite for all parameter vectors $\theta$ and for all estimators $\hat\theta$. Therefore
$$\operatorname{tr}\mathrm{MSE}_\theta(\hat\theta) = E_\theta\{\tilde\theta^T\tilde\theta\} = E_\theta\{\|\tilde\theta\|^2\} = \mathrm{mse}_\theta(\hat\theta) \ge 0,$$
as expected from our nonnegativity result for the trace of a symmetric positive semidefinite matrix derived in Part (a) of this problem. From the definition of the positive semidefinite partial ordering, we have
$$\mathrm{MSE}_\theta(\hat\theta') \le \mathrm{MSE}_\theta(\hat\theta)
\iff
M_\theta \triangleq \mathrm{MSE}_\theta(\hat\theta) - \mathrm{MSE}_\theta(\hat\theta') \ge 0.$$

Note that $M_\theta$ is real, symmetric, and positive semidefinite. Therefore, from our results in Part (a),
$$0 \le \operatorname{tr} M_\theta = \operatorname{tr}\mathrm{MSE}_\theta(\hat\theta) - \operatorname{tr}\mathrm{MSE}_\theta(\hat\theta') = \mathrm{mse}_\theta(\hat\theta) - \mathrm{mse}_\theta(\hat\theta').$$
Thus we have shown that
$$\mathrm{MSE}_\theta(\hat\theta') \le \mathrm{MSE}_\theta(\hat\theta) \implies \mathrm{mse}_\theta(\hat\theta') \le \mathrm{mse}_\theta(\hat\theta).$$

6. Note that the Gaussian Linear Model meets the requirements of the Gauss-Markov Theorem. Also (ignoring minor notational differences) note from the solution to Homework Problem 5.1 and the class lecture on the BLUE that the BLUE and the MLE under the Gaussian assumption are identical; thus, trivially, the MLE is the BLUE by inspection. The solution in either case is $\hat x = A^+ y$ where
$$A^+ = \left(A^T C^{-1} A\right)^{-1} A^T C^{-1}$$
is a weighted-norm pseudoinverse of $A$ which, because $A$ is assumed full column rank, is also a left inverse, $A^+ A = I$. As shown in the solution to HP 5.1, this results in $\hat x$ being an absolutely unbiased estimator of $x$.

Under the Gaussian assumption it is readily shown that
$$\ln p_x(y) = -\tfrac{1}{2}\,\|y - Ax\|^2_{C^{-1}},$$
ignoring (as usual) terms independent of $x$. Taking the derivative of this with respect to $x$ (using the row-vector definition),
$$\frac{\partial}{\partial x}\ln p_x(y) = (y - Ax)^T C^{-1} A.$$
Taking the transpose of this yields the score function,
$$S(x) = A^T C^{-1}(y - Ax).$$
(Note that the score has zero mean for all $x$, as required.) The Fisher information matrix is the covariance matrix of the score function,
$$J_x = E_x\{S(x)S^T(x)\}
= A^T C^{-1} E_x\{(y - Ax)(y - Ax)^T\} C^{-1} A
= A^T C^{-1} E\{nn^T\} C^{-1} A
= A^T C^{-1} C C^{-1} A
= A^T C^{-1} A,$$

which in this instance happens to be independent of $x$. The Cramér-Rao lower bound is then
$$\mathrm{CRLB}_x = J_x^{-1} = \left(A^T C^{-1} A\right)^{-1},$$
which in this instance is also independent of $x$.

Because $A^+$ is a left inverse, it is easy to show that
$$\tilde x = \hat x - x = A^+ n,$$
where $\tilde x$ has zero mean. We have
$$\mathrm{Cov}_x\{\hat x\} = \mathrm{Cov}_x\{\tilde x\} = E\{\tilde x \tilde x^T\} = A^+ E\{nn^T\} A^{+T} = A^+ C A^{+T} = \left(A^T C^{-1} A\right)^{-1}.$$
Thus the MLE attains the CRLB and is therefore efficient. Alternatively, note that
$$S(x) = A^T C^{-1}(y - Ax)
= A^T C^{-1}\!\left(A\left(A^T C^{-1} A\right)^{-1}\!A^T C^{-1} y - Ax\right)
= A^T C^{-1} A\left(A^+ y - x\right)
= J_x\left(A^+ y - x\right),$$
which is the necessary and sufficient condition for the unbiased estimator $A^+ y$ to attain the CRLB.

7. (a) This problem is an easy application of Homework Problem 5.6 once it is recognized that $\hat\theta(y)$ is an unbiased estimator of $\theta + b(\theta)$. With $E_\theta\{\tilde\theta\} = b(\theta)$ we have
$$\mathrm{MSE}_\theta(\hat\theta) \triangleq E_\theta\{\tilde\theta\tilde\theta^T\}
= E_\theta\!\left\{\left[\left(\tilde\theta - b(\theta)\right) + b(\theta)\right]\left[\left(\tilde\theta - b(\theta)\right) + b(\theta)\right]^T\right\}
= \mathrm{Cov}_\theta(\tilde\theta) + b(\theta)\,b^T(\theta),$$
where the cross terms vanish because $E_\theta\{\tilde\theta - b(\theta)\} = 0$.

To complete the rest of the problem we will use the result of Homework Prob. 5.6. To do so, take $g(\theta) \triangleq \theta + b(\theta)$ and note that the Jacobian matrix of $g(\theta)$ is given by
$$g'(\theta) = I + b'(\theta).$$
Also note that
$$\hat g(y) \triangleq \hat\theta(y)$$

is a uniformly unbiased estimate of $g(\theta)$:
$$E_\theta\{\hat g(y)\} = E_\theta\{\hat\theta(y)\} = \theta + b(\theta) = g(\theta).$$
We have
$$\mathrm{Cov}_\theta(\tilde\theta)
= E_\theta\!\left\{\left(\tilde\theta - b(\theta)\right)\left(\tilde\theta - b(\theta)\right)^T\right\}
= E_\theta\!\left\{\left[\hat\theta - \left(\theta + b(\theta)\right)\right]\left[\hat\theta - \left(\theta + b(\theta)\right)\right]^T\right\}
= \mathrm{MSE}_\theta(\hat g) = \mathrm{Cov}_\theta(\tilde g)
\ge g'(\theta)\,J^{-1}(\theta)\,g'^T(\theta)
\qquad\text{(from Homework Problem 5.6)},$$
or
$$\mathrm{Cov}_\theta(\tilde\theta) \ge \left(I + b'(\theta)\right)J^{-1}(\theta)\left(I + b'(\theta)\right)^T,$$
yielding the result to be proved. This shows that
$$\mathrm{MSE}_\theta(\hat\theta) \ge \left(I + b'(\theta)\right)J^{-1}(\theta)\left(I + b'(\theta)\right)^T + b(\theta)\,b^T(\theta).$$
Note that the right-hand side of this inequality depends upon the bias of the particular estimator $\hat\theta$, whereas within the class of unbiased estimators the CRLB is independent of the (unbiased) estimator.

(b) For a scalar parameter $\theta$, an efficient biased estimator $\hat\theta$ will be more efficient than an efficient unbiased estimator if uniformly in $\theta$ we have
$$\frac{\left(1 + b'(\theta)\right)^2}{J(\theta)} + b^2(\theta) \le \frac{1}{J(\theta)} \tag{6}$$
with strict inequality for at least one value of $\theta$. Condition (6) is satisfied (with strict inequality for all $\theta$) if for all $\theta$
$$0 < b^2(\theta) < \frac{1 - \left(1 + b'(\theta)\right)^2}{J(\theta)}. \tag{7}$$
The assumption that $-1 < b'(\theta) < -\epsilon < 0$ for all $\theta$ yields $\left(1 + b'(\theta)\right)^2 < (1 - \epsilon)^2$ for all $\theta$. Therefore a sufficient condition for (7) (and hence (6)) to hold is
$$0 < b^2(\theta) < \frac{1 - (1 - \epsilon)^2}{J(\theta)}$$

for all $\theta$. For example, if $\epsilon = 0.2$ the sufficient condition becomes
$$b^2(\theta) < \frac{0.36}{J(\theta)}$$
for all $\theta$.

8. First consider a single measurement $y$. The random variable $y$ has a Poisson distribution with parameter $\lambda$. It is straightforward to determine that $E_\lambda\{y\} = \lambda$.⁵ For a single measurement $y$, one can easily compute the quantities
$$S(y;\lambda) = \frac{\partial}{\partial\lambda}\ln P(y;\lambda) = \frac{y - \lambda}{\lambda},
\qquad
\frac{\partial}{\partial\lambda}S(y;\lambda) = -\frac{y}{\lambda^2}.$$
This yields the single-measurement Fisher information $J^{(1)}(\lambda)$,
$$J^{(1)}(\lambda) = -E_\lambda\!\left\{\frac{\partial}{\partial\lambda}S(y;\lambda)\right\}
= E_\lambda\!\left\{\frac{y}{\lambda^2}\right\}
= \frac{\lambda}{\lambda^2} = \frac{1}{\lambda}.$$
Now consider the $m$ iid measurements $Y_m = \{y_1, \dots, y_m\}$. It can easily be ascertained that
$$S(Y_m;\lambda) = \sum_{i=1}^{m} S(y_i;\lambda),
\qquad
\frac{\partial}{\partial\lambda}S(Y_m;\lambda) = \sum_{i=1}^{m}\frac{\partial}{\partial\lambda}S(y_i;\lambda).$$
From this the full-measurement Fisher information is determined to be
$$J^{(m)}(\lambda) = -E_\lambda\!\left\{\frac{\partial}{\partial\lambda}S(Y_m;\lambda)\right\}
= -\sum_{i=1}^{m} E_\lambda\!\left\{\frac{\partial}{\partial\lambda}S(y_i;\lambda)\right\}
= m\,J^{(1)}(\lambda) = \frac{m}{\lambda}.$$
This yields the Cramér-Rao lower bound,
$$\mathrm{CRLB}(m) = \frac{1}{J^{(m)}(\lambda)} = \frac{1}{m\,J^{(1)}(\lambda)} = \frac{\lambda}{m}.$$
Note that $\mathrm{CRLB}(m) \to 0$ as $m \to \infty$, showing that an efficient unbiased estimator $\hat\lambda(m)$ (if it exists) of $\lambda$ is consistent.⁶

Footnote 5: It is also straightforward to compute $E_\lambda\{y(y-1)\}$, from which one can readily determine that $\mathrm{Var}_\lambda\{y\} = \lambda$, although we do not use this fact here.

Footnote 6: Why is this fact true?
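As a quick numerical illustration of the Poisson result above (a sketch only, assuming NumPy), the sample mean of the $m$ counts is unbiased for $\lambda$ and its Monte Carlo variance can be compared directly against the bound $\mathrm{CRLB}(m) = \lambda/m$:

```python
# Sketch: empirical variance of the sample-mean estimator of a Poisson rate
# versus the Cramer-Rao lower bound lambda/m derived above.
import numpy as np

rng = np.random.default_rng(1)
lam, m, trials = 3.0, 50, 200_000

Y = rng.poisson(lam, size=(trials, m))   # 'trials' independent experiments of m iid counts
lam_hat = Y.mean(axis=1)                 # sample-mean estimate in each experiment

print("empirical variance of lambda_hat:", lam_hat.var())
print("CRLB = lambda/m                 :", lam / m)
```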

9. Kay 4.1 and 6.1. Except for specifying the precise form of the noise distribution in 4.1, the two problems are essentially identical in structure (viewing the known parameters $p$ and $r_i$ as TBD for each specific problem). As a consequence the estimator in both cases is the BLUE and has exactly the same form. Therefore, we only have to solve for the general form of the BLUE once, and then apply the solution for a specific choice of $p$ and $r_i$.⁷ However, in 6.1 the estimator can be claimed to be the BLUE only (and generally is not a MVUE) because the noise is not assumed to be Gaussian. In 4.1, the BLUE is also a MVUE because here the noise is also taken to be Gaussian.

Footnote 7: Specifically, we take $p = 2$ in the second part of 4.1 and $p = 1$ in 6.1. In the second part of 4.1, we set $r_1 = 1$ and $r_2 = -1$. In 6.1, $r_1 = r$ is left unspecified but assumed known.

For arbitrary $p$ the problem can be recast as
$$x = H\theta + w$$
with
$$x = \begin{bmatrix} x[0] \\ x[1] \\ \vdots \\ x[N-1] \end{bmatrix}, \qquad
H = \begin{bmatrix} 1 & \cdots & 1 \\ r_1 & \cdots & r_p \\ \vdots & & \vdots \\ r_1^{N-1} & \cdots & r_p^{N-1} \end{bmatrix}, \qquad
\theta = \begin{bmatrix} A_1 \\ \vdots \\ A_p \end{bmatrix}, \qquad
w = \begin{bmatrix} w[0] \\ w[1] \\ \vdots \\ w[N-1] \end{bmatrix},$$
where
$$E_\theta\{w\} = 0 \qquad\text{and}\qquad \mathrm{Cov}_\theta\{w\} = \sigma^2 I.$$
The general form of the BLUE is given by
$$\hat\theta = \left(H^T H\right)^{-1} H^T x
\qquad\text{with}\qquad
\mathrm{Cov}_\theta\{\hat\theta\} = \sigma^2\left(H^T H\right)^{-1}.$$

Problem 4.1. Here, we take $p = 2$, $r_1 = 1$, $r_2 = -1$, and $N$ even. It is easily shown that for this case
$$H^T H = N\, I_{2\times 2}
\qquad\text{and}\qquad
H^T x = \begin{bmatrix} \sum_{n=0}^{N-1} x[n] \\[4pt] \sum_{n=0}^{N-1} (-1)^n x[n] \end{bmatrix},$$
yielding
$$\hat\theta = \begin{pmatrix} \hat A_1 \\ \hat A_2 \end{pmatrix}
= \begin{bmatrix} \frac{1}{N}\sum_{n=0}^{N-1} x[n] \\[4pt] \frac{1}{N}\sum_{n=0}^{N-1} (-1)^n x[n] \end{bmatrix}$$

with
$$\mathrm{Cov}_\theta\{\hat\theta\} = \begin{pmatrix} \dfrac{\sigma^2}{N} & 0 \\[6pt] 0 & \dfrac{\sigma^2}{N} \end{pmatrix}.$$
In this (the Gaussian) case, BLUE = MVUE. Furthermore, we see that the estimator is consistent.

Problem 6.1. In this case we have $p = 1$ and $r_1 = r$ unspecified (but known), yielding
$$H^T H = \sum_{n=0}^{N-1} r^{2n}
\qquad\text{and}\qquad
H^T x = \sum_{n=0}^{N-1} r^n x[n].$$
The resulting estimator is
$$\hat\theta = \hat A = \frac{\displaystyle\sum_{n=0}^{N-1} r^n x[n]}{\displaystyle\sum_{n=0}^{N-1} r^{2n}}
\qquad\text{with}\qquad
\mathrm{Cov}_A\{\hat A\} = \frac{\sigma^2}{\displaystyle\sum_{n=0}^{N-1} r^{2n}}.$$
Note that if $r^2 < 1$, then in the limit as $N \to \infty$ we have
$$\sum_{n=0}^{\infty} r^{2n} = \frac{1}{1 - r^2} < \infty,$$
showing that in this case we do not have a consistent estimator. However, if $r^2 \ge 1$ this sum is divergent, and the estimator is consistent. In either case the BLUE cannot be claimed to be also a MVUE because we do not know if the noise is Gaussian.

10. Kay 4.3. Assuming that $\mathrm{Cov}_\theta\{w\} = \sigma^2 I$ and setting $K = \left(H^T H\right)^{-1} H^T$,
$$\hat\theta = Kx = \theta + Kw,
\qquad
E_\theta\{\hat\theta\} = \theta + E_\theta\{Kw\} = \theta + B(\theta),$$
where the bias is $B(\theta) = E_\theta\{Kw\}$. If $H$ and $w$ are independent then
$$B(\theta) = E_\theta\{K\}\,E_\theta\{w\} = 0$$
and the estimator is unbiased in this case. If $H$ and $w$ are not independent, then generally the estimator is biased. Furthermore, in the latter case the computation

of the mse or the variance is much more complex. On the other hand, assuming independence we have that $\mathrm{MSE}_\theta = \mathrm{Cov}_\theta\{\hat\theta\}$ where
$$\mathrm{Cov}_\theta\{\hat\theta\}
= E_\theta\{\tilde\theta\tilde\theta^T\}
= E_\theta\{K w w^T K^T\}
= E_\theta\{\,E_\theta\{K w w^T K^T \mid H\}\,\}
= E_\theta\{\,K\, E_\theta\{w w^T \mid H\}\, K^T\}
= E_\theta\{K\left(\sigma^2 I\right)K^T\}
= \sigma^2 E_\theta\{K K^T\}
= \sigma^2 E_\theta\{\left(H^T H\right)^{-1}\}.$$

11. Kay 6.2. For the model
$$x = H\theta + w
\qquad\text{with}\qquad
E_\theta\{w\} = 0 \ \text{ and } \ \mathrm{Cov}_\theta\{w\} = C,$$
the BLUE of $\theta$ is given by $\hat\theta = Kx$ where
$$K = \left(H^T C^{-1} H\right)^{-1} H^T C^{-1}$$
is a left inverse of $H$, $KH = I$.

Now consider the reparameterization $\alpha = B\theta + b$ with $B$ invertible. Setting $\theta = B^{-1}(\alpha - b)$ and substituting into our original model yields
$$x + HB^{-1}b = HB^{-1}\alpha + w.$$
Defining $A = HB^{-1}$ and $y = x + Ab$, we have recast our original model as
$$y = A\alpha + w
\qquad\text{with}\qquad
E_\alpha\{w\} = 0 \ \text{ and } \ \mathrm{Cov}_\alpha\{w\} = C.$$
The BLUE of $\alpha$ is given by $\hat\alpha = Ly$ where
$$L = \left(A^T C^{-1} A\right)^{-1} A^T C^{-1} = B\left(H^T C^{-1} H\right)^{-1} H^T C^{-1} = BK$$
is a left inverse of $A$, $LA = I$. We have that
$$\hat\alpha = Ly = L(x + Ab) = Lx + b = BKx + b = B\hat\theta + b.$$
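The invariance $\hat\alpha = B\hat\theta + b$ is also easy to confirm numerically. The following sketch (assuming NumPy; all matrices are arbitrary illustrative choices) computes the BLUE of $\alpha$ directly from the transformed model and compares it with the transformed BLUE of $\theta$:

```python
# Sketch: BLUE computed in the alpha-parameterization equals B * (BLUE of theta) + b.
import numpy as np

rng = np.random.default_rng(2)
N, p = 12, 3
H = rng.standard_normal((N, p))
G = rng.standard_normal((N, N))
C = G @ G.T + N * np.eye(N)                      # noise covariance C > 0
B = rng.standard_normal((p, p)) + 3 * np.eye(p)  # invertible (with high probability)
b = rng.standard_normal(p)
x = rng.standard_normal(N)                       # one data realization

Cinv = np.linalg.inv(C)
K = np.linalg.solve(H.T @ Cinv @ H, H.T @ Cinv)  # BLUE gain for theta
theta_hat = K @ x

A = H @ np.linalg.inv(B)                         # transformed model matrix
y = x + A @ b
L = np.linalg.solve(A.T @ Cinv @ A, A.T @ Cinv)  # BLUE gain for alpha
alpha_hat = L @ y

print(np.allclose(alpha_hat, B @ theta_hat + b)) # expect True
```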

12. We have $m$ iid measurements $y_k \sim U[0,\theta]$, $k = 1, \dots, m$, for unknown $0 < \theta < \infty$, and we want to find the BLUE of $\theta$. This is an interesting problem because, as we know from Kay Problem 3.1, this distribution does not have a Cramér-Rao lower bound. Note that for all $k = 1, \dots, m$ we have
$$E\{y_k\} = \frac{\theta}{2}
\qquad\text{and}\qquad
\mathrm{Cov}_\theta\{y_k\} = \frac{\theta^2}{12}.$$
We choose to model this as
$$y_k = \frac{\theta}{2} + w_k, \qquad k = 1, \dots, m,$$
where
$$E\{w_k\} = 0
\qquad\text{and}\qquad
\mathrm{Cov}_\theta\{w_k\} = \frac{\theta^2}{12}.$$
Defining
$$y = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}, \qquad
A = \begin{bmatrix} \tfrac12 \\ \vdots \\ \tfrac12 \end{bmatrix}, \qquad
w = \begin{bmatrix} w_1 \\ \vdots \\ w_m \end{bmatrix},$$
this can be equivalently recast as
$$y = A\theta + w
\qquad\text{with}\qquad
E\{w\} = 0 \ \text{ and } \ \mathrm{Cov}_\theta\{w\} = \sigma^2(\theta)\, I,
\qquad
\sigma^2(\theta) = \frac{\theta^2}{12}.$$

(a) Although $\mathrm{Cov}_\theta\{w\}$ is functionally dependent upon the unknown parameter $\theta$, it is of the form $c(\theta)\,M$ where $c(\theta)$ is a scalar and $M$ is a known matrix independent of $\theta$. Thus the BLUE is realizable and given by
$$\hat\theta = \left(A^T A\right)^{-1} A^T y = \frac{2}{m}\sum_{k=1}^{m} y_k = 2\,\langle y_k\rangle = 2\,\bar y,$$
where $\langle y_k\rangle = \bar y = \frac{1}{m}\sum_{k=1}^{m} y_k$ are two alternative ways to indicate the sample mean of the measurements $y_k$. It is easily demonstrated that $\hat\theta$ is indeed absolutely unbiased. The variance of $\hat\theta$ is
$$\mathrm{Cov}_\theta\{\hat\theta\} = \sigma^2(\theta)\left(A^T A\right)^{-1} = \frac{4\,\sigma^2(\theta)}{m} = \frac{4\,\theta^2}{12\,m} = \frac{\theta^2}{3m}.$$

(b) Note that the BLUE is consistent. However, it is not very efficient relative to the true MVUE. The MVUE can be found from the RBLS procedure and has variance $\frac{\theta^2}{m(m+2)}$. The ratio of the BLUE variance over the MVUE variance is therefore
$$\frac{m(m+2)}{3m} = \frac{m+2}{3}.$$
Note that for $m = 1$ the BLUE and MVUE estimation errors have the same standard deviation (SD), for $m = 10$ the BLUE error has twice the SD of the MVUE error, while for $m = 46$ the BLUE error has four times the SD of the MVUE error. Despite this relative inefficiency, the BLUE is superior to any other linear unbiased estimator of $\theta$.
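A short Monte Carlo sketch of the comparison in part (b) (assumptions: NumPy; the MVUE referred to above is taken to be the standard RBLS result $\frac{m+1}{m}\max_k y_k$ for $U[0,\theta]$, which has the variance $\theta^2/(m(m+2))$ quoted in the text):

```python
# Sketch: compare the BLUE 2*ybar against the MVUE (m+1)/m * max(y_k) for U[0, theta].
import numpy as np

rng = np.random.default_rng(3)
theta, m, trials = 5.0, 10, 200_000

Y = rng.uniform(0.0, theta, size=(trials, m))
blue = 2.0 * Y.mean(axis=1)                 # BLUE: twice the sample mean
mvue = (m + 1) / m * Y.max(axis=1)          # MVUE from the RBLS procedure (assumed form)

print("BLUE variance :", blue.var(), " theory:", theta**2 / (3 * m))
print("MVUE variance :", mvue.var(), " theory:", theta**2 / (m * (m + 2)))
print("SD ratio      :", blue.std() / mvue.std(), " theory:", np.sqrt((m + 2) / 3))
```

For $m = 10$ the theoretical SD ratio is $\sqrt{12/3} = 2$, matching the statement above that the BLUE error has twice the standard deviation of the MVUE error.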