ECE531 Homework Assignment Number 6 Solution


Due by 8:5pm on Wednesday 3-Mar. Make sure your reasoning and work are clear to receive full credit for each problem.

1. 6 points. Suppose you have a scalar random parameter $\Theta \in \mathbb{R}$ with discrete prior distribution
$$\pi(\theta) = \tfrac{1}{2}\left(\delta(\theta - 1) + \delta(\theta + 1)\right).$$
You receive a single observation
$$Y = \Theta + W,$$
where $W \sim \mathcal{N}(0,\sigma^2)$ and $\sigma^2$ is known.

(a) Find the Bayesian MMSE estimator for the parameter $\Theta$.

Solution: All of the Bayesian estimators require us to compute the posterior density
$$\pi_y(\theta) = \frac{p_\theta(y)\,\pi(\theta)}{p(y)}.$$
First we calculate the unconditional density $p(y)$:
$$p(y) = \int_\Lambda p_\theta(y)\,\pi(\theta)\,d\theta = \frac{1}{2}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(y-1)^2}{2\sigma^2}\right\} + \frac{1}{2}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(y+1)^2}{2\sigma^2}\right\}.$$
Then we can compute the posterior density (the second equality follows by dividing numerator and denominator by the common factor $\exp\{-(y^2+1)/(2\sigma^2)\}$):
$$\pi_y(\theta) = \begin{cases}
\dfrac{\exp\left\{-\frac{(y-1)^2}{2\sigma^2}\right\}}{\exp\left\{-\frac{(y-1)^2}{2\sigma^2}\right\} + \exp\left\{-\frac{(y+1)^2}{2\sigma^2}\right\}} = \dfrac{\exp\left\{\frac{y}{\sigma^2}\right\}}{\exp\left\{\frac{y}{\sigma^2}\right\} + \exp\left\{-\frac{y}{\sigma^2}\right\}} & \text{if } \theta = 1 \\[3ex]
\dfrac{\exp\left\{-\frac{(y+1)^2}{2\sigma^2}\right\}}{\exp\left\{-\frac{(y-1)^2}{2\sigma^2}\right\} + \exp\left\{-\frac{(y+1)^2}{2\sigma^2}\right\}} = \dfrac{\exp\left\{-\frac{y}{\sigma^2}\right\}}{\exp\left\{\frac{y}{\sigma^2}\right\} + \exp\left\{-\frac{y}{\sigma^2}\right\}} & \text{if } \theta = -1.
\end{cases}$$
The MMSE estimate is then the conditional mean,
$$\hat{\theta}_{\rm MMSE}(y) = \int_\Lambda \theta\,\pi_y(\theta)\,d\theta = \pi_y(1) - \pi_y(-1) = \frac{\exp\left\{\frac{y}{\sigma^2}\right\} - \exp\left\{-\frac{y}{\sigma^2}\right\}}{\exp\left\{\frac{y}{\sigma^2}\right\} + \exp\left\{-\frac{y}{\sigma^2}\right\}} = \frac{\sinh\!\left(\frac{y}{\sigma^2}\right)}{\cosh\!\left(\frac{y}{\sigma^2}\right)} = \tanh\!\left(\frac{y}{\sigma^2}\right).$$

(b) Find the Bayesian MAP estimator for the parameter $\Theta$.

Solution: The MAP estimator is the conditional mode, which we can compute as
$$\hat{\theta}_{\rm MAP}(y) = \arg\max_{\theta = \pm 1}\,\pi_y(\theta) = \begin{cases} 1 & \text{if } \dfrac{\exp\left\{\frac{y}{\sigma^2}\right\}}{\exp\left\{\frac{y}{\sigma^2}\right\}+\exp\left\{-\frac{y}{\sigma^2}\right\}} \ge \dfrac{\exp\left\{-\frac{y}{\sigma^2}\right\}}{\exp\left\{\frac{y}{\sigma^2}\right\}+\exp\left\{-\frac{y}{\sigma^2}\right\}} \\[3ex] -1 & \text{otherwise}\end{cases}
= \begin{cases} 1 & \text{if } y \ge 0\\ -1 & \text{if } y < 0\end{cases} = \operatorname{sgn}(y).$$

(c) Under what conditions are the MMSE and MAP estimates approximately equal?

Solution: We can see that $\tanh(x) \to 1$ as $x \to \infty$ and, similarly, $\tanh(x) \to -1$ as $x \to -\infty$. Thus, if $\sigma^2$ is very small (so that $|y|/\sigma^2$ is large), then the two estimators will be approximately equal.

(d) Discuss how this problem relates to simple binary Bayesian hypothesis testing.

Solution: In this problem, $\theta$ can only take on two possible values. So it could also be considered in the context of detection if we assign hypotheses $\mathcal{H}_0: \theta = -1$ and $\mathcal{H}_1: \theta = +1$. We know the Bayes decision rule in this case is to decide $\mathcal{H}_1$ when $y \ge 0$ and decide $\mathcal{H}_0$ when $y < 0$ (with ambiguity at $y = 0$). The MMSE estimator $\hat{\theta}_{\rm MMSE} = \tanh(y/\sigma^2)$ can be thought of as a soft decision rule in the sense that the estimate is always in $[-1, 1]$ and approaches one of the possible values for $\theta$ when the evidence in favor of $+1$ or $-1$ is very strong, i.e., when $y/\sigma^2 \to \infty$ or $y/\sigma^2 \to -\infty$. You can use the MMSE estimate to get a Bayes detector by deciding $\mathcal{H}_1: \theta = 1$ when $\hat{\theta} \ge 0$ and deciding $\mathcal{H}_0: \theta = -1$ otherwise. But the soft MMSE estimate also gives a result that reflects, to some extent, the uncertainty of the decision. The MAP estimator gives the same result as a Bayesian detector in this problem (decide $\mathcal{H}_1: \theta = 1$ when $y \ge 0$ and decide $\mathcal{H}_0: \theta = -1$ otherwise).
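If you want to check these answers numerically, the following short Python sketch (not part of the original solution; the function name `posterior` and the values of $y$ and $\sigma^2$ are arbitrary choices of mine) applies Bayes' rule directly to the two-point prior and confirms that the conditional mean matches $\tanh(y/\sigma^2)$ and the conditional mode matches $\operatorname{sgn}(y)$:

```python
import numpy as np

def posterior(y, sigma2):
    """Posterior probabilities of theta = +1 and theta = -1 given y, for the
    two-point prior pi(theta) = 0.5*delta(theta-1) + 0.5*delta(theta+1)
    and Gaussian noise W ~ N(0, sigma2)."""
    # Unnormalized likelihood-times-prior for each candidate value of theta.
    w_plus = np.exp(-(y - 1.0) ** 2 / (2.0 * sigma2))
    w_minus = np.exp(-(y + 1.0) ** 2 / (2.0 * sigma2))
    z = w_plus + w_minus
    return w_plus / z, w_minus / z

y, sigma2 = 0.3, 0.5
p_plus, p_minus = posterior(y, sigma2)

theta_mmse = 1.0 * p_plus + (-1.0) * p_minus    # conditional mean
theta_map = 1.0 if p_plus >= p_minus else -1.0  # conditional mode

print(theta_mmse, np.tanh(y / sigma2))  # should agree
print(theta_map, np.sign(y))            # should agree
```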

2. 4 points. Kay I, problem .3.

Solution: This problem requires the computation of the posterior density $\pi_y(\theta)$ and then the conditional mean. I will use the notation $y_n$ for the observations (Kay uses $x[n]$). Note that, since the observations are conditionally independent, the joint conditional density of the observations can be written as the product of the marginals
$$p_\theta(y) = \prod_n p_\theta(y_n) = \begin{cases} \exp\!\left(-\sum_n (y_n - \theta)\right) & \text{all } y_n > \theta\\ 0 & \text{otherwise.}\end{cases}$$
If we let $\check{y} = \min_n y_n$ and $\bar{y} = \frac{1}{N}\sum_n y_n$, we can write the joint conditional density as
$$p_\theta(y) = \begin{cases} \exp(-N(\bar{y} - \theta)) & \check{y} > \theta\\ 0 & \text{otherwise.}\end{cases}$$
The posterior density can then be computed as
$$\pi_y(\theta) = \frac{p_\theta(y)\,\pi(\theta)}{p(y)} = \begin{cases} \dfrac{\exp(-N(\bar{y}-\theta))\exp(-\theta)}{\int_0^{\check{y}}\exp(-N(\bar{y}-\theta))\exp(-\theta)\,d\theta} & 0 < \theta < \check{y}\\[2ex] 0 & \text{otherwise}\end{cases}
= \begin{cases} \dfrac{(N-1)\exp(\theta(N-1))}{\exp(\check{y}(N-1)) - 1} & 0 < \theta < \check{y}\\[2ex] 0 & \text{otherwise.}\end{cases}$$
Note that the posterior density is only a function of $\check{y} = \min_n y_n$ and $N$. All of the $\bar{y}$ terms cancelled out. Also note this result is only valid for $N \ge 2$. If $N = 1$, we can perform this same calculation with the marginal conditional density.

To compute the MMSE estimator, we need to compute the conditional mean. Note that only the numerator of $\pi_y(\theta)$ is a function of $\theta$, so we can write
$$\hat{\theta}_{\rm MMSE} = E[\theta \mid Y = y] = \frac{\int_0^{\check{y}} \theta\,(N-1)\exp(\theta(N-1))\,d\theta}{\exp(\check{y}(N-1)) - 1}.$$
The integral is of the form $\int x e^{x}\,dx$, which you solve via integration by parts (or a lookup table or a symbolic math software package). After performing the integral and doing some algebra to simplify the result, we get
$$\hat{\theta}_{\rm MMSE} = \frac{\check{y}}{1 - \exp\{-\check{y}(N-1)\}} - \frac{1}{N-1}.$$
As a sanity check, let's see what happens when $N \to \infty$. In this case $\hat{\theta}_{\rm MMSE} \to \check{y}$, which should make intuitive sense. As the number of observations becomes large, the prior is washed away and only the observations matter. The parameter $\theta$ represents the minimum value of the observations (the left edge of the conditional density), hence $\hat{\theta}_{\rm MMSE} \to \check{y}$ makes intuitive sense. When the number of observations is finite, the prior acts as a correction factor on the estimate in the sense that the data is optimally weighted with the prior to minimize the Bayes MMSE risk. The prior in this problem says that $\theta$ is more likely to be small than large. Figure 1 shows the MMSE estimate as a function of $\check{y}$ and $N$ for several values of $N \ge 2$. As expected, the prior causes the MMSE estimate to always be less than $\check{y}$, and especially so for smaller values of $N$.

Figure 1: MMSE estimates in Problem 2 as a function of $\check{y} = \min_n y_n$ for several values of $N$.
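If you want to reproduce curves like these, or simply sanity-check the closed form, a few lines of Python suffice (this is my own sketch, not part of the original solution; the function names and the value $\check{y} = 1.5$ are arbitrary). It compares the closed-form posterior mean with a brute-force average over a fine grid of the unnormalized posterior:

```python
import numpy as np

def mmse_closed_form(y_min, N):
    """Closed-form posterior mean from Problem 2 (valid for N >= 2)."""
    c = N - 1
    return y_min / (1.0 - np.exp(-c * y_min)) - 1.0 / c

def mmse_numerical(y_min, N, num_pts=200001):
    """Grid-based posterior mean: the posterior is proportional to
    exp(theta*(N-1)) on 0 < theta < y_min, zero elsewhere."""
    theta = np.linspace(0.0, y_min, num_pts)
    w = np.exp((N - 1) * theta)          # unnormalized posterior
    return np.sum(theta * w) / np.sum(w)  # normalization cancels

for N in (2, 5, 20):
    print(N, mmse_closed_form(1.5, N), mmse_numerical(1.5, N))
```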

3. 4 points. Kay I, problem .4.

Solution: As before, we need to compute the posterior density $\pi_y(\theta)$ and then compute the conditional mean. I will use the notation $y_n$ for the observations (Kay uses $x[n]$). Since the observations are conditionally independent, the joint conditional density of the observations can be written as the product of the marginals
$$p_\theta(y) = \prod_n p_\theta(y_n) = \begin{cases} \theta^{-N} & \text{all } y_n \text{ satisfy } 0 \le y_n \le \theta\\ 0 & \text{otherwise.}\end{cases}$$
Note that $\min_n y_n \ge 0$ for any parameter $\theta$ in this problem since $\Theta \sim \mathcal{U}[0,\beta]$. So, if we let $\hat{y} = \max_n y_n$, we can write the joint conditional density as
$$p_\theta(y) = \begin{cases} \theta^{-N} & \hat{y} \le \theta\\ 0 & \text{otherwise.}\end{cases}$$
Also note that
$$\pi(\theta) = \begin{cases} \frac{1}{\beta} & 0 \le \theta \le \beta\\ 0 & \text{otherwise.}\end{cases}$$
We have everything we need. The posterior density can then be computed as
$$\pi_y(\theta) = \frac{p_\theta(y)\,\pi(\theta)}{p(y)} = \begin{cases} \dfrac{\theta^{-N}\frac{1}{\beta}}{\int_{\hat{y}}^{\beta}\theta^{-N}\frac{1}{\beta}\,d\theta} & \hat{y} \le \theta \le \beta\\[2ex] 0 & \text{otherwise}\end{cases}
= \begin{cases} \dfrac{(N-1)\,\theta^{-N}}{\hat{y}^{-(N-1)} - \beta^{-(N-1)}} & \hat{y} \le \theta \le \beta\\[2ex] 0 & \text{otherwise.}\end{cases}$$
Note this result is only valid for $N \ge 2$. If $N = 1$, we can perform this same calculation with the marginal conditional density.

To compute the MMSE estimator, we need to compute the conditional mean. Note that only the numerator of $\pi_y(\theta)$ is a function of $\theta$, so we can write
$$\hat{\theta}_{\rm MMSE} = E[\theta \mid Y = y] = \frac{(N-1)\int_{\hat{y}}^{\beta}\theta^{-(N-1)}\,d\theta}{\hat{y}^{-(N-1)} - \beta^{-(N-1)}} = \frac{N-1}{N-2}\cdot\frac{\hat{y}^{-(N-2)} - \beta^{-(N-2)}}{\hat{y}^{-(N-1)} - \beta^{-(N-1)}}.$$
Note this result is only valid for $N \ge 3$.

When the number of observations is finite, the prior acts as a correction factor on the estimate in the sense that the data is optimally weighted with the prior to minimize the Bayes MMSE risk. Since $\hat{y}$ is always going to be less than the parameter of interest, we can expect the correction to cause the estimate to be larger than $\hat{y}$. Figure 2 shows the MMSE estimate as a function of $\hat{y}$ and $N$ when $\beta = 1$ for several values of $N \ge 3$. As expected, the prior causes the MMSE estimate to always be more than $\hat{y}$, and especially so for smaller values of $N$.

Figure 2: MMSE estimates in Problem 3 as a function of $\hat{y} = \max_n y_n$ and $N$ for the case when $\beta = 1$.

When $\beta$ is large and $N \ge 3$, we can write
$$\hat{\theta}_{\rm MMSE} \approx \frac{N-1}{N-2}\cdot\frac{\hat{y}^{-(N-2)}}{\hat{y}^{-(N-1)}} = \frac{N-1}{N-2}\,\hat{y}.$$
If $N$ is also large, then $\hat{\theta}_{\rm MMSE} \approx \hat{y}$. This makes intuitive sense since the observations are upper bounded by $\theta$ and the maximum observation should give a good estimate for $\theta$ as $N \to \infty$.
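As with Problem 2, the closed form is easy to verify numerically. The sketch below is mine (the function names and the values $\hat{y} = 0.6$, $\beta = 1$ are illustrative) and simply averages over a grid of the unnormalized posterior $\theta^{-N}$ on $[\hat{y}, \beta]$:

```python
import numpy as np

def mmse_closed_form(y_max, N, beta):
    """Closed-form posterior mean from Problem 3 (valid for N >= 3)."""
    num = y_max ** (-(N - 2)) - beta ** (-(N - 2))
    den = y_max ** (-(N - 1)) - beta ** (-(N - 1))
    return (N - 1) / (N - 2) * num / den

def mmse_numerical(y_max, N, beta, num_pts=200001):
    """Grid-based posterior mean: posterior proportional to theta**(-N) on [y_max, beta]."""
    theta = np.linspace(y_max, beta, num_pts)
    w = theta ** (-N)                     # unnormalized posterior
    return np.sum(theta * w) / np.sum(w)  # normalization cancels

for N in (3, 5, 20):
    print(N, mmse_closed_form(0.6, N, 1.0), mmse_numerical(0.6, N, 1.0))
```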

4. 4 points. Kay I, problem .

Solution: Everything here is jointly Gaussian (or bivariate Gaussian, since there are only two random variables). If we let $z = [x, y]^T$, the joint density can be written as
$$p_Z(z) = p_{X,Y}(x,y) = \frac{1}{2\pi|C|^{1/2}}\exp\left\{-\frac{1}{2}z^T C^{-1} z\right\}.$$
Now conditioning on $x = x_0$ and doing a bit of algebra, we have
$$g(y) := p_{X,Y}(x_0, y) = \frac{1}{2\pi|C|^{1/2}}\exp\left\{-\frac{x_0^2 - 2\rho x_0 y + y^2}{2(1-\rho^2)}\right\} = \frac{1}{2\pi|C|^{1/2}}\exp\left\{-\frac{h(y)}{2(1-\rho^2)}\right\}.$$
So $g(y)$ is maximized when $h(y) = x_0^2 - 2\rho x_0 y + y^2$ is minimized. To minimize $h(y)$, we take a derivative and set the result to zero, i.e.,
$$\frac{d}{dy}h(y) = 2y - 2\rho x_0 = 0,$$
which has a unique solution at $y = \rho x_0$. You can take another derivative to confirm that this is indeed a minimum of $h(y)$. Hence, we have proven that $g(y)$ is maximized when $y = \rho x_0$.

Now, we can use the results in Kay I for bivariate Gaussian distributed random variables to write
$$E[Y \mid X = x_0] = E[Y] + \frac{\operatorname{cov}(X,Y)}{\operatorname{var}(X)}\left(x_0 - E[X]\right).$$
From the problem description, we know $E[Y] = E[X] = 0$, $\operatorname{cov}(X,Y) = \rho$, and $\operatorname{var}(X) = 1$. Hence, $E[Y \mid X = x_0] = \rho x_0$.

Why are they the same? Conditioning on $X = x_0$ does not change the Gaussianity of $Y$. All the conditioning does is make the posterior distribution of $Y$ have a different mean and variance, but the posterior distribution of $Y$ remains Gaussian. We know that the symmetry of a Gaussian distribution causes the mean, median, and mode to all be the same thing.

When $\rho = 0$, $X$ and $Y$ are uncorrelated (actually independent, since uncorrelated jointly Gaussian random variables are independent). So observing $X = x_0$ tells us nothing about $Y$. As you would expect, then,
$$E[Y \mid X = x_0] = E[Y] + \frac{\operatorname{cov}(X,Y)}{\operatorname{var}(X)}(x_0 - E[X]) = E[Y] + 0\cdot(x_0 - E[X]) = E[Y] = 0.$$
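A quick Monte Carlo check of $E[Y \mid X = x_0] = \rho x_0$ is easy to do in Python. This sketch is mine (the values $\rho = 0.7$, $x_0 = 1.2$, the slice half-width, and the sample size are arbitrary); it approximates the conditional mean by averaging $Y$ over samples whose $X$ value falls in a thin slice around $x_0$:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, x0 = 0.7, 1.2

# Draw standard bivariate Gaussian samples with correlation rho.
cov = np.array([[1.0, rho], [rho, 1.0]])
xy = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=2_000_000)

# Approximate E[Y | X = x0] by averaging Y over a thin slice around x0.
mask = np.abs(xy[:, 0] - x0) < 0.01
print(xy[mask, 1].mean(), rho * x0)  # the two numbers should be close
```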

5. 3 points. Kay I, problem .3. Also find the Bayesian MMAE estimator.

Solution: We've already been given the posterior pdf in this problem, so we just need to compute the conditional mean, median, and mode to get the MMSE, MMAE, and MAP estimators, respectively. I will use the notation $y$ for my observation (Kay uses $x$). We can compute
$$\hat{\theta}_{\rm MMSE}(y) = E[\Theta \mid Y = y] = \int_y^\infty \theta\exp(-(\theta - y))\,d\theta = y + 1.$$
This makes sense because the posterior distribution is an exponential distribution with parameter $\lambda = 1$ shifted to the right by $y$. The mean of an exponentially distributed random variable with parameter $\lambda$ is $\frac{1}{\lambda}$. Hence, the mean of this shifted exponential distribution should be $y + \frac{1}{\lambda} = y + 1$.

The median of an exponentially distributed random variable with parameter $\lambda$ is $\frac{\ln 2}{\lambda}$. What we have here is an exponential distribution with parameter $\lambda = 1$ shifted to the right by $y$. Hence,
$$\hat{\theta}_{\rm MMAE}(y) = \operatorname{median}[\Theta \mid Y = y] = y + \ln 2.$$
Note that the MMAE estimate is smaller than the MMSE estimate since $\ln 2 \approx 0.693 < 1$.

Finally, the mode of an exponentially distributed random variable with parameter $\lambda$ is $0$. What we have here is an exponential distribution with parameter $\lambda = 1$ shifted to the right by $y$. Hence,
$$\hat{\theta}_{\rm MAP}(y) = \operatorname{mode}[\Theta \mid Y = y] = y.$$
Note that the MAP estimate is smaller than the MMAE estimate. This is a nice example of a case where the MMSE, MMAE, and MAP estimates are all different.
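If you like, you can see the three estimates separate numerically by sampling from the shifted-exponential posterior. This sketch is mine (the observation value $y = 2$ and the sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
y = 2.0

# Sample from the posterior p(theta | y) = exp(-(theta - y)) for theta > y,
# i.e., a unit-rate exponential shifted right by y.
theta = y + rng.exponential(scale=1.0, size=1_000_000)

print(theta.mean(), y + 1.0)              # MMSE estimate (posterior mean)
print(np.median(theta), y + np.log(2.0))  # MMAE estimate (posterior median)
# The posterior mode sits at the left edge theta = y, which is the MAP estimate.
```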

6. 4 points. Kay I, problem .6.

Solution: This problem has a linear Gaussian model. So instead of re-deriving all of those results, we will just apply them here by noting that
$$\underbrace{\begin{bmatrix} Y_{-M}\\ \vdots\\ Y_{M}\end{bmatrix}}_{Y} = \underbrace{\begin{bmatrix} 1 & -M\\ \vdots & \vdots\\ 1 & M\end{bmatrix}}_{H}\underbrace{\begin{bmatrix} A\\ B\end{bmatrix}}_{\Theta} + \underbrace{\begin{bmatrix} W_{-M}\\ \vdots\\ W_{M}\end{bmatrix}}_{W}.$$
To derive the MMSE estimator and its associated performance, we need to calculate
$$E[\Theta \mid Y = y] = \mu_\Theta + \Sigma_\Theta H^T\left(H\Sigma_\Theta H^T + \Sigma_W\right)^{-1}(y - H\mu_\Theta)$$
$$\operatorname{cov}[\Theta \mid Y = y] = \Sigma_\Theta - \Sigma_\Theta H^T\left(H\Sigma_\Theta H^T + \Sigma_W\right)^{-1}H\Sigma_\Theta$$
where it is given that
$$\mu_\Theta = [A_0,\, B_0]^T,\qquad \Sigma_\Theta = \operatorname{diag}(\sigma_A^2,\, \sigma_B^2),\qquad \Sigma_W = \sigma^2 I.$$
We've got everything we need; it is just a matter of doing the calculations. The only difficult part is the matrix inverse. If you just try to plug and chug here, you will see that you need to do a $(2M+1)\times(2M+1)$ matrix inverse. Instead, we can use the matrix inversion lemma (sorry for the conflicting notation here)
$$\left(A + BDC\right)^{-1} = A^{-1} - A^{-1}B\left(D^{-1} + CA^{-1}B\right)^{-1}CA^{-1}$$
with $A = \Sigma_W$, $B = H$, $C = H^T$, and $D = \Sigma_\Theta$. Then we can write
$$\left(H\Sigma_\Theta H^T + \Sigma_W\right)^{-1} = \frac{1}{\sigma^2}I - \frac{1}{\sigma^4}H\left(\Sigma_\Theta^{-1} + \frac{1}{\sigma^2}H^T H\right)^{-1}H^T.$$
The good news about this is that the new matrix inverse is just $2\times 2$. Let's focus on that for a second. Since $\sum_{m=-M}^{M} m = 0$, we have $H^T H = \operatorname{diag}(2M+1,\, P)$, so
$$\left(\Sigma_\Theta^{-1} + \frac{1}{\sigma^2}H^T H\right)^{-1} = \begin{bmatrix}\frac{1}{\sigma_A^2} + \frac{2M+1}{\sigma^2} & 0\\ 0 & \frac{1}{\sigma_B^2} + \frac{P}{\sigma^2}\end{bmatrix}^{-1} = \begin{bmatrix}\left(\frac{1}{\sigma_A^2} + \frac{2M+1}{\sigma^2}\right)^{-1} & 0\\ 0 & \left(\frac{1}{\sigma_B^2} + \frac{P}{\sigma^2}\right)^{-1}\end{bmatrix}$$
where $P := \sum_{m=-M}^{M} m^2$. So plugging this result in and grinding out the rest of the linear algebra, we can write the MMSE/MMAE/MAP estimator (the posterior is Gaussian, so all three coincide) as
$$\hat{\theta} = \begin{bmatrix}\hat{A}\\ \hat{B}\end{bmatrix} = \begin{bmatrix} A_0 + \dfrac{\bar{y} - A_0}{1 + \frac{\sigma^2}{(2M+1)\sigma_A^2}}\\[3ex] B_0 + \dfrac{\frac{1}{P}\sum_{m=-M}^{M} m\,y_m - B_0}{1 + \frac{\sigma^2}{P\sigma_B^2}}\end{bmatrix}$$

where $\bar{y} = \frac{1}{2M+1}\sum_{m=-M}^{M} y_m$. The MMSEs can be similarly computed from the diagonal of the posterior covariance as
$$\text{MMSE}(\hat{A}) = \left(\frac{1}{\sigma_A^2} + \frac{2M+1}{\sigma^2}\right)^{-1},\qquad \text{MMSE}(\hat{B}) = \left(\frac{1}{\sigma_B^2} + \frac{P}{\sigma^2}\right)^{-1}.$$
Both MMSEs benefit from narrower priors (smaller $\sigma_A^2$ and $\sigma_B^2$), as you would expect. Note, however, that $P := \sum_{m=-M}^{M} m^2 = \frac{M(M+1)(2M+1)}{3}$ grows faster in $M$ than $2M+1$. Hence, the MMSE of the slope is going to zero more quickly than the MMSE of the intercept as more observations arrive. We can conclude, then, that the intercept (parameter $A$) benefits the most from prior knowledge, since the prior on the slope is washed away more quickly by the observations than the prior on the intercept.
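A short Python sketch (my own, not part of the original solution; the values of $M$, $\sigma^2$, $\sigma_A^2$, $\sigma_B^2$, $A_0$, $B_0$ are illustrative) confirms that the simplified closed form above matches the general linear Gaussian MMSE formula computed with a direct matrix inverse:

```python
import numpy as np

rng = np.random.default_rng(2)

# Linear Gaussian model for Problem 6: y_m = A + B*m + w_m, m = -M, ..., M,
# with independent Gaussian priors on A and B (parameter values are mine).
M, sigma2, sigA2, sigB2, A0, B0 = 5, 0.5, 2.0, 1.0, 0.0, 0.0
m = np.arange(-M, M + 1)
H = np.column_stack([np.ones(2 * M + 1), m])
Sigma_theta = np.diag([sigA2, sigB2])
Sigma_w = sigma2 * np.eye(2 * M + 1)
mu_theta = np.array([A0, B0])

# Simulate one data record from the model.
A, B = rng.normal(A0, np.sqrt(sigA2)), rng.normal(B0, np.sqrt(sigB2))
y = A + B * m + rng.normal(0.0, np.sqrt(sigma2), size=2 * M + 1)

# Direct form: mu + Sigma H^T (H Sigma H^T + Sigma_w)^{-1} (y - H mu).
K = Sigma_theta @ H.T @ np.linalg.inv(H @ Sigma_theta @ H.T + Sigma_w)
theta_direct = mu_theta + K @ (y - H @ mu_theta)

# Closed form derived in the solution (only a diagonal 2x2 inverse needed).
P = np.sum(m ** 2)
ybar = y.mean()
A_hat = A0 + (ybar - A0) / (1.0 + sigma2 / ((2 * M + 1) * sigA2))
B_hat = B0 + ((m @ y) / P - B0) / (1.0 + sigma2 / (P * sigB2))

print(theta_direct, [A_hat, B_hat])  # should match to floating-point precision
```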