ECE531 Screencast 5.5: Bayesian Estimation for the Linear Gaussian Model


ECE531 Screencast 5.5: Bayesian Estimation for the Linear Gaussian Model
D. Richard Brown III, Worcester Polytechnic Institute

Bayesian Estimation for the Linear Gaussian Model

Recall the linear Gaussian model
$$Y = H\Theta + W$$
where the observation $Y \in \mathbb{R}^n$, the mixing matrix $H \in \mathbb{R}^{n \times m}$ is known, the unknown parameter vector $\Theta \in \mathbb{R}^m$ is distributed as $\mathcal{N}(\mu_\Theta, \Sigma_\Theta)$, and the unknown noise vector $W \in \mathbb{R}^n$ is distributed as $\mathcal{N}(0, \Sigma_W)$. Unless otherwise specified, we always assume the noise and the unknown parameters are independent of each other.

What are the Bayesian MMSE/MMAE/MAP estimators in this case?
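The model is straightforward to simulate. Below is a minimal NumPy sketch (not from the original screencast); the dimensions n = 3, m = 2 and the particular values of H, mu_Theta, Sigma_Theta, and Sigma_W are arbitrary choices for illustration, and the later snippets in this transcript reuse these names.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions and model parameters (arbitrary choices).
n, m = 3, 2
H = np.array([[1.0, 0.5],
              [0.0, 1.0],
              [2.0, 1.0]])            # known mixing matrix, n x m
mu_Theta = np.array([1.0, -1.0])      # prior mean of Theta
Sigma_Theta = np.array([[2.0, 0.3],
                        [0.3, 1.0]])  # prior covariance of Theta
Sigma_W = 0.5 * np.eye(n)             # noise covariance

# Draw Theta ~ N(mu_Theta, Sigma_Theta) and W ~ N(0, Sigma_W) independently,
# then form the observation Y = H Theta + W.
Theta = rng.multivariate_normal(mu_Theta, Sigma_Theta)
W = rng.multivariate_normal(np.zeros(n), Sigma_W)
Y = H @ Theta + W
print("Theta =", Theta)
print("Y     =", Y)
```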

Linear Gaussian Model: Posterior Distribution Analysis

To develop an expression for the posterior distribution $\pi_y(\theta)$, we first note that
$$\pi_y(\theta) = \frac{p_{Y,\Theta}(y,\theta)}{p_Y(y)}.$$
To find the joint distribution $p_{Y,\Theta}(y,\theta)$, let
$$Z = \begin{bmatrix} Y \\ \Theta \end{bmatrix} = \begin{bmatrix} H & I \\ I & 0 \end{bmatrix}\begin{bmatrix} \Theta \\ W \end{bmatrix}$$
Since $\Theta$ and $W$ are independent of each other and each is Gaussian, they are jointly Gaussian. Furthermore, since $Z$ is a linear transformation of a jointly Gaussian random vector, it too is jointly Gaussian. To fully characterize $Z \sim \mathcal{N}(\mu_Z, \Sigma_Z)$, we just need its mean and covariance:
$$\mu_Z := E[Z] = \begin{bmatrix} H\mu_\Theta \\ \mu_\Theta \end{bmatrix}$$
$$\Sigma_Z := \mathrm{cov}[Z] = \begin{bmatrix} H\Sigma_\Theta H^\top + \Sigma_W & H\Sigma_\Theta \\ \Sigma_\Theta H^\top & \Sigma_\Theta \end{bmatrix}$$
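As a sanity check, the mean and covariance of the stacked vector can be assembled with block operations and compared against an empirical covariance. This sketch continues the NumPy example above and reuses H, mu_Theta, Sigma_Theta, Sigma_W, and rng from that snippet.

```python
# Mean and covariance of Z = [Y; Theta], assembled from the formulas above.
mu_Z = np.concatenate([H @ mu_Theta, mu_Theta])
Sigma_Z = np.block([
    [H @ Sigma_Theta @ H.T + Sigma_W, H @ Sigma_Theta],
    [Sigma_Theta @ H.T,               Sigma_Theta    ],
])

# Monte Carlo check: the sample covariance of Z should approach Sigma_Z.
N = 200_000
th = rng.multivariate_normal(mu_Theta, Sigma_Theta, size=N)   # N x m draws of Theta
w = rng.multivariate_normal(np.zeros(n), Sigma_W, size=N)     # N x n draws of W
z = np.hstack([th @ H.T + w, th])                             # N x (n+m) draws of Z
print(np.max(np.abs(np.cov(z.T) - Sigma_Z)))                  # should be small
```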

Linear Gaussian Model: Posterior Distribution Analysis

To compute the posterior, we can write
$$\pi_y(\theta) = \frac{p_Z(z)}{p_Y(y)} = \frac{(2\pi)^{-(m+n)/2}\,|\Sigma_Z|^{-1/2}\exp\left\{-\frac{(z-\mu_Z)^\top \Sigma_Z^{-1}(z-\mu_Z)}{2}\right\}}{(2\pi)^{-n/2}\,|\Sigma_Y|^{-1/2}\exp\left\{-\frac{(y-\mu_Y)^\top \Sigma_Y^{-1}(y-\mu_Y)}{2}\right\}}$$
To simplify the terms outside of the exponentials, note that
$$\Sigma_Z := \mathrm{cov}[Z] = \begin{bmatrix} H\Sigma_\Theta H^\top + \Sigma_W & H\Sigma_\Theta \\ \Sigma_\Theta H^\top & \Sigma_\Theta \end{bmatrix} = \begin{bmatrix} \Sigma_Y & \Sigma_{Y,\Theta} \\ \Sigma_{\Theta,Y} & \Sigma_\Theta \end{bmatrix}$$
The determinant of a partitioned matrix $P = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$ can be written as $|P| = |A|\,|D - CA^{-1}B|$ if $A$ is invertible. Covariance matrices are invertible, hence the terms outside the exponentials can be simplified to
$$\frac{(2\pi)^{-(m+n)/2}\,|\Sigma_Z|^{-1/2}}{(2\pi)^{-n/2}\,|\Sigma_Y|^{-1/2}} = (2\pi)^{-m/2}\,\left|\Sigma_\Theta - \Sigma_{\Theta,Y}\Sigma_Y^{-1}\Sigma_{Y,\Theta}\right|^{-1/2}$$
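The partitioned-determinant identity is easy to spot-check numerically. The sketch below continues the running example, partitioning Sigma_Z with A = Sigma_Y.

```python
# Partition Sigma_Z as [[A, B], [C, D]] with A = Sigma_Y (n x n).
A = Sigma_Z[:n, :n]   # Sigma_Y
B = Sigma_Z[:n, n:]   # Sigma_{Y,Theta}
C = Sigma_Z[n:, :n]   # Sigma_{Theta,Y}
D = Sigma_Z[n:, n:]   # Sigma_Theta

# |P| = |A| * |D - C A^{-1} B|: both sides should agree to numerical precision.
lhs = np.linalg.det(Sigma_Z)
rhs = np.linalg.det(A) * np.linalg.det(D - C @ np.linalg.solve(A, B))
print(lhs, rhs)
```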

Linear Gaussian Model: Posterior Distribution Analysis

To simplify the terms inside the exponentials, we can use a matrix inversion formula for partitioned matrices ($A$ must be invertible)
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} (A - BD^{-1}C)^{-1} & -A^{-1}B(D - CA^{-1}B)^{-1} \\ -(D - CA^{-1}B)^{-1}CA^{-1} & (D - CA^{-1}B)^{-1} \end{bmatrix}$$
and the matrix inversion lemma
$$(A - BD^{-1}C)^{-1} = A^{-1} + A^{-1}B(D - CA^{-1}B)^{-1}CA^{-1}.$$
Skipping all of the algebraic details, we can write
$$\frac{\exp\left\{-\frac{(z-\mu_Z)^\top \Sigma_Z^{-1}(z-\mu_Z)}{2}\right\}}{\exp\left\{-\frac{(y-\mu_Y)^\top \Sigma_Y^{-1}(y-\mu_Y)}{2}\right\}} = \exp\left\{-\frac{(\theta-\alpha(y))^\top \Sigma^{-1}(\theta-\alpha(y))}{2}\right\}$$
where $\alpha(y) = \mu_\Theta + \Sigma_{\Theta,Y}\Sigma_Y^{-1}(y - \mu_Y)$ and $\Sigma = \Sigma_\Theta - \Sigma_{\Theta,Y}\Sigma_Y^{-1}\Sigma_{Y,\Theta}$.
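The matrix inversion lemma quoted above can be verified numerically on the same blocks A, B, C, D defined in the previous snippet; a quick check continuing the running example:

```python
# Matrix inversion lemma:
#   (A - B D^{-1} C)^{-1} = A^{-1} + A^{-1} B (D - C A^{-1} B)^{-1} C A^{-1}
Ai = np.linalg.inv(A)
lhs = np.linalg.inv(A - B @ np.linalg.solve(D, C))
rhs = Ai + Ai @ B @ np.linalg.inv(D - C @ Ai @ B) @ C @ Ai
print(np.max(np.abs(lhs - rhs)))   # should be ~0 up to floating-point error
```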

Linear Gaussian Model: Posterior Distribution Analysis

Putting it all together, we have the posterior distribution
$$\pi_y(\theta) = (2\pi)^{-m/2}\,|\Sigma|^{-1/2}\exp\left\{-\frac{(\theta-\alpha(y))^\top \Sigma^{-1}(\theta-\alpha(y))}{2}\right\}$$
where $\alpha(y) = \mu_\Theta + \Sigma_{\Theta,Y}\Sigma_Y^{-1}(y - \mu_Y)$ and $\Sigma = \Sigma_\Theta - \Sigma_{\Theta,Y}\Sigma_Y^{-1}\Sigma_{Y,\Theta}$ with
$$\Sigma_{\Theta,Y} = \mathrm{cov}(\Theta,Y) = E\left[(\Theta - \mu_\Theta)(H\Theta + W - H\mu_\Theta)^\top\right] = \Sigma_\Theta H^\top$$
$$\Sigma_{Y,\Theta} = \Sigma_{\Theta,Y}^\top = H\Sigma_\Theta$$
$$\Sigma_Y = \mathrm{cov}(Y,Y) = H\Sigma_\Theta H^\top + \Sigma_W$$
$$\mu_Y = E[H\Theta + W] = H\mu_\Theta$$
What can we say about the posterior distribution of the random parameter $\Theta$ conditioned on the observation $Y = y$?
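Plugging these formulas into the running NumPy example gives the posterior mean and covariance for the particular observation Y drawn earlier (all names come from the earlier snippets in this transcript):

```python
# Posterior quantities for the running example.
mu_Y = H @ mu_Theta
Sigma_Y = H @ Sigma_Theta @ H.T + Sigma_W
Sigma_ThetaY = Sigma_Theta @ H.T                      # Sigma_{Theta,Y}

alpha_y = mu_Theta + Sigma_ThetaY @ np.linalg.solve(Sigma_Y, Y - mu_Y)
Sigma_post = Sigma_Theta - Sigma_ThetaY @ np.linalg.solve(Sigma_Y, Sigma_ThetaY.T)

print("posterior mean alpha(y):", alpha_y)
print("posterior covariance Sigma:\n", Sigma_post)
print("true Theta (for reference):", Theta)
```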

Linear Gaussian Model: Bayesian Estimators

Lemma. In the linear Gaussian model, the parameter vector $\Theta$ conditioned on the observation $Y = y$ is jointly Gaussian distributed with
$$E[\Theta \mid Y = y] = \mu_\Theta + \Sigma_\Theta H^\top\left(H\Sigma_\Theta H^\top + \Sigma_W\right)^{-1}(y - H\mu_\Theta)$$
$$\mathrm{cov}[\Theta \mid Y = y] = \Sigma_\Theta - \Sigma_\Theta H^\top\left(H\Sigma_\Theta H^\top + \Sigma_W\right)^{-1} H\Sigma_\Theta$$

Corollary. In the linear Gaussian model
$$\hat\theta_{\mathrm{mmse}}(y) = \hat\theta_{\mathrm{mmae}}(y) = \hat\theta_{\mathrm{map}}(y)$$
since the posterior is Gaussian and its mean, componentwise median, and mode all coincide.

Linear Gaussian Model: Bayesian Estimator Remarks

All of the estimators are linear (actually affine) in the observation $y$.

Recall that the performance of the Bayesian MMSE estimator is
$$\mathrm{MMSE} = E\left[\left\|\Theta - \hat\theta_{\mathrm{mmse}}(Y)\right\|_2^2\right] = \int \mathrm{trace}\{\mathrm{cov}(\Theta \mid Y = y)\}\,p(y)\,dy.$$
In the linear Gaussian model, we see that $\mathrm{cov}[\Theta \mid Y = y]$ does not depend on $y$. Hence, we can move the trace outside of the integral and write the MMSE as
$$\mathrm{MMSE} = \mathrm{trace}\{\mathrm{cov}[\Theta \mid Y = y]\}\int p(y)\,dy = \mathrm{trace}\{\Sigma_\Theta\} - \mathrm{trace}\left\{\Sigma_\Theta H^\top\left(H\Sigma_\Theta H^\top + \Sigma_W\right)^{-1} H\Sigma_\Theta\right\}.$$
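The closed-form MMSE can be checked against a Monte Carlo estimate of the average squared error of the affine estimator. This final sketch again reuses the quantities defined in the earlier snippets.

```python
# Closed-form MMSE: trace of the (y-independent) posterior covariance.
mmse_theory = np.trace(Sigma_post)

# Monte Carlo check: average squared error of the MMSE estimator over fresh draws.
N = 200_000
th = rng.multivariate_normal(mu_Theta, Sigma_Theta, size=N)
w = rng.multivariate_normal(np.zeros(n), Sigma_W, size=N)
y = th @ H.T + w
th_hat = mu_Theta + (y - mu_Y) @ np.linalg.solve(Sigma_Y, Sigma_ThetaY.T)
mmse_mc = np.mean(np.sum((th - th_hat) ** 2, axis=1))
print(mmse_theory, mmse_mc)   # the two values should be close
```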