Recursive Estimation

Raffaello D'Andrea, Spring 2017

Problem Set 3: Extracting Estimates from Probability Distributions

Last updated: April 2017

Notes:
- Notation: Unless otherwise noted, x, y, and z denote random variables, f_x(x) (or the shorthand f(x)) denotes the probability density function of x, and f_{x|y}(x|y) (or f(x|y)) denotes the conditional probability density function of x conditioned on y. The expected value is denoted by E[·], the variance is denoted by Var[·], and Pr(Z) denotes the probability that the event Z occurs. A normally distributed random variable with mean µ and variance σ^2 is denoted by N(µ, σ^2).
- Please report any errors found in this problem set to the teaching assistants (michaemu@ethz.ch or lhewing@ethz.ch).

Problem Set

Problem 1
Let x and w be scalar CRVs, where x and w are independent: f(x, w) = f(x) f(w), and w ∈ [−1, 1] is uniformly distributed. Let z = x + 3w.

a) Calculate f(z|x) using the change of variables.

b) Let x = 1. Verify that Pr(z ∈ [1, 4] | x = 1) = Pr(w ∈ [0, 1] | x = 1).

Problem 2
Let w ∈ R^2 and x ∈ R^4 be CRVs, where x and w are independent: f(x, w) = f(x) f(w). The two elements of the measurement noise vector w are w_1, w_2, and the PDF of w is uniform:

    f_w(w) = f_{w_1, w_2}(w_1, w_2) = 1/4 for |w_1| ≤ 1 and |w_2| ≤ 1, and 0 otherwise.

The measurement model is

    z = Hx + Gw

where

    H = [1 0 0 0; 0 0 1 0],   G = [1 0; 0 3].

Calculate the conditional joint PDF f(z|x) using the change of variables.

Problem 3
Find the maximum likelihood (ML) estimate x̂_ML := arg max_x f(z|x) when z_i = x + w_i, i = 1, 2, where the w_i are independent with w_i ~ N(0, σ_i^2) and z_i, w_i, x ∈ R.

Problem 4
Solve Problem 3 if the joint PDF of w is

    f(w) = f(w_1, w_2) = 1/4 for |w_1| ≤ 1 and |w_2| ≤ 1, and 0 otherwise.

Problem 5
Let x ∈ R^n be an unknown, constant parameter. Furthermore, let w ∈ R^m, n ≤ m, represent the measurement noise with its joint PDF defined by the multivariate Gaussian distribution

    f_w(w) = (2π)^{−m/2} (det(Σ))^{−1/2} exp( −(1/2) w^T Σ^{−1} w )

with mean E[w] = 0 and variance E[w w^T] =: Σ = Σ^T ∈ R^{m×m}. Assume that x and w are independent. Let the measurement be

    z = Hx + w

where the matrix H ∈ R^{m×n} has full column rank.

a) Calculate the ML estimate x̂_ML := arg max_x f(z|x) for a diagonal variance

    Σ = diag(σ_1^2, σ_2^2, ..., σ_m^2).

Note that in this case, the elements w_i of w are mutually independent.

b) Calculate the ML estimate for non-diagonal Σ, which implies that the elements w_i of w are correlated.

Hint: The following matrix calculus is useful for this problem (see, for example, the Matrix Cookbook for reference):

Recall the definition of a Jacobian matrix from the lecture: Let g(x): R^n → R^m be a function that maps the vector x ∈ R^n to another vector g(x) ∈ R^m. The Jacobian matrix is defined as

    ∂g(x)/∂x := [ ∂g_i(x)/∂x_j ] ∈ R^{m×n},   i = 1, ..., m,   j = 1, ..., n,

where x_j is the j-th element of x and g_i(x) the i-th element of g(x).

Derivative of a quadratic form:

    ∂(x^T A x)/∂x = x^T (A + A^T) ∈ R^{1×n},   A ∈ R^{n×n}.

Chain rule: for u = g(x), x ∈ R^n, u ∈ R^n,

    ∂(u^T A u)/∂x = u^T (A + A^T) ∂u/∂x,   ∂u/∂x = ∂g(x)/∂x ∈ R^{n×n}.

Problem 6
Find the maximum a posteriori (MAP) estimate x̂_MAP := arg max_x f(z|x) f(x) when z = x + w, with x, z, w ∈ R, where x is exponentially distributed,

    f(x) = 0 for x < 0,   f(x) = e^{−x} for x ≥ 0,

and w ~ N(0, 1).

Problem 7
In the derivation of the recursive least squares algorithm, we used the following result:

    ∂ trace(A B A^T)/∂A = 2AB   if B = B^T,

where trace(·) is the sum of the diagonal elements of a matrix. Prove that the above holds for the two-dimensional case, i.e. A, B ∈ R^{2×2}.

Problem 8
The derivation of the recursive least squares algorithm also used the following result:

    ∂ trace(AB)/∂A = B^T.

Prove that the above holds for A, B ∈ R^{2×2}.

Problem 9
Consider the recursive least squares algorithm presented in lecture, where the estimation error at step k was shown to be

    e(k) = (I − K(k)H(k)) e(k−1) − K(k) w(k).

Furthermore, E[w(k)] = 0, E[e(k)] = 0, and {x, w(1), w(2), ...} are mutually independent. Show that the measurement error variance P(k) = Var[e(k)] = E[e(k) e^T(k)] is given by

    P(k) = (I − K(k)H(k)) P(k−1) (I − K(k)H(k))^T + K(k) R(k) K^T(k)

where R(k) = Var[w(k)].

Problem 10
Use the results from Problems 7 and 8 to derive the gain matrix K(k) for the recursive least squares algorithm that minimizes the mean squared error (MMSE) by solving

    0 = ∂/∂K(k) trace( (I − K(k)H(k)) P(k−1) (I − K(k)H(k))^T + K(k) R(k) K^T(k) ).

Problem 11
Apply the recursive least squares algorithm when

    z(k) = x + w(k),   k = 1, 2, ...                                   (1)

where

    E[x] = 0, Var[x] = 1, and E[w(k)] = 0, Var[w(k)] = 1.              (2)

The random variables {x, w(1), w(2), ...} are mutually independent and z(k), w(k), x ∈ R.

a) Express the gain matrix K(k) and variance P(k) as explicit functions of k.

b) Simulate the algorithm for normally distributed continuous random variables w(k) and x with the above properties (2).

c) Simulate the algorithm for x and w(k) being discrete random variables that take the values −1 and 1 with equal probability. Note that x and w(k) satisfy (2). Simulate for values of k up to 10000.

Problem 12
Consider the random variables and measurement equation from Problem 11c), i.e.

    z(k) = x + w(k),   k = 1, 2, ...                                   (3)

for mutually independent x and w(k) that take the values −1 and 1 with equal probability.

a) What is the MMSE estimate of x at time k given all past measurements z(1), z(2), ..., z(k)? Remember that the MMSE estimate is given by

    x̂_MMSE(k) := arg min_x̂ E[ (x̂ − x)^2 | z(1), z(2), ..., z(k) ].

b) Now consider the MMSE estimate that is restricted to the sample space of x, i.e.

    x̂_MMSE(k) := arg min_{x̂ ∈ X} E[ (x̂ − x)^2 | z(1), z(2), ..., z(k) ],

where X denotes the sample space of x. Calculate the estimate of x at time k given all past measurements z(1), z(2), ..., z(k).

Sample solutions

Problem 1
a) The change of variables formula for conditional PDFs when z = g(x, w) is

    f(z|x) = f_w(h(z, x)) / | ∂g/∂w (x, h(z, x)) |

where w = h(z, x) is the unique solution to z = g(x, w). We solve z = g(x, w) for w to find h(z, x),

    w = (z − x)/3 = h(z, x),

and apply the change of variables:

    f(z|x) = f_w(h(z, x)) / | ∂g/∂w (x, h(z, x)) | = f_w((z − x)/3) · (1/3) = (1/3) f_w((z − x)/3).

With the given uniform PDF of w, the PDF f(z|x) can then be stated explicitly:

    f(z|x) = 1/6 for x − 3 ≤ z ≤ x + 3, and 0 otherwise.

b) We begin by computing Pr(z ∈ [1, 4] | x = 1):

    Pr(z ∈ [1, 4] | x = 1) = ∫_1^4 f(z | x = 1) dz = ∫_1^4 (1/6) dz = 1/2.

The probability Pr(w ∈ [0, 1] | x = 1) is

    Pr(w ∈ [0, 1] | x = 1) = Pr(w ∈ [0, 1]) = ∫_0^1 f_w(w) dw = ∫_0^1 (1/2) dw = 1/2

(x and w are independent), which is indeed identical to Pr(z ∈ [1, 4] | x = 1).
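As a quick numerical check of this result (not part of the original solutions, which refer to MATLAB code on the class web page), the following Python sketch samples w uniformly on [−1, 1], forms z = x + 3w for x = 1, and compares the empirical probabilities with the analytical value 1/2; the sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

x = 1.0
n = 200_000
w = rng.uniform(-1.0, 1.0, size=n)   # w uniform on [-1, 1], density 1/2
z = x + 3.0 * w                      # measurement model z = x + 3w

# Empirical Pr(z in [1, 4] | x = 1); the analytical value is (4 - 1) * 1/6 = 1/2.
p_z = np.mean((z >= 1.0) & (z <= 4.0))
# Empirical Pr(w in [0, 1]); the analytical value is (1 - 0) * 1/2 = 1/2.
p_w = np.mean((w >= 0.0) & (w <= 1.0))

print(f"Pr(z in [1,4] | x = 1) ~ {p_z:.3f}   (analytical: 0.5)")
print(f"Pr(w in [0,1])         ~ {p_w:.3f}   (analytical: 0.5)")
```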

Problem 2
Analogous to the lecture notes, the multivariate change of variables formula for conditional PDFs is

    f(z|x) = f_w(h(z, x)) / |det(G)|

where w = h(z, x) is the unique solution to z = Hx + Gw. The function h(z, x) is given by

    w = G^{−1}(z − Hx) = [1 0; 0 1/3] ( z − [1 0 0 0; 0 0 1 0] x ) = ( z_1 − x_1, (1/3)(z_2 − x_3) )

where we used x = (x_1, x_2, x_3, x_4) and z = (z_1, z_2). Applying the change of variables, it follows that

    f(z|x) = f_w(h(z, x)) / |det(G)| = f_w(G^{−1}(z − Hx)) · (1/3) = (1/3) f_w( z_1 − x_1, (1/3)(z_2 − x_3) ).

Using the PDF f_w(w), this can then be expressed as

    f(z|x) = 1/12 for x_1 − 1 ≤ z_1 ≤ x_1 + 1 and x_3 − 3 ≤ z_2 ≤ x_3 + 3, and 0 otherwise.

Problem 3
For a normally distributed random variable with w_i ~ N(0, σ_i^2), we have

    f(w_i) ∝ exp( −w_i^2 / (2σ_i^2) ).

Using the change of variables, and by the conditional independence of z_1, z_2 given x, we can write

    f(z_1, z_2 | x) = f(z_1|x) f(z_2|x) ∝ exp( −( (z_1 − x)^2 / (2σ_1^2) + (z_2 − x)^2 / (2σ_2^2) ) ).

Differentiating with respect to x, and setting to 0, leads to

    ∂f(z_1, z_2 | x)/∂x |_{x = x̂_ML} = 0   =>   (z_1 − x̂_ML)/σ_1^2 + (z_2 − x̂_ML)/σ_2^2 = 0
                                          =>   x̂_ML = (z_1 σ_2^2 + z_2 σ_1^2) / (σ_1^2 + σ_2^2),

which is a weighted sum of the measurements. Note that

    for σ_1^2 = 0:  x̂_ML = z_1,    for σ_2^2 = 0:  x̂_ML = z_2.
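The closed-form weighted average of Problem 3 can also be verified numerically. The sketch below is an added illustration (not from the original solution set); the values of z_1, z_2, σ_1^2, and σ_2^2 are arbitrary, and the grid search simply confirms the formula.

```python
import numpy as np

z1, z2 = 2.0, 5.0
s1sq, s2sq = 0.5, 2.0   # noise variances sigma_1^2 and sigma_2^2 (arbitrary values)

# Closed-form ML estimate from Problem 3
x_ml = (z1 * s2sq + z2 * s1sq) / (s1sq + s2sq)

# Brute-force maximization of the log-likelihood over a fine grid
x_grid = np.linspace(-10.0, 10.0, 200_001)
log_lik = -(z1 - x_grid) ** 2 / (2 * s1sq) - (z2 - x_grid) ** 2 / (2 * s2sq)
x_brute = x_grid[np.argmax(log_lik)]

print(x_ml, x_brute)   # the two values agree up to the grid resolution
```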

Problem 4
Using the total probability theorem, we find that

    f(w_i) = 1/2 for w_i ∈ [−1, 1], and 0 otherwise,

and f(w_1, w_2) = f(w_1) f(w_2). Using the change of variables,

    f(z_i|x) = 1/2 if |z_i − x| ≤ 1, and 0 otherwise.

We need to maximize f(z_1, z_2 | x) = f(z_1|x) f(z_2|x) over x. Consider the different cases:

- |z_1 − z_2| > 2, for which the likelihoods do not overlap, see the following figure:

  [Figure: f(z_1|x) and f(z_2|x) plotted over x as two boxes of height 1/2 that do not overlap.]

  Since no value for x can explain both z_1 and z_2, given the model z_i = x + w_i, we find f(z_1, z_2 | x) = 0.

- |z_1 − z_2| ≤ 2, where the likelihoods overlap, which is shown in the following figure:

  [Figure: f(z_1|x) and f(z_2|x) overlap; their product f(z_1, z_2 | x) has height 1/4 on the overlapping interval.]

  We find

      f(z_1, z_2 | x) = 1/4 for x ∈ [z_1 − 1, z_1 + 1] ∩ [z_2 − 1, z_2 + 1], and 0 otherwise,

  where A ∩ B denotes the intersection of the sets A and B. Therefore the estimate is

      x̂_ML ∈ [z_1 − 1, z_1 + 1] ∩ [z_2 − 1, z_2 + 1].

  All values in this range are equally likely, and any value in the range is therefore a valid maximum likelihood estimate x̂_ML.

Problem 5
a) We apply the multivariate change of coordinates (here the noise enters directly, so |det(G)| = 1):

    f(z|x) = f_w(h(z, x)) = f_w(z − Hx)
           = (2π)^{−m/2} (det(Σ))^{−1/2} exp( −(1/2) (z − Hx)^T Σ^{−1} (z − Hx) ).

We proceed similarly as in class. For z, w ∈ R^m and x ∈ R^n, write

    H = [H_1; H_2; ...; H_m],   H_i = [h_{i1} h_{i2} ... h_{in}],   h_{ij} ∈ R.

Using

    f(z|x) ∝ exp( −(1/2) Σ_{i=1}^m (z_i − H_i x)^2 / σ_i^2 ),

differentiating with respect to x_j, and setting to zero yields

    Σ_{i=1}^m (z_i − H_i x̂_ML) h_{ij} / σ_i^2 = 0,   j = 1, 2, ..., n.

Collecting these n equations (the coefficients h_{1j}, ..., h_{mj} form the j-th column of H) gives

    H^T W (z − H x̂_ML) = 0,   W := diag(1/σ_1^2, 1/σ_2^2, ..., 1/σ_m^2),

from which we have

    x̂_ML = (H^T W H)^{−1} H^T W z.

Note that this can also be interpreted as a weighted least squares problem,

    x̂ = arg min_x w(x)^T W w(x),   w(x) = z − Hx,

where W is a weighting of the measurements. The result in both cases is the same!

b) We find the maximizing x̂_ML by setting the derivative of f(z|x) with respect to x to zero. Since f(z|x) ∝ exp( −(1/2) (z − Hx)^T Σ^{−1} (z − Hx) ), the given facts about matrix calculus yield

    0 = −(1/2) (z − Hx)^T ( Σ^{−1} + (Σ^{−1})^T ) ∂(z − Hx)/∂x = (z − Hx)^T Σ^{−1} H,

where we use the fact that (Σ^{−1})^T = (Σ^T)^{−1} = Σ^{−1} and ∂(z − Hx)/∂x = −H. Finally,

    (z − Hx)^T Σ^{−1} H = 0   =>   z^T Σ^{−1} H = x^T H^T Σ^{−1} H,

and after we transpose the left- and right-hand sides, we find

    H^T Σ^{−1} z = H^T Σ^{−1} H x.

It follows that

    x̂_ML = (H^T Σ^{−1} H)^{−1} H^T Σ^{−1} z.
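For part b), the estimator x̂_ML = (H^T Σ^{−1} H)^{−1} H^T Σ^{−1} z is easy to evaluate numerically. The following Python sketch is an added illustration (not from the original solutions); H, Σ, and x_true are chosen arbitrarily just to exercise the formula.

```python
import numpy as np

rng = np.random.default_rng(1)

m, n = 6, 2
H = rng.standard_normal((m, n))      # a random H; full column rank with probability 1
x_true = np.array([1.0, -2.0])

# A non-diagonal, symmetric positive definite noise covariance Sigma
A = rng.standard_normal((m, m))
Sigma = A @ A.T + 0.1 * np.eye(m)

w = rng.multivariate_normal(np.zeros(m), Sigma)   # correlated Gaussian noise
z = H @ x_true + w

Sigma_inv = np.linalg.inv(Sigma)
x_ml = np.linalg.solve(H.T @ Sigma_inv @ H, H.T @ Sigma_inv @ z)
print(x_ml)   # an unbiased estimate of x_true
```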

Problem 6
We calculate the conditional probability density function

    f(z|x) = (1/sqrt(2π)) exp( −(1/2) (z − x)^2 ).

Both the measurement likelihood f(z|x) and the prior f(x) are shown in the following figure.

[Figure: the prior f(x) and the measurement likelihood f(z|x) plotted as functions of x.]

We find

    f(z|x) f(x) = 0 for x < 0,
    f(z|x) f(x) = (1/sqrt(2π)) exp(−x) exp( −(1/2) (z − x)^2 ) for x ≥ 0.

We calculate the MAP estimate x̂_MAP that maximizes the above function for a given z. We note that f(z|x) f(x) > 0 for x ≥ 0, and the maximizing value therefore satisfies x̂_MAP ≥ 0. Since the function is discontinuous at x = 0, we have to consider two cases:

1. The maximum is at the boundary, x̂_MAP = 0. It follows that

       f(z | x̂_MAP) f(x̂_MAP) = (1/sqrt(2π)) exp( −(1/2) z^2 ).                          (4)

2. The maximum is where the derivative of the likelihood f(z|x) f(x) with respect to x is zero, i.e.

       0 = ∂/∂x ( f(z|x) f(x) ) |_{x = x̂_MAP},

   from which follows 0 = −1 + (z − x̂_MAP) at the maximum, i.e. x̂_MAP = z − 1. Substituting back into the likelihood, we obtain

       f(z | x̂_MAP) f(x̂_MAP) = (1/sqrt(2π)) exp(−z + 1) exp(−1/2) = (1/sqrt(2π)) exp(−z + 1/2).   (5)

We note that the value (5) is greater than or equal to the value (4) for z ≥ 1, while for z < 1 the stationary point z − 1 lies outside the region x ≥ 0. The MAP estimate is therefore

    x̂_MAP = 0 for z < 1,    x̂_MAP = z − 1 for z ≥ 1.
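The piecewise MAP solution can be checked by maximizing f(z|x) f(x) on a grid for a few values of z. The sketch below is an added check, not part of the original solutions; the grid range and the test values of z are arbitrary.

```python
import numpy as np

def map_estimate(z):
    """Closed-form MAP estimate derived in Problem 6."""
    return max(z - 1.0, 0.0)

x = np.linspace(0.0, 20.0, 200_001)   # the prior is supported on x >= 0
for z in (-1.0, 0.5, 1.0, 3.0):
    posterior = np.exp(-x) * np.exp(-0.5 * (z - x) ** 2)   # proportional to f(z|x) f(x)
    print(z, map_estimate(z), x[np.argmax(posterior)])     # the two estimates agree
```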

Problem 7
We begin by evaluating the individual elements of (AB) and (ABA^T), with (A)_{ij} denoting the (i, j)-th element of matrix A (element in the i-th row and j-th column):

    (AB)_{ij}    = a_{i1} b_{1j} + a_{i2} b_{2j}
    (ABA^T)_{ij} = (a_{i1} b_{11} + a_{i2} b_{21}) a_{j1} + (a_{i1} b_{12} + a_{i2} b_{22}) a_{j2}.

From this we obtain the trace as

    trace(ABA^T) = (ABA^T)_{11} + (ABA^T)_{22}
                 = (a_{11} b_{11} + a_{12} b_{21}) a_{11} + (a_{11} b_{12} + a_{12} b_{22}) a_{12}
                 + (a_{21} b_{11} + a_{22} b_{21}) a_{21} + (a_{21} b_{12} + a_{22} b_{22}) a_{22}

and the derivatives (using b_{21} = b_{12})

    ∂ trace(ABA^T)/∂a_{11} = 2 a_{11} b_{11} + a_{12} b_{21} + a_{12} b_{12} = 2 (AB)_{11}
    ∂ trace(ABA^T)/∂a_{12} = 2 a_{12} b_{22} + a_{11} b_{21} + a_{11} b_{12} = 2 (AB)_{12}
    ∂ trace(ABA^T)/∂a_{21} = 2 a_{21} b_{11} + a_{22} b_{21} + a_{22} b_{12} = 2 (AB)_{21}
    ∂ trace(ABA^T)/∂a_{22} = 2 a_{22} b_{22} + a_{21} b_{21} + a_{21} b_{12} = 2 (AB)_{22}.

We therefore proved that

    ∂ trace(ABA^T)/∂A = 2AB

holds for A, B ∈ R^{2×2} with B = B^T.

Problem 8
We proceed similarly to the previous problem:

    trace(AB) = (AB)_{11} + (AB)_{22} = a_{11} b_{11} + a_{12} b_{21} + a_{21} b_{12} + a_{22} b_{22},

that is,

    ∂ trace(AB)/∂a_{11} = b_{11} = (B)_{11} = (B^T)_{11}
    ∂ trace(AB)/∂a_{12} = b_{21} = (B)_{21} = (B^T)_{12}
    ∂ trace(AB)/∂a_{21} = b_{12} = (B)_{12} = (B^T)_{21}
    ∂ trace(AB)/∂a_{22} = b_{22} = (B)_{22} = (B^T)_{22},

and therefore

    ∂ trace(AB)/∂A = B^T   for A, B ∈ R^{2×2}.
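Both trace identities can also be confirmed with a finite-difference check. The following sketch is an added illustration (not part of the original solutions) using random 2×2 matrices.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 2))
B0 = rng.standard_normal((2, 2))      # general B, for the Problem 8 identity
B = 0.5 * (B0 + B0.T)                 # symmetric B, as required in Problem 7
eps = 1e-6

def num_grad(f, A):
    """Central finite-difference gradient of the scalar function f with respect to A."""
    G = np.zeros_like(A)
    for i in range(2):
        for j in range(2):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (f(A + E) - f(A - E)) / (2 * eps)
    return G

# Problem 7: d/dA trace(A B A^T) = 2 A B for symmetric B
print(num_grad(lambda M: np.trace(M @ B @ M.T), A))
print(2 * A @ B)

# Problem 8: d/dA trace(A B) = B^T
print(num_grad(lambda M: np.trace(M @ B0), A))
print(B0.T)
```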

Problem 9
By definition, P(k) = E[e(k) e^T(k)]. Substituting the estimation error

    e(k) = (I − K(k)H(k)) e(k−1) − K(k) w(k)

in the definition, we obtain

    P(k) = E[ ( (I − K(k)H(k)) e(k−1) − K(k) w(k) ) ( (I − K(k)H(k)) e(k−1) − K(k) w(k) )^T ]
         = E[ (I − K(k)H(k)) e(k−1) e^T(k−1) (I − K(k)H(k))^T + K(k) w(k) w^T(k) K^T(k)
              − (I − K(k)H(k)) e(k−1) w^T(k) K^T(k) − K(k) w(k) e^T(k−1) (I − K(k)H(k))^T ]
         = (I − K(k)H(k)) P(k−1) (I − K(k)H(k))^T + K(k) R(k) K^T(k)
           − (I − K(k)H(k)) E[ e(k−1) w^T(k) ] K^T(k) − K(k) E[ w(k) e^T(k−1) ] (I − K(k)H(k))^T.

Because e(k−1) and w(k) are independent, E[w(k) e^T(k−1)] = E[w(k)] E[e^T(k−1)]. Since E[w(k)] = 0, E[w(k)] E[e^T(k−1)] = 0, and it follows that

    P(k) = (I − K(k)H(k)) P(k−1) (I − K(k)H(k))^T + K(k) R(k) K^T(k).

Problem 10
First, we expand

    (I − K(k)H(k)) P(k−1) (I − K(k)H(k))^T + K(k) R(k) K^T(k)
    = P(k−1) − K(k)H(k)P(k−1) − P(k−1)H^T(k)K^T(k) + K(k) ( H(k)P(k−1)H^T(k) + R(k) ) K^T(k).

We use the results from Problems 7 and 8 and evaluate the summands individually, since trace(A + B) = trace(A) + trace(B). We find

    ∂/∂K(k) trace( P(k−1) ) = 0
    ∂/∂K(k) trace( K(k)H(k)P(k−1) ) = P(k−1)H^T(k),   note that P(k−1) = P^T(k−1)
    ∂/∂K(k) trace( P(k−1)H^T(k)K^T(k) ) = P(k−1)H^T(k),   note that trace(A) = trace(A^T)
    ∂/∂K(k) trace( K(k) ( H(k)P(k−1)H^T(k) + R(k) ) K^T(k) ) = 2 K(k) ( H(k)P(k−1)H^T(k) + R(k) ).

Note that R^T(k) = R(k) and ( H(k)P(k−1)H^T(k) )^T = H(k)P(k−1)H^T(k). Finally, setting the sum of the derivatives to zero yields

    K(k) ( H(k)P(k−1)H^T(k) + R(k) ) = P(k−1)H^T(k)

and

    K(k) = P(k−1)H^T(k) ( H(k)P(k−1)H^T(k) + R(k) )^{−1}.
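Written as code, the equations derived in Problems 9 and 10 form a single measurement-update step. This is a generic Python sketch added for illustration; the function name and the estimate update x̂(k) = x̂(k−1) + K(k)(z(k) − H(k)x̂(k−1)) follow the recursive least squares algorithm from the lecture, but the code itself is not part of the original solutions.

```python
import numpy as np

def rls_update(x_hat, P, z, H, R):
    """One measurement update of the recursive least squares estimator.

    The gain K is the MMSE-optimal gain derived in Problem 10, and P is
    propagated with the variance update derived in Problem 9.
    """
    S = H @ P @ H.T + R                      # H P H^T + R
    K = P @ H.T @ np.linalg.inv(S)           # K(k) = P H^T (H P H^T + R)^{-1}
    x_hat = x_hat + K @ (z - H @ x_hat)      # estimate update
    I_KH = np.eye(P.shape[0]) - K @ H
    P = I_KH @ P @ I_KH.T + K @ R @ K.T      # P(k) = (I - KH) P (I - KH)^T + K R K^T
    return x_hat, P
```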

Problem 11
a) We initialize P(0) = 1, x̂(0) = 0, H(k) = 1 and R(k) = 1 for k = 1, 2, ..., and get:

    K(k) = P(k−1) / ( P(k−1) + 1 )                                      (6)

    P(k) = (1 − K(k))^2 P(k−1) + K(k)^2
         = P(k−1) / (1 + P(k−1))^2 + P(k−1)^2 / (1 + P(k−1))^2
         = P(k−1) / (1 + P(k−1)).                                       (7)

From (7) and P(0) = 1, we have P(1) = 1/2, P(2) = 1/3, ..., and we can show by induction (see below) that

    P(k) = 1/(1 + k),   and K(k) = 1/(1 + k) from (6).

Proof by induction: Assume

    P(k) = 1/(1 + k).                                                   (8)

Start with P(0) = 1/(1 + 0) = 1, which satisfies (8). Now we show it for k + 1:

    P(k+1) = P(k) / (1 + P(k))                          from (7)
           = (1/(1 + k)) / (1 + 1/(1 + k))              by assumption (8)
           = 1 / (1 + k + 1)
           = 1 / (k + 2),                               q.e.d.

b) The Matlab code is available on the class web page. Recursive least squares works, and provides a reasonable estimate after several thousand iterations. The recursive least squares estimator is the optimal estimator for Gaussian noise.

c) The Matlab code is available on the class web page. Recursive least squares works, but we can do much better with the proposed nonlinear estimator derived in Problem 12.
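The original solutions refer to MATLAB code on the class web page. A minimal Python sketch of the simulations in parts b) and c) might look as follows; it applies the scalar gain K(k) = 1/(1 + k) from part a) directly, and the seed and sample draws are arbitrary.

```python
import numpy as np

def simulate(noise, k_max=10_000, seed=0):
    """Scalar recursive least squares for z(k) = x + w(k), as in Problem 11."""
    rng = np.random.default_rng(seed)
    if noise == "gaussian":                       # part b)
        x = rng.standard_normal()
        w = rng.standard_normal(k_max)
    else:                                         # part c): x, w(k) equal to -1 or 1
        x = rng.choice([-1.0, 1.0])
        w = rng.choice([-1.0, 1.0], size=k_max)
    x_hat = 0.0                                   # x_hat(0) = 0
    for k in range(1, k_max + 1):
        z = x + w[k - 1]
        K = 1.0 / (1.0 + k)                       # K(k) = 1/(1+k) from part a)
        x_hat = x_hat + K * (z - x_hat)           # P(k) = 1/(1+k) is not needed here
    return x, x_hat

print(simulate("gaussian"))   # true x and its estimate after 10000 steps
print(simulate("discrete"))
```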

Problem 12
a) The MMSE estimate is

    x̂_MMSE(k) = arg min_x̂ Σ_{x ∈ {−1, 1}} (x̂ − x)^2 f(x | z(1), z(2), ..., z(k)).        (9)

Let us first consider the MMSE estimate at time k = 1. The only possible combinations are:

    x     w(1)   z(1)
    −1    −1     −2
    −1     1      0
     1    −1      0
     1     1      2

We can now construct the conditional PDF of x when conditioned on z(1):

    z(1)   f(x = −1 | z(1))   f(x = 1 | z(1))
    −2     1                  0
     0     1/2                1/2
     2     0                  1

Therefore, if z(1) = −2 or z(1) = 2, we know x precisely: x = −1 and x = 1, respectively. If z(1) = 0, it is equally likely that x = −1 or x = 1. The MMSE estimate is

    for z(1) = −2:  x̂_MMSE(1) = −1, since f(x = 1 | z(1) = −2) = 0
    for z(1) = 2:   x̂_MMSE(1) = 1, since f(x = −1 | z(1) = 2) = 0
    for z(1) = 0:   x̂_MMSE(1) = arg min_x̂ ( f(x = −1 | z(1) = 0) (x̂ + 1)^2 + f(x = 1 | z(1) = 0) (x̂ − 1)^2 )
                              = arg min_x̂ ( (1/2)(x̂ + 1)^2 + (1/2)(x̂ − 1)^2 )
                              = arg min_x̂ ( x̂^2 + 1 ) = 0.

Note that the same result could have been obtained by evaluating

    x̂_MMSE(1) = E[x | z(1)] = Σ_{x ∈ {−1, 1}} x f(x | z(1) = 0) = (−1)(1/2) + (1)(1/2) = 0.

At time k = 2, we could construct a table for the conditional PDF of x when conditioned on z(1) and z(2) as above. However, note that the MMSE estimate is given for all k if one measures either −2 or 2 at time k = 1:

    for z(1) = −2:  x̂_MMSE(k) = −1,  k = 1, 2, ...
    for z(1) = 2:   x̂_MMSE(k) = 1,   k = 1, 2, ....

On the other hand, a measurement z(1) = 0 does not provide any information, i.e. x = −1 and x = 1 stay equally likely. As soon as a subsequent measurement z(k), k ≥ 2, has a value of −2 or 2, x is known precisely, which is also reflected by the MMSE estimate. As long as we keep measuring 0, we do not gain additional information and

    x̂_MMSE(k) = 0 if z(l) = 0 for all l = 1, 2, ..., k.

Matlab code for this nonlinear MMSE estimator is available on the class webpage.

b) The solution is identical to part a), except that the estimate is the minimizing value from the sample space of x:

    x̂_MMSE(k) = arg min_{x̂ ∈ X} Σ_{x ∈ {−1, 1}} (x̂ − x)^2 f(x | z(1), z(2), ..., z(k)).    (10)

The same argument holds that, if one measurement is either −2 or 2, we know x precisely. However, for z(1) = 0, the estimate is

    x̂_MMSE(1) = arg min_{x̂ ∈ X} ( f(x = −1 | z(1) = 0) (x̂ + 1)^2 + f(x = 1 | z(1) = 0) (x̂ − 1)^2 )
              = arg min_{x̂ ∈ X} ( (1/2)(x̂ − 1)^2 + (1/2)(x̂ + 1)^2 )
              = ±1.

Note that X = {−1, 1} and therefore we cannot differentiate the right-hand side of Equation (10) and set it to zero to find the minimizing value. This implies that the estimate is not necessarily equal to E[x | z], which is the case when measuring z = 0: in that case E[x | z = 0] is equal to 0, whereas the restricted MMSE estimator yields ±1 for x̂_MMSE. The remainder of the solution is identical to part a) above, in that the estimate remains x̂_MMSE = ±1 until a measurement is either −2 or 2.
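The Matlab code referenced above is not reproduced here, but the unrestricted estimator of part a) reduces to very little logic. The following Python sketch is an added illustration, not the class code:

```python
def mmse_estimate(measurements):
    """Unrestricted MMSE estimate of x from z(k) = x + w(k), x, w(k) in {-1, 1}.

    A measurement of 2 or -2 reveals x exactly; measurements of 0 carry no
    information, so the conditional mean stays at 0 until x is revealed.
    """
    for z in measurements:
        if z == 2:
            return 1.0
        if z == -2:
            return -1.0
    return 0.0   # E[x | all measurements equal 0] = 0

print(mmse_estimate([0, 0, 0]))    # 0.0
print(mmse_estimate([0, 0, 2]))    # 1.0
print(mmse_estimate([-2, 0]))      # -1.0
```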