Lecture 4: Least Squares (LS) Estimation


ME 233, UC Berkeley, Spring 2014 (Xu Chen)

Outline: background and general solution; solution in the Gaussian case; properties; example

Big picture: general least squares estimation

- given: jointly distributed $x$ ($n$-dimensional) and $y$ ($m$-dimensional)
- goal: find the optimal estimate $\hat{x}$ that minimizes
$$E[\|x-\hat{x}\|^2 \mid y=y_1] = E[(x-\hat{x})^T(x-\hat{x}) \mid y=y_1]$$
- solution: consider
$$J(z) = E[\|x-z\|^2 \mid y=y_1] = E[x^T x \mid y=y_1] - 2z^T E[x \mid y=y_1] + z^T z$$
which is quadratic in $z$. Setting $\nabla_z J(z)=0$ for the optimal cost gives $z = E[x \mid y=y_1] \triangleq \hat{x}$, hence
$$\hat{x} = E[x \mid y=y_1] = \int x\, p_{x|y}(x \mid y_1)\, dx, \qquad J_{\min} = J(\hat{x}) = \mathrm{Tr}\{ E[(x-\hat{x})(x-\hat{x})^T \mid y=y_1] \}$$
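The derivation above is distribution-free. As a sanity check, here is a minimal numerical sketch (Python; the small discrete joint pmf `p_xy` and the conditioning value are my own assumptions, not from the lecture) showing that the conditional mean minimizes the conditional mean-squared error $J(z)$ over all constants $z$:

```python
import numpy as np

# assumed joint pmf over x in {0, 1, 2} (rows) and y in {0, 1} (columns)
p_xy = np.array([[0.10, 0.20],
                 [0.25, 0.15],
                 [0.05, 0.25]])
x_vals = np.array([0.0, 1.0, 2.0])
y1 = 1                                           # condition on the event y = 1

p_x_given_y1 = p_xy[:, y1] / p_xy[:, y1].sum()   # p(x | y = y1)
x_hat = x_vals @ p_x_given_y1                    # conditional mean E[x | y = y1]

def J(z):
    """Conditional mean-squared error E[(x - z)^2 | y = y1]."""
    return (x_vals - z) ** 2 @ p_x_given_y1

z_grid = np.linspace(0.0, 2.0, 2001)
z_best = z_grid[np.argmin([J(z) for z in z_grid])]

print(f"conditional mean   : {x_hat:.4f}")
print(f"grid minimizer of J: {z_best:.4f}")      # agrees with the conditional mean (up to grid resolution)
```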

Big picture: general least squares estimation (continued)

$$\hat{x} = E[x \mid y=y_1] = \int x\, p_{x|y}(x \mid y_1)\, dx \quad \text{achieves the minimization of} \quad E[\|x-\hat{x}\|^2 \mid y=y_1]$$

Solution concepts:
- the solution holds for any probability distribution of $y$
- for each $y_1$, $E[x \mid y=y_1]$ is different
- if no specific value of $y$ is given, $\hat{x}$ is a function of the random vector/variable $y$, written as $\hat{x} = E[x \mid y]$

Least squares estimation in the Gaussian case: why Gaussian?

- Gaussian is common in practice: macroscopic random phenomena arise as the superposition of microscopic random effects (central limit theorem)
- the Gaussian distribution has nice properties that make many practical problems mathematically tractable:
  - the pdf is determined solely by the mean and the variance/covariance
  - linear functions of a Gaussian random process are still Gaussian
  - the output of an LTI system is a Gaussian random process if the input is Gaussian
  - if two jointly Gaussian random variables are uncorrelated, then they are independent
  - $X_1$ and $X_2$ jointly Gaussian $\Rightarrow$ $X_1 \mid X_2$ and $X_2 \mid X_1$ are also Gaussian

Least squares estimation in the Gaussian case: why Gaussian? (continued)

- Gaussian and white are different concepts: there can be Gaussian white noise, Poisson white noise, etc.
- Gaussian white noise is used a lot since it is a good approximation of many practical noise sources

Least squares estimation in the Gaussian case: the solution

Problem (re-stated): $x$ and $y$ are jointly Gaussian; minimize $E[\|x-\hat{x}\|^2 \mid y]$.

Solution:
$$\hat{x} = E[x \mid y] = E[x] + X_{xy} X_{yy}^{-1} (y - E[y])$$

Properties:
- the estimate is unbiased: $E[\hat{x}] = E[x]$
- $y$ is Gaussian $\Rightarrow$ $\hat{x}$ is Gaussian; and $x - \hat{x}$ is also Gaussian
- covariance of $\hat{x}$:
$$E[(\hat{x}-E\hat{x})(\hat{x}-E\hat{x})^T] = E[ X_{xy}X_{yy}^{-1}(y-Ey)(y-Ey)^T X_{yy}^{-1} X_{yx} ] = X_{xy} X_{yy}^{-1} X_{yx}$$
- estimation error $\tilde{x} \triangleq x - \hat{x}$: zero mean, with
$$\mathrm{Cov}[\tilde{x}] = \underbrace{E[(x - E[x \mid y])(x - E[x \mid y])^T]}_{\text{conditional covariance}} = X_{xx} - X_{xy} X_{yy}^{-1} X_{yx}$$
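For concreteness, here is a minimal sketch (Python/NumPy; the joint Gaussian model, its mean `m`, and covariance `Sigma` are arbitrary assumptions, not the lecture's) that implements $\hat{x} = E[x] + X_{xy}X_{yy}^{-1}(y - E[y])$ and checks by Monte Carlo that the error $x - \hat{x}$ is zero mean with covariance $X_{xx} - X_{xy}X_{yy}^{-1}X_{yx}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# assumed joint mean and covariance of the stacked vector [x; y], x and y both 2-D
m = np.array([1.0, -2.0, 0.5, 3.0])              # [E[x]; E[y]]
L = rng.standard_normal((4, 4))
Sigma = L @ L.T + 0.5 * np.eye(4)                # a valid joint covariance
X_xx, X_xy = Sigma[:2, :2], Sigma[:2, 2:]
X_yx, X_yy = Sigma[2:, :2], Sigma[2:, 2:]

# draw joint samples and form x_hat = E[x] + X_xy X_yy^{-1} (y - E[y]) for each one
samples = rng.multivariate_normal(m, Sigma, size=100_000)
x_s, y_s = samples[:, :2], samples[:, 2:]
x_hat = m[:2] + (y_s - m[2:]) @ np.linalg.solve(X_yy, X_xy.T)
err = x_s - x_hat

print("empirical error mean (should be ~0):", err.mean(axis=0))
print("empirical Cov[x - x_hat]:\n", np.cov(err.T))
print("X_xx - X_xy X_yy^{-1} X_yx:\n", X_xx - X_xy @ np.linalg.solve(X_yy, X_yx))
```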

Least squares estimation in the Gaussian case
$$\hat{x} = E[x \mid y] = E[x] + X_{xy} X_{yy}^{-1}(y - E[y])$$
$E[x \mid y]$ is a better estimate than $E[x]$:
- the estimation is unbiased: $E[\hat{x}] = E[x]$
- estimation error $\tilde{x} = x - \hat{x}$: zero mean, with
$$\mathrm{Cov}[x - \hat{x}] = X_{xx} - X_{xy} X_{yy}^{-1} X_{yx} \;\le\; \mathrm{Cov}[x - E[x]] = X_{xx}$$

Properties of the least squares estimate (Gaussian case): two random vectors $x$ and $y$

Property 1:
(i) the estimation error $\tilde{x} = x - \hat{x}$ is uncorrelated with $y$
(ii) $\tilde{x}$ and $\hat{x}$ are orthogonal: $E[(x-\hat{x})^T \hat{x}] = 0$

Proof of (i):
$$E[\tilde{x}(y - m_y)^T] = E[(x - E[x] - X_{xy}X_{yy}^{-1}(y-m_y))(y-m_y)^T] = X_{xy} - X_{xy}X_{yy}^{-1}X_{yy} = 0$$

Properties of the least squares estimate (Gaussian case): two random vectors $x$ and $y$

Proof of (ii):
$$E[\tilde{x}^T \hat{x}] = E[(x-\hat{x})^T(E[x] + X_{xy}X_{yy}^{-1}(y-m_y))] = E[\tilde{x}^T]E[x] + E[(x-\hat{x})^T X_{xy}X_{yy}^{-1}(y-m_y)]$$
where $E[\tilde{x}^T] = 0$ and
$$E[(x-\hat{x})^T X_{xy}X_{yy}^{-1}(y-m_y)] = \mathrm{Tr}\{ X_{xy}X_{yy}^{-1} E[(y-m_y)(x-\hat{x})^T] \} = \mathrm{Tr}\{ E[ X_{xy}X_{yy}^{-1}(y-m_y)(x-\hat{x})^T ] \} = 0 \quad \text{because of (i)}$$
Note: $\mathrm{Tr}\{BA\} = \mathrm{Tr}\{AB\}$. Consider, e.g., $A = [a \ \ b]$ and $B = [c \ \ d]^T$; then $\mathrm{Tr}\{AB\} = ac + bd = \mathrm{Tr}\{BA\}$.

Property 1 (re-stated):
(i) the estimation error $\tilde{x} = x - \hat{x}$ is uncorrelated with $y$
(ii) $\tilde{x}$ and $\hat{x}$ are orthogonal: $E[(x-\hat{x})^T\hat{x}] = 0$

Intuition: least squares estimation is a projection. (Figure: $x$ is decomposed into its projection $\hat{x}$ onto the space spanned by $y$ and the orthogonal error $\tilde{x} = x - \hat{x}$.)
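Property 1 is easy to verify empirically. The sketch below (Python; the scalar model $y = x + v$ with these particular means and variances is an assumed toy example, not from the lecture) checks by Monte Carlo that the error $\tilde{x} = x - \hat{x}$ is uncorrelated with $y$ and orthogonal to $\hat{x}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

# assumed scalar model: y = x + v with x ~ N(2, 4), v ~ N(0, 1), x and v independent
x = 2.0 + 2.0 * rng.standard_normal(n)
v = rng.standard_normal(n)
y = x + v

X_xy, X_yy = 4.0, 5.0                        # Cov(x, y) = Var(x), Var(y) = Var(x) + Var(v)
x_hat = 2.0 + (X_xy / X_yy) * (y - 2.0)      # E[x | y]
x_tilde = x - x_hat

print("E[x_tilde (y - Ey)] ~", np.mean(x_tilde * (y - 2.0)))   # ~0: uncorrelated with y
print("E[x_tilde x_hat]    ~", np.mean(x_tilde * x_hat))       # ~0: orthogonal to x_hat
```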

Properties of the least squares estimate (Gaussian case): three random vectors $x$, $y$, and $z$, where $y$ and $z$ are uncorrelated

Property 2: let $y$ and $z$ be Gaussian and uncorrelated. Then
(i) the optimal estimate of $x$ is
$$E[x \mid y, z] = E[x] + \underbrace{(E[x \mid y] - E[x])}_{\text{first improvement}} + \underbrace{(E[x \mid z] - E[x])}_{\text{second improvement}} = E[x \mid y] + (E[x \mid z] - E[x])$$
Alternatively, let $\hat{x}_y \triangleq E[x \mid y]$ and $\tilde{x}_y \triangleq x - E[x \mid y] = x - \hat{x}_y$; then
$$E[x \mid y, z] = E[x \mid y] + E[\tilde{x}_y \mid z]$$
(ii) the estimation error covariance is
$$X_{xx} - X_{xy}X_{yy}^{-1}X_{yx} - X_{xz}X_{zz}^{-1}X_{zx} = X_{\tilde{x}\tilde{x}} - X_{xz}X_{zz}^{-1}X_{zx} = X_{\tilde{x}\tilde{x}} - X_{\tilde{x}z}X_{zz}^{-1}X_{z\tilde{x}}$$
where $X_{\tilde{x}\tilde{x}} = E[\tilde{x}_y \tilde{x}_y^T]$ and $X_{xz} = X_{\tilde{x}z} = E[\tilde{x}_y (z-m_z)^T]$

Proof of (i): let $w = [y^T, \ z^T]^T$. Then
$$E[x \mid w] = E[x] + \begin{bmatrix} X_{xy} & X_{xz} \end{bmatrix} \begin{bmatrix} X_{yy} & X_{yz} \\ X_{zy} & X_{zz} \end{bmatrix}^{-1} \begin{bmatrix} y - E[y] \\ z - E[z] \end{bmatrix}$$
Using $X_{yz} = 0$ yields
$$E[x \mid w] = E[x] + \underbrace{X_{xy}X_{yy}^{-1}(y - E[y])}_{E[x \mid y] - E[x]} + \underbrace{X_{xz}X_{zz}^{-1}(z - E[z])}_{E[x \mid z] - E[x]} = E[x \mid y] + E[(\hat{x}_y + \tilde{x}_y) \mid z] - E[x] = E[x \mid y] + E[\tilde{x}_y \mid z]$$
where $E[\hat{x}_y \mid z] = E[E[x \mid y] \mid z] = E[x]$ since $y$ and $z$ are independent.

Properties of the least squares estimate (Gaussian case): three random vectors $x$, $y$, and $z$, where $y$ and $z$ are uncorrelated

Proof of (ii): let $w = [y^T, \ z^T]^T$; the estimation error covariance is
$$X_{xx} - X_{xw}X_{ww}^{-1}X_{wx} = X_{xx} - X_{xy}X_{yy}^{-1}X_{yx} - X_{xz}X_{zz}^{-1}X_{zx}$$
Additionally,
$$X_{xz} = E[(x - E[x])(z - E[z])^T] = E[(\hat{x}_y + \tilde{x}_y - E[x])(z - E[z])^T] = E[(\hat{x}_y - E[x])(z - E[z])^T] + E[\tilde{x}_y (z - E[z])^T]$$
But $\hat{x}_y - E[x]$ is a linear function of $y$, which is uncorrelated with $z$, hence $E[(\hat{x}_y - E[x])(z - E[z])^T] = 0$ and $X_{xz} = X_{\tilde{x}_y z}$.

Property 2 (re-stated): let $y$ and $z$ be Gaussian and uncorrelated.
(i) the optimal estimate of $x$ is
$$E[x \mid y, z] = E[x \mid y] + E[\tilde{x}_y \mid z]$$
(ii) the estimation error covariance is
$$X_{\tilde{x}\tilde{x}} - X_{\tilde{x}z}X_{zz}^{-1}X_{z\tilde{x}}$$
Intuition: (figure: a projection diagram with $x$, $\hat{x}_y$, $\tilde{x}_y$, $E[\tilde{x}_y \mid z]$, $z$, and $y$)
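A minimal sketch of Property 2 follows (Python; the toy model in which $y$ and $z$ are uncorrelated by construction, and all of its coefficients, are my assumptions). It compares the batch estimate $E[x \mid y, z]$ with the sequential update $E[x \mid y] + E[\tilde{x}_y \mid z]$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10                                       # a handful of observed (y, z) pairs

# assumed model: y, z, e mutually independent and x = 3 + y + 2 z + e, so Cov(y, z) = 0
y = rng.standard_normal(n)                   # Var(y) = 1
z = np.sqrt(2.0) * rng.standard_normal(n)    # Var(z) = 2
# (the noise e only affects the estimation error, not the estimator itself)

X_xy, X_yy = 1.0, 1.0                        # Cov(x, y), Var(y)
X_xz, X_zz = 4.0, 2.0                        # Cov(x, z) = 2 Var(z), Var(z)

# batch estimate with w = [y; z]: X_ww is block diagonal because Cov(y, z) = 0
x_hat_batch = 3.0 + (X_xy / X_yy) * y + (X_xz / X_zz) * z

# sequential estimate: first E[x | y], then add E[x_tilde_y | z]
x_hat_y = 3.0 + (X_xy / X_yy) * y
# x_tilde_y = x - x_hat_y = 2 z + e, so Cov(x_tilde_y, z) = 2 Var(z) = 4
x_hat_seq = x_hat_y + (4.0 / X_zz) * z

print(np.allclose(x_hat_batch, x_hat_seq))   # True: the two estimates coincide
```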

Properties of the least squares estimate (Gaussian case): three random vectors $x$, $y$, and $z$, where $y$ and $z$ are correlated

Property 3: let $y$ and $z$ be Gaussian and correlated. Then
(i) the optimal estimate of $x$ is
$$E[x \mid y, z] = E[x \mid y] + E[\tilde{x}_y \mid \tilde{z}_y]$$
where $\tilde{z}_y = z - \hat{z}_y = z - E[z \mid y]$ and $\tilde{x}_y = x - \hat{x}_y = x - E[x \mid y]$
(ii) the estimation error covariance is
$$X_{\tilde{x}_y\tilde{x}_y} - X_{\tilde{x}_y\tilde{z}_y} X_{\tilde{z}_y\tilde{z}_y}^{-1} X_{\tilde{z}_y\tilde{x}_y}$$
Intuition: (figure: a projection diagram with $x$, $\hat{x}_y$, $\tilde{x}_y$, $E[\tilde{x}_y \mid \tilde{z}_y]$, $z$, $\tilde{z}_y$, and $y$)

Application of the three properties

(Block diagram: a noise-driven system whose output, corrupted by measurement noise, gives $y(k)$.) Given $[y(0), y(1), \ldots, y(k)]^T$, we want to estimate the state $x(k)$. The properties give a recursive way to compute $\hat{x}(k) = E[x(k) \mid y(0), y(1), \ldots, y(k)]$, as sketched below.
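Here is a minimal version of that recursion (Python; the setup, a constant scalar state $x \sim N(m_0, P_0)$ observed through white Gaussian noise $y(k) = x + v(k)$ with $v(k) \sim N(0, R)$, is an assumed illustration rather than the lecture's system). Property 3 is applied one measurement at a time, conditioning on the innovation $y(k) - E[y(k) \mid y(0), \ldots, y(k-1)]$, and the result matches the batch Gaussian LS estimate:

```python
import numpy as np

rng = np.random.default_rng(3)
m0, P0, R, N = 10.0, 2.0, 1.0, 20            # prior mean/variance, noise variance, # of samples

x_true = m0 + np.sqrt(P0) * rng.standard_normal()
y = x_true + np.sqrt(R) * rng.standard_normal(N)

# recursive LS estimate: Property 3 applied measurement by measurement
x_hat, P = m0, P0
for k in range(N):
    innov = y[k] - x_hat                     # y_tilde(k) = y(k) - E[y(k) | y(0..k-1)]
    K = P / (P + R)                          # Cov(x_tilde, y_tilde) / Var(y_tilde)
    x_hat = x_hat + K * innov                # update the estimate
    P = P - K * P                            # update the error covariance

# batch Gaussian LS estimate using all N measurements at once
P_batch = 1.0 / (1.0 / P0 + N / R)
x_batch = P_batch * (m0 / P0 + y.sum() / R)

print(f"recursive: {x_hat:.6f}  (P = {P:.6f})")
print(f"batch    : {x_batch:.6f}  (P = {P_batch:.6f})")   # identical values
```

With a dynamic (time-varying) state, the same innovation-based update is the structure generalized by the Kalman filter measurement update.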

Example

Consider estimating the velocity $x$ of a motor, with
$$E[x] = m_x = 10\ \mathrm{rad/s}, \qquad \mathrm{Var}[x] = 2\ \mathrm{rad^2/s^2}$$
There are two (tachometer) sensors available:
- $y_1 = x + v_1$: $E[v_1] = 0$, $E[v_1^2] = 1\ \mathrm{rad^2/s^2}$
- $y_2 = x + v_2$: $E[v_2] = 0$, $E[v_2^2] = 1\ \mathrm{rad^2/s^2}$
where $v_1$ and $v_2$ are independent and Gaussian ($E[v_1 v_2] = 0$), and $x$ is independent of $v_i$ ($E[(x - E[x])v_i] = 0$).

Best estimate of $x$ using only $y_1$:
$$X_{xy_1} = E[(x - m_x)(y_1 - m_{y_1})] = E[(x - m_x)(x - m_x + v_1)] = X_{xx} + E[(x - m_x)v_1] = 2$$
$$X_{y_1y_1} = E[(y_1 - m_{y_1})(y_1 - m_{y_1})] = E[(x - m_x + v_1)(x - m_x + v_1)] = X_{xx} + E[v_1^2] = 3$$
$$\hat{x}_{y_1} = E[x] + X_{xy_1}X_{y_1y_1}^{-1}(y_1 - E[y_1]) = 10 + \tfrac{2}{3}(y_1 - 10)$$
Similarly, the best estimate of $x$ using only $y_2$ is $\hat{x}_{y_2} = 10 + \tfrac{2}{3}(y_2 - 10)$.

Example: best estimate of $x$ using $y_1$ and $y_2$ (direct approach)

Let $y = [y_1, \ y_2]^T$. Then
$$X_{xy} = E\left[(x - m_x)\begin{bmatrix} y_1 - m_{y_1} & y_2 - m_{y_2} \end{bmatrix}\right] = \begin{bmatrix} 2 & 2 \end{bmatrix}, \qquad X_{yy} = E\left[\begin{bmatrix} y_1 - m_{y_1} \\ y_2 - m_{y_2} \end{bmatrix}\begin{bmatrix} y_1 - m_{y_1} & y_2 - m_{y_2} \end{bmatrix}\right] = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}$$
$$\hat{x}_y = E[x] + X_{xy}X_{yy}^{-1}(y - m_y) = 10 + \begin{bmatrix} 2 & 2 \end{bmatrix}\begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}^{-1}\begin{bmatrix} y_1 - 10 \\ y_2 - 10 \end{bmatrix}$$
Note: $X_{yy}^{-1}$ is expensive to compute at high dimensions.

Example: best estimate of $x$ using $y_1$ and $y_2$ (alternative approach using Property 3)
$$E[x \mid y_1, y_2] = E[x \mid y_1] + E[\tilde{x}_{y_1} \mid \tilde{y}_{2|y_1}]$$
which involves only scalar computations:
$$E[x \mid y_1] = 10 + \tfrac{2}{3}(y_1 - 10), \qquad \tilde{x}_{y_1} = x - E[x \mid y_1] = \tfrac{1}{3}(x - 10) - \tfrac{2}{3}v_1$$
$$\tilde{y}_{2|y_1} = y_2 - E[y_2 \mid y_1] = y_2 - \left(E[y_2] + X_{y_2y_1}X_{y_1y_1}^{-1}(y_1 - m_{y_1})\right) = (y_2 - 10) - \tfrac{2}{3}(y_1 - 10)$$
$$X_{\tilde{x}_{y_1}\tilde{y}_{2|y_1}} = E\left[\left(\tfrac{1}{3}(x - 10) - \tfrac{2}{3}v_1\right)\left((y_2 - 10) - \tfrac{2}{3}(y_1 - 10)\right)^T\right] = \tfrac{1}{9}\mathrm{Var}[x] + \tfrac{4}{9}\mathrm{Var}[v_1] = \tfrac{2}{3}$$
$$X_{\tilde{y}_{2|y_1}\tilde{y}_{2|y_1}} = \tfrac{1}{9}\mathrm{Var}[x] + \mathrm{Var}[v_2] + \tfrac{4}{9}\mathrm{Var}[v_1] = \tfrac{5}{3}$$
$$E[\tilde{x}_{y_1} \mid \tilde{y}_{2|y_1}] = E[\tilde{x}_{y_1}] + X_{\tilde{x}_{y_1}\tilde{y}_{2|y_1}} X_{\tilde{y}_{2|y_1}\tilde{y}_{2|y_1}}^{-1}\left(\tilde{y}_{2|y_1} - E[\tilde{y}_{2|y_1}]\right)$$
$$\Rightarrow \quad E[x \mid y_1, y_2] = 10 + \tfrac{2}{5}(y_1 - 10) + \tfrac{2}{5}(y_2 - 10)$$
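The two approaches can be cross-checked numerically. The sketch below (Python; the test measurement values `y1` and `y2` are arbitrary choices of mine) evaluates the direct two-measurement formula, the Property 3 sequential update, and the closed-form result $10 + \tfrac{2}{5}(y_1-10) + \tfrac{2}{5}(y_2-10)$, which all agree:

```python
import numpy as np

m_x, X_xx = 10.0, 2.0                        # prior mean and variance of the velocity x
R1 = R2 = 1.0                                # tachometer noise variances
y1, y2 = 11.0, 9.5                           # arbitrary test measurements

# direct approach with y = [y1, y2]^T
X_xy = np.array([X_xx, X_xx])                              # [2, 2]
X_yy = np.array([[X_xx + R1, X_xx],
                 [X_xx, X_xx + R2]])                       # [[3, 2], [2, 3]]
x_hat_direct = m_x + X_xy @ np.linalg.solve(X_yy, np.array([y1 - m_x, y2 - m_x]))

# sequential approach (Property 3), scalar operations only
x_hat_y1 = m_x + (2.0 / 3.0) * (y1 - m_x)                  # E[x | y1]
y2_tilde = (y2 - m_x) - (2.0 / 3.0) * (y1 - m_x)           # innovation y2 - E[y2 | y1]
x_hat_seq = x_hat_y1 + ((2.0 / 3.0) / (5.0 / 3.0)) * y2_tilde

x_hat_closed = m_x + 0.4 * (y1 - m_x) + 0.4 * (y2 - m_x)   # 10 + (2/5)(y1-10) + (2/5)(y2-10)
print(x_hat_direct, x_hat_seq, x_hat_closed)               # all three agree (10.2 here)
```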

Summary

1. Big picture: $\hat{x} = E[x \mid y]$ minimizes $J = E[\|x - \hat{x}\|^2 \mid y]$
2. Solution in the Gaussian case (and why Gaussian): $\hat{x} = E[x \mid y] = E[x] + X_{xy}X_{yy}^{-1}(y - E[y])$
3. Properties of the least squares estimate (Gaussian case):
   - two random vectors $x$ and $y$
   - three random vectors $x$, $y$, and $z$: $y$ and $z$ uncorrelated
   - three random vectors $x$, $y$, and $z$: $y$ and $z$ correlated

Appendix: trace of a matrix

The trace of an $n \times n$ matrix $A$ is $\mathrm{Tr}(A) = \sum_{i=1}^n a_{ii}$. The trace defines the matrix inner product:
$$\langle A, B\rangle = \mathrm{Tr}(A^T B) = \mathrm{Tr}(B^T A) = \langle B, A\rangle \tag{1}$$
Take a three-column example: write the matrices in column form, $B = [b_1, \ b_2, \ b_3]$ and $A = [a_1, \ a_2, \ a_3]$. Then
$$A^T B = \begin{bmatrix} a_1^T b_1 & a_1^T b_2 & a_1^T b_3 \\ a_2^T b_1 & a_2^T b_2 & a_2^T b_3 \\ a_3^T b_1 & a_3^T b_2 & a_3^T b_3 \end{bmatrix} \tag{2}$$
$$\mathrm{Tr}(A^T B) = a_1^T b_1 + a_2^T b_2 + a_3^T b_3 = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}^T \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} \tag{3}$$
which is the inner product of the two long stacked vectors. We frequently use the inner-product equality $\langle A, B\rangle = \langle B, A\rangle$.
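A quick numerical check of the appendix identities (Python; illustrative only, with random test matrices): $\mathrm{Tr}(A^TB)$ equals the inner product of the column-stacked vectors, and $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

lhs = np.trace(A.T @ B)
rhs = A.flatten(order="F") @ B.flatten(order="F")    # stack the columns, then take the dot product
print(np.isclose(lhs, rhs))                          # True: <A, B> = Tr(A^T B)
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))  # True: Tr(AB) = Tr(BA)
```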