Least Squares Estimation, Filtering, and Prediction: ECE 5/639 Statistical Signal Processing II: Linear Estimation


Least Squares Estimation, Filtering, and Prediction: Statistical Signal Processing II: Linear Estimation. Eric Wan, Ph.D. Fall 2015

Motivations. If the second-order statistics are known, the optimum estimator is given by the normal equations, i.e., the solution to the Wiener-Hopf equations. For most applications the actual statistics are unknown, so an alternative approach is to estimate the coefficients from observed data. Two possible approaches: estimate the required moments from the available data and build an approximate MMSE estimator, or build an estimator that minimizes some error functional calculated from the available data.

MMSE versus Least Squares. Recall that MMSE estimators are optimal in expectation across the ensemble of all stochastic processes with the same second-order statistics. Least squares estimators minimize the error on a given block of data; in signal processing applications, the block of data is a finite-length period of time. (Note that the book defines E as a sum instead of an average.) There are no guarantees about optimality on other data sets or other stochastic processes. When can we infer something about the ensemble performance based on a single observation sequence of an experiment?

MMSE versus Least Squares. If the process is ergodic and stationary, the LSE estimator approaches the MMSE estimator as the size of the data set grows. We will only discuss the sum of squares as the performance criterion (recall our earlier discussion of alternatives). Rationale: it is mathematically tractable, since picking the sum of squares permits a closed-form optimal solution, and the solution only depends on second-order moments, which are easily estimated.

Block Processing

Least Squares. Least squares is a method for finding the best fit to a linear system of N equations in M unknowns:
a_11 x_1 + a_12 x_2 = y_1
a_21 x_1 + a_22 x_2 = y_2
For N = M: [a_11 a_12; a_21 a_22][x_1; x_2] = [y_1; y_2], i.e., Ax = y with solution x = A^{-1} y.

Least Squares. For N > M the system is overdetermined, e.g.,
[a_11 a_12; a_21 a_22; a_31 a_32][x_1; x_2] = [y_1; y_2; y_3],
so A is no longer invertible and x = A^{-1} y does not apply. Define the error e = y - Ax; A is called the data matrix, and we solve
min_x (e^T e) = min_x Σ_{n=1}^{N} e(n)^2
(note the index n starts at 1 here versus 0 in the book).
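As a quick numerical sketch (the numbers below are made up, not from the slides), an overdetermined system can be solved with NumPy's least-squares routine, which minimizes the sum of squared residuals e^T e:

```python
import numpy as np

# Overdetermined system: N = 3 equations, M = 2 unknowns (illustrative values).
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
y = np.array([1.0, 2.0, 2.5])

x_ls, _, _, _ = np.linalg.lstsq(A, y, rcond=None)  # minimizes ||y - Ax||^2
e = y - A @ x_ls                                   # residual vector
print(x_ls, e @ e)                                 # A.T @ e is ~0 at the optimum
```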

Linear Least Squares Estimation and Filtering. Back to the book's notation.

Linear Least Squares Estimation and Filtering: Definitions

Matrix Formulation - Multiple Sensors (previous notation: e = y - Ax)

Matrix Formulation - Multiple Sensors

Matrix Formulation - Filtering. The principle is the same, but we need to consider edge effects. We will come back to the computational and windowing aspects of filtering later.

Matrix Formulation: Back to multiple signals

Squared Error

Squared Error Components (note that the book does not normalize by 1/N)

Squared Error Components

Relating LSE to MMSE Estimation. Plugging in our definitions for R and d gives an expression that should look familiar from before; many of the concepts, solutions, etc. will be similar.

Least Squares Estimate. Three ways to solve for the least squares estimate: 1. Take the gradient and set it to zero. 2. Complete the square. 3. Orthogonality.

Least Squares Estimate. 1. Take the gradient and set it to zero:
e^T e = (y - Xc)^T (y - Xc) = y^T y - 2 c^T X^T y + c^T X^T X c
∂(e^T e)/∂c = -2 X^T y + 2 X^T X c = 0
This yields the normal equations X^T X c = X^T y, i.e., R̂ c = d̂. For N > M the problem is almost always overdetermined and hence the columns of the data matrix X are independent. This implies X^T X is full rank, so
c_LS = (X^T X)^{-1} X^T y = R̂^{-1} d̂.
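A small numerical check of the normal equations (the X and y below are synthetic, assumed only for illustration); in practice a QR- or SVD-based solver is preferred for conditioning, as discussed near the end of these notes:

```python
# Hedged sketch: solving the normal equations directly on made-up data.
import numpy as np

rng = np.random.default_rng(0)
N, M = 100, 3
X = rng.standard_normal((N, M))            # data matrix, N > M with independent columns
c_true = np.array([0.5, -1.0, 2.0])
y = X @ c_true + 0.1 * rng.standard_normal(N)

R_hat = X.T @ X                            # R-hat in the slides' notation
d_hat = X.T @ y                            # d-hat
c_ls = np.linalg.solve(R_hat, d_hat)       # normal equations: R_hat c = d_hat
print(np.allclose(c_ls, np.linalg.lstsq(X, y, rcond=None)[0]))  # matches lstsq
```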

Least Squares Estimate. 2. Complete the square.

Least Squares Estimate. 2. Complete the square. Both the LSE and MSE criteria are quadratic functions of the coefficient vector, with the same form as the FIR Wiener solution. When are they equivalent? For an ergodic process in the limit of large N.

A toy example: line fitting. Fit y = ax + b to scattered (x, y) data points:
[e_1; e_2; ...; e_N] = [y_1; y_2; ...; y_N] - [x_1 1; x_2 1; ...; x_N 1][a; b],  i.e., e = y - Xc.
Note the coefficients could be for a higher-order polynomial: the system must be linear in the unknown parameters, not in the equations themselves.
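A hedged sketch of the line-fitting example with synthetic data (the true slope and intercept below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.standard_normal(50)   # noisy samples of y = 2x + 1

X = np.column_stack([x, np.ones_like(x)])     # data matrix with columns [x, 1]
a, b = np.linalg.lstsq(X, y, rcond=None)[0]   # c = [a, b] minimizing ||y - Xc||^2
print(a, b)                                   # close to 2 and 1

# Higher-order polynomial: still linear in the coefficients, e.g. columns [x**2, x, 1].
```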

More Applications. Applications we've seen earlier for linear estimation / Wiener filtering: noise reduction, equalization, prediction, and system identification. These can be solved using LS given a block of data. This applies to FIR filters (not general IIR filters). Numerical and computational aspects need further investigation.

Computational Issues

Example: Time Series Prediction (from McNames' notes). Goal: predict the S&P 500. This is clearly not a stationary signal; what might we do? A common trick is to difference the time series.

Example: Difference Time Series

Example: Percent Change Time Series

Example: Prediction Results

Orthogonality and Geometric Interpretation (2-D illustration). Consider the simple example X = [2; 1], y = [2; 2], e = y - Xc, so ŷ = Xc with ŷ(0) = 2c and ŷ(1) = 1c. If we want to make Xc_LS as close as possible to y, then the error vector e_o should be orthogonal to the line (column space) Xc:
(Xc)^T e_o = 0 for all c
c^T X^T (Xc_LS - y) = c^T [X^T X c_LS - X^T y] = 0 for all c
Since this must hold for all c, we must have X^T X c_LS = X^T y (the normal equations).

Orthogonality and Geometric Interpretation (2-D illustration). For the same example (X = [2; 1], y = [2; 2]), substituting directly for the LS solution: c_LS = (X^T X)^{-1} X^T y, so ŷ = X(X^T X)^{-1} X^T y. The matrix P = X(X^T X)^{-1} X^T is a projection operator which projects y onto the space spanned by the columns of X.
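A minimal sketch of the projection interpretation using the slide's 2-D example:

```python
import numpy as np

X = np.array([[2.0], [1.0]])                # single column, as in the slide
y = np.array([2.0, 2.0])

P = X @ np.linalg.inv(X.T @ X) @ X.T        # P = X (X^T X)^{-1} X^T
y_hat = P @ y                               # projection of y onto span{X}
e_o = y - y_hat                             # orthogonal error
print(y_hat, X.T @ e_o)                     # X^T e_o is ~0 (orthogonality)
```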

Orthogonality and Geometric Interpretation (book)

Orthogonality and Geometric Interpretation

Orthogonality and Geometric Interpretation

Uniqueness

The Pseudoinverse. We can write the least squares solution as c_LS = (X^T X)^{-1} X^T y = X^+ y, where we have defined the pseudoinverse of a matrix X with linearly independent columns: X^+ = (X^T X)^{-1} X^T. The pseudoinverse has the following properties:
i) X X^+ X = X
ii) (X X^+)^T = X X^+
It can be shown using orthogonality that any matrix X^+ satisfying the above two conditions yields the least squares solution X^+ y to the equation y - Xc = e.
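A quick numerical check of these two properties (the tall matrix X below is synthetic, assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 3))                 # N > M, linearly independent columns
X_plus = np.linalg.inv(X.T @ X) @ X.T           # X^+ = (X^T X)^{-1} X^T
print(np.allclose(X @ X_plus @ X, X))           # i)  X X^+ X = X
print(np.allclose((X @ X_plus).T, X @ X_plus))  # ii) (X X^+)^T = X X^+
print(np.allclose(X_plus, np.linalg.pinv(X)))   # matches NumPy's Moore-Penrose pinv
```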

Minimum Norm Solution. Suppose that the columns of X are not linearly independent, or simply N < M. Then X^T X cannot be inverted and there are an infinite number of solutions which solve y = Xc exactly. Which to choose? For example, X = [1 2], y = [4], i.e., 1·c_1 + 2·c_2 = 4 defines the line y - Xc = 0 in the (c_1, c_2) plane. c_min is the solution orthogonal to that subspace; equivalently, c_min is orthogonal to the null space of X and lies in the range space of X^T, so c_min = X^T λ.

Minimum Norm Solution. Solving: with c_min = X^T λ and y = Xc, we get y = X X^T λ, so λ = (X X^T)^{-1} y and
c_min = X^T (X X^T)^{-1} y = X^+ y.
Pseudoinverse: X^+ = X^T (X X^T)^{-1}, which satisfies
iii) X^+ X X^+ = X^+
iv) (X^+ X)^T = X^+ X
Moore-Penrose pseudoinverse: for any matrix X there is only one matrix X^+ that satisfies all four conditions.
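A minimal sketch of the minimum-norm solution for the slide's underdetermined example 1·c_1 + 2·c_2 = 4:

```python
import numpy as np

X = np.array([[1.0, 2.0]])                       # 1 equation, 2 unknowns (N < M)
y = np.array([4.0])

lam = np.linalg.solve(X @ X.T, y)                # lambda = (X X^T)^{-1} y
c_min = X.T @ lam                                # c_min = X^T lambda = [0.8, 1.6]
print(c_min, X @ c_min)                          # solves the equation exactly
print(np.allclose(c_min, np.linalg.pinv(X) @ y)) # agrees with the Moore-Penrose pinv
```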

Weighted Least Squares

Weighted Least Squares

Weighted Least Squares
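The weighted least squares details did not survive transcription; as a hedged sketch of the standard formulation (an assumption, not taken from these slides), a positive-definite weight matrix W changes the criterion to e^T W e, which is minimized by c = (X^T W X)^{-1} X^T W y:

```python
# Hedged weighted-LS sketch (standard formulation; the data and weights are made up).
import numpy as np

rng = np.random.default_rng(3)
N, M = 50, 2
X = rng.standard_normal((N, M))
y = X @ np.array([1.0, -2.0]) + rng.standard_normal(N)

w = np.linspace(1.0, 10.0, N)                       # per-sample weights (e.g., inverse noise variance)
W = np.diag(w)
c_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)   # minimizes e^T W e, with e = y - Xc
print(c_wls)
```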

Properties of the LS Estimate. Assume a statistical model of how the data were generated; some properties won't hold when the model is not accurate.

Deterministic versus Stochastic Data Matrix

Estimator Properties (Deterministic case)

Estimator Properties (Deterministic case). Error variance: the defined quantity is an unbiased estimate of the true error variance; see the book for the proof.

Estimator Properties (Deterministic case). Other properties: see the book for proofs.

Estimator Properties (Stochastic case)

Another Perspective: System identification. (Block diagram: the input x(n) drives the unknown system H(z), additive noise v(n) gives the observed y(n); the LS filter c_LS produces ŷ(n), with error e(n) = y(n) - ŷ(n).) Additive noise did not affect the Wiener solution. The LS solution is still unbiased (if the model matches), but the noise adds variance to the LS solution, so you need more data for a good fit.

Example: Power Spectral Estimation (material from Chapter 7). 1. Use a non-parametric approach (e.g., the Welch method) from ECE 538. 2. Fit a model: white noise e(n) drives H(z) to produce x(n); if e(n) is white, the PSD of x(n) is shaped entirely by H(z). Use an autoregressive model driven by noise e(n):
x(n) = Σ_{k=1}^{M} a_k x(n - k) + e(n)

Example: Power Spectral Estimation. Least squares solution: a_LS = c_LS = (X^T X)^{-1} X^T y. Note that autoregressive models have a duality with prediction: with desired signal y(n) = x(n + 1), the linear predictor h_lp(n) applied to x(n) produces ŷ(n) and error e(n + 1). (We will discuss windowing aspects later.)

Example: Power Spectral Estimation. Generate some data: white noise e(n) through H(z) = 1 / (1 + 0.3 z^{-1} + 0.6 z^{-2}) gives x(n). (Figure: 1000 samples of the generated time series x(n).)
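A sketch of this example in NumPy (my own construction, not the instructor's code): generate x(n) by driving H(z) = 1/(1 + 0.3 z^{-1} + 0.6 z^{-2}) with white noise, then fit the AR coefficients by least squares. With the model x(n) = Σ_k a_k x(n-k) + e(n), the true coefficients are a = [-0.3, -0.6]:

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 1000, 2
e = rng.standard_normal(N)
x = np.zeros(N)
for n in range(N):                           # difference equation of H(z)
    x[n] = e[n] - 0.3 * x[n - 1] - 0.6 * x[n - 2]

# Data matrix of past samples; the desired output is the current sample.
X = np.column_stack([x[M - k - 1:N - k - 1] for k in range(M)])  # columns: x(n-1), x(n-2)
y = x[M:]
a_ls = np.linalg.lstsq(X, y, rcond=None)[0]
print(a_ls)                                   # close to [-0.3, -0.6]
```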

Example: Power Spectral Estimation. True power spectrum. (Figure: R_xx(e^{jω}) in dB versus normalized frequency.)

Example: Power Spectral Estimation. Periodogram. (Figure: R_xx(e^{jω}) versus the squared-magnitude DFT, in dB versus normalized frequency.)

Example: Power Spectral Estimation. Welch method. (Figure: R_xx(e^{jω}) versus the Welch estimate with 256-sample segments.)

Example: Power Spectral Estimation. LS fit (M = 5). (Figure: R_xx(e^{jω}) versus the AR-model fit.)
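To reproduce the AR-fit curve, the parametric spectrum follows from the fitted coefficients: with A(e^{jω}) = 1 - Σ_k a_k e^{-jωk} and residual variance σ², the estimate is σ² / |A(e^{jω})|². A standalone sketch (a_ls and sigma2 below are stand-ins for the outputs of an LS fit like the one above):

```python
# AR-model PSD from fitted coefficients (illustrative values, not fitted here).
import numpy as np

a_ls = np.array([-0.3, -0.6])     # example AR coefficients (true values of the slide's model)
sigma2 = 1.0                      # driving-noise variance
w = np.linspace(0, np.pi, 512)    # normalized frequency grid
k = np.arange(1, len(a_ls) + 1)
A = 1.0 - np.exp(-1j * np.outer(w, k)) @ a_ls    # A(e^{jw}) = 1 - sum_k a_k e^{-jwk}
psd_db = 10 * np.log10(sigma2 / np.abs(A) ** 2)  # parametric PSD estimate in dB
```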

Power Spectral Estimation: Mixing parametric and non-parametric. Use the prediction filter to whiten the error, apply a non-parametric method to the residual, then color the PSD estimate using the AR (all-pole) model (i.e., undo the pre-whitening).

Least Squares Filtering: Additional aspects

Edge conditions and windowing

Computing the correlation matrix

Derivation of the Correlation Matrix Recursion

Window options. See the text for minor modifications to the correlation recursions.

More on computationally efficient methods. Methods based on forward-backward prediction and order-recursive algorithms (many of the details are in Chapter 7 and Section 8.5.2); we will touch on some aspects relating to prediction, which also allows for an alternative windowing approach (Section 8.4.2). General linear algebra approaches (we come back to this): Cholesky decomposition, SVD, etc. (Section 8.5).

Linear Prediction. Recall the Wiener solution with desired signal y(n) = x(n + 1) and model x(n) = Σ_{k=1}^{M} a_k x(n - k) + e(n): the predictor a applied to x(n) produces ŷ(n) with error e(n), and the Wiener solution is R a_o = d. What is d?
d = E[x(n) y(n)] = E[x(n) x(n + 1)] = [r_x(1), r_x(2), ..., r_x(M)]^T = r
so R a_o = r, i.e., r(n) = Σ_{k=1}^{M} a_k r(n - k).
These are known as the Yule-Walker equations, and they lead to efficient order-recursive computations (Levinson-Durbin algorithms).
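A hedged sketch of solving the Yule-Walker equations directly from estimated autocorrelations (a Levinson-Durbin recursion would do this more efficiently; the AR(2) data below reuse the spectral-estimation example):

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(5)
N, M = 5000, 2
x = np.zeros(N)
for n in range(N):
    x[n] = rng.standard_normal() - 0.3 * x[n - 1] - 0.6 * x[n - 2]

# Biased autocorrelation estimates r(0), ..., r(M)
r = np.array([np.dot(x[:N - k], x[k:]) / N for k in range(M + 1)])
R = toeplitz(r[:M])                 # M x M autocorrelation matrix
a_yw = np.linalg.solve(R, r[1:])    # Yule-Walker: R a = [r(1), ..., r(M)]^T
print(a_yw)                         # close to [-0.3, -0.6]
```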

!" Backward Linear Predic?on Think of?me running in the reverse y(n) = x(n +1) a f x(n +1) x(n) a f ŷ(n)!" e f (n) y(n) = x(n M 1) x(n M 1) a b e b (n) ŷ(n) a b x(n) M x(n) = a f x(n k) + e f (n) x(n M 1) = a b x(n k) + e b (n) k k k=1 M k=1 Easy to show using the Yule-Walker equa?ons that So how do we make use of this? Note, book s nota?on slightly different (b = a b ) a f = flip(a b ) 63

Forward-Backward Linear Prediction. Minimize the forward and backward error by doubling the size of the data matrix: stack the forward and backward data matrices and solve for a single coefficient vector a^FB. This lowers the variance of the LS estimate. Correlation or modified covariance window methods; see the book for additional details and more careful notation. See MATLAB's ar(x,M) (which uses forward-backward and short/no windows by default).
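A hedged sketch of the forward-backward idea (my own construction under one choice of ordering conventions, not necessarily the book's): stack the forward and backward prediction equations and solve for a single coefficient vector:

```python
import numpy as np

rng = np.random.default_rng(6)
N, M = 400, 2
x = np.zeros(N)
for n in range(N):
    x[n] = rng.standard_normal() - 0.3 * x[n - 1] - 0.6 * x[n - 2]

# Forward equations: predict x(n) from x(n-1), ..., x(n-M)
Xf = np.column_stack([x[M - k - 1:N - k - 1] for k in range(M)])
yf = x[M:]
# Backward equations: predict x(m) from x(m+1), ..., x(m+M); for a real stationary
# process, ordering the regressors by increasing offset reuses the same coefficients.
Xb = np.column_stack([x[k + 1:N - M + k + 1] for k in range(M)])
yb = x[:N - M]

a_fb = np.linalg.lstsq(np.vstack([Xf, Xb]),
                       np.concatenate([yf, yb]), rcond=None)[0]
print(a_fb)     # close to [-0.3, -0.6], with lower variance than forward-only
```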

Application Example: Narrowband interference canceling

Application Example: Narrowband interference canceling

Application Example: Narrowband interference canceling

Application Example: Narrowband interference canceling. This is just a D-step-ahead predictor, sometimes called a line enhancer.
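A hedged sketch of a D-step-ahead predictor built by block LS (the signal model below is an assumption for illustration: a sinusoidal interferer plus broadband noise); the predictor output is the narrowband estimate and the residual is the interference-cancelled signal:

```python
import numpy as np

rng = np.random.default_rng(7)
N, M, D = 4000, 32, 5
n = np.arange(N)
x = np.sin(2 * np.pi * 0.05 * n) + 0.5 * rng.standard_normal(N)   # narrowband + noise

# Predict x(n) from the delayed samples x(n-D), ..., x(n-D-M+1)
rows = np.arange(D + M - 1, N)
X = np.column_stack([x[rows - D - k] for k in range(M)])
y = x[rows]
c = np.linalg.lstsq(X, y, rcond=None)[0]

x_hat = X @ c                 # predictable (narrowband) component = enhanced line
e = y - x_hat                 # residual with the interference removed
print(np.var(y), np.var(e))   # residual variance drops once the sinusoid is predicted
```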

Example: Microelectrode Narrowband Interference

Example: Microelectrode Narrowband Interference. Signal PSD. (Figure: input PSD, scaled, versus frequency, 0-10000 Hz.)

Example: Microelectrode Narrowband Interference. Residual PSD. (Figure: output PSD, scaled, versus frequency, 0-10000 Hz.)

Example: Microelectrode Narrowband Interference. Input and predicted signal: NMSE = 93.4%, D = 44 (5 seconds), M = 500. (Figure: observed and estimated signals versus time in seconds.)

Example: Microelectrode Narrowband Interference. Prediction filter frequency response. (Figure: magnitude response |H(e^{jω})|^2 versus frequency, log scale.)

Example: Microelectrode Narrowband Interference. Prediction error filter frequency response. (Figure: magnitude response |H(e^{jω})|^2 versus frequency, log scale.)

Example: OGI Seminar. (Figure: original recording waveform and a noise segment.)

Example: OGI Seminar. D = 1, M = 500. (Figure: original recording and enhanced speech waveforms; compare with Audition noise reduction.)

Example Application: IIR filtering / System ID. We previously used a predictor / AR model (all-pole) for spectral estimation. Now consider a general pole/zero IIR model: given the input x(n) and the desired (noisy) output y(n), fit H_LS(z) = B(z)/A(z) with
ŷ(n) = Σ_{k=1}^{M-1} a_k ŷ(n - k) + Σ_{k=0}^{M-1} b_k x(n - k).

IIR filtering / System ID: Data matrices. With ŷ(n) = Σ_{k=1}^{M-1} a_k ŷ(n - k) + Σ_{k=0}^{M-1} b_k x(n - k), we would like to write e = y - Xc. What's wrong with this? How do we get ŷ(n) to build the data matrix X?

IIR filtering / System ID: Data matrices. Substitute the observed output y(n) for ŷ(n) in the regressors (the best available stand-in). This is called the equation-error method. It is easy to show the solution is unbiased if there is no noise, i.e., v(n) = 0.
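A hedged sketch of the equation-error method for a first-order pole/zero system (the system, orders, and variable names below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(8)
N = 2000
x = rng.standard_normal(N)
y = np.zeros(N)
for n in range(1, N):                     # true system: y(n) = 0.8 y(n-1) + x(n) + 0.5 x(n-1)
    y[n] = 0.8 * y[n - 1] + x[n] + 0.5 * x[n - 1]

# Equation-error data matrix: regress y(n) on [y(n-1), x(n), x(n-1)]
rows = np.arange(1, N)
X = np.column_stack([y[rows - 1], x[rows], x[rows - 1]])
c = np.linalg.lstsq(X, y[rows], rcond=None)[0]
print(c)                                  # ~[0.8, 1.0, 0.5] when there is no output noise
```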

Back to Numerical Methods: QR decomposition (just the basics). Any matrix with linearly independent columns can be factored as X = QR, where R is upper triangular and invertible (not to be confused with the autocorrelation matrix) and the columns of Q are orthonormal, Q^T Q = I. The factorization is achieved using Gram-Schmidt or Householder algorithms. Substituting into the LS equations:
c_LS = (X^T X)^{-1} X^T y = (R^T Q^T Q R)^{-1} R^T Q^T y = (R^T R)^{-1} R^T Q^T y = R^{-1} Q^T y
so R c_LS = Q^T y, which is easily solved using back substitution since R is upper triangular. This is what MATLAB's backslash command does: c_LS = X \ y.
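A minimal sketch of the QR route (X and y are synthetic): factor X = QR and back-substitute R c = Q^T y:

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(9)
X = rng.standard_normal((50, 4))
y = rng.standard_normal(50)

Q, R = np.linalg.qr(X)                # reduced QR: Q is 50x4, R is 4x4 upper triangular
c_qr = solve_triangular(R, Q.T @ y)   # back substitution for R c = Q^T y
print(np.allclose(c_qr, np.linalg.lstsq(X, y, rcond=None)[0]))
```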

Back to Numerical Methods: Singular Value Decomposition. Any matrix of rank r can be factored as X = U Σ V^T.

Back to Numerical Methods: Singular Value Decomposition. With X = U Σ V^T, it is easy to show that the pseudoinverse is given by X^+ = V Σ^+ U^T, where Σ^+ inverts the nonzero singular values. Thus the LS solution is just c_LS = X^+ y.
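A minimal sketch of the SVD route (X and y are synthetic; here all singular values are nonzero, so no truncation is needed):

```python
import numpy as np

rng = np.random.default_rng(10)
X = rng.standard_normal((50, 4))
y = rng.standard_normal(50)

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(s) V^T
X_plus = Vt.T @ np.diag(1.0 / s) @ U.T             # pseudoinverse via the SVD
c_svd = X_plus @ y                                 # LS solution c_LS = X^+ y
print(np.allclose(c_svd, np.linalg.pinv(X) @ y))
```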

Other Topics Not Covered: additional details on numerical methods; details on signal modeling and parametric spectral estimation; minimum variance spectral estimation; harmonic models and super-resolution algorithms.