Least Squares with Examples in Signal Processing
Ivan Selesnick, NYU-Poly
March 7, 2013

These notes address (approximate) solutions to linear equations by least squares. We deal with the easy case wherein the system matrix is full rank. If the system matrix is rank deficient, then other methods are needed, e.g., QR decomposition, singular value decomposition, or the pseudo-inverse [2, 3]. In these notes, least squares is illustrated by applying it to several basic problems in signal processing:

1. Linear prediction
2. Smoothing
3. Deconvolution
4. System identification
5. Estimating missing data

For the use of least squares in filter design, see [1].

For feedback/corrections, email selesi@poly.edu. Matlab software to reproduce the examples in these notes is available on the web or from the author: http://eeweb.poly.edu/iselesni/lecture_notes/. Last edit: November 6, 2013.

1 Notation

We denote vectors in lower-case bold, i.e.,

    x = [x(1), x(2), ..., x(N)]^T.    (1)

We denote matrices in upper-case bold. The transpose of a vector or matrix is indicated by a superscript T, i.e., x^T is the transpose of x.

The notation \|x\|_2 refers to the Euclidean length of the vector x, i.e.,

    \|x\|_2 = \sqrt{ |x(1)|^2 + |x(2)|^2 + \cdots + |x(N)|^2 }.    (2)

The sum of squares of x is denoted by \|x\|_2^2, i.e.,

    \|x\|_2^2 = \sum_n |x(n)|^2 = x^T x.    (3)

The 'energy' of a vector x refers to \|x\|_2^2.

In these notes, it is assumed that all vectors and matrices are real-valued. In the complex-valued case, the conjugate transpose should be used in place of the transpose, etc.

2 Overdetermined equations

Consider the system of linear equations y = Hx. If there is no solution to this system of equations, then the system is overdetermined. This frequently happens when H is a 'tall' matrix (more rows than columns) with linearly independent columns.

In this case, it is common to seek a solution x minimizing the energy of the error:

    J(x) = \|y - Hx\|_2^2.

Expanding J(x) gives

    J(x) = (y - Hx)^T (y - Hx)    (4)
         = y^T y - y^T Hx - x^T H^T y + x^T H^T H x    (5)
         = y^T y - 2 y^T Hx + x^T H^T H x.    (6)

Note that each of the four terms in (5) is a scalar. Note also that the scalar x^T H^T y is the transpose of the scalar y^T Hx, and hence x^T H^T y = y^T Hx.

Taking the derivative (see Appendix A) gives

    ∂J(x)/∂x = -2 H^T y + 2 H^T H x.

Setting the derivative to zero,

    ∂J(x)/∂x = 0   ⇒   H^T H x = H^T y.

Let us assume that H^T H is invertible. Then the solution is given by

    x = (H^T H)^{-1} H^T y.

This is the 'least squares' solution.

    \min_x \|y - Hx\|_2^2   ⇒   x = (H^T H)^{-1} H^T y    (7)

In some situations, it is desirable to minimize the weighted square error, i.e., \sum_n w_n r(n)^2, where r is the residual, or error, r = y - Hx, and the w_n are positive weights. This corresponds to minimizing \|W^{1/2} (y - Hx)\|_2^2 where W is the diagonal matrix, [W]_{n,n} = w_n. Using (7) gives

    \min_x \|W^{1/2} (y - Hx)\|_2^2   ⇒   x = (H^T W H)^{-1} H^T W y    (8)

where we have used the fact that W is symmetric.
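A minimal Matlab sketch of (7) and (8); the matrix H, data y, and weights w below are made-up values used only for illustration.

    % Least squares solution of an overdetermined system y = H x, eq. (7)
    H = [1 1; 1 2; 1 3; 1 4];        % 4 equations, 2 unknowns (tall, full column rank)
    y = [1.1; 1.9; 3.2; 3.9];        % observations (placeholder data)

    x_ls = (H'*H) \ (H'*y);          % solve the normal equations H'H x = H'y
    x_bs = H \ y;                    % equivalent: Matlab backslash (QR-based)

    % Weighted least squares, eq. (8), with positive weights w_n
    w = [1; 1; 10; 10];              % emphasize the last two equations (placeholder)
    W = diag(w);
    x_wls = (H'*W*H) \ (H'*W*y);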

3 Underdetermined equations

Consider the system of linear equations y = Hx. If there are many solutions, then the system is underdetermined. This frequently happens when H is a 'wide' matrix (more columns than rows) with linearly independent rows.

In this case, it is common to seek a solution with minimum norm. That is, we would like to solve the optimization problem

    \min_x \|x\|_2^2    (9)
    such that  y = Hx.    (10)

Minimization with constraints can be done with Lagrange multipliers. So, define the Lagrangian:

    L(x, µ) = \|x\|_2^2 + µ^T (y - Hx).

Take the derivatives of the Lagrangian:

    ∂L/∂x = 2x - H^T µ
    ∂L/∂µ = y - Hx

Set the derivatives to zero to get:

    x = (1/2) H^T µ    (11)
    y = Hx.    (12)

Plugging (11) into (12) gives

    y = (1/2) H H^T µ.

Let us assume H H^T is invertible. Then

    µ = 2 (H H^T)^{-1} y.    (13)

Plugging (13) into (11) gives the least squares solution:

    x = H^T (H H^T)^{-1} y.

We can verify that x in this formula does in fact satisfy y = Hx by plugging in:

    Hx = H [ H^T (H H^T)^{-1} y ] = (H H^T)(H H^T)^{-1} y = y.

So,

    \min_x \|x\|_2^2  s.t.  y = Hx   ⇒   x = H^T (H H^T)^{-1} y    (14)

In some situations, it is desirable to minimize the weighted energy, i.e., \sum_n w_n x(n)^2, where the w_n are positive weights. This corresponds to minimizing \|W^{1/2} x\|_2^2 where W is the diagonal matrix, [W]_{n,n} = w_n. The derivation of the solution is similar, and gives

    \min_x \|W^{1/2} x\|_2^2  s.t.  y = Hx   ⇒   x = W^{-1} H^T ( H W^{-1} H^T )^{-1} y    (15)

This solution is also derived below; see (25).
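A minimal Matlab sketch of the minimum-norm solution (14); H and y are made-up values for a wide, full-row-rank system.

    % Minimum-norm solution of an underdetermined system y = H x, eq. (14)
    H = [1 2 3 4; 0 1 0 2];          % 2 equations, 4 unknowns (placeholder data)
    y = [3; 1];

    x_mn = H' * ((H*H') \ y);        % x = H^T (H H^T)^{-1} y
    err = norm(H*x_mn - y);          % check: the constraint y = H x is satisfied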

4 Regularization

In the overdetermined case, we minimized \|y - Hx\|_2^2. In the underdetermined case, we minimized \|x\|_2^2. Another approach is to minimize the weighted sum:

    c_1 \|y - Hx\|_2^2 + c_2 \|x\|_2^2.

The solution x depends on the ratio c_2 / c_1, not on c_1 and c_2 individually.

A common approach to obtain an inexact solution to a linear system is to minimize the objective function:

    J(x) = \|y - Hx\|_2^2 + λ \|x\|_2^2    (16)

where λ > 0. Taking the derivative, we get

    ∂J(x)/∂x = 2 H^T (Hx - y) + 2 λ x.

Setting the derivative to zero,

    ∂J(x)/∂x = 0   ⇒   (H^T H + λ I) x = H^T y.

So the solution is given by

    x = (H^T H + λ I)^{-1} H^T y.

So,

    \min_x \|y - Hx\|_2^2 + λ \|x\|_2^2   ⇒   x = (H^T H + λ I)^{-1} H^T y    (17)

This is referred to as 'diagonal loading' because a constant, λ, is added to the diagonal elements of H^T H. The approach also avoids the problem of rank deficiency, because H^T H + λ I is invertible even if H^T H is not. In addition, the solution (17) can be used in both cases: when H is tall and when H is wide.

5 Weighted regularization

A more general form of the regularized objective function (16) is:

    J(x) = \|y - Hx\|_2^2 + λ \|Ax\|_2^2

where λ > 0. Taking the derivative, we get

    ∂J(x)/∂x = 2 H^T (Hx - y) + 2 λ A^T A x.

Setting the derivative to zero,

    ∂J(x)/∂x = 0   ⇒   (H^T H + λ A^T A) x = H^T y.

So the solution is given by

    x = (H^T H + λ A^T A)^{-1} H^T y.

So,

    \min_x \|y - Hx\|_2^2 + λ \|Ax\|_2^2   ⇒   x = (H^T H + λ A^T A)^{-1} H^T y    (18)

Note that if A is the identity matrix, then equation (18) becomes (17).
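A brief Matlab sketch of diagonal loading (17) and its generalization (18); the values of H, y, λ, and A below are placeholders chosen only for illustration.

    % Regularized (Tikhonov) least squares
    H = [1 1; 1 2; 1 3; 1 4];
    y = [1.1; 1.9; 3.2; 3.9];
    lam = 0.5;                                % regularization parameter, lambda > 0

    x_dl = (H'*H + lam*eye(2)) \ (H'*y);      % diagonal loading, eq. (17)

    A = [1 -1];                               % example regularizer: difference of the two unknowns
    x_wr = (H'*H + lam*(A'*A)) \ (H'*y);      % weighted regularization, eq. (18)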

6 Constrained least squares

Constrained least squares refers to the problem of finding a least squares solution that exactly satisfies additional constraints. If the additional constraints are a set of linear equations, then the solution is obtained as follows.

The constrained least squares problem is of the form:

    \min_x \|y - Hx\|_2^2    (19)
    such that  Cx = b.    (20)

Define the Lagrangian,

    L(x, µ) = \|y - Hx\|_2^2 + µ^T (Cx - b).

The derivatives are:

    ∂L/∂x = 2 H^T (Hx - y) + C^T µ
    ∂L/∂µ = Cx - b

Setting the derivatives to zero,

    ∂L/∂x = 0   ⇒   x = (H^T H)^{-1} ( H^T y - 0.5 C^T µ )    (21)
    ∂L/∂µ = 0   ⇒   Cx = b.    (22)

Multiplying (21) on the left by C gives Cx, which from (22) is b, so we have

    C (H^T H)^{-1} ( H^T y - 0.5 C^T µ ) = b

or, expanding,

    C (H^T H)^{-1} H^T y - 0.5 C (H^T H)^{-1} C^T µ = b.

Solving for µ gives

    µ = 2 ( C (H^T H)^{-1} C^T )^{-1} ( C (H^T H)^{-1} H^T y - b ).

Plugging µ into (21) gives

    x = (H^T H)^{-1} ( H^T y - C^T ( C (H^T H)^{-1} C^T )^{-1} ( C (H^T H)^{-1} H^T y - b ) ).

Let us verify that x in this formula does in fact satisfy Cx = b:

    Cx = C (H^T H)^{-1} ( H^T y - C^T ( C (H^T H)^{-1} C^T )^{-1} ( C (H^T H)^{-1} H^T y - b ) )
       = C (H^T H)^{-1} H^T y - C (H^T H)^{-1} C^T ( C (H^T H)^{-1} C^T )^{-1} ( C (H^T H)^{-1} H^T y - b )
       = C (H^T H)^{-1} H^T y - ( C (H^T H)^{-1} H^T y - b )
       = b.

So,

    \min_x \|y - Hx\|_2^2  s.t.  Cx = b
      ⇒   x = (H^T H)^{-1} ( H^T y - C^T ( C (H^T H)^{-1} C^T )^{-1} ( C (H^T H)^{-1} H^T y - b ) )    (23)

6.1 Special cases

Simpler forms of (23) are frequently useful. For example, if H = I and b = 0 in (23), then we get

    \min_x \|y - x\|_2^2  s.t.  Cx = 0   ⇒   x = y - C^T (C C^T)^{-1} C y    (24)

If y = 0 in (23), then we get

    \min_x \|Hx\|_2^2  s.t.  Cx = b   ⇒   x = (H^T H)^{-1} C^T ( C (H^T H)^{-1} C^T )^{-1} b    (25)

If y = 0 and H = I in (23), then we get

    \min_x \|x\|_2^2  s.t.  Cx = b   ⇒   x = C^T (C C^T)^{-1} b    (26)

which is the same as (14).

7 Note

The expressions above involve matrix inverses. For example, (7) involves (H^T H)^{-1}. However, it must be emphasized that finding the least squares solution does not require computing the inverse of H^T H, even though the inverse appears in the formula. Instead, x in (7) should be obtained, in practice, by solving the system Ax = b where A = H^T H and b = H^T y. The most direct way to solve a linear system of equations is by Gaussian elimination. Gaussian elimination is much faster than computing the inverse of the matrix A.
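In Matlab this means using the backslash operator (or a matrix factorization) rather than inv. A minimal sketch, with random placeholder data:

    % Solve the normal equations by Gaussian elimination rather than inversion
    H = randn(100, 5);  y = randn(100, 1);    % placeholder data
    A = H'*H;  b = H'*y;
    x1 = A \ b;             % recommended: backslash (Gaussian elimination / Cholesky)
    x2 = inv(A) * b;        % same result in exact arithmetic, but slower and less accurate
    % x1 and x2 agree to rounding error; backslash avoids forming the inverse explicitly.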

8 Examples

8.1 Polynomial approximation

An important example of least squares is fitting a low-order polynomial to data. Suppose the N-point data is of the form (t_i, y_i) for 1 ≤ i ≤ N. The goal is to find a polynomial that approximates the data by minimizing the energy of the residual:

    E = \sum_i ( y_i - p(t_i) )^2

where p(t) is a polynomial, e.g.,

    p(t) = a_0 + a_1 t + a_2 t^2.

The problem can be viewed as solving the overdetermined system of equations

    [ y_1 ]   [ 1  t_1  t_1^2 ]
    [ y_2 ] ≈ [ 1  t_2  t_2^2 ] [ a_0 ]
    [  :  ]   [ :    :     :  ] [ a_1 ]
    [ y_N ]   [ 1  t_N  t_N^2 ] [ a_2 ]

which we denote as y ≈ Ha. The energy of the residual, E, is written as

    E = \|y - Ha\|_2^2.

From (7), the least squares solution is given by a = (H^T H)^{-1} H^T y. An example is illustrated in Fig. 1.

[Figure 1: Least squares polynomial approximation (data; degree-2 fit; degree-4 fit).]
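A minimal Matlab sketch of the degree-2 polynomial fit; the data here is synthetic and stands in for the example of Fig. 1.

    % Least squares polynomial approximation (degree 2)
    N = 50;
    t = linspace(0, 1, N)';                       % sample points (placeholder)
    y = 2 - 3*t + 4*t.^2 + 0.1*randn(N, 1);       % noisy synthetic data

    H = [ones(N,1), t, t.^2];                     % Vandermonde-type matrix
    a = (H'*H) \ (H'*y);                          % coefficients [a0; a1; a2]
    p = H*a;                                      % fitted polynomial values
    plot(t, y, '.', t, p, '-')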

8.2 Linear prediction

One approach to predict future values of a time series is based on linear prediction, e.g.,

    y(n) ≈ a_1 y(n-1) + a_2 y(n-2) + a_3 y(n-3).    (27)

If past data y(n) is available, then the problem of finding the a_i can be solved using least squares. Finding a = (a_1, a_2, a_3)^T can be viewed as solving an overdetermined system of equations. For example, if y(n) is available for 0 ≤ n ≤ N-1, and we seek a third-order linear predictor, then the overdetermined system of equations is given by

    [ y(3)   ]   [ y(2)    y(1)    y(0)   ]
    [ y(4)   ] ≈ [ y(3)    y(2)    y(1)   ] [ a_1 ]
    [   :    ]   [   :       :       :    ] [ a_2 ]
    [ y(N-1) ]   [ y(N-2)  y(N-3)  y(N-4) ] [ a_3 ]

which we can write as ȳ ≈ Ha, where H is a matrix of size (N-3) × 3. From (7), the least squares solution is given by a = (H^T H)^{-1} H^T ȳ. Note that H^T H is small, of size 3 × 3 only. Hence, a is obtained by solving a small linear system of equations.

Once the coefficients a_i are found, y(n) for n ≥ N can be estimated using the recursive difference equation (27). An example is illustrated in Fig. 2. One hundred samples of data are available, i.e., y(n) for 0 ≤ n ≤ 99. From these samples, a p-th order linear predictor is obtained by least squares, and the subsequent samples are predicted.

[Figure 2: Least squares linear prediction (given data and predicted samples for p = 3, p = 4, and p = 6).]

8.3 Smoothing

One approach to smooth a noisy signal is based on least squares weighted regularization. The idea is to obtain a signal similar to the noisy one, but smoother. The smoothness of a signal can be measured by the energy of its derivative (or second-order derivative). The smoother a signal is, the smaller the energy of its derivative is.

Define the matrix D as

    D = [ 1 -2  1  0  ...             ]
        [ 0  1 -2  1  0  ...          ]    (28)
        [            ...              ]
        [  ...       0  1 -2  1       ]

Then Dx is the second-order difference (a discrete form of the second-order derivative) of the signal x(n); see Appendix B. If x is smooth, then \|Dx\|_2^2 is small in value.

If y(n) is a noisy signal, then a smooth signal x(n) that approximates y(n) can be obtained as the solution to the problem:

    \min_x \|y - x\|_2^2 + λ \|Dx\|_2^2    (29)

where λ > 0 is a parameter to be specified. Minimizing \|y - x\|_2^2 forces x to be similar to the noisy signal y. Minimizing \|Dx\|_2^2 forces x to be smooth. Minimizing the sum in (29) forces x to be both similar to y and smooth (as far as possible, and depending on λ). If λ = 0, then the solution will be the noisy data, i.e., x = y, because this solution makes (29) equal to zero; in this case, no smoothing is achieved. On the other hand, the greater λ is, the smoother x will be. Using (18), the signal x minimizing (29) is given by

    x = (I + λ D^T D)^{-1} y.    (30)
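A minimal Matlab sketch of the smoother (30), using a sparse second-order difference matrix so that the banded structure is preserved; the noisy signal and value of λ are placeholders.

    % Least squares smoothing, eq. (30): x = (I + lam*D'*D) \ y
    N = 300;
    y = cumsum(randn(N,1)) + 0.5*randn(N,1);      % synthetic noisy signal
    lam = 50;                                     % smoothing parameter (placeholder)

    e = ones(N,1);
    D = spdiags([e -2*e e], 0:2, N-2, N);         % sparse (N-2) x N second-order difference
    x = (speye(N) + lam*(D'*D)) \ y;              % banded system; backslash exploits sparsity

    plot(1:N, y, 1:N, x)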

An example of least squares smoothing is illustrated in Fig. 3. A noisy ECG signal is smoothed using (30). We have used the ECG waveform generator ECGSYN [4]. Note that the matrix I + λ D^T D is banded (the only non-zero values are near the main diagonal). Therefore, the solution can be obtained using fast solvers for banded systems [5, Sect. 2.4].

[Figure 3: Least squares smoothing (noisy ECG data; smoothed signal).]

8.4 Deconvolution

Deconvolution refers to the problem of finding the input to an LTI system when the output signal is known. Here, we assume the impulse response of the system is known. The output, y(n), is given by

    y(n) = h(0) x(n) + h(1) x(n-1) + ... + h(N) x(n-N)    (31)

where x(n) is the input signal and h(n) is the impulse response. Equation (31) can be written as y = Hx where H is a matrix of the form

    H = [ h(0)    0      0    ...  ]
        [ h(1)   h(0)    0    ...  ]
        [ h(2)   h(1)   h(0)  ...  ]
        [   :      :      :        ]

This matrix is constant-valued along its diagonals. Such matrices are called Toeplitz matrices.

It may be expected that x can be obtained from y by solving the linear system y = Hx. In some situations, this is possible. However, the matrix H is often singular or almost singular. In this case, Gaussian elimination encounters division by zeros. For example, Fig. 4 illustrates an input signal, x(n), an impulse response, h(n), and the output signal, y(n). When we attempt to obtain x by solving y = Hx in Matlab, we receive the warning message 'Matrix is singular to working precision' and we obtain a vector of all NaN (not a number).

Due to H being singular, we regularize the problem. Note that the input signal in Fig. 4 is mostly zero; hence, it is reasonable to seek a solution x with small energy. The signal we seek should also satisfy y = Hx, at least approximately. To obtain such a signal, x, we solve the problem:

    \min_x \|y - Hx\|_2^2 + λ \|x\|_2^2    (32)

where λ > 0 is a parameter to be specified. Minimizing \|y - Hx\|_2^2 forces x to be consistent with the output signal y. Minimizing \|x\|_2^2 forces x to have low energy. Minimizing the sum in (32) forces x to be consistent with y and to have low energy (as far as possible, and depending on λ). Using (18), the signal x minimizing (32) is given by

    x = (H^T H + λ I)^{-1} H^T y.    (33)

This technique is called diagonal loading because λ is added to the diagonal of H^T H. A small value of λ is sufficient to make the matrix invertible. The solution, illustrated in Fig. 4, is very similar to the original input signal, shown in the figure.

[Figure 4: Deconvolution of noise-free data by diagonal loading (input signal; impulse response; noise-free output signal; deconvolution result).]
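A minimal Matlab sketch of deconvolution by diagonal loading (33); the input, impulse response, and λ below are placeholder values, not the ones used in Fig. 4.

    % Deconvolution by diagonal loading, eq. (33)
    N = 100;
    x_true = zeros(N,1);  x_true([20 35 60]) = [2 -1 1.5];   % sparse (mostly zero) input
    h = [1 2 3 2 1]'/9;                                      % known impulse response (placeholder)

    hp = [h; zeros(N-length(h),1)];
    H = toeplitz(hp, [hp(1), zeros(1,N-1)]);   % N x N lower-triangular Toeplitz (convolution) matrix
    y = H * x_true;                            % noise-free output

    lam = 0.1;                                 % small regularization parameter
    x_est = (H'*H + lam*eye(N)) \ (H'*y);      % regularized estimate of the input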

In practice, the available data is also noisy. In this case, the data y is given by y = Hx + w, where w is the noise. The noise is often modeled as an additive white Gaussian random signal. In this case, diagonal loading with a small λ will generally produce a noisy estimate of the input signal. In Fig. 4, we used a small value of λ. When the same small value is used with the noisy data, a noisy result is obtained, as illustrated in Fig. 5. A larger λ is needed so as to attenuate the noise. But if λ is too large, then the estimate of the input signal is distorted. Notice in Fig. 5 that with a moderately larger λ the noise is reduced, but the heights of the pulses present in the original signal are somewhat attenuated; with a still larger λ, the noise is reduced slightly more, but the pulses are substantially more attenuated.

To improve the deconvolution result in the presence of noise, we can minimize the energy of the derivative (or second-order derivative) of x instead. As in the smoothing example above, minimizing the energy of the second-order derivative forces x to be smooth. In order that x is consistent with the data y and is also smooth, we solve the problem:

    \min_x \|y - Hx\|_2^2 + λ \|Dx\|_2^2    (34)

where D is the second-order difference matrix (28). Using (18), the signal x minimizing (34) is given by

    x = (H^T H + λ D^T D)^{-1} H^T y.    (35)

The solution obtained using (35) is illustrated in Fig. 6. Compared to the solutions obtained by diagonal loading, illustrated in Fig. 5, this solution is less noisy and less distorted.

This example illustrates the need for regularization even when the data is noise-free (an unrealistic, ideal case). It also illustrates that the choice of regularizer (i.e., \|x\|_2^2, \|Dx\|_2^2, or other) affects the quality of the result.

[Figure 5: Deconvolution of noisy data by diagonal loading (noisy output signal; deconvolution results for increasing values of λ).]

[Figure 6: Deconvolution of noisy data by derivative regularization (noisy output signal; deconvolution result).]

8.5 System identification

System identification refers to the problem of estimating an unknown system. In its simplest form, the system is LTI and input-output data is available. Here, we assume that the output signal is noisy. We also assume that the impulse response is relatively short.

The output, y(n), of the system can be written as

    y(n) = h_0 x(n) + h_1 x(n-1) + h_2 x(n-2) + w(n)    (36)

where x(n) is the input signal and w(n) is the noise. Here, we have assumed the impulse response h_n is of length 3. We can write this in matrix form as

    [ y(0) ]   [ x(0)    0      0    ]
    [ y(1) ]   [ x(1)   x(0)    0    ] [ h_0 ]
    [ y(2) ] ≈ [ x(2)   x(1)   x(0)  ] [ h_1 ]
    [ y(3) ]   [ x(3)   x(2)   x(1)  ] [ h_2 ]
    [  :   ]   [   :      :      :   ]

which we denote as y ≈ Xh. If y is much longer than the length of the impulse response h, then X is a tall matrix and y ≈ Xh is an overdetermined system of equations. In this case, h can be estimated from (7) as

    h = (X^T X)^{-1} X^T y.    (37)

An example is illustrated in Fig. 7. A binary input signal and noisy output signal are shown. When the impulse response is assumed to have a specified length, we obtain the impulse response shown. The residual, i.e., r = y - Xh, is also shown in the figure.

It is informative to plot the root-mean-square error (RMSE) of the residual r as a function of the length of the impulse response. This is a decreasing function. If the data really is the input-output data of an LTI system with a finite impulse response, and if the noise is not too severe, then the RMSE tends to flatten out at the correct impulse response length. This provides an indication of the length of the unknown impulse response.

[Figure 7: Least squares system identification (input signal; noisy output signal; estimated impulse response; residual; RMSE versus impulse response length).]
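A minimal Matlab sketch of the identification step (37), using a synthetic binary input and an assumed impulse-response length L; all values here are placeholders.

    % Least squares system identification, eq. (37)
    N = 200;  L = 10;                                  % data length, assumed FIR length
    x = sign(randn(N,1));                              % binary input signal (placeholder)
    h_true = [1 2 3 2 1 0.5]'/5;                       % "unknown" system, for simulation only
    y = conv(x, h_true);  y = y(1:N) + 0.2*randn(N,1); % noisy output

    X = toeplitz(x, [x(1), zeros(1, L-1)]);            % N x L convolution matrix of the input
    h = (X'*X) \ (X'*y);                               % estimated impulse response
    r = y - X*h;                                       % residual
    rmse = sqrt(mean(r.^2));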

8.6 Missing sample estimation

Due to transmission errors, transient interference, or impulsive noise, some samples of a signal may be lost or so badly corrupted as to be unusable. In this case, the missing samples should be estimated based on the available uncorrupted data. To complicate the problem, the missing samples may be randomly distributed throughout the signal. Filling in missing values in order to conceal errors is called error concealment [6].

This example shows how the missing samples can be estimated by least squares. As an example, Fig. 8 shows an ECG signal in which a number of samples are missing. The problem is to fill in the missing samples.

[Figure 8: Least squares estimation of missing data (signal with missing samples; estimated samples; final output).]

To formulate the problem as a least squares problem, we introduce some notation. Let x be a signal of length N. Suppose K samples of x are known, where K < N. The K-point known signal, y, can be written as

    y = Sx    (38)

where S is a 'selection' (or 'sampling') matrix of size K × N.

For example, if only the first, second, and last elements of a 5-point signal x are observed, then the matrix S is given by

    S = [ 1 0 0 0 0 ]
        [ 0 1 0 0 0 ]    (39)
        [ 0 0 0 0 1 ]

The matrix S is the identity matrix with rows removed, corresponding to the missing samples. Note that Sx removes two samples from the signal x:

    Sx = [ 1 0 0 0 0 ] [ x(0) ]   [ x(0) ]
         [ 0 1 0 0 0 ] [ x(1) ] = [ x(1) ] = y    (40)
         [ 0 0 0 0 1 ] [ x(2) ]   [ x(4) ]
                       [ x(3) ]
                       [ x(4) ]

The vector y consists of the known samples of x. So, the vector y is shorter than x (K < N).

The problem can be stated as: given the signal y and the matrix S, find x such that y = Sx. Of course, there are infinitely many solutions. Below, it is shown how to obtain a smooth solution by least squares.

Note that S^T y has the effect of setting the missing samples to zero. For example, with S in (39) we have

    S^T y = [ y(0)  y(1)  0  0  y(2) ]^T.    (41)

Let us define S_c as the complement of S. The matrix S_c consists of the rows of the identity matrix not appearing in S. Continuing the 5-point example,

    S_c = [ 0 0 1 0 0 ]
          [ 0 0 0 1 0 ]    (42)

Now, an estimate x̂ can be represented as

    x̂ = S^T y + S_c^T v    (43)

where y is the available data and v consists of the samples to be determined. For example,

    S^T y + S_c^T v = [ y(0)  y(1)  0  0  y(2) ]^T + [ 0  0  v(0)  v(1)  0 ]^T
                    = [ y(0)  y(1)  v(0)  v(1)  y(2) ]^T.    (44)

The problem is to estimate the vector v, which is of length N - K.

Let us assume that the original signal, x, is smooth. Then it is reasonable to find v to optimize the smoothness of x̂, i.e., to minimize the energy of the second-order derivative of x̂. Therefore, v can be obtained by minimizing \|D x̂\|_2^2, where D is the second-order difference matrix (28). Using (43), we find v by solving the problem

    \min_v \| D (S^T y + S_c^T v) \|_2^2,    (45)

i.e.,

    \min_v \| D S^T y + D S_c^T v \|_2^2.    (46)

From (7), the solution is given by

    v = -( S_c D^T D S_c^T )^{-1} S_c D^T D S^T y.    (47)

Once v is obtained, the estimate x̂ in (43) can be constructed simply by inserting the entries v(i) into y.

An example of least squares estimation of missing data using (47) is illustrated in Fig. 8. The result is a smoothly interpolated signal.
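A minimal Matlab sketch of (47), estimating randomly missing samples of a smooth synthetic signal; the signal and missing-sample pattern are placeholders.

    % Least squares estimation of missing samples, eqs. (43) and (47)
    N = 200;
    x_true = sin(2*pi*(1:N)'/60);                 % smooth synthetic signal
    k = sort(randperm(N, 150));                   % indices of the K known samples
    m = setdiff(1:N, k);                          % indices of the missing samples

    I = speye(N);
    S  = I(k, :);                                 % selection matrix (known samples)
    Sc = I(m, :);                                 % complement (missing samples)
    y  = S * x_true;                              % available data

    e = ones(N,1);
    D = spdiags([e -2*e e], 0:2, N-2, N);         % second-order difference matrix (28)

    v = -(Sc*(D'*D)*Sc') \ (Sc*(D'*D)*(S'*y));    % eq. (47)
    x_hat = S'*y + Sc'*v;                         % estimate, eq. (43)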

We make several remarks.

1. All matrices in (47) are banded, so the computation of v can be implemented with very high efficiency using a fast solver for banded systems [5, Sect. 2.4]. The banded property of the matrix

       G = S_c D^T D S_c^T    (48)

   arising in (47) is illustrated in Fig. 9.

2. The method does not require the pattern of missing samples to have any particular structure. The missing samples can be distributed quite randomly.

3. The method (47) does not require any regularization parameter λ to be specified. However, this derivation does assume the available data, y, is noise free. If y is noisy, then simultaneous smoothing and missing sample estimation is required (see the Exercises).

[Figure 9: Visualization of the banded matrix G in (48). All the non-zero values lie near the main diagonal.]

Speech de-clipping: In audio recording, if the amplitude of the audio source is too high, then the recorded waveform may suffer from clipping (i.e., saturation). Figure 10 shows a speech waveform that is clipped: all values greater than a fixed threshold in absolute value are lost due to clipping.

To estimate the missing data, we can use the least squares approach given by (47). That is, we fill in the missing data so as to minimize the energy of the derivative of the total signal. In this example, we minimize the energy of the third derivative. This encourages the filled-in data to have the form of a parabola (second-order polynomial), because the third derivative of a parabola is zero. In order to use (47) for this problem, we only need to change the matrix D. If we define the matrix D as

    D = [ 1 -3  3 -1  0  ...             ]
        [ 0  1 -3  3 -1  0  ...          ]    (49)
        [             ...                ]
        [  ...       0  1 -3  3 -1       ]

then Dx is an approximation of the third-order derivative of the signal x; see Appendix B. Using (47) with D defined in (49), we obtain the signal shown in Fig. 10. The samples lost due to clipping have been smoothly filled in. Figure 11 shows both the clipped signal and the estimated samples on the same axes.

[Figure 10: Estimation of speech waveform samples lost due to clipping (clipped speech with missing samples; estimated samples; final output). The lost samples are estimated by least squares.]

[Figure 11: The available clipped speech waveform is shown in blue; the filled-in signal, estimated by least squares, is shown in red.]

9 Exercises

1. Find the solution x to the least squares problem:

       \min_x \|y - Ax\|_2^2 + λ \|x - b\|_2^2.

2. Show that the solution x to the least squares problem

       \min_x  λ_1 \|b_1 - A_1 x\|_2^2 + λ_2 \|b_2 - A_2 x\|_2^2 + λ_3 \|b_3 - A_3 x\|_2^2

   is

       x = ( λ_1 A_1^T A_1 + λ_2 A_2^T A_2 + λ_3 A_3^T A_3 )^{-1} ( λ_1 A_1^T b_1 + λ_2 A_2^T b_2 + λ_3 A_3^T b_3 ).    (50)

3. In reference to (17), why is H^T H + λ I with λ > 0 invertible even if H^T H is not?

4. Show (56).

5. Smoothing. Demonstrate least squares smoothing of noisy data. Use various values of λ. What behavior do you observe when λ is very high?

6. The second-order difference matrix (28) was used in the examples for smoothing, deconvolution, and estimating missing samples. Discuss the use of the third-order difference instead. Perform numerical experiments and compare the results of second- and third-order difference matrices.

7. System identification. Perform a system identification experiment with varying variance of additive Gaussian noise. Plot the RMSE versus impulse response length. How does the plot of the RMSE change with respect to the variance of the noise?

8. Speech de-clipping. Record your own speech and use it to artificially create a clipped signal. Perform numerical experiments to test the least squares estimation of the lost samples.

9. Suppose the available data is both noisy and has some missing samples. Formulate a suitable least squares optimization problem to smooth the data and recover the missing samples. Illustrate the effectiveness with a numerical demonstration (e.g., using Matlab).

A Vector derivatives

If f(x) is a function of x_1, ..., x_N, then the derivative of f(x) with respect to x is the vector of derivatives,

    ∂f(x)/∂x = [ ∂f(x)/∂x_1 ]
               [ ∂f(x)/∂x_2 ]    (51)
               [     :      ]
               [ ∂f(x)/∂x_N ]

This is the gradient of f, denoted ∇f. By direct calculation, we have

    ∂/∂x [ x^T b ] = b    (52)

and

    ∂/∂x [ b^T x ] = b.    (53)

Suppose that A is a symmetric real matrix, A^T = A. Then, by direct calculation, we also have

    ∂/∂x [ x^T A x ] = 2 A x.    (54)

Also,

    ∂/∂x [ (y - x)^T A (y - x) ] = 2 A (x - y),    (55)

and

    ∂/∂x [ \|Ax - b\|_2^2 ] = 2 A^T (Ax - b).    (56)

We illustrate (54) with an example. Set A as the 2 × 2 matrix,

    A = [ 3 2 ]    (57)
        [ 2 5 ]

Then, by direct calculation,

    x^T A x = [ x_1  x_2 ] [ 3 2 ] [ x_1 ] = 3 x_1^2 + 4 x_1 x_2 + 5 x_2^2,    (58)
                           [ 2 5 ] [ x_2 ]

so

    ∂/∂x_1 ( x^T A x ) = 6 x_1 + 4 x_2    (59)

and

    ∂/∂x_2 ( x^T A x ) = 4 x_1 + 10 x_2.    (60)

Let us verify that the right-hand side of (54) gives the same:

    2 A x = 2 [ 3 2 ] [ x_1 ] = [ 6 x_1 + 4 x_2  ]    (61)
              [ 2 5 ] [ x_2 ]   [ 4 x_1 + 10 x_2 ]

B The Kth-order difference

The first-order difference of a discrete-time signal x(n), n ∈ Z, is defined as

    y(n) = x(n) - x(n-1).    (62)

This is represented as a system D with input x and output y:

    x → D → y

The second-order difference is obtained by taking the first-order difference twice:

    x → D → D → y

which gives the difference equation

    y(n) = x(n) - 2 x(n-1) + x(n-2).    (63)

The third-order difference is obtained by taking the first-order difference three times:

    x → D → D → D → y

which gives the difference equation

    y(n) = x(n) - 3 x(n-1) + 3 x(n-2) - x(n-3).    (64)

In terms of discrete-time linear time-invariant (LTI) systems, the first-order difference is an LTI system with transfer function

    D(z) = 1 - z^{-1}.

The second-order difference has the transfer function

    D^2(z) = (1 - z^{-1})^2 = 1 - 2 z^{-1} + z^{-2}.

The third-order difference has the transfer function

    D^3(z) = (1 - z^{-1})^3 = 1 - 3 z^{-1} + 3 z^{-2} - z^{-3}.

Note that the coefficients come from Pascal's triangle:

            1
           1 1
          1 2 1
         1 3 3 1
        1 4 6 4 1
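A small Matlab sketch illustrating this: applying diff to an identity matrix is a convenient way to construct the Kth-order difference matrices used throughout these notes (their rows contain the binomial coefficients above, with alternating signs).

    % Kth-order difference matrices from Pascal's triangle
    N = 8;
    D1 = diff(eye(N));            % (N-1) x N, rows [-1 1]
    D2 = diff(eye(N), 2);         % (N-2) x N, rows [1 -2 1]   (cf. matrix (28))
    D3 = diff(eye(N), 3);         % (N-3) x N, rows [-1 3 -3 1] (cf. matrix (49), up to sign)
    disp(D2(1,:))                 % displays  1  -2   1   0   0   0   0   0
    % For long signals, use diff(speye(N), K) or spdiags to keep D sparse and banded.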

C Additional Exercises for Signal Processing Students

1. For smoothing a noisy signal using least squares, we obtained (30),

       x = (I + λ D^T D)^{-1} y,   λ > 0.

   The matrix G = (I + λ D^T D)^{-1} can be understood as a low-pass filter. Using Matlab, compute and plot the output, y(n), when the input is an impulse, x(n) = δ(n - n_o). This is an impulse located at index n_o. Try placing the impulse at various points in the signal. For example, put the impulse around the middle of the signal. What happens when the impulse is located near the ends of the signal (n_o = 0 or n_o = N - 1)?

2. For smoothing a noisy signal using least squares, we obtained (30), x = (I + λ D^T D)^{-1} y, λ > 0, where D represents the Kth-order derivative. Assume D is the first-order difference and that the matrices I and D are infinite in size. Then I + λ D^T D is a convolution matrix and represents an LTI system.

   (a) Find and sketch the frequency response of the system I + λ D^T D.

   (b) Based on (a), find the frequency response of G = (I + λ D^T D)^{-1}. Sketch the frequency response of G. What kind of filter is it (low-pass, high-pass, band-pass, etc.)? What is its DC gain?

References

[1] C. Burrus. Least squared error design of FIR filters. Connexions Web site. http://cnx.org/content/m689/3/
[2] C. Burrus. General solutions of simultaneous equations. Connexions Web site, 2013. http://cnx.org/content/m956/5/
[3] C. L. Lawson and R. J. Hanson. Solving Least Squares Problems. SIAM, 1987.
[4] P. E. McSharry, G. D. Clifford, L. Tarassenko, and L. A. Smith. A dynamical model for generating synthetic electrocardiogram signals. Trans. on Biomed. Eng., 50(3):289-294, March 2003.
[5] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C: The Art of Scientific Computing (2nd ed.). Cambridge University Press, 1992.
[6] Y. Wang and Q. Zhu. Error control and concealment for video communication: A review. Proc. IEEE, 86(5):974-997, May 1998.