Some notes about signals, orthogonal polynomials and linear algebra

Size: px
Start display at page:

Download "Some notes about signals, orthogonal polynomials and linear algebra"

Transcription

1 Some notes about signals, orthogonal polynomials and linear algebra Adhemar Bultheel Report TW 180, November 1992 Revised February 1993 n Katholieke Universiteit Leuven Department of Computer Science Celestijnenlaan 200A B-3001 Heverlee (Belgium)

2 Some notes about signals, orthogonal polynomials and linear algebra Adhemar Bultheel Report TW 180, November 1992 Revised February 1993 Department of Computer Science, KULeuven Abstract These short notes contain an introduction to filtering of deterministic and stochastic signals The connection with algorithms from classical complex analysis (the Schur algorithm) and with the recurrence relation (Szegő) for polynomials orthogonal with respect to a measure supported on the unit circle are both given The interpretation of these algorithms in terms of linear algebra lead to fast algorithms for structured matrices It is explained how these algorithms can be generalized to situations where the signal is not stationary The corresponding notion of matrices with low displacement rank is introduced The notes are organized like lecture notes and contain several exercises that form an essential part of the text Solutions are provided at the end Keywords : signals, orthogonal polynomials, linear algebra, displacement rank MSC : Primary : 93E11 Secondary : 65F05, 41A20

3 Some notes about signals, orthogonal polynomials and linear algebra Adhemar Bultheel November 23, Discrete signals and digital filters Let s(t) be a continuous time signal that is sampled to obtain a discrete time signal or time series s(nt ) T is the sampling period and 1/T is the sampling frequency We shall abbreviate s(nt ) as s n Denote by l the space of complex double sequences l = {s = (s n ) : s n C, n Z} Note: In practical applications, a signal is real valued, but considering complex signals does not really complicate the analysis On the contrary, we shall make use of complex analysis and considering complex signals seems to be quite natural On the other hand, it is easier to leave away the complex conjugate bar if you want to specialize to real signals than to introduce it at appropriate places if you want to use results from complex analysis Let l D and l R be two subspaces of l and T : l D l R a linear operator As a special case we can define the shift operator Z by A transformation T is called shift-invariant if (Zs) n = s n 1, s l, n Z ZT = T Z Note that this implies immediately Z m T = T Z m, m Z A digital filter is a linear shift invariant transformation A signal s = (s n ) = (δ n0 ) = δ is called unit pulse (at time n = 0) and h = T δ for this pulse is called the (unit) impulse response of the filter T Consider the normed spaces l p = {s l : s p = ( s k p ) 1/p < }, k Z 0 < p < l = {s l : s = sup s k < } k Z The convolution of two signals u, h l is defined as u h = s, with s = (s n ) given by s n = m Z u m h n m, n Z if these sums converge 1

4 Note that the impulse response h completely characterizes the filter T s = T ( m s m (Z m u)) = s m (T Z m u) = m Z m T u = m s m (Z m h) = m s m h n m = (s h) n Exercise 11 Let h l 1, then the sums in the convolution u h converge if u l and T u = u h = h u, u l, h l 1 defines a digital filter T : l l T is a continuous transformation and h is the unit impulse response T δ = h A filter T is called BIBO (bounded input, bounded output) stable if the sequence T u is bounded whenever u is bounded (in some l p sense) Sometimes u 2 2 for u l 2 is called the energy of the signal u A signal u l is called causal if u k = 0 for k < 0 A filter is called causal if it maps causal signals into causal signals Exercise 12 Prove that T is causal iff u k = v k for k < m (T u) k = (T v) k for k < m Exercise 13 A filter T : l l defined by T u = h u and h l 1 with h causal is a causal filter Prove this The z-transform U(z) of a signal u is defined as the formal series U(z) = n Z u n z n Note: In the engineering literature this is usually defined with z 1 instead of z We use here the mathematical convention The space of signals l is often called the time domain, while the space of all the z-transforms is called the frequency domain Note that the z-transform of a convolution is the product of the z-transforms For example let h l 1, u l and y = T u = h u, then Y (z) = H(z)U(z) If h l 1, then H(z) converges for z = 1 to some function If h l 1 and causal, then H(z) converges for z 1 to a (complex) function analytic in z < 1 Another useful observation The shift operator Z in the time domain (Zu) n = (u n 1 ) corresponds in the frequency domain to a multiplication with z u n 1 z n = u n z n+1 = z u n z n = zu(z) n n n We shall abuse the notation z for Z, also in the time domain Thus from now on zu n = u n 1 Thus z represents the (backward) shift or time delay operator 2

5 Exercise 14 Let u = (u k ) be a given signal with samples of the form I I u k = α i e jωik = λ 0 + λ i cos(ω i k + ϕ i ), i= I i=1 k Z for 1 I <, α 0 = λ 0 0, ω 0 = 0 and λ i > 0 for 1 i I, ω i = ω i R, ϕ i = ϕ i = ϕ i R, α i = ᾱ i = (1/2) λ i e jϕ i Let T : l l be defined by T u = h u, h l 1 Then (T u) k = I α i H(e jω i ) e jω ik i= I = λ 0 H(1) + I λ i H(e jω i ) cos (ω i k ϕ i + arg H(e jω i )) The autocorrelation function r = (r n ) for u l 2 is defined by i=1 r n = k u n+k ū k, r = (r n ), n = 0, ±1, ±2, and r 0 = u k 2 is called the total energy or average power of the signal u Note 0 < r 0 < if u l 2 Exercise 15 Prove r n = r n Define R u (z) as the z-transform of the autocorrelation function: R u (z) = n r nz n The function R u (e jω ) defined on the unit circle is called the energy spectrum or power spectrum of u If (r n ) l 1, then R u (z) is a function defined on z = 1 and its Fourier coefficients are given by r n = 1 2π π π e jnω R u (e jω )dω The autocorrelation function (r n ) and the energy spectrum form a Fourier transform pair Note that R u (z) = n ( k u n+k ū k )z n = i,k u i ū k z i k = ( i u i z i ) ( k ū k z k ) with U = U(1/ z) If u l 2, = U(z) U (z) R u (e jω ) = U(e jω ) 2 Indeed, U (e jω ) = U(e jω ) From Fourier analysis we borrow the following isomorphy relation : u = (u n ) l p U(e jω ) = n u n e jnω L p, 1 p 3

6 where L p is the normed space { L p = f : f p = } 1 π 1/p f(e jω ) dω p < 2π π and Parseval s identity u p = U p holds If u l p is a causal signal in l p, then U(z) = k=0 u kz k is analytic in z < 1 and if it belongs to L p, then we say that it is an element from the Hardy space H p H p = {f L p and f analytic in z < 1} 2 Stochastic processes and signals Due to components in the signal such as measurement noise, a signal is often considered as a stochastic entity A sequence s = (s n ) of identically distributed random variables is called a (discrete) (stochastic) signal or a discrete stochastic process In general, a stochastic variable x is a complex function on a probability space (Σ, m), that is a set Σ with a probability measure m ( Σ dm = 1) The expectation operator E is defined for a random variable s as Es = s(σ)dm(σ) = the mean of s If s = (s n ) is a stochastic process with we define L 2 to be the closure of the linear hull Σ E s n 2 <, n Z, m { c n s n : c n C, n Z, m Z} m This L 2 is a Hilbert space with inner product u, v = u(σ)v(σ)dm(σ), u, v L 2 The autocorrelation function is defined by r(k, l) = s k, s l = The autocovariance is defined as Σ = E s k s l Σ s k (σ)s l (σ)dm(σ), c(k, l) = E(s k Es k )(s l Es l ) k, l Z = E s k s l E s k Es l Es l E s k + E s k Es l = E s k s l E s k Es l Clearly, if we have a zero-mean stochastic process then r(k, l) = c(k, l) and in that case r is also called the (auto)covariance function The process s = (s n ) is called stationary if r(k, l) = r(k + m, l + m), k, l, m Z 4

7 This means that the covariance function of a stochastic process depends only on the difference of the indices k and l We denote then r m = r(k, k + m) = s m, s 0 For a stationary process, s k 2 = s 0 2 = r 0 is called its energy or average power Exercise 21 Prove that for a stationary process r m = r m and the Toeplitz forms n i,j=0 c ic j r i j are all positive semi-definite The Fourier transform R of the autocorrelation function is called the (power) spectrum of the stochastic process: R(e jω ) = r m e jmω ; r m = 1 π e jmω R(e jω )dω 2π m π Note that the spectrum has the properties real : R(e jω ) = R(e jω ) positive : R(e jω ) 0 Hence it can be used as a weight function to define another Hilbert space L 2 with inner product f, g = 1 2π π π f(e jω )g(e jω )R(e jω )dω It consists of the square integrable complex functions for which f 2 = 1 2π π In this formalism we can for example write π f(e jω ) 2 R(e jω )dω < r m = z m, 1 More generally, the mapping s n e jnω is an isomorphism (the Kolmogorov isomorphism) between the two Hilbert spaces L 2 and L 2 defined in this section Moreover, the L 2 space of the deterministic case and the L 2 space in the stochastic case are mathematically the same This shows that the deterministic and the stochastic case can be treated in a uniform mathematical framework Let H(z) be a causal filter with impulse response h = (h n ) l 1 and z-transform (transfer function of the filter) H(z) = n=0 h nz n Note that we assume h to be a deterministic sequence Apply as input to the filter a stationary stochastic process u = (u n ), then the output s = (s n ) is also a stationary process and they are related by s n = m u m h n m or S(z) = H(z)U(z) It can be shown that their power spectra are related by R s (e jω ) = R u (e jω ) H(e jω ) 2 = R u (e jω )H(e jω )H (e jω ) A stationary process w = (w n ) is called white noise if E w n w m = δ nm w 2, Ew n = 0 5

8 Hence its power spectrum is constant : R(e jω ) = w 2 Often, one considers normalized white noise, ie, one takes w = 1 Note that white noise is the stochastic equivalent of a deterministic unit pulse in the sense that they have both the same perfectly flat spectrum A signal having a flat spectrum means that it does not contain any significant information The samples are completely uncorrelated to each other The process forgets immediately everything about its past There is no strategy to be detected in the signal observed Thus it is useless to try and predict the next output after observing what has happened in the past In a modern terminology, one would say that it has a completely chaotic behaviour As in the deterministic case, we shall also for stochastic signals use z as a notation for the time delay for example n n A(z)u m = a k u m k for A(z) = a k z k 3 Linear prediction k=0 A major problem in signal processing is to predict its behaviour from the observations of the signal itself That is to say, we want to model the filter (= system) which produces the particular signal we are observing In many application, (eg EEG-signals, speech processing, ) we do not have access to the input signal and we can only observe the output From the observation of the present and the past, we want to predict the next observation by a linear relation Thus we estimate s n by ŝ n, given some observations s n 1, s n 2,, s n p where ŝ n = p a k s n k The coefficients a k are the predictor coefficients and A(z) = n k=0 a kz k, a 0 = 1 is called the predictor The error or residual for s n is e n = s n ŝ n = s n + k=0 p a k s n k = A(z)s n, a 0 = 1 We want to minimize the least squares error, ie the energy of the error signal E p = e 2 which is given by n e n 2 in the deterministic case and by E e n 2 in the stochastic case Note that E e n 2 is independent of n in the stationary case 31 Deterministic case We can describe this as a linear least squares problem with infinitely many equations Define the column vectors s = (s n ) (the signal), e = (e n ) (the prediction errors), ŝ = (ŝ n ) (the predicted signal), and a = (a i ) p i=1 (predictor coefficients) and also the p matrix S with columns zs, z 2 s,, z p s Then e = s ŝ = s + Sa Minimizing e corresponds to finding the least squares solution of the system Sa = s Its solution is given by the normal equations (S H S)a = S H s 6

9 where S H S is a p p matrix and S H s a p 1 vector These are obtained by setting or more explicitly: E a i = 0, i = 1,, p; E p = e 2 ( ) p s n k s n i a k = n n s n s n i, i = 1,, p Exercise 31 Show that the least squared prediction error is given by E p = n s n 2 + p a k ( n s n s n k ) In principle, the summation for n should range over Z Practically however, we can t work with infinite sequences or an infinite system of equations That is why we did not specify the range for the summation of n We can then take two approaches: either we consider only a moderate number of the infinite set of equations Sa = s (and just hope that it will give a meaningful solution as well) or we want to consider the infinite set of equations by all means As we shall see below, this assumes that it is possible by some device to perform infinite summations (or at least approximate them adequately) We accordingly get two different situations The latter being the simpler one (mathematically that is) The methods are known in the literature as covariance and autocorrelation method respectively This terminology is certainly misleading because it has nothing in common with the difference between correlation and covariance 311 The autocorrelation method Here the error is minimized over an infinite duration The normal equations reduce to p r i k a k = r i, 1 i p where r i = s ns n+i Thus (S H S) ik = r i k is a p p Toeplitz matrix (hermitian and positive (semi) definite) The least squared error is given by E p = r 0 + p r k a k Note: In practice, an infinite signal can be made finite by putting a window over it This means that we consider a window (w n ) which is only different from zero outside some interval, say 0 n N 1 Thus we consider s where { s sn w n = n 0 n N 1 0 otherwise The autocorrelation function is then r i (N) = N 1 i n=0 s ns n+i, i 0 Of course a more direct method is to minimize only over a finite interval This is done in the next subsection 7

10 312 The covariance method Here we minimize the prediction error over a finite interval 0 n N 1 The normal equations become p r ik a k = r i0, 1 i p where now r ik = N 1 n=0 s n i s n k The matrix of the system is no longer Toeplitz, but it is still hermitian and positive (semi) definite It has still a special structure as the product of two Toeplitz matrices: S H S Of course, for N, the covariance method reduces to the autocorrelation method 32 Stochastic case Here similar results can be obtained as for the deterministic case Minimizing gives the normal equations: E p = E e n 2 = E s n + p a k s n k 2 p a k Es n k s n i = Es n s n i, 1 i p with least squared error Now we can also distinguish two cases: 321 Stationary process E p = E s n 2 + p a k E s n s n k If the process is stationary, then the autocorrelation function r i of the process r i k = Es n k s n i is independent of n and we get the same Toeplitz system as we had before, which in the stochastic literature is known as the set of Yule-Walker equations 322 Nonstationary case The autocorrelation function r(n k, n i) = E s n k s n i does depend on n now and the system we have to solve is no longer Toeplitz However, when the process is close to stationary, the system will have a matrix that is close to Toeplitz For more details see later sections 8

11 4 The filter model 41 The autoregressive model In the previous analysis, we did not take into account what the input was However, if we want to simulate (model) the signal, we should produce it by some system (filter) which has got to have some input, otherwise nothing would happen Observe that if we filter the signal s through the finite impulse response filter (predictor) A(z) = p 0 a kz k, a 0 = 1, then the output is the error signal (e n ): or explicitly: A(z)S(z) = E(z) or A(z)s n = e n, n Z e n = p a k s n k = s n + k=0 p a k s n k = s n ŝ n If we invert this filter and feed the error e as input, we shall get the signal s reconstructed Mathematically this is simply writing S(z) = 1 A(z) E(z), which means that we have to build a filter with transfer function 1/A(z), which has an infinite impulse response Unfortunately, for practical reasons, it is impossible to apply the exact error So the real problem is to find out what input signal to apply which can replace the error e Therefore we make the following reasoning Our objective was to extract as much information as possible from the signal Thus what remains (the error) should be completely meaningless, ie it should be random noise which is as white as possible In the deterministic case, this is a unit impulse However, when the filter with transfer function 1/A(z) is driven by a unit pulse, the impulse response will always start with h 0 = 1 because A(0) = 1 This need not be true for the signal we want to model, since s 0 need not be 1 So, we need some scaling factor G We propose a modeling filter of the form G/A(z), G > 0, such that if u (impulse or white noise) is the input (with flat spectrum U 2 = 1), then the output s is a model (approximation) for the true signal s The factor G is called the gain of this filter Thus the system which models the signal looks like S (z) = G A(z) U(z) = 1 A(z) E (z), u n G A(z) s n We see that an equivalent formulation is that we use 1/A(z) as transfer function and apply the input E (z) = GU(z) Thus we can also say that we model the signal by replacing the true error e by the approximation e n = Gu n with u = (w n ) (unit) white noise in the stochastic case and u = (δ n0 ) a unit impulse in the deterministic case The filter A(z) itself is called the whitening filter because it makes e n = A(z)s n as white as possible So, our problem is to find a value of the gain G It is defined by requiring that the energy of the model s is equal to the energy of the true signal s Now, requiring that s 2 = s 2, corresponds to requiring that we have equality of the Fourier coefficients r 0 = r 0 Thus we have chosen the model p p s n = a k s n k + Gu n for s n = a k s n k + e n so that it satisfies r 0 = r 0 9

12 Exercise 41 Prove that the autocorrelation function r i of the deterministic signal (s n) satisfies p r 0 = a k r k + G2 r i = p a k r i k, i 1 (Assume the system is causal, thus that s n = 0 for n < 0) Exercise 42 Prove that if r 0 = r 0 then (use previous exercise) r i = r i for i = 1,, p (and hence also r i = r i, i = 1,, p) Warning: this exercise is very hard to solve with our current knowledge You better try again later If the previous exercises are solved, then it follows easily that G 2 = p a k r k = k=0 p a k r k = E p = e 2 (a 0 = 1) k=0 From the previous exercises, it follows that we can reformulate our modeling problem as: find a filter of the form G/A(z), A(z) = p k=0 a kz k, a 0 = 1, such that the autocorrelation function r i of the output (impulse response) matches the autocorrelation function of the signal for p i p Thus G R s (z) 2 = A(z)A (z) = r k zk and R s (z) = r k z k with r k = r k, k = p,, p This is an example of a (0/p) Laurent-Padé approximant 42 Other models The previous example of a model is just one possibility to model a signal A more general model could be a filter of the form q 1 + b l z l H(z) = G B(z) A(z) = G l=1 p 1 + a k z k This is pole-zero model When a k = 0, 1 k p, we get an all-zero model or a moving average model (MA) If b l = 0, 1 l q, we obtain an all-pole model or an autoregressive (AR) model The general case is an ARMA model When we know the transfer function of the given filter, eg, H(z), then we can try Padé approximation to find B(z) and A(z) And as we have seen, we can always assume that we know the transfer function Either the input U(z) and the output S(z) is known, and then we get H(z) = S(z)/U(z), or we do not know the input, and then we assume that the S(z) is generated by some filter for which it is the transfer function If we assume that not the transfer function, but its spectrum, ie, its autocorrelation coefficients are known, then we can try to approximate it with a more general Laurent-Padé approximant To find the transfer function itself, we need the approximating spectrum in the factored form R s (z) = ( G B(z) A(z) 10 ) ( G B ) (z) A (z)

13 which is easily obtained for the denominator, but is more difficult for the numerator One problem is that the spectrum has to be nonnegative on the unit circle The approximating spectrum which is constructed by Laurent-Padé approximation can take negative values there We shall make another suggestion to solve the ARMA case later on 5 Spectral estimation The previous solution to the prediction or modeling problem can also be used to solve a spectral estimation problem in the following sense Note that the prediction error e is related to the signal s by E(z) = A(z)S(z) where A(z) is the predictor polynomial and thus is the spectrum of the signal s given by R(e jω ) = S(e jω ) 2 = E(ejω ) 2 A(e jω ) 2 The spectrum of the approximating signal s is given by Thus So that R (e jω ) = S (e jω ) 2 = R(e jω ) R (e jω ) = E(ejω ) 2 G 2 G 2 A(e jω ) 2 E p = G 2 = 1 π π E(e jω ) 2 dω = G2 R(e jω ) 2π π 2π π R (e jω ) dω This implies that the model is chosen so as to make 1 π R(e jω ) 2π π R (e jω dω = 1 ) Thus we approximate the given spectrum R by R in the previous sense, rather than eg minimizing R R 2! We can also write it as 1 π 1 2π π R(e jω ) 1 R (e jω R(e jω )dω = 0 ) One can in fact prove (but we shall not do it) that the linear prediction method proposes actually minimizes 1 S A 2 G = 1 π 1 2π π S(e jω ) A(ejω 2 ) G R(e jω )dω At the same time we have solved a spectral factorization problem approximately because R(e jω ) = S(e jω ) 2 G A(e jω ) G A (e jω ) with (as we shall see later) one factor (G/A(z)) analytic in z < 1 and the other (G/A (z)) analytic in z > 1 11

14 6 Toeplitz systems and orthogonal polynomials An alternative form to write the normal equations (autocorrelation method/stationary case) is r 0 r 1 r p 1 E p r 1 r 0 a 1 = 0 T pa p = E p e 0 r p r 0 a n 0 This shows the importance in this context to solve Toeplitz systems Let Ĩ be the reversal matrix: with 1 s on the main antidiagonal, then a Toeplitz matrix T satisfies Ĩ T Ĩ = T T Using this property, we can rephrase the previous system as (recall T H = T here) or r 0 r 1 r p r 1 r 0 r p r 0 ĨT p ĨĨA p = ĨE pe 0 = E p e p ā p ā 1 1 = 0 0 E p T pa p = E p e p These normal equations are related to the optimization problem min e 2, which in the frequency domain translates into: find p 2 E p = min a 1,,a p 1 + a k z k = A p 2 with optimal solution the predictor polynomial A p (z) = p k=0 a kz k, a 0 = 1, and the minimum being E p = A p (z) 2 = 1 π p a k e jωk 2 R(e jω )dω 2π π k=0 = 1 π p ā p k e jωk 2 R(e jω )dω 2π π = ϕ p (z) 2 where ϕ p (z) = z p + a 1 z p ā p = z p A p (1/ z) = z p A p (z) In the sequel we shall denote this transformation as ϕ p (z) = A p(z) Note that the column A p indeed contains the coefficients of the polynomial A p(z) Note also that in this Hilbert space k=0 z j 1 π p, ϕ p = ( ā p k z k )z j R(z)dω, 2π π k=0 p 1 π = ā p k z k j R(z)dω 2π k=0 π p = ā p k r j k k=0 z = e jω 12

15 Thus the previous Toeplitz system expresses that z j, ϕ p = 0 for j = 0, 1,, p 1 = E p for j = p In other words, ϕ p is an orthogonal (monic) polynomial with minimal norm z p, ϕ p = ϕ p 2 = E p Thus finding a p-th order predictor is equivalent to finding a (monic) orthogonal polynomial in this weighted Hilbert space Exercise 61 Prove that in any Hilbert space, the polynomial monic and of minimal norm is an orthogonal one Thus min{ p 2, p Π n and monic} is obtained for p = ϕ n where ϕ n is the monic orthogonal polynomial It is uniquely defined up to a constant factor of modulus 1 7 Computation of the predictor Special methods exist that require only O(p 2 ) operations to solve structured p p systems such as Toeplitz systems A basic method is called the Levinson algorithm (1947)7 Define for p = 0, 1, the Toeplitz matrices T p by and define A p = 1 a 1 a p T such that T p ij = r i j ; i, j = 0,, p, r i = r i T p A p = E p 0 0 T = E p e 0 Furthermore set A p = ā p ā 1 1 T Then, as long as E i 1 0 it holds that E 0 = r 0 for p = 1, 2, η p 1 = r p r 1 A p 1 ρ p = η p 1 /E p 1 Ap 1 A p = ρ 0 p E p = (1 ρ p 2 )E p 1 0 A p 1 Exercise 71 Prove that this algorithm gives indeed the solutions of the systems T i A i = E i e 0 for i = 0, 1, Exercise 72 Compute the complexity of this algorithm Note that in this algorithm, to compute the prediction of order p, we also compute all the previous ones and as a by-product we also get all the prediction errors E i Because E p has to be decreasing with p, and it is non-negative 0 E p E p 1 E 0 = r 0 It thus follows that ρ i 1, and if ρ p = 1, then E p = 0, which means that the model matches the given signal perfectly 13

16 The coefficients ρ i are crucial and they are called reflection coefficients (derived from scattering theory see later) or partial correlation (PARCOR) coefficients (from their stochastic interpretation) The predictor A i is called the forward predictor and A i is the backward predictor The latter is explained as follows: Interpreting again z as a delay operator, we have e n = A i (z)s n, A i (z) = 1 z z i A i, A i = 1 a 1 a i T where A i (z) is constructed to minimize the least squared error e 2 Similarly A i (z) appearing in f n = A i (z)s n, A i (z) = 1 z z i A i, A i = ā i ā 1 1 T = s n i + ā 1 s n i ā i s n can be seen as a (backward) prediction of s n i, given s n i+1,, s n, resulting in a backward prediction error f n It is constructed so as to minimize the least squared error f 2 In a stochastic context, the processes (f n ) and (e n ) are called the backward and forward innovation processes Exercise 73 Prove that in terms of the polynomials A i (z) and A i (z), the Levinson algorithm gives the recursion A i (z) A i (z) = 1 ρi ρ i 1 z Ai 1 (z) A i 1 (z) Ai 1 (z) = ϑ i (z) A i 1 (z) With the interpretations above, we can derive the circuit of figure 1 where e n (i) means the forward prediction error for s n when using predictor A i (z) etc This is called a lattice realization of one e n (i 1) f n (i 1) e n (i) ρ i z ρ i f n (i) e n (i 1) f n (i 1) i e n (i) f n (i) Figure 1: Lattice filter section of the filter Thus for such a realization, we do not need the predictor coefficients a i explicitly, it is sufficient to know the reflection coefficients ρ i The whitening filter (which analyses the signal) can be realized with such blocks as illustrated in figure 2 The modeling filter (which synthesises the signal) has the realization of figure 3 where the i-th block has now the inverse lattice form as depicted in figure 4 Exercise 74 Check the inversion as suggested by figure 4 Note that the normalized predictors which generate e n(i) = e n(i) G i are given by the normalized recursion A i (z) A i (z) = 1 1 ρi 2 1 ρi ρ i 1 = A i(z) G i s n = A i(z)s n ; G 2 i = E i z A i 1 (z) A i 1 (z) A = θ i (z) i 1 (z) A i 1 (z) 14

17 s n e n (1) e n (2) e n (i 1) e n (i) 1 2 i f n (1) f n (2) f n (i 1) f n (i) Figure 2: Whitening filter e n (i) e n (i 1) e n (i 2) e n (1) s n i i 1 1 f n (i) f n (i 1) f n (i 2) f n (1) Figure 3: Modeling filter e n (i) f n (i) + ρ i ρ i + z e n (i 1) f n (i 1) e n (i) f n (i) i e n (i 1) f n (i 1) Figure 4: Inverse lattice section 15

18 8 Example : seismic signal processing Consider a layered medium (like the earth surface) with layers of equal thickness (See figure 5) A wave is propagating through this medium At every interface between two layers the wave is x Figure 5: A layered medium t partially reflected and partially transmitted, according to the relative physical properties of the two layers meeting there The wave equation has two particular solutions 2 u(x, t) x 2 = 1 2 u(x, t) c 2 t 2 (c = speed) u(x, t) = ẽ(t + x ) (ascending wave) c u(x, t) = f(t x ) (descending wave) c We also normalized the time axis, so that it takes half a time unit for a wave to travel through a layer Thus at integer time intervals, a wave reaches the surface To describe the scattering of this wave in the medium we use the following notations : f n(i) = descending wave, emerging from boundary i at time n f n (i) = descending wave, reaching boundary i + 1 at time n ẽ n (i) = ascending wave, emerging from boundary i + 1 at time n ẽ n(i) = ascending wave, reaching boundary i at time n These are illustrated in figure 6 We also suppose that the medium is passive and lossless, which means that no energy is added to the wave, nor that it looses energy Using z as a time delay : z 1/2 s n ( ) = s n 1 ( ), we have that in the homogeneous layer i : 2 fn (i) z 1/2 0 f = n (i) ẽ n (i) 0 z 1/2 ẽ n(i) which says that eg, signal f n (i) which emerges from boundary i at time n will reach boundary i + 1 unaltered, half a time unit later, where according to our notation, it is called f n (i) Thus f n (i) = z 1/2 f n (i) = f n 1 2 (i) The scattering matrix Σ i+1 which describes the interaction at boundary i + 1 should be unitary (orthogonal), because if f n (i + 1) fn (i) = Σ i+1 ẽ n (i) ẽ n(i + 1) 16

19 x t i layer i f (i) f (i) ẽ (i) ẽ (i) i + 1 ẽ (i + 1) f (i + 1) Figure 6: Notation of interacting waves then, no energy should be lost; which means f n (i) 2 + ẽ n(i + 1) 2 = n n n f n(i + 1) 2 + n ẽ n (i) 2 (the incoming energy equals the outgoing energy) This is guaranteed when Σ H i+1 Σ i+1 = I Thus we may choose Σ i+1 to be a Givens rotation cos θi+1 sin θ Σ i+1 = i+1 sin θ i+1 cos θ i+1 or, in the complex case ci+1 s Σ i+1 = i+1 s i+1 c i+1, c 2 i+1 + s i+1 2 = 1, c i+1 R Exercise 81 Prove that with this choice of the scattering matrix, the interaction can also be written in the form f n (i + 1) fn (i) ẽ = θ i+1 n(i + 1) ẽ n (i) with θ i+1 = 1 c i+1 1 s i+1 s i+1 1 = 1 1 ρi ρ i+1 ρ i+1 1, ρ i+1 = s i+1 The θ i+1 -matrix of the previous exercise is called a chain scattering matrix It represents a hyperbolic rotation It is unitary with respect to the indefinite metric 1 0 J = 0 1 ie, it is J-unitary, which means that θ H i+1jθ i+1 = J 17

20 This relation is (in the real case) satisfied by a matrix of the form cosh θi+1 sinh θ θ i+1 = i+1 sinh θ i+1 cosh θ i+1 Thus we may describe the complete effect of a layer, including its boundary by f n (i + 1) fn (i) f ẽ = θ n (i) i+1 n(i + 1) ẽ n (i) ẽ n(i) When switching to the time-delayed signals (waves) = θ i+1 z 1/2 0 0 z 1/2 e n(i) = z i/2 ẽ n(i) and f n(i) = z i/2 f n (i), this reduces to f n (i + 1) e n(i + 1) = 1 1 ρi ρ i+1 ρ i+1 1 z f n (i) e n(i) which corresponds to the normalized recursions for the forward and backward prediction errors This application explains the term reflection coefficient for ρ i Looking at figure 6 and the defining relation of Σ i+1, you see when we write the outgoing waves f n (i + 1) and ẽ n (i) in terms of the incoming waves ẽ n(i + 1) and f n (i) as f n(i + 1) = c i+1 fn (i) s i+1 ẽ n(i + 1) ẽ n (i) = s i+1 fn (i) + c i+1 ẽ n(i + 1) then you can check on the figure that the parts of the right hand side with a coefficient c i+1 correspond to transmitted parts and the parts with an s i+1 -coefficient to reflected parts Hence the term reflection coefficient for ρ i+1 = s i+1 The previous analysis suggests the following method to get a model for the earth surface, which is of crucial interest for eg, oil companies We apply an impulse as input at the surface (eg, an explosion) and measure the response of the (pressure) wave at the surface When the Levinson algorithm computes the reflection coefficients for the signal, it actually computes the relative density properties of the different earth layers below the surface, which may disclose the presence of a fluid (oil) or a gas bubble A problem of the above type, where you know the effect of some scattering medium to a signal, and where you want to model the medium is called an inverse scattering problem The same analysis can be applied to eg, an acoustic signal, scattered by an organ pipe or the vocal tract for a speech signal For small time intervals, a speech signal may be considered as stationary Computing the reflection coefficients results in a model for the speech-sound and they can be used to synthesize the speech again by a modeling filter Sending all the reflection coefficients (plus some extra information about pitch etc) uses much less bandwidth than sampling the speech signal and send these samples It is possible to get a considerable data reduction in this way Of course, in such applications the analysis and synthesis should be done in real time The CORDIC (coordinate rotation for digital computing) chip performs a hyperbolic rotation in a few clockcycles 9 Stability If H(z) L 2 is a filter which is causal (and has finite energy) then H(z) = k=0 h kz k represents a function analytic in z < 1 Thus a rational H(z) L 2 is stable ( H 2 ) if all its poles are in 18

21 z > 1 A rational filter is called minimal phase if all its zeros are in z > 1 For an AR model H(z) = G A n (z) = G ϕ n(z), A n(z) = ϕ n(z) = z n ϕ n (1/ z) = z n ϕ n(z) the filter is certainly minimal phase (all zeros at z = ) and it will be stable if all the zeros of A n (z) are in z > 1 or equivalently, the zeros of ϕ n (z) are all in z < 1 Exercise 91 Prove that if ϕ n (α) = 0 then ϕ n(1/ᾱ) = 0 too Since we identified ϕ n as an orthogonal polynomial, this property of its zeros is a consequence of that theory We can however give a direct proof Lemma 91 Define F n (z) = A n(z)/a n (z) with A n (z) = n a k z k Let A n (z) 0 for z 1 Then F n (z) (and zf n (z)) maps the unit disc onto itself Proof Writing A n(z) = ā n 0 i=1 (z α i), with all α i < 1, we check easily that z α i 1 ᾱ i z < 1, = 1, > 1 when z < 1, = 1, > 1 and thus, because also F n (z) = ā0 a 0 k=0 n i=1 z α i 1 ᾱ i z, F n (z) < 1, = 1, > 1 when z < 1, = 1, > 1 The same holds for zf n (z) Hence the functions are injective They are also surjective because the inverse of z w = z α 1 ᾱz is of the same form and thus also injective is w z = w + α 1 + wᾱ Theorem 92 A n (z) has all its zeros in z > 1 iff all ρ i < 1, i = 1,, n Proof This goes by induction on n We only prove the induction step Let A k (z) 0 for z < 1 and ρ k+1 < 1 By the previous lemma, zf k (z) = 1 ρ k+1 has no solution in z 1 Thus A k+1 (z) = A k (z) ρ k+1 za k (z) 0 in z 1 Conversely, if the zeros of A k are in z > 1, the product of its zeros is ( 1) k+1 ρ 1 k (a 0 = 1, a k = ρ k ) Hence ρ k < 1 Furthermore A k 1 (z) = 1 1 ρ k 2 A k(z) + ρ k A k (z) and this can only be zero if F k = 1/ρ k, which can only happen for z > 1 We may thus conclude that the Levinson algorithm generates a filter that is guaranteed to be minimal phase and stable, as long as the reflection coefficients are bounded by 1 When ρ n = 1, then E n = 0 and the approximation is exact Exercise 92 Prove that if ρ k < 1, k = 1, 2,, n 1 and ρ n = 1, then A n (z) = ηa n(z) with η = 1 and its zeros are all of modulus 1 19

22 10 Orthogonal polynomials We have seen before that the backward predictors ϕ n = A n are orthogonal polynomials wrt an inner product whose weight is the power spectrum R(e jω ) Its Fourier coefficients can be expressed as z n, 1 = r n, n Z They satisfy the recurrence relation (recall that ϕ n is monic) ϕ n (z) = zϕ n 1 (z) ρ n ϕ n 1(z), ϕ 0 = 1 or also with ϕn (z) ϕ n(z) ϑ n (z) = ϕn 1 (z) = ϑ n (z) ϕ n 1 (z) 1 ρ n ρ n 1 z Exercise 101 The polynomial ϕ n is orthogonal to span{1, z,, z n 1 } Prove that ϕ n is orthogonal to span{z,, z n } Exercise 102 Prove that the Levinson algorithm can be reformulated in terms of orthogonal polynomials as follows ϕ 0 = 1 E 0 = 1 2 = ϕ 0 2 = r 0 for i = 1, 2, η i 1 = zϕ i 1, ϕ i 1 ρ i = η i 1 /E i 1 ϕ i (z) = zϕ i 1 (z) ρ i ϕ i 1 (z) E i = E i 1 (1 ρ i 2 ) Exercise 103 Prove that det T n = E 0 E 1 E n and thus that E n = det T n / det T n 1 Exercise 104 Derive from the recurrence relation the Christoffel-Darboux type formula (1 xȳ) n k=0 ϕ k (x)ϕ k (y) E k = ϕ n(x)ϕ n(y) xȳϕ n (x)ϕ n (y) E n = ϕ n+1 (x)ϕ n+1 (y) ϕ n+1(x)ϕ n+1 (y) E n The polynomials of the second kind are defined by 1 π e jω + z ψ n (z) = 2π π e jω z ϕ n(e jω ) ϕ n (z)r(e jω )dω, n > 1 r0, n = 0 Exercise 105 Prove that ψ n is a polynomial of degree n 20

23 Exercise 106 Define Ω(z) = 1 2π π π D(t, z)r(t)dω, D(t, z) = t + z t z, t = ejω Prove that { Ω + r0, n = 0 ϕ n Ω + ψ n = z n g(z), n > 0, g(z) analytic in z < 1 ϕ nω ψ n = { Ω r0, n = 0 z n+1 h(z), n > 0, h(z) analytic in z < 1 This shows that ϕ n /ψ n is a two-point Padé approximant for Ω Exercise 107 Prove that the ψ n satisfy the recurrence relation ψ n (z) = zψ n 1 (z) + ρ n ψ n 1(z) Exercise 108 Prove that the recurrences for ϕ n and ψ n can be collected in one relation as ϕn (z) ψ n (z) ϕn 1 (z) ψ ϕ n(z) ψn(z) = ϑ n (z) n 1 (z) 1 ρ ϕ n 1 (z) ψ n 1 (z), ϑ n (z) = n z 0 ρ n and thus that the following holds Θ n (z) = ϑ n (z) ϑ 1 (z) = 1 2r 0 r0 ϕ n + ψ n r 0 ϕ n ψ n r 0 ϕ n ψ n r 0 ϕ n + ψ n which becomes particularly simple if r 0 is normalized to be 1 Exercise 109 Take determinants in the previous exercise to obtain ϕ n (z)ψ n(z) + ϕ nψ n (z) = 2r 0 n 1 (1 ρ i 2 )z n = 2E n z n (101) These polynomials were first studied by Szegő Therefore, they are called Szegő polynomials and in this context, the parameters ρ n are called Szegő coefficients They are the complex conjugates of the reflection coefficients The Levinson algorithm (or its reformulation we gave above) can thus be considered as a fast Gram-Schmidt procedure (only O(n 2 ) flops) to construct orthogonal polynomials Exercise 1010 Set x = y = z in the Christoffel-Darboux formula and derive in this way that ϕ n(z) 2 > 0 for z < 1 Thus all the zeros of ϕ n are in z 1 The zeros of ϕ n(z) can not be on z = 1, because if ϕ n(z 0 ) = 0, z 0 = 1, then ϕ n (z 0 ) = z0 nϕ n(z 0 ) = 0, and this is impossible by (101) This is an alternative proof for the stability of the filter in the AR model Note: The orthonormal polynomials are φ k = ϕ k /G k with G 2 k = E k = ϕ k 2 These satisfy the normalized recursion φn φn 1 φ = θ n (z) n φ n 1 21

24 with normalized transition matrix 1 1 ρ θ n (z) = n 1 ρn 2 ρ n 1 z = 1 1 ρn 2 ϑ n(z) We have proved that the AR filter is stable by construction However, the system which defines the polynomials may be ill conditioned (Note that we basically use the unmodified moments r n = z n, 1 ) The better the approximation of the signal, the smaller the prediction error E n, thus the smaller det T n which means that the matrix of the defining Toeplitz system comes closer and closer to a singular matrix This ill conditioning means that small errors in the computation of the moments r n, perturb the Toeplitz matrix so that it is no longer positive definite and this will imply that the ϕ n no longer have all their zeros in z < 1 Moreover rounding errors during the computation in the Levinson algorithm may propagate fast, which gives erroneous predictor coefficients and also this will affect its stability properties Thus it is advisable to check the stability of the actually computed filter This reasoning is still true in a much wider context: inverse problems (such as the inverse scattering problem) are often very ill conditioned 11 Stability test If we have to check if all the zeros of a polynomial are in z < 1, we could compute its zeros, but this is of course a costly procedure Instead, we could use the previous theory in reverse order We know that a (monic) polynomial ϕ n has all its zeros in z < 1 iff all the ρ k < 1 iff all the E k = ϕ k 2 > 0 iff all the Toeplitz matrices T k are positive definite iff det T k > 0 for all k = 1, 2,, n All this information can be extracted from ϕ n by inverting the Szegő recursion This gives the following algorithm: ρ n = ϕ n (0) for k = n, n 1,, 2 ϕ k 1(z) = ϕ k(z) + ρ k ϕ k (z) z(1 ρ k 2 ) ρ k 1 = ϕ k 1 (0) Exercise 111 Prove the previous algorithm What is its complexity? The previous algorithm is known as the Lehmer-Schur or Jury test 12 Fast (inverse) Cholesky factorization Suppose that we have computed the least squares (polynomial) predictors and the corresponding least squares errors In terms of the (monic) orthogonal polynomials ϕ k, we thus have z k, ϕ i = δ ki E k, k i = 0, 1,, n We can express this orthogonality relation in terms of matrices as follows Denote by U n the unit upper triangular matrix whose k-th column consists of the coefficients of ϕ k (and zeros below the diagonal) The orthogonality can then be expressed as r 0 r 1 r n T n U n = r 1 r 0 U n = r n r 0 22 E 0 E 1 =: F n E n

25 Because T n is hermitian, is upper triangular and therefore U H n T H n = U H n T n = F H n U H n T n U n = D n is diagonal: D n = diag(e 0,, E n ) Note also that Un H = D n Fn 1 This relation implies the upper triangular diagonal lower triangular factorization of T 1 T 1 n = U n Dn 1 Un H This is a kind of Cholesky factorization for Tn 1 We would rather expect Tn 1 to be factorized as a product of a lower triangular diagonal upper triangular But the latter can also be obtained as follows Let us as before denote by Ĩ the reversal operator (matrix) with 1 s on the anti-diagonal Then ĨT nĩ = (T n) T and thus Tn 1 = ĨT T n Ĩ Therefore T 1 n = (ĨŪnĨ)(ĨD 1 n Ĩ)(ĨU T n Ĩ) is indeed of the required form: lower triangular diagonal upper triangular Exercise 121 Check that ĨŪnĨ is the unit lower triangular matrix which contains in its k-th column the coefficients of the polynomial ϕ n k (and zeros above the diagonal) Thus we have identified the Levinson (or Szegő) algorithm as a fast (O(n 2 ) instead of O(n 3 ) flops) Cholesky algorithm for computing the triangular (Cholesky) factors of the (hermitian positive definite) inverted Toeplitz matrix Tn 1, starting from the data in the matrix T n itself Therefore, it is sometimes called a fast inverse Cholesky algorithm The natural question now is whether we can also find the Cholesky factors of T n itself? Of course T n = Un H D n Un 1 = L n D n L H n = F n Dn 1 Fn H with L n = F n D 1 n = Un H, a unit lower triangular matrix But, the question is, can we find the columns of L n (or F n ) directly, ie, without first computing U n? The answer is yes Therefore, consider the vectors (V 1k is the k-th column of U n ) V 1k = ā k ā T C (n+1) 1 V 2k = a 0 a k T C (n+1) 1 where ϕ k (z) = k i=0 a iz i = A k (z) Then we can easily check that (z denotes the down shift operator) η k E k 0 0 where and hence z 0 T n V 1k V 2k 0 1 = E k η k η k = r k+1 r 1 a 0 a k T η k = r 1 r k 1 ā k =: Q 1k Q 2k (k zeros) ā 0 T 23

26 is its complex conjugate and E k = ϕ k 2 = r k r 0 ā k ā 0 T = r 0 r k a 0 a k T These quantities are exactly the ones that showed up in the Levinson algorithm and the reflection coefficient is given by ρ k+1 = η k /E k It is now easy to verify that 1 ρ Q 1k Q 2k k+1 = Q ρ k+1 1 1k Q 2k ϑ T k+1 0 E k z 0 = T n V 1k V 2k ϑ T 0 1 k+1 = E k+1 0 = P 1,k+1 P 2,k+1 (k + 1 zeros) η k+1 The first column of the right hand side is the (k + 1)-st column of F n This gives in principle the algorithm we are looking for since we have again some η k+1 and some E k+1 which will define some ρ k+2 = η k+1 /E k+1 and another step can be performed As you may see from these calculations, it is not necessary to keep all the zeros in the algorithm to compute the reflection coefficients; ie, we may keep from the vectors P 1k P 2k only the elements (the operation z 1 now cuts off the top element of a vector) 1k 2k = P 1k P 2k z k = E k 0 η k, P 1k P 2k = Q 1k Q 2k z z It is then not difficult to see that these reduced vectors satisfy the recurrence relation z 0 1,k+1 2,k+1 = 1k 2k ϑ T 0 1 k+1 z 1 where the reflection coefficients ρ k+1 = η k /E k can be found from the elements E k and η k which are the top elements of the reduced vectors The initial conditions for this procedure are easily found to be = r 0 0 r 1 r 1 An operation count for this procedure will reveal that it requires also O(n 2 ) flops and it is thus a fast Cholesky algorithm for (hermitian positive definite) Toeplitz matrices The representation of the down-shift operation by a multiplication with z and the cutting (upshift) operator as a multiplication with z 1 makes sense in the following way Let n and define the functions (or formal series) D k (z) = D 1k (z) D 2k (z) = z k 1 1 z z 2 1k 2k 24 r n r n z 0 0 1,

27 then the previous recurrence shows that 1 ρ D 1k (z) D 2k (z) k+1 ρ k+1 1 z = D 1,k+1 (z) D 2,k+1 (z) where the operation denoted by z is now a genuine multiplication with z Note that if the D ik (z) converge in a neighborhood of z = 0 and indeed represent functions there, then 13 The Schur algorithm ρ k+1 = lim z 0 D 2k (z) D 1k (z) The functions D ik (z) introduced at the end of the previous section have a meaning Exercise 131 Define Γ k (z) = D 2k (z)/d 1k (z) Prove that the recurrence for the D ik (z) of the previous section implies the recurrence for the Γ k functions Γ k+1 (z) = 1 z Γ k (z) ρ k+1 1 ρ k+1 Γ k (z), ρ k+1 = Γ k (0), The functions Γ k are all Schur functions This means that they belong to the class We shall prove this by induction We start with the function S = {Γ : Γ 1 in z 1 and analytic in z < 1} Ω 0 (z) = r r k z k This is a Carathéodory function or positive real function, which means that it belongs to the class To prove the latter, note that for z < 1, and its real part is C = {Ω : RΩ 0 in z 1 and analytic in z < 1} Ω 0 (z) = 1 2π = 1 2π π π π π ( RΩ 0 (z) = 1 π 2π = 1 2π ) e jkω z k R(e jω )dω e jω + z e jω z R(ejω )dω π π π R ejω + z e jω z R(ejω )dω 1 z 2 e jω z 2 R(ejω )dω which is positive for z < 1 because the kernel (Poisson kernel) and the weight R(e jω ) are positive On the boundary, RΩ 0 (e jω ) = 1 r r k e jkω + r r k e jkω 2 = r k e jkω = R(e jω ) 25

28 and hence positive Because Ω 0 (z) has a causal expansion in z < 1, it is analytic This proves that Ω 0 C Next we note that Γ 0 (z) = 1 Ω 0 (z) Ω 0 (0) z Ω 0 (z) + Ω 0 (0), Ω 0(0) = r 0 > 0 The ratio satisfies Ω 0 (z) Ω 0 (0) 2 Ω 0 (z) + Ω 0 (0) = Ω 0(z) 2 Ω 0 (0)2R(Ω 0 (z)) + Ω 0 (0) 2 Ω 0 (z) 2 + Ω 0 (0)2R(Ω 0 (z)) + Ω 0 (0) 2 1 Because the numerator is zero for z = 0, we can divide out z (Schwarz lemma) and get a function Γ 0 which is thus a Schur function All the other Γ k are also Schur functions as is proved in the induction step below Note that the mapping z z α 1 αz, α < 1 is a bijection of Ĉ = C { } onto itself, which maps the unit circle onto itself, the open unit disk onto itself and the open outside of the unit disk onto itself Exercise 132 Prove this bijection property for the previous Moebius map Thus Γ k ρ k+1 1 ρ k+1 Γ k will again be a Schur function (since ρ k+1 = Γ k (0) is bounded by 1) Because this function is zero for z = 0, we can divide out z and still get a Schur function, which proves that Γ k+1 is in the class S I Schur used the recursion for Γ k to check whether a given function Γ 0 was in the class S He proved the following theorem (for this reason, the ρ k are also referred to as Schur coefficients) Theorem 131 (Schur 1917) Let Γ 0 be a given function Define recursively Γ k+1 (z) = 1 z Γ k (z) ρ k+1 1 ρ k+1 Γ k (z), ρ k+1 = Γ k (0) Then Γ 0 S iff ρ k < 1 for k = 1, 2, or ρ k < 1 for k = 1, 2,, n 1 and ρ n = 1 and ρ n+k = 0 for k = 1, 2, In the latter case, Γ 0 is a Blaschke product of degree n: Γ 0 (z) = P (z) P (z), = c n i=1 P (z) Π n 14 The general pole-zero model z α i 1 α i z, α i < 1, i = 1,, n, c = 1 If we allow a rational model of ARMA type, the problem becomes a nonlinear one (AR modelling was essentially reduced to a linear least squares problem) Hence we shall have to apply iterative methods to minimize the error However, if we have some idea about the location of the zeros, we can propose a filter of the form H(z) = G (1 ᾱ 1z) (1 ᾱ n z) A n (z) = G D n (z) 26

29 where 1/ᾱ i are the estimated zeros, with α i < 1 to make it minimal phase, Define D n (z) = A n(z) π n (z), π n(z) = n (1 ᾱ k z), A n Π n ϕ n (z) = A n(z) π n (z) then both ϕ n and D n are in the space { } p n n L n =, p n Π n, π n (z) = (1 ᾱ k z) π n of rational functions with prescribed poles (outside the closed unit disk) Exercise 141 Prove that the space L n is also spanned by the basis functions {ζ k, k = 1, 2,, n} with ζ 0 = 1, ζ k (z) = z α k, k = 1, 2,, n 1 ᾱ k z or by {B k, k = 1,, n} where Exercise 142 Prove that B 0 = ζ 0, B k = B k 1 ζ k, k = 1,, n ϕ n (z) = D n(z) := B n (z)d n (z) = B n (z)d n (1/ z) L n Note that when all α k = 0, the space L n reduces to the space Π n of polynomials and we are back in the AR case If we want for a stochastic process (s n ) to predict s 0 from its past, ie, from span{s k, k = 1, 2, } in best L 2 sense, we should find (h n ) such that 2 s 0 + h k s k is minimal Under the Kolmogorov isomorphism, this is equivalent to finding the minimum of H 2, where H(z) = 1 + h k z k and where the norm is taken in the Hilbert space with inner product Thus we look for f, g = 1 2π π π f(e jω )g(e jω )R(e jω )dω min{ H 2 : H H 2, H(0) = 1} In the AR model, this (infinite dimensional) problem was approximated by minimizing over polynomial subspaces Π n H 2 However, it is perfectly possible to minimize over any other finite dimensional subspace of H 2, for example the subspaces L n We should then solve the problems min{ f 2, f L n H 2, f(0) = 1} 27

30 To solve this problem (theoretically and practically) we need the concept of reproducing kernel Let H be a Hilbert space, then we call k(z, w) a reproducing kernel for H, if w k(z, w), f(z) = f(w), f H Exercise 143 Prove that if ϕ k, k = 0, 1, is an orthogonal basis for H, then k(z, w) = k ϕ k (z)ϕ k (w) ϕ k 2 is a reproducing kernel in H We have now the following Theorem 141 The optimization problem min{ f 2, f H, f(w) = 1} in the Hilbert space H with reproducing kernel k is solved for and the minimum is 1/k(w, w) f(z) = g w (z) = k(z, w) k(w, w) Proof It is clear that g w (w) = 1 and g w 2 = 1/k(w, w) Let f = k a kϕ k (z) be an arbitrary function in H such that f(w) = 1 (suppose ϕ k forms an orthonormal basis) Then we have to minimize f 2 = a k 2 k with the side condition k a kϕ k (w) = 1 By the Cauchy-Schwarz inequality 1 = k a k ϕ k (w) 2 ( k a k 2 )( k ϕ k (w) 2 ) which implies 1 k(w, w) = 1 k ϕ k(w) 2 k a k 2 The lower bound is indeed reached by choosing a k = ϕ k (w)/k(w, w) and this proves the theorem A possible solution method for the prediction problem is now clear: we first construct the orthonormal functions ϕ 0, ϕ 1,, ϕ n, eg, by the orthogonalization of the basis B 0, B 1,, B n and we so get the reproducing kernel n ϕ k (z)ϕ k (w) k n (z, w) = ϕ k 2 k=0 It is even possible to find a recurrence relation for the reproducing kernels directly The optimal predictor from L n is then given by H(z) = Gk n(0, 0) k n(z, 0) with k n(z, 0) = B n (z)k n (z, 0) 28

Lab 9a. Linear Predictive Coding for Speech Processing

Lab 9a. Linear Predictive Coding for Speech Processing EE275Lab October 27, 2007 Lab 9a. Linear Predictive Coding for Speech Processing Pitch Period Impulse Train Generator Voiced/Unvoiced Speech Switch Vocal Tract Parameters Time-Varying Digital Filter H(z)

More information

Advanced Digital Signal Processing -Introduction

Advanced Digital Signal Processing -Introduction Advanced Digital Signal Processing -Introduction LECTURE-2 1 AP9211- ADVANCED DIGITAL SIGNAL PROCESSING UNIT I DISCRETE RANDOM SIGNAL PROCESSING Discrete Random Processes- Ensemble Averages, Stationary

More information

Parametric Signal Modeling and Linear Prediction Theory 1. Discrete-time Stochastic Processes (cont d)

Parametric Signal Modeling and Linear Prediction Theory 1. Discrete-time Stochastic Processes (cont d) Parametric Signal Modeling and Linear Prediction Theory 1. Discrete-time Stochastic Processes (cont d) Electrical & Computer Engineering North Carolina State University Acknowledgment: ECE792-41 slides

More information

Statistical and Adaptive Signal Processing

Statistical and Adaptive Signal Processing r Statistical and Adaptive Signal Processing Spectral Estimation, Signal Modeling, Adaptive Filtering and Array Processing Dimitris G. Manolakis Massachusetts Institute of Technology Lincoln Laboratory

More information

Katholieke Universiteit Leuven Department of Computer Science

Katholieke Universiteit Leuven Department of Computer Science Separation of zeros of para-orthogonal rational functions A. Bultheel, P. González-Vera, E. Hendriksen, O. Njåstad Report TW 402, September 2004 Katholieke Universiteit Leuven Department of Computer Science

More information

Elementary linear algebra

Elementary linear algebra Chapter 1 Elementary linear algebra 1.1 Vector spaces Vector spaces owe their importance to the fact that so many models arising in the solutions of specific problems turn out to be vector spaces. The

More information

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product Chapter 4 Hilbert Spaces 4.1 Inner Product Spaces Inner Product Space. A complex vector space E is called an inner product space (or a pre-hilbert space, or a unitary space) if there is a mapping (, )

More information

Linear Algebra Massoud Malek
