Necessary and Sufficient Conditions for Stability of LMS

Lei Guo, Lennart Ljung and Guan-Jun Wang

First Version: November 9, 1995. Revised Version: August 2, 1996.

Abstract. In a recent work [7], some general results on exponential stability of random linear equations were established, which can be applied directly to the performance analysis of a wide class of adaptive algorithms, including the basic LMS ones, without requiring stationarity, independence or boundedness assumptions on the system signals. The current paper attempts to give a complete characterization of the exponential stability of the LMS algorithms, by providing a necessary and sufficient condition for such stability in the case of possibly unbounded, nonstationary and non-\phi-mixing signals. The results of this paper can be applied to a very large class of signals, including those generated from, e.g., a Gaussian process via a time-varying linear filter. As an application, several novel and extended results on the convergence and tracking performance of LMS are derived under various assumptions. Neither stationarity nor Markov chain assumptions are required in the paper.

This work was supported by the National Natural Science Foundation of China and the Swedish Research Council for Engineering Sciences (TFR).

Lei Guo: Institute of Systems Science, Chinese Academy of Sciences, Beijing 100080, P. R. China. Lguo@iss03.iss.ac.cn.
Lennart Ljung: Department of Electrical Engineering, Linköping University, S-581 83 Linköping, Sweden. Ljung@isy.liu.se.
Guan-Jun Wang: Department of Mathematics, The Central University for Nationalities, Beijing 100081, P. R. China.
1 Introduction

1.1 The Contribution

The well-known least mean squares (LMS) algorithm, aiming at tracking the "best linear fit" of an observed (or desired) signal {y_k} based on a measured d-dimensional (input) signal {\phi_k}, is defined recursively by

    x_{k+1} = x_k + \mu \phi_k (y_k - \phi_k^T x_k),    x_0 \in R^d,    (1)

where \mu > 0 is a step-size. Due to its simplicity, robustness and ease of implementation, the LMS algorithm is known as one of the most basic adaptive algorithms in many areas, including adaptive signal processing, system identification and adaptive control, and it has received considerable attention in both theory and applications over the past several decades (see, among many others, the books [20], [19] and [2], the survey [14], and the references therein). Also, it has recently been found that LMS is H^\infty-optimal in the sense that it minimizes the energy gain from the disturbances to the predicted errors; it is also risk-sensitive optimal and minimizes a certain exponential cost function (see [11]).

In many situations, it is desirable to know at least the answers to the following questions: Is LMS stable in the mean squares sense? Does LMS have good tracking ability? How can the tracking errors be calculated and minimized?

Now, for a given sequence {\phi_k}, (1) is a linear, time-varying difference equation. The properties of this equation are essentially determined by the homogeneous equation

    x_{k+1} = (I - \mu \phi_k \phi_k^T) x_k    (2)

with fundamental matrix

    \Phi(t, k) = \prod_{j=k+1}^{t} (I - \mu \phi_j \phi_j^T).    (3)
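The recursion (1) is straightforward to implement. As a purely illustrative sketch (not part of the paper), the following Python code runs LMS on synthetic data; the Gaussian regressors, the fixed true parameter and the noise level are all assumptions made for this example:

```python
import numpy as np

def lms(y, phi, mu, x0=None):
    """LMS recursion (1): x_{k+1} = x_k + mu * phi_k * (y_k - phi_k^T x_k)."""
    d = phi.shape[1]
    x = np.zeros(d) if x0 is None else np.asarray(x0, dtype=float)
    for yk, pk in zip(y, phi):
        x = x + mu * pk * (yk - pk @ x)
    return x

# Synthetic demo: estimate a fixed parameter from noisy linear measurements.
rng = np.random.default_rng(0)
d, n = 3, 5000
x_true = np.array([1.0, -2.0, 0.5])
phi = rng.standard_normal((n, d))                 # i.i.d. Gaussian regressors
y = phi @ x_true + 0.1 * rng.standard_normal(n)   # noisy observations
x_hat = lms(y, phi, mu=0.01)
err = float(np.linalg.norm(x_hat - x_true))
```

With these choices the estimate ends up close to the true parameter; the residual error reflects the usual step-size/noise trade-off discussed later in the paper.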
The expression for the tracking errors will then be of the form

    \sum_{k=1}^{t} \Phi(t, k) v(k),    (4)

where {v(k)} describes the error sources (measurement noise, parameter variations, etc.). As elaborated in, e.g., [8] and [6], the essential key to the analysis of (4) is to prove exponential stability of (3). This was also the motivation behind the work of [1]. We shall establish such exponential stability in the sense that for any p \ge 1 there exist positive constants M, \alpha and \mu^* such that

    [E \|\Phi(t, k)\|^p]^{1/p} \le M (1 - \alpha\mu)^{t-k},    \forall t \ge k \ge 0,  \forall \mu \in (0, \mu^*].    (5)

The expectation E here is with respect to the sequence {\phi_k}. Clearly, the property (5) is a property of the sequence {\phi_k} only.

We shall here establish (5) under very general conditions on {\phi_k}. These are of the kind (precise conditions are given in Theorem 2):

- Restrictions on the dependence among the \phi_k: this takes the form that \phi_k is formed by possibly time-varying, but uniformly stable, filtering of a noise source \epsilon_j which is \phi-mixing and obeys an additional condition on the rate of decay of dependence.

- Restrictions on the tail of the distribution of \phi_k: this takes the form that

    E[\exp(\eta \|\epsilon_k\|^2)] \le C,    \forall k,    (6)

for some \eta > 0 and some constant C. Here \epsilon_k is the "source" from which \phi_k was formed.

Both these restrictions are very mild, and allow, for example, the Gaussian dependent case (unlike most previous treatments). Now, for sequences \phi_k subject to these two restrictions, the necessary and sufficient condition for (5) to hold is that

    \sum_{i=k+1}^{k+h} E[\phi_i \phi_i^T] \ge \delta I,    \forall k \ge 0,    (7)

for some h > 0 and \delta > 0. This is the "persistence of excitation" or "full rank" condition on \phi_k. This result is the main contribution of this paper. Furthermore, several direct applications of the stability result to adaptive tracking will be given under various noise assumptions; in particular, these yield more general results on LMS than those established recently in [8].
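Both the excitation condition (7) and the exponential decay (5) of the fundamental matrix are easy to probe numerically. The sketch below (an illustration only; the AR(1) regressor model, window length and step-size are assumptions) checks the smallest eigenvalue of a windowed average of \phi_k \phi_k^T and the norm of the product in (3):

```python
import numpy as np

rng = np.random.default_rng(1)
d, t, mu = 2, 400, 0.05

# Dependent, unbounded regressors: a Gaussian AR(1) source, the kind of
# signal the paper's conditions are designed to cover.
phi = np.zeros((t, d))
for k in range(1, t):
    phi[k] = 0.5 * phi[k - 1] + rng.standard_normal(d)

# Persistence of excitation (7): the windowed average of phi_k phi_k^T
# should dominate delta * I; here we check its smallest eigenvalue.
h = 50
S = sum(np.outer(p, p) for p in phi[:h]) / h
pe_ok = bool(np.linalg.eigvalsh(S).min() > 0)

# Fundamental matrix (3): Phi(t, 0) = prod_{j=1}^{t} (I - mu phi_j phi_j^T);
# under (5) its norm should decay exponentially in t.
P = np.eye(d)
for p in phi:
    P = (np.eye(d) - mu * np.outer(p, p)) @ P
decay = float(np.linalg.norm(P, 2))
```

For this excited, weakly dependent signal the product norm is vanishingly small after a few hundred steps, which is exactly the qualitative content of (5) and (7).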
1.2 Earlier Work

Most of the existing work related to exponential stability of (2) is concerned with the case where the signals {\phi_k} are independent or M-dependent (cf., e.g., [20], [19], [4], [1], [2]). This independence assumption can be relaxed considerably if we assume that the signals {\phi_k} are bounded, as in, e.g., [6], [1] and [12]. Note that the boundedness assumption is suitable for the study of the so-called normalized LMS algorithms (cf. [19], [6] and [15]), since the normalized signals are automatically bounded. In this case, some general results, together with a very weak (probably the weakest yet known) excitation condition for guaranteeing the exponential stability of LMS, can be found in [6]. Moreover, in the bounded \phi-mixing case, a complete characterization of the exponential stability can also be given. Indeed, in that case it has been shown in [6] that (7) is the necessary and sufficient condition for (2) to be exponentially stable.

For general unbounded and correlated random signals, the stability analysis of the standard LMS algorithm (1) becomes more complex, and has defied a complete solution for over 30 years. Recently, some general stability results applicable to unbounded, nonstationary, dependent signals were established in [7], based on which a number of results on the tracking performance of the LMS algorithms can be derived (see [8]). In particular, the result of [7] can be applied to a typical situation where the signal process \phi_k is generated from a white noise sequence through a stable linear filter:

    \phi_k = \sum_{j=-\infty}^{\infty} A_j \epsilon_{k-j} + \gamma_k,    \sum_{j=-\infty}^{\infty} \|A_j\| < \infty,    (8)

where {\epsilon_k} is an independent sequence satisfying

    \sup_k E[\exp(\eta \|\epsilon_k\|^\beta)] < \infty    for some \eta > 0, \beta > 2,    (9)

and {\gamma_k} is a bounded deterministic process. It is obvious that the expression (8) has a form similar to the well-known Wold decomposition for wide-sense stationary processes. Note, however, that the signal process {\phi_k} defined by (8) need not be a stationary process nor a Markov chain in general.
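A signal of the form (8) is easy to generate. The sketch below (an illustration, not part of the paper; the tap values, truncation length and deterministic term are assumptions) builds \phi_k by linearly filtering Gaussian white noise and confirms that the result is strongly correlated in time, hence neither white nor M-dependent:

```python
import numpy as np

# Build phi_k as in (8): a uniformly stable (here truncated) linear filter
# driven by Gaussian white noise, plus a bounded deterministic term gamma_k.
rng = np.random.default_rng(4)
n, m, d = 2000, 30, 2
taps = [0.8 ** j for j in range(m)]        # summable: sum_j ||A_j|| < infinity
eps = rng.standard_normal((n + m, d))      # Gaussian source: satisfies (10), not (9)
gamma = 0.1 * np.ones(d)
phi = np.array([sum(taps[j] * eps[k - j] for j in range(m)) + gamma
                for k in range(m, m + n)])

# The filtered signal is strongly correlated in time: its lag-1
# autocorrelation is close to the leading tap ratio 0.8.
x = phi[:, 0] - phi[:, 0].mean()
rho1 = float((x[1:] * x[:-1]).mean() / (x * x).mean())
```

Because the source here is Gaussian, it satisfies the quadratic-exponential moment condition (10) below but not (9) with \beta > 2, which is precisely the gap this paper closes.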
Unfortunately, the condition (9) with \beta > 2 excludes the case where {\epsilon_k} is a
Gaussian process, since such signals can only satisfy the weaker condition

    \sup_k E[\exp(\eta \|\epsilon_k\|^2)] < \infty    for some \eta > 0.    (10)

The motivation of this paper has thus been to relax the moment condition (9) so that, at least, the signal process {\phi_k} defined by (8) and (10) can be included. This will be done in a more general setting, based on a relaxation of the moment condition used in Theorem 3.2 of [7].

2 The Main Results

2.1 Notations

Here we adopt the following notations introduced in [7].

a). The maximum eigenvalue of a matrix X is denoted by \lambda_max(X), and the Euclidean norm of X is defined as its maximum singular value, i.e., \|X\| := {\lambda_max(X X^T)}^{1/2}; the L_p-norm of a random matrix X is defined as \|X\|_p := {E(\|X\|^p)}^{1/p}, p \ge 1.

b). For any square random matrix sequence F = {F_k} and real numbers p \ge 1, \lambda \in (0, 1), the L_p-exponentially stable family S_p(\lambda) is defined by

    S_p(\lambda) = { F : \| \prod_{j=i+1}^{k} (I - \mu F_j) \|_p \le M (1 - \alpha\mu)^{k-i},  \forall k \ge i \ge 0,  \forall \mu \in (0, \lambda],  for some M > 0 and \alpha \in (0, 1) }.

Likewise, the averaged exponentially stable family S(\lambda) is defined by

    S(\lambda) = { F : \| \prod_{j=i+1}^{k} (I - \mu E[F_j]) \| \le M (1 - \alpha\mu)^{k-i},  \forall k \ge i \ge 0,  \forall \mu \in (0, \lambda],  for some M > 0 and \alpha \in (0, 1) }.

In what follows, it will be convenient to set

    S_p := \bigcup_{\lambda \in (0,1)} S_p(\lambda),    S := \bigcup_{\lambda \in (0,1)} S(\lambda).    (11)
c). Let p \ge 1 and F := {F_i}. Set

    M_p = { F : \sup_i \| S_i^{(T)} \|_p = o(T),  as T \to \infty },    (12)

where

    S_i^{(T)} = \sum_{j=iT}^{(i+1)T-1} (F_j - E[F_j]).    (13)

The definition of M_p is reminiscent of the law of large numbers. As shown by Lemma 3 of [9], it includes a large class of random processes.

2.2 The Main Results

We first present a preliminary theorem.

Theorem 1. Let {F_k} be a random matrix process. Then

    {F_k} \in S  ==>  {F_k} \in S_p,  \forall p \ge 1,

provided that the following two conditions are satisfied:

(i). There exist positive constants \eta, M and K such that for any n \ge 1,

    E \exp{ \eta \sum_{i=1}^{n} \|F_{j_i}\| } \le M \exp(K n)

holds for any integer sequence 0 \le j_1 < j_2 < \cdots < j_n.

(ii). There exist a constant M and a nondecreasing function g(T) with g(T) = o(T) as T \to \infty, such that for any fixed T, all small \mu > 0 and any n \ge i \ge 0,

    E \exp{ \mu \sum_{j=i+1}^{n} \|S_j^{(T)}\| } \le M \exp{ [\mu g(T) + o(\mu)] (n - i) },

where S_j^{(T)} is defined by (13).

The proof is given in Section 4.

Remark 1. The form of Theorem 1 is similar to that of Theorem 3.2 in [7]. The key difference lies in condition (i). This condition was introduced in [5], p. 112, and is, in a certain sense, a relaxation of the corresponding condition used in Theorem 3.2 of [7]. Such a relaxation enables us to include Gaussian signals as a special case when the LMS algorithms are under consideration, as will be shown shortly.
Based on Theorem 1, we may prove that for a large class of unbounded nonstationary signals, including (8), the condition (7) is also necessary and sufficient for the exponential stability of LMS. Let us start with the following decomposition, which is more general than that in (8):

    \phi_k = \sum_{j=-\infty}^{\infty} A(k, j) \epsilon_{k-j} + \gamma_k,    \sum_{j=-\infty}^{\infty} \sup_k \|A(k, j)\| < \infty,    (14)

where {\gamma_k} is a d-dimensional bounded deterministic process, and {\epsilon_k} is now a general m-dimensional \phi-mixing sequence. The weighting matrices A(k, j) \in R^{d \times m} are assumed to be deterministic. We remark that the summability condition in (14) is precisely the standard definition of uniform stability for time-varying linear filters (cf., e.g., [13]). Also, recall that a random sequence {\epsilon_k} is called \phi-mixing if there exists a nonincreasing function \phi(m) (called the mixing rate) with \phi(m) \in [0, 1], m \ge 0, and \phi(m) \to 0 as m \to \infty, such that

    \sup_{A \in F_{-\infty}^{k}, B \in F_{k+m}^{\infty}} |P(B | A) - P(B)| \le \phi(m),    \forall m \ge 0,  \forall k \in (-\infty, \infty),

where by definition F_i^j, -\infty \le i \le j \le \infty, is the \sigma-algebra generated by {\epsilon_k, i \le k \le j}. The \phi-mixing concept is a standard one in the literature for describing weakly dependent random processes. As is well known, the \phi-mixing property is satisfied by, for example, any M-dependent sequence, sequences generated from bounded white noises via a stable linear filter, and stationary aperiodic Markov chains which are ergodic and satisfy Doeblin's condition (cf. [3]).

The main result of this paper is then stated as follows.

Theorem 2. Consider the random linear equation (2). Let the signal process {\phi_k} be generated by (14), where {\gamma_k} is a bounded deterministic sequence and {\epsilon_k} is a \phi-mixing process which satisfies, for any n \ge 1 and any integer sequence j_1 < j_2 < \cdots < j_n,

    E \exp{ \eta \sum_{i=1}^{n} \|\epsilon_{j_i}\|^2 } \le M \exp(K n),    (15)

where \eta, M and K are positive constants. Then for any p \ge 1, there exist constants \mu^* > 0, M > 0 and \alpha \in (0, 1), such that for all \mu \in (0, \mu^*],

    [ E \| \prod_{j=k+1}^{t} (I - \mu \phi_j \phi_j^T) \|^p ]^{1/p} \le M (1 - \alpha\mu)^{t-k},    \forall t \ge k \ge 0,    (16)
if and only if there exist an integer h > 0 and a constant \delta > 0 such that

    \sum_{i=k+1}^{k+h} E[\phi_i \phi_i^T] \ge \delta I,    \forall k \ge 0.    (17)

The proof is also given in Section 4.

Remark 2. By taking A(k, 0) = I, A(k, j) = 0 for all k and j \ne 0, and \gamma_k \equiv 0 in (14), we see that {\phi_k} coincides with {\epsilon_k}, which means that Theorem 2 is applicable to any \phi-mixing sequence. Furthermore, if {\epsilon_k} is bounded, then (15) is automatically satisfied. This shows that Theorem 2 includes the corresponding result in [6] as a special case. Note, however, that a linearly filtered \phi-mixing process like (14) will in general no longer be a \phi-mixing sequence (because of the possible unboundedness of {\epsilon_k}). In fact, Theorem 2 is applicable also to a quite large class of processes other than \phi-mixing ones, as shown by the following corollary.

Corollary 1. Let the signal process {\phi_k} be generated by (14), where {\gamma_k} is a bounded deterministic sequence and {\epsilon_k} is an independent sequence satisfying condition (10). Then {\phi_k \phi_k^T} \in S_p for all p \ge 1 if and only if there exist an integer h > 0 and a constant \delta > 0 such that (17) holds.

Proof. By Theorem 2, we need only show that condition (15) is true. This is obvious, since {\epsilon_k} is an independent sequence satisfying (10). \square

Remark 3. Corollary 1 continues to hold if the independence assumption on {\epsilon_k} is weakened to M-dependence. Moreover, the moment condition (10) used in Corollary 1 may be further relaxed if additional conditions are imposed. This is the case when, for example, {\phi_k} is a stationary process generated by a stable finite-dimensional linear state space model with the innovation process {\epsilon_k} being an i.i.d. sequence (see [16]).

3 Performance of Adaptive Tracking

Let us now assume that {y_k} and {\phi_k} are related by a linear regression

    y_k = \phi_k^T x_k + v_k,    (18)

where {x_k} is the true (or "fictitious") time-varying parameter process, and {v_k} represents the disturbance or unmodeled dynamics.
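For stationary signals, the model (18) always holds with a constant parameter equal to the least-squares ("Wiener") solution, a situation treated in Section 3.2 below. As a hedged illustration (the data-generating choices, including the cubic term, are assumptions made for this example), the sketch shows LMS converging to a neighbourhood of that solution even when y_k is not exactly linear in \phi_k:

```python
import numpy as np

# LMS converges towards the Wiener solution
# x* = [E(phi phi^T)]^{-1} E(phi y), the "best linear fit".
rng = np.random.default_rng(3)
d, n, mu = 2, 50000, 0.01
phi = rng.standard_normal((n, d))
y = 2.0 * phi[:, 0] - phi[:, 1] + 0.3 * phi[:, 0] ** 3 \
    + 0.5 * rng.standard_normal(n)        # y is not exactly linear in phi

# Empirical Wiener solution
x_star = np.linalg.solve(phi.T @ phi / n, phi.T @ y / n)

x_hat = np.zeros(d)
for pk, yk in zip(phi, y):
    x_hat = x_hat + mu * pk * (yk - pk @ x_hat)

gap = float(np.linalg.norm(x_hat - x_star))
```

By construction the residual v_k = y_k - \phi_k^T x* is uncorrelated with \phi_k, which is the key property used in the second performance analysis below.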
The objective of the LMS algorithm (1) is then to track the time-varying unknown parameter process {x_k}. The tracking error will depend on the parameter variation process {\Delta_k} defined by

    \Delta_k = x_k - x_{k-1}    (19)

through the following error equation, obtained by substituting (18)-(19) into (1):

    \tilde{x}_{k+1} = (I - \mu \phi_k \phi_k^T) \tilde{x}_k + \mu \phi_k v_k - \Delta_{k+1},    (20)

where \tilde{x}_k := \hat{x}_k - x_k and \hat{x}_k denotes the LMS estimate given by (1). Obviously, the quality of tracking will essentially depend on the properties of {\phi_k, \Delta_k, v_k}. The homogeneous part of (20) is exactly the equation (2), and can be dealt with by Theorem 2. Hence, we need only consider the nonhomogeneous terms in (20). Different assumptions on {\Delta_k, v_k} will give different tracking error bounds or expressions, and we shall treat three cases separately in the following.

3.1 First Performance Analysis

By this, we mean that the tracking performance analysis is carried out under a "worst case" situation, i.e., the parameter variations and the disturbances are only assumed to be bounded in an averaging sense. To be specific, let us make the following assumption:

A1). There exists r > 2 such that

    \sigma_v := \sup_k \|v_k\|_r < \infty    and    \sigma_\Delta := \sup_k \|\Delta_k\|_r < \infty.

Note that this condition includes any "unknown but bounded" deterministic disturbances and parameter variations as a special case.

Theorem 3. Consider the LMS algorithm (1) applied to (18). Let condition A1) be satisfied. Also, let {\phi_k} be as in Theorem 2 with (17) satisfied. Then for all t \ge 1 and all small \mu > 0,

    E\|\hat{x}_t - x_t\|^2 = O( (\sigma_v + \sigma_\Delta/\mu)^2 ) + O( [1 - \alpha\mu]^t ),

where \alpha \in (0, 1) is a constant.
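The tracking setup of (18)-(20) can be simulated directly. The sketch below (an illustration only; the random-walk drift, noise levels and step-size are assumptions) tracks a slowly drifting parameter and measures the steady-state tracking error:

```python
import numpy as np

# LMS tracking a drifting parameter: x_k follows a random walk, so
# Delta_k = x_k - x_{k-1} is white, as in the error equation (20).
rng = np.random.default_rng(2)
d, n, mu = 2, 20000, 0.05
sigma_v, sigma_w = 0.1, 0.005

x_true = np.zeros(d)
x_hat = np.zeros(d)
sq_err = []
for _ in range(n):
    x_true = x_true + sigma_w * rng.standard_normal(d)   # parameter drift
    phi = rng.standard_normal(d)
    y = phi @ x_true + sigma_v * rng.standard_normal()
    x_hat = x_hat + mu * phi * (y - phi @ x_hat)         # LMS update (1)
    sq_err.append(float(np.sum((x_hat - x_true) ** 2)))

mse_tail = float(np.mean(sq_err[n // 2:]))   # steady-state tracking MSE
```

Increasing \mu reduces the lag term driven by the drift but amplifies the noise term, which is the trade-off made precise in Theorems 4 and 5 below.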
This result follows immediately from Theorem 2, (20) and the Hölder inequality. We remark that various such "worst case" results for other commonly used algorithms (e.g., RLS and KF) may be found in [6]. The main implication of Theorem 3 is that the tracking error will be small if both the parameter variation (\sigma_\Delta) and the disturbance (\sigma_v) are small.

3.2 Second Performance Analysis

By this, we mean that the tracking performance analysis is carried out for zero-mean random parameter variations and disturbances, which may be correlated processes in general. To be specific, we introduce the following set for r \ge 1:

    N_r = { w : \sup_k \| \sum_{i=k+1}^{k+n} w_i \|_r \le c_r^w \sqrt{n},  \forall n \ge 1 },    (21)

where c_r^w is a constant depending on r and the distribution of {w_i} only. Obviously, N_r is a subset of M_r defined by (12). It is known (see [9]) that martingale difference, zero-mean \phi- and \alpha-mixing sequences can all be included in N_r. Also, from the proof of Lemma 3 in [9], it is known that the constant c_r^w can be dominated by \sup_k \|w_k\|_r in the first two cases, and by \sup_k \|w_k\|_{r+\delta} (\delta > 0) in the last case. Moreover, it is interesting to note that N_r is invariant under linear transformations. This means that if {\phi_k} and {\epsilon_k} are related by (8) with \gamma_k \equiv 0, then {\epsilon_k} \in N_r implies that {\phi_k} \in N_r. This can easily be seen from the following inequality:

    \| \sum_{i=k+1}^{k+n} \phi_i \|_r = \| \sum_{j=-\infty}^{\infty} A_j \sum_{i=k+1}^{k+n} \epsilon_{i-j} \|_r \le \sum_{j=-\infty}^{\infty} \|A_j\| \| \sum_{i=k+1}^{k+n} \epsilon_{i-j} \|_r.

Thus, random processes generated from martingale differences, or from \phi- or \alpha-mixing sequences, via an infinite-order linear filter can all be included in N_r.

Now, we are in a position to introduce the following condition for the second performance analysis:

A2). For some r > 2, {\Delta_k} \in N_r and {\phi_k v_k} \in N_r.

Theorem 4. Consider the LMS algorithm (1) applied to the model (18). Let {\phi_k} be defined as in Theorem 2 with (17) satisfied, and let condition A2)
hold for a certain r. Then for all t \ge 1 and all small \mu > 0,

    E\|\hat{x}_t - x_t\|^2 = O( \mu (c_r^v)^2 + (c_r^\Delta)^2 / \mu ) + O( [1 - \alpha\mu]^t ),

where c_r^v and c_r^\Delta are the constants defined in (21), which depend on the distributions of {\phi_k v_k} and {\Delta_k}, respectively. Moreover, \alpha is the same constant as in Theorem 3.

Proof. By Lemma A.2 of [8] and Theorem 2, it is easy to see from (20) that the desired result is true. \square

Note that the upper bound in Theorem 4 significantly improves the "crude" bound given in Theorem 3 for small \mu, and it roughly indicates the familiar trade-off between noise sensitivity and tracking ability.

Theorem 4 can be applied directly to the convergence analysis of some standard filtering problems (cf. [20], [4] and [2]). For example, let {y_k} and {\phi_k} be two stationary processes, and assume that our purpose is to track the least mean squares solution

    x^* = [E(\phi_k \phi_k^T)]^{-1} E(\phi_k y_k) = \arg\min_x E(y_k - \phi_k^T x)^2

recursively, based on real-time measurements {y_i, \phi_i, i \le k}. Now, define {v_k} by

    y_k = \phi_k^T x^* + v_k.

It is then obvious that E[\phi_k v_k] = 0. Furthermore, in many standard situations it can be verified that {\phi_k v_k} \in N_r for some r > 2. Thus, Theorem 4, applied to the above linear regression, gives

    E\|\hat{x}_t - x^*\|^2 = O(\mu) + O([1 - \alpha\mu]^t),

which tends to zero as t \to \infty and \mu \to 0. Apparently, Theorem 4 is also applicable to nonstationary signals {y_k} and {\phi_k}.

3.3 Third Performance Analysis

By this, we mean that the analysis aims to get an explicit (approximate) expression for the tracking performance, rather than just an upper
bound as in the previous two cases. This is usually carried out under white noise assumptions on {\Delta_k, v_k}. Roughly speaking, the parameter process in this case will behave like a random walk; some detailed interpretations of this parameter model may be found in [14] and [8]. We make the following assumptions:

A3). The regressor process \phi_k is generated by a time-varying causal filter

    \phi_k = \sum_{j=0}^{\infty} A(k, j) \epsilon_{k-j} + \gamma_k,    \sum_{j=0}^{\infty} \sup_k \|A(k, j)\| < \infty,    (22)

where {\gamma_k} is a bounded deterministic sequence, and {\epsilon_k, \Delta_k, v_{k-1}} is a \phi-mixing process with mixing rate denoted by \phi(m). Assume also that (15) and (17) hold.

A4). The process {\Delta_k, v_k} satisfies the following conditions:

(i): E[v_k | F_k] = 0,  E[\Delta_{k+1} | F_k] = E[\Delta_{k+1} v_k | F_k] = 0;

(ii): E[v_k^2 | F_k] = R_v(k);

(iii): \sup_k E[|v_k|^r | F_k] \le M,  E[\Delta_k \Delta_k^T] = Q(k),  \sigma_\Delta := \sup_k \|\Delta_k\|_r < \infty,

where r > 2 and M > 0 are constants, and F_k denotes the \sigma-algebra generated by {\epsilon_i, \Delta_i, v_{i-1}, i \le k}.

Theorem 5. Consider the LMS algorithm (1) applied to the model (18). Let conditions A3) and A4) be satisfied. Then the tracking error covariance matrix has the following expansion for all t \ge 1 and all small \mu > 0:

    E[\tilde{x}_t \tilde{x}_t^T] = \Pi_t + O(\delta(\mu)) [\mu + (1 - \alpha\mu)^t],

where the function \delta(\mu) \to 0 as \mu \to 0, and \Pi_t is recursively defined by

    \Pi_{t+1} = (I - \mu S_t) \Pi_t (I - \mu S_t) + \mu^2 R_v(t) S_t + Q(t+1),

with S_t = E[\phi_t \phi_t^T], and R_v(t) and Q(t) defined as in condition A4).

This theorem relaxes and unifies the conditions used in Theorem 5.1 of [8]. The proof is given in Section 4. The expression for the function \delta(\mu) may be found from the proof and from the related formula in Theorem 4.1 of [8] (see (45)).

Note that in the (wide-sense) stationary case, S_t \equiv S, R_v(t) \equiv R_v, Q(t) \equiv Q, and \Pi_t will converge to a matrix \Pi defined by the Lyapunov equation (cf. [8])

    \Pi S + S \Pi = \mu R_v S + \mu^{-1} Q.
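In the stationary case, the recursion for \Pi_t and its Lyapunov approximation can be checked numerically. The sketch below is illustrative only; the values of S, R_v, Q and \mu are assumptions:

```python
import numpy as np

# Iterate the covariance recursion of Theorem 5 with constant S, R_v, Q and
# compare its fixed point with the Lyapunov approximation
#   Pi S + S Pi = mu R_v S + Q / mu.
d, mu = 2, 0.01
S = np.eye(d)            # E[phi_t phi_t^T]
R_v = 1.0                # observation noise variance
Q = 1e-4 * np.eye(d)     # parameter random-walk covariance

Pi = np.zeros((d, d))
A = np.eye(d) - mu * S
for _ in range(5000):
    Pi = A @ Pi @ A + mu ** 2 * R_v * S + Q

# With S = I the Lyapunov solution is explicit:
Pi_lyap = 0.5 * (mu * R_v * S + np.linalg.solve(S, Q) / mu)
rel_err = float(np.linalg.norm(Pi - Pi_lyap) / np.linalg.norm(Pi_lyap))
```

For small \mu the two agree to within O(\mu), consistent with the approximation being accurate in the small step-size regime.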
In this case, the trace of the matrix \Pi, which represents the dominating part of the tracking error E\|\tilde{x}_t\|^2 for small \mu and large t, can be expressed as

    tr(\Pi) = (1/2) [ \mu R_v d + \mu^{-1} tr(S^{-1} Q) ],

where d := dim(\phi_k). Minimizing tr(\Pi) with respect to \mu, one obtains the following formula for the step-size:

    \mu = \sqrt{ tr(S^{-1} Q) / (R_v d) }.

4 Proof of Theorems 1, 2 and 5

Proof of Theorem 1. By the proof of Lemma 5.2 in [7], we know that Theorem 1 will be true if (32) in [7] can be established. However, by (34) in [7] and condition (ii), it is easy to see that we need only show that for any fixed c \ge 1, t \ge 1 and T > 1, and for all small \mu > 0,

    \| \prod_{j=i+1}^{n} (1 + \mu^2 c H_j) \|_t \le M [1 + O(\mu^{3/2})]^{n-i},    n > i,    (23)

where M > 0 is a constant and

    \mu^2 H_j = \mu^2 H_j(2) + \mu^3 H_j(3) + \cdots + \mu^T H_j(T) + O(\mu^2),

    H_j(\tau) = \sum_{jT \le j_1 < j_2 < \cdots < j_\tau \le (j+1)T-1} \|F_{j_\tau}\| \cdots \|F_{j_1}\|,    \tau = 2, \ldots, T.

Now, let us set

    f_j = \exp{ \mu^{1/4} \sum_{s=jT}^{(j+1)T-1} \|F_s\| }.

Then for any 2 \le \tau \le T and jT \le j_1 < \cdots < j_\tau \le (j+1)T - 1, by using the inequalities \mu^\tau \le \mu^{3/2} \mu^{\tau/4} (valid for \tau \ge 2) and x \le e^x, we have for \mu \in (0, 1)

    \mu^\tau \|F_{j_\tau}\| \cdots \|F_{j_1}\| \le \mu^{3/2} (\mu^{1/4} \|F_{j_\tau}\|) \cdots (\mu^{1/4} \|F_{j_1}\|)
        \le \mu^{3/2} \exp{ \mu^{1/4} (\|F_{j_1}\| + \cdots + \|F_{j_\tau}\|) } \le \mu^{3/2} f_j.
Consequently,

    (1 + \mu^2 c H_j) \le \prod_{\tau=2}^{T} (1 + \mu^\tau c H_j(\tau)) (1 + O(\mu^2))
        \le \prod_{\tau=2}^{T} \prod_{jT \le j_1 < \cdots < j_\tau \le (j+1)T-1} (1 + \mu^{3/2} c f_j) (1 + O(\mu^2))
        \le (1 + \mu^{3/2} c f_j)^{2^T} (1 + O(\mu^2)).    (24)

Note that

    \prod_{j=i+1}^{n} (1 + \mu^{3/2} c f_j) = \sum_{\tau=0}^{n-i} (\mu^{3/2} c)^\tau \sum_{i+1 \le j_1 < \cdots < j_\tau \le n} f_{j_1} \cdots f_{j_\tau}.

Now, applying the Minkowski inequality to the above identity, noting the disjoint property of the sets {j_i T \le j \le (j_i + 1)T - 1}, i = 1, 2, \ldots, for j_1 < j_2 < \cdots, taking \mu small enough so that \mu^{1/4} 2^T t \le \eta, and using condition (i), it is evident that

    \| \prod_{j=i+1}^{n} (1 + \mu^{3/2} c f_j) \|_{2^T t} \le \sum_{\tau=0}^{n-i} (\mu^{3/2} c)^\tau \sum_{i+1 \le j_1 < \cdots < j_\tau \le n} \| f_{j_1} \cdots f_{j_\tau} \|_{2^T t}
        \le M^{1/(2^T t)} [ 1 + c \mu^{3/2} \exp(KT / (2^T t)) ]^{n-i}.

Finally, from this and (24), we have for any n > i

    \| \prod_{j=i+1}^{n} (1 + \mu^2 c H_j) \|_t \le \| \prod_{j=i+1}^{n} (1 + \mu^{3/2} c f_j) \|_{2^T t}^{2^T} [1 + O(\mu^2)]^{n-i}
        \le { M^{1/(2^T t)} [ 1 + c \mu^{3/2} \exp(KT / (2^T t)) ]^{n-i} }^{2^T} [1 + O(\mu^2)]^{n-i}
        \le M [1 + O(\mu^{3/2})]^{n-i},    for all small \mu > 0,

which is (23). This completes the proof of Theorem 1. \square
The proof of Theorem 2 is rather involved, and it is therefore prefaced with several lemmas. For the analysis to follow, it is convenient to rewrite (14) as

    \phi_k = \sum_{j=-\infty}^{\infty} a_j \epsilon(k, j) + \gamma_k,    \sum_{j=-\infty}^{\infty} a_j < \infty,    (25)

where by definition

    a_j := \sup_k \|A(k, j)\|,    \epsilon(k, j) := a_j^{-1} A(k, j) \epsilon_{k-j}    (26)

(we set \epsilon(k, j) = 0 for all k if a_j = 0 for some j). The new process {\epsilon(k, j)} has the following simple properties:

(i). For any k and j, \|\epsilon(k, j)\| \le \|\epsilon_{k-j}\|;

(ii). For any fixed j, the process {\epsilon(k, j)} is \phi-mixing with the same mixing rate as {\epsilon_k};

(iii). For any k and j, \epsilon(k, j) is {\epsilon_{k-j}}-measurable.

These three properties will be used frequently in the sequel without further explanation.

Lemma 1. Let {F_t} be a \phi-mixing d x d-dimensional matrix process with mixing rate {\phi(m)}. Then

    \sup_i \| S_i^{(T)} \|_2 \le 2cd ( T \sum_{m=0}^{T-1} \sqrt{\phi(m)} )^{1/2},    \forall T \ge 1,

where S_i^{(T)} is defined by (13) and c is defined by c := \sup_i \|F_i - E[F_i]\|_2.

Proof. Denote G_k = F_k - E[F_k]. Then by Theorem A.6 in [10] (p. 27) we have

    \| E[G_j G_k^T] \| \le 2 d c^2 \sqrt{\phi(|k - j|)},    \forall j, k.

Consequently, by using the inequality |tr(F)| \le d \|F\|, F \in R^{d x d}, we get

    \| S_i^{(T)} \|_2^2 = E \| \sum_{j=iT}^{(i+1)T-1} G_j \|^2
        \le \sum_{j,k=iT}^{(i+1)T-1} | tr E[G_j G_k^T] |
        \le d \sum_{j,k=iT}^{(i+1)T-1} \| E[G_j G_k^T] \|
        \le 2 c^2 d^2 \sum_{j,k=iT}^{(i+1)T-1} \sqrt{\phi(|k - j|)}
        \le 4 c^2 d^2 T \sum_{m=0}^{T-1} \sqrt{\phi(m)}.

This gives the desired result. \square

Lemma 2. Let F_k = \phi_k \phi_k^T, where {\phi_k} is defined by (14) with \sup_k \|\epsilon_k\|_4 < \infty. Then {F_k} \in M_2, where M_2 is defined by (12).

Proof. First of all, we may assume that the process {\epsilon_k} has zero mean (otherwise, the mean can be included in \gamma_k). Then by (25),

    \| S_i^{(T)} \|_2 = \| \sum_{t=iT}^{(i+1)T-1} [ \phi_t \phi_t^T - E \phi_t \phi_t^T ] \|_2
        \le \sum_{k,j=-\infty}^{\infty} a_k a_j \| \sum_{t=iT}^{(i+1)T-1} [ \epsilon(t,k)\epsilon(t,j)^T - E \epsilon(t,k)\epsilon(t,j)^T ] \|_2
          + 2 \sum_{j=-\infty}^{\infty} a_j \| \sum_{t=iT}^{(i+1)T-1} \epsilon(t,j) \gamma_t^T \|_2.    (27)

Note that for any fixed k and j, the processes {\epsilon(t,k)\epsilon(t,j)^T} and {\epsilon(t,j)} are \phi-mixing with mixing rates \phi(m - |k - j|) and \phi(m), respectively (where by definition \phi(m) := 1 for m < 0). By Lemma 1, it is easy to see that the last term in (27) is of order o(T). For dealing with the second last term, we denote

    f_{kj}(T) = 2cd ( T \sum_{m=0}^{T-1} \sqrt{\phi(m - |k - j|)} )^{1/2},    (28)

where c is defined as in Lemma 1. Consequently, by \phi(m) \le 1, \forall m, it is not difficult to see that

    \sup_{k,j} f_{kj}(T) \le 2cdT    (29)

and

    \sup_{|k-j| < \sqrt{T}} f_{kj}(T) = o(T).    (30)
Now, by the summability of {a_j},

    \sum_{|k-j| \ge \sqrt{T}} a_k a_j \to 0,    as T \to \infty.

Hence by (29),

    \sum_{|k-j| \ge \sqrt{T}} a_k a_j f_{kj}(T) = o(T),    (31)

and by (30),

    \sum_{|k-j| < \sqrt{T}} a_k a_j f_{kj}(T) = o(T).    (32)

Combining (31) and (32) gives

    \sum_{k,j=-\infty}^{\infty} a_k a_j f_{kj}(T) = o(T).    (33)

By this and Lemma 1, we know that the second last term in (27) is also of order o(T), uniformly in i. Hence {F_k} \in M_2 by the definition (12). \square

Lemma 3. Let \sup_k E\|\phi_k\|^2 < \infty. Then {\phi_k \phi_k^T} \in S if and only if condition (17) holds, where S is defined in (11).

Proof. Let us first assume that (17) is true. Take \lambda = (1 + \sup_k E\|\phi_k\|^2)^{-1}. Then, applying Theorem 2.1 in [6] to the deterministic sequence A_k = E[\phi_k \phi_k^T] for any \mu \in (0, \lambda], it is easy to see that {\phi_k \phi_k^T} \in S(\lambda). Conversely, if {\phi_k \phi_k^T} \in S, then there exists \lambda \in (0, (1 + \sup_k E\|\phi_k\|^2)^{-1}] such that {\phi_k \phi_k^T} \in S(\lambda). Now, applying Theorem 2.2 in [6] to the deterministic sequence A_k = E[\phi_k \phi_k^T], it is easy to see that (17) holds. This completes the proof. \square

Lemma 4. Let F_k = \phi_k \phi_k^T, where {\phi_k} is defined by (14) with (15) satisfied. Then {F_k} satisfies condition (i) of Theorem 1.

Proof. Without loss of generality, assume that \gamma_k \equiv 0. Let us denote

    A = \sum_{j=-\infty}^{\infty} a_j,    (34)

where {a_j} is defined by (26). Then by the Schwarz inequality, from (25) we have

    \|\phi_k\|^2 \le A \sum_{j=-\infty}^{\infty} a_j \|\epsilon_{k-j}\|^2.
Consequently, by the Hölder inequality and (15), we have for \eta_0 \le \eta A^{-2}

    E \exp{ \eta_0 \sum_{i=1}^{n} \|F_{j_i}\| }
        \le E \exp{ \eta_0 A \sum_{j=-\infty}^{\infty} a_j \sum_{i=1}^{n} \|\epsilon_{j_i - j}\|^2 }
        = E \prod_{j=-\infty}^{\infty} \exp{ \eta_0 A a_j \sum_{i=1}^{n} \|\epsilon_{j_i - j}\|^2 }
        \le \prod_{j=-\infty}^{\infty} ( E \exp{ \eta_0 A^2 \sum_{i=1}^{n} \|\epsilon_{j_i - j}\|^2 } )^{a_j / A}
        \le \prod_{j=-\infty}^{\infty} ( M \exp{K n} )^{a_j / A}
        = M \exp{K n}.

This completes the proof. \square

The following lemma was originally proved in [5] (p. 113).

Lemma 5. Let {z_k} be a nonnegative random sequence such that for some a > 0, b > 0 and for all i_1 < i_2 < \cdots < i_n, n \ge 1,

    E \exp{ \sum_{k=1}^{n} z_{i_k} } \le \exp(a n + b).    (35)

Then for any L > 0 and any n \ge i \ge 0,

    E \exp{ (1/2) \sum_{j=i+1}^{n} z_j I(z_j \ge L) } \le \exp{ e^{a - L/2} (n - i) + b },

where I(.) is the indicator function.

Proof. Denote f_j = \exp( (1/2) z_j ) I(z_j \ge L). Then by first applying the simple inequality I(x \ge L) \le e^{x/2} / e^{L/2}, and then using (35), we have for any subsequence j_1 < j_2 < \cdots < j_\tau

    E[f_{j_1} \cdots f_{j_\tau}] = E [ \exp( (1/2) \sum_{i=1}^{\tau} z_{j_i} ) \prod_{i=1}^{\tau} I(z_{j_i} \ge L) ]
        \le E \exp( \sum_{i=1}^{\tau} z_{j_i} ) / \exp( \tau L / 2 )
        \le \exp{ (a - L/2) \tau + b }.
By this we have

    E \exp{ (1/2) \sum_{j=i+1}^{n} z_j I(z_j \ge L) }
        = E \prod_{j=i+1}^{n} \exp{ (1/2) z_j I(z_j \ge L) }
        \le E \prod_{j=i+1}^{n} { 1 + \exp( (1/2) z_j ) I(z_j \ge L) }
        = E \prod_{j=i+1}^{n} (1 + f_j)
        = E \sum_{\tau=0}^{n-i} \sum_{i+1 \le j_1 < \cdots < j_\tau \le n} f_{j_1} \cdots f_{j_\tau}
        \le e^b \sum_{\tau=0}^{n-i} \binom{n-i}{\tau} \exp{ (a - L/2) \tau }
        = e^b { 1 + \exp(a - L/2) }^{n-i}
        \le \exp{ (n - i) \exp(a - L/2) + b }.

This completes the proof of Lemma 5. \square

Lemma 6. Let F_k = \phi_k \phi_k^T, where {\phi_k} is defined by (14) with (15) satisfied. Then {F_k} satisfies condition (ii) of Theorem 1.

Proof. Set, for any fixed k and l,

    z_j = z_j(k, l) := \mu \| \sum_{t=jT}^{(j+1)T-1} [ \epsilon(t,k)\epsilon(t,l)^T - E \epsilon(t,k)\epsilon(t,l)^T ] \|.

Then, similarly to (27), from (25) we have

    \mu \sum_{j=i+1}^{n} \| S_j^{(T)} \| \le \sum_{k,l=-\infty}^{\infty} a_k a_l \sum_{j=i+1}^{n} z_j + 2 \mu \sum_{k=-\infty}^{\infty} a_k \sum_{j=i+1}^{n} \| \sum_{t=jT}^{(j+1)T-1} \epsilon(t,k) \gamma_t^T \|.    (36)

We first consider the second last term in (36). By the Hölder inequality,

    E \exp{ \sum_{k,l=-\infty}^{\infty} a_k a_l \sum_{j=i+1}^{n} z_j }
        = E \prod_{k,l=-\infty}^{\infty} \exp{ a_k a_l \sum_{j=i+1}^{n} z_j }
        \le \prod_{k,l=-\infty}^{\infty} ( E \exp{ A^2 \sum_{j=i+1}^{n} z_j } )^{a_k a_l / A^2},    (37)

where A is defined by (34). Now, let c = \sup_k E\|\epsilon_k\|^2, and note that \|\epsilon(t,k)\epsilon(t,l)^T\| \le (1/2)(\|\epsilon_{t-k}\|^2 + \|\epsilon_{t-l}\|^2), so that

    z_j \le \frac{\mu}{2} \sum_{t=jT}^{(j+1)T-1} ( \|\epsilon_{t-k}\|^2 + \|\epsilon_{t-l}\|^2 ) + \mu c T.

By this and (15), it is easy to prove that, for all small \mu > 0, the sequence {A^2 z_j} satisfies condition (35) with a = \mu A^2 (K + c) T and b = \log M, where \eta is defined as in (15). Consequently, by Lemma 5 we have for any L > 0

    E \exp{ 2 A^2 \sum_{j=i+1}^{n} z_j I(z_j \ge L \mu T) } \le M \exp{ e^{(K + c - L/2) \mu A^2 T} (n - i) }.    (38)

Now, in view of (38), taking \mu < A^{-2}/4 and L > 2(K + c), and applying the Hölder inequality, we have

    E \exp{ 2 A^2 \sum_{j=i+1}^{n} z_j I(z_j \ge L \mu T) } \le M \exp{ \delta(T) (n - i) },    (39)

where \delta(T) \to 0 as T \to \infty is defined by

    \delta(T) := 4^{-1} A^2 \exp{ (K + c - L/2) T }.

Next, we consider the term x_j := z_j I(z_j \le L \mu T). By the inequality e^x \le 1 + 2x, 0 \le x \le \log 2, we have for small \mu > 0

    \exp{ 2 A^2 \sum_{j=i+1}^{n} x_j } \le \prod_{j=i+1}^{n} (1 + 4 A^2 x_j).    (40)
As noted before, for any fixed k and l, the process {\epsilon(t,k)\epsilon(t,l)^T} is \phi-mixing with mixing rate \phi(m - |k - l|). Consequently, for any fixed k and l, both {z_j} and {x_j} are also \phi-mixing, with mixing rate \phi((m - 1)T + 1 - |k - l|). Note also that by Lemma 1,

    E x_j \le E z_j \le \| z_j \|_2 \le \mu f_{kl}(T),

where f_{kl}(T) is defined by (28). Therefore, applying Lemma 6.2 in [7] (p. 1383), we have

    E \prod_{j=i+1}^{n} (1 + 4 A^2 x_j)
        \le 2 { 1 + 8 \mu A^2 [ f_{kl}(T) + 2 L T \phi(T + 1 - |k - l|) ] }^{n-i}
        \le 2 \exp{ 8 \mu A^2 [ f_{kl}(T) + 2 L T \phi(T + 1 - |k - l|) ] (n - i) }.    (41)

Finally, combining (39)-(41) and using the Schwarz inequality, we get

    E \exp{ A^2 \sum_{j=i+1}^{n} z_j }
        \le ( E \exp{ 2 A^2 \sum_{j=i+1}^{n} z_j I(z_j \ge L\mu T) } )^{1/2} ( E \exp{ 2 A^2 \sum_{j=i+1}^{n} x_j } )^{1/2}
        \le \sqrt{2M} \exp{ (1/2) [ \delta(T) + 8 \mu A^2 f_{kl}(T) + 16 L T A^2 \mu \phi(T + 1 - |k - l|) ] (n - i) }.

Substituting this into (37) and noting (33), it is not difficult to see that there exists a function g(T) = o(T) such that for all small \mu > 0,

    E \exp{ \sum_{k,l=-\infty}^{\infty} a_k a_l \sum_{j=i+1}^{n} z_j } \le \sqrt{2M} \exp{ [\mu g(T) + o(\mu)] (n - i) }.

Obviously, a similar bound can be derived for the last term in (36) by a similar treatment. Hence it is easy to see that the lemma is true. \square

Proof of Theorem 2. Necessity: Let {\phi_k \phi_k^T} \in S_p for p = 2. Then by Lemma 2 and Theorem 3.1 in [7], we know that {\phi_k \phi_k^T} \in S. Consequently, by Lemma 3 we know that (17) holds.
Sufficiency: If condition (17) holds, then by Lemma 3 we have {\phi_k \phi_k^T} \in S. By this and Lemmas 4 and 6, we know that Theorem 1 is applicable, and consequently {\phi_k \phi_k^T} \in S_p, \forall p \ge 1. This completes the proof. \square

Proof of Theorem 5. We need to verify all the conditions in Theorem 4.1 of [8]. However, by Theorem 2, Lemma 3 and the conditions of Theorem 5, it is not difficult to see that we actually need only verify the weak dependence condition in [8]. In other words, we need to show that for any q \ge 3, there is a bounded function \pi(m) such that \pi(m) \to 0 as m \to \infty and

    \| E[\phi_k \phi_k^T | F_{k-m}] - E[\phi_k \phi_k^T] \|_q \le \pi(m),    \forall k \ge 0,  m \ge 0.    (42)

First of all, since {\epsilon_k, \Delta_k, v_{k-1}} is \phi-mixing, we can apply the mixing inequality in [17] to obtain

    \| E[\epsilon(k,i)\epsilon(k,j)^T | F_{k-m}] - E[\epsilon(k,i)\epsilon(k,j)^T] \|_q \le C_q [\phi(m - \max(i,j))]^{1 - q^{-1}}    (43)

for any nonnegative integers i, j, k, m and q \ge 3, where \epsilon(k,j) is defined by (26), C_q depends only on \sup_{i,j} \| \|\epsilon_i\| \|\epsilon_j\| \|_q, and where by definition \phi(m) := 1 for m < 0.

Without loss of generality, we may assume that {\epsilon_k} has zero mean. Then, with some simple manipulations, we get from (22) or (25) that

    \sup_k \| E[\phi_k \phi_k^T | F_{k-m}] - E[\phi_k \phi_k^T] \|_q
        \le \sum_{i,j=0}^{\infty} a_i a_j \sup_k \| E[\epsilon(k,i)\epsilon(k,j)^T | F_{k-m}] - E[\epsilon(k,i)\epsilon(k,j)^T] \|_q
          + 2 \sup_k \|\gamma_k\| \sum_{i=0}^{\infty} a_i \sup_k \| E[\epsilon(k,i) | F_{k-m}] \|_q.    (44)

Now, by (43), the second last term in (44) can be bounded by

    C_q \sum_{i,j=0}^{\infty} a_i a_j [\phi(m - \max(i,j))]^{1 - q^{-1}},

which tends to zero as m \to \infty by the dominated convergence theorem. The last term in (44) can be treated similarly. Denote the right-hand side of (44) by \pi(m). It thus tends to zero as m \to \infty. Hence, (42) is true and the
proof of Theorem 5 is complete. To find the degree of approximation, define, analogously to Theorem 4.1 in [8],

    \delta(\mu) = \min_{m \ge 1} { m \mu^p + \pi(m) }.    (45)

5 Concluding Remarks

The LMS algorithm is a basic tool in the estimation of time-varying parameters of dynamical systems, as well as in adaptive signal processing. There is an extensive and growing literature devoted to the study of its properties from various aspects, among which exponential stability is the most fundamental. Despite the remarkably simple structure of LMS, characterizing its properties analytically has long been known to be very complicated in general. The main contributions of this paper are summarized as follows:

(i). For a large class of nonstationary, weakly dependent signals, the condition (17) is shown to be necessary and sufficient for the exponential stability of LMS, even in the case where the signals are unbounded and non-\phi-mixing.

(ii). The main stability result, Theorem 2, has quite wide applicability. In particular, it is applicable to a typical situation where the signals are generated from, e.g., Gaussian white noises via a time-varying linear filter of infinite order (see Corollary 1).

(iii). A "three stage procedure" for the tracking performance analysis is delineated (see Section 3), according to different assumptions on parameter variations and disturbances. These assumptions include "worst-case noises", "colored noises" and "white noises". By doing so, we have also generalized and simplified the recent related results on LMS in [8]. The basis for this tracking performance analysis, in all its stages, is the exponential stability.

References

[1] R.R. Bitmead and B.D.O. Anderson. Lyapunov techniques for the exponential stability of linear difference equations with random coefficients. IEEE Transactions on Automatic Control, AC-25(4):782-787, 1980.

[2] P. M. Clarkson. Optimal and Adaptive Signal Processing. CRC Press, Boca Raton, 1993.
[3] J.L. Doob. Stochastic Processes. Wiley, New York, 1953.

[4] E. Eweda and O. Macchi. Tracking error bounds of adaptive nonstationary filtering. Automatica, 21(3):293-302, 1985.

[5] L. Guo. Time-Varying Stochastic Systems: Stability, Estimation and Control. Jilin Science & Technology Press, Changchun, 1993 (in Chinese).

[6] L. Guo. Stability of recursive stochastic tracking algorithms. SIAM Journal on Control and Optimization, 32:1195-1225, 1994.

[7] L. Guo and L. Ljung. Exponential stability of general tracking algorithms. IEEE Transactions on Automatic Control, AC-40:1376-1387, August 1995.

[8] L. Guo and L. Ljung. Performance analysis of general tracking algorithms. IEEE Transactions on Automatic Control, AC-40:1388-1402, August 1995.

[9] L. Guo, L. Ljung, and P. Priouret. Performance analysis of the forgetting factor RLS algorithm. International Journal of Adaptive Control and Signal Processing, 7:525-537, 1993.

[10] P. Hall and C.C. Heyde. Martingale Limit Theory and Its Applications. Academic Press, New York, 1980.

[11] B. Hassibi, A.H. Sayed, and T. Kailath. LMS is H-infinity optimal. In Proc. IEEE Conference on Decision and Control, volume 1, pages 74-79, San Antonio, 1993.

[12] H.J. Kushner and J. Yang. Analysis of adaptive step-size SA algorithms for parameter tracking. IEEE Transactions on Automatic Control, AC-40(8):1403-1410, 1995.

[13] L. Ljung. System Identification: Theory for the User. Prentice-Hall, Englewood Cliffs, NJ, 1987.

[14] L. Ljung and S. Gunnarsson. Adaptive tracking in system identification - a survey. Automatica, 26(1):7-22, 1990.

[15] L. Ljung and P. Priouret. A result on the mean square error obtained using general tracking algorithms. International Journal of Adaptive Control and Signal Processing, 5(4):231-250, 1991.
[16] P. Priouret and A.Yu. Veretennikov. A remark on the stability of the LMS tracking algorithm. Stochastic Analysis and Applications, submitted.

[17] R.J. Serfling. Contributions to central limit theory for dependent variables. The Annals of Mathematical Statistics, 39(4):1158-1175, 1968.

[18] V. Solo. Averaging analysis of the LMS algorithm. In C.T. Leondes, editor, Control and Dynamic Systems: Advances in Theory and Applications, vol. 65 (Part 2), pages 379-397. Academic Press, New York, 1994.

[19] V. Solo and X. Kong. Adaptive Signal Processing Algorithms: Stability and Performance. Prentice Hall, 1995.

[20] B. Widrow and S.D. Stearns. Adaptive Signal Processing. Prentice-Hall, Englewood Cliffs, NJ, 1985.
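As a purely illustrative companion to the concluding remarks (not part of the original paper; the scalar setting, the function name `lms_track`, and all parameter values are assumptions of this sketch), the LMS recursion (1) can be simulated tracking a slowly drifting parameter with an i.i.d. Gaussian regressor, a case covered by the stability condition discussed above:

```python
import random

random.seed(0)

def lms_track(mu=0.1, n=5000, drift=0.001, noise=0.05):
    """Scalar LMS  x_{k+1} = x_k + mu * phi_k * (y_k - phi_k * x_k)
    tracking a random-walk parameter theta_k; returns the steady-state
    mean square tracking error estimated over the second half of the run.
    All parameter values here are illustrative, not from the paper."""
    theta, x = 1.0, 0.0
    errs = []
    for k in range(n):
        phi = random.gauss(0.0, 1.0)                 # regressor signal phi_k
        y = phi * theta + random.gauss(0.0, noise)   # observed (desired) signal y_k
        x += mu * phi * (y - phi * x)                # LMS update, eq. (1)
        theta += drift * random.gauss(0.0, 1.0)      # slow random-walk drift of the true parameter
        errs.append((x - theta) ** 2)
    return sum(errs[n // 2:]) / (n - n // 2)

mse = lms_track()
print(mse)  # small steady-state tracking error
```

Increasing the step size `mu` lets the estimate follow the drift more quickly but amplifies the noise contribution to the steady-state error; balancing these two effects is exactly the kind of tradeoff quantified by the tracking performance analysis in Section 3.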