ISSN 1746-7233, England, UK
World Journal of Modelling and Simulation
Vol. 3 (2007) No. 4, pp. 289-298

Acceleration of Levenberg-Marquardt method training of chaotic systems fuzzy modeling

Yuhui Wang, Qingxian Wu, Chansheng Jiang, Wei Fang, Yali Xue

Lab of Pattern Recognition and Intelligent Control, College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, P.R. China

(Received June 7 2007, Accepted October 4 2007)

Abstract. Takagi-Sugeno fuzzy systems have emerged as an effective tool for modeling unknown nonlinear dynamics such as chaotic systems. In this study, a modification of the Levenberg-Marquardt (L-M) algorithm for Takagi-Sugeno fuzzy training is proposed. The method can adjust the parameters of each linear polynomial and of the fuzzy membership functions on line, without relying excessively on experts' experience. Firstly, its quadratic convergence is analysed; then it is applied to training a Takagi-Sugeno fuzzy model of a unified chaotic system. Compared with the standard L-M method, the convergence speed is accelerated. The simulation results demonstrate the effectiveness of the proposed method.

Keywords: chaotic system, Takagi-Sugeno fuzzy, Levenberg-Marquardt algorithm

1 Introduction

A chaotic system exhibits complex dynamical behaviors with some special features, such as extreme sensitivity to initial conditions, a broad Fourier spectrum, and fractal properties of the motion in phase space [6, 9, 14]. All of these pose great challenges for chaotic systems modeling. In general, an unknown chaotic system is a black box belonging to a given class of nonlinearities; therefore, a non-model-based method is suitable [9]. The fuzzy method is a popular approach for modeling unknown nonlinear systems [11], especially the Takagi-Sugeno fuzzy method, which can describe more system information than a Mamdani fuzzy system [10]. In [8], Kevin M.
Passino listed several methods for Takagi-Sugeno fuzzy systems training, such as the Gauss-Newton method, recursive least squares, and the Levenberg-Marquardt method, of which the Levenberg-Marquardt (L-M) algorithm shows the most efficient convergence. However, most work on the L-M method has concentrated on the training of neural networks [1, 3, 5, 12, 15]. Furthermore, when the standard L-M method is used for training, several disadvantages remain. Firstly, the Jacobian matrix must be computed at each iteration, which requires large memory for the matrix operations; research on modifying the standard L-M method has mostly concentrated on this issue [2, 12, 15]. Secondly, serious error oscillations sometimes occur, and this phenomenon is even aggravated when approaching the required accuracy. In [3], the authors applied a variable decay rate to reduce error oscillations during L-M training of neural networks, and the training speed was greatly accelerated. Thirdly, when the L-M method is used for solving nonlinear equations, the Jacobian matrix must be nonsingular. However, this requirement seems too stringent for the purpose of ensuring convergence of the L-M method. Recently, Yamashita and Fukushima [13] have shown that the L-M method has a quadratic rate of convergence under a local error bound condition, which

Supported by the Doctorate Innovation Foundation of Nanjing University of Aeronautics and Astronautics (BCXJ06-06) and the National Natural Science Foundation of China under Grant (90450). Corresponding author. Tel.: +86-25-84893084; fax: +86-25-84892300. E-mail address: xiaodusan@163.com.

Published by World Academic Press, World Academic Union
is weaker than nonsingularity. See also [4] for subsequent results in the local error bound case; unfortunately, these results have not been applied to intelligent identification systems. Motivated by the above discussion, a modified method with a varying step size is proposed in this study, which speeds up the L-M algorithm under a local error bound condition [4, 13]. For the standard L-M method, a fixed step size may be sufficient for most applications. However, in practical applications of fuzzy systems, fast convergence in training is always desired, especially when the fuzzy system runs in real-time mode. Here, we choose the step size as a function of the approximation error to accelerate convergence, and its quadratic convergence is analysed theoretically. Finally, the proposed method is applied to training a Takagi-Sugeno fuzzy model of a unified chaotic system. It can adjust the parameters of each linear polynomial and of the fuzzy membership functions on line, and it raises the convergence speed greatly. Compared with the standard L-M method, the convergence speed is accelerated, as shown in the simulation results.

2 Chaotic systems modeling

2.1 Chaotic systems description

A unified chaotic system, namely

y = f(x) = [f_1(x), f_2(x), f_3(x)]^T = [\dot{x}_1, \dot{x}_2, \dot{x}_3]^T,
\dot{x}_1 = (25\alpha + 10)(x_2 - x_1),
\dot{x}_2 = (28 - 35\alpha)x_1 - x_1 x_3 + (29\alpha - 1)x_2,
\dot{x}_3 = x_1 x_2 - ((\alpha + 8)/3) x_3,  (1)

was introduced by Lü et al. [7]. It is governed by a parameter α and displays chaotic behaviour for α ∈ [0, 1]. Interestingly, the unified chaotic system belongs to the extended Lorenz system when 0 ≤ α < 0.8, to the Lü system when α = 0.8, and to the extended Chen system when 0.8 < α ≤ 1. The three-dimensional output trajectory of the chaotic system (1) for α = 0 and α = 0.99 with initial condition x(0) = [-7, 0, 30]^T is given in Fig. 1.
Fig. 1. 3-D chaotic trajectory for α = 0 and α = 0.99

Here, we assume that the chaotic system (1) is a non-model dynamics; that is, input-output data can be obtained from experiments, but the structure is unknown. Next, we will identify the system (1) from input-output data based on Takagi-Sugeno fuzzy theory.

2.2 Approximation scheme

The on-line function approximation scheme based on Takagi-Sugeno fuzzy systems is shown in Fig. 2, where we illustrate the tuning of ω_0 using the data pairs (x(i), y(i)) (i = 1, 2, ..., M, where M is the number of samples). The approximator based on the Takagi-Sugeno [10] fuzzy system is given by:

WJMS email for contribution: submit@wjms.org.uk
Fig. 2. Fuzzy on-line approximation scheme

\hat{y} = F_{ts}(x, \omega_0) = \frac{\sum_{i=1}^{R} h_i(x)\mu_i(x)}{\sum_{i=1}^{R} \mu_i(x)},  (2)

where h_i(x) = a_{i,0} + a_{i,1}x_1 + ... + a_{i,n}x_n, and the a_{i,j} (i = 1, 2, ..., R; j = 0, 1, ..., n) are constants usually defined by experts' experience. Also, for i = 1, 2, ..., R,

\mu_i(x) = \prod_{j=1}^{n} \exp\left(-\frac{1}{2}\left(\frac{x_j - c_j^i}{\sigma_j^i}\right)^2\right),  (3)

where c_j^i is the center of the membership function of the jth input for the ith rule, and σ_j^i > 0 is the relative width of the membership function of the jth input for the ith rule. The ith fuzzy rule is:

Rule i: IF x_1(t) is M_{i1} and ... and x_n(t) is M_{in} THEN \hat{y}(t) = h_i(x), i = 1, 2, ..., R.  (4)

Let

\zeta_j(x) = \frac{\mu_j(x)}{\sum_{i=1}^{R}\mu_i(x)}, j = 1, 2, ..., R,  (5)

\varphi(x) = [\zeta_1(x), \zeta_2(x), ..., \zeta_R(x), x_1\zeta_1(x), x_1\zeta_2(x), ..., x_1\zeta_R(x), ..., x_n\zeta_1(x), x_n\zeta_2(x), ..., x_n\zeta_R(x)]^T  (6)

and

\omega_0 = [a_{1,0}, a_{2,0}, ..., a_{R,0}, a_{1,1}, a_{2,1}, ..., a_{R,1}, ..., a_{1,n}, a_{2,n}, ..., a_{R,n}]^T,  (7)

so that the Takagi-Sugeno fuzzy system (2) can be rewritten as:

\hat{y} = F_{ts}(x, \omega_0) = \omega_0^T \varphi(x).  (8)

If the centers and widths of the membership functions are also required to be tuned on line, the vector ω containing all these parameters is defined as [8]:

\omega = [c_1^1, ..., c_n^R, \sigma_1^1, ..., \sigma_n^R, a_{1,0}, a_{2,0}, ..., a_{R,0}, a_{1,1}, a_{2,1}, ..., a_{R,1}, ..., a_{1,n}, a_{2,n}, ..., a_{R,n}]^T.  (9)

3 A modified L-M method

In this section, we present a modified L-M method, including a proof of its convergence and the expressions needed to apply it to Takagi-Sugeno fuzzy training. Firstly, the principle of the standard L-M method is reviewed.

WJMS email for subscription: info@wjms.org.uk
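The approximator (2)-(9) is easy to sketch numerically. Below is a minimal Python sketch of equations (3), (5), (6) and (8); the function names are ours, not from the paper, and it assumes centers and widths are stored as (R, n) arrays.

```python
import numpy as np

def memberships(x, centers, widths):
    # mu_i(x) = prod_j exp(-0.5 * ((x_j - c_j^i) / sigma_j^i)^2), eq. (3)
    d = (x[None, :] - centers) / widths           # shape (R, n)
    return np.exp(-0.5 * np.sum(d ** 2, axis=1))  # shape (R,)

def phi(x, centers, widths):
    # regressor vector of eq. (6): [zeta; x_1 * zeta; ...; x_n * zeta]
    mu = memberships(x, centers, widths)
    zeta = mu / np.sum(mu)                        # eq. (5)
    return np.concatenate([zeta] + [xj * zeta for xj in x])

def f_ts(x, omega0, centers, widths):
    # y_hat = omega0^T phi(x), eq. (8)
    return omega0 @ phi(x, centers, widths)
```

With the parameter ordering of (7), `omega0 @ phi(x, ...)` reproduces the weighted average of the rule consequents h_i(x) in (2), and the normalized firing strengths ζ_j sum to one.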
3.1 The standard L-M method review

In the standard L-M method, the approximation error vector is defined as:

\epsilon(\omega) = [\epsilon_1, \epsilon_2, ..., \epsilon_M]^T,  (10)

where ε(ω) is a mapping from R^p to R^M. Throughout the paper, we assume that ε(ω) is continuously differentiable and that ε(ω) = 0 has a nonempty solution set, which we denote by Ω*. The merit function J is defined by:

J(\omega) = \frac{1}{2}\sum_{i=1}^{M} \epsilon(i)^T\epsilon(i) = \frac{1}{2}\epsilon(\omega)^T\epsilon(\omega).  (11)

The parameter update formula of the standard L-M method is:

\Delta\omega^k = -[(\nabla\epsilon(\omega^k))^T \nabla\epsilon(\omega^k) + \Lambda_k]^{-1}(\nabla\epsilon(\omega^k))^T \epsilon(\omega^k),  (12)

where k is the iteration number, ∇ε(ω^k) is the Jacobian matrix, and Λ_k = λ_k I ∈ R^{p×p} is a diagonal matrix chosen such that (∇ε(ω^k))^T ∇ε(ω^k) + Λ_k is positive definite and hence invertible.

3.2 Quadratic convergence of the modified L-M method

In the standard L-M method, thinking of λ_k as a step size, the value of λ_k is taken to be a constant. Generally, a small value of λ_k gives fast convergence, while a larger value gives slower convergence [8]. Here, we choose λ_k as a function of the approximation error to accelerate convergence. Firstly, to establish the quadratic rate of convergence of the L-M method, a local error bound [13] for ε(ω) = 0 is defined as follows.

Definition 1. Let N be a subset of R^p such that N ∩ Ω* ≠ ∅. We say that ||ε(ω)|| provides a local error bound on N for ε(ω) = 0 if there exists a positive constant c such that

||\epsilon(\omega)|| \ge c \cdot dist(\omega, \Omega^*), \forall \omega \in N.  (13)

The local error bound condition is weaker than nonsingularity of ∇ε(ω*). Here, we discuss the local convergence properties of the modified L-M method without line search, i.e. the update formula is given by:

\omega^{k+1} = \omega^k + \Delta\omega^k, \quad \Delta\omega^k = -[(\nabla\epsilon^k)^T\nabla\epsilon^k + \lambda_k I]^{-1}(\nabla\epsilon^k)^T\epsilon^k,  (14)

where λ_k is defined by:

\lambda_k = ||\epsilon(\omega^k)|| = ||\epsilon^k||, if dist(\omega^k, \Omega^*) \ge 1;
\lambda_k = ||\epsilon(\omega^k)||^2 = ||\epsilon^k||^2, if dist(\omega^k, \Omega^*) < 1.
(15)

By using a property of the minimization problem, we will investigate the rate of convergence of the modified L-M method. Define the quadratic function δ_k : R^p → R by

\delta_k(\Delta\omega) = ||\nabla\epsilon^k \Delta\omega + \epsilon^k||^2 + \lambda_k ||\Delta\omega||^2.  (16)

The function δ_k is strictly convex. Therefore, the unconstrained minimization problem

\min_{\Delta\omega \in R^p} \delta_k(\Delta\omega)  (17)

is equivalent to (14), as is easily verified from the first-order optimality condition for (17). Next, we make some assumptions that will guarantee the quadratic rate of convergence of the L-M method.

Assumption 1. [13] For some ω* ∈ Ω*, the following conditions (a), (b) hold:

WJMS email for contribution: submit@wjms.org.uk
(a) There exist constants b ∈ (0, ∞) and c_1 ∈ (0, ∞) such that

||\nabla\epsilon(\nu)(\omega - \nu) - (\epsilon(\omega) - \epsilon(\nu))|| \le c_1 ||\omega - \nu||^2, \forall \omega, \nu \in N(\omega^*, b) = \{\omega \in R^p : ||\omega - \omega^*|| \le b\}.  (18)

(b) ||ε(ω)|| provides a local error bound on N(ω*, b) for ε(ω) = 0, i.e. there exists a constant c_2 ∈ (0, ∞) such that

||\epsilon(\omega)|| \ge c_2 \cdot dist(\omega, \Omega^*), \forall \omega \in N(\omega^*, b).  (19)

Assumption 1(a) holds when ε(ω) is continuously differentiable and ∇ε(ω) is Lipschitz continuous. Note that, by Assumption 1(a), there exists a positive constant L such that

||\epsilon(\omega) - \epsilon(\nu)|| \le L ||\omega - \nu||, \forall \omega, \nu \in N(\omega^*, b).  (20)

Under these assumptions, we show the following key lemma. For every k, let ω̄^k denote a vector in Ω* such that

||\omega^k - \bar{\omega}^k|| = dist(\omega^k, \Omega^*), \bar{\omega}^k \in \Omega^*.  (21)

Below, ω* refers to the solution of ε(ω) = 0 specified in Assumption 1.

Lemma 1. Suppose that Assumption 1 holds. If ω^k ∈ N(ω*, b/2), then there exist constants c_3, c_4, c_5, c_6 such that the solution Δω^k of (14) satisfies

||\Delta\omega^k|| \le c_3 ||\omega^k - \bar{\omega}^k|| if dist(\omega^k, \Omega^*) \ge 1, and ||\Delta\omega^k|| \le c_4 ||\omega^k - \bar{\omega}^k|| if dist(\omega^k, \Omega^*) < 1;  (22)

||\nabla\epsilon^k \Delta\omega^k + \epsilon^k|| \le c_5 ||\omega^k - \bar{\omega}^k||^2 if dist(\omega^k, \Omega^*) \ge 1, and ||\nabla\epsilon^k \Delta\omega^k + \epsilon^k|| \le c_6 ||\omega^k - \bar{\omega}^k||^2 if dist(\omega^k, \Omega^*) < 1.  (23)

Proof. Since Δω^k solves the minimization problem (17), we get

\delta_k(\Delta\omega^k) \le \delta_k(\bar{\omega}^k - \omega^k).  (24)

Furthermore, since ω^k ∈ N(ω*, b/2), we get

||\bar{\omega}^k - \omega^*|| \le ||\bar{\omega}^k - \omega^k|| + ||\omega^k - \omega^*|| \le 2||\omega^k - \omega^*|| \le b  (25)

and hence ω̄^k ∈ N(ω*, b). It follows from the definition of δ_k, the inequality (24), Assumption 1(a), and ε(ω̄^k) = 0 that

||\Delta\omega^k||^2 \le \delta_k(\Delta\omega^k)/\lambda_k \le \delta_k(\bar{\omega}^k - \omega^k)/\lambda_k
= (1/\lambda_k)(||\epsilon(\omega^k) + \nabla\epsilon(\omega^k)(\bar{\omega}^k - \omega^k)||^2 + \lambda_k||\bar{\omega}^k - \omega^k||^2)
= (1/\lambda_k)(||\nabla\epsilon(\omega^k)(\bar{\omega}^k - \omega^k) - (\epsilon(\bar{\omega}^k) - \epsilon(\omega^k))||^2 + \lambda_k||\bar{\omega}^k - \omega^k||^2)
\le (c_1^2/\lambda_k)||\omega^k - \bar{\omega}^k||^4 + ||\omega^k - \bar{\omega}^k||^2.  (26)

By the local error bound (19) and the definition (15) of λ_k,

\lambda_k = ||\epsilon^k|| \ge c_2||\omega^k - \bar{\omega}^k|| if dist(\omega^k, \Omega^*) \ge 1, and \lambda_k = ||\epsilon^k||^2 \ge c_2^2||\omega^k - \bar{\omega}^k||^2 if dist(\omega^k, \Omega^*) < 1.  (27)

Substituting (27) into (26) gives

||\Delta\omega^k||^2 \le (c_1^2 b/(2c_2) + 1)||\omega^k - \bar{\omega}^k||^2 if dist(\omega^k, \Omega^*) \ge 1, and ||\Delta\omega^k||^2 \le (c_1^2/c_2^2 + 1)||\omega^k - \bar{\omega}^k||^2 if dist(\omega^k, \Omega^*) < 1,  (28)

where in the first case we used the fact that ω^k ∈ N(ω*, b/2) implies

||\omega^k - \bar{\omega}^k|| \le ||\omega^k - \omega^*|| \le b/2.  (29)

WJMS email for subscription: info@wjms.org.uk
So (28) can be rewritten as

||\Delta\omega^k|| \le c_3||\omega^k - \bar{\omega}^k|| if dist(\omega^k, \Omega^*) \ge 1, and ||\Delta\omega^k|| \le c_4||\omega^k - \bar{\omega}^k|| if dist(\omega^k, \Omega^*) < 1.  (30)

Denoting c_3 = \sqrt{c_1^2 b/(2c_2) + 1} and c_4 = \sqrt{c_1^2/c_2^2 + 1}, we get (22). Next we show the second inequality. In a way similar to the proof of the first inequality, we can show

||\nabla\epsilon^k \Delta\omega^k + \epsilon^k||^2 \le \delta_k(\Delta\omega^k) \le \delta_k(\bar{\omega}^k - \omega^k) \le c_1^2||\omega^k - \bar{\omega}^k||^4 + \lambda_k||\omega^k - \bar{\omega}^k||^2.  (31)

Moreover, by the Lipschitz property (20) and ε(ω̄^k) = 0, the definition (15) of λ_k gives

\lambda_k = ||\epsilon^k|| \le L||\omega^k - \bar{\omega}^k|| if dist(\omega^k, \Omega^*) \ge 1, and \lambda_k = ||\epsilon^k||^2 \le L^2||\omega^k - \bar{\omega}^k||^2 if dist(\omega^k, \Omega^*) < 1.  (32)

Combining (31) and (32), and using ||\omega^k - \bar{\omega}^k||^3 \le ||\omega^k - \bar{\omega}^k||^4 when dist(\omega^k, \Omega^*) \ge 1, we have

||\nabla\epsilon^k \Delta\omega^k + \epsilon^k||^2 \le (c_1^2 + L)||\omega^k - \bar{\omega}^k||^4 if dist(\omega^k, \Omega^*) \ge 1, and ||\nabla\epsilon^k \Delta\omega^k + \epsilon^k||^2 \le (c_1^2 + L^2)||\omega^k - \bar{\omega}^k||^4 if dist(\omega^k, \Omega^*) < 1.  (33)

Denoting c_5 = \sqrt{c_1^2 + L} and c_6 = \sqrt{c_1^2 + L^2}, we get (23). Hence the proof is complete.

The next theorem shows that dist(ω^k, Ω*) converges to 0 quadratically, as long as the iterates {ω^k} lie sufficiently near ω*.

Theorem 1. If ω^{k+1}, ω^k ∈ N(ω*, b/2), then there exist constants c_7, c_8 such that

dist(\omega^{k+1}, \Omega^*) \le c_7 dist(\omega^k, \Omega^*)^2 if dist(\omega^k, \Omega^*) \ge 1, and dist(\omega^{k+1}, \Omega^*) \le c_8 dist(\omega^k, \Omega^*)^2 if dist(\omega^k, \Omega^*) < 1.  (34)

Proof. Since ω^{k+1}, ω^k ∈ N(ω*, b/2) and ω^{k+1} = ω^k + Δω^k, it follows from Assumption 1 and Lemma 1 that

dist(\omega^{k+1}, \Omega^*) \le (1/c_2)||\epsilon(\omega^k + \Delta\omega^k)|| \le (1/c_2)(||\nabla\epsilon^k \Delta\omega^k + \epsilon^k|| + c_1||\Delta\omega^k||^2).  (35)

Therefore, substituting (30) and (33) into (35), we get

dist(\omega^{k+1}, \Omega^*) \le c_7 dist(\omega^k, \Omega^*)^2 if dist(\omega^k, \Omega^*) \ge 1, and dist(\omega^{k+1}, \Omega^*) \le c_8 dist(\omega^k, \Omega^*)^2 if dist(\omega^k, \Omega^*) < 1,  (36)

where c_7 = (c_5 + c_1 c_3^2)/c_2 and c_8 = (c_6 + c_1 c_4^2)/c_2, which gives (34). Hence the proof is complete.

From this theorem, we see that dist(ω^k, Ω*) converges to 0 quadratically if ω^k ∈ N(ω*, b/2) for all k.

Remark 1. During the training process, it is difficult to judge whether dist(ω^k, Ω*) is bigger than 1. In general, if the parameters c_2 and L are properly estimated using (19) and (20), the condition can be converted into judging whether ||ε^k|| is bigger than 1.
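The update (14)-(15), together with Remark 1's practical test on ||ε^k||, can be sketched in a few lines of Python; the quadratic decay of the distance to the solution set predicted by Theorem 1 is then easy to observe on a toy scalar equation. The function names and the test equation e^ω - 1 = 0 (whose solution set is {0}) are our illustrative choices, not from the paper.

```python
import numpy as np

def modified_lm(omega, residual, jacobian, iters):
    """Iterate the update (14) with the adaptive step size (15);
    following Remark 1, the switch is made on ||eps^k|| instead of
    the unobservable distance to the solution set."""
    hist = []
    for _ in range(iters):
        eps = np.atleast_1d(residual(omega))
        J = np.atleast_2d(jacobian(omega))
        err = np.linalg.norm(eps)
        lam = err if err >= 1.0 else err ** 2          # step size (15)
        H = J.T @ J + lam * np.eye(omega.size)
        omega = omega - np.linalg.solve(H, J.T @ eps)  # update (14)
        hist.append(omega.copy())
    return omega, hist

# Toy problem: eps(w) = e^w - 1, so dist(w, Omega*) = |w|.
w, hist = modified_lm(np.array([0.5]),
                      lambda w: np.exp(w[0]) - 1.0,
                      lambda w: np.exp(w[0]),
                      iters=6)
d = [abs(h[0]) for h in hist]  # distances to the solution after each step
```

On this example each distance is below the square of the previous one once the iterate is near the solution, which is exactly the behaviour asserted in (34).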
3.3 Modified L-M method for fuzzy training

The object of training is that ŷ → y as the iteration number k → ∞. When training with the modified L-M method, the approximation error is:

\epsilon(i) = y(i) - \hat{y}(i), i = 1, 2, ..., M.  (37)

The error vector is a function of ω, which collects all the weights of the fuzzy system (2). The increment Δω of the weights is obtained from (14), where the Jacobian matrix ∇ε(ω) is defined by:

\nabla\epsilon(\omega) = \begin{bmatrix} \partial\epsilon_1/\partial\omega_1 & \cdots & \partial\epsilon_1/\partial\omega_p \\ \vdots & \ddots & \vdots \\ \partial\epsilon_M/\partial\omega_1 & \cdots & \partial\epsilon_M/\partial\omega_p \end{bmatrix}.  (38)

Notice that for i = 1, 2, ..., M and j = 1, 2, ..., p:

\frac{\partial\epsilon_i}{\partial\omega_j} = \frac{\partial}{\partial\omega_j}(y(i) - \hat{y}(i)) = -\frac{\partial}{\partial\omega_j}F_{ts}(x(i), \omega).  (39)

It is convenient to compute these partial derivatives by considering the various components of the vector ω in sequence. We use indices ī and j̄ to avoid confusion with the indices i and j. For ī = 1, 2, ..., M and j̄ = 1, 2, ..., R, differentiating (2) with respect to the center c_1^{j̄} by the quotient rule gives

\frac{\partial F_{ts}(x(\bar{i}), \omega)}{\partial c_1^{\bar{j}}}
= \frac{h_{\bar{j}}(x(\bar{i})) \frac{\partial\mu_{\bar{j}}(x(\bar{i}))}{\partial c_1^{\bar{j}}} \sum_{i=1}^{R}\mu_i(x(\bar{i})) - \left(\sum_{i=1}^{R} h_i(x(\bar{i}))\mu_i(x(\bar{i}))\right) \frac{\partial\mu_{\bar{j}}(x(\bar{i}))}{\partial c_1^{\bar{j}}}}{\left(\sum_{i=1}^{R}\mu_i(x(\bar{i}))\right)^2}
= \frac{h_{\bar{j}}(x(\bar{i})) - F_{ts}(x(\bar{i}), \omega)}{\sum_{i=1}^{R}\mu_i(x(\bar{i}))} \mu_{\bar{j}}(x(\bar{i})) \frac{x_1(\bar{i}) - c_1^{\bar{j}}}{(\sigma_1^{\bar{j}})^2}.  (40)

Using the same development, we can find the other elements of the ∇ε(ω) matrix. For reasons of space, we only give the following expressions:

\frac{\partial F_{ts}(x(\bar{i}), \omega)}{\partial \sigma_1^{\bar{j}}} = \frac{h_{\bar{j}}(x(\bar{i})) - F_{ts}(x(\bar{i}), \omega)}{\sum_{i=1}^{R}\mu_i(x(\bar{i}))} \mu_{\bar{j}}(x(\bar{i})) \frac{(x_1(\bar{i}) - c_1^{\bar{j}})^2}{(\sigma_1^{\bar{j}})^3},  (41)

\frac{\partial F_{ts}(x(\bar{i}), \omega)}{\partial a_{\bar{j},0}} = \zeta_{\bar{j}}(x(\bar{i})),  (42)

\frac{\partial F_{ts}(x(\bar{i}), \omega)}{\partial a_{\bar{j},1}} = x_1(\bar{i})\zeta_{\bar{j}}(x(\bar{i})).  (43)

Following the same rule, all the elements of the ∇ε(ω) matrix can be obtained, and hence the modified L-M method can be implemented for Takagi-Sugeno fuzzy systems training.

4 Modified L-M method for chaotic systems fuzzy modeling

To verify the performance of the proposed method, we use a Takagi-Sugeno fuzzy system to estimate the chaotic system (1) with α = 0.99.
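The analytic entry (40) can be verified against a finite difference, which is a useful sanity check before assembling the full Jacobian (38). A small Python sketch (function names and the random test point are ours):

```python
import numpy as np

def f_ts(x, A, C, S):
    """T-S output (2): A is (R, n+1) with rows (a_{i,0}, ..., a_{i,n});
    C and S are (R, n) centers and widths."""
    mu = np.exp(-0.5 * np.sum(((x - C) / S) ** 2, axis=1))
    h = A[:, 0] + A[:, 1:] @ x
    return np.sum(h * mu) / np.sum(mu)

def dF_dc(x, A, C, S, i, j):
    """Analytic derivative of F_ts w.r.t. the center c_j^i, eq. (40)."""
    mu = np.exp(-0.5 * np.sum(((x - C) / S) ** 2, axis=1))
    h = A[:, 0] + A[:, 1:] @ x
    F = np.sum(h * mu) / np.sum(mu)
    return (h[i] - F) / np.sum(mu) * mu[i] * (x[j] - C[i, j]) / S[i, j] ** 2
```

The derivatives (41)-(43) with respect to the widths and the consequent coefficients can be checked in exactly the same way.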
Because f_1(x) is a linear function, here we only approximate the nonlinear functions f_2(x) and f_3(x). The parameter update formula is given by (14), and the training step size is defined by (15).
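Training data for f_2 and f_3 can be generated by integrating the system (1) numerically. A sketch using the classical fourth-order Runge-Kutta method follows; the step size, horizon, and function names are our choices for illustration.

```python
import numpy as np

def unified_chaotic(x, alpha):
    # right-hand side of the unified chaotic system (1)
    x1, x2, x3 = x
    return np.array([
        (25.0 * alpha + 10.0) * (x2 - x1),
        (28.0 - 35.0 * alpha) * x1 - x1 * x3 + (29.0 * alpha - 1.0) * x2,
        x1 * x2 - (alpha + 8.0) / 3.0 * x3,
    ])

def rk4_trajectory(x0, alpha, dt=0.002, steps=10000):
    # classical 4th-order Runge-Kutta integration from x0
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        k1 = unified_chaotic(x, alpha)
        k2 = unified_chaotic(x + 0.5 * dt * k1, alpha)
        k3 = unified_chaotic(x + 0.5 * dt * k2, alpha)
        k4 = unified_chaotic(x + dt * k3, alpha)
        x = x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        traj.append(x.copy())
    return np.array(traj)

# sampled states x(i) and targets y(i) = f(x(i)) for the approximator
traj = rk4_trajectory([-7.0, 0.0, 30.0], alpha=0.99)
targets = np.array([unified_chaotic(x, 0.99) for x in traj])
```

The rows of `traj` and `targets` play the role of the data pairs (x(i), y(i)) of Sec. 2.2; the second and third columns of `targets` are the samples of f_2 and f_3 to be approximated.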
4.1 Parameter setting and initialization

(1) The centers of the membership functions are simply forced to lie between -5 and +5 (assuming that the maximum variation of the input domain is known), and the spreads to lie between 0.1 and 1; projection is used to maintain these bounds at each iteration, which avoids division by zero and provides an upper bound.
(2) The centers are initialized on a uniform grid across the input space [-5, 5].
(3) The spreads are all initialized to 0.5 so that there is a reasonable amount of separation. The parameters ω_0 are simply initialized to zero. The fuzzy rule number R is set to 2.

4.2 Results

Since x_1 is the key variable in which the nonlinearities of f_2(x) and f_3(x) lie, only the identification results of f_2(x) and f_3(x) with respect to x_1 are shown, in Fig. 3 to Fig. 6. The iterations are stopped when the approximation error is less than 0.02.

Fig. 3. Fuzzy approximation of f_2(x) and approximation error after one iterative step

Fig. 4. Fuzzy approximation of f_2(x) and approximation error after three iterative steps

As can be seen from the figures, the Takagi-Sugeno fuzzy system can approximate the unified chaotic system. The modified L-M method for fuzzy training satisfies the precision requirement after only three iterative steps, so it converges faster than the standard L-M method. Because the step size is a function of the approximation error, it reflects the variation of the approximation error; undoubtedly this helps the L-M algorithm to increase the adaptive ability of the training scheme.
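The whole scheme of Sec. 4 (initialization, modified L-M updates, projection of centers and spreads) can be sketched end to end on a simplified one-input target. In this sketch the target sin(x), the rule count, and the finite-difference Jacobian are our simplifications for brevity; the paper uses the analytic entries (40)-(43) and the targets f_2, f_3.

```python
import numpy as np

def f_ts_batch(X, w, R):
    # w packs [centers (R), widths (R), a_{i,0} (R), a_{i,1} (R)] for n = 1
    C, S = w[:R], w[R:2 * R]
    a0, a1 = w[2 * R:3 * R], w[3 * R:]
    out = np.empty(len(X))
    for k, x in enumerate(X):
        mu = np.exp(-0.5 * ((x - C) / S) ** 2)
        out[k] = np.sum((a0 + a1 * x) * mu) / np.sum(mu)
    return out

def train(X, y, w, R, iters=20, h=1e-6):
    for _ in range(iters):
        eps = y - f_ts_batch(X, w, R)                    # error (37)
        J = np.empty((len(X), len(w)))                   # Jacobian (38)
        for j in range(len(w)):                          # finite differences
            wp = w.copy(); wp[j] += h
            J[:, j] = ((y - f_ts_batch(X, wp, R)) - eps) / h
        err = np.linalg.norm(eps)
        lam = err if err >= 1.0 else err ** 2            # step size (15)
        w = w - np.linalg.solve(J.T @ J + lam * np.eye(len(w)), J.T @ eps)
        # projection of Sec. 4.1: centers in [-5, 5], spreads in [0.1, 1]
        w[:R] = np.clip(w[:R], -5.0, 5.0)
        w[R:2 * R] = np.clip(w[R:2 * R], 0.1, 1.0)
    return w

R = 5
X = np.linspace(-5.0, 5.0, 21)
y = np.sin(X)
w0 = np.concatenate([np.linspace(-5, 5, R), np.full(R, 0.5), np.zeros(2 * R)])
w = train(X, y, w0, R)
```

After a handful of iterations the approximation error drops well below its initial value, mirroring the fast convergence reported in Figs. 3-6.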
Fig. 5. Fuzzy approximation of f_3(x) and approximation error after one iterative step

Fig. 6. Fuzzy approximation of f_3(x) and approximation error after three iterative steps

5 Conclusions

Although the standard L-M method is considered one of the most effective methods for training Takagi-Sugeno fuzzy systems, fast convergence in training is our objective, especially for real-time operating systems. Furthermore, the modified L-M method for Takagi-Sugeno fuzzy systems training does not rely excessively on experts' experience. The method can adjust the parameters of each linear polynomial and of the fuzzy membership functions on line, and it raises the convergence speed greatly by taking the step size as a function of the approximation error. In this study, we theoretically analysed its quadratic convergence and applied it to training a Takagi-Sugeno fuzzy model of the unified chaotic system. The results show that the proposed method obtains faster convergence than the standard L-M method. Thus our work provides a new technique not only for the identification of chaotic systems, but also for complex nonlinear systems.

References

[1] A. S. Amir, B. T. Mohammad, H. Abbas. Modified Levenberg-Marquardt method for neural networks training. Transactions on Engineering, Computing and Technology, 2005, 6(2): 46-48.
[2] L. W. Chan, C. C. Szeto. Training recurrent network with block-diagonal approximated Levenberg-Marquardt algorithm. in: Proceedings of the International Joint Conference on Neural Networks, IEEE Press, Washington D.C., 1999, 1521-1526.
[3] T. C. Chen, D. J. Han. Acceleration of Levenberg-Marquardt training of neural networks with variable decay rate. in: Proceedings of the International Joint Conference on Neural Networks, IEEE Press, Portland, 2003, 1873-1878.
[4] J. Y. Fan, Y. X. Yuan.
On the quadratic convergence of the Levenberg-Marquardt method without nonsingularity assumption. Computing, 2005, 74: 23-39.
[5] M. T. Hagan, M. B. Menhaj. Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks, 1994, 5(6): 989-993.
[6] H. Huijberts, H. Nijmeijer, R. Willems. System identification in communication with chaotic systems. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 2000, 47(6): 800-808.
[7] J. H. Lü, G. Chen, S. Zhang. Dynamical analysis of a new chaotic attractor. International Journal of Bifurcation and Chaos, 2002, 12(5): 1001-1015.
[8] K. M. Passino. Biomimicry for Optimization, Control and Automation. Springer-Verlag, London, UK, 2005.
[9] A. S. Poznyak, W. Yu, E. N. Sanchez. Identification and control of unknown chaotic systems via dynamic neural networks. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 1999, 46(12): 1491-1495.
[10] T. Takagi, M. Sugeno. Fuzzy identification of systems and its applications to modeling and control. IEEE Transactions on Systems, Man and Cybernetics, 1985, 15(1): 116-132.
[11] L. X. Wang. Fuzzy systems are universal approximators. in: Proceedings of the IEEE International Conference on Fuzzy Systems, IEEE Press, San Diego, 1992, 1163-1170.
[12] B. M. Wilamowski, Y. Chen, A. Malinowski. Efficient algorithm for training neural networks with one hidden layer. in: Proceedings of the International Joint Conference on Neural Networks, IEEE Press, Washington D.C., 1999, 1725-1728.
[13] N. Yamashita, M. Fukushima. On the rate of convergence of the Levenberg-Marquardt method. Computing, 2001, 15 (Suppl.): 239-249.
[14] D. C. Yu, A. G. Wu, D. Q. Wang. A simple asymptotic trajectory control of full states of a unified chaotic system. Chinese Physics, 2006, 15(2): 306-309.
[15] G. Zhou, J. Si. Advanced neural-network training algorithm with reduced complexity based on Jacobian deficiency. IEEE Transactions on Neural Networks, 1998, 9(3): 448-453.