Zhang Neural Network without Using Time-Derivative Information for Constant and Time-Varying Matrix Inversion

Yunong Zhang, Member, IEEE, Zenghai Chen, Ke Chen, and Binghuang Cai

Abstract—To obtain the inverses of time-varying matrices in real time, a special kind of recurrent neural network has recently been proposed by Zhang et al. It is proved that such a Zhang neural network (ZNN) can globally exponentially converge to the exact inverse of a given time-varying matrix. To find out the effect of the time-derivative term on global convergence, as well as for easier hardware implementation, the ZNN model without exploiting time-derivative information is investigated in this paper for online matrix inversion. Theoretical results for both the constant and the time-varying matrix-inversion cases are presented for comparative and illustrative purposes. To substantiate these theoretical results, computer-simulation results are shown, which demonstrate the importance of the time-derivative term of the given matrix for the exact convergence of the ZNN model to time-varying matrix inverses.

I. INTRODUCTION

The problem of obtaining the inverse of a matrix online arises in numerous fields of science, engineering, and business. It is usually a fundamental part of many solutions, e.g., an essential step in signal processing [1], [2] and robot control [3], [4]. The circuit-realizable dynamic-system approach is one of the important parallel-computational methods for solving matrix-inversion problems [3]-[8]. Recently, owing to in-depth research on neural networks, numerous dynamic and analog solvers in the form of recurrent neural networks have been developed and investigated [4], [8], [10]-[12]. The neural-dynamic approach is now regarded as a powerful alternative for online computation because of its parallel distributed nature and its convenience of hardware implementation [13], [14].
A special kind of recurrent neural network with implicit dynamics has recently been proposed by Zhang et al for solving time-varying equations; see, e.g., [4], [9], [10], [12]. To solve for the inverse of a time-varying matrix A(t) ∈ ℝ^{n×n}, the following ZNN model can be established:

A(t)Ẋ(t) = −Ȧ(t)X(t) − γF(A(t)X(t) − I),   (1)

where, starting from an initial condition X(0) ∈ ℝ^{n×n}, X(t) is the activation state matrix corresponding to the time-varying inverse A⁻¹(t). On the other hand, to invert a constant matrix A ∈ ℝ^{n×n}, the derivative circuit −Ȧ(t)X(t) (in short, the D-circuit) drops out because the time derivative Ȧ(t) = 0 in this case, and the above ZNN model (1) reduces to

AẊ(t) = −γF(AX(t) − I).   (2)

(Y. Zhang, Z. Chen, and B. Cai are with the Department of Electronics and Communication Engineering, Sun Yat-Sen University, Guangzhou 510275, China. K. Chen is with the School of Software, Sun Yat-Sen University, Guangzhou 510275, China. Emails: ynzhang@ieee.org, zhynong@mail.sysu.edu.cn.)

In (1) and (2), I ∈ ℝ^{n×n} denotes the identity matrix, design parameter γ > 0 is used to scale the convergence rate of the neural solution, and F(·) : ℝ^{n×n} → ℝ^{n×n} denotes a matrix-valued activation-function array of the neural network. Processing array F(·) is made of n² monotonically increasing odd activation functions f(·). For example, the first three of the following basic types of activation functions are depicted in Fig. 1:
- linear activation function f(u) = u;
- bipolar sigmoid activation function (with ξ ≥ 2) f(u) = (1 − exp(−ξu))/(1 + exp(−ξu));
- power activation function (with odd integer p ≥ 3) f(u) = u^p; and
- power-sigmoid activation function

f(u) = u^p, if |u| ≥ 1; f(u) = δ(1 − exp(−ξu))/(1 + exp(−ξu)), otherwise,   (3)

with ξ ≥ 2, p ≥ 3, and δ = (1 + exp(−ξ))/(1 − exp(−ξ)) > 1. It has been shown that ZNN (1) can globally exponentially converge to the exact inverse of a given time-varying matrix [4], [9], [10], [12].
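For simulation purposes, the four activation-function types above can be sketched as scalar functions applied elementwise. The following is a minimal Python sketch; the parameter defaults (ξ = 4, p = 3) are illustrative choices, not values mandated by the paper:

```python
import numpy as np

def f_linear(u):
    # linear activation: f(u) = u
    return u

def f_bipolar_sigmoid(u, xi=4.0):
    # bipolar sigmoid: f(u) = (1 - exp(-xi u)) / (1 + exp(-xi u))
    return (1.0 - np.exp(-xi * u)) / (1.0 + np.exp(-xi * u))

def f_power(u, p=3):
    # power activation with odd integer p >= 3
    return u ** p

def f_power_sigmoid(u, xi=4.0, p=3):
    # piecewise: power branch for |u| >= 1, scaled sigmoid otherwise;
    # delta makes the two branches meet continuously at |u| = 1
    delta = (1.0 + np.exp(-xi)) / (1.0 - np.exp(-xi))
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) >= 1.0, u ** p,
                    delta * (1.0 - np.exp(-xi * u)) / (1.0 + np.exp(-xi * u)))
```

Each function is monotonically increasing and odd, as required of the entries f(·) of the processing array F(·); applying one of them to a NumPy matrix performs the elementwise mapping F(·).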
In order to know the effect of time-derivative information on neural matrix inversion (1), as well as for lower-complexity hardware implementation, such ZNN models without using time-derivative information are investigated in this paper for both constant-matrix and time-varying-matrix inversion. For a comparison between the above ZNN model and the conventional gradient-based neural network (GNN) for matrix inversion, please see Appendix A. The remainder of this paper is thus organized in three sections. In Section II, we analyze and simulate the ZNN model (2) for constant-matrix inversion. In Section III, we analyze and simulate the ZNN model (4) without D-circuit for time-varying matrix inversion. Section IV concludes this paper with final remarks. Before ending this introductory section, it is worth mentioning the main contributions of this article as follows.

1) We show that ZNN model (2) globally exponentially converges to the exact inverse of a given constant nonsingular matrix.

© 2008 IEEE
Fig. 1. Activation function f(·) being the ijth element of array F(·) (linear, sigmoid, and power types shown)

2) We show that the time-derivative information of the given matrix (or, to say, the D-circuit) plays an important role in ZNN model (1), which inverts time-varying matrices in real time.

3) We substantiate that the ZNN model without D-circuit can work approximately well for time-varying matrix inversion, if we pursue a simpler neural-circuit implementation and can accept a less accurate solution.

II. CONSTANT MATRIX INVERSION

In this section, the ZNN model without time-derivative information [i.e., neural dynamics (2)] is employed for inverting constant matrices online. Theoretical analysis is presented in detail and verified by computer simulations.

A. Theoretical Results

The following theoretical results are established about the global exponential convergence of ZNN (2), which inverts a constant nonsingular matrix A ∈ ℝ^{n×n} online.

Theorem 1: Consider a constant nonsingular matrix A ∈ ℝ^{n×n}. If a monotonically increasing odd activation-function array F(·) is used, then the state matrix X(t) of ZNN (2), starting from any initial state X(0) ∈ ℝ^{n×n}, always converges to the constant theoretical inverse X* := A⁻¹ of matrix A. Moreover, for constant A, the ZNN model (2) possesses

1) global exponential convergence with rate γ if using the linear activation-function array F(X) = X;
2) global exponential convergence with rate ξγ/2 if using the bipolar sigmoid activation-function array;
3) convergence superior to situation 1) for the error range |[AX(t) − I]_{ij}| > 1 with i, j ∈ {1, 2, …, n}, if using the power activation-function array; and
4) globally superior convergence to situation 1) if using the power-sigmoid activation-function array.

Proof: Omitted due to space limitation. ∎

B. Simulative Verification

For illustration, let us consider a constant nonsingular matrix A ∈ ℝ^{3×3}, with its theoretical inverse X* := A⁻¹ used for comparison.
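The rate-γ claim of Theorem 1, situation 1), can be checked numerically. With the linear array F(X) = X, model (2) gives AẊ = −γ(AX − I), i.e. Ẋ = −γ(X − A⁻¹), so the error norm should shrink by a factor e^{−γT} over a horizon T. A minimal forward-Euler sketch follows; the 2×2 matrix is a hypothetical stand-in (the entries of the paper's 3×3 example did not survive extraction), and the step size and horizon are arbitrary choices:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])          # hypothetical constant nonsingular matrix
gamma, dt, T = 5.0, 1e-4, 1.0

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (2, 2))  # random initial state X(0)

err0 = np.linalg.norm(X - np.linalg.inv(A))
for _ in range(int(T / dt)):
    # linear activation: A Xdot = -gamma (A X - I); solve instead of inverting A
    X = X + dt * np.linalg.solve(A, -gamma * (A @ X - np.eye(2)))
err_T = np.linalg.norm(X - np.linalg.inv(A))

ratio = err_T / err0   # Theorem 1 predicts ratio ≈ exp(-gamma * T)
```

Up to the forward-Euler discretization error, the measured decay ratio matches the predicted e^{−γT}, illustrating the exponential-convergence claim.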
The ZNN model (2) solving for A⁻¹ can thus be depicted in the specific entrywise form

AẊ(t) = −γF(AX(t) − I), with X(t) = [x_{ij}(t)] ∈ ℝ^{3×3},

where processing array F(·) is typically constructed by using n² = 9 power-sigmoid activation functions in the form of (3) with ξ = 4 and p = 3. As seen from Fig. 2, starting from initial states randomly selected in [−2, 2]^{3×3}, the state matrices of ZNN model (2) all converge to the constant theoretical inverse A⁻¹. Evidently, the convergence time can be decreased considerably by increasing design parameter γ. It follows from this figure and other simulation data that design parameter γ plays an important role in the convergence speed of ZNN models. In addition, for the situation of using the power-sigmoid activation-function array, Fig. 3 shows that superior convergence can be achieved, as compared to the other activation-function situations. Note that ‖A‖_F := √(trace(AᵀA)) hereafter denotes the Frobenius norm of matrix A. Computer simulation has now substantiated the theoretical analysis presented in Subsection II-A.

III. TIME-VARYING MATRIX INVERSION

While Section II investigates the performance of ZNN model (2) inverting constant matrices, time-varying matrices are inverted by the ZNN model in this section. For hardware implementation with lower complexity, however, instead of using ZNN (1), we can adopt the following simplified neural-dynamic model, obtained by removing the D-circuit [i.e., the time-derivative term −Ȧ(t)X(t)] from (1):

A(t)Ẋ(t) = −γF(A(t)X(t) − I),   (4)

which solves approximately for the time-varying inverse A⁻¹(t). Note that ZNN model (4) can be viewed as a time-varying version of ZNN model (2), obtained by replacing A therein with A(t).

A. Preliminaries

To lay a basis for further detailed analysis of ZNN model (4), the following invertibility condition and ‖A⁻¹(t)‖_F-bound lemma are presented [4].
The former guarantees the uniform existence of the time-varying matrix inverse A⁻¹(t), whereas the latter gives a uniform upper bound of ‖A⁻¹(t)‖_F. They will be used in the theoretical analysis of ZNN model (4) in the ensuing subsection.

2008 International Joint Conference on Neural Networks (IJCNN 2008)
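As a concrete instance of the invertibility condition, the time-varying example used later in Section III-C, A(t) = [sin t, cos t; −cos t, sin t], has eigenvalues sin t ± i cos t, whose modulus is exactly 1 for every t, so the condition holds with α = 1. A quick numerical confirmation:

```python
import numpy as np

def A_of(t):
    # Section III-C example: a rotation-form time-varying matrix
    return np.array([[np.sin(t), np.cos(t)],
                     [-np.cos(t), np.sin(t)]])

# smallest eigenvalue modulus of A(t) over a sampled grid of t
min_mod = min(np.abs(np.linalg.eigvals(A_of(t))).min()
              for t in np.linspace(0.0, 2 * np.pi, 200))
# min_mod stays at 1 up to round-off, so alpha = 1 works uniformly in t
```

Since the eigenvalue moduli never approach zero, A(t) is uniformly invertible for all t, exactly what the condition demands.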
Fig. 2. Inversion of constant matrix A by ZNN (2) using the power-sigmoid activation-function array with ξ = 4 and p = 3, under two different values of design parameter γ

Fig. 3. Convergence comparison of the solution error ‖X(t) − A⁻¹‖_F by ZNN (2) using different activation-function arrays (linear, sigmoid, power, and power-sigmoid cases)

Condition: There exists a real number α > 0 such that

min_{i∈{1,2,…,n}} |λ_i(A(t))| ≥ α, ∀t ≥ 0,   (5)

where λ_i(·) denotes the ith eigenvalue of A(t) ∈ ℝ^{n×n}.

Lemma: If A(t) satisfies the invertibility condition (5) with its norm uniformly upper bounded by β (i.e., ‖A(t)‖_F ≤ β, ∀t ≥ 0), then ‖A⁻¹(t)‖_F is uniformly upper bounded as well, i.e.,

‖A⁻¹(t)‖_F ≤ ϕ := Σ_{i=1}^{n−1} C_n^i β^{n−i}/α^{n−i+1} + n^{3/2}/α   (6)

for any time t ≥ 0, where C_n^i := n!/(i!(n−i)!) [4].

B. Theoretical Results

The following theoretical results are established about the solution-error bound of the simplified ZNN model (4) inverting a time-varying nonsingular matrix A(t) online.

Theorem 2: Consider a time-varying nonsingular matrix A(t) ∈ ℝ^{n×n} which satisfies invertibility condition (5) and norm condition (6). If a monotonically increasing odd activation-function array F(·) is used, then the computational error ‖X(t) − A⁻¹(t)‖_F of ZNN (4), starting from any initial state X(0) ∈ ℝ^{n×n}, is always upper bounded, with its steady-state solution error no greater than nεϕ²/(γρ − εϕ), provided that ‖Ȧ(t)‖_F ≤ ε for any t ∈ [0, ∞) and design parameter γ is large enough (γ > εϕ/ρ), where coefficient

ρ := min( max_{i,j∈{1,2,…,n}} ( f(|e_{ij}(0)|)/|e_{ij}(0)| ), f′(0) ),   (7)

with e_{ij}(0) := [A(0)X(0) − I]_{ij}, i, j ∈ {1, 2, …, n}.

Proof: We can reformulate ZNN (4) as the following [with Δ_B(t) := −Ȧ(t) and Δ_C(t) := 0 ∈ ℝ^{n×n}]:

A(t)Ẋ(t) = −(Ȧ(t) + Δ_B)X(t) − γF(A(t)X(t) − I) + Δ_C,   (8)

which is exactly the perturbed RNN model considered in [4]. In view of ‖Δ_B‖_F = ‖−Ȧ‖_F = ‖Ȧ‖_F ≤ ε and ‖Δ_C‖_F = 0 for any t ∈ [0, ∞), we can now reuse the theoretical results of the corresponding theorem in [4].
That is, the computational error ‖X(t) − A⁻¹(t)‖_F of neural dynamics (8) [equivalently, ZNN (4)] is always upper bounded. In addition, it follows immediately from the theorem of [4] restated in Appendix B that the steady-state computational error satisfies lim_{t→∞} ‖X(t) − A⁻¹(t)‖_F ≤ nεϕ²/(γρ − εϕ). Furthermore, design parameter γ is required therein to be greater than εϕ/ρ. In the original proof in [4], coefficient ρ > 0 is defined between f(e_{ij}(0))/e_{ij}(0) and f′(0). Following that proof and considering the worst case of such an error bound, we can determine the value of ρ as in (7). Specifically speaking, if the linear activation-function array F(X) = X is used, then ρ = 1; if the bipolar sigmoid activation-function array is used, then ρ = max_{i,j∈{1,2,…,n}} (f(|e_{ij}(0)|)/|e_{ij}(0)|); and, if the power-sigmoid activation-function array (3) is used, then ρ ≥ 1 (where the strict inequality holds in most situations). The proof is thus complete. ∎
Fig. 4. Inversion of time-varying matrix A(t) by ZNN (4) using the power-sigmoid activation-function array, where dash-dotted curves denote the theoretical time-varying inverse A⁻¹(t)

Fig. 6. Inversion of time-varying matrix A(t) by ZNN (1) using the power-sigmoid activation-function array and the same design parameter γ

Fig. 5. Computational error ‖X(t) − A⁻¹(t)‖_F by ZNN (4) using the power-sigmoid activation-function array and different values of parameter γ

C. Simulative Verification

For illustration and comparison, let us consider the time-varying coefficient matrix with its theoretical inverse as follows:

A(t) = [sin t, cos t; −cos t, sin t],  A⁻¹(t) = [sin t, −cos t; cos t, sin t].

ZNN model (4) is thus in the specific form

[sin t, cos t; −cos t, sin t][ẋ₁₁, ẋ₁₂; ẋ₂₁, ẋ₂₂] = −γF([sin t, cos t; −cos t, sin t][x₁₁, x₁₂; x₂₁, x₂₂] − I),

where processing array F(·) is typically constructed by using n² = 4 power-sigmoid activation functions in the form of (3) with ξ = 4 and p = 3. Figs. 4 and 5 can thus be generated to show the performance of ZNN (4). According to Figs. 4 and 5, starting from initial states randomly selected in [−2, 2]^{2×2}, the state matrices of the presented ZNN model (4) do not converge to the theoretical inverse exactly; instead, they only approach an approximate solution of A⁻¹(t). In addition, as shown in Fig. 5, when we increase design parameter γ, the steady-state computational error lim_{t→∞} ‖X(t) − A⁻¹(t)‖_F decreases rapidly. However, there always exists a steady-state solution error which does not vanish to zero. These computer-simulation results substantiate the theoretical results presented in Subsection III-B. For comparison between the simplified ZNN model (4) and the original ZNN model (1) [which has the time-derivative term −Ȧ(t)X(t)], we can generate Fig. 6 by applying ZNN (1) to this time-varying inversion.
It shows the performance of the original ZNN model (1) for the same time-varying matrix inversion under the same design parameters. Comparing Figs. 4 and 6, we see that the time-derivative information Ȧ(t) plays an important role in the convergence of ZNN models for time-varying matrix inversion.

IV. CONCLUSIONS

An efficient recurrent neural network for online time-varying matrix inversion has been proposed by Zhang et al [4], [9], [10], [12]. The performance analysis of such a ZNN model without the time-derivative term is presented in this paper. For comparative purposes, both constant-matrix and time-varying matrix inversion are analyzed. On one hand, since the time derivative Ȧ = 0 for a constant matrix, the global exponential convergence of ZNN model (2) for constant-matrix inversion can be achieved. On the other hand, without exploiting the Ȧ(t) information, the simplified ZNN model (4) only approaches an approximate inverse (instead of the exact one). Simulation results have demonstrated the importance of the time-derivative information Ȧ(t), whose absence limits the performance of the simplified ZNN model for online time-varying matrix inversion.
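The qualitative difference summarized above can be reproduced with a short simulation. Below, both the original ZNN (1) (with the −Ȧ(t)X(t) term) and the simplified ZNN (4) are integrated by forward Euler on the Section III-C example. This is an illustrative sketch (the step size, horizon, seed, and γ are arbitrary choices), but it shows the residual error of (4) settling at a nonzero level while (1) converges:

```python
import numpy as np

def f_power_sigmoid(U, xi=4.0, p=3):
    # power-sigmoid activation array of form (3), applied elementwise
    delta = (1.0 + np.exp(-xi)) / (1.0 - np.exp(-xi))
    return np.where(np.abs(U) >= 1.0, U ** p,
                    delta * (1.0 - np.exp(-xi * U)) / (1.0 + np.exp(-xi * U)))

def A_of(t):      # time-varying coefficient matrix (rotation form)
    return np.array([[np.sin(t), np.cos(t)], [-np.cos(t), np.sin(t)]])

def Adot_of(t):   # its time derivative
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

gamma, dt, T = 10.0, 2e-4, 4.0
I = np.eye(2)
rng = np.random.default_rng(1)
X1 = rng.uniform(-1.0, 1.0, (2, 2))   # state of the original ZNN (1)
X4 = X1.copy()                        # state of the simplified ZNN (4)

t = 0.0
for _ in range(int(T / dt)):
    A, Ad = A_of(t), Adot_of(t)
    # ZNN (1): A Xdot = -Adot X - gamma F(A X - I)
    X1 = X1 + dt * np.linalg.solve(A, -Ad @ X1 - gamma * f_power_sigmoid(A @ X1 - I))
    # ZNN (4): A Xdot = -gamma F(A X - I)   (D-circuit removed)
    X4 = X4 + dt * np.linalg.solve(A, -gamma * f_power_sigmoid(A @ X4 - I))
    t += dt

Ainv = A_of(t).T   # A(t) is orthogonal here, so its inverse is its transpose
err1 = np.linalg.norm(X1 - Ainv)   # near zero: exact convergence
err4 = np.linalg.norm(X4 - Ainv)   # settles at a nonzero residual level
```

Consistent with Theorem 2, err4 stays bounded away from zero (it shrinks only as γ grows), while the original model (1), which uses the Ȧ(t) information, tracks A⁻¹(t) exactly up to discretization error.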
APPENDIX A

The designs of the gradient-based neural network (GNN) and of the ZNN models (1), (2), and (4) can both be viewed as solving online the following defining equation:

AX(t) − I = 0, t ∈ [0, +∞),   (9)

where coefficient A is an n×n square matrix (constant for the GNN design, while allowed to be time-varying for the ZNN design), and X(t) ∈ ℝ^{n×n} is to be solved for.

GNN Design. The GNN model for online constant-matrix inversion can be developed by the following procedure. Note that the gradient-descent design method can only be employed here for the constant-matrix inversion problem. Firstly, to solve (9) for X(t) via a neural-dynamic approach, we define a scalar-valued norm-based error function E(t) = ‖AX(t) − I‖²_F /2. It is worth pointing out that the minimum point of residual-error function E(t), namely E(t) = 0, is achieved if and only if X(t) is the exact solution of equation (9) [in other words, X(t) = X* := A⁻¹]. Secondly, a computational scheme is designed to evolve along a descent direction of this error function E(t) until the minimum point X* is reached. Note that a typical descent direction is the negative gradient of E(t), i.e., −∂E/∂X ∈ ℝ^{n×n}. Thirdly, in view of ∂E/∂X = Aᵀ(AX(t) − I) ∈ ℝ^{n×n}, it follows from the gradient-descent design formula Ẋ(t) = −γ ∂E/∂X that the following neural-dynamic equation can be adopted as the conventional GNN model for online constant-matrix inversion:

Ẋ(t) = −γAᵀ(AX(t) − I), t ∈ [0, +∞),

where design parameter γ > 0 is defined the same as in the aforementioned ZNN models.

ZNN Design. The original ZNN model (1) for online constant and/or time-varying matrix inversion can be developed by the following procedure by Zhang et al [4], [9], [10], [12]; afterwards, it can be simplified to (2) and (4). Note that this ZNN design method can be employed for solving both constant and time-varying problems. Firstly, we construct a matrix-valued error function E(X(t), t) = A(t)X(t) − I.
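The three GNN design steps above can be sketched directly; the following is a minimal illustration with a hypothetical 2×2 matrix, using forward-Euler integration as a stand-in for the analog dynamics:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])       # hypothetical constant nonsingular matrix
gamma, dt = 10.0, 1e-3
X = np.zeros((2, 2))             # initial state X(0)

# Gradient-descent design formula: Xdot = -gamma * dE/dX = -gamma * A^T (A X - I)
for _ in range(20000):
    X = X - dt * gamma * A.T @ (A @ X - np.eye(2))

residual = np.linalg.norm(X - np.linalg.inv(A))   # near zero after convergence
```

For a constant A this descent drives E(t) to its minimum E = 0, i.e. X(t) → A⁻¹; the same scheme has no mechanism to track a time-varying A(t), which is exactly why the gradient-descent design is restricted to the constant case.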
Note that E(X(t), t) equals zero if and only if X(t) is the solution of (9). Secondly, the error-function time derivative Ė(X(t), t) is chosen so as to guarantee that every entry e_{ij}(t), i, j = 1, 2, …, n, of E(X(t), t) converges to zero. Its general form is given as

dE(X(t), t)/dt = −γF(E(X(t), t)),   (10)

where design parameter γ > 0 and activation-function array F(·) are defined as in the previous sections. Finally, according to ZNN design formula (10), we thus have the ZNN model (1) for time-varying matrix inversion (effective as well for constant-matrix inversion). The other ZNN variants [such as (2) and (4)] can then be derived readily from this original ZNN model (1).

APPENDIX B

Theorem (from [4]): Consider the general RNN model (8) with implementation errors Δ_B and Δ_C. If ‖Δ_B(t)‖_F ≤ ε₂ and ‖Δ_C(t)‖_F ≤ ε₃ for any t ∈ [0, ∞), then the computational error ‖X − X*‖_F is bounded, with steady-state residual error

lim_{t→∞} ‖X(t) − X*(t)‖_F ≤ nϕ(ε₃ + ε₂ϕ)/(γρ − ε₂ϕ)   (11)

under the design-parameter requirement γ > ε₂ϕ/ρ, where parameter ρ > 0 is defined between f(e_{ij}(0))/e_{ij}(0) and f′(0). Furthermore, as γ tends to positive infinity, the steady-state residual error can be diminished to zero.

REFERENCES

[1] R. J. Steriti and M. A. Fiddy, Regularized image reconstruction using SVD and a neural network method for matrix inversion, IEEE Transactions on Signal Processing, vol. 41, no. 10, pp. 3074-3077, 1993.

[2] Y. Zhang, W. E. Leithead, and D. J. Leith, Time-series Gaussian process regression based on Toeplitz computation of O(N²) operations and O(N)-level storage, Proc. 44th IEEE Conference on Decision and Control, Seville, 2005.

[3] R. H. Sturges, Jr., Analog matrix inversion (robot kinematics), IEEE Journal of Robotics and Automation, vol. 4, no. 2, pp. 157-162, 1988.

[4] Y. Zhang and S. S. Ge, Design and analysis of a general recurrent neural network model for time-varying matrix inversion, IEEE Transactions on Neural Networks, vol. 16, no. 6, pp. 1477-1490, 2005.

[5] N. C. F. Carneiro and L. P.
Caloba, A new algorithm for analog matrix inversion, Proc. 38th Midwest Symposium on Circuits and Systems, Rio de Janeiro, 1995.

[6] F. L. Luo and B. Zheng, Neural network approach to computing matrix inversion, Applied Mathematics and Computation, vol. 47, pp. 109-120, 1992.

[7] R. K. Manherz, B. W. Jordan, and S. L. Hakimi, Analog methods for computation of the generalized inverse, IEEE Transactions on Automatic Control, vol. 13, no. 5, pp. 582-585, 1968.

[8] J. Song and Y. Yam, Complex recurrent neural network for computing the inverse and pseudo-inverse of the complex matrix, Applied Mathematics and Computation, vol. 93, pp. 195-205, 1998.

[9] Y. Zhang and S. S. Ge, A general recurrent neural network model for time-varying matrix inversion, Proc. 42nd IEEE Conference on Decision and Control, Hawaii, 2003.

[10] Y. Zhang, K. Chen, and W. Ma, Matlab simulation and comparison of Zhang neural network and gradient neural network for online solution of linear time-varying equations, DCDIS Proc. International Conference on Life System Modeling and Simulation (LSMS 2007), Shanghai, 2007.

[11] Y. Zhang, K. Chen, W. Ma, and X. Li, Matlab simulation of gradient-based neural network for online matrix inversion, Lecture Notes in Artificial Intelligence, pp. 98-109, 2007.

[12] Y. Zhang, D. Jiang, and J. Wang, A recurrent neural network for solving Sylvester equation with time-varying coefficients, IEEE Transactions on Neural Networks, vol. 13, no. 5, pp. 1053-1063, 2002.

[13] C. Mead, Analog VLSI and Neural Systems, Reading, MA: Addison-Wesley, 1989.

[14] D. Tank and J. Hopfield, Simple neural optimization networks: an A/D converter, signal decision circuit, and a linear programming circuit, IEEE Transactions on Circuits and Systems, vol. 33, no. 5, pp. 533-541, 1986.

[15] Y. H. Kim, F. L. Lewis, and C. T. Abdallah, A dynamic recurrent neural-network-based adaptive observer for a class of nonlinear systems, Automatica, vol. 33, no. 8, pp. 1539-1543, 1997.

[16] J. Wang and G.
Wu, A multilayer recurrent neural network for on-line synthesis of minimum-norm linear feedback control systems via pole assignment, Automatica, vol. 32, no. 3, pp. 435-442, 1996.