A Reproducing Kernel Hilbert Space Approach for the Online Update of Radial Bases in Neuro-Adaptive Control


Hassan A. Kingravi, Girish Chowdhary, Patricio A. Vela, and Eric N. Johnson

Abstract: Classical work in model reference adaptive control for uncertain nonlinear dynamical systems with a Radial Basis Function (RBF) neural network adaptive element does not guarantee that the network weights stay bounded in a compact neighborhood of the ideal weights when the system signals are not Persistently Exciting (PE). Recent work has shown, however, that an adaptive controller using specifically recorded data concurrently with instantaneous data guarantees boundedness without PE signals. However, the work assumes fixed RBF network centers, which requires domain knowledge of the uncertainty. Motivated by Reproducing Kernel Hilbert Space theory, we propose an online algorithm for updating the RBF centers to remove the assumption. In addition to proving boundedness of the resulting neuro-adaptive controller, a connection is made between PE signals and kernel methods. Simulation results show improved performance.

Index Terms: Kernel, Adaptive control, Radial basis function networks, Nonlinear control systems

I. INTRODUCTION

In practice, it is impractical to assume that a perfect model of a system is available for controller synthesis. Assumptions introduced in modeling, and changes to system dynamics during operation, often result in modeling uncertainties that must be mitigated online. Model Reference Adaptive Control (MRAC) has been widely studied for broad classes of uncertain nonlinear dynamical systems with significant modeling uncertainties [, 4, 9, 4]. In MRAC, the system uncertainty is approximated using a weighted combination of basis functions, with the numerical values of the weights adapted online to minimize the tracking error.
When the structure of the uncertainty is unknown, a neuro-adaptive approach is often employed, in which a Neural Network (NN) with its weights adapted online is used to capture the uncertainty. A popular example of such a NN is the Gaussian Radial Basis Function (RBF) NN, for which the universal approximation property is known to hold [3]. Examples of MRAC systems employing Gaussian RBFs can be found in [6, 33, 39, 43]. In practice, then, nonlinear MRAC systems typically consist of an additional online learning component. What differentiates MRAC from traditional online learning strategies is the existence of the dynamical system and its connection to a control task. The strong connection between the learning and the evolving dynamical system through the controller imposes constraints on the online learning strategy employed. These constraints ensure viability of the system's closed-loop stability and performance. For example, classical gradient-based MRAC methods require a condition on Persistency of Excitation (PE) in the system states [3]. This condition is often impractical to implement and/or infeasible to monitor online. Hence, authors have introduced various modifications to the adaptive law to ensure that the weights stay bounded around an a priori determined value (usually set to 0). Examples of these include Ioannou's σ-mod, Narendra's e-mod, and the use of a projection operator to bound the weights [4, 4].

H. Kingravi and Assoc. Prof. P. A. Vela are with the School of Electrical and Computer Engineering, and Assoc. Prof. E. N. Johnson is with the School of Aerospace Engineering at the Georgia Institute of Technology, Atlanta, GA. G. Chowdhary is with the Department of Aeronautics and Astronautics at the Massachusetts Institute of Technology, Cambridge, MA. kingravi@gatech.edu, pvela@gatech.edu, Eric.Johnson@ae.gatech.edu, girishc@mit.edu
Recent work in concurrent learning has shown that if carefully selected and online recorded data is used concurrently with current data for adaptation, then the weights remain bounded within a compact neighborhood of the ideal weights [8, 9]. Neither concurrent learning nor the aforementioned modifications require PE. When concurrent learning is used, in order to successfully learn the uncertainty, a set of RBF centers must be chosen over the domain of the uncertainty. In prior neuro-adaptive control research, it is either assumed that the centers for the RBF network are fixed [8, 9, 9, 3, 4], or that the centers are moved to minimize the tracking error e [7, 38]. In the former case, the system designer is assumed to have some domain knowledge about the uncertainty to determine how the centers should be selected. In the latter case, the updates favor instantaneous tracking error reduction, so the resulting center movement does not guarantee optimal uncertainty capture across the entire operating domain. In this work, utilizing the fact that the Gaussian RBF satisfies the properties of a Mercer kernel, we use methods from the theory of Reproducing Kernel Hilbert Spaces (RKHSs) to remove the need for domain knowledge and to propose a novel kernel adaptive control algorithm. Partly due to the success of Support Vector Machines, there has been great interest in recent years in kernel methods, a class of algorithms exploiting RKHS properties which can be used for classification [6], regression [, 37], and unsupervised learning []. We first use RKHS theory to make a connection between PE signals in the state space (i.e., the input space) and PE signals in the higher-dimensional feature space (with respect to the RBF bases). In particular, we show that the amount of persistent excitation inserted in the states of the system may vanish or greatly diminish if the RBF centers are improperly chosen.
We then utilize this connection along with the kernel linear independence test [3] to propose an online algorithm, called Budgeted Kernel Restructuring (BKR). BKR picks centers so as to ensure that any excitation in the system does not vanish or diminish. The algorithm is then augmented with Concurrent Learning (CL). The resulting kernel adaptive algorithm (BKR-CL) uses instantaneous as well as specifically selected and online recorded data points concurrently for adaptation. Lyapunov-like analysis proves convergence of the RBF weights to a compact neighborhood of their ideal values (in the sense of the universal approximation property) if the system states are exciting over a finite interval. BKR-CL does not require PE. Further, BKR-CL can control an uncertain multivariable dynamical system effectively even if all the RBF centers are initialized to the same value (for example to 0). In addition to removing the assumption of fixed RBF centers, we show that, given a fixed budget (maximum number of allowable centers), the presented method outperforms existing methods that uniformly distribute the centers over an expected domain. Effectiveness of the algorithm is shown through simulation and comparison with other neuro-adaptive controllers. To the authors' knowledge, little work exists explicitly connecting ideas from RKHS theory to neuro-adaptive control. Recent work on bridging kernel methods with filtering and estimation exists under the name of kernel adaptive filtering []. Kernel recursive least squares methods [] and kernel least mean squares algorithms [] have been previously explored for kernel filtering, which is mainly used for system-identification problems, where the goal is to minimize the error between the system model and an estimation model. The main contribution of this paper, however, is the presentation of an adaptive control technique which uses RKHS theory. The key requirement is that the tracking error between the reference model and the system, and the adaptation error between the adaptive element and the actual uncertainty, be simultaneously driven to a bounded set. Kernel adaptive filtering only addresses the former requirement. While there has been other work applying kernel methods such as support vector machines to nonlinear control problems [, 3], it does not make any explicit connections to PE conditions, and does not show the boundedness of the weights in a region around their true values.
This paper takes the first crucial steps in establishing that connection, and therefore serves to further bring together ideas from machine learning and adaptive control.

Organization: Section II outlines some preliminaries, including the definition of PE and the relevant ideas from RKHS theory. Section III poses the MRAC problem for uncertain multivariable nonlinear dynamical systems, and outlines the concurrent learning method. Section IV establishes a connection between PE and RKHS theory, then utilizes the connection to outline the BKR algorithm for online center selection. Section V outlines the BKR-CL algorithm, and establishes boundedness of the weights for the centers using Lyapunov-like analysis. Section VI presents the results of simulated experiments. Section VII concludes the paper.

II. PRELIMINARIES

A. Reproducing Kernel Hilbert Spaces

Let H be a Hilbert space of functions defined on the domain D ⊂ R^n. Given any element f ∈ H and x ∈ D, by the Riesz representation theorem [], there exists some unique element K_x (called a point evaluation functional) s.t.

f(x) = <f, K_x>_H. (1)

H is called a reproducing kernel Hilbert space if every such linear functional K_x is continuous (something which is not true in an arbitrary Hilbert space). It is possible to construct these spaces using tools from operator theory. The following is standard material; see [3] and [] for more details. A Mercer kernel on D ⊂ R^n is any continuous, symmetric, positive-semidefinite function of the form k : D × D → R. Associated to k is a linear operator T_k : L²(D) → L²(D) which can be defined as

T_k(f)(x) := ∫_D k(x, s) f(s) ds,

where f is a function in L²(D), the space of square integrable functions on the domain D. Then the following theorem holds:

Theorem 1 (Mercer) [3]: Suppose k is a continuous, symmetric, positive-semidefinite kernel. Then there is an orthonormal basis {e_i}_{i∈I} of L²(D) consisting of eigenfunctions of T_k s.t. the corresponding sequence of eigenvalues (λ_i)_{i∈I} is nonnegative.
Furthermore, the eigenfunctions corresponding to non-zero eigenvalues are continuous on D, and k has the representation

k(s, t) = Σ_{j=1}^∞ λ_j e_j(s) e_j(t),

where the convergence is absolute and uniform. If Σ_{i=1}^∞ λ_i < ∞, this orthonormal set of eigenfunctions {e_i}_{i∈I} can be used to construct a space H := {f | f = Σ_{i=1}^∞ α_i e_i} for some sequence of reals (α_i)_{i∈I}, with the constraint that ||f||²_H := Σ_{i=1}^∞ α_i²/λ_i < ∞, where ||·||_H is the norm induced by k. It can be seen that the point evaluation functional K_x is given by K_x = Σ_i λ_i e_i(x) e_i. This implies that k(x, y) = <K_x, K_y>_H, and that for a function f ∈ H, Equation (1) holds, which is known as the reproducing property. The importance of these spaces from a machine learning perspective is based on the fact that a kernel function meeting the above conditions implies that there exists some Hilbert space H (of functions) and a mapping ψ : D → H s.t.

k(x, y) = <ψ(x), ψ(y)>_H. (2)

For a given kernel function, the mapping ψ(x) ∈ H (i.e. the set of point evaluation functionals) does not have to be unique, and is often unknown. However, since ψ is implicit, it does not need to be known for most machine learning algorithms, which exploit the nonlinearity of the mapping to create nonlinear algorithms from linear ones []. The Gaussian function used in RBF networks, given by k(x, y) = e^(−||x−y||²/µ²) with bandwidth µ, is an example of a bounded kernel function which generates an infinite-dimensional RKHS H. Given a point x ∈ D, the mapping ψ(x) ∈ H can be thought of as a vector in the Hilbert space H; we have the relation ψ(x) = K_x(·) = e^(−||x−·||²/µ²). The remainder of the paper uses the notation K_x(·) = k(x, ·) = e^(−||x−·||²/µ²). In this paper, we assume that the basis functions for the RBF network are Gaussian, but these ideas can be generalized to other kernel functions.
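Mercer's condition above can be checked numerically: for any finite set of points, the resulting kernel matrix must be positive semidefinite, and for the Gaussian kernel k(x, x) = 1. A minimal sketch with NumPy (the points and function names here are our own, used only for illustration):

```python
import numpy as np

def gaussian_kernel(x, y, mu=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / mu^2)."""
    return np.exp(-np.linalg.norm(np.asarray(x) - np.asarray(y)) ** 2 / mu ** 2)

# Mercer's condition: the kernel matrix of any finite point set is
# positive semidefinite, i.e. its eigenvalues are all >= 0.
rng = np.random.default_rng(0)
pts = rng.standard_normal((10, 3))          # 10 arbitrary points in R^3
K = np.array([[gaussian_kernel(p, q) for q in pts] for p in pts])

eigvals = np.linalg.eigvalsh(K)
assert np.all(eigvals >= -1e-10)            # PSD up to numerical noise
assert np.allclose(np.diag(K), 1.0)         # k(x, x) = 1 for the Gaussian
```

The same matrix reappears below as the kernel matrix of a center set, where Micchelli's theorem additionally guarantees nonsingularity for distinct points.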

Fixing a dataset C = {c_1, ..., c_l}, where c_i ∈ R^n, let

F_C := { Σ_{i=1}^l α_i k(c_i, ·) : α_i ∈ R } (3)

be the linear subspace generated by C in H. Note that F_C is a class of functions: any RBF network with the same centers and the same fixed bandwidth (but without bias) is an element of this subspace. Let σ(x) = [k(x, c_1), ..., k(x, c_l)]^T, and let W = [w_1, ..., w_l] where w_i ∈ R. Then W^T σ(x) is the output of a standard RBF network. In the machine learning literature, σ(x) is sometimes known as the empirical kernel map. Two different datasets C and C' generate two different families of functions F_C and F_C', and two different empirical kernel maps σ_C(x) and σ_C'(x). Finally, given the above dataset C, one can form an l × l kernel matrix given by K_ij := k(c_i, c_j). The following theorem is one of the cornerstones of RBF network theory, and will be used in Section IV:

Theorem 2 (Micchelli) [4]: If the function k(x, y) is a positive-definite Mercer kernel, and if all the points in C are distinct, the kernel matrix K is nonsingular.

B. Persistently Exciting Signals

In this paper, we use Tao's definition of a PE signal [4]:

Definition 1: A bounded vector signal Φ(t) is exciting over an interval [t, t + T], T > 0 and t ≥ t_0, if there exists γ > 0 such that

∫_t^{t+T} Φ(τ) Φ^T(τ) dτ ≥ γ I. (4)

Definition 2: A bounded vector signal Φ(t) is persistently exciting if for all t > t_0 there exist T > 0 and γ > 0 such that

∫_t^{t+T} Φ(τ) Φ^T(τ) dτ ≥ γ I. (5)

The strength of the signal depends on the value of γ.

III. MODEL REFERENCE ADAPTIVE CONTROL AND CONCURRENT LEARNING

This section outlines the formulation of Model Reference Adaptive Control using approximate model inversion [8, 5, 7, 8, 7]. Let D_x ⊂ R^n be compact, x(t) ∈ D_x be the known state vector, and δ ∈ R^k denote the control input. Consider the uncertain multivariable nonlinear dynamical system

ẋ = f(x(t), δ(t)), (6)

where the function f is assumed to be Lipschitz continuous in x ∈ D_x, and the control input δ is assumed to be bounded and piecewise continuous.
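The excitation notions of Definitions 1 and 2 above can be probed numerically by approximating the Gramian ∫ Φ(τ)Φ^T(τ) dτ and inspecting its minimum eigenvalue, which plays the role of γ. The sketch below (the signals are toy examples of our own, not from the paper) contrasts a rotating signal, which excites every direction, with a fixed-direction signal, which does not:

```python
import numpy as np

def excitation_level(phi, t0, T, n_steps=2000):
    """Estimate the largest gamma with
    int_{t0}^{t0+T} phi(tau) phi(tau)^T dtau >= gamma * I,
    i.e. the minimum eigenvalue of the excitation Gramian,
    using a simple Riemann sum."""
    taus = np.linspace(t0, t0 + T, n_steps, endpoint=False)
    dtau = T / n_steps
    gram = sum(np.outer(phi(t), phi(t)) for t in taus) * dtau
    return np.linalg.eigvalsh(gram).min()

# A rotating two-dimensional signal excites both directions (gamma ~ pi
# over one period), while a constant-direction signal is rank-1.
pe_signal = lambda t: np.array([np.sin(t), np.cos(t)])
rank1_signal = lambda t: np.array([1.0, 0.0])

assert excitation_level(pe_signal, 0.0, 2 * np.pi) > 1.0
assert excitation_level(rank1_signal, 0.0, 2 * np.pi) < 1e-9
```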
These conditions ensure the existence and uniqueness of the solution to (6) over a sufficiently large domain D_x. Since the exact model (6) is usually not available or not invertible, an approximate inversion model f̂(x, δ) is introduced, which is used to determine the control input δ:

δ = f̂^{−1}(x, ν), (7)

where ν is the pseudo control input, which represents the desired model output ẋ and is expected to be approximately achieved by δ. Hence, the pseudo control input is the output of the approximate inversion model

ν = f̂(x, δ). (8)

This approximation results in a model error of the form

ẋ = f̂(x, δ) + Δ(x, δ), (9)

where the model error Δ : R^{n+k} → R^n is given by

Δ(x, δ) = f(x, δ) − f̂(x, δ). (10)

A reference model is chosen to characterize the desired response of the system:

ẋ_rm(t) = f_rm(x_rm(t), r(t)), (11)

where f_rm(x_rm(t), r(t)) denotes the reference model dynamics, which are assumed to be continuously differentiable in x for all x ∈ D_x ⊂ R^n, and it is assumed that all requirements for guaranteeing the existence of a unique solution to (11) are satisfied. The command r(t) is assumed to be bounded and piecewise continuous. Furthermore, it is also assumed that the reference model states remain bounded for a bounded reference input. Consider a tracking control law consisting of a linear feedback part ν_pd = Ke, a linear feedforward part ν_rm = ẋ_rm, and an adaptive part ν_ad(x), in the following form [7, 6, 7]:

ν = ν_rm + ν_pd − ν_ad. (12)

Define the tracking error e as e(t) = x_rm(t) − x(t); then, letting A = −K, the tracking error dynamics are found to be

ė = Ae + [ν_ad(x, δ) − Δ(x, δ)]. (13)

The baseline full state feedback controller ν_pd = Ke is designed to ensure that A is a Hurwitz matrix. Hence, for any positive definite matrix Q ∈ R^{n×n}, a positive definite solution P ∈ R^{n×n} exists to the Lyapunov equation

A^T P + P A + Q = 0. (14)

Let x̄ = [x^T, δ^T]^T ∈ R^{n+k}. Generally, two cases for characterizing the uncertainty Δ(x̄) are considered.
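As a concrete instance of the Lyapunov equation (14) above, a Hurwitz A and a chosen Q > 0 yield P through a standard solver; the matrices below are hypothetical, chosen only to illustrate the computation:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# A hypothetical Hurwitz error matrix A (e.g. from a PD baseline on a
# double integrator) and a positive definite Q.
A = np.array([[0.0, 1.0],
              [-4.0, -2.0]])
Q = np.eye(2)

# solve_continuous_lyapunov(a, q) solves a X + X a^T = q, so passing
# (A^T, -Q) yields the P of A^T P + P A + Q = 0.
P = solve_continuous_lyapunov(A.T, -Q)

assert np.allclose(A.T @ P + P @ A + Q, 0.0)     # residual check
assert np.all(np.linalg.eigvalsh(P) > 0)         # P is positive definite
```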
In structured uncertainty, the mapping Φ(x̄) is known, whereas in unstructured uncertainty, it is unknown. We focus on the latter in this paper. Assume that it is only known that the uncertainty Δ(x̄) is continuous and defined over a compact domain D ⊂ R^{n+k}. Let σ(x̄) = [1, σ_2(x̄), σ_3(x̄), ..., σ_l(x̄)]^T be a vector of known radial basis functions. For i = 2, 3, ..., l, let c_i denote the RBF centroid and let µ denote the RBF bandwidth. Then for each i, the radial basis functions are given as

σ_i(x̄) = e^(−||x̄ − c_i||²/µ²), (15)

which can also be written as k(c_i, ·) as per the notation introduced earlier (Section II). Appealing to the universal approximation property of RBF NNs [3], [4], we have that, given a fixed number of radial basis functions l, there exist ideal weights

W* ∈ R^{l×n} and a real number ɛ(x̄) such that the following approximation holds for all x̄ ∈ D, where D is compact:

Δ(x̄) = W*^T σ(x̄) + ɛ(x̄), (16)

and ɛ̄ = sup_{x̄∈D} ||ɛ(x̄)|| can be made arbitrarily small when a sufficient number of radial basis functions is used. To make use of the universal approximation property, a Radial Basis Function (RBF) Neural Network (NN) can be used to approximate the unstructured uncertainty in (10). In this case, the adaptive element takes the form

ν_ad(x̄) = W^T σ(x̄), (17)

where W ∈ R^{l×n}. The goal is to design an adaptive law such that W(t) → W* as t → ∞ and the tracking error e remains bounded. A commonly used update law, which will be referred to here as the baseline adaptive law, is given as [, 9, 4]

Ẇ = −Γ_W σ(x̄) e^T P B. (18)

The adaptive law (18) guarantees that the weights approach their ideal values (W*) if and only if the signal σ(x̄) is PE. In the absence of PE, without additional modifications such as σ-mod [4], e-mod [8], or projection-based adaptation [4], the law does not guarantee that the weights W remain bounded. The work in [8] shows that if specifically selected recorded data is used concurrently with instantaneous measurements, then the weights approach and stay bounded in a compact neighborhood of the ideal weights subject to a sufficient condition on the linear independence of the recorded data; PE is not needed. This is captured in the following theorem.

Theorem 3 [8]: Consider the system in (6), the control law of (12), x(0) ∈ D where D is compact, and the case of unstructured uncertainty. For the j-th recorded data point, let ɛ_j(t) = W^T(t) σ(x̄_j) − Δ(x̄_j), with Δ(x̄_j) for a stored data point j calculated using (10) as Δ(x̄_j) = ẋ_j − ν_j. Also, let p be the number of recorded data points σ(x̄_j) in the matrix Z = [σ(x̄_1), ..., σ(x̄_p)]. If rank(Z) = l, then the following weight update law

Ẇ = −Γ_W σ(x̄) e^T P B − Γ_W Σ_{j=1}^p σ(x̄_j) ɛ_j^T (19)

renders the tracking error e and the RBF NN weight errors uniformly ultimately bounded.
Furthermore, the adaptive weights W(t) will approach and remain bounded in a compact neighborhood of the ideal weights W*. The matrix Z will be referred to as the history stack.

IV. KERNEL LINEAR INDEPENDENCE AND THE BUDGETED KERNEL RESTRUCTURING (BKR) ALGORITHM

A. PE Signals and the RKHS

In this section we make a connection between adaptive control and kernel methods by leveraging RKHS theory to relate PE of x̄(t) to PE of σ(x̄(t)). Let C = {c_1, ..., c_l}, and recall that F_C represents the linear subspace generated by C in H (see (3)). Let G = σ(x̄(t)) σ(x̄(t))^T, i.e. G is the l × l matrix with entries

G_ij = k(x̄, c_i) k(x̄, c_j) = ψ_{x̄,c_i} ψ_{x̄,c_j},

where ψ_{x̄,c_i} is shorthand for <ψ(x̄), ψ(c_i)>. A matrix G is positive definite if and only if v^T G v > 0 for all nonzero v ∈ R^l. In the above, this translates to

v^T G v = Σ_{i,j=1}^l v_i v_j G_ij
        = Σ_{i,j=1}^l v_i v_j <ψ(x̄), ψ(c_i)> <ψ(x̄), ψ(c_j)>
        = <ψ(x̄), Σ_{i=1}^l v_i ψ(c_i)> <ψ(x̄), Σ_{j=1}^l v_j ψ(c_j)>
        = <ψ(x̄), Σ_{i=1}^l v_i ψ(c_i)>².

From Micchelli's theorem (Theorem 2), it is known that if c_i ≠ c_j, then ψ(c_i) and ψ(c_j) are linearly independent in H. Further, if the trajectory x̄(t) is bounded for all time, then k(x̄(t), c) is bounded (for the Gaussian kernel, k ≤ 1). From this, it follows trivially that the signal ∫_t^{t+T} G(τ) dτ is bounded. Furthermore, the next theorem follows immediately:

Theorem 4: Suppose x(t) evolves in the state space according to Equation (6). If there exists some time t_f ∈ R^+ s.t. the mapping ψ(x̄(t)) ∈ H is orthogonal to the linear subspace F_C ⊂ H for all t > t_f, then the signal σ(x̄(t)) is not persistently exciting.

If the state x(t) reaches a stationary point, we can state a similar theorem.

Theorem 5: Suppose x(t) evolves in the state space according to Equation (6). If there exists some state x_f ∈ R^n and some t_f ∈ R^+ s.t. x(t) = x_f for all t > t_f, then σ(x̄(t)) is not persistently exciting.
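The situation in Theorem 4 is easy to reproduce numerically: far from every center, the empirical kernel map σ(x̄) is numerically zero, so the instantaneous term of the concurrent learning law (19) vanishes, while its recorded-data term can still drive the weights toward their ideal values. A sketch (the centers, states, ideal weights, and step sizes are arbitrary choices of ours, and the uncertainty is taken noise-free for clarity):

```python
import numpy as np

def sigma(x, centers, mu=1.0):
    """Empirical kernel map sigma(x) = [k(x, c_1), ..., k(x, c_l)]^T."""
    return np.exp(-np.linalg.norm(x - centers, axis=1) ** 2 / mu ** 2)

def cl_step(W, sig_now, e, PB, stack, gamma=1.0, dt=0.05):
    """One Euler step of the concurrent learning law (19):
    Wdot = -Gamma sigma(x) e^T P B - Gamma sum_j sigma(x_j) eps_j^T,
    with eps_j = W^T sigma(x_j) - Delta(x_j)."""
    Wdot = -gamma * np.outer(sig_now, e @ PB)
    for sig_j, delta_j in stack:
        Wdot -= gamma * np.outer(sig_j, W.T @ sig_j - delta_j)
    return W + dt * Wdot

centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
W_star = np.array([[1.0, -1.0], [0.5, 0.2], [-0.3, 0.8]])  # ideal weights

# Far from every center, sigma(x) underflows to zero: the instantaneous
# term of (19) vanishes, exactly the orthogonality of Theorem 4.
far = sigma(np.array([50.0, 50.0]), centers)
assert np.linalg.norm(far) < 1e-12

# The recorded-data term still drives W toward W_star, since the stored
# points were collected near the centers.
stack = []
for x in [np.array([0.2, 0.1]), np.array([0.9, 0.2]), np.array([0.1, 1.1])]:
    s = sigma(x, centers)
    stack.append((s, W_star.T @ s))      # noise-free Delta(x_j)

W = np.zeros((3, 2))
for _ in range(20000):
    W = cl_step(W, far, np.zeros(2), np.eye(2), stack)
assert np.linalg.norm(W - W_star) < 1e-2
```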
Therefore, PE of σ(x̄(t)) follows only if neither of the above conditions is met. Figure 1 shows an example of the mapping ψ in H given the trajectory x̄(t), while Figure 2 depicts a geometric description of a non-PE signal in the Hilbert space. The signal ψ(x̄(t)) in H becomes orthogonal to the centers C = {c_1, ..., c_l} mapped to H if x̄(t) moves far away from them in R^n. Though orthogonality is desired in the state space R^n for guaranteeing PE, orthogonality in H is detrimental to PE of σ(x̄(t)). Recall that without PE of σ(x̄(t)) the baseline adaptive law of (18) does not guarantee that the weights converge to a neighborhood of the ideal weights. This shows that in order to guarantee PE of σ(x̄(t)), not only does x̄(t) have to be PE, but the centers should also be such that the mapping ψ(x̄(t)) ∈ H is not orthogonal to the linear subspace F_C generated by the centers.

Fig. 1. An example Hilbert space mapping. The trajectory x̄(t) evolves in the state space R^n via an ODE; the mapping ψ induces an equivalent ODE in H. If c_1 and c_2 are the centers for the RBFs, then they generate the linear subspace F_C ⊂ H, which is a family of linearly parameterized functions, via the mappings ψ(c_1) and ψ(c_2).

Fig. 2. An example of a non-PE signal: any trajectory x̄(t) ∈ R^n inducing a trajectory ψ(x̄(t)) ∈ H that is orthogonal to F_C after some time t_f renders σ(x̄(t)) non-PE.

It is intuitively clear that locality is a concern when learning the uncertainty; if the system evolves far away from the original set of centers, the old centers will give less accurate information than new ones that are selected to better represent the current operating domain. Theorem 4 enables us to state this more rigorously; keeping the centers close to x̄(t) ensures persistency of excitation of σ(x̄(t)) (as long as x̄(t) is also PE). At the same time, we do not want to throw out all of the earlier centers either. This is in order to preserve global function approximation (over the entire domain), especially if the system is expected to revisit these regions of the state space. The work in [8] shows that if the history stack is linearly independent, the law in (19) suffices to ensure convergence to a compact neighborhood of the ideal weight vector W* without requiring PE of σ(x̄(t)).

We can reinterpret the concurrent learning gradient descent in our RKHS framework. Note that the concurrent learning adaptive law in (19) will ensure ɛ_j(t) → ɛ̄ by driving W(t) → W*. By the above analysis, σ(x̄_j) is the projection of ψ(x̄_j) onto F_C. If ψ(x̄(t)) is orthogonal to F_C in H, the first term in (19) (which is also the baseline adaptive law of (18)) vanishes, causing the evolution of the weights to stop with respect to the tracking error e. The condition that Σ_{j=1}^p σ(x̄_j) σ(x̄_j)^T has the same rank as the number of centers in the RBF network is equivalent to the statement that one has a collection of vectors {x̄_j}_{j=1}^p whose projections {ψ(x̄_j)}_{j=1}^p onto F_C allow the weights to continue evolving to minimize the difference in uncertainty.

B. Linear Independence

This section outlines a strategy to select RBF centers online. The algorithm presented here ensures that the RBF centers cover the current domain of operation while keeping at least some centers reflecting previous domains of operation. We use the scheme introduced in [3] to ensure that the RBF centers reflect the current domain of operation. Similar ideas are used in the kernel adaptive filtering literature in order to curb filter complexity []. At any point in time, our algorithm maintains a dictionary of centers C_l = {c_i}_{i=1}^l, where l is the current size of the dictionary, and N_D is the upper limit on the number of points (the budget). To test whether a new center c_{l+1} should be inserted into the dictionary, we check whether it can or cannot be approximated in H by the current set of centers. This test is performed using

γ = || Σ_{i=1}^l a_i ψ(c_i) − ψ(c_{l+1}) ||², (20)

where the a_i denote the coefficients of the linear combination. Unraveling the above equation in terms of (2) shows that the coefficients a_i can be determined by minimizing γ, which yields the optimal coefficient vector â_{l+1} = K_l^{−1} k̂_{l+1}, where K_l is the kernel matrix of the dictionary dataset C_l (with entries k(c_i, c_j)), and k̂_{l+1} = [k(c_1, c_{l+1}), ..., k(c_l, c_{l+1})]^T is the kernel vector. After substituting the optimal value â_{l+1} into (20), we get γ = k(c_{l+1}, c_{l+1}) − k̂_{l+1}^T â_{l+1}.
(21)

The algorithm is summarized in Algorithm 1. For more details see [3]; in particular, an efficient way to implement the updates is given there, which we do not get into here.

Algorithm 1 Kernel Linear Independence Test
Input: new point c_{l+1}, tolerance η, budget N_D
  Compute k̂_{l+1} and â_{l+1} = K_l^{−1} k̂_{l+1}
  Compute γ as in Equation (21)
  if γ > η then
    if l < N_D then
      update the dictionary by storing the new point c_{l+1}, and recalculate γ for each of the points
    else
      update the dictionary by deleting the point with the minimal γ and storing c_{l+1}, then recalculate γ for each of the points
    end if
  end if
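A compact sketch of the test (20)-(21) and the budgeted update of Algorithm 1 follows. The function names, the η and N_D values, and the deletion rule (approximated here by removing the point best explained by the rest of the dictionary) are illustrative choices, not the paper's exact implementation, which maintains these quantities incrementally:

```python
import numpy as np

def k(x, y, mu=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / mu^2)."""
    return np.exp(-np.linalg.norm(x - y) ** 2 / mu ** 2)

def kli_gamma(c_new, dictionary, mu=1.0):
    """gamma = k(c, c) - khat^T K^{-1} khat: squared distance in H between
    psi(c_new) and its projection onto span{psi(c_i)} (Equation (21))."""
    K = np.array([[k(ci, cj, mu) for cj in dictionary] for ci in dictionary])
    khat = np.array([k(ci, c_new, mu) for ci in dictionary])
    a_hat = np.linalg.solve(K, khat)
    return k(c_new, c_new, mu) - khat @ a_hat

def bkr_update(dictionary, c_new, eta=0.1, budget=10, mu=1.0):
    """Budgeted dictionary update: insert c_new only if it is sufficiently
    novel; on a full budget, drop the most redundant point first."""
    gamma = kli_gamma(c_new, dictionary, mu)
    if gamma <= eta:
        return dictionary                       # approximable in H: skip
    if len(dictionary) >= budget:
        # drop the point that the rest of the dictionary approximates best
        gammas = [kli_gamma(c, [d for d in dictionary if d is not c], mu)
                  for c in dictionary]
        dictionary = [c for i, c in enumerate(dictionary)
                      if i != int(np.argmin(gammas))]
    return dictionary + [c_new]

dic = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
dic = bkr_update(dic, np.array([0.0, 3.0]))     # novel point: added
dic = bkr_update(dic, np.array([0.01, 0.0]))    # redundant point: rejected
assert len(dic) == 3
```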

Note that due to the nature of the linear independence test, every time a state x̄(t) is encountered that cannot be approximated within tolerance η by the current dictionary C_l, it is added to the dictionary. Further, since the dictionary is designed to keep the most varied basis possible, this is the best one can do on a budget using only instantaneous online data. Therefore, in the final algorithm, all the centers can be initialized to 0, and the linear independence test can be periodically run to check whether or not to add a new center to the RBF network. This is fundamentally different from update laws of the kind given in [7], which attempt to move the centers to minimize the tracking error e. Since such update laws are essentially rank-1, if all the centers are initialized to 0, they will all move in the same direction together. Furthermore, from the above analysis and from (15), it is clear that the amount of excitation is maximized when the centers are previous states themselves. Summarizing, the following are the advantages of picking centers online with the linear independence test (21):

1) In light of Theorem 4, Algorithm 1 ensures that excitation inserted at the current time in the system does not disappear, by selecting centers such that F_C is not orthogonal to the current states. In less formal terms, this means that at least some centers are sufficiently close to the current state.

2) This also implies that not all of the old centers are discarded, which means that all of the regions where the system has previously operated are represented to some extent in the dictionary. This implies global function approximation.

3) Algorithm 1 enables the design of adaptive controllers without any prior knowledge of the domain. For example, this method would allow one to initialize all centers to zero, with appropriate centers then selected by Algorithm 1 online.
4) On a budget, selecting centers with Algorithm 1 is better than evenly spacing centers, since the centers are selected along the path of the system in the state space. This results in a more judicious distribution of centers without any prior knowledge of the uncertainty.

If the centers for the system are picked using the kernel linear independence test, and the weight update law is given by the standard baseline law (18), we call the resulting algorithm Budgeted Kernel Restructuring (BKR).

V. THE BKR-CL ALGORITHM

A. Motivation

The BKR algorithm selects centers in order to ensure that any inserted excitation does not vanish. However, to fully leverage the universal approximation property (see (16)), we must also prove that the weights are driven towards their ideal values. In this section we present a concurrent learning adaptive law that guarantees that the weights approach and remain bounded in a compact domain around their ideal values by concurrently utilizing online selected and recorded data with instantaneous data for adaptation. The concurrent learning adaptive law is wedded with BKR to create BKR-CL. We begin by describing the algorithm used to select, record, and remove data points in the history stack. Note that Algorithm 1 picks the centers discretely; therefore, there always exists an interval [t_k, t_{k+1}], k ∈ N, over which the centers are fixed. This discrete update of the centers introduces switching in the closed loop system. Let σ^k(x̄) denote the value of σ given by this particular set of centers, and denote by W*_k the ideal set of weights for these centers. Let p ∈ N denote the subscript of the last point stored. For a stored data point x̄_j, let σ_j^k denote σ^k(x̄_j). We will let S_t = [x̄_1, ..., x̄_p] denote the matrix containing the recorded information in the history stack at time t; then Z_t = [σ_1^k, ..., σ_p^k] is the matrix containing the output of the RBF function for the current set of RBF centers.
The p-th column of Z_t will be denoted by Z_t(:, p). It is assumed that the maximum allowable number of recorded data points is limited due to memory or processing power considerations. Therefore, we will require that Z_t has a maximum of p̄ ∈ N columns; clearly, in order to be able to satisfy rank(Z_t) = l, we need p̄ ≥ l. For the j-th data point, the associated model error Δ(x̄_j) is assumed to be stored in an array Δ with Δ(:, j) = Δ(x̄_j). The uncertainty of the system is estimated by estimating ẋ_j for the j-th recorded data point using optimal fixed-point smoothing and solving Equation (10) to get Δ(x̄_j) = ẋ_j − ν_j. The history stack is populated using an algorithm that aims to maximize the minimum singular value of the symmetric matrix Ω = Σ_{j=1}^p σ^k(x̄_j) σ^{kT}(x̄_j). Thus any data point linearly independent of the data stored in the history stack is included in the history stack. At the initial time t_0 with k = 0, the algorithm begins by setting Z_t(:, 1) = σ^0(x̄(t_0)). The algorithm then selects sufficiently different points for storage; a point is considered sufficiently different if it is linearly independent of the points in the history stack, or if it is sufficiently different, in the sense of the Euclidean norm, from the last point stored. Furthermore, if Algorithm 1 updates the dictionary by adding a center, then this point is also added to the history stack (if the maximum allowable size of the history stack has not been reached), so that the rank of Z_t is maintained. If the number of stored points exceeds the maximum allowable number, the algorithm seeks to incorporate new data points in a manner that increases the minimum singular value of Z_t; the current data point is added to the history stack only if swapping it with an old point increases the minimum singular value. One can see that Algorithm 1 and Algorithm 2 work similarly to one another.
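The swap test on a full history stack can be sketched as follows; this is an illustrative reading of the singular-value-maximizing rule (names are ours, and the bookkeeping for S_t and the stored Δ values is omitted):

```python
import numpy as np

def try_record(Z, sigma_new, p_max):
    """History stack update on a budget: append sigma_new as a new column
    if there is room; otherwise swap it into the column where it most
    increases the minimum singular value of Z, or reject it."""
    if Z.shape[1] < p_max:
        return np.column_stack([Z, sigma_new]), True
    s_old = np.linalg.svd(Z, compute_uv=False).min()
    best_s, best_j = s_old, None
    for j in range(Z.shape[1]):
        trial = Z.copy()
        trial[:, j] = sigma_new
        s = np.linalg.svd(trial, compute_uv=False).min()
        if s > best_s:
            best_s, best_j = s, j
    if best_j is None:
        return Z, False                      # no swap improves Z: reject
    Z = Z.copy()
    Z[:, best_j] = sigma_new
    return Z, True

# Nearly collinear columns give a tiny minimum singular value; a new,
# independent direction should be swapped in.
Z = np.array([[1.0, 1.0],
              [0.0, 1e-3]])
Z2, accepted = try_record(Z, np.array([0.0, 1.0]), p_max=2)
assert accepted
min_sv = np.linalg.svd(Z2, compute_uv=False).min()
assert min_sv > np.linalg.svd(Z, compute_uv=False).min()
```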
Algorithm 2 relates to the PE of the signal x̄(t) by ensuring that the history stack is as linearly independent in R^n as possible, while Algorithm 1 does the same for the PE of the signal σ(x̄(t)), trying to ensure that the center set is as linearly independent in H as possible. In order to approximate the uncertainty well, both algorithms are needed. We refer to the resulting method by the acronym BKR-CL.

Algorithm 2 Singular Value Maximizing Algorithm for Recording Data Points
  if ||σ^k(x̄(t)) − σ_p^k||² / ||σ^k(x̄(t))|| ≥ ɛ, or a new center was added by Algorithm 1 without replacing an old center, then
    p = p + 1
    S_t(:, p) = x̄(t); {store Δ(:, p) = ẋ − ν(t)}
  else if a new center was added by Algorithm 1 by replacing an old center then
    if the old center was found in S_t then
      overwrite the old center in the history stack with the new center
      set p equal to the location of the data point replaced in the history stack
    end if
  end if
  Recalculate Z_t = [σ_1^k, ..., σ_p^k]
  if p ≥ p̄ then
    T = Z_t
    S_old = min SVD(Z_t^T)
    for j = 1 to p do
      Z_t(:, j) = σ^k(x̄(t))
      S(j) = min SVD(Z_t^T)
      Z_t = T
    end for
    find max S and let k' denote the corresponding column index
    if max S > S_old then
      S_t(:, k') = x̄(t); {store Δ(:, k') = ẋ − ν(t)}
      Z_t(:, k') = σ^k(x̄(t))
      p = p̄
    else
      p = p̄
      Z_t = T
    end if
  end if

B. Lyapunov Analysis

In this section, we prove the boundedness of the closed loop signals when using BKR-CL. Over each interval [t_k, t_{k+1}] between switches in the NN approximation, the tracking error dynamics are given by the following differential equation:

ė = Ae + [W^T σ^k(x̄) − Δ(x̄)]. (22)

The NN approximation error of (16) for the k-th system can be rewritten as

Δ(x̄) = W*_k^T σ^k(x̄) + ɛ^k(x̄). (23)

We prove the following equivalent of Theorem 3 for the following switching weight update law:

Ẇ = −Γ_W σ^k(x̄) e^T P B − Γ_W Σ_{j=1}^p σ^k(x̄_j) ɛ_j^{kT}. (24)

Theorem 6: Consider the system in (6), the control law of (12), x(0) ∈ D where D is compact, and the case of unstructured uncertainty. For the j-th recorded data point, let ɛ_j^k(t) = W^T(t) σ^k(x̄_j) − Δ(x̄_j), and let p be the number of recorded data points σ_j^k := σ^k(x̄_j) in the matrix Z_0 = [σ_1^k, ..., σ_p^k], such that rank(Z_0) = l at t = 0. Assume that the RBF centers are updated using Algorithm 1 and the history stack is updated using Algorithm 2. Then, the weight update law in (24) ensures that the tracking error e of (22) and the RBF NN weight errors W̃_k of (23) are bounded.

Proof: Consider the tracking error dynamics given by (22) and the update law of (24). Since ν_ad = W^T σ^k(x̄), we have ɛ_j^k = W^T σ^k(x̄_j) − Δ(x̄_j).
Then the NN approximation error is given by (3). With W̃^k = W − W^{k*},

ε^k_j(x̄_j) = W^T σ^k(x̄_j) − W^{k*T} σ^k(x̄_j) − ε̄^k(x̄_j) = W̃^{kT} σ^k(x̄_j) − ε̄^k(x̄_j).

Therefore, over [t_k, t_{k+1}] the weight error dynamics are given by the following switching system:

W̃̇^k(t) = −Γ [ Σ_{j=1}^{p} σ^k(x̄_j) σ^{kT}(x̄_j) ] W̃^k(t) + Γ Σ_{j=1}^{p} σ^k(x̄_j) ε̄^{kT}(z_j) − Γ σ^k(x̄(t)) e^T(t) P B.   (5)

Consider the family of positive definite functions

V^k = (1/2) e^T P e + (1/2) Tr(W̃^{kT} Γ_W^{-1} W̃^k),

where Tr(·) denotes the trace operator. Note that V^k is C^1, V^k(0) = 0 and V^k(e, W̃^k) > 0 for all (e, W̃^k) ≠ 0, and define Ω^k := Σ_j σ^k_j σ^{kT}_j. Then

V̇^k(e, W̃^k) = −(1/2) e^T Q e + e^T P B ε̄^k − Tr(W̃^{kT} Ω^k W̃^k) + Tr(W̃^{kT} Σ_j σ^k_j ε̄^{kT}_j)
             ≤ −(1/2) λ_min(Q) ‖e‖² + C₁ ‖e‖ − λ_min(Ω^k) ‖W̃^k‖² + C₂ ‖W̃^k‖
             = −‖e‖ ((1/2) λ_min(Q) ‖e‖ − C₁) − ‖W̃^k‖ (λ_min(Ω^k) ‖W̃^k‖ − C₂),

where C₁ = ‖P B‖ ε̄^k, C₂ = p ε̄^k √l (l being the number of RBFs in the network), and Ω^k = Σ_{j=1}^{p} σ^k(x̄_j) σ^{kT}(x̄_j). Note that since Algorithm 2 guarantees that only sufficiently different points are added to the history stack, Ω^k is guaranteed to be positive definite due to Micchelli's theorem [4]. Hence, if ‖e‖ > 2C₁/λ_min(Q) and ‖W̃^k‖ > C₂/λ_min(Ω^k), we have V̇^k(e, W̃^k) < 0. Hence the set Π^k = {(e, W̃^k) : ‖e‖ + ‖W̃^k‖ ≤ 2C₁/λ_min(Q) + C₂/λ_min(Ω^k)} is positively invariant for the k-th system. Let S = {(t_0, S_0), (t_1, S_1), ...} be an arbitrary switching sequence with finitely many switches in finite time (note that this is always guaranteed due to the discrete nature of Algorithm 1). The sequence denotes that system S_k was active between t_k and t_{k+1}. Suppose that at time t_{k+1} the system switches from S_k to S_{k+1}. Then e(t_{k+1}^−) = e(t_{k+1}^+) and W̃^{k+1} = W̃^k + ΔW^k,

where ΔW^k = W^{k*} − W^{(k+1)*}. Since V̇^k(e, W̃^k) is guaranteed to be negative definite outside of a compact set, it follows that e(t_k) and W̃^k(t_k) are guaranteed to be bounded. Therefore e(t_{k+1}) and W̃^{k+1}(t_{k+1}) are also bounded, and since V̇^{k+1}(e, W̃^{k+1}) is guaranteed to be negative definite outside of a compact set, e(t) and W̃^{k+1}(t) remain bounded. Furthermore, over every interval [t_k, t_{k+1}] they will approach the positively invariant set Π^{k+1}, or stay bounded within Π^{k+1} if already inside it.

Remark: Note that BKR-CL, along with the assumption that rank(Z) = l, guarantees that the weights stay bounded in a compact neighborhood of their ideal values without requiring any additional damping terms such as σ-modification [4] or e-modification [9] (even in the presence of noise). Due to Micchelli's theorem [4], as long as the history stack matrix Z^k contains l different points, rank(Z) = l. Therefore, it is expected that this rank condition will be met within the first few time steps, even when one begins with no a priori recorded data points in the history stack. In addition, a projection operator (see, for example, [4]) can be used to bound the weights until rank(Z_t) = l. This is a straightforward extension of the presented approach.

VI. APPLICATION TO CONTROL OF WING ROCK DYNAMICS

Modern highly swept-back or delta-wing fighter aircraft are susceptible to lightly damped oscillations in roll known as wing rock. Wing rock often occurs at conditions commonly encountered at landing [34], making precision control in the presence of wing rock critical for safe landing. In this section we use BKR-CL control to track a sequence of roll commands in the presence of simulated wing rock dynamics. Let θ denote the roll attitude of the aircraft, p denote the roll rate, and δ_a denote the aileron control input. Then a model for wing rock dynamics is [5]

θ̇ = p,   (6)
ṗ = L_{δa} δ_a + Δ(x),   (7)

where

Δ(x) = W₀* + W₁* θ + W₂* p + W₃* |θ| p + W₄* |p| p + W₅* θ³   (8)

and L_{δa} = 3.
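For concreteness, (6)–(8) can be stepped forward with simple Euler integration. The sketch below is ours; the numerical parameter values and the time step are placeholders for illustration only, not the values used in the paper:

```python
import numpy as np

# Placeholder parameters W0*..W5* (illustrative only) and L_da = 3.
W_STAR = np.array([0.8, 0.23, 0.69, -0.62, 0.0095, 0.02])
L_DA = 3.0

def uncertainty(x):
    """Delta(x) = W0 + W1*theta + W2*p + W3*|theta|*p + W4*|p|*p + W5*theta^3."""
    theta, p = x
    basis = np.array([1.0, theta, p, abs(theta) * p, abs(p) * p, theta ** 3])
    return W_STAR @ basis

def wing_rock_step(x, delta_a, dt=0.05):
    """One Euler step of theta_dot = p, p_dot = L_da*delta_a + Delta(x)."""
    theta, p = x
    return np.array([theta + dt * p,
                     p + dt * (L_DA * delta_a + uncertainty(x))])
```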
The parameters for wing rock motion are adapted from [36]; they are W₁* = .34, W₂* = .698, W₃* = .645, W₄* = .95, and W₅* = .4. In addition to these parameters, a trim error is introduced by setting W₀* = .8. The ideal parameter vector W* is assumed to be unknown. The chosen inversion model has the form ν = L_{δa} δ_a; this choice results in the modeling uncertainty being given by Δ(x). The adaptive controller uses the control law given earlier. The linear gain K of the control law is [.5, .4], a second-order reference model with natural frequency of rad/sec and damping ratio of .5 is chosen, and the learning rate is set to Γ_W = . The simulation uses a time step of . sec. We make two sets of comparisons. In the first set, we compare BKR-CL with the e-mod, σ-mod, and projection operator adaptive laws (henceforth collectively denoted as the classical adaptive laws) for neuroadaptive control; the same number of centers was chosen in all cases. In BKR-CL, the centers were all initialized in R^n at t = 0, while in the classical adaptive laws they were placed evenly across the domain of the uncertainty (where the prior knowledge of the domain was determined through experiments conducted beforehand). The BKR-CL algorithm uses the switching weight update law (4). Figure 3(a) shows the tracking of the reference model by the classical laws and by BKR-CL, and Figure 3(b) shows the corresponding tracking error. As can be seen, the overall error, and especially the transient error, is lower for BKR-CL than for the classical laws. Figure 4 shows an example of how concurrent learning affects the evolution of the weights: it has a tendency to spread the weights across a broader spectrum of values than the classical laws, leading the algorithm to choose from a richer class of functions in F_C. Figure 5 shows the final set of centers used; as can be seen, the centers follow the path of the system in the state space.
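A discrete-time sketch of the switching weight update law (4): the instantaneous gradient term is combined with a sum over the history stack, where each stored point contributes its own error ε_j = W^T σ(x̄_j) − Δ(x̄_j). The helper name, gain, and step size below are our own illustrative choices:

```python
import numpy as np

def cl_weight_update(W, sigma_now, e, PB, stack_sigmas, stack_deltas,
                     gamma=1.0, dt=0.05):
    """One Euler step of Wdot = -Gamma*(sigma(x) e^T P B
    + sum_j sigma(x_j) eps_j^T), with eps_j = W^T sigma(x_j) - Delta(x_j)."""
    Wdot = -gamma * np.outer(sigma_now, e @ PB)   # instantaneous term
    for sigma_j, delta_j in zip(stack_sigmas, stack_deltas):
        eps_j = W.T @ sigma_j - delta_j           # stored-point error
        Wdot -= gamma * np.outer(sigma_j, eps_j)
    return W + dt * Wdot
```

Even when the tracking error term vanishes, repeated updates drive W^T σ(x̄_j) toward the stored Δ(x̄_j), which is the source of the long-term learning behavior discussed in Section VI-A.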
This is another reason why the uncertainty is approximated better by BKR-CL than by the classical algorithms (this is discussed further in Section VI-A). To illustrate this further, we also ran a second set of comparisons between BKR-CL and the classical laws with the centers picked according to the BKR scheme instead of being uniformly distributed beforehand. In all subsequent figures, the BKR versions of e-mod, σ-mod, and the projection operator are denoted by BKR e-mod, BKR σ-mod, and BKR, respectively. The tracking performance is compared in Figure 6, where it can be seen that BKR-CL is again the better scheme. Finally, in both sets of comparisons, note that the control input is tracked better by BKR-CL than by the other algorithms. Figure 7 shows that BKR-CL again drives the weights over a wider domain.

A. Illustration of the long-term learning effect introduced by BKR-CL

One of the main contributions of the presented BKR-CL scheme is the introduction of long-term learning in the adaptive controller. Long-term learning here is characterized by better approximation of the uncertainty over the entire domain, and results from the convergence of the NN weights to a neighborhood of their ideal values (in the sense of the Universal Approximation Property of (7)). Figure 8 presents an interesting comparison between the online NN outputs (W^T(t)σ(x̄(t))) for all the controllers and the NN output with the weights and centers frozen at their values at the final time T of the simulation (W^T(T)σ(x̄(t))), re-simulated with these learned weights. Figure 8(a) compares the online adaptation performance during the simulation; that is, it compares Δ(x̄(t)) with W^T(t)σ(x̄(t)). We can see that BKR-CL does the best job of approximating the uncertainty online, particularly in the transient, whereas the classical laws tend to perform similarly to each other.
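The frozen-weight comparison just described — replaying the state trajectory through the network with the weights fixed at their final values W(T) — can be sketched as follows (the Gaussian feature map and all names here are illustrative assumptions):

```python
import numpy as np

def rbf_features(x, centers, bandwidth=1.0):
    """Gaussian RBF feature vector sigma(x)."""
    d2 = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def frozen_weight_error(W_final, centers, trajectory, true_delta, bandwidth=1.0):
    """Max |W(T)^T sigma(x) - Delta(x)| over a recorded trajectory: a scalar
    summary of how well the learned network represents the uncertainty."""
    return max(
        abs((W_final.T @ rbf_features(x, centers, bandwidth)).item() - true_delta(x))
        for x in trajectory)
```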
In Figure 8(b), we see that the difference is more dramatic when the weights are frozen at the values learned during the simulation; only BKR-CL completely captures the transient, and approximates the

steady-state uncertainty fairly closely, while the classical laws do quite poorly. This implies that the weights learned with BKR-CL have converged to values that provide a good approximation of the uncertainty over the entire operating domain. On the other hand, in Figure 8(b) we see that if BKR-CL is not used, the NN does not approximate the uncertainty as well, indicating that the weights have not approached their ideal values during the simulation. From Figure 4 we see that the weights for e-mod and the projection operator are closer to those approached by BKR-CL than the weights for σ-mod, and accordingly they do slightly better in uncertainty approximation. The effect of long-term learning can be further characterized by studying the behavior of the BKR algorithm without concurrent learning and with a higher learning rate (Γ_W = ). Higher learning rates are often used in traditional adaptive control (without concurrent learning) to guarantee good tracking. Figure 9(a) compares the online adaptation performance. Except for the initial transient, the NN outputs of the classical laws approximate the uncertainty well locally in time with a higher learning rate. However, Figure 9(b) shows that they do a poor job of estimating the ideal weights when the learning rate is increased, indicating that good local approximation of the uncertainty through high adaptation gains does not necessarily guarantee long-term learning.

Fig. 3. Tracking with e-mod, σ-mod, the projection operator, and with BKR and concurrent learning (BKR-CL).

Fig. 4. Weight evolution with BKR-CL, e-mod, σ-mod, and the projection operator.

Fig. 5. Center placement for the classical laws and BKR-CL.
This is explained by the fact that the classical laws, which rely on the gradient-based weight update law of (8), drive the weights only in the direction of maximum reduction of the instantaneous tracking error cost, and hence do not guarantee that the weights are driven close to the ideal values that best represent the uncertainty globally. BKR-CL, on the other hand, has better long-term adaptation performance: not only are the centers updated to best reflect the domain of the uncertainty, but the weights are also driven towards the ideal values that best represent the uncertainty over the domain visited.

B. Computational Complexity

The accuracy of the algorithm comes at a computational cost. The two main components of BKR-CL are the BKR

step (Algorithm 1) and the CL step (Algorithm 2). Once the budget of the dictionary, N_D, is reached, the BKR step takes roughly O(N_D²) operations to add a new center, and is thus reasonably efficient. The main computational bottleneck in BKR-CL is the computation of the SVD of the history stack matrix Ω in the CL step. If there are p points in Ω, this takes O(p³) operations. Given that p is usually larger than N_D, this can be quite expensive. Table I shows a computation time comparison between the non-BKR controllers and BKR-CL for the wing rock simulation run for 6 seconds in MATLAB on a dual-core Intel .93 GHz processor with 8 GB RAM, while Table II shows the same comparison between the BKR controllers and BKR-CL. As can be seen, the computation time increases dramatically for BKR-CL as the size of Ω increases. There are, however, two things to note here. First, the number of points added to the history stack Ω is determined by the tolerance ε in Algorithm 2; therefore, if too many points are being added to the history stack, one can increase the tolerance, or set a separate budget to keep p fixed. Second, we used the native implementation of the SVD in MATLAB for these experiments, which is not optimized for an online setting.

Fig. 6. Tracking with BKR e-mod, BKR σ-mod, BKR, and with BKR and concurrent learning (BKR-CL).

Fig. 7. Weight evolution with BKR-CL and with BKR e-mod, BKR σ-mod, and BKR.

Fig. 8. Plots of the online NN output against the system uncertainty ((a) online uncertainty tracking), and the NN output using the final weights frozen at the end of the simulation run ((b) learned-weight uncertainty tracking). Note that the NN weights and centers with BKR-CL better approximate the uncertainty in both cases.
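The incremental-SVD updates of Brand [4, 5] avoid recomputing the decomposition from scratch: once a thin SVD of the stack is available, appending a column reduces to an SVD of a small (k+1)×(k+1) core matrix plus matrix products, roughly O(p²) work instead of O(p³). A minimal sketch (our own illustration, not the paper's implementation):

```python
import numpy as np

def svd_append_column(U, s, Vt, c):
    """Given a thin SVD A = U diag(s) Vt, return the thin SVD of [A, c]
    (Brand-style rank-one update)."""
    m = U.T @ c                        # component of c in span(U)
    r = c - U @ m                      # residual orthogonal to span(U)
    rho = np.linalg.norm(r)
    P = r / rho if rho > 1e-12 else np.zeros_like(c)
    k = s.size
    K = np.zeros((k + 1, k + 1))       # small core matrix
    K[:k, :k] = np.diag(s)
    K[:k, -1] = m
    K[-1, -1] = rho
    Uk, sk, Vtk = np.linalg.svd(K)     # O(k^3), independent of p
    U_new = np.hstack([U, P[:, None]]) @ Uk
    V_ext = np.zeros((Vt.shape[1] + 1, k + 1))
    V_ext[:-1, :k] = Vt.T
    V_ext[-1, -1] = 1.0
    Vt_new = Vtk @ V_ext.T
    return U_new, sk, Vt_new
```

In the CL step, this would let the minimum singular value of the history stack be tracked as points are appended, without a full decomposition each time.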
Recent work in incremental SVD computation (see, for example, [4, 5, 35]) speeds up the decomposition considerably, resulting in O(p²) complexity once the first SVD with p points has been computed. Therefore, the computation time for the SVD component of BKR-CL can be made roughly comparable

to that of the BKR component, which implies that the algorithm will scale better as the number of points in the history stack increases.

Fig. 9. Plots of the online NN output against the system uncertainty ((a) online uncertainty tracking), and the NN output using the final weights frozen at the end of the simulation run ((b) learned-weight uncertainty tracking), with a high learning rate.

TABLE I
WINGROCK COMPUTATION TIME COMPARISON WITH NON-BKR CONTROLLERS.
(Columns: Centers, e-mod, σ-mod, BKR-CL.)

TABLE II
WINGROCK COMPUTATION TIME COMPARISON WITH BKR CONTROLLERS.
(Columns: Centers, BKR e-mod, BKR σ-mod, BKR, BKR-CL.)

VII. CONCLUSION

In this work, we established a connection between kernel methods and Persistently Exciting (PE) signals using Reproducing Kernel Hilbert Space theory. In particular, we showed that in order to ensure PE, not only do the system states have to be PE, but the centers also need to be selected in such a way that the mapping from the state space to the underlying RKHS is never orthogonal to the linear subspace generated by the Radial Basis Function (RBF) centers. This ensures that the output of the radial basis function network does not vanish. We used this connection to motivate an algorithm called Budgeted Kernel Restructuring (BKR) that updates the RBF network in a way that ensures any inserted excitation is retained. This enabled us to design adaptive controllers without assuming any prior knowledge about the domain of the uncertainty. Furthermore, it was shown through simulation that on a budget (a limitation on the maximum number of RBFs used), the method is better at capturing the uncertainty than evenly spaced centers, because the centers are selected along the path of the system through the state space. We augmented BKR with Concurrent Learning (CL), a method that concurrently uses specifically recorded and instantaneous data for adaptation, to create the BKR-CL adaptive control algorithm. It was shown through a Lyapunov-like analysis that the weights for the centers picked by BKR-CL are bounded without needing PE. Simulations showed improved tracking performance.

VIII. ACKNOWLEDGMENTS

This work was supported in part by NSF ECS-38993, NSF ECS-84675, and NASA Cooperative Agreement NNX8AD6A. We thank Maximilian Mühlegg for his valuable help in revising this paper and in the simulations.

REFERENCES

[1] K. J. Åström and B. Wittenmark. Adaptive Control. Addison-Wesley, Reading, 2nd edition, 1995.
[2] P. Bouboulis, K. Slavakis, and S. Theodoridis. Adaptive learning in complex reproducing kernel Hilbert spaces employing Wirtinger's subgradients. IEEE Trans. Neural Netw. Learning Syst., 23(3):425–438, 2012.
[3] S. Boyd and S. Sastry. Necessary and sufficient conditions for parameter convergence in adaptive control. Automatica, 22(6):629–639, 1986.
[4] M. Brand. Fast online SVD revisions for lightweight recommender systems. In Proc. of SIAM Int. Conf. on Data Mining, 2003.
[5] M. Brand. Fast low-rank modifications of the thin singular value decomposition. Linear Algebra and its Applications, 415(1):20–30, May 2006.
[6] C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:121–167, 1998.
[7] A. J. Calise, M. Sharma, and S. Lee. Adaptive autopilot design for guided munitions. AIAA Journal of Guidance, Control, and Dynamics, 23(5), 2000.
[8] G. Chowdhary. Concurrent Learning for Convergence in Adaptive Control Without Persistency of Excitation. PhD thesis, Georgia Institute of Technology, Atlanta, GA, 2010.
[9] G. Chowdhary and E. N. Johnson. Concurrent learning for convergence in adaptive control without persistency of excitation.


More information

EN Nonlinear Control and Planning in Robotics Lecture 3: Stability February 4, 2015

EN Nonlinear Control and Planning in Robotics Lecture 3: Stability February 4, 2015 EN530.678 Nonlinear Control and Planning in Robotics Lecture 3: Stability February 4, 2015 Prof: Marin Kobilarov 0.1 Model prerequisites Consider ẋ = f(t, x). We will make the following basic assumptions

More information

Support Vector Machines: Training with Stochastic Gradient Descent. Machine Learning Fall 2017

Support Vector Machines: Training with Stochastic Gradient Descent. Machine Learning Fall 2017 Support Vector Machines: Training with Stochastic Gradient Descent Machine Learning Fall 2017 1 Support vector machines Training by maximizing margin The SVM objective Solving the SVM optimization problem

More information

Multi-Robotic Systems

Multi-Robotic Systems CHAPTER 9 Multi-Robotic Systems The topic of multi-robotic systems is quite popular now. It is believed that such systems can have the following benefits: Improved performance ( winning by numbers ) Distributed

More information

Topic # /31 Feedback Control Systems. Analysis of Nonlinear Systems Lyapunov Stability Analysis

Topic # /31 Feedback Control Systems. Analysis of Nonlinear Systems Lyapunov Stability Analysis Topic # 16.30/31 Feedback Control Systems Analysis of Nonlinear Systems Lyapunov Stability Analysis Fall 010 16.30/31 Lyapunov Stability Analysis Very general method to prove (or disprove) stability of

More information

Least Squares Approximation

Least Squares Approximation Chapter 6 Least Squares Approximation As we saw in Chapter 5 we can interpret radial basis function interpolation as a constrained optimization problem. We now take this point of view again, but start

More information

Topic 3: Neural Networks

Topic 3: Neural Networks CS 4850/6850: Introduction to Machine Learning Fall 2018 Topic 3: Neural Networks Instructor: Daniel L. Pimentel-Alarcón c Copyright 2018 3.1 Introduction Neural networks are arguably the main reason why

More information

Kernel Methods. Machine Learning A W VO

Kernel Methods. Machine Learning A W VO Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance

More information

Lecture 14 : Online Learning, Stochastic Gradient Descent, Perceptron

Lecture 14 : Online Learning, Stochastic Gradient Descent, Perceptron CS446: Machine Learning, Fall 2017 Lecture 14 : Online Learning, Stochastic Gradient Descent, Perceptron Lecturer: Sanmi Koyejo Scribe: Ke Wang, Oct. 24th, 2017 Agenda Recap: SVM and Hinge loss, Representer

More information

Adaptive Dynamic Inversion Control of a Linear Scalar Plant with Constrained Control Inputs

Adaptive Dynamic Inversion Control of a Linear Scalar Plant with Constrained Control Inputs 5 American Control Conference June 8-, 5. Portland, OR, USA ThA. Adaptive Dynamic Inversion Control of a Linear Scalar Plant with Constrained Control Inputs Monish D. Tandale and John Valasek Abstract

More information

Lyapunov Stability of Linear Predictor Feedback for Distributed Input Delays

Lyapunov Stability of Linear Predictor Feedback for Distributed Input Delays IEEE TRANSACTIONS ON AUTOMATIC CONTROL VOL. 56 NO. 3 MARCH 2011 655 Lyapunov Stability of Linear Predictor Feedback for Distributed Input Delays Nikolaos Bekiaris-Liberis Miroslav Krstic In this case system

More information

Predictive analysis on Multivariate, Time Series datasets using Shapelets

Predictive analysis on Multivariate, Time Series datasets using Shapelets 1 Predictive analysis on Multivariate, Time Series datasets using Shapelets Hemal Thakkar Department of Computer Science, Stanford University hemal@stanford.edu hemal.tt@gmail.com Abstract Multivariate,

More information

The Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017

The Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017 The Kernel Trick, Gram Matrices, and Feature Extraction CS6787 Lecture 4 Fall 2017 Momentum for Principle Component Analysis CS6787 Lecture 3.1 Fall 2017 Principle Component Analysis Setting: find the

More information

Linear Discrimination Functions

Linear Discrimination Functions Laurea Magistrale in Informatica Nicola Fanizzi Dipartimento di Informatica Università degli Studi di Bari November 4, 2009 Outline Linear models Gradient descent Perceptron Minimum square error approach

More information

Output Adaptive Model Reference Control of Linear Continuous State-Delay Plant

Output Adaptive Model Reference Control of Linear Continuous State-Delay Plant Output Adaptive Model Reference Control of Linear Continuous State-Delay Plant Boris M. Mirkin and Per-Olof Gutman Faculty of Agricultural Engineering Technion Israel Institute of Technology Haifa 3, Israel

More information

Gradient Estimation for Attractor Networks

Gradient Estimation for Attractor Networks Gradient Estimation for Attractor Networks Thomas Flynn Department of Computer Science Graduate Center of CUNY July 2017 1 Outline Motivations Deterministic attractor networks Stochastic attractor networks

More information

Math 396. Quotient spaces

Math 396. Quotient spaces Math 396. Quotient spaces. Definition Let F be a field, V a vector space over F and W V a subspace of V. For v, v V, we say that v v mod W if and only if v v W. One can readily verify that with this definition

More information

Attitude Regulation About a Fixed Rotation Axis

Attitude Regulation About a Fixed Rotation Axis AIAA Journal of Guidance, Control, & Dynamics Revised Submission, December, 22 Attitude Regulation About a Fixed Rotation Axis Jonathan Lawton Raytheon Systems Inc. Tucson, Arizona 85734 Randal W. Beard

More information

Kernel Method: Data Analysis with Positive Definite Kernels

Kernel Method: Data Analysis with Positive Definite Kernels Kernel Method: Data Analysis with Positive Definite Kernels 2. Positive Definite Kernel and Reproducing Kernel Hilbert Space Kenji Fukumizu The Institute of Statistical Mathematics. Graduate University

More information

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A =

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = 30 MATHEMATICS REVIEW G A.1.1 Matrices and Vectors Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = a 11 a 12... a 1N a 21 a 22... a 2N...... a M1 a M2... a MN A matrix can

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Feature Extraction Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi, Payam Siyari Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Dimensionality Reduction

More information

Kernel-based Approximation. Methods using MATLAB. Gregory Fasshauer. Interdisciplinary Mathematical Sciences. Michael McCourt.

Kernel-based Approximation. Methods using MATLAB. Gregory Fasshauer. Interdisciplinary Mathematical Sciences. Michael McCourt. SINGAPORE SHANGHAI Vol TAIPEI - Interdisciplinary Mathematical Sciences 19 Kernel-based Approximation Methods using MATLAB Gregory Fasshauer Illinois Institute of Technology, USA Michael McCourt University

More information

State and Parameter Estimation Based on Filtered Transformation for a Class of Second-Order Systems

State and Parameter Estimation Based on Filtered Transformation for a Class of Second-Order Systems State and Parameter Estimation Based on Filtered Transformation for a Class of Second-Order Systems Mehdi Tavan, Kamel Sabahi, and Saeid Hoseinzadeh Abstract This paper addresses the problem of state and

More information

Reproducing Kernel Hilbert Spaces Class 03, 15 February 2006 Andrea Caponnetto

Reproducing Kernel Hilbert Spaces Class 03, 15 February 2006 Andrea Caponnetto Reproducing Kernel Hilbert Spaces 9.520 Class 03, 15 February 2006 Andrea Caponnetto About this class Goal To introduce a particularly useful family of hypothesis spaces called Reproducing Kernel Hilbert

More information

The Rationale for Second Level Adaptation

The Rationale for Second Level Adaptation The Rationale for Second Level Adaptation Kumpati S. Narendra, Yu Wang and Wei Chen Center for Systems Science, Yale University arxiv:1510.04989v1 [cs.sy] 16 Oct 2015 Abstract Recently, a new approach

More information

Lecture 4: Analysis of MIMO Systems

Lecture 4: Analysis of MIMO Systems Lecture 4: Analysis of MIMO Systems Norms The concept of norm will be extremely useful for evaluating signals and systems quantitatively during this course In the following, we will present vector norms

More information

4 Bias-Variance for Ridge Regression (24 points)

4 Bias-Variance for Ridge Regression (24 points) Implement Ridge Regression with λ = 0.00001. Plot the Squared Euclidean test error for the following values of k (the dimensions you reduce to): k = {0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500,

More information

THE design objective of Model Reference Adaptive Control

THE design objective of Model Reference Adaptive Control Memory-Based Data-Driven MRAC Architecture Ensuring Parameter Convergence Sayan Basu Roy, Shubhendu Bhasin, Indra Narayan Kar arxiv:62.482v [cs.sy] Feb 26 Abstract Convergence of controller parameters

More information

Removing Erroneous History Stack Elements in Concurrent Learning

Removing Erroneous History Stack Elements in Concurrent Learning Removing Erroneous History Stack Elements in Concurrent Learning Stefan Kersting and Martin Buss Abstract This paper is concerned with erroneous history stack elements in concurrent learning. Concurrent

More information

Fredholm Theory. April 25, 2018

Fredholm Theory. April 25, 2018 Fredholm Theory April 25, 208 Roughly speaking, Fredholm theory consists of the study of operators of the form I + A where A is compact. From this point on, we will also refer to I + A as Fredholm operators.

More information

CHAPTER 1. Introduction

CHAPTER 1. Introduction CHAPTER 1 Introduction Linear geometric control theory was initiated in the beginning of the 1970 s, see for example, [1, 7]. A good summary of the subject is the book by Wonham [17]. The term geometric

More information

Neural Networks Lecture 4: Radial Bases Function Networks

Neural Networks Lecture 4: Radial Bases Function Networks Neural Networks Lecture 4: Radial Bases Function Networks H.A Talebi Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Winter 2011. A. Talebi, Farzaneh Abdollahi

More information

Stochastic Analogues to Deterministic Optimizers

Stochastic Analogues to Deterministic Optimizers Stochastic Analogues to Deterministic Optimizers ISMP 2018 Bordeaux, France Vivak Patel Presented by: Mihai Anitescu July 6, 2018 1 Apology I apologize for not being here to give this talk myself. I injured

More information

Signal Modeling Techniques in Speech Recognition. Hassan A. Kingravi

Signal Modeling Techniques in Speech Recognition. Hassan A. Kingravi Signal Modeling Techniques in Speech Recognition Hassan A. Kingravi Outline Introduction Spectral Shaping Spectral Analysis Parameter Transforms Statistical Modeling Discussion Conclusions 1: Introduction

More information

OPTIMAL PERTURBATION OF UNCERTAIN SYSTEMS

OPTIMAL PERTURBATION OF UNCERTAIN SYSTEMS Stochastics and Dynamics c World Scientific Publishing Company OPTIMAL PERTURBATION OF UNCERTAIN SYSTEMS BRIAN F. FARRELL Division of Engineering and Applied Sciences, Harvard University Pierce Hall, 29

More information

below, kernel PCA Eigenvectors, and linear combinations thereof. For the cases where the pre-image does exist, we can provide a means of constructing

below, kernel PCA Eigenvectors, and linear combinations thereof. For the cases where the pre-image does exist, we can provide a means of constructing Kernel PCA Pattern Reconstruction via Approximate Pre-Images Bernhard Scholkopf, Sebastian Mika, Alex Smola, Gunnar Ratsch, & Klaus-Robert Muller GMD FIRST, Rudower Chaussee 5, 12489 Berlin, Germany fbs,

More information

Fitting Linear Statistical Models to Data by Least Squares: Introduction

Fitting Linear Statistical Models to Data by Least Squares: Introduction Fitting Linear Statistical Models to Data by Least Squares: Introduction Radu Balan, Brian R. Hunt and C. David Levermore University of Maryland, College Park University of Maryland, College Park, MD Math

More information

Intelligent Control. Module I- Neural Networks Lecture 7 Adaptive Learning Rate. Laxmidhar Behera

Intelligent Control. Module I- Neural Networks Lecture 7 Adaptive Learning Rate. Laxmidhar Behera Intelligent Control Module I- Neural Networks Lecture 7 Adaptive Learning Rate Laxmidhar Behera Department of Electrical Engineering Indian Institute of Technology, Kanpur Recurrent Networks p.1/40 Subjects

More information

EE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015

EE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015 EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,

More information

Kernel-Based Contrast Functions for Sufficient Dimension Reduction

Kernel-Based Contrast Functions for Sufficient Dimension Reduction Kernel-Based Contrast Functions for Sufficient Dimension Reduction Michael I. Jordan Departments of Statistics and EECS University of California, Berkeley Joint work with Kenji Fukumizu and Francis Bach

More information

LMI Based Model Order Reduction Considering the Minimum Phase Characteristic of the System

LMI Based Model Order Reduction Considering the Minimum Phase Characteristic of the System LMI Based Model Order Reduction Considering the Minimum Phase Characteristic of the System Gholamreza Khademi, Haniyeh Mohammadi, and Maryam Dehghani School of Electrical and Computer Engineering Shiraz

More information

COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017

COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

MCE/EEC 647/747: Robot Dynamics and Control. Lecture 12: Multivariable Control of Robotic Manipulators Part II

MCE/EEC 647/747: Robot Dynamics and Control. Lecture 12: Multivariable Control of Robotic Manipulators Part II MCE/EEC 647/747: Robot Dynamics and Control Lecture 12: Multivariable Control of Robotic Manipulators Part II Reading: SHV Ch.8 Mechanical Engineering Hanz Richter, PhD MCE647 p.1/14 Robust vs. Adaptive

More information

Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore

Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Lecture - 27 Multilayer Feedforward Neural networks with Sigmoidal

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

Formally Analyzing Adaptive Flight Control

Formally Analyzing Adaptive Flight Control Formally Analyzing Adaptive Flight Control Ashish Tiwari SRI International 333 Ravenswood Ave Menlo Park, CA 94025 Supported in part by NASA IRAC NRA grant number: NNX08AB95A Ashish Tiwari Symbolic Verification

More information

Iterative Learning Control Analysis and Design I

Iterative Learning Control Analysis and Design I Iterative Learning Control Analysis and Design I Electronics and Computer Science University of Southampton Southampton, SO17 1BJ, UK etar@ecs.soton.ac.uk http://www.ecs.soton.ac.uk/ Contents Basics Representations

More information

Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x =

Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x = Linear Algebra Review Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1 x x = 2. x n Vectors of up to three dimensions are easy to diagram.

More information

Optimization for neural networks

Optimization for neural networks 0 - : Optimization for neural networks Prof. J.C. Kao, UCLA Optimization for neural networks We previously introduced the principle of gradient descent. Now we will discuss specific modifications we make

More information

Observer design for a general class of triangular systems

Observer design for a general class of triangular systems 1st International Symposium on Mathematical Theory of Networks and Systems July 7-11, 014. Observer design for a general class of triangular systems Dimitris Boskos 1 John Tsinias Abstract The paper deals

More information

Policy Gradient Reinforcement Learning for Robotics

Policy Gradient Reinforcement Learning for Robotics Policy Gradient Reinforcement Learning for Robotics Michael C. Koval mkoval@cs.rutgers.edu Michael L. Littman mlittman@cs.rutgers.edu May 9, 211 1 Introduction Learning in an environment with a continuous

More information

Unifying Behavior-Based Control Design and Hybrid Stability Theory

Unifying Behavior-Based Control Design and Hybrid Stability Theory 9 American Control Conference Hyatt Regency Riverfront St. Louis MO USA June - 9 ThC.6 Unifying Behavior-Based Control Design and Hybrid Stability Theory Vladimir Djapic 3 Jay Farrell 3 and Wenjie Dong

More information

Nearest Neighbors Methods for Support Vector Machines

Nearest Neighbors Methods for Support Vector Machines Nearest Neighbors Methods for Support Vector Machines A. J. Quiroz, Dpto. de Matemáticas. Universidad de Los Andes joint work with María González-Lima, Universidad Simón Boĺıvar and Sergio A. Camelo, Universidad

More information