Optimal Scheduling for Reference Tracking or State Regulation using Reinforcement Learning


Optimal Scheduling for Reference Tracking or State Regulation using Reinforcement Learning

Ali Heydari

Abstract

The problem of optimal control of autonomous nonlinear switching systems with infinite-horizon cost functions, for the purpose of tracking a family of reference signals or regulating the states, is investigated. A reinforcement learning scheme is presented which learns the solution and provides the schedule between the modes in feedback form, without enforcing a mode sequence or a number of switchings. This is done through a value iteration based approach. The convergence of the iterative learning scheme to the optimal solution is proved. After answering different analytical questions about the solution, the learning algorithm is presented. Finally, numerical analyses are provided to evaluate the performance of the developed technique in practice.

I. INTRODUCTION

Optimal scheduling between different modes/subsystems in the control of switching systems is a challenging problem in the controls engineering discipline, and numerous research papers have emerged in the literature on it within the last decade, [] [4]. The reason for this attention and the vast effort spent on these problems is the fact that many real-world control problems can be classified as switching problems, including problems in mechanical and aerospace systems [5], [6], electronics [7], chemical processes [8], and bioengineering [2], [9]. Conventional optimal control methods generally fail to provide solutions to switching problems, because in switching problems the solution includes discrete decisions, namely, suitable switchings between the modes. One of the most common approaches to solving optimal switching problems is freezing the mode sequence, i.e., the order of active modes, as well as the number of switches, and optimizing only the switching times.
Note that once the mode sequence and the number of switches are fixed, the only unknowns are the switching times. Nonlinear programming is an approach followed by different researchers [] [7], in which the gradient of the cost function with respect to the switching instants is utilized to optimize the switching times under a pre-selected mode sequence and number of switchings. Ideas for admitting a free mode sequence were presented in [5] and [6]. In [5], a two-stage optimization algorithm was developed, in which one stage updates the switching times and the other modifies the mode sequence. In a recent study, another nonlinear programming based solution for the case of a free mode sequence was proposed in [20]. Nonlinear programming based methods generally lead to open-loop solutions for a given/fixed initial condition. Each time the initial condition is changed, another set of numerical calculations has to be conducted in order to find the new optimal switching times. The dependency of the solutions on the selected initial condition leads to a limitation: for example, in finding the optimal switching between gears of a manual transmission car in order to accelerate to a desired speed, a calculated solution will be valid for implementation only if the initial speed of the car is exactly the one for which the problem was numerically solved. Otherwise, the solution will not take the car to the desired speed. In [3], the validity of the results was extended to different initial conditions within a pre-selected set, by determining the switching parameter that minimizes the worst possible cost over all trajectories starting in the selected set of initial states. Discretization of the state space, in order to end up with a finite number of choices, is an approach followed in [8], where dynamic programming was used for solving the problem. Refs.
[2] and [22] investigated the use of (relaxed/approximate) dynamic programming for different problems including optimal switching. Genetic algorithms are yet another approach for finding a numerical solution for a given initial condition [9]. An optimization scheme was developed in [9] to find both the optimal mode sequence and the switching times for positive linear systems. The demonstrated potential of Reinforcement Learning (RL) and Approximate Dynamic Programming (ADP) in solving conventional optimal control problems, [23]–[37], motivated the author of this study to utilize ADP

Assistant Professor of Mechanical Engineering, South Dakota School of Mines and Technology, Rapid City, SD 57701, ali.heydari@sdsmt.edu.

for solving optimal switching problems in the past. The results were solutions to problems with a fixed switching sequence [38], a free switching sequence with autonomous subsystems [39], and a free switching sequence with controlled subsystems [40], along with applications of the developed ideas to multi-therapeutic treatment of HIV disease [2] and to aerospace vehicles [6]. All these developments, however, deal with fixed-final-time problems, i.e., problems with finite-horizon cost functions. Many real-world problems, on the other hand, have an infinite horizon, e.g., regulation of a system with on-off actuators. The motivation behind this work is providing a solution for such problems. In simultaneous but independent research, Refs. [3] and [4] proposed a different ADP based solution to switching problems. In that method, the number of functions that need to be learned at each training iteration grows exponentially with the number of iterations and soon becomes prohibitive. Moreover, in those developments the training is done for a single selected initial condition. Another investigation of solving switching problems using ADP was reported in [4]. The differences, compared with this study, are the approach itself and the point that the initial conditions are assumed to be known a priori in that study. Considering this background, the current study aims at extending the developments in [2], [6], [38]–[40], and particularly the solution proposed in [39], to problems with infinite-horizon cost functions. For the sake of generality, tracking a time-varying signal is selected, because once the reference signal is set to zero, the solution immediately reduces to regulation of the states. The main challenge in directly extending the results of [39] to infinite-horizon problems is the fact that the so-called value function, sometimes called the cost-to-go function, can easily be learned in a backward fashion in fixed-final-time problems.
But once the horizon is infinite, this is not possible: there is no final time to start from. An idea based on the ADP/RL approaches to conventional optimal control problems is using value iteration [23] to learn the desired function. Doing so raises multiple questions, including the convergence of the iterations, the optimality of the limit function of the sequence resulting from the iterations, and the continuity of the result, so that it can be approximated by neural networks (NNs). The analytical contribution of this work is developing novel and rigorous, yet straightforward and easy to understand, answers to these questions. The approach followed for the convergence analyses is motivated by [42] and is unlike the well-established ideas for convergence of ADP in conventional problems, including [2] and [27]. The former was adapted in [22] and the latter was adapted in different studies including [30], [32], [4], [43]. More specifically, the idea proposed in this study is establishing an analogy between the time-to-go in finite-horizon problems and the iteration index of value iteration in infinite-horizon problems. This idea is the key to the convergence, optimality, and continuity proofs presented here. Readers interested in convergence and continuity analyses of value iteration are referred to [22] for another approach, based on a transformation of continuous-time switching problems. Besides these theoretical analyses, another contribution of this work is the resulting controller, which approximates the optimal solution to tracking/regulation problems with infinite-horizon cost functions. The proposed controller provides solutions for different initial conditions without any need for retraining. Moreover, once the NN is trained, the result remains valid for tracking different reference signals that share the same dynamical model, for example, those generated using different initial conditions.
Another interesting feature of the proposed solution is that it calculates the solution in feedback form, in the sense that the solution is directly calculated based on the instantaneous state of the system and the reference signal. Finally, the proposed method does not assume a fixed mode sequence or a fixed number of switches. The solution, including the number of switchings, the order of the modes, and the switching times, is calculated such that the cost function is minimized. While the class of problems investigated in this study is different from the one in [4], the bases of the presented solutions can be compared.
a) Only one neural network needs to be trained and implemented in the method proposed in this study, while in the other method the required number of critic networks grows exponentially with the iterations.
b) The tracking problem is investigated in its general form in this study, while the solution proposed for tracking problems in [4] is limited to a certain type of tracking problems which are convertible to regulation type problems.
c) The method proposed in [4] is valid for a single and unique reference signal, while the method proposed in this study provides the solution for tracking a family of reference signals.
d) The method proposed in this work is valid for different initial conditions without any need for retraining, while the training algorithm in [4] is based on a selected initial condition, as shown in the training algorithm in [30], on which [4] is based.
The rest of this paper is organized as follows. The problem is formulated in Section II and the proposed solution is detailed in Section III. Section IV presents the convergence analyses and the answers to the analytical questions raised in Section III. Afterwards, Section V details the implementation of the proposed method. Section VI discusses the extension of the results to the case of optimal regulation of the states, and Section VII presents the numerical analyses and simulations. Finally, concluding remarks are given in Section VIII.

II. PROBLEM FORMULATION

The problem subject to this study is forcing the states of the system to track a given time-varying signal. The decision variable, however, is the active mode of the given switching system, which can be arbitrarily selected at each instant. More specifically, let the system subject to scheduling/control be given by M modes or subsystems with the known dynamics

x_{k+1} = f_i(x_k), ∀k ∈ N, i ∈ I,   (1)

where f_i : R^n → R^n is continuous for every i ∈ I := {1, 2, ..., M}, N denotes the set of non-negative integers, and the positive integer n is the dimension of the state vector x_k. Subscript k in x_k represents the discrete time index and subscript i in f_i(.) represents the respective mode/subsystem. Denoting the active mode at instant k by i_k ∈ I, a switching schedule identifies i_k for every k ∈ N. Once a switching schedule is selected, the system can operate from k = 0 to k = ∞. The problem is defined as finding a switching schedule that forces the states (or a combination of their elements) to track a reference signal r_k ∈ R^m (or a combination of its elements) with the known dynamics

r_{k+1} = F(r_k),   (2)

and given initial condition r_0 ∈ R^m, where F : R^m → R^m is a continuous function. This objective can be fulfilled by minimizing the cost function

J = Σ_{k=0}^{∞} Q(x_k, r_k),   (3)

where the convex, continuous, and positive (semi-)definite function Q : R^n × R^m → R_+ penalizes the state error with respect to the desired reference r_k. For example, Q(x_k, r_k) := ||x_k − r_k||^2, if m = n, represents the objective of x_k tracking r_k.
Another example could be having the square of the nth element of the state vector track the cube of the mth element of the reference, through Q(x_k, r_k) := |x_k(n)^2 − r_k(m)^3|^2, where the lth element of a vector y is denoted by y(l). In other words, Q(.,.) can be any non-negative convex and continuous function which returns zero only when the desired tracking is achieved. The set of non-negative reals is denoted by R_+.

Assumption 1. There exists at least one switching schedule for every given initial condition x_0 and r_0 in some selected compact sets using which cost function (3) is bounded.

Assumption 2. The dynamics of the subsystems are known.

Assumption 1 guarantees that the optimal solution exists and leads to a finite cost; otherwise, it would not be optimal compared with the assumed existing switching schedule. Assumption 2 clarifies the point that this study does not incorporate the case of unmodeled dynamics. Optimal scheduling in the presence of modeling uncertainties may be conducted through extension of the presented method to online learning.

III. PROPOSED SOLUTION

The idea behind the proposed solution is approximating the so-called value function, which outputs the cost-to-go (i.e., the cost incurred by evaluating Eq. (3) along the resulting trajectory) given the current state and the current reference signal, assuming optimal decisions are made from the current time to infinity. Denoting the value function by V* : R^n × R^m → R_+ and considering the selected cost function, i.e., Eq. (3), one has

V*(x_k, r_k) := Q(x_k, r_k) + Σ_{j=k+1}^{∞} Q(x*_j, r_j),   (4)

in which the optimal (future) states, denoted by x*_j, j ∈ {k+1, k+2, ...}, are calculated using dynamics (1) and the optimal decisions i*_j ∈ I, j ∈ {k, k+1, ...}. Eq. (4) can be written as the recursive equation

V*(x, r) = Q(x, r) + V*( f_{i*(x,r)}(x), F(r) ), ∀x ∈ R^n, ∀r ∈ R^m,   (5)
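As a concrete instance of formulation (1)-(3), the sketch below defines a two-mode scalar switching system, a geometrically decaying reference, and a quadratic tracking penalty, then evaluates a truncated version of cost (3) for a candidate schedule. All of the dynamics and numbers here are illustrative assumptions, not systems from this paper:

```python
# Hypothetical two-mode scalar example of (1)-(3): both modes are
# stable, but mode 2 flips the sign of the state; the reference decays
# geometrically. All of these choices are illustrative assumptions.
def f1(x): return 0.5 * x          # mode 1 dynamics, x_{k+1} = f_1(x_k)
def f2(x): return -0.8 * x         # mode 2 dynamics, x_{k+1} = f_2(x_k)
modes = [f1, f2]

def F(r): return 0.5 * r           # reference dynamics, r_{k+1} = F(r_k)

def Q(x, r): return (x - r) ** 2   # tracking penalty of cost (3)

def cost_of_schedule(x0, r0, schedule):
    """Truncated evaluation of cost (3) along the trajectory generated
    by a given switching schedule (a sequence of mode indices)."""
    x, r, J = x0, r0, 0.0
    for i in schedule:
        J += Q(x, r)
        x, r = modes[i](x), F(r)
    return J
```

For this toy system, starting from x_0 = r_0 = 1, holding mode 1 tracks the reference exactly (zero cost), while holding mode 2 does not; in general a good schedule mixes the modes depending on the sign of the tracking error, which is exactly the discrete decision the paper optimizes.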

where i*(x, r) denotes the optimal mode given the current x and r. By the Bellman principle of optimality [44], one has

V*(x, r) = min_{i∈I} ( Q(x, r) + V*( f_i(x), F(r) ) ) = Q(x, r) + min_{i∈I} V*( f_i(x), F(r) ), ∀x ∈ R^n, ∀r ∈ R^m.   (6)

Moreover, the optimal mode i* at each instant, which is also a function of the current x and r, is given by

i*(x, r) = argmin_{i∈I} V*( f_i(x), F(r) ), ∀x ∈ R^n, ∀r ∈ R^m.   (7)

In other words, i* at each instant is selected such that the cost-to-go at the next time step is smallest. The key to the solution of the problem is the fact that if the value function V*(.,.) is obtained versus its inputs, then one can find the optimal mode in feedback form in online operation, as seen in (7). Motivated by the developments in the RL and ADP literature for optimal control problems [2]–[32], [43], a reinforcement learning scheme is selected in this study for learning the desired function for all x ∈ Ω_x ⊂ R^n and r ∈ Ω_r ⊂ R^m. The domains Ω_x and Ω_r are selected to be closed and bounded, i.e., compact, representing the domains of interest for the respective variables. They need to be selected based on the physics of the problem and its operation envelope. The learning process starts with selecting an initial guess of V*(.,.), denoted by V^0(.,.), e.g., V^0(x, r) = 0, ∀x ∈ Ω_x, ∀r ∈ Ω_r. Afterwards, one updates the guess using

V^{j+1}(x, r) = Q(x, r) + min_{i∈I} V^j( f_i(x), F(r) ), ∀x ∈ Ω_x, ∀r ∈ Ω_r,   (8)

where superscript j denotes the iteration index. This selection leads to the standard value iteration approach of reinforcement learning [23] for solving conventional problems. However, considering the switching nature of the problem at hand, the following challenging questions arise.
1) Does iterative equation (8) converge as j → ∞, i.e., is the sequence {V^0(x, r), V^1(x, r), V^2(x, r), ...}, denoted by {V^j(x, r)}_{j=0}^{∞}, convergent, ∀x ∈ Ω_x, ∀r ∈ Ω_r?
2) Which initial guesses V^0(.,.) guarantee the convergence?
3) If the sequence is convergent, does it converge to the optimal solution, i.e., do we have

lim_{j→∞} V^j(x, r) = V*(x, r), ∀x ∈ Ω_x, ∀r ∈ Ω_r?   (9)

4) Since {V^j(x, r)}_{j=0}^{∞} is a sequence of functions, is its convergence pointwise, for every given x and r, or uniform throughout the domains Ω_x and Ω_r, [45]?
5) Assuming the sequence converges to the optimal solution, is the limit function continuous, so that one can use NNs for approximating it?

Before proceeding to the next section, it should be noted that one eventually uses look-up tables or function approximators for approximating V^{j+1}(.,.), generated from Eq. (8). Since the exact reconstruction of the right-hand side of the equation is not possible in the general case, approximation errors will be introduced into the process. This study, however, assumes the function approximators are rich enough that the approximation errors are negligible.

IV. THEORETICAL ANALYSES

In this section the theoretical questions raised at the end of the previous section are investigated. The idea presented in this study for answering the questions is different from the standard approaches proposed in the RL and ADP literature for proving convergence of the respective iterative equations in optimal control problems, e.g., [2], [27]. The idea, motivated by [42], is establishing an analogy between the iterative learning scheme given by Eq. (8), which is proposed for solving infinite-horizon problems, and the solution to finite-horizon optimal control problems with fixed final time. The latter was investigated in [39] for the non-tracking case. Once this analogy is established, the answers to the questions follow in straightforward and easy-to-follow forms. Let the respective optimal tracking problem with a finite-horizon cost function be given by minimizing the cost function

J_N = ψ(x_N, r_N) + Σ_{k=0}^{N−1} Q(x_k, r_k),   (10)
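On a coarse grid, the value-iteration update (8) can be sketched in tabular form. The two-mode system, the grids, and the nearest-neighbor interpolation below are all illustrative assumptions introduced only to make the recursion concrete:

```python
import numpy as np

# Tabular sketch of value iteration (8) for an assumed two-mode scalar
# system; grids stand in for the compact domains Omega_x and Omega_r.
xs = np.linspace(-2.0, 2.0, 81)     # grid over Omega_x
rs = np.linspace(-1.0, 1.0, 41)     # grid over Omega_r
modes = [lambda x: 0.5 * x, lambda x: -0.8 * x]
F = lambda r: 0.5 * r
Q = lambda x, r: (x - r) ** 2

def nearest(grid, v):
    """Nearest-neighbor interpolation back onto the grid."""
    return int(np.abs(grid - v).argmin())

def value_iteration(n_iter):
    V = np.zeros((len(xs), len(rs)))   # V^0 = 0 satisfies 0 <= V^0 <= Q
    for _ in range(n_iter):
        Vnew = np.empty_like(V)
        for a, x in enumerate(xs):
            for b, r in enumerate(rs):
                nb = nearest(rs, F(r))
                # Eq. (8): V^{j+1}(x,r) = Q(x,r) + min_i V^j(f_i(x), F(r))
                Vnew[a, b] = Q(x, r) + min(
                    V[nearest(xs, f(x)), nb] for f in modes)
        V = Vnew
    return V
```

Because the backup operator is monotone and V^0 = 0 is below Q, the iterates produced by this sketch are pointwise non-decreasing, which is the behavior the questions above (and the analyses of Section IV) are concerned with.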

subject to system dynamics (1) and reference signal dynamics (2), where the convex, continuous, and positive (semi-)definite function ψ : R^n × R^m → R_+ penalizes the state error at the final time and Q(.,.) is the same as in (3). As seen, the only difference between this problem and the problem subject to this study is the fact that the horizon is fixed and finite. There is an important difference between infinite-horizon and finite-horizon problems: in infinite-horizon problems the objective is directing the states in certain directions without incorporating any time limitation, while in finite-horizon problems the time is limited, hence the objective should be fulfilled within a given time. For example, in the selected cost function given by Eq. (10), one may select a large ψ(.,.) compared with Q(.,.) to emphasize minimizing the tracking error at the final time over the tracking error during the horizon. Interested readers are referred to [39] for more details and several examples. Denoting the value function of the finite-horizon problem at time step k by V^{*,N−k}(.,.), cost function (10) leads to

V^{*,0}(x, r) = ψ(x, r), ∀x ∈ R^n, ∀r ∈ R^m,   (11)

and

V^{*,N−k}(x, r) = Q(x, r) + V^{*,N−(k+1)}( f_{i^{*,N−k}(x,r)}(x), F(r) ), ∀x ∈ R^n, ∀r ∈ R^m, ∀k ∈ K,   (12)

where i^{*,N−k}(x, r) denotes the optimal mode at time step k and K := {0, 1, 2, ..., N−1}. Note that in finite-horizon problems (with fixed final time N), the value function, and hence the solution, depend on the remaining time, or time-to-go, i.e., N − k. In other words, having the same x_k and r_k but a different time-to-go may lead to a different solution [44], [36]. The time dependencies of the value function and the optimal decision are incorporated by the superscript N − k in V^{*,N−k}(.,.) and i^{*,N−k}(.,.).
The Bellman principle of optimality [44] leads to the solution, that is, (11) along with

V^{*,N−k}(x, r) = Q(x, r) + min_{i∈I} V^{*,N−(k+1)}( f_i(x), F(r) ), ∀x ∈ R^n, ∀r ∈ R^m, ∀k ∈ K,   (13)

and

i^{*,N−k}(x, r) = argmin_{i∈I} V^{*,N−(k+1)}( f_i(x), F(r) ), ∀x ∈ R^n, ∀r ∈ R^m, ∀k ∈ K.   (14)

The important point in the finite-horizon problem is the fact that the final time is fixed and finite. Considering the value functions at different times-to-go as separate functions, one can start from the final time and form V^{*,0}(.,.) using (11). Afterwards, each V^{*,N−k}(.,.) can be found using (13) step by step from k = N−1 to k = 0, i.e., in a backward fashion. Ref. [39] presents the training algorithm and its analysis in detail for finite-horizon (non-tracking) problems. In infinite-horizon problems, however, this approach is not possible, as there is no final time to start from. Considering the finite-horizon problems, however, the following results can be obtained.

Lemma 1. If the continuous positive semi-definite initial guess in iterative relation (8) is given by V^0(.,.) and ψ(.,.) is selected as ψ(.,.) = V^0(.,.), then one has

V^j(x, r) = V^{*,j}(x, r), ∀x ∈ Ω_x, ∀r ∈ Ω_r, ∀j ∈ N.   (15)

Proof: Since ψ(.,.) = V^0(.,.) is given, (15) holds for j = 0, considering (11). Assume that Eq. (15) holds for a given j ∈ N. Selecting N > j and k = N − (j + 1), Eq. (13) leads to

V^{*,j+1}(x, r) = Q(x, r) + min_{i∈I} V^{*,j}( f_i(x), F(r) ), ∀x ∈ R^n, ∀r ∈ R^m.   (16)

Comparing (16) with (8) and considering (15) for the given j leads to

V^{j+1}(x, r) = V^{*,j+1}(x, r), ∀x ∈ Ω_x, ∀r ∈ Ω_r.   (17)

Therefore, Eq. (15) holds for all j ∈ N, by mathematical induction.

Lemma 1 presents an interesting result: the immature value function (with respect to the infinite-horizon problem at hand) subject to iteration at the jth iteration of (8) is exactly the (optimal) value function of a finite-horizon problem with a time-to-go of j. Considering this analogy between the iteration index and the time-to-go, the answer to Question 1 is within our reach.
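The backward recursion (11)-(13) and the mode selection (14) can be exercised numerically. Everything below (the two modes, the grids, the horizon N) is an illustrative assumption; note that, in line with Lemma 1, applying this backup j times starting from ψ = V^0 produces exactly the jth value-iteration iterate of (8):

```python
import numpy as np

# Backward finite-horizon DP of Eqs. (11)-(14) on a gridded toy problem.
xs = np.linspace(-2.0, 2.0, 81)
rs = np.linspace(-1.0, 1.0, 41)
modes = [lambda x: 0.5 * x, lambda x: -0.8 * x]
F = lambda r: 0.5 * r
Q = lambda x, r: (x - r) ** 2
near = lambda g, v: int(np.abs(g - v).argmin())

N = 10
# V[t] approximates the value function with time-to-go t; psi = 0 (Eq. 11).
V = [np.zeros((len(xs), len(rs)))]
for t in range(1, N + 1):
    Vt = np.empty_like(V[0])
    for a, x in enumerate(xs):
        for b, r in enumerate(rs):
            nb = near(rs, F(r))
            # Eq. (13): time-to-go t on the left, t-1 on the right.
            Vt[a, b] = Q(x, r) + min(V[t - 1][near(xs, f(x)), nb]
                                     for f in modes)
    V.append(Vt)

def dp_schedule_cost(x0, r0):
    """Simulate the system while choosing modes per Eq. (14)."""
    x, r, J = x0, r0, 0.0
    for k in range(N):
        t = N - k                    # remaining time-to-go
        i = min(range(2), key=lambda i: V[t - 1][
            near(xs, modes[i](x)), near(rs, F(r))])
        J += Q(x, r)
        x, r = modes[i](x), F(r)
    return J

def fixed_mode_cost(x0, r0, i):
    """Baseline: hold a single mode for the whole horizon."""
    x, r, J = x0, r0, 0.0
    for _ in range(N):
        J += Q(x, r)
        x, r = modes[i](x), F(r)
    return J
```

Starting from a state whose sign disagrees with the reference (e.g., x_0 = −1, r_0 = 1), the backward-DP schedule uses the sign-flipping mode once and then tracks, beating both single-mode schedules, which illustrates why the order of the modes must be free.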
The following lemma helps in answering the question.

Lemma 2. Let the value function of the finite-horizon problem of minimizing (10) subject to (1) and (2) with a time-to-go of j be given by V^{*,j}(x, r). If ψ(.,.) is selected such that

0 ≤ ψ(x, r) ≤ Q(x, r), ∀x ∈ R^n, ∀r ∈ R^m,   (18)

then the sequence {V^{*,j}(x, r)}_{j=0}^{∞} is convergent for every given x and r.

Proof: The first step is showing that {V^{*,j}(x, r)}_{j=0}^{∞} is a non-decreasing sequence for every given x and r. The proof is done by induction. Considering (11), (13) evaluated at k = N − 1, and (18), one has

V^{*,0}(x, r) ≤ V^{*,1}(x, r), ∀x ∈ Ω_x, ∀r ∈ Ω_r,   (19)

because one of the non-negative terms forming V^{*,1}(x, r) is Q(x, r), and this term alone is greater than or equal to V^{*,0}(x, r) = ψ(x, r), per (18). Now, assume that for some j one has

V^{*,j−1}(x, r) ≤ V^{*,j}(x, r), ∀x ∈ Ω_x, ∀r ∈ Ω_r.   (20)

Let us define Ṽ(.,.) as

Ṽ(x, r) := Q(x, r) + V^{*,j−1}( f_{i^{*,j+1}(x,r)}(x), F(r) ), ∀x ∈ Ω_x, ∀r ∈ Ω_r,   (21)

where

i^{*,j+1}(x, r) = argmin_{i∈I} V^{*,j}( f_i(x), F(r) ),   (22)

per (14). Comparing (21) with (13), where the latter is evaluated at k = N − j, one has

V^{*,j}(x, r) ≤ Ṽ(x, r), ∀x ∈ Ω_x, ∀r ∈ Ω_r,   (23)

because V^{*,j}(.,.) is the result of the minimization of the right-hand side of (13), while Ṽ(.,.) uses the possibly suboptimal mode i^{*,j+1}(x, r). Moreover, evaluating (13) at k = N − (j + 1) and considering (14), one has

V^{*,j+1}(x, r) = Q(x, r) + V^{*,j}( f_{i^{*,j+1}(x,r)}(x), F(r) ), ∀x ∈ Ω_x, ∀r ∈ Ω_r.   (24)

Comparing (24) with (21) and using (20), one has

Ṽ(x, r) ≤ V^{*,j+1}(x, r), ∀x ∈ Ω_x, ∀r ∈ Ω_r.   (25)

Finally, inequalities (23) and (25) lead to

V^{*,j}(x, r) ≤ V^{*,j+1}(x, r), ∀x ∈ Ω_x, ∀r ∈ Ω_r,   (26)

which, together with (19) and (20), proves the pointwise non-decreasing feature of {V^{*,j}(x, r)}_{j=0}^{∞}. On the other hand, there exists some switching schedule using which cost function (3), and hence finite-horizon cost function (10) as N → ∞, is bounded, per Assumption 1. The existence of such a switching schedule leads to the upper boundedness of lim_{j→∞} V^{*,j}(x, r), because otherwise the switching schedule utilized in generating V^{*,j}(x, r) would not be optimal compared with the existing switching schedule.
Finally, the upper boundedness of {V^{*,j}(x, r)}_{j=0}^{∞} and its non-decreasing feature lead to its convergence, [46].

Theorem 1. Iterative relation (8) converges to the optimal solution of the infinite-horizon optimal control problem of minimizing cost function (3) subject to (1) and (2), i.e.,

lim_{j→∞} V^j(x, r) = V*(x, r), ∀x ∈ Ω_x, ∀r ∈ Ω_r,   (27)

if the initial guess V^0(.,.) is a continuous function such that 0 ≤ V^0(x, r) ≤ Q(x, r), ∀x ∈ Ω_x, ∀r ∈ Ω_r.

Proof: Considering the analogy between the iterations of (8) and the solution to a finite-horizon problem with ψ(.,.) = V^0(.,.), as shown in Lemma 1, and the convergence result given in Lemma 2, the sequence {V^j(x, r)}_{j=0}^{∞} converges. Denoting the limit by V^∞(x, r), what remains to show is V^∞(x, r) = V*(x, r). Note that V^∞(x, r) is the value function corresponding to cost function (10) as N → ∞, while V*(x, r) corresponds to cost function (3). Considering Assumption 1, one has

lim_{k→∞} Q(x_k, r_k) = 0, ∀x_0 ∈ Ω_x, ∀r_0 ∈ Ω_r,   (28)

once the optimal modes are selected during the horizon, because otherwise the cost function becomes unbounded, [46]. Eq. (28) leads to lim_{N→∞} J_N = J, by their definitions given in (3) and (10), considering 0 ≤ V^0(x, r) = ψ(x, r) ≤ Q(x, r). Therefore, V^∞(x, r) = V*(x, r), since otherwise the smaller of the two values would be both the optimal solution to the infinite-horizon optimal control problem and the least upper bound of the sequence {V^j(x, r)}_{j=0}^{∞}.

Theorem 1 answers Questions 1, 2, and 3, raised in the previous section. The answer to Question 4, however, is particularly important, because it helps investigate certain features of the limit function of the sequence, which is the desired value function, including its continuity, asked about in Question 5. Note that NNs with continuous neurons are proved to provide uniform approximation if the function subject to approximation is continuous, [47], [48]. Lemma 3, which is based on a lemma developed in [39], proves the continuity of the value functions of the respective finite-horizon problems. Afterwards, Lemma 4, based on an idea developed in [2] and adapted in [22] for a similar purpose, is presented, which answers Question 4. Then we proceed to Theorem 2, which answers Question 5 using Lemmas 3 and 4.

Lemma 3. If the functions F(.), ψ(.,.), Q(.,.), and f_i(.), ∀i, are continuous with respect to their inputs, then the finite-horizon value functions defined by (11) and (13) are continuous in the inputs x and r.

Proof: The proof is done by induction. Starting from V^{*,0}(.,.), it is continuous because of (11) and the fact that ψ(.,.) is a continuous function of its inputs. Now assume that V^{*,j}(.,.) is continuous; if it can be shown that this assumption leads to V^{*,j+1}(.,.) being continuous, the proof is complete. Note that, due to the switching between different i's as x and r change, this continuity is not obvious from Eq. (13). Considering (16), which is Eq.
(13) written in terms of j instead of N − k, the continuity problem can be rephrased as follows. If the function V̂ : R^n × R^m × I → R_+ is defined as

V̂(x, r, i) := Q(x, r) + V^{*,j}( f_i(x), F(r) ),   (29)

and the piecewise constant function i* : R^n × R^m → I is given by

i*(x, r) = argmin_{i∈I} V̂(x, r, i) = argmin_{i∈I} V^{*,j}( f_i(x), F(r) ),   (30)

where V^{*,j}(.,.) is continuous in its inputs, then prove that the function V̂(.,., i*(.,.)) is continuous in x and r at every x ∈ R^n and r ∈ R^m. Note that V̂(x, r, i*(x, r)) = V^{*,j+1}(x, r), ∀x and ∀r. Therefore, the proof of continuity of V̂(.,., i*(.,.)) completes the proof of the lemma. Let x̄ be any selected point in R^n and, for any given r ∈ R^m, set

ī = i*(x̄, r).   (31)

Select an open set α ⊂ R^n such that x̄ belongs to the boundary of α and the limit

î = lim_{||x−x̄||→0, x∈α} i*(x, r)   (32)

exists, where ||.|| denotes the vector norm. If ī = î for every such α, then there exists some open set β ⊂ R^n containing x̄ such that i*(x, r) is constant for all x ∈ β, because i*(x, r) only assumes integer values. In this case the continuity of V̂(., r, i*(., r)) at x = x̄ follows, by composition, from the fact that V̂(., r, i) is continuous at x = x̄ for every fixed i ∈ I and given r, since Q(., r), f_i(.), and V^{*,j}(., r) are continuous functions. Finally, the continuity of the function subject to investigation at every x̄ ∈ R^n leads to the continuity of the function in R^n. Now assume ī ≠ î for some α. From the continuity of V̂(., r, î) for the given r and î, one has

V̂(x̄, r, î) = lim_{δx→0} V̂(x̄ + δx, r, î).   (33)

If it can be shown that, for every selected α, one has

V̂(x̄, r, ī) = V̂(x̄, r, î),   (34)

then the continuity of V̂(., r, i*(., r)) in x follows, because from (33) and (34) one has

V̂(x̄, r, ī) = lim_{δx→0} V̂(x̄ + δx, r, î),   (35)

and (35) leads to the continuity by definition, [45]. The proof that (34) holds is done by contradiction. First, assume that for some x̄ and some α one has

V̂(x̄, r, ī) < V̂(x̄, r, î);   (36)

then, due to the continuity of both sides of (36) at x̄ for the fixed r, ī, and î, there exists an open set γ containing x̄ such that

V̂(x, r, ī) < V̂(x, r, î), ∀x ∈ γ.   (37)

Inequality (37) implies that at points close enough to x̄ one has i*(x, r) ≠ î. But this contradicts Eq. (32), which implies that there always exists a point x arbitrarily close to x̄ at which i*(x, r) = î. Therefore, inequality (36) cannot hold. Now assume that

V̂(x̄, r, ī) > V̂(x̄, r, î).   (38)

Inequality (38) leads to i*(x̄, r) ≠ ī. But this is against (31), hence (38) also cannot hold. Therefore, (34) holds, and hence V̂(., r, i*(., r)) is continuous at every x̄ ∈ R^n for every fixed r. Repeating the entire process with a fixed x and varying r, the continuity of the function with respect to r can be proved similarly. This completes the induction and the proof of the lemma.

Lemma 4. If there exists a constant c such that V*(x, r) ≤ cQ(x, r), ∀x ∈ Ω_x, ∀r ∈ Ω_r, then, selecting ψ(.,.) = 0, the sequence of finite-horizon value functions converges uniformly to the optimal value function of the respective infinite-horizon problem on the compact sets Ω_x and Ω_r.

Proof: The proof is based on an idea developed in [2] and utilized in [22], by showing that

V^{*,k}(x, r) ≥ ( 1 − (1 + c^{−1})^{−k} ) V*(x, r), ∀x ∈ Ω_x, ∀r ∈ Ω_r, ∀k ∈ N.   (39)

Considering V^{*,0}(.,.) = 0, Eq. (39) holds for k = 0. Assume it holds for some k. Then

V^{*,k+1}(x, r) = min_{i∈I} ( Q(x, r) + V^{*,k}( f_i(x), F(r) ) )
≥ min_{i∈I} ( Q(x, r) + ( 1 − (1 + c^{−1})^{−k} ) V*( f_i(x), F(r) ) )
= min_{i∈I} ( ( 1 − (1 + c^{−1})^{−(k+1)} ) ( Q(x, r) + V*( f_i(x), F(r) ) ) + (1 + c^{−1})^{−(k+1)} ( Q(x, r) − c^{−1} V*( f_i(x), F(r) ) ) )
≥ ( 1 − (1 + c^{−1})^{−(k+1)} ) min_{i∈I} ( Q(x, r) + V*( f_i(x), F(r) ) )
= ( 1 − (1 + c^{−1})^{−(k+1)} ) V*(x, r), ∀x ∈ R^n, ∀r ∈ R^m,

where the first inequality uses the induction hypothesis, the equality in the third line is an algebraic regrouping, and the last inequality follows from dropping the second parenthesized term, which is non-negative per the assumed bound relating V*(.,.) and Q(.,.). Therefore, inequality (39) holds for all k.
On the other hand, by Lemmas 1 and 2 and Theorem 1, the non-decreasing feature of {V^{*,k}(x, r)}_{k=0}^{∞} and its convergence to V*(x, r) imply that each V^{*,k}(x, r) is upper bounded by V*(x, r) for any given x and r. Utilizing this upper bound and the lower bound given in (39), one has

0 ≤ V*(x, r) − V^{*,k}(x, r)   (40)
≤ (1 + c^{−1})^{−k} V*(x, r), ∀x ∈ Ω_x, ∀r ∈ Ω_r, ∀k ∈ N.   (41)

Replacing V*(x, r) on the right-hand side with V̄ := sup_{x∈Ω_x, r∈Ω_r} V*(x, r), which is a bounded constant per Assumption 1, the foregoing inequality leads to the uniform convergence of the sequence of finite-horizon value functions to the respective infinite-horizon optimal value function as the horizon extends to infinity, [45].

Theorem 2. If there exists a constant c such that V*(x, r) ≤ cQ(x, r), ∀x ∈ Ω_x, ∀r ∈ Ω_r, and the functions F(.), Q(.,.), and f_i(.), ∀i, are continuous with respect to their inputs, then the value function of the infinite-horizon optimal control problem, V*(.,.), is continuous with respect to both of its inputs.

Proof: As seen in Lemma 2 and Theorem 1, selecting, for example, ψ(.,.) = 0, the sequence of finite-horizon value functions {V^{*,j}(.,.)}_{j=0}^{∞} converges to the infinite-horizon value function V*(.,.). Moreover, Lemma 3 shows that the elements of the sequence of finite-horizon value functions are continuous with respect to both inputs. Since the convergence of the finite-horizon value functions is uniform (Lemma 4), continuity is preserved in the limit, i.e., the limit function, which is the infinite-horizon value function, is also continuous with respect to both inputs, [45].

V. IMPLEMENTATION OF THE PROPOSED SOLUTION

A. Offline Learning Process

For the implementation of the proposed method, one can use NNs as global function approximators. Selecting linear-in-weight NNs, the function is approximated within the compact sets Ω_x ⊂ R^n and Ω_r ⊂ R^m using

W^T φ(x_k, r_k) ≈ V*(x_k, r_k), ∀x_k ∈ Ω_x, ∀r_k ∈ Ω_r,   (42)

where the selected smooth basis functions are given by φ : R^n × R^m → R^l, with l being a positive integer denoting the number of neurons. The unknown weight vector W ∈ R^l is to be found using learning algorithms. Note that the inputs of the basis functions correspond to the dependency of the function subject to approximation on the current state and reference signal values. Once the NN structure is selected, the next step is developing the learning algorithm. Denoting the NN weight vector at the jth iteration by W^j, the function (W^j)^T φ(.,.) is supposed to approximate V^j(.,.). The learning starts by selecting an initial guess W^0. Afterwards, one needs to update the weights through Eq. (8). Rewriting Eq. (8) in terms of the NN leads to

(W^{j+1})^T φ(x, r) = Q(x, r) + min_{i∈I} (W^j)^T φ( f_i(x), F(r) ), ∀x ∈ Ω_x, ∀r ∈ Ω_r,   (43)

hence W^{j+1} is calculated based on W^j using Eq. (43) until the weights converge. This learning process can be conducted either in a batch or in a sequential form, as detailed in Algorithms 1 and 2, respectively.

Algorithm 1 - Batch Learning
Step 1: Randomly select p different x^[q] ∈ Ω_x and r^[q] ∈ Ω_r, q ∈ {1, 2, ..., p}, for p being a large positive integer, where Ω_x ⊂ R^n and Ω_r ⊂ R^m represent the domains of interest.
Step 2: Select an initial guess W^0 ∈ R^l, e.g., W^0 = 0.
Step 3: Set j = 0.
Step 4: Find W^{j+1} such that
(W^{j+1})^T φ(x^[q], r^[q]) = Q(x^[q], r^[q]) + min_{i∈I} (W^j)^T φ( f_i(x^[q]), F(r^[q]) ), ∀q ∈ {1, 2, ..., p}.   (44)
Step 5: If ||W^{j+1} − W^j|| ≤ β, where β is a small positive real number selected as the tolerance, then proceed to Step 6; otherwise, set j = j + 1 and go back to Step 4.
Step 6: Set W = W^{j+1} and stop the training.

Algorithm 2 - Sequential Learning
Step 1: Select an initial guess W^0 ∈ R^l, e.g., W^0 = 0.
Step 2: Set j = 0.
Step 3: Randomly select x ∈ Ω_x and r ∈ Ω_r, where Ω_x ⊂ R^n and Ω_r ⊂ R^m represent the domains of interest.
Step 4: Train weight W^{j+1} of neural network W^{(j+1)T} ϕ(·,·) using inputs x and r and target Q(x, r) + min_{i∈I} W^{jT} ϕ(f_i(x), F(r)).
Step 5: If ‖W^{j+1} − W^j‖ ≤ β for several consecutive runs of Steps 3 and 4, where β is a small positive real number selected as the tolerance, then proceed to Step 6. Otherwise, set j = j + 1 and go back to Step 3.
Step 6: Set W = W^{j+1} and stop the training.

If Algorithm 1 is selected, one can use the method of least squares for solving Eq. (44) in one shot and updating the weight matrix. Interested readers are referred to [39] for details on forming the least-squares problem. Another option for updating the weights in both algorithms is using gradient-descent-based training laws. It should be noted that at each iteration of the learning algorithm, only one set of weights is stored to be used in the next iteration. Also, the set I, among whose elements the minimization in Eq. (43) is carried out, has a constant number of elements. These points lead to a lower storage and computational load compared with the schemes proposed in [13], [14], where the number of weight matrices and the number of elements in the respective set in the minimization grow exponentially with the iteration index.
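As a concrete illustration of Algorithm 1, the batch least-squares value iteration can be sketched on a hypothetical scalar switching system. The two modes, the reference dynamics, the cost, the degree-2 basis, and all numerical choices below are illustrative assumptions, not the paper's benchmark:

```python
import numpy as np

# Hypothetical two-mode scalar plant and a decaying reference (assumptions).
modes = [lambda x: 0.9 * x, lambda x: 0.5 * x + 0.1]   # f_i(x), i in I
F = lambda r: 0.95 * r                                 # reference dynamics F(r)
Q = lambda x, r: (x - r) ** 2                          # stage cost Q(x, r)

def phi(x, r):
    # smooth basis functions phi(x, r): monomials up to total degree 2
    return np.array([1.0, x, r, x * x, x * r, r * r])

rng = np.random.default_rng(0)
p = 500                                    # Step 1: p random sample pairs
xs = rng.uniform(0.0, 1.0, p)
rs = rng.uniform(0.0, 1.0, p)
Phi = np.array([phi(x, r) for x, r in zip(xs, rs)])

W = np.zeros(6)                            # Step 2: initial guess W^0 = 0
for j in range(100):                       # Steps 3-5: value-iteration sweeps
    # target of Eq. (44): Q(x, r) + min_i W^j . phi(f_i(x), F(r))
    targets = np.array([Q(x, r) + min(W @ phi(f(x), F(r)) for f in modes)
                        for x, r in zip(xs, rs)])
    W_new = np.linalg.lstsq(Phi, targets, rcond=None)[0]  # one-shot least squares
    if np.linalg.norm(W_new - W) <= 1e-9:  # Step 5: tolerance check (beta)
        W = W_new
        break
    W = W_new
```

After the loop, W plays the role of the converged weight matrix handed to the online switching law; only the current W is stored between iterations, matching the storage remark above.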

Finally, before concluding this section, it should be noted that the selection of the linear-in-weight form for the NN, as done in (42), is not required for the theory developed in this study to be valid. One can utilize multi-layer perceptrons for improving the approximation capability of the NN. In this case, Eq. (42) changes to

N(W, x, r) ≈ V_∞(x, r), ∀x ∈ Ω_x, ∀r ∈ Ω_r, (45)

where function N : R^l × R^n × R^m → R denotes the NN mapping, with the first argument being the tunable weights of the NN with l elements, and the next two arguments being its inputs.

B. Online Control

Once the NN weight matrix is learned through Algorithm 1 or 2 in offline training, the resulting final weights can be used for online control/switching of the system. This is done in real time by feeding the current x and r to the following equation, which calculates i*(x, r) in a feedback form:

i*(x, r) = argmin_{i∈I} W^T ϕ(f_i(x), F(r)), ∀x ∈ Ω_x, ∀r ∈ Ω_r. (46)

Note that Eq. (46) is the same as Eq. (7), except that it is rephrased in terms of the NN approximation of the value function. Since I is a discrete set with a finite number of elements, the minimization in Eq. (46) can be carried out easily in real time. As a matter of fact, the computational burden is as low as evaluating M scalar-valued functions and selecting the i corresponding to the least value.

Finally, it should be noted that the NN produces an approximation of the optimal solution as long as x and r are within the domains using which the NN is trained. These domains need to be selected carefully to cover the entire operating envelope of the specific problem at hand. The validity of the results within those domains leads to an interesting characteristic of the proposed solution: it provides solutions for different initial conditions x_0 and r_0, as long as the resulting trajectory stays within the domains.
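The feedback law of Eq. (46) reduces online to M scalar evaluations and an argmin. A minimal sketch, reusing the same hypothetical basis and dynamics as before and a stand-in weight vector (all assumptions purely to make the example self-contained; in practice W comes from Algorithm 1 or 2):

```python
import numpy as np

def phi(x, r):
    # same degree-2 monomial basis as in the offline sketch (an assumption)
    return np.array([1.0, x, r, x * x, x * r, r * r])

modes = [lambda x: 0.9 * x, lambda x: 0.5 * x + 0.1]   # f_i(x), i in I
F = lambda r: 0.95 * r                                 # reference dynamics F(r)

# Stand-in "trained" weights encoding W.phi(x, r) = 2(x - r)^2, for
# illustration only; a real W is produced by the offline learning phase.
W = np.array([0.0, 0.0, 0.0, 2.0, -4.0, 2.0])

def i_star(x, r):
    # Eq. (46): evaluate W.phi at each mode's successor state and pick the
    # minimizer -- M scalar-valued evaluations, cheap enough for real time
    costs = [W @ phi(f(x), F(r)) for f in modes]
    return int(np.argmin(costs))
```

For these stand-in weights, `i_star(1.0, 1.0)` selects the first mode (its successor stays closest to the propagated reference), while `i_star(0.8, 0.2)` selects the second; the whole decision is a handful of dot products per time step.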
Therefore, no retraining is needed each time the initial conditions of the system, or of the reference signal, change, and the same trained NN can be used for optimal control/switching of the system.

VI. EXTENSION OF THE RESULTS TO REGULATION

The proposed method solves optimal regulation problems as well, i.e., minimizing

J = Σ_{k=0}^∞ Q(x_k), (47)

subject to dynamics (1), where Q : R^n → R_+. This is because regulation of the states is a particular case of tracking, in which the reference signal is zero. It can be seen that in regulation problems the value function is only a function of x, i.e., V_∞(x). Therefore, the NN given by

W^T ϕ(x_k) ≈ V_∞(x_k), ∀x_k ∈ Ω_x, (48)

is suitable for approximating the solution, and there is no need to feed the zero reference signal to the network. The rest of the process is the same as discussed for the tracking problem. As for the theoretical analyses, since regulation is a particular case of tracking, all the obtained results are valid for regulation as well.

VII. NUMERICAL ANALYSES

A nonlinear second-order system with three modes, simulated in [5] and [39], is selected. The source codes for the simulations are available at [49]. The objective of this problem is controlling the fluid level in a two-tank setup. The fluid flow into the upper tank can be adjusted through a valve which has three positions: fully open, half open, and fully closed. Each tank leaks fluid at a rate proportional to the square root of the height of the fluid in the respective tank. The upper tank leaks into the lower tank, and the lower tank leaks to the outside of the setup. Representing the fluid height in the upper tank with scalar y and in the lower tank with scalar z, the dynamics of the state vector x = [y, z]^T are given by the following three modes, corresponding to the three positions of the valve:

ẋ = f_1(x) := [1 − √y; √y − √z],  ẋ = f_2(x) := [0.5 − √y; √y − √z],  ẋ = f_3(x) := [−√y; √y − √z]. (49)
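The three tank modes just described can be simulated with forward Euler. In the sketch below, the inflow constants 1 (fully open), 0.5 (half open), and 0 (closed) are assumed values consistent with the valve description, and the 0.05 s step matches the sampling time used later for discretization; heights are clipped at zero so the square-root leak terms stay real:

```python
import math

DT = 0.05  # sampling time used for the forward Euler discretization

def step(x, i):
    """One Euler step of mode i: 1 = fully open, 2 = half open, 3 = closed."""
    y, z = x
    inflow = {1: 1.0, 2: 0.5, 3: 0.0}[i]          # assumed inflow rates
    sy = math.sqrt(max(y, 0.0))                    # leak rate ~ sqrt of height
    sz = math.sqrt(max(z, 0.0))
    return (max(y + DT * (inflow - sy), 0.0),      # upper tank height
            max(z + DT * (sy - sz), 0.0))          # lower tank height

# e.g. hold the valve fully open from empty tanks for 10 s (200 steps):
x = (0.0, 0.0)
for _ in range(200):
    x = step(x, 1)
```

Under mode 1 both heights rise monotonically toward the equilibrium y = z = 1 without overshooting it, which is consistent with the training domains covering [0, 1); modes 2 and 3 correspondingly settle lower or drain the tanks.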

The selected objective is forcing the fluid level in the lower tank, i.e., z, to track the reference signal r(t) ∈ R with the dynamics

ṙ(t) = −r^3(t). (50)

Since the problem is in continuous time, a sampling time of 0.05 s was used for discretizing the problem using forward Euler integration. Then, cost function (3) was selected for evaluating the performance of the method with Q(x, r) = 10(z − r)^2. The basis functions for this example were selected as polynomials y^{n_1} z^{n_2} r^{n_3}, where the non-negative integers n_i, i = 1, 2, 3, are such that 0 ≤ n_1 + n_2 + n_3 ≤ 4. This selection led to 75 neurons. Domains Ω_x = {[y, z]^T ∈ R^2 : 0 ≤ y, z < 1} and Ω_r = {r ∈ R : 0 ≤ r < 1} were used for the training.

The batch training scheme was conducted using least squares [39], such that p = 2000 random states were selected in implementing Algorithm 1. It was observed that the training converged after almost 60 iterations, as seen in Fig. 1, which shows the evolution of the weight elements during the training iterations. The training process took almost 60 seconds on a desktop computer with an Intel Core i7-3770, 3.40 GHz processor and 8 GB of memory, running Windows 7 and MATLAB 2013 (single threading).

Once the network was trained, initial conditions x_0 = [0, 1]^T and r_0 = 1 were used to simulate the problem. The results are given in Fig. 2. As seen in the figure, the method successfully controlled the fluid level of the lower tank to track the desired reference signal.

Next, the capability of the neurocontroller in handling different initial conditions within Ω_x is investigated. Note that if the initial state x_0 is within the selected Ω_x, the state trajectory stays within Ω_x regardless of the applied switching schedule, due to the dynamics of the system. A new initial condition, namely x_0 = [1, 1]^T, is utilized for the next simulation, and the trained network (without retraining) is used for controlling it. The results, given in Fig.
3, show the capability of the controller in handling different initial states without any need for retraining.

Finally, a new initial condition for the reference signal is selected, namely r_0 = 0.2. The dynamics of the reference signal are also such that whenever the initial condition is within Ω_r, the whole trajectory stays in Ω_r. Assuming the initial state x_0 = [0, 0]^T, the NN is used for tracking the new reference signal generated through the new r_0. The results, presented in Fig. 4, show that the controller has been successful in this scenario as well. In other words, the same trained NN can be used for tracking a family of reference signals which share the same dynamics but are generated using different initial conditions. It should be noted, however, that as seen in Eq. (8), the network is trained based on the assumed F(·). Therefore, even though the current reference signal value is fed to the network, it will only provide an approximate optimal tracking solution if the fed reference signal has the dynamics modeled by F(·), as given in (2). Otherwise, the results will not be reliable.

Fig. 1. Evolution of the NN weights during the training/learning process.

VIII. CONCLUSIONS

A value iteration based scheme was presented for infinite-horizon optimal tracking/regulation of nonlinear switching systems. The iterative nature of the solution, along with the need for using a function approximator for learning the input-output mapping, led to several fundamental questions, including the convergence of the scheme. The raised questions were addressed analytically and rigorous answers were obtained. After providing the training algorithms and the process for online control, the performance was evaluated on a benchmark nonlinear switching system. It was shown that the controller provides an approximate optimal solution for different initial conditions and different reference signals, as long as certain conditions hold.
The low real-time computational burden of the proposed method makes it attractive for implementation in embedded systems for different real-world problems.

Fig. 2. Simulation result for x_0 = [0, 1]^T and r_0 = 1.

Fig. 3. Simulation result for x_0 = [1, 1]^T and r_0 = 1.

Fig. 4. Simulation result for x_0 = [0, 0]^T and r_0 = 0.2.

REFERENCES

[1] X. Xu and P. J. Antsaklis, Optimal control of switched systems via non-linear optimization based on direct differentiations of value functions, International Journal of Control, vol. 75, no. 16-17, pp. 1406-1426, 2002.
[2] X. Xu and P. Antsaklis, Optimal control of switched systems based on parameterization of the switching instants, IEEE Transactions on Automatic Control, vol. 49, pp. 2-16, Jan. 2004.
[3] H. Axelsson, M. Boccadoro, M. Egerstedt, P. Valigi, and Y. Wardi, Optimal mode-switching for hybrid systems with varying initial states, Nonlinear Analysis: Hybrid Systems, vol. 2, no. 3, 2008.
[4] X. Ding, A. Schild, M. Egerstedt, and J. Lunze, Real-time optimal feedback control of switched autonomous systems, IFAC Proceedings Volumes (IFAC-PapersOnline), pp. 108-113, 2009.
[5] H. Axelsson, M. Egerstedt, Y. Wardi, and G. Vachtsevanos, Algorithm for switching-time optimization in hybrid dynamical systems, in Proceedings of the IEEE International Symposium on Intelligent Control, June 2005.
[6] Y. Wardi and M. Egerstedt, Algorithm for optimal mode scheduling in switched systems, in Proceedings of the American Control Conference, 2012.
[7] M. Kamgarpour and C. Tomlin, On optimal control of non-autonomous switched systems with a fixed mode sequence, Automatica, vol. 48, no. 6, pp. 1177-1181, 2012.
[8] M. Rungger and O. Stursberg, A numerical method for hybrid optimal control based on dynamic programming, Nonlinear Analysis: Hybrid Systems, vol. 5, no. 2, 2011.
[9] M. Sakly, A. Sakly, N. Majdoub, and M. Benrejeb, Optimization of switching instants for optimal control of linear switched systems based on genetic algorithms, in IFAC Proceedings Volumes (IFAC-PapersOnline), vol. 2.
[10] C.-H. Lien, K.-W. Yu, H.-C. Chang, L.-Y. Chung, and J.-D. Chen, Switching signal design for exponential stability of discrete switched systems with interval time-varying delay, Journal of the Franklin Institute, vol. 349, no. 6, 2012.
[11] S. Zhai and X.-S.
Yang, Exponential stability of time-delay feedback switched systems in the presence of asynchronous switching, Journal of the Franklin Institute, vol. 350, no. 1, 2013.
[12] A. Heydari and S. Balakrishnan, Optimal multi-therapeutic HIV treatment using a global optimal switching scheme, Applied Mathematics and Computation, vol. 219, no. 14, 2013.
[13] C. Qin, H. Zhang, Y. Luo, and B. Wang, Finite horizon optimal control of non-linear discrete-time switched systems using adaptive dynamic programming with epsilon-error bound, International Journal of Systems Science, 2013.
[14] W. Lu and S. Ferrari, An approximate dynamic programming approach for model-free control of switched systems, in Proceedings of the IEEE Conference on Decision and Control, 2013.
[15] M. Rinehart, M. Dahleh, D. Reed, and I. Kolmanovsky, Suboptimal control of switched systems with an application to the disc engine, IEEE Transactions on Control Systems Technology, vol. 16, no. 2, 2008.
[16] A. Heydari and S. N. Balakrishnan, Optimal orbit transfer with on-off actuators using a closed form optimal switching scheme, in AIAA Guidance, Navigation, and Control Conference, 2013.
[17] K. Benmansour, A. Benalia, M. Djemai, and J. de Leon, Hybrid control of a multicellular converter, Nonlinear Analysis: Hybrid Systems, vol. 1, no. 1, pp. 16-29, 2007.
[18] C. Liu and Z. Gong, Modelling and optimal control of a time-delayed switched system in fed-batch process, Journal of the Franklin Institute, vol. 351, no. 2, 2014.
[19] E. Hernandez-Vargas, P. Colaneri, R. Middleton, and F. Blanchini, Discrete-time control for switched positive systems with application to mitigating viral escape, International Journal of Robust and Nonlinear Control, 2011.
[20] J. Zhai, B. Shen, J. Gao, E. Feng, and H. Yin, Optimal control of switched systems and its parallel optimization algorithm, Journal of Computational and Applied Mathematics, vol. 261, 2014.
[21] B. Lincoln and A.
Rantzer, Relaxing dynamic programming, IEEE Transactions on Automatic Control, vol. 51, pp. 1249-1260, Aug. 2006.
[22] M. Rinehart, M. Dahleh, and I. Kolmanovsky, Value iteration for (switched) homogeneous systems, IEEE Transactions on Automatic Control, vol. 54, no. 6, 2009.
[23] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2nd ed., 2012.
[24] P. J. Werbos, Approximate dynamic programming for real-time control and neural modeling, in Handbook of Intelligent Control (D. A. White and D. A. Sofge, eds.), Multiscience Press, 1992.
[25] S. N. Balakrishnan and V. Biega, Adaptive-critic based neural networks for aircraft optimal control, Journal of Guidance, Control and Dynamics, vol. 19, 1996.
[26] D. Prokhorov and D. Wunsch, Adaptive critic designs, IEEE Transactions on Neural Networks, vol. 8, 1997.
[27] A. Al-Tamimi, F. Lewis, and M. Abu-Khalaf, Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 38, Aug. 2008.
[28] G. Venayagamoorthy, R. Harley, and D. Wunsch, Comparison of heuristic dynamic programming and dual heuristic programming adaptive critics for neurocontrol of a turbogenerator, IEEE Transactions on Neural Networks, vol. 13, May 2002.
[29] P. He and S. Jagannathan, Reinforcement learning-based output feedback control of nonlinear systems with input constraints, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 35, no. 1, 2005.
[30] H. Zhang, Q. Wei, and Y. Luo, A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 38, no. 4, 2008.
[31] T. Dierks, B. T. Thumati, and S. Jagannathan, Optimal control of unknown affine nonlinear discrete-time systems using offline-trained neural networks with proof of convergence, Neural Networks, vol.
22, no. 5-6, 2009.
[32] D. Wang, D. Liu, Q. Wei, D. Zhao, and N. Jin, Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming, Automatica, vol. 48, no. 8, 2012.

[33] F. Lewis, D. Vrabie, and K. Vamvoudakis, Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers, IEEE Control Systems, vol. 32, Dec. 2012.
[34] M. Fairbank, E. Alonso, and D. Prokhorov, An equivalence between adaptive dynamic programming with a critic and backpropagation through time, IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 12, 2013.
[35] X. Chen, Y. Gao, and R. Wang, Online selective kernel-based temporal difference learning, IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 12, 2013.
[36] A. Heydari and S. N. Balakrishnan, Fixed-final-time optimal control of nonlinear systems with terminal constraints, Neural Networks, vol. 48, pp. 61-71, 2013.
[37] Q. Zhao, H. Xu, and S. Jagannathan, Optimal control of uncertain quantized linear discrete-time systems, International Journal of Adaptive Control and Signal Processing, 2014.
[38] A. Heydari and S. Balakrishnan, Optimal switching and control of nonlinear switching systems using approximate dynamic programming, IEEE Transactions on Neural Networks and Learning Systems, vol. 25, 2014.
[39] A. Heydari and S. Balakrishnan, Optimal switching between autonomous subsystems, Journal of the Franklin Institute, vol. 351, 2014.
[40] A. Heydari and S. Balakrishnan, Optimal switching between controlled subsystems with free mode sequence, Neurocomputing, vol. 149, 2015.
[41] C. Qin, H. Zhang, and Y. Luo, Optimal tracking control of a class of nonlinear discrete-time switched systems using adaptive dynamic programming, Neural Computing and Applications, vol. 24, no. 3-4, 2014.
[42] A. Heydari, Revisiting approximate dynamic programming and its convergence, IEEE Transactions on Cybernetics, vol. 44, no. 12, 2014.
[43] A. Heydari and S. N. Balakrishnan, Finite-horizon control-constrained nonlinear optimal control using single network adaptive critics, IEEE Trans. Neural Netw. Learning Syst., vol.
24, no. 1, 2013.
[44] D. E. Kirk, Optimal Control Theory: An Introduction. Prentice-Hall, 1970.
[45] W. F. Trench, Introduction to Real Analysis. Available online.
[46] W. Rudin, Principles of Mathematical Analysis. McGraw-Hill, 3rd ed., 1976, pp. 55, 60.
[47] K. Hornik, M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators, Neural Networks, vol. 2, no. 5, pp. 359-366, 1989.
[48] H. Jeffreys and B. S. Jeffreys, Weierstrass's theorem on approximation by polynomials, in Methods of Mathematical Physics, Cambridge University Press, 3rd ed., 1988.
[49] Source codes for the simulations, available online.


More information

An homotopy method for exact tracking of nonlinear nonminimum phase systems: the example of the spherical inverted pendulum

An homotopy method for exact tracking of nonlinear nonminimum phase systems: the example of the spherical inverted pendulum 9 American Control Conference Hyatt Regency Riverfront, St. Louis, MO, USA June -, 9 FrA.5 An homotopy method for exact tracking of nonlinear nonminimum phase systems: the example of the spherical inverted

More information

Lyapunov Stability of Linear Predictor Feedback for Distributed Input Delays

Lyapunov Stability of Linear Predictor Feedback for Distributed Input Delays IEEE TRANSACTIONS ON AUTOMATIC CONTROL VOL. 56 NO. 3 MARCH 2011 655 Lyapunov Stability of Linear Predictor Feedback for Distributed Input Delays Nikolaos Bekiaris-Liberis Miroslav Krstic In this case system

More information

OPTIMAL CONTROL OF SWITCHING SURFACES IN HYBRID DYNAMIC SYSTEMS. Mauro Boccadoro Magnus Egerstedt,1 Yorai Wardi,1

OPTIMAL CONTROL OF SWITCHING SURFACES IN HYBRID DYNAMIC SYSTEMS. Mauro Boccadoro Magnus Egerstedt,1 Yorai Wardi,1 OPTIMAL CONTROL OF SWITCHING SURFACES IN HYBRID DYNAMIC SYSTEMS Mauro Boccadoro Magnus Egerstedt,1 Yorai Wardi,1 boccadoro@diei.unipg.it Dipartimento di Ingegneria Elettronica e dell Informazione Università

More information

Approximate optimal control for a class of nonlinear discrete-time systems with saturating actuators

Approximate optimal control for a class of nonlinear discrete-time systems with saturating actuators Available online at www.sciencedirect.com Progress in Natural Science 18 (28) 123 129 www.elsevier.com/locate/pnsc Approximate optimal control for a class of nonlinear discrete-time systems with saturating

More information

CHATTERING-FREE SMC WITH UNIDIRECTIONAL AUXILIARY SURFACES FOR NONLINEAR SYSTEM WITH STATE CONSTRAINTS. Jian Fu, Qing-Xian Wu and Ze-Hui Mao

CHATTERING-FREE SMC WITH UNIDIRECTIONAL AUXILIARY SURFACES FOR NONLINEAR SYSTEM WITH STATE CONSTRAINTS. Jian Fu, Qing-Xian Wu and Ze-Hui Mao International Journal of Innovative Computing, Information and Control ICIC International c 2013 ISSN 1349-4198 Volume 9, Number 12, December 2013 pp. 4793 4809 CHATTERING-FREE SMC WITH UNIDIRECTIONAL

More information

Distributed Receding Horizon Control of Cost Coupled Systems

Distributed Receding Horizon Control of Cost Coupled Systems Distributed Receding Horizon Control of Cost Coupled Systems William B. Dunbar Abstract This paper considers the problem of distributed control of dynamically decoupled systems that are subject to decoupled

More information

Noncausal Optimal Tracking of Linear Switched Systems

Noncausal Optimal Tracking of Linear Switched Systems Noncausal Optimal Tracking of Linear Switched Systems Gou Nakura Osaka University, Department of Engineering 2-1, Yamadaoka, Suita, Osaka, 565-0871, Japan nakura@watt.mech.eng.osaka-u.ac.jp Abstract. In

More information

L p Approximation of Sigma Pi Neural Networks

L p Approximation of Sigma Pi Neural Networks IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 11, NO. 6, NOVEMBER 2000 1485 L p Approximation of Sigma Pi Neural Networks Yue-hu Luo and Shi-yi Shen Abstract A feedforward Sigma Pi neural networks with a

More information

Packet-loss Dependent Controller Design for Networked Control Systems via Switched System Approach

Packet-loss Dependent Controller Design for Networked Control Systems via Switched System Approach Proceedings of the 47th IEEE Conference on Decision and Control Cancun, Mexico, Dec. 9-11, 8 WeC6.3 Packet-loss Dependent Controller Design for Networked Control Systems via Switched System Approach Junyan

More information

Distributed and Real-time Predictive Control

Distributed and Real-time Predictive Control Distributed and Real-time Predictive Control Melanie Zeilinger Christian Conte (ETH) Alexander Domahidi (ETH) Ye Pu (EPFL) Colin Jones (EPFL) Challenges in modern control systems Power system: - Frequency

More information

Elements of Reinforcement Learning

Elements of Reinforcement Learning Elements of Reinforcement Learning Policy: way learning algorithm behaves (mapping from state to action) Reward function: Mapping of state action pair to reward or cost Value function: long term reward,

More information

IN recent years, controller design for systems having complex

IN recent years, controller design for systems having complex 818 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART B: CYBERNETICS, VOL 29, NO 6, DECEMBER 1999 Adaptive Neural Network Control of Nonlinear Systems by State and Output Feedback S S Ge, Member,

More information

NEURAL NETWORKS (NNs) play an important role in

NEURAL NETWORKS (NNs) play an important role in 1630 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART B: CYBERNETICS, VOL 34, NO 4, AUGUST 2004 Adaptive Neural Network Control for a Class of MIMO Nonlinear Systems With Disturbances in Discrete-Time

More information

Global stabilization of feedforward systems with exponentially unstable Jacobian linearization

Global stabilization of feedforward systems with exponentially unstable Jacobian linearization Global stabilization of feedforward systems with exponentially unstable Jacobian linearization F Grognard, R Sepulchre, G Bastin Center for Systems Engineering and Applied Mechanics Université catholique

More information

The ϵ-capacity of a gain matrix and tolerable disturbances: Discrete-time perturbed linear systems

The ϵ-capacity of a gain matrix and tolerable disturbances: Discrete-time perturbed linear systems IOSR Journal of Mathematics (IOSR-JM) e-issn: 2278-5728, p-issn: 2319-765X. Volume 11, Issue 3 Ver. IV (May - Jun. 2015), PP 52-62 www.iosrjournals.org The ϵ-capacity of a gain matrix and tolerable disturbances:

More information

Stabilization of Discrete-Time Switched Linear Systems: A Control-Lyapunov Function Approach

Stabilization of Discrete-Time Switched Linear Systems: A Control-Lyapunov Function Approach Stabilization of Discrete-Time Switched Linear Systems: A Control-Lyapunov Function Approach Wei Zhang 1, Alessandro Abate 2 and Jianghai Hu 1 1 School of Electrical and Computer Engineering, Purdue University,

More information

Tube Model Predictive Control Using Homothety & Invariance

Tube Model Predictive Control Using Homothety & Invariance Tube Model Predictive Control Using Homothety & Invariance Saša V. Raković rakovic@control.ee.ethz.ch http://control.ee.ethz.ch/~srakovic Collaboration in parts with Mr. Mirko Fiacchini Automatic Control

More information

Robustness of the nonlinear PI control method to ignored actuator dynamics

Robustness of the nonlinear PI control method to ignored actuator dynamics arxiv:148.3229v1 [cs.sy] 14 Aug 214 Robustness of the nonlinear PI control method to ignored actuator dynamics Haris E. Psillakis Hellenic Electricity Network Operator S.A. psilakish@hotmail.com Abstract

More information

Navigation and Obstacle Avoidance via Backstepping for Mechanical Systems with Drift in the Closed Loop

Navigation and Obstacle Avoidance via Backstepping for Mechanical Systems with Drift in the Closed Loop Navigation and Obstacle Avoidance via Backstepping for Mechanical Systems with Drift in the Closed Loop Jan Maximilian Montenbruck, Mathias Bürger, Frank Allgöwer Abstract We study backstepping controllers

More information

Gain Scheduling Control with Multi-loop PID for 2-DOF Arm Robot Trajectory Control

Gain Scheduling Control with Multi-loop PID for 2-DOF Arm Robot Trajectory Control Gain Scheduling Control with Multi-loop PID for 2-DOF Arm Robot Trajectory Control Khaled M. Helal, 2 Mostafa R.A. Atia, 3 Mohamed I. Abu El-Sebah, 2 Mechanical Engineering Department ARAB ACADEMY FOR

More information

4. Multilayer Perceptrons

4. Multilayer Perceptrons 4. Multilayer Perceptrons This is a supervised error-correction learning algorithm. 1 4.1 Introduction A multilayer feedforward network consists of an input layer, one or more hidden layers, and an output

More information

Nonlinear Control Design for Linear Differential Inclusions via Convex Hull Quadratic Lyapunov Functions

Nonlinear Control Design for Linear Differential Inclusions via Convex Hull Quadratic Lyapunov Functions Nonlinear Control Design for Linear Differential Inclusions via Convex Hull Quadratic Lyapunov Functions Tingshu Hu Abstract This paper presents a nonlinear control design method for robust stabilization

More information

Multiple-mode switched observer-based unknown input estimation for a class of switched systems

Multiple-mode switched observer-based unknown input estimation for a class of switched systems Multiple-mode switched observer-based unknown input estimation for a class of switched systems Yantao Chen 1, Junqi Yang 1 *, Donglei Xie 1, Wei Zhang 2 1. College of Electrical Engineering and Automation,

More information

Event-based Stabilization of Nonlinear Time-Delay Systems

Event-based Stabilization of Nonlinear Time-Delay Systems Preprints of the 19th World Congress The International Federation of Automatic Control Event-based Stabilization of Nonlinear Time-Delay Systems Sylvain Durand Nicolas Marchand J. Fermi Guerrero-Castellanos

More information

Delay-dependent Stability Analysis for Markovian Jump Systems with Interval Time-varying-delays

Delay-dependent Stability Analysis for Markovian Jump Systems with Interval Time-varying-delays International Journal of Automation and Computing 7(2), May 2010, 224-229 DOI: 10.1007/s11633-010-0224-2 Delay-dependent Stability Analysis for Markovian Jump Systems with Interval Time-varying-delays

More information

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function

More information

Temporal Backpropagation for FIR Neural Networks

Temporal Backpropagation for FIR Neural Networks Temporal Backpropagation for FIR Neural Networks Eric A. Wan Stanford University Department of Electrical Engineering, Stanford, CA 94305-4055 Abstract The traditional feedforward neural network is a static

More information

Book review for Stability and Control of Dynamical Systems with Applications: A tribute to Anthony M. Michel

Book review for Stability and Control of Dynamical Systems with Applications: A tribute to Anthony M. Michel To appear in International Journal of Hybrid Systems c 2004 Nonpareil Publishers Book review for Stability and Control of Dynamical Systems with Applications: A tribute to Anthony M. Michel João Hespanha

More information

Theory in Model Predictive Control :" Constraint Satisfaction and Stability!

Theory in Model Predictive Control : Constraint Satisfaction and Stability! Theory in Model Predictive Control :" Constraint Satisfaction and Stability Colin Jones, Melanie Zeilinger Automatic Control Laboratory, EPFL Example: Cessna Citation Aircraft Linearized continuous-time

More information

A recursive algorithm based on the extended Kalman filter for the training of feedforward neural models. Isabelle Rivals and Léon Personnaz

A recursive algorithm based on the extended Kalman filter for the training of feedforward neural models. Isabelle Rivals and Léon Personnaz In Neurocomputing 2(-3): 279-294 (998). A recursive algorithm based on the extended Kalman filter for the training of feedforward neural models Isabelle Rivals and Léon Personnaz Laboratoire d'électronique,

More information

Indirect Model Reference Adaptive Control System Based on Dynamic Certainty Equivalence Principle and Recursive Identifier Scheme

Indirect Model Reference Adaptive Control System Based on Dynamic Certainty Equivalence Principle and Recursive Identifier Scheme Indirect Model Reference Adaptive Control System Based on Dynamic Certainty Equivalence Principle and Recursive Identifier Scheme Itamiya, K. *1, Sawada, M. 2 1 Dept. of Electrical and Electronic Eng.,

More information

AFAULT diagnosis procedure is typically divided into three

AFAULT diagnosis procedure is typically divided into three 576 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 47, NO. 4, APRIL 2002 A Robust Detection and Isolation Scheme for Abrupt and Incipient Faults in Nonlinear Systems Xiaodong Zhang, Marios M. Polycarpou,

More information

Backstepping Control of Linear Time-Varying Systems With Known and Unknown Parameters

Backstepping Control of Linear Time-Varying Systems With Known and Unknown Parameters 1908 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL 48, NO 11, NOVEMBER 2003 Backstepping Control of Linear Time-Varying Systems With Known and Unknown Parameters Youping Zhang, Member, IEEE, Barış Fidan,

More information

ADAPTIVE FILTER THEORY

ADAPTIVE FILTER THEORY ADAPTIVE FILTER THEORY Fourth Edition Simon Haykin Communications Research Laboratory McMaster University Hamilton, Ontario, Canada Front ice Hall PRENTICE HALL Upper Saddle River, New Jersey 07458 Preface

More information

Optimal Control of Switching Surfaces in Hybrid Dynamical Systems

Optimal Control of Switching Surfaces in Hybrid Dynamical Systems Optimal Control of Switching Surfaces in Hybrid Dynamical Systems M. Boccadoro, Y. Wardi, M. Egerstedt, and E. Verriest boccadoro@diei.unipg.it Dipartimento di Ingegneria Elettronica e dell Informazione

More information

1 The Observability Canonical Form

1 The Observability Canonical Form NONLINEAR OBSERVERS AND SEPARATION PRINCIPLE 1 The Observability Canonical Form In this Chapter we discuss the design of observers for nonlinear systems modelled by equations of the form ẋ = f(x, u) (1)

More information

Nonlinear Tracking Control of Underactuated Surface Vessel

Nonlinear Tracking Control of Underactuated Surface Vessel American Control Conference June -. Portland OR USA FrB. Nonlinear Tracking Control of Underactuated Surface Vessel Wenjie Dong and Yi Guo Abstract We consider in this paper the tracking control problem

More information

Direct Method for Training Feed-forward Neural Networks using Batch Extended Kalman Filter for Multi- Step-Ahead Predictions

Direct Method for Training Feed-forward Neural Networks using Batch Extended Kalman Filter for Multi- Step-Ahead Predictions Direct Method for Training Feed-forward Neural Networks using Batch Extended Kalman Filter for Multi- Step-Ahead Predictions Artem Chernodub, Institute of Mathematical Machines and Systems NASU, Neurotechnologies

More information

arxiv: v1 [cs.lg] 23 Oct 2017

arxiv: v1 [cs.lg] 23 Oct 2017 Accelerated Reinforcement Learning K. Lakshmanan Department of Computer Science and Engineering Indian Institute of Technology (BHU), Varanasi, India Email: lakshmanank.cse@itbhu.ac.in arxiv:1710.08070v1

More information

Neural Dynamic Optimization for Control Systems Part II: Theory

Neural Dynamic Optimization for Control Systems Part II: Theory 490 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART B: CYBERNETICS, VOL. 31, NO. 4, AUGUST 2001 Neural Dynamic Optimization for Control Systems Part II: Theory Chang-Yun Seong, Member, IEEE, and

More information

Global Stability and Asymptotic Gain Imply Input-to-State Stability for State-Dependent Switched Systems

Global Stability and Asymptotic Gain Imply Input-to-State Stability for State-Dependent Switched Systems 2018 IEEE Conference on Decision and Control (CDC) Miami Beach, FL, USA, Dec. 17-19, 2018 Global Stability and Asymptotic Gain Imply Input-to-State Stability for State-Dependent Switched Systems Shenyu

More information

I. MAIN NOTATION LIST

I. MAIN NOTATION LIST IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 19, NO. 5, MAY 2008 817 Robust Neural Network Tracking Controller Using Simultaneous Perturbation Stochastic Approximation Qing Song, Member, IEEE, James C. Spall,

More information

arxiv: v4 [math.oc] 5 Jan 2016

arxiv: v4 [math.oc] 5 Jan 2016 Restarted SGD: Beating SGD without Smoothness and/or Strong Convexity arxiv:151.03107v4 [math.oc] 5 Jan 016 Tianbao Yang, Qihang Lin Department of Computer Science Department of Management Sciences The

More information

A Robust Controller for Scalar Autonomous Optimal Control Problems

A Robust Controller for Scalar Autonomous Optimal Control Problems A Robust Controller for Scalar Autonomous Optimal Control Problems S. H. Lam 1 Department of Mechanical and Aerospace Engineering Princeton University, Princeton, NJ 08544 lam@princeton.edu Abstract Is

More information

Applications of Controlled Invariance to the l 1 Optimal Control Problem

Applications of Controlled Invariance to the l 1 Optimal Control Problem Applications of Controlled Invariance to the l 1 Optimal Control Problem Carlos E.T. Dórea and Jean-Claude Hennet LAAS-CNRS 7, Ave. du Colonel Roche, 31077 Toulouse Cédex 4, FRANCE Phone : (+33) 61 33

More information

On Design of Reduced-Order H Filters for Discrete-Time Systems from Incomplete Measurements

On Design of Reduced-Order H Filters for Discrete-Time Systems from Incomplete Measurements Proceedings of the 47th IEEE Conference on Decision and Control Cancun, Mexico, Dec. 9-11, 2008 On Design of Reduced-Order H Filters for Discrete-Time Systems from Incomplete Measurements Shaosheng Zhou

More information

THE nonholonomic systems, that is Lagrange systems

THE nonholonomic systems, that is Lagrange systems Finite-Time Control Design for Nonholonomic Mobile Robots Subject to Spatial Constraint Yanling Shang, Jiacai Huang, Hongsheng Li and Xiulan Wen Abstract This paper studies the problem of finite-time stabilizing

More information

Characterizing Uniformly Ultimately Bounded Switching Signals for Uncertain Switched Linear Systems

Characterizing Uniformly Ultimately Bounded Switching Signals for Uncertain Switched Linear Systems Proceedings of the 46th IEEE Conference on Decision and Control New Orleans, LA, USA, Dec. 12-14, 2007 Characterizing Uniformly Ultimately Bounded Switching Signals for Uncertain Switched Linear Systems

More information

An Active Set Strategy for Solving Optimization Problems with up to 200,000,000 Nonlinear Constraints

An Active Set Strategy for Solving Optimization Problems with up to 200,000,000 Nonlinear Constraints An Active Set Strategy for Solving Optimization Problems with up to 200,000,000 Nonlinear Constraints Klaus Schittkowski Department of Computer Science, University of Bayreuth 95440 Bayreuth, Germany e-mail:

More information

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)

More information

Optimization Methods for Machine Learning Decomposition methods for FFN

Optimization Methods for Machine Learning Decomposition methods for FFN Optimization Methods for Machine Learning Laura Palagi http://www.dis.uniroma1.it/ palagi Dipartimento di Ingegneria informatica automatica e gestionale A. Ruberti Sapienza Università di Roma Via Ariosto

More information

90 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 19, NO. 1, JANUARY /$ IEEE

90 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 19, NO. 1, JANUARY /$ IEEE 90 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 19, NO. 1, JANUARY 2008 Generalized Hamilton Jacobi Bellman Formulation -Based Neural Network Control of Affine Nonlinear Discrete-Time Systems Zheng Chen,

More information

Neural-network-observer-based optimal control for unknown nonlinear systems using adaptive dynamic programming

Neural-network-observer-based optimal control for unknown nonlinear systems using adaptive dynamic programming International Journal of Control, 013 Vol. 86, No. 9, 1554 1566, http://dx.doi.org/10.1080/0007179.013.79056 Neural-network-observer-based optimal control for unknown nonlinear systems using adaptive dynamic

More information

Hybrid particle swarm algorithm for solving nonlinear constraint. optimization problem [5].

Hybrid particle swarm algorithm for solving nonlinear constraint. optimization problem [5]. Hybrid particle swarm algorithm for solving nonlinear constraint optimization problems BINGQIN QIAO, XIAOMING CHANG Computers and Software College Taiyuan University of Technology Department of Economic

More information

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Alberto Bressan ) and Khai T. Nguyen ) *) Department of Mathematics, Penn State University **) Department of Mathematics,

More information

On Computing the Worst-case Performance of Lur'e Systems with Uncertain Time-invariant Delays

On Computing the Worst-case Performance of Lur'e Systems with Uncertain Time-invariant Delays Article On Computing the Worst-case Performance of Lur'e Systems with Uncertain Time-invariant Delays Thapana Nampradit and David Banjerdpongchai* Department of Electrical Engineering, Faculty of Engineering,

More information

ECE Introduction to Artificial Neural Network and Fuzzy Systems

ECE Introduction to Artificial Neural Network and Fuzzy Systems ECE 39 - Introduction to Artificial Neural Network and Fuzzy Systems Wavelet Neural Network control of two Continuous Stirred Tank Reactors in Series using MATLAB Tariq Ahamed Abstract. With the rapid

More information

Adaptive Control of a Class of Nonlinear Systems with Nonlinearly Parameterized Fuzzy Approximators

Adaptive Control of a Class of Nonlinear Systems with Nonlinearly Parameterized Fuzzy Approximators IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 9, NO. 2, APRIL 2001 315 Adaptive Control of a Class of Nonlinear Systems with Nonlinearly Parameterized Fuzzy Approximators Hugang Han, Chun-Yi Su, Yury Stepanenko

More information

An asymptotic ratio characterization of input-to-state stability

An asymptotic ratio characterization of input-to-state stability 1 An asymptotic ratio characterization of input-to-state stability Daniel Liberzon and Hyungbo Shim Abstract For continuous-time nonlinear systems with inputs, we introduce the notion of an asymptotic

More information