Stability Analysis of Optimal Adaptive Control under Value Iteration using a Stabilizing Initial Policy

Ali Heydari, Member, IEEE

Abstract—Adaptive optimal control using value iteration initiated from a stabilizing control policy is theoretically analyzed. The analysis concerns stability of the system during the learning stage, both when the system is controlled by any fixed control policy and when it is controlled by an evolving policy. A feature of the presented results is finding subsets of the region of attraction, selected so that if the initial condition belongs to such a subset, the entire state trajectory remains within the training region. Therefore, the function approximation results remain reliable, as no extrapolation will be conducted.

Index Terms—Value iteration; approximate/adaptive dynamic programming; adaptive optimal control; stabilizing value iteration.

(A. Heydari is an Assistant Professor of Mechanical Engineering with Southern Methodist University, Dallas, TX, aheydari@smu.edu. This material is based upon work supported by the National Science Foundation under Grant No. Initial results of this research were presented at the 2016 American Control Conference through Ref. [1].)

I. INTRODUCTION

Approximate/adaptive dynamic programming (ADP) as a framework for learning optimal control has received enormous attention in the last two decades, [2]–[4]. ADP-based learning algorithms are typically classified as policy iteration (PI) and value iteration (VI) algorithms, [3], [4]. These methods are used for machine learning [3] and also for feedback control of dynamical systems [2], [4]. The control policy during the learning stage remains stabilizing in PI, [5]; therefore, PI is naturally more attractive for online learning. However, the learning then needs to start with a stabilizing initial control policy. VI, on the other hand, can be initiated using an arbitrary policy, but the control policy during the learning stage (i.e., the immature control policy) may not stabilize the system. Stability under VI-based results after the conclusion of (a finite number of) iterations was investigated in [6], [7].

In this study it is proved that any immature control policy generated using VI will also stabilize the system if the iteration is started using an initial stabilizing control policy, as in PI. Afterwards, the important concern that learning-based results are valid only if the future states stay within the region on which the controller is tuned is addressed. It should be noted that, in the general case, it is not guaranteed that a state trajectory initiated from this region will remain inside it. If the trajectory exits the region, the trained controller becomes invalid, as the controller is not reliable for extrapolation. In this study, this problem is solved by obtaining a subset of the region of attraction (SROA) [8, Section 8.2] for the closed loop system. Once done, if the initial condition of the system is inside this subset, the entire trajectory is guaranteed to remain in the region over which the controller is valid. Then, it is discussed that the provided stability proof, which is not substantially different from [9], is based on operating the system using a fixed control policy. However, in online learning the control policy evolves, i.e., the policy changes versus time, and once a time-varying control policy is applied, the previous stability result is no longer applicable.
Therefore, another set of stability results for the evolving/changing control policy is developed, with an idea for establishing its respective SROA, as another contribution of this work. In Ref. [7], stability of VI-based algorithms with arbitrary initial guesses and after the training stage was investigated. The current study, however, investigates VI initiated using a stabilizing guess and during the training phase. Compared with [9] and also with existing results for PI, this study establishes SROAs and the closed loop stability under evolving policies. Another relatively similar result is Ref. [10]. A difference between this work and [10] is that no termination assumption is made here. Under the termination assumption, starting from a non-zero initial state, there exists a finite time after which the cost-to-go becomes zero. Finally, it may be mentioned that initial results of this research were presented in Ref. [1]. The main differences compared with that conference paper are listed next. a) The rigor of the analyses is improved. b) The proof of Lemma 1 is included. c) The result in Theorem 2 is extended to the case of applying each policy for more than one time step, while Ref. [1] required each policy to be applied for exactly one step. d) An idea for establishing an SROA for the case of applying evolving policies is presented. e) Numerical analyses are included in the current study.

As for the organization of the study, the problem formulation is given in Section II and the value iteration based solution is reviewed in Section III. The main results, i.e., the stability analyses, are presented in Section IV, followed by numerical simulations and conclusions in Sections V and VI, respectively.

II. PROBLEM FORMULATION

Let the discrete-time nonlinear dynamics

$x_{k+1} = f(x_k, u_k), \quad k \in \mathbb{N}, \qquad (1)$

be considered, where $f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$ is a continuous function in $x \in \mathbb{R}^n$, the state vector, and in $u \in \mathbb{R}^m$, the control vector, with $f(0, 0) = 0$. The sets of real numbers and non-negative integers are denoted with $\mathbb{R}$ and $\mathbb{N}$, respectively, and subscript $k$ denotes the discrete time index. Cost function

$J = \sum_{k=0}^{\infty} U(x_k, u_k) \qquad (2)$

is selected, where the utility function $U(x_k, u_k) := Q(x_k) + R(u_k)$ is assumed. Let continuous functions $Q : \mathbb{R}^n \to \mathbb{R}_+$ and $R : \mathbb{R}^m \to \mathbb{R}_+$ be positive semi-definite and positive definite, respectively. Set $\mathbb{R}_+$ denotes the non-negative reals. Let a control policy be given by $\pi : \mathbb{R}^n \to \mathbb{R}^m$ for feedback control calculation, i.e., $u_k = \pi(x_k)$. The objective is finding the optimal control policy, denoted with $\pi^*(\cdot)$, that is, the policy using which cost function (2) is minimized, subject to dynamics (1). In online learning, this process is done through selecting an initial control policy and updating it until it converges to the optimal control policy.

Definition 1. Let $\Omega \subset \mathbb{R}^n$ be a compact and connected set containing the origin as an interior point. Also, let $V_\pi : \mathbb{R}^n \to \mathbb{R}_+$ denote the value function of policy $\pi(\cdot)$, i.e.,

$V_\pi(x_0) = \sum_{k=0}^{\infty} U\big(x_k^\pi, \pi(x_k^\pi)\big), \qquad (3)$

where $x_k^\pi$ denotes the $k$th element of the state history started from $x_0$ and generated using control policy $\pi(\cdot)$. Then, control policy $\pi(\cdot)$ is called admissible in $\Omega$ if the following two conditions hold. 1) The policy is a continuous function in $\mathbb{R}^n$ satisfying $\pi(0) = 0$. 2) There exists a continuous positive definite function $W : \mathbb{R}^n \to \mathbb{R}_+$ such that $V_\pi(x) \le W(x), \forall x \in \Omega$.

The defined admissibility is slightly different from the typical definitions, as in [11]. While the continuity of the value function is a requirement for its uniform approximation [12] and also for using it as a candidate Lyapunov function, the milder condition of the value function being bounded by a continuous function is selected. It will be shown that this upper boundedness leads to the desired continuity, which in turn leads to the value function's boundedness on compact sets, [13, Theorem 4.15]. The following two assumptions apply to the results presented in the rest of this study.

Assumption 1. There exists an admissible policy in $\Omega$.

Assumption 2. The intersection of the set of $n$-vectors $x$ at which $U(x, 0) = 0$ with the invariant set of $f(\cdot, 0)$ only contains the origin, i.e., no solution of $x_{k+1} = f(x_k, 0)$ can remain in $\{x \in \mathbb{R}^n : U(x, 0) = 0\}$, other than $x_k = 0, \forall k$.

By Assumption 1, there is no state vector in $\Omega$ whose optimal value function, defined in the next section, is infinite. It may be mentioned that feedback linearization, [14], [15], is an example approach for finding the initial admissible policy. Another approach is using a control Lyapunov function (CLF), as done in [16]. Such a CLF guarantees 1) continuity of the resulting policy, 2) continuity of the upper bound of the value function, and 3) the feature of the upper bound vanishing at the origin, [16, Section III.B]. Finally, Assumption 2 guarantees that there is no set of states in which the state trajectory can hide without convergence to the origin. If, for example, $U(\cdot, \cdot)$ is positive definite, this assumption is trivially satisfied.

III. REVISITING VALUE ITERATION-BASED SOLUTION

The value function of control policy $\pi(\cdot)$ satisfies

$V_\pi(x) = U\big(x, \pi(x)\big) + V_\pi\big(f(x, \pi(x))\big), \quad \forall x \in \mathbb{R}^n, \qquad (4)$

per Eq. (3). The optimal value function may be defined as the value function of the optimal control policy. Denoting it with $V^*(\cdot)$, the Bellman equation [17], [10] provides the solution to the problem:

$\pi^*(x) \in \arg\min_{u \in \mathbb{R}^m} \Big( U(x, u) + V^*\big(f(x, u)\big) \Big), \qquad (5)$

$V^*(x) = \min_{u \in \mathbb{R}^m} \Big( U(x, u) + V^*\big(f(x, u)\big) \Big). \qquad (6)$

It is worth mentioning that the minimizing $u$ in (5) may not be unique. Motivated by [10], the notation $\in$ is used here to allow selecting any of the minimizers. Solving the Bellman equation is computationally intractable for general nonlinear systems (the curse of dimensionality, [17, p. 78]).
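To make the backup in (5) and (6) concrete, the following minimal sketch performs one Bellman backup by brute force over a finite control grid. All names here (f, Q, R, u_grid, and the toy system in the usage lines) are illustrative assumptions of the sketch, not quantities fixed by the paper.

import numpy as np

# Minimal sketch of one Bellman backup, Eqs. (5)-(6): given a current
# value-function estimate V, return a minimizing control and the backed-up
# value at state x. All callables are hypothetical placeholders.
def bellman_backup(x, V, f, Q, R, u_grid):
    costs = [Q(x) + R(u) + V(f(x, u)) for u in u_grid]
    j = int(np.argmin(costs))
    return u_grid[j], costs[j]

# Example usage on a toy scalar system x_{k+1} = 0.5 x + u:
if __name__ == "__main__":
    f = lambda x, u: 0.5 * x + u
    Q = lambda x: 0.25 * x**2        # positive semi-definite state cost
    R = lambda u: 0.5 * u**2         # positive definite control cost
    V = lambda x: x**2               # stand-in value-function estimate
    u_star, v_new = bellman_backup(1.0, V, f, Q, R, np.linspace(-2, 2, 401))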
The idea of approximating the optimal value function is pursued in ADP. This approximation is done using function approximators, e.g., neural networks (NNs), or look-up tables. The approximation/tuning is conducted over a connected and compact set with the origin as an interior point, namely, the region of interest, denoted by $\Omega$. This region needs to be selected based on the expected operation envelope of the system, i.e., the states expected to be visited during online control. If the states exit this region, the tuned controller becomes invalid.

Approximation of the optimal value function can be done using VI. Starting with a selected $V^0(\cdot)$, one iterates through the policy update equation

$\pi^i(x) \in \arg\min_{u \in \mathbb{R}^m} \Big( U(x, u) + V^i\big(f(x, u)\big) \Big), \quad \forall x \in \Omega, \qquad (7)$

and the value update equation

$V^{i+1}(x) = U\big(x, \pi^i(x)\big) + V^i\big(f(x, \pi^i(x))\big), \quad \forall x \in \Omega, \qquad (8)$

in VI. Equivalently, the iterations may be given by

$V^{i+1}(x) = \min_{u \in \mathbb{R}^m} \Big( U(x, u) + V^i\big(f(x, u)\big) \Big), \quad \forall x \in \Omega. \qquad (9)$

The iterations are done for $i = 0, 1, \ldots$ until they converge. If the iterations converge to the optimal value function, i.e., if $V^i(\cdot) \to V^*(\cdot)$ as $i \to \infty$, the resulting $V^*(\cdot)$ can be used in (5) for finding the (approximate) optimal policy.

IV. STABILITY ANALYSIS UNDER VALUE ITERATION

Let the initial $V^0(\cdot)$ be selected as the value function of an admissible control policy. For brevity, the resulting VI is called stabilizing VI, as defined next.

Definition 2. Stabilizing value iteration is defined as the value iteration algorithm (9) initiated by the value function of an admissible control policy.

Selecting the initial admissible policy $\pi(\cdot)$, its value function, $V_\pi(\cdot)$, can be obtained using (4). One way of solving (4) for $V_\pi(\cdot)$ is using the successive approximation given by

$V_\pi^{j+1}(x) = U\big(x, \pi(x)\big) + V_\pi^j\big(f(x, \pi(x))\big), \quad \forall x \in \Omega, \qquad (10)$

where the superscript on $V_\pi^j(\cdot)$ is the index of iteration. Starting with the initial guess of $V_\pi^0(x) = 0, \forall x$, the iterations converge to $V_\pi(\cdot)$, [1], [4]. Selecting $V^0(\cdot) = V_\pi(\cdot)$ as the initial guess in VI, the stability of the system using the $\pi^i(\cdot)$s can be established. Before that, some theoretical results are needed. Let $V(\cdot) \in C(\Omega)$ denote that function $V(\cdot)$ is continuous in $\Omega$.
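A minimal tabular sketch of stabilizing VI (Definition 2) follows: the admissible policy is first evaluated via the successive approximation (10) to produce V^0 = V_pi, after which the recursion (9) is iterated on a sampled state set. The quantized dictionary stands in for the paper's function approximator, and the crude treatment of states falling off the grid is an assumption of the sketch.

import numpy as np

# Sketch of stabilizing VI (Definition 2) in tabular form; everything here
# (grids, quantization step, defaults) is an illustrative assumption.
def quantize(x, h=0.1):
    return tuple(np.round(np.asarray(x) / h) * h)

def lookup(V, x, h=0.1):
    return V.get(quantize(x, h), 0.0)   # crude: unseen states read as 0

def evaluate_policy(pi, f, U, states, sweeps=200):
    V = {quantize(x): 0.0 for x in states}           # V_pi^0 = 0
    for _ in range(sweeps):                          # Eq. (10)
        V = {quantize(x): U(x, pi(x)) + lookup(V, f(x, pi(x)))
             for x in states}
    return V

def stabilizing_vi(pi, f, U, states, u_grid, iters=50):
    V = evaluate_policy(pi, f, U, states)            # V^0 = V_pi
    for _ in range(iters):                           # Eq. (9)
        V = {quantize(x): min(U(x, u) + lookup(V, f(x, u)) for u in u_grid)
             for x in states}
    return V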

3 Lemma. Let π( ) be an admissible control policy in Ω and Assumption 2 hold. Then, V( ) C(Ω). Proof : The proof is by contradiction. Assume that V π ( ) is discontinuous at some y Ω. Then ɛ >, δ >, Ω : () V π ( ) V π (y ) > ɛ while y < δ, where. denotes a vector norm,. represents absolute value, and : denotes such that. The idea is showing that () is not possible. To this end, initially it may be noted that at jth iteration of (), one has j Vπ(x j ) = Vπ( π j ) + U ( x π k, π(x π k) ). (2) Selecting V π( ) =, from (3) and (2) one has Therefore, V π ( ) = V j π( ) + V π (x π j ). (3) V π ( ) V π (y ) = V j π( )+V π (x π j ) V j π(y ) V π (y π j ), (4) which leads to V π ( ) V π (y ) V j π( ) V j π(y ) + V π (x π j ) + V π (y π j ), (5) by triangle inequality of absolute values. Inequality (5) is the key to the solution, as it will be shown that the right hand side of the inequality can be made arbitrarily small if is close enough to y and j is large enough. By the admissibility of π( ), one has V π (x) W (x), x, therefore, V π ( ) V π (y ) Vπ(x j ) Vπ(y j ) + W (x π j ) + W (yj π ). (6) Also, by admissibility of π( ), the sequence of partial sums in the right hand side of (3), i.e., { i U( x π k, π(xπ k )) } i= is upper bounded by a continuous function. Therefore, the partial sums are finite in a compact set, given finiteness of continuous functions in such sets, [3, Theorem 4.5]. Moreover, the sequence is non-decreasing, given the nonnegative summands. Hence, it converges, [3, Theorem 3.4]. Therefore, U ( x π k, π(xπ k )) as k, [3, Theorem 3.23]. This leads to x π j as j, by Assumption 2. By W ( ) C(Ω) and W () =, which follows from positivedefiniteness of W ( ), one has y Ω, ɛ >, j = j (y, ɛ) : Moreover, by W ( ) C(Ω) y π j Ω, ɛ >, δ = δ (y π j, ɛ) : Hence, j j W (y π j ) < ɛ/4. x π j y π j < δ W (x π j ) W (y π j ) < ɛ/4. (7) (8) x π j y π j < δ W (x π j ) < ɛ/4 + W (y π j ). (9) On the other hand, due to the continuity of the closed loop system f (, π( ) ) in R n, the state trajectory at each finite time, for example j, continuously depends on the initial conditions. This may be seen through noting that the state at any finite time j is the result of composition of continuous function f (, π( ) ) for j times. Composition of a finite number of continuous functions is a continuous function, [3, Theorem 4.7]. Therefore, the trajectory yk π, k =,,..., j changes continuously as y changes. Hence, y Ω, δ >, δ 2 = δ 2 (y, δ, j) : y < δ 2 x π j y π j < δ. (2) Moreover, Vπ( ) j C(Ω), for j <, as it is a finite sum of continuous functions, per (2), evaluated along a trajectory which is a continuous function of the argument of Vπ( ). j Therefore, y Ω, ɛ >, δ 3 = δ 3 (y, ɛ, j) : y < δ 3 Vπ(x j ) Vπ(y j (2) ) < ɛ/4. Enough inequalities are now found for contradicting (). For any point of discontinuity y and ɛ whose existence is guaranteed by (), let us find j = j (y, ɛ) which leads to W (y π j ) < ɛ/4, (22) per (7). Then, let δ = δ (y π j, ɛ). By (9) and (22), x π j y π j < δ W (x π j ) < ɛ/4 + W (y π j ) < ɛ/2. (23) Let us select δ 2 = δ 2 (y, δ, j ) to have y < δ 2 x π j y π j < δ, (24) per (2). Finally, set δ 3 = δ 3 (y, ɛ, j ) to have y < δ 3 Vπ j ( ) Vπ j (y ) < ɛ/4, (25) per (2). Let δ = min(δ 2, δ 3 ). Using (22), (23), (24), and (25) in (6) one has y < δ V π ( ) V π (y ) Vπ j ( ) Vπ j (y ) + W (x π j ) + W (yj π ) < ɛ, which contradicts (). Therefore, V π ( ) C(Ω). (26) Lemma 2. Let Assumption hold. 
The sequence of functions $\{V^j(x)\}_{j=0}^{\infty} := \{V^0(x), V^1(x), \ldots\}$ generated through stabilizing value iteration is pointwise non-increasing in $\Omega$.

Proof: This monotonicity is a well known feature of VI, [6], [10]. For the specific case here, in which the initial guess is a value function, it is established as follows. Considering (4), which provides $V^0(\cdot) = V_\pi(\cdot)$, and (9), which for $i = 0$ provides $V^1(\cdot)$, one has

$V^1(x) \le V^0(x), \quad \forall x \in \Omega, \qquad (27)$

because $V^1(\cdot)$ is the minimum of the right hand side of (9) for $i = 0$, while $V^0(\cdot)$ is based on using the selected $\pi(\cdot)$. Now, assume that

$V^i(x) \le V^{i-1}(x), \quad \forall x \in \Omega, \qquad (28)$

for some $i$. By (9) one has

$V^i(x) = \min_{u \in \mathbb{R}^m} \Big( U(x, u) + V^{i-1}\big(f(x, u)\big) \Big), \quad \forall x \in \Omega. \qquad (29)$

Comparing (29) with (9) and considering (28), one has

$V^{i+1}(x) \le V^i(x), \quad \forall x \in \Omega. \qquad (30)$

This completes the induction and proves the lemma.

The next step is showing that each $V^i(\cdot)$ is continuous in $\Omega$. While functions $f(\cdot, \cdot)$ and $U(\cdot, \cdot)$ are continuous with respect to their inputs, the presence of the $\arg\min$ operator in Eq. (7) may result in a discontinuous $\pi^i(\cdot)$, which may then lead to a discontinuous $V^{i+1}(\cdot)$ in Eq. (8). Therefore, this continuity is not obvious.
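As a quick numerical illustration of the pointwise non-increase claimed above, the toy scalar setup from the earlier sketches can be reused to verify that VI iterates initialized at a policy's value function never increase at sampled states. The policy, grids, and costs below are assumptions of this sketch.

import numpy as np

# Illustrative check of Lemma 2 on a toy scalar system; np.interp clamps
# outside the state grid (a crude but monotone treatment of escapes).
f = lambda x, u: 0.5 * x + u
U = lambda x, u: 0.25 * x**2 + 0.5 * u**2
pi = lambda x: -0.5 * x                        # an admissible-like policy
xs = np.linspace(-1.0, 1.0, 21)
us = np.linspace(-1.0, 1.0, 41)

V = np.zeros_like(xs)
for _ in range(200):                           # policy evaluation, Eq. (10)
    V = np.array([U(x, pi(x)) + np.interp(f(x, pi(x)), xs, V) for x in xs])

for i in range(10):                            # VI sweeps, Eq. (9)
    V_next = np.array([min(U(x, u) + np.interp(f(x, u), xs, V) for u in us)
                       for x in xs])
    assert np.all(V_next <= V + 1e-9), "monotonicity violated"
    V = V_next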

Lemma 3. Let $\pi(\cdot)$ be an initial admissible policy in $\Omega$ used for stabilizing value iteration. Then, $V^i(\cdot) \in C(\Omega), \forall i \in \mathbb{N}$.

Proof: It was shown in [7] that if $V^i(\cdot) \in C(\Omega)$, then $V^{i+1}(\cdot) \in C(\Omega)$. Given this result and the point that $V^0(\cdot) \in C(\Omega)$ (by Lemma 1), it follows by induction that $V^i(\cdot) \in C(\Omega)$ for any given (finite) $i$.

Remark 1. Unlike Lemma 1, where the continuity of $V^0(\cdot)$ was established using the continuity of the initial policy, the continuity of the rest of the value functions is established in Lemma 3 without assuming continuity of the $\pi^i(\cdot)$s.

Continuity of the value functions, established in Lemmas 1 and 3, is desired because it provides uniform approximation capability, [12], which is particularly suitable in generalization/interpolation, i.e., approximating the function at states not visited in the training stage. Moreover, this continuity provides the possibility of utilizing the value functions as candidate Lyapunov functions in establishing stability, as done in Theorem 1. Before that, the term SROA needs to be formally defined, motivated by a similar concept in continuous-time systems, [8, Section 8.2].

Definition 3. A subset of the region of attraction (SROA) for the closed loop system is a region in the state space such that any trajectory initiated inside this region is defined and converges to the origin as time goes to infinity.

Theorem 1. Let Assumptions 1 and 2 hold. For every fixed $i \in \mathbb{N}$, the control policy $\pi^i(\cdot)$ generated using stabilizing value iteration renders the origin an asymptotically stable point. Moreover, the compact set $\beta_r^i := \{x \in \mathbb{R}^n : V^i(x) \le r\}$, for any $r > 0$ for which $\beta_r^i \subset \Omega$, will be a subset of the region of attraction for the closed loop system.

Proof: The claim is proved by using $V^i(\cdot)$ as a candidate Lyapunov function for policy $\pi^i(\cdot)$. Function $V^0(\cdot)$ is continuous (by Lemma 1). It is also positive definite, by the positive semi-definiteness of $U(\cdot, \cdot)$ and Assumption 2, as there is no non-zero $x$ whose value function is zero. For any positive definite $V^i(\cdot)$, it follows from (8) that $V^{i+1}(\cdot)$ also is positive definite. The reason is that if $U(x, \pi^i(x)) = 0$ for some non-zero $x$, then $f(x, \pi^i(x)) \neq 0$ by Assumption 2, hence $V^i(f(x, \pi^i(x))) > 0$. Therefore, $V^{i+1}(\cdot)$ is positive definite, $\forall i \in \mathbb{N}$, by induction. Also, $V^i(\cdot) \in C(\Omega)$ by Lemma 3. Given Eq. (8), one has

$V^i\big(f(x, \pi^i(x))\big) - V^{i+1}(x) = -U\big(x, \pi^i(x)\big), \quad \forall x \in \Omega. \qquad (31)$

On the other hand, by Lemma 2, $V^{i+1}(x) \le V^i(x), \forall i, \forall x \in \Omega$. Therefore, replacing $V^{i+1}(x)$ in (31) with $V^i(x)$ leads to

$V^i\big(f(x, \pi^i(x))\big) - V^i(x) \le -U\big(x, \pi^i(x)\big), \quad \forall x \in \Omega. \qquad (32)$

Let $S := \{x \in \mathbb{R}^n : U(x, 0) = 0\}$. The right hand side of (32) vanishes only if $x \in S$. Since no non-zero state history can stay in $S$, by Assumption 2, the asymptotic stability of the origin under $\pi^i(\cdot)$ follows from (32), [18, Corollary 1.3].

Set $\beta_r^i$ is an SROA for the closed loop system because $V^i(x_{k+1}) \le V^i(x_k)$ by (32); hence, $x_k \in \beta_r^i$ leads to $x_{k+1} \in \beta_r^i, \forall k \in \mathbb{N}$. In other words, a state trajectory initiated within $\beta_r^i$ will stay inside the region and hence inside $\Omega$. Given this feature, along with the asymptotic stability result established in the previous paragraph, the state trajectory converges to the origin as $k \to \infty$. Therefore, $\beta_r^i$ will be an SROA. Finally, since $\beta_r^i \subset \Omega$, it is bounded. Also, $\beta_r^i$ is closed, as it is the inverse image of the closed set $[0, r]$ under a continuous mapping (Lemma 3), [13, p. 87]. Therefore, it is compact. The origin is an interior point of the set because $V^i(0) = 0$, $r > 0$, and $V^i(\cdot)$ is continuous in $\Omega$. This completes the proof.
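Numerically, a level $r$ as in Theorem 1 can be estimated along the following lines: for a box-shaped $\Omega$, any $r$ below the minimum of $V^i$ over the boundary of $\Omega$ yields a sublevel set that cannot touch that boundary, so the connected component containing the origin stays inside $\Omega$. The sketch below implements this heuristic by sampling; V_i, the box bounds, and the sample count are assumptions of the sketch rather than part of the paper's method.

import numpy as np

# Heuristic sketch for estimating the SROA level r of Theorem 1 on a box
# Omega = [lo, hi]: sample points, push each onto a random face of the box,
# and return the smallest V_i value found on the boundary.
def estimate_sroa_level(V_i, lo, hi, n_samples=4000, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    pts = rng.uniform(lo, hi, size=(n_samples, lo.size))
    for p in pts:                       # project onto a random face
        k = rng.integers(lo.size)
        p[k] = lo[k] if rng.random() < 0.5 else hi[k]
    return min(V_i(p) for p in pts)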
Comparing the result given by Theorem 1 with the existing literature, the closest one is [9], in which a novel VI algorithm, called θ-ADP, was introduced. The point that θ-ADP needs to be initiated from a function which acts similarly to a control Lyapunov function (CLF) of the respective system (in order for the control under iterations to remain stabilizing) corresponds to the required initial admissible guess for VI in this study. However, admitting a positive semi-definite utility function, as opposed to the positive definite one in that work, and, more importantly, establishing an SROA are the main differences of Theorem 1 compared with [9]. Finally, it is worth noting that, considering the analogy between value iteration and finite-horizon optimal control, [19], the stability result given by Theorem 1 resembles a method of stability proof in the receding horizon control (RHC) literature, [16]. In that study, a CLF is utilized as the terminal cost in the respective finite-horizon problems to maintain stability.

It was shown in Theorem 1 that each selected/fixed $\pi^i(\cdot)$ will steer the states toward the origin. But, in online learning, the policy will be subject to change. More specifically, if $\pi^i(\cdot)$ is applied at the current time, policy $\pi^{i+1}(\cdot)$ may be applied next. Even though Theorem 1 established asymptotic stability of the origin for the autonomous system $x_{k+1} = F(x_k) := f(x_k, \pi^i(x_k))$ for any selected $i$, it does not cover the non-autonomous system $x_{k+1} = F(x_k, k) := f(x_k, \pi^k(x_k))$. Hence, another stability analysis is required to show that the states under the evolving policies also converge to the origin. This is done next, for the general case of applying each policy $\pi^i(\cdot)$ for $M_i \in \mathbb{N}$ steps before switching to the next policy, i.e., $\pi^{i+1}(\cdot)$ (and applying it for $M_{i+1} \in \mathbb{N}$ steps).

Theorem 2. Let Assumptions 1 and 2 hold, and also let the sequence of control policies $\{\pi^i(\cdot)\}_{i=0}^{\infty}$ resulting from stabilizing value iteration be used for operating the system, such that each $\pi^i(\cdot)$ is applied for $M_i \in \mathbb{N}$ time steps. Then, every trajectory which stays in $\Omega$ will converge to the origin.

Proof: Let the state vector at time $k$, generated through the scenario of applying each $\pi^i(\cdot)$ for $M_i$ steps, be denoted with $x_k^+$, and let $x_0^+ = x_0$. Eq. (8) and the monotonicity of the value functions (Lemma 2) lead to

$V^1(x_0^+) = U\big(x_0^+, \pi^0(x_0^+)\big) + V^0\big(f(x_0^+, \pi^0(x_0^+))\big) \le V^0(x_0^+), \quad \forall x_0^+ \in \Omega. \qquad (33)$

Therefore,

$U\big(x_0^+, \pi^0(x_0^+)\big) + V^0(x_1^+) \le V^0(x_0^+), \quad \forall x_0^+ \in \Omega. \qquad (34)$

The idea is using (34) in itself $M_0$ times to get

$\sum_{k=0}^{M_0-1} U\big(x_k^+, \pi^0(x_k^+)\big) + V^0(x_{M_0}^+) \le V^0(x_0^+), \quad \forall x_0^+ \in \Omega. \qquad (35)$

This may be done by evaluating (34) at $x_1^+$ to get

$U\big(x_1^+, \pi^0(x_1^+)\big) + V^0(x_2^+) \le V^0(x_1^+), \quad \forall x_1^+ \in \Omega, \qquad (36)$

and replacing the $V^0(x_1^+)$ in (34) with the left hand side of (36), which is not greater than $V^0(x_1^+)$ per (36), to get

$U\big(x_0^+, \pi^0(x_0^+)\big) + U\big(x_1^+, \pi^0(x_1^+)\big) + V^0(x_2^+) \le V^0(x_0^+), \quad \forall x_0^+ \in \Omega. \qquad (37)$

Repeating this process $M_0 - 2$ more times gives (35). Similarly, using Eq. (8) and the monotonicity, one has

$V^2(x) = U\big(x, \pi^1(x)\big) + V^1\big(f(x, \pi^1(x))\big) \le V^1(x) \le V^0(x), \quad \forall x \in \Omega; \qquad (38)$

hence,

$U\big(x_{M_0}^+, \pi^1(x_{M_0}^+)\big) + V^1(x_{M_0+1}^+) \le V^1(x_{M_0}^+), \quad \forall x_{M_0}^+ \in \Omega, \qquad (39)$

which, once similarly repeated in itself $M_1$ times, leads to

$\sum_{k=0}^{M_1-1} U\big(x_{M_0+k}^+, \pi^1(x_{M_0+k}^+)\big) + V^1(x_{M_0+M_1}^+) \le V^1(x_{M_0}^+), \quad \forall x_{M_0}^+ \in \Omega. \qquad (40)$

The right hand side of (40) is not greater than $V^0(x_{M_0}^+)$, per the monotonicity of the value functions; hence, one can replace $V^0(x_{M_0}^+)$ in (35) with the left hand side of (40), leading to

$\sum_{k=0}^{M_0-1} U\big(x_k^+, \pi^0(x_k^+)\big) + \sum_{k=0}^{M_1-1} U\big(x_{M_0+k}^+, \pi^1(x_{M_0+k}^+)\big) + V^1(x_{M_0+M_1}^+) \le V^0(x_0^+), \quad \forall x_0^+ \in \Omega. \qquad (41)$

So far, two generations of control policies, namely $\pi^0(\cdot)$ and $\pi^1(\cdot)$, were applied and the foregoing inequality was obtained. Repeating this process $N - 2$ more times, the following inequality can be obtained, which handles applying $N$ generations of control policies, with not necessarily identical utilization periods $M_i$:

$\sum_{i=0}^{N-1} \sum_{k=0}^{M_i-1} U\Big(x^+_{\sum_{j=0}^{i-1} M_j + k}, \pi^i\big(x^+_{\sum_{j=0}^{i-1} M_j + k}\big)\Big) + V^N\big(x^+_{\sum_{j=0}^{N-1} M_j}\big) \le V^0(x_0^+), \quad \forall x_0^+ \in \Omega. \qquad (42)$

Since $V^N(x) \ge 0, \forall x$, the foregoing equation leads to

$\sum_{i=0}^{\infty} \sum_{k=0}^{M_i-1} U\Big(x^+_{\sum_{j=0}^{i-1} M_j + k}, \pi^i\big(x^+_{\sum_{j=0}^{i-1} M_j + k}\big)\Big) \le V^0(x_0^+), \quad \forall x_0^+ \in \Omega. \qquad (43)$

Therefore, the sequence of partial sums in the left hand side is upper bounded by the constant term $V^0(x_0^+)$ and, because of being non-decreasing, it converges as $N \to \infty$, [13, Theorem 3.14]. Therefore, $U(x_k^+, \pi^i(x_k^+)) \to 0$ as $k \to \infty$, [13, Theorem 3.23]. This leads to $x_k^+ \to 0$, by Assumption 2, if the state trajectory remains in $\Omega$. Finally, it may be noted that the summation in the left hand side of (43) is evaluated along the trajectory of interest, i.e., $x_k^+, k = 0, 1, \ldots$. This concludes the proof.

The left hand side of (43), as $N \to \infty$, is actually the cost-to-go or value function of applying the evolving policy. Therefore, it can be used as a candidate Lyapunov function (which is time-dependent, as the dynamics of the system under evolving policies are time-dependent). Denoting the cost-to-go at time $j$ with $V(x_j, j)$, where $j$ corresponds to an instant during the period of applying $\pi^N(\cdot)$ for any given $N \in \mathbb{N}$, one has

$V(x_j^+, j) = \sum_{k=0}^{l-1} U\big(x_{j+k}^+, \pi^N(x_{j+k}^+)\big) + \sum_{i=N+1}^{\infty} \sum_{k=0}^{M_i-1} U\Big(x^+_{\sum_{j'=0}^{i-1} M_{j'} + k}, \pi^i\big(x^+_{\sum_{j'=0}^{i-1} M_{j'} + k}\big)\Big), \quad \forall x_0^+ \in \Omega, \qquad (44)$

where $l$ is the remaining number of time steps for applying policy $\pi^N(\cdot)$, i.e., $l = \sum_{k=0}^{N} M_k - j$. Function $V(x_j, j)$ satisfies

$V(x_{j+1}^+, j+1) - V(x_j^+, j) = -U\big(x_j^+, \pi^N(x_j^+)\big). \qquad (45)$

Equality (45), along with the fact that its right hand side does not vanish along any non-zero trajectory (per Assumption 2), leads to the desired stability, [18, Theorem 1]. However, before making this conclusion, given the time-dependency of the Lyapunov function, one needs to show that it is lower and upper bounded by some time-independent positive-definite functions, [18, Theorem 1]. A lower bound is given by $U(x_j^+, 0) + U(x_{j+1}^+, 0)$, which is positive definite, i.e., does not vanish for any non-zero $x_j^+$, per Assumption 2. The upper boundedness is given by $V^0(x_j^+)$, per (43).
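The operating scenario of Theorem 2, in which each $\pi^i(\cdot)$ is applied for $M_i$ steps before switching to $\pi^{i+1}(\cdot)$, can be simulated with a short rollout routine such as the following sketch; the list of policies, the dwell times, and the dynamics are assumed to be available from the VI iterations.

# Sketch of the evolving-policy rollout of Theorem 2. `policies` is a list
# of callables (the VI-generated policies), M a list of dwell times, and f
# the dynamics; all are assumptions of this sketch.
def rollout_evolving(x0, f, policies, M, n_steps):
    x, traj = x0, [x0]
    i, dwell = 0, 0
    for _ in range(n_steps):
        idx = min(i, len(policies) - 1)      # hold the last policy at the end
        x = f(x, policies[idx](x))
        traj.append(x)
        dwell += 1
        if dwell >= M[min(i, len(M) - 1)]:   # switch after M_i steps
            i, dwell = i + 1, 0
    return traj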
Note that the Lyapunov function for time-varying systems is not required to be continuous; the continuity of 1) the difference given by (45), 2) the lower bound of the candidate Lyapunov function, and 3) the upper bound of the function suffices, [18, Theorem 1]. Continuity of these three functions follows from the continuity of the utility function $U(\cdot, \cdot)$ and that of $V^0(\cdot)$, given by Lemma 1.

As seen in the statement of Theorem 2, the stability is conditional on the state trajectory staying inside $\Omega$, but the theorem does not provide an SROA to guarantee this for the case of applying an evolving policy. An idea for establishing an SROA is given next. As discussed above, function (44) is a candidate Lyapunov function for the proof of stability of the evolving policy. Therefore, for establishing the SROA for the evolving policy, this candidate Lyapunov function can be used. Defining $\hat{\Omega}_j := \{x \in \mathbb{R}^n : V(x, j) \le r\}$, from $x_k \in \hat{\Omega}_k$ one has $x_{k+1} \in \hat{\Omega}_{k+1}$, $k = j, j+1, \ldots$, because of (45). The rest is similar to the last paragraph of the proof of Theorem 1.

Finally, the convergence of the stabilizing value iteration is established. While the convergence is not used for the stability results in this study, it is of interest for the implementation of stabilizing VI.

Lemma 4. Let Assumption 1 hold. The stabilizing value iteration, given by Eq. (9), converges to the optimal solution

in the selected compact and connected region $\Omega$ containing the origin as an interior point.

Proof: The sequence of value functions under the stabilizing VI is non-increasing (Lemma 2) and lower bounded (more specifically, non-negative, per the proof of Theorem 1). Therefore, it converges, [13, Theorem 3.14]. The limit function, i.e., the function to which the sequence of value functions converges, denoted with $V^\infty(\cdot)$, can be shown to be the same as $V^*(\cdot)$, either by resorting to the uniqueness of the solution to the Bellman equation, [20] (as both $V^\infty(\cdot)$ and $V^*(\cdot)$ satisfy it), or through the analogy between value iteration and finite-horizon optimal control problems detailed in [19].

V. NUMERICAL EXAMPLE

Some of the results presented in this study are numerically illustrated through an example. The Van der Pol oscillator, with continuous-time dynamics $\ddot{z} = (1 - z^2)\dot{z} - z + u$, is selected. The problem was taken into state space by defining $x = [X, Y]^T := [z, \dot{z}]^T$ and discretized with sampling time $\Delta t = 0.05$ s using Euler forward integration. Moreover, the cost function terms $Q(x) = 0.25\, x^T x$, $R(u) = 0.5\, u^2$, and $U(x_k, u_k) := Q(x_k) + R(u_k)$ were selected in (2). For implementation of the stabilizing VI, the initial admissible policy was selected as the (feedback linearization based) policy $\pi(x) = -(1 - X^2)Y - X - 5Y$. The function approximator was selected in a polynomial form made of elements of $x$ up to the fourth order. The region of interest was selected as $\Omega := [-1.5, 1.5] \times [-1.5, 1.5] \subset \mathbb{R}^2$. Two hundred random $x$s were selected from $\Omega$ in each evaluation of Eq. (9), and the least squares method was utilized for finding the parameters (coefficients of the polynomial terms). The minimizer in (7) can be found by setting the gradient of the term subject to minimization to zero, leading to

$u = -\tfrac{1}{2} R^{-1} g^T \nabla V^i\big(f(x, u)\big), \qquad (46)$

where $\nabla V^i(x) := (\partial V^i(x)/\partial x)^T$, $g := \Delta t\, [0, 1]^T$, and $R$ denotes the weight of the quadratic control penalty, i.e., $R(u) = u^T R u$. Given the point that the unknown $u$ exists on both sides of Eq. (46), the following successive approximation may be used for finding the unknown, [19]:

$u^{j+1} = -\tfrac{1}{2} R^{-1} g^T \nabla V^i\big(f(x, u^j)\big). \qquad (47)$

The learning iterations were observed to converge in 48 iterations, as shown in Fig. 1, where the histories of the parameters of the value function approximator are plotted. To evaluate the optimality of the converged parameters, given by Lemma 4, the optimal trajectory was numerically found for the selected initial condition of $x_0 = [0.5, 0]^T$ and compared with the VI-based result in Fig. 2. Given the similarity of the resulting trajectories, it is concluded that, at least for the selected initial state, the VI-based result is (near) optimal.

Fig. 1. History of weights/parameters of the value function during learning iterations.

Fig. 2. State trajectories for initial condition $x_0 = [0.5, 0]^T$ for 1) using $h^*(\cdot) = h^{48}(\cdot)$ generated using VI and 2) using the open loop numerical solution.

Selecting the iteration index of $i = 2$, calculation of the SROA, denoted with $\beta_r^i$ in Theorem 1, is the next step. Numerically, it was found that $r = 8.9$ is the greatest $r$ for which $\beta_r^2 \subset \Omega$. Given this value for $r$, region $\beta_r^2$ is plotted in Fig. 3. Also, different initial conditions were selected, and the respective state trajectories under the control policy $h^2(\cdot)$ are plotted in the same figure. It can be observed that the state trajectories did not leave the SROA and hence stayed in $\Omega$ and converged to the origin, as expected. These results confirm the ones given by Theorem 1.
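A sketch of the example's online control computation is given below: the Euler-discretized Van der Pol dynamics together with the fixed-point iteration (47). The gradient gradV of the current value function approximation, the iteration count, and the quadratic stand-in used in the usage line are assumptions of this sketch, not the paper's tuned polynomial.

import numpy as np

# Sketch of the example's control computation via Eq. (47); dt is the
# sampling time assumed here, and gradV is a caller-supplied gradient of
# the current value-function approximation.
dt = 0.05                              # sampling time (assumed)
g = dt * np.array([0.0, 1.0])          # input vector of the discretization
R_weight = 0.5                         # weight in R(u) = 0.5 u^2

def f(x, u):
    X, Y = x
    return np.array([X + dt * Y,
                     Y + dt * ((1.0 - X**2) * Y - X + u)])

def control(x, gradV, iters=20):
    # Successive approximation (47): u <- -(1/2) R^-1 g^T gradV(f(x, u))
    u = 0.0
    for _ in range(iters):
        u = -0.5 / R_weight * float(g @ gradV(f(x, u)))
    return u

# Example usage with a quadratic stand-in V(x) = x^T x, so gradV(x) = 2x:
if __name__ == "__main__":
    u0 = control(np.array([0.5, 0.0]), lambda x: 2.0 * x)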
To emphasize the importance of finding the SROA, the trajectory initiated from $x_0 = [1.45, 1.45]^T$ is also plotted in Fig. 3, where it is shown that, while $x_0 \in \Omega$, the trajectory has exited $\Omega$ at some time steps. This has led to some extrapolations by the function approximator, as the approximator was tuned only for $\Omega$. It may be noted that while the trajectory has returned to $\Omega$, this was not guaranteed. However, if $x_0 \in \beta_r^2$, the trajectory is guaranteed to remain inside $\Omega$.

Finally, the monotonicity of the sequence of value functions resulting from stabilizing VI, given by Lemma 2, is numerically illustrated. To this end, regions $\beta_r^0, \beta_r^1, \ldots, \beta_r^{48}$ for $r = 8.88$ are plotted in Fig. 4. The monotonicity of the value functions leads to $\beta_r^0 \subseteq \beta_r^1 \subseteq \cdots \subseteq \beta_r^{48}$, per the definition of these domains. This feature of the domains is observed to hold in Fig. 4. Also, an initial condition was used for control under the two cases of using the fixed policy $h^2(\cdot)$ and the evolving policy with $M_i = 4, \forall i$, and the results are shown in this figure. The idea is showing that the trajectories can be considerably different, and therefore, the analysis of one may not directly apply to the other. This difference can be seen through the trajectories in Fig. 4.

VI. CONCLUSIONS

Stability of the system under value iteration initiated using an admissible guess was established. Afterwards, the results were extended to the case of applying an evolving control policy. Finally, subsets of the region of attraction were established, such that if the initial condition is within such a subset, the entire trajectory stays inside the region over which the controller is tuned. This study, however, is mainly a theoretical result, as it does not include the effects of the approximation errors prevalent in practice. Future work is on the incorporation of these errors.

VII. ACKNOWLEDGMENT

The author is thankful for the constructive comments of the anonymous reviewers and the associate editor.

Fig. 3. State trajectories for different initial conditions generated using the fixed policy $h^2(\cdot)$ and the subset of the region of attraction with $r = 8.9$.

Fig. 4. State trajectories generated using the fixed policy $h^2(\cdot)$ and the evolving policy with $M_i = 4, \forall i$, with $x_0 = [.54, .24]^T$, and the regions $\beta_r^i$ with $r = 8.88$.

REFERENCES

[1] A. Heydari, Analysis of stabilizing value iteration for adaptive optimal control, in Proceedings of the American Control Conference, 2016.
[2] P. J. Werbos, Approximate dynamic programming for real-time control and neural modeling, in Handbook of Intelligent Control (D. A. White and D. A. Sofge, eds.), Multiscience Press, 1992.
[3] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 1998.
[4] F. Lewis, D. Vrabie, and K. Vamvoudakis, Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers, IEEE Control Systems, vol. 32, pp. 76–105, 2012.
[5] D. Liu and Q. Wei, Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems, IEEE Transactions on Neural Networks and Learning Systems, vol. 25, 2014.
[6] Q. Wei, D. Liu, and H. Lin, Value iteration adaptive dynamic programming for optimal control of discrete-time nonlinear systems, IEEE Transactions on Cybernetics, vol. 46, no. 3, 2016.
[7] A. Heydari, Theoretical and numerical analysis of approximate dynamic programming with approximation errors, Journal of Guidance, Control, and Dynamics, vol. 39, pp. 301–311, 2016.
[8] H. Khalil, Nonlinear Systems. Prentice-Hall, 2002.
[9] Q. Wei and D. Liu, A novel iterative θ-adaptive dynamic programming for discrete-time nonlinear systems, IEEE Transactions on Automation Science and Engineering, vol. 11, no. 4, pp. 1176–1190, 2014.
[10] D. P. Bertsekas, Value and policy iterations in optimal control and adaptive dynamic programming, IEEE Transactions on Neural Networks and Learning Systems, vol. 28, pp. 500–509, 2017.
[11] A. Al-Tamimi, F. Lewis, and M. Abu-Khalaf, Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 38, 2008.
[12] K. Hornik, M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators, Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.
[13] W. Rudin, Principles of Mathematical Analysis. McGraw-Hill, 3rd ed., 1976.
[14] B. Jakubczyk, Feedback linearization of discrete-time systems, Systems & Control Letters, vol. 9, no. 5, pp. 411–416, 1987.
[15] E. Aranda-Bricaire, Ü. Kotta, and C. Moog, Linearization of discrete-time systems, SIAM Journal on Control and Optimization, vol. 34, no. 6, 1996.
[16] A. Jadbabaie, J. Yu, and J. Hauser, Unconstrained receding-horizon control of nonlinear systems, IEEE Transactions on Automatic Control, vol. 46, no. 5, 2001.
[17] D. E. Kirk, Optimal Control Theory: An Introduction. Prentice-Hall, 1970.
[18] R. Kalman and J. Bertram, Control system analysis and design via the second method of Lyapunov, Trans. ASME, 1960.
[19] A. Heydari, Revisiting approximate dynamic programming and its convergence, IEEE Transactions on Cybernetics, vol. 44, no. 12, 2014.
[20] A. Heydari, Analyzing policy iteration in optimal control, in Proceedings of the American Control Conference.


More information

Lecture 4. Chapter 4: Lyapunov Stability. Eugenio Schuster. Mechanical Engineering and Mechanics Lehigh University.

Lecture 4. Chapter 4: Lyapunov Stability. Eugenio Schuster. Mechanical Engineering and Mechanics Lehigh University. Lecture 4 Chapter 4: Lyapunov Stability Eugenio Schuster schuster@lehigh.edu Mechanical Engineering and Mechanics Lehigh University Lecture 4 p. 1/86 Autonomous Systems Consider the autonomous system ẋ

More information

Lecture Note 7: Switching Stabilization via Control-Lyapunov Function

Lecture Note 7: Switching Stabilization via Control-Lyapunov Function ECE7850: Hybrid Systems:Theory and Applications Lecture Note 7: Switching Stabilization via Control-Lyapunov Function Wei Zhang Assistant Professor Department of Electrical and Computer Engineering Ohio

More information

Adaptive Dynamic Inversion Control of a Linear Scalar Plant with Constrained Control Inputs

Adaptive Dynamic Inversion Control of a Linear Scalar Plant with Constrained Control Inputs 5 American Control Conference June 8-, 5. Portland, OR, USA ThA. Adaptive Dynamic Inversion Control of a Linear Scalar Plant with Constrained Control Inputs Monish D. Tandale and John Valasek Abstract

More information

Observations on the Stability Properties of Cooperative Systems

Observations on the Stability Properties of Cooperative Systems 1 Observations on the Stability Properties of Cooperative Systems Oliver Mason and Mark Verwoerd Abstract We extend two fundamental properties of positive linear time-invariant (LTI) systems to homogeneous

More information

Pattern generation, topology, and non-holonomic systems

Pattern generation, topology, and non-holonomic systems Systems & Control Letters ( www.elsevier.com/locate/sysconle Pattern generation, topology, and non-holonomic systems Abdol-Reza Mansouri Division of Engineering and Applied Sciences, Harvard University,

More information

Passivity-based Stabilization of Non-Compact Sets

Passivity-based Stabilization of Non-Compact Sets Passivity-based Stabilization of Non-Compact Sets Mohamed I. El-Hawwary and Manfredi Maggiore Abstract We investigate the stabilization of closed sets for passive nonlinear systems which are contained

More information

Global stabilization of feedforward systems with exponentially unstable Jacobian linearization

Global stabilization of feedforward systems with exponentially unstable Jacobian linearization Global stabilization of feedforward systems with exponentially unstable Jacobian linearization F Grognard, R Sepulchre, G Bastin Center for Systems Engineering and Applied Mechanics Université catholique

More information

Adaptive and Robust Controls of Uncertain Systems With Nonlinear Parameterization

Adaptive and Robust Controls of Uncertain Systems With Nonlinear Parameterization IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 48, NO. 0, OCTOBER 003 87 Adaptive and Robust Controls of Uncertain Systems With Nonlinear Parameterization Zhihua Qu Abstract Two classes of partially known

More information

FINITE HORIZON ROBUST MODEL PREDICTIVE CONTROL USING LINEAR MATRIX INEQUALITIES. Danlei Chu, Tongwen Chen, Horacio J. Marquez

FINITE HORIZON ROBUST MODEL PREDICTIVE CONTROL USING LINEAR MATRIX INEQUALITIES. Danlei Chu, Tongwen Chen, Horacio J. Marquez FINITE HORIZON ROBUST MODEL PREDICTIVE CONTROL USING LINEAR MATRIX INEQUALITIES Danlei Chu Tongwen Chen Horacio J Marquez Department of Electrical and Computer Engineering University of Alberta Edmonton

More information

On the Stabilization of Neutrally Stable Linear Discrete Time Systems

On the Stabilization of Neutrally Stable Linear Discrete Time Systems TWCCC Texas Wisconsin California Control Consortium Technical report number 2017 01 On the Stabilization of Neutrally Stable Linear Discrete Time Systems Travis J. Arnold and James B. Rawlings Department

More information

arxiv: v2 [cs.sy] 29 Mar 2016

arxiv: v2 [cs.sy] 29 Mar 2016 Approximate Dynamic Programming: a Q-Function Approach Paul Beuchat, Angelos Georghiou and John Lygeros 1 ariv:1602.07273v2 [cs.sy] 29 Mar 2016 Abstract In this paper we study both the value function and

More information

Robotics. Control Theory. Marc Toussaint U Stuttgart

Robotics. Control Theory. Marc Toussaint U Stuttgart Robotics Control Theory Topics in control theory, optimal control, HJB equation, infinite horizon case, Linear-Quadratic optimal control, Riccati equations (differential, algebraic, discrete-time), controllability,

More information

A Complete Stability Analysis of Planar Discrete-Time Linear Systems Under Saturation

A Complete Stability Analysis of Planar Discrete-Time Linear Systems Under Saturation 710 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: FUNDAMENTAL THEORY AND APPLICATIONS, VOL 48, NO 6, JUNE 2001 A Complete Stability Analysis of Planar Discrete-Time Linear Systems Under Saturation Tingshu

More information

An asymptotic ratio characterization of input-to-state stability

An asymptotic ratio characterization of input-to-state stability 1 An asymptotic ratio characterization of input-to-state stability Daniel Liberzon and Hyungbo Shim Abstract For continuous-time nonlinear systems with inputs, we introduce the notion of an asymptotic

More information

Topic # /31 Feedback Control Systems. Analysis of Nonlinear Systems Lyapunov Stability Analysis

Topic # /31 Feedback Control Systems. Analysis of Nonlinear Systems Lyapunov Stability Analysis Topic # 16.30/31 Feedback Control Systems Analysis of Nonlinear Systems Lyapunov Stability Analysis Fall 010 16.30/31 Lyapunov Stability Analysis Very general method to prove (or disprove) stability of

More information

ESC794: Special Topics: Model Predictive Control

ESC794: Special Topics: Model Predictive Control ESC794: Special Topics: Model Predictive Control Nonlinear MPC Analysis : Part 1 Reference: Nonlinear Model Predictive Control (Ch.3), Grüne and Pannek Hanz Richter, Professor Mechanical Engineering Department

More information

LYAPUNOV theory plays a major role in stability analysis.

LYAPUNOV theory plays a major role in stability analysis. 1090 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 49, NO. 7, JULY 2004 Satisficing: A New Approach to Constructive Nonlinear Control J. Willard Curtis, Member, IEEE, and Randal W. Beard, Senior Member,

More information

1030 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, MAY 2011

1030 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, MAY 2011 1030 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL 56, NO 5, MAY 2011 L L 2 Low-Gain Feedback: Their Properties, Characterizations Applications in Constrained Control Bin Zhou, Member, IEEE, Zongli Lin,

More information

VISCOSITY SOLUTIONS. We follow Han and Lin, Elliptic Partial Differential Equations, 5.

VISCOSITY SOLUTIONS. We follow Han and Lin, Elliptic Partial Differential Equations, 5. VISCOSITY SOLUTIONS PETER HINTZ We follow Han and Lin, Elliptic Partial Differential Equations, 5. 1. Motivation Throughout, we will assume that Ω R n is a bounded and connected domain and that a ij C(Ω)

More information

Speed Profile Optimization for Optimal Path Tracking

Speed Profile Optimization for Optimal Path Tracking Speed Profile Optimization for Optimal Path Tracking Yiming Zhao and Panagiotis Tsiotras Abstract In this paper, we study the problem of minimumtime, and minimum-energy speed profile optimization along

More information

An Adaptive Clustering Method for Model-free Reinforcement Learning

An Adaptive Clustering Method for Model-free Reinforcement Learning An Adaptive Clustering Method for Model-free Reinforcement Learning Andreas Matt and Georg Regensburger Institute of Mathematics University of Innsbruck, Austria {andreas.matt, georg.regensburger}@uibk.ac.at

More information

EN Nonlinear Control and Planning in Robotics Lecture 3: Stability February 4, 2015

EN Nonlinear Control and Planning in Robotics Lecture 3: Stability February 4, 2015 EN530.678 Nonlinear Control and Planning in Robotics Lecture 3: Stability February 4, 2015 Prof: Marin Kobilarov 0.1 Model prerequisites Consider ẋ = f(t, x). We will make the following basic assumptions

More information

Optimization-based Modeling and Analysis Techniques for Safety-Critical Software Verification

Optimization-based Modeling and Analysis Techniques for Safety-Critical Software Verification Optimization-based Modeling and Analysis Techniques for Safety-Critical Software Verification Mardavij Roozbehani Eric Feron Laboratory for Information and Decision Systems Department of Aeronautics and

More information

Optimal Control. McGill COMP 765 Oct 3 rd, 2017

Optimal Control. McGill COMP 765 Oct 3 rd, 2017 Optimal Control McGill COMP 765 Oct 3 rd, 2017 Classical Control Quiz Question 1: Can a PID controller be used to balance an inverted pendulum: A) That starts upright? B) That must be swung-up (perhaps

More information

Chapter III. Stability of Linear Systems

Chapter III. Stability of Linear Systems 1 Chapter III Stability of Linear Systems 1. Stability and state transition matrix 2. Time-varying (non-autonomous) systems 3. Time-invariant systems 1 STABILITY AND STATE TRANSITION MATRIX 2 In this chapter,

More information

Economic MPC using a Cyclic Horizon with Application to Networked Control Systems

Economic MPC using a Cyclic Horizon with Application to Networked Control Systems Economic MPC using a Cyclic Horizon with Application to Networked Control Systems Stefan Wildhagen 1, Matthias A. Müller 1, and Frank Allgöwer 1 arxiv:1902.08132v1 [cs.sy] 21 Feb 2019 1 Institute for Systems

More information

Feedback stabilisation with positive control of dissipative compartmental systems

Feedback stabilisation with positive control of dissipative compartmental systems Feedback stabilisation with positive control of dissipative compartmental systems G. Bastin and A. Provost Centre for Systems Engineering and Applied Mechanics (CESAME Université Catholique de Louvain

More information

Stabilization of Discrete-Time Switched Linear Systems: A Control-Lyapunov Function Approach

Stabilization of Discrete-Time Switched Linear Systems: A Control-Lyapunov Function Approach Stabilization of Discrete-Time Switched Linear Systems: A Control-Lyapunov Function Approach Wei Zhang 1, Alessandro Abate 2 and Jianghai Hu 1 1 School of Electrical and Computer Engineering, Purdue University,

More information

Navigation and Obstacle Avoidance via Backstepping for Mechanical Systems with Drift in the Closed Loop

Navigation and Obstacle Avoidance via Backstepping for Mechanical Systems with Drift in the Closed Loop Navigation and Obstacle Avoidance via Backstepping for Mechanical Systems with Drift in the Closed Loop Jan Maximilian Montenbruck, Mathias Bürger, Frank Allgöwer Abstract We study backstepping controllers

More information

STABILITY OF PLANAR NONLINEAR SWITCHED SYSTEMS

STABILITY OF PLANAR NONLINEAR SWITCHED SYSTEMS LABORATOIRE INORMATIQUE, SINAUX ET SYSTÈMES DE SOPHIA ANTIPOLIS UMR 6070 STABILITY O PLANAR NONLINEAR SWITCHED SYSTEMS Ugo Boscain, régoire Charlot Projet TOpModel Rapport de recherche ISRN I3S/RR 2004-07

More information