
CONTROLLER SYNTHESIS FOR SWITCHED SYSTEMS USING APPROXIMATE DYNAMIC PROGRAMMING

A Dissertation
Submitted to the Faculty
of
Purdue University
by
Wei Zhang

In Partial Fulfillment of the
Requirements for the Degree
of
Doctor of Philosophy

December 2009
Purdue University
West Lafayette, Indiana

ACKNOWLEDGMENTS

This dissertation concludes my four-and-a-half-year PhD study at Purdue, which is perhaps the hardest but most rewarding experience of my life. For this, I must thank all the wonderful people who have made it possible. First of all, I want to express my deepest gratitude to my PhD advisor, Prof. Jianghai Hu, for his guidance, patience, and for giving me the freedom and the encouragement to pursue the research that led to this dissertation. He has constantly maintained the highest level of accessibility for research discussions and has guided me through most of the challenges of my research. I am truly amazed by the breadth and depth of his knowledge, by his vision of scientific problems, and by his impeccable attitude towards research. I am also greatly indebted to him for his generous support of my numerous conference travels and my academic visits to Stanford and UC San Diego, which have considerably broadened my knowledge of control theory and applications. Without any doubt, he is, in every aspect, a great mentor and a role model to me. I would also like to thank Prof. Raymond DeCarlo, Inseok Hwang, and Martin Corless for serving on my PhD committee. Prof. DeCarlo has constantly inspired me with his unique view on linear systems and switched systems. His principles of projection, factorization, and decomposition will always stay in my mind. I am also grateful for his insight that taking a high-level math course is valuable not for its direct usefulness in one's research but for the training it provides in abstract thinking. This gave me great momentum to learn extremely useful mathematics that I would otherwise not have been interested in. The main results on the switched LQR problem presented in this thesis were developed when I took the course Applied Optimal Control and Estimation with Prof. Hwang. His interest in this problem encouraged me to delve deeply into it, which eventually led to the rest of the

thesis. I also appreciate Prof. Corless's consent to serve on my PhD committee at short notice, and I have great personal respect for his expertise in nonlinear systems. I am deeply indebted to Prof. Jose E. Figueroa-Lopez in the Statistics Department at Purdue, who taught me many profound concepts in stochastic systems and sparked my great interest in mathematical finance. My sincere gratitude also goes to Prof. Alessandro Abate at Delft University of Technology, who introduced me to the stochastic reachability problem during my visit to Stanford and collaborated with me on the quadratic regulation problem and the stabilization problem for switched linear systems. Many parts of this thesis benefit from our countless discussions at Stanford and after I came back to Purdue. I also want to thank Michael Vitus and Prof. Claire Tomlin at UC Berkeley for coming up with the wonderful idea for the sensor scheduling problem that greatly motivated me to study the general approximate dynamic programming problem. I am also excited about our upcoming collaborations. It is such a blessing to be surrounded by so many brilliant and nice friends at Purdue. I am particularly grateful to Wenqi Shen and Yang Liu for being a wonderful part of my life for the past few years. Thanks also go to Jen-Yeu Chen, Muna Albatesman, Jianming Lian, Karan Kalsi, Maria Vlachopoulou, Jianing Wei, Zhou Yu, Jun Cai, Ziqing Mao, Ling Tong, Tiancheng Li, Rui Pan, Ziqiang Lin, and many more, for making my otherwise boring life at Purdue colorful and enjoyable. Finally, I want to thank my wife, Rong Zhang, and my parents, Shouguang Zhang and Yuguang Shang, for their support, patience, and unconditional love. This dissertation is dedicated to them.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
SYMBOLS
ABBREVIATIONS
ABSTRACT

1. INTRODUCTION
   1.1 Backgrounds and Motivations
   1.2 Preview of Main Results

2. UNDISCOUNTED APPROXIMATE DYNAMIC PROGRAMMING
   2.1 Problem Formulations
   2.2 Value Iteration Operator
   2.3 Main Assumption
   2.4 Exponential Convergence of Value Iteration
      2.4.1 Exponential Stability of the Optimal Trajectory
      2.4.2 Convergence of Value Iteration
   2.5 Approximation in Dynamic Programming
   2.6 Performance of Relaxed Policy in Finite Horizon
   2.7 Performance of Relaxed Policy in Infinite Horizon

3. EXPONENTIAL STABILIZATION USING APPROXIMATE DYNAMIC PROGRAMMING
   3.1 Stabilization Problems
   3.2 Solution via Approximate Dynamic Programming

4. SWITCHED LQR PROBLEMS IN DISCRETE TIME
   4.1 Backgrounds and Our Contribution
   4.2 Problem Formulation
   4.3 Value Function and Optimal Policy
   4.4 Efficient Exact Solution in Finite Horizon
      4.4.1 Algebraic Redundancy and Equivalent Subsets
      4.4.2 Computation of (Minimum) Equivalent Subsets
      4.4.3 Overall Algorithm for Finite Horizon
      4.4.4 Numerical Examples
   4.5 Suboptimal Solution in Finite Horizon
      4.5.1 Relaxed Switched Riccati Mapping
      4.5.2 Suboptimal Policy in Finite Horizon
      4.5.3 Example Revisited
   4.6 Suboptimal Solution for Infinite Horizon
      4.6.1 Stabilizing Condition With Efficient Test
      4.6.2 Performance Bound for π_{ε,k}
      4.6.3 Overall Algorithm
      4.6.4 Numerical Example

5. STABILIZATION OF DISCRETE-TIME SWITCHED LINEAR SYSTEMS
   5.1 Backgrounds
   5.2 Problem Statement
   5.3 Stabilization Using the DSLQR Controller
      5.3.1 The Stationary Stabilizing Policy
      5.3.2 Properties of ξ_k^ε
   5.4 Relationships with Other Controllers
   5.5 Numerical Examples
      5.5.1 Example 1
      5.5.2 Example 2
      5.5.3 Example 3

6. CONCLUSION
   6.1 Summary of main results
   6.2 Future work

LIST OF REFERENCES

A. PROOF OF THEOREM

VITA

LIST OF TABLES

5.1 Closed-loop trajectory and controls driven by π_{1,6} with x_0 = [0, 1]^T
5.2 Closed-loop trajectory and controls driven by π_{0.1,5} with x_0 = [0, 1]^T
5.3 Closed-loop trajectory and controls driven by π_{1,5} with x_0 = [0, 1]^T for Example
5.4 Matrices in H_7^1 for Example
5.5 Closed-loop trajectory and controls driven by π_{1,7} with x_0 = [1, 0, 1, 1]^T for Example

LIST OF FIGURES

4.1 Representations of the value iterations in G^+ and in 2^A
- Typical optimal decision regions of a two-switched system, where mode 1 is optimal within the white region and mode 2 is optimal within the gray region. The optimal mode region is divided into smaller homogeneous regions, each of which corresponds to a different optimal-feedback gain
- Complexity of Algorithm 3 for Example
- Representations of the relaxed value iteration in G^+ and in 2^A
- Complexity comparison between Algorithm 3 and Algorithm 4 with ε =
- Simulation results for Example 1. Top figure: phase-plane trajectories generated by π_{1,6} and π_{0.1,5} starting from the same initial condition x_0 = [0, 1]^T. Bottom figure: the corresponding continuous controls
- Decision regions of Example
- Simulation results for Example 3 with two different initial conditions: x_0^(1) = [1, 1, 0, 1]^T and x_0^(2) = [1, 0, 1, 1]^T. (a) norms of the closed-loop trajectories associated with the two initial conditions; (b) the corresponding continuous control sequences; (c) the corresponding mode sequences

SYMBOLS

Z_+ (Z_++): set of nonnegative (positive) integers
R_+ (R_++): set of nonnegative (positive) real numbers
n, p, M: constant positive integers
R^n, R^p: n-dimensional and p-dimensional Euclidean spaces
M: M := {1, ..., M} is the set of subsystem indices (also called the discrete control space)
X ⊆ R^n: continuous state space
U ⊆ R^p: continuous control space
z: a generic initial value for the continuous state
Γ: for each z ∈ X, Γ(z) ⊆ U × M is the set of feasible hybrid-control actions when the current continuous state is located at z
L: L : X × U × M → R_+ is the running cost function satisfying L(0, 0, v) = 0 for all v ∈ M
ψ: ψ : X → R_+ is the terminal cost function
J(z, π_N): total cost of a policy π_N with initial state z
V_N: N-horizon value function
V_∞: infinite-horizon value function
‖·‖: in Chapters 2 and 3, a short-hand notation for ‖·‖_p^r, where p ∈ Z_++, r ∈ R, and ‖·‖_p denotes the L_p norm in R^n; in Chapters 4 and 5, it denotes specifically the Euclidean norm (i.e., the 2-norm)
G^+: space of nonnegative functions, i.e., G^+ = {g : X → R_+ ∪ {∞}}
ε: relaxation parameter
T: value iteration operator defined in Definition 2.2.1
E: error function defined in Definition 2.5.1
R_ε: relaxation operator defined in Definition 2.5.2
V_k^ε: k-horizon approximate value function defined by (2.11)
Ṽ_k^ε: T[V_k^ε]
A: set of all the positive semidefinite matrices
K_i(P): Kalman gain matrix for subsystem i ∈ M with a given matrix P ∈ A
ρ_M: switched Riccati mapping
H_k: switched Riccati set
H_k^ε: ε-relaxed SRS
ξ_k^ε: in Chapters 2 and 3, the relaxed hybrid-control law satisfying (2.12); in Chapters 4 and 5, the relaxed hybrid-control law for the DSLQR problem characterized by H_k^ε through equation (4.16)
π_{ε,k}: π_{ε,k} := {ξ_k^ε, ξ_k^ε, ...}
δ(·): relative distance to the infimum cost for a given policy

ABBREVIATIONS

p.s.d.: positive semidefinite matrix
DP: Dynamic Programming
ADP: Approximate Dynamic Programming
TVECLF: Time-Varying Exponentially Stabilizing Control-Lyapunov Function
ECLF: Exponentially Stabilizing Control-Lyapunov Function
LQR: Linear Quadratic Regulation
DSLQR: Discrete-Time Switched LQR
SRM: Switched Riccati Mapping
SRS: Switched Riccati Set
ES: Equivalent Subset
MES: Minimum Equivalent Subset

ABSTRACT

Zhang, Wei. Ph.D., Purdue University, December 2009. Controller Synthesis for Switched Systems Using Approximate Dynamic Programming. Major Professor: Jianghai Hu.

This thesis develops an approximate dynamic programming (ADP) framework for solving optimal control and stabilization problems for a general discrete-time switched nonlinear system. Some important properties of the relaxed value iterations are derived. It is shown that under some mild conditions, the solution generated by the ADP algorithm is exponentially stabilizing and suboptimal. Furthermore, an important connection between the optimal control problem and the exponential stabilization problem is established. It is proved that a switched nonlinear system is exponentially stabilizable if and only if a certain finite-horizon approximate value function of a related optimal control problem is an exponentially stabilizing control-Lyapunov function (ECLF). This converse ECLF theorem makes ADP a universal tool for solving the exponential stabilization problem of a general switched nonlinear system. In addition to the general ADP results, the optimal quadratic regulation problem for switched linear systems (the DSLQR problem) is studied in detail. A new concept, called the relaxed switched Riccati mapping, is introduced to characterize the relaxed value iteration of the DSLQR problem, and an efficient algorithm is proposed to compute the iteration and the corresponding suboptimal policy. Furthermore, a stronger converse ECLF theorem is obtained for switched linear systems. It is shown that a switched linear system is exponentially stabilizable if and only if a certain finite-horizon approximate value function of a related DSLQR problem is an ECLF. By the DSLQR results, we also know that this ECLF is piecewise quadratic. This result justifies many of the earlier controller synthesis methods that have adopted

piecewise quadratic Lyapunov functions for convenience or heuristic reasons. An efficient algorithm based on a certain suboptimal solution of the DSLQR problem is also proposed, which is guaranteed to yield an exponentially stabilizing policy whenever the system is exponentially stabilizable.

1. INTRODUCTION

1.1 Backgrounds and Motivations

Stability is one of the most important properties of dynamical systems. The problem of designing a controller to stabilize a dynamical system is called the stabilization problem. Such a problem is a major research thrust in control theory and engineering, and has been extensively studied in the past [1-4]. While a constructive way to stabilize a linear system has been found [1, 5, 6], the stabilization problem for general nonlinear systems is far from being well understood. Nevertheless, various nonlinear controller design methods have been proposed in the literature, e.g., gain scheduling [7-9], adaptive control [3, 10-12], feedback linearization [4, 13-15], backstepping [3, 14, 16], sliding mode control [17-19], etc. All of these are well-established techniques and can be applied to solve numerous nonlinear stabilization problems. For many controller synthesis problems, stability is just a necessary requirement rather than an ultimate design goal. It is often desirable to have a controller that not only stabilizes the system but also optimizes certain design criteria. This kind of problem is the main focus of optimal control theory [20-22]. Methods for solving optimal control problems can be roughly divided into two categories: the variational approach and the dynamic programming (DP) approach. These two approaches are based on two different characterizations of an optimal solution. In the variational approach, a solution is deemed optimal if its performance cannot be improved along any feasible direction. This characterization of optimal solutions can be used to derive various necessary conditions for optimality, and eventually leads to Pontryagin's maximum principle [23]. The main limitation of the

variational approach is that it requires the differentiability of the cost function with respect to the decision variables, and it guarantees optimality only in a local sense. In contrast, the DP approach provides a global view of optimal solutions through Bellman's principle of optimality [24]. According to this principle, a trajectory is optimal if any segment of this trajectory is optimal among all the trajectories joining the two end points of the segment. This optimality condition can be used to characterize the optimal cost-to-go function (value function) backward in time and eventually leads to the Hamilton-Jacobi-Bellman (HJB) equation [25]. The DP approach allows for discrete decision variables and guarantees the global optimality of the obtained solutions. Although the stabilization problem and the optimal control problem are mostly studied independently of each other, there have already been some results that hint at their strong connections. Notable among these is the stability result for model predictive control (MPC) [26, 27]. An MPC controller iteratively solves a finite-horizon optimal control problem at each time instant and only applies the first step of the obtained control strategy to the system. This approach has been widely used in the process control industry since the 1980s [28, 29]. The popularity of MPC is partly due to its ability to handle state and control constraints, which are present in most practical control systems. Recent studies [26] found that the closed-loop stability of an MPC controller can be guaranteed by imposing a certain terminal constraint set or by choosing a suitable terminal cost function. The stability result for MPC indicates that a properly formulated optimal control problem can yield a stabilizing controller. This result connects the optimal control problem to the stabilization problem; however, this connection has not been adequately explored in the literature. Two fundamental questions about this connection remain open: (i) under what conditions is the optimal controller guaranteed to be stabilizing, and (ii) can a stabilization problem always be solved through optimal control? Some partial answers to these questions are available for

certain simple dynamical systems, while an in-depth understanding of these questions for general nonlinear systems has not yet been attained. This thesis provides a relatively complete answer to the above two questions. Interestingly, our studies show that for general discrete-time (switched) nonlinear systems with state and control constraints, an exponential stabilization problem is equivalent to an optimal control problem in the sense that the system is exponentially stabilizable if and only if it can be stabilized by the optimal solution of a properly defined optimal control problem. We now use two simple examples to illustrate this idea. The first example concerns the stabilization problem of an unconstrained linear system: x(t + 1) = Ax(t) + Bu(t), t = 0, 1, .... To study this problem, we consider an infinite-horizon optimal control problem with a cost function defined by

$$J(x(0)) = \sum_{t=0}^{\infty} \left( x^T(t) Q x(t) + u^T(t) R u(t) \right),$$

where Q = Q^T := C^T C and R = R^T are the state and control weighting matrices of appropriate dimensions. The solution of this optimal control problem is the well-known LQR controller. If we choose Q such that (A, C) is detectable, then by standard LQR theory, the linear system is stabilizable if and only if it is stabilizable by the LQR controller. It is interesting to see whether a similar conclusion can be made for a general nonlinear system. For this purpose, we consider a nonlinear system described by x(t + 1) = f(x(t), u(t)), t = 0, 1, ..., where x(t) is the system state lying in some state space X, u(t) is the control input lying in some control space U, and f : X × U → X is an arbitrary nonlinear mapping. For stabilization purposes, the cost function we choose only penalizes the system state and is defined by

$$J(x(0)) = \sum_{t=0}^{\infty} \|x(t)\|^2,$$

where ‖·‖ denotes the standard Euclidean norm. Let µ* : X → U be the optimal control law that achieves the infinite-horizon value function V_∞ : X → R_+. The HJB equation of this optimal control problem is:

$$V_\infty(z) = \inf_{u \in U} \left\{ \|z\|^2 + V_\infty(f(z, u)) \right\} = \|z\|^2 + V_\infty(f(z, \mu^*(z))), \quad \forall z \in X.$$

If the system is exponentially stabilizable, then there clearly exists a constant β < ∞ such that ‖z‖² ≤ V_∞(z) ≤ β‖z‖² for all z ∈ X. This, together with the Bellman equation, implies that V_∞ is a Lyapunov function of the closed-loop system and hence the optimal control law µ* is exponentially stabilizing. The above two examples confirm the idea that one can always solve an exponential stabilization problem through a properly-defined optimal control problem. However, this idea will not be practically attractive unless the optimal control problem can be efficiently solved. DP is perhaps the most popular approach for solving an optimal control problem, especially in discrete time. It is well known that the DP approach suffers greatly from the curse of dimensionality. Since the 1990s, various methods have been proposed to efficiently solve an HJB equation by introducing certain approximations of the value functions and/or value iterations [30-34]. These methods have been referred to as neuro-dynamic programming (NDP), approximate dynamic programming (ADP), or relaxed dynamic programming. They have found great success in the field of operations research, and have also gained considerable momentum in the control community in the past few years. Most previous results on ADP require a finite discrete state space and a discount factor strictly less than 1 in the cost function. These assumptions are reasonable for most operations research problems, but they are not appropriate for controller synthesis problems because the obtained control strategy, though it has a relatively small discounted total cost, is not guaranteed to yield a stable closed-loop system. The work in this thesis is mostly motivated by the strong connection between a stabilization problem and its corresponding optimal control problem, and by the lack of an efficient way to solve a stabilization-oriented optimal control problem. As will be discussed in the next section, we develop a general ADP framework to efficiently solve optimal control problems with continuous state spaces and undiscounted cost functions. This framework has also been successfully used to solve various controller synthesis problems for switched systems.
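To make the first example above concrete, here is a minimal numerical sketch (not part of the thesis; the matrices A, B, Q, R below are hypothetical) of the Riccati recursion that value iteration reduces to in the unconstrained linear-quadratic case, together with a check that the resulting LQR feedback is stabilizing:

```python
import numpy as np

# Value iteration for the unconstrained LQR example reduces to the Riccati
# recursion; the limiting P gives the feedback u = -K x.
A = np.array([[1.2, 0.5],
              [0.0, 0.9]])          # hypothetical, open-loop unstable
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)                        # Q = C^T C with (A, C) detectable
R = np.array([[1.0]])

P = np.zeros((2, 2))                 # V_0 = 0 (trivial terminal cost)
for _ in range(200):                 # Riccati recursion / value iteration
    S = R + B.T @ P @ B
    P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)

K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # LQR gain
rho = max(abs(np.linalg.eigvals(A - B @ K)))        # closed-loop spectral radius
print("spectral radius of A - BK:", rho)            # < 1: exponentially stable
```

The same pattern of solving an optimal control problem and then reading off a stabilizing feedback is what the ADP framework of Chapters 2 and 3 extends to constrained switched nonlinear systems.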

1.2 Preview of Main Results

Generally speaking, this thesis is concerned with how to efficiently solve a general optimal control problem under certain numerical relaxations, and how to use the obtained suboptimal solution to solve the exponential stabilization problem. We now provide an overview of the main results discussed in each chapter. In Chapter 2, we develop a general framework to compute and analyze suboptimal solutions to a constrained optimal control problem associated with a general switched nonlinear system. Motivated by the discussion in the last section, we give special attention to the stability property of the closed-loop system driven by a suboptimal control strategy. The framework is developed based on a DP approach. A general relaxation operator is introduced to tackle the computational challenges in representing the value functions on a continuous state space, and in solving the optimization problem associated with the value iterations. This operator accounts for the numerical errors incurred by any approximations of the value functions and/or any relaxations in computing the value iterations. Combining the relaxation operator with the value iteration results in a relaxed value iteration, which naturally leads to a general approximate dynamic programming (ADP) algorithm. Various properties of the algorithm are studied. In particular, we establish conditions under which the policy generated by the ADP algorithm is stabilizing and suboptimal. The ADP framework is successfully used in Chapter 3 to solve the exponential stabilization problem for general switched nonlinear systems. It is shown that a switched nonlinear system is exponentially stabilizable if and only if the approximate value function of a related optimal control problem is a control-Lyapunov function. Such a converse control-Lyapunov function theorem makes the ADP algorithm a general tool for solving the exponential stabilization problems of switched nonlinear systems. In Chapter 4, we study an important special case of the general optimal control problem studied in Chapter 2 with linear subsystems and a quadratic cost function.

The problem is a natural extension of the classical LQR problem to the switched linear system context, and is thus called the discrete-time switched LQR (DSLQR) problem. It is shown that the finite-horizon value function of a DSLQR problem is a pointwise minimum of a finite number of quadratic functions characterized by some positive semidefinite (p.s.d.) matrices that can be obtained recursively using the so-called switched Riccati mapping. Explicit expressions are also derived for the optimal switching-control law and the optimal continuous-control law, both of which are of state-feedback form and are homogeneous on the state space. The main challenge in solving the DSLQR problem is that the number of matrices required to characterize the value functions grows exponentially fast as the control horizon increases. A numerical relaxation strategy is introduced to tackle this computational challenge; a sketch of the underlying iteration is given at the end of this section. This strategy can be viewed as a particularization of the general approximate dynamic programming framework developed in Chapter 2. Due to the special structure of the DSLQR problem, we are able to characterize the relaxed value iteration using the so-called relaxed switched Riccati mapping, which can be efficiently computed using convex optimization. Moreover, detailed performance bounds in terms of subsystem matrices are derived for the resulting finite-horizon and infinite-horizon suboptimal policies. In Chapter 5, we study the exponential stabilization problem for discrete-time switched linear systems. As a special case of the results in Chapter 3, we show that a switched linear system is exponentially stabilizable if and only if a certain finite-horizon approximate value function of a related DSLQR problem is a control-Lyapunov function. By our DSLQR results, we also know that this control-Lyapunov function must be piecewise quadratic. Such a converse control-Lyapunov function theorem justifies many of the earlier controller synthesis methods that have adopted piecewise quadratic Lyapunov functions for convenience or heuristic reasons. In addition, it is also proved that if a switched linear system is exponentially stabilizable, then it must be stabilizable by a stationary suboptimal policy of a related DSLQR problem. Based on the relaxation strategy proposed in Chapter 4, an efficient algorithm is proposed,

which is guaranteed to yield a stabilizing policy whenever the system is exponentially stabilizable.
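As a deliberately naive illustration of the switched Riccati machinery previewed above, the sketch below builds an exact (unpruned) switched Riccati set for a hypothetical two-mode system and evaluates the finite-horizon value function as a pointwise minimum of quadratics. The per-subsystem mapping is assumed to take the standard difference-Riccati form; none of the numerical data comes from the thesis, and the exponential growth of the set is exactly what the relaxed mapping of Chapter 4 is designed to curb.

```python
import numpy as np

def riccati_map(P, A, B, Q, R):
    """One-step Riccati-type mapping rho_i(P) for a subsystem (A, B, Q, R)."""
    S = R + B.T @ P @ B
    return Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)

def switched_riccati_set(subsystems, horizon):
    """Exact iteration H_{k+1} = {rho_i(P) : i in M, P in H_k}, H_0 = {0}.
    The set size grows like M^k, motivating the relaxed mapping of Chapter 4."""
    H = [np.zeros_like(subsystems[0][2])]
    for _ in range(horizon):
        H = [riccati_map(P, *sub) for sub in subsystems for P in H]
    return H

def value(z, H):
    """Finite-horizon value V_k(z) = min_{P in H_k} z' P z."""
    return min(float(z @ P @ z) for P in H)

# Hypothetical two-mode example (M = 2); not the data used in the thesis.
subsystems = [
    (np.array([[2.0, 1.0], [0.0, 1.0]]), np.array([[1.0], [0.0]]), np.eye(2), np.eye(1)),
    (np.array([[1.0, 0.0], [1.0, 2.0]]), np.array([[0.0], [1.0]]), np.eye(2), np.eye(1)),
]
H5 = switched_riccati_set(subsystems, horizon=5)
print(len(H5), "matrices characterize V_5")            # 2^5 = 32
print("V_5([1, -1]) =", value(np.array([1.0, -1.0]), H5))
```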

2. UNDISCOUNTED APPROXIMATE DYNAMIC PROGRAMMING

The classical dynamic programming method suffers greatly from the curse of dimensionality. The approximate dynamic programming (ADP) approach tackles this computational challenge by introducing some relaxation errors in representing the value functions and in computing the value iterations. Most previous results on ADP require a finite discrete state space and a discount factor strictly less than 1. In this chapter, we develop a more general ADP framework without these assumptions. We use a switched nonlinear system to model the underlying dynamical system because of its diverse applications in various engineering fields, such as power electronics [35-37], embedded systems [38, 39], manufacturing [40], and communication networks [41].

2.1 Problem Formulations

Consider the following discrete-time, time-invariant, constrained switched nonlinear system:

$$x(t + 1) = f_{v(t)}(x(t), u(t)), \quad t \in \mathbb{Z}_+, \qquad (2.1)$$

where x(t) ∈ X ⊆ R^n is the continuous state, u(t) ∈ U ⊆ R^p is the continuous control, and v(t) ∈ M ≜ {1, ..., M} is the discrete control that determines the discrete mode at time t. The sets X, U, and M are called the (continuous) state space, the continuous-control space, and the discrete-control (or switching-control) space, respectively. It is assumed that both X and U are connected and contain the origin in their interiors. For each time t ∈ Z_+, the pair (u(t), v(t)) is called a hybrid-control action at time t and is constrained to take values in a nonempty subset Γ(x(t)) of U × M, which in general may depend on the current state value x(t). The control constraint set Γ(·)

is assumed to be time-invariant and is completely determined by the location of the state. The state space X is assumed to be (constrained) control invariant, namely, for any z ∈ X, there exists a hybrid-control action (u, v) ∈ Γ(z) such that f_v(z, u) ∈ X. We also assume that (0, i) ∈ Γ(0) for at least one i ∈ M and that Γ(·) is consistent with X in the sense that f_v(z, u) ∈ X for all (u, v) ∈ Γ(z) and all z ∈ X. For each i ∈ M, the mapping f_i : R^n × R^p → R^n is called a subsystem and is assumed to have an equilibrium point at the origin, i.e., f_i(0, 0) = 0. For later reference, the above assumptions on system (2.1) are listed below.

Assumption (Assumptions on System (2.1)):
1. X and U are connected and contain the origin in their interiors;
2. X is control invariant with respect to Γ, i.e., for all z ∈ X there exists (u, v) ∈ Γ(z) such that f_v(z, u) ∈ X;
3. Trivial continuous control is allowed at the origin, i.e., (0, i) ∈ Γ(0) for at least one i ∈ M;
4. f_v(z, u) ∈ X, for all (u, v) ∈ Γ(z) and all z ∈ X;
5. f_i(0, 0) = 0, for all i ∈ M.

Remark: In the above assumption, one can obtain condition 4 from the rest. For example, if Γ satisfies all the conditions but condition 4, then the set Γ̂(z) = Γ(z) ∩ {(u, v) ∈ U × M : f_v(z, u) ∈ X}, z ∈ X, satisfies all the conditions.

Remark: With a properly chosen constraint set Γ, system (2.1) can represent not only general discrete-time switched nonlinear systems, but also a large subclass of discrete-time hybrid systems [42, 43].

The most general way of making a control decision is through a time-dependent (state-feedback) hybrid-control law, namely, a time-dependent function ξ_t ≜ (µ_t, ν_t) : X → U × M that maps the continuous state x(t) to a hybrid-control action ξ_t(x(t))

∈ Γ(x(t)). Here, µ_t : X → U and ν_t : X → M are called the (state-feedback) continuous-control law and the (state-feedback) switching-control law, respectively, at time t ∈ Z_+. A sequence of hybrid-control laws constitutes an infinite-horizon feedback policy: π_∞ ≜ {ξ_0, ξ_1, ...}. For any finite integer N ∈ Z_+, an N-horizon policy is defined similarly as π_N ≜ {ξ_0, ..., ξ_{N-1}}. Denote by Π_∞ and Π_N the sets of all the infinite-horizon and finite-horizon policies, respectively. If system (2.1) is driven by a feedback policy π_∞, then the closed-loop dynamics is governed by

$$x(t + 1) = f_{\nu_t(x(t))}(x(t), \mu_t(x(t))), \quad t \in \mathbb{Z}_+. \qquad (2.2)$$

Let {x(t; z, π_∞)}_{t∈Z_+} be the closed-loop trajectory driven by π_∞ with initial condition z ∈ X, and let {(u(t; z, π_∞), v(t; z, π_∞))}_{t∈Z_+} be the corresponding hybrid-control sequence. The performance of an infinite-horizon feedback policy π_∞ starting from an initial state z ∈ X can be quantified by the following cost function:

$$J(z, \pi_\infty) = \sum_{t=0}^{\infty} L(x(t; z, \pi_\infty), u(t; z, \pi_\infty), v(t; z, \pi_\infty)), \qquad (2.3)$$

where L : X × U × M → R_+, satisfying L(0, 0, v) = 0 for all v ∈ M, is called a running cost function. If some finite horizon is of interest, the cost function usually involves a nontrivial terminal cost penalizing the terminal state for being outside a certain desired set. Denote by ψ : X → R_+ the terminal state cost. Then the cost associated with an N-horizon policy π_N with initial state z ∈ X is given by

$$J(z, \pi_N) = \psi(x(N; z, \pi_N)) + \sum_{t=0}^{N-1} L(x(t; z, \pi_N), u(t; z, \pi_N), v(t; z, \pi_N)). \qquad (2.4)$$

With these notations, the finite-horizon and infinite-horizon optimal control problems associated with system (2.1) are stated below.

Problem: For a given positive integer N, find an N-horizon feedback policy that solves the following optimal control problem:

$$V_N(z) = \inf_{\pi_N \in \Pi_N} J(z, \pi_N), \quad z \in X. \qquad (2.5)$$

Problem: Find an infinite-horizon feedback policy that solves the following constrained optimal control problem:

$$V_\infty(z) = \inf_{\pi_\infty \in \Pi_\infty} J(z, \pi_\infty), \quad z \in X. \qquad (2.6)$$

The functions V_N and V_∞ will be referred to as the N-horizon and the infinite-horizon value functions, respectively. Since we have assumed that Γ(z) is nonempty and compatible with X for all z ∈ X, V_N is finite everywhere. However, V_∞ might be infinite at some points in X.

2.2 Value Iteration Operator

In this section, we introduce an important concept of dynamic programming, namely, the value iteration. To simplify our notation, the following two functional spaces are introduced:

$$\mathcal{G} := \{g : X \to \mathbb{R} \cup \{\pm\infty\}\}, \qquad \mathcal{G}^+ := \{g : X \to \mathbb{R}_+ \cup \{+\infty\}\}.$$

Definition (Value Iteration Operator T): The operator T : G → G is called the value iteration operator associated with system (2.1) and is defined by

$$T[g](z) = \min_{(u,v) \in \Gamma(z)} \left\{ L(z, u, v) + g(f_v(z, u)) \right\}, \quad z \in X, \ g \in \mathcal{G}.$$

We denote by T^N the composition of the mapping T with itself N times, i.e., T^{N+1}[g] = T[T^N[g]] for all N ∈ Z_+ and g ∈ G.
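To make the operator T concrete, the following minimal sketch (not from the thesis; the two subsystems, the running cost, and the grids are hypothetical) evaluates T[g] numerically for a scalar two-mode system by brute-force minimization over a finite set of candidate controls, with g represented by its values on a state grid:

```python
import numpy as np

# Hypothetical scalar two-mode switched system and running cost (illustration only).
fs = [lambda x, u: 0.5 * x + u,            # subsystem f_1
      lambda x, u: 1.5 * x + 0.5 * u]      # subsystem f_2
L = lambda x, u, v: x**2 + 0.1 * u**2      # running cost with L(0, 0, v) = 0

X_grid = np.linspace(-2.0, 2.0, 201)       # grid representing the state space X
U_grid = np.linspace(-1.0, 1.0, 41)        # finite set of candidate controls in U

def T(g):
    """One value iteration: T[g](z) = min_{(u,v)} { L(z,u,v) + g(f_v(z,u)) },
    evaluated at the grid points; g is interpolated at the successor states."""
    g_next = np.empty_like(X_grid)
    for i, z in enumerate(X_grid):
        best = np.inf
        for v, f in enumerate(fs):
            for u in U_grid:
                x_next = np.clip(f(z, u), X_grid[0], X_grid[-1])   # keep successor on the grid
                best = min(best, L(z, u, v) + np.interp(x_next, X_grid, g))
        g_next[i] = best
    return g_next

V = np.zeros_like(X_grid)                  # V_0 = psi = 0 on the grid
for _ in range(10):                        # V_N = T^N[psi]
    V = T(V)
```

Both the gridding of X and the restriction of u to a finite candidate set are already instances of the representation and computation relaxations formalized in Section 2.5.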

25 12 Definition (Operator T ξ ) For any function g G and any hybrid-control law ξ = (µ, ν) : X U M, with z X ξ(z) Γ(z), the operator T ξ : G G associated with system (2.1) is defined by: T ξ [g](z) = L(z, µ(z), ν(z)) + g ( f ν(z) (z, µ(z)) ), z X, g G. (2.7) Remark Notice the subtle difference between the hybrid-control law (µ, ν) used Definition and the hybrid-control value (u, v) used in Definition With these definitions, some key results about dynamic programming are summarized in the following lemma. Lemma The finite-horizon and the infinite-horizon value functions satisfy the following properties. 1. (Value Iteration): V N = T N [ψ], for all N Z + ; 2. (Bellman Equation): V G + is a fixed point of T, i.e., T [V ] = V ; 3. (Minimum Property): For any g G +, if g T [g], then g V ; 4. (Monotonicity): If ψ 0, then V 0 V 1 V 2 V ; 5. (Stationary Optimal Policy): A stationary policy π = {ξ, ξ,..., } is optimal if and only if T ξ [V ] V. It is worth mentioning that with a nontrivial terminal cost ψ, the pointwise limit of V N as N may not always exist. Furthermore, even when the limit does exist, it may not always coincide with the infinite-horizon value function V as demonstrated in the following example. Example Consider the DP problem with one subsystem x(t+1) = 2x(t)+u(t), trivial state constraint X = R, nontrivial control constraint U = (0, ), running cost L(z, u) = z + u and total N-horizon cost J N (z, u) = N 1 t=0 L(x(t; z, u), u(t)), where x( ; z, u) is the closed-loop trajectory starting from z R under the control sequence

u. The N-horizon value function is V_N(z) = inf_{u ∈ U^N} J_N(z, u). Clearly, V_0(z) = 0 and V_N(z) = (2^N − 1)z for all z ∈ [0, ∞). Thus, for any N ≥ 0, V_N(0) = 0. Hence, we know that lim_{N→∞} V_N(0) exists and is also 0. However, the infinite-horizon value function V_∞(z) = inf_u Σ_{t=0}^{∞} L(x(t; z, u), u(t)) = ∞ for all z ∈ [0, ∞). Thus lim_{N→∞} V_N(0) ≠ V_∞(0).

2.3 Main Assumption

Central to the study of (approximate) dynamic programming is the convergence of the (approximate) value functions. The stability of the closed-loop trajectory driven by the (sub)optimal policy is also necessary for most applications. To ensure these two properties, the following assumption is needed.

Assumption 2.3.1 For any z ∈ X,
(i) V_∞(z) ≤ β_V^+ ‖z‖, for some β_V^+ < ∞;
(ii) L(z, u, v) ≥ β_L^- ‖z‖ for some β_L^- > 0 and all (u, v) ∈ U × M;
(iii) ψ(z) ≤ β_ψ^+ ‖z‖ for some β_ψ^+ < ∞;
(iv) there exists ξ_k^* such that T_{ξ_k^*}[V_k](z) = V_{k+1}(z), for all k ∈ Z_+.

Remark: The symbol ‖·‖ here is a short-hand notation for an arbitrary L_p-norm raised to an arbitrary power r ∈ R, namely, ‖·‖ = ‖·‖_p^r, where ‖·‖_p denotes the L_p norm in R^n.

Several additional remarks about Assumption 2.3.1 are in order. Condition (i) is essentially an exponential stabilizability condition. It will be proved in Chapter 3 that, with a properly chosen running cost function L, system (2.1) is exponentially stabilizable if and only if condition (i) holds. On the other hand, condition (ii) can be viewed as an observability condition in the sense that any nonzero state x(t) will incur a cost larger than β_L^- ‖x(t)‖. This condition is crucial for proving the stability of the (sub)optimal trajectory. Condition (iii) is needed to make the finite-horizon value functions comparable with the infinite-horizon one V_∞. The last condition is included mainly to simplify our discussion. Without this condition, the main results, Theorems 2.4.4 and 2.7.1, still hold; however, their proofs would become much more technical.
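For orientation, a standard instance of conditions (ii) and (iii), anticipating the quadratic setting of Chapters 4 and 5, takes ‖·‖ to be the squared Euclidean norm, i.e. ‖z‖ = ‖z‖_2^2 (p = 2, r = 2), together with quadratic running and terminal costs; the constants are then the obvious eigenvalue bounds:

$$L(z,u,v) = z^T Q_v z + u^T R_v u \ \ge\ \lambda_{\min}(Q_v)\,\|z\|_2^2 \ \ge\ \Big(\min_{v \in \mathbb{M}} \lambda_{\min}(Q_v)\Big)\,\|z\|, \qquad \psi(z) = z^T P_0 z \ \le\ \lambda_{\max}(P_0)\,\|z\|,$$

so that, with Q_v ≻ 0 and P_0 ⪰ 0, condition (ii) holds with β_L^- = min_v λ_min(Q_v) > 0 and condition (iii) holds with β_ψ^+ = λ_max(P_0) < ∞.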

27 14 Throughout this chapter, we shall always regard β + V, β L and β+ ψ introduced in Assumption 2.3.1, and denote by ξ k as the constants the optimal control law generated by V k, i.e. it satisfies condition (iv) of Assumption Notice that superscript indicates that the law is generated by the exact optimal value function V k instead of certain approximation of it as will be introduced in Section 2.5. In addition, the subscript k of the notation ξ k denotes specifically the horizon of the value function generating this control law, instead of the time instant when the control law ξ k is applied. Define the N-horizon optimal polices as πn := {ξ N 1,..., ξ 0 }. In other words, the policy πn applies the hybrid-control law ξ N t 1 at time t, for t = 0,..., N 1. We finish this section with an immediate consequence of Assumption Lemma Under Assumption 2.3.1, β L z V N(z) β + z, for all N Z ++ and z X, where β + = β+ V (1 + β+ ψ /β L ). Proof Fix an arbitrary z X and δ > 0. Let π be a policy such that J(z, π ) V (z) + δ. Let ˆx(t) = x(t; z, π ) and denote by (û( ), ˆv( )) the corresponding hybridcontrol sequence. The result V N (z) β L z follows from the fact that V N L(, u, i) for all N Z ++ and (u, i) U M. Furthermore, ˆx(t) 1 β L(ˆx(t), û(t), ˆv(t)) 1 L β V (ˆx(t)) 1 L β L V (z) β+ V β z, t Z ++ L Then, by the optimality of V N, we have V N (z) N 1 t=0 L(ˆx(t), û(t), ˆv(t))+ψ(ˆx(N)) β + z, for all N Z Exponential Convergence of Value Iteration The convergence of the value iterations is the most desired property of a dynamic programming algorithm. The limiting function of the value iterations can be used to obtain a (sub)optimal policy with a performance close to the optimal one. Most classical results on the convergence of value iterations assume either ψ V or the cost is discounted by a factor strictly less than 1. In this section, we present a

28 15 convergence result without any of these assumptions. Our strategy is to first establish the stability of the optimal trajectories and then use this stability result to show the convergence of the value iteration Exponential Stability of the Optimal Trajectory The stability results given in this subsection are based on the following (timevarying) Lyapunov function theorem. Theorem (Time-Varying Lyapunov Theorem) Let π N be an N-horizon policy for some N Z + { }. If there exists constants κ 1, κ 2, κ 3 (0, ), and a function g : X {1,..., N} R + such that 1. κ 1 z g(z, t) κ 2 z, z X and t = 0,...,N, 2. g(x(t; z, π N ), t) g(x(t + 1; z, π N ), t + 1) κ 3 x(t; z, π N ), z X and t = 0,..., N 1. Then the closed-loop trajectory x( ; z, π N ) satisfies x(t; z, π N ) κ ( ) t 2 1 z, z X and t = 0,...,N. κ κ 3 /κ 2 Proof Fix an arbitrary z X. For simplicity, let ˆx(t) = x(t; z, π N ) and ĝ t = g(ˆx(t), t). By the assumptions, we have for t = 0,..., N 1, Hence, ĝ t κ 3 /κ 2 ĝ t ĝ t ĝ t+1 κ 3ˆ x(t) κ 3 κ 2 ĝ t κ 3 κ 2 ĝ t+1. ( 1 1+κ 3 /κ 2 ) t+1 ĝ0, which implies the desired result. The exponential stability of the closed-loop system driven by a stationary policy can be checked based on a special version of Theorem as stated below.

29 16 Theorem (Lyapunov Theorem) For any hybrid-control law ξ = (µ, ν), if there exists constants κ 1, κ 2, κ 3 (0, ), and a function g G + such that for all z X, κ 2 z g(z) κ 1 z, g(z) g(f ν(z) (z, µ(z))) κ 3 z. Then the stationary policy π := {ξ,..., } is exponentially stabilizing with x(t; z, π ) κ ( ) t 2 1 z, z X and t Z +. κ κ 3 /κ 2 Proof The result follows as a special case of Theorem (2.8) We usually refer to the functions satisfying the conditions in the above two theorems as Lyapunov functions of the closed-loop system (2.2), or control-lyapunov functions of the open-loop system (2.1). Definition (TVECLF) A function g G + is called a Time-Varying Exponentially Stabilizing Control-Lyapunov Function (TVECLF) of system (2.1) with a stabilizing policy π N, if g and π N jointly satisfy the conditions in Theorem Definition (ECLF) A function g G + is called an Exponentially Stabilizing Control-Lyapunov Function (ECLF) of system (2.1) with a stabilizing policy π, if g and π jointly satisfy the conditions in Theorem The existence of (TV)ECLFs guarantees the exponential stabilizability of system (2.1). This result can be used to prove the stability of the closed-loop system driven by the optimal policies. Theorem (Stable Optimal Trajectories) Under Assumption 2.3.1, the N- horizon optimal trajectories satisfy: { x(t; z, π N ) β+ ( 1 β L 1+β 1 L /β + 1 ) t z, for t = 0,...,N 1.

30 17 Proof Fix an arbitrary z X. Let ˆx(t) = x(t; z, πn ), t = 0,..., N and let (û( ), ˆv( )) the corresponding optimal hybrid control sequence. Define g(ˆx(t), t) := J(ˆx(t), π t N ), for t = 0,...,N, where π t N denotes the last N t steps of the policy π N. Clearly, g(ˆx(t), t) β L 1 ˆx(t) for t = 0,...,N 1 as J(ˆx(t), π t N ) must be larger than the one-step running cost. By the optimality of πn, we have g(ˆx(t), t) = J(ˆx(t), πn ) β+ ˆx(t) for all t = 0,...,N. Furthermore, g(ˆx(t), t) g(ˆx(t + 1), t + 1) = L(ˆx(t), û(t), ˆv(t)) β L 1 ˆx(t), for t = 0,...,N 1. Therefore, g is a TVECLF with a stabilizing policy π N result follows from Theorem over the horizon [0, N 1] and the desired Notice that without further assumptions on the terminal cost ψ, the upper bound in the second inequality of the above theorem may not hold for the terminal state x(n; z, πn ). In fact, the terminal state of the optimal trajectory may be arbitrarily large if ψ 0. Example Consider the scalar linear system x(t + 1) = x(t) + u(t), for t = {0, 1,..., N 1}, with X = U = R. Let L(x, u) = x 2, ψ 0. Fix an initial state x(0) = z. Then it can be easily verified that the N-horizon control sequence of the form { z, 0,...,0, c} is optimal for all c R. Therefore, the terminal state of the corresponding optimal trajectory is equal to c and can be made arbitrarily large Convergence of Value Iteration The following lemma provides a bound on the difference between two value functions with different horizon lengths. Lemma Under Assumption 2.3.1, for all z X, N 1 and k 1, V k ( x(n; z, π N+k ) ) ψ ( x(n; z, π N+k )) V N+k (z) V N (z) V k+1 (x(n 1; z, π N)). (2.9)

31 18 Proof Fix an arbitrary z X. Let ˆπ N+k be the policy that agrees with πn for the first N 1 steps and agrees with πk+1 for the last k + 1 steps. By the optimality of V N+k (z), we have V N+k (z) J(z, ˆπ N+k ) = V N (z) + V k+1 (x(n 1; z, πn)) ψ(x(n 1; z, πn)) V N (z) + V k+1 (x(n 1; z, πn )), which is exactly the second desired inequality. To prove the first one, let π N be the policy that agrees with πn+k for the first N steps. By Bellman s principle of optimality, we have V N+k (z) = J(z, π N ) ψ ( x(n; z, πn+k) ) ( + V k x(n; z, π N+k ) ). Then, the first inequality follows directly from the fact that V N (z) J(z, π N ). With a nontrivial terminal cost, the N-horizon value function V N may not be monotone as N increases. Nevertheless, by Lemma 2.4.1, the difference between V N+k (z) and V N (z) can be bounded by the quadratic functions of x(n 1; z, π N ) and x(n; z, πn+k ). By Theorem 2.4.3, we know both quantities converge to zero as N grows to infinity. This will guarantee that by choosing N large enough, the upper and lower bounds in (2.9) can be made arbitrarily small. The convergence of the value iteration can thus be established. Theorem (Convergence of value iteration) Under Assumption 2.3.1, V N converges to V exponentially fast according to V N+k (z) V N (z) α V γ N V z, k, N Z +, z X, (2.10) where α V := β+ (max{β L,β+ ψ }+β+ ) 1 and γ β V :=. L 1+β 1 L /β + 1 Proof Fix an arbitrary z X. By Theorem (2.4.3) and Lemma 2.4.1, we have ( ) V N+k (z) V N (z) (β+ )2 β + β γ N 1 V = + β L β + L β γv N z, L

32 19 and V N (z) V N+k (z) ( ) β + + β + ψ β + β L γ N V z. Combining the above two inequalities yields inequality (2.10). Inequality (2.10) implies that the limit of V N as N exists. Let V (z) = lim N V N (z), z X. It remains to show that V V. By the optimality of V, we have V (z) V N (z) ψ(x(n; z, π N )) + V (x(n; z, π N )) for all N Z +. Note that ψ(x(n; z, πn )) β+ ψ x(n; z, π N ), V (x(n; z, πn )) β+ V x(n; z, π N ), and by Theorem 2.4.3, x(n; z, π N ) 0 as N. Therefore, V (z) V (z). To prove the other direction, notice that V (z) = inf π Π s J π (z, π ), where Π s denotes the set of all the infinite-horizon stabilizing policies. Let π be an arbitrary policy in Π s and let ˆx( ) and (û( ), ˆv( )) be the corresponding trajectory and the hybrid-control sequence, respectively. Since ˆx(t) 0 as t, for any ǫ > 0, there always exists an N 1 such that ψ(ˆx(t)) ǫ for all t N 1. Hence, for all N N 1, V N (z) N 1 t=0 L(ˆx(t), û(t), ˆv(t)) + ψ(ˆx(n)) N 1 L(ˆx(t), û(t), ˆv(t)) + ǫ J π (z) + ǫ. t=0 Let N, we have V (z) J π (z) + ǫ, π Π s. Thus, V (z) V (z) + ǫ, which implies V (z) V (z) because ǫ is arbitrary. It is worth mentioning a special case of Theorem with ψ 0. Corollary Under Assumption with ψ 0, V N V exponentially fast with ( 0 V N+k (z) V N (z) β+ V β + V + ) β L β γv N z, z X, k Z +, N Z +, L where γ V is the constant defined in Theorem Proof The result follows directly from Theorem by noticing that β + and V k V k+1, for all k Z +, when ψ 0. = β+ V

33 Approximation in Dynamic Programming When solving a practical dynamic programming problem, it is often impossible or numerically intractable to obtain the exact value functions and the corresponding optimal policy. In such cases, numerical relaxations or approximations are introduced, either explicitly or implicitly, to obtain a suboptimal solution. Two types of numerical relaxations may occur during the value iteration. The first one is called the computation relaxation, which accounts for the errors introduced during the solution of the optimization problem associated with the value iteration operator T. The second one is called the representation relaxation, which accounts for the errors introduced for simplifying the representation of the value function. We now illustrate the two types of relaxations through an example. Consider the problem of finding the value function at the next step T [V N ] starting from a known current value function V N. If the state space is continuous and V is a highly nonlinear function, we usually can only numerically compute T [V N ] at some grid points of the state space, which will definitely incur some numerical error. The error due to the state-space griding is viewed as a representation relaxation, while the error associated with the numerical optimization at each grid point is regarded as a computation relaxation. Usually, we can fully control the degree of the relaxations. For example, we can choose freely the fineness of the grids as well as the convergence criteria for terminating the optimization algorithm. Therefore, by tuning certain relaxation parameters, a tradeoff can be obtained between the accuracy and the complexity of the value iteration. To systematically study the impact of the numerical relaxations on the overall dynamic programming algorithm, we introduce some definitions. Definition (Error Function) A nonnegative function E G + is called an error function of an ADP problem if E(z) < for all z X and E(0) = 0. Notice that the error function can be thought of as a scaling profile for the maximal allowable relaxation errors over the entire state space and need not be small on its

own. The following assumption on the error function is frequently used in the rest of this chapter.

Assumption: E(·) ≤ cL(·, u, v) for some c > 0 and all (u, v) ∈ U × M.

Definition (Relaxation Operator): For any ε ≥ 0, a mapping R_ε : G → G is called a relaxation operator (with parameter ε and error function E ∈ G^+) if there exists an invariant space G_IV ⊆ G such that
1. ψ ∈ G_IV;
2. R_ε[g] ∈ G_IV and T[g] ∈ G_IV, for all g ∈ G_IV;
3. |R_ε[g](z) − g(z)| ≤ εE(z), for all z ∈ X and g ∈ G_IV.

Definition (Relaxed Value Iteration): Let R_ε and T be defined in Definitions 2.5.2 and 2.2.1, respectively. The composite mapping R_ε ∘ T : G → G is called an ε-relaxed value iteration, or approximate value iteration, with error function E.

Similar to generating the value functions through the value iteration, one can obtain a sequence of approximate value functions using the relaxed value iteration. Denote by {V_k^ε}_{k∈Z_+} the ε-relaxed value functions obtained by

$$V_0^\varepsilon = \psi, \quad \text{and} \quad V_{k+1}^\varepsilon = R_\varepsilon \circ T[V_k^\varepsilon], \quad \text{for } k \in \mathbb{Z}_+. \qquad (2.11)$$

Denote by ξ_k^ε := (µ_k^ε, ν_k^ε) the hybrid-control law associated with V_k^ε satisfying

$$T[V_k^\varepsilon] \le T_{\xi_k^\varepsilon}[V_k^\varepsilon] \le T[V_k^\varepsilon] + \varepsilon E. \qquad (2.12)$$

In some special cases, one may easily compute a control law ξ_k^ε such that T[V_k^ε] = T_{ξ_k^ε}[V_k^ε]. However, in general, it is seldom possible to solve the optimization problem associated with T exactly. Such a situation is handled by the relaxation term εE on the right-hand side of (2.12). Therefore, with ε > 0, it is reasonable to assume the existence of a control law ξ_k^ε satisfying (2.12).
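As a toy instance of the relaxed iteration (2.11) (reusing the hypothetical scalar two-mode system from the sketch in Section 2.2; none of this is from the thesis), the relaxation operator below quantizes the value function with a state-dependent resolution, so that |R_ε[g](z) − g(z)| ≤ εE(z) with the error function E(z) = z², which satisfies the assumption above with c = 1 for the quadratic running cost used there:

```python
import numpy as np

# Toy eps-relaxed value iteration V^eps_{k+1} = R_eps(T[V^eps_k]) on a grid
# (hypothetical data, illustration only).
fs = [lambda x, u: 0.5 * x + u, lambda x, u: 1.5 * x + 0.5 * u]
L = lambda x, u, v: x**2 + 0.1 * u**2
E = lambda x: x**2                         # error function, E <= c*L with c = 1
X_grid = np.linspace(-2.0, 2.0, 201)
U_grid = np.linspace(-1.0, 1.0, 41)
eps = 0.05

def T(g):                                  # brute-force value iteration operator
    out = np.empty_like(X_grid)
    for i, z in enumerate(X_grid):
        succ = np.array([np.clip(f(z, u), X_grid[0], X_grid[-1]) for f in fs for u in U_grid])
        cost = np.array([L(z, u, v) for v in range(len(fs)) for u in U_grid])
        out[i] = np.min(cost + np.interp(succ, X_grid, g))
    return out

def R_eps(g):                              # representation relaxation: quantize g with
    tol = eps * E(X_grid)                  # pointwise tolerance eps*E(z), so that
    safe = np.maximum(tol, 1e-12)          # |R_eps[g](z) - g(z)| <= eps*E(z)
    return np.where(tol > 0, np.round(g / safe) * safe, g)

V_eps = np.zeros_like(X_grid)              # V^eps_0 = psi = 0
for k in range(10):
    V_eps = R_eps(T(V_eps))                # relaxed value iteration (2.11)
```

Quantization is only one possible choice of R_ε; in the DSLQR setting of Chapters 4 and 5 the analogous relaxation, roughly speaking, discards nearly redundant matrices from the switched Riccati set.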

35 Performance of Relaxed Policy in Finite Horizon Define πn ǫ := {ξǫ N 1,...,ξǫ 0}. Clearly, with ǫ = 0, we have J(, πn 0 ) J(, π N ). The goal of this section is to derive an upper bound for the cost of π ǫ N with a nontrivial ǫ. We first need to bound the difference between the approximate value function V ǫ k and the exact value function V k. Theorem Under Assumption 2.5.1, we have V ǫ k (1 + ǫc) V k for all ǫ 0 and (1 ǫc) V k V ǫ k for all ǫ [0, 1/c]. Proof The result clearly holds for k = 0 because V ǫ 0 V 0. Suppose it is true for some k Z + ; we shall show it holds for k + 1. For a fixed z X, we have Hence, T [Vk ǫ ǫ ](z) = inf {L(z, u, v) + Vk (f v (z, u))} (u,v) Γ(z) inf {L(z, u, v) + V k (f v (z, u)) + ǫcv k (f v (z, u))}. (u,v) Γ(z) V ǫ k+1(z) = R ǫ T [V ǫ k ](z) T [V ǫ k ](z) + ǫe(z) inf (u,v) Γ(z) (1 + ǫc)v k+1 (z). {(1 + ǫc) (L(z, u, v) + V k (f v (z, u)))} Similarly, we can show that V ǫ k+1 (z) (1 ǫc)v k+1(z) for all ǫ [0, 1/c]. The desired result then follows by induction. With this theorem, we can bound J(, πn ǫ ) in terms of the closed-loop trajectory x( ; z, π ǫ N ). Lemma Under Assumption 2.5.1, J(z, π ǫ N ) (1 + ǫc)v N(z) + N 1 t=0 2ǫE(x(t; z, πn ǫ )), z X, and ǫ 0

36 23 Proof Fix an arbitrary z X. Let ˆx( ) be the closed-loop trajectory driven by π ǫ N with initial state z and let (û( ), ˆv( )) be the corresponding hybrid-control sequence. Define Ṽ k+1 ǫ T ξk ǫ[v k ǫ ], for k = 0,..., N 1. N 1 J(z, πn ǫ ) = ψ(ˆx(n)) + t=0 N 1 = ψ(ˆx(n)) + t=0 Ṽ N ǫ ǫ (z) + ψ(ˆx(n)) V L(ˆx(t), û(t), ˆv(t)) ] ǫ [Ṽ N t(ˆx(t)) VN (t+1)(ˆx(t ǫ + 1)) N 1 0 (ˆx(N)) + t=1 [Ṽ ǫ N t (ˆx(t)) V ǫ N t (ˆx(t)) ]. Notice that ψ V ǫ 0, V ǫ N t R ǫ T [V ǫ N (t+1) ] and by (2.12), Ṽ ǫ N t T[V ǫ N (t+1) ]+ǫe. Hence, Ṽ N t ǫ T[V N (t+1) ǫ ] + 2ǫE and J(z, π ǫ N) Ṽ ǫ N(z) + N 1 t=1 (1 + ǫc)v N (z) + 2ǫE(ˆx(t)) N 1 t=0 2ǫE(ˆx(t)). The above upper bound depends on the closed-loop trajectory x( ; z, πn ǫ ). The following lemma provides an upper bound for x( ; z, πn ǫ ), which decays to zero exponentially fast. Lemma Suppose that Assumptions and hold. Then, with ǫ < 1/c, we have where α ǫ := x(t; z, π ǫ N) α ǫ γ t ǫ z, for t = 0,...,N, β+ and γ β L (1 ǫc) ǫ := β + (1+ǫc) [β + (1+ǫc)+β L ](1 ǫc).

37 24 Proof VN t ǫ (ˆx(t)) V N (t+1) ǫ (ˆx(t + 1)) Ṽ N t ǫ (ˆx(t)) V N (t+1) ǫ (ˆx(t + 1)) ǫe(ˆx(t)) = L(ˆx(t), û(t), ˆv(t)) ǫe(ˆx(t)) β L ˆx(t) ǫe(ˆx(t)) β L β + V N t (ˆx(t)) ǫe(ˆx(t)) β L β + (1 + ǫc) V N (t+1) ǫ (ˆx(t + 1)) ǫe(ˆx(t)). β L β + (1 + ǫc) V ǫ N t(ˆx(t)) ǫe(ˆx(t)) Hence, ( VN (t+1)(ˆx(t ǫ β ) 1 ( L + 1)) 1 + V ǫ β + (1 + ǫc) N t(ˆx(t)) + ǫe(ˆx(t)) ) β + (1 + ǫc) ǫ [β + (1 + ǫc) + β L ](1 ǫc)v N t(ˆx(t)) ( ) β + t+1 (1 + ǫc) [β + (1 + ǫc) + β Vt ǫ L ](1 ǫc) (z) ( ) β + β + t+1 (1 + ǫc) [β + (1 + ǫc) + β z. L ](1 ǫc) Therefore, ˆx(t) 1 V β N t (ˆx(t)) α ǫ γǫ z. t L Combining Lemmas and yields the main result of this section. Theorem Under the same conditions and notations as in Lemma 2.6.2, we have J(z, π ǫ N ) V N(z) + ǫcβ + ( 1 + 2α ) ǫ z, z X, 1 γ ǫ where α ǫ and γ ǫ are the constants defined in Lemma Proof The result follows directly from Lemma and Two points about Theorem are worth mentioning. Firstly, by reducing the relaxation parameter ǫ, the performance of the relaxed policy π ǫ N can be made arbitrarily close to the optimal cost V N (z). Secondly, the error bound does not depends on the horizon length N, implying the existence of an ǫ that works uniformly well for all the horizon lengths.

38 Performance of Relaxed Policy in Infinite Horizon The optimal infinite-horizon control law ξ is the one that satisfies the Bellman equation T ξ [V ] = V. The computation of this law requires the knowledge of V and needs to solve the optimization problem associated with T, both of which are often numerically challenging. A natural way to tackle these computational issues is to use ξ ǫ k in place of ξ and construct a stationary infinite-horizon policy π ǫ,k := {ξ ǫ k, ξǫ k,...}. The goal of this subsection is to establish conditions under which the policy π ǫ,k a suboptimal performance. Following a similar strategy as in Section 2.6, we first derive an upper bound for J(, π ǫ,k ) in terms of the closed-loop trajectory driven by πǫ,k. Lemma For each k Z +, ǫ 0 and z X, the cost of the policy π ǫ,k bounded by: J(z, π ǫ,k ) V (z) + ( [ 3ǫcβ + + α ) ] V γ k+1 V z + (1 + 3ǫc)β + ˆx(t), where α V and γ V are the constants defined in Theorem t=1 has is upper

39 26 Proof Fix an arbitrary z X, let ˆx(t) = x(t; z, π ǫ,k ) for t Z +, and let (û( ), ˆv( )) be the corresponding hybrid-control sequence. Then, J(z, π ǫ,k ) = L(ˆx(t), û(t), ˆv(t)) = t=0 T ξ ǫ k [Vk ǫ ](z) + t=1 T ξ ǫ k [Vk ǫ ](ˆx(t)) V k ǫ (ˆx(t + 1)) t=0 T ξ ǫ k [V ǫ k ](ˆx(t)) V ǫ k (ˆx(t)) [ ] T [Vk ǫ ](z) + ǫe(z) + T [Vk ǫ ](ˆx(t)) V k ǫ (ˆx(t)) + ǫe(ˆx(t)) t=1 [ Vk+1 ǫ (z) + 2ǫE(z) + t=1 (1 + ǫc)(v (z) + αγ k+1 z ) + 2ǫE(z) [ ] + (1 + ǫc)β + ˆx(t) + 2ǫE(ˆx(t)) t=1 V ǫ k+1 (ˆx(t)) V ǫ k (ˆx(t)) + 2ǫE(ˆx(t)) ] V (z) + ( [ ] 3ǫcβ + + αγk+1) z + (1 + 3ǫc)β + ˆx(t). t=1 The above lemma indicates that the stability of the closed-loop trajectory is necessary for the boundedness of J(, π ǫ,k ). Under Assumption 2.3.1, πǫ,k to be exponentially stabilizing for sufficiently large k and sufficiently small ǫ. is guaranteed Lemma Suppose Assumptions and holds. Then there exist constants ˆk Z +, ˆǫ > 0 and κ 3 > 0 such that for all k ˆk and ǫ ˆǫ, the approximate value function V ǫ k is an ECLF satisfying V ǫ k (z) V ǫ k (f ν ǫ k (z) (z, µ ǫ k (z)) κ 3 z, z X, (2.13) and the closed-loop trajectory driven by π ǫ,k ( x(t; z, π ǫ,k ) β+ (1 + ǫc) (1 ǫc) β L satisfies κ 3 β + (1+ǫc) ) t z, z X, t Z +.

40 27 Proof Fix an arbitrary z X and k Z ++. Let (û, ˆv) = ξk ǫ (z). By Lemma and Theorem 2.6.1, β ǫ L (1 ǫc) z Vk β+ (1 + ǫc) z. Furthermore, by Theorem and Theorem 2.6.1, we have that, Hence, V ǫ k (z) (1 ǫc)v k (z) (1 ǫc)(v k+1 (z) αγ k z ) 1 ǫc 1 + ǫc V ǫ k+1(z) (1 ǫc)αγ k z 1 ǫc 1 + ǫc T [V ǫ k ](z) (1 ǫc)αγk z 1 ǫc 1 + ǫc ǫe(z). V ǫ k (z) V ǫ k (fˆv (z, û)) T [Vk ǫ ](z) Vk ǫ (fˆv (z, û)) 1 ǫc 1 + ǫc ǫe(z) (1 ǫc)αγk z 2ǫc 1 + ǫc T [V k ǫ ](z) T ξ ǫ k [Vk ǫ ](z) V k ǫ (fˆv(z, û)) 1 ǫc 1 + ǫc ǫe(z) (1 ǫc)αγk z 2ǫc 1 + ǫc T [V k ǫ ](z) ǫe(z) β 1 ǫc L z 1 + ǫc ǫcβ+ z (1 ǫc)αγk z 2ǫc 1 + ǫc (1 + ǫc)β+ z ǫcβ+ z. Clearly, for sufficiently small ǫ and sufficiently large k, the right-hand side can be made larger than κ 3 z for some κ 3 > 0. In this case, V ǫ k with a stabilizing policy π ǫ,k becomes an ECLF of system (2.1) and the desired result follows from Theorem Theorem (Bound for J(, π ǫ,k )) Under Assumptions and 2.5.1, there exist ǫ > 0, k < and η(ǫ, k) < such that J(z; π ǫ,k ) V (Z) + η(ǫ, k) z, z Z +. Furthermore, η(ǫ, k) 0 as k and ǫ 0. Proof The result follows directly from Lemma and Lemma

41 28 A detailed expression for the bound η(ǫ, k) can be derived based on Lemma and Lemma This expression can often be dramatically simplified when a particular ADP problem is being considered. It is worth mentioning that the worst-case scenario is assumed throughout the derivation of this section, making the bound η(ǫ, k) overly conservative for some applications. On the other hand, the convergence property of η(ǫ, k) is of fundamental importance. It shows that Assumptions and are the right conditions for the ADP algorithm to work properly in the sense that with sufficiently small numerical relaxation ǫ, the ǫ-relaxed value iteration can eventually yield a policy with any predefined suboptimal performance.

3. EXPONENTIAL STABILIZATION USING APPROXIMATE DYNAMIC PROGRAMMING

In this chapter, we study the exponential stabilization problem of system (2.1). Different from most earlier studies, our strategy is to establish an equivalent connection between the exponential stabilization problem and a properly-defined infinite-horizon optimal control problem. Such a connection allows us to solve the stabilization problem by solving the corresponding optimal control problem using the approximate dynamic programming (ADP) framework developed in the last chapter. Notice that, unless otherwise stated, all the notations of the last chapter keep the same meaning in the current one.

3.1 Stabilization Problems

We consider a general discrete-time switched nonlinear system described by

$$x(t + 1) = f_{v(t)}(x(t), u(t)), \quad t \in \mathbb{Z}_+, \qquad (3.1)$$

with continuous state space X, continuous control space U, discrete control space M, and control constraint set Γ. See Section 2.1 for detailed definitions of these spaces. We also assume that system (3.1) satisfies the assumptions on system (2.1) listed in Section 2.1. Using the same notations as in Section 2.1, the mapping ξ_t = (µ_t, ν_t) : X → U × M with ξ_t(z) ∈ Γ(z) for all z ∈ X is called a hybrid-control law, and a sequence of hybrid-control laws constitutes an infinite-horizon policy π_∞ = {ξ_0, ξ_1, ...}. The policy that consists of the same control law at each time t is called a stationary policy. The set of all infinite-horizon policies is denoted by Π_∞. The closed-loop trajectory driven by a policy π_∞ = {(µ_t, ν_t)}_{t∈Z_+} with initial condition z ∈ X evolves according to

$$x(t + 1) = f_{\nu_t(x(t))}(x(t), \mu_t(x(t))), \quad t \in \mathbb{Z}_+. \qquad (3.2)$$

Denote by {x(t; z, π_∞)}_{t∈Z_+} the closed-loop trajectory driven by π_∞ with initial condition z ∈ X, and let {(u(t; z, π_∞), v(t; z, π_∞))}_{t∈Z_+} be the corresponding hybrid-control sequence. Let ‖·‖ be the shorthand notation for ‖·‖_p^r with p ∈ Z_++ and r ∈ R, where ‖·‖_p denotes the L_p norm in R^n. In this chapter, we will focus on two types of stabilization problems. We first give a formal definition of the exponential stability of the closed-loop system.

Definition (Exp. Stable): The origin of system (3.2) is called (globally) exponentially stable if there exist constants b ≥ 1 and 0 < a < 1 such that ‖x(t)‖ ≤ b a^t ‖z‖, for all t ∈ Z_+ and z ∈ X.

Definition (Exp. Stabilizable): The system (3.1) is called (globally) exponentially stabilizable if there exists a feedback policy π_∞ under which the closed-loop system (3.2) is (globally) exponentially stable. Such a policy π_∞ is called exponentially stabilizing.

Various definitions of exponential stabilizability are used in the control literature. These definitions differ from one another mainly in their assumptions on the stabilizing policies. For example, some definitions may require the stabilizing policy to be stationary, and/or to have a continuous-control law that varies continuously (or even smoothly) with respect to the system state. Definition 3.1.2 is the most general definition of exponential stabilizability in the sense that it does not impose any additional constraints on the stabilizing policy. However, in practice, one may only be interested in stabilizing policies that possess some nice properties. In the following, we introduce a less general stabilizability definition which requires the total continuous-control energy to be bounded.

Definition (Exp. Stabilizable With Finite Energy): The system (3.1) is called (globally) exponentially stabilizable with finite energy if there exists a policy π_∞ under which the closed-loop system (3.2) is (globally) exponentially stable and the total control energy satisfies Σ_{t=0}^{∞} ‖u(t; z, π_∞)‖ ≤ β_u ‖z‖, for some β_u < ∞ and all z ∈ X.

Definition 3.1.2 and Definition 3.1.3 lead to the following two different exponential stabilization problems.

Problem 3.1.1 Find an infinite-horizon policy that satisfies the conditions in Definition 3.1.2.

Problem 3.1.2 Find an infinite-horizon policy that satisfies the conditions in Definition 3.1.3.

The rest of this chapter is devoted to solving the above two problems through the solutions of two properly defined optimal control problems using the ADP approach.

3.2 Solution via Approximate Dynamic Programming

Following the notations of Chapter 2, let L : X × U × M → R_+ be a running cost function satisfying L(0, 0, v) = 0 for all v ∈ M. The cost associated with any infinite-horizon policy π_∞ is defined by

J(z, π_∞) = ∑_{t=0}^∞ L(x(t; z, π_∞), u(t; z, π_∞), v(t; z, π_∞)), z ∈ X.

Let T be the value iteration operator defined in Chapter 2 and define the k-horizon value function (with trivial terminal cost) by V_k = T^k[0], for k ∈ Z_+. Let R_ǫ be a relaxation operator as in Definition 2.5.2 with parameter ǫ ≥ 0 and error function E. Assume that E satisfies the corresponding assumption of Chapter 2 with a constant c < ∞, i.e., E ≤ cL, and assume that there exists a control law ξ_k^* satisfying V_{k+1} = T_{ξ_k^*}[V_k] for all k ∈ Z_+. For each ǫ ≥ 0, define the approximate value functions {V_k^ǫ}_{k∈Z_+} recursively according to V_0^ǫ = 0 and V_{k+1}^ǫ = R_ǫ T[V_k^ǫ], for k ∈ Z_+. This iteration is a special case of (2.11) with ψ ≡ 0. For each ǫ ≥ 0 and k ∈ Z_+, let ξ_k^ǫ be a relaxed hybrid-control law (generated by V_k^ǫ) satisfying

T[V_k^ǫ] ≤ T_{ξ_k^ǫ}[V_k^ǫ] ≤ T[V_k^ǫ] + ǫE,

45 32 and define π ǫ,k = {ξǫ k, ξǫ k,...}. We shall show that with a properly-chosen running cost function, the policy π ǫ,k will be a solution to Problem or for sufficiently large k and sufficiently small ǫ. For this purpose, we consider the following two types of running cost functions β L z L(z, u, v) β+ L z, z X, (u, v) U M, (3.3) β L z + β L,u u L(z, u, v) β+ L z + β+ L,u u, z X, (u, v) U M, (3.4) where β L, β+ L, β L,u and β+ L,u are some positive finite constants. Notice that with ψ 0 and L satisfying either (3.3) or (3.4), conditions (ii) and (iii) of Assumption are both verified. Since we have also assumed that condition (iv) of Assumption holds, we only need to check the first condition of Assumption in order to use the main results of Chapter 2. It turns out that this condition is a consequence of certain stabilizability condition. Lemma The infinite-horizon value function satisfies V (z) β + V z for some β + V < and all z X if either one of the following two conditions holds: 1. System (2.1) is exponentially stabilizable and L satisfies (3.3); 2. System (2.1) is exponentially stabilizable with finite energy and L satisfies (3.4); Proof 1. Let π be an arbitrary stabilizing policy such that x(t; z, π ) ba t z for some b 1 and a (0, 1). For simplicity, let ˆx(t) = x(t; z, π ), t Z +, and let (û( ), ˆv( )) be the corresponding closed-loop control sequence. Then we have V (z) J(z, π ) = t=0 L(ˆx(t), û(t), ˆv(t)) β+ L b 1 a z. 2. Following the notation of part 1 and according to Definition 3.1.3, we have t=0 û(t) β u z for some β u <. Then, V (z) J(z, π ) = L(ˆx(t), û(t), ˆv(t)) t=0 ( β + L b ) 1 a + β+ L,u z.

46 33 With the above lemma, we can obtain the following two results. 1. The optimal control problem inf π Π J(z, π ) with L satisfying (3.3) verifies Assumption if system (3.1) is exponentially stabilizable. 2. The optimal control problem inf π Π J(z, π ) with L satisfying (3.4) verifies Assumption if system (3.1) is exponentially stabilizable with finite energy. These results allow us to solve Problems or using ADP algorithm with properly-chosen running cost functions. Theorem With L satisfying (3.3), system (2.1) is exponentially stabilizable if and only if there exist constants ˆk Z + and ˆǫ > 0 such that V ǫ k system (3.1) with a stabilizing policy π ǫ,k for all k ˆk and ǫ ˆǫ. is an ECLF of Proof The if direction follows from Theorem The other direction follows from Lemma and Lemma Similarly, we can use π ǫ,k if the running cost function L satisfies (3.4). to exponentially stabilize system (2.1) with finite energy Theorem With L satisfying (3.3), system (2.1) is exponentially stabilizable with finite energy if and only if there exist constants ˆk Z + and ˆǫ > 0 such that V ǫ k is an ECLF of system (3.1) with a stabilizing policy π ǫ,k Proof Sufficiency: Suppose that V ǫ k policy π ǫ,k for all k ˆk and ǫ ˆǫ. is an ECLF of system (3.1) with a stabilizing for some k Z + and ǫ > 0. Then V ǫ k (z) κ 2 z for all z X and some κ 2 <. In addition, according to Theorem 2.4.2, we know π ǫ,k is exponentially stabilizing. Fix an arbitrary z X. Let ˆx( ) be the closed-loop trajectory driven by π ǫ,k with initial state z and let (û( ), ˆv( )) be the corresponding hybrid-control sequence. It remains to show that t=0 û(t) β+ u z for some finite β+ u. By (3.4), we have û(t) 1 L(ˆx(t), û(t), ˆv(t)) 1 V β L,u β k ǫ (ˆx(t)) κ 2 ˆx(t), L,u β L,u

where the second inequality follows from the stationarity of π_∞^{ǫ,k}. Let b < ∞ and a ∈ (0, 1) be the constants such that ‖x̂(t)‖ ≤ b a^t ‖z‖ for all t ∈ Z_+. Then ∑_{t=0}^∞ ‖û(t)‖ ≤ (κ_2 b)/(β_{L,u}^- (1 − a)) ‖z‖ and the desired result is proved.

Necessity: follows from the boundedness lemma of Section 3.2 together with the corresponding lemma of Chapter 2.

Therefore, as we increase k and reduce ǫ, the policy π_∞^{ǫ,k} will eventually become a solution to Problem 3.1.1 or Problem 3.1.2, depending on the chosen running cost function L. The terminating condition of this procedure involves testing whether V_k^ǫ is an ECLF or not. Such a test can be done by checking the two conditions in (2.8).

The above discussion suggests a general way to solve the exponential stabilization problems using the ADP approach. Suppose that we have chosen an error function satisfying the corresponding assumption of Chapter 2 and a running cost function satisfying (3.3) (for Problem 3.1.1) or (3.4) (for Problem 3.1.2). To find the stabilizing policy, we start with a reasonable guess of ǫ and perform the ǫ-relaxed value iteration (2.11) with a trivial terminal cost function ψ ≡ 0. After each iteration, we check whether the current approximate value function V_k^ǫ is an ECLF. If so, then the policy π_∞^{ǫ,k} defined by V_k^ǫ is the stabilizing policy we seek; otherwise, we continue with iteration (2.11). If V_k^ǫ fails to be an ECLF even for large k, we reduce ǫ and restart iteration (2.11) from k = 0. This procedure is summarized in Algorithm 1. The constants ǫ_min and k_max in the algorithm limit the computational effort dedicated to computing the stabilizing policy. The main advantage of the ADP-based stabilization method is that it is guaranteed to yield a stabilizing policy (with finite energy) whenever the system is exponentially stabilizable (with finite energy).

Algorithm 1 (Stabilization Via ADP)
Require: ǫ_min > 0, k_max < ∞
1: Set V_0^ǫ = 0
2: while ǫ > ǫ_min do
3:   for k = 1 to k_max do
4:     V_k^ǫ = R_ǫ T[V_{k−1}^ǫ]
5:     if V_k^ǫ satisfies (2.8) then
6:       stop and return ξ_k^ǫ generated by V_k^ǫ through (2.12)
7:     end if
8:   end for
9:   Reduce ǫ
10: end while
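A schematic Python rendering of Algorithm 1 may help fix ideas; it is only a sketch in which relax_T stands for the composite operator R_ǫ T of Chapter 2, is_eclf for the test of the two conditions in (2.8), and extract_law for the construction (2.12), none of which are implemented here.

def stabilize_via_adp(relax_T, is_eclf, extract_law,
                      eps0=1.0, eps_min=1e-4, k_max=50, shrink=0.1):
    eps = eps0
    while eps > eps_min:
        V = lambda z: 0.0                  # V_0^eps = 0 (trivial terminal cost)
        for _ in range(k_max):
            V = relax_T(V, eps)            # V_k^eps = R_eps T [V_{k-1}^eps]
            if is_eclf(V):                 # current iterate is an ECLF
                return extract_law(V, eps) # relaxed law xi_k^eps via (2.12)
        eps *= shrink                      # reduce eps and restart from k = 0
    return None                            # budget exhausted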

4. SWITCHED LQR PROBLEMS IN DISCRETE TIME

In this chapter, we study an important special case of the optimal control problem formulated in Chapter 2, with linear subsystem dynamics and quadratic cost functions. Such a problem is a natural extension of the classical LQR problem to the switched linear system case, and is thus called the discrete-time switched LQR (DSLQR) problem. The problem is of fundamental importance both in theory and in practice and has challenged researchers for many years. The bottleneck lies mostly in the determination of the optimal mode sequence, which in general is an NP-hard problem. However, our analysis in this chapter shows that a large number of DSLQR problems can be efficiently solved using the approximate dynamic programming approach developed in Chapter 2.

4.1 Backgrounds and Our Contribution

The last few years have seen increasing interest in using DP to solve various optimal control problems of switched systems. In [44], Xu and Antsaklis used DP to study the continuous-time switched LQR problem and developed an algorithm to find the suboptimal switching instants and continuous control for a fixed switching sequence. In [45], Rantzer and Johansson derived lower and upper bounds for the value function of the quadratic optimal control problem of piecewise affine systems; these bounds were then used to construct a suboptimal control strategy. A discrete-time version of this problem was studied by Borrelli et al. in [46, 47], where the value function and the optimal control law were proved to be piecewise quadratic and piecewise linear, respectively. Based on these structural properties, an algorithm using multi-parametric programming was developed to compute the optimal feedback control law. More recently, Lincoln and Rantzer developed a general relaxation procedure in [31] to tackle the curse of dimensionality of DP. This procedure was also

50 37 employed to study the infinite-horizon DSLQR problem in [31,48] and the quadratic optimal control problem of continuous-time switched homogeneous systems in [49]. One contribution of the results presented in this chapter is the analytical characterization of both the value function and the optimal control strategies for general DSLQR problems (See Section 4.3). In particular, we show that the value function of the DSLQR problem is the pointwise minimum of a finite number of quadratic functions. These quadratic functions can be exactly characterized by a finite set of positive semidefinite (p.s.d.) matrices, which can be obtained recursively using the so-called Switched Riccati Mapping. Explicit expressions are derived for both the optimal switching law and the optimal continuous control law. Both of them are of state-feedback form and are homogeneous on the state space. Furthermore, the optimal continuous control is shown to be piecewise linear with different Kalman-type feedback gains within different homogeneous regions of the state space. Although other researchers have also suggested a piecewise affine structure for the optimal feedback control [47, 50, 51], the analytical expression of the optimal feedback gain and in particular its connection with the Kalman gain and the Riccati equation of the classical LQR problem have not been explicitly presented. Another contribution of this chapter is the development of a general relaxation strategy to compute and analyze some suboptimal solutions to the finite-horizon and infinite-horizon DSLQR problems (See Sections 4.4 and 4.5). This strategy can be viewed as a particularization of the general approximate dynamic programming framework developed in Chapter 2. The relaxation strategy also induces a particular relaxed value iteration whose performance can be analyzed using the results of Chapter 2. Due to the special structure of the DSLQR problem, we are able to characterize the relaxed value iteration using the so-called relaxed switched Riccati mapping, which can be efficiently computed using convex optimization. Moreover, detailed bounds in terms of the linear subsystem matrices are derived for the obtained finite-horizon and infinite-horizon suboptimal policies.

4.2 Problem Formulation

We consider an important special case of system (2.1) with continuous state space X = R^n, continuous control space U = R^p, discrete control space M = {1, ..., M}, a trivial hybrid-control constraint set Γ(z) = R^p × M for all z ∈ R^n, and linear subsystems f_v(z, u) = A_v z + B_v u, for z ∈ R^n, u ∈ R^p, v ∈ M. For each i ∈ M, A_i and B_i are constant matrices of appropriate dimensions. For a given hybrid-control sequence (u, v), the system evolves according to

x(t + 1) = A_{v(t)} x(t) + B_{v(t)} u(t), t ∈ Z_+. (4.1)

Under a control policy π_∞ = {ξ_0, ξ_1, ...} with ξ_t = (µ_t, ν_t), the closed-loop dynamics is governed by

x(t + 1) = A_{ν_t(x(t))} x(t) + B_{ν_t(x(t))} µ_t(x(t)), t ∈ Z_+. (4.2)

For any π_∞, denote by x(·; z, π_∞) the closed-loop trajectory driven by π_∞ with initial state z ∈ R^n and let (u(·; z, π_∞), v(·; z, π_∞)) be the corresponding closed-loop hybrid-control sequence. The closed-loop trajectory and the hybrid-control sequence under a finite-horizon policy π_N are defined in a similar way. Throughout this chapter, ‖·‖ denotes the Euclidean norm of a given vector or matrix. Let Q_f = Q_f^T ⪰ 0, Q_v = Q_v^T ≻ 0 and R_v = R_v^T ≻ 0 be weighting matrices of appropriate dimensions for which the following terminal and running cost functions are well defined:

ψ(x) = x^T Q_f x,  L(x, u, v) = x^T Q_v x + u^T R_v u,  x ∈ R^n, u ∈ R^p, v ∈ M.

For an N-horizon policy π_N with N ∈ Z_+, the cost J(·, π_N) : R^n → R_+ is as defined in (2.4). The cost of an infinite-horizon policy π_∞ is denoted by J(·, π_∞) : R^n → R_+ ∪ {∞} and is defined in (2.3). The goal of this chapter is to solve the following problem.

Problem (DSLQR problem) Given N ∈ Z_+ ∪ {∞}, find the N-horizon policy π_N ∈ Π_N that minimizes J(z, π_N) for all z ∈ R^n.
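As a small illustration of the cost being minimized, the sketch below (not part of the dissertation) rolls out an N-horizon policy on (4.1)/(4.2) and accumulates the quadratic stage and terminal costs; a "policy" here is simply a list of maps z ↦ (u, v) applied in time order, which is only a convenient stand-in for the policy objects used in the text.

import numpy as np

def dslqr_cost(policy, z, A, B, Q, R, Qf):
    """Accumulate sum_t x'Q_v x + u'R_v u plus the terminal cost x'Q_f x."""
    x, cost = np.asarray(z, dtype=float), 0.0
    for xi in policy:                      # one hybrid-control law per step
        u, v = xi(x)
        cost += float(x @ Q[v] @ x + u @ R[v] @ u)
        x = A[v] @ x + B[v] @ u            # closed-loop update (4.2)
    return cost + float(x @ Qf @ x)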

This problem is a natural extension of the classical LQR problem to the switched linear system case and is thus called the Discrete-time Switched LQR problem, hereafter referred to as the DSLQR problem. Following the notations used in Chapter 2, we define

V_N(z) = inf_{π_N ∈ Π_N} J(z, π_N), N ∈ Z_+, z ∈ R^n,    V_∞(z) = inf_{π_∞ ∈ Π_∞} J(z, π_∞), z ∈ R^n,

where Π_N and Π_∞ are the sets of all N-horizon and infinite-horizon policies, respectively.

4.3 Value Function and Optimal Policy

An important feature of the DSLQR problem is that when N is finite, the value function V_N and its corresponding optimal policy can be characterized analytically. The key idea is to generalize the classical difference Riccati recursion to the switched linear system case. Let A be the semidefinite cone, namely, the set of all p.s.d. matrices, and introduce the Riccati mapping ρ_i : A → A for subsystem i ∈ M as

ρ_i(P) = Q_i + A_i^T P A_i − A_i^T P B_i (R_i + B_i^T P B_i)^{−1} B_i^T P A_i. (4.3)

Let K_i(P) be the Kalman gain of subsystem i ∈ M with matrix P ∈ A, i.e.,

K_i(P) := (R_i + B_i^T P B_i)^{−1} B_i^T P A_i. (4.4)

Definition 4.3.1 Let 2^A be the power set of A. The mapping ρ_M : 2^A → 2^A defined by

ρ_M(H) = {ρ_i(P) : for some i ∈ M and P ∈ H}, H ∈ 2^A,

is called the Switched Riccati Mapping (SRM) associated with the DSLQR problem. In words, the SRM maps a set of p.s.d. matrices to another set of p.s.d. matrices; each matrix in ρ_M(H) is obtained by taking the classical Riccati mapping of some matrix in H through some subsystem i ∈ M.
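The Riccati mapping (4.3), the Kalman gain (4.4) and the SRM translate directly into a few lines of numpy; the sketch below is illustrative only and assumes the subsystem data are given as a list of tuples (A_i, B_i, Q_i, R_i).

import numpy as np

def riccati(P, A, B, Q, R):
    """rho_i(P) = Q + A'PA - A'PB (R + B'PB)^{-1} B'PA for one subsystem."""
    G = np.linalg.inv(R + B.T @ P @ B)
    return Q + A.T @ P @ A - A.T @ P @ B @ G @ B.T @ P @ A

def kalman_gain(P, A, B, R):
    """K_i(P) = (R + B'PB)^{-1} B'PA."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def srm(H, subsystems):
    """Switched Riccati mapping rho_M: apply every rho_i to every P in H."""
    return [riccati(P, A, B, Q, R) for P in H for (A, B, Q, R) in subsystems]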

Definition 4.3.2 The sequence of sets {H_k}_{k=0}^N generated iteratively by H_{k+1} = ρ_M(H_k) with initial condition H_0 = {Q_f} is called the Switched Riccati Sets (SRSs) of the DSLQR problem.

The SRSs always start from the singleton set {Q_f} and evolve according to the SRM. For any finite N, the set H_N consists of at most M^N p.s.d. matrices. An important fact about the DSLQR problem is that its value functions are completely characterized by the SRSs.

Theorem 4.3.1 For k ∈ Z_+, the k-horizon value function of the DSLQR problem is

V_k(z) = min_{P ∈ H_k} z^T P z. (4.5)

Proof The theorem is proved by induction. It is obvious that for k = 0 the value function is V_0(z) = z^T Q_f z, satisfying (4.5). Now suppose equation (4.5) holds for some k ∈ Z_+, i.e., V_k(z) = min_{P ∈ H_k} z^T P z. We shall show that it also holds for k + 1. By Bellman's principle of optimality, the (k + 1)-horizon value function can be computed recursively as

V_{k+1}(z) = inf_{i ∈ M, u ∈ R^p} [ z^T Q_i z + u^T R_i u + V_k(A_i z + B_i u) ]
           = inf_{i ∈ M, P ∈ H_k, u ∈ R^p} [ z^T (Q_i + A_i^T P A_i) z + u^T (R_i + B_i^T P B_i) u + 2 z^T A_i^T P B_i u ]. (4.6)

Since the quantity inside the bracket is quadratic in u, the optimal u is easily found to be

u^* = −(R_i + B_i^T P B_i)^{−1} B_i^T P A_i z = −K_i(P) z, (4.7)

where K_i(P) is the matrix defined in (4.4). Substituting u^* into (4.6), we obtain V_{k+1}(z) = min_{i ∈ M, P ∈ H_k} z^T ρ_i(P) z. Observing that {ρ_i(P) : i ∈ M, P ∈ H_k} = ρ_M(H_k) = H_{k+1}, we have V_{k+1}(z) = min_{P ∈ H_{k+1}} z^T P z.

For each k ∈ Z_+, define

ξ_k^*(z) = (µ_k^*(z), ν_k^*(z)) = arg min_{(u,v) ∈ R^p × M} {L(z, u, v) + V_k(A_v z + B_v u)}, z ∈ R^n. (4.8)

54 41 Then according to a standard result of dynamic programming, the policy π N := {ξ N 1,...,ξ 0 } with N Z + is an optimal solution of the N-horizon DSLQR problem. In view of Theorem 4.3.1, the policy π N can also be characterized analytically. Corollary For any k Z +, the control law ξ k ( ξk (z) = (µ k (z), ν k (z)) = K i k (z)(pk (z))z, i k ), (z) defined in (4.8) is given by with (Pk(z), i k(z)) = arg min z T ρ i (P)z, z R n. (4.9) (P H k,i M) Remark Theorem is not a trivial variation of the results in [45,47], which deal with piecewise affine systems, where the mode sequence v(t) is determined by the evolution of the continuous state instead of being a decision variable independent of the continuous state as in the present DSLQR problem. Remark The piecewise quadratic structure of the value function has also been suggested in [31] for the infinite-horizon DSLQR problem. Compared with [31], the contribution of Theorem and Corollary lies in the explicit characterization of the value function in terms of the SRM and its connection to the Kalman gain and the Riccati equation of the classical LQR problem. Remark Theorem establishes a one-to-one correspondence between the function V k G + and the set H k 2 A. It is sometimes beneficial to view H k as a representation of V k in the space 2 A. From this perspective, ρ M becomes just a representation of the value iteration operator T in the space 2 A. Such a view point is illustrated in Fig Compared with the classical LQR problem, the value function of the DSLQR problem is no longer a single quadratic function; it becomes the pointwise minimum of a finite number of quadratic functions. At each time step, instead of having a single optimal-feedback gain for the entire state space, the optimal state feedback gain becomes state dependent. Furthermore, the minimizer (Pk (z), i k (z)) of equation (4.9) is radially invariant, indicating that at each time step all the points along

Fig. 4.1. Representations of the value iterations in G_+ and in 2^A.

the same radial direction have the same optimal hybrid-control law. These properties are illustrated in Fig. 4.2 using an example in R^2 with 2 subsystems: at each time step, the state space is decomposed into several homogeneous decision regions, each of which corresponds to a pair of optimal mode and optimal-feedback gain. In addition, all the gray homogeneous regions have the same optimal mode, say mode 2. It is worth mentioning that in a higher-dimensional state space, the homogeneous decision regions may become nonconvex and rather complicated. A salient feature of the DSLQR problem is that all these complex decision regions are completely encoded in a finite number of matrices in the SRSs.
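In code, recovering the value (4.5) and the optimal hybrid-control law of Corollary 4.3.1 from an SRS amounts to a finite minimization; the sketch below reuses the riccati and kalman_gain helpers from the earlier numpy sketch and is again only illustrative, not the dissertation's implementation.

import numpy as np

def value(H, z):
    """V_k(z) = min over P in H_k of z^T P z."""
    return min(float(z @ P @ z) for P in H)

def optimal_law(H, subsystems, z):
    """Return (u, i) = (-K_i(P) z, i) for the minimizer of z^T rho_i(P) z."""
    best = None
    for P in H:
        for i, (A, B, Q, R) in enumerate(subsystems, start=1):
            c = float(z @ riccati(P, A, B, Q, R) @ z)
            if best is None or c < best[0]:
                best = (c, -kalman_gain(P, A, B, R) @ z, i)
    return best[1], best[2]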

Fig. 4.2. Typical optimal decision regions of a two-switched system, where mode 1 is optimal within the white region and mode 2 is optimal within the gray region. The optimal mode region is divided into smaller homogeneous regions, each of which corresponds to a different optimal-feedback gain.

4.4 Efficient Exact Solution in Finite Horizon

As indicated by (4.5), in terms of computing the value function, it suffices to keep only those matrices in H_k that attain the minimum in (4.5) for at least one z ∈ R^n. Typically, although H_k is exponentially large, only a small portion of its matrices is actually useful for computing the value function. The rest of the matrices are redundant and can be removed without introducing any error. This is the key idea of our efficient algorithm.

57 Algebraic Redundancy and Equivalent Subsets To formalize the above idea, we introduce a few definitions. Definition (Algebraic Redundancy) A matrix P H is called (algebraic) redundant if for any z R n, there exists a matrix P H such that P P and z T Pz z T Pz. If P H is redundant, then H and H \ { P } define the same value function and thus are equivalent. Definition (Equivalent Set) Let H and Ĥ be two sets of p.s.d matrices. The set H is called equivalent to Ĥ, denoted by H Ĥ, if and only if min P H z T Pz = min ˆP Ĥ zt ˆPz, z R n. Therefore, two sets of p.s.d. matrices are equivalent if they define the same value functions of the DSLQR problem. The following lemma provides a test for the equivalent subsets of H k. Lemma Ĥ is an equivalent subset of H if and only if 1. Ĥ H 2. P H and z R n, there exists a ˆP Ĥ such that zt ˆPz z T Pz. Proof (sufficiency): We need to prove min P H z T Pz = min ˆP Ĥ zt ˆPz, z R n. Obviously, min P H z T Pz min ˆP Ĥ zt ˆPz, z R n as Ĥ H. On the other hand, the second condition of this lemma implies min ˆP Ĥ zt ˆPz minp H z T Pz, for all z R n. (necessity): straightforward by a standard contradiction argument. Remark Lemma can be used as an alternative definition of the equivalent subset. Although the original definition is conceptually simpler, the conditions given in this lemma provide a more explicit characterization of the equivalent subset, which will be useful in the subsequent discussions.

58 45 To ease the computation of the value function, we are interested in finding an equivalent subset of H k with as few elements as possible. Definition (Minimum Equivalent Subset (MES)) Let H and Ĥ be two sets of symmetric p.s.d matrices. Ĥ is called an equivalent subset (ES) of H if Ĥ H and Ĥ H. Furthermore, Ĥ is called a minimum equivalent subset (MES) of H if it is the equivalent subset of H with the fewest elements. Note that the MES of H may not be unique. Denote by Σ(H) one of the MESs of H Computation of (Minimum) Equivalent Subsets To simplify the computation of V k at each step k, we shall prune out as many redundant matrices as possible and obtain an equivalent subset of H k as close as possible to Σ(H k ). However, testing whether a matrix in H k is redundant or not is itself a challenging problem. Geometrically, any p.s.d. matrix P defines an ellipsoid (possibly degenerate) in R n : {x R n : x T Px 1}. It can be easily verified that P H k is redundant if and only if its corresponding ellipsoid is completely contained in the union of all the ellipsoids corresponding to the matrices in H k \ { P }. Since the union of ellipsoids is not convex in general, there is no efficient way to verify this geometric condition or equivalently the condition given in Definition Nevertheless, a sufficient condition for a matrix to be redundant can be easily obtained using the S-Procedure [52]. Lemma P is redundant in H k if there exist nonnegative constants {α j } H k 1 j=1 such that H k 1 j=1 α j = 1 and P H k 1 j=1 α j P (j), where {P (j) } H k 1 j=1 is an enumeration of H k \ { P }. Proof Suppose that the condition in this lemma holds. Then, for an arbitrary fixed z R n, we have z T Pz Hk 1 j=1 z T α j P (j) z z T P (jz) z, for some j z {1,..., H k 1}. Thus, by definition, P is redundant with respect to Hk.

For given P̂ and H_k, the condition in Lemma 4.4.2 can be efficiently verified by solving a convex optimization problem.

Lemma 4.4.3 The condition in Lemma 4.4.2 holds if and only if the solution of the following convex optimization problem satisfies ∑_{j=1}^{|H_k|−1} α_j ≥ 1:

max_{α_1, ..., α_{|H_k|−1} ≥ 0} ∑_{j=1}^{|H_k|−1} α_j   subject to: ∑_{j=1}^{|H_k|−1} α_j P^{(j)} ⪯ P̂. (4.10)

Proof The "only if" direction is trivial. To prove the other direction, let {α_j}_{j=1}^{|H_k|−1} be the solution of problem (4.10). Define α_0 = ∑_{j=1}^{|H_k|−1} α_j and α̂_j = α_j / α_0, j = 1, ..., |H_k|−1. If α_0 ≥ 1, then ∑_{j=1}^{|H_k|−1} α̂_j = 1 and ∑_{j=1}^{|H_k|−1} α̂_j P^{(j)} ⪯ ∑_{j=1}^{|H_k|−1} α_j P^{(j)} ⪯ P̂.

Problem (4.10) can be easily solved using various convex optimization algorithms [53, Chapter 11]. Based on the above two lemmas, an efficient algorithm (Algorithm 2) is developed to compute an ES for any given set H_k. In words, the algorithm simply removes all the matrices that satisfy the condition of Lemma 4.4.2 and returns the set consisting of the remaining matrices. Denote by Algo(H_k) the ES returned by Algorithm 2. Since the condition in Lemma 4.4.2 is only sufficient, Algo(H_k) may not be an MES, but it typically contains far fewer matrices than H_k. It is worth mentioning that in R^2 the exact MES of any H_k can be computed by directly partitioning the unit circle; the complexity of such an approach is prohibitive in high-dimensional state spaces and will not be discussed here. Interested readers are referred to [54] for more details.
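A possible cvxpy encoding of the test (4.10), together with the pruning loop performed by Algorithm 2 below, is sketched here; the numerical tolerance and the explicit symmetrization are implementation details that the text does not prescribe.

import cvxpy as cp

def is_redundant(P_hat, others):
    """Return True if the sufficient condition of Lemma 4.4.2 holds."""
    if not others:
        return False
    sym = lambda M: (M + M.T) / 2
    alpha = cp.Variable(len(others), nonneg=True)
    combo = sum(alpha[j] * sym(P) for j, P in enumerate(others))
    prob = cp.Problem(cp.Maximize(cp.sum(alpha)),
                      [sym(P_hat) - combo >> 0])   # sum_j alpha_j P^(j) <= P_hat
    prob.solve()
    return (prob.status == cp.OPTIMAL and prob.value is not None
            and prob.value >= 1 - 1e-9)

def prune(H):
    """Algorithm 2: keep P only if it is not redundant w.r.t. the kept set."""
    kept = []
    for P in H:
        if not is_redundant(P, kept):
            kept.append(P)
    return kept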

Algorithm 2 [ Algo(H_k) ]
Set Ĥ_k = ∅.
for each P ∈ H_k do
  if P does NOT satisfy the condition in Lemma 4.4.2 with respect to Ĥ_k then
    Ĥ_k = Ĥ_k ∪ {P};
  end if
end for
Return Ĥ_k

Overall Algorithm for Finite Horizon

Notice that the value iteration at step k + 1 depends only on V_k(z), and that removing the redundant matrices changes only the representation of V_k(z), not its actual value. These two facts guarantee that the redundant matrices removed at step k will not affect any value functions at later steps. The following lemma uses this property to embed the ES algorithm in the value iteration. Its basic idea is to remove the redundant matrices after each value iteration and then apply the next value iteration based on the obtained equivalent subset with fewer matrices.

Lemma 4.4.4 (ES Iteration) Let the sequence of sets {Ĥ_k}_{k=0}^N be generated by

Ĥ_0 = H_0, and Ĥ_{k+1} = Algo(ρ_M(Ĥ_k)), for 0 ≤ k ≤ N − 1. (4.11)

Then Ĥ_k is equivalent to Algo(H_k).

Proof We prove this lemma using induction. The result clearly holds for k = 0. Suppose that Ĥ_k is equivalent to Algo(H_k) for some general k. We shall show that Ĥ_{k+1} is equivalent to Algo(H_{k+1}). Since Algo(·) returns an equivalent subset of the given set, we have that Ĥ_{k+1} = Algo(ρ_M(Ĥ_k)) is equivalent to ρ_M(Ĥ_k), and that Algo(H_{k+1}) is equivalent to H_{k+1} = ρ_M(H_k). Thus, it suffices to show that ρ_M(Ĥ_k) is equivalent to ρ_M(H_k) under the condition that Ĥ_k is equivalent to H_k. It is easy

61 48 to verify that Ĥk must be a subset of H k. Thus, ρ M (Ĥk) is also a subset of ρ M (H k ). Hence, to prove the equivalence between ρ M (Ĥk) and ρ M (H k ), we only need to verify condition 2 in Lemma To this end, take an arbitrary matrix P ρ M (H k ) and an arbitrary point z R n. Then P = ρ i (P (j) (j) ) for some i M and P H k. Hence, k k z T Pz = z T ρ i (P (j) k [ )z ] =min z T Q i z + u T R i u + (A i z + B i u) T P (j) u k (A iz + B i u) [ ] min z T Q i z + u T R i u + (A i z + B i u) T (j) ˆP u k (A iz + B i u) =x T ρ i ( ˆP (j) k )x zt ˆPz. (for some ˆP ρm (Ĥk)) (for some ˆP (j) k Ĥk) where the first inequality follows from the fact that Ĥk is an equivalent subset of H k and Lemma Thus, ρ M (Ĥk) and ρ M (H k ) satisfy both conditions in Lemma and the desired result follows. Algorithm 3 (Algorithm for Finite-horizon DSLQR Problems) 1. Initialization: Set Ĥ0 = Q f. 2. ES Iteration: Compute {Ĥk} N k=0 using the iteration (4.11). 3. Value Function: V k (z) = min P Ĥ k z T Pz, z R n, k = 0,...,N. 4. Optimal Strategy: πn = {ξ N 1 (z),...,ξ 0 (z)}, where for k = 0,..., N 1, ( ξk (z) = Kîk (z) ( ˆP ) ( ) k (z))z, î k (z), with ˆPk (z), î k (z) = arg min z T ρ i (P)z. P Ĥk,i M In summary, to solve the DSLQR problem, we shall use (4.11) to obtain a sequence of sets {Ĥk} N k=0. By Lemma 4.4.4, {Ĥk} N k=0 define the same value functions as {H k} N k=0. By Corollary 4.3.1, the optimal strategies can also be computed based on {Ĥk} N k=0. This procedure of solving the finite-horizon DSLQR problem is summarized in Algorithm 3. A distinctive feature of this algorithm is that it computes the exact optimal

62 49 control strategy without any approximation. Algorithm 3 can efficiently solve many DSLQR problems with relatively large horizons, while the approach of enumerating the switching sequences always suffers from the combinatorial complexity Numerical Examples Analytical Example We first consider the following simple DSLQR problem for which an analytical solution is available for a verification purpose. A 1 = 0 1, A 2 = 0 0, Q 1 = 100 0, Q 2 = 0 0, Q f = 1 0,B 1 = B 2 = 0, R 1 = R 2 = 1, and N = 10; 0 1 It can be easily seen that the optimal mode sequence for the initial state x (1) 0 = [1, 0] T is {2, 1, 2, 1,..., 2, 1} and the corresponding optimal cost is 1. If the initial state is x (2) 0 = [0, 1] T, then the optimal cost remains the same, but the optimal mode sequence would be {1, 2, 1, 2..., 1, 2}. Let χ 1 = { r [cos(θ), sin(θ)] T R 2 : r 0 and θ [ π/4, π/4) [3π/4, 5π/4) } χ 2 = R 2 \ χ 1. By the symmetry of the problem, it can be easily seen that the optimal switchingcontrol law is: νk (x) = 1, if x χ 2 and νk (x) = 2 otherwise. With the analytical solution in mind, we now demonstrate how to obtain the same result by carrying out Algorithm 3. Initially, we have Ĥ0 = {Q f } = I 2. Taking the SRM yields H 1 = ρ M (Ĥ0) = 100 0,

63 50 None of the two matrices are redundant. Thus, Ĥ 1 =Algo(ρ M (Ĥ0))=H 1. Proceeding one more step, we have ρ M (Ĥ1) = 1 0, 100 0, 100 0, Obviously, the last two matrices are redundant. Thus, Ĥ 2 = 1 0, Continuing this process, we have, Ĥ k = , , k = 1,..., N. Then, using Step 4) of Algorithm 3, the same optimal policy as discussed in the last paragraph is obtained. This example shows that although the original SRSs {H k } N k=0 grow exponentially fast, their equivalent subsets {Ĥk} N k=0 can be made rather small and the optimal solution can be easily found using Algorithm 3. Numerical Example We next consider a more general example with the following matrices: A 1 = 2 1, A 2 = 2 1, A 3 = 3 1, A 4 = 3 1, B 1 = 1 1, B 2 = 1 2, B 3 = B 1, B 4 = B 2, Q i = Q f = I 2, R i = 1, i = 1,..., 4, and N = 20. This problem cannot be solved analytically. However, it can still be efficiently solved using Algorithm 3. The algorithm starts with Ĥ0 = {Q f }. After applying the SRM ρ M to Ĥ0, we have ρ M (Ĥ0 ) contains 4 matrices. One of these 4 matrices is algebraic

64 51 Fig Complexity of Algorithm 3 for Example redundant with respect to the rest and hence Ĥ1 contains 3 matrices. Continue one more step, we have ρ M (Ĥ1 ) contains 12 matrices. Using Algo( ) to this set, we found out that 3 matrices among these 12 are redundant and hence Ĥ2 contains 9 matrices. Keeping this iteration, we can compute the sets {Ĥk} 20 k=0 and the number of matrices in these sets are plotted in Fig Compared with the brute-force solution with combinatorial complexity on the order of 10 12, Algorithm 3 requires at most matrices to completely characterize the exact optimal strategy over the horizon Suboptimal Solution in Finite Horizon While Algorithm 3 can efficiently solve some DSLQR problems, it may still fail in many other cases. In fact, many DSLQR problems require a prohibitively large number of matrices to characterize their exact optimal solutions. Fortunately, suboptimal strategies are often acceptable in practice. In this chapter, we will further simplify the

65 52 computation by allowing some small errors in representing the value functions. This is certainly in line with the approximate dynamic programming framework introduced in Chapter 2. We first identify some important constants. Define λ Q =min i M {λ min(q i )}, λ + Q = max i M {λ max(q i )}, λ R =min i M {λ min(r i )}, λ + R = max i M {λ max(r i )}. Considering Assumption (2.3.1) and equations (3.3) and (3.4), we can easily see that β + L = λ+ Q, β L = λ Q, β L,u = λ R and β+ L,u = λ+ R Relaxed Switched Riccati Mapping We first generalize the redundancy and ES concepts to allow some error in representing the value functions. Definition (Numerical Redundancy) For any ǫ > 0, a matrix P H k is called (numerically) ǫ-redundant with respect to H k if min z T Pz min z T (P + ǫi n )z, for any z R n. P H k \ P P H k Definition (ǫ-es) The set H ǫ k is called an ǫ-equivalent-subset (ǫ-es) of H k if Hk ǫ H k and min z T Pz min z T Pz min z T (P + ǫi n )z, for any z R n. P H k P H ǫ k P H k Removing the ǫ-redundant matrices may introduce some error for the value function; but the error is no larger than ǫ for z 1. To simplify the computation of the value function, for a given tolerance ǫ, we want to prune out as many ǫ-redundant matrices as possible. Similar to Lemma 4.4.2, the following lemma provides a sufficient condition for testing the ǫ-redundancy for a given matrix. Lemma P is ǫ-redundant with respect to H k if there exist nonnegative constants {α j } H k 1 j=1 such that H k 1 j=1 α j = 1 and P + ǫi n H k 1 j=1 α j P (j), where {P (j) } H k 1 j=1 is an enumeration of H k \ { P }.

66 53 Algorithm 2 can be easily modified to compute an ǫ-es for a given set H k. Denote by Algo ǫ ( ) the ǫ-es of H k returned by the modified algorithm. We can remove the ǫ- equivalent matrices after each SRM resulting in a relaxed SRM and the corresponding relaxed SRSs. Definition For ǫ > 0, the composite mapping Algo ǫ ρ M : 2 A 2 A is called the ǫ-relaxed SRM of system (4.1). The sets {H ǫ k }N k=0 generated iteratively by: H0 ǫ = H 0, and Hk+1 ǫ = Algo ǫ(ρ M (Hk ǫ )), for 0 k N 1, (4.12) is called the ǫ-relaxed SRSs associated with system (4.1). It is beneficial to connect the above relaxation scheme with the general approximate dynamic programming framework developed in Chapter 2. As pointed out in Remark 4.3.3, the Riccati mapping ρ M is essentially a characterization of the value iteration operator of the DSLQR problem in the space 2 A. From this perspective, Algo ǫ becomes a particular representation of the relaxation operator defined in Definition 2.5.2, and thus the relaxed SRM induces a relaxed value iteration for the DSLQR problem. To formally define the relaxation operator for the DSLQR problem, it is convenient to introduce the following space of piecewise quadratic functions: { } G PQ := V = min P H zt Pz : for some finite set of p.s.d matrices H. For any function g G PQ defined by a set of p.s.d. matrices H, we define an operator R : G PQ G PQ by R ǫ [g](z) = min Algo ǫ(h) zt Pz, z R n, ǫ 0. (4.13) According to Definitions and 4.5.2, it can be easily verified that R ǫ is a relaxation operator with invariant space G PQ and error function E(z) = z, z R n. Furthermore, it can be easily seen that c = 1/λ Q, where c is the constant used in Assumption

With R_ǫ defined in (4.13), the composite operator R_ǫ T is the one-stage approximate value iteration for the DSLQR problem. Let V_0^ǫ ≡ ψ,

V_k^ǫ := (R_ǫ T)^k [ψ], and Ṽ_k^ǫ := T[V_{k−1}^ǫ], for k = 1, ..., N. (4.14)

The function V_k^ǫ defined above is called the k-horizon approximate value function of the DSLQR problem. The function Ṽ_k^ǫ is an auxiliary function that is useful in studying the properties of V_k^ǫ. Let ξ_k^ǫ be the hybrid-control law such that Ṽ_{k+1}^ǫ = T_{ξ_k^ǫ}[V_k^ǫ], i.e.,

ξ_k^ǫ(z) = (µ_k(z), ν_k(z)) = arg min_{(u,v)} {L(z, u, v) + V_k^ǫ(A_v z + B_v u)}, z ∈ R^n. (4.15)

Theorem 4.5.1 For each k = 1, ..., N and all z ∈ R^n, we have

V_k^ǫ(z) = min_{P ∈ H_k^ǫ} z^T P z,   Ṽ_k^ǫ(z) = min_{P ∈ ρ_M(H_{k−1}^ǫ)} z^T P z,   and
ξ_k^ǫ(z) = ( −K_{i_k^ǫ(z)}(P_k^ǫ(z)) z, i_k^ǫ(z) ),  where (P_k^ǫ(z), i_k^ǫ(z)) = arg min_{P ∈ H_k^ǫ, i ∈ M} z^T ρ_i(P) z. (4.16)

Proof The result follows easily by solving the quadratic optimization problem in (4.15) explicitly, followed by a standard induction argument.

The relationships between V_k^ǫ and Ṽ_k^ǫ and their connections to the relaxed SRSs are illustrated in Fig. 4.4. As will be demonstrated later through numerical examples, V_k^ǫ is usually much easier to compute than V_k because H_k^ǫ typically contains far fewer matrices than both H_k and Algo(H_k). In the next subsection, we show that the approximate value functions generate a suboptimal policy that can be efficiently computed.

Suboptimal Policy in Finite Horizon

Let π_N^ǫ = {ξ_N^ǫ, ..., ξ_1^ǫ} be the N-horizon policy generated by the approximate value functions {V_k^ǫ}_{k=0}^{N−1}. We want to derive an upper bound for the cost associated with this policy π_N^ǫ. For this purpose, the following inequalities are useful.

Fig. 4.4. Representations of the relaxed value iteration in G_+ and in 2^A.

Lemma 4.5.2 For each k = 1, ..., N and z ∈ R^n, the functions defined in (4.14) satisfy:
1. Ṽ_k^ǫ(z) ≤ V_k^ǫ(z) ≤ Ṽ_k^ǫ(z) + ǫ‖z‖^2;
2. V_k(z) ≤ V_k^ǫ(z) ≤ (1 + ǫ/λ_Q^-) V_k(z).

Proof The first result follows from the fact that V_k^ǫ = R_ǫ[Ṽ_k^ǫ] and the definition of an ǫ-ES. In addition, V_k^ǫ(z) ≤ (1 + ǫ/λ_Q^-) V_k(z) follows immediately from Theorem 2.6.1, while V_k(z) ≤ V_k^ǫ(z) follows from the fact that H_k^ǫ ⊆ H_k.

Following the above lemma, an upper bound for the cost associated with π_N^ǫ is given in the following theorem.

Theorem 4.5.2 (Performance Bound for π_N^ǫ) For any finite integer N and any z ∈ R^n, the cost incurred by the policy π_N^ǫ is bounded above as

J(z, π_N^ǫ) ≤ (1 + δ(π_N^ǫ)) V_N(z),

where δ(π_N^ǫ) denotes the maximum relative error of policy π_N^ǫ, given by

δ(π_N^ǫ) = ǫ/λ_Q^-. (4.17)

Proof Fix an arbitrary z ∈ R^n. Let x̂(·) be the closed-loop trajectory driven by π_N^ǫ with initial condition x̂(0) = z and let (û(·), v̂(·)) be the corresponding hybrid-control sequence, i.e., (û(t), v̂(t)) = ξ_{N−t}^ǫ(x̂(t)). By (4.14) and (4.15), we have L(x̂(t), û(t), v̂(t)) = Ṽ_{N−t}^ǫ(x̂(t)) − V_{N−(t+1)}^ǫ(x̂(t + 1)) for each t = 0, ..., N−1. Therefore,

J(z, π_N^ǫ) = ∑_{t=0}^{N−1} L(x̂(t), û(t), v̂(t)) + ψ(x̂(N))
            = ∑_{t=0}^{N−1} [Ṽ_{N−t}^ǫ(x̂(t)) − V_{N−(t+1)}^ǫ(x̂(t + 1))] + ψ(x̂(N))
            = Ṽ_N^ǫ(z) + ∑_{t=1}^{N−1} [Ṽ_{N−t}^ǫ(x̂(t)) − V_{N−t}^ǫ(x̂(t))] + [ψ(x̂(N)) − V_0^ǫ(x̂(N))].

Using Lemma 4.5.2 and the fact that ψ(z) = V_0^ǫ(z) yields J(z, π_N^ǫ) ≤ Ṽ_N^ǫ(z) ≤ (1 + δ(π_N^ǫ)) V_N(z).

Based on the results of this subsection, a suboptimal policy and the corresponding performance upper bound can be obtained as described in Algorithm 4. By choosing the relaxation parameter ǫ small enough, the obtained policy π_N^ǫ can be made arbitrarily close to the optimal one.

Algorithm 4 (N-horizon Suboptimal Policy)
Require: ǫ
Set H_0^ǫ = {Q_f};
for k = 0 to N − 1 do
  H_{k+1}^ǫ = Algo_ǫ(ρ_M(H_k^ǫ));
end for
return {H_k^ǫ}_{k=0}^N, characterizing the policy π_N^ǫ = {ξ_N^ǫ, ..., ξ_1^ǫ} through (4.16).
Maximum relative error: δ(π_N^ǫ) = ǫ/λ_Q^-.
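The ǫ-relaxed machinery of Algorithm 4 differs from the exact one only in the ǫ I_n slack of Lemma 4.5.1; a sketch combining the relaxed redundancy test, the relaxed pruning Algo_ǫ, and the iteration of Algorithm 4 is given below. It reuses the srm helper sketched in Section 4.3, and all names, tolerances and defaults are illustrative rather than prescribed by the text.

import numpy as np
import cvxpy as cp

def is_eps_redundant(P_hat, others, eps):
    """Sufficient test of Lemma 4.5.1: P_hat + eps*I dominates a convex combination."""
    if not others:
        return False
    n = P_hat.shape[0]
    sym = lambda M: (M + M.T) / 2
    alpha = cp.Variable(len(others), nonneg=True)
    combo = sum(alpha[j] * sym(P) for j, P in enumerate(others))
    prob = cp.Problem(cp.Maximize(cp.sum(alpha)),
                      [sym(P_hat) + eps * np.eye(n) - combo >> 0])
    prob.solve()
    return (prob.status == cp.OPTIMAL and prob.value is not None
            and prob.value >= 1 - 1e-9)

def prune_eps(H, eps):
    """Algo_eps: relaxed version of Algorithm 2."""
    kept = []
    for P in H:
        if not is_eps_redundant(P, kept, eps):
            kept.append(P)
    return kept

def relaxed_srs(subsystems, Qf, N, eps):
    """Algorithm 4: H_0^eps = {Q_f}, H_{k+1}^eps = Algo_eps(rho_M(H_k^eps))."""
    H, history = [Qf], [[Qf]]
    for _ in range(N):
        H = prune_eps(srm(H, subsystems), eps)
        history.append(H)
    return history                     # characterizes pi_N^eps through (4.16)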

Fig. 4.5. Complexity comparison between Algorithm 3 and Algorithm 4 with ǫ = 10^{-3}.

Example Revisited

For comparison, we test Algorithm 4 on the same example as described in the previous section. As shown in Fig. 4.5, with a relaxation parameter ǫ = 10^{-3}, the suboptimal strategy can be characterized with far fewer matrices than the exact characterization of the optimal solution requires. It is worth mentioning that for many other DSLQR problems, Algorithm 3 may still suffer from combinatorial complexity. In these cases, relaxing the accuracy using Algorithm 4 becomes necessary.

4.6 Suboptimal Solution for Infinite Horizon

For each k ∈ Z_+ and ǫ ≥ 0, we introduce the following infinite-horizon policy

π_∞^{ǫ,k} := {ξ_k^ǫ, ξ_k^ǫ, ...}, (4.18)

72 59 where ξk ǫ is characterized by the relaxed SRS Hǫ k through (4.16). Notice that for the infinite-horizon case, Hk ǫ is computed with Q f = 0. In addition, with Q f = 0, Lemma guarantees the monotonicity of the value functions, i.e., 0 V 1 V 2 V. The goal of this section is three-fold: (i) to establish conditions under which the infinite-horizon policy π ǫ,k is exponentially stabilizing and suboptimal, (ii) to derive an analytical upper bound for the cost function associated with π ǫ,k, (iii) and to develop an efficient algorithm to compute a suboptimal infinite-horizon policy. Our discussion here particularizes and improves upon the general results developed in Section 2.7 by taking advantage of some special features of the DSLQR problem. In order for J(, π ǫ,k ) to be finite, the stabilizability of system (4.1) is necessary. Assumption System (4.1) is exponentially stabilizable. In the rest of this chapter, we shall first establish the conditions under which π ǫ,k exponentially stabilizing, and then derive an performance bound for any stabilizing policy π ǫ,k. These results naturally lead to a numerical algorithm that can yield a stabilizing policy with any predefined suboptimal performance. is Stabilizing Condition With Efficient Test A key step for proving the stabilizing property of π ǫ,k is to show the boundedness of the value function V. In particular, we want to prove that Assumption implies that V (z) β z 2 for all z R n and some constant β <. We first introduce some { } notations. Let σ + A = max i M λmax (A T i A i). Denote by I + B M the set of indices of nonzero B matrices, i.e., I + B {i M : B i 0}. Let σ + min ( ) be the smallest positive singular value of a nonzero matrix. If I + B, define ˆσ B = min i I + {σ + B min (B i)}. Notice that the running cost function L in the DSLQR problem satisfies (3.4). Thus, the desired boundedness of V would follow from Lemma if system (4.1) is exponentially stabilizable with finite energy (Definition 3.1.3). Here, we want to

73 60 show this property under only assumption It turns out that for switched linear systems, the plain exponential stabilizability is equivalent to the exponential stabilizability with finite energy, and thus the desired result holds under Assumption Theorem Under Assumption 4.6.1, there exists a positive constant β < such that λ Q z 2 V (z) β z 2, for all z R n. Furthermore, one possible choice of the bound β is given by bλ + Q β=, 1 a ( λ + Q + λ+ R ) 2[a+(σ + A )2 ] ˆσ B 2 if I+ B = (4.19) b 1 a, otherwise, where b [1, ) and a (0, 1) are the constants such that the closed-loop trajectory satisfies x(t) 2 ba t x(0) 2 for all t Z +. Proof See Appendix 6.2. Theorem implies that Assumption holds whenever system 4.1 is exponentially stabilizable. Therefore, most of the results developed in Section 4.6 hold true for the DSLQR problem. In particular, the k-horizon value function V k converges exponentially fast to V in the DSLQR problem. Corollary Under Assumption 4.6.1, V k V exponentially fast with 0 V k1 (z) V k (z) α V γ k V z 2, z R n, k 1 k Z +, where α V := β(β+λ Q ) λ Q and γ V := 1. 1+β/λ Q Proof We have assumed that ψ 0 for the infinite-horizon case. The result then follows from Theorem and Corollary Furthermore, by Lemma 2.7.2, the infinite-horizon policy π ǫ,k is guaranteed to be exponentially stabilizing for sufficiently large k and sufficiently small ǫ, provided that system (4.1) is exponentially stabilizable.

74 61 Theorem If system (4.1) is exponentially stabilizable, then there exist constants ˆk Z +, ˆǫ > 0 and κ 3 > 0 such that for all k ˆk and ǫ ˆǫ, the approximate value function V ǫ k is an ECLF of system (4.1) satisfying V ǫ k (z) V ǫ k (f ν ǫ k (z)(z, µ ǫ k(z)) κ 3 z, z R n, (4.20) and the closed-loop trajectory driven by π ǫ,k satisfies x(t; z, π ǫ,k ) α xγ t x z, z Rn, t Z +. (4.21) where α x := β(1+ǫ/λ Q ) λ Q and γ x := β(1+ǫ/λ Q ). β(1+ǫ/λ Q )+κ 3 Proof Inequality (4.20) follows directly from (4.20). Notice that under Assumption 4.6.1, we have λ Q z V ǫ k (z) (1 + ǫ/λ Q )β z, for all z Rn, k Z ++ and ǫ 0. Therefore, the convergence result (4.21) follows from Theorem This theorem indicates that as we increase k and reduce ǫ, the approximate value function V ǫ k π ǫ,k. To test whether V ǫ k eventually becomes an ECLF of system (4.1) with a stabilizing policy is an ECLF, one shall verify condition (4.20). It turns out that for the DSLQR problem, this verification process can be efficiently carried out by taking advantage of the piecewise quadratic structure of V ǫ k. Lemma Inequality (4.20) holds for some constant κ 3 > 0 if for each P H ǫ k, there exist nonnegative constants α j, j = 1,..., j, such that j j=1 α j = 1 and P j j=1 α j ( ˆP (j) + (κ 3 κ )I n ) (4.22) where { ˆP (j) } j j=1 is an enumeration of the set ρ M(H ǫ k ) and κ := { } min λ min Ki (P) T R i K i (P) + Q i, i M,P H ǫ k with K i (P) being the Kalman gain defined in (4.4).
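Condition (4.22) can be checked numerically by convex programming, as noted in the remark that follows; one possible cvxpy sketch, which reuses the riccati and kalman_gain helpers from Section 4.3 and treats the candidate κ_3 as an input, is given below (all names are illustrative assumptions, not the dissertation's implementation).

import numpy as np
import cvxpy as cp

def kappa_star(H_eps, subsystems):
    """kappa^* = min over i, P of lambda_min(K_i(P)^T R_i K_i(P) + Q_i)."""
    return min(np.linalg.eigvalsh(kalman_gain(P, A, B, R).T @ R
                                  @ kalman_gain(P, A, B, R) + Q).min()
               for P in H_eps for (A, B, Q, R) in subsystems)

def satisfies_4_22(H_eps, subsystems, kappa3):
    """For each P in H_eps, look for a feasible convex combination in (4.22)."""
    sym = lambda M: (M + M.T) / 2
    n = H_eps[0].shape[0]
    shift = (kappa3 - kappa_star(H_eps, subsystems)) * np.eye(n)
    P_hats = [sym(riccati(P, A, B, Q, R)) + shift
              for P in H_eps for (A, B, Q, R) in subsystems]
    for P in H_eps:
        alpha = cp.Variable(len(P_hats), nonneg=True)
        combo = sum(alpha[j] * Pj for j, Pj in enumerate(P_hats))
        prob = cp.Problem(cp.Minimize(0),
                          [cp.sum(alpha) == 1, combo - sym(P) >> 0])
        prob.solve()
        if prob.status not in (cp.OPTIMAL, cp.OPTIMAL_INACCURATE):
            return False
    return True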

75 62 Proof Recall that Ṽ ǫ k+1 (z) = min P ρ M (H ǫ k ) z T Pz for all z R n. Clearly, condition (4.22) implies that V ǫ k (z) Ṽ ǫ k+1 (z) (κ 3 κ ) z 2, z R n. Now let z R n be arbitrary but fixed. Denote by ( ˆP, î) the minimizer in (4.16) for this fixed z. Suppose that the system starts from z at time 0 and is driven by the policy π ǫ,k,uc. Let û = K î ( ˆP)z and ˆx 1 = Aîz + Bîû be the continuous control at time 0 and the state at time t = 1, respectively. Plugging equations (4.3) and (4.4) into û, we have which implies ] Vk ǫ T (ˆx 1 ) = min [ˆx P H ǫ 1 P ˆx 1 k ˆx T 1 ˆP ˆx 1 = z T ρî( ˆP)z û T Rîû + z T Qîz z T ρî( ˆP)z κ z 2 = Ṽ ǫ k (z) κ z 2. V ǫ k (z) V ǫ k (ˆx 1) V ǫ k (z) Ṽ ǫ k (z) + κ z 2 κ 3 z 2. Remark Following a similar argument as in the proof of Lemma 4.4.3, condition (4.22) can be verified by solving a convex optimization problem Performance Bound for π ǫ,k Whenever the closed-loop trajectory is exponentially stable, the cost J(, π ǫ,k ) will be bounded above. We now derive an analytical expression for this bound. Theorem If V ǫ k satisfies the condition in Theorem 4.6.2, then the cost associated with the policy π ǫ,k,uc is bounded above by: J(z, π ǫ,k (z)) ( 1 + δ(π ǫ,k ) ) V (z),

76 63 where δ(π ǫ,k ) = ( ) ǫβ λ + α V γv k Q α x (1 γ x )λ. (4.23) Q Here α V, γ V, α x and γ x are the constants defined in Corollary and Theorem Proof Fix an arbitrary z X, let ˆx(t) = x(t; z, π ǫ,k ) for t Z +, and let (û( ), ˆv( )) be the corresponding hybrid-control sequence. Then, J(z, π ǫ,k ) = L(ˆx(t), û(t), ˆv(t)) = t=0 t=0 [Ṽ ǫ k+1 (ˆx(t)) V ǫ k (ˆx(t + 1)) ] = Ṽ k+1 ǫ (z) + ] ǫ [Ṽ k+1 (ˆx(t)) V k ǫ (ˆx(t)), t=1 where the last step is due to the stability of the trajectory ˆx(t) that guarantees Vk ǫ (x(t)) 0 as t. Applying Lemma 4.5.2, Corollary 4.6.1, Theorem 4.6.2, and noticing the monotonicity of the value functions due to ψ 0 yields J(z, π ǫ,k ) Ṽ ǫ k+1(z) + t=1 (1 + ǫ/λ Q )V (z) + V (z) + V (z) + ] ǫ [Ṽ k+1(ˆx(t)) Vk ǫ (ˆx(t)) (1 + ǫ/λ Q )V k+1(ˆx(t)) V k (ˆx(t)) t=1 ǫβ ˆx(t) 2 + t=0 λ Q ( ǫβ λ + α V γv k Q (1 + δ(π ǫ,k ))V (z). ) (V k+1 (ˆx(t)) V k (ˆx(t))) t=1 α x 1 γ x z 2 Remark The error bound δ(π ǫ,k ) derived in Theorem is often conservative as we have assumed the worst case for every inequality encountered in the

77 64 derivation. The conservative bound is of great theoretical importance as it indicates that the error decays linearly as ǫ decreases, and exponentially as k increases. Moreover, the error can be made arbitrarily small by choosing a proper combination of ǫ and k. 4.7 Overall Algorithm We now summarize the main results developed in this Chapter. If system (4.1) is exponentially stabilizable, then by Theorem 4.6.2, it can always be stabilized by π ǫ,k for sufficiently large k and sufficiently small ǫ. The policy π ǫ,k is exponentially stabilizing if its corresponding approximate value function V ǫ k verifies condition (4.20). Lemma points out that condition (4.20) is equivalent to the convex condition (4.22) that can be efficiently checked by solving a convex optimization problem. If the policy π ǫ,k is exponentially stabilizing for some k and ǫ, its corresponding cost J(z, π ǫ,k ) is bounded above for all initial state z Rn. Theorem implies that further increasing k and decreasing ǫ will continuously reduce the cost of π ǫ,k eventually make it a policy with any predefined suboptimal performance. and will Since the control law ξ ǫ k is completely characterized by the relaxed SRS Hǫ k, a suboptimal policy of the form π ǫ,k basic idea is to evolve H ǫ k can be obtained through the relaxed SRM. The according to the relaxed SRM (4.12) and stop when the obtained H ǫ k verifies condition (4.22). The resulting policy πǫ is guaranteed to be suboptimal with a relative error bounded above by δ(π ǫ,k ). The detailed solution procedure is given in Algorithm 5. By choosing proper parameter pair (ǫ, k max ) in the algorithm, the returned policy π ǫ,k under Assumption can achieve any desired suboptimal performance

78 65 Algorithm 5 Infinite-Horizon Suboptimal Policy Require: ǫ > 0, k max Z + Set k = 0, and H ǫ 0 = {0} for k = 1 to k max do H ǫ k = Algo ǫ(ρ M (H ǫ k )) if H ǫ k satisfies (4.22) then stop and return H ǫ k error bound δ(π ǫ,k ). end if end for characterizing a suboptimal policy πǫ,k,uc with a relative 4.8 Numerical Example Consider a simple infinite-horizon DSLQR problem with two second-order subsystems: A 1 = , B 1 = 1, A 2 = , B 2 = 1. 2 Suppose that the state and control weights are Q 1 = Q 2 = I 2 and R 1 = R 2 = 1, respectively. Both subsystems are unstable but controllable. Algorithm 5 with ǫ = 10 4 is applied to solve this DSLQR problem. The algorithm starts with H ǫ 0 = {0}. Applying the SRM ρ M to H ǫ 0 yields ρ M (H ǫ 0 ) = {I 2, I 2 }. Hence, H ǫ 1 = {I 2}. Continue one more step, we have ρ M (H1 ǫ ) = ,

79 66 Since none of the above two matrices are ǫ-redundant, we have H2 ǫ = ρ M (H1 ǫ ). Similarly, we can compute ρ M (H2 ǫ ) = ,, , Once again, none of the above 4 matrices are redundant, so we have H ǫ 3 = ρ M (H ǫ 2 ). Continue further, we have ρ M (H3 ǫ ) = Using Lemma 4.5.1, one can easily verify that the first 4 matrices in the above equation are ǫ-redundant with respect to the rest 4 matrices. Hence, H4 ǫ = Similarly, computing ρ M (H4) ǫ and removing its ǫ-redundant matrices yields, H5 ǫ = One can observe that H5 ǫ contains also 4 matrices and each of these 4 matrices differs only slightly from the corresponding one in H ǫ 4. It can also be easily verified that the

80 67 set H ǫ 5 verifies condition (2.13) with κ 3 = Therefore, we can stop the iteration now with a stabilizing policy π ǫ,5 characterized by Hǫ 5. The relative error bound of this policy is computed to be δ(π ǫ,5 ) = The actual performance of the policy should be much better than this conservative bound. If we want to further improve the bound, we can continue with the iterations. For example, if we iterate 3 more steps, we will have H8 ǫ = Again, the set H8 ǫ still contains only 4 matrices and all these matrices are very close to the ones in H ǫ 5. The set H ǫ 8 verifies condition (2.13) with κ 3 = However, since k is increased from 5 to 8, the conservative bound reduces to δ(π ǫ,8 ) = 0.009, which is less than 1 percent. This example not only illustrates the main idea of Algorithm 5, but also indicates that the complexity of the algorithm, namely, Hk ǫ, is indeed very small and stays at the maximum value 4 as opposed to growing exponentially as k increases.
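A compact sketch of the iteration behind Algorithm 5 is given below; srm, prune_eps and satisfies_4_22 are the illustrative helpers sketched earlier in this chapter and are assumed to be in scope, and the κ_3 candidate is a user-supplied input rather than a value taken from this example.

import numpy as np

def infinite_horizon_policy(subsystems, eps, k_max, kappa3):
    n = subsystems[0][0].shape[0]
    H = [np.zeros((n, n))]                     # H_0^eps = {0} (Q_f = 0 here)
    for k in range(1, k_max + 1):
        H = prune_eps(srm(H, subsystems), eps)      # relaxed SRS iteration (4.12)
        if satisfies_4_22(H, subsystems, kappa3):   # stabilization certificate
            return H, k            # H_k^eps characterizes xi_k^eps through (4.16)
    return None, k_max             # no certificate found within the budget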

81 68 5. STABILIZATION OF DISCRETE-TIME SWITCHED LINEAR SYSTEMS In this chapter, we study the exponential stabilization problem for discrete-time switched linear systems. Specifically, our goal is to develop an efficient and constructive way to design both a switching strategy and a continuous control strategy to exponentially stabilize the system, when none of the subsystems is stabilizable but the entire switched system is exponentially stabilizable. Such a problem is regarded as one of the fundamental problems for switched systems [55], and has attracted considerable research attention [56 60]. 5.1 Backgrounds Previous research on this topic has been mainly focused on the switching stabilization problem of autonomous switched linear systems, whose subsystems have no continuous-control inputs. Many existing results [61 64] approach the problem by jointly searching for a switching strategy and a Lyapunov or Lyapunov-like function for which the closed-loop system satisfies the Multiple Lyapunov Function (MLF) theorem [60]. Unlike the traditional Lyapunov function, MLFs needs not decrease monotonically along the closed-loop system trajectory, which provides more freedom to construct stabilizing switching strategies. The main idea of most MLF-based synthesis methods is to first parameterize the switching strategy and the Lyapunov-like function in terms of certain matrices and then translate the Lyapunov or multiple- Lyapunov function theorem into matrix inequalities. The solution of these matrix inequalities, if it exists, characterizes a stabilizing switching strategy. If the solution of the matrix inequalities defines a quadratic common Lyapunov function under the proposed switching strategy, then the system is called quadratic stabilizable. It was

82 69 proved in [58,61] that the quadratic stabilizability is equivalent to the strict completeness of a certain set of symmetric matrices. From a different perspective, in [65, 66], it was shown that the system is quadratic stabilizable if there exists a stable convex combination of the subsystem matrices. This condition is also necessary for quadratical stabilizability when there are only two subsystems [67]. Notice that the quadratic stabilization problem is just a particular kind of exponential stabilization problems; there are switched linear systems that are asymptotically or exponentially stabilizable without having a quadratic common Lyapunov function [57]. In [62], a piecewise quadratic Lyapunov function was used to study the switching exponential stabilization problem. By taking a so-called largest-region-function switching strategy, the stabilization problem was formulated as a bilinear matrix inequality (BMI) problem and some heuristics are proposed to solve the BMI problem numerically. Recently, stabilization of nonautonomous switched linear systems through both switching control and continuous control has also been studied [64,68,69]. The methods were mostly direct extensions of the switching stabilization results for autonomous systems. By associating to each subsystem a feedback gain and a quadratic Lyapunov function, the stabilization problem was also formulated as a matrix inequality problem, where the feedback-gain matrices were part of the design variables. The extensive use of various Lyapunov functions has sparked a great interest in the study of the converse Lyapunov function theorems for switched linear systems. In [70, 71], it was proved that the exponential stability of a switched linear system under arbitrary switching is equivalent to the existence of a piecewise quadratic, or a piecewise linear, or a smooth homogeneous common Lyapunov function. In [72], several sufficient and necessary conditions based on the composite Lyapunov functions were derived for stability/stabilizability of switched linear systems under arbitrary switching. However, these converse Lyapunov theorems and the equivalent conditions are only true for the arbitrary-switching case; they are far from necessary for the switching-stabilization problem.

83 70 Despite the extensive literature in this field, some fundamental questions regarding the stabilization of a switched linear system remain open. As stated in [73], necessary and sufficient conditions for the existence of a general (not necessarily quadratic) stabilizing feedback strategy are not known. In addition, a constructive way of finding a stabilizing strategy when the system is known to be exponentially stabilizable is also lacking. In this chapter, we propose a general control-lyapunov function framework to tackle these open problems. One of the main contributions of this chapter is the proof of the equivalence of the following statements for a discrete-time switched linear system: 1. The system is exponentially stabilizable; 2. There exists a piecewise quadratic control-lyapunov function that can be expressed as a pointwise minimum of a finite number of quadratic functions; 3. There exists a stationary exponentially-stabilizing hybrid-control policy that consists of a homogeneous switching-control law and a piecewise-linear continuouscontrol law. The particular type of Lyapunov functions as described by item (ii) was used in [74 76] to study the switching stabilization problem; several sufficient conditions in terms of BMIs were also derived. However, the existence of this type of Lyapunov functions has not been established in the literature. The equivalence of the above three statements constitutes a converse piecewise-quadratic control-lyapunov function theorem. The theorem guarantees that to study the stabilization problem, it suffices to only consider the control-lyapunov functions of piecewise-quadratic form and the continuous-control laws of piecewise-linear form. This justifies many of the earlier controller-synthesis methods that have adopted these forms for convenience or heuristic reasons. The above results are proved by establishing a connection between the exponential stabilization problem and the DSLQR problem studied in Chapter 4. It is shown that if the switched linear system is exponentially stabilizable by an arbitrary feedback

84 71 policy, then it must also be exponentially stabilizable by a stationary suboptimal policy of a related DSLQR problem. This property transforms the stabilization problem into a DSLQR problem. Motivated by the results developed in Chapter 4, an efficient algorithm is proposed which can yield a stabilizing policy whenever the system is exponentially stabilizable. Such an algorithm improves upon many existing ones that only provide sufficient conditions for stabilizability and may not yield a stabilizing strategy even when the switched system is exponentially stabilizable. As observed in some simulation examples (Section 5.5), the stabilizing feedback policy can often be computed efficiently. 5.2 Problem Statement Consider the unconstrained discrete-time switched linear system described by x(t + 1) = A v(t) x(t) + B v(t) u(t), t Z +, (5.1) where x(t) R n is the continuous state, v(t) M is the switching control that determines the discrete mode, and u(t) R p is the continuous control. The state x and the control (u, v) are unconstrained. Similar to Section 4.2, the sequence of pairs {(u(t), v(t))} t=0 is called a hybrid-control sequence, and the pair (A i, B i ), i M, is called a subsystem. The symbol denotes specifically the Euclidean norm of a given vector or matrix. A mapping ξ : R n R p M is called a hybridcontrol law and a sequence of hybrid-control laws constitutes an infinite-horizon policy π = {ξ 0, ξ 1,...}. An infinite-horizon policy π is called stationary if it consists of the same control law at each time t Z +, and is called exponentially stabilizing if x(t; z, π ) 2 ba t z 2, z R n, t Z +, where x( ; z, π ) denotes the closed-loop trajectory driven by an infinite-horizon policy π with initial state z R n. The system (5.1) is called exponentially stabilizable if there exists an exponentially stabilizing policy. This can be trivially guaranteed if

This can be trivially guaranteed if one of the subsystems is stabilizable. In this chapter, we study a nontrivial stabilization problem under the following assumption.

Assumption 5.2.1 $(A_i, B_i)$ is not stabilizable for any $i \in \mathbb{M}$, but system (5.1) is exponentially stabilizable.

Problem 5.2.1 Under Assumption 5.2.1, find, if possible, a policy $\pi_\infty$ that exponentially stabilizes system (5.1).

Most stabilization problems studied in the literature [61, 64, 68] assume a priori that the hybrid-control policy is stationary, i.e., $(\mu_t, \nu_t) = (\mu, \nu)$ for all $t \in \mathbb{Z}_+$, and that each discrete mode is associated with only one feedback gain, i.e., $\mu(x) = F_{\nu(x)} x$ for some $\{F_i\}_{i=1}^{M}$. Problem 5.2.1 is more general than these problems as it allows for arbitrary (possibly nonstationary) hybrid-control policies. It will be shown in Section 5.3 that if the system is exponentially stabilizable, then there must exist a stationary stabilizing policy; however, the number of distinct feedback gains may be larger than the number of subsystems $M$. See Section 5.4 for more details.

5.3 Stabilization Using the DSLQR Controller

In this section, we develop a constructive algorithm to compute an exponentially stabilizing policy for system (5.1) under Assumption 5.2.1. The algorithm is mainly based on the result developed in Section 4.6 for solving the infinite-horizon DSLQR problem. To solve Problem 5.2.1, we consider an infinite-horizon DSLQR problem (see Chapter 4) with state weighting matrices $Q_i \succ 0$ and control weighting matrices $R_i \succ 0$, $i \in \mathbb{M}$. For each $k \in \mathbb{Z}_+$, let $V_k$ be the $k$-horizon value function of the DSLQR problem with a trivial terminal cost $\psi \equiv 0$. Denote by $V_\infty$ the infinite-horizon value function.

For each $k \in \mathbb{Z}_+$ and $\epsilon \ge 0$, let $V_k^\epsilon$ be the $k$-horizon approximate value function defined in (4.14) and let $\tilde V_{k+1}^\epsilon = \mathcal{T}[V_k^\epsilon]$. Denote by $\xi_k^\epsilon$ the hybrid-control law generated by $V_k^\epsilon$ and $H_k^\epsilon$ according to (4.15). By Theorem 4.5.1, we have that

$$V_k^\epsilon(z) = \min_{P \in H_k^\epsilon} z^T P z, \quad \text{and} \quad \xi_k^\epsilon(z) = \bigl( K_{i_k^\epsilon(z)}(P_k^\epsilon(z))\, z,\ i_k^\epsilon(z) \bigr), \qquad (5.2)$$

where

$$(P_k^\epsilon(z), i_k^\epsilon(z)) = \arg\min_{P \in H_k^\epsilon,\, i \in \mathbb{M}} z^T \rho_i(P) z, \quad z \in \mathbb{R}^n,\ k \in \mathbb{Z}_+,$$

and $H_k^\epsilon$ is the $\epsilon$-relaxed SRS defined by (4.12). Let $\pi_\infty^{\epsilon,k}$ be the infinite-horizon stationary policy defined by $\pi_\infty^{\epsilon,k} = \{\xi_k^\epsilon, \xi_k^\epsilon, \ldots\}$, and let $\lambda_Q = \min_{i \in \mathbb{M}} \{\lambda_{\min}(Q_i)\}$. For easy reference, we now summarize some important results about the DSLQR problem in the following lemma. The proofs of these results can be found in Chapter 4.

Lemma 5.3.1 For a DSLQR problem with $\lambda_Q > 0$ and $\psi \equiv 0$, we have the following results.

1. $V_k \le \tilde V_k^\epsilon \le V_k^\epsilon \le (1 + \epsilon/\lambda_Q) V_k$, for all $k \in \mathbb{Z}_+$ and $\epsilon \ge 0$.
2. If system (5.1) is exponentially stabilizable, then there exists a constant $\beta < \infty$, independent of $z$, such that $V_\infty(z) \le \beta \|z\|^2$ for all $z \in \mathbb{R}^n$.
3. Let $\beta$ be the constant such that $V_\infty(z) \le \beta \|z\|^2$ for all $z \in \mathbb{R}^n$. Then $0 \le V_{k_1}(z) - V_k(z) \le \alpha_V \gamma_V^k \|z\|^2$ for all $z \in \mathbb{R}^n$ and $k_1 \ge k \in \mathbb{Z}_+$, where $\alpha_V = \beta(\beta + \lambda_Q)/\lambda_Q < \infty$ and $\gamma_V = (1 + \beta/\lambda_Q)^{-1} < 1$.
4. If system (5.1) is exponentially stabilizable, then it is exponentially stabilizable by $\pi_\infty^{\epsilon,k}$ for sufficiently large $k$ and sufficiently small $\epsilon$.
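The pointwise-minimum structure in (5.2) is straightforward to evaluate numerically. The Python sketch below is an added illustration: it computes $V_k^\epsilon(z)$ and the hybrid-control action $\xi_k^\epsilon(z)$ from a finite set $H$ of matrices, writing $\rho_i(P)$ and $K_i(P)$ in the standard one-step LQR (Riccati) form, which is assumed here to match the definitions used in Chapter 4.

```python
import numpy as np

def riccati_map(P, A, B, Q, R):
    """One-step LQR Riccati mapping rho_i(P) and gain K_i(P), written in the
    standard form (assumed here to match the definitions of Chapter 4)."""
    S = R + B.T @ P @ B
    K = -np.linalg.solve(S, B.T @ P @ A)
    return Q + A.T @ P @ A + A.T @ P @ B @ K, K

def value_and_control(z, H, A, B, Q, R):
    """Evaluate V(z) = min_{P in H} z'Pz and the action in (5.2): pick the pair
    (P, i) minimizing z' rho_i(P) z, then return u = K_i(P) z and v = i."""
    z = np.asarray(z, dtype=float)
    V = min(z @ P @ z for P in H)
    best_val, best_u, best_v = np.inf, None, None
    for P in H:
        for i in range(len(A)):
            rho, K = riccati_map(P, A[i], B[i], Q[i], R[i])
            val = z @ rho @ z
            if val < best_val:
                best_val, best_u, best_v = val, K @ z, i
    return V, best_u, best_v
```

Applying `value_and_control` at every time step with the set produced by the synthesis procedure realizes the stationary policy $\pi_\infty^{\epsilon,k}$.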

Item 4 of the above lemma indicates that under Assumption 5.2.1 there always exist constants $k \in \mathbb{Z}_+$ and $\epsilon > 0$ such that system (5.1) is exponentially stabilizable by $\pi_\infty^{\epsilon,k}$. We now improve this existence result and derive a quantitative condition for the stabilizing property of $\pi_\infty^{\epsilon,k}$.

Theorem 5.3.1 Suppose that system (5.1) is exponentially stabilizable with $\beta < \infty$ defined in (4.19). Then for all $(\epsilon, k)$ satisfying

$$\kappa_k^\epsilon := \lambda_Q - \alpha_V \gamma_V^k - \epsilon\beta/\lambda_Q > 0, \qquad (5.3)$$

where $\gamma_V$ and $\alpha_V$ are the constants defined in Lemma 5.3.1, the approximate value function $V_k^\epsilon$ is an ECLF of system (5.1) with a corresponding stabilizing policy $\pi_\infty^{\epsilon,k}$.

Proof Fix an arbitrary $z \in \mathbb{R}^n$. Let $u^\epsilon = \mu_k^\epsilon(z)$, $v^\epsilon = \nu_k^\epsilon(z)$ and $x^\epsilon(1) = A_{v^\epsilon} z + B_{v^\epsilon} u^\epsilon$. Recall that $\tilde V_{k+1}^\epsilon(z) = \mathcal{T}[V_k^\epsilon](z) = \mathcal{T}_{\xi_k^\epsilon}[V_k^\epsilon](z)$. Therefore,

$$\tilde V_{k+1}^\epsilon(z) - V_k^\epsilon(x^\epsilon(1)) \ge \lambda_Q \|z\|^2. \qquad (5.4)$$

By the exponential stabilizability, there exists a constant $\beta < \infty$ such that $V_\infty(z) \le \beta \|z\|^2$ for all $z \in \mathbb{R}^n$. Then by Lemma 5.3.1, we have

$$\tilde V_{k+1}^\epsilon(z) \le V_{k+1}^\epsilon(z) \le V_{k+1}(z) + (\epsilon\beta/\lambda_Q)\|z\|^2 \le V_k(z) + (\alpha_V \gamma_V^k + \epsilon\beta/\lambda_Q)\|z\|^2 \le V_k^\epsilon(z) + (\alpha_V \gamma_V^k + \epsilon\beta/\lambda_Q)\|z\|^2.$$

Combining this with inequality (5.4) yields

$$V_k^\epsilon(z) - V_k^\epsilon(x^\epsilon(1)) \ge \tilde V_{k+1}^\epsilon(z) - V_k^\epsilon(x^\epsilon(1)) - (\alpha_V \gamma_V^k + \epsilon\beta/\lambda_Q)\|z\|^2 \ge (\lambda_Q - \alpha_V \gamma_V^k - \epsilon\beta/\lambda_Q)\|z\|^2.$$

Since $\lambda_Q \|z\|^2 \le V_k^\epsilon(z) \le (1 + \epsilon/\lambda_Q)\beta \|z\|^2$, the function $V_k^\epsilon$ is an ECLF of system (5.1) with a stabilizing policy $\pi_\infty^{\epsilon,k}$ whenever $(\epsilon, k)$ satisfies (5.3).

Remark 5.3.1 Clearly, the quantity $\kappa_k^\epsilon$ in (5.3) approaches $\lambda_Q > 0$ exponentially as $k$ increases, and linearly as $\epsilon$ decreases. Therefore, for a reasonably small $\epsilon$, the policy $\pi_\infty^{\epsilon,k}$ will become exponentially stabilizing very quickly as we increase $k$.
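Condition (5.3) is easy to evaluate once estimates of $\lambda_Q$ and $\beta$ are available. The small helper below is an added illustration with placeholder numbers (in practice $\beta$ is usually unknown, which is why Algorithm 6 below relies on condition (4.22) instead); it uses $\alpha_V$ and $\gamma_V$ exactly as stated in Lemma 5.3.1.

```python
def kappa(eps, k, lam_Q, beta):
    """kappa_k^eps = lam_Q - alpha_V * gamma_V**k - eps*beta/lam_Q, cf. (5.3),
    with alpha_V and gamma_V taken from Lemma 5.3.1."""
    alpha_V = beta * (beta + lam_Q) / lam_Q
    gamma_V = 1.0 / (1.0 + beta / lam_Q)
    return lam_Q - alpha_V * gamma_V ** k - eps * beta / lam_Q

# Placeholder numbers (lam_Q = 1, beta = 5, eps = 0.05): kappa is already
# positive for these k, illustrating Remark 5.3.1.
for k in (5, 10, 20):
    print(k, kappa(0.05, k, lam_Q=1.0, beta=5.0))
```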

Remark 5.3.2 It follows from (5.2) that $V_k^\epsilon$ is piecewise quadratic. Thus, Theorem 5.3.1 can be viewed as a converse piecewise-quadratic ECLF theorem. An important consequence of such a theorem is that if system (5.1) is exponentially stabilizable by an arbitrary policy, then it must be stabilizable by a stationary policy with a piecewise-quadratic Lyapunov function for the closed-loop system.

Theorem 5.3.1 motivates a general way to solve Problem 5.2.1, as described in Algorithm 6. The basic idea of this algorithm is to keep performing iteration (4.12) for a reasonably small $\epsilon$ until the resulting $\pi_\infty^{\epsilon,k}$ becomes exponentially stabilizing. Notice that checking condition (5.3) requires the knowledge of $\beta$, which is usually not available. Thus, in the algorithm we use condition (4.22) to verify whether $\pi_\infty^{\epsilon,k}$ has already become a stabilizing policy or not. This algorithm is guaranteed to yield an exponentially stabilizing policy under Assumption 5.2.1, provided that $\epsilon_{\min}$ is sufficiently small and $k_{\max}$ is sufficiently large.

Algorithm 6 [Solution of Problem 5.2.1]
  Specify proper values for $\epsilon$, $\epsilon_{\min}$ and $k_{\max}$, and let $H_0^\epsilon = \{0\}$.
  while $\epsilon > \epsilon_{\min}$ do
    for $k = 1$ to $k_{\max}$ do
      $H_k^\epsilon = \text{Algo}_\epsilon(\rho_{\mathbb{M}}(H_{k-1}^\epsilon))$
      if $H_k^\epsilon$ satisfies condition (4.22) then
        stop and return $H_k^\epsilon$, which characterizes the stabilizing policy $\pi_\infty^{\epsilon,k}$
      end if
    end for
    reduce $\epsilon$
  end while
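A minimal numerical sketch of Algorithm 6 is given below. It is an added illustration rather than the thesis implementation: the Riccati mapping $\rho_i(P)$ and gain $K_i(P)$ are written in the standard one-step LQR form, the pruning step $\text{Algo}_\epsilon$ is replaced by a simple sufficient test (drop $P$ when an already-kept $P'$ satisfies $P' \preceq (1+\epsilon)P$), and the stopping test stands in for condition (4.22) by checking one-step decrease of the value on a batch of sampled states.

```python
import numpy as np

def riccati_map(P, A, B, Q, R):
    """Standard one-step LQR Riccati mapping and gain (assumed form of rho_i, K_i)."""
    S = R + B.T @ P @ B
    K = -np.linalg.solve(S, B.T @ P @ A)
    return Q + A.T @ P @ A + A.T @ P @ B @ K, K

def prune(H, eps, tol=1e-9):
    """Greedy stand-in for Algo_eps: drop P when an already-kept Pp satisfies
    Pp <= (1+eps) P, a simple sufficient certificate that P is eps-redundant."""
    kept = []
    for P in H:
        dominated = any(np.all(np.linalg.eigvalsh((1 + eps) * P - Pp) >= -tol)
                        for Pp in kept)
        if not dominated:
            kept.append(P)
    return kept

def one_step_decrease(z, H, A, B, Q, R):
    """Check that the law (5.2) built from H strictly decreases V(x) = min_P x'Px
    in one step from state z (a heuristic stand-in for condition (4.22))."""
    V = lambda x: min(x @ P @ x for P in H)
    best_val, x_next = np.inf, None
    for P in H:
        for i in range(len(A)):
            rho, K = riccati_map(P, A[i], B[i], Q[i], R[i])
            if z @ rho @ z < best_val:
                best_val, x_next = z @ rho @ z, A[i] @ z + B[i] @ (K @ z)
    return V(x_next) < V(z)

def algorithm6(A, B, Q, R, eps=1.0, eps_min=1e-3, k_max=30, n_samples=200, seed=0):
    """Structure of Algorithm 6: iterate H_k = prune(rho_M(H_{k-1})) and stop once
    the induced law decreases the value on sampled states; otherwise reduce eps."""
    n = A[0].shape[0]
    Z = np.random.default_rng(seed).standard_normal((n_samples, n))
    while eps > eps_min:
        H = [np.zeros((n, n))]
        for k in range(1, k_max + 1):
            H = prune([riccati_map(P, A[i], B[i], Q[i], R[i])[0]
                       for P in H for i in range(len(A))], eps)
            if all(one_step_decrease(z, H, A, B, Q, R) for z in Z):
                return H, eps, k
        eps *= 0.5
    return None
```

Because the pruning and stopping tests are only stand-ins, this sketch may keep more matrices, or iterate longer, than the actual Algorithm 6; it is meant to convey the structure of the iteration rather than its exact certificates.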

5.4 The Stationary Stabilizing Policy

In this section, we point out some important properties of the stationary stabilizing policy $\pi_\infty^{\epsilon,k}$ and compare it with other controllers proposed in the literature to gain more insight into the stabilization problem.

5.4.1 Properties of $\xi_k^\epsilon$

Since both $H_k^\epsilon$ and $\mathbb{M}$ contain finitely many elements, the minimizer $(P_k^\epsilon(z), i_k^\epsilon(z))$ in (4.16) must be piecewise constant. For each pair $(P, i) \in H_k^\epsilon \times \mathbb{M}$, define a subset of $\mathbb{R}^n$ as

$$\Omega_k^\epsilon(P, i) = \bigl\{ z \in \mathbb{R}^n : (P, i) = \arg\min_{\hat P \in H_k^\epsilon,\, \hat i \in \mathbb{M}} z^T \rho_{\hat i}(\hat P) z \bigr\}. \qquad (5.5)$$

The set $\Omega_k^\epsilon(P, i)$ so defined is called a decision region associated with $\xi_k^\epsilon$, in the sense that the points within the same decision region correspond to the same pair of feedback gain $K_i(P)$ and switching control $i$ under the feedback law $\xi_k^\epsilon$. According to (5.5), a decision region must be homogeneous. This implies that the feedback law $\xi_k^\epsilon$ is also homogeneous. Furthermore, it follows immediately from (4.15) that the continuous-feedback law $\mu_k^\epsilon$ is piecewise linear with a constant feedback gain within each decision region. Note that a decision region $\Omega_k^\epsilon(P, i)$ may be disconnected except at the origin, and the union of all the decision regions covers the entire space $\mathbb{R}^n$. See Fig. 5.2 in Section 5.5.2 for a graphical illustration of the decision regions.

The decision regions that have the same switching control constitute a switching region. For each $i \in \mathbb{M}$, the switching region $S_k^\epsilon(i)$ is defined as

$$S_k^\epsilon(i) = \bigcup_{P \in H_k^\epsilon} \Omega_k^\epsilon(P, i). \qquad (5.6)$$

The states that reside in the same switching region evolve through the same subsystem; however, they may be controlled by different feedback gains.

In summary, the control law $\xi_k^\epsilon$ divides the state space into at most $M\,|H_k^\epsilon|$ homogeneous decision regions, each of which corresponds to a pair of feedback gain and switching control. These decision regions are exactly characterized by the matrices in the relaxed switched Riccati set $H_k^\epsilon$. For a given state value $z$, by comparing the values of $z^T \rho_i(P) z$ for each pair $(P, i) \in H_k^\epsilon \times \mathbb{M}$, one can easily determine which decision region the state $z$ belongs to. At time $t$, if $x(t) \in \Omega_k^\epsilon(P, i)$, then the hybrid-control action at this time step is $u(t) = K_i(P)x(t)$ and $v(t) = i$.

Therefore, after obtaining the set $H_k^\epsilon$ from Algorithm 6, the hybrid-control sequence and the closed-loop trajectory starting from any initial state can be easily computed.

5.4.2 Relationships with Other Controllers

Many hybrid-control laws proposed in the literature [61, 62, 68] can be written in the following form:

$$\tilde\xi(z) = (\tilde\mu(z), \tilde\nu(z)) = (F_{\tilde i(z)}\, z,\ \tilde i(z)), \quad \text{with} \quad \tilde i(z) = \arg\min_{i \in \mathbb{M}} z^T \tilde Q_i z, \qquad (5.7)$$

where $\{F_i\}_{i \in \mathbb{M}}$ are the feedback gains and $\{\tilde Q_i\}_{i \in \mathbb{M}}$ are some symmetric matrices characterizing the decision regions. The control law $\tilde\xi$ is exponentially stabilizing if $\{F_i\}_{i \in \mathbb{M}}$ and $\{\tilde Q_i\}_{i \in \mathbb{M}}$ satisfy certain matrix inequalities. However, these matrix inequalities are only sufficient conditions for exponential stabilizability: a stabilizing control law of the form (5.7) may fail to exist even when the switched linear system is exponentially stabilizable. By a similar argument as in the last subsection, it can easily be verified that (i) $\tilde\xi$ divides the state space into at most $M$ homogeneous decision regions; (ii) each switching control is associated with only one feedback gain. Compared with $\tilde\xi$, the proposed control law $\xi_k^\epsilon$ is more general: the number of decision regions of $\xi_k^\epsilon$ may be larger than $M$, and the same switching control may be paired with more than one feedback gain.
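For contrast with $\xi_k^\epsilon$, the single-gain-per-mode law (5.7) can be sketched in a few lines. The snippet below is an added illustration; the gains $F_i$ and matrices $\tilde Q_i$ are hypothetical placeholders, and finding values that actually satisfy the stabilizing matrix inequalities is a separate design step not shown here.

```python
import numpy as np

def xi_tilde(z, F, Qt):
    """Law (5.7): pick the mode minimizing z' Qt_i z, apply that mode's single gain F_i."""
    z = np.asarray(z, dtype=float)
    i = min(range(len(Qt)), key=lambda j: z @ Qt[j] @ z)
    return F[i] @ z, i

# Hypothetical two-mode data (placeholders, not taken from Section 5.5).
F = [np.array([[-0.4, -0.1]]), np.array([[0.0, -0.7]])]
Qt = [np.eye(2), np.array([[2.0, 0.0], [0.0, 0.5]])]
print(xi_tilde([1.0, -2.0], F, Qt))
```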

Fig. 5.1. Simulation results for Example 1. Top: phase-plane trajectories generated by $\pi_\infty^{1,6}$ and $\pi_\infty^{0.1,5}$ starting from the same initial condition $x_0 = [0, 1]^T$. Bottom: the corresponding continuous controls.

5.5 Numerical Examples

5.5.1 Example 1

Consider the following two-mode switched system:

$$A_1 = \begin{bmatrix} 2 & 0 \\ \cdots & \cdots \end{bmatrix}, \quad A_2 = \begin{bmatrix} \cdots & \cdots \\ \cdots & \cdots \end{bmatrix}, \quad B_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad B_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad Q_i = I_2, \ R_i = 1, \ i = 1, 2.$$

Neither of the subsystems is stabilizable by itself. However, the switched system is stabilizable through a proper hybrid control. The stabilization problem can be easily solved using Algorithm 6. Starting from $\epsilon = 1$, the algorithm terminates after 5 steps, which results in a stabilizing policy $\pi_\infty^{1,6}$ defined by the relaxed switched Riccati set $H_6^1 = \{\cdots\}$. Using a smaller relaxation $\epsilon = 0.1$, the algorithm stops after 4 steps, resulting in a stabilizing policy $\pi_\infty^{0.1,5}$ defined by the relaxed switched Riccati set $H_5^{0.1} = \{\cdots\}$. With these matrices, starting from any initial position $x_0$, the hybrid-control laws corresponding to $H_6^1$ and $H_5^{0.1}$ can be computed using equation (4.16). The closed-loop trajectories generated by these two hybrid-control laws starting from the same initial position $x_0 = [0, 1]^T$ are plotted on the top of Fig. 5.1. On the bottom of the same figure, the continuous control signals corresponding to the two trajectories are plotted. The actual values of the first 10 steps of the trajectories and the controls are also provided in Tables 5.1 and 5.2.

Table 5.1. Closed-loop trajectory and controls driven by $\pi_\infty^{1,6}$ with $x_0 = [0, 1]^T$ (columns: $t$, $x(t)$, $u(t)$, $v(t)$).

Table 5.2. Closed-loop trajectory and controls driven by $\pi_\infty^{0.1,5}$ with $x_0 = [0, 1]^T$ (columns: $t$, $x(t)$, $u(t)$, $v(t)$).

It can be seen that the system can indeed be stabilized by $\pi_\infty^{1,6}$. Furthermore, it is quite clear from Fig. 5.1 that $\pi_\infty^{0.1,5}$ stabilizes the system with a faster convergence speed and a smaller control energy than $\pi_\infty^{1,6}$. This is because a smaller relaxation $\epsilon$ makes the resulting trajectory closer to the optimal trajectory of the DSLQR problem.

5.5.2 Example 2

Consider another two-mode autonomous switched linear system, where

$$A_1 = \begin{bmatrix} 0.3 & 1 \\ \cdots & \cdots \end{bmatrix}, \quad A_2 = \begin{bmatrix} 1.2 & 1 \\ \cdots & \cdots \end{bmatrix}, \quad B_1 = B_2 = 0,$$

and $Q_i = I_2$, $R_i = 1$, $i = 1, 2$. Clearly, none of the subsystems is stabilizable. One simple strategy to exponentially stabilize the system is to alternate between the two subsystems at each time step. Such a switching strategy is nonstationary and does not depend on the system state (a minimal sketch of this strategy is given below). From the results of this chapter, the system can also be stabilized by a stationary state-dependent feedback policy. To find such a stationary policy, we apply Algorithm 6 with $\epsilon = 1$. The algorithm terminates after 4 steps, resulting in a stabilizing policy $\pi_\infty^{1,5} = \{\xi_5^1, \xi_5^1, \ldots\}$. The relaxed switched Riccati set contains only 2 matrices:

$$H_5^1 = \{\cdots\}. \qquad (5.8)$$

With the above matrices, starting from any initial condition $x_0$, the closed-loop system trajectory and the corresponding switching-control sequence can be easily computed using equation (4.16). For example, we tested the stationary policy $\pi_\infty^{1,5}$ with an initial condition $x_0 = [1, 1]^T$, and the corresponding trajectory and switching-control sequence are listed in Table 5.3. It can be seen that the system is indeed stable under the stationary policy $\pi_\infty^{1,5}$.

Table 5.3. Closed-loop trajectory and switching controls driven by $\pi_\infty^{1,5}$ for Example 2 with $x_0 = [0, 1]^T$ (columns: $t$, $x(t)$, $v(t)$).
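The alternating strategy mentioned above can be written in a couple of lines. The sketch below is an added illustration with placeholder matrices (the actual entries of $A_1$ and $A_2$ for this example are not reproduced here); it simply applies the two modes periodically, independent of the state.

```python
import numpy as np

def alternate(A, z, T):
    """Open-loop periodic switching v(t) = t mod 2 for the autonomous system
    x(t+1) = A[v(t)] x(t); nonstationary and independent of the state."""
    x = np.asarray(z, dtype=float)
    traj = [x.copy()]
    for t in range(T):
        x = A[t % 2] @ x
        traj.append(x.copy())
    return np.array(traj)

# Placeholder matrices (not the A_1, A_2 of Example 2): each mode is unstable on
# its own, but the two-step product is Schur, so alternation drives x to zero.
A = [np.array([[0.4, 1.0], [0.0, 1.6]]), np.array([[1.5, 1.0], [0.0, 0.3]])]
print(np.linalg.norm(alternate(A, [1.0, 1.0], 20), axis=1))
```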

We also want to use this example to graphically illustrate the decision regions defined in Section 5.4. For this purpose, let $P_1$ and $P_2$ be the first and the second matrices in (5.8), respectively. As discussed in Section 5.4, the hybrid-control law $\xi_k^\epsilon$ divides the state space into 4 decision regions, depending on which pair $(i, P_j)$ achieves the minimum in (5.5), where $i, j = 1, 2$. Figure 5.2 illustrates the decision regions and the corresponding minimizing pairs $(i, P_j)$. This example not only demonstrates the effectiveness of Algorithm 6, but also shows that the stationary stabilizing policy may be quite complicated even when the system can be trivially stabilized by a nonstationary one.

Fig. 5.2. Decision regions of Example 2.
