Infinite-Horizon Switched LQR Problems in Discrete Time: A Suboptimal Algorithm With Performance Analysis

Size: px

Start display at page:

Download "Infinite-Horizon Switched LQR Problems in Discrete Time: A Suboptimal Algorithm With Performance Analysis"

Rafe Flynn
6 years ago
Views:

1 Infinite-Horizon Switched LQR Problems in Discrete Time: A Suboptimal Algorithm With Performance Analysis Wei Zhang, Jianghai Hu, and Alessandro Abate Abstract This paper studies the quadratic regulation problem for discrete-time switched linear systems DSLQR problem on an infinite time horizon. A general relaxation framewor is developed to simplify the computation of the value iterations. Based on this framewor, an efficient algorithm is developed to solve the infinite-horizon DSLQR problem with guaranteed closed-loop stability and suboptimal performance. Due to its stability and suboptimal performance guarantees, the proposed algorithm can be used as a general controller synthesis tool for switched linear systems. This paper studies an extension of the classical LQR problem to switched linear systems SLS, which will be referred to as the Discrete-Time Switched LQR DSLQR Problem. The goal is to find both the continuous-control and switchingcontrol strategies to minimize a quadratic cost functional over an infinite time horizon. The problem is expected to play a fundamental role in the study of switched and hybrid systems as the classical LQR problem does for linear systems. In our earlier wor [, we have proved some analytical properties for the finite-horizon DSLQR problem. Due to the discrete nature of the switching control sequence, the exact solution to the DSLQR problem is NP hard even for a finite time horizon. The main contribution of this paper is the development of an efficient algorithm to solve the infinite-horizon DSLQR problem with guaranteed suboptimal performance. The algorithm is based on a relaxation framewor that can yield efficient representations of the approximate value functions. The ey idea is to use convex optimization to identify and remove matrices that are redundant in characterizing the value functions and the corresponding suboptimal strategies. This is in line with many existing methods on approximate dynamic programming ADP [2, [7, [8, which aim at simplifying the computations by finding compact representations of the value functions up to certain numerical relaxation errors. Central to most ADP approaches is the analysis of the evolution of the relaxation errors through value iterations. Most existing error analysis methods for ADP either require a discount factor strictly less than one [2 or assume the infinitehorizon value function and the running cost function jointly satisfy some technical conditions that are in general difficult to verify a priori [7. In this paper, by taing advantage of the particular structure of the DSLQR problem, we develop a relaxation framewor that enables error analysis for undiscounted cost functions under easy-to-chec assumptions. Our analysis indicates that W. Zhang is with the Department of Electrical and Computer Engineering, The Ohio State University, Columbus OH zhang@ece.osu.edu J. Hu is with the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN jianghai@purdue.edu A. Abate is with the Delft Center for Systems and Control, Delft University of Technology, The Netherlands. a.abate@tudelft.nl as long as the SLS is stabilizable, the closed-loop performance of the DSLQR solution can be made arbitrarily close to the optimal one by properly choosing the relaxation parameter. The distance-to-optimality error bound is also derived analytically in terms of the subsystem matrices; and the bound can be evaluated a priori. Moreover, the suboptimal DSLQR solution is guaranteed to be a stabilizing controller whenever the SLS is stabilizable. These stability and performance guarantees are of fundamental importance and are often not provided by other existing optimal control strategies for switched/hybrid systems [3, [7, especially on infinite control horizon. A preliminary version of this wor has appeared in an earlier conference paper [9, and has also been successfully applied to study stabilization of SLSs [0. The main distinction of this paper as compared with [9 lies in the development of a stationary suboptimal infinite horizon policy rather than a receding-horizon type of policy proposed in [9. In addition, the asymptotic stabilizability assumption adopted in this paper is much weaer than the one in [9 that assumes one of the subsystems is stabilizable. These advances are not of theoretical importance, but also dramatically simplifies the design of DSLQR controllers for a wider range of applications. I. PROBLEM FORMULATION Letn,M andpbe some positive integers, and let Z + denote the set of nonnegative integers. We consider a discrete time switched linear system SLS given by: xt+ = A vt xt+b vt ut, t Z +, where xt R n is the continuous state, ut R p is the continuous control, vt M := {,...,M} is the discrete control that determines the discrete mode at time t. The sequence {ut,vt} t=0 is called a hybrid-control sequence. The hybrid-control action ut, vt at time t is determined through a state-feedbac hybrid-control law, namely, a function ξ µ,ν : R n R p M that maps the continuous state xt to a hybrid-control action ξxt = ut,vt. Here, µ : R n R p and ν : R n M are called the state-feedbac continuous-control law and the switching-control law, respectively. A sequence of hybridcontrol laws {ξ t } t=0 constitutes an infinite-horizon feedbac policy π = {ξ 0,ξ,...}. The closed-loop dynamics driven by a feedbac policy π = {µ t,ν t } t Z+ is governed by xt+ = A νtxtxt+b νtxtµ t xt, t Z +. 2 To be more specific, the closed-loop trajectory under π with initial state z R n will be denoted by x ;z,π. Let Lx,u,v = x T Q v x + u T R v u, x R n,u R p,v M, be the running cost function, where Q v = Q T v 0 and

2 2 R v = Rv T 0, v M, are strictly positive definite matrices of appropriate dimensions. Denote by λ min and λ max the smallest and largest eigenvalues of a symmetric matrix, respectively. Define λ Q = min i M{λ min Q i }. The strict positive definiteness of {Q i } i M implies that λ Q > 0. For a given initial state x0 = z R n, the performance of a feedbac policy π is measured through the following cost function: Jz;π = Lxt,µ t xt,ν t xt. 3 t=0 Clearly, the boundedness of J ;π requires the stabilizability of system. System is called exponentially stabilizable if π,b,a 0, such that xt;z,π 2 ba t z 2, t Z +,z R n. Such a policy π is called an exponentially stabilizing policy. It has been proved in [5 that exponential stability and asymptotic stability are equivalent for SLSs. Hence, without loss of generality, the following assumption is adopted throughout this paper. Assumption : System is exponentially stabilizable. The goal of this paper is to solve the following optimal control problem. Problem DSLQR problem: Under Assumption, find an infinite-horizon policy π that solves: V z = inf π Jz;π, z R n. Problem is a natural extension of the classical LQR problem to SLSs and is thus called the infinite-horizon Discrete-Time Switched LQR DSLQR Problem. The resulting infimum cost V will be referred to as the infinite-horizon value function of the DSLQR problem. II. APPROXIMATE VALUE ITERATION AND ERROR PROPAGATION A. Exact Value Iteration One way to tacle Problem is through dynamic programming. Denote by G + {g : R n R + {+ }} the space of all nonnegative, extended-valued functions on R n. For an arbitrary control law ξ = µ,ν : R n R p M, the operator T ξ : G + G + is defined by T ξ [gz = Lz,µz,νz+gA νz z +B νz µz, for all z R n, g G +. Minimizing T ξ over ξ yields the one-stage value iteration operator T : G + G +, i.e., { T [gz= inf Lz,u,v+gAv z+b v u }. u R p,v M Denote by T the composition of T with itself times, i.e., T = T T for all =,2,... Let V 0 0 and define V := T [V 0 as the -horizon value function of Problem. An important consequence of Assumption is the exponential convergence of V to V. Theorem : Assumption implies i β < such that λ Q z 2 V z β z 2, and ii 0 V z V z α V γv z 2, for all Z +, where α V = β2 λ Q 2 and γ V =. The constantβ can be chosen, in particular, as in 3 +λ Q /β in the Appendix. λ Q Proof: See the Appendix. The above theorem indicates that under Assumption, V is a good approximation of V for large. It turns out that for DSLQR problems, V taes a simple analytical form [. To see this, let A be the set of all positive semidefinite p.s.d. matrices, and let F be the set of all finite subsets of A, i.e., F := {H A : H < }, where denotes the cardinality of a set. Furthermore, we denote by ρ i : A A the Riccati mapping associated with subsystem i M, i.e., ρ i P=Q i +A T i PA i A T i PB i R i +B T i PB i B T i PA i. 4 We call the mapping ρ M : F F defined by: ρ M H = {ρ i P : i M and P H}, H F, the Switched Riccati Mapping SRM associated with Problem. The sets {H } Z+ generated iteratively by H + = ρ M H with H 0 = {0} are called the Switched Riccati Sets SRSs, and they characterize the finite-horizon value functions {V } Z+. Theorem 2 [: The -horizon value function of the DSLQR problem is V z = min P H z T Pz, Z +. Remar : It is beneficial to view H as a representation of V in the space F. From this perspective, ρ M is essentially a representation of the value iteration operator T in F. B. Relaxed Value Iteration As increases, the number of matrices in H grows exponentially, maing the exact characterization of V increasingly expensive. One way to alleviate this computational challenge is to remove matrices in H that are less important in terms of characterizing V. To formalize the idea, we introduce a few definitions. For any H F, define V H z = min P H z T Pz, z R n. Definition -Redundancy: For any > 0 and H F, a matrix P H is called -redundant with respect to H if V H\ Pz V H z+ z 2, for any z R n. Definition 2 -ES: For any > 0 and H F, a subset H H is called an -Equivalent-Subset -ES of H if V H z V H z V H z+ z 2, for any z R n. To simplify the computation of V at each step, we shall prune out as many redundant matrices as possible from the corresponding set H. However, checing whether a matrix in H is redundant or not is itself a challenging problem. Geometrically, any p.s.d. matrix P defines an ellipsoid possibly degenerate in R n : {x R n : x T Px }. It can be verified that P H is -redundant if and only if the ellipsoid corresponding to P +I n is contained in the union of all the ellipsoids corresponding to the matrices in H \{ P}. Since the union of ellipsoids is usually not convex, there is no general way to efficiently verify this geometric condition or equivalently the condition given in Definition. Nevertheless, a sufficient condition for -redundancy can be easily derived. Lemma : P H is -redundant if there exist nonnegative constants α,...,α H such that H α j = and P + I n H α j P j, where {P j } H is an enumeration of H\{ P}. Proof: Under the condition in the Lemma, for any z R n, we have z T P + I n z H z T α j P j z

V e + e ++ z T P jz z, for some j z {,..., H }. Thus, by definition, P is -redundant in H.

Lemma 2: The condition in Lemma holds if and only if the solution of the following convex optimization problem H max α j {α,...,α H } subject to: { H α j P j P +I n α j 0,j =,..., H satisfies H α j.

Based on the above lemma, an efficient algorithm Algorithm is developed to compute an -ES for any given set H F.

To reduce complexity, starting from H 0, we can apply ES after each SRM ρ M to obtain an -ES with fewer matrices before further propagating the set with the next SRM.

Definition 3 -relaxed SRSs: For > 0, the composite mapping ES ρ M : F F is called the -relaxed SRM of system.

3 3 Algorithm [ ES H Set H =. for each P H do if P does NOT satisfy the condition in Lemma with respect to H then H H {P} end if end for Return H Relaxed Value Iteration in V e V e e + Relaxed Value Iteration in e r e e r ES e + V e + e ++ z T P jz z, for some j z {,..., H }. Thus, by definition, P is -redundant in H. For given P and H, the condition in Lemma can be efficiently verified by solving a convex optimization problem. Lemma 2: The condition in Lemma holds if and only if the solution of the following convex optimization problem H max α j {α,...,α H } subject to: { H α j P j P +I n α j 0,j =,..., H satisfies H α j. Proof: Straightforward. The optimization problem in 5 can be easily solved using various convex optimization algorithms [4, Chapter. Based on the above lemma, an efficient algorithm Algorithm is developed to compute an -ES for any given set H F. Qualitatively, the algorithm simply removes all the matrices that satisfy the condition of Lemma and returns the set of remaining matrices. Denote by ES H the -ES returned by Algorithm. To reduce complexity, starting from H 0, we can apply ES after each SRM ρ M to obtain an -ES with fewer matrices before further propagating the set with the next SRM. The SRM ρ M followed by the pruning algorithm ES can be viewed as a relaxed version of the original SRM. Definition 3 -relaxed SRSs: For > 0, the composite mapping ES ρ M : F F is called the -relaxed SRM of system. The sets {H } Z + generated iteratively by: H 0 = {0} and H + = ES ρ M H, Z +, 6 are called the -relaxed SRSs associated with Problem. Just as the SRM ρ M represents the value iteration operator T of the DSLQR problem in the space F, the relaxed SRM ES ρ M represents the relaxed value iteration operator defined by R T, where R : G + G + will be referred to as the relaxation operator with parameter and is defined as R [V H z = V ESHz, z R n,h F. Therefore, the relaxed value iteration is the exact value iteration followed by a relaxation step. C. Approximate Value Function and Control Law We introduce two important functions, V = R T [V 0, and Ṽ + = T [V, Z Fig.. Representations of the relaxed value iteration operator in G + and F. The function V is called the -horizon approximate value function of Problem. The function Ṽ is an auxiliary function that is useful in studying the properties of V. The relationships between V and Ṽ and their connections to the relaxed SRSs are illustrated in Fig.. According to the lemma below, the error V V can be fully controlled by tuning the relaxation parameter. Lemma 3: For any z R n and 0, the functions defined in 7 satisfy: Ṽ z V z Ṽ z+ z 2, for ; 2 V z +/λ Q V z, for Z +, where V z is the exact value function with no relaxation. Proof: By Definition 2, we have 0 R [V H z V H z z 2, H F,z R n. This together with the fact that V = R [Ṽ implies part of this lemma. The second part can be proved by induction. The result clearly holds for = 0 because V0 = V 0 0. Assuming it is true for some Z +, we show that it also holds for +. For a fixed z R n, we have Ṽ + z = T [V z inf u,v {Lz,u,v++/λ Q V A v z +B v u}. Thus, V +z = R T [Vz Ṽ +z+ z 2 inf u,v { +/λ Q [Lz,u,v+V A v z +B v u = +/λ Q V +z, where the second step is due to the fact that Lz,u,v λ Q z 2 for all u R p and v M. Denote by ξ the hybrid-control law generated by V, i.e., ξ z = µ z,ν z =argmin{lz,u,v+va v z +B v u}, z R n. 8 u,v The function V and the corresponding control law ξ can be characterized analytically using the relaxed SRS H. Theorem 3: For any Z +, z R n and 0, we have where V z = min z T Pz, Ṽ P H + z = min Pz, P ρ M H zt and ξ z = K i zp zz,i z P z,i z = argmin P H,i M z T ρ i Pz, 9 }

4 4 and K i P is the Kalman gain for subsystem i corresponding to matrix P, i.e., K i P R i +B T i PB i B T i PA i. 0 Proof: The result follows easily by solving explicitly the quadratic optimization problem in 8 followed by a standard induction argument. III. SUBOPTIMAL POLICY WITH PERFORMANCE BOUND According to a standard result of dynamic programming [, the stationary policy π = {ξ,ξ,...} is an optimal solution to the infinite-horizon DSLQR problem, where ξ is the control law satisfying the Bellman equation: T ξ [V = V. However, exact characterizations of V and ξ are often impossible for DSLQR problems. A natural solution is to use the control law ξ defined in 8 with a sufficiently large and small in place of ξ to construct a stationary policy π, = {ξ,ξ,...}. In the rest of this section, we shall first establish the conditions under which π, is exponentially stabilizing, and then derive a performance bound for such a stabilizing policy. A. Stabilizing Condition with Efficient Test A sufficient condition for π, to be exponentially stabilizing is that the corresponding closed-loop system admits an exponential Lyapunov function. We briefly reccall that a function g G + is called an Exponential Lyapunov Function ELF of the closed-loop system 2 if there exist positive finite constants κ,κ 2,κ 3 such that { κ z 2 gz κ 2 z 2, z R n gxt gxt+ κ 3 xt 2, t Z +, where xt is an arbitrary trajectory of system 2. Theorem 4 Lyapunov Theorem: If there exists a policy π under which system 2 has an ELF with parameters κ, κ 2 and κ 3, then the closed-loop trajectory xt;z,π of system 2 under π is exponentially stable and satisfies xt;z,π 2 κ t 2 z 2, t Z +,z R n. κ +κ 3 /κ 2 The above theorem follows easily from standard Lyapunov theory [6, Thm 4. and its proof is omitted here. We now show that the approximate value function V will be an ELF of system 2 under the policy π, for sufficiently large and sufficiently small. Theorem 5: Fix an arbitrary z R n, and let ˆxt = xt;z,π,. Under Assumption, there exist constants κ 3 > 0, ˆ Z +, ˆ > 0 such that for any ˆ, ˆ, V is an ELF of system 2 satisfying λ Q z 2 Vz β+/λ Q z 2 V ˆxt V ˆxt+ κ 3 ˆxt 2 ; a b 2 the policy π, is exponentially stabilizing, namely, ˆxt 2 α x γx t z 2, with α x = β+/λ Q and γ λ x = Q β+/λ Q β+/λ Q +κ3. Here β 0, is the constant given in Theorem. Proof: Fix an arbitrary z R n. Inequality a follows directly from Lemma 3 and part of Theorem. To show inequality b, let ˆxt := xt;z,π, and let ût, ˆvt be the corresponding hybrid-control sequence. By 7 and 8, we now that for all t Z +, Ṽ+ ˆxt V ˆxt + = T [V ˆxt V ˆxt + λ Q ˆxt 2. Furthermore, by Lemma 3, Theorem and the fact that V z λ Q z 2 for, we have, for, Ṽ+ ˆxt V + ˆxt +/λ Q V +ˆxt +/λ Q V ˆxt +/λ Q +α VγV /λ Q V ˆxt +/λ Q + αv γ V V λ ˆxt V ˆxt+c, ˆxt 2, Q where c, is some constant that can be made arbitrarily small by choosing large and small. Therefore, V ˆxt V ˆxt + Ṽ+ ˆxt V ˆxt+ c, ˆxt 2 λ Q c, ˆxt 2, which implies b. 2 This follows directly from part and Theorem 4. This theorem indicates that as we increase and reduce, the functionv eventually becomes an ELF of system 2 with an associated stabilizing policy π,. To test whether V is an ELF, one shall verify condition b. By taing advantage of the piecewise quadratic structure of V, this verification process can be greatly simplified as follows. Lemma 4: Inequality b holds for some constant κ 3 > 0 if for each P H there exist nonnegative constants α j,j =,...,j, such that j α j = and P j α j ˆPj +κ 3 κ I n, 2 where {ˆP j } j is an { enumeration of the set } ρ MH and κ = λ min Ki P T R i K i P+Q i, with Ki P min i M,P H being the Kalman gain defined in 0. Proof: all z R n. Clearly, condition 2 implies that Recall that Ṽ + z = min P ρ M H z T Pz, for V z Ṽ + z κ 3 κ z 2, z R n. Now let z R n be arbitrary but fixed. Denote by ˆP,î the minimizer in 9 for this fixed z. Suppose that the system starts from z at time 0 and is driven by the policy π,. Let û = KîˆPz and ˆx = Aîz + Bîû be the continuous control at time 0 and the state at time t =, respectively. Plugging equations 4 and 0 into û, we have V ˆx = T min P H [ˆx P ˆx ˆx T ˆP ˆx = z T ρîˆpz û T Rîû z T Qîz z T ρîˆpz κ z 2 = Ṽ + z κ z 2. which implies V z V ˆx V z Ṽ + z + κ z 2 κ 3 z 2. Similar to Lemma, condition 2 can be verified by solving a convex optimization problem as in Lemma 2. B. Performance Bound for π, Whenever π, is exponentially stabilizing, the cost J ;π, will be bounded from above. We now derive an analytical expression for this bound.

5 5 Theorem 6: If V satisfies the condition in Theorem 5, then the cost associated with π, is bounded from above: Jz;π, +ηπ, V z, where ηπ, = β +α λ V γv α x. Here β, α Q γ xλ V, γ V, α x and γ x are Q the constants defined in Theorems and 5. Proof: Fix an arbitrary z R n. Let ˆxt = xt;z,π, for t Z +, and let û,ˆv be the corresponding hybridcontrol sequence. Then, Jz;π, = Lˆxt,ût,ˆvt = t=0 t=0 [Ṽ + ˆxt V ˆxt+ =Ṽ + z+ [Ṽ + ˆxt V ˆxt, t= where the last step follows since the stability of the trajectory ˆxt implies that V xt 0 as t. Using Lemma 3, Theorems and 5, and noticing the monotonicity of the value functions, yields Jz;π, = Ṽ +z + [Ṽ + ˆxt V ˆxt t= + [ λ V z+ +/λ Q V +ˆxt V ˆxt Q t= V z+ β z 2 + [V + ˆxt V ˆxt + λ Q β ˆxt 2 λ t= Q V z+ t= β λ Q+α V γ V α x γ x z 2 +ηπ, V z. Remar 2: The error bound ηπ, is derived based on the worst case scenario and thus may be conservative. The conservative bound is of theoretical importance as it indicates that the error decays linearly as decreases, and exponentially as increases. Moreover, the error can be made arbitrarily small by choosing small enough and large enough. C. Overall Algorithm If system is exponentially stabilizable, then by Theorem 5, it can always be stabilized by π, for sufficiently large and sufficiently small. Such a policy can be found by checing condition, or more efficiently, by verifying condition 2 through the solution of a convex optimization problem. If the policy π, is exponentially stabilizing for some and, its corresponding cost Jz;π, is bounded from above for all initial states z R n. Theorem 6 implies that further increasing and decreasing will eventually result in a policy with any desired suboptimal performance. Since the control law ξ is completely characterized by the relaxed SRS H, a suboptimal policy of the form π, can be obtained through the relaxed SRM. The basic idea is to evolve H according to the relaxed SRM 6 and stop when the obtained H verifies condition 2. The resulting policy π is guaranteed to be suboptimal with a relative error upper bound ηπ,. The detailed solution procedure is given in Algorithm 2. By choosing a proper parameter pair, max in the algorithm, the returned policy π, is exponentially stabilizing and can achieve any desired suboptimal performance under Assumption. It is also worth mentioning that with the returned set H, the closed-loop control sequences driven by π, can also be easily determined using 9. For example, if the state at time t is xt, then the hybrid-control action at this time step is: ut = KîˆPxt and vt = î, where ˆP,î = argmin P H,i Mxt T ρ i Pxt. Algorithm 2 Infinite-Horizon Suboptimal Policy Choose > 0 and max Z +, and set = 0 and H0 = {0} for = to max do H ES ρ M H if H satisfies 2 then stop and return H that characterizes π, with a end if end for relative error bound ηπ,. A. A Simple 2D Example IV. NUMERICAL EXAMPLES Consider a simple infinite-horizon DSLQR problem with two second-order subsystems: [ [ [ [ 2 2 A =,B 0 =,A 2 =,B =. 2 Suppose that the state and control weights are Q = Q 2 = I 2 and R = R 2 =, respectively. Both subsystems are unstable but controllable. Algorithm 2 with = 0 4 is applied to solve this DSLQR problem. After 5 steps, we have {[ [ H =,, [ , [ }. We observe that H5 contains four matrices and it verifies condition 2 with κ 3 = Therefore, we can stop the iteration now with a stabilizing policy π,5 characterized by H5. The relative error bound of this policy is computed to be ηπ,5 = 0.3. The actual performance of the policy should be much better than indicated by this conservative bound. The bound can be improved by carrying out more iterations. For example, after 3 more steps, we will have {[ [ H8 =,, [ [ } , Again, the set H8 still contains only four matrices and all of them are very close to the ones in H5. The set H8 verifies condition 2 with κ 3 = However, with a larger

6 6 # of Problems Fig. 2. 2D state space Complexity # of Problems D state space Complexity Complexity distributions of the random examples. = 8, the conservative bound reduces to ηπ,8 = 0.009, which is less than one percent. For this particular example, the complexity of Algorithm 2 as measured by H, is indeed very small and stays at the maximum value 4 as opposed to growing exponentially. B. Complexity Statistics of Randomly Generated Examples To further demonstrate its effectiveness, Algorithm 2 is tested by two sets of randomly generated DSLQR problems. The first set consists of 000 two-dimensional DSLQR problems with 0 subsystems. The second set consists of 000 four-dimensional DSLQR problems with 4 subsystems. For both setups, the control horizon is infinite and = 0 3. All the instances of these problems are successfully solved by Algorithm 2 and the distributions of the complexity, namely, the number of matrices in H returned by the algorithm, are plotted in Fig. 2. It can be seen from the figure that all of the solutions of the two-dimensional problems require less than 50 matrices and a majority of them only need less than 5 matrices. However, a majority of the solutions of the four-dimensional problems need about 40 matrices and some of them may need more than 00 matrices. In general, the complexity of Algorithm 2 increases with the state dimension because randomly-generated higher-dimensional matrices are less liely to be redundant. In a higher dimensional state space, a larger relaxation is usually needed to retain the same computational complexity. V. CONCLUSION We have developed an efficient algorithm to solve the infinite-horizon DSLQR problem with guaranteed suboptimal performance. The proposed algorithm together with its analysis provides a general controller synthesis framewor for switched linear systems with quadratic performance index. A preliminary version of the proposed algorithm has already been successfully applied to stabilize switched linear systems [0. Since the LQR controller is widely used to achieve local stability for nonlinear systems, our DSLQR solution can be similarly used to locally stabilize switched nonlinear systems. In addition, we also envision the algorithm and its analysis to be useful in quadratic estimation, robust control, and model predictive control problems of switched linear systems or even general hybrid systems. max i M {λ max Q i },λ + R = max i M{λ max R i }, and σ { } + A = max i M λmax A T i A i. Let σ + min be the smallest positive singular value of a nonzero matrix. If I + B, let ˆσ B = min i I + {σ + min B i}. Let π be an exponentially stabilizing B policy, and let b and a 0, be the constants such that xt;z,π 2 ba t z 2. Define bλ + Q a β =, if I+ B = λ + Q +λ+ 2[a+σ + A 2 b R ˆσ B 2 a, otherwise; 3 Part i of Theorem can be found in [0, Lemma 4. To prove Part ii, we fix an arbitrary z R n and integer >, let x be the optimal -horizon trajectory starting from z and let u,v be the corresponding optimal hybrid-control sequence. Then, for t = 0,...,, we have V t x t V t+x t+ = Lx t,u t,v t λ Q x t 2 λ Q β V tx t λ Q β V t+x t +. Since λ Q x t 2 V t x t β x t 2, for t = 0,...,, the above inequality implies x 2 β λ Q +λ Q /β z 2. 4 Note that we can not obtain a similar inequality for x as V 0 x 0. By the optimality of V, we have V z 2 t=0 Lx t,u t,v t + V x = V z Lx,u,v +V x V z+β λ Q x 2. The desired result then follows easily by plugging 4 into the above inequality. REFERENCES [ D. P. Bertseas. Dynamic Programming and Optimal Control, volume 2. Athena Scientific, 2nd edition, 200. [2 D. P. Bertseas and J. Tsitsilis. Neuro-Dynamic Programming. Athena Scientific, Sep [3 F. Borrelli, M. Baotic, A. Bemporad, and M. Morari. Dynamic programming for constrained optimal control of discrete-time linear hybrid systems. Automatica, 4:709 72, Oct [4 S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, New Yor, NY, USA, [5 J. Hu, J. Shen, and W. Zhang. Generating functions of switched linear systems: Analysis, computation, and stability applications. IEEE Transaction on Automatic Control, 565: , May 20. [6 H. K. Khalil. Nonlinear Systems. Prentice Hall, [7 B. Lincoln and A. Rantzer. Relaxing dynamic programming. IEEE Transactions on Automatic Control, 58: , Aug [8 W. B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality Wiley Series in Probability and Statistics. Wiley- Interscience, [9 W. Zhang, A. Abate, and J. Hu. Efficient suboptimal solutions of switched LQR problems. In Proceedings of the American Control Conference, St. Louis, MO, Jun [0 W. Zhang, A. Abate, J. Hu, and M. Vitus. Exponential stabilization of discrete-time switched linear systems. Automatica, 45: , Nov [ W. Zhang, J. Hu, and A. Abate. On the value functions of the discretetime switched LQR problem. IEEE Transactions on Automatic Control, 54: , Nov APPENDIX Denote by I + B M the set of indices of nonzero B matrices, i.e., I + B {i M : B i 0}. Define λ + Q =

Stabilization of Discrete-Time Switched Linear Systems: A Control-Lyapunov Function Approach

Stabilization of Discrete-Time Switched Linear Systems: A Control-Lyapunov Function Approach Wei Zhang 1, Alessandro Abate 2 and Jianghai Hu 1 1 School of Electrical and Computer Engineering, Purdue University,