Notes for ENEE 664: Optimal Control

André L. Tits

DRAFT August 2008


Contents

1 Motivation and Scope
  1.1 Some Examples
  1.2 Scope of the Course

2 Linear Quadratic Optimal Control
  2.1 Fixed end point problem
  2.2 Free end point problem
  2.3 Infinite horizon LTI problem
  2.4 More general optimal control problems

3 Unconstrained Optimization
  3.1 First order condition of optimality
  3.2 Steepest descent method
  3.3 Introduction to convergence analysis
  3.4 Minimization of convex functions
  3.5 Second order optimality conditions
  3.6 Conjugate direction methods
  3.7 Rates of convergence
  3.8 Newton's method
  3.9 Variable metric methods

4 Constrained Optimization
  4.1 Abstract Constraint Set
  4.2 Equality Constraints - First Order Conditions
  4.3 Equality Constraints - Second Order Conditions
  4.4 Inequality Constraints - First Order Conditions
  4.5 Mixed Constraints - First Order Conditions
  4.6 Mixed Constraints - Second Order Conditions
  4.7 Glance at Numerical Methods for Constrained Problems
  4.8 Sensitivity
  4.9 Duality
  4.10 Linear programming

5 Dynamic Optimization
  5.1 Introduction to the calculus of variations
  5.2 Discrete-Time Optimal Control
  5.3 Continuous-Time Optimal Control - Linear Systems
  5.4 Continuous-Time Optimal Control - Nonlinear Systems
  5.5 Dynamic programming

A Generalities on Vector Spaces

B On Differentiability and Convexity
  B.1 Differentiability
  B.2 Some elements of convex analysis
  B.3 Acknowledgement

Chapter 1

Motivation and Scope

1.1 Some Examples

We give some examples of design problems in engineering that can be formulated as mathematical optimization problems. Although we emphasize here engineering design, optimization is widely used in other fields such as economics or operations research. Such examples can be found, e.g., in [15].

Example 1.1 Design of an operational amplifier (opamp)

Suppose the following features (specifications) are desired:

1. a large gain-bandwidth product
2. sufficient stability
3. low power dissipation

In this course, we deal with parametric optimization. This means, for this example, that we assume the topology of the circuit has already been chosen, the only freedom left being the choice of the values of a number of design parameters (resistors, capacitors, various transistor parameters). In the real world, once the parametric optimization has been performed, the designer will possibly decide to modify the topology of his circuit, hoping to be able to achieve better performance. Another parametric optimization is then performed. This loop may be repeated many times.

To formulate the opamp design problem as an optimization problem, one has to specify one (possibly several) objective function(s) and various constraints. We decide on the following goal:

  minimize    the power dissipated
  subject to  gain-bandwidth product ≥ M₁ (given)
              frequency response ≤ M₂ at all frequencies.

The last constraint will prevent too high a peaking in the frequency response, thereby ensuring a sufficient closed-loop stability margin.

We now denote by x the vector of design parameters

  x = (R₁, R₂, ..., C₁, C₂, ..., αᵢ, ...) ∈ Rⁿ

For any given x, the circuit is entirely specified and the various quantities mentioned above can be computed. More precisely, we can define

  P(x) = power dissipated
  GB(x) = gain-bandwidth product
  FR(x, ω) = frequency response, as a function of the frequency ω.

We then write the optimization problem as

  min {P(x) : GB(x) ≥ M₁, FR(x, ω) ≤ M₂ ∀ω ∈ Ω}   (1.1)

where Ω = [ω₁, ω₂] is a range of critical frequencies. To obtain a canonical form, we now define

  f(x) := P(x)   (1.2)
  g(x) := M₁ − GB(x)   (1.3)
  φ(x, ω) := FR(x, ω) − M₂   (1.4)

and we obtain

  min {f(x) : g(x) ≤ 0, φ(x, ω) ≤ 0 ∀ω ∈ Ω}   (1.5)

Note. We will systematically use notations such as

  min {f(x) : g(x) ≤ 0}

not to just indicate the minimum value, but rather as a short-hand for

  minimize f(x) subject to g(x) ≤ 0.

More generally, one would have

  min {f(x) : gᵢ(x) ≤ 0, i = 1, 2, ..., m, φᵢ(x, ω) ≤ 0 ∀ω ∈ Ωᵢ, i = 1, ..., k}.   (1.6)

If we define g : Rⁿ → Rᵐ by

  g(x) = (g₁(x), ..., g_m(x))ᵀ   (1.7)

and, assuming that all the Ωᵢ's are identical, if we define φ : Rⁿ × Ω → Rᵏ by

  φ(x, ω) = (φ₁(x, ω), ..., φ_k(x, ω))ᵀ   (1.8)

we obtain again

  min {f(x) : g(x) ≤ 0, φ(x, ω) ≤ 0 ∀ω ∈ Ω}   (1.9)

[This is called a semi-infinite optimization problem: finitely many variables, infinitely many constraints.]

Note. If we define

  ψᵢ(x) = sup_{ω ∈ Ω} φᵢ(x, ω)   (1.10)

then (1.9) is equivalent to

  min {f(x) : g(x) ≤ 0, ψ(x) ≤ 0}   (1.11)

(more precisely, {x : φ(x, ω) ≤ 0 ∀ω ∈ Ω} = {x : ψ(x) ≤ 0}) and ψ(x) can be absorbed into g(x). This transformation may not be advisable, for the following reasons: (i) some potentially useful information (e.g., what are the critical values of ω) is lost when replacing (1.9) by (1.11); (ii) for given x, ψ(x) may not be computable exactly in finite time (this computation involves another optimization problem); (iii) ψ may not be smooth even when φ is, as shown in the exercise below. Thus (1.11) may not be solvable by classical methods.

Exercise 1.1 Prove the equivalence between (1.9) and (1.11). (To prove A = B, prove A ⊆ B and B ⊆ A.)

Exercise 1.2 Suppose that φ : Rⁿ × Ω → R is continuous and that Ω is compact, so that the sup in (1.10) can be written as a max. Show that ψ is continuous.

Exercise 1.3 Show, by exhibiting counterexamples, that (i) (compactness of Ω and) continuity of φ in each variable separately does not imply continuity of ψ (even though the sup in (1.10) is achieved under such assumptions), and that (ii) continuity of ψ does not follow either if the compactness assumption is dropped (even if the sup is achieved).
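Points (ii) and (iii) above are easy to see numerically. The following is a minimal sketch (with hypothetical data, not taken from the notes): ψ is approximated by maximizing φ(x, ·) over a finite grid of Ω, i.e., by approximately solving the inner optimization problem; with the smooth function φ(x, ω) = x sin ω and Ω = [0, 2π], one gets ψ(x) = |x|, which is not differentiable at x = 0.

```python
import numpy as np

def psi(x, phi, omega_grid):
    """Approximate psi(x) = sup_{omega in Omega} phi(x, omega) by brute
    force over a finite grid of Omega (the inner optimization problem)."""
    return max(phi(x, w) for w in omega_grid)

# phi is smooth in both arguments, yet psi(x) = |x| has a kink at x = 0.
phi = lambda x, w: x * np.sin(w)
grid = np.linspace(0.0, 2 * np.pi, 1001)
for x in (-1.0, -0.1, 0.0, 0.1, 1.0):
    print(x, psi(x, phi, grid))   # prints approximately |x|
```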

Exercise 1.4 Exhibit an example where φ ∈ C∞ (all derivatives exist and are continuous), where Ω is compact, but where ψ is not everywhere differentiable.

However, in this course, we will limit ourselves mostly to classical (non semi-infinite) problems (and will generally assume continuous differentiability), i.e., to problems of the form

  min {f(x) : g(x) ≤ 0, h(x) = 0}

where f : Rⁿ → R, g : Rⁿ → Rᵐ, h : Rⁿ → Rˡ, for some positive integers n, m and l.

Remark 1.1 To fit the opamp design problem into formulation (1.11) we had to pick one of the design specifications as objective (to be minimized). Intuitively more appealing would be some kind of multiobjective optimization problem.

Example 1.2 Design of a p.i.d. controller (proportional - integral - derivative)

The scalar plant G(s) is to be controlled by a p.i.d. controller (see Figure 1.1). Again, the structure of the controller has already been chosen; only the values of three parameters have to be determined (x = [x₁, x₂, x₃]ᵀ).

[Figure 1.1: Unity feedback loop: reference R(s), error E(x, s), p.i.d. controller x₁ + x₂/s + x₃s, plant G(s); Y(x, s) is the output and T(x, s) the closed-loop transfer function.]

Suppose the specifications are as follows:

- low value of the ISE for a step input (ISE = integral of the square of the difference (error) between input and output, in the time domain)
- enough stability
- short rise time, settling time, low overshoot

We decide to minimize the ISE, while keeping the Nyquist plot of T(x, s) outside some forbidden region (see Figure 1.2) and keeping rise time, settling time, and overshoot under given values. The following constraints are also specified:

  −10 ≤ x₁ ≤ 10,  −10 ≤ x₂ ≤ 10,  0.1 ≤ x₃ ≤ 10

Exercise 1.5 Put the p.i.d. problem in the form (1.6), i.e., specify f, gᵢ, φᵢ, Ωᵢ.
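The objective f(x) of Exercise 1.5 (the ISE) is easy to evaluate numerically. A minimal sketch, assuming a hypothetical plant G(s) = 1/(s + 1) and the p.i.d. structure x₁ + x₂/s + x₃s of Figure 1.1 (the plant and the design point below are stand-ins, not data from the notes):

```python
import numpy as np
from scipy.signal import TransferFunction, step

def ise_step(x, num_g=(1.0,), den_g=(1.0, 1.0), t_final=20.0):
    """ISE of the unit-step error for the loop of Figure 1.1 with
    controller C(s) = x1 + x2/s + x3*s around the plant G = num_g/den_g."""
    x1, x2, x3 = x
    num_ol = np.polymul([x3, x1, x2], num_g)    # C(s)G(s) numerator
    den_ol = np.polymul([1.0, 0.0], den_g)      # C(s)G(s) denominator (s factor)
    den_cl = np.polyadd(den_ol, num_ol)         # closed loop T = CG/(1 + CG)
    t, y = step(TransferFunction(num_ol, den_cl),
                T=np.linspace(0.0, t_final, 2000))
    return np.trapz((1.0 - y) ** 2, t)          # integral of squared error

print(ise_step([2.0, 1.0, 0.5]))   # f(x) at one (hypothetical) design point
```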

[Figure 1.2: Left: the Nyquist plot T(x, jω), ω ∈ [0, ω̄], has to stay outside a forbidden region containing the critical point (−1, 0) and bounded by the parabola y = (a + 1)x² − b. Right: for a step input, y(x, t) is desired to remain between l(t) and u(t) for t ∈ [0, T]; Tr and Ts mark the rise and settling times.]

Example 1.3 Consider again a plant, possibly nonlinear and time varying, and suppose we want to determine the best control u(t) to approach a desired response:

  ẋ = F(x, u, t)
  y = G(x, t)

We may want to determine u(·) to minimize the integral

  J(u) = ∫₀ᵀ (y_u(t) − v(t))² dt

where y_u(t) is the output corresponding to control u(·) and v(·) is some reference signal. Various features may have to be taken into account:

- Constraints on u(·) (for realizability): piecewise continuous, u(t) ∈ U ∀t
- T may be finite or infinite
- x(0), x(T) may be free, fixed, constrained
- The entire state trajectory x(·) may be constrained, e.g., to keep the temperature reasonable
- One may require a closed loop control, e.g., u(t) = u(x(t)). It is well known that such feedback control systems are much less sensitive to perturbations and modeling errors.

Unlike Example 1.1 and Example 1.2, Example 1.3 is an optimal control problem. Whereas discrete-time optimal control problems can be solved by classical optimization techniques, continuous-time problems involve optimization in infinite-dimensional spaces (a complete waveform has to be determined).

To conclude this section we now introduce the class of problems that will be studied in this course. Consider the abstract optimization problem

  (P)  min {f(x) : x ∈ S}

where S is a subset of a vector space X and where f : X → R is the cost or objective function. S is the feasible set. Any x in S is a feasible point.

Definition 1.1 A point x̂ is called a (strict) global minimizer for (P) if x̂ ∈ S and

  f(x̂) ≤ f(x)  ∀x ∈ S
  (with < whenever x ∈ S, x ≠ x̂)

Assume now X is equipped with a norm.

Definition 1.2 A point x̂ is called a (strict) local minimizer for (P) if x̂ ∈ S and ∃ε > 0 such that

  f(x̂) ≤ f(x)  ∀x ∈ S ∩ B(x̂, ε)
  (with < whenever x ∈ S ∩ B(x̂, ε), x ≠ x̂)

1.2 Scope of the Course

1. Types of optimization problems considered

(i) Finite-dimensional:
  - unconstrained
  - equality constrained
  - inequality [and equality] constrained
  - linear, quadratic programs, convex problems
  - multiobjective problems
  - discrete optimal control

(ii) Infinite-dimensional:
  - calculus of variations (no control signal) (old: 1800)
  - optimal control (new: 1950's)

Note: most types in (i) can be present in (ii) as well.

2. Results sought

Essentially, solve the problem. The steps are

  - conditions of optimality (a simpler characterization of solutions)
  - numerical methods: solve the problem, generally by solving some optimality condition or, at least, using the insight such conditions provide
  - sensitivity: how good is the solution, in the sense of "what if we didn't solve exactly the right problem?"
  - (duality: some transformation of the original problem into a hopefully simpler optimization problem)


Chapter 2

Linear Quadratic Optimal Control

References: [21, 1]. Background material for this chapter is given in Appendix A.

Consider the linear control system

  ẋ(t) = A(t)x(t) + B(t)u(t),  x(t₀) = x₀   (2.1)

where x(t) ∈ Rⁿ, u(t) ∈ Rᵐ and A(·) and B(·) are matrix-valued functions. Suppose A(·), B(·) and u(·) are continuous. Then (2.1) has the unique solution

  x(t) = Φ(t, t₀)x₀ + ∫_{t₀}^{t} Φ(t, σ)B(σ)u(σ) dσ

where the state transition matrix Φ satisfies the homogeneous differential equation

  (∂/∂t)Φ(t, t₀) = A(t)Φ(t, t₀)

with initial condition Φ(t₀, t₀) = I. Finally, for any t₁, t₂, the transition matrix Φ(t₁, t₂) is invertible and Φ(t₁, t₂)⁻¹ = Φ(t₂, t₁).

2.1 Fixed end point problem

Let t₁ > t₀. Suppose we wish to use a control function u(·) : [t₀, t₁] → Rᵐ in a certain class U of sufficiently regular functions. For simplicity, let U = C[t₀, t₁]ᵐ, the class of continuous such functions.

Question: Given x₁ ∈ Rⁿ, does there exist u ∈ U such that, for system (2.1), x(t₁) = x₁?

If the answer to the above is yes, we say that x₁ is reachable from (x₀, t₀) at time t₁. If moreover this holds for all x₀, x₁ ∈ Rⁿ then we say that the system (2.1) is reachable on [t₀, t₁].
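Both the transition matrix and the variation-of-constants formula above are straightforward to evaluate numerically. A minimal sketch (the LTV data A(t), B(t) and the control u(t) below are hypothetical):

```python
import numpy as np
from scipy.integrate import solve_ivp, quad_vec

# Hypothetical LTV data A(t), B(t) and a control u(t).
A = lambda t: np.array([[0.0, 1.0], [-2.0 - np.sin(t), -1.0]])
B = lambda t: np.array([[0.0], [1.0]])
u = lambda t: np.array([np.cos(t)])

def Phi(t, t0):
    """State transition matrix: solve dPhi/dt = A(t) Phi, Phi(t0, t0) = I."""
    sol = solve_ivp(lambda s, y: (A(s) @ y.reshape(2, 2)).ravel(),
                    (t0, t), np.eye(2).ravel(), rtol=1e-10, atol=1e-12)
    return sol.y[:, -1].reshape(2, 2)

# Check x(t1) = Phi(t1,t0) x0 + int_{t0}^{t1} Phi(t1,s) B(s) u(s) ds
# against direct integration of xdot = A(t) x + B(t) u(t).
t0, t1, x0 = 0.0, 1.0, np.array([1.0, 0.0])
direct = solve_ivp(lambda s, x: A(s) @ x + B(s) @ u(s),
                   (t0, t1), x0, rtol=1e-10, atol=1e-12).y[:, -1]
conv, _ = quad_vec(lambda s: Phi(t1, s) @ B(s) @ u(s), t0, t1)
print(direct, Phi(t1, t0) @ x0 + conv)   # the two should agree
```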

There is no loss of generality in assuming that x₁ = θ, as shown by the following exercise.

Exercise. Define x̂(t) := x(t) − Φ(t, t₁)x₁. Then x̂ satisfies

  (d/dt)x̂(t) = A(t)x̂(t) + B(t)u(t).

Conclude that, under dynamics (2.1), u steers (x₀, t₀) to (x₁, t₁) if and only if it steers (x₀ − Φ(t₀, t₁)x₁, t₀) to (θ, t₁).

Since Φ(t₀, t₁) is invertible, it follows that system (2.1) is reachable on [t₀, t₁] if and only if it is controllable on [t₀, t₁], i.e., if and only if, given x₀, there exists u ∈ U that steers (x₀, t₀) to (θ, t₁).

Note. Equivalence between reachability and controllability (to the origin) does not hold in the discrete-time case, where controllability is a weaker property than reachability.

Now reachability of (θ, t₁) from (ξ, t₀), for some ξ, is equivalent to solvability of the linear equation (in u(·)):

  Φ(t₁, t₀)ξ + ∫_{t₀}^{t₁} Φ(t₁, σ)B(σ)u(σ) dσ = 0.

Equivalently (multiplying on the left by the nonsingular matrix Φ(t₀, t₁)),

  ξ = −∫_{t₀}^{t₁} Φ(t₀, σ)B(σ)u(σ) dσ =: L(u)   (2.2)

with L : U → Rⁿ a linear map. Clearly, θ is reachable at time t₁ from (ξ, t₀) if and only if ξ ∈ R(L). It follows that (2.1) is controllable on [t₀, t₁] if and only if R(L) = Rⁿ.

Suppose now θ is reachable at time t₁ from (ξ, t₀) and suppose we want to reach θ while spending the least energy, i.e., while minimizing

  ⟨u, u⟩ = ∫_{t₀}^{t₁} u(t)ᵀu(t) dt

where ⟨·, ·⟩ is the L₂ inner product over U (see Example A.4 in Appendix A). That is, we want to solve the minimum energy fixed endpoint problem

  (FEP)  minimize ⟨u, u⟩ subject to (2.1), x(t₁) = θ, and u ∈ U.

We know (see Appendix A) that the unique such control is given by u = L*η₀, where L* is the adjoint of L for the inner product ⟨·, ·⟩, and where η₀ is any point in Rⁿ satisfying

  LL*η₀ = ξ

(and such points do exist).

It is shown in Appendix A that L* is given by

  (L*x)(t) = −Bᵀ(t)Φᵀ(t₀, t)x

which yields

  LL*x = −∫_{t₀}^{t₁} Φ(t₀, σ)B(σ)(L*x)(σ) dσ = ∫_{t₀}^{t₁} Φ(t₀, σ)B(σ)Bᵀ(σ)Φᵀ(t₀, σ) dσ x  ∀x,

i.e., LL* : Rⁿ → Rⁿ is given by

  LL* = ∫_{t₀}^{t₁} Φ(t₀, t)B(t)Bᵀ(t)Φᵀ(t₀, t) dt =: W(t₀, t₁).

Since R(L) = R(LL*), θ is reachable at t₁ from (ξ, t₀) if and only if

  ξ ∈ R(W(t₀, t₁)).

The matrix W(t₀, t₁) has entries

  (W(t₀, t₁))ᵢⱼ = ⟨(Φᵢ(t₀, ·)B(·))ᵀ, (Φⱼ(t₀, ·)B(·))ᵀ⟩,

where ⟨·, ·⟩ is again the L₂ inner product, i.e., W(t₀, t₁) is the Gramian matrix (or Gram matrix, or Gramian) associated with the vectors (Φ₁(t₀, ·)B(·))ᵀ, ..., (Φₙ(t₀, ·)B(·))ᵀ (= Bᵀ(·)(Φ₁(t₀, ·))ᵀ, ..., Bᵀ(·)(Φₙ(t₀, ·))ᵀ), which are the columns of the matrix Bᵀ(·)(Φ(t₀, ·))ᵀ. It is known as the controllability Gramian. It is invertible if and only if R(L) = Rⁿ, i.e., if and only if the system is controllable on [t₀, t₁]. Suppose this is the case and let

  ξ := ξ(x₀, t₀) := x₀ − Φ(t₀, t₁)x₁.

The minimum energy control that steers (x₀, t₀) to (x₁, t₁) is then given by

  u₀ = L*(LL*)⁻¹ξ(x₀, t₀),

i.e.,

  u₀(t) = −Bᵀ(t)Φᵀ(t₀, t)W(t₀, t₁)⁻¹ξ(x₀, t₀)   (2.3)

and the corresponding energy is given by

  ⟨u₀, u₀⟩ = ⟨ξ(x₀, t₀), W(t₀, t₁)⁻¹ξ(x₀, t₀)⟩.

Note that, as expressed in (2.3), u₀(t) depends explicitly, through ξ(x₀, t₀), on the initial state x₀ and initial time t₀. Consequently, if between t₀ and the current time t the state has been affected by an external perturbation, u₀ as expressed by (2.3) is no longer optimal (minimum energy) over the remaining time interval [t, t₁]. Let us address this issue. At time t₀, we have

  u₀(t₀) = −Bᵀ(t₀)Φᵀ(t₀, t₀)W(t₀, t₁)⁻¹ξ(x₀, t₀) = −Bᵀ(t₀)W(t₀, t₁)⁻¹ξ(x₀, t₀).

Intuitively, this must hold independently of the value of t₀, i.e., for the problem under consideration, Bellman's Principle of Optimality holds: independently of the initial state (at t₀), for u₀ to be optimal for (FEP), it is necessary that u₀ applied from the current time t ≥ t₀ up to the final time t₁, starting at the current state x(t), be optimal for the remaining problem, i.e., for the objective function

  ∫_{t}^{t₁} u(τ)ᵀu(τ) dτ.

Specifically, given x ∈ Rⁿ and t ∈ [t₀, t₁] such that x₁ is reachable at time t₁ from (x, t), denote by P(x, t; x₁, t₁) the problem of determining the control of least energy that steers (x, t) to (x₁, t₁), i.e., problem (FEP) with (x, t) replacing (x₀, t₀). Let x(·) be the state trajectory that results when the optimal control u₀ is applied, starting from x₀ at time t₀. Then Bellman's Principle of Optimality asserts that, for any t ∈ [t₀, t₁], the restriction of u₀ to [t, t₁] solves P(x(t), t; x₁, t₁).

Exercise 2.1 Prove that Bellman's Principle of Optimality holds for the minimum energy fixed endpoint problem. [Hint: Observe that the entire derivation so far still works when U is redefined to consist of the piecewise continuous functions on [t₀, t₁] (i.e., continuous except at finitely many points, with finite left and right limits at those points). Then, assuming by contradiction the existence of a lower energy control for P(x(t), t; x₁, t₁), show by construction that u₀ cannot be optimal for P(x₀, t₀; x₁, t₁).]

It follows that, for any t such that W(t, t₁) is invertible,

  u₀(t) = −Bᵀ(t)W(t, t₁)⁻¹ξ(x(t), t)  ∀t   (2.4)

which yields the closed loop implementation (independent of initial time and state) depicted in Figure 2.1, where

  v(t) = Bᵀ(t)W⁻¹(t, t₁)Φ(t, t₁)x₁.

[Figure 2.1: Closed loop implementation: u(t) = v(t) − Bᵀ(t)W⁻¹(t, t₁)x(t) drives the system ẋ = A(t)x + B(t)u.]

In particular, if x₁ = θ, then v(t) = 0 ∀t. If, for whatever reason, the state is perturbed at some point in time, (2.4) is still optimal over the remaining time interval (assuming there is no subsequent perturbation) while (2.3) is not.

Exercise 2.2 Prove (2.4) from (2.3) directly, without invoking Bellman's principle.

Example 2.1 (charging capacitor)

  (d/dt)(c v(t)) = i(t)

  minimize ∫₀^{t₁} r i(t)² dt  s.t. v(0) = v₀, v(t₁) = v₁

[Figure 2.2: Charging capacitor: current source i feeding, through resistor r, a capacitor c with voltage v.]

We obtain

  B(t) ≡ 1/c,  A(t) ≡ 0

  W(0, t₁) = ∫₀^{t₁} (1/c²) dt = t₁/c²

  η₀ = c²(v₀ − v₁)/t₁

  i₀(t) = −(1/c) · c²(v₀ − v₁)/t₁ = c(v₁ − v₀)/t₁ = constant.

The closed loop optimal feedback law is given by

  i₀(t) = −(c/(t₁ − t))(v(t) − v₁).

Exercise 2.3 Discuss the same optimal control problem (with fixed end points) with the objective function replaced by

  J(u) = ∫_{t₀}^{t₁} ⟨u(t), R(t)u(t)⟩ dt

where R(t) = R(t)ᵀ > 0 for all t ∈ [t₀, t₁] and R(·) is continuous. [Hint (to be proved if you use it, as usual): There is no loss of generality in assuming that R(t) is symmetric. Further, the Cholesky factor of a symmetric positive definite matrix is unique and continuous. Consider a change of control variable. Other approach: define a new inner product on U.]

Remark 2.1 In view of (2.2), a control u steers (x₀, t₀) to (x₁, t₁) if and only if it steers (x₀ − Φ(t₀, t₁)x₁, t₀) to (θ, t₁). Thus there is no loss of generality in assuming x₁ = θ.

The controllability Gramian W(·, ·) happens to satisfy certain simple equations. Recalling that

  W(t, t₁) = ∫_{t}^{t₁} Φ(t, σ)B(σ)Bᵀ(σ)Φ(t, σ)ᵀ dσ,

one easily verifies that W(t₁, t₁) = 0 and

  (d/dt)W(t, t₁) = A(t)W(t, t₁) + W(t, t₁)Aᵀ(t) − B(t)Bᵀ(t)   (2.5)

implying that, if W(t, t₁) is invertible, it satisfies

  (d/dt)W(t, t₁)⁻¹ = −W(t, t₁)⁻¹A(t) − Aᵀ(t)W(t, t₁)⁻¹ + W(t, t₁)⁻¹B(t)Bᵀ(t)W(t, t₁)⁻¹   (2.6)

Exercise 2.4 Prove (2.6). (Hint: (d/dt)M(t)⁻¹ = −M(t)⁻¹((d/dt)M(t))M(t)⁻¹.)

Equation (2.5) is linear. It is a Lyapunov equation. Equation (2.6) is quadratic. It is a Riccati equation (for W(t, t₁)⁻¹). W(·, ·) also satisfies the functional equation

  W(t₀, t₁) = W(t₀, t) + Φ(t₀, t)W(t, t₁)Φᵀ(t₀, t).

As we will see later, the Riccati equation plays a fundamental role in optimal control problems involving linear dynamics and quadratic cost (linear-quadratic problems). At this point, simply note that, if x₁ = θ and W(t, t₁) is invertible, then u₀(t) = −B(t)ᵀP(t)x(t), where P(t) = W(t, t₁)⁻¹ solves Riccati equation (2.6).

Example 2.2 (scalar Riccati equation)

  k̇ = 1 − k²,  k(T) = 0

Solution:

  tanh⁻¹(k(t)) − tanh⁻¹(k(T)) = t − T,  k(t) = tanh(t − T).
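The minimum-energy construction (2.2)-(2.4) is easy to check numerically. The sketch below (hypothetical LTI data, so that Φ(t, σ) = e^{A(t−σ)}) builds W(0, T) by quadrature, applies the open-loop control (2.3) with x₁ = θ, and verifies that the state indeed reaches the origin at time T:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad_vec, solve_ivp

# Hypothetical LTI double integrator; steer x0 to the origin on [0, T].
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
x0, T = np.array([1.0, 0.0]), 1.0

Phi = lambda t, s: expm(A * (t - s))     # state transition matrix (LTI case)

# Controllability Gramian W(0,T) = int_0^T Phi(0,t) B B' Phi(0,t)' dt.
W, _ = quad_vec(lambda t: Phi(0, t) @ B @ B.T @ Phi(0, t).T, 0.0, T)

# Open-loop minimum-energy control (2.3); with x1 = theta, xi = x0.
eta = np.linalg.solve(W, x0)
u0 = lambda t: -B.T @ Phi(0, t).T @ eta

# Sanity check: simulate (2.1) and confirm x(T) is (numerically) the origin.
xT = solve_ivp(lambda t, x: A @ x + B @ u0(t), (0.0, T), x0,
               rtol=1e-10, atol=1e-12).y[:, -1]
print(xT)   # ~ [0, 0]
```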

We have seen that, if W(t₀, t₁) is invertible, the optimal cost for problem (FEP) is given by

  J(u₀) = ⟨u₀, u₀⟩ = ⟨x(t₀) − Φ(t₀, t₁)x₁, W(t₀, t₁)⁻¹(x(t₀) − Φ(t₀, t₁)x₁)⟩.   (2.7)

This is clearly true for any t₀, so that, from a given time t < t₁ (such that W(t, t₁) is invertible), the cost-to-go is given by

  ⟨x(t) − Φ(t, t₁)x₁, W(t, t₁)⁻¹(x(t) − Φ(t, t₁)x₁)⟩.

In particular, if x₁ = θ, the cost-to-go takes the value ⟨x(t), W(t, t₁)⁻¹x(t)⟩. We will see that such an expression arises in other types of linear-quadratic optimal control problems, with W(t, t₁)⁻¹ replaced by some matrix K(t), solution of a certain Riccati equation. The following fundamental lemma, in which K(t) is arbitrary, expresses the variation of such expressions as a certain integral.

Lemma 2.1 (Fundamental Lemma) Let A(·), B(·) be continuous matrix-valued functions and K(·) = Kᵀ(·) be an absolutely continuous matrix-valued function. Suppose K̇(t) exists on [t₀, t₁]. Then, if x(t) and u(t) are related by

  ẋ(t) = A(t)x(t) + B(t)u(t),   (2.8)

it holds that

  ⟨x(t₁), K(t₁)x(t₁)⟩ − ⟨x(t₀), K(t₀)x(t₀)⟩
    = ∫_{t₀}^{t₁} ⟨ [x(t); u(t)], [K̇(t) + Aᵀ(t)K(t) + K(t)A(t), K(t)B(t); Bᵀ(t)K(t), 0] [x(t); u(t)] ⟩ dt   (2.9)

(here [x; u] denotes the stacked vector and [·, ·; ·, ·] a 2 × 2 block matrix).

Proof.

  ⟨x(t₁), K(t₁)x(t₁)⟩ − ⟨x(t₀), K(t₀)x(t₀)⟩ = ∫_{t₀}^{t₁} (d/dt)⟨x(t), K(t)x(t)⟩ dt
    = ∫_{t₀}^{t₁} ( ⟨ẋ(t), K(t)x(t)⟩ + ⟨x(t), K̇(t)x(t)⟩ + ⟨x(t), K(t)ẋ(t)⟩ ) dt

and the claim follows if one substitutes for ẋ(t) the right-hand side of (2.8).

Note that, while the integral in this lemma involves the paths x(t), u(t), t ∈ [t₀, t₁], its value depends only on the end points of the trajectory x(·), i.e., this integral is path independent. Now consider the more general quadratic cost

  J(u) := ∫_{t₀}^{t₁} ( ⟨x(t), L(t)x(t)⟩ + ⟨u(t), u(t)⟩ ) dt = ∫_{t₀}^{t₁} ⟨ [x(t); u(t)], [L(t), 0; 0, I] [x(t); u(t)] ⟩ dt,   (2.10)

where L(·) = L(·)ᵀ is continuous.

Let K(t) = K(t)ᵀ be some absolutely continuous time-dependent matrix. Using the Fundamental Lemma we see that, since x₀ and x₁ are fixed, it is equivalent to minimize

  J̃(u) := J(u) + ⟨x₁, K(t₁)x₁⟩ − ⟨x₀, K(t₀)x₀⟩
        = ∫_{t₀}^{t₁} ⟨ [x(t); u(t)], [L(t) + K̇(t) + Aᵀ(t)K(t) + K(t)A(t), K(t)B(t); Bᵀ(t)K(t), I] [x(t); u(t)] ⟩ dt.

To complete the square, suppose there exists K(t) that satisfies

  L(t) + K̇(t) + Aᵀ(t)K(t) + K(t)A(t) = K(t)B(t)B(t)ᵀK(t),

i.e.,

  K̇(t) = −Aᵀ(t)K(t) − K(t)A(t) + K(t)B(t)B(t)ᵀK(t) − L(t),   (2.11)

a Riccati Differential Equation (DRE). (Existence of a solution to this equation depends, in general, on L(·). As we will see later, when discussing the free endpoint problem, if L(t) is positive semidefinite for all t, then a solution exists, e.g., for every prescribed positive semidefinite final value K(t₁).) Then we get

  J̃(u) = ∫_{t₀}^{t₁} ⟨ [x(t); u(t)], [K(t)B(t)B(t)ᵀK(t), K(t)B(t); B(t)ᵀK(t), I] [x(t); u(t)] ⟩ dt
        = ∫_{t₀}^{t₁} ⟨ B(t)ᵀK(t)x(t) + u(t), B(t)ᵀK(t)x(t) + u(t) ⟩ dt.

Now, again supposing that some solution to (DRE) exists, let K(·) be any such solution and let

  v(t) = Bᵀ(t)K(t)x(t) + u(t).

It is readily verified that, in terms of the new control input v, the system dynamics become

  ẋ(t) = [A(t) − B(t)Bᵀ(t)K(t)]x(t) + B(t)v(t),

and the cost function takes the form

  J̃(u) = ∫_{t₀}^{t₁} ⟨v(t), v(t)⟩ dt.

That is, we end up with the problem

  minimize ∫_{t₀}^{t₁} ⟨v(t), v(t)⟩ dt
  subject to ẋ(t) = [A(t) − B(t)Bᵀ(t)Π(t, K₁, t₁)]x(t) + B(t)v(t)   (2.12)
             x(t₀) = x₀,  x(t₁) = x₁,

where, following standard usage, we have parametrized the solutions to (DRE) by their value (K₁) at time t₁, and denoted them by Π(t, K₁, t₁). This transformed problem (parametrized by K₁) is of a form identical to that solved earlier. Denote by Φ_{A−BBᵀΠ} and W_{A−BBᵀΠ} the state transition matrix and controllability Gramian for (2.12) (for economy of notation, we have kept implicit the dependence of Π on K₁). Then, for a given K₁, we can write the optimal control v₀^{K₁} for the transformed problem as (assuming x₁ = θ for simplicity)

  v₀^{K₁}(t) = −B(t)ᵀW_{A−BBᵀΠ}(t, t₁)⁻¹x(t),

and the optimal control u₀ for the original problem as

  u₀(t) = v₀^{K₁}(t) − B(t)ᵀΠ(t, K₁, t₁)x(t) = −B(t)ᵀ(Π(t, K₁, t₁) + W_{A−BBᵀΠ}(t, t₁)⁻¹)x(t).

Still with x₁ = θ, we also obtain

  J(u₀) = J̃(u₀) + ⟨x₀, Π(t₀, K₁, t₁)x₀⟩ = ⟨x₀, (W_{A−BBᵀΠ}(t₀, t₁)⁻¹ + Π(t₀, K₁, t₁))x₀⟩.

The cost-to-go at time t is

  ⟨x(t), (W_{A−BBᵀΠ}(t, t₁)⁻¹ + Π(t, K₁, t₁))x(t)⟩.

Finally, if L(t) is identically zero, we can pick K(t) identically zero and we recover the previous result.

Exercise 2.5 Show that controllability of (2.1) on [t₀, t₁] implies invertibility of W_{A−BBᵀΠ} and vice-versa.

We obtain the block diagram depicted in Figure 2.3. Thus u₀(t) = −B(t)ᵀp(t), with

  p(t) = (Π(t, K₁, t₁) + W_{A−BBᵀΠ}(t, t₁)⁻¹)x(t).

[Figure 2.3: Optimal feedback law: w = v₀ + u₀; the feedback paths −Bᵀ(t)Π(t, K₁, t₁) and −Bᵀ(t)W_{A−BBᵀΠ}(t, t₁)⁻¹ close the loop around the system ẋ = A(t)x + B(t)u.]

If x₁ ≠ θ, then

  w₀(t) = −Bᵀ(t)W_{A−BBᵀΠ}(t, t₁)⁻¹(x(t) − Φ_{A−BBᵀΠ}(t, t₁)x₁).

Note that while v₀ clearly depends on K₁, u₀ obviously cannot, since K₁ is an arbitrary symmetric matrix (subject to (DRE) having a solution with K(t₁) = K₁). Thus we could have assigned K₁ = 0 throughout the analysis. (Check the details of this.) The above is a valid closed-loop implementation, as it does not involve the initial point (x₀, t₀) (indeed, perturbations may have affected the trajectory between t₀ and the current time t). Π(t, K₁, t₁) can be precomputed (again, we must assume that such a solution exists). Finally, note that the optimal cost J(u₀) is given by

  J(u₀) = J̃(u₀) − (⟨x₁, K₁x₁⟩ − ⟨x₀, Π(t₀, K₁, t₁)x₀⟩)
        = ⟨x₀ − Φ_{A−BBᵀΠ}(t₀, t₁)x₁, W_{A−BBᵀΠ}(t₀, t₁)⁻¹(x₀ − Φ_{A−BBᵀΠ}(t₀, t₁)x₁)⟩ + ⟨x₀, Π(t₀, K₁, t₁)x₀⟩ − ⟨x₁, K₁x₁⟩,

and is independent of K₁. If x₁ = θ, we get

  J(u₀) = ⟨x₀, (W_{A−BBᵀΠ}(t₀, t₁)⁻¹ + Π(t₀, K₁, t₁))x₀⟩

and the cost-to-go from any time t < t₁ and state x(t) is

  ⟨x(t), (W_{A−BBᵀΠ}(t, t₁)⁻¹ + Π(t, K₁, t₁))x(t)⟩.

2.2 Free end point problem

We now consider optimal control problems for which the final value of the state is left unspecified (free end point). For such problems, there may be a cost associated with this value. In the context of quadratic objective functions we are led to consider

  minimize J(u) := ∫_{t₀}^{t₁} ( ⟨x(t), L(t)x(t)⟩ + ⟨u(t), u(t)⟩ ) dt + ⟨x(t₁), Qx(t₁)⟩
  subject to ẋ(t) = A(t)x(t) + B(t)u(t)   (2.13)
             x(t₀) = x₀.

Without loss of generality, L and Q are assumed symmetric (both can be zero).

Using Lemma 2.1 one can both complete the square and do away with the final cost term. For this, let K(t) be the solution of the Riccati equation

  K̇(t) = −Aᵀ(t)K(t) − K(t)A(t) − L(t) + K(t)B(t)Bᵀ(t)K(t)

with the condition K(t₁) = Q, i.e.,

  K(t) = Π(t, Q, t₁).

We assume that such a solution exists (this need not be the case). Then, eliminating the last term in J(u) by means of the lemma and performing manipulations identical to those in the fixed end point case, we obtain

  J(u) = ∫_{t₀}^{t₁} ‖Bᵀ(t)Π(t, Q, t₁)x(t) + u(t)‖² dt + ⟨x(t₀), Π(t₀, Q, t₁)x(t₀)⟩.   (2.14)

Since the initial point is fixed (x(t₀) = x₀), the optimal control in closed loop form is given by

  u₀(t) = −Bᵀ(t)Π(t, Q, t₁)x(t)   (2.15)

and the optimal value is given by ⟨x(t₀), Π(t₀, Q, t₁)x(t₀)⟩. The cost-to-go from any time t < t₁ and state x(t) is ⟨x(t), Π(t, Q, t₁)x(t)⟩.

Exercise 2.6 Let p(t) = Π(t, Q, t₁)x(t). Then u₀(t) = −B(t)ᵀp(t),

  [ẋ(t); ṗ(t)] = [A(t), −B(t)Bᵀ(t); −L(t), −Aᵀ(t)] [x(t); p(t)]   (2.16)

and J(u₀) = ⟨x₀, p(t₀)⟩.

Remark 2.2 We have not made any positive (or even nonnegative) definiteness assumption on L(t) or Q. The key assumption is that the stated Riccati equation has a solution Π(t, Q, t₁). Below we investigate conditions (in particular, on L and Q) which insure that this is the case.

Exercise 2.7 Investigate the case of the more general cost function

  J(u) = ∫_{t₀}^{t₁} ( ⟨x(t), L(t)x(t)⟩ + 2⟨u(t), S(t)x(t)⟩ + ⟨u(t), R(t)u(t)⟩ ) dt,

where R(t) is positive definite for all t. Hint: let v(t) = u(t) + M(t)x(t) with M(t) judiciously chosen.

Remark 2.3 In all cases considered so far, the optimal control (when it exists) is in the range space of Bᵀ(t) for all t. Give a simple explanation for this fact.

When L(t) = 0 for all t and Q is nonsingular, the solution of the Riccati equation is directly interpretable in terms of the controllability Gramian W(t, t₁). We observed above that if M(t) satisfies the linear equation

  Ṁ(t) = M(t)Aᵀ(t) + A(t)M(t) − B(t)Bᵀ(t)   (2.17)

and if K(t) := M(t)⁻¹ exists, then it satisfies the Riccati equation

  K̇(t) = −Aᵀ(t)K(t) − K(t)A(t) + K(t)B(t)Bᵀ(t)K(t).   (2.18)

Note that W(t, t₁) satisfies (2.17), with the terminal condition W(t₁, t₁) = 0. Its inverse (if it exists) satisfies (2.18) but does not satisfy the required condition at t₁ (in fact it does not exist at t = t₁). The solution to linear ordinary differential equation (2.17) taking on the value Q⁻¹ (assuming Q is invertible) at t = t₁ is given by

  H(t) = W(t, t₁) + Φ(t, t₁)Q⁻¹Φᵀ(t, t₁).

(Prove this; the first term is the zero state response, the second one the zero input response.) Thus

  Π(t, Q, t₁) = H(t)⁻¹ = (W(t, t₁) + Φ(t, t₁)Q⁻¹Φᵀ(t, t₁))⁻¹.

Exercise 2.8 Assume that Q > 0. Show that H(t) is invertible ∀t ≤ t₁ (even when W(t, t₁) is singular).

We now turn to the case when L(t) is not identically zero. First, we derive some properties of adjoint equations. Given the equation

  ẋ(t) = A(t)x(t),

its adjoint equation is given by

  ṗ(t) = −Aᵀ(t)p(t).

Denote by Φ_A(t, t₀) and Φ_{−Aᵀ}(t, t₀) the respective state transition matrices. First, letting f(t) = ⟨x(t), p(t)⟩, one has

  ḟ(t) = ⟨ẋ, p⟩ + ⟨x, ṗ⟩ = ⟨Ax, p⟩ − ⟨x, Aᵀp⟩ = 0

so that

  ⟨x(t), p(t)⟩ = ⟨x(t₀), p(t₀)⟩ ∀t.   (2.19)

It follows that Φ_{−Aᵀ}(t, t₀) = Φᵀ_A(t₀, t). (Note the change of order of the arguments.)

Corollary 2.1 (d/dt)Φ_A(t₀, t) = −Φ_A(t₀, t)A(t).

Exercise 2.9 Prove the corollary.
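The free end point recipe (2.13)-(2.15) is directly computable: integrate the DRE backward from K(t₁) = Q to obtain Π(t, Q, t₁), then apply the feedback (2.15). A minimal sketch with hypothetical data:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical data for the free end point problem (2.13).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
L = np.eye(2)                       # running state cost
Q = np.eye(2)                       # terminal cost
t0, t1, x0, n = 0.0, 3.0, np.array([1.0, 0.0]), 2

def dre(t, k):
    """Right-hand side of (2.11): Kdot = -A'K - KA + KBB'K - L."""
    K = k.reshape(n, n)
    return (-A.T @ K - K @ A + K @ B @ B.T @ K - L).ravel()

# Backward integration from K(t1) = Q; keep a dense interpolant Pi(t, Q, t1).
kt = solve_ivp(dre, (t1, t0), Q.ravel(), dense_output=True,
               rtol=1e-10, atol=1e-12)
Pi = lambda t: kt.sol(t).reshape(n, n)

# Closed loop (2.15): u0(t) = -B' Pi(t) x(t).
xt = solve_ivp(lambda t, x: (A - B @ B.T @ Pi(t)) @ x, (t0, t1), x0,
               rtol=1e-10, atol=1e-12)
print(x0 @ Pi(t0) @ x0)     # optimal value <x(t0), Pi(t0, Q, t1) x(t0)>
```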

We now turn back to the Riccati equation

  K̇(t) = −A(t)ᵀK(t) − K(t)A(t) + K(t)B(t)Bᵀ(t)K(t) − L(t),  K(t₁) = Q.   (2.20)

To analyze this equation, we consider the linear system

  [ẋ(t); ṗ(t)] = [A(t), −B(t)Bᵀ(t); −L(t), −Aᵀ(t)] [x(t); p(t)]   (2.21)

evolving in R²ⁿ. We will see that trajectories of (2.21) are directly related to optimal trajectories of our linear/quadratic optimal control problems. (Also see Exercise 2.6 above.) We first show that the Riccati equation (2.20) can be solved via system (2.21).

Theorem 2.1 Let Ψ(·, ·) be the 2n × 2n state transition matrix for system (2.21) and let

  [X(t); P(t)] = Ψ(t, t₁) [I; Q].

Then

  Π(t, Q, t₁) = P(t)X(t)⁻¹

solves (2.20) for t ∈ [τ, t₁], for any τ < t₁ such that X(t)⁻¹ exists on [τ, t₁].

Proof. Just plug in and use the fact that

  [Ẋ(t); Ṗ(t)] = [A(t), −B(t)Bᵀ(t); −L(t), −Aᵀ(t)] [X(t); P(t)].

Note that, since X(t₁) = I, by continuity of the state transition matrix, X(t) is invertible for t close enough to t₁, and thus the Riccati equation has a solution for t close to t₁. This solution is unique since, over bounded subsets of R^{n×n}, the right-hand side of the Riccati equation satisfies a Lipschitz condition. Since clearly the transpose of a solution to the Riccati equation is also a solution, this unique solution must be symmetric, as required in the path-independence lemma. Also, the only way the solution can disappear is if X(t) becomes singular. We now show that, when Π(t, Q, t₁) disappears, it must first grow without bound.

Lemma 2.2 Let t̂ = inf{τ : Π(t, Q, t₁) exists ∀t ∈ [τ, t₁]}. If t̂ > −∞, then Π(·, Q, t₁) is unbounded on (t̂, t₁].

Proof. Let

  ϕ(K, t) = −A(t)ᵀK − KA(t) + KB(t)Bᵀ(t)K − L(t),

so that (2.20) can be written

  K̇(t) = ϕ(K(t), t),  K(t₁) = Q.

ϕ is continuous and is Lipschitz continuous in its first argument on every bounded set. The claim is then an immediate consequence of a classical result on the solutions of ordinary differential equations: see, e.g., [11, Theorem 2.1], with D = (t̂, t₁) × R^{n(n+1)/2}.

Theorem 2.2 Suppose L(t) is positive semidefinite for all t and Q is positive semidefinite. Then Π(t, Q, t₁) exists ∀t ≤ t₁.

Proof. Again, let t̂ = inf{τ : Π(t, Q, t₁) exists ∀t ∈ [τ, t₁]}, so that Π(·, Q, t₁) exists on (t̂, t₁]. In view of Lemma 2.2, it is enough to prove that Π(·, Q, t₁) is bounded on (t̂, t₁]. Let τ ∈ (t̂, t₁]. For any x ∈ Rⁿ we have

  0 ≤ ⟨x, Π(τ, Q, t₁)x⟩ = min_{u ∈ U} [ ∫_{τ}^{t₁} ( ⟨u(t), u(t)⟩ + ⟨x(t), L(t)x(t)⟩ ) dt + ⟨x(t₁), Qx(t₁)⟩ ]

where x(t) satisfies (2.13) with initial condition x(τ) = x. Letting x̂(t) be the solution to (2.13) corresponding to u(t) = 0 ∀t and x̂(τ) = x, we can write

  0 ≤ ⟨x, Π(τ, Q, t₁)x⟩ ≤ ∫_{τ}^{t₁} ⟨x̂(t), L(t)x̂(t)⟩ dt + ⟨x̂(t₁), Qx̂(t₁)⟩ = ⟨x, F(τ)x⟩

with

  F(τ) = ∫_{τ}^{t₁} Φ(t, τ)ᵀL(t)Φ(t, τ) dt + Φ(t₁, τ)ᵀQΦ(t₁, τ),

a continuous function. Since this holds for all x ∈ Rⁿ and since Π(τ, Q, t₁) is symmetric, it follows that, using, e.g., the spectral norm,

  ‖Π(τ, Q, t₁)‖ ≤ ‖F(τ)‖.

Since F(·) is continuous on [t̂, t₁], it is bounded on [t̂, t₁], hence on (t̂, t₁]. The proof is complete.

Corollary 2.2 If L(t) is positive semidefinite for all t, then the DRE (2.11) has a solution for some symmetric final value K₁ (indeed, for any symmetric positive semidefinite K₁), and hence the fixed end point problem has a solution.

Exercise 2.10 Prove that, if A = Aᵀ and 0 ≤ ⟨x, Ax⟩ ≤ ⟨x, Fx⟩ for all x, then ‖A‖₂ ≤ ‖F‖₂.

Thus, when L(t) is positive semidefinite for all t and Q is positive semidefinite, our problem has a unique optimal control, given by (2.15). Finally, for all optimal control problems considered so far, the optimal control can be expressed in terms of the adjoint variable (or costate) p(·). More precisely, the following holds.

Theorem 2.3 Consider either the fixed end point problem, with x₁ = θ, or the free end point problem studied above. Suppose that the controllability Gramian W(t₀, t₁) is nonsingular (for the fixed end point case) and that the relevant Riccati equation has a (unique) solution K(t) on [t₀, t₁] with K(t₁) = K₁. Let x(t) be the optimal trajectory and define p(t) by

  p(t) = (Π(t, K₁, t₁) + W_{A−BBᵀΠ}(t, t₁)⁻¹)x(t)

for the fixed end-point problem and by

  p(t) = Π(t, Q, t₁)x(t)

for the free end-point problem, so that the optimal control is given by u₀(t) = −Bᵀ(t)p(t). Then

  [ẋ(t); ṗ(t)] = [A(t), −B(t)Bᵀ(t); −L(t), −Aᵀ(t)] [x(t); p(t)]

and the optimal cost is ⟨x(t₀), p(t₀)⟩.

Exercise 2.11 Prove the theorem.

Remark 2.4 Let H : Rⁿ × Rⁿ × Rᵐ × R → R be given by

  H(ξ, η, u, t) = ½⟨u, u⟩ + ½⟨ξ, L(t)ξ⟩ + ⟨η, A(t)ξ + B(t)u⟩,

and let ℋ : Rⁿ × Rⁿ × R → R be defined by

  ℋ(ξ, η, t) = min_{u ∈ Rᵐ} H(ξ, η, u, t).

Thus

  ℋ(ξ, η, t) = ½ ⟨ [ξ; η], [L(t), Aᵀ(t); A(t), −B(t)Bᵀ(t)] [ξ; η] ⟩ = ½⟨ξ, L(t)ξ⟩ + ⟨η, A(t)ξ⟩ − ½⟨η, B(t)Bᵀ(t)η⟩.

The gradient of ℋ with respect to the first 2n arguments is given by

  ∇ℋ(ξ, η, t) = [L(t), Aᵀ(t); A(t), −B(t)Bᵀ(t)] [ξ; η].

System (2.21) can then be equivalently written in the canonical form

  ẋ(t) = (∂ℋ/∂p)(x(t), p(t), t)
  ṗ(t) = −(∂ℋ/∂x)(x(t), p(t), t),

i.e.,

  ż(t) = J∇ℋ(x(t), p(t), t)   (2.22)

where

  z(t) = [x(t); p(t)]  and  J = [0, I; −I, 0].

(This is one instance of Pontryagin's maximum principle; see Chapter 5 for more details.) The function H is called the pseudo-Hamiltonian, and ℋ the Hamiltonian (or true Hamiltonian), for the problem under consideration.

Remark 2.5 Along trajectories of (2.21),

  (d/dt)ℋ(x(t), p(t), t) = ⟨∇ℋ(x(t), p(t), t), ż(t)⟩ + (∂ℋ/∂t)(x(t), p(t), t) = (∂ℋ/∂t)(x(t), p(t), t)

since

  ⟨∇ℋ(x(t), p(t), t), ż(t)⟩ = ⟨∇ℋ(x(t), p(t), t), J∇ℋ(x(t), p(t), t)⟩ = 0

(since Jᵀ = −J). In particular, if A, B and L do not depend on t, ℋ(x(t), p(t), t) is constant along trajectories of (2.21).

2.3 Infinite horizon LTI problem

We now turn our attention to the case of infinite horizon (t₁ = ∞). We also assume that A, B and L are constant. Assuming (as above) that L = Lᵀ ≥ 0, we write L = CᵀC, so that the problem can be written as

  minimize J(u) = ∫₀^∞ ( ⟨u(t), u(t)⟩ + ⟨y(t), y(t)⟩ ) dt
  subject to ẋ = Ax + Bu
             y = Cx
             x(0) = x₀.

Note that y is merely some linear image of x, and need not be a physical output. For example, it could be an error output.

We know that Π(−t, 0, 0) exists for all t ≥ 0 and is symmetric positive semidefinite. Intuition suggests that, since the time-to-go is infinite at any time t, and the system is time-invariant, the optimal feedback should not depend on t, i.e., that Π(−t, 0, 0) converges to some limit Π∞ as t → ∞, that its derivative converges to zero, and that Π∞ thus satisfies the algebraic Riccati equation

  AᵀΠ∞ + Π∞A − Π∞BBᵀΠ∞ = −CᵀC   (ARE)

We now show that, under certain natural assumptions, this intuition is essentially correct. For any u ∈ U, t ≥ 0, define

  J_t(u) = ∫₀^t ( ⟨u(τ), u(τ)⟩ + ⟨x(τ), CᵀCx(τ)⟩ ) dτ.

Let K be any constant symmetric matrix that satisfies (ARE), let u ∈ U, let x be the corresponding state trajectory, and let t ≥ 0. Note that, since K̇ = 0, K also satisfies

  K̇ = −AᵀK − KA + KBBᵀK − CᵀC   (DRE)

It follows that

  J_t(u) = ∫₀^t ‖u(τ) + BᵀKx(τ)‖² dτ − ⟨x(t), Kx(t)⟩ + ⟨x₀, Kx₀⟩.   (2.23)

Theorem 2.4 Let K = Kᵀ ≥ 0 satisfy (ARE) and let û ∈ U be determined by the feedback control law û = −BᵀKx. Then J(û) ≤ ⟨x₀, Kx₀⟩.

Proof. Since K ≥ 0, it follows from (2.23) that J_t(û) ≤ ⟨x₀, Kx₀⟩ ∀t ≥ 0. The claim follows by letting t → ∞.

Now note that, from Theorem 2.2, Π(−t, 0, 0) exists for all t ≥ 0 and is symmetric, and that, by time-invariance of (DRE), Π(0, 0, t) = Π(−t, 0, 0) ≥ 0 for all t ≥ 0. Clearly any stabilizing feedback law yields a finite value for J(u). It turns out that stabilizability is sufficient for the existence of an optimal control.

Theorem 2.5 Suppose (A, B) is stabilizable. Then, for some matrix Π∞ ≥ 0, Π(0, 0, t) → Π∞ as t → ∞, and Π∞ solves (ARE). Moreover

  J(u) ≥ ⟨x₀, Π∞x₀⟩  ∀u ∈ U.

Proof. Direct consequence of the following two lemmas.

Lemma 2.3 If (A, B) is stabilizable, Π(−t, 0, 0) remains bounded as t → ∞. Furthermore there exists a matrix Π∞ satisfying (ARE) such that Π(0, 0, t) → Π∞ as t → ∞.

Proof. Given x₀ ∈ Rⁿ we have, for t ≥ 0,

  ⟨x₀, Π(0, 0, t)x₀⟩ = min_{u ∈ U} J_t(u).

It follows that Π(0, 0, t) ≥ 0 for t ≥ 0. Next, let û(t) = Fx̂(t) be any stabilizing static state feedback and let x̂ be the corresponding solution of

  ẋ = (A + BF)x,  x(0) = x₀,

i.e., x̂(t) = e^{(A+BF)t}x₀. Then, since A + BF is stable,

  ⟨x₀, Π(0, 0, t)x₀⟩ ≤ ∫₀^∞ ( ⟨û(σ), û(σ)⟩ + ⟨x̂(σ), CᵀCx̂(σ)⟩ ) dσ = ⟨x₀, Mx₀⟩  for all t ≥ 0,

with

  M = ∫₀^∞ e^{(A+BF)ᵀt}(FᵀF + CᵀC)e^{(A+BF)t} dt.

Since Π(0, 0, t) is symmetric positive semidefinite, the first claim follows (prove it). Now, for all t ≥ 0, we have

  ⟨x₀, Π(0, 0, t)x₀⟩ = min_{u ∈ U} ∫₀^t ( ⟨u(σ), u(σ)⟩ + ⟨x(σ), CᵀCx(σ)⟩ ) dσ.   (2.24)

Nonnegative definiteness of CᵀC implies that ⟨x₀, Π(0, 0, t)x₀⟩ increases monotonically as t increases. Since it is bounded, it must converge, and its time derivative must go to zero, for any x₀.¹ Using the fact that Π(0, 0, t) is symmetric it is easily shown that it converges (prove it), i.e., for some symmetric matrix Π∞,

  lim_{t→∞} Π(0, 0, t) = Π∞,  or equivalently  lim_{t→∞} Π(−t, 0, 0) = Π∞,

and that its derivative goes to zero. Taking limits on both sides of (DRE) shows that Π∞ satisfies (ARE).

¹ The latter does not follow from mere convergence of ⟨x₀, Π(0, 0, t)x₀⟩: consider, e.g., the function (1/t) sin(t²).

For any t ≥ 0 we now define

  J_t(u) = ∫₀^t ( ⟨u(σ), u(σ)⟩ + ⟨Cx(σ), Cx(σ)⟩ ) dσ.

Lemma 2.4 Assume that (A, B) is stabilizable. Then, for any u ∈ U,

  J(u) ≥ ⟨x₀, Π∞x₀⟩.

Proof. Since CᵀC is positive semidefinite, for any t > 0,

  J(u) ≥ J_t(u) ≥ inf_{u ∈ U} J_t(u) = ⟨x₀, Π(0, 0, t)x₀⟩.

Letting t → ∞ and using Lemma 2.3 proves the claim.

Thus we have proved the following.

Theorem 2.6 Suppose (A, B) is stabilizable, and let Π∞ ≥ 0 be as in Theorem 2.5. Then Π∞ solves (ARE), the control law û = −BᵀΠ∞x is optimal, and J(û) = ⟨x₀, Π∞x₀⟩.

This solves the infinite horizon LTI problem. Additional insight is provided by the following results.

Theorem 2.7 If (C, A) is detectable and K ≥ 0 satisfies (ARE), then A − BBᵀK is stable.

Proof. (ARE) can be rewritten

  (A − BBᵀK)ᵀK + K(A − BBᵀK) = −KBBᵀK − CᵀC.   (2.25)

Proceed now by contradiction. Let λ, with Re λ ≥ 0, and v ≠ 0 be such that

  (A − BBᵀK)v = λv.   (2.26)

Multiplying (2.25) on the left by v* and on the right by v we get

  2(Re λ)(v*Kv) = −‖BᵀKv‖² − ‖Cv‖².

Since the left-hand side is nonnegative and the right-hand side nonpositive, both sides must vanish. Thus (i) Cv = 0 and (ii) BᵀKv = 0, which together with (2.26) implies that Av = λv. Since Re λ ≥ 0, this contradicts detectability of (C, A).

Corollary 2.3 If (A, B) is stabilizable and (C, A) is detectable, then the optimal control law û = −BᵀΠ∞x is stabilizing.
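Numerically, Π∞ can be obtained either as the limit of Π(−t, 0, 0) for large t or directly from (ARE). A minimal sketch comparing the two, with hypothetical data (SciPy's algebraic Riccati solver is used for the second route):

```python
import numpy as np
from scipy.linalg import solve_continuous_are
from scipy.integrate import solve_ivp

# Hypothetical LTI data; (A, B) stabilizable, (C, A) observable.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.eye(2)

# Route 1: Pi_infty directly from (ARE), A'X + XA - XBB'X + C'C = 0.
Pi_inf = solve_continuous_are(A, B, C.T @ C, np.eye(1))

# Route 2: integrate (DRE) backward from K(0) = 0 over a long horizon.
def dre(t, k):
    K = k.reshape(2, 2)
    return (-A.T @ K - K @ A + K @ B @ B.T @ K - C.T @ C).ravel()

K = solve_ivp(dre, (0.0, -30.0), np.zeros(4),
              rtol=1e-10, atol=1e-12).y[:, -1].reshape(2, 2)
print(np.max(np.abs(K - Pi_inf)))   # small: Pi(-t, 0, 0) -> Pi_infty
```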

Theorem 2.8 Suppose (A, B) is stabilizable. Then, (i) if (C, A) is observable, then Π∞ > 0; more generally, (ii) if Π∞x₀ = 0 (equivalently, ⟨x₀, Π∞x₀⟩ = 0) for some x₀, then x₀ belongs to the unobservable subspace.

Proof.

  J(û) = ⟨x₀, Π∞x₀⟩ = 0 = ∫₀^∞ ( ⟨û, û⟩ + ⟨Cx, Cx⟩ ) dt.

Thus û(t) = 0 for all t and Cx(t) = 0 for all t, so that x₀ is unobservable.

Computation of Π∞

Let

  H = [A, −BBᵀ; −L, −Aᵀ],

and let K₊ be a stabilizing solution to (ARE). Let

  T = [I, 0; K₊, I].  Then  T⁻¹ = [I, 0; −K₊, I].

Now (elementary block column and block row operations)

  T⁻¹HT = [I, 0; −K₊, I] [A, −BBᵀ; −L, −Aᵀ] [I, 0; K₊, I]
        = [I, 0; −K₊, I] [A − BBᵀK₊, −BBᵀ; −L − AᵀK₊, −Aᵀ]
        = [A − BBᵀK₊, −BBᵀ; 0, −(A − BBᵀK₊)ᵀ]

since K₊ is a solution to (ARE). It follows that

  σ(H) = σ(A − BBᵀK₊) ∪ σ(−(A − BBᵀK₊)),

where σ(·) denotes the spectrum (set of eigenvalues). Thus, if (A, B) is stabilizable and (C, A) detectable, H cannot have any imaginary eigenvalues: it must have n eigenvalues in C⁻ and n eigenvalues in C⁺. Furthermore the first n columns of T form a basis for the stable invariant subspace of H, i.e., for the span of all generalized eigenvectors of H associated with stable eigenvalues (see, e.g., Chapter 13 of [23] for more on this).

Now let [S₁₁; S₂₁] be any basis for the stable invariant subspace of H, i.e., let

  S = [S₁₁, S₁₂; S₂₁, S₂₂]

be any invertible matrix such that

  S⁻¹HS = [X, Z; 0, Y]

for some X, Y, Z such that σ(X) ⊂ C⁻ and σ(Y) ⊂ C⁺. (Note that σ(H) = σ(X) ∪ σ(Y).) Then it must hold that, for some nonsingular R,

  [S₁₁; S₂₁] = [I; K₊] R.

It follows that S₁₁ = R and S₂₁ = K₊R, thus

  K₊ = S₂₁S₁₁⁻¹.

We have thus proved the following.

Theorem 2.9 Suppose (A, B) is stabilizable and (C, A) is detectable, and let K₊ be a stabilizing solution of (ARE). Then K₊ = S₂₁S₁₁⁻¹, where [S₁₁; S₂₁] is any basis for the stable invariant subspace of H. In particular, S₂₁S₁₁⁻¹ is symmetric and there is exactly one stabilizing solution to (ARE).

Corollary 2.4 Suppose again that (A, B) is stabilizable and that (C, A) is detectable. Then there is a unique positive semidefinite solution to (ARE). Moreover, if (C, A) is observable, that solution is positive definite.

Proof. The first statement follows directly from Theorem 2.7 and the last assertion in Theorem 2.9. The second statement follows from Corollary 2.3 and Theorem 2.8.

Exercise 2.12 Given J = [0, I; −I, 0], any real matrix H that satisfies J⁻¹HJ = −Hᵀ is said to be Hamiltonian. Show that if H is Hamiltonian and λ is an eigenvalue of H, then −λ also is.

2.4 More general optimal control problems

We have shown how to solve optimal control problems where the dynamics are linear, the objective function is quadratic, and the constraints are of a very simple type (fixed initial point, fixed final point). In most problems of practical interest, though, one or more of the following features is present:

(i) nonlinear dynamics and objective function;
(ii) constraints on the control or state trajectories, e.g., u(t) ∈ U ∀t, where U ⊂ Rᵐ;
(iii) more general constraints on the initial and final state, e.g., g(x(t₁)) ≤ 0.

To tackle such more general optimization problems, we will make use of additional mathematical machinery. We first proceed to develop such machinery.
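Before leaving linear-quadratic problems, here is a computational sketch of the "Computation of Π∞" recipe above (Theorem 2.9): an ordered real Schur decomposition of H yields a basis [S₁₁; S₂₁] of its stable invariant subspace, from which K₊ = S₂₁S₁₁⁻¹. Data are hypothetical, and the result is cross-checked against SciPy's ARE solver.

```python
import numpy as np
from scipy.linalg import schur, solve_continuous_are

# Hypothetical data; L = C'C with C = I.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
L = np.eye(2)
n = 2

# Hamiltonian matrix of (2.21).
H = np.block([[A, -B @ B.T], [-L, -A.T]])

# Ordered real Schur form: 'lhp' places the n stable eigenvalues first, so
# the leading n columns of U span the stable invariant subspace of H.
_, U, sdim = schur(H, sort='lhp')
S11, S21 = U[:n, :n], U[n:, :n]
K_plus = S21 @ np.linalg.inv(S11)       # Theorem 2.9: K+ = S21 S11^{-1}

# Cross-check against the ARE solver; the difference should be tiny.
print(np.max(np.abs(K_plus - solve_continuous_are(A, B, L, np.eye(1)))))
```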


Chapter 3

Unconstrained Optimization

References: [15, 19].

3.1 First order condition of optimality

We consider the problem

  min {f(x) : x ∈ X}   (3.1)

where, as assumed throughout, X is a normed vector space and f : X → R is continuously (Fréchet) differentiable (see Appendix B). (In fact, many of the results we obtain below hold under the milder assumption that f is merely Fréchet differentiable at the local minimizer of interest.)

Remark 3.1 This problem is technically very similar to the problem

  min {f(x) : x ∈ Ω}   (3.2)

where Ω is an open set of X, as shown in the following exercise.

Exercise 3.1 Prove carefully, using the definitions given earlier, that x̂ is a local minimizer for (3.2) if and only if x̂ is a local minimizer for (3.1) and x̂ ∈ Ω.

We now give a first order necessary condition for optimality.

Theorem 3.1 Suppose x̂ is a local minimizer for (3.1). Then (∂f/∂x)(x̂) = 0.

Proof. Since x̂ is a local minimizer for (3.1), there exists ε > 0 such that

  f(x̂ + h) ≥ f(x̂)  ∀h ∈ B(0, ε).   (3.3)

Since f is Fréchet-differentiable, we have, for all h ∈ X,

  f(x̂ + h) = f(x̂) + (∂f/∂x)(x̂)h + o(h)   (3.4)

with o(h)/‖h‖ → 0 as h → 0.

Hence, from (3.3),

  (∂f/∂x)(x̂)h + o(h) ≥ 0  whenever h ∈ B(0, ε)   (3.5)

or, equivalently,

  (∂f/∂x)(x̂)(αh) + o(αh) ≥ 0  ∀h ∈ B(0, ε), ∀α ∈ [0, 1]   (3.6)

and, dividing by α, for α ≠ 0,

  (∂f/∂x)(x̂)h + o(αh)/α ≥ 0  ∀h ∈ B(0, ε), ∀α ∈ (0, 1].   (3.7)

It is easy to show (see the exercise below) that o(αh)/α → 0 as α → 0. Hence, letting α → 0 in (3.7), we get

  (∂f/∂x)(x̂)h ≥ 0  ∀h ∈ B(0, ε).

Since h ∈ B(0, ε) implies −h ∈ B(0, ε), we have (∂f/∂x)(x̂)(−h) ≥ 0, thus

  (∂f/∂x)(x̂)h ≤ 0  ∀h ∈ B(0, ε).

Hence

  (∂f/∂x)(x̂)h = 0  ∀h ∈ B(0, ε),

which implies (since (∂f/∂x)(x̂) is linear)

  (∂f/∂x)(x̂)h = 0  ∀h ∈ X,

i.e., (∂f/∂x)(x̂) = θ.

Exercise 3.2 If o(·) is such that o(h)/‖h‖ → 0 as h → 0, then, for any fixed h, o(αh)/α → 0 as α → 0, α ∈ R.

Remark 3.2 The optimality condition above, like several other conditions derived in this course, is only a necessary condition, i.e., a point x satisfying this condition need not be optimal, even locally. However, if there is an optimal x, it has to be among those which satisfy the optimality condition. Also it is clear that this optimality condition is also necessary for a global minimizer (since a global minimizer is also a local minimizer). Hence, if a global minimizer is known to exist, it must be, among the points satisfying the optimality condition, the one with minimum value of f.

Exercise 3.3 Check that, if f is merely Gâteaux differentiable, the above theorem is still valid. (Further, note that continuous differentiability is not required.)

Suppose now that (∂f/∂x)(x̂) ≠ θ (hence x̂ is not a local minimizer). If h is such that (∂f/∂x)(x̂)h < 0 (such h exists; why?) then, for α > 0 small enough, we have

  α [ (∂f/∂x)(x̂)h + o(αh)/α ] < 0

and, hence, for some α₀ > 0,

  f(x̂ + αh) < f(x̂)  ∀α ∈ (0, α₀].

Such h is called a descent direction for f at x̂. The concept of descent direction is essential to numerical methods. If X is a Hilbert space, then

  (∂f/∂x)(x̂)h = ⟨gradf(x̂), h⟩

and a particular descent direction is h = −gradf(x̂). (This is so irrespective of which inner product (and associated gradient) is used. We will return to this point when studying Newton's method and variable metric methods.)

3.2 Steepest descent method

Suppose X = H, a Hilbert space. In view of what we just said, a natural algorithm for attempting to solve (3.1) would be the following.

Algorithm 1 (steepest descent)

  Data: x₀ ∈ H
  i := 0
  while gradf(xᵢ) ≠ θ do {
    pick λᵢ ∈ arg min_λ {f(xᵢ − λ gradf(xᵢ)) : λ ≥ 0}
      (if there is no such minimizer the algorithm fails)
    xᵢ₊₁ := xᵢ − λᵢ gradf(xᵢ)
    i := i + 1
  }
  stop

Notation: Given a real-valued function φ, the (possibly empty) set of global minimizers for the problem

  minimize φ(x)  s.t. x ∈ S

is denoted by arg min_x {φ(x) : x ∈ S}.

Exercise 3.4 Let x be such that gradf(x) ≠ θ (i.e., (∂f/∂x)(x) ≠ θ). Show that ĥ := −gradf(x)/⟨gradf(x), gradf(x)⟩^{1/2} (unit vector along −gradf(x)) is indeed the unique direction of (local) steepest descent.
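A minimal finite-dimensional implementation of Algorithm 1 (X = Rⁿ with the Euclidean inner product, so gradf is the ordinary gradient) is sketched below; the exact minimization over λ is replaced by a bounded scalar line search, and the test problem is a hypothetical positive definite quadratic:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x0, tol=1e-8, max_iter=500):
    """Algorithm 1: repeatedly minimize f along -grad f(x_i)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:            # grad f(x_i) ~ theta: stop
            break
        lam = minimize_scalar(lambda t: f(x - t * g),
                              bounds=(0.0, 1e3), method='bounded').x
        x = x - lam * g                         # x_{i+1} = x_i - lam_i grad f(x_i)
    return x

# Hypothetical test problem: f(x) = 1/2 <x, Px> with P > 0, minimizer theta.
P = np.array([[3.0, 1.0], [1.0, 2.0]])
f = lambda x: 0.5 * x @ P @ x
grad = lambda x: P @ x
print(steepest_descent(f, grad, [1.0, -1.0]))   # ~ [0, 0]
```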


Lecture 4. Chapter 4: Lyapunov Stability. Eugenio Schuster. Mechanical Engineering and Mechanics Lehigh University.

Lecture 4. Chapter 4: Lyapunov Stability. Eugenio Schuster. Mechanical Engineering and Mechanics Lehigh University. Lecture 4 Chapter 4: Lyapunov Stability Eugenio Schuster schuster@lehigh.edu Mechanical Engineering and Mechanics Lehigh University Lecture 4 p. 1/86 Autonomous Systems Consider the autonomous system ẋ

More information

Lecture Notes of EE 714

Lecture Notes of EE 714 Lecture Notes of EE 714 Lecture 1 Motivation Systems theory that we have studied so far deals with the notion of specified input and output spaces. But there are systems which do not have a clear demarcation

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

1. The Transition Matrix (Hint: Recall that the solution to the linear equation ẋ = Ax + Bu is

1. The Transition Matrix (Hint: Recall that the solution to the linear equation ẋ = Ax + Bu is ECE 55, Fall 2007 Problem Set #4 Solution The Transition Matrix (Hint: Recall that the solution to the linear equation ẋ Ax + Bu is x(t) e A(t ) x( ) + e A(t τ) Bu(τ)dτ () This formula is extremely important

More information

6. Linear Quadratic Regulator Control

6. Linear Quadratic Regulator Control EE635 - Control System Theory 6. Linear Quadratic Regulator Control Jitkomut Songsiri algebraic Riccati Equation (ARE) infinite-time LQR (continuous) Hamiltonian matrix gain margin of LQR 6-1 Algebraic

More information

Module 03 Linear Systems Theory: Necessary Background

Module 03 Linear Systems Theory: Necessary Background Module 03 Linear Systems Theory: Necessary Background Ahmad F. Taha EE 5243: Introduction to Cyber-Physical Systems Email: ahmad.taha@utsa.edu Webpage: http://engineering.utsa.edu/ taha/index.html September

More information

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces. Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,

More information

CHAPTER V DUAL SPACES

CHAPTER V DUAL SPACES CHAPTER V DUAL SPACES DEFINITION Let (X, T ) be a (real) locally convex topological vector space. By the dual space X, or (X, T ), of X we mean the set of all continuous linear functionals on X. By the

More information

Chap. 3. Controlled Systems, Controllability

Chap. 3. Controlled Systems, Controllability Chap. 3. Controlled Systems, Controllability 1. Controllability of Linear Systems 1.1. Kalman s Criterion Consider the linear system ẋ = Ax + Bu where x R n : state vector and u R m : input vector. A :

More information

Controllability, Observability, Full State Feedback, Observer Based Control

Controllability, Observability, Full State Feedback, Observer Based Control Multivariable Control Lecture 4 Controllability, Observability, Full State Feedback, Observer Based Control John T. Wen September 13, 24 Ref: 3.2-3.4 of Text Controllability ẋ = Ax + Bu; x() = x. At time

More information

Math Ordinary Differential Equations

Math Ordinary Differential Equations Math 411 - Ordinary Differential Equations Review Notes - 1 1 - Basic Theory A first order ordinary differential equation has the form x = f(t, x) (11) Here x = dx/dt Given an initial data x(t 0 ) = x

More information

LINEAR-QUADRATIC OPTIMAL CONTROL OF DIFFERENTIAL-ALGEBRAIC SYSTEMS: THE INFINITE TIME HORIZON PROBLEM WITH ZERO TERMINAL STATE

LINEAR-QUADRATIC OPTIMAL CONTROL OF DIFFERENTIAL-ALGEBRAIC SYSTEMS: THE INFINITE TIME HORIZON PROBLEM WITH ZERO TERMINAL STATE LINEAR-QUADRATIC OPTIMAL CONTROL OF DIFFERENTIAL-ALGEBRAIC SYSTEMS: THE INFINITE TIME HORIZON PROBLEM WITH ZERO TERMINAL STATE TIMO REIS AND MATTHIAS VOIGT, Abstract. In this work we revisit the linear-quadratic

More information

An introduction to Mathematical Theory of Control

An introduction to Mathematical Theory of Control An introduction to Mathematical Theory of Control Vasile Staicu University of Aveiro UNICA, May 2018 Vasile Staicu (University of Aveiro) An introduction to Mathematical Theory of Control UNICA, May 2018

More information

Zeros and zero dynamics

Zeros and zero dynamics CHAPTER 4 Zeros and zero dynamics 41 Zero dynamics for SISO systems Consider a linear system defined by a strictly proper scalar transfer function that does not have any common zero and pole: g(s) =α p(s)

More information

Partial Differential Equations

Partial Differential Equations Part II Partial Differential Equations Year 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2015 Paper 4, Section II 29E Partial Differential Equations 72 (a) Show that the Cauchy problem for u(x,

More information

AC&ST AUTOMATIC CONTROL AND SYSTEM THEORY SYSTEMS AND MODELS. Claudio Melchiorri

AC&ST AUTOMATIC CONTROL AND SYSTEM THEORY SYSTEMS AND MODELS. Claudio Melchiorri C. Melchiorri (DEI) Automatic Control & System Theory 1 AUTOMATIC CONTROL AND SYSTEM THEORY SYSTEMS AND MODELS Claudio Melchiorri Dipartimento di Ingegneria dell Energia Elettrica e dell Informazione (DEI)

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Model reduction for linear systems by balancing

Model reduction for linear systems by balancing Model reduction for linear systems by balancing Bart Besselink Jan C. Willems Center for Systems and Control Johann Bernoulli Institute for Mathematics and Computer Science University of Groningen, Groningen,

More information

A brief introduction to ordinary differential equations

A brief introduction to ordinary differential equations Chapter 1 A brief introduction to ordinary differential equations 1.1 Introduction An ordinary differential equation (ode) is an equation that relates a function of one variable, y(t), with its derivative(s)

More information

minimize x subject to (x 2)(x 4) u,

minimize x subject to (x 2)(x 4) u, Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for

More information

ME Fall 2001, Fall 2002, Spring I/O Stability. Preliminaries: Vector and function norms

ME Fall 2001, Fall 2002, Spring I/O Stability. Preliminaries: Vector and function norms I/O Stability Preliminaries: Vector and function norms 1. Sup norms are used for vectors for simplicity: x = max i x i. Other norms are also okay 2. Induced matrix norms: let A R n n, (i stands for induced)

More information

2. Review of Linear Algebra

2. Review of Linear Algebra 2. Review of Linear Algebra ECE 83, Spring 217 In this course we will represent signals as vectors and operators (e.g., filters, transforms, etc) as matrices. This lecture reviews basic concepts from linear

More information

Iterative Solution of a Matrix Riccati Equation Arising in Stochastic Control

Iterative Solution of a Matrix Riccati Equation Arising in Stochastic Control Iterative Solution of a Matrix Riccati Equation Arising in Stochastic Control Chun-Hua Guo Dedicated to Peter Lancaster on the occasion of his 70th birthday We consider iterative methods for finding the

More information

ECEEN 5448 Fall 2011 Homework #5 Solutions

ECEEN 5448 Fall 2011 Homework #5 Solutions ECEEN 5448 Fall 211 Homework #5 Solutions Professor David G. Meyer December 8, 211 1. Consider the 1-dimensional time-varying linear system ẋ t (u x) (a) Find the state-transition matrix, Φ(t, τ). Here

More information

1 Continuous-time Systems

1 Continuous-time Systems Observability Completely controllable systems can be restructured by means of state feedback to have many desirable properties. But what if the state is not available for feedback? What if only the output

More information

Generalized Riccati Equations Arising in Stochastic Games

Generalized Riccati Equations Arising in Stochastic Games Generalized Riccati Equations Arising in Stochastic Games Michael McAsey a a Department of Mathematics Bradley University Peoria IL 61625 USA mcasey@bradley.edu Libin Mou b b Department of Mathematics

More information

Reflected Brownian Motion

Reflected Brownian Motion Chapter 6 Reflected Brownian Motion Often we encounter Diffusions in regions with boundary. If the process can reach the boundary from the interior in finite time with positive probability we need to decide

More information

Chapter 7. Extremal Problems. 7.1 Extrema and Local Extrema

Chapter 7. Extremal Problems. 7.1 Extrema and Local Extrema Chapter 7 Extremal Problems No matter in theoretical context or in applications many problems can be formulated as problems of finding the maximum or minimum of a function. Whenever this is the case, advanced

More information

EE363 homework 8 solutions

EE363 homework 8 solutions EE363 Prof. S. Boyd EE363 homework 8 solutions 1. Lyapunov condition for passivity. The system described by ẋ = f(x, u), y = g(x), x() =, with u(t), y(t) R m, is said to be passive if t u(τ) T y(τ) dτ

More information

LINEAR-CONVEX CONTROL AND DUALITY

LINEAR-CONVEX CONTROL AND DUALITY 1 LINEAR-CONVEX CONTROL AND DUALITY R.T. Rockafellar Department of Mathematics, University of Washington Seattle, WA 98195-4350, USA Email: rtr@math.washington.edu R. Goebel 3518 NE 42 St., Seattle, WA

More information

Algebra II. Paulius Drungilas and Jonas Jankauskas

Algebra II. Paulius Drungilas and Jonas Jankauskas Algebra II Paulius Drungilas and Jonas Jankauskas Contents 1. Quadratic forms 3 What is quadratic form? 3 Change of variables. 3 Equivalence of quadratic forms. 4 Canonical form. 4 Normal form. 7 Positive

More information

Introduction to Nonlinear Control Lecture # 4 Passivity

Introduction to Nonlinear Control Lecture # 4 Passivity p. 1/6 Introduction to Nonlinear Control Lecture # 4 Passivity È p. 2/6 Memoryless Functions ¹ y È Ý Ù È È È È u (b) µ power inflow = uy Resistor is passive if uy 0 p. 3/6 y y y u u u (a) (b) (c) Passive

More information

Optimal Control. Macroeconomics II SMU. Ömer Özak (SMU) Economic Growth Macroeconomics II 1 / 112

Optimal Control. Macroeconomics II SMU. Ömer Özak (SMU) Economic Growth Macroeconomics II 1 / 112 Optimal Control Ömer Özak SMU Macroeconomics II Ömer Özak (SMU) Economic Growth Macroeconomics II 1 / 112 Review of the Theory of Optimal Control Section 1 Review of the Theory of Optimal Control Ömer

More information

LINEAR ALGEBRA BOOT CAMP WEEK 4: THE SPECTRAL THEOREM

LINEAR ALGEBRA BOOT CAMP WEEK 4: THE SPECTRAL THEOREM LINEAR ALGEBRA BOOT CAMP WEEK 4: THE SPECTRAL THEOREM Unless otherwise stated, all vector spaces in this worksheet are finite dimensional and the scalar field F is R or C. Definition 1. A linear operator

More information

The following definition is fundamental.

The following definition is fundamental. 1. Some Basics from Linear Algebra With these notes, I will try and clarify certain topics that I only quickly mention in class. First and foremost, I will assume that you are familiar with many basic

More information

ẋ n = f n (x 1,...,x n,u 1,...,u m ) (5) y 1 = g 1 (x 1,...,x n,u 1,...,u m ) (6) y p = g p (x 1,...,x n,u 1,...,u m ) (7)

ẋ n = f n (x 1,...,x n,u 1,...,u m ) (5) y 1 = g 1 (x 1,...,x n,u 1,...,u m ) (6) y p = g p (x 1,...,x n,u 1,...,u m ) (7) EEE582 Topical Outline A.A. Rodriguez Fall 2007 GWC 352, 965-3712 The following represents a detailed topical outline of the course. It attempts to highlight most of the key concepts to be covered and

More information

The norms can also be characterized in terms of Riccati inequalities.

The norms can also be characterized in terms of Riccati inequalities. 9 Analysis of stability and H norms Consider the causal, linear, time-invariant system ẋ(t = Ax(t + Bu(t y(t = Cx(t Denote the transfer function G(s := C (si A 1 B. Theorem 85 The following statements

More information

Linear Quadratic Zero-Sum Two-Person Differential Games

Linear Quadratic Zero-Sum Two-Person Differential Games Linear Quadratic Zero-Sum Two-Person Differential Games Pierre Bernhard To cite this version: Pierre Bernhard. Linear Quadratic Zero-Sum Two-Person Differential Games. Encyclopaedia of Systems and Control,

More information

Problem 2 (Gaussian Elimination, Fundamental Spaces, Least Squares, Minimum Norm) Consider the following linear algebraic system of equations:

Problem 2 (Gaussian Elimination, Fundamental Spaces, Least Squares, Minimum Norm) Consider the following linear algebraic system of equations: EEE58 Exam, Fall 6 AA Rodriguez Rules: Closed notes/books, No calculators permitted, open minds GWC 35, 965-37 Problem (Dynamic Augmentation: State Space Representation) Consider a dynamical system consisting

More information

NORMS ON SPACE OF MATRICES

NORMS ON SPACE OF MATRICES NORMS ON SPACE OF MATRICES. Operator Norms on Space of linear maps Let A be an n n real matrix and x 0 be a vector in R n. We would like to use the Picard iteration method to solve for the following system

More information

Uniformly Uniformly-ergodic Markov chains and BSDEs

Uniformly Uniformly-ergodic Markov chains and BSDEs Uniformly Uniformly-ergodic Markov chains and BSDEs Samuel N. Cohen Mathematical Institute, University of Oxford (Based on joint work with Ying Hu, Robert Elliott, Lukas Szpruch) Centre Henri Lebesgue,

More information

Nonlinear Optimization

Nonlinear Optimization Nonlinear Optimization (Com S 477/577 Notes) Yan-Bin Jia Nov 7, 2017 1 Introduction Given a single function f that depends on one or more independent variable, we want to find the values of those variables

More information

Problem Set 5 Solutions 1

Problem Set 5 Solutions 1 Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.245: MULTIVARIABLE CONTROL SYSTEMS by A. Megretski Problem Set 5 Solutions The problem set deals with Hankel

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science : Dynamic Systems Spring 2011

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science : Dynamic Systems Spring 2011 MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.4: Dynamic Systems Spring Homework Solutions Exercise 3. a) We are given the single input LTI system: [

More information

Criterions on periodic feedback stabilization for some evolution equations

Criterions on periodic feedback stabilization for some evolution equations Criterions on periodic feedback stabilization for some evolution equations School of Mathematics and Statistics, Wuhan University, P. R. China (Joint work with Yashan Xu, Fudan University) Toulouse, June,

More information

On Controllability of Linear Systems 1

On Controllability of Linear Systems 1 On Controllability of Linear Systems 1 M.T.Nair Department of Mathematics, IIT Madras Abstract In this article we discuss some issues related to the observability and controllability of linear systems.

More information

Nonlinear Control Systems

Nonlinear Control Systems Nonlinear Control Systems António Pedro Aguiar pedro@isr.ist.utl.pt 5. Input-Output Stability DEEC PhD Course http://users.isr.ist.utl.pt/%7epedro/ncs2012/ 2012 1 Input-Output Stability y = Hu H denotes

More information

Control, Stabilization and Numerics for Partial Differential Equations

Control, Stabilization and Numerics for Partial Differential Equations Paris-Sud, Orsay, December 06 Control, Stabilization and Numerics for Partial Differential Equations Enrique Zuazua Universidad Autónoma 28049 Madrid, Spain enrique.zuazua@uam.es http://www.uam.es/enrique.zuazua

More information

Modeling and Analysis of Dynamic Systems

Modeling and Analysis of Dynamic Systems Modeling and Analysis of Dynamic Systems Dr. Guillaume Ducard Fall 2017 Institute for Dynamic Systems and Control ETH Zurich, Switzerland G. Ducard c 1 / 57 Outline 1 Lecture 13: Linear System - Stability

More information

CHAPTER 3 THE MAXIMUM PRINCIPLE: MIXED INEQUALITY CONSTRAINTS. p. 1/73

CHAPTER 3 THE MAXIMUM PRINCIPLE: MIXED INEQUALITY CONSTRAINTS. p. 1/73 CHAPTER 3 THE MAXIMUM PRINCIPLE: MIXED INEQUALITY CONSTRAINTS p. 1/73 THE MAXIMUM PRINCIPLE: MIXED INEQUALITY CONSTRAINTS Mixed Inequality Constraints: Inequality constraints involving control and possibly

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

Convex Functions and Optimization

Convex Functions and Optimization Chapter 5 Convex Functions and Optimization 5.1 Convex Functions Our next topic is that of convex functions. Again, we will concentrate on the context of a map f : R n R although the situation can be generalized

More information

EE363 homework 2 solutions

EE363 homework 2 solutions EE363 Prof. S. Boyd EE363 homework 2 solutions. Derivative of matrix inverse. Suppose that X : R R n n, and that X(t is invertible. Show that ( d d dt X(t = X(t dt X(t X(t. Hint: differentiate X(tX(t =

More information

Steady State Kalman Filter

Steady State Kalman Filter Steady State Kalman Filter Infinite Horizon LQ Control: ẋ = Ax + Bu R positive definite, Q = Q T 2Q 1 2. (A, B) stabilizable, (A, Q 1 2) detectable. Solve for the positive (semi-) definite P in the ARE:

More information

Theory of Ordinary Differential Equations

Theory of Ordinary Differential Equations Theory of Ordinary Differential Equations Existence, Uniqueness and Stability Jishan Hu and Wei-Ping Li Department of Mathematics The Hong Kong University of Science and Technology ii Copyright c 24 by

More information

A Globally Stabilizing Receding Horizon Controller for Neutrally Stable Linear Systems with Input Constraints 1

A Globally Stabilizing Receding Horizon Controller for Neutrally Stable Linear Systems with Input Constraints 1 A Globally Stabilizing Receding Horizon Controller for Neutrally Stable Linear Systems with Input Constraints 1 Ali Jadbabaie, Claudio De Persis, and Tae-Woong Yoon 2 Department of Electrical Engineering

More information

Lyapunov Stability Theory

Lyapunov Stability Theory Lyapunov Stability Theory Peter Al Hokayem and Eduardo Gallestey March 16, 2015 1 Introduction In this lecture we consider the stability of equilibrium points of autonomous nonlinear systems, both in continuous

More information

Applied Analysis (APPM 5440): Final exam 1:30pm 4:00pm, Dec. 14, Closed books.

Applied Analysis (APPM 5440): Final exam 1:30pm 4:00pm, Dec. 14, Closed books. Applied Analysis APPM 44: Final exam 1:3pm 4:pm, Dec. 14, 29. Closed books. Problem 1: 2p Set I = [, 1]. Prove that there is a continuous function u on I such that 1 ux 1 x sin ut 2 dt = cosx, x I. Define

More information

1 Controllability and Observability

1 Controllability and Observability 1 Controllability and Observability 1.1 Linear Time-Invariant (LTI) Systems State-space: Dimensions: Notation Transfer function: ẋ = Ax+Bu, x() = x, y = Cx+Du. x R n, u R m, y R p. Note that H(s) is always

More information

L2 gains and system approximation quality 1

L2 gains and system approximation quality 1 Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.242, Fall 24: MODEL REDUCTION L2 gains and system approximation quality 1 This lecture discusses the utility

More information

A Concise Course on Stochastic Partial Differential Equations

A Concise Course on Stochastic Partial Differential Equations A Concise Course on Stochastic Partial Differential Equations Michael Röckner Reference: C. Prevot, M. Röckner: Springer LN in Math. 1905, Berlin (2007) And see the references therein for the original

More information

Newtonian Mechanics. Chapter Classical space-time

Newtonian Mechanics. Chapter Classical space-time Chapter 1 Newtonian Mechanics In these notes classical mechanics will be viewed as a mathematical model for the description of physical systems consisting of a certain (generally finite) number of particles

More information

Nonlinear Control. Nonlinear Control Lecture # 6 Passivity and Input-Output Stability

Nonlinear Control. Nonlinear Control Lecture # 6 Passivity and Input-Output Stability Nonlinear Control Lecture # 6 Passivity and Input-Output Stability Passivity: Memoryless Functions y y y u u u (a) (b) (c) Passive Passive Not passive y = h(t,u), h [0, ] Vector case: y = h(t,u), h T =

More information

Mathematics 530. Practice Problems. n + 1 }

Mathematics 530. Practice Problems. n + 1 } Department of Mathematical Sciences University of Delaware Prof. T. Angell October 19, 2015 Mathematics 530 Practice Problems 1. Recall that an indifference relation on a partially ordered set is defined

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

Observer design for a general class of triangular systems

Observer design for a general class of triangular systems 1st International Symposium on Mathematical Theory of Networks and Systems July 7-11, 014. Observer design for a general class of triangular systems Dimitris Boskos 1 John Tsinias Abstract The paper deals

More information

Geometric Optimal Control with Applications

Geometric Optimal Control with Applications Geometric Optimal Control with Applications Accelerated Graduate Course Institute of Mathematics for Industry, Kyushu University, Bernard Bonnard Inria Sophia Antipolis et Institut de Mathématiques de

More information

NOTES ON LINEAR ODES

NOTES ON LINEAR ODES NOTES ON LINEAR ODES JONATHAN LUK We can now use all the discussions we had on linear algebra to study linear ODEs Most of this material appears in the textbook in 21, 22, 23, 26 As always, this is a preliminary

More information