EE363, Prof. S. Boyd

EE363 homework 2 solutions

1. Derivative of matrix inverse. Suppose that $X : \mathbf{R} \to \mathbf{R}^{n \times n}$, and that $X(t)$ is invertible. Show that
$$\frac{d}{dt} X(t)^{-1} = -X(t)^{-1} \left( \frac{d}{dt} X(t) \right) X(t)^{-1}.$$
Hint: differentiate $X(t)X(t)^{-1} = I$ with respect to $t$.

Solution: Differentiating $X(t)X(t)^{-1} = I$ with respect to $t$, we get
$$0 = \frac{d}{dt} I = \frac{d}{dt} \left( X(t)X(t)^{-1} \right) = X(t) \left( \frac{d}{dt} X(t)^{-1} \right) + \left( \frac{d}{dt} X(t) \right) X(t)^{-1}.$$
We conclude that
$$X(t) \left( \frac{d}{dt} X(t)^{-1} \right) = -\left( \frac{d}{dt} X(t) \right) X(t)^{-1},$$
and so
$$\frac{d}{dt} X(t)^{-1} = -X(t)^{-1} \left( \frac{d}{dt} X(t) \right) X(t)^{-1}.$$

2. Infinite horizon LQR for a periodic system. Consider the system $x_{t+1} = A_t x_t + B_t u_t$, where
$$A_t = \begin{cases} A_e & t \text{ even} \\ A_o & t \text{ odd} \end{cases} \qquad B_t = \begin{cases} B_e & t \text{ even} \\ B_o & t \text{ odd.} \end{cases}$$
In other words, $A$ and $B$ are periodic with period 2. We consider the infinite horizon LQR problem for this time-varying system, with cost
$$J = \sum_{\tau=0}^{\infty} \left( x_\tau^T Q x_\tau + u_\tau^T R u_\tau \right).$$
In this problem you will use dynamic programming to find the optimal control for this system. You can assume that the value function is finite.

(a) Conjecture a reasonable form for the value function. You do not have to show that your form is correct.

(b) Derive the Hamilton-Jacobi equation, using your assumed form. Hint: you should get a pair of coupled nonlinear matrix equations.
(c) Suggest a simple iterative method for solving the Hamilton-Jacobi equation. You do not have to prove that the iterative method converges, but do check your method on a few numerical examples.

(d) Show that the Hamilton-Jacobi equation can be solved by solving a single (bigger) algebraic Riccati equation. How is the optimal $u$ related to the solution of this equation?

Remark: The results of this problem generalize to general periodic systems.

Solution:

(a) In the time-invariant infinite horizon LQR problem, the value function $V(z)$ does not depend on time. This is because
$$V_{t_1}(z) = \min_{u(t_1),\ldots} \sum_{\tau=t_1}^{\infty} \left( x(\tau)^T Q x(\tau) + u(\tau)^T R u(\tau) \right) = \min_{u(t_2),\ldots} \sum_{\tau=t_2}^{\infty} \left( x(\tau)^T Q x(\tau) + u(\tau)^T R u(\tau) \right) = V_{t_2}(z)$$
as long as $z = x_{t_1} = x_{t_2}$. A similar statement can be made for the periodic case. That is, $V_0(z) = V_{2k}(z)$ for $k = 1, 2, \ldots$ as long as $z = x_0 = x_{2k}$. Similarly $V_1(z) = V_{2k+1}(z)$ as long as $z = x_1 = x_{2k+1}$. Therefore, we can define our value function to be $V_o(z)$ when $t$ is odd and $V_e(z)$ when $t$ is even.

In the general time-varying finite horizon case it can be shown that the value function is quadratic. Since this holds for any horizon $N$, it is reasonable to suggest (and indeed is true) that the value function remains quadratic as $N \to \infty$. A reasonable form for the value function is therefore $V_t(z) = V_o(z) = z^T P_o z$ when $t$ is odd and $V_t(z) = V_e(z) = z^T P_e z$ when $t$ is even.

(b) For $t$ even we get
$$\begin{aligned}
V_e(z) &= z^T Q z + \min_w \left( w^T R w + V_o(A_e z + B_e w) \right) \\
&= z^T Q z + \min_w \left( w^T R w + (A_e z + B_e w)^T P_o (A_e z + B_e w) \right) \\
&= z^T \left( Q + A_e^T P_o A_e - A_e^T P_o B_e (R + B_e^T P_o B_e)^{-1} B_e^T P_o A_e \right) z \\
&= z^T P_e z,
\end{aligned}$$
and we get a similar expression for $t$ odd. Putting these together we have
$$\begin{aligned}
P_o &= Q + A_o^T P_e A_o - A_o^T P_e B_o (R + B_o^T P_e B_o)^{-1} B_o^T P_e A_o, \\
P_e &= Q + A_e^T P_o A_e - A_e^T P_o B_e (R + B_e^T P_o B_e)^{-1} B_e^T P_o A_e.
\end{aligned}$$

(c) A simple iterative scheme for finding $P_o$ and $P_e$ is given by repeatedly setting
$$\begin{aligned}
P_o &:= Q + A_o^T P_e A_o - A_o^T P_e B_o (R + B_o^T P_e B_o)^{-1} B_o^T P_e A_o \\
P_e &:= Q + A_e^T P_o A_e - A_e^T P_o B_e (R + B_e^T P_o B_e)^{-1} B_e^T P_o A_e,
\end{aligned}$$
starting from arbitrary positive semidefinite matrices $P_o$ and $P_e$. If these converge to a fixed point, we have our value functions.

(d) The pair of equations discussed in (b) and (c) can be rewritten as a single matrix equation
$$P = \hat{Q} + A^T P A - A^T P B (\hat{R} + B^T P B)^{-1} B^T P A,$$
where
$$A = \begin{bmatrix} 0 & A_e \\ A_o & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & B_e \\ B_o & 0 \end{bmatrix}, \quad \hat{Q} = \begin{bmatrix} Q & 0 \\ 0 & Q \end{bmatrix}, \quad \hat{R} = \begin{bmatrix} R & 0 \\ 0 & R \end{bmatrix}, \quad P = \begin{bmatrix} P_o & 0 \\ 0 & P_e \end{bmatrix}.$$
Note that this is simply an algebraic Riccati equation. Using the fact that $u_t = -(R + B_t^T P_{t+1} B_t)^{-1} B_t^T P_{t+1} A_t x_t$, the optimal control can be obtained from $P$ by taking
$$u_t = \begin{cases} -(R + B_e^T P_o B_e)^{-1} B_e^T P_o A_e x_t = K_e x_t, & t \text{ even,} \\ -(R + B_o^T P_e B_o)^{-1} B_o^T P_e A_o x_t = K_o x_t, & t \text{ odd.} \end{cases}$$

3. LQR for a simple mechanical system. Consider the mechanical system shown below:

[Figure: four masses $m_1, \ldots, m_4$ in a chain, attached to a wall at the left, with springs $k_1, \ldots, k_4$, displacements $d_1, \ldots, d_4$, and forces $u_1$, $u_2$, $u_3$.]

Here $d_1, \ldots, d_4$ are displacements from an equilibrium position, and $u_1, \ldots, u_3$ are forces acting on the masses. Note that $u_1$ is a tension between the first and second masses, $u_2$ is a tension between the third and fourth masses, and $u_3$ is a force between the wall (at left) and the second mass. You can take the mass and stiffness constants to all be one: $m_1 = \cdots = m_4 = 1$, $k_1 = \cdots = k_4 = 1$.

(a) Describe the system as a linear dynamical system with state $(d, \dot{d}) \in \mathbf{R}^8$.

(b) Using the cost function
$$J = \int_0^\infty \left( \|d(t)\|^2 + \|u(t)\|^2 \right) dt,$$
find the optimal state feedback gain matrix $K$. You may find the Matlab function lqr() useful, but check sign conventions: it's not unusual for the optimal feedback gain to be defined as $u = -Kx$ instead of $u = Kx$ (which is what we use).
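(c) Plot $d(t)$ versus $t$ for the open loop ($u(t) = 0$) and closed loop ($u(t) = Kx(t)$) cases using an arbitrary initial condition (but not, of course, zero).

(d) Solve the ARE for this problem using the method based on the Hamiltonian, described in lecture 4. Verify that you get the same result as you did using the Matlab function lqr().

Solution:

(a) The equations of motion for this system can be written as
$$\begin{aligned}
m_1 \ddot{d}_1 &= -k_1 d_1 + k_2 (d_2 - d_1) + u_1 = -2d_1 + d_2 + u_1 \\
m_2 \ddot{d}_2 &= -k_2 (d_2 - d_1) + k_3 (d_3 - d_2) - u_1 - u_3 = d_1 - 2d_2 + d_3 - u_1 - u_3 \\
m_3 \ddot{d}_3 &= -k_3 (d_3 - d_2) + k_4 (d_4 - d_3) + u_2 = d_2 - 2d_3 + d_4 + u_2 \\
m_4 \ddot{d}_4 &= -k_4 (d_4 - d_3) - u_2 = d_3 - d_4 - u_2.
\end{aligned}$$
Defining $d_5 = \dot{d}_1, \ldots, d_8 = \dot{d}_4$ and $x = (d_1, \ldots, d_8)$, this system can be described by $\dot{x} = Ax + Bu$, where
$$A = \begin{bmatrix} 0 & I \\ A_{21} & 0 \end{bmatrix}, \quad A_{21} = \begin{bmatrix} -2 & 1 & 0 & 0 \\ 1 & -2 & 1 & 0 \\ 0 & 1 & -2 & 1 \\ 0 & 0 & 1 & -1 \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ B_2 \end{bmatrix}, \quad B_2 = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & -1 & 0 \end{bmatrix}.$$

(b) Setting $Q$ and $R$ in the LQR cost function as
$$Q = \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}, \quad R = I,$$
and using the Matlab function -lqr() (the Matlab function returns the negative of the optimal gain matrix in our notation), we obtain the $3 \times 8$ gain matrix $K$. Its entries as printed, row by row (signs and some digits are not legible), are

.579 .3393 .447 .948 .3975 .4758 .569 .5787
.754 .755 .997 .3923 .54 .23 .234 .9
.458 .768 .2497 .537 .6683 .442 .8593 .8796

(c) Using an initial condition of $x(0) = ( , , , , , , , )$, the displacements $d_1, \ldots, d_4$ for the open loop and closed loop systems are shown below.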
[Figure: Mass displacements for the open loop system; four panels showing $d_1$, $d_2$, $d_3$, $d_4$ versus $t$.]

[Figure: Mass displacements for the closed loop system; four panels showing $d_1$, $d_2$, $d_3$, $d_4$ versus $t$.]

(d) $K$ can be found from the Hamiltonian matrix by the following method:
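- Form the $16 \times 16$ Hamiltonian matrix $H$.
- Find the eigenvalues of $H$.
- Find the eigenvectors corresponding to the 8 stable eigenvalues of $H$.
- Form the $16 \times 8$ matrix $M = [v_1 \ \cdots \ v_8]$ from the stable eigenvectors of $H$.
- Partition $M$ into two $8 \times 8$ matrices, $M = [X^T \ Y^T]^T$.
- $K = -R^{-1} B^T Y X^{-1}$.

4. Hamiltonian matrices. A matrix
$$M = \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix}$$
with $M_{ij} \in \mathbf{R}^{n \times n}$ is Hamiltonian if $JM$ is symmetric, or equivalently $JMJ = M^T$, where
$$J = \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix}.$$

(a) Show that $M$ is Hamiltonian if and only if $M_{22} = -M_{11}^T$ and $M_{12}$ and $M_{21}$ are each symmetric.

(b) Show that if $v$ is an eigenvector of $M$, then $Jv$ is an eigenvector of $M^T$.

(c) Show that if $\lambda$ is an eigenvalue of $M$, then so is $-\lambda$.

(d) Show that $\det(sI - M) = \det(-sI - M)$, so that $\det(sI - M)$ is a polynomial in $s^2$.

Solution:

(a) $JMJ = M^T$ reads
$$\begin{bmatrix} M_{21} & M_{22} \\ -M_{11} & -M_{12} \end{bmatrix} \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix} = \begin{bmatrix} M_{11}^T & M_{21}^T \\ M_{12}^T & M_{22}^T \end{bmatrix}$$
or
$$\begin{bmatrix} -M_{22} & M_{21} \\ M_{12} & -M_{11} \end{bmatrix} = \begin{bmatrix} M_{11}^T & M_{21}^T \\ M_{12}^T & M_{22}^T \end{bmatrix},$$
and the claim follows.

(b) Since $J^{-1} = J^T = -J$, we have $J^2 = -I$, and so
$$Mv = \lambda v \implies JMJ(Jv) = -JMv = -\lambda Jv \implies M^T (Jv) = -\lambda (Jv).$$
So $-\lambda$ is an eigenvalue and $Jv$ is an eigenvector of $M^T$.

(c) It follows from (b), since the eigenvalues of $M$ and $M^T$ are identical.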
(d) A little algebra:
$$\begin{aligned}
\det(sI - M) &= \det(sI - M^T) \\
&= \det(sI - JMJ) \\
&= \det(sI + J^{-1} M J) \\
&= \det(sI + M) \\
&= (-1)^{2n} \det(-sI - M) \\
&= \det(-sI - M).
\end{aligned}$$

5. Value function for infinite-horizon LQR problem. In this problem you will show that the minimum cost-to-go starting in state $z$ is a quadratic form in $z$. Let $Q = Q^T \geq 0$, $R = R^T > 0$ and define
$$J(u, z) = \sum_{t=0}^{\infty} \left( x_t^T Q x_t + u_t^T R u_t \right)$$
where $x_{t+1} = Ax_t + Bu_t$, $x_0 = z$. Note that $u$ is a sequence in $\mathbf{R}^m$, and $z \in \mathbf{R}^n$. Of course, for some $u$'s and $z$'s, $J(u, z) = \infty$. Define
$$V(z) = \min_u J(u, z).$$
Note that this is a minimum over all possible input sequences. This is just like the previous problem, except that here $u$ has infinite dimension. Assume $(A, B)$ is controllable, so $V(z) < \infty$ for all $z$.

(a) Show that for all $\lambda \in \mathbf{R}$, $J(\lambda u, \lambda z) = \lambda^2 J(u, z)$, and conclude that
$$V(\lambda z) = \lambda^2 V(z). \qquad (1)$$

(b) Let $u$ and $\tilde{u}$ be two input sequences, and let $z$ and $\tilde{z}$ be two initial states. Show that
$$J(u + \tilde{u}, z + \tilde{z}) + J(u - \tilde{u}, z - \tilde{z}) = 2J(u, z) + 2J(\tilde{u}, \tilde{z}).$$
Minimize the RHS with respect to $u$ and $\tilde{u}$, and conclude
$$V(z + \tilde{z}) + V(z - \tilde{z}) \leq 2V(z) + 2V(\tilde{z}).$$

(c) Apply the above inequality with $\frac{1}{2}(z + \tilde{z})$ substituted for $z$ and $\frac{1}{2}(z - \tilde{z})$ substituted for $\tilde{z}$ to get:
$$V(z + \tilde{z}) + V(z - \tilde{z}) = 2V(z) + 2V(\tilde{z}). \qquad (2)$$

(d) The two properties (1) and (2) of $V$ are enough to guarantee that $V$ is a quadratic form. Here is one way to see it (you supply all details): take gradients in (2) with respect to $z$ and $\tilde{z}$ and add to get
$$\nabla V(z + \tilde{z}) = \nabla V(z) + \nabla V(\tilde{z}). \qquad (3)$$
From (1) show that
$$\nabla V(\lambda z) = \lambda \nabla V(z). \qquad (4)$$
(3) and (4) mean that $\nabla V(z)$, which is a vector, is linear in $z$, and hence has a matrix representation:
$$\nabla V(z) = Mz \qquad (5)$$
where $M \in \mathbf{R}^{n \times n}$.
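(e) Show that $V(z) = z^T P z$, where $P = \frac{1}{4}(M + M^T)$. Show that $P = P^T \geq 0$. Thus we are done. Hint:
$$V(z) = V(0) + \int_0^1 \nabla V(\theta z)^T z \, d\theta.$$

Solution:

(a) Let $x_t$ be the state that corresponds to an input $u_t$ and an initial condition $z$. Then
$$x_t = A^t z + \sum_{i=0}^{t-1} A^i B u_{t-1-i}.$$
Now denote by $x_{\mathrm{new},t}$ the state that corresponds to an input $\lambda u_t$ and an initial condition $\lambda z$. Obviously,
$$x_{\mathrm{new},t} = A^t (\lambda z) + \sum_{i=0}^{t-1} A^i B (\lambda u_{t-1-i}) = \lambda x_t.$$
Thus,
$$J(\lambda u, \lambda z) = \sum_{t=0}^{\infty} \left( \lambda^2 x_t^T Q x_t + \lambda^2 u_t^T R u_t \right) = \lambda^2 J(u, z).$$
For $\lambda \neq 0$, the sequence $\lambda u$ ranges over all input sequences as $u$ does, so minimizing both sides over $u$ gives $V(\lambda z) = \lambda^2 V(z)$; for $\lambda = 0$ both sides are zero (take $u = 0$).

(b) Suppose
$$\begin{aligned}
x_{t+1} &= A x_t + B u_t, & x_0 &= z \\
\tilde{x}_{t+1} &= A \tilde{x}_t + B \tilde{u}_t, & \tilde{x}_0 &= \tilde{z}.
\end{aligned}$$
Adding or subtracting the above equations yields
$$x_{t+1} \pm \tilde{x}_{t+1} = A(x_t \pm \tilde{x}_t) + B(u_t \pm \tilde{u}_t), \qquad x_0 \pm \tilde{x}_0 = z \pm \tilde{z}.$$
Thus, $x_t \pm \tilde{x}_t$ is the state that corresponds to an input $u_t \pm \tilde{u}_t$ and initial condition $z \pm \tilde{z}$, respectively. Now
$$\begin{aligned}
J(u + \tilde{u}, z + \tilde{z}) + J(u - \tilde{u}, z - \tilde{z})
&= \sum_{t=0}^{\infty} \Big( (x_t + \tilde{x}_t)^T Q (x_t + \tilde{x}_t) + (x_t - \tilde{x}_t)^T Q (x_t - \tilde{x}_t) \\
&\qquad\quad + (u_t + \tilde{u}_t)^T R (u_t + \tilde{u}_t) + (u_t - \tilde{u}_t)^T R (u_t - \tilde{u}_t) \Big) \\
&= \sum_{t=0}^{\infty} \left( 2 x_t^T Q x_t + 2 \tilde{x}_t^T Q \tilde{x}_t + 2 u_t^T R u_t + 2 \tilde{u}_t^T R \tilde{u}_t \right) \\
&= 2J(u, z) + 2J(\tilde{u}, \tilde{z}).
\end{aligned}$$
So we have
$$J(u + \tilde{u}, z + \tilde{z}) + J(u - \tilde{u}, z - \tilde{z}) = 2J(u, z) + 2J(\tilde{u}, \tilde{z}).$$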
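Minimizing both sides with respect to $u$ and $\tilde{u}$ gives
$$\min_{u, \tilde{u}} \left\{ J(u + \tilde{u}, z + \tilde{z}) + J(u - \tilde{u}, z - \tilde{z}) \right\} = \min_u 2J(u, z) + \min_{\tilde{u}} 2J(\tilde{u}, \tilde{z}).$$
The right-hand side of this equation is obviously $2V(z) + 2V(\tilde{z})$. Examining the left-hand side,
$$\begin{aligned}
\min_{u, \tilde{u}} \left\{ J(u + \tilde{u}, z + \tilde{z}) + J(u - \tilde{u}, z - \tilde{z}) \right\}
&\geq \min_{u, \tilde{u}} J(u + \tilde{u}, z + \tilde{z}) + \min_{u, \tilde{u}} J(u - \tilde{u}, z - \tilde{z}) \\
&= V(z + \tilde{z}) + V(z - \tilde{z}).
\end{aligned}$$
Therefore,
$$V(z + \tilde{z}) + V(z - \tilde{z}) \leq 2V(z) + 2V(\tilde{z}).$$

(c) Performing this substitution yields
$$V(z) + V(\tilde{z}) \leq 2V\left( \frac{z + \tilde{z}}{2} \right) + 2V\left( \frac{z - \tilde{z}}{2} \right).$$
Using part (a), we have
$$V(z) + V(\tilde{z}) \leq \frac{1}{2} V(z + \tilde{z}) + \frac{1}{2} V(z - \tilde{z}).$$
Combining this result with the one from part (b) yields
$$V(z + \tilde{z}) + V(z - \tilde{z}) = 2V(z) + 2V(\tilde{z}).$$

(d) Taking gradients of (2) with respect to $z$ and with respect to $\tilde{z}$ gives
$$\nabla V(z + \tilde{z}) + \nabla V(z - \tilde{z}) = 2 \nabla V(z), \qquad \nabla V(z + \tilde{z}) - \nabla V(z - \tilde{z}) = 2 \nabla V(\tilde{z}),$$
and adding these (then dividing by two) gives (3). Taking gradients of both sides of equation (1) in part (a), we obtain
$$\lambda \nabla V(\lambda z) = \lambda^2 \nabla V(z) \implies \nabla V(\lambda z) = \lambda \nabla V(z).$$
Thus $\nabla V(z)$ is linear in $z$, and hence there exists $M \in \mathbf{R}^{n \times n}$ such that $\nabla V(z) = Mz$.

(e) Using $\nabla V(\theta z) = \theta \nabla V(z) = \theta M z$,
$$V(z) = V(0) + \int_0^1 \nabla V(\theta z)^T d(\theta z) = V(0) + \int_0^1 \left( \nabla V(z)^T z \right) \theta \, d\theta = V(0) + \frac{1}{2} z^T M^T z.$$
Substituting $\lambda = 0$ in equation (1) in part (a) we obtain
$$V(0) = V(0 \cdot z) = 0^2 \, V(z) = 0.$$
Thus,
$$V(z) = \frac{1}{2} z^T M^T z = \frac{1}{2} z^T M z = \frac{1}{2} \left( \frac{1}{2} z^T M^T z + \frac{1}{2} z^T M z \right) = \frac{1}{4} z^T (M + M^T) z.$$
$P$ is obviously symmetric; it is also positive semidefinite:
$$\min_u J(u, z) \geq 0 \ \ \forall z \implies V(z) = z^T P z \geq 0 \ \ \forall z \implies P \geq 0.$$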
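As a sanity check on the factor $\frac{1}{4}$ in part (e), the argument can be run backwards from a known quadratic form. The display below is only a sketch, and it assumes what the problem sets out to prove, namely $V(z) = z^T P z$ with $P = P^T$; under that assumption the gradient matrix is $M = 2P$, and both the formula for $P$ and the hint integral are recovered.

```latex
% Assumption: V(z) = z^T P z with P symmetric.
\begin{aligned}
\nabla V(z) &= 2Pz \quad\Longrightarrow\quad M = 2P, \\
\tfrac{1}{4}\,(M + M^T) &= \tfrac{1}{4}\,(2P + 2P^T) = P, \\
V(0) + \int_0^1 \nabla V(\theta z)^T z \, d\theta
  &= 0 + \int_0^1 2\theta \, z^T P z \, d\theta = z^T P z .
\end{aligned}
```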
6. Closed-loop stability for receding horizon LQR. We consider the system $x_{t+1} = Ax_t + Bu_t$, $y_t = Cx_t$, with
$$A = \begin{bmatrix} 1 & 0.4 & 0 & 0 \\ -0.6 & 1 & 0.4 & 0 \\ 0 & 0.4 & 1 & -0.6 \\ 0 & 0 & 0.4 & 1 \end{bmatrix}, \quad B \in \mathbf{R}^{4 \times 1}, \quad C \in \mathbf{R}^{1 \times 4}.$$
Consider the receding horizon LQR control problem with horizon $T$, using state cost matrix $Q = C^T C$ and control cost $R = 1$. For each $T$, the receding horizon control has linear state feedback form $u_t = K_T x_t$. The associated closed-loop system has the form $x_{t+1} = (A + BK_T) x_t$. What is the smallest horizon $T$ for which the closed-loop system is stable?

Interpretation. As you increase $T$, receding horizon control becomes less myopic and greedy; it takes into account the effects of current actions on the long-term system behavior. If the horizon is too short, the actions taken can result in an unstable closed-loop system.

Solution: The closed-loop system $x_{t+1} = (A + BK_T) x_t$ is stable if the spectral radius of $A + BK_T$ is strictly less than 1. (The spectral radius of a square matrix $P$ is $\max_{i=1,\ldots,n} |\lambda_i|$, where $\lambda_1, \ldots, \lambda_n$ are its eigenvalues.) In order to determine the smallest horizon $T$ that gives a stable closed-loop system, we will find $K_T$ for $T = 1, 2, \ldots$ and check whether the spectral radius of $A + BK_T$ is less than 1. We have
$$K_T = -(R + B^T P_T B)^{-1} B^T P_T A,$$
where
$$P_1 = Q, \qquad P_{i+1} = Q + A^T P_i A - A^T P_i B (R + B^T P_i B)^{-1} B^T P_i A.$$
A simple Matlab script which performs the method described is shown below. The first plot shows the optimal state feedback gains versus the receding time horizon $T$. The second plot shows the spectral radius of $A + BK_T$ as a function of $T$. We can see from the second plot that for $T \geq 7$, the resulting closed-loop system is stable.
[Figure: LQR-optimal state feedback gain versus horizon; four panels showing the entries of $K_T$ versus time horizon $T$.]

% closed-loop stability for receding horizon LQR
% data
% (the 0 and 1 entries of A are reconstructed from its surviving layout)
A = [1 0.4 0 0; -0.6 1 0.4 0; 0 0.4 1 -0.6; 0 0 0.4 1];
% ASSUMED: the entries of B and C are not legible in the source;
% the values below are placeholders so that the script runs
B = [1 0 0 0]';
C = [0 0 0 1];
Q = C'*C; R = 1;
m = 1; n = 4; T = 25;
% recursion
P = zeros(n,n,T+1);
K = zeros(m,n,T);
P(:,:,T+1) = Q;
spec_rad = zeros(1,T);
for i = T:-1:1
    % LQR-optimal state feedback
    K(:,:,i) = -(R + B'*P(:,:,i+1)*B)\(B'*P(:,:,i+1)*A);
    % spectral radius (K(:,:,i) is the gain for horizon T-i+1)
    spec_rad(T-i+1) = max(abs(eig(A+B*K(:,:,i))));
    P(:,:,i) = Q + A'*P(:,:,i+1)*A + A'*P(:,:,i+1)*B*K(:,:,i);
end
% plots
figure(1);
t = 0:T-1;
K = shiftdim(K);   % drop the leading singleton dimension: K becomes n x T
subplot(4,1,1); plot(t,K(1,:)); ylabel('K1(t)');
subplot(4,1,2); plot(t,K(2,:)); ylabel('K2(t)');
subplot(4,1,3); plot(t,K(3,:)); ylabel('K3(t)');
subplot(4,1,4); plot(t,K(4,:)); ylabel('K4(t)');
xlabel('t');
% print -depsc receding_horizon_gains.eps
figure(2)
plot(1:T,spec_rad); hold on;
plot(1:T,spec_rad,'.');
xlabel('T'); ylabel('\rho(A+BK)');
title('title');
% print -depsc receding_horizon_specrad.eps

[Figure: Spectral radius of $A + BK_T$ versus time horizon $T$.]

7. LQR with exponential weighting. A common variation on the LQR problem includes explicit time-varying weighting factors on the state and input costs,
$$J = \sum_{\tau=0}^{N-1} \gamma^\tau \left( x_\tau^T Q x_\tau + u_\tau^T R u_\tau \right) + \gamma^N x_N^T Q_f x_N,$$
where $x_{t+1} = Ax_t + Bu_t$, $x_0$ is given, and, as usual, we assume $Q = Q^T \geq 0$, $Q_f = Q_f^T \geq 0$ and $R = R^T > 0$ are constant. The parameter $\gamma$, called the exponential weighting factor, is positive. For $\gamma = 1$, this reduces to the standard LQR cost function. For $\gamma < 1$, the penalty for future state and input deviations is smaller than in the present; in this case we call $\gamma$ the discount factor or forgetting factor. When $\gamma > 1$, future costs are accentuated compared to present costs. This gives added incentive for the input to steer the state towards zero quickly.
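The effect of the weighting factor on the cost defined above can be seen by evaluating $J$ along one fixed trajectory. The following Matlab sketch does this for three values of $\gamma$. All numerical data (`A`, `B`, `Q`, `R`, `Qf`, `x0`, the zero input sequence `u`, and the list `gammas`) are hypothetical and chosen only for illustration; they do not come from the problem.

```matlab
% Evaluate the exponentially weighted cost J along a fixed trajectory.
% All data below are hypothetical, chosen only for illustration.
A = [1 0.5; 0 1];  B = [0; 1];
Q = eye(2);  R = 1;  Qf = eye(2);
N = 10;  x0 = [1; 0];
u = zeros(1,N);                 % columns are u_0,...,u_{N-1}; zero input here
gammas = [0.8 1 1.2];           % discounting, standard LQR, accentuated future
J = zeros(size(gammas));
for k = 1:numel(gammas)
    gam = gammas(k);
    x = x0;
    for tau = 0:N-1
        uk = u(:,tau+1);
        J(k) = J(k) + gam^tau*(x'*Q*x + uk'*R*uk);   % stage cost at time tau
        x = A*x + B*uk;                              % state update
    end
    J(k) = J(k) + gam^N*(x'*Qf*x);                   % final cost
end
disp(J)
```

With this particular data the state stays at `x0`, so every stage contributes the same unweighted cost, and the three values of `J` differ only through the weights $\gamma^\tau$.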
(a) Note that we can find the input sequence $u_0^\star, \ldots, u_{N-1}^\star$ that minimizes $J$ using standard LQR methods, by considering the state and input costs as time varying, with $Q_t = \gamma^t Q$, $R_t = \gamma^t R$, and final cost given by $\gamma^N Q_f$. Thus, we know at least one way to solve the exponentially weighted LQR problem. Use this method to find the recursive equations that give $u^\star$.

(b) Exponential weights can also be incorporated directly into a dynamic programming formulation. We define
$$W_t(z) = \min \left( \sum_{\tau=t}^{N-1} \gamma^{\tau - t} \left( x_\tau^T Q x_\tau + u_\tau^T R u_\tau \right) + \gamma^{N-t} x_N^T Q_f x_N \right),$$
where $x_t = z$, $x_{\tau+1} = Ax_\tau + Bu_\tau$, and the minimum is over $u_t, \ldots, u_{N-1}$. This is the minimum cost-to-go, if we started in state $z$ at time $t$, with the time weighting also starting at $t$. Argue that we have
$$W_N(z) = z^T Q_f z, \qquad W_t(z) = \min_w \left( z^T Q z + w^T R w + \gamma W_{t+1}(Az + Bw) \right),$$
and that the minimizing $w$ is in fact $u_t^\star$. In other words, work out a backwards recursion for $W_t$, and give an expression for $u_t^\star$ in terms of $W_t$. Show that this method yields the same $u^\star$ as the first method.

(c) Yet another method can be used to find $u^\star$. Define a new system as
$$y_{t+1} = \gamma^{1/2} A y_t + \gamma^{1/2} B z_t, \qquad y_0 = x_0.$$
Argue that we have $y_t = \gamma^{t/2} x_t$, provided $z_t = \gamma^{t/2} u_t$, for $t = 0, \ldots, N-1$. With this choice of $z$, the exponentially weighted LQR cost $J$ for the original system is given by
$$J = \sum_{\tau=0}^{N-1} \left( y_\tau^T Q y_\tau + z_\tau^T R z_\tau \right) + y_N^T Q_f y_N,$$
i.e., the unweighted LQR cost for the modified system. We can use the standard formulas to obtain the optimal input for the modified system $z^\star$, and from this, we can get $u^\star$. Do this, and verify that once again, you get the same $u^\star$.

Solution:

(a) The value function is
$$V_t(z) = \min \left( \sum_{\tau=t}^{N-1} \gamma^\tau \left( x_\tau^T Q x_\tau + u_\tau^T R u_\tau \right) + x_N^T \gamma^N Q_f x_N \right)$$
where the minimum is over all input sequences $u(t), \ldots, u(N-1)$, and $x_t = z$, $x_{\tau+1} = Ax_\tau + Bu_\tau$ for $\tau = t, \ldots, N-1$. We can express this as
$$V_t(z) = \min \left( \sum_{\tau=t}^{N-1} \left( x_\tau^T Q_\tau x_\tau + u_\tau^T R_\tau u_\tau \right) + x_N^T \gamma^N Q_f x_N \right),$$
where $Q_\tau = \gamma^\tau Q$ and $R_\tau = \gamma^\tau R$. This is just a standard LQR problem with time-varying $Q$ and $R$. Thus from the lecture notes we have
$$V_t(z) = z^T P_t z, \qquad u_t^\star = K_t x_t,$$
where $P_t$ and $K_t$ are given by the following recursion:
$$\begin{aligned}
P_N &= \gamma^N Q_f, \\
P_{t-1} &= Q_{t-1} + A^T P_t A - A^T P_t B \left( R_{t-1} + B^T P_t B \right)^{-1} B^T P_t A \\
&= \gamma^{t-1} Q + A^T P_t A - A^T P_t B \left( \gamma^{t-1} R + B^T P_t B \right)^{-1} B^T P_t A, \\
K_t &= -\left( R_t + B^T P_{t+1} B \right)^{-1} B^T P_{t+1} A \\
&= -\left( \gamma^t R + B^T P_{t+1} B \right)^{-1} B^T P_{t+1} A.
\end{aligned}$$

(b) Clearly we have $W_N(z) = z^T Q_f z$, since at the last step the cost is not a function of the input. Now let $u_t = w$. We can apply the dynamic programming principle to the equation for $W_t(z)$ by first minimizing with respect to $w$ and then with respect to the other inputs. If we do that, it is easy to see that
$$W_t(z) = \min_w \left( z^T Q z + w^T R w + \gamma W_{t+1}(Az + Bw) \right).$$
Note that the minimizing $w$ is equal to the optimal input $u_t^\star$. This is because $W_0(z) = V_0(z)$ for all $z$ and therefore a set of minimizing inputs for the above HJ equation will also minimize $J$.

We will now show by induction that $W_t(z) = z^T S_t z$, i.e., that $W_t(z)$ is a quadratic form in $z$. Let us assume that this is true for time $t+1$. Then using the above HJ equation we have
$$W_t(z) = \min_w \left( z^T Q z + w^T R w + \gamma (Az + Bw)^T S_{t+1} (Az + Bw) \right).$$
To find $w^\star$, we take the derivative with respect to $w$ of the argument of the minimum above, and set it equal to zero. We get
$$w^\star = -\gamma (R + \gamma B^T S_{t+1} B)^{-1} B^T S_{t+1} A z.$$
By substituting the optimal $w$ in the expression for $W_t(z)$ we obtain
$$W_t(z) = z^T \left( Q + \gamma A^T S_{t+1} A - \gamma^2 A^T S_{t+1} B (R + \gamma B^T S_{t+1} B)^{-1} B^T S_{t+1} A \right) z,$$
which is still a quadratic form. This combined with the fact that $W_N(z) = z^T Q_f z$ is sufficient to see that $W_t(z)$ is indeed a quadratic form. Also, the optimum input $u_t^\star$ is given by
$$u_t^\star = -\gamma (R + \gamma B^T S_{t+1} B)^{-1} B^T S_{t+1} A x_t.$$
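The minimization over $w$ in part (b) can be written out in one line. The display below is a sketch of that step; it assumes $S_{t+1} = S_{t+1}^T \geq 0$ (which the induction gives, starting from $Q_f \geq 0$), so that $R + \gamma B^T S_{t+1} B > 0$ and the stationary point is the unique minimizer.

```latex
% Assumption: S_{t+1} is symmetric positive semidefinite, R > 0, gamma > 0.
\begin{aligned}
\nabla_w \Big( w^T R w + \gamma (Az + Bw)^T S_{t+1} (Az + Bw) \Big)
  &= 2 R w + 2 \gamma B^T S_{t+1} (Az + Bw) = 0 \\
\Longrightarrow\quad (R + \gamma B^T S_{t+1} B)\, w &= -\gamma B^T S_{t+1} A z \\
\Longrightarrow\quad w^\star &= -\gamma (R + \gamma B^T S_{t+1} B)^{-1} B^T S_{t+1} A z .
\end{aligned}
```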
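Actually $S_t$ and $P_t$ are closely related, since we have
$$\begin{aligned}
W_t(z) &= \min \left( \sum_{\tau=t}^{N-1} \gamma^{\tau - t} \left( x_\tau^T Q x_\tau + u_\tau^T R u_\tau \right) + \gamma^{N-t} x_N^T Q_f x_N \right) \\
&= \gamma^{-t} \min \left( \sum_{\tau=t}^{N-1} \gamma^\tau \left( x_\tau^T Q x_\tau + u_\tau^T R u_\tau \right) + \gamma^N x_N^T Q_f x_N \right) \\
&= \gamma^{-t} V_t(z).
\end{aligned}$$
Thus $z^T S_t z = \gamma^{-t} z^T P_t z$ for all $z$, or in other words $S_t = \gamma^{-t} P_t$. Substituting this into the formula for $u_t^\star$ we get
$$\begin{aligned}
u_t^\star &= -\gamma \left( R + \gamma B^T (\gamma^{-(t+1)} P_{t+1}) B \right)^{-1} B^T (\gamma^{-(t+1)} P_{t+1}) A x_t \\
&= -\left( \gamma^t R + B^T P_{t+1} B \right)^{-1} B^T P_{t+1} A x_t.
\end{aligned}$$
Therefore this method gives the same formula for $u_t^\star$ as the one of part (a).

(c) We want to show that $y_t = \gamma^{t/2} x_t$, provided that $z_t = \gamma^{t/2} u_t$. This can be shown by induction. The argument holds for $t = 0$, since $y_0 = x_0$. Now suppose that $y_t = \gamma^{t/2} x_t$. We have
$$\begin{aligned}
y_{t+1} &= \gamma^{1/2} A y_t + \gamma^{1/2} B z_t \\
&= \gamma^{(t+1)/2} A x_t + \gamma^{(t+1)/2} B u_t \\
&= \gamma^{(t+1)/2} (A x_t + B u_t) \\
&= \gamma^{(t+1)/2} x_{t+1}.
\end{aligned}$$
Thus the argument also holds for $y_{t+1}$. Now, let's express the cost function in terms of $y_t$ and $z_t$. We have
$$\begin{aligned}
J &= \sum_{\tau=0}^{N-1} \gamma^\tau \left( x_\tau^T Q x_\tau + u_\tau^T R u_\tau \right) + \gamma^N x_N^T Q_f x_N \\
&= \sum_{\tau=0}^{N-1} \left( (\gamma^{\tau/2} x_\tau)^T Q (\gamma^{\tau/2} x_\tau) + (\gamma^{\tau/2} u_\tau)^T R (\gamma^{\tau/2} u_\tau) \right) + (\gamma^{N/2} x_N)^T Q_f (\gamma^{N/2} x_N) \\
&= \sum_{\tau=0}^{N-1} \left( y_\tau^T Q y_\tau + z_\tau^T R z_\tau \right) + y_N^T Q_f y_N.
\end{aligned}$$
Using the formulas from the lecture notes we get that the minimizing $z_t$ for the above expression is given by
$$z_t^\star = -\gamma (R + \gamma B^T \tilde{P}_{t+1} B)^{-1} B^T \tilde{P}_{t+1} A y_t,$$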
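where $\tilde{P}_t$ is found by the following recursion:
$$\tilde{P}_N = Q_f, \qquad \tilde{P}_{t-1} = Q + \gamma A^T \tilde{P}_t A - \gamma^2 A^T \tilde{P}_t B (R + \gamma B^T \tilde{P}_t B)^{-1} B^T \tilde{P}_t A.$$
Note that since the original problem and this modified problem have the same cost function, the optimum $z_t^\star$ will give us the optimum $u_t^\star$ of the original problem, using the relation $z_t^\star = \gamma^{t/2} u_t^\star$. We can also show by an inductive argument that $P_t = \gamma^t \tilde{P}_t$. Combining these two relations in the formula for $z_t^\star$, we get
$$\begin{aligned}
\gamma^{t/2} u_t^\star &= -\gamma^{-t} (R + \gamma^{-t} B^T P_{t+1} B)^{-1} B^T P_{t+1} A y_t \\
&= -\gamma^{t/2} (\gamma^t R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A x_t.
\end{aligned}$$
We therefore get
$$u_t^\star = -\left( \gamma^t R + B^T P_{t+1} B \right)^{-1} B^T P_{t+1} A x_t,$$
which is the same expression as the one obtained in parts (a) and (b).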
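The agreement between the three methods can also be checked numerically. The Matlab sketch below runs the recursion of part (a), the discounted recursion of part (b), and the standard recursion on the modified system of part (c), then compares the resulting feedback gains and the relation $S_t = \gamma^{-t} P_t$. The system data (`A`, `B`, `Q`, `R`, `Qf`, `gam`, `N`) are hypothetical values chosen only for illustration, and the variable names are not from the original solution. Arrays are indexed so that slice `t+1` holds the quantity at time $t$.

```matlab
% Sketch: compare the three methods for exponentially weighted LQR.
% All data below are hypothetical, chosen only for illustration.
A = [1 0.5; 0 1];  B = [0; 1];
Q = eye(2);  R = 1;  Qf = eye(2);
gam = 0.9;  N = 20;
n = size(A,1);  m = size(B,2);

% Method (a): time-varying weights Q_t = gam^t*Q, R_t = gam^t*R.
P = zeros(n,n,N+1);  Ka = zeros(m,n,N);
P(:,:,N+1) = gam^N*Qf;
for t = N-1:-1:0
    Pn = P(:,:,t+2);                                  % P_{t+1}
    Ka(:,:,t+1) = -(gam^t*R + B'*Pn*B)\(B'*Pn*A);     % K_t
    P(:,:,t+1) = gam^t*Q + A'*Pn*A + A'*Pn*B*Ka(:,:,t+1);
end

% Method (b): discounted dynamic programming, W_t(z) = z'*S_t*z.
S = zeros(n,n,N+1);  Kb = zeros(m,n,N);
S(:,:,N+1) = Qf;
for t = N-1:-1:0
    Sn = S(:,:,t+2);                                  % S_{t+1}
    Kb(:,:,t+1) = -gam*((R + gam*B'*Sn*B)\(B'*Sn*A));
    S(:,:,t+1) = Q + gam*A'*Sn*A + gam*A'*Sn*B*Kb(:,:,t+1);
end

% Method (c): standard LQR on the scaled system. Since z_t = gam^(t/2)*u_t
% and y_t = gam^(t/2)*x_t, the gain from y_t to z_t is also the gain
% from x_t to u_t.
At = sqrt(gam)*A;  Bt = sqrt(gam)*B;
Pt = Qf;  Kc = zeros(m,n,N);
for t = N-1:-1:0
    Kc(:,:,t+1) = -(R + Bt'*Pt*Bt)\(Bt'*Pt*At);
    Pt = Q + At'*Pt*At + At'*Pt*Bt*Kc(:,:,t+1);
end

% Largest discrepancies between the methods.
err_ab = max(abs(Ka(:) - Kb(:)));
err_ac = max(abs(Ka(:) - Kc(:)));
err_S = 0;
for t = 0:N
    err_S = max(err_S, norm(S(:,:,t+1) - gam^(-t)*P(:,:,t+1)));
end
fprintf('gain mismatch (a) vs (b): %g\n', err_ab);
fprintf('gain mismatch (a) vs (c): %g\n', err_ac);
fprintf('max ||S_t - gam^(-t) P_t||: %g\n', err_S);
```

The three discrepancies should be at the level of floating-point rounding. The gains are computed with the backslash operator rather than `inv`, a choice made here for numerical robustness and not something the derivation depends on.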