EE363 homework 2 solutions


EE363 Prof. S. Boyd

1. Derivative of matrix inverse. Suppose that $X : \mathbf{R} \to \mathbf{R}^{n \times n}$, and that $X(t)$ is invertible. Show that

\[ \frac{d}{dt} X(t)^{-1} = -X(t)^{-1} \left( \frac{d}{dt} X(t) \right) X(t)^{-1}. \]

Hint: differentiate $X(t)X(t)^{-1} = I$ with respect to $t$.

Solution: Differentiating $X(t)X(t)^{-1} = I$ with respect to $t$, we get

\[ 0 = \frac{d}{dt} I = \frac{d}{dt} \left( X(t)X(t)^{-1} \right) = X(t) \left( \frac{d}{dt} X(t)^{-1} \right) + \left( \frac{d}{dt} X(t) \right) X(t)^{-1}. \]

We conclude that

\[ X(t) \left( \frac{d}{dt} X(t)^{-1} \right) = -\left( \frac{d}{dt} X(t) \right) X(t)^{-1}, \]

and so

\[ \frac{d}{dt} X(t)^{-1} = -X(t)^{-1} \left( \frac{d}{dt} X(t) \right) X(t)^{-1}. \]
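As a quick numerical sanity check of this identity, we can compare the analytical expression with a centered finite-difference approximation. The Matlab sketch below uses an arbitrary smooth invertible curve $X(t)$ (a matrix exponential, which is invertible for every $t$); any such curve would do.

% check d/dt X(t)^{-1} = -X^{-1} (d/dt X) X^{-1} by finite differences
X = @(s) expm(s*[0 1; -2 0]);                 % arbitrary smooth invertible X(t)
t = 0.7; h = 1e-5;
dX       = (X(t+h) - X(t-h))/(2*h);           % approximates (d/dt X)(t)
dXinv_fd = (inv(X(t+h)) - inv(X(t-h)))/(2*h); % approximates (d/dt X^{-1})(t)
dXinv_an = -inv(X(t))*dX*inv(X(t));           % the expression derived above
norm(dXinv_fd - dXinv_an)                     % small (order h^2)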

2. Infinite horizon LQR for a periodic system. Consider the system $x_{t+1} = A_t x_t + B_t u_t$, where

\[ A_t = \begin{cases} A_e & t \text{ even} \\ A_o & t \text{ odd,} \end{cases} \qquad B_t = \begin{cases} B_e & t \text{ even} \\ B_o & t \text{ odd.} \end{cases} \]

In other words, $A$ and $B$ are periodic with period 2. We consider the infinite horizon LQR problem for this time-varying system, with cost

\[ J = \sum_{\tau=0}^{\infty} \left( x_\tau^T Q x_\tau + u_\tau^T R u_\tau \right). \]

In this problem you will use dynamic programming to find the optimal control for this system. You can assume that the value function is finite.

(a) Conjecture a reasonable form for the value function. You do not have to show that your form is correct.

(b) Derive the Hamilton-Jacobi equation, using your assumed form. Hint: you should get a pair of coupled nonlinear matrix equations.

(c) Suggest a simple iterative method for solving the Hamilton-Jacobi equation. You do not have to prove that the iterative method converges, but do check your method on a few numerical examples.

(d) Show that the Hamilton-Jacobi equation can be solved by solving a single (bigger) algebraic Riccati equation. How is the optimal $u$ related to the solution of this equation?

Remark: the results of this problem generalize to general periodic systems.

Solution:

(a) In the time-invariant infinite horizon LQR problem, the value function $V(z)$ does not depend on time. This is because

\[ V_{t_1}(z) = \min_{u_{t_1}, u_{t_1+1}, \ldots} \sum_{\tau=t_1}^{\infty} \left( x_\tau^T Q x_\tau + u_\tau^T R u_\tau \right) = \min_{u_{t_2}, u_{t_2+1}, \ldots} \sum_{\tau=t_2}^{\infty} \left( x_\tau^T Q x_\tau + u_\tau^T R u_\tau \right) = V_{t_2}(z) \]

as long as $z = x_{t_1} = x_{t_2}$. A similar statement can be made for the periodic case. That is, $V_0(z) = V_{2k}(z)$ for $k = 1, 2, \ldots$ as long as $z = x_0 = x_{2k}$. Similarly $V_1(z) = V_{2k+1}(z)$ as long as $z = x_1 = x_{2k+1}$. Therefore, we can define our value function to be $V_o(z)$ when $t$ is odd and $V_e(z)$ when $t$ is even.

In the general time-varying finite horizon case it can be shown that the value function is quadratic. Since this holds for any horizon $N$, it is reasonable to suggest (and indeed is true) that the value function remains quadratic as $N \to \infty$. A reasonable form for the value function is therefore $V_t(z) = V_o(z) = z^T P_o z$ when $t$ is odd and $V_t(z) = V_e(z) = z^T P_e z$ when $t$ is even.

(b) For $t$ even we get

\[ \begin{aligned} V_e(z) &= z^T Q z + \min_w \left( w^T R w + V_o(A_e z + B_e w) \right) \\ &= z^T Q z + \min_w \left( w^T R w + (A_e z + B_e w)^T P_o (A_e z + B_e w) \right) \\ &= z^T \left( Q + A_e^T P_o A_e - A_e^T P_o B_e (R + B_e^T P_o B_e)^{-1} B_e^T P_o A_e \right) z \\ &= z^T P_e z, \end{aligned} \]

and we get a similar expression for $t$ odd. Putting these together we have

\[ \begin{aligned} P_o &= Q + A_o^T P_e A_o - A_o^T P_e B_o (R + B_o^T P_e B_o)^{-1} B_o^T P_e A_o, \\ P_e &= Q + A_e^T P_o A_e - A_e^T P_o B_e (R + B_e^T P_o B_e)^{-1} B_e^T P_o A_e. \end{aligned} \]

(c) A simple iterative scheme for finding $P_o$ and $P_e$ is given by repeatedly setting

\[ \begin{aligned} P_o &:= Q + A_o^T P_e A_o - A_o^T P_e B_o (R + B_o^T P_e B_o)^{-1} B_o^T P_e A_o, \\ P_e &:= Q + A_e^T P_o A_e - A_e^T P_o B_e (R + B_e^T P_o B_e)^{-1} B_e^T P_o A_e, \end{aligned} \]

starting from arbitrary positive semidefinite matrices $P_o$ and $P_e$. If these converge to a fixed point, we have our value functions.
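A minimal Matlab sketch of this iteration is given below. The data $A_e$, $A_o$, $B_e$, $B_o$ are arbitrary, chosen only to exercise the method, and the stopping tolerance is likewise an arbitrary choice.

% iterate the coupled Riccati equations for the periodic system
n = 2; m = 1;
Ae = [1 .5; -.2 .9]; Ao = [.8 .1; .3 1.1];   % arbitrary example data
Be = [1; 0]; Bo = [0; 1];
Q = eye(n); R = 1;
Po = zeros(n); Pe = zeros(n);                % arbitrary PSD starting points
for k = 1:1000
    Po_new = Q + Ao'*Pe*Ao - Ao'*Pe*Bo*((R + Bo'*Pe*Bo)\(Bo'*Pe*Ao));
    Pe_new = Q + Ae'*Po_new*Ae - Ae'*Po_new*Be*((R + Be'*Po_new*Be)\(Be'*Po_new*Ae));
    if norm(Po_new - Po) + norm(Pe_new - Pe) < 1e-10, break; end
    Po = Po_new; Pe = Pe_new;
end
Po = Po_new; Pe = Pe_new;    % (approximate) fixed point, if it converged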

(d) The pair of equations discussed in (b) and (c) can be rewritten as a single matrix equation

\[ P = \hat{Q} + A^T P A - A^T P B \left( \hat{R} + B^T P B \right)^{-1} B^T P A, \]

where

\[ A = \begin{bmatrix} 0 & A_e \\ A_o & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & B_e \\ B_o & 0 \end{bmatrix}, \quad \hat{Q} = \begin{bmatrix} Q & 0 \\ 0 & Q \end{bmatrix}, \quad \hat{R} = \begin{bmatrix} R & 0 \\ 0 & R \end{bmatrix}, \quad P = \begin{bmatrix} P_o & 0 \\ 0 & P_e \end{bmatrix}. \]

Note that this is simply an algebraic Riccati equation. Using the fact that $u_t = -(R + B_t^T P_{t+1} B_t)^{-1} B_t^T P_{t+1} A_t x_t$, the optimal control can be obtained from $P$ by taking

\[ u_t = \begin{cases} -(R + B_e^T P_o B_e)^{-1} B_e^T P_o A_e \, x_t = K_e x_t, & t \text{ even,} \\ -(R + B_o^T P_e B_o)^{-1} B_o^T P_e A_o \, x_t = K_o x_t, & t \text{ odd.} \end{cases} \]
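Continuing the sketch above, part (d) can be checked numerically by handing the stacked matrices to a standard ARE solver and comparing the diagonal blocks of the result with the fixed point of the iteration; dare from the Control System Toolbox is assumed to be available here.

% solve the single bigger ARE for the stacked system and compare
Abig = [zeros(n) Ae; Ao zeros(n)];
Bbig = [zeros(n,m) Be; Bo zeros(n,m)];
Qbig = blkdiag(Q, Q); Rbig = blkdiag(R, R);
Pbig = dare(Abig, Bbig, Qbig, Rbig);
norm(Pbig - blkdiag(Po, Pe))     % should be small if the iteration converged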

3. LQR for a simple mechanical system. Consider the mechanical system shown below:

[Figure: four masses $m_1, \ldots, m_4$ in a line, coupled by springs $k_1, \ldots, k_4$, with $k_1$ attached to the wall at left; $d_1, \ldots, d_4$ are the displacements, $u_1$ acts between masses 1 and 2, $u_2$ acts between masses 3 and 4, and $u_3$ acts between the wall and mass 2.]

Here $d_1, \ldots, d_4$ are displacements from an equilibrium position, and $u_1, \ldots, u_3$ are forces acting on the masses. Note that $u_1$ is a tension between the first and second masses, $u_2$ is a tension between the third and fourth masses, and $u_3$ is a force between the wall (at left) and the second mass. You can take the mass and stiffness constants to all be one: $m_1 = \cdots = m_4 = 1$, $k_1 = \cdots = k_4 = 1$.

(a) Describe the system as a linear dynamical system with state $(d, \dot d) \in \mathbf{R}^8$.

(b) Using the cost function

\[ J = \int_0^\infty \left( \|d(t)\|^2 + \|u(t)\|^2 \right) dt, \]

find the optimal state feedback gain matrix $K$. You may find the Matlab function lqr() useful, but check sign conventions: it's not unusual for the optimal feedback gain to be defined as $u = -Kx$ instead of $u = Kx$ (which is what we use).

(c) Plot $d(t)$ versus $t$ for the open loop ($u(t) = 0$) and closed loop ($u(t) = Kx(t)$) cases using an arbitrary initial condition (but not, of course, zero).

(d) Solve the ARE for this problem using the method based on the Hamiltonian, described in lecture 4. Verify that you get the same result as you did using the Matlab function lqr().

Solution:

(a) The equations of motion for this system can be written as

\[ \begin{aligned} m_1 \ddot d_1 &= -k_1 d_1 + k_2 (d_2 - d_1) + u_1 &&= -2d_1 + d_2 + u_1 \\ m_2 \ddot d_2 &= -k_2 (d_2 - d_1) + k_3 (d_3 - d_2) - u_1 - u_3 &&= d_1 - 2d_2 + d_3 - u_1 - u_3 \\ m_3 \ddot d_3 &= -k_3 (d_3 - d_2) + k_4 (d_4 - d_3) + u_2 &&= d_2 - 2d_3 + d_4 + u_2 \\ m_4 \ddot d_4 &= -k_4 (d_4 - d_3) - u_2 &&= d_3 - d_4 - u_2. \end{aligned} \]

Defining $d_5 = \dot d_1, \ldots, d_8 = \dot d_4$ and $x = (d_1, \ldots, d_8)$, this system can be described by $\dot x = Ax + Bu$, where

\[ A = \begin{bmatrix} 0 & I \\ T & 0 \end{bmatrix}, \quad T = \begin{bmatrix} -2 & 1 & 0 & 0 \\ 1 & -2 & 1 & 0 \\ 0 & 1 & -2 & 1 \\ 0 & 0 & 1 & -1 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ F \end{bmatrix}, \quad F = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & -1 & 0 \end{bmatrix}. \]

(b) Setting $Q$ and $R$ in the LQR cost function as

\[ Q = \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}, \qquad R = I, \]

and using the Matlab function -lqr() (the Matlab function returns the negative of the optimal gain matrix in our notation) we obtain the $3 \times 8$ gain matrix

\[ K = \begin{bmatrix} .579 & .3393 & .447 & .948 & .3975 & .4758 & .569 & .5787 \\ .754 & .755 & .997 & .3923 & .54 & .23 & .234 & .9 \\ .458 & .768 & .2497 & .537 & .6683 & .442 & .8593 & .8796 \end{bmatrix}. \]

(c) Using an arbitrary nonzero initial condition $x(0)$, the displacements $d_1, \ldots, d_4$ for the open loop and closed loop systems are shown below.
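The computations in parts (a) and (b) can be reproduced with a few lines of Matlab. This sketch is a reconstruction rather than the original script; the initial condition and time grid in the simulation are arbitrary choices.

% LQR for the four-mass spring system
T = [-2 1 0 0; 1 -2 1 0; 0 1 -2 1; 0 0 1 -1];
F = [1 0 0; -1 0 -1; 0 1 0; 0 -1 0];
A = [zeros(4) eye(4); T zeros(4)];
B = [zeros(4,3); F];
Q = blkdiag(eye(4), zeros(4)); R = eye(3);
K = -lqr(A, B, Q, R);            % lqr returns the gain for u = -Kx
x0 = [1 0 0 0 0 0 0 0]';         % arbitrary nonzero initial condition
tspan = 0:0.05:60;
[~, xol] = ode45(@(t,x) A*x, tspan, x0);          % open loop
[~, xcl] = ode45(@(t,x) (A + B*K)*x, tspan, x0);  % closed loop
plot(tspan, xol(:,1:4)); figure; plot(tspan, xcl(:,1:4));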

[Figure: mass displacements $d_1, \ldots, d_4$ versus $t$ for the open loop system.]

[Figure: mass displacements $d_1, \ldots, d_4$ versus $t$ for the closed loop system.]

(d) $K$ can be found from the Hamiltonian matrix by the following method. Form the $16 \times 16$ Hamiltonian matrix $H$. Find the eigenvalues of $H$. Find the eigenvectors corresponding to the 8 stable eigenvalues of $H$. Form the $16 \times 8$ matrix $M = [v_1 \cdots v_8]$ from the stable eigenvectors of $H$. Partition $M$ into two $8 \times 8$ matrices,

\[ M = \begin{bmatrix} X \\ Y \end{bmatrix}. \]

Then $K = -R^{-1} B^T Y X^{-1}$.
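A short Matlab sketch of this procedure, continuing with the A, B, Q, R built above, might look as follows. Since the ordering of eig's output is not guaranteed, the stable eigenvectors are selected explicitly.

% solve the ARE via the Hamiltonian matrix and recover K
H = [A, -B*(R\B'); -Q, -A'];      % 16x16 Hamiltonian matrix
[V, D] = eig(H);
stab = real(diag(D)) < 0;         % selects the 8 stable eigenvalues
M = V(:, stab);                   % 16x8 matrix of stable eigenvectors
X = M(1:8,:); Y = M(9:16,:);
P = real(Y/X);                    % P = Y X^{-1} solves the ARE
K_ham = -R\(B'*P);                % gain in the u = Kx convention
norm(K_ham - K)                   % should match -lqr(A,B,Q,R)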

4. Hamiltonian matrices. A matrix

\[ M = \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix} \]

with $M_{ij} \in \mathbf{R}^{n \times n}$ is Hamiltonian if $JM$ is symmetric, or equivalently $JMJ = M^T$, where

\[ J = \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix}. \]

(a) Show that $M$ is Hamiltonian if and only if $M_{22} = -M_{11}^T$ and $M_{12}$ and $M_{21}$ are each symmetric.

(b) Show that if $v$ is an eigenvector of $M$, then $Jv$ is an eigenvector of $M^T$.

(c) Show that if $\lambda$ is an eigenvalue of $M$, then so is $-\lambda$.

(d) Show that $\det(sI - M) = \det(-sI - M)$, so that $\det(sI - M)$ is a polynomial in $s^2$.

Solution:

(a) Note that $J^T = J^{-1} = -J$ and $J^2 = -I$. We have

\[ JMJ = \begin{bmatrix} M_{21} & M_{22} \\ -M_{11} & -M_{12} \end{bmatrix} \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix} = \begin{bmatrix} -M_{22} & M_{21} \\ M_{12} & -M_{11} \end{bmatrix}, \]

so $JMJ = M^T$ if and only if

\[ \begin{bmatrix} -M_{22} & M_{21} \\ M_{12} & -M_{11} \end{bmatrix} = \begin{bmatrix} M_{11}^T & M_{21}^T \\ M_{12}^T & M_{22}^T \end{bmatrix}, \]

that is, $M_{22} = -M_{11}^T$, $M_{12} = M_{12}^T$, and $M_{21} = M_{21}^T$, and the claim follows.

(b) Using $M^T = JMJ$ and $J^2 = -I$,

\[ Mv = \lambda v \implies M^T (Jv) = JMJ(Jv) = JM(J^2 v) = -JMv = -\lambda (Jv). \]

So $-\lambda$ is an eigenvalue of $M^T$ and $Jv$ is a corresponding eigenvector.

(c) It follows from (b), since the eigenvalues of $M$ and $M^T$ are identical.

(d) A little algebra:

\[ \begin{aligned} \det(sI - M) &= \det\left((sI - M)^T\right) = \det(sI - M^T) = \det(sI - JMJ) \\ &= \det(-J(sI + M)J) = \det(sI + M) = (-1)^{2n} \det(-sI - M) \\ &= \det(-sI - M), \end{aligned} \]

where we used $-J(sI + M)J = -sJ^2 - JMJ = sI - JMJ$ and $\det(-J)\det(J) = \det(-J^2) = \det(I) = 1$. Since the characteristic polynomial satisfies $p(s) = p(-s)$, it is a polynomial in $s^2$.
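These facts are easy to confirm numerically. The sketch below builds a random Hamiltonian matrix from the block characterization in part (a) and checks that $JM$ is symmetric and that the spectrum is symmetric about the origin.

% random Hamiltonian matrix via the characterization of part (a)
n = 4;
M11 = randn(n);
M12 = randn(n); M12 = M12 + M12';    % symmetric
M21 = randn(n); M21 = M21 + M21';    % symmetric
M = [M11 M12; M21 -M11'];
J = [zeros(n) eye(n); -eye(n) zeros(n)];
norm(J*M - (J*M)')                   % ~0: JM is symmetric
ev = eig(M);
norm(sort(ev) - sort(-ev))           % ~0: eigenvalues come in +/- pairs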

5. Value function for infinite-horizon LQR problem. In this problem you will show that the minimum cost-to-go starting in state $z$ is a quadratic form in $z$. Let $Q = Q^T \geq 0$, $R = R^T > 0$ and define

\[ J(u, z) = \sum_{t=0}^{\infty} \left( x_t^T Q x_t + u_t^T R u_t \right) \]

where $x_{t+1} = Ax_t + Bu_t$, $x_0 = z$. Note that $u$ is a sequence in $\mathbf{R}^m$, and $z \in \mathbf{R}^n$. Of course, for some $u$'s and $z$'s, $J(u, z) = \infty$. Define

\[ V(z) = \min_u J(u, z). \]

Note that this is a minimum over all possible input sequences. This is just like the previous problem, except that here $u$ has infinite dimension. Assume $(A, B)$ is controllable, so $V(z) < \infty$ for all $z$.

(a) Show that for all $\lambda \in \mathbf{R}$, $J(\lambda u, \lambda z) = \lambda^2 J(u, z)$, and conclude that

\[ V(\lambda z) = \lambda^2 V(z). \tag{1} \]

(b) Let $u$ and $\tilde u$ be two input sequences, and let $z$ and $\tilde z$ be two initial states. Show that

\[ J(u + \tilde u, z + \tilde z) + J(u - \tilde u, z - \tilde z) = 2J(u, z) + 2J(\tilde u, \tilde z). \]

Minimize the RHS with respect to $u$ and $\tilde u$, and conclude

\[ V(z + \tilde z) + V(z - \tilde z) \leq 2V(z) + 2V(\tilde z). \]

(c) Apply the above inequality with $\frac{1}{2}(z + \tilde z)$ substituted for $z$ and $\frac{1}{2}(z - \tilde z)$ substituted for $\tilde z$ to get:

\[ V(z + \tilde z) + V(z - \tilde z) = 2V(z) + 2V(\tilde z). \tag{2} \]

(d) The two properties (1) and (2) of $V$ are enough to guarantee that $V$ is a quadratic form. Here is one way to see it (you supply all details): take gradients in (2) with respect to $z$ and $\tilde z$ and add to get

\[ \nabla V(z + \tilde z) = \nabla V(z) + \nabla V(\tilde z). \tag{3} \]

From (1) show that

\[ \nabla V(\lambda z) = \lambda \nabla V(z). \tag{4} \]

(3) and (4) mean that $\nabla V(z)$, which is a vector, is linear in $z$, and hence has a matrix representation:

\[ \nabla V(z) = Mz, \tag{5} \]

where $M \in \mathbf{R}^{n \times n}$.

(e) Show that $V(z) = z^T P z$, where $P = \frac{1}{4}(M + M^T)$. Show that $P = P^T \geq 0$. Thus we are done. Hint: $V(z) = V(0) + \int_0^1 \nabla V(\theta z)^T z \, d\theta$.

Solution:

(a) Let $x_t$ be the state that corresponds to an input $u_t$ and an initial condition $z$. Then

\[ x_t = A^t z + \sum_{i=0}^{t-1} A^i B u_{t-1-i}. \]

Now denote by $x_{\mathrm{new},t}$ the state that corresponds to the input $\lambda u_t$ and the initial condition $\lambda z$. Obviously,

\[ x_{\mathrm{new},t} = A^t (\lambda z) + \sum_{i=0}^{t-1} A^i B (\lambda u_{t-1-i}) = \lambda x_t. \]

Thus,

\[ J(\lambda u, \lambda z) = \sum_{t=0}^{\infty} \left( \lambda^2 x_t^T Q x_t + \lambda^2 u_t^T R u_t \right) = \lambda^2 J(u, z). \]

(b) Suppose

\[ x_{t+1} = A x_t + B u_t, \quad x_0 = z, \qquad \tilde x_{t+1} = A \tilde x_t + B \tilde u_t, \quad \tilde x_0 = \tilde z. \]

Adding or subtracting the above equations yields

\[ x_{t+1} \pm \tilde x_{t+1} = A(x_t \pm \tilde x_t) + B(u_t \pm \tilde u_t), \quad x_0 \pm \tilde x_0 = z \pm \tilde z. \]

Thus, $x_t \pm \tilde x_t$ is the state that corresponds to the input $u_t \pm \tilde u_t$ and initial condition $z \pm \tilde z$, respectively. Now

\[ \begin{aligned} & J(u + \tilde u, z + \tilde z) + J(u - \tilde u, z - \tilde z) \\ &= \sum_{t=0}^{\infty} \left( (x_t + \tilde x_t)^T Q (x_t + \tilde x_t) + (x_t - \tilde x_t)^T Q (x_t - \tilde x_t) + (u_t + \tilde u_t)^T R (u_t + \tilde u_t) + (u_t - \tilde u_t)^T R (u_t - \tilde u_t) \right) \\ &= \sum_{t=0}^{\infty} \left( 2 x_t^T Q x_t + 2 \tilde x_t^T Q \tilde x_t + 2 u_t^T R u_t + 2 \tilde u_t^T R \tilde u_t \right) = 2J(u, z) + 2J(\tilde u, \tilde z). \end{aligned} \]

So we have

\[ J(u + \tilde u, z + \tilde z) + J(u - \tilde u, z - \tilde z) = 2J(u, z) + 2J(\tilde u, \tilde z). \]

Minimizing both sides with respect to $u$ and $\tilde u$ gives

\[ \min_{u, \tilde u} \left\{ J(u + \tilde u, z + \tilde z) + J(u - \tilde u, z - \tilde z) \right\} = \min_u 2J(u, z) + \min_{\tilde u} 2J(\tilde u, \tilde z). \]

The right-hand side of this equation is obviously $2V(z) + 2V(\tilde z)$. Examining the left-hand side,

\[ \min_{u, \tilde u} \left\{ J(u + \tilde u, z + \tilde z) + J(u - \tilde u, z - \tilde z) \right\} \geq \min_{u, \tilde u} J(u + \tilde u, z + \tilde z) + \min_{u, \tilde u} J(u - \tilde u, z - \tilde z) = V(z + \tilde z) + V(z - \tilde z). \]

Therefore,

\[ V(z + \tilde z) + V(z - \tilde z) \leq 2V(z) + 2V(\tilde z). \]

(c) Performing this substitution yields

\[ V(z) + V(\tilde z) \leq 2V\left( \frac{z + \tilde z}{2} \right) + 2V\left( \frac{z - \tilde z}{2} \right). \]

Using part (a), we have

\[ V(z) + V(\tilde z) \leq \frac{1}{2} V(z + \tilde z) + \frac{1}{2} V(z - \tilde z). \]

Combining this result with the one from part (b) yields

\[ V(z + \tilde z) + V(z - \tilde z) = 2V(z) + 2V(\tilde z). \]

(d) Taking gradients of both sides of equation (2) with respect to $z$ and with respect to $\tilde z$ gives

\[ \nabla V(z + \tilde z) + \nabla V(z - \tilde z) = 2 \nabla V(z), \qquad \nabla V(z + \tilde z) - \nabla V(z - \tilde z) = 2 \nabla V(\tilde z); \]

adding these yields (3). Taking gradients of both sides of equation (1) in part (a), we obtain

\[ \lambda \nabla V(\lambda z) = \lambda^2 \nabla V(z) \implies \nabla V(\lambda z) = \lambda \nabla V(z). \]

(e) Thus $\nabla V(z)$ is linear in $z$, and hence there is an $M \in \mathbf{R}^{n \times n}$ such that $\nabla V(z) = Mz$. Using the hint,

\[ V(z) = V(0) + \int_0^1 \nabla V(\theta z)^T z \, d\theta = V(0) + \int_0^1 \theta \, z^T M^T z \, d\theta = V(0) + \frac{z^T M^T z}{2}. \]

Substituting $\lambda = 0$ in equation (1) in part (a) we obtain $V(0) = 0$. Thus,

\[ V(z) = \frac{z^T M^T z}{2} = \frac{1}{2} \left( \frac{z^T M^T z}{2} + \frac{z^T M z}{2} \right) = z^T \left( \frac{M + M^T}{4} \right) z, \]

where we used $z^T M^T z = z^T M z$ (a scalar equals its transpose). $P$ is obviously symmetric; it is also positive semidefinite:

\[ J(u, z) \geq 0 \ \ \forall u, z \implies V(z) = z^T P z \geq 0 \ \ \forall z \implies P \succeq 0. \]
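Since $V(z) = z^T P z$ with $P$ the stabilizing solution of the ARE, properties (1) and (2) can be spot-checked numerically once $P$ is computed. The sketch below uses arbitrary example data and dare from the Control System Toolbox.

% spot-check properties (1) and (2) of the LQR value function
A = [1 .3; -.2 .9]; B = [0; 1];      % arbitrary controllable pair
Q = eye(2); R = 1;
P = dare(A, B, Q, R);                % infinite-horizon value matrix
V = @(z) z'*P*z;
z = randn(2,1); zt = randn(2,1); lam = randn;
V(lam*z) - lam^2*V(z)                       % ~0, property (1)
V(z+zt) + V(z-zt) - 2*V(z) - 2*V(zt)        % ~0, property (2)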

6. Closed-loop stability for receding horizon LQR. We consider the system $x_{t+1} = Ax_t + Bu_t$, $y_t = Cx_t$, with

\[ A = \begin{bmatrix} 1 & .4 & 0 & 0 \\ -.6 & 1 & .4 & 0 \\ 0 & .4 & 1 & -.6 \\ 0 & 0 & .4 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad C = \begin{bmatrix} 0 & 0 & 0 & 1 \end{bmatrix}. \]

Consider the receding horizon LQR control problem with horizon $T$, using state cost matrix $Q = C^T C$ and control cost $R = 1$. For each $T$, the receding horizon control has linear state feedback form $u_t = K_T x_t$. The associated closed-loop system has the form $x_{t+1} = (A + BK_T) x_t$. What is the smallest horizon $T$ for which the closed-loop system is stable?

Interpretation. As you increase $T$, receding horizon control becomes less myopic and greedy; it takes into account the effects of current actions on the long-term system behavior. If the horizon is too short, the actions taken can result in an unstable closed-loop system.

Solution: The closed-loop system $x_{t+1} = (A + BK_T)x_t$ is stable if the spectral radius of $A + BK_T$ is strictly less than 1. (The spectral radius of a square matrix is $\max_{i=1,\ldots,n} |\lambda_i|$, where $\lambda_1, \ldots, \lambda_n$ are its eigenvalues.) In order to determine the smallest horizon that gives a stable closed-loop system, we will find $K_T$ for $T = 1, 2, \ldots$ and check whether the spectral radius of $A + BK_T$ is less than 1. We have

\[ K_T = -\left( R + B^T P_T B \right)^{-1} B^T P_T A, \quad \text{where} \quad P_1 = Q, \quad P_{i+1} = Q + A^T P_i A - A^T P_i B \left( R + B^T P_i B \right)^{-1} B^T P_i A. \]

A simple Matlab script which performs the method described is shown below. The first plot shows the optimal state feedback gains versus the receding time horizon. The second plot shows the spectral radius of $A + BK_T$ as a function of $T$. We can see from the second plot that for $T \geq 7$, the resulting closed-loop system is stable.

[Figure: the four components of the LQR-optimal state feedback gain $K_T$ versus the horizon $T$.]

% closed-loop stability for receding horizon LQR
% data
A = [1 .4 0 0; -.6 1 .4 0; 0 .4 1 -.6; 0 0 .4 1];
B = [1 0 0 0]';
C = [0 0 0 1];
Q = C'*C; R = 1;
m = 1; n = 4; T = 25;
% recursion
P = zeros(n,n,T+1); K = zeros(m,n,T);
P(:,:,T+1) = Q;
spec_rad = zeros(1,T);
for i = T:-1:1
    % LQR-optimal state feedback
    K(:,:,i) = -inv(R + B'*P(:,:,i+1)*B)*B'*P(:,:,i+1)*A;
    % spectral radius
    spec_rad(T-i+1) = max(abs(eig(A + B*K(:,:,i))));
    P(:,:,i) = Q + A'*P(:,:,i+1)*A + A'*P(:,:,i+1)*B*K(:,:,i);
end
% plots
figure(1);
t = T:-1:1;
K = shiftdim(K);

subplot(4,1,1); plot(t,K(1,:)); ylabel('K1(t)');
subplot(4,1,2); plot(t,K(2,:)); ylabel('K2(t)');
subplot(4,1,3); plot(t,K(3,:)); ylabel('K3(t)');
subplot(4,1,4); plot(t,K(4,:)); ylabel('K4(t)');
xlabel('t');
% print -depsc receding_horizon_gains.eps
figure(2);
plot(1:T, spec_rad); hold on;
plot(1:T, spec_rad, '.');
xlabel('T'); ylabel('\rho(A+BK_T)');
title('Spectral radius of A + BK_T versus T');
% print -depsc receding_horizon_specrad.eps

[Figure: spectral radius $\rho(A + BK_T)$ versus time horizon $T$; the curve crosses below 1 at $T = 7$.]

7. LQR with exponential weighting. A common variation on the LQR problem includes explicit time-varying weighting factors on the state and input costs,

\[ J = \sum_{\tau=0}^{N-1} \gamma^\tau \left( x_\tau^T Q x_\tau + u_\tau^T R u_\tau \right) + \gamma^N x_N^T Q_f x_N, \]

where $x_{t+1} = Ax_t + Bu_t$, $x_0$ is given, and, as usual, we assume $Q = Q^T \geq 0$, $Q_f = Q_f^T \geq 0$ and $R = R^T > 0$ are constant. The parameter $\gamma$, called the exponential weighting factor, is positive. For $\gamma = 1$, this reduces to the standard LQR cost function. For $\gamma < 1$, the penalty for future state and input deviations is smaller than in the present; in this case we call $\gamma$ the discount factor or forgetting factor. When $\gamma > 1$, future costs are accentuated compared to present costs. This gives added incentive for the input to steer the state towards zero quickly.

(a) Note that we can find the input sequence $u_0, \ldots, u_{N-1}$ that minimizes $J$ using standard LQR methods, by considering the state and input costs as time varying, with $Q_t = \gamma^t Q$, $R_t = \gamma^t R$, and final cost given by $\gamma^N Q_f$. Thus, we know at least one way to solve the exponentially weighted LQR problem. Use this method to find the recursive equations that give the optimal $u$.

(b) Exponential weights can also be incorporated directly into a dynamic programming formulation. We define

\[ W_t(z) = \min \left( \sum_{\tau=t}^{N-1} \gamma^{\tau - t} \left( x_\tau^T Q x_\tau + u_\tau^T R u_\tau \right) + \gamma^{N-t} x_N^T Q_f x_N \right), \]

where $x_t = z$, $x_{\tau+1} = Ax_\tau + Bu_\tau$, and the minimum is over $u_t, \ldots, u_{N-1}$. This is the minimum cost-to-go, if we started in state $z$ at time $t$, with the time weighting also starting at $t$. Argue that we have

\[ W_N(z) = z^T Q_f z, \qquad W_t(z) = \min_w \left( z^T Q z + w^T R w + \gamma W_{t+1}(Az + Bw) \right), \]

and that the minimizing $w$ is in fact $u_t$. In other words, work out a backwards recursion for $W_t$, and give an expression for $u_t$ in terms of $W_t$. Show that this method yields the same $u$ as the first method.

(c) Yet another method can be used to find $u$. Define a new system as

\[ y_{t+1} = \gamma^{1/2} A y_t + \gamma^{1/2} B z_t, \qquad y_0 = x_0. \]

Argue that we have $y_t = \gamma^{t/2} x_t$, provided $z_t = \gamma^{t/2} u_t$, for $t = 0, \ldots, N$. With this choice of $z$, the exponentially weighted LQR cost $J$ for the original system is given by

\[ J = \sum_{\tau=0}^{N-1} \left( y_\tau^T Q y_\tau + z_\tau^T R z_\tau \right) + y_N^T Q_f y_N, \]

i.e., the unweighted LQR cost for the modified system. We can use the standard formulas to obtain the optimal input for the modified system $z$, and from this, we can get $u$. Do this, and verify that once again, you get the same $u$.

Solution:

(a) The value function is

\[ V_t(z) = \min \left( \sum_{\tau=t}^{N-1} \gamma^\tau \left( x_\tau^T Q x_\tau + u_\tau^T R u_\tau \right) + \gamma^N x_N^T Q_f x_N \right), \]

where the minimum is over all input sequences $u_t, \ldots, u_{N-1}$, and $x_t = z$, $x_{\tau+1} = Ax_\tau + Bu_\tau$ for $\tau = t, \ldots, N-1$. We can express this as

\[ V_t(z) = \min \left( \sum_{\tau=t}^{N-1} \left( x_\tau^T Q_\tau x_\tau + u_\tau^T R_\tau u_\tau \right) + \gamma^N x_N^T Q_f x_N \right), \]

where $Q_\tau = \gamma^\tau Q$ and $R_\tau = \gamma^\tau R$. This is just a standard LQR problem with time-varying $Q$ and $R$. Thus from the lecture notes we have $V_t(z) = z^T P_t z$ and $u_t = K_t x_t$, where $P_t$ and $K_t$ are given by the following recursion:

\[ \begin{aligned} P_N &= \gamma^N Q_f, \\ P_t &= Q_t + A^T P_{t+1} A - A^T P_{t+1} B \left( R_t + B^T P_{t+1} B \right)^{-1} B^T P_{t+1} A \\ &= \gamma^t Q + A^T P_{t+1} A - A^T P_{t+1} B \left( \gamma^t R + B^T P_{t+1} B \right)^{-1} B^T P_{t+1} A, \\ K_t &= -\left( R_t + B^T P_{t+1} B \right)^{-1} B^T P_{t+1} A = -\left( \gamma^t R + B^T P_{t+1} B \right)^{-1} B^T P_{t+1} A. \end{aligned} \]

(b) Clearly we have $W_N(z) = z^T Q_f z$, since at the last step the cost is not a function of the input. Now let $u_t = w$. We can apply the dynamic programming principle to the equation for $W_t(z)$ by first minimizing with respect to $w$ and then with respect to the other inputs. If we do that, it is easy to see that

\[ W_t(z) = \min_w \left( z^T Q z + w^T R w + \gamma W_{t+1}(Az + Bw) \right). \]

Note that the minimizing $w$ is equal to the optimal input $u_t$. This is because $W_0(z) = V_0(z)$ for all $z$ and therefore a set of minimizing inputs for the above HJ equation will also minimize $J$.

We will now show by induction that $W_t(z) = z^T S_t z$, i.e., that $W_t(z)$ is a quadratic form in $z$. Let us assume that this is true for $t = \tau + 1$. Then using the above HJ equation we have

\[ W_\tau(z) = \min_w \left( z^T Q z + w^T R w + \gamma (Az + Bw)^T S_{\tau+1} (Az + Bw) \right). \]

To find the optimal $w$, we take the derivative with respect to $w$ of the argument within the minimum above, and set it equal to zero. We get

\[ w = -\gamma \left( R + \gamma B^T S_{\tau+1} B \right)^{-1} B^T S_{\tau+1} A z. \]

By substituting the optimal $w$ in the expression for $W_\tau(z)$ we obtain

\[ W_\tau(z) = z^T \left( Q + \gamma A^T S_{\tau+1} A - \gamma^2 A^T S_{\tau+1} B \left( R + \gamma B^T S_{\tau+1} B \right)^{-1} B^T S_{\tau+1} A \right) z, \]

which is still a quadratic form. This combined with the fact that $W_N(z) = z^T Q_f z$ is sufficient to see that $W_t(z)$ is indeed a quadratic form. Also, the optimum input $u_t$ is given by

\[ u_t = -\gamma \left( R + \gamma B^T S_{t+1} B \right)^{-1} B^T S_{t+1} A x_t. \]

Actually $S_t$ and $P_t$ are closely related, since we have

\[ W_t(z) = \min \left( \sum_{\tau=t}^{N-1} \gamma^{\tau-t} \left( x_\tau^T Q x_\tau + u_\tau^T R u_\tau \right) + \gamma^{N-t} x_N^T Q_f x_N \right) = \gamma^{-t} \min \left( \sum_{\tau=t}^{N-1} \gamma^{\tau} \left( x_\tau^T Q x_\tau + u_\tau^T R u_\tau \right) + \gamma^{N} x_N^T Q_f x_N \right) = \gamma^{-t} V_t(z). \]

Thus $z^T S_t z = \gamma^{-t} z^T P_t z$ for all $z$, or in other words $S_t = \gamma^{-t} P_t$. Substituting this into the formula for $u_t$ we get

\[ u_t = -\gamma \left( R + \gamma B^T (\gamma^{-(t+1)} P_{t+1}) B \right)^{-1} B^T (\gamma^{-(t+1)} P_{t+1}) A x_t = -\left( \gamma^t R + B^T P_{t+1} B \right)^{-1} B^T P_{t+1} A x_t. \]

Therefore this method gives the same formula for $u_t$ as the one of part (a).

(c) We want to show that $y_t = \gamma^{t/2} x_t$, provided that $z_t = \gamma^{t/2} u_t$. This can be shown by induction. The argument holds for $t = 0$, since $y_0 = x_0$. Now suppose that $y_t = \gamma^{t/2} x_t$. We have

\[ y_{t+1} = \gamma^{1/2} A y_t + \gamma^{1/2} B z_t = \gamma^{(t+1)/2} A x_t + \gamma^{(t+1)/2} B u_t = \gamma^{(t+1)/2} (A x_t + B u_t) = \gamma^{(t+1)/2} x_{t+1}. \]

Thus the argument also holds for $t+1$. Now, let's express the cost function in terms of $y_t$ and $z_t$. We have

\[ \begin{aligned} J &= \sum_{\tau=0}^{N-1} \gamma^\tau \left( x_\tau^T Q x_\tau + u_\tau^T R u_\tau \right) + \gamma^N x_N^T Q_f x_N \\ &= \sum_{\tau=0}^{N-1} \left( (\gamma^{\tau/2} x_\tau)^T Q (\gamma^{\tau/2} x_\tau) + (\gamma^{\tau/2} u_\tau)^T R (\gamma^{\tau/2} u_\tau) \right) + (\gamma^{N/2} x_N)^T Q_f (\gamma^{N/2} x_N) \\ &= \sum_{\tau=0}^{N-1} \left( y_\tau^T Q y_\tau + z_\tau^T R z_\tau \right) + y_N^T Q_f y_N. \end{aligned} \]

Using the formulas from the lecture notes we get that the minimizing $z_t$ for the above expression is given by

\[ z_t = -\gamma \left( R + \gamma B^T \tilde P_{t+1} B \right)^{-1} B^T \tilde P_{t+1} A y_t, \]

where $\tilde P_t$ is found by the following recursion:

\[ \tilde P_N = Q_f, \qquad \tilde P_t = Q + \gamma A^T \tilde P_{t+1} A - \gamma^2 A^T \tilde P_{t+1} B \left( R + \gamma B^T \tilde P_{t+1} B \right)^{-1} B^T \tilde P_{t+1} A. \]

Note that since the original problem and this modified problem have the same cost function, the optimum $z_t$ will give us the optimum $u_t$ of the original problem, using the relation $z_t = \gamma^{t/2} u_t$. We can also show by an inductive argument that $P_t = \gamma^t \tilde P_t$. Combining these two relations in the formula for $z_t$, we get

\[ \gamma^{t/2} u_t = -\gamma^{-t} \left( R + \gamma^{-t} B^T P_{t+1} B \right)^{-1} B^T P_{t+1} A y_t = -\gamma^{t/2} \left( \gamma^t R + B^T P_{t+1} B \right)^{-1} B^T P_{t+1} A x_t. \]

We therefore get

\[ u_t = -\left( \gamma^t R + B^T P_{t+1} B \right)^{-1} B^T P_{t+1} A x_t, \]

which is the same expression as the one obtained in parts (a) and (b).
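As a final check, methods (a) and (c) are easy to compare numerically: run the $\gamma$-weighted Riccati recursion of part (a) and the standard (unweighted) recursion for the scaled system of part (c) on the same data, and compare the resulting state feedback gains. The data in the sketch below are arbitrary, chosen only for illustration.

% compare methods (a) and (c) for exponentially weighted LQR
A = [1 .2; -.1 .9]; B = [0; 1];
Q = eye(2); Qf = eye(2); R = 1;
N = 20; gam = 0.8; n = 2;
% method (a): time-varying LQR with Q_t = gam^t Q, R_t = gam^t R
P = gam^N * Qf; Ka = zeros(N, n);
for t = N-1:-1:0
    Ka(t+1,:) = -(gam^t*R + B'*P*B) \ (B'*P*A);
    P = gam^t*Q + A'*P*A + A'*P*B*Ka(t+1,:);
end
% method (c): standard LQR for the scaled system (sqrt(gam)A, sqrt(gam)B);
% since z_t = gam^{t/2} u_t and y_t = gam^{t/2} x_t, the feedback z_t = Kc y_t
% gives u_t = Kc x_t, so the gains should agree with method (a)
As = sqrt(gam)*A; Bs = sqrt(gam)*B;
Pt = Qf; Kc = zeros(N, n);
for t = N-1:-1:0
    Kc(t+1,:) = -(R + Bs'*Pt*Bs) \ (Bs'*Pt*As);
    Pt = Q + As'*Pt*As + As'*Pt*Bs*Kc(t+1,:);
end
norm(Ka - Kc)    % ~0: both methods give the same gains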