Structure of the Hessian Graham C. Goodwin September 2004
1. Introduction We saw in earlier lectures that a core ingredient in quadratic constrained optimisation problems is the Hessian matrix H. So far we have simply given an in principle approach to the evaluation of this matrix. That is, for a system with state equations x k+1 = Ax k + Bu k, we can compute the Hessian as H=Γ T QΓ+R, whereγ, Q, R.
This will be satisfactory for simple problems. However, for more complex problems (for example, high order systems or problems having mixed stable and unstable modes) this brute force approach may fail. A hint as to the source of the difficulties is that the direct way of computing the Hessian depends on powers of the system matrix A. Clearly, if the system has unstable modes, then some entries ofγwill diverge as N increases.
We will show in this lecture that this problem can be resolved by focusing attention on the stable and unstable parts of the system separately.
2. Separating Stable and Unstable Modes Our eventual goal in this lecture is to gain a better understanding of the structure of the Hessian matrix particularly for large optimisation horizons. However, as mentioned above, the straightforward approach to evaluating the Hessian will often meet difficulties for open loop unstable plants due to exponential divergence of the system impulse response.
One way of addressing this problem is to recognise that there is an intimate connection between stability and causality. In particular, a system having all modes unstable becomes stable if viewed in reverse time, that is, as an anti-causal system. This line of reasoning leads to an alternative viewpoint in which unstable modes are treated differently. We show below that this leads to an equivalent problem formulation with a different Hessian having different properties.
Consider a discrete time linear system and suppose that it has no eigenvalues on the unit circle. We can then partition the state vector x k R n as [ ] x s x k = k x u, k where the states x s k and xu are associated with the stable and k unstable modes, respectively.
We can then factor the state equations into stable and unstable parts as follows: ][ ] ] x s [ ] [ x s As 0 k+1 x u = 0 A k+1 u y k = C x k = [ C s k x u k + [ Bs B u ] [ ] x s C k u x u, k u k, (1) where u k R m and y k R p (p m). The eigenvalues of A s have moduli less than one, and the eigenvalues of A u have moduli greater than one.
We can then express the solution of the system equations as k 1 x s k = A s k xs 0 + j=0 A k 1 j s B s u j for k= 1,...,N, (2) N 1 x u k = A (N k) u µ A k 1 j u B u u j for k= 0,...,N 1. (3) j=k x u N µ,
We then need an equality constraint that bothµand the sequence of control signals{u 0,...,u N 1 } need to satisfy in order to bring the unstable states back to their correct initial values, that is, N 1 Au N µ A j 1 u B u u j = x u 0. (4) j=0
We are thus led to the following equivalent statement of the optimisation problem. P N (x) : V OPT N (x) min V N({x k },{u k },µ), (5) subject to: [ ] x s x k = k x u k k 1 x s k = A s k xs 0 + for k= 0,...,N, j=0 A k 1 j s B s u j for k= 1,...,N, N 1 x u k = A (N k) u µ A k 1 j u B u u j for k= 0,...,N 1, j=k (6)
where V N ({x k },{u k },µ) 1 2 [ ] x s x 0 = 0 x u = x, 0 u k U for k= 0,...,N 1, x k X for k= 0,...,N 1, [ ] x s x N = N X µ f X, [ ] x s T [ ] N x s P N + 1 N 1 (x T µ µ k 2 Qx k+ u T k Ru k ). (7) k=0
The above formulation of the problem, at the expense of the introduction of the additional optimisation variable µ, avoids exponentially diverging terms in the computation of the Hessian matrix. Thus, at least intuitively, it would seem to be more apposite for studying the structure of the problem for large horizons.
3. Numerical Issues in the Computation of the Hessian We represent the time evolution of the system output using the usual vector notation. Thus, let y= [ y T 1 y T 2... y T N] T, u= [ u T 0 u T 1... u T N 1 ] T. (8)
where C s A s C s A 2 s Ω s =. C s As N y=(γ s +Γ u ) u+ω } {{ } s x s 0 +Ω uµ, (9) Γ, Γ s = C s B s 0 0 C s A s B s C s B s 0.,..... C s A N 1 s B s C s A N 2 s B s C s B s
C u A (N 1) u 0 C u A 1 C u A (N 2) u B u C u A 2 u B u... C u A (N 1) u 0 0 C u A Ω u = u. 1 u B u... C u A (N 2) u, Γ u =......... C u A 1 u 0 0 0 C u A 1 u C u 0 0 0 0 B u B u B u
The matrix Γ Γ s +Γ u has the form h 0 h 1 h 2...... h (N 1) h 1 h0 h 1...... h (N 2).. Γ Γ s +Γ u =................. h N 1 hn 2...... h1 h0. h0 h 1, (10)
where{ hk : k= (N 1),...,N 1}, is a finite subsequence of the infinite sequence{ hk : k=,..., } defined by { hk : k= 0,..., } {C s B s, C s A s B s, C s A 2 s B s,...}, (11) { hk : k= 1,..., } { C u A 1 u B u, C u A 2 u B u, C u A 3 u B u,...}. (12)
Consider, P= Q and Q= C T C and R=ρI m > 0. V N as follows 1 : V N (x, u, y,µ)= 1 2 (xt Qx+ y T y+ρu T u). 1 We keep the function V N but change its arguments as appropriate.
V N (x, u,µ)= 1 2 [ x T Qx+ ( Γu+Ω s x s 0 +Ω uµ ) T( Γu+Ω s x s 0 +Ω uµ ) +ρu T u ] = 1 2 ut [ Γ T Γ+ρI]u+u T Γ T( Ω s x s 0 +Ω uµ ) + 1 ( Ωs x s 0 2 +Ω uµ ) T( Ωs x s 0 +Ω uµ ). (13)
With respect to the new optimisation variables (u, µ), the modified Hessian of the quadratic objective function is ] [ Γ H T Γ+ρI Γ T Ω u Ω T Γ u Ω T. (14) uω u
For future use, we extract the left-upper submatrix, which we call the regularised sub-hessian : H N Γ T Γ+ρI. (15)
The above problem formulation has the ability to ameliorate the numerical difficulties encountered when dealing with unstable plants. Indeed, we see that all terms depend only on exponentially decaying quantities.
Example Consider a single input-single output system with stable and unstable modes defined via the following matrices: A s = [ ] 1.442 0.64, B 1 0 s = [ ] 1, C 0 s = [ ] 0.721 0.64, and A u = 2, B u = 1, C u = 1.
we take Q= [ C s C u ] T [ Cs C u ], R= 0.1, P= Q. We consider input constraints of the form u k 1and no state constraints.
200 180 Standard Hessian Modified Hessian 160 140 Condition number 120 100 80 60 40 20 0 0 5 10 15 20 25 30 35 40 45 50 Horizon N Figure: Condition number of the modified Hessian (14) (circle-dashed line) and that of the standard Hessian (plus-solid line).
90 80 Modified Hessian Standard Hessian 70 Objective function value 60 50 40 30 20 10 0 0 5 10 15 20 25 30 35 40 45 50 Horizon N Figure: Comparison of the objective function value for different prediction horizons: values achieved using the modified Hessian (circle-dashed line) and using the standard Hessian (plus-solid line).
4. Revision of System Frequency Response Consider again the system split into stable and unstable parts. The system transfer function is clearly where G(z)=G s (z)+g u (z), (16) G s (z)=c s (zi A s ) 1 B s, (17) G u (z)=c u (zi A u ) 1 B u, (18)
We will study properties of the frequency response of, G(e jω ), via its singular value decomposition [SVD], which has the form G(e jω )=U(e jω )Σ(ω)V H (e jω ). (19) where, U and V are matrices containing the left and right singular vectors of G, satisfying U C p p, U H U= UU H = I p, V C m m, V H V= VV H = I m, and Σ(ω)=diag{σ 1 (ω),...,σ m (ω)}, (20)
Lemma Let Ḡ(z) be the two-sided Z-transform of the sequence { hk : k=,..., } embedded in Γ. Then Ḡ(z) is given by Ḡ(z)=zG(z), (21) where G(z) is the system transfer function. Moreover, the region of convergence of Ḡ(z) is given by max{ λ i (A s ) }< z <min{ λ i (A u ) }, where{λ i ( )} is the set of eigenvalues of the corresponding matrix. Proof: By direct calculation.
5. Connecting the Singular Values of the Regularised Sub-Hessian to the System Frequency Response We show that there is a direct connection between the singular values of HN, for large N, and the system frequency response.
The result of the previous Lemma ensures that given anyε>0 there exists k 0 > 0 such that Ḡ(e jω ) k 2 2 0 2 hk e jωk <ε for all w [ π,π], (22) k= k 0 since the sequence { hk } contains only exponentially decaying terms. 2
Recall the structure of Γ and using the definition ofψ l given above, we see that the regularised sub-hessian can be written as X 1 0... 0... Ψ 2k0... Ψ 0... Ψ 2k0 0... 0. 0............ H N =........... +ρi,.. 0 0... 0 Ψ 2k0... Ψ 0... Ψ 2k0... 0... 0 X 2 provided N (4k 0 + 1)m. Also, X 1 and X 2 are appropriate submatrices. (23)
Consider the Nm m matrix E N,ω [ e 1 N,ω em N,ω ] 1 N Ē N,ω V(e jω ), (24) where I m e jω I m Ē N,ω, ω= 2π l forl {0,...,N 1}.. N e j(n 1)ω I m
Consider the discrete time linear system, and the SVD of its frequency response. Let k 0 > 0 be such that the tails of the impulse response are negligible. Consider HN for N 4k 0 + 1 and ρ=0, which is the regularised sub-hessian of the quadratic objective function V N withρ=0. Then, for every given frequency ω 0 = 2π N 0 l 0,l 0 {0,...,N 0 1}, we have that lim HN e i 2 N N,ω 0 =σ 2 i (ω 0) for i= 1,...,m, N 0 where e i N,ω 0 is the ith column of E N,ω0 defined in (24), and N/N 0 {1, 2,...}.
Proof The key idea is to note that the inner sub-matrix of HN has a simple shift structure which allows one to easily apply Fourier methods. We then simply have to bound the end effects due to the other terms.
Corollary Consider the same conditions of the above Theorem and choose ρ>0. Then lim HN e i 2 N N,ω 0 =σ 2 i (ω 0)+ρ for i= 1,...,m, (25) N 0 where N/N 0 {1, 2,...}.
Example Consider a 2 input-2 output system with stable and unstable modes defined via the following matrices. A s = [ ] 1.442 0.64, B 1 0 s = [ ] 1 0, C 0 0 s = [ ] 0.721 0.64, 0.36 0.32 and A u = 2, B u = [ 0 1 ], C u = [ ] 0.1. 0.1
Consider the finite horizon optimal control problem (5) (7) and select Q= [ C s C u ] T [ Cs C u ], R= 0, P= Q.
5.5 0.45 5 0.4 4.5 4 3.5 3 2.5 2 1.5 1 0.35 0.3 0.25 0.2 0.15 0.1 0.05 0.5 0 0 ω [rad/sec] 10 2 10 1 10 0 (a)σ 1 (w) 2 0.05 10 2 10 1 10 0 ω [rad/sec] (b)σ 2 (w) 2 Figure: Singular values of the regularised sub-hessian HN (circles) with N = 61. The continuous lines represent the square of the two principal gains of the system.
5.5 0.45 5 0.4 4.5 4 3.5 3 2.5 2 1.5 1 0.35 0.3 0.25 0.2 0.15 0.1 0.05 0.5 0 0 ω [rad/sec] 10 2 10 1 10 0 (a)σ 1 (w) 2 0.05 10 2 10 1 10 0 ω [rad/sec] (b)σ 2 (w) 2 Figure: Singular values of the regularised sub-hessian HN (circles) with N= 401. The continuous lines represent the square of the two principal gains of the system.
6. Discussion In this lecture, we have seen: (i) Numerical conditioning of the Hessian can be improved by solving the stable part in forward time and the unstable part in reverse time. This exploits the connection between stability and causality. (ii) That the singular values of HN converge to the Frequency Domain Principle gains of the system.