Training Schrödinger's Cat: Quadratically Converging Algorithms for Optimal Control of Quantum Systems
David L. Goodwin & Ilya Kuprov, d.goodwin@soton.ac.uk, comp-chem@southampton, Wednesday 15th July 2015
Two equations...
\[
\frac{\partial^2 J}{\partial c^{(k)}_{n_2}\,\partial c^{(k)}_{n_1}}
= \left\langle \sigma \middle|\, \hat{P}_N \hat{P}_{N-1} \cdots
\frac{\partial \hat{P}_{n_2}}{\partial c^{(k)}_{n_2}} \cdots
\frac{\partial \hat{P}_{n_1}}{\partial c^{(k)}_{n_1}} \cdots
\hat{P}_2 \hat{P}_1 \,\middle|\, \psi_0 \right\rangle
\tag{1}
\]
\[
\exp\begin{pmatrix}
-i\hat{L}\Delta t & -i\hat{H}^{(k_1)}_{n_1}\Delta t & 0 \\
0 & -i\hat{L}\Delta t & -i\hat{H}^{(k_2)}_{n_2}\Delta t \\
0 & 0 & -i\hat{L}\Delta t
\end{pmatrix}
=
\begin{pmatrix}
e^{-i\hat{L}\Delta t} & \frac{\partial}{\partial c^{(k_1)}_{n_1}} e^{-i\hat{L}\Delta t} & \frac{1}{2}\frac{\partial^2}{\partial c^{(k_1)}_{n_1}\partial c^{(k_2)}_{n_2}} e^{-i\hat{L}\Delta t} \\
0 & e^{-i\hat{L}\Delta t} & \frac{\partial}{\partial c^{(k_2)}_{n_2}} e^{-i\hat{L}\Delta t} \\
0 & 0 & e^{-i\hat{L}\Delta t}
\end{pmatrix}
\tag{2}
\]
01 Introducing Optimal Control
02 Newton-Raphson Method
03 Gradient and Hessian
04 Van Loan's Augmented Exponentials
05 Regularisation
Introducing Optimal Control
Optimal Control Theory
Optimal control can be thought of as an algorithm: there is a start and a stop. Specifically, we can think of a dynamic system having an initial state and a target state. Optimal control finds an algorithmic solution that reaches the target with a minimum of effort.
Newton-Raphson method
The Newton-Raphson Method
Taylor's Theorem: the Taylor series approximated to second order [1].
If f is continuously differentiable:
\[ f(x + p) = f(x) + \nabla f(x + tp)^T p, \qquad t \in (0, 1) \]
If f is twice continuously differentiable:
\[ f(x + p) = f(x) + \nabla f(x)^T p + \tfrac{1}{2}\, p^T \nabla^2 f(x + \alpha p)\, p, \qquad \alpha \in (0, 1) \]
1st order necessary condition: \( \nabla f(x^*) = 0 \)
2nd order necessary condition: \( \nabla^2 f(x^*) \) is positive semidefinite
[Figure: finding a minimum of f(x) = x³/3 − x + 1 using tangents of the gradient f′(x) = x² − 1, with f″(x) = 2x; Newton iterates x₀ → x₁ → x₂ converge on the stationary point.]
[1] B. Taylor. Innys, 1717; J. Nocedal and S. J. Wright, 1999.
Numerical Optimisation Algorithms
Gradient Descent: step in the direction opposite to the local gradient.
\[ f(\vec{x} + \Delta\vec{x}) \approx f(\vec{x}) + \nabla f(\vec{x})^T \Delta\vec{x} \]
Newton-Raphson: quadratic approximation of the objective function, moving to the minimum of the model.
\[ f(\vec{x} + \Delta\vec{x}) \approx f(\vec{x}) + \nabla f(\vec{x})^T \Delta\vec{x} + \tfrac{1}{2}\, \Delta\vec{x}^T \mathbf{H}\, \Delta\vec{x} \]
Quasi-Newton (BFGS): approximate H with information from the gradient history.
\[ \mathbf{H}_{k+1} = \mathbf{H}_k + \frac{\vec{g}_k \vec{g}_k^T}{\vec{g}_k^T \Delta\vec{x}_k} - \frac{(\mathbf{H}_k \Delta\vec{x}_k)(\mathbf{H}_k \Delta\vec{x}_k)^T}{\Delta\vec{x}_k^T \mathbf{H}_k \Delta\vec{x}_k} \]
The Hessian Matrix
The Newton step: \( p^N_k = -\mathbf{H}_k^{-1} \nabla f_k \), where \( \nabla^2 f_k = \mathbf{H}_k \) is the Hessian matrix, the matrix of second-order partial derivatives [2]:
\[
\mathbf{H} = \begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}
\]
The steepest descent method results from the same equation when we set H to the identity matrix. Quasi-Newton methods initialise H to the identity matrix, then approximate it with an update formula using a gradient history. The Hessian proper must be positive definite (and quite well conditioned) to have a useful inverse; an indefinite Hessian results in non-descent search directions.
[2] L. O. Hesse. Chelsea, 1972.
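The quadratic convergence claim is easy to reproduce numerically. A minimal sketch in Python, using nothing beyond the earlier slide's own example f(x) = x³/3 − x + 1: Newton's method applied to the gradient f′(x) = x² − 1 roughly squares the error at each step.

```python
def fp(x):    # gradient  f'(x) = x^2 - 1
    return x**2 - 1.0

def fpp(x):   # second derivative (1x1 Hessian)  f''(x) = 2x
    return 2.0 * x

x = 2.0                        # start point x0
for k in range(6):
    x = x - fp(x) / fpp(x)     # Newton step  p = -H^{-1} * gradient
    print(k, x, abs(x - 1.0))  # error roughly squares each iteration
```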
Gradient and Hessian
GRAPE: Gradient Ascent Pulse Engineering
Split the Liouvillian into controllable and uncontrollable parts [3]:
\[ \hat{L}(t) = \hat{H}_0 + \sum_k c^{(k)}(t)\, \hat{H}_k \]
Maximise the fidelity measure:
\[ J = \mathrm{Re} \left\langle \hat{\sigma} \,\middle|\, \exp_{(0)}\!\left[ -i \int_0^T \hat{L}(t)\, dt \right] \middle|\, \hat{\rho}(0) \right\rangle \]
Optimality conditions: \( \partial J / \partial c^{(k)}(t) = 0 \) at a minimum, and the Hessian matrix should be positive definite.
Discretize the time into small fixed intervals during which the control functions are assumed to be constant (the piecewise-constant approximation): amplitudes \( c^{(k)}_n \) on a grid \( t_1, \ldots, t_n, \ldots, t_N \) with step \( \Delta t \).
[3] N. Khaneja et al. Journal of Magnetic Resonance 172.2 (2005), pp. 296–305.
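A minimal sketch of the piecewise-constant discretisation, for a hypothetical two-level system rather than the spin systems used in the talk (the operators H0, sx, sy, the amplitudes, and the use of state vectors instead of Liouville-space density matrices are all illustrative assumptions):

```python
import numpy as np
from scipy.linalg import expm

# Pauli-type operators for a hypothetical two-level system
sx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
sy = np.array([[0, -1j], [1j, 0]], dtype=complex) / 2
sz = np.array([[1, 0], [0, -1]], dtype=complex) / 2

H0, controls = sz, [sx, sy]    # drift and control operators
N, dt = 50, 0.1                # number of slices, slice duration
c = np.random.default_rng(0).uniform(-1, 1, (len(controls), N))

def propagator(n):
    """P_n = exp[-i (H0 + sum_k c_n^(k) H_k) dt] over time slice n."""
    H = H0 + sum(c[k, n] * Hk for k, Hk in enumerate(controls))
    return expm(-1j * H * dt)

rho0  = np.array([1, 0], dtype=complex)   # source state
sigma = np.array([0, 1], dtype=complex)   # target state

psi = rho0
for n in range(N):
    psi = propagator(n) @ psi             # piecewise-constant evolution
J = np.real(np.vdot(sigma, psi))          # J = Re <sigma| P_N ... P_1 |rho0>
print("fidelity:", J)
```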
GRAPE Gradient
The gradient is found from forward and backward propagation:
\[
J = \langle \sigma |\, \hat{P}_N \hat{P}_{N-1} \hat{P}_{N-2} \hat{P}_{N-3} \underbrace{\hat{P}_{N-4} \cdots \hat{P}_3 \hat{P}_2 \hat{P}_1 \,| \rho_0 \rangle}_{\text{(I) propagate forwards from source}}
\]
\[
\frac{\partial J}{\partial c^{(k)}_{N-3}} = \overbrace{\langle \sigma |\, \hat{P}_N \hat{P}_{N-1} \hat{P}_{N-2}}^{\text{(II) propagate backwards from target}} \underbrace{\frac{\partial \hat{P}_{N-3}}{\partial c^{(k)}_{N-3}}}_{\text{(III) compute expectation of the derivative}} \hat{P}_{N-4} \cdots \hat{P}_3 \hat{P}_2 \hat{P}_1 \,| \rho_0 \rangle
\]
Propagator over a time slice:
\[ \hat{P}_n = \exp\left[ -i \left( \hat{H}_0 + \sum_k c^{(k)}_n \hat{H}_k \right) \Delta t \right] \]
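A sketch of the forward/backward bookkeeping on the same kind of toy two-level system (all names are illustrative assumptions); the propagator derivative is taken by finite difference here, purely as a stand-in for the augmented-exponential derivative introduced in the next section:

```python
import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
sz = np.array([[1, 0], [0, -1]], dtype=complex) / 2
H0, controls = sz, [sx]
N, dt, eps = 20, 0.1, 1e-7
c = np.full((1, N), 0.5)

def prop(n, dc=0.0, k=0):
    """Propagator for slice n, with optional perturbation dc on control k."""
    H = H0 + sum((c[j, n] + (dc if j == k else 0.0)) * Hk
                 for j, Hk in enumerate(controls))
    return expm(-1j * H * dt)

rho0  = np.array([1, 0], dtype=complex)
sigma = np.array([0, 1], dtype=complex)

fwd = [rho0]                      # (I) forward states P_n ... P_1 |rho0>
for n in range(N):
    fwd.append(prop(n) @ fwd[-1])
bwd = [sigma]                     # (II) backward states from the target
for n in reversed(range(N)):
    bwd.append(prop(n).conj().T @ bwd[-1])
bwd = bwd[::-1]                   # bwd[n+1] is the bra to the left of P_n

grad = np.empty(N)
for n in range(N):                # (III) expectation of the derivative
    dP = (prop(n, eps) - prop(n)) / eps   # finite-difference stand-in
    grad[n] = np.real(np.vdot(bwd[n + 1], dP @ fwd[n]))
print(grad[:5])
```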
GRAPE Hessian
Diagonal (block) elements:
\[ \frac{\partial^2 J}{\partial \left(c^{(k)}_n\right)^2} = \left\langle \sigma \middle|\, \hat{P}_N \hat{P}_{N-1} \cdots \frac{\partial^2 \hat{P}_n}{\partial \left(c^{(k)}_n\right)^2} \cdots \hat{P}_2 \hat{P}_1 \,\middle|\, \psi_0 \right\rangle \]
Non-diagonal elements:
\[ \frac{\partial^2 J}{\partial c^{(k)}_{n_2} \partial c^{(k)}_{n_1}} = \left\langle \sigma \middle|\, \hat{P}_N \hat{P}_{N-1} \cdots \frac{\partial \hat{P}_{n_2}}{\partial c^{(k)}_{n_2}} \cdots \frac{\partial \hat{P}_{n_1}}{\partial c^{(k)}_{n_1}} \cdots \hat{P}_2 \hat{P}_1 \,\middle|\, \psi_0 \right\rangle \]
All propagator derivatives of the non-diagonal blocks have already been calculated within a gradient calculation, and can be reused. We only need to find the diagonal blocks.
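A sketch of this reuse argument, under the same toy-system assumptions as above: once the derivative propagators dP_n are cached from the gradient pass, every non-diagonal Hessian element costs only matrix-vector products.

```python
import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
sz = np.array([[1, 0], [0, -1]], dtype=complex) / 2
H0, Hk = sz, sx                  # drift and a single control operator
N, dt, eps = 10, 0.1, 1e-6
c = np.full(N, 0.5)

P  = [expm(-1j * (H0 + cn * Hk) * dt) for cn in c]
dP = [(expm(-1j * (H0 + (cn + eps) * Hk) * dt) - Pn) / eps  # cached during
      for cn, Pn in zip(c, P)]                              # the gradient pass

rho0, sigma = np.array([1, 0], dtype=complex), np.array([0, 1], dtype=complex)
fwd = [rho0]
for Pn in P:
    fwd.append(Pn @ fwd[-1])
bwd = [sigma]
for Pn in reversed(P):
    bwd.append(Pn.conj().T @ bwd[-1])
bwd = bwd[::-1]

Hess = np.zeros((N, N))
for n1 in range(N):
    v = dP[n1] @ fwd[n1]          # insert dP_{n1}, then carry it forward
    for n2 in range(n1 + 1, N):
        Hess[n1, n2] = Hess[n2, n1] = np.real(np.vdot(bwd[n2 + 1], dP[n2] @ v))
        v = P[n2] @ v             # ordinary propagators fill the gap
# the diagonal d^2 P_n / dc_n^2 terms are the only new work (next section)
```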
Van Loan s Augmented Exponentials
Efficient Gradient Calculation: Augmented Exponentials
Among the many complicated functions encountered in the magnetic resonance simulation context, chained exponential integrals involving square matrices A_k and B_k occur particularly often:
\[
\int_0^t dt_1 \int_0^{t_1} dt_2 \cdots \int_0^{t_{n-2}} dt_{n-1} \left\{ e^{A_1 (t - t_1)} B_1\, e^{A_2 (t_1 - t_2)} B_2 \cdots B_{n-1}\, e^{A_n t_{n-1}} \right\}
\]
A method for computing integrals of this general type was proposed by Van Loan in 1978 [4] (pointed out to us by Sophie Schirmer [5]). Using this augmented exponential technique, we can write an upper-triangular block matrix exponential as
\[
\exp\left[ \begin{pmatrix} A & B \\ 0 & A \end{pmatrix} t \right] = \begin{pmatrix} e^{At} & \int_0^t e^{A(t-s)} B\, e^{As}\, ds \\ 0 & e^{At} \end{pmatrix}
\]
[4] C. F. Van Loan. IEEE Transactions on Automatic Control 23.3 (1978), pp. 395–404.
[5] F. F. Floether, P. de Fouquieres and S. G. Schirmer. New Journal of Physics 14.7 (2012), p. 073023.
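The identity is easy to check numerically: the top-right block of the augmented exponential is exactly the directional derivative of the matrix exponential in the direction B. A sketch with arbitrary random matrices (the matrices and sizes are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
n, t, eps = 4, 0.7, 1e-7
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

aug = np.block([[A, B], [np.zeros((n, n)), A]])   # [[A, B], [0, A]]
D_vanloan = expm(aug * t)[:n, n:]                 # top-right block

# same quantity by finite difference: d/d(eps) expm((A + eps*B) t) at eps = 0
D_findiff = (expm((A + eps * B) * t) - expm(A * t)) / eps
print(np.max(np.abs(D_vanloan - D_findiff)))      # agrees to ~1e-7
```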
Efficient Gradient Calculation: Augmented Exponentials
To find the derivative of the propagator with respect to a control amplitude at a specific time point, set
\[ B = -i\hat{H}^{(k)}_n \Delta t, \qquad \int_0^1 e^{A(1-s)} B\, e^{As}\, ds = D_{c_n} \exp\left( -i\hat{L}\Delta t \right) \]
leading to an efficient calculation of the gradient element:
\[
\exp \begin{pmatrix} -i\hat{L}\Delta t & -i\hat{H}^{(k)}_n \Delta t \\ 0 & -i\hat{L}\Delta t \end{pmatrix} = \begin{pmatrix} e^{-i\hat{L}\Delta t} & \frac{\partial}{\partial c^{(k)}_n} e^{-i\hat{L}\Delta t} \\ 0 & e^{-i\hat{L}\Delta t} \end{pmatrix}
\]
Efficient Hessian Calculation: Augmented Exponentials
Second-order derivatives can be calculated with a 3 × 3 augmented exponential [6]. Set
\[ B_n = -i\hat{H}^{(k)}_n \Delta t, \qquad \int_0^1 \int_0^s e^{A(1-s)} B_{n_1}\, e^{A(s-r)} B_{n_2}\, e^{Ar}\, dr\, ds = D^2_{c_{n_1} c_{n_2}} \exp\left( -i\hat{L}\Delta t \right) \]
giving the efficient Hessian element calculation
\[
\exp\begin{pmatrix}
-i\hat{L}\Delta t & -i\hat{H}^{(k_1)}_{n_1}\Delta t & 0 \\
0 & -i\hat{L}\Delta t & -i\hat{H}^{(k_2)}_{n_2}\Delta t \\
0 & 0 & -i\hat{L}\Delta t
\end{pmatrix}
=
\begin{pmatrix}
e^{-i\hat{L}\Delta t} & \frac{\partial}{\partial c^{(k_1)}_{n_1}} e^{-i\hat{L}\Delta t} & \frac{1}{2}\frac{\partial^2}{\partial c^{(k_1)}_{n_1}\partial c^{(k_2)}_{n_2}} e^{-i\hat{L}\Delta t} \\
0 & e^{-i\hat{L}\Delta t} & \frac{\partial}{\partial c^{(k_2)}_{n_2}} e^{-i\hat{L}\Delta t} \\
0 & 0 & e^{-i\hat{L}\Delta t}
\end{pmatrix}
\]
[6] T. F. Havel, I. Najfeld and J. X. Yang. Proceedings of the National Academy of Sciences 91.17 (1994), pp. 7962–7966; I. Najfeld and T. Havel. Advances in Applied Mathematics 16.3 (1995), pp. 321–375.
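A sketch of the 3 × 3 construction with random Hermitian stand-ins for the Liouvillian and the control Hamiltonians (illustrative assumptions throughout): one expm call yields the propagator, both first derivatives, and, following the slide's formula, half the mixed second derivative in the top-right corner.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
def rand_herm(n):
    """Random Hermitian matrix as an illustrative stand-in."""
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (M + M.conj().T) / 2

n, dt, eps = 4, 0.1, 1e-7
L, H1, H2 = rand_herm(n), rand_herm(n), rand_herm(n)
A, B1, B2 = -1j * L * dt, -1j * H1 * dt, -1j * H2 * dt
Z = np.zeros((n, n))

E = expm(np.block([[A, B1, Z],
                   [Z, A, B2],
                   [Z, Z, A]]))
P      = E[:n, :n]         # e^{-i L dt}
dP_1   = E[:n, n:2*n]      # d/dc1 of the propagator
dP_2   = E[n:2*n, 2*n:]    # d/dc2 of the propagator
d2P_12 = 2 * E[:n, 2*n:]   # mixed second derivative (corner block carries
                           # the factor 1/2, per the slide's formula)

# finite-difference check of the first derivative block
chk = (expm(A + eps * B1) - expm(A)) / eps
print(np.max(np.abs(dP_1 - chk)))   # agrees to ~1e-7
```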
Overtone Excitation: Simulation
Excite ¹⁴N from the state \( \hat{T}_{1,0} \) to \( \hat{T}_{2,2} \). Solid-state powder average, with the objective functional weighted over the crystalline orientations (rank-17 Lebedev grid, 110 points). Nuclear quadrupolar interaction. 400 time points for a total pulse duration of 40 µs.
Overtone Excitation: Comparison of BFGS and Newton-Raphson
[Figure: 1−fidelity against iterates (0–100) for BFGS and Newton, both descending from 1 towards 0.5.]
Regularisation
Problems: Singularities Everywhere!
BFGS (using the DFP formula) is guaranteed to produce a positive definite Hessian update; the Newton-Raphson method is not: \( p^N_k = -\mathbf{H}_k^{-1} \nabla f_k \).
Properties of the Hessian matrix:
1. Must be symmetric, \( \frac{\partial^2}{\partial c^{(i)} \partial c^{(j)}} = \frac{\partial^2}{\partial c^{(j)} \partial c^{(i)}} \); not guaranteed if the control operators do not commute.
2. Must be sufficiently positive definite; non-singular; invertible.
3. The Hessian is diagonally dominant.
Regularisation: Avoiding Singularities
Negative eigenvalues are common; regularise the Hessian to be positive definite [7]. Check the eigenvalues by performing an eigendecomposition of the Hessian matrix:
\[ \mathbf{H} = Q \Lambda Q^{-1} \]
Add the magnitude of the smallest negative eigenvalue to all eigenvalues, then reform the Hessian with the initial eigenvectors:
\[ \lambda_{\min} = \max\left[ 0, -\min(\Lambda) \right], \qquad \mathbf{H}_{\text{reg}} = Q \left( \Lambda + \lambda_{\min} \hat{I} \right) Q^{-1} \]
TRM: introduce a constant δ, the radius of a region we trust to give a sufficiently positive definite Hessian:
\[ \mathbf{H}_{\text{reg}} = Q \left( \Lambda + \delta \lambda_{\min} \hat{I} \right) Q^{-1} \]
However, if δ is too small, the Hessian will become ill-conditioned.
RFO: the method proceeds to construct an augmented Hessian matrix:
\[ \mathbf{H}_{\text{aug}} = \begin{pmatrix} \delta^2 \mathbf{H} & \delta \vec{g} \\ \delta \vec{g}^T & 0 \end{pmatrix} = Q \Lambda Q^{-1} \]
[7] X. P. Resina. PhD thesis. Universitat Autònoma de Barcelona, 2004.
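A minimal sketch of the eigenvalue-shift (TRM-style) regularisation above, applied to an arbitrary indefinite symmetric matrix (an illustrative assumption, not a Hessian from the talk's simulations):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((6, 6))
H = (M + M.T) / 2                   # symmetric but indefinite "Hessian"

lam, Q = np.linalg.eigh(H)          # H = Q diag(lam) Q^T
delta = 1.1                         # trust-region style scaling, delta > 1
shift = max(0.0, -lam.min())        # magnitude of most negative eigenvalue
H_reg = Q @ np.diag(lam + delta * shift) @ Q.T

print(np.linalg.eigvalsh(H).min(),      # negative before regularisation
      np.linalg.eigvalsh(H_reg).min())  # positive after regularisation
```

With δ only just above 1 the smallest eigenvalue of H_reg is barely positive and the matrix can be ill-conditioned, which is the trade-off the slide points out.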
State Transfer
Controls := \( \left\{ \hat{L}^{(H)}_x, \hat{L}^{(H)}_y, \hat{L}^{(C)}_x, \hat{L}^{(C)}_y, \hat{L}^{(F)}_x, \hat{L}^{(F)}_y \right\} \)
Valid vs. invalid parametrisation of Lie groups.
Interaction parameters of a hydrofluorocarbon molecular group used in the state transfer simulations: couplings of +140 Hz (C–H) and 160 Hz (H–F), with a magnetic induction of 9.4 Tesla.
State Transfer: Comparison of BFGS and Newton-Raphson I
[Figure: two panels of 1−fidelity on a log scale (10⁰ to 10⁻¹), against iterations (0–150) and against wall clock time (0–1000 s).]
State Transfer: Comparison of BFGS and Newton-Raphson II
[Figure: two panels of 1−fidelity on a log scale (10⁰ to 10⁻¹⁰), against iterations (0–100) and against wall clock time (0–2000 s).]
Closing Remarks
Hessian calculation that scales with O(n) computations.
Efficient directional derivative calculation with augmented exponentials.
Better regularisation and conditioning.
Infeasible start points? Different line search methods?
Forced symmetry of the Hessian.