Training Schrödinger's Cat


Training Schrödinger's Cat
Quadratically Converging Algorithms for Optimal Control of Quantum Systems
David L. Goodwin & Ilya Kuprov
d.goodwin@soton.ac.uk
comp-chem@southampton, Wednesday 15th July 2015

Two equations...
$$\frac{\partial^2 J}{\partial c^{(k)}_{n_2}\,\partial c^{(k)}_{n_1}} = \langle \sigma |\, \hat{P}_N \hat{P}_{N-1} \cdots \frac{\partial \hat{P}_{n_2}}{\partial c^{(k)}_{n_2}} \cdots \frac{\partial \hat{P}_{n_1}}{\partial c^{(k)}_{n_1}} \cdots \hat{P}_2 \hat{P}_1 \,| \psi_0 \rangle \quad (1)$$
$$\exp\begin{pmatrix} -i\hat{L}\Delta t & -i\hat{H}^{(k_1)}_{n_1}\Delta t & 0 \\ 0 & -i\hat{L}\Delta t & -i\hat{H}^{(k_2)}_{n_2}\Delta t \\ 0 & 0 & -i\hat{L}\Delta t \end{pmatrix} = \begin{pmatrix} e^{-i\hat{L}\Delta t} & \dfrac{\partial}{\partial c^{(k_1)}_{n_1}} e^{-i\hat{L}\Delta t} & \dfrac{1}{2}\dfrac{\partial^2}{\partial c^{(k_1)}_{n_1}\partial c^{(k_2)}_{n_2}} e^{-i\hat{L}\Delta t} \\ 0 & e^{-i\hat{L}\Delta t} & \dfrac{\partial}{\partial c^{(k_2)}_{n_2}} e^{-i\hat{L}\Delta t} \\ 0 & 0 & e^{-i\hat{L}\Delta t} \end{pmatrix} \quad (2)$$

01 Introducing Optimal Control
02 Newton-Raphson method
03 Gradient and Hessian
04 Van Loan's Augmented Exponentials
05 Regularisation

Introducing Optimal Control

Optimal Control Theory

Optimal control can be thought of as an algorithm: there is a start and a stop. Specifically, we can think of a dynamic system having an initial state and a target state. Optimal control finds an algorithmic solution that reaches the target with a minimum of effort.

Newton-Raphson method

The Newton-Raphson method: Taylor's Theorem

Taylor series approximated to second order [1]. If $f$ is continuously differentiable:
$$f(x + p) = f(x) + \nabla f(x + tp)^T p$$
If $f$ is twice continuously differentiable:
$$f(x + p) = f(x) + \nabla f(x)^T p + \tfrac{1}{2}\, p^T \nabla^2 f(x + \alpha p)\, p$$
1st order necessary condition: $\nabla f(x^*) = 0$
2nd order necessary condition: $\nabla^2 f(x^*)$ is positive semidefinite

[Figure: finding a minimum of the function using tangents of the gradient; iterates $x_0, x_1, x_2$ for $f(x) = \tfrac{x^3}{3} - x + 1$, with $f'(x) = x^2 - 1$ and tangent slopes $2x_0$, $2x_1$.]

[1] B. Taylor. Inny, 1717; J. Nocedal and S. J. Wright, 1999.
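As a concrete illustration of the tangent construction in the figure, here is a minimal Python sketch (not part of the original slides; the function name and starting point are illustrative) that applies Newton's iteration to the gradient of the example function $f(x) = x^3/3 - x + 1$:

```python
def newton_1d(x0, n_iter=10):
    """Newton's tangent iteration on the gradient of f(x) = x**3/3 - x + 1."""
    for _ in range(n_iter):
        grad = x0**2 - 1.0      # f'(x)  = x^2 - 1
        hess = 2.0 * x0         # f''(x) = 2x
        x0 = x0 - grad / hess   # tangent step towards the root of f'
    return x0

print(newton_1d(2.0))  # converges quadratically to the minimiser x = 1
```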

Numerical Optimisation Algorithms

Gradient Descent: step in the direction opposite to the local gradient.
$$f(\vec{x} + \Delta\vec{x}) = f(\vec{x}) + \nabla f(\vec{x})^T \Delta\vec{x}$$
Newton-Raphson: quadratic approximation of the objective function, moving to its minimum.
$$f(\vec{x} + \Delta\vec{x}) = f(\vec{x}) + \nabla f(\vec{x})^T \Delta\vec{x} + \tfrac{1}{2}\,\Delta\vec{x}^T H\, \Delta\vec{x}$$
Quasi-Newton BFGS: approximate $H$ with information from the gradient history.
$$H_{k+1} = H_k + \frac{\vec{g}_k \vec{g}_k^{\,T}}{\vec{g}_k^{\,T}\Delta\vec{x}_k} - \frac{H_k \Delta\vec{x}_k\,(H_k \Delta\vec{x}_k)^T}{\Delta\vec{x}_k^{\,T} H_k \Delta\vec{x}_k}$$
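The BFGS update above translates directly into code. The following is a minimal sketch (not the authors' implementation; variable names are illustrative) of one Hessian-approximation update given the step $\Delta\vec{x}_k$ and the gradient difference $\vec{g}_k$:

```python
import numpy as np

def bfgs_hessian_update(H, dx, dg):
    # One BFGS update of the Hessian approximation H, using the step
    # dx = x_{k+1} - x_k and the gradient difference dg = grad_{k+1} - grad_k,
    # as in the formula on this slide.
    Hdx = H @ dx
    return (H
            + np.outer(dg, dg) / (dg @ dx)
            - np.outer(Hdx, Hdx) / (dx @ Hdx))
```

In practice the curvature condition $\vec{g}_k^{\,T}\Delta\vec{x}_k > 0$ is enforced by the line search, so that the updated approximation stays positive definite.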

The Hessian Matrix

The Newton step: $p^N_k = -H_k^{-1}\nabla f_k$

$\nabla^2 f_k = H_k$ is the Hessian matrix, the matrix of second order partial derivatives [2]:
$$H = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{pmatrix}$$

The steepest descent method results from the same equation when $H$ is set to the identity matrix. Quasi-Newton methods initialise $H$ to the identity matrix and then approximate it with an update formula using a gradient history. The Hessian proper must be positive definite (and reasonably well conditioned) to be safely inverted; an indefinite Hessian results in non-descent search directions.

[2] L. O. Hesse. Chelsea, 1972.
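A minimal sketch (illustrative, not the authors' code) of how the Newton step can be computed in practice: solve $H p = -\nabla f$ with a Cholesky factorisation, which also detects an indefinite Hessian, and fall back to the steepest descent direction ($H$ replaced by the identity) in that case:

```python
import numpy as np

def newton_direction(hessian, gradient):
    # Newton search direction p = -H^{-1} g via a Cholesky solve; the
    # factorisation fails if H is not positive definite, in which case
    # we fall back to steepest descent (H replaced by the identity).
    try:
        L = np.linalg.cholesky(hessian)
        y = np.linalg.solve(L, -gradient)
        return np.linalg.solve(L.T, y)
    except np.linalg.LinAlgError:
        return -gradient
```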

Gradient and Hessian

GRAPE: Gradient Ascent Pulse Engineering

Split the Liouvillian into controllable and uncontrollable parts [3]:
$$\hat{L}(t) = \hat{H}_0 + \sum_k c^{(k)}(t)\,\hat{H}_k$$

Maximise the fidelity measure
$$J = \mathrm{Re}\left\langle \hat{\sigma} \,\middle|\, \exp_{(0)}\!\left[ -i \int_0^T \hat{L}(t)\,dt \right] \middle|\, \hat{\rho}(0) \right\rangle$$

Optimality conditions: $\partial J / \partial c^{(k)}(t) = 0$ at a minimum, and the Hessian matrix should be positive definite.

Discretize the time into small fixed intervals of length $\Delta t$, on a grid $t_1, \ldots, t_n, \ldots, t_N$, during which the control functions $c^{(k)}_n$ are assumed to be constant (piecewise-constant approximation).

[3] N. Khaneja et al. In: Journal of Magnetic Resonance 172.2 (2005), pp. 296–305.
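Under the piecewise-constant approximation, the propagator of each slice (made explicit on the next slide) is just a matrix exponential. A minimal sketch, assuming `scipy` and illustrative variable names:

```python
import numpy as np
from scipy.linalg import expm

def slice_propagators(H0, controls, amplitudes, dt):
    # Piecewise-constant slice propagators P_n = exp[-i (H0 + sum_k c_n^(k) H_k) dt].
    # `controls` is a list of control operators H_k; `amplitudes` has one row of
    # control amplitudes c_n^(k) per time slice.
    return [expm(-1j * (H0 + sum(c * Hk for c, Hk in zip(c_n, controls))) * dt)
            for c_n in amplitudes]
```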

GRAPE gradient

Gradient found from forward and backward propagation:
$$J = \langle \sigma |\, \hat{P}_N \hat{P}_{N-1} \hat{P}_{N-2} \hat{P}_{N-3} \hat{P}_{N-4} \cdots \hat{P}_3 \hat{P}_2 \hat{P}_1 \,| \rho_0 \rangle$$
$$\frac{\partial J}{\partial c^{(k)}_{N-3}} = \langle \sigma |\, \hat{P}_N \hat{P}_{N-1} \hat{P}_{N-2}\, \frac{\partial \hat{P}_{N-3}}{\partial c^{(k)}_{N-3}}\, \hat{P}_{N-4} \cdots \hat{P}_3 \hat{P}_2 \hat{P}_1 \,| \rho_0 \rangle$$
(I) propagate forwards from the source, (II) propagate backwards from the target, (III) compute the expectation value of the derivative.

Propagator over a time slice:
$$\hat{P}_n = \exp\left[ -i \left( \hat{H}_0 + \sum_k c^{(k)}_n \hat{H}_k \right) \Delta t \right]$$
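A minimal sketch of the forward/backward bookkeeping described above. The inputs are hypothetical: `props[n]` holds the slice propagator $\hat{P}_{n+1}$ and `prop_derivs[n][k]` the derivative $\partial\hat{P}_{n+1}/\partial c^{(k)}_{n+1}$ (obtainable, for example, from the Van Loan construction shown later):

```python
import numpy as np

def grape_gradient(sigma, rho0, props, prop_derivs):
    # Forward states: rho_n = P_n ... P_1 rho_0
    fwd = [rho0]
    for P in props:
        fwd.append(P @ fwd[-1])
    # Backward states: sigma_n = P_{n+1}^dagger ... P_N^dagger sigma
    bwd = [sigma]
    for P in reversed(props):
        bwd.append(P.conj().T @ bwd[-1])
    bwd.reverse()
    # Gradient elements: dJ/dc_n^(k) = Re < sigma_n | dP_n/dc | rho_{n-1} >
    grad = np.zeros((len(props), len(prop_derivs[0])))
    for n in range(len(props)):
        for k, dP in enumerate(prop_derivs[n]):
            grad[n, k] = np.real(np.vdot(bwd[n + 1], dP @ fwd[n]))
    return grad
```

Each propagator is computed once and reused for all gradient elements, which is the point of the forward/backward scheme.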

GRAPE Hessian

(Block) diagonal elements:
$$\frac{\partial^2 J}{\partial \left(c^{(k)}_n\right)^2} = \langle \sigma |\, \hat{P}_N \hat{P}_{N-1} \cdots \frac{\partial^2 \hat{P}_n}{\partial \left(c^{(k)}_n\right)^2} \cdots \hat{P}_2 \hat{P}_1 \,| \psi_0 \rangle$$

Non-diagonal elements:
$$\frac{\partial^2 J}{\partial c^{(k)}_{n_2}\,\partial c^{(k)}_{n_1}} = \langle \sigma |\, \hat{P}_N \hat{P}_{N-1} \cdots \frac{\partial \hat{P}_{n_2}}{\partial c^{(k)}_{n_2}} \cdots \frac{\partial \hat{P}_{n_1}}{\partial c^{(k)}_{n_1}} \cdots \hat{P}_2 \hat{P}_1 \,| \psi_0 \rangle$$

All propagator derivatives appearing in the non-diagonal blocks have already been calculated within a gradient calculation and can be reused; only the diagonal blocks remain to be computed.

Van Loan's Augmented Exponentials

Efficient Gradient Calculation: Augmented Exponentials

Among the many complicated functions encountered in the magnetic resonance simulation context, chained exponential integrals involving square matrices $A_k$ and $B_k$ occur particularly often:
$$\int_0^t dt_1 \int_0^{t_1} dt_2 \cdots \int_0^{t_{n-2}} dt_{n-1} \left\{ e^{A_1(t - t_1)} B_1\, e^{A_2(t_1 - t_2)} B_2 \cdots B_{n-1}\, e^{A_n t_{n-1}} \right\}$$

A method for computing integrals of this general type was proposed by Van Loan in 1978 [4] (pointed out by Sophie Schirmer [5]). Using this augmented exponential technique, we can write an upper-triangular block matrix exponential as
$$\exp\left[\begin{pmatrix} A & B \\ 0 & A \end{pmatrix} t\right] = \begin{pmatrix} e^{At} & \int_0^t e^{A(t-s)} B\, e^{As}\, ds \\ 0 & e^{At} \end{pmatrix} \stackrel{t=1}{=} \begin{pmatrix} e^{A} & \int_0^1 e^{A(1-s)} B\, e^{As}\, ds \\ 0 & e^{A} \end{pmatrix}$$

[4] C. F. Van Loan. In: Automatic Control, IEEE Transactions on 23.3 (1978), pp. 395–404.
[5] F. F. Floether, P. de Fouquieres and S. G. Schirmer. In: New Journal of Physics 14.7 (2012), p. 073023.

Efficient Gradient Calculation: Augmented Exponentials

To find the derivative of the propagator with respect to the control pulse at a specific time point, set
$$\int_0^1 e^{A(1-s)} B\, e^{As}\, ds = \frac{\partial}{\partial c^{(k)}_n} \exp\!\left( -i\hat{L}\Delta t \right), \qquad B = -i\hat{H}^{(k)}_n \Delta t$$
leading to an efficient calculation of the gradient element:
$$\exp\begin{pmatrix} -i\hat{L}\Delta t & -i\hat{H}^{(k)}_n\Delta t \\ 0 & -i\hat{L}\Delta t \end{pmatrix} = \begin{pmatrix} e^{-i\hat{L}\Delta t} & \dfrac{\partial}{\partial c^{(k)}_n} e^{-i\hat{L}\Delta t} \\ 0 & e^{-i\hat{L}\Delta t} \end{pmatrix}$$
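The 2×2 augmented exponential maps directly onto code. A minimal sketch (assuming `scipy`; `L` is the Liouvillian and `Hk` the control operator, both square arrays):

```python
import numpy as np
from scipy.linalg import expm

def propagator_and_derivative(L, Hk, dt):
    # Exponentiate the 2x2 block-augmented matrix from this slide: the diagonal
    # blocks give the slice propagator exp(-i*L*dt), and the upper-right block
    # gives its derivative with respect to the control amplitude c_n^(k).
    dim = L.shape[0]
    A = -1j * L * dt
    B = -1j * Hk * dt
    E = expm(np.block([[A, B],
                       [np.zeros((dim, dim)), A]]))
    return E[:dim, :dim], E[:dim, dim:]   # (propagator, dP/dc)
```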

Efficient Hessian Calculation: Augmented Exponentials

Second order derivatives can be calculated with a 3×3 augmented exponential [6]. Set
$$\int_0^1\!\!\int_0^s e^{A(1-s)} B_{n_1}\, e^{A(s-r)} B_{n_2}\, e^{Ar}\, dr\, ds = \mathcal{D}^2_{c_{n_1} c_{n_2}} \exp\!\left(-i\hat{L}\Delta t\right), \qquad B_n = -i\hat{H}^{(k)}_n \Delta t$$
giving the efficient Hessian element calculation
$$\exp\begin{pmatrix} -i\hat{L}\Delta t & -i\hat{H}^{(k_1)}_{n_1}\Delta t & 0 \\ 0 & -i\hat{L}\Delta t & -i\hat{H}^{(k_2)}_{n_2}\Delta t \\ 0 & 0 & -i\hat{L}\Delta t \end{pmatrix} = \begin{pmatrix} e^{-i\hat{L}\Delta t} & \dfrac{\partial}{\partial c^{(k_1)}_{n_1}} e^{-i\hat{L}\Delta t} & \dfrac{1}{2}\dfrac{\partial^2}{\partial c^{(k_1)}_{n_1}\partial c^{(k_2)}_{n_2}} e^{-i\hat{L}\Delta t} \\ 0 & e^{-i\hat{L}\Delta t} & \dfrac{\partial}{\partial c^{(k_2)}_{n_2}} e^{-i\hat{L}\Delta t} \\ 0 & 0 & e^{-i\hat{L}\Delta t} \end{pmatrix}$$

[6] T. F. Havel, I. Najfeld and J. X. Yang. In: Proceedings of the National Academy of Sciences 91.17 (1994), pp. 7962–7966; I. Najfeld and T. Havel. In: Advances in Applied Mathematics 16.3 (1995), pp. 321–375.
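The same pattern extends to the 3×3 case; a minimal sketch under the same assumptions as the gradient example (the block labels follow the matrix on this slide):

```python
import numpy as np
from scipy.linalg import expm

def propagator_second_derivative(L, Hk1, Hk2, dt):
    # Exponentiate the 3x3 block-augmented matrix from this slide; its blocks
    # contain the slice propagator, the two first derivatives, and (top right)
    # the term entering the Hessian element.
    dim = L.shape[0]
    A, B1, B2 = -1j * L * dt, -1j * Hk1 * dt, -1j * Hk2 * dt
    Z = np.zeros((dim, dim))
    E = expm(np.block([[A, B1, Z],
                       [Z, A,  B2],
                       [Z, Z,  A ]]))
    P   = E[:dim, :dim]          # propagator exp(-i*L*dt)
    dP1 = E[:dim, dim:2*dim]     # derivative w.r.t. c_{n1}^(k1)
    d2P = E[:dim, 2*dim:]        # top-right block entering the Hessian element
    return P, dP1, d2P
```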

Overtone excitation: Simulation

Excite ¹⁴N from the state $\hat{T}_{1,0}$ to the state $\hat{T}_{2,2}$. Solid state powder average, with the objective functional weighted over the crystalline orientations (rank 17 Lebedev grid, 110 points). Nuclear quadrupolar interaction. 400 time points for a total pulse duration of 40 µs.

Overtone excitation: Comparison of BFGS and Newton-Raphson

[Figure: 1-fidelity (from 1 down to 0.5) versus iterates (0 to 100) for BFGS and Newton.]

Regularisation

Problems: Singularities everywhere!

BFGS (using the DFP formula) is guaranteed to produce a positive definite Hessian update; the Newton-Raphson method does not: $p^N_k = -H_k^{-1}\nabla f_k$

Properties of the Hessian matrix:
1. Must be symmetric: $\dfrac{\partial^2}{\partial c^{(i)}\,\partial c^{(j)}} = \dfrac{\partial^2}{\partial c^{(j)}\,\partial c^{(i)}}$; not if the control operators commute
2. Must be sufficiently positive definite; non-singular; invertible
3. The Hessian is diagonally dominant

Regularisation: Avoiding Singularities

It is common, when negative eigenvalues appear, to regularise the Hessian to be positive definite [7]. Check the eigenvalues by performing an eigendecomposition of the Hessian matrix:
$$H = Q \Lambda Q^{-1}$$
Shift all eigenvalues by the magnitude of the most negative eigenvalue, then re-form the Hessian with the initial eigenvectors:
$$\lambda_{\min} = \max\left[0, -\min(\Lambda)\right], \qquad H_{\mathrm{reg}} = Q\left(\Lambda + \lambda_{\min}\hat{I}\right)Q^{-1}$$

TRM: introduce a constant $\delta$, the radius of a region we trust to give a sufficiently positive definite Hessian:
$$H_{\mathrm{reg}} = Q\left(\Lambda + \delta\lambda_{\min}\hat{I}\right)Q^{-1}$$
However, if $\delta$ is too small, the Hessian becomes ill-conditioned.

RFO: the method proceeds to construct an augmented Hessian matrix
$$H_{\mathrm{aug}} = \begin{pmatrix} \delta^2 H & \delta\,\vec{g} \\ \delta\,\vec{g}^{\,T} & 0 \end{pmatrix} = Q \Lambda Q^{-1}$$

[7] X. P. Resina. PhD thesis. Universitat Autònoma de Barcelona, 2004.
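A minimal sketch of the eigenvalue-shift regularisation above (illustrative only; `delta` plays the role of the trust-region-style constant $\delta$, and the initial symmetrisation mirrors the forced Hessian symmetry mentioned in the closing remarks):

```python
import numpy as np

def regularise_hessian(H, delta=1.0):
    # Symmetrise, eigendecompose, then shift the spectrum by delta times the
    # magnitude of the most negative eigenvalue, so that the regularised
    # Hessian is positive (semi)definite.
    H = 0.5 * (H + H.T)
    evals, Q = np.linalg.eigh(H)
    lam_min = max(0.0, -evals.min())
    return Q @ np.diag(evals + delta * lam_min) @ Q.T
```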

State transfer

Controls := $\left\{\hat{L}^{(\mathrm{H})}_x, \hat{L}^{(\mathrm{H})}_y, \hat{L}^{(\mathrm{C})}_x, \hat{L}^{(\mathrm{C})}_y, \hat{L}^{(\mathrm{F})}_x, \hat{L}^{(\mathrm{F})}_y\right\}$; valid vs. invalid parametrisation of Lie groups.

[Figure: interaction parameters of a hydrofluorocarbon molecular group used in the state transfer simulations (couplings of +140 Hz and 160 Hz among the H, C and F spins), with a magnetic induction of 9.4 Tesla.]

State transfer: Comparison of BFGS and Newton-Raphson I

[Figure: 1-fidelity on a logarithmic scale versus iterations (0 to 150) and versus wall clock time (0 to 1000 s).]

State transfer: Comparison of BFGS and Newton-Raphson II

[Figure: 1-fidelity on a logarithmic scale (10⁰ down to 10⁻¹⁰) versus iterations (0 to 100) and versus wall clock time (0 to 2000 s).]

Closing Remarks

Hessian calculation that scales with O(n) computations.
Efficient directional derivative calculation with augmented exponentials.
Better regularisation and conditioning.
Infeasible start points? Different line search methods?
Forced symmetry of the Hessian.