STCE. Adjoint Code Design Patterns. Uwe Naumann. RWTH Aachen University, Germany. QuanTech Conference, London, April PDF Free Download

Adjoint Code Design Patterns Uwe Naumann RWTH Aachen University, Germany QuanTech Conference, London, April 2016

Outline Why Adjoints? What Are Adjoints? Software Tool Support: dco/c++ Adjoint Code Design Patterns Naumann, April 2016, Adjoint Code Design Patterns 2

Why Adjoints? Setting the Stage In U.N. and J. du Toit: Adjoint Algorithmic Differentiation Tool Support for Typical Numerical Patterns in Computational Finance. Under Review. Also appeared as NAG Technical Report TR 3/14, The Numerical Algorithms Group. we price a simple European Call option written on an underlying S = S(t) : IR + IR +, described by the SDE ds(t) = S(t) r dt + S(t) σ(t, S(t)) dw (t) with time t 0, maturity T > 0, strike K > 0, constant interest rate r > 0, volatility σ = σ(t, S) : IR + IR + IR +, and Brownian motion W = W (t) : IR + IR. We use this simple case study for illustration of common patterns found in financial simulations in the context of Adjoint Algorithmic Differention (AAD). Naumann, April 2016, Adjoint Code Design Patterns 3

Why Adjoints? Setting the Stage The value of a European call option driven by S is given by the expectation V = Ee r T (S(T ) K) + for given interest r, strike K, and maturity T. It can be evaluated using Monte Carlo simulation or restated as a PDE and solved by a finite difference method. We compare the computation of the gradient of the option price V IR + wrt. the problem parameters (K, S(0), r, T, and the variable number of parameters of the local volatility surface) by finite differences with an adjoint code based on dco/c++... Naumann, April 2016, Adjoint Code Design Patterns 4

Why Adjoints? Setting the Stage Monte Carlo method using 10 4 paths performing 360 time steps each; gradient of size n = 222; running on standard PC with 3GB of RAM available primal central FD naive AAD robust AAD run time (sec.) 2.3 1010.7 24.4 observing gradient with machine accuracy by AAD prohibitive memory requirement > 40GB for naive AAD (quasi-)constant factor (here approx. 10) wrt. primal run time for AAD within given memory bounds aiming for robust and efficient gradient (and Hessian) code download adjoint option pricers from www.nag.co.uk/downloads/addownloads including first- and second-order adjoint ensembles and evolutions Naumann, April 2016, Adjoint Code Design Patterns 5

What Are Adjoints? Let y = F (x), F : IR n IR m, be continuously differentiable. Then...... (approximate) tangents (directional derivatives) IR m l Ẏ = F (x) Ẋ IR m n IR n l yield (approximate) Jacobian at O(n) Cost(F );... adjoints IR n l X = F (x) T IR n m Ȳ IR m l yield Jacobian at O(m) Cost(F ) (cheap gradients). Challenge: Reversal of data flow yields potentially prohibitive memory requirement (O(Cost(F )))! Naumann, April 2016, Adjoint Code Design Patterns 6

Tangents by Overloading 5: y 0( ) 6: y 1(/) v4 x0 1/x1 4: exp v4 3: sin v4/x 2 1 cos(v2) 2: x1 x0 0: x 0 1: x 1 t := e sin(x0 x1) y 0 := x 0 t y 1 := t x 1 v 2 = x 0 x 1 v 2 = ẋ 0 x 1 + x 0 ẋ 1 v 3 = sin(v 2 ) v 3 = cos(v 2 ) v 2 v 4 = exp(v 3 ) v 4 = v 4 v 3 y 0 = x 0 v 4 ẏ 0 = ẋ 0 v 4 + x 0 v 4 y 1 = v 4 /x 1 ẏ 1 = v 4 /x 1 v 4 ẋ 1 /x 2 1 5: ẏ 0 6: ẏ 1 v4 x0 1/x1 4: v 4 v4 3: v 3 v4/x 2 1 cos(v2) 2: v 2 x1 x0 0: ẋ 0 1: ẋ 1 Naumann, April 2016, Adjoint Code Design Patterns 7

Adjoints by Overloading 5: y 0( ) 6: y 1(/) v4 x0 1/x1 4: exp v4 3: sin v4/x 2 1 cos(v2) 2: x1 x0 0: x 0 1: x 1 t := e sin(x0 x1) y 0 := x 0 t y 1 := t x 1 // primal... v 2 = v 3 = v 4 := 0 x 1 = v 4 /x 2 1 ȳ 1 v 4 + = 1/x 1 ȳ 1 v 4 + = x 0 ȳ 0 x 0 + = v 4 ȳ 0 v 3 + = v 4 v 4 v 2 + = cos(v 2 ) v 3 x 1 + = x 0 v 2 x 0 + = x 1 v 2 5: ȳ 0 6: ȳ 1 v4 x0 1/x1 4: v 4 v4 3: v 3 v4/x 2 1 cos(v2) 2: v 2 x1 x0 0: x 0 1: x 1 Naumann, April 2016, Adjoint Code Design Patterns 8

Software Tool Support: dco/c++ dco/c++ (derivative code by overloading in C++) features tangents and adjoints of arbitrary order through recursive template instantiation for numerical simulation code implemented in C++ front-ends for Fortran and C# (beta) optimized assignment-level gradient code through expression templates cache-optimized internal representation in various incarnations vector modes / detection and exploitation of sparsity external adjoint / Jacobian interfaces user-defined intrinsics intrinsic NAG Library functions (e.g. Linear Algebra, Interpolation, Root Finding, Nearest Correlation Matrix) support for parallelism: thread-safe data structures, adjoint MPI library, GPU coupling Naumann, April 2016, Adjoint Code Design Patterns 9

Software Tool Support: dco/c++ Moreover, we have implemented an industrial-strength software engineering infrastructure 1 including professional software development and quality assurance (NAG ) multi-platform (Windows, Linux, Mac OS X) / compiler (vcc, gcc, clang, icc) support overnight building / testing version control user documentation / examples numerous successful applications, including QuantLib, OpenFOAM, PETSc, ICON, Telemac, Jurassic2, ACE+, and several confidential projects with tier-1 investment banks 1 building on academic prototype Naumann, April 2016, Adjoint Code Design Patterns 10

Adjoint Code Design An algorithmic adjoint of F : IR n IR m : y z q = F (x) computes X = F T Ȳ = IR n l IR m l Z {}} k 1 { F 1T... F kt (z k 1 ) ( F k+1t... ( F qt Ȳ )...) }{{} Z k for x z 0 and based on adjoint code modules (ACModules) Z i 1 = F it Zi for i = q,..., 1. The art of differentiating computer programs / algorithmic differentiation (AD) is about provision of a correct, robust, efficient, scalable, flexible ACModule hierarchy. z 4 z 5 z 3 (F 3 F 2 ) z 1 z 0 0 z 0 1 Naumann, April 2016, Adjoint Code Design Patterns 11

Adjoint Code Design with dco/c++ Mind the Gap Naumann, April 2016, Adjoint Code Design Patterns 12

Adjoint Code Design with dco/c++ Case Study We consider a Euler-Maruyama scheme performing µ explicit Euler integration steps along ν independent Monte Carlo sample paths. A basic implementation of this method evaluates a function of the following structure: y = F (x) = G Pµ 1(P 0 µ 2(. 0.. P0 0 (S 0 (x))...))., ν 1 ν 1 (Pµ 2 (... P0 (S ν 1 (x))...)) P ν 1 µ 1 where the arguments x (strike price, time to maturity, interest rate(s), price of underlying and market observed volatilities) are scattered over the individual paths by the S i, i = 0,..., ν 1, each path performs integration steps P i j, j = 0,..., µ 1, and the results are gathered by G to yield y (option price). Naumann, April 2016, Adjoint Code Design Patterns 13

Adjoint Code Design Patterns with dco/c++ Case Study: ACModule Class Diagram Naumann, April 2016, Adjoint Code Design Patterns 14

Adjoint Code Design Patterns with dco/c++ Case Study: ACModule Object Diagram See www.stce.rwth-aachen.de/files/naumann/ad16/html/ for doxygen. Naumann, April 2016, Adjoint Code Design Patterns 15

Adjoint Code Design Patterns with dco/c++ Further Scenarios Symbolic Adjoints of Implicit Functions Adjoint Linear Systems Adjoint Nonlinear Systems Adjoint Nonlinear (Locally) Convex Objective Functions Local Finite Differences Black-Box Primal Active Control Flow Dependence Nonsmooth Time Stepping Preaccumulation of Local Jacobians Detection and Exploitation of Sparsity Elimination Techniques on Linearized DAGs Reverse Accumulation for Adjoint Fix-Point Iteration Dynamic Call Tree Reversal Verified Enclosures through Interval Arithmetic / Convex Envelopes... Naumann, April 2016, Adjoint Code Design Patterns 16

Adjoint Code Design Patterns with dco/c++ Selected Scenarios (Toy) x 2 p = 0 x d dp differentiate ( x 2 x dx ) ( dp 1 = 2 x x dx ) x = 0 dp = p (x, G) = S(x 0, p) record perturb ẋ = S(x 0, p + h) p = S( p 0, x, x) x p interpret p = S(G, x) p = x 2 p = x 2 x p p x ẋ x h U.N. at al.: Algorithmic Differentiation of Numerical Methods: Tangent and Adjoint Solvers for Parameterized Systems of Nonlinear Equations, ACM TOMS, 2015. Naumann, April 2016, Adjoint Code Design Patterns 17

Adjoint Code Design Patterns with dco/c++ Local Finite Differences (Notation: v v (1) ) Naumann, April 2016, Adjoint Code Design Patterns 18

Adjoint Code Design Patterns with dco/c++ A(p) x(p) = b(p) A T b = x; Ā = b x T Naumann, April 2016, Adjoint Code Design Patterns 19

Adjoint Code Design Patterns with dco/c++ F (x(p), p) = 0 xf T z = x; p = pf T z Naumann, April 2016, Adjoint Code Design Patterns 20

Adjoint Code Design Patterns with dco/c++ argmin x f(x(p), p) xxf T z = x; p = xpf T z Naumann, April 2016, Adjoint Code Design Patterns 21

Conclusion The quality of an adjoint AD solution / tool is defined by robustness wrt. language features of target code efficiency of adjoint propagation flexibility wrt. design scenarios sustainability wrt. dynamics in user requirements, personnel, hard- and software You want (sooner or later) top-quality AAD tools and documentation, e.g, dco/c++ adjoint numerical libraries, e.g, NAG AD Library training, e.g, 1-3 day in-house courses support / consultancy, e.g, responsive and competent help desk Naumann, April 2016, Adjoint Code Design Patterns 22

Outlook dco/c++ v4.x tape compression / decomposition / debugging run time code generation file / MPI tapes heapless taping on GPUs fully automatic checkpointing / adjoints exploratory spawning for globalization and nonsmoothness multi-core taping approximate / gut computing Naumann, April 2016, Adjoint Code Design Patterns 23

Follow-up naumann@stce.rwth-aachen.de Naumann, April 2016, Adjoint Code Design Patterns 24

STCE. Adjoint Code Design Patterns. Uwe Naumann. RWTH Aachen University, Germany. QuanTech Conference, London, April 2016