STCE. Adjoint Code Design Patterns. Uwe Naumann. RWTH Aachen University, Germany. QuanTech Conference, London, April 2016

Similar documents
The Art of Differentiating Computer Programs

STCE. Mind the Gap in the Chain Rule. Uwe Naumann. Software Tool Support for Advanced Algorithmic Differentiation of Numerical Simulation Code

Nested Differentiation and Symmetric Hessian Graphs with ADTAGEO

Algorithmic Differentiation of Numerical Methods: Tangent and Adjoint Solvers for Parameterized Systems of Nonlinear Equations

STCE. Mixed Integer Programming for Call Tree Reversal. J. Lotz, U. Naumann, S. Mitra

Numerical Nonlinear Optimization with WORHP

Multi-Factor Finite Differences

Sensitivity analysis by adjoint Automatic Differentiation and Application

1.2 Derivation. d p f = d p f(x(p)) = x fd p x (= f x x p ). (1) Second, g x x p + g p = 0. d p f = f x g 1. The expression f x gx

Automatic Differentiation Algorithms and Data Structures. Chen Fang PhD Candidate Joined: January 2013 FuRSST Inaugural Annual Meeting May 8 th, 2014

Fast evaluation of mixed derivatives and calculation of optimal weights for integration. Hernan Leovey

Malliavin Calculus in Finance

An Active Set Strategy for Solving Optimization Problems with up to 200,000,000 Nonlinear Constraints

Module III: Partial differential equations and optimization

Numerical Modelling in Fortran: day 10. Paul Tackley, 2016

Edwin van der Weide and Magnus Svärd. I. Background information for the SBP-SAT scheme

Verified High-Order Optimal Control in Space Flight Dynamics

WRF performance tuning for the Intel Woodcrest Processor

Scikit-learn. scikit. Machine learning for the small and the many Gaël Varoquaux. machine learning in Python

Low-Memory Tour Reversal in Directed Graphs 1. Uwe Naumann, Viktor Mosenkis, Elmar Peise LuFG Informatik 12, RWTH Aachen University, Germany

Adjoint approach to optimization

Introduction to numerical simulations for Stochastic ODEs

A Symbolic Numeric Environment for Analyzing Measurement Data in Multi-Model Settings (Extended Abstract)

Exploiting Fill-in and Fill-out in Gaussian-like Elimination Procedures on the Extended Jacobian Matrix

IMPLEMENTATION OF AN ADJOINT THERMAL SOLVER FOR INVERSE PROBLEMS

DiffSharp An AD Library for.net Languages

Numerical Methods I Solving Nonlinear Equations

Implicitly and Explicitly Constrained Optimization Problems for Training of Recurrent Neural Networks

Math and Numerical Methods Review

Parallel Polynomial Evaluation

ECS289: Scalable Machine Learning

Solving equilibrium problems using extended mathematical programming and SELKIE

Visual Tracking via Geometric Particle Filtering on the Affine Group with Optimal Importance Functions

Dense Arithmetic over Finite Fields with CUMODP

Sensitivity Analysis AA222 - Multidisciplinary Design Optimization Joaquim R. R. A. Martins Durand

Current Developments and Applications related to the Discrete Adjoint Solver in SU2

Design and implementation of a new meteorology geographic information system

Overview of the PyLith Finite Element Code

Statistical Clustering of Vesicle Patterns Practical Aspects of the Analysis of Large Datasets with R

PROJECTION METHODS FOR DYNAMIC MODELS

Neural Networks in Structured Prediction. November 17, 2015

Short-time expansions for close-to-the-money options under a Lévy jump model with stochastic volatility

Compute the behavior of reality even if it is impossible to observe the processes (for example a black hole in astrophysics).

Variational Bayesian Inference Techniques

SMO vs PDCO for SVM: Sequential Minimal Optimization vs Primal-Dual interior method for Convex Objectives for Support Vector Machines

LINEAR AND NONLINEAR PROGRAMMING

Lecture 8: Fast Linear Solvers (Part 7)

Probabilistic Graphical Models

GPU Applications for Modern Large Scale Asset Management

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods

Stochastic Volatility and Correction to the Heat Equation

DELFT UNIVERSITY OF TECHNOLOGY

Open-source finite element solver for domain decomposition problems

Variance Reduction Techniques for Monte Carlo Simulations with Stochastic Volatility Models

A new approach for investment performance measurement. 3rd WCMF, Santa Barbara November 2009

Efficient robust optimization for robust control with constraints Paul Goulart, Eric Kerrigan and Danny Ralph

Computing the Geodesic Interpolating Spline

GLoBES. Patrick Huber. Physics Department VT. P. Huber p. 1

Geog 469 GIS Workshop. Managing Enterprise GIS Geodatabases

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods

REPORTS IN INFORMATICS

Hydra. A library for data analysis in massively parallel platforms. A. Augusto Alves Jr and Michael D. Sokoloff

CP2K. New Frontiers. ab initio Molecular Dynamics

A nonlinear equation is any equation of the form. f(x) = 0. A nonlinear equation can have any number of solutions (finite, countable, uncountable)

Numerical Methods for Large-Scale Nonlinear Equations

A computational architecture for coupling heterogeneous numerical models and computing coupled derivatives

What s New in Active-Set Methods for Nonlinear Optimization?

Calculation of Sound Fields in Flowing Media Using CAPA and Diffpack

Efficient Serial and Parallel Coordinate Descent Methods for Huge-Scale Convex Optimization

Parallel-in-Time Optimization with the General-Purpose XBraid Package

Using an Auction Algorithm in AMG based on Maximum Weighted Matching in Matrix Graphs

Parallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors

Lecture 1 Numerical methods: principles, algorithms and applications: an introduction

Optimization Tools in an Uncertain Environment

Work in Progress: Reachability Analysis for Time-triggered Hybrid Systems, The Platoon Benchmark

Numerical Modelling in Fortran: day 9. Paul Tackley, 2017

D03PEF NAG Fortran Library Routine Document

Treecodes for Cosmology Thomas Quinn University of Washington N-Body Shop

The Chemical Kinetics Time Step a detailed lecture. Andrew Conley ACOM Division

Parallelization of the Dirac operator. Pushan Majumdar. Indian Association for the Cultivation of Sciences, Jadavpur, Kolkata

Reservoir Computing and Echo State Networks

Chapter 6. Differentially Flat Systems

Numerical methods in molecular dynamics and multiscale problems

Dynamic Risk Measures and Nonlinear Expectations with Markov Chain noise

A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices

Hybrid Analog-Digital Solution of Nonlinear Partial Differential Equations

Recent Progress of Parallel SAMCEF with MUMPS MUMPS User Group Meeting 2013

A particle-in-cell method with adaptive phase-space remapping for kinetic plasmas

Using AD to derive adjoints for large, legacy CFD solvers

2326 Problem Sheet 1. Solutions 1. First Order Differential Equations. Question 1: Solve the following, given an explicit solution if possible:

Airfoil shape optimization using adjoint method and automatic differentiation

FAS and Solver Performance

Numerical Integration of SDEs: A Short Tutorial

Advances in Extreme Learning Machines

Parsimonious Adaptive Rejection Sampling

Zero-Order Methods for the Optimization of Noisy Functions. Jorge Nocedal

Electronics 101 Solving a differential equation Incorporating space. Numerical Methods. Accuracy, stability, speed. Robert A.

FEniCS Course Lecture 13: Introduction to dolfin-adjoint. Contributors Simon Funke Patrick Farrell Marie E. Rognes

Adjoint code development and optimization using automatic differentiation (AD)

Dual Reciprocity Method for studying thermal flows related to Magma Oceans

Transcription:

Adjoint Code Design Patterns Uwe Naumann RWTH Aachen University, Germany QuanTech Conference, London, April 2016

Outline Why Adjoints? What Are Adjoints? Software Tool Support: dco/c++ Adjoint Code Design Patterns Naumann, April 2016, Adjoint Code Design Patterns 2

Why Adjoints? Setting the Stage In U.N. and J. du Toit: Adjoint Algorithmic Differentiation Tool Support for Typical Numerical Patterns in Computational Finance. Under Review. Also appeared as NAG Technical Report TR 3/14, The Numerical Algorithms Group. we price a simple European Call option written on an underlying S = S(t) : IR + IR +, described by the SDE ds(t) = S(t) r dt + S(t) σ(t, S(t)) dw (t) with time t 0, maturity T > 0, strike K > 0, constant interest rate r > 0, volatility σ = σ(t, S) : IR + IR + IR +, and Brownian motion W = W (t) : IR + IR. We use this simple case study for illustration of common patterns found in financial simulations in the context of Adjoint Algorithmic Differention (AAD). Naumann, April 2016, Adjoint Code Design Patterns 3

Why Adjoints? Setting the Stage The value of a European call option driven by S is given by the expectation V = Ee r T (S(T ) K) + for given interest r, strike K, and maturity T. It can be evaluated using Monte Carlo simulation or restated as a PDE and solved by a finite difference method. We compare the computation of the gradient of the option price V IR + wrt. the problem parameters (K, S(0), r, T, and the variable number of parameters of the local volatility surface) by finite differences with an adjoint code based on dco/c++... Naumann, April 2016, Adjoint Code Design Patterns 4

Why Adjoints? Setting the Stage Monte Carlo method using 10 4 paths performing 360 time steps each; gradient of size n = 222; running on standard PC with 3GB of RAM available primal central FD naive AAD robust AAD run time (sec.) 2.3 1010.7 24.4 observing gradient with machine accuracy by AAD prohibitive memory requirement > 40GB for naive AAD (quasi-)constant factor (here approx. 10) wrt. primal run time for AAD within given memory bounds aiming for robust and efficient gradient (and Hessian) code download adjoint option pricers from www.nag.co.uk/downloads/addownloads including first- and second-order adjoint ensembles and evolutions Naumann, April 2016, Adjoint Code Design Patterns 5

What Are Adjoints? Let y = F (x), F : IR n IR m, be continuously differentiable. Then...... (approximate) tangents (directional derivatives) IR m l Ẏ = F (x) Ẋ IR m n IR n l yield (approximate) Jacobian at O(n) Cost(F );... adjoints IR n l X = F (x) T IR n m Ȳ IR m l yield Jacobian at O(m) Cost(F ) (cheap gradients). Challenge: Reversal of data flow yields potentially prohibitive memory requirement (O(Cost(F )))! Naumann, April 2016, Adjoint Code Design Patterns 6

Tangents by Overloading 5: y 0( ) 6: y 1(/) v4 x0 1/x1 4: exp v4 3: sin v4/x 2 1 cos(v2) 2: x1 x0 0: x 0 1: x 1 t := e sin(x0 x1) y 0 := x 0 t y 1 := t x 1 v 2 = x 0 x 1 v 2 = ẋ 0 x 1 + x 0 ẋ 1 v 3 = sin(v 2 ) v 3 = cos(v 2 ) v 2 v 4 = exp(v 3 ) v 4 = v 4 v 3 y 0 = x 0 v 4 ẏ 0 = ẋ 0 v 4 + x 0 v 4 y 1 = v 4 /x 1 ẏ 1 = v 4 /x 1 v 4 ẋ 1 /x 2 1 5: ẏ 0 6: ẏ 1 v4 x0 1/x1 4: v 4 v4 3: v 3 v4/x 2 1 cos(v2) 2: v 2 x1 x0 0: ẋ 0 1: ẋ 1 Naumann, April 2016, Adjoint Code Design Patterns 7

Adjoints by Overloading 5: y 0( ) 6: y 1(/) v4 x0 1/x1 4: exp v4 3: sin v4/x 2 1 cos(v2) 2: x1 x0 0: x 0 1: x 1 t := e sin(x0 x1) y 0 := x 0 t y 1 := t x 1 // primal... v 2 = v 3 = v 4 := 0 x 1 = v 4 /x 2 1 ȳ 1 v 4 + = 1/x 1 ȳ 1 v 4 + = x 0 ȳ 0 x 0 + = v 4 ȳ 0 v 3 + = v 4 v 4 v 2 + = cos(v 2 ) v 3 x 1 + = x 0 v 2 x 0 + = x 1 v 2 5: ȳ 0 6: ȳ 1 v4 x0 1/x1 4: v 4 v4 3: v 3 v4/x 2 1 cos(v2) 2: v 2 x1 x0 0: x 0 1: x 1 Naumann, April 2016, Adjoint Code Design Patterns 8

Software Tool Support: dco/c++ dco/c++ (derivative code by overloading in C++) features tangents and adjoints of arbitrary order through recursive template instantiation for numerical simulation code implemented in C++ front-ends for Fortran and C# (beta) optimized assignment-level gradient code through expression templates cache-optimized internal representation in various incarnations vector modes / detection and exploitation of sparsity external adjoint / Jacobian interfaces user-defined intrinsics intrinsic NAG Library functions (e.g. Linear Algebra, Interpolation, Root Finding, Nearest Correlation Matrix) support for parallelism: thread-safe data structures, adjoint MPI library, GPU coupling Naumann, April 2016, Adjoint Code Design Patterns 9

Software Tool Support: dco/c++ Moreover, we have implemented an industrial-strength software engineering infrastructure 1 including professional software development and quality assurance (NAG ) multi-platform (Windows, Linux, Mac OS X) / compiler (vcc, gcc, clang, icc) support overnight building / testing version control user documentation / examples numerous successful applications, including QuantLib, OpenFOAM, PETSc, ICON, Telemac, Jurassic2, ACE+, and several confidential projects with tier-1 investment banks 1 building on academic prototype Naumann, April 2016, Adjoint Code Design Patterns 10

Adjoint Code Design An algorithmic adjoint of F : IR n IR m : y z q = F (x) computes X = F T Ȳ = IR n l IR m l Z {}} k 1 { F 1T... F kt (z k 1 ) ( F k+1t... ( F qt Ȳ )...) }{{} Z k for x z 0 and based on adjoint code modules (ACModules) Z i 1 = F it Zi for i = q,..., 1. The art of differentiating computer programs / algorithmic differentiation (AD) is about provision of a correct, robust, efficient, scalable, flexible ACModule hierarchy. z 4 z 5 z 3 (F 3 F 2 ) z 1 z 0 0 z 0 1 Naumann, April 2016, Adjoint Code Design Patterns 11

Adjoint Code Design with dco/c++ Mind the Gap Naumann, April 2016, Adjoint Code Design Patterns 12

Adjoint Code Design with dco/c++ Case Study We consider a Euler-Maruyama scheme performing µ explicit Euler integration steps along ν independent Monte Carlo sample paths. A basic implementation of this method evaluates a function of the following structure: y = F (x) = G Pµ 1(P 0 µ 2(. 0.. P0 0 (S 0 (x))...))., ν 1 ν 1 (Pµ 2 (... P0 (S ν 1 (x))...)) P ν 1 µ 1 where the arguments x (strike price, time to maturity, interest rate(s), price of underlying and market observed volatilities) are scattered over the individual paths by the S i, i = 0,..., ν 1, each path performs integration steps P i j, j = 0,..., µ 1, and the results are gathered by G to yield y (option price). Naumann, April 2016, Adjoint Code Design Patterns 13

Adjoint Code Design Patterns with dco/c++ Case Study: ACModule Class Diagram Naumann, April 2016, Adjoint Code Design Patterns 14

Adjoint Code Design Patterns with dco/c++ Case Study: ACModule Object Diagram See www.stce.rwth-aachen.de/files/naumann/ad16/html/ for doxygen. Naumann, April 2016, Adjoint Code Design Patterns 15

Adjoint Code Design Patterns with dco/c++ Further Scenarios Symbolic Adjoints of Implicit Functions Adjoint Linear Systems Adjoint Nonlinear Systems Adjoint Nonlinear (Locally) Convex Objective Functions Local Finite Differences Black-Box Primal Active Control Flow Dependence Nonsmooth Time Stepping Preaccumulation of Local Jacobians Detection and Exploitation of Sparsity Elimination Techniques on Linearized DAGs Reverse Accumulation for Adjoint Fix-Point Iteration Dynamic Call Tree Reversal Verified Enclosures through Interval Arithmetic / Convex Envelopes... Naumann, April 2016, Adjoint Code Design Patterns 16

Adjoint Code Design Patterns with dco/c++ Selected Scenarios (Toy) x 2 p = 0 x d dp differentiate ( x 2 x dx ) ( dp 1 = 2 x x dx ) x = 0 dp = p (x, G) = S(x 0, p) record perturb ẋ = S(x 0, p + h) p = S( p 0, x, x) x p interpret p = S(G, x) p = x 2 p = x 2 x p p x ẋ x h U.N. at al.: Algorithmic Differentiation of Numerical Methods: Tangent and Adjoint Solvers for Parameterized Systems of Nonlinear Equations, ACM TOMS, 2015. Naumann, April 2016, Adjoint Code Design Patterns 17

Adjoint Code Design Patterns with dco/c++ Local Finite Differences (Notation: v v (1) ) Naumann, April 2016, Adjoint Code Design Patterns 18

Adjoint Code Design Patterns with dco/c++ A(p) x(p) = b(p) A T b = x; Ā = b x T Naumann, April 2016, Adjoint Code Design Patterns 19

Adjoint Code Design Patterns with dco/c++ F (x(p), p) = 0 xf T z = x; p = pf T z Naumann, April 2016, Adjoint Code Design Patterns 20

Adjoint Code Design Patterns with dco/c++ argmin x f(x(p), p) xxf T z = x; p = xpf T z Naumann, April 2016, Adjoint Code Design Patterns 21

Conclusion The quality of an adjoint AD solution / tool is defined by robustness wrt. language features of target code efficiency of adjoint propagation flexibility wrt. design scenarios sustainability wrt. dynamics in user requirements, personnel, hard- and software You want (sooner or later) top-quality AAD tools and documentation, e.g, dco/c++ adjoint numerical libraries, e.g, NAG AD Library training, e.g, 1-3 day in-house courses support / consultancy, e.g, responsive and competent help desk Naumann, April 2016, Adjoint Code Design Patterns 22

Outlook dco/c++ v4.x tape compression / decomposition / debugging run time code generation file / MPI tapes heapless taping on GPUs fully automatic checkpointing / adjoints exploratory spawning for globalization and nonsmoothness multi-core taping approximate / gut computing Naumann, April 2016, Adjoint Code Design Patterns 23

Follow-up naumann@stce.rwth-aachen.de Naumann, April 2016, Adjoint Code Design Patterns 24