The Art of Differentiating Computer Programs

Size: px

Start display at page:

Download "The Art of Differentiating Computer Programs"

Rafe Kelley
5 years ago
Views:

1 The Art of Differentiating Computer Programs Uwe Naumann LuFG Informatik 12 Software and Tools for Computational Engineering RWTH Aachen University, Germany 1 / 69

2 Contents AD Informally AD Formally AD by Overloading Second-(Higher-)Order AD Formally Advanced AD (with dco/c++) External Function Interface Adjoint Ensembles User-Defined Adjoints AD Projects 2 / 69

3 Who knows how to differentiate... y=x x; v o i d f ( c o n s t double x, double& y ) { y=x x ; } v o i d f ( i n t n, double c o n s t x, double& y ) { f o r ( i n t i =0; i <n;++ i ) x [ i ] = x [ i ] ; y =0; f o r ( i n t i =0; i <n;++ i ) y+=x [ i ] ; y =y ; } 3 / 69

4 Algorithmic Differentiation Basics AD assumes differentiability of a given implementation of a multivariate vector function y = f (x) : IR n IR m (e.g., existence of Jacobian f (x) IR m n and Hessian 2 f (x) IR m n n ).... transforms the given program into programs for computing arbitrary projections of first- and higher-order derivative tensors (e.g., f (x) ẋ and f (x) T ȳ).... can be implemented by source-to-source transformation and/or operator overloading combined with generic (meta-)programming techniques.... can be supported by corresponding software tools in order to (partially) automate its application.... has been applied successfully to a large number of real-world codes from Computational Science, Engineering, and Finance; see, for example, proceedings of the six AD conferences. While the basic idea is reasonably straight forward, the actual implementation (both user and developer perspective) yields several challenges. 4 / 69

5 Foundations of what is coming next... U. Naumann: The Art of Differentiating Computer Programs. An Introduction to Algorithmic Differentiation. Number 24 in Software, Environments, and Tools, SIAM, / 69

6 Algorithmic Differentiation Idea by Example w.l.o.g. m = 1 y where y = f (x) = f 3(f 2(f 1(x))) = IR n x = f 1(x) : x i = xi 2 IR 1 y = f 2(x) = n 1 i=0 x i IR 1 y = f 3(y) = y 2 ( n 1 i=0 x 2 i ) 2 [2 y] y [1] [1] [1] x0 x1 x2 [2 x0] [2 x1] [2 x2] x0 x1 x2 f i are continuously differentiable wrt. their arguments. Hence, by the chain rule f (x) IR n y x = f3(y) IR f 2(x) f 1(x) IR 1 n x 0. = (2 y) (1... 1) 2. IR n n 2 x n 1 6 / 69

7 Algorithmic Differentiation Idea by Example finite differences A (first-order forward) finite difference quotient f (x + h ẋ) f (x) h approximates the (first) directional (total) derivative ẏ = f (x) ẋ = f 3(y) f 2(x) f 1(x) ẋ = (2 y) ( ) x x n 1 ẋ 0. ẋ n 1. The whole gradient f (x) can be approximated by letting ẋ range over the Cartesian basis vectors in IR n at a computational cost of O(n) Cost(f ). 7 / 69

8 Can we avoid truncation? YES, WE CAN! 8 / 69

9 Algorithmic Differentiation Idea by Example tangent code A (first-order) tangent code computes the same directional derivative with machine accuracy. ẏ = (2 y [= n 1 i=0 x2 i ] ) ( ) x x n 1 ẋ 0. ẋ n 1 ẏ [2 y] ẏ [1] [1] [1] ẋ0 ẋ1 ẋ2 [2 x0] [2 x1] [2 x2] ẋ0 ẋ1 ẋ2 The whole gradient f (x) can be approximated by letting ẋ range over the Cartesian basis vectors in IR n at a computational cost of O(n) Cost(f ). 9 / 69

10 Writing Tangent Code by Hand Basic Tangent Code (Rules) 1. duplicate active data segment double v, t 1 v ; 2. augment with assignment-level tangent code t 1 z=c o s ( x z ) ( x t 1 z+t 1 x z ) ; z=s i n ( x z ) ; 3. leave intraprocedural flow of control unchanged f o r ( i n t i =0; i <n;++ i ) { t 1 y [ i ] = s i n ( x [ i ] ) t 1 x [ i ] ; y [ i ]+=cos ( x [ i ] ) ; } 4. replace subprogram calls with tangent versions f ( n, x,m, y, t 1 x, t 1 y ) ; 10 / 69

11 Can we reduce the computational cost? YES, WE CAN! 11 / 69

12 Algorithmic Differentiation Idea by Example adjoint code y = x 0 x 1 12 / 69

13 Algorithmic Differentiation Idea by Example adjoint code Alternatively, the transposed Jacobian can be computed as ȳ [2 v] x = f (x) T ȳ = f 1(x) T f 2(x) T f 3(y) T ȳ x 0 1. = 2.. (2 y ) ȳ [= n 1 2 x n 1 1 i=0 x2 i ] ȳ [1] [1] [1] x0 x1 x2 [2 x0] [2 x1] [2 x2] x0 x1 x2 The whole gradient f (x) can be approximated by setting ȳ = 1 at the computational cost of O(1) Cost(f ). 13 / 69

14 Writing Adjoint Code by Hand Basic Adjoint Code (Rules) 1. duplicate active data segment 2. store lost (due to overwriting or leaving scope) required values in forward section d o u b l e s. push ( z ) ; z=s i n ( x z ) ; 3. assignment-level adjoint code in reverse section... z=d o u b l e s. top ( ) ; d o u b l e s. pop ( ) ; a 1 x+=x c o s ( x z ) a 1 z a 1 z=z c o s ( x z ) a 1 z 4. reverse intraprocedural flow of control by (combinations of) 4.1 enumeration of basic blocks 4.2 counting loop iterations and enumeration of branches 4.3 explicit reversal of for-loops 14 / 69

15 Summary: Tangents and Adjoints by AD y = f (x) : IRn IR (w.l.o.g. m = 1) STCE I Tangent(-Linear) Code f (1) (forward mode) y (1) = f (x) x(1)n IRn IR f at O(n) Cost(f ) Note: Approximation by finite differences. I Adjoint Code f(1) (reverse mode) x(1) = y(1) f (x) IR f at O(1) Cost(f ) c ( J.Lotz) 15 / 69

Nice to have? MITgcm, (EAPS, MIT) in collaboration with ANL, MIT, Rice, UColorado J. Utke, U.N. et al: OpenAD/F: A modular, open-source tool for automatic differentiation of Fortran codes.

16 Nice to have? MITgcm, (EAPS, MIT) in collaboration with ANL, MIT, Rice, UColorado J. Utke, U.N. et al: OpenAD/F: A modular, open-source tool for automatic differentiation of Fortran codes. ACM TOMS 34(4), Plot: A tangent computation / finite difference approximation for 64,800 grid points at 1 min each would keep us waiting for a month and a half... :-((( We did it in less than 10 minutes thanks to discrete adjoints computed by a differentiated version of the MITgcm :-) 16 / 69

17 Can we do it again (and again, and...)? YES WE CAN! 17 / 69

18 Tangent Model Graphically y[f ] F (x) y[f ] x F (x) x (1) x s x (1) x s y (1) y s = y x x = F (x) x(1) s 18 / 69

19 Adjoint Model Graphically t y (1) y[f ] F (x) x y (1) t y x (1) t x = t y y x = y (1) F (x) 19 / 69

20 Second-Order Tangent Model Graphically (w.l.o.g. m = 1) y (1) [ ] x (1)T F F (x) 2 F (x) x x (1) x (2) x (1,2) s Tangent Extension of the Linearized DAG of the Tangent Model y (1) = F (x) x (1) of y = F (x) 20 / 69

21 Second-Order Adjoint Model Graphically (w.l.o.g. m = 1) x(1)[ ] y(1) F (x) F 2 F (x) y(1) x y(2) (1) x (2) s Tangent Extension of the Linearized DAG of the Adjoint Model x (1) = F (x) T y (1) of y = F (x) 21 / 69

22 Summary: Second-(and Higher-)Order AD y = f (x) : IR n IR Second-Order Tangent Code (forward-over-forward AD) y (1,2) = x (1) T 2 f (x) x (2) IR n IR n n IR n 2 f at O(n 2 ) Cost(f ) Second-Order Adjoint Code (forward-over-reverse AD) x (2) (1) = y (1) 2 f (x) x (2) 2 f x (2) at O(1) Cost(f ) resp. 2 f at O(n) Cost(f ) Higher-order tangent (ToTo...ToT) and adjoint (ToTo...ToA) models are derived recursively 22 / 69

23 Algorithmic Differentiation Formalism 1. given implementation of F : IR n IR m : y = F (x), can be decomposed into a single assignment code (SAC) for i = 0,..., n 1 : v i = x i for j = n,..., n + q 1 : v j = ϕ j ( (vk ) k j ) for k = 0,..., m 1 : y k = v n+p+k where q = p + m 2. elemental functions ϕ j posess jointly continuous partial derivatives with respect to their arguments (v k ) k j at alle points of interest 23 / 69

24 Algorithmic Differentiation Tangent (Forward) Mode Let v j ϕ j s ((v k) k j ) for some s IR and j = 0,..., n + p + m 1. For given ẋ i, i = 0,..., n 1, tangent mode AD computes directional derivatives as follows: for i = 0,..., n 1 : v i = ẋ i ; v i = x i for j = n,..., n + q 1 : v j = i j ϕ j v i (v k ) k j v i v j = ϕ j ((v k ) k j ) for k = 0,..., m 1 : ẏ k = v n+p+k ; y k = v n+p+k 24 / 69

25 Algorithmic Differentiation Tangent Mode (Example) Let F : IR 2 IR 2, y = F (x) be implemented as x 0 = x 0 x 1 y 0 = sin(x 0) y 1 = x 0 x 1 yielding the SAC v 0 = x 0; v 1 = x 1 v 2 = v 0 v 1 y 0 = v 3 = sin(v 2) y 1 = v 4 = v 2 v / 69

26 Algorithmic Differentiation Tangent Mode (Proof Idea) Consider the extended function F : IR 5 IR 5 defined as where r i, i = 2, 3, 4 are random and x 0 x 0 x 0 x 1 v 2 y 0 = F x 1 r 2 r 3 = Φ3(Φ2(Φ1( x 1 r 2 r 3 ) y 1 r 4 r 4 x 0 x 0 x 1 Φ 1( r 2 r 3 ) = ( x 1 x 1 v 2 ); Φ2( v 2 r 3 r 3 ) = ( x 1 x 1 v 2 ); Φ3( v 2 v 3 v 3 ) = ( x 1 v 2 v 3 ) r 4 r 4 r 4 r 4 r 4 v 4 x 0 x 0 x 0 x 0 for v 2 = v 0 v 1; v 3 = sin(v 2); v 4 = v 2 v / 69

27 Algorithmic Differentiation Tangent Mode (Proof Idea) If F is continuously differentiable at x, then so is F and, hence, x 0 x 0 x 1 F = F( r 2 r 3 ) = Φ3( x 1 v 2 v 3 ) Φ2( x 1 v 2 r 3 ) Φ1( x 1 r 2 r 3 ) r 4 r 4 r 4 r 4 since = cos(v 2) x 1 x Moreover,... v 4 = v 2 v 1; v 3 = sin(v 2); v 2 = x 0 x 1. x 0 x 0 27 / 69

28 Algorithmic Differentiation Tangent Mode (Proof Idea) ẋ 0 ẋ 0 ẋ1 ẋ1 v2 ẏ = F ṙ2 0 ṙ3 ẏ1 ṙ ẋ ẋ1 = cos(v 2 ) 0 0 x 1 x ṙ2 ṙ ṙ ẋ ẋ1 = cos(v 2 ) 0 0 x 1 ẋ 0 + x 0 ẋ 1 ṙ ṙ ẋ 0 ẋ 0 ẋ ẋ1 ẋ1 ẋ1 = v2 cos v 2 v = v2 2 v = v2 3 v ṙ 4 v 2 ẋ 1 v 4 Q.E.D 28 / 69

29 Algorithmic Differentiation Adjoint (Reverse) Mode Let v j t ϕ j ((v k ) k j ) for some t IR and j = 0,..., n + p + m 1. For given ẏ i, i = 0,..., m 1, adjoint mode AD computes adjoint derivatives as follows: for i = 0,..., n 1 : v i = x i for j = n,..., n + q 1 : v j = ϕ j ((v k ) k j ) for k = 0,..., m 1 : y k = v n+p+k for k = 0,..., m 1 : v n+p+k = ȳ k for i = 0,..., n 1 : v i = x i for j = n + q 1,..., n : i j : v i = v i + ϕ j v i (v k ) k j v j v j = 0 for i = 0,..., n 1 : x i = v i 29 / 69

30 Algorithmic Differentiation Adjoint Mode (Proof Idea) Standard matrix arithmetic arithmetic yields x 0 x 0 x 1 x 1 x 1 x 1 F T = F( r 2 r 3 )T = Φ 1( r 2 r 3 )T Φ 2( v 2 r 3 )T Φ 3( v 2 v 3 r 4 r 4 r 4 r 4 x 0 x 0 )T 1 0 x x = cos(v 2) Moreover, / 69

31 Algorithmic Differentiation Adjoint Mode (Proof Idea) x 0 x 0 x 1 x v 2 ȳ = F T 1 v 2 0 ȳ 3 ȳ 1 ȳ x x x x = cos(v 2 ) v 2 ȳ ȳ x x x x = cos(v 2 ) ȳ 1 v 2 + ȳ 1 ȳ x x x x = ȳ 1 v 2 + ȳ 1 + cos(v 2 ) ȳ 0 0 = / 69

32 Algorithmic Differentiation Adjoint Mode (Proof Idea) corresponding to x 0 + x 1 ( v 2 + ȳ 1 + cos(v 2 ) ȳ 0 ) x 1 ȳ 1 + x 0 ( v 2 + ȳ 1 + cos(v 2 ) ȳ 0 )... = v 2 = v ȳ 1 x 1 = x 1 1 ȳ 1 ȳ 1 = 0 v 2 = v 2 + cos(v 2 ) ȳ 0 ȳ 0 = 0 x 1 = x 1 + x 0 v 2 x 0 = x 0 + x 1 v 2 v 2 = 0 Note the use of v 2 (stored in x 0) prior to input value of x 0. Q.E.D 32 / 69

33 AD by Overloading Example: y = f (x) = sin(e x ) v o i d f ( double &z ) { z=exp ( z ) ; z=s i n ( z ) ; } 1 sin(exp(x)) (generated with Gnuplot) 33 / 69

34 Tangents by Overloading... the dco approach (conceptually)... v 1(exp) v o i d t 1 f ( t 1 s : : t y p e &[z, t 1 z ] ) {... [ exp ( z ), exp ( z ) t 1 z ] ; } z 1 v (1) (1) 1 =e z1 z 1 z (1) 1 34 / 69

35 Tangents by Overloading z 2(=) z (1) 2 = 1 v (1) 1 v o i d t 1 f ( t 1 s : : t y p e &[z, t 1 z ] ) { [ z, t 1 z ]=[ exp ( z ), exp ( z ) t 1 z ] ; } v 1(exp) z 1 v (1) (1) 1 =e z1 z 1 z (1) 1 35 / 69

36 Tangents by Overloading v 2(sin) z 2(=) v (1) 2 = cos(z 2) z (1) 2 v o i d t 1 f ( t 1 s : : t y p e &[z, t 1 z ] ) { [ z, t 1 z ]=[ exp ( z ), exp ( z ) t 1 z ] ;... [ s i n ( z ), c o s ( z ) t 1 z ] ; } v 1(exp) z (1) 2 = 1 v (1) 1 v (1) (1) 1 =e z1 z 1 z 1 z (1) 1 36 / 69

37 Tangents by Overloading v o i d t 1 f ( t 1 s : : t y p e &[z, t 1 z ] ) { [ z, t 1 z ]=[ exp ( z ), exp ( z ) t 1 z ] ; [ z, t 1 z ]=[ s i n ( z ), c o s ( z ) t 1 z ] ; } Driver: z 3(=) v 2(sin) z 2(=)... t 1 s : : t y p e [ z, t 1 z ] = [..., 1 ] ; t 1 f ( [ z, t 1 z ] ) ; cout << t 1 z << e n d l ;... z 1 v 1(exp) z (1) 3 = 1 v (1) 2 v (1) 2 = cos(z 2) z (1) 2 z (1) 2 = 1 v (1) 1 v (1) (1) 1 =e z1 z 1 z (1) 1 37 / 69

38 dco/c++ Preprocessing The given implementation v o i d f ( i n t n, c o n s t double x, i n t n p, c o n s t double x p i n t m, double y, i n t m p, double y p ) ; of f : IR n IR np IR m IR mp : ( ) y = f (x, x p) needs to be preprocessed (manually and/or automatically) yielding a new implementation f. y p The scope of this preprocessing depends on the AD context. In the simplest case a mere change of the types of all active program variables to the desired dco/c++ type is sufficient. More substantial modifications may be required, e.g., in the context of checkpointing or when defining user-defined intrinsics. 38 / 69

39 dco/c++ Generic First-Order Tangent Mode To compute y (1) = F (x) x (1) activate #include dco.hpp redeclare floating-point variables in F as of type dco:: tt1s<double>::type seed set directional derivatives of active inputs run call active F harvest get directional derivatives of active outputs Live: Example 39 / 69

40 Adjoints by Overloading... the dco approach (conceptually)... v o i d a 1 f ( a1s : : t y p e &[z, a 1 z ] ) { [ z, TAPE]= exp ( z ) ;... }... z 2(=) [1] v 1(exp) [e z1 ] z 1 40 / 69

41 Adjoints by Overloading z 3(=) [1] v o i d a 1 f ( a1s : : t y p e &[z, a 1 z ] ) { [ z, TAPE]= exp ( z ) ; [ z, TAPE]= s i n ( z ) ; }... v 2(sin) [cos(z 2)] z 2(=) [1] v 1(exp) [e z1 ] z 1 41 / 69

42 Adjoints by Overloading z 3(1) z 3(=) v o i d a 1 f ( a1s : : t y p e &[z, a 1 z ] ) { [ z, TAPE]= exp ( z ) ; [ z, TAPE]= s i n ( z ) ; }... a 1 z =1; TAPE. i n t e r p r e t ( ) ;... [1] v 2(sin) [cos(z 2)] z 2(=) [1] v 1(exp) [e z1 ] z 1 42 / 69

43 Adjoints by Overloading z 3(1) z 3(=) v o i d a 1 f ( a1s : : t y p e &[z, a 1 z ] ) { [ z, TAPE]= exp ( z ) ; [ z, TAPE]= s i n ( z ) ; }... a 1 z =1; TAPE. i n t e r p r e t ( ) ;... v 2(1) = 1 z 3(1) v 2(sin) [cos(z 2)] z 2(=) [1] v 1(exp) [e z1 ] z 1 43 / 69

44 Adjoints by Overloading z 3(1) z 3(=) v o i d a 1 f ( a1s : : t y p e &[z, a 1 z ] ) { [ z, TAPE]= exp ( z ) ; [ z, TAPE]= s i n ( z ) ; }... a 1 z =1; TAPE. i n t e r p r e t ( ) ;... v 2(1) = 1 z 3(1) v 2(sin) z 2(1) = cos(z 2) v 2(1) z 2(=) [1] v 1(exp) [e z1 ] z 1 44 / 69

45 Adjoints by Overloading z 3(1) z 3(=) v o i d a 1 f ( a1s : : t y p e &[z, a 1 z ] ) { [ z, TAPE]= exp ( z ) ; [ z, TAPE]= s i n ( z ) ; }... a 1 z =1; TAPE. i n t e r p r e t ( ) ;... v 2(1) = 1 z 3(1) v 2(sin) z 2(1) = cos(z 2) v 2(1) z 2(=) v 1(1) = 1 z 2(1) v 1(exp) [e z1 ] z 1 45 / 69

46 Adjoints by Overloading z 3(1) v o i d a 1 f ( a1s : : t y p e &[z, a 1 z ] ) { [ z, TAPE]= exp ( z ) ; [ z, TAPE]= s i n ( z ) ; }... a 1 z =1; TAPE. i n t e r p r e t ( ) ; cout << a 1 z << e n d l ;... z 3(=) v 2(1) = 1 z 3(1) v 2(sin) z 2(1) = cos(z 2) v 2(1) z 2(=) v 1(1) = 1 z 2(1) v 1(exp) z 1(1) = e z1 v 1(1) z 1 46 / 69

47 dco/c++ Generic First-Order Adjoint Mode To compute x (1) = x (1) + F (x) T y (1) activate #include dco.hpp instantiate tape for base type double redeclare floating-point variables in F as of type dco:: ta1s<double>::type tape register active inputs x call active F mark active outputs y seed set adjoints x (1) and y (1) of active inputs and outputs, respectively interpret run tape interpreter harvest get adjoints x (1) of active inputs Live: Example 47 / 69

48 Second (and Higher) Derivatives Tangent Projection (A 2 F, F : IR 6 IR 4 ) A v <A,v> 48 / 69

49 Second (and Higher) Derivatives Adjoint Projection (A 2 F, F : IR 6 IR 4 ) w A <w,a> 49 / 69

50 Second (and Higher) Derivatives Second-Order Tangent Projection (A 2 F, F : IR 6 IR 4 ) Note: due to symmetry. A v u <A,v,u> < A, v, u >=<< A, v >, u >=<< A, u >, v >=< A, u, v > 50 / 69

51 Second (and Higher) Derivatives Second-Order Adjoint Projection (A 2 F, F : IR 6 IR 4 ) v w A <w,a,v> Note: < w, A, v >=< w <, A, v >>=<< w, A >, v >=< v, < w, A >>=< v, w, A > due to symmetry. 51 / 69

52 dco/c++ Generic Second-Order Tangent Mode To compute y (1,2) =< F (x), x (1,2) > + < 2 F (x), x (1), x (2) > activate #include dco.hpp redeclare floating-point variables in F as of type dco:: tt1s<dco::tt1s<double>::type>::type seed set tangents x (1) and x (2) of active input set second-order tangent x (1,2) of active input run call active F harvest get second-order tangent y (1,2) of active output Live: Example 52 / 69

53 dco/c++ Generic Second-Order Adjoint Mode To compute x (2) (1) = x(2) (1) + < y(2) (1), F (x) > + < y (1), 2 F (x), x (2) > activate #include dco.hpp instantiate tape for base type dco:: tt1s<double>::type redeclare floating-point variables in F as of type dco:: ta1s<tt1s<double>::type>::type seed tangent set tangent x (2) for active input tape register active input x call active F mark active output y seed adjoints set adjoint y (1) of active output set second-order adjoints x (2) and y(2) of active input and output, respectively (1) (1) interpret run tape interpreter harvest get second-order adjoint x (2) (1) of active input Live: Example 53 / 69

54 Is it really that easy? WELL, / 69

55 External Function General Case (y, y p) = f (x, x p) x p ū p v p y p x [ ] ū x ū u p v p v [ ] y v y [1] [1] u [ ] v u v x (1) ū (1) [ ] ū x v (1) y (1) [ ] y v [1] [1] u (1) v (1) [ ] v u 55 / 69

56 External Function General Case xp ūp vp yp up vp x [ ū ] ū v [ y ] y x(1) [ ū ] ū(1) v(1) [ y ] y(1) x v x v [1] [1] u [ v u ] v [1] [1] u(1) [ v u ] v(1) A gap in the tape is introduced by calling a user-defined function make gap to record the following gap data: Tape location of active gap inputs ū in order to write ū (1) := u (1) correctly; adjoint gap input checkpoint (u, u p, v, v p) in order to initialize interpretation of the gap correctly; tape location of active gap outputs v in order to initialize v (1) := v (1) and, hence, interpretation of gap correctly; requires execution of g(u, u p, v, v p). This data is stored in the tape together with a reference to a user-defined function interpret gap to increment ū (1) with ( v u ) T v(1). 56 / 69

57 External Function Support by dco/c++ while taping f User function make gap: u := gap data >register input(ū); u p := ū p gap data >write to checkpoint(z ), where z (ū, ū p, ) g(u, u p, v, v p) v := gap data >register output(v) gap data >write to checkpoint(z + ), where z + (v, v p, ) register external function(&interpret gap,gap data) xp ūp vp yp up vp x [ ū x ] ū v [ y v ] y [1] [1] u [ v u ] v 57 / 69

58 External Function Support by dco/c++ while interpreting tape for f User function interpret gap : gap data >read from checkpoint(z), where z (z, z + ) v (1) := gap data >get output adjoint() g (1) (z, v (1), v p, u (1) ) gap data >increment input adjoint(u (1) ) x(1) [ ū x ] ū(1) v(1) [ y v ] y(1) [1] [1] u(1) [ v u ] v(1) 58 / 69

59 Checkpointing Formalism: Call Tree v o i d g ( i n t l, i n t k, c o n s t double u, double v ) {... } v o i d f ( i n t n, i n t m, c o n s t double x, double y ) {... g ( l, k, u, v ) ;... } yields f g 59 / 69

60 Checkpointing Adjoint Call Tree f (MAKE TAPE) g (STORE INPUTS) g f (INTERPRET TAPE) g (RESTORE INPUTS) g (MAKE TAPE) g (INTERPRET TAPE) Assumption: f and g are transformed into f and g, respectively. The new routines implement the four modes (MAKE TAPE, INTERPRET TAPE, STORE INPUTS, RESTORE INPUTS) that are required by the checkpointed adjoint code. 60 / 69

61 External Function Checkpointing (make gap) u ū (g (INTERPRET TAPE,...) increments ū (1) ); u p := ū p g(u, u p, v, v p) v := gap data >register output(v) gap data >write to checkpoint(ū, ū p) register external function(&interpret gap,gap data) xp ūp vp yp up vp x [ ū x ] ū v [ y v ] y [1] [1] u [ v u ] v 61 / 69

62 External Function Checkpointing (interpret gap) gap data >read from checkpoint(u, u p) g (MAKE TAPE,u, u p) v (1) := gap data >get output adjoint() g (INTERPRET TAPE,v (1) ) (interpretation increments ū (1) ) x(1) [ ū x ] ū(1) v(1) [ y v ] y(1) [1] [1] u(1) [ v u ] v(1) 62 / 69

63 External Function Checkpointing (Implementation) template<typename ATYPE> v o i d f ( i n t n, ATYPE& x ) { f o r ( i n t i =0; i <n ; i ++) x=s i n ( x ) ; }... i n t main ( ) {... f ( n /3, x ) ; f make gap ( n /3, x ) ; f ( n n /3 2, x ) ;... } n iterations of x = sin(x) split into thirds checkpointed adjoint: a1s_joint_reversal naive adjoint: single call of f(n,x) instead 63 / 69

64 External Function Adjoint Ensemble (Concurrent Local Taping) v o i d f (... ) {... vb =0; f o r ( i n t i =0; i <N; i ++) { g ( u, u p [ i ], v [ i ], v p [ i ] ) ; vb+=v [ i ] ;... }... } u 1 p u 1 [ ] v 1 u 1 v 1 p v 1 xp ūp vp yp [1] [1]. x [ ū x ] ū u N p v N p v [ y v ] y [1] [1] u N [ ] v N u N v N 64 / 69

65 External Function Adjoint Ensemble (Concurrent Local Interpretation) u 1 (1) [ ] v 1 (1) v 1 u 1 [1] [1] x(1) [ ū x ] ū(1) v(1) [ y v ] y(1) [1] [1] u N (1) [ ] v N (1) v N u N plain adjoint AD likely to run out of memory for large N path-wise adjoints reduce memory requirement siginifcantly multiple tapes enable thread-parallelism optionally: individual paths and their adjoints on accelerators 65 / 69

66 External Function Adjoint Ensemble (Implementation) template<typename ATYPE> v o i d f ( i n t m, c o n s t ATYPE& x, c o n s t double& r, ATYPE& y ) { y =0; f o r ( i n t i =0; i <m; i ++) y+=s i n ( x+r ) ; } ensemble over x in (random) r naive adjoint a1s_adjoint_ensemble/plain adjoint ensemble a1s_adjoint_ensemble 66 / 69

67 External Function User-Defined Adjoints (Motivating Example) r(x, p) x 2 p = 0 x (1) x p ( x (1) 2 x x ) p 1 = 2 x x (1) x p x (1) = 0 x = SOLVE(x 0, p) p (1) = SOLVE (1) (x 0, x (1), p) p (1) = SOLVE(x, x (1) ) x p x (1) x p = x (1) 2 p = x (1) 2 x p (1) p (1) exact inexact continuous adjoint discrete adjoint 67 / 69

68 External Function User-Defined Adjoints (Implementation) solving x 2 p = 0 by Newton s method template<typename ATYPE> v o i d m y s q r t (ATYPE p, ATYPE& x ) { c o n s t double eps=1e 15; ATYPE x o l d=x +1; w h i l e ( abs ( x x o l d )> eps ) { x o l d=x ; x=x o l d ( x o l d x o l d p ) / ( 2 x o l d ) ; } } solving (linear) adjoint equation explicitely v o i d m y a d j o i n t s q r t ( double& a1s p, double x, double a 1 s x ) { a 1 s p=a 1 s x /(2 x ) ; } discrete adjoint a1s_user_defined_intrinsic/discrete continuous adjoint a1s_user_defined_intrinsic 68 / 69

69 AD STCE Ongoing EU: AboutFlow, Scorpio, Fortissimo (OpenFOAM, ACE+) Tier-1 Investment Banks: Risk management NAG, DFG: Adjoint numerical methods (NAG Library) DFG: Hybrid discrete adjoints BAW, EDF: Waterways engineering (Telemac/Sisyphe) MPI-M: Oceanography (ICON GCM) JADE: DAEs with optimality conditions GRS: Toward robust global (deterministic) optimization FNR, Base: Adjoint MPI / OpenMP 69 / 69

STCE. Adjoint Code Design Patterns. Uwe Naumann. RWTH Aachen University, Germany. QuanTech Conference, London, April 2016

STCE. Adjoint Code Design Patterns. Uwe Naumann. RWTH Aachen University, Germany. QuanTech Conference, London, April 2016 Adjoint Code Design Patterns Uwe Naumann RWTH Aachen University, Germany QuanTech Conference, London, April 2016 Outline Why Adjoints? What Are Adjoints? Software Tool Support: dco/c++ Adjoint Code Design