Generalized Derivatives: Automatic Evaluation & Implications for Algorithms
Paul I. Barton, Kamil A. Khan & Harry A. J. Watson
Slide 1: Generalized Derivatives: Automatic Evaluation & Implications for Algorithms
Paul I. Barton, Kamil A. Khan & Harry A. J. Watson
Process Systems Engineering Laboratory, Massachusetts Institute of Technology
Slide 2: Nonsmooth Equation Solving
Semismooth Newton method: solve $G(x^k)(x^{k+1} - x^k) = -f(x^k)$, where $G(x^k)$ is some element of a generalized derivative of $f$ at $x^k$.
LP-Newton method:
$\min_{\gamma, x} \ \gamma$
s.t. $\|f(x^k) + G(x^k)(x - x^k)\| \le \gamma \|f(x^k)\|^2$
$\|x - x^k\| \le \gamma \|f(x^k)\|$
$x \in X$ (a polyhedral set)
Kojima & Shindo (1986), Qi & Sun (1993), Facchinei, Fischer & Herrich (2014).
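To make the iteration concrete, here is a minimal one-dimensional sketch of the semismooth Newton step. The test function and the generalized-derivative selection `g` are illustrative choices, not from the slides: any element of a generalized derivative may be supplied at a kink.

```python
def semismooth_newton(f, g, x0, tol=1e-10, max_iter=50):
    """Semismooth Newton iteration: g(x) returns ANY element of a
    generalized derivative of f at x (scalar case for simplicity)."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        x = x - fx / g(x)  # Newton step with a generalized-derivative element
    return x

# Nonsmooth test problem: f(x) = max(x, 2x) - 1, with root x* = 0.5.
f = lambda x: max(x, 2 * x) - 1
g = lambda x: 2.0 if x > 0 else 1.0  # an element of the B-subdifferential
root = semismooth_newton(f, g, x0=-3.0)
```

Starting from the smooth region $x < 0$, the iteration jumps across the kink at 0 and then converges on the active smooth piece.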
Slide 3: Generalized Derivatives
Suppose $f$ is locally Lipschitz $\Rightarrow$ differentiable on a set $S$.
B-subdifferential: $\partial_B f(x) := \{H : H = \lim_{i \to \infty} Jf(x^{(i)}),\ x = \lim_{i \to \infty} x^{(i)},\ x^{(i)} \in S\}$
Clarke Jacobian: $\partial f(x) := \mathrm{conv}\, \partial_B f(x)$
Example, $f(x) = |x|$: $f'(x) = -1$ for $x < 0$ and $f'(x) = 1$ for $x > 0$, so $\partial_B f(0) = \{-1, 1\}$ and $\partial f(0) = [-1, 1]$.
Useful properties of $\partial f(x)$ (Clarke, 1973):
- Nonempty, convex, and compact
- Satisfies mean-value theorem, implicit/inverse function theorems
- Reduces to the subdifferential/derivative when $f$ is convex/strictly differentiable
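The limit definition of the B-subdifferential can be illustrated numerically: differentiating $f(x) = |x|$ along sequences of differentiable points approaching 0 from either side recovers the two elements $\{-1, +1\}$. The helper below is an illustrative sketch.

```python
# Approximate B-subdifferential elements of f(x) = |x| at 0 by taking
# derivatives at nearby points where f is smooth, then letting the
# points approach 0.
def deriv(f, x, h=1e-8):
    # central difference; valid here since f is smooth near each x != 0
    return (f(x + h) - f(x - h)) / (2 * h)

f = abs
from_right = [deriv(f, 10.0 ** (-k)) for k in range(1, 6)]   # all near +1
from_left = [deriv(f, -10.0 ** (-k)) for k in range(1, 6)]   # all near -1
```

The two limiting values $-1$ and $+1$ are exactly $\partial_B f(0)$; their convex hull $[-1, 1]$ is the Clarke generalized gradient.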
Slide 4: Convergence Properties
Suppose the generalized derivative contains no singular matrices at the solution.
Semismooth Newton method with $G(x^k) \in \partial f(x^k)$:
- local Q-superlinear convergence if $f$ is semismooth
- local Q-quadratic convergence if $f$ is strongly semismooth
Semismooth Newton & LP-Newton methods with $G(x^k) \in \partial_B f(x^k)$, for PC$^1$ or strongly semismooth functions:
- local Q-quadratic convergence
Automatic/Algorithmic Differentiation (AD):
- automatic methods for computing derivatives in complex settings
- an automatic method for computing elements of generalized derivatives?
- computationally relevant generalized derivatives
Slide 5: "All generalized derivatives are equal. But some are more equal than others."
Slide 6: Obstacles to Automatic Generalized Derivative Evaluation, 1
Automatically evaluating Clarke Jacobian elements is difficult: the Clarke calculus lacks sharp rules. Example: let $g(x) = \max\{0, x\}$ and $h(x) = \min\{0, x\}$, so that $f(x) = g(x) + h(x) = x$. Then $\partial g(0) = [0, 1]$ and $\partial h(0) = [0, 1]$, while $\partial f(0) = \{1\}$: the sum rule holds only as the strict inclusion $\partial f(0) \subsetneq \partial g(0) + \partial h(0) = [0, 2]$.
Slide 7: Directional Derivatives & PC$^1$ Functions
Directional derivative: $f'(x; d) = \lim_{t \to 0^+} \frac{f(x + td) - f(x)}{t}$
Sharp chain rule for locally Lipschitz, directionally differentiable functions: $[f \circ g]'(x; d) = f'(g(x); g'(x; d))$. AD gives the directional derivative.
PC$^1$ functions: there is a finite collection $F_f(x)$ of $C^1$ functions for which $f(y) \in \{\varphi(y) : \varphi \in F_f(x)\}$ for all $y$ in a neighborhood $N(x)$. The 2-norm is not PC$^1$.
Griewank (1994), Scholtes (2012).
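The sharp chain rule can be propagated forward through elemental operations by carrying (value, directional-derivative) pairs. A minimal sketch, with illustrative names: the only nonstandard rules are for min and max at ties, where the directional derivative of the max (min) is the max (min) of the arguments' directional derivatives. Applied to the previous slide's example $f(x) = \max\{0, x\} + \min\{0, x\}$, this recovers the exact directional derivative where Clarke calculus only gave an enclosure.

```python
# Forward propagation of directional derivatives f'(x; d) via the
# sharp chain rule for directionally differentiable functions.
class DD:
    def __init__(self, val, dot):
        self.val, self.dot = val, dot

    def __add__(self, other):
        return DD(self.val + other.val, self.dot + other.dot)

def dmax(a, b):
    if a.val > b.val:
        return DD(a.val, a.dot)
    if a.val < b.val:
        return DD(b.val, b.dot)
    return DD(a.val, max(a.dot, b.dot))  # tie: max of directional derivs

def dmin(a, b):
    if a.val < b.val:
        return DD(a.val, a.dot)
    if a.val > b.val:
        return DD(b.val, b.dot)
    return DD(a.val, min(a.dot, b.dot))  # tie: min of directional derivs

# f(x) = max(0, x) + min(0, x) = x, evaluated at x = 0 in direction d = 1:
zero = DD(0.0, 0.0)
x = DD(0.0, 1.0)
fx = dmax(zero, x) + dmin(zero, x)
# fx.dot is 1.0, the exact directional derivative f'(0; 1).
```

Contrast this with the Clarke sum rule, which only bounded the derivative inside $[0, 2]$.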
Slides 8-9: Obstacles, 2
PC$^1$ functions have piecewise linear directional derivatives: the direction space is partitioned into cones on which $f'(x; d) = B^{(i)} d$. [Figure: the $(d_1, d_2)$-plane split into cones with $f'(x; d) = B^{(1)} d$, $B^{(2)} d$, and $B^{(3)} d$.]
Directional derivatives in the coordinate directions do not necessarily give B-subdifferential elements. This also defeats finite differences.
Slide 10: Obstacles, 3
The Clarke Jacobian $\partial f(x)$ may be a strict subset of the Cartesian product $\times_{i=1}^m \partial f_i(x)$ of the component Clarke generalized gradients. In the two-dimensional example on the slide, the rows of the matrices in $\partial f(0)$ are coupled through a single parameter $s \in [0, 1]$, whereas $\partial f_1(0) \times \partial f_2(0)$ lets the rows vary independently via $(s_1, s_2) \in [0, 1]^2$, so $\partial f(0) \subsetneq \partial f_1(0) \times \partial f_2(0)$.
Slide 11: L-smooth Functions
The following functions $f : X \subseteq \mathbb{R}^n \to \mathbb{R}^m$ are L-smooth:
- Continuously differentiable functions
- Convex functions (e.g. abs, the 2-norm)
- PC$^1$ functions
- Compositions of L-smooth functions: $x \mapsto h(g(x))$
- Integrals of L-smooth functions: $x \mapsto \int_a^b g(t, x)\, dt$
- Solutions of ODEs with L-smooth right-hand sides: $c \mapsto x(b, c)$, where $\frac{dx}{dt}(t, c) = g(t, x(t, c))$, $x(0, c) = c$
Nesterov (1987), Khan and Barton (2014), Khan and Barton (2015).
Slide 12: Lexicographic Derivatives
L-subdifferential: $\partial_L f(x) = \{J_L f(x; M) : \det M \neq 0\}$, i.e. it contains the L-derivatives $J_L f(x; M)$ for all nonsingular direction matrices $M$.
Useful properties:
- L-derivatives equal the classical derivative wherever $f$ is strictly differentiable
- L-derivatives are elements of the Clarke generalized gradient
- $\partial_L f(x)$ contains only subgradients when $f$ is convex
- $\partial_L f(x)$ is contained in the plenary hull of the Clarke Jacobian, and can be used in place of the Clarke Jacobian in numerical methods: $\{Ad : A \in \partial_L f(x)\} \subseteq \{Ad : A \in \partial f(x)\}$ for each $d \in \mathbb{R}^n$
- For PC$^1$ functions, L-derivatives are elements of the B-subdifferential
- Satisfies a sharp chain rule, expressed naturally using LD-derivatives
Nesterov (1987), Khan and Barton (2014), Khan and Barton (2015).
Slide 13: Lexicographic Directional (LD-)Derivatives
An extension of the classical directional derivative. For any $M := [m^{(1)} \cdots m^{(p)}] \in \mathbb{R}^{n \times p}$, the LD-derivative is
$f'(x; M) = [f^{(0)}_{x,M}(m^{(1)}) \ \cdots \ f^{(p-1)}_{x,M}(m^{(p)})]$
If $M$ is square and nonsingular: $f'(x; M) = J_L f(x; M)\, M$
If $f$ is differentiable at $x$: $f'(x; M) = Jf(x)\, M$
Sharp LD-derivative chain rule: $[f \circ g]'(x; M) = f'(g(x); g'(x; M))$
Khan and Barton (2015).
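For the elemental function abs, the LD-derivative admits a simple closed form consistent with the lexicographic definition: the row $f'(x; M)$ equals $s \cdot M$, where $s$ is the sign of the first nonzero entry of the sequence $(x, m^{(1)}, \ldots, m^{(p)})$. The helper names below are illustrative, not from the slides.

```python
# LD-derivative of abs via the lexicographic sign rule: the first
# nonzero entry of (x, m1, ..., mp) decides which linear piece of |.|
# is active for the whole row.
def fsign(seq):
    for v in seq:
        if v != 0:
            return 1.0 if v > 0 else -1.0
    return 0.0

def abs_ld(x, M):
    """LD-derivative row of |.| at x along directions M (a list)."""
    s = fsign([x] + list(M))
    return [s * m for m in M]

# Away from the kink, the rule reduces to the classical derivative:
smooth_row = abs_ld(3.0, [1.0, 2.0])          # sign(3) * M = [1.0, 2.0]
# At the kink x = 0 with M = [0, -2, 5], the first nonzero entry is -2:
kink_row = abs_ld(0.0, [0.0, -2.0, 5.0])      # -1 * M = [0.0, 2.0, -5.0]
```

Note how, at the kink, a leading zero direction defers the branch decision to the next direction, which is exactly the lexicographic behavior of the nested directional derivatives $f^{(0)}_{x,M}, f^{(1)}_{x,M}, \ldots$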
Slide 14: Vector Forward AD Mode for LD-derivatives
The sharp chain rule immediately implies that, given the seed directions $M$, forward-mode AD can compute $f'(x; M)$. We need calculus rules for the elementary functions:
- abs, min, max, mid, etc.
- an algorithm for elemental PC$^1$ functions
- linear programs and lexicographic linear programs parameterized by their right-hand sides
- implicit functions: $w'(\hat z; M)$ is the unique solution $N$ of $h'((\hat y, \hat z); (N, M)) = 0$, where $h(w(z), z) = 0$
Khan and Barton (2015), Khan and Barton (2013), Hoeffner et al. (2015).
Slide 15: Semismooth Inexact Newton Method
Inexact Newton method: solve $J_L f(x; M)\, \Delta x = -f(x)$ iteratively. But the directional derivative is not a linear function of the directions. Let $M = [d_1\ d_2\ \cdots]$ be nonsingular; then $f'(x; M) = J_L f(x; M)\, M$. Since $M$ is not known in advance, compute the columns $J(x) d_i$, $i = 1, 2, \ldots$ of $f'(x; M)$ one at a time:
- computation of a column affects subsequent columns
- the automatic code can be "locked" to record the influence of earlier columns
Local Q-superlinear & Q-quadratic convergence rates can be achieved.
Slide 16: Approximation of LD-derivatives Using Finite Differences
LD-derivative, for $M := [m^{(1)} \cdots m^{(p)}] \in \mathbb{R}^{n \times p}$:
$f'(x; M) = [f^{(0)}_{x,M}(m^{(1)}) \ \cdots \ f^{(p-1)}_{x,M}(m^{(p)})]$
FD approximation of $f'(x; M)$ using $p + 1$ function evaluations:
$f^{(0)}_{x,M}(m^{(1)}) \approx \alpha^{-1}[f(x + \alpha m^{(1)}) - f(x)] =: D_{\alpha m^{(1)}}[f](x)$
$f^{(1)}_{x,M}(m^{(2)}) \approx D_{\alpha m^{(2)}}[f^{(0)}_{x,M}](m^{(1)}) = D_{\alpha m^{(2)}} D_{\alpha m^{(1)}}[f](x)$
$\vdots$
$f^{(p-1)}_{x,M}(m^{(p)}) \approx D_{\alpha m^{(p)}} \cdots D_{\alpha m^{(2)}} D_{\alpha m^{(1)}}[f](x)$
For $p = 2$ this expands to three evaluation points $x$, $x + \alpha m^{(1)}$, $x + \alpha m^{(1)} + \alpha_2 m^{(2)}$:
$f^{(0)}_{x,M}(m^{(1)}) \approx \alpha^{-1}[f(x + \alpha m^{(1)}) - f(x)]$
$f^{(1)}_{x,M}(m^{(2)}) \approx \alpha_2^{-1}[f(x + \alpha m^{(1)} + \alpha_2 m^{(2)}) - f(x + \alpha m^{(1)})]$
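A small sketch of this $p + 1$-evaluation scheme: each successive direction is added to the previous evaluation point with a much smaller step, so that earlier directions dominate later ones. The function and step-size choices are illustrative.

```python
# Finite-difference approximation of the LD-derivative f'(x; M) with
# p+1 function evaluations: evaluation points x, x + a1*m1,
# x + a1*m1 + a2*m2, ... with a1 >> a2 >> ...
def ld_derivative_fd(f, x, M, alpha=1e-3, ratio=1e-3):
    """M: list of direction vectors; returns the p FD columns."""
    pts = [list(x)]
    steps = []
    step = alpha
    for m in M:
        prev = pts[-1]
        pts.append([xi + step * mi for xi, mi in zip(prev, m)])
        steps.append(step)
        step *= ratio  # each later direction gets a much smaller step
    return [(f(pts[j + 1]) - f(pts[j])) / steps[j] for j in range(len(M))]

# Example: f(x) = |x1| at the kink x = (0,), directions M = [(-1,), (1,)].
# The first direction fixes the active branch, so the exact
# LD-derivative is [1, -1].
f = lambda v: abs(v[0])
cols = ld_derivative_fd(f, [0.0], [[-1.0], [1.0]])
```

Note that naive coordinate-direction differences would have missed this branch dependence, which is exactly the obstacle described on Slides 8-9.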
Slide 17: Sparse Accumulation for L-derivatives
The cost of AD can be reduced when the Jacobian is sparse:
- Find structurally orthogonal columns and perform the vector forward pass with a compressed seed matrix $M \in \mathbb{R}^{n \times p}$ rather than $I \in \mathbb{R}^{n \times n}$. Example sparsity pattern:
$\begin{bmatrix} a & b & 0 & 0 \\ c & 0 & d & 0 \\ 0 & e & 0 & f \\ 0 & 0 & g & h \end{bmatrix}$
- For LD-derivatives the order of the directions matters: $f'(x; M) = f'(x; Q) D$ is not true in general.
- Corresponding to $M$ is an uncompressed (permutation) matrix $Q$ with $M = QD$ for some matrix $D$.
- Procedure:
- Identify the matrices $Q$, $D$, and $M$
- Perform the vector forward pass to calculate $f'(x; M)$
- Copy the entries of $f'(x; M)$ into the entries of a sparse data structure for $f'(x; Q)$, done under the assumption that $f'(x; Q) D = f'(x; M)$
- Calculate $J_L f(x; Q) = f'(x; Q)\, Q^{-1}$ (i.e. by sparse permutation)
Slide 18: Generalized Derivatives of Algorithms: the MHEX Model
[Figure: a multistream heat exchanger (MHEX) with hot streams $F_i$, $T_i^{in} \to T_i^{out}$, $i \in H$, and cold streams $f_j$, $t_j^{in} \to t_j^{out}$, $j \in C$.]
Model equations:
$\sum_{i \in H} F_i (T_i^{in} - T_i^{out}) = \sum_{j \in C} f_j (t_j^{out} - t_j^{in})$
$\min_{p \in P} (EBP_p^C - EBP_p^H) = 0$
$UA - \sum_{k \in K} \frac{\Delta Q^k}{\Delta T_{LM}^k} = 0$
Watson et al. (2015).
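The first model equation, the overall energy balance, can be sketched as a residual in a few lines. The stream data below are made-up illustrative numbers, not from the case studies.

```python
# Residual of the MHEX energy balance:
# sum_i F_i (T_i_in - T_i_out) - sum_j f_j (t_j_out - t_j_in) = 0
def energy_balance(hot, cold):
    """hot: list of (F, T_in, T_out); cold: list of (f, t_in, t_out)."""
    q_hot = sum(F * (Tin - Tout) for F, Tin, Tout in hot)
    q_cold = sum(f * (tout - tin) for f, tin, tout in cold)
    return q_hot - q_cold

# One hot stream cooling 400 K -> 320 K balanced against one cold
# stream heating 300 K -> 380 K, with equal heat-capacity flow rates:
r = energy_balance([(2.0, 400.0, 320.0)], [(2.0, 300.0, 380.0)])
# r is 0.0 at this balanced operating point
```

In the full model this residual is solved simultaneously with the nonsmooth pinch and area equations, which is where the generalized-derivative machinery is needed.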
Slide 19: ODEs and BVPs
$\frac{dx}{dt}(t, c) = f(t, x(t, c)), \quad x(t_0, c) = c$
The LD-derivative mapping $t \mapsto [x_t]'(c_0; M)$ uniquely solves the ODE:
$\frac{dA}{dt}(t) = [f_t]'(x(t, c_0); A(t)), \quad A(t_0) = M$
Boundary value problem: $0 = F(c, x(t_f, c))$. Solve with the semismooth (inexact) Newton method using the chain rule for LD-derivatives:
- if in addition $f$ is semismooth, then a Q-superlinear convergence rate
- if it happens to be PC$^1$, then a Q-quadratic convergence rate
Khan and Barton (2014), Khan and Barton (2015), Pang and Stewart (2009).
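The sensitivity ODE can be integrated numerically alongside the state. A minimal sketch for the single-direction ($p = 1$) scalar case, with an illustrative right-hand side $f(x) = |x|$ and $x(0) = 0$, so that the trajectory sits on the kink for all time: the directional derivative of abs at 0 along $A$ is $|A|$, giving $A' = |A|$ and, with $A(0) = 1$, the exact sensitivity $A(t) = e^t$.

```python
import math

def abs_dd(x, a):
    # directional derivative of |.| at x along a
    return math.copysign(1.0, x) * a if x != 0 else abs(a)

def euler_sensitivity(t_end, n):
    """Forward-Euler integration of x' = |x| and A' = abs_dd(x, A)."""
    h = t_end / n
    x, A = 0.0, 1.0  # x(0) = 0 stays on the kink of |x| for all t
    for _ in range(n):
        A += h * abs_dd(x, A)
        x += h * abs(x)
    return x, A

xT, AT = euler_sensitivity(1.0, 10000)
# AT approximates e = 2.71828..., the exact directional sensitivity at t = 1
```

A classical forward-sensitivity ODE would be undefined here, since the right-hand side is nondifferentiable along the entire trajectory; the nonsmooth sensitivity ODE still yields the correct directional information.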
Slide 20: Conclusions
L-derivatives are computationally relevant generalized derivatives, and can be computed automatically for broad classes of functions. Strong theory gives practically computable generalized derivatives for:
- Implicit functions
- Algorithms
- ODE solutions
- Linear programs
- etc.
Slide 21: Acknowledgments
Peter Stechlinski, Novartis, Statoil.
Slide 22: Obstacles, 3 (details)
$\partial f(x)$ may be a strict subset of $\times_{i=1}^m \partial f_i(x)$. In the two-dimensional example, $\partial f(0)$ is a family of matrices parameterized by a single $s \in [0, 1]$, with the rows coupled, whereas $\partial f_1(0) \times \partial f_2(0)$ is parameterized by independent $(s_1, s_2) \in [0, 1]^2$, so the inclusion is strict.
Slide 23: Lexicographic Differentiation [1]
$f : X \subseteq \mathbb{R}^n \to \mathbb{R}^m$ is L-smooth at $x \in X$ if it is locally Lipschitz continuous and directionally differentiable, and if, for any $M := [m^{(1)} \cdots m^{(p)}] \in \mathbb{R}^{n \times p}$, the following functions exist:
$f^{(0)}_{x,M} : d \mapsto f'(x; d)$
$f^{(1)}_{x,M} : d \mapsto [f^{(0)}_{x,M}]'(m^{(1)}; d)$
$\vdots$
$f^{(p)}_{x,M} : d \mapsto [f^{(p-1)}_{x,M}]'(m^{(p)}; d)$
If the columns of $M$ span $\mathbb{R}^n$, then $f^{(p)}_{x,M}$ is linear $\Rightarrow$ $J_L f(x; M) := J f^{(p)}_{x,M}(0)$.
Lexicographic subdifferential: $\partial_L f(x) = \{J_L f(x; M) : M \in \mathbb{R}^{n \times n},\ \det M \neq 0\}$
The class of L-smooth functions is closed under composition, and includes all smooth functions and all convex functions.
[1]: Y. Nesterov, Math. Program. B, 104 (2005).
Slide 24: Inverse and Implicit Functions
LD-derivatives for inverse functions:
- Suppose $f$ is L-smooth and locally invertible near $\hat y$, $f(\hat y) = \hat z$, and $f^{-1}$ is Lipschitz near $\hat z$
- Result: $f^{-1}$ is also L-smooth at $\hat z$
- Result: $[f^{-1}]'(\hat z; M)$ is the unique solution $N$ of $f'(\hat y; N) = M$
LD-derivatives for implicit functions:
- Suppose $h$ is L-smooth, $h(\hat y, \hat z) = 0$, and there exists an implicit function $w$ such that $h(w(z), z) = 0$ for each $z$ near $\hat z$
- Result: $w$ is L-smooth at $\hat z$
- Result: $w'(\hat z; M)$ is the unique solution $N$ of $h'((\hat y, \hat z); (N, M)) = 0$
1: Khan and Barton, submitted.
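The inverse-function rule can be seen in a tiny scalar, single-direction sketch. The example function is illustrative: $f(y) = 2y + |y|$ is an L-smooth bijection with $f(0) = 0$, and since it is piecewise linear and positively homogeneous, $f'(0; n) = f(n)$. The rule then says $[f^{-1}]'(0; m)$ is the unique $n$ solving $2n + |n| = m$, which has a closed form.

```python
# Inverse-function LD-derivative sketch (scalar, p = 1).
def f(y):
    return 2.0 * y + abs(y)  # L-smooth, invertible, f(0) = 0

def finv_ld0(m):
    """[f^{-1}]'(0; m): the unique n with f'(0; n) = 2n + |n| = m."""
    # For n >= 0 the equation reads 3n = m; for n < 0 it reads n = m.
    return m / 3.0 if m >= 0 else m

# Consistency: plugging the solution n back into f'(0; .) = f recovers m.
checks = [f(finv_ld0(m)) for m in (3.0, -2.0, 0.0)]
```

This is the same pattern used for implicit functions on the slide: one piecewise-linear equation system in $N$ per direction matrix $M$.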
Slide 25: Vector Forward AD Mode for LD-derivatives (example)
$f : \mathbb{R}^2 \to \mathbb{R} : (x, y) \mapsto \max\{x, y, -x, -y\}$
$g : \mathbb{R} \to \mathbb{R}^2 : x \mapsto (x, x)$
$h = f \circ g : \mathbb{R} \to \mathbb{R} : z \mapsto f(g(z)) = f(z, z) = \max\{z, z, -z, -z\} = |z|$
The nondifferentiability set of $f$ is $Z_f = \{(x, y) \in \mathbb{R}^2 : y = x \text{ or } y = -x\}$, and the range of the inner function satisfies $g(\mathbb{R}) \subseteq Z_f$: the composition never leaves the nonsmooth set, so sharp propagation rules are essential.
Slide 26: Generalized Derivatives of Algorithms: the MHEX Model (section divider; repeats the Slide 18 figure and model equations)
Slide 27: Formulating a PC$^1$ Area Constraint
Consider the set of points that are either kinks or endpoints on the composite curves:
- Index this set of points with the set $K$
- Each $k \in K$ has an associated enthalpy value $Q^k$
[Figure: $T$-$Q$ diagram of the composite curves with the enthalpy grid $Q^1, Q^2, \ldots, Q^{k-1}, Q^k, \ldots, Q^K$ marked at the kinks and endpoints.]
Slide 28: Formulating a PC$^1$ Area Constraint
If both temperatures at adjacent points are known, the interval can be treated as a two-stream heat exchanger:
$\Delta Q^k = UA^k\, \Delta T_{LM}^k$
[Figure: interval $k$ on the composite curves, with hot-side temperatures $T^{k-1}$, $T^k$ and cold-side temperatures $t^{k-1}$, $t^k$.]
Slide 29: Formulating a PC$^1$ Area Constraint
Summing the area over all intervals gives the total MHEX area:
$UA = \sum_{k \in K} \frac{\Delta Q^k}{\Delta T_{LM}^k}$
Slide 30: Formulating a PC$^1$ Area Constraint
Difficulties:
- The enthalpies and temperatures need to be sorted
- Not all of the temperatures are known
Consider a (naïve) bubble sort:
- The only calculations involved are taking the min/max of two entries
- The same sequence and number of calculations is performed regardless of whether the input is well-sorted
- Naïve bubble sort is therefore a composite PC$^1$ function
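The observation above can be sketched directly: written only in terms of min and max, bubble sort executes a fixed sequence of PC$^1$ operations for every input, so LD-derivatives can be propagated through it with the vector forward AD mode.

```python
# Naive bubble sort using only min/max comparisons-free swaps: the same
# fixed sequence of elemental operations runs for every input, making
# the sorting map a composite PC1 function.
def bubble_sort_minmax(v):
    v = list(v)
    n = len(v)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            lo, hi = min(v[j], v[j + 1]), max(v[j], v[j + 1])
            v[j], v[j + 1] = lo, hi  # unconditional "swap" via min/max
    return v

sorted_v = bubble_sort_minmax([3.0, 1.0, 2.0])  # -> [1.0, 2.0, 3.0]
```

An if-else implementation of the same sort would compute the same values but, as the next slide notes for temperature calculations, would not be a composite PC$^1$ function, so its LD-derivatives could not be obtained this way.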
Slide 31: Calculating Unknown Temperatures
Finding the unknown temperatures involves solving one of these equations, $h(T^k, y, Q^k(y)) = 0$, for $T^k$ (or $t^k$):
$Q^k(y) - \sum_{i \in H} F_i \left( \max\{0, T_i^{in} - T^k\} - \max\{0, T_i^{out} - T^k\} \right) = 0$
$Q^k(y) - \sum_{j \in C} f_j \left( \max\{0, t_j^{out} - t^k\} - \max\{0, t_j^{in} - t^k\} \right) = 0$
These are easily solved using if-else logic, which correctly calculates the values of the unknown temperatures, but not their LD-derivatives, since if-else logic is not a composite PC$^1$ function. Instead, the solution of the equation defines an implicit function $\eta : \mathbb{R}^{n_y} \times \mathbb{R} \to \mathbb{R}$:
$h(T^k, y, Q^k(y)) = 0 \ \Rightarrow \ T^k(y) = \eta(y, Q^k(y))$
so the LD-derivatives $T^k{}'(\hat y; I)$ are obtained from $\eta$ and $Q^k{}'(\hat y; I)$ via the generalization of the implicit function theorem.
Slide 32: Calculating the Temperature Driving Force and Area
Once all temperatures are found, the driving force for each interval can be calculated:
$\Delta T^k = \max\{\Delta T_{min}, T^k - t^k\}, \qquad \Delta T^{k+1} = \max\{\Delta T_{min}, T^{k+1} - t^{k+1}\}$
This requires a modification of the standard log-mean temperature difference so that it is a continuously differentiable elemental function:
$\Delta T_{LM}^k = \begin{cases} \dfrac{\Delta T^{k+1} + \Delta T^k}{2} & \text{if } \Delta T^{k+1} = \Delta T^k \\[1ex] \dfrac{\Delta T^{k+1} - \Delta T^k}{\ln(\Delta T^{k+1}) - \ln(\Delta T^k)} & \text{otherwise} \end{cases}$
Finally, calculate the total area using:
$UA = \sum_{k \in K} \frac{\Delta Q^k}{\Delta T_{LM}^k}$
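A short sketch of the interval calculation above, with made-up temperatures and duty; the equal-$\Delta T$ branch replaces the log-mean expression at its removable singularity.

```python
import math

def lmtd(dT_a, dT_b):
    """Log-mean temperature difference with the equal-dT branch."""
    if dT_a == dT_b:
        return 0.5 * (dT_a + dT_b)  # limiting value as dT_a -> dT_b
    return (dT_b - dT_a) / (math.log(dT_b) - math.log(dT_a))

def driving_force(T_hot, t_cold, dT_min):
    """Clipped driving force max{dT_min, T - t} from the slide."""
    return max(dT_min, T_hot - t_cold)

# One interval with hypothetical data:
dT_k = driving_force(400.0, 350.0, 4.0)    # 50.0
dT_k1 = driving_force(390.0, 360.0, 4.0)   # 30.0
ua_k = 100.0 / lmtd(dT_k, dT_k1)           # interval contribution dQ_k / dT_LM^k
```

Summing `ua_k` over all intervals $k \in K$ gives the total-area residual used in the complete formulation.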
Slide 33: Formulating the $\Delta T_{min}$ Equation
New extended pinch operator formulation:
$EBP_p^C = \sum_{j \in C} f_j \left( \max\{0, (T_p - \Delta T_{min}) - t_j^{in}\} - \max\{0, (T_p - \Delta T_{min}) - t_j^{out}\} \right) + \max\{0, (T_p - \Delta T_{min}) - t^{max}\} - \max\{0, t^{min} - (T_p - \Delta T_{min})\}$
$EBP_p^H = \sum_{i \in H} F_i \left( \max\{0, T_p - T_i^{out}\} - \max\{0, T_p - T_i^{in}\} \right) - \max\{0, T^{min} - T_p\} + \max\{0, T_p - T^{max}\}$
Relevant equation for the new formulation:
$\min_{p \in P} (EBP_p^C - EBP_p^H) = 0$
Slide 34: Complete Formulation
$\sum_{i \in H} F_i (T_i^{in} - T_i^{out}) - \sum_{j \in C} f_j (t_j^{out} - t_j^{in}) = 0$
$\min_{p \in P} (EBP_p^C - EBP_p^H) = 0$
$UA - \sum_{k \in K} \frac{\Delta Q^k}{\Delta T_{LM}^k} = 0$
Given the inlet temperatures $T_i^{in}$, $i \in H$, and $t_j^{in}$, $j \in C$, this gives 3 equations (plus constraints on the outlet temperatures $T_i^{out}$, $t_j^{out}$). Solve for 3 unknown quantities (temperatures, flow rates, area, minimum approach temperature, etc.) using the LP-Newton method.
Slide 35: LNG Process Case Study
Two MHEX models plus three intermediate compression/expansion operations:
- 9 equations (4 nonsmooth), solved for nine variables
[Figure: process flowsheet.]
Slide 36: LNG Process Data and Variables
Seven temperatures are taken as unknown in each of the following cases:
- 2 additional variables can be taken as unknowns (UA, approach temperature, more temperatures, etc.)
Slide 37: LNG Process Example, Case I
$\Delta T_{min} = 4$ K specified for both exchangers; $UA = ?$
Given the minimum approach temperature, calculate the area.
Slide 38: LNG Process Example, Case I (results)
[Results table: $UA$ for HX-100 and HX-101 in kW/K, and the converged variables $y_1$-$y_7$ in K.]
Slide 39: Empirical Convergence Rate
[Figure: $\|f(y^k)\|$ versus iteration $k$.]
Slide 40: LNG Process Example, Case II
$UA = 120$ kW/K and $UA = 30$ kW/K specified; $\Delta T_{min} = ?$
Given the area, calculate the minimum approach temperature.
Slide 41: LNG Process Example, Case II (results)
HX-100: $\Delta T_{min} = 2.62$ K; HX-101: $\Delta T_{min} = 1.26$ K
[Results table: converged variables $y_1$-$y_7$ in K.]
Slide 42: LNG Process Example, Case III
$\Delta T_{min} = 4$ K, with $UA = 85$ kW/K and $UA = 35$ kW/K specified.
Given the area and the minimum approach temperature, calculate (more) information about the streams.
Slide 43: LNG Process Example, Case III (results)
[Results table: outlet temperatures $T_H^{out}$ and $t_{C3}^{out}$ in K, and the converged variables $y_1$-$y_7$ in K.]
Slide 44: PC$^1$ RHS with a Non-PC$^1$ Solution
$\dot x = y, \quad \dot y = -x, \quad \dot z = |x|, \qquad x(0) = x_0,\ y(0) = y_0,\ z(0) = z_0$
With $(x_0, y_0) = (r \cos\theta, r \sin\theta)$:
$z(t) = z_0 + \int_0^t |x_0 \cos(s) + y_0 \sin(s)|\, ds = z_0 + \int_0^t |(r\cos\theta)\cos(s) + (r\sin\theta)\sin(s)|\, ds = z_0 + \sqrt{x_0^2 + y_0^2} \int_0^t |\cos(\theta - s)|\, ds$
In particular, $z_{2\pi k}(x_0, y_0, 0) = 4k \sqrt{x_0^2 + y_0^2}$: a positive multiple of the 2-norm of $(x_0, y_0)$, independent of $\theta$. Since the 2-norm is not PC$^1$, the ODE solution is not PC$^1$ in its initial conditions even though the right-hand side is.
[Figure: $z_t(x_0)$ for $t \in [0, 10]$, with $y_0 = 0$, $z_0 = 0$.]
Slides 45-46: Existing Dynamic Sensitivities
[Figures: trajectories $x(t, c)$ and the linear Newton approximation $\Gamma[x(2, \cdot)](t)$.]
Slide 47: Dynamic LD-derivatives
[Figure-only slide.]
Slide 48: Example 1
[Figure: trajectory bounds on $x(t, c)$ versus time $t$; legend: generalized derivative bounds, LNA bounds, lexicographic derivative bounds.]
Slide 49: Singleton & Nonsingleton Trajectories
[Figure: three trajectories relative to the nonsmoothness surface, annotated "OK", "Not OK", "OK".]
More informationPh.D. Katarína Bellová Page 1 Mathematics 2 (10-PHY-BIPMA2) EXAM - Solutions, 20 July 2017, 10:00 12:00 All answers to be justified.
PhD Katarína Bellová Page 1 Mathematics 2 (10-PHY-BIPMA2 EXAM - Solutions, 20 July 2017, 10:00 12:00 All answers to be justified Problem 1 [ points]: For which parameters λ R does the following system
More informationLectures 9-10: Polynomial and piecewise polynomial interpolation
Lectures 9-1: Polynomial and piecewise polynomial interpolation Let f be a function, which is only known at the nodes x 1, x,, x n, ie, all we know about the function f are its values y j = f(x j ), j
More informationLecture 11: Arclength and Line Integrals
Lecture 11: Arclength and Line Integrals Rafikul Alam Department of Mathematics IIT Guwahati Parametric curves Definition: A continuous mapping γ : [a, b] R n is called a parametric curve or a parametrized
More informationUnconstrained optimization
Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout
More informationNonlinear Systems and Control Lecture # 12 Converse Lyapunov Functions & Time Varying Systems. p. 1/1
Nonlinear Systems and Control Lecture # 12 Converse Lyapunov Functions & Time Varying Systems p. 1/1 p. 2/1 Converse Lyapunov Theorem Exponential Stability Let x = 0 be an exponentially stable equilibrium
More informationMATH 411 NOTES (UNDER CONSTRUCTION)
MATH 411 NOTES (NDE CONSTCTION 1. Notes on compact sets. This is similar to ideas you learned in Math 410, except open sets had not yet been defined. Definition 1.1. K n is compact if for every covering
More informationErrata for Vector and Geometric Calculus Printings 1-4
October 21, 2017 Errata for Vector and Geometric Calculus Printings 1-4 Note: p. m (n) refers to page m of Printing 4 and page n of Printings 1-3. p. 31 (29), just before Theorem 3.10. f x(h) = [f x][h]
More informationMath 212-Lecture 8. The chain rule with one independent variable
Math 212-Lecture 8 137: The multivariable chain rule The chain rule with one independent variable w = f(x, y) If the particle is moving along a curve x = x(t), y = y(t), then the values that the particle
More informationHalf of Final Exam Name: Practice Problems October 28, 2014
Math 54. Treibergs Half of Final Exam Name: Practice Problems October 28, 24 Half of the final will be over material since the last midterm exam, such as the practice problems given here. The other half
More informationPartial Derivatives. w = f(x, y, z).
Partial Derivatives 1 Functions of Several Variables So far we have focused our attention of functions of one variable. These functions model situations in which a variable depends on another independent
More informationChapter 3. Differentiable Mappings. 1. Differentiable Mappings
Chapter 3 Differentiable Mappings 1 Differentiable Mappings Let V and W be two linear spaces over IR A mapping L from V to W is called a linear mapping if L(u + v) = Lu + Lv for all u, v V and L(λv) =
More informationDerivatives and Integrals
Derivatives and Integrals Definition 1: Derivative Formulas d dx (c) = 0 d dx (f ± g) = f ± g d dx (kx) = k d dx (xn ) = nx n 1 (f g) = f g + fg ( ) f = f g fg g g 2 (f(g(x))) = f (g(x)) g (x) d dx (ax
More informationMath 234 Final Exam (with answers) Spring 2017
Math 234 Final Exam (with answers) pring 217 1. onsider the points A = (1, 2, 3), B = (1, 2, 2), and = (2, 1, 4). (a) [6 points] Find the area of the triangle formed by A, B, and. olution: One way to solve
More informationEE 546, Univ of Washington, Spring Proximal mapping. introduction. review of conjugate functions. proximal mapping. Proximal mapping 6 1
EE 546, Univ of Washington, Spring 2012 6. Proximal mapping introduction review of conjugate functions proximal mapping Proximal mapping 6 1 Proximal mapping the proximal mapping (prox-operator) of a convex
More informationDuality and dynamics in Hamilton-Jacobi theory for fully convex problems of control
Duality and dynamics in Hamilton-Jacobi theory for fully convex problems of control RTyrrell Rockafellar and Peter R Wolenski Abstract This paper describes some recent results in Hamilton- Jacobi theory
More informationNonlinear equations. Norms for R n. Convergence orders for iterative methods
Nonlinear equations Norms for R n Assume that X is a vector space. A norm is a mapping X R with x such that for all x, y X, α R x = = x = αx = α x x + y x + y We define the following norms on the vector
More informationLecture 14: Newton s Method
10-725/36-725: Conve Optimization Fall 2016 Lecturer: Javier Pena Lecture 14: Newton s ethod Scribes: Varun Joshi, Xuan Li Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes
More information10. Unconstrained minimization
Convex Optimization Boyd & Vandenberghe 10. Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions implementation
More informationChain Rule. MATH 311, Calculus III. J. Robert Buchanan. Spring Department of Mathematics
3.33pt Chain Rule MATH 311, Calculus III J. Robert Buchanan Department of Mathematics Spring 2019 Single Variable Chain Rule Suppose y = g(x) and z = f (y) then dz dx = d (f (g(x))) dx = f (g(x))g (x)
More informationSubgradients. subgradients. strong and weak subgradient calculus. optimality conditions via subgradients. directional derivatives
Subgradients subgradients strong and weak subgradient calculus optimality conditions via subgradients directional derivatives Prof. S. Boyd, EE364b, Stanford University Basic inequality recall basic inequality
More informationMA102: Multivariable Calculus
MA102: Multivariable Calculus Rupam Barman and Shreemayee Bora Department of Mathematics IIT Guwahati Differentiability of f : U R n R m Definition: Let U R n be open. Then f : U R n R m is differentiable
More informationChapter 4. Inverse Function Theorem. 4.1 The Inverse Function Theorem
Chapter 4 Inverse Function Theorem d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d dd d d d d This chapter
More informationMarch 8, 2010 MATH 408 FINAL EXAM SAMPLE
March 8, 200 MATH 408 FINAL EXAM SAMPLE EXAM OUTLINE The final exam for this course takes place in the regular course classroom (MEB 238) on Monday, March 2, 8:30-0:20 am. You may bring two-sided 8 page
More informationLecture II: Vector and Multivariate Calculus
Lecture II: Vector and Multivariate Calculus Dot Product a, b R ' ', a ( b = +,- a + ( b + R. a ( b = a b cos θ. θ convex angle between the vectors. Squared norm of vector: a 3 = a ( a. Alternative notation:
More informationSIAM Conference on Imaging Science, Bologna, Italy, Adaptive FISTA. Peter Ochs Saarland University
SIAM Conference on Imaging Science, Bologna, Italy, 2018 Adaptive FISTA Peter Ochs Saarland University 07.06.2018 joint work with Thomas Pock, TU Graz, Austria c 2018 Peter Ochs Adaptive FISTA 1 / 16 Some
More informationNonnegative Inverse Eigenvalue Problems with Partial Eigendata
Nonnegative Inverse Eigenvalue Problems with Partial Eigendata Zheng-Jian Bai Stefano Serra-Capizzano Zhi Zhao June 25, 2011 Abstract In this paper we consider the inverse problem of constructing an n-by-n
More informationLMI Methods in Optimal and Robust Control
LMI Methods in Optimal and Robust Control Matthew M. Peet Arizona State University Lecture 15: Nonlinear Systems and Lyapunov Functions Overview Our next goal is to extend LMI s and optimization to nonlinear
More informationIntroduction. New Nonsmooth Trust Region Method for Unconstraint Locally Lipschitz Optimization Problems
New Nonsmooth Trust Region Method for Unconstraint Locally Lipschitz Optimization Problems Z. Akbari 1, R. Yousefpour 2, M. R. Peyghami 3 1 Department of Mathematics, K.N. Toosi University of Technology,
More informationLocal strong convexity and local Lipschitz continuity of the gradient of convex functions
Local strong convexity and local Lipschitz continuity of the gradient of convex functions R. Goebel and R.T. Rockafellar May 23, 2007 Abstract. Given a pair of convex conjugate functions f and f, we investigate
More informationA Smoothing Newton Method for Solving Absolute Value Equations
A Smoothing Newton Method for Solving Absolute Value Equations Xiaoqin Jiang Department of public basic, Wuhan Yangtze Business University, Wuhan 430065, P.R. China 392875220@qq.com Abstract: In this paper,
More informationFrank-Wolfe Method. Ryan Tibshirani Convex Optimization
Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)
More informationKey words. saddle-point dynamics, asymptotic convergence, convex-concave functions, proximal calculus, center manifold theory, nonsmooth dynamics
SADDLE-POINT DYNAMICS: CONDITIONS FOR ASYMPTOTIC STABILITY OF SADDLE POINTS ASHISH CHERUKURI, BAHMAN GHARESIFARD, AND JORGE CORTÉS Abstract. This paper considers continuously differentiable functions of
More information1 The Observability Canonical Form
NONLINEAR OBSERVERS AND SEPARATION PRINCIPLE 1 The Observability Canonical Form In this Chapter we discuss the design of observers for nonlinear systems modelled by equations of the form ẋ = f(x, u) (1)
More informationComputing Neural Network Gradients
Computing Neural Network Gradients Kevin Clark 1 Introduction The purpose of these notes is to demonstrate how to quickly compute neural network gradients in a completely vectorized way. It is complementary
More information