Understanding Automatic Differentiation to Improve Performance

1 Charles Margossian, Columbia University, Department of Statistics. July 22nd, 2018.

3 Samplers and the order of derivative each requires:
Metropolis-Hastings, Gibbs: 0 (value)
Hamiltonian Monte Carlo: 1 (gradient)
Riemannian HMC: 2 (Hessian) and 3

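6 How do we efficiently compute
$$\nabla_\theta \log \pi(\theta \mid x) = \left( \frac{\partial \log \pi(\theta \mid x)}{\partial \theta_1}, \ldots, \frac{\partial \log \pi(\theta \mid x)}{\partial \theta_n} \right)?$$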

7 $f(y, \mu, \sigma) = \left( \frac{y - \mu}{\sigma} \right)^2$

8 Expression graph: $v_1 = y$, $v_2 = \mu$, $v_3 = \sigma$, $v_4 = v_1 - v_2$, $v_5 = v_4 / v_3$, $v_6 = \mathrm{pow}(v_5, 2)$.
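A minimal sketch of how a forward sweep and a reverse sweep over this expression graph could be written by hand, assuming the node labels v1 to v6 from the slide. The function name `normal_kernel_grad` and the adjoint names `a1` to `a6` are hypothetical, and this is an illustration of reverse-mode automatic differentiation, not the implementation used in Stan.

```python
def normal_kernel_grad(y, mu, sigma):
    """Return f = ((y - mu) / sigma)**2 and its gradient w.r.t. (y, mu, sigma)."""
    # Forward sweep: evaluate each node of the expression graph.
    v1, v2, v3 = y, mu, sigma
    v4 = v1 - v2
    v5 = v4 / v3
    v6 = v5 ** 2

    # Reverse sweep: propagate adjoints a_i = d v6 / d v_i from the root down.
    a6 = 1.0
    a5 = a6 * 2.0 * v5          # d(v5**2) / d v5
    a4 = a5 / v3                # d(v4 / v3) / d v4
    a3 = a5 * (-v4 / v3 ** 2)   # d(v4 / v3) / d v3
    a1 = a4                     # d(v1 - v2) / d v1 = 1
    a2 = -a4                    # d(v1 - v2) / d v2 = -1
    return v6, (a1, a2, a3)


value, grad = normal_kernel_grad(y=3.0, mu=1.0, sigma=2.0)
print(value, grad)
```

A single reverse sweep yields all three partial derivatives at once, which is why reverse mode suits gradients of a scalar log density with many parameters.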

9 Solving algebraic equations
Find $x$ such that $f(x, \theta) = 0$ and compute $\nabla_\theta x(\theta)$.
Example: Newton's algorithm,
$$x_{i+1} = x_i - \frac{f(x_i, \theta)}{f'(x_i, \theta)}$$

10 Computing derivatives
$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}$$
Figure: Topological graph for automatic differentiation, with nodes $x_{i+1}$, $/$, $x_i$, $f(x_i)$ and $f'(x_i)$. The orange nodes further expand into topological graphs, across which we apply the chain rule.

11 Using semi-analytical solutions
Under certain regularity conditions:
$$\nabla_\theta x(\theta) = -\left( \frac{\partial f}{\partial x} \right)^{-1} \frac{\partial f}{\partial \theta}$$
The result extends to higher dimensions, by using Jacobian matrices.
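The semi-analytical derivative can be checked numerically. The sketch below assumes a toy problem, $f(x, \theta) = x^2 - \theta$, which is not from the slides; the names `newton_solve`, `df_dx` and `df_dtheta` are hypothetical. It solves the equation with Newton's iteration, applies the implicit function theorem at the solution, and compares the result with a finite difference taken through the whole solver.

```python
def newton_solve(f, df_dx, x0, theta, tol=1e-12, max_iter=100):
    """Find x with f(x, theta) = 0 by Newton's iteration."""
    x = x0
    for _ in range(max_iter):
        step = f(x, theta) / df_dx(x, theta)
        x -= step
        if abs(step) < tol:
            break
    return x


# Toy problem (an assumption for illustration): the positive root is sqrt(theta).
def f(x, theta):
    return x ** 2 - theta


def df_dx(x, theta):
    return 2.0 * x


def df_dtheta(x, theta):
    return -1.0


theta = 2.0
x_star = newton_solve(f, df_dx, x0=1.0, theta=theta)

# Implicit function theorem: dx/dtheta = -(df/dx)^{-1} * df/dtheta at the solution.
dx_dtheta = -df_dtheta(x_star, theta) / df_dx(x_star, theta)

# Central finite difference through the solver, used only as a check.
h = 1e-6
fd = (newton_solve(f, df_dx, 1.0, theta + h)
      - newton_solve(f, df_dx, 1.0, theta - h)) / (2 * h)
print(x_star, dx_dtheta, fd)
```

The semi-analytical route needs only the partial derivatives of `f` at the converged solution, so the cost of differentiating does not grow with the number of Newton iterations.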

12 [Figure: run time (s) against number of states, for the semi-analytical and standard methods.]

13 [Figure: run time (s), log scale, against number of states, for the semi-analytical and standard methods.]

14 Example: ordinary differential equations
$y'(t) = f(y, t, \theta)$, where $y \in \mathbb{R}^n$ and $\theta \in \mathbb{R}^p$.

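15 Example: ordinary differential equations
$y'(t) = f(y, t, \theta)$
Need to compute:
the solution: $y$
the derivatives:
$$J = \begin{bmatrix} \frac{\partial y_1}{\partial \theta_1} & \cdots & \frac{\partial y_1}{\partial \theta_p} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_n}{\partial \theta_1} & \cdots & \frac{\partial y_n}{\partial \theta_p} \end{bmatrix}$$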

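16 Components which may require sensitivities:
model parameters, $\theta \in \mathbb{R}^P$
initial states, $y^0 \in \mathbb{R}^N$
time, $t_1 \in \mathbb{R}$
$$J = \begin{bmatrix} \frac{\partial y_1}{\partial \theta_1} & \cdots & \frac{\partial y_1}{\partial \theta_p} & \frac{\partial y_1}{\partial y^0_1} & \cdots & \frac{\partial y_1}{\partial y^0_n} \\ \vdots & & \vdots & \vdots & & \vdots \\ \frac{\partial y_n}{\partial \theta_1} & \cdots & \frac{\partial y_n}{\partial \theta_p} & \frac{\partial y_n}{\partial y^0_1} & \cdots & \frac{\partial y_n}{\partial y^0_n} \end{bmatrix}$$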

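17 Coupled Ordinary Differential Equations:
$$\begin{aligned} y_1' &= f_1(y, t, \theta) \\ y_2' &= f_2(y, t, \theta) \\ &\ \vdots \\ \frac{d}{dt} \frac{\partial y_1}{\partial \theta_1} &= f_{1,1}(y, t, \theta) \\ &\ \vdots \\ \frac{d}{dt} \frac{\partial y_n}{\partial \theta_p} &= f_{n,p}(y, t, \theta) \\ \frac{d}{dt} \frac{\partial y_1}{\partial y^0_1} &= f_{n,p}(y, t, \theta) \\ &\ \vdots \end{aligned}$$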
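To make the coupled system concrete, here is a minimal sketch for a one-state example, $y' = -\theta y$ with $y(0) = y^0$, which is an assumption chosen because its sensitivities are known in closed form. The helper names `rk4` and `coupled_rhs` are hypothetical, and the fixed-step Runge-Kutta integrator stands in for whatever solver is actually used. The state is augmented with $\partial y / \partial \theta$ and $\partial y / \partial y^0$, and all three are integrated together.

```python
import math


def rk4(rhs, state, t0, t1, n_steps):
    """Integrate state' = rhs(t, state) from t0 to t1 with classical RK4."""
    h = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        k1 = rhs(t, state)
        k2 = rhs(t + h / 2, [s + h / 2 * k for s, k in zip(state, k1)])
        k3 = rhs(t + h / 2, [s + h / 2 * k for s, k in zip(state, k2)])
        k4 = rhs(t + h, [s + h * k for s, k in zip(state, k3)])
        state = [s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
        t += h
    return state


def coupled_rhs(theta):
    """Right-hand side of the coupled system for y' = -theta * y."""
    def rhs(t, state):
        y, dy_dtheta, dy_dy0 = state
        return [
            -theta * y,               # original state
            -y - theta * dy_dtheta,   # d/dt of dy/dtheta
            -theta * dy_dy0,          # d/dt of dy/dy0
        ]
    return rhs


theta, y0, t_end = 0.5, 2.0, 3.0
# Initial sensitivities: dy/dtheta = 0 and dy/dy0 = 1 at t = 0.
y, dy_dtheta, dy_dy0 = rk4(coupled_rhs(theta), [y0, 0.0, 1.0], 0.0, t_end, 300)

# Closed-form values for comparison.
exact_y = y0 * math.exp(-theta * t_end)
print(y, exact_y)
print(dy_dtheta, -t_end * exact_y)
print(dy_dy0, math.exp(-theta * t_end))
```

Even with a single original state, the integrator here carries three states, which illustrates how quickly the coupled system grows with the number of states and parameters.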

21 Number of evaluations when we require sensitivities for model parameters and initial states:
$$C \sim N(N + N^2 + P + PN)$$
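The expression in $N$ and $P$ on this slide can be factored, which may make its growth easier to see. The identity below is plain algebra. Reading the factor $N(N + P)$ as the number of entries of the Jacobian $J$ on slide 16 ($N$ rows, $P + N$ columns) is an interpretation, not something stated on the slides.

$$N(N + N^2 + P + PN) = N\,(N + 1)\,(N + P)$$

Under that reading, the count grows cubically in the number of states $N$ and linearly in the number of parameters $P$.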

24 PK / PD ordinary differential equation
$$\begin{aligned} y_{PK}' &= f_{PK}(y_{PK}, t) \\ y_{PD}' &= f_{PD}(y_{PK}, y_{PD}, t) \end{aligned}$$
where $y_{PK} \in \mathbb{R}^{N_{PK}}$ and $y_{PD} \in \mathbb{R}^{N_{PD}}$.

26 Full integration
$$y = \int f(y, t, \theta) \, dt$$

27 Mixed Solving
$$\begin{aligned} y_{PK} &= F_{PK}(t, \theta) \\ y_{PD} &= \int f_{PD}(F_{PK}, y_{PD}, t, \theta) \, dt \end{aligned}$$
Computing $F_{PK}$ is more expensive than computing $f$!
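A minimal sketch contrasting full integration with mixed solving, assuming a hypothetical one-compartment PK model with a known analytical solution and a linear PD state. The model, the rate constants `k` and `k_out`, and the helper names `rk4`, `full_rhs`, `pk_analytical` and `pd_rhs` are all illustrative assumptions, not Torsten's routines. In the mixed version only the PD state is integrated numerically, with the PK state supplied analytically at each evaluation.

```python
import math


def rk4(rhs, state, t0, t1, n_steps):
    """Integrate state' = rhs(t, state) from t0 to t1 with classical RK4."""
    h = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        k1 = rhs(t, state)
        k2 = rhs(t + h / 2, [s + h / 2 * k for s, k in zip(state, k1)])
        k3 = rhs(t + h / 2, [s + h / 2 * k for s, k in zip(state, k2)])
        k4 = rhs(t + h, [s + h * k for s, k in zip(state, k3)])
        state = [s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
        t += h
    return state


# Hypothetical model: PK elimination at rate k, PD driven by PK with loss rate k_out.
k, k_out, dose = 0.3, 0.8, 10.0


def full_rhs(t, state):
    """Full integration: both PK and PD states are integrated numerically."""
    y_pk, y_pd = state
    return [-k * y_pk, y_pk - k_out * y_pd]


def pk_analytical(t):
    """Analytical PK solution F_PK(t) for the one-compartment model."""
    return dose * math.exp(-k * t)


def pd_rhs(t, state):
    """Mixed solving: only the PD state is integrated; PK comes from F_PK."""
    (y_pd,) = state
    return [pk_analytical(t) - k_out * y_pd]


t_end = 5.0
full = rk4(full_rhs, [dose, 0.0], 0.0, t_end, 500)
mixed = [pk_analytical(t_end), rk4(pd_rhs, [0.0], 0.0, t_end, 500)[0]]
print(full, mixed)
```

Both approaches should agree to integrator accuracy. The mixed version evaluates the exponential in `pk_analytical` at every stage, which reflects the remark on the slide that computing $F_{PK}$ costs more than computing $f$, while the numerically integrated system is smaller.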

28 Computer experiment
PK model with $N_{PK} = 3$
PD model with $N_{PD} = 5$
Theoretical relative cost: 0.42
Note 5/8 = 0.625 > 0.42!

29 More theoretical results
Table columns: initial state for $y_1$, initial state for $y_2$, parameters, $R$.

30 Empirical result R = ± 13.51(%)

31 Drawbacks:
Coding analytical solutions is time-consuming and error-prone.
There is some difficult bookkeeping when doing mixed solving.
Torsten has routines to do so when the PK is a one- or two-compartment model: mixedode1cptmodel and mixedode2cptmodel.
Torsten also uses mixed solving for algebraic equations.

32 Acknowledgment
Individuals: Bill Gillespie (Metrum Research Group), Bob Carpenter (Columbia), Andrew Gelman (Columbia), Sebastian Weber (Novartis), Michael Betancourt (Symplectomorphic LLC), Ben Goodrich (Columbia), Yi Zhang (Metrum Research Group)
Institutions: Office of Naval Research, Bill & Melinda Gates Foundation, Columbia University, Metrum Research Group, AstraZeneca