A fully implicit, exactly conserving algorithm for multidimensional particle-in-cell kinetic simulations

L. Chacón
Applied Mathematics and Plasma Physics Group, Theoretical Division
Los Alamos National Laboratory, P.O. Box 1663, Los Alamos, NM 87544
Other collaborators: G. Chen, W. Taitano, D. A. Knoll (LANL); D. C. Barnes (Coronado Consulting)
Numerical Methods for Large-Scale Nonlinear Problems and Their Applications, Aug 31-Sept 4, 2015, ICERM, Brown U., Providence, RI
Work funded by DOE-SC ASCR, and the LANL and ORNL LDRD programs

Outline
- Particle-in-cell (PIC) methods for plasmas
- Explicit vs. implicit PIC
- Preconditioned JFNK
- 1D ES implicit PIC
- Multirate implicit time integrator (particle subcycling)
- Moment-based preconditioning
- Generalization to curvilinear multi-D EM PIC: Vlasov-Darwin model
- Review and motivation for Darwin model
- Conservation properties (energy, charge, and canonical momenta)
- Numerical benchmarks

Introduction

Kinetic Plasma Simulation
- A fully ionized collisionless plasma: ions, electrons, and electromagnetic fields.
- Challenge: integrate the electron-ion-field kinetic system on an ion time scale and a system length scale while retaining electron kinetic effects accurately. (We are developing a new implicit algorithm for long-term, system-scale simulations.)
- The problem features a hierarchical description: how to design a multiscale algorithm? How to respect conservation laws and constraints?

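Particle-in-cell (PIC) methods for kinetic plasma simulation
- Vlasov equation:
    $\partial_t f + \mathbf{v}\cdot\nabla f + \frac{\mathbf{F}}{m}\cdot\nabla_v f = 0$
- Lagrangian solution by the method of characteristics:
    $f(\mathbf{x},\mathbf{v},t) = f_0\left(\mathbf{x} - \int_0^t dt\,\mathbf{v},\ \mathbf{v} - \frac{1}{m}\int_0^t dt\,\mathbf{F}\right)$ ; $\mathbf{x}(t=0) = \mathbf{x}_0$ ; $\mathbf{v}(t=0) = \mathbf{v}_0$
- The PIC approach follows characteristics employing macroparticles (volumes in phase space):
    $f(\mathbf{x},\mathbf{v},t) = \sum_p \delta(\mathbf{x}-\mathbf{x}_p)\,\delta(\mathbf{v}-\mathbf{v}_p)$
    $\dot{\mathbf{x}}_p = \mathbf{v}_p$ ; $\dot{\mathbf{v}}_p = \frac{q_p}{m_p}(\mathbf{E} + \mathbf{v}_p\times\mathbf{B})$
- Maxwell's equations:
    $\partial_t \mathbf{B} + \nabla\times\mathbf{E} = 0$ ; $-\mu_0\epsilon_0\,\partial_t \mathbf{E} + \nabla\times\mathbf{B} = \mu_0 \mathbf{j}$ ; $\nabla\cdot\mathbf{B} = 0$ ; $\nabla\cdot\mathbf{E} = \rho/\epsilon_0$
- Particle-grid coupling through a shape function $S$:
    $\delta(\mathbf{x}-\mathbf{x}_p) \rightarrow S(\mathbf{x}-\mathbf{x}_p)$ ; $\mathbf{E}_p = \sum_i \mathbf{E}_i\,S(\mathbf{x}_i-\mathbf{x}_p)$ ; $\mathbf{j}_i = \sum_p \mathbf{j}_p\,S(\mathbf{x}_i-\mathbf{x}_p)$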
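
The gather ($E_p = \sum_i E_i S$) and scatter ($j_i = \sum_p j_p S$) operations above can be sketched in a few lines. This is a minimal illustration, assuming a 1D periodic grid and a linear (cloud-in-cell) shape function; the function names, grid size, and particle data are hypothetical and not taken from the talk.

```python
import numpy as np

def linear_weights(xp, dx, ng):
    """Left/right node indices and linear shape-function weights (periodic grid)."""
    s = xp / dx
    left = np.floor(s).astype(int)
    w_right = s - left
    return left % ng, (left + 1) % ng, 1.0 - w_right, w_right

def gather(E_grid, xp, dx):
    """Field at the particles: E_p = sum_i E_i S(x_i - x_p)."""
    il, ir, wl, wr = linear_weights(xp, dx, E_grid.size)
    return wl * E_grid[il] + wr * E_grid[ir]

def scatter(jp, xp, dx, ng):
    """Moment on the grid: j_i = sum_p j_p S(x_i - x_p)."""
    il, ir, wl, wr = linear_weights(xp, dx, ng)
    j_grid = np.zeros(ng)
    np.add.at(j_grid, il, wl * jp)   # np.add.at accumulates repeated indices
    np.add.at(j_grid, ir, wr * jp)
    return j_grid

# Hypothetical data: 8 nodes, 3 particles (the last one wraps around the boundary)
ng, dx = 8, 0.5
xp = np.array([0.25, 1.9, 3.99])
jp = np.array([1.0, -2.0, 0.5])
E_grid = np.cos(2 * np.pi * np.arange(ng) / ng)
j_grid = scatter(jp, xp, dx, ng)
E_p = gather(E_grid, xp, dx)
```

Using the same shape function for both operations makes gather and scatter adjoint to each other, i.e. $\sum_i E_i j_i = \sum_p E_p j_p$, which is the property the energy-conservation argument later in the talk relies on.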

State-of-the-art classical PIC algorithm is explicit
- Classical explicit PIC leap-frogs particle positions and velocities, with a field solve at each position update.
- Implementation is straightforward, but...
- Performance limitations:
  - CFL-type instability: the more restrictive of $\omega_{pe}\Delta t < 1$ and $c\,\Delta t < \Delta x$ sets the minimum temporal resolution.
  - Finite-grid instability: $\Delta x < \lambda_{Debye}$ sets the minimum spatial resolution.
  - Memory bound: challenging for efficient use of modern computer architectures.
- Accuracy limitations:
  - Lack of energy conservation, problematic for long-time-scale simulations.
- To remove the stability/accuracy constraints of explicit methods, we consider implicit methods.

Implicit PIC methods
- Exploration of implicit PIC started in the 1980s:
  - Implicit moment method [1]
  - Direct implicit method [2]
- Early approaches used linearized, semi-implicit formulations:
  - Lack of nonlinear convergence
  - Particle orbit accuracy (particles and fields integrated in lock-step)
  - Inconsistencies between particles and moments
  - ⇒ Inaccuracies! Plasma self-heating/cooling [3]
- Our approach: nonlinear implicit PIC
  - Enforcing nonlinear convergence; complete consistency between particles, moments, and fields.
  - Allowing stable and robust integrations with large $\Delta t$ and $\Delta x$ (2nd-order accurate).
  - Ensuring exact global energy conservation and local charge conservation properties.
  - Allowing adaptivity in both time and space without loss of the conservation properties.
  - Allowing particle subcycling ⇒ high operational intensities (compute bound).
  - Allowing fluid preconditioning to accelerate the iterative kinetic solver!
[1] Mason, R. J. (1981); Brackbill, J. U., and Forslund, D. W. (1982)
[2] Friedman, A., Langdon, A. B., and Cohen, B. I. (1981)
[3] Cohen, B. I., Langdon, A. B., Hewett, D. W., and Procassini, R. J. (1989)

Fully implicit PIC: 1D electrostatic PIC
Chen et al., JCP 2011, 2012, 2013; Taitano et al., SISC (2013)

Fully implicit PIC formulation (at first glance)
- A fully implicit formulation couples particles and fields non-trivially (integro-differential PDE):
    $\frac{f^{n+1}-f^n}{\Delta t} + v\cdot\nabla\frac{f^{n+1}+f^n}{2} - \frac{q}{m}\nabla\frac{\Phi^{n+1}+\Phi^n}{2}\cdot\nabla_v\frac{f^{n+1}+f^n}{2} = 0$
    $\nabla^2\Phi^{n+1} = -\frac{q}{\epsilon_0}\int dv\, f^{n+1}$, or $\nabla^2\frac{\Phi^{n+1}-\Phi^n}{\Delta t} = \frac{q}{\epsilon_0}\int dv\, v\cdot\nabla\frac{f^{n+1}+f^n}{2}$
- In PIC, $f^{n+1}$ is sampled by a large collection of particles in phase space, $\{x, v\}_p^{n+1}$.
  - There are $N_p$ particles, each particle requiring $2d$ equations ($d$ dimensions).
  - The field requires $N_g$ equations, one per grid point.
- If implemented naively, an impractically large algebraic system of equations results:
    $G(\{x, v\}_p^{n+1}, \{\Phi^{n+1}\}_g) = 0$, with $\dim(G) = 2dN_p + N_g$
  - No current computing mainframe can afford the memory requirements.
  - Algorithmic issues are showstoppers (e.g., how to precondition it?).
- An alternative strategy exists: nonlinear elimination (particle enslavement).
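
To get a feel for the size of the naive system, the count $\dim(G) = 2dN_p + N_g$ can be evaluated for a concrete case. The grid size and particles-per-cell figure below are purely illustrative assumptions, not numbers from the talk.

```python
def naive_unknowns(n_particles, n_grid, dim):
    """Size of the naive coupled system: 2*d equations per particle plus one per grid point."""
    return 2 * dim * n_particles + n_grid

# Hypothetical 3D problem: 64^3 grid points, 100 particles per cell
dim = 3
n_grid = 64 ** 3
n_particles = 100 * n_grid

naive = naive_unknowns(n_particles, n_grid, dim)
ratio = naive / n_grid   # unknowns in the naive system per field unknown
print(f"naive system: {naive:,} unknowns, {ratio:.0f}x the number of grid unknowns")
```

For these assumed numbers the particle equations outnumber the field equations by a factor of 600, which is what makes a monolithic nonlinear solve impractical.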

Particle enslavement (nonlinear elimination)
- The full residual $G(\{x, v\}_p, \{\Phi\}_g) = 0$ is impractical to implement.
- Alternative: nonlinearly eliminate particle quantities so that they are not dependent variables.
  - Formally, the particle equations of motion are functionals of the electrostatic potential:
      $x_p^{n+1} = x_p[\Phi^{n+1}]$ ; $v_p^{n+1} = v_p[\Phi^{n+1}]$
      $G(x_p^{n+1}, v_p^{n+1}, \Phi^{n+1}) = G(x[\Phi^{n+1}], v[\Phi^{n+1}], \Phi^{n+1}) = \tilde{G}(\Phi^{n+1})$
- The nonlinear residual can be unambiguously formulated in terms of the electrostatic potential only!
- JFNK storage requirements are dramatically decreased, making it tractable:
  - Nonlinear solver storage requirements $\propto N_g$, comparable to a fluid simulation.
  - Particle quantities ⇒ auxiliary variables: only a single copy of the particle population needs to be maintained in memory throughout the nonlinear iteration.

Energy-conserving (EC) Vlasov-Ampère discretization
- Fully implicit Crank-Nicolson time discretization:
    $\frac{x_p^{n+1}-x_p^n}{\Delta t} - v_p^{n+1/2} = 0$
    $\frac{v_p^{n+1}-v_p^n}{\Delta t} - \frac{q_p}{m_p}\sum_i E_i^{n+1/2}\,S(x_i - x_p^{n+1/2}) = 0$
    $\epsilon_0\frac{E_i^{n+1}-E_i^n}{\Delta t} + j_i^{n+1/2} - \langle j\rangle = 0$
    $j_i^{n+1/2} = \sum_p q_p v_p^{n+1/2}\,S(x_p^{n+1/2} - x_i)$
- C-N enforces energy conservation to numerical round-off:
    $\sum_p \frac{m_p}{2}(v_p^{n+1}+v_p^n)(v_p^{n+1}-v_p^n) = -\sum_i \epsilon_0\,(E_i^{n+1}-E_i^n)\,\frac{E_i^{n+1}+E_i^n}{2}$
    $\Rightarrow\ \sum_p \frac{1}{2}m_p v_p^2 + \sum_i \frac{1}{2}\epsilon_0 E_i^2 = \text{const}$
- No CFL condition. Does not suffer from finite-grid instabilities: $\Delta x \gg \lambda_D$ is allowed!
- Requires that particles and fields are nonlinearly converged.
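
The particle part of the Crank-Nicolson scheme is implicit, because the field is evaluated at the unknown midpoint position. A minimal sketch of one such push follows, assuming a single particle in a prescribed, static 1D field and a plain Picard (fixed-point) iteration; the field profile and parameters are hypothetical. It checks the per-particle identity behind the energy theorem: the change in kinetic energy equals the work $q\,E(x^{n+1/2})\,v^{n+1/2}\,\Delta t$.

```python
import numpy as np

def cn_push(x0, v0, efield, q_over_m, dt, tol=1e-14, max_iter=100):
    """One Crank-Nicolson (implicit midpoint) step, solved by Picard iteration."""
    x1, v1 = x0 + dt * v0, v0                      # explicit initial guess
    for _ in range(max_iter):
        x_half = 0.5 * (x0 + x1)
        v1_new = v0 + dt * q_over_m * efield(x_half)
        x1_new = x0 + dt * 0.5 * (v0 + v1_new)
        change = abs(x1_new - x1) + abs(v1_new - v1)
        x1, v1 = x1_new, v1_new
        if change < tol:
            break
    return x1, v1

def efield(x):
    """Hypothetical static field, chosen only for illustration."""
    return 0.3 * np.sin(x)

q, m, dt = -1.0, 1.0, 0.5
x0, v0 = 0.4, 1.2
x1, v1 = cn_push(x0, v0, efield, q / m, dt)

kinetic_change = 0.5 * m * (v1**2 - v0**2)
work = q * efield(0.5 * (x0 + x1)) * 0.5 * (v0 + v1) * dt
print(kinetic_change - work)   # zero up to the Picard tolerance
```

The identity holds only once the iteration has converged, which mirrors the statement above that energy conservation requires nonlinear convergence.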

Jacobian-Free Newton-Krylov Methods
- After spatial and temporal discretization, a large set of nonlinear equations results: $\vec{G}(\vec{x}^{n+1}) = \vec{0}$
- Converging nonlinear couplings requires iteration. Newton-Raphson method:
    $\left.\frac{\partial \vec{G}}{\partial \vec{x}}\right|_k \delta\vec{x}_k = -\vec{G}(\vec{x}_k)$
- Jacobian linear systems result, which require a linear solver ⇒ Krylov subspace methods (GMRES).
  - Only require matrix-vector products to proceed.
  - The Jacobian-vector product can be computed Jacobian-free (CRITICAL: no need to form the Jacobian matrix):
      $\left(\frac{\partial \vec{G}}{\partial \vec{x}}\right)_k \vec{y} = J_k\vec{y} = \lim_{\epsilon\to 0}\frac{\vec{G}(\vec{x}_k+\epsilon\vec{y}) - \vec{G}(\vec{x}_k)}{\epsilon}$
- Krylov methods can be easily preconditioned, with $P_k^{-1} \approx J_k^{-1}$:
    $J_k P_k^{-1} P_k\,\delta\vec{x} = -\vec{G}_k$
- We will explore suitable preconditioning strategies later in this talk.
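
The following is a minimal, unpreconditioned JFNK sketch, assuming NumPy only: a finite-difference Jacobian-vector product, a small unrestarted GMRES built on Arnoldi, and a bare Newton loop without globalization. The test problem (a tridiagonal operator plus a cubic term), the perturbation scaling, and all tolerances are illustrative assumptions rather than choices made in the talk.

```python
import numpy as np

def jac_vec(G, x, Gx, y, rel_eps=1e-7):
    """Jacobian-free product: J y ~ [G(x + eps*y) - G(x)] / eps."""
    norm_y = np.linalg.norm(y)
    if norm_y == 0.0:
        return np.zeros_like(x)
    eps = rel_eps * (1.0 + np.linalg.norm(x)) / norm_y
    return (G(x + eps * y) - Gx) / eps

def gmres(matvec, b, tol=1e-8, max_iter=50):
    """Unrestarted GMRES from a zero initial guess; needs only matrix-vector products."""
    beta = np.linalg.norm(b)
    n = b.size
    m = min(max_iter, n)
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = b / beta
    for k in range(m):
        w = matvec(Q[:, k])
        for i in range(k + 1):                  # modified Gram-Schmidt
            H[i, k] = Q[:, i] @ w
            w = w - H[i, k] * Q[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        e1 = np.zeros(k + 2)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:k + 2, :k + 1], e1, rcond=None)
        res = np.linalg.norm(H[:k + 2, :k + 1] @ y - e1)
        if res <= tol * beta or H[k + 1, k] < 1e-14:
            break
        Q[:, k + 1] = w / H[k + 1, k]
    return Q[:, :k + 1] @ y

def jfnk(G, x0, tol=1e-10, max_newton=20):
    """Newton iteration; each linear solve uses GMRES with Jacobian-free products."""
    x = x0.copy()
    for _ in range(max_newton):
        Gx = G(x)
        if np.linalg.norm(Gx) < tol:
            break
        x = x + gmres(lambda y: jac_vec(G, x, Gx, y), -Gx)
    return x

# Hypothetical test problem: G(x) = A x + x^3 - b
n = 20
A = 2.0 * np.eye(n) - 0.5 * np.eye(n, k=1) - 0.5 * np.eye(n, k=-1)
b = np.ones(n)
G = lambda x: A @ x + x**3 - b
x_sol = jfnk(G, np.zeros(n))
print(np.linalg.norm(G(x_sol)))
```

Note that the solver never sees the matrix $J_k$, only the residual function `G`; this is what allows the residual evaluation to hide an arbitrarily complicated particle move.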

Algorithmic implementation details
The nonlinear residual $G(E^{n+1})$, based on the Vlasov-Ampère formulation, is evaluated as follows:
1. Input $E$ (given by the JFNK iterative method)
2. Move particles (i.e., find $x_p[E]$, $v_p[E]$ by solving the equations of motion)
   (a) Requires inner (local) nonlinear iteration: Picard (not stiff)
   (b) Can be as complicated as we desire (substepping, adaptivity, etc.)
3. Compute moments (current)
4. Form the Vlasov-Ampère equation residual
5. Return
- Because the dependent variables are fields, we can explore moment-based preconditioning!
- Because the particle move is performed within the function evaluation, we have much freedom. We can explore improvements in the particle mover to ensure long-term accuracy:
  - Particle substepping and orbit averaging (ensures orbit accuracy and preserves exact energy conservation)
  - Exact charge conservation strategy (a new charge-conserving particle mover)
  - Orbit adaptivity (to improve momentum conservation)
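
The five steps above can be sketched as a single residual function. This is a minimal 1D periodic illustration under several assumptions that are not from the talk: linear shape functions, one electron species with a uniform neutralizing background, no substepping, a fixed number of inner Picard sweeps, and an outer fixed-point iteration on $E$ standing in for JFNK (adequate here only because $\omega_{pe}\Delta t$ is small). All names and parameters are hypothetical.

```python
import numpy as np

def shape_weights(x, dx, ng):
    """Linear-interpolation node indices and weights on a periodic grid."""
    s = x / dx
    left = np.floor(s).astype(int)
    w_right = s - left
    return left % ng, (left + 1) % ng, 1.0 - w_right, w_right

def residual(E_new, E_old, x0, v0, q, m, dt, dx, eps0=1.0, n_picard=20):
    """Evaluate G(E^{n+1}); particles are moved inside and are not solver unknowns."""
    ng = E_new.size
    E_half = 0.5 * (E_new + E_old)                 # step 1: field supplied by the solver
    x1, v1 = x0 + dt * v0, v0.copy()
    for _ in range(n_picard):                      # step 2: Crank-Nicolson particle move
        il, ir, wl, wr = shape_weights(0.5 * (x0 + x1), dx, ng)
        v1 = v0 + dt * (q / m) * (wl * E_half[il] + wr * E_half[ir])
        x1 = x0 + dt * 0.5 * (v0 + v1)
    il, ir, wl, wr = shape_weights(0.5 * (x0 + x1), dx, ng)
    v_half = 0.5 * (v0 + v1)
    j = np.zeros(ng)                               # step 3: current moment
    np.add.at(j, il, q * v_half * wl / dx)
    np.add.at(j, ir, q * v_half * wr / dx)
    G = eps0 * (E_new - E_old) / dt + j - j.mean() # step 4: Ampere residual
    return G, x1, v1                               # step 5

rng = np.random.default_rng(0)
L, ng, n_p, dt, eps0 = 2 * np.pi, 16, 2000, 0.2, 1.0
dx = L / ng
q, m = -L / n_p, L / n_p                           # macro-particle weights giving omega_pe = 1
x0 = rng.uniform(0.0, L, n_p)
v0 = 0.1 * rng.standard_normal(n_p)
E_old = 0.05 * np.sin(dx * np.arange(ng))

E_new = E_old.copy()
for _ in range(30):                                # stand-in for the JFNK solve of G(E) = 0
    G, _, _ = residual(E_new, E_old, x0, v0, q, m, dt, dx, eps0)
    E_new = E_new - dt * G / eps0
G, x1, v1 = residual(E_new, E_old, x0, v0, q, m, dt, dx, eps0)

def total_energy(v, E):
    return 0.5 * m * np.sum(v**2) + 0.5 * eps0 * dx * np.sum(E**2)

energy_change = total_energy(v1, E_new) - total_energy(v0, E_old)
print(np.linalg.norm(G), energy_change)
```

Once the residual is converged, the total energy change over the step sits at round-off level, consistent with the Crank-Nicolson energy theorem; the solver itself only ever stores vectors of length $N_g$.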

Adaptive, charge-conserving particle orbit integration
- Multi-rate integrator: field time scale ($\Delta t$) and orbit time scale ($\Delta\tau$) can be well separated.
  - Field does not change appreciably in $\Delta t$.
  - Accurate orbit integration requires particle sub-stepping!
- Particles stop at cell boundaries ⇒ exact charge conservation for B-splines with order $\le 2$:
    $\rho_{i+1/2} = \sum_p q_p \frac{S_m(x_p - x_{i+1/2})}{\Delta x}$ ; $j_i = \sum_p q_p v_p \frac{S_{m-1}(x_p - x_i)}{\Delta x}$
    $S_m'(x) = \frac{S_{m-1}(x+\frac{\Delta x}{2}) - S_{m-1}(x-\frac{\Delta x}{2})}{\Delta x}$ ($m = 1, 2$)
    $\Rightarrow\ \left[\partial_t\rho + \nabla\cdot j\right]^{n+1/2}_{i+1/2} = 0$
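
The role of the cell-boundary stops can be checked numerically. The sketch below assumes the lowest-order pair $m = 1$ (linear $S_1$ for $\rho$ at half nodes, top-hat $S_0$ for $j$ at nodes), a single particle on a 1D periodic grid, and a straight-line move; the helper names and numbers are hypothetical. It compares the time-integrated discrete continuity equation with and without splitting the move at cell faces.

```python
import numpy as np

def rho_half_nodes(xp, q, dx, ng):
    """rho_{i+1/2} = sum_p q S_1(x_p - x_{i+1/2}) / dx; entry k is the half node (k + 1/2) dx."""
    s = np.atleast_1d(np.asarray(xp, dtype=float)) / dx - 0.5
    left = np.floor(s).astype(int)
    w_right = s - left
    rho = np.zeros(ng)
    np.add.at(rho, left % ng, q * (1.0 - w_right) / dx)
    np.add.at(rho, (left + 1) % ng, q * w_right / dx)
    return rho

def integrated_current(xa, xb, q, dx, ng, stop_at_faces=True):
    """Sum over substeps of j_i * dtau for a move xa -> xb, with j deposited by S_0."""
    jdt = np.zeros(ng)
    ua, ub = xa / dx - 0.5, xb / dx - 0.5        # cell faces sit at integer u
    if not stop_at_faces:                        # single midpoint deposit, no stops
        node = int(np.floor(0.5 * (ua + ub))) + 1
        jdt[node % ng] += q * (ub - ua)
        return jdt
    u = ua
    while u != ub:
        if ub > u:                               # moving right: stop at the upper face
            node = int(np.floor(u)) + 1
            u_next = min(ub, float(node))
        else:                                    # moving left: stop at the lower face
            node = int(np.ceil(u))
            u_next = max(ub, float(node - 1))
        jdt[node % ng] += q * (u_next - u)       # q * v * dtau * S_0 / dx
        u = u_next
    return jdt

def continuity_error(xa, xb, q, dx, ng, stop_at_faces):
    """Max of |rho_new - rho_old + (jdt_{i+1} - jdt_i) / dx| over the half nodes."""
    d_rho = rho_half_nodes(xb, q, dx, ng) - rho_half_nodes(xa, q, dx, ng)
    jdt = integrated_current(xa, xb, q, dx, ng, stop_at_faces)
    return np.max(np.abs(d_rho + (np.roll(jdt, -1) - jdt) / dx))

dx, ng, q = 0.5, 8, 1.0
xa, xb = 1.15, 2.05                              # crosses two cell faces
err_with_stops = continuity_error(xa, xb, q, dx, ng, True)
err_without_stops = continuity_error(xa, xb, q, dx, ng, False)
print(err_with_stops, err_without_stops)
```

With stops at the faces every substep stays inside one cell, where $S_0$ is constant and $S_1$ is linear, so the discrete continuity equation closes exactly; the single midpoint deposit leaves an $O(1)$ residual for the same move.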

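Energy conservation and orbit averaging
- Particle substepping breaks energy conservation.
- The energy conservation theorem can be recovered by orbit-averaging Ampère's law:
    $\epsilon_0\,\partial_t E + j = \langle j\rangle$, $\quad\frac{1}{\Delta t}\int_t^{t+\Delta t} d\tau\,[\cdots]\ \Rightarrow\ \epsilon_0\frac{E^{n+1}-E^n}{\Delta t} + \bar{j} = \langle\bar{j}\rangle$
- The orbit-averaged current is found as:
    $\bar{j} = \frac{1}{\Delta t}\int_t^{t+\Delta t} d\tau\, j \approx \frac{1}{\Delta t}\sum_p\sum_{\nu=1}^{N_\nu} q_p v_p\,S(x - x_p)\,\Delta\tau^\nu$
- With these definitions, exact energy conservation is recovered:
    $\sum_p\sum_\nu \frac{m_p}{2}(v_p^{\nu+1}+v_p^\nu)(v_p^{\nu+1}-v_p^\nu) = -\sum_i \epsilon_0\,(E_i^{n+1}-E_i^n)\,\frac{E_i^{n+1}+E_i^n}{2}$
    $\Rightarrow\ \sum_p \frac{1}{2}m_p v_p^2 + \sum_i \frac{1}{2}\epsilon_0 E_i^2 = \text{const}$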

Accuracy impact of adaptive charge-conserving movers
[Figure: ion acoustic oscillations. Panels (a)-(d) show $K_i$, $K_e$, total energy, and total momentum normalized by $N_p\times[(m v_{th})_i + (m v_{th})_e]$, versus $t$, for the movers im,sub-cn; im,acc-cn; and im,cn at different $\Delta t$.]
Legend: im = implicit, cn = Crank-Nicolson, sub = fixed substepping, acc = adaptive charge-conserving.

Ion acoustic shock wave
[Figure: ion density $n_i$ (a.u.) and electric field $E$ (a.u.) versus $x$ at four successive times, comparing one explicit (ex) and two implicit (im) runs with different time steps.]
- Propagating IAW with perturbation level $\epsilon = .4$, with 4 particles/cell.
- Realistic mass ratio ($m_i/m_e = 2000$).
- Shock wave length scale $\sim$ Debye length.

Comparison against semi-implicit PIC

Moment-based acceleration of fully implicit kinetic solver
Chen et al., JCP (2014)

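CPU gain potential of implicit PIC vs. explicit PIC
- Back-of-the-envelope estimate of CPU gain:
    $CPU \sim \left(\frac{T}{\Delta t}\right)\left(\frac{L}{\Delta x}\right)^d n_p\,C^{solver}$ ; $\frac{C^{imp}}{C^{ex}} \sim N_{FE}\frac{\Delta t_{imp}}{\Delta\tau_{imp}}$ ; $\frac{CPU_{ex}}{CPU_{imp}} \sim \left(\frac{\Delta x_{imp}}{\Delta x_{ex}}\right)^d \frac{\Delta\tau_{imp}}{\Delta t_{ex}}\frac{1}{N_{FE}}$
- Using reasonable estimates:
    $\Delta\tau_{imp} \sim \min\left[0.1\frac{\Delta x_{imp}}{v_{th}},\ \Delta t_{imp}\right]$ ; $\Delta t_{imp} \sim 0.1\,\omega_{pi}^{-1}$ ; $\Delta t_{exp} \sim 0.1\,\omega_{pe}^{-1}$ ; $k\Delta x_{imp} \sim 0.2$ ; $\Delta x_{ex} \sim \lambda_D$
- CPU speedup is:
    $\frac{CPU_{ex}}{CPU_{imp}} \sim \frac{1}{(k\lambda_D)^d}\frac{1}{N_{FE}}\min\left[\frac{1}{k\lambda_D},\ \sqrt{\frac{m_i}{m_e}}\right]$
  - Better for realistic mass ratios and increased dimensionality!
  - Limited by solver performance $N_{FE}$ (preconditioning!)
  - Largely independent of $\Delta t$ for large enough time steps!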
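
The scaling estimate is easy to tabulate. The sketch below simply evaluates the final expression above, with order-one prefactors dropped as in the slide; the sample values of $k\lambda_D$, mass ratio, dimension, and $N_{FE}$ are illustrative assumptions.

```python
import math

def cpu_speedup(k_lambda_d, mass_ratio, dim, n_fe):
    """CPU_ex / CPU_imp ~ (k lambda_D)^-d * (1 / N_FE) * min[1 / (k lambda_D), sqrt(m_i / m_e)]."""
    return (1.0 / k_lambda_d**dim) * (1.0 / n_fe) * min(1.0 / k_lambda_d, math.sqrt(mass_ratio))

def transition_k_lambda_d(mass_ratio):
    """Value of k lambda_D at which the two arguments of the min[] are equal."""
    return math.sqrt(1.0 / mass_ratio)

# Hypothetical scan: realistic mass ratio, 20 function evaluations per time step
for dim in (1, 2, 3):
    for kld in (0.1, 0.01, 0.001):
        print(dim, kld, f"{cpu_speedup(kld, 1836.0, dim, 20):.3g}")
print(transition_k_lambda_d(1836.0))
```

Below the transition value of $k\lambda_D$ the orbit substep is limited by $\Delta t_{imp}$ rather than by cell crossing, so the estimated speedup grows as $(k\lambda_D)^{-d}$ instead of $(k\lambda_D)^{-(d+1)}$.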

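Moment-based acceleration of fully kinetic simulations
- Particle elimination ⇒ the nonlinear residual is formulated in terms of fields/moments ONLY: $G(E)$
- Within JFNK, the preconditioner ONLY needs to provide the field/moment update: $\delta E \approx -P^{-1}G$
- Premise of acceleration: obtain $\delta E$ from a fluid model, using the current particle distribution for closure.
- We begin with the corresponding fluid nonlinear model:
    $\partial_t n_\alpha = -\nabla\cdot\Gamma_\alpha$
    $m_\alpha\left[\partial_t\Gamma_\alpha + \nabla\cdot\left(\frac{1}{n_\alpha}\Gamma_\alpha\Gamma_\alpha\right)\right] = q_\alpha n_\alpha E + \nabla\cdot\left(n_\alpha\left(\frac{\Pi_\alpha}{n_\alpha}\right)_p\right)$
    $\epsilon_0\,\partial_t E = -\sum_\alpha q_\alpha\Gamma_\alpha$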

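Moment-based acceleration of fully kinetic simulations (cont.)
- We formulate approximate linearized fluid equations (neglecting the linear temperature response):
    $\frac{\delta n_\alpha}{\Delta t} = -\nabla\cdot\delta\Gamma_\alpha$
    $m_\alpha\frac{\delta\Gamma_\alpha}{\Delta t} = q_\alpha(\delta n_\alpha E + n_{\alpha,p}\,\delta E) + \nabla\cdot\left(\left(\frac{\Pi_\alpha}{n_\alpha}\right)_p\delta n_\alpha\right)$
    $\epsilon_0\,\delta E = \Delta t\left[-\sum_\alpha q_\alpha\,\delta\Gamma_\alpha - G(E)\right]$
- $\delta E$ can be obtained from the Newton state $E$, the Newton residual $G(E)$, and the particle closures $\Pi_{\alpha,p}$ and $n_{\alpha,p}$.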

Preconditioner performance with $\Delta t$
[Figure: number of function evaluations (GMRES and Newton), average particle pushing time (s), and total CPU time (s) versus $\omega_{pi}\Delta t$, with and without preconditioning.]
- Number of FE remains constant with $\Delta t$ (preconditioning).
- Overall CPU time of the algorithm is independent of $\Delta t$ (as predicted!).

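Preconditioner performance: CPU scaling
    $\frac{CPU_{ex}}{CPU_{imp}} \sim \frac{1}{(k\lambda_D)^d}\frac{1}{N_{FE}}\min\left[\frac{1}{k\lambda_D},\ \sqrt{\frac{m_i}{m_e}}\right]$
[Figure: $CPU_{ex}/CPU_{im}$ versus $k\lambda_D$, with and without preconditioning, with fitted scalings $(k\lambda_D)^{-1.0}$ and $(k\lambda_D)^{-1.86}$.]
- Transition occurs at $k\lambda_D \sim \sqrt{m_e/m_i} \approx 0.025$, as predicted.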

Electromagnetic PIC: non-radiative Darwin formulation
Chen et al., CPC 2014, 2015

Light wave excitation and radiative noise in Vlasov-Maxwell
- If one keeps the light wave with exact energy conservation, one gets enhanced numerical noise due to numerical Cherenkov radiation (or possibly instability).
- Noise control requires numerical dissipation.
  Figure 1: Fourier spectrum for Weibel instability. [Markidis and Lapenta, JCP 2011]
- Numerical dissipation breaks energy conservation.
- Solution: analytically remove the light wave when relativistic effects are not important.

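Darwin model formulation (potential form)
- The Darwin model is a formal $O(v/c)^2$ approximation to Maxwell's equations [4].
- It analytically removes the light wave while preserving charge separation effects.
- Begin with Maxwell's equations:
    $\partial_t\mathbf{B} + \nabla\times\mathbf{E} = 0$, (1)
    $\frac{1}{c^2}\partial_t\mathbf{E} + \mu_0\mathbf{j} = \nabla\times\mathbf{B}$, (2)
    $\nabla\cdot\mathbf{E} = \rho/\epsilon_0$, (3)
    $\nabla\cdot\mathbf{B} = 0$. (4)
- Consider potentials $\phi$, $\mathbf{A}$ such that:
    (4) ⇒ $\mathbf{B} = \nabla\times\mathbf{A}$, (1) ⇒ $\mathbf{E} = -\nabla\phi - \partial_t\mathbf{A}$.
- In the Coulomb gauge ($\nabla\cdot\mathbf{A} = 0$), taking $c\to\infty$ in the transverse displacement current:
    (2) ⇒ $\frac{1}{c^2}\partial_t^2\mathbf{A} - \nabla^2\mathbf{A} = \mu_0\left[\mathbf{j} - \epsilon_0\nabla\partial_t\phi\right]$, where the first term is the one dropped,
    $\nabla\cdot$(2) ⇒ $\epsilon_0\nabla^2\partial_t\phi = \nabla\cdot\mathbf{j}$.
[4] Krause and Morrison (2007)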

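Darwin model formulation (cont.)
- Full Darwin system:
    $-\frac{1}{\mu_0}\nabla^2\mathbf{A} = \mathbf{j} - \epsilon_0\nabla\partial_t\phi$, (5)
    $\epsilon_0\,\partial_t\nabla^2\phi = \nabla\cdot\mathbf{j}$, (6)
    $\nabla\cdot\mathbf{A} = 0$, (7)
    $\nabla^2\phi = -\rho/\epsilon_0$. (8)
- Enforcing the involutions (Eqs. 7, 8) is critical for accuracy.
- Careful discretization allows one to imply the involutions, rather than solving for them:
  - $\nabla\cdot\mathbf{A} = 0$ is implied by Eqs. 5, 6 and careful boundary conditions: $\nabla^2(\nabla\cdot\mathbf{A}) = 0$
  - $\nabla^2\phi = -\rho/\epsilon_0$ is implied by Eq. 6 and exact PIC charge conservation: $\partial_t\rho + \nabla\cdot\mathbf{j} = 0$
- Minimal Darwin formulation ($\mathbf{j}$ obtained from particles):
    $\nabla^2\chi = \nabla\cdot\mathbf{j}$, $\quad -\nabla^2\mathbf{A} = \mu_0\left[\mathbf{j} - \nabla\chi\right]$, $\quad \epsilon_0\,\partial_t\phi = \chi$.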

Numerical integration of Vlasov-Darwin in multi-D
- Work directly with the potential formulation, avoiding explicit involutions.
- Spatial discretization on a Yee mesh:
  - Automatic enforcement of the Coulomb gauge ($\nabla\cdot\mathbf{A} = 0$) and Gauss law ($\nabla^2\phi = -\rho/\epsilon_0$)
  - NO divergence cleaning needed!
- Fully implicit, fully nonlinear time stepping (Crank-Nicolson).
- Particles are nonlinearly enslaved, subcycled, time-adapted (implicit Boris push).
- Exact local charge conservation (implies Gauss law).
- Exact global energy conservation.
- Exact conservation of canonical momenta in ignorable directions.
[Figure: Yee-mesh staggering of $\Phi$, $\rho$, and the components of $\mathbf{E}$, $\mathbf{A}$, $\mathbf{j}$, and $\mathbf{B}$.]

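CPU speedup potential of EM implicit PIC vs. explicit PIC
- Back-of-the-envelope estimate of CPU gain:
    $\frac{CPU_{ex}}{CPU_{imp}} \sim \left(\frac{\Delta x_{imp}}{\Delta x_{ex}}\right)^d \frac{\Delta\tau_{imp}}{\Delta t_{ex}}\frac{1}{N_{FE}}$
    $\Delta\tau_{imp} \sim \min\left[0.1\frac{\Delta x_{imp}}{v_{th,e}},\ 0.1\,\omega_{ce}^{-1},\ \Delta t_{imp}\right]$ ; $\Delta t_{imp} \sim 0.1\,\omega_{pi}^{-1}$ ; $\Delta t_{exp} \sim \frac{\Delta x_{exp}}{c}$ ; $k\Delta x_{imp} \sim 0.2$ ; $\Delta x_{ex} \sim \lambda_D$
- CPU speedup is:
    $\frac{CPU_{ex}}{CPU_{imp}} \sim \frac{1}{(k\lambda_D)^d}\frac{c}{v_{th,e}}\frac{1}{N_{FE}}\min\left[\frac{1}{k\lambda_D},\ \frac{c}{v_A}\sqrt{\frac{m_e}{m_i}},\ \sqrt{\frac{m_i}{m_e}}\right]$
  - Impacted by the electron-ion mass ratio and by how close electrons are to relativistic speeds.
- Again, the key is to minimize $N_{FE}$.
  - We have developed a very effective moment-based preconditioner.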

EM preconditioner summary [5]
- We have developed a fluid preconditioner that takes into account both ion and electron linear responses.
- Proper asymptotic behavior:
  - Large domain sizes ($L \gg d_i$): recover Hall MHD and MHD current responses
  - Small electron-to-ion mass ratios, $m_e/m_i \ll 1$
- Effective for $\omega_{pe} > \omega_{ce}$, i.e., weakly to moderately magnetized plasmas:
  - $\frac{m_e}{m_i} > \left(\frac{v_A}{c}\right)^2$, i.e., it limits the achievable mass ratio for a fixed magnetic field
  - Could be overcome with a proper model for the gyroviscous linear response
- If strongly magnetized regimes are of interest, then $\Delta t_{imp} \sim 0.1\,\omega_{ce}^{-1}$, and:
    $\frac{CPU_{ex}}{CPU_{imp}} \sim \frac{1}{(k\lambda_D)^d}\frac{c}{v_{th,e}}\frac{1}{N_{FE}}$
  Still strong potential for algorithmic acceleration.
[5] Chen and LC, CPC (submitted)

1D Electron Weibel instability
- Isotropic ions, bi-Maxwellian electrons.
- $m_i/m_e = 1836$, electron temperature anisotropy ratio 16, $N_{e,i} = 128{,}000$, $L = 2\pi c/\omega_{pe}$, $N_g = 32$.
[Figure: charge conservation, energy conservation, momentum conservation (x), and canonical momentum conservation (y) and (z) versus $\omega_{pe}t$; $CPU_{ex}/CPU_{im}$ versus $k\lambda_D$ with fitted scaling $(k\lambda_D)^{-2.33}$.]

Impact of multi-rate integrator on temporal rate of convergence
- Numerical demonstration of second-order accuracy in time.
[Figure: absolute error versus $\omega_{pe}\Delta t$, sampled at several times $t$ together with their average.]
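
A convergence study of this kind boils down to measuring how the error scales when the time step is halved. The sketch below assumes a stand-in problem, Crank-Nicolson applied to a harmonic oscillator, rather than the PIC system of the talk; it estimates the observed order as $\log_2$ of the ratio of successive errors.

```python
import numpy as np

def cn_oscillator(dt, t_end, omega=1.0):
    """Integrate x' = v, v' = -omega^2 x with Crank-Nicolson from x = 1, v = 0; return x(t_end)."""
    n_steps = int(round(t_end / dt))
    a2 = (0.5 * dt * omega) ** 2
    x, v = 1.0, 0.0
    for _ in range(n_steps):
        # Closed-form solution of the 2x2 implicit update
        x_new = ((1.0 - a2) * x + dt * v) / (1.0 + a2)
        v_new = ((1.0 - a2) * v - dt * omega**2 * x) / (1.0 + a2)
        x, v = x_new, v_new
    return x

t_end = 2.0
steps = [0.2, 0.1, 0.05]
errors = [abs(cn_oscillator(dt, t_end) - np.cos(t_end)) for dt in steps]
orders = [np.log2(errors[i] / errors[i + 1]) for i in range(len(errors) - 1)]
print(errors, orders)
```

An observed order close to 2 for each halving is the signature of second-order accuracy in time, which is what the slide reports for the multi-rate integrator.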

2D Weibel instability
- $m_i/m_e = 1836$, electron temperature anisotropy ratio 9, $N_{pc} = 2$, $L = \pi d_e \times \pi d_e$, $N_g = 32\times 32$.
[Figure: magnetic energy density versus $\omega_{pe}t$ for implicit and explicit runs and linear theory ($\gamma_{max} = .12$); charge conservation, energy conservation, momentum conservation (x), canonical momentum conservation (z), and div(A) versus $\omega_{pe}t$ for implicit and explicit runs; $CPU_{ex}/CPU_{im}$ versus $k\lambda_D$ with reference scalings $(k\lambda_D)^{-2}$ and $(k\lambda_D)^{-3}$.]

2D Weibel instability in curvilinear mesh
- $L_x \times L_y = 1\,d_e \times 1\,d_e$, $N_g = 16\times 16$, $\Delta t = 0.1\,\omega_{pi}^{-1}$.
[Figure: magnetic energy density versus $\omega_{pe}t$ for a uniform grid and two sin-map grids (sin map, .5; sin map, .15), with a plot of the mapped mesh.]

2D Weibel instability in curvilinear mesh (cont.)
[Figure: charge conservation, energy conservation, momentum conservation (x), canonical momentum conservation (z), and div(A) versus $\omega_{pe}t$, for the uniform grid and the curvilinear grid.]
- Sin map $\epsilon = .15$, tol = 6e-4.
- Hybrid pusher (Wang et al., JPP (1999)): position in logical space; velocity in physical space.
- Grid metrics are cell based (i.e., NGP). Need more sophisticated interpolation.

2D Electron Weibel instability: preconditioner performance
- $L_x \times L_y = 22\times 22\ (d_e^2)$, $N_{pc} = 2$, $\Delta t = 0.1\,\omega_{pi}^{-1}$.
$N_x \times N_y = 128\times 128$:
             no preconditioner     with preconditioner
  m_i/m_e    Newton    GMRES       Newton    GMRES
  25         5.8       192.5       3
  1          5.7       188.8       3
  1836       7.7       237.8       4         2.8
$m_i/m_e = 1836$:
             no preconditioner     with preconditioner
  N_x x N_y  Newton    GMRES       Newton    GMRES
  16 x 16    3.7       2           3.9
  32 x 32    4         38.5        3.9
  64 x 64    4.3       79.9        3.2

2D KAW: impact of magnetization
- $B_{x,0} = .667$, $v_{eT}/c = .745$ ($\beta_e = .1$), $\Delta t = 0.1\,\omega_{pi}^{-1}$.
- $L_x \times L_y = 1\times 1\ (d_i^2)$, $N_{pc} = 5$, $N_x \times N_y = 64\times 64$, $m_i/m_e = 25$.
[Figure: magnetic energy density versus $\omega_{ci}t$; implicit simulation and linear theory ($\gamma_{max} = .225$).]
- $L_x \times L_y = 22\times 22\ (d_i^2)$, $N_{pc} = 2$, $N_x \times N_y = 32\times 32$.
Fixed magnetic field:
             no preconditioner     with preconditioner
  m_i/m_e    Newton    GMRES       Newton    GMRES
  25         4         171.9       3.2       1
  15         4.5       764         4         2.9
  6          7.4       454.8       4         11.9
$\omega_{pe}/\omega_{ce} = 3$:
             no preconditioner     with preconditioner
  m_i/m_e    Newton    GMRES       Newton    GMRES
  15         4.5       738         4         3
  6          5.8       1887        4         3.9
  1836       NC        NC          4         5.9

Summary and conclusions
- We have demonstrated a fully implicit, fully nonlinear, multidimensional PIC formulation that features:
  - Exact local charge conservation (via a novel particle mover strategy).
  - Exact global energy conservation (no particle self-heating or self-cooling).
  - Adaptive particle orbit integrator to control errors in momentum conservation.
  - Exact conservation of canonical momenta (EM-PIC only, reduced dimensionality).
- The approach is free of numerical instabilities, even for $\omega_{pe}\Delta t \gg 1$ and $\Delta x \gg \lambda_D$.
  - Requires many fewer degrees of freedom (vs. explicit PIC) for comparable accuracy in challenging problems.
  - Significant CPU gains (vs. explicit PIC) have been demonstrated.
- The method has much potential for efficiency gains vs. explicit PIC in long-time-scale applications:
    $\frac{CPU_{ex}}{CPU_{imp}} \sim \underbrace{\frac{1}{(k\lambda_D)^d}\frac{c}{v_{th,e}}}_{\text{Physics}}\ \overbrace{\frac{1}{N_{FE}}}^{\text{Precond.}}$
- Moment-based acceleration is effective in minimizing $N_{FE}$, leading to an optimal algorithm.