US-Japan JIFT Workshop 2011: The Next Stage in the Progress of Simulation Science in Plasma Physics
Dec. 2-3, 2011, at NIFS, Toki, Japan
Next stage in the extreme-scale first-principles simulation using XGC
C. S. Chang, Princeton Plasma Physics Laboratory
Work supported by US DOE OFES and OASCR, and the FPTRC of KAIST. Computing time has been provided by OLCF and NERSC.
Outline
- Introduction to extreme-scale first-principles computing using the XGC code in the SciDAC CPES¹ and EPSI²
- Full-f whole-volume simulation of turbulence-neoclassical physics
- Present capability on extreme-scale HPCs
- Unlike in core, $E_r$ in edge plasma can be determined
- Full-f kinetic electrons in XGC1 (in addition to XGC0)
- Numerous multiscale physics studies are being performed in XGC1
- Kinetic-kinetic projective multiscale simulation to experimental edge time with error resetting
- Integrated simulation in the large-scale HPC era
¹ CPES (Center for Plasma Edge Simulation) of SciDAC2 is proposed to change into ² EPSI (Partnership for Edge Physics Simulation) of SciDAC3.
SciDAC: Scientific Discovery through Advanced Computing (pushing the edge of the parallel scalability)
Why extreme scale computing?
Can simulate multiscale science from first principles within a reasonable wall-clock time and with predictable equation accuracy.
A physics calculation is from first principles if it starts directly at the level of the most fundamental established laws of physics in that discipline using a justifiable set of approximations, and does not make assumptions such as empirical models and fitting parameters [Wikipedia].
For example:
- Schrödinger's equation for electronic structure
- Vlasov and gyrokinetic equations for plasma
(If a kinetic equation uses an empirical Landau resonance model, it is not first principles.)
XGC is a full-f multi-scale, multi-physics code
Full-f code: $df/dt = C(f) + \mathrm{source} + \mathrm{sink}$; solves for $f$ with sources and sinks
- Good for equilibrium or non-equilibrium thermodynamic problems
- Multi-physics: neoclassical, turbulence, neutrals, impurities, radiation, alpha particles, NBI and others
- Solves for the stiff equilibrium plasma profile and the transport consistently
- No growing weight problem
- Full-f particle code requires large-scale HPCs
- Particle codes have been scaling well to the maximal number of cores
Delta-f codes: $df/dt = C(f)$, source and sink = 0
- Assumes conservation of $f$: equilibrium thermodynamics
- $f = f_0(n, T) + \delta f$, and solves for $\delta f = w f_0$ on fixed $f_0$
- Cannot be used for edge plasma, or under sources and sinks.
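The contrast between the two approaches can be illustrated with a zero-dimensional toy, not with XGC itself. The sketch below assumes a scalar density obeying dn/dt = S - n/tau as a stand-in for the full-f equation, and measures the delta-f style weight w = (n - n0)/n0 against a fixed background n0. The function names, the source and loss model, and all parameter values are hypothetical and chosen only for illustration.

```python
def full_f_density(n0, source, loss_time, dt, n_steps):
    """Evolve dn/dt = source - n/loss_time with forward Euler (toy full-f analogue)."""
    n = n0
    history = [n]
    for _ in range(n_steps):
        n += dt * (source - n / loss_time)
        history.append(n)
    return history


def delta_f_weight(history, n0):
    """Weight w = (n - n0) / n0 measured against the fixed background n0."""
    return [(n - n0) / n0 for n in history]


# Case 1: source exactly balances the loss, so the background stays valid.
balanced = full_f_density(n0=1.0, source=1.0, loss_time=1.0, dt=0.01, n_steps=500)
# Case 2: a stronger source drives the density far from the fixed background.
driven = full_f_density(n0=1.0, source=3.0, loss_time=1.0, dt=0.01, n_steps=500)

w_balanced = delta_f_weight(balanced, 1.0)
w_driven = delta_f_weight(driven, 1.0)

print("max |w|, balanced:", max(abs(w) for w in w_balanced))
print("final w, driven:  ", w_driven[-1])
```

In the driven case the weight grows to order unity, so the perturbation is no longer small relative to the fixed background. This is the toy version of why a fixed-background delta-f formulation is ill-suited to problems with sources and sinks.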
XGC1 scales efficiently to the maximal number of Jaguarpf cores (223,488 cores)
Frequency window: 10 to 30 kHz. Simulation by S. Ku, visualization by K. Ma, Aug. 2011
Full-f simulation can predict stiff $T_i$ profile
Turbulence energy spreads in position space while nonlinearly cascading in k-space. Heat flows outward from core to edge. The final $T_i$ profile becomes globally somewhat supercritical and stiff, except around the magnetic axis, where $\nabla T_i$ is small, and in the pedestal, where the density gradient is large ($\eta_i \ll \eta_{ic}$). Simulation by S. Ku
ExB flow in the edge region from neoclassical X-transport and ITG turbulence physics, showing a negative radial electric field well, as seen in experiments. The $E_r$-well formation is robust, from the separatrix effect (X-transport; Chang, Phys. Plasmas, 2004 and 2009)
Edge potential-well (and pedestal) formation
[Figure: $E_r$ and $\Phi$ (eV) versus $\psi_N$, showing the separatrix effect, with a grounded wall]
Full-f kinetic simulation tries to mimic real experiments
Radial electric field (and rotation) generation is a fundamental property of tokamak plasma: $\Gamma_i = \Gamma_e$
What is happening in the initial transient stage of a tokamak discharge? A theorist's view:
- Plasma generation: electrons and ions are born together.
- Neoclassical charge separation is attempted by ions via much wider radial guiding-center orbits than electrons in $\nabla p \neq 0$.
- Plasma resists large charge separation by generating a proper $E_r$ in the initial transient stage: ion polarization currents (neoclassical and classical), orbit squeezing. $\partial E_r/\partial t \sim \Gamma_i^{\mathrm{pol}}$
- Plasma then quickly establishes the quasi-steady state within a neoclassical orbit time. Analytic theories and transport ordering are valid from here. Automatic ambipolarity:
$$u_i + \frac{cT_i}{eB_p}\frac{e}{T_i}\frac{d\Phi}{dr} = \frac{cT_i}{eB_p}\left[k\,\frac{d\log T_i}{dr} - \frac{d\log p_i}{dr}\right]$$
Unlike in core, $E_r$ in edge is determined independently of $u_i$.
Quasi-steady state: within a neoclassical ion orbit time, $E_{r0}$ saturates to a quasi-steady state. After GAMs decay, we have the automatically ambipolar neoclassical plasma
$$e n_i(\text{gyro-center}, \Phi, \nabla\Phi) + (\rho_i/\lambda_D)^2 \nabla^2\Phi = e n_e(\Phi, \nabla\Phi)$$
for any $\Phi$, $\nabla\Phi$ satisfying
$$u_i + \frac{cT_i}{eB_p}\frac{e}{T_i}\frac{d\Phi}{dr} = \frac{cT_i}{eB_p}\left[k\,\frac{d\log T_i}{dr} - \frac{d\log p_i}{dr}\right]$$
The lowest-order kinetic equation cannot determine $u_i$ and $d\Phi/dr$ independently (1970s) in an axisymmetric core neoclassical system.
The above statement in the context of the gyrokinetic Poisson equation, in the core plasma:
$$(\rho_i/\lambda_D)^2 \nabla^2\Phi + \text{nonlinear terms} = e n_e(\Phi, \nabla\Phi, u_i) - e n_{i,gc}(\Phi, \nabla\Phi, u_i)$$
- LHS: $O(\rho_i/L)^2$, small
- RHS: $O(\rho_i/L_B)$, neoclassical
- $L_a$ is not much less than $L_B$, so we need higher-order terms on the RHS to determine $\Phi$ uniquely.
This ordering and the neoclassical knowledge from the core break down in the edge, where the $\Phi$ term, $O(\rho_i/\Delta r)^2$, is large, and where orbit loss, wall sheath, neutrals, etc. exist. There, LHS $= O(\rho_i/\Delta r)^2$ can easily beat RHS $= O(\rho_i/L_B)$, and other edge effects can be greater than the hidden higher-order effects, so $E_r$ in the edge is determined.
(In the edge, virulent $E_r$ and rotation sources exist from boundary phenomena [Hazeltine and Meiss, Plasma Confinement, 1992])
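The core degeneracy stated on this slide can be made concrete with a short numerical sketch. It assumes the constraint has the standard form with a minus sign between the temperature-gradient and pressure-gradient terms, converts it to SI units with $T_i$ in eV (so that $cT_i/eB_p$ becomes $T_i[\mathrm{eV}]/B_p$), and uses placeholder profile values. The function names and all numbers are hypothetical and are not taken from the slides.

```python
def parallel_flow(dphi_dr, t_i_ev, b_p, k, dlnT_dr, dlnp_dr):
    """Return u_i (m/s) that satisfies the constraint for a given dPhi/dr (V/m).

    SI form, T_i in eV:
      u_i + (dPhi/dr) / B_p = (T_i / B_p) * (k * dlnT/dr - dlnp/dr)
    """
    drive = (t_i_ev / b_p) * (k * dlnT_dr - dlnp_dr)
    return drive - dphi_dr / b_p


def constraint_residual(u_i, dphi_dr, t_i_ev, b_p, k, dlnT_dr, dlnp_dr):
    """Left-hand side minus right-hand side of the same constraint."""
    lhs = u_i + dphi_dr / b_p
    rhs = (t_i_ev / b_p) * (k * dlnT_dr - dlnp_dr)
    return lhs - rhs


# Placeholder core-like parameters (illustrative only).
params = dict(t_i_ev=1000.0, b_p=0.2, k=1.17, dlnT_dr=-2.0, dlnp_dr=-4.0)

# Every choice of dPhi/dr has a matching u_i: a one-parameter family of solutions.
family = []
for dphi_dr in (-2000.0, 0.0, 2000.0):
    u_i = parallel_flow(dphi_dr, **params)
    family.append((dphi_dr, u_i, constraint_residual(u_i, dphi_dr, **params)))

for dphi_dr, u_i, res in family:
    print(f"dPhi/dr = {dphi_dr:8.1f} V/m  ->  u_i = {u_i:10.1f} m/s  (residual {res:.1e})")
```

Because one equation links two unknowns, any $d\Phi/dr$ can be absorbed by shifting $u_i$. An additional physical condition, such as the edge effects listed on the slide, is needed to pick a unique pair.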
Artificial core-edge boundary distorts edge turbulence solution
An ITG grid in the whole core is preferred while the edge grid is refined.
[Figure panels: (1) edge-only simulation with a core-edge boundary; (2) edge ITG solution without an artificial core-edge boundary in a whole-volume simulation. Radial axis from 0.85 to 1.05 with the pedestal top and separatrix marked; vertical axis in V².]
Simulation by S. Ku
Multi-physics in XGC program
- Neoclassical physics
- Turbulence physics
- Neutral particles (DEGAS2 is in XGC as a subroutine)
- Impurity particles
- Radiation physics
- 3D magnetic perturbation
- RF interaction (XGC-RF, J. Kwon)
- Neutral beam injection (incomplete)
[Figure: neutral particle density distribution in realistic DIII-D edge geometry from plasma particles, neutral particles and wall recycling in XGC; simulation by D. Stotler]
Moving forward with XGC1 into electromagnetic edge turbulence: delta-f to full-f
- Fluid-kinetic hybrid electrons imported from GTC
- Split-weight kinetic electrons imported from GEM
[Figure: XGC1 verification of the shear Alfven wave. The line is from an analytic calculation, the "o" data points are from GTC, and the "+" data points are from XGC1.]
[Figure: split-weight-electron simulation of electromagnetic turbulence in XGC1 at low electron beta.]
XGC1 is moving forward into electromagnetic edge turbulence: delta-f to full-f
Verification of kinetic electron dynamics in XGC1 against GT3D, GTC, and FULL in delta-f mode. Fluid-kinetic hybrid electrons are used.
Double hybrid technique
- Particle-continuum: 4-point averaging on continuum grid; nonlinear collisions on continuum grid
- Full-f (numerical $f_{e0}$ on continuum grid, source and sink) + delta-f electrons with $f_{e0}$ information moved to grid
[cf. S. Parker and C. S. Chang, particle-continuum coarse-graining and resetting method, PoP 2008]
Full delta-f will be used, and verified against full-f
Verification of fully nonlinear Fokker-Planck (FP) collision operator
[Figure panels versus $\Psi_N$: neoclassical ion thermal conductivity; neoclassical particle transport]
XGC1 is studying many multiscale physics
- Pedestal structure
- RMP penetration and pedestal response (2011 APS-DPP Invited)
- Core-edge interaction (under preparation for submission to Nature Physics)
- I-mode and H-mode transitions
- Numerical Solomon experiments to identify residual stress (IAEA2010, Oral, NF)
- Cold pulse experiment
- Turbulence around magnetic axis
- ELM crash, in coupling with Elite and M3D-MPP (soon with M3D-C1)
- Bootstrap current (to be submitted to PoP)
- Impurity transport
- And others
Numerical Solomon experiment in XGC1p in ITG turbulence [S. Ku, Diamond, J. Kwon, G. Dif-Pradalier, et al, IAEA2010] Submitted to NF
Time-averaged Reynolds stress. Since parallel flow is near zero globally, this is primarily residual stress (intrinsic torque). $R/L_T$ and Correlated with $V_E$ and $I$
Simulation of cold edge pulse in XGC1p
ITG turbulence in XGC1p: strong cooling is applied at the edge after the plasma reaches quasi-steady state (start of edge cooling at t=0). The cold pulse propagates inward, with inward propagation of intensified turbulence.
[Figure labels: temperature decrease; strong cooling]
Simulation by S. Ku
Local stiffness change due to turbulence intensity
- Local stiffness model
- When the intensity pulse arrives: higher heat conductivity due to the incoming intensity pulse
- Speed of cold pulse ≈ speed of intensity propagation
- Nonlocal turbulence phenomena
Simulation by S. Ku
Bootstrap current is an output from XGC. A new formula has been established for the pedestal. In the weakly collisional regime, the Sauter formula is surprisingly good in the steep edge pedestal all the way to the separatrix. The DIII-D case is shown here. C-Mod shows similar agreement. NSTX shows somewhat worse, but acceptable, agreement. However, in the collisional regime, the Sauter formula deviates from XGC0. In the NSTX case shown here, the XGC0 bootstrap current is about 40% higher than Sauter. [Sehoon Koh, part of PhD thesis].
Kinetic-MHD coupled simulation for pedestal-ELM cycle in automated EFFIS framework
- XGC0 kinetic transport modeling
- Linear stability check (binary Elite)
- B-reconstruction and mesh interpolation by M3D-OMP
- ELM crash in extended M3D-MPP
- Divertor heat load (2010 OFES Milestone)
- mxn memory coupling of MHD turbulence
[Figure labels: heat; pressure profile; neutrals; $\psi_N$; T = 76, saturation; T = 496, relaxation]
Projective multiscale temporal integration within XGC
- Prolong the kinetic simulation to the experimental time scale (~50 ms for edge) while resetting the accumulated numerical and equation error
- Restricting to a coarse-grained axisymmetric kinetic system for fast simulation on the experimental time scale (kinetic neoclassical, using the turbulent fluxes from the fine-grained simulation)
- Lifting to a fine-grained system for microscopic-level fidelity (kinetic turbulence + kinetic neoclassical multiscale physics)
- Avoids the information loss of conventional kinetic-fluid restricting and lifting
- Start from a working technology: explicit time integration (already working in XGC0-GEM and XGC0-XGC1)
- Second-order predictor-corrector algorithm (planned)
- Improve the accuracy and stability in collaboration with applied math members
- The computer science technology EFFIS already exists in CPES
- Memory requirement is minimal
- Use the same cores and memory between the two phases to reduce communication
- Hybrid staging technology
[Figure: alternating fine-grained (XGC1) and coarse-grained (XGC0-type) phases along the time axis]
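The idea of alternating short fine-scale bursts with long extrapolated jumps can be sketched with a generic projective forward Euler scheme on a toy stiff system. This is a textbook-style stand-in, not the XGC1/XGC0 coupling described on the slide: the two-variable system, the function names, and the step counts are all assumptions made for illustration.

```python
import math


def rhs(state, fast_rate=100.0):
    """Toy stiff system: x relaxes quickly toward y, while y decays slowly."""
    x, y = state
    return (-fast_rate * (x - y), -y)


def euler_step(state, h):
    """One explicit (fine-scale) forward Euler step of size h."""
    dx, dy = rhs(state)
    return (state[0] + h * dx, state[1] + h * dy)


def projective_step(state, h, n_inner, n_project):
    """n_inner fine steps to damp the fast mode, then extrapolate over n_project * h."""
    prev = state
    for _ in range(n_inner):
        prev, state = state, euler_step(state, h)
    # Chord slope from the last two fine states drives the long coarse jump.
    return tuple(s + n_project * (s - p) for s, p in zip(state, prev))


def integrate(state, h, n_inner, n_project, n_outer):
    """Repeat the fine-burst plus projection cycle n_outer times."""
    t = 0.0
    for _ in range(n_outer):
        state = projective_step(state, h, n_inner, n_project)
        t += (n_inner + n_project) * h
    return t, state


# Start off the slow manifold (x = 0 while y = 1) to exercise the fast transient.
t_end, (x_end, y_end) = integrate((0.0, 1.0), h=0.005, n_inner=6, n_project=10, n_outer=25)

print("t_end =", t_end)
print("y_end =", y_end, " exact:", math.exp(-t_end))
print("x_end / y_end =", x_end / y_end)
```

Each outer cycle here costs 6 right-hand-side evaluations but advances time by 16 fine steps, and the fine burst at the start of every cycle re-damps the fast mode before the next jump. The scheme is first-order accurate in the slow variable, which is one reason a higher-order predictor-corrector variant is a natural next step.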
Integrated simulations face
- Heterogeneous computation
- Give up bulk synchronous programming
- Give up default use of high precision (storage and bandwidth)
- Flop/s rate (and percentage of peak flop/s) is cheap and becomes less relevant. Use cheap resources to manage expensive resources
- Fault tolerance
- Mean time between hardware interrupts shortens
- Rate of retiring flops in each individual processor becomes much less predictable (automatic load balancing, but difficult)
- Architectural opportunities may lure users into more arithmetically intensive formulations (e.g., avoid PDEs): David Keyes
- Asynchronous computing
- Minimize data movement and I/O
- In-situ, in-memory data analysis and visualization
- ADIOS with hybrid staging in CPES-EPSI, in a service-oriented architecture
Development of Integrated Simulation Technology in CPES/EPSI Toward Extreme Scale Computing
< 2010: Data movement was cheap
- Either intra-platform (under a unified job submission) or inter-platform
- Centralized memory coupling
- Kepler control
- mxn memory coupling
[Figure: Service A and Service B connected through Kepler]
2011: Data movement is becoming expensive. Thus:
- Process the coupling, visualization, and storage data in memory and in situ
- Use plug-ins
- Use hybrid staging
- In-node coupling with shared cache memory
- Minimize the data movement between GPUs and CPUs
[Figure: Plugins A through E distributed between the application partition (plugins executed within the application node) and the Staging/DataSpaces area]
Creation of I/O pipelines to reduce file activity
ADIOS with DataSpaces for in-memory code coupling
- Virtual shared space
- Constructed on the fly on the cloud of staging nodes
- Indexes data for quick access and retrieval
- Provides asynchronous coordination and interaction, and realizes the shared-space abstraction
- In-memory code coupling becomes part of the ADIOS I/O pipeline
- ADIOS (Adaptable I/O System) started out in CPES to make large-scale checkpointing possible: from > 1 hour to < 1 minute for 1 TB
- In-space (online) data transformation and manipulation
- Robust decentralized data analysis in the space