Parallel Simulations of Self-propelled Microorganisms

K. Pickl (a,b), M. Hofmann (c), T. Preclik (a), H. Köstler (a), A.-S. Smith (b,d), U. Rüde (a,b)

ParCo 2013, Munich

(a) Lehrstuhl für Informatik 10 (Systemsimulation), FAU Erlangen-Nürnberg
(b) Cluster of Excellence: Engineering of Advanced Materials, FAU Erlangen-Nürnberg
(c) Fakultät für Mathematik, Lehrstuhl für Numerische Mathematik, TU München
(d) Institut für Theoretische Physik I, FAU Erlangen-Nürnberg
Flow Regimes

[Figure: example flows spanning Reynolds numbers Re from about 10^-4 to 10^9; all images taken from www.wikipedia.com]

ParCo 2013, Munich | kristina.pickl@fau.de | FAU Erlangen-Nürnberg
Flow at Low Reynolds Number: Purcell's Scallop Theorem

[Figure: a reciprocal stroke x(t) between positions x_1 and x_2 retraces itself in Stokes flow]

- domination of viscous forces
- small momentum
- always laminar
- time reversible
- no coasting

=> we need asymmetric, non-time-reversible motion to achieve any net movement

E.M. Purcell. Life at low Reynolds number. American Journal of Physics, 45:3-11 (1977)
Physical Model of a Swimmer

- we choose the simplest possible design: Golestanian's* swimmer
- connections between the objects: spring-damper systems, used in previous studies
- overlapping hydrodynamic interactions
- prevent bending (preserve the straight 180° axis): introduce angular springs

* A. Najafi and R. Golestanian. Simple swimmer at low Reynolds number: Three linked spheres. Phys. Rev. E, 69(6):062901 (2004)
K. Pickl et al. All good things come in threes: three beads learn to swim with lattice Boltzmann and a rigid body solver. JoCS, 3(5):374-387 (2012)
M. Hofmann. Parallelisation of Swimmer Models for Swarms of Bacteria in the Physics Engine pe. Master's thesis, LSS, FAU Erlangen-Nürnberg (2013)
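The spring-damper links between the beads can be sketched as follows. This is a minimal illustration of a generic linear spring-damper force law, not the pe implementation; the function name and all parameter values are hypothetical.

```python
import numpy as np

def spring_damper_force(x1, x2, v1, v2, k, d, rest_length):
    """Force on body 1 from a spring-damper link to body 2.

    x1, x2: positions; v1, v2: velocities (3-vectors).
    k: spring stiffness, d: damping coefficient,
    rest_length: relaxed length of the spring.
    """
    delta = x2 - x1
    dist = np.linalg.norm(delta)
    n = delta / dist                  # unit vector from body 1 to body 2
    stretch = dist - rest_length      # positive if the spring is extended
    rel_vel = np.dot(v2 - v1, n)      # rate of change of the spring length
    return (k * stretch + d * rel_vel) * n
```

The force on the partner body is the exact negative, so such pairwise links never add net momentum to the swimmer.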
Non-time Reversible Cycling Strategy

[Figure: x-components of the applied forces on bodies 1, 2, and 3 over one cycle of 6,000 time steps]

- total applied force vanishes over one cycle (the displacement of the swimmer over one cycle is zero in the absence of fluid)
- applied along the specified main axis of the swimmer on the center of mass of each body (in this case: the x-direction)
- the net driving force acting on the system at each instant of time is zero
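The cycling strategy can be illustrated with a toy force schedule. The four-phase stroke, amplitude, and phase lengths below are assumptions for illustration, not the profile actually used, but they share the two stated properties: zero net force at every instant, and a per-body force that integrates to zero over one cycle.

```python
import numpy as np

A = 0.25          # force amplitude (hypothetical value)
steps = 6000      # one cycle, matching the plotted 6,000 time steps
phase = steps // 4

def applied_forces(t):
    """x-components of the driving forces on bodies 1..3 at time step t.

    Toy 4-phase stroke: contract arm 1-2, contract arm 2-3,
    extend arm 1-2, extend arm 2-3 (a non-time-reversible order).
    Forces act pairwise, so they sum to zero at every instant.
    """
    p = (t % steps) // phase
    if p == 0:
        return np.array([+A, -A, 0.0])    # arm 1-2 contracts
    elif p == 1:
        return np.array([0.0, +A, -A])    # arm 2-3 contracts
    elif p == 2:
        return np.array([-A, +A, 0.0])    # arm 1-2 extends
    else:
        return np.array([0.0, -A, +A])    # arm 2-3 extends
```

Each body feels +A for one quarter-cycle and -A for another, so the cycle-averaged applied force on every body vanishes, as required.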
Software

Fluid Simulation: WALBERLA (widely applicable lattice Boltzmann solver from Erlangen)
- suited for various flow applications
- different fluid models (SRT, TRT, MRT)
- suitable for homo- and heterogeneous architectures
- large-scale, MPI-based parallelization

Rigid Body Simulation: pe
- based on Newton's mechanics
- fully resolved objects (sphere, box, ...)
- connections between objects can be soft or hard constraints
- accurate handling of friction during collision
- large-scale, MPI-based parallelization

I. Ginzburg et al. Two-relaxation-time lattice Boltzmann scheme: About parametrization, .... Comm. in Computational Physics, 3(2):427-478 (2008)
P. A. Cundall and O. D. L. Strack. A discrete numerical model for granular assemblies. Géotechnique, 29:47-65 (1979)
Coupling both Frameworks: Four-Way Coupling

1. Object Mapping
2. LBM Communication
3. Stream Collide
4. Hydrodynamic Forces
5. Lubrication Correction
6. Physics Engine
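The six coupling steps can be sketched as one combined time loop. The function names below mirror the slide, but every body is a stand-in stub; the real steps live in WALBERLA and pe.

```python
# Hypothetical skeleton of one coupled time step; the stubs only record
# the order in which the real framework steps would run.
log = []

def object_mapping():         log.append("object mapping")          # flag lattice cells covered by rigid bodies
def lbm_communication():      log.append("LBM communication")       # exchange ghost-layer distributions via MPI
def stream_collide():         log.append("stream collide")          # advance the lattice Boltzmann fluid
def hydrodynamic_forces():    log.append("hydrodynamic forces")     # accumulate momentum exchange on each body
def lubrication_correction(): log.append("lubrication correction")  # sub-grid forces for nearly touching bodies
def physics_engine():         log.append("physics engine")          # pe: contacts, springs, time integration

def coupled_time_step():
    object_mapping()
    lbm_communication()
    stream_collide()
    hydrodynamic_forces()
    lubrication_correction()
    physics_engine()
    return log
```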
So Far: Sequential Computations -> Get Ready for Parallel Simulations of Many Swimmers

- introduction of angular springs to prevent bending
- handling of pair-wise spring-like interactions, extending not only over neighboring but also over multiple process domains => job of the pe
Parallel Discrete Element Method (DEM)

First MPI communication: send and receive forces and torques

 1: Find and resolve all contacts inside each local domain
 2: // First MPI communication
 3: for all remote objects b_rem do
 4:     sendForcesAndTorquesToOwner(b_rem)
 5: end for
 6: Receive forces and torques on local objects and perform a time integration
Parallel Discrete Element Method (DEM)

Second MPI communication: update remote objects and migrate objects

 7: for all local objects b_loc do
 8:     for all neighboring processes p_loc do
 9:         if b_loc and p_loc intersect and there is no remote object of b_loc on p_loc then
10:             send b_loc to p_loc
11:         end if
12:     end for
13:     for all processes p_s holding remote objects of b_loc do
14:         send update of b_loc to p_s
15:     end for
16:     if b_loc's center of mass has moved to neighboring process p_n then
17:         migrate b_loc to p_n
18:         mark springs attached to b_loc to be moved to p_n
19:     end if
20: end for
21: Receive updates and new objects

M. Hofmann. Parallelisation of Swimmer Models for Swarms of Bacteria in the Physics Engine pe. Master's thesis, LSS, FAU Erlangen-Nürnberg (2013)
Parallel Discrete Element Method (DEM)

New! Third MPI communication: send springs and attached objects

22: for all springs s_send marked to be sent do
23:     for all objects b_att attached to s_send do
24:         send remote object of b_att to the stored process p_n
25:     end for
26:     send spring s_send to the stored process p_n
27: end for
28: Receive remote objects and instantiate a distant process, if necessary
29: Receive springs and attach them
30: Delete remote objects, springs, and distant processes no longer needed

- keeps communication partners updated
- all information regarding spring-like pair-wise interactions is sent
- for long-range interactions: only the associated processes communicate

M. Hofmann. Parallelisation of Swimmer Models for Swarms of Bacteria in the Physics Engine pe. Master's thesis, LSS, FAU Erlangen-Nürnberg (2013)
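A toy, MPI-free model of what the third communication achieves: after redistribution, every process holding a spring also holds (remote copies of) both attached bodies, even when the bodies' owner domains are not neighbors. The data structures below are hypothetical simplifications, not pe internals.

```python
def redistribute_springs(springs, owner):
    """springs: list of (body_a, body_b) pairs; owner: dict body -> process id.

    Returns, per process, the springs it must hold and the remote
    bodies it needs copies of (bodies it does not own itself).
    """
    held_springs = {}
    remote_copies = {}
    for a, b in springs:
        for p in {owner[a], owner[b]}:        # both endpoint owners keep the spring
            held_springs.setdefault(p, []).append((a, b))
            for body in (a, b):
                if owner[body] != p:          # body lives elsewhere: needs a remote copy
                    remote_copies.setdefault(p, set()).add(body)
    return held_springs, remote_copies
```

For a swimmer whose beads straddle a domain boundary, both owner processes end up with the connecting spring and a remote copy of the bead they do not own, so spring forces can be evaluated locally on each side.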
So Far: Sequential Computations -> Now We Are Ready for Parallel Simulations of Many Swimmers

- introduction of angular springs to prevent bending
- handling of pair-wise spring-like interactions, extending not only over neighboring but also over multiple process domains
Weak Scaling Setup

Does the newly introduced communication influence the scaling behavior?

- 100^3 lattice cells and 2x8x8 swimmers per core
- r_sphere = 4 lattice cells, d_c.o.m. = 16 lattice cells
- 4,000 time steps
- smallest scenario: 4x4x4 cores (8,192 swimmers)
- successively doubling the cores in the Cartesian directions
- entire domain: periodic in all directions
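The stated problem sizes are mutually consistent, as a quick check shows (core counts per node are taken from the system table on the next slide):

```python
swimmers_per_core = 2 * 8 * 8               # 128 swimmers per core
smallest = 4 * 4 * 4 * swimmers_per_core
print(smallest)                             # 8192 swimmers on 4x4x4 cores

# Largest JUQUEEN run: 8,192 nodes x 16 cores per node
largest = 8192 * 16 * swimmers_per_core
print(largest)                              # 16777216 swimmers
```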
System Configurations of the Used Supercomputers

                              SUPERMUC                       JUQUEEN
# Cores                       147,456                        458,752
# Nodes                       9,216                          28,672
# Processors per Node         2                              1
# Cores per Processor         8                              16
Peak Performance [PFlop/s]    3.185                          5.9
Memory per Core [GByte]       2                              1
Processor Type                Intel Xeon E5-2680 8C          IBM PowerPC A2
                              (Sandy Bridge-EP)
Clock Speed                   2.7 GHz                        1.6 GHz
Interconnect                  Infiniband FDR10               IBM-specific
Interconnect Type             Intra-Island Topology:         5D Torus
                              non-blocking Tree;
                              Inter-Island Topology:
                              Pruned Tree 4:1
Weak Scaling Results on SUPERMUC

[Figure: time to solution and parallel efficiency for 4 to 512 nodes, broken down into Physics Engine, LBM Communication, Stream Collide, Hydrodynamic Forces, and Object Mapping]

- using Intel C++ compiler version 12.1, the IBM MPI implementation, and a clock speed of 2.7 GHz
- not displayed: Setup, Swimmer Setup, and Lubrication Correction
- individual fractions measured using the average over all cores
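The efficiency curve uses the usual weak-scaling metric: since the work per core is held constant, ideal scaling keeps the run time fixed, so efficiency is the base run time divided by the measured run time. A sketch (the numbers below are made up for illustration, not read off the plot):

```python
def weak_scaling_efficiency(t_base, t_n):
    """Weak-scaling parallel efficiency in [0, 1]: base run time over
    measured run time at the larger process count."""
    return t_base / t_n

# Hypothetical example: 3,000 s on the smallest setup, 3,300 s scaled up
print(round(weak_scaling_efficiency(3000.0, 3300.0), 2))   # 0.91
```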
Weak Scaling Setup on JUQUEEN

- cores are able to perform four-way multithreading
- analyze our smallest setup: 4x4x4 cores (= 4 nodes)

# Threads    MLUP/s
1            23.66
2            40.03
4            48.80
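MLUP/s is assumed here to mean million lattice-cell updates per second, the standard lattice Boltzmann throughput metric. A sketch with made-up timings, plus the multithreading speedups implied by the table:

```python
def mlups(cells, time_steps, seconds):
    """Million lattice-cell updates per second."""
    return cells * time_steps / seconds / 1e6

# Hypothetical run: 100^3 cells advanced 4,000 time steps in 200 s
print(mlups(100**3, 4000, 200.0))      # 20.0

# Speedups from 2- and 4-way multithreading (values from the table):
print(round(40.03 / 23.66, 2))         # 1.69
print(round(48.80 / 23.66, 2))         # 2.06
```

Four-way multithreading thus roughly doubles the throughput of this memory-bandwidth-heavy workload rather than quadrupling it.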
Weak Scaling Results on JUQUEEN

[Figure: time to solution and parallel efficiency for 4 to 8,192 nodes, broken down into Physics Engine, LBM Communication, Stream Collide, Hydrodynamic Forces, and Object Mapping]

- largest simulated setup on 8,192 nodes: 16,777,216 swimmers
- using GNU C++ compiler version 4.4.6
- not displayed: Setup, Swimmer Setup, and Lubrication Correction
- individual fractions measured using the average over all cores
Conclusions and Future Work

Conclusions
- successful integration of the handling of pair-wise interactions extending over process domains
- weak scaling on two supercomputers currently ranked in the top ten of the TOP500 list
- demonstrated scalability on up to 262,144 processes

Future Work
- analyze the collective behavior of swimmers systematically; reaching a steady state requires longer simulation runs
- improvement of parallel I/O and the associated data analysis
- strong scaling characteristics
Thank you for your attention!

Extract from the References

K. Pickl et al. All good things come in threes: three beads learn to swim with lattice Boltzmann and a rigid body solver. Journal of Computational Science, 3(5):374-387, 2012.
M. Hofmann. Parallelisation of Swimmer Models for Swarms of Bacteria in the Physics Engine pe. Master's thesis, Lehrstuhl für Informatik 10 (Systemsimulation), FAU Erlangen-Nürnberg, 2013.
C. Feichtinger et al. WaLBerla: HPC software design for computational engineering simulations. Journal of Computational Science, 2(2):105-112, 2011.
A. Najafi and R. Golestanian. Simple swimmer at low Reynolds number: Three linked spheres. Phys. Rev. E, 69(6):062901, 2004.
C. M. Pooley et al. Hydrodynamic interaction between two swimmers at low Reynolds number. Phys. Rev. Lett., 99:228103, 2007.

Acknowledgments