Applications of Lattice Boltzmann Methods Dominik Bartuschat, Martin Bauer, Simon Bogner, Christian Godenschwager, Florian Schornbaum, Ulrich Rüde Erlangen, Germany March 1, 2016 NUMET 2016
D.Bartuschat, M. Bauer, S. Bogner, C. Godenschwager, F. Schornbaum, U.Rüde Chair for System Simulation, FAU Erlangen-Nürnberg
Outline
- The waLBerla Simulation Framework
- The Lattice Boltzmann Method and Complex Flows
- Fluid-Particle Interaction
- Charged Particles in Fluid Flow
- Free Surface Flow
The waLBerla Simulation Framework
waLBerla
- Widely applicable lattice Boltzmann framework.
- Suited for various flow applications.
- Large-scale, MPI-based parallelization.
- Dynamic application switches for heterogeneous architectures and optimization.
waLBerla Concepts
- Block concept:
  - Domain partitioned into a Cartesian grid of blocks.
  - Blocks can be assigned to different processes.
  - Blocks contain cell data (e.g. fluid density, electric potential) and global information (e.g. MPI rank, location).
- Communication concept:
  - Simple communication mechanism on uniform grids, utilizing MPI.
  - Ghost layers to exchange cell data with neighboring blocks.
- Sweep concept:
  - Sweeps are work steps of a time loop, performed on block-parallel level.
  - Example: MG sweep, contains sub-sweeps (restriction, prolongation, smoothing).
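The block and ghost-layer idea can be illustrated with a minimal NumPy sketch. This is not the waLBerla API: the helper names `make_block` and `exchange_x` are hypothetical, both blocks live in one process, and a plain array copy stands in for the MPI message.

```python
import numpy as np

def make_block(interior_shape, ghost=1, fill=0.0):
    """Allocate one block: interior cells plus a ghost layer on every side."""
    shape = tuple(n + 2 * ghost for n in interior_shape)
    return np.full(shape, fill, dtype=float)

def exchange_x(left, right, ghost=1):
    """Fill the facing ghost layers of two blocks that are neighbours in x.

    In a distributed run each copy would be an MPI send/receive pair;
    here both blocks are local, so a slice assignment is enough.
    """
    g = ghost
    # low-x ghost layer of the right block <- last interior cells of the left block
    right[:g, ...] = left[-2 * g:-g, ...]
    # high-x ghost layer of the left block <- first interior cells of the right block
    left[-g:, ...] = right[g:2 * g, ...]

left = make_block((4, 4), fill=1.0)   # e.g. fluid density on block A
right = make_block((4, 4), fill=2.0)  # e.g. fluid density on block B
exchange_x(left, right)
```

After the exchange, a stencil sweep over the interior of either block can read neighbour values without knowing where the neighbouring block is stored.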
The Lattice Boltzmann Method and flow in complex geometries
Lattice Boltzmann Method
- Discrete lattice Boltzmann equation with collision operator $dt\,C_q$:
  $$f_q(x_i + c_q\,dt,\; t_n + dt) - f_q(x_i, t_n) = dt\,C_q\big(f_q(x_i, t_n)\big)$$
- Note: two-relaxation-time (TRT) model used for most simulations.
- Domain discretized in cubes (cells).
- Discrete velocities $c_q$ and associated distribution functions $f_q$ per cell.
- D3Q19 model
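Stream-Collide
The equation is solved in two steps, where $\tilde f_q$ denotes the post-collision distribution and $e_q = c_q\,dt$ the lattice displacement:
- Stream step: $f_q(x_i + e_q,\; t_n + dt) = \tilde f_q(x_i, t_n)$
- Collide step (SRT): $\tilde f_q(x_i, t_n) = f_q(x_i, t_n) - \frac{1}{\tau}\left[f_q(x_i, t_n) - f_q^{eq}(x_i, t_n)\right]$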
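The collide and stream steps can be sketched in a few lines of NumPy. Several things here are assumptions made for brevity and are not taken from the slides: a 2D D2Q9 lattice instead of D3Q19, the common second-order equilibrium, periodic boundaries via `np.roll`, and lattice units with dx = dt = 1.

```python
import numpy as np

# D2Q9 velocity set and weights (assumption: 2D for brevity, the slides use D3Q19)
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)

def equilibrium(rho, u):
    """Second-order equilibrium in lattice units (c_s^2 = 1/3)."""
    cu = np.einsum('qd,xyd->xyq', c, u)               # c_q . u per cell
    uu = np.einsum('xyd,xyd->xy', u, u)[..., None]    # |u|^2 per cell
    return w * rho[..., None] * (1 + 3 * cu + 4.5 * cu**2 - 1.5 * uu)

def collide_srt(f, tau):
    """SRT collision: relax every PDF towards its local equilibrium."""
    rho = f.sum(axis=-1)
    u = np.einsum('xyq,qd->xyd', f, c) / rho[..., None]
    return f - (f - equilibrium(rho, u)) / tau

def stream(f_post):
    """Move each post-collision PDF one cell along its velocity (periodic)."""
    out = np.empty_like(f_post)
    for q, (cx, cy) in enumerate(c):
        out[..., q] = np.roll(f_post[..., q], shift=(int(cx), int(cy)), axis=(0, 1))
    return out

rho0 = np.ones((8, 8))
u0 = np.zeros((8, 8, 2))
f = equilibrium(rho0, u0)
f = stream(collide_srt(f, tau=0.8))   # one time step
```

A fluid at rest is a fixed point of this update, and the total mass `f.sum()` is conserved by both steps, which makes for a quick sanity check.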
Flow in Complex Geometries
Flow through coronary arteries:
- Sparse, but coherent geometry
- Volume fraction 0.3%
- Multiple blocks per process
- Load balancing required
Complex Geometry Initialization
Domain Partitioning
Domain partitioning of the coronary tree dataset, aiming for one block per process:
- JUQUEEN nodeboard: 512 processes, 485 blocks
- JUQUEEN, full machine: 458 752 processes, 458 184 blocks
- Excellent scaling on JUQUEEN with TRT on up to 1 trillion lattice cells*
* C. Godenschwager et al. A Framework for Hybrid Parallel Flow Simulations with a Trillion Cells in Complex Geometries (2013), doi:10.1145/2503210.2503273
Blood Flow and Perfusion
- Flow simulation in arteries + myocardium as porous medium, modeled by means of an LBM forcing term (SRT + Guo forcing*)
- Pressure boundary conditions at aorta and endocardium
- Blood flow visualization by particle tracing
* Z. Guo, T. Zhao. Lattice Boltzmann model for incompressible flows through porous media (2002), doi:10.1103/physreve.66.036304
Perfusion Results
Fluid-Particle Interaction with LBM and tumbling spherocylinders in Stokes flow
Fluid-Particle Interaction
4-way coupling:
- Particles mapped onto lattice Boltzmann grid
- Each lattice node with cell center inside an object is treated as moving boundary
- Hydrodynamic forces of fluid on particle computed by momentum exchange method*
* N. Nguyen, A. Ladd. Lubrication corrections for lattice-Boltzmann simulations of particle suspensions (2002), doi:10.1103/physreve.66.046708
Moving Obstacle Treatment
Influence of particle on fluid:
- Particles act on fluid by moving wall boundary condition:
  $$f_{\bar q}(x_f,\; t_n + dt) = \tilde f_q(x_f, t_n) - 2\,\frac{\omega_q}{c_s^2}\,\rho_0\; c_q \cdot u_s$$
- Reconstruction of PDFs ($f^{eq}$) for new fluid cells dependent on $u_w$
Influence of fluid on particle:
- Hydrodynamic force, summed over all particle surface cells $s$:
  $$F_h = \sum_s \sum_{q \in D_s} \left[ 2\,\tilde f_q(x_f, t_n) - 2\,\frac{\omega_q}{c_s^2}\,\rho_0\; c_q \cdot u_s \right] c_q\, \frac{dx^3}{dt}$$
Symbols:
- $\tilde f_q$: post-collision PDF; $\bar q$ indicates the opposing direction
- $c_q$: discrete lattice velocity
- $\omega_q$: lattice weight (see LBM model)
- $\rho_0$: fluid reference density
- $D_s$: set of directions from which cell $s$ is accessed from fluid cells
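For a single fluid-to-solid link, the moving-wall rule and the momentum-exchange contribution can be written as two small functions. This is a sketch in lattice units with $c_s^2 = 1/3$; the function names and default values are hypothetical, and the loop over surface cells and directions $D_s$ is left out.

```python
import numpy as np

CS2 = 1.0 / 3.0  # lattice speed of sound squared (lattice units)

def moving_wall_bounce_back(f_post, w_q, c_q, u_s, rho0=1.0):
    """PDF sent back in the opposing direction by a wall moving with u_s."""
    return f_post - 2.0 * w_q / CS2 * rho0 * np.dot(c_q, u_s)

def link_force(f_post, w_q, c_q, u_s, rho0=1.0, dx=1.0, dt=1.0):
    """Momentum-exchange force contribution of one link pointing into the solid."""
    c_q = np.asarray(c_q, dtype=float)
    magnitude = 2.0 * f_post - 2.0 * w_q / CS2 * rho0 * np.dot(c_q, u_s)
    return magnitude * c_q * dx**3 / dt

# One D3Q19 axis link (weight 1/18) hitting a wall that moves along +x
c_q = np.array([1.0, 0.0, 0.0])
u_s = np.array([0.01, 0.0, 0.0])
f_back = moving_wall_bounce_back(0.05, 1 / 18, c_q, u_s)
force = link_force(0.05, 1 / 18, c_q, u_s)
```

The link force equals the incoming plus the reflected PDF times $c_q$, so for a wall at rest it reduces to $2 \tilde f_q c_q$.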
Tumbling Spherocylinders in Stokes Flow
- Tumbling motion of elongated particles in Stokes flow
- Four spherocylinders in periodic domain, aspect ratio $1/\varepsilon = \text{length}/\text{radius} = 12$
- LBM simulations with TRT operator and comparison against slender body formulation (examining influence of inertia, wall effects, and particle shape)*
* D. Bartuschat et al. Two Computational Models for Simulating the Tumbling Motion of Elongated Particles in Fluids (2016), doi:10.1016/j.compfluid.2015.12.010
Two Tumbling Spherocylinders
- Motion dependent on aspect ratio $1/\varepsilon$
- [Plots: $x$ [m], $u_x$ [m/s] and $u_z$ [m/s] over $t$ [s] (0 to 25 s) for $\varepsilon = 1/10$, $1/12$, $1/14$]
- [Figure: flow field around two spherocylinders, $1/\varepsilon = 12$]
- Convergence to preferred distance $x_{max}$ due to inertia
- Max. velocity $u_z$ for horizontal particle orientation, min. $u_z$ for vertical orientation
Simulation parameters:
- Domain size: $[576\,dx]^3$
- Fluid: water (T = 20 °C)
- Time steps: 600 000
- dx = 4.98 µm, dt = $4.55 \times 10^{-5}$ s, τ = 6
- Particle density: 1492 kg/m$^3$, radius = 4 dx
- Runtime on LiMa: 16 h, 768 cores
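The lattice parameters on this slide can be cross-checked against the fluid with the standard relation $\nu = c_s^2\,(\tau - 1/2)\,dx^2/dt$. The sketch below assumes that the quoted τ is the relaxation time that sets the viscosity and that $c_s^2 = 1/3$; the function name is hypothetical.

```python
def kinematic_viscosity(tau, dx, dt, cs2=1.0 / 3.0):
    """Physical kinematic viscosity implied by tau, cell size dx and time step dt."""
    return cs2 * (tau - 0.5) * dx**2 / dt

# Values from the slide: dx = 4.98 um, dt = 4.55e-5 s, tau = 6
nu = kinematic_viscosity(tau=6.0, dx=4.98e-6, dt=4.55e-5)
print(f"nu = {nu:.3e} m^2/s")
```

The result is close to $1.0 \times 10^{-6}$ m$^2$/s, which matches the kinematic viscosity of water at 20 °C.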
Charged Particles in Fluid Flow for particle-laden electrokinetic flows
Motivation
Simulating separation (or agglomeration) of charged particles in micro-fluid flow, influenced by external electric fields
- Medical applications:
  - Optimization of Lab-on-a-Chip systems: separation of different cells, trapping cells and viruses
  - Deposition of charged aerosol particles in respiratory tract (e.g. drug delivery)
- Industrial applications (agglomeration):
  - Filtering particulates from exhaust gases
  - Charged particle deposition in cooling systems of fuel cells
Kang and Li. Electrokinetic motion of particles and cells in microchannels. Microfluidics and Nanofluidics
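Multi-Physics Simulation
Coupled components and the quantities they exchange:
- Electro(quasi)statics to rigid body dynamics: electrostatic force
- Rigid body dynamics to electro(quasi)statics: charge density
- Rigid body dynamics to fluid dynamics: object movement
- Fluid dynamics to rigid body dynamics: hydrodynamic force
- Fluid dynamics to electro(quasi)statics: ion convection
- Electro(quasi)statics to fluid dynamics: force on ions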
Poisson Equation and Force on Particles
- Electric potential described by Poisson equation, with the particle's charge density on the RHS:
  $$-\Delta \Phi(x) = \frac{\rho_{particles}(x)}{\varepsilon_r\,\varepsilon_0}$$
- Discretized by finite volumes
- Solved with cell-centered multigrid solver implemented in waLBerla
- Subsampling for computing overlap degree to set RHS accordingly
- Electrostatic force on particle:
  $$F_q = -q_{particle}\,\nabla \Phi(x)$$
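One plausible reading of the subsampling step is sketched below: each cell is split into `factor`$^3$ sub-cells, and the fraction of sub-cell centers inside the spherical particle is taken as the overlap degree that scales the cell's charge density. The function name and signature are hypothetical and are not the waLBerla implementation.

```python
import itertools

def overlap_fraction(cell_min, dx, center, radius, factor=3):
    """Fraction of sub-cell centers of one cubic cell that lie inside a sphere.

    cell_min: lower corner of the cell, dx: cell edge length,
    center/radius: sphere geometry, factor: sub-cells per direction.
    """
    inside = 0
    for idx in itertools.product(range(factor), repeat=3):
        # center of the sub-cell with integer index idx
        p = [cell_min[d] + (idx[d] + 0.5) * dx / factor for d in range(3)]
        if sum((p[d] - center[d]) ** 2 for d in range(3)) <= radius ** 2:
            inside += 1
    return inside / factor ** 3

# Cell [0,1]^3 cut by a sphere of radius 0.5 centered on its x = 0 face
half = overlap_fraction((0.0, 0.0, 0.0), 1.0, (0.0, 0.5, 0.5), 0.5, factor=2)
```

A higher `factor` resolves the particle surface more accurately at the cost of `factor`$^3$ point tests per boundary cell.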
6-Way Coupling for Charged Particles
Components:
- Finite volumes MG: treat BCs, V-cycle iteration
- LBM: treat BCs, stream-collide step
- Newtonian mechanics: collision response
- Lubrication correction
Exchanged quantities: charge distribution, velocity BCs, object motion, hydrodynamic force, electrostatic force, object distance, correction force
D. Bartuschat, U. Rüde. Parallel Multiphysics Simulations of Charged Particles in Microfluidic Flows (2015), doi:10.1016/j.jocs.2015.02.006
Charged Particles Algorithm
foreach time step do
    // solve Poisson equation with particle charge density
    set RHS of Poisson equation
    while residual too high do
        perform multigrid V-cycle to solve Poisson equation
    // solve lattice Boltzmann equation considering particle velocities
    begin
        perform stream step
        compute macroscopic variables
        perform collide step
    end
    // couple potential solver and LBM with pe
    begin
        apply hydrodynamic force to particles
        apply electrostatic force to particles
        pe moves particles depending on forces
    end
Charged Particles Multigrid Solver
Multigrid
- Iterative method for efficient solution of sparse linear systems
- Based on:
  - Smoothing principle: high-frequency error elimination by iterative solvers (e.g. GS)
  - Coarse grid principle: restriction to coarser grid transforms low-frequency error components to relatively higher-frequency ones
- Smoothing on coarse grids
- Prolongation of obtained correction terms to finer grid
- Applied recursively: V($\nu_{pre}$, $\nu_{post}$)-cycle
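The V-cycle structure can be sketched on a 1D model problem, $-u'' = f$ with zero Dirichlet values. This is a textbook vertex-centered variant with Gauss-Seidel smoothing, full-weighting restriction and linear interpolation, chosen for brevity. It is not the cell-centered Galerkin scheme used in the talk, and it assumes the number of intervals is a power of two.

```python
import numpy as np

def smooth(u, f, h, sweeps):
    """In-place Gauss-Seidel sweeps for -u'' = f with fixed end values."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])

def residual(u, f, h):
    """Residual f - A u at interior points, zero at the boundary."""
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    return r

def v_cycle(u, f, h, nu_pre=2, nu_post=2):
    """One V(nu_pre, nu_post)-cycle; updates u in place and returns it."""
    n = len(u) - 1                       # number of intervals (power of two)
    if n == 2:                           # coarsest grid: one unknown, solve exactly
        u[1] = 0.5 * h * h * f[1]
        return u
    smooth(u, f, h, nu_pre)              # pre-smoothing removes high-frequency error
    r = residual(u, f, h)
    rc = np.zeros(n // 2 + 1)            # full-weighting restriction of the residual
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    ec = v_cycle(np.zeros_like(rc), rc, 2 * h, nu_pre, nu_post)
    e = np.zeros_like(u)                 # linear interpolation of the correction
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    u += e
    smooth(u, f, h, nu_post)             # post-smoothing
    return u

n = 64
x = np.linspace(0.0, 1.0, n + 1)
f = np.pi**2 * np.sin(np.pi * x)         # exact solution: u = sin(pi x)
u = np.zeros(n + 1)
r0 = np.linalg.norm(residual(u, f, 1.0 / n))
for _ in range(10):
    v_cycle(u, f, 1.0 / n)
```

Monitoring `np.linalg.norm(residual(u, f, 1.0 / n))` after each cycle gives the per-cycle convergence rate, the same kind of figure the validation slides quote for the 3D solver.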
Cell-Centered Multigrid - Implementation
- All operations implemented as compact stencil operations
- Design goals:
  - Efficient and robust black-box solver
  - Handling complex boundary conditions on coarse levels
  - Naturally extensible to jumping coefficients
- Method of choice: Galerkin coarsening (FV)
  - Stencils stored for each unknown
  - On finest level: quasi-constant stencils
- Averaging restriction, constant prolongation
  - Preserves D3Q7 stencil on coarse grids
  - Convergence rate deteriorates
  - Workaround for Poisson problem: overrelaxing prolongation*
* Mohr, Wienands. Cell-centered Multigrid Revisited, Comput. Vis. Sci. (2004)
Charged Particles Validation
Validation of Electric Potential
Analytical solution for homogeneously charged particle:
$$\Phi(r) = \begin{cases} \dfrac{1}{4\pi\varepsilon_e}\,\dfrac{q_e}{r} & \text{if } r \ge R \\[1ex] \dfrac{1}{4\pi\varepsilon_e}\,\dfrac{q_e}{2R}\left(3 - \left(\dfrac{r}{R}\right)^2\right) & \text{if } r < R \end{cases}$$
[Plot: $\Phi$/V over $x_L$ (0 to 250); curves: analytical solution, numerical solution, sphere surface]
- Particle in $256^3$ domain
- MG: 5 V(2,2)-cycles
- Dirichlet BCs: exact solution
- Relative error: in order of $10^{-3}$
- Radius: 60 µm
- Charge: 8000 e
- Subsampling: factor 3
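The analytical reference solution is easy to evaluate directly, for example to set the Dirichlet values or to compute the error of the numerical potential. In the sketch below the function name is hypothetical, and the permittivity in the usage lines (80 times the vacuum value, roughly water) is an assumption, since the slide does not state $\varepsilon_e$.

```python
import math

EPS0 = 8.8541878128e-12      # vacuum permittivity in F/m
E_CHARGE = 1.602176634e-19   # elementary charge in C

def sphere_potential(r, q, R, eps):
    """Potential of a homogeneously charged sphere (charge q, radius R)
    at distance r > 0 or r = 0 from its center, in a medium of permittivity eps."""
    k = q / (4.0 * math.pi * eps)
    if r >= R:
        return k / r                                   # point-charge field outside
    return k / (2.0 * R) * (3.0 - (r / R) ** 2)        # parabolic profile inside

R = 60e-6                    # radius from the slide
q = 8000 * E_CHARGE          # charge from the slide
eps = 80.0 * EPS0            # assumption: water-like medium, not from the slide
phi_surface = sphere_potential(R, q, R, eps)
phi_center = sphere_potential(0.0, q, R, eps)
```

Both branches agree at $r = R$, and the potential at the center is 1.5 times the surface value, which gives two quick checks for any implementation.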
Validation of Electric Potential
Determination of residual threshold:
[Plot: error and residual L2 norm over iteration (0 to 5)]
- Error hardly reduced once the residual norm is smaller than $10^{-9}$
- Residual threshold for simulations: $2 \times 10^{-9}$
- V(2,2) cycles, conv. rate 0.18
Charged Particles Results
Charged Particles in Fluid Flow
Charged Particles in Fluid Flow
- Computed on 144 cores (12 nodes) of RRZE LiMa
- 132 600 time steps
- TRT LBM with optimal Poiseuille parameter
- $64^3$ unknowns per core
- 6 MG levels
- Channel: 2.56 x 5.76 x 2.56 mm
- Δx = 10 µm, Δt = $4 \times 10^{-5}$ s, τ = 1.7
- Particle radius: 80 µm
- Particle charge: $\pm 10^5$ e
- Inflow: 1 mm/s, outflow: 0 Pa
- Other walls: no-slip BCs
- Potential: ±51.2 V
- Else: homogeneous Neumann BCs
Scaling Setup and Environment
Weak scaling:
- Constant size per core: $128^3$ cells
- 9.4% moving obstacle cells
- Size doubled (y-dimension)
- MG (residual L2-norm threshold $2 \times 10^{-9}$): V(3,3) with 7 levels, 10 to 45 CG coarse-grid iterations
- Convergence rate: 0.07
- 2x4x2 cores per node
Executed on LRZ's SuperMUC:
- 9216 compute nodes (thin islands), each with 2 Xeon "Sandy Bridge-EP" chips @2.5 GHz and 32 GB DDR3 RAM
- Infiniband interconnect
- Currently ranked #20 in TOP500
Weak Scaling for 240 Time Steps
[Plot: total runtimes in s (0 to 400) over number of nodes (1 to 2048), split into LBM, Map, Lubr, HydrF, pe, MG, SetRHS, PtCm, ElectF]
- Overall parallel efficiency @2048 nodes: 83%
- 32 768 cores, 7.1M particles
D. Bartuschat, U. Rüde. Parallel Multiphysics Simulations of Charged Particles in Microfluidic Flows (2015), doi:10.1016/j.jocs.2015.02.006
Weak Scaling for 240 Time Steps
[Plot: LBM performance in $10^3$ MFLUPS (mega fluid lattice site updates per second) and MG performance in $10^3$ MLUPS (mega lattice site updates per second) over number of nodes (0 to 2000)]
- Parallel efficiency @2048 nodes: LBM 91%, MG (1 V(3,3)) 64%
- MG performance restricted by coarsest-grid solving
- 32 768 cores, 7.1M particles
D. Bartuschat, U. Rüde. Parallel Multiphysics Simulations of Charged Particles in Microfluidic Flows (2015), doi:10.1016/j.jocs.2015.02.006
Free Surface Flow and effects of surface tension
Free Surface Extension
- Volume of Fluid approach*
- Only liquid phase has to be simulated**
- Cells are classified as either solid, liquid, gas or interface
- LBM only performed in fluid cells
- Suitable for phases with high density differences
* C. Hirt, B. Nichols. Volume of Fluid (VOF) Method for the Dynamics of Free Boundaries (1981), doi:10.1016/0021-9991(81)90145-5
** C. Körner et al. Lattice Boltzmann Model for Free Surface Flow for Modeling Foaming (2005), doi:10.1007/s10955-005-8879-8
Free Surface Extension
Geometry reconstruction:
- calculation of interface normals
- compute normals at triple points
- compute local curvature to account for surface tension
Surface dynamics:
- non-free-surface boundary treatment
- free surface boundary treatment
- LBM streaming step ("advection")
- fill level updates ("mass advection")
- LBM collision step
- conversion of cell types
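The last step of the surface dynamics, conversion of cell types, can be illustrated under a common convention for free-surface LBM: an interface cell carries a mass, its fill level is mass divided by density, and it converts once the fill level leaves the range [0, 1] by a small tolerance. The names and the tolerance `eps` below are hypothetical and are not taken from the slides.

```python
GAS, INTERFACE, LIQUID = "gas", "interface", "liquid"

def fill_level(mass, rho):
    """Fill level of an interface cell: fraction of the cell occupied by liquid."""
    return mass / rho

def converted_type(mass, rho, eps=1e-3):
    """New type of an interface cell after the fill level update.

    eps is a hypothetical tolerance that avoids flickering conversions
    when the fill level hovers around 0 or 1.
    """
    phi = fill_level(mass, rho)
    if phi > 1.0 + eps:
        return LIQUID      # cell filled up
    if phi < -eps:
        return GAS         # cell emptied
    return INTERFACE

states = [converted_type(m, rho=1.0) for m in (-0.01, 0.4, 1.02)]
```

A complete scheme also has to turn the neighbours of a converted cell into interface cells so that liquid and gas never touch directly; that bookkeeping is omitted here.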
Drop on Inclined Plane
- kin. viscosity: $7.9 \times 10^{-6}$ m$^2$/s
- surf. tension: $4.82 \times 10^{-2}$ N/m
- drop diameter: 5 mm
- dx: 100 cells per drop diameter
- τ: 0.526
- dt: $2.8 \times 10^{-6}$ s
- SRT with Guo forcing term*
* Z. Guo et al. Discrete lattice effects on the forcing term in the lattice Boltzmann method (2002), doi:10.1103/PhysRevE.65.046308
Domain Setup
- Restriction to L-shaped domain
- No inclined geometry, use inclined gravity instead
- Full free surface calculations are costly:
  - find interface cells
  - reconstruct interface normals and curvature
  - mass advection
  - cell conversions
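Replacing the inclined plane by an inclined gravity vector amounts to rotating gravity into the frame of the plane. The sketch below assumes a coordinate frame that the slide does not specify: x runs down the slope and z is the plane normal. The function name is hypothetical.

```python
import math

def inclined_gravity(angle_deg, g=9.81):
    """Gravity components (gx, gy, gz) in a frame aligned with the plane.

    Assumed axes: x points down the slope, z is the plane normal.
    """
    a = math.radians(angle_deg)
    return (g * math.sin(a), 0.0, -g * math.cos(a))

for angle in (15, 30, 45):
    gx, gy, gz = inclined_gravity(angle)
    print(f"{angle:2d} deg: g = ({gx:.3f}, {gy:.3f}, {gz:.3f}) m/s^2")
```

The grid stays axis-aligned with the wall, so the no-slip boundary needs no staircase approximation, and changing the inclination only changes the body force.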
Domain Setup
- Full free surface calculations are costly
- Only a fraction of the complete domain is covered with fluid/interface
- Fluid-covered region is moving
- Dynamic load balancing required
- Simulation on 200 cores (on Emmy, RRZE)
- Drop resolution of 100 cells
- Runtime between 1 and 2 hours
Boundary Setup
- No-slip boundary: solid wall, enforces zero velocity at boundary
- Pressure boundary: pressure Dirichlet (set to capillary pressure)
Varying Inclination Angle
[Figures: inclination angles 15°, 30°, 45°]
- Higher inclination causes the drop to move further before being absorbed
Varying Contact Angle
[Figures: contact angles 50°, 90°, 110°]
- Higher contact angle causes the drop to move further before being absorbed
Free Surface Flow with Particles
Floating Objects
S. Bogner, U. Rüde. Simulation of floating bodies with lattice Boltzmann (2012), doi:10.1016/j.camwa.2012.09.012
Rising Bubble
S. Bogner, U. Rüde. Simulation of floating bodies with lattice Boltzmann (2012), doi:10.1016/j.camwa.2012.09.012
Thank you for your attention!