Numerical Modelling in Fortran: day 8. Paul Tackley, 2017


Today's goals: 1. Introduction to parallel computing (applicable to Fortran or C; examples are in Fortran). 2. Finite Prandtl number convection.

Motivation: to model the Earth, a huge number of grid points / cells / elements is needed! E.g., to fill the mantle volume: (8 km)^3 cells -> 1.9 billion cells; (2 km)^3 cells -> 123 billion cells.

Huge problems => huge computer www.top500.org

Progress: an iPhone is faster than the fastest computer in 1976 (which cost $8 million). (Photo taken at the NCAR museum.)

In Switzerland: each node: 12-core Intel CPU + GPU. Piz Dora: each node: 2 x 18-core Intel CPUs.

Shared memory: several CPUs (or cores) share the same memory; parallelisation can often be done by the compiler (sometimes with help, e.g., OpenMP directives in the code). Distributed memory: each CPU has its own memory; parallelisation usually requires message passing, e.g., using MPI (Message Passing Interface).

A brief history of supercomputers. 1983-85: 4 CPUs, shared memory. 1991: 512 CPUs, distributed memory. 2010: 224,162 cores, distributed + shared memory (12 cores per node).

Another possibility: build your own ("Beowulf" cluster), using standard PC cases or rack-mounted cases.

MPI: Message Passing Interface. A standard library for communicating between different tasks (CPUs): pass messages (e.g., arrays) and perform global operations (e.g., sum, maximum). Tasks can be on different CPUs/cores of the same node, or on different nodes. Works with Fortran and C, and on everything from a laptop to the largest supercomputers. Two implementations are: http://www.mcs.anl.gov/research/projects/mpich2/ and http://www.open-mpi.org/

How to parallelise a code: worked example

Example: scalar Poisson equation $\nabla^2 u = f$. Finite-difference approximation:
$$\frac{1}{h^2}\left(u_{i+1,j,k} + u_{i-1,j,k} + u_{i,j+1,k} + u_{i,j-1,k} + u_{i,j,k+1} + u_{i,j,k-1} - 6u_{i,j,k}\right) = f_{i,j,k}$$
Use an iterative approach: start with $u = 0$, then sweep through the grid updating the $u$ values according to
$$u_{i,j,k}^{n+1} = u_{i,j,k}^{n} + \alpha\,\frac{h^2}{6}\,R_{i,j,k}$$
where $R_{i,j,k}$ is the residue ("error"): $R = \nabla^2 u - f$.

Code
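
The code on this slide is not transcribed; as a rough illustration only, a minimal serial Fortran sketch of the relaxation described above, with hypothetical names (n, alpha, the constant right-hand side) and a fixed iteration count instead of a convergence test, could look like this:

  ! Minimal serial sketch of Jacobi-style relaxation for nabla^2 u = f
  ! (hypothetical names and settings; not the actual course code)
  program poisson_relax
    implicit none
    integer, parameter :: n = 32           ! grid points per direction
    integer :: i, j, k, iter
    real :: h, alpha, res
    real, dimension(n,n,n) :: u, unew, f

    h = 1.0 / real(n-1)
    alpha = 1.0                            ! relaxation parameter
    u = 0.0                                ! initial guess
    f = 1.0                                ! example right-hand side

    do iter = 1, 1000
       unew = u
       do k = 2, n-1
          do j = 2, n-1
             do i = 2, n-1
                ! residue R = nabla^2 u - f at this point
                res = ( u(i+1,j,k) + u(i-1,j,k) + u(i,j+1,k) + u(i,j-1,k) &
                      + u(i,j,k+1) + u(i,j,k-1) - 6.0*u(i,j,k) ) / h**2 - f(i,j,k)
                unew(i,j,k) = u(i,j,k) + alpha * h**2 * res / 6.0
             end do
          end do
       end do
       u = unew
    end do
  end program poisson_relax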

Parallelisation: domain decomposition. [Figure: a single-CPU domain vs. the same domain split among 8 CPUs, labelled CPU 0 to CPU 7.] Each CPU will do the same operations but on different parts of the domain.

You need to build parallelisation into the code using MPI. Any scalar code will run on multiple CPUs, but will produce the same result on each CPU. The code must first set up the local grid in relation to the global grid, then handle communication. Only a few MPI calls are needed: initialisation (MPI_Init, MPI_Comm_size, MPI_Comm_rank), global combinations (MPI_Allreduce), and CPU-CPU communication (MPI_Send, MPI_Recv, ...).

Boundaries: when updating points at the edge of a subdomain, values on neighbouring subdomains are needed. Hold copies of these locally using "ghost points". This minimises the number of messages, because the ghost points can be updated all at once instead of individually. A sketch of such an array layout follows.
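
As an illustration only (array name and bounds are assumptions, not the course code), a local subdomain array with one layer of ghost points on each side could be declared like this:

  ! Local subdomain of nx x ny x nz interior points with one ghost layer
  ! on each side (indices 0 and n+1). Hypothetical names.
  program ghost_demo
    implicit none
    real, allocatable :: u(:,:,:)
    integer :: nx, ny, nz
    nx = 64; ny = 64; nz = 64
    allocate( u(0:nx+1, 0:ny+1, 0:nz+1) )
    u = 0.0
    ! Interior points 1..nx etc. are updated locally; the ghost layers are
    ! filled by copying from neighbouring subdomains (or by the physical
    ! boundary condition at external boundaries).
  end program ghost_demo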

Scalar grid: red = boundary points (= 0); yellow = iterated/solved (indices 1 to n-1).

Parallel grids: red = external boundaries; green = internal boundaries; yellow = iterated/solved.

First things the code has to do: call MPI_Init(ierr); find the number of CPUs using MPI_Comm_size; find which CPU it is using MPI_Comm_rank (returns a number from 0 to #CPUs-1); calculate which part of the global grid it is dealing with, and which other CPUs are handling neighbouring subdomains.

Example: Hello world program
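
The hello-world program itself is not transcribed here; a minimal Fortran version using only the calls listed above (names as in the MPI standard) might be:

  program hello
    use mpi
    implicit none
    integer :: ierr, nprocs, myrank

    call MPI_Init(ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)   ! total number of tasks
    call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)   ! this task: 0 .. nprocs-1

    write(*,*) 'Hello from rank', myrank, 'of', nprocs

    call MPI_Finalize(ierr)
  end program hello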

Moving forward: update values in the subdomain using the ghost points as boundary conditions, i.e., a timestep (explicit) or an iteration (implicit); then update the ghost points by communicating with other CPUs. This works well for explicit or iterative approaches.

Boundary communication. Step 1: x-faces. Step 2: y-faces (including corner values from step 1). [Step 3: z-faces (including corner values from steps 1 & 2).] Doing the 3 directions sequentially avoids the need for additional messages for edges & corners (=> in 3D, 6 messages instead of 26).

Main changes: parallelisation is hidden in set_up_parallelisation and update_sides; many new variables store parallelisation information; loop limits depend on whether a boundary is a global domain boundary or a local subdomain boundary.

Simplest communication: not optimal; uses blocking send/receive.

Better: using non-blocking communication (isend/irecv).
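
As an illustration of the non-blocking pattern (not the course's update_sides routine; the array, neighbour ranks and tags are assumptions), a 2D exchange of the y-direction ghost rows, which are contiguous in Fortran memory order, could look like this:

  ! Sketch: non-blocking exchange of y-direction ghost rows with the
  ! neighbours below (jdown) and above (jup). Hypothetical names.
  subroutine update_ghost_y(u, nx, ny, jdown, jup)
    use mpi
    implicit none
    integer, intent(in) :: nx, ny, jdown, jup
    real, intent(inout) :: u(0:nx+1, 0:ny+1)
    integer :: req(4), ierr, stats(MPI_STATUS_SIZE,4)

    ! Post receives into the ghost rows, then send the adjacent interior rows.
    call MPI_Irecv(u(0,0),    nx+2, MPI_REAL, jdown, 1, MPI_COMM_WORLD, req(1), ierr)
    call MPI_Irecv(u(0,ny+1), nx+2, MPI_REAL, jup,   2, MPI_COMM_WORLD, req(2), ierr)
    call MPI_Isend(u(0,ny),   nx+2, MPI_REAL, jup,   1, MPI_COMM_WORLD, req(3), ierr)
    call MPI_Isend(u(0,1),    nx+2, MPI_REAL, jdown, 2, MPI_COMM_WORLD, req(4), ierr)
    call MPI_Waitall(4, req, stats, ierr)
  end subroutine update_ghost_y

The x-direction faces are not contiguous in memory, so they would need either an MPI derived datatype or copying into a temporary buffer; at external boundaries the neighbour rank can be set to MPI_PROC_NULL, which turns the corresponding calls into no-ops.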

Performance: theoretical analysis

How much time is spent communicating? Computation time $\propto$ volume ($N_x^3$); communication time $\propto$ surface area ($N_x^2$); => communication/computation $\propto 1/N_x$ => have as many points per CPU as possible!

Is it better to split 1D, 2D or 3D? E.g., 256x256x256 points on 64 CPUs. 1D split: 256x256x4 points/CPU, area = 2x(256x256) = 131,072. 2D split: 256x32x32 points/CPU, area = 4x(256x32) = 32,768. 3D split: 64x64x64 points/CPU, area = 6x(64x64) = 24,576. => 3D is best, but more messages are needed.

Model code performance (time per step or iteration). Computation: $t = aN^3$. Communication: $t = nL + bN^2/B$ (L = latency, B = bandwidth). Total: $t = aN^3 + nL + bN^2/B$.

Example: scalar Poisson equation $\nabla^2 u = f$, with $t = aN^3 + nL + bN^2/B$. Assume 15 operations/point/iteration and 1 Gflop/s performance => $a = 15/10^9 = 1.5\times10^{-8}$ s. For a 3D decomposition, $n = 6$ and $b = 6\times 4$ bytes (single precision). Gigabit Ethernet: $L = 40\times10^{-6}$ s, $B = 100$ MB/s. Quadrics: $L = 2\times10^{-6}$ s, $B = 875$ MB/s.
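
As a quick check of these numbers (a hypothetical helper program, not part of the course code), the model time per iteration can be evaluated directly:

  ! Evaluate the model time per iteration t = a*N**3 + n*L + b*N**2/B
  ! for the two networks above, using the values assumed on this slide.
  program perf_model
    implicit none
    real :: a, b, L_gige, B_gige, L_quad, B_quad, t_gige, t_quad
    integer :: N, n_msg

    a = 1.5e-8                           ! s per point (15 ops at 1 Gflop/s)
    n_msg = 6                            ! messages per iteration (3D decomposition)
    b = 6.0*4.0                          ! bytes per boundary point (single precision)
    L_gige = 40.0e-6;  B_gige = 100.0e6  ! Gigabit Ethernet: latency (s), bandwidth (B/s)
    L_quad = 2.0e-6;   B_quad = 875.0e6  ! Quadrics: latency (s), bandwidth (B/s)

    N = 64                               ! points per CPU in each direction
    t_gige = a*real(N)**3 + n_msg*L_gige + b*real(N)**2/B_gige
    t_quad = a*real(N)**3 + n_msg*L_quad + b*real(N)**2/B_quad
    write(*,*) 'N =', N, ' t(GigE) =', t_gige, ' s, t(Quadrics) =', t_quad, ' s'
  end program perf_model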

Time/iteration vs. number of CPUs (Quadrics, Gonzales-size cluster).

Up to 2e5 CPUs (Quadrics communication)

Efficiency

Now multigrid V-cycles. [Diagram: smoothing on 32x32x32, 16x16x16 and 8x8x8 grids, passing residues (= errors) down and corrections up, with an exact solution on the coarsest 4x4x4 grid.]

Application to StagYY (Cartesian or spherical).

StagYY iterations, 3D Cartesian: there is a change in scaling when going from same-node to cross-node communication. Simple-minded multigrid has very inefficient coarse levels, and the exact coarse solution can take a long time!

New treatment: follow minima. Keep the number of points per core above a minimum (tuned for the system), which is different for on-node and cross-node communication.

Multigrid now (& before): yin-yang 1.8 billion

Summary: For very large-scale problems, you need to parallelise the code using MPI. For finite-difference codes, the best method is to assign different parts of the domain to different CPUs ("domain decomposition"). The code looks similar to before, but with some added routines to take care of communication. Multigrid scales fine on 1000s of CPUs if coarse grids are treated on subsets of CPUs and the total problem size is large enough.

For more information: https://computing.llnl.gov/tutorials/parallel_comp/ http://en.wikipedia.org/wiki/Parallel_computing http://www.mcs.anl.gov/~itf/dbpp/ http://en.wikipedia.org/wiki/Message_Passing_Interface

Programming: Finite Prandtl number convection (i.e., almost any fluid) Ludwig Prandtl (1875-1953)

Values of the Prandtl number $Pr = \nu/\kappa$ (viscous diffusivity / thermal diffusivity). Liquid metals: 0.004-0.03. Air: 0.7. Water: 1.7-12. Rock: $\sim 10^{24}$!!! (effectively infinite).

Finite-Prandtl-number convection: the existing code assumes an infinite Prandtl number, also known as Stokes flow, appropriate for highly viscous fluids like rock, honey, etc. Fluids like water, air and liquid metal have a lower Prandtl number, so the equations must be modified.

Applications for finite Pr: outer core (geodynamo), atmosphere, ocean, anything that's not solid like the mantle.

Equations: conservation of mass (= "continuity"); conservation of momentum (Navier-Stokes equation: F = ma for a fluid); conservation of energy. Claude Navier (1785-1836), Sir George Stokes (1819-1903).

Finite-Pr equations. Navier-Stokes equation (F = ma for a fluid):
$$\rho\left(\frac{\partial \mathbf{v}}{\partial t} + \mathbf{v}\cdot\nabla\mathbf{v}\right) = -\nabla P + \rho\nu\nabla^2\mathbf{v} + 2\rho\,\boldsymbol{\Omega}\times\mathbf{v} + g\rho\alpha T\,\hat{\mathbf{y}}$$
where the left-hand side is the "ma" term and the $\boldsymbol{\Omega}$ term is the Coriolis force. Valid for constant viscosity only. The continuity and energy equations are the same as before:
$$\nabla\cdot\mathbf{v} = 0, \qquad \frac{\partial T}{\partial t} + \mathbf{v}\cdot\nabla T = \kappa\nabla^2 T + Q$$
$\rho$ = density, $\nu$ = kinematic viscosity, $g$ = gravity, $\alpha$ = thermal expansivity.

Non-dimensionalise the equations Reduces the number of parameters Makes it easier to identify the dynamical regime Facilitates comparison of systems with different scales but similar dynamics (e.g., analogue laboratory experiments compared to core or mantle)

Non-dimensionalise to thermal diffusion scales: lengthscale D (depth of domain); temperature scale $\Delta T$ (temperature drop over the domain); time to $D^2/\kappa$; velocity to $\kappa/D$; stress to $\rho\nu\kappa/D^2$.

Nondimensional equations:
$$\nabla\cdot\mathbf{v} = 0, \qquad \frac{\partial T}{\partial t} + \mathbf{v}\cdot\nabla T = \nabla^2 T$$
$$\frac{1}{Pr}\left(\frac{\partial\mathbf{v}}{\partial t} + \mathbf{v}\cdot\nabla\mathbf{v}\right) = -\nabla P + \nabla^2\mathbf{v} + \frac{1}{Ek}\,\hat{\boldsymbol{\Omega}}\times\mathbf{v} + Ra\,T\,\hat{\mathbf{y}}$$
$Pr = \dfrac{\nu}{\kappa}$ (Prandtl number), $Ek = \dfrac{\nu}{2\Omega D^2}$ (Ekman number), $Ra = \dfrac{g\alpha\Delta T D^3}{\nu\kappa}$ (Rayleigh number).

As before, use a streamfunction: $v_x = \partial\psi/\partial y$, $v_y = -\partial\psi/\partial x$. Also simplify by assuming $1/Ek = 0$ (i.e., neglect rotation).

Eliminating pressure: take the curl of the 2D momentum equation; the curl of a gradient is zero, so the pressure disappears, and velocity is replaced by the vorticity $\boldsymbol{\omega} = \nabla\times\mathbf{v}$. In 2D only one component of vorticity is needed (the one perpendicular to the 2D plane), giving
$$\nabla^2\psi = \omega, \qquad \frac{1}{Pr}\left(\frac{\partial\omega}{\partial t} + v_x\frac{\partial\omega}{\partial x} + v_y\frac{\partial\omega}{\partial y}\right) = \nabla^2\omega - Ra\,\frac{\partial T}{\partial x}$$

=> the streamfunction-vorticity formulation:
$$\frac{1}{Pr}\left(\frac{\partial\omega}{\partial t} + v_x\frac{\partial\omega}{\partial x} + v_y\frac{\partial\omega}{\partial y}\right) = \nabla^2\omega - Ra\,\frac{\partial T}{\partial x}$$
$$\nabla^2\psi = \omega, \qquad (v_x, v_y) = \left(\frac{\partial\psi}{\partial y},\, -\frac{\partial\psi}{\partial x}\right)$$
$$\frac{\partial T}{\partial t} + \mathbf{v}\cdot\nabla T = \nabla^2 T + Q$$

Note: effect of high Pr.
$$\frac{1}{Pr}\left(\frac{\partial\omega}{\partial t} + v_x\frac{\partial\omega}{\partial x} + v_y\frac{\partial\omega}{\partial y}\right) = \nabla^2\omega - Ra\,\frac{\partial T}{\partial x}$$
If Pr -> infinity, the left-hand side -> 0, so the equation becomes a Poisson equation like before:
$$\nabla^2\omega = Ra\,\frac{\partial T}{\partial x}$$

Taking a timestep: (i) calculate $\psi$ from $\omega$ using $\nabla^2\psi = \omega$; (ii) calculate $\mathbf{v}$ from $\psi$: $(v_x, v_y) = (\partial\psi/\partial y,\, -\partial\psi/\partial x)$; (iii) time-step $\omega$ and T using explicit finite differences:
$$\frac{\partial T}{\partial t} = -v_x\frac{\partial T}{\partial x} - v_y\frac{\partial T}{\partial y} + \nabla^2 T$$
$$\frac{\partial\omega}{\partial t} = -v_x\frac{\partial\omega}{\partial x} - v_y\frac{\partial\omega}{\partial y} + Pr\,\nabla^2\omega - Ra\,Pr\,\frac{\partial T}{\partial x}$$

The T time step is the same as before:
$$\frac{T^{new} - T^{old}}{\Delta t} = -v_x\frac{\partial T^{old}}{\partial x} - v_y\frac{\partial T^{old}}{\partial y} + \nabla^2 T^{old}$$
$$T^{new} = T^{old} + \Delta t\left(\nabla^2 T^{old} - v_x\frac{\partial T^{old}}{\partial x} - v_y\frac{\partial T^{old}}{\partial y}\right)$$
$\omega$ must now be time-stepped in a similar way:
$$\frac{\omega^{new} - \omega^{old}}{\Delta t} = -v_x\frac{\partial\omega^{old}}{\partial x} - v_y\frac{\partial\omega^{old}}{\partial y} + Pr\,\nabla^2\omega^{old} - Ra\,Pr\,\frac{\partial T^{old}}{\partial x}$$
$$\omega^{new} = \omega^{old} + \Delta t\left(Pr\,\nabla^2\omega^{old} - v_x\frac{\partial\omega^{old}}{\partial x} - v_y\frac{\partial\omega^{old}}{\partial y} - Ra\,Pr\,\frac{\partial T^{old}}{\partial x}\right)$$
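
A rough Fortran sketch of these two explicit updates, using centred differences on a uniform grid of spacing h (the routine name, argument list and boundary handling are assumptions, not the assignment solution):

  ! Sketch of one explicit update of T and omega (interior points only).
  ! The psi solve, velocities and boundary conditions are assumed to be
  ! handled elsewhere. Hypothetical names.
  subroutine step_T_omega(T, w, vx, vy, nx, ny, h, dt, Ra, Pr)
    implicit none
    integer, intent(in) :: nx, ny
    real, intent(in)    :: h, dt, Ra, Pr
    real, intent(in)    :: vx(nx,ny), vy(nx,ny)
    real, intent(inout) :: T(nx,ny), w(nx,ny)
    real :: Tnew(nx,ny), wnew(nx,ny), lapT, lapw, dTdx, dTdy, dwdx, dwdy
    integer :: i, j

    Tnew = T;  wnew = w
    do j = 2, ny-1
       do i = 2, nx-1
          dTdx = (T(i+1,j) - T(i-1,j)) / (2.0*h)
          dTdy = (T(i,j+1) - T(i,j-1)) / (2.0*h)
          dwdx = (w(i+1,j) - w(i-1,j)) / (2.0*h)
          dwdy = (w(i,j+1) - w(i,j-1)) / (2.0*h)
          lapT = (T(i+1,j) + T(i-1,j) + T(i,j+1) + T(i,j-1) - 4.0*T(i,j)) / h**2
          lapw = (w(i+1,j) + w(i-1,j) + w(i,j+1) + w(i,j-1) - 4.0*w(i,j)) / h**2
          Tnew(i,j) = T(i,j) + dt*( lapT - vx(i,j)*dTdx - vy(i,j)*dTdy )
          wnew(i,j) = w(i,j) + dt*( Pr*lapw - vx(i,j)*dwdx - vy(i,j)*dwdy - Ra*Pr*dTdx )
       end do
    end do
    T = Tnew;  w = wnew
  end subroutine step_T_omega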

Stability condition. Diffusion: $dt_{diff} = a_{diff}\,\dfrac{h^2}{\max(Pr,\,1)}$. Advection: $dt_{adv} = a_{adv}\,\min\!\left(\dfrac{h}{\mathrm{maxval}(\mathrm{abs}(v_x))},\; \dfrac{h}{\mathrm{maxval}(\mathrm{abs}(v_y))}\right)$. Combined: $dt = \min(dt_{diff},\, dt_{adv})$.
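
A small self-contained sketch of how this combined limit could be evaluated (the safety factors a_diff and a_adv and the placeholder velocity fields are assumptions):

  ! Sketch: combined diffusive/advective stability limit for the explicit step.
  program stability_dt
    implicit none
    integer, parameter :: nx = 257, ny = 65
    real :: vx(nx,ny), vy(nx,ny)
    real :: h, Pr, a_diff, a_adv, dt_diff, dt_adv, dt

    h = 1.0/real(ny-1);  Pr = 0.1
    a_diff = 0.2;  a_adv = 0.4        ! assumed safety factors < 1
    vx = 1.0;  vy = 1.0               ! placeholder velocities

    dt_diff = a_diff * h**2 / max(Pr, 1.0)
    dt_adv  = a_adv * min( h/maxval(abs(vx)), h/maxval(abs(vy)) )
    dt      = min(dt_diff, dt_adv)
    write(*,*) 'dt =', dt
  end program stability_dt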

Modification of the previous convection program: replace the Poisson calculation of $\omega$ with a time step, done at the same time as the T time step. Get a compiling code! Make sure it is stable and convergent for values of Pr between 0.01 and 1e2. Hand in your code, and your solutions to the test cases in the following slides. Due date: 18 December (2 weeks from today).

Test cases: all have nx=257, ny=65, Ra=1e5, total_time=0.1, and random initial T and $\omega$ fields, unless otherwise stated. Due to the random start, results will not look exactly like these, but they should look similar (i.e., similar widths of upwellings, downwellings and boundary layers, but different numbers and placements of upwellings/downwellings).

Pr=10

Pr=1

Pr=0.1

Pr=0.01

Pr=0.01, time=1.0

Pr=0.1, Ra=1e7