Numerical Solution of the Generalised Poisson Equation on Parallel Computers
Diplomanden- und Dissertantenkolloquium, Universität Wien
Hannes Grimm-Strele
Projektgruppe ACORE, Fakultät für Mathematik, Universität Wien
April 15, 2010
Introduction: Subject of this talk

Generalised Poisson Equation (GPE):

$-\nabla \cdot (\kappa \nabla u) + c u = f$ on $\Omega$,   (1)
$\kappa(x) \geq \kappa_0 > 0$, $c(x) \geq 0$ for all $x \in \Omega$.   (2)

Main questions considered:
- Origin and practical relevance of the GPE
- Importance of parallel computing
- Approach to the numerical solution on parallel computers
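A brief aside (not on the slide): with constant coefficients the GPE reduces to the classical Poisson equation, which motivates the name.

% Special case of (1): kappa = 1 and c = 0 recover the Poisson equation
-\nabla \cdot (1 \cdot \nabla u) + 0 \cdot u = f
\quad\Longleftrightarrow\quad
-\Delta u = f \quad \text{on } \Omega.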
Physical Background: Heat conduction

Assumption: particles move according to Brownian motion.

Fourier's law: the energy flux goes from regions of high temperature to regions of low temperature,

$J = -\kappa \nabla T$,   (3)

where $T$ is the temperature, $\kappa > 0$ the heat conductivity of the material, $e$ the energy and $J$ the energy flux.

Conservation of energy: $\frac{\partial e}{\partial t} = -\nabla \cdot J$. Stationarity ($\frac{\partial e}{\partial t} = 0$) gives

$\nabla \cdot J = -\nabla \cdot (\kappa \nabla T) = 0$,   (4)

which is the GPE (1) for $u = T$ with $c = 0$ and $f = 0$.
Physical Background: Euler Equations 1/2

The Euler equations govern the dynamics of flow without friction. Mass and momentum equations in two dimensions:

$\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \vec{u}) = 0$,   (5)

$\frac{\partial}{\partial t}\begin{pmatrix} \rho u \\ \rho v \end{pmatrix} + \frac{\partial}{\partial x}\begin{pmatrix} \rho u^2 \\ \rho u v \end{pmatrix} + \frac{\partial}{\partial y}\begin{pmatrix} \rho u v \\ \rho v^2 \end{pmatrix} + \begin{pmatrix} p_x \\ p_y \end{pmatrix} = 0$,   (6)

where $\rho$ is the mass density, $\vec{u} = (u, v)$ the velocity field and $p$ the pressure.

Incompressibility assumption: the mass density does not change along a trajectory. The flow is incompressible $\Leftrightarrow$ (by (5)) $\nabla \cdot \vec{u} = 0$.
Physical Background: Euler Equations 2/2

Set $(\rho \vec{u})^* = (\rho \vec{u})^n - \Delta t \left[ \frac{\partial}{\partial x}\begin{pmatrix} \rho u^2 \\ \rho u v \end{pmatrix} + \frac{\partial}{\partial y}\begin{pmatrix} \rho u v \\ \rho v^2 \end{pmatrix} \right]$.

Euler forward step for $\rho \vec{u}$:  $\dfrac{(\rho \vec{u})^{n+1} - (\rho \vec{u})^*}{\Delta t} = -\nabla p$.

Divide by $\rho^{n+1}$ (with $\vec{u}^* = (\rho \vec{u})^* / \rho^{n+1}$):  $\dfrac{\vec{u}^{n+1} - \vec{u}^*}{\Delta t} = -\dfrac{\nabla p}{\rho^{n+1}}$.

Take the divergence:  $\nabla \cdot \vec{u}^{n+1} = \nabla \cdot \vec{u}^* - \Delta t \, \nabla \cdot \left( \dfrac{\nabla p}{\rho^{n+1}} \right)$.

Incompressible flow ($\nabla \cdot \vec{u}^{n+1} = 0$):  $\nabla \cdot \left( \dfrac{1}{\rho^{n+1}} \nabla p \right) = \dfrac{\nabla \cdot \vec{u}^*}{\Delta t}$, i.e. a GPE for $p$ with $\kappa = 1/\rho^{n+1}$ and $c = 0$.
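A minimal sketch of how this projection step could be discretised (uniform-grid finite differences with hypothetical names and homogeneous Dirichlet boundary values for the demo; the talk itself discretises the GPE with FEM, see below):

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def assemble_gpe(kappa, h):
    """Assemble -div(kappa grad p) on an n x n uniform grid with homogeneous
    Dirichlet boundary values; kappa holds 1/rho^{n+1} at the cell centres."""
    n = kappa.shape[0]
    A = sp.lil_matrix((n * n, n * n))
    k = lambda i, j: i * n + j
    for i in range(n):
        for j in range(n):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ii, jj = i + di, j + dj
                inside = 0 <= ii < n and 0 <= jj < n
                # kappa averaged onto the cell face; boundary faces use the cell value
                kf = 0.5 * (kappa[i, j] + kappa[ii, jj]) if inside else kappa[i, j]
                A[k(i, j), k(i, j)] += kf / h**2
                if inside:
                    A[k(i, j), k(ii, jj)] -= kf / h**2
    return A.tocsr()

# One projection step: solve the GPE for p, then correct the velocity.
n, h, dt = 32, 1.0 / 32, 1e-3
rho_new = np.ones((n, n))            # rho^{n+1}; constant here for the demo
div_u_star = np.random.rand(n, n)    # placeholder for div(u*)
A = assemble_gpe(1.0 / rho_new, h)
p = spla.spsolve(A, (-div_u_star / dt).ravel()).reshape(n, n)
# u^{n+1} = u* - dt / rho^{n+1} * grad p   (velocity correction omitted here)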
Parallel Computing: Why parallel computing is important

Simulating 3 s of solar granulation with ANTARES requires approx. 5000 CPUh and 1.1 TB of memory.
Parallel Computing: Example: Deutsches Klimarechenzentrum

http://www.dkrz.de/dkrz/science/why SC1:

"The global environment, and the climate in particular, are extremely complex systems whose dynamics and future development can only be captured by far-reaching investigations and elaborate model computations.

Four reasons why we depend on high-performance computers:
- complexity of the Earth system
- model resolution
- ensemble computations
- integrations over many centuries"

HPC "blizzard": 8448 cores, 158 TFlops, 20 TB memory
Compare: VSC, 3488 cores, 35.5 TFlops, 11.2 TB memory
Parallel Computing: Example: Sauber Motorsport

http://www.sauber-motorsport.com/index.cfm?pageid=70:

"CFD (Computational Fluid Dynamics) plays an important role, in particular in the development of front and rear wings as well as in engine and brake cooling. Albert3 has been in operation since spring 2008. The latest expansion stage comprises a total of 4224 processor cores. [...] The main memory grew to 8448 GByte and the peak computing performance to 50.7 TFlops."
Parallel Computing: Why not just wait?

Moore's law: the number of transistors on a commercially available processor doubles every eighteen months.

Source: Der Spiegel 11/2005, pp. 174-184
Parallel Computing: The concept of parallel programming

- Decomposition and distribution of data and work (e.g. by domain decomposition)
- Synchronisation and communication
- Criteria for good parallelisation: load balancing and optimal speedup (Amdahl's law)
- Programming models: Message Passing Interface (MPI) and OpenMP
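A minimal sketch of the decomposition and communication pattern in the MPI model (mpi4py, a hypothetical 1D strip decomposition with ghost rows; not code from the talk):

# Run with, e.g.:  mpiexec -n 4 python halo_demo.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Domain decomposition: each process owns a horizontal strip of the grid
# plus one ghost row above and below holding its neighbours' boundary values.
n_global = 128                          # assumes size divides n_global
n_local = n_global // size
u = np.zeros((n_local + 2, n_global))   # rows 1..n_local are owned, rows 0 and -1 are ghosts

up = rank - 1 if rank > 0 else MPI.PROC_NULL
down = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Synchronisation and communication: exchange boundary rows with the neighbours
# before every stencil application (e.g. each matrix-vector product inside PCG).
comm.Sendrecv(sendbuf=u[1, :], dest=up, recvbuf=u[-1, :], source=down)
comm.Sendrecv(sendbuf=u[-2, :], dest=down, recvbuf=u[0, :], source=up)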
Numerical Methods: Finite Element Method (FEM)

- Using FEM, the GPE is transformed into the discrete system $A u_h = b$, where $u_h$ is an approximate solution for $u$.
- The approximation error can be controlled by the choice of the finite-dimensional ansatz space.
- In most cases: the space of linear splines on $\Omega$.
- $A$ is always symmetric and positive definite.

Figure: basis function for the space of linear splines (2D).
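A minimal sketch of a piecewise-linear FEM assembly for the GPE, reduced to 1D for brevity (hypothetical helper names, midpoint quadrature, homogeneous Dirichlet boundary; not the code behind the talk):

import numpy as np
import scipy.sparse as sp

def assemble_fem_1d(x, kappa, c, f):
    """Assemble A u_h = b for -(kappa u')' + c u = f on (x[0], x[-1]) with
    homogeneous Dirichlet boundary conditions and linear (hat) elements."""
    n = len(x)
    A = sp.lil_matrix((n, n))
    rhs = np.zeros(n)
    for e in range(n - 1):
        h = x[e + 1] - x[e]
        xm = 0.5 * (x[e] + x[e + 1])                                # midpoint quadrature
        Ke = kappa(xm) / h * np.array([[1.0, -1.0], [-1.0, 1.0]])   # diffusion term
        Me = c(xm) * h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])   # reaction term
        fe = f(xm) * h / 2.0 * np.array([1.0, 1.0])                 # element load vector
        for a in range(2):
            rhs[e + a] += fe[a]
            for b in range(2):
                A[e + a, e + b] += Ke[a, b] + Me[a, b]
    # Dirichlet boundary: eliminate the first and last node
    return A.tocsr()[1:-1, 1:-1], rhs[1:-1]

# Since kappa >= kappa_0 > 0 and c >= 0, the assembled A is symmetric positive definite:
# A, b = assemble_fem_1d(np.linspace(0, 1, 101), lambda x: 1 + x, lambda x: 1.0, lambda x: 1.0)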
Numerical Methods: Preconditioned Conjugate Gradient Algorithm (PCG)

- Iterative algorithm for solving linear systems $A u_h = b$.
- Converges if $A$ is symmetric and positive definite.
- The convergence speed depends on the condition number $\kappa(A)$.
- $\kappa(A)$ can be lowered by preconditioning (e.g. incomplete Cholesky decomposition).
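A minimal sketch of the PCG iteration in its generic textbook form (with a simple Jacobi preconditioner standing in for the incomplete Cholesky decomposition; not the implementation used in the talk):

import numpy as np

def pcg(A, b, M_inv, tol=1e-10, maxiter=1000):
    """Preconditioned Conjugate Gradient for A x = b with A symmetric positive
    definite; M_inv(r) applies the preconditioner to a residual vector."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = M_inv(r)
        rz_new = r @ z
        beta = rz_new / rz
        p = z + beta * p
        rz = rz_new
    return x

# Jacobi (diagonal) preconditioning as a simple stand-in:
# x = pcg(A, b, lambda r: r / A.diagonal())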
Numerical Methods: Schur Complement Method 1/2

If the matrix $A$ is of the form

$A = \begin{pmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{pmatrix}$

with submatrices $A_{1,1}$, $A_{1,2}$, $A_{2,1}$ and $A_{2,2}$ such that $A_{1,1}$ is invertible, one can define the Schur complement $S_A$ by

$S_A = A_{2,2} - A_{2,1} A_{1,1}^{-1} A_{1,2}$.   (7)

Lemma. If $A$ is symmetric and positive definite, then $S_A$ defined by (7) is also symmetric and positive definite.
Numerical Methods: Schur Complement Method 2/2

The system $Au = b$ is equivalent to

$\begin{pmatrix} I & 0 \\ A_{2,1} A_{1,1}^{-1} & I \end{pmatrix} \begin{pmatrix} A_{1,1} & A_{1,2} \\ 0 & S_A \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$.

The two triangular block systems can then be solved one after the other with the PCG algorithm.

This can be done in parallel if $A_{1,1}$ has block-diagonal structure (which can be achieved by reordering and renumbering the nodes):

$A_{1,1} = \begin{pmatrix} D_1 & & 0 \\ & \ddots & \\ 0 & & D_p \end{pmatrix}$.

Then $A_{1,2} = (B_1, \ldots, B_p)^T$, $A_{2,1} = (C_1, \ldots, C_p)$, $A_{2,2} = \sum_i E_i$ and

$S_A = \sum_{i=1}^{p} \left( E_i - C_i D_i^{-1} B_i \right)$.
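A minimal sketch of the block elimination behind the Schur complement solve (dense NumPy with a hypothetical function name, for illustration only; in the parallel setting $S_A$ is built from the subdomain contributions $E_i - C_i D_i^{-1} B_i$ and the systems are solved with PCG rather than dense factorisations):

import numpy as np

def schur_solve(A11, A12, A21, A22, b1, b2):
    """Solve [[A11, A12], [A21, A22]] [u1; u2] = [b1; b2] via the Schur
    complement S_A = A22 - A21 A11^{-1} A12."""
    # Schur complement and the right-hand side of the reduced system
    S = A22 - A21 @ np.linalg.solve(A11, A12)
    u2 = np.linalg.solve(S, b2 - A21 @ np.linalg.solve(A11, b1))
    # Back substitution for the interior unknowns
    u1 = np.linalg.solve(A11, b1 - A12 @ u2)
    return u1, u2

# Consistency check against a direct solve on a small random SPD system:
# M = np.random.rand(6, 6); A = M @ M.T + 6 * np.eye(6); b = np.random.rand(6)
# u1, u2 = schur_solve(A[:4, :4], A[:4, 4:], A[4:, :4], A[4:, 4:], b[:4], b[4:])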