Integration of PETSc for Nonlinear Solves

Integration of PETSc for Nonlinear Solves
Ben Jamroz, Travis Austin, Srinath Vadlamani, Scott Kruger
Tech-X Corporation, jamroz@txcorp.com, http://www.txcorp.com
NIMROD Meeting: Aug 10, 2010, Boulder, CO

Outline
- Overview of PETSc
- Motivation for using PETSc: JFNK
- PETSc interface to direct solvers
- Complex unrolling
- Preconditioning: direct solvers
- Preconditioning: AMG
- To do

PETSc: A Scalable Solver Package
- Portable, Extensible Toolkit for Scientific Computation (Argonne National Lab)
- SNES, KSP, PC
  - SNES: nonlinear solver package
  - KSP: Krylov subspace solver package
  - PC: preconditioner package
- Jacobian-Free Newton-Krylov formulation
- Interface to many external packages: SuperLU_DIST, MUMPS, HYPRE
- Command-line control over solver options: saves coding Krylov methods and recompiling
- Real OR complex build

PETSc Used as a Framework for a Fully Implicit JFNK Nonlinear Solve
- Use SNES to solve the nonlinear functional F(u) = 0
- Functional evaluations are used to approximate the action of the Jacobian
- Preconditioner applied via user-supplied routines

Physics-Based Preconditioning
- Following Chacon (2008), apply a block LDU factorization to the 2x2 block system and invert (see the factorization sketch below)
- Approximate the Schur complement with the ideal MHD operator, which contains all of the wave-propagation information
- The P_sf matrices already exist in NIMROD (2D)
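
For reference, the standard 2x2 block LDU factorization used in this style of physics-based preconditioning is (generic block labels, not the slide's exact symbols):

\[
\begin{pmatrix} D_1 & U \\ L & D_2 \end{pmatrix}
=
\begin{pmatrix} I & 0 \\ L D_1^{-1} & I \end{pmatrix}
\begin{pmatrix} D_1 & 0 \\ 0 & S \end{pmatrix}
\begin{pmatrix} I & D_1^{-1} U \\ 0 & I \end{pmatrix},
\qquad
S = D_2 - L D_1^{-1} U ,
\]

so applying the inverse requires only inversions of $D_1$ and of the Schur complement $S$, plus applications of $L$ and $U$. The Schur complement is then approximated by the ideal MHD operator, which carries the wave-propagation information.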

SNES Implementation

SNES Implementation
- Requires subroutines to (see the SNES sketch below):
  - Evaluate the functional
  - Apply the preconditioner
  - Test for / create fresh preconditioning matrices
- Vector transfer operators: NIMROD cvector types to flat PETSc vectors
- Only axisymmetric problems investigated -> real matrices
- Work in progress
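
A minimal sketch of the SNES plumbing these subroutines plug into, assuming hypothetical NIMROD-side routines nimrod_residual and nimrod_apply_pc for the functional evaluation and preconditioner application (these names are illustrative, not actual NIMROD symbols); signatures follow recent PETSc, and a matrix-free Jacobian can be requested at run time with -snes_mf.

  #include <petscsnes.h>

  /* Illustrative prototypes; in practice these would be NIMROD routines. */
  extern void nimrod_residual(const PetscScalar *u, PetscScalar *f, PetscInt n);
  extern void nimrod_apply_pc(const PetscScalar *r, PetscScalar *z, PetscInt n);

  static PetscErrorCode FormFunction(SNES snes, Vec u, Vec f, void *ctx)
  {
    const PetscScalar *ua;
    PetscScalar       *fa;
    PetscInt           n;
    PetscCall(VecGetLocalSize(u, &n));
    PetscCall(VecGetArrayRead(u, &ua));
    PetscCall(VecGetArray(f, &fa));
    nimrod_residual(ua, fa, n);              /* evaluate F(u) with existing routines */
    PetscCall(VecRestoreArray(f, &fa));
    PetscCall(VecRestoreArrayRead(u, &ua));
    return 0;
  }

  static PetscErrorCode ApplyPC(PC pc, Vec r, Vec z)
  {
    const PetscScalar *ra;
    PetscScalar       *za;
    PetscInt           n;
    PetscCall(VecGetLocalSize(r, &n));
    PetscCall(VecGetArrayRead(r, &ra));
    PetscCall(VecGetArray(z, &za));
    nimrod_apply_pc(ra, za, n);              /* physics-based preconditioner application */
    PetscCall(VecRestoreArray(z, &za));
    PetscCall(VecRestoreArrayRead(r, &ra));
    return 0;
  }

  PetscErrorCode SolveNonlinear(Vec u, Vec r)
  {
    SNES snes;
    KSP  ksp;
    PC   pc;

    PetscCall(SNESCreate(PETSC_COMM_WORLD, &snes));
    PetscCall(SNESSetFunction(snes, r, FormFunction, NULL));
    PetscCall(SNESGetKSP(snes, &ksp));
    PetscCall(KSPGetPC(ksp, &pc));
    PetscCall(PCSetType(pc, PCSHELL));
    PetscCall(PCShellSetApply(pc, ApplyPC));
    PetscCall(SNESSetFromOptions(snes));     /* e.g. -snes_mf for a matrix-free Jacobian */
    PetscCall(SNESSolve(snes, NULL, u));
    PetscCall(SNESDestroy(&snes));
    return 0;
  }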

Investigate PETSc Interfaces with External Solvers
- NIMROD currently interfaces with SuperLU
  - Application of the preconditioner within GMRES/CG iterations
- Investigate the speed/robustness of other solvers
- PETSc command-line options allow fast/easy switching of solver methods/preconditioners
  - -ksp_type gmres -pc_type ilu
- External LU solver as a preconditioner (see the sketch below)
  - -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package <pkg> (superlu_dist, mumps, spooles, pastix, ...)
- Package the preconditioner matrix into PETSc format (CSR), similar to the format needed for SuperLU
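
A hedged sketch of the external-LU-as-preconditioner route named above, using the C API equivalents of those options. SolveWithExternalLU is an illustrative name; signatures follow recent PETSc, where the 2010-era option -pc_factor_mat_solver_package is now spelled -pc_factor_mat_solver_type.

  #include <petscksp.h>

  /* Solve with an external parallel LU factorization used as the (only) preconditioner:
   * equivalent to -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type superlu_dist */
  PetscErrorCode SolveWithExternalLU(Mat A, Vec b, Vec x)
  {
    KSP ksp;
    PC  pc;

    PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
    PetscCall(KSPSetOperators(ksp, A, A));   /* A is the assembled (MPI)AIJ/CSR matrix */
    PetscCall(KSPSetType(ksp, KSPPREONLY));  /* one application: x = (LU)^{-1} b */
    PetscCall(KSPGetPC(ksp, &pc));
    PetscCall(PCSetType(pc, PCLU));
    PetscCall(PCFactorSetMatSolverType(pc, MATSOLVERSUPERLU_DIST)); /* or MATSOLVERMUMPS */
    PetscCall(KSPSetFromOptions(ksp));       /* allow command-line overrides */
    PetscCall(KSPSolve(ksp, b, x));          /* factorization happens on the first solve */
    PetscCall(KSPDestroy(&ksp));
    return 0;
  }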

Complex Unrolling Wastes Memory
- PETSc can only be built with real OR complex data
- NIMROD: perfect storm, both real and complex data
- Link to the real PETSc build
  - Real systems (Srinath V.)
  - Complex systems: form the equivalent real system (unroll), as sketched below
- Complex unrolling: 4x the number of matrix entries, 2x the amount of memory
- Seems to be the industry standard
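
For reference, the equivalent-real ("unrolled") form of a complex system is

\[
(A_r + i A_i)(x_r + i x_i) = b_r + i b_i
\quad\Longleftrightarrow\quad
\begin{pmatrix} A_r & -A_i \\ A_i & A_r \end{pmatrix}
\begin{pmatrix} x_r \\ x_i \end{pmatrix}
=
\begin{pmatrix} b_r \\ b_i \end{pmatrix} .
\]

Each complex nonzero becomes four real nonzeros, but since a complex scalar already stores two reals, the net cost is twice the memory of the complex matrix.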

Complex Unrolling: Parallel CSR
- PETSc represents sparse matrices in Compressed Sparse Row (CSR) format
- On-proc/off-proc partitioning, required for efficient communication
  - 2 CSRs for each processor block row: on-proc data and off-proc data
- Column indices must be sorted per row (sketched below)
  - Merge sort routine, O(n log n), n = nonzeros per row
  - Not much work/time overhead, no added communication
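
For illustration, a minimal per-row sort of CSR column indices. The talk uses a merge sort; qsort is used here only to keep the sketch short, and the array names are illustrative.

  #include <stdlib.h>

  typedef struct { int col; double val; } Entry;

  static int cmp_entry(const void *a, const void *b)
  {
    return ((const Entry *)a)->col - ((const Entry *)b)->col;
  }

  /* rowptr/col/val are the usual CSR arrays for nrows local rows. */
  void csr_sort_rows(int nrows, const int *rowptr, int *col, double *val)
  {
    for (int r = 0; r < nrows; ++r) {
      int    nnz = rowptr[r + 1] - rowptr[r];
      Entry *tmp = malloc((size_t)nnz * sizeof *tmp);
      for (int k = 0; k < nnz; ++k) {
        tmp[k].col = col[rowptr[r] + k];
        tmp[k].val = val[rowptr[r] + k];
      }
      qsort(tmp, (size_t)nnz, sizeof *tmp, cmp_entry);  /* sort this row by column index */
      for (int k = 0; k < nnz; ++k) {
        col[rowptr[r] + k] = tmp[k].col;
        val[rowptr[r] + k] = tmp[k].val;
      }
      free(tmp);
    }
  }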

Minimizing Memory with PETSc
- PETSc CSR objects point to existing Fortran arrays/matrices (see the sketch below)
  - Vector: VecCreateMPIWithArray
  - Matrix: MatCreateMPIAIJWithSplitArrays
  - NIMROD developer has explicit control of memory
- KSPSolve
  - If PC = LU, the factorization is done in the first linear solve
  - Destroy the Fortran CSR matrix after the first solve
  - No separate call to perform the factorization; potential for higher memory overhead
- Internal copies of the matrix by PETSc
  - HYPRE BoomerAMG: copy for reordering (in contact with the HYPRE team to resolve this issue)
  - SuperLU: no extra copy
  - MUMPS: no extra copy
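
A hedged sketch of the zero-copy wrapping described above. The array names and the helper WrapNimrodData are illustrative, not NIMROD symbols; the index conventions for the on-proc/off-proc split are those documented for MatCreateMPIAIJWithSplitArrays, and the signatures follow recent PETSc (the 2010-era VecCreateMPIWithArray had no block-size argument).

  #include <petscksp.h>

  PetscErrorCode WrapNimrodData(PetscInt nloc, PetscInt Nglob,
                                PetscInt *di, PetscInt *dj, PetscScalar *da,  /* on-proc CSR  */
                                PetscInt *oi, PetscInt *oj, PetscScalar *oa,  /* off-proc CSR */
                                PetscScalar *xarr, PetscScalar *barr,
                                Mat *A, Vec *x, Vec *b)
  {
    /* Vectors that reuse NIMROD's flat arrays (block size 1), no copy made. */
    PetscCall(VecCreateMPIWithArray(PETSC_COMM_WORLD, 1, nloc, Nglob, xarr, x));
    PetscCall(VecCreateMPIWithArray(PETSC_COMM_WORLD, 1, nloc, Nglob, barr, b));

    /* Matrix that reuses the two per-process CSR structures, no copy made. */
    PetscCall(MatCreateMPIAIJWithSplitArrays(PETSC_COMM_WORLD, nloc, nloc, Nglob, Nglob,
                                             di, dj, da, oi, oj, oa, A));
    return 0;
  }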

Real Solve: Timings Are Comparable
- Test problem: ORNL case, 2 modes, 1 mode per layer, quartic polynomials, 8x5 elements per block, 100 time steps
- 1/100 real matrix creates vs. 100/100 complex creates
- [Timing chart: total solve time, matrix factorization (build LU), triangular solve LUx = b]

Complex Solve: PETSc Slower, but the Slowdown Is in the PETSc-SuperLU Interface
- Complex unrolling
- PETSc + MUMPS is competitive for larger problems; need to run even larger cases
- [Timing chart: total solve time, matrix factorization (build LU), triangular solve LUx = b]

Complex Sorting and Overhead Are Negligible
- Sorting: arranging data into CSR (or CSC), 10 do loops (slu_dstm, slu_dist)
- PETSc overhead: allocating the unrolled CSR, filling the expanded real matrix, sorting rows, creating PETSc objects
- PETSc overhead is negligible and scales

MUMPS Is Competitive
- Timings for the complex solve are close
- Multifrontal factorization driven by an assembly tree; advanced preprocessing of the matrix
- Command-line options, e.g. ICNTL(14) for the estimation of required memory: -mat_mumps_icntl_14 50
- Handles complex matrices: a direct complex interface would reduce the matrix size and lower the memory overhead
- (Figure shamelessly taken from a presentation by a MUMPS user)

Breakdown of MUMPS Timing
- Complex solve dominates
- PETSc overhead < complex sort

Complex Unrolling Doubles Memory
- TAU profiler: avg. MB/proc, memory high-water mark
- Complex unrolling of the CSR: 2x memory
- Factorization: potential for even more memory usage, since there is no direct call to compute the LU factorization

PETSc -> HYPRE -> BoomerAMG
- PETSc's interface to HYPRE provides BoomerAMG, an algebraic multigrid (AMG) preconditioner
- Coarsening based on strong connections
  - Strong/weak connections determined by a user-set threshold (knob)
  - Coarse grids determined by the strong/weak connections
- AMG can handle modest anisotropies
- Scalable to petascale: HYPRE's BoomerAMG has scaled to 125K processors for a 3D 7-point finite-difference method

BoomerAMG Works Well for Modest Anisotropies
- Test case: holmes case, nimruns/qpar/cases/holmes/hmpr003a, p_model = aniso1
- BoomerAMG applied to the temperature equation only; varied k_pll from 10^4 to 10^12
- AMG options (see the setup sketch below): strong threshold 0.25, relaxation type SOR/Jacobi, Gaussian elimination on the coarsest grid, 2 V-cycles per PC application, 2 relaxation sweeps per level
- As the anisotropy increases, HYPRE does a poorer job as a PC: more GMRES iterations are needed
- Here the anisotropy is out of the plane (of the plane solve), which affects the strong/weak connections; AMG does much worse with in-plane anisotropy
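
A hedged sketch of how such a BoomerAMG preconditioner is driven through PETSc. The options in the comment mirror the knobs listed above, but the exact option names follow recent PETSc releases and may differ slightly from the 2010 interface; SolveWithBoomerAMG is an illustrative name.

  #include <petscksp.h>

  /* GMRES preconditioned by HYPRE's BoomerAMG through PETSc. */
  PetscErrorCode SolveWithBoomerAMG(Mat A, Vec b, Vec x)
  {
    KSP ksp;
    PC  pc;

    PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
    PetscCall(KSPSetOperators(ksp, A, A));
    PetscCall(KSPSetType(ksp, KSPGMRES));
    PetscCall(KSPGetPC(ksp, &pc));
    PetscCall(PCSetType(pc, PCHYPRE));
    PetscCall(PCHYPRESetType(pc, "boomeramg"));
    /* Typically driven from the command line, e.g.:
     *   -pc_hypre_boomeramg_strong_threshold 0.25
     *   -pc_hypre_boomeramg_relax_type_all SOR/Jacobi
     *   -pc_hypre_boomeramg_relax_type_coarse Gaussian-elimination
     *   -pc_hypre_boomeramg_max_iter 2          (V-cycles per PC application)
     *   -pc_hypre_boomeramg_grid_sweeps_all 2   (relaxation sweeps per level)   */
    PetscCall(KSPSetFromOptions(ksp));
    PetscCall(KSPSolve(ksp, b, x));
    PetscCall(KSPDestroy(&ksp));
    return 0;
  }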

Conclusions
- PETSc offers command-line flexibility
  - Interface to external solvers
  - Ability to have different preconditioners for each field
  - Quick response/implementation from the developers
  - User issues: documentation, hidden copies
- Complex unrolling
  - Doubling of the matrix size wastes memory
  - Not much work/time overhead
- MUMPS is a winner
  - Almost as effective even for unrolled (larger) matrices
  - A direct complex interface will save solver time and memory
- HYPRE
  - Efficient solve for modest anisotropy
  - Not a magic bullet for large anisotropies

To Do / Future Work
- Sparse approximation to the preconditioner: when AMG converges, a sparse preconditioner can save memory (Chetan Jhurani)
- Complex AMG: strength of connections -> modulus of the complex entry; serial code (S. MacLachlan)
- Smoothed Aggregation Multigrid (SAMG): University of Colorado; Trilinos ML through PETSc
- Continuing JFNK work

Discretized Equations
- NIMROD currently uses a semi-implicit scheme
- Staggered in time and space (implicit leapfrog)
- Solves for updates rather than the fields themselves

Fully Implicit JFNK
- Fully implicit
  - Overcomes time-step limitations due to advection
  - Larger time steps -> faster time to solution (Chacon 2008)
  - Accuracy
- Iterative (Newton-type) method to solve the nonlinear system F(u) = 0
  - Typically use CG or GMRES for the linear solves
  - The action of the Jacobian (in building the Krylov subspace) is approximated; no need to form the analytical Jacobian (see the sketch below)
- Preconditioning is needed to attain reasonable convergence rates
  - The preconditioner is usually a simple approximation to the full Jacobian
  - Physics-based preconditioning (Chacon 2008)
  - Right-preconditioned GMRES
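
For reference, the Jacobian-free approximation underlying JFNK is a first-order finite difference of the nonlinear residual; the differencing-parameter choice shown is a common heuristic, not necessarily the one used here:

\[
J(u)\,v \;\approx\; \frac{F(u + \epsilon v) - F(u)}{\epsilon},
\qquad
\epsilon \;\sim\; \sqrt{\varepsilon_{\mathrm{mach}}}\;\frac{1 + \lVert u \rVert}{\lVert v \rVert} .
\]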

Approach to Applying JFNK within NIMROD
- Non-invasive: don't change the structure of the code
- Adapt to the existing routines; use as much existing functionality as possible
- Potentially save time
- Extend NIMROD so that other developers can use our work
- Interface with PETSc

Using PETSc for the Nonlinear Solve
- Portable, Extensible Toolkit for Scientific Computation, developed by Argonne National Lab
- Parallelization: distributed arrays
- KSP library: implementation of GMRES, preconditioners
- SNES library: nonlinear solvers, Jacobian-Free Newton-Krylov
- Shell routines (matrix-free) allow the use of NIMROD routines to evaluate functionals and apply preconditioners

Fully Implicit Solve: Velocity
- NIMROD currently has only linear solvers
- Discretized velocity equation: the 2nd-order perturbation is lagged and used in the residual for the next iteration
- Our approach:
  - Include the nonlinear term
  - Precondition GMRES using the existing linearized velocity operator
  - Calculate the action of the Jacobian by finite differencing the nonlinear residual

Velocity Results
- Tearing mode instability
- Convergence in 2-3 GMRES iterations
- Similar to the lagged method

Fully Implicit Solve: MHD System
- Apply the fully implicit JFNK method to the full MHD system
- Problems that couple density, temperature, and magnetic field to velocity, but with no cross-couplings
- Polytropic plasma: all fields coupled to velocity

Fully Implicit Solve: Crank-Nicolson
- Apply Crank-Nicolson to the discretized equations to solve for updates
- Evaluate all fields at the same time value
- Still staggered in space

Fully Implicit Solve: Preconditioner
- Linearize to compute the Jacobian
- Define the block operators used in the preconditioner

Full MHD System
- Following Chacon (2008), apply a block LDU factorization to the 2x2 block system and invert (see the factorization sketch earlier)
- Approximate the Schur complement with the ideal MHD operator, which contains all of the wave-propagation information
- The P_sf operators already exist in NIMROD (2D)

Scaling/Nondimensionalization
- NIMROD is coded in dimensional units; physical quantities have a disparate range of scales
- In JFNK (using GMRES), functional evaluations are used to determine the nonlinear iteration update
  - The magnitude of each residual determines the magnitude of the nonlinear update
  - Differences between the magnitude of the residual and the magnitude of the solution prevent convergence
- Need a conversion between nondimensional and dimensional variables

Scaling/Nondimensionalization
- Scale the equations to produce uniform residuals
- Write the dimensional equations with explicit dimensional factors
- Multiply the four nonlinear residual equations by the corresponding scaling factors

Scaling/Nondimensionalization
- Produces a nondimensional functional built from dimensional quantities
- The solution of this system is the dimensionless update
- Must re-dimensionalize the update to use the dimensional functional F
- The functional becomes a scaled, re-dimensionalized form of F (see the sketch below)
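
A hedged reconstruction of the scaled functional described above, with assumed diagonal scaling operators D_F (residual scaling) and D_u (re-dimensionalization of the update) that are notation introduced here, not NIMROD's:

\[
G(\tilde u) \;=\; D_F^{-1}\, F\!\left(D_u\,\tilde u\right),
\qquad
J_G \;=\; \frac{\partial G}{\partial \tilde u} \;=\; D_F^{-1}\, J_F\, D_u ,
\]

where $\tilde u$ is the dimensionless update, $D_u \tilde u$ re-dimensionalizes it so the dimensional functional F (already implemented in NIMROD) can be evaluated, and $D_F^{-1}$ scales the residual equations to comparable magnitudes.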

Scaling/Nondimensionalization
- The nondimensional functional G yields residuals that are all within an order of magnitude
- Re-dimensionalizing ensures that each unknown is of the proper order
- Modified Jacobian for the dimensionless functional
  - The dimensional Jacobian is based on the dimensional equations, already implemented in NIMROD
- Leverage functionals and matrices already implemented in NIMROD; wrap the PETSc routines with these scalings

Preconditioning
- Diagonal inversions are already coded in NIMROD
- Need to apply the upper (U) and lower (L) off-diagonal parts
- Use the existing matrix-free rhs routines in NIMROD (see the sketch below)
  - For the L part: temporarily set variables to zero and use the functional for V
  - Similarly for the U part: temporarily set variables to zero and use the functional for n, T, B
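
As a hedged illustration of the zeroing trick (generic block labels, introduced here): write the linearized system over the velocity unknowns $v$ and the remaining fields $s = (n, T, B)$ as

\[
J \begin{pmatrix} v \\ s \end{pmatrix}
=
\begin{pmatrix} D_v & C_{vs} \\ C_{sv} & D_s \end{pmatrix}
\begin{pmatrix} v \\ s \end{pmatrix} .
\]

The action of an off-diagonal block on a vector is then obtained by evaluating the corresponding equation's matrix-free residual routine with the same-field components of the input temporarily set to zero, which is exactly the zeroing described above; only the diagonal blocks $D_v$ and $D_s$ need assembled inverses.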

Status
- The nonlinear functional is computed
- On-the-fly nondimensionalization works
  - Produces residuals all within one order of magnitude
  - Error equally distributed across equations
  - All variables are equally modified by the nonlinear updates
- GMRES with no preconditioning works, but convergence is terribly slow
- In progress: finishing the preconditioner L and U terms

To Do / Future Work
- Velocity solve works: convergence results
- Current challenges: complete the preconditioning for the full MHD system
- Future work:
  - 3D: complex (Fourier) coefficients, same (axisymmetric) preconditioner
  - Efficient method of applying the preconditioner (multigrid to apply)
  - Parallelization: handled by NIMROD + PETSc