Hybrid Analog-Digital Solution of Nonlinear Partial Differential Equations

Similar documents
A Crash Course on Fast Multipole Method (FMM)

HOW TO DEVELOP A TOP- TEN ALGORITHM IN THE 21ST CENTURY

Lecture 1 Numerical methods: principles, algorithms and applications: an introduction

Lecture Note 18: Duality

MATH20602 Numerical Analysis 1

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 0

MATH20602 Numerical Analysis 1

Linear Solvers. Andrew Hazel

Some notes about PDEs. -Bill Green Nov. 2015

Motivation: Sparse matrices and numerical PDE's

1 Conjugate gradients

From Stationary Methods to Krylov Subspaces

COURSE DESCRIPTIONS. 1 of 5 8/21/2008 3:15 PM. (S) = Spring and (F) = Fall. All courses are 3 semester hours, unless otherwise noted.

Numerical Methods for Engineers

A Robust Preconditioned Iterative Method for the Navier-Stokes Equations with High Reynolds Numbers

The Conjugate Gradient Method

Faster Kinetics: Accelerate Your Finite-Rate Combustion Simulation with GPUs

Contents. Preface... xi. Introduction...

Newton s Method and Efficient, Robust Variants

Lecture 17: Iterative Methods and Sparse Linear Algebra

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems

Preconditioners for the incompressible Navier Stokes equations

Calculus of Variations and Computer Vision

Efficient Augmented Lagrangian-type Preconditioning for the Oseen Problem using Grad-Div Stabilization

Solving Large Nonlinear Sparse Systems

Review for Exam 2 Ben Wang and Mark Styczynski

Chapter 9 Implicit integration, incompressible flows

PDE Solvers for Fluid Flow

Lecture 8: Fast Linear Solvers (Part 7)

M.A. Botchev. September 5, 2014

Numerical Nonlinear Optimization with WORHP

Boundary Value Problems - Solving 3-D Finite-Difference problems Jacob White

Introduction to Scientific Computing

Application of a Non-Linear Frequency Domain Solver to the Euler and Navier-Stokes Equations

Lecture 11: CMSC 878R/AMSC698R. Iterative Methods An introduction. Outline. Inverse, LU decomposition, Cholesky, SVD, etc.

FINITE-DIMENSIONAL LINEAR ALGEBRA

Modeling, Simulating and Rendering Fluids. Thanks to Ron Fediw et al, Jos Stam, Henrik Jensen, Ryan

Kasetsart University Workshop. Multigrid methods: An introduction

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)

MA3025 Course Prerequisites

Algebraic Multigrid as Solvers and as Preconditioner

Fluid Animation. Christopher Batty November 17, 2011

Numerical Methods I Non-Square and Sparse Linear Systems

Dispersed Multiphase Flow Modeling using Lagrange Particle Tracking Methods Dr. Markus Braun Ansys Germany GmbH

Applied Numerical Analysis

An Overview of Fluid Animation. Christopher Batty March 11, 2014

Application of Chimera Grids in Rotational Flow

HIGH ACCURACY NUMERICAL METHODS FOR THE SOLUTION OF NON-LINEAR BOUNDARY VALUE PROBLEMS

Chapter 6. Finite Element Method. Literature: (tiny selection from an enormous number of publications)

Numerical Mathematics

Conjugate Gradient Method

APPLIED NUMERICAL LINEAR ALGEBRA

The solution of the discretized incompressible Navier-Stokes equations with iterative methods

( ) Notes. Fluid mechanics. Inviscid Euler model. Lagrangian viewpoint. " = " x,t,#, #

2.29 Numerical Fluid Mechanics Fall 2011 Lecture 7

Monte Carlo Methods. Part I: Introduction

Lecture 9 Approximations of Laplace s Equation, Finite Element Method. Mathématiques appliquées (MATH0504-1) B. Dewals, C.

Numerical Methods with MATLAB

NUMERICAL COMPUTATION IN SCIENCE AND ENGINEERING

Fast Multipole Methods: Fundamentals & Applications. Ramani Duraiswami Nail A. Gumerov

University of Maryland Department of Computer Science TR-5009 University of Maryland Institute for Advanced Computer Studies TR April 2012

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix

On the choice of abstract projection vectors for second level preconditioners

TAU Solver Improvement [Implicit methods]

The Chemical Kinetics Time Step a detailed lecture. Andrew Conley ACOM Division

The Deflation Accelerated Schwarz Method for CFD

An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84

NUMERICAL MATHEMATICS AND COMPUTING

Index. higher order methods, 52 nonlinear, 36 with variable coefficients, 34 Burgers equation, 234 BVP, see boundary value problems

DM545 Linear and Integer Programming. Lecture 7 Revised Simplex Method. Marco Chiarandini

Solving Symmetric Indefinite Systems with Symmetric Positive Definite Preconditioners

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods

Optimization with COMSOL Multiphysics COMSOL Tokyo Conference Walter Frei, PhD Applications Engineer

A Novel Approach for Solving the Power Flow Equations

EAD 115. Numerical Solution of Engineering and Scientific Problems. David M. Rocke Department of Applied Science

Numerical Methods in Aerodynamics. Turbulence Modeling. Lecture 5: Turbulence modeling

Computational Methods

A THEORETICAL INTRODUCTION TO NUMERICAL ANALYSIS

Finding Rightmost Eigenvalues of Large, Sparse, Nonsymmetric Parameterized Eigenvalue Problems

High Performance Nonlinear Solvers

An Efficient Low Memory Implicit DG Algorithm for Time Dependent Problems

Index. Index. More information. in this web service Cambridge University Press

Numerical Methods for PDE-Constrained Optimization

Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics)

Acceleration of Time Integration

WHEN studying distributed simulations of power systems,

THE problem of phase noise and its influence on oscillators

A CUDA Solver for Helmholtz Equation

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Accelerating Model Reduction of Large Linear Systems with Graphics Processors

Engineering. Mathematics. GATE 2019 and ESE 2019 Prelims. For. Comprehensive Theory with Solved Examples

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning

Numerical Solution Techniques in Mechanical and Aerospace Engineering

Accelerated Solvers for CFD

Engineering Mathematics

Incomplete Cholesky preconditioners that exploit the low-rank property

Poisson Equation in 2D

Key words. conjugate gradients, normwise backward error, incremental norm estimation.

Revised Simplex Method

Transcription:

Hybrid Analog-Digital Solution of Nonlinear Partial Differential Equations Yipeng Huang, Ning Guo, Kyle Mandli, Mingoo Seok, Yannis Tsividis, Simha Sethumadhavan Columbia University

Hybrid Analog-Digital Solution of Nonlinear Partial Differential Equations Yipeng Huang, Ning Guo, Kyle Mandli, Mingoo Seok, Yannis Tsividis, Simha Sethumadhavan Columbia University

2000 The Best of the 20th Century: Editors Name Top 10 Algorithms Society for Industrial and Applied Mathematics Compiled by Barry Cipra 2016 Most mentioned in The Princeton Companion to Applied Mathematics Compiled by Nick Higham - Newton methods Matrix factorizations (e.g., LU decomposition) Eigenvalue algorithms (e.g., QR algorithm) Monte Carlo method Fast Fourier transform Krylov subspace iteration methods (e.g., conjugate gradients) Simplex method for linear programming Others: Fortran compiler, Quicksort, integer relation detection, fast multipole Others: JPEG, PageRank, Kalman filter

2000 The Best of the 20th Century: Editors Name Top 10 Algorithms Society for Industrial and Applied Mathematics Compiled by Barry Cipra 2016 Most mentioned in The Princeton Companion to Applied Mathematics Compiled by Nick Higham - Newton methods Matrix factorizations (e.g., LU decomposition) Eigenvalue algorithms (e.g., QR algorithm) Monte Carlo method Fast Fourier transform Krylov subspace iteration methods (e.g., conjugate gradients) Simplex method for linear programming Others: Fortran compiler, Quicksort, integer relation detection, fast multipole Others: JPEG, PageRank, Kalman filter

2000 The Best of the 20th Century: Editors Name Top 10 Algorithms Society for Industrial and Applied Mathematics Compiled by Barry Cipra 2016 Most mentioned in The Princeton Companion to Applied Mathematics Compiled by Nick Higham - Newton methods Matrix factorizations (e.g., LU decomposition) Eigenvalue algorithms (e.g., QR algorithm) Monte Carlo method Fast Fourier transform Krylov subspace iteration methods (e.g., conjugate gradients) Simplex method for linear programming Others: Fortran compiler, Quicksort, integer relation detection, fast multipole Others: JPEG, PageRank, Kalman filter

Discipline Nonlinear PDEs Cause of nonlinear behavior Applied physics Plasma & fluids Multiphysics Applied mathematics Nonlinear waves Dispersive effects Chemical engineering Combustion models Multiphysics Civil engineering Solid mechanics Nonlinear spring forces Electrical engineering Circuit simulation Nonlinear components Operations research Convex optimization Nonlinear objectives & constraints Mechanical engineering Optimal control Nonlinear Euler-Lagrange equations

Linear Nonlinear Simple map Check map often Few turns Need to start close

Hybrid Analog-Digital Solution of Nonlinear Partial Differential Equations Yipeng Huang, Ning Guo, Kyle Mandli, Mingoo Seok, Yannis Tsividis, Simha Sethumadhavan Columbia University

Motivation for hybrid analog-digital architecture Ability to solve nonlinear problems Solution accuracy & precision Technique diversity Problem sizes Analog Digital Both analog and digital needed for target workload

Outline Motivation: Analog avoids digital pitfalls in nonlinear problems Architecture: Hybrid analog-digital system combines both for speed & precision How to map a PDE problem to a prototype analog accelerator Evaluation: 100 faster for approximate solution 5.7 faster and 11.6 less energy when analog helps GPU

Outline Motivation: Analog avoids digital pitfalls in nonlinear problems Architecture: Hybrid analog-digital system combines both for speed & precision How to map a PDE problem to a prototype analog accelerator Evaluation: 100 faster for approximate solution 5.7 faster and 11.6 less energy when analog helps GPU

u = 1 2 + 3 2 i Solve for u: f u = u $ 1 = 0 u = 1 u = 1 2 3 2 i

u = 1 2 + 3 2 i Solve for u: f u = u $ 1 = 0 nonlinear u = 1 u = 1 2 3 2 i

Solve for u: f u = u $ 1 = 0 u = 1 2 + 3 2 i Using Newton s method: u -./0 = u 1233 f u 1233 f 4 u 1233 u = 1 With some initial guess u 5 Returns one of three final solutions u = 1 2 3 2 i

Solve for u: f u = u $ 1 = 0 u = 1 2 + 3 2 i Using Newton s method: u next = u curr f u curr f 4 u curr u = 1 With some initial guess u 5 Returns one of three final solutions u = 1 2 3 2 i

Solve for u: f u = u $ 1 = 0 u = 1 2 + 3 2 i Using Newton s method: u -./0 = u 1233 f u 1233 f 4 u 1233 u = 1 With some initial guess u 5 Returns one of three final solutions u = 1 2 3 2 i

Solve for u: f u = u $ 1 = 0 Using Newton s method: u -./0 = u 1233 u $ 1233 1 B 3u 1233 With initial guess: u 5 = 1.5 + 2i Returns final solution: u EF-GH = 1 2 + 3 2 i

Solve for u: f u = u $ 1 = 0 Using Newton s method: u -./0 = u 1233 u $ 1233 1 B 3u 1233 With initial guess: u 5 = 1.5 + 2i Returns final solution: u EF-GH = 1

Plot of final solution vs. initial guess for Newton s method Three colors indicate which of the three roots the initial guesses go to Like hiking in wilderness, important to start close to destination or else easy to get lost http://dept.cs.williams.edu/~heeringa/classes/cs135/s15/labs/lab3/

Solve for u: f u = u $ 1 = 0 Using damped Newton s method: u -./0 = u 1233 1 u$ 1233 1 2 B 3u 1233 With initial guess: u 5 = 1.5 + 2i Returns final solution: u EF-GH = 1

Solve for u: f u = u $ 1 = 0 Using damped Newton s method: u -./0 = u 1233 1 u$ 1233 1 8 B 3u 1233 With initial guess: u 5 = 1.5 + 2i Returns final solution: u EF-GH = 1

Like hiking in wilderness, better to check map often Don t go too far between checking map, or else easy to get lost With smaller damping parameter, plot becomes more contiguous at the cost of more digital computation time José M. Gutiérrez, "Numerical Properties of Different Root-Finding Algorithms Obtained for Approximating Continuous Newton s Method," Algorithms 2015

Outline Motivation: Analog avoids digital pitfalls in nonlinear problems Architecture: Hybrid analog-digital system combines both for speed & precision How to map a PDE problem to a prototype analog accelerator Evaluation: 100 faster for approximate solution 5.7 faster and 11.6 less energy when analog helps GPU

Push damped Newton s method to limit, get continuous Newton s method Take (nearly) infinitely many (nearly) infinitesimal steps Like continuously checking map while hiking in wilderness

Push damped Newton s method to limit, get continuous Newton s method Take (nearly) infinitely many (nearly) infinitesimal steps

Push damped Newton s method to limit, get continuous Newton s method Take (nearly) infinitely many (nearly) infinitesimal steps Integrators store present guess

Push damped Newton s method to limit, get continuous Newton s method Take (nearly) infinitely many (nearly) infinitesimal steps Multipliers and summers find derivative & function values

Push damped Newton s method to limit, get continuous Newton s method Take (nearly) infinitely many (nearly) infinitesimal steps Negative feedback solves linear algebra problem Huang et al., ISCA 2016

Push damped Newton s method to limit, get continuous Newton s method Take (nearly) infinitely many (nearly) infinitesimal steps Integrators accumulate increments for Newton s method

Push damped Newton s method to limit, get continuous Newton s method Take (nearly) infinitely many (nearly) infinitesimal steps Values: analog current and voltage Works continuously: no clock or steps

Push damped Newton s method to limit, get continuous Newton s method Take (nearly) infinitely many (nearly) infinitesimal steps Inner loop for linear algebra Takeaway 1 of 4: how to do more work in analog? Create analog inner loops by nesting circuits with different convergence rates

Push damped Newton s method to limit, get continuous Newton s method Take (nearly) infinitely many (nearly) infinitesimal steps Outer loop for Newton s method invokes inner loop Takeaway 1 of 4: how to do more work in analog? Create analog inner loops by nesting circuits with different convergence rates

Outline Motivation: Analog avoids digital pitfalls in nonlinear problems Architecture: Hybrid analog-digital system combines both for speed & precision How to map a PDE problem to a prototype analog accelerator Evaluation: 100 faster for approximate solution 5.7 faster and 11.6 less energy when analog helps GPU

Linear Nonlinear L2_norm error 1.00E+03 1.00E+01 1.00E-01 1.00E-03 1.00E-05 1.00E-07 1.00E-09 1.00E-11 1.00E-13 1.00E-15 1.00E-17 0 5 10 15 20 25 30 35 iterations newton First few digits of solution cheap Rough guess solution expensive Last few digits of solution expensive Refined solution from guess cheap

Linear Nonlinear L2_norm error 1.00E+03 1.00E+01 1.00E-01 1.00E-03 1.00E-05 1.00E-07 1.00E-09 1.00E-11 1.00E-13 1.00E-15 1.00E-17 0 5 10 15 20 25 30 35 iterations newton First few digits of solution cheap Rough guess solution expensive Last few digits of solution expensive Refined solution from guess cheap

Linear Nonlinear Time savings with good seed L2_norm error 1.00E+03 1.00E+01 1.00E-01 1.00E-03 1.00E-05 1.00E-07 1.00E-09 1.00E-11 1.00E-13 1.00E-15 1.00E-17 Time savings with good seed 0 5 10 15 20 25 30 35 iterations newton First few digits of solution cheap Rough guess solution expensive Last few digits of solution expensive Refined solution from guess cheap

Solve time for digital and seeded digital solver 2 Solution time vs. Reynolds number for digital and seeded digital solver Time to convergence (s) 1.5 1 0.5 0 0.81 0.08 0.06 0.07 0.06 0.08 0.06 0.07 0.06 0.08 0.06 0.15 0.08 0.09 0.08 0.10 0.08 0.05 0.01 0.02 0.03 0.06 0.13 0.25 0.50 1.00 2.00 More Reynolds nonlinear number baseline digital solver analog seeded digital solver Takeaway 2 of 4: how to increase solution accuracy? Cheap analog approximation good digital initial guess

Outline Motivation: Analog avoids digital pitfalls in nonlinear problems Architecture: Hybrid analog-digital system combines both for speed & precision How to map a PDE problem to a prototype analog accelerator Evaluation: 100 faster for approximate solution 5.7 faster and 11.6 less energy when analog helps GPU

Digital: lots of methods and code for breaking down PDEs We want to keep existing code while using accelerator Analog: can only solve ordinary differential equations and algorithms phrased as ODEs, such as continuous Newton s Takeaway 3 of 4: how to minimize reprogramming? Retarget kernels in existing digital code to accelerate in analog

Case study: viscous 2D Burgers equation Core nonlinear part of the Navier-Stokes equations for fluid dynamics u and v are velocity of fluid in x- and y- dimensions, respectively

Case study: viscous 2D Burgers equation Core nonlinear part of the Navier-Stokes equations for fluid dynamics u and v are velocity of fluid in x- and y- dimensions, respectively Why nonlinear: because derivatives of u and v have u and v as coefficients

Case study: viscous 2D Burgers equation Core nonlinear part of the Navier-Stokes equations for fluid dynamics u and v are velocity of fluid in x- and y- dimensions, respectively Why nonlinear: because derivatives of u and v have u and v as coefficients how nonlinear the PDE is depends on choice of Re, Reynolds number

Case study: viscous 2D Burgers equation Core nonlinear part of the Navier-Stokes equations for fluid dynamics u and v are velocity of fluid in x- and y- dimensions, respectively Why nonlinear: because derivatives of u and v have u and v as coefficients how nonlinear the PDE is depends on choice of Re, Reynolds number To solve this PDE, do space discretization and time stepping results in system of nonlinear algebraic equations, which we solve in analog

Prototype analog accelerator for nonlinear systems of equations Tiled programmable analog accelerator Enhanced calibration for all analog units

Prototype analog accelerator for nonlinear systems of equations Tiled programmable analog accelerator Enhanced calibration for all analog units Analog in/out for multi-chip operation Sparse global connectivity b/w tiles for PDEs; Dense local connectivity b/w function units for variety of nonlinear functions, derivatives

Prototype analog accelerator for nonlinear systems of equations Tiled programmable analog accelerator Enhanced calibration for all analog units Analog in/out for multi-chip operation Sparse global connectivity b/w tiles for PDEs; Dense local connectivity b/w function units for variety of nonlinear functions, derivatives Solve 2D Burgers equation with 2 chips x-velocity field in one chip y-velocity field in the other

Outline Motivation: Analog avoids digital pitfalls in nonlinear problems Architecture: Hybrid analog-digital system combines both for speed & precision How to map a PDE problem to a prototype analog accelerator Evaluation: 100 faster for approximate solution 5.7 faster and 11.6 less energy when analog helps GPU

Performance simulation of scaled-up analog accelerators Analog accelerator performance vs. digital for larger problem sizes Plot of solution time to same accuracy vs. how nonlinear problem is

Performance simulation of scaled-up analog accelerators Analog accelerator performance vs. digital for larger problem sizes Plot of solution time to same accuracy vs. how nonlinear problem is Analog: solution time for 2-by-2 grid validated using prototype chip Digital: solution time from damped Newton s method solver

Performance simulation of scaled-up analog accelerators 100 less time Analog accelerator performance vs. digital for larger problem sizes Plot of solution time to same accuracy vs. how nonlinear problem is Analog: solution time for 2-by-2 grid validated using prototype chip Digital: solution time from damped Newton s method solver

Outline Motivation: Analog avoids digital pitfalls in nonlinear problems Architecture: Hybrid analog-digital system combines both for speed & precision How to map a PDE problem to a prototype analog accelerator Evaluation: 100 faster for approximate solution 5.7 faster and 11.6 less energy when analog helps GPU

Using cheap analog approximation to reduce GPU solution time & energy Plot of solution time & energy vs. problem size 32 32 2D problem has 2048 variables at play

Using cheap analog approximation to reduce GPU solution time & energy Plot of solution time & energy vs. problem size 32 32 2D problem has 2048 variables at play Comparing: GPU alone to high precision Analog approximation GPU with analog approx. to high precision

Using cheap analog approximation to reduce GPU solution time & energy 5.7 less time Plot of solution time & energy vs. problem size 32 32 2D problem has 2048 variables at play Comparing: GPU alone to high precision Analog approximation GPU with analog approx. to high precision 11.6 less energy

Using cheap analog approximation to reduce GPU solution time & energy 5.7 less time Plot of solution time & energy vs. problem size 32 32 2D problem has 2048 variables at play Comparing: GPU alone to high precision Analog approximation GPU with analog approx. to high precision 11.6 less energy Time and energy saved here is significant PDE solver workloads repeatedly call these innermost loops as dominant kernel

Using cheap analog approximation to reduce GPU solution time & energy Scaling out beyond analog size constraint? 16-by-16 chip uses 350mm2, 400mW Takeaway 4 of 4: how to increase problem size? Extremely low power analog can scale out and stack differently than digital

Nonlinear is analog killer app!

Nonlinear is analog killer app! Why it works: How to do more work in analog? Create analog inner loops by nesting circuits with different convergence How to increase solution accuracy? Cheap analog approximation good digital initial guess How to minimize reprogramming? Retarget kernels in existing digital code to accelerate in analog How to increase problem size? Extremely low power analog can scale out and stack differently than digital