H²-matrices with adaptive bases

H²-matrices with adaptive bases
Steffen Börm
MPI für Mathematik in den Naturwissenschaften
Inselstraße 22-26, 04103 Leipzig
http://www.mis.mpg.de/

Problem

Goal: Treat certain large dense matrices in linear complexity.

Model problem: Discretization of integral operators of the form

  K[u](x) = ∫_Ω κ(x, y) u(y) dy.

Discretization: A Galerkin method with FE basis (φ_i)_{i=1}^n leads to a matrix K ∈ ℝ^{n×n} with

  K_{ij} = ⟨φ_i, K[φ_j]⟩_{L²} = ∫_Ω ∫_Ω φ_i(x) κ(x, y) φ_j(y) dy dx   (1 ≤ i, j ≤ n).

Problem: K is dense.

Idea: Use properties of the kernel function to approximate K.

Examples: Wavelets, multipole expansion, interpolation.
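To make the model problem concrete, here is a small sketch (my own toy illustration, not from the slides): a one-dimensional logarithmic kernel on Ω = [0, 1], discretized with piecewise-constant functions on n equal panels and a one-point midpoint quadrature rule. Even this crude discretization produces a fully populated matrix.

```python
import numpy as np

# Toy Galerkin-type matrix for kappa(x, y) = -log|x - y| on [0, 1] with
# n piecewise-constant panels; a one-point midpoint rule approximates the
# double integral, so K_ij ~ h^2 * kappa(x_i, x_j).  The diagonal is set
# to zero to avoid the singularity (a real code uses singular quadrature).
def kernel_matrix(n):
    h = 1.0 / n
    x = (np.arange(n) + 0.5) * h                # panel midpoints
    X, Y = np.meshgrid(x, x, indexing="ij")
    K = np.zeros((n, n))
    off = X != Y
    K[off] = -h * h * np.log(np.abs(X[off] - Y[off]))
    return K

K = kernel_matrix(64)
print(K.shape)   # every off-diagonal entry is nonzero: K is dense
```

Storing and applying such a matrix costs O(n²), which is what the H²-matrix machinery below reduces to O(nk).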

Interpolation

Idea: We consider a subdomain τ × σ ⊆ Ω × Ω and replace the kernel function κ by an interpolant

  κ̃^{τ,σ}(x, y) := Σ_{ν=1}^{k} Σ_{μ=1}^{k} κ(x^τ_ν, x^σ_μ) L^τ_ν(x) L^σ_μ(y)   (k ≪ n).

Result: The local matrix takes factorized form:

  K̃^{τ,σ}_{ij} := ∫_τ ∫_σ φ_i(x) κ̃^{τ,σ}(x, y) φ_j(y) dy dx
               = Σ_{ν=1}^{k} Σ_{μ=1}^{k} κ(x^τ_ν, x^σ_μ) ∫_τ φ_i(x) L^τ_ν(x) dx ∫_σ φ_j(y) L^σ_μ(y) dy
               = (V^τ S^{τ,σ} (V^σ)^*)_{ij}

with S^{τ,σ}_{νμ} := κ(x^τ_ν, x^σ_μ), V^τ_{iν} := ∫_τ φ_i(x) L^τ_ν(x) dx, V^σ_{jμ} := ∫_σ φ_j(y) L^σ_μ(y) dy.
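A numerical sketch of this factorization (a hypothetical 1D example of my own; for simplicity V^τ holds point evaluations of the Lagrange polynomials rather than integrals against FE basis functions):

```python
import numpy as np

# Interpolate kappa(x, y) = 1/|x - y| on the well-separated pair
# tau = [0, 1], sigma = [3, 4] at Chebyshev nodes and compare the
# factorized block V_t @ S @ V_s.T against the exact kernel matrix.
def cheb_nodes(a, b, m):
    t = np.cos(np.pi * (2 * np.arange(m) + 1) / (2 * m))
    return 0.5 * (a + b) + 0.5 * (b - a) * t

def lagrange(pts, nodes):
    # L[i, nu] = nu-th Lagrange polynomial of `nodes` evaluated at pts[i]
    L = np.ones((len(pts), len(nodes)))
    for nu in range(len(nodes)):
        for mu in range(len(nodes)):
            if mu != nu:
                L[:, nu] *= (pts - nodes[mu]) / (nodes[nu] - nodes[mu])
    return L

m = 8
xt, xs = cheb_nodes(0.0, 1.0, m), cheb_nodes(3.0, 4.0, m)
x = np.linspace(0.0, 1.0, 50)                      # points in tau
y = np.linspace(3.0, 4.0, 50)                      # points in sigma

K = 1.0 / np.abs(x[:, None] - y[None, :])          # exact block
S = 1.0 / np.abs(xt[:, None] - xs[None, :])        # S_{nu mu} = kappa(x_nu, x_mu)
V_t, V_s = lagrange(x, xt), lagrange(y, xs)
K_approx = V_t @ S @ V_s.T                         # rank-m factorized block

print(np.abs(K - K_approx).max())                  # small interpolation error
```

The rank-m factorization needs only m² kernel values and two thin m-column matrices instead of all 2500 entries of the block.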

Local interpolation

Problem: Typical kernel functions are not globally smooth.

Idea: They are locally smooth outside of the diagonal x = y. Examples: 1/|x − y|, log|x − y|.

Error bound: For m-th order tensor interpolation on axis-parallel boxes B^τ ⊇ τ and B^σ ⊇ σ, we get

  ‖κ − κ̃^{τ,σ}‖_{L^∞(B^τ × B^σ)} ≤ C_apx(m) (1 + 2 dist(B^τ, B^σ) / diam(B^τ × B^σ))^{−m}.

Rank: m interpolation points per coordinate ⇒ k = m^d.

Admissibility: Uniform exponential convergence requires

  diam(B^τ × B^σ) ≤ 2η dist(B^τ, B^σ).
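The admissibility condition is cheap to test in code. A minimal sketch (my own; it uses the common variant max(diam B^τ, diam B^σ) ≤ 2η·dist rather than the product-box diameter stated above):

```python
import numpy as np

def admissible(box_t, box_s, eta=1.0):
    # Boxes are pairs (lo, hi) of d-vectors describing axis-parallel boxes.
    # Variant of the slide's condition: max of the two box diameters must be
    # bounded by 2*eta times the distance between the boxes.
    lo_t, hi_t = (np.asarray(v, float) for v in box_t)
    lo_s, hi_s = (np.asarray(v, float) for v in box_s)
    diam = max(np.linalg.norm(hi_t - lo_t), np.linalg.norm(hi_s - lo_s))
    gap = np.maximum(0.0, np.maximum(lo_s - hi_t, lo_t - hi_s))  # per-axis gap
    return bool(diam <= 2.0 * eta * np.linalg.norm(gap))

print(admissible(([0, 0], [1, 1]), ([3, 0], [4, 1])))  # well separated
print(admissible(([0, 0], [1, 1]), ([1, 0], [2, 1])))  # touching boxes
```

Touching boxes have distance zero, so they can never be admissible; they end up in the nearfield of the block partition described next.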

Block structure

Approach: Split Ω × Ω into admissible subdomains and a small remainder.

Cluster tree: Hierarchy of subdomains of Ω.
Nearfield: Small blocks, stored in standard format.
Farfield: Admissible blocks, stored in V^τ S^{τ,σ} (V^σ)^* format.
Partition: Collection of far- and nearfield blocks.

Construction of the cluster tree and block partition can be accomplished by general algorithms in O(n) operations.

Nested bases

Goal: Treat the cluster bases (V^τ)_{τ ∈ T} efficiently.

Idea: For a cluster τ and a son τ' ∈ sons(τ), we have

  L^τ_ν = Σ_{ν'=1}^{k} L^τ_ν(x^{τ'}_{ν'}) L^{τ'}_{ν'}  on τ'   (1 ≤ ν ≤ k).

Nested bases: Consider i ∈ τ' ⊆ τ. Setting T^{τ'}_{ν'ν} := L^τ_ν(x^{τ'}_{ν'}), we find

  V^τ_{iν} = ∫ φ_i(x) L^τ_ν(x) dx = Σ_{ν'=1}^{k} T^{τ'}_{ν'ν} ∫ φ_i(x) L^{τ'}_{ν'}(x) dx = (V^{τ'} T^{τ'})_{iν}.

Consequence: Store V^τ only for leaves and use the transfer matrices T^{τ'} for all other clusters.

Complexity: Storage O(nk), matrix-vector multiplication O(nk).
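The transfer-matrix identity can be verified numerically. A sketch of my own (again using point evaluations in place of integrals against φ_i): a parent Lagrange basis of degree m−1 is reproduced exactly by the child basis, since m-point interpolation is exact on polynomials of degree < m.

```python
import numpy as np

# Check V^tau|_{tau'} = V^{tau'} T^{tau'} with
# T^{tau'}_{nu' nu} = L^tau_nu(x^{tau'}_{nu'}).
def cheb_nodes(a, b, m):
    t = np.cos(np.pi * (2 * np.arange(m) + 1) / (2 * m))
    return 0.5 * (a + b) + 0.5 * (b - a) * t

def lagrange(pts, nodes):
    # L[i, nu] = nu-th Lagrange polynomial of `nodes` evaluated at pts[i]
    L = np.ones((len(pts), len(nodes)))
    for nu in range(len(nodes)):
        for mu in range(len(nodes)):
            if mu != nu:
                L[:, nu] *= (pts - nodes[mu]) / (nodes[nu] - nodes[mu])
    return L

m = 5
x_parent = cheb_nodes(0.0, 1.0, m)           # nodes of tau = [0, 1]
x_child = cheb_nodes(0.0, 0.5, m)            # nodes of the left son tau'
pts = np.linspace(0.0, 0.5, 40)              # "indices" i located in tau'

V_parent = lagrange(pts, x_parent)           # rows of V^tau belonging to tau'
T_child = lagrange(x_child, x_parent)        # transfer matrix T^{tau'}
V_child = lagrange(pts, x_child)
print(np.allclose(V_parent, V_child @ T_child))  # bases are exactly nested
```

Because only leaf matrices V^τ and the small k × k transfer matrices are stored, the total storage drops from O(nk log n) to O(nk).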

Orthogonalization

Problem: Efficiency depends on the rank k. Can we reduce this rank by eliminating redundant expansion functions?

Idea: Orthogonalize the columns of V^τ.

Gram-Schmidt: Find Ṽ^τ = V^τ Z^τ such that (Ṽ^τ)^* Ṽ^τ = I holds.

Nested structure: Using P^τ with V^τ = Ṽ^τ P^τ and sons τ₁, τ₂, we find

  V^τ = [ V^{τ₁} T^{τ₁} ]   [ Ṽ^{τ₁} P^{τ₁} T^{τ₁} ]   [ Ṽ^{τ₁}        ] [ P^{τ₁} T^{τ₁} ]
        [ V^{τ₂} T^{τ₂} ] = [ Ṽ^{τ₂} P^{τ₂} T^{τ₂} ] = [        Ṽ^{τ₂} ] [ P^{τ₂} T^{τ₂} ],

so it is sufficient to orthogonalize a (2k) × k matrix. The resulting cluster basis is nested due to

  Ṽ^τ = V^τ Z^τ = [ Ṽ^{τ₁} (P^{τ₁} T^{τ₁} Z^τ) ] = [ Ṽ^{τ₁} T̃^{τ₁} ]
                  [ Ṽ^{τ₂} (P^{τ₂} T^{τ₂} Z^τ) ]   [ Ṽ^{τ₂} T̃^{τ₂} ].

Result: Complexity O(nk²).
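A small random example of my own, using QR factorizations as the concrete Gram-Schmidt procedure (the QR factor R plays the role of P^τ above, i.e. V = Ṽ P): only the stacked (2k) × k matrix is factorized at the parent, never the tall matrix V^τ itself.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
# Leaf bases and transfer matrices of the two sons (random stand-ins)
V1, V2 = rng.standard_normal((30, k)), rng.standard_normal((30, k))
T1, T2 = rng.standard_normal((k, k)), rng.standard_normal((k, k))

# Orthogonalize the sons: V = Vtilde @ P with orthonormal columns Vtilde
Vt1, P1 = np.linalg.qr(V1)
Vt2, P2 = np.linalg.qr(V2)

# Parent: QR of the small stacked (2k) x k matrix [P1 T1; P2 T2]
Q, R = np.linalg.qr(np.vstack([P1 @ T1, P2 @ T2]))
Tt1, Tt2 = Q[:k], Q[k:]                        # new transfer matrices
Vt_parent = np.vstack([Vt1 @ Tt1, Vt2 @ Tt2])  # nested orthogonal parent basis

V_parent = np.vstack([V1 @ T1, V2 @ T2])
print(np.allclose(Vt_parent.T @ Vt_parent, np.eye(k)))  # orthonormal columns
print(np.allclose(Vt_parent @ R, V_parent))             # same column span
```

The cost per cluster is that of one (2k) × k QR factorization, which gives the stated O(nk²) total complexity.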

Orthogonalization: Simple example

Example: Unit sphere, piecewise constant trial and test functions.

            Interpolation               Orthogonalized
  n        Build    MVM   Mem/n      Expl   Impl    MVM   Mem/n
  2048         6   0.14    49.2         3      8   0.07    17.0
  8192        26   0.54    45.6        15     35   0.31    21.1
  32768      126   2.45    49.6       107    149   1.37    22.0
  131072     765  10.33    53.4       469    774   5.48    22.0
  524288    3347  43.08    53.8      2045   3473  23.81    22.0

H²-matrix constructed by cubic interpolation and subsequent orthogonalization, relative error 10⁻³.
Hardware: SunFire 6800 with 900 MHz UltraSparc IIIcu processors.
Software: HLib, see http://www.hlib.org

Recompression algorithm: Problem

Problem: Interpolation does not take into account that a subset of the polynomials may be sufficient to approximate the kernel function.

Idea: Use local singular value decompositions to find optimal cluster bases ⇒ take kernel and geometry into account.

Important: Since the cluster bases are nested, the father clusters depend on the son clusters.

Consequence: If τ × σ is a farfield block and τ' ∈ sons(τ), we have

  K|_{τ'×σ} = (V^τ S^{τ,σ} (V^σ)^*)|_{τ'×σ} = V^{τ'} T^{τ'} S^{τ,σ} (V^σ)^*,

i.e., the choice of V^{τ'} influences all farfield blocks connected to τ' or its ancestors.

Recompression algorithm: Local problems and recursion

Block row: R^τ := {σ : there is an ancestor τ⁺ ⊇ τ such that τ⁺ × σ is admissible}.

Goal: Find an orthogonal matrix V^τ ∈ ℝ^{τ × k_τ} that maximizes

  Σ_{σ ∈ R^τ} ‖(V^τ)^* K|_{τ×σ}‖²_F.

Optimal solution: Use the eigenvectors of the corresponding Gram matrix.

Nested structure: Ensured by a bottom-up recursion and projections.

Result: Algorithm with complexity O(nk²); the truncation error can be controlled directly.
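The local optimization problem is a standard eigenvalue computation. A toy sketch of my own (random low-rank blocks standing in for the block row): the maximizer is spanned by the leading eigenvectors of the Gram matrix G = Σ_σ K|_{τ×σ} (K|_{τ×σ})^*.

```python
import numpy as np

rng = np.random.default_rng(1)
n_tau, k = 50, 5
# Three random low-rank blocks playing the role of the block row R^tau
blocks = [rng.standard_normal((n_tau, 20)) @ rng.standard_normal((20, 30))
          for _ in range(3)]

G = sum(K @ K.T for K in blocks)          # Gram matrix of the block row
w, U = np.linalg.eigh(G)                  # eigenvalues in ascending order
V_tau = U[:, -k:]                         # top-k eigenvectors = optimal basis

# Captured energy equals the sum of the k largest eigenvalues of G
captured = sum(np.linalg.norm(V_tau.T @ K, "fro") ** 2 for K in blocks)
print(np.isclose(captured, w[-k:].sum()))
```

Choosing k so that the discarded eigenvalues fall below a tolerance is what allows the truncation error to be controlled directly.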

Recompression: Simple example

Example: Unit sphere, piecewise constant trial and test functions.

            Interpolation               Recompressed
  n        Build    MVM   Mem/n      Build    MVM   Mem/n
  2048         6   0.14    49.2         11   0.01     3.7
  8192        26   0.54    45.6         51   0.11     4.4
  32768      128   2.41    49.6        210   0.45     4.6
  131072     762  10.34    53.4        883   2.07     4.7
  524288    3371  43.37    53.8       3818   9.77     5.2

H²-matrix constructed by cubic interpolation. Recompression with relative error bound 10⁻³, maximal rank 16.
Hardware: SunFire 6800 with 900 MHz UltraSparc IIIcu processors.
Software: HLib, see http://www.hlib.org

Recompression: Netgen example

Example: Grid "Crank shaft" of the Netgen package.

  n            1768   10126   113152
  Build          20     200     2255
  Mem/n         5.7     7.6      5.6
  Near/n        4.2     4.7      3.4
  MVM          0.02    0.24     2.18
  Rel. error  0.05%   0.07%    0.08%
  DMem/n       13.8    79.1    884.0

Recompression: Netgen geometries and eddy current model

  b(v, φ) = ∫_Γ ∫_Γ ⟨φ(y), v(x)⟩ ⟨∇_x Φ(x, y), n(x)⟩ dy dx
          − ∫_Γ ∫_Γ ⟨φ(y), n(x)⟩ ⟨∇_x Φ(x, y), v(x)⟩ dy dx

  n             42432   169728   37632
  Direct         1554     5178    1428
    MVM           8.3     32.5     8.0
    Mem/DoF       168      165     183
  Recompressed   1700     6090    1644
    MVM           0.8      2.6     0.7
    Mem/DoF       8.0      6.5     8.0
  Rel. error   8·10⁻³   1·10⁻³

Summary

H²-matrices: The nested structure leads to complexity O(nk) for storage and matrix-vector multiplication.

Black box: The approximation is constructed using only kernel evaluations κ(x^τ_ν, x^σ_μ) and works for all asymptotically smooth kernel functions.

Recompression: Use an O(nk²) algorithm to reduce the rank k and find a quasi-optimal cluster basis adapted to geometry, discretization and kernel function.

Work in progress: H²-matrix arithmetics, HCA.

H²-matrix arithmetics

Goal: Perform matrix addition and multiplication efficiently.

Addition and scaling: For an admissible block τ × σ, we have

  (A + λB)|_{τ×σ} = V^τ (S^{τ,σ}_A + λ S^{τ,σ}_B)(V^σ)^*,

so H²-matrices form a vector space. Addition and scaling can be performed in O(nk) operations.

Multiplication: Defined recursively:

  [ A₁₁ A₁₂ ] [ B₁₁ B₁₂ ]   [ A₁₁B₁₁ + A₁₂B₂₁   A₁₁B₁₂ + A₁₂B₂₂ ]
  [ A₂₁ A₂₂ ] [ B₂₁ B₂₂ ] = [ A₂₁B₁₁ + A₂₂B₂₁   A₂₁B₁₂ + A₂₂B₂₂ ].

Use truncation if the product does not match the prescribed format.
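The recursion above can be written down in a few lines. A dense toy version of my own (no low-rank blocks, no truncation; it only demonstrates the 2 × 2 block recursion that the H²-arithmetic follows, with truncation inserted whenever a product block does not match the prescribed format):

```python
import numpy as np

def block_mul(A, B, leaf=64):
    # Recursive 2x2 block multiplication; for H^2-matrices the leaf case
    # would be a nearfield or low-rank block product plus truncation.
    n = A.shape[0]
    if n <= leaf:
        return A @ B
    h = n // 2
    C = np.empty_like(A)
    for i in (0, 1):
        for j in (0, 1):
            C[i*h:(i+1)*h, j*h:(j+1)*h] = (
                block_mul(A[i*h:(i+1)*h, :h], B[:h, j*h:(j+1)*h], leaf)
              + block_mul(A[i*h:(i+1)*h, h:], B[h:, j*h:(j+1)*h], leaf))
    return C

rng = np.random.default_rng(2)
A, B = rng.standard_normal((128, 128)), rng.standard_normal((128, 128))
print(np.allclose(block_mul(A, B), A @ B))
```

In the H²-setting the gain comes entirely from the leaf cases: products of factorized blocks reduce to small coefficient-matrix products, as the next slides show.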

H²-matrix multiplication I

Situation: Let A|_{τ×ϱ} = V^τ S^{τ,ϱ}_A (V^ϱ)^*, B|_{ϱ×σ} not admissible. Compute the best approximation of AB in the form C|_{τ×σ} = V^τ S^{τ,σ}_C (V^σ)^*.

Orthogonality: If V^τ and V^σ are orthogonal:

  S^{τ,σ}_C = (V^τ)^* A B V^σ = (V^τ)^* V^τ S^{τ,ϱ}_A (V^ϱ)^* B V^σ = S^{τ,ϱ}_A S̃^{ϱ,σ}_B,  where S̃^{ϱ,σ}_B := (V^ϱ)^* B V^σ.

Problem: Computing (V^ϱ)^* B V^σ directly is too expensive.

Idea: Prepare S̃^{ϱ,σ}_B := (V^ϱ)^* B V^σ in advance.

Matrix forward transformation: Since the cluster bases are nested, all matrices S̃^{ϱ,σ}_B can be prepared in O(nk³) operations.

H²-matrix multiplication II

Situation: Let A|_{τ×ϱ} = V^τ S^{τ,ϱ}_A (V^ϱ)^* and B|_{ϱ×σ} = V^ϱ S^{ϱ,σ}_B (V^σ)^*.

Advantage: The product AB already has the factorized form

  AB = V^τ S^{τ,ϱ}_A (V^ϱ)^* V^ϱ S^{ϱ,σ}_B (V^σ)^* = V^τ (S^{τ,ϱ}_A S^{ϱ,σ}_B)(V^σ)^*.

Problem: Converting AB directly to the format of C would lead to unacceptable complexity.

Idea: Store the result in S̃^{τ,σ}_C := S^{τ,ϱ}_A S^{ϱ,σ}_B and distribute it after the multiplication is complete.

Matrix backward transformation: Since the cluster bases are nested, all matrices S̃^{τ,σ}_C can be distributed in O(nk³) operations.

H²-matrix multiplication III

Situation: Let B|_{ϱ×σ} = V^ϱ S^{ϱ,σ}_B (V^σ)^*.

Approach: Split B and proceed by recursion: for sons ϱ' ∈ sons(ϱ), σ' ∈ sons(σ),

  B|_{ϱ'×σ'} = V^{ϱ'} S̃^{ϱ',σ'}_B (V^{σ'})^*,  where S̃^{ϱ',σ'}_B := T^{ϱ'} S^{ϱ,σ}_B (T^{σ'})^*.

Complexity: Only O(C²_sp n) such splitting operations are required ⇒ complexity O(C²_sp n k³).

Result: The best approximation C of AB in a prescribed H²-matrix format can be computed in O(C²_sp n k³) operations.

H²-matrix multiplication: Fixed structure

Example: Discretized double layer potential K on the unit sphere; approximate KK.

  n        Build   C_sp   H²-MVM   H²-MMM     H-MMM      Error
  512        1.7     16   < 0.01      0.3       0.7   1.1·10⁻²
  2048      10.7     36     0.01      6.0      39.0   1.2·10⁻²
  8192      49.4     42     0.10     35.6     315.9   4.5·10⁻³
  32768    206.3     56     0.45    172.3    2085.2   4.9·10⁻³
  131072   858.3     84     2.00    924.2   13213.8   8.6·10⁻⁵

Important: Here, the result matrix is projected to the same cluster bases and the same block structure as K.

Question: Can we reduce the error by using adapted cluster bases?

H²-matrix multiplication: Adapted structure

Modification: Choose the cluster bases of the result matrix adaptively to improve the precision.

  n        C_sp   Old MMM      Error   New basis   New MMM      Error
  512        16       0.3   1.1·10⁻²         0.3       0.3   5.0·10⁻⁵
  2048       36       6.0   1.2·10⁻²         9.5       6.8   6.6·10⁻⁵
  8192       42      35.6   4.5·10⁻³        98.9      40.8   1.4·10⁻⁴
  32768      56     172.3   4.9·10⁻³       878.9     194.7   4.3·10⁻⁶
  131072     84     924.2   8.6·10⁻⁵      5030.2     966.9   2.9·10⁻⁶

Advantage: Precision significantly improved, computation time only slightly increased.

Current work: Improve the efficiency of the construction of adapted cluster bases.

Hybrid Cross Approximation

Problem: For each admissible block V^τ S^{τ,σ} (V^σ)^* of the H²-matrix, the kernel function κ has to be evaluated in m^{2d} points. Very time-consuming if high precision is required.

Approach: Apply adaptive cross approximation to the coefficient matrices S^{τ,σ} to find a rank-k representation S^{τ,σ} ≈ X^{τ,σ} (Y^{τ,σ})^*.

Result: Only m^d k ≪ m^{2d} kernel evaluations are required to approximate the coefficient matrix S^{τ,σ}.

Important: ACA is only applied to the coefficient matrices. The construction is still linear in n, and there are no problems with quadrature.

  m       5       6        7        8
  Std   24.1s   64.7s   160.8s   399.2s
  HCA   22.3s   49.2s    90.0s   222.7s
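As an illustration of cross approximation, here is a fully pivoted variant of my own (a simplified stand-in: the adaptive algorithm used in HCA pivots without forming the whole residual, whereas this sketch forms it explicitly): repeatedly subtract the rank-1 cross through the residual entry of largest modulus.

```python
import numpy as np

def cross_approx(S, eps=1e-6, kmax=50):
    # Fully pivoted cross approximation: returns X, Y with S ~ X @ Y.
    R = S.astype(float).copy()
    cols, rows = [], []
    for _ in range(kmax):
        i, j = np.unravel_index(np.abs(R).argmax(), R.shape)
        if abs(R[i, j]) <= eps:
            break
        cols.append(R[:, j] / R[i, j])   # pivot-scaled column
        rows.append(R[i, :].copy())      # pivot row
        R -= np.outer(cols[-1], rows[-1])
    return np.array(cols).T, np.array(rows)

# Smooth kernel on well-separated point sets, mimicking a coefficient
# matrix S^{tau,sigma}: the cross approximation rank stays small.
x = np.linspace(0.0, 1.0, 40)
y = np.linspace(3.0, 4.0, 40)
S = 1.0 / np.abs(x[:, None] - y[None, :])
X, Y = cross_approx(S)
print(X.shape[1], np.linalg.norm(S - X @ Y))   # small rank, small error
```

Each step touches one row and one column of the matrix, which is why only m^d·k kernel evaluations are needed in the HCA setting instead of all m^{2d} entries.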

Conclusion

H²-matrices: Data-sparse approximation of certain dense matrices with linear complexity in the number of degrees of freedom.

Adaptive cluster bases: Find quasi-optimal expansion systems for arbitrary matrices. In the case of integral operators, point evaluations of the kernel function are sufficient to construct efficient and reliable H²-matrix approximations.

H²-matrix arithmetics: Approximate A + λB and AB in O(nk²) and O(nk³) operations. Next goal: Approximate A⁻¹ and LU.

HCA: Reduce the number of kernel evaluations required to construct the H²-matrix.

Software: HLib package for hierarchical matrices and H²-matrices, including applications to elliptic PDEs and integral equations. http://www.hlib.org

Interpolation vs. Multipole

Construction: Pointwise evaluations and integrals of polynomials:

  T^{t'}_{ν'ν} = L^t_ν(x^{t'}_{ν'}),   S^{t,s}_{νμ} = κ(x^t_ν, x^s_μ),   V^t_{iν} = ∫ φ_i(x) L^t_ν(x) dx.

  Interpolation                 Multipole
  − Rank k = p^d                + Rank k = p^{d−1}
  − Low frequencies only        + Low and high frequencies
  + General kernels             − Special kernels only
  + Anisotropic clusters        − Spherical clusters only

Goal: Reduce the rank for interpolation without sacrificing precision.