Introduction to Practical FFT and NFFT

1 Introduction to Practical FFT and NFFT
Michael Pippig and Daniel Potts
Department of Mathematics, Chemnitz University of Technology
September 14, 2011
supported by BMBF grant 01IH08001B

2 Table of Contents 1 Serial FFT Algorithms 2 Parallel FFT Algorithms 3 Parallel Fast Summation

4 Discrete Fast Fourier Transform
Task of 3d-DFT
Consider a three-dimensional dataset of $n_0 \times n_1 \times n_2$ complex numbers $\hat g_{k_0 k_1 k_2} \in \mathbb{C}$. Compute
$$g_{l_0 l_1 l_2} = \sum_{k_0=0}^{n_0-1} \sum_{k_1=0}^{n_1-1} \sum_{k_2=0}^{n_2-1} \hat g_{k_0 k_1 k_2} \, e^{-2\pi i \left( \frac{k_2 l_2}{n_2} + \frac{k_1 l_1}{n_1} + \frac{k_0 l_0}{n_0} \right)}$$
for all $l_t = 0, \dots, n_t - 1$ ($t = 0, 1, 2$).
Realized by 3d-FFT ($n_0 = n_1 = n_2 = n$): $O(n^3 \log n)$ instead of $O(n^6)$.
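The gap between the two operation counts comes from the fact that the triple sum separates into one-dimensional transforms along each axis. The following is a minimal sketch, not library code: it assumes the forward sign convention $e^{-2\pi i \dots}$, the function names are hypothetical, and it uses a naive $O(n^2)$ one-dimensional DFT, so the separable version costs $O(n^4)$ here. Replacing `dft1d` with a fast transform is what gives $O(n^3 \log n)$.

```python
import cmath
import itertools


def dft1d(x):
    """Naive 1d DFT: X[l] = sum_k x[k] * exp(-2*pi*i*k*l/n)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * l / n) for k in range(n))
            for l in range(n)]


def dft3d_direct(g_hat):
    """Evaluate the triple sum literally (O(n^6) for an n x n x n cube)."""
    n0, n1, n2 = len(g_hat), len(g_hat[0]), len(g_hat[0][0])
    g = [[[0j] * n2 for _ in range(n1)] for _ in range(n0)]
    for l0, l1, l2 in itertools.product(range(n0), range(n1), range(n2)):
        s = 0j
        for k0, k1, k2 in itertools.product(range(n0), range(n1), range(n2)):
            phase = k0 * l0 / n0 + k1 * l1 / n1 + k2 * l2 / n2
            s += g_hat[k0][k1][k2] * cmath.exp(-2j * cmath.pi * phase)
        g[l0][l1][l2] = s
    return g


def dft3d_separable(g_hat):
    """Apply 1d transforms along axis 2, then axis 1, then axis 0."""
    n0, n1, n2 = len(g_hat), len(g_hat[0]), len(g_hat[0][0])
    # Axis 2: every innermost row is contiguous and transformed as a whole.
    a = [[dft1d(g_hat[k0][k1]) for k1 in range(n1)] for k0 in range(n0)]
    # Axis 1: gather each strided column, transform it, write it back.
    for k0 in range(n0):
        for l2 in range(n2):
            col = dft1d([a[k0][k1][l2] for k1 in range(n1)])
            for l1 in range(n1):
                a[k0][l1][l2] = col[l1]
    # Axis 0: same procedure along the outermost dimension.
    for l1 in range(n1):
        for l2 in range(n2):
            col = dft1d([a[k0][l1][l2] for k0 in range(n0)])
            for l0 in range(n0):
                a[l0][l1][l2] = col[l0]
    return a
```

Both functions return the same array up to rounding. The separable form is also the one that parallel algorithms build on, since each axis can be processed independently.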

5 FFT Software Libraries
Examples of FFT implementations
- IBM ESSL
- Intel MKL
- FFTW
Features of FFTW [Frigo, Johnson]
- publicly available
- open source
- high performance
- many transforms
- arbitrary size
- d-dim. FFT
- in place FFT
- collect wisdom
- adjust planning
- easy interface
Available at

6 Using FFTW
FFTW
- Plan: only once; hardware adaptive, time consuming
- Execute: several times; fast transform
- Finalize: only once; free memory
- FFTW_ESTIMATE: heuristic choice of algorithm
- FFTW_MEASURE: compare different algorithms
- FFTW_PATIENT: compare more algorithms
- FFTW_EXHAUSTIVE: compare all available algorithms
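The plan, execute, finalize life cycle can be illustrated without the library itself. The sketch below is a toy stand-in, not the FFTW API: the class `DftPlan` and its methods are hypothetical names, and "planning" here only tabulates the roots of unity once so that repeated executions can reuse them.

```python
import cmath


class DftPlan:
    """Hypothetical plan object: pay the setup cost once, execute many times."""

    def __init__(self, n):
        self.n = n
        # Plan (only once): tabulate the n-th roots of unity.
        self.twiddle = [cmath.exp(-2j * cmath.pi * r / n) for r in range(n)]

    def execute(self, x):
        """Execute (several times): naive DFT that reuses the table."""
        n, w = self.n, self.twiddle
        if len(x) != n:
            raise ValueError("input length does not match the plan")
        return [sum(x[k] * w[(k * l) % n] for k in range(n)) for l in range(n)]

    def destroy(self):
        """Finalize (only once): release the precomputed data."""
        self.twiddle = None


plan = DftPlan(4)
spectrum_of_impulse = plan.execute([1, 0, 0, 0])
spectrum_of_constant = plan.execute([1, 1, 1, 1])
plan.destroy()
```

The design point is the same as on the slide: the expensive, hardware-dependent decisions are made once, and every later transform of the same size only pays for the execution.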

7 FFTW Interface
Basic Interface
- simple
- do a single transform
Advanced Interface
- do a transform on multiple datasets by one call
- supports strided input and output
- use a plan on different datasets
Guru Interface
- most powerful and most complicated
- combine transform and data permutation
- 64 bit compatible

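8 Nonequispaced Discrete Fourier Transform
Task of 3d-DFT and 3d-NDFT
For $\hat f_{k_0 k_1 k_2} \in \mathbb{C}$ compute
$$f_{l_0 l_1 l_2} := \sum_{k_0=0}^{N-1} \sum_{k_1=0}^{N-1} \sum_{k_2=0}^{N-1} \hat f_{k_0 k_1 k_2} \, e^{-2\pi i \left( k_2 \frac{l_2}{N} + k_1 \frac{l_1}{N} + k_0 \frac{l_0}{N} \right)} \qquad \text{(DFT)}$$
for all $0 \le l_t < N$ ($0 \le \frac{l_t}{N} < 1$), $t = 0, 1, 2$, and compute
$$f_j := \sum_{k_0=0}^{N-1} \sum_{k_1=0}^{N-1} \sum_{k_2=0}^{N-1} \hat f_{k_0 k_1 k_2} \, e^{-2\pi i \left( k_2 x_j^{(2)} + k_1 x_j^{(1)} + k_0 x_j^{(0)} \right)} \qquad \text{(NDFT)}$$
for $x_j^{(t)} \in [0, 1)$ ($t = 0, 1, 2$), $j = 1, \dots, M$.
Realized by 3d-NFFT [NFFT software library]: $O(N^3 \log N + \log^3(\tfrac{1}{\varepsilon}) M)$ instead of $O(N^3 M)$.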
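For reference, the NDFT can be evaluated directly from its definition at $O(N^3 M)$ cost, which is the baseline the NFFT improves on. This is a minimal sketch assuming the sign convention $e^{-2\pi i \dots}$ and frequencies $k_t = 0, \dots, N-1$; the function name `ndft3d` is hypothetical and not part of the NFFT library.

```python
import cmath
import itertools


def ndft3d(f_hat, nodes):
    """Direct NDFT: f_j = sum_k f_hat[k0][k1][k2] * exp(-2*pi*i * <k, x_j>).

    f_hat is an N x N x N nested list, nodes is a list of M points in [0, 1)^3.
    """
    N = len(f_hat)
    values = []
    for x0, x1, x2 in nodes:
        s = 0j
        for k0, k1, k2 in itertools.product(range(N), repeat=3):
            phase = k0 * x0 + k1 * x1 + k2 * x2
            s += f_hat[k0][k1][k2] * cmath.exp(-2j * cmath.pi * phase)
        values.append(s)
    return values
```

Choosing the nodes as $x_j = (l_0/N, l_1/N, l_2/N)$ reproduces the equispaced DFT, which makes this routine a convenient correctness check for a fast implementation.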

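9 NFFT
Matrix-Vector-Notation
Compute $f = (f_j)_j \in \mathbb{C}^M$ via $f = A \hat f$, where $A = (e^{-2\pi i k x_j})_{j,k} \in \mathbb{C}^{M \times N^3}$ and $\hat f = (\hat f_{k_0 k_1 k_2})_k \in \mathbb{C}^{N^3}$.
Approximation [Dutt, Rokhlin 93, Beylkin 95, Steidl 96, ...]
$$A \approx B F D, \qquad A^{\mathsf H} \approx D F^{\mathsf H} B^{\mathsf T}$$
where
- $D \in \mathbb{R}^{N^3 \times N^3}$ diagonal matrix
- $F \in \mathbb{C}^{n^3 \times N^3}$ truncated Fourier matrix ($n \ge N$)
- $B \in \mathbb{R}^{M \times n^3}$ sparse matrix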

10 NFFT Software Library
NFFT Software Library [Keiner, Kunis, Potts]
Available at

11 Using NFFT
FFTW: Plan, Execute, Finalize
NFFT: Plan, Precompute, Execute, Finalize

12 NFFT Precompute
- PRE_FULL_PSI: fully precomputed window function. Storage: $(2m+2)^d M$, Computation: none
- PRE_PSI: tensor product based precomputation. Storage: $d(2m+2)M$, Computation: $(2m+2)^d M$
- PRE_LIN_PSI: linear interpolation from lookup table. Storage: $dK$, Computation: $2(2m+2)^d M$
- PRE_FG_PSI: fast Gaussian gridding. Storage: $2dM$, Computation: $(2m+2)^d M$
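To get a feeling for the trade-off between the precomputation strategies, the counts from the slide can be tabulated for concrete parameters. The sketch below reads the slide's expression "(2m + 2) d M" as $(2m+2)^d M$ and the lookup-table size as $K$; both readings, as well as the function name `psi_costs`, are assumptions made for illustration.

```python
def psi_costs(m, d, M, K):
    """Storage and computation counts per precomputation flag.

    m: window cut-off, d: dimension, M: number of nodes,
    K: lookup-table size (only used by PRE_LIN_PSI).
    """
    w = 2 * m + 2  # window support per dimension
    return {
        "PRE_FULL_PSI": {"storage": w ** d * M, "computation": 0},
        "PRE_PSI": {"storage": d * w * M, "computation": w ** d * M},
        "PRE_LIN_PSI": {"storage": d * K, "computation": 2 * w ** d * M},
        "PRE_FG_PSI": {"storage": 2 * d * M, "computation": w ** d * M},
    }


for flag, cost in psi_costs(m=2, d=3, M=1000, K=4096).items():
    print(flag, cost)
```

For three dimensions the fully precomputed variant stores far more values than the tensor product variant, while the lookup table is the only option whose storage does not grow with the number of nodes $M$.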

13 Outline 1 Serial FFT Algorithms 2 Parallel FFT Algorithms 3 Parallel Fast Summation

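14 Discrete Fast Fourier Transform
Task of 3d-DFT
Consider a three-dimensional dataset of $n_0 \times n_1 \times n_2$ complex numbers $\hat g_{k_0 k_1 k_2} \in \mathbb{C}$. Compute
$$\begin{aligned}
g_{l_0 l_1 l_2} &:= \sum_{k_0=0}^{n_0-1} \sum_{k_1=0}^{n_1-1} \sum_{k_2=0}^{n_2-1} \hat g_{k_0 k_1 k_2} \, e^{-2\pi i \left( \frac{k_2 l_2}{n_2} + \frac{k_1 l_1}{n_1} + \frac{k_0 l_0}{n_0} \right)} \\
&= \sum_{k_0=0}^{n_0-1} \left( \sum_{k_1=0}^{n_1-1} \sum_{k_2=0}^{n_2-1} \hat g_{k_0 k_1 k_2} \, e^{-2\pi i \left( \frac{k_2 l_2}{n_2} + \frac{k_1 l_1}{n_1} \right)} \right) e^{-2\pi i \frac{k_0 l_0}{n_0}}
\end{aligned}$$
for all $l_t = 0, \dots, n_t - 1$ ($t = 0, 1, 2$).
Realized by 3d-FFT ($n_0 = n_1 = n_2 = n$): $O(n^3 \log n)$ instead of $O(n^6)$.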

15 One-Dimensional Data Distribution
[Figure: one-dimensional distribution of the $n_0 \times n_1 \times n_2$ dataset and its transposition $T$]
$p$ - number of processors
Features of FFTW [Frigo, Johnson]
- open source
- easy interface
- communicator
- arbitrary size
- d-dim. FFT
- in place FFT
- high performance
- many transforms
- adjust blocksize
- adjust planning
- collect wisdom
Maximum Number of Processors ($n_0 = n_1 = n_2 = n$)
$p^{1D}_{\max} = n$
FFTW combines portable performance and good usability, but is not scalable enough.
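The scalability limit of the one-dimensional distribution can be seen by writing down which planes each process owns. The following sketch uses a generic block distribution with nearly equal block sizes; it is not FFTW's own blocking rule, and the helper names `block_range` and `slab_shapes` are hypothetical.

```python
def block_range(n, p, rank):
    """Half-open index range [start, stop) owned by `rank` when n indices
    are split into p nearly equal contiguous blocks."""
    base, extra = divmod(n, p)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def slab_shapes(n0, n1, n2, p):
    """Local array shape of every process when only the first dimension
    is distributed."""
    shapes = []
    for rank in range(p):
        start, stop = block_range(n0, p, rank)
        shapes.append((stop - start, n1, n2))
    return shapes


print(slab_shapes(4, 4, 4, 2))  # two processes, two planes each
print(slab_shapes(4, 4, 4, 8))  # more processes than planes
```

With 8 processes and only 4 planes, half of the processes receive an empty block, which is the situation behind the bound of at most $n$ usable processes.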

16 Two-Dimensional Data Distribution
[Ding, Eleftheriou et al. 03, Plimpton, Pekurovsky - P3DFFT]
[Figure: two-dimensional distribution of the $n_0 \times n_1 \times n_2$ dataset and its transpositions $T$]
$p$ - size of processor grid
Maximum Number of Processors ($n_0 = n_1 = n_2 = n$)
$p^{1D}_{\max} = n$, $p^{2D}_{\max} = n^2$
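The difference between the two bounds can be checked by counting how many processes actually receive data. This is an illustrative sketch with a generic block distribution; the function names and the grid extents `p0`, `p1` are assumptions, not library identifiers.

```python
def block_sizes(n, p):
    """Sizes of p nearly equal contiguous blocks covering n indices."""
    base, extra = divmod(n, p)
    return [base + (1 if r < extra else 0) for r in range(p)]


def busy_processes_1d(n, p):
    """Processes with a non-empty block when one dimension is distributed."""
    return sum(1 for size in block_sizes(n, p) if size > 0)


def busy_processes_2d(n, p0, p1):
    """Processes with a non-empty block when two dimensions are distributed
    over a p0 x p1 processor grid."""
    rows, cols = block_sizes(n, p0), block_sizes(n, p1)
    return sum(1 for a in rows for b in cols if a > 0 and b > 0)


n = 8
print(busy_processes_1d(n, 64))     # limited by n
print(busy_processes_2d(n, 8, 8))   # limited by n * n
```

For $n = 8$ and 64 processes, the one-dimensional distribution keeps only 8 processes busy, whereas an $8 \times 8$ grid uses all 64.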

17 Algorithms Supported by FFTW3.3
Aim
Implement a new parallel FFT software library (PFFT) based on FFTW and the highly scalable two-dimensional data distribution.
1d-FFT Combined with Local Transposition
[Diagram: one-dimensional FFTs applied to $\hat n_0 \times \hat n_1 \times \hat n_2$ data, combined with local transpositions of the array dimensions]

18 Algorithms Supported by FFTW3.3
Transposition of One-Dimensional Distributed Data
$$N_0 \times \frac{N_1}{P} \;\xrightarrow{\;T\;}\; \frac{N_0}{P} \times N_1$$
Group two of the three dimensions to use FFTW's matrix transposition on two-dimensional decomposed data, e.g. $N_0 = n_2$, $N_1 = n_0 n_1$, $P = p_0$.
Transposition of Two-Dimensional Distributed Data
$$n_2 \times \left( \frac{n_0}{p_0} \times n_1 \right) \;\xrightarrow{\;T\;}\; \frac{n_2}{p_0} \times (n_0 \times n_1)$$
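The effect of such a transposition on the data layout can be simulated in a single process. The sketch below assumes the reading $N_0 \times N_1/P \to N_0/P \times N_1$, that is, the distributed dimension changes from columns to rows, and it assumes that $P$ divides both $N_0$ and $N_1$. The helper names are hypothetical; in a real program the inner loop over `s` corresponds to an all-to-all exchange between processes.

```python
def split_columns(matrix, P):
    """Initial layout: rank r owns all N0 rows but only the columns
    [r*b1, (r+1)*b1), with b1 = N1 / P."""
    b1 = len(matrix[0]) // P
    return [[row[r * b1:(r + 1) * b1] for row in matrix] for r in range(P)]


def redistribute(column_blocks):
    """Target layout: rank r owns rows [r*b0, (r+1)*b0) with all N1 columns.

    Every rank s contributes its piece of each row; collecting the pieces
    from all ranks is what the all-to-all communication does."""
    P = len(column_blocks)
    N0 = len(column_blocks[0])
    b0 = N0 // P
    row_blocks = []
    for r in range(P):
        block = []
        for i in range(r * b0, (r + 1) * b0):
            row = []
            for s in range(P):
                row.extend(column_blocks[s][i])
            block.append(row)
        row_blocks.append(block)
    return row_blocks


N0, N1, P = 4, 6, 2
A = [[i * N1 + j for j in range(N1)] for i in range(N0)]
rows_per_rank = redistribute(split_columns(A, P))
```

After the call, `rows_per_rank[r]` holds complete rows of `A`, so a one-dimensional transform along the second dimension can run locally on every rank.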

19 Two-Dimensional Distributed FFT Based on FFTW
[Diagram: PFFT Forward Transform, three one-dimensional FFT steps interleaved with two parallel transpositions $T$]
[Diagram: PFFT Backward Transform, three one-dimensional FFT steps interleaved with two parallel transpositions $T$]

20 Scaling FFT of Size on BlueGene/P
[Plot: wall clock time in s versus number of cores; curves: Perfect, PFFT, P3DFFT]

21 Scaling FFT of Size on BlueGene/P
[Plot: speedup versus number of cores; curves: Perfect, PFFT, P3DFFT]

22 Scaling FFT of Size on BlueGene/P
[Plot: wall clock time in s versus number of processors; curves: Perfect, PFFT, P3DFFT]

23 Scaling FFT of Size on BlueGene/P
[Plot: speedup versus number of cores; curves: Perfect, PFFT, P3DFFT]

24 Comparison of PFFT and P3DFFT
Common Features
- open source
- high scalability
- portability
- multiple precisions
- Fortran interface
- C interface
- r2c FFT
- ghost cell support
PFFT Unique Features
- c2c FFT
- completely in place FFT
- FFTW like interface (basic, advanced and guru)
- adjustable blocksize
- separate communicator
- accumulated wisdom
- change of planning effort without recompilation
- d-dimensional parallel FFT
- over- and downsampling

25 Ghost Cell Support
[Diagram: ghost cell exchange (GC) on the processor grid]

26 Oversampled & Downsampled FFT
[Diagram: Without Library Support, oversampling (OS) of the distributed $N_0 \times N_1$ data to $n_0 \times n_1$ followed by the FFT]
[Diagram: PFFT Library Support, oversampling (OS) and FFT applied one dimension at a time, interleaved with the transposition $T$]

27 PFFT Software Library
PFFT Software Library [Pippig]
Available at

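28 Parallel NFFT
Matrix-Vector-Notation
Compute $f = (f_j)_j \in \mathbb{C}^M$ via $f = A \hat f$, where $A = (e^{-2\pi i k x_j})_{j,k} \in \mathbb{C}^{M \times N^3}$ and $\hat f = (\hat f_{k_0 k_1 k_2})_k \in \mathbb{C}^{N^3}$.
Approximation [Dutt, Rokhlin 93, Beylkin 95, Steidl 96, ...]
$$A \approx B F D, \qquad A^{\mathsf H} \approx D F^{\mathsf H} B^{\mathsf T}$$
where
- $D \in \mathbb{R}^{N^3 \times N^3}$ diagonal matrix
- $F \in \mathbb{C}^{n^3 \times N^3}$ truncated Fourier matrix ($n \ge N$)
- $B \in \mathbb{R}^{M \times n^3}$ sparse matrix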

29 PNFFT in Pictures
[Diagram: the factors $D$, $F$ (parallel FFT) and $B$ (with ghost cells) applied to data distributed on the processor grid]

30 PNFFT Software Library
PNFFT Software Library [Pippig]
Available at

31 Outline 1 Serial FFT Algorithms 2 Parallel FFT Algorithms 3 Parallel Fast Summation

32 Fast Summation
Task
Fast computation of
$$h_j = \sum_{l=1}^{L} a_l \, K(\| y_j - x_l \|_2), \qquad j = 1, \dots, M, \quad y_j, x_l \in \mathbb{R}^3$$
Example of Radial Kernel Function
$$K(\| x \|_2) = \frac{1}{\| x \|_2}, \dots$$
Realized with 3d-NFFT [NFFT Software Library]: $O(\log^3(\tfrac{1}{\varepsilon})(M + L))$ instead of $O(ML)$.
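The $O(ML)$ reference computation is a plain double loop over targets and sources. The sketch below is such a baseline for the kernel $1/\|x\|_2$; the function name `direct_sum` is hypothetical, and skipping pairs with zero distance is an assumption made here to avoid the singularity when a target coincides with a source.

```python
import math


def direct_sum(targets, sources, coeffs, kernel=lambda r: 1.0 / r):
    """h_j = sum_l a_l * K(|y_j - x_l|), evaluated pair by pair (O(M * L))."""
    result = []
    for y in targets:
        s = 0.0
        for x, a in zip(sources, coeffs):
            r = math.dist(y, x)  # Euclidean distance, Python 3.8+
            if r > 0.0:  # assumption: drop the singular self-interaction
                s += a * kernel(r)
        result.append(s)
    return result


sources = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
coeffs = [1.0, 2.0]
targets = [(0.0, 0.0, 2.0), (0.0, 0.0, 0.0)]
print(direct_sum(targets, sources, coeffs))
```

A routine like this remains useful next to a fast method, since it provides exact values against which the approximation error can be measured on small problems.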

33 Parallel Fast Summation
Matrix-Vector-Notation
Compute $h = (h_j)_{j=1}^{M}$ via $h = K a$, where $K = (K(\| y_j - x_l \|))_{j,l=1}^{M,M}$ and $a = (a_l)_{l=1}^{M} \in \mathbb{C}^M$.
Standard Algorithm for Equispaced Nodes
$$K = F D F^{\mathsf H}$$
where
- $F \in \mathbb{C}^{M \times M}$ equispaced Fourier matrix
- $D \in \mathbb{C}^{M \times M}$ diagonal matrix
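For equispaced nodes and a periodic kernel the matrix $K$ is circulant, and a factorization of the form $F D F^{\mathsf H}$ turns the matrix-vector product into three transforms. The one-dimensional sketch below assumes exactly this circulant setting with $F = (e^{-2\pi i jk/M})_{j,k}$ and $D = \operatorname{diag}(F^{\mathsf H} c)/M$, where $c$ is the first column of $K$. The function names are hypothetical, and the naive DFT keeps the example short at $O(M^2)$ cost; an FFT would give $O(M \log M)$.

```python
import cmath


def dft(x, sign=-1):
    """Naive DFT with selectable exponent sign: sign=-1 applies F,
    sign=+1 applies F^H (no normalization)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * l / n)
                for k in range(n)) for l in range(n)]


def circulant_matvec(c, a):
    """h = K a for K[j][l] = c[(j - l) % M], computed as F (D (F^H a))."""
    M = len(c)
    d = [v / M for v in dft(c, sign=+1)]  # diagonal of D = diag(F^H c) / M
    t = dft(a, sign=+1)                   # F^H a
    return dft([di * ti for di, ti in zip(d, t)], sign=-1)  # apply F


c = [1.0, 2.0, 3.0, 4.0]
a = [1.0, 0.0, -1.0, 2.0]
print(circulant_matvec(c, a))
```

The result agrees with the dense product $\sum_l c_{(j-l) \bmod M} \, a_l$ up to rounding, while the matrix $K$ itself is never formed.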

34 Parallel Fast Summation
Matrix-Vector-Notation
Compute $h = (h_j)_{j=1}^{M}$ via $h = K a$, where $K = (K(\| y_j - x_l \|))_{j,l=1}^{M,L}$ and $a = (a_l)_{l=1}^{L} \in \mathbb{C}^L$.
Approximation [Potts, Steidl, Nieslony 04]
$$K \approx A_2 D A_1^{\mathsf H} + K_{NF}$$
where
- $A_2 = (e^{-2\pi i k y_j})_{j,k} \in \mathbb{C}^{M \times N^3}$ nonequispaced Fourier matrix
- $D \in \mathbb{C}^{N^3 \times N^3}$ diagonal matrix
- $A_1 = (e^{-2\pi i k x_l})_{l,k} \in \mathbb{C}^{L \times N^3}$ nonequispaced Fourier matrix
- $K_{NF} \in \mathbb{C}^{M \times L}$ sparse near field correction

35 Summary
FFTW
- Features: performance, interface, portability
PFFT
- High Scalability: 2d data distribution
- Additional Features: oversampling, downsampling, ghost cells
PNFFT
Parallel Fast Summation
- Nearfield correction

Introduction to Practical FFT and NFFT

Introduction to Practical FFT and NFFT Introduction to Practical FFT and NFFT Michael Pippig and Daniel Potts Faculty of Mathematics Chemnitz University of Technology 07.09.2010 supported by BMBF grant 01IH08001B Table of Contents 1 Serial

More information

Particle Simulation Based on Nonequispaced Fast Fourier Transforms

Particle Simulation Based on Nonequispaced Fast Fourier Transforms Particle Simulation Based on Nonequispaced Fast Fourier Transforms Michael Pippig and Daniel Potts Chemnitz University of Technology Department of Mathematics 0907 Chemnitz, Germany E-mail: {michael.pippig,

More information

Problem descriptions: Non-uniform fast Fourier transform

Problem descriptions: Non-uniform fast Fourier transform Problem descriptions: Non-uniform fast Fourier transform Lukas Exl 1 1 University of Vienna, Institut für Mathematik, Oskar-Morgenstern-Platz 1, A-1090 Wien, Austria December 11, 2015 Contents 0.1 Software

More information

Generated sets as sampling schemes for hyperbolic cross trigonometric polynomials

Generated sets as sampling schemes for hyperbolic cross trigonometric polynomials Generated sets as sampling schemes for hyperbolic cross trigonometric polynomials Lutz Kämmerer Chemnitz University of Technology joint work with S. Kunis and D. Potts supported by DFG-SPP 1324 1 / 16

More information

Sparse and Fast Fourier Transforms in Biomedical Imaging

Sparse and Fast Fourier Transforms in Biomedical Imaging Sparse and Fast Fourier Transforms in Biomedical Imaging Stefan Kunis 1 1 joint work with Ines Melzer, TU Chemnitz and Helmholtz Center Munich, supported by DFG and Helmholtz association Outline 1 Trigonometric

More information

Numerical stability of nonequispaced fast Fourier transforms

Numerical stability of nonequispaced fast Fourier transforms Numerical stability of nonequispaced fast Fourier transforms Daniel Potts and Manfred Tasche Dedicated to Franz Locher in honor of his 65th birthday This paper presents some new results on numerical stability

More information

Using NFFT 3 a software library for various nonequispaced fast Fourier transforms

Using NFFT 3 a software library for various nonequispaced fast Fourier transforms Using NFFT 3 a software library for various nonequispaced fast Fourier transforms Jens Keiner University of Lübeck 23560 Lübeck, Germany keiner@math.uni-luebeck.de and Stefan Kunis Chemnitz University

More information

The Nonequispaced FFT and its Applications A Mini Course at Helsinki University of Technology

The Nonequispaced FFT and its Applications A Mini Course at Helsinki University of Technology The Nonequispaced FFT and its Applications A Mini Course at Helsinki University of Technology Jens Keiner 1 Stefan Kunis 2 Antje Vollrath 1 1 Institut für Mathematik, Universität zu Lübeck 2 Fakultät für

More information

Fast Ewald Summation based on NFFT with Mixed Periodicity

Fast Ewald Summation based on NFFT with Mixed Periodicity Fast Ewald Summation based on NFFT with Mixed Periodicity Franziska Nestler, Michael Pippig and Daniel Potts In this paper we develop new fast Fourier-based methods for the Coulomb problem. We combine

More information

NFFT Tutorial. potts/nfft

NFFT Tutorial.  potts/nfft NFFT 3.0 - Tutorial Jens Keiner Stefan Kunis Daniel Potts http://www.tu-chemnitz.de/ potts/nfft keiner@math.uni-luebeck.de,university of Lübeck, Institute of Mathematics, 23560 Lübeck kunis@mathematik.tu-chemnitz.de,

More information

Periodic motions. Periodic motions are known since the beginning of mankind: Motion of planets around the Sun; Pendulum; And many more...

Periodic motions. Periodic motions are known since the beginning of mankind: Motion of planets around the Sun; Pendulum; And many more... Periodic motions Periodic motions are known since the beginning of mankind: Motion of planets around the Sun; Pendulum; And many more... Periodic motions There are several quantities which describe a periodic

More information

Recent Results for Moving Least Squares Approximation

Recent Results for Moving Least Squares Approximation Recent Results for Moving Least Squares Approximation Gregory E. Fasshauer and Jack G. Zhang Abstract. We describe two experiments recently conducted with the approximate moving least squares (MLS) approximation

More information

A Fast Algorithm for Spherical Grid Rotations and its Application to Singular Quadrature

A Fast Algorithm for Spherical Grid Rotations and its Application to Singular Quadrature A Fast Algorithm for Spherical Grid Rotations and its Application to Singular Quadrature Zydrunas Gimbutas Shravan Veerapaneni September 11, 2013 Abstract We present a fast and accurate algorithm for evaluating

More information

Dense Arithmetic over Finite Fields with CUMODP

Dense Arithmetic over Finite Fields with CUMODP Dense Arithmetic over Finite Fields with CUMODP Sardar Anisul Haque 1 Xin Li 2 Farnam Mansouri 1 Marc Moreno Maza 1 Wei Pan 3 Ning Xie 1 1 University of Western Ontario, Canada 2 Universidad Carlos III,

More information

Numerical Minimization of Potential Energies on Specific Manifolds

Numerical Minimization of Potential Energies on Specific Manifolds Numerical Minimization of Potential Energies on Specific Manifolds SIAM Conference on Applied Linear Algebra 2012 22 June 2012, Valencia Manuel Gra f 1 1 Chemnitz University of Technology, Germany, supported

More information

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics SPARSE SOLVERS FOR THE POISSON EQUATION Margreet Nool CWI, Multiscale Dynamics November 9, 2015 OUTLINE OF THIS TALK 1 FISHPACK, LAPACK, PARDISO 2 SYSTEM OVERVIEW OF CARTESIUS 3 POISSON EQUATION 4 SOLVERS

More information

Fourier methods. Lukas Palatinus. Institute of Physics AS CR, Prague, Czechia

Fourier methods. Lukas Palatinus. Institute of Physics AS CR, Prague, Czechia Fourier methods Lukas Palatinus Institute of Physics AS CR, Prague, Czechia Outline Fourier transform: mathematician s viewpoint crystallographer s viewpoint programmer s viewpoint Practical aspects libraries,

More information

Introduction to the FFT

Introduction to the FFT Introduction to the FFT 1 Introduction Assume that we have a signal fx where x denotes time. We would like to represent fx as a linear combination of functions e πiax or, equivalently, sinπax and cosπax

More information

Numerical Approximation Methods for Non-Uniform Fourier Data

Numerical Approximation Methods for Non-Uniform Fourier Data Numerical Approximation Methods for Non-Uniform Fourier Data Aditya Viswanathan aditya@math.msu.edu 2014 Joint Mathematics Meetings January 18 2014 0 / 16 Joint work with Anne Gelb (Arizona State) Guohui

More information

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems TR-0-07 A Comparison of the Performance of ::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems Ang Li, Omkar Deshmukh, Radu Serban, Dan Negrut May, 0 Abstract ::GPU is a

More information

5.6 Convolution and FFT

5.6 Convolution and FFT 5.6 Convolution and FFT Fast Fourier Transform: Applications Applications. Optics, acoustics, quantum physics, telecommunications, control systems, signal processing, speech recognition, data compression,

More information

Sparse Fourier Transform via Butterfly Algorithm

Sparse Fourier Transform via Butterfly Algorithm Sparse Fourier Transform via Butterfly Algorithm Lexing Ying Department of Mathematics, University of Texas, Austin, TX 78712 December 2007, revised June 2008 Abstract This paper introduces a fast algorithm

More information

CRYSTAL in parallel: replicated and distributed (MPP) data. Why parallel?

CRYSTAL in parallel: replicated and distributed (MPP) data. Why parallel? CRYSTAL in parallel: replicated and distributed (MPP) data Roberto Orlando Dipartimento di Chimica Università di Torino Via Pietro Giuria 5, 10125 Torino (Italy) roberto.orlando@unito.it 1 Why parallel?

More information

Algorithm Design and Analysis

Algorithm Design and Analysis Algorithm Design and Analysis LECTURE 14 Divide and Conquer Fast Fourier Transform Sofya Raskhodnikova 10/7/2016 S. Raskhodnikova; based on slides by K. Wayne. 5.6 Convolution and FFT Fast Fourier Transform:

More information

Review for the Midterm Exam

Review for the Midterm Exam Review for the Midterm Exam 1 Three Questions of the Computational Science Prelim scaled speedup network topologies work stealing 2 The in-class Spring 2012 Midterm Exam pleasingly parallel computations

More information

Explore Computational Power of GPU in Electromagnetics and Micromagnetics

Explore Computational Power of GPU in Electromagnetics and Micromagnetics Explore Computational Power of GPU in Electromagnetics and Micromagnetics Presenter: Sidi Fu, PhD candidate, UC San Diego Advisor: Prof. Vitaliy Lomakin Center of Magnetic Recording Research, Department

More information

Non-Uniform Fast Fourier Transformation (NUFFT) and Magnetic Resonace Imaging 2008 년 11 월 27 일. Lee, June-Yub (Ewha Womans University)

Non-Uniform Fast Fourier Transformation (NUFFT) and Magnetic Resonace Imaging 2008 년 11 월 27 일. Lee, June-Yub (Ewha Womans University) Non-Uniform Fast Fourier Transformation (NUFFT) and Magnetic Resonace Imaging 2008 년 11 월 27 일 Lee, June-Yub (Ewha Womans University) 1 NUFFT and MRI Nonuniform Fast Fourier Transformation (NUFFT) Basics

More information

Direct Methods for Reconstruction of Functions and their Edges from Non-Uniform Fourier Data

Direct Methods for Reconstruction of Functions and their Edges from Non-Uniform Fourier Data Direct Methods for Reconstruction of Functions and their Edges from Non-Uniform Fourier Data Aditya Viswanathan aditya@math.msu.edu ICERM Research Cluster Computational Challenges in Sparse and Redundant

More information

Parallel Numerical Algorithms

Parallel Numerical Algorithms Parallel Numerical Algorithms Chapter 13 Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign CS 554 / CSE 512 Michael T. Heath Parallel Numerical Algorithms

More information

Effects of Forcing Scheme on the Flow and the Relative Motion of Inertial Particles in DNS of Isotropic Turbulence

Effects of Forcing Scheme on the Flow and the Relative Motion of Inertial Particles in DNS of Isotropic Turbulence Effects of Forcing Scheme on the Flow and the Relative Motion of Inertial Particles in DNS of Isotropic Turbulence Rohit Dhariwal PI: Sarma L. Rani Department of Mechanical and Aerospace Engineering The

More information

HIGHLY EFFECTIVE STABLE EVALUATION OF BANDLIMITED FUNCTIONS ON THE SPHERE

HIGHLY EFFECTIVE STABLE EVALUATION OF BANDLIMITED FUNCTIONS ON THE SPHERE HIGHLY EFFECTIVE STABLE EVALUATION OF BANDLIMITED FUNCTIONS ON THE SPHERE KAMEN IVANOV AND PENCHO PETRUSHEV Abstract. An algorithm for fast and accurate evaluation of band-limited functions at many scattered

More information

Chapter 2 Algorithms for Periodic Functions

Chapter 2 Algorithms for Periodic Functions Chapter 2 Algorithms for Periodic Functions In this chapter we show how to compute the Discrete Fourier Transform using a Fast Fourier Transform (FFT) algorithm, including not-so special case situations

More information

Welcome to MCS 572. content and organization expectations of the course. definition and classification

Welcome to MCS 572. content and organization expectations of the course. definition and classification Welcome to MCS 572 1 About the Course content and organization expectations of the course 2 Supercomputing definition and classification 3 Measuring Performance speedup and efficiency Amdahl s Law Gustafson

More information

Interpolation lattices for hyperbolic cross trigonometric polynomials

Interpolation lattices for hyperbolic cross trigonometric polynomials Interpolation lattices for hyperbolic cross trigonometric polynomials Lutz Kämmerer a, Stefan Kunis b, Daniel Potts a a Chemnitz University of Technology, Faculty of Mathematics, 09107 Chemnitz, Germany

More information

Cyclops Tensor Framework

Cyclops Tensor Framework Cyclops Tensor Framework Edgar Solomonik Department of EECS, Computer Science Division, UC Berkeley March 17, 2014 1 / 29 Edgar Solomonik Cyclops Tensor Framework 1/ 29 Definition of a tensor A rank r

More information

A Fast Spherical Filter with Uniform Resolution

A Fast Spherical Filter with Uniform Resolution JOURNAL OF COMPUTATIONAL PHYSICS 136, 580 584 (1997) ARTICLE NO. CP975782 A Fast Spherical Filter with Uniform Resolution Rüdiger Jakob-Chien*, and Bradley K. Alpert *Department of Computer Science & Engineering,

More information

Performance optimization of WEST and Qbox on Intel Knights Landing

Performance optimization of WEST and Qbox on Intel Knights Landing Performance optimization of WEST and Qbox on Intel Knights Landing Huihuo Zheng 1, Christopher Knight 1, Giulia Galli 1,2, Marco Govoni 1,2, and Francois Gygi 3 1 Argonne National Laboratory 2 University

More information

The Ongoing Development of CSDP

The Ongoing Development of CSDP The Ongoing Development of CSDP Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM 87801 borchers@nmt.edu Joseph Young Department of Mathematics New Mexico Tech (Now at Rice University)

More information

Fourier extension and sampling on the sphere

Fourier extension and sampling on the sphere Fourier extension and sampling on the sphere Daniel Potts Technische Universität Chemnitz Faculty of Mathematics 97 Chemnitz, Germany Email: potts@mathematiktu-chemnitzde iel Van Buggenhout KU Leuven Faculty

More information

Dark Energy and Massive Neutrino Universe Covariances

Dark Energy and Massive Neutrino Universe Covariances Dark Energy and Massive Neutrino Universe Covariances (DEMNUniCov) Carmelita Carbone Physics Dept, Milan University & INAF-Brera Collaborators: M. Calabrese, M. Zennaro, G. Fabbian, J. Bel November 30

More information

ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers

ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers Victor Yu and the ELSI team Department of Mechanical Engineering & Materials Science Duke University Kohn-Sham Density-Functional

More information

A CUDA Solver for Helmholtz Equation

A CUDA Solver for Helmholtz Equation Journal of Computational Information Systems 11: 24 (2015) 7805 7812 Available at http://www.jofcis.com A CUDA Solver for Helmholtz Equation Mingming REN 1,2,, Xiaoguang LIU 1,2, Gang WANG 1,2 1 College

More information

Discretization of PDEs and Tools for the Parallel Solution of the Resulting Systems

Discretization of PDEs and Tools for the Parallel Solution of the Resulting Systems Discretization of PDEs and Tools for the Parallel Solution of the Resulting Systems Stan Tomov Innovative Computing Laboratory Computer Science Department The University of Tennessee Wednesday April 4,

More information

Non-Equispaced Fast Fourier Transforms in Turbulence Simulation

Non-Equispaced Fast Fourier Transforms in Turbulence Simulation University of Massachusetts Amherst ScholarWorks@UMass Amherst Masters Theses Dissertations and Theses 2017 Non-Equispaced Fast Fourier Transforms in Turbulence Simulation Aditya M. Kulkarni University

More information

Improvements for Implicit Linear Equation Solvers

Improvements for Implicit Linear Equation Solvers Improvements for Implicit Linear Equation Solvers Roger Grimes, Bob Lucas, Clement Weisbecker Livermore Software Technology Corporation Abstract Solving large sparse linear systems of equations is often

More information

The parallelization of the Keller box method on heterogeneous cluster of workstations

The parallelization of the Keller box method on heterogeneous cluster of workstations Available online at http://wwwibnusinautmmy/jfs Journal of Fundamental Sciences Article The parallelization of the Keller box method on heterogeneous cluster of workstations Norhafiza Hamzah*, Norma Alias,

More information

Strassen s Algorithm for Tensor Contraction

Strassen s Algorithm for Tensor Contraction Strassen s Algorithm for Tensor Contraction Jianyu Huang, Devin A. Matthews, Robert A. van de Geijn The University of Texas at Austin September 14-15, 2017 Tensor Computation Workshop Flatiron Institute,

More information

Data analysis of massive data sets a Planck example

Data analysis of massive data sets a Planck example Data analysis of massive data sets a Planck example Radek Stompor (APC) LOFAR workshop, Meudon, 29/03/06 Outline 1. Planck mission; 2. Planck data set; 3. Planck data analysis plan and challenges; 4. Planck

More information

Fast Algorithms for Spherical Harmonic Expansions

Fast Algorithms for Spherical Harmonic Expansions Fast Algorithms for Spherical Harmonic Expansions Vladimir Rokhlin and Mark Tygert Research Report YALEU/DCS/RR-1309 December 17, 004 Abstract An algorithm is introduced for the rapid evaluation at appropriately

More information

Linear Scale-Space. Algorithms for Linear Scale-Space. Gaussian Image Derivatives. Rein van den Boomgaard University of Amsterdam.

Linear Scale-Space. Algorithms for Linear Scale-Space. Gaussian Image Derivatives. Rein van den Boomgaard University of Amsterdam. $ ( Linear Scale-Space Linear scale-space is constructed by embedding an image family of images that satisfy the diffusion equation: into a one parameter with initial condition convolution: The solution

More information

INITIAL INTEGRATION AND EVALUATION

INITIAL INTEGRATION AND EVALUATION INITIAL INTEGRATION AND EVALUATION OF SLATE PARALLEL BLAS IN LATTE Marc Cawkwell, Danny Perez, Arthur Voter Asim YarKhan, Gerald Ragghianti, Jack Dongarra, Introduction The aim of the joint milestone STMS10-52

More information

arxiv: v1 [physics.comp-ph] 22 Nov 2012

arxiv: v1 [physics.comp-ph] 22 Nov 2012 A Customized 3D GPU Poisson Solver for Free BCs Nazim Dugan a, Luigi Genovese b, Stefan Goedecker a, a Department of Physics, University of Basel, Klingelbergstr. 82, 4056 Basel, Switzerland b Laboratoire

More information

Building a better non-uniform fast Fourier transform

Building a better non-uniform fast Fourier transform Building a better non-uniform fast Fourier transform ICERM 3/12/18 Alex Barnett (Center for Computational Biology, Flatiron Institute) This work is collaboration with Jeremy Magland. We benefited from

More information

Binding Performance and Power of Dense Linear Algebra Operations

Binding Performance and Power of Dense Linear Algebra Operations 10th IEEE International Symposium on Parallel and Distributed Processing with Applications Binding Performance and Power of Dense Linear Algebra Operations Maria Barreda, Manuel F. Dolz, Rafael Mayo, Enrique

More information

Development of Molecular Dynamics Simulation System for Large-Scale Supra-Biomolecules, PABIOS (PArallel BIOmolecular Simulator)

Development of Molecular Dynamics Simulation System for Large-Scale Supra-Biomolecules, PABIOS (PArallel BIOmolecular Simulator) Development of Molecular Dynamics Simulation System for Large-Scale Supra-Biomolecules, PABIOS (PArallel BIOmolecular Simulator) Group Representative Hisashi Ishida Japan Atomic Energy Research Institute

More information

RWTH Aachen University

RWTH Aachen University IPCC @ RWTH Aachen University Optimization of multibody and long-range solvers in LAMMPS Rodrigo Canales William McDoniel Markus Höhnerbach Ahmed E. Ismail Paolo Bientinesi IPCC Showcase November 2016

More information

On the computation of spherical designs by a new optimization approach based on fast spherical Fourier transforms

On the computation of spherical designs by a new optimization approach based on fast spherical Fourier transforms On the computation of spherical designs by a new optimization approach based on fast spherical Fourier transforms anuel Gräf Daniel Potts In this paper we consider the problem of finding numerical spherical

More information

Direct Self-Consistent Field Computations on GPU Clusters

Direct Self-Consistent Field Computations on GPU Clusters Direct Self-Consistent Field Computations on GPU Clusters Guochun Shi, Volodymyr Kindratenko National Center for Supercomputing Applications University of Illinois at UrbanaChampaign Ivan Ufimtsev, Todd

More information

Faster arithmetic for number-theoretic transforms

Faster arithmetic for number-theoretic transforms University of New South Wales 7th October 2011, Macquarie University Plan for talk 1. Review number-theoretic transform (NTT) 2. Discuss typical butterfly algorithm 3. Improvements to butterfly algorithm

More information

Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem

Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem Katharina Kormann 1 Klaus Reuter 2 Markus Rampp 2 Eric Sonnendrücker 1 1 Max Planck Institut für Plasmaphysik 2 Max Planck Computing

More information

c 2006 Society for Industrial and Applied Mathematics

c 2006 Society for Industrial and Applied Mathematics SIAM J. SCI. COMPUT. Vol. 7, No. 6, pp. 1903 198 c 006 Society for Industrial and Applied Mathematics FAST ALGORITHMS FOR SPHERICAL HARMONIC EXPANSIONS VLADIMIR ROKHLIN AND MARK TYGERT Abstract. An algorithm

More information

Hybrid parallelization of a pseudo-spectral DNS code and its computational performance on RZG s idataplex system Hydra
