Introduction to Practical FFT and NFFT
|
|
- Laura Neal
- 5 years ago
- Views:
Transcription
1 Introduction to Practical FFT and NFFT Michael Pippig and Daniel Potts Department of Mathematics Chemnitz University of Technology September 14, 211 supported by BMBF grant 1IH81B
2 Table of Contents 1 Serial FFT Algorithms 2 Parallel FFT Algorithms 3 Parallel Fast Summation
3 Outline 1 Serial FFT Algorithms 2 Parallel FFT Algorithms 3 Parallel Fast Summation
4 Discrete Fast Fourier Transform Task of 3d-DFT Consider a three-dimensional dataset of n n 1 n 2 complex numbers ĝ k k 1 k 2 C. Compute g l l 1 l 2 = n 1 n 1 1 n 2 1 k = k 1 = k 2 = for all l t =,..., n t 1 (t =, 1, 2). Realized by 3d-FFT (n = n 1 = n 2 = n) O(n 3 log n) instead of O(n 6 ) ĝ k k 1 k 2 e 2πi(k 2 l 2 n2 +k 1 l 1 n1 +k l n )
5 FFT Software Libraries Examples of FFT implementations IBM ESSL Intel MKL FFTW Features of FFTW [Frigo, Johnson] public available open source high performance many transforms arbitrary size d-dim. FFT in place FFT Available at collect wisdom adjust planning easy interface
6 Using FFTW FFTW Plan - only once hardware adaptive time consuming Execute - several times fast transform Finalize - only once free memory FFTW_ESTIMATE heuristic choice of algorithm FFTW_MEASURE compare different algorithms FFTW_PATIENT compare more algorithms FFTW_EXHAUSTIVE compare all available algorithms
7 FFTW Interface Basic Interface simple do a single transform Advanced Interface do a transform on multiple datasets by one call supports strided input and output use a plan on different datasets Guru Interface most powerful and most complicated combine transform and data permutation 64 bit compatible
8 Nonequispaced Discrete Fourier Transform Task of 3d-DFT and 3d-NDFT For ˆf k k 1 k 2 C compute f l l 1 l 2 := N 1 N 1 N 1 k = k 1 = k 2 = ˆfk k 1 k 2 e 2πi(k 2 l 2 N +k 1 l 1N +k l N ) (DFT) for all l t < N ( l t N < 1), t =, 1, 2, and compute f j := N 1 N 1 N 1 k = k 1 = k 2 = ˆfk k 1 k 2 e 2πi(k 2x (2) j +k 1 x (1) j +k x () j ) (NDFT) for x (t) j [, 1) (t =, 1, 2), j = 1,..., M. Realized by 3d-NFFT [NFFT software library] O(N 3 log N + log 3 ( 1 ε )M ) instead of O(N 3 M )
9 NFFT Matrix-Vector-Notation Compute f = (f j ) j C M via f = Aˆf, where A = (e 2πikx j ) k,j C M N 3 and ˆf = (ˆf k k 1 k 2 ) k C N 3. Approximation [Dutt, Rohklin 93, Beylkin 95, Steidl 96,... ] A BFD, A DF B T where D R N 3 N 3 diagonal matrix F C n3 N 3 truncated Fourier matrix (n N ) B R M n3 sparse matrix
10 NFFT Software Library NFFT Software Library [Keiner, Kunis, Potts] Available at
11 Using NFFT FFTW Plan NFFT Plan Precompute Execute Execute Finalize Finalize
12 NFFT Precompute PRE_FULL_PSI fully precomputed window function Storage: (2m + 2) d M, Computation: None PRE_PSI tensor product based precomputation Storage: d(2m + 2)M, Computation: (2m + 2) d M PRE_LIN_PSI linear interpolation from lookup table Storage: dk, Computation: 2(2m + 2) d M PRE_FG_PSI fast Gaussian gridding Storage: 2dM, Computation: (2m + 2) d M
13 Outline 1 Serial FFT Algorithms 2 Parallel FFT Algorithms 3 Parallel Fast Summation
14 Discrete Fast Fourier Transform Task of 3d-DFT Consider a three-dimensional dataset of n n 1 n 2 complex numbers ĝ k k 1 k 2 C. Compute g l l 1 l 2 := = n 1 n 1 1 n 2 1 k = k 1 = k 2 = n 1 k = n 1 1 n 2 1 k 1 = k 2 = for all l t =,..., n t 1 (t =, 1, 2). ĝ k k 1 k 2 e 2πi(k 2 l 2 n2 +k 1 l 1 n1 +k l n ) ĝ k k 1 k 2 e 2πi(k 2 l 2 n2 +k 1 l 1 Realized by 3d-FFT (n = n 1 = n 2 = n) O(n 3 log n) instead of O(n 6 ) n1 ) e 2πik l n
15 One-Dimensional Data Distribution p n 1 n 2 n T p p - number of processors n n 1 n 2 Features of FFTW [Frigo, Johnson] open source easy interface communicator arbitrary size d-dim. FFT in place FFT high performance many transforms adjust blocksize adjust planning collect wisdom Maximum Number of Processors D max (n = n 1 = n 2 = n) D max = n FFTW combines portable performance and good usability, but is not scalable enough.
16 Two-Dimensional Data Distribution [Ding, Eleftheriou et al. 3, Plimpton, Pekurovsky - P3DFFT] n 2 n 1 n p T n 2 n p n 1 T p n n 2 n 1 p - size of processor grid Maximum Number of Processors pmax 2D (n = n 1 = n 2 = n) p 2D max = n 2 n pmax 1D = n pmax 2D = n
17 Algorithms Supported by FFTW3.3 Aim Implement a new parallel FFT sofware library (PFFT) based on FFTW and the highly scalable two-dimensional data distribution. 1d-FFT Combined with Local Transposition ˆn ˆn 1 ˆn 2 ˆn ˆn 1 ˆn 2 ˆn ˆn 1 ˆn 2 (ˆn ˆn 1 ) ˆn 2 FFT2 12 FFT2 12 FFT2 21 FFT2 21 ˆn ˆn 1 n 2 ˆn 1 ˆn n 2 ˆn n 2 ˆn 1 n 2 (ˆn ˆn 1 )
18 Algorithms Supported by FFTW3.3 Transposition of One-Dimensional Distributed Data N N 1 P T N P N 1 Group two of the three dimensions to use FFTWs matrix transposition on two-dimensional decomposed data, e.g. N = n 2, N 1 = n n 1, P = p. Transposition of Two-Dimensional Distributed Data n 2 ( n p n 1 ) T n 2 p (n n 1 )
19 Two-Dimensional Distributed FFT Based on FFTW PFFT Forward Transform ˆn p ˆn 1 ˆn 2 n 2 ˆn p ˆn 1 n 1 p n 2 ˆn FFT2 21 FFT2 21 FFT2 12 n 2 ˆn p ˆn 1 n 1 n 2 ˆn p n 1 p n 2 n T T PFFT Backward Transform n 1 p n 2 n n 1 n 2 ˆn p n 2 ˆn p ˆn 1 FFT2 12 FFT 12 FFT 12 n 1 p n 2 ˆn n 2 ˆn p ˆn 1 ˆn p ˆn 1 ˆn 2 T T
20 Scaling FFT of Size on BlueGene/P 1 Perfect PFFT P3DFFT wall clock time in s number of cores
21 Scaling FFT of Size on BlueGene/P Perfect PFFT P3DFFT 2 14 speedup number of cores
22 Scaling FFT of Size on BlueGene/P wall clock time in s Perfect PFFT P3DFFT number of processors
23 Scaling FFT of Size on BlueGene/P Perfect PFFT P3DFFT speedup number of cores
24 Comparison of PFFT and P3DFFT Common Features open source high scalability portability multiple precisions Fortran interface C interface r2c FFT ghost cell support PFFT Unique Features c2c FFT completely in place FFT FFTW like interface (basic, advanced and guru) adjustable blocksize separate communicator accumulated wisdom change of planning effort without recompilation d-dimensional parallel FFT over- and downsampling
25 Ghost Cell Support GC p p
26 Oversampled & Downsampled FFT Without Library Support N N 1 P P 1 P 2 P 3 OS n n 1 P P 1 P 2 P 3 FFT ˆn ˆn 1 P 3 P 2 P 1 P PFFT Library Support N 1 N 1 N OS n P P 1 P 2 P 3 P P 1 P 2 P 3 FFT ˆn N 1 P P 1 P 2 P 3 T ˆn N 1 OS P 3 P 2 P 1 P ˆn ˆn 1 P 3 P 2 P 1 P FFT ˆn n 1 P 3 P 2 P 1 P
27 PFFT Software Library PFFT Software Library [Pippig] Available at
28 Parallel NFFT Matrix-Vector-Notation Compute f = (f j ) j C M via f = Aˆf, where A = (e 2πikx j ) k,j C M N 3 and ˆf = (ˆf k k 1 k 2 ) k C N 3. Approximation [Dutt, Rohklin 93, Beylkin 95, Steidl 96,... ] A BFD, A DF B T where D R N 3 N 3 diagonal matrix F C n3 N 3 truncated Fourier matrix (n N ) B R M n3 sparse matrix
29 PNFFT in Pictures D F B n 1 n n 2 FFT n n 1 n 2 Ghost p p1 p
30 PNFFT Software Library PNFFT Software Library [Pippig] Available at
31 Outline 1 Serial FFT Algorithms 2 Parallel FFT Algorithms 3 Parallel Fast Summation
32 Fast Summation Task Fast computation of h j = L a l K( y j x l 2 ), j = 1,..., M, y j, x l R 3 l=1 Example of Radial Kernel Function K( x 2 ) = 1 x 2,... Realized with 3d-NFFT [NFFT Software Library] O(log 3 ( 1 ε )(M + L)) instead of O(ML)
33 Parallel Fast Summation Matrix-Vector-Notation Compute h = (h j ) M j=1 via h = Ka, where K = (K( y j x l )) M,M j,l=1 and a = (a l) M l=1 CM. Standard Algorithm for Equispaced Nodes where K = FDF F C M M equispaced Fourier matrix D C M M diagonal matrix
34 Parallel Fast Summation Matrix-Vector-Notation Compute h = (h j ) M j=1 via h = Ka, where K = (K( y j x l )) M,L j,l=1 and a = (a l) L l=1 CL. Approximation [Potts, Steidl, Nieslony 24] where A 2 = ( e 2πiky ) j K A 2 DA 1 + K NF j,k CM N 3 nonequispaced Fourier matrix D C N 3 N 3 diagonal matrix A 1 = ( e 2πikx ) l nonequispaced Fourier matrix l,k CL N 3 K NF C M L sparse near field correction
35 Summary FFTW Features performance interface portability High Scalability 2d data distribution PFFT Additional Features oversampling downsampling ghost cells PNFFT Nearfield correction Parallel Fast Summation
Introduction to Practical FFT and NFFT
Introduction to Practical FFT and NFFT Michael Pippig and Daniel Potts Faculty of Mathematics Chemnitz University of Technology 07.09.2010 supported by BMBF grant 01IH08001B Table of Contents 1 Serial
More informationParticle Simulation Based on Nonequispaced Fast Fourier Transforms
Particle Simulation Based on Nonequispaced Fast Fourier Transforms Michael Pippig and Daniel Potts Chemnitz University of Technology Department of Mathematics 0907 Chemnitz, Germany E-mail: {michael.pippig,
More informationProblem descriptions: Non-uniform fast Fourier transform
Problem descriptions: Non-uniform fast Fourier transform Lukas Exl 1 1 University of Vienna, Institut für Mathematik, Oskar-Morgenstern-Platz 1, A-1090 Wien, Austria December 11, 2015 Contents 0.1 Software
More informationGenerated sets as sampling schemes for hyperbolic cross trigonometric polynomials
Generated sets as sampling schemes for hyperbolic cross trigonometric polynomials Lutz Kämmerer Chemnitz University of Technology joint work with S. Kunis and D. Potts supported by DFG-SPP 1324 1 / 16
More informationSparse and Fast Fourier Transforms in Biomedical Imaging
Sparse and Fast Fourier Transforms in Biomedical Imaging Stefan Kunis 1 1 joint work with Ines Melzer, TU Chemnitz and Helmholtz Center Munich, supported by DFG and Helmholtz association Outline 1 Trigonometric
More informationNumerical stability of nonequispaced fast Fourier transforms
Numerical stability of nonequispaced fast Fourier transforms Daniel Potts and Manfred Tasche Dedicated to Franz Locher in honor of his 65th birthday This paper presents some new results on numerical stability
More informationUsing NFFT 3 a software library for various nonequispaced fast Fourier transforms
Using NFFT 3 a software library for various nonequispaced fast Fourier transforms Jens Keiner University of Lübeck 23560 Lübeck, Germany keiner@math.uni-luebeck.de and Stefan Kunis Chemnitz University
More informationThe Nonequispaced FFT and its Applications A Mini Course at Helsinki University of Technology
The Nonequispaced FFT and its Applications A Mini Course at Helsinki University of Technology Jens Keiner 1 Stefan Kunis 2 Antje Vollrath 1 1 Institut für Mathematik, Universität zu Lübeck 2 Fakultät für
More informationFast Ewald Summation based on NFFT with Mixed Periodicity
Fast Ewald Summation based on NFFT with Mixed Periodicity Franziska Nestler, Michael Pippig and Daniel Potts In this paper we develop new fast Fourier-based methods for the Coulomb problem. We combine
More informationNFFT Tutorial. potts/nfft
NFFT 3.0 - Tutorial Jens Keiner Stefan Kunis Daniel Potts http://www.tu-chemnitz.de/ potts/nfft keiner@math.uni-luebeck.de,university of Lübeck, Institute of Mathematics, 23560 Lübeck kunis@mathematik.tu-chemnitz.de,
More informationPeriodic motions. Periodic motions are known since the beginning of mankind: Motion of planets around the Sun; Pendulum; And many more...
Periodic motions Periodic motions are known since the beginning of mankind: Motion of planets around the Sun; Pendulum; And many more... Periodic motions There are several quantities which describe a periodic
More informationRecent Results for Moving Least Squares Approximation
Recent Results for Moving Least Squares Approximation Gregory E. Fasshauer and Jack G. Zhang Abstract. We describe two experiments recently conducted with the approximate moving least squares (MLS) approximation
More informationA Fast Algorithm for Spherical Grid Rotations and its Application to Singular Quadrature
A Fast Algorithm for Spherical Grid Rotations and its Application to Singular Quadrature Zydrunas Gimbutas Shravan Veerapaneni September 11, 2013 Abstract We present a fast and accurate algorithm for evaluating
More informationDense Arithmetic over Finite Fields with CUMODP
Dense Arithmetic over Finite Fields with CUMODP Sardar Anisul Haque 1 Xin Li 2 Farnam Mansouri 1 Marc Moreno Maza 1 Wei Pan 3 Ning Xie 1 1 University of Western Ontario, Canada 2 Universidad Carlos III,
More informationNumerical Minimization of Potential Energies on Specific Manifolds
Numerical Minimization of Potential Energies on Specific Manifolds SIAM Conference on Applied Linear Algebra 2012 22 June 2012, Valencia Manuel Gra f 1 1 Chemnitz University of Technology, Germany, supported
More informationSPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics
SPARSE SOLVERS FOR THE POISSON EQUATION Margreet Nool CWI, Multiscale Dynamics November 9, 2015 OUTLINE OF THIS TALK 1 FISHPACK, LAPACK, PARDISO 2 SYSTEM OVERVIEW OF CARTESIUS 3 POISSON EQUATION 4 SOLVERS
More informationFourier methods. Lukas Palatinus. Institute of Physics AS CR, Prague, Czechia
Fourier methods Lukas Palatinus Institute of Physics AS CR, Prague, Czechia Outline Fourier transform: mathematician s viewpoint crystallographer s viewpoint programmer s viewpoint Practical aspects libraries,
More informationIntroduction to the FFT
Introduction to the FFT 1 Introduction Assume that we have a signal fx where x denotes time. We would like to represent fx as a linear combination of functions e πiax or, equivalently, sinπax and cosπax
More informationNumerical Approximation Methods for Non-Uniform Fourier Data
Numerical Approximation Methods for Non-Uniform Fourier Data Aditya Viswanathan aditya@math.msu.edu 2014 Joint Mathematics Meetings January 18 2014 0 / 16 Joint work with Anne Gelb (Arizona State) Guohui
More informationTR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems
TR-0-07 A Comparison of the Performance of ::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems Ang Li, Omkar Deshmukh, Radu Serban, Dan Negrut May, 0 Abstract ::GPU is a
More information5.6 Convolution and FFT
5.6 Convolution and FFT Fast Fourier Transform: Applications Applications. Optics, acoustics, quantum physics, telecommunications, control systems, signal processing, speech recognition, data compression,
More informationSparse Fourier Transform via Butterfly Algorithm
Sparse Fourier Transform via Butterfly Algorithm Lexing Ying Department of Mathematics, University of Texas, Austin, TX 78712 December 2007, revised June 2008 Abstract This paper introduces a fast algorithm
More informationCRYSTAL in parallel: replicated and distributed (MPP) data. Why parallel?
CRYSTAL in parallel: replicated and distributed (MPP) data Roberto Orlando Dipartimento di Chimica Università di Torino Via Pietro Giuria 5, 10125 Torino (Italy) roberto.orlando@unito.it 1 Why parallel?
More informationAlgorithm Design and Analysis
Algorithm Design and Analysis LECTURE 14 Divide and Conquer Fast Fourier Transform Sofya Raskhodnikova 10/7/2016 S. Raskhodnikova; based on slides by K. Wayne. 5.6 Convolution and FFT Fast Fourier Transform:
More informationReview for the Midterm Exam
Review for the Midterm Exam 1 Three Questions of the Computational Science Prelim scaled speedup network topologies work stealing 2 The in-class Spring 2012 Midterm Exam pleasingly parallel computations
More informationExplore Computational Power of GPU in Electromagnetics and Micromagnetics
Explore Computational Power of GPU in Electromagnetics and Micromagnetics Presenter: Sidi Fu, PhD candidate, UC San Diego Advisor: Prof. Vitaliy Lomakin Center of Magnetic Recording Research, Department
More informationNon-Uniform Fast Fourier Transformation (NUFFT) and Magnetic Resonace Imaging 2008 년 11 월 27 일. Lee, June-Yub (Ewha Womans University)
Non-Uniform Fast Fourier Transformation (NUFFT) and Magnetic Resonace Imaging 2008 년 11 월 27 일 Lee, June-Yub (Ewha Womans University) 1 NUFFT and MRI Nonuniform Fast Fourier Transformation (NUFFT) Basics
More informationDirect Methods for Reconstruction of Functions and their Edges from Non-Uniform Fourier Data
Direct Methods for Reconstruction of Functions and their Edges from Non-Uniform Fourier Data Aditya Viswanathan aditya@math.msu.edu ICERM Research Cluster Computational Challenges in Sparse and Redundant
More informationParallel Numerical Algorithms
Parallel Numerical Algorithms Chapter 13 Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign CS 554 / CSE 512 Michael T. Heath Parallel Numerical Algorithms
More informationEffects of Forcing Scheme on the Flow and the Relative Motion of Inertial Particles in DNS of Isotropic Turbulence
Effects of Forcing Scheme on the Flow and the Relative Motion of Inertial Particles in DNS of Isotropic Turbulence Rohit Dhariwal PI: Sarma L. Rani Department of Mechanical and Aerospace Engineering The
More informationHIGHLY EFFECTIVE STABLE EVALUATION OF BANDLIMITED FUNCTIONS ON THE SPHERE
HIGHLY EFFECTIVE STABLE EVALUATION OF BANDLIMITED FUNCTIONS ON THE SPHERE KAMEN IVANOV AND PENCHO PETRUSHEV Abstract. An algorithm for fast and accurate evaluation of band-limited functions at many scattered
More informationChapter 2 Algorithms for Periodic Functions
Chapter 2 Algorithms for Periodic Functions In this chapter we show how to compute the Discrete Fourier Transform using a Fast Fourier Transform (FFT) algorithm, including not-so special case situations
More informationWelcome to MCS 572. content and organization expectations of the course. definition and classification
Welcome to MCS 572 1 About the Course content and organization expectations of the course 2 Supercomputing definition and classification 3 Measuring Performance speedup and efficiency Amdahl s Law Gustafson
More informationInterpolation lattices for hyperbolic cross trigonometric polynomials
Interpolation lattices for hyperbolic cross trigonometric polynomials Lutz Kämmerer a, Stefan Kunis b, Daniel Potts a a Chemnitz University of Technology, Faculty of Mathematics, 09107 Chemnitz, Germany
More informationCyclops Tensor Framework
Cyclops Tensor Framework Edgar Solomonik Department of EECS, Computer Science Division, UC Berkeley March 17, 2014 1 / 29 Edgar Solomonik Cyclops Tensor Framework 1/ 29 Definition of a tensor A rank r
More informationA Fast Spherical Filter with Uniform Resolution
JOURNAL OF COMPUTATIONAL PHYSICS 136, 580 584 (1997) ARTICLE NO. CP975782 A Fast Spherical Filter with Uniform Resolution Rüdiger Jakob-Chien*, and Bradley K. Alpert *Department of Computer Science & Engineering,
More informationPerformance optimization of WEST and Qbox on Intel Knights Landing
Performance optimization of WEST and Qbox on Intel Knights Landing Huihuo Zheng 1, Christopher Knight 1, Giulia Galli 1,2, Marco Govoni 1,2, and Francois Gygi 3 1 Argonne National Laboratory 2 University
More informationThe Ongoing Development of CSDP
The Ongoing Development of CSDP Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM 87801 borchers@nmt.edu Joseph Young Department of Mathematics New Mexico Tech (Now at Rice University)
More informationFourier extension and sampling on the sphere
Fourier extension and sampling on the sphere Daniel Potts Technische Universität Chemnitz Faculty of Mathematics 97 Chemnitz, Germany Email: potts@mathematiktu-chemnitzde iel Van Buggenhout KU Leuven Faculty
More informationDark Energy and Massive Neutrino Universe Covariances
Dark Energy and Massive Neutrino Universe Covariances (DEMNUniCov) Carmelita Carbone Physics Dept, Milan University & INAF-Brera Collaborators: M. Calabrese, M. Zennaro, G. Fabbian, J. Bel November 30
More informationELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers
ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers Victor Yu and the ELSI team Department of Mechanical Engineering & Materials Science Duke University Kohn-Sham Density-Functional
More informationA CUDA Solver for Helmholtz Equation
Journal of Computational Information Systems 11: 24 (2015) 7805 7812 Available at http://www.jofcis.com A CUDA Solver for Helmholtz Equation Mingming REN 1,2,, Xiaoguang LIU 1,2, Gang WANG 1,2 1 College
More informationDiscretization of PDEs and Tools for the Parallel Solution of the Resulting Systems
Discretization of PDEs and Tools for the Parallel Solution of the Resulting Systems Stan Tomov Innovative Computing Laboratory Computer Science Department The University of Tennessee Wednesday April 4,
More informationNon-Equispaced Fast Fourier Transforms in Turbulence Simulation
University of Massachusetts Amherst ScholarWorks@UMass Amherst Masters Theses Dissertations and Theses 2017 Non-Equispaced Fast Fourier Transforms in Turbulence Simulation Aditya M. Kulkarni University
More informationImprovements for Implicit Linear Equation Solvers
Improvements for Implicit Linear Equation Solvers Roger Grimes, Bob Lucas, Clement Weisbecker Livermore Software Technology Corporation Abstract Solving large sparse linear systems of equations is often
More informationThe parallelization of the Keller box method on heterogeneous cluster of workstations
Available online at http://wwwibnusinautmmy/jfs Journal of Fundamental Sciences Article The parallelization of the Keller box method on heterogeneous cluster of workstations Norhafiza Hamzah*, Norma Alias,
More informationStrassen s Algorithm for Tensor Contraction
Strassen s Algorithm for Tensor Contraction Jianyu Huang, Devin A. Matthews, Robert A. van de Geijn The University of Texas at Austin September 14-15, 2017 Tensor Computation Workshop Flatiron Institute,
More informationData analysis of massive data sets a Planck example
Data analysis of massive data sets a Planck example Radek Stompor (APC) LOFAR workshop, Meudon, 29/03/06 Outline 1. Planck mission; 2. Planck data set; 3. Planck data analysis plan and challenges; 4. Planck
More informationFast Algorithms for Spherical Harmonic Expansions
Fast Algorithms for Spherical Harmonic Expansions Vladimir Rokhlin and Mark Tygert Research Report YALEU/DCS/RR-1309 December 17, 004 Abstract An algorithm is introduced for the rapid evaluation at appropriately
More informationLinear Scale-Space. Algorithms for Linear Scale-Space. Gaussian Image Derivatives. Rein van den Boomgaard University of Amsterdam.
$ ( Linear Scale-Space Linear scale-space is constructed by embedding an image family of images that satisfy the diffusion equation: into a one parameter with initial condition convolution: The solution
More informationINITIAL INTEGRATION AND EVALUATION
INITIAL INTEGRATION AND EVALUATION OF SLATE PARALLEL BLAS IN LATTE Marc Cawkwell, Danny Perez, Arthur Voter Asim YarKhan, Gerald Ragghianti, Jack Dongarra, Introduction The aim of the joint milestone STMS10-52
More informationarxiv: v1 [physics.comp-ph] 22 Nov 2012
A Customized 3D GPU Poisson Solver for Free BCs Nazim Dugan a, Luigi Genovese b, Stefan Goedecker a, a Department of Physics, University of Basel, Klingelbergstr. 82, 4056 Basel, Switzerland b Laboratoire
More informationBuilding a better non-uniform fast Fourier transform
Building a better non-uniform fast Fourier transform ICERM 3/12/18 Alex Barnett (Center for Computational Biology, Flatiron Institute) This work is collaboration with Jeremy Magland. We benefited from
More informationBinding Performance and Power of Dense Linear Algebra Operations
10th IEEE International Symposium on Parallel and Distributed Processing with Applications Binding Performance and Power of Dense Linear Algebra Operations Maria Barreda, Manuel F. Dolz, Rafael Mayo, Enrique
More informationDevelopment of Molecular Dynamics Simulation System for Large-Scale Supra-Biomolecules, PABIOS (PArallel BIOmolecular Simulator)
Development of Molecular Dynamics Simulation System for Large-Scale Supra-Biomolecules, PABIOS (PArallel BIOmolecular Simulator) Group Representative Hisashi Ishida Japan Atomic Energy Research Institute
More informationRWTH Aachen University
IPCC @ RWTH Aachen University Optimization of multibody and long-range solvers in LAMMPS Rodrigo Canales William McDoniel Markus Höhnerbach Ahmed E. Ismail Paolo Bientinesi IPCC Showcase November 2016
More informationOn the computation of spherical designs by a new optimization approach based on fast spherical Fourier transforms
On the computation of spherical designs by a new optimization approach based on fast spherical Fourier transforms anuel Gräf Daniel Potts In this paper we consider the problem of finding numerical spherical
More informationDirect Self-Consistent Field Computations on GPU Clusters
Direct Self-Consistent Field Computations on GPU Clusters Guochun Shi, Volodymyr Kindratenko National Center for Supercomputing Applications University of Illinois at UrbanaChampaign Ivan Ufimtsev, Todd
More informationFaster arithmetic for number-theoretic transforms
University of New South Wales 7th October 2011, Macquarie University Plan for talk 1. Review number-theoretic transform (NTT) 2. Discuss typical butterfly algorithm 3. Improvements to butterfly algorithm
More informationMassively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem
Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem Katharina Kormann 1 Klaus Reuter 2 Markus Rampp 2 Eric Sonnendrücker 1 1 Max Planck Institut für Plasmaphysik 2 Max Planck Computing
More informationc 2006 Society for Industrial and Applied Mathematics
SIAM J. SCI. COMPUT. Vol. 7, No. 6, pp. 1903 198 c 006 Society for Industrial and Applied Mathematics FAST ALGORITHMS FOR SPHERICAL HARMONIC EXPANSIONS VLADIMIR ROKHLIN AND MARK TYGERT Abstract. An algorithm
More informationHybrid parallelization of a pseudo-spectral DNS code and its computational performance on RZG s idataplex system Hydra
Hybrid parallelization of a pseudo-spectral DNS code and its computational performance on RZG s idataplex system Hydra Markus Rampp 1, Liang Shi 2, Marc Avila 3,2, Björn Hof 2,4 1 Computing Center of the
More informationA Fast Fourier transform based direct solver for the Helmholtz problem. AANMPDE 2017 Palaiochora, Crete, Oct 2, 2017
A Fast Fourier transform based direct solver for the Helmholtz problem Jari Toivanen Monika Wolfmayr AANMPDE 2017 Palaiochora, Crete, Oct 2, 2017 Content 1 Model problem 2 Discretization 3 Fast solver
More informationA Parallel Implementation of the. Yuan-Jye Jason Wu y. September 2, Abstract. The GTH algorithm is a very accurate direct method for nding
A Parallel Implementation of the Block-GTH algorithm Yuan-Jye Jason Wu y September 2, 1994 Abstract The GTH algorithm is a very accurate direct method for nding the stationary distribution of a nite-state,
More informationA hierarchical Model for the Analysis of Efficiency and Speed-up of Multi-Core Cluster-Computers
A hierarchical Model for the Analysis of Efficiency and Speed-up of Multi-Core Cluster-Computers H. Kredel 1, H. G. Kruse 1 retired, S. Richling2 1 IT-Center, University of Mannheim, Germany 2 IT-Center,
More informationAccelerating linear algebra computations with hybrid GPU-multicore systems.
Accelerating linear algebra computations with hybrid GPU-multicore systems. Marc Baboulin INRIA/Université Paris-Sud joint work with Jack Dongarra (University of Tennessee and Oak Ridge National Laboratory)
More informationFrequency-domain representation of discrete-time signals
4 Frequency-domain representation of discrete-time signals So far we have been looing at signals as a function of time or an index in time. Just lie continuous-time signals, we can view a time signal as
More informationTHE TWO DIMENSIONAL COUPLED NONLINEAR SCHRÖDINGER EQUATION- NUMERICAL METHODS AND EXPERIMENTS HARINI MEDIKONDURU
THE TWO DIMENSIONAL COUPLED NONLINEAR SCHRÖDINGER EQUATION- NUMERICAL METHODS AND EXPERIMENTS by HARINI MEDIKONDURU (Under the Direction of Thiab R. Taha) ABSTRACT The coupled nonlinear Schrödinger equation
More informationParameter estimation for exponential sums by approximate Prony method
Parameter estimation for exponential sums by approximate Prony method Daniel Potts a, Manfred Tasche b a Chemnitz University of Technology, Department of Mathematics, D 09107 Chemnitz, Germany b University
More informationAccelerating Model Reduction of Large Linear Systems with Graphics Processors
Accelerating Model Reduction of Large Linear Systems with Graphics Processors P. Benner 1, P. Ezzatti 2, D. Kressner 3, E.S. Quintana-Ortí 4, Alfredo Remón 4 1 Max-Plank-Institute for Dynamics of Complex
More informationSome Geometric and Algebraic Aspects of Domain Decomposition Methods
Some Geometric and Algebraic Aspects of Domain Decomposition Methods D.S.Butyugin 1, Y.L.Gurieva 1, V.P.Ilin 1,2, and D.V.Perevozkin 1 Abstract Some geometric and algebraic aspects of various domain decomposition
More informationHow to Multiply. 5.5 Integer Multiplication. Complex Multiplication. Integer Arithmetic. Complex multiplication. (a + bi) (c + di) = x + yi.
How to ultiply Slides by Kevin Wayne. Copyright 5 Pearson-Addison Wesley. All rights reserved. integers, matrices, and polynomials Complex ultiplication Complex multiplication. a + bi) c + di) = x + yi.
More informationFast Algorithms for the Computation of Oscillatory Integrals
Fast Algorithms for the Computation of Oscillatory Integrals Emmanuel Candès California Institute of Technology EPSRC Symposium Capstone Conference Warwick Mathematics Institute, July 2009 Collaborators
More informationPreliminary Results of GRAPES Helmholtz solver using GCR and PETSc tools
Preliminary Results of GRAPES Helmholtz solver using GCR and PETSc tools Xiangjun Wu (1),Lilun Zhang (2),Junqiang Song (2) and Dehui Chen (1) (1) Center for Numerical Weather Prediction, CMA (2) School
More informationOn the Computational Complexity of the Discrete Pascal Transform
6 th International Conference Logic and Applications LAP 207, September 8-22, 207, Dubrovnik, Croatia On the Computational Complexity of the Discrete Pascal Transform Dušan B. Gajić, Radomir S. Stanković
More informationA Scalable, Parallel Implementation of Weighted, Non-Linear Compact Schemes
A Scalable, Parallel Implementation of Weighted, Non-Linear Compact Schemes Debojyoti Ghosh Emil M. Constantinescu Jed Brown Mathematics Computer Science Argonne National Laboratory SIAM Annual Meeting
More informationCoskewness and Cokurtosis John W. Fowler July 9, 2005
Coskewness and Cokurtosis John W. Fowler July 9, 2005 The concept of a covariance matrix can be extended to higher moments; of particular interest are the third and fourth moments. A very common application
More informationAccelerating AES Using Instruction Set Extensions for Elliptic Curve Cryptography. Stefan Tillich, Johann Großschädl
Accelerating AES Using Instruction Set Extensions for Elliptic Curve Cryptography International Workshop on Information Security & Hiding (ISH '05) Institute for Applied Information Processing and Communications
More informationFast Poisson solvers on nonequispaced grids: Multigrid and Fourier Methods compared
Fast Poisson solvers on nonequispaced grids: Multigrid and Fourier Methods compared Gisela Pöplau a and Daniel Potts b a Institute of General Electrical Engineering, Rostock University, D 805 Rostock,
More informationFourier Reconstruction from Non-Uniform Spectral Data
School of Electrical, Computer and Energy Engineering, Arizona State University aditya.v@asu.edu With Profs. Anne Gelb, Doug Cochran and Rosemary Renaut Research supported in part by National Science Foundation
More informationThe Fastest Convolution in the West
The Fastest Convolution in the West Malcolm Roberts and John Bowman University of Alberta 2011-02-15 www.math.ualberta.ca/ mroberts 1 Outline Convolution Definition Applications Fast Convolutions Dealiasing
More informationParallel Numerical Algorithms
Parallel Numerical Algorithms Chapter 6 Matrix Models Section 6.2 Low Rank Approximation Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign CS 554 / CSE 512 Edgar
More informationPhase Retrieval from Local Measurements: Deterministic Measurement Constructions and Efficient Recovery Algorithms
Phase Retrieval from Local Measurements: Deterministic Measurement Constructions and Efficient Recovery Algorithms Aditya Viswanathan Department of Mathematics and Statistics adityavv@umich.edu www-personal.umich.edu/
More informationGLoBES. Patrick Huber. Physics Department VT. P. Huber p. 1
GLoBES Patrick Huber Physics Department VT P. Huber p. 1 P. Huber p. 2 General Long Baseline Experiment Simulator GLoBES is a software package designed for Simulation Analysis Comparison of neutrino oscillation
More informationA Framework for Discrete Integral Transformations I the Pseudo-polar Fourier transform
A Framework for Discrete Integral Transformations I the Pseudo-polar Fourier transform A. Averbuch, R.R. Coifman, D.L. Donoho, M. Israeli, and Y. Shkolnisky June 26, 2007 This paper is dedicated to the
More informationHigh precision pixel-based simulations of CMB lensing
Laboratoire APC AstroParticule et Cosmologie University of Paris V Denis Diderot 22nd April 2011 Berkeley CMB lensing workshop Outline 1 Pixel Based methods 2 LenS 2 HAT 3 Application 4 Conclusions How
More informationFastfood Approximating Kernel Expansions in Loglinear Time. Quoc Le, Tamas Sarlos, and Alex Smola Presenter: Shuai Zheng (Kyle)
Fastfood Approximating Kernel Expansions in Loglinear Time Quoc Le, Tamas Sarlos, and Alex Smola Presenter: Shuai Zheng (Kyle) Large Scale Problem: ImageNet Challenge Large scale data Number of training
More informationWEAK FORM NONUNIFORM FAST FOURIER TRANSFORM METHOD FOR SOLVING VOLUME INTEGRAL EQUATIONS
Progress In Electromagnetics Research, PIER 89, 275 289, 2009 WEAK FORM NONUNIFORM FAST FOURIER TRANSFORM METHOD FOR SOLING OLUME INTEGRAL EQUATIONS Z. H. Fan, R. S. Chen, H. Chen, and D. Z. Ding Department
More informationFFT-based Dense Polynomial Arithmetic on Multi-cores
FFT-based Dense Polynomial Arithmetic on Multi-cores Yuzhen Xie Computer Science and Artificial Intelligence Laboratory, MIT and Marc Moreno Maza Ontario Research Centre for Computer Algebra, UWO ACA 2009,
More informationFall 2011, EE123 Digital Signal Processing
Lecture 6 Miki Lustig, UCB September 11, 2012 Miki Lustig, UCB DFT and Sampling the DTFT X (e jω ) = e j4ω sin2 (5ω/2) sin 2 (ω/2) 5 x[n] 25 X(e jω ) 4 20 3 15 2 1 0 10 5 1 0 5 10 15 n 0 0 2 4 6 ω 5 reconstructed
More informationPARALLEL PSEUDO-SPECTRAL SIMULATIONS OF NONLINEAR VISCOUS FINGERING IN MIS- Center for Parallel Computations, COPPE / Federal University of Rio de
PARALLEL PSEUDO-SPECTRAL SIMULATIONS OF NONLINEAR VISCOUS FINGERING IN MIS- CIBLE DISPLACEMENTS N. Mangiavacchi, A.L.G.A. Coutinho, N.F.F. Ebecken Center for Parallel Computations, COPPE / Federal University
More informationParallel Sparse Tensor Decompositions using HiCOO Format
Figure sources: A brief survey of tensors by Berton Earnshaw and NVIDIA Tensor Cores Parallel Sparse Tensor Decompositions using HiCOO Format Jiajia Li, Jee Choi, Richard Vuduc May 8, 8 @ SIAM ALA 8 Outline
More informationV C V L T I 0 C V B 1 V T 0 I. l nk
Multifrontal Method Kailai Xu September 16, 2017 Main observation. Consider the LDL T decomposition of a SPD matrix [ ] [ ] [ ] [ ] B V T L 0 I 0 L T L A = = 1 V T V C V L T I 0 C V B 1 V T, 0 I where
More informationA parallel implementation for polynomial multiplication modulo a prime.
A parallel implementation for polynomial multiplication modulo a prime. ABSTRACT Marshall Law Department of Mathematics Simon Fraser University Burnaby, B.C. Canada. mylaw@sfu.ca. We present a parallel
More informationOptimizing GROMACS for parallel performance
Optimizing GROMACS for parallel performance Outline 1. Why optimize? Performance status quo 2. GROMACS as a black box. (PME) 3. How does GROMACS spend its time? (MPE) 4. What you can do What I want to
More informationParallelization of the QC-lib Quantum Computer Simulator Library
Parallelization of the QC-lib Quantum Computer Simulator Library Ian Glendinning and Bernhard Ömer September 9, 23 PPAM 23 1 Ian Glendinning / September 9, 23 Outline Introduction Quantum Bits, Registers
More informationParallelism in Structured Newton Computations
Parallelism in Structured Newton Computations Thomas F Coleman and Wei u Department of Combinatorics and Optimization University of Waterloo Waterloo, Ontario, Canada N2L 3G1 E-mail: tfcoleman@uwaterlooca
More informationA Primer on Three Vectors
Michael Dine Department of Physics University of California, Santa Cruz September 2010 What makes E&M hard, more than anything else, is the problem that the electric and magnetic fields are vectors, and
More informationPoisson Solvers. William McLean. April 21, Return to Math3301/Math5315 Common Material.
Poisson Solvers William McLean April 21, 2004 Return to Math3301/Math5315 Common Material 1 Introduction Many problems in applied mathematics lead to a partial differential equation of the form a 2 u +
More informationA Non-sparse Tutorial on Sparse FFTs
A Non-sparse Tutorial on Sparse FFTs Mark Iwen Michigan State University April 8, 214 M.A. Iwen (MSU) Fast Sparse FFTs April 8, 214 1 / 34 Problem Setup Recover f : [, 2π] C consisting of k trigonometric
More information