Introduction to Practical FFT and NFFT

Size: px

Start display at page:

Download "Introduction to Practical FFT and NFFT"

Laura Neal
5 years ago
Views:

1 Introduction to Practical FFT and NFFT Michael Pippig and Daniel Potts Department of Mathematics Chemnitz University of Technology September 14, 211 supported by BMBF grant 1IH81B

2 Table of Contents 1 Serial FFT Algorithms 2 Parallel FFT Algorithms 3 Parallel Fast Summation

3 Outline 1 Serial FFT Algorithms 2 Parallel FFT Algorithms 3 Parallel Fast Summation

4 Discrete Fast Fourier Transform Task of 3d-DFT Consider a three-dimensional dataset of n n 1 n 2 complex numbers ĝ k k 1 k 2 C. Compute g l l 1 l 2 = n 1 n 1 1 n 2 1 k = k 1 = k 2 = for all l t =,..., n t 1 (t =, 1, 2). Realized by 3d-FFT (n = n 1 = n 2 = n) O(n 3 log n) instead of O(n 6 ) ĝ k k 1 k 2 e 2πi(k 2 l 2 n2 +k 1 l 1 n1 +k l n )

5 FFT Software Libraries Examples of FFT implementations IBM ESSL Intel MKL FFTW Features of FFTW [Frigo, Johnson] public available open source high performance many transforms arbitrary size d-dim. FFT in place FFT Available at collect wisdom adjust planning easy interface

6 Using FFTW FFTW Plan - only once hardware adaptive time consuming Execute - several times fast transform Finalize - only once free memory FFTW_ESTIMATE heuristic choice of algorithm FFTW_MEASURE compare different algorithms FFTW_PATIENT compare more algorithms FFTW_EXHAUSTIVE compare all available algorithms

7 FFTW Interface Basic Interface simple do a single transform Advanced Interface do a transform on multiple datasets by one call supports strided input and output use a plan on different datasets Guru Interface most powerful and most complicated combine transform and data permutation 64 bit compatible

8 Nonequispaced Discrete Fourier Transform Task of 3d-DFT and 3d-NDFT For ˆf k k 1 k 2 C compute f l l 1 l 2 := N 1 N 1 N 1 k = k 1 = k 2 = ˆfk k 1 k 2 e 2πi(k 2 l 2 N +k 1 l 1N +k l N ) (DFT) for all l t < N ( l t N < 1), t =, 1, 2, and compute f j := N 1 N 1 N 1 k = k 1 = k 2 = ˆfk k 1 k 2 e 2πi(k 2x (2) j +k 1 x (1) j +k x () j ) (NDFT) for x (t) j [, 1) (t =, 1, 2), j = 1,..., M. Realized by 3d-NFFT [NFFT software library] O(N 3 log N + log 3 ( 1 ε )M ) instead of O(N 3 M )

9 NFFT Matrix-Vector-Notation Compute f = (f j ) j C M via f = Aˆf, where A = (e 2πikx j ) k,j C M N 3 and ˆf = (ˆf k k 1 k 2 ) k C N 3. Approximation [Dutt, Rohklin 93, Beylkin 95, Steidl 96,... ] A BFD, A DF B T where D R N 3 N 3 diagonal matrix F C n3 N 3 truncated Fourier matrix (n N ) B R M n3 sparse matrix

10 NFFT Software Library NFFT Software Library [Keiner, Kunis, Potts] Available at

11 Using NFFT FFTW Plan NFFT Plan Precompute Execute Execute Finalize Finalize

12 NFFT Precompute PRE_FULL_PSI fully precomputed window function Storage: (2m + 2) d M, Computation: None PRE_PSI tensor product based precomputation Storage: d(2m + 2)M, Computation: (2m + 2) d M PRE_LIN_PSI linear interpolation from lookup table Storage: dk, Computation: 2(2m + 2) d M PRE_FG_PSI fast Gaussian gridding Storage: 2dM, Computation: (2m + 2) d M

13 Outline 1 Serial FFT Algorithms 2 Parallel FFT Algorithms 3 Parallel Fast Summation

14 Discrete Fast Fourier Transform Task of 3d-DFT Consider a three-dimensional dataset of n n 1 n 2 complex numbers ĝ k k 1 k 2 C. Compute g l l 1 l 2 := = n 1 n 1 1 n 2 1 k = k 1 = k 2 = n 1 k = n 1 1 n 2 1 k 1 = k 2 = for all l t =,..., n t 1 (t =, 1, 2). ĝ k k 1 k 2 e 2πi(k 2 l 2 n2 +k 1 l 1 n1 +k l n ) ĝ k k 1 k 2 e 2πi(k 2 l 2 n2 +k 1 l 1 Realized by 3d-FFT (n = n 1 = n 2 = n) O(n 3 log n) instead of O(n 6 ) n1 ) e 2πik l n

15 One-Dimensional Data Distribution p n 1 n 2 n T p p - number of processors n n 1 n 2 Features of FFTW [Frigo, Johnson] open source easy interface communicator arbitrary size d-dim. FFT in place FFT high performance many transforms adjust blocksize adjust planning collect wisdom Maximum Number of Processors D max (n = n 1 = n 2 = n) D max = n FFTW combines portable performance and good usability, but is not scalable enough.

16 Two-Dimensional Data Distribution [Ding, Eleftheriou et al. 3, Plimpton, Pekurovsky - P3DFFT] n 2 n 1 n p T n 2 n p n 1 T p n n 2 n 1 p - size of processor grid Maximum Number of Processors pmax 2D (n = n 1 = n 2 = n) p 2D max = n 2 n pmax 1D = n pmax 2D = n

17 Algorithms Supported by FFTW3.3 Aim Implement a new parallel FFT sofware library (PFFT) based on FFTW and the highly scalable two-dimensional data distribution. 1d-FFT Combined with Local Transposition ˆn ˆn 1 ˆn 2 ˆn ˆn 1 ˆn 2 ˆn ˆn 1 ˆn 2 (ˆn ˆn 1 ) ˆn 2 FFT2 12 FFT2 12 FFT2 21 FFT2 21 ˆn ˆn 1 n 2 ˆn 1 ˆn n 2 ˆn n 2 ˆn 1 n 2 (ˆn ˆn 1 )

18 Algorithms Supported by FFTW3.3 Transposition of One-Dimensional Distributed Data N N 1 P T N P N 1 Group two of the three dimensions to use FFTWs matrix transposition on two-dimensional decomposed data, e.g. N = n 2, N 1 = n n 1, P = p. Transposition of Two-Dimensional Distributed Data n 2 ( n p n 1 ) T n 2 p (n n 1 )

19 Two-Dimensional Distributed FFT Based on FFTW PFFT Forward Transform ˆn p ˆn 1 ˆn 2 n 2 ˆn p ˆn 1 n 1 p n 2 ˆn FFT2 21 FFT2 21 FFT2 12 n 2 ˆn p ˆn 1 n 1 n 2 ˆn p n 1 p n 2 n T T PFFT Backward Transform n 1 p n 2 n n 1 n 2 ˆn p n 2 ˆn p ˆn 1 FFT2 12 FFT 12 FFT 12 n 1 p n 2 ˆn n 2 ˆn p ˆn 1 ˆn p ˆn 1 ˆn 2 T T

20 Scaling FFT of Size on BlueGene/P 1 Perfect PFFT P3DFFT wall clock time in s number of cores

21 Scaling FFT of Size on BlueGene/P Perfect PFFT P3DFFT 2 14 speedup number of cores

22 Scaling FFT of Size on BlueGene/P wall clock time in s Perfect PFFT P3DFFT number of processors

23 Scaling FFT of Size on BlueGene/P Perfect PFFT P3DFFT speedup number of cores

24 Comparison of PFFT and P3DFFT Common Features open source high scalability portability multiple precisions Fortran interface C interface r2c FFT ghost cell support PFFT Unique Features c2c FFT completely in place FFT FFTW like interface (basic, advanced and guru) adjustable blocksize separate communicator accumulated wisdom change of planning effort without recompilation d-dimensional parallel FFT over- and downsampling

25 Ghost Cell Support GC p p

26 Oversampled & Downsampled FFT Without Library Support N N 1 P P 1 P 2 P 3 OS n n 1 P P 1 P 2 P 3 FFT ˆn ˆn 1 P 3 P 2 P 1 P PFFT Library Support N 1 N 1 N OS n P P 1 P 2 P 3 P P 1 P 2 P 3 FFT ˆn N 1 P P 1 P 2 P 3 T ˆn N 1 OS P 3 P 2 P 1 P ˆn ˆn 1 P 3 P 2 P 1 P FFT ˆn n 1 P 3 P 2 P 1 P

27 PFFT Software Library PFFT Software Library [Pippig] Available at

28 Parallel NFFT Matrix-Vector-Notation Compute f = (f j ) j C M via f = Aˆf, where A = (e 2πikx j ) k,j C M N 3 and ˆf = (ˆf k k 1 k 2 ) k C N 3. Approximation [Dutt, Rohklin 93, Beylkin 95, Steidl 96,... ] A BFD, A DF B T where D R N 3 N 3 diagonal matrix F C n3 N 3 truncated Fourier matrix (n N ) B R M n3 sparse matrix

29 PNFFT in Pictures D F B n 1 n n 2 FFT n n 1 n 2 Ghost p p1 p

30 PNFFT Software Library PNFFT Software Library [Pippig] Available at

31 Outline 1 Serial FFT Algorithms 2 Parallel FFT Algorithms 3 Parallel Fast Summation

32 Fast Summation Task Fast computation of h j = L a l K( y j x l 2 ), j = 1,..., M, y j, x l R 3 l=1 Example of Radial Kernel Function K( x 2 ) = 1 x 2,... Realized with 3d-NFFT [NFFT Software Library] O(log 3 ( 1 ε )(M + L)) instead of O(ML)

33 Parallel Fast Summation Matrix-Vector-Notation Compute h = (h j ) M j=1 via h = Ka, where K = (K( y j x l )) M,M j,l=1 and a = (a l) M l=1 CM. Standard Algorithm for Equispaced Nodes where K = FDF F C M M equispaced Fourier matrix D C M M diagonal matrix

34 Parallel Fast Summation Matrix-Vector-Notation Compute h = (h j ) M j=1 via h = Ka, where K = (K( y j x l )) M,L j,l=1 and a = (a l) L l=1 CL. Approximation [Potts, Steidl, Nieslony 24] where A 2 = ( e 2πiky ) j K A 2 DA 1 + K NF j,k CM N 3 nonequispaced Fourier matrix D C N 3 N 3 diagonal matrix A 1 = ( e 2πikx ) l nonequispaced Fourier matrix l,k CL N 3 K NF C M L sparse near field correction

35 Summary FFTW Features performance interface portability High Scalability 2d data distribution PFFT Additional Features oversampling downsampling ghost cells PNFFT Nearfield correction Parallel Fast Summation

Introduction to Practical FFT and NFFT

Introduction to Practical FFT and NFFT Michael Pippig and Daniel Potts Faculty of Mathematics Chemnitz University of Technology 07.09.2010 supported by BMBF grant 01IH08001B Table of Contents 1 Serial