Fast Gauss trasforms with complex parameters usig NFFTs Stefa Kuis Daiel Potts Gabriele Steidl Jue 3, 5 We costruct a fast algorithm for the computatio of discrete Gauss trasforms with complex parameters, capable of dealig with o equispaced poits. Our algorithm is based o the fast Fourier trasform at o equispaced kots ad requires oly O(N) arithmetic operatios. Key words ad phrases. Gauss trasform; Uequally spaced Fourier trasforms; Fast algorithms; Chirped Gaussia; NFFT Itroductio Give complex coefficiets α k C ad source kots x k [ 4, 4 ], our goal cosists i the fast evaluatio of the sum f(y) = α k e σ y x k (.) k= at the target kots y j [ 4, 4 ], j =,...,M, where σ = a + ib, a >,b R deotes a complex parameter. Fast Gauss trasforms for real parameters σ were developed, e.g., i [5, 8, 9]. I [], we have specified a more geeral fast summatio algorithm for the Gaussia kerel. Recetly, a fast Gauss trasform for complex parameters σ with arithmetic complexity O(N log N + M) was itroduced by Adersso ad Beylki []. I this paper, we show how our geeral fast summatio algorithm developed i [,, 6] ca be specified for the Gaussia kerel with complex parameter σ to obtai a fast Gauss trasform with arithmetic complexity O(N +M). This results i a simpler algorithm tha those i [] with competitive performace i practice. We prove error estimates cocerig the depedece of the computatioal speed o the desired accuracy ad the parameters a ad σ. The heart of our algorithm is the discrete Fourier trasform for o equispaced kots (NDFT), i.e., the evaluatio of g j = l= ĝ l e πily j, j =,...,M, (.) kuis@math.ui-luebeck.de, Uiversity of Lübeck, Istitute of Mathematics, D 356 Lübeck potts@math.ui-luebeck.de, Uiversity of Lübeck, Istitute of Mathematics, D 356 Lübeck steidl@math.ui-maheim.de, Uiversity of Maheim, Istitute of Mathematics, D 683 Maheim
ad its adjoit M ĝ l = g j e πily j, l =,..., j=, (.3) for arbitrary kots y j [, ) ad a degree N. Startig with [5, ], there exists meawhile a broad literature o fast Fourier trasforms for o equispaced kots (NFFT), takig O((ρ) log(ρ) +mm) arithmetic operatios for the approximate computatio of (.) ad (.3). Here ρ is a oversamplig factor ad m a cut off parameter. Both parameters have to be chose i accordace with the desired accuracy of the NFFT computatios. I geeral, the approximatio error itroduced by the NFFT decreases expoetially i m with basis depedig o ρ. A uified approach to NFFTs was give i [4, 3] ad a correspodig software package ca be foud i []. The remaider of this paper is orgaised as follows: I Sectio, we modify our fast summatio algorithm for the Gaussia kerel with complex parameter σ. I Sectio 3, we prove error bouds for the fast Gauss trasform to justify its arithmetic complexity O(N +M). Fially, Sectio 4 presets various umerical experimets. Fast Gauss trasform A algorithm for the fast computatio of sums of the form f(y j ) := α k K(y j x k ) k= j =,...,M, where K are special real valued kerels was proposed for oe dimesio i [] ad geeralised to the multivariate settig i []. We wat to apply this method to the kerel K(x) = e σx, where σ = a + bi = σ e iϕ, a >, b R, ϕ := arcta b ( a π, π ) is a complex parameter. Followig the lies of [], we start by approximatig K by a periodic fuctio. For some period p, let K P (x) := r Z K(x p r) (.) deote the periodisatio of K. The uiformly coverget Fourier series of K P with Fourier coefficiets b l = p p p K P (x)e πilx/p dx = p e σx e πilx/p dx = π p σ e l π /(σp ), (.) where σ = σ e i ϕ is trucated with a degree N by K (x) := b l e πilx/p. (.3) l=
For y [ 4, 4 ], we approximate f by f(y) := α k K (y x k ), (.4) k= where x k [ 4, 4 ]. Substitutig (.3) ito (.4) we obtai f(y j ) = k= α k l= b l e πil(y j x k )/p = l= ( N ) b l α k e πilx k/p e πilyj/p. The expressio i the ier brackets ca be computed by a adjoit NFFT i O(N +log ) arithmetic operatios. This is followed by multiplicatios with b l ad completed by a NFFT to compute the outer sum i O(M + log ) arithmetic operatios. I Sectio 3, we will prove that depeds oly o the desired accuracy of our algorithm ad o the complex parameter σ, but ot o the umbers M ad N. Thus, the overall arithmetic complexity of our algorithm is O(N +M), i particular this performace does ot deped o the distributio of the poits x k ad y j. I summary, we propose the followig algorithm. k= Algorithm.. For l =,...,, compute a l = N α k e πilxk/p by the adjoit NFFT. k=. For l =,...,, compute the products d l = a l b l. 3. For j =,...,M, compute f(y j ) = l= d l e πily j/p by the NFFT. Remark.. Step ad 3 of our algorithm ca be also realized by NDFTs i O(N) ad O(M) operatios, respectively, yieldig a O(M + N) algorithm, too.. From aother poit of view, the fuctio K was obtaied by applyig the trapezoidal quadrature rule withi the iterval [ /(p),/(p)] as follows: e σx = π σ e y π /σ e πixy dy π σ p l= e l p π /σ e πix l p. It may be iterestig to further reduce the umber of summads i K by applyig a more advaced quadrature rule as the geeralized Gaussia quadrature proposed i [3] or the method i [4]. However, this ivolves a lot of work to determie the correspodig kots ad weights which is beyod the scope of this paper. 3
3 Error estimates Beyod the well kow errors appearig i the NFFT computatios, see [], our algorithm produces for j =,...,M the errors f(y j ) f(y j ) = α k K ERR (y j x k ) k= α K ERR, where K ERR := K K, K ERR := max K ERR (x) ad α := N α k. x k= Theorem 3. For K defied by (.3), the followig error estimate holds true K ERR e a(p ) 4 ( ) + ap(p ) + π p σ Proof: By defiitio of K ERR, we obtai for x [ p, p ) that ad further by (.), (.3), ad (.) that K ERR (x) e π a ( 4p σ K ERR (x) K(x) K P (x) + K P (x) K (x), r Z\{} e σ(x+p r) + π p σ ( π e 4p σ + ) + σ p. (3.) π a l π e p σ l= + Sice e a+i b = e a, this ca be estimated for x [, ) by a(p r ) π K ERR (x) e 4 + r= p e π a 4p σ + e l π a p σ σ l= + e a(p ) 4 + e ay 4 dy π + p p e π a 4p σ + σ p Fially, we obtai the assertio by estimatig the itegrals with the help of ). e y π a p σ dy. e u x dx = e u(x+v) dx e uv v e uvx dx = e uv uv, u,v >. The error estimate (3.) cosists of two parts. The first part depeds oly o the real part of σ. I particular, we have that E (a,p) := e a(p ) 4 < ε < p > l(/ε) +. a Sice p, this part has o ifluece if a is large. For smaller a, the parameter p acts as a scalig factor as follows: for p = ad real part a we obtai the same error as for p > ad 4
smaller real part a/(p ). The secod part of the error is determied by the real ad the imagiary part of σ. Here we have that E (σ,p,) = e π a 4p σ < ε < > p σ l(/ε) π a = p a l(/ε). (3.) π cos ϕ With respect to our scalig factor p, we see that for a well localised Gaussia with parameter σ ad p = we obtai the same error as for a less localised Gaussia with parameter σ/p. Figure 3 illustrates the behaviour of the error. The right-had sides of the figure show the level lies for E, while the left-had sides preset those of (3.). The top ad the middle images moreover demostrate the role of the parameter p. I particular, the left had sides show that the ifluece of the first summad E i (3.) becomes less domiat for small values of a if we icrease the period p. Remark 3.. Istead of K P we ca use the trucated fuctio K T (x) := χ [, )(x r)k(x r), r Z with the characteristic fuctio χ [, ) of the iterval [, ) together with a appropriate boudary regularisatio as described i [, 6]. By the boudary regularisatio, K T becomes a smooth oe periodic fuctio with uiformly coverget Fourier series. For its trucated versio K we use the approximatio of the Fourier coefficiets by the trapezoidal quadrature rule b l := ( ) j K T e πijl/. j= The correspodig error estimates show that we do ot eed a additioal parameter p to cope with smaller values of a if we accept the additioal precomputatio effort due to the boudary regularisatio.. I the Gauss trasform oe should also cosider the case where the parameter a ad thus the shape of the Gaussia chages i some fair fashio with the umber of samples N = M. If we spread poits uiformly ad scale a so that the Gaussia has fixed effective width whe measured i uits of the average spacig betwee poits /N, the we eed a = O(N ). Usig (3.) we see that = O(N). Substitutig this ito the overall complexity estimate after (.3) we get the complexity O(N log N) usig NFFTs ad O(N ) usig NDFTs. However, if N becomes very large ad a = O(N ), the Gaussia becomes very small ad it would be better to switch to real space techiques usig a simple trucatio techique. 4 Numerical experimets We preset umerical experimets i order to demostrate the performace of our algorithm. All algorithms were implemeted i C ad tested o a AMD Athlo TM XP 7+ with 5
8 6 4 4 6 8 6 4 4 6 8 3 4 5 6 7 8 8 3 4 5 6 7 8 5 5 5 5 5 5 5 5 4 6 8 4 6 8 4 6 8 4 6 8 5 5 5 e e 8 e e 8 e e 8 e e 8 5 4 6 8 4 5 5 5 5 e 8 e 8 e e e 8 e e e 8 e 8 4 6 8 4 Figure 3.: Level sets i the complex (a,b) plae for (3.) (left) ad for E (σ,p,) (right). Top: p =, = 64, Middle: p =, = 64, Bottom: p =, = 8. GB mai memory, SuSe-Liux (kerel.4.-4gb-athlo, gcc 3.3) usig double precisio arithmetic. Moreover, we have used the libraries FFTW 3.. [7] ad NFFT.. []. Throughout our experimets we have applied the NFFT package [] with precomputed Kaiser Bessel fuctios ad a oversamplig factor ρ =. I our tests we have always chose radom source ad target kots i [ 4, 4 ) ad coefficiets α k uiformly distributed i the complex box [, ] [, ]i. We have cosidered the Gaussia kerels with 6
i) σ = 4(38 + i), ii) σ = + 4i. The first parameter σ was take from [] i order to make the results comparable. The secod choice of a cosiderably smaller σ serves to demostrate the ifluece of the parameter p. First we examie the errors that are geerated by our fast Gauss trasform. Figure 4. presets the error max f(y j) f(y j ) j=,...,m E := K ERR α k k= itroduced by our algorithms as fuctio of the parameter. These results cofirm the error estimates i Theorem 3.. 5 5 5 5 4 6 8 4 6 8 Figure 4.: Error E for = 8,,6,...,8 ad N = M =. Left: for σ = 4(38 + i), fast Gauss trasform with NDFT (solid), fast Gauss trasform with NFFT, cut-off parameter m = 3 (dash-dot), fast Gauss trasform with NFFT, cut-off parameter m = 7 (dashed), error estimate for K ERR (dotted); Right: for σ = +4i, p = (solid), p =.5 (dashed), p = (dash-dot), ad the fast Gauss trasform with NFFT, cut-off parameter m = 7. Fially, we compare the computatio time of the straightforward summatio, the straightforward summatio with the precomputed matrix, the fast Gauss trasform with NDFT, ad the fast Gauss trasform with NFFT for icreasig N = M. The CPU time required by the four algorithms is show i Table 4.. As expected the fast Gauss trasforms outperform the straightforward algorithms, yieldig a O(N) complexity i both variats, whereas the NFFT versio is cosiderably faster. 5 Coclusios We have preseted a fast algorithm for the computatio of sums of type (.) i O(N + M), respectively O(N log N +M) arithmetic operatios, depedig o a fair scalig of the Gaussia. We have proved error estimates cocerig the depedece of the computatioal speed 7
N = M direct alg. with precomp. fast GT, NDFT fast GT, NFFT error E 64 6.e 4 3.e 5.9e 3.e 4.4e 5 8.4e 3.4e 4 3.9e 3.3e 4.7e 5 56 9.6e 3.3e 3 7.6e 3 4.3e 4 8.9e 6 5 3.8e 5.e 3.5e 8.5e 4 5.8e 6 4.5e.e 3.e.8e 3 6.e 6 48 6.e 8.e 6.e 3.5e 3.3e 6 496.5e + 3.7e.e 6.9e 3.4e 6 89 9.9e +.4e +.4e.4e.4e 6 6384 4.e + * 4.9e.7e.e 6 3768.6e + * 9.7e 5.4e.e 6 65536 6.4e + *.e +.e 8.7e 7 37.6e + 3 * 3.9e +.e 7.9e 7 644.e + 4 * 7.8e + 4.3e 6.e 7 5488 * *.6e + 8.8e * 48576 * * 3.e +.7e + * 975 * * 6.5e + 3.6e + * Table 4.: CPU-Time ad error E for the fast Gauss trasform for σ = 4(38 + i), = 8, ad NFFT-cut-off parameter m = 7. Note that we used accumulated measuremets i case of small times ad the times/error (*) are ot displayed due to the large respose time or the limited size of memory. o the desired accuracy ad the parameters a ad σ. The software for this algorithm icludig all described tests is available withi the NFFT package [,./example/fastgauss]. For the oly available commo example our algorithm is faster tha those i []. Refereces [] F. Adersso ad G. Beylki. The fast Gauss trasform with complex parameters. J. Comput. Physics, 5. [] G. Beylki. O the fast Fourier trasform of fuctios with sigularities. Appl. Comput. Harmo. Aal., :363 38, 995. [3] G. Beylki ad L. Moz. O geeralized gaussia quadratures for expoetials ad their applicatios. Appl. Comput. Harmo. Aal., (3):33 373,. [4] G. Beylki ad L. Moz. O approximatios of fuctios by expoetial sums. Preprit, Uiv. Boulder, 5. [5] A. Dutt ad V. Rokhli. Fast Fourier trasforms for oequispaced data. SIAM J. Sci. Stat. Comput., 4:368 393, 993. [6] M. Fe ad G. Steidl. Fast NFFT based summatio of radial fuctios. Samplig Theory i Sigal ad Image Processig, 3: 8, 4. [7] M. Frigo ad S. G. Johso. FFTW, a C subroutie library. http://www.fftw.org/. 8
[8] L. Greegard ad J. Strai. The fast Gauss trasform. SIAM J. Sci. Stat. Comput., :79 94, 99. [9] L. Greegard ad X. Su. A ew versio of the fast Gauss trasform. Doc. Math. J. DMV, 3:575 584, 998. [] S. Kuis ad D. Potts. NFFT, Softwarepackage, C subroutie library. http://www.math.ui-luebeck.de/potts/fft, 4. [] D. Potts ad G. Steidl. Fast summatio at oequispaced kots by NFFTs. SIAM J. Sci. Comput., 4:3 37, 3. [] D. Potts, G. Steidl, ad A. Niesloy. Fast covolutio with radial kerels at oequispaced kots. Numer. Math., 98:39 35, 4. [3] D. Potts, G. Steidl, ad M. Tasche. Fast Fourier trasforms for oequispaced data: A tutorial. I J. J. Beedetto ad P. J. S. G. Ferreira, editors, Moder Samplig Theory: Mathematics ad Applicatios, pages 47 7, Bosto,. Birkhäuser. [4] G. Steidl. A ote o fast Fourier trasforms for oequispaced grids. Adv. Comput. Math., 9:337 353, 998. [5] J. Strai. The fast Gauss trasform with variable scales. SIAM J. Sci. Stat. Comput., :3 39, 99. 9