Improved Fast Gauss Transform and Efficient Kernel Density Estimation

Changjiang Yang, Ramani Duraiswami, Nail A. Gumerov and Larry Davis
Perceptual Interfaces and Reality Laboratory
University of Maryland, College Park, MD 20742, USA

Abstract

Evaluating sums of multivariate Gaussians is a common computational task in computer vision and pattern recognition, including in the general and powerful kernel density estimation technique. The quadratic computational complexity of the summation is a significant barrier to the scalability of this algorithm to practical applications. The fast Gauss transform (FGT) has successfully accelerated kernel density estimation to linear running time for low-dimensional problems. Unfortunately, the cost of a direct extension of the FGT to higher-dimensional problems grows exponentially with dimension, making it impractical for dimensions above three. We develop an improved fast Gauss transform to efficiently estimate sums of Gaussians in higher dimensions, where a new multivariate expansion scheme and an adaptive space subdivision technique dramatically improve the performance. The improved FGT has been applied to the mean shift algorithm, achieving linear computational complexity. Experimental results demonstrate the efficiency and effectiveness of our algorithm.

1 Introduction

In most computer vision and pattern recognition applications, the feature space is complex, noisy and rarely can be described by the common parametric models [7], since the forms of the underlying density functions are in general unknown. In particular, data in high-dimensional feature spaces become sparse and scattered, making it much more difficult to fit them with a single high-dimensional density function. By contrast, without the assumption that the form of the underlying densities is known, nonparametric density estimation techniques [22, 20] have been widely used to analyze arbitrarily structured feature spaces.
The most widely studied and used nonparametric technique is kernel density estimation (KDE), first introduced by Rosenblatt [22], then discussed in detail by Parzen [20] and Cacoullos [3]. In this technique the density function is estimated by a sum of kernel functions (typically Gaussians) centered at the data points. A bandwidth associated with the kernel function is chosen to control the smoothness of the estimated densities. In general, more data points allow a narrower bandwidth and a better density estimate. Many approaches in computer vision and pattern recognition use kernel density estimation, including support vector machines [23], M-estimation [18], normalized cut [24] and mean shift analysis [5]. With enough samples, the kernel density estimates provably converge to any arbitrary density function. On the other hand, the number of samples needed may be very large, and much greater than would be required for parametric models. Moreover, the demand for a large number of samples grows rapidly with the dimension of the feature space. Given N source data points, the direct evaluation of densities at M target points takes O(MN) time. Large datasets also lead to severe requirements for computational time and/or storage. Various methods have been proposed to make the process of kernel density estimation more efficient. The existing approaches can be roughly divided into two categories. One is based on k-nearest neighbor searching, where spatial data structures and/or branch and bound are employed to achieve the computational saving [6, 21, 10, 19]. The other is based on the fast Fourier transform (FFT) for evaluating density estimates on gridded data which, however, are unavailable for most applications [25]. Recently the fast multipole method (FMM) and fast Gauss transform (FGT) have been used to reduce the computational time of kernel density estimation to O(M + N) time, where the data are not necessarily on grids [15, 8].
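As a concrete baseline, the O(MN) direct evaluation that all of the above methods try to avoid can be sketched as follows (a minimal illustration under our own naming, not the authors' code):

```python
import numpy as np

def kde_direct(sources, targets, h):
    """Direct O(M*N) Gaussian kernel density estimate: for each target y_j,
    average exp(-||y_j - x_i||^2 / h^2) over all N sources x_i."""
    # (M, N) matrix of squared distances between targets and sources
    d2 = ((targets[:, None, :] - sources[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / h**2).sum(axis=1) / len(sources)
```

Every fast method below computes the same quantity while avoiding the explicit M-by-N distance matrix.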
As faster computers and better video cameras become cheaper, the collection of sufficient data is becoming possible, which results in a steady increase in the size of the datasets and the number of features. Unfortunately, the existing approaches, including the fast Gauss transform, suffer from the curse of dimensionality: the computational and storage complexity of the FGT grows exponentially with dimension. In this paper, we propose an improved fast Gauss transform (IFGT) to efficiently evaluate the sum of Gaussians in higher dimensions. By higher dimensions,

we mean dimensions up to ten. Such high-dimensional spaces are commonly used in many applications, such as video sequence analysis and eigenspace-based approaches. We also show how the IFGT can be applied to kernel density estimation. Specifically, the mean shift algorithm [4, 11, 5] is chosen as a case study for the IFGT. The mean shift algorithm is based on KDE and was recently rediscovered as a robust clustering method. However, the mean shift algorithm suffers from quadratic computational complexity, especially in higher dimensions. The proposed IFGT reduces the computational complexity to linear time.

2 FMM and FGT

The fast Gauss transform introduced by Greengard and Strain [15, 14] is an important variant of the more general fast multipole method [13, 16]. Originally the FMM was developed for the fast summation of potential fields generated by a large number of sources, such as those arising in gravitational or electrostatic potential problems in two or three dimensions. Thereafter, this method was extended to other potential problems, such as those arising in the solution of the Helmholtz and Maxwell equations, those in chemistry, and interpolation of scattered data [16].

2.1 Fast Multipole Method

We briefly describe the FMM here. Consider the sum

v(y_j) = \sum_{i=1}^{N} u_i \phi_i(y_j), \quad j = 1, \ldots, M.  (1)

Direct evaluation requires O(MN) operations. In the FMM, we assume that the functions \phi_i can be expanded in multipole (singular) series and local (regular) series that are centered at locations x_* and y_* as follows:

\phi(y) = \sum_{n=0}^{p-1} b_n(x_*) S_n(y - x_*) + \epsilon(p),  (2)

\phi(y) = \sum_{n=0}^{p-1} a_n(y_*) R_n(y - y_*) + \epsilon(p),  (3)

where S_n and R_n respectively are multipole (singular) and local (regular) basis functions, x_* and y_* are expansion centers, a_n, b_n are the expansion coefficients, and \epsilon is the error introduced by truncating a possibly infinite series after p terms. The operation reduction trick of the FMM relies on expressing the sum (1) using the series expansions (2) and (3).
Then the reexpansion of (3) is

v(y_j) = \sum_{i=1}^{N} u_i \phi_i(y_j) = \sum_{i=1}^{N} u_i \sum_{n=0}^{p-1} c_{ni} R_n(y_j - x_*),  (4)

for j = 1, \ldots, M. A similar expression can be obtained for (2). Consolidating the N series into one p-term series by rearranging the order of summations, we get

v(y_j) = \sum_{n=0}^{p-1} \Big[ \sum_{i=1}^{N} u_i c_{ni} \Big] R_n(y_j - x_*) = \sum_{n=0}^{p-1} C_n R_n(y_j - x_*).  (5)

The single consolidated p-term series (5) can be evaluated at all the M evaluation points. The total number of operations required is then O(Mp + Np) \sim O(Np) for N \sim M. The truncation number p depends on the desired accuracy alone, and is independent of M and N. The functions \phi_i in the FMM are not valid over the whole domain, so the singular expansions (2) are generated around clusters of sources. In a fine-to-coarse pass, the generated coefficients are translated into coarser-level singular expansions through a tree data structure by translation operators. In a coarse-to-fine pass, the coefficients of the singular expansions at coarser levels are converted via a sequence of translations to coefficients of regular expansions at finer levels, then evaluated at each evaluation point.

2.2 Fast Gauss Transform

The fast Gauss transform was introduced in [15] for efficient computation of the weighted sum of Gaussians

G(y_j) = \sum_{i=1}^{N} q_i e^{-\|y_j - x_i\|^2 / h^2},  (6)

where q_i are the weight coefficients, \{x_i\}_{i=1,\ldots,N} are the centers of the Gaussians (called "sources"), and h is the bandwidth parameter of the Gaussians. The sum of the Gaussians is evaluated at each of the target points \{y_j\}_{j=1,\ldots,M}. Direct evaluation of the sum at M target points due to N sources requires O(MN) operations. The original FGT directly applies the FMM idea by using the following expansions for the Gaussian:

e^{-\|y - x_i\|^2 / h^2} = \sum_{n=0}^{p-1} \frac{1}{n!} \Big( \frac{x_i - x_*}{h} \Big)^n h_n\Big( \frac{y - x_*}{h} \Big) + \epsilon(p),  (7)

e^{-\|y - x_i\|^2 / h^2} = \sum_{n=0}^{p-1} \frac{1}{n!} h_n\Big( \frac{x_i - y_*}{h} \Big) \Big( \frac{y - y_*}{h} \Big)^n + \epsilon(p),  (8)

where the Hermite functions h_n(x) are defined by

h_n(x) = (-1)^n \frac{d^n}{dx^n} e^{-x^2}.

The two expansions (7) and (8) are identical, except that the arguments of the Hermite functions and the monomials (Taylor series) are flipped.
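In practice the Hermite functions in expansion (7) are evaluated by the standard three-term recurrence h_{n+1}(x) = 2x h_n(x) - 2n h_{n-1}(x); a small sketch (our own illustration):

```python
import math

def hermite_functions(x, p):
    """Evaluate h_n(x) = (-1)^n d^n/dx^n exp(-x^2) for n = 0, ..., p-1
    via the recurrence h_{n+1}(x) = 2x h_n(x) - 2n h_{n-1}(x)."""
    h = [math.exp(-x * x)]            # h_0(x) = exp(-x^2)
    if p > 1:
        h.append(2.0 * x * h[0])      # h_1(x) = 2x exp(-x^2)
    for n in range(1, p - 1):
        h.append(2.0 * x * h[n] - 2.0 * n * h[n - 1])
    return h
```

Each additional order costs only a constant number of operations, which is what makes the p-term univariate expansion cheap.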
The first is used as the counterpart of the multipole expansion, while the second is used as the local expansion. The FGT then uses these

expansions and applies the FMM mechanism to achieve its speedup. Conversion of a Hermite series into a Taylor series is achieved via a translation operation. The error bound estimate given by Greengard and Strain [15] is incorrect, and a new and more complicated error bound estimate was presented in [1]. The extension to higher dimensions was done by treating the multivariate Gaussian as a product of univariate Gaussians, applying the series factorizations (7) and (8) to each dimension. For convenience's sake, we adopt the multi-index notation of the original FGT papers [15]. A multi-index \alpha = (\alpha_1, \ldots, \alpha_d) is a d-tuple of nonnegative integers. For any multi-index \alpha \in N^d and any x \in R^d, we have the monomial x^\alpha = x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_d^{\alpha_d}. The length and the factorial of \alpha are defined as |\alpha| = \alpha_1 + \alpha_2 + \cdots + \alpha_d and \alpha! = \alpha_1! \alpha_2! \cdots \alpha_d!. The multidimensional Hermite functions are defined by h_\alpha(x) = h_{\alpha_1}(x_1) h_{\alpha_2}(x_2) \cdots h_{\alpha_d}(x_d). The sum (6) is then equal to the Hermite expansion about center x_*:

G(y_j) = \sum_{\alpha \geq 0} C_\alpha h_\alpha\Big( \frac{y_j - x_*}{h} \Big),  (9)

where the coefficients C_\alpha are given by

C_\alpha = \frac{1}{\alpha!} \sum_{i=1}^{N} q_i \Big( \frac{x_i - x_*}{h} \Big)^\alpha.  (10)

The FGT in higher dimensions is then just an accumulation of the product of the Hermite expansions along each dimension. If we truncate each of the Hermite series after p terms (or equivalently at order p), then the coefficient array C_\alpha is a d-dimensional matrix with p^d terms. The total computational complexity for a single Hermite expansion is O((M + N) p^d). The factor O(p^d) grows exponentially as the dimensionality d increases. Despite this defect in higher dimensions, the FGT is quite effective for two- and three-dimensional problems, and has already achieved success in some physics, computer vision and pattern recognition problems [8, 14]. Another serious defect of the original FGT is the use of the box data structure. The original FGT subdivides the space into boxes using a uniform mesh. However, such a simple space subdivision scheme is not appropriate in higher dimensions, especially in applications where the data might be clustered on low-dimensional manifolds.
First of all, it may generate too many boxes (largely empty) in higher dimensions to store and manipulate. Suppose the unit box in 10-dimensional space is divided into tenths along each dimension; there are then 10^{10} boxes, which may cause trouble in storage and waste time on processing empty boxes. Secondly, and more importantly, having so many boxes makes it more difficult to search for nonempty neighbor boxes. Finally, and most importantly, the worst property of this scheme is that the ratio of the volume of the hypercube to that of the inscribed sphere grows exponentially with dimension. In other words, the points have a high probability of falling into the region inside the box but outside the sphere. The truncation errors of the Hermite expansions (7) and (8) are much larger near the boundary than near the expansion center, which brings large truncation errors at most of the points. In brief, the original FGT suffers from the following two defects that are the motivation behind this paper:

1. The exponential growth of complexity with dimensionality.
2. The use of the box data structure in the FGT is inefficient in higher dimensions.

3 Improved Fast Gauss Transform

3.1 A Different Factorization

The defects listed above can be thought of as a result of applying the FMM methodology to the FGT blindly. As shown in Section 2, the FMM was developed for singular potential functions whose forces are long-ranged and nonsmooth (at least locally); hence it is necessary to make use of the tree data structures, multipole expansions, local expansions and translation operators. In contrast, the Gaussian is far from singular: it is infinitely differentiable! There is no need to perform the multipole expansions which account for the far-field contributions. Instead we present a simple new factorization and space subdivision scheme for the FGT. The new approach is based on the fact that the Gaussian, especially in higher dimensions, decays so rapidly that the contributions outside of a certain radius can be safely ignored.
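The hypercube-versus-inscribed-sphere argument above is easy to quantify; the following sketch (our own illustration) computes the fraction of the unit cube occupied by its inscribed ball:

```python
from math import pi, gamma

def inscribed_ball_fraction(d):
    """Fraction of the unit cube in d dimensions occupied by the inscribed
    ball of radius 1/2: (pi^(d/2) / Gamma(d/2 + 1)) * (1/2)^d."""
    return pi ** (d / 2) / gamma(d / 2 + 1) / 2 ** d
```

For d = 2 this is pi/4, about 0.785, while for d = 10 it has already dropped to roughly 0.0025, so almost all of a uniformly filled box lies outside the inscribed sphere, exactly where the Hermite truncation error is largest.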
Assuming we have N sources \{x_i\} centered at x_* and M target points \{y_j\}, we can rewrite the exponential term as

e^{-\|y_j - x_i\|^2 / h^2} = e^{-\|\Delta y_j\|^2 / h^2} \, e^{-\|\Delta x_i\|^2 / h^2} \, e^{2 \Delta y_j \cdot \Delta x_i / h^2},  (11)

where \Delta y_j = y_j - x_* and \Delta x_i = x_i - x_*. In expression (11) the first two exponential terms can be evaluated individually at either the source points or the target points. The only problem left is to evaluate the last term, where sources and targets are entangled. One way of breaking the entanglement is to expand it into the series

e^{2 \Delta y_j \cdot \Delta x_i / h^2} = \sum_{n=0}^{\infty} \Phi_n(\Delta y_j) \Psi_n(\Delta x_i),  (12)

where \Phi_n and \Psi_n are the expansion functions and will be defined in the next section. Denoting \phi(\Delta y_j) = e^{-\|\Delta y_j\|^2 / h^2} and \psi(\Delta x_i) = e^{-\|\Delta x_i\|^2 / h^2}, we can rewrite the sum (6) as

G(y_j) = \sum_{i=1}^{N} q_i \, \phi(\Delta y_j) \, \psi(\Delta x_i) \sum_{n=0}^{\infty} \Phi_n(\Delta y_j) \Psi_n(\Delta x_i).  (13)

If the infinite series (12) converges absolutely, we can truncate it after p terms so as to obtain a desired precision. Exchanging the summations in (13), we obtain

G(y_j) = \phi(\Delta y_j) \sum_{n=0}^{p-1} C_n \Phi_n(\Delta y_j) + \epsilon(p),  (14)

C_n = \sum_{i=1}^{N} q_i \, \psi(\Delta x_i) \, \Psi_n(\Delta x_i).  (15)

The factorization (14) is the basis of our algorithm. In the following sections, we will discuss how to implement it in an efficient way.

3.2 Multivariate Taylor Expansions

The key issue in speeding up the FGT is to reduce the factor p^d in the computational complexity. The factor p^d arises from the way that the multivariate Gaussian is treated as the product of univariate Gaussian functions and expanded along each dimension. To reduce this factor, we treat the dot product in (12) as a scalar variable and expand it via the Taylor expansion. The expansion functions \Phi and \Psi are then expressed as multivariate polynomials. We denote by \Pi_n^d the space of all real polynomials in d variables of total degree less than or equal to n; its dimensionality is r_{nd} = \binom{n+d}{d}. To store, manipulate and evaluate the multivariate polynomials, we consider the monomial representation of polynomials. A polynomial p \in \Pi_n^d can be written as

p(x) = \sum_{|\alpha| \leq n} C_\alpha x^\alpha, \quad C_\alpha \in R.  (16)

It is computationally convenient and efficient to stack all the r_{nd} coefficients C_\alpha into a vector of length r_{nd}, sorting the terms according to graded lexicographic order. "Graded" refers to the fact that the total degree |\alpha| is the main sorting criterion. Graded lexicographic ordering means that the multi-indices are arranged as (0,0,...,0), (1,0,...,0), (0,1,...,0), ..., (0,0,...,1), (2,0,...,0), (1,1,...,0), ..., (0,0,...,2), ..., (0,0,...,n).
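To make the ordering concrete, the following sketch (our own illustration, not the authors' code) generates multi-indices in graded lexicographic order, and evaluates all monomials x^alpha of total degree at most n with one multiplication per new term, using the leading-term recursion described below:

```python
from math import comb

def _indices_of_degree(d, deg):
    """Multi-indices in d variables of exact total degree deg,
    with larger leading coordinates first."""
    if d == 1:
        return [(deg,)]
    return [(first,) + rest
            for first in range(deg, -1, -1)
            for rest in _indices_of_degree(d - 1, deg - first)]

def graded_lex_indices(d, n):
    """All multi-indices with |alpha| <= n, sorted by total degree first."""
    return [a for deg in range(n + 1) for a in _indices_of_degree(d, deg)]

def all_monomials(x, n):
    """Values x^alpha in the same graded order, one multiply per new term:
    degree k+1 terms come from multiplying each variable x_i with all terms
    from x_i's leading term of degree k to the end of the list."""
    d = len(x)
    vals = [1.0]
    head = [0] * d            # index of each variable's current leading term
    for _ in range(n):
        new_head, end = [0] * d, len(vals)
        for i in range(d):
            new_head[i] = len(vals)
            for j in range(head[i], end):
                vals.append(x[i] * vals[j])
        head = new_head
    return vals
```

Note that len(graded_lex_indices(d, n)) equals binom(n+d, d), the dimensionality r_{nd} of the text, so both storage and multiplications scale with r_{nd} rather than p^d.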
The power of the dot product of two vectors x and y can be expanded into a multivariate polynomial:

(x \cdot y)^n = \sum_{|\alpha| = n} \binom{n}{\alpha} x^\alpha y^\alpha,  (17)

where \binom{n}{\alpha} = n! / (\alpha_1! \cdots \alpha_d!) are the multinomial coefficients. So we have the following multivariate Taylor expansion of the Gaussian functions:

e^{2 x \cdot y} = \sum_{\alpha \geq 0} \frac{2^{|\alpha|}}{\alpha!} x^\alpha y^\alpha.  (18)

From Eqs. (11), (12) and (18), the weighted sum of Gaussians (6) can be expressed as a multivariate Taylor expansion about center x_*:

G(y_j) = \sum_{\alpha \geq 0} C_\alpha \, e^{-\|y_j - x_*\|^2 / h^2} \Big( \frac{y_j - x_*}{h} \Big)^\alpha,  (19)

where the coefficients C_\alpha are given by

C_\alpha = \frac{2^{|\alpha|}}{\alpha!} \sum_{i=1}^{N} q_i \, e^{-\|x_i - x_*\|^2 / h^2} \Big( \frac{x_i - x_*}{h} \Big)^\alpha.  (20)

If we truncate the series after total degree p, the number of terms r_{pd} = \binom{p+d}{d} is much less than p^d in higher dimensions (as shown in Table 1). For instance, when d = 12 and p = 10, the original FGT needs 10^{12} terms, while the multivariate Taylor expansion needs only 646,646. For large d and moderate p, the number of terms becomes O(d^p), a substantial reduction.

Figure 1: Efficient expansion of the multivariate polynomials. The arrows point to the leading terms.

One of the benefits of the graded lexicographic order is that the expansion of multivariate polynomials can be computed efficiently. For a d-variate polynomial of order n, we can store all terms in a vector of length r_{nd}. Starting from the order-zero term (the constant 1), we proceed recursively: assuming we have already evaluated the terms of order k, the terms of order k + 1 can be obtained by multiplying each of the d variables with all the terms between that variable's leading term and the end, as shown in Figure 1. The required storage is r_{nd}, and the computation of the terms requires r_{nd} multiplications.

3.3 Spatial Data Structures

As discussed above, we need to subdivide space into cells and collect the influence of the sources within each

cell. The influence on each target can then be summarized from its neighboring cells that lie within a certain radius of the target.

Table 1: Number of terms in a d-variate Taylor expansion truncated after order p.

To efficiently subdivide the space, we need to devise a scheme that adaptively subdivides the space according to the distribution of the points. It is also desirable to generate cells that are as compact as possible. Based on the above considerations, we model the space subdivision task as a k-center problem, which is defined as follows: given a set of n points and a predefined number of clusters k, find a partition of the points into clusters S_1, ..., S_k, with cluster centers c_1, ..., c_k, so as to minimize the cost function, the maximum radius of the clusters:

\max_i \max_{v \in S_i} \| v - c_i \|.

The k-center problem is known to be NP-hard [2]. Gonzalez [12] proposed a very simple greedy algorithm, called farthest-point clustering. Initially, pick an arbitrary point v_0 as the center of the first cluster and add it to the center set C. Then for i = 1 to k do the following: in iteration i, for every point v, compute its distance to the set C: d_i(v, C) = \min_{c \in C} \| v - c \|. Let v_i be a point that is farthest away from C, i.e., a point for which d_i(v_i, C) = \max_v d_i(v, C), and add v_i to the set C. Report the points v_0, v_1, ..., v_{k-1} as the cluster centers, and assign each point to its nearest center. Gonzalez [12] proved that farthest-point clustering is a 2-approximation algorithm, i.e., it computes a partition with maximum radius at most twice the optimum. The proof uses no geometry beyond the triangle inequality, so it holds for any metric space. Hochbaum and Shmoys [17] proved that the factor 2 cannot be improved unless P = NP. The direct implementation of farthest-point clustering has running time O(nk). Feder and Greene [9] give a two-phase algorithm with optimal running time O(n log k). The predefined number of clusters k can be determined as follows: run the farthest-point algorithm until the maximum radius of the clusters decreases to a given distance.
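Farthest-point clustering as described above is only a few lines of code; a sketch (our own illustration, with a fixed rather than arbitrary first center):

```python
import numpy as np

def farthest_point_clustering(points, k):
    """Gonzalez's greedy 2-approximation for the k-center problem:
    repeatedly add the point farthest from the current center set."""
    centers = [0]                          # arbitrary first center: point 0
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())           # farthest point from the center set
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    # assign each point to its nearest center; max(dist) is the cluster radius
    labels = np.argmin(
        np.linalg.norm(points[:, None, :] - points[centers][None, :, :], axis=2),
        axis=1)
    return np.asarray(centers), labels, dist.max()
```

Running this with increasing k until the returned radius falls below a target distance implements the stopping rule just mentioned.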
In practice, the initial point has little influence on the final radius of the approximation if the number of points n is sufficiently large. Figure 2 displays the results of the farthest-point algorithm. In two dimensions, the algorithm leads to a Voronoi tessellation of the space. In three dimensions, the partition boundary resembles the surface of a crystal.

Figure 2: The farthest-point algorithm divides 10000 points into clusters (with the centers indicated by the crosses) in a fraction of a second on a 900MHz PIII PC. Left: normal distributions; right: uniform distribution.

3.4 The Algorithm

The improved fast Gauss transform consists of the following steps:

Step 1: Assign the N sources into K clusters using the farthest-point clustering algorithm such that the cluster radius is less than h \rho_x.

Step 2: Choose p sufficiently large such that the error estimate (24) in the appendix is less than the desired precision \epsilon.

Step 3: For each cluster S_k with center c_k, compute the coefficients given by expression (20):

C_\alpha^k = \frac{2^{|\alpha|}}{\alpha!} \sum_{x_i \in S_k} q_i \, e^{-\|x_i - c_k\|^2 / h^2} \Big( \frac{x_i - c_k}{h} \Big)^\alpha.

Step 4: For each target y_j, find its neighbor clusters whose centers lie within the range h \rho_y. Then the sum of Gaussians (6) can be evaluated by expression (19):

G(y_j) = \sum_{\|y_j - c_k\| \leq h \rho_y} \; \sum_{|\alpha| < p} C_\alpha^k \, e^{-\|y_j - c_k\|^2 / h^2} \Big( \frac{y_j - c_k}{h} \Big)^\alpha.

The amount of work required in Step 1 is O(NK) (for large K, we can use Feder and Greene's O(N log K) algorithm [9] instead). The amount of work required in Step 3 is O(N r_{pd}). The work required in Step 4 is O(M n r_{pd}), where n is the maximum number of neighbor clusters for each target. For most nonparametric statistics, computer

vision and pattern recognition applications, the precision required is moderate, so we can use small K and small r_{pd}. Since n \leq K, the improved fast Gauss transform achieves linear running time. The algorithm needs to store the K coefficient vectors of size r_{pd}, so the storage complexity is reduced to O(K r_{pd}).

4 Mean Shift Analysis with IFGT

Segmentation using mean shift analysis is chosen as a case study for the IFGT. Mean shift is a clustering technique based on kernel density estimation, which is very effective and robust for the analysis of complex feature spaces. The mean shift procedure employing a Gaussian kernel converges to the stationary point following a smooth trajectory, which is theoretically important for convergence [5]. In practice, the quality of the results almost always improves when the Gaussian kernel is employed. Despite its superior performance, the Gaussian kernel is not as widely used in mean shift as it should be. In part this may be due to the high computational costs, which we try to alleviate in this paper. Given n data points x_1, ..., x_n in the d-dimensional space R^d, the kernel density estimator with kernel function K(x) and window bandwidth h is given by [22, 20, 7]

\hat{f}_n(x) = \frac{1}{n h^d} \sum_{i=1}^{n} K\Big( \frac{x - x_i}{h} \Big),  (21)

where the d-variate kernel K(x) is nonnegative and integrates to one. The Gaussian kernel is a common choice. The mean shift algorithm is a steepest ascent procedure which requires estimation of the density gradient:

\hat{\nabla} f_{h,K}(x) = \frac{2 c_{k,d}}{n h^{d+2}} \sum_{i=1}^{n} (x_i - x) \, g\Big( \Big\| \frac{x - x_i}{h} \Big\|^2 \Big) = \frac{2 c_{k,d}}{h^2 c_{g,d}} \, \hat{f}_{h,G}(x) \, m(x),  (22)

where g(x) = -k'(x), which can in turn be used as the profile to define a Gaussian kernel G(x). The kernel K(x) is called the shadow of G(x) [4]; both have the same expression. \hat{f}_{h,G}(x) is the density estimate with the kernel G, and c_{k,d}, c_{g,d} are the normalization coefficients. The last term is the mean shift

m(x) = \frac{ \sum_{i=1}^{n} x_i \, g\big( \| (x - x_i)/h \|^2 \big) }{ \sum_{i=1}^{n} g\big( \| (x - x_i)/h \|^2 \big) } - x,  (23)

which is proportional to the normalized density gradient and always points toward the direction of steepest ascent of the density.
The mean shift algorithm iteratively performs the following two steps until it reaches a stationary point:

1. Computation of the mean shift vector m(x_k).
2. Updating the current position: x_{k+1} = x_k + m(x_k).

Table 2: Running times in milliseconds for direct evaluation, the fast Gauss transform and the improved fast Gauss transform in three dimensions, for various N = M.

The numerator in expression (23) is a weighted sum of Gaussians, except that the weights are vectors. The denominator is a uniformly weighted sum of Gaussians. So both can be evaluated by the improved fast Gauss transform as d + 1 independent weighted sums of Gaussians. The computation is further reduced because they share the same space subdivision and series expansions.

5 Experimental Results

The first experiment compares the performance of our algorithm with the original fast Gauss transform. Since there is no practical fast Gauss transform in higher dimensions available, we only make comparisons in three dimensions. The sources and targets are uniformly distributed in a unit cube. The weights of the sources are uniformly distributed between 0 and 1, and the bandwidth of the Gaussian is h = 0.1. We set the relative error bound to 1%, which is reasonable for most kernel density estimation, because the estimated density function itself is an approximation. Table 2 reports the CPU times using direct evaluation, the original fast Gauss transform (FGT) and the improved fast Gauss transform (IFGT). All the algorithms are programmed in C++ and were run on a 900MHz PIII PC. We find that the running time of the IFGT grows linearly as the number of sources and targets increases, while that of the direct evaluation and the original FGT grows quadratically (though the FGT remains faster than direct evaluation). The poor performance of the FGT in 3D is also reported in [16]. This is probably due to the fact that the number of boxes increases significantly under a uniform space subdivision in 3D, which makes the cost of computing the interactions between the boxes grow quadratically.
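For concreteness, the two-step iteration above with a Gaussian kernel looks as follows (a plain O(N)-per-iteration sketch of our own; the inner sums are exactly what the IFGT accelerates):

```python
import numpy as np

def mean_shift_mode(x, data, h, iters=50, tol=1e-6):
    """Iterate x <- x + m(x), where m(x) is the Gaussian mean shift vector:
    the weighted mean of the data minus the current position."""
    for _ in range(iters):
        w = np.exp(-((data - x) ** 2).sum(axis=1) / h**2)   # g(||(x-x_i)/h||^2)
        shifted = (w[:, None] * data).sum(axis=0) / w.sum()  # weighted mean
        if np.linalg.norm(shifted - x) < tol:
            break
        x = shifted
    return x
```

Both the numerator and denominator inside the loop are sums of N Gaussians, so replacing them with the IFGT turns each iteration from O(N) per query point into amortized constant work after the O(N) precomputation.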
The second experiment examines the performance of the IFGT in higher dimensions. We randomly generate the source and target points in a unit hypercube from a uniform distribution. The weights of the sources are uniformly distributed between 0 and 1, and the bandwidth is set to h = 1. The results are shown in Fig. 3. We compared the running

time of the direct evaluation to that of the IFGT with h = 1 and N = M = 100, ..., 10000. The comparisons are performed in dimensions from 1 to 10, and the results in dimensions 4, 6, 8 and 10 are reported in Fig. 3.

Figure 3: The running times in seconds (top) and maximum absolute errors (bottom) of the IFGT (h = 1) vs. direct evaluation in dimensions 4, 6, 8 and 10.

Table 3: Image sizes vs. the running time of the mean shift for the House, Cooking, Base Dive and Zebra images.

From the figure we notice that the running time of the direct evaluation grows quadratically with the number of points, while the running time of the IFGT grows linearly. The maximum relative absolute error, as defined in [15], increases with the dimensionality but not with the number of points. The worst error occurs in dimension 10, and is below 10^{-3}. We can see that for a 10-D problem involving more than 700 Gaussians the IFGT is faster than direct evaluation, while in lower dimensions the IFGT is faster from almost the outset. The third experiment applies the improved fast Gauss transform to the mean shift algorithm. We first transform the images to L*u*v* color space and normalize them to a unit cube. Then we apply the mean shift algorithm with h = 0.1 to all the points in the 3D color space. After 5 iterations, the convergence points are grouped by a simple k-means algorithm [7]. We do not perform any postprocessing procedure as in [5]. The code is written in C++ with Matlab interfaces and run on a 900MHz PIII PC. The results are shown in Fig. 4.

Figure 4: Segmentation results. Right column: the original images. Left column: segmented images labelled with different colors. Top row: House image. Second row: Cooking image. Third row: Base Dive image. Bottom row: Zebra image.
The running times of the mean shift in seconds for the different image sizes are shown in Table 3. Our implementation is at least as fast as any reported. We find that the mean shift algorithm with the improved fast Gauss transform already achieves clear boundaries without any postprocessing. This is partly because we apply the mean shift algorithm to all feature points, without subsampling the feature space as in [5], which leads to easily distinguishable valleys in our estimated densities. Another reason is that in our method the density evaluation at each target point has contributions from a much larger neighborhood because of the Gaussian kernel, which generates a smoother and better density estimate.
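To summarize the method benchmarked above, Steps 3 and 4 of the IFGT algorithm can be sketched as follows (an illustrative implementation under our own naming, given precomputed clusters and with no neighbor cutoff; not the authors' optimized C++ code):

```python
import numpy as np
from itertools import product
from math import factorial

def ifgt_eval(sources, weights, targets, h, centers, labels, p):
    """Evaluate G(y_j) = sum_i q_i exp(-||y_j - x_i||^2 / h^2) via the
    truncated multivariate Taylor factorization, cluster by cluster."""
    d = sources.shape[1]
    # multi-indices of total degree < p (the r_{pd} terms of the text)
    alphas = [a for a in product(range(p), repeat=d) if sum(a) < p]
    # Step 3: per-cluster coefficients C_alpha^k
    C = np.zeros((len(centers), len(alphas)))
    for k, c in enumerate(centers):
        dx = (sources[labels == k] - c) / h
        base = weights[labels == k] * np.exp(-(dx ** 2).sum(axis=1))
        for a_i, a in enumerate(alphas):
            coef = 2.0 ** sum(a) / np.prod([factorial(ai) for ai in a])
            C[k, a_i] = coef * (base * np.prod(dx ** np.array(a), axis=1)).sum()
    # Step 4: evaluate the truncated expansion at each target
    out = np.zeros(len(targets))
    for j, y in enumerate(targets):
        for k, c in enumerate(centers):
            dy = (y - c) / h
            e = np.exp(-(dy ** 2).sum())
            out[j] += e * sum(C[k, a_i] * np.prod(dy ** np.array(a))
                              for a_i, a in enumerate(alphas))
    return out
```

A production version would enumerate the multi-indices in graded order with the one-multiply-per-term recursion, and skip clusters with ||y_j - c_k|| > h * rho_y; this sketch keeps only the two summation steps.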

6 Conclusions

In this paper we presented an improved fast Gauss transform to speed up the summation of Gaussians in higher dimensions. The acceleration of the FGT comes from two innovations: the use of the farthest-point algorithm to adaptively subdivide the high-dimensional space, and a new multivariate Taylor expansion that dramatically reduces the computational and storage cost of the fast Gauss transform. The recursive computation of the multivariate Taylor expansion further reduces the computational cost and the necessary storage. The improved fast Gauss transform was applied to speed up kernel density estimation, which is the keystone of many applications in computer vision and pattern recognition. The general and powerful feature space analysis tool, the mean shift algorithm, was chosen as a case study for the IFGT. Using the IFGT, we achieved a linear running time mean shift algorithm. Without using any heuristic vicinity information between the points, the mean shift based image segmentation achieves satisfactory results. In future work, we will study the capability of the IFGT in applications such as learning kernel classifiers (SVMs) and object tracking. We also plan to combine the IFGT with other techniques [27] to further improve the performance of the mean shift algorithm.

Appendix: Error Bound of the Improved FGT

The error due to the truncation of the series (19) after order p, together with the cutoff error, satisfies the bound

|E(y)| \leq Q \Big( \frac{2^p}{p!} \rho_x^p \rho_y^p + e^{-\rho_y^2} \Big),  (24)

where Q = \sum_j |q_j|.

Acknowledgments

The support of the NSF under awards 9979 and 09 is gratefully acknowledged.

References

[1] B. J. C. Baxter and G. Roussos. A new error estimate of the fast Gauss transform. SIAM Journal on Scientific Computing, 24(1):257-259, 2002.
[2] M. Bern and D. Eppstein. Approximation algorithms for geometric problems. In D. Hochbaum, editor, Approximation Algorithms for NP-Hard Problems. PWS Publishing Company, Boston, 1997.
[3] T. Cacoullos. Estimation of a multivariate density. Ann. Inst. Stat. Math., 18(2):179-189, 1966.
[4] Y. Cheng. Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Anal. Mach. Intell., 17(8):790-799, 1995.
[5] D. Comaniciu and P. Meer. Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell., 24(5):603-619, May 2002.
[6] L. Devroye and F. Machell. Data structures in kernel density estimation. IEEE Trans. Pattern Anal. Mach. Intell., 7(3):306-310, 1985.
[7] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley & Sons, New York, 2nd edition, 2001.
[8] A. Elgammal, R. Duraiswami, and L. Davis. Efficient non-parametric adaptive color modeling using fast Gauss transform. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, Kauai, Hawaii, 2001.
[9] T. Feder and D. Greene. Optimal algorithms for approximate clustering. In Proc. 20th ACM Symp. Theory of Computing, pages 434-444, Chicago, Illinois, 1988.
[10] K. Fukunaga and R. R. Hayes. The reduced Parzen classifier. IEEE Trans. Pattern Anal. Mach. Intell., 11(4):423-425, 1989.
[11] K. Fukunaga and L. D. Hostetler. The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans. Inform. Theory, 21:32-40, 1975.
[12] T. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293-306, 1985.
[13] L. Greengard and V. Rokhlin. A fast algorithm for particle simulations. J. Comput. Phys., 73(2):325-348, 1987.
[14] L. Greengard and J. Strain. A fast algorithm for the evaluation of heat potentials. Comm. Pure Appl. Math., 43:949-963, 1990.
[15] L. Greengard and J. Strain. The fast Gauss transform. SIAM J. Sci. Statist. Comput., 12(1):79-94, 1991.
[16] N. A. Gumerov, R. Duraiswami, and E. A. Borovikov. Data structures, optimal choice of parameters, and complexity results for generalized multilevel fast multipole methods in d dimensions. Technical Report UMIACS-TR-2003, UMIACS, University of Maryland, College Park, 2003.
[17] D. S. Hochbaum and D. B. Shmoys. A best possible heuristic for the k-center problem. Mathematics of Operations Research, 10(2):180-184, 1985.
[18] P. Huber. Robust Statistical Procedures. SIAM, 1996.
[19] B. Jeon and D. A. Landgrebe. Fast Parzen density estimation using clustering-based branch and bound. IEEE Trans. Pattern Anal. Mach. Intell., 16(9):950-954, 1994.
[20] E. Parzen. On estimation of a probability density function and mode. Ann. Math. Stat., 33(3):1065-1076, 1962.
[21] J. G. Postaire and C. Vasseur. A fast algorithm for nonparametric probability density estimation. IEEE Trans. Pattern Anal. Mach. Intell., 4(6):663-666, 1982.
[22] M. Rosenblatt. Remarks on some nonparametric estimates of a density function. Ann. Math. Stat., 27(3):832-837, 1956.
[23] B. Schölkopf and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. MIT Press, Cambridge, MA, 2002.
[24] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 22(8):888-905, 2000.
[25] B. W. Silverman. Algorithm AS 176: Kernel density estimation using the fast Fourier transform. Appl. Stat., 31(1):93-99, 1982.
[26] J. Strain. The fast Gauss transform with variable scales. SIAM J. Sci. Statist. Comput., 12(5):1131-1139, 1991.
[27] C. Yang, R. Duraiswami, D. DeMenthon, and L. Davis. Mean-shift analysis using quasi-Newton methods. In Proc. Int'l Conf. Image Processing, Barcelona, Spain, 2003.