Improved Fast Gauss Transform (based on work by Changjiang Yang and Vikas Raykar)

Fast Gauss Transform (FGT) Originally proposed by Greengard and Strain (1991) to efficiently evaluate the weighted sum of Gaussians:
$$G(y_j) = \sum_{i=1}^{N} q_i \, e^{-\|y_j - x_i\|^2 / h^2}, \qquad j = 1, \ldots, M.$$
The FGT is an important variant of the Fast Multipole Method (Greengard & Rokhlin 1987).

Fast Gauss Transform (cont'd) The weighted sum of Gaussians $G(y_j) = \sum_{i=1}^{N} q_i e^{-\|y_j - x_i\|^2/h^2}$, $j = 1, \ldots, M$, over sources $x_i$ and targets $y_j$, is equivalent to the matrix-vector product
$$\begin{pmatrix} G(y_1) \\ G(y_2) \\ \vdots \\ G(y_M) \end{pmatrix} =
\begin{pmatrix}
e^{-\|x_1 - y_1\|^2/h^2} & e^{-\|x_2 - y_1\|^2/h^2} & \cdots & e^{-\|x_N - y_1\|^2/h^2} \\
e^{-\|x_1 - y_2\|^2/h^2} & e^{-\|x_2 - y_2\|^2/h^2} & \cdots & e^{-\|x_N - y_2\|^2/h^2} \\
\vdots & \vdots & \ddots & \vdots \\
e^{-\|x_1 - y_M\|^2/h^2} & e^{-\|x_2 - y_M\|^2/h^2} & \cdots & e^{-\|x_N - y_M\|^2/h^2}
\end{pmatrix}
\begin{pmatrix} q_1 \\ q_2 \\ \vdots \\ q_N \end{pmatrix}.$$
Direct evaluation of the sum of Gaussians requires $O(N^2)$ operations (for $M \approx N$). The fast Gauss transform reduces the cost to $O(N \log N)$ operations.
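For reference, the direct quadratic-cost evaluation that the FGT accelerates fits in a few lines. This is a minimal NumPy sketch (function and variable names are illustrative, not from the original slides):

```python
import numpy as np

def direct_gauss_transform(X, Y, q, h):
    """Naive O(M*N) evaluation of G(y_j) = sum_i q_i * exp(-||y_j - x_i||^2 / h^2).

    X : (N, d) source points, Y : (M, d) target points,
    q : (N,) source weights,  h : bandwidth.
    """
    # Pairwise squared distances between targets and sources, shape (M, N).
    d2 = ((Y[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Gaussian kernel matrix applied to the weight vector.
    return np.exp(-d2 / h**2) @ q

# Example: 1000 sources and targets in 3D.
rng = np.random.default_rng(0)
X, Y, q = rng.random((1000, 3)), rng.random((1000, 3)), rng.random(1000)
G = direct_gauss_transform(X, Y, q, h=0.5)
```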

Fast Gauss Transform (cont'd) Three key components of the FGT: the factorization or separation of variables; the space subdivision scheme; and the error bound analysis.

Factorization of the Gaussian Factorization is one of the key parts of the FGT: it breaks the entanglement between the sources and the targets. The contributions of the sources are accumulated at expansion centers and then distributed to each target. [Figure: sources interacting with targets directly versus through a center.]

Factorization in FGT: Hermite Expansion The Gaussian kernel is factorized into Hermite functions:
$$e^{-\|y - x_i\|^2/h^2} = \sum_{n=0}^{p-1} \frac{1}{n!} \left( \frac{x_i - x_*}{h} \right)^{n} h_n\!\left( \frac{y - x_*}{h} \right) + \epsilon(p),$$
where the Hermite function $h_n(x)$ is defined by $h_n(x) = (-1)^n \frac{d^n}{dx^n} e^{-x^2}$. Exchanging the order of the summations, the source-dependent factors $\frac{1}{n!} \left( \frac{x_i - x_*}{h} \right)^{n}$, weighted by $q_i$, are accumulated into the moments $A_n$.

The FGT Algorithm Step 0: Subdivide the space into uniform boxes. Step 1: Assign sources and targets to boxes. Step 2: For each pair of a source box B and a target box C, compute the interactions using one of the four possible ways:
N_B <= N_F, M_C <= M_L: direct evaluation
N_B <= N_F, M_C > M_L: Taylor expansion
N_B > N_F, M_C <= M_L: Hermite expansion
N_B > N_F, M_C > M_L: Hermite expansion --> Taylor expansion
Step 3: Loop through the boxes, evaluating the Taylor series for boxes with more than M_L targets.
(N_B: number of sources in box B; N_F: source cutoff for box B; M_C: number of targets in box C; M_L: target cutoff for box C.)
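A compact sketch of that four-way decision; the cutoffs N_F and M_L and the returned labels are placeholders for the corresponding FGT components, not the original implementation:

```python
def box_interaction(sources_B, targets_C, N_F, M_L):
    """Choose how box B's sources act on box C's targets, as in the FGT Step 2."""
    many_sources = len(sources_B) > N_F
    many_targets = len(targets_C) > M_L
    if not many_sources and not many_targets:
        return "direct evaluation"
    if not many_sources and many_targets:
        return "Taylor expansion about C's center"
    if many_sources and not many_targets:
        return "Hermite expansion about B's center"
    return "Hermite expansion translated to a Taylor expansion"
```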

Too Many Terms in Higher Dimensions The higher-dimensional Hermite expansion is the Kronecker product of d univariate Hermite expansions. The total number of terms is $O(p^d)$, where p is the number of truncation terms. The number of operations in one factorization is $O(p^d)$. [Figure: the tensor-product grid of Hermite terms for d = 1, 2, 3, and beyond.]

Too Many Boxes in Higher Dimensions The FGT subdivides the space into uniform boxes and assigns the source points and target points to boxes. For each box the FGT maintains a neighbor list. The number of boxes increases exponentially with the dimensionality. [Figure: uniform box subdivisions for d = 1, 2, 3, and beyond.]

FGT in Higher Dimensions The FGT was originally designed to solve problems in mathematical physics (the heat equation, vortex methods, etc.), where the dimension is at most 3. The higher-dimensional Hermite expansion is the product of univariate Hermite expansions along each dimension, so the total number of terms is $O(p^d)$. The space subdivision scheme in the original FGT is uniform boxes; the number of boxes grows exponentially with the dimension, and most boxes are empty. This exponential dependence on the dimension makes the FGT extremely inefficient in higher dimensions.

Improved fast Gauss transform (IFGT)

Improved Fast Gauss Transform Multivariate Taylor expansion Multivariate Horner's rule Space subdivision based on the k-center algorithm Error bound analysis

New factorization

Multivariate Taylor Expansions The Taylor expansion of the Gaussian function:
$$e^{-\|y_j - x_i\|^2/h^2} = e^{-\|y_j - x_*\|^2/h^2} \, e^{-\|x_i - x_*\|^2/h^2} \, e^{2 (y_j - x_*) \cdot (x_i - x_*)/h^2}.$$
The first two factors can be computed independently. The Taylor expansion of the last factor is
$$e^{2 (y_j - x_*) \cdot (x_i - x_*)/h^2} = \sum_{\alpha \ge 0} \frac{2^{|\alpha|}}{\alpha!} \left( \frac{y_j - x_*}{h} \right)^{\alpha} \left( \frac{x_i - x_*}{h} \right)^{\alpha},$$
where $\alpha = (\alpha_1, \ldots, \alpha_d)$ is a multi-index. The multivariate Taylor expansion about the center $x_*$:
$$G(y_j) = \sum_{\alpha \ge 0} C_\alpha \, e^{-\|y_j - x_*\|^2/h^2} \left( \frac{y_j - x_*}{h} \right)^{\alpha},$$
where the coefficients $C_\alpha$ are given by
$$C_\alpha = \frac{2^{|\alpha|}}{\alpha!} \sum_{i=1}^{N} q_i \, e^{-\|x_i - x_*\|^2/h^2} \left( \frac{x_i - x_*}{h} \right)^{\alpha}.$$

Reduction from Taylor Expansions The idea of a Taylor expansion of the factored Gaussian kernel was first introduced in the course CMSC878R. The number of terms in the multivariate Hermite expansion is $O(p^d)$. The number of terms in the multivariate Taylor expansion is $\binom{p+d-1}{d}$, asymptotically $O(d^p)$, a big reduction for large d. [Plots: number of terms for the Hermite vs. Taylor expansion, with p fixed at 10 and the dimension d varied, and with d fixed at 10 and the truncation order p varied.]
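A quick way to see the reduction is to tabulate both term counts. This small sketch (illustrative only) compares $p^d$ with $\binom{p+d-1}{d}$:

```python
from math import comb

def hermite_terms(p, d):
    # Tensor-product Hermite expansion: p terms per dimension.
    return p ** d

def taylor_terms(p, d):
    # Multivariate Taylor expansion truncated by total degree.
    return comb(p + d - 1, d)

for d in (3, 6, 10):
    p = 10
    print(d, hermite_terms(p, d), taylor_terms(p, d))
```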

Global nature of the new expansion

Improved Fast Gauss Transform Multivariate Taylor expansion Multivariate Horner's rule Space subdivision based on the k-center algorithm Error bound analysis

Monomial Orders Let $\alpha = (\alpha_1, \ldots, \alpha_n)$ and $\beta = (\beta_1, \ldots, \beta_n)$; then the three standard monomial orders are:
Lexicographic order, or dictionary order: $\alpha \prec_{lex} \beta$ iff the leftmost nonzero entry in $\alpha - \beta$ is negative.
Graded lexicographic order (Veronese map, a.k.a. polynomial embedding): $\alpha \prec_{grlex} \beta$ iff $\sum_{i=1}^{n} \alpha_i < \sum_{i=1}^{n} \beta_i$, or $\sum_{i=1}^{n} \alpha_i = \sum_{i=1}^{n} \beta_i$ and $\alpha \prec_{lex} \beta$.
Graded reverse lexicographic order: $\alpha \prec_{grevlex} \beta$ iff $\sum_{i=1}^{n} \alpha_i < \sum_{i=1}^{n} \beta_i$, or $\sum_{i=1}^{n} \alpha_i = \sum_{i=1}^{n} \beta_i$ and the rightmost nonzero entry in $\alpha - \beta$ is positive.
Example: let $f(x,y,z) = x y^5 z^2 + x^2 y^3 z^3 + x^3$. Then w.r.t. lex, $f = x^3 + x^2 y^3 z^3 + x y^5 z^2$; w.r.t. grlex, $f = x^2 y^3 z^3 + x y^5 z^2 + x^3$; w.r.t. grevlex, $f = x y^5 z^2 + x^2 y^3 z^3 + x^3$.
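For concreteness, a graded-lexicographic comparison of multi-indices can be written directly from the definition (a minimal sketch; the function name is illustrative):

```python
def grlex_less(alpha, beta):
    """True if multi-index alpha precedes beta in graded lexicographic order."""
    if sum(alpha) != sum(beta):
        return sum(alpha) < sum(beta)          # compare total degree first
    for d in (a - b for a, b in zip(alpha, beta)):
        if d != 0:                             # then break ties lexicographically
            return d < 0
    return False                               # equal multi-indices

# Exponent vectors of the slide's example monomials x*y^5*z^2, x^2*y^3*z^3, x^3:
monomials = [(1, 5, 2), (2, 3, 3), (3, 0, 0)]
print(sorted(monomials, key=lambda a: (sum(a), a)))  # grlex-ascending order
```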

Horner's Rule Horner's rule (W. G. Horner, 1819) recursively evaluates the polynomial $p(x) = a_n x^n + \cdots + a_1 x + a_0$ as $p(x) = (\cdots(a_n x + a_{n-1})x + \cdots)x + a_0$. It costs n multiplications and n additions, with no extra storage. We evaluate the multivariate polynomial iteratively using the graded lexicographic order; this costs n multiplications and n additions, plus n storage. [Figure 1: efficient expansion of the multivariate polynomials for x = (a, b, c), building the monomials of degrees 0 through 3; the arrows point to the leading terms.]
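The multivariate analogue used in the IFGT generates each graded-lex monomial from an earlier one with a single multiplication. A minimal sketch of that idea (names are illustrative, not the authors' code):

```python
from itertools import combinations_with_replacement

def monomials_graded(x, p):
    """All monomials x^alpha of total degree < p, each obtained from a
    lower-degree monomial by one extra multiplication (Horner-like)."""
    d = len(x)
    values = {(0,) * d: 1.0}                      # degree-0 monomial
    for deg in range(1, p):
        for combo in combinations_with_replacement(range(d), deg):
            alpha = [0] * d
            for i in combo:
                alpha[i] += 1
            parent = list(alpha)
            parent[combo[-1]] -= 1                # drop one variable to find the parent
            values[tuple(alpha)] = values[tuple(parent)] * x[combo[-1]]
    return values

print(monomials_graded((2.0, 3.0, 5.0), p=3))     # monomials up to total degree 2
```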

An Example of Taylor Expansion Suppose x = (x_1, x_2, x_3) and y = (y_1, y_2, y_3). [Tables: the constant coefficients and the monomials in x and in y up to degree 3; building them incrementally takes far fewer operations than direct evaluation (the operation counts are tabulated on the slide).]

An Example of Taylor Expansion (cont'd) [Tables: the same coefficients and monomials, now accumulated over the N sources and evaluated at the targets; the per-source operation counts of the incremental scheme are again much smaller than for direct evaluation.]

Improved Fast Gauss Transform Multivariate Taylor expansion Multivariate Horner's rule Space subdivision based on the k-center algorithm Error bound analysis

Space Subdivision Scheme The space subdivision scheme in the original FGT is uniform boxes, whose number grows exponentially with the dimensionality. A desirable space subdivision should adaptively fit the density of the points; the cells should be as compact as possible; and the algorithm should preferably be progressive, i.e., the current space subdivision is built from the previous one. Based on these considerations, we model the task as a k-center problem.

k-center Algorithm The k-center problem seeks the best partition of a set of points into clusters (Gonzalez 1985, Hochbaum and Shmoys 1985, Feder and Greene 1988). Given a set of points and a predefined number k, k-center clustering finds a partition $S = S_1 \cup S_2 \cup \cdots \cup S_k$ that minimizes $\max_{1 \le i \le k} \mathrm{radius}(S_i)$, where $\mathrm{radius}(S_i)$ is the radius of the smallest disk that covers all points in $S_i$. The k-center problem is NP-hard, but there exists a simple 2-approximation algorithm. [Figure: the smallest covering circles of the clusters.]

Farthest-Point Algorithm The farthest-point algorithm (a.k.a. the k-center algorithm) is a 2-approximation of the optimal solution (Gonzalez 1985). The total running time is O(kn), where n is the number of points; it can be reduced to O(n log k) using a slightly more complicated algorithm (Feder and Greene 1988).
1. Initially, randomly pick a point v_0 as the first center and add it to the center set C.
2. For i = 1 to k-1: for every point v in V, compute the distance from v to the current center set C = {v_0, v_1, ..., v_{i-1}}: d_i(v, C) = min_{c in C} ||v - c||. From the points V \ C find a point v_i that is farthest away from the current center set C, i.e. d_i(v_i, C) = max_v min_{c in C} ||v - c||. Add v_i to the center set C.
3. Return the center set C = {v_0, v_1, ..., v_{k-1}} as the solution to the k-center problem.
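A direct O(kn) implementation of this farthest-point procedure is only a few lines. This NumPy sketch follows the steps above (the function name is illustrative):

```python
import numpy as np

def farthest_point_clustering(points, k, seed=0):
    """Greedy 2-approximation to k-center: returns center indices and labels."""
    rng = np.random.default_rng(seed)
    n = len(points)
    centers = [int(rng.integers(n))]                  # step 1: random first center
    dist = np.linalg.norm(points - points[centers[0]], axis=1)
    for _ in range(k - 1):                            # step 2: repeatedly add the
        nxt = int(np.argmax(dist))                    # point farthest from all centers
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    labels = np.argmin(                               # assign points to nearest center
        np.linalg.norm(points[:, None, :] - points[centers][None, :, :], axis=-1),
        axis=1,
    )
    return np.array(centers), labels

pts = np.random.default_rng(1).random((40000, 2))
centers, labels = farthest_point_clustering(pts, k=64)
```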

A Demo of k-center Algorithm k = 4

Results of k-center Algorithm 40,000 points are divided into 64 clusters in 0.48 sec on a 900 MHz PIII PC.

More Results of k-center Algorithm The 40,000 points lie on manifolds.

Results of k-center Algorithm The computational complexity of the k-center algorithm is O(n log k). Points are generated from a uniform distribution. [Plots of running time: (left) the number of points n varies from 1,000 to 40,000 for 64 clusters; (right) the number of clusters k varies from 10 to 500 for 40,000 points.]

Improved Fast Gauss Transform Algorithm Collect the contributions from the sources to the cluster centers, controlling the series truncation error; then summarize the contributions from the centers to the targets, controlling the far-field cutoff error.
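Putting the pieces together, here is a compact, unoptimized sketch of the IFGT under the assumptions stated in these slides: cluster the sources (e.g. with the farthest-point clustering shown earlier), accumulate the Taylor coefficients C_alpha at each cluster center, then evaluate each target against nearby centers. Names and helper choices are illustrative, not the authors' reference code.

```python
import numpy as np
from itertools import combinations_with_replacement
from math import factorial

def multi_indices(d, p):
    """All multi-indices alpha with |alpha| < p, in graded order."""
    out = [(0,) * d]
    for deg in range(1, p):
        for combo in combinations_with_replacement(range(d), deg):
            alpha = [0] * d
            for i in combo:
                alpha[i] += 1
            out.append(tuple(alpha))
    return out

def monomials(z, alphas):
    """Evaluate z^alpha for each multi-index; z has shape (m, d)."""
    return np.stack([np.prod(z ** np.array(a), axis=1) for a in alphas], axis=1)

def ifgt(X, Y, q, h, centers, labels, p=6, cutoff=np.inf):
    """Approximate G(y_j) = sum_i q_i exp(-||y_j - x_i||^2 / h^2) via truncated
    Taylor expansions about cluster centers (a didactic sketch of the IFGT)."""
    q = np.asarray(q, dtype=float)
    d = X.shape[1]
    alphas = multi_indices(d, p)
    const = np.array([2.0 ** sum(a) / np.prod([factorial(ai) for ai in a])
                      for a in alphas])
    # Accumulate the coefficients C_alpha at each center.
    C = np.zeros((len(centers), len(alphas)))
    for k, c in enumerate(centers):
        Xk = X[labels == k]
        dx = (Xk - c) / h
        w = q[labels == k] * np.exp(-np.sum((Xk - c) ** 2, axis=1) / h**2)
        C[k] = const * (w @ monomials(dx, alphas))
    # Evaluate the targets against centers within the cutoff radius.
    G = np.zeros(len(Y))
    for k, c in enumerate(centers):
        r = np.linalg.norm(Y - c, axis=1)
        near = r <= cutoff
        dy = (Y[near] - c) / h
        G[near] += np.exp(-r[near] ** 2 / h**2) * (monomials(dy, alphas) @ C[k])
    return G

# Example use with the farthest-point clustering sketch shown earlier:
# centers_idx, labels = farthest_point_clustering(X, k=64)
# G = ifgt(X, Y, q, h=0.5, centers=X[centers_idx], labels=labels, p=8, cutoff=2.0)
```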

Complexities FGT: expansion and evaluation; translation. IFGT: expansion and evaluation; data structures.

Optimal number of centers

Improved Fast Gauss Transform Multivariate Taylor expansion Multivariate Horner's rule Space subdivision based on the k-center algorithm Error bound analysis

Error Bound of IFGT The total error from the series truncation and from cutting off sources outside the neighborhood of the targets is bounded by
$$|E(y)| \le \sum_i |q_i| \left( \frac{2^p}{p!} \left( \frac{r_x}{h} \right)^{p} \left( \frac{r_y}{h} \right)^{p} + e^{-r_y^2 / h^2} \right),$$
where the first term is the truncation error and the second the cutoff error ($r_x$: radius of the source clusters, $r_y$: cutoff radius around the targets).
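In practice the bound can be used to pick the truncation order p: increase p until the bound falls below the desired precision. A small sketch under that reading of the slide (names illustrative):

```python
from math import factorial, exp

def ifgt_error_bound(p, rx, ry, h, Q):
    """Upper bound on |E(y)|: truncation term plus far-field cutoff term (Q = sum |q_i|)."""
    truncation = (2.0 ** p) / factorial(p) * (rx / h) ** p * (ry / h) ** p
    cutoff = exp(-(ry / h) ** 2)
    return Q * (truncation + cutoff)

def choose_truncation_order(rx, ry, h, Q, eps, p_max=50):
    """Smallest p whose error bound is below eps (None if not reached by p_max)."""
    for p in range(1, p_max + 1):
        if ifgt_error_bound(p, rx, ry, h, Q) <= eps:
            return p
    return None

print(choose_truncation_order(rx=0.4, ry=4.0, h=1.0, Q=1.0, eps=1e-6))
```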

Error Bound Analysis As the number of truncation terms p increases, the error bound decreases. As the k-center algorithm progresses, the radius of the source clusters r_x decreases, until the error bound is below a given precision. The error bound first decreases and then increases with respect to the cutoff radius r_y. [Plots: real maximum absolute error vs. the estimated error bound, as functions of the truncation order p, the maximum cell radius r_x, and the cutoff radius r_y.]

Experimental Result The speedup of the fast Gauss transform in 4, 6, 8, and 10 dimensions (h = 1.0). [Plots: CPU time and maximum absolute error vs. N for the direct method and the fast method in 4, 6, 8, and 10 dimensions.]

New improvements in the IFGT

Pointwise error bounds and translation These can also be used to speed up the general FMM.

Some Applications Kernel density estimation Mean-shift based segmentation Object tracking Robot navigation

Kernel Density Estimation (KDE) Kernel density estimation (a.k.a. the Parzen window method; Rosenblatt 1956, Parzen 1962) is an important nonparametric technique. KDE is the keystone of many algorithms: radial basis function networks, support vector machines, the mean shift algorithm, the regularized particle filter. Its main drawback is the quadratic computational complexity, which is very slow for large datasets.

Kernel Density Estimation Given a set of observations $\{x_1, \ldots, x_n\}$, an estimate of the density function is
$$\hat{f}_n(x) = \frac{1}{n h^d} \sum_{i=1}^{n} K\!\left( \frac{x - x_i}{h} \right),$$
where K is the kernel function, h the bandwidth, and d the dimension. Some commonly used kernel functions are the rectangular, triangular, Epanechnikov, and Gaussian kernels. The computational requirement for large datasets is $O(N^2)$ for N points.

Efficient KDE and FGT In practice, the most widely used kernel is the Gaussian, $K_N(x) = (2\pi)^{-d/2} e^{-\frac{1}{2}\|x\|^2}$. The density estimate using the Gaussian kernel is
$$\hat{p}_n(x) = c_N \sum_{i=1}^{N} e^{-\|x - x_i\|^2 / h^2}.$$
The fast Gauss transform can reduce the cost to O(N log N) in low-dimensional spaces. The improved fast Gauss transform accelerates KDE in both lower and higher dimensions.
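As a reference point, direct Gaussian KDE is just a weighted sum of Gaussians with uniform weights, so it is exactly the kind of sum the (I)FGT accelerates. A minimal NumPy sketch of the direct estimator, with the normalization constant written explicitly (names illustrative):

```python
import numpy as np

def gaussian_kde_direct(X, queries, h):
    """Direct O(M*N) Gaussian kernel density estimate at the query points.

    X : (N, d) data points, queries : (M, d) evaluation points, h : bandwidth.
    """
    N, d = X.shape
    c = 1.0 / (N * (np.sqrt(np.pi) * h) ** d)      # normalizes exp(-r^2/h^2) to a density
    d2 = ((queries[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return c * np.exp(-d2 / h**2).sum(axis=1)      # this sum is what an (I)FGT would replace

X = np.random.default_rng(0).normal(size=(5000, 2))
density = gaussian_kde_direct(X, X[:100], h=0.3)
```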

Experimental Result Image segmentation results of the mean-shift algorithm with the Gaussian kernel. Size 43x94: time 7.984 s (direct evaluation: more than 2 hours). Size 481x31: time 1.359 s (direct evaluation: more than 2 hours).

Some Applications Kernel density estimation Mean-shift based segmentation Object tracking Robot Navigation

Object Tracking The goal of object tracking is to find the moving objects between consecutive frames. A model image or template is given for tracking. Usually a feature space is used, such as pixel intensity, color, edges, etc., and a similarity measure is used to measure the difference between the model image and the current image. Temporal correlation assumption: the change between two consecutive frames is small. [Diagram: similarity function between the model image and the target image.]

Proposed Similarity Measures Our similarity measure is directly computed from the data points, scales well to higher dimensions, can be sped up by the fast Gauss transform if Gaussian kernels are employed, and can be interpreted as the expectation of the density estimates over the model and target images.

Experimental results Pure translation. Feature space: RGB + 2D spatial space. Average processing time per frame: 0.0169 s. [Video frames: our method compared with histogram tracking using the Bhattacharyya distance.]

Experimental results Pure translation. Feature space: RGB + 2D gradient + 2D spatial space. Average processing time per frame: 0.0044 s.

Experimental results Pure translation. Feature space: RGB + 2D spatial space. The similarity measure remains robust and accurate on small targets, compared with other methods such as histogram-based tracking.

Experimental Results Translation + scaling. Feature space: RGB + 2D spatial space.

Conclusions The improved fast Gauss transform is presented for higher-dimensional spaces. Kernel density estimation is accelerated by the IFGT. A similarity measure in the joint feature-spatial space is proposed for robust object tracking. With different motion models, we derive several tracking algorithms, and the improved fast Gauss transform is applied to speed them up to real-time performance.