Very fast optimal bandwidth selection for univariate kernel density estimation


Very fast optimal bandwidth selection for univariate kernel density estimation

VIKAS CHANDRAKANT RAYKAR and RAMANI DURAISWAMI
Perceptual Interfaces and Reality Laboratory
Department of Computer Science and Institute for Advanced Computer Studies
University of Maryland, College Park, MD 20783

Most automatic bandwidth selection procedures for kernel density estimates require estimation of quantities involving the density derivatives. Estimation of modes and inflexion points of densities also requires derivative estimates. The computational complexity of evaluating the density derivative at M evaluation points given N sample points from the density is O(MN). In this paper we propose a computationally efficient ε-exact approximation algorithm for univariate Gaussian kernel based density derivative estimation that reduces the computational complexity from O(MN) to linear O(N + M). The constant depends on the desired arbitrary accuracy ε. We apply the density derivative evaluation procedure to estimate the optimal bandwidth for kernel density estimation, a process that is often intractable for large data sets. For example, for N = M = 409,600 points, the direct evaluation of the density derivative takes around 12.76 hours, while the fast evaluation requires only 65 seconds with an error of around 10^-12. Algorithm details, error bounds, a procedure to choose the parameters, and numerical experiments are presented. We demonstrate the speedup achieved on bandwidth selection using the solve-the-equation plug-in method. We also demonstrate that the proposed procedure can be extremely useful for speeding up exploratory projection pursuit techniques. [CS-TR-4774/UMIACS-TR-2005-73, December 2005]

1. INTRODUCTION

Kernel density estimation/regression techniques [Wand and Jones 1995] are widely used in various inference procedures in machine learning, data mining, pattern recognition, and computer vision. Efficient use of these methods requires the optimal selection of the smoothing parameter, called the bandwidth of the kernel. A plethora of techniques have been proposed for data-driven bandwidth selection [Jones et al. 1996]. The most successful state-of-the-art methods rely on the estimation of general integrated squared density derivative functionals. This is the most computationally intensive task, the computational cost being O(N^2), in addition to the O(N^2) cost of computing the kernel density estimate. The core task is to efficiently compute an estimate of the density derivative. The currently most practically successful approach, the solve-the-equation plug-in method [Sheather and Jones 1991], involves the numerical solution of a non-linear equation; iterative methods to solve this equation involve repeated use of the density functional estimator for different bandwidths, which adds much further to the computational burden. We also point out that estimation of density derivatives comes up in various other applications, like estimation of modes and inflexion points of densities [Fukunaga and Hostetler 1975] and estimation of the derivatives of the projection index in projection pursuit algorithms [Huber 1985; Jones and Sibson 1987]. A good list of applications which require the estimation of density derivatives can be found in [Singh 1977a].

The computational complexity of evaluating the density derivative at M evaluation points given N sample points from the density is O(MN). In this paper we propose a computationally efficient ε-exact approximation algorithm for univariate Gaussian kernel based density derivative estimation that reduces the computational complexity from O(MN) to linear O(N + M). The algorithm is ε-exact in the sense that the constant hidden in O(N + M) depends on the desired accuracy, which can be arbitrary. In fact, for machine precision accuracy there is no difference between the direct and the fast methods. The proposed method can be viewed as an extension of the improved fast Gauss transform [Yang et al. 2003], originally proposed to accelerate the kernel density estimate.

The rest of the paper is organized as follows. In §2 we introduce the kernel density estimate and discuss the performance of the estimator. The kernel density derivative estimate is introduced in §3. §4 discusses the density functionals which are used by most of the automatic bandwidth selection strategies. §5 briefly describes the different strategies for automatic optimal bandwidth selection; the solve-the-equation plug-in method is described in detail. Our proposed fast method is described in §6, where algorithm details, error bounds, a procedure to choose the parameters, and numerical experiments are presented. In §7 we show the speedup achieved for bandwidth estimation both on simulated and real data. In §8 we show how the proposed procedure can be used for speeding up projection pursuit techniques. §9 concludes with a brief discussion of further extensions.

2. KERNEL DENSITY ESTIMATION

A univariate random variable X on ℝ has a density p if, for all Borel sets A of ℝ, ∫_A p(x)dx = Pr[X ∈ A]. The task of density estimation is to estimate p from an i.i.d. sample drawn from it.

Given an i.i.d. sample x_1, ..., x_N drawn from p, the estimate p̂ : ℝ × ℝ^N → ℝ is called the density estimate. The parametric approach to density estimation assumes a functional form for the density, and then estimates the unknown parameters using techniques like maximum likelihood estimation. However, unless the form of the density is known a priori, assuming a functional form for a density very often leads to erroneous inference. Nonparametric methods, on the other hand, do not make any assumption on the form of the underlying density. This is sometimes referred to as "letting the data speak for themselves" [Wand and Jones 1995]. The price to be paid is a rate of convergence slower than the 1/N rate typical of parametric methods. Some of the commonly used nonparametric estimators include histograms, kernel density estimators, and orthogonal series estimators [Izenman 1991]. The histogram is very sensitive to the placement of the bin edges, and its asymptotic convergence is much slower than that of kernel density estimators¹.

The most popular nonparametric method for density estimation is the kernel density estimator (KDE), also known as the Parzen window estimator [Parzen 1962], given by

p̂(x) = (1/Nh) Σ_{i=1}^N K((x − x_i)/h),   (1)

where K(u) is called the kernel function and h = h(N) is called the bandwidth. The bandwidth is a scaling factor which goes to zero as N → ∞. In order that p̂(x) be a bona fide density, K(u) is required to satisfy the following two conditions:

K(u) ≥ 0,  ∫ K(u) du = 1.   (2)

The kernel function essentially spreads a probability mass of 1/N, associated with each point, about its neighborhood². The most widely used kernel is the Gaussian of zero mean and unit variance:

K(u) = (1/√(2π)) e^{−u²/2}.   (3)

In this case the kernel density estimate can be written as

p̂(x) = (1/(N√(2π)h)) Σ_{i=1}^N e^{−(x−x_i)²/2h²}.   (4)

¹The best rate of convergence of the MISE of the kernel density estimate is of order N^{−4/5}, while that of the histogram is of order N^{−2/3}.
²The KDE is not very sensitive to the shape of the kernel. While the Epanechnikov kernel is the optimal kernel, in the sense that it minimizes the MISE, other kernels are not that suboptimal [Wand and Jones 1995]. The Epanechnikov kernel is not used here because it gives an estimate having a discontinuous first derivative, because of its finite support.
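As a concrete reference point, the following is a minimal NumPy sketch of the direct evaluation of Eq. 4 (the function and variable names are ours, not part of the paper); it is exactly this O(NM) sum that the rest of the paper sets out to accelerate.

    import numpy as np

    def kde_direct(x, y, h):
        # Direct O(NM) evaluation of the Gaussian KDE of Eq. 4 at the points y.
        n = len(x)
        u = (y[:, None] - x[None, :]) / h   # (M, N) matrix of (y_j - x_i)/h
        return np.exp(-0.5 * u**2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))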

2.1 Computational complexity

The computational cost of evaluating Eq. 4 at N points is O(N²), making it prohibitively expensive. Different methods have been proposed to accelerate this sum. If the source points are on an evenly spaced grid then we can evaluate the sum at an evenly spaced grid exactly in O(N log N) using the fast Fourier transform (FFT). One of the earliest methods, proposed especially for univariate fast kernel density estimation, was based on this idea [Silverman 1982]. For irregularly spaced data, the space is divided into boxes, and the data are assigned to the closest neighboring grid points to obtain grid counts; the KDE is then evaluated at regular grid points, and for target points not lying on the grid the value is obtained by some sort of interpolation based on the values at the neighboring grid points. As a result there is no guaranteed error bound for such methods. The Fast Gauss Transform (FGT) [Greengard and Strain 1991] is an approximation algorithm that reduces the computational complexity to O(N), at the expense of reduced precision; the constant depends on the desired precision, the dimensionality of the problem, and the bandwidth. Yang et al. [Yang et al. 2003; Yang et al. 2005] presented an extension of the fast Gauss transform (the improved fast Gauss transform, or IFGT) that is suitable for higher dimensional problems and provides comparable performance in lower dimensions. The main contribution of the current paper is the extension of the improved fast Gauss transform to accelerate the kernel density derivative estimate, and to solve the optimal bandwidth problem. Another class of methods for such problems are the dual-tree methods [Gray and Moore 2001; 2003], which are based on space partitioning trees for both the source and target points; using the tree data structure, distance bounds between nodes can be computed. An advantage of the dual-tree methods is that they work for all common kernel choices, not necessarily Gaussian.

2.2 Performance

In order to understand the performance of the KDE we need a measure of distance between two densities. The commonly used criterion, which can be easily manipulated, is the L₂ norm, also called the integrated squared error (ISE)³. The ISE between the estimate p̂(x) and the actual density p(x) is given by

ISE(p̂, p) = L₂²(p̂, p) = ∫ [p̂(x) − p(x)]² dx.   (5)

The ISE depends on a particular realization of N points. It can be averaged over these realizations to get the mean integrated squared error (MISE), defined as

MISE(p̂, p) = E[ISE(p̂, p)] = E[ ∫ [p̂(x) − p(x)]² dx ] = ∫ E[{p̂(x) − p(x)}²] dx = IMSE(p̂, p),   (6)

where IMSE is the integrated mean squared error. The MISE or IMSE does not depend on the actual data set, as we take the expectation; it is a measure of the average performance of the kernel density estimator, averaged over the support of the density and over different realizations of the points.

³Other distance measures, like the mean integrated absolute error (based on the L₁ distance [Devroye and Lugosi 2001]), the Kullback-Leibler divergence, and the Hellinger distance, are also used. In this paper we use only the L₂ criterion.

The MISE for the KDE can be shown to be (see Appendix 1 for a derivation)

MISE(p̂, p) = (1/N) ∫ [ (K_h² ∗ p)(x) − (K_h ∗ p)²(x) ] dx + ∫ [ (K_h ∗ p)(x) − p(x) ]² dx,   (7)

where ∗ is the convolution operator and K_h(x) = (1/h)K(x/h). The dependence of the MISE on the bandwidth h is not very explicit in this expression, which makes it difficult to interpret the influence of the bandwidth on the performance of the estimator. An asymptotic large-sample approximation, called the AMISE (the A is for asymptotic), is usually derived via Taylor series. Under certain assumptions⁴, the AMISE between the actual density and the estimate can be shown to be

AMISE(p̂, p) = (1/Nh) R(K) + (1/4) h⁴ μ₂(K)² R(p″),   (8)

where

R(g) = ∫ g(x)² dx,  μ₂(g) = ∫ x² g(x) dx,   (9)

and p″ is the second derivative of the density p (see Appendix 2 for a complete derivation). The first term in expression (8) is the integrated variance and the second term is the integrated squared bias. The bias is proportional to h⁴ whereas the variance is proportional to 1/Nh, which leads to the well-known bias-variance tradeoff. Based on the AMISE expression the optimal bandwidth h_AMISE can be obtained by differentiating Eq. 8 with respect to h and setting it to zero:

h_AMISE = [ R(K) / (μ₂(K)² R(p″) N) ]^{1/5}.   (10)

However this expression cannot be used directly, since R(p″) depends on the second derivative of the density p, which we are trying to estimate in the first place; we need to use an estimate of R(p″). Substituting Eq. 10 in Eq. 8, the minimum AMISE that can be attained is

inf_h AMISE(p̂, p) = (5/4) [ μ₂(K)² R(K)⁴ R(p″) ]^{1/5} N^{−4/5}.   (11)

This expression shows that the best rate of convergence of the MISE of the KDE is of order N^{−4/5}.

⁴The second derivative p″(x) is continuous, square integrable and ultimately monotone. lim_{N→∞} h = 0 and lim_{N→∞} Nh = ∞, i.e., as the number of samples N is increased, h approaches zero at a rate slower than 1/N. The kernel function is assumed to be symmetric about the origin (∫ zK(z)dz = 0) and to have finite second moment (∫ z²K(z)dz < ∞).
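Eq. 10 becomes usable once R(p″) is replaced by an estimate. As a worked special case (this is the "rules of thumb" idea reviewed in §5.1): if p is taken to be normal with standard deviation σ, then R(p″) = 3/(8√π σ⁵), and for the Gaussian kernel (R(K) = 1/(2√π), μ₂(K) = 1) Eq. 10 reduces to h = (4/3)^{1/5} σ N^{−1/5} ≈ 1.06 σ N^{−1/5}. A one-line sketch (names ours):

    import numpy as np

    def normal_reference_bandwidth(x):
        # Eq. 10 with R(p'') evaluated for a normal density:
        # h = (4/3)^(1/5) * sigma * N^(-1/5)
        return (4.0 / 3.0) ** 0.2 * np.std(x, ddof=1) * len(x) ** -0.2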

3. KERNEL DENSITY DERIVATIVE ESTIMATION

In order to estimate R(p″) we will need an estimate of the density derivative. A simple estimator for the density derivative can be obtained by taking the derivative of the kernel density estimate p̂(x) defined earlier [Bhattacharya 1967; Schuster 1969]⁵. If the kernel K is differentiable r times then the r-th density derivative estimate p̂^{(r)}(x) can be written as

p̂^{(r)}(x) = (1/(N h^{r+1})) Σ_{i=1}^N K^{(r)}((x − x_i)/h),   (12)

where K^{(r)} is the r-th derivative of the kernel K. The r-th derivative of the Gaussian kernel K(u) is given by

K^{(r)}(u) = (−1)^r H_r(u) K(u),   (13)

where H_r(u) is the r-th Hermite polynomial. The Hermite polynomials are a set of orthogonal polynomials [Abramowitz and Stegun 1972]; the first few are H_0(u) = 1, H_1(u) = u, and H_2(u) = u² − 1. Hence the density derivative estimate with the Gaussian kernel can be written as

p̂^{(r)}(x) = ((−1)^r/(√(2π) N h^{r+1})) Σ_{i=1}^N H_r((x − x_i)/h) e^{−(x−x_i)²/2h²}.   (14)

3.1 Computational complexity

The computational complexity of evaluating the r-th derivative of the density estimate due to N points at M target locations is O(rNM).

3.2 Performance

Similar to the analysis done for the KDE, the AMISE for the kernel density derivative estimate, under certain assumptions⁶, can be shown to be (see Appendix 3 for a complete derivation)

AMISE(p̂^{(r)}, p^{(r)}) = R(K^{(r)})/(N h^{2r+1}) + (1/4) h⁴ μ₂(K)² R(p^{(r+2)}).   (15)

It can be observed that the AMISE for estimating the r-th derivative depends upon the (r+2)-th derivative of the true density. Differentiating Eq. 15 with respect to h and setting it to zero, we obtain the optimal bandwidth h^r_AMISE for estimating the r-th density derivative:

h^r_AMISE = [ R(K^{(r)})(2r+1) / (μ₂(K)² R(p^{(r+2)}) N) ]^{1/(2r+5)}.   (16)

Substituting Eq. 16 in the expression for the AMISE, the minimum AMISE that can be attained is

inf_h AMISE(p̂^{(r)}, p^{(r)}) = C_r [ μ₂(K)^{2(2r+1)} R(K^{(r)})⁴ R(p^{(r+2)})^{2r+1} ]^{1/(2r+5)} N^{−4/(2r+5)},

where C_r is a constant depending on r.

⁵Some better estimators, which are not necessarily the r-th order derivatives of the KDE, have been proposed [Singh 1977b].
⁶The (r+2)-th derivative p^{(r+2)}(x) is continuous, square integrable and ultimately monotone. lim_{N→∞} h = 0 and lim_{N→∞} N h^{2r+1} = ∞, i.e., as the number of samples N is increased, h approaches zero at a rate slower than 1/N^{1/(2r+1)}. The kernel function is assumed to be symmetric about the origin (∫ zK(z)dz = 0) and to have finite second moment (∫ z²K(z)dz < ∞).
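A minimal sketch of the direct O(rNM) evaluation of Eq. 14 (names ours); NumPy's probabilists' Hermite polynomials match the H_r of Eq. 13:

    import numpy as np
    from numpy.polynomial.hermite_e import hermeval

    def kde_derivative_direct(x, y, h, r):
        # Direct evaluation of the r-th density derivative estimate (Eq. 14) at y.
        n = len(x)
        u = (y[:, None] - x[None, :]) / h
        hr = hermeval(u, [0] * r + [1])   # H_r(u), probabilists' convention
        g = np.exp(-0.5 * u**2)
        return (-1) ** r * (hr * g).sum(axis=1) / (np.sqrt(2.0 * np.pi) * n * h ** (r + 1))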

This expression shows that the best rate of convergence of the MISE of the KDE of the derivative is of order N^{−4/(2r+5)}. The rate becomes slower for higher values of r, which says that estimating the derivative is more difficult than estimating the density.

4. ESTIMATION OF DENSITY FUNCTIONALS

Rather than the actual density derivative, methods for automatic bandwidth selection require the estimation of what are known as density functionals. The general integrated squared density derivative functional is defined as

R(p^{(s)}) = ∫ [p^{(s)}(x)]² dx.   (17)

Using integration by parts, this can be written in the following form:

R(p^{(s)}) = (−1)^s ∫ p^{(2s)}(x) p(x) dx.   (18)

More specifically, for even s we are interested in estimating density functionals of the form

Φ_r = ∫ p^{(r)}(x) p(x) dx = E[p^{(r)}(X)].   (19)

An estimator for Φ_r is

Φ̂_r = (1/N) Σ_{i=1}^N p̂^{(r)}(x_i),   (20)

where p̂^{(r)}(x_i) is the estimate of the r-th derivative of the density p(x) at x = x_i. Using the kernel density derivative estimate (Eq. 12) for p̂^{(r)}(x_i), we have

Φ̂_r = (1/(N² h^{r+1})) Σ_{i=1}^N Σ_{j=1}^N K^{(r)}((x_i − x_j)/h).   (21)

It should be noted that computation of Φ̂_r is O(rN²) and hence can be very expensive if a direct algorithm is used.

4.1 Performance

The asymptotic MSE for the density functional estimator, under certain assumptions⁷, is as follows (see [Wand and Jones 1995] for a complete derivation):

AMSE(Φ̂_r, Φ_r) = [ (1/(N h^{r+1})) K^{(r)}(0) + (h²/2) μ₂(K) Φ_{r+2} ]² + (2/(N² h^{2r+1})) Φ_0 R(K^{(r)}) + (4/N) [ ∫ p^{(r)}(y)² p(y) dy − Φ_r² ].   (22)

⁷The density p has k > 2 continuous derivatives which are ultimately monotone. The (r+2)-th derivative p^{(r+2)}(x) is continuous, square integrable and ultimately monotone. lim_{N→∞} h = 0 and lim_{N→∞} N h^{2r+1} = ∞, i.e., as the number of samples N is increased, h approaches zero at a rate slower than 1/N^{1/(2r+1)}. The kernel function is assumed to be symmetric about the origin (∫ zK(z)dz = 0) and to have finite second moment (∫ z²K(z)dz < ∞).
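A direct sketch of the functional estimator of Eq. 21 for the Gaussian kernel (names ours); the O(rN²) double sum here is exactly the cost that the fast method of §6 removes:

    import numpy as np
    from numpy.polynomial.hermite_e import hermeval

    def phi_direct(x, h, r):
        # Direct O(rN^2) evaluation of Eq. 21 with the Gaussian kernel,
        # using K^(r)(u) = (-1)^r H_r(u) K(u) from Eq. 13.
        n = len(x)
        u = (x[:, None] - x[None, :]) / h
        kr = (-1) ** r * hermeval(u, [0] * r + [1]) * np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
        return kr.sum() / (n**2 * h ** (r + 1))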

The optimal bandwidth for estimating the density functional is chosen to make the bias term zero, and is given by [Wand and Jones 1995]

g_MSE = [ −2 K^{(r)}(0) / (μ₂(K) Φ_{r+2} N) ]^{1/(r+3)}.   (23)

5. AMISE OPTIMAL BANDWIDTH SELECTION

For a practical implementation of the KDE the choice of the bandwidth h is very important. A small h leads to an estimator with small bias and large variance; a large h leads to a small variance at the expense of an increase in bias. The bandwidth h has to be chosen optimally. Various techniques have been proposed for optimal bandwidth selection; a brief survey can be found in [Jones et al. 1996] and [Wand and Jones 1995]. The best known of these include rules of thumb, oversmoothing, least squares cross-validation, biased cross-validation, direct plug-in methods, the solve-the-equation plug-in method, and the smoothed bootstrap.

5.1 Brief review of different methods

Based on the AMISE expression, the optimal bandwidth h_AMISE has the following form:

h_AMISE = [ R(K) / (μ₂(K)² R(p″) N) ]^{1/5}.   (24)

However this expression cannot be used directly, since R(p″) depends on the second derivative of the density p, which we are trying to estimate in the first place.

The rules of thumb use an estimate of R(p″) assuming that the data is generated by some parametric form of the density (typically a normal distribution). The oversmoothing methods rely on the fact that there is a simple upper bound for the AMISE-optimal bandwidth for estimation of densities with a fixed value of a particular scale measure. Least squares cross-validation directly minimizes the MISE based on a leave-one-out kernel density estimator; the problem is that the function to be minimized has a fairly large number of local minima, and the practical performance of this method is somewhat disappointing. Biased cross-validation uses the AMISE instead of the exact MISE formula; this is more stable than least squares cross-validation but has a larger bias.

The plug-in methods use an estimate of the density functional R(p″) in Eq. 24. However this is not completely automatic, since estimation of R(p″) requires the specification of another pilot bandwidth g. This bandwidth for estimating the density functional is quite different from the bandwidth h used for the kernel density estimate. As discussed in Section 4, we can find an expression for the AMISE-optimal bandwidth for the estimation of R(p″); however, this bandwidth will in turn depend on a further unknown density functional. This problem continues, since the optimal bandwidth for estimating R(p^{(s)}) will depend on R(p^{(s+1)}). The usual strategy used by the direct plug-in methods is to estimate R(p^{(l)}) for some l, with bandwidth chosen with reference to a parametric family, usually a normal density. This method is usually referred to as the l-stage direct plug-in method. As the number of stages l increases, the bias of the bandwidth decreases, since the dependence on the assumption of some parametric family decreases.

However, this comes at the price of the estimate being more variable. There is no good method for the choice of l, the most common choice being l = 2.

5.2 Solve-the-equation plug-in method

The most successful among all the current methods, both empirically and theoretically, is the solve-the-equation plug-in method [Jones et al. 1996]. This method differs from the direct plug-in approach in that the pilot bandwidth used to estimate R(p″) is written as a function of the kernel bandwidth h. We use the following version, as described in [Sheather and Jones 1991]. The AMISE-optimal bandwidth is the solution to the equation

h = [ R(K) / (μ₂(K)² Φ̂_4[γ(h)] N) ]^{1/5},   (25)

where Φ̂_4[γ(h)] is an estimate of Φ_4 = R(p″) using the pilot bandwidth γ(h), which depends on the kernel bandwidth h. The pilot bandwidth is chosen to minimize the asymptotic MSE for the estimation of Φ_4 and is given by

g_MSE = [ −2 K^{(4)}(0) / (μ₂(K) Φ_6 N) ]^{1/7}.   (26)

Substituting for N from Eq. 24, g_MSE can be written as a function of h as follows:

g_MSE = [ −2 K^{(4)}(0) μ₂(K) Φ_4 / (R(K) Φ_6) ]^{1/7} h_AMISE^{5/7}.   (27)

This suggests that we set

γ(h) = [ −2 K^{(4)}(0) μ₂(K) Φ̂_4(g_1) / (R(K) Φ̂_6(g_2)) ]^{1/7} h^{5/7},   (28)

where Φ̂_4(g_1) and Φ̂_6(g_2) are estimates of Φ_4 and Φ_6 using bandwidths g_1 and g_2 respectively:

Φ̂_4(g_1) = (1/(N(N−1) g_1^5)) Σ_{i=1}^N Σ_{j=1}^N K^{(4)}((x_i − x_j)/g_1),   (29)

Φ̂_6(g_2) = (1/(N(N−1) g_2^7)) Σ_{i=1}^N Σ_{j=1}^N K^{(6)}((x_i − x_j)/g_2).   (30)

The bandwidths g_1 and g_2 are chosen to minimize the asymptotic MSE:

g_1 = [ −2 K^{(4)}(0) / (μ₂(K) Φ̂_6 N) ]^{1/7},  g_2 = [ −2 K^{(6)}(0) / (μ₂(K) Φ̂_8 N) ]^{1/9},   (31)

where Φ̂_6 and Φ̂_8 are estimators for Φ_6 and Φ_8 respectively. We could use a similar strategy for the estimation of Φ_6 and Φ_8, but this problem would continue, since the optimal bandwidth for estimating Φ_r depends on Φ_{r+2}.

The usual strategy is to estimate Φ_r at some stage, using a quick and simple bandwidth chosen with reference to a parametric family, usually a normal density. It has been observed that as the number of stages increases, the variance of the bandwidth increases; the most common choice is to use only two stages.

If p is a normal density with variance σ², then for even r we can compute Φ_r exactly [Wand and Jones 1995]:

Φ_r = ((−1)^{r/2} r!) / ((2σ)^{r+1} (r/2)! √π).   (32)

An estimator of Φ_r will use an estimate σ̂² of the variance. Based on this we can write estimators for Φ_6 and Φ_8 as follows:

Φ̂_6 = −15/(16√π σ̂^7),  Φ̂_8 = 105/(32√π σ̂^9).   (33)

The two-stage solve-the-equation method using the Gaussian kernel can be summarized as follows.

(1) Compute an estimate σ̂ of the standard deviation σ.

(2) Estimate the density functionals Φ_6 and Φ_8 using the normal scale rule:

Φ̂_6 = −15/(16√π σ̂^7),  Φ̂_8 = 105/(32√π σ̂^9).

(3) Estimate the density functionals Φ_4 and Φ_6 using the kernel density derivative estimators with the optimal bandwidth based on the asymptotic MSE:

g_1 = [ −6/(√(2π) Φ̂_6 N) ]^{1/7},  g_2 = [ 30/(√(2π) Φ̂_8 N) ]^{1/9},

Φ̂_4(g_1) = (1/(N(N−1)√(2π) g_1^5)) Σ_{i=1}^N Σ_{j=1}^N H_4((x_i − x_j)/g_1) e^{−(x_i−x_j)²/2g_1²},

Φ̂_6(g_2) = (1/(N(N−1)√(2π) g_2^7)) Σ_{i=1}^N Σ_{j=1}^N H_6((x_i − x_j)/g_2) e^{−(x_i−x_j)²/2g_2²}.

(4) The bandwidth h is the solution to the equation

h = [ 1/(2√π Φ̂_4[γ(h)] N) ]^{1/5},

where

Φ̂_4[γ(h)] = (1/(N(N−1)√(2π) γ(h)^5)) Σ_{i=1}^N Σ_{j=1}^N H_4((x_i − x_j)/γ(h)) e^{−(x_i−x_j)²/2γ(h)²}

and

γ(h) = [ −6√2 Φ̂_4(g_1)/Φ̂_6(g_2) ]^{1/7} h^{5/7}.
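The following is a sketch of the two-stage procedure above, with the pilot functionals computed by direct O(N²) double sums for clarity (the point of this paper is precisely to replace those sums with the fast ε-exact evaluation of §6). The function names are ours, the fixed-point iteration is one simple way to solve the equation in step (4), and the constant −6√2 in γ(h) follows from Eq. 28 with the Gaussian-kernel values K^{(4)}(0) = 3/√(2π), μ₂(K) = 1, and R(K) = 1/(2√π):

    import numpy as np

    SQRT2PI = np.sqrt(2.0 * np.pi)

    def _phi_hat(x, g, r):
        # Direct O(N^2) version of Eqs. 29-30 with the Gaussian kernel.
        n = len(x)
        u = (x[:, None] - x[None, :]) / g
        if r == 4:
            hr = u**4 - 6.0 * u**2 + 3.0                      # H_4(u)
        else:
            hr = u**6 - 15.0 * u**4 + 45.0 * u**2 - 15.0      # H_6(u)
        return (hr * np.exp(-0.5 * u**2)).sum() / (n * (n - 1) * SQRT2PI * g ** (r + 1))

    def solve_the_equation_bandwidth(x, tol=1e-6, max_iter=50):
        n = len(x)
        sigma = np.std(x, ddof=1)                             # step (1)
        phi6 = -15.0 / (16.0 * np.sqrt(np.pi) * sigma**7)     # step (2), Eq. 33
        phi8 = 105.0 / (32.0 * np.sqrt(np.pi) * sigma**9)
        g1 = (-6.0 / (SQRT2PI * phi6 * n)) ** (1.0 / 7.0)     # step (3)
        g2 = (30.0 / (SQRT2PI * phi8 * n)) ** (1.0 / 9.0)
        phi4_g1, phi6_g2 = _phi_hat(x, g1, 4), _phi_hat(x, g2, 6)
        h = 1.06 * sigma * n ** -0.2        # start from the normal reference rule
        for _ in range(max_iter):           # step (4) by fixed-point iteration
            gamma = (-6.0 * np.sqrt(2.0) * phi4_g1 / phi6_g2) ** (1.0 / 7.0) * h ** (5.0 / 7.0)
            h_new = (1.0 / (2.0 * np.sqrt(np.pi) * _phi_hat(x, gamma, 4) * n)) ** 0.2
            if abs(h_new - h) < tol * h:
                break
            h = h_new
        return h_new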

This equation can be solved using any numerical routine, like the Newton-Raphson method. The main computational bottleneck is the estimation of Φ̂_4, which is of O(N²).

6. FAST DENSITY DERIVATIVE ESTIMATION

The r-th kernel density derivative estimate using the Gaussian kernel of bandwidth h is given by

p̂^{(r)}(x) = ((−1)^r/(√(2π) N h^{r+1})) Σ_{i=1}^N H_r((x − x_i)/h) e^{−(x−x_i)²/2h²}.   (34)

Let us say we have to estimate the density derivative at M target points {y_j}_{j=1}^M. More generally, we need to evaluate the following sum:

G_r(y_j) = Σ_{i=1}^N q_i H_r((y_j − x_i)/h_2) e^{−(y_j−x_i)²/h_1²},  j = 1, ..., M,   (35)

where {q_i}_{i=1}^N will be referred to as the source weights, h_1 ∈ ℝ⁺ is the bandwidth of the Gaussian, and h_2 ∈ ℝ⁺ is the bandwidth of the Hermite polynomial. The computational complexity of evaluating Eq. 35 directly is O(rNM). The fast algorithm is based on separating the x_i and y_j in the Gaussian via a factorization of the Gaussian by Taylor series, retaining only the first few terms so that the error due to truncation is less than the desired error; the Hermite polynomial is factorized via the binomial theorem. For any given ε > 0 the algorithm computes an approximation Ĝ_r(y_j) such that

|Ĝ_r(y_j) − G_r(y_j)| ≤ Q ε,   (36)

where Q = Σ_{i=1}^N |q_i|. We call Ĝ_r(y_j) an ε-exact approximation to G_r(y_j).

6.1 Factorization of the Gaussian

For any point x_* ∈ ℝ, the Gaussian can be written as

e^{−(y_j−x_i)²/h_1²} = e^{−((y_j−x_*)−(x_i−x_*))²/h_1²} = e^{−(x_i−x_*)²/h_1²} e^{−(y_j−x_*)²/h_1²} e^{2(x_i−x_*)(y_j−x_*)/h_1²}.   (37)

In Eq. 37 the first exponential, e^{−(x_i−x_*)²/h_1²}, depends only on the source coordinates x_i; the second exponential, e^{−(y_j−x_*)²/h_1²}, depends only on the target coordinates y_j. However, in the third exponential, e^{2(x_i−x_*)(y_j−x_*)/h_1²}, the source and target are entangled. This entanglement is separated using a Taylor series expansion. The factorization of the Gaussian and the evaluation of the error bounds are based on the Taylor series and Lagrange's evaluation of the remainder, which we state here without proof.

Theorem 6.1. [Taylor's Series] For any point x_* ∈ ℝ, let I ⊂ ℝ be an open set containing x_*. Let f : I → ℝ be a function which is n times differentiable on I.

Then for any x ∈ I, there is a θ with 0 < θ < 1 such that

f(x) = Σ_{k=0}^{n−1} (1/k!)(x − x_*)^k f^{(k)}(x_*) + (1/n!)(x − x_*)^n f^{(n)}(x_* + θ(x − x_*)),   (38)

where f^{(k)} is the k-th derivative of the function f.

Based on the above theorem we have the following corollary.

Corollary 6.2. Let B_{r_x}(x_*) be an open interval of radius r_x with center x_* ∈ ℝ, i.e., B_{r_x}(x_*) = {x : |x − x_*| < r_x}. Let h_1 ∈ ℝ⁺ be a positive constant and y be a fixed point such that |y − x_*| < r_y. For any x ∈ B_{r_x}(x_*) and any non-negative integer p, the function f(x) = e^{2(x−x_*)(y−x_*)/h_1²} can be written as

f(x) = Σ_{k=0}^{p−1} (2^k/k!) ((x−x_*)/h_1)^k ((y−x_*)/h_1)^k + R_p(x),   (39)

and the residual satisfies

|R_p(x)| ≤ (2^p/p!) (|x−x_*|/h_1)^p (|y−x_*|/h_1)^p e^{2|x−x_*||y−x_*|/h_1²} < (2^p/p!) (r_x r_y/h_1²)^p e^{2 r_x r_y/h_1²}.   (40)

Proof. Define a new function g(x) = e^{2x(y−x_*)/h_1²}. Using the result

g^{(k)}(x_*) = (2^k/h_1^k) e^{2x_*(y−x_*)/h_1²} ((y−x_*)/h_1)^k   (41)

and Theorem 6.1, we have, for any x ∈ B_{r_x}(x_*), a θ with 0 < θ < 1 such that

g(x) = e^{2x_*(y−x_*)/h_1²} { Σ_{k=0}^{p−1} (2^k/k!) ((x−x_*)/h_1)^k ((y−x_*)/h_1)^k + (2^p/p!) ((x−x_*)/h_1)^p ((y−x_*)/h_1)^p e^{2θ(x−x_*)(y−x_*)/h_1²} }.

Hence

f(x) = e^{2(x−x_*)(y−x_*)/h_1²} = Σ_{k=0}^{p−1} (2^k/k!) ((x−x_*)/h_1)^k ((y−x_*)/h_1)^k + R_p(x),

where

R_p(x) = (2^p/p!) ((x−x_*)/h_1)^p ((y−x_*)/h_1)^p e^{2θ(x−x_*)(y−x_*)/h_1²}.

The remainder is bounded as follows:

|R_p(x)| ≤ (2^p/p!) (|x−x_*|/h_1)^p (|y−x_*|/h_1)^p e^{2|x−x_*||y−x_*|/h_1²}  [since 0 < θ < 1]
< (2^p/p!) (r_x r_y/h_1²)^p e^{2 r_x r_y/h_1²}  [since |x−x_*| < r_x and |y−x_*| < r_y]. ∎
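A quick numerical check of Corollary 6.2 (all values below are illustrative, not from the paper): the truncated series should approach e^{2(x−x_*)(y−x_*)/h_1²}, and the observed error should stay under the stated bound.

    import numpy as np
    from math import factorial

    h1, x, y, c = 0.5, 0.3, 0.7, 0.25       # bandwidth, source, target, center
    a, b = x - c, y - c
    exact = np.exp(2.0 * a * b / h1**2)
    for p in (2, 4, 8):
        series = sum((2.0**k / factorial(k)) * (a / h1)**k * (b / h1)**k for k in range(p))
        bound = (2.0**p / factorial(p)) * (abs(a) * abs(b) / h1**2)**p \
                * np.exp(2.0 * abs(a) * abs(b) / h1**2)   # Eq. 40
        print(p, abs(exact - series), bound)   # error decays rapidly with p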

Using Corollary 6.2 the Gaussian can now be factorized as

e^{−(y_j−x_i)²/h_1²} = e^{−(x_i−x_*)²/h_1²} [ Σ_{k=0}^{p−1} (2^k/k!) ((x_i−x_*)/h_1)^k ((y_j−x_*)/h_1)^k ] e^{−(y_j−x_*)²/h_1²} + error_p,   (42)

where

|error_p| ≤ (2^p/p!) (|x_i−x_*|/h_1)^p (|y_j−x_*|/h_1)^p e^{−(|x_i−x_*| − |y_j−x_*|)²/h_1²}.   (43)

6.2 Factorization of the Hermite polynomial

The r-th Hermite polynomial can be written as [Wand and Jones 1995]

H_r(x) = Σ_{l=0}^{⌊r/2⌋} a_l x^{r−2l},  where a_l = ((−1)^l r!)/(2^l l! (r−2l)!).

Hence

H_r((y_j − x_i)/h_2) = Σ_{l=0}^{⌊r/2⌋} a_l ((y_j−x_*)/h_2 − (x_i−x_*)/h_2)^{r−2l}.

Using the binomial theorem, (a + b)^n = Σ_{m=0}^n C(n,m) a^m b^{n−m}, the x_i and y_j can be separated as follows:

((y_j−x_*)/h_2 − (x_i−x_*)/h_2)^{r−2l} = Σ_{m=0}^{r−2l} C(r−2l, m) (−1)^m ((x_i−x_*)/h_2)^m ((y_j−x_*)/h_2)^{r−2l−m}.

Substituting in the previous equation we have

H_r((y_j − x_i)/h_2) = Σ_{l=0}^{⌊r/2⌋} Σ_{m=0}^{r−2l} a_{lm} ((x_i−x_*)/h_2)^m ((y_j−x_*)/h_2)^{r−2l−m},   (44)

where

a_{lm} = ((−1)^{l+m} r!)/(2^l l! m! (r−2l−m)!).   (45)
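The coefficients of Eq. 45 can be verified numerically; the sketch below (with illustrative values, names ours) checks that the double sum of Eq. 44 reproduces H_r((y_j − x_i)/h_2):

    import numpy as np
    from math import factorial
    from numpy.polynomial.hermite_e import hermeval

    def a_lm(r, l, m):
        # Eq. 45
        return (-1) ** (l + m) * factorial(r) / (
            2**l * factorial(l) * factorial(m) * factorial(r - 2*l - m))

    r, h2, xi, yj, c = 4, 0.5, 0.3, 0.9, 0.25   # illustrative values
    direct = hermeval((yj - xi) / h2, [0] * r + [1])
    expanded = sum(a_lm(r, l, m) * ((xi - c) / h2) ** m * ((yj - c) / h2) ** (r - 2*l - m)
                   for l in range(r // 2 + 1) for m in range(r - 2*l + 1))
    print(direct, expanded)   # the two values agree to rounding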

6.3 Regrouping of the terms

Using Eqs. 42 and 44, G_r(y_j), after ignoring the error terms, can be approximated as

Ĝ_r(y_j) = Σ_{k=0}^{p−1} Σ_{l=0}^{⌊r/2⌋} Σ_{m=0}^{r−2l} a_{lm} B_{km} e^{−(y_j−x_*)²/h_1²} ((y_j−x_*)/h_1)^k ((y_j−x_*)/h_2)^{r−2l−m},

where

B_{km} = (2^k/k!) Σ_{i=1}^N q_i e^{−(x_i−x_*)²/h_1²} ((x_i−x_*)/h_1)^k ((x_i−x_*)/h_2)^m.

The coefficients B_{km} can be evaluated separately in O(prN). Evaluation of Ĝ_r(y_j) at M points is O(pr²M). Hence the computational complexity has been reduced from the quadratic O(rNM) to the linear O(prN + pr²M).

6.4 Space subdivision

Thus far we have used the Taylor series expansion about a single point x_*. However, if we use the same x_* for all the points we would typically require a very high truncation number p, since the Taylor series gives a good approximation only in a small open interval around x_*. We therefore uniformly subdivide the space into K intervals of length 2r_x. The N source points are assigned to K clusters S_n, n = 1, ..., K, with c_n the center of each cluster. The aggregated coefficients are computed for each cluster, and the total contribution from all the clusters is summed up:

Ĝ_r(y_j) = Σ_{n=1}^K Σ_{k=0}^{p−1} Σ_{l=0}^{⌊r/2⌋} Σ_{m=0}^{r−2l} a_{lm} B^n_{km} e^{−(y_j−c_n)²/h_1²} ((y_j−c_n)/h_1)^k ((y_j−c_n)/h_2)^{r−2l−m},   (46)

where

B^n_{km} = (2^k/k!) Σ_{x_i∈S_n} q_i e^{−(x_i−c_n)²/h_1²} ((x_i−c_n)/h_1)^k ((x_i−c_n)/h_2)^m.   (47)

6.5 Decay of the Gaussian

Since the Gaussian decays very rapidly, a further speedup is achieved if we ignore all the sources belonging to a cluster whenever the cluster is farther than a certain distance from the target point, i.e., |y_j − c_n| > r_y. The cluster cutoff radius r_y depends on the desired error ε. Substituting h_1 = √2 h and h_2 = h, we have

Ĝ_r(y_j) = Σ_{|y_j−c_n|≤r_y} Σ_{k=0}^{p−1} Σ_{l=0}^{⌊r/2⌋} Σ_{m=0}^{r−2l} a_{lm} B^n_{km} e^{−(y_j−c_n)²/2h²} ((y_j−c_n)/h)^{k+r−2l−m},   (48)

where

B^n_{km} = (1/k!) Σ_{x_i∈S_n} q_i e^{−(x_i−c_n)²/2h²} ((x_i−c_n)/h)^{k+m}.   (49)
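To make §§6.3-6.5 concrete, here is a self-contained sketch of the r = 0 special case of Eqs. 48-49, i.e., a fast evaluation of the plain Gaussian sum Σ_i q_i e^{−(y_j−x_i)²/2h²} (all names ours; the full algorithm additionally carries the Hermite indices l and m, and chooses p by the rule of §6.7.2 rather than fixing it):

    import numpy as np
    from math import sqrt, log, factorial

    def fast_gauss_sum(x, q, y, h, p=12, eps=1e-6):
        rx = h / 2.0                                   # interval half-length (Sec. 6.7.2)
        ry = rx + 2.0 * h * sqrt(log(1.0 / eps))       # cutoff radius, Eq. 60 with r = 0
        lo = x.min()
        idx = np.floor((x - lo) / (2.0 * rx)).astype(int)   # assign sources to intervals
        K = idx.max() + 1
        centers = lo + (np.arange(K) + 0.5) * 2.0 * rx
        B = np.zeros((K, p))                           # cluster coefficients, Eq. 49, m = 0
        for n in range(K):
            t = (x[idx == n] - centers[n]) / h
            w = q[idx == n] * np.exp(-0.5 * t**2)
            for k in range(p):
                B[n, k] = np.sum(w * t**k) / factorial(k)
        G = np.zeros(len(y))
        for j, yj in enumerate(y):                     # sum only over nearby clusters (Sec. 6.5)
            for n in np.nonzero(np.abs(yj - centers) <= ry)[0]:
                u = (yj - centers[n]) / h
                G[j] += np.exp(-0.5 * u**2) * np.polynomial.polynomial.polyval(u, B[n])
        return G

For moderate sizes the result can be checked directly against the O(NM) double sum.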

6.6 Computational and space complexity

Computing the coefficients B^n_{km} for all the clusters is O(prN). Evaluation of Ĝ_r(y_j) at M points is O(npr²M), where n is the maximum number of neighbor clusters which influence a target y_j. Hence the total computational complexity is O(prN + npr²M). Assuming N = M, the total computational complexity is O(cN), where the constant c = pr + npr² depends on the desired error, the bandwidth, and r. For each cluster we need to store all the pr coefficients, so the storage needed is O(prK + N + M).

6.7 Error bounds and choosing the parameters

Given any ε > 0, we want to choose the following parameters: K (the number of intervals), r_y (the cutoff radius for each cluster), and p (the truncation number), such that for any target point y_j

|Ĝ_r(y_j) − G_r(y_j)| ≤ Q ε,   (50)

where Q = Σ_{i=1}^N |q_i|. Let us define Δ_ij to be the pointwise error in Ĝ_r(y_j) contributed by the i-th source x_i. We require that

|Ĝ_r(y_j) − G_r(y_j)| = |Σ_{i=1}^N Δ_ij| ≤ Σ_{i=1}^N |Δ_ij| ≤ Q ε.   (51)

One way to achieve this is to require |Δ_ij| ≤ |q_i| ε for each i = 1, ..., N. We choose this strategy because it helps us get tighter bounds. Let c_n be the center of the cluster to which x_i belongs. There are two different ways in which a source can contribute to the error. The first is due to ignoring the cluster S_n when it is outside a given radius r_y from the target point y_j. In this case,

Δ_ij = q_i H_r((y_j − x_i)/h) e^{−(y_j−x_i)²/2h²}.   (52)

For all clusters which are within a distance r_y from the target point, the error is due to the truncation of the Taylor series after order p. From Eq. 43, and using the fact that h_1 = √2 h and h_2 = h, we have

|Δ_ij| ≤ (|q_i|/p!) |H_r((y_j − x_i)/h)| (|x_i−c_n|/h)^p (|y_j−c_n|/h)^p e^{−(|x_i−c_n| − |y_j−c_n|)²/2h²}.   (53)

6.7.1 Choosing the cutoff radius. From Eq. 52 we require

|H_r((y_j − x_i)/h)| e^{−(y_j−x_i)²/2h²} ≤ ε.   (54)

[Fig. 1. The error at y_j due to a source x_i, i.e., Δ_ij (Eq. 62), as a function of |y_j − c_n|, for different values of p, with h = 0.1, r = 4, q_i = 1, and a fixed |x_i − c_n|. The error increases as a function of |y_j − c_n|, reaches a maximum, and then starts decreasing; the maximum is marked with a *.]

We use the following inequality to bound the Hermite polynomial [Baxter and Roussos 2002]:

|H_r((y_j − x_i)/h)| ≤ √(r!) e^{(y_j−x_i)²/4h²}.   (55)

Substituting this bound in Eq. 54, we need

e^{−(y_j−x_i)²/4h²} ≤ ε/√(r!),   (56)

which implies that |y_j − x_i| > 2h √(ln(√(r!)/ε)). Using the reverse triangle inequality, |a − b| ≥ ||a| − |b||, and the facts that |y_j − c_n| > r_y and |x_i − c_n| ≤ r_x, we have

|y_j − x_i| = |(y_j − c_n) − (x_i − c_n)| ≥ ||y_j − c_n| − |x_i − c_n|| > r_y − r_x.   (57)

So in order that the error due to ignoring the faraway clusters is less than |q_i| ε, we have to choose r_y such that

r_y − r_x > 2h √(ln(√(r!)/ε)).   (58)

If we choose r_y > r_x, then

r_y > r_x + 2h √(ln(√(r!)/ε)).   (59)

Let R be the maximum distance between any source and target point. Then we choose the cutoff radius as

r_y = r_x + min( R, 2h √(ln(√(r!)/ε)) ).   (60)
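Eq. 60 translates directly into code (a one-function sketch; names ours):

    from math import sqrt, log, factorial

    def cutoff_radius(h, r, eps, rx, R):
        # Eq. 60: r_y = r_x + min(R, 2h sqrt(ln(sqrt(r!)/eps)))
        return rx + min(R, 2.0 * h * sqrt(log(sqrt(factorial(r)) / eps)))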

6.7.2 Choosing the truncation number. For all sources for which |y_j − c_n| ≤ r_y we have

|Δ_ij| ≤ (|q_i|/p!) |H_r((y_j − x_i)/h)| (|x_i−c_n|/h)^p (|y_j−c_n|/h)^p e^{−(|x_i−c_n| − |y_j−c_n|)²/2h²}.   (61)

Using the bound on the Hermite polynomial (Eq. 55), this can be written as

|Δ_ij| ≤ (|q_i| √(r!)/p!) (|x_i−c_n|/h)^p (|y_j−c_n|/h)^p e^{−(|x_i−c_n| − |y_j−c_n|)²/4h²}.   (62)

For a given source x_i we have to choose p such that |Δ_ij| ≤ |q_i| ε. Δ_ij depends both on the distance between the source and the cluster center, |x_i − c_n|, and on the distance between the target and the cluster center, |y_j − c_n|. The speedup is achieved because at each cluster S_n we sum up the effect of all the sources; as a result we have no knowledge of |y_j − c_n|, so we have to bound the right-hand side of Eq. 62 in a way that is independent of |y_j − c_n|. Fig. 1 shows the error at y_j due to a source x_i, i.e., Δ_ij (Eq. 62), as a function of |y_j − c_n| for different values of p, with h = 0.1 and r = 4. The error increases as a function of |y_j − c_n|, reaches a maximum, and then starts decreasing. The maximum is attained at (obtained by taking the first derivative of the right-hand side of Eq. 62 and setting it to zero)

|y_j − c_n|_* = ( |x_i − c_n| + √(|x_i − c_n|² + 8ph²) ) / 2.   (63)

Hence we choose p such that

Δ_ij [ at |y_j − c_n| = |y_j − c_n|_* ] ≤ |q_i| ε.   (64)

In case |y_j − c_n|_* > r_y, we need to choose p based on r_y instead, since Δ_ij will be much lower there. Hence our strategy for choosing p is (we choose r_x = h/2)

Δ_ij [ at |y_j − c_n| = min(|y_j − c_n|_*, r_y) and |x_i − c_n| = h/2 ] ≤ |q_i| ε.   (65)
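A sketch of this strategy (names ours): increase p until the bound of Eq. 62, evaluated at the worst-case |y_j − c_n| of Eq. 63 (capped at r_y) with |x_i − c_n| = r_x = h/2, drops below ε.

    from math import sqrt, exp, factorial

    def choose_truncation_number(h, r, eps, ry, p_max=100):
        a = h / 2.0                                    # r_x = h/2
        for p in range(1, p_max + 1):
            b = min((a + sqrt(a * a + 8.0 * p * h * h)) / 2.0, ry)   # Eq. 63, capped at r_y
            err = (sqrt(factorial(r)) / factorial(p)) * (a * b / h**2) ** p \
                  * exp(-((a - b) ** 2) / (4.0 * h**2))              # bound of Eq. 62
            if err <= eps:
                return p
        return p_max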

6.8 Numerical experiments

In this section we present some numerical studies of the speedup and error as a function of the number of data points, the bandwidth, the order r, and the desired error ε. The algorithms were programmed in C++ and run on a 1.6 GHz Pentium M processor with 512 MB of RAM.

Figure 2 shows the running time and the maximum absolute error relative to Q for both the direct and the fast methods as a function of N = M. The bandwidth was h = 0.1 and the order of the derivative was r = 4. The source and target points were uniformly distributed in the unit interval. We see that the running time of the fast method grows linearly as the number of sources and targets increases, while that of the direct evaluation grows quadratically. We also observe that the actual error is well below the desired error, thus validating our bound; however, the bound is not very tight.

[Fig. 2. (a) The running time in seconds and (b) the maximum absolute error relative to Q for the direct and the fast methods as a function of N. N = M source and target points were uniformly distributed in the unit interval. For large N the timing results for the direct evaluation were obtained by evaluating the result at a reduced number of points and then extrapolating. (h = 0.1, r = 4, and ε = 10^−6.)]

Figure 3 shows the tradeoff between precision and speedup: an increase in speedup is obtained at the cost of reduced accuracy. Figure 4 shows the results as a function of the bandwidth; better speedup is obtained at larger bandwidths. Figure 5 shows the results for different orders of the density derivatives.

7. SPEEDUP ACHIEVED FOR BANDWIDTH ESTIMATION

The solve-the-equation plug-in method of [Jones et al. 1996] was implemented in MATLAB, with the core computational task of computing the density derivative written in C++.

7.1 Synthetic data

We demonstrate the speedup achieved on the mixture of normal densities used by Marron and Wand [Marron and Wand 1992]. The family of normal mixture densities is extremely rich and, in fact, any density can be approximated arbitrarily well by a member of this family. Fig. 6 shows the fifteen densities which were used by the authors in [Marron and Wand 1992] as typical representatives of the densities likely to be encountered in real data situations. We sampled N = 50,000 points from each density. The AMISE-optimal bandwidth was estimated both using the direct method and using the proposed fast method. Table I shows the speedup achieved and the absolute relative error. Fig. 6 shows the actual density together with the estimated density using the optimal bandwidth estimated with the fast method.

7.2 Real data

We used the Adult database from the UCI machine learning repository [Newman et al. 1998]. The database, extracted from the census bureau database, contains 32,561 training instances with 14 attributes per instance. Of the 14 attributes, 6 are continuous and 8 nominal. Table II shows the speedup achieved and the absolute relative error for two of the continuous attributes.
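For a flavor of the synthetic-data experiment, the sketch below draws from a simple two-component normal mixture (the parameters are illustrative, not Marron and Wand's exact ones) and runs the solve-the-equation sketch given after §5.2:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 2000
    pick = rng.random(n) < 0.5
    x = np.where(pick, rng.normal(-1.0, 0.4, n), rng.normal(1.0, 0.4, n))
    h_opt = solve_the_equation_bandwidth(x)   # sketch defined after Sec. 5.2
    print("estimated AMISE-optimal bandwidth:", h_opt)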

[Fig. 3. (a) The speedup achieved and (b) the maximum absolute error relative to Q for the direct and the fast methods as a function of ε. N = M = 50,000 source and target points were uniformly distributed in the unit interval. (h = 0.1 and r = 4.)]

[Fig. 4. (a) The running time in seconds and (b) the maximum absolute error relative to Q for the direct and the fast methods as a function of the bandwidth h. N = M = 50,000 source and target points were uniformly distributed in the unit interval. (ε = 10^−6 and r = 4.)]

8. PROJECTION PURSUIT

Projection pursuit (PP) is an exploratory technique for visualizing and analyzing large multivariate data sets [Friedman and Tukey 1974; Huber 1985; Jones and Sibson 1987]. The idea of projection pursuit is to search for projections from high- to low-dimensional space that are most interesting. These projections can then be used for other nonparametric fitting and other data-analytic purposes. Conventional dimension reduction techniques, like principal component analysis, look for a projection that maximizes the variance; the idea of PP is to look for projections that maximize other measures of interestingness, like non-normality or entropy.

[Fig. 5. (a) The running time in seconds and (b) the maximum absolute error relative to Q for the direct and the fast methods as a function of the order r. N = M = 50,000 source and target points were uniformly distributed in the unit interval. (ε = 10^−6 and h = 0.1.)]

[Table I. The bandwidth estimated using the solve-the-equation plug-in method for the fifteen normal mixture densities of Marron and Wand. h_direct and h_fast are the bandwidths estimated using the direct and the fast methods respectively; the columns show h_direct, h_fast, the running times T_direct and T_fast in seconds, the speedup, and the absolute relative error, defined as |h_direct − h_fast|/h_direct. N points were sampled from the corresponding densities; for the fast method we used ε = 10^−3.]

The PP algorithm for finding the most interesting one-dimensional subspace is as follows.

(1) Given N data points in a d-dimensional space (centered and scaled), {x_i ∈ ℝ^d}_{i=1}^N, project each data point onto the direction vector a ∈ ℝ^d, i.e., z_i = a^T x_i.

(2) Compute the univariate nonparametric kernel density estimate p̂ of the projected points z_i.

(3) Compute the projection index I(a) based on the density estimate.

(4) Locally optimize over the choice of a to get the most interesting projection of the data.

(5) Repeat from a new initial projection to get a different view.

[Fig. 6. The fifteen normal mixture densities of Marron and Wand: (a) Gaussian, (b) skewed unimodal, (c) strongly skewed, (d) kurtotic unimodal, (e) outlier, (f) bimodal, (g) separated bimodal, (h) skewed bimodal, (i) trimodal, (j) claw, (k) double claw, (l) asymmetric claw, (m) asymmetric double claw, (n) smooth comb, (o) discrete comb. The solid line corresponds to the actual density, while the dotted line is the estimated density using the optimal bandwidth estimated with the fast method.]

[Table II. Optimal bandwidth estimation for five continuous attributes (including Age and fnlwgt) of the Adult database from the UCI machine learning repository. The database contains 32,561 training instances. The bandwidth was estimated using the solve-the-equation plug-in method; the columns show h_direct, h_fast, the running times T_direct and T_fast in seconds, the speedup, and the absolute relative error |h_direct − h_fast|/h_direct. For the fast method we used ε = 10^−3.]

The projection index is designed to reveal specific structure in the data, like clusters, outliers, or smooth manifolds. Some of the commonly used projection indices are the Friedman-Tukey index [Friedman and Tukey 1974], the entropy index [Jones and Sibson 1987], and the moment index. The entropy index, based on Rényi's order-1 entropy, is given by

I(a) = −∫ p(z) log p(z) dz.   (66)

The density of zero mean and unit variance which uniquely minimizes this is the standard normal density; thus the projection index finds the direction which is most non-normal. In practice we need to use an estimate p̂ of the true density p, for example the kernel density estimate using the Gaussian kernel. Thus we have an estimate of the entropy index as follows:

Î(a) = −∫ log p̂(z) p(z) dz = −E[log p̂(z)] ≈ −(1/N) Σ_{i=1}^N log p̂(z_i) = −(1/N) Σ_{i=1}^N log p̂(a^T x_i).   (67)

The entropy index Î(a) has to be optimized over the d-dimensional vector a, subject to the constraint ‖a‖ = 1. The optimization procedure requires the gradient of the objective function; for the index defined above the gradient can be written as

(d/da)[Î(a)] = −(1/N) Σ_{i=1}^N (p̂′(a^T x_i)/p̂(a^T x_i)) x_i.   (68)

For PP, the computational burden is greatly reduced if we use the proposed fast method. The burden is reduced in the following three instances (a sketch of the index computation follows the list):

(1) computation of the kernel density estimate;

(2) estimation of the optimal bandwidth;

(3) computation of the first derivative of the kernel density estimate, which is required in the optimization procedure.
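A sketch of Eqs. 67-68 with a direct O(N²) Gaussian KDE (names ours; in the paper both the KDE and the derivative sum would instead be evaluated with the fast method of §6):

    import numpy as np

    def entropy_index_and_gradient(X, a, h):
        z = X @ a                                  # step (1): project onto direction a
        u = (z[:, None] - z[None, :]) / h
        g = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
        n = len(z)
        p = g.sum(axis=1) / (n * h)                # KDE of the projected points, Eq. 4
        dp = -(u * g).sum(axis=1) / (n * h**2)     # its first derivative, Eq. 14 with r = 1
        index = -np.mean(np.log(p))                # Eq. 67
        grad = -(dp / p) @ X / n                   # Eq. 68
        return index, grad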

[Fig. 7. The estimated density using the optimal bandwidth estimated with the fast method, for two of the continuous attributes (Age and fnlwgt) in the Adult database from the UCI machine learning repository.]

[Fig. 8. (a) The original image. (b) The centered and scaled RGB space; each pixel in the image is a point in the RGB space. (c) KDE of the projection of the pixels on the most interesting direction found by projection pursuit. (d) The assignment of the pixels to the three modes in the KDE.]

Fig. 8 shows an example of the PP algorithm on an image. Fig. 8(a) shows the original image of a hand with a ring against a background. Perceptually the image has three distinct regions: the hand, the ring, and the background. Each pixel is represented as a point in a three-dimensional RGB space. Fig. 8(b) shows the presence of three clusters in the RGB space. We ran the PP algorithm on this space. Fig. 8(c) shows the KDE of the points projected on the most interesting direction; this direction is clearly able to distinguish the three clusters. Fig. 8(d) shows the segmentation where each pixel is assigned to the mode nearest to it.

9. CONCLUSIONS

We proposed a fast ε-exact algorithm for kernel density derivative estimation which reduces the computational complexity from O(N²) to O(N).

We demonstrated the speedup achieved for optimal bandwidth estimation both on simulated and real data. As an example, we demonstrated how to potentially speed up the projection pursuit algorithm. We focused on the univariate case in the current paper, since the bandwidth selection procedures for the univariate case are quite mature. Bandwidth selection for the multivariate case is a field of very active research [Wand and Jones 1994]; our future work includes the relatively straightforward but more involved extension of the current procedure to handle higher dimensions. As pointed out earlier, many applications other than bandwidth estimation require derivative estimates, and we hope that our fast computation scheme will benefit all the related applications. The C++ code is available for academic use by contacting the first author.

10. APPENDIX 1: MISE FOR KERNEL DENSITY ESTIMATORS

First note that MISE = IMSE:

MISE(p̂, p) = E[ ∫ [p̂(x) − p(x)]² dx ] = ∫ E[{p̂(x) − p(x)}²] dx = IMSE(p̂, p).   (69)

The mean squared error (MSE) can be decomposed into the variance and the squared bias of the estimator:

MSE(p̂, p, x) = E[{p̂(x) − p(x)}²] = Var[p̂(x)] + (E[p̂(x)] − p(x))².   (70)

The kernel density estimate p̂(x) is given by

p̂(x) = (1/Nh) Σ_{i=1}^N K((x − x_i)/h) = (1/N) Σ_{i=1}^N K_h(x − x_i),

where K_h(x) = (1/h)K(x/h).

10.1 Bias

The mean of the estimator can be written as

E[p̂(x)] = (1/N) Σ_{i=1}^N E[K_h(x − x_i)] = E[K_h(x − X)] = ∫ K_h(x − y) p(y) dy.   (71)

Using the convolution operator ∗ we have

E[p̂(x)] − p(x) = (K_h ∗ p)(x) − p(x).   (72)

The bias is the difference between the smoothed version (using the kernel) of the density and the actual density.

10.2 Variance

The variance of the estimator can be written as

Var[p̂(x)] = (1/N) Var[K_h(x − X)] = (1/N) ( E[K_h(x − X)²] − E[K_h(x − X)]² ).   (73)

Using Eq. 71 we have the following expression for the variance:

Var[p̂(x)] = (1/N) [ (K_h² ∗ p)(x) − (K_h ∗ p)²(x) ].   (74)

10.3 MSE

Using Eq. 72 and Eq. 74, the MSE at a point x can be written as

MSE(p̂, p, x) = (1/N) [ (K_h² ∗ p)(x) − (K_h ∗ p)²(x) ] + [ (K_h ∗ p)(x) − p(x) ]².   (75)

10.4 MISE

Since MISE = IMSE, we have

MISE(p̂, p) = (1/N) ∫ [ (K_h² ∗ p)(x) − (K_h ∗ p)²(x) ] dx + ∫ [ (K_h ∗ p)(x) − p(x) ]² dx.   (76)

The dependence of the MISE on the bandwidth h is not very explicit in this expression, which makes it difficult to interpret the influence of the bandwidth on the performance of the estimator. An asymptotic approximation to this expression, called the AMISE, is usually derived.

11. APPENDIX 2: ASYMPTOTIC MISE FOR KERNEL DENSITY ESTIMATORS

In order to derive a large-sample approximation to the MISE, we make the following assumptions on the density p, the bandwidth h, and the kernel K.

(1) The second derivative p″(x) is continuous, square integrable and ultimately monotone⁸.

(2) lim_{N→∞} h = 0 and lim_{N→∞} Nh = ∞, i.e., as the number of samples N is increased, h approaches zero at a rate slower than 1/N.

(3) In order that p̂(x) be a valid density we assume K(z) ≥ 0 and ∫ K(z)dz = 1. The kernel function is assumed to be symmetric about the origin (∫ zK(z)dz = 0) and to have finite second moment (∫ z²K(z)dz < ∞).

11.1 Bias

From Eq. 72 and a change of variables we have

E[p̂(x)] = (K_h ∗ p)(x) = ∫ K_h(x − y) p(y) dy = ∫ K(z) p(x − hz) dz.   (77)

Using Taylor series, p(x − hz) can be expanded as

p(x − hz) = p(x) − hz p′(x) + (h²z²/2) p″(x) + o(h²).   (78)

Hence

E[p̂(x)] = p(x) ∫ K(z)dz − h p′(x) ∫ zK(z)dz + (h²/2) p″(x) ∫ z²K(z)dz + o(h²).   (79)

⁸An ultimately monotone function is one that is monotone over both (−∞, −M) and (M, ∞) for some M > 0.

From Assumption 3 we have

∫ K(z)dz = 1,  ∫ zK(z)dz = 0,  μ₂(K) = ∫ z²K(z)dz < ∞.   (80)

Hence

E[p̂(x)] − p(x) = (h²/2) μ₂(K) p″(x) + o(h²).   (81)

The KDE is asymptotically unbiased. The bias is directly proportional to the value of the second derivative of the density function, i.e., the curvature of the density function.

11.2 Variance

From Eq. 74 and a change of variables we have

Var[p̂(x)] = (1/N) [ (K_h² ∗ p)(x) − (K_h ∗ p)²(x) ]
= (1/N) ∫ K_h(x − y)² p(y) dy − (1/N) [ ∫ K_h(x − y) p(y) dy ]²
= (1/Nh) ∫ K(z)² p(x − hz) dz − (1/N) [ ∫ K(z) p(x − hz) dz ]².   (82)

Using Taylor series, p(x − hz) can be expanded as

p(x − hz) = p(x) + o(1).   (83)

We need only the first term because of the factor 1/N. Hence

Var[p̂(x)] = (1/Nh) [p(x) + o(1)] ∫ K(z)² dz − (1/N) [p(x) + o(1)]²
= (1/Nh) p(x) R(K) + o(1/Nh).   (84)

Based on Assumption 2, lim_{N→∞} Nh = ∞, so the variance asymptotically converges to zero.

11.3 MSE

The MSE at a point x can be written as (using Eqs. 81 and 84)

MSE(p̂, p, x) = (1/Nh) p(x) R(K) + (h⁴/4) μ₂(K)² p″(x)² + o(h⁴ + 1/Nh),   (85)

where R(K) = ∫ K(z)² dz.

11.4 MISE

Since MISE = IMSE, we have

MISE(p̂, p) = (1/Nh) R(K) ∫ p(x)dx + (h⁴/4) μ₂(K)² ∫ p″(x)² dx + o(h⁴ + 1/Nh)
= AMISE(p̂, p) + o(h⁴ + 1/Nh),   (86)

where

AMISE(p̂, p) = (1/Nh) R(K) + (h⁴/4) μ₂(K)² R(p″).   (87)

12. APPENDIX 3: AMISE FOR KERNEL DENSITY DERIVATIVE ESTIMATORS

First note that MISE = IMSE:

MISE(p̂^{(r)}, p^{(r)}) = E[ ∫ [p̂^{(r)}(x) − p^{(r)}(x)]² dx ] = ∫ E[{p̂^{(r)}(x) − p^{(r)}(x)}²] dx = IMSE(p̂^{(r)}, p^{(r)}).   (88)

The mean squared error (MSE) can be decomposed into the variance and the squared bias of the estimator:

MSE(p̂^{(r)}, p^{(r)}, x) = E[{p̂^{(r)}(x) − p^{(r)}(x)}²] = Var[p̂^{(r)}(x)] + (E[p̂^{(r)}(x)] − p^{(r)}(x))².   (89)

A simple estimator for the density derivative can be obtained by taking the derivative of the kernel density estimate p̂(x) [Bhattacharya 1967; Schuster 1969]. If the kernel K is differentiable r times, then the r-th density derivative estimate p̂^{(r)}(x) can be written as

p̂^{(r)}(x) = (1/(N h^{r+1})) Σ_{i=1}^N K^{(r)}((x − x_i)/h) = (1/N) Σ_{i=1}^N K^{(r)}_h(x − x_i),   (90)

where K^{(r)} is the r-th derivative of the kernel K and K^{(r)}_h(x) = (1/h^{r+1}) K^{(r)}(x/h).

In order to derive a large-sample approximation to the MISE, we make the following assumptions on the density p, the bandwidth h, and the kernel K.

(1) The (r+2)-th derivative p^{(r+2)}(x) is continuous, square integrable and ultimately monotone⁹.

(2) lim_{N→∞} h = 0 and lim_{N→∞} N h^{2r+1} = ∞, i.e., as the number of samples N is increased, h approaches zero at a rate slower than 1/N^{1/(2r+1)}.

⁹An ultimately monotone function is one that is monotone over both (−∞, −M) and (M, ∞) for some M > 0.

(3) In order that p̂(x) be a valid density we assume K(z) ≥ 0 and ∫ K(z)dz = 1. The kernel function is assumed to be symmetric about the origin (∫ zK(z)dz = 0) and to have finite second moment (∫ z²K(z)dz < ∞).

12.1 Bias

The mean of the estimator can be written as

E[p̂^{(r)}(x)] = (1/N) Σ_{i=1}^N E[K^{(r)}_h(x − x_i)] = E[K^{(r)}_h(x − X)] = ∫ K^{(r)}_h(x − y) p(y) dy.   (91)

Using the convolution operator ∗ we have

E[p̂^{(r)}(x)] = (K^{(r)}_h ∗ p)(x) = (K_h ∗ p^{(r)})(x),   (92)

where we have used the relation K^{(r)} ∗ p = K ∗ p^{(r)}. We now derive a large-sample approximation to the mean. Using a change of variables, the mean can be written as

E[p̂^{(r)}(x)] = (K_h ∗ p^{(r)})(x) = ∫ K_h(x − y) p^{(r)}(y) dy = ∫ K(z) p^{(r)}(x − hz) dz.   (93)

Using Taylor series, p^{(r)}(x − hz) can be expanded as

p^{(r)}(x − hz) = p^{(r)}(x) − hz p^{(r+1)}(x) + (h²z²/2) p^{(r+2)}(x) + o(h²).   (94)

Hence

E[p̂^{(r)}(x)] = p^{(r)}(x) ∫ K(z)dz − h p^{(r+1)}(x) ∫ zK(z)dz + (h²/2) p^{(r+2)}(x) ∫ z²K(z)dz + o(h²).   (95)

From Assumption 3 we have

∫ K(z)dz = 1,  ∫ zK(z)dz = 0,  μ₂(K) = ∫ z²K(z)dz < ∞.   (96)

Hence the bias can be written as

E[p̂^{(r)}(x)] − p^{(r)}(x) = (h²/2) μ₂(K) p^{(r+2)}(x) + o(h²).   (97)

The estimate is asymptotically unbiased. The bias in estimating the r-th derivative is directly proportional to the value of the (r+2)-th derivative of the density function.


More information

Exercises for numerical differentiation. Øyvind Ryan

Exercises for numerical differentiation. Øyvind Ryan Exercises for numerical differentiation Øyvind Ryan February 25, 2013 1. Mark eac of te following statements as true or false. a. Wen we use te approximation f (a) (f (a +) f (a))/ on a computer, we can

More information

232 Calculus and Structures

232 Calculus and Structures 3 Calculus and Structures CHAPTER 17 JUSTIFICATION OF THE AREA AND SLOPE METHODS FOR EVALUATING BEAMS Calculus and Structures 33 Copyrigt Capter 17 JUSTIFICATION OF THE AREA AND SLOPE METHODS 17.1 THE

More information

New Streamfunction Approach for Magnetohydrodynamics

New Streamfunction Approach for Magnetohydrodynamics New Streamfunction Approac for Magnetoydrodynamics Kab Seo Kang Brooaven National Laboratory, Computational Science Center, Building 63, Room, Upton NY 973, USA. sang@bnl.gov Summary. We apply te finite

More information

Scalable machine learning for massive datasets: Fast summation algorithms

Scalable machine learning for massive datasets: Fast summation algorithms Scalable machine learning for massive datasets: Fast summation algorithms Getting good enough solutions as fast as possible Vikas Chandrakant Raykar vikas@cs.umd.edu University of Maryland, CollegePark

More information

Kernel Smoothing and Tolerance Intervals for Hierarchical Data

Kernel Smoothing and Tolerance Intervals for Hierarchical Data Clemson University TigerPrints All Dissertations Dissertations 12-2016 Kernel Smooting and Tolerance Intervals for Hierarcical Data Cristoper Wilson Clemson University, cwilso6@clemson.edu Follow tis and

More information

1 Calculus. 1.1 Gradients and the Derivative. Q f(x+h) f(x)

1 Calculus. 1.1 Gradients and the Derivative. Q f(x+h) f(x) Calculus. Gradients and te Derivative Q f(x+) δy P T δx R f(x) 0 x x+ Let P (x, f(x)) and Q(x+, f(x+)) denote two points on te curve of te function y = f(x) and let R denote te point of intersection of

More information

2.8 The Derivative as a Function

2.8 The Derivative as a Function .8 Te Derivative as a Function Typically, we can find te derivative of a function f at many points of its domain: Definition. Suppose tat f is a function wic is differentiable at every point of an open

More information

Artificial Neural Network Model Based Estimation of Finite Population Total

Artificial Neural Network Model Based Estimation of Finite Population Total International Journal of Science and Researc (IJSR), India Online ISSN: 2319-7064 Artificial Neural Network Model Based Estimation of Finite Population Total Robert Kasisi 1, Romanus O. Odiambo 2, Antony

More information

AMS 147 Computational Methods and Applications Lecture 09 Copyright by Hongyun Wang, UCSC. Exact value. Effect of round-off error.

AMS 147 Computational Methods and Applications Lecture 09 Copyright by Hongyun Wang, UCSC. Exact value. Effect of round-off error. Lecture 09 Copyrigt by Hongyun Wang, UCSC Recap: Te total error in numerical differentiation fl( f ( x + fl( f ( x E T ( = f ( x Numerical result from a computer Exact value = e + f x+ Discretization error

More information

Gradient Descent etc.

Gradient Descent etc. 1 Gradient Descent etc EE 13: Networked estimation and control Prof Kan) I DERIVATIVE Consider f : R R x fx) Te derivative is defined as d fx) = lim dx fx + ) fx) Te cain rule states tat if d d f gx) )

More information

An Empirical Bayesian interpretation and generalization of NL-means

An Empirical Bayesian interpretation and generalization of NL-means Computer Science Tecnical Report TR2010-934, October 2010 Courant Institute of Matematical Sciences, New York University ttp://cs.nyu.edu/web/researc/tecreports/reports.tml An Empirical Bayesian interpretation

More information

A = h w (1) Error Analysis Physics 141

A = h w (1) Error Analysis Physics 141 Introduction In all brances of pysical science and engineering one deals constantly wit numbers wic results more or less directly from experimental observations. Experimental observations always ave inaccuracies.

More information

5 Ordinary Differential Equations: Finite Difference Methods for Boundary Problems

5 Ordinary Differential Equations: Finite Difference Methods for Boundary Problems 5 Ordinary Differential Equations: Finite Difference Metods for Boundary Problems Read sections 10.1, 10.2, 10.4 Review questions 10.1 10.4, 10.8 10.9, 10.13 5.1 Introduction In te previous capters we

More information

Introduction to Derivatives

Introduction to Derivatives Introduction to Derivatives 5-Minute Review: Instantaneous Rates and Tangent Slope Recall te analogy tat we developed earlier First we saw tat te secant slope of te line troug te two points (a, f (a))

More information

Solving Continuous Linear Least-Squares Problems by Iterated Projection

Solving Continuous Linear Least-Squares Problems by Iterated Projection Solving Continuous Linear Least-Squares Problems by Iterated Projection by Ral Juengling Department o Computer Science, Portland State University PO Box 75 Portland, OR 977 USA Email: juenglin@cs.pdx.edu

More information

2.3 Algebraic approach to limits

2.3 Algebraic approach to limits CHAPTER 2. LIMITS 32 2.3 Algebraic approac to its Now we start to learn ow to find its algebraically. Tis starts wit te simplest possible its, and ten builds tese up to more complicated examples. Fact.

More information

Optimal parameters for a hierarchical grid data structure for contact detection in arbitrarily polydisperse particle systems

Optimal parameters for a hierarchical grid data structure for contact detection in arbitrarily polydisperse particle systems Comp. Part. Mec. 04) :357 37 DOI 0.007/s4057-04-000-9 Optimal parameters for a ierarcical grid data structure for contact detection in arbitrarily polydisperse particle systems Dinant Krijgsman Vitaliy

More information

Poisson Equation in Sobolev Spaces

Poisson Equation in Sobolev Spaces Poisson Equation in Sobolev Spaces OcMountain Dayligt Time. 6, 011 Today we discuss te Poisson equation in Sobolev spaces. It s existence, uniqueness, and regularity. Weak Solution. u = f in, u = g on

More information

Differentiation in higher dimensions

Differentiation in higher dimensions Capter 2 Differentiation in iger dimensions 2.1 Te Total Derivative Recall tat if f : R R is a 1-variable function, and a R, we say tat f is differentiable at x = a if and only if te ratio f(a+) f(a) tends

More information

Deconvolution problems in density estimation

Deconvolution problems in density estimation Deconvolution problems in density estimation Dissertation zur Erlangung des Doktorgrades Dr. rer. nat. der Fakultät für Matematik und Wirtscaftswissenscaften der Universität Ulm vorgelegt von Cristian

More information

HOW TO DEAL WITH FFT SAMPLING INFLUENCES ON ADEV CALCULATIONS

HOW TO DEAL WITH FFT SAMPLING INFLUENCES ON ADEV CALCULATIONS HOW TO DEAL WITH FFT SAMPLING INFLUENCES ON ADEV CALCULATIONS Po-Ceng Cang National Standard Time & Frequency Lab., TL, Taiwan 1, Lane 551, Min-Tsu Road, Sec. 5, Yang-Mei, Taoyuan, Taiwan 36 Tel: 886 3

More information

1 Introduction to Optimization

1 Introduction to Optimization Unconstrained Convex Optimization 2 1 Introduction to Optimization Given a general optimization problem of te form min x f(x) (1.1) were f : R n R. Sometimes te problem as constraints (we are only interested

More information

(a) At what number x = a does f have a removable discontinuity? What value f(a) should be assigned to f at x = a in order to make f continuous at a?

(a) At what number x = a does f have a removable discontinuity? What value f(a) should be assigned to f at x = a in order to make f continuous at a? Solutions to Test 1 Fall 016 1pt 1. Te grap of a function f(x) is sown at rigt below. Part I. State te value of eac limit. If a limit is infinite, state weter it is or. If a limit does not exist (but is

More information

Math 102 TEST CHAPTERS 3 & 4 Solutions & Comments Fall 2006

Math 102 TEST CHAPTERS 3 & 4 Solutions & Comments Fall 2006 Mat 102 TEST CHAPTERS 3 & 4 Solutions & Comments Fall 2006 f(x+) f(x) 10 1. For f(x) = x 2 + 2x 5, find ))))))))) and simplify completely. NOTE: **f(x+) is NOT f(x)+! f(x+) f(x) (x+) 2 + 2(x+) 5 ( x 2

More information

A Novel Nonparametric Density Estimator

A Novel Nonparametric Density Estimator A Novel Nonparametric Density Estimator Z. I. Botev The University of Queensland Australia Abstract We present a novel nonparametric density estimator and a new data-driven bandwidth selection method with

More information

4. The slope of the line 2x 7y = 8 is (a) 2/7 (b) 7/2 (c) 2 (d) 2/7 (e) None of these.

4. The slope of the line 2x 7y = 8 is (a) 2/7 (b) 7/2 (c) 2 (d) 2/7 (e) None of these. Mat 11. Test Form N Fall 016 Name. Instructions. Te first eleven problems are wort points eac. Te last six problems are wort 5 points eac. For te last six problems, you must use relevant metods of algebra

More information

Differential equations. Differential equations

Differential equations. Differential equations Differential equations A differential equation (DE) describes ow a quantity canges (as a function of time, position, ) d - A ball dropped from a building: t gt () dt d S qx - Uniformly loaded beam: wx

More information

A h u h = f h. 4.1 The CoarseGrid SystemandtheResidual Equation

A h u h = f h. 4.1 The CoarseGrid SystemandtheResidual Equation Capter Grid Transfer Remark. Contents of tis capter. Consider a grid wit grid size and te corresponding linear system of equations A u = f. Te summary given in Section 3. leads to te idea tat tere migt

More information

ON RENYI S ENTROPY ESTIMATION WITH ONE-DIMENSIONAL GAUSSIAN KERNELS. Septimia Sarbu

ON RENYI S ENTROPY ESTIMATION WITH ONE-DIMENSIONAL GAUSSIAN KERNELS. Septimia Sarbu ON RENYI S ENTROPY ESTIMATION WITH ONE-DIMENSIONAL GAUSSIAN KERNELS Septimia Sarbu Department of Signal Processing Tampere University of Tecnology PO Bo 527 FI-330 Tampere, Finland septimia.sarbu@tut.fi

More information

Lecture 21. Numerical differentiation. f ( x+h) f ( x) h h

Lecture 21. Numerical differentiation. f ( x+h) f ( x) h h Lecture Numerical differentiation Introduction We can analytically calculate te derivative of any elementary function, so tere migt seem to be no motivation for calculating derivatives numerically. However

More information

CS522 - Partial Di erential Equations

CS522 - Partial Di erential Equations CS5 - Partial Di erential Equations Tibor Jánosi April 5, 5 Numerical Di erentiation In principle, di erentiation is a simple operation. Indeed, given a function speci ed as a closed-form formula, its

More information

Differential Calculus (The basics) Prepared by Mr. C. Hull

Differential Calculus (The basics) Prepared by Mr. C. Hull Differential Calculus Te basics) A : Limits In tis work on limits, we will deal only wit functions i.e. tose relationsips in wic an input variable ) defines a unique output variable y). Wen we work wit

More information

Sin, Cos and All That

Sin, Cos and All That Sin, Cos and All Tat James K. Peterson Department of Biological Sciences and Department of Matematical Sciences Clemson University Marc 9, 2017 Outline Sin, Cos and all tat! A New Power Rule Derivatives

More information

Te comparison of dierent models M i is based on teir relative probabilities, wic can be expressed, again using Bayes' teorem, in terms of prior probab

Te comparison of dierent models M i is based on teir relative probabilities, wic can be expressed, again using Bayes' teorem, in terms of prior probab To appear in: Advances in Neural Information Processing Systems 9, eds. M. C. Mozer, M. I. Jordan and T. Petsce. MIT Press, 997 Bayesian Model Comparison by Monte Carlo Caining David Barber D.Barber@aston.ac.uk

More information

New Distribution Theory for the Estimation of Structural Break Point in Mean

New Distribution Theory for the Estimation of Structural Break Point in Mean New Distribution Teory for te Estimation of Structural Break Point in Mean Liang Jiang Singapore Management University Xiaou Wang Te Cinese University of Hong Kong Jun Yu Singapore Management University

More information

Data-Based Optimal Bandwidth for Kernel Density Estimation of Statistical Samples

Data-Based Optimal Bandwidth for Kernel Density Estimation of Statistical Samples Commun. Teor. Pys. 70 (208) 728 734 Vol. 70 No. 6 December 208 Data-Based Optimal Bandwidt for Kernel Density Estimation of Statistical Samples Zen-Wei Li ( 李振伟 ) 2 and Ping He ( 何平 ) 3 Center for Teoretical

More information

Parameter Fitted Scheme for Singularly Perturbed Delay Differential Equations

Parameter Fitted Scheme for Singularly Perturbed Delay Differential Equations International Journal of Applied Science and Engineering 2013. 11, 4: 361-373 Parameter Fitted Sceme for Singularly Perturbed Delay Differential Equations Awoke Andargiea* and Y. N. Reddyb a b Department

More information

On Local Linear Regression Estimation of Finite Population Totals in Model Based Surveys

On Local Linear Regression Estimation of Finite Population Totals in Model Based Surveys American Journal of Teoretical and Applied Statistics 2018; 7(3): 92-101 ttp://www.sciencepublisinggroup.com/j/ajtas doi: 10.11648/j.ajtas.20180703.11 ISSN: 2326-8999 (Print); ISSN: 2326-9006 (Online)

More information

Digital Filter Structures

Digital Filter Structures Digital Filter Structures Te convolution sum description of an LTI discrete-time system can, in principle, be used to implement te system For an IIR finite-dimensional system tis approac is not practical

More information

POLYNOMIAL AND SPLINE ESTIMATORS OF THE DISTRIBUTION FUNCTION WITH PRESCRIBED ACCURACY

POLYNOMIAL AND SPLINE ESTIMATORS OF THE DISTRIBUTION FUNCTION WITH PRESCRIBED ACCURACY APPLICATIONES MATHEMATICAE 36, (29), pp. 2 Zbigniew Ciesielski (Sopot) Ryszard Zieliński (Warszawa) POLYNOMIAL AND SPLINE ESTIMATORS OF THE DISTRIBUTION FUNCTION WITH PRESCRIBED ACCURACY Abstract. Dvoretzky

More information

Finite Difference Method

Finite Difference Method Capter 8 Finite Difference Metod 81 2nd order linear pde in two variables General 2nd order linear pde in two variables is given in te following form: L[u] = Au xx +2Bu xy +Cu yy +Du x +Eu y +Fu = G According

More information

ERROR BOUNDS FOR THE METHODS OF GLIMM, GODUNOV AND LEVEQUE BRADLEY J. LUCIER*

ERROR BOUNDS FOR THE METHODS OF GLIMM, GODUNOV AND LEVEQUE BRADLEY J. LUCIER* EO BOUNDS FO THE METHODS OF GLIMM, GODUNOV AND LEVEQUE BADLEY J. LUCIE* Abstract. Te expected error in L ) attimet for Glimm s sceme wen applied to a scalar conservation law is bounded by + 2 ) ) /2 T

More information

NUMERICAL DIFFERENTIATION. James T. Smith San Francisco State University. In calculus classes, you compute derivatives algebraically: for example,

NUMERICAL DIFFERENTIATION. James T. Smith San Francisco State University. In calculus classes, you compute derivatives algebraically: for example, NUMERICAL DIFFERENTIATION James T Smit San Francisco State University In calculus classes, you compute derivatives algebraically: for example, f( x) = x + x f ( x) = x x Tis tecnique requires your knowing

More information

. If lim. x 2 x 1. f(x+h) f(x)

. If lim. x 2 x 1. f(x+h) f(x) Review of Differential Calculus Wen te value of one variable y is uniquely determined by te value of anoter variable x, ten te relationsip between x and y is described by a function f tat assigns a value

More information

Flavius Guiaş. X(t + h) = X(t) + F (X(s)) ds.

Flavius Guiaş. X(t + h) = X(t) + F (X(s)) ds. Numerical solvers for large systems of ordinary differential equations based on te stocastic direct simulation metod improved by te and Runge Kutta principles Flavius Guiaş Abstract We present a numerical

More information

Solution. Solution. f (x) = (cos x)2 cos(2x) 2 sin(2x) 2 cos x ( sin x) (cos x) 4. f (π/4) = ( 2/2) ( 2/2) ( 2/2) ( 2/2) 4.

Solution. Solution. f (x) = (cos x)2 cos(2x) 2 sin(2x) 2 cos x ( sin x) (cos x) 4. f (π/4) = ( 2/2) ( 2/2) ( 2/2) ( 2/2) 4. December 09, 20 Calculus PracticeTest s Name: (4 points) Find te absolute extrema of f(x) = x 3 0 on te interval [0, 4] Te derivative of f(x) is f (x) = 3x 2, wic is zero only at x = 0 Tus we only need

More information

1. Consider the trigonometric function f(t) whose graph is shown below. Write down a possible formula for f(t).

1. Consider the trigonometric function f(t) whose graph is shown below. Write down a possible formula for f(t). . Consider te trigonometric function f(t) wose grap is sown below. Write down a possible formula for f(t). Tis function appears to be an odd, periodic function tat as been sifted upwards, so we will use

More information

Function Composition and Chain Rules

Function Composition and Chain Rules Function Composition and s James K. Peterson Department of Biological Sciences and Department of Matematical Sciences Clemson University Marc 8, 2017 Outline 1 Function Composition and Continuity 2 Function

More information

Pre-Calculus Review Preemptive Strike

Pre-Calculus Review Preemptive Strike Pre-Calculus Review Preemptive Strike Attaced are some notes and one assignment wit tree parts. Tese are due on te day tat we start te pre-calculus review. I strongly suggest reading troug te notes torougly

More information

Computational tractability of machine learning algorithms for tall fat data

Computational tractability of machine learning algorithms for tall fat data Computational tractability of machine learning algorithms for tall fat data Getting good enough solutions as fast as possible Vikas Chandrakant Raykar vikas@cs.umd.edu University of Maryland, CollegePark

More information

arxiv: v1 [math.pr] 28 Dec 2018

arxiv: v1 [math.pr] 28 Dec 2018 Approximating Sepp s constants for te Slepian process Jack Noonan a, Anatoly Zigljavsky a, a Scool of Matematics, Cardiff University, Cardiff, CF4 4AG, UK arxiv:8.0v [mat.pr] 8 Dec 08 Abstract Slepian

More information

The Verlet Algorithm for Molecular Dynamics Simulations

The Verlet Algorithm for Molecular Dynamics Simulations Cemistry 380.37 Fall 2015 Dr. Jean M. Standard November 9, 2015 Te Verlet Algoritm for Molecular Dynamics Simulations Equations of motion For a many-body system consisting of N particles, Newton's classical

More information

MVT and Rolle s Theorem

MVT and Rolle s Theorem AP Calculus CHAPTER 4 WORKSHEET APPLICATIONS OF DIFFERENTIATION MVT and Rolle s Teorem Name Seat # Date UNLESS INDICATED, DO NOT USE YOUR CALCULATOR FOR ANY OF THESE QUESTIONS In problems 1 and, state

More information

Exam 1 Review Solutions

Exam 1 Review Solutions Exam Review Solutions Please also review te old quizzes, and be sure tat you understand te omework problems. General notes: () Always give an algebraic reason for your answer (graps are not sufficient),

More information

Precalculus Test 2 Practice Questions Page 1. Note: You can expect other types of questions on the test than the ones presented here!

Precalculus Test 2 Practice Questions Page 1. Note: You can expect other types of questions on the test than the ones presented here! Precalculus Test 2 Practice Questions Page Note: You can expect oter types of questions on te test tan te ones presented ere! Questions Example. Find te vertex of te quadratic f(x) = 4x 2 x. Example 2.

More information

Bootstrap prediction intervals for Markov processes

Bootstrap prediction intervals for Markov processes arxiv: arxiv:0000.0000 Bootstrap prediction intervals for Markov processes Li Pan and Dimitris N. Politis Li Pan Department of Matematics University of California San Diego La Jolla, CA 92093-0112, USA

More information

Functions of the Complex Variable z

Functions of the Complex Variable z Capter 2 Functions of te Complex Variable z Introduction We wis to examine te notion of a function of z were z is a complex variable. To be sure, a complex variable can be viewed as noting but a pair of

More information

Learning based super-resolution land cover mapping

Learning based super-resolution land cover mapping earning based super-resolution land cover mapping Feng ing, Yiang Zang, Giles M. Foody IEEE Fellow, Xiaodong Xiuua Zang, Siming Fang, Wenbo Yun Du is work was supported in part by te National Basic Researc

More information

Efficient algorithms for for clone items detection

Efficient algorithms for for clone items detection Efficient algoritms for for clone items detection Raoul Medina, Caroline Noyer, and Olivier Raynaud Raoul Medina, Caroline Noyer and Olivier Raynaud LIMOS - Université Blaise Pascal, Campus universitaire

More information

Handling Missing Data on Asymmetric Distribution

Handling Missing Data on Asymmetric Distribution International Matematical Forum, Vol. 8, 03, no. 4, 53-65 Handling Missing Data on Asymmetric Distribution Amad M. H. Al-Kazale Department of Matematics, Faculty of Science Al-albayt University, Al-Mafraq-Jordan

More information

The Complexity of Computing the MCD-Estimator

The Complexity of Computing the MCD-Estimator Te Complexity of Computing te MCD-Estimator Torsten Bernolt Lerstul Informatik 2 Universität Dortmund, Germany torstenbernolt@uni-dortmundde Paul Fiscer IMM, Danisc Tecnical University Kongens Lyngby,

More information

Robust Average Derivative Estimation. February 2007 (Preliminary and Incomplete Do not quote without permission)

Robust Average Derivative Estimation. February 2007 (Preliminary and Incomplete Do not quote without permission) Robust Average Derivative Estimation Marcia M.A. Scafgans Victoria inde-wals y February 007 (Preliminary and Incomplete Do not quote witout permission) Abstract. Many important models, suc as index models

More information

EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS

EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS Statistica Sinica 24 2014, 395-414 doi:ttp://dx.doi.org/10.5705/ss.2012.064 EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS Jun Sao 1,2 and Seng Wang 3 1 East Cina Normal University,

More information

Chapter 4: Numerical Methods for Common Mathematical Problems

Chapter 4: Numerical Methods for Common Mathematical Problems 1 Capter 4: Numerical Metods for Common Matematical Problems Interpolation Problem: Suppose we ave data defined at a discrete set of points (x i, y i ), i = 0, 1,..., N. Often it is useful to ave a smoot

More information

Journal of Computational and Applied Mathematics

Journal of Computational and Applied Mathematics Journal of Computational and Applied Matematics 94 (6) 75 96 Contents lists available at ScienceDirect Journal of Computational and Applied Matematics journal omepage: www.elsevier.com/locate/cam Smootness-Increasing

More information

Finite Difference Methods Assignments

Finite Difference Methods Assignments Finite Difference Metods Assignments Anders Söberg and Aay Saxena, Micael Tuné, and Maria Westermarck Revised: Jarmo Rantakokko June 6, 1999 Teknisk databeandling Assignment 1: A one-dimensional eat equation

More information

ch (for some fixed positive number c) reaching c

ch (for some fixed positive number c) reaching c GSTF Journal of Matematics Statistics and Operations Researc (JMSOR) Vol. No. September 05 DOI 0.60/s4086-05-000-z Nonlinear Piecewise-defined Difference Equations wit Reciprocal and Cubic Terms Ramadan

More information