UNIMODAL KERNEL DENSITY ESTIMATION BY DATA SHARPENING


Statistica Sinica 15 (2005)

UNIMODAL KERNEL DENSITY ESTIMATION BY DATA SHARPENING

Peter Hall 1 and Kee-Hoon Kang 1,2
1 Australian National University and 2 Hankuk University of Foreign Studies

Abstract: We discuss a robust data sharpening method for rendering a standard kernel estimator, with a given bandwidth, unimodal. It has theoretical and numerical properties of the type that one would like such a technique to enjoy. In particular, we show theoretically that, with probability converging to 1 as sample size diverges, our technique alters the kernel estimator only in places where the latter has spurious bumps, and is identical to the kernel estimator in places where that estimator is monotone in the correct direction. Moreover, it automatically splices together, in a smooth and seamless way, those parts of the estimator that it leaves unchanged and those that it adjusts. Provided the true density is unimodal, our estimator generally reduces the mean integrated squared error of the standard kernel estimator.

Key words and phrases: Bandwidth, data sharpening, heavy tailed distributions, kernel methods, mean squared error, nonparametric density estimation, order constraint, tilting methods, unimodal density.

1. Introduction

Motivation for the assumption of unimodality usually has a Bayesian connection. It expresses a prior belief that the sampled distribution is homogeneous. Imposing it as a constraint, when constructing a nonparametric density estimator, is arguably the most effective way of incorporating knowledge of homogeneity into the final result, without sacrificing the particularly advantageous adaptivity conferred by nonparametric methods. The constraint of unimodality also makes the estimator particularly robust against undersmoothing, which, in the absence of a restriction on the number of modes, can seriously impair the qualitative appearance of the estimator.

There is an extensive literature on function estimation under shape constraints. Recently treated methods for unimodal density estimation include those considered by Wang (1995), Bickel and Fan (1996), Birgé (1997), Cheng, Gasser and Hall (1999) and Hall and Presnell (1999). A technique based on data sharpening was suggested by Braun and Hall (2001), but without any theoretical support and in the absence of clear guidance as to choice of distance function. In the

present paper we demonstrate, both theoretically and in numerical terms, that a version of data sharpening based on L_1 loss, or using closely related distance functions, performs particularly well. It produces smooth estimators with very good mean squared error performance, and with aesthetically attractive features which few other approaches enjoy.

When used for unimodal density estimation, data sharpening usually starts from a conventional kernel density estimator. It involves modifying the sample so as to ensure that the standard estimator, applied to the altered rather than the original sample, is unimodal. In particular, for a given bandwidth and kernel the data are moved the least amount subject to the kernel estimator having the unimodal property. Provided the kernel is a probability density, altering the sample does not affect the basic properties that make standard kernel methods so attractive, for example the fact that they are nonnegative and integrate to 1.

If the sampled density truly is unimodal then, at least for moderately large samples, we expect the standard kernel estimator to depart from unimodality only in the tails and in the vicinity of the true mode. These are the places where the true density is relatively flat. There, a standard kernel density estimator tends to suffer from spurious wiggles that prevent it from reflecting qualitative features of the sampled density. Hence, in principle it is necessary only to modify the estimator in such places. We shall show that these are exactly the places where the data sharpening algorithm adjusts the data, and that with high probability it does not alter any data that are not very close to the mode or some distance out in one or other of the tails. Likewise, it alters the standard kernel estimator itself only very close to the mode or out in the tails. And it automatically splices together the adjusted and original forms of the kernel estimator at points where they join. The final estimator is very smooth; it enjoys as many derivatives as the kernel that was used in its construction.

Of course, these goals could be achieved in other ways, using more explicit techniques. One such approach would be to draw horizontal lines at appropriate heights across unwanted valleys in the conventional kernel estimator, so as to fill them in; to incorporate a degree of monotone smoothing at places where the lines met the graph of the conventional estimator, so as to ensure the final estimator was smooth; and to renormalise the result, so that it integrated to 1. However, on account of the normalisation step this approach does not have the property that it equals the standard kernel estimator in places where the latter does not need adjustment. That can lead to significant performance difficulties, as we shall explain shortly.
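To make the explicit construction just described concrete, here is a minimal grid-based sketch of the "fill in the valleys and renormalise" idea (our own illustrative Python/NumPy code, not the paper's; the monotone smoothing and taper steps are omitted). Bridging each valley with a horizontal section is equivalent to taking running maxima towards the mode on each side.

    import numpy as np

    def fill_and_renormalise(fhat, grid, mode_idx):
        # Bridge unwanted valleys with flat (horizontal) sections by taking
        # running maxima towards the mode, then renormalise to integrate to 1.
        g = fhat.copy()
        g[:mode_idx + 1] = np.maximum.accumulate(g[:mode_idx + 1])          # nondecreasing left of mode
        g[mode_idx:] = np.maximum.accumulate(g[mode_idx:][::-1])[::-1]      # nonincreasing right of mode
        dx = grid[1] - grid[0]
        return g / (g.sum() * dx)

The flat bridges, and the global renormalisation in the last line, are precisely the sources of the artifacts and performance difficulties discussed here.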

Moreover, the fact that the explicitly linearised density estimator has perfectly flat sections means that it conveys the unwanted visual impression that there is something intrinsically special and interesting about those parts of the true density. This can be a particular problem in the tails of the linearised density estimator, where a graph of the estimator will often decrease to zero in an eye-catching sequence of flat steps with curved edges. These and other aspects of the explicitly linearised estimator are of course no more than artifacts of the technique employed to construct it, although that is usually far from obvious to the casual observer. Such issues might be unimportant if the use to which the final estimate was put was a purely quantitative one, not influenced by the qualitative appearance of a graph of the estimator, but that is often not the case in practice.

The majority of these difficulties can be removed by appropriately tuning the explicitly constructed estimator. However, the total number of subsidiary smooths and tapers that are required can be large, and developing a totally objective and effective procedure for implementing them produces an inelegant and tedious procedure, involving methodology that is aesthetically unattractive. Moreover, in the case of relatively heavy-tailed densities the linearisation step can add significantly to the probability mass of the density estimator. While this problem is eliminated by the normalisation step, the latter often creates difficulties of its own, by significantly increasing or reducing the height of the density estimator in the body of the distribution. That can impair performance, for example by seriously increasing mean squared error. A related phenomenon occurs in the case of unimodal density estimation using data tilting methods (Hall and Huang (2002)), where the need to remove spurious wiggles in the tails of a conventional density estimator can result in a detrimental increase in the density estimator at other places, leading to poor mean squared error performance. Numerical results illustrating this point will be summarised in Section 4, and related phenomena can be observed for unimodal density estimators constructed using linearisation and related techniques.

In principle, data sharpening can be used to ensure unimodality for other estimator types. However, the extreme simplicity of standard kernel estimators, and the fact that they can be readily used with a fixed bandwidth, make them especially attractive on both aesthetic and computational grounds. By way of contrast, local likelihood methods for density estimation, which usually require spatially varying bandwidths in order to be implemented effectively, are unattractive.

The distance, D say, through which the dataset has to be moved in order to ensure unimodality can be used to test the null hypothesis that the density is unimodal. The hypothesis would be rejected if D were too large. The null distribution could be estimated using bootstrap methods, by resampling from a unimodal distribution. One candidate for the latter would be the constrained density estimator itself, although other options are available.
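A minimal sketch of this bootstrap test is as follows (our own scaffolding; the callables sharpen and sample_from_constrained, and all names, are hypothetical stand-ins for the machinery developed in Sections 2 and 4).

    import numpy as np

    def unimodality_test(X, sharpen, sample_from_constrained, B=200, seed=None):
        # sharpen(X) returns (Y, D): sharpened data and the distance D moved.
        # sample_from_constrained(n, rng) resamples from the fitted unimodal estimate.
        rng = np.random.default_rng(seed)
        _, D_obs = sharpen(X)
        D_boot = np.empty(B)
        for b in range(B):
            Xb = sample_from_constrained(len(X), rng)
            _, D_boot[b] = sharpen(Xb)
        # Reject the null hypothesis of unimodality when D_obs is too large.
        p_value = (1 + np.sum(D_boot >= D_obs)) / (B + 1)
        return D_obs, p_value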

Multivariate applications of our method are also possible. Indeed, in the setting of two or more dimensions there are relatively few alternative approaches. Again, the distance through which the data have to be moved could be used as the basis for a test of unimodality.

Some early methods for density estimation under qualitative constraints, starting from Grenander's (1956) introduction of the technique that is now often referred to as nonparametric maximum likelihood, employed data tilting. See Prakasa Rao (1969) for an application to unimodal density estimation. Recent applications of related ideas include work of Bickel and Fan (1996) and Hall and Huang (2002) on unimodal density estimation. Fougères (1997) has shown that the method of monotone rearrangement, suggested by Hardy, Littlewood and Pólya (1952) and applied to a kernel density estimator, produces a consistent unimodal estimator when the true density is unimodal. Methods for estimating regression functions under qualitative constraints include those suggested by Friedman and Tibshirani (1984), Ramsay (1988), Kelly and Rice (1990), Qian (1994), Tantiyaswasdikul and Woodroofe (1994), Delecroix, Simioni and Thomas-Agnan (1995), Mammen and Thomas-Agnan (1999) and Hall and Huang (2001). Data sharpening methods were surveyed by Braun and Hall (2001).

The remainder of this paper is organised as follows. Section 2 introduces methodology. Main theoretical results are summarised in Section 3, and numerical properties are outlined in Section 4. Technical arguments for Section 3 are given in Section 5.

2. Methodology

A density f defined on the real line is said to be unimodal if there exists a point m_f, called a mode of f, such that f is increasing on (−∞, m_f) and decreasing on (m_f, ∞). (Here and below we use the terms "increasing" and "decreasing" to mean nondecreasing and nonincreasing, respectively.) Assuming f has a continuous derivative on the real line, we say that f is uniquely unimodal, or equivalently that it is unimodal with a unique mode, if there exists a unique point m_f in the interior of the support of f such that f′(m_f) = 0. Note that the definition of unique unimodality excludes densities with shoulders, as well as those with more than one turning point.

Let X = {X_1, ..., X_n} denote the original dataset, drawn from the population with density f. The conventional kernel estimator of f is

    f̂_X(x) = (nh)^{−1} Σ_{i=1}^n K{(x − X_i)/h},    (2.1)

where h is a bandwidth and K a kernel function. We wish to perturb the data as little as possible such that the constraint of unimodality is satisfied.
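In Python with NumPy, the unconstrained estimator (2.1) can be sketched as follows (our own code; the Gaussian kernel matches the choice made in Section 4).

    import numpy as np

    def kde(x, data, h):
        # Standard kernel estimator (2.1) with Gaussian kernel K and bandwidth h.
        x = np.asarray(x, dtype=float)
        data = np.asarray(data, dtype=float)
        u = (x[:, None] - data[None, :]) / h             # (x - X_i) / h
        K = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)   # Gaussian kernel
        return K.sum(axis=1) / (len(data) * h)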

To this end, let Y = {Y_1, ..., Y_n} and f̂_Y be, respectively, a sharpened dataset derived from X, and the corresponding kernel density estimator obtained on replacing X by Y at (2.1). We take Y to be the minimiser of

    D(X, Y) = Σ_i Ψ(X_i − Y_i)

subject to f̂_Y being unimodal, where Ψ denotes a distance function. Examples of functions Ψ include Ψ(x) = |x|^p, to which we shall refer as L_p distance, and more general distance measures analogous to those used in the theory of M estimation, such as Ψ(x) = ∫_0^x ψ(y) dy, where ψ denotes an antisymmetric function that is positive on the positive half line.

Forcing f̂_Y to be unimodal means ensuring the existence of a quantity m̂ such that f̂′_Y(x) ≥ 0 for x ≤ m̂, and f̂′_Y(x) ≤ 0 for x ≥ m̂. In practical numerical terms, we first choose a candidate m for m̂, and then choose Y(m) to minimise Σ_i Ψ(X_i − Y_i) subject to f̂_Y being unimodal with mode at m. Finally, writing D(m) for the corresponding minimised value of Σ_i Ψ(X_i − Y_i), we select m̂ to minimise D(m). (In practice, we found that m̂ did not depend on the starting candidate.) Thus, our method produces an estimator of the mode as well as a unimodal density estimator. However, we shall not explore properties of m̂ here, except to note that, while its bias and error about the mean are of the same orders as those of the conventional mode estimator (defined as the point at which f̂_X attains its global maximum), our m̂ is not asymptotically normally distributed.

As we show in Section 4, excellent statistical performance, especially for heavy tailed distributions, is obtained very generally using distance measures such as L_1. In particular, for all the distributions with which we have experimented, this approach equals or outperforms methods based on L_p distance for p > 1. The excellent statistical properties of our methods based on L_1-type distances arise from the distance function being asymptotically linear, rather than increasing at a faster rate, for large values of its argument. This feature implies that only a relatively small penalty is imposed for moving an outlying data value a long distance to the main body of the sample, in order to ensure unimodality. In contrast, when using L_2 distance the algorithm tends to move data from the middle of the distribution to the tails, as well as moving them in the opposite direction; it shifts many data by small amounts, rather than a small number of data by a large amount, and this almost invariably impairs performance.

However, the fact that the distance function |x| has a discontinuous derivative at the origin means that when using L_1 distance one tends to experience numerical difficulties in 10% to 20% of samples, depending on distribution type, when attempting to minimise Σ_i |X_i − Y_i| subject to f̂_Y being unimodal. This problem can be overcome by smoothing the loss function at its vertex. One approach is to use a function such as Ψ = Ψ_tan, constructed with ψ(x) = arctan(x). This Ψ is asymptotically linear, and perfectly convex, but has a smooth bowl shape at the origin, so its statistical performance is close to that for L_1 distance, without the latter's numerical drawbacks.
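Since ψ(x) = arctan(x) integrates in closed form, Ψ_tan(x) = ∫_0^x arctan(y) dy = x arctan(x) − (1/2) log(1 + x²), the objective D(X, Y) is easy to evaluate. A minimal sketch (our own code and names, not the paper's implementation) is:

    import numpy as np

    def psi_tan(x):
        # Psi_tan(x) = x*arctan(x) - 0.5*log(1 + x^2): quadratic near 0,
        # asymptotically linear, convex, and smooth at the vertex.
        return x * np.arctan(x) - 0.5 * np.log1p(x ** 2)

    def sharpening_objective(X, Y, psi=psi_tan):
        # D(X, Y) = sum_i Psi(X_i - Y_i), to be minimised subject to the
        # kernel estimator built from Y being unimodal.
        return np.sum(psi(np.asarray(X) - np.asarray(Y)))

In practice this objective would be handed to a constrained optimiser, as discussed in Section 4.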

As in the setting of robust statistical methods, there are many opportunities for improving performance, for instance by using a non-convex loss function corresponding to a redescending weight. An example is the function Ψ based on ψ(x) = sin(x/a) I(|x| < πa). (Here, the parameter a would have to be chosen.) For discussion of this and many other opportunities for choosing ψ see, for example, Andrews et al. (1972).

In numerical work in Section 4 we compare the above data sharpening methods with a data tilting approach suggested by Hall and Huang (2002), based on ideas discussed by Hall and Presnell (1999). To construct a unimodal density estimator by tilting the empirical distribution, we change the data weights instead of altering the data themselves. Accordingly, the kernel density estimator becomes

    f̂(x | p) = Σ_{i=1}^n p_i h^{−1} K{(x − X_i)/h},

where p_1, ..., p_n are chosen to minimise a measure D(p) of the distance between the multinomial distribution p = (p_1, ..., p_n) on n points and the corresponding uniform distribution, subject to f̂(x | p) being unimodal. The resulting constrained density estimator will be denoted by f̂_tilt. In Section 4 we take D_1(p) = Σ_{i=1}^n p_i log(np_i), this being a particular form of power divergence (Cressie and Read (1984)). It gives relatively good performance in the context of unimodal density estimation, largely because it is robust against aberrations caused by reducing one or more weights p_i to 0. That operation is frequently necessary in order to eliminate outlying data that cause spurious bumps in the tails of unconstrained kernel density estimators. In comparison, the more conventional power divergence measure, D_0(p) = −Σ_i log(np_i), which is infinite when one or more values of p_i vanish, often leads to undefinable density estimators if the imposed constraint is unimodality. Nevertheless, even when data tilting uses D_1 it is outperformed by data sharpening based on Ψ_tan.
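A minimal sketch of the tilted estimator and of D_1 (our own code; p is assumed to be a vector of nonnegative weights summing to 1):

    import numpy as np

    def tilted_kde(x, data, p, h):
        # f-hat(x | p) = sum_i p_i h^{-1} K{(x - X_i)/h}, Gaussian kernel.
        u = (np.asarray(x, dtype=float)[:, None] - np.asarray(data)[None, :]) / h
        K = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
        return K @ np.asarray(p) / h

    def D1(p):
        # D_1(p) = sum_i p_i log(n p_i); each term vanishes as p_i -> 0,
        # so zero weights are not penalised infinitely (unlike D_0).
        p = np.asarray(p, dtype=float)
        n = len(p)
        pos = p > 0
        return np.sum(p[pos] * np.log(n * p[pos]))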

3. Theoretical Properties

There is a variety of approaches to establishing theory for data sharpening unimodal density estimators, many of them tailored to specific distance measures D. In most instances the theory shows that in the case of compactly supported kernels, with high probability and away from the mode and the tails of the sampled distribution, the sharpened estimator f̂_Y is identical to its conventional counterpart. For brevity and simplicity we shall describe only one approach here, developed for the case of L_1 distance; see (3.1) below. It has the advantage that, under a mild and straightforward condition, (3.3), the algorithm based on minimising L_1 distance guarantees not only that f̂_Y = f̂_X along most of the support of f, but also that specific convergence rates are achieved uniformly on the line. See Theorem 3.1. Alternative approaches, based on more general loss functions, do not appear to allow such a simple and elegant description of the properties of f̂_Y. All our results extend easily to the case of estimating f under the constraint that it has k ≥ 1 modes, assuming this condition is incorporated into the basic algorithm and that the modes of f̂_Y are required to be at least a certain fixed, sufficiently small distance apart.

We initially assume the kernel K is a symmetric, compactly supported, uniquely unimodal probability density with two bounded derivatives. Call this condition (C_{K,1}). Theorems below hold without change if K is taken to be the Gaussian kernel, although the proofs are more elaborate in that case. However, if K is not compactly supported then it is not generally true that f̂_Y = f̂_X, with high probability, for most of the support of f.

Given a data sharpening algorithm A that takes the original dataset X = {X_1, ..., X_n} to Y = {Y_1, ..., Y_n}, where Y_i denotes the image of X_i, we define

    D(X, Y) = Σ_{i=1}^n |X_i − Y_i|.    (3.1)

We apply subscripts to A and Y to denote specific algorithms and the sharpened datasets that they produce, respectively. In particular, let A_0 denote the version of A that selects Y = Y_0 to minimise D(X, Y) subject to f̂_Y being unimodal.

Let F_0 be the class of unimodal densities f with the properties: the support of f is an interval with endpoints a_f < b_f (not necessarily finite), the mode m_f of f is unique and lies in (a_f, b_f), f has a bounded, uniformly continuous derivative on (−∞, ∞), and f′ is bounded away from 0 on the set S(c) = [c_1, c_2] ∪ [c_3, c_4] whenever c = (c_1, ..., c_4) satisfies

    a_f < c_1 < c_2 < m_f < c_3 < c_4 < b_f.    (3.2)

Put h_0 = n^{−1/5}. The notation h ≍ h_0 means that the ratio h/h_0 is bounded away from zero and infinity as n → ∞.

Theorem 3.1. Let f ∈ F_0, assume D is given by (3.1), that (C_{K,1}) holds and that h ≍ h_0 as n → ∞. Suppose too that there exists a particular data sharpening algorithm A_1, not necessarily A_0, which with probability 1 produces a unimodal density estimator f̂_Y, and which moves X a distance D(X, Y_1) to Y_1, where

    h_0^2 D(X, Y_1) → 0    (3.3)

with probability 1 as n → ∞. Then, when A_0 is used instead of A_1 and c satisfies (3.2),

    P_f{Y_i = X_i for each i such that either X_i or Y_i is in S(c)} → 1,    (3.4)

and

    sup_{−∞<x<∞} |f̂_Y(x) − f̂_X(x)| = o_p(h_0),   sup_{−∞<x<∞} |f̂′_Y(x) − f̂′_X(x)| = o_p(1).    (3.5)

The algorithm A_0 is not necessarily uniquely defined, and results such as (3.4) should be interpreted as holding uniformly in all versions of A_0. That is, the probability that there is a version of A_0 for which Y_i ≠ X_i for some i such that either X_i or Y_i is in S(c) converges to zero as n → ∞. Non-uniqueness of A_0 can occur with positive probability when n = 2, although we conjecture that when (C_{K,1}) is satisfied and n ≥ 3, A_0 is uniquely defined with probability 1.

We may paraphrase (3.4) by saying that, with probability converging to 1, A_0 leaves unaltered each data value X_i that is neither very close to the true mode nor very far out in the tails. Since K is compactly supported, and (3.4) holding for Y_1 implies the same for Y_0, then if Y is produced by either A_0 or A_1 we have, for each vector c such that (3.2) holds,

    P_f{f̂_X(x) = f̂_Y(x) for all x ∈ S(c)} → 1    (3.6)

as n → ∞. Therefore, away from the mode and the tails, f̂_Y and its derivatives have all the weak convergence properties of f̂_X and the latter's derivatives. Hence, in terms of integrated squared error, we cannot expect improved performance of f̂_Y over f̂_X in those regions.

Note that we have assumed only one derivative of f. This explains the somewhat slow convergence rate at (3.5). However, (3.6) implies that if Y = Y_0 is produced by A_0, and if f has two bounded derivatives, then (3.5) has a more conventional form away from the mode and the tails, as follows. Using the data sharpening algorithm A_0, we have for each vector c such that (3.2) holds, f̂_Y(x) − f(x) = O_p(h_0^2) and f̂′_Y(x) − f′(x) = O_p(h_0) whenever x ∈ S(c), and

    sup_{x∈S(c)} |f̂_Y(x) − f(x)| = O_p(h_0^2 ℓ^{1/2}),   sup_{x∈S(c)} |f̂′_Y(x) − f′(x)| = O_p(h_0 ℓ^{1/2}),    (3.7)

where ℓ = log n. In fact, under mild additional conditions these results also extend to neighbourhoods of the mode, as we show following Theorem 3.2. It is only in the extreme tails, beyond locations that themselves move further out into each tail as n increases, that f̂_Y and f̂′_Y may not enjoy the convergence rates of the conventional estimators f̂_X and f̂′_X, respectively.

In principle, (3.4) does not exclude the possibility of cross mappings, where a datum X_i in one tail is mapped to Y_i in the opposite tail or in a close neighbourhood of the mode, or where X_i near the mode is mapped to Y_i in a tail. However, it is easy to see that the probability that this occurs tends to 0 as n → ∞. Indeed, if a cross mapping were to arise then with probability converging to 1 we could produce a lesser value of D(X, Y_0) by instead moving X_i to a

point in S(c) and then moving another point in S(c) to the original image of X_i. However, for any given c satisfying (3.2), result (3.4) states that the probability of this occurring converges to 0.

Result (3.4) implies that when D(X, Y) is defined by (3.1), and (3.3) holds, the algorithm A_0 is in effect constructed quite separately near the mode and in either tail. In the next two theorems we note that it is possible to find algorithms for the mode and the tails, respectively, such that f̂_Y is unimodal and (3.3) holds. Following Theorem 3.3 we describe how to combine these two algorithms into a single algorithm that satisfies (3.3).

Assume f ∈ F_0 has two bounded derivatives in a neighbourhood of the mode, f″ is continuous at the mode, and f″(m_f) < 0. Call this condition (C_f). Suppose too that K is a compactly supported, symmetric, uniquely unimodal probability density with three bounded derivatives; denote this constraint by (C_{K,2}).

Theorem 3.2. Assume (C_f) and (C_{K,2}) hold, and h ≍ h_0. Then there exists a data sharpening algorithm A_1 which, when applied to data on (m_f − ε, m_f + ε) for ε > 0 sufficiently small, ensures that f̂_Y is unimodal on (m_f − ε, m_f + ε) and also guarantees that for each 0 < δ < ε, as n → ∞,

    P_f{Y_i = X_i for each i such that either X_i or Y_i is in (m_f − ε, m_f − δ) ∪ (m_f + δ, m_f + ε)} → 1

and

    Σ_{i: |X_i − m_f| ≤ ε} |X_i − Y_i| = O_p(h_0^{−1}).    (3.8)

Result (3.8) implies a rate of convergence in the neighbourhood of the mode. Indeed, suppose we use the algorithm A_0, instead of A_1, to sharpen data. Assume too that the conditions of Theorems 3.1 and 3.2 hold. Then it follows from (3.4), (3.8) and the fact that the probability of cross-mappings converges to 0, that for any ε > 0, (3.8) holds when the Y_i's are produced by A_0 rather than A_1. Now, a Taylor expansion gives

    |f̂_X(x) − f̂_Y(x)| ≤ (sup |K′|) (nh^2)^{−1} Σ_{i: |X_i − m_f| ≤ ε} |X_i − Y_i|,

uniformly in x satisfying |x − m_f| ≤ δ, for all sufficiently large n. Therefore, by (3.8), |f̂_X − f̂_Y| = O_p(h_0^2 ℓ^{1/2}) uniformly in a neighbourhood of the mode. It follows from this result, and the fact that |f̂_X − f| = O_p(h_0^2 ℓ^{1/2}) uniformly in a neighbourhood of the mode, that |f̂_Y − f| = O_p(h_0^2 ℓ^{1/2}) there. The latter property enables us to replace the supremum over x ∈ S(c), in the first result at (3.7), by the supremum over x ∈ [c_1, c_4], where c_1 and c_4 satisfy (3.2). Similarly it may be proved that an identical change can be made to the second result at (3.7). The particular construction of A_1 that we give in the proof of Theorem 3.2 actually

ensures that |f̂_Y − f| = O_p(h_0^2) in an O_p(h_0)-neighbourhood of m_f, and f̂_X = f̂_Y outside that neighbourhood and away from the extreme tails.

Finally, we discuss a data sharpening algorithm that produces monotone tails and at the same time ensures (3.3). We consider the case of tails that decrease only polynomially fast, of which those of Student's t density are an example. Densities with tails that decrease exponentially quickly are more easily treated. It suffices to consider data sharpening in the right hand tail.

Let α > 1 be given and assume K is a symmetric, compactly supported, uniquely unimodal density with ν_{α,K} bounded derivatives, that f ∈ F_0 has ν_{α,f} derivatives on (C, ∞) for some C > 0, and that |f^{(j)}(x)| ≍ x^{−α−j} as x → ∞ for 0 ≤ j ≤ ν_{α,f}, where ν_{α,f}, ν_{α,K} ≥ 2. Call this condition (C_α). The values of ν_{α,K} and ν_{α,f} depend only on α; details are given by (5.14) in the proof of Theorem 3.3.

Theorem 3.3. If (C_α) holds for α > 8, and h ≍ h_0, then there exists a data sharpening algorithm A_1 which, when applied to data on (C, ∞) for C > 0 sufficiently large, ensures that f̂_Y is decreasing on (C, ∞) and also guarantees that for all C′ > C,

    P_f{Y_i = X_i for each i such that either X_i or Y_i is in (C, C′)} → 1

and

    h_0^2 Σ_{i: X_i > C′} |X_i − Y_i| → 0

in probability as n → ∞.

The constraint α > 8 can be weakened by using a longer argument. Moreover, our numerical work will focus particular attention on performance of data sharpening when α is as small as 3/2.

Assume f ∈ F_0, and suppose in addition that the conditions of Theorems 3.2 and 3.3 apply to f and K, with the conditions of Theorem 3.3 holding in both tails, possibly for different values of α. Theorems 3.2 and 3.3 imply that we may construct a single data sharpening algorithm A_1 that produces an estimator f̂_Y which is unimodal, and such that (3.3) and (3.4) both hold. Indeed, (3.4) itself enables us to splice together the separate data sharpening algorithms for the tails, and that for the mode, into a single algorithm that applies to the whole dataset, in the knowledge that unimodality of f̂_Y will take care of itself at all places that are intermediate between either tail and the mode. This is guaranteed by the fact that f̂_Y = f̂_X in such places; since f′ does not vanish there, f̂_X is monotone at the intermediate places, with high probability. Of course, once (3.3) has been proved for A_1 then we know from Theorem 3.1 that both (3.3) and (3.4) hold for A_0.

4. Numerical Properties

In the work summarised here we drew datasets of size n = 25 from the standard Normal distribution (distribution 1), Student's t distribution with three

degrees of freedom (distribution 2), the Normal mixture

    0.2 N(0, 1) + 0.2 N(1/2, (2/3)^2) + 0.6 N(13/12, (5/9)^2)

(distribution 3), and the Normal mixture

    0.35 N(−1, (3/5)^2) + … N(1, (5/2)^2) + … N(5, (3/2)^2)

(distribution 4). All are unimodal, and they are depicted in Figure 1. Distributions 1 and 2 were chosen because they represent opposite extremes in terms of tail weight, distribution 3 represents moderately skewed densities (it is density #2 of Marron and Wand (1992)), while distribution 4 is highly skewed with a long and relatively flat part, and therefore presents particular challenges to methods for unimodal density estimation. Nevertheless, out of the four densities the second is arguably the most difficult for which to construct unimodal estimators. Standard kernel estimators, computed using data from the second distribution, usually have quite a few spurious bumps in the tails. Ideally, these should be removed without increasing mean integrated squared error. We therefore devote much of our attention to the relative performance of methods applied to distribution 2.

Figure 1. Four true densities. Density 1 is standard normal, density 2 is Student's t with three degrees of freedom, density 3 is moderately skewed, and density 4 is highly skewed with a long, relatively flat portion.
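For reference, distribution 3 can be simulated as in the following sketch (our own helper, not part of the paper; the weights and components are those of Marron and Wand's density #2).

    import numpy as np

    def sample_dist3(n, seed=None):
        # 0.2 N(0,1) + 0.2 N(1/2, (2/3)^2) + 0.6 N(13/12, (5/9)^2)
        rng = np.random.default_rng(seed)
        comp = rng.choice(3, size=n, p=[0.2, 0.2, 0.6])
        means = np.array([0.0, 0.5, 13.0 / 12.0])
        sds = np.array([1.0, 2.0 / 3.0, 5.0 / 9.0])
        return rng.normal(means[comp], sds[comp])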

For the results presented here the Gaussian kernel was used throughout, since it is a favourite of statisticians in problems involving mode analysis. However, very similar results were obtained using, for example, the biweight kernel. Indeed, the Gaussian kernel is so light tailed that, even though it is not compactly supported, it produces density estimates which visually satisfy the properties reported in Section 3. That is, except on very fine scales, the constrained density estimates are usually visually indistinguishable, in places away from the modes and the tails, from their conventional, unconstrained counterparts. We return to this point in our discussion of Figure 4 below.

We discuss results for estimators constrained by data sharpening using either L_2 distance or the distance Ψ_tan. Similar results are obtained using L_p rather than L_2 distance, for p > 1. As expected, the efficiency advantages of Ψ_tan distance decrease as p decreases to 1, although Ψ_tan retains its competitive edge. For comparison we also give results for the standard, unconstrained kernel estimator, and for the estimator f̂_tilt obtained by data tilting, this time using the distance measure D_1. See Section 2 for details of the construction of f̂_tilt. A comparison with the kernel rearrangement method of Fougères (1997) will be made later in this section.

When constructing f̂_Y the constrained optimisation step was usually implemented using the NAG routine E04UCF. This routine failed only rarely (e.g., in fewer than one per cent of simulations in the Normal case), and it can be protected against this problem in those instances where it fails. An alternative is to use an easily-coded simulated annealing algorithm there, similar to that described by Braun and Hall ((2001), Section 3.2), using the penalty

    p(y) = Σ_{i: ξ_i < m} |f̂′(ξ_i)| I{f̂′(ξ_i) < 0} + Σ_{i: ξ_i > m} |f̂′(ξ_i)| I{f̂′(ξ_i) > 0},

where ξ_1, ..., ξ_ν denote grid points on which the constraints were imposed, and m is a candidate for the mode. In either case, the algorithm can be speeded up by moving outlying data a little closer to the body of the distribution, right at the start. This reflects the theoretical results discussed in Section 3. The simulations reported below used a small subroutine of this type.
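The penalty itself can be sketched as follows, for a Gaussian-kernel estimate on a constraint grid (our own code; the grid, bandwidth and mode candidate are inputs that the annealing loop would supply).

    import numpy as np

    def unimodality_penalty(y, grid, m, h):
        # p(y): accumulate |f-hat'(xi_i)| wherever the derivative of the KDE
        # built from sharpened data y has the wrong sign for mode candidate m.
        y = np.asarray(y, dtype=float)
        grid = np.asarray(grid, dtype=float)
        u = (grid[:, None] - y[None, :]) / h
        # Derivative of the Gaussian-kernel KDE: (n h^2)^{-1} sum_i -u phi(u).
        fprime = (-u * np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)).sum(axis=1) / (len(y) * h ** 2)
        left = (grid < m) & (fprime < 0)    # should be increasing left of m
        right = (grid > m) & (fprime > 0)   # should be decreasing right of m
        return np.abs(fprime[left]).sum() + np.abs(fprime[right]).sum()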

When the estimators are applied to data from the standard normal distribution, it is found that a graph of MISE for the data sharpening estimator, using distance measure Ψ_tan, lies below that for the other three estimators across all values of h. However, although this estimator has a slight advantage over its unconstrained kernel counterpart, at the optimal bandwidth it offers only negligible improvements relative to the other two unimodal estimators. Nevertheless, the data tilting estimator, when used with a small suboptimal bandwidth, performs quite poorly relative to both of its data sharpening competitors. Analogous results are also obtained for other very light-tailed distributions. As we see below, the data sharpening estimator based on Ψ_tan really comes into its own when applied to distributions that have heavier tails than the normal.

In particular, Figure 2 plots the logarithm of MISE against the logarithm of bandwidth for all four estimators when the sampled distribution is the heavy-tailed Student's t with three degrees of freedom. The performance advantages of data sharpening using distance measure Ψ_tan are obvious. Note too that, in terms of optimal MISE performance, the unconstrained kernel method lies between the two data sharpening approaches.

Figure 2. Plots of MISE for distribution 2. The vertical axis gives the logarithm of MISE, and the horizontal axis gives the logarithm of bandwidth. Solid, bold-dashed, dotted and dashed lines correspond to the unconstrained estimate, data sharpening estimates constrained using distances Ψ_tan and L_2, and the constrained estimate using data tilting, respectively.

Figure 2 also shows that the optimal bandwidth for data sharpening, using Ψ_tan, is smaller than that for the standard kernel estimator. This reflects the fact that this form of data sharpening produces an estimator with smaller integrated variance, and slightly larger integrated squared bias. The optimal tradeoff between the two therefore occurs at a smaller bandwidth. The main contribution to reduced integrated variance comes from the tails of the distribution, where data sharpening reduces the considerable stochastic variability of the standard kernel estimator by removing all the bumps there. Variance actually increases in the middle of the distribution; see Figure 3. However, the latter property is not characteristic of data sharpening, which often reduces variance in the vicinity of the mode. In particular this is the case for distributions 1 and 3.

Figure 3, which gives pointwise mean squared error, squared bias and variance in the case of data from distribution 2, shows that in this instance data

sharpening tends to reduce bias towards the centre of the distribution. This is also observed for distribution 1, but for distributions 3 and 4 bias slightly increases at the centre.

Figure 3. Plots of pointwise mean squared error, squared bias and variance for distribution 2. Line types are as for Figure 2. In each case the bandwidth was chosen to be that which minimises MISE, and the vertical axis is marked in units of 10^{−5}.

Figure 2 revealed that, while data tilting outperformed L_2 data sharpening for the second distribution, it was not competitive with the unconstrained kernel estimator in MISE terms. Figure 3 indicates why. The only way tilting can remove spurious modes in the tails of the standard kernel estimator is to give outlying data zero weight. This forces relatively high weights to be used near the centre of the distribution, with a correspondingly large increase in bias, which is clearly evident from the second panel of Figure 3. The data tilting estimator commonly suffers from this difficulty for distributions with one or more relatively heavy tails.

Sampling from the third distribution, which has the skewed density shown in the third panel of Figure 1, one again finds that the data sharpening estimator based on distance measure Ψ_tan has the lowest minimum MISE. For this distribution

the data tilting method has the highest MISE of all four estimators. This is again a result of its relatively large bias towards the centre of the distribution, which occurs for the reasons noted in the previous paragraph; in the case of the third distribution the left hand tail is relatively heavy.

Performance advantages of data sharpening are even more clearly evident for the fourth distribution, where the MISE curve for the estimator based on Ψ_tan lies below that for each of the other two estimators over much of its range on either side of its minimum. On this occasion, since the right hand tail of the distribution is relatively heavy, the technique based on data tilting again performs poorly. However, in MISE terms L_2 data sharpening produces an estimator which performs almost as well as when sharpening is based on Ψ_tan, and which performs better than the unconstrained kernel estimator.

Figure 4. Kernel estimates, and results of data sharpening, for a dataset from the second distribution. Line types are as in Figure 2. The bold solid line represents the true density. In the lower panel the top and bottom graphs show how data are moved in the cases of data sharpening based on L_2 and Ψ_tan distance, respectively.

Figure 4 shows individual curve estimates, and the manner in which data are sharpened, in the case of a dataset drawn from the second distribution. The dataset chosen was that for which ISE was at its 75th percentile, when the bandwidth was chosen to be optimal for minimising MISE of the unconstrained kernel

estimator. This particular dataset produces a markedly asymmetric density estimate regardless of estimator type. There are several outliers in the left hand tail, with the result that the unconstrained estimate has two spurious modes there. However, data in the right hand tail are distributed in a manner which suggests properties of a light tailed distribution, rather than the heavy tailed distribution from which they came. As a result, the standard kernel estimate has no unwanted bumps in the right hand tail. Neither does it have any spurious modes in the neighbourhood of the point at which the global maximum is achieved. Therefore, except for the left-hand tail, the estimate based on Ψ_tan coincides almost exactly with its unconstrained counterpart. In the left-hand tail it produces a better fit than data tilting.

The lower panel of Figure 4 shows how data are sharpened to produce a unimodal estimate, when the distance function is either L_2 or Ψ_tan. In each instance the data in the right hand tail are virtually unaltered, and in the case of Ψ_tan distance, excepting the two most extreme values, data are altered in only minor ways in the left hand tail. (If the kernel used is compactly supported, for example the biweight, there are no changes at all except for the four data furthest to the left.) However, when distance is measured in L_2 terms the changes to data in the left hand tail are much more widespread.

We also examined the effect of data-driven bandwidth selection on the performance of our data-sharpened estimators. In particular, for each sample of size n = 25 we chose the bandwidth using the method suggested by Sheather and Jones (1991), and compared the unconstrained estimates with those constructed by sharpening based on Ψ_tan distance. Mean integrated squared error was approximated by averaging integrated squared error over 200 simulations. Data sharpening reduced MISE by 13%, 18%, 12% and 8% in the cases of distributions 1-4, respectively. Of course, the main source of improvement comes from the tails and from the vicinity of the mode. As one would expect, given that asymptotic performance of the data-sharpened estimator becomes increasingly like that of its unconstrained counterpart as n increases (see Section 3), the margin of MISE improvement offered by data sharpening narrows for large n.
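The Monte Carlo comparison just described can be organised along the following lines (our own scaffolding; estimate stands for any of the constrained or unconstrained estimators above, and a bandwidth selector such as Sheather-Jones would be applied inside it).

    import numpy as np

    def approx_mise(estimate, sample, f_true, n=25, nsim=200,
                    lo=-10.0, hi=10.0, ngrid=401, seed=None):
        # Approximate MISE by averaging integrated squared error over nsim samples.
        rng = np.random.default_rng(seed)
        grid = np.linspace(lo, hi, ngrid)
        dx = grid[1] - grid[0]
        ise = np.empty(nsim)
        for s in range(nsim):
            X = sample(n, rng)
            fhat = estimate(grid, X)                 # estimator values on the grid
            ise[s] = np.sum((fhat - f_true(grid)) ** 2) * dx
        return ise.mean()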

Figure 5 compares our method with that of Fougères (1997) for distribution 2 (Student's t with three degrees of freedom). Here, and for distribution 4, Fougères' approach has a marginal advantage. However, the method generally gives a relatively rough estimate, especially for sample sizes larger than that (n = 25) used here. A clearer idea of the roughness problem, for larger sample sizes, is obtainable from the figures of Fougères (1997). For distribution 1, Fougères' approach gives the largest minimum mean squared error of all shape-constrained methods studied; in the case of distribution 3, it ranks between the data tilting method and data sharpening, and in particular is behind the latter.

Figure 5. Comparison of Fougères' method with data sharpening using distance Ψ_tan. In the context of Figure 2 (i.e., for distribution 2) the upper panel graphs the logarithm of MISE. Kernel estimates, for the same sample as in Figure 4, are shown in the lower panel. In both panels the dotted line corresponds to Fougères' estimate, while the bold-dashed line shows the result for the data-sharpened estimate. In the upper panel the (light) solid line shows results for the unconstrained estimate, while in the lower panel the (bold) solid line depicts the true density.

Finally, we applied the sharpening approach based on Ψ_tan distance, and data tilting, to the Buffalo snowfall dataset, which is frequently used to explore the effect of bandwidth choice on density estimate shape; see, e.g., Silverman (1986, p.45) and Scott (1992, p.137). The dataset consists of the annual snowfalls, measured in inches, at Buffalo, New York, from 1910 to 1972. Figure 6 illustrates the unconstrained kernel estimate, and the results of data sharpening and data tilting, applied to these data using the bandwidth h = 6. Silverman (1986) discussed the use of this bandwidth for these data, noting that it gave a trimodal estimate when applied to a standard kernel estimator. He observed that doubling the size of the bandwidth gave a unimodal estimator in the unconstrained case.

Figure 6. Unconstrained kernel estimates, and results of data sharpening and data tilting, for Buffalo snowfall data. The upper panel graphs kernel estimates whose line types are as in Figure 2. The middle panel indicates how data are moved in the case of data sharpening based on Ψ_tan distance. Data weights for the tilting approach are shown in the lower panel.

5. Technical Arguments

5.1. Proof of Theorem 3.1

Consider a general data sharpening algorithm in which X is moved a distance D(X, Y) to Y. By a Taylor expansion,

    |f̂′_Y(x) − f̂′_X(x)| = (nh^2)^{−1} |Σ_{i=1}^n [K′{(x − X_i + X_i − Y_i)/h} − K′{(x − X_i)/h}]| ≤ (sup |K″|) n^{−1} h^{−3} D(X, Y).    (5.1)

Given a particular algorithm A_j, let Y_j denote the corresponding sharpened dataset, and let f̂_{Y_j} be the version of f̂_Y produced from Y_j. By assumption, we can construct A_1 such that f̂_{Y_1} is unimodal and h_0^2 D(X, Y_1) → 0 in probability

as n → ∞. Hence, by the definition of A_0, f̂_{Y_0} is unimodal and

    h_0^2 D(X, Y_0) → 0    (5.2)

in probability. Let c = (c_1, ..., c_4) and c′ = (c′_1, ..., c′_4) be vectors such that a_f < c′_1 < c_1 < c_2 < c′_2 < m_f < c′_3 < c_3 < c_4 < c′_4 < b_f. Then both c and c′ satisfy (3.2). Consider the algorithm A_2 constructed from A_0 by putting Y_i = X_i for X_i ∈ S(c) and taking Y_i to be defined by A_0 otherwise. Then

    D(X, Y_2) ≤ D(X, Y_0).    (5.3)

It is readily proved that for f ∈ F_0,

    sup_{−∞<x<∞} |f̂′_X(x) − f′(x)| = o_p(1).    (5.4)

Since h ≍ h_0 and K is compactly supported, then for all sufficiently large n, f̂_{Y_2} = f̂_{Y_0} outside S(c′); call this result (R_1). Properties (5.1), employed in the case Y = Y_2, (5.3) and (5.4), and the fact that h_0^2 D(X, Y_0) → 0 in probability, imply that sup |f̂′_{Y_2} − f′| → 0 in probability. It follows from this result, and the fact that f′ is bounded away from 0 on an open set containing S(c′), that with probability converging to 1 as n → ∞, f̂′_{Y_2} is positive on [c′_1, c′_2] and negative on [c′_3, c′_4]. In conjunction with (R_1) this implies that, with probability tending to 1, f̂_{Y_2} is unimodal on the whole real line; call this result (R_2).

It follows from (R_2) that if the probability that the inequality (5.3) is strict does not converge to 0, then with strictly positive probability, for arbitrarily large n, the algorithm A_2 produces a unimodal density estimator and at the same time strictly reduces the distance D(X, Y_0). However, by definition of A_0 this is impossible. Therefore, with probability converging to 1, the inequality at (5.3) is actually an identity. It follows from this result and the definition of A_2 that the probability that A_0 fixes X_i, for each i such that X_i ∈ S(c), converges to 1 as n → ∞. This is equivalent to

    P_f{Y_i = X_i for each i such that X_i ∈ S(c)} → 1,

which is a weaker form of (3.4). To obtain the full force of (3.4), observe that if A_0 involves mapping X_i ∉ S(c) to Y_i ∈ S(c), then with probability converging to 1 we may produce a lesser value of D(X, Y) by instead mapping X_i to a point in S(c′)\S(c), while retaining unimodality of f̂_Y. However, this contradicts the definition of A_0 as an algorithm that minimises D(X, Y), and so the probability that X_i ∉ S(c) is mapped to Y_i ∈ S(c) must converge to 0.

The second part of (3.5) follows from the three conditions (5.1), employed in the case Y = Y_0, (5.2) and (5.4). The first part of (3.5) may be derived similarly.

5.2. Proof of Theorem 3.2

Write Y_i = X_i − h^3 δ_i and observe that, by a Taylor expansion,

    f̂′_Y(x) = f̂′_X(x) + n^{−1} Σ_i δ_i K″{(x − X_i)/h} + O_p(n^{−1} h^2 Σ_i δ_i^2),

where, here and at (5.5) below, the remainder is of the stated form uniformly in x. Applying Theorem 3 of Komlós, Major and Tusnády (1975) we deduce that

    f̂′_X(x) = E{f̂′_X(x)} + (nh^3)^{−1/2} U(x) + O_p{(nh^2)^{−1} log n},    (5.5)

where U(x) = h^{−1/2} ∫ W_0{F(x − hy)} K″(y) dy, W_0 is a standard Brownian bridge whose construction relative to the data depends on n, and F denotes the distribution function for which f is the density. Therefore, noting that h ≍ h_0, and assuming for the present that

    Σ_i δ_i^2 = O_p(nh),    (5.6)

we deduce that

    f̂′_Y(x) = E{f̂′_X(x)} + (nh^3)^{−1/2} U(x) + n^{−1} Σ_i δ_i K″{(x − X_i)/h} + O_p(h_0^3 log n)    (5.7)

uniformly in x. Without loss of generality the true mode is 0. Let L be a bounded, compactly supported function with a continuous derivative, put δ_i = L(X_i/h), and define t = x/h and a(t) = ∫ K″(u) L(t + u) du. Then (5.6) holds and

    n^{−1} Σ_i δ_i K″{(x − X_i)/h} = h a(t) f(x) + o_p(h_0)

uniformly in x. Furthermore, (3.8) follows from the construction of the data sharpening algorithm. It remains to show that L can be chosen such that f̂_Y is unimodal. To this end, observe that E{f̂′_X(x)} = f′(x) + o(h) = h t f″(0) + o(h) and f(x) = f(0) + o(1), uniformly in x such that |x| ≤ C_1 h, for any C_1 > 0. Combining (5.7) and the results in this paragraph we see that

    f̂′_Y(x) = h{t f″(0) + a(t) f(0)} + (nh^3)^{−1/2} U(x) + o_p(h_0),   uniformly in |x| ≤ C_1 h;
    f̂′_Y(x) = E{f̂′_X(x)} + h a(t) f(x) + (nh^3)^{−1/2} U(x) + o_p(h_0),   uniformly in x.    (5.8)

Note too that for any C_2 > 0,

    sup_{|x| ≤ y} |U(x)| / max[1, {log_+(y/h)}^{1/2}] = O_p(1),    (5.9)

uniformly in 0 ≤ y ≤ C_2. Since K′(0) = 0 and K′ ≤ 0 on the positive real line, and L has a bounded derivative, then

    a(t) = −∫_0^∞ {L′(t + u) − L′(t − u)} K′(u) du.

Let the support of K be [−C_3, C_3], let C_4 > 0 be much larger than C_3, and let C_5 > 0 be much larger than C_4. Given −∞ < A_1 < ∞ and A_2 < 0, choose L such that L(u) = A_1 u^2/4 + A_2 u^3/12 for u ∈ [−C_4, C_4], and L redescends very slowly, and with very small values of the first three derivatives, to 0 to the left of −C_4 and to the right of C_4, vanishing outside [−C_5, C_5]. Put B_j = −A_j ∫_{u>0} u K′(u) du. Then a(t) = B_1 + B_2 t on [−C_4 + C_3, C_4 − C_3], and a redescends slowly to 0 to the left of −C_4 + C_3 and to the right of C_4 − C_3, vanishing outside [−C_5 − C_3, C_5 + C_3].

As |A_2| increases, so too does the curvature of f̂_Y near its global maximum, modulo the effect of the additive noise term (nh^3)^{−1/2} U(x) at (5.8). However, in neighbourhoods of width O(h) of the origin the noise is of the same order as the gradient of f̂_Y; but the size of the noise does not depend on A_2, whereas the absolute value of the gradient increases with |A_2|. Arguing thus, it follows from (5.8) and (5.9) that if η > 0 is given, then A_1, A_2, C_4 and C_5 may be chosen such that the probability that f̂_Y is unimodal on (−ε, ε) exceeds 1 − η for all sufficiently large n. Taking L to be a random function, with O_p(1) bounds applying to L and the support of L, we deduce that the probability that f̂_Y is unimodal converges to 1 as n → ∞. This is sufficient to ensure existence of the claimed algorithm A_1; in cases where this particular construction does not produce a unimodal f̂_Y we instead translate all data to a single point and rely on the unimodality of K to ensure that of f̂_Y. Note too that f̂_Y = f̂_X outside an O_p(h)-neighbourhood of the origin.

5.3. Proof of Theorem 3.3

Let m_f < C < ∞, put β = 2/(α + 2), and let δ ∈ (0, (4/α) − β) be a small positive number. We take Y_i = X_i for C ≤ X_i ≤ ξ_1 ≡ Z_1 h_0^{−β}, where Z_1 = Z_1(n) denotes a random function of the data with the property

    lim_{ε→0, λ→∞} lim inf_{n→∞} P(ε < Z_1 < λ) = 1.    (5.10)

Data X_i ∈ (ξ_1, ξ_2], where ξ_2 = h_0^{δ−(4/α)}, will be sharpened, respectively, to Y_i = F^{−1}(i/n), while data X_i > ξ_2 will be sharpened in a manner that we shall describe three paragraphs below. We call ξ_1 and ξ_2 the first and second breakpoints, respectively.

The size of the first breakpoint is that of values x at which the order of the stochastic error of f̂′_X(x) is the same as the order of the quantity f′(x) being estimated; note that the bias of f̂′_X(x) is always of strictly smaller order than f′(x). To appreciate why the critical size is h_0^{−β}, note that the variance of f̂′_X(x) is asymptotic to v_n(x) ≡ h_0^2 x^{−α} as both n and x increase, in the sense that the ratio of the variance and v_n(x) is bounded away from zero and infinity. Moreover, |f′(x)|/x^{−α−1} is bounded away from zero and infinity as x → ∞. Therefore the relative stochastic error of the estimator f̂′_X(x) of f′(x) is of order 1 for values x such that v_n(x) is of size x^{−2(α+1)}, i.e., for x such that x ≍ h_0^{−β}.

More concisely, using methods developed during the proof of Theorem 3.2, particularly the argument based on the Komlós-Major-Tusnády (1975) approximation, it may be proved that if Z_2 = Z_2(n) denotes the supremum of values z such that f̂′_X(x) < 0 for C ≤ x ≤ z h_0^{−β}, then (5.10) holds with Z_1 replaced there by Z_2. Therefore, at the very least, values of X_i that exceed Z_2 h_0^{−β} must be sharpened if the density estimator f̂_Y is to have negative gradient at all points to the right of C. We take Z_1 < Z_2 in order to effect a slight taper, to ensure the gradient remains negative on both sides of the first breakpoint. The size of the second breakpoint is chosen for technical reasons that will become clear in the proof at (5.15) below.

Tapering beyond the second breakpoint, used to ensure that the density estimator has negative gradient there, does not require any data value X_i to be moved by more than O_p(X_i), uniformly in i. For example it is sufficient, modulo minor tapering in the neighbourhood of ξ_2, to equally space the Y_i's to the right of ξ_2, using the spacing F^{−1}(i/n) − F^{−1}{(i − 1)/n}, where F^{−1}(i/n) is the supremum of values F^{−1}(j/n) ≤ ξ_2. The density of data at any point x, where x ≥ C, is by assumption bounded by a constant multiple of x^{−α}. Therefore the total contribution to h_0^2 D(X, Y) from sharpening those X_i's for which X_i > ξ_2 equals

    O_p(n h_0^2 ∫_{ξ_2}^∞ x · x^{−α} dx) = O_p(n h_0^2 ξ_2^{−α+2}) = o_p(1),    (5.11)

provided α > 8 and δ in the definition of ξ_2 is sufficiently small. The sum of the distances, multiplied by h_0^2, through which data X_i that lie between the first and second breakpoints are sharpened, equals

    h_0^2 Σ_{X_i ∈ (ξ_1, ξ_2]} |X_i − F^{−1}(i/n)| = O_p{n h_0^2 ∫_{ξ_1}^{ξ_2} (x^{−α}/n)^{1/2} x^{1/2} dx}.


More information

. If lim. x 2 x 1. f(x+h) f(x)

. If lim. x 2 x 1. f(x+h) f(x) Review of Differential Calculus Wen te value of one variable y is uniquely determined by te value of anoter variable x, ten te relationsip between x and y is described by a function f tat assigns a value

More information

Polynomial Interpolation

Polynomial Interpolation Capter 4 Polynomial Interpolation In tis capter, we consider te important problem of approximating a function f(x, wose values at a set of distinct points x, x, x 2,,x n are known, by a polynomial P (x

More information

4. The slope of the line 2x 7y = 8 is (a) 2/7 (b) 7/2 (c) 2 (d) 2/7 (e) None of these.

4. The slope of the line 2x 7y = 8 is (a) 2/7 (b) 7/2 (c) 2 (d) 2/7 (e) None of these. Mat 11. Test Form N Fall 016 Name. Instructions. Te first eleven problems are wort points eac. Te last six problems are wort 5 points eac. For te last six problems, you must use relevant metods of algebra

More information

LIMITS AND DERIVATIVES CONDITIONS FOR THE EXISTENCE OF A LIMIT

LIMITS AND DERIVATIVES CONDITIONS FOR THE EXISTENCE OF A LIMIT LIMITS AND DERIVATIVES Te limit of a function is defined as te value of y tat te curve approaces, as x approaces a particular value. Te limit of f (x) as x approaces a is written as f (x) approaces, as

More information

Numerical Differentiation

Numerical Differentiation Numerical Differentiation Finite Difference Formulas for te first derivative (Using Taylor Expansion tecnique) (section 8.3.) Suppose tat f() = g() is a function of te variable, and tat as 0 te function

More information

7 Semiparametric Methods and Partially Linear Regression

7 Semiparametric Methods and Partially Linear Regression 7 Semiparametric Metods and Partially Linear Regression 7. Overview A model is called semiparametric if it is described by and were is nite-dimensional (e.g. parametric) and is in nite-dimensional (nonparametric).

More information

Higher Derivatives. Differentiable Functions

Higher Derivatives. Differentiable Functions Calculus 1 Lia Vas Higer Derivatives. Differentiable Functions Te second derivative. Te derivative itself can be considered as a function. Te instantaneous rate of cange of tis function is te second derivative.

More information

New Distribution Theory for the Estimation of Structural Break Point in Mean

New Distribution Theory for the Estimation of Structural Break Point in Mean New Distribution Teory for te Estimation of Structural Break Point in Mean Liang Jiang Singapore Management University Xiaou Wang Te Cinese University of Hong Kong Jun Yu Singapore Management University

More information

HOMEWORK HELP 2 FOR MATH 151

HOMEWORK HELP 2 FOR MATH 151 HOMEWORK HELP 2 FOR MATH 151 Here we go; te second round of omework elp. If tere are oters you would like to see, let me know! 2.4, 43 and 44 At wat points are te functions f(x) and g(x) = xf(x)continuous,

More information

Differentiation in higher dimensions

Differentiation in higher dimensions Capter 2 Differentiation in iger dimensions 2.1 Te Total Derivative Recall tat if f : R R is a 1-variable function, and a R, we say tat f is differentiable at x = a if and only if te ratio f(a+) f(a) tends

More information

Lecture 21. Numerical differentiation. f ( x+h) f ( x) h h

Lecture 21. Numerical differentiation. f ( x+h) f ( x) h h Lecture Numerical differentiation Introduction We can analytically calculate te derivative of any elementary function, so tere migt seem to be no motivation for calculating derivatives numerically. However

More information

Precalculus Test 2 Practice Questions Page 1. Note: You can expect other types of questions on the test than the ones presented here!

Precalculus Test 2 Practice Questions Page 1. Note: You can expect other types of questions on the test than the ones presented here! Precalculus Test 2 Practice Questions Page Note: You can expect oter types of questions on te test tan te ones presented ere! Questions Example. Find te vertex of te quadratic f(x) = 4x 2 x. Example 2.

More information

Math 102 TEST CHAPTERS 3 & 4 Solutions & Comments Fall 2006

Math 102 TEST CHAPTERS 3 & 4 Solutions & Comments Fall 2006 Mat 102 TEST CHAPTERS 3 & 4 Solutions & Comments Fall 2006 f(x+) f(x) 10 1. For f(x) = x 2 + 2x 5, find ))))))))) and simplify completely. NOTE: **f(x+) is NOT f(x)+! f(x+) f(x) (x+) 2 + 2(x+) 5 ( x 2

More information

Chapter 4: Numerical Methods for Common Mathematical Problems

Chapter 4: Numerical Methods for Common Mathematical Problems 1 Capter 4: Numerical Metods for Common Matematical Problems Interpolation Problem: Suppose we ave data defined at a discrete set of points (x i, y i ), i = 0, 1,..., N. Often it is useful to ave a smoot

More information

Teaching Differentiation: A Rare Case for the Problem of the Slope of the Tangent Line

Teaching Differentiation: A Rare Case for the Problem of the Slope of the Tangent Line Teacing Differentiation: A Rare Case for te Problem of te Slope of te Tangent Line arxiv:1805.00343v1 [mat.ho] 29 Apr 2018 Roman Kvasov Department of Matematics University of Puerto Rico at Aguadilla Aguadilla,

More information

1 The concept of limits (p.217 p.229, p.242 p.249, p.255 p.256) 1.1 Limits Consider the function determined by the formula 3. x since at this point

1 The concept of limits (p.217 p.229, p.242 p.249, p.255 p.256) 1.1 Limits Consider the function determined by the formula 3. x since at this point MA00 Capter 6 Calculus and Basic Linear Algebra I Limits, Continuity and Differentiability Te concept of its (p.7 p.9, p.4 p.49, p.55 p.56). Limits Consider te function determined by te formula f Note

More information

Cubic Functions: Local Analysis

Cubic Functions: Local Analysis Cubic function cubing coefficient Capter 13 Cubic Functions: Local Analysis Input-Output Pairs, 378 Normalized Input-Output Rule, 380 Local I-O Rule Near, 382 Local Grap Near, 384 Types of Local Graps

More information

Volume 29, Issue 3. Existence of competitive equilibrium in economies with multi-member households

Volume 29, Issue 3. Existence of competitive equilibrium in economies with multi-member households Volume 29, Issue 3 Existence of competitive equilibrium in economies wit multi-member ouseolds Noriisa Sato Graduate Scool of Economics, Waseda University Abstract Tis paper focuses on te existence of

More information

CHAPTER (A) When x = 2, y = 6, so f( 2) = 6. (B) When y = 4, x can equal 6, 2, or 4.

CHAPTER (A) When x = 2, y = 6, so f( 2) = 6. (B) When y = 4, x can equal 6, 2, or 4. SECTION 3-1 101 CHAPTER 3 Section 3-1 1. No. A correspondence between two sets is a function only if eactly one element of te second set corresponds to eac element of te first set. 3. Te domain of a function

More information

Parameter Fitted Scheme for Singularly Perturbed Delay Differential Equations

Parameter Fitted Scheme for Singularly Perturbed Delay Differential Equations International Journal of Applied Science and Engineering 2013. 11, 4: 361-373 Parameter Fitted Sceme for Singularly Perturbed Delay Differential Equations Awoke Andargiea* and Y. N. Reddyb a b Department

More information

SECTION 1.10: DIFFERENCE QUOTIENTS LEARNING OBJECTIVES

SECTION 1.10: DIFFERENCE QUOTIENTS LEARNING OBJECTIVES (Section.0: Difference Quotients).0. SECTION.0: DIFFERENCE QUOTIENTS LEARNING OBJECTIVES Define average rate of cange (and average velocity) algebraically and grapically. Be able to identify, construct,

More information

Time (hours) Morphine sulfate (mg)

Time (hours) Morphine sulfate (mg) Mat Xa Fall 2002 Review Notes Limits and Definition of Derivative Important Information: 1 According to te most recent information from te Registrar, te Xa final exam will be eld from 9:15 am to 12:15

More information

Exam 1 Review Solutions

Exam 1 Review Solutions Exam Review Solutions Please also review te old quizzes, and be sure tat you understand te omework problems. General notes: () Always give an algebraic reason for your answer (graps are not sufficient),

More information

Local Orthogonal Polynomial Expansion (LOrPE) for Density Estimation

Local Orthogonal Polynomial Expansion (LOrPE) for Density Estimation Local Ortogonal Polynomial Expansion (LOrPE) for Density Estimation Alex Trindade Dept. of Matematics & Statistics, Texas Tec University Igor Volobouev, Texas Tec University (Pysics Dept.) D.P. Amali Dassanayake,

More information

2.8 The Derivative as a Function

2.8 The Derivative as a Function .8 Te Derivative as a Function Typically, we can find te derivative of a function f at many points of its domain: Definition. Suppose tat f is a function wic is differentiable at every point of an open

More information

Recall from our discussion of continuity in lecture a function is continuous at a point x = a if and only if

Recall from our discussion of continuity in lecture a function is continuous at a point x = a if and only if Computational Aspects of its. Keeping te simple simple. Recall by elementary functions we mean :Polynomials (including linear and quadratic equations) Eponentials Logaritms Trig Functions Rational Functions

More information

Chapter 5 FINITE DIFFERENCE METHOD (FDM)

Chapter 5 FINITE DIFFERENCE METHOD (FDM) MEE7 Computer Modeling Tecniques in Engineering Capter 5 FINITE DIFFERENCE METHOD (FDM) 5. Introduction to FDM Te finite difference tecniques are based upon approximations wic permit replacing differential

More information

Bootstrap prediction intervals for Markov processes

Bootstrap prediction intervals for Markov processes arxiv: arxiv:0000.0000 Bootstrap prediction intervals for Markov processes Li Pan and Dimitris N. Politis Li Pan Department of Matematics University of California San Diego La Jolla, CA 92093-0112, USA

More information

Math 161 (33) - Final exam

Math 161 (33) - Final exam Name: Id #: Mat 161 (33) - Final exam Fall Quarter 2015 Wednesday December 9, 2015-10:30am to 12:30am Instructions: Prob. Points Score possible 1 25 2 25 3 25 4 25 TOTAL 75 (BEST 3) Read eac problem carefully.

More information

REVIEW LAB ANSWER KEY

REVIEW LAB ANSWER KEY REVIEW LAB ANSWER KEY. Witout using SN, find te derivative of eac of te following (you do not need to simplify your answers): a. f x 3x 3 5x x 6 f x 3 3x 5 x 0 b. g x 4 x x x notice te trick ere! x x g

More information

Regularized Regression

Regularized Regression Regularized Regression David M. Blei Columbia University December 5, 205 Modern regression problems are ig dimensional, wic means tat te number of covariates p is large. In practice statisticians regularize

More information

Section 2.7 Derivatives and Rates of Change Part II Section 2.8 The Derivative as a Function. at the point a, to be. = at time t = a is

Section 2.7 Derivatives and Rates of Change Part II Section 2.8 The Derivative as a Function. at the point a, to be. = at time t = a is Mat 180 www.timetodare.com Section.7 Derivatives and Rates of Cange Part II Section.8 Te Derivative as a Function Derivatives ( ) In te previous section we defined te slope of te tangent to a curve wit

More information

Poisson Equation in Sobolev Spaces

Poisson Equation in Sobolev Spaces Poisson Equation in Sobolev Spaces OcMountain Dayligt Time. 6, 011 Today we discuss te Poisson equation in Sobolev spaces. It s existence, uniqueness, and regularity. Weak Solution. u = f in, u = g on

More information

Excursions in Computing Science: Week v Milli-micro-nano-..math Part II

Excursions in Computing Science: Week v Milli-micro-nano-..math Part II Excursions in Computing Science: Week v Milli-micro-nano-..mat Part II T. H. Merrett McGill University, Montreal, Canada June, 5 I. Prefatory Notes. Cube root of 8. Almost every calculator as a square-root

More information

Quantum Theory of the Atomic Nucleus

Quantum Theory of the Atomic Nucleus G. Gamow, ZP, 51, 204 1928 Quantum Teory of te tomic Nucleus G. Gamow (Received 1928) It as often been suggested tat non Coulomb attractive forces play a very important role inside atomic nuclei. We can

More information

1. Questions (a) through (e) refer to the graph of the function f given below. (A) 0 (B) 1 (C) 2 (D) 4 (E) does not exist

1. Questions (a) through (e) refer to the graph of the function f given below. (A) 0 (B) 1 (C) 2 (D) 4 (E) does not exist Mat 1120 Calculus Test 2. October 18, 2001 Your name Te multiple coice problems count 4 points eac. In te multiple coice section, circle te correct coice (or coices). You must sow your work on te oter

More information

The Complexity of Computing the MCD-Estimator

The Complexity of Computing the MCD-Estimator Te Complexity of Computing te MCD-Estimator Torsten Bernolt Lerstul Informatik 2 Universität Dortmund, Germany torstenbernolt@uni-dortmundde Paul Fiscer IMM, Danisc Tecnical University Kongens Lyngby,

More information

MA455 Manifolds Solutions 1 May 2008

MA455 Manifolds Solutions 1 May 2008 MA455 Manifolds Solutions 1 May 2008 1. (i) Given real numbers a < b, find a diffeomorpism (a, b) R. Solution: For example first map (a, b) to (0, π/2) and ten map (0, π/2) diffeomorpically to R using

More information

Deconvolution problems in density estimation

Deconvolution problems in density estimation Deconvolution problems in density estimation Dissertation zur Erlangung des Doktorgrades Dr. rer. nat. der Fakultät für Matematik und Wirtscaftswissenscaften der Universität Ulm vorgelegt von Cristian

More information

5.1 We will begin this section with the definition of a rational expression. We

5.1 We will begin this section with the definition of a rational expression. We Basic Properties and Reducing to Lowest Terms 5.1 We will begin tis section wit te definition of a rational epression. We will ten state te two basic properties associated wit rational epressions and go

More information

Kernel Smoothing and Tolerance Intervals for Hierarchical Data

Kernel Smoothing and Tolerance Intervals for Hierarchical Data Clemson University TigerPrints All Dissertations Dissertations 12-2016 Kernel Smooting and Tolerance Intervals for Hierarcical Data Cristoper Wilson Clemson University, cwilso6@clemson.edu Follow tis and

More information

Financial Econometrics Prof. Massimo Guidolin

Financial Econometrics Prof. Massimo Guidolin CLEFIN A.A. 2010/2011 Financial Econometrics Prof. Massimo Guidolin A Quick Review of Basic Estimation Metods 1. Were te OLS World Ends... Consider two time series 1: = { 1 2 } and 1: = { 1 2 }. At tis

More information

EFFICIENT REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLING

EFFICIENT REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLING Statistica Sinica 13(2003), 641-653 EFFICIENT REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLING J. K. Kim and R. R. Sitter Hankuk University of Foreign Studies and Simon Fraser University Abstract:

More information

LIMITATIONS OF EULER S METHOD FOR NUMERICAL INTEGRATION

LIMITATIONS OF EULER S METHOD FOR NUMERICAL INTEGRATION LIMITATIONS OF EULER S METHOD FOR NUMERICAL INTEGRATION LAURA EVANS.. Introduction Not all differential equations can be explicitly solved for y. Tis can be problematic if we need to know te value of y

More information

Click here to see an animation of the derivative

Click here to see an animation of the derivative Differentiation Massoud Malek Derivative Te concept of derivative is at te core of Calculus; It is a very powerful tool for understanding te beavior of matematical functions. It allows us to optimize functions,

More information

Journal of Computational and Applied Mathematics

Journal of Computational and Applied Mathematics Journal of Computational and Applied Matematics 94 (6) 75 96 Contents lists available at ScienceDirect Journal of Computational and Applied Matematics journal omepage: www.elsevier.com/locate/cam Smootness-Increasing

More information

Pre-Calculus Review Preemptive Strike

Pre-Calculus Review Preemptive Strike Pre-Calculus Review Preemptive Strike Attaced are some notes and one assignment wit tree parts. Tese are due on te day tat we start te pre-calculus review. I strongly suggest reading troug te notes torougly

More information

A Jump-Preserving Curve Fitting Procedure Based On Local Piecewise-Linear Kernel Estimation

A Jump-Preserving Curve Fitting Procedure Based On Local Piecewise-Linear Kernel Estimation A Jump-Preserving Curve Fitting Procedure Based On Local Piecewise-Linear Kernel Estimation Peiua Qiu Scool of Statistics University of Minnesota 313 Ford Hall 224 Curc St SE Minneapolis, MN 55455 Abstract

More information

232 Calculus and Structures

232 Calculus and Structures 3 Calculus and Structures CHAPTER 17 JUSTIFICATION OF THE AREA AND SLOPE METHODS FOR EVALUATING BEAMS Calculus and Structures 33 Copyrigt Capter 17 JUSTIFICATION OF THE AREA AND SLOPE METHODS 17.1 THE

More information

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics 1

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics 1 Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics 1 By Jiti Gao 2 and Maxwell King 3 Abstract We propose a simultaneous model specification procedure for te conditional

More information

Linearized Primal-Dual Methods for Linear Inverse Problems with Total Variation Regularization and Finite Element Discretization

Linearized Primal-Dual Methods for Linear Inverse Problems with Total Variation Regularization and Finite Element Discretization Linearized Primal-Dual Metods for Linear Inverse Problems wit Total Variation Regularization and Finite Element Discretization WENYI TIAN XIAOMING YUAN September 2, 26 Abstract. Linear inverse problems

More information

An Empirical Bayesian interpretation and generalization of NL-means

An Empirical Bayesian interpretation and generalization of NL-means Computer Science Tecnical Report TR2010-934, October 2010 Courant Institute of Matematical Sciences, New York University ttp://cs.nyu.edu/web/researc/tecreports/reports.tml An Empirical Bayesian interpretation

More information

Mathematics 5 Worksheet 11 Geometry, Tangency, and the Derivative

Mathematics 5 Worksheet 11 Geometry, Tangency, and the Derivative Matematics 5 Workseet 11 Geometry, Tangency, and te Derivative Problem 1. Find te equation of a line wit slope m tat intersects te point (3, 9). Solution. Te equation for a line passing troug a point (x

More information

Math 1210 Midterm 1 January 31st, 2014

Math 1210 Midterm 1 January 31st, 2014 Mat 110 Midterm 1 January 1st, 01 Tis exam consists of sections, A and B. Section A is conceptual, wereas section B is more computational. Te value of every question is indicated at te beginning of it.

More information

Continuity and Differentiability Worksheet

Continuity and Differentiability Worksheet Continuity and Differentiability Workseet (Be sure tat you can also do te grapical eercises from te tet- Tese were not included below! Typical problems are like problems -3, p. 6; -3, p. 7; 33-34, p. 7;

More information

Notes on wavefunctions II: momentum wavefunctions

Notes on wavefunctions II: momentum wavefunctions Notes on wavefunctions II: momentum wavefunctions and uncertainty Te state of a particle at any time is described by a wavefunction ψ(x). Tese wavefunction must cange wit time, since we know tat particles

More information

Introduction to Derivatives

Introduction to Derivatives Introduction to Derivatives 5-Minute Review: Instantaneous Rates and Tangent Slope Recall te analogy tat we developed earlier First we saw tat te secant slope of te line troug te two points (a, f (a))

More information

IEOR 165 Lecture 10 Distribution Estimation

IEOR 165 Lecture 10 Distribution Estimation IEOR 165 Lecture 10 Distribution Estimation 1 Motivating Problem Consider a situation were we ave iid data x i from some unknown distribution. One problem of interest is estimating te distribution tat

More information

Integral Calculus, dealing with areas and volumes, and approximate areas under and between curves.

Integral Calculus, dealing with areas and volumes, and approximate areas under and between curves. Calculus can be divided into two ke areas: Differential Calculus dealing wit its, rates of cange, tangents and normals to curves, curve sketcing, and applications to maima and minima problems Integral

More information

3.1 Extreme Values of a Function

3.1 Extreme Values of a Function .1 Etreme Values of a Function Section.1 Notes Page 1 One application of te derivative is finding minimum and maimum values off a grap. In precalculus we were only able to do tis wit quadratics by find

More information

CDF and Survival Function Estimation with Infinite-Order Kernels

CDF and Survival Function Estimation with Infinite-Order Kernels CDF and Survival Function Estimation wit Infinite-Order Kernels Artur Berg and Dimitris N. Politis Abstract An improved nonparametric estimator of te cumulative distribution function CDF) and te survival

More information

Nonparametric regression on functional data: inference and practical aspects

Nonparametric regression on functional data: inference and practical aspects arxiv:mat/0603084v1 [mat.st] 3 Mar 2006 Nonparametric regression on functional data: inference and practical aspects Frédéric Ferraty, André Mas, and Pilippe Vieu August 17, 2016 Abstract We consider te

More information

New families of estimators and test statistics in log-linear models

New families of estimators and test statistics in log-linear models Journal of Multivariate Analysis 99 008 1590 1609 www.elsevier.com/locate/jmva ew families of estimators and test statistics in log-linear models irian Martín a,, Leandro Pardo b a Department of Statistics

More information

Kernel Density Estimation

Kernel Density Estimation Kernel Density Estimation Univariate Density Estimation Suppose tat we ave a random sample of data X 1,..., X n from an unknown continuous distribution wit probability density function (pdf) f(x) and cumulative

More information

Neutron transmission probability through a revolving slit for a continuous

Neutron transmission probability through a revolving slit for a continuous Neutron transmission probability troug a revolving slit for a continuous source and a divergent neutron beam J. Peters*, Han-Meitner-Institut Berlin, Glienicer Str. 00, D 409 Berlin Abstract: Here an analytical

More information

5 Ordinary Differential Equations: Finite Difference Methods for Boundary Problems

5 Ordinary Differential Equations: Finite Difference Methods for Boundary Problems 5 Ordinary Differential Equations: Finite Difference Metods for Boundary Problems Read sections 10.1, 10.2, 10.4 Review questions 10.1 10.4, 10.8 10.9, 10.13 5.1 Introduction In te previous capters we

More information

Estimation of boundary and discontinuity points in deconvolution problems

Estimation of boundary and discontinuity points in deconvolution problems Estimation of boundary and discontinuity points in deconvolution problems A. Delaigle 1, and I. Gijbels 2, 1 Department of Matematics, University of California, San Diego, CA 92122 USA 2 Universitair Centrum

More information

Taylor Series and the Mean Value Theorem of Derivatives

Taylor Series and the Mean Value Theorem of Derivatives 1 - Taylor Series and te Mean Value Teorem o Derivatives Te numerical solution o engineering and scientiic problems described by matematical models oten requires solving dierential equations. Dierential

More information

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA EXAMINATION MODULE 5

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA EXAMINATION MODULE 5 THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA EXAMINATION NEW MODULAR SCHEME introduced from te examinations in 009 MODULE 5 SOLUTIONS FOR SPECIMEN PAPER B THE QUESTIONS ARE CONTAINED IN A SEPARATE FILE

More information