A robust, adaptive M-estimator for pointwise estimation in heteroscedastic regression


Bernoulli 20(3), 2014. DOI: 10.3150/13-BEJ533

A robust, adaptive M-estimator for pointwise estimation in heteroscedastic regression

MICHAËL CHICHIGNOUD* and JOHANNES LEDERER**

Seminar for Statistics, ETH Zürich, Rämistrasse 101, CH-8092 Zürich, Switzerland.
*chichignoud@stat.math.ethz.ch; **lederer@stat.math.ethz.ch

We introduce a robust and fully adaptive method for pointwise estimation in heteroscedastic regression. We allow for noise and design distributions that are unknown and fulfill very weak assumptions only. In particular, we do not impose moment conditions on the noise distribution. Moreover, we do not require a positive density for the design distribution. In a first step, we study the consistency of locally polynomial M-estimators that consist of a contrast and a kernel. Afterwards, minimax results are established over unidimensional Hölder spaces for degenerate design. We then choose the contrast and the kernel that minimize an empirical variance term and demonstrate that the corresponding M-estimator is adaptive with respect to the noise and design distributions and adaptive (Huber) minimax for contamination models. In a second step, we additionally choose a data-driven bandwidth via Lepski's method. This leads to an M-estimator that is adaptive with respect to the noise and design distributions and, additionally, adaptive with respect to the smoothness of an isotropic, multivariate, locally polynomial target function. These results are also extended to anisotropic, locally constant target functions. Our data-driven approach provides, in particular, a level of robustness that adapts to the noise, contamination, and outliers.

Keywords: adaptation; Huber contrast; Lepski's method; M-estimation; minimax estimation; nonparametric regression; pointwise estimation; robust estimation

1. Introduction

We introduce a new method for pointwise estimation in heteroscedastic regression that is adaptive with respect to the model, in particular, with respect to the noise and the design distribution (D-adaptive) and the smoothness of the regression function (S-adaptive).

Let us first briefly summarize the related literature. First, the seminal paper [14] contains a proof of the asymptotic normality of M-estimators for the location parameter in regular models. Furthermore, the series of papers [30–33] provide minimax results for nonparametric regression. More recently, a block median method was used in [5] to prove the asymptotic equivalence between Gaussian regression and homoscedastic regression for deterministic designs and possibly heavy-tailed noises. Using a blockwise Stein's method with wavelets, this leads to an S-adaptive estimator that is adaptive optimal over Besov spaces with respect to the $L_2$-risk and adaptive optimal over isotropic Hölder classes with respect to the pointwise risk. Moreover, using an estimate of the noise density at 0 and a plug-in method, this also leads to a D-adaptive estimator. However, in contrast to this paper, only homoscedastic regression is considered, and multivariate regression functions, in particular anisotropic functions, are not allowed for. Next, a modified version of Lepski's method was applied to homoscedastic regression in [27]. Finally, local M-estimators, also for regression models with degenerate designs, were intensively studied in the

case of Gaussian regression: S-adaptivity results of a local least squares estimator were derived in [8], sup-norm S-minimax results were established in [9], and the effect of degenerate designs on the $L_2$-norm was investigated with wavelet-type estimators in [1]. However, in contrast to this paper, pointwise estimation with random, possibly degenerate designs and heteroscedastic, possibly heavy-tailed noises has not been included.

What is the main idea behind our approach? Consider the estimation of $t_0 \in \mathbb R$ in the translation model $Y \sim g(\cdot - t_0)$ for a probability density $g$. The M-estimator $\hat t$ of $t_0$ corresponding to the contrast $\rho(\cdot)$ and the sample $Y_1,\dots,Y_n$ of $Y$ is then
$$\hat t := \arg\min_t \sum_{i=1}^n \rho(Y_i - t).$$
It holds that (see [14–16])
$$\sqrt n\,(\hat t - t_0) \xrightarrow{\ \mathcal L\ } \mathcal N(0, \mathrm{AV}), \qquad \text{where } \mathrm{AV} := \frac{\int (\rho')^2\,\mathrm dG}{\big(\int \rho''\,\mathrm dG\big)^2}, \tag{1.1}$$
$G$ is the distribution of $Y - t_0$, $\rho'(\cdot)$ and $\rho''(\cdot)$ are the first and second derivatives of the contrast $\rho(\cdot)$, and $\xrightarrow{\mathcal L}$ indicates convergence in law. In other words, $\hat t$ is asymptotically normal with asymptotic variance AV. This result suggests that an optimal estimator is obtained by minimizing the asymptotic variance. Moreover, the Cramér–Rao inequality and (see [14])
$$\inf_\rho \frac{\int (\rho')^2\,\mathrm dG}{\big(\int \rho''\,\mathrm dG\big)^2} = \big(I(G)\big)^{-1}, \tag{1.2}$$
where $I(\cdot)$ is the Fisher information and the infimum is taken over all twice differentiable contrasts, imply that this M-estimator is efficient.

Huber proposed in [14], Proposal 3, to minimize an estimate of the above asymptotic variance (since the distribution $G$ is not available in practice) over the family of Huber contrasts (their definition is given below). He also conjectured that the corresponding estimator is minimax for certain contamination models (for more details, see Section A.1 in the arXiv version). More recently, in [2], an M-estimator with a contrast that minimizes an estimate of the asymptotic variance was introduced for the parametric model, its asymptotic normality was proved, and especially Huber contrasts indexed by their scale and a family of $\ell_p$ losses were considered.

In a first step, we derive general properties of M-estimators such as pointwise risk bounds. This includes, in particular, S-minimax results for degenerate designs and allows us to recover results in [7] (see Theorem 1 and Remark 1). In a second step, we then consider a local M-estimator that consists of a contrast and a kernel that minimize an estimate of the variance and show, in particular, that this estimator mimics the oracle, which minimizes the true variance. Our data-driven approach can be used, for example, for the selection of the scale of the Huber contrast with an adaptive robustness with respect to outliers, or for the selection of a suitable (even noncentered or nonconvex) support that takes a maximal number of points around $x_0$ into account (cf. [12] for the latter objective). Finally, we show that our estimator is, under some restrictions on the design and the noise level (see Condition 3), D-adaptive for various sets of contrasts and kernels with finite entropy.
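For concreteness, the following is a minimal numerical sketch of Huber's Proposal 3 in the translation model: for each scale $\gamma$ on a grid, compute the location M-estimate and the plug-in estimate of the asymptotic variance (1.1), and keep the scale with the smallest estimated variance. All function names, the grid, and the simulated data are ours and purely illustrative; this is not the procedure of the paper, which is developed for the regression model below.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def huber_rho(z, gamma):
    """Huber contrast: quadratic for |z| <= gamma, linear beyond (cf. (2.2))."""
    a = np.abs(z)
    return np.where(a <= gamma, z**2 / 2, gamma * (a - gamma / 2))

def location_m_estimate(y, gamma):
    """hat-t = argmin_t sum_i rho(Y_i - t)."""
    crit = lambda t: huber_rho(y - t, gamma).sum()
    return minimize_scalar(crit, bounds=(y.min(), y.max()), method="bounded").x

def estimated_asymptotic_variance(y, gamma):
    """Plug-in estimate of AV in (1.1):
    mean[rho'(Y - hat_t)^2] / mean[rho''(Y - hat_t)]^2."""
    r = y - location_m_estimate(y, gamma)
    psi = np.clip(r, -gamma, gamma)                 # rho' of the Huber contrast
    psi_prime = (np.abs(r) <= gamma).astype(float)  # rho'' (defined a.e.)
    return np.mean(psi**2) / np.mean(psi_prime)**2

rng = np.random.default_rng(0)
y = 1.0 + rng.standard_t(df=2, size=500)   # heavy-tailed noise around t0 = 1
gammas = np.geomspace(0.1, 10.0, 25)
av = [estimated_asymptotic_variance(y, g) for g in gammas]
gamma_hat = gammas[int(np.argmin(av))]     # scale minimizing the estimated AV
print(gamma_hat, location_m_estimate(y, gamma_hat))
```

For heavy-tailed samples, small scales (behaving like the median) tend to win; for nearly Gaussian samples, large scales (behaving like the mean) do, which is exactly the adaptive robustness referred to above.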

We finally study simultaneous D- and S-adaptation for anisotropic target functions. In a first step, we study the case of isotropic target functions, where the standard Lepski's method (see [23,24]) can be applied. To this end, we assume that the variance of the estimator is decreasing with respect to the bandwidth and plug in an estimate of the minimal variance for the D-adaptation to apply Lepski's method for the S-adaptation (see Section 4.1). This yields the first estimator in heteroscedastic regression with random designs and heavy-tailed noise distributions that is simultaneously D- and S-adaptive and optimal in a sense described later. Furthermore, we note that applications of Lepski's method to nonlinear estimators are still nonstandard and can only be found in a small number of examples in the literature [6,26,27]. In a next step, we extend our results to anisotropic target functions. For this, we restrict ourselves to locally constant target functions and homoscedastic regression with uniform design and apply a modification of Lepski's method given in [19,22] to construct an optimal, simultaneously S- and D-adaptive estimator. This is the first application of Lepski's method to nonlinear estimators of anisotropic target functions and yields a selection of an anisotropic bandwidth, which is of great interest for applications in the context of image denoising (cf. [17]), for example.

Although we consider estimation problems, our approach may also be useful for inference, for example, for the construction of confidence bands. While confidence bands for parametric estimation are derived from central limit theorems (see (1.1)), confidence bands for nonparametric regression are especially desired to be adaptive with respect to the smoothness of the target function. The construction of such S-adaptive confidence bands is more difficult than in the parametric case (see [13]), but since Lepski-type procedures have already been used in this context (see [10], Theorem 1 and Corollary 1), we expect that our approach may be useful for the construction of S-adaptive confidence bands for regression with possibly heavy-tailed noises (see Section 5 for a discussion of some technical aspects). Eventually, if for example the smoothness is known, our approach may be used, plugging an estimate of the variance into the confidence band, to obtain D-adaptive confidence bands, which are, in particular, adaptive with respect to the design and the noise distributions.

The structure of this paper is as follows: In the following section, we first introduce an estimator which satisfies a risk bound (see Theorem 1). From this, S-minimax results are deduced over Hölder spaces (see Corollaries 1, 2 and 3). We then provide a choice for the contrast and the kernel (see Theorem 2) via the minimization of a nonasymptotic variance. Then, we provide a choice for the bandwidth for isotropic, locally polynomial target functions (see Theorem 3) and for anisotropic, locally constant target functions (see Theorem 4). After this, we give a discussion of our assumptions and an outlook in Section 5. The proofs are finally conducted in Section 6 and in the Appendix. For conciseness, only the crucial proofs are presented here. For the remaining proofs and more details, in particular, on the parametric model and on a comparison to classical results, we refer to the longer version available on arXiv and the webpages of the authors.

2. Preliminary definitions and results

In this section, we give some preliminary definitions and results. After specifying the model, we introduce a first estimator, and then we present a risk bound and S-minimax properties of this estimator.

Let us first specify the model. The observations $(X_i, Y_i)_{i=1,\dots,n}$ satisfy the set of equations
$$Y_i = f(X_i) + \sigma(X_i)\,\xi_i, \qquad i = 1,\dots,n, \tag{2.1}$$
and are distributed according to the probability measure $\mathbb P := \mathbb P^{(n)}_f$ with associated expectation $\mathbb E := \mathbb E^{(n)}_f$. We aim at estimating the target function $f : [0,1]^d \to [-M,M]$ (for $M>0$) at a given point $x_0 \in (0,1)^d$. The target function is assumed to be smooth; more specifically, it is assumed to belong to a Hölder class (see Definition 4 below). The target function is obscured by the second part of the above model, the noise. The noise variables $(\xi_i)_{i\in\{1,\dots,n\}}$ are assumed to be distributed independently according to the densities $g_i(\cdot)$ with respect to the Lebesgue measure on $\mathbb R$. The noise densities $g_i(\cdot)$ may be unknown but are assumed to be symmetric. We stress that we do not impose, unlike in the literature on the median (cf. [5]), any moment assumptions on the noise, and we do not require that the noise densities are positive at 0. We postpone the detailed discussion of the assumptions to the end of the next section. The noise level $\sigma : [0,1]^d \to [0,\infty)$ is assumed to be bounded, but may also be unknown. Usually, the noise level is the variance of the noise; however, this is not the case if the noise distributions do not have any moments, for example. Finally, the design points $(X_i)_{i\in\{1,\dots,n\}}$ are assumed to be distributed independently and identically according to the density $\mu(\cdot)$ with respect to the Lebesgue measure on $\mathbb R^d$. We assume that $\mu(\cdot)$ vanishes at most at finitely many points. For ease of exposition, we also assume that $(X_i)_{i\in\{1,\dots,n\}}$ and $(\xi_i)_{i\in\{1,\dots,n\}}$ are mutually independent.

Next, we introduce an estimator of $f(x_0)$ with a local polynomial approach (LPA) for a fixed bandwidth, a fixed kernel, and a fixed contrast. The key idea of the LPA, as described for example in [18] or in [34], Chapter 1, is to approximate the target function in a neighborhood of size $h \in (0,1]^d$ of a given point $x_0$ by a polynomial. To start, we define for a fixed $m \in \mathbb N$ the set $P := \{p = (p_1,\dots,p_d) \in \mathbb N^d : 0 \le |p| \le m\}$ with $|p| = p_1 + \cdots + p_d$ and denote its cardinality by $|P|$. The cardinality $|P|$ is exponential in $d$ and enters the bounds derived below as a factor. For any multi-indexed column vector $t = (t_{(p_1,\dots,p_d)} \in \mathbb R : p \in P) \in \mathbb R^{|P|}$ and for any $x \in [0,1]^d$, we then define the desired polynomial as
$$P_t(x) := t^\top U\Big(\frac{x - x_0}{h}\Big) := \sum_{p\in P} t_p \Big(\frac{x - x_0}{h}\Big)^p.$$
Here, $z^p := z_1^{p_1}\cdots z_d^{p_d}$ for all $z \in \mathbb R^d$, and the division by $h$ is understood coordinate-wise. Next, for $M > 0$, we define $F_h := \{P_t : t \in [-M,M]^{|P|}\}$ as a set of polynomials of degree at most $m$. We now specify what we mean by a kernel and a contrast:

Definition 1. A function $K : \mathbb R^d \to [0,\infty)$ is called kernel (function) if it has the following properties:

1. $K(\cdot)$ has a (not necessarily symmetric) support which is a hypercube having edge length one and containing the origin;
2. $\|K\|_\infty < \infty$ and $\int K(x)\,\mathrm dx = 1$.

For ease of exposition, we set $|h| := \prod_{j=1}^d h_j$ and use the notation $K_h(\cdot) := K((\cdot - x_0)/h)/|h|$ at some points. Moreover, we define the neighborhood of $x_0$ of size $h$ as $V_h :=$

$\{x \in \mathbb R^d : K_h(x) > 0\}$ and assume for simplicity that the kernel is chosen such that $V_h \subseteq [0,1]^d$. Next, we specify what we mean by a contrast:

Definition 2. A function $\rho : \mathbb R \to [0,\infty)$ is called contrast (function) if it has the following properties:

1. $\rho(\cdot)$ is convex, symmetric, and $\rho(0) = 0$;
2. the derivative $\rho'(\cdot)$ of $\rho(\cdot)$ is 1-Lipschitz and bounded;
3. the second derivative $\rho''(\cdot)$ of $\rho(\cdot)$ is defined Lebesgue almost everywhere and is 1-Lipschitz with respect to the measure $\mathbb P$. Moreover, $\|\rho''\|_\infty \le 1$.

The constants in the Lipschitz condition and the boundedness condition in the last definition are set to 1 for ease of exposition only. Well-known contrasts are the Huber contrast (see [14]), for any scale $\gamma > 0$ and $z \in \mathbb R$,
$$\rho_{H,\gamma}(z) := \begin{cases} z^2/2, & \text{if } |z| \le \gamma,\\ \gamma\big(|z| - \gamma/2\big), & \text{otherwise}, \end{cases} \tag{2.2}$$
and the contrast induced by the arctan function (see [30])
$$\rho_{\mathrm{arc},\gamma}(z) := \gamma z \arctan(z/\gamma) - \frac{\gamma^2}{2}\ln\big(1 + z^2/\gamma^2\big). \tag{2.3}$$
Note that the square loss and the absolute loss do not satisfy the above definition. However, they can be mimicked by the Huber contrast with $\gamma$ small (median) and $\gamma$ large (mean).

Let us define, for any function $\zeta$, the empirical measure as $P_n\zeta := n^{-1}\sum_{i=1}^n \zeta(X_i, Y_i)$. We can now combine a kernel and a contrast to obtain the $\lambda$-LPA estimator $\hat f_\lambda(x_0)$ of $f(x_0)$ defined as:
$$\hat f_\lambda := \arg\min_{f\in F_h} P_n\lambda(f), \qquad\text{where } \lambda(f)(x,y) := \rho\big(y - f(x)\big)K_h(x) \tag{2.4}$$
for $x \in [0,1]^d$ and $y \in \mathbb R$. The coefficients of the estimated polynomial can be considered as estimators of the derivatives of the function $f$ at $x_0$. In this paper, however, we focus on the estimation of $f(x_0)$.
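To make the definitions concrete, the following sketch implements the two contrasts (2.2) and (2.3) and a locally constant ($m = 0$) version of the $\lambda$-LPA estimator (2.4) with an indicator kernel in dimension $d = 1$. It is an illustration under these simplifying assumptions, not the general locally polynomial procedure; the function names are ours.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def rho_huber(z, gamma):
    """Huber contrast (2.2)."""
    a = np.abs(z)
    return np.where(a <= gamma, z**2 / 2, gamma * (a - gamma / 2))

def rho_arctan(z, gamma):
    """Contrast induced by the arctan function (2.3); its derivative
    gamma * arctan(z / gamma) is bounded, as required by Definition 2."""
    return gamma * z * np.arctan(z / gamma) - (gamma**2 / 2) * np.log1p(z**2 / gamma**2)

def lpa_estimate(x, y, x0, h, rho, gamma):
    """Locally constant lambda-LPA estimate (2.4) at x0 for d = 1, with an
    indicator kernel so that K_h = K((. - x0)/h)/h."""
    w = (np.abs(x - x0) <= h / 2) / h               # K_h(X_i)
    crit = lambda t: np.sum(w * rho(y - t, gamma))  # P_n lambda(t), up to 1/n
    return minimize_scalar(crit, bounds=(y.min(), y.max()), method="bounded").x
```

Since both contrasts are convex in $t$, the one-dimensional minimization is well behaved; for $m \ge 1$ one would instead minimize over the coefficient vector $t \in [-M,M]^{|P|}$.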

2.1. A first risk bound

In this section, we present a risk bound for the estimator introduced above. This estimator involves, in particular, fixed contrasts, kernels, and bandwidths. To ease the presentation, we introduce some additional definitions. First, we define the best approximation of the target $f$ in $F_h$ as
$$f_0 := \arg\min\Big\{\sup_{x\in V_h}\big|\bar f(x) - f(x)\big| : \bar f \in F_h,\ \bar f(x_0) = f(x_0)\Big\} \tag{2.5}$$
and the associated bias term as
$$b_h(F_h) := \sup_{x\in V_h}\big|f_0(x) - f(x)\big|. \tag{2.6}$$
The minimum is not necessarily unique, but all minimizers work for our derivations. We then fix a multi-indexed vector $t^0 = (t^0_{(p_1,\dots,p_d)})_{p\in P}$ such that $P_{t^0} = f_0$.

We recall that the entropy with bracketing of a set of functions $A$ for a given radius $u > 0$ with respect to a (pseudo)metric $\mathrm d$ is the logarithm of the minimal number of pairs of functions $(f_1^{(j)}, f_2^{(j)}) \in A \times A$ such that for any $f \in A$, there is a couple $(f_1^{(j)}, f_2^{(j)})$ with $f_1^{(j)} \le f \le f_2^{(j)}$ and $\mathrm d(f_1^{(j)}, f_2^{(j)}) \le u$. Here, in particular, $H_F(\cdot)$ denotes the entropy with bracketing of $F_h$ with respect to the pseudometric
$$\sqrt{\mathbb E\big[P_n\big(\lambda'(f_1) - \lambda'(f_2)\big)\big]^2}, \qquad f_1, f_2 \in F_h,$$
where
$$\lambda'(f)(x,y) := \rho'\big(y - f(x)\big)K_h(x) \qquad\text{for } x\in[0,1]^d \text{ and } y\in\mathbb R. \tag{2.7}$$
The entropy $H_F(\cdot)$ cannot be calculated if the probability law is unknown. However, it can be upper bounded invoking an upper bound for the pseudometric. For this, one may use that
$$\sqrt{\mathbb E\big[P_n\big(\lambda'(f_1) - \lambda'(f_2)\big)\big]^2} \le \|K_h\|_\infty \big\|t^{(1)} - t^{(2)}\big\|_1$$
due to the continuity of $\rho'(\cdot)$ and the definition of $F_h$. Here, $t^{(1)}$ and $t^{(2)} \in [-M,M]^{|P|}$ are such that $P_{t^{(1)}} = f_1$ and $P_{t^{(2)}} = f_2$, respectively. Therefore, the entropy $H_F(\cdot)$ can be bounded by $|P|$ times the entropy of $[-M,M]$ with respect to the Euclidean distance multiplied by $\|K_h\|_\infty$ (this is, in particular, independent of $n$).

As a next step, we introduce the condition under which we derive the risk bound.

Condition 1. Let $\rho(\cdot)$ be a contrast, $K(\cdot)$ a kernel, $n \in \{1,2,\dots\}$, and $h \in (0,1]^d$. We say that Condition 1 is satisfied if the smallest eigenvalue $\varkappa_h$ of the matrix
$$\mathcal M_h := \frac1n\sum_{i=1}^n \mathbb E\Big[U\Big(\frac{X_i - x_0}{h}\Big)U^\top\Big(\frac{X_i - x_0}{h}\Big)\rho''\big(\sigma(X_i)\xi_i\big)K_h(X_i)\Big]$$
is positive, $n|h| \ge 1$, and, defining
$$\delta_h := \frac{2|P|^2}{\varkappa_h\,\mathbb E[K_h(X)]}\bigg[b_h(F_h) + 54\Big(\|\rho'\|_\infty\sqrt{\mathbb E\big[K_h^2(X)\big]} + \|K\|_\infty/\sqrt n\Big)\sqrt{\frac{\ln(n|P|)}{n}}\Big(\int_0^1 H_F^{1/2}(u)\,\mathrm du + H_F(1)\Big)\bigg],$$
that
$$4\big(b_h(F_h) + \delta_h\big) \le \inf_{x\in V_h}\frac1n\sum_{i=1}^n\mathbb E\big[\rho''\big(\sigma(x)\xi_i\big)\big]. \tag{2.8}$$

Condition 1 can be interpreted in the following sense: $n$ must be sufficiently large and $h$ appropriate for the setting under consideration. In particular, $h$, as a function of $n$, is usually

chosen such that $h \to 0$ and $n|h| \to \infty$ as $n\to\infty$ to satisfy Condition 1. We postpone a detailed discussion of Condition 1 to after the main result of this section.

The variance term of the estimator is crucial for the following. To state it explicitly, we need to introduce some more notation: First, we introduce $\lambda''$ (similarly as $\lambda'$ in (2.7)) as
$$\lambda''(f)(x,y) := \rho''\big(y - f(x)\big)K_h(x) \qquad\text{for } x\in[0,1]^d \text{ and } y\in\mathbb R.$$
We then introduce the crucial quantity
$$V(\lambda) := \frac{\mathbb E\big[P_n\lambda'(f)\big]^2 + \|\rho'\|_\infty\|K\|_\infty\ln^2(n)/\sqrt{n|h|}}{\big(\mathbb E P_n\lambda''(f)\big)^2}. \tag{2.9}$$
We call it nonasymptotic variance, since it plays the role of the variance in the risk bounds in the theorems below. From Condition 1 and Definitions 1 and 2, we conclude that $V(\lambda) < \infty$. The term $\|\rho'\|_\infty\|K\|_\infty\ln^2(n)/\sqrt{n|h|}$ depends on $h$ and $n$. However, the bandwidth is typically chosen such that $n|h| \to \infty$ for $n \to \infty$, so that this term vanishes asymptotically. Additionally, besides the normalization in front of the first term, a dependence on $h$ is given through $\lambda$. We will discuss this after giving the main result of this section. If $h = (1,\dots,1)$ (parametric case), the nonasymptotic variance $V(\lambda)$ tends towards the asymptotic variance $\mathrm{AV}(\lambda)$ defined in (1.1) as $n \to \infty$.

The main result of this section reads as the following.

Theorem 1. Let $\lambda$ be as in (2.4), $n\in\{1,2,\dots\}$, and $h\in(0,1]^d$ such that Condition 1 is satisfied. Then, for all $q \ge 1$,
$$\mathbb E\big|\hat f_\lambda(x_0) - f(x_0)\big|^q \le C_q\bigg[b_h(F_h) + \Big(\int_0^1 H_F^{1/2}(u)\,\mathrm du + 4H_F(1)\Big)\sqrt{\frac{V(\lambda)\ln^2(n)}{n}}\bigg]^q + \frac{(2M)^q}{n^2}$$
for a constant $C_q$ ($C_q = 4^q|P|\,68^q\,\Gamma(q)$ works, where $\Gamma(\cdot)$ is the classical Gamma function).

The proof can be easily deduced integrating the result of Proposition 3 and using Proposition 2 (the propositions can be found in the Appendix; see also the more detailed arXiv version).

Remark 1. In contrast to Huber's asymptotic results (see [14] and also [2,30–33]), the above theorem holds for finite (but sufficiently large) sample sizes $n$. We note that the desired variance term $V(\lambda)$ is found up to constants, which are of minor interest for this paper. Moreover, a wide range of designs (including degenerate designs, e.g.) and noise levels (including zero noise, e.g.) is covered.

Let us compare this result to [7]: assume that $d = 1$ and the noise $(\sigma(X_i)\xi_i)_i$ is identically and independently normal distributed with variance $\sigma > 0$, and consider the local Huber estimator with $\rho(\cdot) = \rho_{H,\ln(n)}(\cdot)$ (see (2.2), where $\gamma = \ln(n)$) and the indicator kernel $K(\cdot) = \mathbf 1_{(-1/2,1/2]}(\cdot)$. As we mentioned above, the Huber estimator, with a large parameter $\gamma$, mimics

the local least squares estimator. Indeed, it holds that
$$\sqrt{\frac{V(\lambda)}{n}} \asymp \frac{\sigma}{\sqrt{n\int_{x_0 - h/2}^{x_0 + h/2}\mu(x)\,\mathrm dx}}.$$
The term on the right-hand side is the classical standard deviation of the local least squares estimator. Theorem 1 then implies the results of [7], Theorem 1 and Proposition 1, in the Gaussian case and extends them to heteroscedastic, heavy-tailed noises.

Remark 2. While the above bound is, to the best of our knowledge, already a new result, the final goal is to provide a specific $\lambda$ that minimizes this bound, since the second term $(2M)^q/n^2$ is negligible and since the bias term $b_h(F_h)$ is independent of $\lambda$. However, the bandwidth $h$, which accounts for the smoothness of the target function, is included in $V(\lambda)$. This makes simultaneous D- and S-adaptation difficult. The specific dependences of the numerator and the denominator on $h$ can be deduced from
$$\mathbb E\big[P_n\lambda'(f)\big]^2 = \int\!\!\int \mu(x)K_h^2(x)\big[\rho'\big(\sigma(x)z\big)\big]^2\, n^{-1}\sum_i g_i(z)\,\mathrm dz\,\mathrm dx \tag{2.10}$$
and
$$\mathbb E P_n\lambda''(f) = \int\!\!\int \mu(x)K_h(x)\,\rho''\big(\sigma(x)z\big)\, n^{-1}\sum_i g_i(z)\,\mathrm dz\,\mathrm dx. \tag{2.11}$$
We study this in detail in the following section for three examples.

Discussion of Condition 1. The condition $\varkappa_h > 0$ is fulfilled in many examples. Indeed, with a change of variables and by the definition of $K(\cdot)$, we obtain
$$\mathcal M_h = \frac1n\sum_{i=1}^n\mathbb E\Big[U\Big(\frac{X_i-x_0}{h}\Big)U^\top\Big(\frac{X_i-x_0}{h}\Big)\rho''\big(\sigma(X_i)\xi_i\big)K_h(X_i)\Big] = \int\!\!\int U(x)U^\top(x)\,\mu(x_0 + hx)K(x)\,\rho''\big(\sigma(x_0+hx)z\big)\,\frac1n\sum_i g_i(z)\,\mathrm dz\,\mathrm dx.$$
According to [34], Lemma 1.6, a sufficient condition for $\varkappa_h > 0$ is thus that
$$\int \mu(x_0 + hx)K(x)\,\rho''\big(\sigma(x_0+hx)z\big)\,\frac1n\sum_i g_i(z)\,\mathrm dz > 0 \tag{2.12}$$
for all $x$ in some set in the kernel support with positive Lebesgue measure. Recall that $\mu(x_0 + \cdot)$ is positive almost everywhere in the support of $K(\cdot)$, since $\mu(\cdot)$ vanishes only at finitely many points. The condition $\varkappa_h > 0$ is thus fulfilled if
$$\inf_{x\in V_h}\int \rho''\big(\sigma(x)z\big)\,\frac1n\sum_i g_i(z)\,\mathrm dz > 0. \tag{2.13}$$

This condition is satisfied, for example, for all densities $g_i(\cdot)$ and bounded $\sigma(\cdot)$ if the contrast function is strictly convex. This holds true for $\rho_{\mathrm{arc},\gamma}(\cdot)$ (see (2.3)). The Huber contrast $\rho_{H,\gamma}(\cdot)$ (see (2.2)), however, is strictly convex on the interval $(-\gamma,\gamma)$ only. It holds that $\rho''_{H,\gamma}(\cdot) = \mathbf 1_{[-\gamma,\gamma]}(\cdot)$; therefore, the densities $g_i(\cdot)$ have to satisfy the additional constraint
$$\inf_{x\in V_h}\int \mathbf 1_{[-\gamma,\gamma]}\big(\sigma(x)z\big)\,\frac1n\sum_i g_i(z)\,\mathrm dz > 0$$
to ensure $\varkappa_h > 0$ in this case. If we assume, for simplicity, that the noise level is constant ($\sigma(\cdot) \equiv \sigma > 0$), the last constraint simplifies to
$$\int_{-\gamma/\sigma}^{\gamma/\sigma}\frac1n\sum_{i=1}^n g_i(z)\,\mathrm dz > 0.$$
So even for the Huber contrast with a fixed $\gamma > 0$, the assumption $\varkappa_h > 0$ is weaker than the standard assumption in the literature of $g_i(\cdot)$ being positive and continuous in the origin for all $i \in \{1,2,\dots,n\}$.

For the other crucial part of the condition, we first note that for $h \to 0$, the quantity on the right-hand side of (2.8) tends to a positive constant if $\sigma(\cdot)$ is continuous in $x_0$. Indeed, since $\rho''$ is $\mathbb P$-continuous, it holds that
$$\inf_{x\in V_h}\frac1n\sum_{i=1}^n\mathbb E\big[\rho''\big(\sigma(x)\xi_i\big)\big] \longrightarrow \frac1n\sum_{i=1}^n\mathbb E\big[\rho''\big(\sigma(x_0)\xi_i\big)\big] \tag{2.14}$$
as $h \to 0$. Similarly as above, the quantity on the right-hand side can be lower bounded by a positive constant for many contrasts and noise densities.

We now give the rate for the quantity $\delta_h$. It holds that
$$\delta_h \lesssim \frac{1}{\mathbb E[K_h(X)]}\bigg[b_h(F_h)\,\mathbb E\big[K_h(X)\big] + \sqrt{\mathbb E\big[K_h^2(X)\big]}\,\frac{\ln(n)}{\sqrt n} + \|K\|_\infty\,\frac{\ln(n)}{n}\bigg],$$
which should be (cf. (2.8)) bounded by a constant. Here, $\lesssim$ indicates the asymptotic dependence on $n$. The above display corresponds to a so-called bias-variance decomposition up to a factor $\ln(n)$. We also note that if $f$ is continuous, as assumed in the standard literature, the bias term tends to zero as $h \to 0$. In the literature, one typically chooses some couple of positive constants $(\alpha_1, \alpha_2)$, depending on $\mu(\cdot)$ and $\sigma(\cdot)$, and some bandwidth $h = (h_1,\dots,h_d)$ such that
$$n^{-\alpha_1/d}\big(\ln(n)\big)^{\alpha_2} \le h_j \le \big(\ln(n)\big)^{-1} \qquad\text{for all } j = 1,\dots,d, \tag{2.15}$$
where we assume that $n$ is sufficiently large such that the above inequalities can hold. For appropriate $(\alpha_1,\alpha_2)$, Condition 1 is then satisfied for $n$ sufficiently large in many examples.

Example 1. If, for example, the design is uniform ($\mu(\cdot) \equiv 1$) and the noise level homoscedastic ($\sigma(\cdot) \equiv \sigma > 0$), it holds that $\varkappa_h \ge \mathrm{const} > 0$ and thus $\delta_h \lesssim b_h(F_h) + \ln(n)/\sqrt{n|h|}$. Choosing a bandwidth $h = h_n$ as in (2.15) with $\alpha_1 = 1$ and $\alpha_2 = 4$, Condition (2.8) is satisfied for $n$ sufficiently large.
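As a quick numerical check of the Huber constraint discussed earlier in this section (the mass of the noise density on $[-\gamma/\sigma, \gamma/\sigma]$ must be positive), the following sketch, with hypothetical helper names and a constant noise level assumed, evaluates that mass by a plain Riemann sum. A Cauchy density, which has no moments, poses no problem.

```python
import numpy as np

def huber_curvature_mass(gamma, sigma, density, lim=50.0, num=200001):
    """int rho''_{H,gamma}(sigma * z) g(z) dz = int_{-gamma/sigma}^{gamma/sigma} g(z) dz,
    which must be bounded away from 0 for Condition 1 with the Huber contrast."""
    z = np.linspace(-lim, lim, num)
    dz = z[1] - z[0]
    mask = np.abs(sigma * z) <= gamma   # rho''_{H,gamma} = 1_{[-gamma, gamma]}
    return float(np.sum(density(z) * mask) * dz)

cauchy = lambda z: 1.0 / (np.pi * (1.0 + z**2))   # heavy tails, no moments
print(huber_curvature_mass(gamma=1.345, sigma=1.0, density=cauchy))  # ~0.59 > 0
```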

Example 2. For degenerate designs, however, it is possible that $\varkappa_h \to 0$ as $h \to 0$. For example, let $d = 1$, the noise level be homoscedastic ($\sigma(\cdot) \equiv \sigma > 0$), and
$$\mu(\cdot) = \frac{s+1}{(1-x_0)^{s+1} + x_0^{s+1}}\,|\cdot - x_0|^s\,\mathbf 1_{[0,1]}(\cdot) \tag{2.16}$$
with $s > -1$ and $x_0 \in (0,1]$ (see [7]). The density explodes (for $s < 0$) or vanishes (for $s > 0$) at $x_0$, so that one will either have a lot or very little observations in the vicinity of $x_0$. This is reflected in $\delta_h$ (recall that $d = 1$ and thus $h \in (0,1]$):
$$\delta_h \lesssim b_h(F_h) + \frac{\ln(n)}{\sqrt{n h^{s+1}}}.$$
So, similarly as above, one may choose a bandwidth like in (2.15) with $\alpha_1 = 1/(s+1)$ and $\alpha_2 = 4/(s+1)$.

We finally note that the concrete form of Condition 1 is due to the application of deviation inequalities for bounded empirical processes. Similarly, we could relax the boundedness condition on the empirical processes involved to Bernstein conditions (see, e.g., [35]). This allows to incorporate unbounded contrasts such as the least squares contrast; the factor $\|\rho'\|_\infty$ in Condition 1 should then be replaced by the factor $\sqrt{\mathbb E[\rho'(\sigma(X)\xi)]^2}$, where $\xi$ would be a sub-Gaussian random variable.

2.2. S-minimax results

In this section, we deduce some corollaries adapted to simple examples from the above results. To start, we recall the notion of S-minimaxity. To this end, let $\tilde f(x_0)$ be an estimator of $f(x_0)$ and $S$ a set of functions. For any $q > 0$, we define the maximal risk of $\tilde f$ and the S-minimax risk for $x_0$ and $S$ as
$$R_{n,q}[\tilde f, S] := \sup_{f\in S}\mathbb E\big|\tilde f(x_0) - f(x_0)\big|^q \qquad\text{and}\qquad R_{n,q}[S] := \inf_{\tilde f} R_{n,q}[\tilde f, S], \tag{2.17}$$
respectively. The infimum on the right-hand side is taken over all estimators. We can now define the S-minimax rates of convergence and the (asymptotic) S-minimax estimators:

Definition 3. A sequence $\varphi_n$ is an S-minimax rate of convergence, and the estimator $\hat f$ is an (asymptotic) S-minimax estimator with respect to the set $S$ if
$$0 < \liminf_{n\to\infty}\varphi_n^{-q}R_{n,q}[S] \le \limsup_{n\to\infty}\varphi_n^{-q}R_{n,q}[\hat f, S] < \infty.$$

We can give some simple examples for one-dimensional target functions, that is, $d = 1$. We call $H_1(\beta, L, M)$ Hölder space, with parameters $\beta, L, M > 0$, the set of $\lfloor\beta\rfloor$-times differentiable functions $f : [0,1]\to\mathbb R$ such that $\|f^{(j)}\|_\infty \le M$ for all $j \in \{0,1,\dots,\lfloor\beta\rfloor\}$ and satisfying the Hölder continuity
$$\big|f^{(\lfloor\beta\rfloor)}(x) - f^{(\lfloor\beta\rfloor)}(y)\big| \le L|x - y|^{\beta - \lfloor\beta\rfloor} \qquad\text{for all } x,y\in[0,1].$$
The following corollary can now be easily deduced from Theorem 1.

Corollary 1. Consider the model in Example 1, that is, uniform design ($\mu(\cdot)\equiv 1$) and homoscedastic noise level ($\sigma(\cdot)\equiv\sigma>0$). Let $\beta$, $L$ and $M$ be positive parameters. Moreover, let $\hat f_\lambda$ be defined as in (2.4) with $m = \lfloor\beta\rfloor$, $h \asymp n^{-1/(2\beta+1)}$, $\rho(\cdot) = \rho_{\mathrm{arc},1}(\cdot)$ as in (2.3), and $K(\cdot) := \mathbf 1_{(-1/2,1/2]}(\cdot)$. Then, it holds that
$$\mathbb E\big[P_n\lambda'(f)\big]^2 = \frac{1}{h}\cdot\frac1n\sum_{i=1}^n\mathbb E\big[\rho'_{\mathrm{arc},1}(\sigma\xi_i)\big]^2, \qquad \mathbb E P_n\lambda''(f) = \frac1n\sum_{i=1}^n\mathbb E\,\rho''_{\mathrm{arc},1}(\sigma\xi_i),$$
and
$$\limsup_{n\to\infty} n^{q\beta/(2\beta+1)}\,R_{n,q}\big[\hat f_\lambda, H_1(\beta,L,M)\big] < \infty.$$

The rate $n^{-\beta/(2\beta+1)}$ is a standard S-minimax rate in the context of Gaussian noise (see [34], Chapter 2). Here, however, this rate is achieved for a large class of noise distributions. Similarly, one can deduce the next corollary.

Corollary 2. Consider the model in Example 2, that is, a degenerate design as in (2.16) with $s > -1$ and a homoscedastic noise level ($\sigma(\cdot)\equiv\sigma>0$). Let $\beta$, $L$, and $M$ be positive parameters. Moreover, let $\hat f_\lambda$ be defined as in (2.4) with $m = \lfloor\beta\rfloor$, $h \asymp n^{-1/(2\beta+s+1)}$, $\rho(\cdot) = \rho_{\mathrm{arc},1}(\cdot)$ as in (2.3), and $K(\cdot) := \mathbf 1_{(-1/2,1/2]}(\cdot)$. Then, it holds that
$$\mathbb E\big[P_n\lambda'(f)\big]^2 \asymp h^{s-1}\cdot\frac1n\sum_{i=1}^n\mathbb E\big[\rho'_{\mathrm{arc},1}(\sigma\xi_i)\big]^2, \qquad \mathbb E P_n\lambda''(f) \asymp h^{s}\cdot\frac1n\sum_{i=1}^n\mathbb E\,\rho''_{\mathrm{arc},1}(\sigma\xi_i),$$
and
$$\limsup_{n\to\infty} n^{q\beta/(2\beta+s+1)}\,R_{n,q}\big[\hat f_\lambda, H_1(\beta,L,M)\big] < \infty.$$

Thus, the rate $n^{-\beta/(2\beta+s+1)}$ is achieved. This rate is S-minimax in nonparametric regression with homoscedastic Gaussian noise (see [7]).

Note that we have only considered examples with homoscedastic noises here. For heteroscedastic noises, the dependence on $h$ can be very involved for some contrast functions (cf. equations (2.10) and (2.11)). But, as highlighted by the next example, this is not always the case.

Corollary 3. Consider the model (2.1) with $d = 1$, a degenerate design as in (2.16) with $s > -1$, a heteroscedastic noise level $\sigma(\cdot) \asymp |\cdot - x_0|^\alpha$, $0 \le \alpha \le s/2$, and a noise $(\xi_i)_i$ with finite variance.

Let $\beta$, $L$ and $M$ be positive parameters. Moreover, let $\hat f_\lambda$ be defined as in (2.4) with $m = \lfloor\beta\rfloor$, $h \asymp n^{-1/(2\beta+s-2\alpha+1)}$, $\rho(\cdot) = \rho_{H,\ln(n)}(\cdot)$ as in (2.2), and $K(\cdot) := \mathbf 1_{(-1/2,1/2]}(\cdot)$. Then, it holds that
$$\mathbb E\big[P_n\lambda'(f)\big]^2 \asymp h^{s+2\alpha-1}, \qquad \mathbb E P_n\lambda''(f) \asymp h^{s},$$
and
$$\limsup_{n\to\infty} n^{q\beta/(2\beta+s-2\alpha+1)}\,R_{n,q}\big[\hat f_\lambda, H_1(\beta,L,M)\big] < \infty.$$

This result illustrates the effect of small noise levels on the rate and the possible compensations to degenerate (unfavorable) designs. In particular, if $\alpha = s/2$, we get the standard minimax rate $n^{-\beta/(2\beta+1)}$ as in Corollary 1. We assume that $\alpha$ is smaller than $s/2$, since otherwise the noise level is very small and the bandwidth chosen is thus as small as possible. We also recall that the noise level is assumed to be bounded, so that we only consider the case $\alpha \ge 0$.

3. A D-adaptive estimator for fixed bandwidths

In this section, we discuss the selection of the combined function $\lambda$, that is, of the kernel and the contrast. For this, we introduce an oracle that minimizes the bound in Theorem 1 above and then provide an estimator that mimics this oracle. This estimator is then D-adaptive, that is, adaptive with respect to the noise and the design distributions.

To this end, we first introduce $\Lambda := \Upsilon \times \mathcal K$ as the set of possible combined functions $\lambda$ as in (2.4) for a given set of contrasts $\Upsilon$, a given set of kernels $\mathcal K$, and a fixed bandwidth $h \in (0,1]^d$. For example, one may consider a subset of the set of Huber functions indexed by the scale $\gamma > 0$ as set of contrasts, $\Upsilon := \{\rho_{H,\gamma}(\cdot) : \gamma > 0\}$. An example for the set of kernels is the set of indicator functions with different supports,
$$\mathcal K := \big\{\mathbf 1_{S(u)}(\cdot) : u \in [-1/2, 1/2]^d\big\} \qquad\text{for } S(u) := (-1/2 + u_1, 1/2 + u_1] \times\cdots\times (-1/2 + u_d, 1/2 + u_d].$$
This contains, in particular, the symmetric indicator kernel $\mathbf 1_{S(0)}(\cdot)$.

In this section, the bandwidth is fixed, so that the bias term $b_h(F_h)$ in Theorem 1 is of minor importance; we then introduce the oracle as the minimizer of the variance (2.9),
$$\lambda^\star := \arg\min_{\lambda\in\Lambda} V(\lambda). \tag{3.1}$$
To mimic the oracle $\lambda^\star$, we propose the estimator
$$\hat\lambda := \arg\min_{\lambda\in\Lambda}\hat V(\lambda), \qquad\text{where } \hat V(\lambda) := \frac{P_n\big[\lambda'(\hat f_\lambda)\big]^2 + \|\rho'\|_\infty\|K\|_\infty\ln^2(n)/\sqrt{n|h|}}{\big(P_n\lambda''(\hat f_\lambda)\big)^2}. \tag{3.2}$$

Note that we estimate the target function $f$ by $\hat f_\lambda$, and $\mathbb E[P_n\lambda'(f)]^2$ and $\mathbb E P_n\lambda''(f)$ by their empirical versions $P_n[\lambda'(\hat f_\lambda)]^2$ and $P_n\lambda''(\hat f_\lambda)$, respectively. The explicit expressions for the numerator and the denominator can be obtained using
$$P_n\big[\lambda'(\hat f_\lambda)\big]^2 = \frac1n\sum_{i=1}^n K_h^2(X_i)\big[\rho'\big(Y_i - \hat f_\lambda(X_i)\big)\big]^2, \qquad P_n\lambda''(\hat f_\lambda) = \frac1n\sum_{i=1}^n K_h(X_i)\,\rho''\big(Y_i - \hat f_\lambda(X_i)\big).$$

We now show that the estimator $\hat f_{\hat\lambda}$ that results from (2.4) and (3.2) performs, up to constants, as well as the oracle $\hat f_{\lambda^\star}$. For this, we define $H_F(\cdot)$ as the entropy with bracketing of $F_h \times \Lambda$ with respect to the (pseudo)metric
$$\sqrt{\mathbb E\big[P_n\big(\kappa(f_1,\lambda_1) - \kappa(f_2,\lambda_2)\big)\big]^2} \vee \sqrt{\mathbb E\big[P_n\big(\lambda_1''(f_1) - \lambda_2''(f_2)\big)\big]^2} \tag{3.3}$$
for any $f_1, f_2 \in F_h$ and $\lambda_1, \lambda_2 \in \Lambda$, where
$$\kappa(f,\lambda) := \lambda'(f)\Big/\sqrt{\mathbb E\big[P_n\lambda'(f)\big]^2 + \|\rho'\|_\infty\|K\|_\infty\ln^2(n)/\sqrt{n|h|}}.$$
We compute in the Appendix a bound for this entropy for the set of Huber contrasts indexed by the scale. Before giving the main result of this section, we give the necessary assumptions.

Condition 2. Let $\Lambda = \Upsilon\times\mathcal K$ be a set of functions as in (2.4), where $\Upsilon$ is a set of contrasts as in Definition 2 and $\mathcal K$ is a set of kernels as in Definition 1, $n\in\{1,2,\dots\}$, and $h\in(0,1]^d$. We say that Condition 2 is satisfied if the smallest eigenvalue $\varkappa_h$ (defined in Condition 1) is positive, $n|h| \ge \ln^4(n)$, and, defining for any $\lambda\in\Lambda$
$$\delta_h(\lambda) := \frac{2|P|^2}{\varkappa_h\,\mathbb E[K_h(X)]}\bigg[b_h(F_h) + 54\Big(\|\rho'\|_\infty\sqrt{\mathbb E\big[K_h^2(X)\big]} + \|K\|_\infty/\sqrt n\Big)\sqrt{\frac{\ln(n|P|)}{n}}\Big(\int_0^1 H_F^{1/2}(u)\,\mathrm du + H_F(1)\Big)\bigg],$$
it holds for all $\lambda\in\Lambda$ that
$$4\big(b_h(F_h) + \delta_h(\lambda)\big) \le \inf_{x\in V_h}\frac1n\sum_{i=1}^n\mathbb E\big[\rho''\big(\sigma(x)\xi_i\big)\big]. \tag{3.4}$$

Condition 3. Additionally, we say that Condition 3 is satisfied if, defining
$$s_h(\lambda) := 12\|K\|_\infty\big[\delta_h(\lambda) + b_h(F_h)\big] + \|K\|_\infty\big\|\rho'\big\|_\infty^2\sqrt{\frac{\ln(n|P|)}{n}}\Big(\int_0^1 H_F^{1/2}(u)\,\mathrm du + H_F(1)\Big),$$
it holds for all $\lambda\in\Lambda$ that
$$s_h(\lambda) \le \frac{1}{2\|K\|_\infty}\min\Big\{\mathbb E P_n\lambda''(f),\ \mathbb E\big[P_n\lambda'(f)\big]^2\Big\}. \tag{3.5}$$

We discuss the above conditions after the following result.

Theorem 2. Let $\Lambda$ be a set of functions as in (2.4), $n\in\{1,2,\dots\}$, and $h\in(0,1]^d$ such that Conditions 2 and 3 are satisfied. Then, for all $q\ge1$,
$$\mathbb E\big|\hat f_{\hat\lambda}(x_0) - f(x_0)\big|^q \le T_q\bigg[b_h(F_h) + \Big(\int_0^1 H_F^{1/2}(u)\,\mathrm du + 4H_F(1)\Big)\sqrt{\frac{V(\lambda^\star)\ln^2(n)}{n}}\bigg]^q + \frac{(5\vee 2M)^q}{n^2}$$
for a constant $T_q$ ($T_q = 2^q|P|\,117^q\,\Gamma(q)$ works, where $\Gamma(\cdot)$ is the classical Gamma function).

Remark 3. Apart from the given assumptions, the estimator $\hat f_{\hat\lambda}(x_0)$ does not premise knowledge about the noise level $\sigma(\cdot)$ and the densities $g_i(\cdot)$ and $\mu(\cdot)$, but achieves, up to constants, the optimal variance term $V(\lambda^\star)$ for all such functions. The estimator is thus called D-adaptive (optimal with respect to the set $\Lambda$). For example, for the Huber contrast (2.2) indexed by the scale $\gamma$, $\Upsilon := \{\rho_{H,\gamma}(\cdot) : \gamma > 0\}$, the estimator is D-adaptive minimax (Huber minimax) for the set of contamination models; see Section A.1 in the arXiv version. Finally, we mention that appropriate choices of the bandwidth in the above result lead to S-minimax results.

Discussion of Conditions 2 and 3. Condition 2 limits the possible sets of combined functions $\Lambda$ and thus, in particular, the sets of possible contrast functions $\Upsilon$. It demands that all possible combined functions $\lambda$ fulfill Condition 1, which then leads to consistent estimators (see Proposition 1) and to sets of contrasts with finite entropy. Condition 2 demands, in particular, that the right-hand side of (3.4) is positive and, since the right-hand side of (3.4) is upper bounded by 1, that $\sup_{\rho\in\Upsilon}\|\rho'\|_\infty$ does not increase too rapidly with $n$. In the following, we illustrate these restrictions with an example. We consider a homoscedastic model ($\sigma(\cdot)\equiv\sigma>0$) and $\Upsilon$ equal to a set of Huber contrasts $\rho_{H,\gamma}(\cdot)$ as in (2.2) with scale parameter $\gamma\in[\gamma_-,\gamma_+]$, $\gamma_+ \ge \gamma_- > 0$. It holds that $\sup_{\rho\in\Upsilon}\|\rho'_{H,\gamma}\|_\infty = \gamma_+$. This implies that $\gamma_+$ must not increase too rapidly with $n$. Moreover, it must hold that
$$\frac1n\sum_{i=1}^n\mathbb E\big[\rho''_{H,\gamma}(\sigma\xi_i)\big] = \int_{-\gamma/\sigma}^{\gamma/\sigma}\frac1n\sum_{i=1}^n g_i(z)\,\mathrm dz \ge \int_{-\gamma_-/\sigma}^{\gamma_-/\sigma}\frac1n\sum_{i=1}^n g_i(z)\,\mathrm dz > 0.$$
For noise densities that are positive and continuous in the origin, this condition is verified for all $\gamma_- > 0$. For more involved noise densities (vanishing at the origin), however, $\gamma_-$ has to be chosen sufficiently large.

Condition 3 is similar to Condition 2, since $s_h(\lambda) \asymp \delta_h(\lambda)$. However, the terms in the minimum on the right-hand side of (3.5) can be small for a certain design and noise level. The second term vanishes if $\sigma(\cdot)\to0$, since $\rho'(0)=0$. Moreover, if the design degenerates (as in (2.16) with a large $s$), $\mathbb EP_n\lambda''(f)$ and $\mathbb E[P_n\lambda'(f)]^2$ then tend to zero faster than $s_h(\lambda)$ as $n\to\infty$ (cf. (2.10) and (2.11)). This is due to the estimation of $\mathbb EP_n\lambda''(f)$ and $\mathbb E[P_n\lambda'(f)]^2$: if $\sigma(\cdot)\to0$ or if the design degenerates, the above terms are small (cf. (2.9)), and thus, the estimation error of them (which is related to $s_h(\cdot)$) obstructs their behavior.
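The following is a minimal sketch of the selection rule (3.2) under simplifying assumptions: $d = 1$, a locally constant fit, the indicator kernel, and the Huber family $\Upsilon = \{\rho_{H,\gamma}\}$ over a finite grid of scales. The helper `fit` (returning $\hat f_\lambda$ evaluated at the design points, e.g. via the `lpa_estimate` sketch above) and all names are ours, and the penalty term follows our reading of (3.2).

```python
import numpy as np

def empirical_variance(x, y, x0, h, gamma, f_hat):
    """hat-V(lambda) of (3.2) for the Huber contrast and the indicator kernel
    (d = 1, |h| = h, ||K||_inf = 1, ||rho'||_inf = gamma)."""
    n = len(y)
    K = (np.abs(x - x0) <= h / 2) / h               # K_h(X_i)
    r = y - f_hat                                   # residuals Y_i - f_hat(X_i)
    psi = np.clip(r, -gamma, gamma)                 # rho'
    psi_prime = (np.abs(r) <= gamma).astype(float)  # rho''
    num = np.mean(K**2 * psi**2) + gamma * np.log(n)**2 / np.sqrt(n * h)
    den = np.mean(K * psi_prime)**2
    return num / den

def select_scale(x, y, x0, h, gammas, fit):
    """Mimic the oracle (3.1): hat-gamma = argmin over the grid of hat-V."""
    vs = [empirical_variance(x, y, x0, h, g, fit(g)) for g in gammas]
    return gammas[int(np.argmin(vs))]
```

As discussed above, the residual-based plug-in is what Condition 2 protects: every scale in the grid must already yield a consistent fit, which is why a pre-estimator (e.g., with the contrast (2.3)) can be preferable in practice.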

4. A D-adaptive and S-adaptive estimator

In this section, we introduce an estimator of $f(x_0)$ that is simultaneously S- and D-adaptive. For this, we apply the data-driven procedure introduced above to select the contrast and the kernel, and a modification of the data-driven Lepski's method to select the bandwidth. In the first part, we consider isotropic, locally polynomial target functions, in the second part anisotropic, locally constant functions. To simplify the exposition, we present asymptotic results only.

The LPA is designed for functions that can be locally approximated by polynomials. This is, for example, the case for Hölder classes, which we define (similarly as in [3]) as

Definition 4. Let $\beta := (\beta_1,\dots,\beta_d) \in (0,+\infty)^d$ be such that $\lfloor\beta_1\rfloor = \cdots = \lfloor\beta_d\rfloor =: \lfloor\beta\rfloor$, and let $L, M > 0$. The function $s : [0,1]^d \to [-M,M]$ belongs to the anisotropic Hölder class $H_d(\beta, L, M)$ if for all $x, x_0 \in [0,1]^d$
$$\big|s(x) - (Ps)(x - x_0)\big| \le L\sum_{j=1}^d|x_j - x_{0,j}|^{\beta_j} \qquad\text{and}\qquad \sup_{p :\, |p| \le \lfloor\beta\rfloor}\,\Big\|\frac{\partial^{|p|}s}{\partial x_1^{p_1}\cdots\partial x_d^{p_d}}\Big\|_\infty \le M,$$
where $(Ps)(x - x_0)$ is the Taylor polynomial of $s$ of order $\lfloor\beta\rfloor$ at $x_0$, and $x_j$ and $x_{0,j}$ are the $j$th components of $x$ and $x_0$, respectively.

The parameter $\beta$ is usually unknown; thus, it is desirable to have an estimator that is adaptive with respect to $\beta$. This motivates the following definition, where $\Psi := \{\psi_n(\beta)\}_{\beta\in\mathcal M}$ is a given family of normalizations for a set of parameters $\mathcal M$:

Definition 5. The family $\Psi$ is called admissible if there exists an estimator $\hat f_n$ such that
$$\limsup_{n\to\infty}\sup_{\beta\in\mathcal M}\psi_n^{-q}(\beta)\,R_{n,q}\big[\hat f_n, H_d(\beta,L,M)\big] < \infty.$$
The estimator $\hat f_n$ is then called $\Psi$-adaptive in the S-minimax sense.

We distinguish two cases in the following: First, we consider the special case of isotropic Hölder classes, that is, $\beta_1 = \cdots = \beta_d$. These classes only require a common bandwidth for all dimensions that is chosen with the standard version of Lepski's method (see [24] and [23]). Afterwards, we allow for anisotropic Hölder classes. These classes necessitate a separate bandwidth for every dimension of the domain under consideration. The standard version of Lepski's method is not applicable in this case, because it requires a monotonous bias. We circumvent this problem using a modified version of Lepski's method as described in [19] and [22].

4.1. A fully adaptive estimator for isotropic, locally polynomial functions

Here, we consider isotropic Hölder classes with $\beta \in (0, m+1]$, where $m$ is the degree of the estimator $\hat f_\lambda$ and may be chosen arbitrarily large. Therefore, only one bandwidth $h_{\mathrm{iso}} = h_1 = \cdots = h_d > 0$ has to be selected. Geometrically, this means that we select a hypercube in $\mathbb R^d$ with edge length $h_{\mathrm{iso}}$ as domain of interest (in contrast to the anisotropic case, where we select a hyperrectangle with edge lengths $h_1,\dots,h_d$).

A major issue is the choice of the bandwidth. In the following, we assume that the variance term $V(\lambda_{h_{\mathrm{iso}}})/(nh_{\mathrm{iso}}^d)$ for (see Definitions (2.4) and (2.9))
$$\lambda_{h_{\mathrm{iso}}}(f)(x,y) := \rho\big(y - f(x)\big)K_{h_{\mathrm{iso}}}(x) \qquad\text{for all } x\in[0,1]^d,\ y\in\mathbb R,$$
is decreasing in the bandwidth, so that we can apply Lepski's method. This imposes an additional restriction on the design and the noise. After the main result of this section, we give some examples for designs and noises that fulfill this restriction.

Next, we introduce the set of bandwidths $\mathcal H^{\mathrm{iso}} := [h_-, h_+]$, where $0 < h_- < h_+ < 1$ are defined as (cf. (2.15))
$$h_- := \frac{\ln^{6/d}(n)}{n^{1/d}} \qquad\text{and}\qquad h_+ := \frac{1}{\ln(n)}. \tag{4.1}$$
Since the inequality $h_- < h_+$ has to be satisfied, $n$ is required to be large enough. We then introduce the isotropic M-estimator for any $h_{\mathrm{iso}}\in\mathcal H^{\mathrm{iso}}$ as
$$\hat f^{\mathrm{iso}}_{h_{\mathrm{iso}}} := \arg\min_{f\in F_h} P_n\hat\lambda_{h_{\mathrm{iso}}}(f), \qquad\text{where } \hat\lambda_{h_{\mathrm{iso}}} = \arg\min_{\lambda_{h_{\mathrm{iso}}}\in\Lambda}\hat V(\lambda_{h_{\mathrm{iso}}})$$
and $\hat V(\cdot)$ is defined in (3.2). Eventually, we introduce a net $\mathcal H^{\mathrm{iso}}_\epsilon := \{h_{\mathrm{iso}}\in\mathcal H^{\mathrm{iso}} : \exists m\in\mathbb N : h_{\mathrm{iso}} = h_+\epsilon^m\}$, $\epsilon\in(0,1)$, such that $1 \le |\mathcal H^{\mathrm{iso}}_\epsilon| \le n$, and then apply Lepski's method for isotropic functions (see [24] and [23]) to define the data-driven bandwidth $\hat h^{\mathrm{iso}}$:
$$\hat h^{\mathrm{iso}} := \max\bigg\{h_{\mathrm{iso}}\in\mathcal H^{\mathrm{iso}}_\epsilon : \big|\hat f^{\mathrm{iso}}_{h_{\mathrm{iso}}}(x_0) - \hat f^{\mathrm{iso}}_{h'_{\mathrm{iso}}}(x_0)\big| \le \frac{15}{2}\big(B + \Delta^{\mathrm{iso}}_\epsilon(n)\big)\sqrt{\frac{\hat V(\hat\lambda_{h'_{\mathrm{iso}}})}{n(h'_{\mathrm{iso}})^d}}\ \text{for all } h'_{\mathrm{iso}}\in\mathcal H^{\mathrm{iso}}_\epsilon\ \text{such that } h'_{\mathrm{iso}} \le h_{\mathrm{iso}}\bigg\}, \tag{4.2}$$
where
$$\Delta^{\mathrm{iso}}_\epsilon(n) := 11\sqrt{\ln\big(n\big|\mathcal H^{\mathrm{iso}}_\epsilon\big|\big)} \qquad\text{and}\qquad B := \int_0^1 H_F^{1/2}(u)\,\mathrm du + 4H_F(1).$$
We now obtain, on isotropic Hölder classes $H^{\mathrm{iso}}_d(\beta,L,M) := H_d((\beta,\dots,\beta),L,M)$ for all $\beta, L, M > 0$, the following result:

Theorem 3. Let $\Lambda$ be a set of combined functions as in (2.4) and $n\in\{1,2,\dots\}$ such that Conditions 2 and 3 are satisfied for all $h_{\mathrm{iso}}\in\mathcal H^{\mathrm{iso}}$. Then, for any $x_0\in(0,1)^d$, any $\beta\in(0,m+1]$, and any $L>0$, there exists a universal positive constant $C>0$ such that
$$R_{n,q}\big[\hat f^{\mathrm{iso}}_{\hat h^{\mathrm{iso}}}(x_0), H^{\mathrm{iso}}_d(\beta,L,M)\big] \le C\inf_{h_{\mathrm{iso}}\in\mathcal H^{\mathrm{iso}}}\bigg\{L d\, h_{\mathrm{iso}}^\beta + \big(B + \Delta^{\mathrm{iso}}_\epsilon(n)\big)\sqrt{\frac{V(\lambda^\star_{h_{\mathrm{iso}}})}{n h_{\mathrm{iso}}^d}}\bigg\}^q$$
as $n\to\infty$.

Remark 4. This oracle-inequality-like result shows the simultaneous S- and D-adaptation of the estimator. It generalizes results in [5], which rely on the asymptotic equivalence of the block median method, in two important aspects: First, it allows for heteroscedastic regression models with random designs. Second, it does not require that the noise densities are positive at their median and thus allows for a wider range of densities. Finally, we note that Lepski's method has been used for locally constant M-estimators in [27] but, to the best of our knowledge, never for locally polynomial M-estimators as it is done here.

Remark 5. If only S-adaptation is considered, the conditions on $n$ can be considerably relaxed. In fact, assuming that $V(\lambda^\star_{h_{\mathrm{iso}}})$ is known, the estimator $\hat f_{\lambda^\star_{h_{\mathrm{iso}}}}$ of (2.4) can be applied instead of $\hat f^{\mathrm{iso}}_{\hat h^{\mathrm{iso}}}$, where $h_{\mathrm{iso}}$ is selected from (4.2) replacing $\hat V(\hat\lambda_{h_{\mathrm{iso}}})$ by $V(\lambda^\star_{h_{\mathrm{iso}}})$. The Conditions 2 and 3 can then be replaced by Condition 1.

Remark 6. The variance term is decreasing for settings with indicator kernels and homoscedastic noise levels (as one can check easily starting from (2.9)); for settings with indicator kernels, Huber contrasts, $\sigma(\cdot) = 1 + |\cdot - x_0|^\alpha$ for an $\alpha\in(0,1/2]$, and $d = 1$; and for many other settings. On the contrary, the variance term can be increasing, for example, if the noise level is symmetric in $x_0$ and convex.

Corollary 4. Consider the model in Example 1 in the previous section with $\mu(\cdot)\equiv1$ (uniform design) and $\sigma(\cdot)\equiv1$ (homoscedastic noise level). For any $\beta\in(0,m+1]$ and any $L>0$, it holds that
$$\limsup_{n\to\infty}\Big(\frac{n}{\ln(n)}\Big)^{q\beta/(2\beta+d)} R_{n,q}\big[\hat f^{\mathrm{iso}}_{\hat h^{\mathrm{iso}}}(x_0), H^{\mathrm{iso}}_d(\beta,L,M)\big] < \infty.$$

This corollary can be deduced minimizing the term on the right-hand side of the last theorem with a standard bias/variance trade-off.

Remark 7. The rate $(\ln(n)/n)^{\beta/(2\beta+d)}$ in the above corollary is admissible (cf. Definition 5) over isotropic Hölder spaces and is asymptotically optimal (see [4] and [24]) up to the logarithm $\ln(n)$, which is the usual price for the adaptivity (see Section 5 for more details). Moreover, the approach used to deduce the above corollary presumes uniform designs and homoscedastic noises; however, more elaborate approaches, perhaps similar to the ones in [8], may lead to comparable results for degenerate designs.
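The rule (4.2) can be sketched as follows: scan the bandwidths of the net from the smallest upwards and keep the largest bandwidth whose estimate stays within the threshold of every estimate at a smaller bandwidth. The callbacks `estimate(k)` (returning $\hat f^{\mathrm{iso}}_k(x_0)$) and `variance(k)` (returning $\hat V(\hat\lambda_k)/(nk^d)$), as well as the stop-at-first-violation shortcut, are our simplifications.

```python
import numpy as np

def lepski_bandwidth(net, estimate, variance, threshold):
    """Isotropic Lepski selection in the spirit of (4.2). `net` is the
    bandwidth net H_eps; `threshold` plays the role of (15/2)(B + Delta_eps(n)).
    Returns the largest bandwidth below the first admissibility violation."""
    net = sorted(net)
    f = [estimate(k) for k in net]
    v = [variance(k) for k in net]
    selected = net[0]
    for i, k in enumerate(net):
        # k is admissible if its estimate is close to all smaller-bandwidth ones
        if all(abs(f[i] - f[j]) <= threshold * np.sqrt(v[j]) for j in range(i + 1)):
            selected = k
        else:
            break
    return selected
```

In practice, the threshold constant is usually tuned much smaller than the theoretical one; see item 6 in Section 5.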

4.2. A fully adaptive estimator for anisotropic, locally constant functions

In this part, we allow for anisotropic Hölder classes and bandwidths. In return, we restrict ourselves to locally constant functions, that is, $m = 0$ (and thus $|P| = 1$) and $F = [-M,M]$. Moreover, we restrict ourselves to uniform designs ($\mu(\cdot)\equiv1$), homoscedastic noise levels ($\sigma(\cdot)\equiv\sigma\ge0$), and identically distributed noises ($g_i(\cdot)\equiv g(\cdot)$ for all $i=1,\dots,n$). For this setting, we introduce an S- and D-adaptive estimator of $f(x_0)$. The main properties of this estimator are given in Theorem 4.

We introduce an estimator for each bandwidth in the set $\mathcal H := [h_-,h_+]^d$, where $h_-$ and $h_+$ are defined in the previous section. For this, we define the variance term as
$$V(\rho,K) := \frac{\int\big[\rho'(\sigma z)\big]^2 g(z)\,\mathrm dz + \|\rho'\|_\infty\|K\|_\infty\ln^2(n)\big/\sqrt{n h_-^d}}{\big(\int\rho''(\sigma z)\,g(z)\,\mathrm dz\big)^2} \tag{4.3}$$
and the oracle for a set of contrasts $\Upsilon$ and a set of kernels $\mathcal K$ as
$$(\rho^\star, K^\star) := \arg\min_{\rho\in\Upsilon,\, K\in\mathcal K} V(\rho,K). \tag{4.4}$$
Next, we introduce an estimator of the variance term as
$$\hat V(\rho,K) := \frac{(1/n)\sum_{i=1}^n\big[\rho'\big(Y_i - \hat f_{\lambda_{h_+}}(X_i)\big)\big]^2 + \|\rho'\|_\infty\|K\|_\infty\ln^2(n)\big/\sqrt{n h_-^d}}{\big((1/n)\sum_{i=1}^n\rho''\big(Y_i - \hat f_{\lambda_{h_+}}(X_i)\big)\big)^2}, \tag{4.5}$$
where $\hat f_{\lambda_{h_+}}$ is defined in (2.4) with $\lambda = \lambda_{h_+}(f)(x,y) := \rho(y - f(x))K_{h_+}(x)$, and an estimator of the oracle as
$$(\hat\rho, \hat K) := \arg\min_{\rho\in\Upsilon,\, K\in\mathcal K}\hat V(\rho,K). \tag{4.6}$$
We stress that the variance term $V$, the oracle $(\rho^\star,K^\star)$, and their estimators $\hat V$ and $(\hat\rho,\hat K)$ are independent of the bandwidth. We can finally introduce the desired estimator $\hat f_h$ for all $h\in\mathcal H$:
$$\hat f_h := \arg\min_{f\in F}\, n^{-1}\sum_i\hat\rho\big(Y_i - f(X_i)\big)\hat K_h(X_i). \tag{4.7}$$

The crucial step is now the choice of the bandwidth with a modified version of Lepski's method (see [19] and [20]). First, we define for all $a,b\in\mathbb R$ the scalar $a\vee b := \max(a,b)$ and for all $h,\eta\in\mathcal H$ the vector $h\vee\eta := (h_1\vee\eta_1,\dots,h_d\vee\eta_d)$. We then consider the two families of Locally Constant Approximation (LCA) estimators (provoked by (4.7))
$$\{\hat f_h\}_{h\in\mathcal H} \qquad\text{and}\qquad \{\hat f_{h,\eta} := \hat f_{h\vee\eta}\}_{(h,\eta)\in\mathcal H^2}.$$
Note that $\hat f_{h,\eta} = \hat f_{\eta,h}$ (commutativity). Similarly as above, we then introduce a net $\mathcal H_\epsilon := \{(h_-,\dots,h_-)\}\cup\{h\in\mathcal H : \forall j=1,\dots,d\ \exists m_j\in\mathbb N : h_j = h_+\epsilon^{m_j}\}$, $\epsilon\in(0,1)$, such that $|\mathcal H_\epsilon| \le n$,

and set $\Delta^{\mathrm{ani}}_\epsilon(n) := 11\sqrt{\ln(n|\mathcal H_\epsilon|)}$. We finally select the bandwidth according to
$$\hat h := \max\bigg\{h\in\mathcal H_\epsilon : \big|\hat f_{h,\eta}(x_0) - \hat f_\eta(x_0)\big| \le 16\big(B + \Delta^{\mathrm{ani}}_\epsilon(n)\big)\sqrt{\frac{\hat V(\hat\rho,\hat K)}{n\prod_{j=1}^d\eta_j}}\ \text{for all } \eta\in\mathcal H_\epsilon\ \text{such that } \eta\preceq h\bigg\}. \tag{4.8}$$
The maximum is taken with respect to the order $\preceq$, which we define as $h\preceq\eta \iff \prod_{j=1}^d h_j \le \prod_{j=1}^d\eta_j$. Note, in particular, that the right-hand side of (4.8) is decreasing with respect to this order. The above choice of the bandwidth leads to the estimator $\hat f_{\hat h}$ with the following properties:

Theorem 4. Let $\Lambda$ be a set of combined functions as in (2.4) and let $n\in\{1,2,\dots\}$ such that Conditions 2 and 3 are satisfied for all $h\in\mathcal H$. Then, for any $x_0\in(0,1)^d$, any $\beta\in(0,1]^d$, and any $L>0$, there exists a universal constant $C$ such that
$$R_{n,q}\big[\hat f_{\hat h}(x_0), H_d(\beta,L,M)\big] \le C\inf_{h\in\mathcal H}\bigg\{L\sum_{j=1}^d h_j^{\beta_j} + \big(B + \Delta^{\mathrm{ani}}_\epsilon(n)\big)\sqrt{\frac{V(\rho^\star,K^\star)}{n\prod_{j=1}^d h_j}}\bigg\}^q.$$

We can also derive the following corollary from Theorem 4 via a bias/variance trade-off.

Corollary 5. For any $\beta\in(0,1]^d$ and any $L>0$, it holds that
$$\limsup_{n\to\infty}\Big(\frac{n}{\ln(n)}\Big)^{q\bar\beta/(2\bar\beta+1)} R_{n,q}\big[\hat f_{\hat h}(x_0), H_d(\beta,L,M)\big] < \infty,$$
where $\bar\beta := \big(\sum_j 1/\beta_j\big)^{-1}$ is the harmonic average.

Remark 8. In contrast to the previous part, only locally constant functions are considered here, which is due to the bias term (cf. Lemma 6). To the best of our knowledge, the presented choice of the bandwidth is the first application of the anisotropic Lepski's principle ([22]; see also [11,19,20]) for the selection of an anisotropic bandwidth for nonlinear M-estimators. We also note that, comparing the adaptive rate $(\ln(n)/n)^{\bar\beta/(2\bar\beta+1)}$ with the optimal rate in the white noise model (see [20]), for example, one finds that this rate is nearly optimal. We finally refer to the remarks after Theorem 3.

5. Discussion

Let us detail on the assumptions and restrictions and highlight some open problems:

1. Instead of assuming that the densities $g_i(\cdot)$ are symmetric (cf. [14,29]), it is sufficient that the sum $\sum_i g_i(\cdot)$ is symmetric. We are, however, not aware of examples where this generalization is relevant.

2. The variance of the median estimator is $1/(4g^2(0))$, which implies a strong sensitivity to the noise density at 0. Moreover, the estimation of $g(0)$ (see [5], e.g.) requires many observations near $f(x_0)$ in practice. On the contrary, for the Huber contrast with scale $\gamma$ (allowed by our approach), the denominator of the variance term (2.9) depends on the mass of the noise density on the interval $[-\gamma,\gamma]$ instead of the mass at 0.

3. To estimate the variance term (2.9), we plug in an estimate $Y_i - \hat f_\lambda(X_i)$ of the residuals. Condition 2 ensures the consistency of all estimators in $\Lambda$. This is considerably restrictive on the initial family. This problem can be circumvented using a pre-estimator (e.g., with the contrast (2.3)) instead of $\hat f_\lambda$ for the estimation of the variance.

4. Lepski's method is very sensitive to outliers (see [27]). To complement it with the adaptive robustness of the estimator via the minimization of the variance term can thus be interesting for many applications.

5. The variance term and its empirical version do not depend on the bias term (see Theorem 1, Definition (3.2) and Remark 2) and, more generally, not on the specific model. The procedure presented in this paper may thus be interesting for other models, such as high-dimensional settings (cf. [21]), for example.

6. The quantity $\frac{15}{2}(B + \Delta^{\mathrm{iso}}_\epsilon(n))$ in the threshold term in (4.2) contains the factor $\ln(n)$ and known but large constants. For applications, it should usually be chosen considerably smaller (see [23]) and can probably be tuned with the propagation method [28], for example.

7. As mentioned in the Introduction, Lepski-type procedures are also useful to get S-adaptive confidence bands (see [10] and references therein). This requires deviation inequalities that can be derived along the presented lines (see Proposition 3) but also a lower bound for the bias term of the estimator (cf. [10], Condition 3, Section 3.2 and Section 3.5 for a discussion), which seems not to be available here, since robust M-estimators and thus the bias term do not have explicit expressions. For our purposes, we circumvent this issue by using the bias term of the criterion's derivative as an estimator of the expected criterion's derivative; see Lemmas 1 and 2. However, this way, we only obtain an upper bound. We therefore suggest to first establish S-adaptive confidence bands for the criterion's derivative viewed as an estimator and then, using the smoothness of the contrast, confidence bands with respect to a pointwise semi-norm or sup-norm.

6. Proofs of the main results

Let us introduce some additional notation to simplify the exposition. For this, we introduce
$$F_\delta := \big\{f = P_t \in F_h : \|t - t^0\|_{\ell_1} \le \delta\big\} \tag{6.1}$$
as a ball in $F_h$ with radius $\delta > 0$ centered at $f_0$. Furthermore, we denote the column vector of partial derivatives of the criterion $P_n\lambda(\cdot)$ (defined in (2.4)) by
$$D_\lambda(P_t) := \Big(\frac{\partial}{\partial t_p}P_n\lambda(P_t)\Big)_{p\in P} \qquad\text{for all } t\in\mathbb R^{|P|}, \tag{6.2}$$

and the parametric expectation with respect to the distribution $\mathbb E_0$ of $(X, f_0(X) + \sigma(X)\xi)$ by
$$\mathbb E_0\big[D_\lambda(\cdot)\big]. \tag{6.3}$$
Next, for all $t\in\mathbb R^{|P|}$, we introduce the Jacobian matrix $J_D$ of $\mathbb E_0[D_\lambda]$ as
$$\big(J_D(P_t)\big)_{p,q\in P} := \Big(\frac{\partial}{\partial t_q}\mathbb E_0\big[D_p\lambda(P_t)\big]\Big)_{p,q\in P} = \Big(\mathbb E_0\Big[\frac{\partial^2}{\partial t_q\,\partial t_p}P_n\lambda(P_t)\Big]\Big)_{p,q\in P}, \tag{6.4}$$
where $D_p\lambda(\cdot)$ is the $p$th component of $D_\lambda(\cdot)$. The Jacobian matrix exists according to Definition 2 and Fubini's theorem. Furthermore, the sup-norm on $\mathbb R^{|P|}$ is denoted by $\|\cdot\|_{\ell_\infty}$, and the vector of coefficients of the estimated polynomial $\hat f_\lambda$ is denoted by $\hat t_\lambda$. Moreover, we set
$$c_\lambda := \mathbb E P_n\lambda''(f) \tag{6.5}$$
and $b_h := b_h(F_h)$. We finally define $\bar\lambda := \|\rho'\|_\infty\|K\|_\infty$ and, for any $z\ge0$,
$$B_z := \int_0^1 H^{1/2}_F(u)\,\mathrm du + 4H_F(1) + \sqrt z + \frac{2z}{\ln^2(n)}. \tag{6.6}$$

6.1. Auxiliary results

The following propositions are basic for the proofs of the main results. The proofs of the propositions are given in the Appendix.

Proposition 1. Let $\Lambda = \Upsilon\times\mathcal K$ be a set of functions as in (2.4), where $\Upsilon$ is a set of contrasts as in Definition 2 and $\mathcal K$ is a set of kernels as in Definition 1. Let $n\in\{1,2,\dots\}$ and $h\in(0,1]^d$ be such that Condition 2 is satisfied. Then,
$$\mathbb P\Big(\bigcap_{\lambda\in\Lambda}\big\{\hat f_\lambda\in F_{\delta_h(\lambda)}\big\}\Big) \ge 1 - n^{-2},$$
where $\delta_h(\cdot)$ is defined in Condition 2.

The following proposition allows us to control the deviations of the process $D_\lambda(\cdot)$.

Proposition 2. For any $z\ge0$, it holds that
$$\mathbb P\Bigg(\sup_{\lambda\in\Lambda}\sup_{f\in F_{\delta_h(\lambda)}}\frac{\big\|D_\lambda(f) - \mathbb E[D_\lambda(f)]\big\|_{\ell_\infty}}{\sqrt{\mathbb E\big[P_n\lambda'(f)\big]^2 + \bar\lambda\ln^2(n)/\sqrt{n|h|}}} \ge \frac{B_z}{\sqrt n}\Bigg) \le 2|P|\exp(-z),$$
where $B_z$ is defined in (6.6).

This proposition is directly deduced from Massart's inequality (see the arXiv version for details).

Proposition 3. Let $\Lambda$ be a set of functions as in (2.4), $n\in\{1,2,\dots\}$, and $h\in(0,1]^d$ be such that Condition 2 is satisfied. Then, for any $z\ge0$, it holds that
$$\mathbb P\Bigg(\bigg\{\sup_{\lambda\in\Lambda}\Big[\big|\hat f_\lambda(x_0) - f(x_0)\big| - 3b_h\Big] \ge 2\sqrt{\frac{V(\lambda)B_z^2}{n}}\bigg\}\cap\bigcap_{\lambda\in\Lambda}\big\{\hat f_\lambda\in F_{\delta_h(\lambda)}\big\}\Bigg) \le 2|P|\exp(-z).$$
We note that the constants 2 and 3 can be replaced by $1 + o(1)$.

Proposition 4. Let $\Lambda = \Upsilon\times\mathcal K$ be a set of functions as in (2.4), where $\Upsilon$ is a set of contrasts as in Definition 2 and $\mathcal K$ is a set of kernels as in Definition 1. Let $n\in\{1,2,\dots\}$ and $h\in(0,1]^d$ be such that Condition 2 is satisfied. Then, $\mathbb P(\mathcal A) \ge 1 - 5/n^2$, where
$$\mathcal A := \bigcap_{\lambda\in\Lambda}\Big\{\hat V(\lambda)\in\big[\tfrac23 V(\lambda),\, 6V(\lambda)\big]\Big\}.$$
We note that the constants $2/3$ and $6$ can be replaced by $1 + o(1)$.

6.2. Proof of Theorem 2

First, we set $\mathcal A := \bigcap_{\lambda\in\Lambda}\{\hat V(\lambda)\in[\frac23 V(\lambda), 6V(\lambda)]\}$. Then, we observe that, since $\hat f_{\hat\lambda}\in F_h$, $\sup_{f\in F_h}|f(x_0)| \le M$, and $|f(x_0)| \le M$, the risk can be bounded by
$$\mathbb E\big|\hat f_{\hat\lambda}(x_0) - f(x_0)\big|^q = \mathbb E\big|\hat f_{\hat\lambda}(x_0) - f(x_0)\big|^q\mathbf 1_{\mathcal A} + \mathbb E\big|\hat f_{\hat\lambda}(x_0) - f(x_0)\big|^q\mathbf 1_{\mathcal A^c} \le \mathbb E\big|\hat f_{\hat\lambda}(x_0) - f(x_0)\big|^q\mathbf 1_{\mathcal A} + (2M)^q\,\mathbb P(\mathcal A^c).$$
Using Proposition 4, Lemma 3, the last inequality, and simple computations, we obtain
$$\begin{aligned} \mathbb E\big|\hat f_{\hat\lambda}(x_0) - f(x_0)\big|^q &\le \mathbb E\big|\hat f_{\hat\lambda}(x_0) - f(x_0)\big|^q\mathbf 1_{\mathcal A} + 5(2M)^q/n^2\\ &\le 2^q\,\mathbb E\bigg[\Big(\big|\hat f_{\hat\lambda}(x_0) - f(x_0)\big| - 3b_h - 6\sqrt{3V(\lambda^\star)}\frac{B_0}{\sqrt n}\Big)_+^q\mathbf 1_{\mathcal A}\bigg]\\ &\quad + 2^q\Big(3b_h + 6\sqrt{3V(\lambda^\star)}\frac{B_0}{\sqrt n}\Big)^q + 5(2M)^q/n^2. \end{aligned} \tag{6.7}$$
Let us now bound the first term on the right-hand side of the last inequality. To do so, we note that on the event $\mathcal A$,
$$V(\hat\lambda) \le \tfrac32\hat V(\hat\lambda) \le \tfrac32\hat V(\lambda^\star) \le 9V(\lambda^\star). \tag{6.8}$$
Using the last inequality and integrating the result of Proposition 3 with $\varepsilon = 10(\sqrt z + 2z/\ln^2(n))$, we get (for more details, see the arXiv version)
$$\mathbb E\bigg[\Big(\big|\hat f_{\hat\lambda}(x_0) - f(x_0)\big| - 3b_h - 6\sqrt{3V(\lambda^\star)}\frac{B_0}{\sqrt n}\Big)_+^q\mathbf 1_{\mathcal A}\bigg] \le T_q\Big(b_h + \sqrt{\frac{V(\lambda^\star)}{n}}\,B_0\Big)^q.$$

From (6.7) and the last inequality, the theorem can be deduced.

6.3. Proof of Theorem 3

For ease of exposition, we set $B_0 = B$ (cf. (6.6)), $k := h_{\mathrm{iso}}$, and $\hat k := \hat h^{\mathrm{iso}}$. Then, one may verify that the oracle bandwidth
$$k^\star := \arg\min_{k\in\mathcal H^{\mathrm{iso}}}\bigg\{L d\,k^\beta + c\big(B_0 + \Delta^{\mathrm{iso}}_\epsilon(n)\big)\sqrt{\frac{V(\lambda^\star_k)}{nk^d}}\bigg\}$$
is well defined, where $c$ is a constant chosen such that both terms are equal at the point $k^\star$. Next, from Propositions 1 and 4 with $h = (k,\dots,k)$, it follows that
$$\mathbb P\Big(\exists k\in\mathcal H^{\mathrm{iso}}_\epsilon,\ \exists\lambda_k\in\Lambda : \hat f^{\mathrm{iso}}_k\notin F_{\delta_k(\lambda_k)}\Big) \le \sum_{k\in\mathcal H^{\mathrm{iso}}_\epsilon}n^{-2} \le n^{-1} \tag{6.9}$$
and
$$\sum_{k\in\mathcal H^{\mathrm{iso}}_\epsilon}\mathbb P\big(\mathcal A_k^c\big) \le \sum_{k\in\mathcal H^{\mathrm{iso}}_\epsilon}\frac{5}{n^2} \le \frac5n, \tag{6.10}$$
where $\mathcal A_k := \mathcal A$ is defined in Proposition 4. Thus, we may restrict our considerations to the event $\bigcap_{k\in\mathcal H^{\mathrm{iso}}_\epsilon}\bigcap_{\lambda_k\in\Lambda}\{\hat f^{\mathrm{iso}}_k\in F_{\delta_k(\lambda_k)}\}\cap\mathcal A_k$, since we are only interested in the asymptotic behavior. We now introduce $k^\star_\epsilon\in\mathcal H^{\mathrm{iso}}_\epsilon$ such that $k^\star_\epsilon \le k^\star \le \epsilon^{-1}k^\star_\epsilon$.

Control of the risk on the event $\{k^\star_\epsilon \le \hat k\}$. With the triangular inequality and Lemma 3, we obtain
$$\big|\hat f^{\mathrm{iso}}_{\hat k}(x_0) - f(x_0)\big|^q\mathbf 1_{k^\star_\epsilon\le\hat k} \le 2^{q-1}\Big(\big|\hat f^{\mathrm{iso}}_{\hat k}(x_0) - \hat f^{\mathrm{iso}}_{k^\star_\epsilon}(x_0)\big|^q\mathbf 1_{k^\star_\epsilon\le\hat k} + \big|\hat f^{\mathrm{iso}}_{k^\star_\epsilon}(x_0) - f(x_0)\big|^q\Big). \tag{6.11}$$
The first term on the right-hand side of the last inequality is controlled using the procedure (4.2) to obtain
$$\mathbb E\Big[\big|\hat f^{\mathrm{iso}}_{\hat k}(x_0) - \hat f^{\mathrm{iso}}_{k^\star_\epsilon}(x_0)\big|^q\mathbf 1_{k^\star_\epsilon\le\hat k}\Big] \le \mathbb E\bigg[\frac{15}{2}\big(B_0 + \Delta^{\mathrm{iso}}_\epsilon(n)\big)\sqrt{\frac{\hat V(\hat\lambda_{k^\star_\epsilon})}{n(k^\star_\epsilon)^d}}\bigg]^q.$$
On the event $\bigcap_{k\in\mathcal H^{\mathrm{iso}}_\epsilon}\mathcal A_k$, we get similarly as in (6.8)
$$\mathbb E\Big[\big|\hat f^{\mathrm{iso}}_{\hat k}(x_0) - \hat f^{\mathrm{iso}}_{k^\star_\epsilon}(x_0)\big|^q\mathbf 1_{k^\star_\epsilon\le\hat k}\Big] \le \bigg[45\big(B_0 + \Delta^{\mathrm{iso}}_\epsilon(n)\big)\sqrt{\frac{V(\lambda^\star_{k^\star_\epsilon})}{n(k^\star_\epsilon)^d}}\bigg]^q.$$


More information

Chapter 5 FINITE DIFFERENCE METHOD (FDM)

Chapter 5 FINITE DIFFERENCE METHOD (FDM) MEE7 Computer Modeling Tecniques in Engineering Capter 5 FINITE DIFFERENCE METHOD (FDM) 5. Introduction to FDM Te finite difference tecniques are based upon approximations wic permit replacing differential

More information

Click here to see an animation of the derivative

Click here to see an animation of the derivative Differentiation Massoud Malek Derivative Te concept of derivative is at te core of Calculus; It is a very powerful tool for understanding te beavior of matematical functions. It allows us to optimize functions,

More information

Exam 1 Review Solutions

Exam 1 Review Solutions Exam Review Solutions Please also review te old quizzes, and be sure tat you understand te omework problems. General notes: () Always give an algebraic reason for your answer (graps are not sufficient),

More information

1 The concept of limits (p.217 p.229, p.242 p.249, p.255 p.256) 1.1 Limits Consider the function determined by the formula 3. x since at this point

1 The concept of limits (p.217 p.229, p.242 p.249, p.255 p.256) 1.1 Limits Consider the function determined by the formula 3. x since at this point MA00 Capter 6 Calculus and Basic Linear Algebra I Limits, Continuity and Differentiability Te concept of its (p.7 p.9, p.4 p.49, p.55 p.56). Limits Consider te function determined by te formula f Note

More information

Consider a function f we ll specify which assumptions we need to make about it in a minute. Let us reformulate the integral. 1 f(x) dx.

Consider a function f we ll specify which assumptions we need to make about it in a minute. Let us reformulate the integral. 1 f(x) dx. Capter 2 Integrals as sums and derivatives as differences We now switc to te simplest metods for integrating or differentiating a function from its function samples. A careful study of Taylor expansions

More information

LIMITATIONS OF EULER S METHOD FOR NUMERICAL INTEGRATION

LIMITATIONS OF EULER S METHOD FOR NUMERICAL INTEGRATION LIMITATIONS OF EULER S METHOD FOR NUMERICAL INTEGRATION LAURA EVANS.. Introduction Not all differential equations can be explicitly solved for y. Tis can be problematic if we need to know te value of y

More information

HOMEWORK HELP 2 FOR MATH 151

HOMEWORK HELP 2 FOR MATH 151 HOMEWORK HELP 2 FOR MATH 151 Here we go; te second round of omework elp. If tere are oters you would like to see, let me know! 2.4, 43 and 44 At wat points are te functions f(x) and g(x) = xf(x)continuous,

More information

Math 161 (33) - Final exam

Math 161 (33) - Final exam Name: Id #: Mat 161 (33) - Final exam Fall Quarter 2015 Wednesday December 9, 2015-10:30am to 12:30am Instructions: Prob. Points Score possible 1 25 2 25 3 25 4 25 TOTAL 75 (BEST 3) Read eac problem carefully.

More information

4. The slope of the line 2x 7y = 8 is (a) 2/7 (b) 7/2 (c) 2 (d) 2/7 (e) None of these.

4. The slope of the line 2x 7y = 8 is (a) 2/7 (b) 7/2 (c) 2 (d) 2/7 (e) None of these. Mat 11. Test Form N Fall 016 Name. Instructions. Te first eleven problems are wort points eac. Te last six problems are wort 5 points eac. For te last six problems, you must use relevant metods of algebra

More information

Uniform Convergence Rates for Nonparametric Estimation

Uniform Convergence Rates for Nonparametric Estimation Uniform Convergence Rates for Nonparametric Estimation Bruce E. Hansen University of Wisconsin www.ssc.wisc.edu/~bansen October 2004 Preliminary and Incomplete Abstract Tis paper presents a set of rate

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL IFFERENTIATION FIRST ERIVATIVES Te simplest difference formulas are based on using a straigt line to interpolate te given data; tey use two data pints to estimate te derivative. We assume tat

More information

OSCILLATION OF SOLUTIONS TO NON-LINEAR DIFFERENCE EQUATIONS WITH SEVERAL ADVANCED ARGUMENTS. Sandra Pinelas and Julio G. Dix

OSCILLATION OF SOLUTIONS TO NON-LINEAR DIFFERENCE EQUATIONS WITH SEVERAL ADVANCED ARGUMENTS. Sandra Pinelas and Julio G. Dix Opuscula Mat. 37, no. 6 (2017), 887 898 ttp://dx.doi.org/10.7494/opmat.2017.37.6.887 Opuscula Matematica OSCILLATION OF SOLUTIONS TO NON-LINEAR DIFFERENCE EQUATIONS WITH SEVERAL ADVANCED ARGUMENTS Sandra

More information

A = h w (1) Error Analysis Physics 141

A = h w (1) Error Analysis Physics 141 Introduction In all brances of pysical science and engineering one deals constantly wit numbers wic results more or less directly from experimental observations. Experimental observations always ave inaccuracies.

More information

Chapter 2. Limits and Continuity 16( ) 16( 9) = = 001. Section 2.1 Rates of Change and Limits (pp ) Quick Review 2.1

Chapter 2. Limits and Continuity 16( ) 16( 9) = = 001. Section 2.1 Rates of Change and Limits (pp ) Quick Review 2.1 Capter Limits and Continuity Section. Rates of Cange and Limits (pp. 969) Quick Review..... f ( ) ( ) ( ) 0 ( ) f ( ) f ( ) sin π sin π 0 f ( ). < < < 6. < c c < < c 7. < < < < < 8. 9. 0. c < d d < c

More information

Math 102 TEST CHAPTERS 3 & 4 Solutions & Comments Fall 2006

Math 102 TEST CHAPTERS 3 & 4 Solutions & Comments Fall 2006 Mat 102 TEST CHAPTERS 3 & 4 Solutions & Comments Fall 2006 f(x+) f(x) 10 1. For f(x) = x 2 + 2x 5, find ))))))))) and simplify completely. NOTE: **f(x+) is NOT f(x)+! f(x+) f(x) (x+) 2 + 2(x+) 5 ( x 2

More information

Convexity and Smoothness

Convexity and Smoothness Capter 4 Convexity and Smootness 4. Strict Convexity, Smootness, and Gateaux Di erentiablity Definition 4... Let X be a Banac space wit a norm denoted by k k. A map f : X \{0}!X \{0}, f 7! f x is called

More information

1 Calculus. 1.1 Gradients and the Derivative. Q f(x+h) f(x)

1 Calculus. 1.1 Gradients and the Derivative. Q f(x+h) f(x) Calculus. Gradients and te Derivative Q f(x+) δy P T δx R f(x) 0 x x+ Let P (x, f(x)) and Q(x+, f(x+)) denote two points on te curve of te function y = f(x) and let R denote te point of intersection of

More information

How to Find the Derivative of a Function: Calculus 1

How to Find the Derivative of a Function: Calculus 1 Introduction How to Find te Derivative of a Function: Calculus 1 Calculus is not an easy matematics course Te fact tat you ave enrolled in suc a difficult subject indicates tat you are interested in te

More information

The derivative function

The derivative function Roberto s Notes on Differential Calculus Capter : Definition of derivative Section Te derivative function Wat you need to know already: f is at a point on its grap and ow to compute it. Wat te derivative

More information

= 0 and states ''hence there is a stationary point'' All aspects of the proof dx must be correct (c)

= 0 and states ''hence there is a stationary point'' All aspects of the proof dx must be correct (c) Paper 1: Pure Matematics 1 Mark Sceme 1(a) (i) (ii) d d y 3 1x 4x x M1 A1 d y dx 1.1b 1.1b 36x 48x A1ft 1.1b Substitutes x = into teir dx (3) 3 1 4 Sows d y 0 and states ''ence tere is a stationary point''

More information

Polynomial Functions. Linear Functions. Precalculus: Linear and Quadratic Functions

Polynomial Functions. Linear Functions. Precalculus: Linear and Quadratic Functions Concepts: definition of polynomial functions, linear functions tree representations), transformation of y = x to get y = mx + b, quadratic functions axis of symmetry, vertex, x-intercepts), transformations

More information

MIXED DISCONTINUOUS GALERKIN APPROXIMATION OF THE MAXWELL OPERATOR. SIAM J. Numer. Anal., Vol. 42 (2004), pp

MIXED DISCONTINUOUS GALERKIN APPROXIMATION OF THE MAXWELL OPERATOR. SIAM J. Numer. Anal., Vol. 42 (2004), pp MIXED DISCONTINUOUS GALERIN APPROXIMATION OF THE MAXWELL OPERATOR PAUL HOUSTON, ILARIA PERUGIA, AND DOMINI SCHÖTZAU SIAM J. Numer. Anal., Vol. 4 (004), pp. 434 459 Abstract. We introduce and analyze a

More information

Subdifferentials of convex functions

Subdifferentials of convex functions Subdifferentials of convex functions Jordan Bell jordan.bell@gmail.com Department of Matematics, University of Toronto April 21, 2014 Wenever we speak about a vector space in tis note we mean a vector

More information

Financial Econometrics Prof. Massimo Guidolin

Financial Econometrics Prof. Massimo Guidolin CLEFIN A.A. 2010/2011 Financial Econometrics Prof. Massimo Guidolin A Quick Review of Basic Estimation Metods 1. Were te OLS World Ends... Consider two time series 1: = { 1 2 } and 1: = { 1 2 }. At tis

More information

Practice Problem Solutions: Exam 1

Practice Problem Solutions: Exam 1 Practice Problem Solutions: Exam 1 1. (a) Algebraic Solution: Te largest term in te numerator is 3x 2, wile te largest term in te denominator is 5x 2 3x 2 + 5. Tus lim x 5x 2 2x 3x 2 x 5x 2 = 3 5 Numerical

More information

Regularized Regression

Regularized Regression Regularized Regression David M. Blei Columbia University December 5, 205 Modern regression problems are ig dimensional, wic means tat te number of covariates p is large. In practice statisticians regularize

More information

MATH745 Fall MATH745 Fall

MATH745 Fall MATH745 Fall MATH745 Fall 5 MATH745 Fall 5 INTRODUCTION WELCOME TO MATH 745 TOPICS IN NUMERICAL ANALYSIS Instructor: Dr Bartosz Protas Department of Matematics & Statistics Email: bprotas@mcmasterca Office HH 36, Ext

More information

2.3 Algebraic approach to limits

2.3 Algebraic approach to limits CHAPTER 2. LIMITS 32 2.3 Algebraic approac to its Now we start to learn ow to find its algebraically. Tis starts wit te simplest possible its, and ten builds tese up to more complicated examples. Fact.

More information

4.2 - Richardson Extrapolation

4.2 - Richardson Extrapolation . - Ricardson Extrapolation. Small-O Notation: Recall tat te big-o notation used to define te rate of convergence in Section.: Definition Let x n n converge to a number x. Suppose tat n n is a sequence

More information

arxiv:math/ v1 [math.ca] 1 Oct 2003

arxiv:math/ v1 [math.ca] 1 Oct 2003 arxiv:mat/0310017v1 [mat.ca] 1 Oct 2003 Cange of Variable for Multi-dimensional Integral 4 Marc 2003 Isidore Fleiscer Abstract Te cange of variable teorem is proved under te sole ypotesis of differentiability

More information

3.4 Worksheet: Proof of the Chain Rule NAME

3.4 Worksheet: Proof of the Chain Rule NAME Mat 1170 3.4 Workseet: Proof of te Cain Rule NAME Te Cain Rule So far we are able to differentiate all types of functions. For example: polynomials, rational, root, and trigonometric functions. We are

More information

Finite Difference Methods Assignments

Finite Difference Methods Assignments Finite Difference Metods Assignments Anders Söberg and Aay Saxena, Micael Tuné, and Maria Westermarck Revised: Jarmo Rantakokko June 6, 1999 Teknisk databeandling Assignment 1: A one-dimensional eat equation

More information

5 Ordinary Differential Equations: Finite Difference Methods for Boundary Problems

5 Ordinary Differential Equations: Finite Difference Methods for Boundary Problems 5 Ordinary Differential Equations: Finite Difference Metods for Boundary Problems Read sections 10.1, 10.2, 10.4 Review questions 10.1 10.4, 10.8 10.9, 10.13 5.1 Introduction In te previous capters we

More information

. If lim. x 2 x 1. f(x+h) f(x)

. If lim. x 2 x 1. f(x+h) f(x) Review of Differential Calculus Wen te value of one variable y is uniquely determined by te value of anoter variable x, ten te relationsip between x and y is described by a function f tat assigns a value

More information

Nonparametric regression on functional data: inference and practical aspects

Nonparametric regression on functional data: inference and practical aspects arxiv:mat/0603084v1 [mat.st] 3 Mar 2006 Nonparametric regression on functional data: inference and practical aspects Frédéric Ferraty, André Mas, and Pilippe Vieu August 17, 2016 Abstract We consider te

More information

Chapter Seven The Quantum Mechanical Simple Harmonic Oscillator

Chapter Seven The Quantum Mechanical Simple Harmonic Oscillator Capter Seven Te Quantum Mecanical Simple Harmonic Oscillator Introduction Te potential energy function for a classical, simple armonic oscillator is given by ZÐBÑ œ 5B were 5 is te spring constant. Suc

More information

A Goodness-of-fit test for GARCH innovation density. Hira L. Koul 1 and Nao Mimoto Michigan State University. Abstract

A Goodness-of-fit test for GARCH innovation density. Hira L. Koul 1 and Nao Mimoto Michigan State University. Abstract A Goodness-of-fit test for GARCH innovation density Hira L. Koul and Nao Mimoto Micigan State University Abstract We prove asymptotic normality of a suitably standardized integrated square difference between

More information

Math Spring 2013 Solutions to Assignment # 3 Completion Date: Wednesday May 15, (1/z) 2 (1/z 1) 2 = lim

Math Spring 2013 Solutions to Assignment # 3 Completion Date: Wednesday May 15, (1/z) 2 (1/z 1) 2 = lim Mat 311 - Spring 013 Solutions to Assignment # 3 Completion Date: Wednesday May 15, 013 Question 1. [p 56, #10 (a)] 4z Use te teorem of Sec. 17 to sow tat z (z 1) = 4. We ave z 4z (z 1) = z 0 4 (1/z) (1/z

More information

Nonlinear elliptic-parabolic problems

Nonlinear elliptic-parabolic problems Nonlinear elliptic-parabolic problems Inwon C. Kim and Norbert Požár Abstract We introduce a notion of viscosity solutions for a general class of elliptic-parabolic pase transition problems. Tese include

More information

Polynomial Interpolation

Polynomial Interpolation Capter 4 Polynomial Interpolation In tis capter, we consider te important problem of approximatinga function fx, wose values at a set of distinct points x, x, x,, x n are known, by a polynomial P x suc

More information

Approximation of the Viability Kernel

Approximation of the Viability Kernel Approximation of te Viability Kernel Patrick Saint-Pierre CEREMADE, Université Paris-Daupine Place du Marécal de Lattre de Tassigny 75775 Paris cedex 16 26 october 1990 Abstract We study recursive inclusions

More information

A Jump-Preserving Curve Fitting Procedure Based On Local Piecewise-Linear Kernel Estimation

A Jump-Preserving Curve Fitting Procedure Based On Local Piecewise-Linear Kernel Estimation A Jump-Preserving Curve Fitting Procedure Based On Local Piecewise-Linear Kernel Estimation Peiua Qiu Scool of Statistics University of Minnesota 313 Ford Hall 224 Curc St SE Minneapolis, MN 55455 Abstract

More information

Numerical Differentiation

Numerical Differentiation Numerical Differentiation Finite Difference Formulas for te first derivative (Using Taylor Expansion tecnique) (section 8.3.) Suppose tat f() = g() is a function of te variable, and tat as 0 te function

More information

Lecture 15. Interpolation II. 2 Piecewise polynomial interpolation Hermite splines

Lecture 15. Interpolation II. 2 Piecewise polynomial interpolation Hermite splines Lecture 5 Interpolation II Introduction In te previous lecture we focused primarily on polynomial interpolation of a set of n points. A difficulty we observed is tat wen n is large, our polynomial as to

More information

Precalculus Test 2 Practice Questions Page 1. Note: You can expect other types of questions on the test than the ones presented here!

Precalculus Test 2 Practice Questions Page 1. Note: You can expect other types of questions on the test than the ones presented here! Precalculus Test 2 Practice Questions Page Note: You can expect oter types of questions on te test tan te ones presented ere! Questions Example. Find te vertex of te quadratic f(x) = 4x 2 x. Example 2.

More information

Test 2 Review. 1. Find the determinant of the matrix below using (a) cofactor expansion and (b) row reduction. A = 3 2 =

Test 2 Review. 1. Find the determinant of the matrix below using (a) cofactor expansion and (b) row reduction. A = 3 2 = Test Review Find te determinant of te matrix below using (a cofactor expansion and (b row reduction Answer: (a det + = (b Observe R R R R R R R R R Ten det B = (((det Hence det Use Cramer s rule to solve:

More information

arxiv: v1 [math.pr] 28 Dec 2018

arxiv: v1 [math.pr] 28 Dec 2018 Approximating Sepp s constants for te Slepian process Jack Noonan a, Anatoly Zigljavsky a, a Scool of Matematics, Cardiff University, Cardiff, CF4 4AG, UK arxiv:8.0v [mat.pr] 8 Dec 08 Abstract Slepian

More information

The Complexity of Computing the MCD-Estimator

The Complexity of Computing the MCD-Estimator Te Complexity of Computing te MCD-Estimator Torsten Bernolt Lerstul Informatik 2 Universität Dortmund, Germany torstenbernolt@uni-dortmundde Paul Fiscer IMM, Danisc Tecnical University Kongens Lyngby,

More information

Preface. Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

Preface. Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed. Preface Here are my online notes for my course tat I teac ere at Lamar University. Despite te fact tat tese are my class notes, tey sould be accessible to anyone wanting to learn or needing a refreser

More information

lecture 26: Richardson extrapolation

lecture 26: Richardson extrapolation 43 lecture 26: Ricardson extrapolation 35 Ricardson extrapolation, Romberg integration Trougout numerical analysis, one encounters procedures tat apply some simple approximation (eg, linear interpolation)

More information

158 Calculus and Structures

158 Calculus and Structures 58 Calculus and Structures CHAPTER PROPERTIES OF DERIVATIVES AND DIFFERENTIATION BY THE EASY WAY. Calculus and Structures 59 Copyrigt Capter PROPERTIES OF DERIVATIVES. INTRODUCTION In te last capter you

More information

3.1 Extreme Values of a Function

3.1 Extreme Values of a Function .1 Etreme Values of a Function Section.1 Notes Page 1 One application of te derivative is finding minimum and maimum values off a grap. In precalculus we were only able to do tis wit quadratics by find

More information

Bob Brown Math 251 Calculus 1 Chapter 3, Section 1 Completed 1 CCBC Dundalk

Bob Brown Math 251 Calculus 1 Chapter 3, Section 1 Completed 1 CCBC Dundalk Bob Brown Mat 251 Calculus 1 Capter 3, Section 1 Completed 1 Te Tangent Line Problem Te idea of a tangent line first arises in geometry in te context of a circle. But before we jump into a discussion of

More information

Semigroups of Operators

Semigroups of Operators Lecture 11 Semigroups of Operators In tis Lecture we gater a few notions on one-parameter semigroups of linear operators, confining to te essential tools tat are needed in te sequel. As usual, X is a real

More information

IEOR 165 Lecture 10 Distribution Estimation

IEOR 165 Lecture 10 Distribution Estimation IEOR 165 Lecture 10 Distribution Estimation 1 Motivating Problem Consider a situation were we ave iid data x i from some unknown distribution. One problem of interest is estimating te distribution tat

More information

NADARAYA WATSON ESTIMATE JAN 10, 2006: version 2. Y ik ( x i

NADARAYA WATSON ESTIMATE JAN 10, 2006: version 2. Y ik ( x i NADARAYA WATSON ESTIMATE JAN 0, 2006: version 2 DATA: (x i, Y i, i =,..., n. ESTIMATE E(Y x = m(x by n i= ˆm (x = Y ik ( x i x n i= K ( x i x EXAMPLES OF K: K(u = I{ u c} (uniform or box kernel K(u = u

More information

The complex exponential function

The complex exponential function Capter Te complex exponential function Tis is a very important function!. Te series For any z C, we define: exp(z) := n! zn = + z + z2 2 + z3 6 + z4 24 + On te closed disk D(0,R) := {z C z R}, one as n!

More information

The Laplace equation, cylindrically or spherically symmetric case

The Laplace equation, cylindrically or spherically symmetric case Numerisce Metoden II, 7 4, und Übungen, 7 5 Course Notes, Summer Term 7 Some material and exercises Te Laplace equation, cylindrically or sperically symmetric case Electric and gravitational potential,

More information

ch (for some fixed positive number c) reaching c

ch (for some fixed positive number c) reaching c GSTF Journal of Matematics Statistics and Operations Researc (JMSOR) Vol. No. September 05 DOI 0.60/s4086-05-000-z Nonlinear Piecewise-defined Difference Equations wit Reciprocal and Cubic Terms Ramadan

More information

Finite Difference Method

Finite Difference Method Capter 8 Finite Difference Metod 81 2nd order linear pde in two variables General 2nd order linear pde in two variables is given in te following form: L[u] = Au xx +2Bu xy +Cu yy +Du x +Eu y +Fu = G According

More information

1 1. Rationalize the denominator and fully simplify the radical expression 3 3. Solution: = 1 = 3 3 = 2

1 1. Rationalize the denominator and fully simplify the radical expression 3 3. Solution: = 1 = 3 3 = 2 MTH - Spring 04 Exam Review (Solutions) Exam : February 5t 6:00-7:0 Tis exam review contains questions similar to tose you sould expect to see on Exam. Te questions included in tis review, owever, are

More information

Functions of the Complex Variable z

Functions of the Complex Variable z Capter 2 Functions of te Complex Variable z Introduction We wis to examine te notion of a function of z were z is a complex variable. To be sure, a complex variable can be viewed as noting but a pair of

More information

arxiv: v1 [math.dg] 4 Feb 2015

arxiv: v1 [math.dg] 4 Feb 2015 CENTROID OF TRIANGLES ASSOCIATED WITH A CURVE arxiv:1502.01205v1 [mat.dg] 4 Feb 2015 Dong-Soo Kim and Dong Seo Kim Abstract. Arcimedes sowed tat te area between a parabola and any cord AB on te parabola

More information

1 Introduction to Optimization

1 Introduction to Optimization Unconstrained Convex Optimization 2 1 Introduction to Optimization Given a general optimization problem of te form min x f(x) (1.1) were f : R n R. Sometimes te problem as constraints (we are only interested

More information

SECTION 3.2: DERIVATIVE FUNCTIONS and DIFFERENTIABILITY

SECTION 3.2: DERIVATIVE FUNCTIONS and DIFFERENTIABILITY (Section 3.2: Derivative Functions and Differentiability) 3.2.1 SECTION 3.2: DERIVATIVE FUNCTIONS and DIFFERENTIABILITY LEARNING OBJECTIVES Know, understand, and apply te Limit Definition of te Derivative

More information

Copyright c 2008 Kevin Long

Copyright c 2008 Kevin Long Lecture 4 Numerical solution of initial value problems Te metods you ve learned so far ave obtained closed-form solutions to initial value problems. A closedform solution is an explicit algebriac formula

More information

Function Composition and Chain Rules

Function Composition and Chain Rules Function Composition and s James K. Peterson Department of Biological Sciences and Department of Matematical Sciences Clemson University Marc 8, 2017 Outline 1 Function Composition and Continuity 2 Function

More information

1. Consider the trigonometric function f(t) whose graph is shown below. Write down a possible formula for f(t).

1. Consider the trigonometric function f(t) whose graph is shown below. Write down a possible formula for f(t). . Consider te trigonometric function f(t) wose grap is sown below. Write down a possible formula for f(t). Tis function appears to be an odd, periodic function tat as been sifted upwards, so we will use

More information

Sin, Cos and All That

Sin, Cos and All That Sin, Cos and All Tat James K. Peterson Department of Biological Sciences and Department of Matematical Sciences Clemson University Marc 9, 2017 Outline Sin, Cos and all tat! A New Power Rule Derivatives

More information

(a) At what number x = a does f have a removable discontinuity? What value f(a) should be assigned to f at x = a in order to make f continuous at a?

(a) At what number x = a does f have a removable discontinuity? What value f(a) should be assigned to f at x = a in order to make f continuous at a? Solutions to Test 1 Fall 016 1pt 1. Te grap of a function f(x) is sown at rigt below. Part I. State te value of eac limit. If a limit is infinite, state weter it is or. If a limit does not exist (but is

More information

Superconvergence of energy-conserving discontinuous Galerkin methods for. linear hyperbolic equations. Abstract

Superconvergence of energy-conserving discontinuous Galerkin methods for. linear hyperbolic equations. Abstract Superconvergence of energy-conserving discontinuous Galerkin metods for linear yperbolic equations Yong Liu, Ci-Wang Su and Mengping Zang 3 Abstract In tis paper, we study superconvergence properties of

More information