arxiv: v1 [stat.me] 21 Mar 2016

Size: px
Start display at page:

Download "arxiv: v1 [stat.me] 21 Mar 2016"

Transcription

1 Sharp sup-norm Bayesian curve estimation Catia Scricciolo Department of Economics, University of Verona, Via Cantarane 24, Verona, Italy arxiv: v1 [stat.me] 21 Mar 216 Abstract Sup-norm curve estimation is a fundamental statistical problem and, in principle, a premise for the construction of confidence bands for infinite-dimensional parameters. In a Bayesian framework, the issue of whether the sup-normconcentration-of-posterior-measure approach proposed by Giné and Nickl (211), which involves solving a testing problem exploiting concentration properties of kernel and projection-type density estimators around their expectations, can yield minimax-optimal rates is herein settled in the affirmative beyond conjugate-prior settings obtaining sharp rates for common prior-model pairs like random histograms, Dirichlet Gaussian or Laplace mixtures, which can be employed for density, regression or quantile estimation. Keywords: McDiarmind s inequality, Nonparametric hypothesis testing, Posterior distributions, Sup-norm rates 1. Introduction The study of the frequentist asymptotic behaviour of Bayesian nonparametric (BNP) procedures has initially focused on the Hellinger or L 1 -distance loss, see Shen and Wasserman (21) and Ghosal et al. (2), but an extension and generalization of the results to L r -distance losses, 1 r, has been the object of two recent contributions by Giné and Nickl (211) and Castillo (214). Sup-norm estimation has particularly attracted attention as it may constitute the premise for the construction of confidence bands whose geometric structure can be easily visualized and interpreted. Furthermore, as shown in the example of Section 3.2, the study of sup-norm posterior contraction rates for density estimation can be motivated as being an intermediate step for the final assessment of convergence rates for quantile estimation. While the contribution of Castillo (214) has a more prior-model specific flavour, the article by Giné and Nickl (211) aims at a unified understanding of the drivers of the asymptotic behaviour of BNP procedures by developing a new approach to the involved testing problem constructing nonparametric tests that have good exponential bounds on the type-one and type-two error probabilities that rely on concentration properties of kernel and projection-type density estimators around their expectations. Even if Giné and Nickl (211) s approach can only be useful if a fine control of the approximation properties of the prior support is possible, it has the merit of replacing the entropy condition for sieve sets with approximating conditions. However, the result, as presented in their Theorem 2 (Theorem 3), can only retrieve minimax-optimal rates for L r -losses when 1 r 2, while rates deteriorate by a genuine power of n, in fact n 1/2, for r>2. Thus, the open question remains whether their approach can give the right rates for 2<r for non-conjugate priors and sub-optimal rates are possibly only an artifact of the proof. We herein settle this issue in the affirmative by refining their result and proof and showing in concrete examples that this approach retrieves the right rates. The paper is organized as follows. In Section 2, we state the main result whose proof is postponed to Appendix A. Examples concerning different statistical settings like density and quantile estimation are presented in Section 3. Corresponding author. address: catia.scricciolo@univr.it (Catia Scricciolo)

2 2. Main result In this section, we describe the set-up and present the main contribution of this note. Let ( (X,A, P), P P ) be a collection of probability measures on a measurable space (X, A) that possess densities with respect to some σ-finite dominating measureµ. LetΠ n be a sequence of priors on (P,B), wherebis aσ-field onpfor which the maps x p(x) are jointly measurable relative toa B. Let X 1,..., X n be i.i.d. (independent, identically distributed) observations from a common law P P with density p onxwith respect toµ, p = dp /dµ. For a probability measure P on (X,A) and ana-measurable function f :X R k, k 1, let P f denote the integral f dp, where, unless otherwise specified, the set of integration is understood to be the whole domain. When this notation is applied to the empirical measure P n associated with a sample X (n) := (X 1,..., X n ), namely the discrete uniform measure on the sample values, this yields P n f = n 1 n i=1 f (X i ). For each n N, let ˆp n ( j)( )=n 1 n i=1 K j (, X i ) be a kernel or projection-type density estimator based on X 1,..., X n at resolution level j, with K j as in Definition (1) below. Its expectation is then equal to P n ˆp n( j)( )=P K j (, X 1 )=K j (p )( ), where we have used the notation K j (p )( )= K j (, y)p (y) dy. In order to refine Giné and Nickl (211) s result, we use concentration properties of ˆp n ( j) K j (p ) 1 around its expectation by applying McDiarmind s inequality for bounded differences functions. The following definition, which corresponds to Condition in Giné and Nickl (215), is essential for the main result. Definition 1. LetX=R,X=[, 1] orx=(, 1]. The sequence of operators K j (x, y) := 2 j K(2 j x, 2 j y), x, y X, j, is called an admissible approximating sequence if it satisfies one of the following conditions: a) convolution kernel case,x=r: K(x, y)=k(x y), where K L 1 (R) L (R), integrates to 1 and is of bounded p-variation for some finite p 1 and right (left)-continuous; b) multi-resolution projection case,x=r: K(x, y)= k Zφ(x k)φ(y k), with K j as above or K j (x, y)= K(x, y)+ j 1 l= kψ lk (x)ψ lk (y), whereφ,ψ L 1 (R) L (R) define an S -regular wavelet basis, have bounded p-variation for some p 1 and are uniformly continuous, or define the Haar basis, see Chapter 4, ibidem; c) multi-resolution case,x=[, 1]: K j,bc (x, y) is the projection kernel at resolution j of a Cohen-Daubechies-Vial (CDV) wavelet basis, see Chapter 4, ibidem; d) multi-resolution case,x=(, 1]: K j,per (x, y) is the projection kernel at resolution j of the periodization of a scaling function satisfying b), see (4.126) and (4.127), ibidem. Remark 1. A useful property of S -regular wavelet bases is the following: there exists a non-negative measurable functionφ L 1 (R) L (R) such that K(x, y) Φ( x y ) for all x, y R, that is, K is dominated by a bounded and integrable convolution kernel Φ. In order to state the main result, we recall that a sequence of positive real numbers L n is slowly varying at if, for eachλ>, it holds that lim n (L [λn] /L n )=1. Also, for s, let L 1 (µ s ) be the space ofµ s -integrable functions, dµ s (x) := (1+ x ) s dx, equipped with the norm f L1 (µ s ) := f (x) (1+ x ) s dx. Theorem 1. Letǫ n and J n be sequences of positive real numbers such thatǫ n, nǫn 2 and 2 J n = O(nǫn). 2 For each r {1, } and a slowly varying sequence L n,r, letǫ n,r := L n,r ǫ n. Suppose that, for K as in Definition (1), with K 2,Φ 2 and p that integrate (1+ x ) s for some s>1in cases a) and b), K Jn (p ) p r = O(ǫ n,r ) (1) and, for a constant C>, setsp n {P P: K Jn (p) p r C K ǫ n,r }, where C K > only depends on K, we have (i)π n (P\P n ) exp ( (C+ 4)nǫ 2 n), (ii)π n ( P P: P log(p/p ) ǫ 2 n, P log 2 (p/p ) ǫ 2 n) exp ( Cnǫ 2 n ). 2

3 Then, for sufficiently large M r >, P n Π n( P P: p p r M r ǫ n,r X (n)). (2) If the convergence in (2) holds for r {1, }, then, for each 1< s<. P n Π n( P P: p p s M s ǫ n X (n)), where ǫ n := (L n,1 L n, )ǫ n. The assertion, whose proof is reported in Appendix A, is an in-probability statement that the posterior mass outside a sup-norm ball of radius a large multiple M ofǫ n is negligible. The theorem provides the same sufficient conditions for deriving sup-norm posterior contraction rates that are minimax-optimal, up to logarithmic factors, as in Giné and Nickl (211). Condition (ii), which is mutuated from Ghosal et al. (2), is the essential one: the prior concentration rate is the only determinant of the posterior contraction rate at densities p having sup-norm approximation error of the same order against a kernel-type approximant, provided the prior support is almost the set of densities with the same approximation property. 3. Examples In this section, we apply Theorem 1 to some prior-model pairs used for (conditional) density or regression estimation, including random histograms, Dirichlet Gaussian or Laplace mixtures, that have been selected in an attempt to reflect cases for which the issue of obtaining sup-norm posterior rates was still open. We do not consider Gaussian priors or wavelets series because these examples have been successfully worked out in Castillo (214) taking a different approach. We furthermore exhibit an example with the aim of illustrating that obtaining sup-norm posterior contraction rates for density estimation can be motivated as being an intermediate step for the final assessment of convergence rates for estimating single quantiles Density estimation Example 1 (Random dyadic histograms). For J n N, consider a partition of [, 1] into 2 J n intervals (bins) of equal length A 1,2 Jn= [, 2 J n ] and A j,2 Jn= (( j 1)2 J n, j2 J n ], j=2,..., 2 J n. Let Dir 2 Jn denote the Dirichlet distribution on the (2 J n 1)-dimensional unit simplex with all parameters equal to 1. Consider the random histogram 2 Jn w j,2 Jn 2 J n 1 A ( ), (w j,2 Jn 1,2 Jn,..., w 2 Jn,2 Jn ) Dir 2 Jn. j=1 Denote byπ 2 Jn the induced law on the space of probability measures with Lebesgue density on [, 1]. Let X 1,..., X n be i.i.d. observations from a density p on [, 1]. Then, the Bayes density estimator, that is the posterior expected histogram, has expression ˆp n (x)= 2 Jn j=1 1+N l(x) 2 J 2 J n 1 A n + n j,2 (x), x [, 1], Jn where l(x) identifies the bin containing x, i.e., A l(x),2 Jn x, and N l(x) stands for the number of observations falling into A l(x),2 Jn. LetC α ([, 1]) denote the class of Hölder continuous functions on [, 1] with exponentα>. Let ǫ n,α := ( n/ log n ) α/(2α+1) be the minimax rate of convergence over (C α ([, 1]), ). Proposition 1. Let X 1,..., X n be i.i.d. observations from a density p C α ([, 1]), withα (, 1], satisfying p > on [, 1]. Let J n be such that 2 J n ǫn,α 1/α. Then, for sufficiently large M>, P n Π 2 Jn (P : p p Mǫ n,α X (n) ). Consequently, P n ˆp n p ǫ n,α. The first part of the assertion, which concerns posterior contraction rates, immediately follows from Theorem (1) combined with the proof of Proposition 3 of Giné and Nickl (211), whose result, together with that of Theorem 3 in Castillo (214), is herein improved to the minimax-optimal rate ( n/ log n ) α/(2α+1) for every <α 1. The second part of the assertion, which concerns convergence rates for the histogram density estimator, is a consequence of Jensen s inequality and convexity of p p p, combined with the fact that the priorπ 2 Jn is supported on densities uniformly bounded above by 2 J n and that the proof of Theorem 1 yields the exponential order exp ( Bnǫn,α) 2 for the convergence of the posterior probability of the complement of an (Mǫ n,α )-ball around p, in symbols, P n ˆp n p < Mǫ n,α + 2 J n P n Π 2 Jn (P : p p Mǫ n,α X (n) ) Mǫ n,α + 2 J n exp ( Bnǫn,α), 2 whence P n ˆp n p = O(ǫ n,α ). 3

4 Example 2 (Dirichlet-Laplace mixtures). Consider, as in Scricciolo (211), Gao and van der Vaart (215), a Laplace mixture priorπthus defined. Forϕ(x) := 1 2 exp ( x ), x R, the density of a Laplace (, 1) distribution, let p G ( ) := ϕ( θ) dg(θ) denote a mixture of Laplace densities with mixing distribution G, G D α, the Dirichlet process with base measureα :=α R ᾱ, for <α R < andᾱaprobability measure onr. Proposition 2. Let X 1,..., X n be i.i.d. observations from a density p G, with G supported on a compact interval [ a, a]. Ifαhas support on [ a, a] with continuous Lebesgue density bounded below away from and above from, then, for sufficiently large M>, P n Π(P : p p M(n/ log n) 3/8 X (n) ). Consequently, for the Bayes estimator ˆp n ( )= p G ( )Π(dG X (n) ) we have P n ˆp n p (n/ log n) 3/8. Proof. It is known from Proposition 4 in Gao and van der Vaart (215) that the small-ball probability estimate in condition (ii) of Theorem 1 is satisfied forǫ n = (n/ log n) 3/8. For the bias condition, we takep n to be the support ofπand show that, for 2 J n ǫn 1/3 = (n/ log n) 1/8 and any symmetric density K with finite second moment, we have K Jn (p G ) p G = O(ǫ n ) uniformly over the support ofπ. Indeed, by applying Lemma 1 withβ=2, for each x R it results K Jn (p G )(x) p G (x) 2 K Jn (p G ) p G 2 2 ϕ(t) 2 K(2 J n t) 1 2 dt (2π) 1 (B 2 ϕ I 2 [ K])(2 2J n ) 3, which implies that both conditions (1) and (i) are satisfied. The assertion on the Bayes estimator follows from the same arguments laid out for random histograms together with the fact that p G 1/2 uniformly in G. Example 3 (Dirichlet-Gaussian mixtures). Consider, as in Ghosal and van der Vaart (21, 27), Shen et al. (213), Scricciolo (214), a Gaussian mixture prior Π G thus defined. For φ the standard normal density, let p F,σ ( ) := φ σ ( θ) df(θ) denote a mixture of Gaussian densities with mixing distribution F, F D α, the Dirichlet process with base measureα :=α R ᾱ, for <α R < andᾱaprobability measure on R, which has continuous and positive densityα (θ) e b θ δ as θ, for some constants <b< and <δ 2, σ G which has continuous and positive density g on (, ) such that, for constants <C 1, C 2, D 1, D 2 <, s, t<, C 1 σ s exp ( D 1 σ 1 log t (1/σ) g(σ) C 2 σ s exp ( D 2 σ 1 log t (1/σ)) for allσin a neighborhood of. LetC β (R) denote the class of Hölder continuous functions onrwith exponentβ>. Letǫ n,β := ( n/ log n ) β/(2β+1) be the minimax rate of convergence over (C β (R), ). For any realβ>, let β stand for the largest integer strictly smaller thanβ. Proposition 3. Let X 1,..., X n be i.i.d. observations from a density p L (R) C β (R) such that condition (ii) is satisfied forǫ n,β. Then, for sufficiently large M>, P n (Π G)((F,σ) : p F,σ p Mǫ n,β X (n) ). Proof. Let K L 1 (R) be a convolution kernel such that x k K(x) dx=1 {} (k), k=,..., β, and x β K(x) dx<, the Fourier transform K has supp( K) [ 1, 1]. Let 2 J n ǫ 1/β n,β. For every x R, K J n (p )(x) p (x) C 1 (2 J n ) β ǫ n,β, where the constant C 1 (1/ β!) x β K(x) dx does not depend on x. Thus, K Jn (p ) p = O(ǫ n,β ). For the bias condition, letσ n := E(nǫn,β 2 ) 1 (log n) ψ, with 1/2<ψ<tand a suitable constant <E<. For everyσ σ n and uniformly in F, K Jn (p F,σ ) p F,σ = sup K Jn (u)[φ σ (x v u) φ σ (x v)] du df(v) x R 1 2π sup e it(x v) φ σ (t) K(2 J n t) 1 dt df(v) x R 1 φ σ (t) dt σ 1 n π exp ( (ρσ n 2J n ) 2 ) n 1 <ε n,β t >2 Jn because (σ n 2 J n ) 2 (log n) 2ψ (log n) asψ > 1/2. Now, G(σ < σ n ) σ s n exp ( [D 2σ 1 n logt (1/σ n )]) exp ( (C+ 4)nǫn) 2 becauseψ<t, which implies that the remaining mass condition (ii) is satisfied. 4

5 Remark 2. Conditions on the density p under which assumption (ii) of Theorem 1 is satisfied can be found, for instance, in Shen et al. (213) and Scricciolo (214) Quantile estimation Forτ (, 1), consider the problem of estimating theτ-quantile q τ of the population distribution function F from observations X 1,..., X n. For any (possibly unbounded) interval I R and function g on I, define the Hölder norm as α g Cα (I) := g (k) g α (x) g α (y) L (I)+ sup. x, y I: x y x y α α k= LetC (I) denote the space of continuous and bounded functions on I andc α (I, R) :={g C (I) : g Cα (I) R}, R>. Proposition 4. Suppose that, givenτ (, 1), there are constants r,ζ> so that p ( +q τ ) Cα ([ ζ,ζ], R) and inf [q τ ζ, qτ +ζ] p (x) r. (3) Consider a priorπconcentrated on probability measures having densities p( +q τ ) Cα ([ ζ,ζ], R). If, for sufficiently large M, the posterior probability P n Π(P : p p Mǫ n,α X (n) ), then, there exists M > so that P n Π( qτ q τ M ǫ 1+1/α n,α X (n) ). Proof. We preliminarily make the following remark. Let F(x) := x p(y) dy, x R. Forτ (, 1), let qτ be theτquantile of F. By Lagrange s theorem, there exists a point q τ between qτ and q τ so that F(qτ ) F(q τ )= p(qτ )(qτ q τ ). Consequently, =τ τ= If p(q τ )>, then q q τ p(x) dx p (x) dx= p(x) dx+ q τ τ q τ = [F(qτ ) F (q τ )] p(q τ ) [p(x) p (x)] dx= p(q τ )(qτ q τ )+[F(qτ ) F (q τ )]. = [F(qτ ) τ] p(q τ. (4) ) In order to upper bound q τ q τ, by appealing to relationship (4), we can separately control F(qτ ) F (q τ ) and p(qτ ). Let the kernel function K L 1 (R) be such that x k K(x)dx=1 {} (k), k=,..., α +1, and x α+1 K(x) dx<, its Fourier transform K has supp( K) [ 1, 1]. By Lemma 5.2 in Dattner et al. (213), sup p ( +q τ ) Cα ([ ζ,ζ], R) with D := [R/( α +1)!+ 2ζ (α+1) ] x α+1 K(x) dx. Write F(q τ ) F (q τ )= [K b p p ](x) dx+ [K b p p ](x) dx Dbα+1, (5) [K b (p p )](x) dx+ [p K b p](x) dx=: T 1 + T 2 + T 3. By inequality (5), we have T 1 =O(b α+1 ). By the same reasoning, T 3 =O(b α+1 ). We now consider T 2. Taking into account that K(x) dx=1and ( 1 q τ ) T 2 := [K b (F F )](q τ )= b K u (F F )(u) du= b 5 K(z)(F F )(q τ bz) dz= K(z)(F F)(q τ bz) dz,

6 for some pointξ between q τ bz and qτ (clearly,ξ depends on qτ, z, b), T 2 = [K b (F F )](q τ ) (F F)(q τ )= K(z)[(F F)(q τ bz) (F F)(q τ )] dz+(f F)(q τ ) = K(z)( bz)[d 1 (F F)(ξ)] dz+ (F F)(q τ ) = ( b) zk(z)[(p p)(ξ)] dz+(f F)(q τ ). Then, F(q τ ) F (q τ )=T 1+ T 3 + ( b) zk(z)[(p p)(ξ)] dz [(F F )(q τ )], which implies that 2[F(qτ ) F (q τ )]= T 1 +T 3 +( b) zk(z)[(p p)(ξ)] dz. It follows that 2 F(q τ ) F (q τ ) T 1 + T 3 +b p p z K(z) dz. Taking into account that z K(z) dz<, T 1 =O(b α+1 ) and T 3 =O(b α+1 ), choosing b=o(ǫn,α 1/α ), we have F(q τ ) F (q τ ) T 1 + T 3 +b p p b α+1 + b p p ǫn,α 1+1/α. If p p ǫ n,α then, under condition (3), p(q τ )>r η> for every <η<r. In fact, for any interval I [q τ ζ, qτ +ζ] that includes the point qτ so that it also includes the intermediate point q τ between qτ and q τ, for anyη>we haveη p p sup I p(x) p (x) p( x) p ( x) for every x I. It follows that p(q τ )> p (q τ ) η inf x [q τ ζ, qτ +ζ] p (x) η r η. Conclude the proof by noting that, in virtue of (4), P n Π(P : p p < Mǫ n,α X (n) ) P n Π( qτ q τ < M ǫn,α 1+1/α X (n) ). The assertion then follows. Remark 3. Proposition 4 considers local Hölder regularity of p, which seems natural for estimating single quantiles. Clearly, requirements on p are automatically satisfied if p is globally Hölder regular and, in this case, the minimax-optimal sup-norm rate isǫ n,α = (n/ log n) α/(2α+1) so that the rate for estimating single quantiles is ǫn,α 1+1/α = (n/ log n) (α+1)/(2α+1). The conditions on the random density p are automatically satisfied if the prior is concentrated on probability measures possessing globally Hölder regular densities. Appendix A. Proof of Theorem 1 Proof. Using the remaining mass condition (i) and the small-ball probability estimate (ii), by the proof of Theorem 2.1 in Ghosal et al. (2), it is enough to construct, for each r {1, }, a testψ n,r for the hypothesis H : P=P vs. H 1 :{P P n : p p r M r ǫ n,r }, with M r > large enough, whereψ n,r Ψ n,r (X (n) ; P ) :X n {, 1} is the indicator function of the rejection region of H, such that P n Ψ n,r as n and sup P n (1 Ψ n,r ) exp ( K r Mr 2 nǫ 2 ) n,r for sufficiently large n, P P n : p p r M r ǫ n,r where K r Mr 2 (C+ 4), the constant C> being that appearing in (i) and (ii). By assumption (1), there exists a constant C,r > such that P n ˆp n p r = K Jn (p ) p r C,r ǫ n,r. Define T n,r := ˆp n p r. For a constant M,r > C,r, define the event A n,r := (T n,r > M,r ǫ n,r ) and the testψ n,r := 1 An,r. For r = 1, the triangular inequality T n,1 ˆp n P n ˆp n 1 + P n ˆp n p 1 implies that, when T n,1 > M,1 ǫ n,1, ˆp n P n ˆp n 1 T n,1 P n ˆp n p 1 > M,1 ǫ n,1 P n ˆp n p 1 (M,1 C,1 )ǫ n,1 ; r=, we have ˆp n (x) p (x) ˆp n (x) P n ˆp n(x) + P n ˆp n(x) p (x) ˆp n P n ˆp n 1 + P n ˆp n p for every x R. It follows that T n, ˆp n P n ˆp n 1 + P n ˆp n p, which implies that, when T n, > M, ǫ n,, ˆp n P n ˆp n 1 T n, P n ˆp n p > M, ǫ n, P n ˆp n p (M, C, )ǫ n,. Let h :X n [, 2] be the function defined as h(x (n) ) := ˆp n P n ˆp n 1. Thus, for each r {1, }, when T n,r > M,r ǫ n,r, the inequality h(x (n) )>(M,r C,r )ǫ n,r holds. Therefore, to control the type-one error probability, it is enough to bound above the probability on the right-hand side of the following display P n Ψ n,r P n ( h(x (n) )>(M,r C,r )ǫ n,r ), (A.1) 6

7 which can be done using McDiarmind s inequality, McDiarmid (1989). Given any x (n) := (x 1,..., x n ) X n, for each 1 i n, let x i be the ith component of x (n) and x i := (x i +δ) a perturbation of the ith variable withδ R so that x i X. Letting e i be the canonical vector with all zeros except for a 1 in the ith position, the vector with the perturbed ith variable can be expressed as x (n) +δe i. If (a) the function h has bounded differences: for some non-negative constants c 1,..., c n, (b) P n h(x(n) )=O(ǫ n ), sup h(x (n) ) h(x (n) +δe i ) c i, 1 i n, x (n), x i then, for C := n i=1 c 2 i, by McDiarmind s bounded differences inequality, We show that (a) and (b) are verified. t>, P n ( h(x (n) ) P n h(x(n) ) t ) 2 exp ( 2t 2 /C ). (a) Using the inequality a b a b, setting Φ = K under condition a) of Definition (1), [ i {1,..., n}, sup h(x (n) ) h(x (n) 1 n +δe i ) = sup K Jn (x, x i ) K Jn (p )(x) x (n), x i x (n), x n i i=1 1 n K Jn (x, x i )+ 1 ] n n K J n (x, x i ) K J n (p )(x) dx i i 1 sup x i, x n K J n (, x i ) K Jn (, x i ) 1 2 n Φ 1. i Hence, h has bounded differences with c i = 2 Φ 1 /n, 1 i n. (b) By Theorem in Giné and Nickl (215), P n h(x(n) ) L 2 J n /n=o(ǫn ), with the following upper bounds for the constant L: under conditions a) and b) of Definition (1), settingφ=kin case a), L 2/(s 1) Φ 2 1/2 L 1 (µ s ) p 1/2 L 1 (µ s ) ; under conditions c) and d), L C(φ)(1 p 1/2 ) 1/2, where the constant C(φ) only depends onφ. Forα (, 1), taking t= 2α(M,r C,r )ǫ n,r, P n ( h(x (n) ) P n h(x(n) ) 2α(M,r C,r )ǫ n,r ) 2 exp ( α 2 (M,r C,r ) 2 nǫ 2 n,r/ Φ 2 1). By (b), there exists a constant L L so that P n h(x(n) ) L ǫ n = (L /L n,r )ǫ n,r. Hence, h(x (n) ) P n h(x(n) ) h(x (n) ) P n h(x(n) ) h(x (n) ) (L /L n,r )ǫ n,r. Thus, for sufficiently large L n,r so that [(M,r C,r ) (L /L n,r )] 2α(M,r C,r ), P n ( ˆpn P n ˆp ) n 1 (M,r C,r )ǫ n,r P n( h(x (n) ) P n h(x(n) ) ) 2α(M,r C,r )ǫ n,r 2 exp ( α 2 (M,r C,r ) 2 nǫ 2 n,r/ Φ 2 1). We now provide an exponential upper bound on the type-two error probability. For r {1, }, let P P n be such that p p r M r ǫ n,r. For r=1, when T n,1 M,1 ǫ n,1, p p 1 p P n ˆp n 1 + ˆp n P n ˆp n 1 + T n,1 p P n ˆp n 1 + ˆp n P n ˆp n 1 + M,1 ǫ n,1, 7

8 r=, when T n, M, ǫ n,, x X, p(x) p (x) p P n ˆp n + ˆp n P n ˆp n 1 + T n, p P n ˆp n + ˆp n P n ˆp n 1 + M, ǫ n,, which implies that p p p P n ˆp n + ˆp n P n ˆp n 1 + M, ǫ n,. Summarizing, for r {1, }, when T n,r M,r ǫ n,r, we have p p r p P n ˆp n r + ˆp n P n ˆp n 1 + M,r ǫ n,r. If sup P Pn p P n ˆp n r = sup P Pn p K Jn (p) r C K ǫ n,r, we have ˆp n P n ˆp n 1 p p r p P n ˆp n r M,r ǫ n,r [M r (C K + M,r )]ǫ n,r. Using, as before, McDiarmind s inequality with P playing the same role as P, we get that for a constantα (, 1) small enough and [M r (C K + M,r )]>, sup P n (1 φ n,r )=P n ( ˆp n p r M,r ǫ n,r )= P ( ˆp n P n ) ˆp n 1 [M r (C K + M,r )]ǫ n,r P P n : p p r M r ǫ n,r 2 exp ( α 2 [M r (C K + M,r )] 2 nǫ 2 n,r/ Φ 2 1 ). We need thatα 2 [M r (C K + M,r )] 2 / Φ 2 1 (C+ 4), which implies that [M r (C K + M,r )] α 1 Φ 1 C+ 4. This concludes the proof of the first assertion. If the convergence in (2) holds for r=1 and r=, then the last assertion of the statement follows from the interpolation inequality: for every 1< s<, p p s max{ p p 1, p p }. Appendix B. Auxiliary results for Proposition 3 Following Parzen (1962), Watson and Leadbetter (1963), we adopt the subsequent definition. Definition 2. The Fourier transform or characteristic function of a Lebesgue probability density function p on R, denoted by p, is said to decrease algebraically of degreeβ> if lim t t β p(t) = B p, < B p <. The following lemma is essentially contained in the first theorem of section 3B in Watson and Leadbetter (1963). Lemma 1. Let p L 2 (R) be a probability density with characteristic function that decreases algebraically of degree β>1/2. Let h L 1 (R) have Fourier transform h satisfying 1 h(t) 2 I β [ h] := t 2β dt<. Then,δ 2(β 1/2) p p h δ 2 2 (2π) 1 B 2 p I β [ h] asδ. Proof. Since p L 1 (R) L 2 (R), then p h δ q p q h δ 1 <, for q=1, 2. Thus, p h δ L 1 (R) L 2 (R). It follows that (p p h δ ) L 1 (R) L 2 (R). Hence, { B 2 p I β[ h]+ p p h δ 2 2 =δ2β 1 2π 1 h(z) 2 z 2β [ z/δ 2β p(z/δ) 2 B 2 p ] dz }, where the second integral tends to by the dominated convergence theorem because of assumption (B.1). In the next remark, which is essentially due to Davis (1977), section 3, we consider a sufficient condition for a function h L 1 (R) to satisfy requirement (B.1). Remark 4. If h L 1 (R), then t 2β 1 h(t) 2 dt< forβ>1/2. Suppose further that there exists an integer r 2 1 such that x m h(x) dx=, for m=1,..., r 1, and x r h(x) dx. Then, [1 h(t)] t r [ = t r e itx r 1 (itx) j j= j! ] h(x) i r dx= (r 1)! x r h(x) 1 (1 u) r 1 e itxu du dx ir r! (B.1) x r h(x) dx, as t. For r β, the integral 1 t 2β 1 h(t) 2 dt<. Conversely, for r<β, the integral diverges. Therefore, for 1/2<β 2, any symmetric probability density h with finite second moment is such that I β [ h]< and condition (B.1) is verified. 8

9 References Barron, A., Schervish, M.J., Wasserman, L., The consistency of posterior distributions in nonparametric problems. The Annals of Statistics 27 (2), Castillo, I., 214. On Bayesian supremum norm contraction rates. The Annals of Statistics, 42 (5), Dattner, I., Reiß, M., Trabs, M., 213. Adaptive quantile estimation in deconvolution with unknown error distribution. Technical Report. URL< Bernoulli, to appear Davis, K.B., Mean integrated square error properties of density estimates. The Annals of Statistics 5 (3), Gao, F., van der Vaart, A., 215. Posterior contraction rates for deconvolution of Dirichlet-Laplace mixtures. Technical Report. URL< Ghosal, S., Ghosh, J.K., van der Vaart, A.W., 2. Convergence rates of posterior distributions. The Annals of Statistics 28 (2), Ghosal, S., van der Vaart, A.W., 21. Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities. The Annals of Statistics 29 (5), Ghosal, S., van der Vaart, A., 27. Posterior convergence rates of Dirichlet mixtures at smooth densities. The Annals of Statistics 35 (2), Giné, E., Nickl, R., 211. Rates of contraction for posterior distributions in L r -metrics, 1 r. The Annals of Statistics 39 (6), Giné, E., Nickl, R., 215. Mathematical Foundations of Infinite-dimensional Statistical Models. Cambridge Series in Statistical and Probabilistic Mathematics. McDiarmid, C., On the method of bounded differences. In: Surveys in Combinatorics, Cambridge University Press, Cambridge, pp Parzen, E., On estimation of a probability density function and mode. Annals of Mathematical Statistics 33 (3), Scricciolo, C., 27. On rates of convergence for Bayesian density estimation. Scandinavian Journal of Statistics 34 (3), Scricciolo, C., 211. Posterior rates of convergence for Dirichlet mixtures of exponential power densities. Electronic Journal of Statistics 5, Scricciolo, C., 214. Adaptive Bayesian density estimation in L p -metrics with Pitman-Yor or normalized inverse- Gaussian process kernel mixtures. Bayesian Analysis 9 (2), Shen, W., Tokdar, S.T., Ghosal, S., 213. Adaptive Bayesian multivariate density estimation with Dirichlet mixtures. Biometrika 1 (3), Shen, X., Wasserman, L., 21. Rates of convergence of posterior distributions. The Annals of Statistics 29 (3), Watson, G.S., Leadbetter, M.R., On the estimation of the probability density, I. Annals of Mathematical Statistics 34 (2),

On Deconvolution of Dirichlet-Laplace Mixtures

On Deconvolution of Dirichlet-Laplace Mixtures Working Paper Series Department of Economics University of Verona On Deconvolution of Dirichlet-Laplace Mixtures Catia Scricciolo WP Number: 6 May 2016 ISSN: 2036-2919 (paper), 2036-4679 (online) On Deconvolution

More information

Bayesian Regularization

Bayesian Regularization Bayesian Regularization Aad van der Vaart Vrije Universiteit Amsterdam International Congress of Mathematicians Hyderabad, August 2010 Contents Introduction Abstract result Gaussian process priors Co-authors

More information

Bayesian Adaptation. Aad van der Vaart. Vrije Universiteit Amsterdam. aad. Bayesian Adaptation p. 1/4

Bayesian Adaptation. Aad van der Vaart. Vrije Universiteit Amsterdam.  aad. Bayesian Adaptation p. 1/4 Bayesian Adaptation Aad van der Vaart http://www.math.vu.nl/ aad Vrije Universiteit Amsterdam Bayesian Adaptation p. 1/4 Joint work with Jyri Lember Bayesian Adaptation p. 2/4 Adaptation Given a collection

More information

DISCUSSION: COVERAGE OF BAYESIAN CREDIBLE SETS. By Subhashis Ghosal North Carolina State University

DISCUSSION: COVERAGE OF BAYESIAN CREDIBLE SETS. By Subhashis Ghosal North Carolina State University Submitted to the Annals of Statistics DISCUSSION: COVERAGE OF BAYESIAN CREDIBLE SETS By Subhashis Ghosal North Carolina State University First I like to congratulate the authors Botond Szabó, Aad van der

More information

Nonparametric Bayesian Uncertainty Quantification

Nonparametric Bayesian Uncertainty Quantification Nonparametric Bayesian Uncertainty Quantification Lecture 1: Introduction to Nonparametric Bayes Aad van der Vaart Universiteit Leiden, Netherlands YES, Eindhoven, January 2017 Contents Introduction Recovery

More information

Asymptotic properties of posterior distributions in nonparametric regression with non-gaussian errors

Asymptotic properties of posterior distributions in nonparametric regression with non-gaussian errors Ann Inst Stat Math (29) 61:835 859 DOI 1.17/s1463-8-168-2 Asymptotic properties of posterior distributions in nonparametric regression with non-gaussian errors Taeryon Choi Received: 1 January 26 / Revised:

More information

Statistical inference on Lévy processes

Statistical inference on Lévy processes Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline

More information

Semiparametric posterior limits

Semiparametric posterior limits Statistics Department, Seoul National University, Korea, 2012 Semiparametric posterior limits for regular and some irregular problems Bas Kleijn, KdV Institute, University of Amsterdam Based on collaborations

More information

D I S C U S S I O N P A P E R

D I S C U S S I O N P A P E R I N S T I T U T D E S T A T I S T I Q U E B I O S T A T I S T I Q U E E T S C I E N C E S A C T U A R I E L L E S ( I S B A ) UNIVERSITÉ CATHOLIQUE DE LOUVAIN D I S C U S S I O N P A P E R 2014/06 Adaptive

More information

Estimation of the functional Weibull-tail coefficient

Estimation of the functional Weibull-tail coefficient 1/ 29 Estimation of the functional Weibull-tail coefficient Stéphane Girard Inria Grenoble Rhône-Alpes & LJK, France http://mistis.inrialpes.fr/people/girard/ June 2016 joint work with Laurent Gardes,

More information

Laplace s Equation. Chapter Mean Value Formulas

Laplace s Equation. Chapter Mean Value Formulas Chapter 1 Laplace s Equation Let be an open set in R n. A function u C 2 () is called harmonic in if it satisfies Laplace s equation n (1.1) u := D ii u = 0 in. i=1 A function u C 2 () is called subharmonic

More information

Optimal global rates of convergence for interpolation problems with random design

Optimal global rates of convergence for interpolation problems with random design Optimal global rates of convergence for interpolation problems with random design Michael Kohler 1 and Adam Krzyżak 2, 1 Fachbereich Mathematik, Technische Universität Darmstadt, Schlossgartenstr. 7, 64289

More information

Foundations of Nonparametric Bayesian Methods

Foundations of Nonparametric Bayesian Methods 1 / 27 Foundations of Nonparametric Bayesian Methods Part II: Models on the Simplex Peter Orbanz http://mlg.eng.cam.ac.uk/porbanz/npb-tutorial.html 2 / 27 Tutorial Overview Part I: Basics Part II: Models

More information

Bayesian estimation of the discrepancy with misspecified parametric models

Bayesian estimation of the discrepancy with misspecified parametric models Bayesian estimation of the discrepancy with misspecified parametric models Pierpaolo De Blasi University of Torino & Collegio Carlo Alberto Bayesian Nonparametrics workshop ICERM, 17-21 September 2012

More information

1234 S. GHOSAL AND A. W. VAN DER VAART algorithm to adaptively select a finitely supported mixing distribution. These authors showed consistency of th

1234 S. GHOSAL AND A. W. VAN DER VAART algorithm to adaptively select a finitely supported mixing distribution. These authors showed consistency of th The Annals of Statistics 2001, Vol. 29, No. 5, 1233 1263 ENTROPIES AND RATES OF CONVERGENCE FOR MAXIMUM LIKELIHOOD AND BAYES ESTIMATION FOR MIXTURES OF NORMAL DENSITIES By Subhashis Ghosal and Aad W. van

More information

L p Spaces and Convexity

L p Spaces and Convexity L p Spaces and Convexity These notes largely follow the treatments in Royden, Real Analysis, and Rudin, Real & Complex Analysis. 1. Convex functions Let I R be an interval. For I open, we say a function

More information

Latent factor density regression models

Latent factor density regression models Biometrika (2010), 97, 1, pp. 1 7 C 2010 Biometrika Trust Printed in Great Britain Advance Access publication on 31 July 2010 Latent factor density regression models BY A. BHATTACHARYA, D. PATI, D.B. DUNSON

More information

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider

More information

Real Analysis Problems

Real Analysis Problems Real Analysis Problems Cristian E. Gutiérrez September 14, 29 1 1 CONTINUITY 1 Continuity Problem 1.1 Let r n be the sequence of rational numbers and Prove that f(x) = 1. f is continuous on the irrationals.

More information

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539 Brownian motion Samy Tindel Purdue University Probability Theory 2 - MA 539 Mostly taken from Brownian Motion and Stochastic Calculus by I. Karatzas and S. Shreve Samy T. Brownian motion Probability Theory

More information

Adaptive Bayesian procedures using random series priors

Adaptive Bayesian procedures using random series priors Adaptive Bayesian procedures using random series priors WEINING SHEN 1 and SUBHASHIS GHOSAL 2 1 Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030 2 Department

More information

Probability and Measure

Probability and Measure Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 84 Paper 4, Section II 26J Let (X, A) be a measurable space. Let T : X X be a measurable map, and µ a probability

More information

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012 Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 202 BOUNDS AND ASYMPTOTICS FOR FISHER INFORMATION IN THE CENTRAL LIMIT THEOREM

More information

Minimax Estimation of a nonlinear functional on a structured high-dimensional model

Minimax Estimation of a nonlinear functional on a structured high-dimensional model Minimax Estimation of a nonlinear functional on a structured high-dimensional model Eric Tchetgen Tchetgen Professor of Biostatistics and Epidemiologic Methods, Harvard U. (Minimax ) 1 / 38 Outline Heuristics

More information

Optimal Estimation of a Nonsmooth Functional

Optimal Estimation of a Nonsmooth Functional Optimal Estimation of a Nonsmooth Functional T. Tony Cai Department of Statistics The Wharton School University of Pennsylvania http://stat.wharton.upenn.edu/ tcai Joint work with Mark Low 1 Question Suppose

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

Convergence of latent mixing measures in nonparametric and mixture models 1

Convergence of latent mixing measures in nonparametric and mixture models 1 Convergence of latent mixing measures in nonparametric and mixture models 1 XuanLong Nguyen xuanlong@umich.edu Department of Statistics University of Michigan Technical Report 527 (Revised, Jan 25, 2012)

More information

Heat kernels of some Schrödinger operators

Heat kernels of some Schrödinger operators Heat kernels of some Schrödinger operators Alexander Grigor yan Tsinghua University 28 September 2016 Consider an elliptic Schrödinger operator H = Δ + Φ, where Δ = n 2 i=1 is the Laplace operator in R

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Nonparametric estimation using wavelet methods. Dominique Picard. Laboratoire Probabilités et Modèles Aléatoires Université Paris VII

Nonparametric estimation using wavelet methods. Dominique Picard. Laboratoire Probabilités et Modèles Aléatoires Université Paris VII Nonparametric estimation using wavelet methods Dominique Picard Laboratoire Probabilités et Modèles Aléatoires Université Paris VII http ://www.proba.jussieu.fr/mathdoc/preprints/index.html 1 Nonparametric

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Bayesian Consistency for Markov Models

Bayesian Consistency for Markov Models Bayesian Consistency for Markov Models Isadora Antoniano-Villalobos Bocconi University, Milan, Italy. Stephen G. Walker University of Texas at Austin, USA. Abstract We consider sufficient conditions for

More information

Tools from Lebesgue integration

Tools from Lebesgue integration Tools from Lebesgue integration E.P. van den Ban Fall 2005 Introduction In these notes we describe some of the basic tools from the theory of Lebesgue integration. Definitions and results will be given

More information

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2)

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2) 14:17 11/16/2 TOPIC. Convergence in distribution and related notions. This section studies the notion of the so-called convergence in distribution of real random variables. This is the kind of convergence

More information

for all subintervals I J. If the same is true for the dyadic subintervals I D J only, we will write ϕ BMO d (J). In fact, the following is true

for all subintervals I J. If the same is true for the dyadic subintervals I D J only, we will write ϕ BMO d (J). In fact, the following is true 3 ohn Nirenberg inequality, Part I A function ϕ L () belongs to the space BMO() if sup ϕ(s) ϕ I I I < for all subintervals I If the same is true for the dyadic subintervals I D only, we will write ϕ BMO

More information

Introduction and Preliminaries

Introduction and Preliminaries Chapter 1 Introduction and Preliminaries This chapter serves two purposes. The first purpose is to prepare the readers for the more systematic development in later chapters of methods of real analysis

More information

EXPOSITORY NOTES ON DISTRIBUTION THEORY, FALL 2018

EXPOSITORY NOTES ON DISTRIBUTION THEORY, FALL 2018 EXPOSITORY NOTES ON DISTRIBUTION THEORY, FALL 2018 While these notes are under construction, I expect there will be many typos. The main reference for this is volume 1 of Hörmander, The analysis of liner

More information

Course 212: Academic Year Section 1: Metric Spaces

Course 212: Academic Year Section 1: Metric Spaces Course 212: Academic Year 1991-2 Section 1: Metric Spaces D. R. Wilkins Contents 1 Metric Spaces 3 1.1 Distance Functions and Metric Spaces............. 3 1.2 Convergence and Continuity in Metric Spaces.........

More information

Bayesian Nonparametrics

Bayesian Nonparametrics Bayesian Nonparametrics Peter Orbanz Columbia University PARAMETERS AND PATTERNS Parameters P(X θ) = Probability[data pattern] 3 2 1 0 1 2 3 5 0 5 Inference idea data = underlying pattern + independent

More information

Lecture 35: December The fundamental statistical distances

Lecture 35: December The fundamental statistical distances 36-705: Intermediate Statistics Fall 207 Lecturer: Siva Balakrishnan Lecture 35: December 4 Today we will discuss distances and metrics between distributions that are useful in statistics. I will be lose

More information

Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses

Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses Ann Inst Stat Math (2009) 61:773 787 DOI 10.1007/s10463-008-0172-6 Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses Taisuke Otsu Received: 1 June 2007 / Revised:

More information

Pointwise convergence rates and central limit theorems for kernel density estimators in linear processes

Pointwise convergence rates and central limit theorems for kernel density estimators in linear processes Pointwise convergence rates and central limit theorems for kernel density estimators in linear processes Anton Schick Binghamton University Wolfgang Wefelmeyer Universität zu Köln Abstract Convergence

More information

Lecture 4 Lebesgue spaces and inequalities

Lecture 4 Lebesgue spaces and inequalities Lecture 4: Lebesgue spaces and inequalities 1 of 10 Course: Theory of Probability I Term: Fall 2013 Instructor: Gordan Zitkovic Lecture 4 Lebesgue spaces and inequalities Lebesgue spaces We have seen how

More information

Concentration behavior of the penalized least squares estimator

Concentration behavior of the penalized least squares estimator Concentration behavior of the penalized least squares estimator Penalized least squares behavior arxiv:1511.08698v2 [math.st] 19 Oct 2016 Alan Muro and Sara van de Geer {muro,geer}@stat.math.ethz.ch Seminar

More information

1.5 Approximate Identities

1.5 Approximate Identities 38 1 The Fourier Transform on L 1 (R) which are dense subspaces of L p (R). On these domains, P : D P L p (R) and M : D M L p (R). Show, however, that P and M are unbounded even when restricted to these

More information

THE INVERSE FUNCTION THEOREM

THE INVERSE FUNCTION THEOREM THE INVERSE FUNCTION THEOREM W. PATRICK HOOPER The implicit function theorem is the following result: Theorem 1. Let f be a C 1 function from a neighborhood of a point a R n into R n. Suppose A = Df(a)

More information

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,

More information

Nonparametric Bayesian model selection and averaging

Nonparametric Bayesian model selection and averaging Electronic Journal of Statistics Vol. 2 2008 63 89 ISSN: 1935-7524 DOI: 10.1214/07-EJS090 Nonparametric Bayesian model selection and averaging Subhashis Ghosal Department of Statistics, North Carolina

More information

Nonparametric Bayesian Methods - Lecture I

Nonparametric Bayesian Methods - Lecture I Nonparametric Bayesian Methods - Lecture I Harry van Zanten Korteweg-de Vries Institute for Mathematics CRiSM Masterclass, April 4-6, 2016 Overview of the lectures I Intro to nonparametric Bayesian statistics

More information

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS Bendikov, A. and Saloff-Coste, L. Osaka J. Math. 4 (5), 677 7 ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS ALEXANDER BENDIKOV and LAURENT SALOFF-COSTE (Received March 4, 4)

More information

Metric Spaces and Topology

Metric Spaces and Topology Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies

More information

On some shift invariant integral operators, univariate case

On some shift invariant integral operators, univariate case ANNALES POLONICI MATHEMATICI LXI.3 1995) On some shift invariant integral operators, univariate case by George A. Anastassiou Memphis, Tenn.) and Heinz H. Gonska Duisburg) Abstract. In recent papers the

More information

Statistical Approaches to Learning and Discovery. Week 4: Decision Theory and Risk Minimization. February 3, 2003

Statistical Approaches to Learning and Discovery. Week 4: Decision Theory and Risk Minimization. February 3, 2003 Statistical Approaches to Learning and Discovery Week 4: Decision Theory and Risk Minimization February 3, 2003 Recall From Last Time Bayesian expected loss is ρ(π, a) = E π [L(θ, a)] = L(θ, a) df π (θ)

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation

More information

Wavelets and modular inequalities in variable L p spaces

Wavelets and modular inequalities in variable L p spaces Wavelets and modular inequalities in variable L p spaces Mitsuo Izuki July 14, 2007 Abstract The aim of this paper is to characterize variable L p spaces L p( ) (R n ) using wavelets with proper smoothness

More information

Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes

Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes Alea 4, 117 129 (2008) Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes Anton Schick and Wolfgang Wefelmeyer Anton Schick, Department of Mathematical Sciences,

More information

1 Math 241A-B Homework Problem List for F2015 and W2016

1 Math 241A-B Homework Problem List for F2015 and W2016 1 Math 241A-B Homework Problem List for F2015 W2016 1.1 Homework 1. Due Wednesday, October 7, 2015 Notation 1.1 Let U be any set, g be a positive function on U, Y be a normed space. For any f : U Y let

More information

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3 Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................

More information

Translation Invariant Experiments with Independent Increments

Translation Invariant Experiments with Independent Increments Translation Invariant Statistical Experiments with Independent Increments (joint work with Nino Kordzakhia and Alex Novikov Steklov Mathematical Institute St.Petersburg, June 10, 2013 Outline 1 Introduction

More information

Bayes spaces: use of improper priors and distances between densities

Bayes spaces: use of improper priors and distances between densities Bayes spaces: use of improper priors and distances between densities J. J. Egozcue 1, V. Pawlowsky-Glahn 2, R. Tolosana-Delgado 1, M. I. Ortego 1 and G. van den Boogaart 3 1 Universidad Politécnica de

More information

Priors for the frequentist, consistency beyond Schwartz

Priors for the frequentist, consistency beyond Schwartz Victoria University, Wellington, New Zealand, 11 January 2016 Priors for the frequentist, consistency beyond Schwartz Bas Kleijn, KdV Institute for Mathematics Part I Introduction Bayesian and Frequentist

More information

Mathematical Institute, University of Utrecht. The problem of estimating the mean of an observed Gaussian innite-dimensional vector

Mathematical Institute, University of Utrecht. The problem of estimating the mean of an observed Gaussian innite-dimensional vector On Minimax Filtering over Ellipsoids Eduard N. Belitser and Boris Y. Levit Mathematical Institute, University of Utrecht Budapestlaan 6, 3584 CD Utrecht, The Netherlands The problem of estimating the mean

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables

More information

MAT 578 FUNCTIONAL ANALYSIS EXERCISES

MAT 578 FUNCTIONAL ANALYSIS EXERCISES MAT 578 FUNCTIONAL ANALYSIS EXERCISES JOHN QUIGG Exercise 1. Prove that if A is bounded in a topological vector space, then for every neighborhood V of 0 there exists c > 0 such that tv A for all t > c.

More information

δ xj β n = 1 n Theorem 1.1. The sequence {P n } satisfies a large deviation principle on M(X) with the rate function I(β) given by

δ xj β n = 1 n Theorem 1.1. The sequence {P n } satisfies a large deviation principle on M(X) with the rate function I(β) given by . Sanov s Theorem Here we consider a sequence of i.i.d. random variables with values in some complete separable metric space X with a common distribution α. Then the sample distribution β n = n maps X

More information

A new class of pseudodifferential operators with mixed homogenities

A new class of pseudodifferential operators with mixed homogenities A new class of pseudodifferential operators with mixed homogenities Po-Lam Yung University of Oxford Jan 20, 2014 Introduction Given a smooth distribution of hyperplanes on R N (or more generally on a

More information

SHARP BOUNDARY TRACE INEQUALITIES. 1. Introduction

SHARP BOUNDARY TRACE INEQUALITIES. 1. Introduction SHARP BOUNDARY TRACE INEQUALITIES GILES AUCHMUTY Abstract. This paper describes sharp inequalities for the trace of Sobolev functions on the boundary of a bounded region R N. The inequalities bound (semi-)norms

More information

The Dirichlet s P rinciple. In this lecture we discuss an alternative formulation of the Dirichlet problem for the Laplace equation:

The Dirichlet s P rinciple. In this lecture we discuss an alternative formulation of the Dirichlet problem for the Laplace equation: Oct. 1 The Dirichlet s P rinciple In this lecture we discuss an alternative formulation of the Dirichlet problem for the Laplace equation: 1. Dirichlet s Principle. u = in, u = g on. ( 1 ) If we multiply

More information

NORM DEPENDENCE OF THE COEFFICIENT MAP ON THE WINDOW SIZE 1

NORM DEPENDENCE OF THE COEFFICIENT MAP ON THE WINDOW SIZE 1 NORM DEPENDENCE OF THE COEFFICIENT MAP ON THE WINDOW SIZE Thomas I. Seidman and M. Seetharama Gowda Department of Mathematics and Statistics University of Maryland Baltimore County Baltimore, MD 8, U.S.A

More information

arxiv: v2 [math.st] 18 Feb 2013

arxiv: v2 [math.st] 18 Feb 2013 Bayesian optimal adaptive estimation arxiv:124.2392v2 [math.st] 18 Feb 213 using a sieve prior Julyan Arbel 1,2, Ghislaine Gayraud 1,3 and Judith Rousseau 1,4 1 Laboratoire de Statistique, CREST, France

More information

Singular Integrals. 1 Calderon-Zygmund decomposition

Singular Integrals. 1 Calderon-Zygmund decomposition Singular Integrals Analysis III Calderon-Zygmund decomposition Let f be an integrable function f dx 0, f = g + b with g Cα almost everywhere, with b

More information

LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011

LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011 LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS S. G. Bobkov and F. L. Nazarov September 25, 20 Abstract We study large deviations of linear functionals on an isotropic

More information

ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016

ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 ECE598: Information-theoretic methods in high-dimensional statistics Spring 06 Lecture : Mutual Information Method Lecturer: Yihong Wu Scribe: Jaeho Lee, Mar, 06 Ed. Mar 9 Quick review: Assouad s lemma

More information

ASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE. By Michael Nussbaum Weierstrass Institute, Berlin

ASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE. By Michael Nussbaum Weierstrass Institute, Berlin The Annals of Statistics 1996, Vol. 4, No. 6, 399 430 ASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE By Michael Nussbaum Weierstrass Institute, Berlin Signal recovery in Gaussian

More information

Homework 1 Due: Wednesday, September 28, 2016

Homework 1 Due: Wednesday, September 28, 2016 0-704 Information Processing and Learning Fall 06 Homework Due: Wednesday, September 8, 06 Notes: For positive integers k, [k] := {,..., k} denotes te set of te first k positive integers. Wen p and Y q

More information

An introduction to some aspects of functional analysis

An introduction to some aspects of functional analysis An introduction to some aspects of functional analysis Stephen Semmes Rice University Abstract These informal notes deal with some very basic objects in functional analysis, including norms and seminorms

More information

Asymptotics for posterior hazards

Asymptotics for posterior hazards Asymptotics for posterior hazards Pierpaolo De Blasi University of Turin 10th August 2007, BNR Workshop, Isaac Newton Intitute, Cambridge, UK Joint work with Giovanni Peccati (Université Paris VI) and

More information

1 Introduction. 2 Measure theoretic definitions

1 Introduction. 2 Measure theoretic definitions 1 Introduction These notes aim to recall some basic definitions needed for dealing with random variables. Sections to 5 follow mostly the presentation given in chapter two of [1]. Measure theoretic definitions

More information

Approximation Theoretical Questions for SVMs

Approximation Theoretical Questions for SVMs Ingo Steinwart LA-UR 07-7056 October 20, 2007 Statistical Learning Theory: an Overview Support Vector Machines Informal Description of the Learning Goal X space of input samples Y space of labels, usually

More information

Nonlinear Stationary Subdivision

Nonlinear Stationary Subdivision Nonlinear Stationary Subdivision Michael S. Floater SINTEF P. O. Box 4 Blindern, 034 Oslo, Norway E-mail: michael.floater@math.sintef.no Charles A. Micchelli IBM Corporation T.J. Watson Research Center

More information

12 - Nonparametric Density Estimation

12 - Nonparametric Density Estimation ST 697 Fall 2017 1/49 12 - Nonparametric Density Estimation ST 697 Fall 2017 University of Alabama Density Review ST 697 Fall 2017 2/49 Continuous Random Variables ST 697 Fall 2017 3/49 1.0 0.8 F(x) 0.6

More information

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi Real Analysis Math 3AH Rudin, Chapter # Dominique Abdi.. If r is rational (r 0) and x is irrational, prove that r + x and rx are irrational. Solution. Assume the contrary, that r+x and rx are rational.

More information

Curve Fitting Re-visited, Bishop1.2.5

Curve Fitting Re-visited, Bishop1.2.5 Curve Fitting Re-visited, Bishop1.2.5 Maximum Likelihood Bishop 1.2.5 Model Likelihood differentiation p(t x, w, β) = Maximum Likelihood N N ( t n y(x n, w), β 1). (1.61) n=1 As we did in the case of the

More information

Sobolev Spaces. Chapter Hölder spaces

Sobolev Spaces. Chapter Hölder spaces Chapter 2 Sobolev Spaces Sobolev spaces turn out often to be the proper setting in which to apply ideas of functional analysis to get information concerning partial differential equations. Here, we collect

More information

g(x) = P (y) Proof. This is true for n = 0. Assume by the inductive hypothesis that g (n) (0) = 0 for some n. Compute g (n) (h) g (n) (0)

g(x) = P (y) Proof. This is true for n = 0. Assume by the inductive hypothesis that g (n) (0) = 0 for some n. Compute g (n) (h) g (n) (0) Mollifiers and Smooth Functions We say a function f from C is C (or simply smooth) if all its derivatives to every order exist at every point of. For f : C, we say f is C if all partial derivatives to

More information

Posterior concentration rates for empirical Bayes procedures with applications to Dirichlet process mixtures

Posterior concentration rates for empirical Bayes procedures with applications to Dirichlet process mixtures Submitted to Bernoulli arxiv: 1.v1 Posterior concentration rates for empirical Bayes procedures with applications to Dirichlet process mixtures SOPHIE DONNET 1 VINCENT RIVOIRARD JUDITH ROUSSEAU 3 and CATIA

More information

Refining the Central Limit Theorem Approximation via Extreme Value Theory

Refining the Central Limit Theorem Approximation via Extreme Value Theory Refining the Central Limit Theorem Approximation via Extreme Value Theory Ulrich K. Müller Economics Department Princeton University February 2018 Abstract We suggest approximating the distribution of

More information

Mathematical Methods for Physics and Engineering

Mathematical Methods for Physics and Engineering Mathematical Methods for Physics and Engineering Lecture notes for PDEs Sergei V. Shabanov Department of Mathematics, University of Florida, Gainesville, FL 32611 USA CHAPTER 1 The integration theory

More information

Preliminary Exam 2018 Solutions to Morning Exam

Preliminary Exam 2018 Solutions to Morning Exam Preliminary Exam 28 Solutions to Morning Exam Part I. Solve four of the following five problems. Problem. Consider the series n 2 (n log n) and n 2 (n(log n)2 ). Show that one converges and one diverges

More information

Random Bernstein-Markov factors

Random Bernstein-Markov factors Random Bernstein-Markov factors Igor Pritsker and Koushik Ramachandran October 20, 208 Abstract For a polynomial P n of degree n, Bernstein s inequality states that P n n P n for all L p norms on the unit

More information

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9 MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended

More information

Sequences and Series of Functions

Sequences and Series of Functions Chapter 13 Sequences and Series of Functions These notes are based on the notes A Teacher s Guide to Calculus by Dr. Louis Talman. The treatment of power series that we find in most of today s elementary

More information

RATE-OPTIMAL GRAPHON ESTIMATION. By Chao Gao, Yu Lu and Harrison H. Zhou Yale University

RATE-OPTIMAL GRAPHON ESTIMATION. By Chao Gao, Yu Lu and Harrison H. Zhou Yale University Submitted to the Annals of Statistics arxiv: arxiv:0000.0000 RATE-OPTIMAL GRAPHON ESTIMATION By Chao Gao, Yu Lu and Harrison H. Zhou Yale University Network analysis is becoming one of the most active

More information

Closest Moment Estimation under General Conditions

Closest Moment Estimation under General Conditions Closest Moment Estimation under General Conditions Chirok Han and Robert de Jong January 28, 2002 Abstract This paper considers Closest Moment (CM) estimation with a general distance function, and avoids

More information

Statistica Sinica Preprint No: SS R2

Statistica Sinica Preprint No: SS R2 Statistica Sinica Preprint No: SS-2017-0074.R2 Title The semi-parametric Bernstein-von Mises theorem for regression models with symmetric errors Manuscript ID SS-2017-0074.R2 URL http://www.stat.sinica.edu.tw/statistica/

More information

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1998 Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ Lawrence D. Brown University

More information

Density estimators for the convolution of discrete and continuous random variables

Density estimators for the convolution of discrete and continuous random variables Density estimators for the convolution of discrete and continuous random variables Ursula U Müller Texas A&M University Anton Schick Binghamton University Wolfgang Wefelmeyer Universität zu Köln Abstract

More information

Nonparametric Bayes Inference on Manifolds with Applications

Nonparametric Bayes Inference on Manifolds with Applications Nonparametric Bayes Inference on Manifolds with Applications Abhishek Bhattacharya Indian Statistical Institute Based on the book Nonparametric Statistics On Manifolds With Applications To Shape Spaces

More information

We denote the space of distributions on Ω by D ( Ω) 2.

We denote the space of distributions on Ω by D ( Ω) 2. Sep. 1 0, 008 Distributions Distributions are generalized functions. Some familiarity with the theory of distributions helps understanding of various function spaces which play important roles in the study

More information

Minimax theory for a class of non-linear statistical inverse problems

Minimax theory for a class of non-linear statistical inverse problems Minimax theory for a class of non-linear statistical inverse problems Kolyan Ray and Johannes Schmidt-Hieber Leiden University Abstract We study a class of statistical inverse problems with non-linear

More information