Convergence Rates of Nonparametric Posterior Distributions
|
|
- Antony Warren
- 5 years ago
- Views:
Transcription
1 Convergence Rates of Nonparametric Posterior Distributions Yang Xing arxiv: v1 [math.st] 17 Apr 008 Swedish University of Agricultural Sciences We study the asymptotic behavior of posterior distributions. We present general posterior convergence rate theorems, which extend several results on posterior convergence rates provided by Ghosal and Van der Vaart 000, Shen and Wasserman 001 and Walker, Lijor and Prunster 007. Our main tools are the Hausdorff α-entropy introduced by Xing and Ranneby 008 and a new notion of prior concentration, which is a slight improvement of the usual prior concentration provided by Ghosal and Van der Vaart 000. We apply our results to several statistical models. 1. Introduction. Recently, a major theoretical advance has occurred in the theory of Bayesian consistency for infinite-dimensional models. Schwartz 1965 first proved that, if the true density function is in the Kullback-Leibler support of the prior distribution, then the sequence of posterior distributions accumulates in all weak neighborhoods of the true density function. It is known that the condition of positivity of prior mass on each Kullback-Leibler neighborhood in Schwartz s theorem is not a necessary condition. When one considers problems of density estimation, it is natural to ask for the strong consistency of Bayesian procedures. Sufficient conditions for the strong Hellinger consistency and for evaluating consistency rates have been currently developed by many authors. In this paper we study the problem of determining whether the posterior distributions accumulate in Hellinger neighborhoods of the true density function. The rate of this convergence can be measured by the size of the smallest shrinking Hellinger balls around the true density function on which posterior masses tend to zero as the sample size increases to infinity. By the fundamental works of Ghosal, Ghosh and Van der Vaart 000 and Shen and Wasserman 001, we know that the convergence rate of posterior distributions is completely determined by two quantities: the rate of the metric entropy and the prior concentration rate. Roughly speaking, the rate of the metric entropy describes how large the model is, and the prior concentration rate depends on prior masses near the true distribution. Since the true distribution is unknown, the later assumption actually requires that the prior AMS 000 Subject Classifications. 6G07, 6G0, 6F15. Key words. Hellinger consistency, posterior distribution, rate of convergence, sieve, infinite-dimensional model. 1
2 distribution spreads its mass uniformly over the whole density space. Another elegant approach for determination of the convergence rate was provided by Walker 004, who obtained a sufficient condition for strong consistency by using summability of square root of prior probability instead of the metric entropy method. In this paper, in dealing with the rate of metric entropies we shall apply the Hausdorff α-entropy introduced by Xing and Ranneby 008, which is much smaller than widely used metric entropies and the bracketing entropy. For some important prior distributions of statistical models the Hausdorff α-entropies of all sieves are uniformly bounded, whereas it is generally impossible to get uniform boundedness of metric entropies of large sieves. The application of the Hausdorff α-entropy leads refinements of several theorems on posterior convergence rates, for instance, the well known assumptions on metric entropies and summability of square root of prior probability have been weakened, which particularly yields that Theorem 5 of Ghosal et al.007b is strengthened into Corollary 3 of this paper. To handle the prior concentration rate, we shall apply a new notion of prior concentration. Our approach is a slight improvement of the prior concentration provided by Ghosal et al.000, and moreover the proof of Lemma 1 on which the approach bases is quite simple. Finally, to get posterior convergence at the optimal rate 1/ n, we give an extension of Ghosal et al.000, Theorem.4, in which the universal testing constant has been replaced by any fixed constant. An outline of this paper is as follows. In Section we define the Hausdorff α-entropy with respect to a given prior and then present general theorems for the determination of posterior convergence rates. We also give a new approach to compute concentration rates. In Section 3 we apply our results to Bernstein polynomial priors, priors based on uniform distribution, log spline models and finite-dimensional models, which leads some improvements on known results for these models. The proofs of the main results are contained in Section 4.. Notations and Theorems. We consider a family of probability measures dominated by a σ-finite measure µ in X, a Polish space endowed with a σ-algebra X. Let X 1,X,...,X n stand for an independent identically distributed i.i.d. sample of n random variables, taking values in X and having a common probability density function f 0 with respect to the measure µ. Denote by F0 the infinite product distribution of the probability distribution F 0 associated with f 0. For two probability densities f and g µdx 1/ we denote the Hellinger distance Hf,g = X fx gx and the Kullback-Leibler divergence Kf,g = X µdx. Assume that the space F of probability density functions is separable with respect to the Hellinger metric and that F is the Borel σ-algebra of F. Given a prior distribution Π on F, the posterior distribution Π n is a random probability measure with the following expression fxlog fx gx Π n A = Π A X1,X,...,X n = n fx i Πdf A i=1 F n fx i Πdf i=1 = AR nfπdf F R nfπdf
3 forallmeasurablesubsetsa F, wherer n f = n { fxi /f 0 X i isthelikelihoodratio. In other words, the posterior distribution Π n is the conditional distribution of Π given the observations X 1,X,...,X n. If the posterior distribution Π n concentrates on arbitrarily small neighborhoods of the true density function f 0 almost surely or in probability, then it is said to be consistent at f 0 almost surely and in probability respectively. Throughout this paper, almost sure convergence and convergence in probability should be understood as to be with respect to the infinite product distribution F 0 of F 0. Our aim of this article is to present general theorems on posterior convergence rates at f 0. By the posterior convergence rate theorems of Ghosal, Ghosh and Van der Vaart 000, we know that the prior concentration rate and the rate of metric entropy both completely determine the convergence rate of posterior distributions. More specifically, a key inequality to determine almost sure convergence rates of posterior distributions is that for each ε > 0, F R nfπdf e 3nε i=1 Π f : Hf 0,f f 0 /f < ε almost surely for all sufficiently large n, where g stands for the supremum norm of the function g on X. This inequality was obtained by Ghosal et al.000, Lemma 8.4 under mild assumptions. It appears almost in all of papers handling strong convergence rates of posterior distributions. The reason is that in order to get the convergence rate of posterior distributions one needs to find a suitable lower bound for the denominator in the expression of posterior distributions. This is successfully done in Ghosal et al.000, who suggested that the prior Π puts sufficiently amount of mass around the true density function f 0 in the sense: Π f : Hf 0,f f 0 /f < ε n e n ε n c for some fixed constant c. Such a sequence { ε n is referred to as the concentration rate of the prior Π around f 0. Here we give a slightly stronger result. We introduce a modification of the Hellinger distance H f 0,f = f0 x fx f 0 x X 3 fx µdx. 3 Itisclearthattheinequality f 0 /f 1holdsforalldensityfunctionsf andf 0 suchthat the supremum is well-defined, and the quality holds if and only if f = f 0 almost surely. Observe also that H f 0,f H f,f 0 and 3 1/ Hf 0,f H f 0,f. Moreover, we have H f 0,f Hf 0,f 3 which yields f0 /f / Hf 0,f f0 /f 1/4 Hf 0,f f0 /f 1/ { f F : H f 0,f ε n { f F : Hf0,f f0 / f < ε n { f F : Hf 0,f f0 / f < ε n. 3
4 The following simplelemma shows that, for ε n tobe a prior concentration rate, it isenough to assume Π W εn e n ε n c 3, where W ε = { f F : H f 0,f ε. Lemma 1. Let ε > 0 and c > 0. Then the inequality F0 F R nfπdf e nε 3+c Π W ε e nε c holds for all n. Lemma 1 provides a useful approach to compute prior concentration rates, particularly for models in which rate of convergence is governed by the prior concentration rate. It leads a simplification of the proof of Theorem. of Ghosal et al.000. Furthermore, we shall present general posterior convergence rate theorems in which the well known assumptions on metric entropies and summability of square root of prior probability are also weakened. We shall apply the Hausdorff α-entropy Jδ, G, α introduced by Xing et al.008. Denote by L µ the space of all nonnegative integrable functions with the norm f 1 = Xfxµdx. Write log0 =. Definition. Let α 0 and G F. For δ > 0, the Hausdorff α-entropy Jδ,G,α with respect to the prior distribution Π is defined as Jδ,G,α = log inf N ΠB j α, where the infimum is taken over all coverings {B 1,B,...,B N of G, where N may take the value, such that each B j is contained in some Hellinger ball {f : Hf j,f < δ of radius δ and center at f j L µ. Notethat theinfimum can beequivalentlytakenover allpartitions{p 1,P,...,P N of G such that the Hellinger radius of each subset P j does not exceed δ. It was proved in Xing et al.008, Lemma 1 that the Hausdorff α-entropy Jδ, G, α is an increasing subadditive function of G and satisfies Jδ,G,α log Nδ,G for all α 0, where Nδ,G stands for the minimal number of Hellinger balls of radius δ needed to cover G. For 0 α 1 and each G F, we also obtained the following useful inequality ΠG α e Jδ,G,α ΠG α Nδ,G 1 α. Our first result in this paper is the following general theorem on posterior strong convergence rates. Denote A ε = { f : Hf 0,f ε. Theorem 1. Let { ε n and { ε n be positive sequences such that n min ε n, ε n as n. Suppose that there exist constants c 1 > 0, c > 0, c 3 0, 0 α < 1 and a sequence {G n of subsets on F such that e n ε n c < and 1 e J ε n,g n,α n ε n c 1 <, 4
5 e n ε n 3+3c +c 3 ΠA εn \G n <, 3 Π f : H f 0,f ε n e n ε n c 3. 3α+αc Then for ε n = max ε n, ε n and each r > + +αc 3 +c 1 1 α, we have Π n Arεn 0 almost surely as n. As direct applications we have Corollary 1. Let c 1 0, c > 0, c 3 0 and 0 α < 1. Suppose that {ε n is a positive sequence satisfying e nε n c < and suppose that there exists a sequence {G n of subsets on F such that 1 Jε n,g n,α nε nc 1, ΠF\G n e nε n 3+3c +c 3, 3 Π f : H f 0,f ε n e nε n c 3. 3α+αc Then for each r > + +αc 3 +c 1 +c 1 α, we have that Π n Arεn 0 almost surely as n. Proof. It is clear that all conditions of Theorem 1 are fulfilled if we let ε n = ε n = ε n and replace the c 1 in Theorem 1 by c 1 +c of Corollary 1, and the proof is complete. Corollary 1 extends Theorem. of Ghosal et al.000, in which they have stronger conditions than 1 and 3 of Corollary 1. It is probably worth mentioning that for several important prior distributions of infinite-dimensional statistical models, the quantities Jε n,g n,α are uniformly bounded for all n and hence condition 1 of Corollary 1 is triviallyfulfilled, whereasgeneral metricentropiesofthesieveg n growtoinfinity asthesample size increases. A slightly different version of Theorem 1 is the following consequence, which is in fact an extension of Proposition 1 of Walker et all.007. Corollary. Let {ε n be a positive sequence such that nε n as n. Suppose that there exist constants c 1 > 0, c > 0, c 3 0, 0 α < 1 and a sequence { G nj with G nj F such that e nε n c < and 1 Nε n,g nj 1 α ΠG nj α e nε n c 1 < ; 5
6 e nε n 3+3c +c 3 ΠA εn \ G nj <, 3 Π f : H f 0,f ε n e nε n c 3. 3α+αc Then for each r > + +αc 3 +c 1, we have that Π 1 α n Arεn 0 almost surely as n. Proof. Let G n = G nj. We only need to verify condition 1 of Theorem 1 for such a sieve G n. By Lemma 1 of Xing et al.008 we have e Jε n,g n,α nε n c 1 Corollary then follows from Theorem 1. e Jε n,g nj,α nε n c 1 Nε n,g nj 1 α ΠG nj α e nε n c 1 <. The assertion of Theorem 1 is an almost sure statement that the posterior distributions outside a Hellinger ball with a multiple of ε n as radius converge to zero almost surely. Now we give an in-probability assertion under weaker conditions. Denote Vf,g = X fx log fx gx µdx. Theorem. Let { ε n and { ε n be positive sequences such that n min ε n, ε n as n. Suppose that there exist constants c 1 > 0, c 0, 0 α < 1 and a sequence {G n of subsets on F such that 1 J ε n,g n,α n ε nc 1 as n, e n ε n +c ΠA εn \G n 0 as n, 3 Π f : Kf 0,f < ε n and Vf 0,f < ε n e n ε n c. Then for ε n = max ε n, ε n and each r > + in probability as n. α+αc +c 1 1 α, we have that Π n Arεn 0 Observe that the conditions of Theorem 1 and Theorem are only used to ensure that Π n A εn \G n 0 as n. So one can replace the conditions of these theorems by Π n A εn \ G n 0 as n almost surely and in probability respectively. Theorem is an extended version of Theorem.1 of Ghosal et al.001 and Theorem 1 of Walker 6
7 et all.007. Furthermore, as a consequence of Theorem we obtain the following slight improvement of Theorem 5 of Ghosal et al.007b. Corollary 3. Let {ε n be a positive sequence such that nε n as n. Suppose that there exist constants c 1 > 0, c 0, 0 α < 1 and a sequence { G nj with G nj F. If 1 e nε n c 1 Nε n,g nj 1 α ΠG nj α 0 as n ; e nε n +c ΠA εn \ G nj 0 as n ; 3 Π f : Kf 0,f < ε n and Vf 0,f < ε n e nε n c, α+αc then for each r > + +c 1 1 α, we have that Π n Arεn 0 in probability as n. Proof. For G n = G nj, by Lemma 1 of Xing et al.008 we get e Jε n,g n,α nε n c 1 e Jε n,g nj,α nε n c 1 e nε n c 1 Nε n,g nj 1 α ΠG nj α which tends to zero as n and condition 1 of Theorem holds. Then by Theorem we conclude the proof. The above theorems cannot yield a convergence rate 1/ n because of the assumption nε n. Particularly, these theorems cannot well serve finite-dimensional models. Ghosal et al.000, 007a have obtained a nice theorem to handle such models. Denote B ε n = { f : Kf 0,f < ε n and Vf 0,f < ε n. Now our result is Theorem 3. Let {ε n be a positive sequence such that nε n are uniformly bounded away from zero, i.e., there exists a constant c 0 > 0 such that nε n c 0 for all n. Suppose that there exist constants 0 < α < 1, c 1 < 1 α 18 and a sequence {G n of subsets on F such that 1 e nε n ΠA ε n \G n ΠB ε n 0 as n, exp J jε n 3, { f G n : jε n Hf 0,f < jε n,α e c 1j nε n ΠBε n α for all sufficiently large positive integers j and n. Then for each r n we have that Π n Arn ε n 0 in probability as n. 7
8 Remark. Here we adopt the convention that if the denominator of a quotient equals zero thenthenumeratormust alsobezero. Hence, Theorem3isstilltrueevenwhenΠB ε n = 0 for some n. Corollary 4. Let {ε n be a positive sequence such that nε n are uniformly bounded away from zero. Suppose that there exist constants c 1, c and a sequence {G n of subsets on F such that 1 logn jε n 3, { f G n : jε n Hf 0,f < jε n c1 nε n for all integer j and n large enough, e nε n ΠA ε n \G n ΠB ε n 0 as n, 3 Π f G n :jε n Hf 0,f<jε n e c j nε n for all integer j and n large enough. ΠB ε n Then for each r n we have that Π n Arn ε n 0 in probability as n. Corollary 4 is a slightly stronger version of Theorem.4 of Ghosal et al.000. A notable improvement in Corollary 4 is that we have no restriction on the constant c, whereas their constant c equals half of some universal testing constant. Proof of Corollary 4. We only need to check condition of Theorem 3. It follows from Lemma 1 of Xing et al.008 and conditions 1 and 3 that J jε n 3,{ f G n : jε n Hf 0,f < jε n,α αlogπ f G n : jε n Hf 0,f < jε n +1 αlogn jε n 3,{ f G n : jε n Hf 0,f < jε n αlog e c j nε n ΠBε n +1 αc 1 nε n = αc + 1 αc 1 j j nε n +αlogπb ε n. Taking a small α in 0,1 and then letting j be large enough, we have that αc + 1 αc 1 1 α 18 j < and hence condition of Theorem 3 is fulfilled. The proof of Corollary4is complete. We will conclude this section by presenting an analogue of Theorem 3, which gives an almost sure assertion under stronger conditions. 8
9 Theorem 4. Let {ε n be a positive sequence such that there exists a constant c 0 > 0 such that nε n c 0 logn for all large n. Suppose that there exist constants 0 < α < 1, c 1 < 1 α 18, c > 1 c 0 and a sequence {G n of subsets on F such that 1 e nε n 3+c ΠA ε n \G n ΠW ε n <, exp J jε n 3, { f G n : jε n Hf 0,f < jε n,α e c 1j nε n ΠWεn α for all sufficiently large positive integers j and n. Then for each r large enough we have that Π n Arεn 0 almost surely as n. Completely following the proof of Corollary 4, we have the following consequence of Theorem 4. Corollary 5. Let {ε n be a positive sequence such that there exists a constant c 0 > 0 such that nε n c 0 logn for all large n. Suppose that there exist constants c 1, c > 1 c 0, c 3 and a sequence {G n of subsets on F such that 1 logn jε n 3, { f G n : jε n Hf 0,f < jε n c1 nε n for all integer j and n large enough, 3 e nε n 3+c ΠA ε n \G n ΠW ε n <, Π f G n :jε n Hf 0,f<jε n ΠW ε n e c 3j nε n for all integer j and n large enough. Then for each r large enough we have that Π n Arεn 0 almost surely as n. 3. Illustrations. In this section we apply our theorems to Bernstein polynomial priors, priors based on uniform distribution, log spline models and finite-dimensional models. This leads some improvements on known results for these models Bernstein polynomial prior. A Bernstein polynomial prior is a probability measure on the space of continuous probability distribution functions on [0, 1]. Petrone 1999 introduced the Bernstein polynomial prior Π by putting a prior distribution on the class of Bernstein densities in [0,1] in the following way: bx;k,f = k Fj/k Fj 1/k βx;j,k j +1, 9
10 where βx;a,b stands for the beta density βx;a,b = Γa+b ΓaΓb xa 1 1 x b 1, k has a distribution ρ, F is a random distribution independent of ρ. In other words, if B j stands for the set of all Bernstein densities of order j, then Π = ρjπ B j, where the probability measure Π Bj is the normalized restriction of Π on B j. We refer to Petrone and Wasserman 00 for a detailed description of the Bernstein polynomial prior, in which consistency of the posterior distribution for the Bernstein polynomial prior is discussed. Rates of convergence have been established under suitable tail conditions on ρ by Ghosal 001 and Walker et al.007, where the convergence is understood as convergence in F0 -probability. Ghosal 001, Theorem.3 proved that the prior concentration rate is logn 1/3 /n 1/3 and the entropy rate is logn 5/6 /n 1/3 under the tail assumption ρj e cj for all j, which yields the convergence rate logn 5/6 /n 1/3. Walker etal.007obtainedtheentropyratelogn 1/3 /n 1/3 under thelightertailconditionρj e 4j logj = 1/j j 4 for all j, which leads the convergence rate logn 1/3 /n 1/3. In the following we establish the entropy rate 1/n γ under the tail condition ρj 1/j j c 0 for all j, where c 0 is any fixed positive constant and γ is any fixed constant strictly less than 1/. Hence we also get the convergence rate logn 1/3 /n 1/3 under the weaker condition ρj 1/j j c 0. This improves the result of Walker et al.007. From Ghosal 001 it follows that there exists an absolute constant c > 0 such that Nε n,b j c/ε n j for all j. Given γ < 1/, choose G nj = B j and ε n = 1/n γ. Take 0 < α < 1 such that γ + 1/d < 1 with d = c 0 α/1 α. To verify condition 1 of Corollary 3, by ρj 1/j j c 0 we obtain that e nε n c 1 Nε n,b j 1 α ΠB j α e nε n c 1 c j1 α ρj α ε n e c 1n 1 γ e nε n c 1 1 j cn γ 1/d cn γ c ε n j c 0 α 1 α j d j1 α + j1 α j cn γ 1/d 1 j1 α, where the last sum on the right hand side tends to zero as n. To estimate the first term, note that for gt = cn γ /t d t we have that g t = cn γ /t d t logcn γ /t d d = 0 is equivalent to t = c 1/d v γ/d e 1. This implies gj e d 1n γ/d for all j, where d 1 stands for the constant dc 1/d e 1. Therefore, by the inequality x e x for x 0 we get e c 1n 1 γ 1 j cn γ 1/d cn γ j d j1 α e c 1 n 1 γ cn γ 1/d e d 11 αn γ/d e c 1n 1 γ + 1/d c 1/d +d 1 d 1 αn γ/d 0 as j, since 1 γ > γ/d, and hence condition1 ofcorollary3holds. Condition of Corollary 3 is trivially fulfilled and hence by Corollary 3 we obtain that the entropy rate is at least 1/n γ for any given γ < 1/. 10
11 3.. Prior based on uniform distribution. Ghosal et al.1997 established posterior consistency for prior distributions based on uniform distributions of finite subsets. Priors based on discrete uniform distributions were further studied in Ghosal et al.000, in which they used the bracketing entropy as a tool to compute the convergence rate of posterior distributions. As an application of Theorem 1, we now give a slight extension. Given ε n > 0, assume that there exist density functions f 1,...,f Nn such that all sets {f : H f,f j 3 1/ ε n form a covering of F. Denote by µ n the uniform discrete probability measure on the set {f 1,...,f Nn. Define a prior distribution Π = a j µ j for a given sequence a j with a j > 0 and a j = 1. Theorem 5. If logn n + logn + log 1 a n = Onε n as n, then the posterior distributions Π n converge almost surely at least at the rate ε n, that is, Π n f : Hf0,f rε n 0 as n almost surely for any given sufficiently large r. Proof. From logn = Onε n it follows that e nε n c < for all large c > 0. For any f with H f,f j 3 1/ ε n we have Hf,f j 3 1/ H f,f j ε n. Hence we obtain that {f : H f,f j 3 1/ ε n {f : Hf,f j < ε n and then Jε n,f,0 logn n = Onε n, which implies condition 1 of Theorem 1. Condition is trivially fulfilled. Condition 3 follows from the fact that Πf : H f 0,f ε n Πf : H f 0,f 3 1/ ε n a n /N n e nε n c 3 for some c 3 > 0, since {f : H f 0,f 3 1/ ε n contains at least some density function of the set {f 1,...,f Nn. The proof of Theorem 5 is complete. It seems to be unusual to find a covering of the density space with covering sets of type {f : H f,f j cε n. The most widely used norm for continuous functions should be the supremum norm. In fact, one can easily construct a new covering N n {f : H f,f j cε n of F in terms of a given covering N n {f : f g j ε n with nonnegative bounded functions g j not necessarily density functions, as shown in the following: Take f j x = g j x+ε n / X g j x+ε n µdx, where we assume that µ is a probability measure on X and ε n 1 for all n. Then for each f with f g j ε n, that is, gj x ε n fx g j x+ε n on X, we have fx f j x = fx g j x+ε n X = 1+4ε n +4ε n g j x+ε n µdx X X f j xµdx 1+ε n, fj x+ε µdx n where f j is some density function in {f : f g j ε n and the last inequality follows from f 1 f = 1. This implies that H f,f j 3 1+ε n Hf,f j Hf,f j H f, g j +ε n +H g j +ε n,f j 11
12 1 µdx 4ε n + g j x+ε n X Therefore, we have N n N { n f : H f,f j 8ε n 1 4ε n + 1+ε n 1 = 8ε n. { f : f gj ε n F. Observe that the numbers of covering subsets in both type coverings are equal. For models with H f,g controlled by a constant multiple of the Hellinger metric Hf, g such as the exponential family and a model with uniformly bounded supremum norm f/g for all density functions f and g, it is not necessary to assume that the probability measures µ n constructed above concentrate on a finite number of points. Here we give an extension of Theorem 5. Let c 0 1 and F c0 be a subfamily of F such that H f,g c 0 Hf,g for all f,g F c0. Given ε n > 0, let {P 1,...,P Kn be a partition of F c0 such that for each P i there exists f i in F c0 with P i {f : Hf i,f ε n /c 0. Take any probability measure µ n on F c0 with µ n P i = 1/K n for i = 1,,...,K n. Define then a prior distribution Π = a j µ j for a given sequence a j with a j > 0 and a j = 1. Now we have Theorem 6. Let f 0 F c0. If logk n + logn + log 1 a n = Onε n as n, then the posterior distributions Π n converge at least at the rate ε n almost surely. Proof. By the proof of Theorem 5, we only need to verify condition 3 of Theorem 1. Take f i0 F c0 such that Hf 0,f i0 ε n /c 0. Then, for all f F c0 with Hf i0,f ε n /c 0, we have that H f 0,f c 0 Hf 0,f c 0 Hf 0,f i0 + c 0 Hf i0,f ε n. Hence we get that Πf : H f 0,f ε n ΠP i0 a n /K n e nε n c 3 for some c 3 > 0, and the proof of Theorem 6 is complete. Observe that, given a covering {O 1,O,...,O Kn of F c0, one can easily construct a partition {P 1,P,...,P Kn of F c0 in the following way: P 1 = O 1 F c0 and P i = O i i 1 l=1 P l F c0 for i =,3,...,K n. Example Exponential families. We consider the exponential family of all density functions of the form e hx, where the function hx belongs to a fixed bounded subset in the Sobolev space C p [0,1] with p > 0. A subclass of this family has been recently studied by Scricciolo 006. Following a result of Kolmogorov and Tihomirov 1959, we know that the ε-entropy of this family with respect to the norm equals Oε 1/p. Thus, using the above argument we get that logk n = Onε n for ε n = n p/p+1 and hence by Theorem 6 the posterior distributions constructed above converge at the rate ε n = n p/p+1, which is known to be the optimal rate of convergence in the minimax sense under the Hellinger loss Log spline models. Log spline models for density estimation have been studied, among others, by Stone 1990 and Ghosal et al.000. Let [ k 1/K n,k/k n with 1
13 k = 1,,...,K n be a partition of the interval [0,1. The space of splines of order q relative to this partition is the set of all functions f : [0,1 R such that f is q times continuously differentiable on [0,1 and the restriction of f on each [ k 1/K n,k/k n is a polynomial of degree strictly less then q. Let J n = q+k n 1. This space of splines is a J n -dimensional vector space with a B-spline basis B 1 x,b x,...,b Jn x, see Ghosal et al.000 for the details of such a basis. Let F be the set of all density functions in C α [0,1]. Assume that the true density function f 0 x is bounded away from zero and infinity. We consider the J n -dimensional exponential subfamily of C α [0,1] of the form J n f θ x = exp θ j B j x cθ where θ = θ 1,θ,...,θ Jn Θ 0 = {θ 1,θ,...,θ Jn R J n : J n θ j = 0 and the constant cθ is chosen such that f θ x is a density function in [0,1]. Each prior on Θ 0 induces naturally a prior on F. Let θ = max j θ j be the infinity norm on Θ 0. Assume that a 1 n 1/α+1 K n a n 1/α+1 for two fixed positive constants a 1 and a. Assume that the prior Π for Θ 0 is supported on [ M,M] J n for some M 1 and has a density function with respect to the Lebesgue measure on Θ 0, which is bounded below by a J n 3 and above by a J n 4. Take a constant d > 0 such that d θ logf θ x for all θ Θ 0. Ghosal et al.000, Theorem 4.5 proved that, if f 0 C α [0,1] with q α 1/ and logf 0 x dm/, the posteriors Π n converge in probability at the rate n α/α+1. Using Corollary 5 we now get that under the same assumptions as in Ghosal et al.000, Theorem 4.5, the posteriors Π n are in fact convergent almost surely at the rate ε n = n α/α+1. To see this, take G n = F. Clearly, nε n logn for all large n. Condition 1 of Corollary 5 has been verified by Ghosal et al.000 and condition is trivially fulfilled. Condition 3 follows also from the proof of Theorem 4.5 in Ghosal et al.000, since the inequality H f 0,f θ Hf 0,f θ f 0 /f θ 1/ holds for all θ Θ Finite-dimensional models. Let β > 0 and let Θ be a bounded subset in R d with the Euclidean norm. Denote by F the family of all density functions f θ with the parameter θ in Θ satisfying a 1 θ 1 θ β Hf θ1,f θ 3H f θ1,f θ a θ 1 θ β for all θ 1, θ Θ, where a 1 and a are two fixed positive constants. Assume that the true value θ 0 is in Θ and that the density function of the prior distribution Π with respect to the Lebesgue measure on Θ is uniformly bounded away from zero and infinity. Under slightly weaker conditions, Ghosal et al.000 proved that the posterior distributions Π n converge in probability at the rate 1/ n. Now we give an almost sure assertion for this model. Theorem 7. Under the above assumptions, the posterior distributions Π n converge almost surely at least at the rate logn/ n. 13,
14 Proof. We shall apply Corollary 5 for G n = F. Clearly, nε n = logn for ε n = logn/ n. Condition 1 has been verified in the proof of Theorem 5.1 of Ghosal et al.000. Condition is trivially fulfilled. Using H f θ0,f θ 3 1/ a θ 0 θ β, we have that ΠW εn Π θ : θ θ 0 3ε n /a 1/β. Hence, the verification of condition 3 follows from the same lines as the proof of Ghosal et al.000, Theorem 5 and then by Corollary 5 we conclude the proof of Theorem Lemmas and Proofs. In this section we give proofs of our lemmas and theorems. For simplicity of notations, we assume throughout this section that ε n = ε n = ε n. Proof of Lemma 1. ItisnorestrictiontoassumethatΠ W ε > 0. UsingJensen sinequality for the convex function x 1/ for x > 0 and Chebyshev s inequality, we obtain that F 0 On the other hand, we have = 1+ X = 1+ X = 1+ X F R nfπdf e nε 3+c Π W ε F0 R n fπdf e nε W ε = F0 e nε 3 +c 1 R n fπdf ΠW ε W ε F0 e nε 3 +c 1 ΠW ε e nε 3 +c 1 ΠW ε E R n f 1 Πdf W ε = e nε 3 +c 1 ΠW ε E W ε E 3+c Π W ε 1 W ε R n f 1 Πdf f 0 X 1 fx 1 n Πdf. f 0 X 1 fx 1 = 1+E f0 X 1 fx 1 fx1 f0 x fx fx f0 x fx+ fx f0 xµdx f0 x f0 x fx µdx+ fx f0 x f0 x fx µdx+ 1 fx 14 X f0 x fxf 0 x µdx X f0 x fx µdx
15 = 1+ 3 H f 0,f e 3 H f 0,f e 3 ε, where the last inequality holds when f W ε. Hence we get F 0 F R nfπdf e nε 3+c Π W ε e nε 3 +c 1 ΠW ε The proof of Lemma 1 is complete. W ε e 3 nε Πdf = e nε c. In the proof of Theorem 1 we use the following Lemma, which is similar to Lemma 5 of Barron et al Lemma. Let c > 0 and c 3 0. Let {ε n be a positive sequence such that ΠW ε n e nε n c 3 for all n and e nε n c <. If a sequence {D n of subsets in F satisfies e nε n 3+3c +c 3 ΠD n <, then Π n D n 0 almost surely as n. Proof. From Chebyshev s inequality and Fubini s theorem it turns out that F 0 { D n R n fπdf e nε = e nε n 3+3c +c 3 n 3+3c +c 3 e nε n 3+3c +c 3 E R n fπdf D n ER n fπdf = e nε n 3+3c +c 3 ΠD n D n for all n. Hence by the first Borel-Cantelli Lemma we get that D n R n fπdf e nε n 3+3c +c 3 almost surely for all n large enough. On the other hand, Lemma 1 and the first Borel- Cantelli Lemma yield that F R nfπdf ΠW εn e nε n 3+c e nε n 3+c +c 3 almost surely for all n. Therefore, we obtain that with probability one, Π n D n = D n R n fπdf F R nfπdf e nε n c, which tends to zero as n and the proof of Lemma is complete. 15
16 Proof of Theorem 1. It isclear that if condition1 holds for some α = α 0 then it also holds 3α+αc for any α α 0. So we may assume that 0 < α < 1. Given r > + +αc 3 +c 1 1 α, we have Π n A rεn Π n Gn A rεn +Πn Aεn \G n. Itthen followsfrom Lemma that Π n Aεn \G n 0 almostsurelyasn. Soitsuffices toprovethatπ n Gn A rεn 0almostsurely asn. BythedefinitionofJεn,G n,α, for each fixed n there exist functions f 1,f,...,f N in L µ such that G n A rεn N B j, where B j = G n A rεn {f : Hf j, f < ε n and N ΠB j α e Jε n,g n,α. It is no restriction to assume that all the sets B j are disjoint and nonempty. Taking a fj B j we get that Hf j,f 0 Hfj,f 0 Hfj,f j r 1ε n. Now for each B j we have B j R n fπdf = ΠB j n 1 B j R k+1 fπdf B j R k fπdf = ΠB j n 1 f kbj X k+1 f 0 X k+1, wheref kbj x = B j fxr k fπdf / B j R k fπdfandr 0 f = 1. Thefunction f kbj was introduced by Walker 004 and can be considered as the predictive density of f with a normalized posterior distribution, restricted on the set B j. Clearly, Jensen s inequality yields that Hf kbj,f j ε n for each k. Hence Hf kbj,f 0 Hf j,f 0 Hf j,f kbj r ε n > 0. Since e nε n c <, it turns out from Lemma 1 and the first Borel- Cantelli Lemma that F R nfπdf e nε n 3+c Π W εn almost surely for all n large enough. Hence, by condition 3 we obtain that Π n Gn A rεn Πn G n A rεn α N Π n B j α N Π n B j α = N ΠB j α n 1 f kbj X k+1 α f 0 X k+1 α F R nfπdf α N ΠB j α n 1 f kbj X k+1 α f 0 X k+1 α α ΠW εn e nε n 3+c e nε n 3+c +c 3 α N n 1 ΠB j α almost surely for all n large enough. Since r > + f kbj X k+1 α f 0 X k+1 α 3α+αc +αc 3 +c 1, the inequality 1 α 3+c +c 3 α < 1 r 1 α c 1 holds. Take a constant b with 3+c +c 3 α < b < 1 r 1 α c 1. Denote F k = σ{x 1,X,...,X k. Then we have F 0 { N n 1 ΠB j α f kbj X k+1 α f 0 X k+1 α e nε n b 16
17 where E n 1 N e nε n b E n 1 ΠB j α N = e nε n b ΠB j α E n 1 n 1 e Jε n,g n,α+nε n b max E 1 j N = E f kbj X k+1 α n 1 f 0 X k+1 α = E E n f kbj X k+1 α f 0 X k+1 α f kbj X k+1 α f 0 X k+1 α f kbj X k+1 α f 0 X k+1 E fn 1Bj X n α α f 0 X n α f kbj X k+1 α f 0 X k+1 α, f kbj X k+1 α f 0 X k+1 α F n 1 F n 1. By the conditional Hölder s inequality we get that with probability one, fn 1Bj X n α E f 0 X n α F n 1 fn 1Bj X n α = E f 0 X n α f n 1Bj X n α f 0 X n α F n 1 fn 1Bj X n α α E f 0 X n α α F n 1 α fn 1Bj X n α α = E f 0 X n α α fn 1Bj X n α α E f 0 X n α α F n 1 α. F n 1 α Take the integer m with α 1 α m < α. Repeating the above procedure m 1 more 1 α times we obtain that with probability one, fn 1Bj X n α E f 0 X n α F n 1 fn 1Bj X n E f 0 X n α m 1 α+α α m 1 α+α F n 1 m 1 α+α m, which by the conditional Hölder s inequality is less than fn 1Bj X n 1 E f 0 X n 1 F n 1 = 1 Hf n 1B j,f 0 α m 1 α = f n 1Bj X n f 0 X n µdx n m 1 α m 1 1 r ε n e m r αε n e 1 r α 1ε n. 17 α m 1
18 Hence, with probability one, we have n 1 E f kbj X k+1 α f 0 X k+1 α n e 1 r α 1ε n E f kbj X k+1 α f 0 X k+1 α. Repeating the same argument n 1 times, we obtain that for each j, n 1 E f kbj X k+1 α f 0 X k+1 α Therefore, we have gotten that for all n, F 0 { N e Jε n,g n,α+nε n n 1 ΠB j α e 1 r α 1nε n. f kbj X k+1 α f 0 X k+1 α b e nε n b+ 1 r α 1 e Jε n,g n,α nε n c 1. Thus, together with condition 1, the first Borel-Cantelli Lemma yields that N n 1 ΠB j α f kbj X k+1 α f 0 X k+1 α e nε n b almost surely for all n large enough. Hence we have Π n Gn A rεn e nε n 3α+αc +αc 3 b, which tends to zero as n, since 3+c +c 3 α < b and nε n proof of Theorem 1 is complete. as n. The To prove Theorem, we need a replacement of Lemma under weaker conditions. Lemma 3. Let c 0 and let {ε n be a positive sequence such that ΠB ε n e nε n c for all n. If a sequence {D n of subsets in F satisfies e nε n +c ΠD n 0 as n, then Π n D n 0 in probability as n. Proof. From Lemma 1 of Shen et al. 001 or Lemma 8.1 of Ghosal et al. 000 it turns out that we have, with probability tending to 1, D Π n D n n R n fπdf e nε n +c R n fπdf. ΠB ε n e nε n D n Hence for any given δ > 0 we have that F 0 { Πn D n δ F 0 { e nε n +c 18 R n fπdf δ +o1 D n
19 1 δ enε n +c E R n fπdf+o1 D n = 1 δ enε n +c ΠD n +o1 0 as n, which concludes the proof of Lemma 3. Proof of Theorem. Assumethat0 < α < 1. TheproofofTheoremfollowsfromthesame lines as the proof of Theorem 1. By Lemma 3 it suffices to prove that Π n Gn A rεn 0 in probability as n. For any given δ > 0, following the proof of Theorem 1 we get F 0 { Πn Gn A rεn δ F 0 { e nε n +c α N n 1 ΠB j α f kbj X k+1 α f 0 X k+1 α δ +o1 1 N δ enε n +c α ΠB j α E n 1 f kbj X k+1 α f 0 X k+1 α +o1 n 1 δ ejε n,g n,α+nε n +cα max E 1 j N δ ejε n,g n,α+nε n +c α+ 1 r α 1 f kbj X k+1 α f 0 X k+1 α +o1 +o1 δ ejε n,g n,α nε n c +o1 0 as n, α+αc where the last inequality follows from r > + +c 1. The proof of Theorem is 1 α complete. Proof of Theorem 3. Since Π n A rn ε n Π n Gn A rn ε n +Πn Aεn \G n, it suffices that the terms on the right hand side both tend to zero in probability. Given δ > 0, the proof of Lemma 3 implies that F 0 { Πn Aεn \G n δ e nε n ΠAεn \G n δ ΠB ε n +o1, which by condition 1 tends to zero as n. Assume that [r n ] stands for the largest integer less than or equal to r n and assume that D j = { f G n : jε n Hf 0,f < jε n Indeed, D j is an empty set for j > /ε n since the Hellinger distance cannot exceed. Then we have F 0 { Πn Gn A rn ε n δ F 0 { j=[r n ] Π n D j δ 19
20 F 0 { j=[r n ] Π n D j α δ 1 δ E j=[r n ] Π n D j α 1 δ j=[r n ] EΠ n D j α. Take a partition N j i=1 D ji for each D j such that D ji {f : Hf ji, f < jε n 3 for some f ji in L µ and N j ΠD ji α exp J jε n 3,D j,α e c 1j nε n ΠBε n α, i=1 where the last inequality follows from condition. Using the same argument as the proof of Theorem 1, one can get Hf kdji,f 0 jε n /3 and hence we have with probability tending to 1 F 0 { Πn Gn A rn ε n δ 1 δ j=[r n ] E N j i=1 α Π n D ji δ 1 δ 3 δnε n j=[r n ] N j j=[r n ] i=1 j=[r n ] EΠ n D ji α eαnε n δπb ε n α e nε n α+c 1j j α 1 δ j=[r n ] N j j=[r n ] i=1 ΠD ji α e 1 18 j nε n α 1 1 nε n α+c1 j j α 1 1 c 1 j j 1 α 3 δnε n c α = 3 δnεn c α [r n ] 1 j=[r n ] 1 j 1 1 j which tends to zero as n, since nε n are uniformly bounded away from zero and r n, where the second inequality follows from α 0,1, the third from the proof of Theorem 1, the fifth from the elementary inequality e x < 1 x for x > 0 and some of the inequalities only hold for all large n. Thus, we have proved that Π n Gn A rn ε n converges to zero in probability. The proof of Theorem 3 is complete. Proofof Theorem 4. Fromnε n c 0 lognandc 0 c > 1itturnsoutthate nε n c 1/n c 0 c and hence, by the first Borel-Cantelli Lemma and Lemma 1, we obtain that F R nfπdf e nε n 3+c Π W εn almost surely for all large n. Then, following the proofs of Lemma 3 and Theorem 3, one can get that for any δ > 0 and r > 1, F 0 { { Πn Arεn δ F δ { 0 Πn Aεn \G n +F δ 0 Πn Gn A rεn 0
21 enε n 3+c ΠA εn \G n + 4 δπw εn δ j=[r] e nε n 3+c α+c 1 j j α 1 := a n +b n. Condition 1 yields that a n <. On the other hand, since c 1 < 1 α 18, we have that for all n and for all r so large that 3+c α+c 1 + α 1 18 r 1 c 0, 4nc 0 b n 4enε n 3+c α δ 3+c α+c 1 + α 1 18 [r] j=[r] e nε δ 4nc0 1 n c 0c 1 + α 1 18 n c 1+ α 1 18 j = 4enε n 3+c α+c 1 + α 1 18 [r] δ 1 e nε n c 1+ α c α+c 1 + α 1 18 r 1 δ 1 c 0c 1 + α 1 4n 18 δ 1 c 0c 1 + α and hence b n <. Thus, by the first Borel-Cantelli Lemma we obtain that Π n Arεn δ almost surely for all large n, which concludes the proof of Theorem 4. REFERENCES BARRON, A., SCHERVISH, M. and WASSERMAN, L The consistency of posterior distributions in nonparametric problems. Ann. Statist. 7, GHOSAL, S Convergence rates for density estimation with Bernstein polynomials. Ann. Statist. 9, GHOSAL, S., GHOSH, J. K. and RAMAMOORTHI, R. V Non-informative priors via sieves and packing numbers. In Advances in Statistical Decision Theory and Applications S. Panchapakeshan and N.Balakrishnan eds Birkhäuser, Boston. GHOSAL, S., GHOSH, J. K. and VAN DERVAART, A. W Convergence rates of posterior distributions. Ann. Statist. 8, GHOSAL,S.andVAN DERVAART,A.W.001. Entropiesand ratesof convergencefor maximum likelihood and Bayes estimation for mixtures of normal densities. Ann. Statist. 9, GHOSAL, S. and VAN DER VAART, A. W. 007a. Convergence rates of posterior distributions for noniid observations. Ann. Statist. 35, GHOSAL, S. and VAN DER VAART, A. W. 007b. Posterior convergence rates of Dirichlet mixtures at smooth densities. Ann. Statist. 35, KOLMOGOROV, A. N. and TIHOMIROV, V. M ε-entropy and ε-capacity of sets in function spaces. Uspekhi Mat. Nauk 14, 3-86 [in Russian; English transl. Amer. Math. Soc. Transl. Ser., 17, ]. LIJOI, A., PRUNSTER, I. and WALKER, S On convergence rates for nonparametric posterior distributions. Aust. N. Z. J. Stat. 493, PETRONE, S Random Bernstein polynomials. Scand. J. Statist. 6, PETRONE, S. and WASSERMAN, L. 00. Consistency of Bernstein polynomial posteriors. J. R. Stat. Soc. Ser. B Stat. Methodol. 64, SCHWARTZ, L On Bayes procedures Z. Wahr. verw. Geb. 4,
22 SCRICCIOLO, C Convergence rates for Bayesian density estimation of infinite-dimensional exponential families. Ann. Statist. 34, SHEN, X. and WASSERMAN, L Rates of convergence of posterior distributions. Ann. Statist. 9, STONE, C. J Lerge-sample inference for log-spline models. Ann. Statist. 18, WALKER, S New approaches to Bayesian consistency. Ann. Statist. 3, WALKER, S., LIJOI, A. and PRUNSTER, I On rates of convergence for posterior distributions in infinite-dimensional models. Ann. Statist. 35, XING, Y. and RANNEBY, B Sufficient conditions for Bayesian consistency. Preprint. Yang Xing Centre of Biostochastics Swedish University of Agricultural Sciences SE , Umeå Sweden address:
BERNSTEIN POLYNOMIAL DENSITY ESTIMATION 1265 bounded away from zero and infinity. Gibbs sampling techniques to compute the posterior mean and other po
The Annals of Statistics 2001, Vol. 29, No. 5, 1264 1280 CONVERGENCE RATES FOR DENSITY ESTIMATION WITH BERNSTEIN POLYNOMIALS By Subhashis Ghosal University of Minnesota Mixture models for density estimation
More informationBayesian Consistency for Markov Models
Bayesian Consistency for Markov Models Isadora Antoniano-Villalobos Bocconi University, Milan, Italy. Stephen G. Walker University of Texas at Austin, USA. Abstract We consider sufficient conditions for
More informationAsymptotic properties of posterior distributions in nonparametric regression with non-gaussian errors
Ann Inst Stat Math (29) 61:835 859 DOI 1.17/s1463-8-168-2 Asymptotic properties of posterior distributions in nonparametric regression with non-gaussian errors Taeryon Choi Received: 1 January 26 / Revised:
More informationPriors for the frequentist, consistency beyond Schwartz
Victoria University, Wellington, New Zealand, 11 January 2016 Priors for the frequentist, consistency beyond Schwartz Bas Kleijn, KdV Institute for Mathematics Part I Introduction Bayesian and Frequentist
More informationBayesian estimation of the discrepancy with misspecified parametric models
Bayesian estimation of the discrepancy with misspecified parametric models Pierpaolo De Blasi University of Torino & Collegio Carlo Alberto Bayesian Nonparametrics workshop ICERM, 17-21 September 2012
More informationBayesian Regularization
Bayesian Regularization Aad van der Vaart Vrije Universiteit Amsterdam International Congress of Mathematicians Hyderabad, August 2010 Contents Introduction Abstract result Gaussian process priors Co-authors
More informationIntroduction and Preliminaries
Chapter 1 Introduction and Preliminaries This chapter serves two purposes. The first purpose is to prepare the readers for the more systematic development in later chapters of methods of real analysis
More informationNonparametric Bayesian model selection and averaging
Electronic Journal of Statistics Vol. 2 2008 63 89 ISSN: 1935-7524 DOI: 10.1214/07-EJS090 Nonparametric Bayesian model selection and averaging Subhashis Ghosal Department of Statistics, North Carolina
More information1234 S. GHOSAL AND A. W. VAN DER VAART algorithm to adaptively select a finitely supported mixing distribution. These authors showed consistency of th
The Annals of Statistics 2001, Vol. 29, No. 5, 1233 1263 ENTROPIES AND RATES OF CONVERGENCE FOR MAXIMUM LIKELIHOOD AND BAYES ESTIMATION FOR MIXTURES OF NORMAL DENSITIES By Subhashis Ghosal and Aad W. van
More informationBayesian Adaptation. Aad van der Vaart. Vrije Universiteit Amsterdam. aad. Bayesian Adaptation p. 1/4
Bayesian Adaptation Aad van der Vaart http://www.math.vu.nl/ aad Vrije Universiteit Amsterdam Bayesian Adaptation p. 1/4 Joint work with Jyri Lember Bayesian Adaptation p. 2/4 Adaptation Given a collection
More informationBayesian asymptotics with misspecified models
ISSN 2279-9362 Bayesian asymptotics with misspecified models Pierpaolo De Blasi Stephen G. Walker No. 270 October 2012 www.carloalberto.org/research/working-papers 2012 by Pierpaolo De Blasi and Stephen
More informationMAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9
MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended
More informationNonparametric Bayesian Uncertainty Quantification
Nonparametric Bayesian Uncertainty Quantification Lecture 1: Introduction to Nonparametric Bayes Aad van der Vaart Universiteit Leiden, Netherlands YES, Eindhoven, January 2017 Contents Introduction Recovery
More informationis a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications.
Stat 811 Lecture Notes The Wald Consistency Theorem Charles J. Geyer April 9, 01 1 Analyticity Assumptions Let { f θ : θ Θ } be a family of subprobability densities 1 with respect to a measure µ on a measurable
More informationON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS
Bendikov, A. and Saloff-Coste, L. Osaka J. Math. 4 (5), 677 7 ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS ALEXANDER BENDIKOV and LAURENT SALOFF-COSTE (Received March 4, 4)
More informationAn inverse of Sanov s theorem
An inverse of Sanov s theorem Ayalvadi Ganesh and Neil O Connell BRIMS, Hewlett-Packard Labs, Bristol Abstract Let X k be a sequence of iid random variables taking values in a finite set, and consider
More informationOn Consistency of Nonparametric Normal Mixtures for Bayesian Density Estimation
On Consistency of Nonparametric Normal Mixtures for Bayesian Density Estimation Antonio LIJOI, Igor PRÜNSTER, andstepheng.walker The past decade has seen a remarkable development in the area of Bayesian
More informationChapter 4. Measure Theory. 1. Measure Spaces
Chapter 4. Measure Theory 1. Measure Spaces Let X be a nonempty set. A collection S of subsets of X is said to be an algebra on X if S has the following properties: 1. X S; 2. if A S, then A c S; 3. if
More informationRandom Bernstein-Markov factors
Random Bernstein-Markov factors Igor Pritsker and Koushik Ramachandran October 20, 208 Abstract For a polynomial P n of degree n, Bernstein s inequality states that P n n P n for all L p norms on the unit
More informationNonparametric Bayes Inference on Manifolds with Applications
Nonparametric Bayes Inference on Manifolds with Applications Abhishek Bhattacharya Indian Statistical Institute Based on the book Nonparametric Statistics On Manifolds With Applications To Shape Spaces
More informationSHARP BOUNDARY TRACE INEQUALITIES. 1. Introduction
SHARP BOUNDARY TRACE INEQUALITIES GILES AUCHMUTY Abstract. This paper describes sharp inequalities for the trace of Sobolev functions on the boundary of a bounded region R N. The inequalities bound (semi-)norms
More informationD I S C U S S I O N P A P E R
I N S T I T U T D E S T A T I S T I Q U E B I O S T A T I S T I Q U E E T S C I E N C E S A C T U A R I E L L E S ( I S B A ) UNIVERSITÉ CATHOLIQUE DE LOUVAIN D I S C U S S I O N P A P E R 2014/06 Adaptive
More informationSTAT 7032 Probability Spring Wlodek Bryc
STAT 7032 Probability Spring 2018 Wlodek Bryc Created: Friday, Jan 2, 2014 Revised for Spring 2018 Printed: January 9, 2018 File: Grad-Prob-2018.TEX Department of Mathematical Sciences, University of Cincinnati,
More informationConsistent semiparametric Bayesian inference about a location parameter
Journal of Statistical Planning and Inference 77 (1999) 181 193 Consistent semiparametric Bayesian inference about a location parameter Subhashis Ghosal a, Jayanta K. Ghosh b; c; 1 d; ; 2, R.V. Ramamoorthi
More informationEntropy and Ergodic Theory Lecture 15: A first look at concentration
Entropy and Ergodic Theory Lecture 15: A first look at concentration 1 Introduction to concentration Let X 1, X 2,... be i.i.d. R-valued RVs with common distribution µ, and suppose for simplicity that
More informationAn introduction to some aspects of functional analysis
An introduction to some aspects of functional analysis Stephen Semmes Rice University Abstract These informal notes deal with some very basic objects in functional analysis, including norms and seminorms
More information1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3
Index Page 1 Topology 2 1.1 Definition of a topology 2 1.2 Basis (Base) of a topology 2 1.3 The subspace topology & the product topology on X Y 3 1.4 Basic topology concepts: limit points, closed sets,
More informationReal Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi
Real Analysis Math 3AH Rudin, Chapter # Dominique Abdi.. If r is rational (r 0) and x is irrational, prove that r + x and rx are irrational. Solution. Assume the contrary, that r+x and rx are rational.
More informationRATES OF CONVERGENCE OF ESTIMATES, KOLMOGOROV S ENTROPY AND THE DIMENSIONALITY REDUCTION PRINCIPLE IN REGRESSION 1
The Annals of Statistics 1997, Vol. 25, No. 6, 2493 2511 RATES OF CONVERGENCE OF ESTIMATES, KOLMOGOROV S ENTROPY AND THE DIMENSIONALITY REDUCTION PRINCIPLE IN REGRESSION 1 By Theodoros Nicoleris and Yannis
More informationINDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS
INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS STEVEN P. LALLEY AND ANDREW NOBEL Abstract. It is shown that there are no consistent decision rules for the hypothesis testing problem
More information10 Typical compact sets
Tel Aviv University, 2013 Measure and category 83 10 Typical compact sets 10a Covering, packing, volume, and dimension... 83 10b A space of compact sets.............. 85 10c Dimensions of typical sets.............
More informationSome Background Material
Chapter 1 Some Background Material In the first chapter, we present a quick review of elementary - but important - material as a way of dipping our toes in the water. This chapter also introduces important
More information3 Integration and Expectation
3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ
More informationarxiv: v2 [math.st] 18 Feb 2013
Bayesian optimal adaptive estimation arxiv:124.2392v2 [math.st] 18 Feb 213 using a sieve prior Julyan Arbel 1,2, Ghislaine Gayraud 1,3 and Judith Rousseau 1,4 1 Laboratoire de Statistique, CREST, France
More informationP-adic Functions - Part 1
P-adic Functions - Part 1 Nicolae Ciocan 22.11.2011 1 Locally constant functions Motivation: Another big difference between p-adic analysis and real analysis is the existence of nontrivial locally constant
More informationThe Hilbert Transform and Fine Continuity
Irish Math. Soc. Bulletin 58 (2006), 8 9 8 The Hilbert Transform and Fine Continuity J. B. TWOMEY Abstract. It is shown that the Hilbert transform of a function having bounded variation in a finite interval
More informationDefining the Integral
Defining the Integral In these notes we provide a careful definition of the Lebesgue integral and we prove each of the three main convergence theorems. For the duration of these notes, let (, M, µ) be
More information4 Countability axioms
4 COUNTABILITY AXIOMS 4 Countability axioms Definition 4.1. Let X be a topological space X is said to be first countable if for any x X, there is a countable basis for the neighborhoods of x. X is said
More informationDISCUSSION: COVERAGE OF BAYESIAN CREDIBLE SETS. By Subhashis Ghosal North Carolina State University
Submitted to the Annals of Statistics DISCUSSION: COVERAGE OF BAYESIAN CREDIBLE SETS By Subhashis Ghosal North Carolina State University First I like to congratulate the authors Botond Szabó, Aad van der
More informationIN this paper, we study two related problems of minimaxity
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 45, NO. 7, NOVEMBER 1999 2271 Minimax Nonparametric Classification Part I: Rates of Convergence Yuhong Yang Abstract This paper studies minimax aspects of
More informationBayesian Aggregation for Extraordinarily Large Dataset
Bayesian Aggregation for Extraordinarily Large Dataset Guang Cheng 1 Department of Statistics Purdue University www.science.purdue.edu/bigdata Department Seminar Statistics@LSE May 19, 2017 1 A Joint Work
More informationIntroduction to Real Analysis Alternative Chapter 1
Christopher Heil Introduction to Real Analysis Alternative Chapter 1 A Primer on Norms and Banach Spaces Last Updated: March 10, 2018 c 2018 by Christopher Heil Chapter 1 A Primer on Norms and Banach Spaces
More informationAsymptotic efficiency of simple decisions for the compound decision problem
Asymptotic efficiency of simple decisions for the compound decision problem Eitan Greenshtein and Ya acov Ritov Department of Statistical Sciences Duke University Durham, NC 27708-0251, USA e-mail: eitan.greenshtein@gmail.com
More informationHILBERT SPACES AND THE RADON-NIKODYM THEOREM. where the bar in the first equation denotes complex conjugation. In either case, for any x V define
HILBERT SPACES AND THE RADON-NIKODYM THEOREM STEVEN P. LALLEY 1. DEFINITIONS Definition 1. A real inner product space is a real vector space V together with a symmetric, bilinear, positive-definite mapping,
More informationarxiv: v2 [math.co] 11 Mar 2008
Testing properties of graphs and functions arxiv:0803.1248v2 [math.co] 11 Mar 2008 Contents László Lovász Institute of Mathematics, Eötvös Loránd University, Budapest and Balázs Szegedy Department of Mathematics,
More informationStatistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation
Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider
More informationEmpirical Processes: General Weak Convergence Theory
Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated
More informationTopological properties
CHAPTER 4 Topological properties 1. Connectedness Definitions and examples Basic properties Connected components Connected versus path connected, again 2. Compactness Definition and first examples Topological
More informationA LITTLE REAL ANALYSIS AND TOPOLOGY
A LITTLE REAL ANALYSIS AND TOPOLOGY 1. NOTATION Before we begin some notational definitions are useful. (1) Z = {, 3, 2, 1, 0, 1, 2, 3, }is the set of integers. (2) Q = { a b : aεz, bεz {0}} is the set
More informationTheorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1
Chapter 2 Probability measures 1. Existence Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension to the generated σ-field Proof of Theorem 2.1. Let F 0 be
More informationTopology. Xiaolong Han. Department of Mathematics, California State University, Northridge, CA 91330, USA address:
Topology Xiaolong Han Department of Mathematics, California State University, Northridge, CA 91330, USA E-mail address: Xiaolong.Han@csun.edu Remark. You are entitled to a reward of 1 point toward a homework
More informationStat 8112 Lecture Notes Weak Convergence in Metric Spaces Charles J. Geyer January 23, Metric Spaces
Stat 8112 Lecture Notes Weak Convergence in Metric Spaces Charles J. Geyer January 23, 2013 1 Metric Spaces Let X be an arbitrary set. A function d : X X R is called a metric if it satisfies the folloing
More informationConvergence of latent mixing measures in nonparametric and mixture models 1
Convergence of latent mixing measures in nonparametric and mixture models 1 XuanLong Nguyen xuanlong@umich.edu Department of Statistics University of Michigan Technical Report 527 (Revised, Jan 25, 2012)
More informationCourse 212: Academic Year Section 1: Metric Spaces
Course 212: Academic Year 1991-2 Section 1: Metric Spaces D. R. Wilkins Contents 1 Metric Spaces 3 1.1 Distance Functions and Metric Spaces............. 3 1.2 Convergence and Continuity in Metric Spaces.........
More informationTHE SET OF RECURRENT POINTS OF A CONTINUOUS SELF-MAP ON AN INTERVAL AND STRONG CHAOS
J. Appl. Math. & Computing Vol. 4(2004), No. - 2, pp. 277-288 THE SET OF RECURRENT POINTS OF A CONTINUOUS SELF-MAP ON AN INTERVAL AND STRONG CHAOS LIDONG WANG, GONGFU LIAO, ZHENYAN CHU AND XIAODONG DUAN
More informationVARIETIES OF ABELIAN TOPOLOGICAL GROUPS AND SCATTERED SPACES
Bull. Austral. Math. Soc. 78 (2008), 487 495 doi:10.1017/s0004972708000877 VARIETIES OF ABELIAN TOPOLOGICAL GROUPS AND SCATTERED SPACES CAROLYN E. MCPHAIL and SIDNEY A. MORRIS (Received 3 March 2008) Abstract
More informationOptimal global rates of convergence for interpolation problems with random design
Optimal global rates of convergence for interpolation problems with random design Michael Kohler 1 and Adam Krzyżak 2, 1 Fachbereich Mathematik, Technische Universität Darmstadt, Schlossgartenstr. 7, 64289
More informationThe Heine-Borel and Arzela-Ascoli Theorems
The Heine-Borel and Arzela-Ascoli Theorems David Jekel October 29, 2016 This paper explains two important results about compactness, the Heine- Borel theorem and the Arzela-Ascoli theorem. We prove them
More informationUNIFORMLY DISTRIBUTED MEASURES IN EUCLIDEAN SPACES
MATH. SCAND. 90 (2002), 152 160 UNIFORMLY DISTRIBUTED MEASURES IN EUCLIDEAN SPACES BERND KIRCHHEIM and DAVID PREISS For every complete metric space X there is, up to a constant multiple, at most one Borel
More informationA PARAMETRIC MODEL FOR DISCRETE-VALUED TIME SERIES. 1. Introduction
tm Tatra Mt. Math. Publ. 00 (XXXX), 1 10 A PARAMETRIC MODEL FOR DISCRETE-VALUED TIME SERIES Martin Janžura and Lucie Fialová ABSTRACT. A parametric model for statistical analysis of Markov chains type
More informationIntegral Jensen inequality
Integral Jensen inequality Let us consider a convex set R d, and a convex function f : (, + ]. For any x,..., x n and λ,..., λ n with n λ i =, we have () f( n λ ix i ) n λ if(x i ). For a R d, let δ a
More informationApproximation by polynomials with bounded coefficients
Journal of Number Theory 27 (2007) 03 7 www.elsevier.com/locate/jnt Approximation by polynomials with bounded coefficients Toufik Zaimi Département de mathématiques, Centre universitaire Larbi Ben M hidi,
More informationRecall that if X is a compact metric space, C(X), the space of continuous (real-valued) functions on X, is a Banach space with the norm
Chapter 13 Radon Measures Recall that if X is a compact metric space, C(X), the space of continuous (real-valued) functions on X, is a Banach space with the norm (13.1) f = sup x X f(x). We want to identify
More informationProblem set 1, Real Analysis I, Spring, 2015.
Problem set 1, Real Analysis I, Spring, 015. (1) Let f n : D R be a sequence of functions with domain D R n. Recall that f n f uniformly if and only if for all ɛ > 0, there is an N = N(ɛ) so that if n
More informationAsymptotics for posterior hazards
Asymptotics for posterior hazards Pierpaolo De Blasi University of Turin 10th August 2007, BNR Workshop, Isaac Newton Intitute, Cambridge, UK Joint work with Giovanni Peccati (Université Paris VI) and
More informationAdaptive Bayesian procedures using random series priors
Adaptive Bayesian procedures using random series priors WEINING SHEN 1 and SUBHASHIS GHOSAL 2 1 Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030 2 Department
More informationarxiv: v1 [math.st] 1 Aug 2007
The Annals of Statistics 2006, Vol. 34, No. 6, 2897 2920 DOI: 10.1214/009053606000000911 c Institute of Mathematical Statistics, 2006 arxiv:0708.0175v1 [math.st] 1 Aug 2007 CONVERGENCE RATES FOR BAYESIAN
More informationSummary of Real Analysis by Royden
Summary of Real Analysis by Royden Dan Hathaway May 2010 This document is a summary of the theorems and definitions and theorems from Part 1 of the book Real Analysis by Royden. In some areas, such as
More informationarxiv: v1 [math.fa] 14 Jul 2018
Construction of Regular Non-Atomic arxiv:180705437v1 [mathfa] 14 Jul 2018 Strictly-Positive Measures in Second-Countable Locally Compact Non-Atomic Hausdorff Spaces Abstract Jason Bentley Department of
More informationThe Measure Problem. Louis de Branges Department of Mathematics Purdue University West Lafayette, IN , USA
The Measure Problem Louis de Branges Department of Mathematics Purdue University West Lafayette, IN 47907-2067, USA A problem of Banach is to determine the structure of a nonnegative (countably additive)
More informationAnalysis Qualifying Exam
Analysis Qualifying Exam Spring 2017 Problem 1: Let f be differentiable on R. Suppose that there exists M > 0 such that f(k) M for each integer k, and f (x) M for all x R. Show that f is bounded, i.e.,
More informationThe main results about probability measures are the following two facts:
Chapter 2 Probability measures The main results about probability measures are the following two facts: Theorem 2.1 (extension). If P is a (continuous) probability measure on a field F 0 then it has a
More informationON LEFT-INVARIANT BOREL MEASURES ON THE
Georgian International Journal of Science... Volume 3, Number 3, pp. 1?? ISSN 1939-5825 c 2010 Nova Science Publishers, Inc. ON LEFT-INVARIANT BOREL MEASURES ON THE PRODUCT OF LOCALLY COMPACT HAUSDORFF
More informationSome topics in analysis related to Banach algebras, 2
Some topics in analysis related to Banach algebras, 2 Stephen Semmes Rice University... Abstract Contents I Preliminaries 3 1 A few basic inequalities 3 2 q-semimetrics 4 3 q-absolute value functions 7
More informationProbability and Measure
Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 84 Paper 4, Section II 26J Let (X, A) be a measurable space. Let T : X X be a measurable map, and µ a probability
More informationAsymptotic Normality of Posterior Distributions for Exponential Families when the Number of Parameters Tends to Infinity
Journal of Multivariate Analysis 74, 4968 (2000) doi:10.1006jmva.1999.1874, available online at http:www.idealibrary.com on Asymptotic Normality of Posterior Distributions for Exponential Families when
More informationProofs for Large Sample Properties of Generalized Method of Moments Estimators
Proofs for Large Sample Properties of Generalized Method of Moments Estimators Lars Peter Hansen University of Chicago March 8, 2012 1 Introduction Econometrica did not publish many of the proofs in my
More informationFoundations of Nonparametric Bayesian Methods
1 / 27 Foundations of Nonparametric Bayesian Methods Part II: Models on the Simplex Peter Orbanz http://mlg.eng.cam.ac.uk/porbanz/npb-tutorial.html 2 / 27 Tutorial Overview Part I: Basics Part II: Models
More informationAPPROXIMATE ISOMETRIES ON FINITE-DIMENSIONAL NORMED SPACES
APPROXIMATE ISOMETRIES ON FINITE-DIMENSIONAL NORMED SPACES S. J. DILWORTH Abstract. Every ε-isometry u between real normed spaces of the same finite dimension which maps the origin to the origin may by
More informationOn maxitive integration
On maxitive integration Marco E. G. V. Cattaneo Department of Mathematics, University of Hull m.cattaneo@hull.ac.uk Abstract A functional is said to be maxitive if it commutes with the (pointwise supremum
More informationIn N we can do addition, but in order to do subtraction we need to extend N to the integers
Chapter 1 The Real Numbers 1.1. Some Preliminaries Discussion: The Irrationality of 2. We begin with the natural numbers N = {1, 2, 3, }. In N we can do addition, but in order to do subtraction we need
More informationPROBLEMS. (b) (Polarization Identity) Show that in any inner product space
1 Professor Carl Cowen Math 54600 Fall 09 PROBLEMS 1. (Geometry in Inner Product Spaces) (a) (Parallelogram Law) Show that in any inner product space x + y 2 + x y 2 = 2( x 2 + y 2 ). (b) (Polarization
More informationLecture 4 Lebesgue spaces and inequalities
Lecture 4: Lebesgue spaces and inequalities 1 of 10 Course: Theory of Probability I Term: Fall 2013 Instructor: Gordan Zitkovic Lecture 4 Lebesgue spaces and inequalities Lebesgue spaces We have seen how
More informationChapter 5. Measurable Functions
Chapter 5. Measurable Functions 1. Measurable Functions Let X be a nonempty set, and let S be a σ-algebra of subsets of X. Then (X, S) is a measurable space. A subset E of X is said to be measurable if
More informationMATHS 730 FC Lecture Notes March 5, Introduction
1 INTRODUCTION MATHS 730 FC Lecture Notes March 5, 2014 1 Introduction Definition. If A, B are sets and there exists a bijection A B, they have the same cardinality, which we write as A, #A. If there exists
More informationEstimates for probabilities of independent events and infinite series
Estimates for probabilities of independent events and infinite series Jürgen Grahl and Shahar evo September 9, 06 arxiv:609.0894v [math.pr] 8 Sep 06 Abstract This paper deals with finite or infinite sequences
More informationX n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2)
14:17 11/16/2 TOPIC. Convergence in distribution and related notions. This section studies the notion of the so-called convergence in distribution of real random variables. This is the kind of convergence
More informationChapter 1. Measure Spaces. 1.1 Algebras and σ algebras of sets Notation and preliminaries
Chapter 1 Measure Spaces 1.1 Algebras and σ algebras of sets 1.1.1 Notation and preliminaries We shall denote by X a nonempty set, by P(X) the set of all parts (i.e., subsets) of X, and by the empty set.
More information1.1. MEASURES AND INTEGRALS
CHAPTER 1: MEASURE THEORY In this chapter we define the notion of measure µ on a space, construct integrals on this space, and establish their basic properties under limits. The measure µ(e) will be defined
More informationA VERY BRIEF REVIEW OF MEASURE THEORY
A VERY BRIEF REVIEW OF MEASURE THEORY A brief philosophical discussion. Measure theory, as much as any branch of mathematics, is an area where it is important to be acquainted with the basic notions and
More informationThe 123 Theorem and its extensions
The 123 Theorem and its extensions Noga Alon and Raphael Yuster Department of Mathematics Raymond and Beverly Sackler Faculty of Exact Sciences Tel Aviv University, Tel Aviv, Israel Abstract It is shown
More informationExistence and Uniqueness
Chapter 3 Existence and Uniqueness An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect
More informationAPPROXIMATION OF ENTIRE FUNCTIONS OF SLOW GROWTH ON COMPACT SETS. G. S. Srivastava and Susheel Kumar
ARCHIVUM MATHEMATICUM BRNO) Tomus 45 2009), 137 146 APPROXIMATION OF ENTIRE FUNCTIONS OF SLOW GROWTH ON COMPACT SETS G. S. Srivastava and Susheel Kumar Abstract. In the present paper, we study the polynomial
More informationIntegration on Measure Spaces
Chapter 3 Integration on Measure Spaces In this chapter we introduce the general notion of a measure on a space X, define the class of measurable functions, and define the integral, first on a class of
More informationScalar curvature and the Thurston norm
Scalar curvature and the Thurston norm P. B. Kronheimer 1 andt.s.mrowka 2 Harvard University, CAMBRIDGE MA 02138 Massachusetts Institute of Technology, CAMBRIDGE MA 02139 1. Introduction Let Y be a closed,
More informationREVIEW OF ESSENTIAL MATH 346 TOPICS
REVIEW OF ESSENTIAL MATH 346 TOPICS 1. AXIOMATIC STRUCTURE OF R Doğan Çömez The real number system is a complete ordered field, i.e., it is a set R which is endowed with addition and multiplication operations
More informationON THE COMPLETE CONVERGENCE FOR WEIGHTED SUMS OF DEPENDENT RANDOM VARIABLES UNDER CONDITION OF WEIGHTED INTEGRABILITY
J. Korean Math. Soc. 45 (2008), No. 4, pp. 1101 1111 ON THE COMPLETE CONVERGENCE FOR WEIGHTED SUMS OF DEPENDENT RANDOM VARIABLES UNDER CONDITION OF WEIGHTED INTEGRABILITY Jong-Il Baek, Mi-Hwa Ko, and Tae-Sung
More informationAsymptotics for posterior hazards
Asymptotics for posterior hazards Igor Prünster University of Turin, Collegio Carlo Alberto and ICER Joint work with P. Di Biasi and G. Peccati Workshop on Limit Theorems and Applications Paris, 16th January
More informationContinuity. Chapter 4
Chapter 4 Continuity Throughout this chapter D is a nonempty subset of the real numbers. We recall the definition of a function. Definition 4.1. A function from D into R, denoted f : D R, is a subset of
More information1 Definition of the Riemann integral
MAT337H1, Introduction to Real Analysis: notes on Riemann integration 1 Definition of the Riemann integral Definition 1.1. Let [a, b] R be a closed interval. A partition P of [a, b] is a finite set of
More information