RATE EXACT BAYESIAN ADAPTATION WITH MODIFIED BLOCK PRIORS. By Chao Gao and Harrison H. Zhou Yale University


Submitted to the Annals of Statistics

A novel block prior is proposed for adaptive Bayesian estimation. The prior does not depend on the smoothness of the function or the sample size. It puts sufficient prior mass near the true signal and automatically concentrates on its effective dimension. A rate-optimal posterior contraction is obtained in a general framework, which includes density estimation, the white noise model, the Gaussian sequence model, Gaussian regression and spectral density estimation.

1. Introduction. Bayesian nonparametric estimation is attracting more and more attention in a wide range of applications. We consider a fundamental question in Bayesian nonparametric estimation: Is it possible to construct a prior such that the posterior contracts to the truth with the exact optimal rate and at the same time is adaptive regardless of the unknown smoothness? We provide a positive answer to this question by designing a block prior on the coefficients of an orthogonal series expansion of the function. Specifically, we obtain adaptive Bayesian estimation under a Sobolev ball assumption. Assume that f is a function on the unit interval [0, 1]. Let {φ_j} be the trigonometric orthogonal basis of L²[0, 1], and define θ_j = ∫ f φ_j for each j. The Sobolev ball is specified as

E_α(Q) = { f ∈ L²[0, 1] : Σ_j j^{2α} θ_j² ≤ Q },

with θ_j = ∫ f φ_j for each j. Under a general framework, we construct a prior Π which satisfies the Kullback-Leibler (KL) property and automatically concentrates on the effective dimension of the signal; as a consequence, the minimax posterior contraction rate is obtained, i.e.,

(1)  P_{f}^n Π( ‖f̃ − f‖ > M n^{−α/(2α+1)} | X^n ) → 0,

The research of C. Gao and H. H. Zhou is supported in part by NSF DMS.
AMS 2000 subject classifications: 62G07, 62G20.
Keywords and phrases: Bayesian nonparametrics, adaptive estimation, block prior.

where the loss function is the l²-norm. Adaptive Bayesian estimators over Sobolev balls or Hölder balls have been considered in the literature. There are two main approaches in these works. The first one is to put a hyper-prior on the smoothness index α. As is shown in Scricciolo (2006) and Ghosal, Lember and van der Vaart (2008), the minimax rate can be achieved, but the set of α is restricted to be countable or even finite. The second approach is to put a prior on k, where k is the number of basis functions for approximation, or the model dimension. This is called a sieve prior in Shen and Wasserman (2001). Examples of using the sieve prior include Kruijer and van der Vaart (2008) and Rivoirard and Rousseau (2012). Their procedures are adaptive over all α, but the rates have extra logarithmic terms. Other recent works in Bayesian adaptive estimation include van der Vaart and van Zanten (2007), van der Vaart and van Zanten (2009), de Jonge and van Zanten (2010), Kruijer, Rousseau and van der Vaart (2010), Rousseau (2010), Shen, Tokdar and Ghosal (2013) and Castillo, Kerkyacharian and Picard (2014), but the posterior contraction rates in these works all miss a logarithmic factor. The investigation of whether a logarithmic term is necessary in the posterior contraction rate has fundamental implications. The results can lead to answers to two important questions. First, is the presence of a logarithmic term an intrinsic problem of Bayesian adaptive nonparametric estimation? Second, is the presence of a logarithmic term an artifact due to the current proof technique? The answer to the first question should have an impact on statisticians' views of the frequentist/Bayesian debate. The answer to the second question will provide a better understanding of the famous prior mass and testing framework (Barron, Schervish and Wasserman, 1999; Ghosal, Ghosh and van der Vaart, 2000) that is widely used to establish posterior contraction results.
Compared to the previous results in the literature, the proposed block prior is adaptive over a continuum of smoothness, and its posterior contraction is exactly rate-optimal. The framework for the applications of the block prior is very general. It includes density estimation, white noise, Gaussian sequence, regression and spectral density estimation. When the first draft of this paper was finished, we received a manuscript by Hoffmann, Rousseau and Schmidt-Hieber (2015) on Bayes adaptive estimation. They considered a problem similar to ours and obtained the exact minimax rate by using a spike and slab prior. However, their adaptation result for the l² loss only holds for the white noise model. Since their proof technique takes advantage of the Gaussian sequence structure, it cannot be immediately extended to other model settings. In contrast, by

designing a block prior that works especially well under the prior mass and testing framework, we are able to establish results for models including density estimation, nonparametric regression and spectral density estimation. The major difficulty of adaptation with the exact rate in various model settings is the design of a prior distribution that satisfies the conditions of the general prior mass and testing framework, which can be applied to a wide range of models. This framework was pioneered by Le Cam (1973) and Schwartz (1965), and was later extended to the nonparametric setting by Barron (1988), Barron, Schervish and Wasserman (1999) and Ghosal, Ghosh and van der Vaart (2000). They proved that as long as the prior satisfies a Kullback-Leibler property and there exists a testing procedure on the essential support of the prior, the posterior distribution contracts to the truth with a certain rate of convergence. Though it is possible to analyze the posterior distribution according to the Bayes formula directly as in Hoffmann, Rousseau and Schmidt-Hieber (2015), the prior mass and testing framework imposes the weakest assumptions on the likelihood function, which makes it flexible to various model settings. The price of such flexibility to model settings is the rather strong requirements on the prior. In our opinion, the design of a prior that satisfies the prior mass and testing framework is the major difficulty of achieving rate-optimal adaptation over various model settings. The block prior we propose in this paper gives a solution to this problem. We show that it possesses the strong properties required by the prior mass and testing framework. Therefore, not only does it give rate-optimal adaptation, the good posterior behavior also extends to settings beyond the white noise model. The paper is organized as follows.
In Section, we irst introduce a preliminary block prior Π, which satisies the Kullback-Leibler property and concentrates on the eective dimension o the truth, and then we present the key result o this paper, adaptive rate-optimal posterior contraction or a slightly modiied prior Π under a general ramework. As applications o the main results, we study adaptive Bayesian estimation o various nonparametric models in Section 3. Section 4 discussed the posterior tail probability bound and an extension o the theory to Besov balls. It also includes discussion on why a logarithmic actor is usually present in the Bayes nonparametric literature. The main body o the proos are presented in Section 5. Simulation and some auxiliary results o the proos are given in the supplement Gao and Zhou, 015b Notations. Throughout the paper, P and E are generic probability and expectation operators, which are used whenever the distribution is clear

in the context. Lower and upper case letters denote constants which may vary from line to line. We will not pay attention to the values of constants which do not affect the result, unless otherwise specified. Notice these constants may or may not be universal, which we shall make clear in the context. The function f and its Fourier coefficients θ = {θ_j} are used interchangeably. We say f is distributed by Π if the corresponding θ ∼ Π. In the same way, the function space and the parameter space of f and θ will not be distinguished. The norm ‖·‖ denotes both the L²-norm of f and the l²-norm of θ. For two probabilities P_1 and P_2 with densities p_1 and p_2, we use the following divergences throughout the paper:

D(P_1, P_2) = ∫ p_1 log (p_1/p_2),
V(P_1, P_2) = ∫ p_1 ( log (p_1/p_2) − D(P_1, P_2) )²,
H(P_1, P_2) = ( ∫ ( √p_1 − √p_2 )² )^{1/2}.

We use θ_j and θ_{0j} to indicate the j-th entries of the vectors θ = {θ_j} and θ_0 = {θ_{0j}}, respectively. The bold notation θ_k represents the vector {θ_j}_{j ∈ B_k} for the k-th block. The rate ε_n is always the minimax rate ε_n = n^{−α/(2α+1)}.

2. Main Results. In this section, we first give some necessary background on Bayes nonparametric estimation, then introduce a block prior and the result of adaptive posterior contraction.

2.1. Background. Suppose we have data X^n ∼ P_{f_0}^n, and the distribution P_f^n has density p_f^n with respect to a dominating measure. The posterior distribution for a prior Π is defined to be

Π(A | X^n) = ∫_A (p_f^n / p_{f_0}^n)(X^n) dΠ(f) / ∫ (p_f^n / p_{f_0}^n)(X^n) dΠ(f),

where X^n ∼ P_{f_0}^n. We need to bound the expectation of Π( d(f, f_0) > Mε_n | X^n ) in this paper. To bound this quantity, it is sufficient to upper bound the numerator and lower bound the denominator. Following Barron, Schervish and Wasserman (1999) and Ghosal, Ghosh and van der Vaart (2000), this involves three steps:

1. Show the prior Π puts sufficient mass near the truth, i.e., we need Π(K_n) ≥ exp(−C_1 n ε_n²), where

K_n = { f : D(P_{f_0}^n, P_f^n) ≤ n ε_n², V(P_{f_0}^n, P_f^n) ≤ n ε_n² }.

2. Choose an appropriate set F_n, and show the prior is essentially supported on F_n in the sense that Π(F_n^c) ≤ exp(−C_2 n ε_n²). This controls the complexity of the prior.

3. Construct a testing function φ_n for the following testing problem:

H_0: f = f_0  vs.  H_1: f ∈ supp(Π) ∩ F_n and d(f, f_0) > Mε_n.

The testing error needs to be well controlled in the sense that

P_{f_0}^n φ_n + sup_{H_1} P_f^n (1 − φ_n) ≤ exp(−C_3 n ε_n²).

Note that the constants C_1, C_2 and C_3 are different in these three steps above. Step 1 lower bounds the prior concentration near the truth, which leads to a lower bound for the denominator ∫ (p_f^n / p_{f_0}^n)(X^n) dΠ(f). It originated from Schwartz (1965). Step 2 and Step 3 are mainly for upper bounding the numerator ∫_A (p_f^n / p_{f_0}^n)(X^n) dΠ(f). The testing idea in Step 3 was initialized by Le Cam (1973) and Schwartz (1965). Step 2 goes back to Barron (1988), who proposed the idea of choosing an appropriate F_n to regularize the alternative hypothesis in the test, since otherwise the testing function for Step 3 may never exist (see Le Cam (1973) and Barron (1988)).

2.2. The Block Prior Π. Given a sequence θ = (θ_1, θ_2, ...) in the Hilbert space l², define the blocks to be B_k = {l_k, ..., l_{k+1} − 1}, so that {1, 2, 3, ...} = ∪_{k ≥ 0} B_k. Define the block size of the k-th block to be n_k = l_{k+1} − l_k = |B_k|. Remember the notation θ_k represents the vector {θ_j}_{j ∈ B_k}. The block prior Π on the function f is induced by a distribution on its Fourier sequence {θ_j}. For each k, let g_k be a one-dimensional density function on R_+. We describe Π as follows:

A_k ∼ g_k independently for each k,
θ_k | A_k ∼ N(0, A_k I_{n_k}) independently for each k,

where I_{n_k} is the n_k × n_k identity matrix. In this work, we specify l_k to be l_k = [e^k]. The sequence of densities {g_k} is used to mix the scale parameter A_k for each block, and we call them mixing densities. Our theory covers a class of mixing densities. The mixing density class G contains all {g_k} satisfying the following properties:

1. There exists c_1 > 0 such that, for any k and t ∈ [e^{−2k}, e^k],

(2)  g_k(t) ≥ exp(−c_1 e^k).

2. There exists c_2 > 0 such that, for any k,

(3)  ∫_0^∞ t g_k(t) dt ≤ exp(−c_2 k).

3. There exists c_3 > 0 such that, for any k,

(4)  ∫_{e^k}^∞ g_k(t) dt ≤ exp(−c_3 e^{2k}).

For a function f_0 ∈ E_α(Q), define the set

(5)  F_n = F_n(β) = { θ : Σ_{j > (nβ)^{1/(2α+1)}} (θ_j − θ_{0j})² ≤ ε_n² }.

We have the following theorem characterizing the property of Π.

Theorem 2.1. For the block prior Π with mixing densities {g_k} ∈ G, let f_0 ∈ E_α(Q) for some α, Q > 0. Then there exists a constant C > 0 such that

(6)  Π( Σ_j (θ_j − θ_{0j})² ≤ ε_n² ) ≥ exp(−C n ε_n²),

and

(7)  Π(F_n^c) ≤ exp(−(C + 4) n ε_n²)

for sufficiently large n whenever β ≤ min{ c_3/(C + 4), (4Q)^{−2α/(2α+1)} }, with c_3 defined in (4).
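To fix ideas, the block structure and the two-stage sampling scheme above can be sketched as follows. This is our illustration only: the exponential mixing density used as a default below is a stand-in and is not claimed to belong to the class G.

```python
import math
import random

def blocks(K):
    """Blocks B_k = {l_k, ..., l_{k+1} - 1} with l_k = [e^k], for k = 0, ..., K-1."""
    ls = [int(math.e ** k) for k in range(K + 1)]
    return [list(range(ls[k], ls[k + 1])) for k in range(K)]

def sample_block_prior(K, sample_A=None, rng=None):
    """Two-stage draw: A_k ~ g_k, then theta_k | A_k ~ N(0, A_k I_{n_k})."""
    rng = rng or random.Random(0)
    if sample_A is None:
        # Stand-in mixing density (illustration only, NOT a member of G):
        # A_k ~ Exponential(rate e^k), so the scales decay across blocks.
        sample_A = lambda k: rng.expovariate(math.exp(k))
    theta = []
    for k, B_k in enumerate(blocks(K)):
        A_k = sample_A(k)
        theta.extend(rng.gauss(0.0, math.sqrt(A_k)) for _ in B_k)
    return theta
```

Note that the blocks partition {1, 2, 3, ...} and the k-th block has size n_k = l_{k+1} − l_k, which grows like e^k, matching the multi-resolution structure used throughout the paper.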

Remark 2.1. The theorem presents two properties of the block prior Π. Property (6) says the prior gives sufficient mass near the true signal. This is also recognized as the K-L condition once the Kullback-Leibler divergence is upper bounded by the l²-norm in the support of the prior. Property (7) says the prior concentrates on the effective dimension of the true signal automatically. In Bayesian nonparametric theory, a testing argument is needed to prove a posterior contraction rate. Such a test can be established on a sieve receiving most of the prior mass. In (7), the set F_n can be used as such a sieve.

Remark 2.2. When the smoothness α is known, a well-known prior Π_α = ⊗_j N(0, j^{−2α−1}) is used in the literature. It can be shown that this prior satisfies (6). The block prior Π satisfies (6) and (7), and it does not depend on the smoothness α. Thus it is fully adaptive.

We claim that the mixing density class G is not empty by presenting an example (Figure 1):

(8)  g_k(t) = { e^{2k}(exp(−e^k) − T_k) t + T_k,  0 ≤ t ≤ e^{−2k};  exp(−e^k),  e^{−2k} < t ≤ e^k;  0,  t > e^k. }

The value of T_k is specified as

(9)  T_k = 2e^{2k} − (2e^{3k} − 1) exp(−e^k),

so that g_k integrates to one. The following proposition is proved in the supplementary material (Gao and Zhou, 2015b).

Proposition 2.1. The densities {g_k} defined in (8) satisfy (2), (3) and (4). Thus, G is not empty.

2.3. Adaptive Posterior Contraction of the Modified Block Prior Π̃. In order to prove a posterior contraction rate, it is essential to construct a suitable test. A preliminary test is first constructed in a local neighborhood. Then a global test is established by combining all the local tests when the metric entropy is well controlled. We say the distance d satisfies the testing property with respect to the prior Π and the truth f_0 if and only if there exist constants L > 0 and ξ ∈ (0, 1/2), such that for any f_1 ∈ supp(Π) satisfying d(f_1, f_0) > ε_n, we have

(10)  P_{f_0}^n φ_n ≤ exp(−L n d²(f_1, f_0)),

[Fig 1. The plot of the mixing density function g_k defined in (8).]

(11)  sup_{ {f ∈ supp(Π): d(f, f_1) ≤ ξ d(f_1, f_0)} } P_f^n (1 − φ_n) ≤ exp(−L n d²(f_1, f_0)),

for some testing function φ_n. Then a global test can be constructed for H_0: f = f_0 against H_1 = { f ∈ F_n ∩ supp(Π) : d(f, f_0) > Mε_n }, as long as d(f_1, f_2) ≍ ‖f_1 − f_2‖ for any f_1 and f_2. The equivalence of d and ‖·‖ may not be true when d is the Hellinger distance or total variation. We thus consider a modification of the block prior Π, denoted by Π̃, so that d and ‖·‖ are equivalent in the support of the modified block prior Π̃. Define

Π̃(A) = Π(D ∩ A) / Π(D),

where the constraint set D needs to be designed case by case such that, for f_1, f_2 ∈ D,

D(P_{f_1}^n, P_{f_2}^n) ≤ b n ‖f_1 − f_2‖²,  V(P_{f_1}^n, P_{f_2}^n) ≤ b n ‖f_1 − f_2‖²,  b^{−1} d(f_1, f_2) ≤ ‖f_1 − f_2‖ ≤ b d(f_1, f_2),

for some constant b > 1. We give a specific choice of D for each model considered in this paper. Another crucial property of D we need is that Π̃

inherits properties (6) and (7) from Π. It is obvious that (7) is still true for Π̃ as long as Π(D) > 0. Therefore, one only needs to check (6), which is usually not hard, as we will see in all the examples in Section 3. A general theorem covering all examples in Section 3 is stated as follows.

Theorem 2.2. For the block prior Π with mixing densities {g_k} ∈ G, define Π̃(A) = Π(D ∩ A)/Π(D) with the constraint set D satisfying the properties above. Let the distance d satisfy the testing properties (10) and (11). Assume that, for any f_0 ∈ E_α(Q) ∩ D with α ∈ (α_*, ∞) and Q ∈ (0, Q_*), the prior Π̃ inherits properties (6) and (7) from Π for some C > 0. Then, for any such f_0, there exists M > 0, such that

P_{f_0}^n Π̃( d(f, f_0) > M n^{−α/(2α+1)} | X^n ) → 0.

Remark 2.3. We note that the range α ∈ (α_*, ∞) and Q ∈ (0, Q_*) is the adaptive region of the prior Π̃. It is determined by the constraint set D and by whether properties (6) and (7) can be inherited from Π by Π̃. In some examples such as the white noise model, the modification by D is not needed, so that we have Π̃ = Π. This results in α_* = 0 and Q_* = ∞, and thus the prior adapts to all Sobolev balls. In the regression and the density estimation models, α_* needs to be larger than 1/2, and Q_* can be chosen arbitrarily large by properly picking the corresponding D. For spectral density estimation, we need α_* > 3/2. See Section 3 for details.

Remark 2.4. Theorem 2.2 requires the assumption f_0 ∈ E_α(Q) ∩ D. In all the nonparametric estimation examples we consider in Section 3, we consider very specific forms of D, and we are going to show that such D can be removed from the assumption because of the relation E_α(Q) ⊂ D for α > α_*. This implies E_α(Q) ∩ D = E_α(Q), and we only need f_0 ∈ E_α(Q) in the assumption.

3. Applications. Given the experiment (X^n, A^n, {P_f^n : f ∈ E_α(Q)}) and observation X^n ∼ P_{f_0}^n, we estimate the function f_0 by an adaptive Bayesian procedure. The goal is to achieve the minimax posterior contraction rate without knowing the smoothness α. In this section, we consider the following examples:

1. Density Estimation.
The observations X_1, ..., X_n are i.i.d., distributed

according to the density

p_f(t) = e^{f(t)} / ∫_0^1 e^{f(u)} du,

for some function f in a Sobolev ball.

2. White Noise. The observation Y^n(t) is from the following process:

dY^n(t) = f(t) dt + n^{−1/2} dW(t),

where W(t) is the standard Wiener process.

3. Gaussian Sequence. We have independent observations X_i = θ_i + n^{−1/2} Z_i, i ∈ N, where {θ_i} are the Fourier coefficients of f, and {Z_i} are i.i.d. standard Gaussian variables.

4. Gaussian Regression. The design is uniform, X ∼ U[0, 1]. Given X, Y | X ∼ N(f(X), 1). The observations are i.i.d. pairs (X_1, Y_1), ..., (X_n, Y_n).

5. Spectral Density. The observations are a stationary Gaussian time series X_1, ..., X_n with mean 0 and auto-covariance η_h(g) = ∫_{−π}^{π} e^{ihλ} g(λ) dλ. The spectral density g is modeled by g = exp(f) for some symmetric f in a Sobolev ball.

The above models have similar frequentist estimation procedures, which is due to the deep fact that they are asymptotically equivalent to each other under minor regularity assumptions. References for asymptotic equivalence theory include Brown and Low (1996), Nussbaum (1996), Brown et al. (2002) and Golubev, Nussbaum and Zhou (2010).

3.1. Density Estimation. Let P_f^n be the product measure P_f^n = ⊗_{i=1}^n P_f. The data is i.i.d.: X^n = (X_1, ..., X_n) ∼ ⊗_{i=1}^n P_f. Let P_f be dominated by the Lebesgue measure µ, with density function

p_f(t) = e^{f(t)} / ∫_0^1 e^{f(u)} µ(du).

Consider the Fourier expansion f = Σ_j θ_j φ_j; then the density p_f can be written in the form of an infinite-dimensional exponential family:

p_f(t) = exp( Σ_j θ_j φ_j(t) − ψ(θ) ),

where ψ(θ) = log ∫_0^1 e^{Σ_j θ_j φ_j(t)} µ(dt).
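As a numerical illustration of the exponential-family form above (our sketch; the midpoint-grid Riemann-sum normalization and the grid size are implementation choices of ours), one can map a finite vector of Fourier coefficients θ to the density p_f = e^f / ∫_0^1 e^f:

```python
import math

def trig_basis(j, t):
    """Trigonometric basis on [0, 1]: phi_1 = 1, then paired cosines/sines."""
    if j == 1:
        return 1.0
    m = j // 2
    if j % 2 == 0:
        return math.sqrt(2.0) * math.cos(2.0 * math.pi * m * t)
    return math.sqrt(2.0) * math.sin(2.0 * math.pi * m * t)

def density_from_theta(theta, grid_size=2000):
    """Values of p_f = e^f / int_0^1 e^f on a midpoint grid, f = sum_j theta_j phi_j."""
    ts = [(i + 0.5) / grid_size for i in range(grid_size)]
    f = [sum(th * trig_basis(j + 1, t) for j, th in enumerate(theta)) for t in ts]
    Z = sum(math.exp(v) for v in f) / grid_size  # Riemann sum for int_0^1 e^f
    return ts, [math.exp(v) / Z for v in f]
```

By construction the returned values are positive and average to one on the grid, i.e. the discretized density is properly normalized, mirroring the role of ψ(θ) above.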

Notice the first Fourier basis function is φ_1(t) = 1. It is easy to see that different θ_1's correspond to the same p_f. For identifiability, we set θ_1 = 0, so that we have ∫ f(t) µ(dt) = Σ_{j ≥ 2} θ_j ∫ φ_j(t) µ(dt) = 0. We use the modified block prior Π̃(A) = Π(D ∩ A)/Π(D) with the constraint set

(12)  D = { θ : Σ_j |θ_j| < B },

for some constant B > 0. The next lemma shows that the modified block prior Π̃ inherits properties (6) and (7) from Π.

Lemma 3.1. For α_* > 1/2, define the constant

(13)  γ = ( Σ_j j^{−2α_*} )^{1/2} < ∞.

For any f_0 ∈ E_α(Q), with α ≥ α_* and 3γ√Q ≤ B, there is a constant C > 0, such that

Π̃( Σ_j (θ_{0j} − θ_j)² ≤ ε_n² ) ≥ exp(−C n ε_n²),  and  Π̃(F_n^c) ≤ exp(−(C + 4) n ε_n²).

For density estimation, it is natural to use the Hellinger distance as the testing distance d. According to the testing theory in Le Cam (1973) and Ghosal, Ghosh and van der Vaart (2000), it satisfies the testing properties (10) and (11). The next lemma establishes equivalence among various distances and divergences under the set D defined in (12).

Lemma 3.2. On the set D, there exists a constant b > 1, such that

D(P_{f_1}, P_{f_2}) ≤ b ‖θ_1 − θ_2‖²,  V(P_{f_1}, P_{f_2}) ≤ b ‖θ_1 − θ_2‖²,  b^{−1} H(P_{f_1}, P_{f_2}) ≤ ‖θ_1 − θ_2‖ ≤ b H(P_{f_1}, P_{f_2}).

We will prove the above two lemmas in the supplementary material (Gao and Zhou, 2015b). The main result of posterior contraction for density estimation is stated as follows.

Theorem 3.1. Let α_* > 1/2 be fixed, and let γ be the associated constant defined in (13). For any (α, Q) satisfying α ≥ α_* and 3γ√Q ≤ B, there is a constant M > 0, such that

sup_{f_0 ∈ E_α(Q)} P_{f_0}^n Π̃( H(P_f, P_{f_0}) > Mε_n | X_1, ..., X_n ) → 0.

Remark 3.1. The prior Π̃ depends on the value of B, which determines the range of adaptation. For any α_* > 1/2 and Q_* > 0, we can choose B satisfying B ≥ 3γ√Q_* (γ depends on α_*), such that the prior Π̃ is adaptive for all f_0 ∈ E_α(Q) with α ≥ α_* and Q ≤ Q_*.

3.2. White Noise. We let P_f^n be the distribution of the following process:

dY^n(t) = f(t) dt + n^{−1/2} dW(t),  t ∈ [0, 1],

where W(t) is the standard Wiener process and the signal f has Fourier expansion f = Σ_j θ_j φ_j. This model is the simplest and most studied nonparametric model. It is equivalent to the Gaussian sequence model, and we have

D(P_{f_1}^n, P_{f_2}^n) = (n/2) ‖f_1 − f_2‖²,  V(P_{f_1}^n, P_{f_2}^n) = n ‖f_1 − f_2‖².

In the white noise model, it is natural to use the l²-norm as the testing distance d. The following lemma is from Lemma 5 in Ghosal and van der Vaart (2007).

Lemma 3.3. Let φ_n = 1{ ∫ (f_1(t) − f_0(t)) dY(t) > (‖f_1‖² − ‖f_0‖²)/2 }. Then we have

P_{f_0}^n φ_n ≤ 1 − Φ( √n ‖f_1 − f_0‖ / 2 ),  sup_{ {f: ‖f − f_1‖ ≤ ‖f_1 − f_0‖/4} } P_f^n (1 − φ_n) ≤ 1 − Φ( √n ‖f_1 − f_0‖ / 4 ),

where Φ is the standard Gaussian cumulative distribution function. By the property of the Gaussian tail, we have 1 − Φ( √n L ‖f_1 − f_0‖ ) ≤ e^{−L² n ‖f_1 − f_0‖²/2}, provided √n L ‖f_1 − f_0‖ > 1, which is true because we only need to test those f_1 with ‖f_1 − f_0‖ > Mε_n, and we have nε_n² → ∞. Therefore, in the white noise model, the distance satisfying (10) and (11) is the l²-norm. Considering that the divergences D(P_{f_1}^n, P_{f_2}^n) and V(P_{f_1}^n, P_{f_2}^n) are also characterized by the l²-norm, we reach the following conclusion.

Theorem 3.2. In the white noise model, for any α > 0 and Q > 0, there exists a constant M > 0, such that

sup_{f_0 ∈ E_α(Q)} P_{f_0}^n Π( ‖f − f_0‖ > Mε_n | Y^n(t) ) → 0.

Hence, this is a case where we have adaptation for all Sobolev balls.

3.3. Gaussian Sequence. The Gaussian sequence model is equivalent to the white noise model. We present this case just for illustration of the theory. Given f = Σ_j θ_j φ_j, the model P_θ^n is in a product form:

(14)  P_θ^n = ⊗_{i=1}^∞ N(θ_i, n^{−1}).

Thus, the observations are independent Gaussian variables of the form X_i = θ_i + n^{−1/2} Z_i, i ∈ N, where {Z_i} are i.i.d. standard Gaussian variables. The divergences in this case are easy to calculate. That is, D(P_{θ_0}^n, P_θ^n) = (n/2) ‖θ_0 − θ‖² and V(P_{θ_0}^n, P_θ^n) = n ‖θ_0 − θ‖², and they are exactly characterized by the l²-norm. Define

φ_n(X) = 1{ ‖X − θ_1‖² < ‖X − θ_0‖² } = 1{ Xᵀ(θ_1 − θ_0) > (‖θ_1‖² − ‖θ_0‖²)/2 }.

We observe this is exactly the same test as in the white noise model, and thus Lemma 3.3 applies here. Therefore,

P_{θ_0}^n φ_n ≤ e^{−n‖θ_0 − θ_1‖²/8},  sup_{ {θ: ‖θ − θ_1‖ ≤ ‖θ_1 − θ_0‖/4} } P_θ^n (1 − φ_n) ≤ e^{−n‖θ_0 − θ_1‖²/32}.

The d satisfying the testing properties (10) and (11) can be chosen as the l²-norm. We thus reach the following conclusion.

Theorem 3.3. In the Gaussian sequence model, for any α > 0 and Q > 0, there exists a constant M > 0, such that

sup_{θ_0 ∈ E_α(Q)} P_{θ_0}^n Π( ‖θ − θ_0‖ > Mε_n | X_1, X_2, ... ) → 0.

We have adaptation for all Sobolev balls.
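The exponential testing bounds above come from the Gaussian tail inequality 1 − Φ(x) ≤ e^{−x²/2}: the exact type-I error of the likelihood-ratio test is 1 − Φ(√n‖θ_1 − θ_0‖/2), which the inequality bounds by e^{−n‖θ_1 − θ_0‖²/8}. A quick numerical check of this (our sketch; Φ is computed via the complementary error function):

```python
import math

def Phi(x):
    """Standard Gaussian CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def type1_error(n, dist):
    """Exact type-I error 1 - Phi(sqrt(n) * dist / 2) of the likelihood-ratio test."""
    return 1.0 - Phi(math.sqrt(n) * dist / 2.0)

def gaussian_tail_bound(n, dist):
    """The bound exp(-n dist^2 / 8), from 1 - Phi(x) <= exp(-x^2 / 2)."""
    return math.exp(-n * dist ** 2 / 8.0)
```

The bound holds for every n and every separation `dist`, and becomes tight on the exponential scale as √n·dist grows, which is exactly the regime ‖θ_1 − θ_0‖ > Mε_n with nε_n² → ∞ used in the proofs.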

3.4. Gaussian Regression. We consider uniform random design instead of fixed design, because the random design allows a simple connection between various divergences and the l² distance. The model P_f^n gives i.i.d. observations (X_1, Y_1), ..., (X_n, Y_n) with distribution X ∼ U[0, 1], Y | X ∼ N(f(X), 1). The theory is easily extended to a general random design with X ∼ q for some density q on [0, 1] bounded from above and below. We choose the uniform design for simplicity of presentation. The function f has Fourier expansion f = Σ_j θ_j φ_j, so that we can apply the modified block prior on f. Let P_f be the distribution of a single observation, and we need to calculate D(P_{f_0}, P_f) and V(P_{f_0}, P_f). Let φ be the standard normal density; it can be shown that D(P_{f_0}, P_f) = (1/2)‖f − f_0‖². As in the density estimation case, we use the modified block prior Π̃(A) = Π(D ∩ A)/Π(D) with the constraint set D = { θ : Σ_j |θ_j| < B }. According to Lemma 3.1, the prior Π̃ inherits properties (6) and (7) from Π. Therefore, for f and f_0 ∈ D, V(P_{f_0}, P_f) ≤ (1 + 2B²)‖f − f_0‖². Next, we deal with the testing procedure. We use the likelihood ratio test as in the white noise and Gaussian sequence model cases, and the error is bounded in the following lemma.

Lemma 3.4. There exists a constant L > 0, such that for any f_0, f_1 ∈ D satisfying √n ‖f_1 − f_0‖ > 1, there exists a testing function φ_n with error probability bounded as

P_{f_0}^n φ_n ≤ e^{−Ln‖f_1 − f_0‖²},  sup_{ {f ∈ supp(Π̃): ‖f − f_1‖ ≤ ‖f_1 − f_0‖/4} } P_f^n (1 − φ_n) ≤ e^{−Ln‖f_1 − f_0‖²}.

The lemma will be proved in later sections. It says the l²-norm satisfies the testing properties (10) and (11). Using Theorem 2.2, we reach the following conclusion.

Theorem 3.4. Let α_* > 1/2 and let γ be the constant defined in (13). In the Gaussian regression model with uniform random design, for any (α, Q) satisfying α ≥ α_* and 3γ√Q ≤ B, there exists a constant M > 0, such that

sup_{f_0 ∈ E_α(Q)} P_{f_0}^n Π̃( ‖f − f_0‖ > Mε_n | X_1, ..., X_n, Y_1, ..., Y_n ) → 0.
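To make the regression setting concrete, here is a small simulation sketch (our illustration, not part of the paper's procedure): data from the uniform random design, together with the moment identity E[Y φ_j(X)] = ∫_0^1 f(t) φ_j(t) dt = θ_j that the uniform design makes available.

```python
import math
import random

def sample_regression(n, f, seed=0):
    """Uniform random design: X_i ~ U[0,1] i.i.d.; given X_i, Y_i ~ N(f(X_i), 1)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [f(x) + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

def fourier_coef_estimate(xs, ys, phi):
    """Moment estimate of theta_j: under the uniform design,
    E[Y phi_j(X)] = int_0^1 f(t) phi_j(t) dt = theta_j."""
    return sum(y * phi(x) for x, y in zip(xs, ys)) / len(xs)
```

For example, taking f itself equal to a normalized basis function makes the corresponding true coefficient equal to 1, and the moment estimate concentrates around that value at the usual n^{−1/2} rate.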

Remark 3.2. The prior Π̃ depends on the value of B, which determines the range of adaptation. For any α_* > 1/2 and Q_* > 0, we can choose B satisfying B ≥ 3γ√Q_* (γ depends on α_*), such that the prior Π̃ is adaptive for all f_0 ∈ E_α(Q) with α ≥ α_* and Q ≤ Q_*.

3.5. Spectral Density Estimation. Suppose the probability P_f^n generates stationary Gaussian time series data X_1, ..., X_n with mean 0 and spectral density g = e^f, with f(t) = f(−t). We assume the spectral density to be a function on [−π, π]. The auto-covariance is η_h = ∫_{−π}^{π} e^{iht} g(t) dt. Thus, the observation (X_1, ..., X_n) follows P_f^n = N(0, Γ_n(g)), where the covariance matrix is the Toeplitz matrix

Γ_n(g) = [ η_0 η_1 ⋯ η_{n−1} ; η_1 η_0 ⋯ η_{n−2} ; ⋮ ⋱ ⋮ ; η_{n−1} η_{n−2} ⋯ η_0 ].

We model the exponent f of the spectral density by f(t) = Σ_{j ≥ 0} θ_j cos(jt). According to Parseval's identity, the norms ‖g‖ and ‖η‖, and likewise ‖f‖ and ‖θ‖, are equivalent up to constant factors. We use the modified block prior Π̃(A) = Π(D ∩ A)/Π(D) with the constraint set

(15)  D = { θ : Σ_{j ≥ 0} j |θ_j| < B }.

The constraint set (15) is stronger than (12). Thus, in order that the modified prior Π̃ inherits properties (6) and (7) from the block prior Π, we need α_* > 3/2. The following lemma will be proved in the supplementary material (Gao and Zhou, 2015b).

Lemma 3.5. For an arbitrary α_* > 3/2, define the constant

(16)  γ = ( Σ_j j^{2−2α_*} )^{1/2}.

For any f_0 ∈ E_α(Q), with α ≥ α_* and 3γ√Q ≤ B, there is a constant C > 0, such that

Π̃( Σ_j (θ_{0j} − θ_j)² ≤ ε_n² ) ≥ exp(−C n ε_n²),  and  Π̃(F_n^c) ≤ exp(−(C + 4) n ε_n²).
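As an illustration of the covariance structure Γ_n(g) (a numerical sketch of ours; the midpoint-rule integration, grid size, and example spectral density are arbitrary choices), one can build the Toeplitz matrix from a symmetric spectral density g = e^f:

```python
import math

def autocov(g, h, grid=4000):
    """eta_h = int_{-pi}^{pi} e^{i h t} g(t) dt; for symmetric g this equals
    2 int_0^{pi} cos(h t) g(t) dt (a real number), computed by a midpoint rule."""
    step = math.pi / grid
    return 2.0 * step * sum(
        math.cos(h * (i + 0.5) * step) * g((i + 0.5) * step) for i in range(grid))

def toeplitz_cov(g, n):
    """The n x n covariance matrix Gamma_n(g) with (i, j) entry eta_{|i-j|}."""
    eta = [autocov(g, h) for h in range(n)]
    return [[eta[abs(i - j)] for j in range(n)] for i in range(n)]
```

Sanity checks: a flat spectral density g ≡ 1/(2π) gives white noise (η_0 = 1, η_h = 0 for h ≠ 0), and a positive g yields a symmetric Toeplitz matrix with constant diagonal and |η_h| < η_0.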

The following lemma, comparing the l²-norm with D(P_{f_1}^n, P_{f_2}^n) and V(P_{f_1}^n, P_{f_2}^n), will be proved in the supplementary material (Gao and Zhou, 2015b).

Lemma 3.6. For any f_1, f_2 ∈ D, we have

D(P_{f_1}^n, P_{f_2}^n) ≤ b n ‖f_1 − f_2‖²,  V(P_{f_1}^n, P_{f_2}^n) ≤ b n ‖f_1 − f_2‖²,

where b > 1 is a constant depending only on B.

The testing distance satisfying the testing properties (10) and (11) is the l²-norm.

Lemma 3.7. There exist constants L > 0 and 0 < ξ < 1/2, such that for any f_0, f_1 ∈ D with ‖f_1 − f_0‖ ≥ ε_n, there exists a testing function φ_n such that

P_{f_0}^n φ_n ≤ exp(−Ln‖f_1 − f_0‖²),  sup_{ {f ∈ supp(Π̃): ‖f − f_1‖ ≤ ξ‖f_1 − f_0‖} } P_f^n (1 − φ_n) ≤ exp(−Ln‖f_1 − f_0‖²).

The lemma will be proved in later sections. We state the main result of posterior contraction for spectral density estimation as follows.

Theorem 3.5. In the spectral density estimation problem, let X_1, ..., X_n ∼ P_{f_0}^n. For any α and Q satisfying the conditions of Lemma 3.5, there is a constant M > 0, such that

sup_{f_0 ∈ E_α(Q)} P_{f_0}^n Π̃( ‖f − f_0‖ > Mε_n | X_1, ..., X_n ) → 0.

Remark 3.3. The prior Π̃ depends on the value of B, which determines the range of adaptation. For any α_* > 3/2 and Q_* > 0, we can choose B satisfying B ≥ 3γ√Q_* (γ depends on α_*), such that the prior Π̃ is adaptive for all f_0 ∈ E_α(Q) with α ≥ α_* and Q ≤ Q_*. Notice the definition of γ in (16) is different from that in (13).

4. Discussion.

4.1. Exponential Tail of the Posterior. The conclusion of the main posterior contraction result in Theorem 2.2 does not specify a decaying rate of the posterior tail. In fact, by scrutinizing its proof, it has the following polynomial tail:

P_{f_0}^n Π̃( ‖θ − θ_0‖ > Mε_n | X^n ) ≤ C′/(nε_n²).

However, to obtain a point estimator such as the posterior mean with the same rate of convergence ε_n, a faster posterior tail probability is needed (see, for example, Ghosal, Ghosh and van der Vaart (2000) and Shen and Wasserman (2001)). In this section, we show that this polynomial tail can be improved to an exponential tail in all the examples we consider in Section 3. The critical step is the following lemma, which improves Lemma 5.6 in the proof of the general result of Theorem 2.2.

Lemma 4.1. For all statistical models we consider in Section 3 and the corresponding modified block prior Π̃, let C be the constant with which Π̃ satisfies (6) and (7). Define

(17)  H_n = { ∫ (p_f^n / p_{f_0}^n)(X^n) dΠ̃(f) ≥ exp(−(C + b + 1) n ε_n²) }.

Then we have P_{f_0}^n(H_n^c) ≤ exp(−C′ n ε_n²) for f_0 ∈ E_α(Q) ∩ D and some C′ > 0.

From Lemma 4.1, we have the following improved result for posterior contraction.

Theorem 4.1. The conclusions of Theorem 3.1, Theorem 3.2, Theorem 3.3, Theorem 3.4 and Theorem 3.5 can be strengthened as

P_{f_0}^n Π̃( ‖θ − θ_0‖ > Mε_n | X^n ) ≤ exp(−C′ n ε_n²),

under their corresponding settings.

As a consequence, the posterior mean serves as a rate-optimal point estimator.

Corollary 4.1. Under the settings of Theorem 3.1, Theorem 3.2, Theorem 3.3, Theorem 3.4 and Theorem 3.5, we have

P_{f_0}^n ‖E_Π̃(θ | X^n) − θ_0‖² ≤ M′ ε_n²,

for some constant M′ > 0.

The proofs of Lemma 4.1, Theorem 4.1 and Corollary 4.1 are presented in the supplementary material (Gao and Zhou, 2015b).

4.2. Extension to Besov Balls. Besov balls provide a more flexible collection of functions than Sobolev balls. They are related to wavelet bases. The block prior we propose in this paper naturally takes advantage of the multi-resolution structure of Besov balls. Given a sequence {θ_j}, define θ_k = {θ_{2^k + l}}_{l=0}^{2^k − 1} for k = 0, 1, 2, .... We can view the signals on each resolution level θ_k as a natural block with size n_k = 2^k. The Besov ball is defined as

B_{p,q}^α(Q) = { θ : Σ_k 2^{skq} ‖θ_k‖_p^q ≤ Q^q },

where s = α + 1/2 − 1/p and ‖·‖_p is the vector l_p-norm. We consider the non-sparse case where the parameters are restricted by

(18)  (α, p, q, Q) ∈ (0, ∞) × [2, ∞] × [1, ∞] × (0, ∞).

Under such a restriction, the block prior is suitable for estimating the signal in B_{p,q}^α(Q). We describe the prior Π as follows:

A_k ∼ g_k independently for each k,
θ_k | A_k ∼ N(0, A_k I_{n_k}) independently for each k,

where I_{n_k} is the 2^k × 2^k identity matrix. The mixing densities {g_k} are defined through (8) and (9) with the constant e replaced by 2. It is clear that the new mixing densities {g_k} satisfy (2), (3) and (4) with every e replaced by 2. Define the new sieve

F_n = { θ : Σ_{k > (2α+1)^{−1} log_2(nβ)} ‖θ_k − θ_{0k}‖² ≤ ε_n² }.

We state the property of the block prior Π targeting Besov balls below.

Theorem 4.2. For the block prior Π defined above, let θ_0 ∈ B_{p,q}^α(Q) with (α, p, q, Q) satisfying (18). Then there exists a constant C > 0 such that

(19)  Π( Σ_j (θ_j − θ_{0j})² ≤ ε_n² ) ≥ exp(−C n ε_n²),

and

(20)  Π(F_n^c) ≤ exp(−(C + 4) n ε_n²)

for sufficiently large n whenever β ≤ min{ c_3/(C + 4), (4Q)^{−2α/(2α+1)} }, with c_3 defined in (4) (where e is replaced by 2).

We apply the prior to the Gaussian sequence model. For other models, some slight extra work is needed.

Theorem 4.3. For the Gaussian sequence model (14) with any θ_0 ∈ B_{p,q}^α(Q), where (α, p, q, Q) satisfies (18), there exists M > 0, such that

sup_{θ_0 ∈ B_{p,q}^α(Q)} P_{θ_0}^n Π( ‖θ − θ_0‖ > Mε_n | X_1, X_2, ... ) → 0.

Thus, the prior is adaptive for all Besov balls satisfying (18). We prove the results of this extension in the supplementary material (Gao and Zhou, 2015b).

4.3. Difficulty of Achieving the Exact Rate. The literature on Bayes nonparametric adaptive estimation usually reports an extra logarithmic term along with the minimax rate ε_n. In this section, we provide examples of two priors and illustrate the reasons for them to have the extra logarithmic term. In the first example, the difficulty lies in the prior itself. In the second example, the difficulty lies in the method of proof. The analysis also sheds light on why the block prior is able to achieve the exact minimax rate.

4.3.1. Difficulty due to the Prior. One of the most elegant priors on f is the rescaled Gaussian process studied by van der Vaart and van Zanten (2007) and van der Vaart and van Zanten (2009). Consider the centered Gaussian process (W_t : t ∈ [0, 1]) with the double exponential kernel E(W_t W_s) = exp(−(s − t)²). The rescaled Gaussian process is defined as W_{t/c} for some c either fixed or sampled from a hyper-prior. The reason for the rescaling is that the original W_t has an infinitely differentiable sample path almost surely. The rescaling step makes it rougher so that it is appropriate for estimating a signal in Sobolev or Hölder balls. In van der Vaart and van Zanten (2007), the number c is fixed as (n/log² n)^{−1/(2α+1)}, and in

van der Vaart and van Zanten (2009), c is sampled from a Gamma distribution. The posterior convergence rates are ε_n(log n)^{4α/(2α+1)} and ε_n(log n)^{(4α+1)/(2α+1)}, respectively. Recently, this prior was extended by Castillo, Kerkyacharian and Picard (2014) for estimation of a function living on a general manifold M. They constructed a rescaled Gaussian process on M and obtained an improved posterior convergence rate ε_n(log n)^{α/(2α+1)}. Moreover, they also showed that such a rate cannot be further improved by a rescaled Gaussian process with a reasonable distribution on the rescaling parameter c. To be specific, they proved that under mild conditions, there exists a function f_0 ∈ B_{∞,∞}^α(Q) and a constant C > 0, such that

P_{f_0}^n Π( ‖f − f_0‖ ≤ Cε_n(log n)^{α/(2α+1)} | X^n ) → 0,

for a rescaled Gaussian process prior Π. Hence, the posterior convergence rate cannot be faster than ε_n(log n)^{α/(2α+1)}. To summarize, in this example, the difficulty lies in the prior. It is shown that a certain class of prior distributions is unable to achieve the exact minimax rate.

4.3.2. Difficulty due to the Proof. The sieve prior is another popular prior used in Bayes nonparametric estimation. It first samples an integer J, which is the model dimension. Conditioning on J, θ_j is sampled from some density p independently for all j ≤ J and is set to zero for j > J. Rivoirard and Rousseau (2012) considered both fixed J = [n^{1/(2α+1)}] and J sampled from a distribution with exponential tail. In the first case, the posterior convergence rate is ε_n √(log n), and a slightly slower rate is obtained in the second case. We argue that the difficulty in obtaining the exact minimax rate is not due to the sieve prior itself, but due to the technique of the proof. Using the prior mass and testing (see Section 2.1) proof technique developed by Barron, Schervish and Wasserman (1999) and Ghosal, Ghosh and van der Vaart (2000), it is impossible to get the exact minimax rate. Let us consider the Gaussian sequence model.
In this case, the prior mass condition for the truth θ₀ ∈ E_α(Q) and the rate ε_n is

(21)  Π(Σ_j |θ_j − θ_{0j}|² ≤ ε_n²) ≥ exp(−Cnε_n²),

for some constant C > 0. Even for the simplest sieve prior, where J is chosen to be fixed, (21) cannot hold. This is established in the following lemma.

Lemma 4.2. Consider a sieve prior with fixed J and density p. Assume ‖p‖_∞ ≤ G for some constant G > 0. Then, for any δ_n → 0 satisfying log δ_n^{−1} ≲ log n and any θ₀, we have

  Π(Σ_j |θ_j − θ_{0j}|² ≤ δ_n²) ≤ exp(−CJ log n),

for some constant C > 0.

In the ideal case, where J = [n^{1/(2α+1)}], the best possible δ_n² for (21) to hold is δ_n² ≍ n^{−2α/(2α+1)} log n. The extra log n term cannot be avoided in establishing the desired prior mass condition. On the other hand, we show that the sieve prior in Lemma 4.2 does achieve the exact minimax rate when p is taken to be N(0,1).

Lemma 4.3. For the Gaussian sequence model, consider the prior distribution Π = ⊗_{j=1}^J N(0,1) with J = [n^{1/(2α+1)}]. Then, for any θ₀ ∈ E_α(Q),

  P^n_{θ₀} Π(‖θ − θ₀‖ > Mε_n | X^n) ≤ exp(−Cnε_n²),

for some constants C, M > 0.

The proof of this result takes advantage of conjugacy and calculates the posterior probability directly from the posterior distribution formula. The proofs of both Lemma 4.2 and Lemma 4.3 are stated in the supplementary material (Gao and Zhou, 2015b). Moreover, we also establish an adaptive version of Lemma 4.3. Namely, consider the prior distribution k ~ π and, conditioning on k, √n θ_j ~ g i.i.d. for 1 ≤ j ≤ k and θ_j = 0 for j > k.

Theorem 4.4. Assume max_j π(j+1)/π(j) ≤ 1 − c, −log π(n^{1/(2α+1)}) ≤ Cn^{1/(2α+1)}, |log g(x) − log g(y)| ≤ C(1 + |x − y|) and |log g(0)| ≤ C for some constants c ∈ (0,1) and C > 0. Then, for the Gaussian sequence model with any θ₀ ∈ E_α(Q), we have

(22)  P^n_{θ₀} Π(k > Mn^{1/(2α+1)} | X^n) ≤ exp(−C′nε_n²),
      P^n_{θ₀} Π(‖θ − θ₀‖ > Mε_n | X^n) ≤ exp(−C′nε_n²),

for some constants M, C′ > 0.
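The conjugate computation behind Lemma 4.3 can be sketched numerically. This is only an illustration of the conjugacy argument, not the paper's proof; the truth θ₀, the radius multiple M = 5, and the Monte Carlo sizes are our own choices.

```python
import numpy as np

# Conjugacy in the Gaussian sequence model X_j ~ N(theta_j, 1/n):
# with prior theta_j ~ N(0, 1) for j <= J (theta_j = 0 for j > J),
# the posterior is theta_j | X_j ~ N(n X_j / (n + 1), 1 / (n + 1)).
rng = np.random.default_rng(1)

def posterior_params(x, n):
    """Posterior mean and variance for theta_j given X_j = x."""
    var = 1.0 / (n + 1.0)     # prior precision 1 plus data precision n
    return n * x * var, var

n, alpha = 2_000, 1.0
J = int(n ** (1.0 / (2 * alpha + 1)))   # J = [n^{1/(2 alpha + 1)}]
j = np.arange(1, 10 * J + 1)
theta0 = 0.1 * j ** (-(alpha + 0.51))   # illustrative truth in E_alpha(Q)
x = theta0 + rng.standard_normal(theta0.size) / np.sqrt(n)

mean, var = posterior_params(x[:J], n)
draws = mean + np.sqrt(var) * rng.standard_normal((1_000, J))  # posterior draws
loss = ((draws - theta0[:J]) ** 2).sum(axis=1) + (theta0[J:] ** 2).sum()
eps2 = n ** (-2 * alpha / (2 * alpha + 1))
print(float(np.mean(loss > 25 * eps2)))  # posterior mass outside the 5*eps_n ball
```

Because the posterior is available in closed form, the probability of the complement of the Mε_n ball can be bounded directly, which is exactly the route the conjugacy proof takes.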

The assumption on the prior distribution in Theorem 4.4 is mild. For example, we may choose π(j) ∝ e^{−Dj} for some constant D > 0 and choose g to be the double exponential density. The resulting posterior distribution contracts to the true signal at the minimax rate adaptively for all α > 0. The success of this prior crucially depends on the result (22), which allows us to establish an optimal testing procedure on the set {J ≤ Mn^{1/(2α+1)}}. However, the proof of (22) takes advantage of the independence structure of the Gaussian sequence model, and we are not able to establish (22) for other models. For the same reason, the block spike and slab prior proposed in Hoffmann, Rousseau and Schmidt-Hieber (2015) works only for the Gaussian sequence model as well. Their argument for establishing (22) also uses the independence structure of the Gaussian sequence model and thus does not work in other settings. To summarize, the sieve prior is an example showing that the current proof technique may yield a sub-optimal posterior convergence rate, while for the Gaussian sequence model special techniques can be used to overcome the difficulty.

4.3. The Block Prior Overcomes Both Difficulties. The above discussion leads to two fundamental questions.

1. Is there a prior which can achieve the exact minimax posterior convergence rate without knowing α?
2. Can the prior mass and testing proof technique handle a minimax optimal adaptive prior?

While the importance of the first question is evident, the second question may seem less relevant at first thought. However, the prior mass and testing method has the great advantage that it is not specific to the choice of the prior or the form of the model. Though we use direct calculation to show the optimal posterior convergence in Lemma 4.3 and Theorem 4.4, the same proof cannot be extended to settings beyond the Gaussian sequence model. The independence structure of the Gaussian sequence model plays an important role in the proof.
In contrast, the prior mass and testing method is general enough to be applied in various settings. The block prior provides affirmative answers to both questions. Not only can it achieve the exact minimax rate, its proof also relies on the prior mass and testing method, which makes it easy to apply in many complex settings beyond the Gaussian sequence model. We provide various examples in Section 3, including regression, density estimation and spectral density estimation, to illustrate the benefit of using the prior mass and testing method. Without the prior mass and testing method, an adaptive prior cannot easily be extended to cases beyond the Gaussian sequence model.
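The adaptive sieve prior discussed above (k ~ π, then √n θ_j ~ g i.i.d. for j ≤ k) is straightforward to sample. A minimal sketch, with D = 1 and the truncation level jmax = 200 as purely illustrative choices:

```python
import numpy as np

# Draw from the adaptive sieve prior: k ~ pi with pi(j) proportional to
# e^{-D j}; then sqrt(n) * theta_j ~ g iid for j <= k (g = double
# exponential density) and theta_j = 0 for j > k.
rng = np.random.default_rng(2)

def sample_sieve_prior(n, D=1.0, jmax=200, rng=rng):
    j = np.arange(1, jmax + 1)
    pi = np.exp(-D * j)
    pi /= pi.sum()                                 # normalized model-size prior
    k = int(rng.choice(j, p=pi))                   # sampled model dimension
    theta = np.zeros(jmax)
    theta[:k] = rng.laplace(size=k) / np.sqrt(n)   # sqrt(n) theta_j ~ double exp.
    return k, theta

k, theta = sample_sieve_prior(n=1_000)
print(k, int(np.count_nonzero(theta)))
```

The geometric decay of π(j) is what gives the exponential tail on the model dimension required by Theorem 4.4.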

In fact, the inequality (22) can be written as

(23)  P^n_{θ₀} Π(F_n^c | X^n) ≤ exp(−C′nε_n²),

where F_n can be of a more general form than that in (22), as long as an optimal testing procedure can be established on F_n. Then both the sieve prior and the block spike and slab prior in Hoffmann, Rousseau and Schmidt-Hieber (2015) satisfy (23). In contrast, the block prior proposed in this paper satisfies

(24)  Π(F_n^c) ≤ exp(−C′nε_n²),

which is one of the three conditions required by the prior mass and testing technique. It can be shown that (24) is generally a stronger condition than (23), in the sense that (24) combined with the prior mass lower bound implies (23). In this sense, the block prior in this paper is a stronger prior than the sieve prior and the block spike and slab prior in Hoffmann, Rousseau and Schmidt-Hieber (2015). To put it another way, (23) is not only a condition on the prior distribution; it is also a condition on the likelihood, and hence imposes certain model structure. On the other hand, (24) is a condition only on the prior. This is why it works in various models besides the Gaussian sequence model.

5. Proofs of Main Results.

5.1. Proof of Theorem 2.1. We first outline the proof and list some preparatory lemmas, and then state the proof in detail. We introduce the notation Π_A, defined as

(25)  Π_A = ⊗_{k=1}^∞ N(0, A_k I_{n_k}).

Given a scale sequence A = {A_k}, the random function f = Σ_j θ_j φ_j is distributed according to Π_A if, for each block B_k, θ_k = {θ_j}_{j∈B_k} ~ N(0, A_k I_{n_k}). Then Π_A is a Gaussian process for a given A, and the block prior is a mixture of Gaussian processes with A distributed according to the mixing densities {g_k} ∈ G. Since Π itself is not a Gaussian process, the results on l₂ small ball probability asymptotics for Gaussian processes cannot be applied directly. Our strategy is to pick a collection V_α and, by conditioning, write

(26)  Π(Σ_j |θ_j − θ_{0j}|² ≤ ε_n²) ≥ P(V_α) E[Π_A(Σ_j |θ_j − θ_{0j}|² ≤ ε_n²) | A ∈ V_α].
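The hierarchical structure of Π_A can be sampled directly. In the sketch below the block boundaries l_k = ⌈e^k⌉ follow the paper's exponential blocks, but the mixing densities g_k are replaced by illustrative exponential distributions, since the class G is not restated here; this is an assumption for demonstration only.

```python
import numpy as np

# One draw of the coefficient sequence theta from a block prior:
# within block B_k, theta_k | A_k ~ N(0, A_k I_{n_k}), with A_k ~ g_k.
# Block boundaries l_k = ceil(e^k); g_k = Exp(mean e^{-k}) is only a
# stand-in for the class G of mixing densities.
rng = np.random.default_rng(3)

def sample_block_prior(K, rng=rng):
    bounds = [int(np.ceil(np.exp(k))) for k in range(1, K + 2)]  # l_1,...,l_{K+1}
    blocks = []
    for k in range(K):
        n_k = bounds[k + 1] - bounds[k]       # block size |B_k|
        A_k = rng.exponential(np.exp(-k))     # A_k ~ g_k (illustrative stand-in)
        blocks.append(np.sqrt(A_k) * rng.standard_normal(n_k))
    return np.concatenate(blocks)

theta = sample_block_prior(K=6)
print(theta.size)   # number of coefficients from l_1 up to l_{K+1} - 1
```

Conditionally on A the draw is Gaussian, which is exactly the structure exploited by the conditioning step (26): the mixture is handled by restricting A to a well-behaved set V_α.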

Then, as long as for each A ∈ V_α there are constants C₁, C₂ > 0 independent of A such that

(27)  Π_A(Σ_j |θ_j − θ_{0j}|² ≤ ε_n²) ≥ exp(−C₁nε_n²),

and

(28)  P(V_α) ≥ exp(−C₂nε_n²),

the property (6) is a direct consequence of (26) with C = C₁ + C₂. Thus, picking such a V_α is important. Generally speaking, for each A ∈ V_α we need Π_A to behave just like a Gaussian prior designed for estimating f ∈ E_α(Q) when α is known. The distribution Π_A may be hard to deal with directly. Our strategy is to use the following simple comparison result, so that we can study a simpler distribution instead. The lemma is proved in the supplementary material (Gao and Zhou, 2015b).

Lemma 5.1. For a standard i.i.d. Gaussian sequence {Z_j} and sequences {a_j}, {b_j} and {c_j}, suppose there is a constant R > 0 such that R^{−1}a_j² ≤ b_j² ≤ Ra_j² for all j. Then we have

  P(Σ_j (b_jZ_j − c_j)² ≤ R^{−1}ε) ≤ P(Σ_j (a_jZ_j − c_j)² ≤ ε) ≤ P(Σ_j (b_jZ_j − c_j)² ≤ Rε).

Define J_α to be the smallest integer such that J_α ≥ (8Q²)^{1/(2α)} n^{1/(2α+1)}. Let K be the smallest integer such that e^K > J_α, and define J = [e^K]. Inspired by the comparison lemma, we define

(29)  V_α = V_{α,R} = {A : R^{−1} ≤ min_{1≤k≤K} A_k/A_{α,k} ≤ max_{1≤k≤K} A_k/A_{α,k} ≤ R},

with

  A_{α,k} = (l_k^{−2α} − l_{k+1}^{−2α}) / (2α(l_{k+1} − l_k)), for k = 1, 2, ..., K.

Define the truncated Gaussian process

(30)  Π_K^{A_α} = ⊗_{k=1}^K N(0, A_{α,k} I_{n_k}).

A random function f = Σ_j θ_j φ_j is distributed according to Π_K^{A_α} if θ_k ~ N(0, A_{α,k} I_{n_k}) for each k = 1, ..., K and θ_k = 0 for k > K. The comparison lemma implies that we can control Π_A for each A ∈ V_α by the truncated Gaussian process Π_K^{A_α}. Additionally, the small ball probability of Π_K^{A_α} can be established. The argument is separated into the following lemmas, which are proved in later sections.

Lemma 5.2. For any α > 0 and f ∈ E_α(Q), there exists C₃ > 0 such that

  Π_K^{A_α}(Σ_j |θ_j − θ_{0j}|² ≤ ε_n²) ≥ exp(−C₃nε_n²).

Lemma 5.3. For each k, let A_k ~ g_k, with {g_k} ∈ G. Then we have

  P(V_α) ≥ exp(−C₂nε_n²).

Lemma 5.4. For J defined above and f ∈ E_α(Q), we have

  Π(Σ_{j>J} |θ_j − θ_{0j}|² ≤ ε_n²/2) ≥ 1/2,

for sufficiently large n.

Proof of (6) in Theorem 2.1. We first introduce the truncated version of Π_A,

  Π_K^A = ⊗_{k=1}^K N(0, A_k I_{n_k}).

By Lemma 5.4, we have

  Π(Σ_j |θ_j − θ_{0j}|² ≤ ε_n²)
   ≥ Π(Σ_{j≤J} |θ_j − θ_{0j}|² ≤ ε_n²/2, Σ_{j>J} |θ_j − θ_{0j}|² ≤ ε_n²/2)
   = Π(Σ_{j≤J} |θ_j − θ_{0j}|² ≤ ε_n²/2) Π(Σ_{j>J} |θ_j − θ_{0j}|² ≤ ε_n²/2)
   ≥ (1/2) Π(Σ_{j≤J} |θ_j − θ_{0j}|² ≤ ε_n²/2),

where we have used independence between different blocks in the above equality. In the spirit of (26), we have

(31)  Π(Σ_{j≤J} |θ_j − θ_{0j}|² ≤ ε_n²/2) ≥ P(V_α) E[Π_K^A(Σ_{j≤J} |θ_j − θ_{0j}|² ≤ ε_n²/2) | A ∈ V_α].

By Lemma 5.1, for each A ∈ V_α,

  Π_K^A(Σ_{j≤J} |θ_j − θ_{0j}|² ≤ ε_n²/2) ≥ Π_K^{A_α}(Σ_{j≤J} |θ_j − θ_{0j}|² ≤ ε_n²/(2R)).

By Lemma 5.2, we have

  Π_K^{A_α}(Σ_{j≤J} |θ_j − θ_{0j}|² ≤ ε_n²/(2R)) ≥ exp(−C′nε_n²), for each A ∈ V_α.

Combining what we have derived with Lemma 5.3, (6) is proved.

Proof of (7) in Theorem 2.1. We fix the constant C in (6), and we are going to prove (7) with the same C. Recall that the sieve F_n is defined by (5). Define the set

  A_n = {A : A_k ≤ e^{−2k} for all k > (2α+1)^{−1} log(nβ₁^{−1})}.

Then Condition (4) implies

  P(A_n^c) ≤ Σ_{k > (2α+1)^{−1} log(nβ₁^{−1})} P(A_k > e^{−2k})
   ≤ Σ_{k > (2α+1)^{−1} log(nβ₁^{−1})} exp(−c₃e^k)
   ≤ exp(−(1/2)c₃ n^{1/(2α+1)} β₁^{−1/(2α+1)})
   ≤ exp(−(C+4)nε_n²).

The last inequality holds because β₁ ≤ (c₃/(2(C+4)))^{2α+1}. We bound Π_A(F_n^c) for each A ∈ A_n:

  Π_A(F_n^c) = Π_A(Σ_{j > (nβ₁^{−1})^{1/(2α+1)}} |θ_j − θ_{0j}|² > ε_n²)
   ≤ Π_A(2 Σ_{j > (nβ₁^{−1})^{1/(2α+1)}} θ_j² + 2 Σ_{j > (nβ₁^{−1})^{1/(2α+1)}} θ_{0j}² > ε_n²)
(32) ≤ Π_A(Σ_{k > (2α+1)^{−1} log(nβ₁^{−1})} ‖θ_k‖² > ε_n²/4)
   ≤ Σ_{k > (2α+1)^{−1} log(nβ₁^{−1})} Π_A(‖θ_k‖² ≥ a_k ε_n²),

where Σ_k a_k ≤ 1/4 and we choose a_k = a2^{−k}. The inequality (32) holds because θ₀ ∈ E_α(Q) and β₁ ≤ (4Q²)^{−(2α+1)/(2α)}. Define χ²_d to be a chi-squared random variable with d degrees of freedom. Then

  Σ_{k > (2α+1)^{−1} log(nβ₁^{−1})} Π_A(‖θ_k‖² ≥ a_k ε_n²)
   = Σ_{k > (2α+1)^{−1} log(nβ₁^{−1})} P(a_k^{−1} A_k χ²_{n_k} ≥ ε_n²)
   ≤ Σ_{k > (2α+1)^{−1} log(nβ₁^{−1})} exp(−C₂e^k (1 − ε_n^{−2} C₂^{−1} e^{−k} a_k^{−1} A_k n_k)),

where we can choose C₂ sufficiently large. On the set A_n, for n sufficiently large,

  A_k ≤ e^{−2k} ≤ (4C₂)^{−1} a_k e^{−k} ε_n², for all k > (2α+1)^{−1} log(nβ₁^{−1}).

Therefore,

  Σ_{k > (2α+1)^{−1} log(nβ₁^{−1})} exp(−C₂e^k (1 − ε_n^{−2} C₂^{−1} e^{−k} a_k^{−1} A_k n_k))
   ≤ Σ_{k > (2α+1)^{−1} log(nβ₁^{−1})} exp(−(1/2)C₂e^k)
   ≤ exp(−(1/2)C₂ β₁^{−1/(2α+1)} nε_n²)
   ≤ exp(−(C+4)nε_n²),

with sufficiently large C₂ and n. Hence,

  sup_{A ∈ A_n} Π_A(F_n^c) ≤ exp(−(C+4)nε_n²),

and we have

  Π(F_n^c) ≤ exp(−(C+4)nε_n²).

Thus the proof is complete.

5.2. Proof of Theorem 2.2. Before stating the proof of Theorem 2.2, we need to establish a testing result. It is proved in later sections.

Lemma 5.5. Let d be a distance satisfying the testing property (10) and (11). Suppose that there is b₂ > 0 such that for all f₁, f₂ ∈ D,

  b₂^{−1}‖f₁ − f₂‖ ≤ d(f₁, f₂) ≤ b₂‖f₁ − f₂‖.

Then, for any sufficiently large M > 0, there exists a testing function φ_n such that

  P^n_{f₀} φ_n ≤ exp(−(1/2)LM²nε_n²),
  sup_{f ∈ F_n ∩ supp(Π) : d(f, f₀) > Mε_n} P^n_f(1 − φ_n) ≤ exp(−LM²nε_n²),

for some constant L > 0 not depending on M.

The following result is Lemma 10 in Ghosal and van der Vaart (2007). It lower bounds the denominator of the posterior distribution in probability.

Lemma 5.6. Consider H_n defined in (17). As long as

  Π(D(P^n_{f₀}, P^n_f) ≤ bnε_n², V(P^n_{f₀}, P^n_f) ≤ bnε_n²) ≥ exp(−Cnε_n²),

we have

  P^n_{f₀}(H_n^c) ≤ 1/(C²nε_n²),

for some C > 0.

Proof of Theorem 2.2. Notice that the induced prior inherits the properties (6) and (7) from Π. Since both D(P^n_{f₀}, P^n_f) and V(P^n_{f₀}, P^n_f) are upper bounded by bn‖θ − θ₀‖², we have

  Π(D(P^n_{f₀}, P^n_f) ≤ bnε_n², V(P^n_{f₀}, P^n_f) ≤ bnε_n²) ≥ Π(Σ_j |θ_j − θ_{0j}|² ≤ ε_n²) ≥ exp(−Cnε_n²),

for the constant C with which Π satisfies (6) and (7). By Lemma 5.6, the K-L property of the prior implies P^n_{f₀}(H_n^c) ≤ C₁/(nε_n²). Let F_n be the sieve defined in (5); then Π(F_n^c) ≤ exp(−(C+4)nε_n²). Letting φ_n be the testing function in Lemma 5.5, we have

  P^n_{f₀} Π(d(f, f₀) > Mε_n | X^n)
   ≤ P^n_{f₀}(H_n^c) + P^n_{f₀} φ_n + P^n_{f₀}[Π(d(f, f₀) > Mε_n | X^n)(1 − φ_n) 1_{H_n}],

where the first two terms go to 0. The last term has the bound

  exp((C+2)nε_n²) P^n_{f₀} ∫_{{f ∈ F_n : d(f,f₀) > Mε_n}} (p_f^n/p_{f₀}^n)(X^n)(1 − φ_n) dΠ(f)
   + exp((C+2)nε_n²) P^n_{f₀} ∫_{F_n^c} (p_f^n/p_{f₀}^n)(X^n) dΠ(f)
   ≤ exp((C+2)nε_n²) sup_{{f ∈ F_n ∩ supp(Π) : d(f,f₀) > Mε_n}} P^n_f(1 − φ_n)
     + exp((C+2)nε_n²) Π(F_n^c)
   ≤ exp(−(LM² − C − 2)nε_n²) + exp(−2nε_n²).

We pick M satisfying M² > L^{−1}(C + 2), and then every term goes to 0. The proof is complete.

Acknowledgements. We want to thank Andrew Barron for helpful discussions on this topic, and the associate editor and the referees for their constructive comments and suggestions that led to the improvement of the paper.

SUPPLEMENTARY MATERIAL

Supplement A: Supplement to "Rate exact Bayesian adaptation with modified block priors" (url to be specified). The supplementary material (Gao and Zhou, 2015b) contains the remaining proofs and numerical studies of the block prior.

References.

[1] Barron, A. R. (1988). The exponential convergence of posterior probabilities with implications for Bayes estimators of density functions. Technical report, Univ. of Illinois.
[2] Barron, A. R. (1989). Uniformly powerful goodness of fit tests. The Annals of Statistics.
[3] Barron, A., Schervish, M. J. and Wasserman, L. (1999). The consistency of posterior distributions in nonparametric problems. The Annals of Statistics.
[4] Barron, A. R. and Sheu, C.-H. (1991). Approximation of density functions by sequences of exponential families. The Annals of Statistics.
[5] Brown, L. D. and Low, M. G. (1996). Asymptotic equivalence of nonparametric regression and white noise. The Annals of Statistics.
[6] Brown, L. D., Cai, T. T., Low, M. G. and Zhang, C.-H. (2002). Asymptotic equivalence theory for nonparametric regression with random design. The Annals of Statistics.
[7] Castillo, I., Kerkyacharian, G. and Picard, D. (2014). Thomas Bayes' walk on manifolds. Probability Theory and Related Fields.
[8] Castillo, I. and van der Vaart, A. (2012). Needles and straw in a haystack: Posterior concentration for possibly sparse sequences. The Annals of Statistics.
[9] de Jonge, R. and van Zanten, J. (2010). Adaptive nonparametric Bayesian inference using location-scale mixture priors. The Annals of Statistics.
[10] Gao, C. and Zhou, H. H. (2015a). Rate-optimal posterior contraction for sparse PCA. The Annals of Statistics.
[11] Gao, C. and Zhou, H. H. (2015b). Supplement to "Rate exact Bayesian adaptation with modified block priors."
[12] Ghosal, S., Ghosh, J. K. and van der Vaart, A. W. (2000). Convergence rates of posterior distributions. The Annals of Statistics.
[13] Ghosal, S., Lember, J. and van der Vaart, A. (2008). Nonparametric Bayesian model selection and averaging. Electronic Journal of Statistics.
[14] Ghosal, S. and van der Vaart, A. (2007). Convergence rates of posterior distributions for non-i.i.d. observations. The Annals of Statistics.

[15] Golubev, G. K., Nussbaum, M. and Zhou, H. H. (2010). Asymptotic equivalence of spectral density estimation and Gaussian white noise. The Annals of Statistics.
[16] Hoffmann, M., Rousseau, J. and Schmidt-Hieber, J. (2015). On adaptive posterior concentration rates. The Annals of Statistics, to appear.
[17] Kruijer, W., Rousseau, J. and van der Vaart, A. (2010). Adaptive Bayesian density estimation with location-scale mixtures. Electronic Journal of Statistics.
[18] Kruijer, W. and van der Vaart, A. (2008). Posterior convergence rates for Dirichlet mixtures of beta densities. Journal of Statistical Planning and Inference.
[19] Le Cam, L. (1973). Convergence of estimates under dimensionality restrictions. The Annals of Statistics.
[20] Nussbaum, M. (1996). Asymptotic equivalence of density estimation and Gaussian white noise. The Annals of Statistics.
[21] Pollard, D. (1990). Empirical Processes: Theory and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics.
[22] Rivoirard, V. and Rousseau, J. (2012). Posterior concentration rates for infinite dimensional exponential families. Bayesian Analysis.
[23] Rousseau, J. (2010). Rates of convergence for the posterior distributions of mixtures of betas and adaptive nonparametric estimation of the density. The Annals of Statistics.
[24] Schwartz, L. (1965). On Bayes procedures. Probability Theory and Related Fields.
[25] Scricciolo, C. (2006). Convergence rates for Bayesian density estimation of infinite-dimensional exponential families. The Annals of Statistics.
[26] Shen, W., Tokdar, S. T. and Ghosal, S. (2013). Adaptive Bayesian multivariate density estimation with Dirichlet mixtures. Biometrika.
[27] Shen, X. and Wasserman, L. (2001). Rates of convergence of posterior distributions. The Annals of Statistics.
[28] van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press.
[29] van der Vaart, A. and van Zanten, H. (2007). Bayesian inference with rescaled Gaussian process priors. Electronic Journal of Statistics.
[30] van der Vaart, A. W. and van Zanten, J. H. (2008). Reproducing kernel Hilbert spaces of Gaussian priors. IMS Collections.
[31] van der Vaart, A. W. and van Zanten, J. H. (2009). Adaptive Bayesian estimation using a Gaussian random field with inverse Gamma bandwidth. The Annals of Statistics.
[32] Zolotarev, V. (1986). Asymptotic behavior of the Gaussian measure in l₂. Journal of Mathematical Sciences.


More information

Roberto s Notes on Differential Calculus Chapter 8: Graphical analysis Section 1. Extreme points

Roberto s Notes on Differential Calculus Chapter 8: Graphical analysis Section 1. Extreme points Roberto s Notes on Dierential Calculus Chapter 8: Graphical analysis Section 1 Extreme points What you need to know already: How to solve basic algebraic and trigonometric equations. All basic techniques

More information

Universidad de Chile, Casilla 653, Santiago, Chile. Universidad de Santiago de Chile. Casilla 307 Correo 2, Santiago, Chile. May 10, 2017.

Universidad de Chile, Casilla 653, Santiago, Chile. Universidad de Santiago de Chile. Casilla 307 Correo 2, Santiago, Chile. May 10, 2017. A Laplace transorm approach to linear equations with ininitely many derivatives and zeta-nonlocal ield equations arxiv:1705.01525v2 [math-ph] 9 May 2017 A. Chávez 1, H. Prado 2, and E.G. Reyes 3 1 Departamento

More information

2.6 Two-dimensional continuous interpolation 3: Kriging - introduction to geostatistics. References - geostatistics. References geostatistics (cntd.

2.6 Two-dimensional continuous interpolation 3: Kriging - introduction to geostatistics. References - geostatistics. References geostatistics (cntd. .6 Two-dimensional continuous interpolation 3: Kriging - introduction to geostatistics Spline interpolation was originally developed or image processing. In GIS, it is mainly used in visualization o spatial

More information

Optimal Estimation of a Nonsmooth Functional

Optimal Estimation of a Nonsmooth Functional Optimal Estimation of a Nonsmooth Functional T. Tony Cai Department of Statistics The Wharton School University of Pennsylvania http://stat.wharton.upenn.edu/ tcai Joint work with Mark Low 1 Question Suppose

More information

Problem Set. Problems on Unordered Summation. Math 5323, Fall Februray 15, 2001 ANSWERS

Problem Set. Problems on Unordered Summation. Math 5323, Fall Februray 15, 2001 ANSWERS Problem Set Problems on Unordered Summation Math 5323, Fall 2001 Februray 15, 2001 ANSWERS i 1 Unordered Sums o Real Terms In calculus and real analysis, one deines the convergence o an ininite series

More information

Gaussian Process Regression Models for Predicting Stock Trends

Gaussian Process Regression Models for Predicting Stock Trends Gaussian Process Regression Models or Predicting Stock Trends M. Todd Farrell Andrew Correa December 5, 7 Introduction Historical stock price data is a massive amount o time-series data with little-to-no

More information

( x) f = where P and Q are polynomials.

( x) f = where P and Q are polynomials. 9.8 Graphing Rational Functions Lets begin with a deinition. Deinition: Rational Function A rational unction is a unction o the orm ( ) ( ) ( ) P where P and Q are polynomials. Q An eample o a simple rational

More information

Inconsistency of Bayesian inference when the model is wrong, and how to repair it

Inconsistency of Bayesian inference when the model is wrong, and how to repair it Inconsistency of Bayesian inference when the model is wrong, and how to repair it Peter Grünwald Thijs van Ommen Centrum Wiskunde & Informatica, Amsterdam Universiteit Leiden June 3, 2015 Outline 1 Introduction

More information

Numerical Solution of Ordinary Differential Equations in Fluctuationlessness Theorem Perspective

Numerical Solution of Ordinary Differential Equations in Fluctuationlessness Theorem Perspective Numerical Solution o Ordinary Dierential Equations in Fluctuationlessness Theorem Perspective NEJLA ALTAY Bahçeşehir University Faculty o Arts and Sciences Beşiktaş, İstanbul TÜRKİYE TURKEY METİN DEMİRALP

More information

SEPARATED AND PROPER MORPHISMS

SEPARATED AND PROPER MORPHISMS SEPARATED AND PROPER MORPHISMS BRIAN OSSERMAN The notions o separatedness and properness are the algebraic geometry analogues o the Hausdor condition and compactness in topology. For varieties over the

More information

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1998 Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ Lawrence D. Brown University

More information

Nonparametric Bayesian model selection and averaging

Nonparametric Bayesian model selection and averaging Electronic Journal of Statistics Vol. 2 2008 63 89 ISSN: 1935-7524 DOI: 10.1214/07-EJS090 Nonparametric Bayesian model selection and averaging Subhashis Ghosal Department of Statistics, North Carolina

More information

EXPANSIONS IN NON-INTEGER BASES: LOWER, MIDDLE AND TOP ORDERS

EXPANSIONS IN NON-INTEGER BASES: LOWER, MIDDLE AND TOP ORDERS EXPANSIONS IN NON-INTEGER BASES: LOWER, MIDDLE AND TOP ORDERS NIKITA SIDOROV To the memory o Bill Parry ABSTRACT. Let q (1, 2); it is known that each x [0, 1/(q 1)] has an expansion o the orm x = n=1 a

More information

VALUATIVE CRITERIA BRIAN OSSERMAN

VALUATIVE CRITERIA BRIAN OSSERMAN VALUATIVE CRITERIA BRIAN OSSERMAN Intuitively, one can think o separatedness as (a relative version o) uniqueness o limits, and properness as (a relative version o) existence o (unique) limits. It is not

More information

Math 216A. A gluing construction of Proj(S)

Math 216A. A gluing construction of Proj(S) Math 216A. A gluing construction o Proj(S) 1. Some basic deinitions Let S = n 0 S n be an N-graded ring (we ollows French terminology here, even though outside o France it is commonly accepted that N does

More information

Asymptotic efficiency of simple decisions for the compound decision problem

Asymptotic efficiency of simple decisions for the compound decision problem Asymptotic efficiency of simple decisions for the compound decision problem Eitan Greenshtein and Ya acov Ritov Department of Statistical Sciences Duke University Durham, NC 27708-0251, USA e-mail: eitan.greenshtein@gmail.com

More information

ENSC327 Communications Systems 2: Fourier Representations. School of Engineering Science Simon Fraser University

ENSC327 Communications Systems 2: Fourier Representations. School of Engineering Science Simon Fraser University ENSC37 Communications Systems : Fourier Representations School o Engineering Science Simon Fraser University Outline Chap..5: Signal Classiications Fourier Transorm Dirac Delta Function Unit Impulse Fourier

More information

Bayesian Nonparametrics

Bayesian Nonparametrics Bayesian Nonparametrics Peter Orbanz Columbia University PARAMETERS AND PATTERNS Parameters P(X θ) = Probability[data pattern] 3 2 1 0 1 2 3 5 0 5 Inference idea data = underlying pattern + independent

More information

Scattering of Solitons of Modified KdV Equation with Self-consistent Sources

Scattering of Solitons of Modified KdV Equation with Self-consistent Sources Commun. Theor. Phys. Beijing, China 49 8 pp. 89 84 c Chinese Physical Society Vol. 49, No. 4, April 5, 8 Scattering o Solitons o Modiied KdV Equation with Sel-consistent Sources ZHANG Da-Jun and WU Hua

More information

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical

More information

Chapter 6 Reliability-based design and code developments

Chapter 6 Reliability-based design and code developments Chapter 6 Reliability-based design and code developments 6. General Reliability technology has become a powerul tool or the design engineer and is widely employed in practice. Structural reliability analysis

More information

The achievable limits of operational modal analysis. * Siu-Kui Au 1)

The achievable limits of operational modal analysis. * Siu-Kui Au 1) The achievable limits o operational modal analysis * Siu-Kui Au 1) 1) Center or Engineering Dynamics and Institute or Risk and Uncertainty, University o Liverpool, Liverpool L69 3GH, United Kingdom 1)

More information

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS Bendikov, A. and Saloff-Coste, L. Osaka J. Math. 4 (5), 677 7 ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS ALEXANDER BENDIKOV and LAURENT SALOFF-COSTE (Received March 4, 4)

More information

Treatment and analysis of data Applied statistics Lecture 6: Bayesian estimation

Treatment and analysis of data Applied statistics Lecture 6: Bayesian estimation Treatment and analysis o data Applied statistics Lecture 6: Bayesian estimation Topics covered: Bayes' Theorem again Relation to Likelihood Transormation o pd A trivial example Wiener ilter Malmquist bias

More information

Strong Lyapunov Functions for Systems Satisfying the Conditions of La Salle

Strong Lyapunov Functions for Systems Satisfying the Conditions of La Salle 06 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 49, NO. 6, JUNE 004 Strong Lyapunov Functions or Systems Satisying the Conditions o La Salle Frédéric Mazenc and Dragan Ne sić Abstract We present a construction

More information

Least-Squares Spectral Analysis Theory Summary

Least-Squares Spectral Analysis Theory Summary Least-Squares Spectral Analysis Theory Summary Reerence: Mtamakaya, J. D. (2012). Assessment o Atmospheric Pressure Loading on the International GNSS REPRO1 Solutions Periodic Signatures. Ph.D. dissertation,

More information

Bayesian nonparametrics

Bayesian nonparametrics Bayesian nonparametrics Posterior contraction and limiting shape Ismaël Castillo LPMA Université Paris VI Berlin, September 2016 Ismaël Castillo (LPMA Université Paris VI) BNP lecture series Berlin, September

More information

Objectives. By the time the student is finished with this section of the workbook, he/she should be able

Objectives. By the time the student is finished with this section of the workbook, he/she should be able FUNCTIONS Quadratic Functions......8 Absolute Value Functions.....48 Translations o Functions..57 Radical Functions...61 Eponential Functions...7 Logarithmic Functions......8 Cubic Functions......91 Piece-Wise

More information

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Noname manuscript No. (will be inserted by the editor) A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Mai Zhou Yifan Yang Received: date / Accepted: date Abstract In this note

More information

Journal of Mathematical Analysis and Applications

Journal of Mathematical Analysis and Applications J. Math. Anal. Appl. 352 2009) 739 748 Contents lists available at ScienceDirect Journal o Mathematical Analysis Applications www.elsevier.com/locate/jmaa The growth, oscillation ixed points o solutions

More information

Bayesian Sparse Linear Regression with Unknown Symmetric Error

Bayesian Sparse Linear Regression with Unknown Symmetric Error Bayesian Sparse Linear Regression with Unknown Symmetric Error Minwoo Chae 1 Joint work with Lizhen Lin 2 David B. Dunson 3 1 Department of Mathematics, The University of Texas at Austin 2 Department of

More information

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting

More information

On the Efficiency of Maximum-Likelihood Estimators of Misspecified Models

On the Efficiency of Maximum-Likelihood Estimators of Misspecified Models 217 25th European Signal Processing Conerence EUSIPCO On the Eiciency o Maximum-ikelihood Estimators o Misspeciied Models Mahamadou amine Diong Eric Chaumette and François Vincent University o Toulouse

More information

Lecture 35: December The fundamental statistical distances

Lecture 35: December The fundamental statistical distances 36-705: Intermediate Statistics Fall 207 Lecturer: Siva Balakrishnan Lecture 35: December 4 Today we will discuss distances and metrics between distributions that are useful in statistics. I will be lose

More information

VALUATIVE CRITERIA FOR SEPARATED AND PROPER MORPHISMS

VALUATIVE CRITERIA FOR SEPARATED AND PROPER MORPHISMS VALUATIVE CRITERIA FOR SEPARATED AND PROPER MORPHISMS BRIAN OSSERMAN Recall that or prevarieties, we had criteria or being a variety or or being complete in terms o existence and uniqueness o limits, where

More information

Nonparametric estimation using wavelet methods. Dominique Picard. Laboratoire Probabilités et Modèles Aléatoires Université Paris VII

Nonparametric estimation using wavelet methods. Dominique Picard. Laboratoire Probabilités et Modèles Aléatoires Université Paris VII Nonparametric estimation using wavelet methods Dominique Picard Laboratoire Probabilités et Modèles Aléatoires Université Paris VII http ://www.proba.jussieu.fr/mathdoc/preprints/index.html 1 Nonparametric

More information

Online Appendix: The Continuous-type Model of Competitive Nonlinear Taxation and Constitutional Choice by Massimo Morelli, Huanxing Yang, and Lixin Ye

Online Appendix: The Continuous-type Model of Competitive Nonlinear Taxation and Constitutional Choice by Massimo Morelli, Huanxing Yang, and Lixin Ye Online Appendix: The Continuous-type Model o Competitive Nonlinear Taxation and Constitutional Choice by Massimo Morelli, Huanxing Yang, and Lixin Ye For robustness check, in this section we extend our

More information

Global Weak Solution of Planetary Geostrophic Equations with Inviscid Geostrophic Balance

Global Weak Solution of Planetary Geostrophic Equations with Inviscid Geostrophic Balance Global Weak Solution o Planetary Geostrophic Equations with Inviscid Geostrophic Balance Jian-Guo Liu 1, Roger Samelson 2, Cheng Wang 3 Communicated by R. Temam) Abstract. A reormulation o the planetary

More information

Fisher Consistency of Multicategory Support Vector Machines

Fisher Consistency of Multicategory Support Vector Machines Fisher Consistency o Multicategory Support Vector Machines Yueng Liu Department o Statistics and Operations Research Carolina Center or Genome Sciences University o North Carolina Chapel Hill, NC 7599-360

More information

Basic mathematics of economic models. 3. Maximization

Basic mathematics of economic models. 3. Maximization John Riley 1 January 16 Basic mathematics o economic models 3 Maimization 31 Single variable maimization 1 3 Multi variable maimization 6 33 Concave unctions 9 34 Maimization with non-negativity constraints

More information

Wavelet Shrinkage for Nonequispaced Samples

Wavelet Shrinkage for Nonequispaced Samples University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1998 Wavelet Shrinkage for Nonequispaced Samples T. Tony Cai University of Pennsylvania Lawrence D. Brown University

More information

A Simple Explanation of the Sobolev Gradient Method

A Simple Explanation of the Sobolev Gradient Method A Simple Explanation o the Sobolev Gradient Method R. J. Renka July 3, 2006 Abstract We have observed that the term Sobolev gradient is used more oten than it is understood. Also, the term is oten used

More information

Asymptotics of integrals

Asymptotics of integrals December 9, Asymptotics o integrals Paul Garrett garrett@math.umn.e http://www.math.umn.e/ garrett/ [This document is http://www.math.umn.e/ garrett/m/v/basic asymptotics.pd]. Heuristic or the main term

More information

ICES REPORT Model Misspecification and Plausibility

ICES REPORT Model Misspecification and Plausibility ICES REPORT 14-21 August 2014 Model Misspecification and Plausibility by Kathryn Farrell and J. Tinsley Odena The Institute for Computational Engineering and Sciences The University of Texas at Austin

More information

ELEG 3143 Probability & Stochastic Process Ch. 4 Multiple Random Variables

ELEG 3143 Probability & Stochastic Process Ch. 4 Multiple Random Variables Department o Electrical Engineering University o Arkansas ELEG 3143 Probability & Stochastic Process Ch. 4 Multiple Random Variables Dr. Jingxian Wu wuj@uark.edu OUTLINE 2 Two discrete random variables

More information

Asymptotic Equivalence and Adaptive Estimation for Robust Nonparametric Regression

Asymptotic Equivalence and Adaptive Estimation for Robust Nonparametric Regression Asymptotic Equivalence and Adaptive Estimation for Robust Nonparametric Regression T. Tony Cai 1 and Harrison H. Zhou 2 University of Pennsylvania and Yale University Abstract Asymptotic equivalence theory

More information

Optimal robust estimates using the Hellinger distance

Optimal robust estimates using the Hellinger distance Adv Data Anal Classi DOI 10.1007/s11634-010-0061-8 REGULAR ARTICLE Optimal robust estimates using the Hellinger distance Alio Marazzi Victor J. Yohai Received: 23 November 2009 / Revised: 25 February 2010

More information

A universal symmetry detection algorithm

A universal symmetry detection algorithm DOI 10.1186/s40064-015-1156-7 RESEARCH Open Access A universal symmetry detection algorithm Peter M Maurer * *Correspondence: Peter_Maurer@Baylor.edu Department o Computer Science, Baylor University, Waco,

More information

EXISTENCE OF ISOPERIMETRIC SETS WITH DENSITIES CONVERGING FROM BELOW ON R N. 1. Introduction

EXISTENCE OF ISOPERIMETRIC SETS WITH DENSITIES CONVERGING FROM BELOW ON R N. 1. Introduction EXISTECE OF ISOPERIMETRIC SETS WITH DESITIES COVERGIG FROM BELOW O R GUIDO DE PHILIPPIS, GIOVAI FRAZIA, AD ALDO PRATELLI Abstract. In this paper, we consider the isoperimetric problem in the space R with

More information

EEO 401 Digital Signal Processing Prof. Mark Fowler

EEO 401 Digital Signal Processing Prof. Mark Fowler EEO 401 Digital Signal Processing Pro. Mark Fowler Note Set #10 Fourier Analysis or DT Signals eading Assignment: Sect. 4.2 & 4.4 o Proakis & Manolakis Much o Ch. 4 should be review so you are expected

More information

Theorem Let J and f be as in the previous theorem. Then for any w 0 Int(J), f(z) (z w 0 ) n+1

Theorem Let J and f be as in the previous theorem. Then for any w 0 Int(J), f(z) (z w 0 ) n+1 (w) Second, since lim z w z w z w δ. Thus, i r δ, then z w =r (w) z w = (w), there exist δ, M > 0 such that (w) z w M i dz ML({ z w = r}) = M2πr, which tends to 0 as r 0. This shows that g = 2πi(w), which

More information