1. Introduction 1.1. Model. In this paper, we consider the following heteroscedastic white noise model:

Size: px
Start display at page:

Download "1. Introduction 1.1. Model. In this paper, we consider the following heteroscedastic white noise model:"

Transcription

1 Volume 14, No ), pp. Allerton Press, Inc. M A T H E M A T I C A L M E T H O D S O F S T A T I S T I C S BAYESIAN MODELLING OF SPARSE SEQUENCES AND MAXISETS FOR BAYES RULES V. Rivoirard Equipe Probabilités, Statistique et Modélisation Laboratoire de mathématiques, CNRS-UMR 868, Bâtiment 45 Université Paris 11 Orsay, 15 rue Georges Clémenceau, Orsay Cedex, France Vincent.Rivoirard@math.u-psud.fr In this paper, our aim is to estimate sparse sequences in the framewor of the heteroscedastic white noise model. To model sparsity, we consider a Bayesian model composed of a mixture of a heavy-tailed density and a point mass at zero. To evaluate the performance of the Bayes rules the median or the mean of the posterior distribution), we exploit an alternative to the minimax setting developed in particular by Keryacharian and Picard: we determine the maxisets for each of these estimators. Using this approach, we compare the performance of Bayesian procedures with thresholding ones. Furthermore, the maxisets obtained can be viewed as weighted versions of wea l q spaces that naturally model sparsity. This remar leads us to investigate the following problem: how can we choose the prior parameters to build typical realizations of weighted wea l q spaces? Key words: Bayes rules, Bayesian model, heteroscedastic white noise model, maxisets, rate of convergence, sparsity, thresholding rules, weighted wea l q spaces. 000 Mathematics Subject Classification: 6C10, 6G05, 6G0. 1. Introduction 1.1. Model. In this paper, we consider the following heteroscedastic white noise model: 1) x = θ + εσ ξ, = 1,,..., where θ = θ ) 1 is an unnown sequence to be estimated by using observations x ) 1, ε > 0 is a small parameter, and ξ ) 1 is an independent and identically distributed i.i.d.) sequence of Gaussian variables with mean zero and unit variance. Along this paper, we assume that σ = σ ) 1 is a nown sequence of c 005 by Allerton Press, Inc. Authorization to photocopy individual items for internal or personal use, or the internal or personal use of specific clients, is granted by Allerton Press, Inc. for libraries and other users registered with the Copyright Clearance Center CCC) Transactional Reporting Service, provided that the base fee of $50.00 per copy is paid directly to CCC, Rosewood Drive, Danvers, MA

2 V. Rivoirard positive real numbers. This heteroscedastic white noise model, that appears as a generalization of the classical white noise model for which, we have 1, σ = 1), is extensively used by statisticians. Let us briefly recall the reasons for this large use and provide references. Given a nown linear operator A, we use the heteroscedastic white noise model when we have to estimate the solution f of the linear equation g = Af, with noisy observations of g. Most of the time, to deal with such a problem, we exploit the singular value decomposition of A and the sequence σ ) 1 is then the eigenvalues sequence of the operator A A, with A the adjoint of A. There is a considerable literature about statistical inverse problems. Let us cite Korostelev and Tsybaov [8], Donoho [14], Golubev and Khasminsii [19], Goldenshluger and Pereverzev [18], Cavalier, Golubev, Picard and Tsybaov [7] and the references cited below. Some well-posed inverse problems with noise can be reduced to 1) with σ 0. The condition σ + characterizes ill-posed problems. For instance, the sequences σ ) 1 associated with operators such as integration considered by Ruymgaart [35], the Radon transform see Cavalier and Tsybaov [8]), convolution for the case studied by Cavalier and Tsybaov [8] or operators for some elliptic differential equations see Mair and Ruymgaart [9]) have a polynomial growth. But, the σ s may grow exponentially. See, for instance, Pereverzev and Schoc [3] who considered the problem of satellite geodesy or the inverse problems associated with partial differential equations such as the heat equation see Mair and Ruymgaart [9]). In the wavelet context, Johnstone [1] and Johnstone and Silverman [] explained that the heteroscedastic white noise model can also be used to represent direct observations with correlated structure. More precisely, let us assume that we are given the following nonparametric regression model: ) Y i = fi/n) + e i, i {1,,..., n}, where n is an integer, f is the signal to be estimated, and the e i s are drawn from a stationary Gaussian process. By studying the autocorrelation function of the e i s, Johnstone [1] and Johnstone and Silverman [] showed that under a good choice of ε and σ = σ ) 1, the model 1) appears as a good approximation of the model ) when n is large. 1.. Bayesian model and Bayes rules. In this paper, we suppose that the sequence θ to be estimated is sparse. It means that only a small proportion of the θ s are non-negligible. When the θ s are wavelet coefficients, this assumption is natural since the underlying property of wavelets is that a function can be well approximated by a function with a relatively small proportion of nonzero wavelet coefficients. In this paper, we model the sparsity by using a Bayesian approach. For this purpose, we assume that the θ s are random and the distribution of θ is such that the θ s are independent and for any 1, there exist a fixed parameter w,ε 0, 1) depending on ε and and a fixed density γ, such that, with probability 1 w,ε, θ is equal to 0 and with probability w,ε, the density of θ is γ,ε, where γ,ε θ) = s,ε γs,ε θ), θ R and s,ε = εσ ) 1.

3 Maxisets for Bayes Rules 3 If δ 0 denotes the Dirac mass at 0, this model is written as follows: M 1 ) θ 1 w,ε )δ 0 θ ) + w,ε γ,ε θ ), 1. So, roughly speaing, the first term models the negligible components and the second one non-negligible ones. Bayesian procedures have now become very popular in signal estimation, since they often outperform classical procedures and in particular thresholding procedures from the practical point of view. See the very complete review paper of Antoniadis, Bigot and Sapatinas [4] who provide descriptions and practical comparisons of various Bayesian wavelet shrinage and wavelet thresholding estimators. It is relevant to note that most of wors about Bayesian procedures tae place in the practical framewor. However, we can cite Johnstone and Silverman [4, 5] and Abramovich, Amato and Angelini [1] who studied Bayesian procedures from the minimax point of view. Most of the authors consider quite similar Bayesian models and often, priors are based on normal distributions. For instance, in the wavelet framewor, Johnstone and Silverman [3] following Abramovich, Sapatinas and Silverman [3] and Clyde, Parmigiani and Vidaovic [10] consider a mixture of a normal component and a point mass at zero for the wavelet coefficients. Chipman, Kolaczy and McCulloch [9] impose a mixture of two Gaussian distributions with different variances for negligible and non-negligible wavelet coefficients. Let us add that, often, properties of conjugate families enable statisticians to point out easily Bayes rules when Gaussian priors are considered in the classical Gaussian white noise model. However, Johnstone and Silverman [4, 5] did not use Gaussian distributions and showed the advantages from the minimax point of view in considering heavy-tailed distributions. In the maxiset framewor, we shall draw similar conclusions concerning γ. The posterior distribution of θ given x is 3) γ φ,ε θ x ) = φ x θ )[w,ε γ,ε θ ) + 1 w,ε )δ 0 θ )] w,ε φ x θ)γ,ε θ) dθ + 1 w,ε )φ x ), where φ denotes the density of εσ ξ, using notations of Section 1.1. For all ε > 0, we assume that we are given a real number Λ ε > 1 depending only on ε and b1 tending to + as ε 0. We estimate each θ by ˆθ x b ) or by ˆθ x ) defined by the following procedure. b1 If < Λ ε, ˆθ x b ) respectively ˆθ x )) is the median respectively the mean) of the posterior distribution of θ given x. Therefore, these rules satisfy: for any b1 ˆθ < ˆθ x ), F γ φ ˆθ ) < 0.5 F,ε γ φ ˆθ b 1 x )), ˆθb x ) = θ γ φ,ε θ x ) dθ,,ε where F γ φ denotes the cumulative distribution function of γ φ,ε x ).,ε If Λ ε, then ˆθ b 1 x ) = ˆθ b x ) = 0.

4 4 V. Rivoirard The values of the hyperparameters w,ε, γ, and Λ ε will be chosen later. Other properties of these Bayes rules are given in Section.1. So, the choice for Bayes rules is very classical. Indeed, most of the time, in practice, the Bayes rules used by statisticians are the median, and more frequently the mean of the posterior distribution, which are generally better estimates than the mode see Berger [5], p. 101). Note that the posterior mean and the posterior median are built by using the posterior distribution on its whole support. For instance, let us cite Chipman, Kolaczy and McCulloch [9] and Clyde, Parmigiani and Vidaovic [10] who used the posterior mean. But Abramovich, Sapatinas and Silverman [3] considered the posterior median under a Bayesian model that has the same form as M 1 ). In their Bayesian framewor, unlie the posterior mean, the posterior median is a true thresholding rule. In the wavelet context, they showed several simulated examples for which this approach improves most of the traditional methods. If from the practical point of view, the median seems preferable to the mean, what happens under a theoretical approach? From the minimax point of view, Theorem 1 of Johnstone and Silverman [4] showed that the posterior median of their Bayes model achieves optimal rates of convergence under Besov body constraints and for l q - losses, with 0 < q. If the posterior mean is used, optimal rates are also achieved but only if 1 < q see Section 7.3 of [4]). This provides some theoretical justification for preferring the posterior median over the posterior mean. In this paper, to evaluate the performance of ˆθ b 1 = ˆθ b 1 x )) 1 and ˆθ b = ˆθ b x )) 1, we use neither a practical approach nor the minimax theory, which have been extensively considered, but the maxiset theory that we describe now The maxiset theory and functional spaces. Let us first motivate the introduction of the maxiset point of view. When nonparametric problems are explored, the minimax theory is the most popular point of view: it consists in ensuring that the used procedure ˆθ = ˆθ x )) 1 achieves the best rate on a given sequence space S. But, at first, the choice of S is arbitrary what ind of spaces has to be considered: Sobolev spaces? Besov spaces? why?), secondly, S could contain sequences very difficult to estimate. Since the unnown quantity θ = θ ) 1 could be easier to estimate, the used procedure could be too pessimistic and not adapted to the data. More embarrassing in practice, several minimax procedures may be proposed and the practitioner has no way to decide but his experiment. To answer these issues, another point of view has recently appeared: the maxiset point of view introduced by Cohen, DeVore, Keryacharian and Picard [1] and Keryacharian and Picard [6]. Given an estimate ˆθ, it consists in assessing the accuracy of ˆθ by fixing a prescribed rate ρ ε and pointing out the set of all the sequences θ that can be estimated by the procedure ˆθ at the target rate ρ ε. So, under the statistical model 1), we introduce the following definition. Definition 1. Let 1 p < and let ˆθ = ˆθ x )) 1 be an estimator. The maxiset of ˆθ associated with the rate ρ ε and the l p -loss is MSˆθ, ρ ε, p) = { [ θ = θ ) 1 : sup E ) 1/p ] } ˆθ x ) θ p ρ 1 ε <. ε 1

5 Maxisets for Bayes Rules 5 The maxiset point of view brings answers to the previous issues. Indeed, there is no a priori assumption on θ and then, the practitioner does not need to restrict his study to an arbitrary sequence space. The practitioner states the desired accuracy and then, nows the quality of the used procedure. Obviously, he chooses the procedure with the largest maxiset. Let us give first examples of maxiset results in the statistical framewor of this paper. For this purpose, we need to introduce the following sequence spaces. Definition. For all 1 p < and 0 < η <, we set { Bp, η = θ = θ ) 1 : sup λ } pη θ p <, λ and if q is a real number such that 0 < q < p, we set { wl p,q σ) = θ = θ ) 1 : sup λ q 1 θ >λσ σ p }. < Now, let us focus on thresholding rules associated with the universal threshold λ,ε = σ ε log ε see Donoho and Johnstone [16]): for all ε > 0, we assume that we are given a real number Λ ε > 0 only depending on ε and tending to + as ε 0, and we set { ˆθ x t x 1 x κ ) = λ,ε if < Λ ε, 0 otherwise, where κ is a constant. Keryacharian and Picard [6] have studied the maxisets for this procedure. They obtained the following result for ˆθ t = ˆθ t x )) 1 Theorems 3.1 and 3. of Keryacharian and Picard [6]): Theorem 1. Let 1 p < be a fixed real number and 0 < r <. We suppose that 4) ε, Λ ε = ε log ε ) r, and there exists a positive constant T such that ε 5) ε κ /16 log ε 1/4 p/ <Λ ε σ p T. Let q be a fixed positive real number such that q < p. Then, if κ p, ˆθt MS, ε log ε ) ) 1 q/p), p = wl p,q σ) B 1 r 1 q/p) p,. Remar 1. In this last result, of course, to avoid problems of definitions in 4) and 5), it is implicitly assumed, without loss of generality, that ε remains smaller than a positive constant ε 0 strictly smaller than 1 equal to 1/ for instance).

6 6 V. Rivoirard For the model 1), we can prove that under some conditions, the maxisets associated with linear estimates of the form l x ) 1, where l ) 1 is a non-increasing sequence of weights lying in [0, 1], are the spaces Bp,, η called Besov bodies. These conditions are satisfied, for instance, by projection weights, Tihonov Phillips weights or Pinser weights. For further details see Theorem of Rivoirard [34]. We can add that Lemma 1 of Rivoirard [34] proves that for the rate ε log ε ) 1 q/p), the maxisets of linear estimates are strictly contained in the maxisets of thresholding rules. It means that from the maxiset point of view, linear estimates are outperformed by thresholding ones. Keryacharian and Picard [7] also applied the maxiset theory for local bandwidth selection in the framewor of the Gaussian white noise model. Under some conditions, they proved that local bandwidth selection is at least as good as the thresholding procedure see Section 5 of [7]). The comparison of procedures based on maxisets is not as widely used as minimax comparison. However the results that have been obtained up to now are very promising since they generally show that the maxisets of well-nown procedures are spaces that are well established and easily interpretable. Indeed, in the maxiset approach, the Besov bodies the spaces Bp, ) η control the θ s for the large values of. As for the spaces wl p,q σ), they can be viewed as weighted wea l q spaces. The wea l q space is the space wl p,q σ) when σ = 1 for any 1, so we denote it wl p,q 1), and it was considered in statistics by Johnstone [0], Donoho and Johnstone [17] or Abramovich, Benjamini, Donoho and Johnstone []. This space was also studied in approximation theory and coding by DeVore [13], Donoho [15], or Cohen, DeVore and Hochmuth [11]. Abramovich, Benjamini, Donoho and Johnstone [] proved that if we order the components of a sequence θ according to their size: θ 1) θ ) θ n)..., then θ wl p,q 1) sup n 1/q θ n) < n see Section 1. of []). So, wl p,q 1) spaces naturally measure the sparsity of a signal. Of course, the weighted versions of these spaces, the wl p,q σ) spaces, share the same property. So, since Bayesian Procedures, commonly used in practice, have been barely studied from a theoretical point of view, it seems relevant to investigate the maxiset results for the Bayesian procedures ˆθ b1 and ˆθ b introduced in Section 1. and our first two goals in this paper will be the following: 1. to point out the maxisets of the Bayesian procedures ˆθ b1 and ˆθ b,. to compare these estimators with traditional procedures in the maxiset approach by comparing their respective maxisets. We shall draw interesting conclusions from the maxiset results of these classical Bayes rules that are extensively used in practice Maxiset results. Given 1 p < and ρ ε the prescribed rate, our first issue is to determine for i {1, }, MSˆθ b i, ρ ε, p) once we have fixed assumptions on the hyperparameters w,ε, γ, and Λ ε. First, throughout this paper, we tae the density γ to be unimodal, symmetric about 0, positive, and absolutely

7 Maxisets for Bayes Rules 7 continuous on R. We also assume that there exist two positive constants M and M 1 such that H 1 ) sup d log γθ) dθ = M <. θ M 1 It implies that the tails of γ have to be exponential or heavier. This enables b1 us to establish asymptotic properties of ˆθ x b ) and ˆθ x ). We prove that the posterior median ˆθ b 1 x ) is a thresholding rule. The posterior mean does not have this thresholding property, but is a shrinage rule see Propositions 1,, and 3 in Section ). We also assume that w,ε = w ε depends only on ε and we set π ε = 1 w ε )w 1 ε. Then, the maxisets for these procedures can be pointed out if we tae in addition ρ ε and Λ ε of the form ρ ε = ε log π ε ) 1 q/p, Λ ε = ε log π ε ) r, with 0 < r < and 0 < q < p. Corollaries 1 and show that under mild assumptions on π ε and on the size of the σ s see Assumptions 7), 8), 10), and 11), for i {1, }, MS ˆθb i, ε log π ε ) 1 q/p, p ) = wl p,q σ) B 1/r1 q/p) p,. In particular, it is possible to choose w ε = ε ν, as soon as ν is great enough to satisfy Assumptions 7), 8), 10), and 11). So, as far as the maxiset point of view is concerned, and roughly under the same conditions, both Bayesian procedures achieve exactly the same performance as the thresholding one for the rate ε log ε ) 1 q/p). And for this last rate, we can then claim that each of the Bayesian procedures outperforms the linear algorithm. We note in Section.4 that Assumption H 1 ) on the tails of γ is essential to get maxisets as large as possible. Finally, in Section.5, the previous results are readily extended to the case where we want to estimate functions f decomposed in an appropriate unconditional fixed basis B = {ψ, 1} as f = 1 θ ψ. We assume that θ is still observed through the model 1) and still estimated by Bayes rules. In this case, the maxisets are no longer sequence spaces but real functional spaces. See Section.5 for more details Connections between the proposed Bayesian model and the spaces pointed out. Starting from a Bayesian model, the outcomes of the study of the associated natural Bayes rules under the maxiset approach are the spaces wl p,q σ). Thus, the Bayesian model M 1 ) and wl p,q σ) spaces are connected throughout the maxiset approach. We can wonder whether or not this connection is artificial. It is worthwhile to observe that the Bayesian model has been constructed to model the sparsity of the sequences to be estimated. And as recalled in Section 1.3, wl p,q σ) spaces are natural spaces to measure the sparsity of a sequence by controlling the proportion of non-negligible θ s. The third goal of this paper is then the following. 3. Can we establish a direct connection between M 1 ) and wl p,q σ) spaces?

8 8 V. Rivoirard Actually, we would lie to prove a result similar to the one obtained by Abramovich, Sapatinas and Silverman [3] and exploited by Abramovich, Amato and Angelini [1]. Abramovich, Sapatinas and Silverman considered in the wavelet framewor a Bayesian model denoted M1 )) similar to M 1 ), where γ is fixed in advance and is the density of a Gaussian variable with mean zero and unit variance. Then, they established a necessary and sufficient condition on the other hyperparameters of M1 ) to ensure that the signal built from the wavelet coefficients coming from M1 ) belongs, almost surely, to a prescribed Besov space see Theorem 1 of Abramovich, Sapatinas and Silverman [3]). We would lie to do the same job with M 1 ) and wl p,q σ) spaces, but without fixing γ in advance. Theorem 7 of Section 3 gives answers to this issue. In particular, we point out the condition sup λ q + γx) dx <, which means that the tails of γ cannot be heavier λ than those of a Paretoq)-variable. Consequently, similarly to the result presented in Section. of Rivoirard [33], this result illustrates the strong connections between Paretoq)-distributions and wl p,q σ) spaces. Theorem 7 is proved by using results on weighted empirical distribution process established by Marcus and Zinn [30] Contents. In Section, we give some properties of the Bayes rules and evaluate the maxisets obtained for the Bayesian procedures ˆθ b 1 and ˆθ b. Section 3 investigates the relationships between the Bayesian model and wl p,q σ) spaces. Finally, in Section 4, we prove the results concerning the asymptotic properties of the Bayes rules.. Maxisets for Bayesian Procedures.1. Properties of the Bayes rules. In the Introduction, we defined the Bayes procedures ˆθ b1 and ˆθ b used in this paper. Recall that for < Λ ε, ˆθ b 1 x ) respectively ˆθ b x )) is the median respectively the mean) of the posterior distribution of θ given x given by 3). We also mention that if γ,ε denotes the prior distribution of θ, then for any 1 p < and any estimator ˆθ x ) of θ, the Bayes ris Bˆθ, γ,ε, p) of ˆθ x ) with respect to γ,ε associated with the l p -loss defined by Bˆθ, γ,ε, p) = γ,ε θ ) ˆθ x ) θ p φ x θ ) dθ dx, where φ still denotes the density of εσ ξ, satisfies: and Bˆθ b 1, γ,ε, 1) Bˆθ, γ,ε, 1), Bˆθ b, γ,ε, ) Bˆθ, γ,ε, ). These Bayes rules are shrinage rules. In particular, they satisfy the following property, which will be capital for description of their maxisets. For further details, see Lemma, inequality 6), and Section 5.5 of Johnstone and Silverman [5]. Proposition 1. For all 1, since γ is symmetric, absolutely continuous, positive, and unimodal, we have for i {1, }, ˆθ b i x1 ) ˆθ b i x ) for any x1, x ) such that x1 x,

9 Maxisets for Bayes Rules 9 bi ˆθ x ) = ˆθ bi x ) for any x, 0 ˆθ b i x ) x for any x 0, ˆθ bi x ) θ max θ ; x θ ) for any x, θ ). Assumptions imposed on the Bayesian model enable us to deduce other properties. Assumption H 1 ) implies that u M 1, γu) γm 1 ) exp Mu M 1 )). It means that the tails of γ have to be exponential or heavier. We shall see below see Section.4) that this assumption is essential to get maxisets as large as possible. Furthermore, w,ε = w ε depends only on ε and we shall assume throughout this paper that π ε denoted H ): 1. ε π ε is continuous,. inf ε>0 π ε > 1, 3. π 1 = exp1), ε 0 4. π ε +, = 1 w ε )w 1 ε 5. ε ε 0 log π ε 0. b1 Then, we deduce the asymptotic behavior of ˆθ x ). satisfies the following mild assumptions, globally Proposition. Assume that H 1 ) and H ) hold. For all < Λ ε, ˆθ b 1 x ) is a thresholding rule, i.e., there exists a uniquely defined tπ ε ) such that ˆθ b1 x ) = 0 s,ε x tπ ε ), where the threshold tπ ε ) satisfies tπ ε ) logπ ε ) for π ε large enough, and lim π ε + tπ ε ) logπε ) = 1. Furthermore, there exists a positive constant C such that lim sup s,ε x s,ε ˆθb 1 x ) 1 s,ε x >tπ ε ) C. π ε + Remar. The threshold tπ ε ) introduced in Proposition will be used throughout this paper, even though it is only implicitly defined. The proof of this proposition is given in the Appendix. Since < Λ ε, ˆθ b 1 x ) is a thresholding rule, it will be easy to evaluate the maxiset for ˆθ b1. As for the posterior mean, we have the following useful result. Proposition 3. Assume that H 1 ) and H ) hold. Let < Λ ε be fixed. There exist two functions ε 1 and ε bounded on [1, + ) such that ε 1 x) x 0, and 6) ˆθb x ) = x 1 + ε 1 s,ε x ) 1 + π ε φs,ε x )γs,ε x ) 1 ε s,ε x ),

10 10 V. Rivoirard where φ denotes the density of a 0, 1) Gaussian variable. If tπ ε ) is the threshold introduced in Proposition, there exists a positive constant C such that lim sup s,ε x s,ε ˆθb x ) 1 s,ε x >tπ ε ) C. π ε + Proposition 3 is proved in the Appendix. Remar 3. By using the results of Proposition 3, we have for π ε large enough, ˆθ b s 1,ε tπ ε ) ) π 1 ε s,ε 1 log πε. Unlie ˆθ b 1 x ), for all < Λ ε, ˆθ b x ) is not a thresholding rule, since ˆθ b x ) 0 if x 0, and we can easily prove that under H 1 ), lim x u exp 1 ) u )γu) du 1 s,ε x 0 + ˆθb exp 1 u )γu) du + π x ) = 1. ε Now, we are ready to evaluate the maxisets for both Bayesian rules. In Sections. and.3, for i {1, }, the ris of ˆθ b i studied under the l p -norm 1 p < ), will be denoted R p ˆθ b i ). We have: ) 1/p. R p ˆθ b i ) = E ˆθ b i θ p l p ) 1/p = E ˆθ bi x ) θ p b1.. Maxisets for the posterior median. Since < Λ ε, ˆθ x ) is a thresholding rule, it is easy to evaluate the maxiset for ˆθ b 1. On the one hand, we have: Theorem. Assume that H 1 ) and H ) hold. Let 0 < r < and 1 p < be two fixed real numbers. Suppose that ε > 0, 1 Λ ε = ε log π ε ) r and there exist two positive constants T 1 and T such that ε > 0, 7) 8) ε p πε 1 log π ε ) 1 p T1, π 1 8 ε log π ε ) 1 4 p σ p T. <Λ ε Let q be a fixed positive real number such that q < p. If θ B 1 r 1 q/p) p, then there exists a positive constant C such that ε > 0, R p ˆθ b1 ) Cε log π ε ) 1 q/p. wl p,q σ),

11 Maxisets for Bayes Rules 11 The proof of this theorem that uses Proposition is omitted since it is inspired by the proof of Theorem 3.1 of Keryacharian and Picard [6] and is very similar to the proof of Theorem 4 of Section.3. On the other hand, we have: Theorem 3. Assume that H 1 ) and H ) hold. Let 0 < r < and 1 p < be two fixed real numbers. Suppose that ε > 0, Λ ε = ε log π ε ) r. Let q be a fixed positive real number such that q < p. constant C such that If there exists a positive then θ B 1 r 1 q/p) p, ε > 0, R p ˆθ b 1 ) Cε log π ε ) 1 q/p, wl p,q σ). Before proving Theorem 3, let us recall the following result Lemma. of Keryacharian and Picard [6]): Proposition 4. For any 1 p < and 0 < q < p, { wl p,q σ) = θ = θ ) 1 : sup λ } q p 1 θ λσ θ p <. Proof of Theorem 3. Since π 1 = exp1), we have Λ 1 = 1 and θ p l p any ε > 0, θ p E ˆθ b1 θ p l p C p Λ p q)/r ε, Λ ε and since ε Λ ε is continuous with lim ε 0 Λ ε = +, we have: C p. For sup λ pr 1 qp ) θ p C p, λ and θ B 1 r 1 q p ) p,. In the following, we shall use tπ ε ) denoted t if there is no ris of confusion) introduced in Proposition and the following inequality: for any m > 0, with Z N 0, 1), for ε small enough, 9) P s,ε x s,ε θ t ) P Z 1 ) log πε m m e u / du 1 m log πε π exp 1 1 ) ) log πε e m u 1 m Klog π ε ) 1/ π 1/m ε, 0 log πε du π

12 1 V. Rivoirard where K depends only on m. So, for < Λ ε, and for ε small enough, θ p 1 s,ε θ t = θ p E 1 s,ε θ t 1 s,ε x t) + E θ p 1 s,ε θ t 1 s,ε x <t) θ p 1 s,ε θ t P s,ε x s,ε θ t ) + E ˆθ b1 x ) θ p Therefore, for ε small enough, 1 θ p 1 s,ε θ t + E ˆθ b 1 x ) θ p. θ p 1 s,ε θ t E ˆθ b 1 θ p l p C p ε log π ε ) p q. Since ε π ε is continuous, it implies that there exists λ 0 > 0 such that sup λ<λ 0 λ q p θ p 1 θ σ λ <. Since θ B 1 r 1 q p ) p,, sup λ λ 0 λ q p θ p 1 θ σ λ λ q p 0 θ p <. Using Proposition 4, we have proved that θ B 1 r 1 q p ) p, wl p,q σ). Finally, from Theorems and 3, we deduce easily: Corollary 1. Assume that H 1 ) and H ) hold. Let 0 < r < and 1 p < be two fixed real numbers. Suppose that ε > 0, Λ ε = ε log π ε ) r such that 7) and 8) are satisfied. Let q be a fixed positive real number such that q < p. Then, MS ˆθb 1, ε log π ε ) 1 q/p, p ) = wl p,q σ) B 1 r 1 q/p) p,. We can conclude that the spaces B 1 r 1 q p ) p, wl p,q σ) appear as maximal spaces b1 where ˆθ attains specific rates of convergence..3. Maxisets for the posterior mean. As in Section., we prove the following results: Theorem 4. Assume that H 1 ) and H ) hold. Let 0 < r < and 1 p < be two fixed real numbers. Suppose that ε > 0, Λ ε = ε log π ε ) r

13 Maxisets for Bayes Rules 13 and there exist two positive constants T 1 and T such that ε > 0, 10) 11) ε p π 1 4 ε log π ε ) 1 p T1, π 1 3 ε log π ε ) 1 4 p σ p T. <Λ ε Let q be a fixed positive real number such that q < p. If θ B 1 r 1 q/p) p, then there exists a positive constant C such that ε > 0, R p ˆθ b ) Cε log π ε ) 1 q/p. wl p,q σ), Proof. In the following, K will denote a constant independent of ε that may be different at each line. Let t = tπ ε ) be the threshold introduced in Proposition. For all < Λ ε, E ˆθ b x ) θ p = A + B + C, with and A = E ˆθ b x ) θ p 1 s,ε x t, B = E ˆθ b x ) θ p 1 t < s,εx t, C = E ˆθ b x ) θ p 1 s,ε x >t. For the first term, we have, using Remar 3 and 9), A = E ˆθ b x ) θ p 1 s,ε x t p 1 E ˆθ b x ) p 1 s,ε x t + p 1 θ p E 1 s,ε x t Ks p,ε π p ε log π ε ) p + p 1 θ p E [1 s,ε x t 1 s,ε θ >t] + p 1 θ p E [1 s,ε x t 1 s,ε θ t] Kσ p εp π p ε log π ε ) p + p 1 θ p P s,ε x s,ε θ t ) + p 1 θ p 1 s,ε θ t Kσ p εp π p ε log π ε ) p + K θ p π 1 4 ε log π ε ) 1 + p 1 θ p 1 s,ε θ t. The second term can be split into B = B 1 + B + B 3, with Using Proposition 1 and 9), B 1 = E ˆθ b x ) θ p 1 t < s,εx t1 s,ε θ >3t, B = E ˆθ b x ) θ p 1 t < s,εx t1 t 4 < s,εθ 3t, B 3 = E ˆθ b x ) θ p 1 t < s,εx t1 s,ε θ t 4. B 1 = E ˆθ b x ) θ p 1 t < s,εx t1 s,ε θ >3t

14 14 V. Rivoirard E x θ p 1 x θ θ 1 t < s,εx t1 s,ε θ >3t + θ p E 1 x θ < θ 1 t < s,εx t1 s,ε θ >3t E x θ p ) 1 P s,ε x s,ε θ t) 1 + θ p P s,ε x s,ε θ t) Kε p σ p π 1 ε log π ε ) K θ p π 1 ε log π ε ) 1, B = E ˆθ b x ) θ p 1 t < s t,εx t1 4 < s,εθ 3t E x θ p 1 x θ θ 1 t 4 < s,εθ + θ p E 1 x θ < θ 1 s,ε θ 3t Kε p σ p P s,ε x s,ε θ t ) 1 + θ p 1 s,ε θ 4 3t Kε p σ p π 1 3 ε log π ε ) θ p 1 s,ε θ 3t, B 3 = E ˆθ b x ) θ p 1 t < s,εx t1 s,ε θ t 4 E x θ p 1 s,ε x s,ε θ t 4 E x θ p ) 1 P s,ε x s,ε θ t ) 1 4 Kε p σ p π 1 3 ε log π ε ) 1 4. For the last term, we denote τ, ε) = s,ε x s,ε ˆθb x ) and recall that ξ is the noise. Using Proposition 3 and 9), we have C = E ˆθ b x ) θ p 1 s,ε x >t = E ˆθ b x ) θ p 1 s,ε x >t1 s,ε θ t + E ˆθ b x ) θ p 1 s,ε x >t1 s,ε θ >t s p,ε E ξ τ, ε) p 1 s,ε x >t) 1 P s,ε x s,ε θ t) 1 + s p,ε E ξ τ, ε) p 1 s,ε x >t1 s,ε θ >t Ks p,ε P s,εx s,ε θ t) 1 + Ks p,ε 1 s,ε θ >t Kε p σ p π 1 ε log π ε ) s,ε θ >t). Finally, for ε small enough, and < Λ ε, [ E ˆθ b x ) θ p K ε p σ p 1 π 3 ε log π ε ) ε p σ p 1 θ >εσ log πε + θ p 1 θ 4εσ log πε + θ p π 1 4 ε log π ε ) 1 We conclude by using Proposition 4 and by observing that E ˆθ b θ p l p = E ˆθ b x ) θ p + θ p Kε log πε ) p q, <Λ ε Λ ε since θ B 1 r 1 q p ) p, wl p,q σ) and Λ ε = ε log π ε ) r. ].

15 Maxisets for Bayes Rules 15 As in Section., we have a converse result, but unlie Theorem 3, we need to control the size of the σ s. Theorem 5. Assume that H 1 ) and H ) hold. Let 0 < r < and 1 p < be two fixed real numbers. Suppose that ε > 0, Λ ε = ε log π ε ) r. Let q be a fixed positive real number such that q < p. constant C such that then θ B 1 r 1 q p ) p, ε > 0, R p ˆθ b ) Cε log π ε ) 1 q/p, If there exists a positive wl p,q σ) as soon as there exists a positive constant T such that σ p T. <Λ ε 1) ε > 0, π p ε Proof. To prove that θ B 1 r 1 q p ) p,, we refer the reader to the proof of Theorem 3. Then, we want to show that sup λ q p θ p 1 θ λσ <. For this, we still use the threshold t = tπ ε ) of Proposition. Using 9), for any < Λ ε and for ε small enough, θ p 1 s,ε θ t = θ p E 1 4 s,ε θ t 1 4 s,ε x t ) + θ p E 1 s,ε θ t 1 4 s,ε x < t ) θ p 1 s,ε θ t P s 4,ε x s,ε θ t ) 4 + θ p 1 s,ε θ t P s 4,ε x < t ) 1 θ p 1 s,ε θ t 4 + θ p 1 s,ε θ t 4 P s,ε x < t By using Remar 3, we have for ε small enough, ˆθ b s 1 t ),ε π 1 ε s 1,ε log πε. Therefore, θ p 1 s,ε θ t θ 4 p 1 s,ε θ t P s 4,ε x < t ) [ E ˆθ b x ) θ p + ˆθ b x ) p] 1 s,ε x < t p p ). E ˆθ b x ) θ p + p <Λ ε E ˆθ b x ) p 1 s,ε x < t p E ˆθ b θ p l p + p <Λ ε s p p,ε π ε log π ε ) p p C p ε log π ε ) p q + p ε p π p ε log π ε ) p <Λ ε σ p.

16 16 V. Rivoirard Consequently, under condition 1), for ε small enough, θ p 1 θ p C p + T )ε log π 1 4 σ ε log π ε ) p q. ε Using the same arguments as for the proof of Theorem 3, the last inequality implies that θ wl p,q σ). Finally, from Theorems 4 and 5, observing that condition 1) is less restrictive than condition 11) since p/ 1/ > 1/3, we deduce easily: Corollary. Assume that H 1 ) and H ) hold. Let 0 < r < and 1 p < be two fixed real numbers. Suppose that ε > 0, Λ ε = ε log π ε ) r such that 10) and 11) are satisfied. Let q be a fixed positive real number such that q < p. Then, MS ˆθb, ε log π ε ) 1 q/p, p ) = wl p,q σ) B 1 r 1 q/p) p,. Once more, we can conclude that the spaces B 1 r 1 q p ) p, wl p,q σ) appear as maximal spaces where ˆθ b attains specific rates of convergence..4. First conclusions. Before going further, let us compare the various procedures involved in this paper. We recall that unlie ˆθ b 1 x ), ˆθ b x ) does not possess the advantage of being a thresholding rule < Λ ε ) and this explains the differences between the assumptions that are needed to determine the respective b1 maxisets associated with the Bayesian procedures ˆθ and ˆθ b. For instance, this explains why, unlie ˆθ b1 b, if ˆθ achieves the given rate of convergence, we need a condition on the σ s to prove that θ B 1 r 1 q p ) p, wl p,q σ). Furthermore, to obtain the upper bound for R p ˆθ b ), we use a decomposition into eleven terms for E ˆθ b x ) θ p. Since it is a thresholding rule, the corresponding decomposition b1 for ˆθ is simpler. This explains why the assumptions of Theorem 4 are a bit more restrictive than those of Theorem. Actually, to obtain the assumptions of Theorem 4, we just have to replace π ε with πε 1/4 in the assumptions of Theorem. But, since we consider a rate of the form ε log π ε without focusing on the optimal constant, the Bayesian procedures achieve exactly the same performance from the maxiset point of view. When π ε is a power of ε, then, by using Theorem 1, we can compare the Bayesian procedures ˆθ b 1 and ˆθ b with the thresholding one. We can conclude that each of them achieves the same performance as the thresholding one. Finally, since linear estimates are outperformed by thresholding ones, they are also outperformed by ˆθ b 1 and ˆθ b. Let us show now the importance of Assumption H 1 ). Section 7. of Johnstone and Silverman [4] proves that if γ is a normal density, whatever the value of w,ε, the posterior median satisfies ˆθ b1 x ) 1 α) x,

17 Maxisets for Bayes Rules 17 for some α > 0 and the same inequality holds for the posterior mean. It yields for θ > 0, E ˆθ b 1 x ) θ p 1 αp θ p. Moreover, when γ has tails equivalent to exp C t λ ) for some λ 1, ), Johnstone and Silverman showed that for large θ, ˆθ b 1 x ) θ C θ λ 1. Thus H 1 ) cannot be essentially relaxed without obtaining smaller maxisets..5. Maxisets of Bayesian procedures for estimating functions of L p spaces. In this section, we estimate functions of L p D) = { f : f Lp = D fx) p dx) 1/p < }, where D = [0, 1] d or D = R d. For this purpose, we exploit a wavelet basis of L D) denoted B = {ψ, 1}. More precisely, we assume that ψ ) 1 is the wavelet-tensor product constructed on compactly supported wavelets see Meyer [31]). So, if 1 < p <, Meyer [31] proved that B is an unconditional basis of L p D), which means that: for any f L p D), there exists a unique sequence θ such that f = θ ψ, there exists an absolute constant K such that if 1, θ θ, then θ ψ K θ ψ Lp. Lp Remar 4. The restriction 1 < p < is due to the fact that there is no unconditional basis if p / 1, ). Furthermore, we assume that {σ ψ, 1} satisfies the following inequality, called a superconcentration inequality: for any 0 < r 1 <, there exists a constant Cp, r 1 ) such that for all F {1,,... }, [ F σ ψ r1 ] 1 r 1 Lp Cp, r 1 ) sup σ ψ Lp. F Remar 5. Theorem 4. of Keryacharian and Picard [6] gives conditions on the σ s and on B to satisfy the above superconcentration inequality. We consider the model 1), and the function f = θ ψ is estimated by ˆf b 1 = b1 ˆθ x )ψ or ˆf b = b ˆθ x )ψ. In this framewor, we set: Definition 3. For 1 < p < and any i {1, }, the maxiset of ˆf b i associated with the rate ρ ε and the L p -loss is MS ˆf bi, ρ ε, p) = { f = [ θ ψ : sup E ˆf ] bi f p L p ) 1 p ρ 1 ε ε } <.

18 18 V. Rivoirard To study maxisets of ˆf b 1 and ˆf b, we introduce for any η > 0 and any 0 < q < p: B η p, B) = wl p,q σ)b) = { f = { f = θ ψ : sup λ η θ ψ : sup λ q λ θ ψ Lp } <, } 1 θ >λσ σ p ψ p L p <. If B is a standard wavelet basis regular enough, the space B η p, B) can be identified with a real Besov space. See Meyer [31] for further details. Theorem 6. Assume that H 1 ) and H ) hold. Let 0 < r < be a fixed real number. Suppose that ε > 0, Λ ε = ε log π ε ) r, and there exist two positive constants T 1 and T such that for any ε > 0, and ε p [ πε 1 log π ε ) ] 1 1 minp;) log π ε ) p T1 π 1 8 ε log π ε ) 1 4 p <Λ ε σ p ψ p L p T. Let q be a fixed positive real number such that q < p. Then, under the model 1), MS b ˆf 1, ε log π ε ) 1 q/p, p ) = wl p,q σ)b) B 1 r 1 q/p) p, B). The analogous result for ˆf b is obtained if we assume that for any ε > 0, and ε p [ π 1 4 ε log π ε ) ] 1 1 minp;) log π ε ) p T3, π 1 3 ε log π ε ) 1 4 p where T 3 and T 4 are two positive constants. <Λ ε σ p ψ p L p T 4, Once more, when π ε is a power of ε i.e., for the rate ε log1/ε)), by using Theorems 5.1 and 5. of Keryacharian and Picard [6] we can conclude that each of the Bayesian procedures achieves the same performance as the thresholding one. Proof. Again, we only give the proof for the procedure associated with the mean. Recall that B is an unconditional basis of L p D) if and only if there exists M > 0 such that for any set F {1,,... } and any choice of the coefficients c s 13) M 1 c ψ Lp F F c ψ ) 1 Lp M c ψ. Lp F

19 Maxisets for Bayes Rules 19 Furthermore, since {σ ψ, 1} satisfies a superconcentration inequality, there exist two positive constants c p and C p such that for any F {1,,... }, we have: ) p 14) c p σ ψ p σ ψ Cp σ ψ p. F F F In the following, K will denote a constant independent of ε that may be different at each line. We use the threshold t = tπ ε ) introduced in Proposition and the results of Proposition 1. Let us assume that f wl p,q σ)b) B 1 r 1 q/p) p, E ˆθ b θ )ψ p L p is bounded by K 11 A 1 = θ ψ Λ ε p L p B). Then, i=1 A i, with the A i s defined as follows: K ε log π ε ) p q), since f B 1 r 1 q/p) p, B). Next, A = E θ ψ 1 s,ε x t 1 s,ε θ >t p L p <Λ ε ) p KE θψ 1 s,ε x s,ε θ t, <Λ ε by using 13). If p, by using 9), the Jensen inequality, and 13), A K θψ P s,ε x s,ε θ t/ )) p <Λ ε K πε 1/4 log π ε ) ) 1 p θ ψ p. L p <Λ ε If p, by using 9), the generalized Minowsi inequality, and 13) A K θψ P s,ε x s,ε θ t/ ) ) p p <Λ ε Kπε 1/4 log π ε ) 1 θ ψ p L p <Λ ε and A 3 = E θ ψ 1 s,ε x t 1 s,ε θ t <Λ ε p L p by using 13). Further, A 4 = E ˆθb ψ 1 s,ε x t p KE L p <Λ ε ) Kε p πε p/ log π ε ) p p σψ <Λ ε K θψ 1 s,ε θ t <Λ ε <Λ ε ˆθ b ) 1 s,ε x t ψ Kε p π p/ ε log π ε ) p ) p ) p <Λ ε σ p ψ p L p,,

20 0 V. Rivoirard by using 13), Remar 3, and 14). Using 13), we have: A 5 = E θ )ψ 1 t < s t,εx t1 If p, <Λ ε ˆθ b KE A 5 K <Λ ε x θ ) ψ 1 s,ε x s,ε θ t 4 <Λ ε ψ p E Kε p π 1 3 ε log π ε ) < s,εθ 3t1 x θ θ p L p ) p. x θ p ) 1 s,εx s,εθ t 4 <Λ ε ψ p σ p, by using 9) and the Cauchy Schwarz inequality. If p, by using 9), the generalized Minowsi inequality, and 14), A 5 K ψ E x θ p ) ) p 1 s,εx s,εθ t p 4 <Λ ε Kε p πε 13 log π ε ) 1 4 ψ p σ p. Now we have A 6 = E ˆθ b <Λ ε <Λ ε θ )ψ 1 t < s,εx t1 t K θψ 1 s,ε θ 3t <Λ ε ) p by using 13). By using 13), the following terms: A 7 = E θ )ψ 1 t ˆθ b <Λ ε, 4 < s,εθ 3t1 x θ < θ p L p < s,εx t1 s,ε θ >3t1 x θ θ p L p KE x θ ) ψ1 s,ε x s,ε θ t <Λ ε A 8 = E θ )ψ 1 t ˆθ b <Λ ε KE θψ 1 s,ε x s,ε θ t <Λ ε A 9 = E ˆθ b θ )ψ 1 t <Λ ε KE ) p < s,εx t1 s,ε θ >3t1 x θ < θ p L p ) p < s,εx t1 s,ε θ t p 4 L p <Λ ε x θ ) ψ 1 s,ε x s,ε θ t 4,, ) p

21 Maxisets for Bayes Rules 1 are bounded, up to a constant, by the upper bounds of A or A 5. Next, A 10 = E ˆθ b <Λ ε KE θ )ψ 1 s,ε x >t1 s,ε θ t ˆθ b <Λ ε p L p θ ) ψ 1 s,ε x >t1 s,ε x s,ε θ >t ) p, by using 13). If p, by using 9), the Cauchy Schwarz inequality, and Proposition 3, A 10 K K ψ p E ˆθ b <Λ ε ψ p ε p σ p P <Λ ε Kε p π 1/ ε log π ε ) 1 4 θ p 1 s,ε x >t1 s,ε x s,ε θ >t s,ε x s,ε θ > t ) 1 <Λ ε σ p ψ p. If p, by using the generalized Minowsi inequality, the Cauchy Schwarz inequality, 9), 14), and Proposition 3, A 10 K <Λ ε ψ E ˆθb θ p 1 s,ε x >t1 s,ε x s,ε θ >t K ψσ ε P s,ε x s,ε θ > t) 1 p <Λ ε Kε p πε 1/ log π ε ) 1/4 ψ p σ p. <Λ ε ) p ) ) p p Finally, A 11 = E ˆθ b <Λ ε KE θ )ψ 1 s,ε x >t1 s,ε θ >t ˆθ b <Λ ε p L p θ ) ψ 1 s,ε x >t1 s,ε θ >t ) p. If p, by using Proposition 3, A 11 K Kε p ψ p E ˆθ b θ p 1 s,ε x >t1 s,ε θ >t <Λ ε ψ p σ p 1 s,ε θ >t. <Λ ε

22 V. Rivoirard If p, by using 14) and Proposition 3, A 11 K ψ 1 s,ε θ >t E ˆθb θ p 1 s,ε x >t <Λ ε Kε p <Λ ε ψ σ 1 s,ε θ >t ) p <Λ ε ) p ) p Kε p ψ p σ p 1 s,ε θ >t. Thus we conclude that there exists a positive constant C such that ε > 0, E ˆθ b x p ) θ )ψ C p ε ) p q), log π ε L p by using the following result proved by Keryacharian and Picard [6] upper bound of the term B, p. 311): if f = θ ψ wl p,q σ)b), then λ > 0, θψ 1 θ σ λ ) p Kλ p q. Now, let us assume that there exists a positive constant C such that ε > 0, E ˆθ b x ) θ )ψ p L p C p ε log π ε ) p q). To bound the following term, we just use 13): A 1 = θ ψ Λ ε KE ˆθ b p L p = E θ )ψ p ˆθ b Λ ε L p θ )ψ p L p K ε log π ε ) p q, KE ˆθ b θ ) ψ ) p which proves that f = θ ψ B 1 r 1 q/p) p, B). Now, using 13), θ ψ 1 s,ε θ t p KA A 13 ), L p with A 13 = p L p θ ψ 1 s,ε θ t 4 <Λ ε p 1 E θ ψ 1 s,ε θ t 1 4 s,ε x > t p + p 1 A 14 L p <Λ ε 1 θ ψ 1 s,ε θ t p + p 1 A L p A 13 + p 1 A 14 p A 14, <Λ ε

23 Maxisets for Bayes Rules 3 for ε small enough see the upper bound of A ), where A 14 = E θ ψ 1 s,ε θ t 1 4 s,ε x t p. L p <Λ ε But, using 13), we obtain A 14 KE ˆθ b θ p )ψ + KE ˆθb ψ 1 s,ε x t L p <Λ ε KE ˆθ b θ p )ψ + KA 4 K ε ) p q), log π ε L p p L p which implies that θ ψ 1 s,ε θ t 4 p L p K ε log π ε ) p q). Now, we use Lemma 5.1 of Keryacharian and Picard [6], which ends the proof of the theorem. 3. Relationships between M 1 ) and wl p,q σ) Spaces In this paper, our aim is to estimate sparse sequences, and we model sparsity within a Bayes approach, and more precisely, by using the model M 1 ). We noticed that under this model, the maxisets for the previous Bayes rules are defined by using wl p,q σ) spaces. To some extent, this result is not surprising, since we recalled in the Introduction that these spaces are weighted versions of wea l q spaces that naturally measure sparsity. Then, the model M 1 ) and wl p,q σ) spaces are connected via a maxiset approach. So, it is natural to wonder whether we can establish other more natural connections between our Bayesian approach to model sparsity and wl p,q σ) spaces. The following result gives a positive answer. Theorem 7. Suppose that we are given 1 p < and 0 < q < p. We again consider the model M 1 ) with ε = 1. Denote w = w,1 and λ 0, F λ) = γx) dx. If there exists a constant C such that λ sup λ q σ p 1 θ >σ λ C q a.s., then sup λ q F λ) w σ p Cq. Conversely, if there exists a constant C such that 15) sup λ q F λ) w σ p Cq, then sup λ q σ p 1 θ >σ λ < a.s.

24 4 V. Rivoirard Proof. If sup λ q σ p 1 θ >σ λ C q a.s., we have: sup λ q F λ) w σ p = sup E E λ q sup λ q σ p 1 θ >σ λ ) ) σ p 1 θ >σ λ C q. Conversely, suppose that 15) is true. To establish the required inequality, we exploit Theorem 0.3 of Marcus and Zinn [30]. Let r ) 1 be a Rademacher sequence i.e., a sequence of i.i.d. random variables taing values +1 and 1 with probability 1/ each) independent of θ ) 1 and let S n be the partial sum of the symmetrized random variables σ p q θ q ) 1 : S n = n Y, =1 where We have Y = r σ p q θ q. E Y 1 Y 1) = 0, θ P Y > 1) = P σ q ) > σ p vary 1 Y 1) = E Y 1 Y 1) = σ p p/q σ = w σ p 0 C q w σ p C q w σ p = w F σ p/q E [ σ 1 x q γx) dx w σ p ) C q w σ p θ ) q 1 σ 1 θ σ p/q σ p/q ) 1w p/q σ σ p qx q 1 dx 0 ) 1w σ p. 0 ] ) 1w σ p, qx q 1 F x) dx Using the three series theorem see Theorem.8 of Billingsley [6]), S n converges with probability 1 as n +. Therefore, if µ is a fixed positive real number, we have for any increasing sequence of positive real numbers b n ) n with lim n + b n = +, lim sup n + 1 b n S n µ a.s.

25 Maxisets for Bayes Rules 5 Obviously, lim sup n + sup 1 j n 1 sup λ q j σ p b E 1 θ >σ λ) µ n =1 a.s. Then, we can apply Theorem 0.3 of Marcus and Zinn [30], which shows that Since it yields lim sup n + 1 sup λ q n σ b p 1 θ >σ λ P θ > σ λ) ) 1160 µ. n lim sup n + lim sup n + =1 1 sup b n 1 sup b n λ q λ q n =1 n =1 σ p P θ > σ λ) = 0, σ p 1 θ >σ λ 1160 µ. With µ 0, we have proved that for any increasing sequence of positive real numbers b n ) n with lim n + b n = +, lim n + 1 sup b n λ q n =1 σ p 1 θ >σ λ = 0 a.s. If the random sequence A n = sup λ q n =1 σp 1 θ >σ λ were not bounded, we could construct an increasing function Φ, with lim n + Φn) = +, such that A Φn) > n. By considering an increasing sequence b n ) n, with b Φn) = n, we obtain a contradiction. So, there exists a finite random variable Y such that n 1, A n Y, a.s. It implies that sup λ q σ p 1 θ >σ λ < a.s. The result is proved. So, to ensure that a sequence coming from the Bayesian model M 1 ) belongs to wl p,q σ) almost surely, we should not consider densities γ having tails heavier than those of Paretoq)-distributions. In the wavelet framewor, with special values for the σ s, Rivoirard [33] has already noted the strong connections between Paretoq)- distributions and wl p,q σ) spaces, since in Section. of that paper, least favorable priors for these spaces are presented and it is explained how these priors are built from Paretoq)-variables. Concluding remars. In this paper, we discussed the modelling of sparsity. The form of our Bayesian model was the following: θ w,ε γ,ε θ ) + 1 w,ε )δ 0 θ ), 1.

26 6 V. Rivoirard Provided the tails of γ,ε are exponential or heavier, the maxisets of the Bayes rules are wl p,q σ) spaces that naturally measure the sparsity of a signal. So, our choice for the Bayesian modelling seems appropriate. It is all the more appropriate since this model enables us to build typical realizations of wl p,q σ) spaces. The main goal of this paper was to compare in the maxiset approach the performances of classical Bayes estimators: the posterior median and mean of our Bayesian model. We proved that for a large range of loss functions, the maxisets of these estimators coincide with the maxisets associated with thresholding estimators. These results have been established for the heteroscedastic white noise model 1), where the ξ s are assumed to be Gaussian and independent. It would be interesting to study the maxisets of the Bayes rules without these assumptions. It would also be interesting to try to find Bayes estimators that outperform ˆθ b 1 and ˆθ b under the maxiset approach, if possible. Note that if the maxiset theory seems to provide advantages, the problem of optimality in this approach remains an entirely open issue. Can we introduce a meaningful notion of optimality? If yes, what are optimal estimators? Other natural questions arise: what do the maxisets become when the σ s are unnown and they are, for instance, estimated by using a Bayes approach? Do we obtain larger maxisets when the θ s are gathered in non-overlapping blocs, each of which is provided with a prior? Since the outcome of the maxiset approach is a functional space or a sequence space), we have not focused on the Bayes riss of the estimators. But, inspired by the maxiset point of view used in this paper, we could investigate the maximal set of prior distributions such that the associated Bayes ris of a given estimator achieves a prescribed rate. This provides an interesting topic for further research. 4. Appendix In this section, O x 1) will denote any function of x that is bounded as x +. We write o πε 1) for any function that is bounded by a function depending only on π ε and that tends to 0 as π ε tends to +. Furthermore, φ denotes the density of a 0, 1) Gaussian variable and γ is the density introduced in M 1 ). We shall exploit the following lemma: and Lemma 1. For any x > 0 and 0 < τ < x, define: K τ x) = Ix) = x τ 0 exp 1 v) γx v) dv exp 1 ) v + xv γv) dv. Under H 1 ), there exist four positive constants M, M 3, C 1, and C such that M x τ exp 1 v Mv ) dv K τ x)γx) 1 M 3 τ exp 1 ) v + Mv dv, and C 1 lim inf x + Ix)γx) 1 φx) lim sup Ix)γx) 1 φx) C <. x +

27 Maxisets for Bayes Rules 7 Proof. Under H 1 ), since γ is positive, absolutely continuous, symmetric, and unimodal, it is easy to show that there exist two constants M and M 3 such that, a, b) R, M exp M a b ) γa)γb) 1 M 3 expm a b ). We immediately get the first inequality. Now, let us define x > 0, As before, M 0 Jx) = exp 1 v Mv By simple computations, we have: 0 exp 1 v) γx + v) dv. ) dv Jx)γx) 1 M 3 0 exp 1 ) v + Mv dv. which implies the result. Ix) = exp 1 x) Jx) + K0 x) ), Now, let us give the proof of Propositions and 3. Proof of Proposition. Without loss of generality, we can assume that x > 0. Then, using Proposition 1, ˆθ b1 x ) = 0 Pθ > 0 x ) < 1 w ε s,ε φ s,ε x θ) ) s,ε γs,ε θ) dθ 0 < w ε s,ε φ s,ε x θ) ) s,ε γs,ε θ) dθ + 1 w ε )s,ε φs,ε x ) 0 0 s,ε φ s,ε x θ) ) s,ε γs,ε θ) dθ s,ε φ s,ε x θ) ) s,ε γs,ε θ) dθ < π ε s,ε φs,ε x ). Then ˆθ b 1 x ) = 0 s,ε x t, with t such that 0 exp 1 u) γu) exptu) exp tu) ) du = π ε. We have that t is a function depending only on π ε and as π ε +, 0 exp 1 u) γu) exptu) du = π ε 1 + o πε 1)).

28 8 V. Rivoirard Using Lemma 1, we have, for π ε large enough, logπε ) tπ ε ) logπ ε )1 + o πε 1)), and the first statement of Proposition is proved. For the second statement, we assume that s,ε x ˆθ b1 x ) > 0. Using 13), we have: > t, which implies that P ˆθb 1 θ ˆθ b 1 x ) 1 ) x = w ε φ x θ)γ,ε θ) dθ + 1 w ε )φ x ) = w ε φ x θ)γ,ε θ) dθ s,ε ˆθb 1 = exp Using Lemma 1, since s,ε x t, we prove easily that Therefore, π ε Is,ε x ) 1 = o πε 1). s,ε x u 1 u) γu) du + π ε exp s,ε x u 1 u) γu) du. s,ε ˆθb 1 exp s,ε x u 1 u) γu) du = Is,ε x )1 + o πε 1)). By using again Lemma 1, it implies that there exists a positive constant V such that for π ε large enough, s,ε ˆθb 1 s,εx exp 1 v) γv + s,ε x ) dv γs,ε x ) 1 V K s,ε x s,ε ˆθb 1 s,ε x )γs,ε x ) 1 V, in notations of Lemma 1. Finally, there exists a positive constant C such that lim sup s,ε x s,ε ˆθb 1 x ) 1 s,ε x >tπ ε ) C. π ε + Proof of Proposition 3. Without loss of generality, we can assume that x > 0. We have: ˆθ b x ) = = θγ φ,ε θ x ) dθ s,εθφs,ε x θ))s,ε γs,ε θ) dθ s,εφs,ε x θ))s,ε γs,ε θ) dθ + π ε s,ε φs,ε x ) = 1 u exp 1 u + s,ε x u ) γu) du s +,ε exp 1 u + s,ε x u ). γu) du + π ε

Nonparametric estimation using wavelet methods. Dominique Picard. Laboratoire Probabilités et Modèles Aléatoires Université Paris VII

Nonparametric estimation using wavelet methods. Dominique Picard. Laboratoire Probabilités et Modèles Aléatoires Université Paris VII Nonparametric estimation using wavelet methods Dominique Picard Laboratoire Probabilités et Modèles Aléatoires Université Paris VII http ://www.proba.jussieu.fr/mathdoc/preprints/index.html 1 Nonparametric

More information

D I S C U S S I O N P A P E R

D I S C U S S I O N P A P E R I N S T I T U T D E S T A T I S T I Q U E B I O S T A T I S T I Q U E E T S C I E N C E S A C T U A R I E L L E S ( I S B A ) UNIVERSITÉ CATHOLIQUE DE LOUVAIN D I S C U S S I O N P A P E R 2014/06 Adaptive

More information

Introduction Wavelet shrinage methods have been very successful in nonparametric regression. But so far most of the wavelet regression methods have be

Introduction Wavelet shrinage methods have been very successful in nonparametric regression. But so far most of the wavelet regression methods have be Wavelet Estimation For Samples With Random Uniform Design T. Tony Cai Department of Statistics, Purdue University Lawrence D. Brown Department of Statistics, University of Pennsylvania Abstract We show

More information

Inverse problems in statistics

Inverse problems in statistics Inverse problems in statistics Laurent Cavalier (Université Aix-Marseille 1, France) Yale, May 2 2011 p. 1/35 Introduction There exist many fields where inverse problems appear Astronomy (Hubble satellite).

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

OPTIMAL POINTWISE ADAPTIVE METHODS IN NONPARAMETRIC ESTIMATION 1

OPTIMAL POINTWISE ADAPTIVE METHODS IN NONPARAMETRIC ESTIMATION 1 The Annals of Statistics 1997, Vol. 25, No. 6, 2512 2546 OPTIMAL POINTWISE ADAPTIVE METHODS IN NONPARAMETRIC ESTIMATION 1 By O. V. Lepski and V. G. Spokoiny Humboldt University and Weierstrass Institute

More information

The Stein hull. Clément Marteau* Institut de Mathématiques, Université de Toulouse, INSA - 135, Avenue de Rangueil, F Toulouse Cedex 4, France

The Stein hull. Clément Marteau* Institut de Mathématiques, Université de Toulouse, INSA - 135, Avenue de Rangueil, F Toulouse Cedex 4, France Journal of Nonparametric Statistics Vol. 22, No. 6, August 2010, 685 702 The Stein hull Clément Marteau* Institut de Mathématiques, Université de Toulouse, INSA - 135, Avenue de Rangueil, F-31 077 Toulouse

More information

Discussion of Regularization of Wavelets Approximations by A. Antoniadis and J. Fan
