1. Introduction 1.1. Model. In this paper, we consider the following heteroscedastic white noise model:

Size: px
Start display at page:

Download "1. Introduction 1.1. Model. In this paper, we consider the following heteroscedastic white noise model:"

Transcription

1 Volume 14, No ), pp. Allerton Press, Inc. M A T H E M A T I C A L M E T H O D S O F S T A T I S T I C S BAYESIAN MODELLING OF SPARSE SEQUENCES AND MAXISETS FOR BAYES RULES V. Rivoirard Equipe Probabilités, Statistique et Modélisation Laboratoire de mathématiques, CNRS-UMR 868, Bâtiment 45 Université Paris 11 Orsay, 15 rue Georges Clémenceau, Orsay Cedex, France Vincent.Rivoirard@math.u-psud.fr In this paper, our aim is to estimate sparse sequences in the framewor of the heteroscedastic white noise model. To model sparsity, we consider a Bayesian model composed of a mixture of a heavy-tailed density and a point mass at zero. To evaluate the performance of the Bayes rules the median or the mean of the posterior distribution), we exploit an alternative to the minimax setting developed in particular by Keryacharian and Picard: we determine the maxisets for each of these estimators. Using this approach, we compare the performance of Bayesian procedures with thresholding ones. Furthermore, the maxisets obtained can be viewed as weighted versions of wea l q spaces that naturally model sparsity. This remar leads us to investigate the following problem: how can we choose the prior parameters to build typical realizations of weighted wea l q spaces? Key words: Bayes rules, Bayesian model, heteroscedastic white noise model, maxisets, rate of convergence, sparsity, thresholding rules, weighted wea l q spaces. 000 Mathematics Subject Classification: 6C10, 6G05, 6G0. 1. Introduction 1.1. Model. In this paper, we consider the following heteroscedastic white noise model: 1) x = θ + εσ ξ, = 1,,..., where θ = θ ) 1 is an unnown sequence to be estimated by using observations x ) 1, ε > 0 is a small parameter, and ξ ) 1 is an independent and identically distributed i.i.d.) sequence of Gaussian variables with mean zero and unit variance. Along this paper, we assume that σ = σ ) 1 is a nown sequence of c 005 by Allerton Press, Inc. Authorization to photocopy individual items for internal or personal use, or the internal or personal use of specific clients, is granted by Allerton Press, Inc. for libraries and other users registered with the Copyright Clearance Center CCC) Transactional Reporting Service, provided that the base fee of $50.00 per copy is paid directly to CCC, Rosewood Drive, Danvers, MA

2 V. Rivoirard positive real numbers. This heteroscedastic white noise model, that appears as a generalization of the classical white noise model for which, we have 1, σ = 1), is extensively used by statisticians. Let us briefly recall the reasons for this large use and provide references. Given a nown linear operator A, we use the heteroscedastic white noise model when we have to estimate the solution f of the linear equation g = Af, with noisy observations of g. Most of the time, to deal with such a problem, we exploit the singular value decomposition of A and the sequence σ ) 1 is then the eigenvalues sequence of the operator A A, with A the adjoint of A. There is a considerable literature about statistical inverse problems. Let us cite Korostelev and Tsybaov [8], Donoho [14], Golubev and Khasminsii [19], Goldenshluger and Pereverzev [18], Cavalier, Golubev, Picard and Tsybaov [7] and the references cited below. Some well-posed inverse problems with noise can be reduced to 1) with σ 0. The condition σ + characterizes ill-posed problems. For instance, the sequences σ ) 1 associated with operators such as integration considered by Ruymgaart [35], the Radon transform see Cavalier and Tsybaov [8]), convolution for the case studied by Cavalier and Tsybaov [8] or operators for some elliptic differential equations see Mair and Ruymgaart [9]) have a polynomial growth. But, the σ s may grow exponentially. See, for instance, Pereverzev and Schoc [3] who considered the problem of satellite geodesy or the inverse problems associated with partial differential equations such as the heat equation see Mair and Ruymgaart [9]). In the wavelet context, Johnstone [1] and Johnstone and Silverman [] explained that the heteroscedastic white noise model can also be used to represent direct observations with correlated structure. More precisely, let us assume that we are given the following nonparametric regression model: ) Y i = fi/n) + e i, i {1,,..., n}, where n is an integer, f is the signal to be estimated, and the e i s are drawn from a stationary Gaussian process. By studying the autocorrelation function of the e i s, Johnstone [1] and Johnstone and Silverman [] showed that under a good choice of ε and σ = σ ) 1, the model 1) appears as a good approximation of the model ) when n is large. 1.. Bayesian model and Bayes rules. In this paper, we suppose that the sequence θ to be estimated is sparse. It means that only a small proportion of the θ s are non-negligible. When the θ s are wavelet coefficients, this assumption is natural since the underlying property of wavelets is that a function can be well approximated by a function with a relatively small proportion of nonzero wavelet coefficients. In this paper, we model the sparsity by using a Bayesian approach. For this purpose, we assume that the θ s are random and the distribution of θ is such that the θ s are independent and for any 1, there exist a fixed parameter w,ε 0, 1) depending on ε and and a fixed density γ, such that, with probability 1 w,ε, θ is equal to 0 and with probability w,ε, the density of θ is γ,ε, where γ,ε θ) = s,ε γs,ε θ), θ R and s,ε = εσ ) 1.

3 Maxisets for Bayes Rules 3 If δ 0 denotes the Dirac mass at 0, this model is written as follows: M 1 ) θ 1 w,ε )δ 0 θ ) + w,ε γ,ε θ ), 1. So, roughly speaing, the first term models the negligible components and the second one non-negligible ones. Bayesian procedures have now become very popular in signal estimation, since they often outperform classical procedures and in particular thresholding procedures from the practical point of view. See the very complete review paper of Antoniadis, Bigot and Sapatinas [4] who provide descriptions and practical comparisons of various Bayesian wavelet shrinage and wavelet thresholding estimators. It is relevant to note that most of wors about Bayesian procedures tae place in the practical framewor. However, we can cite Johnstone and Silverman [4, 5] and Abramovich, Amato and Angelini [1] who studied Bayesian procedures from the minimax point of view. Most of the authors consider quite similar Bayesian models and often, priors are based on normal distributions. For instance, in the wavelet framewor, Johnstone and Silverman [3] following Abramovich, Sapatinas and Silverman [3] and Clyde, Parmigiani and Vidaovic [10] consider a mixture of a normal component and a point mass at zero for the wavelet coefficients. Chipman, Kolaczy and McCulloch [9] impose a mixture of two Gaussian distributions with different variances for negligible and non-negligible wavelet coefficients. Let us add that, often, properties of conjugate families enable statisticians to point out easily Bayes rules when Gaussian priors are considered in the classical Gaussian white noise model. However, Johnstone and Silverman [4, 5] did not use Gaussian distributions and showed the advantages from the minimax point of view in considering heavy-tailed distributions. In the maxiset framewor, we shall draw similar conclusions concerning γ. The posterior distribution of θ given x is 3) γ φ,ε θ x ) = φ x θ )[w,ε γ,ε θ ) + 1 w,ε )δ 0 θ )] w,ε φ x θ)γ,ε θ) dθ + 1 w,ε )φ x ), where φ denotes the density of εσ ξ, using notations of Section 1.1. For all ε > 0, we assume that we are given a real number Λ ε > 1 depending only on ε and b1 tending to + as ε 0. We estimate each θ by ˆθ x b ) or by ˆθ x ) defined by the following procedure. b1 If < Λ ε, ˆθ x b ) respectively ˆθ x )) is the median respectively the mean) of the posterior distribution of θ given x. Therefore, these rules satisfy: for any b1 ˆθ < ˆθ x ), F γ φ ˆθ ) < 0.5 F,ε γ φ ˆθ b 1 x )), ˆθb x ) = θ γ φ,ε θ x ) dθ,,ε where F γ φ denotes the cumulative distribution function of γ φ,ε x ).,ε If Λ ε, then ˆθ b 1 x ) = ˆθ b x ) = 0.

4 4 V. Rivoirard The values of the hyperparameters w,ε, γ, and Λ ε will be chosen later. Other properties of these Bayes rules are given in Section.1. So, the choice for Bayes rules is very classical. Indeed, most of the time, in practice, the Bayes rules used by statisticians are the median, and more frequently the mean of the posterior distribution, which are generally better estimates than the mode see Berger [5], p. 101). Note that the posterior mean and the posterior median are built by using the posterior distribution on its whole support. For instance, let us cite Chipman, Kolaczy and McCulloch [9] and Clyde, Parmigiani and Vidaovic [10] who used the posterior mean. But Abramovich, Sapatinas and Silverman [3] considered the posterior median under a Bayesian model that has the same form as M 1 ). In their Bayesian framewor, unlie the posterior mean, the posterior median is a true thresholding rule. In the wavelet context, they showed several simulated examples for which this approach improves most of the traditional methods. If from the practical point of view, the median seems preferable to the mean, what happens under a theoretical approach? From the minimax point of view, Theorem 1 of Johnstone and Silverman [4] showed that the posterior median of their Bayes model achieves optimal rates of convergence under Besov body constraints and for l q - losses, with 0 < q. If the posterior mean is used, optimal rates are also achieved but only if 1 < q see Section 7.3 of [4]). This provides some theoretical justification for preferring the posterior median over the posterior mean. In this paper, to evaluate the performance of ˆθ b 1 = ˆθ b 1 x )) 1 and ˆθ b = ˆθ b x )) 1, we use neither a practical approach nor the minimax theory, which have been extensively considered, but the maxiset theory that we describe now The maxiset theory and functional spaces. Let us first motivate the introduction of the maxiset point of view. When nonparametric problems are explored, the minimax theory is the most popular point of view: it consists in ensuring that the used procedure ˆθ = ˆθ x )) 1 achieves the best rate on a given sequence space S. But, at first, the choice of S is arbitrary what ind of spaces has to be considered: Sobolev spaces? Besov spaces? why?), secondly, S could contain sequences very difficult to estimate. Since the unnown quantity θ = θ ) 1 could be easier to estimate, the used procedure could be too pessimistic and not adapted to the data. More embarrassing in practice, several minimax procedures may be proposed and the practitioner has no way to decide but his experiment. To answer these issues, another point of view has recently appeared: the maxiset point of view introduced by Cohen, DeVore, Keryacharian and Picard [1] and Keryacharian and Picard [6]. Given an estimate ˆθ, it consists in assessing the accuracy of ˆθ by fixing a prescribed rate ρ ε and pointing out the set of all the sequences θ that can be estimated by the procedure ˆθ at the target rate ρ ε. So, under the statistical model 1), we introduce the following definition. Definition 1. Let 1 p < and let ˆθ = ˆθ x )) 1 be an estimator. The maxiset of ˆθ associated with the rate ρ ε and the l p -loss is MSˆθ, ρ ε, p) = { [ θ = θ ) 1 : sup E ) 1/p ] } ˆθ x ) θ p ρ 1 ε <. ε 1

5 Maxisets for Bayes Rules 5 The maxiset point of view brings answers to the previous issues. Indeed, there is no a priori assumption on θ and then, the practitioner does not need to restrict his study to an arbitrary sequence space. The practitioner states the desired accuracy and then, nows the quality of the used procedure. Obviously, he chooses the procedure with the largest maxiset. Let us give first examples of maxiset results in the statistical framewor of this paper. For this purpose, we need to introduce the following sequence spaces. Definition. For all 1 p < and 0 < η <, we set { Bp, η = θ = θ ) 1 : sup λ } pη θ p <, λ and if q is a real number such that 0 < q < p, we set { wl p,q σ) = θ = θ ) 1 : sup λ q 1 θ >λσ σ p }. < Now, let us focus on thresholding rules associated with the universal threshold λ,ε = σ ε log ε see Donoho and Johnstone [16]): for all ε > 0, we assume that we are given a real number Λ ε > 0 only depending on ε and tending to + as ε 0, and we set { ˆθ x t x 1 x κ ) = λ,ε if < Λ ε, 0 otherwise, where κ is a constant. Keryacharian and Picard [6] have studied the maxisets for this procedure. They obtained the following result for ˆθ t = ˆθ t x )) 1 Theorems 3.1 and 3. of Keryacharian and Picard [6]): Theorem 1. Let 1 p < be a fixed real number and 0 < r <. We suppose that 4) ε, Λ ε = ε log ε ) r, and there exists a positive constant T such that ε 5) ε κ /16 log ε 1/4 p/ <Λ ε σ p T. Let q be a fixed positive real number such that q < p. Then, if κ p, ˆθt MS, ε log ε ) ) 1 q/p), p = wl p,q σ) B 1 r 1 q/p) p,. Remar 1. In this last result, of course, to avoid problems of definitions in 4) and 5), it is implicitly assumed, without loss of generality, that ε remains smaller than a positive constant ε 0 strictly smaller than 1 equal to 1/ for instance).

6 6 V. Rivoirard For the model 1), we can prove that under some conditions, the maxisets associated with linear estimates of the form l x ) 1, where l ) 1 is a non-increasing sequence of weights lying in [0, 1], are the spaces Bp,, η called Besov bodies. These conditions are satisfied, for instance, by projection weights, Tihonov Phillips weights or Pinser weights. For further details see Theorem of Rivoirard [34]. We can add that Lemma 1 of Rivoirard [34] proves that for the rate ε log ε ) 1 q/p), the maxisets of linear estimates are strictly contained in the maxisets of thresholding rules. It means that from the maxiset point of view, linear estimates are outperformed by thresholding ones. Keryacharian and Picard [7] also applied the maxiset theory for local bandwidth selection in the framewor of the Gaussian white noise model. Under some conditions, they proved that local bandwidth selection is at least as good as the thresholding procedure see Section 5 of [7]). The comparison of procedures based on maxisets is not as widely used as minimax comparison. However the results that have been obtained up to now are very promising since they generally show that the maxisets of well-nown procedures are spaces that are well established and easily interpretable. Indeed, in the maxiset approach, the Besov bodies the spaces Bp, ) η control the θ s for the large values of. As for the spaces wl p,q σ), they can be viewed as weighted wea l q spaces. The wea l q space is the space wl p,q σ) when σ = 1 for any 1, so we denote it wl p,q 1), and it was considered in statistics by Johnstone [0], Donoho and Johnstone [17] or Abramovich, Benjamini, Donoho and Johnstone []. This space was also studied in approximation theory and coding by DeVore [13], Donoho [15], or Cohen, DeVore and Hochmuth [11]. Abramovich, Benjamini, Donoho and Johnstone [] proved that if we order the components of a sequence θ according to their size: θ 1) θ ) θ n)..., then θ wl p,q 1) sup n 1/q θ n) < n see Section 1. of []). So, wl p,q 1) spaces naturally measure the sparsity of a signal. Of course, the weighted versions of these spaces, the wl p,q σ) spaces, share the same property. So, since Bayesian Procedures, commonly used in practice, have been barely studied from a theoretical point of view, it seems relevant to investigate the maxiset results for the Bayesian procedures ˆθ b1 and ˆθ b introduced in Section 1. and our first two goals in this paper will be the following: 1. to point out the maxisets of the Bayesian procedures ˆθ b1 and ˆθ b,. to compare these estimators with traditional procedures in the maxiset approach by comparing their respective maxisets. We shall draw interesting conclusions from the maxiset results of these classical Bayes rules that are extensively used in practice Maxiset results. Given 1 p < and ρ ε the prescribed rate, our first issue is to determine for i {1, }, MSˆθ b i, ρ ε, p) once we have fixed assumptions on the hyperparameters w,ε, γ, and Λ ε. First, throughout this paper, we tae the density γ to be unimodal, symmetric about 0, positive, and absolutely

7 Maxisets for Bayes Rules 7 continuous on R. We also assume that there exist two positive constants M and M 1 such that H 1 ) sup d log γθ) dθ = M <. θ M 1 It implies that the tails of γ have to be exponential or heavier. This enables b1 us to establish asymptotic properties of ˆθ x b ) and ˆθ x ). We prove that the posterior median ˆθ b 1 x ) is a thresholding rule. The posterior mean does not have this thresholding property, but is a shrinage rule see Propositions 1,, and 3 in Section ). We also assume that w,ε = w ε depends only on ε and we set π ε = 1 w ε )w 1 ε. Then, the maxisets for these procedures can be pointed out if we tae in addition ρ ε and Λ ε of the form ρ ε = ε log π ε ) 1 q/p, Λ ε = ε log π ε ) r, with 0 < r < and 0 < q < p. Corollaries 1 and show that under mild assumptions on π ε and on the size of the σ s see Assumptions 7), 8), 10), and 11), for i {1, }, MS ˆθb i, ε log π ε ) 1 q/p, p ) = wl p,q σ) B 1/r1 q/p) p,. In particular, it is possible to choose w ε = ε ν, as soon as ν is great enough to satisfy Assumptions 7), 8), 10), and 11). So, as far as the maxiset point of view is concerned, and roughly under the same conditions, both Bayesian procedures achieve exactly the same performance as the thresholding one for the rate ε log ε ) 1 q/p). And for this last rate, we can then claim that each of the Bayesian procedures outperforms the linear algorithm. We note in Section.4 that Assumption H 1 ) on the tails of γ is essential to get maxisets as large as possible. Finally, in Section.5, the previous results are readily extended to the case where we want to estimate functions f decomposed in an appropriate unconditional fixed basis B = {ψ, 1} as f = 1 θ ψ. We assume that θ is still observed through the model 1) and still estimated by Bayes rules. In this case, the maxisets are no longer sequence spaces but real functional spaces. See Section.5 for more details Connections between the proposed Bayesian model and the spaces pointed out. Starting from a Bayesian model, the outcomes of the study of the associated natural Bayes rules under the maxiset approach are the spaces wl p,q σ). Thus, the Bayesian model M 1 ) and wl p,q σ) spaces are connected throughout the maxiset approach. We can wonder whether or not this connection is artificial. It is worthwhile to observe that the Bayesian model has been constructed to model the sparsity of the sequences to be estimated. And as recalled in Section 1.3, wl p,q σ) spaces are natural spaces to measure the sparsity of a sequence by controlling the proportion of non-negligible θ s. The third goal of this paper is then the following. 3. Can we establish a direct connection between M 1 ) and wl p,q σ) spaces?

8 8 V. Rivoirard Actually, we would lie to prove a result similar to the one obtained by Abramovich, Sapatinas and Silverman [3] and exploited by Abramovich, Amato and Angelini [1]. Abramovich, Sapatinas and Silverman considered in the wavelet framewor a Bayesian model denoted M1 )) similar to M 1 ), where γ is fixed in advance and is the density of a Gaussian variable with mean zero and unit variance. Then, they established a necessary and sufficient condition on the other hyperparameters of M1 ) to ensure that the signal built from the wavelet coefficients coming from M1 ) belongs, almost surely, to a prescribed Besov space see Theorem 1 of Abramovich, Sapatinas and Silverman [3]). We would lie to do the same job with M 1 ) and wl p,q σ) spaces, but without fixing γ in advance. Theorem 7 of Section 3 gives answers to this issue. In particular, we point out the condition sup λ q + γx) dx <, which means that the tails of γ cannot be heavier λ than those of a Paretoq)-variable. Consequently, similarly to the result presented in Section. of Rivoirard [33], this result illustrates the strong connections between Paretoq)-distributions and wl p,q σ) spaces. Theorem 7 is proved by using results on weighted empirical distribution process established by Marcus and Zinn [30] Contents. In Section, we give some properties of the Bayes rules and evaluate the maxisets obtained for the Bayesian procedures ˆθ b 1 and ˆθ b. Section 3 investigates the relationships between the Bayesian model and wl p,q σ) spaces. Finally, in Section 4, we prove the results concerning the asymptotic properties of the Bayes rules.. Maxisets for Bayesian Procedures.1. Properties of the Bayes rules. In the Introduction, we defined the Bayes procedures ˆθ b1 and ˆθ b used in this paper. Recall that for < Λ ε, ˆθ b 1 x ) respectively ˆθ b x )) is the median respectively the mean) of the posterior distribution of θ given x given by 3). We also mention that if γ,ε denotes the prior distribution of θ, then for any 1 p < and any estimator ˆθ x ) of θ, the Bayes ris Bˆθ, γ,ε, p) of ˆθ x ) with respect to γ,ε associated with the l p -loss defined by Bˆθ, γ,ε, p) = γ,ε θ ) ˆθ x ) θ p φ x θ ) dθ dx, where φ still denotes the density of εσ ξ, satisfies: and Bˆθ b 1, γ,ε, 1) Bˆθ, γ,ε, 1), Bˆθ b, γ,ε, ) Bˆθ, γ,ε, ). These Bayes rules are shrinage rules. In particular, they satisfy the following property, which will be capital for description of their maxisets. For further details, see Lemma, inequality 6), and Section 5.5 of Johnstone and Silverman [5]. Proposition 1. For all 1, since γ is symmetric, absolutely continuous, positive, and unimodal, we have for i {1, }, ˆθ b i x1 ) ˆθ b i x ) for any x1, x ) such that x1 x,

9 Maxisets for Bayes Rules 9 bi ˆθ x ) = ˆθ bi x ) for any x, 0 ˆθ b i x ) x for any x 0, ˆθ bi x ) θ max θ ; x θ ) for any x, θ ). Assumptions imposed on the Bayesian model enable us to deduce other properties. Assumption H 1 ) implies that u M 1, γu) γm 1 ) exp Mu M 1 )). It means that the tails of γ have to be exponential or heavier. We shall see below see Section.4) that this assumption is essential to get maxisets as large as possible. Furthermore, w,ε = w ε depends only on ε and we shall assume throughout this paper that π ε denoted H ): 1. ε π ε is continuous,. inf ε>0 π ε > 1, 3. π 1 = exp1), ε 0 4. π ε +, = 1 w ε )w 1 ε 5. ε ε 0 log π ε 0. b1 Then, we deduce the asymptotic behavior of ˆθ x ). satisfies the following mild assumptions, globally Proposition. Assume that H 1 ) and H ) hold. For all < Λ ε, ˆθ b 1 x ) is a thresholding rule, i.e., there exists a uniquely defined tπ ε ) such that ˆθ b1 x ) = 0 s,ε x tπ ε ), where the threshold tπ ε ) satisfies tπ ε ) logπ ε ) for π ε large enough, and lim π ε + tπ ε ) logπε ) = 1. Furthermore, there exists a positive constant C such that lim sup s,ε x s,ε ˆθb 1 x ) 1 s,ε x >tπ ε ) C. π ε + Remar. The threshold tπ ε ) introduced in Proposition will be used throughout this paper, even though it is only implicitly defined. The proof of this proposition is given in the Appendix. Since < Λ ε, ˆθ b 1 x ) is a thresholding rule, it will be easy to evaluate the maxiset for ˆθ b1. As for the posterior mean, we have the following useful result. Proposition 3. Assume that H 1 ) and H ) hold. Let < Λ ε be fixed. There exist two functions ε 1 and ε bounded on [1, + ) such that ε 1 x) x 0, and 6) ˆθb x ) = x 1 + ε 1 s,ε x ) 1 + π ε φs,ε x )γs,ε x ) 1 ε s,ε x ),

10 10 V. Rivoirard where φ denotes the density of a 0, 1) Gaussian variable. If tπ ε ) is the threshold introduced in Proposition, there exists a positive constant C such that lim sup s,ε x s,ε ˆθb x ) 1 s,ε x >tπ ε ) C. π ε + Proposition 3 is proved in the Appendix. Remar 3. By using the results of Proposition 3, we have for π ε large enough, ˆθ b s 1,ε tπ ε ) ) π 1 ε s,ε 1 log πε. Unlie ˆθ b 1 x ), for all < Λ ε, ˆθ b x ) is not a thresholding rule, since ˆθ b x ) 0 if x 0, and we can easily prove that under H 1 ), lim x u exp 1 ) u )γu) du 1 s,ε x 0 + ˆθb exp 1 u )γu) du + π x ) = 1. ε Now, we are ready to evaluate the maxisets for both Bayesian rules. In Sections. and.3, for i {1, }, the ris of ˆθ b i studied under the l p -norm 1 p < ), will be denoted R p ˆθ b i ). We have: ) 1/p. R p ˆθ b i ) = E ˆθ b i θ p l p ) 1/p = E ˆθ bi x ) θ p b1.. Maxisets for the posterior median. Since < Λ ε, ˆθ x ) is a thresholding rule, it is easy to evaluate the maxiset for ˆθ b 1. On the one hand, we have: Theorem. Assume that H 1 ) and H ) hold. Let 0 < r < and 1 p < be two fixed real numbers. Suppose that ε > 0, 1 Λ ε = ε log π ε ) r and there exist two positive constants T 1 and T such that ε > 0, 7) 8) ε p πε 1 log π ε ) 1 p T1, π 1 8 ε log π ε ) 1 4 p σ p T. <Λ ε Let q be a fixed positive real number such that q < p. If θ B 1 r 1 q/p) p, then there exists a positive constant C such that ε > 0, R p ˆθ b1 ) Cε log π ε ) 1 q/p. wl p,q σ),

11 Maxisets for Bayes Rules 11 The proof of this theorem that uses Proposition is omitted since it is inspired by the proof of Theorem 3.1 of Keryacharian and Picard [6] and is very similar to the proof of Theorem 4 of Section.3. On the other hand, we have: Theorem 3. Assume that H 1 ) and H ) hold. Let 0 < r < and 1 p < be two fixed real numbers. Suppose that ε > 0, Λ ε = ε log π ε ) r. Let q be a fixed positive real number such that q < p. constant C such that If there exists a positive then θ B 1 r 1 q/p) p, ε > 0, R p ˆθ b 1 ) Cε log π ε ) 1 q/p, wl p,q σ). Before proving Theorem 3, let us recall the following result Lemma. of Keryacharian and Picard [6]): Proposition 4. For any 1 p < and 0 < q < p, { wl p,q σ) = θ = θ ) 1 : sup λ } q p 1 θ λσ θ p <. Proof of Theorem 3. Since π 1 = exp1), we have Λ 1 = 1 and θ p l p any ε > 0, θ p E ˆθ b1 θ p l p C p Λ p q)/r ε, Λ ε and since ε Λ ε is continuous with lim ε 0 Λ ε = +, we have: C p. For sup λ pr 1 qp ) θ p C p, λ and θ B 1 r 1 q p ) p,. In the following, we shall use tπ ε ) denoted t if there is no ris of confusion) introduced in Proposition and the following inequality: for any m > 0, with Z N 0, 1), for ε small enough, 9) P s,ε x s,ε θ t ) P Z 1 ) log πε m m e u / du 1 m log πε π exp 1 1 ) ) log πε e m u 1 m Klog π ε ) 1/ π 1/m ε, 0 log πε du π

12 1 V. Rivoirard where K depends only on m. So, for < Λ ε, and for ε small enough, θ p 1 s,ε θ t = θ p E 1 s,ε θ t 1 s,ε x t) + E θ p 1 s,ε θ t 1 s,ε x <t) θ p 1 s,ε θ t P s,ε x s,ε θ t ) + E ˆθ b1 x ) θ p Therefore, for ε small enough, 1 θ p 1 s,ε θ t + E ˆθ b 1 x ) θ p. θ p 1 s,ε θ t E ˆθ b 1 θ p l p C p ε log π ε ) p q. Since ε π ε is continuous, it implies that there exists λ 0 > 0 such that sup λ<λ 0 λ q p θ p 1 θ σ λ <. Since θ B 1 r 1 q p ) p,, sup λ λ 0 λ q p θ p 1 θ σ λ λ q p 0 θ p <. Using Proposition 4, we have proved that θ B 1 r 1 q p ) p, wl p,q σ). Finally, from Theorems and 3, we deduce easily: Corollary 1. Assume that H 1 ) and H ) hold. Let 0 < r < and 1 p < be two fixed real numbers. Suppose that ε > 0, Λ ε = ε log π ε ) r such that 7) and 8) are satisfied. Let q be a fixed positive real number such that q < p. Then, MS ˆθb 1, ε log π ε ) 1 q/p, p ) = wl p,q σ) B 1 r 1 q/p) p,. We can conclude that the spaces B 1 r 1 q p ) p, wl p,q σ) appear as maximal spaces b1 where ˆθ attains specific rates of convergence..3. Maxisets for the posterior mean. As in Section., we prove the following results: Theorem 4. Assume that H 1 ) and H ) hold. Let 0 < r < and 1 p < be two fixed real numbers. Suppose that ε > 0, Λ ε = ε log π ε ) r

13 Maxisets for Bayes Rules 13 and there exist two positive constants T 1 and T such that ε > 0, 10) 11) ε p π 1 4 ε log π ε ) 1 p T1, π 1 3 ε log π ε ) 1 4 p σ p T. <Λ ε Let q be a fixed positive real number such that q < p. If θ B 1 r 1 q/p) p, then there exists a positive constant C such that ε > 0, R p ˆθ b ) Cε log π ε ) 1 q/p. wl p,q σ), Proof. In the following, K will denote a constant independent of ε that may be different at each line. Let t = tπ ε ) be the threshold introduced in Proposition. For all < Λ ε, E ˆθ b x ) θ p = A + B + C, with and A = E ˆθ b x ) θ p 1 s,ε x t, B = E ˆθ b x ) θ p 1 t < s,εx t, C = E ˆθ b x ) θ p 1 s,ε x >t. For the first term, we have, using Remar 3 and 9), A = E ˆθ b x ) θ p 1 s,ε x t p 1 E ˆθ b x ) p 1 s,ε x t + p 1 θ p E 1 s,ε x t Ks p,ε π p ε log π ε ) p + p 1 θ p E [1 s,ε x t 1 s,ε θ >t] + p 1 θ p E [1 s,ε x t 1 s,ε θ t] Kσ p εp π p ε log π ε ) p + p 1 θ p P s,ε x s,ε θ t ) + p 1 θ p 1 s,ε θ t Kσ p εp π p ε log π ε ) p + K θ p π 1 4 ε log π ε ) 1 + p 1 θ p 1 s,ε θ t. The second term can be split into B = B 1 + B + B 3, with Using Proposition 1 and 9), B 1 = E ˆθ b x ) θ p 1 t < s,εx t1 s,ε θ >3t, B = E ˆθ b x ) θ p 1 t < s,εx t1 t 4 < s,εθ 3t, B 3 = E ˆθ b x ) θ p 1 t < s,εx t1 s,ε θ t 4. B 1 = E ˆθ b x ) θ p 1 t < s,εx t1 s,ε θ >3t

14 14 V. Rivoirard E x θ p 1 x θ θ 1 t < s,εx t1 s,ε θ >3t + θ p E 1 x θ < θ 1 t < s,εx t1 s,ε θ >3t E x θ p ) 1 P s,ε x s,ε θ t) 1 + θ p P s,ε x s,ε θ t) Kε p σ p π 1 ε log π ε ) K θ p π 1 ε log π ε ) 1, B = E ˆθ b x ) θ p 1 t < s t,εx t1 4 < s,εθ 3t E x θ p 1 x θ θ 1 t 4 < s,εθ + θ p E 1 x θ < θ 1 s,ε θ 3t Kε p σ p P s,ε x s,ε θ t ) 1 + θ p 1 s,ε θ 4 3t Kε p σ p π 1 3 ε log π ε ) θ p 1 s,ε θ 3t, B 3 = E ˆθ b x ) θ p 1 t < s,εx t1 s,ε θ t 4 E x θ p 1 s,ε x s,ε θ t 4 E x θ p ) 1 P s,ε x s,ε θ t ) 1 4 Kε p σ p π 1 3 ε log π ε ) 1 4. For the last term, we denote τ, ε) = s,ε x s,ε ˆθb x ) and recall that ξ is the noise. Using Proposition 3 and 9), we have C = E ˆθ b x ) θ p 1 s,ε x >t = E ˆθ b x ) θ p 1 s,ε x >t1 s,ε θ t + E ˆθ b x ) θ p 1 s,ε x >t1 s,ε θ >t s p,ε E ξ τ, ε) p 1 s,ε x >t) 1 P s,ε x s,ε θ t) 1 + s p,ε E ξ τ, ε) p 1 s,ε x >t1 s,ε θ >t Ks p,ε P s,εx s,ε θ t) 1 + Ks p,ε 1 s,ε θ >t Kε p σ p π 1 ε log π ε ) s,ε θ >t). Finally, for ε small enough, and < Λ ε, [ E ˆθ b x ) θ p K ε p σ p 1 π 3 ε log π ε ) ε p σ p 1 θ >εσ log πε + θ p 1 θ 4εσ log πε + θ p π 1 4 ε log π ε ) 1 We conclude by using Proposition 4 and by observing that E ˆθ b θ p l p = E ˆθ b x ) θ p + θ p Kε log πε ) p q, <Λ ε Λ ε since θ B 1 r 1 q p ) p, wl p,q σ) and Λ ε = ε log π ε ) r. ].

15 Maxisets for Bayes Rules 15 As in Section., we have a converse result, but unlie Theorem 3, we need to control the size of the σ s. Theorem 5. Assume that H 1 ) and H ) hold. Let 0 < r < and 1 p < be two fixed real numbers. Suppose that ε > 0, Λ ε = ε log π ε ) r. Let q be a fixed positive real number such that q < p. constant C such that then θ B 1 r 1 q p ) p, ε > 0, R p ˆθ b ) Cε log π ε ) 1 q/p, If there exists a positive wl p,q σ) as soon as there exists a positive constant T such that σ p T. <Λ ε 1) ε > 0, π p ε Proof. To prove that θ B 1 r 1 q p ) p,, we refer the reader to the proof of Theorem 3. Then, we want to show that sup λ q p θ p 1 θ λσ <. For this, we still use the threshold t = tπ ε ) of Proposition. Using 9), for any < Λ ε and for ε small enough, θ p 1 s,ε θ t = θ p E 1 4 s,ε θ t 1 4 s,ε x t ) + θ p E 1 s,ε θ t 1 4 s,ε x < t ) θ p 1 s,ε θ t P s 4,ε x s,ε θ t ) 4 + θ p 1 s,ε θ t P s 4,ε x < t ) 1 θ p 1 s,ε θ t 4 + θ p 1 s,ε θ t 4 P s,ε x < t By using Remar 3, we have for ε small enough, ˆθ b s 1 t ),ε π 1 ε s 1,ε log πε. Therefore, θ p 1 s,ε θ t θ 4 p 1 s,ε θ t P s 4,ε x < t ) [ E ˆθ b x ) θ p + ˆθ b x ) p] 1 s,ε x < t p p ). E ˆθ b x ) θ p + p <Λ ε E ˆθ b x ) p 1 s,ε x < t p E ˆθ b θ p l p + p <Λ ε s p p,ε π ε log π ε ) p p C p ε log π ε ) p q + p ε p π p ε log π ε ) p <Λ ε σ p.

16 16 V. Rivoirard Consequently, under condition 1), for ε small enough, θ p 1 θ p C p + T )ε log π 1 4 σ ε log π ε ) p q. ε Using the same arguments as for the proof of Theorem 3, the last inequality implies that θ wl p,q σ). Finally, from Theorems 4 and 5, observing that condition 1) is less restrictive than condition 11) since p/ 1/ > 1/3, we deduce easily: Corollary. Assume that H 1 ) and H ) hold. Let 0 < r < and 1 p < be two fixed real numbers. Suppose that ε > 0, Λ ε = ε log π ε ) r such that 10) and 11) are satisfied. Let q be a fixed positive real number such that q < p. Then, MS ˆθb, ε log π ε ) 1 q/p, p ) = wl p,q σ) B 1 r 1 q/p) p,. Once more, we can conclude that the spaces B 1 r 1 q p ) p, wl p,q σ) appear as maximal spaces where ˆθ b attains specific rates of convergence..4. First conclusions. Before going further, let us compare the various procedures involved in this paper. We recall that unlie ˆθ b 1 x ), ˆθ b x ) does not possess the advantage of being a thresholding rule < Λ ε ) and this explains the differences between the assumptions that are needed to determine the respective b1 maxisets associated with the Bayesian procedures ˆθ and ˆθ b. For instance, this explains why, unlie ˆθ b1 b, if ˆθ achieves the given rate of convergence, we need a condition on the σ s to prove that θ B 1 r 1 q p ) p, wl p,q σ). Furthermore, to obtain the upper bound for R p ˆθ b ), we use a decomposition into eleven terms for E ˆθ b x ) θ p. Since it is a thresholding rule, the corresponding decomposition b1 for ˆθ is simpler. This explains why the assumptions of Theorem 4 are a bit more restrictive than those of Theorem. Actually, to obtain the assumptions of Theorem 4, we just have to replace π ε with πε 1/4 in the assumptions of Theorem. But, since we consider a rate of the form ε log π ε without focusing on the optimal constant, the Bayesian procedures achieve exactly the same performance from the maxiset point of view. When π ε is a power of ε, then, by using Theorem 1, we can compare the Bayesian procedures ˆθ b 1 and ˆθ b with the thresholding one. We can conclude that each of them achieves the same performance as the thresholding one. Finally, since linear estimates are outperformed by thresholding ones, they are also outperformed by ˆθ b 1 and ˆθ b. Let us show now the importance of Assumption H 1 ). Section 7. of Johnstone and Silverman [4] proves that if γ is a normal density, whatever the value of w,ε, the posterior median satisfies ˆθ b1 x ) 1 α) x,

17 Maxisets for Bayes Rules 17 for some α > 0 and the same inequality holds for the posterior mean. It yields for θ > 0, E ˆθ b 1 x ) θ p 1 αp θ p. Moreover, when γ has tails equivalent to exp C t λ ) for some λ 1, ), Johnstone and Silverman showed that for large θ, ˆθ b 1 x ) θ C θ λ 1. Thus H 1 ) cannot be essentially relaxed without obtaining smaller maxisets..5. Maxisets of Bayesian procedures for estimating functions of L p spaces. In this section, we estimate functions of L p D) = { f : f Lp = D fx) p dx) 1/p < }, where D = [0, 1] d or D = R d. For this purpose, we exploit a wavelet basis of L D) denoted B = {ψ, 1}. More precisely, we assume that ψ ) 1 is the wavelet-tensor product constructed on compactly supported wavelets see Meyer [31]). So, if 1 < p <, Meyer [31] proved that B is an unconditional basis of L p D), which means that: for any f L p D), there exists a unique sequence θ such that f = θ ψ, there exists an absolute constant K such that if 1, θ θ, then θ ψ K θ ψ Lp. Lp Remar 4. The restriction 1 < p < is due to the fact that there is no unconditional basis if p / 1, ). Furthermore, we assume that {σ ψ, 1} satisfies the following inequality, called a superconcentration inequality: for any 0 < r 1 <, there exists a constant Cp, r 1 ) such that for all F {1,,... }, [ F σ ψ r1 ] 1 r 1 Lp Cp, r 1 ) sup σ ψ Lp. F Remar 5. Theorem 4. of Keryacharian and Picard [6] gives conditions on the σ s and on B to satisfy the above superconcentration inequality. We consider the model 1), and the function f = θ ψ is estimated by ˆf b 1 = b1 ˆθ x )ψ or ˆf b = b ˆθ x )ψ. In this framewor, we set: Definition 3. For 1 < p < and any i {1, }, the maxiset of ˆf b i associated with the rate ρ ε and the L p -loss is MS ˆf bi, ρ ε, p) = { f = [ θ ψ : sup E ˆf ] bi f p L p ) 1 p ρ 1 ε ε } <.

18 18 V. Rivoirard To study maxisets of ˆf b 1 and ˆf b, we introduce for any η > 0 and any 0 < q < p: B η p, B) = wl p,q σ)b) = { f = { f = θ ψ : sup λ η θ ψ : sup λ q λ θ ψ Lp } <, } 1 θ >λσ σ p ψ p L p <. If B is a standard wavelet basis regular enough, the space B η p, B) can be identified with a real Besov space. See Meyer [31] for further details. Theorem 6. Assume that H 1 ) and H ) hold. Let 0 < r < be a fixed real number. Suppose that ε > 0, Λ ε = ε log π ε ) r, and there exist two positive constants T 1 and T such that for any ε > 0, and ε p [ πε 1 log π ε ) ] 1 1 minp;) log π ε ) p T1 π 1 8 ε log π ε ) 1 4 p <Λ ε σ p ψ p L p T. Let q be a fixed positive real number such that q < p. Then, under the model 1), MS b ˆf 1, ε log π ε ) 1 q/p, p ) = wl p,q σ)b) B 1 r 1 q/p) p, B). The analogous result for ˆf b is obtained if we assume that for any ε > 0, and ε p [ π 1 4 ε log π ε ) ] 1 1 minp;) log π ε ) p T3, π 1 3 ε log π ε ) 1 4 p where T 3 and T 4 are two positive constants. <Λ ε σ p ψ p L p T 4, Once more, when π ε is a power of ε i.e., for the rate ε log1/ε)), by using Theorems 5.1 and 5. of Keryacharian and Picard [6] we can conclude that each of the Bayesian procedures achieves the same performance as the thresholding one. Proof. Again, we only give the proof for the procedure associated with the mean. Recall that B is an unconditional basis of L p D) if and only if there exists M > 0 such that for any set F {1,,... } and any choice of the coefficients c s 13) M 1 c ψ Lp F F c ψ ) 1 Lp M c ψ. Lp F

19 Maxisets for Bayes Rules 19 Furthermore, since {σ ψ, 1} satisfies a superconcentration inequality, there exist two positive constants c p and C p such that for any F {1,,... }, we have: ) p 14) c p σ ψ p σ ψ Cp σ ψ p. F F F In the following, K will denote a constant independent of ε that may be different at each line. We use the threshold t = tπ ε ) introduced in Proposition and the results of Proposition 1. Let us assume that f wl p,q σ)b) B 1 r 1 q/p) p, E ˆθ b θ )ψ p L p is bounded by K 11 A 1 = θ ψ Λ ε p L p B). Then, i=1 A i, with the A i s defined as follows: K ε log π ε ) p q), since f B 1 r 1 q/p) p, B). Next, A = E θ ψ 1 s,ε x t 1 s,ε θ >t p L p <Λ ε ) p KE θψ 1 s,ε x s,ε θ t, <Λ ε by using 13). If p, by using 9), the Jensen inequality, and 13), A K θψ P s,ε x s,ε θ t/ )) p <Λ ε K πε 1/4 log π ε ) ) 1 p θ ψ p. L p <Λ ε If p, by using 9), the generalized Minowsi inequality, and 13) A K θψ P s,ε x s,ε θ t/ ) ) p p <Λ ε Kπε 1/4 log π ε ) 1 θ ψ p L p <Λ ε and A 3 = E θ ψ 1 s,ε x t 1 s,ε θ t <Λ ε p L p by using 13). Further, A 4 = E ˆθb ψ 1 s,ε x t p KE L p <Λ ε ) Kε p πε p/ log π ε ) p p σψ <Λ ε K θψ 1 s,ε θ t <Λ ε <Λ ε ˆθ b ) 1 s,ε x t ψ Kε p π p/ ε log π ε ) p ) p ) p <Λ ε σ p ψ p L p,,

20 0 V. Rivoirard by using 13), Remar 3, and 14). Using 13), we have: A 5 = E θ )ψ 1 t < s t,εx t1 If p, <Λ ε ˆθ b KE A 5 K <Λ ε x θ ) ψ 1 s,ε x s,ε θ t 4 <Λ ε ψ p E Kε p π 1 3 ε log π ε ) < s,εθ 3t1 x θ θ p L p ) p. x θ p ) 1 s,εx s,εθ t 4 <Λ ε ψ p σ p, by using 9) and the Cauchy Schwarz inequality. If p, by using 9), the generalized Minowsi inequality, and 14), A 5 K ψ E x θ p ) ) p 1 s,εx s,εθ t p 4 <Λ ε Kε p πε 13 log π ε ) 1 4 ψ p σ p. Now we have A 6 = E ˆθ b <Λ ε <Λ ε θ )ψ 1 t < s,εx t1 t K θψ 1 s,ε θ 3t <Λ ε ) p by using 13). By using 13), the following terms: A 7 = E θ )ψ 1 t ˆθ b <Λ ε, 4 < s,εθ 3t1 x θ < θ p L p < s,εx t1 s,ε θ >3t1 x θ θ p L p KE x θ ) ψ1 s,ε x s,ε θ t <Λ ε A 8 = E θ )ψ 1 t ˆθ b <Λ ε KE θψ 1 s,ε x s,ε θ t <Λ ε A 9 = E ˆθ b θ )ψ 1 t <Λ ε KE ) p < s,εx t1 s,ε θ >3t1 x θ < θ p L p ) p < s,εx t1 s,ε θ t p 4 L p <Λ ε x θ ) ψ 1 s,ε x s,ε θ t 4,, ) p

21 Maxisets for Bayes Rules 1 are bounded, up to a constant, by the upper bounds of A or A 5. Next, A 10 = E ˆθ b <Λ ε KE θ )ψ 1 s,ε x >t1 s,ε θ t ˆθ b <Λ ε p L p θ ) ψ 1 s,ε x >t1 s,ε x s,ε θ >t ) p, by using 13). If p, by using 9), the Cauchy Schwarz inequality, and Proposition 3, A 10 K K ψ p E ˆθ b <Λ ε ψ p ε p σ p P <Λ ε Kε p π 1/ ε log π ε ) 1 4 θ p 1 s,ε x >t1 s,ε x s,ε θ >t s,ε x s,ε θ > t ) 1 <Λ ε σ p ψ p. If p, by using the generalized Minowsi inequality, the Cauchy Schwarz inequality, 9), 14), and Proposition 3, A 10 K <Λ ε ψ E ˆθb θ p 1 s,ε x >t1 s,ε x s,ε θ >t K ψσ ε P s,ε x s,ε θ > t) 1 p <Λ ε Kε p πε 1/ log π ε ) 1/4 ψ p σ p. <Λ ε ) p ) ) p p Finally, A 11 = E ˆθ b <Λ ε KE θ )ψ 1 s,ε x >t1 s,ε θ >t ˆθ b <Λ ε p L p θ ) ψ 1 s,ε x >t1 s,ε θ >t ) p. If p, by using Proposition 3, A 11 K Kε p ψ p E ˆθ b θ p 1 s,ε x >t1 s,ε θ >t <Λ ε ψ p σ p 1 s,ε θ >t. <Λ ε

22 V. Rivoirard If p, by using 14) and Proposition 3, A 11 K ψ 1 s,ε θ >t E ˆθb θ p 1 s,ε x >t <Λ ε Kε p <Λ ε ψ σ 1 s,ε θ >t ) p <Λ ε ) p ) p Kε p ψ p σ p 1 s,ε θ >t. Thus we conclude that there exists a positive constant C such that ε > 0, E ˆθ b x p ) θ )ψ C p ε ) p q), log π ε L p by using the following result proved by Keryacharian and Picard [6] upper bound of the term B, p. 311): if f = θ ψ wl p,q σ)b), then λ > 0, θψ 1 θ σ λ ) p Kλ p q. Now, let us assume that there exists a positive constant C such that ε > 0, E ˆθ b x ) θ )ψ p L p C p ε log π ε ) p q). To bound the following term, we just use 13): A 1 = θ ψ Λ ε KE ˆθ b p L p = E θ )ψ p ˆθ b Λ ε L p θ )ψ p L p K ε log π ε ) p q, KE ˆθ b θ ) ψ ) p which proves that f = θ ψ B 1 r 1 q/p) p, B). Now, using 13), θ ψ 1 s,ε θ t p KA A 13 ), L p with A 13 = p L p θ ψ 1 s,ε θ t 4 <Λ ε p 1 E θ ψ 1 s,ε θ t 1 4 s,ε x > t p + p 1 A 14 L p <Λ ε 1 θ ψ 1 s,ε θ t p + p 1 A L p A 13 + p 1 A 14 p A 14, <Λ ε

23 Maxisets for Bayes Rules 3 for ε small enough see the upper bound of A ), where A 14 = E θ ψ 1 s,ε θ t 1 4 s,ε x t p. L p <Λ ε But, using 13), we obtain A 14 KE ˆθ b θ p )ψ + KE ˆθb ψ 1 s,ε x t L p <Λ ε KE ˆθ b θ p )ψ + KA 4 K ε ) p q), log π ε L p p L p which implies that θ ψ 1 s,ε θ t 4 p L p K ε log π ε ) p q). Now, we use Lemma 5.1 of Keryacharian and Picard [6], which ends the proof of the theorem. 3. Relationships between M 1 ) and wl p,q σ) Spaces In this paper, our aim is to estimate sparse sequences, and we model sparsity within a Bayes approach, and more precisely, by using the model M 1 ). We noticed that under this model, the maxisets for the previous Bayes rules are defined by using wl p,q σ) spaces. To some extent, this result is not surprising, since we recalled in the Introduction that these spaces are weighted versions of wea l q spaces that naturally measure sparsity. Then, the model M 1 ) and wl p,q σ) spaces are connected via a maxiset approach. So, it is natural to wonder whether we can establish other more natural connections between our Bayesian approach to model sparsity and wl p,q σ) spaces. The following result gives a positive answer. Theorem 7. Suppose that we are given 1 p < and 0 < q < p. We again consider the model M 1 ) with ε = 1. Denote w = w,1 and λ 0, F λ) = γx) dx. If there exists a constant C such that λ sup λ q σ p 1 θ >σ λ C q a.s., then sup λ q F λ) w σ p Cq. Conversely, if there exists a constant C such that 15) sup λ q F λ) w σ p Cq, then sup λ q σ p 1 θ >σ λ < a.s.

24 4 V. Rivoirard Proof. If sup λ q σ p 1 θ >σ λ C q a.s., we have: sup λ q F λ) w σ p = sup E E λ q sup λ q σ p 1 θ >σ λ ) ) σ p 1 θ >σ λ C q. Conversely, suppose that 15) is true. To establish the required inequality, we exploit Theorem 0.3 of Marcus and Zinn [30]. Let r ) 1 be a Rademacher sequence i.e., a sequence of i.i.d. random variables taing values +1 and 1 with probability 1/ each) independent of θ ) 1 and let S n be the partial sum of the symmetrized random variables σ p q θ q ) 1 : S n = n Y, =1 where We have Y = r σ p q θ q. E Y 1 Y 1) = 0, θ P Y > 1) = P σ q ) > σ p vary 1 Y 1) = E Y 1 Y 1) = σ p p/q σ = w σ p 0 C q w σ p C q w σ p = w F σ p/q E [ σ 1 x q γx) dx w σ p ) C q w σ p θ ) q 1 σ 1 θ σ p/q σ p/q ) 1w p/q σ σ p qx q 1 dx 0 ) 1w σ p. 0 ] ) 1w σ p, qx q 1 F x) dx Using the three series theorem see Theorem.8 of Billingsley [6]), S n converges with probability 1 as n +. Therefore, if µ is a fixed positive real number, we have for any increasing sequence of positive real numbers b n ) n with lim n + b n = +, lim sup n + 1 b n S n µ a.s.

25 Maxisets for Bayes Rules 5 Obviously, lim sup n + sup 1 j n 1 sup λ q j σ p b E 1 θ >σ λ) µ n =1 a.s. Then, we can apply Theorem 0.3 of Marcus and Zinn [30], which shows that Since it yields lim sup n + 1 sup λ q n σ b p 1 θ >σ λ P θ > σ λ) ) 1160 µ. n lim sup n + lim sup n + =1 1 sup b n 1 sup b n λ q λ q n =1 n =1 σ p P θ > σ λ) = 0, σ p 1 θ >σ λ 1160 µ. With µ 0, we have proved that for any increasing sequence of positive real numbers b n ) n with lim n + b n = +, lim n + 1 sup b n λ q n =1 σ p 1 θ >σ λ = 0 a.s. If the random sequence A n = sup λ q n =1 σp 1 θ >σ λ were not bounded, we could construct an increasing function Φ, with lim n + Φn) = +, such that A Φn) > n. By considering an increasing sequence b n ) n, with b Φn) = n, we obtain a contradiction. So, there exists a finite random variable Y such that n 1, A n Y, a.s. It implies that sup λ q σ p 1 θ >σ λ < a.s. The result is proved. So, to ensure that a sequence coming from the Bayesian model M 1 ) belongs to wl p,q σ) almost surely, we should not consider densities γ having tails heavier than those of Paretoq)-distributions. In the wavelet framewor, with special values for the σ s, Rivoirard [33] has already noted the strong connections between Paretoq)- distributions and wl p,q σ) spaces, since in Section. of that paper, least favorable priors for these spaces are presented and it is explained how these priors are built from Paretoq)-variables. Concluding remars. In this paper, we discussed the modelling of sparsity. The form of our Bayesian model was the following: θ w,ε γ,ε θ ) + 1 w,ε )δ 0 θ ), 1.

26 6 V. Rivoirard Provided the tails of γ,ε are exponential or heavier, the maxisets of the Bayes rules are wl p,q σ) spaces that naturally measure the sparsity of a signal. So, our choice for the Bayesian modelling seems appropriate. It is all the more appropriate since this model enables us to build typical realizations of wl p,q σ) spaces. The main goal of this paper was to compare in the maxiset approach the performances of classical Bayes estimators: the posterior median and mean of our Bayesian model. We proved that for a large range of loss functions, the maxisets of these estimators coincide with the maxisets associated with thresholding estimators. These results have been established for the heteroscedastic white noise model 1), where the ξ s are assumed to be Gaussian and independent. It would be interesting to study the maxisets of the Bayes rules without these assumptions. It would also be interesting to try to find Bayes estimators that outperform ˆθ b 1 and ˆθ b under the maxiset approach, if possible. Note that if the maxiset theory seems to provide advantages, the problem of optimality in this approach remains an entirely open issue. Can we introduce a meaningful notion of optimality? If yes, what are optimal estimators? Other natural questions arise: what do the maxisets become when the σ s are unnown and they are, for instance, estimated by using a Bayes approach? Do we obtain larger maxisets when the θ s are gathered in non-overlapping blocs, each of which is provided with a prior? Since the outcome of the maxiset approach is a functional space or a sequence space), we have not focused on the Bayes riss of the estimators. But, inspired by the maxiset point of view used in this paper, we could investigate the maximal set of prior distributions such that the associated Bayes ris of a given estimator achieves a prescribed rate. This provides an interesting topic for further research. 4. Appendix In this section, O x 1) will denote any function of x that is bounded as x +. We write o πε 1) for any function that is bounded by a function depending only on π ε and that tends to 0 as π ε tends to +. Furthermore, φ denotes the density of a 0, 1) Gaussian variable and γ is the density introduced in M 1 ). We shall exploit the following lemma: and Lemma 1. For any x > 0 and 0 < τ < x, define: K τ x) = Ix) = x τ 0 exp 1 v) γx v) dv exp 1 ) v + xv γv) dv. Under H 1 ), there exist four positive constants M, M 3, C 1, and C such that M x τ exp 1 v Mv ) dv K τ x)γx) 1 M 3 τ exp 1 ) v + Mv dv, and C 1 lim inf x + Ix)γx) 1 φx) lim sup Ix)γx) 1 φx) C <. x +

27 Maxisets for Bayes Rules 7 Proof. Under H 1 ), since γ is positive, absolutely continuous, symmetric, and unimodal, it is easy to show that there exist two constants M and M 3 such that, a, b) R, M exp M a b ) γa)γb) 1 M 3 expm a b ). We immediately get the first inequality. Now, let us define x > 0, As before, M 0 Jx) = exp 1 v Mv By simple computations, we have: 0 exp 1 v) γx + v) dv. ) dv Jx)γx) 1 M 3 0 exp 1 ) v + Mv dv. which implies the result. Ix) = exp 1 x) Jx) + K0 x) ), Now, let us give the proof of Propositions and 3. Proof of Proposition. Without loss of generality, we can assume that x > 0. Then, using Proposition 1, ˆθ b1 x ) = 0 Pθ > 0 x ) < 1 w ε s,ε φ s,ε x θ) ) s,ε γs,ε θ) dθ 0 < w ε s,ε φ s,ε x θ) ) s,ε γs,ε θ) dθ + 1 w ε )s,ε φs,ε x ) 0 0 s,ε φ s,ε x θ) ) s,ε γs,ε θ) dθ s,ε φ s,ε x θ) ) s,ε γs,ε θ) dθ < π ε s,ε φs,ε x ). Then ˆθ b 1 x ) = 0 s,ε x t, with t such that 0 exp 1 u) γu) exptu) exp tu) ) du = π ε. We have that t is a function depending only on π ε and as π ε +, 0 exp 1 u) γu) exptu) du = π ε 1 + o πε 1)).

28 8 V. Rivoirard Using Lemma 1, we have, for π ε large enough, logπε ) tπ ε ) logπ ε )1 + o πε 1)), and the first statement of Proposition is proved. For the second statement, we assume that s,ε x ˆθ b1 x ) > 0. Using 13), we have: > t, which implies that P ˆθb 1 θ ˆθ b 1 x ) 1 ) x = w ε φ x θ)γ,ε θ) dθ + 1 w ε )φ x ) = w ε φ x θ)γ,ε θ) dθ s,ε ˆθb 1 = exp Using Lemma 1, since s,ε x t, we prove easily that Therefore, π ε Is,ε x ) 1 = o πε 1). s,ε x u 1 u) γu) du + π ε exp s,ε x u 1 u) γu) du. s,ε ˆθb 1 exp s,ε x u 1 u) γu) du = Is,ε x )1 + o πε 1)). By using again Lemma 1, it implies that there exists a positive constant V such that for π ε large enough, s,ε ˆθb 1 s,εx exp 1 v) γv + s,ε x ) dv γs,ε x ) 1 V K s,ε x s,ε ˆθb 1 s,ε x )γs,ε x ) 1 V, in notations of Lemma 1. Finally, there exists a positive constant C such that lim sup s,ε x s,ε ˆθb 1 x ) 1 s,ε x >tπ ε ) C. π ε + Proof of Proposition 3. Without loss of generality, we can assume that x > 0. We have: ˆθ b x ) = = θγ φ,ε θ x ) dθ s,εθφs,ε x θ))s,ε γs,ε θ) dθ s,εφs,ε x θ))s,ε γs,ε θ) dθ + π ε s,ε φs,ε x ) = 1 u exp 1 u + s,ε x u ) γu) du s +,ε exp 1 u + s,ε x u ). γu) du + π ε

Nonparametric estimation using wavelet methods. Dominique Picard. Laboratoire Probabilités et Modèles Aléatoires Université Paris VII

Nonparametric estimation using wavelet methods. Dominique Picard. Laboratoire Probabilités et Modèles Aléatoires Université Paris VII Nonparametric estimation using wavelet methods Dominique Picard Laboratoire Probabilités et Modèles Aléatoires Université Paris VII http ://www.proba.jussieu.fr/mathdoc/preprints/index.html 1 Nonparametric

More information

D I S C U S S I O N P A P E R

D I S C U S S I O N P A P E R I N S T I T U T D E S T A T I S T I Q U E B I O S T A T I S T I Q U E E T S C I E N C E S A C T U A R I E L L E S ( I S B A ) UNIVERSITÉ CATHOLIQUE DE LOUVAIN D I S C U S S I O N P A P E R 2014/06 Adaptive

More information

Introduction Wavelet shrinage methods have been very successful in nonparametric regression. But so far most of the wavelet regression methods have be

Introduction Wavelet shrinage methods have been very successful in nonparametric regression. But so far most of the wavelet regression methods have be Wavelet Estimation For Samples With Random Uniform Design T. Tony Cai Department of Statistics, Purdue University Lawrence D. Brown Department of Statistics, University of Pennsylvania Abstract We show

More information

Inverse problems in statistics

Inverse problems in statistics Inverse problems in statistics Laurent Cavalier (Université Aix-Marseille 1, France) Yale, May 2 2011 p. 1/35 Introduction There exist many fields where inverse problems appear Astronomy (Hubble satellite).

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

OPTIMAL POINTWISE ADAPTIVE METHODS IN NONPARAMETRIC ESTIMATION 1

OPTIMAL POINTWISE ADAPTIVE METHODS IN NONPARAMETRIC ESTIMATION 1 The Annals of Statistics 1997, Vol. 25, No. 6, 2512 2546 OPTIMAL POINTWISE ADAPTIVE METHODS IN NONPARAMETRIC ESTIMATION 1 By O. V. Lepski and V. G. Spokoiny Humboldt University and Weierstrass Institute

More information

The Stein hull. Clément Marteau* Institut de Mathématiques, Université de Toulouse, INSA - 135, Avenue de Rangueil, F Toulouse Cedex 4, France

The Stein hull. Clément Marteau* Institut de Mathématiques, Université de Toulouse, INSA - 135, Avenue de Rangueil, F Toulouse Cedex 4, France Journal of Nonparametric Statistics Vol. 22, No. 6, August 2010, 685 702 The Stein hull Clément Marteau* Institut de Mathématiques, Université de Toulouse, INSA - 135, Avenue de Rangueil, F-31 077 Toulouse

More information

Discussion of Regularization of Wavelets Approximations by A. Antoniadis and J. Fan
