The Stein hull. Clément Marteau* Institut de Mathématiques, Université de Toulouse, INSA - 135, Avenue de Rangueil, F Toulouse Cedex 4, France
Journal of Nonparametric Statistics, Vol. 22, No. 6, August 2010. (Received 17 October 2008; final version received 2 October 2009.)

We are interested in the statistical linear inverse problem Y = Af + ϵξ, where A denotes a compact operator and ϵξ a stochastic noise. In this setting, the risk hull point of view provides interesting tools for the construction of adaptive estimators: it sheds light on the processes governing the behaviour of linear estimators. In this article, we investigate the link between some threshold estimators and this risk hull point of view. The penalised blockwise Stein rule plays a central role in this study. In particular, this estimator may be considered as a risk hull minimisation method, provided the penalty is well chosen. Using this perspective, we study the properties of the threshold and propose an admissible range for the penalty leading to accurate results. We eventually propose a penalty close to the lower bound of this range.

Keywords: inverse problem; oracle inequality; risk hull; penalised blockwise Stein rule

Mathematical Subject Classification (2000): 62G05; 62G20

1. Introduction

This article deals with the statistical inverse problem

Y = Af + ϵξ,  (1)

where H, K are Hilbert spaces and A : H → K denotes a linear operator. The function f ∈ H is unknown and has to be recovered from a measurement of Af corrupted by some stochastic noise ϵξ. Here, ϵ represents a positive noise level and ξ a Gaussian white noise (see Hida 1980 for more details). In particular, for all g ∈ K, we can observe

⟨Y, g⟩ = ⟨Af, g⟩ + ϵ⟨ξ, g⟩,  (2)

where ⟨ξ, g⟩ ∼ N(0, ‖g‖²). Denote by A* the adjoint operator of A. In the sequel, A is supposed to be a compact operator. Such a restriction is very interesting from a mathematical point of view.
The operator (A*A)^{-1} is unbounded: the least-squares solution f̂_LS = (A*A)^{-1}A*Y does not continuously depend on Y. The problem is said to be ill-posed.

* Email: clement.marteau@insa-toulouse.fr

© American Statistical Association and Taylor & Francis 2010
In a statistical context, several studies of ill-posed inverse problems have been proposed in recent years; it would however be impossible to cite them all. For the interested reader, we may mention Fan (1991) and Ermakov (1989) for convolution operators, Johnstone and Silverman (1990) for the positron emission tomography problem, Donoho (1995) in a wavelet setting, or Bissantz, Hohage, Munk and Ruymgaart (2007) for a general statistical approach and some rates of convergence. We also refer to Engl, Hanke and Neubauer (1996) for a survey in a numerical setting.

Using a specific representation (i.e. particular choices for g in Equation (2)) may help in the understanding of the model (1). In this sense, the classical singular value decomposition is a very useful tool. Since A*A is compact and self-adjoint, the associated sequence of eigenvalues (b_k²)_{k∈N} is strictly positive and converges to 0 as k → +∞. The sequence of eigenvectors (φ_k)_{k∈N} is supposed, in the sequel, to be an orthonormal basis of H. For all k ∈ N, set ψ_k = b_k^{-1}Aφ_k. The triple (b_k, φ_k, ψ_k)_{k∈N} verifies

Aφ_k = b_k ψ_k,  A*ψ_k = b_k φ_k,  for all k ∈ N.  (3)

This representation leads to a simpler understanding of the model (1). Indeed, for all k ∈ N, using Equation (3) and the properties of the Gaussian white noise,

y_k = ⟨Y, ψ_k⟩ = ⟨Af, ψ_k⟩ + ϵ⟨ξ, ψ_k⟩ = b_k⟨f, φ_k⟩ + ϵξ_k,  (4)

where the ξ_k are i.i.d. standard Gaussian variables. Hence, for all k ∈ N, we can obtain from Equation (1) an observation on θ_k = ⟨f, φ_k⟩. In the ℓ²-sense, θ = (θ_k)_{k∈N} and f represent the same mathematical object. The sequence space model (4) clarifies the effect of A on the signal f. Since A is compact, b_k → 0 as k → +∞. For large values of k, the coefficients b_kθ_k are negligible compared with ϵξ_k: in a certain sense, the signal is smoothed by the operator. The recovery becomes difficult in the presence of noise at large frequencies (i.e. when k is large). Our aim is to estimate the sequence (θ_k)_{k∈N} = (⟨f, φ_k⟩)_{k∈N}.
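To make the smoothing effect concrete, here is a small numerical sketch of the sequence-space model (4); the choices b_k = k^{-β} and θ_k = k^{-2} are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Sketch of the sequence-space model (4): y_k = b_k*theta_k + eps*xi_k,
# with illustrative choices b_k = k^(-beta) and theta_k = k^(-2).
rng = np.random.default_rng(0)
beta, eps, n = 1.0, 0.01, 1000
k = np.arange(1, n + 1)
b = k ** (-beta)                     # singular values, b_k -> 0
theta = k ** (-2.0)                  # a smooth signal
y = b * theta + eps * rng.standard_normal(n)

# Naive inversion b_k^{-1} y_k is unbiased but has variance eps^2 b_k^{-2},
# which explodes with the frequency k: this is the ill-posedness.
naive = y / b
var_naive = eps ** 2 / b ** 2        # variance of each naive coefficient
# At k = 1000 this variance dwarfs the squared signal theta_k^2.
```

Any reasonable estimator must therefore damp the high frequencies, which is exactly the role of the filters introduced below.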
Linear estimation plays an important role in the inverse problem framework and is a starting point for several recovering methods. Let (λ_k)_{k∈N} be a real sequence with values in [0, 1]; in the following, this sequence will be called a filter. The associated linear estimator is defined by

f̂_λ = Σ_{k=1}^{+∞} λ_k b_k^{-1} y_k φ_k.

In the sequel, f̂_λ may sometimes be identified with θ̂_λ = (λ_k b_k^{-1} y_k)_{k∈N}; the meaning will be clear from the context. The error related to f̂_λ is measured by the quadratic risk E_θ‖f̂_λ − f‖². Given a family of estimators T, we would like to construct an estimator θ* comparable with the best possible one contained in T (called the oracle θ_T), via the inequality

E_θ‖θ* − θ‖² ≤ (1 + ϑ_ϵ) E_θ‖θ_T − θ‖² + Cϵ²,  (5)

with ϑ_ϵ, C > 0. The quantity Cϵ² is a residual term. The inequality (5) is said to be sharp if ϑ_ϵ → 0 as ϵ → 0. In this case, θ* asymptotically mimics the behaviour of θ_T. Oracle inequalities play an important, though recent, role in statistics. They provide a precise and non-asymptotic measure of the performances of θ*, which does not require a priori information on the signal. In several situations, oracle results lead to interesting minimax rates of convergence. This theory has given rise to a considerable amount of literature; we mention in particular Donoho (1995), Barron, Birgé and Massart (1999), Cavalier, Golubev, Picard and Tsybakov (2002) or Candès (2006) for a survey.
The risk hull minimisation (RHM) principle, initiated in Cavalier and Golubev (2006) for spectral cut-off (or projection) schemes, is an interesting approach for the construction of data-driven parameter choice rules. The principle is to identify the stochastic processes that control the behaviour of a projection estimator. Then, a deterministic criterion, called a hull, is constructed in order to contain these processes. We also mention Marteau (in press) for a generalisation of this method to some other regularisation approaches (Tikhonov, Landweber, ...).

In this article, our aim is to establish a link between the RHM approach and some specific threshold estimators. We are interested in the family of blockwise constant filters. In this specific case, this approach leads to the penalised blockwise Stein rule studied for instance in Cavalier and Tsybakov (2002). This is a new perspective for this well-known threshold estimator. In particular, the risk hull point of view makes precise the role of the penalty through a simple and general assumption.

This article is organised as follows. In Section 2, we construct a hull for the family of blockwise constant filters. Section 3 establishes a link between the penalised blockwise Stein rule and the risk hull method, and investigates the performances of the related estimator. Section 4 proposes some examples and a discussion on the choice of the penalty. Some results on the theory of ordered processes and the proofs of the main results are gathered in Section 5.

2. A risk hull for blockwise constant filters

In this section, we recall the RHM approach for projection schemes. Then, we explain why an extension of the RHM method may be pertinent. A specific family of estimators is introduced and the related hull is constructed.

2.1. The risk hull principle

For all N ∈ N, denote by θ̂_N the projection estimator associated with the filter (1_{k≤N})_{k∈N}.
For each value of N ∈ N, the related quadratic risk is

E_θ‖θ̂_N − θ‖² = Σ_{k>N} θ_k² + E_θ Σ_{k=1}^{N} (b_k^{-1}y_k − θ_k)² = Σ_{k>N} θ_k² + ϵ² Σ_{k=1}^{N} b_k^{-2}.  (6)

The optimal choice for N is the oracle N₀ that minimises E_θ‖θ̂_N − θ‖². It is a trade-off between the two sums (bias and variance) on the r.h.s. of Equation (6). This trade-off cannot be found without a priori knowledge of the unknown sequence θ: data-driven choices for N are necessary. The classical unbiased risk estimation (URE) approach consists in estimating the quadratic risk. One may use the functional

U(y, N) = −Σ_{k=1}^{N} b_k^{-2} y_k² + 2ϵ² Σ_{k=1}^{N} b_k^{-2},  N ∈ N.

The related adaptive bandwidth is defined as

Ñ = arg min_{N∈N} U(y, N).

Some oracle inequalities related to this approach have been obtained in different papers (see, for instance, Cavalier et al. 2002). Nevertheless, this approach suffers from some drawbacks, especially in the inverse problem framework.
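The URE criterion above can be minimised numerically. The sketch below uses illustrative choices of b_k, θ_k and ϵ, and writes U up to the additive constant ‖θ‖², which does not affect the argmin:

```python
import numpy as np

# URE bandwidth choice: minimise
#   U(y, N) = -sum_{k<=N} b_k^{-2} y_k^2 + 2 eps^2 sum_{k<=N} b_k^{-2}
# over N.  Cumulative sums evaluate U for every N in one pass.
def ure_bandwidth(y, b, eps):
    increments = -(y / b) ** 2 + 2 * eps ** 2 / b ** 2
    U = np.cumsum(increments)            # U(y, N) for N = 1..len(y)
    return int(np.argmin(U)) + 1         # 1-based bandwidth N~

rng = np.random.default_rng(1)
k = np.arange(1, 501)
b, theta, eps = k ** (-1.0), k ** (-2.0), 0.01
y = b * theta + eps * rng.standard_normal(k.size)
N_tilde = ure_bandwidth(y, b, eps)
```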
4 688 C. Marteau Indeed, this method is based on the average behaviour of the proection estimators: U(y, N) is an estimator of the quadratic ris. This is quite problematic in the inverse problem framewor where the main quantities of interest often possess a great variability. This can be illustrated by a very simple example: f = 0. In this particular case, for all N N, the loss of the related proection estimator ˆθ N is ˆθ N θ 2 = ϵ 2 N =1 b 2 + η N, with η N = ϵ 2 N =1 b 2 (ξ 2 1) N N. Since b 0 as +, the process N η N possesses a great variability, which explodes with N. In this case, the behaviour of E θ ˆθ N θ 2 and ˆθ N θ 2 are rather different. The variability is neglected when only considering the average behaviour of the loss. This leads in practice to wrong decisions for the choice of N. More generally, as soon as the signal-to-noise ratio is small, one may expect poor performances of the URE method. We refer to Cavalier and Golubev (2006) for a complete discussion illustrated by some numerical simulations. From now on, the problem is to construct a data-driven bandwidth that taes into account this phenomenon. Instead of the quadratic ris, in Cavalier and Golubev (2006) it is proposed to consider a deterministic term V (θ, N), called a hull, satisfying E θ sup[ ˆθ N θ 2 V (θ, N)] 0. (7) N N This hull bounds uniformly the loss in the sense of inequality (7). Ideally, it is constructed in order to contain the variability of the proection estimators. The related estimator is then defined as the minimiser of V (y, N), an estimator of V (θ, N), on N. The theoretical and numerical properties of this estimator are presented and discussed in detail in Cavalier and Golubev (2006) in the particular case of spectral cut-off regularisation. In the same spirit, we mention Marteau (in press) for an extension of this method to wider regularisation schemes (Landweber, Tihonov,...) 
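The size of the fluctuation η_N for f = 0 can be checked numerically; the polynomial decay b_k = k^{-1} is an illustrative assumption:

```python
import numpy as np

# For f = 0: loss = eps^2 sum_{k<=N} b_k^{-2} + eta_N, with
# eta_N = eps^2 sum_{k<=N} b_k^{-2}(xi_k^2 - 1).  Var(xi_k^2 - 1) = 2, so
# sd(eta_N) = eps^2 sqrt(2 sum_{k<=N} b_k^{-4}).
eps, n = 0.01, 500
b2inv = np.arange(1, n + 1) ** 2.0          # b_k^{-2} for b_k = k^{-1}

mean_loss = eps ** 2 * np.cumsum(b2inv)                  # grows like N^3
sd_eta = eps ** 2 * np.sqrt(2 * np.cumsum(b2inv ** 2))   # grows like N^{5/2}

# The fluctuation is far from negligible: its size stays of the same
# order as the mean loss (the ratio decays only like N^{-1/2}).
ratio = sd_eta / mean_loss
```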
2.2. The choice of Λ*

In order to construct an estimator leading to an accurate oracle inequality, one must consider both a family Λ of filters and a procedure in order to mimic the behaviour of the best element in Λ. In this article, we are interested in the risk hull principle. This point of view possesses indeed interesting theoretical properties: it makes the role of the stochastic processes involved in linear estimation more precise and leads to an accurate understanding of the problem.

Now, we address the problem of the choice of Λ. In the oracle sense, an ideal goal of adaptation is to obtain a sharp oracle inequality over all possible estimators. This is in most cases an unreachable task since this set is too large: the difficulty of the oracle adaptation increases with the size of the considered family. At a smaller scale, one may consider Λ_mon, the family of linear and monotone filters defined as

Λ_mon = {λ = (λ_k)_{k∈N} ∈ ℓ² : 1 ≥ λ_1 ≥ λ_2 ≥ · · · ≥ 0}.

The set Λ_mon contains the linear and monotone filters and covers most of the existing linear procedures, such as the spectral cut-off, Tikhonov, Pinsker or Landweber filters (see, for instance, Engl et al. 1996 or Bissantz et al. 2007). Some oracle inequalities have already been obtained on specific subsets of Λ_mon in Cavalier and Golubev (2006) and Marteau (in press), but we would like to consider the whole family at the same time.
The set Λ_mon is always rather large and obtaining an explicit estimator in this setting seems difficult. A possible alternative is to consider a set that contains elements presenting a behaviour similar to the best one in Λ_mon, but where an estimator can be explicitly constructed. In this sense, the collection of blockwise constant estimators may be a good choice. In the sequel, this family will be identified with the set

Λ* = {λ ∈ ℓ² : 0 ≤ λ_k ≤ 1, λ_k = λ_{K_j} for k ∈ [K_j, K_{j+1} − 1], j = 0, ..., J, λ_k = 0 for k > N},

where J, N and (K_j)_{j=0,...,J} are such that K_0 = 1, K_J = N + 1 and K_{j+1} > K_j. In the following, we will also use the notations I_j = {k ∈ [K_{j−1}, K_j − 1]} and T_j = K_j − K_{j−1}, for all j ∈ {1, ..., J}.

In the following, most of the results are established for a general construction of Λ*. There exist several choices that may lead to interesting results. Typically, N → +∞ as ϵ → 0: it is chosen in order to capture most of the nonparametric functions with a controlled bias (see Equation (17) for an example). Concerning the sizes (T_j)_{j=1,...,J} of the blocks, we refer to Cavalier and Tsybakov (2001) for several examples.

The family Λ* can easily be handled. In particular, each block I_j can be considered independently of the other ones, which simplifies considerably the study of the considered estimators. Moreover, for all θ ∈ ℓ²,

R(θ, λ_mon) = inf_{λ∈Λ_mon} R(θ, λ) and R(θ, λ_0) = inf_{λ∈Λ*} R(θ, λ)  (8)

are in fact rather close, subject to some reasonable constraints on the sequences (b_k)_{k∈N} and (T_j)_{j=1,...,J} (see Section 4 or Cavalier and Tsybakov 2002 for more details).

The extension of the RHM principle to the family Λ* presents other advantages. The related estimator corresponds indeed to a threshold scheme. Hence, we will be able to address the question of the choice of the threshold through the risk hull approach. This may be a new perspective for the blockwise constant adaptive approach, and more generally for this class of regularisation procedures.

2.3. A risk hull for Λ*

First, we introduce some notations.
For all j ∈ {1, ..., J}, let η_j be defined as follows:

η_j = ϵ² Σ_{k∈I_j} b_k^{-2}(ξ_k² − 1).  (9)

The random variable η_j plays a central role in blockwise constant estimation. It corresponds to the main stochastic part of the loss on each block I_j. The hull proposed in Theorem 2.1 is constructed in order to contain these terms. Introduce also

ρ_ϵ = max_{j∈{1,...,J}} Δ_j^{1/2} and θ²_(j) = Σ_{k∈I_j} θ_k², j ∈ {1, ..., J},  (10)

with Δ_j = max_{k∈I_j} ϵ²b_k^{-2}/σ_j² and σ_j² = ϵ² Σ_{k∈I_j} b_k^{-2}. We will see that ρ_ϵ → 0 as ϵ → 0 with appropriate choices of blocks and minor assumptions on the sequence (b_k)_{k∈N} (see Section 4 for more details).
From now on, we are able to present a hull for the family Λ*, that is, a deterministic functional (V(θ, λ))_{λ∈Λ*} verifying

E_θ sup_{λ∈Λ*} {‖θ̂_λ − θ‖² − V(θ, λ)} ≤ 0.

The proof of the following result is postponed to Section 5.2.

Theorem 2.1  Let (pen_j)_{j=1,...,J} be a positive sequence verifying

Σ_{j=1}^{J} E[η_j − pen_j]_+ ≤ C₁ϵ²,  (11)

for some positive constant C₁. Then, there exists B > 0 such that

V(θ, λ) = (1 + Bρ_ϵ) Σ_{j=1}^{J} [(1 − λ_j)²θ²_(j) + λ_j²σ_j² + 2λ_j pen_j] + Σ_{k>N} θ_k² + C₁ϵ² + Bρ_ϵ R(θ, λ_0)  (12)

is a risk hull on Λ*, where λ_j denotes the common value of λ_k on the block I_j.

Theorem 2.1 states in fact that the penalised quadratic risk

R_pen(θ, λ) = Σ_{j=1}^{J} [(1 − λ_j)²θ²_(j) + λ_j²σ_j² + 2λ_j pen_j] + Σ_{k>N} θ_k²  (13)

is, up to some constants and residual terms, a risk hull on the family Λ*. Hence, we will use R_pen(θ, λ) as a criterion for the construction of a data-driven filter on Λ*, provided that inequality (11) is satisfied (see Section 3). The construction of a hull can thus be reduced to the choice of a penalty (pen_j)_{j=1,...,J} verifying Equation (11). A brief discussion concerning this assumption is presented in Section 3; some examples of penalties are presented in Section 4.

3. Oracle inequalities

In Section 2, we have proposed a family of hulls indexed by the penalty (pen_j)_{j=1,...,J}. In this section, we are interested in the performances of the estimators constructed from these hulls. In the sequel, we write λ_j for the constant value of the filter λ on the block I_j, for all j ∈ {1, ..., J}. This is a slight abuse of notation, but the meaning will be clear from the context. Then define

U_pen(y, λ) = Σ_{j=1}^{J} [(λ_j² − 2λ_j)(ỹ²_(j) − σ_j²) + λ_j²σ_j² + 2λ_j pen_j],

where

ỹ²_(j) = Σ_{k∈I_j} b_k^{-2} y_k², j ∈ {1, ..., J}.

The term U_pen(y, λ) is an estimator of the penalised quadratic risk R_pen(θ, λ) defined in Equation (13). Recall that from Theorem 2.1, R_pen(θ, λ) is, up to some constant and residual
terms, a risk hull. Let θ* denote the estimator associated with the filter

λ* = arg min_{λ∈Λ*} U_pen(y, λ).  (14)

Using simple algebra, we can prove that the solution of Equation (14) is

λ*_k = (1 − (σ_j² + pen_j)/ỹ²_(j))_+ for k ∈ I_j, j = 1, ..., J, and λ*_k = 0 for k > N.  (15)

This filter behaves as follows. For all j ∈ {1, ..., J}, λ*_j compares the term ỹ²_(j) with σ_j² + pen_j. When θ²_(j) is small (or even equal to 0), this comparison may lead to wrong decisions. Indeed, ỹ²_(j) is in this case close to σ_j² + η_j. The variance of the variables η_j is very large since b_k → 0 as k → +∞. Fortunately, these variables are uniformly bounded by the penalty in the sense of Equation (11). Hence, λ*_j should be close to 0 for small θ²_(j). Theorem 3.1 makes this heuristic discussion precise through a simple oracle inequality. Remark that the particular case pen_j = 0 for all j ∈ {1, ..., J} leads to the URE approach; inequality (11) does not hold in this setting.

Theorem 3.1  Let θ* be the estimator associated with the filter λ*. Assume that inequality (11) holds. Then, there exists C > 0 independent of ϵ such that, for all θ ∈ ℓ² and any 0 < ϵ < 1,

E_θ‖θ* − θ‖² ≤ (1 + τ_ϵ) inf_{λ∈Λ*} R(θ, λ) + Cϵ²,

where τ_ϵ → 0 as ϵ → 0 provided max_j pen_j/σ_j² → 0 and ρ_ϵ → 0 as ϵ → 0.

Although this result is rather general, the constraints on the blocks and the penalty are only expressed through one inequality (here Equation (11)). This is one of the advantages of the RHM approach. For the particular choice pen_j = ϕσ_j², leading to the penalised blockwise Stein rule, we obtain a simpler assumption than in Cavalier and Tsybakov (2002). This is an interesting outcome.

We conclude this section with an oracle inequality on Λ_mon, the family of monotone filters. We take advantage of the closeness between Λ* and Λ_mon under specific conditions. For the sake of convenience, we restrict ourselves to one specific type of blocks. Let ν_ϵ = log ϵ^{-1} and κ_ϵ = log^{-1} ν_ϵ and, for all x ∈ R, let ⌈x⌉ denote the minimal integer strictly greater than x.
Define the sequence (T_j)_{j=1,...,J} by

T_1 = ⌈ν_ϵ⌉, T_j = ⌈ν_ϵ(1 + κ_ϵ)^{j−1}⌉, j > 1,  (16)

and the bandwidth J as

J = min{j : K_j > N}, with N = max{m : Σ_{k=1}^{m} b_k^{-2} ≤ ϵ^{-2}κ_ϵ³}.  (17)

Corollary 3.2  Assume that (b_k)_{k∈N} ≍ (k^{-β})_{k∈N} for some β > 0 and that the sequence (pen_j)_{j=1,...,J} satisfies inequality (11). Then, for any θ ∈ ℓ² and 0 < ϵ < ϵ₁, we have

E_θ‖θ* − θ‖² ≤ (1 + Γ_ϵ) inf_{λ∈Λ_mon} R(θ, λ) + C₂ϵ²,

where C₂, ϵ₁ denote positive constants independent of ϵ, and Γ_ϵ → 0 as ϵ → 0.
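Under the reconstruction of (16) and (17) given above, the weakly geometric blocks can be generated as follows; b_k = k^{-β} is the polynomial case, and the exact rounding conventions are assumptions of this sketch:

```python
import math

# Weakly geometric blocks: nu = log(1/eps), kappa = 1/log(nu),
# T_1 = ceil(nu), T_j = ceil(nu (1+kappa)^(j-1));
# bandwidth N = max{m : sum_{k<=m} b_k^{-2} <= eps^{-2} kappa^3}.
def stein_blocks(eps, beta):
    nu = math.log(1.0 / eps)
    kappa = 1.0 / math.log(nu)
    bound = kappa ** 3 / eps ** 2
    total, N = 0.0, 0
    while total + (N + 1) ** (2 * beta) <= bound:   # b_k^{-2} = k^{2*beta}
        total += (N + 1) ** (2 * beta)
        N += 1
    K = [1]                       # block boundaries K_0 < K_1 < ...
    j = 1
    while K[-1] <= N:             # stop once K_J > N
        K.append(K[-1] + math.ceil(nu * (1 + kappa) ** (j - 1)))
        j += 1
    return N, K

N, K = stein_blocks(eps=0.01, beta=1.0)
```

The block sizes grow slowly (by the factor 1 + κ_ϵ), so the variability within each block stays under control while the number of blocks remains logarithmic in N.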
The proof is a direct consequence of Lemma 1 of Cavalier and Tsybakov (2002). It can in fact be extended to other constructions of blocks: one has only to verify that

max_{j=1,...,J−1} σ²_{j+1}/σ_j² ≤ 1 + η_ϵ, for some 0 < η_ϵ < 1/2.

The inequality is sharp if η_ϵ → 0 as ϵ → 0. The interested reader can refer to Cavalier and Tsybakov (2001) for some examples of blocks. The results obtained in this section hold for a wide range of penalties. This range is characterised and studied in the next section.

4. Some choices of penalty

In this section, we present two possible choices of penalty satisfying inequality (11). Then, we present a brief discussion on the range in which this sequence can be chosen. The goal of this section is not to say what a good penalty should be: this question is rather ambitious and may require more than a single paper. Our aim here is rather to present some hints on the way it could be chosen and on the related problems.

For the sake of convenience, we use in this section the same framework as in Corollary 3.2. We assume that the sequence of eigenvalues possesses a polynomial behaviour, that is, (b_k)_{k∈N} ≍ (k^{-β})_{k∈N} for some β > 0. Concerning the set Λ*, we consider the weakly geometrically increasing blocks defined in Equations (16) and (17). All the results presented in the sequel hold for other constructions (see, for instance, Cavalier and Tsybakov 2001); we leave the proofs to the interested reader. Concerning the sequence (b_k)_{k∈N}, the relaxation of the assumption of polynomial behaviour is not straightforward. In particular, considering exponentially decreasing eigenvalues requires a specific treatment in this setting.

Let u and v be two real sequences. Here, and in the sequel, for all k ∈ N, we write u ≲ v if we can find a positive constant C independent of k such that u_k ≤ Cv_k, and u ≍ v if both u ≲ v and v ≲ u. Since the sequence (b_k)_{k∈N} possesses a polynomial behaviour, we can write that, for all j ∈ {1, ..., J},

σ_j² = ϵ² Σ_{k∈I_j} b_k^{-2} ≍ ϵ²K_j^{-2β}(K_{j+1} − K_j)

and, since K_{j+1}/K_j → 1 as j → +∞,
Δ_j ≍ K_{j+1}^{2β}/(K_j^{2β}(K_{j+1} − K_j)) ≍ (K_{j+1} − K_j)^{-1}.

4.1. Examples

The following lemma provides upper bounds on the term E[η_j − pen_j]_+ and makes the behaviour of the penalty more explicit. It can be used to prove inequality (11) in several situations.

Lemma 4.1  For all j ∈ {1, ..., J} and δ such that 0 < δ < ϵ^{-2}b²_{K_j−1}/2,

E[η_j − pen_j]_+ ≤ δ^{-1} exp{−δ pen_j + δ²Σ_j²/2 + 4δ³ Σ_{k∈I_j} ϵ⁶b_k^{-6}/(1 − 2δϵ²b^{-2}_{K_j−1})},
with

Σ_j² = ϵ⁴ Σ_{k∈I_j} b_k^{-4}, j ∈ {1, ..., J}.  (18)

Proof  Let j ∈ {1, ..., J} be fixed. First, remark that for all δ > 0,

E_θ[η_j − pen_j]_+ = ∫_{pen_j}^{+∞} P(η_j ≥ t) dt = ∫_{pen_j}^{+∞} P(exp(δη_j) ≥ e^{δt}) dt ≤ δ^{-1} e^{-δ pen_j} E_θ exp(δη_j).

Then, provided 0 < δ < ϵ^{-2}b²_{K_j−1}/2,

E_θ exp(δη_j) ≤ exp{δ²Σ_j²/2 + 4δ³ Σ_{k∈I_j} ϵ⁶b_k^{-6}/(1 − 2ϵ²δb^{-2}_{K_j−1})}.

This concludes the proof.  □

The principle of RHM leads to an interesting choice. The only restriction on (pen_j)_{j=1,...,J} from the risk hull point of view is expressed through inequality (11) as follows:

Σ_{j=1}^{J} E_θ[η_j − pen_j]_+ ≤ C₁ϵ²,

for some positive constant C₁. Since E_θ[η_j − u]_+ ≤ E_θ η_j 1_{η_j≥u} for all positive u, we may be interested in the penalty

pen*_j = (1 + α)U_j, with U_j = inf{u : E_θ η_j 1_{η_j≥u} ≤ ϵ²}, j ∈ {1, ..., J},  (19)

for some α > 0. This penalty is an extension of the sequence proposed by Cavalier and Golubev (2006) for spectral cut-off schemes. The next corollary establishes that the sequence (pen*_j)_{j=1,...,J} is a relevant choice for the penalty: we obtain a sharp oracle inequality for the related estimator. In particular, inequality (11) holds, i.e. the penalty contains the variability of the problem.

Corollary 4.2  Let θ* be the estimator introduced in Equation (15) with the penalty (pen*_j)_{j=1,...,J}. Then,

E_θ‖θ* − θ‖² ≤ (1 + γ_ϵ) inf_{λ∈Λ*} R(θ, λ) + (C₄/α)ϵ²,  (20)

where C₄ denotes a positive constant independent of ϵ and γ_ϵ = o(1) as ϵ → 0.

Proof  We use the following lower bound on U_j:

U_j = inf{u : E_θ η_j 1_{η_j≥u} ≤ ϵ²} ≥ (2Σ_j² log(Cσ_j⁴/Σ_j²))^{1/2},  (21)

where for all j ∈ {1, ..., J}, Σ_j² is defined in Equation (18). This bound can be directly derived from Lemma 1 of Cavalier and Golubev (2006). Then, thanks to Theorems 2.1 and 3.1, we
only have to prove that inequality (11) holds, since max_j pen*_j/σ_j² converges to 0 as ϵ → 0. For all j ∈ {1, ..., J}, using Lemma 4.1,

E[η_j − pen*_j]_+ ≤ δ^{-1} exp{−δ pen*_j + δ²Σ_j²/2 + 4δ³ Σ_{k∈I_j} ϵ⁶b_k^{-6}/(1 − 2δϵ²b^{-2}_{K_j−1})},  (22)

for all 0 < δ < ϵ^{-2}b²_{K_j−1}/2. Setting

δ = (2 log(Cσ_j⁴/Σ_j²)/Σ_j²)^{1/2},

and using Equation (21), we obtain

E[η_j − pen*_j]_+ ≤ (Σ_j/(2 log(Cσ_j⁴/Σ_j²))^{1/2}) exp{log(Cσ_j⁴/Σ_j²) − 2(1 + α) log(Cσ_j⁴/Σ_j²)} ≤ Cϵ² log^{-1/2}(CT_j) exp{−α log(CT_j)}.

Indeed, provided Equations (16) and (17) hold, δϵ²b^{-2}_{K_j−1} and the last term in the exponential in Equation (22) converge to 0 as j → +∞. Hence, we eventually obtain

Σ_{j=1}^{J} E[η_j − pen*_j]_+ ≤ Cϵ² Σ_{j≥1} log^{-1/2}(CT_j) exp{−α log(CT_j)} ≤ Cϵ² Σ_{j≥1} exp{−α log(Cν_ϵ(1 + κ_ϵ)^{j−1})} ≤ Cϵ² Σ_{j≥1} exp{−αDj} ≤ Cϵ²/α,

where D and C denote two positive constants independent of ϵ. This concludes the proof of Corollary 4.2.  □

The penalty (19) is not explicit. Nevertheless, it can be computed using a Monte Carlo approximation: there are only J terms to compute. Remark that it is also possible to deal with the lower bound (21), which is explicit. The theoretical results are essentially the same, since we use this bound in the proof of Corollary 4.2.

Now, we consider the penalty introduced in Cavalier and Tsybakov (2002). For all j ∈ {1, ..., J}, it is defined as follows:

pen^CT_j = γσ_j², with 0 < γ < 1/2.

Remark that, with our assumptions on Λ* and (b_k)_{k∈N},

pen^CT_j ≍ ϵ²K_j^{-2β}(K_{j+1} − K_j) ≳ γ pen*_j.

This inequality entails that Equation (11) is satisfied. Hence, an oracle inequality similar to Equation (20) can be obtained for this sequence. This is the same result as in Cavalier and Tsybakov (2002); however, we construct a different proof, thanks to the RHM approach.
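The Monte Carlo computation of the penalty (19) mentioned above can be sketched as follows; the block weights, sample size and the helper name `penalty_mc` are illustrative assumptions, not from the paper:

```python
import numpy as np

# pen*_j = (1+alpha) U_j with U_j = inf{u : E[eta_j 1{eta_j >= u}] <= eps^2},
# where eta_j = eps^2 sum_{k in I_j} b_k^{-2} (xi_k^2 - 1).
def penalty_mc(b2inv_block, eps, alpha, n_mc=100_000, seed=0):
    rng = np.random.default_rng(seed)
    w = eps ** 2 * np.asarray(b2inv_block, dtype=float)
    eta = (rng.standard_normal((n_mc, w.size)) ** 2 - 1.0) @ w
    eta_desc = np.sort(eta)[::-1]                # candidate thresholds u
    tail = np.cumsum(eta_desc) / n_mc            # estimates E[eta 1{eta >= u}]
    over = np.nonzero(tail > eps ** 2)[0]        # tail exceeds eps^2 here
    U = eta_desc[over[0]] if over.size else 0.0  # smallest admissible u
    return (1.0 + alpha) * U

# one hypothetical block with b_k^{-2} = k^2 for k = 10..19
pen = penalty_mc([k ** 2 for k in range(10, 20)], eps=0.1, alpha=1.0)
```

Sorting the simulated draws of η_j lets the tail expectation E[η_j 1_{η_j ≥ u}] be evaluated at every sample point in one cumulative sum, so the threshold search costs O(n_mc log n_mc) per block.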
4.2. The range

Theorem 3.1 provides in fact an admissible range for the penalty. If we want a sharp oracle inequality, necessarily max_j pen_j/σ_j² → 0 as ϵ → 0: the penalty should not be too large. At the same time, we require from inequality (11) that the penalty contains, in a certain sense, the variables (η_j)_{j=1,...,J}: small penalties will not be convenient. From inequality (11) and Lemma 4.1, the sequence (pen_j)_{j=1,...,J} should at least fulfil pen_j ≳ Σ_j for all j ∈ {1, ..., J}. Since we require at the same time max_j pen_j/σ_j² → 0 as ϵ → 0, an admissible penalty in the sense of Theorem 3.1 should satisfy

Σ_j ≲ pen_j ≪ σ_j², j ∈ {1, ..., J}.  (23)

With our assumptions, this range is equivalent to

ϵ²K_j^{-2β}(K_{j+1} − K_j)^{1/2} ≲ pen_j ≪ ϵ²K_j^{-2β}(K_{j+1} − K_j), j ∈ {1, ..., J}.

It is possible to prove that an oracle inequality similar to Equation (20) holds for every penalty (pen_j)_{j=1,...,J} satisfying Equation (23). This is in particular the case for (pen*_j)_{j=1,...,J} and (pen^CT_j)_{j=1,...,J}. Using the same bounds as in the proof of Corollary 4.2, it seems difficult to obtain a sharp oracle inequality with the penalty (Σ_j)_{j=1,...,J}. Nevertheless, the range (23) is derived from upper bounds on the estimator θ* and may certainly be refined. A lower bound approach may perhaps produce interesting results (see, for instance, Efromovich 2007).

In order to conclude this discussion, it would be interesting to compare the two penalties presented in this section. Remark that (pen*_j)_{j=1,...,J} is closer to the lower bound of the range than (pen^CT_j)_{j=1,...,J}. However, we do not claim that one penalty is better than another. This is an interesting but very difficult question that should be addressed in a separate paper.

4.3. Conclusion

The main contribution of this article is an extension of the RHM method to the family Λ* and a link between the penalised blockwise Stein rule and the risk hull approach. It is rather surprising that a threshold estimator may be studied via some tools usually related to a parameter selection setting.
In any case, this approach allows us to develop a general study of this threshold estimator. In particular, we impose a simple assumption on the threshold that is related to the variance of the η_j in each block. This treatment may certainly be applied to other existing adaptive approaches. For instance, one may be interested in wavelet threshold estimators in a wavelet-vaguelette decomposition framework (Donoho 1995). The generalisation of our work to this setting is not straightforward since the blocks are of size one. Nevertheless, this approach may provide some new interesting perspectives.

In order to conclude this article, it seems necessary to discuss the role played by the constant α in the penalty (pen*_j)_{j=1,...,J}. Inequality (11) does not hold for α = 0. On the other hand, the proof of Corollary 4.2 indicates that large values of α will not lead to an accurate recovery. The choice of α has already been discussed and illustrated via some numerical simulations in a slightly different setting; see Cavalier and Golubev (2006) or Marteau (in press) for more details. Remark that we do not require α to be greater than one in this article. This is a small difference compared with the constraints expressed in a regularisation parameter choice scheme. It can be explained by the blockwise structure of the variables (η_j)_{j=1,...,J}.
5. Proofs and technical lemmas

5.1. Ordered processes

Ordered processes were introduced in Kneip (1994). In Cao and Golubev (2006), these processes are studied in detail and very interesting tools are provided. These stochastic objects may play an important role in adaptive estimation; see in particular Golubev (2007) or Marteau (in press) for more details. The aim of this section is not to provide an exhaustive presentation of this theory but rather to introduce some definitions and useful properties.

Definition 5.1  Let ζ(t), t ≥ 0, be a separable random process with Eζ(t) = 0 and finite variance σ²(t). It is called ordered if, for all t₂ ≥ t₁ ≥ 0,

σ²(t₂) ≥ σ²(t₁) and E[ζ(t₂) − ζ(t₁)]² ≤ σ²(t₂) − σ²(t₁).

Let ζ₀ be a standard Gaussian random variable. The process t ↦ ζ₀t is the simplest example of an ordered process. Wiener processes are also covered by Definition 5.1. The family of ordered processes is in fact quite large.

Assumption C1  There exists κ > 0 such that

ϕ(κ) = sup_{t₁,t₂} E exp{κ |ζ(t₁) − ζ(t₂)| / (E[ζ(t₁) − ζ(t₂)]²)^{1/2}} < +∞.

This assumption is not very restrictive: several processes encountered in linear estimation satisfy it. The proof of the following result can be found in Cao and Golubev (2006).

Lemma 5.2  Let ζ(t), t ≥ 0, be an ordered process satisfying ζ(0) = 0 and Assumption C1. There exists a constant C = C(κ) such that, for all γ > 0,

E sup_{t≥0} [ζ(t) − γσ²(t)]_+ ≤ C/γ.

This lemma is rather important in the theory of ordered processes and leads to several interesting results. In particular, the following corollary will often be used in the proofs.

Corollary 5.3  Let ζ(t), t ≥ 0, be an ordered process satisfying ζ(0) = 0 and Assumption C1. Consider t̂ measurable with respect to ζ. Then, there exists C = C(κ) > 0 such that

Eζ(t̂) ≤ C (Eσ²(t̂))^{1/2}.

Proof  Let γ > 0 be fixed. Using Lemma 5.2,

Eζ(t̂) = E[ζ(t̂) − γσ²(t̂)] + γEσ²(t̂) ≤ E sup_{t≥0}[ζ(t) − γσ²(t)]_+ + γEσ²(t̂) ≤ C/γ + γEσ²(t̂).

Choose γ = (Eσ²(t̂))^{-1/2} in order to conclude the proof.  □
5.2. Proofs

Proof of Theorem 2.1  First, remark that

E_θ sup_{λ∈Λ*} {‖θ̂_λ − θ‖² − V(θ, λ)}
= E_θ sup_{λ∈Λ*} {Σ_{k=1}^{+∞} (1 − λ_k)²θ_k² + ϵ² Σ_{k=1}^{+∞} λ_k²b_k^{-2}ξ_k² − 2ϵ Σ_{k=1}^{+∞} λ_k(1 − λ_k)θ_kb_k^{-1}ξ_k − V(θ, λ)}
= E_θ sup_{λ∈Λ*} {Σ_{j=1}^{J} [(1 − λ_j)²θ²_(j) + λ_j² ϵ² Σ_{k∈I_j} b_k^{-2}ξ_k² − 2λ_j(1 − λ_j)X_j] + Σ_{k>N} θ_k² − V(θ, λ)}
= E_θ {Σ_{j=1}^{J} [(1 − λ̂_j)²θ²_(j) + λ̂_j² ϵ² Σ_{k∈I_j} b_k^{-2}ξ_k² + 2λ̂_j(λ̂_j − 1)X_j] + Σ_{k>N} θ_k²} − E_θ V(θ, λ̂),

with

λ̂ = arg sup_{λ∈Λ*} {‖θ̂_λ − θ‖² − V(θ, λ)}

and

X_j = ϵ Σ_{k∈I_j} θ_kb_k^{-1}ξ_k, j ∈ {1, ..., J}.  (24)

Let j ∈ {1, ..., J} be fixed. Use the decomposition

E_θ 2λ̂_j(λ̂_j − 1)X_j = E_θ λ̂_j²X_j + E_θ (λ̂_j² − 2λ̂_j)X_j = E_θ λ̂_j²X_j + E_θ (1 − λ̂_j)²X_j = A_j¹ + A_j²,  (25)

since E_θX_j = 0. First, consider A_j¹. Let λ_j⁰ denote the blockwise constant oracle on the block I_j. Using Corollary 5.3 of Section 5.1,

A_j¹ = E_θ λ̂_j²X_j = E_θ [λ̂_j² − (λ_j⁰)²]X_j ≤ C (E_θ[λ̂_j² − (λ_j⁰)²]² Σ_{k∈I_j} ϵ²b_k^{-2}θ_k²)^{1/2},

where C > 0 denotes a positive constant. Indeed, both processes ζ : t ↦ (t² − (λ_j⁰)²)X_j, t ∈ [(λ_j⁰)², 1], and ζ : t ↦ (t² − (λ_j⁰)²)X_j, t ∈ [(λ_j⁰)^{-1}, +∞), are ordered and satisfy Assumption C1. For all γ > 0, use

[λ̂_j² − (λ_j⁰)²]² ≤ 4[(1 − λ̂_j)² + (1 − λ_j⁰)²](λ̂_j² + (λ_j⁰)²),

and the Cauchy–Schwarz and Young inequalities:

A_j¹ ≤ C (E_θ[(1 − λ̂_j)² + (1 − λ_j⁰)²](λ̂_j² + (λ_j⁰)²) max_{k∈I_j} ϵ²b_k^{-2} θ²_(j))^{1/2}
≤ CE_θ[γ(1 − λ̂_j)²θ²_(j) + γ^{-1}Δ_jλ̂_j²σ_j²] + Cγ(1 − λ_j⁰)²θ²_(j) + Cγ^{-1}Δ_j(λ_j⁰)²σ_j² + C(E_θ(1 − λ̂_j)²λ̂_j² max_{k∈I_j} ϵ²b_k^{-2} θ²_(j))^{1/2},  (26)
for some positive constant C. The bound of the last term on the r.h.s. of Equation (26) requires some work. First, suppose that

θ²_(j) ≤ σ_j².  (27)

If Equation (27) holds, then, for all γ > 0,

E_θ(1 − λ̂_j)²λ̂_j² max_{k∈I_j} ϵ²b_k^{-2} θ²_(j) ≤ θ²_(j) E_θ λ̂_j² max_{k∈I_j} ϵ²b_k^{-2},

and

θ²_(j) = (σ_j²θ²_(j)/(σ_j² + θ²_(j))) (1 + θ²_(j)/σ_j²) ≤ 2{(1 − λ_j⁰)²θ²_(j) + (λ_j⁰)²σ_j²},

where λ_j⁰ is the oracle defined in Equation (8). Indeed,

λ_j⁰ = θ²_(j)/(σ_j² + θ²_(j)), j ∈ {1, ..., J}.

By the Young inequality, the corresponding square-root term in (26) is then bounded by γθ²_(j) + γ^{-1}Δ_j E_θλ̂_j²σ_j². Now, suppose

θ²_(j) > σ_j².  (28)

Then, for all γ > 0,

E_θ(1 − λ̂_j)²λ̂_j² max_{k∈I_j} ϵ²b_k^{-2} θ²_(j) ≤ max_{k∈I_j} ϵ²b_k^{-2} E_θ(1 − λ̂_j)²θ²_(j),

and

σ_j² = (σ_j²θ²_(j)/(σ_j² + θ²_(j)))(1 + σ_j²/θ²_(j)) ≤ 2{(1 − λ_j⁰)²θ²_(j) + (λ_j⁰)²σ_j²}.

Using Equation (28), the corresponding square-root term in (26) is bounded by γE_θ(1 − λ̂_j)²θ²_(j) + γ^{-1}Δ_jσ_j². Setting γ = Δ_j^{1/2}, we eventually obtain

A_j¹ ≤ CΔ_j^{1/2} E_θ[(1 − λ̂_j)²θ²_(j) + λ̂_j²σ_j²] + CΔ_j^{1/2}[(1 − λ_j⁰)²θ²_(j) + (λ_j⁰)²σ_j²],  (29)

for some constant C > 0 independent of ϵ. The same bound holds for the term A_j² in Equation (25). Hence, there exists B > 0 independent of ϵ such that

E_θ sup_{λ∈Λ*} {‖θ̂_λ − θ‖² − V(θ, λ)}
≤ E_θ {Σ_{j=1}^{J} [(1 + Bρ_ϵ)(1 − λ̂_j)²θ²_(j) + λ̂_j² ϵ² Σ_{k∈I_j} b_k^{-2}ξ_k² + Bρ_ϵλ̂_j²σ_j²] + Σ_{k>N} θ_k² + Bρ_ϵ R(θ, λ_0)} − E_θ V(θ, λ̂)
≤ E_θ sup_{λ∈Λ*} {Σ_{j=1}^{J} [(1 + Bρ_ϵ)(1 − λ_j)²θ²_(j) + λ_j² ϵ² Σ_{k∈I_j} b_k^{-2}ξ_k² + Bρ_ϵλ_j²σ_j²] + Σ_{k>N} θ_k² + Bρ_ϵ R(θ, λ_0) − V(θ, λ)},

where ρ_ϵ is defined in Equation (10). Now, using Equations (9) and (12),

E_θ sup_{λ∈Λ*} {‖θ̂_λ − θ‖² − V(θ, λ)} ≤ E_θ sup_{λ∈Λ*} Σ_{j=1}^{J} [λ_j²η_j − 2λ_j pen_j] − C₁ϵ² ≤ Σ_{j=1}^{J} E_θ sup_{λ_j∈[0,1]} [λ_j²η_j − 2λ_j pen_j] − C₁ϵ².

Let j ∈ {1, ..., J} be fixed. We are looking for λ ∈ [0, 1] that maximises the quantity λ²η_j − 2λ pen_j. If η_j < 0, the function λ ↦ λ²η_j − 2λ pen_j is concave and the maximum on [0, 1] is attained at λ = 0. Now, if η_j > 0, the function λ ↦ λ²η_j − 2λ pen_j is convex and the maximum on [0, 1] is attained at 0 or 1. Therefore,

sup_{λ∈[0,1]} (λ²η_j − 2λ pen_j) = [η_j − 2pen_j]_+.  (30)

Using inequality (11), we eventually obtain

E_θ sup_{λ∈Λ*} {‖θ̂_λ − θ‖² − V(θ, λ)} ≤ Σ_{j=1}^{J} E_θ[η_j − 2pen_j]_+ − C₁ϵ² ≤ Σ_{j=1}^{J} E_θ[η_j − pen_j]_+ − C₁ϵ² ≤ 0.

This concludes the proof of Theorem 2.1.  □

Remark  Using the same algebra as in the proof of Theorem 2.1, it is possible to prove that

E_θ sup_{λ∈Λ*} {‖θ̂_λ − θ‖² − W(θ, λ)} ≤ 0,  (31)

where

W(θ, λ) = (1 + Bρ_ϵ) Σ_{j=1}^{J} [(1 − λ_j)²θ²_(j) + λ_j²σ_j² + λ_j² pen_j] + Σ_{k>N} θ_k² + C₁ϵ² + Bρ_ϵ R(θ, λ_0).

Hence, W(θ, λ) is also a risk hull. For all j ∈ {1, ..., J}, the only difference with V(θ, λ) is contained in the bound

sup_{λ∈[0,1]} {λ²η_j − λ² pen_j} ≤ [η_j − pen_j]_+.

Then, we use inequality (11) in order to obtain Equation (31).
Proof of Theorem 3.1

In the situation where inequality (11) holds, Equation (31) yields

$$\mathrm{E}_\theta\|\hat\theta^{\lambda^*}-\theta\|^2 \le \mathrm{E}_\theta W(\theta,\lambda^*) = (1+B\rho_\epsilon)\,\mathrm{E}_\theta R_{\mathrm{pen}}(\theta,\lambda^*) + B\rho_\epsilon R(\theta,\lambda^0) + C_1\epsilon^2, \qquad (32)$$

where

$$R_{\mathrm{pen}}(\theta,\lambda^*) = \sum_{j=1}^{J}[(1-\lambda_j^*)^2\bar\theta^2(j) + (\lambda_j^*)^2\bar\sigma^2(j) + (\lambda_j^*)^2\,\mathrm{pen}_j] + \sum_{k>N}\theta_k^2,$$

and $B$ denotes a positive constant independent of $\epsilon$. Moreover, from Equation (14),

$$U_{\mathrm{pen}}(y,\lambda^*) \le U_{\mathrm{pen}}(y,\lambda), \quad \forall\lambda.$$

The proof of Theorem 3.1 is mainly based on these two inequalities. First, remark that

$$U_{\mathrm{pen}}(y,\lambda^*) - R_{\mathrm{pen}}(\theta,\lambda^*) = \sum_{j=1}^{J}[\{(\lambda_j^*)^2-2\lambda_j^*\}(\|\tilde y(j)\|^2-\bar\sigma^2(j)) + (\lambda_j^*)^2\bar\sigma^2(j) + 2\lambda_j^*\,\mathrm{pen}_j - (1-\lambda_j^*)^2\bar\theta^2(j) - (\lambda_j^*)^2\bar\sigma^2(j) - (\lambda_j^*)^2\,\mathrm{pen}_j] - \sum_{k>N}\theta_k^2$$

$$= \sum_{j=1}^{J}[\{(\lambda_j^*)^2-2\lambda_j^*\}(\|\tilde y(j)\|^2-\bar\sigma^2(j)) - (1-\lambda_j^*)^2\bar\theta^2(j) + \{2\lambda_j^*-(\lambda_j^*)^2\}\,\mathrm{pen}_j] - \sum_{k>N}\theta_k^2$$

$$= \sum_{j=1}^{J}\left[\{(\lambda_j^*)^2-2\lambda_j^*\}\sum_{k\in I_j}\{\theta_k^2 + \epsilon^2 b_k^{-2}(\xi_k^2-1) + 2\epsilon b_k^{-1}\xi_k\theta_k\} - (1-\lambda_j^*)^2\bar\theta^2(j) + \{2\lambda_j^*-(\lambda_j^*)^2\}\,\mathrm{pen}_j\right] - \sum_{k>N}\theta_k^2$$

$$= \sum_{j=1}^{J}\{(\lambda_j^*)^2-2\lambda_j^*\}(\eta_j + 2X_j - \mathrm{pen}_j) - \|\theta\|^2,$$

where $\eta_j$ and $X_j$ are respectively defined in Equations (9) and (24). Hence, from Equation (14),

$$R_{\mathrm{pen}}(\theta,\lambda^*) = U_{\mathrm{pen}}(y,\lambda^*) + \|\theta\|^2 + \sum_{j=1}^{J}\{2\lambda_j^*-(\lambda_j^*)^2\}(\eta_j + 2X_j - \mathrm{pen}_j) \le U_{\mathrm{pen}}(y,\lambda^p) + \|\theta\|^2 + \sum_{j=1}^{J}\{2\lambda_j^*-(\lambda_j^*)^2\}(\eta_j + 2X_j - \mathrm{pen}_j),$$

where

$$\lambda^p = \arg\inf_\lambda R_{\mathrm{pen}}(\theta,\lambda)$$

and $R_{\mathrm{pen}}(\theta,\lambda)$ is defined in Equation (13). Then, with simple algebra,

$$\mathrm{E}_\theta U_{\mathrm{pen}}(y,\lambda^p) = \mathrm{E}_\theta\sum_{j=1}^{J}[\{(\lambda_j^p)^2-2\lambda_j^p\}(\|\tilde y(j)\|^2-\bar\sigma^2(j)) + (\lambda_j^p)^2\bar\sigma^2(j) + 2\lambda_j^p\,\mathrm{pen}_j] = R_{\mathrm{pen}}(\theta,\lambda^p) - \|\theta\|^2.$$

This leads to

$$\mathrm{E}_\theta R_{\mathrm{pen}}(\theta,\lambda^*) \le R_{\mathrm{pen}}(\theta,\lambda^p) + \mathrm{E}_\theta\sum_{j=1}^{J}\{2\lambda_j^*-(\lambda_j^*)^2\}(\eta_j + 2X_j - \mathrm{pen}_j). \qquad (33)$$

We are now interested in the behaviour of the right-hand side of Equation (33). First, using Equations (25)–(29) in the proof of Theorem 2.1,

$$\mathrm{E}_\theta\{2\lambda_j^*-(\lambda_j^*)^2\}X_j \le C\rho_\epsilon\{(1-\lambda_j^p)^2\bar\theta^2(j) + (\lambda_j^p)^2\bar\sigma^2(j)\} + \tilde C\rho_\epsilon\,\mathrm{E}_\theta\{(1-\lambda_j^*)^2\bar\theta^2(j) + (\lambda_j^*)^2\bar\sigma^2(j)\},$$

for all $j\in\{1,\dots,J\}$. Here, $C$ and $\tilde C$ denote positive constants independent of $\epsilon$. In particular, it is always possible to obtain $\tilde C$ verifying $\tilde C\rho_\epsilon < 1$ (see the proof of Theorem 2.1 for more details). Hence,

$$\mathrm{E}_\theta R_{\mathrm{pen}}(\theta,\lambda^*) \le (1+C\rho_\epsilon)R_{\mathrm{pen}}(\theta,\lambda^p) + \tilde C\rho_\epsilon\,\mathrm{E}_\theta R_{\mathrm{pen}}(\theta,\lambda^*) + \mathrm{E}_\theta\sum_{j=1}^{J}\{2\lambda_j^*-(\lambda_j^*)^2\}(\eta_j - \mathrm{pen}_j).$$

Then, from inequalities (11) and (30),

$$\mathrm{E}_\theta\sum_{j=1}^{J}\{2\lambda_j^*-(\lambda_j^*)^2\}(\eta_j - \mathrm{pen}_j) \le \mathrm{E}_\theta\sum_{j=1}^{J}[\eta_j - \mathrm{pen}_j]_+ \le C_1\epsilon^2.$$

This leads to

$$\mathrm{E}_\theta R_{\mathrm{pen}}(\theta,\lambda^*) \le (1+C\rho_\epsilon)R_{\mathrm{pen}}(\theta,\lambda^p) + \tilde C\rho_\epsilon\,\mathrm{E}_\theta R_{\mathrm{pen}}(\theta,\lambda^*) + C_1\epsilon^2,$$

i.e.

$$(1-\tilde C\rho_\epsilon)\,\mathrm{E}_\theta R_{\mathrm{pen}}(\theta,\lambda^*) \le (1+C\rho_\epsilon)R_{\mathrm{pen}}(\theta,\lambda^p) + C_1\epsilon^2,$$

and

$$\mathrm{E}_\theta R_{\mathrm{pen}}(\theta,\lambda^*) \le \frac{1+C\rho_\epsilon}{1-\tilde C\rho_\epsilon}\,R_{\mathrm{pen}}(\theta,\lambda^p) + C\epsilon^2. \qquad (34)$$

Using Equations (32) and (34),

$$\mathrm{E}_\theta\|\hat\theta^{\lambda^*}-\theta\|^2 \le (1+B\rho_\epsilon)\,\mathrm{E}_\theta R_{\mathrm{pen}}(\theta,\lambda^*) + C_1\epsilon^2 + B\rho_\epsilon R(\theta,\lambda^0) \le (1+\mu_\epsilon)R_{\mathrm{pen}}(\theta,\lambda^p) + C\epsilon^2 + B\rho_\epsilon R(\theta,\lambda^0),$$

where $\mu_\epsilon = \mu_\epsilon(\rho_\epsilon)$ is such that $\mu_\epsilon\to 0$ as $\rho_\epsilon\to 0$ and $C$ is a positive constant independent of $\epsilon$. In order to conclude the proof, we just have to compare $R(\theta,\lambda^0)$ with $R_{\mathrm{pen}}(\theta,\lambda^p)$. For all $j\in\{1,\dots,J\}$, introduce

$$R_{\mathrm{pen},j}(\theta,\lambda) = (1-\lambda_j)^2\bar\theta^2(j) + \lambda_j^2\bar\sigma^2(j) + 2\lambda_j\,\mathrm{pen}_j, \quad\text{and}\quad R_j(\theta,\lambda) = (1-\lambda_j)^2\bar\theta^2(j) + \lambda_j^2\bar\sigma^2(j).$$
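The final comparison between the penalised and unpenalised blockwise risks rests on a closed-form identity at the block oracle $\lambda_j^0 = \bar\theta^2(j)/(\bar\sigma^2(j)+\bar\theta^2(j))$, namely $R_{\mathrm{pen},j}(\theta,\lambda^0) = (1+2\,\mathrm{pen}_j/\bar\sigma^2(j))\,R_j(\theta,\lambda^0)$. A quick numerical sanity check of this algebra (illustrative sketch only; `theta2`, `sigma2` and `pen` stand in for the scalars $\bar\theta^2(j)$, $\bar\sigma^2(j)$ and $\mathrm{pen}_j$):

```python
import numpy as np

def block_oracle_risks(theta2, sigma2, pen):
    """Return R_pen,j and (1 + 2*pen/sigma2) * R_j, both evaluated at lambda_j^0."""
    lam0 = theta2 / (sigma2 + theta2)                   # block oracle lambda_j^0
    R = (1 - lam0) ** 2 * theta2 + lam0 ** 2 * sigma2   # R_j(theta, lambda^0)
    R_pen = R + 2 * lam0 * pen                          # R_pen,j(theta, lambda^0)
    return R_pen, (1 + 2 * pen / sigma2) * R

rng = np.random.default_rng(0)
for _ in range(1000):
    t2, s2, p = rng.uniform(0.01, 10.0, size=3)
    lhs, rhs = block_oracle_risks(t2, s2, p)
    assert np.isclose(lhs, rhs)
    # lambda_j^0 indeed minimises (1 - lam)^2 * t2 + lam^2 * s2 over [0, 1]
    grid = np.linspace(0.0, 1.0, 1001)
    R = (1 - t2 / (s2 + t2)) ** 2 * t2 + (t2 / (s2 + t2)) ** 2 * s2
    assert R <= np.min((1 - grid) ** 2 * t2 + grid ** 2 * s2) + 1e-9
```

The identity follows because the unpenalised terms collapse to $R_j(\theta,\lambda^0) = \bar\sigma^2(j)\bar\theta^2(j)/(\bar\sigma^2(j)+\bar\theta^2(j))$ and $2\lambda_j^0\,\mathrm{pen}_j = (2\,\mathrm{pen}_j/\bar\sigma^2(j))\,R_j(\theta,\lambda^0)$.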
Then,

$$R_{\mathrm{pen},j}(\theta,\lambda^p) \le R_{\mathrm{pen},j}(\theta,\lambda^0) = \frac{\bar\sigma^4(j)\,\bar\theta^2(j)}{(\bar\sigma^2(j)+\bar\theta^2(j))^2} + \frac{\bar\sigma^2(j)\,\bar\theta^4(j)}{(\bar\sigma^2(j)+\bar\theta^2(j))^2} + \frac{2\,\mathrm{pen}_j\,\bar\theta^2(j)}{\bar\sigma^2(j)+\bar\theta^2(j)} = \left(1 + \frac{2\,\mathrm{pen}_j}{\bar\sigma^2(j)}\right)R_j(\theta,\lambda^0),$$

since $R_{\mathrm{pen},j}(\theta,\lambda^p) \le R_{\mathrm{pen},j}(\theta,\lambda^0)$ from the definition of $\lambda^p$. This concludes the proof of Theorem 3.1.

Acknowledgements

The author would like to thank both an associate editor and two anonymous referees whose constructive comments and remarks considerably improved the paper.

References

Barron, A., Birgé, L., and Massart, P. (1999), Risk Bounds for Model Selection via Penalization, Probability Theory and Related Fields, 113.
Bissantz, N., Hohage, T., Munk, A., and Ruymgaart, F. (2007), Convergence Rates of General Regularization Methods for Statistical Inverse Problems and Applications, SIAM Journal on Numerical Analysis, 45.
Candès, E. (2006), Modern Statistical Estimation via Oracle Inequalities, Acta Numerica, 15.
Cao, Y., and Golubev, Y. (2006), On Adaptive Regression by Splines, Mathematical Methods of Statistics, 15.
Cavalier, L., and Golubev, Y. (2006), Risk Hull Method and Regularization by Projections of Ill-Posed Inverse Problems, The Annals of Statistics, 34.
Cavalier, L., and Tsybakov, A.B. (2001), Penalized Blockwise Stein's Method, Monotone Oracles and Sharp Adaptive Estimation, Mathematical Methods of Statistics, 3.
Cavalier, L., and Tsybakov, A.B. (2002), Sharp Adaptation for Inverse Problems With Random Noise, Probability Theory and Related Fields, 123.
Cavalier, L., Golubev, Y., Picard, D., and Tsybakov, A.B. (2002), Oracle Inequalities for Inverse Problems, The Annals of Statistics, 30.
Donoho, D.L. (1995), Nonlinear Solutions of Linear Inverse Problems by Wavelet-Vaguelette Decomposition, Journal of Applied and Computational Harmonic Analysis, 2.
Efromovich, S. (2007), A Lower Bound Oracle Inequality for a Blockwise-Shrinkage Estimate, Journal of Statistical Planning and Inference, 137.
Engl, H.W., Hanke, M., and Neubauer, A.
(1996), Regularization of Inverse Problems, Dordrecht: Kluwer Academic Publishers Group.
Ermakov, M.S. (1989), Minimax Estimation of the Solution of an Ill-Posed Convolution Type Problem, Problems of Information Transmission, 25.
Fan, J. (1991), On the Optimal Rates of Convergence for Nonparametric Deconvolution Problems, The Annals of Statistics, 19.
Golubev, Y. (2007), On Oracle Inequalities Related to High Dimensional Linear Models, in Topics in Stochastic Analysis and Nonparametric Estimation, IMA Proceedings, eds. P.L. Chow, B. Mordukhovich, and G. Yin, New York: Springer-Verlag, pp.
Hida, T. (1980), Brownian Motion, New York: Springer-Verlag.
Johnstone, I.M., and Silverman, B.W. (1990), Speed of Estimation in Positron Emission Tomography and Related Inverse Problems, The Annals of Statistics, 18.
Kneip, A. (1994), Ordered Linear Smoothers, The Annals of Statistics, 22.
Marteau, C. (in press), Risk Hull Method for Spectral Regularization in Linear Statistical Inverse Problems, ESAIM Probability and Statistics, DOI: /PS/
More informationA note on error estimation for hypothesis testing problems for some linear SPDEs
Stoch PDE: Anal Comp 214 2:48 431 DOI 1.17/s472-14-36-4 A note on error estimation for hypothesis testing problems for some linear SPDEs Igor Cialenco Liaosha Xu Received: 17 December 213 / Published online:
More information