A Data-Driven Block Thresholding Approach To Wavelet Estimation


T. Tony Cai 1 and Harrison H. Zhou 2

University of Pennsylvania and Yale University

Abstract

A data-driven block thresholding procedure for wavelet regression is proposed and its theoretical and numerical properties are investigated. The procedure empirically chooses the block size and threshold level at each resolution level by minimizing Stein's unbiased risk estimate. The estimator is sharp adaptive over a class of Besov bodies and achieves simultaneously within a small constant factor of the minimax risk over a wide collection of Besov bodies, including both the dense and sparse cases. The procedure is easy to implement. Numerical results show that it has superior finite sample performance in comparison to the other leading wavelet thresholding estimators.

Keywords: Adaptivity; Besov body; Block thresholding; James-Stein estimator; Nonparametric regression; Stein's unbiased risk estimate; Wavelets.

AMS 1991 Subject Classification: Primary 62G07, Secondary 62G20.

1 Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA. The research of Tony Cai was supported in part by NSF Grants DMS and DMS.
2 Department of Statistics, Yale University, New Haven, CT.

1 Introduction

Consider the nonparametric regression model

    y_i = f(t_i) + σ z_i,   i = 1, 2, ..., n,   (1)

where t_i = i/n, σ is the noise level and the z_i's are independent standard normal random variables. The goal is to estimate the unknown regression function f based on the sample {y_i}.

Wavelet methods have demonstrated considerable success in nonparametric regression. They achieve a high degree of adaptivity through thresholding of the empirical wavelet coefficients. Standard wavelet approaches threshold the empirical coefficients term by term based on their individual magnitudes. See, for example, Donoho and Johnstone (1994a), Gao (1998), and Antoniadis and Fan (2001). More recent work has demonstrated that block thresholding, which simultaneously keeps or kills all the coefficients in a group rather than individually, enjoys a number of advantages over conventional term-by-term thresholding. Block thresholding increases estimation precision by utilizing information about neighboring wavelet coefficients and allows the balance between variance and bias to be varied along the curve, which results in adaptive smoothing. The degree of adaptivity, however, depends on the choice of block size and threshold level.

The idea of block thresholding can be traced back to Efromovich (1985) in orthogonal series estimation. In the context of wavelet estimation, global level-by-level thresholding was discussed in Donoho and Johnstone (1995) for regression and in Kerkyacharian, Picard and Tribouley (1996) for density estimation. But these block thresholding methods are not local, so they do not enjoy a high degree of spatial adaptivity. Hall, Kerkyacharian and Picard (1999) introduced a local blockwise hard thresholding procedure with a block size of order log n, where n is the sample size. Cai (1999) considered blockwise James-Stein rules and investigated the effect of block size and threshold level on adaptivity using an oracle inequality approach. In particular it was shown that a block size of order log n is optimal in the sense that it leads to an estimator which is both globally and locally adaptive. Cai and Silverman (2001) considered overlapping block thresholding estimators and Chicken and Cai (2004) applied block thresholding to density estimation.

The block size and threshold level play important roles in the performance of a block thresholding estimator. The local block thresholding methods mentioned above all have fixed block size and threshold, and the same thresholding rule is applied to all resolution levels regardless of the distribution of the wavelet coefficients. In the present paper, we propose a data-driven approach to empirically select both the block size and threshold at individual

resolution levels. At each resolution level, the procedure, SureBlock, chooses the block size and threshold empirically by minimizing Stein's Unbiased Risk Estimate (SURE). By empirically selecting both the block size and threshold and allowing them to vary from resolution level to resolution level, the SureBlock estimator has significant advantages over the more conventional wavelet thresholding estimators with fixed block sizes.

We consider in the present paper both the numerical performance and the asymptotic properties of the SureBlock estimator. It is shown that the minimizer of SURE yields a near-optimal estimate of the wavelet coefficient vector at each resolution level, which in turn generates an adaptive nonparametric estimator of the regression function f. The theoretical properties of the SureBlock estimator are considered in the Besov sequence space formulation that is by now classical for the analysis of wavelet regression methods. Besov spaces, denoted by B^α_{p,q} and defined in Section 4, are a very rich class of function spaces which contain functions of homogeneous and inhomogeneous smoothness. The theoretical results show that the SureBlock estimator automatically adapts to the sparsity of the underlying wavelet coefficient sequence and enjoys excellent adaptivity over a wide range of Besov bodies. In particular, in the dense case (p ≥ 2) the SureBlock estimator is sharp adaptive over all Besov bodies B^α_{p,q}(M) with p = q = 2 and adaptively achieves within a factor of 1.25 of the minimax risk over Besov bodies B^α_{p,q}(M) for all p, q ≥ 2. At the same time, the SureBlock estimator achieves simultaneously within a constant factor of the minimax risk over a wide collection of Besov bodies B^α_{p,q}(M) in the sparse case (p < 2). These properties are not shared simultaneously by many commonly used fixed block size procedures such as VisuShrink (Donoho and Johnstone, 1994a), SureShrink (Donoho and Johnstone, 1995) or BlockJS (Cai, 1999).

The SureBlock estimator is completely data-driven and easy to implement. A simulation study is carried out and the numerical findings reconfirm the theoretical results. The numerical results show that SureBlock has superior finite sample performance in comparison to the other leading wavelet estimators. More specifically, SureBlock uniformly outperforms both VisuShrink and SureShrink (Donoho and Johnstone, 1994a and 1995) in all 42 simulation cases in terms of the average squared error. The SureBlock procedure is better than BlockJS (Cai, 1999) in 37 out of 42 cases.

The paper is organized as follows. After Section 2, in which basic notation and block thresholding methods are briefly reviewed, we introduce in Section 3 the SureBlock procedure. Asymptotic properties of the SureBlock estimator are presented in Section 4. Numerical implementation and finite-sample performance of the SureBlock estimator are then discussed in Section 5. The numerical performance of SureBlock is compared with

those of VisuShrink (Donoho and Johnstone, 1994a), SureShrink (Donoho and Johnstone, 1995) and BlockJS (Cai, 1999). The proofs are given in Section 6.

2 Wavelets And Block Thresholding

Let {φ, ψ} be a pair of father and mother wavelets. The functions φ and ψ are assumed to be compactly supported and to satisfy ∫ φ = 1. Dilation and translation of φ and ψ generate an orthonormal wavelet basis with an associated orthogonal Discrete Wavelet Transform (DWT) which transforms sampled data into the wavelet coefficient domain. A wavelet ψ is called r-regular if ψ has r vanishing moments and r continuous derivatives. See Daubechies (1992) and Strang (1989) for details on the discrete wavelet transform and compactly supported wavelets. Let

    φ_{j,k}(x) = 2^{j/2} φ(2^j x − k),   ψ_{j,k}(x) = 2^{j/2} ψ(2^j x − k).

For a given square-integrable function f on [0, 1], define the wavelet coefficients of the wavelet expansion of f by

    ξ_{j,k} = ⟨f, φ_{j,k}⟩,   θ_{j,k} = ⟨f, ψ_{j,k}⟩.

Suppose we observe the noisy data Y = (y_1, y_2, ..., y_n) as in (1) and suppose the sample size n = 2^J for some integer J > 0. We use the standard device of the discrete wavelet transform to turn the function estimation problem into a problem of estimating wavelet coefficients. Let Ỹ = W·Y be the discrete wavelet transform of Y. Then Ỹ can be written as

    Ỹ = (ξ̃_{j_0,1}, ..., ξ̃_{j_0,2^{j_0}}, ỹ_{j_0,1}, ..., ỹ_{j_0,2^{j_0}}, ..., ỹ_{J−1,1}, ..., ỹ_{J−1,2^{J−1}})   (2)

where j_0 is some fixed primary resolution level. Here the ξ̃_{j_0,k} are the gross structure terms at the lowest resolution level, and the ỹ_{j,k} are the empirical wavelet coefficients at level j which represent fine structure at scale 2^j. Since the DWT is an orthogonal transform, the ỹ_{j,k} are independent normal random variables with standard deviation σ. Note that the mean of n^{−1/2} ỹ_{j,k} equals, up to a small approximation error, the wavelet coefficient θ_{j,k} of f. See Daubechies (1992). Through the discrete wavelet transform the nonparametric regression problem is thus turned into a problem of estimating a high dimensional normal mean vector.

A block thresholding procedure thresholds wavelet coefficients in groups and makes simultaneous decisions on all the coefficients within a block. Fix a block size L and a threshold level λ and divide the empirical wavelet coefficients ỹ_{j,k} at any given resolution level j into nonoverlapping blocks of size L. Denote by (jb) the indices of the coefficients in the b-th block at level j, i.e. (jb) = {(j, k) : (b − 1)L + 1 ≤ k ≤ bL}. Let

    S²_{(jb)} = Σ_{(j,k) ∈ (jb)} ỹ²_{j,k}

denote the sum of squared empirical wavelet coefficients in the block. A block (jb) is deemed important if S²_{(jb)} is larger than the threshold λ, in which case all the coefficients in the block are retained; otherwise the block is considered negligible and all the coefficients in the block are estimated by zero. The most commonly used rules for a given block are the James-Stein rule

    θ̂_{j,k} = (1 − λ / S²_{(jb)})_+ ỹ_{j,k},   for (j, k) ∈ (jb),   (3)

and the hard thresholding rule

    θ̂_{j,k} = ỹ_{j,k} I(S²_{(jb)} > λ),   for (j, k) ∈ (jb).   (4)

The estimator of {f(t_i), i = 1, 2, ..., n} is then given by the inverse discrete wavelet transform of the thresholded wavelet coefficients.

Block thresholding estimators depend on the choice of the block size L and the threshold level λ, which largely determines the performance of the resulting estimator. It is thus important to choose L and λ in an optimal way.

3 The SureBlock Procedure

As discussed in the previous section, the nonparametric regression problem of estimating f can be turned into a problem of estimating a high dimensional normal mean vector by applying the orthogonal discrete wavelet transform to the data. We thus begin in this section by considering estimation of the mean of a multivariate normal random variable. Suppose that we observe

    x_i = θ_i + σ z_i,   i = 1, 2, ..., d,   (5)

where the z_i are independent standard normal random variables and σ is known. Here x = (x_1, ..., x_d) represents the observations at one resolution level in wavelet regression. Without loss of generality we assume in this section that σ = 1. We wish to estimate θ = (θ_1, ..., θ_d) based on the observations x = (x_1, ..., x_d) under the average mean squared error

    R(θ̂, θ) = (1/d) Σ_{i=1}^d E(θ̂_i − θ_i)².   (6)

We shall estimate θ by a blockwise James-Stein estimator with block size L and threshold level λ chosen empirically by minimizing Stein's Unbiased Risk Estimate (SURE). Let L ≥ 1 be the length of each block, and m = d/L the number of blocks.

For simplicity we shall assume that d is divisible by L in the following discussion. Let x_b = (x_{(b−1)L+1}, ..., x_{bL}) represent the observations in the b-th block, and similarly θ_b = (θ_{(b−1)L+1}, ..., θ_{bL}) and z_b = (z_{(b−1)L+1}, ..., z_{bL}). Let S²_b = ||x_b||² for b = 1, 2, ..., m. The blockwise James-Stein estimator is given by

    θ̂_b(λ, L) = (1 − λ / S²_b)_+ x_b,   b = 1, 2, ..., m,   (7)

where λ ≥ 0 is the threshold level. Write θ̂_b(λ, L) = x_b + g(x_b), where g is a function from IR^L to IR^L. Stein (1981) showed that when g is weakly differentiable, then

    E_{θ_b} ||θ̂_b(λ, L) − θ_b||² = E_{θ_b} { L + ||g||² + 2 ∇·g }.

In our case,

    g(x_b) = (1 − λ / S²_b)_+ x_b − x_b

is weakly differentiable. Simple calculations show that

    E_{θ_b} ||θ̂_b(λ, L) − θ_b||² = E_{θ_b} SURE(x_b, λ, L)

where

    SURE(x_b, λ, L) = L + (λ² − 2λ(L − 2)) / S²_b · I(S²_b > λ) + (S²_b − 2L) · I(S²_b ≤ λ).   (8)

This implies that the total risk satisfies E_θ ||θ̂(λ, L) − θ||² = E_θ SURE(x, λ, L), where

    SURE(x, λ, L) = Σ_{b=1}^m SURE(x_b, λ, L)   (9)

is an unbiased estimate of the risk. We choose the block size L^S and threshold level λ^S to be the minimizer of SURE(x, λ, L), i.e.

    (λ^S, L^S) = arg min_{λ, L} SURE(x, λ, L).

The estimator of θ is then given by (7) with L = L^S and λ = λ^S.

The SURE block James-Stein estimator has some drawbacks in situations of extreme sparsity of the wavelet coefficients. We propose to use a hybrid scheme similar to Donoho and Johnstone (1995) and Johnstone (1999). The hybrid method works as follows. Set T_d = d^{−1} Σ_i (x_i² − 1), γ_d = d^{−1/2} log_2^{3/2} d and λ_F = 2L log d. For a fixed 0 ≤ v < 1, let (λ̃, L̃) denote the minimizers of SURE with an additional restriction on the search range:

    (λ̃, L̃) = arg min_{max{L−2, 0} ≤ λ ≤ λ_F, 1 ≤ L ≤ d^v} SURE(x, λ, L).   (10)
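To make (7)-(9) concrete before completing the description of the hybrid rule, here is a minimal numerical sketch (not from the paper; the function names and the small guard against division by zero are ours) that computes the blockwise James-Stein estimate and its SURE value for given λ and L, assuming σ = 1 and d divisible by L.

```python
import numpy as np

def block_james_stein(x, lam, L):
    """Blockwise James-Stein rule (7): shrink each block of length L by
    the factor (1 - lam / S_b^2)_+ where S_b^2 = ||x_b||^2."""
    blocks = x.reshape(-1, L)                      # assumes len(x) % L == 0
    S2 = np.sum(blocks ** 2, axis=1, keepdims=True)
    shrink = np.clip(1.0 - lam / np.maximum(S2, 1e-300), 0.0, None)
    return (shrink * blocks).ravel()

def sure(x, lam, L):
    """Stein's unbiased risk estimate (8)-(9) of the blockwise
    James-Stein rule with threshold lam and block size L (sigma = 1)."""
    S2 = np.sum(x.reshape(-1, L) ** 2, axis=1)
    per_block = np.where(S2 > lam,
                         L + (lam ** 2 - 2.0 * lam * (L - 2)) / np.maximum(S2, 1e-300),
                         S2 - L)
    return float(per_block.sum())
```

For each fixed L, Proposition 1 in Section 5 shows that the minimizing λ lies in a finite set determined by the block sums S²_b, so the minimization in (10) reduces to a finite enumeration.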

Define the estimator θ̂(x) of θ by

    θ̂_b = θ̂_b(λ̃, L̃),   if T_d > γ_d,   (11)

and

    θ̂_i = (1 − 2 log d / x_i²)_+ x_i,   if T_d ≤ γ_d.   (12)

When T_d ≤ γ_d the estimator is a degenerate block James-Stein estimator with block size L = 1. In this case the estimator is also called the non-negative garrote estimator; see Breiman (1995) and Gao (1998). The SURE approach has also been used to select the threshold level in fixed block size procedures: term-by-term thresholding (L = 1) in Donoho and Johnstone (1995) and block thresholding (L = log n) in Chicken (2004).

We now return to the nonparametric regression model (1). As in Section 2, denote by Ỹ_j = {ỹ_{j,k} : k = 1, ..., 2^j} and θ_j = {θ_{j,k} : k = 1, ..., 2^j} the empirical and true wavelet coefficients of the regression function f at resolution level j. We apply the hybrid SURE block James-Stein procedure to the empirical wavelet coefficients Ỹ_j level by level and then use the inverse discrete wavelet transform to obtain the estimate of the regression function. More specifically, the procedure for wavelet regression, called SureBlock, has the following steps.

1. Transform the data into the wavelet domain via the discrete wavelet transform: Ỹ = W·Y.

2. At each resolution level j, select the block size L_j and threshold level λ_j by minimizing Stein's unbiased risk estimate and then estimate the wavelet coefficients by the hybrid block James-Stein rule. More precisely, for j_0 ≤ j < J,

    (λ_j, L_j) = arg min_{max{L−2, 0} ≤ λ ≤ λ_F, 1 ≤ L ≤ 2^{vj}} SURE(σ^{−1} Ỹ_j, λ, L)   (13)

where Ỹ_j is the empirical wavelet coefficient vector at resolution level j, and

    θ̂_j = σ θ̂(σ^{−1} Ỹ_j)   (14)

where θ̂ is the hybrid block James-Stein estimator given in (11) and (12).

3. Obtain the estimate of the function via the inverse discrete wavelet transform of the denoised wavelet coefficients: f̂ = W^{−1} Θ̂.
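A level-by-level sketch of the hybrid rule (11)-(12) and the restricted search (13) follows. It reuses the sure and block_james_stein helpers from the sketch above, restricts block sizes to divisors of the level length (the paper assumes divisibility only for simplicity), and takes λ_F = 2L log d and γ_d = d^{−1/2} log_2^{3/2} d as reconstructed here; it should be read as an illustration under these assumptions rather than as the authors' implementation.

```python
import numpy as np

def candidate_thresholds(x, L, lam_max):
    """Finite candidate set of Proposition 1, intersected with the
    search range [max(L - 2, 0), lam_max] used in (10) and (13)."""
    S2 = np.sum(x.reshape(-1, L) ** 2, axis=1)
    lo = max(L - 2.0, 0.0)
    cands = S2[(S2 >= lo) & (S2 <= lam_max)]
    return np.unique(np.concatenate(([lo], cands)))

def sureblock_level(x, v=0.75):
    """Hybrid SURE block James-Stein estimate of one noise-standardized
    resolution level x (sigma = 1), following (11)-(13)."""
    d = len(x)
    T_d = np.mean(x ** 2 - 1.0)
    gamma_d = d ** (-0.5) * np.log2(d) ** 1.5
    if T_d <= gamma_d:
        # Sparse case (12): non-negative garrote with threshold 2 log d.
        return np.clip(1.0 - 2.0 * np.log(d) / np.maximum(x ** 2, 1e-300), 0.0, None) * x
    best_val, best_lam, best_L = np.inf, 0.0, 1
    for L in (l for l in range(1, max(int(d ** v), 1) + 1) if d % l == 0):
        lam_F = 2.0 * L * np.log(d)
        for lam in candidate_thresholds(x, L, lam_F):
            val = sure(x, lam, L)
            if val < best_val:
                best_val, best_lam, best_L = val, lam, L
    return block_james_stein(x, best_lam, best_L)
```

The default v = 0.75 matches the choice v = 3/4 used in the simulations of Section 5.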

Theoretical results given in Section 4 show that the pair (L_j, λ_j) is asymptotically optimal in the sense that the resulting block thresholding estimator adaptively attains the exact minimax block thresholding risk asymptotically. Note that, in contrast to some other block thresholding methods, both the block size and the threshold level of the SureBlock estimator are chosen empirically and vary from resolution level to resolution level. Both the theoretical and the numerical results given in the next two sections show that the SureBlock estimator outperforms wavelet thresholding estimators with a fixed block size.

4 Theoretical Properties of SureBlock

In this section we turn to the theoretical properties of the SureBlock estimator in the Besov sequence space formulation that is by now classical for the analysis of wavelet regression methods. The asymptotic results show that the SureBlock procedure is strongly adaptive.

Besov spaces are a very rich class of function spaces and contain as special cases many traditional smoothness spaces such as Hölder and Sobolev spaces. Roughly speaking, the Besov space B^α_{p,q} contains functions having α bounded derivatives in L_p norm; the third parameter q gives a finer gradation of smoothness. Full details on Besov spaces are given, for example, in Triebel (1983) and DeVore and Popov (1988). For a given r-regular mother wavelet ψ with r > α and a fixed primary resolution level j_0, the Besov sequence norm ||·||_{b^α_{p,q}} of the wavelet coefficients of a function f is defined by

    ||f||_{b^α_{p,q}} = ||ξ_{j_0}||_p + ( Σ_{j=j_0}^∞ (2^{js} ||θ_j||_p)^q )^{1/q}   (15)

where ξ_{j_0} is the vector of the father wavelet coefficients at the primary resolution level j_0, θ_j is the vector of the wavelet coefficients at level j, and s = α + 1/2 − 1/p > 0. Note that the Besov function norm of index (α, p, q) of a function f is equivalent to the sequence norm (15) of the wavelet coefficients of the function. See Meyer (1992).

We shall focus our theoretical discussion on a sequence space version of the SureBlock estimator. Suppose that n = 2^J for some integer J and that we observe the sequence data

    y_{j,k} = θ_{j,k} + n^{−1/2} σ z_{j,k},   j ≥ j_0, k = 1, 2, ..., 2^j,   (16)

where the z_{j,k} are independent standard normal random variables. We wish to estimate the mean array θ under the expected squared error

    R(θ̂, θ) = E Σ_{j,k} (θ̂_{j,k} − θ_{j,k})² = E ||θ̂ − θ||².

Define the Besov body B^α_{p,q}(M) by

    B^α_{p,q}(M) = { θ : ( Σ_{j=j_0}^∞ (2^{js} ||θ_j||_p)^q )^{1/q} ≤ M }   (17)

where, as usual, s = α + 1/2 − 1/p > 0. The minimax risk of estimating θ over the Besov body B^α_{p,q}(M) is

    R*(B^α_{p,q}(M)) = inf_{θ̂} sup_{θ ∈ B^α_{p,q}(M)} E ||θ̂ − θ||².   (18)

Donoho and Johnstone (1998) show that the minimax risk R*(B^α_{p,q}(M)) converges to 0 at the rate n^{−2α/(1+2α)} as n → ∞.

We apply the SureBlock procedure of Section 3 to the array of empirical coefficients y_{j,k} for j_0 ≤ j < J to obtain estimated wavelet coefficients θ̂_{j,k}, and set θ̂_{j,k} = 0 for j ≥ J. The minimax risk among all block James-Stein estimators with all possible block sizes 1 ≤ L_j ≤ 2^{jv}, where 0 ≤ v < 1, and threshold levels λ_j ≥ 0 is

    R*_T(B^α_{p,q}(M)) = inf_{λ_j ≥ 0, 1 ≤ L_j ≤ 2^{jv}} sup_{θ ∈ B^α_{p,q}(M)} Σ_{j=j_0}^∞ E ||θ̂_j(λ_j, L_j) − θ_j||².   (19)

We shall call R*_T(B^α_{p,q}(M)) the minimax block thresholding risk. It is clear that R*_T(B^α_{p,q}(M)) ≥ R*(B^α_{p,q}(M)). On the other hand, Theorems 2 and 3 below show that R*_T(B^α_{p,q}(M)) is within a small constant factor of the minimax risk R*(B^α_{p,q}(M)).

The following theorem shows that the SureBlock estimator adaptively attains the exact minimax block thresholding risk R*_T(B^α_{p,q}(M)) asymptotically over a wide range of Besov bodies.

Theorem 1 Suppose we observe the sequence model (16). Let θ̂ be the SureBlock estimator of θ defined in (11) and (12). Then

    sup_{θ ∈ B^α_{p,q}(M)} E ||θ̂ − θ||² ≤ R*_T(B^α_{p,q}(M)) (1 + o(1))   (20)

for 1 ≤ p, q ≤ ∞, 0 < M < ∞, and α > 1/(2(1 − v)) ∨ (1/p − 1/2)_+.

Theorems 2 and 3 below make it clear that the SureBlock procedure is indeed nearly optimally adaptive over a wide collection of Besov bodies B^α_{p,q}(M), including both the dense (p ≥ 2) and sparse (p < 2) cases. The estimator is asymptotically sharp adaptive over Besov bodies with p = q = 2 in the sense that it adaptively attains both the optimal rate and the optimal constant. Over Besov bodies with p ≥ 2 and q ≥ 2, SureBlock adaptively achieves

within a factor of 1.25 of the minimax risk. At the same time the maximum risk of the estimator is simultaneously within a constant factor of the minimax risk over a collection of Besov bodies B^α_{p,q}(M) in the sparse case p < 2.

Theorem 2 (i). The SureBlock estimator is adaptively sharp minimax over Besov bodies B^α_{2,2}(M) for all M > 0 and α > v/(2(1 − v)). That is,

    sup_{θ ∈ B^α_{2,2}(M)} E ||θ̂ − θ||² ≤ R*(B^α_{2,2}(M)) (1 + o(1)).   (21)

(ii). The SureBlock estimator is adaptively, asymptotically within a factor of 1.25 of the minimax risk over Besov bodies B^α_{p,q}(M),

    sup_{θ ∈ B^α_{p,q}(M)} E ||θ̂ − θ||² ≤ 1.25 R*(B^α_{p,q}(M)) (1 + o(1))   (22)

for all p ≥ 2, q ≥ 2, M > 0 and α > v/(1 − v).

For the sparse case p < 2, the SureBlock estimator is also simultaneously within a small constant factor of the minimax risk.

Theorem 3 The SureBlock estimator is asymptotically minimax up to a constant factor G(p ∧ q) over a large range of Besov bodies with 1 ≤ p, q ≤ ∞, 0 < M < ∞, and α > (2/p) ∨ v. That is,

    sup_{θ ∈ B^α_{p,q}(M)} E ||θ̂ − θ||² ≤ G(p ∧ q) R*(B^α_{p,q}(M)) (1 + o(1))   (23)

where G(p ∧ q) is a constant depending only on p ∧ q.

4.1 Analysis of a Single Resolution Level

The proof of the main results given above relies on a detailed risk analysis of the SureBlock procedure on a single resolution level. In particular we derive the following upper bound for the risk of SureBlock at a given resolution level. The result may also be of independent interest for the problem of estimating a multivariate normal mean.

Consider the sequence model (5) where x_i = θ_i + z_i, i = 1, ..., d, and the z_i are independent N(0, 1) variables. Set r(λ, L) = d^{−1} E ||θ̂(λ, L) − θ||² and let

    R(θ) = inf_{0 ≤ λ, 1 ≤ L ≤ d^v} r(λ, L) = inf_{max{L−2, 0} ≤ λ, 1 ≤ L ≤ d^v} r(λ, L)   (24)

and

    R_F(θ) = r(2 log d, 1).   (25)

The following theorem gives upper bounds for the risk of the SureBlock estimator.

Theorem 4 Let the data {x_i, i = 1, 2, ..., d} be given as in (5) and let θ̂ be the SureBlock estimator defined in (11) and (12). Set µ_d = ||θ||²/d and γ_d = d^{−1/2} log_2^{3/2} d. Then

a. for any constant η > 0 and uniformly in θ ∈ IR^d,

    d^{−1} E_θ ||θ̂ − θ||² ≤ R(θ) + R_F(θ) I(µ_d ≤ 3γ_d) + c_η d^{η − (1 − v)/2};   (26)

b. for any constant η > 0 and uniformly in µ_d ≤ (1/3) γ_d,

    d^{−1} E_θ ||θ̂ − θ||² ≤ R_F(θ) + c_η d^{η − 1/2},   (27)

where c_η is a constant depending only on η.

In the proof of Theorem 1 we will see that R(θ), the ideal risk, is the dominating term, and all other terms in Equations (26) and (27) are negligible. This result shows that the SureBlock procedure mimics the performance of the oracle procedure in which the ideal threshold level and the ideal block size are given. Theorem 4 can be regarded as a generalization of Theorem 4 of Donoho and Johnstone (1995) from a fixed block size of one to variable block size. This generalization is important because it enables the resulting SureBlock estimator to be not only adaptively rate optimal over a wide collection of Besov bodies across both the dense (p ≥ 2) and sparse (p < 2) cases, but also sharp adaptive over spaces where linear estimators can be asymptotically minimax. This property is not shared by fixed block size procedures such as VisuShrink, SureShrink or BlockJS. The theoretical advantages of the SureBlock estimator over these fixed block size procedures are also confirmed by the numerical results given in the next section.

5 Implementation And Numerical Results

We consider in this section the finite sample performance of the SureBlock estimator. The SureBlock procedure is easy to implement. The following result is useful for the numerical implementation as well as for the derivation of the theoretical results of the SureBlock estimator.

Proposition 1 Let x_i, i = 1, ..., d, and SURE(x, λ, L) be given as in (5) and (9), respectively. Let the block size L be given. Then the minimizer λ of SURE(x, λ, L) is an element of the set A, where

    A = { x_i² : 1 ≤ i ≤ d } ∪ {0},   if L = 1,

and

    A = { S_i² : S_i² ≥ L − 2, 1 ≤ i ≤ m } ∪ {L − 2},   if L ≥ 2.
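As a quick numerical sanity check of Proposition 1, the following sketch (reusing the sure and candidate_thresholds helpers from the sketches in Section 3; the simulated data and the tolerance are arbitrary choices of ours) verifies on one random example that a fine grid search over thresholds never improves on the finite candidate set:

```python
import numpy as np

rng = np.random.default_rng(0)
# A sparse-ish mean vector observed with unit-variance Gaussian noise.
x = rng.normal(size=512) + np.repeat(rng.normal(scale=3.0, size=64), 8)

for L in (1, 4, 8):
    lam_F = 2.0 * L * np.log(len(x))
    grid = np.linspace(max(L - 2, 0), lam_F, 2000)
    grid_best = min(sure(x, lam, L) for lam in grid)
    cand_best = min(sure(x, lam, L) for lam in candidate_thresholds(x, L, lam_F))
    assert cand_best <= grid_best + 1e-9
```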

Proposition 1 shows that, for a given block size L, it is sufficient to search over the finite set A for the threshold λ which minimizes SURE(x, λ, L).

The SureBlock procedure first empirically chooses the optimal threshold and block size (λ̃, L̃) as in (10). The values of (λ̃, L̃) depend on the choice of the parameter v which determines the search range for the block size L. In our simulation studies we choose v = 3/4. That is,

    (λ̃, L̃) = arg min_{max{L−2, 0} ≤ λ ≤ λ_F, 1 ≤ L ≤ d^{3/4}} SURE(x, λ, L).   (28)

The theoretical results given in Section 4 show that this procedure is adaptively within a constant factor of the minimax risk over a wide collection of Besov bodies.

The noise level σ is assumed to be known in Section 3. In practice σ needs to be estimated from the data. We use the following robust estimator of σ given in Donoho and Johnstone (1994a), based on the empirical wavelet coefficients at the highest resolution level:

    σ̂ = median( |ỹ_{J−1,k}| : 1 ≤ k ≤ 2^{J−1} ) / 0.6745.

In this section we compare the numerical performance of SureBlock with those of VisuShrink (Donoho and Johnstone, 1994a), SureShrink (Donoho and Johnstone, 1995) and BlockJS (Cai, 1999). VisuShrink thresholds the empirical wavelet coefficients individually with a fixed threshold level. SureShrink is a term-by-term thresholding procedure which selects the threshold at each resolution level by minimizing Stein's unbiased risk estimate; in the simulation we use the hybrid method proposed in Donoho and Johnstone (1995). BlockJS is a block thresholding procedure with a fixed block size log n and a fixed threshold level. Each of these wavelet estimators has been shown to perform well numerically as well as theoretically. For further details see the original papers.

Six test functions, representing different levels of spatial variability, and various sample sizes, wavelets and signal-to-noise ratios are used for a systematic comparison of the four wavelet procedures. The test functions are plotted in Figure 3 in the Appendix. Sample sizes ranging from n = 256 to n = 16384 and signal-to-noise ratios (SNR) from 3 to 7 were considered. The SNR is the ratio of the standard deviation of the function values to the standard deviation of the noise. Different combinations of wavelets and signal-to-noise ratios yield basically the same results. For reasons of space, we only report in detail the results for one particular case, using Daubechies' compactly supported wavelet Symmlet 8 and SNR equal to 7. We use the software package WaveLab for the simulations, and the procedures MultiVisu (for VisuShrink) and MultiHybrid (for SureShrink) in WaveLab 802 are used. See the WaveLab web site.
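Putting the pieces together, a minimal end-to-end sketch of SureBlock might look as follows. It uses the PyWavelets package in place of the WaveLab routines used in the paper, the 'sym8' wavelet with periodized boundary handling so that the coefficient vectors have dyadic lengths, an arbitrary primary level j0 = 3, and the sureblock_level helper from the sketch in Section 3; it is an illustration under these assumptions, not the authors' code.

```python
import numpy as np
import pywt

def sureblock(y, wavelet="sym8", j0=3, v=0.75):
    """SureBlock sketch: DWT, robust noise estimate, level-by-level hybrid
    SURE block James-Stein shrinkage, inverse DWT.  Assumes len(y) = 2^J."""
    J = int(np.log2(len(y)))
    coeffs = pywt.wavedec(y, wavelet, mode="periodization", level=J - j0)
    # Robust noise estimate from the finest-level detail coefficients
    # (Donoho and Johnstone, 1994a): sigma_hat = median(|y_{J-1,k}|) / 0.6745.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    denoised = [coeffs[0]]                    # coarse coefficients are kept as is
    for d_j in coeffs[1:]:                    # detail levels j0, ..., J-1
        denoised.append(sigma * sureblock_level(d_j / sigma, v=v))
    return pywt.waverec(denoised, wavelet, mode="periodization")
```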

Figure 1: The vertical bars represent the ratios of the average squared errors of the various estimators to the corresponding average squared error of SureBlock, in three panels (VisuShrink vs. SureBlock, SureShrink vs. SureBlock and BlockJS vs. SureBlock) for the test functions Doppler, HeaviSine, Bumps, Blocks, Piece-Regular and Piece-Polynomial. The higher the bar, the better the relative performance of SureBlock. The bars are plotted on a log scale and are truncated at the value of the original ratio. For each signal the bars are ordered from left to right by the sample sizes n = 256 to 16384.

Table 1 below reports the average squared errors (rounded to three significant digits) over 50 replications for the four thresholding estimators. A graphical presentation is given in Figure 1. SureBlock consistently outperforms both VisuShrink and SureShrink in all 42 simulation cases in terms of the average squared error. The SureBlock procedure is better than BlockJS about 88% of the time (37 out of 42 cases). SureBlock fails to dominate BlockJS only for the test function Doppler, and even there SureBlock is almost as good as BlockJS. Although SureBlock does not dominate BlockJS for the test function Doppler, the improvement of SureBlock over BlockJS is significant for the other test functions.

The simulation results show that, by empirically choosing the block size and threshold and allowing them to vary from resolution level to resolution level, the SureBlock estimator has significant numerical advantages over thresholding estimators with a fixed block size L = 1 (VisuShrink and SureShrink) or L = log n (BlockJS). These numerical findings reconfirm the theoretical results given in Section 4.

Table 1: Average squared errors over 50 replications with SNR = 7, for each of the test functions Blocks, Bumps, Doppler, HeaviSine, Piece-Polynomial and Piece-Regular and each sample size n. In the table VS stands for VisuShrink, SS for SureShrink, BJS for BlockJS and SB for SureBlock.

In addition to the comparison of the overall global performance of the wavelet estimators, it is also instructive to compare the performance at individual resolution levels and to examine the value of the empirical selection of block sizes. Table 2 gives the average

squared errors at individual resolution levels over 50 replications for Bumps with n = 1024 and SNR = 7. Table 2 compares the errors of SureBlock, SureShrink and an estimator, called SureGarrote, which empirically chooses the threshold λ at each level but fixes the block size at L = 1. SureBlock consistently outperforms SureGarrote and SureShrink at each resolution level. Both SureGarrote and SureShrink have block size L = 1; the results indicate that the non-negative garrote shrinkage is slightly better than soft thresholding in this example.

Table 2: Average squared errors of SureBlock, SureGarrote and SureShrink at individual resolution levels j.

Figure 2 shows a typical result of the SureBlock procedure applied to a noisy Bumps signal. The left panel is the noisy signal; the middle panel is a display of the empirical wavelet coefficients arranged according to resolution levels; and the right panel is the SureBlock reconstruction (solid line) together with the true signal (dotted line). In this example the block sizes chosen by SureBlock are 2, 3, 1, 5, 3, 5 and 1 from resolution level j = 3 to level j = 9.

Figure 2: SureBlock procedure applied to a noisy Bumps signal.

16 6 Proofs Throughout this section, without loss of generality, we shall assume the noise level σ = 1. We first prove Theorem 4 and then use it as the main tool to prove Theorem 1. The proofs of Theorems and 3 and Proposition 1 are given later. 6.1 Notation And Preparatory Results Before proving the main theorems, we need to introduce some notation and collect a few technical results. The proofs of some of these preparatory results are long and tedious. For reasons of space these proofs are omitted here. We refer interested readers to Cai and Zhou 005 for the complete proofs. Consider the sequence model 5 with σ = 1. For a given block size L and threshold level λ, set r b λ, L = E θ b θ b λ, L θ b and define r λ, L = 1 d m r b λ, L = ED λ, L b=1 where D λ, L = 1 d m b=1 θ b λ, L θ b. Set R θ = inf r λ, L = inf r λ, L. 9 λ λ F,1 L d v max{l,0} λ λ F,1 L d v The difference between Rθ and Rθ defined in 4 is that the search range for the threshold λ in Rθ is restricted to be at most λ F. The result given below shows that the effect of this restriction is negligible for any block size L. Lemma 1 For any fixed η > 0, there exists a constant C η > 0 such that for all θ R d, R θ R θ C a d a 1. The following simple lemma is adapted from Donoho and Johnstone 1995 and is used the proof of Theorem 4. Lemma Let T d = d 1 x i 1 and µ d = d 1 θ. If γd d/ log d, then sup 1 + µ d P T d γ d = o d 1. µ d 3γ d We also need the following bounds for the loss of the SureBlock estimator and the derivative of the risk function r b λ, L. The first bound is used in the proof of Theorem 1 and the second in the proof of Theorem 4. 16

17 Lemma 3 Let {x i : i = 1,..., d} be given as in 5 with σ = 1. Then and for λ > 0 and L 1, ˆθ θ λ F d + z 30 λ r b λ, L < Finally we develop a key technical result for the proof of Theorem 4. Set U λ, L = 1 d SURE x, λ, L = m λ λ L I Sb > λ + Sb L I Sb λ. d Note that both D λ, L and U λ, L have expectation r λ, L. b=1 The goal is to show that the minimizer λ S, L S of Stein unbiased estimator of risk U λ, L is asymptotically the ideal threshold level and block size. The key step is to show that the following difference d = ED λ S, L S infr λ, L is negligible for max {L, 0} λ λ F and 1 L d v. It is easy to verify that for any two functions g and h defined on the same domain, inf x gx inf x hx sup x gx hx. Hence, Uλ S, L S inf rλ, L = inf Uλ, L inf λ,l λ,l λ,l S b λ,l rλ, L sup Uλ, L rλ, L λ,l and consequently d E D λ S, L S r λ S, L S + r λ S, L S U λ S, L S + U λ S, L S infr λ, L Esup λ,l D λ, L r λ, L + Esup r λ, L U λ, L. 3 λ,l The following proposition provides the upper bounds for the two terms on the RHS of 3 and is a key technical result for the proof of Theorem 4. λ,l Proposition Uniformly in θ R d, for 0 v < 1 we have E θ sup U λ, L r λ, L = o d η 1 + v max{l,0} λ λ F,1 L d v E θ sup D λ, L r λ, L = o d η 1 + v max{l,1/ log d} λ λ F,1 L d v for any 1 v > η > 0, where λ F = L log d. In the proofs we will denote by C a generic constant that may vary from place to place. 17

18 6. Proof of Theorem 4 The proof of Theorem 4 is similar to but more involved than those of Theorem 4 of Donoho and Johnstone1995 and Theorem of Johnstone 1999 because of variable block size. Without loss of generality, we may assume λ 1/ log d since equation 31 implies R θ inf 1/ log d λ,1 L dvr λ, L 1 + C/ log d R θ, which means that the ideal risk over λ 1/ log d is asymptotically the same as the ideal risk over all λ 0. Correspondingly, we shall also restrict the minimizer of U λ, L searched over [1/ log d, instead of over [0, in the SureBlock procedure. Set T d = d 1 x i 1 and γ d = d 1 log 3 d. Define the event A d = {T d γ d } and decompose the risk of SureBlock procedure into two parts, R d θ = d 1 E θ θ θ = R 1,d θ + R,d θ } { where R 1,d θ = d 1 E θ { θ θ I A d and R,d θ = d 1 E θ θ θ I A c d }. We first consider R 1,d θ. On the event A d, the signal is sparse and θ is the nonnegative garrote estimator ˆθ i = 1 log d/x i + x i by the definition of θ in equation 1. Decomposing R 1,d θ further into two parts with either µ d = d 1 θ 3γ d or µ d > 3γ d yields that R 1,d θ R F θ I µ d 3γ d + r 1,d θ where R F θ is the risk of the non-negative garrote estimator and } r 1,d θ = d 1 E θ { θ θ I A d I µ d > 3γ d. Note that on the event A d, ˆθ x d + dγ d and so r 1,d θ d 1 E θ + θ P A d I µ d > 3γ d 1 + µ d P A d = od 1 35 where the last step follows from Lemma. Note that for any η > 0, Lemma 1 yields that for some constant C η > 0 for all θ R d, and Equations 3-34 yield R θ R θ C η d η 1 36 R,d θ R θ d Cd η 1 + v. 37 The proof of part a of the theorem is completed by putting together Equations

19 We now turn to part b. Note first that R 1,d θ R F θ. On the other hand, R,d θ = d 1 E { } 1 θ θ I A c d Cd 1 E θ θ 4 P 1 A c d. To complete the proof of 7, it then suffices to show that under the assumption µ d 1 3 γ d, E θ θ 4 is bounded by a polynomial of d and P A c d decays faster than any polynomial of d 1. Note that in this case θ = dµ d d 1 log 3 d. Since ˆθ x and x i = θ i + z i, E θ θ 4 E θ + θ E x + θ E 4 θ + z 3 θ 4 + 8E z 4 3d log 3 d + 16d + 8d. On the other hand, it follows from Hoeffding s inequality and Mill s inequality that P A c d = P d 1 z i + z i θ i + θ i 1 > d 1 3 log d P d 1 zi 1 > d log d + P d 1 z i θ i > d log d exp C log 3 d + 1 exp C log 3 d/µ d which decays faster than any polynomial of d Proof of Theorem 1 We now use Theorem 4 as one of the main technical tools to prove Theorem 1. Again we set σ = 1. To more conveniently use the results given in Theorem 4, we shall renormalize the model 16 so that the noise level is 1. Multiplying by a factor of n 1 on both sides, the sequence model 16 becomes y j,k = θ j,k + z j,k, j j 0, k = 1,,, j 38 where y j,k = n 1 y j,k and θ j,k = n 1 θ j,k. Note that E θ ˆθ θ = n 1 E θ ˆθ θ where the expectation on the left side is under model 16 and the right side under the renormalized model 38. We shall use the sequence model 38 in the proof below. Fix 0 < ε 0 < 1/1 + α and let J 0 be the largest integer satisfying J 0 n ε 0. Write n 1 E θ ˆθ θ = + + n 1 E θ ˆθ θ j J 1 j J 0 k J 0 j<j 1 k k = S 1 + S + S 3 19

20 where J 1 > J 0 is to be chosen later. Since 0 < ε 0 < 1/1+α, n ε 0 n 1 log n = o n α/1+α. It then follows from equation 30 in Lemma 3 that S 1 Cλ F n ε 0 n 1 = o n α 1+α which is negligible relative to the minimax risk. On the other hand, it follows from 6 in Theorem 4 that S n 1 j R θ j J 0 j<j 1 = S 1 + S + S 3 + n 1 j R F θ j J 0 j<j 1 I µ j 3γ j + where µ j = j θ j = j n θ j and γ j = j j 3. It is obvious that S 1 = J 0 j<j 1 n 1 c η jη+1/+v/ J 0 j<j 1 n 1 j Rθ j R T B α p,qm 39 and R T Bα p,qm R B α p,qm Cn α/1+α for some constant C > 0. We shall see that both S and S 3 are negligible relative to the minimax risk. Note that S 3 = J 0 j<j 1 n 1 c η jη+1/+v/ Cn 1 J1η+1/+v/. 40 On the other hand, it follows from the Oracle Inequality 3.10 in Cai 1999 with L = 1 and λ = λ F = log j that j j R F θ j nθj,k λ F + 8 log j 1 j n θ j + 8 log j 1 j. 41 j=1 Recall that µ j = j n θ j and γ j = j j 3. It then follows from 41 that S = J 0 j<j 1 n 1 j R F θ jiµ j Cn 1 J 1η+ 1 J γ j J 0 j<j 1 3n 1 j 3 j + 8n 1 log j 1 j Hence if J 1 satisfies J 1 = n γ 1 with some γ <, then 40 and 4 yield 1+αη+1/+v/ S + S 3 = o n α 1+α. 43 We now turn to the term S 3. It is easy to check that for θ B α p,qm, θ j M α j where α = α 1 p 1 + > 0. Note that if J 1 satisfies J 1 = n γ for some γ > 1 1/+α, then 0 4

21 for all sufficiently large n and all j J 1, µ j 1 3 γ j where γ j = j j 3. It thus follows from 7 and 41 that whenever α > 1 p 1 +, S 3 = j J 1 k n 1 E θ ˆθ j,k θ j,k j J 1 n 1 j R F θ j + c η n 1 jη θ j + Cn 1 C α J 1 + Cn 1 = on α 1+α. 44 j J 1 For α > v p , both Equations 43 and 44 hold, and hence the proof is completed by choosing γ satisfying 1 1/ + α 1/p 1/ + < γ < This is always possible by choosing η > 0 sufficiently small α η + 1/ + v/. 6.4 Proof of Theorem Define the minimax linear risk by R L Bα p,qm = inf θ linear sup θ B α p,q M E ˆθ θ. It follows from Donoho, Liu and MacGibbon 1990 that θ RLB p,qm α j,k /n = sup θj,k + 1/n, θ Bp,qM α j=j 0 R LB α p,qm = R B α p,qm1 + o1 for p = q =, and R L Bα p,qm 1.5R B α p,qm1 + o1, for all p > and q >. It therefore suffices to establish that the SureBlock procedure asymptotically attains the minimax linear risk, i.e., sup E θ ˆθ θ RLB p,qm1 α + o1, for α > 0, p and q. θ Bp,qM α Recall that in the proof of Theorem 1 it is shown that E θ ˆθ θ S 1 + S 1 + S + S 3 + S 3, k where S 1 + S + S 3 + S 3 = on α/α+1 and S 1 = J 0 j<j 1 n 1 j Rθ j with J 0 and J 1 chosen as in the proof of Theorem 1. Since the minimax risk R B α p,qm n α/α+1, this implies that S 1 is the dominating term in the maximum risk of the SureBlock procedure. It follows from the definition of Rθ j given in 4 that n 1 j Rθ j n 1 b E θ b ˆθ bl j, L j θ b 1

22 where the RHS is the risk of the blockwise James-Stein estimator with any fixed block size 1 L j jv where 0 v < 1 and a fixed threshold level L j. Stein s unbiased risk estimate see for example Lehmann 1983, page 300 yields that n 1 E θ b ˆθ bl j, L j θ b θ b L j /n θ b b b + L j /n +. n Hence the maximum risk of SureBlock procedure satisfies sup E θ ˆθ θ θ b L j /n θ Bp,q α M sup θ Bp,q α M θ b + L j /n + n = sup θ B α p,qm J 0 j<j 1 J 0 j<j 1 b b θ b L j /n θ b + L j /n o1 J 0 j<j 1 j nl j 1 + o1. Note that in the proof of Theorem 1, J 1 satisfies J 1 = n γ 1 with γ <. Hence 1+αη+1/+v/ if L j satisfies jρ L j jv for some ρ > 1 v η, then and hence j 1 nl j n 1 J 1 + v +η = o n α 1+α j J 1 sup E θ ˆθ θ θ Bp,qM α sup θ Bp,qM α J 0 j<j 1 b θ b L j /n θ b + L j /n 1 + o1. To complete the proof it remains to show that θ b L j /n sup θ Bp,qM α θ b RLB p,qm1 α + o L j /n Note that j /L j b=1 J 0 j<j 1 θ b L j /n θ b + L j /n = L j /L j n j j L j b b=1 Then the simple inequality m i=1 a i m J 0 j<j 1 b θ b L j /n θ b + L j /n L j /n θ b + L j /n = j n i=1 a 1 i j n j /L Lj j n b=1 1 θ b + L j /n. m, for a i > 0, 1 i m yields that Lj j j /L j θ b + L j n L j n b=1 = j /n j /L j b=1 θ b j /L j b=1 θ b + j /n = j /n k θ j,k + j /n. k θ j,k

23 Theorem in Cai, Low and Zhao 000 shows that sup j /n k θ j,k θ Bp,qM α J 0 j<j 1 and this proves inequality 45. k θ j,k + j /n = R LB α p,qm1 + o1, 6.5 Proof of Theorem 3 To establish the theorem, it suffices to show the following result which plays the role similar to that of Proposition 16 in Donoho and Johnstone 1994b. Proposition 3 Let X N µ, 1 and let F p η denote the probability measures F dµ satisfying the moment condition µ p F dµ η p. Let { } r δ g λ, η = sup E F r g µ : µ p F dµ η p, F pη where r g µ = E µ δ g λ x µ and δ g λ x = 1 λ x. Let p 0, and λ = x + log η p, then r δ g λ, η ηp λ p 1 + o 1 as η 0. Proof of Proposition 3 : It follows from an analogous argument to the proof of Proposition 16 in Donoho and Johnstone 1994b that p η r g δ λ, η = sup r g µ 1 + o1, as η 0. µ η µ Stein s unbiased risk estimate yields that r g µ = E µ [1 + x λ I x λ + x ] + λ4 I x λ x 1 + λ P x λ + + λ P x λ λ 1 + o 1 which implies that for µ λ p η r g µ η p λ p 1 + o 1, µ as η Similarly, λ r g µ = E µ [x 1 + x ] + λ4 x x + I x λ µ + 4P x λ. 3

24 Now the simple inequality P x λ φλ + µ = µ + o λ 4 4 ηp yields that for η µ < λ p η r g µ η p λ p 1 + o 1, as η µ The proposition now follows from 46 and 47. We now return to the proof of Theorem 3. Set ρ g η = inf λ sup Fp η E F r g µ. Then Proposition 3 implies that ρ g η r δ g λ, η ηp log η p p/ 1 + o 1, as η 0. For p 0,, Theorem 18 of Donoho and Johnstone 1994b shows the univariate Bayesminimax risk satisfies ρ η = inf δ sup E F E µ δ x µ = η p log η p p/ 1 + o1, as η 0. F pη Note that ρ g η /ρ η is bounded as η 0 and ρ g η /ρ η 1 as η. Both ρ g η and ρη are continuous on 0,, so G p = sup η ρ g η ρ η <, for p 0,. Theorems 4 and 5 in Section 4 of Donoho and Johnstone 1998 derived the asymptotic minimaxity over Besov bodies from the univariate Bayes-minimax estimators. It then follows from an analogous argument of Section 5.3 in Donoho and Johnstone 1998 that R T B α p,qm inf λ j sup E θ Bp,q α M 6.6 Proof of Proposition 1 j=j 0 θ j λ j, 1 θ j G p q R B α p,qm 1 + o 1. Proposition 1 in Section 5 shows that the minimizer of the Stein s unbiased estimator of risk for the block James-Stein estimator is in a finite set A with cardinality A d + 1. This makes the implementation of the SureBlock procedure easy. In addition, Proposition 1 is needed for the proof of Proposition, a key result to prove Theorem 4. Proof of Proposition 1: We shall consider two separate cases: L = 1 and L. L = 1, we define A = {x i ; 1 i d} {0}. From equations 8 and 9, Stein s unbiased estimator of risk is SURE x, λ, 1 = d 1 + λ + λ I x x i > λ + x i I x i λ. i i=1 4 For

25 Suppose, without loss of generality, that the x i have been reordered in the order of increasing x i. On intervals x k λ < x k+1, SURE x, λ, 1 = d + k x i + i=1 i k+1 λ + λ x i is strictly increasing. So is true on the interval 0 λ < x 1. Therefore the minimizer λ S of SURE x, λ, 1 is either 0 or one of the x i. We now turn to the case L. We first show that λ S L, or equivalently that SURE x, λ, L SURE x, L, L 48 uniformly for 0 λ L. It suffices to show that for every block b Equation 8 gives SURE x b, λ, L SURE x b, L, L, for 0 λ L. SURE x b, L, L SURE x b, λ, L λ L = I S Sb b > L S 4 + b LSb λ + λ L I λ < S Sb b L The first term on the right side is nonpositive and the second term can be also shown to be nonpositive. Note that the convex function h x = x Lx λ λ L is nonpositive at two endpoints x = λ and x = L which implies Thus inequality 48 is established. S 4 b LS b λ + λ L 0, for λ < S b L. We now consider the case λ L with L. Let S1 S S m denote the order statistics of Si, 1 i m. On the interval Sk λ < S k+1 for some 1 k m, we consider two separate cases: i L Sk λ < S k+1 ; ii S k L λ < S k+1. If L Sk λ < S k+1, it follows from equation 8 that SURE x, λ, L = d + k S b L + b=1 b k+1 λ λ L S b which, as a function of λ, achieves minimum at Sk. For the second case, SURE x, λ, L achieves minimum at λ = L from the equation above, and this completes the proof. 5

References

[1] Antoniadis, A. and Fan, J. (2001). Regularization of wavelet approximations (with discussion). J. Amer. Statist. Assoc. 96.

[2] Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics 37.

[3] Cai, T. (1999). Adaptive wavelet estimation: A block thresholding and oracle inequality approach. Ann. Statist. 27.

[4] Cai, T., Low, M. G. and Zhao, L. (2000). Blockwise thresholding methods for sharp adaptive estimation. Technical Report, Department of Statistics, University of Pennsylvania.

[5] Cai, T. and Silverman, B. W. (2001). Incorporating information on neighboring coefficients into wavelet estimation. Sankhya Ser. B 63.

[6] Cai, T. and Zhou, H. (2005). A data-driven block thresholding approach to wavelet estimation. Technical Report, Department of Statistics, University of Pennsylvania.

[7] Chicken, E. (2004). Block-dependent thresholding in wavelet regression. J. Nonparametric Statist., to appear.

[8] Chicken, E. and Cai, T. (2004). Block thresholding for density estimation: Local and global adaptivity. J. Multivariate Anal., in press.

[9] Daubechies, I. (1992). Ten Lectures on Wavelets. SIAM, Philadelphia.

[10] DeVore, R. and Popov, V. (1988). Interpolation of Besov spaces. Trans. Amer. Math. Soc. 305.

[11] Donoho, D. L. and Johnstone, I. M. (1994a). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81.

[12] Donoho, D. L. and Johnstone, I. M. (1994b). Minimax risk over l_p-balls for l_q-error. Prob. Th. Rel. Fields 99.

[13] Donoho, D. L. and Johnstone, I. M. (1995). Adapting to unknown smoothness via wavelet shrinkage. J. Amer. Statist. Assoc. 90.

[14] Donoho, D. L. and Johnstone, I. M. (1998). Minimax estimation via wavelet shrinkage. Ann. Statist. 26.

[15] Donoho, D. L., Liu, R. C. and MacGibbon, B. (1990). Minimax risk over hyperrectangles, and implications. Ann. Statist. 18.

[16] Efromovich, S. Y. (1985). Nonparametric estimation of a density of unknown smoothness. Theor. Probab. Appl. 30.

[17] Gao, H.-Y. (1998). Wavelet shrinkage denoising using the non-negative garrote. J. Comput. Graph. Statist. 7.

[18] Hall, P., Kerkyacharian, G. and Picard, D. (1999). On the minimax optimality of block thresholded wavelet estimators. Statist. Sinica 9.

[19] Johnstone, I. M. (1999). Wavelet shrinkage for correlated data and inverse problems: adaptivity results. Statist. Sinica 9.

[20] Kerkyacharian, G., Picard, D. and Tribouley, K. (1996). L^p adaptive density estimation. Bernoulli 2.

[21] Lehmann, E. L. (1983). Theory of Point Estimation. John Wiley and Sons, New York.

[22] Meyer, Y. (1992). Wavelets and Operators. Cambridge University Press, Cambridge.

[23] Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. Ann. Statist. 9.

[24] Strang, G. (1989). Wavelets and dilation equations: A brief introduction. SIAM Review 31.

[25] Triebel, H. (1983). Theory of Function Spaces. Birkhäuser Verlag, Basel.

Figure 3: Test functions (Blocks, Bumps, Doppler, HeaviSine, Piece-Polynomial, Piece-Regular).

29 7 Appendix: Proofs of Technical Results We collect in this Appendix the proofs of the technical results stated in Section Proof of Lemma 1 Recall that R θ and R θ are defined as following R θ = inf λ,1 L nvr λ, L = inf L λ,1 L nvr λ, L R θ = inf r λ, L = inf r λ, L, λ λ F,1 L n v L λ λ F,1 L n v If R θ = r λ 1, L 1 with λ 1 λ F, R θ R θ = 0. In the case R θ = r λ1, L 1 with λ 1 > λ F, we have R θ r λ F, L 1, which implies then it is enough to prove R θ R θ r λ F, L 1 r λ1, L 1 = 1 m rb λ F, L 1 rb λ 1, L 1 d uniformly for all λ 1, b and L. Recall that where b=1 B = r b λ F, L r b λ 1, L Cd δ 1 r b λ, L = θ b + E θ b g λ I S b > λ g λ = λ λ L S Sb b + L which is an increasing function of λ for λ L, then simple algebra gives B E θ b λ F λ F L S b S b + L I λ 1 S b > λ F so Note that λ F λ F L Sb + L 0, when Sb λ F + S b B E θ b λ F λ F L S b 4P θ b λf + S b > λ F. S b + L I λ F + S b > λ F 9

30 Then B 4P θ b λf + Sb 4P θ b λf + 1/ θ b z b = O d 1 when θ b 4λ F. Now we consider the case θ b 4λ F. We have r λ F, θ b r λ1, θ b = and which leads to where Note that I 1 = = λ1 λ F λ1 λ F λ r λ, θ b dλ θ e k θ k=0 k! y>λ f L+k λ = f L+k y dy y>λ L + k / 1 y/ = f L+k y dy y r λ F, θ b r λ1, θ b λ1 θ k = e θ k! λ F = I 1 I k=0 λ1 θ I 1 = e k θ λ k! F k=0 λ1 θ I = e k θ λ k! F λ1 λ F k=0 θ e k θ k=0 = I 11 + I 1 k! y>λ+4k y>λ y>λ+k λ+k y>λ λ L + 4 f L+k y dy 4f L+k λ dλ y λ + k y f L+k y dy dλ y y λ + k f L+k y dy dλ y λ + k y f L+k y dy dλ y + λ+4k y>λ+k y λ + k f L+k y dy dλ y and I = λ1 λ F k=0 = I 1 + I θ e k θ k! λ+k max{λ,l+k } + max{λ,l+k } min{λ,l+k } λ + k y f L+k y dy dλ y 30

31 where I 11 = o d 1/, since y>λ+4k y λ + k f L+k y dy P χ L+k λ F + 4k y [ λ 1 F 1 + 4k exp 1 λ F + 4k L + k ] L + k = o d 1/ by Lemma in Cai 1999, and I 0, then I 1 I o d 1/ + I 1 I 1. When λ L + k, then f L+k y is increasing on [λ,, thus we have = λ+4k λ+k k 0 [ y λ + k λ+k f L+k y dy y λ z z + λ + k f L+k z + λ + k λ + k y f L+k y dy y z λ + k z f L+k λ + k z ] dz 0 because the integrand is nonpositive. When λ L + k, then k λ F L + /. Let s assume k Mλ F. In this case, we consider the following difference λ+4k λ+k λ+4k λ+k y λ + k λ+k f L+k y dy y y λ + k f L+k y dy y L+k λ+l/+k L+k λ + k y f L+k y dy y λ + k y f L+k y dy y Note that, for c 1 > c > 1 and m = L + k we have f m c 1 m f m c m = c1 c m/ 1 e c c 1 m/ = c c 1 e mc log c c 1 log c 1 c c 1 e m1 1/c 1c c 1 because x log x = 1 1/x 1 1/c 1, for c x c 1, as d we have f mc 1 m f m c m m which implies as d the following term is negative λ+4k λ+l/+k y λ + k λ + k y f L+k y dy f L+k y dy λ+k y L+k y λ + k λ + L / + k kf L+k λ + k f L+k λ + L / + k λ + L / + k 31


OPTIMAL POINTWISE ADAPTIVE METHODS IN NONPARAMETRIC ESTIMATION 1 The Annals of Statistics 1997, Vol. 25, No. 6, 2512 2546 OPTIMAL POINTWISE ADAPTIVE METHODS IN NONPARAMETRIC ESTIMATION 1 By O. V. Lepski and V. G. Spokoiny Humboldt University and Weierstrass Institute

More information

Wavelets and Image Compression. Bradley J. Lucier

Wavelets and Image Compression. Bradley J. Lucier Wavelets and Image Compression Bradley J. Lucier Abstract. In this paper we present certain results about the compression of images using wavelets. We concentrate on the simplest case of the Haar decomposition

More information

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of

More information

Comments on \Wavelets in Statistics: A Review" by. A. Antoniadis. Jianqing Fan. University of North Carolina, Chapel Hill

Comments on \Wavelets in Statistics: A Review by. A. Antoniadis. Jianqing Fan. University of North Carolina, Chapel Hill Comments on \Wavelets in Statistics: A Review" by A. Antoniadis Jianqing Fan University of North Carolina, Chapel Hill and University of California, Los Angeles I would like to congratulate Professor Antoniadis

More information

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1998 Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ Lawrence D. Brown University

More information

DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania

DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania Submitted to the Annals of Statistics DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING By T. Tony Cai and Linjun Zhang University of Pennsylvania We would like to congratulate the

More information

POINT VALUES AND NORMALIZATION OF TWO-DIRECTION MULTIWAVELETS AND THEIR DERIVATIVES

POINT VALUES AND NORMALIZATION OF TWO-DIRECTION MULTIWAVELETS AND THEIR DERIVATIVES November 1, 1 POINT VALUES AND NORMALIZATION OF TWO-DIRECTION MULTIWAVELETS AND THEIR DERIVATIVES FRITZ KEINERT AND SOON-GEOL KWON,1 Abstract Two-direction multiscaling functions φ and two-direction multiwavelets

More information

Design of Image Adaptive Wavelets for Denoising Applications

Design of Image Adaptive Wavelets for Denoising Applications Design of Image Adaptive Wavelets for Denoising Applications Sanjeev Pragada and Jayanthi Sivaswamy Center for Visual Information Technology International Institute of Information Technology - Hyderabad,

More information

How much should we rely on Besov spaces as a framework for the mathematical study of images?

How much should we rely on Besov spaces as a framework for the mathematical study of images? How much should we rely on Besov spaces as a framework for the mathematical study of images? C. Sinan Güntürk Princeton University, Program in Applied and Computational Mathematics Abstract Relations between

More information

The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA

The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010 Introduction Definition Oracle Properties Computations Relationship: Nonnegative Garrote Extensions:

More information

Affine and Quasi-Affine Frames on Positive Half Line

Affine and Quasi-Affine Frames on Positive Half Line Journal of Mathematical Extension Vol. 10, No. 3, (2016), 47-61 ISSN: 1735-8299 URL: http://www.ijmex.com Affine and Quasi-Affine Frames on Positive Half Line Abdullah Zakir Husain Delhi College-Delhi

More information

Biorthogonal Spline Type Wavelets

Biorthogonal Spline Type Wavelets PERGAMON Computers and Mathematics with Applications 0 (00 1 0 www.elsevier.com/locate/camwa Biorthogonal Spline Type Wavelets Tian-Xiao He Department of Mathematics and Computer Science Illinois Wesleyan

More information

Learning Theory. Ingo Steinwart University of Stuttgart. September 4, 2013

Learning Theory. Ingo Steinwart University of Stuttgart. September 4, 2013 Learning Theory Ingo Steinwart University of Stuttgart September 4, 2013 Ingo Steinwart University of Stuttgart () Learning Theory September 4, 2013 1 / 62 Basics Informal Introduction Informal Description

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

Plugin Confidence Intervals in Discrete Distributions

Plugin Confidence Intervals in Discrete Distributions Plugin Confidence Intervals in Discrete Distributions T. Tony Cai Department of Statistics The Wharton School University of Pennsylvania Philadelphia, PA 19104 Abstract The standard Wald interval is widely

More information

LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011

LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011 LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS S. G. Bobkov and F. L. Nazarov September 25, 20 Abstract We study large deviations of linear functionals on an isotropic

More information

ESTIMATORS FOR GAUSSIAN MODELS HAVING A BLOCK-WISE STRUCTURE

ESTIMATORS FOR GAUSSIAN MODELS HAVING A BLOCK-WISE STRUCTURE Statistica Sinica 9 2009, 885-903 ESTIMATORS FOR GAUSSIAN MODELS HAVING A BLOCK-WISE STRUCTURE Lawrence D. Brown and Linda H. Zhao University of Pennsylvania Abstract: Many multivariate Gaussian models

More information

Analysis Methods for Supersaturated Design: Some Comparisons

Analysis Methods for Supersaturated Design: Some Comparisons Journal of Data Science 1(2003), 249-260 Analysis Methods for Supersaturated Design: Some Comparisons Runze Li 1 and Dennis K. J. Lin 2 The Pennsylvania State University Abstract: Supersaturated designs

More information

Estimation of a quadratic regression functional using the sinc kernel

Estimation of a quadratic regression functional using the sinc kernel Estimation of a quadratic regression functional using the sinc kernel Nicolai Bissantz Hajo Holzmann Institute for Mathematical Stochastics, Georg-August-University Göttingen, Maschmühlenweg 8 10, D-37073

More information

Optimal global rates of convergence for interpolation problems with random design

Optimal global rates of convergence for interpolation problems with random design Optimal global rates of convergence for interpolation problems with random design Michael Kohler 1 and Adam Krzyżak 2, 1 Fachbereich Mathematik, Technische Universität Darmstadt, Schlossgartenstr. 7, 64289

More information

MULTIVARIATE HISTOGRAMS WITH DATA-DEPENDENT PARTITIONS

MULTIVARIATE HISTOGRAMS WITH DATA-DEPENDENT PARTITIONS Statistica Sinica 19 (2009), 159-176 MULTIVARIATE HISTOGRAMS WITH DATA-DEPENDENT PARTITIONS Jussi Klemelä University of Oulu Abstract: We consider estimation of multivariate densities with histograms which

More information

Function spaces on the Koch curve

Function spaces on the Koch curve Function spaces on the Koch curve Maryia Kabanava Mathematical Institute Friedrich Schiller University Jena D-07737 Jena, Germany Abstract We consider two types of Besov spaces on the Koch curve, defined

More information

Smooth functions and local extreme values

Smooth functions and local extreme values Smooth functions and local extreme values A. Kovac 1 Department of Mathematics University of Bristol Abstract Given a sample of n observations y 1,..., y n at time points t 1,..., t n we consider the problem

More information

Multiresolution analysis by infinitely differentiable compactly supported functions. N. Dyn A. Ron. September 1992 ABSTRACT

Multiresolution analysis by infinitely differentiable compactly supported functions. N. Dyn A. Ron. September 1992 ABSTRACT Multiresolution analysis by infinitely differentiable compactly supported functions N. Dyn A. Ron School of of Mathematical Sciences Tel-Aviv University Tel-Aviv, Israel Computer Sciences Department University

More information

(Part 1) High-dimensional statistics May / 41

(Part 1) High-dimensional statistics May / 41 Theory for the Lasso Recall the linear model Y i = p j=1 β j X (j) i + ɛ i, i = 1,..., n, or, in matrix notation, Y = Xβ + ɛ, To simplify, we assume that the design X is fixed, and that ɛ is N (0, σ 2

More information

On least squares estimators under Bernoulli-Laplacian mixture priors

On least squares estimators under Bernoulli-Laplacian mixture priors On least squares estimators under Bernoulli-Laplacian mixture priors Aleksandra Pižurica and Wilfried Philips Ghent University Image Processing and Interpretation Group Sint-Pietersnieuwstraat 41, B9000

More information

WAVELETS and other multiscale analysis tools underlie

WAVELETS and other multiscale analysis tools underlie 1322 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 10, NO. 9, SEPTEMBER 2001 Wavelet-Based Image Estimation: An Empirical Bayes Approach Using Jeffreys Noninformative Prior Mário A. T. Figueiredo, Senior

More information

Minimax Risk: Pinsker Bound

Minimax Risk: Pinsker Bound Minimax Risk: Pinsker Bound Michael Nussbaum Cornell University From: Encyclopedia of Statistical Sciences, Update Volume (S. Kotz, Ed.), 1999. Wiley, New York. Abstract We give an account of the Pinsker

More information

Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001)

Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Presented by Yang Zhao March 5, 2010 1 / 36 Outlines 2 / 36 Motivation

More information

Variance Function Estimation in Multivariate Nonparametric Regression

Variance Function Estimation in Multivariate Nonparametric Regression Variance Function Estimation in Multivariate Nonparametric Regression T. Tony Cai 1, Michael Levine Lie Wang 1 Abstract Variance function estimation in multivariate nonparametric regression is considered

More information

regression Lie Wang Abstract In this paper, the high-dimensional sparse linear regression model is considered,

regression Lie Wang Abstract In this paper, the high-dimensional sparse linear regression model is considered, L penalized LAD estimator for high dimensional linear regression Lie Wang Abstract In this paper, the high-dimensional sparse linear regression model is considered, where the overall number of variables

More information

Bayesian Adaptation. Aad van der Vaart. Vrije Universiteit Amsterdam. aad. Bayesian Adaptation p. 1/4

Bayesian Adaptation. Aad van der Vaart. Vrije Universiteit Amsterdam.  aad. Bayesian Adaptation p. 1/4 Bayesian Adaptation Aad van der Vaart http://www.math.vu.nl/ aad Vrije Universiteit Amsterdam Bayesian Adaptation p. 1/4 Joint work with Jyri Lember Bayesian Adaptation p. 2/4 Adaptation Given a collection

More information

Interpolating Scaling Vectors: Application to Signal and Image Denoising

Interpolating Scaling Vectors: Application to Signal and Image Denoising Interpolating Scaling Vectors: Application to Signal and Image Denoising Karsten Koch Philipps Universität Marburg FB 12 Mathematik und Informatik Hans-Meerwein Str, Lahnberge 35032 Marburg Germany Abstract

More information

Mi-Hwa Ko. t=1 Z t is true. j=0

Mi-Hwa Ko. t=1 Z t is true. j=0 Commun. Korean Math. Soc. 21 (2006), No. 4, pp. 779 786 FUNCTIONAL CENTRAL LIMIT THEOREMS FOR MULTIVARIATE LINEAR PROCESSES GENERATED BY DEPENDENT RANDOM VECTORS Mi-Hwa Ko Abstract. Let X t be an m-dimensional

More information

Inverse problems in statistics

Inverse problems in statistics Inverse problems in statistics Laurent Cavalier (Université Aix-Marseille 1, France) YES, Eurandom, 10 October 2011 p. 1/32 Part II 2) Adaptation and oracle inequalities YES, Eurandom, 10 October 2011

More information

SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION. University of Minnesota

SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION. University of Minnesota Submitted to the Annals of Statistics arxiv: math.pr/0000000 SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION By Wei Liu and Yuhong Yang University of Minnesota In

More information

Lecture Notes 5: Multiresolution Analysis

Lecture Notes 5: Multiresolution Analysis Optimization-based data analysis Fall 2017 Lecture Notes 5: Multiresolution Analysis 1 Frames A frame is a generalization of an orthonormal basis. The inner products between the vectors in a frame and

More information

FAST WAVELET TECHNIQUES FOR NEAR-OPTIMAL IMAGE PROCESSING

FAST WAVELET TECHNIQUES FOR NEAR-OPTIMAL IMAGE PROCESSING FAST WAVELET TECHNIQUES FOR NEAR-OPTIMAL IMAGE PROCESSING RONALD A. DeVORE BRADLEY J. LUCIER Department of Mathematics Department of Mathematics University of South Carolina Purdue University Columbia,

More information

Nonparametric Estimation of Functional-Coefficient Autoregressive Models

Nonparametric Estimation of Functional-Coefficient Autoregressive Models Nonparametric Estimation of Functional-Coefficient Autoregressive Models PEDRO A. MORETTIN and CHANG CHIANN Department of Statistics, University of São Paulo Introduction Nonlinear Models: - Exponential

More information

D I S C U S S I O N P A P E R

D I S C U S S I O N P A P E R I N S T I T U T D E S T A T I S T I Q U E B I O S T A T I S T I Q U E E T S C I E N C E S A C T U A R I E L L E S ( I S B A ) UNIVERSITÉ CATHOLIQUE DE LOUVAIN D I S C U S S I O N P A P E R 2014/06 Adaptive

More information

Which wavelet bases are the best for image denoising?

Which wavelet bases are the best for image denoising? Which wavelet bases are the best for image denoising? Florian Luisier a, Thierry Blu a, Brigitte Forster b and Michael Unser a a Biomedical Imaging Group (BIG), Ecole Polytechnique Fédérale de Lausanne

More information

Image Noise: Detection, Measurement and Removal Techniques. Zhifei Zhang

Image Noise: Detection, Measurement and Removal Techniques. Zhifei Zhang Image Noise: Detection, Measurement and Removal Techniques Zhifei Zhang Outline Noise measurement Filter-based Block-based Wavelet-based Noise removal Spatial domain Transform domain Non-local methods

More information

NECESSARY CONDITIONS FOR WEIGHTED POINTWISE HARDY INEQUALITIES

NECESSARY CONDITIONS FOR WEIGHTED POINTWISE HARDY INEQUALITIES NECESSARY CONDITIONS FOR WEIGHTED POINTWISE HARDY INEQUALITIES JUHA LEHRBÄCK Abstract. We establish necessary conditions for domains Ω R n which admit the pointwise (p, β)-hardy inequality u(x) Cd Ω(x)

More information

Extending the two-sample empirical likelihood method

Extending the two-sample empirical likelihood method Extending the two-sample empirical likelihood method Jānis Valeinis University of Latvia Edmunds Cers University of Latvia August 28, 2011 Abstract In this paper we establish the empirical likelihood method

More information

sparse and low-rank tensor recovery Cubic-Sketching

sparse and low-rank tensor recovery Cubic-Sketching Sparse and Low-Ran Tensor Recovery via Cubic-Setching Guang Cheng Department of Statistics Purdue University www.science.purdue.edu/bigdata CCAM@Purdue Math Oct. 27, 2017 Joint wor with Botao Hao and Anru

More information

Asymptotically Efficient Nonparametric Estimation of Nonlinear Spectral Functionals

Asymptotically Efficient Nonparametric Estimation of Nonlinear Spectral Functionals Acta Applicandae Mathematicae 78: 145 154, 2003. 2003 Kluwer Academic Publishers. Printed in the Netherlands. 145 Asymptotically Efficient Nonparametric Estimation of Nonlinear Spectral Functionals M.

More information

The Codimension of the Zeros of a Stable Process in Random Scenery

The Codimension of the Zeros of a Stable Process in Random Scenery The Codimension of the Zeros of a Stable Process in Random Scenery Davar Khoshnevisan The University of Utah, Department of Mathematics Salt Lake City, UT 84105 0090, U.S.A. davar@math.utah.edu http://www.math.utah.edu/~davar

More information

RATE-OPTIMAL GRAPHON ESTIMATION. By Chao Gao, Yu Lu and Harrison H. Zhou Yale University

RATE-OPTIMAL GRAPHON ESTIMATION. By Chao Gao, Yu Lu and Harrison H. Zhou Yale University Submitted to the Annals of Statistics arxiv: arxiv:0000.0000 RATE-OPTIMAL GRAPHON ESTIMATION By Chao Gao, Yu Lu and Harrison H. Zhou Yale University Network analysis is becoming one of the most active

More information

Multivariate Bayes Wavelet Shrinkage and Applications

Multivariate Bayes Wavelet Shrinkage and Applications Journal of Applied Statistics Vol. 32, No. 5, 529 542, July 2005 Multivariate Bayes Wavelet Shrinkage and Applications GABRIEL HUERTA Department of Mathematics and Statistics, University of New Mexico

More information

LARGER POSTERIOR MODE WAVELET THRESHOLDING AND APPLICATIONS

LARGER POSTERIOR MODE WAVELET THRESHOLDING AND APPLICATIONS LARGER POSTERIOR MODE WAVELET THRESHOLDING AND APPLICATIONS LUISA CUTILLO, Department of Mathematics and Applications, University of Naples Federico II, Naples, Italy Email: l.cutillo@dma.unina.it YOON

More information

A talk on Oracle inequalities and regularization. by Sara van de Geer

A talk on Oracle inequalities and regularization. by Sara van de Geer A talk on Oracle inequalities and regularization by Sara van de Geer Workshop Regularization in Statistics Banff International Regularization Station September 6-11, 2003 Aim: to compare l 1 and other

More information

OPTIMAL SURE PARAMETERS FOR SIGMOIDAL WAVELET SHRINKAGE

OPTIMAL SURE PARAMETERS FOR SIGMOIDAL WAVELET SHRINKAGE 17th European Signal Processing Conference (EUSIPCO 009) Glasgow, Scotland, August 4-8, 009 OPTIMAL SURE PARAMETERS FOR SIGMOIDAL WAVELET SHRINKAGE Abdourrahmane M. Atto 1, Dominique Pastor, Gregoire Mercier

More information

Stochastic Design Criteria in Linear Models

Stochastic Design Criteria in Linear Models AUSTRIAN JOURNAL OF STATISTICS Volume 34 (2005), Number 2, 211 223 Stochastic Design Criteria in Linear Models Alexander Zaigraev N. Copernicus University, Toruń, Poland Abstract: Within the framework

More information

Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes

Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes Alea 4, 117 129 (2008) Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes Anton Schick and Wolfgang Wefelmeyer Anton Schick, Department of Mathematical Sciences,

More information

저작권법에따른이용자의권리는위의내용에의하여영향을받지않습니다.

저작권법에따른이용자의권리는위의내용에의하여영향을받지않습니다. 저작자표시 - 비영리 - 변경금지 2.0 대한민국 이용자는아래의조건을따르는경우에한하여자유롭게 이저작물을복제, 배포, 전송, 전시, 공연및방송할수있습니다. 다음과같은조건을따라야합니다 : 저작자표시. 귀하는원저작자를표시하여야합니다. 비영리. 귀하는이저작물을영리목적으로이용할수없습니다. 변경금지. 귀하는이저작물을개작, 변형또는가공할수없습니다. 귀하는, 이저작물의재이용이나배포의경우,

More information

AALBORG UNIVERSITY. Compactly supported curvelet type systems. Kenneth N. Rasmussen and Morten Nielsen. R November 2010

AALBORG UNIVERSITY. Compactly supported curvelet type systems. Kenneth N. Rasmussen and Morten Nielsen. R November 2010 AALBORG UNIVERSITY Compactly supported curvelet type systems by Kenneth N Rasmussen and Morten Nielsen R-2010-16 November 2010 Department of Mathematical Sciences Aalborg University Fredrik Bajers Vej

More information

Wavelet Estimators in Nonparametric Regression: A Comparative Simulation Study

Wavelet Estimators in Nonparametric Regression: A Comparative Simulation Study Wavelet Estimators in Nonparametric Regression: A Comparative Simulation Study Anestis Antoniadis, Jeremie Bigot Laboratoire IMAG-LMC, University Joseph Fourier, BP 53, 384 Grenoble Cedex 9, France. and

More information

Strengthened Sobolev inequalities for a random subspace of functions

Strengthened Sobolev inequalities for a random subspace of functions Strengthened Sobolev inequalities for a random subspace of functions Rachel Ward University of Texas at Austin April 2013 2 Discrete Sobolev inequalities Proposition (Sobolev inequality for discrete images)

More information

Size properties of wavelet packets generated using finite filters

Size properties of wavelet packets generated using finite filters Rev. Mat. Iberoamericana, 18 (2002, 249 265 Size properties of wavelet packets generated using finite filters Morten Nielsen Abstract We show that asymptotic estimates for the growth in L p (R- norm of

More information

WAVELET EXPANSIONS OF DISTRIBUTIONS

WAVELET EXPANSIONS OF DISTRIBUTIONS WAVELET EXPANSIONS OF DISTRIBUTIONS JASSON VINDAS Abstract. These are lecture notes of a talk at the School of Mathematics of the National University of Costa Rica. The aim is to present a wavelet expansion

More information

Construction of Multivariate Compactly Supported Orthonormal Wavelets

Construction of Multivariate Compactly Supported Orthonormal Wavelets Construction of Multivariate Compactly Supported Orthonormal Wavelets Ming-Jun Lai Department of Mathematics The University of Georgia Athens, GA 30602 April 30, 2004 Dedicated to Professor Charles A.

More information

Local Polynomial Wavelet Regression with Missing at Random

Local Polynomial Wavelet Regression with Missing at Random Applied Mathematical Sciences, Vol. 6, 2012, no. 57, 2805-2819 Local Polynomial Wavelet Regression with Missing at Random Alsaidi M. Altaher School of Mathematical Sciences Universiti Sains Malaysia 11800

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Regularity of the density for the stochastic heat equation

Regularity of the density for the stochastic heat equation Regularity of the density for the stochastic heat equation Carl Mueller 1 Department of Mathematics University of Rochester Rochester, NY 15627 USA email: cmlr@math.rochester.edu David Nualart 2 Department

More information

ASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE. By Michael Nussbaum Weierstrass Institute, Berlin

ASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE. By Michael Nussbaum Weierstrass Institute, Berlin The Annals of Statistics 1996, Vol. 4, No. 6, 399 430 ASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE By Michael Nussbaum Weierstrass Institute, Berlin Signal recovery in Gaussian

More information

Bahadur representations for bootstrap quantiles 1

Bahadur representations for bootstrap quantiles 1 Bahadur representations for bootstrap quantiles 1 Yijun Zuo Department of Statistics and Probability, Michigan State University East Lansing, MI 48824, USA zuo@msu.edu 1 Research partially supported by

More information

MLISP: Machine Learning in Signal Processing Spring Lecture 10 May 11

MLISP: Machine Learning in Signal Processing Spring Lecture 10 May 11 MLISP: Machine Learning in Signal Processing Spring 2018 Lecture 10 May 11 Prof. Venia Morgenshtern Scribe: Mohamed Elshawi Illustrations: The elements of statistical learning, Hastie, Tibshirani, Friedman

More information