A Data-Driven Block Thresholding Approach To Wavelet Estimation


T. Tony Cai 1 and Harrison H. Zhou 2

University of Pennsylvania and Yale University

Abstract

A data-driven block thresholding procedure for wavelet regression is proposed and its theoretical and numerical properties are investigated. The procedure empirically chooses the block size and threshold level at each resolution level by minimizing Stein's unbiased risk estimate. The estimator is sharp adaptive over a class of Besov bodies and achieves simultaneously within a small constant factor of the minimax risk over a wide collection of Besov bodies, including both the dense and sparse cases. The procedure is easy to implement. Numerical results show that it has superior finite sample performance in comparison to the other leading wavelet thresholding estimators.

Keywords: Adaptivity; Besov body; Block thresholding; James-Stein estimator; Nonparametric regression; Stein's unbiased risk estimate; Wavelets.

AMS 1991 Subject Classification: Primary 62G07, Secondary 62G20.

1 Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA. The research of Tony Cai was supported in part by NSF Grants DMS and DMS.
2 Department of Statistics, Yale University, New Haven, CT.

1 Introduction

Consider the nonparametric regression model

    y_i = f(t_i) + σ z_i,   i = 1, 2, ..., n,   (1)

where t_i = i/n, σ is the noise level and the z_i's are independent standard normal random variables. The goal is to estimate the unknown regression function f based on the sample {y_i}.

Wavelet methods have demonstrated considerable success in nonparametric regression. They achieve a high degree of adaptivity through thresholding of the empirical wavelet coefficients. Standard wavelet approaches threshold the empirical coefficients term by term based on their individual magnitudes. See, for example, Donoho and Johnstone (1994a), Gao (1998), and Antoniadis and Fan (2001). More recent work has demonstrated that block thresholding, which simultaneously keeps or kills all the coefficients in a group rather than individually, enjoys a number of advantages over conventional term-by-term thresholding. Block thresholding increases estimation precision by utilizing information about neighboring wavelet coefficients and allows the balance between variance and bias to be varied along the curve, which results in adaptive smoothing. The degree of adaptivity, however, depends on the choice of block size and threshold level.

The idea of block thresholding can be traced back to Efromovich (1985) in orthogonal series estimation. In the context of wavelet estimation, global level-by-level thresholding was discussed in Donoho and Johnstone (1995) for regression and in Kerkyacharian, Picard and Tribouley (1996) for density estimation. But these block thresholding methods are not local, so they do not enjoy a high degree of spatial adaptivity. Hall, Kerkyacharian and Picard (1999) introduced a local blockwise hard thresholding procedure with a block size of order log n, where n is the sample size. Cai (1999) considered blockwise James-Stein rules and investigated the effect of block size and threshold level on adaptivity using an oracle inequality approach. In particular it was shown that a block size of order log n is optimal in the sense that it leads to an estimator which is both globally and locally adaptive. Cai and Silverman (2001) considered overlapping block thresholding estimators and Chicken and Cai (2004) applied block thresholding to density estimation.

The block size and threshold level play important roles in the performance of a block thresholding estimator. The local block thresholding methods mentioned above all have fixed block size and threshold, and the same thresholding rule is applied to all resolution levels regardless of the distribution of the wavelet coefficients. In the present paper, we propose a data-driven approach to empirically select both the block size and threshold at individual

resolution levels. At each resolution level, the procedure, SureBlock, chooses the block size and threshold empirically by minimizing Stein's Unbiased Risk Estimate (SURE). By empirically selecting both the block size and threshold and allowing them to vary from resolution level to resolution level, the SureBlock estimator has significant advantages over the more conventional wavelet thresholding estimators with fixed block sizes.

We consider in the present paper both the numerical performance and the asymptotic properties of the SureBlock estimator. It is shown that the minimizer of SURE yields a near-optimal estimate of the wavelet coefficient vector at each resolution level, which in turn generates an adaptive nonparametric estimator of the regression function f. The theoretical properties of the SureBlock estimator are considered in the Besov sequence space formulation that is by now classical for the analysis of wavelet regression methods. Besov spaces, denoted by B^α_{p,q} and defined in Section 4, are a very rich class of function spaces which contain functions of homogeneous and inhomogeneous smoothness. The theoretical results show that the SureBlock estimator automatically adapts to the sparsity of the underlying wavelet coefficient sequence and enjoys excellent adaptivity over a wide range of Besov bodies. In particular, in the dense case (p ≥ 2) the SureBlock estimator is sharp adaptive over all Besov bodies B^α_{p,q}(M) with p = q = 2 and adaptively achieves within a factor of 1.25 of the minimax risk over Besov bodies B^α_{p,q}(M) for all p, q ≥ 2. At the same time, the SureBlock estimator achieves simultaneously within a constant factor of the minimax risk over a wide collection of Besov bodies B^α_{p,q}(M) in the sparse case (p < 2). These properties are not shared simultaneously by many commonly used fixed block size procedures such as VisuShrink (Donoho and Johnstone, 1994a), SureShrink (Donoho and Johnstone, 1995) or BlockJS (Cai, 1999).

The SureBlock estimator is completely data-driven and easy to implement. A simulation study is carried out and the numerical findings reconfirm the theoretical results. The numerical results show that SureBlock has superior finite sample performance in comparison to the other leading wavelet estimators. More specifically, SureBlock uniformly outperforms both VisuShrink and SureShrink (Donoho and Johnstone, 1994a and 1995) in all 42 simulation cases in terms of the average squared error. The SureBlock procedure is better than BlockJS (Cai, 1999) in 37 out of 42 cases.

The paper is organized as follows. After Section 2, in which basic notation and block thresholding methods are briefly reviewed, we introduce in Section 3 the SureBlock procedure. Asymptotic properties of the SureBlock estimator are presented in Section 4. Numerical implementation and finite-sample performance of the SureBlock estimator are then discussed in Section 5. The numerical performance of SureBlock is compared with

those of VisuShrink (Donoho and Johnstone, 1994a), SureShrink (Donoho and Johnstone, 1995) and BlockJS (Cai, 1999). The proofs are given in Section 6.

2 Wavelets And Block Thresholding

Let {φ, ψ} be a pair of father and mother wavelets. The functions φ and ψ are assumed to be compactly supported and to satisfy ∫ φ = 1. Dilation and translation of φ and ψ generate an orthonormal wavelet basis with an associated orthogonal Discrete Wavelet Transform (DWT) which transforms sampled data into the wavelet coefficient domain. A wavelet ψ is called r-regular if ψ has r vanishing moments and r continuous derivatives. See Daubechies (1992) and Strang (1989) for details on the discrete wavelet transform and compactly supported wavelets. Let

    φ_{j,k}(x) = 2^{j/2} φ(2^j x − k),   ψ_{j,k}(x) = 2^{j/2} ψ(2^j x − k).

For a given square-integrable function f on [0, 1], define the wavelet coefficients of the wavelet expansion of f by

    ξ_{j,k} = ⟨f, φ_{j,k}⟩,   θ_{j,k} = ⟨f, ψ_{j,k}⟩.

Suppose we observe the noisy data Y = (y_1, y_2, ..., y_n) as in (1) and suppose the sample size n = 2^J for some integer J > 0. We use the standard device of the discrete wavelet transform to turn the function estimation problem into a problem of estimating wavelet coefficients. Let Ỹ = W·Y be the discrete wavelet transform of Y. Then Ỹ can be written as

    Ỹ = (ξ̃_{j_0,1}, ..., ξ̃_{j_0,2^{j_0}}, ỹ_{j_0,1}, ..., ỹ_{j_0,2^{j_0}}, ..., ỹ_{J−1,1}, ..., ỹ_{J−1,2^{J−1}})   (2)

where j_0 is some fixed primary resolution level. Here the ξ̃_{j_0,k} are the gross structure terms at the lowest resolution level, and the ỹ_{j,k} are the empirical wavelet coefficients at level j which represent fine structure at scale 2^j. Since the DWT is an orthogonal transform, the ỹ_{j,k} are independent normal random variables with standard deviation σ. Note that the mean of n^{−1/2} ỹ_{j,k} equals, up to a small approximation error, the wavelet coefficient θ_{j,k} of f. See Daubechies (1992). Through the discrete wavelet transform the nonparametric regression problem is thus turned into a problem of estimating a high dimensional normal mean vector.

A block thresholding procedure thresholds wavelet coefficients in groups and makes simultaneous decisions on all the coefficients within a block. Fix a block size L and a threshold level λ and divide the empirical wavelet coefficients ỹ_{j,k} at any given resolution level j into nonoverlapping blocks of size L. Denote by (jb) the indices of the coefficients in the b-th block at level j, i.e. (jb) = {(j, k) : (b − 1)L + 1 ≤ k ≤ bL}. Let

    S²_{(jb)} = Σ_{(j,k) ∈ (jb)} ỹ²_{j,k}

denote the sum of squared empirical wavelet coefficients in the block. A block (jb) is deemed important if S²_{(jb)} is larger than the threshold λ, in which case all the coefficients in the block are retained; otherwise the block is considered negligible and all the coefficients in the block are estimated by zero. The most commonly used rules for a given block are the James-Stein rule

    θ̂_{j,k} = (1 − λ / S²_{(jb)})_+ ỹ_{j,k},   for (j, k) ∈ (jb),   (3)

and the hard thresholding rule

    θ̂_{j,k} = ỹ_{j,k} I(S²_{(jb)} > λ),   for (j, k) ∈ (jb).   (4)

The estimator of {f(t_i), i = 1, 2, ..., n} is then given by the inverse discrete wavelet transform of the thresholded wavelet coefficients.

Block thresholding estimators depend on the choice of the block size L and the threshold level λ, which largely determines the performance of the resulting estimator. It is thus important to choose L and λ in an optimal way.

3 The SureBlock Procedure

As discussed in the previous section, the nonparametric regression problem of estimating f can be turned into a problem of estimating a high dimensional normal mean vector by applying the orthogonal discrete wavelet transform to the data. We thus begin in this section by considering estimation of the mean of a multivariate normal random variable. Suppose that we observe

    x_i = θ_i + σ z_i,   i = 1, 2, ..., d,   (5)

where the z_i are independent standard normal random variables and σ is known. Here x = (x_1, ..., x_d) represents the observations at one resolution level in wavelet regression. Without loss of generality we assume in this section that σ = 1. We wish to estimate θ = (θ_1, ..., θ_d) based on the observations x = (x_1, ..., x_d) under the average mean squared error

    R(θ̂, θ) = (1/d) Σ_{i=1}^d E(θ̂_i − θ_i)².   (6)

We shall estimate θ by a blockwise James-Stein estimator with block size L and threshold level λ chosen empirically by minimizing Stein's Unbiased Risk Estimate (SURE). Let L ≥ 1 be the length of each block, and m = d/L the number of blocks.

For simplicity we shall assume that d is divisible by L in the following discussion. Let x_b = (x_{(b−1)L+1}, ..., x_{bL}) represent the observations in the b-th block, and similarly θ_b = (θ_{(b−1)L+1}, ..., θ_{bL}) and z_b = (z_{(b−1)L+1}, ..., z_{bL}). Let S²_b = ||x_b||² for b = 1, 2, ..., m. The blockwise James-Stein estimator is given by

    θ̂_b(λ, L) = (1 − λ / S²_b)_+ x_b,   b = 1, 2, ..., m,   (7)

where λ ≥ 0 is the threshold level. Write θ̂_b(λ, L) = x_b + g(x_b), where g is a function from IR^L to IR^L. Stein (1981) showed that when g is weakly differentiable, then

    E_{θ_b} ||θ̂_b(λ, L) − θ_b||² = E_{θ_b} { L + ||g||² + 2 ∇·g }.

In our case,

    g(x_b) = (1 − λ / S²_b)_+ x_b − x_b

is weakly differentiable. Simple calculations show that

    E_{θ_b} ||θ̂_b(λ, L) − θ_b||² = E_{θ_b} SURE(x_b, λ, L)

where

    SURE(x_b, λ, L) = L + (λ² − 2λ(L − 2)) / S²_b · I(S²_b > λ) + (S²_b − 2L) · I(S²_b ≤ λ).   (8)

This implies that the total risk satisfies E_θ ||θ̂(λ, L) − θ||² = E_θ SURE(x, λ, L), where

    SURE(x, λ, L) = Σ_{b=1}^m SURE(x_b, λ, L)   (9)

is an unbiased estimate of the risk. We choose the block size L^S and threshold level λ^S to be the minimizer of SURE(x, λ, L), i.e.

    (λ^S, L^S) = arg min_{λ, L} SURE(x, λ, L).

The estimator of θ is then given by (7) with L = L^S and λ = λ^S.

The SURE block James-Stein estimator has some drawbacks in situations of extreme sparsity of the wavelet coefficients. We propose to use a hybrid scheme similar to Donoho and Johnstone (1995) and Johnstone (1999). The hybrid method works as follows. Set T_d = d^{−1} Σ_i (x_i² − 1), γ_d = d^{−1/2} log_2^{3/2} d and λ_F = 2L log d. For a fixed 0 ≤ v < 1, let (λ̃, L̃) denote the minimizers of SURE with an additional restriction on the search range:

    (λ̃, L̃) = arg min_{max{L−2, 0} ≤ λ ≤ λ_F, 1 ≤ L ≤ d^v} SURE(x, λ, L).   (10)
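To make (7)-(9) concrete before completing the description of the hybrid rule, here is a minimal numerical sketch (not from the paper; the function names and the small guard against division by zero are ours) that computes the blockwise James-Stein estimate and its SURE value for given λ and L, assuming σ = 1 and d divisible by L.

```python
import numpy as np

def block_james_stein(x, lam, L):
    """Blockwise James-Stein rule (7): shrink each block of length L by
    the factor (1 - lam / S_b^2)_+ where S_b^2 = ||x_b||^2."""
    blocks = x.reshape(-1, L)                      # assumes len(x) % L == 0
    S2 = np.sum(blocks ** 2, axis=1, keepdims=True)
    shrink = np.clip(1.0 - lam / np.maximum(S2, 1e-300), 0.0, None)
    return (shrink * blocks).ravel()

def sure(x, lam, L):
    """Stein's unbiased risk estimate (8)-(9) of the blockwise
    James-Stein rule with threshold lam and block size L (sigma = 1)."""
    S2 = np.sum(x.reshape(-1, L) ** 2, axis=1)
    per_block = np.where(S2 > lam,
                         L + (lam ** 2 - 2.0 * lam * (L - 2)) / np.maximum(S2, 1e-300),
                         S2 - L)
    return float(per_block.sum())
```

For each fixed L, Proposition 1 in Section 5 shows that the minimizing λ lies in a finite set determined by the block sums S²_b, so the minimization in (10) reduces to a finite enumeration.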

Define the estimator θ̂(x) of θ by

    θ̂_b = θ̂_b(λ̃, L̃),   if T_d > γ_d,   (11)

and

    θ̂_i = (1 − 2 log d / x_i²)_+ x_i,   if T_d ≤ γ_d.   (12)

When T_d ≤ γ_d the estimator is a degenerate block James-Stein estimator with block size L = 1. In this case the estimator is also called the non-negative garrote estimator; see Breiman (1995) and Gao (1998). The SURE approach has also been used to select the threshold level in fixed block size procedures: term-by-term thresholding (L = 1) in Donoho and Johnstone (1995) and block thresholding (L = log n) in Chicken (2004).

We now return to the nonparametric regression model (1). As in Section 2, denote by Ỹ_j = {ỹ_{j,k} : k = 1, ..., 2^j} and θ_j = {θ_{j,k} : k = 1, ..., 2^j} the empirical and true wavelet coefficients of the regression function f at resolution level j. We apply the hybrid SURE block James-Stein procedure to the empirical wavelet coefficients Ỹ_j level by level and then use the inverse discrete wavelet transform to obtain the estimate of the regression function. More specifically, the procedure for wavelet regression, called SureBlock, has the following steps.

1. Transform the data into the wavelet domain via the discrete wavelet transform: Ỹ = W·Y.

2. At each resolution level j, select the block size L_j and threshold level λ_j by minimizing Stein's unbiased risk estimate and then estimate the wavelet coefficients by the hybrid block James-Stein rule. More precisely, for j_0 ≤ j < J,

    (λ_j, L_j) = arg min_{max{L−2, 0} ≤ λ ≤ λ_F, 1 ≤ L ≤ 2^{vj}} SURE(σ^{−1} Ỹ_j, λ, L)   (13)

where Ỹ_j is the empirical wavelet coefficient vector at resolution level j, and

    θ̂_j = σ θ̂(σ^{−1} Ỹ_j)   (14)

where θ̂ is the hybrid block James-Stein estimator given in (11) and (12).

3. Obtain the estimate of the function via the inverse discrete wavelet transform of the denoised wavelet coefficients: f̂ = W^{−1} Θ̂.
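A level-by-level sketch of the hybrid rule (11)-(12) and the restricted search (13) follows. It reuses the sure and block_james_stein helpers from the sketch above, restricts block sizes to divisors of the level length (the paper assumes divisibility only for simplicity), and takes λ_F = 2L log d and γ_d = d^{−1/2} log_2^{3/2} d as reconstructed here; it should be read as an illustration under these assumptions rather than as the authors' implementation.

```python
import numpy as np

def candidate_thresholds(x, L, lam_max):
    """Finite candidate set of Proposition 1, intersected with the
    search range [max(L - 2, 0), lam_max] used in (10) and (13)."""
    S2 = np.sum(x.reshape(-1, L) ** 2, axis=1)
    lo = max(L - 2.0, 0.0)
    cands = S2[(S2 >= lo) & (S2 <= lam_max)]
    return np.unique(np.concatenate(([lo], cands)))

def sureblock_level(x, v=0.75):
    """Hybrid SURE block James-Stein estimate of one noise-standardized
    resolution level x (sigma = 1), following (11)-(13)."""
    d = len(x)
    T_d = np.mean(x ** 2 - 1.0)
    gamma_d = d ** (-0.5) * np.log2(d) ** 1.5
    if T_d <= gamma_d:
        # Sparse case (12): non-negative garrote with threshold 2 log d.
        return np.clip(1.0 - 2.0 * np.log(d) / np.maximum(x ** 2, 1e-300), 0.0, None) * x
    best_val, best_lam, best_L = np.inf, 0.0, 1
    for L in (l for l in range(1, max(int(d ** v), 1) + 1) if d % l == 0):
        lam_F = 2.0 * L * np.log(d)
        for lam in candidate_thresholds(x, L, lam_F):
            val = sure(x, lam, L)
            if val < best_val:
                best_val, best_lam, best_L = val, lam, L
    return block_james_stein(x, best_lam, best_L)
```

The default v = 0.75 matches the choice v = 3/4 used in the simulations of Section 5.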

Theoretical results given in Section 4 show that the pair (L_j, λ_j) is asymptotically optimal in the sense that the resulting block thresholding estimator adaptively attains the exact minimax block thresholding risk asymptotically. Note that, in contrast to some other block thresholding methods, both the block size and the threshold level of the SureBlock estimator are chosen empirically and vary from resolution level to resolution level. Both the theoretical and the numerical results given in the next two sections show that the SureBlock estimator outperforms wavelet thresholding estimators with a fixed block size.

4 Theoretical Properties of SureBlock

In this section we turn to the theoretical properties of the SureBlock estimator in the Besov sequence space formulation that is by now classical for the analysis of wavelet regression methods. The asymptotic results show that the SureBlock procedure is strongly adaptive.

Besov spaces are a very rich class of function spaces and contain as special cases many traditional smoothness spaces such as Hölder and Sobolev spaces. Roughly speaking, the Besov space B^α_{p,q} contains functions having α bounded derivatives in L_p norm; the third parameter q gives a finer gradation of smoothness. Full details on Besov spaces are given, for example, in Triebel (1983) and DeVore and Popov (1988). For a given r-regular mother wavelet ψ with r > α and a fixed primary resolution level j_0, the Besov sequence norm ||·||_{b^α_{p,q}} of the wavelet coefficients of a function f is defined by

    ||f||_{b^α_{p,q}} = ||ξ_{j_0}||_p + ( Σ_{j=j_0}^∞ (2^{js} ||θ_j||_p)^q )^{1/q}   (15)

where ξ_{j_0} is the vector of the father wavelet coefficients at the primary resolution level j_0, θ_j is the vector of the wavelet coefficients at level j, and s = α + 1/2 − 1/p > 0. Note that the Besov function norm of index (α, p, q) of a function f is equivalent to the sequence norm (15) of the wavelet coefficients of the function. See Meyer (1992).

We shall focus our theoretical discussion on a sequence space version of the SureBlock estimator. Suppose that n = 2^J for some integer J and that we observe the sequence data

    y_{j,k} = θ_{j,k} + n^{−1/2} σ z_{j,k},   j ≥ j_0, k = 1, 2, ..., 2^j,   (16)

where the z_{j,k} are independent standard normal random variables. We wish to estimate the mean array θ under the expected squared error

    R(θ̂, θ) = E Σ_{j,k} (θ̂_{j,k} − θ_{j,k})² = E ||θ̂ − θ||².

Define the Besov body B^α_{p,q}(M) by

    B^α_{p,q}(M) = { θ : ( Σ_{j=j_0}^∞ (2^{js} ||θ_j||_p)^q )^{1/q} ≤ M }   (17)

where, as usual, s = α + 1/2 − 1/p > 0. The minimax risk of estimating θ over the Besov body B^α_{p,q}(M) is

    R*(B^α_{p,q}(M)) = inf_{θ̂} sup_{θ ∈ B^α_{p,q}(M)} E ||θ̂ − θ||².   (18)

Donoho and Johnstone (1998) show that the minimax risk R*(B^α_{p,q}(M)) converges to 0 at the rate n^{−2α/(1+2α)} as n → ∞.

We apply the SureBlock procedure of Section 3 to the array of empirical coefficients y_{j,k} for j_0 ≤ j < J to obtain estimated wavelet coefficients θ̂_{j,k}, and set θ̂_{j,k} = 0 for j ≥ J. The minimax risk among all block James-Stein estimators with all possible block sizes 1 ≤ L_j ≤ 2^{jv}, where 0 ≤ v < 1, and threshold levels λ_j ≥ 0 is

    R*_T(B^α_{p,q}(M)) = inf_{λ_j ≥ 0, 1 ≤ L_j ≤ 2^{jv}} sup_{θ ∈ B^α_{p,q}(M)} Σ_{j=j_0}^∞ E ||θ̂_j(λ_j, L_j) − θ_j||².   (19)

We shall call R*_T(B^α_{p,q}(M)) the minimax block thresholding risk. It is clear that R*_T(B^α_{p,q}(M)) ≥ R*(B^α_{p,q}(M)). On the other hand, Theorems 2 and 3 below show that R*_T(B^α_{p,q}(M)) is within a small constant factor of the minimax risk R*(B^α_{p,q}(M)).

The following theorem shows that the SureBlock estimator adaptively attains the exact minimax block thresholding risk R*_T(B^α_{p,q}(M)) asymptotically over a wide range of Besov bodies.

Theorem 1 Suppose we observe the sequence model (16). Let θ̂ be the SureBlock estimator of θ defined in (11) and (12). Then

    sup_{θ ∈ B^α_{p,q}(M)} E ||θ̂ − θ||² ≤ R*_T(B^α_{p,q}(M)) (1 + o(1))   (20)

for 1 ≤ p, q ≤ ∞, 0 < M < ∞, and α > 1/(2(1 − v)) ∨ (1/p − 1/2)_+.

Theorems 2 and 3 below make it clear that the SureBlock procedure is indeed nearly optimally adaptive over a wide collection of Besov bodies B^α_{p,q}(M), including both the dense (p ≥ 2) and sparse (p < 2) cases. The estimator is asymptotically sharp adaptive over Besov bodies with p = q = 2 in the sense that it adaptively attains both the optimal rate and the optimal constant. Over Besov bodies with p ≥ 2 and q ≥ 2, SureBlock adaptively achieves

within a factor of 1.25 of the minimax risk. At the same time the maximum risk of the estimator is simultaneously within a constant factor of the minimax risk over a collection of Besov bodies B^α_{p,q}(M) in the sparse case p < 2.

Theorem 2 (i). The SureBlock estimator is adaptively sharp minimax over Besov bodies B^α_{2,2}(M) for all M > 0 and α > v/(2(1 − v)). That is,

    sup_{θ ∈ B^α_{2,2}(M)} E ||θ̂ − θ||² ≤ R*(B^α_{2,2}(M)) (1 + o(1)).   (21)

(ii). The SureBlock estimator is adaptively, asymptotically within a factor of 1.25 of the minimax risk over Besov bodies B^α_{p,q}(M),

    sup_{θ ∈ B^α_{p,q}(M)} E ||θ̂ − θ||² ≤ 1.25 R*(B^α_{p,q}(M)) (1 + o(1))   (22)

for all p ≥ 2, q ≥ 2, M > 0 and α > v/(1 − v).

For the sparse case p < 2, the SureBlock estimator is also simultaneously within a small constant factor of the minimax risk.

Theorem 3 The SureBlock estimator is asymptotically minimax up to a constant factor G(p ∧ q) over a large range of Besov bodies with 1 ≤ p, q ≤ ∞, 0 < M < ∞, and α > (2/p) ∨ v. That is,

    sup_{θ ∈ B^α_{p,q}(M)} E ||θ̂ − θ||² ≤ G(p ∧ q) R*(B^α_{p,q}(M)) (1 + o(1))   (23)

where G(p ∧ q) is a constant depending only on p ∧ q.

4.1 Analysis of a Single Resolution Level

The proof of the main results given above relies on a detailed risk analysis of the SureBlock procedure on a single resolution level. In particular we derive the following upper bound for the risk of SureBlock at a given resolution level. The result may also be of independent interest for the problem of estimating a multivariate normal mean.

Consider the sequence model (5) where x_i = θ_i + z_i, i = 1, ..., d, and the z_i are independent N(0, 1) variables. Set r(λ, L) = d^{−1} E ||θ̂(λ, L) − θ||² and let

    R(θ) = inf_{0 ≤ λ, 1 ≤ L ≤ d^v} r(λ, L) = inf_{max{L−2, 0} ≤ λ, 1 ≤ L ≤ d^v} r(λ, L)   (24)

and

    R_F(θ) = r(2 log d, 1).   (25)

The following theorem gives upper bounds for the risk of the SureBlock estimator.

Theorem 4 Let the data {x_i, i = 1, 2, ..., d} be given as in (5) and let θ̂ be the SureBlock estimator defined in (11) and (12). Set µ_d = ||θ||²/d and γ_d = d^{−1/2} log_2^{3/2} d. Then

a. for any constant η > 0 and uniformly in θ ∈ IR^d,

    d^{−1} E_θ ||θ̂ − θ||² ≤ R(θ) + R_F(θ) I(µ_d ≤ 3γ_d) + c_η d^{η − (1 − v)/2};   (26)

b. for any constant η > 0 and uniformly in µ_d ≤ (1/3) γ_d,

    d^{−1} E_θ ||θ̂ − θ||² ≤ R_F(θ) + c_η d^{η − 1/2},   (27)

where c_η is a constant depending only on η.

In the proof of Theorem 1 we will see that R(θ), the ideal risk, is the dominating term, and all other terms in Equations (26) and (27) are negligible. This result shows that the SureBlock procedure mimics the performance of the oracle procedure in which the ideal threshold level and the ideal block size are given. Theorem 4 can be regarded as a generalization of Theorem 4 of Donoho and Johnstone (1995) from a fixed block size of one to variable block size. This generalization is important because it enables the resulting SureBlock estimator to be not only adaptively rate optimal over a wide collection of Besov bodies across both the dense (p ≥ 2) and sparse (p < 2) cases, but also sharp adaptive over spaces where linear estimators can be asymptotically minimax. This property is not shared by fixed block size procedures such as VisuShrink, SureShrink or BlockJS. The theoretical advantages of the SureBlock estimator over these fixed block size procedures are also confirmed by the numerical results given in the next section.

5 Implementation And Numerical Results

We consider in this section the finite sample performance of the SureBlock estimator. The SureBlock procedure is easy to implement. The following result is useful for the numerical implementation as well as for the derivation of the theoretical results of the SureBlock estimator.

Proposition 1 Let x_i, i = 1, ..., d, and SURE(x, λ, L) be given as in (5) and (9), respectively. Let the block size L be given. Then the minimizer λ of SURE(x, λ, L) is an element of the set A, where

    A = { x_i² : 1 ≤ i ≤ d } ∪ {0},   if L = 1,

and

    A = { S_i² : S_i² ≥ L − 2, 1 ≤ i ≤ m } ∪ {L − 2},   if L ≥ 2.
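As a quick numerical sanity check of Proposition 1, the following sketch (reusing the sure and candidate_thresholds helpers from the sketches in Section 3; the simulated data and the tolerance are arbitrary choices of ours) verifies on one random example that a fine grid search over thresholds never improves on the finite candidate set:

```python
import numpy as np

rng = np.random.default_rng(0)
# A sparse-ish mean vector observed with unit-variance Gaussian noise.
x = rng.normal(size=512) + np.repeat(rng.normal(scale=3.0, size=64), 8)

for L in (1, 4, 8):
    lam_F = 2.0 * L * np.log(len(x))
    grid = np.linspace(max(L - 2, 0), lam_F, 2000)
    grid_best = min(sure(x, lam, L) for lam in grid)
    cand_best = min(sure(x, lam, L) for lam in candidate_thresholds(x, L, lam_F))
    assert cand_best <= grid_best + 1e-9
```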

Proposition 1 shows that, for a given block size L, it is sufficient to search over the finite set A for the threshold λ which minimizes SURE(x, λ, L).

The SureBlock procedure first empirically chooses the optimal threshold and block size (λ̃, L̃) as in (10). The values of (λ̃, L̃) depend on the choice of the parameter v which determines the search range for the block size L. In our simulation studies we choose v = 3/4. That is,

    (λ̃, L̃) = arg min_{max{L−2, 0} ≤ λ ≤ λ_F, 1 ≤ L ≤ d^{3/4}} SURE(x, λ, L).   (28)

The theoretical results given in Section 4 show that this procedure is adaptively within a constant factor of the minimax risk over a wide collection of Besov bodies.

The noise level σ is assumed to be known in Section 3. In practice σ needs to be estimated from the data. We use the following robust estimator of σ given in Donoho and Johnstone (1994a), based on the empirical wavelet coefficients at the highest resolution level:

    σ̂ = median( |ỹ_{J−1,k}| : 1 ≤ k ≤ 2^{J−1} ) / 0.6745.

In this section we compare the numerical performance of SureBlock with those of VisuShrink (Donoho and Johnstone, 1994a), SureShrink (Donoho and Johnstone, 1995) and BlockJS (Cai, 1999). VisuShrink thresholds the empirical wavelet coefficients individually with a fixed threshold level. SureShrink is a term-by-term thresholding procedure which selects the threshold at each resolution level by minimizing Stein's unbiased risk estimate; in the simulation we use the hybrid method proposed in Donoho and Johnstone (1995). BlockJS is a block thresholding procedure with a fixed block size log n and a fixed threshold level. Each of these wavelet estimators has been shown to perform well numerically as well as theoretically. For further details see the original papers.

Six test functions, representing different levels of spatial variability, and various sample sizes, wavelets and signal-to-noise ratios are used for a systematic comparison of the four wavelet procedures. The test functions are plotted in Figure 3 in the Appendix. Sample sizes ranging from n = 256 to n = 16384 and signal-to-noise ratios (SNR) from 3 to 7 were considered. The SNR is the ratio of the standard deviation of the function values to the standard deviation of the noise. Different combinations of wavelets and signal-to-noise ratios yield basically the same results. For reasons of space, we only report in detail the results for one particular case, using Daubechies' compactly supported wavelet Symmlet 8 and SNR equal to 7. We use the software package WaveLab for the simulations, and the procedures MultiVisu (for VisuShrink) and MultiHybrid (for SureShrink) in WaveLab 802 are used. See the WaveLab web site.
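Putting the pieces together, a minimal end-to-end sketch of SureBlock might look as follows. It uses the PyWavelets package in place of the WaveLab routines used in the paper, the 'sym8' wavelet with periodized boundary handling so that the coefficient vectors have dyadic lengths, an arbitrary primary level j0 = 3, and the sureblock_level helper from the sketch in Section 3; it is an illustration under these assumptions, not the authors' code.

```python
import numpy as np
import pywt

def sureblock(y, wavelet="sym8", j0=3, v=0.75):
    """SureBlock sketch: DWT, robust noise estimate, level-by-level hybrid
    SURE block James-Stein shrinkage, inverse DWT.  Assumes len(y) = 2^J."""
    J = int(np.log2(len(y)))
    coeffs = pywt.wavedec(y, wavelet, mode="periodization", level=J - j0)
    # Robust noise estimate from the finest-level detail coefficients
    # (Donoho and Johnstone, 1994a): sigma_hat = median(|y_{J-1,k}|) / 0.6745.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    denoised = [coeffs[0]]                    # coarse coefficients are kept as is
    for d_j in coeffs[1:]:                    # detail levels j0, ..., J-1
        denoised.append(sigma * sureblock_level(d_j / sigma, v=v))
    return pywt.waverec(denoised, wavelet, mode="periodization")
```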

Figure 1: The vertical bars represent the ratios of the average squared errors of the various estimators to the corresponding average squared error of SureBlock, in three panels (VisuShrink vs. SureBlock, SureShrink vs. SureBlock and BlockJS vs. SureBlock) for the test functions Doppler, HeaviSine, Bumps, Blocks, Piece-Regular and Piece-Polynomial. The higher the bar, the better the relative performance of SureBlock. The bars are plotted on a log scale and are truncated at the value of the original ratio. For each signal the bars are ordered from left to right by the sample sizes n = 256 to 16384.

Table 1 below reports the average squared errors (rounded to three significant digits) over 50 replications for the four thresholding estimators. A graphical presentation is given in Figure 1. SureBlock consistently outperforms both VisuShrink and SureShrink in all 42 simulation cases in terms of the average squared error. The SureBlock procedure is better than BlockJS about 88% of the time (37 out of 42 cases). SureBlock fails to dominate BlockJS only for the test function Doppler, and even there SureBlock is almost as good as BlockJS. Although SureBlock does not dominate BlockJS for the test function Doppler, the improvement of SureBlock over BlockJS is significant for the other test functions.

The simulation results show that, by empirically choosing the block size and threshold and allowing them to vary from resolution level to resolution level, the SureBlock estimator has significant numerical advantages over thresholding estimators with a fixed block size L = 1 (VisuShrink and SureShrink) or L = log n (BlockJS). These numerical findings reconfirm the theoretical results given in Section 4.

Table 1: Average squared errors over 50 replications with SNR = 7, for each of the test functions Blocks, Bumps, Doppler, HeaviSine, Piece-Polynomial and Piece-Regular and each sample size n. In the table VS stands for VisuShrink, SS for SureShrink, BJS for BlockJS and SB for SureBlock.

In addition to the comparison of the overall global performance of the wavelet estimators, it is also instructive to compare the performance at individual resolution levels and to examine the value of the empirical selection of block sizes. Table 2 gives the average

squared errors at individual resolution levels over 50 replications for Bumps with n = 1024 and SNR = 7. Table 2 compares the errors of SureBlock, SureShrink and an estimator, called SureGarrote, which empirically chooses the threshold λ at each level but fixes the block size at L = 1. SureBlock consistently outperforms SureGarrote and SureShrink at each resolution level. Both SureGarrote and SureShrink have block size L = 1; the results indicate that the non-negative garrote shrinkage is slightly better than soft thresholding in this example.

Table 2: Average squared errors of SureBlock, SureGarrote and SureShrink at individual resolution levels j.

Figure 2 shows a typical result of the SureBlock procedure applied to a noisy Bumps signal. The left panel is the noisy signal; the middle panel is a display of the empirical wavelet coefficients arranged according to resolution levels; and the right panel is the SureBlock reconstruction (solid line) together with the true signal (dotted line). In this example the block sizes chosen by SureBlock are 2, 3, 1, 5, 3, 5 and 1 from resolution level j = 3 to level j = 9.

Figure 2: SureBlock procedure applied to a noisy Bumps signal.

16 6 Proofs Throughout this section, without loss of generality, we shall assume the noise level σ = 1. We first prove Theorem 4 and then use it as the main tool to prove Theorem 1. The proofs of Theorems and 3 and Proposition 1 are given later. 6.1 Notation And Preparatory Results Before proving the main theorems, we need to introduce some notation and collect a few technical results. The proofs of some of these preparatory results are long and tedious. For reasons of space these proofs are omitted here. We refer interested readers to Cai and Zhou 005 for the complete proofs. Consider the sequence model 5 with σ = 1. For a given block size L and threshold level λ, set r b λ, L = E θ b θ b λ, L θ b and define r λ, L = 1 d m r b λ, L = ED λ, L b=1 where D λ, L = 1 d m b=1 θ b λ, L θ b. Set R θ = inf r λ, L = inf r λ, L. 9 λ λ F,1 L d v max{l,0} λ λ F,1 L d v The difference between Rθ and Rθ defined in 4 is that the search range for the threshold λ in Rθ is restricted to be at most λ F. The result given below shows that the effect of this restriction is negligible for any block size L. Lemma 1 For any fixed η > 0, there exists a constant C η > 0 such that for all θ R d, R θ R θ C a d a 1. The following simple lemma is adapted from Donoho and Johnstone 1995 and is used the proof of Theorem 4. Lemma Let T d = d 1 x i 1 and µ d = d 1 θ. If γd d/ log d, then sup 1 + µ d P T d γ d = o d 1. µ d 3γ d We also need the following bounds for the loss of the SureBlock estimator and the derivative of the risk function r b λ, L. The first bound is used in the proof of Theorem 1 and the second in the proof of Theorem 4. 16

17 Lemma 3 Let {x i : i = 1,..., d} be given as in 5 with σ = 1. Then and for λ > 0 and L 1, ˆθ θ λ F d + z 30 λ r b λ, L < Finally we develop a key technical result for the proof of Theorem 4. Set U λ, L = 1 d SURE x, λ, L = m λ λ L I Sb > λ + Sb L I Sb λ. d Note that both D λ, L and U λ, L have expectation r λ, L. b=1 The goal is to show that the minimizer λ S, L S of Stein unbiased estimator of risk U λ, L is asymptotically the ideal threshold level and block size. The key step is to show that the following difference d = ED λ S, L S infr λ, L is negligible for max {L, 0} λ λ F and 1 L d v. It is easy to verify that for any two functions g and h defined on the same domain, inf x gx inf x hx sup x gx hx. Hence, Uλ S, L S inf rλ, L = inf Uλ, L inf λ,l λ,l λ,l S b λ,l rλ, L sup Uλ, L rλ, L λ,l and consequently d E D λ S, L S r λ S, L S + r λ S, L S U λ S, L S + U λ S, L S infr λ, L Esup λ,l D λ, L r λ, L + Esup r λ, L U λ, L. 3 λ,l The following proposition provides the upper bounds for the two terms on the RHS of 3 and is a key technical result for the proof of Theorem 4. λ,l Proposition Uniformly in θ R d, for 0 v < 1 we have E θ sup U λ, L r λ, L = o d η 1 + v max{l,0} λ λ F,1 L d v E θ sup D λ, L r λ, L = o d η 1 + v max{l,1/ log d} λ λ F,1 L d v for any 1 v > η > 0, where λ F = L log d. In the proofs we will denote by C a generic constant that may vary from place to place. 17

18 6. Proof of Theorem 4 The proof of Theorem 4 is similar to but more involved than those of Theorem 4 of Donoho and Johnstone1995 and Theorem of Johnstone 1999 because of variable block size. Without loss of generality, we may assume λ 1/ log d since equation 31 implies R θ inf 1/ log d λ,1 L dvr λ, L 1 + C/ log d R θ, which means that the ideal risk over λ 1/ log d is asymptotically the same as the ideal risk over all λ 0. Correspondingly, we shall also restrict the minimizer of U λ, L searched over [1/ log d, instead of over [0, in the SureBlock procedure. Set T d = d 1 x i 1 and γ d = d 1 log 3 d. Define the event A d = {T d γ d } and decompose the risk of SureBlock procedure into two parts, R d θ = d 1 E θ θ θ = R 1,d θ + R,d θ } { where R 1,d θ = d 1 E θ { θ θ I A d and R,d θ = d 1 E θ θ θ I A c d }. We first consider R 1,d θ. On the event A d, the signal is sparse and θ is the nonnegative garrote estimator ˆθ i = 1 log d/x i + x i by the definition of θ in equation 1. Decomposing R 1,d θ further into two parts with either µ d = d 1 θ 3γ d or µ d > 3γ d yields that R 1,d θ R F θ I µ d 3γ d + r 1,d θ where R F θ is the risk of the non-negative garrote estimator and } r 1,d θ = d 1 E θ { θ θ I A d I µ d > 3γ d. Note that on the event A d, ˆθ x d + dγ d and so r 1,d θ d 1 E θ + θ P A d I µ d > 3γ d 1 + µ d P A d = od 1 35 where the last step follows from Lemma. Note that for any η > 0, Lemma 1 yields that for some constant C η > 0 for all θ R d, and Equations 3-34 yield R θ R θ C η d η 1 36 R,d θ R θ d Cd η 1 + v. 37 The proof of part a of the theorem is completed by putting together Equations

19 We now turn to part b. Note first that R 1,d θ R F θ. On the other hand, R,d θ = d 1 E { } 1 θ θ I A c d Cd 1 E θ θ 4 P 1 A c d. To complete the proof of 7, it then suffices to show that under the assumption µ d 1 3 γ d, E θ θ 4 is bounded by a polynomial of d and P A c d decays faster than any polynomial of d 1. Note that in this case θ = dµ d d 1 log 3 d. Since ˆθ x and x i = θ i + z i, E θ θ 4 E θ + θ E x + θ E 4 θ + z 3 θ 4 + 8E z 4 3d log 3 d + 16d + 8d. On the other hand, it follows from Hoeffding s inequality and Mill s inequality that P A c d = P d 1 z i + z i θ i + θ i 1 > d 1 3 log d P d 1 zi 1 > d log d + P d 1 z i θ i > d log d exp C log 3 d + 1 exp C log 3 d/µ d which decays faster than any polynomial of d Proof of Theorem 1 We now use Theorem 4 as one of the main technical tools to prove Theorem 1. Again we set σ = 1. To more conveniently use the results given in Theorem 4, we shall renormalize the model 16 so that the noise level is 1. Multiplying by a factor of n 1 on both sides, the sequence model 16 becomes y j,k = θ j,k + z j,k, j j 0, k = 1,,, j 38 where y j,k = n 1 y j,k and θ j,k = n 1 θ j,k. Note that E θ ˆθ θ = n 1 E θ ˆθ θ where the expectation on the left side is under model 16 and the right side under the renormalized model 38. We shall use the sequence model 38 in the proof below. Fix 0 < ε 0 < 1/1 + α and let J 0 be the largest integer satisfying J 0 n ε 0. Write n 1 E θ ˆθ θ = + + n 1 E θ ˆθ θ j J 1 j J 0 k J 0 j<j 1 k k = S 1 + S + S 3 19

20 where J 1 > J 0 is to be chosen later. Since 0 < ε 0 < 1/1+α, n ε 0 n 1 log n = o n α/1+α. It then follows from equation 30 in Lemma 3 that S 1 Cλ F n ε 0 n 1 = o n α 1+α which is negligible relative to the minimax risk. On the other hand, it follows from 6 in Theorem 4 that S n 1 j R θ j J 0 j<j 1 = S 1 + S + S 3 + n 1 j R F θ j J 0 j<j 1 I µ j 3γ j + where µ j = j θ j = j n θ j and γ j = j j 3. It is obvious that S 1 = J 0 j<j 1 n 1 c η jη+1/+v/ J 0 j<j 1 n 1 j Rθ j R T B α p,qm 39 and R T Bα p,qm R B α p,qm Cn α/1+α for some constant C > 0. We shall see that both S and S 3 are negligible relative to the minimax risk. Note that S 3 = J 0 j<j 1 n 1 c η jη+1/+v/ Cn 1 J1η+1/+v/. 40 On the other hand, it follows from the Oracle Inequality 3.10 in Cai 1999 with L = 1 and λ = λ F = log j that j j R F θ j nθj,k λ F + 8 log j 1 j n θ j + 8 log j 1 j. 41 j=1 Recall that µ j = j n θ j and γ j = j j 3. It then follows from 41 that S = J 0 j<j 1 n 1 j R F θ jiµ j Cn 1 J 1η+ 1 J γ j J 0 j<j 1 3n 1 j 3 j + 8n 1 log j 1 j Hence if J 1 satisfies J 1 = n γ 1 with some γ <, then 40 and 4 yield 1+αη+1/+v/ S + S 3 = o n α 1+α. 43 We now turn to the term S 3. It is easy to check that for θ B α p,qm, θ j M α j where α = α 1 p 1 + > 0. Note that if J 1 satisfies J 1 = n γ for some γ > 1 1/+α, then 0 4

21 for all sufficiently large n and all j J 1, µ j 1 3 γ j where γ j = j j 3. It thus follows from 7 and 41 that whenever α > 1 p 1 +, S 3 = j J 1 k n 1 E θ ˆθ j,k θ j,k j J 1 n 1 j R F θ j + c η n 1 jη θ j + Cn 1 C α J 1 + Cn 1 = on α 1+α. 44 j J 1 For α > v p , both Equations 43 and 44 hold, and hence the proof is completed by choosing γ satisfying 1 1/ + α 1/p 1/ + < γ < This is always possible by choosing η > 0 sufficiently small α η + 1/ + v/. 6.4 Proof of Theorem Define the minimax linear risk by R L Bα p,qm = inf θ linear sup θ B α p,q M E ˆθ θ. It follows from Donoho, Liu and MacGibbon 1990 that θ RLB p,qm α j,k /n = sup θj,k + 1/n, θ Bp,qM α j=j 0 R LB α p,qm = R B α p,qm1 + o1 for p = q =, and R L Bα p,qm 1.5R B α p,qm1 + o1, for all p > and q >. It therefore suffices to establish that the SureBlock procedure asymptotically attains the minimax linear risk, i.e., sup E θ ˆθ θ RLB p,qm1 α + o1, for α > 0, p and q. θ Bp,qM α Recall that in the proof of Theorem 1 it is shown that E θ ˆθ θ S 1 + S 1 + S + S 3 + S 3, k where S 1 + S + S 3 + S 3 = on α/α+1 and S 1 = J 0 j<j 1 n 1 j Rθ j with J 0 and J 1 chosen as in the proof of Theorem 1. Since the minimax risk R B α p,qm n α/α+1, this implies that S 1 is the dominating term in the maximum risk of the SureBlock procedure. It follows from the definition of Rθ j given in 4 that n 1 j Rθ j n 1 b E θ b ˆθ bl j, L j θ b 1

22 where the RHS is the risk of the blockwise James-Stein estimator with any fixed block size 1 L j jv where 0 v < 1 and a fixed threshold level L j. Stein s unbiased risk estimate see for example Lehmann 1983, page 300 yields that n 1 E θ b ˆθ bl j, L j θ b θ b L j /n θ b b b + L j /n +. n Hence the maximum risk of SureBlock procedure satisfies sup E θ ˆθ θ θ b L j /n θ Bp,q α M sup θ Bp,q α M θ b + L j /n + n = sup θ B α p,qm J 0 j<j 1 J 0 j<j 1 b b θ b L j /n θ b + L j /n o1 J 0 j<j 1 j nl j 1 + o1. Note that in the proof of Theorem 1, J 1 satisfies J 1 = n γ 1 with γ <. Hence 1+αη+1/+v/ if L j satisfies jρ L j jv for some ρ > 1 v η, then and hence j 1 nl j n 1 J 1 + v +η = o n α 1+α j J 1 sup E θ ˆθ θ θ Bp,qM α sup θ Bp,qM α J 0 j<j 1 b θ b L j /n θ b + L j /n 1 + o1. To complete the proof it remains to show that θ b L j /n sup θ Bp,qM α θ b RLB p,qm1 α + o L j /n Note that j /L j b=1 J 0 j<j 1 θ b L j /n θ b + L j /n = L j /L j n j j L j b b=1 Then the simple inequality m i=1 a i m J 0 j<j 1 b θ b L j /n θ b + L j /n L j /n θ b + L j /n = j n i=1 a 1 i j n j /L Lj j n b=1 1 θ b + L j /n. m, for a i > 0, 1 i m yields that Lj j j /L j θ b + L j n L j n b=1 = j /n j /L j b=1 θ b j /L j b=1 θ b + j /n = j /n k θ j,k + j /n. k θ j,k

23 Theorem in Cai, Low and Zhao 000 shows that sup j /n k θ j,k θ Bp,qM α J 0 j<j 1 and this proves inequality 45. k θ j,k + j /n = R LB α p,qm1 + o1, 6.5 Proof of Theorem 3 To establish the theorem, it suffices to show the following result which plays the role similar to that of Proposition 16 in Donoho and Johnstone 1994b. Proposition 3 Let X N µ, 1 and let F p η denote the probability measures F dµ satisfying the moment condition µ p F dµ η p. Let { } r δ g λ, η = sup E F r g µ : µ p F dµ η p, F pη where r g µ = E µ δ g λ x µ and δ g λ x = 1 λ x. Let p 0, and λ = x + log η p, then r δ g λ, η ηp λ p 1 + o 1 as η 0. Proof of Proposition 3 : It follows from an analogous argument to the proof of Proposition 16 in Donoho and Johnstone 1994b that p η r g δ λ, η = sup r g µ 1 + o1, as η 0. µ η µ Stein s unbiased risk estimate yields that r g µ = E µ [1 + x λ I x λ + x ] + λ4 I x λ x 1 + λ P x λ + + λ P x λ λ 1 + o 1 which implies that for µ λ p η r g µ η p λ p 1 + o 1, µ as η Similarly, λ r g µ = E µ [x 1 + x ] + λ4 x x + I x λ µ + 4P x λ. 3

24 Now the simple inequality P x λ φλ + µ = µ + o λ 4 4 ηp yields that for η µ < λ p η r g µ η p λ p 1 + o 1, as η µ The proposition now follows from 46 and 47. We now return to the proof of Theorem 3. Set ρ g η = inf λ sup Fp η E F r g µ. Then Proposition 3 implies that ρ g η r δ g λ, η ηp log η p p/ 1 + o 1, as η 0. For p 0,, Theorem 18 of Donoho and Johnstone 1994b shows the univariate Bayesminimax risk satisfies ρ η = inf δ sup E F E µ δ x µ = η p log η p p/ 1 + o1, as η 0. F pη Note that ρ g η /ρ η is bounded as η 0 and ρ g η /ρ η 1 as η. Both ρ g η and ρη are continuous on 0,, so G p = sup η ρ g η ρ η <, for p 0,. Theorems 4 and 5 in Section 4 of Donoho and Johnstone 1998 derived the asymptotic minimaxity over Besov bodies from the univariate Bayes-minimax estimators. It then follows from an analogous argument of Section 5.3 in Donoho and Johnstone 1998 that R T B α p,qm inf λ j sup E θ Bp,q α M 6.6 Proof of Proposition 1 j=j 0 θ j λ j, 1 θ j G p q R B α p,qm 1 + o 1. Proposition 1 in Section 5 shows that the minimizer of the Stein s unbiased estimator of risk for the block James-Stein estimator is in a finite set A with cardinality A d + 1. This makes the implementation of the SureBlock procedure easy. In addition, Proposition 1 is needed for the proof of Proposition, a key result to prove Theorem 4. Proof of Proposition 1: We shall consider two separate cases: L = 1 and L. L = 1, we define A = {x i ; 1 i d} {0}. From equations 8 and 9, Stein s unbiased estimator of risk is SURE x, λ, 1 = d 1 + λ + λ I x x i > λ + x i I x i λ. i i=1 4 For

25 Suppose, without loss of generality, that the x i have been reordered in the order of increasing x i. On intervals x k λ < x k+1, SURE x, λ, 1 = d + k x i + i=1 i k+1 λ + λ x i is strictly increasing. So is true on the interval 0 λ < x 1. Therefore the minimizer λ S of SURE x, λ, 1 is either 0 or one of the x i. We now turn to the case L. We first show that λ S L, or equivalently that SURE x, λ, L SURE x, L, L 48 uniformly for 0 λ L. It suffices to show that for every block b Equation 8 gives SURE x b, λ, L SURE x b, L, L, for 0 λ L. SURE x b, L, L SURE x b, λ, L λ L = I S Sb b > L S 4 + b LSb λ + λ L I λ < S Sb b L The first term on the right side is nonpositive and the second term can be also shown to be nonpositive. Note that the convex function h x = x Lx λ λ L is nonpositive at two endpoints x = λ and x = L which implies Thus inequality 48 is established. S 4 b LS b λ + λ L 0, for λ < S b L. We now consider the case λ L with L. Let S1 S S m denote the order statistics of Si, 1 i m. On the interval Sk λ < S k+1 for some 1 k m, we consider two separate cases: i L Sk λ < S k+1 ; ii S k L λ < S k+1. If L Sk λ < S k+1, it follows from equation 8 that SURE x, λ, L = d + k S b L + b=1 b k+1 λ λ L S b which, as a function of λ, achieves minimum at Sk. For the second case, SURE x, λ, L achieves minimum at λ = L from the equation above, and this completes the proof. 5

References

[1] Antoniadis, A. and Fan, J. (2001). Regularization of wavelet approximations (with discussion). J. Amer. Statist. Assoc. 96.

[2] Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics 37.

[3] Cai, T. (1999). Adaptive wavelet estimation: A block thresholding and oracle inequality approach. Ann. Statist. 27.

[4] Cai, T., Low, M. G. and Zhao, L. (2000). Blockwise thresholding methods for sharp adaptive estimation. Technical Report, Department of Statistics, University of Pennsylvania.

[5] Cai, T. and Silverman, B. W. (2001). Incorporating information on neighboring coefficients into wavelet estimation. Sankhya Ser. B 63.

[6] Cai, T. and Zhou, H. (2005). A data-driven block thresholding approach to wavelet estimation. Technical Report, Department of Statistics, University of Pennsylvania.

[7] Chicken, E. (2004). Block-dependent thresholding in wavelet regression. J. Nonparametric Statist., to appear.

[8] Chicken, E. and Cai, T. (2004). Block thresholding for density estimation: Local and global adaptivity. J. Multivariate Anal., in press.

[9] Daubechies, I. (1992). Ten Lectures on Wavelets. SIAM, Philadelphia.

[10] DeVore, R. and Popov, V. (1988). Interpolation of Besov spaces. Trans. Amer. Math. Soc. 305.

[11] Donoho, D. L. and Johnstone, I. M. (1994a). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81.

[12] Donoho, D. L. and Johnstone, I. M. (1994b). Minimax risk over l_p-balls for l_q-error. Prob. Th. Rel. Fields 99.

[13] Donoho, D. L. and Johnstone, I. M. (1995). Adapting to unknown smoothness via wavelet shrinkage. J. Amer. Statist. Assoc. 90.

[14] Donoho, D. L. and Johnstone, I. M. (1998). Minimax estimation via wavelet shrinkage. Ann. Statist. 26.

[15] Donoho, D. L., Liu, R. C. and MacGibbon, B. (1990). Minimax risk over hyperrectangles, and implications. Ann. Statist. 18.

[16] Efromovich, S. Y. (1985). Nonparametric estimation of a density of unknown smoothness. Theor. Probab. Appl. 30.

[17] Gao, H.-Y. (1998). Wavelet shrinkage denoising using the non-negative garrote. J. Comput. Graph. Statist. 7.

[18] Hall, P., Kerkyacharian, G. and Picard, D. (1999). On the minimax optimality of block thresholded wavelet estimators. Statist. Sinica 9.

[19] Johnstone, I. M. (1999). Wavelet shrinkage for correlated data and inverse problems: adaptivity results. Statist. Sinica 9.

[20] Kerkyacharian, G., Picard, D. and Tribouley, K. (1996). L^p adaptive density estimation. Bernoulli 2.

[21] Lehmann, E. L. (1983). Theory of Point Estimation. John Wiley and Sons, New York.

[22] Meyer, Y. (1992). Wavelets and Operators. Cambridge University Press, Cambridge.

[23] Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. Ann. Statist. 9.

[24] Strang, G. (1989). Wavelets and dilation equations: A brief introduction. SIAM Review 31.

[25] Triebel, H. (1983). Theory of Function Spaces. Birkhäuser Verlag, Basel.

Figure 3: Test functions (Blocks, Bumps, Doppler, HeaviSine, Piece-Polynomial, Piece-Regular).

29 7 Appendix: Proofs of Technical Results We collect in this Appendix the proofs of the technical results stated in Section Proof of Lemma 1 Recall that R θ and R θ are defined as following R θ = inf λ,1 L nvr λ, L = inf L λ,1 L nvr λ, L R θ = inf r λ, L = inf r λ, L, λ λ F,1 L n v L λ λ F,1 L n v If R θ = r λ 1, L 1 with λ 1 λ F, R θ R θ = 0. In the case R θ = r λ1, L 1 with λ 1 > λ F, we have R θ r λ F, L 1, which implies then it is enough to prove R θ R θ r λ F, L 1 r λ1, L 1 = 1 m rb λ F, L 1 rb λ 1, L 1 d uniformly for all λ 1, b and L. Recall that where b=1 B = r b λ F, L r b λ 1, L Cd δ 1 r b λ, L = θ b + E θ b g λ I S b > λ g λ = λ λ L S Sb b + L which is an increasing function of λ for λ L, then simple algebra gives B E θ b λ F λ F L S b S b + L I λ 1 S b > λ F so Note that λ F λ F L Sb + L 0, when Sb λ F + S b B E θ b λ F λ F L S b 4P θ b λf + S b > λ F. S b + L I λ F + S b > λ F 9

30 Then B 4P θ b λf + Sb 4P θ b λf + 1/ θ b z b = O d 1 when θ b 4λ F. Now we consider the case θ b 4λ F. We have r λ F, θ b r λ1, θ b = and which leads to where Note that I 1 = = λ1 λ F λ1 λ F λ r λ, θ b dλ θ e k θ k=0 k! y>λ f L+k λ = f L+k y dy y>λ L + k / 1 y/ = f L+k y dy y r λ F, θ b r λ1, θ b λ1 θ k = e θ k! λ F = I 1 I k=0 λ1 θ I 1 = e k θ λ k! F k=0 λ1 θ I = e k θ λ k! F λ1 λ F k=0 θ e k θ k=0 = I 11 + I 1 k! y>λ+4k y>λ y>λ+k λ+k y>λ λ L + 4 f L+k y dy 4f L+k λ dλ y λ + k y f L+k y dy dλ y y λ + k f L+k y dy dλ y λ + k y f L+k y dy dλ y + λ+4k y>λ+k y λ + k f L+k y dy dλ y and I = λ1 λ F k=0 = I 1 + I θ e k θ k! λ+k max{λ,l+k } + max{λ,l+k } min{λ,l+k } λ + k y f L+k y dy dλ y 30

31 where I 11 = o d 1/, since y>λ+4k y λ + k f L+k y dy P χ L+k λ F + 4k y [ λ 1 F 1 + 4k exp 1 λ F + 4k L + k ] L + k = o d 1/ by Lemma in Cai 1999, and I 0, then I 1 I o d 1/ + I 1 I 1. When λ L + k, then f L+k y is increasing on [λ,, thus we have = λ+4k λ+k k 0 [ y λ + k λ+k f L+k y dy y λ z z + λ + k f L+k z + λ + k λ + k y f L+k y dy y z λ + k z f L+k λ + k z ] dz 0 because the integrand is nonpositive. When λ L + k, then k λ F L + /. Let s assume k Mλ F. In this case, we consider the following difference λ+4k λ+k λ+4k λ+k y λ + k λ+k f L+k y dy y y λ + k f L+k y dy y L+k λ+l/+k L+k λ + k y f L+k y dy y λ + k y f L+k y dy y Note that, for c 1 > c > 1 and m = L + k we have f m c 1 m f m c m = c1 c m/ 1 e c c 1 m/ = c c 1 e mc log c c 1 log c 1 c c 1 e m1 1/c 1c c 1 because x log x = 1 1/x 1 1/c 1, for c x c 1, as d we have f mc 1 m f m c m m which implies as d the following term is negative λ+4k λ+l/+k y λ + k λ + k y f L+k y dy f L+k y dy λ+k y L+k y λ + k λ + L / + k kf L+k λ + k f L+k λ + L / + k λ + L / + k 31


OPTIMAL POINTWISE ADAPTIVE METHODS IN NONPARAMETRIC ESTIMATION 1 The Annals of Statistics 1997, Vol. 25, No. 6, 2512 2546 OPTIMAL POINTWISE ADAPTIVE METHODS IN NONPARAMETRIC ESTIMATION 1 By O. V. Lepski and V. G. Spokoiny Humboldt University and Weierstrass Institute

More information

Wavelets and Image Compression. Bradley J. Lucier

Wavelets and Image Compression. Bradley J. Lucier Wavelets and Image Compression Bradley J. Lucier Abstract. In this paper we present certain results about the compression of images using wavelets. We concentrate on the simplest case of the Haar decomposition

More information

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of

More information

Comments on \Wavelets in Statistics: A Review" by. A. Antoniadis. Jianqing Fan. University of North Carolina, Chapel Hill

Comments on \Wavelets in Statistics: A Review by. A. Antoniadis. Jianqing Fan. University of North Carolina, Chapel Hill Comments on \Wavelets in Statistics: A Review" by A. Antoniadis Jianqing Fan University of North Carolina, Chapel Hill and University of California, Los Angeles I would like to congratulate Professor Antoniadis

More information

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1998 Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ Lawrence D. Brown University

More information

DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania

DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania Submitted to the Annals of Statistics DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING By T. Tony Cai and Linjun Zhang University of Pennsylvania We would like to congratulate the

More information

POINT VALUES AND NORMALIZATION OF TWO-DIRECTION MULTIWAVELETS AND THEIR DERIVATIVES

POINT VALUES AND NORMALIZATION OF TWO-DIRECTION MULTIWAVELETS AND THEIR DERIVATIVES November 1, 1 POINT VALUES AND NORMALIZATION OF TWO-DIRECTION MULTIWAVELETS AND THEIR DERIVATIVES FRITZ KEINERT AND SOON-GEOL KWON,1 Abstract Two-direction multiscaling functions φ and two-direction multiwavelets

More information

Design of Image Adaptive Wavelets for Denoising Applications

Design of Image Adaptive Wavelets for Denoising Applications Design of Image Adaptive Wavelets for Denoising Applications Sanjeev Pragada and Jayanthi Sivaswamy Center for Visual Information Technology International Institute of Information Technology - Hyderabad,

More information

How much should we rely on Besov spaces as a framework for the mathematical study of images?

How much should we rely on Besov spaces as a framework for the mathematical study of images? How much should we rely on Besov spaces as a framework for the mathematical study of images? C. Sinan Güntürk Princeton University, Program in Applied and Computational Mathematics Abstract Relations between

More information

The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA

The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010 Introduction Definition Oracle Properties Computations Relationship: Nonnegative Garrote Extensions:

More information

Affine and Quasi-Affine Frames on Positive Half Line

Affine and Quasi-Affine Frames on Positive Half Line Journal of Mathematical Extension Vol. 10, No. 3, (2016), 47-61 ISSN: 1735-8299 URL: http://www.ijmex.com Affine and Quasi-Affine Frames on Positive Half Line Abdullah Zakir Husain Delhi College-Delhi

More information

Biorthogonal Spline Type Wavelets

Biorthogonal Spline Type Wavelets PERGAMON Computers and Mathematics with Applications 0 (00 1 0 www.elsevier.com/locate/camwa Biorthogonal Spline Type Wavelets Tian-Xiao He Department of Mathematics and Computer Science Illinois Wesleyan

More information

Learning Theory. Ingo Steinwart University of Stuttgart. September 4, 2013

Learning Theory. Ingo Steinwart University of Stuttgart. September 4, 2013 Learning Theory Ingo Steinwart University of Stuttgart September 4, 2013 Ingo Steinwart University of Stuttgart () Learning Theory September 4, 2013 1 / 62 Basics Informal Introduction Informal Description

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

Plugin Confidence Intervals in Discrete Distributions

Plugin Confidence Intervals in Discrete Distributions Plugin Confidence Intervals in Discrete Distributions T. Tony Cai Department of Statistics The Wharton School University of Pennsylvania Philadelphia, PA 19104 Abstract The standard Wald interval is widely

More information

LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011

LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011 LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS S. G. Bobkov and F. L. Nazarov September 25, 20 Abstract We study large deviations of linear functionals on an isotropic

More information

ESTIMATORS FOR GAUSSIAN MODELS HAVING A BLOCK-WISE STRUCTURE

ESTIMATORS FOR GAUSSIAN MODELS HAVING A BLOCK-WISE STRUCTURE Statistica Sinica 9 2009, 885-903 ESTIMATORS FOR GAUSSIAN MODELS HAVING A BLOCK-WISE STRUCTURE Lawrence D. Brown and Linda H. Zhao University of Pennsylvania Abstract: Many multivariate Gaussian models

More information

Analysis Methods for Supersaturated Design: Some Comparisons

Analysis Methods for Supersaturated Design: Some Comparisons Journal of Data Science 1(2003), 249-260 Analysis Methods for Supersaturated Design: Some Comparisons Runze Li 1 and Dennis K. J. Lin 2 The Pennsylvania State University Abstract: Supersaturated designs

More information

Estimation of a quadratic regression functional using the sinc kernel

Estimation of a quadratic regression functional using the sinc kernel Estimation of a quadratic regression functional using the sinc kernel Nicolai Bissantz Hajo Holzmann Institute for Mathematical Stochastics, Georg-August-University Göttingen, Maschmühlenweg 8 10, D-37073

More information

Optimal global rates of convergence for interpolation problems with random design

Optimal global rates of convergence for interpolation problems with random design Optimal global rates of convergence for interpolation problems with random design Michael Kohler 1 and Adam Krzyżak 2, 1 Fachbereich Mathematik, Technische Universität Darmstadt, Schlossgartenstr. 7, 64289

More information

MULTIVARIATE HISTOGRAMS WITH DATA-DEPENDENT PARTITIONS

MULTIVARIATE HISTOGRAMS WITH DATA-DEPENDENT PARTITIONS Statistica Sinica 19 (2009), 159-176 MULTIVARIATE HISTOGRAMS WITH DATA-DEPENDENT PARTITIONS Jussi Klemelä University of Oulu Abstract: We consider estimation of multivariate densities with histograms which

More information

Function spaces on the Koch curve

Function spaces on the Koch curve Function spaces on the Koch curve Maryia Kabanava Mathematical Institute Friedrich Schiller University Jena D-07737 Jena, Germany Abstract We consider two types of Besov spaces on the Koch curve, defined

More information

Smooth functions and local extreme values

Smooth functions and local extreme values Smooth functions and local extreme values A. Kovac 1 Department of Mathematics University of Bristol Abstract Given a sample of n observations y 1,..., y n at time points t 1,..., t n we consider the problem

More information

Multiresolution analysis by infinitely differentiable compactly supported functions. N. Dyn A. Ron. September 1992 ABSTRACT

Multiresolution analysis by infinitely differentiable compactly supported functions. N. Dyn A. Ron. September 1992 ABSTRACT Multiresolution analysis by infinitely differentiable compactly supported functions N. Dyn A. Ron School of of Mathematical Sciences Tel-Aviv University Tel-Aviv, Israel Computer Sciences Department University

More information

(Part 1) High-dimensional statistics May / 41

(Part 1) High-dimensional statistics May / 41 Theory for the Lasso Recall the linear model Y i = p j=1 β j X (j) i + ɛ i, i = 1,..., n, or, in matrix notation, Y = Xβ + ɛ, To simplify, we assume that the design X is fixed, and that ɛ is N (0, σ 2

More information

On least squares estimators under Bernoulli-Laplacian mixture priors

On least squares estimators under Bernoulli-Laplacian mixture priors On least squares estimators under Bernoulli-Laplacian mixture priors Aleksandra Pižurica and Wilfried Philips Ghent University Image Processing and Interpretation Group Sint-Pietersnieuwstraat 41, B9000

More information

WAVELETS and other multiscale analysis tools underlie

WAVELETS and other multiscale analysis tools underlie 1322 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 10, NO. 9, SEPTEMBER 2001 Wavelet-Based Image Estimation: An Empirical Bayes Approach Using Jeffreys Noninformative Prior Mário A. T. Figueiredo, Senior

More information

Minimax Risk: Pinsker Bound

Minimax Risk: Pinsker Bound Minimax Risk: Pinsker Bound Michael Nussbaum Cornell University From: Encyclopedia of Statistical Sciences, Update Volume (S. Kotz, Ed.), 1999. Wiley, New York. Abstract We give an account of the Pinsker

More information

Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001)

Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Presented by Yang Zhao March 5, 2010 1 / 36 Outlines 2 / 36 Motivation

More information

Variance Function Estimation in Multivariate Nonparametric Regression

Variance Function Estimation in Multivariate Nonparametric Regression Variance Function Estimation in Multivariate Nonparametric Regression T. Tony Cai 1, Michael Levine Lie Wang 1 Abstract Variance function estimation in multivariate nonparametric regression is considered

More information

regression Lie Wang Abstract In this paper, the high-dimensional sparse linear regression model is considered,

regression Lie Wang Abstract In this paper, the high-dimensional sparse linear regression model is considered, L penalized LAD estimator for high dimensional linear regression Lie Wang Abstract In this paper, the high-dimensional sparse linear regression model is considered, where the overall number of variables

More information

Bayesian Adaptation. Aad van der Vaart. Vrije Universiteit Amsterdam. aad. Bayesian Adaptation p. 1/4

Bayesian Adaptation. Aad van der Vaart. Vrije Universiteit Amsterdam.  aad. Bayesian Adaptation p. 1/4 Bayesian Adaptation Aad van der Vaart http://www.math.vu.nl/ aad Vrije Universiteit Amsterdam Bayesian Adaptation p. 1/4 Joint work with Jyri Lember Bayesian Adaptation p. 2/4 Adaptation Given a collection

More information

Interpolating Scaling Vectors: Application to Signal and Image Denoising

Interpolating Scaling Vectors: Application to Signal and Image Denoising Interpolating Scaling Vectors: Application to Signal and Image Denoising Karsten Koch Philipps Universität Marburg FB 12 Mathematik und Informatik Hans-Meerwein Str, Lahnberge 35032 Marburg Germany Abstract

More information

Mi-Hwa Ko. t=1 Z t is true. j=0

Mi-Hwa Ko. t=1 Z t is true. j=0 Commun. Korean Math. Soc. 21 (2006), No. 4, pp. 779 786 FUNCTIONAL CENTRAL LIMIT THEOREMS FOR MULTIVARIATE LINEAR PROCESSES GENERATED BY DEPENDENT RANDOM VECTORS Mi-Hwa Ko Abstract. Let X t be an m-dimensional

More information

Inverse problems in statistics

Inverse problems in statistics Inverse problems in statistics Laurent Cavalier (Université Aix-Marseille 1, France) YES, Eurandom, 10 October 2011 p. 1/32 Part II 2) Adaptation and oracle inequalities YES, Eurandom, 10 October 2011

More information

SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION. University of Minnesota

SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION. University of Minnesota Submitted to the Annals of Statistics arxiv: math.pr/0000000 SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION By Wei Liu and Yuhong Yang University of Minnesota In

More information

Lecture Notes 5: Multiresolution Analysis

Lecture Notes 5: Multiresolution Analysis Optimization-based data analysis Fall 2017 Lecture Notes 5: Multiresolution Analysis 1 Frames A frame is a generalization of an orthonormal basis. The inner products between the vectors in a frame and

More information

FAST WAVELET TECHNIQUES FOR NEAR-OPTIMAL IMAGE PROCESSING

FAST WAVELET TECHNIQUES FOR NEAR-OPTIMAL IMAGE PROCESSING FAST WAVELET TECHNIQUES FOR NEAR-OPTIMAL IMAGE PROCESSING RONALD A. DeVORE BRADLEY J. LUCIER Department of Mathematics Department of Mathematics University of South Carolina Purdue University Columbia,

More information

Nonparametric Estimation of Functional-Coefficient Autoregressive Models

Nonparametric Estimation of Functional-Coefficient Autoregressive Models Nonparametric Estimation of Functional-Coefficient Autoregressive Models PEDRO A. MORETTIN and CHANG CHIANN Department of Statistics, University of São Paulo Introduction Nonlinear Models: - Exponential

More information

D I S C U S S I O N P A P E R

D I S C U S S I O N P A P E R I N S T I T U T D E S T A T I S T I Q U E B I O S T A T I S T I Q U E E T S C I E N C E S A C T U A R I E L L E S ( I S B A ) UNIVERSITÉ CATHOLIQUE DE LOUVAIN D I S C U S S I O N P A P E R 2014/06 Adaptive

More information

Which wavelet bases are the best for image denoising?

Which wavelet bases are the best for image denoising? Which wavelet bases are the best for image denoising? Florian Luisier a, Thierry Blu a, Brigitte Forster b and Michael Unser a a Biomedical Imaging Group (BIG), Ecole Polytechnique Fédérale de Lausanne

More information

Image Noise: Detection, Measurement and Removal Techniques. Zhifei Zhang

Image Noise: Detection, Measurement and Removal Techniques. Zhifei Zhang Image Noise: Detection, Measurement and Removal Techniques Zhifei Zhang Outline Noise measurement Filter-based Block-based Wavelet-based Noise removal Spatial domain Transform domain Non-local methods

More information

NECESSARY CONDITIONS FOR WEIGHTED POINTWISE HARDY INEQUALITIES

NECESSARY CONDITIONS FOR WEIGHTED POINTWISE HARDY INEQUALITIES NECESSARY CONDITIONS FOR WEIGHTED POINTWISE HARDY INEQUALITIES JUHA LEHRBÄCK Abstract. We establish necessary conditions for domains Ω R n which admit the pointwise (p, β)-hardy inequality u(x) Cd Ω(x)

More information

Extending the two-sample empirical likelihood method

Extending the two-sample empirical likelihood method Extending the two-sample empirical likelihood method Jānis Valeinis University of Latvia Edmunds Cers University of Latvia August 28, 2011 Abstract In this paper we establish the empirical likelihood method

More information

sparse and low-rank tensor recovery Cubic-Sketching

sparse and low-rank tensor recovery Cubic-Sketching Sparse and Low-Ran Tensor Recovery via Cubic-Setching Guang Cheng Department of Statistics Purdue University www.science.purdue.edu/bigdata CCAM@Purdue Math Oct. 27, 2017 Joint wor with Botao Hao and Anru

More information

Asymptotically Efficient Nonparametric Estimation of Nonlinear Spectral Functionals

Asymptotically Efficient Nonparametric Estimation of Nonlinear Spectral Functionals Acta Applicandae Mathematicae 78: 145 154, 2003. 2003 Kluwer Academic Publishers. Printed in the Netherlands. 145 Asymptotically Efficient Nonparametric Estimation of Nonlinear Spectral Functionals M.

More information

The Codimension of the Zeros of a Stable Process in Random Scenery

The Codimension of the Zeros of a Stable Process in Random Scenery The Codimension of the Zeros of a Stable Process in Random Scenery Davar Khoshnevisan The University of Utah, Department of Mathematics Salt Lake City, UT 84105 0090, U.S.A. davar@math.utah.edu http://www.math.utah.edu/~davar

More information

RATE-OPTIMAL GRAPHON ESTIMATION. By Chao Gao, Yu Lu and Harrison H. Zhou Yale University

RATE-OPTIMAL GRAPHON ESTIMATION. By Chao Gao, Yu Lu and Harrison H. Zhou Yale University Submitted to the Annals of Statistics arxiv: arxiv:0000.0000 RATE-OPTIMAL GRAPHON ESTIMATION By Chao Gao, Yu Lu and Harrison H. Zhou Yale University Network analysis is becoming one of the most active

More information

Multivariate Bayes Wavelet Shrinkage and Applications

Multivariate Bayes Wavelet Shrinkage and Applications Journal of Applied Statistics Vol. 32, No. 5, 529 542, July 2005 Multivariate Bayes Wavelet Shrinkage and Applications GABRIEL HUERTA Department of Mathematics and Statistics, University of New Mexico

More information

LARGER POSTERIOR MODE WAVELET THRESHOLDING AND APPLICATIONS

LARGER POSTERIOR MODE WAVELET THRESHOLDING AND APPLICATIONS LARGER POSTERIOR MODE WAVELET THRESHOLDING AND APPLICATIONS LUISA CUTILLO, Department of Mathematics and Applications, University of Naples Federico II, Naples, Italy Email: l.cutillo@dma.unina.it YOON

More information

A talk on Oracle inequalities and regularization. by Sara van de Geer

A talk on Oracle inequalities and regularization. by Sara van de Geer A talk on Oracle inequalities and regularization by Sara van de Geer Workshop Regularization in Statistics Banff International Regularization Station September 6-11, 2003 Aim: to compare l 1 and other

More information

OPTIMAL SURE PARAMETERS FOR SIGMOIDAL WAVELET SHRINKAGE

OPTIMAL SURE PARAMETERS FOR SIGMOIDAL WAVELET SHRINKAGE 17th European Signal Processing Conference (EUSIPCO 009) Glasgow, Scotland, August 4-8, 009 OPTIMAL SURE PARAMETERS FOR SIGMOIDAL WAVELET SHRINKAGE Abdourrahmane M. Atto 1, Dominique Pastor, Gregoire Mercier

More information

Stochastic Design Criteria in Linear Models

Stochastic Design Criteria in Linear Models AUSTRIAN JOURNAL OF STATISTICS Volume 34 (2005), Number 2, 211 223 Stochastic Design Criteria in Linear Models Alexander Zaigraev N. Copernicus University, Toruń, Poland Abstract: Within the framework

More information

Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes

Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes Alea 4, 117 129 (2008) Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes Anton Schick and Wolfgang Wefelmeyer Anton Schick, Department of Mathematical Sciences,

More information

저작권법에따른이용자의권리는위의내용에의하여영향을받지않습니다.

저작권법에따른이용자의권리는위의내용에의하여영향을받지않습니다. 저작자표시 - 비영리 - 변경금지 2.0 대한민국 이용자는아래의조건을따르는경우에한하여자유롭게 이저작물을복제, 배포, 전송, 전시, 공연및방송할수있습니다. 다음과같은조건을따라야합니다 : 저작자표시. 귀하는원저작자를표시하여야합니다. 비영리. 귀하는이저작물을영리목적으로이용할수없습니다. 변경금지. 귀하는이저작물을개작, 변형또는가공할수없습니다. 귀하는, 이저작물의재이용이나배포의경우,

More information

AALBORG UNIVERSITY. Compactly supported curvelet type systems. Kenneth N. Rasmussen and Morten Nielsen. R November 2010

AALBORG UNIVERSITY. Compactly supported curvelet type systems. Kenneth N. Rasmussen and Morten Nielsen. R November 2010 AALBORG UNIVERSITY Compactly supported curvelet type systems by Kenneth N Rasmussen and Morten Nielsen R-2010-16 November 2010 Department of Mathematical Sciences Aalborg University Fredrik Bajers Vej

More information

Wavelet Estimators in Nonparametric Regression: A Comparative Simulation Study

Wavelet Estimators in Nonparametric Regression: A Comparative Simulation Study Wavelet Estimators in Nonparametric Regression: A Comparative Simulation Study Anestis Antoniadis, Jeremie Bigot Laboratoire IMAG-LMC, University Joseph Fourier, BP 53, 384 Grenoble Cedex 9, France. and

More information

Strengthened Sobolev inequalities for a random subspace of functions

Strengthened Sobolev inequalities for a random subspace of functions Strengthened Sobolev inequalities for a random subspace of functions Rachel Ward University of Texas at Austin April 2013 2 Discrete Sobolev inequalities Proposition (Sobolev inequality for discrete images)

More information

Size properties of wavelet packets generated using finite filters

Size properties of wavelet packets generated using finite filters Rev. Mat. Iberoamericana, 18 (2002, 249 265 Size properties of wavelet packets generated using finite filters Morten Nielsen Abstract We show that asymptotic estimates for the growth in L p (R- norm of

More information

WAVELET EXPANSIONS OF DISTRIBUTIONS

WAVELET EXPANSIONS OF DISTRIBUTIONS WAVELET EXPANSIONS OF DISTRIBUTIONS JASSON VINDAS Abstract. These are lecture notes of a talk at the School of Mathematics of the National University of Costa Rica. The aim is to present a wavelet expansion

More information

Construction of Multivariate Compactly Supported Orthonormal Wavelets

Construction of Multivariate Compactly Supported Orthonormal Wavelets Construction of Multivariate Compactly Supported Orthonormal Wavelets Ming-Jun Lai Department of Mathematics The University of Georgia Athens, GA 30602 April 30, 2004 Dedicated to Professor Charles A.

More information

Local Polynomial Wavelet Regression with Missing at Random

Local Polynomial Wavelet Regression with Missing at Random Applied Mathematical Sciences, Vol. 6, 2012, no. 57, 2805-2819 Local Polynomial Wavelet Regression with Missing at Random Alsaidi M. Altaher School of Mathematical Sciences Universiti Sains Malaysia 11800

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Regularity of the density for the stochastic heat equation

Regularity of the density for the stochastic heat equation Regularity of the density for the stochastic heat equation Carl Mueller 1 Department of Mathematics University of Rochester Rochester, NY 15627 USA email: cmlr@math.rochester.edu David Nualart 2 Department

More information

ASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE. By Michael Nussbaum Weierstrass Institute, Berlin

ASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE. By Michael Nussbaum Weierstrass Institute, Berlin The Annals of Statistics 1996, Vol. 4, No. 6, 399 430 ASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE By Michael Nussbaum Weierstrass Institute, Berlin Signal recovery in Gaussian

More information

Bahadur representations for bootstrap quantiles 1

Bahadur representations for bootstrap quantiles 1 Bahadur representations for bootstrap quantiles 1 Yijun Zuo Department of Statistics and Probability, Michigan State University East Lansing, MI 48824, USA zuo@msu.edu 1 Research partially supported by

More information

MLISP: Machine Learning in Signal Processing Spring Lecture 10 May 11

MLISP: Machine Learning in Signal Processing Spring Lecture 10 May 11 MLISP: Machine Learning in Signal Processing Spring 2018 Lecture 10 May 11 Prof. Venia Morgenshtern Scribe: Mohamed Elshawi Illustrations: The elements of statistical learning, Hastie, Tibshirani, Friedman

More information