Block thresholding for density estimation: local and global adaptivity

Journal of Multivariate Analysis

Block thresholding for density estimation: local and global adaptivity

Eric Chicken^a,*, T. Tony Cai^b,1

^a Department of Statistics, Florida State University, Tallahassee, FL, USA
^b Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104, USA

Received 18 August 2003; available online 11 September 2004

Abstract

We consider the wavelet block thresholding method for density estimation. A block-thresholded density estimator is proposed and is shown to achieve the optimal global rate of convergence over Besov spaces and simultaneously attain the optimal adaptive pointwise convergence rate as well. These results are obtained in part through the determination of an optimal block length. © 2004 Elsevier Inc. All rights reserved.

AMS 1991 subject classification: primary 62G07; secondary 62G20

Keywords: Adaptive estimation; Besov space; Block thresholding; Density estimation; Hölder space

1. Introduction

In nonparametric function estimation the performance of an estimator is typically measured under one of two commonly used losses: squared error at each point and integrated squared error over the whole interval. The first is a measure of the accuracy of an estimator locally at a point and the second provides a global measure of precision. Minimax and adaptation theories have been well developed for both the local and global losses. See for example [7,12–14,21,23]. See also the references in Efromovich [16].

* Corresponding author. E-mail address: chicken@stat.fsu.edu (E. Chicken).
1 Research supported in part by NSF Grant DMS.

It has been noted in the literature that a local or global risk measure alone does not fully capture the performance of an estimator. For functions of spatial inhomogeneity, the local smoothness of the functions varies significantly from point to point, and a globally rate-optimal estimator can have erratic local behavior. On the other hand, an estimator which is locally rate optimal at each point can be far from optimal under the global loss; see [8]. More recent focus has been on a simultaneously local and global analysis of the performance of an estimator. The goal is to construct adaptive estimators which are near optimal simultaneously under both pointwise loss and global loss over a collection of function classes. Such an estimator permits the trade-off between variance and bias to be varied along the curve in an optimal way, resulting in spatially adaptive smoothing in the classical sense. This approach has been used, for example, in Cai [4,6] and Efromovich [17] in the context of nonparametric regression and in Cai [5] for inverse problems. Wavelet methodology has demonstrated considerable success in terms of spatial adaptivity and asymptotic optimality. In particular, block thresholding rules have been shown to possess impressive properties. The estimators make simultaneous decisions to retain or to discard all the coefficients within a block and increase estimation accuracy by utilizing information about neighboring coefficients. The idea of block thresholding can be traced back to Efromovich [15] in orthogonal series estimators. In the context of nonparametric regression, local block thresholding has been studied, for example, in Hall et al. [19], Cai [4,6], Cai and Silverman [9] and Efromovich [17]. Block thresholding rules for inverse problems were considered in Cai [5]. In particular, it is shown in Cai [4,6] that there are conflicting demands on the block size for achieving the global and local adaptivity.
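A block thresholding rule of the kind discussed above can be sketched in a few lines. The function name, arguments, and the simple mean-energy criterion below are illustrative assumptions, not any specific estimator from the literature:

```python
import numpy as np

def block_threshold(coeffs, block_len, thresh):
    """Keep or kill entire blocks of empirical wavelet coefficients.

    A block survives when the average squared coefficient within it
    exceeds `thresh`; otherwise every coefficient in the block is zeroed.
    Hypothetical helper for illustration only.
    """
    out = np.zeros_like(coeffs)
    for start in range(0, len(coeffs), block_len):
        blk = coeffs[start:start + block_len]
        if np.mean(blk ** 2) > thresh:
            out[start:start + len(blk)] = blk   # retain the whole block
    return out

# a large-energy block is retained, a small-energy block is discarded
kept = block_threshold(np.array([3.0, 3.0, 3.0, 0.1, 0.1, 0.1]), 3, 1.0)
```

Because the decision pools information across a whole block, a coefficient that is individually below the noise level can still be retained when its neighbors carry signal; this pooling is the source of the improved convergence rates discussed above.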
To achieve the optimal global adaptivity the block size needs to be large, and to achieve the optimal local adaptivity the block size must be small. An optimal choice of block size is given and the resulting estimator is shown to attain the adaptive minimax rate of convergence simultaneously under both the pointwise and global losses. In the present paper we consider block thresholding for density estimation. In this context, Kerkyacharian et al. [22] proposed a wavelet block thresholding estimator which uses an entire resolution level as a block. The thresholding rule is not local and so does not enjoy a high degree of spatial adaptivity. A local version of the block thresholding density estimator was introduced in Hall et al. [20]. The block size is chosen to be of the order (log n)², where n is the sample size. The estimator is shown to enjoy a number of advantages over the conventional term-by-term thresholding estimators. The global properties of the estimator were studied. The estimator adaptively attains the global optimal rate of convergence over a range of function classes of inhomogeneous smoothness under integrated squared error. However, as shown in this paper, the estimator does not achieve the optimal local adaptivity under pointwise squared error. The block size is too large to fully capture subtle spatial changes in the curvature of the underlying function. In the present paper, we propose a block thresholding density estimator and give a simultaneously local and global analysis for the estimator. Let X₁, X₂, ..., X_n be a random sample from a distribution with density function f. We wish to estimate the density f based on the sample. The estimation accuracy is measured both globally by the mean integrated squared error

R(f̂, f) = E‖f̂ − f‖², (1)

and locally by the expected squared error loss at each given point x₀,

R(f̂(x₀), f(x₀)) = E(f̂(x₀) − f(x₀))². (2)

Our block thresholding procedure first divides the empirical coefficients at each resolution level into nonoverlapping blocks and then simultaneously keeps or kills all the coefficients within a block, based on the magnitude of the sum of the squared empirical coefficients within that block. Motivated by the analysis of block thresholding rules for nonparametric regression in Cai [6], the block size is chosen to be log n. It is shown that the block thresholding estimator adaptively achieves not only the optimal global rate over Besov spaces, but simultaneously attains the adaptive local convergence rate as well. These results are obtained in part through the determination of the optimal block length. The paper is organized as follows. After Section 2, in which background information on wavelets and the function spaces of interest is given, we discuss block thresholding rules for density estimation in Section 3. The asymptotic properties of the block thresholding estimator are set forth in Section 4, along with a related theorem on a block thresholded convolution kernel estimator. Simulation results for the proposed estimator are found in Section 5, and proofs of the theorems are given in Section 6.

2. Wavelets and function spaces

An orthonormal wavelet basis is generated from dilation and translation of a father wavelet φ and a mother wavelet ψ. In this paper, the functions φ and ψ are assumed to be compactly supported and ∫φ = 1. We call a wavelet ψ r-regular if ψ has r vanishing moments and r continuous derivatives. Let φ_{ij}(t) = 2^{i/2}φ(2^i t − j), ψ_{ij}(t) = 2^{i/2}ψ(2^i t − j). The collection {φ_{mj}, ψ_{ij}; i ≥ m, j ∈ Z} is then an orthonormal basis of L₂(R); see [11,26]. Besov spaces arise naturally in many fields of analysis. They contain a number of traditional function spaces such as Hölder and Sobolev spaces as special cases.
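The wavelet-coefficient sequence norm that defines the spaces B^s_{p,q} in this section can be computed directly for finitely many coefficient levels. A minimal sketch, assuming finite p and q and finitely many levels; the function and variable names are hypothetical:

```python
import numpy as np

def besov_seq_norm(alpha, betas, s, p, q, m=0):
    """Sequence norm of B^s_{p,q} from wavelet coefficients (finite p, q).

    alpha: coarse coefficients at level m; betas[k]: detail coefficients
    at level i = m + k. The finite truncation is an assumption for
    illustration -- the true norm sums over all levels i >= m.
    """
    coarse = np.sum(np.abs(alpha) ** p) ** (1.0 / p)
    detail = 0.0
    for k, b in enumerate(betas):
        i = m + k
        # level weight 2^{i(s + 1/2 - 1/p)} times the lp norm of level i
        level = 2.0 ** (i * (s + 0.5 - 1.0 / p)) * np.sum(np.abs(b) ** p) ** (1.0 / p)
        detail += level ** q
    return coarse + detail ** (1.0 / q)
```

For p = q = 2 the detail part reduces to a weighted ℓ² combination of the per-level energies, which is the form used repeatedly in the proofs of Section 6.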
A Besov space B^s_{p,q} has three parameters: s measures the degree of smoothness, and p and q specify the type of norm used to measure the smoothness. Besov spaces can be defined by the sequence norm of wavelet coefficients. For a given function f, denote α_{mj} = ∫ f(t)φ_{mj}(t) dt and β_{ij} = ∫ f(t)ψ_{ij}(t) dt. Define the sequence norm of the wavelet coefficients of f by

‖f‖_{B^s_{p,q}} = ‖α_{m·}‖_{ℓ^p} + (Σ_{i=m}^∞ [2^{i(s+1/2−1/p)} (Σ_j |β_{ij}|^p)^{1/p}]^q)^{1/q}. (3)

The standard modification applies for the cases p, q = ∞. Let the wavelet ψ be r-regular. For s < r, the Besov space B^s_{p,q} is defined to be the Banach space consisting of functions with finite Besov norm ‖·‖_{B^s_{p,q}}. The Hölder space Λ^s is a special case of a Besov space B^s_{p,q} with p = q = ∞. See Triebel [28,29] and Meyer [26] for more on the properties of Besov spaces. We shall measure the global adaptivity of an estimator over two families of rich function classes which were used in Hall et al. [19]. The classes contain functions of inhomogeneous

smoothness and are different from other traditional smoothness classes. Functions in these classes can be regarded as the superposition of smooth functions with irregular perturbations such as jump discontinuities and high-frequency oscillations. Let

F^s_{p,q}(M, L) = {f ∈ B^s_{p,q} : supp(f) ⊆ [−L, L], ‖f‖_{B^s_{p,q}} ≤ M} (4)

be the collection of functions with support contained in [−L, L] and Besov norm bounded by M. Following the notation of Hall et al. [19], the first function space of interest is Ṽ^{s₁}(F^s_{2,q}(M, L)), which consists of functions which are the sum of a function in the space F^s_{2,q}(M, L), q ≥ 1, and an irregular function in F^{s₁}_{(s+1/2)^{−1},q}(M, L). A second space of interest, denoted by V_{d,τ}(F^s_{2,q}(M, L)), consists of functions which can be represented as the sum of a function in the space F^s_{2,q}(M, L), q ≥ 1, and a function in P_{d,τ,L}, which is the set of piecewise polynomials of degree d, with support in [−L, L], and with the number of discontinuities no more than τ. Local adaptivity of an estimator is measured over the local Hölder classes Λ^s(M, x₀, δ), defined as follows. For 0 < s ≤ 1,

Λ^s(M, x₀, δ) = {f : |f(x) − f(x₀)| ≤ M|x − x₀|^s, x ∈ (x₀ − δ, x₀ + δ)}

and for s > 1,

Λ^s(M, x₀, δ) = {f : |f^{(⌊s⌋)}(x) − f^{(⌊s⌋)}(x₀)| ≤ M|x − x₀|^t, x ∈ (x₀ − δ, x₀ + δ)},

where ⌊s⌋ is the largest integer strictly less than s and t = s − ⌊s⌋.

3. The estimators

3.1. Wavelet and convolution kernels

Let X₁, X₂, ..., X_n be a random sample of size n from a distribution with density function f. The objective is to estimate the density function f based on the sample. We shall use similar notation as in Hall et al. [19]. Let K(x, y) be a kernel function on R², and define K_i(x, y) = 2^i K(2^i x, 2^i y), i = 0, 1, 2, .... Additionally, K_i f will be the integral operator defined as K_i f(x) = ∫ K_i(x, y)f(y) dy. Note that

K̂_i(x) = n^{−1} Σ_{m=1}^n K_i(x, X_m)

is an unbiased estimate of K_i f(x) for all x. If using a convolution kernel, K(x, y) = K(x − y).
For wavelets, K(x, y) = Σ_j φ(x − j)φ(y − j), where φ is the father wavelet. We impose the following conditions on the kernel K. First, there exists a compactly supported Q ∈ L₂ such that

Q(x) = 0 when |x| > q₀ (5)

and

|K(x, y)| ≤ Q(x − y) for all x and y. (6)

Next, K satisfies the moment condition of order N:

∫ |x|^{N+1} Q(x) dx < ∞ and ∫ K(x, y)(y − x)^k dy = δ_{k0} for k = 0, 1, ..., N. (7)

Conditions (5)–(7) are met in the wavelet case if the mother wavelet ψ has N vanishing moments. As in Hall et al. [19], define the innovation kernel by D_i(x, y) = K_{i+1}(x, y) − K_i(x, y) for i = 0, 1, 2, .... Let D_i f be the integral operator K_{i+1}f − K_i f. Then, similarly to K̂_i, an unbiased estimator of D_i f(x) is

D̂_i(x) = n^{−1} Σ_{m=1}^n D_i(x, X_m).

For the wavelet kernel defined above, K and D_i can be associated with the projection operators of the multiresolution analysis. K(x, y) is the projection operator onto the space spanned by φ and its integer translates. D_i(x, y) is, then, the operator projecting onto the detail spaces of the multiresolution analysis. K and D_i perform similar tasks in the convolution case: namely, projection operators onto coarse and detail spaces. This innovation kernel will be used to define the density estimator.

3.2. Block thresholded estimators

The density f may be written as

f(x) = K₀f(x) + Σ_{i=0}^∞ D_i f(x). (8)

The linear part, K₀f(x), will be estimated by K̂₀(x). The remaining part will be estimated using thresholding methods, and hence is nonlinear in nature. Let φ and ψ be compactly supported father and mother wavelets satisfying conditions (5)–(7). We shall write φ_j for φ_{0j}. Then unbiased estimates of α_j = ⟨f, φ_j⟩ and β_{ij} = ⟨f, ψ_{ij}⟩ are

α̂_j = n^{−1} Σ_{m=1}^n φ_j(X_m) and β̂_{ij} = n^{−1} Σ_{m=1}^n ψ_{ij}(X_m).

Note that the linear part K₀f(x) can be written as K₀f(x) = ∫ K(x, y)f(y) dy = Σ_j α_j φ_j(x). The estimate of K₀f(x) is then K̂₀(x) = Σ_j α̂_j φ_j(x) = n^{−1} Σ_{m=1}^n

K(x, X_m). Similarly, note that D_i(x, y) = K_{i+1}(x, y) − K_i(x, y) = Σ_j ψ_{ij}(x)ψ_{ij}(y). Therefore, in a manner similar to that used above on K₀f(x), D_i f(x) = Σ_j β_{ij}ψ_{ij}(x). The estimate of the detail part D_i f(x) is, then, D̂_i(x) = Σ_j β̂_{ij}ψ_{ij}(x). We can then rewrite (8) as

f(x) = Σ_j α_j φ_j(x) + Σ_{i=0}^∞ Σ_j β_{ij}ψ_{ij}(x), (9)

and estimate (9) as

f̂(x) = Σ_j α̂_j φ_j(x) + Σ_{i=0}^R Σ_j β̂_{ij}ψ_{ij}(x) = K̂₀(x) + Σ_{i=0}^R D̂_i(x), (10)

where R is a finite truncation value for the infinite series. An adaptive density estimator will be constructed by applying a block thresholding rule as follows. In each resolution level i, the indices j are divided up into nonoverlapping blocks of length l = log n. Within each block k, the average estimated squared bias l^{−1} Σ_{j∈B_k(i)} β̂²_{ij} will be compared to the threshold. Here, B_k(i) refers to the set of indices j in block k. By estimating all of these squared coefficients together, the additional information allows a better comparison to the threshold, and hence a better convergence rate than the more conventional term-by-term thresholding estimators. If the average squared bias is larger than the threshold, all coefficients in the block will be kept. Otherwise, all coefficients will be discarded. Letting B_{ik} = l^{−1} Σ_{j∈B_k(i)} β²_{ij} and estimating this with B̂_{ik} = l^{−1} Σ_{j∈B_k(i)} β̂²_{ij}, the block thresholding wavelet estimator of f becomes

f̂(x) = Σ_j α̂_j φ_j(x) + Σ_{i=0}^R Σ_k Σ_{j∈B_k(i)} β̂_{ij}ψ_{ij}(x) I(B̂_{ik} > cn^{−1}). (11)

This may also be written as

f̂(x) = K̂₀(x) + Σ_{i=0}^R Σ_k D̂_{ik}(x) I(x ∈ J_{ik}) I(B̂_{ik} > cn^{−1}),

where D̂_{ik}(x) = Σ_{j∈B_k(i)} β̂_{ij}ψ_{ij}(x) is an estimate of D_{ik}f(x) = Σ_{j∈B_k(i)} β_{ij}ψ_{ij}(x), and

J_{ik} = {x : Σ_{j∈B_k(i)} ψ_{ij}(x) ≠ 0} = ∪_{j∈B_k(i)} {supp ψ_{ij}}.

Note that if the support of ψ is of length σ, then the length of J_{ik} is (σ + l − 1)/2^i ≈ l/2^i, and these intervals overlap each other at either end by 2^{−i}(σ − 1).
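A minimal numerical sketch of an estimator of the form (10)–(11), using Haar wavelets on [0, 1). The truncation level R, the default threshold constant, and all names are illustrative assumptions, not the constants prescribed by the theorems of Section 4:

```python
import numpy as np

def haar_block_density(x, sample, R=4, c=None):
    """Sketch of a block thresholded Haar wavelet density estimator on [0, 1).

    l = floor(log n) is the block length from the paper; the default c is a
    rough assumed constant, not the theorems' threshold.
    """
    n = len(sample)
    l = max(1, int(np.log(n)))
    if c is None:
        c = 2.0 * np.log(n)          # assumed threshold constant (illustrative)
    phi = lambda t: ((t >= 0) & (t < 1)).astype(float)   # Haar father wavelet
    psi = lambda t: phi(2 * t) - phi(2 * t - 1)          # Haar mother wavelet
    # linear part K_0 f: a single coarse coefficient for the Haar basis
    est = np.mean(phi(sample)) * phi(x)
    for i in range(R):
        js = np.arange(2 ** i)
        # empirical coefficients beta_hat_ij = n^-1 sum_m psi_ij(X_m)
        beta = np.array([2 ** (i / 2) * np.mean(psi(2 ** i * sample - j)) for j in js])
        for start in range(0, len(js), l):
            blk = slice(start, min(start + l, len(js)))
            # block rule: keep every coefficient in the block when the
            # average squared coefficient exceeds c/n, otherwise kill it
            if np.mean(beta[blk] ** 2) > c / n:
                for j in js[blk]:
                    est = est + beta[j] * 2 ** (i / 2) * psi(2 ** i * x - j)
    return est

rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, 500)
grid = np.linspace(0.0, 1.0, 64, endpoint=False)
fhat = haar_block_density(grid, data)   # should hover near 1 for uniform data
```

For a uniform sample the detail coefficients are pure noise, so most blocks fall below the threshold and the estimate stays close to the flat coarse projection, which is exactly the behavior the block rule is designed to produce.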

The equivalent, block-thresholded convolution kernel estimator is

f̂(x) = K̂₀(x) + Σ_{i=0}^R Σ_k D̂_i(x) I(x ∈ I_{ik}) I(Â_{ik} > cn^{−1}), (12)

where the I_{ik} are nonoverlapping intervals of length 2^{−i}l, and

A_{ik} = l^{−1} ∫_{I_{ik}} (D_i f(x))² dx is estimated by Â_{ik} = l^{−1} ∫_{I_{ik}} D̂_i(x)² dx.

Remark. As mentioned in the introduction, block size plays a crucial role in the performance of the resulting block thresholding estimator. It determines the degree of adaptivity. The block size of l = log n is chosen so that the estimator achieves both the optimal global and local adaptivity.

Remark. Hall et al. [19] choose block size l = C(log n)² and show that the block thresholding estimator is adaptively rate optimal under the global mean integrated squared error. However, as shown in the next section, this choice of block size is too large to achieve the optimal local adaptivity.

4. Local and global adaptivity

The global minimax rate of convergence of an estimate of a density in a Besov class F^s_{p,q}(M, L) to the true underlying density is O(n^{−2s/(2s+1)}). This minimax rate of convergence can be achieved adaptively, without knowing the smoothness parameters. For the wavelet kernel density estimator (11) with block length l = log n and appropriate choice of the series truncation parameter R, the optimal rate of convergence is achieved adaptively over the space Ṽ^{s₁}(F^s_{2,q}(M, L)) ∩ B_∞(A), where B_∞(A) is the space of all functions f with ‖f‖_∞ ≤ A.

Theorem 1. Let f̂ be the wavelet kernel density estimator (11) with the block length l = log n, R = log₂(Dnl^{−1}) where D is a constant given in (35), and

c = A(0.08)^{−1}(C₂‖Q‖₂ + ‖Q‖₁C₁^{−1/2})²,

where C₁ and C₂ are the universal constants from Talagrand [27]. Suppose that the wavelets φ and ψ are compactly supported and r-regular with r > max(s₁, N − 1) (i.e., conditions (6), (7) with order N ≥ 1, and (5) are met). Then there exists a positive constant C such that for all 1/2 < s < N, q ≥ 1, and s₁ − s > s/(2s + 1),

sup E‖f̂ − f‖² ≤ Cn^{−2s/(2s+1)},

the supremum being over
f ∈ Ṽ^{s₁}(F^s_{2,q}(M, L)) ∩ B_∞(A). In this theorem (and the rest in this section) the function space parameters M, L and A are arbitrary finite constants. The bound constant C is dependent on them as well as on the choice of the kernel function K (and hence on Q).

The convolution kernel estimator (12) also achieves the global, optimal minimax convergence rate with this smaller block length, although over a different space of irregular Besov functions.

Theorem 2. Let f̂ be the convolution kernel density estimator (12). Let l, R and c be as in Theorem 1. Let τ_n be a sequence of positive numbers such that for all ζ > 0, τ_n = O(n^{ζ+1/(2N+1)}). If K satisfies (6), (7) with order N ≥ 1, and (5), and 1/2 < s < N, then there exists a positive constant C such that

sup_{d<N, τ≤τ_n} sup_{f∈V_{d,τ}(F^s_{2,q}(M,L)) ∩ B_∞(A)} E‖f̂ − f‖² ≤ Cn^{−2s/(2s+1)}.

Here, it can be seen that the number of discontinuities that can be handled by the estimator is on the order of the sample size n to a power. One of the differences between Theorems 1 and 2 and those set forth in Hall et al. [19] is in regard to the block length l. In this paper, l = log n is used instead of their value (log n)². This choice of block length is crucial in estimators of the form given in the above two theorems. In fact, larger block lengths are unsatisfactory in that they preclude local convergence optimality. When attention is focused on adaptive estimation there are some striking differences between local and global theories. Under integrated squared error loss there are many situations where rate adaptive estimators can be constructed. When attention is focused on estimating a function at a given point, rate optimal adaptive procedures typically do not exist. A penalty, usually a logarithmic factor, must be paid for not knowing the smoothness. Important work in this area began with Lepski [23], where attention focused on a collection of Lipschitz classes. See also Brown and Low [1], Efromovich and Low [18] and Lepski and Spokoiny [25]. Connections between local and global parameter space adaptation can be found in Lepski et al. [24], Cai [3] and Efromovich [17]. We shall use the local Hölder class Λ^s(M, x₀, δ) defined in Section 2 to measure local adaptivity.
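The logarithmic penalty for pointwise adaptation mentioned above can be made concrete with a short calculation (an illustration only; the function names are hypothetical):

```python
import numpy as np

# The pointwise minimax rate n^{-2s/(2s+1)} versus the optimal adaptive
# pointwise rate (log n / n)^{2s/(2s+1)}: adapting to unknown smoothness s
# costs exactly a factor (log n)^{2s/(2s+1)}.
def minimax_rate(n, s):
    return n ** (-2 * s / (2 * s + 1))

def adaptive_rate(n, s):
    return (np.log(n) / n) ** (2 * s / (2 * s + 1))

n, s = 10_000, 1.0
penalty = adaptive_rate(n, s) / minimax_rate(n, s)
assert np.isclose(penalty, np.log(n) ** (2 * s / (2 * s + 1)))
```

The penalty grows without bound in n, but only logarithmically, which is why the adaptive rate is still considered optimal for pointwise estimation with unknown s.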
The minimax rate of convergence for estimating f(x₀) over Λ^s(M, x₀, δ) is n^{−2s/(2s+1)}. Lepski [23] and Brown and Low [2] showed that in adaptive pointwise estimation, where the smoothness parameter s is unknown, the optimal adaptive rate of convergence over Λ^s(M, x₀, δ) is (n^{−1} log n)^{2s/(2s+1)}. By using a block length of l = log n in the wavelet kernel estimator, this optimal adaptive rate of convergence is achieved simultaneously over a range of local Hölder classes.

Theorem 3. Let f̂ be the wavelet kernel density estimator (11). Let R, l and c be as in Theorem 1, and suppose φ and ψ are bounded. If 1/2 < s < N, and ψ has N ≥ 1 vanishing moments, then there exists a positive constant C such that for any x₀ in the support of f,

sup_{f∈Λ^s(M,x₀,δ) ∩ B_∞(A)} E(f̂(x₀) − f(x₀))² ≤ C(log n/n)^{2s/(2s+1)}.

By combining Theorems 1 and 3 we see that the block thresholding density estimator (11) adaptively achieves not only the optimal global rate of convergence over a wide range

of perturbed Besov spaces, but simultaneously attains the adaptive local convergence rate as well. The optimal global and local adaptivity cannot be attained if a larger block size is used. With a block length of order larger than log n (for example, l = (log n)²), the global rate may still be attained, but the local rate will not:

Theorem 4. Let f̂ be the wavelet kernel density estimator (11). Let R, l and c be as in Theorem 1, and suppose φ and ψ are as in Theorem 3. If 1/2 < s < N and l = (log n)^{1+r} for some r > 0, then for some x₀ in the support of f and some constant C,

sup_{f∈Λ^s(M,x₀,δ)} E(f̂(x₀) − f(x₀))² ≥ C(log n/n)^{2s/(2s+1)} (log n)^{2rs/(2s+1)}.

5. Simulation results

In this section we compare the block thresholded wavelet estimator from this paper with various other estimators via a simulation study. We will refer to the estimator at (11) simply as DenBlock. The threshold c supplied by the theorems for DenBlock is useful for theoretical purposes, but it is not practical for implementation. Since the thresholding in the estimator is essentially a bias-variance comparison, we keep the estimated wavelet coefficients when the average squared bias of the coefficients in a block exceeds the variance of those coefficients. Therefore, c is replaced with an estimate of the variance of the coefficients in a block. This variance is approximated by forming a pilot estimate of the density f and evaluating it at the center of the block. Two of the competing estimators examined come from Cai and Silverman [9]. These estimators, NeighCoeff and NeighBlock, are wavelet estimators where the comparison against a threshold is not based on a single coefficient (as is done in VisuShrink, for example) or on a single block of coefficients (as is done with DenBlock). Rather, neighboring coefficients or blocks are considered when making the threshold comparison for a particular coefficient or block.
These estimators were devised for nonparametric regression settings, but they are easily modified to the density estimation problem at hand. For NeighBlock, the block length is l = log n/2. However, the variance and squared bias used for thresholding are computed not just from the current block, but include information from its neighbors to the immediate left and right (when possible). The total block size used for making the thresholding decision is log n when the neighboring blocks are added in. The variance of the coefficients in the extended block is replaced by the pilot estimate of f evaluated at the center of the block, as before. NeighCoeff is NeighBlock with block length l = 1. The extended block is of length 3. Again, the appropriate substitution is made in the threshold comparison as before. For more information on these estimators and the thresholds used, see Cai and Silverman [9] and Chicken [10].
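The variance-based data-driven threshold described in this section can be sketched as follows. Since Var(β̂_{ij}) is approximately f(x)/n near the support of ψ_{ij}, a kernel pilot estimate of f at the block center stands in for the theoretical constant; the bandwidth, the Gaussian kernel, and all names are assumptions for illustration:

```python
import numpy as np

def data_driven_threshold(sample, block_center, bandwidth=0.1):
    """Pilot-based stand-in for the theoretical threshold c/n (Section 5).

    A Gaussian-kernel pilot estimate of the density f at the block's
    center approximates the variance f(x)/n of the empirical wavelet
    coefficients in that block. Illustrative sketch only.
    """
    n = len(sample)
    u = (block_center - sample) / bandwidth
    pilot = np.mean(np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)) / bandwidth
    return pilot / n
```

For a uniform sample on [0, 1] the pilot estimate at an interior point is close to 1, so the threshold is roughly 1/n, matching the c/n scale of the theoretical rule.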

Fig. 1. Test densities. Solid line is saw, dashed line is mixnorm, and dotted line is the double exponential.

Additionally, other estimators were also looked at. Biased cross-validation and unbiased cross-validation kernel estimators with normal and triangle kernels were all implemented. These estimators were compared against one another in terms of mean squared error on the three test densities given in Fig. 1. Saw is a combination of sums of uniform random variables, mixnorm is a mixture of three normal densities, and the last is a double exponential random variable. Formulas for these densities are in Chicken [10]. Results of simulations for some of these estimators on various sample sizes are given in Tables 1 and 2. In each table, the MSE of the estimate is given from a repetition of size 60. Only one of the 4 kernel methods is reported here, the unbiased cross-validation normal (UCVN) kernel estimator. The other kernel estimators mentioned above generally performed worse than this one, and the results are not included here. See [10] for additional simulation results. Table 1 shows MSEs for sample sizes n = 20, 50, 100, 500, 1000 and 2000. For saw, the wavelet estimators have lower MSEs than UCVN with the exception of sample size 100. Once the sample size hits 100, all three of the wavelet estimators have the same MSE. Examination of the thresholded coefficients reveals that for large sample sizes, all the detail coefficients calculated are 0. Since the coarse coefficients are the same for each wavelet estimate, the wavelet estimates are identical as well. At the lower sample sizes, NeighCoeff seems preferable, then DenBlock. This agrees well with the simulation results from Cai and Silverman [9]. For the mixnorm density, the UCVN is generally the best. This is perhaps not surprising given that the kernel for UCVN is from the same family as the density being estimated.
Again, NeighCoeff is the best of the wavelet methods, while there is no clear distinction between NeighBlock and DenBlock. On the final density, the double exponential, the UCVN is the worst of the estimates. The three wavelet estimators are approximately equivalent, with the exception of the lowest sample size. Since the wavelet estimators are approximately equivalent in terms of MSE at large sample sizes (n ≥ 50), it is instructive to examine how the estimators work with respect to small sample sizes. The results are given in Table 2. Here, the sample sizes are n = 10, 15, 20, 25, 30 and 40.

Table 1. MSE for the saw, mixnorm and double exponential densities, for the estimators DenBlock, NeighBlock, NeighCoeff and UCVN.

For saw, NeighCoeff is clearly the best estimator. It is only surpassed by UCVN at the very low size n = 10. NeighBlock has lower MSE than DenBlock for the lower sample sizes, while DenBlock takes the lead for the larger sample sizes. As with the sample sizes in Table 1, the UCVN is generally best at approximating mixnorm. NeighCoeff is the next best, while DenBlock and NeighBlock follow the same relation as they did for saw. On the double exponential, NeighCoeff is clearly best over the sample sizes in Table 2, followed by NeighBlock, DenBlock, and lastly, UCVN. The theorems in this paper show that DenBlock attains optimal convergence rates asymptotically. For the sample sizes considered here, however, NeighCoeff seems superior in terms of MSE. In particular, NeighCoeff is better than DenBlock at low sample sizes. The distinction between the wavelet estimators becomes blurred as the sample size increases. This suggests that NeighCoeff should be used for sample sizes under 50, and any of the three estimators are acceptable for larger n. Some example reconstructions are given in Figs. 2 and 3. Fig. 2 shows a comparison of DenBlock and UCVN with a sample size of 100 on the saw density. DenBlock does well at attaining the peaks and valleys of the density. UCVN clearly shows a density with four modes, but does not capture the same highs and lows that DenBlock does. Fig. 3 is a typical reconstruction of the mixnorm density. Here, DenBlock does a good job at estimating the peak on the left, but is too irregular over the central smooth portion. UCVN underestimates the peak, but outperforms DenBlock on the smoother portion of the density.

Table 2. MSE for the saw, mixnorm and double exponential densities at small sample sizes; columns as in Table 1.

Fig. 2. Typical reconstruction of saw, n = 100. Solid line is DenBlock estimate, dashed line is UCVN estimate, and dotted line is the actual density.

6. Proofs of theorems

In this section, proofs are given for Theorems 1, 3 and 4. Theorem 2's proof is omitted due to its similarity to the proof of Theorem 1. Before beginning, several preliminary results are necessary.

Fig. 3. Typical reconstruction of mixnorm, n = 100. Solid line is DenBlock estimate, dashed line is UCVN estimate and dotted line is the actual density.

6.1. Preliminaries

First, a simple lemma based on Minkowski's inequality:

Lemma 1. Let Y₁, Y₂, ..., Y_n be random variables. Then

E(Σ_{i=1}^n Y_i)² ≤ [Σ_{i=1}^n (EY_i²)^{1/2}]².

Second, a theorem from Talagrand [27] as stated in Hall et al. [19].

Theorem 5. Let U₁, U₂, ..., U_n be independent and identically distributed random variables. Let ε₁, ε₂, ..., ε_n be independent Rademacher random variables that are also independent of the U_i. Let G be a class of functions uniformly bounded by M. If there exist v, H > 0 such that for all n,

sup_{g∈G} var g(U) ≤ v and E sup_{g∈G} |Σ_{m=1}^n ε_m g(U_m)| ≤ nH,

then there exist universal constants C₁ and C₂ such that for all λ > 0,

P[sup_{g∈G} |n^{−1} Σ_{m=1}^n g(U_m) − Eg(U)| ≥ λ + C₂H] ≤ exp{−nC₁((λ²v^{−1}) ∧ (λM^{−1}))}.

Finally, a lemma from Hall et al. [3].

Lemma 2. If K(x, y) is a kernel satisfying condition (6), Q ∈ L₂, and J is a compact interval, then

E ∫_J (K̂₀(x) − K₀f(x))² dx ≤ ‖f‖_∞‖Q‖₂²|J|/n

and

E ∫_J (D̂_i(x) − D_i f(x))² dx ≤ 4‖f‖_∞‖Q‖₂² 2^i |J|/n,

where |J| is the length of the interval J.

6.2. Proof of Theorem 1

We will prove this theorem for q = ∞. For general q ≥ 1, the results will hold since B^s_{p,q} ⊆ B^s_{p,∞}. Let i_s be the integer such that 2^{i_s} ≤ n^{1/(2s+1)} < 2^{i_s+1}. Minkowski's inequality implies that

E‖f̂ − f‖² ≤ 4E‖K̂₀ − K₀f‖² + 4E‖Σ_{i=0}^{i_s} [D̂_i I(J_{ik})I(B̂_{ik} > cn^{−1}) − D_i f]‖² + 4E‖Σ_{i=i_s+1}^R [D̂_i I(J_{ik})I(B̂_{ik} > cn^{−1}) − D_i f]‖² + 4‖Σ_{i=R+1}^∞ D_i f‖² = T₁ + T₂ + T₃ + T₄.

T₁ is bounded by Lemma 2:

T₁ ≤ Cn^{−1}. (13)

Each of the remaining pieces T_i will be treated individually in their own sections.

6.2.1. Bound on T₂

To bound the nonlinear part T₂, note that Lemma 1 and Minkowski's inequality give

T₂ ≤ C[Σ_{i=0}^{i_s} E^{1/2} ∫ (D̂_i(x)I(B̂_{ik} > cn^{−1}) − D_i f(x))² dx]².

For a fixed i ≤ i_s, the orthogonality of wavelets gives

E ∫ (D̂_i(x)I(B̂_{ik} > cn^{−1}) − D_i f(x))² dx ≤ E ∫ (D̂_i(x) − D_i f(x))² dx + E Σ_k ∫_{J_{ik}} (D_i f(x))² dx I(B_{ik} ≤ 2cn^{−1}) + E Σ_k ∫_{J_{ik}} (D_i f(x))² dx I(B̂_{ik} ≤ cn^{−1})I(B_{ik} > 2cn^{−1}) = T₂₁ + T₂₂ + T₂₃.

As in Hall et al. [19], T₂₁ is bounded by Lemma 2. T₂₂ is bounded by the size of the indicator function and the fact that the number of intervals overlapping the support of f is no more than a constant times 2^i/l:

T₂₁, T₂₂ ≤ C2^i/n.

To bound T₂₃, the following lemma from Hall et al. [19] is useful.

Lemma 3. If ∫_{J_{ik}} (D_i f(x))² dx ≤ 2lc/n, then

{∫_{J_{ik}} D̂_i(x)² dx ≥ 2lc/n} ⊆ {∫_{J_{ik}} (D̂_i(x) − D_i f(x))² dx ≥ 0.08lc/n},

and if ∫_{J_{ik}} (D_i f(x))² dx > 2lc/n, then

{∫_{J_{ik}} D̂_i(x)² dx ≤ 2lc/n} ⊆ {∫_{J_{ik}} (D̂_i(x) − D_i f(x))² dx ≥ 0.16lc/n}.

Using this lemma,

T₂₃ ≤ E Σ_k ∫_{J_{ik}} (D_i f(x))² dx I(∫_{J_{ik}} (D̂_i(x) − D_i f(x))² dx ≥ 0.16lc/n).

Using Young's inequality with (6), and the fact that the length of the interval J_{ik} is a constant times l/2^i,

∫_{J_{ik}} (D_i f(x))² dx ≤ |J_{ik}| ‖D_i f‖²_∞ ≤ C‖f‖²_∞‖Q‖₁² l/2^i.

So,

T₂₃ ≤ Cl/2^i Σ_k P[(∫_{J_{ik}} (D̂_i(x) − D_i f(x))² dx)^{1/2} ≥ (0.16lc/n)^{1/2}]. (14)

To bound the above probability, Talagrand's theorem (Theorem 5) will be used. Similar to Hall et al. [19],

{∫_{J_{ik}} (D̂_i(x) − D_i f(x))² dx}^{1/2} = sup_{g∈G} |n^{−1} Σ_{m=1}^n ∫_{J_{ik}} g(x)D_i(x, X_m) dx − E ∫_{J_{ik}} g(x)D_i(x, X₁) dx|,

where

G = {∫_{J_{ik}} g(x)D_i(x, ·)I(j ∈ B_k(i)) dx : ‖g‖₂ ≤ 1}.

Talagrand's theorem will be used with

M = 2^{i/2}‖Q‖₂, v = ‖f‖_∞‖Q‖₁², H = (‖Q‖₁² l‖f‖_∞/n)^{1/2},

and

λ = (0.16lc/n)^{1/2} − C₂(‖Q‖₁² l‖f‖_∞/n)^{1/2} > 0.

The probability at (14) is then bounded by

P[(∫_{J_{ik}} (D̂_i(x) − D_i f(x))² dx)^{1/2} ≥ λ + C₂(‖Q‖₁² l‖f‖_∞/n)^{1/2}] ≤ exp{−nC₁((λ²/(‖f‖_∞‖Q‖₁²)) ∧ (λ/(2^{i/2}‖Q‖₂)))}.

For 0 ≤ i ≤ i_s and c and λ positive, the bound l ≤ n^{2s/(2s+1)} implies

λ²/(‖f‖_∞‖Q‖₁²) < λ/(2^{i/2}‖Q‖₂). (15)

Thus, for large enough n,

P[(∫_{J_{ik}} (D̂_i(x) − D_i f(x))² dx)^{1/2} ≥ (0.16lc/n)^{1/2}] ≤ Cn^{−δ}, (16)

where δ is the constant

δ = C₁((0.16c)^{1/2} − C₂‖Q‖₁A^{1/2})²/(A‖Q‖₁²)

and c is large enough to make δ > 0. Putting (14) and (16) together with the fact that the number of intervals J_{ik} that intersect the support of f is no more than C2^i/l,

T₂₃ ≤ Cn^{−δ}. (17)

All pieces are now available to bound T₂:

T₂ ≤ C[Σ_{i=0}^{i_s} (T₂₁ + T₂₂ + T₂₃)^{1/2}]² ≤ C[Σ_{i=0}^{i_s} (2^i/n + n^{−δ})^{1/2}]² ≤ C(2^{i_s}n^{−1} + i_s² n^{−δ}) = C(n^{−2s/(2s+1)} + (log₂ n^{1/(2s+1)})² n^{−δ}). (18)

6.2.2. Bound on T₃

As with T₂ before, write

T₃ ≤ C[Σ_{i=i_s+1}^R E^{1/2} ∫ (D̂_i(x)I(B̂_{ik} > cn^{−1}) − D_i f(x))² dx]².

For a fixed i, i_s + 1 ≤ i ≤ R,

E ∫ [D̂_i(x)I(x ∈ J_{ik})I(B̂_{ik} > cn^{−1}) − D_i f(x)]² dx ≤ E Σ_k ∫_{J_{ik}} (D̂_i(x) − D_i f(x))² dx I(B̂_{ik} > cn^{−1})I(B_{ik} > c/(2n)) + E Σ_k ∫_{J_{ik}} (D̂_i(x) − D_i f(x))² dx I(B̂_{ik} > cn^{−1})I(B_{ik} ≤ c/(2n)) + E Σ_k ∫_{J_{ik}} (D_i f(x))² dx I(B̂_{ik} ≤ cn^{−1})I(B_{ik} ≤ 2cn^{−1}) + E Σ_k ∫_{J_{ik}} (D_i f(x))² dx I(B̂_{ik} ≤ cn^{−1})I(B_{ik} > 2cn^{−1}) = T₃₁ + T₃₂ + T₃₃ + T₃₄.

By Lemma 3,

T₃₂ = E Σ_k ∫_{J_{ik}} (D̂_i(x) − D_i f(x))² dx I(B̂_{ik} > cn^{−1})I(B_{ik} ≤ c/(2n)) ≤ E Σ_k [∫_{J_{ik}} (D̂_i(x) − D_i f(x))² dx I(∫_{J_{ik}} (D̂_i(x) − D_i f(x))² dx ≥ 0.08lc/n)].

To bound this, we use the fact that for any nonnegative random variable Y,

E[Y I(Y > a)] = aP(Y > a) + ∫_a^∞ P(Y > y) dy.

The integrals in T₃₂ are of this form with

Y = ∫_{J_{ik}} (D̂_i(x) − D_i f(x))² dx and a = 0.08lc/n.

Using Talagrand's theorem as was done for the piece T₂₃,

P(Y > y) ≤ exp[−nC₁(((y^{1/2} − C₂H)²/(‖f‖_∞‖Q‖₁²)) ∧ ((y^{1/2} − C₂H)/(2^{i/2}‖Q‖₂)))],

and therefore

E[Y I(Y > a)] ≤ a exp[−nC₁(((a^{1/2} − C₂H)²/(‖f‖_∞‖Q‖₁²)) ∧ ((a^{1/2} − C₂H)/(2^{i/2}‖Q‖₂)))] + ∫_a^∞ P(Y > y) dy = T₃₂₁ + T₃₂₂.

For i_s + 1 ≤ i ≤ R and a^{1/2} − C₂H positive, the restriction on R below at (26) gives

(a^{1/2} − C₂H)²‖f‖_∞^{−1}‖Q‖₁^{−2} ≤ (a^{1/2} − C₂H)2^{−i/2}‖Q‖₂^{−1}. (19)

Note that a^{1/2} − C₂H > 0 implies that λ at (15) is positive as well. Therefore,

T₃₂₁ ≤ C(l/n) exp[−C₁((0.08c)^{1/2} − C₂‖Q‖₁A^{1/2})² l/(A‖Q‖₁²)] ≤ Cn^{−γ−1} log n, (20)

where γ is the constant

γ = C₁((0.08c)^{1/2} − C₂‖Q‖₁A^{1/2})²/(A‖Q‖₁²) − 1 (21)

and c is large enough to make γ positive. For T₃₂₂, let a₀ = (‖f‖_∞‖Q‖₁² 2^{−i/2}‖Q‖₂^{−1} + C₂H)² > 0. Then, if a ≤ a₀,

T₃₂₂ = ∫_a^{a₀} exp[−nC₁(y^{1/2} − C₂H)²/(‖f‖_∞‖Q‖₁²)] dy + ∫_{a₀}^∞ exp[−nC₁(y^{1/2} − C₂H)/(2^{i/2}‖Q‖₂)] dy = T₃₂₂₁ + T₃₂₂₂.

To bound T_{3221}, note that by a change of variables and an increase in the upper limit of integration,

T_{3221} ≤ ‖f‖_∞‖Q‖_1² (nC_1)^{−1} exp[ −nC_1 (a − CH)²/(‖f‖_∞‖Q‖_1²) ] + 2CH ∫_{a−CH}^∞ y(1/y) exp[ −nC_1 y²/(‖f‖_∞‖Q‖_1²) ] dy.  (22)

The first term on the right of (22) is bounded by Cn^{−1−γ}, where γ is the constant in (21). To bound the second term, use integration by parts:

2CH ∫_{a−CH}^∞ y(1/y) exp[ −nC_1 y²/(‖f‖_∞‖Q‖_1²) ] dy
  = CH ‖f‖_∞‖Q‖_1² [ nC_1 (a − CH) ]^{−1} exp[ −nC_1 (a − CH)²/(‖f‖_∞‖Q‖_1²) ]
  − CH ‖f‖_∞‖Q‖_1² (nC_1)^{−1} ∫_{a−CH}^∞ y^{−2} exp[ −nC_1 y²/(‖f‖_∞‖Q‖_1²) ] dy.

Since the integrand in the second term on the right side above is strictly positive, this integral is also bounded by Cn^{−γ−1}.

Using integration by parts on T_{3222},

T_{3222} ≤ C ( n^{−2} + n^{−1} 2^{i/2} (log n/n)^{1/2} + 2^i n^{−2} ) e^{−nd/2^i},

where d is the constant

d = C_1 ‖f‖_∞ ‖Q‖_1² / (2‖Q‖).  (23)

If a > a_0, then T_{322} ≤ T_{3222}. Therefore,

T_{32} = T_{321} + T_{3221} + T_{3222}
  ≤ ( 2^i/(2 log n) ) ( n^{−γ−1} log n ) + ( 2^i/(2 log n) ) ( n^{−2} + n^{−1} 2^{i/2} (log n/n)^{1/2} + 2^i n^{−2} ) e^{−nd/2^i}.  (24)

To bound T_{34}, observe that the only difference between T_{32} and T_{34} is the range of the index i. Therefore, by repeating the argument for T_{23}, the bound for T_{34} is the same as at (17):

T_{34} ≤ Cn^{−δ}.  (25)

This bound requires that

2^R ≤ (2L)² ‖Q‖² · 0.16c · ( 4C² ‖Q‖_1² (2L)^{−1} ‖Q‖² )^{−1} n (log n)^{−1} = D_c² n (log n)^{−1},  (26)

which implies the condition at (19).

The bound on T_3 is found in a similar manner to (18):

T_3 ≤ C Σ_{i=i_s+1}^{R} ( T_{31} + T_{32} + T_{33} + T_{34} )^{1/2}
  ≤ C Σ_{i=i_s+1}^{R} ( T_{31} + T_{33} )^{1/2} + Σ_{i=i_s+1}^{R} T_{32}^{1/2} + Σ_{i=i_s+1}^{R} T_{34}^{1/2}.

Observe that for i ≤ R,

2^i e^{−nd/2^i} ≤ 2^R e^{−nd/2^R} ≤ D_c² n (log n)^{−1} e^{−d log n/D_c²} = D_c² n (log n)^{−1} n^{−d/D_c²}.

Therefore, Σ 2^i e^{−nd/2^i} is less than or equal to some constant if d ≥ D_c². This condition is met if

c ≤ 0.32 (2L)^{−1} ( C‖Q‖_1 + ‖Q‖_1 C_1^{−1/2} )^{−2}.  (27)

Therefore, using the bound for T_{32} given at (24),

Σ_{i=i_s+1}^{R} T_{32}^{1/2} ≤ C ( n^{−γ/2} + n^{−1/2} log n ).  (28)

Using the bound for T_{34} found at (25),

Σ_{i=i_s+1}^{R} T_{34}^{1/2} ≤ C ( log 2^R ) n^{−δ/2}.  (29)

In Hall et al. [19] it is shown that

Σ_{i=i_s+1}^{R} ( T_{31} + T_{33} )^{1/2} ≤ Cn^{−s/(2s+1)}.  (30)

Therefore, using (28)–(30),

T_3 ≤ C [ n^{−s/(2s+1)} + n^{−γ/2} + ( log 2^R ) n^{−δ/2} ].  (31)

6.2.3. Bound on T_4

The final piece T_4 is easily bounded:

T_4 = C Σ_{i=R+1}^{∞} ‖D_i f‖² = C Σ_{i=R+1}^{∞} Σ_j β_ij².  (32)

Since f = f_1 + f_2, where f_1 ∈ B^s_{2,∞} and f_2 ∈ B^{s_1}_{(s+1/2)^{−1},∞},

β_ij = ∫ f(x) ψ_ij(x) dx = ∫ ( f_1(x) + f_2(x) ) ψ_ij(x) dx = β_{1ij} + β_{2ij},

and (32) becomes

T_4 ≤ C Σ_{i=R+1}^{∞} Σ_j ( β_{1ij}² + β_{2ij}² ).

From the bounds on wavelet coefficients given at (3) and (4),

Σ_j β_{1ij}² ≤ C 2^{−2is}  and  Σ_j β_{2ij}² ≤ C 2^{−2i(s_1 − 2s)}.

Therefore, using 2^R = D_c² n/l,

T_4 ≤ Cn^{−2s/(2s+1)},  (33)

provided that s_1 − 2s > s/(2s+1).

6.2.4. Determination of constants γ, δ, D, and c

Using the bounds from (13), (18), (31), and (33),

E‖f̂ − f‖² ≤ C [ n^{−2s/(2s+1)} + ( log 2^R ) n^{−δ} + n^{−γ} ].

For γ, n^{−γ} ≤ n^{−2s/(2s+1)} for all 1/2 < s < N if and only if γ ≥ 2N/(2N+1). The above constraint is met for all f in the space of interest if the value of the threshold c is set accordingly:

c ≥ A (0.08)^{−1} ( ( 2N/(C_1(2N+1)) )^{1/2} + C‖Q‖_1 )².  (34)

Note that the conditions at (19) and (15), that a − CH and λ be positive, are met if (34) holds. Since c must satisfy both (34) and (27), and (2L)^{−1} ≤ ‖f‖_∞ ≤ A, the value may be set as

c = A (0.08)^{−1} ( C_1^{−1/2} + C‖Q‖_1 )².

Let

D = ‖Q‖² ( 4‖Q‖_1² (2L) )^{−1} { C ( 4‖Q‖² A (2L)^{−1} )^{1/2} + (2A)^{1/2} ‖Q‖_1 C_1^{−1/2} }^{−2}.  (35)

The value for the constant D_c can then be taken to be D_c = D^{−1}. For δ, note that δ − γ is a positive constant, so

( log 2^R ) n^{−δ} = ( log 2^R ) n^{−γ} n^{−(δ−γ)} ≤ Cn^{−2s/(2s+1)}.

Therefore, using the bound for c at (27),

E‖f̂ − f‖² ≤ Cn^{−2s/(2s+1)},

and Theorem 1 is proved.

Proof of Theorem 3

To simplify the proof, assume that f is in Λ^s(M) rather than in the local Hölder classes Λ^s(M, x_0, δ) for points x_0 in the support of f. Write f̂(x_0) − f(x_0) as

f̂(x_0) − f(x_0) = Σ_j ( α̂_j − α_j ) φ_j(x_0)
  + Σ_{i≤i_s} Σ_{j∈B(·)} ( β̂_ij ψ_ij(x_0) I(B̂_i > cn^{−1}) − β_ij ψ_ij(x_0) )
  + Σ_{i=i_s+1}^{R} Σ_{j∈B(·)} ( β̂_ij ψ_ij(x_0) I(B̂_i > cn^{−1}) − β_ij ψ_ij(x_0) )
  − Σ_{i=R+1}^{∞} Σ_j β_ij ψ_ij(x_0)
  = L_1 + L_2 + L_3 + L_4,

where i_s is as before. Then

E( f̂(x_0) − f(x_0) )² ≤ C ( EL_1² + EL_2² + EL_3² + EL_4² ).

In each of these sums, the total number of indices j where the support of ψ_ij or φ_j intersects the point x_0 is no more than 2q_0 + 1, where q_0 is as in (5). This fact will be used several times in the following proof.

Bound on L_1

Recalling that ‖φ‖_2 = 1 and that φ is bounded,

EL_1² ≤ CE Σ_j [ ( α̂_j − α_j ) φ_j(x_0) ]² ≤ C Σ_j E ( α̂_j − α_j )² = CE ∫ ( Σ_j ( α̂_j − α_j ) φ_j(x) )² dx.

Using the orthogonality of the φ_j,

EL_1² ≤ CE ∫ ( Σ_j α̂_j φ_j(x) − Σ_j α_j φ_j(x) )² dx = CE ∫ { K̂_0(x) − K_0 f(x) }² dx.

By applying Lemma 2,

EL_1² ≤ Cn^{−1}.  (36)

Bound on L_2

To bound L_2, break it into the following sums:

EL_2² = E ( Σ_{i≤i_s} Σ_{j∈B(·)} ( β̂_ij − β_ij ) ψ_ij(x_0) I(B̂_i > cn^{−1}) − Σ_{i≤i_s} Σ_j β_ij ψ_ij(x_0) I(B̂_i ≤ cn^{−1}) )²
  = E( L_{21} + L_{22} )² ≤ CEL_{21}² + CEL_{22}².

To bound L_{21}, first apply Lemma 1:

EL_{21}² = E ( Σ_{i≤i_s} Σ_{j∈B(·)} ( β̂_ij − β_ij ) ψ_ij(x_0) I(B̂_i > cn^{−1}) )²
  ≤ ( Σ_{i≤i_s} E^{1/2} ( Σ_{j∈B(·)} ( β̂_ij − β_ij ) ψ_ij(x_0) I(B̂_i > cn^{−1}) )² )²
  ≤ ( Σ_{i≤i_s} ‖ψ‖_∞ 2^{i/2} ( E Σ_j ( β̂_ij − β_ij )² )^{1/2} )².  (37)

Now, E Σ_j ( β̂_ij − β_ij )² is of order n^{−1}:

E Σ_j ( β̂_ij − β_ij )² = Σ_j var β̂_ij ≤ Σ_j n^{−1} var ψ_ij(X_1) ≤ Cn^{−1} Σ_j ∫ ψ_ij²(x) dx.  (38)
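The order-n^{−1} claim at (38) can be checked directly. Assuming β̂_ij is the usual empirical coefficient n^{−1} Σ_{m=1}^n ψ_ij(X_m) (the standard construction for density estimation, as used throughout), a one-line verification:

```latex
% Variance of the empirical wavelet coefficient
% \hat\beta_{ij} = n^{-1}\sum_{m=1}^n \psi_{ij}(X_m), with X_m i.i.d. with density f.
\begin{align*}
\operatorname{var}\hat\beta_{ij}
  &= n^{-1}\operatorname{var}\psi_{ij}(X_1)
   \le n^{-1}\,E\,\psi_{ij}^2(X_1)
   = n^{-1}\int \psi_{ij}^2(x)\,f(x)\,dx \\
  &\le n^{-1}\,\|f\|_\infty \int \psi_{ij}^2(x)\,dx
   = n^{-1}\,\|f\|_\infty ,
\end{align*}
```

using ∫ψ_ij² = 1; since at most 2q_0 + 1 indices j are active at any point, the sum over j remains of order n^{−1}.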

Using this result, (37) becomes

EL_{21}² ≤ C [ Σ_{i≤i_s} 2^{i/2} ( n^{−1} )^{1/2} ]² ≤ Cn^{−2s/(2s+1)}.

The bound for L_{22} is found by breaking it into two pieces:

L_{22} = Σ_{i≤i_s} Σ_{j∈B(·)} β_ij ψ_ij(x_0) I(B̂_i ≤ cn^{−1}) I(B_i > 2cn^{−1}) + Σ_{i≤i_s} Σ_{j∈B(·)} β_ij ψ_ij(x_0) I(B̂_i ≤ cn^{−1}) I(B_i ≤ 2cn^{−1})
  = L_{221} + L_{222}.

The piece L_{221} is bounded using Talagrand's theorem. First, note that by Lemma 3 and the fact that f ∈ Λ^s(M) implies β_ij ≤ C 2^{−i(s+1/2)}, we have

E ( Σ_{j∈B(·)} β_ij ψ_ij(x_0) I(B̂_i ≤ cn^{−1}) I(B_i > 2cn^{−1}) )²
  ≤ CE ( Σ_{j∈B(·)} 2^{−i(s+1/2)} 2^{i/2} ‖ψ‖_∞ I(B̂_i ≤ cn^{−1}) I(B_i > 2cn^{−1}) )²
  ≤ C 2^{−2is} P ( ∫ ( D̂_i(x) − D_i f(x) )² dx > 0.16c log n/n ).

Then

P ( ∫ ( D̂_i(x) − D_i f(x) )² dx > 0.16c log n/n ) ≤ Cn^{−δ},

where δ is as before. Therefore, using this bound on the probability and Lemma 1,

EL_{221}² ≤ ( Σ_{i≤i_s} E^{1/2} ( Σ_{j∈B(·)} β_ij ψ_ij(x_0) I(B̂_i ≤ cn^{−1}) I(B_i > 2cn^{−1}) )² )²
  ≤ C ( Σ_{i≤i_s} 2^{−is} n^{−δ/2} )² ≤ Cn^{−δ}.

To bound L_{222}, observe that

E ( Σ_{j∈B(·)} β_ij ψ_ij(x_0) I(B̂_i ≤ cn^{−1}) I(B_i ≤ 2cn^{−1}) )² ≤ C 2^i ‖ψ‖_∞² Σ_{j∈B(·)} β_ij² I(B_i ≤ 2cn^{−1}).

Now, B_i ≤ 2cn^{−1} implies that

Σ_{j∈B(·)} β_ij² ≤ C l/n.

By virtue of f being in Λ^s(M),

Σ_{j∈B(·)} β_ij² ≤ C 2^{−2i(s+1/2)}.

Therefore,

Σ_{j∈B(·)} β_ij² I(B_i ≤ 2cn^{−1}) ≤ C ( n^{−1} log n ∧ 2^{−2i(s+1/2)} ),

and so

E ( Σ_{j∈B(·)} β_ij ψ_ij(x_0) I(B̂_i ≤ cn^{−1}) I(B_i ≤ 2cn^{−1}) )² ≤ C 2^i ( n^{−1} log n ∧ 2^{−2i(s+1/2)} ).

Therefore, the bound on L_{222} is (after an application of Lemma 1)

EL_{222}² ≤ C [ Σ_{i≤i_s} 2^{i/2} ( n^{−1} log n ∧ 2^{−2i(s+1/2)} )^{1/2} ]².

Now, n^{−1} log n ≤ 2^{−2i(s+1/2)} whenever 2^i ≤ ( n (log n)^{−1} )^{1/(2s+1)}. Therefore, letting i* be the integer such that 2^{i*} ≤ ( n (log n)^{−1} )^{1/(2s+1)} < 2^{i*+1},

EL_{222}² ≤ C [ Σ_{i≤i*} 2^{i/2} ( log n/n )^{1/2} + Σ_{i=i*+1}^{i_s} 2^{i/2} 2^{−i(s+1/2)} ]² ≤ C ( log n/n )^{2s/(2s+1)}.

The bound on EL_2² is therefore C ( n^{−δ} + ( n^{−1} log n )^{2s/(2s+1)} ),
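As a check on the stated rate, with 2^{i*} ≍ (n(log n)^{−1})^{1/(2s+1)} the two sums in the bound on EL_{222}² balance at the same order:

```latex
% Balancing the two geometric sums at the cut level 2^{i^*} \asymp (n/\log n)^{1/(2s+1)}.
\begin{align*}
\sum_{i\le i^*} 2^{i/2}\Big(\frac{\log n}{n}\Big)^{1/2}
  &\le C\,2^{i^*/2}\Big(\frac{\log n}{n}\Big)^{1/2}
   \asymp \Big(\frac{\log n}{n}\Big)^{\frac{1}{2}-\frac{1}{2(2s+1)}}
   = \Big(\frac{\log n}{n}\Big)^{s/(2s+1)}, \\
\sum_{i> i^*} 2^{i/2}\,2^{-i(s+1/2)}
  &= \sum_{i>i^*} 2^{-is}
   \le C\,2^{-i^* s}
   \asymp \Big(\frac{\log n}{n}\Big)^{s/(2s+1)},
\end{align*}
```

so both pieces are of order (log n/n)^{s/(2s+1)}, and squaring gives EL_{222}² ≤ C(log n/n)^{2s/(2s+1)} as claimed.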

and hence

EL_2² ≤ C [ n^{−δ} + ( n^{−1} log n )^{2s/(2s+1)} ].  (39)

Bound on L_3

As with L_2, break L_3 into the following parts:

EL_3² = E ( Σ_{i=i_s+1}^{R} Σ_{j∈B(·)} ( β̂_ij − β_ij ) ψ_ij(x_0) I(B̂_i > cn^{−1}) − Σ_{i=i_s+1}^{R} Σ_{j∈B(·)} β_ij ψ_ij(x_0) I(B̂_i ≤ cn^{−1}) )²
  = E( L_{31} + L_{32} )² ≤ CEL_{31}² + CEL_{32}².

Additionally, L_{31} must be divided as well:

EL_{31}² ≤ CE ( Σ_{i=i_s+1}^{R} Σ_{j∈B(·)} ( β̂_ij − β_ij ) ψ_ij(x_0) I(B̂_i > cn^{−1}) I(B_i > cn^{−1}/2) )²
  + CE ( Σ_{i=i_s+1}^{R} Σ_{j∈B(·)} ( β̂_ij − β_ij ) ψ_ij(x_0) I(B̂_i > cn^{−1}) I(B_i ≤ cn^{−1}/2) )²
  = CEL_{311}² + CEL_{312}².

To take care of L_{311}, notice that

E ( Σ_{j∈B(·)} ( β̂_ij − β_ij ) ψ_ij(x_0) I(B̂_i > cn^{−1}) I(B_i > cn^{−1}/2) )²
  ≤ C n c^{−1} B_i E ( Σ_{j∈B(·)} ( β̂_ij − β_ij ) ψ_ij(x_0) )².

As in (38),

E ( Σ_{j∈B(·)} ( β̂_ij − β_ij ) ψ_ij(x_0) )² ≤ 2^i ‖ψ‖_∞² E Σ_{j∈B(·)} ( β̂_ij − β_ij )² ≤ C 2^i/n.

Since B_i = l^{−1} Σ_{j∈B(·)} β_ij² and β_ij ≤ C 2^{−i(s+1/2)},

the bound for EL_{311}² then follows from an application of Lemma 1:

EL_{311}² ≤ C ( Σ_{i=i_s+1}^{R} ( n c^{−1} 2^{−2i(s+1/2)} 2^i n^{−1} )^{1/2} )² ≤ C ( Σ_{i=i_s+1}^{R} 2^{−is} )² ≤ Cn^{−2s/(2s+1)}.

To bound EL_{312}², Talagrand's theorem will be used. To begin, note that by Lemma 3,

E ( Σ_{j∈B(·)} ( β̂_ij − β_ij ) ψ_ij(x_0) I(B̂_i > cn^{−1}) I(B_i ≤ cn^{−1}/2) )²
  ≤ C 2^i ‖ψ‖_∞² E ( Σ_{j∈B(·)} ( β̂_ij − β_ij )² I(B̂_i > cn^{−1}) I(B_i ≤ cn^{−1}/2) )
  ≤ C 2^i E [ ∫_{J_i} ( D̂_i(x) − D_i f(x) )² dx · I ( ∫_{J_i} ( D̂_i(x) − D_i f(x) )² dx > 0.08c log n/n ) ]
  = C 2^i T_{32}.

This is bounded just as T_{32} was at (24). The number of indices here is no more than a constant, giving a bound of

C 2^i [ n^{−γ−1} log n + ( n^{−2} + n^{−1} 2^{i/2} ( log n/n )^{1/2} + 2^i n^{−2} ) exp( −nd/2^i ) ],

where γ is as in (21) and d is as in (23). Therefore, repeating the argument for the piece T_{32} at (28),

EL_{312}² ≤ C ( n^{−γ} + n^{−2} 2^R ).

Only L_{32} still needs bounding:

EL_{32}² ≤ C ( Σ_{i=i_s+1}^{R} E^{1/2} ( Σ_{j∈B(·)} β_ij ψ_ij(x_0) )² )² ≤ C ( Σ_{i=i_s+1}^{R} 2^{i/2} ‖ψ‖_∞ 2^{−i(s+1/2)} )² ≤ Cn^{−2s/(2s+1)}.

The bound for L_3 is then

EL_3² ≤ C ( n^{−2s/(2s+1)} + n^{−γ} ).  (40)

Bound on L_4

L_4 is bounded much like L_3 was. The only difference is the range of the index i and the lack of an indicator function:

EL_4² = E ( Σ_{i=R+1}^{∞} Σ_{j∈B(·)} β_ij ψ_ij(x_0) )² ≤ C ( Σ_{i=R+1}^{∞} 2^{−is} )² ≤ Cn^{−2s/(2s+1)}.  (41)

Determination of constants γ, δ, and c

From the bounds derived at (36), (39), (40), and (41),

E( f(x_0) − f̂(x_0) )² ≤ C ( n^{−δ} + ( log n/n )^{2s/(2s+1)} + n^{−γ} ).

As before, we need γ and δ to be larger than 2N/(2N+1) and (27) to hold, so

c ≥ A (0.08)^{−1} ( C_1^{−1/2} + C‖Q‖_1 )²

will suffice, as well as imply the necessary conditions on γ and δ. This implies

E( f(x_0) − f̂(x_0) )² ≤ C ( log n/n )^{2s/(2s+1)}

and the proof is complete.

Proof of Theorem 4

Suppose the block length l in the wavelet estimator (11) is taken to be of order larger than log n, say l = (log n)^{1+r} for some r > 0. Then, assume that f is a density function with one detail coefficient, β_{i*j*}, which is as large as possible, and no other non-zero coefficients β_ij overlapping the support of ψ_{i*j*}. Outside this support, f has sufficient mass to ensure it integrates to one. This function f is desired to be in the space Λ^s(M, x_0, δ), so let β_{i*j*} = 2^{−i*(s+1/2)}. Let i* be such that 2^{i*} = (n/l)^{1/(2s+1)}, and j* an integer such that

ψ_{i*j*}(x_0) ≥ c_2 > 0 for some constant c_2. Let

S = sup_{f∈Λ^s(M)} ( E( f̂(x_0) − f(x_0) )² )^{1/2} ≥ ( E( f̂(x_0) − f(x_0) )² )^{1/2}

= ( E [ ( β̂_{i*j*} I(B̂* > c/n) − β_{i*j*} ) ψ_{i*j*}(x_0)
  + Σ_j ( α̂_j − α_j ) φ_j(x_0)
  + Σ_{i≤R} Σ_{j∈B(·)} ( β̂_ij I(B̂_i > c/n) − β_ij ) ψ_ij(x_0)
  + Σ_{i>R} Σ_j β_ij ψ_ij(x_0) ]² )^{1/2},

where B* is the block containing the nonzero detail coefficient, and the third sum is over the remaining blocks. Since ( E(Y + Σ X_i)² )^{1/2} ≥ ( EY² )^{1/2} − Σ_i ( EX_i² )^{1/2} for random variables X_i and Y [19],

S ≥ E^{1/2} ( ( β̂_{i*j*} I(B̂* > c/n) − β_{i*j*} ) ψ_{i*j*}(x_0) )²
  − E^{1/2} ( Σ_j ( α̂_j − α_j ) φ_j(x_0) )²
  − E^{1/2} ( Σ_{i≤R} Σ_{j∈B(·)} ( β̂_ij − β_ij ) ψ_ij(x_0) I(B̂_i > c/n) )²
  − E^{1/2} ( Σ_{i>R} Σ_j β_ij ψ_ij(x_0) )²
  = U_1 − U_2 − U_3 − U_4.

Now, U_2 ≤ C'n^{−1/2} by Lemma 2. U_3 is easily seen to be bounded by C'n^{−s/(2s+1)} by noting that, in the previous sections, EL_{21}² ≤ Cn^{−2s/(2s+1)}, EL_{31}² ≤ C( n^{−γ} ((log n)^{1+r})^{−1} +

n^{−2s/(2s+1)} ), and the other relevant pieces are zero. The values of γ and δ may be taken the same as in the case where l = log n. U_4 ≤ C'n^{−s/(2s+1)} by repeating the argument for L_4. For γ large enough (i.e., threshold c as chosen), U_2 + U_3 + U_4 ≤ C'n^{−s/(2s+1)}. For U_1,

U_1 = [ E( β̂_{i*j*} − β_{i*j*} )² ψ²_{i*j*}(x_0) I(B̂* > c/n) + Eβ²_{i*j*} ψ²_{i*j*}(x_0) I(B̂* ≤ c/n) ]^{1/2}
  ≥ ( Eβ²_{i*j*} ψ²_{i*j*}(x_0) I(B̂* < c/n) I(B* ≤ 2c/n) )^{1/2}
  = [ β²_{i*j*} ψ²_{i*j*}(x_0) E( I(B̂* < c/n) I(B* ≤ 2c/n) ) ]^{1/2},

where B* is the mean of the true squared coefficients in the block containing the nonzero coefficient. By Talagrand's theorem and Lemma 3,

E( I(B̂* ≥ c/n) I(B* ≤ 2c/n) ) ≤ Cn^{−γ},

where γ is as before. For suitable n, this is less than 1/2. Therefore, the expectation of the indicators in the lower bound for U_1 is at least 1/2. So,

U_1 ≥ C ( β²_{i*j*} ψ²_{i*j*}(x_0) )^{1/2} = C ( l/n )^{s/(2s+1)}.

Therefore,

S ≥ U_1 − U_2 − U_3 − U_4 ≥ C ( l/n )^{s/(2s+1)},

or

sup_{f∈Λ^s(M,x_0,δ)} E( f̂(x_0) − f(x_0) )² ≥ C ( log n/n )^{2s/(2s+1)} ( log n )^{2rs/(2s+1)}.

References

[1] L. Brown, M. Low, Asymptotic equivalence of nonparametric regression and white noise, Ann. Statist. 24 (1996) 2384–2398.
[2] L. Brown, M. Low, A constrained risk inequality with applications to nonparametric functional estimation, Ann. Statist. 24 (1996) 2524–2535.
[3] T. Cai, Adaptive wavelet estimation: a block thresholding and oracle inequality approach, Ann. Statist. 27 (1999) 898–924.
[4] T. Cai, Wavelet regression via block thresholding: adaptivity and the choice of block size and threshold level, Technical Report #99-14, Department of Statistics, Purdue University, 1999.
[5] T. Cai, On adaptive estimation of a derivative and other related linear inverse problems, J. Statist. Plann. Inference 108 (2002) 329–349.
[6] T. Cai, On block thresholding in wavelet regression: adaptivity, block size and threshold level, Statist. Sinica 12 (2002) 1241–1273.
[7] T. Cai, M. Low, Nonparametric function estimation over shrinking neighborhoods: superefficiency and adaptation, Technical Report, Department of Statistics, University of Pennsylvania, 2002.
[8] T. Cai, M. Low, L.
Zhao, Tradeoffs between global and local risks in nonparametric function estimation, Technical Report, Department of Statistics, University of Pennsylvania, 2001.

[9] T. Cai, B. Silverman, Incorporating information on neighboring coefficients into wavelet estimation, Sankhyā Ser. B 63 (2001) 127–148.
[10] E. Chicken, Simulation results for wavelet-based density estimators, Technical Report M968, Department of Statistics, Florida State University, 2004.
[11] I. Daubechies, Ten Lectures on Wavelets, SIAM, Philadelphia, 1992.
[12] D. Donoho, I. Johnstone, Adapting to unknown smoothness via wavelet shrinkage, J. Amer. Statist. Assoc. 90 (1995) 1200–1224.
[13] D. Donoho, I. Johnstone, Minimax estimation via wavelet shrinkage, Ann. Statist. 26 (1998) 879–921.
[14] D. Donoho, R. Liu, Geometrizing rates of convergence III, Ann. Statist. 19 (1991) 668–701.
[15] S. Efromovich, Nonparametric estimation of a density of unknown smoothness, Theory Probab. Appl. 30 (1985) 557–568.
[16] S. Efromovich, Nonparametric Curve Estimation, Springer, New York, 1999.
[17] S. Efromovich, On blockwise shrinkage estimation, Technical Report, 2002.
[18] S. Efromovich, M. Low, Adaptive estimates of linear functionals, Probab. Theory Related Fields 98 (1994) 261–275.
[19] P. Hall, G. Kerkyacharian, D. Picard, Block threshold rules for curve estimation using kernel and wavelet methods, Ann. Statist. 26 (1998) 922–942.
[20] P. Hall, G. Kerkyacharian, D. Picard, On the minimax optimality of block thresholded wavelet estimators, Statist. Sinica 9 (1999) 33–49.
[21] I. Ibragimov, R. Hasminskii, Nonparametric estimation of the values of a linear functional in Gaussian white noise, Theory Probab. Appl. 29 (1984) 18–32.
[22] G. Kerkyacharian, D. Picard, K. Tribouley, L^p adaptive density estimation, Bernoulli 2 (1996) 229–247.
[23] O. Lepski, On a problem of adaptive estimation in white Gaussian noise, Theory Probab. Appl. 35 (1990) 454–466.
[24] O. Lepski, E. Mammen, V. Spokoiny, Optimal spatial adaptation to inhomogeneous smoothness: an approach based on kernel estimates with variable bandwidth selectors, Ann. Statist. 25 (1997) 929–947.
[25] O. Lepski, V. Spokoiny, Optimal pointwise adaptive methods in nonparametric estimation, Ann. Statist. 25 (1997) 2512–2546.
[26] Y. Meyer, Wavelets and Operators, Cambridge University Press, Cambridge, 1992.
[27] M.
Talagrand, Sharper bounds for Gaussian and empirical processes, Ann. Probab. 22 (1994) 28–76.
[28] H. Triebel, Theory of Function Spaces, Birkhäuser, Basel, 1983.
[29] H. Triebel, Theory of Function Spaces II, Birkhäuser, Basel, 1992.


More information

41903: Introduction to Nonparametrics

41903: Introduction to Nonparametrics 41903: Notes 5 Introduction Nonparametrics fundamentally about fitting flexible models: want model that is flexible enough to accommodate important patterns but not so flexible it overspecializes to specific

More information

Wavelets and Image Compression. Bradley J. Lucier

Wavelets and Image Compression. Bradley J. Lucier Wavelets and Image Compression Bradley J. Lucier Abstract. In this paper we present certain results about the compression of images using wavelets. We concentrate on the simplest case of the Haar decomposition

More information

Vectors in Function Spaces

Vectors in Function Spaces Jim Lambers MAT 66 Spring Semester 15-16 Lecture 18 Notes These notes correspond to Section 6.3 in the text. Vectors in Function Spaces We begin with some necessary terminology. A vector space V, also

More information

Nonparametric Estimation of Functional-Coefficient Autoregressive Models

Nonparametric Estimation of Functional-Coefficient Autoregressive Models Nonparametric Estimation of Functional-Coefficient Autoregressive Models PEDRO A. MORETTIN and CHANG CHIANN Department of Statistics, University of São Paulo Introduction Nonlinear Models: - Exponential

More information

4 Sums of Independent Random Variables

4 Sums of Independent Random Variables 4 Sums of Independent Random Variables Standing Assumptions: Assume throughout this section that (,F,P) is a fixed probability space and that X 1, X 2, X 3,... are independent real-valued random variables

More information

Let p 2 ( t), (2 t k), we have the scaling relation,

Let p 2 ( t), (2 t k), we have the scaling relation, Multiresolution Analysis and Daubechies N Wavelet We have discussed decomposing a signal into its Haar wavelet components of varying frequencies. The Haar wavelet scheme relied on two functions: the Haar

More information

Analysis of Redundant-Wavelet Multihypothesis for Motion Compensation

Analysis of Redundant-Wavelet Multihypothesis for Motion Compensation Analysis of Redundant-Wavelet Multihypothesis for Motion Compensation James E. Fowler Department of Electrical and Computer Engineering GeoResources Institute GRI Mississippi State University, Starville,

More information

On the fast algorithm for multiplication of functions in the wavelet bases

On the fast algorithm for multiplication of functions in the wavelet bases Published in Proceedings of the International Conference Wavelets and Applications, Toulouse, 1992; Y. Meyer and S. Roques, edt., Editions Frontieres, 1993 On the fast algorithm for multiplication of functions

More information

Comments on \Wavelets in Statistics: A Review" by. A. Antoniadis. Jianqing Fan. University of North Carolina, Chapel Hill

Comments on \Wavelets in Statistics: A Review by. A. Antoniadis. Jianqing Fan. University of North Carolina, Chapel Hill Comments on \Wavelets in Statistics: A Review" by A. Antoniadis Jianqing Fan University of North Carolina, Chapel Hill and University of California, Los Angeles I would like to congratulate Professor Antoniadis

More information

AN INEQUALITY FOR TAIL PROBABILITIES OF MARTINGALES WITH BOUNDED DIFFERENCES

AN INEQUALITY FOR TAIL PROBABILITIES OF MARTINGALES WITH BOUNDED DIFFERENCES Lithuanian Mathematical Journal, Vol. 4, No. 3, 00 AN INEQUALITY FOR TAIL PROBABILITIES OF MARTINGALES WITH BOUNDED DIFFERENCES V. Bentkus Vilnius Institute of Mathematics and Informatics, Akademijos 4,

More information

Existence and Uniqueness

Existence and Uniqueness Chapter 3 Existence and Uniqueness An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect

More information

Energy method for wave equations

Energy method for wave equations Energy method for wave equations Willie Wong Based on commit 5dfb7e5 of 2017-11-06 13:29 Abstract We give an elementary discussion of the energy method (and particularly the vector field method) in the

More information

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product Chapter 4 Hilbert Spaces 4.1 Inner Product Spaces Inner Product Space. A complex vector space E is called an inner product space (or a pre-hilbert space, or a unitary space) if there is a mapping (, )

More information

Sobolev spaces. May 18

Sobolev spaces. May 18 Sobolev spaces May 18 2015 1 Weak derivatives The purpose of these notes is to give a very basic introduction to Sobolev spaces. More extensive treatments can e.g. be found in the classical references

More information

Almost sure limit theorems for U-statistics

Almost sure limit theorems for U-statistics Almost sure limit theorems for U-statistics Hajo Holzmann, Susanne Koch and Alesey Min 3 Institut für Mathematische Stochasti Georg-August-Universität Göttingen Maschmühlenweg 8 0 37073 Göttingen Germany

More information

Function spaces on the Koch curve

Function spaces on the Koch curve Function spaces on the Koch curve Maryia Kabanava Mathematical Institute Friedrich Schiller University Jena D-07737 Jena, Germany Abstract We consider two types of Besov spaces on the Koch curve, defined

More information

CLASSIFYING THE SMOOTHNESS OF IMAGES: THEORY AND APPLICATIONS TO WAVELET IMAGE PROCESSING

CLASSIFYING THE SMOOTHNESS OF IMAGES: THEORY AND APPLICATIONS TO WAVELET IMAGE PROCESSING CLASSIFYING THE SMOOTHNESS OF IMAGES: THEORY AND APPLICATIONS TO WAVELET IMAGE PROCESSING RONALD A. DeVORE 1, University of South Carolina, and BRADLEY J. LUCIER 2, Purdue University Abstract Devore, Jawerth,

More information

Lecture 3: Statistical Decision Theory (Part II)

Lecture 3: Statistical Decision Theory (Part II) Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical

More information

Supplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data

Supplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data Supplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data Raymond K. W. Wong Department of Statistics, Texas A&M University Xiaoke Zhang Department

More information

Nonparametric Methods

Nonparametric Methods Nonparametric Methods Michael R. Roberts Department of Finance The Wharton School University of Pennsylvania July 28, 2009 Michael R. Roberts Nonparametric Methods 1/42 Overview Great for data analysis

More information

KUMMER S LEMMA KEITH CONRAD

KUMMER S LEMMA KEITH CONRAD KUMMER S LEMMA KEITH CONRAD Let p be an odd prime and ζ ζ p be a primitive pth root of unity In the ring Z[ζ], the pth power of every element is congruent to a rational integer mod p, since (c 0 + c 1

More information

A DATA-DRIVEN BLOCK THRESHOLDING APPROACH TO WAVELET ESTIMATION

A DATA-DRIVEN BLOCK THRESHOLDING APPROACH TO WAVELET ESTIMATION The Annals of Statistics 2009, Vol. 37, No. 2, 569 595 DOI: 10.1214/07-AOS538 Institute of Mathematical Statistics, 2009 A DATA-DRIVEN BLOCK THRESHOLDING APPROACH TO WAVELET ESTIMATION BY T. TONY CAI 1

More information

GIOVANNI COMI AND MONICA TORRES

GIOVANNI COMI AND MONICA TORRES ONE-SIDED APPROXIMATION OF SETS OF FINITE PERIMETER GIOVANNI COMI AND MONICA TORRES Abstract. In this note we present a new proof of a one-sided approximation of sets of finite perimeter introduced in

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

Construction of Multivariate Compactly Supported Orthonormal Wavelets

Construction of Multivariate Compactly Supported Orthonormal Wavelets Construction of Multivariate Compactly Supported Orthonormal Wavelets Ming-Jun Lai Department of Mathematics The University of Georgia Athens, GA 30602 April 30, 2004 Dedicated to Professor Charles A.

More information

Estimation of cumulative distribution function with spline functions

Estimation of cumulative distribution function with spline functions INTERNATIONAL JOURNAL OF ECONOMICS AND STATISTICS Volume 5, 017 Estimation of cumulative distribution function with functions Akhlitdin Nizamitdinov, Aladdin Shamilov Abstract The estimation of the cumulative

More information

THE LINDEBERG-FELLER CENTRAL LIMIT THEOREM VIA ZERO BIAS TRANSFORMATION

THE LINDEBERG-FELLER CENTRAL LIMIT THEOREM VIA ZERO BIAS TRANSFORMATION THE LINDEBERG-FELLER CENTRAL LIMIT THEOREM VIA ZERO BIAS TRANSFORMATION JAINUL VAGHASIA Contents. Introduction. Notations 3. Background in Probability Theory 3.. Expectation and Variance 3.. Convergence

More information

Uniform Confidence Sets for Nonparametric Regression with Application to Cosmology

Uniform Confidence Sets for Nonparametric Regression with Application to Cosmology Uniform Confidence Sets for Nonparametric Regression with Application to Cosmology Christopher R. Genovese Department of Statistics Carnegie Mellon University http://www.stat.cmu.edu/ ~ genovese/ Larry

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning 3. Instance Based Learning Alex Smola Carnegie Mellon University http://alex.smola.org/teaching/cmu2013-10-701 10-701 Outline Parzen Windows Kernels, algorithm Model selection

More information

Regularity of the density for the stochastic heat equation

Regularity of the density for the stochastic heat equation Regularity of the density for the stochastic heat equation Carl Mueller 1 Department of Mathematics University of Rochester Rochester, NY 15627 USA email: cmlr@math.rochester.edu David Nualart 2 Department

More information

UNIFORM BOUNDS FOR BESSEL FUNCTIONS

UNIFORM BOUNDS FOR BESSEL FUNCTIONS Journal of Applied Analysis Vol. 1, No. 1 (006), pp. 83 91 UNIFORM BOUNDS FOR BESSEL FUNCTIONS I. KRASIKOV Received October 8, 001 and, in revised form, July 6, 004 Abstract. For ν > 1/ and x real we shall

More information

Asymptotically sufficient statistics in nonparametric regression experiments with correlated noise

Asymptotically sufficient statistics in nonparametric regression experiments with correlated noise Vol. 0 0000 1 0 Asymptotically sufficient statistics in nonparametric regression experiments with correlated noise Andrew V Carter University of California, Santa Barbara Santa Barbara, CA 93106-3110 e-mail:

More information

Multiscale Frame-based Kernels for Image Registration

Multiscale Frame-based Kernels for Image Registration Multiscale Frame-based Kernels for Image Registration Ming Zhen, Tan National University of Singapore 22 July, 16 Ming Zhen, Tan (National University of Singapore) Multiscale Frame-based Kernels for Image

More information

Smooth functions and local extreme values

Smooth functions and local extreme values Smooth functions and local extreme values A. Kovac 1 Department of Mathematics University of Bristol Abstract Given a sample of n observations y 1,..., y n at time points t 1,..., t n we consider the problem

More information

On the local existence for an active scalar equation in critical regularity setting

On the local existence for an active scalar equation in critical regularity setting On the local existence for an active scalar equation in critical regularity setting Walter Rusin Department of Mathematics, Oklahoma State University, Stillwater, OK 7478 Fei Wang Department of Mathematics,

More information

Lecture Notes 5: Multiresolution Analysis

Lecture Notes 5: Multiresolution Analysis Optimization-based data analysis Fall 2017 Lecture Notes 5: Multiresolution Analysis 1 Frames A frame is a generalization of an orthonormal basis. The inner products between the vectors in a frame and

More information

WAVELET EXPANSIONS OF DISTRIBUTIONS

WAVELET EXPANSIONS OF DISTRIBUTIONS WAVELET EXPANSIONS OF DISTRIBUTIONS JASSON VINDAS Abstract. These are lecture notes of a talk at the School of Mathematics of the National University of Costa Rica. The aim is to present a wavelet expansion

More information

Maximum Likelihood Diffusive Source Localization Based on Binary Observations

Maximum Likelihood Diffusive Source Localization Based on Binary Observations Maximum Lielihood Diffusive Source Localization Based on Binary Observations Yoav Levinboo and an F. Wong Wireless Information Networing Group, University of Florida Gainesville, Florida 32611-6130, USA

More information

BAND-LIMITED REFINABLE FUNCTIONS FOR WAVELETS AND FRAMELETS

BAND-LIMITED REFINABLE FUNCTIONS FOR WAVELETS AND FRAMELETS BAND-LIMITED REFINABLE FUNCTIONS FOR WAVELETS AND FRAMELETS WEIQIANG CHEN AND SAY SONG GOH DEPARTMENT OF MATHEMATICS NATIONAL UNIVERSITY OF SINGAPORE 10 KENT RIDGE CRESCENT, SINGAPORE 119260 REPUBLIC OF

More information

Wavelet Thresholding for Non Necessarily Gaussian Noise: Functionality

Wavelet Thresholding for Non Necessarily Gaussian Noise: Functionality Wavelet Thresholding for Non Necessarily Gaussian Noise: Functionality By R. Averkamp 1 and C. Houdré 2 Freiburg University and Université Paris XII and Georgia Institute of Technology For signals belonging

More information

1 Review of di erential calculus

1 Review of di erential calculus Review of di erential calculus This chapter presents the main elements of di erential calculus needed in probability theory. Often, students taking a course on probability theory have problems with concepts

More information

Uniform Confidence Sets for Nonparametric Regression with Application to Cosmology

Uniform Confidence Sets for Nonparametric Regression with Application to Cosmology Uniform Confidence Sets for Nonparametric Regression with Application to Cosmology Christopher R. Genovese Department of Statistics Carnegie Mellon University http://www.stat.cmu.edu/ ~ genovese/ Larry

More information

1 Continuity Classes C m (Ω)

1 Continuity Classes C m (Ω) 0.1 Norms 0.1 Norms A norm on a linear space X is a function : X R with the properties: Positive Definite: x 0 x X (nonnegative) x = 0 x = 0 (strictly positive) λx = λ x x X, λ C(homogeneous) x + y x +

More information

and finally, any second order divergence form elliptic operator

and finally, any second order divergence form elliptic operator Supporting Information: Mathematical proofs Preliminaries Let be an arbitrary bounded open set in R n and let L be any elliptic differential operator associated to a symmetric positive bilinear form B

More information