ACCURACY ASSESSMENT FOR HIGH-DIMENSIONAL LINEAR REGRESSION 1. BY T. TONY CAI AND ZIJIAN GUO University of Pennsylvania and Rutgers University

Size: px

Start display at page:

Download "ACCURACY ASSESSMENT FOR HIGH-DIMENSIONAL LINEAR REGRESSION 1. BY T. TONY CAI AND ZIJIAN GUO University of Pennsylvania and Rutgers University"

Austin Paul
5 years ago
Views:

1 The Aals of Statistics 018, Vol. 46, No. 4, Istitute of Mathematical Statistics, 018 ACCURACY ASSESSMENT FOR HIGH-DIMENSIONAL LINEAR REGRESSION 1 BY T. TONY CAI AND ZIJIAN GUO Uiversity of Pesylvaia ad Rutgers Uiversity This paper cosiders poit ad iterval estimatio of the l q loss of a estimator i high-dimesioal liear regressio with radom desig. We establish the miimax rate for estimatig the l q loss ad the miimax expected legth of cofidece itervals for the l q loss of rate-optimal estimators of the regressio vector, icludig commoly used estimators such as Lasso, scaled Lasso, square-root Lasso ad Datzig Selector. Adaptivity of cofidece itervals for the l q loss is also studied. Both the settig of the kow idetity desig covariace matrix ad kow oise level ad the settig of ukow desig covariace matrix ad ukow oise level are studied. The results reveal iterestig ad sigificat differeces betwee estimatig the l loss ad l q loss with 1 q< as well as betwee the two settigs. New techical tools are developed to establish rate sharp lower bouds for the miimax estimatio error ad the expected legth of miimax ad adaptive cofidece itervals for the l q loss. A sigificat differece betwee loss estimatio ad the traditioal parameter estimatio is that for loss estimatio the costrait is o the performace of the estimator of the regressio vector, but the lower bouds are o the difficulty of estimatig its l q loss. The techical tools developed i this paper ca also be of idepedet iterest. 1. Itroductio. I may applicatios, the goal of statistical iferece is ot oly to costruct a good estimator, but also to provide a measure of accuracy for this estimator. I classical statistics, whe the parameter of iterest is oedimesioal, this is achieved i the form of a stadard error or a cofidece iterval. A prototypical example is the iferece for a biomial proportio, where ofte ot oly a estimate of the proportio but also its margi of error are give. Accuracy measures of a estimatio procedure have also bee used as a tool for the empirical selectio of tuig parameters. A well-kow example is Stei s Ubiased Risk Estimate SURE), which has bee a effective tool for the costructio of data-drive adaptive estimators i ormal meas estimatio, oparametric sigal recovery, covariace matrix estimatio, ad other problems. See, for istace, [11, 15, 1, 5, 3]. The commoly used cross-validatio methods ca also be viewed as a useful tool based o the idea of empirical assessmet of accuracy. Received March 016; revised March Supported i part by NSF Grats DMS ad DMS , ad NIH Grat R01 CA MSC010 subject classificatios. Primary 6G15; secodary 6C0, 6H35. Key words ad phrases. Accuracy assessmet, adaptivity, cofidece iterval, high-dimesioal liear regressio, loss estimatio, miimax lower boud, miimaxity, sparsity. 1807

2 1808 T. T. CAI AND Z. GUO I this paper, we cosider the problem of estimatig the loss of a give estimator i the settig of high-dimesioal liear regressio, where oe observes X, y) with X R p ad y R,adfor1 i, y i = X i β + ε i. Here, β R p i.i.d. is the regressio vector, X i N p 0, )are the rows of X ad the i.i.d. errors ε i N0,σ ) are idepedet of X. This high-dimesioal liear model has bee well studied i the literature, where the mai focus has bee o estimatio of β. Several pealized/costraied l 1 miimizatio methods, icludig Lasso [8], Datzig selector [1], scaled Lasso [6] ad square-root Lasso [3], have bee proposed. These methods have bee show to work well i applicatios ad produce iterpretable estimates of β whe β is assumed to be sparse. Theoretically, with a properly chose tuig parameter, these estimators achieve the optimal rate of covergece over collectios of sparse parameter spaces; see, for example, [3 5, 1, 3, 6, 30]. For a give estimator β, thel q loss β β q with 1 q is commoly used as a metric of accuracy for β. We cosider i the preset paper both poit ad iterval estimatio of the l q loss β β q for a give β. Note that the loss β β q is a radom quatity, depedig o both the estimator β ad the parameter β. For such a radom quatity, predictio ad predictio iterval are usually used for poit ad iterval estimatio, respectively. However, we slightly abuse the termiologies i the preset paper by usig estimatio ad cofidece iterval to represet the poit ad iterval estimators of the loss β β q. Sice the l q loss depeds o the estimator β, it is ecessary to specify the estimator i the discussio of loss estimatio. Throughout this paper, we restrict our attetio to a broad collectio of estimators β that perform well at least at oe iterior poit or a small subset of the parameter space. This collectio of estimators icludes most state-ofart estimators such as Lasso, Datzig selector, scaled Lasso ad square-root Lasso. High-dimesioal liear regressio has bee well studied i two settigs. Oe is the settig with kow desig covariace matrix = I, kow oise level σ = σ 0 ad sparse β; seeforexample,[1,, 7, 16, 19,, 7, 30]. Aother commoly cosidered settig is sparse β with ukow ad σ. We study poit ad iterval estimatio of the l q loss β β q i both settigs. Specifically, we cosider the parameter space 0 k) itroduced i.3), which cosists of k-sparse sigals β with kow desig covariace matrix = I ad kow oise level σ = σ 0,ad k) defied i.4), which cosists of k-sparse sigals with ukow ad σ Our cotributios. The preset paper studies the miimax ad adaptive estimatio of the loss β β q for a give estimator β ad the miimax expected legth ad adaptivity of cofidece itervals for the loss. A major step i our aalysis is to establish rate sharp lower bouds for the miimax estimatio error ad

3 HIGH-DIMENSIONAL ACCURACY ASSESSMENT 1809 the miimax expected legth of cofidece itervals for the l q loss over 0 k) ad k) for a broad class of estimators of β, which cotais the subclass of rateoptimal estimators. We the focus o the estimatio of the loss of rate-optimal estimators ad take the Lasso ad scaled Lasso estimators as geeric examples. For these rate-optimal estimators, we propose procedures for poit estimatio as well as cofidece itervals for their l q losses. It is show that the proposed procedures achieve the correspodig lower bouds up to a costat factor. These results together establish the miimax rates for estimatig the l q loss of rate-optimal estimators over 0 k) ad k). The aalysis shows iterestig ad sigificat differeces betwee estimatig the l loss ad l q loss with 1 q<aswellas betwee the two parameter spaces k) ad 0 k): The miimax rate for estimatig β β over 0k) is mi{ 1 over k) is k = Iadσ = σ 0 whe,k } ad. So loss estimatio is much easier with the prior iformatio. k The miimax rate for estimatig β β q with 1 q<over both 0k) ad k) is k q. I the regime k, a practical loss estimator is proposed for estimatig the l loss ad show to achieve the optimal covergece rate 1 adaptively over 0 k). Wesayestimatio of loss is impossible if the miimax rate ca be achieved by the trivial estimator 0, which meas that the estimatio accuracy of the loss is at least of the same order as the loss itself. I all other cosidered cases, estimatio of loss is show to be impossible. These results idicate that loss estimatio is difficult. We the tur to the costructio of cofidece itervals for the l q loss. A cofidece iterval for the loss is useful eve whe it is impossible to estimate the loss, as a cofidece iterval ca provide otrivial upper ad lower bouds for the loss. I terms of covergece rate over 0 k) or k), the miimax rate of the expected legth of cofidece itervals for the l q loss, β β q, of ay rate-optimal estimator β coicides with the miimax estimatio rate. We also cosider the adaptivity of cofidece itervals for the l q loss of ay rate-optimal estimator β. The framework for adaptive cofidece itervals is discussed i detail i Sectio 3.1.) Regardig cofidece itervals for the l loss i the case of kow = Iad σ = σ 0, a procedure is proposed ad is show to achieve the optimal legth 1 adaptively over 0 k) for k log p. Furthermore, it is show that this is the oly regime where adaptive cofidece itervals exist, eve over two give parameter spaces. For example, whe k 1 ad k 1 k, it is impossible to costruct a cofidece iterval for the l loss with guarateed coverage probability over 0 k ) [cosequetly also over 0 k 1 )] ad with the expected legth automatically adjusted to the sparsity. Similarly, for the l q loss with 1 q<, co-

4 1810 T. T. CAI AND Z. GUO structio of adaptive cofidece itervals is impossible over 0 k 1 ) ad 0 k ) for k 1 k log p. Regardig cofidece itervals for the l q loss with 1 q i the case of ukow ad σ, the impossibility of adaptivity also holds over k 1 ) ad k ) for k 1 k log p. Establishig rate-optimal lower bouds requires the developmet of ew techical tools. Oe mai differece betwee loss estimatio ad the traditioal parameter estimatio is that for loss estimatio the costrait is o the performace of the estimator β of the regressio vector β, but the lower boud is o the difficulty of estimatig its loss β β q. We itroduce useful ew lower boud techiques for the miimax estimatio error ad the expected legth of adaptive cofidece itervals for the loss β β q. I several importat cases, it is ecessary to test a composite ull agaist a composite alterative i order to establish rate sharp lower bouds. The techical tools developed i this paper ca also be of idepedet iterest. I additio to 0 k) ad k), we also study a itermediate parameter space where the oise level σ is kow ad the desig covariace matrix is ukow but of certai structure. Lower bouds for the expected legth of miimax ad adaptive cofidece itervals for β β q over this parameter space are established for a broad collectio of estimators β ad are show to be rate sharp for the class of rate-optimal estimators. Furthermore, the lower bouds developed i this paper have wider implicatios. I particular, it is show that they lead immediately to miimax lower bouds for estimatig β q ad the expected legth of cofidece itervals for β q with 1 q. 1.. Compariso with other works. Statistical iferece o the loss of specific estimators of β has bee cosidered i the recet literature. The papers [, 16] established, i the settig = Iad/p δ 0, ), the limit of the ormalized loss p 1 βλ) β where βλ) is the Lasso estimator with a pre-specified tuig parameter λ. Although [, 16] provided a exact asymptotic expressio of the ormalized loss, the limit itself depeds o the ukow β. I a similar settig, the paper [7] established the limit of a ormalized l loss of the square-root Lasso estimator. These limits of the ormalized losses help uderstad the properties of the correspodig estimators of β, but they do ot lead to a estimate of the loss. Our results imply that although these ormalized losses have a limit uder certai regularity coditios, such losses caot be estimated well i most settigs. A recet paper, [0], costructed a cofidece iterval for β β i the case of kow = I, ukow oise level σ ad moderate dimesio where /p ξ 0, 1) ad o sparsity is assumed o β. While o sparsity assumptio o β is imposed, their method requires the assumptio of = Iad/p ξ 0, 1). I cotrast, i this paper, we cosider both ukow ad kow = I settigs, while allowig p ad assumig sparse β.

5 HIGH-DIMENSIONAL ACCURACY ASSESSMENT 1811 Hoest adaptive iferece has bee studied i the oparametric fuctio estimatio literature, icludig [9] for adaptive cofidece itervals for liear fuctioals, [8, 18] for adaptive cofidece bads ad [10, 4] for adaptive cofidece balls, ad i the high-dimesioal liear regressio literature, icludig [] for adaptive cofidece set ad [7] for adaptive cofidece iterval for liear fuctioals. I this paper, we develop ew lower boud tools, Theorems 8 ad 9, to establish the possibility of adaptive cofidece itervals for β β q. The coectio betwee l loss cosidered i the curret paper ad the work [] is discussed i more detail i Sectio Orgaizatio. Sectio establishes the miimax lower bouds of estimatig the loss β β q with 1 q over both 0k) ad k) ad shows that these bouds are rate sharp for the Lasso ad scaled Lasso estimators, respectively. We the tur to iterval estimatio of β β q. Sectios 3 ad 4 preset the miimax ad adaptive miimax lower bouds for the expected legth of cofidece itervals for β β q over 0k) ad k). For Lasso ad scaled Lasso estimators, we show that the lower bouds ca be achieved ad ivestigate the possibility of adaptivity. Sectio 5 cosiders the rate-optimal estimators ad establishes the miimax covergece rate of estimatig their l q losses. Sectio 6 presets ew miimax lower boud techiques for estimatig the loss β β q. Sectio 7 discusses the miimaxity ad adaptivity i aother settig, where the oise level σ is kow ad the desig covariace matrix is ukow but of certai structure. Sectio 8 applies the ewly developed lower bouds to establish lower bouds for a related problem, that of estimatig β q. Sectio 9 proves the mai results ad additioal proofs are give i the supplemetal material [6] Notatio. For a matrix X R p, X i, X j ad X i,j deote respectively the ith row, jth colum ad i, j) etry of the matrix X. For a subset J {1,,...,p}, J deotes the cardiality of J, J c deotes the complemet {1,,...,p}\J, X J deotes the submatrix of X cosistig of colums X j with j J ad for a vector x R p, x J is the subvector of x with idices i J.For a vector x R p, suppx) deotes the support of x ad the l q orm of x is defied as x q = p i=1 x i q ) q 1 for q 0 with x 0 = suppx) ad x = max 1 j p x j.fora R, a + = max{a,0}; fora,b R, a b = max{a,b}. We use max X j as a shorthad for max 1 j p X j ad mi X j as a shorthad for mi 1 j p X j.foramatrixa, we defie the spectral orm A = sup x =1 Ax p ad the matrix l 1 orm A L1 = sup 1 j p i=1 A ij ; for a symmetric matrix A, λ mi A) ad λ max A) deote respectively the smallest ad largest eigevalue of A. Weusec ad C to deote geeric positive costats that may vary from place to place. For two positive sequeces a ad b, a b meas a Cb for all ad a b if b a ad a b if a b ad a b a,ada b if lim sup b = 0ada b if b a.

6 181 T. T. CAI AND Z. GUO. Miimax estimatio of the l q loss. We begi by presetig the miimax framework for estimatig the l q loss, β β q, of a give estimator β, adthe establish the miimax lower bouds for the estimatio error for a broad collectio of estimators β. We also show that such miimax lower bouds ca be achieved for the Lasso ad scaled Lasso estimators..1. Problem formulatio. Recall the high-dimesioal liear model,.1) y 1 = X p β p 1 + ε 1, ε N 0,σ I ). i.i.d. We focus o the radom desig with X i N0, )ad X i ad ε i are idepedet. Let Z = X, y) deote the observed data ad β be a give estimator of β. Deotig by L q Z) ay estimator of the loss β β q, the miimax rate of covergece for estimatig β β q overaparameterspace is defied as the largest quatity γ β,l q ) such that.) if sup L q θ P θ L q Z) β β q γ β,l q ) ) δ, for some costat δ>0 ot depedig o or p. We shall write L q for L q Z) whe there is o cofusio. We deote the parameter by θ = β,,σ), which cosists of the sigal β, the desig covariace matrix ad the oise level σ.foragiveθ = β,,σ),we use βθ) to deote the correspodig β. Two settigs are cosidered: The first is kow desig covariace matrix = I ad kow oise level σ = σ 0 ad the other is ukow ad σ. I the first settig, we cosider the followig parameter space that cosists of k-sparse sigals:.3) 0 k) = { β, I,σ 0 ) : β 0 k }, ad i the secod settig, we cosider { 1 k) = β,,σ): β 0 k, λ mi ) λ max ) M 1, M 1.4) 0 <σ M }, where M 1 1adM > 0 are costats. The parameter space 0 k) is a subset of k), which cosists of k-sparse sigals with ukow ad σ. The miimax rate γ β,l q ) for estimatig β β q also depeds o the estimator β. Differet estimators β could lead to differet losses β β q ad i geeral the difficulty of estimatig the loss β β q varies with β. We first recall the properties of some state-of-art estimators ad the specify the collectio of estimators o which we focus i this paper. As show i [3, 4, 1, 6], Lasso, Datzig

7 HIGH-DIMENSIONAL ACCURACY ASSESSMENT 1813 Selector, scaled Lasso ad square-root Lasso satisfy the followig property if the tuig parameter is properly chose:.5) sup P θ β β q Ck ) q 0, θ k) where C>0is a costat. The miimax lower bouds established i [3, 30, 31] imply that k q is the optimal rate for estimatig β over the parameter space k). It should be stressed that all of these algorithms do ot require kowledge of the sparsity k ad are thus adaptive to the sparsity provided k. We cosider a broad collectio of estimators β satisfyig oe of the followig two assumptios: A1) The estimator β satisfies, for some θ 0 = β, I,σ 0 ),.6) P θ0 β β q C β q 0 1) ) σ 0 α 0, where 0 α 0 < 1 4 ad C > 0 are costats. A) The estimator β satisfies.7) sup P θ β β {θ=β q C β q 0 1) ),I,σ ):σ σ 0 } σ α 0, where 0 α 0 < 1 4 ad C > 0 are costats ad σ 0 > 0isgive. I view of the miimax rate give i.5), Assumptio A1) requires β to be a good estimator of β at at least oe poit θ 0 0 k). Assumptio A) is slightly stroger tha A1) ad requires β to estimate β well for a sigle β but over a rage of oise levels σ σ 0 while = I. Of course, ay estimator β satisfyig.5) satisfies both A1) ad A). I additio to Assumptios A1) ad A), we also itroduce the followig sparsity assumptios that will be used i various theorems: B1) Let c 0 be the costat defied i 9.14). The sparsity levels k ad k 0 satisfy 1 k c 0 mi{p γ, } for some costat 0 γ< 1 ad 1 k 0 c 0 mi{k, }. B) The sparsity levels k 1,k ad k 0 satisfy 1 k 1 k c 0 mi{p γ, } for some costat 0 γ< 1 ad c 0 > 0ad1 k 0 c 0 mi{k 1, }... Miimax estimatio of the l q loss over 0 k). The followig theorem establishes the miimax lower bouds for estimatig the loss β β q over the parameter space 0 k).

8 1814 T. T. CAI AND Z. GUO THEOREM 1. Suppose that the sparsity levels k ad k 0 satisfy Assumptio B1). For ay estimator β satisfyig Assumptio A1) with β 0 k 0, {.8) if sup P θ L β β c mi k ) L, 1 }σ 0 δ. θ 0 k) For ay estimator β satisfyig Assumptio A) with β 0 k 0,.9) if L q sup θ 0 k) P θ L q β β q ck q ) σ 0 δ where δ>0 ad c>0 are costats. for 1 q<, REMARK 1. Assumptio A1) restricts our focus to estimators that ca perform well at at least oe poit β, I,σ 0 ) 0 k). This weak coditio makes the established lower bouds widely applicable as the bechmark for evaluatig estimators of the l q loss of ay β that performs well at a proper subset, or eve a sigle poit of the whole parameter space. I this paper, we focus o estimatig the loss β β q with 1 q. Similar results ca be established for the loss i the form of β β q q with 1 q. Uder the same assumptios as those i Theorem 1, the lower bouds for estimatig the loss β β q q hold with replacig the covergece rates with their q power; that is,.8) remais the same while the covergece rate k q /σ 0 ) i.9) is replaced by k /σ 0 ) q. Similarly, all the results established i the rest of the paper for β β q hold for β β q q with correspodig covergece rates replaced by their q power. Theorem 1 establishes the miimax lower bouds for estimatig the l loss β β of ay estimator β satisfyig Assumptio A1) ad the l q loss β β q with 1 q< of ay estimator β satisfyig Assumptio A). We will take the Lasso estimator as a example ad demostrate the implicatios of the above theorem. We radomly split Z = y, X) ito subsamples Z 1) = y 1),X 1) ) ad Z ) = y ),X ) ) with sample sizes 1 ad, respectively. The Lasso estimator β L based o the first subsample Z 1) = y 1),X 1) ) is defied as.10) β L y 1) X 1) β = arg mi β R p 1 + λ p j=1 X 1) j 1 β j, where λ = A / 1 σ 0 with A> beig a pre-specified costat. Without loss of geerality, we assume 1. For the case 1 q<,.5) ad.9) together imply that the estimatio of the l q loss β L β q is impossible sice the lower boud ca be achieved by the trivial estimator of the loss, 0, that is, sup θ 0 k) P θ 0 β L β q Ck q ) 0.

9 HIGH-DIMENSIONAL ACCURACY ASSESSMENT 1815 k For the case q =, i the regime k, the lower boud i.8) ca be achieved by the zero estimator, ad hece estimatio of the loss β L β is impossible. However, the iterestig case is whe k log p,theloss estimator L proposed i.11) achieves the miimax lower boud 1 i.8), which caot be achieved by the zero estimator. We ow detail the costructio of the loss estimator L. Based o the secod half-sample Z ) = y ),X ) ),we propose the followig estimator:.11) L 1 = y ) X ) β L ) σ 0. Note that the first subsample Z 1) = y 1),X 1) ) is used to produce the Lasso estimator β L i.10) ad the secod subsample Z ) = y ),X ) ) is retaied to evaluate the loss β L β. Such sample splittig techique is similar to crossvalidatio ad has bee used i [] for costructig cofidece sets for β ad i [0] for cofidece itervals for the l loss. The followig propositio establishes that the estimator L achieves the miimax lower boud of.8) over the regime k log p. PROPOSITION 1. Suppose that k ad β L is the Lasso estimator defied i.10) with A>, the the estimator of loss proposed i.11) satisfies, for ay sequece δ,p,.1) lim sup sup,p θ 0 k) P θ L β L β ) 1 δ,p = Miimax estimatio of the l q loss over k). We ow tur to the case of ukow ad σ ad establish the miimax lower boud for estimatig the l q loss over the parameter space k). THEOREM. Suppose that the sparsity levels k ad k 0 satisfy Assumptio B1). For ay estimator β satisfyig Assumptio A1) with β 0 k 0, ) sup P θ L q β β q δ, 1 q,.13) if L q θ k) where δ>0 ad c>0 are costats. ck q Theorem provides a miimax lower boud for estimatig the l q loss of ay estimator β satisfyig Assumptio A1), icludig the scaled Lasso estimator defied as.14) { β SL, ˆσ } y = arg mi Xβ + σ p β R p,σ R + σ + λ X j 0 β j, + j=1

10 1816 T. T. CAI AND Z. GUO where λ 0 = A / with A>. Note that for the scaled Lasso estimator the lower boud i.13) ca be achieved by the trivial loss estimator 0, i the sese, sup θ k) P θ 0 β SL β q Ck q ) 0, ad hece estimatio of loss is impossible i this case. 3. Miimaxity ad adaptivity of cofidece itervals over 0 k). We focused i the last sectio o poit estimatio of the l q loss ad showed the impossibility of loss estimatio except for oe regime. The results aturally lead to aother questio: Is it possible to costruct useful cofidece itervals for β β q that ca provide otrivial upper ad lower bouds for the loss? I this sectio, after itroducig the framework for miimaxity ad adaptivity of cofidece itervals, we cosider the case of kow = Iadσ = σ 0 ad establish the miimaxity ad adaptivity lower bouds for the expected legth of cofidece itervals for the l q loss of a broad collectio of estimators over the parameter space 0 k). Wealso show that such miimax lower bouds ca be achieved for the Lasso estimator ad the discuss the possibility of adaptivity usig the Lasso estimator as a example. The case of ukow ad σ will be the focus of the ext sectio Framework for miimaxity ad adaptivity of cofidece itervals. I this sectio, we itroduce the followig decisio theoretical framework for cofidece itervals of the loss β β q.give0<α<1 ad the parameter space ad the loss β β q, deote by I α, β,l q ) the set of all 1 α) level cofidece itervals for β β q over, I α, β,l { q ) = CI α β,l q,z)= [ lz),uz) ] : 3.1) if P θ β βθ) θ q CI α β,l q,z) ) 1 α}. We will write CI α for CI α β,l q,z) whe there is o cofusio. For ay cofidece iterval CI α β,l q,z)=[lz),uz)], its legth is deoted by LCI α β,l q, Z)) = uz) lz) ad the maximum expected legth over a parameter space 1 is defied as 3.) L ) CI α β,l q,z), 1 = sup E θ L CI α β,l q,z) ). θ 1 For two ested parameter spaces 1, we defie the bechmark L α 1,, β,l q ), measurig the degree of adaptivity over the ested spaces 1, 3.3) L α 1,, β,l q ) = if CI α β,l q,z) I α, β,l q ) sup E θ L CI α β,l q,z) ). θ 1 We will write L α 1, β,l q ) for L α 1, 1, β,l q ), which is the miimax expected legth of cofidece itervals for β β q over 1. The bechmark L α 1,, β,l q ) is the ifimum of the maximum expected legth over 1 amog

11 HIGH-DIMENSIONAL ACCURACY ASSESSMENT 1817 FIG. 1. The plot demostrates defiitios of L α 1, β,l q ) ad L α 1,, β,l q ). all 1 α)-level cofidece itervals over. I cotrast, L α 1, β,l q ) is cosiderig all 1 α)-level cofidece itervals over 1. I words, if there is prior iformatio that the parameter lies i the smaller parameter space 1, L α 1, β,l q ) measures the bechmark legth of cofidece itervals over the parameter space 1, which is illustrated i the left of Figure 1; however, if there is oly prior iformatio that the parameter lies i the larger parameter space, L α 1,, β,l q ) measures the bechmark legth of cofidece itervals over the parameter space 1, which is illustrated i the right of Figure 1. Rigorously, we defie a cofidece iterval CI to be simultaeously adaptive over 1 ad if CI I α, β,l q ), 3.4) L CI ), 1 L α 1, β,l q ) ad L CI ), L α, β,l q ). The coditio 3.4) meas that the cofidece iterval CI, which has coverage over the larger parameter space, achieves the miimax rate over both 1 ad. Note that LCI, 1 ) L α 1,, β,l q ).IfL α 1,, β,l q ) L α 1, β,l q ), the the rate-optimal adaptatio 3.4) is impossible to achieve for 1. Otherwise, it is possible to costruct cofidece itervals simultaeously adaptive over parameter spaces 1 ad. The possibility of adaptatio over parameter spaces 1 ad ca thus be aswered by ivestigatig the bechmark quatities L α 1, β,l q ) ad L α 1,, β,l q ). Such framework has already bee itroduced i [7], which studies the miimaxity ad adaptivity of cofidece itervals for liear fuctioals i high-dimesioal liear regressio. We will adopt the miimax ad adaptatio framework discussed above ad establish the miimax expected legth L α 0k), β,l q ) ad the adaptatio bechmark L α 0k 1 ), 0 k ), β,l q ). I terms of the miimax expected legth ad the adaptivity behavior, there exist fudametal differeces betwee the case q = ad 1 q<. We will discuss them separately i the followig two subsectios. 3.. Cofidece itervals for the l loss over 0 k). The followig theorem establishes the miimax lower boud for the expected legth of cofidece itervals of β β over the parameter space 0k). THEOREM 3. Suppose that 0 <α< 1 4 ad the sparsity levels k ad k 0 satisfy Assumptio B1). For ay estimator β satisfyig Assumptio A1) with β 0

12 1818 T. T. CAI AND Z. GUO k 0, the there is some costat c>0 such that 3.5) L α { 0 k), β,l ) k 1 c mi, }σ 0. I particular, if β L is the Lasso estimator defied i.10) with A>, the the miimax expected legth for 1 α) level cofidece itervals of β L β over 0 k) is 3.6) L α { 0 k), β L ) k 1,l mi, }σ 0. We ow cosider adaptivity of cofidece itervals for the l loss. The followig theorem gives the lower boud for the bechmark L α 0k 1 ), 0 k ), β,l ). We will the discuss Theorems 3 ad 4 together. THEOREM 4. Suppose that 0 <α< 1 4 ad the sparsity levels k 1, k ad k 0 satisfy Assumptio B). For ay estimator β satisfyig Assumptio A1) with β 0 k 0, the there is some costat c>0 such that 3.7) L α { 0 k 1 ), 0 k ), β,l ) k 1 c mi, }σ 0. I particular, if β L is the Lasso estimator defied i.10) with A>, the above lower boud ca be achieved. The lower boud established i Theorem 4 implies that of Theorem 3 ad both lower bouds hold for a geeral class of estimators satisfyig Assumptio A1). There is a phase trasitio for the lower boud of the bechmark L α 0k 1 ), 0 k ), β,l ). I the regime k k σ0 ;whe k log p estimator β L defied i.10), the lower boud k, the lower boud i 3.7)is 1 σ 0, the lower boud i 3.7) is. For the Lasso σ0 i 3.5) ad k σ0 i 3.7) ca be achieved by the cofidece itervals CI 0 α Z, k, ) ad CI0 α Z, k, ) defied i 3.15), respectively. Applyig a similar idea to.11), we show that the miimax lower boud 1 σ0 i 3.6) ad3.7) ca be achieved by the followig cofidece iterval: ) ) ) 3.8) CI 1 α Z) = ψz) ψz),, 1 χ1 α ) σ χ α ) σ 0 where χ1 α ) ad χ α ) are the 1 α ad α quatiles of χ radom variable with degrees of freedom, respectively, ad { 1 3.9) ψz)= mi y ) X ) β L,σ 0 }. +

13 HIGH-DIMENSIONAL ACCURACY ASSESSMENT 1819 Note that the two-sided cofidece iterval 3.8) is simply based o the observed data Z, ot depedig o ay prior kowledge of the sparsity k. Furthermore,it is a two-sided cofidece iterval, which tells ot oly just a upper boud, but also a lower boud for the loss. The coverage property ad the expected legth of CI 1 α Z) are established i the followig propositio. PROPOSITION. Suppose that k log p ad β L is the estimator defied i.10) with A>. The CI 1 α Z) defied i 3.8) satisfies 3.10) lim if if,p P β L β θ 0 k) CI1 α Z)) 1 α, ad 3.11) L CI 1 α Z), 0k) ) 1 σ 0. Regardig the Lasso estimator β L defied i.10), we will discuss the possibility of adaptivity of cofidece itervals for β L β. The adaptivity behavior of cofidece itervals for β L β is demostrated i Figure. As illustrated i the rightmost plot of Figure, i the regime k 1 k log p, we obtai L α 0k 1 ), 0 k ), β L,l ) L α 0k 1 ), β L,l ) 1, which implies that adaptatio is possible over this regime. As show i Propositio, the cofidece iterval CI 1 α Z) defied i 3.8) is fully adaptive over the regime k log p i the sese of 3.4). Illustrated i the leftmost ad middle plots of Figure, it is impossible to costruct a adaptive cofidece iterval for β L β over regimes k 1 k ad k 1 k sice L α 0k 1 ), 0 k ), β L,l ) FIG.. Illustratio of L α 0k 1 ), β L,l ) top) ad L α 0k 1 ), 0 k ), β L,l ) bottom) over regimes k 1 k leftmost), k 1 k log p middle) ad k 1 k log p rightmost).

14 180 T. T. CAI AND Z. GUO L α 0k 1 ), β L,l ) if k 1 ad k 1 k. To sum up, adaptive cofidece itervals for β L β are oly possible over the regime k log p. Compariso with cofidece balls. We should ote that the problem of costructig cofidece itervals for β β is related to but differet from that of costructig cofidece sets for β itself. Cofidece balls costructed i [] are of form {β : β β u Z)},where β ca be the Lasso estimator ad u Z) is a data depedet squared radius. See [] for further details. A aive applicatio of this cofidece ball leads to a oe-sided cofidece iterval for the loss β β, 3.1) CI iduced α Z) = { β β : β β u Z) }. Due to the reaso that cofidece sets for β were sought for i Theorem 1 i [], cofidece sets i the form {β : β β u Z)} will suffice to achieve the optimal legth. However, sice our goal is to characterize β β, we apply the ubiased risk estimatio discussed i Theorem 1 of [] ad costruct the two-sided cofidece iterval i 3.8). Such a two-sided cofidece iterval is more iformative tha the oe-sided cofidece iterval 3.1) sice the oe-sided cofidece iterval does ot cotai the iformatio whether the loss is close to zero or ot. Furthermore, as show i [], the legth of cofidece iterval CI iduced α Z) over 1 the parameter space 0 k) is of order + k. The two-sided cofidece iterval CI 1 1 α Z) costructed i 3.8) is of expected legth, which is much shorter tha 1 + k i the regime k, that is, the two-sided cofidece iterval 3.8) provides a more accurate iterval estimator of the l loss. This is illustrated i Figure 3. The lower boud techique developed i the literature of adaptive cofidece sets [] ca also be used to establish some of the lower boud results for the case q = give i the preset paper. However, ew techiques are eeded i order to establish the rate sharp lower bouds for the miimax estimatio error.9) i the regio k ad for the expected legth of the cofidece itervals 3.18) ad7.3) i the regio k 1 k log p, where it is ecessary to test FIG. 3. Compariso of the two-sided cofidece iterval CI 1 α Z) with the oe-sided cofidece iterval CI iduced α Z).

15 HIGH-DIMENSIONAL ACCURACY ASSESSMENT 181 a composite ull agaist a composite alterative i order to establish rate sharp lower bouds Cofidece itervals for the l q loss with 1 q< over 0 k). We ow cosider the case 1 q< ad ivestigate the miimax expected legth ad adaptivity of cofidece itervals for β β q over the parameter space 0k). The followig theorem characterizes the miimax covergece rate for the expected legth of cofidece itervals. THEOREM 5. Suppose that 0 <α< 1 4,1 q< ad the sparsity levels k ad k 0 satisfy Assumptio B1). For ay estimator β satisfyig Assumptio A) with β 0 k 0, the there is some costat c>0 such that 3.13) L α 0 k), β,l ) q ck q σ 0. I particular, if β L is the Lasso estimator defied i.10) with A>, the the miimax expected legth for 1 α) level cofidece itervals of β L β q over 0 k) is 3.14) L α 0 k), β L ),l q k q σ 0. We ow costruct the cofidece iterval achievig the miimax covergece rate i 3.14), 3.15) CI 0 α Z,k,q)= 0,C A, k)k q ), Aσ 0 ) 3η 0 η, 0 +1 Aσ 0) k ) η k 0) where C A, k) = max{ } with η 0 = 1.01 ) 4 1 A+. The followig propositio establishes the coverage property ad the ex- A pected legth of CI 0 α Z,k,q). PROPOSITION 3. Suppose that 1 k ad β L is the estimator defied i.10) with A>. For 1 q, the cofidece iterval CI 0 α Z,k,q) defied i 3.15) satisfies 3.16) lim if if β β q CI0 α Z,k,q)) = 1, ad P θ,p θ 0 k) 3.17) L CI 0 α Z,k,q), 0k) ) k q σ 0. I particular, for the case q =, 3.16) ad 3.17) also hold for the estimator β L defied i.10) with A>.

16 18 T. T. CAI AND Z. GUO This result shows that the cofidece iterval CI 0 α Z,k,q) achieves the miimax rate give i 3.14). I cotrast to the l loss where the two-sided cofidece iterval 3.8) is sigificatly shorter tha the oe-sided iterval ad achieves the optimal rate over the regime k log p,forthel q loss with 1 q<, the oe-sided cofidece iterval achieves the optimal rate give i 3.14). We ow cosider adaptivity of cofidece itervals. The followig theorem establishes the lower bouds for L α 0k 1 ), 0 k ), β,l q ) with 1 q<. THEOREM 6. Suppose 0 <α< 1 4,1 q<ad the sparsity levels k 1, k ad k 0 satisfy Assumptio B). For ay estimator β satisfyig Assumptio A) with β 0 k 0, the there is some costat c>0 such that 3.18) q ck σ 0 if k 1 k, L α 0 k 1 ), 0 k ), β,l ) q q ck 1 1 σ 0 if k 1 k, q ck 1 k 1 σ 0 if k 1 k. I particular, if p ad β L is the Lasso estimator defied i.10) with A> 4, the above lower bouds ca be achieved. The lower bouds of Theorem 6 imply that of Theorem 5 ad both lower bouds hold for a geeral class of estimators satisfyig Assumptio A). However, the lower boud 3.18) i Theorem 6 has a sigificatly differet meaig from 3.13) i Theorem 5 where 3.18) quatifies the cost of adaptatio without kowig the sparsity level. For the Lasso estimator β L defied i.10), by comparig Theorem 5 ad Theorem 6, we obtai L α 0k 1 ), 0 k ), β L,l q ) L α 0k 1 ), β L,l q ) if k 1 k, which implies the impossibility of costructig adaptive cofidece itervals for the case 1 q<. There exists marked differece betwee the case 1 q< ad the case q =, where it is possible to costruct adaptive cofidece itervals over the regime k log p. For the Lasso estimator β L defied i.10), it is show i Propositio 3 that the cofidece iterval CI 0 α Z, k,q) defied i 3.15) achieves the lower boud q k σ 0 of 3.18). The lower bouds k q 1 k 1 σ 0 ad k q 1 1 σ0 of 3.18) ca be achieved by the followig proposed cofidece iterval: CI α Z, k,q) ) ) ) 3.19) ψz) = 1 χ1 α ) σ 0,16k ) q 1 ψz) 1 + χ α ) σ 0, +

17 HIGH-DIMENSIONAL ACCURACY ASSESSMENT 183 where ψz) is give i 3.9). The above claim is verified i Propositio 4. Note that the cofidece iterval CI 1 α Z) defied i 3.8) is a special case of CI α Z, k,q)with q =. PROPOSITION 4. Suppose p, 1 k 1 k log p ad β L is defied i.10) with A>4. The CI α Z, k,q)defied i 3.19) satisfies 3.0) lim if if β β q CI α Z, k,q) ) 1 α, ad P θ,p θ 0 k ) 3.1) L CI α Z, k,q), 0 k 1 ) ) k q 1 k )σ Miimaxity ad adaptivity of cofidece itervals over k). I this sectio, we focus o the case of ukow ad σ ad establish the miimax expected legth of cofidece itervals for β β q with 1 q over k) defied i.4). We also study the possibility of adaptivity of cofidece itervals for β β q. The followig theorem establishes the lower bouds for the bechmark quatities L α k i), β,l q ) with i = 1, adl α k 1), k ), β,l q ). THEOREM 7. Suppose that 0 <α< 1 4,1 q ad the sparsity levels k 1, k ad k 0 satisfy Assumptio B). For ay estimator β satisfyig Assumptio A1) at θ 0 = β, I,σ 0 ) with β 0 k 0, there is a costat c>0 such that L α ki ), β,l ) q ck q 4.1) i for i = 1,, L α {θ0 }, k ), β,l 4.) ) q ck q. I particular, if β SL is the scaled Lasso estimator defied i.14) with A>, the the above lower bouds ca be achieved. The lower bouds 4.1) ad4.) hold for ay β satisfyig Assumptio A1) at a iterior poit θ 0 = β, I,σ 0 ), icludig the scaled Lasso estimator as a special case. We demostrate the impossibility of adaptivity of cofidece itervals for the l q loss of the scaled Lasso estimator β SL.SiceL α k 1), k ), β SL,l q ) L α {θ 0}, k ), β SL,l q ), by 4.), we have L α k 1), k ), β SL,l q ) L α k 1), β SL,l q ) if k 1 k. The compariso of L α k 1), β SL,l q ) ad L α k 1), k ), β SL,l q ) is illustrated i Figure 4. Referrig to the adaptivity defied i 3.4), it is impossible to costruct adaptive cofidece itervals for β SL β q. Theorem 7 shows that for ay cofidece iterval CI α β,l q,z) for the loss of ay give estimator β satisfyig Assumptio A1), uder the coverage costrait that CI α β,l q,z) I α k ), β,l q ), its expected legth at ay give

18 184 T. T. CAI AND Z. GUO FIG. 4. Illustratio of L α k 1), β SL,l q ) left) ad L α k 1), k ), β SL,l q ) right). θ 0 = β q, I,σ) k 0 ) must be of order k. I cotrast to Theorems 4 ad 6, Theorem 7 demostrates that cofidece itervals must be log at a large subset of poits i the parameter space, ot just at a small umber of ulucky poits. Therefore, the lack of adaptivity for cofidece itervals is ot due to the coservativeess of the miimax framework. I the followig, we detail the costructio of cofidece itervals for β SL β q. The costructio of cofidece itervals is based o the followig defiitio of restricted eigevalue, which is itroduced i [4]: 4.3) κx,k,s,α 0 ) = mi J 0 {1,...,p}, J 0 k mi δ 0, δ J c 1 α 0 δ J0 1 0 Xδ δj01, where J 1 deotes the subset correspodig to the s largest i absolute value coordiates of δ outside of J 0 ad J 01 = J 0 J 1. Defie the evet B ={ˆσ }. The cofidece iterval for β SL β q is defied as {[ ] 0,ϕZ,k,q) o B, 4.4) CI α Z,k,q)= {0} o B c, where { ϕz,k,q) = mi 16Amax X j σ κ X,k,k,3 max X j mi X j )) ) k q, k q ) σ }. REMARK. The restricted eigevalue κ X,k,k,3 max X j mi X j )) is computatioally ifeasible. For desig covariace matrix of special structures, the restricted eigevalue ca be replaced by its lower boud ad a computatioally feasible cofidece iterval ca be costructed. See Sectio 4.4 i [7] for more details. Properties of CI α Z,k,q)are established as follows. PROPOSITION 5. Suppose that 1 k ad β SL is the estimator defied i.14) with A>. For 1 q, the CI α Z,k,q)defied i 4.4) satisfies

19 the followig properties: 4.5) 4.6) lim if HIGH-DIMENSIONAL ACCURACY ASSESSMENT 185 if P θ,p θ k) β β q CI αz,k,q) ) = 1, L CI α Z,k,q), k) ) k q. Propositio 5 shows that the cofidece iterval CI α Z, k i,q) defied i 4.4) achieves the lower boud i 4.1), for i = 1,, ad the cofidece iterval CI α Z, k,q)defied i 4.4) achieves the lower boud i 4.). 5. Estimatio of the l q loss of rate-optimal estimators. We have established miimax lower bouds for the estimatio accuracy of the loss of a broad class of estimators β satisfyig A1) or A) ad also demostrated that such miimax lower bouds are sharp for the Lasso ad scaled Lasso estimators. We ow show that the miimax lower bouds are sharp for the class of rate-optimal estimators satisfyig the followig Assumptio A). A) The estimator β satisfies 5.1) sup P θ β β q C β q 0 1) ) Cp δ, θ k) for all 1 k log p,whereδ>0, C > 0adC>0are costats ot depedig o k, or p. Wesayaestimator β is rate-optimal if it satisfies Assumptio A). As show i [3, 4, 1, 6], Lasso, Datzig Selector, scaled Lasso ad square-root Lasso are rate-optimal whe the tuig parameter is chose properly. We shall stress that Assumptio A) implies Assumptios A1) ad A). Assumptio A) requires the estimator β to perform well over the whole parameter space k) while Assumptios A1) ad A) oly require β to perform well at a sigle poit or over a proper subset. The followig propositio shows that the miimax lower bouds established i Theorem 1 to Theorem 7 ca be achieved for the class of rate-optimal estimators. PROPOSITION 6. Let β be a estimator satisfyig Assumptio A): 1. There exist poit or iterval) estimators of the loss β β q with 1 q< achievig, up to a costat factor, the miimax lower bouds.9) i Theorem 1 ad 3.13) i Theorem 5 ad estimators of loss β β q with 1 q achievig, up to a costat factor, the miimax lower bouds.13) i Theorem ad 4.1) ad 4.) i Theorem 7.. Suppose that the estimator β is costructed based o the subsample Z 1) = y 1),X 1) ), the there exist estimators of the loss β β achievig, up to a costat factor, the miimax lower bouds.8) i Theorem 1, 3.5) i Theorem 3 ad 3.7) i Theorem 4.

20 186 T. T. CAI AND Z. GUO 3. Suppose the estimator β is costructed based o the subsample Z 1) = y 1),X 1) ) ad it satisfies Assumptio A) with δ> ad 5.) sup θ k) P θ β β) S c 1 c β β) 1 S where S = suppβ) ) Cp δ, for all k log p. The for p there exist estimators of the loss β β q with 1 q<achievig the lower bouds give i 3.18) i Theorem 6. For reasos of space, we do ot discuss the detailed costructio for the poit ad iterval estimators achievig these miimax lower bouds here ad postpoe the costructio to the proof of Propositio 6. REMARK 3. Sample splittig has bee widely used i the literature. For example, the coditio that β is costructed based o the subsample Z 1) = y 1),X 1) ) has bee itroduced i [] for costructig cofidece sets for β ad i [0] for costructig cofidece itervals for the l loss. Such a coditio is imposed purely for techical reasos to create idepedece betwee the estimator β ad the subsample Z ) = y ),X ) ), which is useful to evaluate the l q loss of the estimator β. Asshowi[4], the assumptio 5.) is satisfied for the Lasso ad Datzig Selector. This techical assumptio is imposed such that β β 1 ca be tightly cotrolled by β β. 6. Geeral tools for miimax lower bouds. A major step i our aalysis is to establish rate sharp lower bouds for the estimatio error ad the expected legth of cofidece itervals for the l q loss. We itroduce i this sectio ew techical tools that are eeded to establish these lower bouds. A sigificat distictio of the lower boud results give i the previous sectios from those for the traditioal parameter estimatio problems is that the costrait is o the performace of the estimator β of the regressio vector β, butthelower bouds are o the difficulty of estimatig its loss β β q. It is ecessary to develop ew lower boud techiques to establish rate-optimal lower bouds for the estimatio error ad the expected legth of cofidece itervals for the loss β β q. These techical tools may also be of idepedet iterest. We begi with otatio. Let Z deote a radom variable whose distributio is idexed by some parameter θ ad let π deote a prior o the parameter space. We will use f θ z) to deote the desity of Z give θ ad f π z) to deote the margial desity of Z uder the prior π. LetP π deote the distributio of Z correspodig to f π z), thatis,p π A) = 1 z A f π z) dz, where1 z A is the idicator fuctio. For a fuctio g, we write E π gz)) for the expectatio uder f π. More specifically, f π z) = f θ z)πθ) dθ ad E π gz)) = gz)f π z) dz. The L 1 distace betwee two probability distributios with desities f 0 ad f 1 is give by L 1 f 1,f 0 ) = f 1 z) f 0 z) dz. The followig theorem establishes the miimax lower bouds for the estimatio error ad the expected legth of

21 HIGH-DIMENSIONAL ACCURACY ASSESSMENT 187 cofidece itervals for the l q loss, uder the costrait that β is a good estimator at at least oe iterior poit. THEOREM 8. Suppose 0 <α,α 0 < 1 4,1 q, 0 is positive defiite, θ 0 = β, 0,σ 0 ) ad F. Defie d = mi θ F βθ) β q. Let π deote a prior over the parameter space F. If a estimator β satisfies 6.1) P θ0 β β q 1 16 d the 6.) if L q ad sup θ {θ 0 } F ) 1 α 0, P θ L q β β q 1 ) 4 d c 1, 6.3) L ) α {θ0 },, β,l q = if E θ0 L CI α β,l q,z) ) c d, CI α β,l q,z) I α, β,l q ) where c 1 = mi{ 1 10, 9 10 α 0 L 1 f π,f θ0 )) + } ad c = 1 1 α α 0 L 1 f π,f θ0 )) +. REMARK 4. The miimax lower boud 6.) for the estimatio error ad 6.3) for the expected legth of cofidece itervals hold as log as the estimator β estimates β well at a iterior poit θ 0. Besides Coditio 6.1), aother key igrediet for the lower bouds 6.) ad6.3) is to costruct the least favorable space F with the prior π such that the margial distributios f π ad f θ0 are odistiguishable. For the estimatio lower boud 6.), costraiig that β β q ca be well estimated at θ 0, due to the odistiguishability betwee f π ad f θ0,we ca establish that the loss β β q caot be estimated well over F.Forthelower boud 6.3), by Coditio 6.1) ad the odistiguishability betwee f π ad f θ0, we will show that β β q over F is much larger tha β β q, ad hece the hoest cofidece itervals must be sufficietly log. Theorem 8 is used to establish the miimax lower bouds for both the estimatio error ad the expected legth of cofidece itervals of the l q loss over k). By takig θ 0 k 0 ) ad = k), Theorem follows from 6.) with a properly costructed subset F k). Bytakigθ 0 k 0 ) ad = k ),thelower boud 4.) i Theorem 7 follows from 6.3) with a properly costructed F k ). I both cases, Assumptio A1) implies Coditio 6.1). Several miimax lower bouds over 0 k) ca also be implied by Theorem 8. For the estimatio error, the miimax lower bouds.8) ad.9) over the regime k i Theorem 1 follow from 6.). For the expected legth of cofidece itervals, the miimax lower bouds 3.7) i Theorem 4 ad 3.18) i the regios

22 188 T. T. CAI AND Z. GUO k 1 k ad k 1 k i Theorem 6 follow from 6.3). I these cases, Assumptios A1) or A) ca guaratee that Coditio 6.1) is satisfied. However, the miimax lower bouds for estimatio error.9) i the regio k log p ad for the expected legth of cofidece itervals 3.18) ithe regio k 1 k log p caot be established usig the above theorem. The followig theorem, which requires testig a composite ull agaist a composite alterative, establishes the refied miimax lower bouds over 0 k). THEOREM 9. Let 0 <α,α 0 < 1 4,1 q ad θ 0 = β, 0,σ 0 ) where 0 is a positive defiite matrix. Let k 1 ad k be two sparsity levels. Assume that for i = 1, there exist parameter spaces F i {β, 0,σ 0 ) : β 0 k i } such that for give dist i ad d i : βθ) β ) 0 βθ) β ) = dist i ad βθ) β q = d i for all θ F i. Let π i deote a prior over the parameter space F i for i = 1,. Suppose that for θ 1 = β, 0,σ0 + dist 1 ) ad θ = β, 0,σ0 + dist ), there exist costats c 1,c > 0 such that 6.4) P θi β β q ) c i d i 1 α0 for i = 1,. The we have 6.5) if L q ad sup P θ L q β β q c ) 3 d c3, θ F 1 F 6.6) L α 0 k 1 ), 0 k ), β,l q ) c 4 1 c ) d 1 + c 1) d 1 where c3 = mi{ 1 4,1 c ) c 1) d 1 ) d + }, c4 = 1 α 0 α i=1 L 1 f πi,f θi ) L 1 f π,f π1 )) + ad c 3 = mi{ 10 1, 10 9 α 0 i=1 L 1 f πi, f θi ) L 1 f π,f π1 )) + }. REMARK 5. As log as the estimator β performs well at two poits, θ 1 ad θ, the miimax lower bouds 6.5) for the estimatio error ad 6.6) for the expected legth of cofidece itervals hold. Note that θ i i the above theorem does ot belog to the parameter space {β, 0,σ 0 ) : β 0 k i },fori = 1,. I cotrast to Theorem 8, Theorem 9 compares composite hypotheses F 1 ad F, which will lead to a sharper lower boud tha comparig the simple ull {θ 0 } with the composite alterative F. For simplicity, we costruct least favorable parameter spaces F i such that the poits i F i is of fixed geeralized) l distace ad fixed l q distace to β,fori = 1,, respectively. More importatly, we costruct F 1 with the ) +,

23 HIGH-DIMENSIONAL ACCURACY ASSESSMENT 189 prior π 1 ad F with the prior π such that f π1 ad f π are ot distiguishable, where θ 1 ad θ are itroduced to facilitate the compariso. By Coditio 6.4) ad the costructio of F 1 ad F, we establish that the l q loss caot be simultaeously estimated well over F 1 ad F. For the lower boud 6.6), uder the same coditios, it is show that the l q loss over F 1 ad F are far apart ad ay cofidece iterval with guarateed coverage probability over F 1 F must be sufficietly log. Due to the prior iformatio = Iadσ = σ 0, the lower boud costructio over 0 k) is more ivolved tha that over k). We shall stress that the costructio of F 1 ad F ad the compariso betwee composite hypotheses are of idepedet iterest. The miimax lower boud.9) i the regio k follows from for 6.5) ad the miimax lower boud 3.18) i the regio k 1 k the expected legth of cofidece itervals follows from 6.6). I these cases, 0 is take as I ad Assumptio A) implies Coditio 6.4). 7. A itermediate settig with kow σ = σ 0 ad ukow. The results give i Sectios 3 ad 4 show the sigificat differece betwee 0 k) ad k) i terms of miimaxity ad adaptivity of cofidece itervals for β β q. 0 k) is for the simple settig with kow desig covariace matrix = Iad kow oise level σ = σ 0,ad k) is for ukow ad σ. I this sectio, we further cosider miimaxity ad adaptivity of cofidece itervals for β β q i a itermediate settig where the oise level σ = σ 0 is kow ad is ukow but of certai structure. Specifically, we cosider the followig parameter space: 1 β 0 k, λ mi ) λ max ) M 1 M 7.1) σ0 k, s) = β,,σ 0 ) : 1 1 L1 M, max 1), i 0 s 1 i p for some costats M 1 1adM>0. σ0 k, s) basically assumes kow oise level σ ad imposes sparsity coditios o the precisio matrix of the radom desig. This parameter space is similar to those used i the literature of sparse liear regressio with radom desig [13, 14, 9]. σ0 k, s) has two sparsity parameters where k represets the sparsity of β ad s represets the maximum row sparsity of the precisio matrix 1. Note that 0 k) σ0 k, s) k) ad 0 k) is a special case of σ0 k, s) with M 1 = 1. Uder the assumptio s /, the miimaxity ad adaptivity lower bouds for the expected legth of cofidece itervals for β β q with 1 q< over σ0 k, s) are the same as those over 0 k). Thatis,Theorems5 ad 6 hold with 0 k 1 ), 0 k ),ad 0 k) replaced by σ0 k 1,s), σ0 k,s)ad σ0 k, s), respectively. For the case q =, the followig theorem establishes the miimaxity ad adaptivity lower bouds for the expected legth of cofidece itervals for β β over σ 0 k, s).

24 1830 T. T. CAI AND Z. GUO THEOREM 10. Suppose 0 <α,α 0 < 1/4, M 1 > 1, s / ad the sparsity levels k 1, k ad k 0 satisfy Assumptio B) with the costat c 0 replaced by c0 defied i 9.14). For ay estimator β satisfyig 7.) sup P θ β β q C β ) q 0 θ k 0 ) σ α 0, with a costat C > 0, the there is some costat c>0 such that ) σ0 k 1,s), σ0 k,s), β,l 7.3) ad 7.4) L α L α { { c mi k, max k 1, 1 }}σ 0 ) k i σ0 k i,s), β,l c σ0 ad i = 1,. I particular, if p ad β is costructed based o the subsample Z 1) = y 1),X 1) ) ad satisfies Assumptio A) with δ>, the above lower bouds ca be attaied. I cotrast to Theorems 3 ad 4, the lower bouds for the case q = chage i the absece of the prior kowledge = I but the possibility of adaptivity of cofidece itervals over σ0 k, s) is similar to that over 0 k). Sice the Lasso estimator β L defied i.10) with A>4 satisfies Assumptio A) with δ>, by Theorem 10, the miimax lower bouds 7.3) ad 7.4) ca be attaied for β L.For β L, oly whe k 1 k log p, L α σ 0 k 1,s), β L,l ) L α σ 0 k 1,s), σ0 k,s), β,l ) k 1 ad adaptatio betwee σ0 k 1,s) ad σ0 k,s) is possible. I other regimes, if k 1 k, the L α σ 0 k 1,s), β L,l ) L α σ 0 k 1,s), σ0 k,s), β,l ) ad adaptatio betwee σ0 k 1,s) ad σ0 k,s) is impossible. For reasos of space, more discussio o σ0 k, s), icludig the costructio of adaptive cofidece itervals over the regime k 1 k, is postpoed to the supplemet [6]. 8. Miimax lower bouds for estimatig β q with 1 q. The lower bouds developed i this paper have broader implicatios. I particular, the established results imply the miimax lower bouds for estimatig β q ad the expected legth of cofidece itervals for β q with 1 q. To build the coectio, it is sufficiet to ote that the trivial estimator β = 0 satisfies Assumptios A1) ad A) with β = 0. The we ca apply the lower bouds.8),.9) ad

Accuracy Assessment for High-Dimensional Linear Regression

Accuracy Assessment for High-Dimensional Linear Regression Uiversity of Pesylvaia ScholarlyCommos Statistics Papers Wharto Faculty Research -016 Accuracy Assessmet for High-Dimesioal Liear Regressio Toy Cai Uiversity of Pesylvaia Zijia Guo Uiversity of Pesylvaia