ACCURACY ASSESSMENT FOR HIGH-DIMENSIONAL LINEAR REGRESSION 1. BY T. TONY CAI AND ZIJIAN GUO University of Pennsylvania and Rutgers University

Size: px
Start display at page:

Download "ACCURACY ASSESSMENT FOR HIGH-DIMENSIONAL LINEAR REGRESSION 1. BY T. TONY CAI AND ZIJIAN GUO University of Pennsylvania and Rutgers University"

Transcription

1 The Aals of Statistics 018, Vol. 46, No. 4, Istitute of Mathematical Statistics, 018 ACCURACY ASSESSMENT FOR HIGH-DIMENSIONAL LINEAR REGRESSION 1 BY T. TONY CAI AND ZIJIAN GUO Uiversity of Pesylvaia ad Rutgers Uiversity This paper cosiders poit ad iterval estimatio of the l q loss of a estimator i high-dimesioal liear regressio with radom desig. We establish the miimax rate for estimatig the l q loss ad the miimax expected legth of cofidece itervals for the l q loss of rate-optimal estimators of the regressio vector, icludig commoly used estimators such as Lasso, scaled Lasso, square-root Lasso ad Datzig Selector. Adaptivity of cofidece itervals for the l q loss is also studied. Both the settig of the kow idetity desig covariace matrix ad kow oise level ad the settig of ukow desig covariace matrix ad ukow oise level are studied. The results reveal iterestig ad sigificat differeces betwee estimatig the l loss ad l q loss with 1 q< as well as betwee the two settigs. New techical tools are developed to establish rate sharp lower bouds for the miimax estimatio error ad the expected legth of miimax ad adaptive cofidece itervals for the l q loss. A sigificat differece betwee loss estimatio ad the traditioal parameter estimatio is that for loss estimatio the costrait is o the performace of the estimator of the regressio vector, but the lower bouds are o the difficulty of estimatig its l q loss. The techical tools developed i this paper ca also be of idepedet iterest. 1. Itroductio. I may applicatios, the goal of statistical iferece is ot oly to costruct a good estimator, but also to provide a measure of accuracy for this estimator. I classical statistics, whe the parameter of iterest is oedimesioal, this is achieved i the form of a stadard error or a cofidece iterval. A prototypical example is the iferece for a biomial proportio, where ofte ot oly a estimate of the proportio but also its margi of error are give. Accuracy measures of a estimatio procedure have also bee used as a tool for the empirical selectio of tuig parameters. A well-kow example is Stei s Ubiased Risk Estimate SURE), which has bee a effective tool for the costructio of data-drive adaptive estimators i ormal meas estimatio, oparametric sigal recovery, covariace matrix estimatio, ad other problems. See, for istace, [11, 15, 1, 5, 3]. The commoly used cross-validatio methods ca also be viewed as a useful tool based o the idea of empirical assessmet of accuracy. Received March 016; revised March Supported i part by NSF Grats DMS ad DMS , ad NIH Grat R01 CA MSC010 subject classificatios. Primary 6G15; secodary 6C0, 6H35. Key words ad phrases. Accuracy assessmet, adaptivity, cofidece iterval, high-dimesioal liear regressio, loss estimatio, miimax lower boud, miimaxity, sparsity. 1807

2 1808 T. T. CAI AND Z. GUO I this paper, we cosider the problem of estimatig the loss of a give estimator i the settig of high-dimesioal liear regressio, where oe observes X, y) with X R p ad y R,adfor1 i, y i = X i β + ε i. Here, β R p i.i.d. is the regressio vector, X i N p 0, )are the rows of X ad the i.i.d. errors ε i N0,σ ) are idepedet of X. This high-dimesioal liear model has bee well studied i the literature, where the mai focus has bee o estimatio of β. Several pealized/costraied l 1 miimizatio methods, icludig Lasso [8], Datzig selector [1], scaled Lasso [6] ad square-root Lasso [3], have bee proposed. These methods have bee show to work well i applicatios ad produce iterpretable estimates of β whe β is assumed to be sparse. Theoretically, with a properly chose tuig parameter, these estimators achieve the optimal rate of covergece over collectios of sparse parameter spaces; see, for example, [3 5, 1, 3, 6, 30]. For a give estimator β, thel q loss β β q with 1 q is commoly used as a metric of accuracy for β. We cosider i the preset paper both poit ad iterval estimatio of the l q loss β β q for a give β. Note that the loss β β q is a radom quatity, depedig o both the estimator β ad the parameter β. For such a radom quatity, predictio ad predictio iterval are usually used for poit ad iterval estimatio, respectively. However, we slightly abuse the termiologies i the preset paper by usig estimatio ad cofidece iterval to represet the poit ad iterval estimators of the loss β β q. Sice the l q loss depeds o the estimator β, it is ecessary to specify the estimator i the discussio of loss estimatio. Throughout this paper, we restrict our attetio to a broad collectio of estimators β that perform well at least at oe iterior poit or a small subset of the parameter space. This collectio of estimators icludes most state-ofart estimators such as Lasso, Datzig selector, scaled Lasso ad square-root Lasso. High-dimesioal liear regressio has bee well studied i two settigs. Oe is the settig with kow desig covariace matrix = I, kow oise level σ = σ 0 ad sparse β; seeforexample,[1,, 7, 16, 19,, 7, 30]. Aother commoly cosidered settig is sparse β with ukow ad σ. We study poit ad iterval estimatio of the l q loss β β q i both settigs. Specifically, we cosider the parameter space 0 k) itroduced i.3), which cosists of k-sparse sigals β with kow desig covariace matrix = I ad kow oise level σ = σ 0,ad k) defied i.4), which cosists of k-sparse sigals with ukow ad σ Our cotributios. The preset paper studies the miimax ad adaptive estimatio of the loss β β q for a give estimator β ad the miimax expected legth ad adaptivity of cofidece itervals for the loss. A major step i our aalysis is to establish rate sharp lower bouds for the miimax estimatio error ad

3 HIGH-DIMENSIONAL ACCURACY ASSESSMENT 1809 the miimax expected legth of cofidece itervals for the l q loss over 0 k) ad k) for a broad class of estimators of β, which cotais the subclass of rateoptimal estimators. We the focus o the estimatio of the loss of rate-optimal estimators ad take the Lasso ad scaled Lasso estimators as geeric examples. For these rate-optimal estimators, we propose procedures for poit estimatio as well as cofidece itervals for their l q losses. It is show that the proposed procedures achieve the correspodig lower bouds up to a costat factor. These results together establish the miimax rates for estimatig the l q loss of rate-optimal estimators over 0 k) ad k). The aalysis shows iterestig ad sigificat differeces betwee estimatig the l loss ad l q loss with 1 q<aswellas betwee the two parameter spaces k) ad 0 k): The miimax rate for estimatig β β over 0k) is mi{ 1 over k) is k = Iadσ = σ 0 whe,k } ad. So loss estimatio is much easier with the prior iformatio. k The miimax rate for estimatig β β q with 1 q<over both 0k) ad k) is k q. I the regime k, a practical loss estimator is proposed for estimatig the l loss ad show to achieve the optimal covergece rate 1 adaptively over 0 k). Wesayestimatio of loss is impossible if the miimax rate ca be achieved by the trivial estimator 0, which meas that the estimatio accuracy of the loss is at least of the same order as the loss itself. I all other cosidered cases, estimatio of loss is show to be impossible. These results idicate that loss estimatio is difficult. We the tur to the costructio of cofidece itervals for the l q loss. A cofidece iterval for the loss is useful eve whe it is impossible to estimate the loss, as a cofidece iterval ca provide otrivial upper ad lower bouds for the loss. I terms of covergece rate over 0 k) or k), the miimax rate of the expected legth of cofidece itervals for the l q loss, β β q, of ay rate-optimal estimator β coicides with the miimax estimatio rate. We also cosider the adaptivity of cofidece itervals for the l q loss of ay rate-optimal estimator β. The framework for adaptive cofidece itervals is discussed i detail i Sectio 3.1.) Regardig cofidece itervals for the l loss i the case of kow = Iad σ = σ 0, a procedure is proposed ad is show to achieve the optimal legth 1 adaptively over 0 k) for k log p. Furthermore, it is show that this is the oly regime where adaptive cofidece itervals exist, eve over two give parameter spaces. For example, whe k 1 ad k 1 k, it is impossible to costruct a cofidece iterval for the l loss with guarateed coverage probability over 0 k ) [cosequetly also over 0 k 1 )] ad with the expected legth automatically adjusted to the sparsity. Similarly, for the l q loss with 1 q<, co-

4 1810 T. T. CAI AND Z. GUO structio of adaptive cofidece itervals is impossible over 0 k 1 ) ad 0 k ) for k 1 k log p. Regardig cofidece itervals for the l q loss with 1 q i the case of ukow ad σ, the impossibility of adaptivity also holds over k 1 ) ad k ) for k 1 k log p. Establishig rate-optimal lower bouds requires the developmet of ew techical tools. Oe mai differece betwee loss estimatio ad the traditioal parameter estimatio is that for loss estimatio the costrait is o the performace of the estimator β of the regressio vector β, but the lower boud is o the difficulty of estimatig its loss β β q. We itroduce useful ew lower boud techiques for the miimax estimatio error ad the expected legth of adaptive cofidece itervals for the loss β β q. I several importat cases, it is ecessary to test a composite ull agaist a composite alterative i order to establish rate sharp lower bouds. The techical tools developed i this paper ca also be of idepedet iterest. I additio to 0 k) ad k), we also study a itermediate parameter space where the oise level σ is kow ad the desig covariace matrix is ukow but of certai structure. Lower bouds for the expected legth of miimax ad adaptive cofidece itervals for β β q over this parameter space are established for a broad collectio of estimators β ad are show to be rate sharp for the class of rate-optimal estimators. Furthermore, the lower bouds developed i this paper have wider implicatios. I particular, it is show that they lead immediately to miimax lower bouds for estimatig β q ad the expected legth of cofidece itervals for β q with 1 q. 1.. Compariso with other works. Statistical iferece o the loss of specific estimators of β has bee cosidered i the recet literature. The papers [, 16] established, i the settig = Iad/p δ 0, ), the limit of the ormalized loss p 1 βλ) β where βλ) is the Lasso estimator with a pre-specified tuig parameter λ. Although [, 16] provided a exact asymptotic expressio of the ormalized loss, the limit itself depeds o the ukow β. I a similar settig, the paper [7] established the limit of a ormalized l loss of the square-root Lasso estimator. These limits of the ormalized losses help uderstad the properties of the correspodig estimators of β, but they do ot lead to a estimate of the loss. Our results imply that although these ormalized losses have a limit uder certai regularity coditios, such losses caot be estimated well i most settigs. A recet paper, [0], costructed a cofidece iterval for β β i the case of kow = I, ukow oise level σ ad moderate dimesio where /p ξ 0, 1) ad o sparsity is assumed o β. While o sparsity assumptio o β is imposed, their method requires the assumptio of = Iad/p ξ 0, 1). I cotrast, i this paper, we cosider both ukow ad kow = I settigs, while allowig p ad assumig sparse β.

5 HIGH-DIMENSIONAL ACCURACY ASSESSMENT 1811 Hoest adaptive iferece has bee studied i the oparametric fuctio estimatio literature, icludig [9] for adaptive cofidece itervals for liear fuctioals, [8, 18] for adaptive cofidece bads ad [10, 4] for adaptive cofidece balls, ad i the high-dimesioal liear regressio literature, icludig [] for adaptive cofidece set ad [7] for adaptive cofidece iterval for liear fuctioals. I this paper, we develop ew lower boud tools, Theorems 8 ad 9, to establish the possibility of adaptive cofidece itervals for β β q. The coectio betwee l loss cosidered i the curret paper ad the work [] is discussed i more detail i Sectio Orgaizatio. Sectio establishes the miimax lower bouds of estimatig the loss β β q with 1 q over both 0k) ad k) ad shows that these bouds are rate sharp for the Lasso ad scaled Lasso estimators, respectively. We the tur to iterval estimatio of β β q. Sectios 3 ad 4 preset the miimax ad adaptive miimax lower bouds for the expected legth of cofidece itervals for β β q over 0k) ad k). For Lasso ad scaled Lasso estimators, we show that the lower bouds ca be achieved ad ivestigate the possibility of adaptivity. Sectio 5 cosiders the rate-optimal estimators ad establishes the miimax covergece rate of estimatig their l q losses. Sectio 6 presets ew miimax lower boud techiques for estimatig the loss β β q. Sectio 7 discusses the miimaxity ad adaptivity i aother settig, where the oise level σ is kow ad the desig covariace matrix is ukow but of certai structure. Sectio 8 applies the ewly developed lower bouds to establish lower bouds for a related problem, that of estimatig β q. Sectio 9 proves the mai results ad additioal proofs are give i the supplemetal material [6] Notatio. For a matrix X R p, X i, X j ad X i,j deote respectively the ith row, jth colum ad i, j) etry of the matrix X. For a subset J {1,,...,p}, J deotes the cardiality of J, J c deotes the complemet {1,,...,p}\J, X J deotes the submatrix of X cosistig of colums X j with j J ad for a vector x R p, x J is the subvector of x with idices i J.For a vector x R p, suppx) deotes the support of x ad the l q orm of x is defied as x q = p i=1 x i q ) q 1 for q 0 with x 0 = suppx) ad x = max 1 j p x j.fora R, a + = max{a,0}; fora,b R, a b = max{a,b}. We use max X j as a shorthad for max 1 j p X j ad mi X j as a shorthad for mi 1 j p X j.foramatrixa, we defie the spectral orm A = sup x =1 Ax p ad the matrix l 1 orm A L1 = sup 1 j p i=1 A ij ; for a symmetric matrix A, λ mi A) ad λ max A) deote respectively the smallest ad largest eigevalue of A. Weusec ad C to deote geeric positive costats that may vary from place to place. For two positive sequeces a ad b, a b meas a Cb for all ad a b if b a ad a b if a b ad a b a,ada b if lim sup b = 0ada b if b a.

6 181 T. T. CAI AND Z. GUO. Miimax estimatio of the l q loss. We begi by presetig the miimax framework for estimatig the l q loss, β β q, of a give estimator β, adthe establish the miimax lower bouds for the estimatio error for a broad collectio of estimators β. We also show that such miimax lower bouds ca be achieved for the Lasso ad scaled Lasso estimators..1. Problem formulatio. Recall the high-dimesioal liear model,.1) y 1 = X p β p 1 + ε 1, ε N 0,σ I ). i.i.d. We focus o the radom desig with X i N0, )ad X i ad ε i are idepedet. Let Z = X, y) deote the observed data ad β be a give estimator of β. Deotig by L q Z) ay estimator of the loss β β q, the miimax rate of covergece for estimatig β β q overaparameterspace is defied as the largest quatity γ β,l q ) such that.) if sup L q θ P θ L q Z) β β q γ β,l q ) ) δ, for some costat δ>0 ot depedig o or p. We shall write L q for L q Z) whe there is o cofusio. We deote the parameter by θ = β,,σ), which cosists of the sigal β, the desig covariace matrix ad the oise level σ.foragiveθ = β,,σ),we use βθ) to deote the correspodig β. Two settigs are cosidered: The first is kow desig covariace matrix = I ad kow oise level σ = σ 0 ad the other is ukow ad σ. I the first settig, we cosider the followig parameter space that cosists of k-sparse sigals:.3) 0 k) = { β, I,σ 0 ) : β 0 k }, ad i the secod settig, we cosider { 1 k) = β,,σ): β 0 k, λ mi ) λ max ) M 1, M 1.4) 0 <σ M }, where M 1 1adM > 0 are costats. The parameter space 0 k) is a subset of k), which cosists of k-sparse sigals with ukow ad σ. The miimax rate γ β,l q ) for estimatig β β q also depeds o the estimator β. Differet estimators β could lead to differet losses β β q ad i geeral the difficulty of estimatig the loss β β q varies with β. We first recall the properties of some state-of-art estimators ad the specify the collectio of estimators o which we focus i this paper. As show i [3, 4, 1, 6], Lasso, Datzig

7 HIGH-DIMENSIONAL ACCURACY ASSESSMENT 1813 Selector, scaled Lasso ad square-root Lasso satisfy the followig property if the tuig parameter is properly chose:.5) sup P θ β β q Ck ) q 0, θ k) where C>0is a costat. The miimax lower bouds established i [3, 30, 31] imply that k q is the optimal rate for estimatig β over the parameter space k). It should be stressed that all of these algorithms do ot require kowledge of the sparsity k ad are thus adaptive to the sparsity provided k. We cosider a broad collectio of estimators β satisfyig oe of the followig two assumptios: A1) The estimator β satisfies, for some θ 0 = β, I,σ 0 ),.6) P θ0 β β q C β q 0 1) ) σ 0 α 0, where 0 α 0 < 1 4 ad C > 0 are costats. A) The estimator β satisfies.7) sup P θ β β {θ=β q C β q 0 1) ),I,σ ):σ σ 0 } σ α 0, where 0 α 0 < 1 4 ad C > 0 are costats ad σ 0 > 0isgive. I view of the miimax rate give i.5), Assumptio A1) requires β to be a good estimator of β at at least oe poit θ 0 0 k). Assumptio A) is slightly stroger tha A1) ad requires β to estimate β well for a sigle β but over a rage of oise levels σ σ 0 while = I. Of course, ay estimator β satisfyig.5) satisfies both A1) ad A). I additio to Assumptios A1) ad A), we also itroduce the followig sparsity assumptios that will be used i various theorems: B1) Let c 0 be the costat defied i 9.14). The sparsity levels k ad k 0 satisfy 1 k c 0 mi{p γ, } for some costat 0 γ< 1 ad 1 k 0 c 0 mi{k, }. B) The sparsity levels k 1,k ad k 0 satisfy 1 k 1 k c 0 mi{p γ, } for some costat 0 γ< 1 ad c 0 > 0ad1 k 0 c 0 mi{k 1, }... Miimax estimatio of the l q loss over 0 k). The followig theorem establishes the miimax lower bouds for estimatig the loss β β q over the parameter space 0 k).

8 1814 T. T. CAI AND Z. GUO THEOREM 1. Suppose that the sparsity levels k ad k 0 satisfy Assumptio B1). For ay estimator β satisfyig Assumptio A1) with β 0 k 0, {.8) if sup P θ L β β c mi k ) L, 1 }σ 0 δ. θ 0 k) For ay estimator β satisfyig Assumptio A) with β 0 k 0,.9) if L q sup θ 0 k) P θ L q β β q ck q ) σ 0 δ where δ>0 ad c>0 are costats. for 1 q<, REMARK 1. Assumptio A1) restricts our focus to estimators that ca perform well at at least oe poit β, I,σ 0 ) 0 k). This weak coditio makes the established lower bouds widely applicable as the bechmark for evaluatig estimators of the l q loss of ay β that performs well at a proper subset, or eve a sigle poit of the whole parameter space. I this paper, we focus o estimatig the loss β β q with 1 q. Similar results ca be established for the loss i the form of β β q q with 1 q. Uder the same assumptios as those i Theorem 1, the lower bouds for estimatig the loss β β q q hold with replacig the covergece rates with their q power; that is,.8) remais the same while the covergece rate k q /σ 0 ) i.9) is replaced by k /σ 0 ) q. Similarly, all the results established i the rest of the paper for β β q hold for β β q q with correspodig covergece rates replaced by their q power. Theorem 1 establishes the miimax lower bouds for estimatig the l loss β β of ay estimator β satisfyig Assumptio A1) ad the l q loss β β q with 1 q< of ay estimator β satisfyig Assumptio A). We will take the Lasso estimator as a example ad demostrate the implicatios of the above theorem. We radomly split Z = y, X) ito subsamples Z 1) = y 1),X 1) ) ad Z ) = y ),X ) ) with sample sizes 1 ad, respectively. The Lasso estimator β L based o the first subsample Z 1) = y 1),X 1) ) is defied as.10) β L y 1) X 1) β = arg mi β R p 1 + λ p j=1 X 1) j 1 β j, where λ = A / 1 σ 0 with A> beig a pre-specified costat. Without loss of geerality, we assume 1. For the case 1 q<,.5) ad.9) together imply that the estimatio of the l q loss β L β q is impossible sice the lower boud ca be achieved by the trivial estimator of the loss, 0, that is, sup θ 0 k) P θ 0 β L β q Ck q ) 0.

9 HIGH-DIMENSIONAL ACCURACY ASSESSMENT 1815 k For the case q =, i the regime k, the lower boud i.8) ca be achieved by the zero estimator, ad hece estimatio of the loss β L β is impossible. However, the iterestig case is whe k log p,theloss estimator L proposed i.11) achieves the miimax lower boud 1 i.8), which caot be achieved by the zero estimator. We ow detail the costructio of the loss estimator L. Based o the secod half-sample Z ) = y ),X ) ),we propose the followig estimator:.11) L 1 = y ) X ) β L ) σ 0. Note that the first subsample Z 1) = y 1),X 1) ) is used to produce the Lasso estimator β L i.10) ad the secod subsample Z ) = y ),X ) ) is retaied to evaluate the loss β L β. Such sample splittig techique is similar to crossvalidatio ad has bee used i [] for costructig cofidece sets for β ad i [0] for cofidece itervals for the l loss. The followig propositio establishes that the estimator L achieves the miimax lower boud of.8) over the regime k log p. PROPOSITION 1. Suppose that k ad β L is the Lasso estimator defied i.10) with A>, the the estimator of loss proposed i.11) satisfies, for ay sequece δ,p,.1) lim sup sup,p θ 0 k) P θ L β L β ) 1 δ,p = Miimax estimatio of the l q loss over k). We ow tur to the case of ukow ad σ ad establish the miimax lower boud for estimatig the l q loss over the parameter space k). THEOREM. Suppose that the sparsity levels k ad k 0 satisfy Assumptio B1). For ay estimator β satisfyig Assumptio A1) with β 0 k 0, ) sup P θ L q β β q δ, 1 q,.13) if L q θ k) where δ>0 ad c>0 are costats. ck q Theorem provides a miimax lower boud for estimatig the l q loss of ay estimator β satisfyig Assumptio A1), icludig the scaled Lasso estimator defied as.14) { β SL, ˆσ } y = arg mi Xβ + σ p β R p,σ R + σ + λ X j 0 β j, + j=1

10 1816 T. T. CAI AND Z. GUO where λ 0 = A / with A>. Note that for the scaled Lasso estimator the lower boud i.13) ca be achieved by the trivial loss estimator 0, i the sese, sup θ k) P θ 0 β SL β q Ck q ) 0, ad hece estimatio of loss is impossible i this case. 3. Miimaxity ad adaptivity of cofidece itervals over 0 k). We focused i the last sectio o poit estimatio of the l q loss ad showed the impossibility of loss estimatio except for oe regime. The results aturally lead to aother questio: Is it possible to costruct useful cofidece itervals for β β q that ca provide otrivial upper ad lower bouds for the loss? I this sectio, after itroducig the framework for miimaxity ad adaptivity of cofidece itervals, we cosider the case of kow = Iadσ = σ 0 ad establish the miimaxity ad adaptivity lower bouds for the expected legth of cofidece itervals for the l q loss of a broad collectio of estimators over the parameter space 0 k). Wealso show that such miimax lower bouds ca be achieved for the Lasso estimator ad the discuss the possibility of adaptivity usig the Lasso estimator as a example. The case of ukow ad σ will be the focus of the ext sectio Framework for miimaxity ad adaptivity of cofidece itervals. I this sectio, we itroduce the followig decisio theoretical framework for cofidece itervals of the loss β β q.give0<α<1 ad the parameter space ad the loss β β q, deote by I α, β,l q ) the set of all 1 α) level cofidece itervals for β β q over, I α, β,l { q ) = CI α β,l q,z)= [ lz),uz) ] : 3.1) if P θ β βθ) θ q CI α β,l q,z) ) 1 α}. We will write CI α for CI α β,l q,z) whe there is o cofusio. For ay cofidece iterval CI α β,l q,z)=[lz),uz)], its legth is deoted by LCI α β,l q, Z)) = uz) lz) ad the maximum expected legth over a parameter space 1 is defied as 3.) L ) CI α β,l q,z), 1 = sup E θ L CI α β,l q,z) ). θ 1 For two ested parameter spaces 1, we defie the bechmark L α 1,, β,l q ), measurig the degree of adaptivity over the ested spaces 1, 3.3) L α 1,, β,l q ) = if CI α β,l q,z) I α, β,l q ) sup E θ L CI α β,l q,z) ). θ 1 We will write L α 1, β,l q ) for L α 1, 1, β,l q ), which is the miimax expected legth of cofidece itervals for β β q over 1. The bechmark L α 1,, β,l q ) is the ifimum of the maximum expected legth over 1 amog

11 HIGH-DIMENSIONAL ACCURACY ASSESSMENT 1817 FIG. 1. The plot demostrates defiitios of L α 1, β,l q ) ad L α 1,, β,l q ). all 1 α)-level cofidece itervals over. I cotrast, L α 1, β,l q ) is cosiderig all 1 α)-level cofidece itervals over 1. I words, if there is prior iformatio that the parameter lies i the smaller parameter space 1, L α 1, β,l q ) measures the bechmark legth of cofidece itervals over the parameter space 1, which is illustrated i the left of Figure 1; however, if there is oly prior iformatio that the parameter lies i the larger parameter space, L α 1,, β,l q ) measures the bechmark legth of cofidece itervals over the parameter space 1, which is illustrated i the right of Figure 1. Rigorously, we defie a cofidece iterval CI to be simultaeously adaptive over 1 ad if CI I α, β,l q ), 3.4) L CI ), 1 L α 1, β,l q ) ad L CI ), L α, β,l q ). The coditio 3.4) meas that the cofidece iterval CI, which has coverage over the larger parameter space, achieves the miimax rate over both 1 ad. Note that LCI, 1 ) L α 1,, β,l q ).IfL α 1,, β,l q ) L α 1, β,l q ), the the rate-optimal adaptatio 3.4) is impossible to achieve for 1. Otherwise, it is possible to costruct cofidece itervals simultaeously adaptive over parameter spaces 1 ad. The possibility of adaptatio over parameter spaces 1 ad ca thus be aswered by ivestigatig the bechmark quatities L α 1, β,l q ) ad L α 1,, β,l q ). Such framework has already bee itroduced i [7], which studies the miimaxity ad adaptivity of cofidece itervals for liear fuctioals i high-dimesioal liear regressio. We will adopt the miimax ad adaptatio framework discussed above ad establish the miimax expected legth L α 0k), β,l q ) ad the adaptatio bechmark L α 0k 1 ), 0 k ), β,l q ). I terms of the miimax expected legth ad the adaptivity behavior, there exist fudametal differeces betwee the case q = ad 1 q<. We will discuss them separately i the followig two subsectios. 3.. Cofidece itervals for the l loss over 0 k). The followig theorem establishes the miimax lower boud for the expected legth of cofidece itervals of β β over the parameter space 0k). THEOREM 3. Suppose that 0 <α< 1 4 ad the sparsity levels k ad k 0 satisfy Assumptio B1). For ay estimator β satisfyig Assumptio A1) with β 0

12 1818 T. T. CAI AND Z. GUO k 0, the there is some costat c>0 such that 3.5) L α { 0 k), β,l ) k 1 c mi, }σ 0. I particular, if β L is the Lasso estimator defied i.10) with A>, the the miimax expected legth for 1 α) level cofidece itervals of β L β over 0 k) is 3.6) L α { 0 k), β L ) k 1,l mi, }σ 0. We ow cosider adaptivity of cofidece itervals for the l loss. The followig theorem gives the lower boud for the bechmark L α 0k 1 ), 0 k ), β,l ). We will the discuss Theorems 3 ad 4 together. THEOREM 4. Suppose that 0 <α< 1 4 ad the sparsity levels k 1, k ad k 0 satisfy Assumptio B). For ay estimator β satisfyig Assumptio A1) with β 0 k 0, the there is some costat c>0 such that 3.7) L α { 0 k 1 ), 0 k ), β,l ) k 1 c mi, }σ 0. I particular, if β L is the Lasso estimator defied i.10) with A>, the above lower boud ca be achieved. The lower boud established i Theorem 4 implies that of Theorem 3 ad both lower bouds hold for a geeral class of estimators satisfyig Assumptio A1). There is a phase trasitio for the lower boud of the bechmark L α 0k 1 ), 0 k ), β,l ). I the regime k k σ0 ;whe k log p estimator β L defied i.10), the lower boud k, the lower boud i 3.7)is 1 σ 0, the lower boud i 3.7) is. For the Lasso σ0 i 3.5) ad k σ0 i 3.7) ca be achieved by the cofidece itervals CI 0 α Z, k, ) ad CI0 α Z, k, ) defied i 3.15), respectively. Applyig a similar idea to.11), we show that the miimax lower boud 1 σ0 i 3.6) ad3.7) ca be achieved by the followig cofidece iterval: ) ) ) 3.8) CI 1 α Z) = ψz) ψz),, 1 χ1 α ) σ χ α ) σ 0 where χ1 α ) ad χ α ) are the 1 α ad α quatiles of χ radom variable with degrees of freedom, respectively, ad { 1 3.9) ψz)= mi y ) X ) β L,σ 0 }. +

13 HIGH-DIMENSIONAL ACCURACY ASSESSMENT 1819 Note that the two-sided cofidece iterval 3.8) is simply based o the observed data Z, ot depedig o ay prior kowledge of the sparsity k. Furthermore,it is a two-sided cofidece iterval, which tells ot oly just a upper boud, but also a lower boud for the loss. The coverage property ad the expected legth of CI 1 α Z) are established i the followig propositio. PROPOSITION. Suppose that k log p ad β L is the estimator defied i.10) with A>. The CI 1 α Z) defied i 3.8) satisfies 3.10) lim if if,p P β L β θ 0 k) CI1 α Z)) 1 α, ad 3.11) L CI 1 α Z), 0k) ) 1 σ 0. Regardig the Lasso estimator β L defied i.10), we will discuss the possibility of adaptivity of cofidece itervals for β L β. The adaptivity behavior of cofidece itervals for β L β is demostrated i Figure. As illustrated i the rightmost plot of Figure, i the regime k 1 k log p, we obtai L α 0k 1 ), 0 k ), β L,l ) L α 0k 1 ), β L,l ) 1, which implies that adaptatio is possible over this regime. As show i Propositio, the cofidece iterval CI 1 α Z) defied i 3.8) is fully adaptive over the regime k log p i the sese of 3.4). Illustrated i the leftmost ad middle plots of Figure, it is impossible to costruct a adaptive cofidece iterval for β L β over regimes k 1 k ad k 1 k sice L α 0k 1 ), 0 k ), β L,l ) FIG.. Illustratio of L α 0k 1 ), β L,l ) top) ad L α 0k 1 ), 0 k ), β L,l ) bottom) over regimes k 1 k leftmost), k 1 k log p middle) ad k 1 k log p rightmost).

14 180 T. T. CAI AND Z. GUO L α 0k 1 ), β L,l ) if k 1 ad k 1 k. To sum up, adaptive cofidece itervals for β L β are oly possible over the regime k log p. Compariso with cofidece balls. We should ote that the problem of costructig cofidece itervals for β β is related to but differet from that of costructig cofidece sets for β itself. Cofidece balls costructed i [] are of form {β : β β u Z)},where β ca be the Lasso estimator ad u Z) is a data depedet squared radius. See [] for further details. A aive applicatio of this cofidece ball leads to a oe-sided cofidece iterval for the loss β β, 3.1) CI iduced α Z) = { β β : β β u Z) }. Due to the reaso that cofidece sets for β were sought for i Theorem 1 i [], cofidece sets i the form {β : β β u Z)} will suffice to achieve the optimal legth. However, sice our goal is to characterize β β, we apply the ubiased risk estimatio discussed i Theorem 1 of [] ad costruct the two-sided cofidece iterval i 3.8). Such a two-sided cofidece iterval is more iformative tha the oe-sided cofidece iterval 3.1) sice the oe-sided cofidece iterval does ot cotai the iformatio whether the loss is close to zero or ot. Furthermore, as show i [], the legth of cofidece iterval CI iduced α Z) over 1 the parameter space 0 k) is of order + k. The two-sided cofidece iterval CI 1 1 α Z) costructed i 3.8) is of expected legth, which is much shorter tha 1 + k i the regime k, that is, the two-sided cofidece iterval 3.8) provides a more accurate iterval estimator of the l loss. This is illustrated i Figure 3. The lower boud techique developed i the literature of adaptive cofidece sets [] ca also be used to establish some of the lower boud results for the case q = give i the preset paper. However, ew techiques are eeded i order to establish the rate sharp lower bouds for the miimax estimatio error.9) i the regio k ad for the expected legth of the cofidece itervals 3.18) ad7.3) i the regio k 1 k log p, where it is ecessary to test FIG. 3. Compariso of the two-sided cofidece iterval CI 1 α Z) with the oe-sided cofidece iterval CI iduced α Z).

15 HIGH-DIMENSIONAL ACCURACY ASSESSMENT 181 a composite ull agaist a composite alterative i order to establish rate sharp lower bouds Cofidece itervals for the l q loss with 1 q< over 0 k). We ow cosider the case 1 q< ad ivestigate the miimax expected legth ad adaptivity of cofidece itervals for β β q over the parameter space 0k). The followig theorem characterizes the miimax covergece rate for the expected legth of cofidece itervals. THEOREM 5. Suppose that 0 <α< 1 4,1 q< ad the sparsity levels k ad k 0 satisfy Assumptio B1). For ay estimator β satisfyig Assumptio A) with β 0 k 0, the there is some costat c>0 such that 3.13) L α 0 k), β,l ) q ck q σ 0. I particular, if β L is the Lasso estimator defied i.10) with A>, the the miimax expected legth for 1 α) level cofidece itervals of β L β q over 0 k) is 3.14) L α 0 k), β L ),l q k q σ 0. We ow costruct the cofidece iterval achievig the miimax covergece rate i 3.14), 3.15) CI 0 α Z,k,q)= 0,C A, k)k q ), Aσ 0 ) 3η 0 η, 0 +1 Aσ 0) k ) η k 0) where C A, k) = max{ } with η 0 = 1.01 ) 4 1 A+. The followig propositio establishes the coverage property ad the ex- A pected legth of CI 0 α Z,k,q). PROPOSITION 3. Suppose that 1 k ad β L is the estimator defied i.10) with A>. For 1 q, the cofidece iterval CI 0 α Z,k,q) defied i 3.15) satisfies 3.16) lim if if β β q CI0 α Z,k,q)) = 1, ad P θ,p θ 0 k) 3.17) L CI 0 α Z,k,q), 0k) ) k q σ 0. I particular, for the case q =, 3.16) ad 3.17) also hold for the estimator β L defied i.10) with A>.

16 18 T. T. CAI AND Z. GUO This result shows that the cofidece iterval CI 0 α Z,k,q) achieves the miimax rate give i 3.14). I cotrast to the l loss where the two-sided cofidece iterval 3.8) is sigificatly shorter tha the oe-sided iterval ad achieves the optimal rate over the regime k log p,forthel q loss with 1 q<, the oe-sided cofidece iterval achieves the optimal rate give i 3.14). We ow cosider adaptivity of cofidece itervals. The followig theorem establishes the lower bouds for L α 0k 1 ), 0 k ), β,l q ) with 1 q<. THEOREM 6. Suppose 0 <α< 1 4,1 q<ad the sparsity levels k 1, k ad k 0 satisfy Assumptio B). For ay estimator β satisfyig Assumptio A) with β 0 k 0, the there is some costat c>0 such that 3.18) q ck σ 0 if k 1 k, L α 0 k 1 ), 0 k ), β,l ) q q ck 1 1 σ 0 if k 1 k, q ck 1 k 1 σ 0 if k 1 k. I particular, if p ad β L is the Lasso estimator defied i.10) with A> 4, the above lower bouds ca be achieved. The lower bouds of Theorem 6 imply that of Theorem 5 ad both lower bouds hold for a geeral class of estimators satisfyig Assumptio A). However, the lower boud 3.18) i Theorem 6 has a sigificatly differet meaig from 3.13) i Theorem 5 where 3.18) quatifies the cost of adaptatio without kowig the sparsity level. For the Lasso estimator β L defied i.10), by comparig Theorem 5 ad Theorem 6, we obtai L α 0k 1 ), 0 k ), β L,l q ) L α 0k 1 ), β L,l q ) if k 1 k, which implies the impossibility of costructig adaptive cofidece itervals for the case 1 q<. There exists marked differece betwee the case 1 q< ad the case q =, where it is possible to costruct adaptive cofidece itervals over the regime k log p. For the Lasso estimator β L defied i.10), it is show i Propositio 3 that the cofidece iterval CI 0 α Z, k,q) defied i 3.15) achieves the lower boud q k σ 0 of 3.18). The lower bouds k q 1 k 1 σ 0 ad k q 1 1 σ0 of 3.18) ca be achieved by the followig proposed cofidece iterval: CI α Z, k,q) ) ) ) 3.19) ψz) = 1 χ1 α ) σ 0,16k ) q 1 ψz) 1 + χ α ) σ 0, +

17 HIGH-DIMENSIONAL ACCURACY ASSESSMENT 183 where ψz) is give i 3.9). The above claim is verified i Propositio 4. Note that the cofidece iterval CI 1 α Z) defied i 3.8) is a special case of CI α Z, k,q)with q =. PROPOSITION 4. Suppose p, 1 k 1 k log p ad β L is defied i.10) with A>4. The CI α Z, k,q)defied i 3.19) satisfies 3.0) lim if if β β q CI α Z, k,q) ) 1 α, ad P θ,p θ 0 k ) 3.1) L CI α Z, k,q), 0 k 1 ) ) k q 1 k )σ Miimaxity ad adaptivity of cofidece itervals over k). I this sectio, we focus o the case of ukow ad σ ad establish the miimax expected legth of cofidece itervals for β β q with 1 q over k) defied i.4). We also study the possibility of adaptivity of cofidece itervals for β β q. The followig theorem establishes the lower bouds for the bechmark quatities L α k i), β,l q ) with i = 1, adl α k 1), k ), β,l q ). THEOREM 7. Suppose that 0 <α< 1 4,1 q ad the sparsity levels k 1, k ad k 0 satisfy Assumptio B). For ay estimator β satisfyig Assumptio A1) at θ 0 = β, I,σ 0 ) with β 0 k 0, there is a costat c>0 such that L α ki ), β,l ) q ck q 4.1) i for i = 1,, L α {θ0 }, k ), β,l 4.) ) q ck q. I particular, if β SL is the scaled Lasso estimator defied i.14) with A>, the the above lower bouds ca be achieved. The lower bouds 4.1) ad4.) hold for ay β satisfyig Assumptio A1) at a iterior poit θ 0 = β, I,σ 0 ), icludig the scaled Lasso estimator as a special case. We demostrate the impossibility of adaptivity of cofidece itervals for the l q loss of the scaled Lasso estimator β SL.SiceL α k 1), k ), β SL,l q ) L α {θ 0}, k ), β SL,l q ), by 4.), we have L α k 1), k ), β SL,l q ) L α k 1), β SL,l q ) if k 1 k. The compariso of L α k 1), β SL,l q ) ad L α k 1), k ), β SL,l q ) is illustrated i Figure 4. Referrig to the adaptivity defied i 3.4), it is impossible to costruct adaptive cofidece itervals for β SL β q. Theorem 7 shows that for ay cofidece iterval CI α β,l q,z) for the loss of ay give estimator β satisfyig Assumptio A1), uder the coverage costrait that CI α β,l q,z) I α k ), β,l q ), its expected legth at ay give

18 184 T. T. CAI AND Z. GUO FIG. 4. Illustratio of L α k 1), β SL,l q ) left) ad L α k 1), k ), β SL,l q ) right). θ 0 = β q, I,σ) k 0 ) must be of order k. I cotrast to Theorems 4 ad 6, Theorem 7 demostrates that cofidece itervals must be log at a large subset of poits i the parameter space, ot just at a small umber of ulucky poits. Therefore, the lack of adaptivity for cofidece itervals is ot due to the coservativeess of the miimax framework. I the followig, we detail the costructio of cofidece itervals for β SL β q. The costructio of cofidece itervals is based o the followig defiitio of restricted eigevalue, which is itroduced i [4]: 4.3) κx,k,s,α 0 ) = mi J 0 {1,...,p}, J 0 k mi δ 0, δ J c 1 α 0 δ J0 1 0 Xδ δj01, where J 1 deotes the subset correspodig to the s largest i absolute value coordiates of δ outside of J 0 ad J 01 = J 0 J 1. Defie the evet B ={ˆσ }. The cofidece iterval for β SL β q is defied as {[ ] 0,ϕZ,k,q) o B, 4.4) CI α Z,k,q)= {0} o B c, where { ϕz,k,q) = mi 16Amax X j σ κ X,k,k,3 max X j mi X j )) ) k q, k q ) σ }. REMARK. The restricted eigevalue κ X,k,k,3 max X j mi X j )) is computatioally ifeasible. For desig covariace matrix of special structures, the restricted eigevalue ca be replaced by its lower boud ad a computatioally feasible cofidece iterval ca be costructed. See Sectio 4.4 i [7] for more details. Properties of CI α Z,k,q)are established as follows. PROPOSITION 5. Suppose that 1 k ad β SL is the estimator defied i.14) with A>. For 1 q, the CI α Z,k,q)defied i 4.4) satisfies

19 the followig properties: 4.5) 4.6) lim if HIGH-DIMENSIONAL ACCURACY ASSESSMENT 185 if P θ,p θ k) β β q CI αz,k,q) ) = 1, L CI α Z,k,q), k) ) k q. Propositio 5 shows that the cofidece iterval CI α Z, k i,q) defied i 4.4) achieves the lower boud i 4.1), for i = 1,, ad the cofidece iterval CI α Z, k,q)defied i 4.4) achieves the lower boud i 4.). 5. Estimatio of the l q loss of rate-optimal estimators. We have established miimax lower bouds for the estimatio accuracy of the loss of a broad class of estimators β satisfyig A1) or A) ad also demostrated that such miimax lower bouds are sharp for the Lasso ad scaled Lasso estimators. We ow show that the miimax lower bouds are sharp for the class of rate-optimal estimators satisfyig the followig Assumptio A). A) The estimator β satisfies 5.1) sup P θ β β q C β q 0 1) ) Cp δ, θ k) for all 1 k log p,whereδ>0, C > 0adC>0are costats ot depedig o k, or p. Wesayaestimator β is rate-optimal if it satisfies Assumptio A). As show i [3, 4, 1, 6], Lasso, Datzig Selector, scaled Lasso ad square-root Lasso are rate-optimal whe the tuig parameter is chose properly. We shall stress that Assumptio A) implies Assumptios A1) ad A). Assumptio A) requires the estimator β to perform well over the whole parameter space k) while Assumptios A1) ad A) oly require β to perform well at a sigle poit or over a proper subset. The followig propositio shows that the miimax lower bouds established i Theorem 1 to Theorem 7 ca be achieved for the class of rate-optimal estimators. PROPOSITION 6. Let β be a estimator satisfyig Assumptio A): 1. There exist poit or iterval) estimators of the loss β β q with 1 q< achievig, up to a costat factor, the miimax lower bouds.9) i Theorem 1 ad 3.13) i Theorem 5 ad estimators of loss β β q with 1 q achievig, up to a costat factor, the miimax lower bouds.13) i Theorem ad 4.1) ad 4.) i Theorem 7.. Suppose that the estimator β is costructed based o the subsample Z 1) = y 1),X 1) ), the there exist estimators of the loss β β achievig, up to a costat factor, the miimax lower bouds.8) i Theorem 1, 3.5) i Theorem 3 ad 3.7) i Theorem 4.

20 186 T. T. CAI AND Z. GUO 3. Suppose the estimator β is costructed based o the subsample Z 1) = y 1),X 1) ) ad it satisfies Assumptio A) with δ> ad 5.) sup θ k) P θ β β) S c 1 c β β) 1 S where S = suppβ) ) Cp δ, for all k log p. The for p there exist estimators of the loss β β q with 1 q<achievig the lower bouds give i 3.18) i Theorem 6. For reasos of space, we do ot discuss the detailed costructio for the poit ad iterval estimators achievig these miimax lower bouds here ad postpoe the costructio to the proof of Propositio 6. REMARK 3. Sample splittig has bee widely used i the literature. For example, the coditio that β is costructed based o the subsample Z 1) = y 1),X 1) ) has bee itroduced i [] for costructig cofidece sets for β ad i [0] for costructig cofidece itervals for the l loss. Such a coditio is imposed purely for techical reasos to create idepedece betwee the estimator β ad the subsample Z ) = y ),X ) ), which is useful to evaluate the l q loss of the estimator β. Asshowi[4], the assumptio 5.) is satisfied for the Lasso ad Datzig Selector. This techical assumptio is imposed such that β β 1 ca be tightly cotrolled by β β. 6. Geeral tools for miimax lower bouds. A major step i our aalysis is to establish rate sharp lower bouds for the estimatio error ad the expected legth of cofidece itervals for the l q loss. We itroduce i this sectio ew techical tools that are eeded to establish these lower bouds. A sigificat distictio of the lower boud results give i the previous sectios from those for the traditioal parameter estimatio problems is that the costrait is o the performace of the estimator β of the regressio vector β, butthelower bouds are o the difficulty of estimatig its loss β β q. It is ecessary to develop ew lower boud techiques to establish rate-optimal lower bouds for the estimatio error ad the expected legth of cofidece itervals for the loss β β q. These techical tools may also be of idepedet iterest. We begi with otatio. Let Z deote a radom variable whose distributio is idexed by some parameter θ ad let π deote a prior o the parameter space. We will use f θ z) to deote the desity of Z give θ ad f π z) to deote the margial desity of Z uder the prior π. LetP π deote the distributio of Z correspodig to f π z), thatis,p π A) = 1 z A f π z) dz, where1 z A is the idicator fuctio. For a fuctio g, we write E π gz)) for the expectatio uder f π. More specifically, f π z) = f θ z)πθ) dθ ad E π gz)) = gz)f π z) dz. The L 1 distace betwee two probability distributios with desities f 0 ad f 1 is give by L 1 f 1,f 0 ) = f 1 z) f 0 z) dz. The followig theorem establishes the miimax lower bouds for the estimatio error ad the expected legth of

21 HIGH-DIMENSIONAL ACCURACY ASSESSMENT 187 cofidece itervals for the l q loss, uder the costrait that β is a good estimator at at least oe iterior poit. THEOREM 8. Suppose 0 <α,α 0 < 1 4,1 q, 0 is positive defiite, θ 0 = β, 0,σ 0 ) ad F. Defie d = mi θ F βθ) β q. Let π deote a prior over the parameter space F. If a estimator β satisfies 6.1) P θ0 β β q 1 16 d the 6.) if L q ad sup θ {θ 0 } F ) 1 α 0, P θ L q β β q 1 ) 4 d c 1, 6.3) L ) α {θ0 },, β,l q = if E θ0 L CI α β,l q,z) ) c d, CI α β,l q,z) I α, β,l q ) where c 1 = mi{ 1 10, 9 10 α 0 L 1 f π,f θ0 )) + } ad c = 1 1 α α 0 L 1 f π,f θ0 )) +. REMARK 4. The miimax lower boud 6.) for the estimatio error ad 6.3) for the expected legth of cofidece itervals hold as log as the estimator β estimates β well at a iterior poit θ 0. Besides Coditio 6.1), aother key igrediet for the lower bouds 6.) ad6.3) is to costruct the least favorable space F with the prior π such that the margial distributios f π ad f θ0 are odistiguishable. For the estimatio lower boud 6.), costraiig that β β q ca be well estimated at θ 0, due to the odistiguishability betwee f π ad f θ0,we ca establish that the loss β β q caot be estimated well over F.Forthelower boud 6.3), by Coditio 6.1) ad the odistiguishability betwee f π ad f θ0, we will show that β β q over F is much larger tha β β q, ad hece the hoest cofidece itervals must be sufficietly log. Theorem 8 is used to establish the miimax lower bouds for both the estimatio error ad the expected legth of cofidece itervals of the l q loss over k). By takig θ 0 k 0 ) ad = k), Theorem follows from 6.) with a properly costructed subset F k). Bytakigθ 0 k 0 ) ad = k ),thelower boud 4.) i Theorem 7 follows from 6.3) with a properly costructed F k ). I both cases, Assumptio A1) implies Coditio 6.1). Several miimax lower bouds over 0 k) ca also be implied by Theorem 8. For the estimatio error, the miimax lower bouds.8) ad.9) over the regime k i Theorem 1 follow from 6.). For the expected legth of cofidece itervals, the miimax lower bouds 3.7) i Theorem 4 ad 3.18) i the regios

22 188 T. T. CAI AND Z. GUO k 1 k ad k 1 k i Theorem 6 follow from 6.3). I these cases, Assumptios A1) or A) ca guaratee that Coditio 6.1) is satisfied. However, the miimax lower bouds for estimatio error.9) i the regio k log p ad for the expected legth of cofidece itervals 3.18) ithe regio k 1 k log p caot be established usig the above theorem. The followig theorem, which requires testig a composite ull agaist a composite alterative, establishes the refied miimax lower bouds over 0 k). THEOREM 9. Let 0 <α,α 0 < 1 4,1 q ad θ 0 = β, 0,σ 0 ) where 0 is a positive defiite matrix. Let k 1 ad k be two sparsity levels. Assume that for i = 1, there exist parameter spaces F i {β, 0,σ 0 ) : β 0 k i } such that for give dist i ad d i : βθ) β ) 0 βθ) β ) = dist i ad βθ) β q = d i for all θ F i. Let π i deote a prior over the parameter space F i for i = 1,. Suppose that for θ 1 = β, 0,σ0 + dist 1 ) ad θ = β, 0,σ0 + dist ), there exist costats c 1,c > 0 such that 6.4) P θi β β q ) c i d i 1 α0 for i = 1,. The we have 6.5) if L q ad sup P θ L q β β q c ) 3 d c3, θ F 1 F 6.6) L α 0 k 1 ), 0 k ), β,l q ) c 4 1 c ) d 1 + c 1) d 1 where c3 = mi{ 1 4,1 c ) c 1) d 1 ) d + }, c4 = 1 α 0 α i=1 L 1 f πi,f θi ) L 1 f π,f π1 )) + ad c 3 = mi{ 10 1, 10 9 α 0 i=1 L 1 f πi, f θi ) L 1 f π,f π1 )) + }. REMARK 5. As log as the estimator β performs well at two poits, θ 1 ad θ, the miimax lower bouds 6.5) for the estimatio error ad 6.6) for the expected legth of cofidece itervals hold. Note that θ i i the above theorem does ot belog to the parameter space {β, 0,σ 0 ) : β 0 k i },fori = 1,. I cotrast to Theorem 8, Theorem 9 compares composite hypotheses F 1 ad F, which will lead to a sharper lower boud tha comparig the simple ull {θ 0 } with the composite alterative F. For simplicity, we costruct least favorable parameter spaces F i such that the poits i F i is of fixed geeralized) l distace ad fixed l q distace to β,fori = 1,, respectively. More importatly, we costruct F 1 with the ) +,

23 HIGH-DIMENSIONAL ACCURACY ASSESSMENT 189 prior π 1 ad F with the prior π such that f π1 ad f π are ot distiguishable, where θ 1 ad θ are itroduced to facilitate the compariso. By Coditio 6.4) ad the costructio of F 1 ad F, we establish that the l q loss caot be simultaeously estimated well over F 1 ad F. For the lower boud 6.6), uder the same coditios, it is show that the l q loss over F 1 ad F are far apart ad ay cofidece iterval with guarateed coverage probability over F 1 F must be sufficietly log. Due to the prior iformatio = Iadσ = σ 0, the lower boud costructio over 0 k) is more ivolved tha that over k). We shall stress that the costructio of F 1 ad F ad the compariso betwee composite hypotheses are of idepedet iterest. The miimax lower boud.9) i the regio k follows from for 6.5) ad the miimax lower boud 3.18) i the regio k 1 k the expected legth of cofidece itervals follows from 6.6). I these cases, 0 is take as I ad Assumptio A) implies Coditio 6.4). 7. A itermediate settig with kow σ = σ 0 ad ukow. The results give i Sectios 3 ad 4 show the sigificat differece betwee 0 k) ad k) i terms of miimaxity ad adaptivity of cofidece itervals for β β q. 0 k) is for the simple settig with kow desig covariace matrix = Iad kow oise level σ = σ 0,ad k) is for ukow ad σ. I this sectio, we further cosider miimaxity ad adaptivity of cofidece itervals for β β q i a itermediate settig where the oise level σ = σ 0 is kow ad is ukow but of certai structure. Specifically, we cosider the followig parameter space: 1 β 0 k, λ mi ) λ max ) M 1 M 7.1) σ0 k, s) = β,,σ 0 ) : 1 1 L1 M, max 1), i 0 s 1 i p for some costats M 1 1adM>0. σ0 k, s) basically assumes kow oise level σ ad imposes sparsity coditios o the precisio matrix of the radom desig. This parameter space is similar to those used i the literature of sparse liear regressio with radom desig [13, 14, 9]. σ0 k, s) has two sparsity parameters where k represets the sparsity of β ad s represets the maximum row sparsity of the precisio matrix 1. Note that 0 k) σ0 k, s) k) ad 0 k) is a special case of σ0 k, s) with M 1 = 1. Uder the assumptio s /, the miimaxity ad adaptivity lower bouds for the expected legth of cofidece itervals for β β q with 1 q< over σ0 k, s) are the same as those over 0 k). Thatis,Theorems5 ad 6 hold with 0 k 1 ), 0 k ),ad 0 k) replaced by σ0 k 1,s), σ0 k,s)ad σ0 k, s), respectively. For the case q =, the followig theorem establishes the miimaxity ad adaptivity lower bouds for the expected legth of cofidece itervals for β β over σ 0 k, s).

24 1830 T. T. CAI AND Z. GUO THEOREM 10. Suppose 0 <α,α 0 < 1/4, M 1 > 1, s / ad the sparsity levels k 1, k ad k 0 satisfy Assumptio B) with the costat c 0 replaced by c0 defied i 9.14). For ay estimator β satisfyig 7.) sup P θ β β q C β ) q 0 θ k 0 ) σ α 0, with a costat C > 0, the there is some costat c>0 such that ) σ0 k 1,s), σ0 k,s), β,l 7.3) ad 7.4) L α L α { { c mi k, max k 1, 1 }}σ 0 ) k i σ0 k i,s), β,l c σ0 ad i = 1,. I particular, if p ad β is costructed based o the subsample Z 1) = y 1),X 1) ) ad satisfies Assumptio A) with δ>, the above lower bouds ca be attaied. I cotrast to Theorems 3 ad 4, the lower bouds for the case q = chage i the absece of the prior kowledge = I but the possibility of adaptivity of cofidece itervals over σ0 k, s) is similar to that over 0 k). Sice the Lasso estimator β L defied i.10) with A>4 satisfies Assumptio A) with δ>, by Theorem 10, the miimax lower bouds 7.3) ad 7.4) ca be attaied for β L.For β L, oly whe k 1 k log p, L α σ 0 k 1,s), β L,l ) L α σ 0 k 1,s), σ0 k,s), β,l ) k 1 ad adaptatio betwee σ0 k 1,s) ad σ0 k,s) is possible. I other regimes, if k 1 k, the L α σ 0 k 1,s), β L,l ) L α σ 0 k 1,s), σ0 k,s), β,l ) ad adaptatio betwee σ0 k 1,s) ad σ0 k,s) is impossible. For reasos of space, more discussio o σ0 k, s), icludig the costructio of adaptive cofidece itervals over the regime k 1 k, is postpoed to the supplemet [6]. 8. Miimax lower bouds for estimatig β q with 1 q. The lower bouds developed i this paper have broader implicatios. I particular, the established results imply the miimax lower bouds for estimatig β q ad the expected legth of cofidece itervals for β q with 1 q. To build the coectio, it is sufficiet to ote that the trivial estimator β = 0 satisfies Assumptios A1) ad A) with β = 0. The we ca apply the lower bouds.8),.9) ad

Accuracy Assessment for High-Dimensional Linear Regression

Accuracy Assessment for High-Dimensional Linear Regression Uiversity of Pesylvaia ScholarlyCommos Statistics Papers Wharto Faculty Research -016 Accuracy Assessmet for High-Dimesioal Liear Regressio Toy Cai Uiversity of Pesylvaia Zijia Guo Uiversity of Pesylvaia

More information

Confidence Intervals for High-Dimensional Linear Regression: Minimax Rates and Adaptivity

Confidence Intervals for High-Dimensional Linear Regression: Minimax Rates and Adaptivity Uiversity of Pesylvaia ScholarlyCommos Statistics Papers Wharto Faculty Research 5-207 Cofidece Itervals for High-Dimesioal Liear Regressio: Miimax Rates ad Adaptivity Toy Cai Uiversity of Pesylvaia Zijia

More information

Summary and Discussion on Simultaneous Analysis of Lasso and Dantzig Selector

Summary and Discussion on Simultaneous Analysis of Lasso and Dantzig Selector Summary ad Discussio o Simultaeous Aalysis of Lasso ad Datzig Selector STAT732, Sprig 28 Duzhe Wag May 4, 28 Abstract This is a discussio o the work i Bickel, Ritov ad Tsybakov (29). We begi with a short

More information

CONFIDENCE INTERVALS FOR HIGH-DIMENSIONAL LINEAR REGRESSION: MINIMAX RATES AND ADAPTIVITY 1. BY T. TONY CAI AND ZIJIAN GUO University of Pennsylvania

CONFIDENCE INTERVALS FOR HIGH-DIMENSIONAL LINEAR REGRESSION: MINIMAX RATES AND ADAPTIVITY 1. BY T. TONY CAI AND ZIJIAN GUO University of Pennsylvania The Aals of Statistics 207, Vol. 45, No. 2, 65 646 DOI: 0.24/6-AOS46 Istitute of Mathematical Statistics, 207 CONFIDENCE INTERVALS FOR HIGH-DIMENSIONAL LINEAR REGRESSION: MINIMAX RATES AND ADAPTIVITY BY

More information

Semi-supervised Inference for Explained Variance in High-dimensional Linear Regression and Its Applications

Semi-supervised Inference for Explained Variance in High-dimensional Linear Regression and Its Applications Semi-supervised Iferece for Explaied Variace i High-dimesioal Liear Regressio ad Its Applicatios T. Toy Cai ad Zijia Guo Uiversity of Pesylvaia ad Rutgers Uiversity March 8, 08 Abstract We cosider statistical

More information

Efficient GMM LECTURE 12 GMM II

Efficient GMM LECTURE 12 GMM II DECEMBER 1 010 LECTURE 1 II Efficiet The estimator depeds o the choice of the weight matrix A. The efficiet estimator is the oe that has the smallest asymptotic variace amog all estimators defied by differet

More information

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n. Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator

More information

Properties and Hypothesis Testing

Properties and Hypothesis Testing Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.

More information

Statistics 511 Additional Materials

Statistics 511 Additional Materials Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability

More information

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS J. Japa Statist. Soc. Vol. 41 No. 1 2011 67 73 A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS Yoichi Nishiyama* We cosider k-sample ad chage poit problems for idepedet data i a

More information

32 estimating the cumulative distribution function

32 estimating the cumulative distribution function 32 estimatig the cumulative distributio fuctio 4.6 types of cofidece itervals/bads Let F be a class of distributio fuctios F ad let θ be some quatity of iterest, such as the mea of F or the whole fuctio

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality

More information

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y

More information

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio

More information

Statistical inference: example 1. Inferential Statistics

Statistical inference: example 1. Inferential Statistics Statistical iferece: example 1 Iferetial Statistics POPULATION SAMPLE A clothig store chai regularly buys from a supplier large quatities of a certai piece of clothig. Each item ca be classified either

More information

A statistical method to determine sample size to estimate characteristic value of soil parameters

A statistical method to determine sample size to estimate characteristic value of soil parameters A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig

More information

Estimation for Complete Data

Estimation for Complete Data Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of

More information

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices Radom Matrices with Blocks of Itermediate Scale Strogly Correlated Bad Matrices Jiayi Tog Advisor: Dr. Todd Kemp May 30, 07 Departmet of Mathematics Uiversity of Califoria, Sa Diego Cotets Itroductio Notatio

More information

REGRESSION WITH QUADRATIC LOSS

REGRESSION WITH QUADRATIC LOSS REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

Lecture 22: Review for Exam 2. 1 Basic Model Assumptions (without Gaussian Noise)

Lecture 22: Review for Exam 2. 1 Basic Model Assumptions (without Gaussian Noise) Lecture 22: Review for Exam 2 Basic Model Assumptios (without Gaussia Noise) We model oe cotiuous respose variable Y, as a liear fuctio of p umerical predictors, plus oise: Y = β 0 + β X +... β p X p +

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals

More information

An Introduction to Randomized Algorithms

An Introduction to Randomized Algorithms A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis

More information

Convergence of random variables. (telegram style notes) P.J.C. Spreij

Convergence of random variables. (telegram style notes) P.J.C. Spreij Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space

More information

This is an introductory course in Analysis of Variance and Design of Experiments.

This is an introductory course in Analysis of Variance and Design of Experiments. 1 Notes for M 384E, Wedesday, Jauary 21, 2009 (Please ote: I will ot pass out hard-copy class otes i future classes. If there are writte class otes, they will be posted o the web by the ight before class

More information

Bayesian Methods: Introduction to Multi-parameter Models

Bayesian Methods: Introduction to Multi-parameter Models Bayesia Methods: Itroductio to Multi-parameter Models Parameter: θ = ( θ, θ) Give Likelihood p(y θ) ad prior p(θ ), the posterior p proportioal to p(y θ) x p(θ ) Margial posterior ( θ, θ y) is Iterested

More information

CHAPTER 10 INFINITE SEQUENCES AND SERIES

CHAPTER 10 INFINITE SEQUENCES AND SERIES CHAPTER 10 INFINITE SEQUENCES AND SERIES 10.1 Sequeces 10.2 Ifiite Series 10.3 The Itegral Tests 10.4 Compariso Tests 10.5 The Ratio ad Root Tests 10.6 Alteratig Series: Absolute ad Coditioal Covergece

More information

First Year Quantitative Comp Exam Spring, Part I - 203A. f X (x) = 0 otherwise

First Year Quantitative Comp Exam Spring, Part I - 203A. f X (x) = 0 otherwise First Year Quatitative Comp Exam Sprig, 2012 Istructio: There are three parts. Aswer every questio i every part. Questio I-1 Part I - 203A A radom variable X is distributed with the margial desity: >

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 016 MODULE : Statistical Iferece Time allowed: Three hours Cadidates should aswer FIVE questios. All questios carry equal marks. The umber

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1 EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5 CS434a/54a: Patter Recogitio Prof. Olga Veksler Lecture 5 Today Itroductio to parameter estimatio Two methods for parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio Itroducto Bayesia Decisio

More information

5.1 A mutual information bound based on metric entropy

5.1 A mutual information bound based on metric entropy Chapter 5 Global Fao Method I this chapter, we exted the techiques of Chapter 2.4 o Fao s method the local Fao method) to a more global costructio. I particular, we show that, rather tha costructig a local

More information

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014. Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the

More information

Lecture 2: Monte Carlo Simulation

Lecture 2: Monte Carlo Simulation STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?

More information

Basics of Probability Theory (for Theory of Computation courses)

Basics of Probability Theory (for Theory of Computation courses) Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.

More information

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen) Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

LECTURE 14 NOTES. A sequence of α-level tests {ϕ n (x)} is consistent if

LECTURE 14 NOTES. A sequence of α-level tests {ϕ n (x)} is consistent if LECTURE 14 NOTES 1. Asymptotic power of tests. Defiitio 1.1. A sequece of -level tests {ϕ x)} is cosistet if β θ) := E θ [ ϕ x) ] 1 as, for ay θ Θ 1. Just like cosistecy of a sequece of estimators, Defiitio

More information

Frequentist Inference

Frequentist Inference Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information

Introductory statistics

Introductory statistics CM9S: Machie Learig for Bioiformatics Lecture - 03/3/06 Itroductory statistics Lecturer: Sriram Sakararama Scribe: Sriram Sakararama We will provide a overview of statistical iferece focussig o the key

More information

Probability, Expectation Value and Uncertainty

Probability, Expectation Value and Uncertainty Chapter 1 Probability, Expectatio Value ad Ucertaity We have see that the physically observable properties of a quatum system are represeted by Hermitea operators (also referred to as observables ) such

More information

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference EXST30 Backgroud material Page From the textbook The Statistical Sleuth Mea [0]: I your text the word mea deotes a populatio mea (µ) while the work average deotes a sample average ( ). Variace [0]: The

More information

Lecture 24: Variable selection in linear models

Lecture 24: Variable selection in linear models Lecture 24: Variable selectio i liear models Cosider liear model X = Z β + ε, β R p ad Varε = σ 2 I. Like the LSE, the ridge regressio estimator does ot give 0 estimate to a compoet of β eve if that compoet

More information

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence Chapter 3 Strog covergece As poited out i the Chapter 2, there are multiple ways to defie the otio of covergece of a sequece of radom variables. That chapter defied covergece i probability, covergece i

More information

Optimally Sparse SVMs

Optimally Sparse SVMs A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but

More information

Lecture Notes 15 Hypothesis Testing (Chapter 10)

Lecture Notes 15 Hypothesis Testing (Chapter 10) 1 Itroductio Lecture Notes 15 Hypothesis Testig Chapter 10) Let X 1,..., X p θ x). Suppose we we wat to kow if θ = θ 0 or ot, where θ 0 is a specific value of θ. For example, if we are flippig a coi, we

More information

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight)

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight) Tests of Hypotheses Based o a Sigle Sample Devore Chapter Eight MATH-252-01: Probability ad Statistics II Sprig 2018 Cotets 1 Hypothesis Tests illustrated with z-tests 1 1.1 Overview of Hypothesis Testig..........

More information

Output Analysis (2, Chapters 10 &11 Law)

Output Analysis (2, Chapters 10 &11 Law) B. Maddah ENMG 6 Simulatio Output Aalysis (, Chapters 10 &11 Law) Comparig alterative system cofiguratio Sice the output of a simulatio is radom, the comparig differet systems via simulatio should be doe

More information

Lecture 6 Simple alternatives and the Neyman-Pearson lemma

Lecture 6 Simple alternatives and the Neyman-Pearson lemma STATS 00: Itroductio to Statistical Iferece Autum 06 Lecture 6 Simple alteratives ad the Neyma-Pearso lemma Last lecture, we discussed a umber of ways to costruct test statistics for testig a simple ull

More information

Overview. p 2. Chapter 9. Pooled Estimate of. q = 1 p. Notation for Two Proportions. Inferences about Two Proportions. Assumptions

Overview. p 2. Chapter 9. Pooled Estimate of. q = 1 p. Notation for Two Proportions. Inferences about Two Proportions. Assumptions Chapter 9 Slide Ifereces from Two Samples 9- Overview 9- Ifereces about Two Proportios 9- Ifereces about Two Meas: Idepedet Samples 9-4 Ifereces about Matched Pairs 9-5 Comparig Variatio i Two Samples

More information

Empirical Process Theory and Oracle Inequalities

Empirical Process Theory and Oracle Inequalities Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi

More information

The standard deviation of the mean

The standard deviation of the mean Physics 6C Fall 20 The stadard deviatio of the mea These otes provide some clarificatio o the distictio betwee the stadard deviatio ad the stadard deviatio of the mea.. The sample mea ad variace Cosider

More information

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10 DS 00: Priciples ad Techiques of Data Sciece Date: April 3, 208 Name: Hypothesis Testig Discussio #0. Defie these terms below as they relate to hypothesis testig. a) Data Geeratio Model: Solutio: A set

More information

Lecture 3. Properties of Summary Statistics: Sampling Distribution

Lecture 3. Properties of Summary Statistics: Sampling Distribution Lecture 3 Properties of Summary Statistics: Samplig Distributio Mai Theme How ca we use math to justify that our umerical summaries from the sample are good summaries of the populatio? Lecture Summary

More information

Rank tests and regression rank scores tests in measurement error models

Rank tests and regression rank scores tests in measurement error models Rak tests ad regressio rak scores tests i measuremet error models J. Jurečková ad A.K.Md.E. Saleh Charles Uiversity i Prague ad Carleto Uiversity i Ottawa Abstract The rak ad regressio rak score tests

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters

More information

Lecture 3: August 31

Lecture 3: August 31 36-705: Itermediate Statistics Fall 018 Lecturer: Siva Balakrisha Lecture 3: August 31 This lecture will be mostly a summary of other useful expoetial tail bouds We will ot prove ay of these i lecture,

More information

x a x a Lecture 2 Series (See Chapter 1 in Boas)

x a x a Lecture 2 Series (See Chapter 1 in Boas) Lecture Series (See Chapter i Boas) A basic ad very powerful (if pedestria, recall we are lazy AD smart) way to solve ay differetial (or itegral) equatio is via a series expasio of the correspodig solutio

More information

Lecture 27. Capacity of additive Gaussian noise channel and the sphere packing bound

Lecture 27. Capacity of additive Gaussian noise channel and the sphere packing bound Lecture 7 Ageda for the lecture Gaussia chael with average power costraits Capacity of additive Gaussia oise chael ad the sphere packig boud 7. Additive Gaussia oise chael Up to this poit, we have bee

More information

Summary. Recap ... Last Lecture. Summary. Theorem

Summary. Recap ... Last Lecture. Summary. Theorem Last Lecture Biostatistics 602 - Statistical Iferece Lecture 23 Hyu Mi Kag April 11th, 2013 What is p-value? What is the advatage of p-value compared to hypothesis testig procedure with size α? How ca

More information

Table 12.1: Contingency table. Feature b. 1 N 11 N 12 N 1b 2 N 21 N 22 N 2b. ... a N a1 N a2 N ab

Table 12.1: Contingency table. Feature b. 1 N 11 N 12 N 1b 2 N 21 N 22 N 2b. ... a N a1 N a2 N ab Sectio 12 Tests of idepedece ad homogeeity I this lecture we will cosider a situatio whe our observatios are classified by two differet features ad we would like to test if these features are idepedet

More information

1 Review and Overview

1 Review and Overview DRAFT a fial versio will be posted shortly CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #3 Scribe: Migda Qiao October 1, 2013 1 Review ad Overview I the first half of this course,

More information

5. Likelihood Ratio Tests

5. Likelihood Ratio Tests 1 of 5 7/29/2009 3:16 PM Virtual Laboratories > 9. Hy pothesis Testig > 1 2 3 4 5 6 7 5. Likelihood Ratio Tests Prelimiaries As usual, our startig poit is a radom experimet with a uderlyig sample space,

More information

A survey on penalized empirical risk minimization Sara A. van de Geer

A survey on penalized empirical risk minimization Sara A. van de Geer A survey o pealized empirical risk miimizatio Sara A. va de Geer We address the questio how to choose the pealty i empirical risk miimizatio. Roughly speakig, this pealty should be a good boud for the

More information

Intro to Learning Theory

Intro to Learning Theory Lecture 1, October 18, 2016 Itro to Learig Theory Ruth Urer 1 Machie Learig ad Learig Theory Comig soo 2 Formal Framework 21 Basic otios I our formal model for machie learig, the istaces to be classified

More information

Lecture 11 October 27

Lecture 11 October 27 STATS 300A: Theory of Statistics Fall 205 Lecture October 27 Lecturer: Lester Mackey Scribe: Viswajith Veugopal, Vivek Bagaria, Steve Yadlowsky Warig: These otes may cotai factual ad/or typographic errors..

More information

2 1. The r.s., of size n2, from population 2 will be. 2 and 2. 2) The two populations are independent. This implies that all of the n1 n2

2 1. The r.s., of size n2, from population 2 will be. 2 and 2. 2) The two populations are independent. This implies that all of the n1 n2 Chapter 8 Comparig Two Treatmets Iferece about Two Populatio Meas We wat to compare the meas of two populatios to see whether they differ. There are two situatios to cosider, as show i the followig examples:

More information

Disjoint Systems. Abstract

Disjoint Systems. Abstract Disjoit Systems Noga Alo ad Bey Sudaov Departmet of Mathematics Raymod ad Beverly Sacler Faculty of Exact Scieces Tel Aviv Uiversity, Tel Aviv, Israel Abstract A disjoit system of type (,,, ) is a collectio

More information

It is always the case that unions, intersections, complements, and set differences are preserved by the inverse image of a function.

It is always the case that unions, intersections, complements, and set differences are preserved by the inverse image of a function. MATH 532 Measurable Fuctios Dr. Neal, WKU Throughout, let ( X, F, µ) be a measure space ad let (!, F, P ) deote the special case of a probability space. We shall ow begi to study real-valued fuctios defied

More information

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates.

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates. 5. Data, Estimates, ad Models: quatifyig the accuracy of estimates. 5. Estimatig a Normal Mea 5.2 The Distributio of the Normal Sample Mea 5.3 Normal data, cofidece iterval for, kow 5.4 Normal data, cofidece

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig

More information

Information-based Feature Selection

Information-based Feature Selection Iformatio-based Feature Selectio Farza Faria, Abbas Kazeroui, Afshi Babveyh Email: {faria,abbask,afshib}@staford.edu 1 Itroductio Feature selectio is a topic of great iterest i applicatios dealig with

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Regression with quadratic loss

Regression with quadratic loss Regressio with quadratic loss Maxim Ragisky October 13, 2015 Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X,Y, where, as before,

More information

Axioms of Measure Theory

Axioms of Measure Theory MATH 532 Axioms of Measure Theory Dr. Neal, WKU I. The Space Throughout the course, we shall let X deote a geeric o-empty set. I geeral, we shall ot assume that ay algebraic structure exists o X so that

More information

Lecture 10 October Minimaxity and least favorable prior sequences

Lecture 10 October Minimaxity and least favorable prior sequences STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

Sample Size Estimation in the Proportional Hazards Model for K-sample or Regression Settings Scott S. Emerson, M.D., Ph.D.

Sample Size Estimation in the Proportional Hazards Model for K-sample or Regression Settings Scott S. Emerson, M.D., Ph.D. ample ie Estimatio i the Proportioal Haards Model for K-sample or Regressio ettigs cott. Emerso, M.D., Ph.D. ample ie Formula for a Normally Distributed tatistic uppose a statistic is kow to be ormally

More information

Notes for Lecture 11

Notes for Lecture 11 U.C. Berkeley CS78: Computatioal Complexity Hadout N Professor Luca Trevisa 3/4/008 Notes for Lecture Eigevalues, Expasio, ad Radom Walks As usual by ow, let G = (V, E) be a udirected d-regular graph with

More information

Statistical Inference Based on Extremum Estimators

Statistical Inference Based on Extremum Estimators T. Rotheberg Fall, 2007 Statistical Iferece Based o Extremum Estimators Itroductio Suppose 0, the true value of a p-dimesioal parameter, is kow to lie i some subset S R p : Ofte we choose to estimate 0

More information

Problem Set 4 Due Oct, 12

Problem Set 4 Due Oct, 12 EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios

More information

Element sampling: Part 2

Element sampling: Part 2 Chapter 4 Elemet samplig: Part 2 4.1 Itroductio We ow cosider uequal probability samplig desigs which is very popular i practice. I the uequal probability samplig, we ca improve the efficiecy of the resultig

More information

Understanding Samples

Understanding Samples 1 Will Moroe CS 109 Samplig ad Bootstrappig Lecture Notes #17 August 2, 2017 Based o a hadout by Chris Piech I this chapter we are goig to talk about statistics calculated o samples from a populatio. We

More information

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals 7-1 Chapter 4 Part I. Samplig Distributios ad Cofidece Itervals 1 7- Sectio 1. Samplig Distributio 7-3 Usig Statistics Statistical Iferece: Predict ad forecast values of populatio parameters... Test hypotheses

More information

Mathematical Statistics - MS

Mathematical Statistics - MS Paper Specific Istructios. The examiatio is of hours duratio. There are a total of 60 questios carryig 00 marks. The etire paper is divided ito three sectios, A, B ad C. All sectios are compulsory. Questios

More information

If, for instance, we were required to test whether the population mean μ could be equal to a certain value μ

If, for instance, we were required to test whether the population mean μ could be equal to a certain value μ STATISTICAL INFERENCE INTRODUCTION Statistical iferece is that brach of Statistics i which oe typically makes a statemet about a populatio based upo the results of a sample. I oesample testig, we essetially

More information

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test.

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test. Math 308 Sprig 018 Classes 19 ad 0: Aalysis of Variace (ANOVA) Page 1 of 6 Itroductio ANOVA is a statistical procedure for determiig whether three or more sample meas were draw from populatios with equal

More information

MATH/STAT 352: Lecture 15

MATH/STAT 352: Lecture 15 MATH/STAT 352: Lecture 15 Sectios 5.2 ad 5.3. Large sample CI for a proportio ad small sample CI for a mea. 1 5.2: Cofidece Iterval for a Proportio Estimatig proportio of successes i a biomial experimet

More information

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1. Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chi-square Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

An Introduction to Asymptotic Theory

An Introduction to Asymptotic Theory A Itroductio to Asymptotic Theory Pig Yu School of Ecoomics ad Fiace The Uiversity of Hog Kog Pig Yu (HKU) Asymptotic Theory 1 / 20 Five Weapos i Asymptotic Theory Five Weapos i Asymptotic Theory Pig Yu

More information

Application to Random Graphs

Application to Random Graphs A Applicatio to Radom Graphs Brachig processes have a umber of iterestig ad importat applicatios. We shall cosider oe of the most famous of them, the Erdős-Réyi radom graph theory. 1 Defiitio A.1. Let

More information

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2.

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2. SAMPLE STATISTICS A radom sample x 1,x,,x from a distributio f(x) is a set of idepedetly ad idetically variables with x i f(x) for all i Their joit pdf is f(x 1,x,,x )=f(x 1 )f(x ) f(x )= f(x i ) The sample

More information

Sieve Estimators: Consistency and Rates of Convergence

Sieve Estimators: Consistency and Rates of Convergence EECS 598: Statistical Learig Theory, Witer 2014 Topic 6 Sieve Estimators: Cosistecy ad Rates of Covergece Lecturer: Clayto Scott Scribe: Julia Katz-Samuels, Brado Oselio, Pi-Yu Che Disclaimer: These otes

More information