Accuracy Assessment for High-Dimensional Linear Regression


University of Pennsylvania ScholarlyCommons, Statistics Papers, Wharton Faculty Research, 2016.

Accuracy Assessment for High-Dimensional Linear Regression

Tony Cai, University of Pennsylvania
Zijian Guo, University of Pennsylvania

Recommended citation: Cai, T., & Guo, Z. (2018). Accuracy Assessment for High-Dimensional Linear Regression. The Annals of Statistics, 46(4), 1807-1836. This paper is posted at ScholarlyCommons.


Submitted to the Annals of Statistics

ACCURACY ASSESSMENT FOR HIGH-DIMENSIONAL LINEAR REGRESSION

By T. Tony Cai and Zijian Guo, University of Pennsylvania

The research was supported in part by NSF Grants DMS and DMS, and NIH Grant R01 CA. MSC 2010 subject classifications: Primary 62G15; secondary 62C20, 62H35. Keywords and phrases: Accuracy assessment, adaptivity, confidence interval, high-dimensional linear regression, loss estimation, minimax lower bound, minimaxity, sparsity.

This paper considers point and interval estimation of the $\ell_q$ loss of an estimator in high-dimensional linear regression with random design. We establish the minimax rate for estimating the $\ell_q$ loss and the minimax expected length of confidence intervals for the $\ell_q$ loss of rate-optimal estimators of the regression vector, including commonly used estimators such as the Lasso, the scaled Lasso, the square-root Lasso and the Dantzig Selector. Adaptivity of confidence intervals for the $\ell_q$ loss is also studied. Both the setting of known identity design covariance matrix and known noise level and the setting of unknown design covariance matrix and unknown noise level are studied. The results reveal interesting and significant differences between estimating the $\ell_2$ loss and the $\ell_q$ loss with $1 \le q < 2$, as well as between the two settings. New technical tools are developed to establish rate sharp lower bounds for the minimax estimation error and the expected length of minimax and adaptive confidence intervals for the $\ell_q$ loss. A significant difference between loss estimation and traditional parameter estimation is that for loss estimation the constraint is on the performance of the estimator of the regression vector, but the lower bounds are on the difficulty of estimating its $\ell_q$ loss. The technical tools developed in this paper can also be of independent interest.

1. Introduction. In many applications, the goal of statistical inference is not only to construct a good estimator, but also to provide a measure of accuracy for this estimator. In classical statistics, when the parameter of interest is one-dimensional, this is achieved in the form of a standard error or a confidence interval. A prototypical example is inference for a binomial proportion, where often not only an estimate of the proportion but also its margin of error are given. Accuracy measures of an estimation procedure have also been used as a tool for the empirical selection of tuning parameters. A well-known example is Stein's Unbiased Risk Estimate (SURE), which has been an effective tool for the construction of data-driven adaptive estimators in normal means estimation, nonparametric signal recovery, covariance matrix estimation, and other problems.

See, for instance, [5, 1, 15, 11, 3]. The commonly used cross-validation methods can also be viewed as a useful tool based on the idea of empirical assessment of accuracy.

In this paper, we consider the problem of estimating the loss of a given estimator in the setting of high-dimensional linear regression, where one observes $(X, y)$ with $X \in \mathbb{R}^{n \times p}$ and $y \in \mathbb{R}^{n}$, and for $1 \le i \le n$, $y_i = X_{i\cdot}\beta + \epsilon_i$. Here $\beta \in \mathbb{R}^{p}$ is the regression vector, the rows $X_{i\cdot} \stackrel{iid}{\sim} N_p(0, \Sigma)$, and the errors $\epsilon_i \stackrel{iid}{\sim} N(0, \sigma^2)$ are independent of $X$. This high-dimensional linear model has been well studied in the literature, where the main focus has been on estimation of $\beta$. Several penalized/constrained $\ell_1$ minimization methods, including the Lasso [8], the Dantzig selector [1], the scaled Lasso [6] and the square-root Lasso [3], have been proposed. These methods have been shown to work well in applications and produce interpretable estimates of $\beta$ when $\beta$ is assumed to be sparse. Theoretically, with a properly chosen tuning parameter, these estimators achieve the optimal rate of convergence over collections of sparse parameter spaces. See, for example, [1, 6, 3, 3, 4, 5, 30].

For a given estimator $\hat\beta$, the $\ell_q$ loss $\|\hat\beta - \beta\|_q$ with $1 \le q \le 2$ is commonly used as a metric of accuracy for $\hat\beta$. We consider in the present paper both point and interval estimation of the $\ell_q$ loss $\|\hat\beta - \beta\|_q$ for a given $\hat\beta$. Note that the loss $\|\hat\beta - \beta\|_q$ is a random quantity, depending on both the estimator $\hat\beta$ and the parameter $\beta$. For such a random quantity, prediction and prediction intervals are usually used for point and interval estimation, respectively. However, we slightly abuse the terminology in the present paper by using estimation and confidence interval to refer to the point and interval estimators of the loss $\|\hat\beta - \beta\|_q$.

Since the $\ell_q$ loss depends on the estimator $\hat\beta$, it is necessary to specify the estimator in the discussion of loss estimation. Throughout this paper, we restrict our attention to a broad collection of estimators $\hat\beta$ that perform well at least at one interior point or on a small subset of the parameter space. This collection includes most state-of-the-art estimators such as the Lasso, the Dantzig selector, the scaled Lasso and the square-root Lasso.

High-dimensional linear regression has been well studied in two settings. One is the setting with known design covariance matrix $\Sigma = I$, known noise level $\sigma = \sigma_0$ and sparse $\beta$. See, for example, [16, 30, 7, 7, 1, 19]. Another commonly considered setting is sparse $\beta$ with unknown $\Sigma$ and $\sigma$. We study point and interval estimation of the $\ell_q$ loss $\|\hat\beta - \beta\|_q$ in both settings. Specifically, we consider the parameter space $\Theta_0(k)$ introduced in (2.3), which consists of $k$-sparse signals $\beta$ with known design covariance matrix $\Sigma = I$ and known noise level $\sigma = \sigma_0$, and $\Theta(k)$ defined in (2.4), which consists of $k$-sparse signals with unknown $\Sigma$ and $\sigma$.

1.1. Our contributions. The present paper studies the minimax and adaptive estimation of the loss $\|\hat\beta - \beta\|_q$ for a given estimator $\hat\beta$, as well as the minimax expected length and adaptivity of confidence intervals for the loss. A major step in our analysis is to establish rate sharp lower bounds for the minimax estimation error and the minimax expected length of confidence intervals for the $\ell_q$ loss over $\Theta_0(k)$ and $\Theta(k)$ for a broad class of estimators of $\beta$, which contains the subclass of rate-optimal estimators. We then focus on the estimation of the loss of rate-optimal estimators and take the Lasso and scaled Lasso estimators as generic examples. For these rate-optimal estimators, we propose procedures for point estimation as well as confidence intervals for their $\ell_q$ losses. It is shown that the proposed procedures achieve the corresponding lower bounds up to a constant factor. These results together establish the minimax rates for estimating the $\ell_q$ loss of rate-optimal estimators over $\Theta_0(k)$ and $\Theta(k)$. The analysis shows interesting and significant differences between estimating the $\ell_2$ loss and the $\ell_q$ loss with $1 \le q < 2$, as well as between the two parameter spaces $\Theta(k)$ and $\Theta_0(k)$.

The minimax rate for estimating $\|\hat\beta - \beta\|_2$ over $\Theta_0(k)$ is $\min\left\{n^{-1/4}, \sqrt{k\log p/n}\right\}\sigma_0$, and over $\Theta(k)$ it is $\sqrt{k\log p/n}\,\sigma$. So loss estimation is much easier with the prior information $\Sigma = I$ and $\sigma = \sigma_0$ when $k \gg \sqrt{n}/\log p$. The minimax rate for estimating $\|\hat\beta - \beta\|_q$ with $1 \le q < 2$ over both $\Theta_0(k)$ and $\Theta(k)$ is $k^{1/q}\sqrt{\log p/n}\,\sigma$.

In the regime $k \gg \sqrt{n}/\log p$, a practical loss estimator is proposed for estimating the $\ell_2$ loss and is shown to achieve the optimal convergence rate $n^{-1/4}\sigma_0$ adaptively over $\Theta_0(k)$. We say estimation of a loss is impossible if the minimax rate can be achieved by the trivial estimator 0, which means that the estimation accuracy of the loss is at least of the same order as the loss itself. In all other cases considered, estimation of the loss is shown to be impossible. These results indicate that loss estimation is difficult.

We then turn to the construction of confidence intervals for the $\ell_q$ loss. A confidence interval for the loss is useful even when it is impossible to estimate the loss, since it can provide non-trivial upper and lower bounds for the loss. In terms of convergence rate over $\Theta_0(k)$ or $\Theta(k)$, the minimax rate of the expected length of confidence intervals for the $\ell_q$ loss $\|\hat\beta - \beta\|_q$ of any rate-optimal estimator $\hat\beta$ coincides with the minimax estimation rate.

We also consider the adaptivity of confidence intervals for the $\ell_q$ loss of any rate-optimal estimator $\hat\beta$. The framework for adaptive confidence intervals is discussed in detail in Section 3.1. Regarding confidence intervals for the $\ell_2$ loss in the case of known $\Sigma = I$ and $\sigma = \sigma_0$, a procedure is proposed and shown to achieve the optimal length $n^{-1/4}\sigma_0$ adaptively over $\Theta_0(k)$ for $k \gg \sqrt{n}/\log p$. Furthermore, it is shown that this is the only regime where adaptive confidence intervals exist, even over two given parameter spaces. For example, when $k_1 \lesssim \sqrt{n}/\log p$ and $k_1 \ll k_2$, it is impossible to construct a confidence interval for the $\ell_2$ loss with guaranteed coverage probability over $\Theta_0(k_2)$ (and consequently also over $\Theta_0(k_1)$) whose expected length automatically adjusts to the sparsity. Similarly, for the $\ell_q$ loss with $1 \le q < 2$, construction of adaptive confidence intervals is impossible over $\Theta_0(k_1)$ and $\Theta_0(k_2)$ for $k_1 \ll k_2$. Regarding confidence intervals for the $\ell_q$ loss with $1 \le q \le 2$ in the case of unknown $\Sigma$ and $\sigma$, the impossibility of adaptivity also holds over $\Theta(k_1)$ and $\Theta(k_2)$ for $k_1 \ll k_2$.

Establishing rate-optimal lower bounds requires the development of new technical tools. One main difference between loss estimation and traditional parameter estimation is that for loss estimation the constraint is on the performance of the estimator $\hat\beta$ of the regression vector $\beta$, but the lower bound is on the difficulty of estimating its loss $\|\hat\beta - \beta\|_q$. We introduce useful new lower bound techniques for the minimax estimation error and the expected length of adaptive confidence intervals for the loss $\|\hat\beta - \beta\|_q$. In several important cases, it is necessary to test a composite null against a composite alternative in order to establish rate sharp lower bounds. The technical tools developed in this paper can also be of independent interest.

In addition to $\Theta_0(k)$ and $\Theta(k)$, we also study an intermediate parameter space where the noise level $\sigma$ is known and the design covariance matrix $\Sigma$ is unknown but of a certain structure. Lower bounds for the expected length of minimax and adaptive confidence intervals for $\|\hat\beta - \beta\|_q$ over this parameter space are established for a broad collection of estimators $\hat\beta$ and are shown to be rate sharp for the class of rate-optimal estimators. Furthermore, the lower bounds developed in this paper have wider implications. In particular, it is shown that they lead immediately to minimax lower bounds for estimating $\|\beta\|_q$ and for the expected length of confidence intervals for $\|\beta\|_q$ with $1 \le q \le 2$.

1.2. Comparison with other works. Statistical inference on the loss of specific estimators of $\beta$ has been considered in the recent literature. The papers [16, 2] established, in the setting $\Sigma = I$ and $n/p \to \delta \in (0, \infty)$, the limit of the normalized loss $\frac{1}{p}\|\hat\beta(\lambda) - \beta\|_2^2$, where $\hat\beta(\lambda)$ is the Lasso estimator with a pre-specified tuning parameter $\lambda$.

Although [16, 2] provided an exact asymptotic expression for the normalized loss, the limit itself depends on the unknown $\beta$. In a similar setting, the paper [7] established the limit of a normalized $\ell_2$ loss of the square-root Lasso estimator. These limits of the normalized losses help in understanding the properties of the corresponding estimators of $\beta$, but they do not lead to an estimate of the loss. Our results imply that although these normalized losses have a limit under certain regularity conditions, such losses cannot be estimated well in most settings. A recent paper, [20], constructed a confidence interval for $\|\hat\beta - \beta\|_2$ in the case of known $\Sigma = I$, unknown noise level $\sigma$, and moderate dimension where $n/p \to \xi \in (0, 1)$, with no sparsity assumed on $\beta$. While no sparsity assumption on $\beta$ is imposed, their method requires $\Sigma = I$ and $n/p \to \xi \in (0, 1)$. In contrast, in this paper we consider both the unknown $\Sigma$ and the known $\Sigma = I$ settings, while allowing $p \gg n$ and assuming sparse $\beta$.

Honest adaptive inference has been studied in the nonparametric function estimation literature, including [8] for adaptive confidence intervals for linear functionals, [18, 10] for adaptive confidence bands, and [9, 4] for adaptive confidence balls, and in the high-dimensional linear regression literature, including [2] for adaptive confidence sets and [7] for adaptive confidence intervals for linear functionals. In this paper, we develop new lower bound tools, Theorems 8 and 9, to establish the possibility of adaptive confidence intervals for $\|\hat\beta - \beta\|_q$. The connection between the $\ell_2$ loss considered in the current paper and the work [2] is discussed in more detail in Section 3.2.

1.3. Organization. Section 2 establishes the minimax lower bounds for estimating the loss $\|\hat\beta - \beta\|_q$ with $1 \le q \le 2$ over both $\Theta_0(k)$ and $\Theta(k)$ and shows that these bounds are rate sharp for the Lasso and scaled Lasso estimators, respectively. We then turn to interval estimation of $\|\hat\beta - \beta\|_q$. Sections 3 and 4 present the minimax and adaptive minimax lower bounds for the expected length of confidence intervals for $\|\hat\beta - \beta\|_q$ over $\Theta_0(k)$ and $\Theta(k)$. For the Lasso and scaled Lasso estimators, we show that the lower bounds can be achieved and investigate the possibility of adaptivity. Section 5 considers rate-optimal estimators and establishes the minimax convergence rates for estimating their $\ell_q$ losses. Section 6 presents new minimax lower bound techniques for estimating the loss $\|\hat\beta - \beta\|_q$. Section 7 discusses minimaxity and adaptivity in another setting, where the noise level $\sigma$ is known and the design covariance matrix $\Sigma$ is unknown but of a certain structure. Section 8 applies the newly developed lower bounds to establish lower bounds for a related problem, that of estimating $\|\beta\|_q$. Section 9 proves the main results, and additional proofs are given in the supplemental material [6].

1.4. Notation. For a matrix $X \in \mathbb{R}^{n \times p}$, $X_{i\cdot}$, $X_{\cdot j}$, and $X_{i,j}$ denote, respectively, the $i$-th row, the $j$-th column, and the $(i,j)$ entry of the matrix $X$. For a subset $J \subseteq \{1, 2, \ldots, p\}$, $|J|$ denotes the cardinality of $J$, $J^c$ denotes the complement $\{1, 2, \ldots, p\} \setminus J$, $X_J$ denotes the submatrix of $X$ consisting of the columns $X_{\cdot j}$ with $j \in J$, and for a vector $x \in \mathbb{R}^p$, $x_J$ is the subvector of $x$ with indices in $J$. For a vector $x \in \mathbb{R}^p$, $\mathrm{supp}(x)$ denotes the support of $x$ and the $\ell_q$ norm of $x$ is defined as $\|x\|_q = \left(\sum_{i=1}^p |x_i|^q\right)^{1/q}$ for $q \ge 0$, with $\|x\|_0 = |\mathrm{supp}(x)|$ and $\|x\|_\infty = \max_{1 \le j \le p} |x_j|$. For $a \in \mathbb{R}$, $a_+ = \max\{a, 0\}$. We use $\max_j \|X_{\cdot j}\|_2$ as shorthand for $\max_{1 \le j \le p} \|X_{\cdot j}\|_2$ and $\min_j \|X_{\cdot j}\|_2$ as shorthand for $\min_{1 \le j \le p} \|X_{\cdot j}\|_2$. For a matrix $A$, we define the spectral norm $\|A\|_2 = \sup_{\|x\|_2 = 1} \|Ax\|_2$ and the matrix $\ell_1$ norm $\|A\|_{L_1} = \sup_{1 \le j \le p} \sum_{i=1}^p |A_{ij}|$. For a symmetric matrix $A$, $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ denote, respectively, its smallest and largest eigenvalues. We use $c$ and $C$ to denote generic positive constants that may vary from place to place. For two positive sequences $a_n$ and $b_n$, $a_n \lesssim b_n$ means $a_n \le C b_n$ for all $n$; $a_n \gtrsim b_n$ if $b_n \lesssim a_n$; $a_n \asymp b_n$ if $a_n \lesssim b_n$ and $b_n \lesssim a_n$; $a_n \ll b_n$ if $\limsup_n a_n/b_n = 0$; and $a_n \gg b_n$ if $b_n \ll a_n$.

2. Minimax estimation of the $\ell_q$ loss. We begin by presenting the minimax framework for estimating the $\ell_q$ loss $\|\hat\beta - \beta\|_q$ of a given estimator $\hat\beta$, and then establish the minimax lower bounds on the estimation error for a broad collection of estimators $\hat\beta$. We also show that these minimax lower bounds can be achieved for the Lasso and scaled Lasso estimators.

2.1. Problem formulation. Recall the high-dimensional linear model,

(2.1)  $y_{n \times 1} = X_{n \times p}\,\beta_{p \times 1} + \epsilon_{n \times 1}, \qquad \epsilon \sim N(0, \sigma^2 I)$.

We focus on the random design with $X_{i\cdot} \stackrel{iid}{\sim} N(0, \Sigma)$, where $X_{i\cdot}$ and $\epsilon_i$ are independent. Let $Z = (X, y)$ denote the observed data and let $\hat\beta$ be a given estimator of $\beta$. Denoting by $\hat{L}_q(Z)$ any estimator of the loss $\|\hat\beta - \beta\|_q$, the minimax rate of convergence for estimating $\|\hat\beta - \beta\|_q$ over a parameter space $\Theta$ is defined as the largest quantity $\gamma_{\hat\beta, \ell_q}(\Theta)$ such that

(2.2)  $\inf_{\hat{L}_q} \sup_{\theta \in \Theta} P_\theta\left(\left|\hat{L}_q(Z) - \|\hat\beta - \beta\|_q\right| \ge \gamma_{\hat\beta, \ell_q}(\Theta)\right) \ge \delta$,

for some constant $\delta > 0$ not depending on $n$ or $p$. We shall write $\hat{L}_q$ for $\hat{L}_q(Z)$ when there is no confusion. We denote the parameter by $\theta = (\beta, \Sigma, \sigma)$, which consists of the signal $\beta$, the design covariance matrix $\Sigma$ and the noise level $\sigma$.
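To fix ideas, the following sketch (Python with numpy; all function and variable names are illustrative and not from the paper) draws data from model (2.1) with a $k$-sparse $\beta$, $\Sigma = I$ and noise level $\sigma_0$, that is, from a point of the parameter space $\Theta_0(k)$ defined below.

```python
import numpy as np

def simulate_theta0(n=200, p=500, k=5, sigma0=1.0, seed=0):
    """Draw (X, y) from model (2.1) with a k-sparse beta, Sigma = I and
    known noise level sigma0 (a point of Theta_0(k))."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(p)
    support = rng.choice(p, size=k, replace=False)
    beta[support] = rng.normal(0.0, 1.0, size=k)     # k-sparse signal
    X = rng.normal(0.0, 1.0, size=(n, p))            # rows X_i ~ N(0, I_p)
    eps = rng.normal(0.0, sigma0, size=n)            # noise independent of X
    y = X @ beta + eps
    return X, y, beta

X, y, beta = simulate_theta0()
print(X.shape, y.shape, int(np.count_nonzero(beta)))
```

Any estimator $\hat\beta(X, y)$ can then be plugged in, and the random quantity $\|\hat\beta - \beta\|_q$ is the loss whose point and interval estimation is studied below.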

For a given $\theta = (\beta, \Sigma, \sigma)$, we use $\beta(\theta)$ to denote the corresponding regression vector $\beta$. Two settings are considered: the first is known design covariance matrix $\Sigma = I$ and known noise level $\sigma = \sigma_0$, and the other is unknown $\Sigma$ and $\sigma$. In the first setting, we consider the following parameter space that consists of $k$-sparse signals,

(2.3)  $\Theta_0(k) = \{(\beta, I, \sigma_0) : \|\beta\|_0 \le k\}$,

and in the second setting we consider

(2.4)  $\Theta(k) = \left\{(\beta, \Sigma, \sigma) : \|\beta\|_0 \le k,\ \frac{1}{M_1} \le \lambda_{\min}(\Sigma) \le \lambda_{\max}(\Sigma) \le M_1,\ 0 < \sigma \le M_2\right\}$,

where $M_1 \ge 1$ and $M_2 > 0$ are constants. The parameter space $\Theta_0(k)$ is a subset of $\Theta(k)$, which consists of $k$-sparse signals with unknown $\Sigma$ and $\sigma$.

The minimax rate $\gamma_{\hat\beta, \ell_q}(\Theta)$ for estimating $\|\hat\beta - \beta\|_q$ also depends on the estimator $\hat\beta$. Different estimators $\hat\beta$ lead to different losses $\|\hat\beta - \beta\|_q$, and in general the difficulty of estimating the loss $\|\hat\beta - \beta\|_q$ varies with $\hat\beta$. We first recall the properties of some state-of-the-art estimators and then specify the collection of estimators on which we focus in this paper. As shown in [1, 4, 3, 6], the Lasso, the Dantzig Selector, the scaled Lasso and the square-root Lasso satisfy the following property if the tuning parameter is properly chosen,

(2.5)  $\sup_{\theta \in \Theta(k)} P_\theta\left(\|\hat\beta - \beta\|_q \ge C k^{1/q}\sqrt{\frac{\log p}{n}}\right) \to 0$,

where $C > 0$ is a constant. The minimax lower bounds established in [30, 3, 31] imply that $k^{1/q}\sqrt{\log p/n}$ is the optimal rate for estimating $\beta$ in the $\ell_q$ norm over the parameter space $\Theta(k)$. It should be stressed that none of these algorithms require knowledge of the sparsity $k$, and they are thus adaptive to the sparsity provided $k \lesssim n/\log p$.

We consider a broad collection of estimators $\hat\beta$ satisfying one of the following two assumptions.

(A1) The estimator $\hat\beta$ satisfies, for some $\theta_0 = (\beta, I, \sigma_0)$,

(2.6)  $P_{\theta_0}\left(\|\hat\beta - \beta\|_q \ge C \|\beta\|_0^{1/q}\sqrt{\frac{\log p}{n}}\,\sigma_0\right) \le \alpha_0$,

where $0 \le \alpha_0 < \frac{1}{4}$ and $C > 0$ are constants.

(A2) The estimator $\hat\beta$ satisfies

(2.7)  $\sup_{\{\theta = (\beta, I, \sigma) :\ \sigma \le 2\sigma_0\}} P_\theta\left(\|\hat\beta - \beta\|_q \ge C \|\beta\|_0^{1/q}\sqrt{\frac{\log p}{n}}\,\sigma\right) \le \alpha_0$,

where $0 \le \alpha_0 < \frac{1}{4}$ and $C > 0$ are constants and $\sigma_0 > 0$ is given.

In view of the minimax rate given in (2.5), Assumption (A1) requires $\hat\beta$ to be a good estimator of $\beta$ at at least one point $\theta_0 \in \Theta_0(k)$. Assumption (A2) is slightly stronger than (A1) and requires $\hat\beta$ to estimate $\beta$ well for a single $\beta$ but over a range of noise levels $\sigma \le 2\sigma_0$ while $\Sigma = I$. Of course, any estimator $\hat\beta$ satisfying (2.5) satisfies both (A1) and (A2). In addition to Assumptions (A1) and (A2), we also introduce the following sparsity assumptions, which will be used in various theorems.

(B1) The sparsity levels $k$ and $k_0$ satisfy $k \le c_0 \min\{p^{\gamma}, n/\log p\}$ for some constants $0 \le \gamma < \frac{1}{2}$ and $c_0 > 0$, and $k_0 \le c_0 \min\{k, n/\log p\}$.

(B2) The sparsity levels $k_1$, $k_2$ and $k_0$ satisfy $k_1 \le k_2 \le c_0 \min\{p^{\gamma}, n/\log p\}$ for some constants $0 \le \gamma < \frac{1}{2}$ and $c_0 > 0$, and $k_0 \le c_0 \min\{k_1, n/\log p\}$.

2.2. Minimax estimation of the $\ell_q$ loss over $\Theta_0(k)$. The following theorem establishes the minimax lower bounds for estimating the loss $\|\hat\beta - \beta\|_q$ over the parameter space $\Theta_0(k)$.

Theorem 1. Suppose that the sparsity levels $k$ and $k_0$ satisfy Assumption (B1). For any estimator $\hat\beta$ satisfying Assumption (A1) with $\|\hat\beta\|_0 \le k_0$,

(2.8)  $\inf_{\hat{L}_2} \sup_{\theta \in \Theta_0(k)} P_\theta\left(\left|\hat{L}_2 - \|\hat\beta - \beta\|_2\right| \ge c \min\left\{\sqrt{\frac{k \log p}{n}},\ \frac{1}{n^{1/4}}\right\}\sigma_0\right) \ge \delta$.

For any estimator $\hat\beta$ satisfying Assumption (A2) with $\|\hat\beta\|_0 \le k_0$,

(2.9)  $\inf_{\hat{L}_q} \sup_{\theta \in \Theta_0(k)} P_\theta\left(\left|\hat{L}_q - \|\hat\beta - \beta\|_q\right| \ge c\, k^{1/q}\sqrt{\frac{\log p}{n}}\,\sigma_0\right) \ge \delta$, for $1 \le q < 2$,

where $\delta > 0$ and $c > 0$ are constants.

Remark 1. Assumption (A1) restricts our focus to estimators that can perform well at at least one point $(\beta, I, \sigma_0) \in \Theta_0(k)$. This weak condition makes the established lower bounds widely applicable as a benchmark for evaluating estimators of the $\ell_q$ loss of any $\hat\beta$ that performs well on a proper subset, or even at a single point, of the whole parameter space. In this paper we focus on estimating the loss $\|\hat\beta - \beta\|_q$ with $1 \le q \le 2$. Similar results can be established for the loss in the form $\|\hat\beta - \beta\|_q^q$ with $1 \le q \le 2$: under the same assumptions as those in Theorem 1, the lower bounds for estimating the loss $\|\hat\beta - \beta\|_q^q$ hold with the convergence rates replaced by their $q$-th powers; for instance, the convergence rate $k^{1/q}\sqrt{\log p/n}\,\sigma_0$ in (2.9) is replaced by $k\left(\frac{\log p}{n}\right)^{q/2}\sigma_0^q$.

Similarly, all the results established in the rest of the paper for $\|\hat\beta - \beta\|_q$ hold for $\|\hat\beta - \beta\|_q^q$ with the corresponding convergence rates replaced by their $q$-th powers.

Theorem 1 establishes the minimax lower bounds for estimating the $\ell_2$ loss $\|\hat\beta - \beta\|_2$ of any estimator $\hat\beta$ satisfying Assumption (A1) and the $\ell_q$ loss $\|\hat\beta - \beta\|_q$ with $1 \le q < 2$ of any estimator $\hat\beta$ satisfying Assumption (A2). We take the Lasso estimator as an example and demonstrate the implications of the theorem. We randomly split $Z = (y, X)$ into subsamples $Z^{(1)} = (y^{(1)}, X^{(1)})$ and $Z^{(2)} = (y^{(2)}, X^{(2)})$ with sample sizes $n_1$ and $n_2$, respectively. The Lasso estimator $\hat\beta_L$ based on the first subsample $Z^{(1)} = (y^{(1)}, X^{(1)})$ is defined as

(2.10)  $\hat\beta_L = \arg\min_{\beta \in \mathbb{R}^p} \frac{\|y^{(1)} - X^{(1)}\beta\|_2^2}{2 n_1} + \lambda \sum_{j=1}^{p} \frac{\|X^{(1)}_{\cdot j}\|_2}{\sqrt{n_1}}\,|\beta_j|$,

where $\lambda = A\sqrt{\log p / n_1}\,\sigma_0$ with $A > \sqrt{2}$ being a pre-specified constant. Without loss of generality, we assume $n_1 \asymp n_2 \asymp n$.

For the case $1 \le q < 2$, (2.5) and (2.9) together imply that estimation of the $\ell_q$ loss $\|\hat\beta_L - \beta\|_q$ is impossible, since the lower bound can be achieved by the trivial estimator of the loss, 0. That is, $\sup_{\theta \in \Theta_0(k)} P_\theta\left(\left|0 - \|\hat\beta_L - \beta\|_q\right| \ge C k^{1/q}\sqrt{\log p/n}\,\sigma_0\right) \to 0$. For the case $q = 2$, in the regime $k \lesssim \sqrt{n}/\log p$ the lower bound in (2.8) can again be achieved by the zero estimator and hence estimation of the loss $\|\hat\beta_L - \beta\|_2$ is impossible. The interesting case is $k \gg \sqrt{n}/\log p$, where the loss estimator $\hat{L}_2$ proposed in (2.11) achieves the minimax lower bound $n^{-1/4}\sigma_0$ in (2.8), which cannot be achieved by the zero estimator.

We now detail the construction of the loss estimator $\hat{L}_2$. Based on the second half of the sample, $Z^{(2)} = (y^{(2)}, X^{(2)})$, we propose the following estimator,

(2.11)  $\hat{L}_2 = \left(\frac{1}{n_2}\left\|y^{(2)} - X^{(2)}\hat\beta_L\right\|_2^2 - \sigma_0^2\right)_+^{1/2}$.

Note that the first subsample $Z^{(1)} = (y^{(1)}, X^{(1)})$ is used to produce the Lasso estimator $\hat\beta_L$ in (2.10), and the second subsample $Z^{(2)} = (y^{(2)}, X^{(2)})$ is retained to evaluate the loss $\|\hat\beta_L - \beta\|_2$. Such a sample splitting technique is similar to cross-validation and has been used in [2] for constructing confidence sets for $\beta$ and in [20] for confidence intervals for the $\ell_2$ loss.
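The split-sample construction (2.10)-(2.11) is straightforward to implement. The sketch below (Python; scikit-learn's Lasso is used as a stand-in for the first-stage estimator, and the tuning constant A is an illustrative choice rather than the paper's exact calibration) fits $\hat\beta_L$ on one half of the data and evaluates $\hat{L}_2$ on the other half.

```python
import numpy as np
from sklearn.linear_model import Lasso

def split_sample_loss_estimate(X, y, sigma0, A=2.0, seed=0):
    """Split-sample estimate of the l2 loss of a first-stage Lasso fit,
    in the spirit of (2.10)-(2.11); constants are illustrative."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.permutation(n)
    i1, i2 = idx[: n // 2], idx[n // 2:]
    X1, y1, X2, y2 = X[i1], y[i1], X[i2], y[i2]

    # First-stage Lasso on Z^(1) with lambda ~ A * sigma0 * sqrt(log p / n1).
    n1, p = X1.shape
    lam = A * sigma0 * np.sqrt(np.log(p) / n1)
    beta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X1, y1).coef_

    # Plug-in loss estimate from the held-out half Z^(2), as in (2.11).
    n2 = X2.shape[0]
    resid_ms = np.sum((y2 - X2 @ beta_hat) ** 2) / n2
    loss_hat = np.sqrt(max(resid_ms - sigma0 ** 2, 0.0))
    return beta_hat, loss_hat

# Toy usage on data simulated as in Section 2.1.
rng = np.random.default_rng(1)
n, p, k, sigma0 = 400, 800, 10, 1.0
beta = np.zeros(p); beta[:k] = 1.0
X = rng.normal(size=(n, p)); y = X @ beta + sigma0 * rng.normal(size=n)
beta_hat, loss_hat = split_sample_loss_estimate(X, y, sigma0)
print(loss_hat, np.linalg.norm(beta_hat - beta))
```

The held-out residual mean square concentrates around $\|\hat\beta_L - \beta\|_2^2 + \sigma_0^2$ when $\Sigma = I$, which is why subtracting the known $\sigma_0^2$ and taking the positive part yields an estimate of the loss.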

The following proposition establishes that the estimator $\hat{L}_2$ achieves the minimax lower bound of (2.8) over the regime $k \gg \sqrt{n}/\log p$.

Proposition 1. Suppose that $k \lesssim n/\log p$ and $\hat\beta_L$ is the Lasso estimator defined in (2.10) with $A > \sqrt{2}$. Then the loss estimator proposed in (2.11) satisfies, for any sequence $\delta_{n,p} \to \infty$,

(2.12)  $\limsup_{n,p \to \infty} \sup_{\theta \in \Theta_0(k)} P_\theta\left(\left|\hat{L}_2 - \|\hat\beta_L - \beta\|_2\right| \ge \frac{1}{n^{1/4}}\,\sigma_0\,\delta_{n,p}\right) = 0$.

2.3. Minimax estimation of the $\ell_q$ loss over $\Theta(k)$. We now turn to the case of unknown $\Sigma$ and $\sigma$ and establish the minimax lower bound for estimating the $\ell_q$ loss over the parameter space $\Theta(k)$.

Theorem 2. Suppose that the sparsity levels $k$ and $k_0$ satisfy Assumption (B1). For any estimator $\hat\beta$ satisfying Assumption (A1) with $\|\hat\beta\|_0 \le k_0$,

(2.13)  $\inf_{\hat{L}_q} \sup_{\theta \in \Theta(k)} P_\theta\left(\left|\hat{L}_q - \|\hat\beta - \beta\|_q\right| \ge c\, k^{1/q}\sqrt{\frac{\log p}{n}}\right) \ge \delta$, for $1 \le q \le 2$,

where $\delta > 0$ and $c > 0$ are constants.

Theorem 2 provides a minimax lower bound for estimating the $\ell_q$ loss of any estimator $\hat\beta$ satisfying Assumption (A1), including the scaled Lasso estimator defined as

(2.14)  $\{\hat\beta_{SL}, \hat\sigma\} = \arg\min_{\beta \in \mathbb{R}^p,\ \sigma \in \mathbb{R}^+} \frac{\|y - X\beta\|_2^2}{2 n \sigma} + \frac{\sigma}{2} + \lambda_0 \sum_{j=1}^{p} \frac{\|X_{\cdot j}\|_2}{\sqrt{n}}\,|\beta_j|$,

where $\lambda_0 = A\sqrt{\log p / n}$ with $A > \sqrt{2}$. Note that for the scaled Lasso estimator, the lower bound in (2.13) can be achieved by the trivial loss estimator 0, in the sense that $\sup_{\theta \in \Theta(k)} P_\theta\left(\left|0 - \|\hat\beta_{SL} - \beta\|_q\right| \ge C k^{1/q}\sqrt{\log p/n}\right) \to 0$, and hence estimation of the loss is impossible in this case.
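The scaled Lasso (2.14) is jointly convex in $(\beta, \sigma)$ and can be computed by alternating a Lasso step, whose penalty level is proportional to the current noise estimate, with the closed-form update $\sigma \leftarrow \|y - X\beta\|_2/\sqrt{n}$. The following is a minimal sketch of that iteration (Python; the initialization, stopping rule and use of scikit-learn are simplifying assumptions, not necessarily the implementation used in the scaled Lasso literature).

```python
import numpy as np
from sklearn.linear_model import Lasso

def scaled_lasso(X, y, A=2.0, n_iter=30, tol=1e-6):
    """Iterative sketch of the scaled Lasso (2.14): alternate a Lasso fit,
    with penalty proportional to the current sigma, and the noise-level
    update sigma = ||y - X beta||_2 / sqrt(n)."""
    n, p = X.shape
    lam0 = A * np.sqrt(np.log(p) / n)
    sigma = np.std(y)                      # crude initial noise level
    beta = np.zeros(p)
    for _ in range(n_iter):
        beta = Lasso(alpha=lam0 * sigma, fit_intercept=False).fit(X, y).coef_
        sigma_new = np.linalg.norm(y - X @ beta) / np.sqrt(n)
        if abs(sigma_new - sigma) <= tol * max(sigma, 1e-12):
            sigma = sigma_new
            break
        sigma = sigma_new
    return beta, sigma
```

Minimizing (2.14) over $\sigma$ for fixed $\beta$ gives exactly $\sigma = \|y - X\beta\|_2/\sqrt{n}$, and minimizing over $\beta$ for fixed $\sigma$ is a Lasso problem with penalty level $\lambda_0\sigma$, which is why the alternation decreases the objective at every step.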

3. Minimaxity and adaptivity of confidence intervals over $\Theta_0(k)$. We focused in the last section on point estimation of the $\ell_q$ loss and showed the impossibility of loss estimation except in one regime. These results naturally lead to another question: is it possible to construct useful confidence intervals for $\|\hat\beta - \beta\|_q$ that provide non-trivial upper and lower bounds for the loss? In this section, after introducing the framework for minimaxity and adaptivity of confidence intervals, we consider the case of known $\Sigma = I$ and $\sigma = \sigma_0$ and establish the minimaxity and adaptivity lower bounds for the expected length of confidence intervals for the $\ell_q$ loss of a broad collection of estimators over the parameter space $\Theta_0(k)$. We also show that these minimax lower bounds can be achieved for the Lasso estimator and then discuss the possibility of adaptivity, using the Lasso estimator as an example. The case of unknown $\Sigma$ and $\sigma$ will be the focus of the next section.

3.1. Framework for minimaxity and adaptivity of confidence intervals. We introduce the following decision-theoretic framework for confidence intervals for the loss $\|\hat\beta - \beta\|_q$. Given $0 < \alpha < 1$, the parameter space $\Theta$ and the loss $\|\hat\beta - \beta\|_q$, denote by $\mathcal{I}_\alpha(\Theta, \hat\beta, \ell_q)$ the set of all $(1-\alpha)$-level confidence intervals for $\|\hat\beta - \beta\|_q$ over $\Theta$,

(3.1)  $\mathcal{I}_\alpha(\Theta, \hat\beta, \ell_q) = \left\{\mathrm{CI}_\alpha(\hat\beta, \ell_q, Z) = [l(Z), u(Z)] : \inf_{\theta \in \Theta} P_\theta\left(\|\hat\beta - \beta(\theta)\|_q \in \mathrm{CI}_\alpha(\hat\beta, \ell_q, Z)\right) \ge 1 - \alpha\right\}$.

We will write $\mathrm{CI}_\alpha$ for $\mathrm{CI}_\alpha(\hat\beta, \ell_q, Z)$ when there is no confusion. For any confidence interval $\mathrm{CI}_\alpha(\hat\beta, \ell_q, Z) = [l(Z), u(Z)]$, its length is denoted by $L(\mathrm{CI}_\alpha(\hat\beta, \ell_q, Z)) = u(Z) - l(Z)$, and the maximum expected length over a parameter space $\Theta_1$ is defined as

(3.2)  $L\left(\mathrm{CI}_\alpha(\hat\beta, \ell_q, Z), \Theta_1\right) = \sup_{\theta \in \Theta_1} E_\theta\, L\left(\mathrm{CI}_\alpha(\hat\beta, \ell_q, Z)\right)$.

For two nested parameter spaces $\Theta_1 \subseteq \Theta_2$, we define the benchmark $L_\alpha$ measuring the degree of adaptivity over the nested spaces $\Theta_1 \subseteq \Theta_2$,

(3.3)  $L_\alpha(\Theta_1, \Theta_2, \hat\beta, \ell_q) = \inf_{\mathrm{CI}_\alpha(\hat\beta, \ell_q, Z) \in \mathcal{I}_\alpha(\Theta_2, \hat\beta, \ell_q)}\ \sup_{\theta \in \Theta_1} E_\theta\, L\left(\mathrm{CI}_\alpha(\hat\beta, \ell_q, Z)\right)$.

We will write $L_\alpha(\Theta_1, \hat\beta, \ell_q)$ for $L_\alpha(\Theta_1, \Theta_1, \hat\beta, \ell_q)$, which is the minimax expected length of confidence intervals for $\|\hat\beta - \beta\|_q$ over $\Theta_1$. The benchmark $L_\alpha(\Theta_1, \Theta_2, \hat\beta, \ell_q)$ is the infimum of the maximum expected length over $\Theta_1$ among all $(1-\alpha)$-level confidence intervals over $\Theta_2$; in contrast, $L_\alpha(\Theta_1, \hat\beta, \ell_q)$ considers all $(1-\alpha)$-level confidence intervals over $\Theta_1$ only. In words, if there is prior information that the parameter lies in the smaller parameter space $\Theta_1$, then $L_\alpha(\Theta_1, \hat\beta, \ell_q)$ measures the benchmark length of confidence intervals over $\Theta_1$, as illustrated in the left panel of Figure 1; if there is only prior information that the parameter lies in the larger parameter space $\Theta_2$, then $L_\alpha(\Theta_1, \Theta_2, \hat\beta, \ell_q)$ measures the benchmark length of confidence intervals over $\Theta_1$, as illustrated in the right panel of Figure 1.

Figure 1: Illustration of the definitions of $L_\alpha(\Theta_1, \hat\beta, \ell_q)$ and $L_\alpha(\Theta_1, \Theta_2, \hat\beta, \ell_q)$.

Rigorously, we define a confidence interval CI to be simultaneously adaptive over $\Theta_1$ and $\Theta_2$ if $\mathrm{CI} \in \mathcal{I}_\alpha(\Theta_2, \hat\beta, \ell_q)$ and

(3.4)  $L(\mathrm{CI}, \Theta_1) \asymp L_\alpha(\Theta_1, \hat\beta, \ell_q)$ and $L(\mathrm{CI}, \Theta_2) \asymp L_\alpha(\Theta_2, \hat\beta, \ell_q)$.

Condition (3.4) means that the confidence interval CI, which has coverage over the larger parameter space $\Theta_2$, achieves the minimax rate over both $\Theta_1$ and $\Theta_2$. Note that $L(\mathrm{CI}, \Theta_1) \ge L_\alpha(\Theta_1, \Theta_2, \hat\beta, \ell_q)$. If $L_\alpha(\Theta_1, \Theta_2, \hat\beta, \ell_q) \gg L_\alpha(\Theta_1, \hat\beta, \ell_q)$, then the rate-optimal adaptation (3.4) is impossible to achieve for $\Theta_1 \subsetneq \Theta_2$; otherwise, it is possible to construct confidence intervals simultaneously adaptive over the parameter spaces $\Theta_1$ and $\Theta_2$. The possibility of adaptation over $\Theta_1$ and $\Theta_2$ can thus be answered by investigating the benchmark quantities $L_\alpha(\Theta_1, \hat\beta, \ell_q)$ and $L_\alpha(\Theta_1, \Theta_2, \hat\beta, \ell_q)$. Such a framework has already been introduced in [7], which studies the minimaxity and adaptivity of confidence intervals for linear functionals in high-dimensional linear regression.

We adopt the minimax and adaptation framework discussed above and establish the minimax expected length $L_\alpha(\Theta_0(k), \hat\beta, \ell_q)$ and the adaptation benchmark $L_\alpha(\Theta_0(k_1), \Theta_0(k_2), \hat\beta, \ell_q)$. In terms of the minimax expected length and the adaptivity behavior, there exist fundamental differences between the case $q = 2$ and the case $1 \le q < 2$. We discuss them separately in the following two subsections.

3.2. Confidence intervals for the $\ell_2$ loss over $\Theta_0(k)$. The following theorem establishes the minimax lower bound for the expected length of confidence intervals for $\|\hat\beta - \beta\|_2$ over the parameter space $\Theta_0(k)$.

Theorem 3. Suppose that $0 < \alpha < \frac{1}{4}$ and the sparsity levels $k$ and $k_0$ satisfy Assumption (B1). For any estimator $\hat\beta$ satisfying Assumption (A1) with $\|\hat\beta\|_0 \le k_0$, there is some constant $c > 0$ such that

(3.5)  $L_\alpha(\Theta_0(k), \hat\beta, \ell_2) \ge c \min\left\{\sqrt{\frac{k \log p}{n}},\ \frac{1}{n^{1/4}}\right\}\sigma_0$.

In particular, if $\hat\beta_L$ is the Lasso estimator defined in (2.10) with $A > \sqrt{2}$, then the minimax expected length of $(1-\alpha)$-level confidence intervals for $\|\hat\beta_L - \beta\|_2$ over $\Theta_0(k)$ is

(3.6)  $L_\alpha(\Theta_0(k), \hat\beta_L, \ell_2) \asymp \min\left\{\sqrt{\frac{k \log p}{n}},\ \frac{1}{n^{1/4}}\right\}\sigma_0$.

We now consider adaptivity of confidence intervals for the $\ell_2$ loss. The following theorem gives the lower bound for the benchmark $L_\alpha(\Theta_0(k_1), \Theta_0(k_2), \hat\beta, \ell_2)$. We then discuss Theorems 3 and 4 together.

Theorem 4. Suppose that $0 < \alpha < \frac{1}{4}$ and the sparsity levels $k_1$, $k_2$ and $k_0$ satisfy Assumption (B2). For any estimator $\hat\beta$ satisfying Assumption (A1) with $\|\hat\beta\|_0 \le k_0$, there is some constant $c > 0$ such that

(3.7)  $L_\alpha(\Theta_0(k_1), \Theta_0(k_2), \hat\beta, \ell_2) \ge c \min\left\{\sqrt{\frac{k_2 \log p}{n}},\ \frac{1}{n^{1/4}}\right\}\sigma_0$.

In particular, if $\hat\beta_L$ is the Lasso estimator defined in (2.10) with $A > \sqrt{2}$, the above lower bound can be achieved.

The lower bound established in Theorem 4 implies that of Theorem 3, and both lower bounds hold for a general class of estimators satisfying Assumption (A1). There is a phase transition in the lower bound for the benchmark $L_\alpha(\Theta_0(k_1), \Theta_0(k_2), \hat\beta, \ell_2)$: in the regime $k_2 \lesssim \sqrt{n}/\log p$, the lower bound in (3.7) is $\sqrt{k_2 \log p/n}\,\sigma_0$; when $k_2 \gg \sqrt{n}/\log p$, the lower bound in (3.7) is $n^{-1/4}\sigma_0$. For the Lasso estimator $\hat\beta_L$ defined in (2.10), the lower bounds $\sqrt{k \log p/n}\,\sigma_0$ in (3.5) and $\sqrt{k_2 \log p/n}\,\sigma_0$ in (3.7) can be achieved by the confidence intervals $\mathrm{CI}^0_\alpha(Z, k, 2)$ and $\mathrm{CI}^0_\alpha(Z, k_2, 2)$ defined in (3.15), respectively. Applying an idea similar to (2.11), we show that the minimax lower bound $n^{-1/4}\sigma_0$ in (3.6) and (3.7) can be achieved by the following confidence interval,

(3.8)  $\mathrm{CI}^1_\alpha(Z) = \left[\left(\frac{n_2\,\psi(Z)}{\chi^2_{1-\alpha/2}(n_2)} - \sigma_0^2\right)_+^{1/2},\ \left(\frac{n_2\,\psi(Z)}{\chi^2_{\alpha/2}(n_2)} - \sigma_0^2\right)_+^{1/2}\right]$,

where $\chi^2_{1-\alpha/2}(n_2)$ and $\chi^2_{\alpha/2}(n_2)$ are the $1-\alpha/2$ and $\alpha/2$ quantiles of a $\chi^2$ random variable with $n_2$ degrees of freedom, respectively, and

(3.9)  $\psi(Z) = \min\left\{\frac{1}{n_2}\left\|y^{(2)} - X^{(2)}\hat\beta_L\right\|_2^2,\ 2\sigma_0^2\right\}$.
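The interval (3.8) requires only the held-out residuals and chi-square quantiles, since, conditionally on $\hat\beta_L$ and with $\Sigma = I$, $\|y^{(2)} - X^{(2)}\hat\beta_L\|_2^2$ is distributed as $(\|\hat\beta_L - \beta\|_2^2 + \sigma_0^2)$ times a $\chi^2_{n_2}$ variable. The sketch below (Python with scipy) implements this computation; the truncation level $2\sigma_0^2$ in `psi` follows the reconstruction of (3.9) above and should be treated as an assumption.

```python
import numpy as np
from scipy.stats import chi2

def l2_loss_ci(X2, y2, beta_hat, sigma0, alpha=0.05, trunc=2.0):
    """Two-sided confidence interval for the l2 loss ||beta_hat - beta||_2
    in the spirit of (3.8)-(3.9), using the held-out sample (X2, y2) and the
    known noise level sigma0.  `trunc * sigma0**2` is the assumed truncation."""
    n2 = X2.shape[0]
    resid_ms = np.sum((y2 - X2 @ beta_hat) ** 2) / n2
    psi = min(resid_ms, trunc * sigma0 ** 2)          # psi(Z) of (3.9)
    hi_q = chi2.ppf(1 - alpha / 2, df=n2)             # upper chi-square quantile
    lo_q = chi2.ppf(alpha / 2, df=n2)                 # lower chi-square quantile
    lower = np.sqrt(max(n2 * psi / hi_q - sigma0 ** 2, 0.0))
    upper = np.sqrt(max(n2 * psi / lo_q - sigma0 ** 2, 0.0))
    return lower, upper
```

In simulations one can check the empirical coverage of $\|\hat\beta_L - \beta\|_2$ by repeating the data generation, sample splitting, Lasso fit and interval computation many times and recording how often the loss falls inside the interval.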

Note that the two-sided confidence interval (3.8) is based only on the observed data $Z$ and does not depend on any prior knowledge of the sparsity $k$. Furthermore, it is a two-sided confidence interval, which provides not only an upper bound but also a lower bound for the loss. The coverage property and the expected length of $\mathrm{CI}^1_\alpha(Z)$ are established in the following proposition.

Proposition 2. Suppose $k \lesssim n/\log p$ and $\hat\beta_L$ is the estimator defined in (2.10) with $A > \sqrt{2}$. Then $\mathrm{CI}^1_\alpha(Z)$ defined in (3.8) satisfies

(3.10)  $\liminf_{n,p \to \infty}\ \inf_{\theta \in \Theta_0(k)} P_\theta\left(\|\hat\beta_L - \beta\|_2 \in \mathrm{CI}^1_\alpha(Z)\right) \ge 1 - \alpha$,

and

(3.11)  $L\left(\mathrm{CI}^1_\alpha(Z), \Theta_0(k)\right) \lesssim \frac{1}{n^{1/4}}\,\sigma_0$.

Figure 2: Illustration of $L_\alpha(\Theta_0(k_1), \hat\beta_L, \ell_2)$ (top) and $L_\alpha(\Theta_0(k_1), \Theta_0(k_2), \hat\beta_L, \ell_2)$ (bottom) over the regimes $k_1 \le k_2 \lesssim \sqrt{n}/\log p$ (leftmost), $k_1 \lesssim \sqrt{n}/\log p \lesssim k_2$ (middle) and $\sqrt{n}/\log p \lesssim k_1 \le k_2$ (rightmost).

For the Lasso estimator $\hat\beta_L$ defined in (2.10), we now discuss the possibility of adaptivity of confidence intervals for $\|\hat\beta_L - \beta\|_2$. The adaptivity behavior is illustrated in Figure 2. As illustrated in the rightmost plot of Figure 2, in the regime $\sqrt{n}/\log p \lesssim k_1 \le k_2$ we obtain $L_\alpha(\Theta_0(k_1), \Theta_0(k_2), \hat\beta_L, \ell_2) \asymp L_\alpha(\Theta_0(k_1), \hat\beta_L, \ell_2) \asymp n^{-1/4}\sigma_0$, which implies that adaptation is possible over this regime.

As shown in Proposition 2, the confidence interval $\mathrm{CI}^1_\alpha(Z)$ defined in (3.8) is fully adaptive over the regime $k \gg \sqrt{n}/\log p$ in the sense of (3.4). As illustrated in the leftmost and middle plots of Figure 2, it is impossible to construct an adaptive confidence interval for $\|\hat\beta_L - \beta\|_2$ over the regimes $k_1 \le k_2 \lesssim \sqrt{n}/\log p$ and $k_1 \lesssim \sqrt{n}/\log p \lesssim k_2$, since $L_\alpha(\Theta_0(k_1), \Theta_0(k_2), \hat\beta_L, \ell_2) \gg L_\alpha(\Theta_0(k_1), \hat\beta_L, \ell_2)$ whenever $k_1 \lesssim \sqrt{n}/\log p$ and $k_1 \ll k_2$. To sum up, adaptive confidence intervals for $\|\hat\beta_L - \beta\|_2$ are only possible over the regime $k \gg \sqrt{n}/\log p$.

Comparison with confidence balls. We should note that the problem of constructing confidence intervals for $\|\hat\beta - \beta\|_2$ is related to, but different from, that of constructing confidence sets for $\beta$ itself. The confidence balls constructed in [2] are of the form $\left\{\beta : \|\hat\beta - \beta\|_2^2 \le \hat{u}(Z)\right\}$, where $\hat\beta$ can be the Lasso estimator and $\hat{u}(Z)$ is a data-dependent squared radius. See [2] for further details. A naive application of this confidence ball leads to a one-sided confidence interval for the loss $\|\hat\beta - \beta\|_2$,

(3.12)  $\mathrm{CI}^{\mathrm{induced}}_\alpha(Z) = \left[0,\ \sqrt{\hat{u}(Z)}\right]$.

Because confidence sets for $\beta$ were the goal of Theorem 1 in [2], confidence sets of the form $\left\{\beta : \|\hat\beta - \beta\|_2^2 \le \hat{u}(Z)\right\}$ suffice to achieve the optimal length there. However, since our goal is to characterize $\|\hat\beta - \beta\|_2$, we apply the unbiased risk estimation idea discussed in Theorem 1 of [2] and construct the two-sided confidence interval in (3.8). Such a two-sided confidence interval is more informative than the one-sided interval (3.12), since the one-sided interval does not reveal whether the loss is close to zero or not. Furthermore, as shown in [2], the length of the confidence interval $\mathrm{CI}^{\mathrm{induced}}_\alpha(Z)$ over the parameter space $\Theta_0(k)$ is of order $n^{-1/4}\sigma_0 + \sqrt{k\log p/n}\,\sigma_0$. The two-sided confidence interval $\mathrm{CI}^1_\alpha(Z)$ constructed in (3.8) has expected length of order $n^{-1/4}\sigma_0$, which is much shorter than $n^{-1/4}\sigma_0 + \sqrt{k\log p/n}\,\sigma_0$ in the regime $k \gg \sqrt{n}/\log p$. That is, the two-sided confidence interval (3.8) provides a more accurate interval estimator of the $\ell_2$ loss. This is illustrated in Figure 3.

The lower bound technique developed in the literature on adaptive confidence sets [2] can also be used to establish some of the lower bound results for the case $q = 2$ given in the present paper. However, new techniques are needed in order to establish the rate sharp lower bounds for the minimax estimation error (2.9) in the region $k \gg \sqrt{n}/\log p$ and for the expected length of the confidence intervals (3.18) and (7.3) in the region $k_1 \lesssim \sqrt{n}/\log p \lesssim k_2$, where it is necessary to test a composite null against a composite alternative in order to establish rate sharp lower bounds.

Figure 3: Comparison of the two-sided confidence interval $\mathrm{CI}^1_\alpha(Z)$ with the one-sided confidence interval $\mathrm{CI}^{\mathrm{induced}}_\alpha(Z)$.

3.3. Confidence intervals for the $\ell_q$ loss with $1 \le q < 2$ over $\Theta_0(k)$. We now consider the case $1 \le q < 2$ and investigate the minimax expected length and adaptivity of confidence intervals for $\|\hat\beta - \beta\|_q$ over the parameter space $\Theta_0(k)$. The following theorem characterizes the minimax convergence rate for the expected length of confidence intervals.

Theorem 5. Suppose that $0 < \alpha < \frac{1}{4}$, $1 \le q < 2$ and the sparsity levels $k$ and $k_0$ satisfy Assumption (B1). For any estimator $\hat\beta$ satisfying Assumption (A2) with $\|\hat\beta\|_0 \le k_0$, there is some constant $c > 0$ such that

(3.13)  $L_\alpha(\Theta_0(k), \hat\beta, \ell_q) \ge c\, k^{1/q}\sqrt{\frac{\log p}{n}}\,\sigma_0$.

In particular, if $\hat\beta_L$ is the Lasso estimator defined in (2.10) with $A > 4$, then the minimax expected length of $(1-\alpha)$-level confidence intervals for $\|\hat\beta_L - \beta\|_q$ over $\Theta_0(k)$ is

(3.14)  $L_\alpha(\Theta_0(k), \hat\beta_L, \ell_q) \asymp k^{1/q}\sqrt{\frac{\log p}{n}}\,\sigma_0$.

We now construct a confidence interval achieving the minimax convergence rate in (3.14),

(3.15)  $\mathrm{CI}^0_\alpha(Z, k, q) = \left[0,\ C(A, k)\, k^{1/q}\sqrt{\frac{\log p}{n}}\,\sigma_0\right]$,

where $C(A, k) > 0$ is an explicit constant depending only on $A$ and $k$. The following proposition establishes the coverage property and the expected length of $\mathrm{CI}^0_\alpha(Z, k, q)$.
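Unlike (3.8), the interval (3.15) uses the assumed sparsity level $k$ and a theory-driven constant rather than the data. A trivial sketch follows, with the constant $C(A, k)$ replaced by a user-supplied placeholder `C_Ak` (an assumption, since the exact constant is not reproduced here).

```python
import numpy as np

def one_sided_lq_ci(n, p, k, q, sigma0, C_Ak=1.0):
    """One-sided interval [0, C(A,k) * k^(1/q) * sqrt(log p / n) * sigma0]
    as in (3.15); C_Ak stands in for the explicit constant C(A, k)."""
    upper = C_Ak * k ** (1.0 / q) * np.sqrt(np.log(p) / n) * sigma0
    return 0.0, upper

print(one_sided_lq_ci(n=400, p=800, k=10, q=1, sigma0=1.0))
```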

Proposition 3. Suppose $k \lesssim n/\log p$ and $\hat\beta_L$ is the estimator defined in (2.10) with $A > 4$. For $1 \le q \le 2$, the confidence interval $\mathrm{CI}^0_\alpha(Z, k, q)$ defined in (3.15) satisfies

(3.16)  $\liminf_{n,p \to \infty}\ \inf_{\theta \in \Theta_0(k)} P_\theta\left(\|\hat\beta_L - \beta\|_q \in \mathrm{CI}^0_\alpha(Z, k, q)\right) = 1$,

and

(3.17)  $L\left(\mathrm{CI}^0_\alpha(Z, k, q), \Theta_0(k)\right) \lesssim k^{1/q}\sqrt{\frac{\log p}{n}}\,\sigma_0$.

In particular, for the case $q = 2$, (3.16) and (3.17) also hold for the estimator $\hat\beta_L$ defined in (2.10) with $A > \sqrt{2}$.

This result shows that the confidence interval $\mathrm{CI}^0_\alpha(Z, k, q)$ achieves the minimax rate given in (3.14). In contrast with the $\ell_2$ loss, where the two-sided confidence interval (3.8) is significantly shorter than the one-sided interval and achieves the optimal rate over the regime $k \gg \sqrt{n}/\log p$, for the $\ell_q$ loss with $1 \le q < 2$ the one-sided confidence interval achieves the optimal rate given in (3.14).

We now consider adaptivity of confidence intervals. The following theorem establishes the lower bounds for $L_\alpha(\Theta_0(k_1), \Theta_0(k_2), \hat\beta, \ell_q)$ with $1 \le q < 2$.

Theorem 6. Suppose $0 < \alpha < \frac{1}{4}$, $1 \le q < 2$ and the sparsity levels $k_1$, $k_2$ and $k_0$ satisfy Assumption (B2). For any estimator $\hat\beta$ satisfying Assumption (A2) with $\|\hat\beta\|_0 \le k_0$, there is some constant $c > 0$ such that

(3.18)  $L_\alpha(\Theta_0(k_1), \Theta_0(k_2), \hat\beta, \ell_q) \ge \begin{cases} c\, k_2^{\frac{1}{q}-\frac{1}{2}}\sqrt{\frac{k_1\log p}{n}}\,\sigma_0 & \text{if } k_1 \gtrsim \frac{\sqrt{n}}{\log p}; \\[4pt] c\, k_2^{\frac{1}{q}-\frac{1}{2}}\left(\frac{1}{n}\right)^{1/4}\sigma_0 & \text{if } k_1 \lesssim \frac{\sqrt{n}}{\log p} \lesssim k_2; \\[4pt] c\, k_2^{\frac{1}{q}}\sqrt{\frac{\log p}{n}}\,\sigma_0 & \text{if } k_2 \lesssim \frac{\sqrt{n}}{\log p}. \end{cases}$

In particular, if $p \ge n$ and $\hat\beta_L$ is the Lasso estimator defined in (2.10) with $A > 4$, the above lower bounds can be achieved.

The lower bounds of Theorem 6 imply that of Theorem 5, and both lower bounds hold for a general class of estimators satisfying Assumption (A2). However, the lower bound (3.18) in Theorem 6 has a significantly different meaning from (3.13) in Theorem 5: (3.18) quantifies the cost of adaptation when the sparsity level is not known. For the Lasso estimator $\hat\beta_L$ defined in (2.10), comparing Theorem 5 and Theorem 6 gives $L_\alpha(\Theta_0(k_1), \Theta_0(k_2), \hat\beta_L, \ell_q) \gg L_\alpha(\Theta_0(k_1), \hat\beta_L, \ell_q)$ if $k_1 \ll k_2$, which implies the impossibility of constructing adaptive confidence intervals for the case $1 \le q < 2$. There is a marked difference between the case $1 \le q < 2$ and the case $q = 2$, where it is possible to construct adaptive confidence intervals over the regime $k \gg \sqrt{n}/\log p$.

For the Lasso estimator $\hat\beta_L$ defined in (2.10), it is shown in Proposition 3 that the confidence interval $\mathrm{CI}^0_\alpha(Z, k_2, q)$ defined in (3.15) achieves the lower bound $k_2^{1/q}\sqrt{\log p/n}\,\sigma_0$ of (3.18). The lower bounds $k_2^{\frac{1}{q}-\frac{1}{2}}\sqrt{k_1\log p/n}\,\sigma_0$ and $k_2^{\frac{1}{q}-\frac{1}{2}}(1/n)^{1/4}\sigma_0$ of (3.18) can be achieved by the following proposed confidence interval,

(3.19)  $\mathrm{CI}^2_\alpha(Z, k, q) = \left[\left(\frac{n_2\,\psi(Z)}{\chi^2_{1-\alpha/2}(n_2)} - \sigma_0^2\right)_+^{1/2},\ (16k)^{\frac{1}{q}-\frac{1}{2}}\left(\frac{n_2\,\psi(Z)}{\chi^2_{\alpha/2}(n_2)} - \sigma_0^2\right)_+^{1/2}\right]$,

where $\psi(Z)$ is given in (3.9). The above claim is verified in Proposition 4. Note that the confidence interval $\mathrm{CI}^1_\alpha(Z)$ defined in (3.8) is the special case of $\mathrm{CI}^2_\alpha(Z, k, q)$ with $q = 2$.

Proposition 4. Suppose $p \ge n$, $k_1 \le k_2 \lesssim n/\log p$ and $\hat\beta_L$ is defined in (2.10) with $A > 4$. Then $\mathrm{CI}^2_\alpha(Z, k_2, q)$ defined in (3.19) satisfies

(3.20)  $\liminf_{n,p \to \infty}\ \inf_{\theta \in \Theta_0(k_2)} P_\theta\left(\|\hat\beta_L - \beta\|_q \in \mathrm{CI}^2_\alpha(Z, k_2, q)\right) \ge 1 - \alpha$,

and

(3.21)  $L\left(\mathrm{CI}^2_\alpha(Z, k_2, q), \Theta_0(k_1)\right) \lesssim k_2^{\frac{1}{q}-\frac{1}{2}}\left(\left(\frac{1}{n}\right)^{1/4} + \sqrt{\frac{k_1\log p}{n}}\right)\sigma_0$.

4. Minimaxity and adaptivity of confidence intervals over $\Theta(k)$. In this section we focus on the case of unknown $\Sigma$ and $\sigma$ and establish the minimax expected length of confidence intervals for $\|\hat\beta - \beta\|_q$ with $1 \le q \le 2$ over $\Theta(k)$ defined in (2.4). We also study the possibility of adaptivity of confidence intervals for $\|\hat\beta - \beta\|_q$. The following theorem establishes the lower bounds for the benchmark quantities $L_\alpha(\Theta(k_i), \hat\beta, \ell_q)$ with $i = 1, 2$ and $L_\alpha(\Theta(k_1), \Theta(k_2), \hat\beta, \ell_q)$.

Theorem 7. Suppose that $0 < \alpha < \frac{1}{4}$, $1 \le q \le 2$ and the sparsity levels $k_1$, $k_2$ and $k_0$ satisfy Assumption (B2). For any estimator $\hat\beta$ satisfying Assumption (A1) at $\theta_0 = (\beta, I, \sigma_0)$ with $\|\hat\beta\|_0 \le k_0$, there is a constant $c > 0$ such that

(4.1)  $L_\alpha(\Theta(k_i), \hat\beta, \ell_q) \ge c\, k_i^{1/q}\sqrt{\frac{\log p}{n}}$, for $i = 1, 2$;

(4.2)  $L_\alpha(\{\theta_0\}, \Theta(k_2), \hat\beta, \ell_q) \ge c\, k_2^{1/q}\sqrt{\frac{\log p}{n}}$.

In particular, if $\hat\beta_{SL}$ is the scaled Lasso estimator defined in (2.14) with $A > \sqrt{2}$, then the above lower bounds can be achieved.

The lower bounds (4.1) and (4.2) hold for any $\hat\beta$ satisfying Assumption (A1) at an interior point $\theta_0 = (\beta, I, \sigma_0)$, including the scaled Lasso estimator as a special case. We now demonstrate the impossibility of adaptivity of confidence intervals for the $\ell_q$ loss of the scaled Lasso estimator $\hat\beta_{SL}$. Since $L_\alpha(\Theta(k_1), \Theta(k_2), \hat\beta_{SL}, \ell_q) \ge L_\alpha(\{\theta_0\}, \Theta(k_2), \hat\beta_{SL}, \ell_q)$, (4.2) gives $L_\alpha(\Theta(k_1), \Theta(k_2), \hat\beta_{SL}, \ell_q) \gg L_\alpha(\Theta(k_1), \hat\beta_{SL}, \ell_q)$ if $k_1 \ll k_2$. The comparison of $L_\alpha(\Theta(k_1), \hat\beta_{SL}, \ell_q)$ and $L_\alpha(\Theta(k_1), \Theta(k_2), \hat\beta_{SL}, \ell_q)$ is illustrated in Figure 4. Referring to the adaptivity defined in (3.4), it is impossible to construct adaptive confidence intervals for $\|\hat\beta_{SL} - \beta\|_q$.

Figure 4: Illustration of $L_\alpha(\Theta(k_1), \hat\beta_{SL}, \ell_q)$ (left) and $L_\alpha(\Theta(k_1), \Theta(k_2), \hat\beta_{SL}, \ell_q)$ (right).

Theorem 7 shows that for any confidence interval $\mathrm{CI}_\alpha(\hat\beta, \ell_q, Z)$ for the loss of any given estimator $\hat\beta$ satisfying Assumption (A1), under the coverage constraint that $\mathrm{CI}_\alpha(\hat\beta, \ell_q, Z) \in \mathcal{I}_\alpha(\Theta(k_2), \hat\beta, \ell_q)$, its expected length at any given $\theta_0 = (\beta, I, \sigma_0) \in \Theta(k_2)$ must be of order $k_2^{1/q}\sqrt{\log p/n}$. In contrast to Theorems 4 and 6, Theorem 7 demonstrates that confidence intervals must be long on a large subset of points in the parameter space, not just at a small number of unlucky points.

Therefore, the lack of adaptivity of confidence intervals is not due to conservativeness of the minimax framework.

In the following we detail the construction of confidence intervals for $\|\hat\beta_{SL} - \beta\|_q$. The construction is based on the following definition of the restricted eigenvalue, introduced in [4],

(4.3)  $\kappa(X, k, s, \alpha_0) = \min_{\substack{J_0 \subseteq \{1,2,\ldots,p\} \\ |J_0| \le k}}\ \min_{\substack{\delta \ne 0 \\ \|\delta_{J_0^c}\|_1 \le \alpha_0\|\delta_{J_0}\|_1}} \frac{\|X\delta\|_2}{\sqrt{n}\,\|\delta_{J_{01}}\|_2}$,

where $J_1$ denotes the subset corresponding to the $s$ largest (in absolute value) coordinates of $\delta$ outside of $J_0$ and $J_{01} = J_0 \cup J_1$. Define the event $B$, a high-probability event specified in terms of $\hat\sigma$ on which the scaled Lasso noise-level estimate is well behaved. The confidence interval for $\|\hat\beta_{SL} - \beta\|_q$ is defined as

(4.4)  $\mathrm{CI}_\alpha(Z, k, q) = \begin{cases} [0,\ \varphi(Z, k, q)] & \text{on } B, \\ \{0\} & \text{on } B^c, \end{cases}$

where $\varphi(Z, k, q)$ is an explicit data-dependent upper bound of order $k^{1/q}\sqrt{\log p/n}\,\hat\sigma$, built from $A$, $\max_j\|X_{\cdot j}\|_2$, $\min_j\|X_{\cdot j}\|_2$ and the restricted eigenvalue $\kappa\!\left(X, k, k, 3\,\frac{\max_j\|X_{\cdot j}\|_2}{\min_j\|X_{\cdot j}\|_2}\right)$ defined in (4.3).

Remark 2. The restricted eigenvalue $\kappa\!\left(X, k, k, 3\,\frac{\max_j\|X_{\cdot j}\|_2}{\min_j\|X_{\cdot j}\|_2}\right)$ is computationally infeasible in general. For design covariance matrices $\Sigma$ with special structure, the restricted eigenvalue can be replaced by a lower bound, and a computationally feasible confidence interval can then be constructed. See Section 4.4 in [7] for more details.

Properties of $\mathrm{CI}_\alpha(Z, k, q)$ are established as follows.

Proposition 5. Suppose $k \lesssim n/\log p$ and $\hat\beta_{SL}$ is the estimator defined in (2.14) with $A > \sqrt{2}$. For $1 \le q \le 2$, the interval $\mathrm{CI}_\alpha(Z, k, q)$ defined in (4.4) satisfies

(4.5)  $\liminf_{n,p \to \infty}\ \inf_{\theta \in \Theta(k)} P_\theta\left(\|\hat\beta_{SL} - \beta\|_q \in \mathrm{CI}_\alpha(Z, k, q)\right) = 1$,

and

(4.6)  $L\left(\mathrm{CI}_\alpha(Z, k, q), \Theta(k)\right) \lesssim k^{1/q}\sqrt{\frac{\log p}{n}}$.

Proposition 5 shows that the confidence interval $\mathrm{CI}_\alpha(Z, k_i, q)$ defined in (4.4) achieves the lower bound in (4.1) for $i = 1, 2$, and the confidence interval $\mathrm{CI}_\alpha(Z, k_2, q)$ defined in (4.4) achieves the lower bound in (4.2).
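The quantity $\kappa$ in (4.3) is, as Remark 2 notes, infeasible to compute exactly in general: the inner minimization is over a non-convex cone and the outer one is over all supports of size $k$. For very small $p$ one can still get a numerical feel for it; the sketch below (Python; purely illustrative) enumerates the supports $J_0$ and samples random directions in the cone, so it only produces an upper bound on the true minimum.

```python
import numpy as np
from itertools import combinations

def re_constant_search(X, k, s, alpha0=3.0, n_dirs=2000, seed=0):
    """Crude random search for the restricted eigenvalue kappa(X, k, s, alpha0)
    of (4.3), feasible only for very small p.  Sampling directions inside the
    cone ||delta_{J0^c}||_1 <= alpha0 * ||delta_{J0}||_1 yields an upper bound
    on the minimum, not its exact value."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best = np.inf
    for J0 in combinations(range(p), k):
        J0 = np.array(J0)
        rest = np.setdiff1d(np.arange(p), J0)
        for _ in range(n_dirs):
            delta = np.zeros(p)
            delta[J0] = rng.normal(size=k)
            off = rng.normal(size=rest.size)
            # rescale the off-support part so the cone constraint holds
            budget = rng.uniform(0.0, 1.0) * alpha0 * np.abs(delta[J0]).sum()
            off *= budget / (np.abs(off).sum() + 1e-12)
            delta[rest] = off
            # J1: the s largest (in absolute value) coordinates outside J0
            J1 = rest[np.argsort(-np.abs(delta[rest]))[:s]]
            J01 = np.concatenate([J0, J1])
            denom = np.sqrt(n) * np.linalg.norm(delta[J01])
            if denom > 0:
                best = min(best, np.linalg.norm(X @ delta) / denom)
    return best

rng = np.random.default_rng(1)
X_small = rng.normal(size=(50, 8))
print(re_constant_search(X_small, k=2, s=2))
```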

5. Estimation of the $\ell_q$ loss of rate-optimal estimators. We have established minimax lower bounds for the estimation accuracy of the loss of a broad class of estimators $\hat\beta$ satisfying (A1) or (A2), and demonstrated that these lower bounds are sharp for the Lasso and scaled Lasso estimators. We now show that the minimax lower bounds are also sharp for the class of rate-optimal estimators satisfying the following Assumption (A).

(A) The estimator $\hat\beta$ satisfies

(5.1)  $\sup_{\theta \in \Theta(k)} P_\theta\left(\|\hat\beta - \beta\|_q \ge C \|\beta\|_0^{1/q}\sqrt{\frac{\log p}{n}}\,\sigma\right) \le C' p^{-\delta}$, for all $k \lesssim \frac{n}{\log p}$,

where $\delta > 0$, $C > 0$ and $C' > 0$ are constants not depending on $k$, $n$ or $p$.

We say an estimator $\hat\beta$ is rate-optimal if it satisfies Assumption (A). As shown in [1, 4, 3, 6], the Lasso, the Dantzig Selector, the scaled Lasso and the square-root Lasso are rate-optimal when the tuning parameter is chosen properly. We stress that Assumption (A) implies Assumptions (A1) and (A2): Assumption (A) requires the estimator $\hat\beta$ to perform well over the whole parameter space $\Theta(k)$, whereas Assumptions (A1) and (A2) only require $\hat\beta$ to perform well at a single point or over a proper subset. The following proposition shows that the minimax lower bounds established in Theorems 1 through 7 can be achieved for the class of rate-optimal estimators.

Proposition 6. Let $\hat\beta$ be an estimator satisfying Assumption (A).

1. There exist point and interval estimators of the loss $\|\hat\beta - \beta\|_q$ with $1 \le q < 2$ achieving, up to a constant factor, the minimax lower bounds (2.9) in Theorem 1 and (3.13) in Theorem 5, and estimators of the loss $\|\hat\beta - \beta\|_q$ with $1 \le q \le 2$ achieving, up to a constant factor, the minimax lower bounds (2.13) in Theorem 2 and (4.1) and (4.2) in Theorem 7.

2. Suppose that the estimator $\hat\beta$ is constructed based on the subsample $Z^{(1)} = (y^{(1)}, X^{(1)})$. Then there exist estimators of the loss $\|\hat\beta - \beta\|_2$ achieving, up to a constant factor, the minimax lower bounds (2.8) in Theorem 1, (3.5) in Theorem 3 and (3.7) in Theorem 4.

3. Suppose that the estimator $\hat\beta$ is constructed based on the subsample $Z^{(1)} = (y^{(1)}, X^{(1)})$ and that it satisfies Assumption (A) with $\delta > 2$ and

(5.2)  $\sup_{\theta \in \Theta(k)} P_\theta\left(\|(\hat\beta - \beta)_{S^c}\|_1 \ge c\,\|(\hat\beta - \beta)_{S}\|_1\right) \le C p^{-\delta}$, where $S = \mathrm{supp}(\beta)$, for all $k \lesssim \frac{n}{\log p}$.

Then, for $p \ge n$, there exist estimators of the loss $\|\hat\beta - \beta\|_q$ with $1 \le q < 2$ achieving the lower bounds given in (3.18) in Theorem 6.

For reasons of space, we do not discuss here the detailed construction of the point and interval estimators achieving these minimax lower bounds, and postpone the construction to the proof of Proposition 6.

Remark 3. Sample splitting has been widely used in the literature. For example, the condition that $\hat\beta$ is constructed based on the subsample $Z^{(1)} = (y^{(1)}, X^{(1)})$ has been introduced in [2] for constructing confidence sets for $\beta$ and in [20] for constructing confidence intervals for the $\ell_2$ loss. Such a condition is imposed purely for technical reasons, to create independence between the estimator $\hat\beta$ and the subsample $Z^{(2)} = (y^{(2)}, X^{(2)})$, which is useful for evaluating the $\ell_q$ loss of the estimator $\hat\beta$. As shown in [4], assumption (5.2) is satisfied by the Lasso and the Dantzig Selector. This technical assumption is imposed so that $\|\hat\beta - \beta\|_1$ can be tightly controlled by $\|\hat\beta - \beta\|_2$.

6. General tools for minimax lower bounds. A major step in our analysis is to establish rate sharp lower bounds for the estimation error and the expected length of confidence intervals for the $\ell_q$ loss. In this section we introduce the new technical tools that are needed to establish these lower bounds. A significant distinction of the lower bound results given in the previous sections from those for traditional parameter estimation problems is that the constraint is on the performance of the estimator $\hat\beta$ of the regression vector $\beta$, but the lower bounds are on the difficulty of estimating its loss $\|\hat\beta - \beta\|_q$. It is necessary to develop new lower bound techniques to establish rate-optimal lower bounds for the estimation error and the expected length of confidence intervals for the loss $\|\hat\beta - \beta\|_q$. These technical tools may also be of independent interest.

We begin with notation. Let $Z$ denote a random variable whose distribution is indexed by a parameter $\theta \in \Theta$ and let $\pi$ denote a prior on the parameter space $\Theta$. We use $f_\theta(z)$ to denote the density of $Z$ given $\theta$ and $f_\pi(z)$ to denote the marginal density of $Z$ under the prior $\pi$. Let $P_\pi$ denote the distribution of $Z$ corresponding to $f_\pi(z)$, that is, $P_\pi(A) = \int \mathbf{1}_{\{z \in A\}} f_\pi(z)\,dz$, where $\mathbf{1}_{\{z \in A\}}$ is the indicator function. For a function $g$, we write $E_\pi g(Z)$ for the expectation under $f_\pi$. More specifically, $f_\pi(z) = \int f_\theta(z)\,\pi(\theta)\,d\theta$ and $E_\pi g(Z) = \int g(z) f_\pi(z)\,dz$. The $L_1$ distance between two probability distributions with densities $f_0$ and $f_1$ is given by $L_1(f_1, f_0) = \int |f_1(z) - f_0(z)|\,dz$.
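The $L_1$ distance between the marginal $f_\pi$ (the data distribution under a least favorable prior) and $f_{\theta_0}$ (the distribution at the well-behaved point) is what makes the lower bounds in Theorems 8 and 9 below non-trivial: when this distance is small, the two scenarios cannot be told apart from the data. The toy sketch below (Python; a one-dimensional Gaussian mean problem, purely illustrative and not the paper's construction) evaluates such an $L_1$ distance numerically.

```python
import numpy as np
from scipy.stats import norm

def l1_point_vs_mixture(mu_alt=0.5, n_mix=200, n_grid=20001, seed=0):
    """Numerically integrate the L1 distance between f_{theta0} = N(0, 1) and
    the marginal f_pi of a prior supported on n_mix alternative means of
    magnitude mu_alt (a 1-d stand-in for the priors used in the lower bounds)."""
    rng = np.random.default_rng(seed)
    mus = rng.choice([-mu_alt, mu_alt], size=n_mix)           # prior draws
    z = np.linspace(-10.0, 10.0, n_grid)
    f0 = norm.pdf(z)                                          # density at theta0
    f_pi = norm.pdf(z[:, None] - mus[None, :]).mean(axis=1)   # mixture marginal
    return np.trapz(np.abs(f_pi - f0), z)

print(l1_point_vs_mixture(mu_alt=0.2))   # nearly indistinguishable: L1 close to 0
print(l1_point_vs_mixture(mu_alt=3.0))   # well separated: L1 close to 2
```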

The following theorem establishes the minimax lower bounds for the estimation error and the expected length of confidence intervals for the $\ell_q$ loss, under the constraint that $\hat\beta$ is a good estimator at at least one interior point.

Theorem 8. Suppose $0 < \alpha, \alpha_0 < \frac{1}{4}$, $1 \le q \le 2$, $\Sigma_0$ is positive definite, $\theta_0 = (\beta^*, \Sigma_0, \sigma_0) \in \Theta$, and $F \subseteq \Theta$. Define $d = \min_{\theta \in F} \|\beta(\theta) - \beta^*\|_q$. Let $\pi$ denote a prior over the parameter space $F$. If an estimator $\hat\beta$ satisfies

(6.1)  $P_{\theta_0}\left(\|\hat\beta - \beta^*\|_q \le \frac{1}{16}\,d\right) \ge 1 - \alpha_0$,

then

(6.2)  $\inf_{\hat{L}_q} \sup_{\theta \in \{\theta_0\} \cup F} P_\theta\left(\left|\hat{L}_q - \|\hat\beta - \beta\|_q\right| \ge \frac{1}{4}\,d\right) \ge c_1$,

and

(6.3)  $L_\alpha(\{\theta_0\}, \Theta, \hat\beta, \ell_q) = \inf_{\mathrm{CI}_\alpha(\hat\beta, \ell_q, Z) \in \mathcal{I}_\alpha(\Theta, \hat\beta, \ell_q)} E_{\theta_0} L\left(\mathrm{CI}_\alpha(\hat\beta, \ell_q, Z)\right) \ge c_2\,d$,

where $c_1$ and $c_2$ are positive constants depending only on $\alpha$, $\alpha_0$ and the $L_1$ distance $L_1(f_\pi, f_{\theta_0})$; they are bounded away from zero whenever $L_1(f_\pi, f_{\theta_0})$ is sufficiently small.

Remark 4. The minimax lower bounds (6.2) for the estimation error and (6.3) for the expected length of confidence intervals hold as long as the estimator $\hat\beta$ estimates $\beta$ well at one interior point $\theta_0$. Besides Condition (6.1), the other key ingredient for the lower bounds (6.2) and (6.3) is the construction of the least favorable space $F$ with a prior $\pi$ such that the marginal distributions $f_\pi$ and $f_{\theta_0}$ are not distinguishable. For the estimation lower bound (6.2), given the constraint that $\|\hat\beta - \beta\|_q$ can be well estimated at $\theta_0$, the non-distinguishability between $f_\pi$ and $f_{\theta_0}$ allows us to establish that the loss $\|\hat\beta - \beta\|_q$ cannot be estimated well over $F$. For the lower bound (6.3), Condition (6.1) and the non-distinguishability between $f_\pi$ and $f_{\theta_0}$ show that $\|\hat\beta - \beta\|_q$ over $F$ is much larger than at $\theta_0$, and hence honest confidence intervals must be sufficiently long.

Theorem 8 is used to establish the minimax lower bounds for both the estimation error and the expected length of confidence intervals for the $\ell_q$ loss over $\Theta(k)$. Taking $\theta_0 \in \Theta(k_0)$ and $\Theta = \Theta(k)$, Theorem 2 follows from (6.2) with a properly constructed subset $F \subseteq \Theta(k)$. Taking $\theta_0 \in \Theta(k_0)$ and $\Theta = \Theta(k_2)$, the lower bound (4.2) in Theorem 7 follows from (6.3) with a properly constructed $F \subseteq \Theta(k_2)$.

In both cases, Assumption (A1) implies Condition (6.1). Several minimax lower bounds over $\Theta_0(k)$ can also be derived from Theorem 8. For the estimation error, the minimax lower bounds (2.8) and (2.9) over the regime $k \lesssim \sqrt{n}/\log p$ in Theorem 1 follow from (6.2). For the expected length of confidence intervals, the minimax lower bound (3.7) in Theorem 4 and the bounds in (3.18) in the regions $k_2 \lesssim \sqrt{n}/\log p$ and $k_1 \gtrsim \sqrt{n}/\log p$ in Theorem 6 follow from (6.3). In these cases, Assumption (A1) or (A2) guarantees that Condition (6.1) is satisfied. However, the minimax lower bound for the estimation error (2.9) in the region $k \gg \sqrt{n}/\log p$ and for the expected length of confidence intervals (3.18) in the region $k_1 \lesssim \sqrt{n}/\log p \lesssim k_2$ cannot be established using the above theorem. The following theorem, which requires testing a composite null against a composite alternative, establishes the refined minimax lower bounds over $\Theta_0(k)$.

Theorem 9. Let $0 < \alpha, \alpha_0 < \frac{1}{4}$, $1 \le q \le 2$, and $\theta_0 = (\beta^*, \Sigma_0, \sigma_0)$, where $\Sigma_0$ is a positive definite matrix. Let $k_1$ and $k_2$ be two sparsity levels. Assume that for $i = 1, 2$ there exist parameter spaces $F_i \subseteq \{(\beta, \Sigma_0, \sigma_0) : \|\beta\|_0 \le k_i\}$ such that, for given $\mathrm{dist}_i$ and $d_i$,

$(\beta(\theta) - \beta^*)^{\top}\Sigma_0(\beta(\theta) - \beta^*) = \mathrm{dist}_i^2$  and  $\|\beta(\theta) - \beta^*\|_q = d_i$, for all $\theta \in F_i$.

Let $\pi_i$ denote a prior over the parameter space $F_i$ for $i = 1, 2$. Suppose that for $\theta_1 = \left(\beta^*, \Sigma_0, \sqrt{\sigma_0^2 + \mathrm{dist}_1^2}\right)$ and $\theta_2 = \left(\beta^*, \Sigma_0, \sqrt{\sigma_0^2 + \mathrm{dist}_2^2}\right)$ there exist constants $c_1, c_2 > 0$ such that

(6.4)  $P_{\theta_i}\left(\|\hat\beta - \beta^*\|_q \le c_i\,d_i\right) \ge 1 - \alpha_0$, for $i = 1, 2$.

Then we have

(6.5)  $\inf_{\hat{L}_q} \sup_{\theta \in F_1 \cup F_2} P_\theta\left(\left|\hat{L}_q - \|\hat\beta - \beta\|_q\right| \ge c_3\,d_2\right) \ge \tilde{c}_3$,

and

(6.6)  $L_\alpha(\Theta_0(k_1), \Theta_0(k_2), \hat\beta, \ell_q) \ge c_4\,d_2$,

where $c_3$, $\tilde{c}_3$ and $c_4$ are positive constants depending only on $\alpha$, $\alpha_0$, $c_1$, $c_2$, the ratio $d_1/d_2$, and the $L_1$ distances $L_1(f_{\pi_i}, f_{\theta_i})$, $i = 1, 2$, and $L_1(f_{\pi_2}, f_{\pi_1})$.

Remark 5. As long as the estimator $\hat\beta$ performs well at the two points $\theta_1$ and $\theta_2$, the minimax lower bounds (6.5) for the estimation error and (6.6) for the expected length of confidence intervals hold. Note that $\theta_i$ in the above theorem does not belong to the parameter space $\{(\beta, \Sigma_0, \sigma_0) : \|\beta\|_0 \le k_i\}$, for $i = 1, 2$. In contrast with Theorem 8, Theorem 9 compares the composite hypotheses $F_1$ and $F_2$, which leads to a sharper lower bound than comparing the simple null $\{\theta_0\}$ with the composite alternative $F$. For simplicity, we construct least favorable parameter spaces $F_i$ such that every point of $F_i$ is at a fixed generalized $\ell_2$ distance and a fixed $\ell_q$ distance from $\beta^*$, for $i = 1, 2$, respectively. More importantly, we construct $F_1$ with the prior $\pi_1$ and $F_2$ with the prior $\pi_2$ such that $f_{\pi_1}$ and $f_{\pi_2}$ are not distinguishable, with $\theta_1$ and $\theta_2$ introduced to facilitate the comparison. By Condition (6.4) and the construction of $F_1$ and $F_2$, we establish that the $\ell_q$ loss cannot be simultaneously estimated well over both $F_1$ and $F_2$. For the lower bound (6.6), under the same conditions, it is shown that the $\ell_q$ losses over $F_1$ and over $F_2$ are far apart, so any confidence interval with guaranteed coverage probability over $F_1 \cup F_2$ must be sufficiently long. Due to the prior information $\Sigma = I$ and $\sigma = \sigma_0$, the lower bound construction over $\Theta_0(k)$ is more involved than that over $\Theta(k)$. We stress that the construction of $F_1$ and $F_2$ and the comparison between composite hypotheses are of independent interest. The minimax lower bound (2.9) in the region $k \gg \sqrt{n}/\log p$ follows from (6.5), and the minimax lower bound (3.18) in the region $k_1 \lesssim \sqrt{n}/\log p \lesssim k_2$ for the expected length of confidence intervals follows from (6.6). In these cases, $\Sigma_0$ is taken as $I$ and Assumption (A2) implies Condition (6.4).

7. An intermediate setting with known $\sigma = \sigma_0$ and unknown $\Sigma$. The results given in Sections 3 and 4 show a significant difference between $\Theta_0(k)$ and $\Theta(k)$ in terms of minimaxity and adaptivity of confidence intervals for $\|\hat\beta - \beta\|_q$: $\Theta_0(k)$ corresponds to the simple setting with known design covariance matrix $\Sigma = I$ and known noise level $\sigma = \sigma_0$, while $\Theta(k)$ corresponds to unknown $\Sigma$ and $\sigma$. In this section we further consider minimaxity and adaptivity of confidence intervals for $\|\hat\beta - \beta\|_q$ in an intermediate setting where the noise level $\sigma = \sigma_0$ is known and $\Sigma$ is unknown but of a certain structure. Specifically, we consider the following parameter space,

$\Theta_{\sigma_0}(k, s) = \left\{(\beta, \Sigma, \sigma_0) : \|\beta\|_0 \le k,\ \frac{1}{M_1} \le \lambda_{\min}(\Sigma) \le \lambda_{\max}(\Sigma) \le M_1,\ \|\Sigma^{-1}\|_{L_1} \le M,\ \max_{1 \le i \le p}\|\Sigma^{-1}_{i\cdot}\|_0 \le s\right\}$,


More information

Frequentist Inference

Frequentist Inference Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for

More information

Convergence of random variables. (telegram style notes) P.J.C. Spreij

Convergence of random variables. (telegram style notes) P.J.C. Spreij Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space

More information

First Year Quantitative Comp Exam Spring, Part I - 203A. f X (x) = 0 otherwise

First Year Quantitative Comp Exam Spring, Part I - 203A. f X (x) = 0 otherwise First Year Quatitative Comp Exam Sprig, 2012 Istructio: There are three parts. Aswer every questio i every part. Questio I-1 Part I - 203A A radom variable X is distributed with the margial desity: >

More information

CHAPTER 10 INFINITE SEQUENCES AND SERIES

CHAPTER 10 INFINITE SEQUENCES AND SERIES CHAPTER 10 INFINITE SEQUENCES AND SERIES 10.1 Sequeces 10.2 Ifiite Series 10.3 The Itegral Tests 10.4 Compariso Tests 10.5 The Ratio ad Root Tests 10.6 Alteratig Series: Absolute ad Coditioal Covergece

More information

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014. Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the

More information

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices Radom Matrices with Blocks of Itermediate Scale Strogly Correlated Bad Matrices Jiayi Tog Advisor: Dr. Todd Kemp May 30, 07 Departmet of Mathematics Uiversity of Califoria, Sa Diego Cotets Itroductio Notatio

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5 CS434a/54a: Patter Recogitio Prof. Olga Veksler Lecture 5 Today Itroductio to parameter estimatio Two methods for parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio Itroducto Bayesia Decisio

More information

LECTURE 14 NOTES. A sequence of α-level tests {ϕ n (x)} is consistent if

LECTURE 14 NOTES. A sequence of α-level tests {ϕ n (x)} is consistent if LECTURE 14 NOTES 1. Asymptotic power of tests. Defiitio 1.1. A sequece of -level tests {ϕ x)} is cosistet if β θ) := E θ [ ϕ x) ] 1 as, for ay θ Θ 1. Just like cosistecy of a sequece of estimators, Defiitio

More information

Introductory statistics

Introductory statistics CM9S: Machie Learig for Bioiformatics Lecture - 03/3/06 Itroductory statistics Lecturer: Sriram Sakararama Scribe: Sriram Sakararama We will provide a overview of statistical iferece focussig o the key

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals

More information

A survey on penalized empirical risk minimization Sara A. van de Geer

A survey on penalized empirical risk minimization Sara A. van de Geer A survey o pealized empirical risk miimizatio Sara A. va de Geer We address the questio how to choose the pealty i empirical risk miimizatio. Roughly speakig, this pealty should be a good boud for the

More information

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence Chapter 3 Strog covergece As poited out i the Chapter 2, there are multiple ways to defie the otio of covergece of a sequece of radom variables. That chapter defied covergece i probability, covergece i

More information

Bayesian Methods: Introduction to Multi-parameter Models

Bayesian Methods: Introduction to Multi-parameter Models Bayesia Methods: Itroductio to Multi-parameter Models Parameter: θ = ( θ, θ) Give Likelihood p(y θ) ad prior p(θ ), the posterior p proportioal to p(y θ) x p(θ ) Margial posterior ( θ, θ y) is Iterested

More information

Lecture 6 Simple alternatives and the Neyman-Pearson lemma

Lecture 6 Simple alternatives and the Neyman-Pearson lemma STATS 00: Itroductio to Statistical Iferece Autum 06 Lecture 6 Simple alteratives ad the Neyma-Pearso lemma Last lecture, we discussed a umber of ways to costruct test statistics for testig a simple ull

More information

Lecture 2: Monte Carlo Simulation

Lecture 2: Monte Carlo Simulation STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

REGRESSION WITH QUADRATIC LOSS

REGRESSION WITH QUADRATIC LOSS REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d

More information

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight)

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight) Tests of Hypotheses Based o a Sigle Sample Devore Chapter Eight MATH-252-01: Probability ad Statistics II Sprig 2018 Cotets 1 Hypothesis Tests illustrated with z-tests 1 1.1 Overview of Hypothesis Testig..........

More information

The standard deviation of the mean

The standard deviation of the mean Physics 6C Fall 20 The stadard deviatio of the mea These otes provide some clarificatio o the distictio betwee the stadard deviatio ad the stadard deviatio of the mea.. The sample mea ad variace Cosider

More information

Empirical Process Theory and Oracle Inequalities

Empirical Process Theory and Oracle Inequalities Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi

More information

Summary. Recap ... Last Lecture. Summary. Theorem

Summary. Recap ... Last Lecture. Summary. Theorem Last Lecture Biostatistics 602 - Statistical Iferece Lecture 23 Hyu Mi Kag April 11th, 2013 What is p-value? What is the advatage of p-value compared to hypothesis testig procedure with size α? How ca

More information

Optimally Sparse SVMs

Optimally Sparse SVMs A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but

More information

Disjoint Systems. Abstract

Disjoint Systems. Abstract Disjoit Systems Noga Alo ad Bey Sudaov Departmet of Mathematics Raymod ad Beverly Sacler Faculty of Exact Scieces Tel Aviv Uiversity, Tel Aviv, Israel Abstract A disjoit system of type (,,, ) is a collectio

More information

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10 DS 00: Priciples ad Techiques of Data Sciece Date: April 3, 208 Name: Hypothesis Testig Discussio #0. Defie these terms below as they relate to hypothesis testig. a) Data Geeratio Model: Solutio: A set

More information

Lecture 11 October 27

Lecture 11 October 27 STATS 300A: Theory of Statistics Fall 205 Lecture October 27 Lecturer: Lester Mackey Scribe: Viswajith Veugopal, Vivek Bagaria, Steve Yadlowsky Warig: These otes may cotai factual ad/or typographic errors..

More information

5.1 A mutual information bound based on metric entropy

5.1 A mutual information bound based on metric entropy Chapter 5 Global Fao Method I this chapter, we exted the techiques of Chapter 2.4 o Fao s method the local Fao method) to a more global costructio. I particular, we show that, rather tha costructig a local

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 016 MODULE : Statistical Iferece Time allowed: Three hours Cadidates should aswer FIVE questios. All questios carry equal marks. The umber

More information

1 Review and Overview

1 Review and Overview DRAFT a fial versio will be posted shortly CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #3 Scribe: Migda Qiao October 1, 2013 1 Review ad Overview I the first half of this course,

More information

There is no straightforward approach for choosing the warmup period l.

There is no straightforward approach for choosing the warmup period l. B. Maddah INDE 504 Discrete-Evet Simulatio Output Aalysis () Statistical Aalysis for Steady-State Parameters I a otermiatig simulatio, the iterest is i estimatig the log ru steady state measures of performace.

More information

Information-based Feature Selection

Information-based Feature Selection Iformatio-based Feature Selectio Farza Faria, Abbas Kazeroui, Afshi Babveyh Email: {faria,abbask,afshib}@staford.edu 1 Itroductio Feature selectio is a topic of great iterest i applicatios dealig with

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information

Axioms of Measure Theory

Axioms of Measure Theory MATH 532 Axioms of Measure Theory Dr. Neal, WKU I. The Space Throughout the course, we shall let X deote a geeric o-empty set. I geeral, we shall ot assume that ay algebraic structure exists o X so that

More information

Overview. p 2. Chapter 9. Pooled Estimate of. q = 1 p. Notation for Two Proportions. Inferences about Two Proportions. Assumptions

Overview. p 2. Chapter 9. Pooled Estimate of. q = 1 p. Notation for Two Proportions. Inferences about Two Proportions. Assumptions Chapter 9 Slide Ifereces from Two Samples 9- Overview 9- Ifereces about Two Proportios 9- Ifereces about Two Meas: Idepedet Samples 9-4 Ifereces about Matched Pairs 9-5 Comparig Variatio i Two Samples

More information

Advanced Stochastic Processes.

Advanced Stochastic Processes. Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.

More information

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters

More information

5. Likelihood Ratio Tests

5. Likelihood Ratio Tests 1 of 5 7/29/2009 3:16 PM Virtual Laboratories > 9. Hy pothesis Testig > 1 2 3 4 5 6 7 5. Likelihood Ratio Tests Prelimiaries As usual, our startig poit is a radom experimet with a uderlyig sample space,

More information

Rank tests and regression rank scores tests in measurement error models

Rank tests and regression rank scores tests in measurement error models Rak tests ad regressio rak scores tests i measuremet error models J. Jurečková ad A.K.Md.E. Saleh Charles Uiversity i Prague ad Carleto Uiversity i Ottawa Abstract The rak ad regressio rak score tests

More information

10. Comparative Tests among Spatial Regression Models. Here we revisit the example in Section 8.1 of estimating the mean of a normal random

10. Comparative Tests among Spatial Regression Models. Here we revisit the example in Section 8.1 of estimating the mean of a normal random Part III. Areal Data Aalysis 0. Comparative Tests amog Spatial Regressio Models While the otio of relative likelihood values for differet models is somewhat difficult to iterpret directly (as metioed above),

More information

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals 7-1 Chapter 4 Part I. Samplig Distributios ad Cofidece Itervals 1 7- Sectio 1. Samplig Distributio 7-3 Usig Statistics Statistical Iferece: Predict ad forecast values of populatio parameters... Test hypotheses

More information

Output Analysis (2, Chapters 10 &11 Law)

Output Analysis (2, Chapters 10 &11 Law) B. Maddah ENMG 6 Simulatio Output Aalysis (, Chapters 10 &11 Law) Comparig alterative system cofiguratio Sice the output of a simulatio is radom, the comparig differet systems via simulatio should be doe

More information

It is always the case that unions, intersections, complements, and set differences are preserved by the inverse image of a function.

It is always the case that unions, intersections, complements, and set differences are preserved by the inverse image of a function. MATH 532 Measurable Fuctios Dr. Neal, WKU Throughout, let ( X, F, µ) be a measure space ad let (!, F, P ) deote the special case of a probability space. We shall ow begi to study real-valued fuctios defied

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Probability, Expectation Value and Uncertainty

Probability, Expectation Value and Uncertainty Chapter 1 Probability, Expectatio Value ad Ucertaity We have see that the physically observable properties of a quatum system are represeted by Hermitea operators (also referred to as observables ) such

More information

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates.

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates. 5. Data, Estimates, ad Models: quatifyig the accuracy of estimates. 5. Estimatig a Normal Mea 5.2 The Distributio of the Normal Sample Mea 5.3 Normal data, cofidece iterval for, kow 5.4 Normal data, cofidece

More information

Vector Quantization: a Limiting Case of EM

Vector Quantization: a Limiting Case of EM . Itroductio & defiitios Assume that you are give a data set X = { x j }, j { 2,,, }, of d -dimesioal vectors. The vector quatizatio (VQ) problem requires that we fid a set of prototype vectors Z = { z

More information

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1 EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum

More information

Lecture Notes 15 Hypothesis Testing (Chapter 10)

Lecture Notes 15 Hypothesis Testing (Chapter 10) 1 Itroductio Lecture Notes 15 Hypothesis Testig Chapter 10) Let X 1,..., X p θ x). Suppose we we wat to kow if θ = θ 0 or ot, where θ 0 is a specific value of θ. For example, if we are flippig a coi, we

More information

Understanding Samples

Understanding Samples 1 Will Moroe CS 109 Samplig ad Bootstrappig Lecture Notes #17 August 2, 2017 Based o a hadout by Chris Piech I this chapter we are goig to talk about statistics calculated o samples from a populatio. We

More information

Notes for Lecture 11

Notes for Lecture 11 U.C. Berkeley CS78: Computatioal Complexity Hadout N Professor Luca Trevisa 3/4/008 Notes for Lecture Eigevalues, Expasio, ad Radom Walks As usual by ow, let G = (V, E) be a udirected d-regular graph with

More information

Intro to Learning Theory

Intro to Learning Theory Lecture 1, October 18, 2016 Itro to Learig Theory Ruth Urer 1 Machie Learig ad Learig Theory Comig soo 2 Formal Framework 21 Basic otios I our formal model for machie learig, the istaces to be classified

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak

More information

Section 14. Simple linear regression.

Section 14. Simple linear regression. Sectio 14 Simple liear regressio. Let us look at the cigarette dataset from [1] (available to dowload from joural s website) ad []. The cigarette dataset cotais measuremets of tar, icotie, weight ad carbo

More information

2 1. The r.s., of size n2, from population 2 will be. 2 and 2. 2) The two populations are independent. This implies that all of the n1 n2

2 1. The r.s., of size n2, from population 2 will be. 2 and 2. 2) The two populations are independent. This implies that all of the n1 n2 Chapter 8 Comparig Two Treatmets Iferece about Two Populatio Meas We wat to compare the meas of two populatios to see whether they differ. There are two situatios to cosider, as show i the followig examples:

More information

If, for instance, we were required to test whether the population mean μ could be equal to a certain value μ

If, for instance, we were required to test whether the population mean μ could be equal to a certain value μ STATISTICAL INFERENCE INTRODUCTION Statistical iferece is that brach of Statistics i which oe typically makes a statemet about a populatio based upo the results of a sample. I oesample testig, we essetially

More information

MATH/STAT 352: Lecture 15

MATH/STAT 352: Lecture 15 MATH/STAT 352: Lecture 15 Sectios 5.2 ad 5.3. Large sample CI for a proportio ad small sample CI for a mea. 1 5.2: Cofidece Iterval for a Proportio Estimatig proportio of successes i a biomial experimet

More information

Lecture 3: August 31

Lecture 3: August 31 36-705: Itermediate Statistics Fall 018 Lecturer: Siva Balakrisha Lecture 3: August 31 This lecture will be mostly a summary of other useful expoetial tail bouds We will ot prove ay of these i lecture,

More information

Investigating the Significance of a Correlation Coefficient using Jackknife Estimates

Investigating the Significance of a Correlation Coefficient using Jackknife Estimates Iteratioal Joural of Scieces: Basic ad Applied Research (IJSBAR) ISSN 2307-4531 (Prit & Olie) http://gssrr.org/idex.php?joural=jouralofbasicadapplied ---------------------------------------------------------------------------------------------------------------------------

More information

x a x a Lecture 2 Series (See Chapter 1 in Boas)

x a x a Lecture 2 Series (See Chapter 1 in Boas) Lecture Series (See Chapter i Boas) A basic ad very powerful (if pedestria, recall we are lazy AD smart) way to solve ay differetial (or itegral) equatio is via a series expasio of the correspodig solutio

More information

Lecture 27. Capacity of additive Gaussian noise channel and the sphere packing bound

Lecture 27. Capacity of additive Gaussian noise channel and the sphere packing bound Lecture 7 Ageda for the lecture Gaussia chael with average power costraits Capacity of additive Gaussia oise chael ad the sphere packig boud 7. Additive Gaussia oise chael Up to this poit, we have bee

More information

Provläsningsexemplar / Preview TECHNICAL REPORT INTERNATIONAL SPECIAL COMMITTEE ON RADIO INTERFERENCE

Provläsningsexemplar / Preview TECHNICAL REPORT INTERNATIONAL SPECIAL COMMITTEE ON RADIO INTERFERENCE TECHNICAL REPORT CISPR 16-4-3 2004 AMENDMENT 1 2006-10 INTERNATIONAL SPECIAL COMMITTEE ON RADIO INTERFERENCE Amedmet 1 Specificatio for radio disturbace ad immuity measurig apparatus ad methods Part 4-3:

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

Journal of Multivariate Analysis. Superefficient estimation of the marginals by exploiting knowledge on the copula

Journal of Multivariate Analysis. Superefficient estimation of the marginals by exploiting knowledge on the copula Joural of Multivariate Aalysis 102 (2011) 1315 1319 Cotets lists available at ScieceDirect Joural of Multivariate Aalysis joural homepage: www.elsevier.com/locate/jmva Superefficiet estimatio of the margials

More information

Lecture 10 October Minimaxity and least favorable prior sequences

Lecture 10 October Minimaxity and least favorable prior sequences STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least

More information

Sample Size Estimation in the Proportional Hazards Model for K-sample or Regression Settings Scott S. Emerson, M.D., Ph.D.

Sample Size Estimation in the Proportional Hazards Model for K-sample or Regression Settings Scott S. Emerson, M.D., Ph.D. ample ie Estimatio i the Proportioal Haards Model for K-sample or Regressio ettigs cott. Emerso, M.D., Ph.D. ample ie Formula for a Normally Distributed tatistic uppose a statistic is kow to be ormally

More information

Element sampling: Part 2

Element sampling: Part 2 Chapter 4 Elemet samplig: Part 2 4.1 Itroductio We ow cosider uequal probability samplig desigs which is very popular i practice. I the uequal probability samplig, we ca improve the efficiecy of the resultig

More information

Chapter 6 Sampling Distributions

Chapter 6 Sampling Distributions Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to

More information

Double Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution

Double Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution Iteratioal Mathematical Forum, Vol., 3, o. 3, 3-53 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/.9/imf.3.335 Double Stage Shrikage Estimator of Two Parameters Geeralized Expoetial Distributio Alaa M.

More information

62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 +

62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + 62. Power series Defiitio 16. (Power series) Give a sequece {c }, the series c x = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + is called a power series i the variable x. The umbers c are called the coefficiets of

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Statistical Inference Based on Extremum Estimators

Statistical Inference Based on Extremum Estimators T. Rotheberg Fall, 2007 Statistical Iferece Based o Extremum Estimators Itroductio Suppose 0, the true value of a p-dimesioal parameter, is kow to lie i some subset S R p : Ofte we choose to estimate 0

More information

On Random Line Segments in the Unit Square

On Random Line Segments in the Unit Square O Radom Lie Segmets i the Uit Square Thomas A. Courtade Departmet of Electrical Egieerig Uiversity of Califoria Los Ageles, Califoria 90095 Email: tacourta@ee.ucla.edu I. INTRODUCTION Let Q = [0, 1] [0,

More information