Accuracy Assessment for High-Dimensional Linear Regression


University of Pennsylvania ScholarlyCommons, Statistics Papers, Wharton Faculty Research, 2016.

Accuracy Assessment for High-Dimensional Linear Regression. Tony Cai, University of Pennsylvania; Zijian Guo, University of Pennsylvania.

Recommended Citation: Cai, T., & Guo, Z. (2016). Accuracy Assessment for High-Dimensional Linear Regression. The Annals of Statistics. Retrieved from http://repository.upenn.edu/statistics_papers/84

Accuracy Assessment for High-Dimensional Linear Regression

Abstract: This paper considers point and interval estimation of the ℓ_q loss of an estimator in high-dimensional linear regression with random design. We establish the minimax rate for estimating the ℓ_q loss and the minimax expected length of confidence intervals for the ℓ_q loss of rate-optimal estimators of the regression vector, including commonly used estimators such as Lasso, scaled Lasso, square-root Lasso and Dantzig Selector. Adaptivity of the confidence intervals for the ℓ_q loss is also studied. Both the setting of known identity design covariance matrix and known noise level and the setting of unknown design covariance matrix and unknown noise level are studied. The results reveal interesting and significant differences between estimating the ℓ_2 loss and the ℓ_q loss with 1 ≤ q < 2, as well as between the two settings. New technical tools are developed to establish rate-sharp lower bounds for the minimax estimation error and the expected length of minimax and adaptive confidence intervals for the ℓ_q loss. A significant difference between loss estimation and traditional parameter estimation is that for loss estimation the constraint is on the performance of the estimator of the regression vector, but the lower bounds are on the difficulty of estimating its ℓ_q loss. The technical tools developed in this paper can also be of independent interest.

Keywords: Accuracy assessment, adaptivity, confidence interval, high-dimensional linear regression, loss estimation, minimax lower bound, minimaxity, sparsity

Disciplines: Physical Sciences and Mathematics

Submitted to the Annals of Statistics. ACCURACY ASSESSMENT FOR HIGH-DIMENSIONAL LINEAR REGRESSION. By T. Tony Cai and Zijian Guo, University of Pennsylvania.

1. Introduction. In many applications, the goal of statistical inference is not only to construct a good estimator, but also to provide a measure of accuracy for this estimator. In classical statistics, when the parameter of interest is one-dimensional, this is achieved in the form of a standard error or a confidence interval. A prototypical example is the inference for a binomial proportion, where often not only an estimate of the proportion but also its margin of error are given.

(The research was supported in part by NSF Grants DMS-1208982 and DMS-1403708, and NIH Grant R01 CA127334. MSC 2010 subject classifications: Primary 62G15; secondary 62C20, 62H35. Keywords and phrases: Accuracy assessment, adaptivity, confidence interval, high-dimensional linear regression, loss estimation, minimax lower bound, minimaxity, sparsity.)

Accuracy measures of an estimation procedure have also been used as a tool for the empirical selection of tuning parameters. A well-known example is Stein's Unbiased Risk Estimate (SURE), which has been an effective tool for the construction of data-driven adaptive estimators in normal means estimation, nonparametric signal recovery, covariance

matrix estimation, and other problems. See, for instance, [5, 1, 15, 11, 3]. The commonly used cross-validation methods can also be viewed as a useful tool based on the idea of empirical assessment of accuracy.

In this paper, we consider the problem of estimating the loss of a given estimator in the setting of high-dimensional linear regression, where one observes (X, y) with X ∈ R^{n×p} and y ∈ R^n, and for 1 ≤ i ≤ n,

y_i = X_i β + ϵ_i.

Here β ∈ R^p is the regression vector, the rows X_i ~ iid N_p(0, Σ), and the errors ϵ_i ~ iid N(0, σ^2) are independent of X. This high-dimensional linear model has been well studied in the literature, where the main focus has been on estimation of β. Several penalized/constrained ℓ_1 minimization methods, including Lasso [8], Dantzig selector [1], scaled Lasso [6] and square-root Lasso [3], have been proposed. These methods have been shown to work well in applications and produce interpretable estimates of β when β is assumed to be sparse. Theoretically, with a properly chosen tuning parameter, these estimators achieve the optimal rate of convergence over collections of sparse parameter spaces. See, for example, [1, 6, 3, 3, 4, 5, 30].

For a given estimator β̂, the ℓ_q loss ‖β̂ − β‖_q with 1 ≤ q ≤ 2 is commonly used as a metric of accuracy for β̂. We consider in the present paper both point and interval estimation of the ℓ_q loss ‖β̂ − β‖_q for a given β̂. Note that the loss ‖β̂ − β‖_q is a random quantity, depending on both the estimator β̂ and the parameter β. For such a random quantity, prediction and prediction intervals are usually used for point and interval estimation, respectively. However, we slightly abuse the terminology in the present paper by using estimation and confidence interval to denote the point and interval estimators of the loss ‖β̂ − β‖_q. Since the ℓ_q loss depends on the estimator β̂, it is necessary to specify the estimator in the discussion of loss estimation. Throughout this paper, we restrict our attention to a broad collection of estimators β̂ that perform well at least at one interior point or on a small subset of the parameter space.

This collection of estimators includes most state-of-the-art estimators such as Lasso, Dantzig selector, scaled Lasso and square-root Lasso. High-dimensional linear regression has been well studied in two settings. One is the setting of known design covariance matrix Σ = I, known noise level σ = σ_0 and sparse β; see, for example, [16, 2, 2, 30, 7, 7, 1, 19]. Another commonly considered setting is sparse β with unknown Σ and σ. We study point and interval estimation of the ℓ_q loss ‖β̂ − β‖_q in both settings. Specifically, we consider the parameter space Θ_0(k) introduced in

(2.3), which consists of k-sparse signals β with known design covariance matrix Σ = I and known noise level σ = σ_0, and Θ(k), defined in (2.4), which consists of k-sparse signals with unknown Σ and σ.

1.1. Our contributions. The present paper studies the minimax and adaptive estimation of the loss ‖β̂ − β‖_q for a given estimator β̂, as well as the minimax expected length and adaptivity of confidence intervals for the loss. A major step in our analysis is to establish rate-sharp lower bounds for the minimax estimation error and the minimax expected length of confidence intervals for the ℓ_q loss over Θ_0(k) and Θ(k) for a broad class of estimators of β, which contains the subclass of rate-optimal estimators. We then focus on the estimation of the loss of rate-optimal estimators and take the Lasso and scaled Lasso estimators as generic examples. For these rate-optimal estimators, we propose procedures for point estimation as well as confidence intervals for their ℓ_q losses. It is shown that the proposed procedures achieve the corresponding lower bounds up to a constant factor. These results together establish the minimax rates for estimating the ℓ_q loss of rate-optimal estimators over Θ_0(k) and Θ(k). The analysis shows interesting and significant differences between estimating the ℓ_2 loss and the ℓ_q loss with 1 ≤ q < 2, as well as between the two parameter spaces Θ(k) and Θ_0(k). The minimax rate for estimating ‖β̂ − β‖_2^2 over Θ_0(k) is min{ k log p/n, 1/√n } σ_0^2, and over Θ(k) it is k log p/n; so loss estimation is much easier with the prior information Σ = I and σ = σ_0 when k ≳ √n / log p. The minimax rate for estimating ‖β̂ − β‖_q with 1 ≤ q < 2 over both Θ_0(k) and Θ(k) is k^{1/q} √(log p/n). In the regime k ≳ √n / log p, a practical loss estimator is proposed for estimating the ℓ_2 loss and shown to achieve the optimal convergence rate (1/√n) σ_0^2 adaptively over Θ_0(k). We say estimation of loss is impossible if the minimax rate can be achieved by the trivial estimator 0, which means that the estimation accuracy of the loss is at least of the same order as the loss itself. In all other considered cases, estimation of loss is shown to be impossible.
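The phase transition above can be made concrete with a quick numeric check: the two terms in min{ k log p/n, 1/√n } cross at sparsity k ≍ √n / log p. A small sketch (the values of n, p and k and the helper name rate_theta0 are illustrative, not from the paper):

```python
import numpy as np

n, p, sigma0 = 10_000, 1_000, 1.0
log_p = np.log(p)

def rate_theta0(k):
    # minimax rate over Theta_0(k): min{ k log p / n, 1/sqrt(n) } * sigma0^2
    return min(k * log_p / n, 1.0 / np.sqrt(n)) * sigma0**2

k_star = np.sqrt(n) / log_p   # crossover sparsity, here about 14.5
print(rate_theta0(2), rate_theta0(60), k_star)
```

For k below the crossover the term k log p/n is the smaller one (the zero estimator is already rate-optimal); above it the bound saturates at (1/√n) σ_0^2.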
These results indicate that loss estimation is typically difficult. We then turn to the construction of confidence intervals for the ℓ_q loss. A confidence interval for the loss is useful even when it is impossible to estimate the loss, as a confidence interval can provide non-trivial upper and lower bounds for the loss. In terms of convergence rate over Θ_0(k) or Θ(k), the minimax rate of the expected length of confidence intervals for the ℓ_q loss ‖β̂ − β‖_q of any rate-optimal estimator β̂ coincides with

the minimax estimation rate. We also consider the adaptivity of confidence intervals for the ℓ_q loss of any rate-optimal estimator β̂. The framework for adaptive confidence intervals is discussed in detail in Section 3.1. Regarding confidence intervals for the ℓ_2 loss in the case of known Σ = I and σ = σ_0, a procedure is proposed and shown to achieve the optimal length (1/√n) σ_0^2 adaptively over Θ_0(k) for k ≳ √n / log p. Furthermore, it is shown that this is the only regime where adaptive confidence intervals exist, even over two given parameter spaces. For example, when k_1 ≲ √n / log p and k_1 ≤ k_2, it is impossible to construct a confidence interval for the ℓ_2 loss with guaranteed coverage probability over Θ_0(k_2) (consequently also over Θ_0(k_1)) and with the expected length automatically adjusted to the sparsity. Similarly, for the ℓ_q loss with 1 ≤ q < 2, construction of adaptive confidence intervals is impossible over Θ_0(k_1) and Θ_0(k_2) for k_1 ≤ k_2. Regarding confidence intervals for the ℓ_q loss with 1 ≤ q ≤ 2 in the case of unknown Σ and σ, the impossibility of adaptivity also holds over Θ(k_1) and Θ(k_2) for k_1 ≤ k_2.

Establishing rate-optimal lower bounds requires the development of new technical tools. One main difference between loss estimation and traditional parameter estimation is that for loss estimation the constraint is on the performance of the estimator β̂ of the regression vector β, but the lower bound is on the difficulty of estimating its loss ‖β̂ − β‖_q. We introduce useful new lower bound techniques for the minimax estimation error and the expected length of adaptive confidence intervals for the loss ‖β̂ − β‖_q. In several important cases, it is necessary to test a composite null against a composite alternative in order to establish rate-sharp lower bounds. The technical tools developed in this paper can also be of independent interest.

In addition to Θ_0(k) and Θ(k), we also study an intermediate parameter space where the noise level σ is known and the design covariance matrix Σ is unknown but of certain structure. Lower bounds for the expected length of minimax and adaptive confidence intervals for ‖β̂ − β‖_q over this parameter space are established for a broad collection of estimators β̂ and are shown to be rate-sharp for the class of rate-optimal estimators. Furthermore, the lower bounds developed in this paper have wider implications. In particular, it is shown that they lead immediately to minimax lower bounds for estimating ‖β‖_q and for the expected length of confidence intervals for ‖β‖_q with 1 ≤ q ≤ 2.

1.2. Comparison with other works. Statistical inference on the loss of specific estimators of β has been considered in the recent literature. The papers [16, 2] established, in the setting Σ = I and n/p → δ ∈ (0, ∞), the limit of the normalized loss (1/p) ‖β̂(λ) − β‖_2^2, where β̂(λ) is the Lasso estimator with a pre-specified tuning parameter λ. Although [16, 2] provided an exact asymptotic expression of the normalized loss, the limit itself depends on the unknown β. In a similar setting, the paper [7] established the limit of a normalized ℓ_2 loss of the square-root Lasso estimator. These limits of the normalized losses help understand the properties of the corresponding estimators of β, but they do not lead to an estimate of the loss. Our results imply that although these normalized losses have a limit under certain regularity conditions, such losses cannot be estimated well in most settings. A recent paper, [20], constructed a confidence interval for ‖β̂ − β‖_2 in the case of known Σ = I, unknown noise level σ, and moderate dimension where n/p → ξ ∈ (0, 1) and no sparsity is assumed on β. While no sparsity assumption on β is imposed, their method requires the assumption Σ = I and n/p → ξ ∈ (0, 1). In contrast, in this paper we consider both the unknown Σ and the known Σ = I settings, while allowing p ≫ n and assuming sparse β.

Honest adaptive inference has been studied in the nonparametric function estimation literature, including [8] for adaptive confidence intervals for linear functionals, [18, 10] for adaptive confidence bands, and [9, 4] for adaptive confidence balls, and in the high-dimensional linear regression literature, including [2] for adaptive confidence sets and [7] for adaptive confidence intervals for linear functionals. In this paper, we develop new lower bound tools, Theorems 8 and 9, to study the possibility of adaptive confidence intervals for ‖β̂ − β‖_q. The connection between the ℓ_2 loss considered in the current paper and the work [2] is discussed in more detail in Section 3.2.

1.3. Organization. Section 2 establishes the minimax lower bounds for estimating the loss ‖β̂ − β‖_q with 1 ≤ q ≤ 2 over both Θ_0(k) and Θ(k) and shows that these bounds are rate-sharp for the Lasso and scaled Lasso estimators, respectively. We then turn to interval estimation of ‖β̂ − β‖_q. Sections 3 and 4 present the minimax and adaptive minimax lower bounds for the expected length of confidence intervals for ‖β̂ − β‖_q over Θ_0(k) and Θ(k). For the Lasso and scaled Lasso estimators, we show that the lower bounds can be achieved and investigate the possibility of adaptivity. Section 5 considers rate-optimal estimators in general and establishes the minimax convergence rate for estimating their ℓ_q losses. Section 6 presents new minimax lower bound techniques for estimating the loss ‖β̂ − β‖_q. Section 7 discusses minimaxity and adaptivity in another setting, where the noise level σ is known and the design covariance matrix Σ is unknown but of certain structure. Section 8 applies the newly developed lower bounds to a related problem, that of estimating ‖β‖_q. Section 9 proves the main results; additional proofs are given in the supplemental material [6].
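Among the estimators discussed above, the scaled Lasso is distinctive in that it estimates β and the noise level σ jointly. A common way to compute it alternates a Lasso step (with penalty proportional to the current σ̂) with the residual update σ̂ = ‖y − Xβ̂‖_2 / √n. The sketch below assumes scikit-learn's Lasso as the inner solver; the tuning constants and simulated data are illustrative, not the paper's:

```python
import numpy as np
from sklearn.linear_model import Lasso

def scaled_lasso(X, y, lam0, n_iter=30, tol=1e-6):
    """Alternating minimization: for fixed sigma the problem is a Lasso with
    penalty lam0 * sigma; for fixed beta the optimal sigma is ||y - X beta||_2 / sqrt(n)."""
    n = X.shape[0]
    sigma = np.std(y)  # crude initialization
    for _ in range(n_iter):
        beta_hat = Lasso(alpha=lam0 * sigma).fit(X, y).coef_
        new_sigma = np.linalg.norm(y - X @ beta_hat) / np.sqrt(n)
        if abs(new_sigma - sigma) < tol:
            sigma = new_sigma
            break
        sigma = new_sigma
    return beta_hat, sigma

# simulated k-sparse model with identity design covariance
rng = np.random.default_rng(2)
n, p, k, sigma_true = 300, 150, 5, 1.0
beta = np.zeros(p); beta[:k] = 1.5
X = rng.standard_normal((n, p))
y = X @ beta + sigma_true * rng.standard_normal(n)

beta_sl, sigma_hat = scaled_lasso(X, y, lam0=1.1 * np.sqrt(2 * np.log(p) / n))
print(sigma_hat)
```

The joint objective is convex, so the alternation converges quickly; the returned σ̂ is the noise-level estimate that the unknown-(Σ, σ) setting relies on.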

1.4. Notation. For a matrix X ∈ R^{n×p}, X_{i·}, X_{·j}, and X_{i,j} denote respectively the i-th row, j-th column, and (i, j) entry of the matrix X. For a subset J ⊆ {1, 2, ..., p}, |J| denotes the cardinality of J, J^c denotes the complement {1, 2, ..., p} \ J, X_J denotes the submatrix of X consisting of columns X_{·j} with j ∈ J, and for a vector x ∈ R^p, x_J is the subvector of x with indices in J. For a vector x ∈ R^p, supp(x) denotes the support of x and the ℓ_q norm of x is defined as ‖x‖_q = (Σ_{i=1}^p |x_i|^q)^{1/q} for q > 0, with ‖x‖_0 = |supp(x)| and ‖x‖_∞ = max_{1≤j≤p} |x_j|. For a ∈ R, a_+ = max{a, 0}. We use max ‖X_{·j}‖_2 as a shorthand for max_{1≤j≤p} ‖X_{·j}‖_2 and min ‖X_{·j}‖_2 as a shorthand for min_{1≤j≤p} ‖X_{·j}‖_2. For a matrix A, we define the spectral norm ‖A‖_2 = sup_{‖x‖_2=1} ‖Ax‖_2 and the matrix ℓ_1 norm ‖A‖_{L1} = sup_{1≤j≤p} Σ_{i=1}^p |A_{ij}|. For a symmetric matrix A, λ_min(A) and λ_max(A) denote respectively the smallest and largest eigenvalue of A. We use c and C to denote generic positive constants that may vary from place to place. For two positive sequences a_n and b_n, a_n ≲ b_n means a_n ≤ C b_n for all n; a_n ≳ b_n if b_n ≲ a_n; a_n ≍ b_n if a_n ≲ b_n and b_n ≲ a_n; a_n ≪ b_n if lim sup_n a_n/b_n = 0; and a_n ≫ b_n if b_n ≪ a_n.

2. Minimax estimation of the ℓ_q loss. We begin by presenting the minimax framework for estimating the ℓ_q loss ‖β̂ − β‖_q of a given estimator β̂, and then establish the minimax lower bounds on the estimation error for a broad collection of estimators β̂. We also show that these minimax lower bounds can be achieved for the Lasso and scaled Lasso estimators.

2.1. Problem formulation. Recall the high-dimensional linear model,

(2.1)  y_{n×1} = X_{n×p} β_{p×1} + ϵ_{n×1},  ϵ ~ N(0, σ^2 I).

We focus on the random design with X_i ~ iid N(0, Σ), where X_i and ϵ_i are independent. Let Z = (X, y) denote the observed data and let β̂ be a given estimator of β. Denoting by L̂_q(Z) any estimator of the loss ‖β̂ − β‖_q, the minimax rate of convergence for estimating ‖β̂ − β‖_q over a parameter space Θ is defined as the largest quantity γ_{β̂,ℓ_q}(Θ) such that

(2.2)  inf_{L̂_q} sup_{θ∈Θ} P_θ( |L̂_q(Z) − ‖β̂ − β‖_q| ≥ γ_{β̂,ℓ_q}(Θ) ) ≥ δ,

for some constant δ > 0 not depending on n or p. We shall write L̂_q for L̂_q(Z) when there is no confusion.

We denote the parameter by θ = (β, Σ, σ), which consists of the signal β, the design covariance matrix Σ and the noise level σ. For a given

θ = (β, Σ, σ), we use β(θ) to denote the corresponding β. Two settings are considered: the first is known design covariance matrix Σ = I and known noise level σ = σ_0, and the other is unknown Σ and σ. In the first setting, we consider the following parameter space, which consists of k-sparse signals,

(2.3)  Θ_0(k) = { (β, I, σ_0) : ‖β‖_0 ≤ k },

and in the second setting, we consider

(2.4)  Θ(k) = { (β, Σ, σ) : ‖β‖_0 ≤ k, 1/M_1 ≤ λ_min(Σ) ≤ λ_max(Σ) ≤ M_1, 0 < σ ≤ M_2 },

where M_1 ≥ 1 and M_2 > 0 are constants. The parameter space Θ_0(k) is a subset of Θ(k), which consists of k-sparse signals with unknown Σ and σ. The minimax rate γ_{β̂,ℓ_q}(Θ) for estimating ‖β̂ − β‖_q also depends on the estimator β̂. Different estimators β̂ could lead to different losses ‖β̂ − β‖_q, and in general the difficulty of estimating the loss ‖β̂ − β‖_q varies with β̂. We first recall the properties of some state-of-the-art estimators and then specify the collection of estimators on which we focus in this paper. As shown in [1, 4, 3, 6], Lasso, Dantzig Selector, scaled Lasso and square-root Lasso satisfy the following property if the tuning parameter is properly chosen,

(2.5)  sup_{θ∈Θ(k)} P_θ( ‖β̂ − β‖_q ≥ C k^{1/q} √(log p / n) ) → 0,

where C > 0 is a constant. The minimax lower bounds established in [30, 3, 31] imply that k^{1/q} √(log p / n) is the optimal rate for estimating β over the parameter space Θ(k). It should be stressed that none of these algorithms requires knowledge of the sparsity k; they are thus adaptive to the sparsity provided k ≲ n / log p.

We consider a broad collection of estimators β̂ satisfying one of the following two assumptions.

(A1) The estimator β̂ satisfies, for some θ_0 = (β, I, σ_0),

(2.6)  P_{θ_0}( ‖β̂ − β‖_q ≥ C ‖β‖_0^{1/q} √(log p / n) σ_0 ) ≤ α_0,

where 0 ≤ α_0 < 1/4 and C > 0 are constants.

(A2) The estimator β̂ satisfies

(2.7)  sup_{ {θ = (β, I, σ) : σ ≤ 2σ_0} } P_θ( ‖β̂ − β‖_q ≥ C ‖β‖_0^{1/q} √(log p / n) σ ) ≤ α_0,

where 0 ≤ α_0 < 1/4 and C > 0 are constants and σ_0 > 0 is given.
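Behavior of the kind required by (A1)/(A2) is easy to observe in simulation: with Σ = I, a Lasso fit with penalty of order σ_0 √(2 log p / n) typically has ℓ_2 error of order √(k log p / n) σ_0. A minimal sketch (scikit-learn's Lasso as the estimator; the constants 1.5 and 10 below are illustrative choices, not the paper's):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, k, sigma0 = 400, 200, 5, 1.0

beta = np.zeros(p)
beta[:k] = 2.0                      # k-sparse signal
X = rng.standard_normal((n, p))     # rows ~ N_p(0, I)
y = X @ beta + sigma0 * rng.standard_normal(n)

lam = 1.5 * sigma0 * np.sqrt(2 * np.log(p) / n)   # lambda ~ A sigma0 sqrt(2 log p / n)
beta_hat = Lasso(alpha=lam).fit(X, y).coef_

l2_loss = np.linalg.norm(beta_hat - beta)
rate = np.sqrt(k * np.log(p) / n) * sigma0        # k^{1/q} sqrt(log p / n) with q = 2
print(l2_loss, rate)
```

The observed loss stays within a constant multiple of the rate, which is exactly the kind of one-point guarantee (A1) asks for.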

In view of the minimax rate given in (2.5), Assumption (A1) requires β̂ to be a good estimator of β at at least one point θ_0 ∈ Θ_0(k). Assumption (A2) is slightly stronger than (A1) and requires β̂ to estimate β well for a single β but over a range of noise levels σ ≤ 2σ_0 while Σ = I. Of course, any estimator β̂ satisfying (2.5) satisfies both (A1) and (A2). In addition to Assumptions (A1) and (A2), we also introduce the following sparsity assumptions that will be used in various theorems.

(B1) Let c_0 be the constant defined in (9.14). The sparsity levels k and k_0 satisfy k ≤ c_0 min{p^γ, n/log p} for some constant 0 ≤ γ < 1/2, and k_0 ≤ c_0 min{k, n/log p}.

(B2) The sparsity levels k_1, k_2 and k_0 satisfy k_1 ≤ k_2 ≤ c_0 min{p^γ, n/log p} for some constants 0 ≤ γ < 1/2 and c_0 > 0, and k_0 ≤ c_0 min{k_1, n/log p}.

2.2. Minimax estimation of the ℓ_q loss over Θ_0(k). The following theorem establishes the minimax lower bounds for estimating the loss ‖β̂ − β‖_q over the parameter space Θ_0(k).

Theorem 1. Suppose that the sparsity levels k and k_0 satisfy Assumption (B1). For any estimator β̂ satisfying Assumption (A1) with ‖β̂‖_0 ≤ k_0,

(2.8)  inf_{L̂} sup_{θ∈Θ_0(k)} P_θ( |L̂ − ‖β̂ − β‖_2^2| ≥ c min{ k log p/n, 1/√n } σ_0^2 ) ≥ δ.

For any estimator β̂ satisfying Assumption (A2) with ‖β̂‖_0 ≤ k_0,

(2.9)  inf_{L̂_q} sup_{θ∈Θ_0(k)} P_θ( |L̂_q − ‖β̂ − β‖_q| ≥ c k^{1/q} √(log p/n) σ_0 ) ≥ δ,  for 1 ≤ q < 2,

where δ > 0 and c > 0 are constants.

Remark 1. Assumption (A1) restricts our focus to estimators that can perform well at at least one point (β, I, σ_0) ∈ Θ_0(k). This weak condition makes the established lower bounds widely applicable as the benchmark for evaluating estimators of the ℓ_q loss of any β̂ that performs well on a proper subset, or even at a single point, of the whole parameter space. In this paper, we focus on estimating the loss ‖β̂ − β‖_q with 1 ≤ q ≤ 2. Similar results can be established for losses of the form ‖β̂ − β‖_q^q with 1 ≤ q ≤ 2: under the same assumptions as those in Theorem 1, the lower bounds for estimating the loss ‖β̂ − β‖_q^q hold with the convergence rates replaced by their q-th power; that is, (2.8) remains the same while the convergence

rate k^{1/q} √(log p/n) σ_0 in (2.9) is replaced by k (log p/n)^{q/2} σ_0^q. Similarly, all the results established in the rest of the paper for ‖β̂ − β‖_q hold for ‖β̂ − β‖_q^q with the corresponding convergence rates replaced by their q-th power.

Theorem 1 establishes the minimax lower bounds for estimating the ℓ_2 loss ‖β̂ − β‖_2^2 of any estimator β̂ satisfying Assumption (A1) and the ℓ_q loss ‖β̂ − β‖_q with 1 ≤ q < 2 of any estimator β̂ satisfying Assumption (A2). We will take the Lasso estimator as an example and demonstrate the implications of the above theorem. We randomly split Z = (y, X) into subsamples Z^{(1)} = (y^{(1)}, X^{(1)}) and Z^{(2)} = (y^{(2)}, X^{(2)}) with sample sizes n_1 and n_2, respectively. The Lasso estimator β̂_L based on the first subsample Z^{(1)} = (y^{(1)}, X^{(1)}) is defined as

(2.10)  β̂_L = argmin_{β∈R^p} ‖y^{(1)} − X^{(1)} β‖_2^2 / (2 n_1) + λ Σ_{j=1}^p (‖X^{(1)}_{·j}‖_2 / √n_1) |β_j|,

where λ = A √(2 log p / n_1) σ_0 with A > √2 being a pre-specified constant. Without loss of generality, we assume n_1 ≍ n_2.

For the case 1 ≤ q < 2, (2.5) and (2.9) together imply that estimation of the ℓ_q loss ‖β̂_L − β‖_q is impossible, since the lower bound can be achieved by the trivial estimator of the loss, 0. That is, sup_{θ∈Θ_0(k)} P_θ( |0 − ‖β̂_L − β‖_q| ≥ C k^{1/q} √(log p/n) σ_0 ) → 0. For the case q = 2, in the regime k ≲ √n / log p, the lower bound in (2.8) can be achieved by the zero estimator and hence estimation of the loss ‖β̂_L − β‖_2^2 is impossible. However, the interesting case is k ≳ √n / log p, where the loss estimator L̂ proposed in (2.11) achieves the minimax lower bound (1/√n) σ_0^2 in (2.8), which cannot be achieved by the zero estimator.

We now detail the construction of the loss estimator L̂. Based on the second subsample Z^{(2)} = (y^{(2)}, X^{(2)}), we propose the following estimator,

(2.11)  L̂ = ( ‖y^{(2)} − X^{(2)} β̂_L‖_2^2 / n_2 − σ_0^2 )_+.

Note that the first subsample Z^{(1)} = (y^{(1)}, X^{(1)}) is used to produce the Lasso estimator β̂_L in (2.10) and the second subsample Z^{(2)} = (y^{(2)}, X^{(2)}) is retained to evaluate the loss ‖β̂_L − β‖_2^2. Such a sample splitting technique is similar to cross-validation and has been used in [2] for constructing confidence sets for β and in [20] for confidence intervals for the ℓ_2 loss. The following proposition establishes that the estimator L̂ achieves the minimax lower bound of (2.8) over the regime k ≳ √n / log p.
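A minimal simulation sketch of the split-sample estimator (2.11), with scikit-learn's Lasso standing in for (2.10) and an illustrative tuning constant: the point is that, when Σ = I and β̂_L is independent of the second subsample, E[ ‖y^{(2)} − X^{(2)} β̂_L‖_2^2 / n_2 ] = ‖β̂_L − β‖_2^2 + σ_0^2, so subtracting σ_0^2 from the held-out mean squared residual estimates the squared ℓ_2 loss.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n1, n2, p, k, sigma0 = 300, 300, 150, 5, 1.0

beta = np.zeros(p); beta[:k] = 1.0
X1 = rng.standard_normal((n1, p)); y1 = X1 @ beta + sigma0 * rng.standard_normal(n1)
X2 = rng.standard_normal((n2, p)); y2 = X2 @ beta + sigma0 * rng.standard_normal(n2)

# step 1: Lasso on the first subsample, as in (2.10)
lam = 1.2 * sigma0 * np.sqrt(2 * np.log(p) / n1)
beta_L = Lasso(alpha=lam).fit(X1, y1).coef_

# step 2: held-out residuals estimate ||beta_L - beta||_2^2 + sigma0^2
L_hat = max(np.sum((y2 - X2 @ beta_L) ** 2) / n2 - sigma0**2, 0.0)

true_sq_loss = np.sum((beta_L - beta) ** 2)
print(L_hat, true_sq_loss)
```

With n_2 in the hundreds, the fluctuation of L̂ around the true squared loss is of order (‖β̂_L − β‖_2^2 + σ_0^2)/√n_2, matching the (1/√n) σ_0^2 scale in (2.8).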

Proposition 1. Suppose that k ≲ n / log p and β̂_L is the Lasso estimator defined in (2.10) with A > √2. Then the estimator of loss proposed in (2.11) satisfies, for any sequence δ_{n,p} → ∞,

(2.12)  lim_{n,p→∞} sup_{θ∈Θ_0(k)} P_θ( |L̂ − ‖β̂_L − β‖_2^2| ≥ δ_{n,p} (1/√n) σ_0^2 ) = 0.

2.3. Minimax estimation of the ℓ_q loss over Θ(k). We now turn to the case of unknown Σ and σ and establish the minimax lower bound for estimating the ℓ_q loss over the parameter space Θ(k).

Theorem 2. Suppose that the sparsity levels k and k_0 satisfy Assumption (B1). For any estimator β̂ satisfying Assumption (A1) with ‖β̂‖_0 ≤ k_0,

(2.13)  inf_{L̂_q} sup_{θ∈Θ(k)} P_θ( |L̂_q − ‖β̂ − β‖_q| ≥ c k^{1/q} √(log p/n) ) ≥ δ,  1 ≤ q ≤ 2,

where δ > 0 and c > 0 are constants.

Theorem 2 provides a minimax lower bound for estimating the ℓ_q loss of any estimator β̂ satisfying Assumption (A1), including the scaled Lasso estimator defined as

(2.14)  { β̂_{SL}, σ̂ } = argmin_{β∈R^p, σ∈R^+} ‖y − Xβ‖_2^2 / (2nσ) + σ/2 + λ_0 Σ_{j=1}^p (‖X_{·j}‖_2 / √n) |β_j|,

where λ_0 = A √(2 log p / n) with A > √2. Note that for the scaled Lasso estimator, the lower bound in (2.13) can be achieved by the trivial loss estimator 0, in the sense that sup_{θ∈Θ(k)} P_θ( |0 − ‖β̂_{SL} − β‖_q| ≥ C k^{1/q} √(log p/n) ) → 0, and hence estimation of loss is impossible in this case.

3. Minimaxity and adaptivity of confidence intervals over Θ_0(k). We focused in the last section on point estimation of the ℓ_q loss and showed the impossibility of loss estimation except in one regime. The results naturally lead to another question: is it possible to construct useful confidence intervals for ‖β̂ − β‖_q that can provide non-trivial upper and lower bounds for the loss? In this section, after introducing the framework for minimaxity and adaptivity of confidence intervals, we consider the case of known Σ = I and σ = σ_0 and establish the minimaxity and adaptivity lower bounds for the expected length of confidence intervals for the ℓ_q loss of a broad collection of estimators over the parameter space Θ_0(k). We also show that such

minimax lower bounds can be achieved for the Lasso estimator, and then discuss the possibility of adaptivity using the Lasso estimator as an example. The case of unknown Σ and σ will be the focus of the next section.

3.1. Framework for minimaxity and adaptivity of confidence intervals. In this section, we introduce the following decision-theoretic framework for confidence intervals of the loss ‖β̂ − β‖_q. Given 0 < α < 1, the parameter space Θ and the loss ‖β̂ − β‖_q, denote by I_α(Θ, β̂, ℓ_q) the set of all (1−α)-level confidence intervals for ‖β̂ − β‖_q over Θ,

(3.1)  I_α(Θ, β̂, ℓ_q) = { CI_α(β̂, ℓ_q, Z) = [l(Z), u(Z)] : inf_{θ∈Θ} P_θ( ‖β̂ − β(θ)‖_q ∈ CI_α(β̂, ℓ_q, Z) ) ≥ 1 − α }.

We will write CI_α for CI_α(β̂, ℓ_q, Z) when there is no confusion. For any confidence interval CI_α(β̂, ℓ_q, Z) = [l(Z), u(Z)], its length is denoted by L(CI_α(β̂, ℓ_q, Z)) = u(Z) − l(Z), and the maximum expected length over a parameter space Θ_1 is defined as

(3.2)  L(CI_α(β̂, ℓ_q, Z), Θ_1) = sup_{θ∈Θ_1} E_θ L(CI_α(β̂, ℓ_q, Z)).

For two nested parameter spaces Θ_1 ⊆ Θ_2, we define the benchmark L_α measuring the degree of adaptivity over the nested spaces Θ_1 ⊆ Θ_2,

(3.3)  L_α(Θ_1, Θ_2, β̂, ℓ_q) = inf_{CI_α(β̂,ℓ_q,Z) ∈ I_α(Θ_2, β̂, ℓ_q)} sup_{θ∈Θ_1} E_θ L(CI_α(β̂, ℓ_q, Z)).

We will write L_α(Θ_1, β̂, ℓ_q) for L_α(Θ_1, Θ_1, β̂, ℓ_q), which is the minimax expected length of confidence intervals for ‖β̂ − β‖_q over Θ_1. The benchmark L_α(Θ_1, Θ_2, β̂, ℓ_q) is the infimum of the maximum expected length over Θ_1 among all (1−α)-level confidence intervals over Θ_2. In contrast, L_α(Θ_1, β̂, ℓ_q) considers all (1−α)-level confidence intervals over Θ_1. In words, if there is prior information that the parameter lies in the smaller parameter space Θ_1, then L_α(Θ_1, β̂, ℓ_q) measures the benchmark length of confidence intervals over the parameter space Θ_1, as illustrated in the left panel of Figure 1; however, if there is only prior information that the parameter lies in the larger parameter space Θ_2, then L_α(Θ_1, Θ_2, β̂, ℓ_q) measures the benchmark length of confidence intervals over the parameter space Θ_1, as illustrated in the right panel of Figure 1.

Fig 1. The plot demonstrates the definitions of $L_{\alpha}^{*}(\Theta_{1}, \hat{\beta}, l_{q})$ and $L_{\alpha}^{*}(\Theta_{1}, \Theta_{2}, \hat{\beta}, l_{q})$.

Rigorously, we define a confidence interval $\mathrm{CI}$ to be simultaneously adaptive over $\Theta_{1}$ and $\Theta_{2}$ if $\mathrm{CI} \in \mathcal{I}_{\alpha}(\Theta_{2}, \hat{\beta}, l_{q})$ and

(3.4)  $L(\mathrm{CI}, \Theta_{1}) \asymp L_{\alpha}^{*}(\Theta_{1}, \hat{\beta}, l_{q}) \quad \text{and} \quad L(\mathrm{CI}, \Theta_{2}) \asymp L_{\alpha}^{*}(\Theta_{2}, \hat{\beta}, l_{q}).$

The condition (3.4) means that the confidence interval $\mathrm{CI}$, which has coverage over the larger parameter space $\Theta_{2}$, achieves the minimax rate over both $\Theta_{1}$ and $\Theta_{2}$. Note that $L(\mathrm{CI}, \Theta_{1}) \ge L_{\alpha}^{*}(\Theta_{1}, \Theta_{2}, \hat{\beta}, l_{q})$. If $L_{\alpha}^{*}(\Theta_{1}, \Theta_{2}, \hat{\beta}, l_{q}) \gg L_{\alpha}^{*}(\Theta_{1}, \hat{\beta}, l_{q})$, then the rate-optimal adaptation (3.4) is impossible to achieve for $\Theta_{1} \subset \Theta_{2}$. Otherwise, it is possible to construct confidence intervals simultaneously adaptive over the parameter spaces $\Theta_{1}$ and $\Theta_{2}$. The possibility of adaptation over the parameter spaces $\Theta_{1}$ and $\Theta_{2}$ can thus be answered by investigating the benchmark quantities $L_{\alpha}^{*}(\Theta_{1}, \hat{\beta}, l_{q})$ and $L_{\alpha}^{*}(\Theta_{1}, \Theta_{2}, \hat{\beta}, l_{q})$. Such a framework was introduced in [7], which studies the minimaxity and adaptivity of confidence intervals for linear functionals in high-dimensional linear regression. We adopt the minimax and adaptation framework discussed above and establish the minimax expected length $L_{\alpha}^{*}(\Theta_{0}(k), \hat{\beta}, l_{q})$ and the adaptation benchmark $L_{\alpha}^{*}(\Theta_{0}(k_{1}), \Theta_{0}(k_{2}), \hat{\beta}, l_{q})$. In terms of the minimax expected length and the adaptivity behavior, there exist fundamental differences between the case $q = 2$ and the case $1 \le q < 2$. We discuss them separately in the following two subsections.

3.2. Confidence intervals for the $l_2$ loss over $\Theta_{0}(k)$. The following theorem establishes the minimax lower bound for the expected length of confidence intervals of $\|\hat{\beta} - \beta\|_{2}^{2}$ over the parameter space $\Theta_{0}(k)$.

Theorem 3. Suppose that $0 < \alpha < \frac{1}{4}$ and the sparsity levels $k$ and $k_{0}$ satisfy Assumption (B1). For any estimator $\hat{\beta}$ satisfying Assumption (A1)

with $\|\hat{\beta}\|_0 \le k_{0}$, there is some constant $c > 0$ such that

(3.5)  $L_{\alpha}^{*}(\Theta_{0}(k), \hat{\beta}, l_{2}) \ge c \min\left\{ k \frac{\log p}{n},\ \frac{1}{\sqrt{n}} \right\} \sigma_{0}^{2}.$

In particular, if $\hat{\beta}^{L}$ is the Lasso estimator defined in (2.10) with $A > 2$, then the minimax expected length for $1-\alpha$ level confidence intervals of $\|\hat{\beta}^{L} - \beta\|_{2}^{2}$ over $\Theta_{0}(k)$ is

(3.6)  $L_{\alpha}^{*}(\Theta_{0}(k), \hat{\beta}^{L}, l_{2}) \asymp \min\left\{ k \frac{\log p}{n},\ \frac{1}{\sqrt{n}} \right\} \sigma_{0}^{2}.$

We now consider adaptivity of confidence intervals for the $l_2$ loss. The following theorem gives the lower bound for the benchmark $L_{\alpha}^{*}(\Theta_{0}(k_{1}), \Theta_{0}(k_{2}), \hat{\beta}, l_{2})$. We then discuss Theorems 3 and 4 together.

Theorem 4. Suppose that $0 < \alpha < \frac{1}{4}$ and the sparsity levels $k_{1}$, $k_{2}$ and $k_{0}$ satisfy Assumption (B2). For any estimator $\hat{\beta}$ satisfying Assumption (A1) with $\|\hat{\beta}\|_0 \le k_{0}$, there is some constant $c > 0$ such that

(3.7)  $L_{\alpha}^{*}(\Theta_{0}(k_{1}), \Theta_{0}(k_{2}), \hat{\beta}, l_{2}) \ge c \min\left\{ k_{2} \frac{\log p}{n},\ \frac{1}{\sqrt{n}} \right\} \sigma_{0}^{2}.$

In particular, if $\hat{\beta}^{L}$ is the Lasso estimator defined in (2.10) with $A > 2$, the above lower bound can be achieved.

The lower bound established in Theorem 4 implies that of Theorem 3, and both lower bounds hold for a general class of estimators satisfying Assumption (A1). There is a phase transition for the lower bound of the benchmark $L_{\alpha}^{*}(\Theta_{0}(k_{1}), \Theta_{0}(k_{2}), \hat{\beta}, l_{2})$: in the regime $k_{2} \lesssim \frac{\sqrt{n}}{\log p}$, the lower bound in (3.7) is $k_{2} \frac{\log p}{n} \sigma_{0}^{2}$; when $k_{2} \gtrsim \frac{\sqrt{n}}{\log p}$, the lower bound in (3.7) is $\frac{1}{\sqrt{n}} \sigma_{0}^{2}$. For the Lasso estimator $\hat{\beta}^{L}$ defined in (2.10), the lower bounds $k \frac{\log p}{n} \sigma_{0}^{2}$ in (3.5) and $k_{2} \frac{\log p}{n} \sigma_{0}^{2}$ in (3.7) can be achieved by the confidence intervals $\mathrm{CI}_{\alpha}^{0}(Z, k, 2)$ and $\mathrm{CI}_{\alpha}^{0}(Z, k_{2}, 2)$ defined in (3.15), respectively. Applying an idea similar to (2.11), we show that the minimax lower bound $\frac{1}{\sqrt{n}} \sigma_{0}^{2}$ in (3.6) and (3.7) can be achieved by the following confidence interval,

(3.8)  $\mathrm{CI}_{\alpha}^{1}(Z) = \left[ \left( \psi(Z) - \frac{\chi_{1-\alpha/2}^{2}(n)}{n}\, \sigma_{0}^{2} \right)_{+},\ \left( \psi(Z) - \frac{\chi_{\alpha/2}^{2}(n)}{n}\, \sigma_{0}^{2} \right)_{+} \right],$

where $\chi_{1-\alpha/2}^{2}(n)$ and $\chi_{\alpha/2}^{2}(n)$ are the $1-\alpha/2$ and $\alpha/2$ quantiles of the $\chi^{2}$ random variable with $n$ degrees of freedom, respectively, and

(3.9)  $\psi(Z) = \min\left\{ \frac{1}{n} \|y - X\hat{\beta}^{L}\|_{2}^{2},\ 2\sigma_{0}^{2} \right\}.$

Note that the two-sided confidence interval (3.8) is based solely on the observed data $Z$ and does not depend on any prior knowledge of the sparsity $k$. Furthermore, being two-sided, it provides not only an upper bound but also a lower bound for the loss. The coverage property and the expected length of $\mathrm{CI}_{\alpha}^{1}(Z)$ are established in the following proposition.

Proposition 2. Suppose $k \lesssim \frac{n}{\log p}$ and $\hat{\beta}^{L}$ is the estimator defined in (2.10) with $A > 2$. Then $\mathrm{CI}_{\alpha}^{1}(Z)$ defined in (3.8) satisfies

(3.10)  $\liminf_{n,p\to\infty}\ \inf_{\theta \in \Theta_{0}(k)} P_{\theta}\left( \|\hat{\beta}^{L} - \beta\|_{2}^{2} \in \mathrm{CI}_{\alpha}^{1}(Z) \right) \ge 1 - \alpha$

and

(3.11)  $L\left(\mathrm{CI}_{\alpha}^{1}(Z), \Theta_{0}(k)\right) \lesssim \frac{1}{\sqrt{n}}\, \sigma_{0}^{2}.$

Fig 2. Illustration of $L_{\alpha}^{*}(\Theta_{0}(k_{1}), \hat{\beta}^{L}, l_{2})$ (top) and $L_{\alpha}^{*}(\Theta_{0}(k_{1}), \Theta_{0}(k_{2}), \hat{\beta}^{L}, l_{2})$ (bottom) over the regimes $k_{1} \le k_{2} \lesssim \frac{\sqrt{n}}{\log p}$ (leftmost), $k_{1} \lesssim \frac{\sqrt{n}}{\log p} \lesssim k_{2}$ (middle) and $\frac{\sqrt{n}}{\log p} \lesssim k_{1} \le k_{2}$ (rightmost).

Regarding the Lasso estimator $\hat{\beta}^{L}$ defined in (2.10), we now discuss the possibility of adaptive confidence intervals for $\|\hat{\beta}^{L} - \beta\|_{2}^{2}$. The adaptivity behavior is demonstrated in Figure 2. As illustrated in the rightmost plot of Figure 2, in the regime $\frac{\sqrt{n}}{\log p} \lesssim k_{1} \le k_{2}$ we obtain $L_{\alpha}^{*}(\Theta_{0}(k_{1}), \Theta_{0}(k_{2}), \hat{\beta}^{L}, l_{2}) \asymp L_{\alpha}^{*}(\Theta_{0}(k_{1}), \hat{\beta}^{L}, l_{2}) \asymp$

$\frac{1}{\sqrt{n}} \sigma_{0}^{2}$, which implies that adaptation is possible over this regime. As shown in Proposition 2, the confidence interval $\mathrm{CI}_{\alpha}^{1}(Z)$ defined in (3.8) is fully adaptive over the regime $k_{1}, k_{2} \gtrsim \frac{\sqrt{n}}{\log p}$ in the sense of (3.4). As illustrated in the leftmost and middle plots of Figure 2, it is impossible to construct an adaptive confidence interval for $\|\hat{\beta}^{L} - \beta\|_{2}^{2}$ over the regimes $k_{1} \le k_{2} \lesssim \frac{\sqrt{n}}{\log p}$ and $k_{1} \lesssim \frac{\sqrt{n}}{\log p} \lesssim k_{2}$, since $L_{\alpha}^{*}(\Theta_{0}(k_{1}), \Theta_{0}(k_{2}), \hat{\beta}^{L}, l_{2}) \gg L_{\alpha}^{*}(\Theta_{0}(k_{1}), \hat{\beta}^{L}, l_{2})$ if $k_{1} \lesssim \frac{\sqrt{n}}{\log p}$ and $k_{1} \ll k_{2}$. To sum up, adaptive confidence intervals for $\|\hat{\beta}^{L} - \beta\|_{2}^{2}$ are only possible over the regime $k_{1}, k_{2} \gtrsim \frac{\sqrt{n}}{\log p}$.

Comparison with confidence balls. We should note that the problem of constructing confidence intervals for $\|\hat{\beta} - \beta\|_{2}^{2}$ is related to, but different from, that of constructing confidence sets for $\beta$ itself. The confidence balls constructed in [22] are of the form $\left\{\beta : \|\hat{\beta} - \beta\|_{2}^{2} \le u(Z)\right\}$, where $\hat{\beta}$ can be the Lasso estimator and $u(Z)$ is a data-dependent squared radius; see [22] for further details. A naive application of this confidence ball leads to a one-sided confidence interval for the loss $\|\hat{\beta} - \beta\|_{2}^{2}$,

(3.12)  $\mathrm{CI}_{\alpha}^{\mathrm{induced}}(Z) = \left\{ \|\hat{\beta} - \beta\|_{2}^{2} : \|\hat{\beta} - \beta\|_{2}^{2} \le u(Z) \right\}.$

Because confidence sets for $\beta$ were the goal in Theorem 1 of [22], confidence sets of the form $\left\{\beta : \|\hat{\beta} - \beta\|_{2}^{2} \le u(Z)\right\}$ suffice to achieve the optimal length. However, since our goal is to characterize $\|\hat{\beta} - \beta\|_{2}^{2}$, we apply the unbiased risk estimation discussed in Theorem 1 of [22] and construct the two-sided confidence interval in (3.8). Such a two-sided confidence interval is more informative than the one-sided confidence interval (3.12), since the one-sided interval does not tell whether the loss is close to zero or not. Furthermore, as shown in [22], the length of the confidence interval $\mathrm{CI}_{\alpha}^{\mathrm{induced}}(Z)$ over the parameter space $\Theta_{0}(k)$ is of order $\frac{1}{\sqrt{n}} + k \frac{\log p}{n}$. The two-sided confidence interval $\mathrm{CI}_{\alpha}^{1}(Z)$ constructed in (3.8) has expected length of order $\frac{1}{\sqrt{n}}$, which is much shorter than $\frac{1}{\sqrt{n}} + k \frac{\log p}{n}$ in the regime $k \gtrsim \frac{\sqrt{n}}{\log p}$. That is, the two-sided confidence interval (3.8) provides a more accurate interval estimator of the $l_2$ loss. This is illustrated in Figure 3.
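The interval (3.8) requires only the residual sum of squares, $\sigma_{0}$ and two $\chi^{2}$ quantiles, so it is straightforward to compute. The following sketch uses the Wilson-Hilferty approximation for the $\chi^{2}$ quantile to stay within the Python standard library; the truncation level $2\sigma_{0}^{2}$ in $\psi$ and the helper names reflect our reading of (3.8)-(3.9), not code from the paper.

```python
from statistics import NormalDist

def chi2_quantile(p, df):
    # Wilson-Hilferty approximation to the chi-square quantile (good for large df)
    z = NormalDist().inv_cdf(p)
    return df * (1 - 2 / (9 * df) + z * (2 / (9 * df)) ** 0.5) ** 3

def two_sided_loss_ci(residual_ss, n, sigma0, alpha=0.05):
    # In the spirit of (3.8)-(3.9): psi is the truncated normalized residual
    # sum of squares; subtracting the chi-square quantile mass of the noise
    # leaves an interval of length ~ sigma0^2 / sqrt(n) for the l2 loss.
    psi = min(residual_ss / n, 2 * sigma0 ** 2)
    lo = max(psi - chi2_quantile(1 - alpha / 2, n) / n * sigma0 ** 2, 0.0)
    hi = max(psi - chi2_quantile(alpha / 2, n) / n * sigma0 ** 2, 0.0)
    return lo, hi
```

For example, with $n = 400$, $\sigma_{0} = 1$ and residual sum of squares $520$ (so $\psi = 1.3$), the interval brackets a loss of about $0.3$ from both sides.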
The lower bound technique developed in the literature on adaptive confidence sets [22] can also be used to establish some of the lower bound results for the case $q = 2$ given in the present paper. However, new techniques are needed in order to establish the rate-sharp lower bounds for the minimax estimation error (2.9) in the region $k \gtrsim \frac{\sqrt{n}}{\log p}$ and for the

expected length of the confidence intervals (3.18) and (7.3) in the region $k_{1} \lesssim \frac{\sqrt{n}}{\log p} \lesssim k_{2}$, where it is necessary to test a composite null against a composite alternative in order to establish rate-sharp lower bounds.

Fig 3. Comparison of the two-sided confidence interval $\mathrm{CI}_{\alpha}^{1}(Z)$ with the one-sided confidence interval $\mathrm{CI}_{\alpha}^{\mathrm{induced}}(Z)$.

3.3. Confidence intervals for the $l_q$ loss with $1 \le q < 2$ over $\Theta_{0}(k)$. We now consider the case $1 \le q < 2$ and investigate the minimax expected length and adaptivity of confidence intervals for $\|\hat{\beta} - \beta\|_{q}^{2}$ over the parameter space $\Theta_{0}(k)$. The following theorem characterizes the minimax convergence rate for the expected length of confidence intervals.

Theorem 5. Suppose that $0 < \alpha < \frac{1}{4}$, $1 \le q < 2$ and the sparsity levels $k$ and $k_{0}$ satisfy Assumption (B1). For any estimator $\hat{\beta}$ satisfying Assumption (A2) with $\|\hat{\beta}\|_0 \le k_{0}$, there is some constant $c > 0$ such that

(3.13)  $L_{\alpha}^{*}(\Theta_{0}(k), \hat{\beta}, l_{q}) \ge c\, k^{2/q} \frac{\log p}{n}\, \sigma_{0}^{2}.$

In particular, if $\hat{\beta}^{L}$ is the Lasso estimator defined in (2.10) with $A > 4$, then the minimax expected length for $1-\alpha$ level confidence intervals of $\|\hat{\beta}^{L} - \beta\|_{q}^{2}$ over $\Theta_{0}(k)$ is

(3.14)  $L_{\alpha}^{*}(\Theta_{0}(k), \hat{\beta}^{L}, l_{q}) \asymp k^{2/q} \frac{\log p}{n}\, \sigma_{0}^{2}.$

We now construct a confidence interval achieving the minimax convergence rate in (3.14),

(3.15)  $\mathrm{CI}_{\alpha}^{0}(Z, k, q) = \left[ 0,\ C(A, k)\, k^{2/q} \frac{\log p}{n} \right],$

where $C(A, k)$ is an explicit constant depending only on $A$, $k$ and $\sigma_{0}^{2}$. The following proposition establishes the coverage property and the expected length of $\mathrm{CI}_{\alpha}^{0}(Z, k, q)$.
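The $k^{2/q} \frac{\log p}{n}$ rate can be traced to the norm comparison $\|v\|_{q}^{2} \le k^{2/q-1} \|v\|_{2}^{2}$ for a $k$-sparse vector $v$ and $1 \le q \le 2$ (Hölder's inequality), which converts the $l_2$ rate $k \frac{\log p}{n}$ into the $l_q$ rate. A quick numerical check of the inequality (the helper name and the example vector are ours):

```python
def lq_norm(v, q):
    # l_q norm of a vector, computed directly from the definition
    return sum(abs(x) ** q for x in v) ** (1.0 / q)

# A k-sparse vector: for 1 <= q <= 2, Holder's inequality gives
#   ||v||_q^2 <= k^(2/q - 1) * ||v||_2^2.
v = [3.0, -1.0, 2.0, 0.0, 0.0]
k = 3  # number of nonzero coordinates
checks = [lq_norm(v, q) ** 2 <= k ** (2.0 / q - 1.0) * lq_norm(v, 2) ** 2 + 1e-9
          for q in (1.0, 1.25, 1.5, 2.0)]
```

At $q = 2$ the bound is an equality of rates, and at $q = 1$ it reduces to the familiar $\|v\|_{1}^{2} \le k \|v\|_{2}^{2}$.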

Proposition 3. Suppose $k \lesssim \frac{n}{\log p}$ and $\hat{\beta}^{L}$ is the estimator defined in (2.10) with $A > 4$. For $1 \le q \le 2$, the confidence interval $\mathrm{CI}_{\alpha}^{0}(Z, k, q)$ defined in (3.15) satisfies

(3.16)  $\liminf_{n,p\to\infty}\ \inf_{\theta \in \Theta_{0}(k)} P_{\theta}\left( \|\hat{\beta}^{L} - \beta\|_{q}^{2} \in \mathrm{CI}_{\alpha}^{0}(Z, k, q) \right) = 1$

and

(3.17)  $L\left(\mathrm{CI}_{\alpha}^{0}(Z, k, q), \Theta_{0}(k)\right) \lesssim k^{2/q} \frac{\log p}{n}\, \sigma_{0}^{2}.$

In particular, for the case $q = 2$, (3.16) and (3.17) also hold for the estimator $\hat{\beta}^{L}$ defined in (2.10) with $A > 2$.

This result shows that the confidence interval $\mathrm{CI}_{\alpha}^{0}(Z, k, q)$ achieves the minimax rate given in (3.14). In contrast to the $l_2$ loss, where the two-sided confidence interval (3.8) is significantly shorter than the one-sided interval and achieves the optimal rate over the regime $k \gtrsim \frac{\sqrt{n}}{\log p}$, for the $l_q$ loss with $1 \le q < 2$ the one-sided confidence interval achieves the optimal rate given in (3.14).

We now consider adaptivity of confidence intervals. The following theorem establishes the lower bounds for $L_{\alpha}^{*}(\Theta_{0}(k_{1}), \Theta_{0}(k_{2}), \hat{\beta}, l_{q})$ with $1 \le q < 2$.

Theorem 6. Suppose $0 < \alpha < \frac{1}{4}$, $1 \le q < 2$ and the sparsity levels $k_{1}$, $k_{2}$ and $k_{0}$ satisfy Assumption (B2). For any estimator $\hat{\beta}$ satisfying Assumption (A2) with $\|\hat{\beta}\|_0 \le k_{0}$, there is some constant $c > 0$ such that

(3.18)  $L_{\alpha}^{*}(\Theta_{0}(k_{1}), \Theta_{0}(k_{2}), \hat{\beta}, l_{q}) \ge \begin{cases} c\, k_{2}^{2/q} \frac{\log p}{n}\, \sigma_{0}^{2} & \text{if } k_{1} \le k_{2} \lesssim \frac{\sqrt{n}}{\log p}; \\ c\, k_{2}^{2/q-1} k_{1} \frac{\log p}{n}\, \sigma_{0}^{2} & \text{if } \frac{\sqrt{n}}{\log p} \lesssim k_{1} \le k_{2}; \\ c\, k_{2}^{2/q-1} \frac{1}{\sqrt{n}}\, \sigma_{0}^{2} & \text{if } k_{1} \lesssim \frac{\sqrt{n}}{\log p} \lesssim k_{2}. \end{cases}$

In particular, if $n \lesssim p$ and $\hat{\beta}^{L}$ is the Lasso estimator defined in (2.10) with $A > 4$, the above lower bounds can be achieved.

The lower bounds of Theorem 6 imply those of Theorem 5, and both hold for a general class of estimators satisfying Assumption (A2). However, the lower bound (3.18) in Theorem 6 has a significantly different meaning from (3.13) in Theorem 5: (3.18) quantifies the cost of adaptation without knowing the sparsity level. For the Lasso estimator

$\hat{\beta}^{L}$ defined in (2.10), by comparing Theorem 5 and Theorem 6 we obtain $L_{\alpha}^{*}(\Theta_{0}(k_{1}), \Theta_{0}(k_{2}), \hat{\beta}^{L}, l_{q}) \gg L_{\alpha}^{*}(\Theta_{0}(k_{1}), \hat{\beta}^{L}, l_{q})$ if $k_{1} \ll k_{2}$, which implies the impossibility of constructing adaptive confidence intervals for the case $1 \le q < 2$. There is a marked difference between the case $1 \le q < 2$ and the case $q = 2$, where it is possible to construct adaptive confidence intervals over the regime $k_{1}, k_{2} \gtrsim \frac{\sqrt{n}}{\log p}$. For the Lasso estimator $\hat{\beta}^{L}$ defined in (2.10), it is shown in Proposition 3 that the confidence interval $\mathrm{CI}_{\alpha}^{0}(Z, k_{2}, q)$ defined in (3.15) achieves the lower bound $k_{2}^{2/q} \frac{\log p}{n} \sigma_{0}^{2}$ of (3.18). The lower bounds $k_{2}^{2/q-1} k_{1} \frac{\log p}{n} \sigma_{0}^{2}$ and $k_{2}^{2/q-1} \frac{1}{\sqrt{n}} \sigma_{0}^{2}$ of (3.18) can be achieved by the following proposed confidence interval,

(3.19)  $\mathrm{CI}_{\alpha}^{2}(Z, k_{2}, q) = \left[ \left( \psi(Z) - \frac{\chi_{1-\alpha/2}^{2}(n)}{n}\, \sigma_{0}^{2} \right)_{+},\ (16 k_{2})^{2/q-1} \left( \psi(Z) - \frac{\chi_{\alpha/2}^{2}(n)}{n}\, \sigma_{0}^{2} \right)_{+} \right],$

where $\psi(Z)$ is given in (3.9). The above claim is verified in Proposition 4. Note that the confidence interval $\mathrm{CI}_{\alpha}^{1}(Z)$ defined in (3.8) is a special case of $\mathrm{CI}_{\alpha}^{2}(Z, k_{2}, q)$ with $q = 2$.

Proposition 4. Suppose $n \lesssim p$, $k_{1} \le k_{2} \lesssim \frac{n}{\log p}$ and $\hat{\beta}^{L}$ is defined in (2.10) with $A > 4$. Then $\mathrm{CI}_{\alpha}^{2}(Z, k_{2}, q)$ defined in (3.19) satisfies

(3.20)  $\liminf_{n,p\to\infty}\ \inf_{\theta \in \Theta_{0}(k_{1})} P_{\theta}\left( \|\hat{\beta}^{L} - \beta\|_{q}^{2} \in \mathrm{CI}_{\alpha}^{2}(Z, k_{2}, q) \right) \ge 1 - \alpha$

and

(3.21)  $L\left(\mathrm{CI}_{\alpha}^{2}(Z, k_{2}, q), \Theta_{0}(k_{1})\right) \lesssim k_{2}^{2/q-1} \left( k_{1} \frac{\log p}{n} + \frac{1}{\sqrt{n}} \right) \sigma_{0}^{2}.$

4. Minimaxity and adaptivity of confidence intervals over $\Theta(k)$. In this section, we focus on the case of unknown $\Sigma$ and $\sigma$ and establish the minimax expected length of confidence intervals for $\|\hat{\beta} - \beta\|_{q}^{2}$ with $1 \le q \le 2$ over $\Theta(k)$ defined in (2.4). We also study the possibility of adaptivity of confidence intervals for $\|\hat{\beta} - \beta\|_{q}^{2}$. The following theorem establishes the lower bounds for the benchmark quantities $L_{\alpha}^{*}(\Theta(k_{i}), \hat{\beta}, l_{q})$ with $i = 1, 2$ and $L_{\alpha}^{*}(\Theta(k_{1}), \Theta(k_{2}), \hat{\beta}, l_{q})$.

Theorem 7. Suppose that $0 < \alpha < \frac{1}{4}$, $1 \le q \le 2$ and the sparsity levels $k_{1}$, $k_{2}$ and $k_{0}$ satisfy Assumption (B2). For any estimator $\hat{\beta}$ satisfying

Assumption (A1) at $\theta_{0} = (\beta, \mathrm{I}, \sigma_{0})$ with $\|\hat{\beta}\|_0 \le k_{0}$, there is a constant $c > 0$ such that

(4.1)  $L_{\alpha}^{*}(\Theta(k_{i}), \hat{\beta}, l_{q}) \ge c\, k_{i}^{2/q} \frac{\log p}{n}, \quad \text{for } i = 1, 2;$

(4.2)  $L_{\alpha}^{*}(\{\theta_{0}\}, \Theta(k_{2}), \hat{\beta}, l_{q}) \ge c\, k_{2}^{2/q} \frac{\log p}{n}.$

In particular, if $\hat{\beta}^{SL}$ is the scaled Lasso estimator defined in (2.14) with $A > 2$, then the above lower bounds can be achieved.

The lower bounds (4.1) and (4.2) hold for any $\hat{\beta}$ satisfying Assumption (A1) at an interior point $\theta_{0} = (\beta, \mathrm{I}, \sigma_{0})$, including the scaled Lasso estimator as a special case. We demonstrate the impossibility of adaptivity of confidence intervals for the $l_q$ loss of the scaled Lasso estimator $\hat{\beta}^{SL}$. Since $L_{\alpha}^{*}(\Theta(k_{1}), \Theta(k_{2}), \hat{\beta}^{SL}, l_{q}) \ge L_{\alpha}^{*}(\{\theta_{0}\}, \Theta(k_{2}), \hat{\beta}^{SL}, l_{q})$, by (4.2) we have $L_{\alpha}^{*}(\Theta(k_{1}), \Theta(k_{2}), \hat{\beta}^{SL}, l_{q}) \gg L_{\alpha}^{*}(\Theta(k_{1}), \hat{\beta}^{SL}, l_{q})$ if $k_{1} \ll k_{2}$. The comparison of $L_{\alpha}^{*}(\Theta(k_{1}), \hat{\beta}^{SL}, l_{q})$ and $L_{\alpha}^{*}(\Theta(k_{1}), \Theta(k_{2}), \hat{\beta}^{SL}, l_{q})$ is illustrated in Figure 4. Referring to the adaptivity defined in (3.4), it is impossible to construct adaptive confidence intervals for $\|\hat{\beta}^{SL} - \beta\|_{q}^{2}$.

Fig 4. Illustration of $L_{\alpha}^{*}(\Theta(k_{1}), \hat{\beta}^{SL}, l_{q})$ (left) and $L_{\alpha}^{*}(\Theta(k_{1}), \Theta(k_{2}), \hat{\beta}^{SL}, l_{q})$ (right).

Theorem 7 shows that for any confidence interval $\mathrm{CI}_{\alpha}(\hat{\beta}, l_{q}, Z)$ for the loss of any given estimator $\hat{\beta}$ satisfying Assumption (A1), under the coverage constraint $\mathrm{CI}_{\alpha}(\hat{\beta}, l_{q}, Z) \in \mathcal{I}_{\alpha}(\Theta(k_{2}), \hat{\beta}, l_{q})$, its expected length at any given $\theta_{0} = (\beta, \mathrm{I}, \sigma_{0}) \in \Theta(k_{2})$ must be of order $k_{2}^{2/q} \frac{\log p}{n}$. In contrast to Theorems 4 and 6, Theorem 7 demonstrates that confidence intervals must be long at a large subset of points in the parameter space, not just at a small

number of unlucky points. Therefore, the lack of adaptivity of confidence intervals is not due to the conservativeness of the minimax framework.

In the following, we detail the construction of confidence intervals for $\|\hat{\beta}^{SL} - \beta\|_{q}^{2}$. The construction is based on the following definition of the restricted eigenvalue, introduced in [4],

(4.3)  $\kappa(X, k, s, \alpha_{0}) = \min_{\substack{J_{0} \subset \{1,\dots,p\} \\ |J_{0}| \le k}}\ \min_{\substack{\delta \ne 0 \\ \|\delta_{J_{0}^{c}}\|_{1} \le \alpha_{0} \|\delta_{J_{0}}\|_{1}}} \frac{\|X\delta\|_{2}}{\sqrt{n}\, \|\delta_{J_{01}}\|_{2}},$

where $J_{1}$ denotes the subset corresponding to the $s$ largest (in absolute value) coordinates of $\delta$ outside of $J_{0}$ and $J_{01} = J_{0} \cup J_{1}$. Define the event $B$ on which the noise estimator $\hat{\sigma}$ does not exceed a fixed threshold. The confidence interval for $\|\hat{\beta}^{SL} - \beta\|_{q}^{2}$ is defined as

(4.4)  $\mathrm{CI}_{\alpha}(Z, k, q) = \begin{cases} \left[0, \varphi(Z, k, q)\right] & \text{on } B, \\ \{0\} & \text{on } B^{c}, \end{cases}$

where

$\varphi(Z, k, q) = \min\left\{ \frac{16 A\, \max_{1\le j\le p} \|X_{j}\|_{2}^{2}/n}{\kappa^{2}\!\left(X, k, k, \frac{3 \max_{j} \|X_{j}\|_{2}}{\min_{j} \|X_{j}\|_{2}}\right)}\, k^{2/q} \frac{\log p}{n},\ k^{2/q} \right\} \hat{\sigma}^{2}.$

Remark 2. The restricted eigenvalue $\kappa\left(X, k, k, \frac{3 \max_{j} \|X_{j}\|_{2}}{\min_{j} \|X_{j}\|_{2}}\right)$ is computationally infeasible. For design covariance matrices $\Sigma$ of special structure, the restricted eigenvalue can be replaced by a lower bound, and a computationally feasible confidence interval can be constructed. See Section 4.4 in [7] for more details.

The properties of $\mathrm{CI}_{\alpha}(Z, k, q)$ are established as follows.

Proposition 5. Suppose $k \lesssim \frac{n}{\log p}$ and $\hat{\beta}^{SL}$ is the estimator defined in (2.14) with $A > 2$. For $1 \le q \le 2$, the interval $\mathrm{CI}_{\alpha}(Z, k, q)$ defined in (4.4) satisfies the following properties,

(4.5)  $\liminf_{n,p\to\infty}\ \inf_{\theta \in \Theta(k)} P_{\theta}\left( \|\hat{\beta}^{SL} - \beta\|_{q}^{2} \in \mathrm{CI}_{\alpha}(Z, k, q) \right) = 1,$

(4.6)  $L\left(\mathrm{CI}_{\alpha}(Z, k, q), \Theta(k)\right) \lesssim k^{2/q} \frac{\log p}{n}.$

Proposition 5 shows that the confidence interval $\mathrm{CI}_{\alpha}(Z, k_{i}, q)$ defined in (4.4) achieves the lower bound in (4.1) for $i = 1, 2$, and the confidence interval $\mathrm{CI}_{\alpha}(Z, k_{2}, q)$ defined in (4.4) achieves the lower bound in (4.2).
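Remark 2 notes that the restricted eigenvalue in (4.3) is computationally infeasible: it involves a minimum over all sparsity patterns. The sketch below is our illustration, not the paper's procedure. It searches randomly over feasible directions in the cone; because $\kappa$ is defined as a minimum, each feasible direction only certifies an upper bound on $\kappa(X, k, s, \alpha_{0})$.

```python
import math, random

def re_upper_bound(X, k, s, alpha0, trials=200, seed=1):
    """Randomized search over the cone in (4.3).  Since kappa is a minimum,
    every feasible direction yields an UPPER bound on kappa(X, k, s, alpha0);
    certifying the exact value needs a search over all sparsity patterns,
    which is infeasible in general (cf. Remark 2)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    best = float("inf")
    for _ in range(trials):
        J0 = rng.sample(range(p), k)
        d = [0.0] * p
        for j in J0:
            d[j] = rng.gauss(0, 1)
        budget = alpha0 * sum(abs(d[j]) for j in J0)
        outside = [j for j in range(p) if j not in J0]
        # spread a small allowed l1 mass outside J0 to stay inside the cone
        for j in rng.sample(outside, min(2, len(outside))):
            d[j] = rng.choice([-1.0, 1.0]) * budget / 4
        # J1: the s largest |d_j| outside J0; J01 = J0 union J1
        J1 = sorted(outside, key=lambda j: -abs(d[j]))[:s]
        J01 = set(J0) | set(J1)
        norm_J01 = math.sqrt(sum(d[j] ** 2 for j in J01))
        Xd2 = sum(sum(X[i][j] * d[j] for j in range(p)) ** 2 for i in range(n))
        best = min(best, math.sqrt(Xd2) / (math.sqrt(n) * norm_J01))
    return best
```

For an orthogonal design with column norms $\sqrt{n}$ one has $\|X\delta\|_{2} = \sqrt{n}\|\delta\|_{2} \ge \sqrt{n}\|\delta_{J_{01}}\|_{2}$, so the true $\kappa$ is at least $1$, and the randomized search reports a value slightly above $1$.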

5. Estimation of the $l_q$ loss of rate-optimal estimators. We have established minimax lower bounds for the estimation accuracy of the loss of a broad class of estimators $\hat{\beta}$ satisfying (A1) or (A2) and also demonstrated that these minimax lower bounds are sharp for the Lasso and scaled Lasso estimators. We now show that the minimax lower bounds are sharp for the class of rate-optimal estimators satisfying the following Assumption (A).

(A) The estimator $\hat{\beta}$ satisfies

(5.1)  $\sup_{\theta \in \Theta(k)} P_{\theta}\left( \|\hat{\beta} - \beta\|_{q}^{2} \ge C\, \|\beta\|_{0}^{2/q} \frac{\log p}{n}\, \sigma^{2} \right) \le C' p^{-\delta}, \quad \text{for all } k \lesssim \frac{n}{\log p},$

where $\delta > 0$, $C > 0$ and $C' > 0$ are constants not depending on $k$, $n$ or $p$.

We say an estimator $\hat{\beta}$ is rate-optimal if it satisfies Assumption (A). As shown in [1, 4, 3, 6], the Lasso, the Dantzig Selector, the scaled Lasso and the square-root Lasso are rate-optimal when the tuning parameter is chosen properly. We stress that Assumption (A) implies Assumptions (A1) and (A2): Assumption (A) requires the estimator $\hat{\beta}$ to perform well over the whole parameter space $\Theta(k)$, while Assumptions (A1) and (A2) only require $\hat{\beta}$ to perform well at a single point or over a proper subset. The following proposition shows that the minimax lower bounds established in Theorems 1 to 7 can be achieved for the class of rate-optimal estimators.

Proposition 6. Let $\hat{\beta}$ be an estimator satisfying Assumption (A).

1. There exist point or interval estimators of the loss $\|\hat{\beta} - \beta\|_{q}^{2}$ with $1 \le q < 2$ achieving, up to a constant factor, the minimax lower bounds (2.9) in Theorem 1 and (3.13) in Theorem 5, and estimators of the loss $\|\hat{\beta} - \beta\|_{q}^{2}$ with $1 \le q \le 2$ achieving, up to a constant factor, the minimax lower bounds (2.13) in Theorem 2 and (4.1) and (4.2) in Theorem 7.

2. Suppose that the estimator $\hat{\beta}$ is constructed based on the subsample $Z^{(1)} = (y^{(1)}, X^{(1)})$. Then there exist estimators of the loss $\|\hat{\beta} - \beta\|_{2}^{2}$ achieving, up to a constant factor, the minimax lower bounds (2.8) in Theorem 1, (3.5) in Theorem 3 and (3.7) in Theorem 4.

3. Suppose the estimator $\hat{\beta}$ is constructed based on the subsample $Z^{(1)} = (y^{(1)}, X^{(1)})$ and it satisfies Assumption (A) with $\delta > 2$ and

(5.2)  $\sup_{\theta \in \Theta(k)} P_{\theta}\left( \|(\hat{\beta} - \beta)_{S^{c}}\|_{1} \ge c\, \|(\hat{\beta} - \beta)_{S}\|_{1} \right) \le C p^{-\delta}, \quad \text{where } S = \mathrm{supp}(\beta),$

for all $k \lesssim \frac{n}{\log p}$. Then for $n \lesssim p$ there exist estimators of the loss $\|\hat{\beta} - \beta\|_{q}^{2}$ with $1 \le q < 2$ achieving the lower bounds given in (3.18) in Theorem 6.

For reasons of space, we do not discuss the detailed construction of the point and interval estimators achieving these minimax lower bounds here and postpone the construction to the proof of Proposition 6.

Remark 3. Sample splitting has been widely used in the literature. For example, the condition that $\hat{\beta}$ is constructed based on the subsample $Z^{(1)} = (y^{(1)}, X^{(1)})$ was introduced in [22] for constructing confidence sets for $\beta$ and in [20] for constructing confidence intervals for the $l_2$ loss. Such a condition is imposed purely for technical reasons, to create independence between the estimator $\hat{\beta}$ and the subsample $Z^{(2)} = (y^{(2)}, X^{(2)})$, which is useful for evaluating the $l_q$ loss of the estimator $\hat{\beta}$. As shown in [4], assumption (5.2) is satisfied for the Lasso and the Dantzig Selector. This technical assumption is imposed so that $\|\hat{\beta} - \beta\|_{1}$ can be tightly controlled by $\|\hat{\beta} - \beta\|_{2}$.

6. General tools for minimax lower bounds. A major step in our analysis is to establish rate-sharp lower bounds for the estimation error and the expected length of confidence intervals for the $l_q$ loss. In this section we introduce new technical tools that are needed to establish these lower bounds. A significant distinction between the lower bound results given in the previous sections and those for traditional parameter estimation problems is that the constraint is on the performance of the estimator $\hat{\beta}$ of the regression vector $\beta$, while the lower bounds are on the difficulty of estimating its loss $\|\hat{\beta} - \beta\|_{q}^{2}$. It is necessary to develop new lower bound techniques to establish rate-optimal lower bounds for the estimation error and the expected length of confidence intervals for the loss $\|\hat{\beta} - \beta\|_{q}^{2}$. These technical tools may also be of independent interest.

We begin with notation. Let $Z$ denote a random variable whose distribution is indexed by some parameter $\theta \in \Theta$ and let $\pi$ denote a prior on the parameter space $\Theta$.
We will use $f_{\theta}(z)$ to denote the density of $Z$ given $\theta$ and $f_{\pi}(z)$ to denote the marginal density of $Z$ under the prior $\pi$. Let $P_{\pi}$ denote the distribution of $Z$ corresponding to $f_{\pi}(z)$, that is, $P_{\pi}(A) = \int \mathbf{1}_{\{z \in A\}} f_{\pi}(z)\, dz$, where $\mathbf{1}_{\{z \in A\}}$ is the indicator function. For a function $g$, we write $E_{\pi}\, g(Z)$ for the expectation under $f_{\pi}$. More specifically, $f_{\pi}(z) = \int f_{\theta}(z)\, \pi(\theta)\, d\theta$ and $E_{\pi}\, g(Z) = \int g(z) f_{\pi}(z)\, dz$. The $L_1$ distance between two probability distributions with densities $f_{0}$ and $f_{1}$ is given by $L_{1}(f_{1}, f_{0}) = \int |f_{1}(z) - f_{0}(z)|\, dz$.

The following theorem establishes the minimax lower bounds for the estimation error and the expected length of confidence intervals for the $l_q$ loss, under the constraint that $\hat{\beta}$ is a good estimator at at least one interior point.

Theorem 8. Suppose $0 < \alpha, \alpha_{0} < \frac{1}{4}$, $1 \le q \le 2$, $\Sigma_{0}$ is positive definite, $\theta_{0} = (\beta, \Sigma_{0}, \sigma_{0}) \in \Theta$, and $F \subset \Theta$. Define $d = \min_{\theta \in F} \|\beta_{\theta} - \beta\|_{q}$. Let $\pi$ denote a prior over the parameter space $F$. If an estimator $\hat{\beta}$ satisfies

(6.1)  $P_{\theta_{0}}\left( \|\hat{\beta} - \beta\|_{q} \le \frac{1}{16}\, d \right) \ge 1 - \alpha_{0},$

then

(6.2)  $\inf_{\hat{L}_{q}}\ \sup_{\theta \in \{\theta_{0}\} \cup F} P_{\theta}\left( \left| \hat{L}_{q} - \|\hat{\beta} - \beta\|_{q}^{2} \right| \ge \frac{1}{4}\, d^{2} \right) \ge c_{1}$

and

(6.3)  $L_{\alpha}^{*}(\{\theta_{0}\}, \Theta, \hat{\beta}, l_{q}) = \inf_{\mathrm{CI}_{\alpha}(\hat{\beta}, l_{q}, Z) \in \mathcal{I}_{\alpha}(\Theta, \hat{\beta}, l_{q})} E_{\theta_{0}}\, L\left(\mathrm{CI}_{\alpha}(\hat{\beta}, l_{q}, Z)\right) \ge c_{2}\, d^{2},$

where $c_{1} = \min\left\{ \frac{1}{10},\ \frac{9}{10} - \alpha_{0} - L_{1}(f_{\pi}, f_{\theta_{0}}) \right\}$ and $c_{2} = \frac{1}{2}\left( 1 - 2\alpha - \alpha_{0} - L_{1}(f_{\pi}, f_{\theta_{0}}) \right)_{+}$.

Remark 4. The minimax lower bound (6.2) for the estimation error and (6.3) for the expected length of confidence intervals hold as long as the estimator $\hat{\beta}$ estimates $\beta$ well at an interior point $\theta_{0}$. Besides Condition (6.1), the other key ingredient for the lower bounds (6.2) and (6.3) is the construction of a least favorable space $F$ with a prior $\pi$ such that the marginal distributions $f_{\pi}$ and $f_{\theta_{0}}$ are non-distinguishable. For the estimation lower bound (6.2): constraining that $\|\hat{\beta} - \beta\|_{q}^{2}$ can be well estimated at $\theta_{0}$, and using the non-distinguishability between $f_{\pi}$ and $f_{\theta_{0}}$, we establish that the loss $\|\hat{\beta} - \beta\|_{q}^{2}$ cannot be estimated well over $F$. For the lower bound (6.3), by Condition (6.1) and the non-distinguishability between $f_{\pi}$ and $f_{\theta_{0}}$, we show that $\|\hat{\beta} - \beta\|_{q}^{2}$ over $F$ is much larger than at $\theta_{0}$, and hence honest confidence intervals must be sufficiently long.

Theorem 8 is used to establish the minimax lower bounds for both the estimation error and the expected length of confidence intervals of the $l_q$ loss over $\Theta(k)$. By taking $\theta_{0} \in \Theta(k_{0})$ and $\Theta = \Theta(k)$, Theorem 2 follows from (6.2) with a properly constructed subset $F \subset \Theta(k)$. By taking $\theta_{0} \in \Theta(k_{0})$ and $\Theta = \Theta(k_{2})$, the lower bound (4.2) in Theorem 7 follows from (6.3) with

a properly constructed $F \subset \Theta(k_{2})$. In both cases, Assumption (A1) implies Condition (6.1). Several minimax lower bounds over $\Theta_{0}(k)$ can also be derived from Theorem 8. For the estimation error, the minimax lower bounds (2.8) and (2.9) over the regime $k \lesssim \frac{\sqrt{n}}{\log p}$ in Theorem 1 follow from (6.2). For the expected length of confidence intervals, the minimax lower bound (3.7) in Theorem 4 and (3.18) in the regions $k_{1} \le k_{2} \lesssim \frac{\sqrt{n}}{\log p}$ and $\frac{\sqrt{n}}{\log p} \lesssim k_{1} \le k_{2}$ in Theorem 6 follow from (6.3). In these cases, Assumption (A1) or (A2) guarantees that Condition (6.1) is satisfied. However, the minimax lower bound for the estimation error (2.9) in the region $k \gtrsim \frac{\sqrt{n}}{\log p}$ and for the expected length of confidence intervals (3.18) in the region $k_{1} \lesssim \frac{\sqrt{n}}{\log p} \lesssim k_{2}$ cannot be established using the above theorem. The following theorem, which requires testing a composite null against a composite alternative, establishes the refined minimax lower bounds over $\Theta_{0}(k)$.

Theorem 9. Let $0 < \alpha, \alpha_{0} < \frac{1}{4}$, $1 \le q \le 2$, and $\theta_{0} = (\beta, \Sigma_{0}, \sigma_{0})$, where $\Sigma_{0}$ is a positive definite matrix. Let $k_{1}$ and $k_{2}$ be two sparsity levels. Assume that for $i = 1, 2$ there exist parameter spaces $F_{i} \subset \{(\beta', \Sigma_{0}, \sigma_{0}) : \|\beta'\|_{0} \le k_{i}\}$ such that, for given $\mathrm{dist}_{i}$ and $d_{i}$,

$(\beta_{\theta} - \beta)^{\top} \Sigma_{0} (\beta_{\theta} - \beta) = \mathrm{dist}_{i} \quad \text{and} \quad \|\beta_{\theta} - \beta\|_{q} = d_{i}, \quad \text{for all } \theta \in F_{i}.$

Let $\pi_{i}$ denote a prior over the parameter space $F_{i}$ for $i = 1, 2$. Suppose that for $\theta_{1} = \left(\beta, \Sigma_{0}, \sqrt{\sigma_{0}^{2} + \mathrm{dist}_{1}}\right)$ and $\theta_{2} = \left(\beta, \Sigma_{0}, \sqrt{\sigma_{0}^{2} + \mathrm{dist}_{2}}\right)$, there exist constants $c_{1}, c_{2} > 0$ such that

(6.4)  $P_{\theta_{i}}\left( \|\hat{\beta} - \beta\|_{q} \le c_{i}\, d_{i} \right) \ge 1 - \alpha_{0}, \quad \text{for } i = 1, 2.$

Then we have

(6.5)  $\inf_{\hat{L}_{q}}\ \sup_{\theta \in F_{1} \cup F_{2}} P_{\theta}\left( \left| \hat{L}_{q} - \|\hat{\beta} - \beta\|_{q}^{2} \right| \ge c_{3}\, d_{2}^{2} \right) \ge c_{3}'$

and

(6.6)  $L_{\alpha}^{*}(\Theta_{0}(k_{1}), \Theta_{0}(k_{2}), \hat{\beta}, l_{q}) \ge c_{4}\, d_{2}^{2},$

where the constants $c_{3}, c_{3}', c_{4} > 0$ depend only on $\alpha$, $\alpha_{0}$, the ratio $c_{1} d_{1} / d_{2}$, $c_{2}$, and the $L_1$ distances $L_{1}(f_{\pi_{i}}, f_{\theta_{i}})$, $i = 1, 2$, and $L_{1}(f_{\pi_{2}}, f_{\pi_{1}})$.

Remark 5. As long as the estimator $\hat{\beta}$ performs well at the two points $\theta_{1}$ and $\theta_{2}$, the minimax lower bounds (6.5) for the estimation error and (6.6) for the expected length of confidence intervals hold. Note that $\theta_{i}$ in the above theorem does not belong to the parameter space $\{(\beta', \Sigma_{0}, \sigma_{0}) : \|\beta'\|_{0} \le k_{i}\}$ for $i = 1, 2$. In contrast to Theorem 8, Theorem 9 compares the composite hypotheses $F_{1}$ and $F_{2}$, which leads to a sharper lower bound than comparing the simple null $\{\theta_{0}\}$ with the composite alternative $F$. For simplicity, we construct least favorable parameter spaces $F_{i}$ such that the points in $F_{i}$ are of fixed generalized $l_2$ distance and fixed $l_q$ distance to $\beta$, for $i = 1, 2$, respectively. More importantly, we construct $F_{1}$ with the prior $\pi_{1}$ and $F_{2}$ with the prior $\pi_{2}$ such that $f_{\pi_{1}}$ and $f_{\pi_{2}}$ are not distinguishable, where $\theta_{1}$ and $\theta_{2}$ are introduced to facilitate the comparison. By Condition (6.4) and the construction of $F_{1}$ and $F_{2}$, we establish that the $l_q$ loss cannot be simultaneously estimated well over $F_{1}$ and $F_{2}$. For the lower bound (6.6), under the same conditions, it is shown that the $l_q$ losses over $F_{1}$ and $F_{2}$ are far apart, and any confidence interval with guaranteed coverage probability over $F_{1} \cup F_{2}$ must be sufficiently long. Due to the prior information $\Sigma = \mathrm{I}$ and $\sigma = \sigma_{0}$, the lower bound construction over $\Theta_{0}(k)$ is more involved than that over $\Theta(k)$. We stress that the construction of $F_{1}$ and $F_{2}$ and the comparison between composite hypotheses are of independent interest. The minimax lower bound (2.9) in the region $k \gtrsim \frac{\sqrt{n}}{\log p}$ follows from (6.5), and the minimax lower bound (3.18) in the region $k_{1} \lesssim \frac{\sqrt{n}}{\log p} \lesssim k_{2}$ for the expected length of confidence intervals follows from (6.6). In these cases, $\Sigma_{0}$ is taken as $\mathrm{I}$ and Assumption (A2) implies Condition (6.4).

7. An intermediate setting with known $\sigma = \sigma_{0}$ and unknown $\Sigma$. The results given in Sections 3 and 4 show the significant difference between $\Theta_{0}(k)$ and $\Theta(k)$ in terms of minimaxity and adaptivity of confidence intervals for $\|\hat{\beta} - \beta\|_{q}^{2}$: $\Theta_{0}(k)$ is for the simple setting with known design covariance matrix $\Sigma = \mathrm{I}$ and known noise level $\sigma = \sigma_{0}$, while $\Theta(k)$ is for unknown $\Sigma$ and $\sigma$.
In this section, we further consider minimaxity and adaptivity of confidence intervals for $\|\hat{\beta} - \beta\|_{q}^{2}$ in an intermediate setting where the noise level $\sigma = \sigma_{0}$ is known and $\Sigma$ is unknown but of a certain structure. Specifically, we consider the following parameter space,

(7.1)  $\Theta_{\sigma_{0}}(k, s) = \left\{ (\beta, \Sigma, \sigma_{0}) : \|\beta\|_{0} \le k,\ \frac{1}{M_{1}} \le \lambda_{\min}(\Sigma) \le \lambda_{\max}(\Sigma) \le M_{1},\ \|\Sigma^{-1}\|_{L_{1}} \le M,\ \max_{1 \le i \le p} \|(\Sigma^{-1})_{i\cdot}\|_{0} \le s \right\},$