Semi-supervised Inference for Explained Variance in High-dimensional Linear Regression and Its Applications

T. Tony Cai and Zijian Guo
University of Pennsylvania and Rutgers University
March 8, 08

Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA 904. The research of Tony Cai was supported in part by NSF Grant DMS-7735 and NIH Grant R0 GM- 3056.

Abstract

We consider statistical inference for the explained variance $\beta^\top \Sigma \beta$ under the high-dimensional linear model $Y = X\beta + \epsilon$ in the semi-supervised setting, where $\beta$ is the regression vector and $\Sigma$ is the design covariance matrix. A calibrated estimator, which efficiently integrates both labelled and unlabelled data, is proposed. It is shown that the estimator achieves the minimax optimal rate of convergence in the general semi-supervised framework. The optimality result characterizes how the unlabelled data affects the minimax optimal rate. Moreover, the limiting distribution for the proposed estimator is established and data-driven confidence intervals for the explained variance are constructed. We further develop a randomized calibration technique for statistical inference in the presence of weak signals and apply the obtained inference results to a range of important statistical problems, including signal detection and global testing, prediction accuracy evaluation, and confidence ball construction. The numerical performance of the proposed methodology is demonstrated in simulation studies and an analysis of estimating heritability for a yeast segregant data set with multiple traits.

Keywords: Confidence interval, confidence ball, heritability, prediction accuracy, signal detection, minimaxity, sparsity.

1 Introduction

High-dimensional linear models are ubiquitous in contemporary statistical modeling with a wide range of applications in many scientific fields. The early focus has been mainly on developing methods for the recovery of the whole regression vector via penalized or constrained $\ell_1$ minimization approaches. Examples include the Lasso [34], Dantzig Selector [4], MCP [4], square-root Lasso [4], and scaled Lasso [33]. There has been significant recent interest in statistical inference for low-dimensional functionals, including confidence intervals and hypothesis testing for individual regression coefficients [43, 35, 7, 6], minimaxity and adaptivity of confidence intervals for general linear functionals [7], estimation of the signal-to-noise ratio [38, 4], inference for the $\ell_q$ accuracy of a given estimator [8], and estimation of quadratic functionals [4, ].

Motivated by a range of applications, the present paper considers statistical inference for the explained variance, which is a one-dimensional weighted quadratic functional, in the high-dimensional and semi-supervised setting. We first develop in detail the theory for optimal estimation of the explained variance, which also leads to the construction of confidence intervals. The results are then applied to several other important statistical inference problems.

1.1 Problem Formulation and Motivations

We consider the high-dimensional linear model with a random design,

$$y_i = X_{i\cdot}^\top \beta + \epsilon_i, \quad \text{for } 1 \le i \le n, \tag{1}$$

where $y_i \in \mathbb{R}$ and $X_{i\cdot} \in \mathbb{R}^p$ denote respectively the outcome and the measured covariates of the $i$-th observation, $\epsilon_i$ denotes the error and $\beta \in \mathbb{R}^p$ denotes the high-dimensional regression vector. The rows $X_{i\cdot}$ are i.i.d. $p$-dimensional sub-gaussian random vectors with mean 0 and covariance matrix $\Sigma$, and the errors $\{\epsilon_i\}_{1 \le i \le n}$ are i.i.d. sub-gaussian random variables with mean 0 and variance $\sigma^2$, independent of $\{X_{i\cdot}\}_{1 \le i \le n}$. The explained variance under the regression model (1) is represented by the weighted quadratic functional of $\beta$,

$$Q = \beta^\top \Sigma \beta. \tag{2}$$

We study estimation and inference for the explained variance in the semi-supervised setting, where the data is a combination of the labelled data $\{y_i, X_{i\cdot}\}_{1 \le i \le n}$ in the regression model (1) and the unlabelled data $\{X_{i\cdot}\}_{n+1 \le i \le n+N}$.
Here the measured covariates of both the labelled and unlabelled data are assumed to be independent and to follow the same distribution. The more conventional supervised setting is treated as a special case. The setting of semi-supervised learning is commonly seen in applications where the outcomes are more expensive to collect than the covariates. For example, in the analysis of Electronic Health Records (EHR) databases, the covariates can easily be extracted automatically while labelling of the outcomes is costly and time-consuming [5, 9]. In addition, semi-supervised learning naturally arises in the integrative analysis of multiple genetics data sets where the covariates are the same across all data sets but the outcomes measured vary from study to study due to the specific purposes of individual studies [36]. This can be naturally formulated as semi-supervised learning, where the pre-specified outcome is only measured over one or several but not all data sets while the covariates are measured across all data sets. See [5, 9, 3, 4] for more discussion about semi-supervised learning.

The development of the optimal estimator and confidence intervals for $Q = \beta^\top \Sigma \beta$ in the semi-supervised setting, along with the corresponding statistical analysis, is of significant interest in its own right and poses many challenges. This inference problem is also closely connected to several other important statistical problems.

1. Heritability. Heritability is among the most important genetics concepts. High-dimensional linear regression has proven to be useful in modeling the phenotype-genotype relationship in the presence of a large number of genetic variants [3,, 38, 4]. Under the linear model with the outcome normalized to have unit variance, one heritability measure defined in the literature is the quadratic functional $\beta^\top \Sigma \beta$, which measures the total variance explained by genetic variants [4, 38].

2. Signal-to-Noise Ratio and Proportion of Variance Explained. Signal-to-Noise Ratio (SNR) and Proportion of Variance Explained (PVE) are important statistical concepts and are defined respectively as $\beta^\top \Sigma \beta / \sigma^2$ and $\beta^\top \Sigma \beta / (\beta^\top \Sigma \beta + \sigma^2)$ under model (1). The quadratic functional $\beta^\top \Sigma \beta$ is central to SNR and PVE. Together with a good estimator of $\sigma^2$ [, 33, 4], the results for $\beta^\top \Sigma \beta$ established in this paper are useful for inference of SNR and PVE.

3. Signal Detection and Global Testing. Inference for the explained variance can be applied to testing the global hypothesis $H_0: \beta = \beta^{\rm null}$ for $\beta^{\rm null} \in \mathbb{R}^p$, which includes signal detection as a special case with $\beta^{\rm null} = 0$. The connection is revealed in the following adjusted linear model,

$$y_i - X_{i\cdot}^\top \beta^{\rm null} = X_{i\cdot}^\top (\beta - \beta^{\rm null}) + \epsilon_i \quad \text{for } 1 \le i \le n. \tag{3}$$

Under model (3), testing for $H_0: \beta = \beta^{\rm null}$ is recast as testing the hypotheses $H_0: (\beta - \beta^{\rm null})^\top \Sigma (\beta - \beta^{\rm null}) = 0$ versus $H_1: (\beta - \beta^{\rm null})^\top \Sigma (\beta - \beta^{\rm null}) > 0$.

4. Prediction Accuracy Assessment. Accuracy assessment is of significant importance in applications. Inference for the explained variance is useful for assessing the out-of-sample prediction accuracy of a given estimator. We use $(X^{(0)}, y^{(0)})$ to denote the set of training observations and $\check\beta$ to denote a given estimator of $\beta$ based on the training data set $(X^{(0)}, y^{(0)})$; we use $(X, y)$ to denote the set of test observations. The prediction accuracy for a future observation $x_{\rm new}$ is defined as $E_{x_{\rm new}} \left(x_{\rm new}^\top (\check\beta - \beta)\right)^2 = (\check\beta - \beta)^\top \Sigma (\check\beta - \beta)$. To obtain the inference results for this quantity, we rely on the following adjusted linear model,

$$y_i - X_{i\cdot}^\top \check\beta = X_{i\cdot}^\top (\beta - \check\beta) + \epsilon_i \quad \text{for } 1 \le i \le n. \tag{4}$$

Inference results developed for the explained variance can be applied to (4) to obtain the corresponding results for the prediction accuracy $E_{x_{\rm new}} \left(x_{\rm new}^\top (\check\beta - \beta)\right)^2$.

5. Confidence Ball for $\beta$. Construction of confidence balls for $\beta$ is another important application of inference for explained variance studied in this paper. We use $\check\beta$ to denote a pre-specified estimator of $\beta$, which serves as the center of the constructed confidence ball. Based on the adjusted linear model (4), a confidence interval $(L(Z), U(Z))$ for the explained variance $(\check\beta - \beta)^\top \Sigma (\check\beta - \beta)$ leads to a natural confidence ball for $\beta$,

$$\left\{ \beta : \|\beta - \check\beta\|_2^2 \le \frac{1}{\lambda_{\min}(\Sigma)} U(Z) \right\}, \tag{5}$$

where $\lambda_{\min}(\Sigma)$ denotes the smallest eigenvalue of $\Sigma$.

The close connections to the above statistical applications provide further motivation for studying the inference problem for $\beta^\top \Sigma \beta$. In Section 4, we demonstrate in detail how to apply the obtained results for $\beta^\top \Sigma \beta$ to tackle some of these statistical applications.

1.2 Results and Contributions

We introduce a new estimator, Calibrated High-dimensional Inference for Variance Explained (CHIVE), in the semi-supervised setting. The CHIVE estimator for $Q = \beta^\top \Sigma \beta$ is constructed in two steps, which together efficiently integrate both labelled and unlabelled data. The first step is to plug in the estimators of $\beta$ and $\Sigma$, denoted by $\hat\beta$ and $\hat\Sigma$, respectively, and the second step is to calibrate this plug-in estimator $\hat\beta^\top \hat\Sigma \hat\beta$ through estimating its estimation error. The second step is essential in rebalancing the bias and variance to improve the estimation accuracy. The calibration technique is a general machinery as it can take different forms of $\hat\beta$ and $\hat\Sigma$ as its inputs. This flexibility is quite useful in the semi-supervised setting, where the unlabelled data can be efficiently used to estimate the design covariance matrix $\Sigma$.
We show the optimality of CHIVE by establishing the minimax optimal rate of convergence for estimating $\beta^\top \Sigma \beta$ in the general semi-supervised setting. We also quantify the uncertainty of the CHIVE estimator by establishing its limiting distribution under stronger conditions. Data-driven confidence intervals for $\beta^\top \Sigma \beta$ are constructed based on the limiting distribution. The supervised setting and the setting with known design covariance matrix are discussed as the special cases with $N = 0$ and $N = \infty$, respectively. We further develop a randomized calibration technique for statistical inference in the presence of weak signals and apply the obtained results to several important statistical problems. The numerical performance of the proposed methodology is demonstrated in simulation studies and a real data analysis of estimating heritability for a yeast segregant data set with multiple traits.

The main contributions of the present paper are three-fold.

1. We propose a novel estimator, CHIVE, for the explained variance $\beta^\top \Sigma \beta$ that efficiently uses both the labelled and unlabelled data and is shown to achieve the minimax optimal rate. The results characterize how the unlabelled data affects the minimax optimal rate for estimating $\beta^\top \Sigma \beta$. Specifically, the optimal rate is $\frac{\|\beta\|_2}{\sqrt{n}} + \frac{\|\beta\|_2^2}{\sqrt{N+n}} + \frac{k \log p}{n}$, where $p$ is the dimension, $n$ is the size of the labelled data, $N$ is the size of the unlabelled data, and $k$ and $\|\beta\|_2$ denote respectively the sparsity and the $\ell_2$ norm of $\beta$. It is interesting to note that the unlabelled data only helps reduce the term $\|\beta\|_2^2 / \sqrt{N+n}$ in the convergence rate but not the other two terms.

2. We quantify the uncertainty for the CHIVE estimator through establishing its limiting distribution. It is shown that the limiting distribution is normal and its variance depends on the proportion of the labelled data. The result is then used for the construction of data-driven confidence intervals for $\beta^\top \Sigma \beta$.

3. The inference results obtained in this paper are applied to (i) signal detection and global testing, (ii) prediction accuracy evaluation, and (iii) confidence ball construction. For signal detection, we control the type I error and characterize the type II error by establishing the power function under a local alternative. The results can be easily extended to the general global testing problem. For evaluation of out-of-sample prediction accuracy of a given sparse estimator, both the point and interval estimators are developed. We establish the estimation error bound for the point estimator of the prediction accuracy and control the length of the corresponding confidence interval. A confidence ball for the regression vector $\beta$ with controlled radius is also constructed. We stress that these procedures are data-driven and do not require a priori knowledge of the design covariance matrix $\Sigma$ or the noise level $\sigma$. See more details in Section 4 and the related numerical performance in Sections 5. and 5.3.

A central question in semi-supervised learning is how to efficiently use both labelled and unlabelled data to conduct statistical inference [5, 9]. The results obtained in the present paper illustrate how the unlabelled data can facilitate statistical inference for the explained variance and also the related statistical applications.

1.3 Related Work

Estimation and inference for quadratic functionals have been studied in the literature in a range of settings. In particular, minimax and adaptive estimation of quadratic functionals plays an important role in nonparametric inference and has been well studied in density estimation, nonparametric regression, and the white noise with drift model. See, for example, [5, 7, 8, 8,,, 6]. In high-dimensional linear regression, estimation and inference for quadratic functionals has also been studied in [4, 38, ]. In particular, [38] and [] considered estimation of $\beta^\top \Sigma \beta / \sigma^2$ and $\|\beta\|_2^2$, respectively, but not the uncertainty quantification problem. [4] studied the construction of confidence intervals for $\|\beta\|_2^2$ under the setting of $\Sigma = I$, moderate dimension where $n/p \to \xi \in (0, \infty)$, and no sparsity assumption on $\beta$. The inference problem in sparse high-dimensional linear regression considered in the current paper is significantly different from the setting considered in [4], mainly due to the complicated geometry induced by the sparsity structure and the unknown design covariance matrix $\Sigma$. Other works related to quadratic functional inference include construction of confidence intervals for the $\ell_2$ loss of the estimator considered in [8] and inference for the treatment effect and endogeneity parameter in instrumental variable regression [, 0]. In addition, [5, 44] considered hypothesis testing for high-dimensional linear regression.

The statistical applications studied in this paper have also been considered separately in the literature. Signal detection was studied in [3, ] under the linear model in a special setting where the design covariance matrix $\Sigma$ is equal to or close to the identity matrix. In this setting, [3, ] established an optimal signal detection method and theory. The inference results obtained in the present paper enable the study of the signal detection problem under a general setting where the design covariance matrix $\Sigma$ is unknown. The confidence ball construction for the whole regression vector was considered in [30] in the case of known $\sigma$, and the optimal size and possibility of adaptive confidence balls were also established. The results obtained in the current paper lead to a confidence ball construction for $\beta$ in the case of unknown $\sigma$. A problem related to prediction accuracy is inference for the estimation accuracy, which was considered in [8, 4]. However, inference for the prediction accuracy and that for the estimation accuracy are quite different problems.

1.4 Organization of the Paper

The rest of the paper is organized as follows. In Section 2, we introduce in detail the CHIVE estimator and establish its minimax rate optimality. Section 3 focuses on quantifying the uncertainty of the CHIVE estimator and construction of confidence intervals for $\beta^\top \Sigma \beta$. We apply in Section 4 the developed procedures to tackle three important problems: signal detection and global testing, prediction accuracy evaluation, and confidence ball construction. Simulation results are given in Section 5 and an analysis of a yeast data set is presented in Section 6. A discussion is provided in Section 7 and the proofs are given in Section 8. Additional proofs and simulation results are presented in the appendix.

2 Optimal Estimation of $\beta^\top \Sigma \beta$

In this section, we first introduce the calibration methodology for estimating the explained variance and then establish the minimax convergence rate for estimating $\beta^\top \Sigma \beta$ in the general semi-supervised framework. The results demonstrate the effect of the unlabelled data on the optimal convergence rate. The supervised setting and the setting with known design covariance matrix are then discussed as special cases.

We begin with the notation that will be used in the rest of the paper. We use $Z = (X, y)$ to denote the data set. For a matrix $A$, $A_{i\cdot}$, $A_{\cdot j}$, and $A_{i,j}$ denote respectively the $i$-th row, $j$-th column, and $(i, j)$ entry of the matrix $A$. The spectral norm of $A$ is $\|A\|_2 = \sup_{\|x\|_2 = 1} \|Ax\|_2$ and the matrix $\ell_1$ norm is $\|A\|_{L_1} = \sup_{1 \le j \le p} \sum_{i=1}^p |A_{ij}|$. For a symmetric matrix $A$, $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ denote respectively the smallest and largest eigenvalue of $A$. For a set $S$, $|S|$ denotes the cardinality of $S$. For a vector $x \in \mathbb{R}^p$, $x_{-j}$ denotes the vector without the $j$-th index, ${\rm supp}(x)$ denotes the support of $x$, and the $\ell_q$ norm of $x$ is defined as $\|x\|_q = \left(\sum_{i=1}^p |x_i|^q\right)^{1/q}$ for $q \ge 0$, with $\|x\|_0 = |{\rm supp}(x)|$ and $\|x\|_\infty = \max_{1 \le j \le p} |x_j|$. For $a \in \mathbb{R}$, $a_+ = \max\{a, 0\}$; for $a, b \in \mathbb{R}$, $a \vee b = \max\{a, b\}$. We use $c$ and $C$ to denote generic positive constants that may vary from place to place. For a sequence of random variables $X_n$ indexed by $n$, we use $X_n \xrightarrow{p} X$ to represent that $X_n$ converges to $X$ in probability. For a sequence of random variables $X_n$ and numbers $a_n$, we define $X_n = o_p(a_n)$ if $X_n / a_n$ converges to zero in probability. For two positive sequences $a_n$ and $b_n$, $a_n \lesssim b_n$ means $a_n \le C b_n$ for all $n$; $a_n \gtrsim b_n$ if $b_n \lesssim a_n$; $a_n \asymp b_n$ if $a_n \lesssim b_n$ and $b_n \lesssim a_n$; $a_n \ll b_n$ if $\lim_{n \to \infty} a_n / b_n = 0$; and $a_n \gg b_n$ if $b_n \ll a_n$.
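2.1 Calibration of Plug-in Estimators

For semi-supervised learning, the data is a mix of the labelled data $(X_{1\cdot}, y_1), \ldots, (X_{n\cdot}, y_n)$ and the unlabelled data $X_{n+1\cdot}, \ldots, X_{n+N\cdot}$, where $X_{1\cdot}, \ldots, X_{n\cdot}, X_{n+1\cdot}, \ldots, X_{n+N\cdot}$ are i.i.d. realizations of $p$-dimensional covariates. We use $\hat\beta$ and $\hat\Sigma$ to denote certain reasonably good estimators of $\beta$ and $\Sigma$, which will be specified later. Based on $\hat\beta$ and $\hat\Sigma$, a natural estimator of the quadratic functional $Q = \beta^\top \Sigma \beta$ is the plug-in estimator $\hat\beta^\top \hat\Sigma \hat\beta$, which has the following error decomposition,

$$\hat\beta^\top \hat\Sigma \hat\beta - \beta^\top \Sigma \beta = 2 \hat\beta^\top \hat\Sigma (\hat\beta - \beta) - (\hat\beta - \beta)^\top \hat\Sigma (\hat\beta - \beta) + \beta^\top (\hat\Sigma - \Sigma) \beta. \tag{6}$$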
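As a check on the error decomposition (6), the identity can be verified in a few lines using only the symmetry of $\hat\Sigma$. The shorthand $\delta$ is introduced here purely for this check and is not notation from the paper.

```latex
\begin{align*}
\delta &:= \hat\beta - \beta, \qquad \text{so that } \beta = \hat\beta - \delta, \\
\beta^\top \hat\Sigma \beta
  &= \hat\beta^\top \hat\Sigma \hat\beta
     - 2\,\hat\beta^\top \hat\Sigma \delta
     + \delta^\top \hat\Sigma \delta, \\
\hat\beta^\top \hat\Sigma \hat\beta - \beta^\top \Sigma \beta
  &= \bigl(\hat\beta^\top \hat\Sigma \hat\beta - \beta^\top \hat\Sigma \beta\bigr)
     + \beta^\top (\hat\Sigma - \Sigma)\beta \\
  &= 2\,\hat\beta^\top \hat\Sigma \delta
     - \delta^\top \hat\Sigma \delta
     + \beta^\top (\hat\Sigma - \Sigma)\beta .
\end{align*}
```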

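Based on the above decomposition, the estimator $\hat\beta^\top \hat\Sigma \hat\beta$ can be further improved since the estimation error due to the first term $2 \hat\beta^\top \hat\Sigma (\hat\beta - \beta)$ on the right hand side of (6) can be further reduced. We estimate the term $-2 \hat\beta^\top \hat\Sigma (\hat\beta - \beta)$, the negative of the first term in the error decomposition (6), by $\frac{2}{n} \hat\beta^\top \sum_{i=1}^n X_{i\cdot} (y_i - X_{i\cdot}^\top \hat\beta)$ and propose the following calibrated estimator,

$$\hat Q(\hat\beta, \hat\Sigma, Z) = \hat\beta^\top \hat\Sigma \hat\beta + \frac{2}{n} \hat\beta^\top \sum_{i=1}^n X_{i\cdot} (y_i - X_{i\cdot}^\top \hat\beta). \tag{7}$$

This estimator is referred to as the CHIVE estimator, as a shorthand for Calibrated High-dimensional Inference for Variance Explained. The calibration step in (7) essentially improves the plug-in estimator $\hat\beta^\top \hat\Sigma \hat\beta$ through re-balancing the bias and variance. The calibrated estimator requires three inputs: the initial estimators $\hat\beta$ and $\hat\Sigma$ and the data $Z = (X, y)$. With this machinery, it remains to propose initial estimators for $\beta$ and $\Sigma$. We begin with estimators for $\beta$ and then move on to the estimators for $\Sigma$.

Throughout the paper, without special notification, we make the following assumptions on the estimators $\hat\beta$ and $\hat\sigma$.

(B1) With probability larger than $1 - \gamma(n)$ where $\gamma(n) \to 0$, the estimator $\hat\beta$ satisfies
$$\max\left\{ \frac{1}{\sqrt{n}} \|X(\hat\beta - \beta)\|_2, \; \|\hat\beta - \beta\|_2 \right\} \lesssim \sqrt{\frac{k \log p}{n}}, \qquad \|\hat\beta - \beta\|_1 \lesssim k \sqrt{\frac{\log p}{n}}.$$

(B2) $\hat\sigma^2$ is a consistent estimator of $\sigma^2$, that is, $\left| \hat\sigma^2 / \sigma^2 - 1 \right| \xrightarrow{p} 0$.

Examples of estimators satisfying (B1) and (B2). The scaled Lasso estimator $\{\hat\beta, \hat\sigma\}$ defined in the following equation (8) has been shown in [33] to satisfy (B1) and (B2) under regularity conditions,

$$\{\hat\beta, \hat\sigma\} = \arg\min_{\beta \in \mathbb{R}^p, \, \sigma \in \mathbb{R}^+} \frac{\|y - X\beta\|_2^2}{2 n \sigma} + \frac{\sigma}{2} + \sqrt{\frac{2.01 \log p}{n}} \sum_{j=1}^p \frac{\|X_{\cdot j}\|_2}{\sqrt{n}} |\beta_j|. \tag{8}$$

See also Lemma in [] for more details. Since the square-root Lasso estimator [4] is numerically the same as the scaled Lasso estimator, the square-root Lasso estimators of $\beta$ and $\sigma$ also satisfy (B1) and (B2). In addition, with prior knowledge of $\sigma$, the Lasso estimator of $\beta$ and other variants are also shown to satisfy the above condition (B1); see [4, 4, 39] for more details.

Now, we turn to the estimators of $\Sigma$. The additional unlabelled data is useful for estimating the design covariance matrix $\Sigma$. We pool the information contained in both the labelled and unlabelled data and estimate $\Sigma$ by $\hat\Sigma^S = \frac{1}{n+N} \sum_{i=1}^{n+N} X_{i\cdot} X_{i\cdot}^\top$. Then we use $\hat\beta$ and $\hat\Sigma^S$ as inputs and utilize the calibration idea introduced in (7),

$$\hat Q(\hat\beta, \hat\Sigma^S, Z) = \hat\beta^\top \hat\Sigma^S \hat\beta + \frac{2}{n} \hat\beta^\top \sum_{i=1}^n X_{i\cdot} (y_i - X_{i\cdot}^\top \hat\beta), \quad \text{where } \hat\Sigma^S = \frac{1}{n+N} \sum_{i=1}^{n+N} X_{i\cdot} X_{i\cdot}^\top. \tag{9}$$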
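The calibrated estimator (9) is simple to compute once an initial estimate of $\beta$ is available. The following is a minimal NumPy sketch, not the authors' implementation: the function name `chive_estimate`, its argument names, and the simulated data are all assumptions made for illustration, and the initial estimate is passed in by the caller (in the paper it would come from a scaled Lasso type fit; here a perturbed copy of the truth stands in for it).

```python
import numpy as np


def chive_estimate(beta_hat, X_lab, y, X_unlab=None):
    """Calibrated plug-in estimate of Q = beta' Sigma beta, in the form of (9).

    beta_hat : initial estimate of beta, shape (p,)   (hypothetical input name)
    X_lab, y : labelled covariates (n, p) and outcomes (n,)
    X_unlab  : optional unlabelled covariates (N, p)
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    X_lab = np.asarray(X_lab, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X_lab.shape[0]

    # Pool labelled and unlabelled covariates for the covariance part.
    if X_unlab is None or len(X_unlab) == 0:
        X_all = X_lab
    else:
        X_all = np.vstack([X_lab, np.asarray(X_unlab, dtype=float)])

    # Plug-in term beta_hat' Sigma_S beta_hat, computed as the mean of squared
    # fitted values so that the p x p matrix Sigma_S is never formed.
    plug_in = np.mean((X_all @ beta_hat) ** 2)

    # Calibration term (2/n) * beta_hat' X'(y - X beta_hat), labelled data only.
    residual = y - X_lab @ beta_hat
    calibration = 2.0 / n * (X_lab @ beta_hat) @ residual

    return plug_in + calibration


# Toy illustration with simulated data (all values below are made up).
rng = np.random.default_rng(0)
n, N, p = 200, 1000, 50
beta = np.zeros(p)
beta[:5] = 1.0                                    # sparse truth; Sigma = I, so Q = 5
X_lab = rng.standard_normal((n, p))
X_unlab = rng.standard_normal((N, p))
y = X_lab @ beta + rng.standard_normal(n)
beta_init = beta + 0.05 * rng.standard_normal(p)  # stand-in for a Lasso-type fit
q_hat = chive_estimate(beta_init, X_lab, y, X_unlab)
```

When no unlabelled data is passed, the function reduces to the supervised form of the estimator, which equals $\|y\|_2^2/n - \|y - X\hat\beta\|_2^2/n$ exactly; this gives a convenient sanity check for any implementation.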

When there is no confusion, we use $\hat Q$ to denote the estimator proposed in (9). We first introduce the following regularity conditions and then establish the convergence rate of the proposed estimator in (9) in Theorem 1.

(A1) The rows $X_{i\cdot}$ are i.i.d. $p$-dimensional sub-gaussian random vectors with mean 0 and covariance matrix $\Sigma$ with $1/M_1 \le \lambda_{\min}(\Sigma) \le \lambda_{\max}(\Sigma) \le M_1$ for $M_1 \ge 1$; the errors $\{\epsilon_i\}_{1 \le i \le n}$ are independent of $\{X_{i\cdot}\}_{1 \le i \le n+N}$ and are i.i.d. sub-gaussian random variables with mean zero and variance $\sigma^2$; the high-dimensional vector $\beta$ is assumed to be of sparsity $k$.

(A2) There exists some positive constant $c_0 > 0$ such that $E\left( \frac{\beta^\top X_{i\cdot} X_{i\cdot}^\top \beta}{\beta^\top \Sigma \beta} - 1 \right)^2 > c_0$.

Assumption (A1) requires that the spectrum of the covariance matrix $\Sigma$ is bounded away from zero and infinity and that the noise level $\sigma$ is upper bounded by a constant. Assumption (A1) also assumes that both the design and the noise are sub-gaussian. Define $U_i = X_{i\cdot}^\top \beta / \sqrt{\beta^\top \Sigma \beta}$, where $E U_i = 0$ and $E U_i^2 = 1$. Assumption (A2) is placed on this random variable $U_i$ such that ${\rm Var}(U_i^2)$ is not vanishing. This assumption is imposed such that ${\rm Var}(U_i^2)$ can be well estimated, and this type of assumption has been introduced in the covariance matrix estimation literature [0] for the same purpose.

Theorem 1. Suppose that Condition (A1) holds and $k \le c \, n / \log p$ for some constant $c > 0$. For any estimator $\hat\beta$ satisfying Condition (B1), with probability larger than p c exp c N c t γ, the estimator $\hat Q = \hat Q(\hat\beta, \hat\Sigma^S, Z)$ defined in (9) satisfies

Q Q t β β N k log p + t + + β N +. (10)

Under the additional assumption $k \ll \sqrt{n} / \log p$ and $\|\beta\|_2 \gg k \log p / \sqrt{n}$,

$$\frac{\sqrt{n} \, (\hat Q - Q)}{\sqrt{4 \sigma^2 \beta^\top \Sigma \beta + \rho \, E\left( \beta^\top X_{i\cdot} X_{i\cdot}^\top \beta - \beta^\top \Sigma \beta \right)^2}} \xrightarrow{d} N(0, 1), \tag{11}$$

where $\rho = \lim \frac{n}{N+n}$.

Remark 1. Since $Q \ge 0$, the convergence rate (10) also holds for $\hat Q_+$, the positive part of $\hat Q$. To keep the notation simpler, we only present the results for $\hat Q$ throughout this paper. The convergence rate established in (10) is shown to be optimal in Section 2.2. Under the additional assumptions $k \ll \sqrt{n} / \log p$ and $\|\beta\|_2 \gg k \log p / \sqrt{n}$, we establish a more refined distributional result in (11). This normal limiting distribution is used in Section 3 to construct confidence intervals for $\beta^\top \Sigma \beta$.

One interesting phenomenon is that the limiting distribution established in (11) depends on the proportion of the labelled data. If the amount of unlabelled data dominates that of labelled data (that is, $\rho = 0$), then the limiting distribution in (11) is simplified to

$$\frac{\sqrt{n} \, (\hat Q - Q)}{\sqrt{4 \sigma^2 \beta^\top \Sigma \beta}} \xrightarrow{d} N(0, 1).$$
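To make the role of the labelled proportion $\rho$ concrete, the sketch below evaluates the standard deviation implied by the limiting distribution for given population quantities. It is an illustration only: the function name `limiting_sd` and its arguments are hypothetical, the finite-sample ratio $n/(N+n)$ is used in place of the limit $\rho$, and the inputs are assumed known rather than estimated (the paper's data-driven construction is the subject of its Section 3 and is not reproduced here).

```python
import math


def limiting_sd(sigma2, Q, fourth_moment_term, n, N):
    """Approximate standard deviation of Q_hat suggested by the normal limit.

    sigma2             : noise variance sigma^2
    Q                  : explained variance beta' Sigma beta
    fourth_moment_term : E(beta' X X' beta - beta' Sigma beta)^2
    n, N               : labelled and unlabelled sample sizes
    """
    rho = n / (n + N)  # finite-sample stand-in for the limit rho (an assumption)
    return math.sqrt((4.0 * sigma2 * Q + rho * fourth_moment_term) / n)


# Made-up inputs: more unlabelled data shrinks only the rho-weighted part.
sd_supervised = limiting_sd(1.0, 1.0, 2.0, n=100, N=0)
sd_semi = limiting_sd(1.0, 1.0, 2.0, n=100, N=300)
sd_many_unlabelled = limiting_sd(1.0, 1.0, 2.0, n=100, N=10**9)
```

With these inputs the standard deviation decreases as $N$ grows but never falls below $\sqrt{4\sigma^2 Q / n}$, which matches the simplified limit for $\rho = 0$.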

2.2 Optimal Rate of Convergence

In this section, we further investigate the optimality of the proposed estimator (9) by studying the minimax convergence rate of estimating $\beta^\top \Sigma \beta$ in the semi-supervised setting and consider the following specific parameter space,

$$\Theta(k, M) = \left\{ \theta = (\beta, \Sigma, \sigma) : \|\beta\|_0 \le k, \; M/2 \le \|\beta\|_2 \le M, \; \frac{1}{M_1} \le \lambda_{\min}(\Sigma) \le \lambda_{\max}(\Sigma) \le M_1, \; \sigma \le M_2 \right\}, \tag{12}$$

where $M_1 \ge 1$ and $M_2 > 0$ are positive constants. The parameter space defined in (12) requires the sparsity $\|\beta\|_0 \le k$ and $M/2 \le \|\beta\|_2 \le M$, where $k$ and $M$ are allowed to grow with $n$ and $p$. Here $k$ quantifies the sparsity of $\beta$ and $M$ quantifies the signal strength of the true signal $\beta$ in terms of its $\ell_2$ norm. The other conditions $1/M_1 \le \lambda_{\min}(\Sigma) \le \lambda_{\max}(\Sigma) \le M_1$ and $\sigma \le M_2$ are regularity conditions. The following theorem establishes the minimax lower bounds for the convergence rate of estimating $Q$ over the parameter space $\Theta(k, M)$.

Theorem 2. Suppose $k \le c \min\{ n / \log p, \; p^{\nu} \}$ for some constants $c > 0$ and $0 \le \nu < 1/2$. Then

$$\inf_{\tilde Q} \sup_{\theta \in \Theta(k, M)} P_\theta\left( |\tilde Q - Q| \gtrsim \min\left\{ \frac{M^2}{\sqrt{N+n}} + \frac{M}{\sqrt{n}} + \frac{k \log p}{n}, \; M^2 \right\} \right) \ge \frac{1}{4}. \tag{13}$$

In the above theorem, only the first term in the lower bound involves the amount of additional unlabelled data; that is, a larger amount of unlabelled data only helps lower the term $M^2 / \sqrt{N+n}$ but not any other terms. Theorems 1 and 2 together show that the estimator proposed in Section 2.1 is minimax rate optimal under regularity conditions.

Corollary 1. Suppose that Condition (A1) holds and $k \le c \min\{ n / \log p, \; p^{\nu} \}$ for some constants $c > 0$ and $0 \le \nu < 1/2$. For any estimator $\hat\beta$ satisfying Condition (B1), the estimator $\hat Q$ defined in (9) is minimax rate optimal over $\Theta(k, M)$ where $\sqrt{k \log p / n} \lesssim M \le C$ for some constant $C > 0$.

The above corollary shows that the proposed method attains the optimal convergence rate when the $\ell_2$ norm is relatively strong, that is, $\|\beta\|_2$ is bounded away from zero by $\sqrt{k \log p / n}$. As shown in Theorem 2, for the case where $M \lesssim \sqrt{k \log p / n}$, the lower bound of estimating $\beta^\top \Sigma \beta$ is $M^2$. This optimal convergence rate can be achieved by the trivial estimator 0, and hence the corresponding regime $M \lesssim \sqrt{k \log p / n}$ is not interesting in terms of studying optimal estimators. In Corollary 1, the lower bound (13) is only matched for the regime where $M \le C$ for some constant $C > 0$.
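The structure of the lower bound in Theorem 2 can be explored numerically. The sketch below evaluates the three rate terms and the resulting bound up to constants; the function names `rate_terms` and `minimax_rate` and the example values are hypothetical, and the constants hidden in $\gtrsim$ are ignored, so the numbers indicate orders of magnitude only.

```python
import math


def rate_terms(M, k, p, n, N):
    """The three terms of the minimax rate, up to constants."""
    return {
        "covariance": M**2 / math.sqrt(N + n),  # only term that depends on N
        "noise": M / math.sqrt(n),
        "sparsity": k * math.log(p) / n,
    }


def minimax_rate(M, k, p, n, N):
    """min{ sum of the three terms, M^2 }, as in the lower bound (13)."""
    return min(sum(rate_terms(M, k, p, n, N).values()), M**2)


# Made-up example: adding unlabelled data changes only the covariance term.
supervised = rate_terms(M=1.0, k=5, p=1000, n=100, N=0)
semi = rate_terms(M=1.0, k=5, p=1000, n=100, N=9900)
```

In this example the covariance term drops from 0.1 to 0.01 as $N$ goes from 0 to 9900, while the noise and sparsity terms are unchanged, in line with the remark following Theorem 2.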
For theoretical iterest, we are goig to modify the proposed estimator Q defied i 9 such that the modified versio will achieve the lower boud 3 over the whole iterestig regime M k log p/. We radomly split the data y, X 0

ito two subsamples Z = y, X with sample size ad Z = y, X with sample size, where. Let β deote a estimator which is produced by the first sub-sample y, X ad satisfies Coditio A. Oe example of such a estimator is the scaled Lasso estimator 8 applied to the subsample Z = y, X. We propose the followig estimator of Q, Q β, Σ, Z = β Σ β + β where Σ = +N +N i= + X i X i. i= + X i y i X i β, 4 The followig theorem establishes the covergece rate of Q β, Σ, Z ad shows that this estimator achieves the optimal covergece rate of estimatig Q for M k log p/. Theorem 3. Suppose that Coditio A holds ad k c/ log p for some costat c > 0. Let β be a estimator depedig o the first half sample y, X ad satisfyig Coditio B. The with probability larger tha p c exp c N c t γ, Q β, Σ, Z Q t + β β + t + k log p N +. 5 Hece, the estimator Q β, Σ, Z defied i 4 achieves the optimal estimatio rate M + M + k log p. 6 over Θk, M i the regime k c mi {/ log p, p ν } for some costats c > 0 ad 0 ν < ad M k log p/..3 Two Special Cases We ow tur to the iferece i the supervised settig ad the settig with kow desig covariace matrix. These two settigs ca be viewed as the special cases with N = 0 ad N = respectively..3. Case I: Supervised Iferece I the supervised settig, we oly observe the labelled data ad will estimate Σ by the sample covariace matrix Σ L = X i Xi. The followig theorem establishes the covergece rate of the estimator Q = Q β, Σ L, Z. Theorem 4. Suppose that Coditio A holds ad k c/ log p for some costat c > 0. For ay estimator β satisfyig B, with probability larger tha p c exp c t γ,

Q β, Σ L, Z proposed i 7 with Σ L = X i X i satisfies Q β, Σ L, Z Q t β + β + k log p. 7 { Uder the additioal assumptio A ad β mi k log p/, k log p/ /}, the Q β, ΣL, Z Q 4σ β Σβ + E β X X β N0, 8 β Σβ The estimator Q β, Σ L, Z is a special case of the estimator 9 with N = 0. comparig Theorem 4 with Theorem, if β C for some positive costat C, the the ulabelled data leads to a faster covergece rate by reducig the term β / i 7 to β / N + i 0; however, the ulabelled data does ot affect other terms i the covergece rate. The effect of the ulabelled data is also revealed i the limitig distributio of the proposed estimator, where a compariso of 8 ad shows that the exact variace level is reduced from 4σ β Σβ + E β X X β β Σβ to 4σ β Σβ + ρe β X X β β Σβ, where ρ deotes the limitig proportio of the amout of labelled data out of the total amout of both labelled ad ulabelled data. The followig corollary further establishes the miimax rate for estimatig β Σβ i the supervised settig. Corollary. Suppose that Coditio A holds ad k c mi {/ log p, p ν } for some costats c > 0 ad 0 ν <. By For ay estimator β satisfyig Coditio B, the estimator Q = Q β, Σ L, Z defied i 7 with Σ L = X i X i achieves the followig optimal estimatio rate over Θk, M for M k log p/, M + M + k log p. 9 Remark. A related paper [] studies estimatio of β ad shows that the optimal rate of estimatig β over Θk, M for M k log p/ is M/ + M + k log p/ i the supervised settig. I cotrast to 9, we ca see that either of these two problems is easier tha the other, where there is a additioal term M / i 9 ad a additioal term Mk log p/ i the optimal covergece rate of estimatig β. Remark 3. Iferece for β Σβ is closely coected to [33, 38], where [33] studied the iferece problem for σ ad [38] studied the estimatio of β Σβ/σ. I particular, [33] proposed the scaled lasso estimator σ i 8 to estimate σ ad [38] proposed to estimate

$\beta^\top\Sigma\beta$ by $\left(\frac{1}{n}\|y\|_2^2 - \hat\sigma^2\right)_+$ as an intermediate step of estimating $\beta^\top\Sigma\beta/\sigma^2$. For the estimator $\hat Q(\hat\beta, \hat\Sigma^{L}, Z)$ defined in (17), if $\hat\beta$ is taken as the scaled Lasso estimator, then $\hat Q(\hat\beta, \hat\Sigma^{L}, Z)$ reduces to the estimator proposed in [38], where the equivalence is shown by the following expression,

$$\hat\beta^\top\hat\Sigma^{L}\hat\beta + \frac{2}{n}\,\hat\beta^\top\sum_{i=1}^{n} X_i\,(y_i - X_i^\top\hat\beta) = \frac{1}{n}\|y\|_2^2 - \frac{1}{n}\|y - X\hat\beta\|_2^2 = \frac{1}{n}\|y\|_2^2 - \hat\sigma^2. \tag{20}$$

We shall stress that the calibration idea in (17) provides a completely new perspective on estimation of $\beta^\top\Sigma\beta$: instead of using the expression $Q = E y_i^2 - \sigma^2$ and estimating $\sigma^2$ first, we estimate $Q$ directly by calibrating the plug-in estimator. This new perspective establishes a general machinery taking reasonably good initial estimators of $\beta$ and $\Sigma$ as inputs. As shown in (9), the flexibility of the calibrated estimator has proven to be extremely useful in efficiently pooling additional information on $\Sigma$; note that the estimation method introduced in [38] cannot be extended to the semi-supervised setting in the way the calibration perspective in (9) can. Additionally, [38] focused on the estimation problem instead of confidence interval construction and hypothesis testing. In terms of technical details on estimation optimality, the results in [38] allowed for a more general regime of $k$ relative to $p$ than Corollary 2, but only considered optimality in the supervised setting and considered a fixed $M$ in the analysis.

2.4 Case II: Known Σ

The general semi-supervised results also shed light on another interesting setting where the design covariance $\Sigma$ is known. In the semi-supervised setting, the additional unlabelled data is used for estimating the design covariance matrix $\Sigma$. The case of known $\Sigma$ is an extreme case of the semi-supervised setting with $N$ taken as infinity. The estimator (14) can be modified such that the information on $\Sigma$ is incorporated,

$$\hat Q(\hat\beta, \Sigma, Z^{(2)}) = \hat\beta^\top\Sigma\hat\beta + \frac{2}{n_2}\,\hat\beta^\top\sum_{i=n_1+1}^{n} X_i\,(y_i - X_i^\top\hat\beta). \tag{21}$$

Similarly, the estimator proposed in (9) can be modified as

$$\hat Q(\hat\beta, \Sigma, Z) = \hat\beta^\top\Sigma\hat\beta + \frac{2}{n}\,\hat\beta^\top\sum_{i=1}^{n} X_i\,(y_i - X_i^\top\hat\beta). \tag{22}$$

Corollary 3. Suppose that Condition (A1) holds and $k \le c\,n/\log p$ for some constant $c > 0$.

1. For any estimator $\hat\beta$ depending on the first half sample $(y^{(1)}, X^{(1)})$ and satisfying Condition (B1), with probability larger than $1 - p^{-c} - \exp(-cn) - t^{-2} - \gamma(n)$, the estimator

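defined in (21) satisfies

$$\left|\hat Q(\hat\beta, \Sigma, Z^{(2)}) - Q\right| \lesssim (t+1)\,\frac{\|\beta\|_2}{\sqrt{n}} + \frac{k\log p}{n}. \tag{23}$$

2. For any estimator $\hat\beta$ satisfying Condition (B1), with probability larger than $1 - p^{-c} - \exp(-cn) - t^{-2} - \gamma(n)$, the estimator defined in (22) satisfies

$$\left|\hat Q(\hat\beta, \Sigma, Z) - Q\right| \lesssim t\,\frac{\|\beta\|_2}{\sqrt{n}} + \|\beta\|_2\,\frac{k\log p}{n} + \frac{k\log p}{n}. \tag{24}$$

Comparing (23) with (15) and (24) with (10), the uncertainty of estimating the design covariance matrix leads to the additional term $\|\beta\|_2^2/\sqrt{N+n}$. By applying Theorem 2, we can show that the convergence rate in (23) achieves the optimal convergence rate $M/\sqrt{n} + k\log p/n$. The term $M^2/\sqrt{N+n}$ disappears because the design covariance matrix $\Sigma$ is known.

3 Confidence Intervals for $\beta^\top\Sigma\beta$

In this section, we consider the problem of constructing confidence intervals for $\beta^\top\Sigma\beta$, which involves uncertainty quantification of the CHIVE estimator proposed in Section 2. We first construct confidence intervals for $\beta^\top\Sigma\beta$ in Section 3.1 and introduce a randomized calibration procedure in Section 3.2 to study inference for explained variance in the presence of weak signals.

3.1 Confidence Interval Construction

We start with uncertainty quantification of the CHIVE estimator $\hat Q$ proposed in (9). By the limiting distribution established in (11), the main next step is to consistently estimate the standard error $\sqrt{\left(4\sigma^2\beta^\top\Sigma\beta + \rho\,E(\beta^\top X_i X_i^\top\beta - \beta^\top\Sigma\beta)^2\right)/n}$. Specifically, we estimate $4\sigma^2\beta^\top\Sigma\beta$ by $4\hat\phi_1$, $\rho$ by $\hat\rho = n/(N+n)$ and $E(\beta^\top X_i X_i^\top\beta - \beta^\top\Sigma\beta)^2$ by $\hat\phi_2$, where

$$\hat\phi_1 = \hat\sigma^2\,\hat\beta^\top\hat\Sigma^{S}\hat\beta \quad\text{and}\quad \hat\phi_2 = \frac{1}{n+N}\sum_{i=1}^{n+N}\left(\hat\beta^\top X_i X_i^\top\hat\beta - \hat\beta^\top\hat\Sigma^{S}\hat\beta\right)^2,$$

with $\hat\Sigma^{S}$ defined in (9). Then we propose the following confidence interval centered at $\hat Q$,

$$CI(Z) = \left(\left(\hat Q - z_{\alpha/2}\,\hat\phi\right)_+,\ \hat Q + z_{\alpha/2}\,\hat\phi\right), \quad\text{where } \hat\phi = \sqrt{\frac{4\hat\phi_1 + \hat\rho\,\hat\phi_2}{n}}, \tag{25}$$

where $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution. The following theorem establishes the coverage and precision properties of $CI(Z)$, where the length of any interval $CI(Z) = (L(Z), U(Z))$ is defined as $\mathbf{L}(CI(Z)) = U(Z) - L(Z)$.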
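The construction in (25) can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the function name `chive_ci`, the toy data, and the ridge fit used as the initial estimator are all assumptions made to keep the sketch dependency-free (the paper uses the scaled Lasso for $\hat\beta$ and $\hat\sigma$).

```python
import numpy as np
from statistics import NormalDist


def chive_ci(y, X, X_unlab, beta_hat, sigma_hat, alpha=0.05):
    """Calibrated point estimate, standard error and interval in the style of (25)."""
    n = X.shape[0]
    # Pool labelled and unlabelled covariates to estimate Sigma.
    fitted_all = np.vstack([X, X_unlab]) @ beta_hat
    plug_in = np.mean(fitted_all ** 2)            # beta_hat' Sigma_S beta_hat
    resid = y - X @ beta_hat
    calibration = 2.0 * np.mean((X @ beta_hat) * resid)
    q_hat = plug_in + calibration
    phi1 = sigma_hat ** 2 * plug_in
    phi2 = np.mean((fitted_all ** 2 - plug_in) ** 2)
    rho_hat = n / fitted_all.size                 # n / (n + N)
    se = np.sqrt((4.0 * phi1 + rho_hat * phi2) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return q_hat, se, (max(q_hat - z * se, 0.0), q_hat + z * se)


# Toy low-dimensional data; ridge is a hypothetical stand-in for the scaled Lasso.
rng = np.random.default_rng(0)
n, N, p = 200, 400, 5
beta = np.array([1.0, 0.5, 0.0, 0.0, 0.0])
X = rng.standard_normal((n, p))
X_unlab = rng.standard_normal((N, p))
y = X @ beta + rng.standard_normal(n)
beta_hat = np.linalg.solve(X.T @ X + 20.0 * np.eye(p), X.T @ y)
sigma_hat = np.sqrt(np.mean((y - X @ beta_hat) ** 2))
q_hat, se, (lower, upper) = chive_ci(y, X, X_unlab, beta_hat, sigma_hat)
```

Passing an empty unlabelled matrix, `np.empty((0, p))`, gives the supervised special case, where the point estimate coincides with $\frac{1}{n}\|y\|_2^2 - \frac{1}{n}\|y - X\hat\beta\|_2^2$ as in (20).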

Theorem 5. Suppose that Conditions (A1) and (A2) hold, k ≪ min{/logn+ log p, √n/log p} and $\|\beta\|_2 \gg k\log p/\sqrt{n}$. For $\hat\beta$ and $\hat\sigma$ satisfying Conditions (B1) and (B2), respectively, the confidence interval given in (25) satisfies the following coverage and precision properties,

$$\lim_{n\to\infty} P\left(\beta^\top\Sigma\beta \in CI(Z)\right) \ge 1 - \alpha \tag{26}$$

and

$$\lim_{n\to\infty} P\left(\mathbf{L}(CI(Z)) \ge (1+\delta_0)\sqrt{\frac{4\sigma^2\beta^\top\Sigma\beta}{n} + \frac{E\left(\beta^\top X_i X_i^\top\beta - \beta^\top\Sigma\beta\right)^2}{N+n}}\right) = 0 \tag{27}$$

for any positive constant $\delta_0 > 0$.

The effect of the additional data on the length of the confidence interval is demonstrated in (27), where the confidence interval gets shorter with a larger amount of unlabelled data. Furthermore, the length $\sqrt{4\sigma^2\beta^\top\Sigma\beta/n + E(\beta^\top X_i X_i^\top\beta - \beta^\top\Sigma\beta)^2/(N+n)}$ is upper bounded by $\|\beta\|_2/\sqrt{n} + \|\beta\|_2^2/\sqrt{N+n}$, which matches the optimal convergence rate of estimation $M/\sqrt{n} + M^2/\sqrt{N+n}$ over the parameter space $\Theta(k, M)$ for $k \ll \sqrt{n}/\log p$ and $M \gtrsim k\log p/\sqrt{n}$.

As shown in Theorem 5, the validity of the proposed confidence interval (25) requires the condition that $\|\beta\|_2$ is bounded away from zero by $k\log p/\sqrt{n}$. Although $k\log p/\sqrt{n}$ converges to zero over the extreme sparse regime $k \ll \sqrt{n}/\log p$, it reveals the difficulty of constructing stable confidence intervals for $\beta^\top\Sigma\beta$ when $\beta$ is in a local neighborhood of zero. The next section addresses the inference problem in the presence of such weak signals.

3.2 Inference for Weak Signals: Randomized Calibration

As discussed in the introduction, uncertainty quantification of $Q = \beta^\top\Sigma\beta$ is closely connected to other important statistical problems, including (1) signal detection and global testing; (2) prediction accuracy evaluation and (3) confidence ball construction. These applications provide a strong motivation for inference for the explained variance under the setting of weak signals, that is, $\|\beta\|_2 \lesssim k\log p/\sqrt{n}$. In the following, we focus on the inference problem in the presence of weak signals and introduce a randomized version of the CHIVE estimator (9). We first generate random variables $u_i \overset{iid}{\sim} N(0, \tau_0^2)$ for $1 \le i \le n$, which are independent of the observed data $Z$. Similar to (9), we propose the following randomized calibrated estimator,

$$\hat Q^{R} = \hat Q^{R}(\hat\beta, \hat\Sigma^{S}, Z, u) = \hat\beta^\top\hat\Sigma^{S}\hat\beta + \frac{2}{n}\sum_{i=1}^{n}\left(X_i^\top\hat\beta + u_i\right)\left(y_i - X_i^\top\hat\beta\right). \tag{28}$$

When there is no confusion, we use $\hat Q^{R}$ to denote the estimator proposed in (28). In contrast to (9), the calibration step in (28) involves an additional term $\frac{2}{n}\sum_{i=1}^{n} u_i\,(y_i - X_i^\top\hat\beta)$. If $u_i$ is zero instead of being generated as a normal random variable in (28), the estimator $\hat Q^{R}(\hat\beta, \hat\Sigma^{S}, Z, 0)$ reduces to exactly $\hat Q(\hat\beta, \hat\Sigma^{S}, Z)$ defined in (9). Since the $u_i$ in (28) are randomly generated normal random variables, this additional term approximately follows a normal distribution with mean zero and variance $4\sigma^2\tau_0^2/n$. Even in the presence of weak signals, this additional term enlarges the variance level of the calibrated estimator such that the bias level of the calibrated estimator is dominated by the corresponding variance level. The following theorem establishes the limiting distribution of the estimator $\hat Q^{R}$ after randomized calibration.

Theorem 6. Suppose that Condition (A1) holds, $k \ll \sqrt{n}/\log p$ and $\tau_0 > 0$ is a positive constant. For any estimator $\hat\beta$ satisfying Condition (B1),

$$\sqrt{n}\;\frac{\hat Q^{R} - Q}{\sqrt{4\sigma^2\left(\beta^\top\Sigma\beta + \tau_0^2\right) + \rho\,E\left(\beta^\top X_i X_i^\top\beta - \beta^\top\Sigma\beta\right)^2}} \xrightarrow{d} N(0, 1), \tag{29}$$

where $\rho = \lim n/(n+N)$.

In comparison to the limiting distribution in Theorem 1, Theorem 6 requires no condition on $\|\beta\|_2$ to conduct inference for the explained variance, while the variance level of $\hat Q^{R}$ is slightly larger than that of $\hat Q$ by the amount $4\sigma^2\tau_0^2/n$. This additional variance term is a side effect of the randomized calibration. However, it paves the way to quantify the uncertainty when $\beta$ is near zero, that is, $\|\beta\|_2 \lesssim k\log p/\sqrt{n}$. Similar to (25), the standard error of the randomized estimator $\hat Q^{R}$ in (29) is approximated as

$$\hat\phi^{R} = \sqrt{\frac{4\hat\sigma^2\left(\hat\beta^\top\hat\Sigma^{S}\hat\beta + \tau_0^2\right)}{n} + \frac{1}{(n+N)^2}\sum_{i=1}^{n+N}\left(\hat\beta^\top X_i X_i^\top\hat\beta - \hat\beta^\top\hat\Sigma^{S}\hat\beta\right)^2}, \tag{30}$$

where $\hat\Sigma^{S}$ is defined in (9). Then we propose the following confidence interval,

$$CI^{R}(Z) = \left(\left(\hat Q^{R} - z_{\alpha/2}\,\hat\phi^{R}\right)_+,\ \hat Q^{R} + z_{\alpha/2}\,\hat\phi^{R}\right), \tag{31}$$

where $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution. The following corollary characterizes the coverage and precision properties of $CI^{R}(Z)$.

Corollary 4. Suppose that Conditions (A1) and (A2) hold, k ≪ min{/logn+ log p, √n/log p} and $\tau_0 > 0$ is a positive constant. For $\hat\beta$ and $\hat\sigma$ satisfying Conditions (B1) and (B2), respectively, the confidence interval defined in (31) satisfies the following coverage and precision properties,

$$\lim_{n\to\infty} P\left(\beta^\top\Sigma\beta \in CI^{R}(Z)\right) \ge 1 - \alpha \tag{32}$$

and

$$\lim_{n\to\infty} P\left(\mathbf{L}(CI^{R}(Z)) \ge (1+\delta_0)\sqrt{\frac{4\sigma^2\left(\beta^\top\Sigma\beta + \tau_0^2\right)}{n} + \frac{E\left(\beta^\top X_i X_i^\top\beta - \beta^\top\Sigma\beta\right)^2}{N+n}}\right) = 0 \tag{33}$$

for any positive constant $\delta_0 > 0$.

The algorithm for estimating $\beta^\top\Sigma\beta$ and quantifying its uncertainty is summarized in Algorithm 1.

Algorithm 1: Randomized CHIVE in the Semi-supervised Setting
Input: Labelled data $\{y_i, X_i\}_{1\le i\le n}$ and unlabelled covariates $\{X_i\}_{n+1\le i\le n+N}$; randomization level $\tau_0$
Output: Point estimator $\hat Q^{R} = \hat Q^{R}(y, X, \tau_0)$ and its standard error estimator $\hat\phi^{R} = \hat\phi^{R}(y, X, \tau_0)$
1 Initialization: Construct point estimators $\hat\beta$ and $\hat\sigma$ satisfying (B1) and (B2); estimate $\Sigma$ by $\hat\Sigma^{S}$ defined in (9);
2 Randomized Calibration: Estimate $Q$ by the estimator $\hat Q^{R}$ defined in (28), where the variables $\{u_i\}_{1\le i\le n}$ are generated to be independent of the observed data $X, y$ and to follow i.i.d. $N(0, \tau_0^2)$;
3 Uncertainty Quantification: Estimate the standard error of the proposed estimator by $\hat\phi^{R}$ defined in (30).

4 Statistical Applications

In this section, we apply Algorithm 1 to tackle several important statistical problems, including signal detection and global testing in Section 4.1, prediction accuracy evaluation in Section 4.2 and confidence ball construction in Section 4.3.

4.1 Application 1: Signal Detection and Global Testing

Signal detection is of great importance in statistics and related scientific applications, and the detection problem in high-dimensional linear regression was studied in [, 3]. The inference procedure stated in Algorithm 1 has profound implications for signal detection and for general global testing in high-dimensional linear regression. We consider the global hypothesis $H_0: \beta = \beta^{null}$, which includes signal detection as a special case by taking $\beta^{null} = 0$. The global testing problem is cast as

$$H_0: (\beta - \beta^{null})^\top\Sigma(\beta - \beta^{null}) = 0 \quad\text{vs.}\quad H_1: (\beta - \beta^{null})^\top\Sigma(\beta - \beta^{null}) > 0. \tag{34}$$

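We apply Algorithm 1 with a given $\tau_0 > 0$ and obtain the point estimator $\hat Q^{R}(y - X\beta^{null}, X, \tau_0)$ and its standard error estimator $\hat\phi^{R}(y - X\beta^{null}, X, \tau_0)$. Then we propose the detection procedure, with Type I error controlled at $\alpha \in (0, 1)$, as

$$D(\tau_0) = \mathbf{1}\left(\hat Q^{R}(y - X\beta^{null}, X, \tau_0) \ge z_{\alpha}\,\hat\phi^{R}(y - X\beta^{null}, X, \tau_0)\right). \tag{35}$$

We define the corresponding null parameter space as

$$\mathcal{H}_0 = \left\{\theta = (\beta^{null}, \Sigma, \sigma) : \frac{1}{M_1} \le \lambda_{\min}(\Sigma) \le \lambda_{\max}(\Sigma) \le M_1,\ \sigma \le M_2\right\} \tag{36}$$

and the local alternative parameter space as

$$\mathcal{H}_1(\Delta) = \left\{\theta = (\beta, \Sigma, \sigma) : (\beta - \beta^{null})^\top\Sigma(\beta - \beta^{null}) = \frac{\Delta}{\sqrt{n}},\ \frac{1}{M_1} \le \lambda_{\min}(\Sigma) \le \lambda_{\max}(\Sigma) \le M_1,\ \sigma \le M_2\right\}. \tag{37}$$

The following corollary establishes that $D(\tau_0)$ controls the type I error asymptotically and also establishes the asymptotic power function of the proposed test.

Corollary 5. Suppose that Conditions (A1) and (A2) hold, $\tau_0 > 0$ is a positive constant and the vector $\delta = \beta - \beta^{null}$ satisfies the conditions that δ 0 mi{/logn + log p, δ /log p} and E X X δ δ Σδ > c0 for some positive constant $c_0$. Then for any $\theta \in \mathcal{H}_0$,

$$\lim_{n\to\infty} P_\theta\left(D(\tau_0) = 1\right) = \alpha. \tag{38}$$

For $\rho > 0$ and any $\theta \in \mathcal{H}_1(\Delta)$ with some positive constant $\Delta > 0$,

$$\lim_{n\to\infty} P_\theta\left(D(\tau_0) = 1\right) = 1 - \Phi\left(z_\alpha - \frac{\Delta}{\sqrt{4\sigma^2\left(\beta^\top\Sigma\beta + \tau_0^2\right) + \rho\,E\left(\beta^\top X_i X_i^\top\beta - \beta^\top\Sigma\beta\right)^2}}\right). \tag{39}$$

The assumptions of Corollary 5 are the same as those of Corollary 4 in the sense that the conditions imposed on $\beta$ in Corollary 4 are now imposed on the difference vector $\delta = \beta - \beta^{null}$. One sufficient condition for the difference vector $\delta$ being sparse is that both the true signal $\beta$ and the null hypothesis $\beta^{null}$ are sparse. Corollary 5 shows that for any positive constant $\tau_0$, $D(\tau_0)$ controls the type I error asymptotically. For the finite sample performance, we have investigated how to choose the randomization level $\tau_0$ in the simulation section. See Section 5.2 for the numerical performance.

4.2 Application 2: Prediction Accuracy Assessment

Inference for explained variance has important applications to evaluating the out-of-sample prediction for a given sparse estimator $\check\beta$. To keep the notation consistent, we assume $\check\beta$ is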

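estimated based on a training data set $(X^{0}, y^{0})$ and $(X, y)$ is an independent test data set used to evaluate its prediction accuracy. We start with computing the residual on the test data set,

$$y - X\check\beta = X(\beta - \check\beta) + \epsilon. \tag{40}$$

The out-of-sample prediction accuracy is defined as $PA(\check\beta) = E_{x_{new}}\left(x_{new}^\top(\check\beta - \beta)\right)^2 = (\check\beta - \beta)^\top\Sigma(\check\beta - \beta)$, and it reduces to the explained variance for the residual model (40) with outcome $r = y - X\check\beta$ and covariates $X$. Let $\hat Q^{R}(r, X, \tau_0)$ and $\hat\phi^{R}(r, X, \tau_0)$ denote the outputs of Algorithm 1 with the labelled data $\{r_i, X_i\}_{1\le i\le n}$ and unlabelled data $\{X_i\}_{n+1\le i\le n+N}$ as inputs. Then we propose the point estimator of $PA(\check\beta)$ as $\hat Q^{R}(r, X, \tau_0)$ and the interval estimator for $PA(\check\beta)$ as

$$CI_{PA}(\check\beta) = \left(\left(\hat Q^{R}(r, X, \tau_0) - z_{\alpha/2}\,\hat\phi^{R}(r, X, \tau_0)\right)_+,\ \hat Q^{R}(r, X, \tau_0) + z_{\alpha/2}\,\hat\phi^{R}(r, X, \tau_0)\right). \tag{41}$$

The following corollary establishes the convergence rate for the point estimator and the coverage and precision properties of the interval estimator.

Corollary 6. Suppose that Conditions (A1) and (A2) hold and $\tau_0 > 0$ is a positive constant. For any sparse estimator satisfying $\|\check\beta\|_0 \le C\|\beta\|_0$ with $C > 0$:

1. If $k \le c\,n/\log p$ for some positive constant $c > 0$, then with probability larger than $1 - p^{-c} - \exp(-cn) - N^{-c} - t^{-2} - \gamma(n)$,

$$\left|\hat Q^{R}(r, X, \tau_0) - Q\right| \lesssim (1+t)\left(\frac{\|\check\beta - \beta\|_2 + \tau_0}{\sqrt{n}} + \frac{\|\check\beta - \beta\|_2^2}{\sqrt{N+n}}\right) + \frac{k\log p}{n}. \tag{42}$$

2. If k mi{/logn + log p, δ /log p} and E X X δ δ Σδ > c0 for some positive constant $c_0$, where $\delta = \beta - \check\beta$, then the confidence interval defined in (41) satisfies the following coverage and precision properties,

$$\lim_{n\to\infty} P\left(PA(\check\beta) \in CI_{PA}(\check\beta)\right) \ge 1 - \alpha \tag{43}$$

and

$$\lim_{n\to\infty} P\left(\mathbf{L}(CI_{PA}(\check\beta)) \ge C\left(\frac{\|\check\beta - \beta\|_2 + \tau_0}{\sqrt{n}} + \frac{\|\check\beta - \beta\|_2^2}{\sqrt{N+n}}\right)\right) = 0 \tag{44}$$

for some constant $C > 0$.

The above corollary shows that the precision of the confidence interval for the prediction accuracy is not just related to the sample sizes $n$, $N$, the sparsity $k$ and the dimension $p$, but also to the accuracy $\|\check\beta - \beta\|_2$ of the evaluated estimator. See Section 5.3 for the numerical performance.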
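The residual-based procedure above can be sketched as follows: form $r = y - X\check\beta$, run the randomized calibration of (28) on $(r, X)$, and report the interval (41). This is a hedged sketch rather than the authors' code. The names `randomized_chive`, `prediction_accuracy_ci` and `ridge_fit` are hypothetical, and the ridge fit is an assumed stand-in for the initial estimators, which the paper takes to be the scaled Lasso.

```python
import numpy as np
from statistics import NormalDist


def randomized_chive(y, X, X_unlab, beta_hat, sigma_hat, tau0, rng):
    """Randomized calibrated estimate as in (28) with a standard error as in (30)."""
    n = X.shape[0]
    fitted_all = np.vstack([X, X_unlab]) @ beta_hat
    plug_in = np.mean(fitted_all ** 2)              # beta_hat' Sigma_S beta_hat
    u = rng.normal(0.0, tau0, size=n)               # u_i ~ N(0, tau0^2), independent of the data
    resid = y - X @ beta_hat
    q_r = plug_in + 2.0 * np.mean((X @ beta_hat + u) * resid)
    var = (4.0 * sigma_hat ** 2 * (plug_in + tau0 ** 2) / n
           + np.sum((fitted_all ** 2 - plug_in) ** 2) / fitted_all.size ** 2)
    return q_r, np.sqrt(var)


def ridge_fit(r, X, lam=10.0):
    """Hypothetical initial estimators for the residual model (stand-in for scaled Lasso)."""
    p = X.shape[1]
    delta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ r)
    sigma_hat = np.sqrt(np.mean((r - X @ delta_hat) ** 2))
    return delta_hat, sigma_hat


def prediction_accuracy_ci(y, X, X_unlab, beta_check, tau0, rng, alpha=0.05):
    r = y - X @ beta_check                          # residual outcome, as in (40)
    delta_hat, sigma_hat = ridge_fit(r, X)
    q_r, se = randomized_chive(r, X, X_unlab, delta_hat, sigma_hat, tau0, rng)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return q_r, (max(q_r - z * se, 0.0), q_r + z * se)


# Toy test data with identity design covariance, so PA = ||beta_check - beta||_2^2.
rng = np.random.default_rng(1)
n, N, p = 300, 600, 5
beta = np.array([1.0, -0.5, 0.0, 0.0, 0.0])
beta_check = np.array([0.8, -0.3, 0.0, 0.0, 0.0])  # estimator under evaluation
X = rng.standard_normal((n, p))
X_unlab = rng.standard_normal((N, p))
y = X @ beta + rng.standard_normal(n)
pa_hat, (pa_lower, pa_upper) = prediction_accuracy_ci(y, X, X_unlab, beta_check, 2.0, rng)
true_pa = float(np.sum((beta_check - beta) ** 2))
```

Setting `tau0=0.0` removes the randomization and recovers the non-randomized calibration, which makes the trade-off visible: a larger `tau0` widens the interval but stabilizes coverage when the accuracy being assessed is close to zero.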

4.3 Application 3: Confidence Ball Construction

The prediction accuracy evaluation established in (41) can be used to construct a confidence ball for $\beta$. For the setting where $\lambda_{\min}(\Sigma)$ is known, we have $\lambda_{\min}(\Sigma)\,\|\check\beta - \beta\|_2^2 \le (\check\beta - \beta)^\top\Sigma(\check\beta - \beta)$ and construct the confidence ball for $\beta$ as

$$CB(\check\beta) = \left\{\beta : \|\beta - \check\beta\|_2^2 \le \frac{z_{\alpha/2}}{\lambda_{\min}(\Sigma)}\,\hat\phi^{R}(r, X, \tau_0)\right\}. \tag{45}$$

As shown in (44), the radius of the confidence ball $CB(\check\beta)$ is upper bounded by $\frac{\|\check\beta - \beta\|_2 + \tau_0}{\sqrt{n}} + \frac{\|\check\beta - \beta\|_2^2}{\sqrt{N+n}}$. To minimize the radius, we need to select the center $\check\beta$ for the confidence ball in (45) such that $\check\beta$ is sparse and $\|\check\beta - \beta\|_2$ is small. In the high-dimensional literature, several penalized estimators are shown to satisfy such properties, such as the Lasso, the scaled Lasso and the Dantzig Selector.

5 Simulation Study

We carry out simulation studies in this section to demonstrate the numerical performance of the proposed methods, which consist of confidence interval construction for $\beta^\top\Sigma\beta$ in Section 5.1, signal detection in Section 5.2 and prediction accuracy evaluation in Section 5.3. Throughout all simulation studies in this paper, we generate the high-dimensional linear regression with dimension $p = 800$ and sample size $n$ across {200, 400, 600, 800, 1000, 1200, 2400}. For the linear model, the covariates $\{X_i\}_{1\le i\le n}$ are generated in i.i.d. fashion from a multivariate normal distribution with mean zero and covariance matrix $\Sigma \in \mathbb{R}^{800\times 800}$ where $\Sigma_{ij} = 0.5^{|i-j|}$, and the errors $\{\epsilon_i\}_{1\le i\le n}$ are generated as i.i.d. standard normal. In addition to the labelled data, we also generate the unlabelled data $\{X_i\}_{n+1\le i\le n+N}$ with $N = 2{,}000$ to study the proposed inference procedures in the semi-supervised setting. The simulations are replicated 500 times.

5.1 Inference for $\beta^\top\Sigma\beta$

The sample size $n$ is varied across 200, 400, 600, 800 and 1000, and the high-dimensional regression vector $\beta$ is generated across the following three settings,

a. Setting 1: $\beta$ is generated with sparsity 10, where $\beta_j = j/10$ for $1 \le j \le 10$ and $\beta_j = 0$ for $j \ge 11$;
b. Setting 2: $\beta$ is generated with sparsity 50, where $\beta_j = j/50$ for $1 \le j \le 50$ and $\beta_j = 0$ for $j \ge 51$;

c. Setting 3: β is generated as an approximately sparse vector with β j = 0.5 p.

We have compared the estimation accuracy across two different types of estimators, the plug-in estimator and the CHIVE estimator, and across two different settings, the supervised setting and the semi-supervised setting. Recall that only labelled data is available in the supervised setting and both the labelled and unlabelled data are available in the semi-supervised setting. The numerical comparison is reported in Figure 1. Across all three settings, the proposed CHIVE estimator achieves uniformly much better estimation accuracy than the plug-in estimators, in both the supervised and semi-supervised settings. This numerical observation demonstrates that the calibration step is useful in improving the estimation accuracy. In addition, the unlabelled data is useful in estimating $\beta^\top\Sigma\beta$: as demonstrated in Figure 1, the solid line (the CHIVE estimator in the semi-supervised setting) is always below the dotted line (the CHIVE estimator in the supervised setting). This matches the theoretical results.

[Figure 1: three panels titled "True Value 9.4", "True Value 49.47" and "True Value .9"; each plots RMSE (y-axis) against Sample Size (x-axis) for the estimators Plugin, CHIVE, Plugin.semi and CHIVE.semi.]

Figure 1: Root Mean Squared Error (RMSE) of different estimators of $\beta^\top\Sigma\beta$. The x-axis stands for the sample size and the y-axis stands for the RMSE of the corresponding estimators. The dotted line and the solid line represent the RMSEs of the CHIVE estimator in the supervised setting and the semi-supervised setting, respectively; the dashed line and the dotted-dashed line represent the RMSEs of the plug-in estimator in the supervised setting and the semi-supervised setting, respectively. The true values of $\beta^\top\Sigma\beta$ are 9.4, 49.47 and .9, from the leftmost to the rightmost panel.
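The simulation design described above can be reproduced in a few lines. The following is a minimal sketch under the stated design ($p = 800$, $\Sigma_{ij} = 0.5^{|i-j|}$, standard normal errors, $N = 2{,}000$ unlabelled covariates); the helper names `ar1_covariance` and `setting_beta` and the random seed are assumptions for illustration. It also evaluates the true value $\beta^\top\Sigma\beta$ for Settings 1 and 2, which can be compared with the true values shown in Figure 1.

```python
import numpy as np


def ar1_covariance(p, rho=0.5):
    """Covariance matrix with entries rho^|i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def setting_beta(p, k):
    """Sparse vector with beta_j = j / k for 1 <= j <= k and zero afterwards."""
    beta = np.zeros(p)
    beta[:k] = np.arange(1, k + 1) / k
    return beta


p, n, N = 800, 200, 2000
Sigma = ar1_covariance(p)
beta1 = setting_beta(p, 10)     # Setting 1
beta2 = setting_beta(p, 50)     # Setting 2
q1 = float(beta1 @ Sigma @ beta1)
q2 = float(beta2 @ Sigma @ beta2)

# Draw labelled and unlabelled covariates from N(0, Sigma) via a Cholesky factor.
rng = np.random.default_rng(2024)
chol = np.linalg.cholesky(Sigma)
X_all = rng.standard_normal((n + N, p)) @ chol.T
X, X_unlab = X_all[:n], X_all[n:]
y = X @ beta1 + rng.standard_normal(n)
```

Using the Cholesky factor avoids repeated decompositions when the design is redrawn across replications, since `chol` depends only on $\Sigma$.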
In addition to the significant improvement in terms of estimation, the CHIVE estimator serves as the center of confidence intervals for $\beta^\top\Sigma\beta$. The coverage and precision properties of the constructed confidence interval CI are reported in Table 1. With a larger sample size,

the empirical coverage of the proposed confidence interval achieves 95% and the average lengths of the confidence intervals get shorter. The integration of the unlabelled data in the semi-supervised setting shortens the lengths of the confidence intervals significantly.

                       Supervised        Semi-Supervised
            n          Cov     Len       Cov     Len
Setting 1   200        0.9     3.750     0.896   .796
            400        0.936   .734      0.94    .536
            600        0.946   .93       0.950   .393
            800        0.936   .99       0.94    .90
            1,000      0.966   .800      0.960   .5
Setting 2   200        0.906   8.8       0.50    6.7
            400        0.940   3.444     0.880   5.930
            600        0.946   .045      0.90    5.643
            800        0.93    9.67      0.94    5.49
            1,000      0.966   8.757     0.956   5.47
Setting 3   200        0.864   .34       0.866   0.903
            400        0.94    0.98      0.904   0.7
            600        0.934   0.88      0.906   0.64
            800        0.944   0.73      0.94    0.56
            1,000      0.94    0.650     0.950   0.56

Table 1: Coverage and precision properties of the proposed CIs. Different rows correspond to different settings (Setting 1, 2, 3) and different sample sizes ($n$ = 200, 400, 600, 800, 1000) for the given setting. Each row reports the empirical coverage (indexed with Cov) and the average length (indexed with Len) of the proposed CIs. The columns indexed with Supervised represent the results for the supervised setting and the columns indexed with Semi-Supervised represent the results for the semi-supervised setting. For example, the first row of numbers (0.9, 3.750, 0.896, .796) corresponds to Setting 1 and sample size $n$ = 200: in the supervised setting, the CI has empirical coverage 0.9 and average length 3.750; in the semi-supervised setting, the CI has empirical coverage 0.896 and average length .796.

5.2 Signal Detection

For the detection problem, we generate $\beta$ as $\beta_j = \delta$ for $1 \le j \le 50$ and $\beta_j = 0$ for $51 \le j \le 800$, vary $\delta$ across {0, 0.025, 0.05, 0.075, 0.1, 0.125, 0.15} and vary the sample size $n$ across {600, 1200}. In Figure 2, we demonstrate the coverage and precision properties of the randomized confidence intervals across four methods, the non-randomized detector $D(0)$ and the three randomized detectors $D(2)$, $D(4)$ and $D(6)$, where $D(\cdot)$ is defined in (35). The two plots on the top of Figure 2, corresponding to the supervised

[Figure 2: four panels; the left panels plot Coverage against δ and the right panels plot Length against δ, with one curve per randomization level (tau0 = 0, 2, 4, 6).]

Figure 2: Empirical coverage and average lengths of the proposed randomized confidence intervals in the supervised setting. The top two figures correspond to the sample size $n$ = 600 and the bottom two figures correspond to $n$ = 1,200. The left hand side figures show the empirical coverage for different $\delta$ while the right hand side figures show the average lengths of the CIs for different $\delta$. Different types of curves correspond to different randomization levels $\tau_0 \in \{0, 2, 4, 6\}$. The dashed horizontal lines in the left hand figures correspond to the targeted coverage level, 0.95.

setting with $n$ = 600, demonstrate the effect of randomization on the empirical coverage and average lengths, where the randomization leads to an interval estimator achieving the coverage property at the expense of wider intervals. With the randomization level $\tau_0$ reaching 2, the coverage property is guaranteed, while the empirical coverage for the procedure without randomization ($\tau_0 = 0$) is much lower than 0.95, especially for weak signals with a small $\delta$. The bottom two plots of Figure 2 correspond to the supervised setting with $n$ = 1,200; the main observation is similar to the case of $n$ = 600, but the confidence intervals are much shorter than in the setting with $n$ = 600.

The empirical detection rate is reported in Table 2, where the sample size is varied across $n$ = 600 and $n$ = 1,200 and the explained variance $\beta^\top\Sigma\beta$ is controlled via the scalar $\delta$. The case $\delta = 0$ corresponds to the null, and a proper detection procedure is expected to have type I error rate 0.05. As predicted by theory, the detection method without

randomization, $D(0)$, fails to give a proper type I error due to the presence of weak signals. After introducing the randomization procedure, the type I error rate gets closer to 0.05. When $\delta$ moves away from zero, the detection procedure is powerful, as the empirical detection rate approaches 1. For the detection procedure with randomization level $\tau_0 = 2$, the setting with $\delta = 0.025$ corresponds to an indistinguishable region, where it is challenging to detect the signal. However, as $\delta$ reaches 0.05, the detection rate reaches 0.800 for $n$ = 600 and 0.944 for $n$ = 1,200. As characterized by theory, a larger randomization level requires a higher value of $\delta$ for the signal to be detected; for example, for $\tau_0 = 4$, it is not until $\delta$ reaches 0.075 that the detection rate reaches 0.8 for $n$ = 600 and 0.968 for $n$ = 1,200. The corresponding semi-supervised setting shows a similar phenomenon to the supervised setting but tends to be easier than the supervised setting due to the unlabelled data. The results are reported in the supplementary materials.

                    n = 600                         n = 1,200
δ       βᵀΣβ        D(0)   D(2)   D(4)   D(6)       D(0)   D(2)   D(4)   D(6)
0.000   0.000       1.000  0.48   0.08   0.066      1.000  0.4    0.076  0.068
0.025   0.091       1.000  0.48   0.094  0.06       1.000  0.54   0.4    0.086
0.050   0.365       1.000  0.800  0.356  0.8        1.000  0.944  0.47   0.64
0.075   0.821       1.000  1.000  0.80   0.54       1.000  1.000  0.968  0.764
0.100   1.460       1.000  1.000  1.000  0.94       1.000  1.000  1.000  0.99
0.125   2.281       1.000  1.000  1.000  0.996      1.000  1.000  1.000  1.000
0.150   3.285       1.000  1.000  1.000  1.000      1.000  1.000  1.000  1.000

Table 2: Empirical detection rates in the supervised setting. The column indexed with δ represents the signal strength, where the signal is of sparsity 50 and of the form δ · (1, ..., 1, 0, ..., 0); the column indexed with βᵀΣβ represents the value of $\beta^\top\Sigma\beta$; the columns under n = 600 and n = 1,200 correspond to sample sizes 600 and 1,200 respectively, where the column indexed with $D(\tau_0)$ reports the empirical detection rate for the detector $D(\tau_0)$.

5.3 Prediction Loss Evaluation

In this subsection, the high-dimensional regression vector $\beta$ is generated with sparsity 10, where $\beta_j = j/5$ for $1 \le j \le 10$ and $\beta_j = 0$ for $j \ge 11$. Let $\hat\beta(\lambda)$ denote the Lasso estimator based on an independent training data set $(X^{0}, y^{0})$ with sample size $n_0 = 600$,

$$\hat\beta(\lambda) = \arg\min_{\beta\in\mathbb{R}^p}\ \frac{\|y^{0} - X^{0}\beta\|_2^2}{2 n_0} + \lambda\sum_{j=1}^{p}\frac{\|X^{0}_{\cdot j}\|_2}{\sqrt{n_0}}\,|\beta_j|.$$

We consider the inference problem for the out-of-sample prediction accuracy $(\hat\beta(\lambda) - \beta)^\top\Sigma(\hat\beta(\lambda) - \beta)$. Specifically, we consider three estimators $\hat\beta(\lambda_0)$, $\hat\beta(5\lambda_0)$ and $\hat\beta(10\lambda_0)$ with λ 0 = 4 Z 0./p 0

and report the numerical performance of both point and interval estimators of the corresponding prediction accuracy. We consider the prediction accuracy problem across three different sample sizes, {600, 1200, 2400}, and introduce different randomization levels. We use $PA(\tau_0)$ to denote the procedure with randomization level $\tau_0$.

                               β̂(λ0)                   β̂(5λ0)                  β̂(10λ0)
True Accuracy                  0.065                   0.636                   .30
                               PA(0)  PA(2)  PA(4)     PA(0)  PA(2)  PA(4)     PA(0)  PA(2)  PA(4)
Super, 600    Coverage         0.66   0.938  0.96      0.778  0.98   0.956     0.94   0.934  0.948
              Est Aver         0.57   0.58   0.60      0.739  0.743  0.746     .47    .4     .44
              Lower Aver       0.090  0.000  0.000     0.584  0.373  0.058     .06    .93    .665
              Upper Aver       0.3    0.50   0.837     0.895  .      .434      .77    .909   3.83
Semi, 600     Coverage         0.80   0.938  0.96      0.800  0.936  0.96      0.930  0.944  0.958
              Est Aver         0.54   0.55   0.57      0.76   0.79   0.73      .385   .388   .39
              Lower Aver       0.088  0.000  0.000     0.58   0.364  0.047     .0     .950   .664
              Upper Aver       0.0    0.499  0.834     0.87   .094   .48       .667   .87    3.9
Super, 1200   Coverage         0.480  0.968  0.976     0.890  0.960  0.974     0.964  0.964  0.970
              Est Aver         0.07   0.08   0.09      0.684  0.686  0.688     .356   .358   .360
              Lower Aver       0.067  0.000  0.000     0.575  0.40   0.90      .099   .004   .80
              Upper Aver       0.46   0.355  0.599     0.793  0.95   .85       .63    .7     .909
Semi, 1200    Coverage         0.494  0.970  0.976     0.898  0.958  0.97      0.954  0.954  0.968
              Est Aver         0.06   0.07   0.08      0.680  0.68   0.684     .348   .350   .35
              Lower Aver       0.066  0.000  0.000     0.576  0.48   0.87      .33    .06    .8
              Upper Aver       0.45   0.354  0.598     0.783  0.946  .80       .563   .675   .883
Super, 2400   Coverage         0.738  0.97   0.978     0.96   0.954  0.97      0.948  0.944  0.958
              Est Aver         0.083  0.084  0.084     0.663  0.663  0.66      .340   .340   .340
              Lower Aver       0.058  0.000  0.000     0.585  0.47   0.306     .54    .085   .945
              Upper Aver       0.09   0.60   0.434     0.74   0.853  .09       .56    .594   .734
Semi, 2400    Coverage         0.74   0.97   0.978     0.9    0.96   0.976     0.950  0.950  0.960
              Est Aver         0.083  0.083  0.083     0.66   0.66   0.66      .337   .337   .336
              Lower Aver       0.058  0.000  0.000     0.587  0.47   0.306     .73    .098   .95
              Upper Aver       0.08   0.60   0.434     0.736  0.85   .07       .50    .576   .7

Table 3: Inference for prediction accuracy $(\hat\beta(\lambda) - \beta)^\top\Sigma(\hat\beta(\lambda) - \beta)$. The table reports six settings, corresponding to three different sample sizes (600, 1200, 2400) and the supervised and semi-supervised settings. For example, "Super, 600" stands for the supervised setting with sample size $n$ = 600 and "Semi, 600" stands for the semi-supervised setting with sample size $n$ = 600. The true prediction accuracy of the three estimators $\hat\beta(\lambda_0)$, $\hat\beta(5\lambda_0)$ and $\hat\beta(10\lambda_0)$ is reported as 0.065, 0.636 and .30. Three prediction accuracy evaluators PA(0), PA(2) and PA(4) are reported, where PA(0) is the evaluator with no randomization, PA(2) is the evaluator with randomization level $\tau_0 = 2$ and PA(4) is the evaluator with randomization level $\tau_0 = 4$. For each setting, the row indexed with Coverage reports the empirical coverage of the corresponding confidence intervals over 500 simulations; the row indexed with Est Aver reports the sample average of the corresponding point estimators over 500 simulations; the rows indexed with Lower Aver and Upper Aver report the sample averages of the lower and upper limits of the interval estimators over 500 simulations.