Optimal Estimation of Co-heritability in High-dimensional Linear Models

Size: px

Start display at page:

Download "Optimal Estimation of Co-heritability in High-dimensional Linear Models"

Amice Cain
5 years ago
Views:

1 Optimal Estimatio of Co-heritability i High-dimesioal Liear Models Zijia Guo, Wajie Wag,, T. Toy Cai, ad Hogzhe Li Departmet of Statistics, The Wharto School, Uiversity of Pesylvaia Departmet of Biostatistics ad Epidemiology, Perelma School of Medicie, Uiversity of Pesylvaia Abstract Co-heritability is a importat cocept that characterizes the geetic associatios withi pairs of quatitative traits. There has bee sigificat recet iterest i estimatig the co-heritability based o data from the geome-wide associatio studies (GWAS. This paper itroduces two measures of co-heritability i the highdimesioal liear model framework, icludig the ier product of the two regressio vectors ad a ormalized ier product by their legths. Fuctioal de-biased estimators (FDEs are developed to estimate these two co-heritability measures. I additio, estimators of quadratic fuctioals of the regressio vectors are proposed. Both theoretical ad umerical properties of the estimators are ivestigated. I particular, miimax rates of covergece are established ad the proposed estimators of the ier product, the quadratic fuctioals ad the ormalized ier product are show to be rate-optimal. Simulatio results show that the FDEs sigificatly outperform the aive plug-i estimates. The FDEs are also applied to aalyze a yeast segregat data set with multiple traits to estimate heritability ad co-heritability amog the traits. 0 Zijia Guo is a Ph.D. studet ( zijguo@wharo.upe.edu. Wajie Wag is a Postdoctoral Research Fellow ( wajiew@wharto.upe.edu. T. Toy Cai is Dorothy Silberberg Professor of Statistics ( tcai@wharto.upe.edu. The research of T. Toy Cai was supported i part by NSF Grats DMS-0898 ad DMS , ad NIH Grat R0 CA7334. Hogzhe Li is Professor ( hogzhe@upe.edu. The research of Wajie Wag ad Hogzhe Li was supported i part by NIH grats CA7334 ad GM

2 Keywords: Geetic correlatios, geome-wide associatio studies, ier product, miimax rate of covergece, quadratic fuctioal. Itroductio. Motivatio ad Backgroud Geome-wide associatio studies (GWAS have ot oly led to idetificatio of thousads of geetic variats or sigle ucleotide polymorphisms (SNPs that are associated various complex pheotypes (Maolio, 00, they also provide importat iformatio to estimate the geetic heritability, a key populatio parameter that ca help uderstad the geetic architecture of complex traits (Yag et al., 00. Results from these GWAS have also show that may complex pheotype shared commet geetic variats, icludig various autoimmue diseases (Zherakova et al., 009 ad psychiatric disorders (Lee et al., 03. These empirical evidece of shared geetic etiology for various pheotypes ca iform osology ad ecourages the ivestigatio of commo pathophysiologies for related disorders that ca be explored for drug repositioig. There has bee a recet iterest i estimatig the geetic correlatios betwee two traits based o the geome-wide aggregate effects of causal variats affectig both traits. The cocept of co-heritability has bee proposed to describe the geetic associatios withi pairs of quatitative traits based o GWAS data. This is i cotrast to the traditioal approaches estimatig co-heritability based o twi or family studies. (Baradat, 976 first geeralized the cocept of heritability ad defied the coefficiet of geetic predictio betwee two traits as the ratio of additive geetic covariace over the product of the pheotypic stadard deviatio of each trait. This idea was recetly exteded to defie co-heritability usig GWAS data by estimatig the geetic risk scores (GRSs for oe pheotype ad their associatio with a secod pheotype (Purcell et al., 009; Wray et al., 007. This defiitio is ad hoc ad does ot accout for the fact oly a very small set of SNPs are expected to be associated with either of the two traits. Lee et al. (0 ad Yag et al. (03 exteded the mixed-effect model framework for

3 estimatig the heritability to estimate geetic covariace ad geetic correlatio betwee two traits. I their models, each idividual s pheotype is affected by a geetic radom effect, which is correlated across idividuals by virtue of sharig some of the geetic variats affectig the traits, ad a evirometal radom effect, which is ucorrelated across idividuals. They defie co-heritability as the square-root of the ratio of the covariace of the geetic radom effects to the product of the total variaces. The mixed-effect model approach requires kowledge of the idetity of the causal SNPs, ad hece the covariace matrix. This is however ot available, Lee et al. (0 ad Yag et al. (03 approximate the geetic correlatio betwee every pair of idividuals across the set of causal SNPs by the geetic correlatio across the set of all geotyped SNPs. The cocept of usig a estimated correlatio matrix istead of the actual correlatio matrix i estimatig the heritability is a questioable heuristic (Gola ad Rosset, 0. The very large umber of SNPs used for estimatig the geetic correlatios, most of them likely ot causative, might lead to biased estimate of correlatios due the set of causal SNPs. This has bee clearly demostrated by Gola ad Rosset (0. The goal of this paper is to defie two quatities that ca be used for measurig the co-heritability betwee a pair of traits based o GWAS data i the framework of highdimesioal liear models. Istead of usig the mixed-effects model framework, our defiitio of the co-heritability is similar i spirit to that of Baradat (976. The proposed estimatio procedure ivolves selectig the relevat geetic variats, which overcomes the problem of most of the SNPs i GWAS are ot relevat to either of the two traits.. Problem Formulatio I this paper, a pair of trait values (y, w are modeled as a liear combiatio of p geetic variats ad a error term that icludes evirometal ad umeasured geetic effects, y = X pβ p + ɛ ad w = Z pγ p + δ, ( where the rows X i are i.i.d. p-dimesioal Sub-gaussia radom vectors with mea µ x ad covariace matrix Σ, the rows Z i are i.i.d. p-dimesioal Sub-gaussia radom vectors with mea µ z ad covariace matrix Γ ad the error (ɛ, δ follows the multivariate ormal 3

4 distributio with mea zero ad covariace σ I 0 0 σi ad is assumed to be idepedet of X ad Z. Throughout the paper there are o particular assumptios o the relatio betwee the radom desig matrices X ad Z, where they ca be idetical, correlated or idepedet. I the study of geetic co-heritability, the pairs of traits y ad w are assumed to have mea zero, ad the jth colum of X, X j, ad the jth colum of Z, Z j, are the umerically coded geetic markers at the jth geetic variat ad are assumed to have variace. Uder this model, if the colums of X ad Z are idepedet, for the i-th observatio, Var(y i = j β j + σ = β + σ, ad Var(w i = j γ j + σ = γ + σ, therefore β ad γ ca the be iterpreted as the arrow sese heritability. Based o this model, oe measure of geetic co-heritability is the ier product of the regressio coefficiets I (β, γ = β, γ ( which measures the shared geetic effects betwee these two traits. Alteratively, a ormalized ier product, that is the ratio R (β, γ = β, γ β γ ( β γ > 0, (3 ca also be used. The deomiator β ad γ represet the total geetic variats for two traits where the colums of the desig matrix are idepedet. I the case where oe of β ad γ is vaishig, the ratio is defied to be zero, which meas that there is o correlatio betwee two traits whe oe of the regressio vector is zero. With this ormalizatio, R (β, γ is always betwee - ad ad ca be used to compare co-heritability amog multiple pairs. The focus of this paper is to develop estimators for I (β, γ ad R (β, γ based o two idepedet GWAS data with geotype data measured o the same set of geetic markers, deoted by (y i, X i, i =,, ad (w i, Z i, i =,,. 4 A aive estimator is to

5 estimate β ad γ first ad the plug i the estimators of β ad γ ito the expressios ( ad (3. For the problem of iterest, usually there are may more geetic markers tha the sample size, that is p max{, }. However, for ay trait, oe expect that oly a few of these markers have ozero effects. Oe ca apply ay high-dimesioal regressio methods such as Lasso (Tibshirai, 996 ad scaled Lasso (Su ad Zhag, 0 ad margial regressio with screeig (Fa et al., 0; McCarthy et al., 008 to estimate these sparse regressio coefficiets. However, such plug-i estimators have several drawbacks i terms of estimatig heritability ad co-heritability. The Lasso approach shriks the estimatio towards 0, i particular, some weak effects might be shruke to 0, yet the accumulatio of these weak effects may cotribute sigificat to the trait variability. It is possible that some geetic variats may have strog effects o oe trait ad weak effects o the other trait. Due to shrikage, the plug-i of Lasso type estimators fails to capture this part of cotributio to co-heritability from such geetic variats. Margial regressio calculates the regressio score betwee the trait ad each sigle marker (i.e., y ad X j, j p, ad scree for the large scores. This approach also suffers from the existece of weak effects, as the margial scores must be large eough to survive i the screeig step..3 Methods ad Mai Results I this paper, a ew estimator, fuctioal de-biased estimator of ier product ad ormalized ier product (FDE is proposed to estimate the co-heritability measures I (β, γ defied i ( ad R (β, γ defied i (3. The proposed estimator for I (β, γ ivolves two steps, icludig estimatig the ier product by the plug-i scaled Lasso estimator, ad the correctig the plug-i scaled Lasso estimator. Similar two-step procedures are proposed to estimate the quadratic fuctioals β ad γ. To estimate the ormalized ier product, we plug-i the estimators of the ier product ad quadratic fuctioals ito the defiitio (3. FDE is show to achieve the miimax optimal covergece rates of estimatig I (β, γ ad R (β, γ. The correctio step i FDE improves the estimatio accuracy through correctig the shrikage from Lasso ad is crucial to achieve the optimal covergece rate. As 5

6 demostrated i the simulatio studies, FDE cosistetly outperforms the plug-i estimator of the scaled Lasso estimator. I particular, FDE has sigificat improvemet of estimatig the co-heritability measures i the existece of some geetic markers havig weak effects o oe trait ad strog effects o the other trait. I additio, FDE does ot suffer from depedecy amog geetic markers. FDE works for a broad class of depedecy structure of geetic markers, ad it also allows depedecy betwee two desig matrices X ad Z. The theoretical aalysis give i Sectio 3 shows that the optimal rate of covergece for estimatig I (β, γ is ( ( β + γ + k log p + k log p, where p is the dimesio, is the sample size, ad k is the maximum sparsity of β ad γ. The optimal rate depeds ot oly o p, ad k, but also the sigal stregth β ad γ. Restrictig to the relatively strog sigals with mi{ β, γ } η 0 Ck log p/, where C is some costat, the optimal covergece rate of estimatig R (β, γ is ( + k log p + k log p η 0 η0. I cotrast to estimatig I (β, γ, the optimal rate scales to the iverse of the sigal stregth, represeted by /η 0. The estimators Î (β, γ ad R (β, γ proposed i Sectio 3 adaptively achieve the optimal rates for estimatig I (β, γ ad R (β, γ, respectively. I additio, the optimal rate of covergece for estimatig the quadratic fuctioals Q (β = β ad Q (γ = γ is also established..4 Orgaizatio of the Paper The rest of the paper is orgaized as follows. Sectio presets the procedures for estimatig I (β, γ, Q (β, Q (γ, ad R (β, γ i detail. I Sectio 3, miimax rates of covergece for the estimatio problems are established ad the proposed estimators are show to attai the optimal rates. I Sectio 4, simulatio studies are coducted to evaluate the empirical performace of the FDEs. A yeast cross data is used to illustrate the estimators i Sectio 5. The proofs of mai theorems are preset i Sectio 6. The remaiig proofs ad the exteded simulatio studies are give i the supplemetary materials. 6

7 .5 Notatio ad Defiitios Basic otatio ad defiitios that will be used i the rest of the paper are defied here. For a matrix X R p, X i, X j, ad X i,j deote respectively the i-th row, j-th colum, ad (i, j-th etry of the matrix X, X i, j deotes the i-th row of X excludig the j-th coordiate, ad X j deotes the sub-matrix of X excludig the j-th colum. Let [p] = {,,, p}. For a subset J [p], X J deotes the sub-matrix of X cosistig of colums X j with j J ad for a vector x R p, x J is the sub-vector of x with idices i J ad x J is the sub-vector with idices i J c. For a vector x R p, the l q orm of x is defied as x q = ( q i= x i q q for q 0 with x 0 deotig the cardiality of o-zero elemets of x ad x = max j p x j. For a matrix A ad q, A q = sup x q= Ax q is the matrix l q operator orm. I particular, A is the spectral orm. For a symmetric matrix A, λ mi (A ad λ max (A deote respectively the smallest ad largest eigevalue of A. For a set S, S deotes the cardiality of S. For a R, a + = max {a, 0} ad sig(a is the sig of a, i.e., sig(a = if a > 0, sig (a = if a < 0 ad sig (0 = 0. Defie the Sub-gaussia orm x ψ of x R p as x ψ = sup v S p sup q (E v x q /q q where S p is the uit sphere i R p. The radom vector x R p is defied to be Sub-gaussia if its correspodig Sub-gaussia orm is bouded; see Vershyi (0 for more o Subgaussia radom variables. For the desig matriices X R p ad Z R p, we defie the correspodig sample covariace matrices as Σ = X X/ ad Γ = Z Z/. z α/ deote the upper α/ quatile of the stadard ormal distributio. For two positive sequeces a ad b, a b meas a Cb for all ad a b if a b ad b a. c ad C are used to deote geeric positive costats that may vary from place to place. Let Estimatio Methods Details of the FDE estimatio procedures are give i this Sectio. Estimatio of I (β, γ i Sectio. is cosidered first ad the estimatio of the quadratic fuctioals Q (β = β ad Q (γ = γ is give i Sectio.. A estimator of the ormalized ier product R (β, γ is give i Sectio.3 by combiig the estimators of I (β, γ, Q (β ad Q (γ developed i Sectios. ad.. 7

8 . Estimatio of I (β, γ Estimatio of the ier product I (β, γ = β, γ is cosidered first sice I (β, γ is of sigificat iterest i its ow right. I additio, costructig a good estimator of I (β, γ is also a key step i developig a optimal estimator for the ormalized ier product R (β, γ. The scaled Lasso estimators for high-dimesioal liear model ( are defied through the followig optimizatio algorithm (Su ad Zhag, 0, ad where λ 0 =.0 log p. { β, y Xβ ˆσ } = arg mi + σ β R p,σ R + σ + λ 0 w Zγ { γ, ˆσ } = arg mi + σ γ R p,σ R + σ + λ 0 p j= p j= X j β j, (4 Z j γ j, (5 A atural estimator of the ier product I (β, γ ca the be obtaied by simply pluggig i the scaled Lasso estimators β ad γ give i (4 ad (5 respectively. However, as we will show, β, γ is ot a good estimate of I (β, γ i geeral due to the bias i the Lasso or scaled Lasso estimators ad other reasos. A alterative is to use the de-biased estimator, which was origially itroduced for the costructio of cofidece itervals (Cai ad Guo, 06a; Javamard ad Motaari, 04; va de Geer et al., 04; Zhag ad Zhag, 04. Pluggig i the de-biased estimator does ot lead to a good estimator either; it has large variace. See Reamrk for further discussios. To costruct a optimal estimator of I (β, γ, it is helpful to decompose the differece betwee β, γ ad β, γ ito three terms, β, γ β, γ = γ, β β + β, γ γ β β, γ γ. (6 The last term o the right had side, β β, γ γ is small, but the first two terms γ, β β ad β, γ γ ca be large. Our approach is to first estimate these two terms ad the subtract them from β, γ to obtai the fial estimate of I (β, γ. Some ituitio for estimatig γ, β β is give here. Sice ( X y X β = Σ ( β β + X ɛ, (7 8

9 multiplyig both sides of (7 by a vector u R p yields ( u X y X β ( = u Σ β β + u X ɛ, (8 which ca be writte as ( u X y X β γ, β β = ( ( Σu γ β β + u X ɛ. (9 If the vector u ca be chose such that the right had side of (9 is small, the u X (y X β / is a good estimate of γ, β β. To costruct u, ote that the first term o the right had side of (9 ca be further ( ( β upper bouded as Σu γ β Σu γ β β. To estimate γ, β β, ( ( a projectio vector u is costructed such that Σu γ β β is cotrolled through costraiig the term Σu γ ad the secod term of (9 u X ɛ/, which has mea 0, is cotrolled through miimizig its variace σ u Σu/. This leads to the followig covex optimizatio algorithm for idetifyig the projectio vector u for estimatig γ, β β, { } û = arg mi u λ Σu : Σu γ γ, (0 u R p where λ = λ max (Σ log p. Remark. The solutio of the above optimizatio problem might ot be uique ad û is defied as ay miimizer of the optimizatio problem. The theory established i Sectio 3 still holds for ay miimizer of (0. The optimizatio problem (0 is solved through its equivalet Lagrage dual problem, which is computatioally efficiet ad scales well to the high-dimesioal problem. See Step i Table for more details. Oce the projectio vector û is obtaied, γ, β β is the estimated by û X (y X β /. Similarly, the projectio vector for estimatig β, γ γ ca be obtaied via the covex algorithm { û = arg mi u Γu } : Γu β β λ, ( u R p where λ = λ max (Γ log p. The β, γ γ is estimated by û Z (w Z γ /. The fial estimator Î (β, γ of I (β, γ is give by Î (β, γ = β, γ + û ( X y X β + û Z (w Z γ. ( 9

10 It is clear from the above discussio that the key idea for the costructio of the fial estimator Î (β, γ is to idetify the projectio vectors û ad û such that γ, β β ad β, γ γ are well estimated. It will be show i Sectio 3 that the estimator Î (β, γ is adaptively miimax rate-optimal. Remark. As metioed before, simply pluggig i the Lasso, scaled Lasso, or de-biased estimator does ot lead to a good estimator of I (β, γ. Aother atural approach is to first threshold the de-biased estimator to obtai a sparse estimator of the coefficiet vectors (see details i Zhag ad Zhag (04, Sectio 3.3, Guo et al. (06, equatio (0 ad the plug i. This estimator is referred as the thresholded estimator. Simulatios i Sectio 4 demostrate that the proposed estimator defied i ( outperforms the three plug-i estimators usig the scaled Lasso, de-biased, ad thresholded estimators.. Estimatio of Q (β ad Q (γ I order to estimate the ormalized ier product R (β, γ, it is ecessary to estimate the quadratic fuctioals Q (β = β ad Q (γ = γ. To this ed, we radomly split the data (y, X ito two subsamples ( y (, X ( with sample size / ad ( y (, X ( with sample size / ad the data (w, Z ito two subsamples ( w (, Z ( with sample size / ad ( w (, Z ( with sample size /. With a slight abuse of otatio, let β ad γ deote the optimizers of the scaled Lasso algorithm (4 applied to ( y (, X ( ad (5 applied to ( w (, Z (, respectively. For the scaled Lasso algorithms, the sample sizes ad are replaced by / ad /, respectively. Agai, the simple plug-i estimator β of Q (β ad γ of Q (γ do ot perform well. Note that β β = β, β β β β. (3 The secod term o the right had side of (3 is small, but the first ca be large. The plug-i estimator β does ot estimate β well for this reaso. Similar ideas as i estimatig β, γ are applied here to estimate β. Specifically, the term β, β β is estimated first ad the is added it to β to obtai the fial estimate of β. To estimate 0

11 β, β β, a projectio vector uis idetified such that the followig differece is cotrolled, ( / u X ( ( y ( X ( β β, β β ( = u Σ( β ( β β + ( / u X ( ɛ, with Σ ( = ( X ( X ( /( /. Let the projectio vector û 3 be the solutio to the followig optimizatio algorithm û 3 = arg mi u R p { u Σ( u : Σ ( u β β λ / } (4, (5 where λ = λ max (Σ log p. We the estimate β, β β ( by û ( 3 X ( y ( X ( β /( / ad propose the fial estimator of β as ( Q (β = β + û 3 / Similarly, the estimator of γ is give by ( Q (γ = γ + û 4 / where û 4 = arg mi u { ( ( X ( y ( X ( β. (6 + ( Z ( ( w ( Z ( γ, (7 + u Γ( u : Γ ( u γ γ λ / with Γ ( = ( Z ( ( Z ( /( / ad λ = λ max (Γ log p. }, (8 Remark 3. Sample splittig is used here for the purpose of the theoretical aalysis. I the Simulatio Sectio, the performace of the proposed estimator without sample splittig is ivestigated; see Steps 5-7 i Table. The proposed estimator without sample splittig performs eve better umerically tha with sample splittig sice more observatios are used i costructig the iitial estimators β ad γ ad the projectio vectors û 3 ad û 4..3 Estimatio of R (β, γ Give the estimators Î (β, γ, Q (β ad Q (γ costructed i Sectios. ad., a atural estimator for the ormalized ier product R (β, γ is give by Î (β, γ } R (β, γ = sig (Î (β, γ mi Q (β Q { Q (β Q (γ > 0,, (9 (γ

12 where Î (β, γ, Q (β ad Q (γ are estimators of β, γ, β ad γ defied i (, (6 ad (7, respectively. It is possible that oe of Q (β ad Q (γ is 0 if β ad γ are close to zero. I this case, the ormalized ier product R (β, γ is estimated as 0. Sice R (β, γ is always betwee ad, the estimator R (β, γ is trucated to esure that it is withi the rage. The FDE algorithm without sample splittig for calculatig the estimators Î (β, γ, Q (β, Q (γ, ad R (β, γ is detailed i Table. 3 Theoretical Aalysis The theoretical properties of the estimators itroduced i the last sectio are ivestigated i this secrtio. The upper bouds ad lower bouds together show that the proposed estimators Î (β, γ, Q (β, Q (γ, ad R (β, γ achieve the optimal rates of covergece. 3. Upper Boud Aalysis The samples sizes ad are assumed to be of the same order, that is. Let = mi{, } be the smallest of two sample sizes. The followig assumptios are eeded to facilitate the theoretical aalysis. (A The covariace matrices Σ ad Γ satisfy /M λ mi (Σ λ max (Σ M ad /M λ mi (Γ λ max (Γ M, ad the oise levels σ ad σ satisfy max{σ, σ } M, where M ad M > 0 are positive costats. (A The l orms of the coefficiet vectors β ad γ are bouded away from zero i the sese that mi { β, γ } η 0 C k log p, where k = max{ β 0, γ 0 }. (0 Assumptio (A places a coditio o the spectrum of the covariace matrices Σ ad Γ ad a upper boud o the oise levels σ ad σ. Assumptio (A requires that the sigal has to be bouded away from zero by η 0, which is oly used i the upper boud aalysis of the ormalized ier product R (β, γ.

13 Table : FDE algorithm without sample splittig for estimatig the ier product, legths ad ormalized ier product. Iput: desig matrices: X, Z; respose vectors: y, w; tuig parameters λ 0,λ. Output: Î (β, γ, Q (β, Q (γ ad R (β, γ. Iitial Lasso estimators:. Scaled Lasso: Calculate β ad γ from (4 ad (5 with the tuig parameter λ 0. Ier product calculatio:. Projectio vector û : Calculate û = arg mi u u X Xu/4 + u γ + λ t u, where λ t = λ t /.5, ad λ 0 = λ/. Repeat util û ca ot be solved with λ replaced by λ t+, or t Projectio vector û : Calculate û = arg mi u u Z Zu/4 + u β + λ u, where λ = λ t /.5, ad λ 0 = λ/. Repeat util û ca ot be solved with λ replaced by λ t+, or t Correctio: Î (β, γ = β, γ + û X (y X β/ + û Z (w Z γ/. Quadratic fuctioal calculatio: 5. Projectio vector û 3 : Calculate û 3 = arg mi u u X Xu/4 + u β + λ u, where λ = λ t /.5, ad λ 0 = λ/. Repeat util û 3 ca ot be solved with λ replaced by λ t+, or t Projectio vector û 4 : Calculate û 4 = arg mi u u Z Zu/4 + u γ + λ u, where λ = λ t /.5, ad λ 0 = λ/. Repeat util û ca ot be solved with λ replaced by λ t+, or t Correctio: Q (β = ( β + û 3X (y X β/ +, Ratio calculatio: Q (γ = ( γ + û 4Z (w Z γ/ R (β, γ = sig( Î (β, γ mi {( Î (β, γ / Q (β Q } } (γ { Q (β Q (γ > 0,. 3

14 The followig theorem establishes the covergece rates of the estimators Î (β, γ, Q (β ad Q (γ, proposed i (, (6 ad (7, respectively. Theorem. Suppose the assumptio (A holds ad k log p/ 0 with k = max { β 0, γ 0 }. The for ay fixed costat 0 < α < 0.5, with probability at least α p c, we have ( Î (β, γ I (β, γ zα/ C ( β + γ + k log p + C k log p, ( ( zα/ Q (β Q (β C β + k log p + C k log p, ( ad ( zα/ Q (γ Q (γ C γ + k log p where c ad C are positive costats. + C k log p, (3 The upper boud of estimatig β, γ ot oly depeds o k, ad p, but also scales to the sigal stregths β ad γ. For the estimatio of the quadratic fuctioal Q (β, the covergece rate depeds o β. The similar pheomeo holds for estimatio of Q (γ. The followig theorem establishes the covergece rate of the estimator R (β, γ proposed i (9. Theorem. Suppose the assumptios (A ad (A hold ad k log p/ 0 with k = max { β 0, γ 0 }. The for ay fixed costat 0 < α < 0.5, with probability at least α p c, we have R (β, γ R (β, γ C ( zα/ + k log p + C η 0 η0 where c ad C are positive costats. k log p, (4 I cotrast to Theorem, Theorem requires the extra assumptio (A o the sigal stregths β ad γ. The covergece rate of estimatig R (β, γ is scaled to the iverse of sigal stregths, /η 0. This is differet from the error boud i Theorem, where the estimatio accuracy is scaled to the l orm. The lower boud results established i Theorem 3 will demostrate the ecessity of Assumptio (A for estimatio of R (β, γ. 4

15 3. Miimax Lower Boud This sectio establishes the miimax lower bouds that show the optimality of the proposed estimators i Sectio. We first itroduce the parameter spaces for θ = (β, Σ, σ, γ, Γ, σ, which is the product of parameter spaces for θ β = (β, Σ, σ ad θ γ = (γ, Γ, σ. To start with, we itroduce the followig parameter space for both θ β = (β, Σ, σ ad θ γ = (γ, Γ, σ, G (k = { (β, Σ, σ : β 0 k, } λ mi (Σ λ max (Σ M, σ M, (5 M where M ad M > 0 are positive costats. The parameter space defied i (5 requires the sparsity β 0 k. The other coditios /M λ mi (Σ λ max (Σ M ad σ M are regularity coditios. Based o the defiitio (5, the parameter space for (β, Σ, σ, γ, Γ, σ is defied as a product of two parameter spaces, Θ(k = G(k G(k = {θ = (β, Σ, σ, γ, Γ, σ : (β, Σ, σ G(k, (γ, Γ, σ G(k}. Defie a proper subspace G (k, η 0 of G (k as (6 G (k, η 0 = G (k { β η 0 }, (7 where η 0 0. Compared with G (k, the parameter space G (k, η 0 is restricted to sigals whose l orm is bouded away from 0 by η 0. A proper subspace of Θ(k for θ = (β, Σ, σ, γ, Γ, σ ca be defied as Θ(k, η 0 = {θ = (β, Σ, σ, γ, Γ, σ : (β, Σ, σ G(k, η 0, (γ, Γ, σ G(k, η 0 }. (8 The followig theorem establishes the miimax lower bouds for the covergece rates of estimatig the ier product β, γ, the quadratic fuctioals Q (β ad Q (γ ad the ormalized ier product R (β, γ. Theorem 3. Suppose k c mi {/log p, p ν } for some costats c > 0 ad 0 < ν < /, the if Î sup θ Θ(k ( ( P θ Î I (β, γ c (( β + γ 5 + k log p + k log p 4, (9

16 if R if Q if Q ( ( ( sup P θβ Q Q (β c β θ β G(k ( sup P θγ Q Q (γ c θ γ G(k ( sup P θ R R (β, γ c mi θ Θ(k,η 0 where c > 0 is a positive costat. + k log p ( ( γ + k log p { ( + k log p + η 0 η0 + k log p 4, (30 + k log p k log p, 4, (3 } 4, (3 Combied with Theorem, the above theorem implies that the estimators Î (β, γ, Q (β, Q (γ proposed i (,(6 ad (7 achieve the miimax lower bouds (9, (30 ad (3, so they are miimax rate-optimal estimators of I (β, γ, Q (β ad Q (γ, respectively. For estimatio of R (β, γ, uder the assumptio that η 0 C k log p/, Theorem shows that the estimator R (β, γ proposed i (9 achieves the miimax lower boud /η 0 (/ + k log p/ + /η 0 k log p/ i (3. Hece R (β, γ is the rate-optimal estimator of R (β, γ uder the assumptio that the sigal stregths β ad γ are bouded away from zero. However, if η 0 c k log p/, the estimatio of R (β, γ is ot iterestig sice the trivial estimator 0 will achieve the costat miimax lower boud i this case, which demostrates the ecessity of Assumptio (A. 4 Simulatio Evalulatios ad Comparisos I this sectio, the performace of several estimators for I (β, γ ad R (β, γ were compared i two differet settigs. These estimators icluded plug-i scaled Lasso estimator (Su ad Zhag, 0, plug-i de-biased estimator (Cai ad Guo, 06a; Javamard ad Motaari, 04; va de Geer et al., 04; Zhag ad Zhag, 04, plug-i thresholded estimator (Zhag ad Zhag, 04, Sectio 3.3 ad the proposed estimator FDE. Details about these estimators are listed as follows. Plug-i scaled Lasso estimator (Lasso: The ier product I (β, γ is estimated by β, γ ad the ormalized ier product R (β, γ is estimated by [ β, γ / ( β ] { γ β } γ > 0. 6

17 Plug-i de-biased estimator (De-biased: Deote the de-biased Lasso estimators as β ad γ. The ier product I (β, γ is estimated by β, γ ad the ormalized ier [ product R (β, γ is estimated by β, γ / ( β ] { γ β } γ > 0. Plug-i thresholded estimator (Thresholded: Deote the thresholded estimators as β ad γ. The ier product I (β, γ is estimated by β, γ ad the ormalized ier product R (β, γ is estimated by [ β, γ / ( β γ ] { β γ > 0 }. FDE: The ier product I (β, γ is estimated by Î (β, γ i ( ad the ratio R (β, γ is estimated by R (β, γ i (9. We cosider FDE with sample splittig (FDE-S ad without sample splittig (FDE-NS for R (β, γ. Implemetatio of the de-biased, thresholded ad FDE estimators requires the scaled Lasso estimators β ad γ i the iitial step. The scaled Lasso estimator was implemeted by the equivalet square-root Lasso algorithm (Belloi et al., 0. The theoretical tuig parameter is λ 0 =.0 log p/, which may be coservative i the umerical studies. Istead, the tuig parameter was chose as λ 0 = b.0 log p. However, the performaces of all estimators were evaluated across a grid of tuig parameter values b {.5,.5,.75, } (see Supplemetary Material, Sectio A.. The results showed that b =.5 was a good choice for all the estimators. Hece, λ 0 =.5.0 log p/ was used for the umerical studies i this sectio ad Sectio 5. To implemet the FDE algorithm, the other tuig parameter λ as.0 log p/ for the correctio Steps,3,5 ad 6 are give i Table. Comparisos of estimates of I (β, γ ad R (β, γ are preseted below. Results o estimatig the quadratic fuctioals are preseted i the Supplemetary Material, Sectio A.. For each settig, with the parameters (p,,, s, s, s, Σ, Γ, F β, F γ specified, the followig steps were implemeted:. Geerate sets S [p] ad S [p], with S = s, S = s ad S S = s. For β R p ad γ R p, geerate β j F β ad γ k F γ, for j S, k S, ad set β j = 0 ad γ k = 0, for j S, k S.. Geerate X i i.i.d N(0, Σ, i, ad Z i i.i.d N(0, Γ, i. 7

18 3. Geerate the oise ɛ i i.i.d N(0,, i, ad δ i Geerate the outcome as y = Xβ + ɛ ad w = Zγ + δ. i.i.d N(0,, i. 4. With X, y, Z, ad w, estimate I (β, γ ad R (β, γ through differet estimators. 5. Repeat -4 for rep replicatios. For a give quatity T, deote the estimator from l-th replicatio as T(X, y, Z, w; l. The mea squared error (MSE is used to measure the performace of this estimator, MSE(T = rep rep l= ( T(X, y, Z, w; l T. (33 Experimet. I this experimet, the parameters were set as follows: (p,,, rep = (600, 400, 400, 300, the sparsity parameters (s, s, s = (5, 30, 5, ad the covariace matrices Σ ad Γ satisfy Σ ij = Γ ij = (0.8 i j. For give positive values τ ad τ, the sigals of β satisfy that β ji = ( + i/s τ /, for j i S, i =,,, s, ad the sigals i γ satisfy that γ j = τ for j S. This simulatio settig aimed to ivestigate the case where the coefficiets for oe regressio are much larger tha the other by varyig the sigal stregth parameters as (τ, τ {(3.0,., (.6,., (.,.3, (.8,.4, (.,.6, (.,.4, (.3,., (.4,.0}. The results are summarized i Table. For all combiatios of the sigal stregth parameters, i terms of estimatig the ier product I (β, γ, FDE cosistetly outperformed the plugi estimates with Lasso ad Thresholded Lasso. Moreover, with icreasig differece betwee τ ad τ, the advatage of FDE over the plugi estimate usig Lasso or thresholded Lasso became larger. The same results were observed for estimatio of the ormalized ier product R (β, γ, where FDE-NS had cosistet better performace tha other methods. Although De-biased performed well i terms of estimatig I (β, γ, it performed much worse tha FDE-NS for estimatig R (β, γ. As discussed i Sectio, the sample splittig of estimatig the ormalized ier product is simply proposed to facilitate the theoretical aalysis ad might ot be ecessary for the algorithm. Our simulatio results idicated that the proposed estimator without sample splittig (FDE-NS performed quite well i all settigs, eve better tha FDE-S, due to the fact that more samples were used for estimatio ad correctio steps. Such observatios 8

19 led us to use the proposed estimator without sample splittig (FDE-NS i the real data aalysis i Sectio 5. Table : Estimatio of the ier product I (β, γ ad the ormalized ier product R (β, γ: the truth ad mea squared error (MSE for the plug-i estimator with the scaled Lasso estimator (Lasso, plug-i estimator with the de-biased estimator (De-biased, plug-i estimator with the thresholded estimator (Thresholded, ad the proposed estimator Î (β, γ (FDE, proposed estimators with sample splittig (FDE-S ad without sample splittig (FDE-NS. I (β, γ R (β, γ Stregth parameters, (τ, τ Method (.8,.4 (.,.3 (.6,. (3,. (.,.6 (.,.4 (.3,. (.4, Truth Lasso De-biased Thresholded FDE Truth Lasso De-biased Thresholded FDE-S FDE-NS Experimet. I this experimet, the parameters were set as follows: (p,,, rep = (800, 400, 400, 300, sigal stregth parameters (τ, τ = (.,. ad the covariace matrices Σ ad Γ satisfy Σ ij = Γ ij = (0.8 i j. The sigals of β follow that β ji = ( + i/s τ /, for j i S, i =,,, s, ad the sigals i γ satisfy that γ j = τ for j S. This simulatio settig aimed to ivestigate the relatioship betwee the performace of estimators ad the sigal sparsity level ad vary the umber of sigals i β ad γ as (s, s {(40, 40, (50, 50, (60, 60, (70, 70, (80, 80, (90, 90, (00, 00, (0, 0}, ad fix the umber of commo sigals at s = 0. Sice the umber of the associated variats is very large for both coefficiet vectors, large values of τ ad τ iduce strog sigals so that 9

20 all the methods perform well. Istead, the sigal magitude was fixed at τ =. ad τ =. as weak sigals. The results are summarized i Table 3. Clearly, FDE outperformed the other methods. Whe the sigals became deser, the improvemet of FDE over other methods was more proouced. For estimatio of R (β, γ, the results showed that FDE-NS cosistetly outperformed other estimators. As the umber of sigals icreased, the MSE correspodig to FDE-NS decreased quickly. Table 3: Estimatio of the ier product I (β, γ ad the ormalized ier product R (β, γ: the truth ad MSE for the plug-i estimator with the scaled Lasso estimator (Lasso, plug-i estimator with the de-biased estimator (De-biased, plug-i estimator with the thresholded estimator (Thresholded, ad the proposed estimator Î (β, γ (FDE, proposed estimators with sample splittig (FDE-S ad without sample splittig (FDE-NS. I (β, γ R (β, γ Sparsity parameter, s (s = s Method Truth Lasso De-biased Thresholded FDE Truth Lasso De-biased Thresholded FDE-S FDE-NS

21 5 Co-heritability Estimatio of Yeast Cross Geome Wide Associatio Data Bloom et al. (03 reported a large scale geome-wide associatio study of 46 quatitative traits based o,008 Saccharomyces cerevisiae segregats betwee a laboratory strai ad a wie strai. The data set icluded,63 uique geotype markers. Sice may of these markers were highly correlated, Bloom et al. (03 further selected a set of 4,40 markers that are weakly depedet based i the likage disequilibrium iformatio. These markers were selected by pickig oe marker closest to each cetimorga positio o the geetic map. The maker geotype was coded as or, accordig to which strai it came from. For each segregat, the traits of iterest were the ed-poit coloy size ormalized by the cotrol growth uder 46 differet growth media, icludig Hydroge Peroxide, Diamide, Calcium, Yeast Nitroge Base (YNB ad Yeast extract Peptoe Dextrose (YPD, etc. To demostrate the co-heritability amog these traits, 8 traits were cosidered, icludig the ormalized coloy size uder Calcium Chloride (Calcium, Diamide, Hydroge Peroxide (Hydroge, Paraquat, Raffiose, 6 Azauracil (Azauracil, YNB, ad YPD. Each trait was ormalized to have variace, so the quadratic orm represets the total geetic effects for each trait. FDE was applied to every pair of these 8 traits without sample splittig, ad the results are summarized i Table 4, icludig estimates of the ier product for all possible pairs, the quadratic orm for each trait ad the ormalized ier product R (β, γ. The geetic heritability of these traits raged from 0. for Raffiose to 0.67 for YPD. About two thirds of these pairs had a estimated co-heritability smaller tha 0., idicatig relatively weak geetic correlatios amog these traits. To further demostrate the geetic relatedess amog these pairs, for each trait, a t- statistic was calculated based o regressig the trait value y o geetic geetic marker X j, for i j p. A larger absolute value of the t-statistic implied a stroger effect of the marker o the trait. For ay pair of traits, the scatter plot of the t-statistics provided a way of revealig the geetic relatioship betwee them. Figure shows the plots of several pairs of the traits, icludig the pairs with a large positive R (β, γ, YPD v.s. YNB ad Calcium v.s.

22 Table 4: FDE estimatio for the ier product, the quadratic orm (top row of each block ad the ormalized ier product (bottom row of each block amog 8 traits of yeast cross. Traits Calcium Diamide Hydroge Paraquat Raffiose Azauracil YNB YPD Calcium Diamide Hydroge Paraquat Raffiose Azauracil YNB YPD.6680 YNB, pairs with a large egative R (β, γ, Raffiose v.s. Hydroge ad Calcium v.s. Hydroge, ad pairs with R (β, γ ear to 0, icludig Paraquat v.s. Diamide ad Paraquat v.s. Raffiose. The plot clearly idicates a strog positive geetic correlatio betwee YPD ad YNB. The correlatio betwee Calcium ad YNB is slightly smaller. Raffiose/Hydroge ad Calcium/Hydroge pairs clearly showed egative geetic correlatio. There were several geetic variats with very large effects o Hydroge, but they were ot associated with the other traits such as Raffiose ad Calcium. The shared geetic variats were relatively weak, leadig to a correlatio aroud -0.. The two plots o the right show the pairs of trats with weak correlatios. These plots idicated that the proposed co-heritability measures ca

23 YPD YNB Raffiose Hydroge Paraquat Diamide Calcium 0 Calcium 0 Paraquat YNB Hydroge Raffiose Figure : Scatter plots of margial regressio t statistics for six pairs of traits, icludig the pairs with positive correlatio case (left pael, egative correlatio (middle pael, ad o correlatio (right pael. ideed capture the geetic sharig amog differet related traits. 6 Proofs 6. Proof of Theorem For simplicity of otatio, we assume = ad use = = to represet the sample size throughout the proof. The proofs ca be easily geeralized to the case. Without loss of geerality, we assume that the Sub-gaussia orm of radom vectors X i ad Z i are also upper bouded by M, that is, max { } X i ψ, Z i ψ M. Proof of ( 3

24 Note the followig importat decompositio, Î (β, γ I (β, γ = β, γ + û ( X y X β + û Z (w Z γ β, γ ( = û ( X y X β ( γ, β β + û Z (w Z γ β, γ γ β β, γ γ =û X ɛ + (û Σ ( γ β β + û Z δ + (û Γ β (γ γ β β, γ γ. (34 The followig lemmas are itroduced to cotrol the terms i (34 ad similar results were established i the aalysis of Lasso, scaled Lasso ad de-biased Lasso (Cai ad Guo, 06a; Re et al., 03; Su ad Zhag, 0; Ye ad Zhag, 00. The proofs of the followig lemmas ca be foud i the supplemetary material, Sectio C. Lemma. Suppose the assumptio (A holds ad k log p/ 0 with k = max { β 0, γ 0 }. The with probability at least p c, we have β log p log p β Ck, γ γ Ck, (35 ad β β C where c ad C are positive costats. k log p k log p, γ γ C, (36 Lemma. Suppose the assumptio (A holds ad k log p/ 0 with k = max { β 0, γ 0 }. The with probability at least p c, we have (û Σ ( γ β β β 0 log p C γ ad (û Γ β (γ γ C β γ 0 log p. (37 With probability at least α p c, û X ɛ C γ z α/, ad hece ad û Z δ C β z α/, (38 ( û X y X β γ, β β C γ z α/ k log p + C γ, û Z (w Z γ β, γ γ C β z α/ + C β k log p, where c ad C are positive costats. 4 (39

25 By the decompositio (34 ad the iequalities (36, (37 ad (38, we obtai that ( Î (β, γ I (β, γ C β ( z α/ + γ + k log p + C k log p. By (36, we establish (. Proof of ( ad (3 The proof of (3 is similar to that of ( ad oly the proof of ( is preset i the followig. We itroduce the estimator Q(β = β ( + û ( 3 X ( y ( X ( β /(/ ad due to the fact that Q (β is o-egative, we have Q (β Q (β Q(β Q (β. We decompose the differece betwee Q(β ad Q (β, Q(β Q (β = β β + û ( ( 3 X ( y ( X β ( / = (û 3 Σ ( β ( β β + û ( 3 X ( ɛ ( / β β. Combied with the above argumet, the upper boud ( follows from (36 ad the followig lemma. The proof of the followig lemma ca be foud i the supplemetary material Sectio C. Lemma 3. Suppose the assumptio (A holds ad k log p/ 0 with k = max { β 0, γ 0 }. The with probability at least p c α, ( û 3 X ( ɛ ( / C β z α/, (40 ad where c ad C are positive costats. (û 3 Σ ( β ( β β k log p C β, (4 6. Proof of Theorem Let µ = û X (y X β / ad µ = û Z (w Z γ /. We itroduce the ew estimator R(β, γ = Î (β, γ Q (β Q = β, γ + µ + µ. (β Q (β Q (γ 5

26 Sice R (β, γ, we have R (β, γ R (β, γ R(β, γ R (β, γ. We decompose the differece betwee R(β, γ ad R (β, γ, β = = β, γ + µ + µ β, γ Q (β Q (γ β γ β Q (β + β β β, Q (β γ Q (γ β β,, Q (β γ Q (γ β β, γ + γ γ + β γ, γ + γ β γ Q (γ γ µ +. γ Q (β Q (γ Q (β µ + µ Q (β Q (γ µ Q (γ The followig lemma cotrols the terms i the above equatio ad establishes the theorem. The proof of the followig lemma ca be foud i the supplemetary material Sectio C. Lemma 4. Suppose the assumptios (A ad (A hold ad k log p/ 0 with k = max { β 0, γ 0 }. The with probability at least p c α, we have β β γ, γ β Q (β γ Q (γ C k log p β γ, (43 β β γ µ, + β Q (β γ Q (β Q (γ C ( zα/ + k log p ( k log p + C β β + β γ, (44 ad β β, γ Q (γ γ + γ Q (β where c ad C are positive costats. µ 3 Q (γ C γ ( zα/ + k log p ( + C γ + By the decompositio (4, combiig (43, (44 ad (45, we have Î (β, γ β, γ Q (β Q (β β γ ( C + ( zα/ + k log p ( + C + k log p + β γ β γ β γ. Uder the assumptio (A, the upper boud (4 follows from the above iequality. 6 (4 β γ (45 k log p,

27 6.3 Proof of (30 ad (3 i Theorem 3 We first itroduce the otatios used i the proof of lower boud results. Let π deote the prior distributio supported o the parameter space H. Let f π (z deote the desity fuctio of the margial distributio of the radom variable Z with the prior π o H. More specifically, f π (z = f θ (z π (θ dθ. We defie the χ distace betwee two desity fuctios f ad f 0 by χ (f (z f 0 (z (f, f 0 = dz = f 0 (z f (z dz (46 f 0 (z ad the L distace by L (f, f 0 = f (z f 0 (z dz. It is well kow that L (f, f 0 χ (f, f 0. (47 The proof of the lower boud is based o the followig versio of Le Cam s Lemma (LeCam (973; Re et al. (03; Yu (997. Lemma 5. Let T (θ deote a fuctioal o θ. Suppose that H 0 = {θ 0 }, H 0, H Θ ad d = mi θ H T (θ T (θ 0. Let π deote a prior o the parameter space H. The we have if T sup θ H 0 H P θ ( d T T (θ L (f π, f θ0. (48 The proofs of (30 ad (3 are applicatios of Lemma 5. The key is to costruct the parameter spaces H 0 = {θ 0 }, H ad the prior o H such that (i H 0, H Θ,(iithe L distace L (f π, f θ0 is cotrolled ad (iiithe distace d = mi θ H T (θ T (θ 0 is maximized. I the followig, we provide the detailed proof of (30. The proof of (3 is very similar to that of (30 ad is omitted here. I the discussio of lower boud results, we will assume µ x = µ z = 0 ad the desig X i ad Z i follow joit ormal distributio. We defie the followig two parameter spaces to facilitate the discussio, { G S (k = G (k β M } { } k log p ad G W (k = G (k β C. (49 4 The lower boud (30 ca be decomposed ito the followig three lower bouds, ( if sup P θβ Q Q (β c k log p β Q θ β G S (k 4, (50 7

28 ad ( if sup P θβ Q Q (β c β Q θ β G S (k 4, (5 if Q ( sup P θβ Q Q (β c k log p θ β G W (k 4, (5 where c > 0 is a positive costat. Combiig (50 ad (5, we have ( ( if sup P θβ Q Q (β c β + k log p Q θ β G S (k 4. (53 O G S (k, we have ( β + k log p { ( = max β + k log p, k log p } ad o G W (k, we have k log p { ( = max β + k log p, k log p }. Sice both G S (k ad G W (k are proper subspace of G (k, we ca establish (30 by combiig (53 ad (5. I the followig, we will establish the lower bouds (50, (5 ad (5 separately. Proof of (50 Uder the Gaussia radom desig model, V i = (y i, X i R p+ follows a joit Gaussia distributio with mea 0. Let Σ v deote the covariace matrix of V i. For the idices of Σ v, we use 0 as the idex of y i ad {,, p} as the idices for (X i,, X ip R p. Decompose Σ v ito blocks Σv yy Σ ( v xy, where Σ v yy, Σ v xx ad Σ v xy deote the variace of y i, the variace of X i ad the covariace of y i ad X i, respectively. Let Ω = Σ deote the precisio matrix. There exists a bijective fuctio h : Σ v (β, Ω, σ ad the iverse mappig h : (β, Ω, σ Σ v, where h ((β, Ω, σ = β Ω β + σ β Ω ad Ω β Ω h(σ v = ( (Σ v xx Σ v xy, (Σ v xx, Σ v yy ( Σxy v (Σ v xx Σxy v. (54 Based o the bijectio, it is sufficiet to cotrol the χ distace betwee two multivariate Gaussia distributios. 8 Σ v xy Σ v xx

29 We itroduce the ull parameter space { G 0 = θ 0 = (β, I, σ 0 : β = (β, η 0, 0,, 0 with β > M ad σ 0 = M 8 } G S (k, where η 0 0. Defie p = p. Based o the mappig h, we have the correspodig ull parameter space for Σ v, F 0 = {Σ v 0} where (β + η0 + σ0 β η 0 0 β Σ v 0 = 0 0. η I p p We the itroduce the alterative parameter space for Σ v, which will iduce a parameter space for (β, Ω, σ through the mappig h. Defie F = {Σ v α : α l (p, k, ρ}, where (β + η0 + σ0 β η 0 ρ 0 α β Σ v α = 0 α, (55 η ρ 0 α α 0 I p p with ρ 0 = β + σ 0 ad l (p, k, ρ = {α : α R p, α 0 = k, α i {0, ρ} for i p }. (56 The we costruct the correspodig parameter space G for (β, Ω, σ, which is iduced by the mappig h ad the parameter space F, G = {(β, Ω, σ : (β, Ω, σ = h (Σ v for Σ v F }. (57 For Σ v α, the correspodig (β, Ω, σ defied as h (Σ v α satisfies β = α ρ 0 + β, β α = η 0, β {,} = (ρ 0 β α = σ 0 α, (58 α σ = σ0 α (ρ 0 β σ α 0 M, (59 max { λ mi (Ω, λ max (Ω } α. (60 9

30 Whe α is chose such that α mi { /M, M }, the we have The differece betwee β ad β is M λ mi (Ω λ max (Ω M. (6 β β = α (β ρ 0 α = σ 0 α. α By takig ρ = log(4p /k /, we have β β Ck log p/ ad hece β β Ck log p/ M /4. Combied with (59, (58 ad (6, we show that G G S (k. Let π deote the uiform prior over the parameter space G iduced by the uiform prior of α over l (p, k, ρ where ρ = log(4p /k /. The cotrol of L (f π, f θ0 is established i the followig lemma, which follows from Lemma of Cai ad Guo (06a ad is established i (7. of Cai ad Guo (06a. Lemma 6. Suppose that k c {/log p, p γ }, where 0 γ < ad c is a sufficiet small positive costat. For ρ = log(4p /k /, we establish that α mi { /M, M } ad L (f π, f θ0 4. (6 To apply Lemma 5, we cosider the fuctioal T(θ = β ad calculate the distace d = σ 0 β + ( α α (β σ 0 = α α β + β σ 0 α. (63 By takig β = M / > σ 0 ad η 0 = 0, we have β M / Ck log p/ ad σ α α β + β σ 0 log p α ck σ 0 max { β, β } ck log p σ 0 max { β, β }, where the last iequality follows from the fact that β Ck log pσ 0 / ad β = 0. Combied with (6, a applicatio of Lemma 5 leads to (50. Proof of (5 We costruct the followig parameter spaces, { G 0 = θ 0 = (β, I, σ 0 : β = (β, η 0, 0, 0, ad β M } G S (k { ( } G = θ = (β, I, σ 0 : β = β ɛ + σ 0, η 0, 0, 0 G S (k, where ɛ is a small positive costat to be chose later. The proof of the followig lemma ca be foud i the supplemetary material Sectio C. 30 (64

31 Lemma 7. If ɛ = log (7/6 /, the we have L (f π, f θ0 4. (65 To apply Lemma 5, we take η 0 = 0 ad calculate the distace d = β (β = βσ ɛ + σ ɛ c max{ β, β }. Applyig Lemma 5, we establish (5. Proof of (5 We itroduce the followig ull ad alterative parameter spaces, G 0 = {(β, I, σ : β = (η 0, 0, 0,, 0} G = {(β, I, σ : β = (η 0, 0, α with α l (p, k, ρ}, (66 where l (p, k, ρ = { α : α R p, α 0 = k, α i {0, ρ} }. (67 Let π deote the prior over the parameter space G iduced by the uiform prior of α over l (p, k, ρ where ρ = log(4p /k /. The cotrol of L (f π, f θ0 is established i the followig lemma, which follows from Lemma 7 of Cai ad Guo (06b ad is established i (.6 of Cai ad Guo (06b. Lemma 8. Suppose that k c {/log p, p γ }, where 0 γ < positive costat. For ρ = log(4p /k /, we establish ad c is a sufficiet small L (f π, f θ0 8. (68 By specifyig η 0 = 0, we have G 0 ad G defied i (66 are proper subspaces of the parameter space G W (k. To apply Lemma 5, we calculate the distace d = α ck log p/. Applyig Lemma 5, we establish (5. SUPPLEMENTARY MATERIAL Title: Supplemet to Optimal Estimatio of Co-heritability i High-dimesioal Liear Regressios. (.pdf file 3

32 Refereces Baradat, P. (976, Use of juveile-mature relatioships i idividual selectio icludig iformatio from relatives, Proceedig of IUFRO Meetig o advaced geeratio breedig, Bordeax, pp. 38. Belloi, A., Cherozhukov, V., ad Wag, L. (0, Square-root lasso: pivotal recovery of sparse sigals via coic programmig, Biometrika, 98(4, Bloom, J. S., Ehrereich, I. M., Loo, W. T., Lite, T.-L. V., ad Kruglyak, L. (03, Fidig the sources of missig heritability i a yeast cross, Nature, 494(7436, Cai, T. T., ad Guo, Z. (06a, Cofidece itervals for high-dimesioal liear regressio: miimax rates ad adaptivity, The Aals of Statistics, To appear. Cai, T. T., ad Guo, Z. (06b, Supplemet to Cofidece itervals for high-dimesioal liear regressio: miimax rates ad adaptivity, The Aals of Statistics, To appear. Fa, J., Ha, X., ad Gu, W. (0, Estimatig false discovery proportio uder arbitrary covariace depedece, Joural of the America Statistical Associatio, 07(499, Gola, D., ad Rosset, S. (0, Accurate estimatio of heritability i geome wide studies usig radom effects models, Bioiformatics, 7, i37 i33. Guo, Z., Kag, H., Cai, T. T., ad Small, D. S. (06, Cofidece Iterval for Causal Effects with Possibly Ivalid Istrumets Eve After Cotrollig for May Cofouders, arxiv preprit arxiv: ,. Javamard, A., ad Motaari, A. (04, Cofidece itervals ad hypothesis testig for high-dimesioal regressio, The Joural of Machie Learig Research, 5(, LeCam, L. (973, Covergece of estimates uder dimesioality restrictios, The Aals of Statistics, (,

Optimal Estimation of Genetic Relatedness in High-dimensional Linear Models

Optimal Estimation of Genetic Relatedness in High-dimensional Linear Models Optimal Estimatio of Geetic Relatedess i High-dimesioal Liear Models Abstract Estimatig the geetic relatedess betwee two traits based o the geome-wide associatio data is a importat problem i geetics research.