Optimal Estimation of Co-heritability in High-dimensional Linear Models

Size: px
Start display at page:

Download "Optimal Estimation of Co-heritability in High-dimensional Linear Models"

Transcription

1 Optimal Estimatio of Co-heritability i High-dimesioal Liear Models Zijia Guo, Wajie Wag,, T. Toy Cai, ad Hogzhe Li Departmet of Statistics, The Wharto School, Uiversity of Pesylvaia Departmet of Biostatistics ad Epidemiology, Perelma School of Medicie, Uiversity of Pesylvaia Abstract Co-heritability is a importat cocept that characterizes the geetic associatios withi pairs of quatitative traits. There has bee sigificat recet iterest i estimatig the co-heritability based o data from the geome-wide associatio studies (GWAS. This paper itroduces two measures of co-heritability i the highdimesioal liear model framework, icludig the ier product of the two regressio vectors ad a ormalized ier product by their legths. Fuctioal de-biased estimators (FDEs are developed to estimate these two co-heritability measures. I additio, estimators of quadratic fuctioals of the regressio vectors are proposed. Both theoretical ad umerical properties of the estimators are ivestigated. I particular, miimax rates of covergece are established ad the proposed estimators of the ier product, the quadratic fuctioals ad the ormalized ier product are show to be rate-optimal. Simulatio results show that the FDEs sigificatly outperform the aive plug-i estimates. The FDEs are also applied to aalyze a yeast segregat data set with multiple traits to estimate heritability ad co-heritability amog the traits. 0 Zijia Guo is a Ph.D. studet ( zijguo@wharo.upe.edu. Wajie Wag is a Postdoctoral Research Fellow ( wajiew@wharto.upe.edu. T. Toy Cai is Dorothy Silberberg Professor of Statistics ( tcai@wharto.upe.edu. The research of T. Toy Cai was supported i part by NSF Grats DMS-0898 ad DMS , ad NIH Grat R0 CA7334. Hogzhe Li is Professor ( hogzhe@upe.edu. The research of Wajie Wag ad Hogzhe Li was supported i part by NIH grats CA7334 ad GM

2 Keywords: Geetic correlatios, geome-wide associatio studies, ier product, miimax rate of covergece, quadratic fuctioal. Itroductio. Motivatio ad Backgroud Geome-wide associatio studies (GWAS have ot oly led to idetificatio of thousads of geetic variats or sigle ucleotide polymorphisms (SNPs that are associated various complex pheotypes (Maolio, 00, they also provide importat iformatio to estimate the geetic heritability, a key populatio parameter that ca help uderstad the geetic architecture of complex traits (Yag et al., 00. Results from these GWAS have also show that may complex pheotype shared commet geetic variats, icludig various autoimmue diseases (Zherakova et al., 009 ad psychiatric disorders (Lee et al., 03. These empirical evidece of shared geetic etiology for various pheotypes ca iform osology ad ecourages the ivestigatio of commo pathophysiologies for related disorders that ca be explored for drug repositioig. There has bee a recet iterest i estimatig the geetic correlatios betwee two traits based o the geome-wide aggregate effects of causal variats affectig both traits. The cocept of co-heritability has bee proposed to describe the geetic associatios withi pairs of quatitative traits based o GWAS data. This is i cotrast to the traditioal approaches estimatig co-heritability based o twi or family studies. (Baradat, 976 first geeralized the cocept of heritability ad defied the coefficiet of geetic predictio betwee two traits as the ratio of additive geetic covariace over the product of the pheotypic stadard deviatio of each trait. This idea was recetly exteded to defie co-heritability usig GWAS data by estimatig the geetic risk scores (GRSs for oe pheotype ad their associatio with a secod pheotype (Purcell et al., 009; Wray et al., 007. This defiitio is ad hoc ad does ot accout for the fact oly a very small set of SNPs are expected to be associated with either of the two traits. Lee et al. (0 ad Yag et al. (03 exteded the mixed-effect model framework for

3 estimatig the heritability to estimate geetic covariace ad geetic correlatio betwee two traits. I their models, each idividual s pheotype is affected by a geetic radom effect, which is correlated across idividuals by virtue of sharig some of the geetic variats affectig the traits, ad a evirometal radom effect, which is ucorrelated across idividuals. They defie co-heritability as the square-root of the ratio of the covariace of the geetic radom effects to the product of the total variaces. The mixed-effect model approach requires kowledge of the idetity of the causal SNPs, ad hece the covariace matrix. This is however ot available, Lee et al. (0 ad Yag et al. (03 approximate the geetic correlatio betwee every pair of idividuals across the set of causal SNPs by the geetic correlatio across the set of all geotyped SNPs. The cocept of usig a estimated correlatio matrix istead of the actual correlatio matrix i estimatig the heritability is a questioable heuristic (Gola ad Rosset, 0. The very large umber of SNPs used for estimatig the geetic correlatios, most of them likely ot causative, might lead to biased estimate of correlatios due the set of causal SNPs. This has bee clearly demostrated by Gola ad Rosset (0. The goal of this paper is to defie two quatities that ca be used for measurig the co-heritability betwee a pair of traits based o GWAS data i the framework of highdimesioal liear models. Istead of usig the mixed-effects model framework, our defiitio of the co-heritability is similar i spirit to that of Baradat (976. The proposed estimatio procedure ivolves selectig the relevat geetic variats, which overcomes the problem of most of the SNPs i GWAS are ot relevat to either of the two traits.. Problem Formulatio I this paper, a pair of trait values (y, w are modeled as a liear combiatio of p geetic variats ad a error term that icludes evirometal ad umeasured geetic effects, y = X pβ p + ɛ ad w = Z pγ p + δ, ( where the rows X i are i.i.d. p-dimesioal Sub-gaussia radom vectors with mea µ x ad covariace matrix Σ, the rows Z i are i.i.d. p-dimesioal Sub-gaussia radom vectors with mea µ z ad covariace matrix Γ ad the error (ɛ, δ follows the multivariate ormal 3

4 distributio with mea zero ad covariace σ I 0 0 σi ad is assumed to be idepedet of X ad Z. Throughout the paper there are o particular assumptios o the relatio betwee the radom desig matrices X ad Z, where they ca be idetical, correlated or idepedet. I the study of geetic co-heritability, the pairs of traits y ad w are assumed to have mea zero, ad the jth colum of X, X j, ad the jth colum of Z, Z j, are the umerically coded geetic markers at the jth geetic variat ad are assumed to have variace. Uder this model, if the colums of X ad Z are idepedet, for the i-th observatio, Var(y i = j β j + σ = β + σ, ad Var(w i = j γ j + σ = γ + σ, therefore β ad γ ca the be iterpreted as the arrow sese heritability. Based o this model, oe measure of geetic co-heritability is the ier product of the regressio coefficiets I (β, γ = β, γ ( which measures the shared geetic effects betwee these two traits. Alteratively, a ormalized ier product, that is the ratio R (β, γ = β, γ β γ ( β γ > 0, (3 ca also be used. The deomiator β ad γ represet the total geetic variats for two traits where the colums of the desig matrix are idepedet. I the case where oe of β ad γ is vaishig, the ratio is defied to be zero, which meas that there is o correlatio betwee two traits whe oe of the regressio vector is zero. With this ormalizatio, R (β, γ is always betwee - ad ad ca be used to compare co-heritability amog multiple pairs. The focus of this paper is to develop estimators for I (β, γ ad R (β, γ based o two idepedet GWAS data with geotype data measured o the same set of geetic markers, deoted by (y i, X i, i =,, ad (w i, Z i, i =,,. 4 A aive estimator is to

5 estimate β ad γ first ad the plug i the estimators of β ad γ ito the expressios ( ad (3. For the problem of iterest, usually there are may more geetic markers tha the sample size, that is p max{, }. However, for ay trait, oe expect that oly a few of these markers have ozero effects. Oe ca apply ay high-dimesioal regressio methods such as Lasso (Tibshirai, 996 ad scaled Lasso (Su ad Zhag, 0 ad margial regressio with screeig (Fa et al., 0; McCarthy et al., 008 to estimate these sparse regressio coefficiets. However, such plug-i estimators have several drawbacks i terms of estimatig heritability ad co-heritability. The Lasso approach shriks the estimatio towards 0, i particular, some weak effects might be shruke to 0, yet the accumulatio of these weak effects may cotribute sigificat to the trait variability. It is possible that some geetic variats may have strog effects o oe trait ad weak effects o the other trait. Due to shrikage, the plug-i of Lasso type estimators fails to capture this part of cotributio to co-heritability from such geetic variats. Margial regressio calculates the regressio score betwee the trait ad each sigle marker (i.e., y ad X j, j p, ad scree for the large scores. This approach also suffers from the existece of weak effects, as the margial scores must be large eough to survive i the screeig step..3 Methods ad Mai Results I this paper, a ew estimator, fuctioal de-biased estimator of ier product ad ormalized ier product (FDE is proposed to estimate the co-heritability measures I (β, γ defied i ( ad R (β, γ defied i (3. The proposed estimator for I (β, γ ivolves two steps, icludig estimatig the ier product by the plug-i scaled Lasso estimator, ad the correctig the plug-i scaled Lasso estimator. Similar two-step procedures are proposed to estimate the quadratic fuctioals β ad γ. To estimate the ormalized ier product, we plug-i the estimators of the ier product ad quadratic fuctioals ito the defiitio (3. FDE is show to achieve the miimax optimal covergece rates of estimatig I (β, γ ad R (β, γ. The correctio step i FDE improves the estimatio accuracy through correctig the shrikage from Lasso ad is crucial to achieve the optimal covergece rate. As 5

6 demostrated i the simulatio studies, FDE cosistetly outperforms the plug-i estimator of the scaled Lasso estimator. I particular, FDE has sigificat improvemet of estimatig the co-heritability measures i the existece of some geetic markers havig weak effects o oe trait ad strog effects o the other trait. I additio, FDE does ot suffer from depedecy amog geetic markers. FDE works for a broad class of depedecy structure of geetic markers, ad it also allows depedecy betwee two desig matrices X ad Z. The theoretical aalysis give i Sectio 3 shows that the optimal rate of covergece for estimatig I (β, γ is ( ( β + γ + k log p + k log p, where p is the dimesio, is the sample size, ad k is the maximum sparsity of β ad γ. The optimal rate depeds ot oly o p, ad k, but also the sigal stregth β ad γ. Restrictig to the relatively strog sigals with mi{ β, γ } η 0 Ck log p/, where C is some costat, the optimal covergece rate of estimatig R (β, γ is ( + k log p + k log p η 0 η0. I cotrast to estimatig I (β, γ, the optimal rate scales to the iverse of the sigal stregth, represeted by /η 0. The estimators Î (β, γ ad R (β, γ proposed i Sectio 3 adaptively achieve the optimal rates for estimatig I (β, γ ad R (β, γ, respectively. I additio, the optimal rate of covergece for estimatig the quadratic fuctioals Q (β = β ad Q (γ = γ is also established..4 Orgaizatio of the Paper The rest of the paper is orgaized as follows. Sectio presets the procedures for estimatig I (β, γ, Q (β, Q (γ, ad R (β, γ i detail. I Sectio 3, miimax rates of covergece for the estimatio problems are established ad the proposed estimators are show to attai the optimal rates. I Sectio 4, simulatio studies are coducted to evaluate the empirical performace of the FDEs. A yeast cross data is used to illustrate the estimators i Sectio 5. The proofs of mai theorems are preset i Sectio 6. The remaiig proofs ad the exteded simulatio studies are give i the supplemetary materials. 6

7 .5 Notatio ad Defiitios Basic otatio ad defiitios that will be used i the rest of the paper are defied here. For a matrix X R p, X i, X j, ad X i,j deote respectively the i-th row, j-th colum, ad (i, j-th etry of the matrix X, X i, j deotes the i-th row of X excludig the j-th coordiate, ad X j deotes the sub-matrix of X excludig the j-th colum. Let [p] = {,,, p}. For a subset J [p], X J deotes the sub-matrix of X cosistig of colums X j with j J ad for a vector x R p, x J is the sub-vector of x with idices i J ad x J is the sub-vector with idices i J c. For a vector x R p, the l q orm of x is defied as x q = ( q i= x i q q for q 0 with x 0 deotig the cardiality of o-zero elemets of x ad x = max j p x j. For a matrix A ad q, A q = sup x q= Ax q is the matrix l q operator orm. I particular, A is the spectral orm. For a symmetric matrix A, λ mi (A ad λ max (A deote respectively the smallest ad largest eigevalue of A. For a set S, S deotes the cardiality of S. For a R, a + = max {a, 0} ad sig(a is the sig of a, i.e., sig(a = if a > 0, sig (a = if a < 0 ad sig (0 = 0. Defie the Sub-gaussia orm x ψ of x R p as x ψ = sup v S p sup q (E v x q /q q where S p is the uit sphere i R p. The radom vector x R p is defied to be Sub-gaussia if its correspodig Sub-gaussia orm is bouded; see Vershyi (0 for more o Subgaussia radom variables. For the desig matriices X R p ad Z R p, we defie the correspodig sample covariace matrices as Σ = X X/ ad Γ = Z Z/. z α/ deote the upper α/ quatile of the stadard ormal distributio. For two positive sequeces a ad b, a b meas a Cb for all ad a b if a b ad b a. c ad C are used to deote geeric positive costats that may vary from place to place. Let Estimatio Methods Details of the FDE estimatio procedures are give i this Sectio. Estimatio of I (β, γ i Sectio. is cosidered first ad the estimatio of the quadratic fuctioals Q (β = β ad Q (γ = γ is give i Sectio.. A estimator of the ormalized ier product R (β, γ is give i Sectio.3 by combiig the estimators of I (β, γ, Q (β ad Q (γ developed i Sectios. ad.. 7

8 . Estimatio of I (β, γ Estimatio of the ier product I (β, γ = β, γ is cosidered first sice I (β, γ is of sigificat iterest i its ow right. I additio, costructig a good estimator of I (β, γ is also a key step i developig a optimal estimator for the ormalized ier product R (β, γ. The scaled Lasso estimators for high-dimesioal liear model ( are defied through the followig optimizatio algorithm (Su ad Zhag, 0, ad where λ 0 =.0 log p. { β, y Xβ ˆσ } = arg mi + σ β R p,σ R + σ + λ 0 w Zγ { γ, ˆσ } = arg mi + σ γ R p,σ R + σ + λ 0 p j= p j= X j β j, (4 Z j γ j, (5 A atural estimator of the ier product I (β, γ ca the be obtaied by simply pluggig i the scaled Lasso estimators β ad γ give i (4 ad (5 respectively. However, as we will show, β, γ is ot a good estimate of I (β, γ i geeral due to the bias i the Lasso or scaled Lasso estimators ad other reasos. A alterative is to use the de-biased estimator, which was origially itroduced for the costructio of cofidece itervals (Cai ad Guo, 06a; Javamard ad Motaari, 04; va de Geer et al., 04; Zhag ad Zhag, 04. Pluggig i the de-biased estimator does ot lead to a good estimator either; it has large variace. See Reamrk for further discussios. To costruct a optimal estimator of I (β, γ, it is helpful to decompose the differece betwee β, γ ad β, γ ito three terms, β, γ β, γ = γ, β β + β, γ γ β β, γ γ. (6 The last term o the right had side, β β, γ γ is small, but the first two terms γ, β β ad β, γ γ ca be large. Our approach is to first estimate these two terms ad the subtract them from β, γ to obtai the fial estimate of I (β, γ. Some ituitio for estimatig γ, β β is give here. Sice ( X y X β = Σ ( β β + X ɛ, (7 8

9 multiplyig both sides of (7 by a vector u R p yields ( u X y X β ( = u Σ β β + u X ɛ, (8 which ca be writte as ( u X y X β γ, β β = ( ( Σu γ β β + u X ɛ. (9 If the vector u ca be chose such that the right had side of (9 is small, the u X (y X β / is a good estimate of γ, β β. To costruct u, ote that the first term o the right had side of (9 ca be further ( ( β upper bouded as Σu γ β Σu γ β β. To estimate γ, β β, ( ( a projectio vector u is costructed such that Σu γ β β is cotrolled through costraiig the term Σu γ ad the secod term of (9 u X ɛ/, which has mea 0, is cotrolled through miimizig its variace σ u Σu/. This leads to the followig covex optimizatio algorithm for idetifyig the projectio vector u for estimatig γ, β β, { } û = arg mi u λ Σu : Σu γ γ, (0 u R p where λ = λ max (Σ log p. Remark. The solutio of the above optimizatio problem might ot be uique ad û is defied as ay miimizer of the optimizatio problem. The theory established i Sectio 3 still holds for ay miimizer of (0. The optimizatio problem (0 is solved through its equivalet Lagrage dual problem, which is computatioally efficiet ad scales well to the high-dimesioal problem. See Step i Table for more details. Oce the projectio vector û is obtaied, γ, β β is the estimated by û X (y X β /. Similarly, the projectio vector for estimatig β, γ γ ca be obtaied via the covex algorithm { û = arg mi u Γu } : Γu β β λ, ( u R p where λ = λ max (Γ log p. The β, γ γ is estimated by û Z (w Z γ /. The fial estimator Î (β, γ of I (β, γ is give by Î (β, γ = β, γ + û ( X y X β + û Z (w Z γ. ( 9

10 It is clear from the above discussio that the key idea for the costructio of the fial estimator Î (β, γ is to idetify the projectio vectors û ad û such that γ, β β ad β, γ γ are well estimated. It will be show i Sectio 3 that the estimator Î (β, γ is adaptively miimax rate-optimal. Remark. As metioed before, simply pluggig i the Lasso, scaled Lasso, or de-biased estimator does ot lead to a good estimator of I (β, γ. Aother atural approach is to first threshold the de-biased estimator to obtai a sparse estimator of the coefficiet vectors (see details i Zhag ad Zhag (04, Sectio 3.3, Guo et al. (06, equatio (0 ad the plug i. This estimator is referred as the thresholded estimator. Simulatios i Sectio 4 demostrate that the proposed estimator defied i ( outperforms the three plug-i estimators usig the scaled Lasso, de-biased, ad thresholded estimators.. Estimatio of Q (β ad Q (γ I order to estimate the ormalized ier product R (β, γ, it is ecessary to estimate the quadratic fuctioals Q (β = β ad Q (γ = γ. To this ed, we radomly split the data (y, X ito two subsamples ( y (, X ( with sample size / ad ( y (, X ( with sample size / ad the data (w, Z ito two subsamples ( w (, Z ( with sample size / ad ( w (, Z ( with sample size /. With a slight abuse of otatio, let β ad γ deote the optimizers of the scaled Lasso algorithm (4 applied to ( y (, X ( ad (5 applied to ( w (, Z (, respectively. For the scaled Lasso algorithms, the sample sizes ad are replaced by / ad /, respectively. Agai, the simple plug-i estimator β of Q (β ad γ of Q (γ do ot perform well. Note that β β = β, β β β β. (3 The secod term o the right had side of (3 is small, but the first ca be large. The plug-i estimator β does ot estimate β well for this reaso. Similar ideas as i estimatig β, γ are applied here to estimate β. Specifically, the term β, β β is estimated first ad the is added it to β to obtai the fial estimate of β. To estimate 0

11 β, β β, a projectio vector uis idetified such that the followig differece is cotrolled, ( / u X ( ( y ( X ( β β, β β ( = u Σ( β ( β β + ( / u X ( ɛ, with Σ ( = ( X ( X ( /( /. Let the projectio vector û 3 be the solutio to the followig optimizatio algorithm û 3 = arg mi u R p { u Σ( u : Σ ( u β β λ / } (4, (5 where λ = λ max (Σ log p. We the estimate β, β β ( by û ( 3 X ( y ( X ( β /( / ad propose the fial estimator of β as ( Q (β = β + û 3 / Similarly, the estimator of γ is give by ( Q (γ = γ + û 4 / where û 4 = arg mi u { ( ( X ( y ( X ( β. (6 + ( Z ( ( w ( Z ( γ, (7 + u Γ( u : Γ ( u γ γ λ / with Γ ( = ( Z ( ( Z ( /( / ad λ = λ max (Γ log p. }, (8 Remark 3. Sample splittig is used here for the purpose of the theoretical aalysis. I the Simulatio Sectio, the performace of the proposed estimator without sample splittig is ivestigated; see Steps 5-7 i Table. The proposed estimator without sample splittig performs eve better umerically tha with sample splittig sice more observatios are used i costructig the iitial estimators β ad γ ad the projectio vectors û 3 ad û 4..3 Estimatio of R (β, γ Give the estimators Î (β, γ, Q (β ad Q (γ costructed i Sectios. ad., a atural estimator for the ormalized ier product R (β, γ is give by Î (β, γ } R (β, γ = sig (Î (β, γ mi Q (β Q { Q (β Q (γ > 0,, (9 (γ

12 where Î (β, γ, Q (β ad Q (γ are estimators of β, γ, β ad γ defied i (, (6 ad (7, respectively. It is possible that oe of Q (β ad Q (γ is 0 if β ad γ are close to zero. I this case, the ormalized ier product R (β, γ is estimated as 0. Sice R (β, γ is always betwee ad, the estimator R (β, γ is trucated to esure that it is withi the rage. The FDE algorithm without sample splittig for calculatig the estimators Î (β, γ, Q (β, Q (γ, ad R (β, γ is detailed i Table. 3 Theoretical Aalysis The theoretical properties of the estimators itroduced i the last sectio are ivestigated i this secrtio. The upper bouds ad lower bouds together show that the proposed estimators Î (β, γ, Q (β, Q (γ, ad R (β, γ achieve the optimal rates of covergece. 3. Upper Boud Aalysis The samples sizes ad are assumed to be of the same order, that is. Let = mi{, } be the smallest of two sample sizes. The followig assumptios are eeded to facilitate the theoretical aalysis. (A The covariace matrices Σ ad Γ satisfy /M λ mi (Σ λ max (Σ M ad /M λ mi (Γ λ max (Γ M, ad the oise levels σ ad σ satisfy max{σ, σ } M, where M ad M > 0 are positive costats. (A The l orms of the coefficiet vectors β ad γ are bouded away from zero i the sese that mi { β, γ } η 0 C k log p, where k = max{ β 0, γ 0 }. (0 Assumptio (A places a coditio o the spectrum of the covariace matrices Σ ad Γ ad a upper boud o the oise levels σ ad σ. Assumptio (A requires that the sigal has to be bouded away from zero by η 0, which is oly used i the upper boud aalysis of the ormalized ier product R (β, γ.

13 Table : FDE algorithm without sample splittig for estimatig the ier product, legths ad ormalized ier product. Iput: desig matrices: X, Z; respose vectors: y, w; tuig parameters λ 0,λ. Output: Î (β, γ, Q (β, Q (γ ad R (β, γ. Iitial Lasso estimators:. Scaled Lasso: Calculate β ad γ from (4 ad (5 with the tuig parameter λ 0. Ier product calculatio:. Projectio vector û : Calculate û = arg mi u u X Xu/4 + u γ + λ t u, where λ t = λ t /.5, ad λ 0 = λ/. Repeat util û ca ot be solved with λ replaced by λ t+, or t Projectio vector û : Calculate û = arg mi u u Z Zu/4 + u β + λ u, where λ = λ t /.5, ad λ 0 = λ/. Repeat util û ca ot be solved with λ replaced by λ t+, or t Correctio: Î (β, γ = β, γ + û X (y X β/ + û Z (w Z γ/. Quadratic fuctioal calculatio: 5. Projectio vector û 3 : Calculate û 3 = arg mi u u X Xu/4 + u β + λ u, where λ = λ t /.5, ad λ 0 = λ/. Repeat util û 3 ca ot be solved with λ replaced by λ t+, or t Projectio vector û 4 : Calculate û 4 = arg mi u u Z Zu/4 + u γ + λ u, where λ = λ t /.5, ad λ 0 = λ/. Repeat util û ca ot be solved with λ replaced by λ t+, or t Correctio: Q (β = ( β + û 3X (y X β/ +, Ratio calculatio: Q (γ = ( γ + û 4Z (w Z γ/ R (β, γ = sig( Î (β, γ mi {( Î (β, γ / Q (β Q } } (γ { Q (β Q (γ > 0,. 3

14 The followig theorem establishes the covergece rates of the estimators Î (β, γ, Q (β ad Q (γ, proposed i (, (6 ad (7, respectively. Theorem. Suppose the assumptio (A holds ad k log p/ 0 with k = max { β 0, γ 0 }. The for ay fixed costat 0 < α < 0.5, with probability at least α p c, we have ( Î (β, γ I (β, γ zα/ C ( β + γ + k log p + C k log p, ( ( zα/ Q (β Q (β C β + k log p + C k log p, ( ad ( zα/ Q (γ Q (γ C γ + k log p where c ad C are positive costats. + C k log p, (3 The upper boud of estimatig β, γ ot oly depeds o k, ad p, but also scales to the sigal stregths β ad γ. For the estimatio of the quadratic fuctioal Q (β, the covergece rate depeds o β. The similar pheomeo holds for estimatio of Q (γ. The followig theorem establishes the covergece rate of the estimator R (β, γ proposed i (9. Theorem. Suppose the assumptios (A ad (A hold ad k log p/ 0 with k = max { β 0, γ 0 }. The for ay fixed costat 0 < α < 0.5, with probability at least α p c, we have R (β, γ R (β, γ C ( zα/ + k log p + C η 0 η0 where c ad C are positive costats. k log p, (4 I cotrast to Theorem, Theorem requires the extra assumptio (A o the sigal stregths β ad γ. The covergece rate of estimatig R (β, γ is scaled to the iverse of sigal stregths, /η 0. This is differet from the error boud i Theorem, where the estimatio accuracy is scaled to the l orm. The lower boud results established i Theorem 3 will demostrate the ecessity of Assumptio (A for estimatio of R (β, γ. 4

15 3. Miimax Lower Boud This sectio establishes the miimax lower bouds that show the optimality of the proposed estimators i Sectio. We first itroduce the parameter spaces for θ = (β, Σ, σ, γ, Γ, σ, which is the product of parameter spaces for θ β = (β, Σ, σ ad θ γ = (γ, Γ, σ. To start with, we itroduce the followig parameter space for both θ β = (β, Σ, σ ad θ γ = (γ, Γ, σ, G (k = { (β, Σ, σ : β 0 k, } λ mi (Σ λ max (Σ M, σ M, (5 M where M ad M > 0 are positive costats. The parameter space defied i (5 requires the sparsity β 0 k. The other coditios /M λ mi (Σ λ max (Σ M ad σ M are regularity coditios. Based o the defiitio (5, the parameter space for (β, Σ, σ, γ, Γ, σ is defied as a product of two parameter spaces, Θ(k = G(k G(k = {θ = (β, Σ, σ, γ, Γ, σ : (β, Σ, σ G(k, (γ, Γ, σ G(k}. Defie a proper subspace G (k, η 0 of G (k as (6 G (k, η 0 = G (k { β η 0 }, (7 where η 0 0. Compared with G (k, the parameter space G (k, η 0 is restricted to sigals whose l orm is bouded away from 0 by η 0. A proper subspace of Θ(k for θ = (β, Σ, σ, γ, Γ, σ ca be defied as Θ(k, η 0 = {θ = (β, Σ, σ, γ, Γ, σ : (β, Σ, σ G(k, η 0, (γ, Γ, σ G(k, η 0 }. (8 The followig theorem establishes the miimax lower bouds for the covergece rates of estimatig the ier product β, γ, the quadratic fuctioals Q (β ad Q (γ ad the ormalized ier product R (β, γ. Theorem 3. Suppose k c mi {/log p, p ν } for some costats c > 0 ad 0 < ν < /, the if Î sup θ Θ(k ( ( P θ Î I (β, γ c (( β + γ 5 + k log p + k log p 4, (9

16 if R if Q if Q ( ( ( sup P θβ Q Q (β c β θ β G(k ( sup P θγ Q Q (γ c θ γ G(k ( sup P θ R R (β, γ c mi θ Θ(k,η 0 where c > 0 is a positive costat. + k log p ( ( γ + k log p { ( + k log p + η 0 η0 + k log p 4, (30 + k log p k log p, 4, (3 } 4, (3 Combied with Theorem, the above theorem implies that the estimators Î (β, γ, Q (β, Q (γ proposed i (,(6 ad (7 achieve the miimax lower bouds (9, (30 ad (3, so they are miimax rate-optimal estimators of I (β, γ, Q (β ad Q (γ, respectively. For estimatio of R (β, γ, uder the assumptio that η 0 C k log p/, Theorem shows that the estimator R (β, γ proposed i (9 achieves the miimax lower boud /η 0 (/ + k log p/ + /η 0 k log p/ i (3. Hece R (β, γ is the rate-optimal estimator of R (β, γ uder the assumptio that the sigal stregths β ad γ are bouded away from zero. However, if η 0 c k log p/, the estimatio of R (β, γ is ot iterestig sice the trivial estimator 0 will achieve the costat miimax lower boud i this case, which demostrates the ecessity of Assumptio (A. 4 Simulatio Evalulatios ad Comparisos I this sectio, the performace of several estimators for I (β, γ ad R (β, γ were compared i two differet settigs. These estimators icluded plug-i scaled Lasso estimator (Su ad Zhag, 0, plug-i de-biased estimator (Cai ad Guo, 06a; Javamard ad Motaari, 04; va de Geer et al., 04; Zhag ad Zhag, 04, plug-i thresholded estimator (Zhag ad Zhag, 04, Sectio 3.3 ad the proposed estimator FDE. Details about these estimators are listed as follows. Plug-i scaled Lasso estimator (Lasso: The ier product I (β, γ is estimated by β, γ ad the ormalized ier product R (β, γ is estimated by [ β, γ / ( β ] { γ β } γ > 0. 6

17 Plug-i de-biased estimator (De-biased: Deote the de-biased Lasso estimators as β ad γ. The ier product I (β, γ is estimated by β, γ ad the ormalized ier [ product R (β, γ is estimated by β, γ / ( β ] { γ β } γ > 0. Plug-i thresholded estimator (Thresholded: Deote the thresholded estimators as β ad γ. The ier product I (β, γ is estimated by β, γ ad the ormalized ier product R (β, γ is estimated by [ β, γ / ( β γ ] { β γ > 0 }. FDE: The ier product I (β, γ is estimated by Î (β, γ i ( ad the ratio R (β, γ is estimated by R (β, γ i (9. We cosider FDE with sample splittig (FDE-S ad without sample splittig (FDE-NS for R (β, γ. Implemetatio of the de-biased, thresholded ad FDE estimators requires the scaled Lasso estimators β ad γ i the iitial step. The scaled Lasso estimator was implemeted by the equivalet square-root Lasso algorithm (Belloi et al., 0. The theoretical tuig parameter is λ 0 =.0 log p/, which may be coservative i the umerical studies. Istead, the tuig parameter was chose as λ 0 = b.0 log p. However, the performaces of all estimators were evaluated across a grid of tuig parameter values b {.5,.5,.75, } (see Supplemetary Material, Sectio A.. The results showed that b =.5 was a good choice for all the estimators. Hece, λ 0 =.5.0 log p/ was used for the umerical studies i this sectio ad Sectio 5. To implemet the FDE algorithm, the other tuig parameter λ as.0 log p/ for the correctio Steps,3,5 ad 6 are give i Table. Comparisos of estimates of I (β, γ ad R (β, γ are preseted below. Results o estimatig the quadratic fuctioals are preseted i the Supplemetary Material, Sectio A.. For each settig, with the parameters (p,,, s, s, s, Σ, Γ, F β, F γ specified, the followig steps were implemeted:. Geerate sets S [p] ad S [p], with S = s, S = s ad S S = s. For β R p ad γ R p, geerate β j F β ad γ k F γ, for j S, k S, ad set β j = 0 ad γ k = 0, for j S, k S.. Geerate X i i.i.d N(0, Σ, i, ad Z i i.i.d N(0, Γ, i. 7

18 3. Geerate the oise ɛ i i.i.d N(0,, i, ad δ i Geerate the outcome as y = Xβ + ɛ ad w = Zγ + δ. i.i.d N(0,, i. 4. With X, y, Z, ad w, estimate I (β, γ ad R (β, γ through differet estimators. 5. Repeat -4 for rep replicatios. For a give quatity T, deote the estimator from l-th replicatio as T(X, y, Z, w; l. The mea squared error (MSE is used to measure the performace of this estimator, MSE(T = rep rep l= ( T(X, y, Z, w; l T. (33 Experimet. I this experimet, the parameters were set as follows: (p,,, rep = (600, 400, 400, 300, the sparsity parameters (s, s, s = (5, 30, 5, ad the covariace matrices Σ ad Γ satisfy Σ ij = Γ ij = (0.8 i j. For give positive values τ ad τ, the sigals of β satisfy that β ji = ( + i/s τ /, for j i S, i =,,, s, ad the sigals i γ satisfy that γ j = τ for j S. This simulatio settig aimed to ivestigate the case where the coefficiets for oe regressio are much larger tha the other by varyig the sigal stregth parameters as (τ, τ {(3.0,., (.6,., (.,.3, (.8,.4, (.,.6, (.,.4, (.3,., (.4,.0}. The results are summarized i Table. For all combiatios of the sigal stregth parameters, i terms of estimatig the ier product I (β, γ, FDE cosistetly outperformed the plugi estimates with Lasso ad Thresholded Lasso. Moreover, with icreasig differece betwee τ ad τ, the advatage of FDE over the plugi estimate usig Lasso or thresholded Lasso became larger. The same results were observed for estimatio of the ormalized ier product R (β, γ, where FDE-NS had cosistet better performace tha other methods. Although De-biased performed well i terms of estimatig I (β, γ, it performed much worse tha FDE-NS for estimatig R (β, γ. As discussed i Sectio, the sample splittig of estimatig the ormalized ier product is simply proposed to facilitate the theoretical aalysis ad might ot be ecessary for the algorithm. Our simulatio results idicated that the proposed estimator without sample splittig (FDE-NS performed quite well i all settigs, eve better tha FDE-S, due to the fact that more samples were used for estimatio ad correctio steps. Such observatios 8

19 led us to use the proposed estimator without sample splittig (FDE-NS i the real data aalysis i Sectio 5. Table : Estimatio of the ier product I (β, γ ad the ormalized ier product R (β, γ: the truth ad mea squared error (MSE for the plug-i estimator with the scaled Lasso estimator (Lasso, plug-i estimator with the de-biased estimator (De-biased, plug-i estimator with the thresholded estimator (Thresholded, ad the proposed estimator Î (β, γ (FDE, proposed estimators with sample splittig (FDE-S ad without sample splittig (FDE-NS. I (β, γ R (β, γ Stregth parameters, (τ, τ Method (.8,.4 (.,.3 (.6,. (3,. (.,.6 (.,.4 (.3,. (.4, Truth Lasso De-biased Thresholded FDE Truth Lasso De-biased Thresholded FDE-S FDE-NS Experimet. I this experimet, the parameters were set as follows: (p,,, rep = (800, 400, 400, 300, sigal stregth parameters (τ, τ = (.,. ad the covariace matrices Σ ad Γ satisfy Σ ij = Γ ij = (0.8 i j. The sigals of β follow that β ji = ( + i/s τ /, for j i S, i =,,, s, ad the sigals i γ satisfy that γ j = τ for j S. This simulatio settig aimed to ivestigate the relatioship betwee the performace of estimators ad the sigal sparsity level ad vary the umber of sigals i β ad γ as (s, s {(40, 40, (50, 50, (60, 60, (70, 70, (80, 80, (90, 90, (00, 00, (0, 0}, ad fix the umber of commo sigals at s = 0. Sice the umber of the associated variats is very large for both coefficiet vectors, large values of τ ad τ iduce strog sigals so that 9

20 all the methods perform well. Istead, the sigal magitude was fixed at τ =. ad τ =. as weak sigals. The results are summarized i Table 3. Clearly, FDE outperformed the other methods. Whe the sigals became deser, the improvemet of FDE over other methods was more proouced. For estimatio of R (β, γ, the results showed that FDE-NS cosistetly outperformed other estimators. As the umber of sigals icreased, the MSE correspodig to FDE-NS decreased quickly. Table 3: Estimatio of the ier product I (β, γ ad the ormalized ier product R (β, γ: the truth ad MSE for the plug-i estimator with the scaled Lasso estimator (Lasso, plug-i estimator with the de-biased estimator (De-biased, plug-i estimator with the thresholded estimator (Thresholded, ad the proposed estimator Î (β, γ (FDE, proposed estimators with sample splittig (FDE-S ad without sample splittig (FDE-NS. I (β, γ R (β, γ Sparsity parameter, s (s = s Method Truth Lasso De-biased Thresholded FDE Truth Lasso De-biased Thresholded FDE-S FDE-NS

21 5 Co-heritability Estimatio of Yeast Cross Geome Wide Associatio Data Bloom et al. (03 reported a large scale geome-wide associatio study of 46 quatitative traits based o,008 Saccharomyces cerevisiae segregats betwee a laboratory strai ad a wie strai. The data set icluded,63 uique geotype markers. Sice may of these markers were highly correlated, Bloom et al. (03 further selected a set of 4,40 markers that are weakly depedet based i the likage disequilibrium iformatio. These markers were selected by pickig oe marker closest to each cetimorga positio o the geetic map. The maker geotype was coded as or, accordig to which strai it came from. For each segregat, the traits of iterest were the ed-poit coloy size ormalized by the cotrol growth uder 46 differet growth media, icludig Hydroge Peroxide, Diamide, Calcium, Yeast Nitroge Base (YNB ad Yeast extract Peptoe Dextrose (YPD, etc. To demostrate the co-heritability amog these traits, 8 traits were cosidered, icludig the ormalized coloy size uder Calcium Chloride (Calcium, Diamide, Hydroge Peroxide (Hydroge, Paraquat, Raffiose, 6 Azauracil (Azauracil, YNB, ad YPD. Each trait was ormalized to have variace, so the quadratic orm represets the total geetic effects for each trait. FDE was applied to every pair of these 8 traits without sample splittig, ad the results are summarized i Table 4, icludig estimates of the ier product for all possible pairs, the quadratic orm for each trait ad the ormalized ier product R (β, γ. The geetic heritability of these traits raged from 0. for Raffiose to 0.67 for YPD. About two thirds of these pairs had a estimated co-heritability smaller tha 0., idicatig relatively weak geetic correlatios amog these traits. To further demostrate the geetic relatedess amog these pairs, for each trait, a t- statistic was calculated based o regressig the trait value y o geetic geetic marker X j, for i j p. A larger absolute value of the t-statistic implied a stroger effect of the marker o the trait. For ay pair of traits, the scatter plot of the t-statistics provided a way of revealig the geetic relatioship betwee them. Figure shows the plots of several pairs of the traits, icludig the pairs with a large positive R (β, γ, YPD v.s. YNB ad Calcium v.s.

22 Table 4: FDE estimatio for the ier product, the quadratic orm (top row of each block ad the ormalized ier product (bottom row of each block amog 8 traits of yeast cross. Traits Calcium Diamide Hydroge Paraquat Raffiose Azauracil YNB YPD Calcium Diamide Hydroge Paraquat Raffiose Azauracil YNB YPD.6680 YNB, pairs with a large egative R (β, γ, Raffiose v.s. Hydroge ad Calcium v.s. Hydroge, ad pairs with R (β, γ ear to 0, icludig Paraquat v.s. Diamide ad Paraquat v.s. Raffiose. The plot clearly idicates a strog positive geetic correlatio betwee YPD ad YNB. The correlatio betwee Calcium ad YNB is slightly smaller. Raffiose/Hydroge ad Calcium/Hydroge pairs clearly showed egative geetic correlatio. There were several geetic variats with very large effects o Hydroge, but they were ot associated with the other traits such as Raffiose ad Calcium. The shared geetic variats were relatively weak, leadig to a correlatio aroud -0.. The two plots o the right show the pairs of trats with weak correlatios. These plots idicated that the proposed co-heritability measures ca

23 YPD YNB Raffiose Hydroge Paraquat Diamide Calcium 0 Calcium 0 Paraquat YNB Hydroge Raffiose Figure : Scatter plots of margial regressio t statistics for six pairs of traits, icludig the pairs with positive correlatio case (left pael, egative correlatio (middle pael, ad o correlatio (right pael. ideed capture the geetic sharig amog differet related traits. 6 Proofs 6. Proof of Theorem For simplicity of otatio, we assume = ad use = = to represet the sample size throughout the proof. The proofs ca be easily geeralized to the case. Without loss of geerality, we assume that the Sub-gaussia orm of radom vectors X i ad Z i are also upper bouded by M, that is, max { } X i ψ, Z i ψ M. Proof of ( 3

24 Note the followig importat decompositio, Î (β, γ I (β, γ = β, γ + û ( X y X β + û Z (w Z γ β, γ ( = û ( X y X β ( γ, β β + û Z (w Z γ β, γ γ β β, γ γ =û X ɛ + (û Σ ( γ β β + û Z δ + (û Γ β (γ γ β β, γ γ. (34 The followig lemmas are itroduced to cotrol the terms i (34 ad similar results were established i the aalysis of Lasso, scaled Lasso ad de-biased Lasso (Cai ad Guo, 06a; Re et al., 03; Su ad Zhag, 0; Ye ad Zhag, 00. The proofs of the followig lemmas ca be foud i the supplemetary material, Sectio C. Lemma. Suppose the assumptio (A holds ad k log p/ 0 with k = max { β 0, γ 0 }. The with probability at least p c, we have β log p log p β Ck, γ γ Ck, (35 ad β β C where c ad C are positive costats. k log p k log p, γ γ C, (36 Lemma. Suppose the assumptio (A holds ad k log p/ 0 with k = max { β 0, γ 0 }. The with probability at least p c, we have (û Σ ( γ β β β 0 log p C γ ad (û Γ β (γ γ C β γ 0 log p. (37 With probability at least α p c, û X ɛ C γ z α/, ad hece ad û Z δ C β z α/, (38 ( û X y X β γ, β β C γ z α/ k log p + C γ, û Z (w Z γ β, γ γ C β z α/ + C β k log p, where c ad C are positive costats. 4 (39

25 By the decompositio (34 ad the iequalities (36, (37 ad (38, we obtai that ( Î (β, γ I (β, γ C β ( z α/ + γ + k log p + C k log p. By (36, we establish (. Proof of ( ad (3 The proof of (3 is similar to that of ( ad oly the proof of ( is preset i the followig. We itroduce the estimator Q(β = β ( + û ( 3 X ( y ( X ( β /(/ ad due to the fact that Q (β is o-egative, we have Q (β Q (β Q(β Q (β. We decompose the differece betwee Q(β ad Q (β, Q(β Q (β = β β + û ( ( 3 X ( y ( X β ( / = (û 3 Σ ( β ( β β + û ( 3 X ( ɛ ( / β β. Combied with the above argumet, the upper boud ( follows from (36 ad the followig lemma. The proof of the followig lemma ca be foud i the supplemetary material Sectio C. Lemma 3. Suppose the assumptio (A holds ad k log p/ 0 with k = max { β 0, γ 0 }. The with probability at least p c α, ( û 3 X ( ɛ ( / C β z α/, (40 ad where c ad C are positive costats. (û 3 Σ ( β ( β β k log p C β, (4 6. Proof of Theorem Let µ = û X (y X β / ad µ = û Z (w Z γ /. We itroduce the ew estimator R(β, γ = Î (β, γ Q (β Q = β, γ + µ + µ. (β Q (β Q (γ 5

26 Sice R (β, γ, we have R (β, γ R (β, γ R(β, γ R (β, γ. We decompose the differece betwee R(β, γ ad R (β, γ, β = = β, γ + µ + µ β, γ Q (β Q (γ β γ β Q (β + β β β, Q (β γ Q (γ β β,, Q (β γ Q (γ β β, γ + γ γ + β γ, γ + γ β γ Q (γ γ µ +. γ Q (β Q (γ Q (β µ + µ Q (β Q (γ µ Q (γ The followig lemma cotrols the terms i the above equatio ad establishes the theorem. The proof of the followig lemma ca be foud i the supplemetary material Sectio C. Lemma 4. Suppose the assumptios (A ad (A hold ad k log p/ 0 with k = max { β 0, γ 0 }. The with probability at least p c α, we have β β γ, γ β Q (β γ Q (γ C k log p β γ, (43 β β γ µ, + β Q (β γ Q (β Q (γ C ( zα/ + k log p ( k log p + C β β + β γ, (44 ad β β, γ Q (γ γ + γ Q (β where c ad C are positive costats. µ 3 Q (γ C γ ( zα/ + k log p ( + C γ + By the decompositio (4, combiig (43, (44 ad (45, we have Î (β, γ β, γ Q (β Q (β β γ ( C + ( zα/ + k log p ( + C + k log p + β γ β γ β γ. Uder the assumptio (A, the upper boud (4 follows from the above iequality. 6 (4 β γ (45 k log p,

27 6.3 Proof of (30 ad (3 i Theorem 3 We first itroduce the otatios used i the proof of lower boud results. Let π deote the prior distributio supported o the parameter space H. Let f π (z deote the desity fuctio of the margial distributio of the radom variable Z with the prior π o H. More specifically, f π (z = f θ (z π (θ dθ. We defie the χ distace betwee two desity fuctios f ad f 0 by χ (f (z f 0 (z (f, f 0 = dz = f 0 (z f (z dz (46 f 0 (z ad the L distace by L (f, f 0 = f (z f 0 (z dz. It is well kow that L (f, f 0 χ (f, f 0. (47 The proof of the lower boud is based o the followig versio of Le Cam s Lemma (LeCam (973; Re et al. (03; Yu (997. Lemma 5. Let T (θ deote a fuctioal o θ. Suppose that H 0 = {θ 0 }, H 0, H Θ ad d = mi θ H T (θ T (θ 0. Let π deote a prior o the parameter space H. The we have if T sup θ H 0 H P θ ( d T T (θ L (f π, f θ0. (48 The proofs of (30 ad (3 are applicatios of Lemma 5. The key is to costruct the parameter spaces H 0 = {θ 0 }, H ad the prior o H such that (i H 0, H Θ,(iithe L distace L (f π, f θ0 is cotrolled ad (iiithe distace d = mi θ H T (θ T (θ 0 is maximized. I the followig, we provide the detailed proof of (30. The proof of (3 is very similar to that of (30 ad is omitted here. I the discussio of lower boud results, we will assume µ x = µ z = 0 ad the desig X i ad Z i follow joit ormal distributio. We defie the followig two parameter spaces to facilitate the discussio, { G S (k = G (k β M } { } k log p ad G W (k = G (k β C. (49 4 The lower boud (30 ca be decomposed ito the followig three lower bouds, ( if sup P θβ Q Q (β c k log p β Q θ β G S (k 4, (50 7

28 ad ( if sup P θβ Q Q (β c β Q θ β G S (k 4, (5 if Q ( sup P θβ Q Q (β c k log p θ β G W (k 4, (5 where c > 0 is a positive costat. Combiig (50 ad (5, we have ( ( if sup P θβ Q Q (β c β + k log p Q θ β G S (k 4. (53 O G S (k, we have ( β + k log p { ( = max β + k log p, k log p } ad o G W (k, we have k log p { ( = max β + k log p, k log p }. Sice both G S (k ad G W (k are proper subspace of G (k, we ca establish (30 by combiig (53 ad (5. I the followig, we will establish the lower bouds (50, (5 ad (5 separately. Proof of (50 Uder the Gaussia radom desig model, V i = (y i, X i R p+ follows a joit Gaussia distributio with mea 0. Let Σ v deote the covariace matrix of V i. For the idices of Σ v, we use 0 as the idex of y i ad {,, p} as the idices for (X i,, X ip R p. Decompose Σ v ito blocks Σv yy Σ ( v xy, where Σ v yy, Σ v xx ad Σ v xy deote the variace of y i, the variace of X i ad the covariace of y i ad X i, respectively. Let Ω = Σ deote the precisio matrix. There exists a bijective fuctio h : Σ v (β, Ω, σ ad the iverse mappig h : (β, Ω, σ Σ v, where h ((β, Ω, σ = β Ω β + σ β Ω ad Ω β Ω h(σ v = ( (Σ v xx Σ v xy, (Σ v xx, Σ v yy ( Σxy v (Σ v xx Σxy v. (54 Based o the bijectio, it is sufficiet to cotrol the χ distace betwee two multivariate Gaussia distributios. 8 Σ v xy Σ v xx

29 We itroduce the ull parameter space { G 0 = θ 0 = (β, I, σ 0 : β = (β, η 0, 0,, 0 with β > M ad σ 0 = M 8 } G S (k, where η 0 0. Defie p = p. Based o the mappig h, we have the correspodig ull parameter space for Σ v, F 0 = {Σ v 0} where (β + η0 + σ0 β η 0 0 β Σ v 0 = 0 0. η I p p We the itroduce the alterative parameter space for Σ v, which will iduce a parameter space for (β, Ω, σ through the mappig h. Defie F = {Σ v α : α l (p, k, ρ}, where (β + η0 + σ0 β η 0 ρ 0 α β Σ v α = 0 α, (55 η ρ 0 α α 0 I p p with ρ 0 = β + σ 0 ad l (p, k, ρ = {α : α R p, α 0 = k, α i {0, ρ} for i p }. (56 The we costruct the correspodig parameter space G for (β, Ω, σ, which is iduced by the mappig h ad the parameter space F, G = {(β, Ω, σ : (β, Ω, σ = h (Σ v for Σ v F }. (57 For Σ v α, the correspodig (β, Ω, σ defied as h (Σ v α satisfies β = α ρ 0 + β, β α = η 0, β {,} = (ρ 0 β α = σ 0 α, (58 α σ = σ0 α (ρ 0 β σ α 0 M, (59 max { λ mi (Ω, λ max (Ω } α. (60 9

30 Whe α is chose such that α mi { /M, M }, the we have The differece betwee β ad β is M λ mi (Ω λ max (Ω M. (6 β β = α (β ρ 0 α = σ 0 α. α By takig ρ = log(4p /k /, we have β β Ck log p/ ad hece β β Ck log p/ M /4. Combied with (59, (58 ad (6, we show that G G S (k. Let π deote the uiform prior over the parameter space G iduced by the uiform prior of α over l (p, k, ρ where ρ = log(4p /k /. The cotrol of L (f π, f θ0 is established i the followig lemma, which follows from Lemma of Cai ad Guo (06a ad is established i (7. of Cai ad Guo (06a. Lemma 6. Suppose that k c {/log p, p γ }, where 0 γ < ad c is a sufficiet small positive costat. For ρ = log(4p /k /, we establish that α mi { /M, M } ad L (f π, f θ0 4. (6 To apply Lemma 5, we cosider the fuctioal T(θ = β ad calculate the distace d = σ 0 β + ( α α (β σ 0 = α α β + β σ 0 α. (63 By takig β = M / > σ 0 ad η 0 = 0, we have β M / Ck log p/ ad σ α α β + β σ 0 log p α ck σ 0 max { β, β } ck log p σ 0 max { β, β }, where the last iequality follows from the fact that β Ck log pσ 0 / ad β = 0. Combied with (6, a applicatio of Lemma 5 leads to (50. Proof of (5 We costruct the followig parameter spaces, { G 0 = θ 0 = (β, I, σ 0 : β = (β, η 0, 0, 0, ad β M } G S (k { ( } G = θ = (β, I, σ 0 : β = β ɛ + σ 0, η 0, 0, 0 G S (k, where ɛ is a small positive costat to be chose later. The proof of the followig lemma ca be foud i the supplemetary material Sectio C. 30 (64

31 Lemma 7. If ɛ = log (7/6 /, the we have L (f π, f θ0 4. (65 To apply Lemma 5, we take η 0 = 0 ad calculate the distace d = β (β = βσ ɛ + σ ɛ c max{ β, β }. Applyig Lemma 5, we establish (5. Proof of (5 We itroduce the followig ull ad alterative parameter spaces, G 0 = {(β, I, σ : β = (η 0, 0, 0,, 0} G = {(β, I, σ : β = (η 0, 0, α with α l (p, k, ρ}, (66 where l (p, k, ρ = { α : α R p, α 0 = k, α i {0, ρ} }. (67 Let π deote the prior over the parameter space G iduced by the uiform prior of α over l (p, k, ρ where ρ = log(4p /k /. The cotrol of L (f π, f θ0 is established i the followig lemma, which follows from Lemma 7 of Cai ad Guo (06b ad is established i (.6 of Cai ad Guo (06b. Lemma 8. Suppose that k c {/log p, p γ }, where 0 γ < positive costat. For ρ = log(4p /k /, we establish ad c is a sufficiet small L (f π, f θ0 8. (68 By specifyig η 0 = 0, we have G 0 ad G defied i (66 are proper subspaces of the parameter space G W (k. To apply Lemma 5, we calculate the distace d = α ck log p/. Applyig Lemma 5, we establish (5. SUPPLEMENTARY MATERIAL Title: Supplemet to Optimal Estimatio of Co-heritability i High-dimesioal Liear Regressios. (.pdf file 3

32 Refereces Baradat, P. (976, Use of juveile-mature relatioships i idividual selectio icludig iformatio from relatives, Proceedig of IUFRO Meetig o advaced geeratio breedig, Bordeax, pp. 38. Belloi, A., Cherozhukov, V., ad Wag, L. (0, Square-root lasso: pivotal recovery of sparse sigals via coic programmig, Biometrika, 98(4, Bloom, J. S., Ehrereich, I. M., Loo, W. T., Lite, T.-L. V., ad Kruglyak, L. (03, Fidig the sources of missig heritability i a yeast cross, Nature, 494(7436, Cai, T. T., ad Guo, Z. (06a, Cofidece itervals for high-dimesioal liear regressio: miimax rates ad adaptivity, The Aals of Statistics, To appear. Cai, T. T., ad Guo, Z. (06b, Supplemet to Cofidece itervals for high-dimesioal liear regressio: miimax rates ad adaptivity, The Aals of Statistics, To appear. Fa, J., Ha, X., ad Gu, W. (0, Estimatig false discovery proportio uder arbitrary covariace depedece, Joural of the America Statistical Associatio, 07(499, Gola, D., ad Rosset, S. (0, Accurate estimatio of heritability i geome wide studies usig radom effects models, Bioiformatics, 7, i37 i33. Guo, Z., Kag, H., Cai, T. T., ad Small, D. S. (06, Cofidece Iterval for Causal Effects with Possibly Ivalid Istrumets Eve After Cotrollig for May Cofouders, arxiv preprit arxiv: ,. Javamard, A., ad Motaari, A. (04, Cofidece itervals ad hypothesis testig for high-dimesioal regressio, The Joural of Machie Learig Research, 5(, LeCam, L. (973, Covergece of estimates uder dimesioality restrictios, The Aals of Statistics, (,

Optimal Estimation of Genetic Relatedness in High-dimensional Linear Models

Optimal Estimation of Genetic Relatedness in High-dimensional Linear Models Optimal Estimatio of Geetic Relatedess i High-dimesioal Liear Models Abstract Estimatig the geetic relatedess betwee two traits based o the geome-wide associatio data is a importat problem i geetics research.

More information

Accuracy Assessment for High-Dimensional Linear Regression

Accuracy Assessment for High-Dimensional Linear Regression Uiversity of Pesylvaia ScholarlyCommos Statistics Papers Wharto Faculty Research -016 Accuracy Assessmet for High-Dimesioal Liear Regressio Toy Cai Uiversity of Pesylvaia Zijia Guo Uiversity of Pesylvaia

More information

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n. Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator

More information

Summary and Discussion on Simultaneous Analysis of Lasso and Dantzig Selector

Summary and Discussion on Simultaneous Analysis of Lasso and Dantzig Selector Summary ad Discussio o Simultaeous Aalysis of Lasso ad Datzig Selector STAT732, Sprig 28 Duzhe Wag May 4, 28 Abstract This is a discussio o the work i Bickel, Ritov ad Tsybakov (29). We begi with a short

More information

Properties and Hypothesis Testing

Properties and Hypothesis Testing Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.

More information

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y

More information

Semi-supervised Inference for Explained Variance in High-dimensional Linear Regression and Its Applications

Semi-supervised Inference for Explained Variance in High-dimensional Linear Regression and Its Applications Semi-supervised Iferece for Explaied Variace i High-dimesioal Liear Regressio ad Its Applicatios T. Toy Cai ad Zijia Guo Uiversity of Pesylvaia ad Rutgers Uiversity March 8, 08 Abstract We cosider statistical

More information

Efficient GMM LECTURE 12 GMM II

Efficient GMM LECTURE 12 GMM II DECEMBER 1 010 LECTURE 1 II Efficiet The estimator depeds o the choice of the weight matrix A. The efficiet estimator is the oe that has the smallest asymptotic variace amog all estimators defied by differet

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

Lecture 22: Review for Exam 2. 1 Basic Model Assumptions (without Gaussian Noise)

Lecture 22: Review for Exam 2. 1 Basic Model Assumptions (without Gaussian Noise) Lecture 22: Review for Exam 2 Basic Model Assumptios (without Gaussia Noise) We model oe cotiuous respose variable Y, as a liear fuctio of p umerical predictors, plus oise: Y = β 0 + β X +... β p X p +

More information

A statistical method to determine sample size to estimate characteristic value of soil parameters

A statistical method to determine sample size to estimate characteristic value of soil parameters A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig

More information

Confidence Intervals for High-Dimensional Linear Regression: Minimax Rates and Adaptivity

Confidence Intervals for High-Dimensional Linear Regression: Minimax Rates and Adaptivity Uiversity of Pesylvaia ScholarlyCommos Statistics Papers Wharto Faculty Research 5-207 Cofidece Itervals for High-Dimesioal Liear Regressio: Miimax Rates ad Adaptivity Toy Cai Uiversity of Pesylvaia Zijia

More information

Algebra of Least Squares

Algebra of Least Squares October 19, 2018 Algebra of Least Squares Geometry of Least Squares Recall that out data is like a table [Y X] where Y collects observatios o the depedet variable Y ad X collects observatios o the k-dimesioal

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 9 Multicolliearity Dr Shalabh Departmet of Mathematics ad Statistics Idia Istitute of Techology Kapur Multicolliearity diagostics A importat questio that

More information

Investigating the Significance of a Correlation Coefficient using Jackknife Estimates

Investigating the Significance of a Correlation Coefficient using Jackknife Estimates Iteratioal Joural of Scieces: Basic ad Applied Research (IJSBAR) ISSN 2307-4531 (Prit & Olie) http://gssrr.org/idex.php?joural=jouralofbasicadapplied ---------------------------------------------------------------------------------------------------------------------------

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals

More information

Linear Regression Demystified

Linear Regression Demystified Liear Regressio Demystified Liear regressio is a importat subject i statistics. I elemetary statistics courses, formulae related to liear regressio are ofte stated without derivatio. This ote iteds to

More information

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10 DS 00: Priciples ad Techiques of Data Sciece Date: April 3, 208 Name: Hypothesis Testig Discussio #0. Defie these terms below as they relate to hypothesis testig. a) Data Geeratio Model: Solutio: A set

More information

Bayesian Methods: Introduction to Multi-parameter Models

Bayesian Methods: Introduction to Multi-parameter Models Bayesia Methods: Itroductio to Multi-parameter Models Parameter: θ = ( θ, θ) Give Likelihood p(y θ) ad prior p(θ ), the posterior p proportioal to p(y θ) x p(θ ) Margial posterior ( θ, θ y) is Iterested

More information

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information

Information-based Feature Selection

Information-based Feature Selection Iformatio-based Feature Selectio Farza Faria, Abbas Kazeroui, Afshi Babveyh Email: {faria,abbask,afshib}@staford.edu 1 Itroductio Feature selectio is a topic of great iterest i applicatios dealig with

More information

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

Geometry of LS. LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT

Geometry of LS. LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT OCTOBER 7, 2016 LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT Geometry of LS We ca thik of y ad the colums of X as members of the -dimesioal Euclidea space R Oe ca

More information

THE KALMAN FILTER RAUL ROJAS

THE KALMAN FILTER RAUL ROJAS THE KALMAN FILTER RAUL ROJAS Abstract. This paper provides a getle itroductio to the Kalma filter, a umerical method that ca be used for sesor fusio or for calculatio of trajectories. First, we cosider

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

1 of 7 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 6. Order Statistics Defiitios Suppose agai that we have a basic radom experimet, ad that X is a real-valued radom variable

More information

Introductory statistics

Introductory statistics CM9S: Machie Learig for Bioiformatics Lecture - 03/3/06 Itroductory statistics Lecturer: Sriram Sakararama Scribe: Sriram Sakararama We will provide a overview of statistical iferece focussig o the key

More information

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test.

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test. Math 308 Sprig 018 Classes 19 ad 0: Aalysis of Variace (ANOVA) Page 1 of 6 Itroductio ANOVA is a statistical procedure for determiig whether three or more sample meas were draw from populatios with equal

More information

Lecture 24: Variable selection in linear models

Lecture 24: Variable selection in linear models Lecture 24: Variable selectio i liear models Cosider liear model X = Z β + ε, β R p ad Varε = σ 2 I. Like the LSE, the ridge regressio estimator does ot give 0 estimate to a compoet of β eve if that compoet

More information

5.1 Review of Singular Value Decomposition (SVD)

5.1 Review of Singular Value Decomposition (SVD) MGMT 69000: Topics i High-dimesioal Data Aalysis Falll 06 Lecture 5: Spectral Clusterig: Overview (cotd) ad Aalysis Lecturer: Jiamig Xu Scribe: Adarsh Barik, Taotao He, September 3, 06 Outlie Review of

More information

Double Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution

Double Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution Iteratioal Mathematical Forum, Vol., 3, o. 3, 3-53 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/.9/imf.3.335 Double Stage Shrikage Estimator of Two Parameters Geeralized Expoetial Distributio Alaa M.

More information

Statistical Inference Based on Extremum Estimators

Statistical Inference Based on Extremum Estimators T. Rotheberg Fall, 2007 Statistical Iferece Based o Extremum Estimators Itroductio Suppose 0, the true value of a p-dimesioal parameter, is kow to lie i some subset S R p : Ofte we choose to estimate 0

More information

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence Chapter 3 Strog covergece As poited out i the Chapter 2, there are multiple ways to defie the otio of covergece of a sequece of radom variables. That chapter defied covergece i probability, covergece i

More information

ACCURACY ASSESSMENT FOR HIGH-DIMENSIONAL LINEAR REGRESSION 1. BY T. TONY CAI AND ZIJIAN GUO University of Pennsylvania and Rutgers University

ACCURACY ASSESSMENT FOR HIGH-DIMENSIONAL LINEAR REGRESSION 1. BY T. TONY CAI AND ZIJIAN GUO University of Pennsylvania and Rutgers University The Aals of Statistics 018, Vol. 46, No. 4, 1807 1836 https://doi.org/10.114/17-aos1604 Istitute of Mathematical Statistics, 018 ACCURACY ASSESSMENT FOR HIGH-DIMENSIONAL LINEAR REGRESSION 1 BY T. TONY

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference EXST30 Backgroud material Page From the textbook The Statistical Sleuth Mea [0]: I your text the word mea deotes a populatio mea (µ) while the work average deotes a sample average ( ). Variace [0]: The

More information

11 THE GMM ESTIMATION

11 THE GMM ESTIMATION Cotets THE GMM ESTIMATION 2. Cosistecy ad Asymptotic Normality..................... 3.2 Regularity Coditios ad Idetificatio..................... 4.3 The GMM Iterpretatio of the OLS Estimatio.................

More information

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio

More information

Expectation and Variance of a random variable

Expectation and Variance of a random variable Chapter 11 Expectatio ad Variace of a radom variable The aim of this lecture is to defie ad itroduce mathematical Expectatio ad variace of a fuctio of discrete & cotiuous radom variables ad the distributio

More information

REGRESSION WITH QUADRATIC LOSS

REGRESSION WITH QUADRATIC LOSS REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d

More information

Estimation for Complete Data

Estimation for Complete Data Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of

More information

17. Joint distributions of extreme order statistics Lehmann 5.1; Ferguson 15

17. Joint distributions of extreme order statistics Lehmann 5.1; Ferguson 15 17. Joit distributios of extreme order statistics Lehma 5.1; Ferguso 15 I Example 10., we derived the asymptotic distributio of the maximum from a radom sample from a uiform distributio. We did this usig

More information

A Note on Adaptive Group Lasso

A Note on Adaptive Group Lasso A Note o Adaptive Group Lasso Hasheg Wag ad Chelei Leg Pekig Uiversity & Natioal Uiversity of Sigapore July 7, 2006. Abstract Group lasso is a atural extesio of lasso ad selects variables i a grouped maer.

More information

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1 EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum

More information

11 Correlation and Regression

11 Correlation and Regression 11 Correlatio Regressio 11.1 Multivariate Data Ofte we look at data where several variables are recorded for the same idividuals or samplig uits. For example, at a coastal weather statio, we might record

More information

10-701/ Machine Learning Mid-term Exam Solution

10-701/ Machine Learning Mid-term Exam Solution 0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it

More information

A Risk Comparison of Ordinary Least Squares vs Ridge Regression

A Risk Comparison of Ordinary Least Squares vs Ridge Regression Joural of Machie Learig Research 14 (2013) 1505-1511 Submitted 5/12; Revised 3/13; Published 6/13 A Risk Compariso of Ordiary Least Squares vs Ridge Regressio Paramveer S. Dhillo Departmet of Computer

More information

Optimally Sparse SVMs

Optimally Sparse SVMs A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but

More information

w (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ.

w (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ. 2 5. Weighted umber of late jobs 5.1. Release dates ad due dates: maximimizig the weight of o-time jobs Oce we add release dates, miimizig the umber of late jobs becomes a sigificatly harder problem. For

More information

arxiv: v1 [math.pr] 13 Oct 2011

arxiv: v1 [math.pr] 13 Oct 2011 A tail iequality for quadratic forms of subgaussia radom vectors Daiel Hsu, Sham M. Kakade,, ad Tog Zhag 3 arxiv:0.84v math.pr] 3 Oct 0 Microsoft Research New Eglad Departmet of Statistics, Wharto School,

More information

Rates of Convergence by Moduli of Continuity

Rates of Convergence by Moduli of Continuity Rates of Covergece by Moduli of Cotiuity Joh Duchi: Notes for Statistics 300b March, 017 1 Itroductio I this ote, we give a presetatio showig the importace, ad relatioship betwee, the modulis of cotiuity

More information

10. Comparative Tests among Spatial Regression Models. Here we revisit the example in Section 8.1 of estimating the mean of a normal random

10. Comparative Tests among Spatial Regression Models. Here we revisit the example in Section 8.1 of estimating the mean of a normal random Part III. Areal Data Aalysis 0. Comparative Tests amog Spatial Regressio Models While the otio of relative likelihood values for differet models is somewhat difficult to iterpret directly (as metioed above),

More information

Definitions and Theorems. where x are the decision variables. c, b, and a are constant coefficients.

Definitions and Theorems. where x are the decision variables. c, b, and a are constant coefficients. Defiitios ad Theorems Remember the scalar form of the liear programmig problem, Miimize, Subject to, f(x) = c i x i a 1i x i = b 1 a mi x i = b m x i 0 i = 1,2,, where x are the decisio variables. c, b,

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

CS284A: Representations and Algorithms in Molecular Biology

CS284A: Representations and Algorithms in Molecular Biology CS284A: Represetatios ad Algorithms i Molecular Biology Scribe Notes o Lectures 3 & 4: Motif Discovery via Eumeratio & Motif Represetatio Usig Positio Weight Matrix Joshua Gervi Based o presetatios by

More information

It should be unbiased, or approximately unbiased. Variance of the variance estimator should be small. That is, the variance estimator is stable.

It should be unbiased, or approximately unbiased. Variance of the variance estimator should be small. That is, the variance estimator is stable. Chapter 10 Variace Estimatio 10.1 Itroductio Variace estimatio is a importat practical problem i survey samplig. Variace estimates are used i two purposes. Oe is the aalytic purpose such as costructig

More information

The Choquet Integral with Respect to Fuzzy-Valued Set Functions

The Choquet Integral with Respect to Fuzzy-Valued Set Functions The Choquet Itegral with Respect to Fuzzy-Valued Set Fuctios Weiwei Zhag Abstract The Choquet itegral with respect to real-valued oadditive set fuctios, such as siged efficiecy measures, has bee used i

More information

Stochastic Simulation

Stochastic Simulation Stochastic Simulatio 1 Itroductio Readig Assigmet: Read Chapter 1 of text. We shall itroduce may of the key issues to be discussed i this course via a couple of model problems. Model Problem 1 (Jackso

More information

Open book and notes. 120 minutes. Cover page and six pages of exam. No calculators.

Open book and notes. 120 minutes. Cover page and six pages of exam. No calculators. IE 330 Seat # Ope book ad otes 120 miutes Cover page ad six pages of exam No calculators Score Fial Exam (example) Schmeiser Ope book ad otes No calculator 120 miutes 1 True or false (for each, 2 poits

More information

ECON 3150/4150, Spring term Lecture 3

ECON 3150/4150, Spring term Lecture 3 Itroductio Fidig the best fit by regressio Residuals ad R-sq Regressio ad causality Summary ad ext step ECON 3150/4150, Sprig term 2014. Lecture 3 Ragar Nymoe Uiversity of Oslo 21 Jauary 2014 1 / 30 Itroductio

More information

Correlation Regression

Correlation Regression Correlatio Regressio While correlatio methods measure the stregth of a liear relatioship betwee two variables, we might wish to go a little further: How much does oe variable chage for a give chage i aother

More information

Empirical Process Theory and Oracle Inequalities

Empirical Process Theory and Oracle Inequalities Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi

More information

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen) Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................

More information

Statistics 511 Additional Materials

Statistics 511 Additional Materials Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Random Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A.

Random Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A. Radom Walks o Discrete ad Cotiuous Circles by Jeffrey S. Rosethal School of Mathematics, Uiversity of Miesota, Mieapolis, MN, U.S.A. 55455 (Appeared i Joural of Applied Probability 30 (1993), 780 789.)

More information

Chapter 6 Sampling Distributions

Chapter 6 Sampling Distributions Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to

More information

Stat 421-SP2012 Interval Estimation Section

Stat 421-SP2012 Interval Estimation Section Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible

More information

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality

More information

Lecture 19: Convergence

Lecture 19: Convergence Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may

More information

Regression with quadratic loss

Regression with quadratic loss Regressio with quadratic loss Maxim Ragisky October 13, 2015 Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X,Y, where, as before,

More information

Output Analysis and Run-Length Control

Output Analysis and Run-Length Control IEOR E4703: Mote Carlo Simulatio Columbia Uiversity c 2017 by Marti Haugh Output Aalysis ad Ru-Legth Cotrol I these otes we describe how the Cetral Limit Theorem ca be used to costruct approximate (1 α%

More information

Machine Learning for Data Science (CS 4786)

Machine Learning for Data Science (CS 4786) Machie Learig for Data Sciece CS 4786) Lecture & 3: Pricipal Compoet Aalysis The text i black outlies high level ideas. The text i blue provides simple mathematical details to derive or get to the algorithm

More information

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices Radom Matrices with Blocks of Itermediate Scale Strogly Correlated Bad Matrices Jiayi Tog Advisor: Dr. Todd Kemp May 30, 07 Departmet of Mathematics Uiversity of Califoria, Sa Diego Cotets Itroductio Notatio

More information

Asymptotic distribution of the first-stage F-statistic under weak IVs

Asymptotic distribution of the first-stage F-statistic under weak IVs November 6 Eco 59A WEAK INSTRUMENTS III Testig for Weak Istrumets From the results discussed i Weak Istrumets II we kow that at least i the case of a sigle edogeous regressor there are weak-idetificatio-robust

More information

1 Review and Overview

1 Review and Overview DRAFT a fial versio will be posted shortly CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #3 Scribe: Migda Qiao October 1, 2013 1 Review ad Overview I the first half of this course,

More information

Problem Set 4 Due Oct, 12

Problem Set 4 Due Oct, 12 EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios

More information

Lecture 2: Monte Carlo Simulation

Lecture 2: Monte Carlo Simulation STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?

More information

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight)

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight) Tests of Hypotheses Based o a Sigle Sample Devore Chapter Eight MATH-252-01: Probability ad Statistics II Sprig 2018 Cotets 1 Hypothesis Tests illustrated with z-tests 1 1.1 Overview of Hypothesis Testig..........

More information

An Introduction to Randomized Algorithms

An Introduction to Randomized Algorithms A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis

More information

Quantile regression with multilayer perceptrons.

Quantile regression with multilayer perceptrons. Quatile regressio with multilayer perceptros. S.-F. Dimby ad J. Rykiewicz Uiversite Paris 1 - SAMM 90 Rue de Tolbiac, 75013 Paris - Frace Abstract. We cosider oliear quatile regressio ivolvig multilayer

More information

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS J. Japa Statist. Soc. Vol. 41 No. 1 2011 67 73 A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS Yoichi Nishiyama* We cosider k-sample ad chage poit problems for idepedet data i a

More information

A survey on penalized empirical risk minimization Sara A. van de Geer

A survey on penalized empirical risk minimization Sara A. van de Geer A survey o pealized empirical risk miimizatio Sara A. va de Geer We address the questio how to choose the pealty i empirical risk miimizatio. Roughly speakig, this pealty should be a good boud for the

More information

Lecture 12: February 28

Lecture 12: February 28 10-716: Advaced Machie Learig Sprig 2019 Lecture 12: February 28 Lecturer: Pradeep Ravikumar Scribes: Jacob Tyo, Rishub Jai, Ojash Neopae Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1. Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chi-square Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak

More information

This is an introductory course in Analysis of Variance and Design of Experiments.

This is an introductory course in Analysis of Variance and Design of Experiments. 1 Notes for M 384E, Wedesday, Jauary 21, 2009 (Please ote: I will ot pass out hard-copy class otes i future classes. If there are writte class otes, they will be posted o the web by the ight before class

More information

Linear Regression Models

Linear Regression Models Liear Regressio Models Dr. Joh Mellor-Crummey Departmet of Computer Sciece Rice Uiversity johmc@cs.rice.edu COMP 528 Lecture 9 15 February 2005 Goals for Today Uderstad how to Use scatter diagrams to ispect

More information

Output Analysis (2, Chapters 10 &11 Law)

Output Analysis (2, Chapters 10 &11 Law) B. Maddah ENMG 6 Simulatio Output Aalysis (, Chapters 10 &11 Law) Comparig alterative system cofiguratio Sice the output of a simulatio is radom, the comparig differet systems via simulatio should be doe

More information

Soo King Lim Figure 1: Figure 2: Figure 3: Figure 4: Figure 5: Figure 6: Figure 7:

Soo King Lim Figure 1: Figure 2: Figure 3: Figure 4: Figure 5: Figure 6: Figure 7: 0 Multivariate Cotrol Chart 3 Multivariate Normal Distributio 5 Estimatio of the Mea ad Covariace Matrix 6 Hotellig s Cotrol Chart 6 Hotellig s Square 8 Average Value of k Subgroups 0 Example 3 3 Value

More information

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4.

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4. 4. BASES I BAACH SPACES 39 4. BASES I BAACH SPACES Sice a Baach space X is a vector space, it must possess a Hamel, or vector space, basis, i.e., a subset {x γ } γ Γ whose fiite liear spa is all of X ad

More information

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4 MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.

More information

6.867 Machine learning, lecture 7 (Jaakkola) 1

6.867 Machine learning, lecture 7 (Jaakkola) 1 6.867 Machie learig, lecture 7 (Jaakkola) 1 Lecture topics: Kerel form of liear regressio Kerels, examples, costructio, properties Liear regressio ad kerels Cosider a slightly simpler model where we omit

More information

32 estimating the cumulative distribution function

32 estimating the cumulative distribution function 32 estimatig the cumulative distributio fuctio 4.6 types of cofidece itervals/bads Let F be a class of distributio fuctios F ad let θ be some quatity of iterest, such as the mea of F or the whole fuctio

More information

Convergence of random variables. (telegram style notes) P.J.C. Spreij

Convergence of random variables. (telegram style notes) P.J.C. Spreij Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space

More information

Discrete-Time Systems, LTI Systems, and Discrete-Time Convolution

Discrete-Time Systems, LTI Systems, and Discrete-Time Convolution EEL5: Discrete-Time Sigals ad Systems. Itroductio I this set of otes, we begi our mathematical treatmet of discrete-time s. As show i Figure, a discrete-time operates or trasforms some iput sequece x [

More information

Notes for Lecture 11

Notes for Lecture 11 U.C. Berkeley CS78: Computatioal Complexity Hadout N Professor Luca Trevisa 3/4/008 Notes for Lecture Eigevalues, Expasio, ad Radom Walks As usual by ow, let G = (V, E) be a udirected d-regular graph with

More information

Statistical and Mathematical Methods DS-GA 1002 December 8, Sample Final Problems Solutions

Statistical and Mathematical Methods DS-GA 1002 December 8, Sample Final Problems Solutions Statistical ad Mathematical Methods DS-GA 00 December 8, 05. Short questios Sample Fial Problems Solutios a. Ax b has a solutio if b is i the rage of A. The dimesio of the rage of A is because A has liearly-idepedet

More information