Robust Sparse Estimation of Multiresponse Regression and Inverse Covariance Matrix via the L2 distance


Aurélie C. Lozano, IBM Watson Research Center, Yorktown Heights, New York
Huijing Jiang, IBM Watson Research Center, Yorktown Heights, New York
Xinwei Deng, Virginia Tech, Blacksburg, Virginia

ABSTRACT

We propose a robust framework to jointly perform two key modeling tasks involving high dimensional data: (i) learning a sparse functional mapping from multiple predictors to multiple responses while taking advantage of the coupling among responses, and (ii) estimating the conditional dependency structure among responses while adjusting for their predictors. Traditional likelihood-based estimators lack resilience with respect to outliers and model misspecification, an issue that is exacerbated when dealing with high dimensional noisy data. In this work, we propose instead to minimize a regularized distance criterion, which is motivated by the minimum distance functionals used in nonparametric methods for their excellent robustness properties. The proposed estimates can be obtained efficiently by leveraging a sequential quadratic programming algorithm. We provide theoretical justification such as estimation consistency for the proposed estimator. Additionally, we shed light on the robustness of our estimator through its linearization, which yields a combination of weighted lasso and graphical lasso, with the sample weights providing an intuitive explanation of the robustness. We demonstrate the merits of our framework through a simulation study and the analysis of real financial and genetics data.

Categories and Subject Descriptors: I.2.6 [Artificial Intelligence]: Learning

Keywords: Robust estimation, high dimensional data, sparse learning, variable selection, multiresponse regression, inverse covariance, L2E

KDD'13, August 11-14, 2013, Chicago, Illinois, USA. Copyright 2013 ACM.

1. INTRODUCTION

We focus on multiresponse regression where both predictor and response spaces may exhibit high dimensions. We propose a robust framework to jointly and synergistically solve two important tasks: (i) learning the sparse functional mapping between inputs and outputs while taking advantage of the coupling among responses, and (ii) estimating the conditional dependency structure among responses while adjusting for the covariates. This is motivated by the crucial need of integrating genomic and transcriptomic datasets in computational biology in order to solve two fundamental problems effectively: identifying the genetic variations in the genome that influence gene expression levels (a.k.a. expression quantitative trait loci, eQTL, mapping), and uncovering gene expression networks. In fact, the accuracy of the first problem can then be improved by leveraging the gene relatedness, and similarly an accurate and faithful estimation of the gene expression networks can be obtained by accounting for the confounding genetic effects on gene expression.

Multiresponse regression [5] generalizes the basic single-response regression to model multiple responses that may significantly correlate with each other.
As opposed to treating each response independently, one can jointly learn multiple regression mappings to improve the estimation and prediction accuracy by exploiting the conditional dependencies among responses. Variable selection in multiresponse regression can be accomplished via penalized approaches including the lasso [31] and the multitask lasso [24]. Sparse estimation of the inverse covariance matrix is an important area of multivariate analysis with broad applications in graphical models. A major focus in this area is that of penalized maximum likelihood formulations [15, 34, 1, 16]. Alternatively, modified Cholesky decompositions based on the likelihood can be used to estimate the sparse inverse covariance [17, 4, 21]. A simpler approach, neighborhood selection [22], estimates sparse graphical models by using the lasso to regress on each variable with the others as predictors.

Combining multiresponse regression and inverse covariance estimation has recently begun to attract more attention in the machine learning community. Rothman et al. [27] proposed multivariate regression with covariance estimation (MRCE) to jointly estimate the sparse regression and inverse covariance matrices. They demonstrated that exploiting the correlation structures can significantly improve the prediction accuracy. The same model was also studied by Lee and Liu [20], who provided some theoretical properties for their developed method. An alternative parameterization was considered by Sohn and Kim [29], which is based on the joint distribution of predictors and responses and yields an l1-penalized conditional Gaussian graphical model (l1-CGGM). Another relevant method is that of covariate adjusted precision matrix estimation (CAPME) [7], a two-stage approach to estimate the conditional dependency structure among response variables by adjusting for covariates.

The first stage is to estimate the regression coefficients via a multivariate extension of the l1 Dantzig selector [8], and the second stage is to estimate the inverse covariance matrix using an l1 error with an l1 penalty.

Robustness is an important aspect often overlooked in the sparse learning literature, while critical when dealing with high dimensional noisy data. Traditional likelihood-based estimators such as MRCE and l1-CGGM lack resilience to outliers and model misspecification. Additionally, to the best of our knowledge, estimates based on the Dantzig selector have not been compared to lasso counterparts in terms of robustness. Thus it is unclear whether CAPME, for instance, can address the robustness issue. There is limited existing work on robust sparse learning methods in high-dimensional modeling. The LAD-lasso [32] performs single response regression using the least absolute deviation combined with an l1 penalty. The tlasso [14] performs inverse covariance estimation using a penalized log-likelihood with the multivariate t distribution. However, neither of these methods can be easily extended to the setting of this paper.

We propose a robust approach to jointly estimate multiresponse regression and the inverse covariance matrix. Our approach is based on a regularized distance criterion motivated by minimum distance estimators. Minimum distance estimators [33] were popularized in nonparametric methods and have exhibited excellent robustness properties [3, 12]. Their use for parametric estimation has been discussed in [28, 2]. In this work, we propose a penalized minimum distance criterion for robust estimation of sparse parametric models in high dimensional settings. Our key contributions are as follows. The objective, denoted REG-ISE, is based on the integrated squared error (ISE) distance between the model and the true distribution, and imposes the sparse model structure by adding sparsity-inducing penalties to the ISE criterion. Theoretical guarantees are provided on the estimation consistency of the proposed REG-ISE estimator. We leverage a sequential quadratic programming algorithm [9] to efficiently solve our objective. We shed light on the robustness of our framework by linearizing our objective. The linearization yields a problem combining weighted versions of l1-penalized regression (lasso) and l1-penalized inverse covariance estimation (glasso), where the weights assigned to the instances are theoretically derived and can be interpreted in terms of outlying degrees. We propose modified cross-validation and hold-out validation methods for the choice of tuning parameters, which are also applicable to other penalized regression methods. The strength of our method is demonstrated via simulation data with and without outliers. Our study also confirms that outliers can severely influence the variable selection accuracy of some existing sparse learning methods. Experiments on real financial and eQTL data further illustrate the merits of the proposed method.

2. MODEL SETUP

Denote the response vector y = (y_1, ..., y_q)' ∈ R^q and the predictor vector x = (x_1, ..., x_p)' ∈ R^p. We consider the multiresponse linear regression model

    y = B'x + ε,  ε ∼ N(0, Σ),   (1)

where B = (b_ij) is a p × q matrix of coefficients whose kth column contains the coefficients associated with the kth response y_k regressed on the predictors x. The q × q covariance matrix Σ describes the covariance structure of the response vector y given the predictors x. Moreover, its inverse Σ^{-1} = (c_ij) represents the partial covariance structure [19] and has been widely used to learn a sparse graphical model under the Gaussian assumption.
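To make the setup concrete, the following minimal sketch (our own illustration, not from the paper; Python/NumPy, using the Case 1 dimensions of Section 4) draws a dataset from model (1) with a sparse B and an AR(1) error covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 100, 60, 20                       # observations, predictors, responses

B = np.zeros((p, q))                        # sparse p x q coefficient matrix
support = rng.random((p, q)) < 0.1          # ~10% nonzero entries (illustrative)
B[support] = rng.normal(size=support.sum())

# AR(1) covariance: Sigma_{ij} = 0.7^{|i-j|}
Sigma = 0.7 ** np.abs(np.subtract.outer(np.arange(q), np.arange(q)))

X = rng.normal(size=(n, p))                              # centered predictors
E = rng.multivariate_normal(np.zeros(q), Sigma, size=n)  # correlated noise
Y = X @ B + E                                            # model (1): y = B'x + eps
```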
Note that Σ in (1) captures the partial covariances among the responses y after adjusting for the effects of the covariates x. For simplicity of notation, we assume the data are centered so that the model (1) does not contain intercepts. Suppose there are n observational vectors x_i = (x_{i1}, ..., x_{ip})', i = 1, ..., n, with corresponding response vectors y_i = (y_{i1}, ..., y_{iq})'. To jointly obtain sparse estimates of the coefficient matrix B and the precision matrix Σ^{-1}, we consider a loss function L_n(B, Σ^{-1}) that measures the goodness-of-fit on the multivariate response. The sparse structures of B and Σ^{-1} are encouraged by using l1 penalties. Specifically, the penalized loss function L_{n,λ}(B, Σ^{-1}) is written as

    L_{n,λ}(B, Σ^{-1}) = L_n(B, Σ^{-1}) + λ_1 ‖Σ^{-1}‖_1 + λ_2 ‖B‖_1,   (2)

where ‖Σ^{-1}‖_1 = ∑_{i≠j} |c_ij| and ‖B‖_1 = ∑_{i,j} |b_ij| are l1 matrix norms. Following the principle of parsimony, we consider l1 penalty functions to seek a most appropriate model that adequately explains the data. With carefully selected tuning parameters λ_1 and λ_2, we can achieve an optimal trade-off between the parsimoniousness and the goodness-of-fit of the model.

The loss L_n(B, Σ^{-1}) is typically derived from a likelihood-based approach. For instance, the MRCE method [27] uses

    L_n(B, Σ^{-1}) = (1/n) trace((Y − XB)'(Y − XB) Σ^{-1}) − log|Σ^{-1}|.

Alternatively, if one ignores the contribution of the inverse covariance matrix (by implicitly assuming that it is the identity matrix), one can consider the traditional squared loss L_n = ‖Y − XB‖_F^2 and a straightforward generalization of the traditional lasso estimator [31] to the multiresponse setting.
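For reference, the likelihood-based objective translates into a few lines of code. The sketch below (ours, hedged) evaluates the penalized loss (2) with the MRCE loss plugged in; `Omega` stands for Σ^{-1}, and `lam1`, `lam2` for λ_1, λ_2:

```python
import numpy as np

def mrce_penalized_loss(B, Omega, X, Y, lam1, lam2):
    """Penalized loss (2) with the MRCE negative log-likelihood loss."""
    n = X.shape[0]
    R = Y - X @ B                               # residual matrix Y - XB
    fit = np.trace(R.T @ R @ Omega) / n         # (1/n) tr((Y-XB)'(Y-XB) Omega)
    _, logdet = np.linalg.slogdet(Omega)        # log|Omega|, numerically stable
    off_diag = np.abs(Omega).sum() - np.abs(np.diag(Omega)).sum()
    return fit - logdet + lam1 * off_diag + lam2 * np.abs(B).sum()
```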

3. A REGULARIZED INTEGRATED SQUARED ERROR ESTIMATOR

We begin this section by showing how a minimum distance criterion yields our proposed estimator for achieving robustness under the model in (1).

3.1 Derivation of the REG-ISE Objective

We first apply the integrated squared error (ISE) criterion to the conditional distribution of the response vector y given the predictors x. It leads to an L2 distance between the true conditional distribution f(y|x) and the parametric distribution f(y|x; B, Σ) as follows:

    L(B, Σ) = ∫ [f(y|x; B, Σ) − f(y|x)]^2 dy
            = ∫ f^2(y|x; B, Σ) dy − 2 ∫ f(y|x; B, Σ) f(y|x) dy + ∫ f^2(y|x) dy
            = ∫ f^2(y|x; B, Σ) dy − 2 E[f(y|x; B, Σ)] + constant,   (3)

where f(y|x; B, Σ) is the probability density function of the multivariate normal N(B'x, Σ) and ∫ f^2(y|x) dy is a constant independent of B and Σ. Note that f(y|x; B, Σ) ≡ f(y − B'x; Σ) because of the conditional distribution assumption. Since the ε_i = y_i − B'x_i are independently and identically distributed, one can consider approximating E[f(y|x; B, Σ)] by the empirical mean (1/n) ∑_{i=1}^n f(y_i|x_i; B, Σ). Similar approximation techniques have also been used for Gaussian mixture density estimation [28]. The resulting empirical loss function of L(B, Σ) can therefore be written as

    L_n(B, Σ) = ∫ f^2(y|x; B, Σ) dy − (2/n) ∑_{i=1}^n f(y_i|x_i; B, Σ),

where ∫ f^2(y|x; B, Σ) dy = 1/(2^q π^{q/2} |Σ|^{1/2}). Note that we assume a parametric family for the model while using a nonparametric ISE criterion to measure goodness of fit. From the perspective of the loss function, ISE is a more robust measure of goodness-of-fit than the likelihood-based loss function. It can match the model with the largest portion of the data because the integration in (3) accounts for the whole range of the squared loss function. Using the ISE criterion as the loss function, the objective function in (2) becomes

    L_{n,λ}(B, Σ^{-1}) = 1/(2^q π^{q/2} |Σ|^{1/2}) − (2/n) ∑_{i=1}^n f(y_i|x_i; B, Σ) + λ_1 ‖Σ^{-1}‖_1 + λ_2 ‖B‖_1.   (4)

However, the minimization of the objective in (4) is challenging. To circumvent this difficulty, we consider minimizing an upper bound of (4) which retains the robustness property. For that purpose, we introduce a lemma which is essential for deriving the proposed objective (8) for estimating the sparse multiresponse regression model in (1).

Lemma 1. For a positive definite matrix Σ^{-1} of dimension q, the relation between its determinant and its l1 norm can be described by the following inequality:

    |Σ|^{-1/2} ≤ (‖Σ^{-1}‖_1 / q)^{q/2}.

The proof of Lemma 1 is provided in the Appendix. Using Lemma 1, we can derive an upper bound for the objective function (4) as follows:

    c ‖Σ^{-1}‖_1^{q/2} − (2/n) ∑_{i=1}^n f(y_i|x_i; B, Σ) + λ_1 ‖Σ^{-1}‖_1 + λ_2 ‖B‖_1,   (5)

where c = 1/(2^q (πq)^{q/2}) is a constant. The above optimization problem amounts to minimizing

    L̃_{n,λ}(B, Σ^{-1}) = L̃_n(B, Σ^{-1}) + λ̃_1 ‖Σ^{-1}‖_1 + λ_2 ‖B‖_1,   (6)

where L̃_n(B, Σ^{-1}) = −(2/n) ∑_{i=1}^n f(y_i|x_i; B, Σ) and λ̃_1 is appropriately chosen. The value of λ̃_1 here should be slightly larger than the value of λ_1 in (4). Moreover, the diagonal elements of Σ^{-1} are also penalized. Note that L̃_{n,λ}(B, Σ^{-1}) is an upper bound of L_{n,λ}(B, Σ^{-1}); however, the difference L̃_{n,λ}(B, Σ^{-1}) − L_{n,λ}(B, Σ^{-1}) is well controlled by the penalty term λ̃_1 ‖Σ^{-1}‖_1 in (6). By properly adjusting the value of λ̃_1, we can make the difference reasonably small. Therefore, by minimizing L̃_{n,λ}(B, Σ^{-1}), we expect to approach the solution of L_{n,λ}(B, Σ^{-1}) and thus still retain the robustness property in the estimators.

Taking the logarithm of L̃_n(B, Σ^{-1}), we obtain the loss

    L_n(B, Σ^{-1}) = −log[(1/n) ∑_{i=1}^n exp(−(1/2)(y_i − B'x_i)' Σ^{-1} (y_i − B'x_i))] − (1/2) log|Σ^{-1}|.   (7)

We note that the logarithm is employed to strike a better balance between the goodness of fit and the sparsity-inducing penalty (similarly, one considers the penalized negative log-likelihood rather than dealing with the likelihood directly). This yields the estimator proposed and studied in this paper, the Regularized Integrated Squared Error (REG-ISE) estimator, which minimizes the following objective function:

    L_{n,λ}(B, Σ^{-1}) = −log[(1/n) ∑_{i=1}^n exp(−(1/2)(y_i − B'x_i)' Σ^{-1} (y_i − B'x_i))] − (1/2) log|Σ^{-1}| + λ_1 ‖Σ^{-1}‖_1 + λ_2 ‖B‖_1.   (8)

For notational convenience, here and in later sections, we use λ_1 for λ̃_1.
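The objective (8) translates directly into code. The sketch below (our own, not the paper's implementation) evaluates it with a log-sum-exp guard for numerical stability and, as in (6), penalizes all entries of Ω = Σ^{-1}:

```python
import numpy as np

def reg_ise_objective(B, Omega, X, Y, lam1, lam2):
    """Evaluate the REG-ISE objective (8)."""
    R = Y - X @ B                                       # residuals y_i - B'x_i
    g = -0.5 * np.einsum('ij,jk,ik->i', R, Omega, R)    # g_i = -1/2 r_i' Omega r_i
    m = g.max()                                         # log-sum-exp stabilizer
    log_mean_exp = m + np.log(np.mean(np.exp(g - m)))   # log((1/n) sum_i exp(g_i))
    _, logdet = np.linalg.slogdet(Omega)                # log|Omega|
    return (-log_mean_exp - 0.5 * logdet
            + lam1 * np.abs(Omega).sum() + lam2 * np.abs(B).sum())
```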
Some intuition about the robustness of the objective can already be gained by considering the ratio between the data and model pdfs, f(y|x)/f(y|x; B, Σ). An outlier in the data may drive this ratio to infinity, in which case the log-likelihood is also infinite. In contrast, the difference f(y|x) − f(y|x; B, Σ) is always bounded. This property makes the L2 distance a favourable choice when dealing with outliers. We note that a similar reasoning holds in the context of density estimation, as pointed out in the recent work of [30] on density-difference estimation.
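This boundedness is easy to check numerically. In the sketch below (a univariate illustration of ours, not from the paper), the negative log-density of a point grows without bound as it drifts away from the model, while its contribution to the empirical ISE term −(2/n) ∑_i f(y_i|x_i; B, Σ) stays within a bounded range:

```python
import numpy as np
from scipy.stats import norm

for y in [0.0, 5.0, 50.0, 500.0]:
    nll = -norm.logpdf(y)        # likelihood-based loss term: unbounded in y
    ise = -2.0 * norm.pdf(y)     # ISE data term: always in [-2*norm.pdf(0), 0]
    print(f"y = {y:6.1f}   neg. log-lik = {nll:12.2f}   ISE term = {ise:9.6f}")
```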

3.2 Optimization

The REG-ISE objective function (8) is non-convex and non-smooth. In order to solve it, one could consider approximating the log-sum-exp term in the objective, combined with alternate optimization over B and Σ^{-1} (as done in MRCE). However, the convergence of alternate optimization can be very slow, as observed in the case of MRCE. Instead, we propose to adopt a sequential quadratic programming algorithm recently developed by Curtis & Overton [9] for non-smooth and non-convex optimization. The basic idea of their algorithm, SLQP-GS, is to combine sequential quadratic approximation with a process of gradient sampling so that the computation of the search direction is effective in non-smooth regions. The only requirement for SLQP-GS to be applicable is that the objective and constraints (if any) be continuously differentiable on open dense subsets, which is satisfied in our case. We also benefit from the convergence guarantees of SLQP-GS, namely that the algorithm is guaranteed to converge to a solution regardless of the initialization with probability one. We employed the Matlab implementation of SLQP-GS provided by the authors, which is available from coral.ie.lehigh.edu/~frankecurtis/software. Due to space constraints, we refer the reader to Curtis & Overton [9] for details on the algorithm, its Matlab implementation and convergence results. We note that the gradient sampling step in SLQP-GS can be efficiently parallelized for fast computation in high dimensional applications. Alternatively, one can perform adaptive sampling of gradients over the course of the optimization process, as described in [10].

3.3 Consistency Results

L2 distance estimators are known to strike the right balance between statistical efficiency and robustness [2, 28]. In this section, we add to this body of evidence by showing that the REG-ISE estimator is root-n consistent in the setting of fixed dimensionality p. Denote by B̄ the true regression coefficient matrix, and by Σ̄ the true covariance matrix. We assume the following conditions:

(C1) (1/n) X'X → A, where A is positive definite.
(C2) There exist √n-consistent estimators of B̄ and Σ̄^{-1}.

Condition (C2) can be replaced by technical regularity conditions as in [13] so as to guarantee the consistency of ordinary maximum likelihood estimators.

Theorem 1. Consider sequences λ_{1,n} and λ_{2,n} of regularization parameters such that λ_{1,n} n^{1/2} → 0 and λ_{2,n} n^{1/2} → 0. Then, under conditions (C1) and (C2), there exists a local minimizer (B̂, Σ̂^{-1}) of REG-ISE such that

    ‖(vec(B̂)', vec(Σ̂^{-1})')' − (vec(B̄)', vec(Σ̄^{-1})')'‖ = O_p(1/√n).

The proof is provided in the Appendix. The above theoretical result holds for the case where the dimensionality p is fixed, while the sample size n is allowed to grow. As future work we plan to extend our results to the case where p is allowed to grow with the sample size. We conjecture that condition (C1) might be replaced by conditions on the sample and population covariance, while (C2) is still achieved by certain penalized maximum likelihood estimators [23]. We note, however, that such an extension is highly non-trivial. In fact, to our knowledge, no theory is yet available for joint estimation in the high dimensional case even for the standard maximum likelihood estimator, the non-convexity being the source of the difficulty. Nevertheless, in view of our superior empirical results, we hope that theoretical results can be obtained by making use of the techniques in [26], which solely deal with inverse covariance estimation, and those in [23], which solely concern regression.
3.4 Insights into Robustness

We provide some insights into the robustness of REG-ISE by considering a first-order approximation of the log-sum-exp term in (8). Define the parameter set β = (B, Σ^{-1}) and denote g_i(β) = −(1/2)(y_i − B'x_i)' Σ^{-1} (y_i − B'x_i). We consider a first-order approximation of −log[(1/n) ∑_i exp(g_i(β))] with respect to β as follows:

    −log[(1/n) ∑_{i=1}^n exp(g_i(β))] ≈ C_0 − ∑_{i=1}^n [exp(g_i(β^0)) / ∑_{i'=1}^n exp(g_{i'}(β^0))] ∇g_i(β^0)'(β − β^0),

where β^0 is an initial estimate and C_0 is some constant independent of β. Using the fact that g_i(β) ≈ g_i(β^0) + ∇g_i(β^0)'(β − β^0), we have

    −log[(1/n) ∑_{i=1}^n exp(g_i(β))] ≈ −∑_{i=1}^n [exp(g_i(β^0)) / ∑_{i'=1}^n exp(g_{i'}(β^0))] g_i(β),

up to some constant independent of β. Therefore, the objective function (8) can be approximated by

    −log|Σ^{-1}| + ∑_{i=1}^n w_i (y_i − B'x_i)' Σ^{-1} (y_i − B'x_i) + λ_1 ‖Σ^{-1}‖_1 + λ_2 ‖B‖_1,   (9)

up to some constant, where

    w_i ≡ w_i(β^0) = exp(g_i(β^0)) / ∑_{i'=1}^n exp(g_{i'}(β^0)).   (10)

By defining S_n ≡ S_n(β^0) = ∑_{i=1}^n w_i (y_i − B'x_i)(y_i − B'x_i)', we can rewrite (9) as

    −log|Σ^{-1}| + trace(Σ^{-1} S_n(β^0)) + λ_1 ‖Σ^{-1}‖_1 + λ_2 ‖B‖_1.   (11)

Note that S_n can be viewed as a weighted sample covariance matrix, where the weights are attached to the observations. One could then envision an approximate iterative procedure where, given initial estimates, the data are first re-weighted by the w_i in (10) and then alternately passed to lasso and l1-penalized inverse covariance solvers (e.g. QUIC, glasso) to provide new estimates, the procedure being repeated until convergence (see details in the Appendix). This intuitively elaborates the robustness property of REG-ISE. Indeed, the weights w_i are proportional to the likelihood functions of the individual data points, i.e., w_i = L(y_i|x_i; β^0) / ∑_{i'=1}^n L(y_{i'}|x_{i'}; β^0). Thus data with high likelihood values are given more weight in the estimation. Conversely, data with low likelihood values, which are more likely to be outliers, contribute less to the estimation. The connection between the likelihood functions and the weights nicely explains the resilience of the proposed estimator to outliers.
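A hedged sketch of this re-weighting scheme follows (our own, using scikit-learn's graphical_lasso and Lasso as the two off-the-shelf solvers). Note that the exact B-step in (9) couples the responses through Σ^{-1}; for simplicity this version drops that coupling and applies only the row weights w_i, via √w_i scaling, to each response:

```python
import numpy as np
from sklearn.covariance import graphical_lasso
from sklearn.linear_model import Lasso

def reweighted_fit(X, Y, lam1, lam2, n_iter=10):
    """Approximate iterative procedure: compute the weights (10), then
    alternate a graphical-lasso step on the weighted covariance (11)
    and a weighted lasso step for B (responses treated separately)."""
    n, q = Y.shape
    B = np.zeros((X.shape[1], q))
    Omega = np.eye(q)
    for _ in range(n_iter):
        R = Y - X @ B
        g = -0.5 * np.einsum('ij,jk,ik->i', R, Omega, R)  # g_i(beta^0)
        w = np.exp(g - g.max())
        w /= w.sum()                                      # weights (10)
        S = (R * w[:, None]).T @ R + 1e-6 * np.eye(q)     # weighted S_n, ridged
        _, Omega = graphical_lasso(S, alpha=lam1)         # inverse covariance step
        sw = np.sqrt(w)[:, None]                          # row weights for lasso
        B = Lasso(alpha=lam2, fit_intercept=False).fit(X * sw, Y * sw).coef_.T
    return B, Omega
```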

3.5 Tuning Parameter Selection

Approaches for choosing tuning parameters include cross-validation (CV) [4], the hold-out validation set method [21], and information criteria such as the Bayesian information criterion (BIC) [34]. Here we propose a modified scheme for the cross-validation method. The common K-fold CV consists in randomly partitioning the data into K folds, and then leaving out one fold of data as validation set while all the other folds are used as training set in each CV iteration. Note that CV assumes that the data are i.i.d., and therefore that the validation set and training set are statistically equivalent. However, such an assumption is no longer valid in the presence of outliers, since the proportions of outliers in the validation data and training data can differ. Consequently, the validation set cannot be used to evaluate the model obtained from the training set. To tackle this issue, we develop a modified cross-validation scheme motivated by the idea of sliced designs [25]. Specifically, we perform K-fold cross-validation for n = mK observations as follows. Based on initial estimates of the model parameters, we first rank the observed data according to the values of their likelihood functions. Then the first K data points are randomly assigned to the K folds, one point per fold. Subsequently, the next K data points are randomly assigned to the K folds. This procedure is repeated m times. In this way, the data in each fold are more likely to have similar distributions. This modified scheme can also be applied to tuning via the hold-out validation set method.
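A sketch of the sliced fold assignment (our own): observations are ranked by an initial log-likelihood, and each consecutive block of K points is spread across the K folds, so that every fold covers the full likelihood range:

```python
import numpy as np

def sliced_folds(loglik, K, rng):
    """Assign n = m*K observations to K folds as in Section 3.5."""
    n = len(loglik)
    order = np.argsort(loglik)                  # rank data by likelihood
    folds = np.empty(n, dtype=int)
    for start in range(0, n, K):                # one block of K points at a time
        block = order[start:start + K]
        folds[block] = rng.permutation(K)[:len(block)]  # one point per fold
    return folds

# usage: folds = sliced_folds(initial_loglik, K=5, rng=np.random.default_rng(0))
```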
4. SIMULATION STUDY

We compare the proposed REG-ISE with MRCE [27], l1-CGGM [29], CAPME [7], and LAD-Lasso [32]. Note that LAD-Lasso can only estimate regression coefficients. In our experiments, the rows of the n × p predictor matrices X are sampled independently from N(0, Σ_x), where (Σ_x)_{i,j} = 0.5^{|i−j|}. We consider the following two cases.

Case 1: The covariance matrix is set to Σ_{i,j} = 0.7^{|i−j|}, which corresponds to an AR(1) model with banded Σ^{-1}. We randomly select 10 percent of the predictors to be irrelevant to all the responses. Then, for each response, we randomly select half of the remaining predictors to be relevant for that response. The corresponding non-zero entries in the regression matrix B are sampled independently from N(0, 1). We consider 60 predictors, 20 responses and 100 observations.

Case 2: Σ^{-1} is the graph Laplacian of a tree with outdegree 4 and edge weights uniformly sampled from [0.3, 1.0]. For each response we randomly select 10 percent of the predictors to be relevant, and sample the corresponding non-zero entries in B independently from N(0, 1). We consider 1000 predictors, 100 responses and 400 observations.

To address the robustness issue, we consider various percentages of outliers contaminating the responses. The uncontaminated data are generated from y ∼ N(B'x, Σ), where B and Σ are specified above. Two outlier scenarios are considered: (i) outliers with respect to the mean, y ∼ N(B'x + C, Σ), where C is a constant vector of 5s, and (ii) outliers with respect to the covariance structure, y ∼ N(B'x, I), where I is the identity matrix.

To measure variable selection accuracy, we use the F1 score defined by F1 = 2PR/(P + R), where P is the precision (the fraction of correctly selected variables among the selected variables) and R is the recall (the fraction of correctly selected variables among the truly relevant variables). To measure the estimation accuracy of B, we report the model error, defined as ME(B̂, B) = tr[(B̂ − B)' Σ_x (B̂ − B)], where B̂ is the estimated regression coefficient matrix. The estimation accuracy for Σ^{-1} is measured by its l2 loss under the Frobenius norm, ‖Σ̂^{-1} − Σ^{-1}‖_F, where Σ̂^{-1} is the estimated inverse covariance matrix. For each of the above settings, we generate 50 simulated datasets.

For all five comparison methods, for each dataset, we use the modified 5-fold cross-validation described in Section 3.5 to tune the parameters λ_1 and λ_2. The choice of initial parameter estimates is important for MRCE and REG-ISE, as their respective objective functions are non-convex. The initial estimates for B are obtained using ridge regression. In addition, for REG-ISE, Σ^{-1} is initialized as the inverse of the sample covariance matrix of the ridge regression residuals, perturbed by a positive diagonal matrix.

Figures 1 and 2 present the results for Case 1. Results for Case 2 are summarized in Table 1. Similar behavior in terms of robustness is observed in both cases. Our proposed REG-ISE estimator clearly outperforms the other methods due to its robustness against outliers with respect to both mean and covariance deviations. The performance of MRCE, l1-CGGM and CAPME seriously degrades once outliers are introduced. Surprisingly, LAD-Lasso does not show much resilience to outliers. Moreover, with clean data, its estimation and variable selection accuracy is inferior to the other methods, as it ignores the dependencies among responses. Interestingly, even when there are no outliers in the data, REG-ISE is competitive, since it is more likely to distinguish true signals from various noise amplitudes.

Figure 1: Average model error ME(B̂, B) (top) and F1 scores (bottom) for B estimated by REG-ISE, MRCE, l1-CGGM, LAD-Lasso, and CAPME on simulated data of Case 1. Outliers in terms of the mean (left) and covariance (right). The x-axis corresponds to the percentage of outliers.
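For reference, the two evaluation metrics used above translate into a few lines of code (our sketch):

```python
import numpy as np

def f1_support(A_hat, A_true, tol=1e-8):
    """F1 = 2PR/(P+R) for the recovered support of B or Sigma^{-1}."""
    sel = np.abs(A_hat) > tol                  # selected entries
    rel = np.abs(A_true) > tol                 # truly relevant entries
    tp = np.sum(sel & rel)
    if tp == 0:
        return 0.0
    P, R = tp / sel.sum(), tp / rel.sum()      # precision and recall
    return 2 * P * R / (P + R)

def model_error(B_hat, B_true, Sigma_x):
    """ME(B_hat, B) = tr[(B_hat - B)' Sigma_x (B_hat - B)]."""
    D = B_hat - B_true
    return float(np.trace(D.T @ Sigma_x @ D))
```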

Table 1: Simulation results for Case 2. Top: model error for B / l2 loss for Σ^{-1}. Bottom: F1 for B / F1 for Σ^{-1}. Results are reported for REG-ISE, MRCE, l1-CGGM, CAPME and LAD-Lasso with no outliers and with mean and covariance outliers at various percentages; the Σ^{-1} entries for LAD-Lasso are NA since it does not estimate the inverse covariance.

Figure 2: Average estimation error ‖Σ̂^{-1} − Σ^{-1}‖_F (top) and F1 scores (bottom) for Σ^{-1} estimated by REG-ISE, MRCE, l1-CGGM, LAD-Lasso, and CAPME on simulated data of Case 1. Outliers in terms of the mean (left) and covariance (right). The x-axis corresponds to the percentage of outliers.

5. APPLICATIONS

In this section, we illustrate the usefulness of the proposed robust methods through two motivating applications and compare the results of our robust estimators with those of existing methods.

5.1 Asset Return Prediction

As a toy example for multivariate time series, we analyze a financial dataset which has been studied in [27] and [34]. This dataset contains weekly log-returns of 9 stocks for the year 2004. Given multivariate time series data of log-returns y_t for weeks t = 1, ..., T, a first-order vector autoregressive model is considered as follows:

    y_t = B'y_{t−1} + ε_t,  ε_t ∼ N(0, Σ),  t = 2, ..., T,

where the response matrix is formed by the observations at week t and the predictor matrix contains the observations at the previous week t−1. Following the analysis in Rothman et al. [27], we used the log-returns of the first 26 weeks as training set, and the log-returns of the remaining 26 weeks as testing set. The tuning parameters were selected using the modified 10-fold cross-validation described in Section 3.5. Table 2 reports the mean squared prediction error (MSPE) of the five comparison methods. Even though all methods are competitive on this dataset, the REG-ISE estimator achieves the smallest prediction error.

Table 2: Prediction accuracy measured by MSPE for various methods on the asset return dataset.

    Method     MSPE
    REG-ISE    0.69
    MRCE       0.71
    l1-CGGM    0.72
    CAPME      0.72
    LAD-Lasso  0.73

Figure 3 presents the graphs induced by the estimates of Σ^{-1} using MRCE and REG-ISE, respectively. Comparing the two graphs, both MRCE and REG-ISE indicate that companies from the same industry are partially correlated, e.g. GE and IBM (technology), Ford and GM (auto industry). AIG (an insurance company) appears to be partially correlated with most of the other companies. However, there are some discrepancies between the two graphs; e.g., GM is found to be partially correlated with IBM by REG-ISE but uncorrelated by MRCE. Overall, the results from REG-ISE have a reasonable financial interpretation.
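Setting up the first-order VAR of Section 5.1 amounts to lagging the return matrix by one week. A small sketch (ours), with MSPE as the evaluation metric:

```python
import numpy as np

def var1_design(returns):
    """Pair week t-1 returns (predictors) with week t returns (responses)."""
    return returns[:-1], returns[1:]

def mspe(Y_hat, Y):
    """Mean squared prediction error."""
    return float(np.mean((Y_hat - Y) ** 2))

# X, Y = var1_design(R)            # R: T x 9 matrix of weekly log-returns
# X_tr, Y_tr, X_te, Y_te = X[:26], Y[:26], X[26:], Y[26:]
```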

Figure 3: Graphs from the estimates of the inverse covariance matrix Σ^{-1} obtained by MRCE (left) and REG-ISE (right).

5.2 eQTL Data Analysis

We analyze the yeast eQTL dataset [6], which contains genotype data for 2,956 SNPs (predictors) and microarray data for 6,216 genes (responses) on 112 segregants (instances). We extracted 1,260 unique SNPs and focused on 125 genes belonging to the cell-cycle pathway provided by the KEGG database [18]. For all methods, the tuning parameters were chosen via the 5-fold modified cross-validation described in Section 3.5. We first evaluated the predictive accuracy of our method and the comparison methods by randomly partitioning the data into training and test sets, using 90 observations for training and the remainder for testing. We computed the MSPE on the testing set. The average MSPEs based on 20 random partitions are presented in Table 3. We can see that overall the predictive performance of REG-ISE is superior to the other methods.
Figure 4 shows the cell-cycle pathways estimated by the proposed REG-ISE, the MRCE method and the CAPME method, along with the benchmark KEGG pathway. MRCE tends to estimate many spurious links. A similar observation holds for l1-CGGM, so its estimated graph is not represented here. CAPME recovers some of the links, but not as accurately as REG-ISE. This can be partly explained by the fact that CAPME does not take into account the covariance structure in its regression stage and does not have any feedback loop. This can result in poor estimation of the regression matrix B, which in turn may negatively impact the estimation of the precision matrix Σ^{-1}. In addition, lack of robustness can also result in inaccurate network reconstruction. Certain discrepancies between the true and estimated graphs may also be caused by inherent limitations of this dataset. For instance, some edges in the cell-cycle pathway may not be observable from gene expression data. Additionally, in this dataset, the perturbation of the cellular systems may not be significant enough to enable accurate inference of some of the links.

Figure 4: Yeast cell cycle network provided in the KEGG database (top left), estimated by REG-ISE (top right), MRCE (bottom left), and CAPME (bottom right).

Using the KEGG pathway as the ground truth, we also computed the F1 scores for the estimates of Σ^{-1}, shown in Table 4. As a sanity check, we analyzed the microarray data without the SNPs as predictors using glasso. The resulting graph was extremely dense, with a low F1 score. This indicates the disadvantage of procedures like glasso, which are unable to adjust for predictors (here, the genetic variants) in inverse covariance matrix estimation.

Table 4: F1 scores of the estimated cell-cycle network (higher values indicate higher accuracy), for REG-ISE, MRCE, l1-CGGM, CAPME, and glasso.

From the reconstructed networks and F1 scores, we conclude that REG-ISE most faithfully estimates the cell-cycle network compared to the other methods, which clearly demonstrates the value of embracing robustness.

6. CONCLUDING REMARKS

In this work, we have developed a robust framework to jointly estimate multiresponse regression and the inverse covariance matrix from high dimensional data. Our formulation is readily applicable to single regression and to sparse inverse covariance estimation by itself, as well as to the parameterization used in [29], which will be an interesting direction for future work. The proposed methodology is valuable for many applications beyond the integration of genomic and transcriptomic data and financial data analysis. Additional interesting future work includes extending the proposed method to directed graph modeling via vector autoregressive models, and extending our theoretical results to the high-dimensional setting.

Table 3: MSPEs under different methods based on 20 random partitions of the eQTL data into training and test sets.
    REG-ISE    MRCE    l1-CGGM    CAPME
    2.36 ±     ±       ±          ±

APPENDIX

A. PROOF OF LEMMA 1

Suppose that the eigenvalues of Σ^{-1} are d_i, i = 1, ..., q. Using the fact that (1/q) ∑_{i=1}^q d_i ≥ (∏_{i=1}^q d_i)^{1/q}, one has

    |Σ^{-1}|^{1/2} ≤ ((1/q) ∑_{i=1}^q d_i)^{q/2}.

For each eigenvalue d_i, we apply Gershgorin's circle theorem to obtain an upper bound. That is,

    |d_i − c_ii| ≤ ∑_{j≠i} |c_ij|  ⟹  d_i ≤ ∑_{j=1}^q |c_ij|,

where the c_ij, i = 1, ..., q, j = 1, ..., q, are the elements of the matrix Σ^{-1}. Therefore we have ∑_{i=1}^q d_i ≤ ∑_i ∑_j |c_ij| = ‖Σ^{-1}‖_1, which leads to

    |Σ|^{-1/2} ≤ (‖Σ^{-1}‖_1 / q)^{q/2}.

This ends the proof of Lemma 1.

B. PROOF SKETCH OF THEOREM 1

Recall our objective function:

    L_{n,λ}(B, Σ^{-1}) = −log[(1/n) ∑_{i=1}^n exp(−(1/2)(y_i − B'x_i)' Σ^{-1} (y_i − B'x_i))] − (1/2) log|Σ^{-1}| + λ_1 ‖Σ^{-1}‖_1 + λ_2 ‖B‖_1.   (12)

Let B̄ and Σ̄^{-1} be the true regression and inverse covariance matrices. We follow the same reasoning as in the proof of Theorem 1 in [13]. The key idea is that it is enough to show that, for any δ > 0, there exists a large constant C such that

    P{ inf_{‖U‖=C} L_{n,λ}(B̄ + n^{-1/2} U_1, Σ̄^{-1} + n^{-1/2} U_2) > L_{n,λ}(B̄, Σ̄^{-1}) } ≥ 1 − δ,

with U = (vec(U_1)', vec(U_2)')'. Define Q(B, Σ^{-1}) as

    Q(B, Σ^{-1}) = −log[(1/n) ∑_{i=1}^n exp(−(1/2)(y_i − B'x_i)' Σ^{-1} (y_i − B'x_i))].

By a reasoning similar to that of Section 3.4, one can show that around the true parameters (B̄, Σ̄^{-1}) the difference Q(B̄ + U_1, Σ̄^{-1} + U_2) − Q(B̄, Σ̄^{-1}) can be lower-bounded by

    ∑_{i=1}^n w_i (y_i − (B̄ + U_1)'x_i)' (Σ̄^{-1} + U_2)(y_i − (B̄ + U_1)'x_i) − ∑_{i=1}^n w_i (y_i − B̄'x_i)' Σ̄^{-1} (y_i − B̄'x_i) + o(1),

where w_i ≡ w_i(β^0) = exp(g_i(β^0)) / ∑_{i'=1}^n exp(g_{i'}(β^0)). Even though Q is globally non-convex, such an approximation is valid, as Q is locally bi-convex in a neighborhood of the true parameters (B̄, Σ̄^{-1}) with asymptotic probability one. Briefly, one can show that the probabilities of the events U_1' ∇²_B Q(B̄ + t U_1) U_1 < 0 and U_2' ∇²_{Σ^{-1}} Q(Σ̄^{-1} + t U_2) U_2 < 0 for t ∈ (0, 1) are small, uniformly over U_1 and U_2, as long as ‖U‖ is small enough. This can be done by brute-force calculation of the Hessians around the true solution, noticing that the expected values of U_1' ∇²_B Q(B̄ + t U_1) U_1 and U_2' ∇²_{Σ^{-1}} Q(Σ̄^{-1} + t U_2) U_2 are both non-negative for ‖U‖ small enough, and then upper-bounding the large deviations from the expected values using the Azuma-Hoeffding inequality.

Let us define V_n(U) = L_{n,λ}(B̄ + U_1, Σ̄^{-1} + U_2) − L_{n,λ}(B̄, Σ̄^{-1}). For notational convenience, denote Ỹ = diag(√w_1, ..., √w_n) Y and X̃ = diag(√w_1, ..., √w_n) X. Noting that |B̄_jk + u_{1j,k}| − |B̄_jk| = |u_{1j,k}| when B̄_jk = 0, and |Σ̄^{-1}_{st} + u_{2s,t}| − |Σ̄^{-1}_{st}| = |u_{2s,t}| when Σ̄^{-1}_{st} = 0, we then have

    V_n(U) ≥ −log|Σ̄^{-1} + U_2| + log|Σ̄^{-1}|
            + trace{(Σ̄^{-1} + U_2)(Ỹ − X̃(B̄ + U_1))'(Ỹ − X̃(B̄ + U_1))} − trace{Σ̄^{-1}(Ỹ − X̃B̄)'(Ỹ − X̃B̄)}
            + λ_1 ∑_{Σ̄^{-1}_{st} ≠ 0} (|Σ̄^{-1}_{st} + u_{2s,t}| − |Σ̄^{-1}_{st}|) + λ_2 ∑_{B̄_jk ≠ 0} (|B̄_jk + u_{1j,k}| − |B̄_jk|).

Following the same derivations as in the proof of Lemma 3 in [20], we can show that for a sufficiently large C, V_n(U) > 0 uniformly on {U : ‖U‖ = C} with probability greater than 1 − δ. This completes the proof.

C. APPROXIMATE ITERATIVE PROCEDURE

For completeness, we provide the details of the approximate iterative procedure discussed in Section 3.4.

Algorithm 1: Approximate Iterative Procedure
Step 1: Given an initial estimate Σ^{-1}_0 and an initial estimate B_0.
Step 2: Compute the w_i based on (10) and obtain S_n = ∑_{i=1}^n w_i (y_i − B_0'x_i)(y_i − B_0'x_i)'.
Step 3: Estimate Σ^{-1} by minimizing (11) given B_0:
    Σ̂^{-1} = argmin_{Σ^{-1}} −log|Σ^{-1}| + trace(Σ^{-1} S_n) + λ_1 ‖Σ^{-1}‖_1.
Step 4: Estimate B by minimizing (9) given Σ^{-1}_0:
    B̂ = argmin_B ∑_{i=1}^n w_i (y_i − B'x_i)' Σ^{-1}_0 (y_i − B'x_i) + λ_2 ‖B‖_1.
Step 5: If ‖Σ̂^{-1} − Σ^{-1}_0‖²_F ≤ δ_1 and ‖B̂ − B_0‖²_F ≤ δ_2, stop. Else set Σ^{-1}_0 = Σ̂^{-1} and B_0 = B̂ and go back to Step 2.
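Lemma 1 can be sanity-checked numerically. The sketch below (ours) draws random positive definite precision matrices and verifies |Σ|^{-1/2} = det(Σ^{-1})^{1/2} ≤ (‖Σ^{-1}‖_1/q)^{q/2}:

```python
import numpy as np

rng = np.random.default_rng(1)
q = 6
for _ in range(1000):
    A = rng.normal(size=(q, q))
    Omega = A @ A.T + q * np.eye(q)                # random PD Sigma^{-1}
    lhs = np.sqrt(np.linalg.det(Omega))            # |Sigma|^{-1/2}
    rhs = (np.abs(Omega).sum() / q) ** (q / 2)     # (||Omega||_1 / q)^{q/2}
    assert lhs <= rhs * (1 + 1e-12)
print("Lemma 1 inequality held on all draws")
```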

References

[1] O. Banerjee, L. El Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation. JMLR, 9:485-516, 2008.
[2] A. Basu, I. R. Harris, N. L. Hjort, and M. C. Jones. Robust and efficient estimation by minimising a density power divergence. Biometrika, 85:549-559, 1998.
[3] R. Beran. Robust location estimates. Annals of Statistics, 5:431-444, 1977.
[4] P. J. Bickel and E. Levina. Regularized estimation of large covariance matrices. Annals of Statistics, 36(1):199-227, 2008.
[5] L. Breiman and J. H. Friedman. Predicting multivariate responses in multiple linear regression. JRSS Series B, 1997.
[6] R. Brem and L. Kruglyak. The landscape of genetic complexity across 5,700 gene expression traits in yeast. Proc. Natl. Acad. Sci. USA, 102(5):1572-1577, 2005.
[7] T. T. Cai, H. Li, W. Liu, and J. Xie. Covariate adjusted precision matrix estimation with an application in genetical genomics. Biometrika, 2013.
[8] E. Candes and T. Tao. The Dantzig selector: statistical estimation when p is much larger than n. The Annals of Statistics, 35:2313-2351, 2007.
[9] F. E. Curtis and M. L. Overton. A sequential quadratic programming algorithm for nonconvex, nonsmooth constrained optimization. SIAM Journal on Optimization, 22(2):474-500, 2012.
[10] F. E. Curtis and X. Que. An adaptive gradient sampling algorithm for nonsmooth optimization. Optimization Methods and Software, 2013.
[12] D. L. Donoho and R. C. Liu. The automatic robustness of minimum distance functionals. Annals of Statistics, 16:552-586, 1988.
[13] J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and its oracle properties. JASA, 96:1348-1360, 2001.
[14] M. Finegold and M. Drton. Robust graphical modeling of gene networks using classical and alternative t-distributions. Annals of Applied Statistics, 5(2A):1057-1080, 2011.
[15] J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432-441, 2008.
[16] C. Hsieh, M. Sustik, I. Dhillon, and P. Ravikumar. Sparse inverse covariance matrix estimation using quadratic approximation. In NIPS, 2011.
[17] J. Huang, N. Liu, M. Pourahmadi, and L. Liu. Covariance matrix selection and estimation via penalised normal likelihood. Biometrika, 93(1):85-98, 2006.
[18] M. Kanehisa, S. Goto, M. Furumichi, M. Tanabe, and M. Hirakawa. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Research, 38:D355-D360, 2010.
[19] S. L. Lauritzen. Graphical Models. Oxford: Clarendon Press, 1996.
[20] W. Lee and Y. Liu. Simultaneous multiple response regression and inverse covariance matrix estimation via penalized Gaussian maximum likelihood. J. Multivar. Anal., 2012.
[21] E. Levina, A. J. Rothman, and J. Zhu. Sparse estimation of large covariance matrices via a nested lasso penalty. Annals of Applied Statistics, 2(1), 2008.
[22] N. Meinshausen and P. Buhlmann. High-dimensional graphs and variable selection with the lasso. Annals of Statistics, 34(3):1436-1462, 2006.
[23] S. Negahban, P. Ravikumar, M. Wainwright, and B. Yu. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. CoRR, abs/1010.2731, 2010.
[24] G. Obozinski, B. Taskar, and M. Jordan. Multi-task feature selection. Tech. report, University of California, Berkeley, 2006.
[25] P. Z. G. Qian and C. F. J. Wu. Sliced space-filling designs. Biometrika, 96(4):945-956, 2009.
[26] P. Ravikumar, M. Wainwright, G. Raskutti, and B. Yu. High-dimensional covariance estimation by minimizing l1-penalized log-determinant divergence. Electron. J. Statist., 5:935-980, 2011.
[27] A. Rothman, E. Levina, and J. Zhu. Sparse multivariate regression with covariance estimation. JCGS, 19(4):947-962, 2010.
[28] D. Scott. Parametric statistical modeling by minimum integrated square error. Technometrics, 43:274-285, 2001.
[29] K. Sohn and S. Kim. Joint estimation of structured sparsity and output structure in multiple-output regression via inverse-covariance regularization. AISTATS, 2012.
[30] M. Sugiyama, T. Suzuki, T. Kanamori, M. C. du Plessis, S. Liu, and I. Takeuchi. Density-difference estimation. NIPS, 25, 2012.
[31] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267-288, 1996.
[32] H. Wang, G. Li, and G. Jiang. Robust regression shrinkage and consistent variable selection through the LAD-lasso. Journal of Business and Economic Statistics, 25:347-355, 2007.
[33] J. Wolfowitz. The minimum distance method. Annals of Mathematical Statistics, 28(1):75-88, 1957.
[34] M. Yuan and Y. Lin. Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1):19-35, 2007.


More information

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS J. Japa Statist. Soc. Vol. 41 No. 1 2011 67 73 A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS Yoichi Nishiyama* We cosider k-sample ad chage poit problems for idepedet data i a

More information

Convergence of random variables. (telegram style notes) P.J.C. Spreij

Convergence of random variables. (telegram style notes) P.J.C. Spreij Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space

More information

Quantile regression with multilayer perceptrons.

Quantile regression with multilayer perceptrons. Quatile regressio with multilayer perceptros. S.-F. Dimby ad J. Rykiewicz Uiversite Paris 1 - SAMM 90 Rue de Tolbiac, 75013 Paris - Frace Abstract. We cosider oliear quatile regressio ivolvig multilayer

More information

Element sampling: Part 2

Element sampling: Part 2 Chapter 4 Elemet samplig: Part 2 4.1 Itroductio We ow cosider uequal probability samplig desigs which is very popular i practice. I the uequal probability samplig, we ca improve the efficiecy of the resultig

More information

IP Reference guide for integer programming formulations.

IP Reference guide for integer programming formulations. IP Referece guide for iteger programmig formulatios. by James B. Orli for 15.053 ad 15.058 This documet is iteded as a compact (or relatively compact) guide to the formulatio of iteger programs. For more

More information

TEACHER CERTIFICATION STUDY GUIDE

TEACHER CERTIFICATION STUDY GUIDE COMPETENCY 1. ALGEBRA SKILL 1.1 1.1a. ALGEBRAIC STRUCTURES Kow why the real ad complex umbers are each a field, ad that particular rigs are ot fields (e.g., itegers, polyomial rigs, matrix rigs) Algebra

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

A Relationship Between the One-Way MANOVA Test Statistic and the Hotelling Lawley Trace Test Statistic

A Relationship Between the One-Way MANOVA Test Statistic and the Hotelling Lawley Trace Test Statistic http://ijspccseetorg Iteratioal Joural of Statistics ad Probability Vol 7, No 6; 2018 A Relatioship Betwee the Oe-Way MANOVA Test Statistic ad the Hotellig Lawley Trace Test Statistic Hasthika S Rupasighe

More information

On Robust Estimation of High Dimensional Generalized Linear Models

On Robust Estimation of High Dimensional Generalized Linear Models O Robust Estimatio of High Dimesioal Geeralized Liear Models Euho Yag Departmet of Computer Sciece Uiversity of Texas Austi euho@csutexasedu Ambuj Tewari Departmet of Statistics Uiversity of Michiga A

More information

Expectation-Maximization Algorithm.

Expectation-Maximization Algorithm. Expectatio-Maximizatio Algorithm. Petr Pošík Czech Techical Uiversity i Prague Faculty of Electrical Egieerig Dept. of Cyberetics MLE 2 Likelihood.........................................................................................................

More information

Matrix Representation of Data in Experiment

Matrix Representation of Data in Experiment Matrix Represetatio of Data i Experimet Cosider a very simple model for resposes y ij : y ij i ij, i 1,; j 1,,..., (ote that for simplicity we are assumig the two () groups are of equal sample size ) Y

More information

Rates of Convergence by Moduli of Continuity

Rates of Convergence by Moduli of Continuity Rates of Covergece by Moduli of Cotiuity Joh Duchi: Notes for Statistics 300b March, 017 1 Itroductio I this ote, we give a presetatio showig the importace, ad relatioship betwee, the modulis of cotiuity

More information

Clustering. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar.

Clustering. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar. Clusterig CM226: Machie Learig for Bioiformatics. Fall 216 Sriram Sakararama Ackowledgmets: Fei Sha, Ameet Talwalkar Clusterig 1 / 42 Admiistratio HW 1 due o Moday. Email/post o CCLE if you have questios.

More information

Output Analysis and Run-Length Control

Output Analysis and Run-Length Control IEOR E4703: Mote Carlo Simulatio Columbia Uiversity c 2017 by Marti Haugh Output Aalysis ad Ru-Legth Cotrol I these otes we describe how the Cetral Limit Theorem ca be used to costruct approximate (1 α%

More information

Machine Learning Regression I Hamid R. Rabiee [Slides are based on Bishop Book] Spring

Machine Learning Regression I Hamid R. Rabiee [Slides are based on Bishop Book] Spring Machie Learig Regressio I Hamid R. Rabiee [Slides are based o Bishop Book] Sprig 015 http://ce.sharif.edu/courses/93-94//ce717-1 Liear Regressio Liear regressio: ivolves a respose variable ad a sigle predictor

More information

Read through these prior to coming to the test and follow them when you take your test.

Read through these prior to coming to the test and follow them when you take your test. Math 143 Sprig 2012 Test 2 Iformatio 1 Test 2 will be give i class o Thursday April 5. Material Covered The test is cummulative, but will emphasize the recet material (Chapters 6 8, 10 11, ad Sectios 12.1

More information

Sample Size Estimation in the Proportional Hazards Model for K-sample or Regression Settings Scott S. Emerson, M.D., Ph.D.

Sample Size Estimation in the Proportional Hazards Model for K-sample or Regression Settings Scott S. Emerson, M.D., Ph.D. ample ie Estimatio i the Proportioal Haards Model for K-sample or Regressio ettigs cott. Emerso, M.D., Ph.D. ample ie Formula for a Normally Distributed tatistic uppose a statistic is kow to be ormally

More information

This is an introductory course in Analysis of Variance and Design of Experiments.

This is an introductory course in Analysis of Variance and Design of Experiments. 1 Notes for M 384E, Wedesday, Jauary 21, 2009 (Please ote: I will ot pass out hard-copy class otes i future classes. If there are writte class otes, they will be posted o the web by the ight before class

More information

Axis Aligned Ellipsoid

Axis Aligned Ellipsoid Machie Learig for Data Sciece CS 4786) Lecture 6,7 & 8: Ellipsoidal Clusterig, Gaussia Mixture Models ad Geeral Mixture Models The text i black outlies high level ideas. The text i blue provides simple

More information

High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity

High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity High-dimesioal regressio with oisy ad missig data: Provable guaratees with o-covexity Po-Lig Loh Departmet of Statistics Uiversity of Califoria, Berkeley Berkeley, CA 94720 ploh@berkeley.edu Marti J. Waiwright

More information

Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece 1, 1, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet

More information

Regression with an Evaporating Logarithmic Trend

Regression with an Evaporating Logarithmic Trend Regressio with a Evaporatig Logarithmic Tred Peter C. B. Phillips Cowles Foudatio, Yale Uiversity, Uiversity of Aucklad & Uiversity of York ad Yixiao Su Departmet of Ecoomics Yale Uiversity October 5,

More information

Basics of Probability Theory (for Theory of Computation courses)

Basics of Probability Theory (for Theory of Computation courses) Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.

More information

Lecture 2: Monte Carlo Simulation

Lecture 2: Monte Carlo Simulation STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?

More information

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach STAT 425: Itroductio to Noparametric Statistics Witer 28 Lecture 7: Desity Estimatio: k-nearest Neighbor ad Basis Approach Istructor: Ye-Chi Che Referece: Sectio 8.4 of All of Noparametric Statistics.

More information

Linear Support Vector Machines

Linear Support Vector Machines Liear Support Vector Machies David S. Roseberg The Support Vector Machie For a liear support vector machie (SVM), we use the hypothesis space of affie fuctios F = { f(x) = w T x + b w R d, b R } ad evaluate

More information

b i u x i U a i j u x i u x j

b i u x i U a i j u x i u x j M ath 5 2 7 Fall 2 0 0 9 L ecture 1 9 N ov. 1 6, 2 0 0 9 ) S ecod- Order Elliptic Equatios: Weak S olutios 1. Defiitios. I this ad the followig two lectures we will study the boudary value problem Here

More information

Lecture 27. Capacity of additive Gaussian noise channel and the sphere packing bound

Lecture 27. Capacity of additive Gaussian noise channel and the sphere packing bound Lecture 7 Ageda for the lecture Gaussia chael with average power costraits Capacity of additive Gaussia oise chael ad the sphere packig boud 7. Additive Gaussia oise chael Up to this poit, we have bee

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information

Slide Set 13 Linear Model with Endogenous Regressors and the GMM estimator

Slide Set 13 Linear Model with Endogenous Regressors and the GMM estimator Slide Set 13 Liear Model with Edogeous Regressors ad the GMM estimator Pietro Coretto pcoretto@uisa.it Ecoometrics Master i Ecoomics ad Fiace (MEF) Uiversità degli Studi di Napoli Federico II Versio: Friday

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio

More information

Stat 139 Homework 7 Solutions, Fall 2015

Stat 139 Homework 7 Solutions, Fall 2015 Stat 139 Homework 7 Solutios, Fall 2015 Problem 1. I class we leared that the classical simple liear regressio model assumes the followig distributio of resposes: Y i = β 0 + β 1 X i + ɛ i, i = 1,...,,

More information

NANYANG TECHNOLOGICAL UNIVERSITY SYLLABUS FOR ENTRANCE EXAMINATION FOR INTERNATIONAL STUDENTS AO-LEVEL MATHEMATICS

NANYANG TECHNOLOGICAL UNIVERSITY SYLLABUS FOR ENTRANCE EXAMINATION FOR INTERNATIONAL STUDENTS AO-LEVEL MATHEMATICS NANYANG TECHNOLOGICAL UNIVERSITY SYLLABUS FOR ENTRANCE EXAMINATION FOR INTERNATIONAL STUDENTS AO-LEVEL MATHEMATICS STRUCTURE OF EXAMINATION PAPER. There will be oe 2-hour paper cosistig of 4 questios.

More information

Random Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A.

Random Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A. Radom Walks o Discrete ad Cotiuous Circles by Jeffrey S. Rosethal School of Mathematics, Uiversity of Miesota, Mieapolis, MN, U.S.A. 55455 (Appeared i Joural of Applied Probability 30 (1993), 780 789.)

More information

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4 MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.

More information

Kernel density estimator

Kernel density estimator Jauary, 07 NONPARAMETRIC ERNEL DENSITY ESTIMATION I this lecture, we discuss kerel estimatio of probability desity fuctios PDF Noparametric desity estimatio is oe of the cetral problems i statistics I

More information

CS284A: Representations and Algorithms in Molecular Biology

CS284A: Representations and Algorithms in Molecular Biology CS284A: Represetatios ad Algorithms i Molecular Biology Scribe Notes o Lectures 3 & 4: Motif Discovery via Eumeratio & Motif Represetatio Usig Positio Weight Matrix Joshua Gervi Based o presetatios by

More information

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1. Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chi-square Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio

More information

Double Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution

Double Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution Iteratioal Mathematical Forum, Vol., 3, o. 3, 3-53 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/.9/imf.3.335 Double Stage Shrikage Estimator of Two Parameters Geeralized Expoetial Distributio Alaa M.

More information

4.5 Multiple Imputation

4.5 Multiple Imputation 45 ultiple Imputatio Itroductio Assume a parametric model: y fy x; θ We are iterested i makig iferece about θ I Bayesia approach, we wat to make iferece about θ from fθ x, y = πθfy x, θ πθfy x, θdθ where

More information

10-701/ Machine Learning Mid-term Exam Solution

10-701/ Machine Learning Mid-term Exam Solution 0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it

More information

1 Hash tables. 1.1 Implementation

1 Hash tables. 1.1 Implementation Lecture 8 Hash Tables, Uiversal Hash Fuctios, Balls ad Bis Scribes: Luke Johsto, Moses Charikar, G. Valiat Date: Oct 18, 2017 Adapted From Virgiia Williams lecture otes 1 Hash tables A hash table is a

More information

Introduction to Machine Learning DIS10

Introduction to Machine Learning DIS10 CS 189 Fall 017 Itroductio to Machie Learig DIS10 1 Fu with Lagrage Multipliers (a) Miimize the fuctio such that f (x,y) = x + y x + y = 3. Solutio: The Lagragia is: L(x,y,λ) = x + y + λ(x + y 3) Takig

More information

Binary classification, Part 1

Binary classification, Part 1 Biary classificatio, Part 1 Maxim Ragisky September 25, 2014 The problem of biary classificatio ca be stated as follows. We have a radom couple Z = (X,Y ), where X R d is called the feature vector ad Y

More information

Correlation Regression

Correlation Regression Correlatio Regressio While correlatio methods measure the stregth of a liear relatioship betwee two variables, we might wish to go a little further: How much does oe variable chage for a give chage i aother

More information