An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss


An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss

arXiv: v1 [stat.CO] 12 Nov 2018

Cheng Wang
School of Mathematical Sciences, Shanghai Jiao Tong University

and

Binyan Jiang
Department of Applied Mathematics, The Hong Kong Polytechnic University

Abstract

The estimation of high dimensional precision matrices has been a central topic in statistical learning. However, as the number of parameters scales quadratically with the dimension p, many state-of-the-art methods do not scale well to solve problems with a very large p. In this paper, we propose a very efficient algorithm for precision matrix estimation via penalized quadratic loss functions. Under the high dimension low sample size setting, the computational complexity of our algorithm is linear in the sample size and the number of parameters, which is the same as that of computing the sample covariance matrix.

Keywords: ADMM; High dimension; Penalized quadratic loss; Precision matrix.

1 Introduction

The precision matrix plays an important role in statistical analysis. Under Gaussian assumptions, the precision matrix is often used to study the conditional independence among the random variables. Contemporary applications usually require fast methods for estimating a very high dimensional precision matrix (Meinshausen and Bühlmann, 2006).

Despite recent advances, estimation of the precision matrix remains challenging when the dimension p is very large, owing to the fact that the number of parameters to be estimated is of order O(p²). For example, in the Prostate dataset studied in this paper, 6033 genetic activity measurements are recorded for 102 subjects. The precision matrix to be estimated is of dimension 6033 × 6033, resulting in more than 18 million parameters.

One of the most well-known and popular methods for precision matrix estimation is the graphical lasso (Yuan and Lin, 2007; Banerjee et al., 2008; Friedman et al., 2008). Without loss of generality, assume that X₁, …, X_n are i.i.d. observations from a p-dimensional Gaussian distribution with mean 0 and covariance matrix Σ. To estimate the precision matrix Ω = Σ⁻¹, the graphical lasso solves the following ℓ₁ regularized negative log-likelihood:

    arg min_{Ω ∈ O_p} tr(SΩ) − log|Ω| + λ‖Ω‖₁,    (1)

where O_p is the set of non-negative definite matrices in R^{p×p}, S is the sample covariance matrix, ‖Ω‖₁ denotes the sum of the absolute values of the entries of Ω, and λ ≥ 0 is a tuning parameter. Although (1) is constructed from the Gaussian likelihood, it is known that the graphical lasso also works for non-Gaussian data (Ravikumar et al., 2011). Many algorithms have been developed to solve the graphical lasso. Friedman et al. (2008) proposed a coordinate descent procedure and Boyd et al. (2011) provided an alternating direction method of multipliers (ADMM) algorithm for solving (1). In order to obtain faster convergence of the iterations, second order methods and proximal gradient algorithms on the dual problem have also been developed; see for example Hsieh et al. (2014), Dalal and Rajaratnam (2017), and the references therein. However, since the graphical lasso involves the matrix determinant, an eigen-decomposition or the computation of the determinant of a p × p matrix is inevitable in these algorithms. Note that the computational complexity of an eigen-decomposition or a matrix determinant is of order O(p³). Thus, the computation time of these algorithms scales up cubically in p.

Recently, Zhang and Zou (2014) and Liu and Luo (2015) proposed to estimate Ω via trace based quadratic loss functions. Using Kronecker products and matrix vectorization, our interest is to estimate Ω = Σ⁻¹, or equivalently,

    vec(Ω) = vec(Σ⁻¹) = (Σ ⊗ I)⁻¹ vec(I),    (2)
    vec(Ω) = vec(Σ⁻¹) = (I ⊗ Σ)⁻¹ vec(I),    (3)

where I denotes the identity matrix and ⊗ denotes the Kronecker product.
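To make the vectorized identities (2) and (3) concrete, here is a small numerical check (a sketch in Python/numpy, not part of the paper; the covariance matrix, the dimension and the seed are arbitrary illustrative choices):

```python
import numpy as np

p = 4
rng = np.random.default_rng(0)
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)        # a positive definite covariance matrix
Omega = np.linalg.inv(Sigma)           # the target precision matrix

I = np.eye(p)
vecI = I.reshape(-1, order="F")        # vec() stacks the columns

# identity (2): vec(Omega) = (Sigma kron I)^{-1} vec(I)
rhs2 = np.linalg.solve(np.kron(Sigma, I), vecI)
# identity (3): vec(Omega) = (I kron Sigma)^{-1} vec(I)
rhs3 = np.linalg.solve(np.kron(I, Sigma), vecI)

print(np.allclose(rhs2, Omega.reshape(-1, order="F")))   # True
print(np.allclose(rhs3, Omega.reshape(-1, order="F")))   # True
```

Both checks print True because ΩΣ = ΣΩ = I, so (Σ ⊗ I)vec(Ω) = vec(ΩΣ) = vec(I) and (I ⊗ Σ)vec(Ω) = vec(ΣΩ) = vec(I).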

Motivated by (2) and the LASSO (Tibshirani, 1996), a natural way to estimate β = vec(Ω) is

    arg min_{β ∈ R^{p²}} (1/2) βᵀ(S ⊗ I)β − βᵀvec(I) + λ‖β‖₁.    (4)

To obtain a symmetric estimator we can use both (2) and (3), and estimate β by

    arg min_{β ∈ R^{p²}} (1/4) βᵀ(S ⊗ I + I ⊗ S)β − βᵀvec(I) + λ‖β‖₁.    (5)

In matrix notation, writing β = vec(Ω), (4) can be written as

    Ω̂₁ = arg min (1/2) tr(ΩSΩᵀ) − tr(Ω) + λ‖Ω‖₁.    (6)

Symmetrization can then be applied to obtain a final estimator. The loss function

    L₁(Ω) := (1/2) tr(ΩSΩᵀ) − tr(Ω)    (7)

is used in Liu and Luo (2015), where a column-wise estimation approach called SCIO is proposed. Similarly, the matrix form of (5) is

    Ω̂₂ = arg min (1/4) tr(ΩᵀSΩ) + (1/4) tr(ΩSΩᵀ) − tr(Ω) + λ‖Ω‖₁.    (8)

The loss function

    L₂(Ω) = (1/2){L₁(Ω) + L₁(Ωᵀ)} = (1/4) tr(ΩSΩᵀ) + (1/4) tr(ΩᵀSΩ) − tr(Ω)

is equivalent to the D-trace loss proposed by Zhang and Zou (2014), owing to the fact that L₂(Ω) naturally forces the solution to be symmetric. In the original papers, Zhang and Zou (2014) and Liu and Luo (2015) established consistency results for the estimators (6) and (8) and showed that their performance is comparable to that of the graphical lasso.

As can be seen from the vectorized formulations (2) and (3), the loss functions L₁(Ω) and L₂(Ω) are quadratic in β. In this note, we show that under such quadratic loss functions, explicit solutions can be obtained in each step of the ADMM algorithm. In particular, we derive explicit formulas for the inverses of (S ⊗ I + ρI) and (2⁻¹S ⊗ I + 2⁻¹I ⊗ S + ρI) for any given ρ > 0, from which we are able to solve (6) and (8), or equivalently (4) and (5), with computational complexity of order O(np²). Such a rate is in some sense optimal, as the complexity of computing S is already of order O(np²).
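The equivalence between the vectorized problems (4)-(5) and the matrix forms (6)-(8) can be checked numerically; the following sketch (numpy, with arbitrary illustrative data, not part of the paper) evaluates both versions of the quadratic losses at a random, not necessarily symmetric, Ω:

```python
import numpy as np

p, n = 5, 20
rng = np.random.default_rng(1)
X = rng.normal(size=(n, p))
S = X.T @ X / n                        # sample covariance (data treated as mean zero)
I = np.eye(p)
vecI = I.reshape(-1, order="F")

Omega = rng.normal(size=(p, p))        # an arbitrary, not necessarily symmetric, point
beta = Omega.reshape(-1, order="F")    # beta = vec(Omega), stacking columns

# matrix forms of the two quadratic losses
L1 = 0.5 * np.trace(Omega @ S @ Omega.T) - np.trace(Omega)
L2 = 0.25 * np.trace(Omega.T @ S @ Omega) + 0.25 * np.trace(Omega @ S @ Omega.T) - np.trace(Omega)

# vectorized counterparts, i.e. the smooth parts of (4) and (5)
q1 = 0.5 * beta @ np.kron(S, I) @ beta - beta @ vecI
q2 = 0.25 * beta @ (np.kron(S, I) + np.kron(I, S)) @ beta - beta @ vecI

L1_T = 0.5 * np.trace(Omega.T @ S @ Omega) - np.trace(Omega.T)   # L1 evaluated at Omega^T
print(np.allclose(L1, q1), np.allclose(L2, q2), np.allclose(L2, 0.5 * (L1 + L1_T)))
```

All three comparisons print True, reflecting the identities βᵀ(S ⊗ I)β = tr(ΩSΩᵀ) and βᵀvec(I) = tr(Ω).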

2 Main Results

For any real matrix M, we shall use ‖M‖₂ = {tr(MMᵀ)}^{1/2} to denote its Frobenius norm, and ‖M‖ to denote its spectral norm, i.e., the square root of the largest eigenvalue of MᵀM. For the quadratic loss function L(Ω) = L₁(Ω) or L₂(Ω), the augmented Lagrangian is

    L_a(Ω, A, B) = L(Ω) + (ρ/2)‖Ω − A + B‖₂² + λ‖A‖₁,

where ρ > 0 is the step size of the ADMM. By Boyd et al. (2011), the alternating iterations are

    Ω^{k+1} = arg min_Ω L_a(Ω, A^k, B^k),
    A^{k+1} = arg min_A L_a(Ω^{k+1}, A, B^k) = soft(Ω^{k+1} + B^k, λ/ρ),
    B^{k+1} = Ω^{k+1} − A^{k+1} + B^k,

where soft(a, λ) is the element-wise soft thresholding operator. Clearly the computational cost is dominated by the update of Ω^{k+1}, which amounts to solving the following problem:

    arg min_{Ω ∈ R^{p×p}} L(Ω) + (ρ/2)‖Ω − A^k + B^k‖₂².

By the convexity of the objective function, the solution satisfies

    L′(Ω) + ρ(Ω − A^k + B^k) = 0.

Consequently, for the estimators (6) and (8), we need to solve the following equations, respectively:

    SΩ + ρΩ = I + ρ(A^k − B^k),    (9)
    2⁻¹(SΩ + ΩS) + ρΩ = I + ρ(A^k − B^k).    (10)

By looking at the vectorized formulations (4) and (5), we immediately see that in order to solve (9) and (10) we need the inverses of (S ⊗ I + ρI) and (2⁻¹S ⊗ I + 2⁻¹I ⊗ S + ρI). The following proposition provides explicit expressions for these inverses.

Proposition 1 Write the decomposition of S as S = UΛUᵀ, where U ∈ R^{p×m}, m = min(n, p), UᵀU = I_m and Λ = diag{τ₁, …, τ_m} with τ₁, …, τ_m ≥ 0. For any ρ > 0, we have

    (S + ρI)⁻¹ = ρ⁻¹I − ρ⁻¹UΛ₁Uᵀ,

and

    (2⁻¹S ⊗ I + 2⁻¹I ⊗ S + ρI)⁻¹ = ρ⁻¹I − ρ⁻¹(UΛ₂Uᵀ) ⊗ I − ρ⁻¹I ⊗ (UΛ₂Uᵀ) + ρ⁻¹(U ⊗ U)diag{vec(Λ₃)}(Uᵀ ⊗ Uᵀ),

where Λ₁ = diag{τ₁/(τ₁ + ρ), …, τ_m/(τ_m + ρ)}, Λ₂ = diag{τ₁/(τ₁ + 2ρ), …, τ_m/(τ_m + 2ρ)}, and Λ₃ = (τᵢτⱼ(τᵢ + τⱼ + 4ρ)/{(τᵢ + 2ρ)(τⱼ + 2ρ)(τᵢ + τⱼ + 2ρ)})_{m×m}.
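Proposition 1 can also be verified numerically on a small example; the sketch below (numpy; the dimensions, ρ and the random data are arbitrary illustrative choices, not part of the paper) compares both closed-form inverses against brute-force inversion:

```python
import numpy as np

p, n, rho = 6, 4, 0.7
rng = np.random.default_rng(2)
X = rng.normal(size=(n, p))
S = X.T @ X / n                         # rank-deficient sample covariance (rank <= n < p)
I_p, I_p2 = np.eye(p), np.eye(p * p)

# reduced decomposition S = U diag(tau) U^T with U of size p x m
tau, V = np.linalg.eigh(S)
keep = tau > 1e-10
tau, U = tau[keep], V[:, keep]

Lam1 = np.diag(tau / (tau + rho))
Lam2 = np.diag(tau / (tau + 2 * rho))
Lam3 = (np.outer(tau, tau) * (tau[:, None] + tau[None, :] + 4 * rho)
        / ((tau[:, None] + 2 * rho) * (tau[None, :] + 2 * rho)
           * (tau[:, None] + tau[None, :] + 2 * rho)))

# first identity: (S + rho I)^{-1}
ok1 = np.allclose(np.linalg.inv(S + rho * I_p), I_p / rho - U @ Lam1 @ U.T / rho)

# second identity: (S kron I / 2 + I kron S / 2 + rho I)^{-1}
M = 0.5 * np.kron(S, I_p) + 0.5 * np.kron(I_p, S) + rho * I_p2
approx = (I_p2 / rho
          - np.kron(U @ Lam2 @ U.T, I_p) / rho
          - np.kron(I_p, U @ Lam2 @ U.T) / rho
          + np.kron(U, U) @ np.diag(Lam3.reshape(-1, order="F")) @ np.kron(U.T, U.T) / rho)
ok2 = np.allclose(np.linalg.inv(M), approx)
print(ok1, ok2)   # True True
```

The reduced decomposition keeps only the m = min(n, p) non-trivial eigenvalues, which is what makes the formulas cheap when n ≪ p.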

The main technique used to derive the above proposition is the well-known Woodbury matrix identity. The derivation involves lengthy but routine calculations and is hence omitted. Nevertheless, one can verify the proposition directly by checking, for example, that (S + ρI)(ρ⁻¹I − ρ⁻¹UΛ₁Uᵀ) = (ρ⁻¹I − ρ⁻¹UΛ₁Uᵀ)(S + ρI) = I. By Proposition 1 and the identities

    {(UΛ₂Uᵀ) ⊗ I}vec(C) = vec(CUΛ₂Uᵀ),
    {I ⊗ (UΛ₂Uᵀ)}vec(C) = vec(UΛ₂UᵀC),
    (U ⊗ U)diag{vec(Λ₃)}(Uᵀ ⊗ Uᵀ)vec(C) = vec[U{Λ₃ ∘ (UᵀCU)}Uᵀ],

where ∘ denotes the Hadamard product, we obtain the following theorem.

Theorem 1 For a given ρ > 0, (i) the solution to the equation SΩ + ρΩ = C is unique and is given by Ω = ρ⁻¹C − ρ⁻¹UΛ₁UᵀC; (ii) the solution to the equation 2⁻¹(SΩ + ΩS) + ρΩ = C is unique and is given by Ω = ρ⁻¹C − ρ⁻¹CUΛ₂Uᵀ − ρ⁻¹UΛ₂UᵀC + ρ⁻¹U{Λ₃ ∘ (UᵀCU)}Uᵀ.

Note that when S is the sample covariance matrix, U and Λ can be obtained from the singular value decomposition (SVD) of X = (X₁, …, X_n), whose complexity is of order O(mnp). On the other hand, the solutions obtained in Theorem 1 involve only elementary operations on p × m and m × m matrices, and thus the complexity of solving (9) and (10) is of order O(mnp + mp²) = O(np²).

Remark 1 More generally, we can specify a different weight for each element and consider the estimators

    Ω̂_k = arg min L_k(Ω) + λ‖W ∘ Ω‖₁, k = 1, 2,

where W = (w_{ij})_{p×p} with w_{ij} ≥ 0. For example, setting w_{ii} = 0 and w_{ij} = 1 for i ≠ j leaves the diagonal elements out of the penalization; using the local linear approximation (Zou and Li, 2008), we can set W = {p′_λ(|Ω̂_{ij}|)}_{p×p}, where Ω̂ = (Ω̂_{ij})_{p×p} is a LASSO solution and p_λ(·) is a general penalty function such as SCAD or MCP. The ADMM algorithm is the same as in the ℓ₁ penalized case except that the update of A^{k+1} becomes element-wise soft thresholding with entry-specific thresholding parameters.
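To illustrate how the closed-form update in Theorem 1(i) plugs into the ADMM iterations, here is a minimal sketch of the resulting algorithm for the ℓ₁ penalized loss (6) (Python/numpy; the function name, stopping rule and default values of lam and rho are our own illustrative choices, not those of the EQUAL package):

```python
import numpy as np

def soft(M, t):
    """Element-wise soft thresholding."""
    return np.sign(M) * np.maximum(np.abs(M) - t, 0.0)

def equal_l1(X, lam, rho=1.0, max_iter=500, tol=1e-6):
    """ADMM for min_Omega 0.5*tr(Omega S Omega^T) - tr(Omega) + lam*||Omega||_1,
    with the Omega-update solved in closed form as in Theorem 1(i)."""
    n, p = X.shape
    S = X.T @ X / n                        # sample covariance (data assumed centered)
    tau, V = np.linalg.eigh(S)
    keep = tau > 1e-10                     # reduced decomposition S = U diag(tau) U^T
    tau, U = tau[keep], V[:, keep]
    d1 = tau / (tau + rho)                 # diagonal of Lambda_1

    I = np.eye(p)
    A, B = I.copy(), np.zeros((p, p))
    for _ in range(max_iter):
        C = I + rho * (A - B)
        Omega = C / rho - (U * d1) @ (U.T @ C) / rho   # solves S Omega + rho Omega = C
        A_new = soft(Omega + B, lam / rho)
        B = B + Omega - A_new
        change, A = np.max(np.abs(A_new - A)), A_new
        if change < tol:
            break
    return (A + A.T) / 2                   # symmetrize the sparse iterate

# usage sketch:
# X = np.random.default_rng(0).normal(size=(200, 400))
# Omega_hat = equal_l1(X, lam=0.2)
```

The per-iteration cost is dominated by the two p × m matrix products in the Ω-update, i.e. O(mp²); the weighted penalty of Remark 1 would only change the thresholding levels used in soft().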

Remark 2 Compared with the ADMM algorithm given in Zhang and Zou (2014), our update of Ω^{k+1} only involves operations on p × m and m × m matrices, while operations on p × p matrices are required in Zhang and Zou (2014); see for example Theorem 1 in Zhang and Zou (2014). Consequently, we obtain the order O(np²) for these updates, while Zhang and Zou (2014) requires O((n + p)p²). Our algorithm thus scales much better when n ≪ p.

Similarly, for the graphical lasso, we can also use ADMM (Boyd et al., 2011) to implement the minimization, where the loss function is L(Ω) = tr(SΩ) − log|Ω|. The update of Ω^{k+1} is obtained by solving

    ρΩ − Ω⁻¹ = ρ(A^k − B^k) − S.

Denoting the eigenvalue decomposition of ρ(A^k − B^k) − S by QᵀΛ₀Q with Λ₀ = diag{a₁, …, a_p}, we obtain the closed form solution

    Ω^{k+1} = Qᵀ diag{(a₁ + √(a₁² + 4ρ))/(2ρ), …, (a_p + √(a_p² + 4ρ))/(2ρ)} Q.

Compared with the algorithm based on quadratic loss functions, the computational complexity here is dominated by the eigenvalue decomposition of p × p matrices.
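For comparison, the eigen-decomposition based Ω-update of the graphical lasso just described can be sketched as follows (numpy; the function name and interface are our own illustrative choices):

```python
import numpy as np

def glasso_omega_update(S, A, B, rho):
    """Solve rho*Omega - Omega^{-1} = rho*(A - B) - S for the graphical lasso ADMM."""
    a, Q = np.linalg.eigh(rho * (A - B) - S)          # eigenvalues a_1, ..., a_p
    omega_eig = (a + np.sqrt(a ** 2 + 4 * rho)) / (2 * rho)
    return (Q * omega_eig) @ Q.T                      # Q diag(...) Q^T
```

Each call costs O(p³) because of the eigen-decomposition, which is exactly the step that the closed-form updates of Theorem 1 avoid.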

3 Simulations

In this section, we conduct several simulations to illustrate the efficiency and estimation accuracy of our proposed methods. We consider the following two precision matrices:

    Ω₁ = (0.5^{|i−j|})_{p×p},    Ω₂ = Ω₁⁻¹,

where Ω₁ is an asymptotically sparse matrix with many small elements and Ω₂ is a sparse (tridiagonal) matrix in which each column/row has at most three non-zero elements. For all of our simulations, we set the sample size n = 200 and generate the data X₁, …, X_n from N(0, Σ) with Σ = Ω₁⁻¹ or Ω₂⁻¹. For notational convenience, we shall use the term EQUAL to denote our proposed Efficient admm algorithm via the QUAdratic Loss (4), and similarly, use EQUALs to denote the estimator based on the symmetric quadratic loss (5). An R package EQUAL has been developed to implement our methods. For comparison, we consider the following competitors:

CLIME (Cai et al., 2011), implemented in the R package fastclime;
glasso (Friedman et al., 2008), implemented in the R package glasso;
BigQuic (Hsieh et al., 2013), implemented in the R package BigQuic;
glasso-ADMM, which solves the graphical lasso by ADMM (Boyd et al., 2011);
SCIO (Liu and Luo, 2015), implemented in the R package scio;
D-trace (Zhang and Zou, 2014), implemented using the ADMM algorithm provided in that paper.

Table 1 summarizes the computation time in seconds based on 10 replications, where all methods are run in R on a PC with a 3.3 GHz Intel Core i7 CPU and 16GB of memory. For all the methods, we solve a solution path corresponding to 50 lambda values ranging from λ_max down to λ_max√(log p/n), where λ_max is the maximum absolute element of the sample covariance matrix. Although the stopping criteria differ across methods, we can see from Table 1 the computational advantage of our methods. In particular, our proposed algorithms are much faster than the original quadratic loss based methods SCIO and D-trace for large p. In addition, we can roughly observe that the required time increases quadratically in p for our proposed algorithms.

The second simulation is designed to evaluate the estimation accuracy. Given the true precision matrix Ω and an estimator Ω̂, we report the following four loss functions:

    loss1 = p^{−1/2}‖Ω − Ω̂‖₂,
    loss2 = ‖Ω − Ω̂‖,
    loss3 = p⁻¹{tr(Ω⁻¹Ω̂) − log|Ω⁻¹Ω̂| − p},
    loss4 = p⁻¹{tr(Ω̂ᵀΩ⁻¹Ω̂)/2 − tr(Ω̂) + tr(Ω)/2},

where loss1 is the scaled Frobenius loss, loss2 is the spectral loss, loss3 is Stein's loss, which is related to the Gaussian likelihood, and loss4 is related to the quadratic loss.

Table 2 reports the simulation results, where the tuning parameter is chosen by five-fold cross-validation. We can see that the performance of the three estimators is comparable, indicating that the penalized quadratic loss estimators are also reliable for high dimensional precision matrix estimation. We also observe that the estimator based on (5) has slightly smaller estimation error than that based on (4). As shown in Table 1, the computation for the quadratic loss estimators is much faster than for glasso.
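For reference, the four accuracy measures used in Table 2 can be computed as follows (a sketch in numpy, not part of the paper; Omega denotes the true precision matrix and Omega_hat an estimate):

```python
import numpy as np

def estimation_losses(Omega, Omega_hat):
    """Return (loss1, loss2, loss3, loss4) for a true Omega and an estimate Omega_hat."""
    p = Omega.shape[0]
    diff = Omega - Omega_hat
    loss1 = np.linalg.norm(diff, "fro") / np.sqrt(p)          # scaled Frobenius loss
    loss2 = np.linalg.norm(diff, 2)                           # spectral loss
    Sigma = np.linalg.inv(Omega)                              # Omega^{-1}
    M = Sigma @ Omega_hat
    loss3 = (np.trace(M) - np.linalg.slogdet(M)[1] - p) / p   # Stein's loss
    loss4 = (np.trace(Omega_hat.T @ Sigma @ Omega_hat) / 2
             - np.trace(Omega_hat) + np.trace(Omega) / 2) / p # quadratic loss
    return loss1, loss2, loss3, loss4
```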

Table 1: The average computation time (standard deviation), in seconds, for solving a solution path for high dimensional precision matrix estimation.

              p=100          p=200          p=400          p=800          p=1600
Sparse case where Ω = Ω₂
CLIME         0.388(0.029)   2.863(0.056)   (0.341)        (0.696)        (7.953)
glasso        0.096(0.014)   0.562(0.080)   3.108(0.367)   (1.739)        (11.379)
BigQuic       2.150(0.068)   5.337(0.073)   (0.326)        (0.867)        (2.376)
glasso-ADMM   0.909(0.018)   1.788(0.045)   5.419(0.137)   (0.540)        (3.993)
SCIO          0.044(0.001)   0.272(0.021)   1.766(0.032)   (0.103)        (1.325)
EQUAL         0.064(0.002)   0.343(0.006)   1.180(0.031)   4.846(0.112)   (0.281)
D-trace       0.088(0.003)   0.504(0.010)   2.936(0.070)   (0.598)        (4.959)
EQUALs        0.113(0.003)   0.648(0.014)   1.723(0.039)   5.687(0.184)   (0.373)
Asymptotic sparse case where Ω = Ω₁
CLIME         0.382(0.026)   2.605(0.081)   (0.480)        (1.198)        (9.645)
glasso        0.052(0.007)   0.289(0.049)   1.467(0.324)   9.124(1.756)   (7.386)
BigQuic       1.816(0.043)   4.134(0.033)   (0.123)        (0.247)        (0.574)
glasso-ADMM   0.847(0.018)   1.636(0.032)   5.370(0.239)   (0.518)        (3.656)
SCIO          0.036(0.000)   0.253(0.028)   1.690(0.030)   (0.181)        (0.236)
EQUAL         0.031(0.001)   0.167(0.006)   0.614(0.029)   3.084(0.122)   (0.129)
D-trace       0.037(0.001)   0.213(0.009)   1.438(0.072)   (0.591)        (4.898)
EQUALs        0.048(0.001)   0.273(0.011)   0.856(0.043)   3.518(0.107)   (0.128)

Moreover, to check whether the estimates are singular, we also report the minimum eigenvalue of each estimate in the final column of Table 2. We observe that all the estimates are positive definite.

Finally, we apply our proposal to the Prostate dataset, which is publicly available at stanford.edu/~hastie/casi_files/data/prostate.html. The data record 6033 genetic activity measurements for the control group (50 subjects) and the prostate cancer group (52 subjects). Here, the data dimension p is 6033 and the sample size n is 50 or 52. We estimate the precision matrix for each group. Since EQUAL and EQUALs give similar results, we only report the EQUALs estimates. It took less than 20 minutes for EQUALs to obtain the solution paths, while glasso could not produce a solution due to running out of memory in R. The sparsity levels of the solution paths are plotted in the upper panel of Figure 1. To compare the precision matrices between the two groups, the network graphs of the EQUALs estimators with tuning parameter λ = 0.75 are provided in the lower panel of Figure 1.

References

O. Banerjee, L. El Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. The Journal of Machine Learning Research, 9:485-516, 2008.

S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1-122, 2011.

T. Cai, W. Liu, and X. Luo. A constrained ℓ₁ minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106(494):594-607, 2011.

O. Dalal and B. Rajaratnam. Sparse Gaussian graphical model estimation via alternating minimization. Biometrika, 104(2):379-395, 2017.

J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432-441, 2008.

Table 2: The estimation error (standard deviation) for precision matrix estimation.

                 loss1          loss2          loss3          loss4          min-eigen
Sparse case where Ω = Ω₂
p = 500
EQUAL            0.508(0.009)   1.183(0.051)   0.273(0.004)   0.286(0.004)   0.555(0.018)
EQUALs           0.465(0.010)   1.122(0.050)   0.240(0.003)   0.254(0.004)   0.500(0.013)
glasso           0.554(0.008)   1.209(0.032)   0.232(0.002)   0.273(0.003)   0.298(0.011)
p = 1000
EQUAL            0.605(0.010)   1.323(0.039)   0.304(0.004)   0.326(0.005)   0.577(0.015)
EQUALs           0.550(0.008)   1.274(0.041)   0.267(0.003)   0.289(0.004)   0.527(0.014)
glasso           0.599(0.006)   1.280(0.026)   0.248(0.002)   0.292(0.002)   0.307(0.011)
p = 2000
EQUAL            0.555(0.007)   1.293(0.042)   0.291(0.003)   0.307(0.004)   0.560(0.016)
EQUALs           0.538(0.008)   1.250(0.037)   0.278(0.003)   0.293(0.004)   0.549(0.015)
glasso           0.640(0.005)   1.343(0.025)   0.263(0.001)   0.310(0.002)   0.318(0.010)
Asymptotic sparse case where Ω = Ω₁
p = 500
EQUAL            0.707(0.005)   2.028(0.016)   0.329(0.003)   0.344(0.003)   0.364(0.011)
EQUALs           0.664(0.010)   1.942(0.026)   0.297(0.005)   0.320(0.004)   0.333(0.011)
glasso           0.706(0.004)   2.019(0.010)   0.310(0.002)   0.335(0.001)   0.252(0.008)
p = 1000
EQUAL            0.701(0.008)   2.033(0.020)   0.331(0.004)   0.344(0.004)   0.361(0.012)
EQUALs           0.681(0.003)   1.983(0.013)   0.314(0.002)   0.331(0.002)   0.353(0.011)
glasso           0.728(0.003)   2.070(0.008)   0.321(0.002)   0.344(0.001)   0.261(0.008)
p = 2000
EQUAL            0.860(0.106)   2.351(0.190)   0.446(0.066)   0.426(0.049)   0.322(0.023)
EQUALs           0.666(0.020)   1.984(0.042)   0.317(0.008)   0.331(0.007)   0.348(0.011)
glasso           0.746(0.002)   2.109(0.007)   0.332(0.001)   0.353(0.001)   0.265(0.008)

Figure 1: Estimation for the Prostate data using EQUALs. Upper panel: sparsity level (average number of non-zero elements for each row/column) versus λ; (a) solution path for control subjects, (b) solution path for prostate cancer subjects. Lower panel: network graphs for the two patient groups when λ = 0.75; (c) estimated network for control subjects, (d) estimated network for prostate cancer subjects.

C.-J. Hsieh, M. A. Sustik, I. S. Dhillon, P. K. Ravikumar, and R. Poldrack. BIG & QUIC: Sparse inverse covariance estimation for a million variables. In Advances in Neural Information Processing Systems, pages 3165-3173, 2013.

C.-J. Hsieh, M. A. Sustik, I. S. Dhillon, and P. Ravikumar. QUIC: Quadratic approximation for sparse inverse covariance estimation. The Journal of Machine Learning Research, 15(1):2911-2947, 2014.

W. Liu and X. Luo. Fast and adaptive sparse precision matrix estimation in high dimensions. Journal of Multivariate Analysis, 135:153-162, 2015.

N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the lasso. Annals of Statistics, 34(3):1436-1462, 2006.

P. Ravikumar, M. J. Wainwright, G. Raskutti, and B. Yu. High-dimensional covariance estimation by minimizing ℓ₁-penalized log-determinant divergence. Electronic Journal of Statistics, 5:935-980, 2011.

R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267-288, 1996.

M. Yuan and Y. Lin. Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1):19-35, 2007.

T. Zhang and H. Zou. Sparse precision matrix estimation via lasso penalized D-trace loss. Biometrika, 101(1):103-120, 2014.

H. Zou and R. Li. One-step sparse estimates in nonconcave penalized likelihood models. Annals of Statistics, 36(4):1509-1533, 2008.
