Sparse Inverse Covariance Estimation for a Million Variables. Cho-Jui Hsieh, The University of Texas at Austin. ICML Workshop on Covariance Selection, Beijing, China, June 26, 2014


1 Sparse Inverse Covariance Estimation for a Million Variables Cho-Jui Hsieh, The University of Texas at Austin. ICML Workshop on Covariance Selection, Beijing, China, June 26, 2014. Joint work with M. Sustik, I. Dhillon, P. Ravikumar, R. Poldrack, A. Banerjee, S. Becker, and P. Olsen.

2 FMRI Brain Analysis Goal: Reveal functional connections between regions of the brain (Sun et al., 2009; Smith et al., 2011; Varoquaux et al., 2010; Ng et al., 2011). p = 228,483 voxels. Figure from (Varoquaux et al., 2010).

3 Other Applications Gene regulatory network discovery (Schafer & Strimmer, 2005; Andrei & Kendziorski, 2009; Menendez et al., 2010; Yin & Li, 2011). Financial data analysis: model dependencies in multivariate time series (Xuan & Murphy, 2007); sparse high-dimensional models in economics (Fan et al., 2011). Social network analysis / web data: model co-authorship networks (Goldenberg & Moore, 2005); model item-item similarity for recommender systems (Agarwal et al., 2011). Climate data analysis (Chen et al., 2010). Signal processing (Zhang & Fung, 2013). Anomaly detection (Ide et al., 2009). Time series prediction and classification (Wang et al., 2014).

4 L1-regularized inverse covariance selection
Goal: Estimate the inverse covariance matrix in the high-dimensional setting: p (# variables) >> n (# samples).
Add l1 regularization: a sparse inverse covariance matrix is preferred.
The l1-regularized maximum likelihood estimator:
\hat{\Sigma}^{-1} = \arg\min_{X \succ 0} \{ -\log\det X + \mathrm{tr}(SX) + \lambda \|X\|_1 \} = \arg\min_{X \succ 0} f(X),
where the first two terms are the negative log-likelihood and \|X\|_1 = \sum_{i,j} |X_{ij}|.
High-dimensional problems: the number of parameters scales quadratically with the number of variables.
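To make the estimator concrete, here is a minimal Python sketch (not part of the talk; names are illustrative) that evaluates f(X) = -\log\det X + \mathrm{tr}(SX) + \lambda\|X\|_1 for a candidate positive definite X, using a Cholesky factorization for a stable log-determinant.

```python
import numpy as np

def objective(X, S, lam):
    """f(X) = -log det X + tr(S X) + lam * ||X||_1 for symmetric positive definite X."""
    L = np.linalg.cholesky(X)                    # also certifies positive definiteness
    log_det = 2.0 * np.sum(np.log(np.diag(L)))   # log det X from the Cholesky factor
    return -log_det + np.trace(S @ X) + lam * np.abs(X).sum()

# Tiny usage example on a random 5-variable problem.
rng = np.random.default_rng(0)
samples = rng.standard_normal((5, 50))           # 5 variables, 50 samples
S = np.cov(samples)                              # empirical covariance
print(objective(np.eye(5), S, lam=0.1))
```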

5 Proximal Newton method
Split smooth and non-smooth terms: f(X) = g(X) + h(X), where g(X) = -\log\det X + \mathrm{tr}(SX) and h(X) = \lambda \|X\|_1.
Form the quadratic approximation of g at X_t + \Delta:
\bar{g}_{X_t}(\Delta) = \mathrm{tr}((S - W_t)\Delta) + (1/2)\,\mathrm{vec}(\Delta)^T (W_t \otimes W_t)\,\mathrm{vec}(\Delta) - \log\det X_t + \mathrm{tr}(S X_t),
where W_t = (X_t)^{-1} = \nabla_X \log\det(X)|_{X = X_t}.
Define the generalized Newton direction:
D_t = \arg\min_{\Delta} \bar{g}_{X_t}(\Delta) + \lambda \|X_t + \Delta\|_1.
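As a side note, the quadratic term can be evaluated without ever forming the p^2 x p^2 Kronecker product, since \mathrm{vec}(\Delta)^T (W \otimes W)\,\mathrm{vec}(\Delta) = \mathrm{tr}(\Delta W \Delta W) for symmetric \Delta and W. A minimal sketch (illustrative, not from the talk):

```python
import numpy as np

def quadratic_model(Delta, X_t, S):
    """Evaluate g_bar_{X_t}(Delta) for g(X) = -log det X + tr(S X).

    Uses vec(Delta)^T (W (x) W) vec(Delta) = tr(Delta W Delta W) with W = X_t^{-1},
    so the Kronecker product is never formed explicitly.
    """
    W = np.linalg.inv(X_t)
    L = np.linalg.cholesky(X_t)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    constant = -log_det + np.trace(S @ X_t)          # g(X_t)
    linear = np.trace((S - W) @ Delta)               # <grad g(X_t), Delta>
    quadratic = 0.5 * np.trace(Delta @ W @ Delta @ W)
    return constant + linear + quadratic
```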

6 Algorithm: Proximal Newton method for sparse inverse covariance estimation
Input: empirical covariance matrix S, scalar \lambda, initial X_0.
For t = 0, 1, ...
1. Compute the Newton direction: D_t = \arg\min_{\Delta} \bar{f}_{X_t}(X_t + \Delta) (a Lasso problem).
2. Line search: use an Armijo-rule based step-size selection to get \alpha such that X_{t+1} = X_t + \alpha D_t is positive definite and satisfies the sufficient decrease condition f(X_t + \alpha D_t) \le f(X_t) + \alpha\sigma\delta_t (checked via a Cholesky factorization of X_t + \alpha D_t).
(Hsieh et al., 2011; Olsen et al., 2012; Lee et al., 2012; Dinh et al., 2013; Scheinberg et al., 2014)
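A rough sketch of step 2 (my own illustration under the slide's assumptions, not the authors' code): backtrack over step sizes and accept the first \alpha for which X_t + \alpha D_t passes a Cholesky test and satisfies the sufficient-decrease condition; the objective f and the predicted decrease delta are supplied by the caller.

```python
import numpy as np

def armijo_line_search(f, X, D, delta, sigma=0.1, beta=0.5, max_backtracks=30):
    """Find alpha so that X + alpha*D is positive definite and
    f(X + alpha*D) <= f(X) + alpha*sigma*delta, where delta < 0 is the predicted decrease."""
    f_X = f(X)
    alpha = 1.0
    for _ in range(max_backtracks):
        X_new = X + alpha * D
        try:
            np.linalg.cholesky(X_new)      # raises LinAlgError if X_new is not positive definite
        except np.linalg.LinAlgError:
            alpha *= beta
            continue
        if f(X_new) <= f_X + alpha * sigma * delta:
            return alpha
        alpha *= beta
    raise RuntimeError("line search did not find an acceptable step size")
```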

7 Scalability
How do we apply the proximal Newton method to high-dimensional problems?
Number of parameters: too large (p^2). Hessian: cannot be explicitly formed (p^2 x p^2). Limited memory...
BigQUIC (2013): p = 1,000,000 (1 trillion parameters) in 22.9 hours with 32 GB of memory (using a single machine with 32 cores).
Key ingredients:
Active variable selection: the complexity scales with the intrinsic dimensionality.
Block coordinate descent: handles problems where # parameters > memory size (Big & Quic).
Other techniques: divide-and-conquer, efficient line search...
Extension to general dirty models.

8 Active Variable Selection

9 Motivation We can directly implement a proximal Newton method; it takes 28 seconds on the ER dataset. Running Glasso takes only 25 seconds. Why, then, do people like to use proximal Newton methods?

10 Newton Iteration 1 ER dataset, p = 692, Newton iteration 1.

11 Newton Iteration 3 ER dataset, p = 692, Newton iteration 3.

12 Newton Iteration 5 ER dataset, p = 692, Newton iteration 5.

13 Newton Iteration 7 ER dataset, p = 692, Newton iteration 7. Only 2.7% of the variables have been changed to nonzero.

14 Active Variable Selection
Our goal: reduce the number of variables from O(p^2) to O(\|X^*\|_0).
The solution usually has a small number of nonzeros: \|X^*\|_0 << p^2.
Strategy: before solving for the Newton direction, guess which variables should be updated.
Fixed set: S_fixed = {(i, j) : X_{ij} = 0 and |\nabla_{ij} g(X)| \le \lambda}.
The remaining variables constitute the free set.
Solve the smaller subproblem: D_t = \arg\min_{\Delta : \Delta_{S_fixed} = 0} \bar{g}_{X_t}(\Delta) + \lambda \|X_t + \Delta\|_1.
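The selection rule has a direct implementation, since \nabla g(X) = S - W with W = X^{-1}; the sketch below (illustrative, dense, and not the released QUIC code) marks an entry as free if it is nonzero or its gradient magnitude exceeds \lambda.

```python
import numpy as np

def free_set_mask(X, S, lam):
    """Boolean mask of free entries: X_ij != 0 or |grad_ij g(X)| > lam."""
    W = np.linalg.inv(X)        # a sketch only; for huge p, W is never formed densely
    grad = S - W                # gradient of g(X) = -log det X + tr(S X)
    return (X != 0) | (np.abs(grad) > lam)

# Example: with a reasonably large lambda, most of the p^2 entries stay fixed.
rng = np.random.default_rng(1)
S = np.cov(rng.standard_normal((10, 200)))
mask = free_set_mask(np.eye(10), S, lam=0.5)
print(int(mask.sum()), "free entries out of", mask.size)
```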

15 Active Variable Selection
Is this just a heuristic? No. The process is equivalent to a block coordinate descent algorithm with two blocks:
1. Update the fixed set: no change (by the definition of S_fixed).
2. Update the free set: solve the subproblem.
Convergence is guaranteed.

16 Size of the free set In practice, the size of the free set is small. Take the hereditarybc dataset as an example: p = 1,869, so the number of variables is p^2 = 3.49 million. The size of the free set drops to 20,000 at the end.

17 Real datasets (a) Time for Estrogen, p = 692; (b) Time for hereditarybc, p = 1,869. Figure: Comparison of algorithms on real datasets. The results show QUIC converges faster than other methods.

18 Algorithm QUIC: QUadratic approximation for sparse Inverse Covariance estimation
Input: empirical covariance matrix S, scalar \lambda, initial X_0.
For t = 0, 1, ...
1. Variable selection: select a free set of m << p^2 variables.
2. Use coordinate descent to find the descent direction: D_t = \arg\min_{\Delta} \bar{f}_{X_t}(X_t + \Delta) over the set of free variables (a Lasso problem).
3. Line search: use an Armijo-rule based step-size selection to get \alpha such that X_{t+1} = X_t + \alpha D_t is positive definite and satisfies the sufficient decrease condition f(X_t + \alpha D_t) \le f(X_t) + \alpha\sigma\delta_t (checked via a Cholesky factorization of X_t + \alpha D_t).

19 Extensions Also used in LIBLINEAR (Yuan et al., 2011) for l1-regularized logistic regression. Can be generalized to active subspace selection for the nuclear norm (Hsieh et al., 2014). Can be generalized to general decomposable norms.

20 Efficient (block) Coordinate Descent with Limited Memory

21 Coordinate Descent Updates
Use coordinate descent to solve: \arg\min_D \{ \bar{g}_X(D) + \lambda \|X + D\|_1 \}.
Current gradient: \nabla \bar{g}_X(D) = (W \otimes W)\mathrm{vec}(D) + S - W = WDW + S - W, where W = X^{-1}.
Closed-form solution for each coordinate descent update:
D_{ij} \leftarrow D_{ij} - c + S(c - b/a, \lambda/a),
where S(z, r) = \mathrm{sign}(z)\max\{|z| - r, 0\} is the soft-thresholding function, a = W_{ij}^2 + W_{ii}W_{jj}, b = S_{ij} - W_{ij} + w_i^T D w_j, and c = X_{ij} + D_{ij}.
The main cost is in computing w_i^T D w_j. Instead, we maintain U = DW after each coordinate update and then compute w_i^T u_j: only O(p) flops per update.
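A single-entry version of this update can be sketched as follows (illustrative only; the actual implementation updates (i, j) and (j, i) together and exploits sparsity):

```python
import numpy as np

def soft_threshold(z, r):
    return np.sign(z) * max(abs(z) - r, 0.0)

def coordinate_update(i, j, X, S, W, D, U, lam):
    """One coordinate-descent update of D_ij for the quadratic subproblem.

    W = X^{-1}; U caches D @ W so that w_i^T D w_j = W[i, :] @ U[:, j] costs O(p).
    """
    a = W[i, j] ** 2 + W[i, i] * W[j, j]
    b = S[i, j] - W[i, j] + W[i, :] @ U[:, j]
    c = X[i, j] + D[i, j]
    mu = -c + soft_threshold(c - b / a, lam / a)   # change applied to D_ij
    D[i, j] += mu
    U[i, :] += mu * W[j, :]                        # keep the cache U = D W consistent
    return mu
```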

22 Difficulties in Scaling QUIC
Consider the case where p \approx 1 million and m = \|X_t\|_0 \approx 50 million.
Coordinate descent requires X_t and W = X_t^{-1}:
needs O(p^2) storage,
needs O(mp) computation per sweep, where m = \|X_t\|_0.

23 Coordinate Updates with Memory Cache
Assume we can store M columns of W in memory.
Coordinate descent update (i, j): compute w_i^T D w_j.
If w_i, w_j are not in memory: recompute by CG (solve X w_i = e_i): O(T_CG) time.
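When a needed column is not cached, it can be recovered by conjugate gradients because X is positive definite. A small sketch using SciPy (illustrative, not the BigQUIC implementation):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import cg

def column_of_inverse(X, i):
    """Approximate the i-th column of W = X^{-1} by solving X w = e_i with CG."""
    p = X.shape[0]
    e_i = np.zeros(p)
    e_i[i] = 1.0
    w_i, info = cg(X, e_i)                 # X must be symmetric positive definite
    if info != 0:
        raise RuntimeError("CG did not converge")
    return w_i

# Example on a small sparse tridiagonal SPD matrix.
p = 100
X = csr_matrix(np.diag(2.0 * np.ones(p)) + np.diag(-0.5 * np.ones(p - 1), 1)
               + np.diag(-0.5 * np.ones(p - 1), -1))
w0 = column_of_inverse(X, 0)
print(np.allclose(X @ w0, np.eye(p)[:, 0], atol=1e-4))
```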

24 Coordinate Updates with Memory Cache w_1, w_2, w_3, w_4 stored in memory.

25 Coordinate Updates with Memory Cache Cache hit: no need to recompute w_i, w_j.

26 Coordinate Updates with Memory Cache Cache miss: recompute w_i, w_j.

27 Coordinate Updates: ideal case
We want to find an update sequence that minimizes the number of cache misses: probably NP-hard.
Our strategy: update variables block by block.
The ideal case: there exists a partition {S_1, ..., S_k} such that all free sets lie in the diagonal blocks. This requires only p column evaluations.

28 General case: block diagonal + sparse
If the block partition is not perfect, the extra column computations can be characterized by boundary nodes.
Given a partition {S_1, ..., S_k}, we define the boundary nodes as
B(S_q) := {j : j \in S_q and \exists i \in S_z, z \ne q, such that F_{ij} = 1},
where F is the adjacency matrix of the free set.

29 Graph Clustering Algorithm
The number of columns to be computed in one sweep is p + \sum_q |B(S_q)|.
It can be upper bounded by
p + \sum_q |B(S_q)| \le p + \sum_q \sum_{z \ne q} \sum_{i \in S_z, j \in S_q} F_{ij}.
Use graph clustering (METIS or Graclus) to find the partition.
Example: on the fMRI dataset (p \approx 0.23 million) with 20 blocks, a random partition needs 1.6 million column computations, while graph clustering needs far fewer.
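The count p + \sum_q |B(S_q)| can be computed directly from the free-set pattern and a candidate partition; the sketch below (hypothetical helper, not from the talk) compares a block-aligned assignment with a random one.

```python
import numpy as np

def column_computations(F, labels):
    """p + sum_q |B(S_q)| for a symmetric 0/1 free-set pattern F and block labels."""
    p = F.shape[0]
    same_block = labels[:, None] == labels[None, :]
    cross = (F != 0) & ~same_block          # free-set edges that cross block boundaries
    boundary = cross.any(axis=0)            # j is a boundary node if it has a cross-block neighbor
    return p + int(boundary.sum())

# Example: a pattern that is nearly block diagonal.
rng = np.random.default_rng(2)
p, k = 200, 4
labels = np.repeat(np.arange(k), p // k)
F = (rng.random((p, p)) < 0.02) & (labels[:, None] == labels[None, :])
F |= rng.random((p, p)) < 0.001             # a few cross-block entries
F = F | F.T
print("aligned partition:", column_computations(F, labels))
print("random partition: ", column_computations(F, rng.permutation(labels)))
```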

30 BigQUIC
Block coordinate descent with clustering:
storage: O(p^2) \to O(m + p^2/k),
computation per sweep: stays O(mp), where m = \|X_t\|_0 and k is the number of blocks.

31 Algorithm BigQUIC
Input: samples Y, scalar \lambda, initial X_0.
For t = 0, 1, ...
1. Variable selection: select a free set of m << p^2 variables.
2. Construct a partition by clustering.
3. Run block coordinate descent to find the descent direction: D_t = \arg\min_{\Delta} \bar{f}_{X_t}(X_t + \Delta) over the set of free variables.
4. Line search: use an Armijo-rule based step-size selection to get \alpha such that X_{t+1} = X_t + \alpha D_t is positive definite and satisfies the sufficient decrease condition f(X_t + \alpha D_t) \le f(X_t) + \alpha\sigma\delta_t (checked via the Schur complement with the conjugate gradient method).

32 Convergence Analysis

33 Convergence Guarantees for QUIC
Theorem (Hsieh, Sustik, Dhillon & Ravikumar, NIPS 2011): QUIC converges quadratically to the global optimum, that is, for some constant 0 < \kappa < 1:
\lim_{t \to \infty} \|X_{t+1} - X^*\|_F / \|X_t - X^*\|_F^2 = \kappa.
Other convergence proofs for proximal Newton: (Lee et al., 2012) discuss convergence for general loss functions and regularizers; (Katyas et al., 2014) discuss the global convergence rate.

35 BigQUIC Convergence Analysis
Recall W = X^{-1}. When each w_i is computed by CG (solving X w_i = e_i):
The gradient \nabla_{ij} g(X) = S_{ij} - W_{ij} on the free set can be computed once and stored in memory.
The Hessian (w_i^T D w_j in the coordinate updates) needs to be repeatedly computed.
To reduce the time overhead, the Hessian should be computed approximately.
Theorem (Hsieh, Sustik, Dhillon, Ravikumar & Poldrack, 2013): If \nabla g(X) is computed exactly and \eta I \preceq \hat{H}_t \preceq \bar{\eta} I for some constants \bar{\eta} \ge \eta > 0 at every Newton iteration, then BigQUIC converges to the global optimum.

37 BigQUIC Convergence Analysis
To obtain a super-linear convergence rate, we require the Hessian computation to become more and more accurate as X_t approaches X^*.
Measure the accuracy of the Hessian computation by R = [r_1, ..., r_p], where r_i = X w_i - e_i.
The optimality of X_t can be measured by the minimum-norm subgradient:
\nabla^S_{ij} f(X) = \nabla_{ij} g(X) + \mathrm{sign}(X_{ij})\lambda if X_{ij} \ne 0,
\nabla^S_{ij} f(X) = \mathrm{sign}(\nabla_{ij} g(X)) \max(|\nabla_{ij} g(X)| - \lambda, 0) if X_{ij} = 0.
Theorem (Hsieh, Sustik, Dhillon, Ravikumar & Poldrack, 2013): In BigQUIC, if the residual matrix satisfies \|R_t\| = O(\|\nabla^S f(X_t)\|^l) for some 0 < l \le 1 as t \to \infty, then \|X_{t+1} - X^*\| = O(\|X_t - X^*\|^{1+l}) as t \to \infty.
When \|R_t\| = O(\|\nabla^S f(X_t)\|), BigQUIC has an asymptotic quadratic convergence rate.
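For concreteness, the minimum-norm subgradient \nabla^S f(X) used above can be computed entrywise as in the following sketch (illustrative, dense):

```python
import numpy as np

def min_norm_subgradient(X, S, lam):
    """Entrywise minimum-norm subgradient of f(X) = -log det X + tr(S X) + lam*||X||_1."""
    grad = S - np.linalg.inv(X)                        # gradient of the smooth part g
    sub = grad + lam * np.sign(X)                      # case X_ij != 0
    shrunk = np.sign(grad) * np.maximum(np.abs(grad) - lam, 0.0)
    zero = (X == 0)
    sub[zero] = shrunk[zero]                           # case X_ij == 0
    return sub
```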

38 Experimental results (scalability) (a) Scalability on random graphs; (b) Scalability on chain graphs. Figure: BigQUIC can solve one-million-dimensional problems.

39 Experimental results BigQUIC is faster even for medium-size problems. Figure: Comparison on fMRI data restricted to a subset of the voxels (the maximum dimension that previous methods can handle).

40 Results on the fMRI dataset
228,483 voxels, 518 time points.
\lambda = 0.6: average degree 8; BigQUIC took 5 hours.
\lambda = 0.5: average degree 38; BigQUIC took 21 hours.
Findings: voxels with large degree were generally found in the gray matter; meaningful brain modules can be detected by modularity clustering.

41 Quic & Dirty A Quadratic Approximation Approach for Dirty Statistical Models

42 Dirty Statistical Models
Superposition-structured models, or dirty models (Yang et al., 2013):
\min_{\{\theta^{(r)}\}_{r=1}^k} L\big(\sum_r \theta^{(r)}\big) + \sum_r \lambda_r \|\theta^{(r)}\|_{A_r} := F(\theta).
An example: GMRF with latent features (Chandrasekaran et al., 2012):
\min_{S, L : L \succeq 0, S - L \succ 0} -\log\det(S - L) + \langle S - L, \Sigma \rangle + \lambda_S \|S\|_1 + \lambda_L \|L\|_*.
When there are two groups of random variables X, Y, write \Theta = [\Theta_{XX}, \Theta_{XY}; \Theta_{YX}, \Theta_{YY}]. If we only observe X and Y is the set of hidden variables, the marginal precision matrix is \Theta_{XX} - \Theta_{XY}\Theta_{YY}^{-1}\Theta_{YX}, and therefore has a low rank + sparse structure.
Error bound: (Meng et al., 2014).
Other applications: robust PCA (low rank + sparse), multitask learning (sparse + group sparse), ...
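To spell out the latent-variable GMRF example, here is a sketch of the objective being minimized (S is the sparse part, L the low-rank part, Sigma the sample covariance, as on the slide; illustrative only):

```python
import numpy as np

def latent_gmrf_objective(S, L, Sigma, lam_S, lam_L):
    """-log det(S - L) + <S - L, Sigma> + lam_S*||S||_1 + lam_L*||L||_*.

    Assumes S - L is positive definite and L is positive semidefinite.
    """
    M = S - L
    chol = np.linalg.cholesky(M)                        # fails if S - L is not positive definite
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    data_fit = -log_det + np.sum(M * Sigma)             # <A, B> = entrywise inner product
    nuclear = np.linalg.svd(L, compute_uv=False).sum()  # nuclear norm = sum of singular values
    return data_fit + lam_S * np.abs(S).sum() + lam_L * nuclear
```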

43 Quadratic Approximation
The Newton direction d = [d^{(1)}, ..., d^{(k)}]:
[d^{(1)}, ..., d^{(k)}] = \arg\min_{\Delta^{(1)}, ..., \Delta^{(k)}} \bar{g}(\theta + \Delta) + \sum_{r=1}^k \lambda_r \|\theta^{(r)} + \Delta^{(r)}\|_{A_r},
where
\bar{g}(\theta + \Delta) = g(\theta) + \sum_r \langle \Delta^{(r)}, G \rangle + (1/2)\,\mathrm{vec}(\Delta)^T \bar{H}\,\mathrm{vec}(\Delta),
with G = \nabla L(\sum_r \theta^{(r)}), and \bar{H} := \nabla^2 \bar{g}(\theta) is the k x k block matrix whose every block equals H := \nabla^2 L(\sum_r \theta^{(r)}).

44 Quadratic Approximation
A general active subspace selection for decomposable norms:
l1 norm: variable selection.
Nuclear norm: subspace selection.
Group sparse norm: group selection.
Method for solving the quadratic approximation subproblem: alternating minimization.
Convergence analysis: the Hessian is not positive definite, yet we still have quadratic convergence.

45 Experimental Comparisons (a) ER dataset; (b) Leukemia dataset. Figure: Comparison on gene expression data (GMRF with latent variables).

46 Conclusions
How to scale the proximal Newton method to high-dimensional problems:
Active variable selection: the complexity scales with the intrinsic dimensionality.
Block coordinate descent: handles problems where # parameters > memory size (Big & Quic).
Other techniques: divide-and-conquer, efficient line search...
BigQUIC: a memory-efficient quadratic approximation method for sparse inverse covariance estimation (scales to 1 million by 1 million problems).
With a careful design, we can apply the proximal Newton method to models with superposition-structured regularization (sparse + low rank; sparse + group sparse; low rank + group sparse; ...).

47 References
[1] C.-J. Hsieh, M. Sustik, I. S. Dhillon, P. Ravikumar, and R. Poldrack. BIG & QUIC: Sparse inverse covariance estimation for a million variables. NIPS (oral presentation), 2013.
[2] C.-J. Hsieh, M. Sustik, I. S. Dhillon, and P. Ravikumar. Sparse inverse covariance matrix estimation using quadratic approximation. NIPS, 2011.
[3] C.-J. Hsieh, I. S. Dhillon, P. Ravikumar, and A. Banerjee. A divide-and-conquer procedure for sparse inverse covariance estimation. NIPS, 2012.
[4] P. A. Olsen, F. Oztoprak, J. Nocedal, and S. J. Rennie. Newton-like methods for sparse inverse covariance estimation. Optimization Online, 2012.
[5] O. Banerjee, L. El Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. JMLR, 2008.
[6] J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 2008.
[7] L. Li and K.-C. Toh. An inexact interior point method for L1-regularized sparse covariance selection. Mathematical Programming Computation, 2010.
[8] K. Scheinberg, S. Ma, and D. Goldfarb. Sparse inverse covariance selection via alternating linearization methods. NIPS, 2010.
[9] K. Scheinberg and I. Rish. Learning sparse Gaussian Markov networks using a greedy coordinate ascent approach. Machine Learning and Knowledge Discovery in Databases, 2010.

Big & Quic: Sparse Inverse Covariance Estimation for a Million Variables

Big & Quic: Sparse Inverse Covariance Estimation for a Million Variables for a Million Variables Cho-Jui Hsieh The University of Texas at Austin NIPS Lake Tahoe, Nevada Dec 8, 2013 Joint work with M. Sustik, I. Dhillon, P. Ravikumar and R. Poldrack FMRI Brain Analysis Goal:

More information

Sparse Inverse Covariance Estimation for a Million Variables

Sparse Inverse Covariance Estimation for a Million Variables Sparse Inverse Covariance Estimation for a Million Variables Inderjit S. Dhillon Depts of Computer Science & Mathematics The University of Texas at Austin SAMSI LDHD Opening Workshop Raleigh, North Carolina

More information

Approximation. Inderjit S. Dhillon Dept of Computer Science UT Austin. SAMSI Massive Datasets Opening Workshop Raleigh, North Carolina.

Approximation. Inderjit S. Dhillon Dept of Computer Science UT Austin. SAMSI Massive Datasets Opening Workshop Raleigh, North Carolina. Using Quadratic Approximation Inderjit S. Dhillon Dept of Computer Science UT Austin SAMSI Massive Datasets Opening Workshop Raleigh, North Carolina Sept 12, 2012 Joint work with C. Hsieh, M. Sustik and

More information

BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables

BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables Cho-Jui Hsieh, Mátyás A. Sustik, Inderjit S. Dhillon, Pradeep Ravikumar Department of Computer Science University of Texas at Austin

More information

Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation

Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation Cho-Jui Hsieh, Mátyás A. Sustik, Inderjit S. Dhillon, and Pradeep Ravikumar Department of Computer Science University of Texas

More information

QUIC & DIRTY: A Quadratic Approximation Approach for Dirty Statistical Models

QUIC & DIRTY: A Quadratic Approximation Approach for Dirty Statistical Models QUIC & DIRTY: A Quadratic Approximation Approach for Dirty Statistical Models Cho-Jui Hsieh, Inderjit S. Dhillon, Pradeep Ravikumar University of Texas at Austin Austin, TX 7872 USA {cjhsieh,inderjit,pradeepr}@cs.utexas.edu

More information

A Divide-and-Conquer Procedure for Sparse Inverse Covariance Estimation

A Divide-and-Conquer Procedure for Sparse Inverse Covariance Estimation A Divide-and-Conquer Procedure for Sparse Inverse Covariance Estimation Cho-Jui Hsieh Dept. of Computer Science University of Texas, Austin cjhsieh@cs.utexas.edu Inderjit S. Dhillon Dept. of Computer Science

More information

Efficient Quasi-Newton Proximal Method for Large Scale Sparse Optimization

Efficient Quasi-Newton Proximal Method for Large Scale Sparse Optimization Efficient Quasi-Newton Proximal Method for Large Scale Sparse Optimization Xiaocheng Tang Department of Industrial and Systems Engineering Lehigh University Bethlehem, PA 18015 xct@lehigh.edu Katya Scheinberg

More information

Robust Inverse Covariance Estimation under Noisy Measurements

Robust Inverse Covariance Estimation under Noisy Measurements .. Robust Inverse Covariance Estimation under Noisy Measurements Jun-Kun Wang, Shou-De Lin Intel-NTU, National Taiwan University ICML 2014 1 / 30 . Table of contents Introduction.1 Introduction.2 Related

More information

Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation

Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation Cho-Jui Hsieh, Mátyás A. Sustik, Inderjit S. Dhillon, and Pradeep Ravikumar Department of Computer Science University of Texas

More information

Newton-Like Methods for Sparse Inverse Covariance Estimation

Newton-Like Methods for Sparse Inverse Covariance Estimation Newton-Like Methods for Sparse Inverse Covariance Estimation Peder A. Olsen Figen Oztoprak Jorge Nocedal Stephen J. Rennie June 7, 2012 Abstract We propose two classes of second-order optimization methods

More information

r=1 r=1 argmin Q Jt (20) After computing the descent direction d Jt 2 dt H t d + P (x + d) d i = 0, i / J

r=1 r=1 argmin Q Jt (20) After computing the descent direction d Jt 2 dt H t d + P (x + d) d i = 0, i / J 7 Appendix 7. Proof of Theorem Proof. There are two main difficulties in proving the convergence of our algorithm, and none of them is addressed in previous works. First, the Hessian matrix H is a block-structured

More information

STA141C: Big Data & High Performance Statistical Computing

STA141C: Big Data & High Performance Statistical Computing STA141C: Big Data & High Performance Statistical Computing Lecture 8: Optimization Cho-Jui Hsieh UC Davis May 9, 2017 Optimization Numerical Optimization Numerical Optimization: min X f (X ) Can be applied

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 29, 2016 Outline Convex vs Nonconvex Functions Coordinate Descent Gradient Descent Newton s method Stochastic Gradient Descent Numerical Optimization

More information

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss arxiv:1811.04545v1 [stat.co] 12 Nov 2018 Cheng Wang School of Mathematical Sciences, Shanghai Jiao

More information

QUIC & DIRTY: A Quadratic Approximation Approach for Dirty Statistical Models

QUIC & DIRTY: A Quadratic Approximation Approach for Dirty Statistical Models QUIC & DIRTY: A Quadratic Approximation Approach for Dirty Statistical Models Cho-Jui Hsieh, Inderjit S. Dhillon, Pradeep Ravikumar University of Texas at Austin Austin, TX 7872 USA {cjhsieh,inderjit,pradeepr}@cs.utexas.edu

More information

Convex Optimization Algorithms for Machine Learning in 10 Slides

Convex Optimization Algorithms for Machine Learning in 10 Slides Convex Optimization Algorithms for Machine Learning in 10 Slides Presenter: Jul. 15. 2015 Outline 1 Quadratic Problem Linear System 2 Smooth Problem Newton-CG 3 Composite Problem Proximal-Newton-CD 4 Non-smooth,

More information

Proximal Newton Method. Ryan Tibshirani Convex Optimization /36-725

Proximal Newton Method. Ryan Tibshirani Convex Optimization /36-725 Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h

More information

Lecture 25: November 27

Lecture 25: November 27 10-725: Optimization Fall 2012 Lecture 25: November 27 Lecturer: Ryan Tibshirani Scribes: Matt Wytock, Supreeth Achar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have

More information

Sparse Gaussian conditional random fields

Sparse Gaussian conditional random fields Sparse Gaussian conditional random fields Matt Wytock, J. ico Kolter School of Computer Science Carnegie Mellon University Pittsburgh, PA 53 {mwytock, zkolter}@cs.cmu.edu Abstract We propose sparse Gaussian

More information

Sparse Inverse Covariance Estimation with Hierarchical Matrices

Sparse Inverse Covariance Estimation with Hierarchical Matrices Sparse Inverse Covariance Estimation with ierarchical Matrices Jonas Ballani Daniel Kressner October, 0 Abstract In statistics, a frequent task is to estimate the covariance matrix of a set of normally

More information

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R

More information

Frist order optimization methods for sparse inverse covariance selection

Frist order optimization methods for sparse inverse covariance selection Frist order optimization methods for sparse inverse covariance selection Katya Scheinberg Lehigh University ISE Department (joint work with D. Goldfarb, Sh. Ma, I. Rish) Introduction l l l l l l The field

More information

Coordinate descent. Geoff Gordon & Ryan Tibshirani Optimization /

Coordinate descent. Geoff Gordon & Ryan Tibshirani Optimization / Coordinate descent Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Adding to the toolbox, with stats and ML in mind We ve seen several general and useful minimization tools First-order methods

More information

Lecture 17: October 27

Lecture 17: October 27 0-725/36-725: Convex Optimiation Fall 205 Lecturer: Ryan Tibshirani Lecture 7: October 27 Scribes: Brandon Amos, Gines Hidalgo Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These

More information

Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation

Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Adam J. Rothman School of Statistics University of Minnesota October 8, 2014, joint work with Liliana

More information

Inverse Covariance Estimation with Missing Data using the Concave-Convex Procedure

Inverse Covariance Estimation with Missing Data using the Concave-Convex Procedure Inverse Covariance Estimation with Missing Data using the Concave-Convex Procedure Jérôme Thai 1 Timothy Hunter 1 Anayo Akametalu 1 Claire Tomlin 1 Alex Bayen 1,2 1 Department of Electrical Engineering

More information

Sparse Covariance Selection using Semidefinite Programming

Sparse Covariance Selection using Semidefinite Programming Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support

More information

Chapter 17: Undirected Graphical Models

Chapter 17: Undirected Graphical Models Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)

More information

10708 Graphical Models: Homework 2

10708 Graphical Models: Homework 2 10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves

More information

Multivariate Normal Models

Multivariate Normal Models Case Study 3: fmri Prediction Graphical LASSO Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Emily Fox February 26 th, 2013 Emily Fox 2013 1 Multivariate Normal Models

More information

Multivariate Normal Models

Multivariate Normal Models Case Study 3: fmri Prediction Coping with Large Covariances: Latent Factor Models, Graphical Models, Graphical LASSO Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox February

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 27, 2015 Outline One versus all/one versus one Ranking loss for multiclass/multilabel classification Scaling to millions of labels Multiclass

More information

Sparse inverse covariance estimation with the lasso

Sparse inverse covariance estimation with the lasso Sparse inverse covariance estimation with the lasso Jerome Friedman Trevor Hastie and Robert Tibshirani November 8, 2007 Abstract We consider the problem of estimating sparse graphs by a lasso penalty

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 18, 2016 Outline One versus all/one versus one Ranking loss for multiclass/multilabel classification Scaling to millions of labels Multiclass

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 5, 06 Reading: See class website Eric Xing @ CMU, 005-06

More information

Gaussian Graphical Models and Graphical Lasso

Gaussian Graphical Models and Graphical Lasso ELE 538B: Sparsity, Structure and Inference Gaussian Graphical Models and Graphical Lasso Yuxin Chen Princeton University, Spring 2017 Multivariate Gaussians Consider a random vector x N (0, Σ) with pdf

More information

Exact Hybrid Covariance Thresholding for Joint Graphical Lasso

Exact Hybrid Covariance Thresholding for Joint Graphical Lasso Exact Hybrid Covariance Thresholding for Joint Graphical Lasso Qingming Tang Chao Yang Jian Peng Jinbo Xu Toyota Technological Institute at Chicago Massachusetts Institute of Technology Abstract. This

More information

Joint Gaussian Graphical Model Review Series I

Joint Gaussian Graphical Model Review Series I Joint Gaussian Graphical Model Review Series I Probability Foundations Beilun Wang Advisor: Yanjun Qi 1 Department of Computer Science, University of Virginia http://jointggm.org/ June 23rd, 2017 Beilun

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 11, 2016 Paper presentations and final project proposal Send me the names of your group member (2 or 3 students) before October 15 (this Friday)

More information

Memory Efficient Kernel Approximation

Memory Efficient Kernel Approximation Si Si Department of Computer Science University of Texas at Austin ICML Beijing, China June 23, 2014 Joint work with Cho-Jui Hsieh and Inderjit S. Dhillon Outline Background Motivation Low-Rank vs. Block

More information

Robust Inverse Covariance Estimation under Noisy Measurements

Robust Inverse Covariance Estimation under Noisy Measurements Jun-Kun Wang Intel-NTU, National Taiwan University, Taiwan Shou-de Lin Intel-NTU, National Taiwan University, Taiwan WANGJIM123@GMAIL.COM SDLIN@CSIE.NTU.EDU.TW Abstract This paper proposes a robust method

More information

Sparse Graph Learning via Markov Random Fields

Sparse Graph Learning via Markov Random Fields Sparse Graph Learning via Markov Random Fields Xin Sui, Shao Tang Sep 23, 2016 Xin Sui, Shao Tang Sparse Graph Learning via Markov Random Fields Sep 23, 2016 1 / 36 Outline 1 Introduction to graph learning

More information

Large-Scale Feature Learning with Spike-and-Slab Sparse Coding

Large-Scale Feature Learning with Spike-and-Slab Sparse Coding Large-Scale Feature Learning with Spike-and-Slab Sparse Coding Ian J. Goodfellow, Aaron Courville, Yoshua Bengio ICML 2012 Presented by Xin Yuan January 17, 2013 1 Outline Contributions Spike-and-Slab

More information

Graphical Model Selection

Graphical Model Selection May 6, 2013 Trevor Hastie, Stanford Statistics 1 Graphical Model Selection Trevor Hastie Stanford University joint work with Jerome Friedman, Rob Tibshirani, Rahul Mazumder and Jason Lee May 6, 2013 Trevor

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 7, 04 Reading: See class website Eric Xing @ CMU, 005-04

More information

Structure estimation for Gaussian graphical models

Structure estimation for Gaussian graphical models Faculty of Science Structure estimation for Gaussian graphical models Steffen Lauritzen, University of Copenhagen Department of Mathematical Sciences Minikurs TUM 2016 Lecture 3 Slide 1/48 Overview of

More information

Big Data Analytics: Optimization and Randomization

Big Data Analytics: Optimization and Randomization Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.

More information

Learning Multiple Tasks with a Sparse Matrix-Normal Penalty

Learning Multiple Tasks with a Sparse Matrix-Normal Penalty Learning Multiple Tasks with a Sparse Matrix-Normal Penalty Yi Zhang and Jeff Schneider NIPS 2010 Presented by Esther Salazar Duke University March 25, 2011 E. Salazar (Reading group) March 25, 2011 1

More information

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Multivariate Gaussians Mark Schmidt University of British Columbia Winter 2019 Last Time: Multivariate Gaussian http://personal.kenyon.edu/hartlaub/mellonproject/bivariate2.html

More information

11 : Gaussian Graphic Models and Ising Models

11 : Gaussian Graphic Models and Ising Models 10-708: Probabilistic Graphical Models 10-708, Spring 2017 11 : Gaussian Graphic Models and Ising Models Lecturer: Bryon Aragam Scribes: Chao-Ming Yen 1 Introduction Different from previous maximum likelihood

More information

Sparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda

Sparse regression. Optimization-Based Data Analysis.   Carlos Fernandez-Granda Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic

More information

Technical report no. SOL Department of Management Science and Engineering, Stanford University, Stanford, California, September 15, 2013.

Technical report no. SOL Department of Management Science and Engineering, Stanford University, Stanford, California, September 15, 2013. PROXIMAL NEWTON-TYPE METHODS FOR MINIMIZING COMPOSITE FUNCTIONS JASON D. LEE, YUEKAI SUN, AND MICHAEL A. SAUNDERS Abstract. We generalize Newton-type methods for minimizing smooth functions to handle a

More information

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement

More information

EE 381V: Large Scale Optimization Fall Lecture 24 April 11

EE 381V: Large Scale Optimization Fall Lecture 24 April 11 EE 381V: Large Scale Optimization Fall 2012 Lecture 24 April 11 Lecturer: Caramanis & Sanghavi Scribe: Tao Huang 24.1 Review In past classes, we studied the problem of sparsity. Sparsity problem is that

More information

PU Learning for Matrix Completion

PU Learning for Matrix Completion Cho-Jui Hsieh Dept of Computer Science UT Austin ICML 2015 Joint work with N. Natarajan and I. S. Dhillon Matrix Completion Example: movie recommendation Given a set Ω and the values M Ω, how to predict

More information

Lecture 9: September 28

Lecture 9: September 28 0-725/36-725: Convex Optimization Fall 206 Lecturer: Ryan Tibshirani Lecture 9: September 28 Scribes: Yiming Wu, Ye Yuan, Zhihao Li Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These

More information

ECS171: Machine Learning

ECS171: Machine Learning ECS171: Machine Learning Lecture 3: Linear Models I (LFD 3.2, 3.3) Cho-Jui Hsieh UC Davis Jan 17, 2018 Linear Regression (LFD 3.2) Regression Classification: Customer record Yes/No Regression: predicting

More information

Large-scale Collaborative Ranking in Near-Linear Time

Large-scale Collaborative Ranking in Near-Linear Time Large-scale Collaborative Ranking in Near-Linear Time Liwei Wu Depts of Statistics and Computer Science UC Davis KDD 17, Halifax, Canada August 13-17, 2017 Joint work with Cho-Jui Hsieh and James Sharpnack

More information

Sparse representation classification and positive L1 minimization

Sparse representation classification and positive L1 minimization Sparse representation classification and positive L1 minimization Cencheng Shen Joint Work with Li Chen, Carey E. Priebe Applied Mathematics and Statistics Johns Hopkins University, August 5, 2014 Cencheng

More information

Alternating Direction Method of Multipliers. Ryan Tibshirani Convex Optimization

Alternating Direction Method of Multipliers. Ryan Tibshirani Convex Optimization Alternating Direction Method of Multipliers Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last time: dual ascent min x f(x) subject to Ax = b where f is strictly convex and closed. Denote

More information

Permutation-invariant regularization of large covariance matrices. Liza Levina

Permutation-invariant regularization of large covariance matrices. Liza Levina Liza Levina Permutation-invariant covariance regularization 1/42 Permutation-invariant regularization of large covariance matrices Liza Levina Department of Statistics University of Michigan Joint work

More information

Bregman Divergences for Data Mining Meta-Algorithms

Bregman Divergences for Data Mining Meta-Algorithms p.1/?? Bregman Divergences for Data Mining Meta-Algorithms Joydeep Ghosh University of Texas at Austin ghosh@ece.utexas.edu Reflects joint work with Arindam Banerjee, Srujana Merugu, Inderjit Dhillon,

More information

Topology identification of undirected consensus networks via sparse inverse covariance estimation

Topology identification of undirected consensus networks via sparse inverse covariance estimation 2016 IEEE 55th Conference on Decision and Control (CDC ARIA Resort & Casino December 12-14, 2016, Las Vegas, USA Topology identification of undirected consensus networks via sparse inverse covariance estimation

More information

Lecture 2 Part 1 Optimization

Lecture 2 Part 1 Optimization Lecture 2 Part 1 Optimization (January 16, 2015) Mu Zhu University of Waterloo Need for Optimization E(y x), P(y x) want to go after them first, model some examples last week then, estimate didn t discuss

More information

Newton s Method. Ryan Tibshirani Convex Optimization /36-725

Newton s Method. Ryan Tibshirani Convex Optimization /36-725 Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

More information

Computational statistics

Computational statistics Computational statistics Lecture 3: Neural networks Thierry Denœux 5 March, 2016 Neural networks A class of learning methods that was developed separately in different fields statistics and artificial

More information

Conditional Gradient (Frank-Wolfe) Method

Conditional Gradient (Frank-Wolfe) Method Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

Newton s Method. Javier Peña Convex Optimization /36-725

Newton s Method. Javier Peña Convex Optimization /36-725 Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and

More information

Optimization methods

Optimization methods Optimization methods Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda /8/016 Introduction Aim: Overview of optimization methods that Tend to

More information

Proximity-Based Anomaly Detection using Sparse Structure Learning

Proximity-Based Anomaly Detection using Sparse Structure Learning Proximity-Based Anomaly Detection using Sparse Structure Learning Tsuyoshi Idé (IBM Tokyo Research Lab) Aurelie C. Lozano, Naoki Abe, and Yan Liu (IBM T. J. Watson Research Center) 2009/04/ SDM 2009 /

More information

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features

More information

Learning the Network Structure of Heterogenerous Data via Pairwise Exponential MRF

Learning the Network Structure of Heterogenerous Data via Pairwise Exponential MRF Learning the Network Structure of Heterogenerous Data via Pairwise Exponential MRF Jong Ho Kim, Youngsuk Park December 17, 2016 1 Introduction Markov random fields (MRFs) are a fundamental fool on data

More information

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features

More information

Robust estimation of scale and covariance with P n and its application to precision matrix estimation

Robust estimation of scale and covariance with P n and its application to precision matrix estimation Robust estimation of scale and covariance with P n and its application to precision matrix estimation Garth Tarr, Samuel Müller and Neville Weber USYD 2013 School of Mathematics and Statistics THE UNIVERSITY

More information

RaRE: Social Rank Regulated Large-scale Network Embedding

RaRE: Social Rank Regulated Large-scale Network Embedding RaRE: Social Rank Regulated Large-scale Network Embedding Authors: Yupeng Gu 1, Yizhou Sun 1, Yanen Li 2, Yang Yang 3 04/26/2018 The Web Conference, 2018 1 University of California, Los Angeles 2 Snapchat

More information

Clustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26

Clustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26 Clustering Professor Ameet Talwalkar Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, 217 1 / 26 Outline 1 Administration 2 Review of last lecture 3 Clustering Professor Ameet Talwalkar

More information

Online Dictionary Learning with Group Structure Inducing Norms

Online Dictionary Learning with Group Structure Inducing Norms Online Dictionary Learning with Group Structure Inducing Norms Zoltán Szabó 1, Barnabás Póczos 2, András Lőrincz 1 1 Eötvös Loránd University, Budapest, Hungary 2 Carnegie Mellon University, Pittsburgh,

More information

Optimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison

Optimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison Optimization Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison optimization () cost constraints might be too much to cover in 3 hours optimization (for big

More information

IBM Research Report. SINCO - A Greedy Coordinate Ascent Method for Sparse Inverse Covariance Selection Problem. Katya Scheinberg Columbia University

IBM Research Report. SINCO - A Greedy Coordinate Ascent Method for Sparse Inverse Covariance Selection Problem. Katya Scheinberg Columbia University RC24837 (W98-46) August 4, 29 Mathematics IBM Research Report - A Greedy Coordinate Ascent Method for Sparse Inverse Covariance Selection Problem Katya Scheinberg Columbia University Irina Rish IBM Research

More information

Fast Regularization Paths via Coordinate Descent

Fast Regularization Paths via Coordinate Descent August 2008 Trevor Hastie, Stanford Statistics 1 Fast Regularization Paths via Coordinate Descent Trevor Hastie Stanford University joint work with Jerry Friedman and Rob Tibshirani. August 2008 Trevor

More information

Computational and Statistical Aspects of Statistical Machine Learning. John Lafferty Department of Statistics Retreat Gleacher Center

Computational and Statistical Aspects of Statistical Machine Learning. John Lafferty Department of Statistics Retreat Gleacher Center Computational and Statistical Aspects of Statistical Machine Learning John Lafferty Department of Statistics Retreat Gleacher Center Outline Modern nonparametric inference for high dimensional data Nonparametric

More information

Parallel and Distributed Stochastic Learning -Towards Scalable Learning for Big Data Intelligence

Parallel and Distributed Stochastic Learning -Towards Scalable Learning for Big Data Intelligence Parallel and Distributed Stochastic Learning -Towards Scalable Learning for Big Data Intelligence oé LAMDA Group H ŒÆOŽÅ Æ EâX ^ #EâI[ : liwujun@nju.edu.cn Dec 10, 2016 Wu-Jun Li (http://cs.nju.edu.cn/lwj)

More information

Bias-free Sparse Regression with Guaranteed Consistency

Bias-free Sparse Regression with Guaranteed Consistency Bias-free Sparse Regression with Guaranteed Consistency Wotao Yin (UCLA Math) joint with: Stanley Osher, Ming Yan (UCLA) Feng Ruan, Jiechao Xiong, Yuan Yao (Peking U) UC Riverside, STATS Department March

More information

Distributed Box-Constrained Quadratic Optimization for Dual Linear SVM

Distributed Box-Constrained Quadratic Optimization for Dual Linear SVM Distributed Box-Constrained Quadratic Optimization for Dual Linear SVM Lee, Ching-pei University of Illinois at Urbana-Champaign Joint work with Dan Roth ICML 2015 Outline Introduction Algorithm Experiments

More information

High-dimensional Covariance Estimation Based On Gaussian Graphical Models

High-dimensional Covariance Estimation Based On Gaussian Graphical Models High-dimensional Covariance Estimation Based On Gaussian Graphical Models Shuheng Zhou, Philipp Rutimann, Min Xu and Peter Buhlmann February 3, 2012 Problem definition Want to estimate the covariance matrix

More information

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression Noah Simon Jerome Friedman Trevor Hastie November 5, 013 Abstract In this paper we purpose a blockwise descent

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Sparse Linear Programming via Primal and Dual Augmented Coordinate Descent

Sparse Linear Programming via Primal and Dual Augmented Coordinate Descent Sparse Linear Programg via Primal and Dual Augmented Coordinate Descent Presenter: Joint work with Kai Zhong, Cho-Jui Hsieh, Pradeep Ravikumar and Inderjit Dhillon. Sparse Linear Program Given vectors

More information

Nearest Neighbor based Coordinate Descent

Nearest Neighbor based Coordinate Descent Nearest Neighbor based Coordinate Descent Pradeep Ravikumar, UT Austin Joint with Inderjit Dhillon, Ambuj Tewari Simons Workshop on Information Theory, Learning and Big Data, 2015 Modern Big Data Across

More information

Behavioral Data Mining. Lecture 7 Linear and Logistic Regression

Behavioral Data Mining. Lecture 7 Linear and Logistic Regression Behavioral Data Mining Lecture 7 Linear and Logistic Regression Outline Linear Regression Regularization Logistic Regression Stochastic Gradient Fast Stochastic Methods Performance tips Linear Regression

More information

The picasso Package for Nonconvex Regularized M-estimation in High Dimensions in R

The picasso Package for Nonconvex Regularized M-estimation in High Dimensions in R The picasso Package for Nonconvex Regularized M-estimation in High Dimensions in R Xingguo Li Tuo Zhao Tong Zhang Han Liu Abstract We describe an R package named picasso, which implements a unified framework

More information

Stochastic Proximal Gradient Algorithm

Stochastic Proximal Gradient Algorithm Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind

More information

Robust and sparse Gaussian graphical modelling under cell-wise contamination

Robust and sparse Gaussian graphical modelling under cell-wise contamination Robust and sparse Gaussian graphical modelling under cell-wise contamination Shota Katayama 1, Hironori Fujisawa 2 and Mathias Drton 3 1 Tokyo Institute of Technology, Japan 2 The Institute of Statistical

More information

High-dimensional covariance estimation based on Gaussian graphical models

High-dimensional covariance estimation based on Gaussian graphical models High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,

More information

LASSO Review, Fused LASSO, Parallel LASSO Solvers

LASSO Review, Fused LASSO, Parallel LASSO Solvers Case Study 3: fmri Prediction LASSO Review, Fused LASSO, Parallel LASSO Solvers Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade May 3, 2016 Sham Kakade 2016 1 Variable

More information

Big Data Optimization in Machine Learning

Big Data Optimization in Machine Learning Lehigh University Lehigh Preserve Theses and Dissertations 2016 Big Data Optimization in Machine Learning Xiaocheng Tang Lehigh University Follow this and additional works at: http://preserve.lehigh.edu/etd

More information