Probabilistic Graphical Models
|
|
- Evelyn Greene
- 5 years ago
- Views:
Transcription
1 School of Computer Science Probabilistic Graphical Models Distributed ADMM for Gaussian Graphical Models Yaoliang Yu Lecture 29, April 29, 2015 Eric CMU,
2 Networks / Graphs Eric CMU,
3 Where do graphs come from? Prior knowledge Mom told me A is connected to B Estimate from data! We have seen this in previous classes Will see two more today Sometimes may also be interested in edge weights An easier problem Real networks are BIG Require distributed optimization Eric CMU,
4 Structural Learning for completely observed Data MRF (Recall) 1) ( x,, x ( ( 1) 1 n ( 2) ( 2) 1,, n ( x x M ) ( x,, x ) ) ( ( M ) 1 n ) Eric CMU,
5 Gaussian Graphical Models Multivariate Gaussian density: WOLG: let We can view this as a continuous Markov Random Field with potentials defined on every node and edge: { } ) - ( ) - ( - exp ) ( ), ( / / µ µ π µ x x x Σ Σ = Σ T n p ( ) = = < j i j i ij i i ii n p x x q x q Q Q x x x p / 2 1/ exp ) (2 ) 0,,,, ( π µ! 5 Eric CMU,
6 The covariance and the precision matrices Covariance matrix Graphical model interpretation? Precision matrix Graphical model interpretation? Eric CMU,
7 Sparse precision vs. sparse covariance in GGM Σ = Σ = X 1 Σ 15 = 0 X1 X 5 X nbrs (1) or nbrs(5) 1 X 5 Σ15 = 0 Eric CMU,
8 Another example How to estimate this MRF? What if p >> n MLE does not exist in general! What about only learning a sparse graphical model? This is possible when s=o(n) Very often it is the structure of the GM that is more interesting Eric CMU,
9 Recall lasso Eric CMU,
10 Graph Regression (Meinshausen & Buhlmann 06) Neighborhood selection Lasso: Eric CMU,
11 Graph Regression Eric CMU,
12 Graph Regression Pros: Computationally convenient Strong theoretical guarantee ( p <= pol(n) ) Cons: Asymmetry Not minimax optimal Eric CMU,
13 The regularized MLE (Yuan & Lin 07) min Q log det Q +tr(qs)+ kqk 1 S: sample covariance matrix, may be singular Q 1 : may exclude the diagonal log det Q: implicitly force Q to be PSD symmetric Pros Cons Single step for estimating graph and inverse covariance MLE! Computationally challenging, partly solved by Glasso (Banergee et al 08, Friedman et al 08) Eric CMU,
14 Many many follow-ups Eric CMU,
15 A closer look of RMLE min Q log det Q +tr(qs)+ kqk 1 Set derivative to 0: Q 1 + S + sign(q) =0 kq 1 Sk 1 apple Can we (?!): min Q kqk 1 s.t. kq 1 Sk 1 apple Eric CMU,
16 CLIME (Cai et al. 11) Further relaxation min Q kqk 1 s.t. ksq Ik 1 apple Constraint controls Q S 1 Objective controls sparsity in Q Q is not required to be PSD or symmetric Separable! LP!!! Both objective and constraint are element-wise separable Can be reformulated as LP Strong theoretical guarantee Variations are minimax-optimal (Cai et al. 12, Liu & Wang 12) Eric CMU,
17 But for BIG problems min Q kqk 1 s.t. ksq Ik 1 apple Standard solvers for LP can be slow Embarrassingly parallel: Solve each column of Q independently in each core/machine min q i kq i k 1 s.t. ksq i e i k 1 apple Thanks for not having PSD constraint on Q Still troublesome if S is big Need to consider first-order methods Eric CMU,
18 A gentle introduction to alternating direction method of multipliers (ADMM) Eric CMU,
19 Optimization with coupling variables J uncoupled L coupled Canonical form: min w,z f(w)+g(z), s.t. Aw + Bz = c, where w 2 R m,z 2 R p,a: R m! R q,b : R p! R q,c2 R q Numerically challenging because Function f or g nonsmooth or constrained (i.e., can take value ) Linear constraint couples the variables w and z Large scale, interior point methods NA Naively alternating x and z does not work Min w 2 s.t. w + z = 1; optimum clearly is w = 0 Start with say w = 1 à z = 0 à w = 1 à z = 0 However, without coupling, can solve separately w and z Idea: try to decouple vars in the constraint! 1 Eric CMU,
20 Example: Empirical Risk Minimization (ERM) Each i corresponds to a training point (x i, y i ) Loss f i measures the fitness of the model parameter w least squares: support vector machines: boosting: logistic regression: g is the regularization function, e.g. or Vars coupled in obj, but not in constraint (none) min w n g(w)+ X f i (w) f i (w) =(y i w > x i ) 2 L coupled i=1 f i (w) =(1 y i w > x i ) + f i (w) =exp( y i w > x i ) f i (w) = log(1 + exp( y i w > x i )) Reformulate: transfer coupling from obj to constraint Arrive at canonical form, allow unified treatment later nkwk 2 2 nkwk 1 Eric CMU,
21 Why canonical form? ERM: min w g(w)+ n X i=1 f i (w) Canonical form: min w,z f(w)+g(z), s.t. Aw + Bz = c, where w 2 R m,z 2 R p,a: R m! R q,b : R p! R q,c2 R q ADMM algorithm (to be introduced shortly) excels at solving the canonical form Canonical form is a general template for constrained problems ERM (and many other problems) can be converted to canonical form through variable duplication (see next slide) Eric CMU,
22 How to: variable duplication Duplicate variables to achieve canonical form All w i must (eventually) agree Downside: many extra variables, increase problem size min w n g(w)+ X f i (w) Implicitly maintain duplicated variables i=1 v =[w 1,...,w n ] > min g(z)+ X f i(w i ), s.t. w i = z,8i v,z i {z } {z } v [I,...,I] f(v) > z=0 Global consensus constraint: 8i, w i = z Eric CMU,
23 Augmented Lagrangian Canonical form: min w,z f(w)+g(z), s.t. Aw + Bz = c, where w 2 R m, z 2 R p,a: R m! R q,b : R p! R q, c 2 R q Intro Lagrangian multiplier to decouple variables min w,z > (Aw + Bz {z c)+ µ 2 kaw + Bz ck2 2 } L µ (w,z; ) L µ : augmented Lagrangian More complicated min-max problem, but no coupling constraints Eric CMU,
24 Why Augmented Lagrangian? Quadratic term gives numerical stability min w,z > (Aw + Bz {z c)+ µ 2 kaw + Bz ck2 2 } L µ (w,z; ) May lead to strong convexity in w or z Faster convergence when strongly convex Allows larger step size (due to higher stability) Prevents subproblems diverging to infinity (again, stability) But sometimes better to work with normal Lagrangian Eric CMU,
25 ADMM Algorithm Fix dual, block coordinate descent on primal w, z min w,z > (Aw + Bz {z c)+ µ 2 kaw + Bz ck2 2 } L µ (w,z; ) arg min L µ(w, z t ; w z t+1 arg min L µ (w t+1, z; z Fix primal w, z, gradient ascent on dual w t+1 t+1 t + (Aw t+1 + Bz t+1 c) Step size can be large, e.g. = µ Usually rescale to remove / t ) f(w)+ µ 2 kaw + Bzt c + t /µk 2 t ) g(z)+ µ 2 kawt+1 + Bz c + t /µk 2 Eric CMU,
26 ERM revisited Reformulate by duplicating variables min v,z g(z)+ X f i(w i ), s.t. w i = z, 8i i {z } {z } v [I,...,I] f(v) > z=0 ADMM x-step: w t+1 arg min w L µ(w, z t ; t ) f(w)+ µ 2 kaw + Bzt c + t /µk 2 = X i f i (w i )+ µ 2 kw i z t + t ik 2 Thanks to duplicating Completely decoupled Parallelizable Closed-form if f i is simple Eric CMU,
27 ADMM: History and Related Augmented Lagrangian Method (ALM): solve w, z jointly even though coupled (Bertsekas'82) and refs therein Alternating Direction of Multiplier Method (ADMM): alternate w and z as previous slide (Boyd et al.'10) and refs therein Operator splitting for PDEs: Douglas, Peaceman, and Rachford (50s-70s) Glowinsky et al.'80s, Gabay'83; Spingarn'85 (Eckstein & Bertsekas'92; He et al.'02) in variational inequality Lots of recent work. Eric CMU,
28 ADMM: Linearization x t+1 Demanding step in each iteration of ADMM (similar for z): Diagonal A: reduce to proximal map (more later) of f f(x) =kxk 1, soft-shrinkage: sign(x) ( x µ) + Non-diagonal A: no closed-form, messy inner loop Instead, reduce to diagonal A by arg min L µ (x, z t ; y t ) = f(x)+g(z t )+y t > (Ax + Bz t c)+ µ 2 kax + Bz t ck 2 2 x A single gradient step: x t+1 x t )+A > y t + µa > (Ax t + Bz t c) x t Or, linearize the quadratic at : x t+1 arg min f(x)+y t > Ax +(x x t ) > µa > (Ax t + Bz t c)+ µ x 2 kx x tk 2 2 {z } Intuition: x re-computed in the next iteration anyways No need for perfect x f(x)+ µ 2 kx x t+a > (Ax t +Bz t c+y t /µ)k 2 2 Eric CMU,
29 Convergence Guarantees: Fixedpoint theory Recall some definitions Lagrangian: ADMM = Douglas-Rachford splitting Fixed-point iteration! proximal map P µ f (w) := arg min z reflection map R µ f (w) :=2Pµ f (w) convergence follows, e.g. (Bauschke & Combettes'13) explains why dual y, not primal x or z, always converges 1 2µ kz wk2 2 + f(z) well-defined for convex f, non-expansive: kt (x) T (y)k 2 applekx yk 2 proximal map generalizes the Euclidean projection L 0 (x, z; y) =min f(x)+y > Ax +min g(z)+y > (Bz c) x z {z } {z } d 1 (y) d 2 (y) w 1 2 (w + Rµ d 2 (R µ d 1 (w))); y P µ d 2 (w) w Eric CMU,
30 ADMM for CLIME Eric CMU,
31 Apply ADMM to CLIME min Q kqk 1 s.t. ksq Ek 1 apple Solve a block of columns of Q in each core/machine E is the corresponding block in I Step 1: reduce to ADMM canonical form Use variable duplicating min Q,Z kqk 1 s.t. kz Ek 1 apple, Z = SQ min Q,Z kqk 1 +[kz Ek 1 apple ] s.t. Z = SQ Eric CMU,
32 Apply ADMM to CLIME (cont ) Step 2: Write out augmented Lagrangian L(Q, Z; Y )=kqk 1 +[kz Ek 1 apple ]+ tr[(sq Z)Y ]+ 2 ksq Zk2 F Step 3: Perform primal-dual updates Q Z arg min Q kqk ksq Z + Y k2 F arg min Z [kz Ek 1 apple ]+ 2 ksq Z + Y k2 F = arg min kz Ek 1 apple Y Y + SQ Z 2 ksq Z + Y k2 F Eric CMU,
33 Apply ADMM to CLIME (cont ) Step 4: Solve the subproblems Lagrangian dual Y: trivial Primal Z: projection to l_inf ball, separable, easy Primal Q: easy if S is orthogonal, in general a lasso problem Bypass double loop by linearization Intuition: wasteful to solve Q to death min Q kqk 1 + tr(q > S(Y + SQ t Z)) + 2 kq Q tk 2 F Soft-thresholding Putting things together Q Z arg min Q kqk ksq Z + Y k2 F arg min Z [kz Ek 1 apple ]+ 2 ksq Z + Y k2 F = arg min kz Ek 1 apple Y Y + SQ Z 2 ksq Z + Y k2 F Eric CMU,
34 Exploring structure Expensive step in ADMM-CLIME: Matrix-matrix multiplication: SQ and alike If p >> n, S is size p x p but of rank at most n Write S = AA, and do A(A Q) A =[X 1,...,X n ] 2 R p n Matrix * matrix >> for loop of matrix * vector Preferable to solve a balanced block of columns Eric CMU,
35 Parallelization (Wang et al 13) Embarrassingly parallel If A fits into memory Chop A into small blocks and distribute Communication may be high Eric CMU,
36 Numerical results (Wang et al 13) Eric CMU,
37 Numerical results cont (Wang et al 13) Eric CMU,
38 Nonparanormal extensions Eric CMU,
39 Nonparanormal (Liu et al. 09) Z i = f i (X i ), i =1,...,p (Z 1,...,Z p ) N(0, ) f i: unknown monotonic functions Observe X, but not Z Independence preserved under transformations X i? X j X rest () Z i? Z j Z rest () Q ij =0 Can estimate f i first, then apply glasso on f i (X i ) Estimating functions can be slow, nonparametric rate Eric CMU,
40 A neat observation Since f i is monotonic Z i,: comonotone / concordant with X i,: Use rank estimator! Z i = f i (X i ), i =1,...,p (Z 1,...,Z p ) N(0, ) Z i,1,z i,2,...,z i,n R i,1,r i,2,...,r i,n Want, but do not observe Z Maybe??? ij = 1 n X i,1,x i,2,...,x i,n nx k=1 Z i,k Z j,k? = 1 n nx k=1 R i,k R j,k Eric CMU,
41 Kendall s tau Assuming no ties: Key: ij = 1 n(n 1) k,` Complexity of computing Kendall s tau? Genuine, asymptotically unbiased estimate of t 7! 2sin( 6 t) X sign[(r i,k R i,`)(r j,k R j,`)] ij =2sin( 6 E( ij)) is a contraction, preserving concentration After having, use whatever glasso, e.g., CLIME Can also use other rank estimator, e.g., Spearman s rho Eric CMU,
42 Summary Gaussian graphical model selection Neighborhood selection, Regularized MLE, CLIME Implicit PSD vs. Explicit PSD Distributed ADMM Generic procedure Can distribute matrix product Nonparanormal Gaussian graphical model Rank statistics Plug-in estimator Eric CMU,
43 References Graph Lasso High-Dimensional Graphs and Variable Selection with the LASSO, Meinshausen and Buhlmann, Annals of Statistics, vol. 34, 2006 Model Selection and Estimation in the Gaussian Graphical Model, Yuan and Lin, Biometrika, vol. 94, 2007 A Constrained l_1 minimization Approach for Sparse Precision Matrix Estimation, Cai et al., Journal of the American Statistical Association, 2011 Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data, Banerjee et al, Journal of Machine Learning Research, 2008 Sparse Inverse Covariance Estimation with the Graphical Lasso, Friedman et al., Biostatistics, 2008 Nonparanormal The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs, Han Liu et al., Journal of Machine Learning Research, 2009 Regularized Rank-based Estimation of High-Dimensional Nonparanormal Graphical Models, Lingzhou Xue and Hui Zou, Annals of Statistics, vol. 40, 2012 High-Dimensional Semiparametric Gaussian Copula Graphical Models, Han Liu et al., Annals of Statistics, vol. 40, 2012 ADMM Large Scale Distributed Sparse Precision Estimation, Wang et al., NIPS 2013 Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, Boyd et al., Foundation and Trends in Machine Learning, 2011 Eric CMU,
An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss
An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss arxiv:1811.04545v1 [stat.co] 12 Nov 2018 Cheng Wang School of Mathematical Sciences, Shanghai Jiao
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 7, 04 Reading: See class website Eric Xing @ CMU, 005-04
More informationDistributed Optimization via Alternating Direction Method of Multipliers
Distributed Optimization via Alternating Direction Method of Multipliers Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato Stanford University ITMANET, Stanford, January 2011 Outline precursors dual decomposition
More informationDivide-and-combine Strategies in Statistical Modeling for Massive Data
Divide-and-combine Strategies in Statistical Modeling for Massive Data Liqun Yu Washington University in St. Louis March 30, 2017 Liqun Yu (WUSTL) D&C Statistical Modeling for Massive Data March 30, 2017
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 5, 06 Reading: See class website Eric Xing @ CMU, 005-06
More informationSparse Graph Learning via Markov Random Fields
Sparse Graph Learning via Markov Random Fields Xin Sui, Shao Tang Sep 23, 2016 Xin Sui, Shao Tang Sparse Graph Learning via Markov Random Fields Sep 23, 2016 1 / 36 Outline 1 Introduction to graph learning
More informationShiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers
Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 9 Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 2 Separable convex optimization a special case is min f(x)
More informationFrist order optimization methods for sparse inverse covariance selection
Frist order optimization methods for sparse inverse covariance selection Katya Scheinberg Lehigh University ISE Department (joint work with D. Goldfarb, Sh. Ma, I. Rish) Introduction l l l l l l The field
More informationProperties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation
Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Adam J. Rothman School of Statistics University of Minnesota October 8, 2014, joint work with Liliana
More informationCausal Inference: Discussion
Causal Inference: Discussion Mladen Kolar The University of Chicago Booth School of Business Sept 23, 2016 Types of machine learning problems Based on the information available: Supervised learning Reinforcement
More informationRecent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables
Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables Department of Systems Engineering and Engineering Management The Chinese University of Hong Kong 2014 Workshop
More informationMultivariate Normal Models
Case Study 3: fmri Prediction Coping with Large Covariances: Latent Factor Models, Graphical Models, Graphical LASSO Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox February
More informationSparse Gaussian conditional random fields
Sparse Gaussian conditional random fields Matt Wytock, J. ico Kolter School of Computer Science Carnegie Mellon University Pittsburgh, PA 53 {mwytock, zkolter}@cs.cmu.edu Abstract We propose sparse Gaussian
More informationThe Nonparanormal skeptic
The Nonpara skeptic Han Liu Johns Hopkins University, 615 N. Wolfe Street, Baltimore, MD 21205 USA Fang Han Johns Hopkins University, 615 N. Wolfe Street, Baltimore, MD 21205 USA Ming Yuan Georgia Institute
More informationLinearized Alternating Direction Method: Two Blocks and Multiple Blocks. Zhouchen Lin 林宙辰北京大学
Linearized Alternating Direction Method: Two Blocks and Multiple Blocks Zhouchen Lin 林宙辰北京大学 Dec. 3, 014 Outline Alternating Direction Method (ADM) Linearized Alternating Direction Method (LADM) Two Blocks
More informationCoordinate descent. Geoff Gordon & Ryan Tibshirani Optimization /
Coordinate descent Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Adding to the toolbox, with stats and ML in mind We ve seen several general and useful minimization tools First-order methods
More informationHigh-dimensional graphical model selection: Practical and information-theoretic limits
1 High-dimensional graphical model selection: Practical and information-theoretic limits Martin Wainwright Departments of Statistics, and EECS UC Berkeley, California, USA Based on joint work with: John
More informationSparse inverse covariance estimation with the lasso
Sparse inverse covariance estimation with the lasso Jerome Friedman Trevor Hastie and Robert Tibshirani November 8, 2007 Abstract We consider the problem of estimating sparse graphs by a lasso penalty
More informationLecture 25: November 27
10-725: Optimization Fall 2012 Lecture 25: November 27 Lecturer: Ryan Tibshirani Scribes: Matt Wytock, Supreeth Achar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have
More informationThe Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017
The Kernel Trick, Gram Matrices, and Feature Extraction CS6787 Lecture 4 Fall 2017 Momentum for Principle Component Analysis CS6787 Lecture 3.1 Fall 2017 Principle Component Analysis Setting: find the
More informationUVA CS 4501: Machine Learning
UVA CS 4501: Machine Learning Lecture 16 Extra: Support Vector Machine Optimization with Dual Dr. Yanjun Qi University of Virginia Department of Computer Science Today Extra q Optimization of SVM ü SVM
More informationBAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage
BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement
More informationMultivariate Normal Models
Case Study 3: fmri Prediction Graphical LASSO Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Emily Fox February 26 th, 2013 Emily Fox 2013 1 Multivariate Normal Models
More informationLogistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu
Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data
More informationARock: an algorithmic framework for asynchronous parallel coordinate updates
ARock: an algorithmic framework for asynchronous parallel coordinate updates Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin ( UCLA Math, U.Waterloo DCO) UCLA CAM Report 15-37 ShanghaiTech SSDS 15 June 25,
More informationDistributed Optimization and Statistics via Alternating Direction Method of Multipliers
Distributed Optimization and Statistics via Alternating Direction Method of Multipliers Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato Stanford University Stanford Statistics Seminar, September 2010
More informationMath 273a: Optimization Overview of First-Order Optimization Algorithms
Math 273a: Optimization Overview of First-Order Optimization Algorithms Wotao Yin Department of Mathematics, UCLA online discussions on piazza.com 1 / 9 Typical flow of numerical optimization Optimization
More informationHigh-dimensional graphical model selection: Practical and information-theoretic limits
1 High-dimensional graphical model selection: Practical and information-theoretic limits Martin Wainwright Departments of Statistics, and EECS UC Berkeley, California, USA Based on joint work with: John
More informationCS598 Machine Learning in Computational Biology (Lecture 5: Matrix - part 2) Professor Jian Peng Teaching Assistant: Rongda Zhu
CS598 Machine Learning in Computational Biology (Lecture 5: Matrix - part 2) Professor Jian Peng Teaching Assistant: Rongda Zhu Feature engineering is hard 1. Extract informative features from domain knowledge
More informationMax Margin-Classifier
Max Margin-Classifier Oliver Schulte - CMPT 726 Bishop PRML Ch. 7 Outline Maximum Margin Criterion Math Maximizing the Margin Non-Separable Data Kernels and Non-linear Mappings Where does the maximization
More informationHomework 4. Convex Optimization /36-725
Homework 4 Convex Optimization 10-725/36-725 Due Friday November 4 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationContraction Methods for Convex Optimization and monotone variational inequalities No.12
XII - 1 Contraction Methods for Convex Optimization and monotone variational inequalities No.12 Linearized alternating direction methods of multipliers for separable convex programming Bingsheng He Department
More informationSupport Vector Machines and Kernel Methods
2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University
More informationACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING
ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING YANGYANG XU Abstract. Motivated by big data applications, first-order methods have been extremely
More informationEE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015
EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 13: Learning in Gaussian Graphical Models, Non-Gaussian Inference, Monte Carlo Methods Some figures
More informationApproximation. Inderjit S. Dhillon Dept of Computer Science UT Austin. SAMSI Massive Datasets Opening Workshop Raleigh, North Carolina.
Using Quadratic Approximation Inderjit S. Dhillon Dept of Computer Science UT Austin SAMSI Massive Datasets Opening Workshop Raleigh, North Carolina Sept 12, 2012 Joint work with C. Hsieh, M. Sustik and
More informationSplitting methods for decomposing separable convex programs
Splitting methods for decomposing separable convex programs Philippe Mahey LIMOS - ISIMA - Université Blaise Pascal PGMO, ENSTA 2013 October 4, 2013 1 / 30 Plan 1 Max Monotone Operators Proximal techniques
More informationIntroduction to Alternating Direction Method of Multipliers
Introduction to Alternating Direction Method of Multipliers Yale Chang Machine Learning Group Meeting September 29, 2016 Yale Chang (Machine Learning Group Meeting) Introduction to Alternating Direction
More informationProximal splitting methods on convex problems with a quadratic term: Relax!
Proximal splitting methods on convex problems with a quadratic term: Relax! The slides I presented with added comments Laurent Condat GIPSA-lab, Univ. Grenoble Alpes, France Workshop BASP Frontiers, Jan.
More information27: Case study with popular GM III. 1 Introduction: Gene association mapping for complex diseases 1
10-708: Probabilistic Graphical Models, Spring 2015 27: Case study with popular GM III Lecturer: Eric P. Xing Scribes: Hyun Ah Song & Elizabeth Silver 1 Introduction: Gene association mapping for complex
More informationIs the test error unbiased for these programs?
Is the test error unbiased for these programs? Xtrain avg N o Preprocessing by de meaning using whole TEST set 2017 Kevin Jamieson 1 Is the test error unbiased for this program? e Stott see non for f x
More informationAlternating Direction Method of Multipliers. Ryan Tibshirani Convex Optimization
Alternating Direction Method of Multipliers Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last time: dual ascent min x f(x) subject to Ax = b where f is strictly convex and closed. Denote
More informationProximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization
Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R
More informationBig Data Analytics: Optimization and Randomization
Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.
More informationDual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725
Dual methods and ADMM Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given f : R n R, the function is called its conjugate Recall conjugate functions f (y) = max x R n yt x f(x)
More informationSimple Techniques for Improving SGD. CS6787 Lecture 2 Fall 2017
Simple Techniques for Improving SGD CS6787 Lecture 2 Fall 2017 Step Sizes and Convergence Where we left off Stochastic gradient descent x t+1 = x t rf(x t ; yĩt ) Much faster per iteration than gradient
More informationThe Alternating Direction Method of Multipliers
The Alternating Direction Method of Multipliers With Adaptive Step Size Selection Peter Sutor, Jr. Project Advisor: Professor Tom Goldstein December 2, 2015 1 / 25 Background The Dual Problem Consider
More informationRobust Inverse Covariance Estimation under Noisy Measurements
.. Robust Inverse Covariance Estimation under Noisy Measurements Jun-Kun Wang, Shou-De Lin Intel-NTU, National Taiwan University ICML 2014 1 / 30 . Table of contents Introduction.1 Introduction.2 Related
More informationLearning discrete graphical models via generalized inverse covariance matrices
Learning discrete graphical models via generalized inverse covariance matrices Duzhe Wang, Yiming Lv, Yongjoon Kim, Young Lee Department of Statistics University of Wisconsin-Madison {dwang282, lv23, ykim676,
More informationRobust and sparse Gaussian graphical modelling under cell-wise contamination
Robust and sparse Gaussian graphical modelling under cell-wise contamination Shota Katayama 1, Hironori Fujisawa 2 and Mathias Drton 3 1 Tokyo Institute of Technology, Japan 2 The Institute of Statistical
More informationAccelerated primal-dual methods for linearly constrained convex problems
Accelerated primal-dual methods for linearly constrained convex problems Yangyang Xu SIAM Conference on Optimization May 24, 2017 1 / 23 Accelerated proximal gradient For convex composite problem: minimize
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course otes for EE7C (Spring 018): Conve Optimization and Approimation Instructor: Moritz Hardt Email: hardt+ee7c@berkeley.edu Graduate Instructor: Ma Simchowitz Email: msimchow+ee7c@berkeley.edu October
More informationPermutation-invariant regularization of large covariance matrices. Liza Levina
Liza Levina Permutation-invariant covariance regularization 1/42 Permutation-invariant regularization of large covariance matrices Liza Levina Department of Statistics University of Michigan Joint work
More information2 Regularized Image Reconstruction for Compressive Imaging and Beyond
EE 367 / CS 448I Computational Imaging and Display Notes: Compressive Imaging and Regularized Image Reconstruction (lecture ) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement
More informationContraction Methods for Convex Optimization and Monotone Variational Inequalities No.16
XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of
More informationDouglas-Rachford Splitting: Complexity Estimates and Accelerated Variants
53rd IEEE Conference on Decision and Control December 5-7, 204. Los Angeles, California, USA Douglas-Rachford Splitting: Complexity Estimates and Accelerated Variants Panagiotis Patrinos and Lorenzo Stella
More informationProximity-Based Anomaly Detection using Sparse Structure Learning
Proximity-Based Anomaly Detection using Sparse Structure Learning Tsuyoshi Idé (IBM Tokyo Research Lab) Aurelie C. Lozano, Naoki Abe, and Yan Liu (IBM T. J. Watson Research Center) 2009/04/ SDM 2009 /
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should
More informationPreconditioning via Diagonal Scaling
Preconditioning via Diagonal Scaling Reza Takapoui Hamid Javadi June 4, 2014 1 Introduction Interior point methods solve small to medium sized problems to high accuracy in a reasonable amount of time.
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationHomework 5. Convex Optimization /36-725
Homework 5 Convex Optimization 10-725/36-725 Due Tuesday November 22 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationExpectation Maximization Algorithm
Expectation Maximization Algorithm Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein, Luke Zettlemoyer and Dan Weld The Evils of Hard Assignments? Clusters
More informationLecture 5 : Projections
Lecture 5 : Projections EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Up until now, we have seen convergence rates of unconstrained gradient descent. Now, we consider a constrained minimization
More information13 : Variational Inference: Loopy Belief Propagation and Mean Field
10-708: Probabilistic Graphical Models 10-708, Spring 2012 13 : Variational Inference: Loopy Belief Propagation and Mean Field Lecturer: Eric P. Xing Scribes: Peter Schulam and William Wang 1 Introduction
More informationFirst-order methods for structured nonsmooth optimization
First-order methods for structured nonsmooth optimization Sangwoon Yun Department of Mathematics Education Sungkyunkwan University Oct 19, 2016 Center for Mathematical Analysis & Computation, Yonsei University
More information14 : Theory of Variational Inference: Inner and Outer Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2014 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Yu-Hsin Kuo, Amos Ng 1 Introduction Last lecture
More informationLearning Gaussian Graphical Models with Unknown Group Sparsity
Learning Gaussian Graphical Models with Unknown Group Sparsity Kevin Murphy Ben Marlin Depts. of Statistics & Computer Science Univ. British Columbia Canada Connections Graphical models Density estimation
More informationIntroduction to graphical models: Lecture III
Introduction to graphical models: Lecture III Martin Wainwright UC Berkeley Departments of Statistics, and EECS Martin Wainwright (UC Berkeley) Some introductory lectures January 2013 1 / 25 Introduction
More informationLecture 9: September 28
0-725/36-725: Convex Optimization Fall 206 Lecturer: Ryan Tibshirani Lecture 9: September 28 Scribes: Yiming Wu, Ye Yuan, Zhihao Li Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These
More informationThe classifier. Theorem. where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know
The Bayes classifier Theorem The classifier satisfies where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know Alternatively, since the maximum it is
More informationThe classifier. Linear discriminant analysis (LDA) Example. Challenges for LDA
The Bayes classifier Linear discriminant analysis (LDA) Theorem The classifier satisfies In linear discriminant analysis (LDA), we make the (strong) assumption that where the min is over all possible classifiers.
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 29, 2016 Outline Convex vs Nonconvex Functions Coordinate Descent Gradient Descent Newton s method Stochastic Gradient Descent Numerical Optimization
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 8: Optimization Cho-Jui Hsieh UC Davis May 9, 2017 Optimization Numerical Optimization Numerical Optimization: min X f (X ) Can be applied
More informationChapter 17: Undirected Graphical Models
Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)
More information10708 Graphical Models: Homework 2
10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves
More informationMachine Learning. Lecture 6: Support Vector Machine. Feng Li.
Machine Learning Lecture 6: Support Vector Machine Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Warm Up 2 / 80 Warm Up (Contd.)
More informationSelected Topics in Optimization. Some slides borrowed from
Selected Topics in Optimization Some slides borrowed from http://www.stat.cmu.edu/~ryantibs/convexopt/ Overview Optimization problems are almost everywhere in statistics and machine learning. Input Model
More informationStatistical Machine Learning for Structured and High Dimensional Data
Statistical Machine Learning for Structured and High Dimensional Data (FA9550-09- 1-0373) PI: Larry Wasserman (CMU) Co- PI: John Lafferty (UChicago and CMU) AFOSR Program Review (Jan 28-31, 2013, Washington,
More informationProximal Newton Method. Ryan Tibshirani Convex Optimization /36-725
Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h
More informationLASSO Review, Fused LASSO, Parallel LASSO Solvers
Case Study 3: fmri Prediction LASSO Review, Fused LASSO, Parallel LASSO Solvers Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade May 3, 2016 Sham Kakade 2016 1 Variable
More informationGaussian Graphical Models and Graphical Lasso
ELE 538B: Sparsity, Structure and Inference Gaussian Graphical Models and Graphical Lasso Yuxin Chen Princeton University, Spring 2017 Multivariate Gaussians Consider a random vector x N (0, Σ) with pdf
More informationMachine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression
Machine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression Due: Monday, February 13, 2017, at 10pm (Submit via Gradescope) Instructions: Your answers to the questions below,
More informationConvex Optimization Algorithms for Machine Learning in 10 Slides
Convex Optimization Algorithms for Machine Learning in 10 Slides Presenter: Jul. 15. 2015 Outline 1 Quadratic Problem Linear System 2 Smooth Problem Newton-CG 3 Composite Problem Proximal-Newton-CD 4 Non-smooth,
More informationAn interior-point stochastic approximation method and an L1-regularized delta rule
Photograph from National Geographic, Sept 2008 An interior-point stochastic approximation method and an L1-regularized delta rule Peter Carbonetto, Mark Schmidt and Nando de Freitas University of British
More informationIs the test error unbiased for these programs? 2017 Kevin Jamieson
Is the test error unbiased for these programs? 2017 Kevin Jamieson 1 Is the test error unbiased for this program? 2017 Kevin Jamieson 2 Simple Variable Selection LASSO: Sparse Regression Machine Learning
More informationUses of duality. Geoff Gordon & Ryan Tibshirani Optimization /
Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear
More informationNonlinear Optimization for Optimal Control
Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]
More informationGraphical Model Selection
May 6, 2013 Trevor Hastie, Stanford Statistics 1 Graphical Model Selection Trevor Hastie Stanford University joint work with Jerome Friedman, Rob Tibshirani, Rahul Mazumder and Jason Lee May 6, 2013 Trevor
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationFrank-Wolfe Method. Ryan Tibshirani Convex Optimization
Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)
More informationOn the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1,
Math 30 Winter 05 Solution to Homework 3. Recognizing the convexity of g(x) := x log x, from Jensen s inequality we get d(x) n x + + x n n log x + + x n n where the equality is attained only at x = (/n,...,
More informationThe Direct Extension of ADMM for Multi-block Convex Minimization Problems is Not Necessarily Convergent
The Direct Extension of ADMM for Multi-block Convex Minimization Problems is Not Necessarily Convergent Yinyu Ye K. T. Li Professor of Engineering Department of Management Science and Engineering Stanford
More informationSparse Optimization Lecture: Dual Methods, Part I
Sparse Optimization Lecture: Dual Methods, Part I Instructor: Wotao Yin July 2013 online discussions on piazza.com Those who complete this lecture will know dual (sub)gradient iteration augmented l 1 iteration
More information14 : Theory of Variational Inference: Inner and Outer Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Maria Ryskina, Yen-Chia Hsu 1 Introduction
More informationBig & Quic: Sparse Inverse Covariance Estimation for a Million Variables
for a Million Variables Cho-Jui Hsieh The University of Texas at Austin NIPS Lake Tahoe, Nevada Dec 8, 2013 Joint work with M. Sustik, I. Dhillon, P. Ravikumar and R. Poldrack FMRI Brain Analysis Goal:
More information11 : Gaussian Graphic Models and Ising Models
10-708: Probabilistic Graphical Models 10-708, Spring 2017 11 : Gaussian Graphic Models and Ising Models Lecturer: Bryon Aragam Scribes: Chao-Ming Yen 1 Introduction Different from previous maximum likelihood
More informationLecture 23: November 19
10-725/36-725: Conve Optimization Fall 2018 Lecturer: Ryan Tibshirani Lecture 23: November 19 Scribes: Charvi Rastogi, George Stoica, Shuo Li Charvi Rastogi: 23.1-23.4.2, George Stoica: 23.4.3-23.8, Shuo
More informationJoint Gaussian Graphical Model Review Series I
Joint Gaussian Graphical Model Review Series I Probability Foundations Beilun Wang Advisor: Yanjun Qi 1 Department of Computer Science, University of Virginia http://jointggm.org/ June 23rd, 2017 Beilun
More information