Generalized Power Method for Sparse Principal Component Analysis
Peter Richtárik, CORE/INMA, Catholic University of Louvain, Belgium
VOCAL 2008, Veszprém, Hungary
CORE Discussion Paper #2008/70
Joint work with M. Journée, Yu. Nesterov and R. Sepulchre

1. Outline

- Sparse PCA
- Optimization reformulations
- Algorithm and complexity analysis
- Numerical experiments

2. Sparse PCA (spca)

Input: matrix A = [a_1, ..., a_n] ∈ R^{p×n}, p ≤ n.

Goal: find a unit-norm vector z ∈ R^n which simultaneously
1. maximizes the variance z^T A^T A z,
2. is sparse.

If sparsity is not required, z is the dominant right singular vector of A:

    max_{z^T z ≤ 1} z^T A^T A z = λ_max(A^T A) = σ_max(A)^2.

Extracting more components: the discussion above is about the single-unit case (m = 1). Often more components (sparse dominant singular directions) are needed: the block case (m > 1).

Applications: gene expression, finance, data visualization, signal processing, vision, ...

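For concreteness, a minimal numpy sketch of the unconstrained case (an illustration, not from the slides): the dominant right singular vector attains the maximal variance σ_max(A)^2, but is dense in general, which is what motivates adding a sparsity-inducing penalty.

```python
import numpy as np

def dominant_right_singular_vector(A):
    """Unconstrained maximizer of z^T A^T A z over unit-norm z.

    This is the dominant right singular vector of A; the maximum value
    equals sigma_max(A)^2. In general it has no zero coordinates.
    """
    # numpy returns singular values in descending order
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 300))   # p = 100 samples, n = 300 variables
z = dominant_right_singular_vector(A)
print(np.linalg.norm(A @ z) ** 2)     # explained variance = sigma_max(A)^2
print(np.sum(np.abs(z) > 1e-6))       # typically all 300 coordinates are nonzero
```
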
3. Our approach to spca

1. Formulate spca as an optimization problem with a sparsity-inducing penalty (l_1 or l_0) controlled by a single parameter γ.
2. Reformulate to get a problem of a suitable form: suitable for analysis and suitable for computation.
3. Solve the reformulation using a simple gradient scheme.
4. Recover a solution of the original problem.

We illustrate steps 1, 2 and 4 on the single-unit l_1-penalized case, and then jump to the general analysis of step 3.

4. Three observations about the l_1 penalty

Notation: ‖z‖_1 = Σ_i |z_i|.

Penalty formulation of single-unit spca:

    φ_l1(γ) := max_{z^T z ≤ 1} √(z^T A^T A z) − γ ‖z‖_1.    (1)

Observations:
1. If γ = 0, there is no reason to expect zero coordinates in z*.
2. If γ ≥ max_i ‖a_i‖_2, then z* = 0. Indeed,

    max_{z ≠ 0} ‖Az‖_2 / ‖z‖_1 = max_{z ≠ 0} ‖Σ_i z_i a_i‖_2 / ‖z‖_1
                               ≤ max_{z ≠ 0} (Σ_i |z_i| ‖a_i‖_2) / (Σ_i |z_i|)
                               ≤ max_i ‖a_i‖_2.

3. In fact, γ ≥ ‖a_i‖_2 implies z*_i(γ) = 0, for each i.

5. Reformulation

Note that

    φ_l1(γ) = max_{z ∈ B^n} ‖Az‖_2 − γ ‖z‖_1
            = max_{z ∈ B^n} max_{x ∈ B^p} x^T A z − γ ‖z‖_1
            = max_{x ∈ B^p} max_{z ∈ B^n} Σ_{i=1}^n [ z_i (a_i^T x) − γ |z_i| ],

where B^n and B^p denote the unit Euclidean balls in R^n and R^p. For fixed x, the inner max-problem has the closed-form solution

    z'_i = sign(a_i^T x) [ |a_i^T x| − γ ]_+,    z* = z' / ‖z'‖_2.

Hence to solve (1), we only need to solve this reformulation:

    φ_l1(γ)^2 = max_{x ∈ R^p, x^T x = 1} Σ_{i=1}^n [ |a_i^T x| − γ ]_+^2.    (2)

Note: the objective function of (2) is convex and smooth, and the feasible region lives in R^p instead of R^n (p ≤ n).

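The inner maximizer is an explicit soft-thresholding step. Below is a sketch of the recovery map (step 4 of our approach); the function and variable names are illustrative.

```python
import numpy as np

def recover_z_l1(A, x, gamma):
    """Recover the sparse loading vector z from x in the l1 case.

    z_i is proportional to sign(a_i^T x) * [|a_i^T x| - gamma]_+, then
    normalized to unit norm. Returns the zero vector when gamma is large
    enough to kill every coordinate.
    """
    t = A.T @ x                                       # t_i = a_i^T x
    z = np.sign(t) * np.maximum(np.abs(t) - gamma, 0.0)
    nrm = np.linalg.norm(z)
    return z / nrm if nrm > 0 else z
```
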
6. Single-unit spca via l_0 penalty

Similar story as in the l_1 case, so only briefly.

Notation: ‖z‖_0 = Card{i : z_i ≠ 0}.

Penalty formulation:

    φ_l0(γ) := max_{z^T z ≤ 1} z^T A^T A z − γ ‖z‖_0.    (3)

To solve (3), first solve this reformulation:

    φ_l0(γ) = max_{x ∈ R^p, x^T x = 1} Σ_{i=1}^n [ (a_i^T x)^2 − γ ]_+,    (4)

and then set

    z'_i = [ sign((a_i^T x)^2 − γ) ]_+ a_i^T x,    z* = z' / ‖z'‖_2.

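The recovery for the l_0 case is the analogous hard-thresholding step; a sketch under the same conventions as above:

```python
import numpy as np

def recover_z_l0(A, x, gamma):
    """Recover z from x in the l0 case: keep a_i^T x wherever (a_i^T x)^2
    exceeds gamma (a hard threshold at sqrt(gamma)), then normalize."""
    t = A.T @ x
    z = np.where(t ** 2 > gamma, t, 0.0)
    nrm = np.linalg.norm(z)
    return z / nrm if nrm > 0 else z
```
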
7. Maximizing convex functions

Problems (2) and (4) (and their block generalizations) are of the form

    f* = max_{x ∈ Q} f(x),    (P)

where E is a finite-dimensional vector space, f : E → R is a convex function, and Q ⊂ E is compact. In particular:
- Q = the unit Euclidean sphere in R^p (single-unit case, m = 1);
- Q = the Stiefel manifold in R^{p×m}, i.e. the set of p × m matrices with orthonormal columns (block case, m > 1).

How do we solve (P)?

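The method on the next slide needs, at each iteration, only the maximizer of a linear function over Q. For both choices of Q this oracle is cheap and explicit; the sketch below records the two standard facts (an illustration, not part of the slides).

```python
import numpy as np

def lmo_sphere(G):
    """argmax of <G, y> over the unit Euclidean sphere: G normalized."""
    return G / np.linalg.norm(G)

def lmo_stiefel(G):
    """argmax of <G, Y> over p-by-m matrices with orthonormal columns:
    the polar factor U @ Vt of the thin SVD G = U S Vt."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt
```
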
8. Gradient algorithm

We solve (P) using this simple gradient method:
1. Input: initial iterate x_0 ∈ Q.
2. For k ≥ 0, repeat:

    x_{k+1} ∈ Arg max { f(x_k) + <f'(x_k), y − x_k> : y ∈ Q },    k ← k + 1.

This algorithm generalizes the power method for computing the largest eigenvalue of a symmetric positive definite matrix C: for f(x) = (1/2) x^T C x, the update is

    x_{k+1} = C x_k / ‖C x_k‖_2.

Hence the name: Generalized Power Method (GPower).

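Putting the pieces together for the single-unit l_1 reformulation (2): the gradient of f(x) = Σ_i [|a_i^T x| − γ]_+^2 is f'(x) = 2 Σ_i sign(a_i^T x) [|a_i^T x| − γ]_+ a_i, and the GPower step on the sphere is just its normalization. A minimal sketch follows; the initialization and stopping rule are illustrative choices, not prescribed by the slides.

```python
import numpy as np

def gpower_l1(A, gamma, max_iter=1000, tol=1e-8):
    """GPower for reformulation (2): maximize
    f(x) = sum_i [|a_i^T x| - gamma]_+^2 over the unit sphere in R^p."""
    # illustrative initialization: the largest column of A, normalized
    x = A[:, np.argmax(np.linalg.norm(A, axis=0))]
    x = x / np.linalg.norm(x)
    for _ in range(max_iter):
        t = A.T @ x
        w = np.sign(t) * np.maximum(np.abs(t) - gamma, 0.0)
        g = 2.0 * (A @ w)               # gradient f'(x)
        gn = np.linalg.norm(g)
        if gn == 0.0:                   # gamma >= max_i ||a_i||_2: z* = 0
            return x, np.zeros(A.shape[1])
        x_new = g / gn                  # argmax of the linearization over the sphere
        if np.linalg.norm(x_new - x) < tol:
            x = x_new
            break
        x = x_new
    # recover the sparse loading vector by soft thresholding (see slide 5)
    t = A.T @ x
    z = np.sign(t) * np.maximum(np.abs(t) - gamma, 0.0)
    nrm = np.linalg.norm(z)
    return x, (z / nrm if nrm > 0 else z)
```
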
9. Iteration complexity: basic result

At any point x ∈ Q we introduce a measure of the first-order optimality conditions:

    Δ(x) := max_{y ∈ Q} <f'(x), y − x>.

Clearly Δ(x) ≥ 0, and it vanishes only at points where the gradient f'(x) belongs to the normal cone to Conv(Q) at x. Denote

    Δ_k := min_{0 ≤ i ≤ k} Δ(x_i).

Theorem. Let the sequence {x_k} be generated by GPower applied to a convex function f. Then {f(x_k)} is monotonically increasing and lim_{k→∞} Δ(x_k) = 0. Moreover,

    Δ_k ≤ (f* − f(x_0)) / (k + 1).    (5)

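For Q the unit Euclidean sphere, the measure is explicit: max_{‖y‖_2 = 1} <g, y> = ‖g‖_2, so Δ(x) = ‖f'(x)‖_2 − <f'(x), x>, which vanishes exactly when x is a fixed point of the GPower update. A one-line check (illustrative):

```python
import numpy as np

def optimality_measure_sphere(grad, x):
    """Delta(x) = max over unit-norm y of <grad, y - x> = ||grad||_2 - <grad, x>."""
    return np.linalg.norm(grad) - grad @ x
```
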
10. Strong convexity of functions and sets

A function f is strongly convex if there exists a constant σ_f > 0 such that for any x, y ∈ E,

    f(y) ≥ f(x) + <f'(x), y − x> + (σ_f / 2) ‖y − x‖^2.

The set Conv(Q) is strongly convex if there exists a constant σ_Q > 0 such that for any x, y ∈ Conv(Q) and α ∈ [0, 1] the following inclusion holds:

    α x + (1 − α) y + (σ_Q / 2) α (1 − α) ‖x − y‖^2 S ⊆ Conv(Q),

where S is the unit ball.

Theorem. If f : E → R is nonnegative, strongly convex with parameter σ_f > 0, and f' is Lipschitz with constant L_f, then for any ω > 0 the level set

    Q_ω := {x : f(x) ≤ ω}

is strongly convex with parameter σ_{Q_ω} = σ_f / √(2 ω L_f).

11. Refined analysis under strong convexity

Theorem. Let f be convex with strong convexity parameter σ_f ≥ 0, and let Conv(Q) be convex with strong convexity parameter σ_Q ≥ 0. If 0 < δ_f := inf_{x ∈ Q} ‖f'(x)‖ and either σ_f > 0 or σ_Q > 0, then

    Σ_{k=0}^N ‖x_{k+1} − x_k‖^2 ≤ 2 (f* − f(x_0)) / (σ_Q δ_f + σ_f).

Note: if f is not minimized on Q, then δ_f > 0.

12. Computational experiments

We compare the following sparse PCA algorithms:
- GPower_l1: single-unit sparse PCA via l_1 penalty [1]
- GPower_l0: single-unit sparse PCA via l_0 penalty [1]
- GPower_l1,m: block sparse PCA via l_1 penalty [1]
- GPower_l0,m: block sparse PCA via l_0 penalty [1]
- SPCA: the SPCA algorithm [2]
- Greedy: greedy method [3]
- rsvd_l1: the method of [4] with l_1 penalty ("soft thresholding")
- rsvd_l0: the method of [4] with l_0 penalty ("hard thresholding")

Greedy slows down dramatically, compared to the other methods, when aimed at obtaining a component of higher cardinality.

Test problems:
- randomly generated A (Gaussian with zero mean and unit variance);
- gene-expression data.

13. Trade-off curves

Trade-off between explained variance and cardinality. The algorithms aggregate into two groups: GPower_l1, GPower_l0, Greedy and rsvd_l0 do better (black solid lines), while SPCA and rsvd_l1 do worse (red dashed lines). Based on 100 random test problems of size p = 100, n = 300.

14. Controlling sparsity with γ

Dependence of cardinality on the value of the sparsity-inducing parameter γ. The horizontal axis shows a normalized interval of reasonable values of γ; the vertical axis shows the percentage of nonzero coefficients of the resulting sparse loading vector z. Based on 100 random test problems of size p = 100, n = 300.

15. How does the trade-off evolve in time?

Evolution of the explained variance (solid lines, left axis) and cardinality (dashed lines, right axis) over time for the methods GPower_l1 and rsvd_l1. Based on a random test problem of size p = 250, n = 2500.

16. Random data: speed

Running-time comparison of GPower_l1, GPower_l0, SPCA, rsvd_l1 and rsvd_l0 on random problems, in two regimes: fixed n/p ratio with growing dimensions, and fixed p with growing n.

17. Gene expression data: speed

Data sets (breast cancer cohorts):
- Vijver: van de Vijver et al. [2002]
- Wang: Wang et al. [2005]
- Naderi: Naderi et al. [2006]
- JRH-2: Sotiriou et al. [2006]

Running times (in seconds) of GPower_l1, GPower_l0, GPower_l1,m, GPower_l0,m, SPCA, rsvd_l1 and rsvd_l0 on each data set.

18. Gene expression data: content

PEI values based on 536 cancer-related pathways, for PCA, GPower_l1, GPower_l0, GPower_l1,m, GPower_l0,m, SPCA, rsvd_l1 and rsvd_l0 on the Vijver, Wang, Naderi and JRH-2 data sets.

The Pathway Enrichment Index (PEI) measures the statistical significance of the overlap between two kinds of gene sets.

19. Summary

We have:
- developed 4 reformulations (single-unit/block, l_1/l_0) of the spca problem, which enabled us to devise a very fast method (we work in dimension p ≤ n and use only gradients) and to analyze its iteration complexity;
- analyzed a simple gradient method (the Generalized Power Method) for maximizing convex functions on compact sets;
- applied GPower to the 4 reformulations, ending up with 4 algorithms for spca;
- tested our algorithms on random and gene-expression data: they outperform the other methods significantly in speed (they finish before some other algorithms initialize), and on the biological data they produce solutions of slightly higher quality in terms of PEI.

20. References

[1] M. Journée, Yu. Nesterov, P. Richtárik, R. Sepulchre. Generalized Power Method for Sparse Principal Component Analysis (this talk). Submitted to the Journal of Machine Learning Research, November 2008.
[2] H. Zou, T. Hastie, R. Tibshirani. Sparse Principal Component Analysis. Journal of Computational and Graphical Statistics, 15(2):265-286, 2006.
[3] A. d'Aspremont, F. R. Bach, L. El Ghaoui. Optimal Solutions for Sparse Principal Component Analysis. Journal of Machine Learning Research, 9:1269-1294, 2008.
[4] H. Shen, J. Z. Huang. Sparse Principal Component Analysis via Regularized Low Rank Matrix Approximation. Journal of Multivariate Analysis, 99(6):1015-1034, 2008.
More information