Subset selection with sparse matrices
1 Subset selection with sparse matrices
Alberto Del Pia, University of Wisconsin-Madison
Santanu S. Dey, Georgia Tech
Robert Weismantel, ETH Zürich
February 2018, Schloss Dagstuhl
2 Subset selection for regression
A large number of variables can be observed, and we are interested in the value of a predictor variable b.
Due to time or cost constraints, it is not feasible to sample all the variables every time a prediction is required.
Goal: select a small subset of variables to predict b in the future.
Natural applications abound in medical or social studies.
Example: predict the risk of heart disease in terms of observable quantities (age, sex, blood pressure, cholesterol level, ...). The goal is to identify a small set of attributes for future tests.
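3 Subset selection for regression
$\min \|Mx + \mu - b\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\mu \in \mathbb{R}$
[Figure. Image credit: The Elements of Statistical Learning]
4 Subset selection for regression
$\min \|Mx + \mu - b\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\mu \in \mathbb{R}$
[Figure showing $b$, $\hat{b}$, $M_1$, $M_2$. Image credit: The Elements of Statistical Learning]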
5 Approaches
$\min \|Mx + \mu - b\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\mu \in \mathbb{R}$
Different communities and different approaches:
Enumeration: best-subset regression.
Greedy algorithms: forward- and backward-stepwise selection, forward-stagewise regression, etc.
Branch and bound: leaps and bounds procedure.
Convex optimization: shrinkage methods (ridge regression, the lasso, least angle regression, etc.).
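To make the enumeration approach concrete, here is a minimal sketch of best-subset regression by brute force. It assumes that $\mu$ acts as an intercept added to every entry of $Mx$; the function name `best_subset` and the toy data are illustrative and not taken from the talk.

```python
from itertools import combinations

import numpy as np


def best_subset(M, b, sigma):
    """Try every support of size sigma; return (residual sum of squares, support)."""
    m, n = M.shape
    ones = np.ones((m, 1))  # intercept column (assumed role of mu)
    best = (np.inf, ())
    for S in combinations(range(n), sigma):
        # Least squares on the chosen columns plus the intercept.
        X = np.hstack([M[:, list(S)], ones])
        coef, *_ = np.linalg.lstsq(X, b, rcond=None)
        r = X @ coef - b
        rss = float(r @ r)
        if rss < best[0]:
            best = (rss, S)
    return best


# Toy data: b depends only on column 2 of M, plus an intercept of 1.
M = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [2.0, 0.0, 1.0]])
b = 3.0 * M[:, 2] + 1.0
rss, support = best_subset(M, b, sigma=1)
```

The loop visits all $\binom{n}{\sigma}$ supports, which is why enumeration is only practical for small instances.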
6 Complexity
$\min \|Mx + \mu - b\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\mu \in \mathbb{R}$
The problem is NP-hard. [Welch, 1982]
Few polynomially solvable cases:
When the $n - k$ largest eigenvalues of $M^\top M$ are identical, for $k$ fixed. [Gao, Li, 2013]
When the covariance graph is a tree. [Das, Kempe, 2008]
Approximation algorithm when the covariance has constant bandwidth. [Das, Kempe, 2008]
Under some conditions, using the $\ell_1$ norm to replace the cardinality constraint yields the exact solution with overwhelming probability. [Donoho, 2006] [Candes, Romberg, Tao, 2006]
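7 Our contribution
$\min \|Mx + \mu - b\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\mu \in \mathbb{R}$
Can be solved in polynomial time if:
(i) M is obtained from a diagonal matrix by adding a fixed number of extra columns:
$M = (D \mid c^1 \cdots c^k)$, with $D = \mathrm{diag}(d_1, \dots, d_n)$.
8 Our contribution
$\min \|Mx + \mu - b\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\mu \in \mathbb{R}$
Can be solved in polynomial time if:
(ii) M is obtained by adding a fixed number of extra columns to a block diagonal matrix, where each block has a fixed number of variables:
$M = (A \mid c^1 \cdots c^k)$, with $A$ block diagonal with blocks $A_1, \dots, A_h$.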
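9 First reduction
$\min \|(A \mid c^1 \cdots c^k)\,x + \mu - b\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^{n+k}$, $\mu \in \mathbb{R}$
10 First reduction
$\min \|(A \mid c^1 \cdots c^k)\,x + \mu - b\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^{n+k}$, $\mu \in \mathbb{R}$
Lemma: We just need to show how to solve in polynomial time the problem
$\min \|Ax - (b - \sum_{l=1}^{k} c^l \lambda_l)\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}^k$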
11 The diagonal case
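12 The diagonal case
$\min \|Dx - (b - \sum_{l=1}^{k} c^l \lambda_l)\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}^k$
13 The diagonal case
$\min \|Dx - (b - \sum_{l=1}^{k} c^l \lambda_l)\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}^k$
First, we consider a simpler problem by fixing the variables $\lambda$:
$\min \|Dx - b\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$
14 The diagonal case
$\min \|Dx - (b - \sum_{l=1}^{k} c^l \lambda_l)\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}^k$
First, we consider a simpler problem by fixing the variables $\lambda$:
$\min \left\| \begin{pmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{pmatrix} x - b \right\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$
15 The diagonal case
$\min \|Dx - (b - \sum_{l=1}^{k} c^l \lambda_l)\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}^k$
First, we consider a simpler problem by fixing the variables $\lambda$:
$\min \sum_{j=1}^{n} (d_j x_j - b_j)^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$
16 A simpler diagonal problem
$\min \sum_{j=1}^{n} (d_j x_j - b_j)^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$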
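17 A simpler diagonal problem
$\min \sum_{j=1}^{n} (d_j x_j - b_j)^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$
We can order the indices $1, \dots, n$ such that $|b_{j_1}| \ge |b_{j_2}| \ge \dots \ge |b_{j_n}|$.
The optimal support $\{j_1, j_2, \dots, j_\sigma\}$ only depends on the ordering.
An optimal solution is $x_{j_i} := b_{j_i} / d_{j_i}$ for $i = 1, \dots, \sigma$, and $x_{j_i} := 0$ otherwise.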
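The rule on this slide can be sketched in a few lines. This is a minimal illustration, assuming every $d_j$ is nonzero; the function name `solve_diagonal` and the sample numbers are made up for the example.

```python
import numpy as np


def solve_diagonal(d, b, sigma):
    """Minimize sum_j (d_j x_j - b_j)^2 with at most sigma nonzeros in x.

    Assumes all d_j are nonzero, so each selected index contributes zero error.
    """
    d = np.asarray(d, dtype=float)
    b = np.asarray(b, dtype=float)
    # Order indices by decreasing |b_j|; the first sigma form the support.
    order = np.argsort(-np.abs(b), kind="stable")
    support = order[:sigma]
    x = np.zeros_like(b)
    x[support] = b[support] / d[support]
    objective = float(np.sum((d * x - b) ** 2))
    return x, sorted(support.tolist()), objective


x, support, objective = solve_diagonal(
    d=[2.0, 1.0, 4.0, 1.0], b=[1.0, -5.0, 3.0, 0.5], sigma=2
)
```

Only the ordering of the values $|b_j|$ matters for the support, which is the property the later slides exploit when $b$ is replaced by $b - \sum_l c^l \lambda_l$.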
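18 A simpler diagonal problem
$\min \sum_{j=1}^{n} (d_j x_j - b_j)^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$
[Figure: axes $x_1$, $x_2$ and the point $b$]
19 Back to the diagonal case
$\min \|Dx - (b - \sum_{l=1}^{k} c^l \lambda_l)\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}^k$
20 Back to the diagonal case
$\min \|Dx - (b - \sum_{l=1}^{k} c^l \lambda_l)\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}^k$
[Figure: axes $x_1$, $x_2$ and the point $b - \sum_{l=1}^{k} c^l \lambda_l$]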
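21 Back to the diagonal case
$\min \|Dx - (b - \sum_{l=1}^{k} c^l \lambda_l)\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}^k$
We wish to partition all $\lambda$ vectors based on the optimal support they yield.
In each cell of the partition, we want to have an ordering of all the $|b_i - \sum_{l=1}^{k} c^l_i \lambda_l|$.
22 Back to the diagonal case
$\min \|Dx - (b - \sum_{l=1}^{k} c^l \lambda_l)\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}^k$
For every two indices $i, j$, we subdivide all points $\lambda \in \mathbb{R}^k$ based on
$|b_i - \sum_{l=1}^{k} c^l_i \lambda_l| \ge |b_j - \sum_{l=1}^{k} c^l_j \lambda_l|$.
We just need four hyperplanes. In total we obtain $O(n^2)$ hyperplanes.
By the hyperplane arrangement theorem, we obtain $O(n^{2k})$ polyhedra $Q_t$ in $\mathbb{R}^k$.
23 Back to the diagonal case
Consider one polyhedron $Q_t$.
$\min \|Dx - (b - \sum_{l=1}^{k} c^l \lambda_l)\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}^k$
We obtain an ordering of all the $|b_i - \sum_{l=1}^{k} c^l_i \lambda_l|$ that holds for all $\lambda \in Q_t$.
The $\sigma$ indices highest in the ordering yield an optimal support for all $\lambda$ in $Q_t$.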
24 Back to the diagonal case
$\min \|Dx - (b - \sum_{l=1}^{k} c^l \lambda_l)\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}^k$
Let $X$ be the set containing all these $O(n^{2k})$ supports.
For each $\chi \in X$, the original problem with the additional constraints $x_i = 0$ for all $i \notin \chi$ can be solved in polynomial time.
The original problem has an optimal solution with support contained in $\chi$ for some $\chi \in X$.
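The whole diagonal-case procedure can be sketched for a single extra column ($k = 1$), where the hyperplanes are just points on the real line and the cells are intervals. This is an illustrative sketch under that assumption, with all $d_j$ nonzero; the helper names `residual_value` and `candidate_supports` and the random data are hypothetical.

```python
from itertools import combinations

import numpy as np


def residual_value(d, c, b, support):
    """min over lambda and x supported on `support` of ||D x + c*lambda - b||^2."""
    X = np.column_stack([np.diag(d)[:, list(support)], c])
    coef, *_ = np.linalg.lstsq(X, b, rcond=None)
    r = X @ coef - b
    return float(r @ r)


def candidate_supports(c, b, sigma):
    """Collect one top-sigma support per interval of the arrangement (k = 1)."""
    n = len(b)
    points = set()
    for i, j in combinations(range(n), 2):
        # |b_i - c_i t| = |b_j - c_j t| holds where the two values are equal
        # or opposite; each case gives at most one breakpoint t.
        for num, den in ((b[i] - b[j], c[i] - c[j]), (b[i] + b[j], c[i] + c[j])):
            if den != 0:
                points.add(num / den)
    points = sorted(points)
    if points:
        mids = [(s + t) / 2 for s, t in zip(points, points[1:])]
        samples = [points[0] - 1.0] + mids + [points[-1] + 1.0]
    else:
        samples = [0.0]
    supports = set()
    for t in samples:
        # Inside an interval the ordering of |b_i - c_i t| is fixed.
        order = np.argsort(-np.abs(b - c * t), kind="stable")
        supports.add(tuple(sorted(order[:sigma].tolist())))
    return supports


rng = np.random.default_rng(0)
n, sigma = 6, 2
d = rng.uniform(1.0, 2.0, n)
c = rng.normal(size=n)
b = rng.normal(size=n)

cands = candidate_supports(c, b, sigma)
best_candidate = min(residual_value(d, c, b, S) for S in cands)
# Reference value: try every support of size sigma.
best_brute = min(residual_value(d, c, b, S) for S in combinations(range(n), sigma))
```

For $k = 1$ there are at most $n(n-1)$ breakpoints, so at most $n(n-1) + 1$ candidate supports are examined instead of all $\binom{n}{\sigma}$.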
25 The block diagonal case
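26 The block diagonal case
$\min \|Ax - (b - \sum_{l=1}^{k} c^l \lambda_l)\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}^k$
27 The block diagonal case
$\min \|Ax - (b - \sum_{l=1}^{k} c^l \lambda_l)\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}^k$
Same overall strategy:
1. Design an algorithm for the simpler problem obtained by fixing the variables $\lambda$.
2. Cover the space of the $\lambda$ variables with a polynomial number of regions such that in each region the algorithm yields the same optimal support.
28 The block diagonal case
$\min \|Ax - (b - \sum_{l=1}^{k} c^l \lambda_l)\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}^k$
First, we consider a simpler problem by fixing the variables $\lambda$:
$\min \|Ax - b\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$
29 The block diagonal case
$\min \|Ax - (b - \sum_{l=1}^{k} c^l \lambda_l)\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}^k$
First, we consider a simpler problem by fixing the variables $\lambda$:
$\min \left\| \begin{pmatrix} A_1 & & \\ & \ddots & \\ & & A_h \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_h \end{pmatrix} - \begin{pmatrix} b_1 \\ \vdots \\ b_h \end{pmatrix} \right\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$
30 The block diagonal case
$\min \|Ax - (b - \sum_{l=1}^{k} c^l \lambda_l)\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}^k$
First, we consider a simpler problem by fixing the variables $\lambda$:
$\min \sum_{i=1}^{h} \|A_i x_i - b_i\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$
31 A simpler block diagonal problem
$\min \sum_{i=1}^{h} \|A_i x_i - b_i\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$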
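32 A simpler block diagonal problem
$\min \sum_{i=1}^{h} \|A_i x_i - b_i\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$
We have to decide how to distribute $\sigma$ among the different blocks.
We can do it recursively for support $1, 2, \dots, \sigma$.
At each iteration we only need to redistribute $O(\theta^3)$ supports, where $\theta$ is the maximum number of variables in a block.
The optimal support only depends on an ordering of polynomially many (for fixed $\theta$) values built from the quantities
$\min\{\|A_i x_i - b_i\|^2 : x_i \in \mathbb{R}^{n_i},\ |\mathrm{supp}(x_i)| \le j\}$, indexed by $i, j$.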
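As a point of comparison for the distribution step, here is a sketch that splits the budget $\sigma$ among blocks with a standard knapsack-style dynamic program. This is a swapped-in textbook technique, not the recursive redistribution scheme described on the slide, and the helper names and random blocks are illustrative assumptions.

```python
from itertools import combinations

import numpy as np


def lstsq_residual(A, b, S):
    """min ||A x - b||^2 over x supported on the column set S."""
    if not S:
        return float(b @ b)
    cols = A[:, list(S)]
    coef, *_ = np.linalg.lstsq(cols, b, rcond=None)
    r = cols @ coef - b
    return float(r @ r)


def block_values(A, b):
    """v[j] = best residual of one block using at most j of its variables."""
    n = A.shape[1]
    # Adding columns never hurts, so supports of size exactly j suffice.
    return [min(lstsq_residual(A, b, S) for S in combinations(range(n), j))
            for j in range(n + 1)]


def distribute(blocks, sigma):
    """Best total residual when at most sigma variables are used overall."""
    best = [0.0] * (sigma + 1)  # best[s]: budget s, no blocks processed yet
    for A, b in blocks:
        v = block_values(A, b)
        best = [min(best[s - j] + v[j] for j in range(min(s, len(v) - 1) + 1))
                for s in range(sigma + 1)]
    return best[sigma]


rng = np.random.default_rng(1)
# Three blocks, each with 3 rows and theta = 2 variables.
blocks = [(rng.normal(size=(3, 2)), rng.normal(size=3)) for _ in range(3)]
value = distribute(blocks, sigma=3)
```

Each block is enumerated in time exponential in $\theta$ only, which is acceptable because $\theta$ is assumed fixed; the dynamic program then costs $O(h\,\sigma\,\theta)$ additions.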
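33 Back to the block diagonal case
$\min \|Ax - (b - \sum_{l=1}^{k} c^l \lambda_l)\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}^k$ (P)
34 Back to the block diagonal case
$\min \|Ax - (b - \sum_{l=1}^{k} c^l \lambda_l)\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}^k$ (P)
We wish to partition all $\lambda$ vectors based on the optimal support they yield.
In each cell of the partition, we want to have an ordering of all the
$\min\{\|A_i x_i - (b_i - \sum_{l=1}^{k} c^l_i \lambda_l)\|^2 : x_i \in \mathbb{R}^{n_i},\ |\mathrm{supp}(x_i)| \le j\}$, indexed by $i, j$.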
35 Partitioning the space
$\min\{\|A_i x_i - (b_i - \sum_{l=1}^{k} c^l_i \lambda_l)\|^2 : x_i \in \mathbb{R}^{n_i},\ |\mathrm{supp}(x_i)| \le j\}$, indexed by $i, j$
Issue 1: There is no such polyhedral decomposition of the $\lambda$ space. Why? There are quadratic functions involved.
Solution: Linearization. Instead of the $\lambda$ space, we define a new space $S$ where we can linearize any quadratic monomial. The dimension of $S$ is $O(k^2)$.
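To see where the quadratic functions come from and why $O(k^2)$ coordinates suffice, the following worked derivation may help. It is a sketch under added notation that does not appear on the slides: $T$ is a fixed support inside block $i$, $C_i = (c^1_i \cdots c^k_i)$, $\Pi_T$ is the orthogonal projector onto the span of the columns of $A_i$ indexed by $T$, $Q_T = I - \Pi_T$, and $y_{lm}$ are the new linearization variables.

```latex
\begin{align*}
\min_{\mathrm{supp}(x_i) \subseteq T} \bigl\| A_i x_i - (b_i - C_i \lambda) \bigr\|^2
  &= (b_i - C_i \lambda)^\top Q_T \,(b_i - C_i \lambda) \\
  &= b_i^\top Q_T b_i \;-\; 2\, b_i^\top Q_T C_i \lambda
     \;+\; \lambda^\top C_i^\top Q_T C_i \lambda \\
  &= b_i^\top Q_T b_i \;-\; 2\, b_i^\top Q_T C_i \lambda
     \;+\; \sum_{l,m=1}^{k} \bigl(C_i^\top Q_T C_i\bigr)_{lm}\, y_{lm},
  \qquad y_{lm} := \lambda_l \lambda_m .
\end{align*}
```

The last expression is linear in the coordinates $(\lambda_l)_{l}$ and $(y_{lm})_{l \le m}$, of which there are $k + k(k+1)/2 = O(k^2)$, matching the stated dimension of $S$.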
36 Partitioning the space
$\min\{\|A_i x_i - (b_i - \sum_{l=1}^{k} c^l_i \lambda_l)\|^2 : x_i \in \mathbb{R}^{n_i},\ |\mathrm{supp}(x_i)| \le j\}$, indexed by $i, j$
Issue 2: The objective value of each $(i, j)$ depends on its optimal support.
Solution: Enumeration. We partition $S$ in polyhedra $P_t \subseteq S$, such that in each of them every $(i, j)$ has a fixed optimal support.
37 Partitioning the space
$\min\{\|A_i x_i - (b_i - \sum_{l=1}^{k} c^l_i \lambda_l)\|^2 : x_i \in \mathbb{R}^{n_i},\ |\mathrm{supp}(x_i)| \le j\}$, indexed by $i, j$
How? For each fixed support, the optimal value of every $(i, j)$ is a quadratic function.
By comparing all pairs of quadratic functions, we obtain $O(h)$ hyperplanes in $S$, since $\theta$ is fixed.
By the hyperplane arrangement theorem, we obtain polynomially many (for fixed $k$) polyhedra $P_t$ in $S$.
The best quadratic function among those corresponding to $i, j$ yields the optimal support of $(i, j)$.
38 Partitioning the space
$\min\{\|A_i x_i - (b_i - \sum_{l=1}^{k} c^l_i \lambda_l)\|^2 : x_i \in \mathbb{R}^{n_i},\ |\mathrm{supp}(x_i)| \le j\}$, indexed by $i, j$
Now we can apply the old partitioning procedure:
In each $P_t$ every expression is a linear function in $S$.
For every two such expressions, we subdivide all points in $S$ based on which is largest.
In total we obtain polynomially many hyperplanes (for fixed $\theta$).
By the hyperplane arrangement theorem, we obtain polynomially many (for fixed $\theta$ and $k$) polyhedra $Q_{t,u} \subseteq P_t$.
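39 Wrapping up the block diagonal case
Consider one polyhedron $Q_{t,u}$.
$\min \|Ax - (b - \sum_{l=1}^{k} c^l \lambda_l)\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}^k$
We obtain an ordering of all the
$\min\{\|A_i x_i - (b_i - \sum_{l=1}^{k} c^l_i \lambda_l)\|^2 : x_i \in \mathbb{R}^{n_i},\ |\mathrm{supp}(x_i)| \le j\}$, indexed by $i, j$,
that holds for all $\lambda \in Q_{t,u}$.
The algorithm for the simpler block diagonal problem yields the same optimal support for all $\lambda$ in $Q_{t,u}$.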
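40 Wrapping up the block diagonal case
$\min \|Ax - (b - \sum_{l=1}^{k} c^l \lambda_l)\|^2$ s.t. $|\mathrm{supp}(x)| \le \sigma$, $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}^k$
Let $X$ be the set containing all these supports, one per polyhedron $Q_{t,u}$; their number is polynomial for fixed $\theta$ and $k$.
For each $\chi \in X$, the original problem with the additional constraints $x_i = 0$ for all $i \notin \chi$ can be solved in polynomial time.
The original problem has an optimal solution with support contained in $\chi$ for some $\chi \in X$.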
Subset selection in sparse matrices
Subset selection in sparse matrices Alberto Del Pia Santanu S. Dey Robert Weismantel October 5, 2018 Abstract In subset selection we search for the best linear predictor that involves a small subset of
More informationMinimizing Cubic and Homogeneous Polynomials over Integers in the Plane
Minimizing Cubic and Homogeneous Polynomials over Integers in the Plane Alberto Del Pia Department of Industrial and Systems Engineering & Wisconsin Institutes for Discovery, University of Wisconsin-Madison
More informationMachine Learning. Regularization and Feature Selection. Fabio Vandin November 14, 2017
Machine Learning Regularization and Feature Selection Fabio Vandin November 14, 2017 1 Regularized Loss Minimization Assume h is defined by a vector w = (w 1,..., w d ) T R d (e.g., linear models) Regularization
More informationMinimizing Cubic and Homogeneous Polynomials Over Integer Points in the Plane
Minimizing Cubic and Homogeneous Polynomials Over Integer Points in the Plane Robert Hildebrand IFOR - ETH Zürich Joint work with Alberto Del Pia, Robert Weismantel and Kevin Zemmer Robert Hildebrand (ETH)
More informationSparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic
More informationLinear Methods for Regression. Lijun Zhang
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationInteger Quadratic Programming is in NP
Alberto Del Pia 1 Santanu S. Dey 2 Marco Molinaro 2 1 IBM T. J. Watson Research Center, Yorktown Heights. 2 School of Industrial and Systems Engineering, Atlanta Outline 1 4 Program: Definition Definition
More informationApproximation Algorithms
Approximation Algorithms Chapter 26 Semidefinite Programming Zacharias Pitouras 1 Introduction LP place a good lower bound on OPT for NP-hard problems Are there other ways of doing this? Vector programs
More informationAn Introduction to Sparse Approximation
An Introduction to Sparse Approximation Anna C. Gilbert Department of Mathematics University of Michigan Basic image/signal/data compression: transform coding Approximate signals sparsely Compress images,
More informationSCMA292 Mathematical Modeling : Machine Learning. Krikamol Muandet. Department of Mathematics Faculty of Science, Mahidol University.
SCMA292 Mathematical Modeling : Machine Learning Krikamol Muandet Department of Mathematics Faculty of Science, Mahidol University February 9, 2016 Outline Quick Recap of Least Square Ridge Regression
More informationMachine Learning. Regularization and Feature Selection. Fabio Vandin November 13, 2017
Machine Learning Regularization and Feature Selection Fabio Vandin November 13, 2017 1 Learning Model A: learning algorithm for a machine learning task S: m i.i.d. pairs z i = (x i, y i ), i = 1,..., m,
More informationMathematical foundations - linear algebra
Mathematical foundations - linear algebra Andrea Passerini passerini@disi.unitn.it Machine Learning Vector space Definition (over reals) A set X is called a vector space over IR if addition and scalar
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More informationFinal Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58
Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple
More informationDimension Reduction Methods
Dimension Reduction Methods And Bayesian Machine Learning Marek Petrik 2/28 Previously in Machine Learning How to choose the right features if we have (too) many options Methods: 1. Subset selection 2.
More informationDistribution-Free Distribution Regression
Distribution-Free Distribution Regression Barnabás Póczos, Alessandro Rinaldo, Aarti Singh and Larry Wasserman AISTATS 2013 Presented by Esther Salazar Duke University February 28, 2014 E. Salazar (Reading
More informationProbabilistic Graphical Models
2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector
More informationSparsity in Underdetermined Systems
Sparsity in Underdetermined Systems Department of Statistics Stanford University August 19, 2005 Classical Linear Regression Problem X n y p n 1 > Given predictors and response, y Xβ ε = + ε N( 0, σ 2
More informationLMI MODELLING 4. CONVEX LMI MODELLING. Didier HENRION. LAAS-CNRS Toulouse, FR Czech Tech Univ Prague, CZ. Universidad de Valladolid, SP March 2009
LMI MODELLING 4. CONVEX LMI MODELLING Didier HENRION LAAS-CNRS Toulouse, FR Czech Tech Univ Prague, CZ Universidad de Valladolid, SP March 2009 Minors A minor of a matrix F is the determinant of a submatrix
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee227c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee227c@berkeley.edu
More informationBindel, Fall 2016 Matrix Computations (CS 6210) Notes for
1 A cautionary tale Notes for 2016-10-05 You have been dropped on a desert island with a laptop with a magic battery of infinite life, a MATLAB license, and a complete lack of knowledge of basic geometry.
More informationCS534 Machine Learning - Spring Final Exam
CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the
More informationTECHNICAL REPORT NO. 1091r. A Note on the Lasso and Related Procedures in Model Selection
DEPARTMENT OF STATISTICS University of Wisconsin 1210 West Dayton St. Madison, WI 53706 TECHNICAL REPORT NO. 1091r April 2004, Revised December 2004 A Note on the Lasso and Related Procedures in Model
More informationSparsity of Matrix Canonical Forms. Xingzhi Zhan East China Normal University
Sparsity of Matrix Canonical Forms Xingzhi Zhan zhan@math.ecnu.edu.cn East China Normal University I. Extremal sparsity of the companion matrix of a polynomial Joint work with Chao Ma The companion matrix
More informationLeast Sparsity of p-norm based Optimization Problems with p > 1
Least Sparsity of p-norm based Optimization Problems with p > Jinglai Shen and Seyedahmad Mousavi Original version: July, 07; Revision: February, 08 Abstract Motivated by l p -optimization arising from
More informationCompressed Sensing and Neural Networks
and Jan Vybíral (Charles University & Czech Technical University Prague, Czech Republic) NOMAD Summer Berlin, September 25-29, 2017 1 / 31 Outline Lasso & Introduction Notation Training the network Applications
More informationMaster 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique
Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some
More informationMS-C1620 Statistical inference
MS-C1620 Statistical inference 10 Linear regression III Joni Virta Department of Mathematics and Systems Analysis School of Science Aalto University Academic year 2018 2019 Period III - IV 1 / 32 Contents
More informationISyE 691 Data mining and analytics
ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)
More informationLinear regression methods
Linear regression methods Most of our intuition about statistical methods stem from linear regression. For observations i = 1,..., n, the model is Y i = p X ij β j + ε i, j=1 where Y i is the response
More information1 Solution of a Large-Scale Traveling-Salesman Problem... 7 George B. Dantzig, Delbert R. Fulkerson, and Selmer M. Johnson
Part I The Early Years 1 Solution of a Large-Scale Traveling-Salesman Problem............ 7 George B. Dantzig, Delbert R. Fulkerson, and Selmer M. Johnson 2 The Hungarian Method for the Assignment Problem..............
More informationCompressed Sensing in Cancer Biology? (A Work in Progress)
Compressed Sensing in Cancer Biology? (A Work in Progress) M. Vidyasagar FRS Cecil & Ida Green Chair The University of Texas at Dallas M.Vidyasagar@utdallas.edu www.utdallas.edu/ m.vidyasagar University
More information6. Regularized linear regression
Foundations of Machine Learning École Centrale Paris Fall 2015 6. Regularized linear regression Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe agathe.azencott@mines paristech.fr
More informationRegression, Ridge Regression, Lasso
Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.
More informationj=1 u 1jv 1j. 1/ 2 Lemma 1. An orthogonal set of vectors must be linearly independent.
Lecture Notes: Orthogonal and Symmetric Matrices Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong taoyf@cse.cuhk.edu.hk Orthogonal Matrix Definition. Let u = [u
More informationEllipsoidal Mixed-Integer Representability
Ellipsoidal Mixed-Integer Representability Alberto Del Pia Jeff Poskin September 15, 2017 Abstract Representability results for mixed-integer linear systems play a fundamental role in optimization since
More informationSubmodularity in Machine Learning
Saifuddin Syed MLRG Summer 2016 1 / 39 What are submodular functions Outline 1 What are submodular functions Motivation Submodularity and Concavity Examples 2 Properties of submodular functions Submodularity
More informationLecture 1 Introduction
L. Vandenberghe EE236A (Fall 2013-14) Lecture 1 Introduction course overview linear optimization examples history approximate syllabus basic definitions linear optimization in vector and matrix notation
More informationDeciding Emptiness of the Gomory-Chvátal Closure is NP-Complete, Even for a Rational Polyhedron Containing No Integer Point
Deciding Emptiness of the Gomory-Chvátal Closure is NP-Complete, Even for a Rational Polyhedron Containing No Integer Point Gérard Cornuéjols 1 and Yanjun Li 2 1 Tepper School of Business, Carnegie Mellon
More informationQuick Tour of Linear Algebra and Graph Theory
Quick Tour of Linear Algebra and Graph Theory CS224w: Social and Information Network Analysis Fall 2012 Yu Wayne Wu Based on Borja Pelato s version in Fall 2011 Matrices and Vectors Matrix: A rectangular
More informationNonlinear Programming Models
Nonlinear Programming Models Fabio Schoen 2008 http://gol.dsi.unifi.it/users/schoen Nonlinear Programming Models p. Introduction Nonlinear Programming Models p. NLP problems minf(x) x S R n Standard form:
More informationAlgebraic and Geometric ideas in the theory of Discrete Optimization
Algebraic and Geometric ideas in the theory of Discrete Optimization Jesús A. De Loera, UC Davis Three Lectures based on the book: Algebraic & Geometric Ideas in the Theory of Discrete Optimization (SIAM-MOS
More informationORIE 4741: Learning with Big Messy Data. Regularization
ORIE 4741: Learning with Big Messy Data Regularization Professor Udell Operations Research and Information Engineering Cornell October 26, 2017 1 / 24 Regularized empirical risk minimization choose model
More informationThe deterministic Lasso
The deterministic Lasso Sara van de Geer Seminar für Statistik, ETH Zürich Abstract We study high-dimensional generalized linear models and empirical risk minimization using the Lasso An oracle inequality
More informationGradient Descent. Ryan Tibshirani Convex Optimization /36-725
Gradient Descent Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: canonical convex programs Linear program (LP): takes the form min x subject to c T x Gx h Ax = b Quadratic program (QP): like
More informationIs the test error unbiased for these programs? 2017 Kevin Jamieson
Is the test error unbiased for these programs? 2017 Kevin Jamieson 1 Is the test error unbiased for this program? 2017 Kevin Jamieson 2 Simple Variable Selection LASSO: Sparse Regression Machine Learning
More informationRandomness-in-Structured Ensembles for Compressed Sensing of Images
Randomness-in-Structured Ensembles for Compressed Sensing of Images Abdolreza Abdolhosseini Moghadam Dep. of Electrical and Computer Engineering Michigan State University Email: abdolhos@msu.edu Hayder
More informationIntroduction to Machine Learning. Regression. Computer Science, Tel-Aviv University,
1 Introduction to Machine Learning Regression Computer Science, Tel-Aviv University, 2013-14 Classification Input: X Real valued, vectors over real. Discrete values (0,1,2,...) Other structures (e.g.,
More informationCSCI 1951-G Optimization Methods in Finance Part 10: Conic Optimization
CSCI 1951-G Optimization Methods in Finance Part 10: Conic Optimization April 6, 2018 1 / 34 This material is covered in the textbook, Chapters 9 and 10. Some of the materials are taken from it. Some of
More information10725/36725 Optimization Homework 4
10725/36725 Optimization Homework 4 Due November 27, 2012 at beginning of class Instructions: There are four questions in this assignment. Please submit your homework as (up to) 4 separate sets of pages
More informationCSC 576: Variants of Sparse Learning
CSC 576: Variants of Sparse Learning Ji Liu Department of Computer Science, University of Rochester October 27, 205 Introduction Our previous note basically suggests using l norm to enforce sparsity in
More informationLecture 2: Linear Algebra Review
EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1
More informationMath Matrix Algebra
Math 44 - Matrix Algebra Review notes - (Alberto Bressan, Spring 7) sec: Orthogonal diagonalization of symmetric matrices When we seek to diagonalize a general n n matrix A, two difficulties may arise:
More informationhttps://goo.gl/kfxweg KYOTO UNIVERSITY Statistical Machine Learning Theory Sparsity Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT OF INTELLIGENCE SCIENCE AND TECHNOLOGY 1 KYOTO UNIVERSITY Topics:
More informationLarge-Scale L1-Related Minimization in Compressive Sensing and Beyond
Large-Scale L1-Related Minimization in Compressive Sensing and Beyond Yin Zhang Department of Computational and Applied Mathematics Rice University, Houston, Texas, U.S.A. Arizona State University March
More informationShrinkage Methods: Ridge and Lasso
Shrinkage Methods: Ridge and Lasso Jonathan Hersh 1 Chapman University, Argyros School of Business hersh@chapman.edu February 27, 2019 J.Hersh (Chapman) Ridge & Lasso February 27, 2019 1 / 43 1 Intro and
More informationDirect Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina
Direct Learning: Linear Regression Parametric learning We consider the core function in the prediction rule to be a parametric function. The most commonly used function is a linear function: squared loss:
More informationConic optimization under combinatorial sparsity constraints
Conic optimization under combinatorial sparsity constraints Christoph Buchheim and Emiliano Traversi Abstract We present a heuristic approach for conic optimization problems containing sparsity constraints.
More informationConvex optimization. Javier Peña Carnegie Mellon University. Universidad de los Andes Bogotá, Colombia September 2014
Convex optimization Javier Peña Carnegie Mellon University Universidad de los Andes Bogotá, Colombia September 2014 1 / 41 Convex optimization Problem of the form where Q R n convex set: min x f(x) x Q,
More informationMinimizing the Difference of L 1 and L 2 Norms with Applications
1/36 Minimizing the Difference of L 1 and L 2 Norms with Department of Mathematical Sciences University of Texas Dallas May 31, 2017 Partially supported by NSF DMS 1522786 2/36 Outline 1 A nonconvex approach:
More informationData Mining Stat 588
Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic
More information1 Positive definiteness and semidefiniteness
Positive definiteness and semidefiniteness Zdeněk Dvořák May 9, 205 For integers a, b, and c, let D(a, b, c) be the diagonal matrix with + for i =,..., a, D i,i = for i = a +,..., a + b,. 0 for i = a +
More informationUses of duality. Geoff Gordon & Ryan Tibshirani Optimization /
Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear
More informationLinear Regression Models. Based on Chapter 3 of Hastie, Tibshirani and Friedman
Linear Regression Models Based on Chapter 3 of Hastie, ibshirani and Friedman Linear Regression Models Here the X s might be: p f ( X = " + " 0 j= 1 X j Raw predictor variables (continuous or coded-categorical
More informationCOURSE ON LMI PART I.2 GEOMETRY OF LMI SETS. Didier HENRION henrion
COURSE ON LMI PART I.2 GEOMETRY OF LMI SETS Didier HENRION www.laas.fr/ henrion October 2006 Geometry of LMI sets Given symmetric matrices F i we want to characterize the shape in R n of the LMI set F
More informationNetwork Flows. 6. Lagrangian Relaxation. Programming. Fall 2010 Instructor: Dr. Masoud Yaghini
In the name of God Network Flows 6. Lagrangian Relaxation 6.3 Lagrangian Relaxation and Integer Programming Fall 2010 Instructor: Dr. Masoud Yaghini Integer Programming Outline Branch-and-Bound Technique
More informationA Tutorial on Compressive Sensing. Simon Foucart Drexel University / University of Georgia
A Tutorial on Compressive Sensing Simon Foucart Drexel University / University of Georgia CIMPA13 New Trends in Applied Harmonic Analysis Mar del Plata, Argentina, 5-16 August 2013 This minicourse acts
More informationThe Ellipsoid Algorithm
The Ellipsoid Algorithm John E. Mitchell Department of Mathematical Sciences RPI, Troy, NY 12180 USA 9 February 2018 Mitchell The Ellipsoid Algorithm 1 / 28 Introduction Outline 1 Introduction 2 Assumptions
More informationExpression Data Exploration: Association, Patterns, Factors & Regression Modelling
Expression Data Exploration: Association, Patterns, Factors & Regression Modelling Exploring gene expression data Scale factors, median chip correlation on gene subsets for crude data quality investigation
More informationMachine Learning for Biomedical Engineering. Enrico Grisan
Machine Learning for Biomedical Engineering Enrico Grisan enrico.grisan@dei.unipd.it Curse of dimensionality Why are more features bad? Redundant features (useless or confounding) Hard to interpret and
More informationRepresentations of All Solutions of Boolean Programming Problems
Representations of All Solutions of Boolean Programming Problems Utz-Uwe Haus and Carla Michini Institute for Operations Research Department of Mathematics ETH Zurich Rämistr. 101, 8092 Zürich, Switzerland
More informationSymmetries in Experimental Design and Group Lasso Kentaro Tanaka and Masami Miyakawa
Symmetries in Experimental Design and Group Lasso Kentaro Tanaka and Masami Miyakawa Workshop on computational and algebraic methods in statistics March 3-5, Sanjo Conference Hall, Hongo Campus, University
More informationMultivariate Statistical Analysis
Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 4 for Applied Multivariate Analysis Outline 1 Eigen values and eigen vectors Characteristic equation Some properties of eigendecompositions
More informationLecture 14: Shrinkage
Lecture 14: Shrinkage Reading: Section 6.2 STATS 202: Data mining and analysis October 27, 2017 1 / 19 Shrinkage methods The idea is to perform a linear regression, while regularizing or shrinking the
More informationThe Strength of Multi-row Aggregation Cuts for Sign-pattern Integer Programs
The Strength of Multi-row Aggregation Cuts for Sign-pattern Integer Programs Santanu S. Dey 1, Andres Iroume 1, and Guanyi Wang 1 1 School of Industrial and Systems Engineering, Georgia Institute of Technology
More informationSparse Interactions: Identifying High-Dimensional Multilinear Systems via Compressed Sensing
Sparse Interactions: Identifying High-Dimensional Multilinear Systems via Compressed Sensing Bobak Nazer and Robert D. Nowak University of Wisconsin, Madison Allerton 10/01/10 Motivation: Virus-Host Interaction
More informationValid Inequalities for Optimal Transmission Switching
Valid Inequalities for Optimal Transmission Switching Hyemin Jeon Jeff Linderoth Jim Luedtke Dept. of ISyE UW-Madison Burak Kocuk Santanu Dey Andy Sun Dept. of ISyE Georgia Tech 19th Combinatorial Optimization
More informationScientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix
Scientific Computing with Case Studies SIAM Press, 2009 http://www.cs.umd.edu/users/oleary/sccswebpage Lecture Notes for Unit VII Sparse Matrix Computations Part 1: Direct Methods Dianne P. O Leary c 2008
More informationAbsolute value equations
Linear Algebra and its Applications 419 (2006) 359 367 www.elsevier.com/locate/laa Absolute value equations O.L. Mangasarian, R.R. Meyer Computer Sciences Department, University of Wisconsin, 1210 West
More informationSDP Relaxations for MAXCUT
SDP Relaxations for MAXCUT from Random Hyperplanes to Sum-of-Squares Certificates CATS @ UMD March 3, 2017 Ahmed Abdelkader MAXCUT SDP SOS March 3, 2017 1 / 27 Overview 1 MAXCUT, Hardness and UGC 2 LP
More informationSecond-order cone programming
Outline Second-order cone programming, PhD Lehigh University Department of Industrial and Systems Engineering February 10, 2009 Outline 1 Basic properties Spectral decomposition The cone of squares The
More informationNotes on the decomposition result of Karlin et al. [2] for the hierarchy of Lasserre by M. Laurent, December 13, 2012
Notes on the decomposition result of Karlin et al. [2] for the hierarchy of Lasserre by M. Laurent, December 13, 2012 We present the decomposition result of Karlin et al. [2] for the hierarchy of Lasserre
More informationcxx ab.ec Warm up OH 2 ax 16 0 axtb Fix any a, b, c > What is the x 2 R that minimizes ax 2 + bx + c
Warm up D cai.yo.ie p IExrL9CxsYD Sglx.Ddl f E Luo fhlexi.si dbll Fix any a, b, c > 0. 1. What is the x 2 R that minimizes ax 2 + bx + c x a b Ta OH 2 ax 16 0 x 1 Za fhkxiiso3ii draulx.h dp.d 2. What is