Subset selection with sparse matrices


1 Subset selection with sparse matrices
Alberto Del Pia, University of Wisconsin-Madison; Santanu S. Dey, Georgia Tech; Robert Weismantel, ETH Zürich
February 1, 2018, Schloss Dagstuhl

2 Subset selection for regression
A large number of variables can be observed, and we are interested in the value of a predictor variable $b$. Due to time or cost constraints, it is not feasible to sample all the variables every time a prediction is required.
Goal: select a small subset of variables to predict $b$ in the future.
Natural applications abound in medical or social studies.
Example: predict the risk of heart disease in terms of observable quantities (age, sex, blood pressure, cholesterol level, ...). The goal is to identify a small set of attributes for future tests.

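3 Subset selection for regression
$$\min\ \|Mx + \mu\mathbf{1} - b\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n,\ \mu \in \mathbb{R}$$
[Figure: the vector $b$ and the columns $m^1$, $m^2$ of $M$. Image credit: The Elements of Statistical Learning]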

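4 Subset selection for regression
$$\min\ \|Mx + \mu\mathbf{1} - b\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n,\ \mu \in \mathbb{R}$$
[Figure: the vectors $b$ and $\hat{b}$ together with the columns $M^1$, $M^2$ of $M$. Image credit: The Elements of Statistical Learning]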

5 Approaches
$$\min\ \|Mx + \mu\mathbf{1} - b\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n,\ \mu \in \mathbb{R}$$
Different communities and different approaches:
- Enumeration: best-subset regression.
- Greedy algorithms: forward- and backward-stepwise selection, forward-stagewise regression, etc.
- Branch and bound: leaps and bounds procedure.
- Convex optimization: shrinkage methods (ridge regression, the lasso, least angle regression, etc.).

6 Complexity
$$\min\ \|Mx + \mu\mathbf{1} - b\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n,\ \mu \in \mathbb{R}$$
The problem is NP-hard [Welch, 1982].
Few polynomially solvable cases:
- When the $n - k$ largest eigenvalues of $M^\top M$ are identical, for $k$ fixed. [Gao, Li, 2013]
- When the covariance graph is a tree. [Das and Kempe, 2008]
Approximation algorithm when the covariance has constant bandwidth. [Das, Kempe, 2008]
Under some conditions, using the $\ell_1$ norm to replace the cardinality constraint yields the exact solution with overwhelming probability. [Donoho, 2006] [Candes, Romberg, Tao 2006]

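7 Our contribution
$$\min\ \|Mx + \mu\mathbf{1} - b\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n,\ \mu \in \mathbb{R}$$
Can be solved in polynomial time if:
(i) $M$ is obtained from a diagonal matrix by adding a fixed number of extra columns.
$$M = \left(\begin{array}{ccc|ccc} d_1 & & & & & \\ & \ddots & & c^1 & \cdots & c^k \\ & & d_n & & & \end{array}\right)$$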

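8 Our contribution
$$\min\ \|Mx + \mu\mathbf{1} - b\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n,\ \mu \in \mathbb{R}$$
Can be solved in polynomial time if:
(ii) $M$ is obtained by adding a fixed number of extra columns to a block diagonal matrix, where each block has a fixed number of variables.
$$M = \left(\begin{array}{ccc|ccc} A^1 & & & & & \\ & \ddots & & c^1 & \cdots & c^k \\ & & A^h & & & \end{array}\right)$$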

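9 First reduction
$$\min\ \big\|(A \mid c^1 \cdots c^k)\,x + \mu\mathbf{1} - b\big\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^{n+k},\ \mu \in \mathbb{R}$$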

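10 First reduction
$$\min\ \big\|(A \mid c^1 \cdots c^k)\,x + \mu\mathbf{1} - b\big\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^{n+k},\ \mu \in \mathbb{R}$$
Lemma. We just need to show how to solve in polynomial time the problem
$$\min\ \Big\|Ax - \Big(b - \sum_{\ell=1}^{k} c^\ell \lambda_\ell\Big)\Big\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n,\ \lambda \in \mathbb{R}^k$$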

11 The diagonal case

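12 The diagonal case
$$\min\ \Big\|Dx - \Big(b - \sum_{\ell=1}^{k} c^\ell \lambda_\ell\Big)\Big\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n,\ \lambda \in \mathbb{R}^k$$

13 The diagonal case
First, we consider a simpler problem by fixing the variables $\lambda$:
$$\min\ \|Dx - b\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n$$

14 The diagonal case
First, we consider a simpler problem by fixing the variables $\lambda$:
$$\min\ \left\|\begin{pmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{pmatrix} x - b\right\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n$$

15 The diagonal case
First, we consider a simpler problem by fixing the variables $\lambda$:
$$\min\ \sum_{j=1}^{n} (d_j x_j - b_j)^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n$$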

16 A simpler diagonal problem
$$\min\ \sum_{j=1}^{n} (d_j x_j - b_j)^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n$$

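17 A simpler diagonal problem
$$\min\ \sum_{j=1}^{n} (d_j x_j - b_j)^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n$$
We can order the indices $1, \dots, n$ such that $|b_{j_1}| \ge |b_{j_2}| \ge \cdots \ge |b_{j_n}|$.
The optimal support $\{j_1, j_2, \dots, j_\sigma\}$ only depends on the ordering.
An optimal solution is
$$x_{j_i} := \begin{cases} b_{j_i} / d_{j_i} & \text{for } i = 1, \dots, \sigma \\ 0 & \text{otherwise} \end{cases}$$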
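The ordering rule for the simpler diagonal problem can be sketched in a few lines of Python. This is only an illustration under the assumption that every diagonal entry `d[j]` is nonzero; the function name `solve_simple_diagonal` is hypothetical and does not come from the slides.

```python
def solve_simple_diagonal(d, b, sigma):
    """Minimize sum_j (d[j]*x[j] - b[j])**2 with at most `sigma` nonzeros in x.

    Assumption (not stated on the slide): every d[j] is nonzero.
    """
    n = len(b)
    # Order indices by decreasing |b_j|; the sigma largest form the support.
    order = sorted(range(n), key=lambda j: -abs(b[j]))
    x = [0.0] * n
    for j in order[:sigma]:
        x[j] = b[j] / d[j]          # residual on the support becomes zero
    value = sum((d[j] * x[j] - b[j]) ** 2 for j in range(n))
    return x, value
```

For example, `solve_simple_diagonal([2.0, 1.0, 4.0, 0.5], [1.0, -3.0, 2.0, 0.5], 2)` selects the two entries with the largest `|b[j]|` and leaves the squares of the remaining entries of `b` as the objective value.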

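18 A simpler diagonal problem
$$\min\ \sum_{j=1}^{n} (d_j x_j - b_j)^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n$$
[Figure: the point $b$ in the $(x_1, x_2)$ plane.]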

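19 Back to the diagonal case
$$\min\ \Big\|Dx - \Big(b - \sum_{\ell=1}^{k} c^\ell \lambda_\ell\Big)\Big\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n,\ \lambda \in \mathbb{R}^k$$

20 Back to the diagonal case
[Figure: the point $b - \sum_{\ell=1}^{k} c^\ell \lambda_\ell$ in the $(x_1, x_2)$ plane.]

21 Back to the diagonal case
We wish to partition all $\lambda$ vectors based on the optimal support they yield.
In each cell of the partition, we want to have an ordering of all the
$$\Big|b_i - \sum_{\ell=1}^{k} c^\ell_i \lambda_\ell\Big|, \quad i = 1, \dots, n.$$

22 Back to the diagonal case
For every two indices $i, j$, we subdivide all points $\lambda \in \mathbb{R}^k$ based on whether
$$\Big|b_i - \sum_{\ell=1}^{k} c^\ell_i \lambda_\ell\Big| \ \ge\ \Big|b_j - \sum_{\ell=1}^{k} c^\ell_j \lambda_\ell\Big|.$$
We just need four hyperplanes.
In total we obtain $O(n^2)$ hyperplanes.
By the hyperplane arrangement theorem, we obtain $O(n^{2k})$ polyhedra $Q^t$ in $\mathbb{R}^k$.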
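One way to see where four hyperplanes per pair can come from, and how the counts combine, is the following short derivation. It is an interpretation rather than something spelled out on the slide; $u$ and $v$ are shorthand introduced here.

```latex
% u and v are shorthand (introduced here) for the two affine functions of \lambda.
u(\lambda) = b_i - \sum_{\ell=1}^{k} c^{\ell}_i \lambda_\ell, \qquad
v(\lambda) = b_j - \sum_{\ell=1}^{k} c^{\ell}_j \lambda_\ell .

|u| \ge |v| \iff u^2 - v^2 \ge 0 \iff (u - v)(u + v) \ge 0 .

% Four hyperplanes per pair: u = 0, v = 0, u = v, u = -v.
% Inside a cell the signs of u, v, u - v and u + v are fixed,
% so the comparison of |u| and |v| is decided.

4 \binom{n}{2} = O(n^2) \ \text{hyperplanes}, \qquad
m \ \text{hyperplanes in } \mathbb{R}^k \ \text{give } O(m^k) \ \text{cells for fixed } k
\;\Longrightarrow\; O\big((n^2)^k\big) = O(n^{2k}) .
```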

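23 Back to the diagonal case
$$\min\ \Big\|Dx - \Big(b - \sum_{\ell=1}^{k} c^\ell \lambda_\ell\Big)\Big\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n,\ \lambda \in \mathbb{R}^k$$
Consider one polyhedron $Q^t$.
We obtain an ordering of all the $\big|b_i - \sum_{\ell=1}^{k} c^\ell_i \lambda_\ell\big|$ that holds for all $\lambda \in Q^t$.
The $\sigma$ indices highest in the ordering yield an optimal support for all $\lambda$ in $Q^t$.

24 Back to the diagonal case
$$\min\ \Big\|Dx - \Big(b - \sum_{\ell=1}^{k} c^\ell \lambda_\ell\Big)\Big\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n,\ \lambda \in \mathbb{R}^k$$
Let $X$ be the set containing all these $O(n^{2k})$ supports.
For each $\chi \in X$, the original problem with the additional constraints
$$x_i = 0 \quad \text{for all } i \notin \chi$$
can be solved in polynomial time.
The original problem has an optimal solution with support contained in $\chi$ for some $\chi \in X$.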
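The whole diagonal-case strategy can be traced end to end for a single extra column ($k = 1$), where the arrangement in $\lambda$-space is just a set of breakpoints on the real line. The sketch below is an illustration only: it assumes all diagonal entries are nonzero (so the objective does not depend on $d$), and the function names `restricted_value`, `candidate_supports` and `solve_diagonal_k1` are hypothetical.

```python
from itertools import combinations

def restricted_value(b, c, support):
    """Best objective when x is free on `support` and zero elsewhere (k = 1).

    Assumes nonzero diagonal entries, so residuals on the support vanish and
    only indices outside the support contribute (b_i - c_i*lam)**2.
    """
    outside = [i for i in range(len(b)) if i not in support]
    num = sum(b[i] * c[i] for i in outside)
    den = sum(c[i] * c[i] for i in outside)
    lam = num / den if den > 0 else 0.0   # one-variable least squares in lam
    return sum((b[i] - c[i] * lam) ** 2 for i in outside), lam

def candidate_supports(b, c, sigma):
    """One candidate support per cell of the 1-D arrangement in lam."""
    n = len(b)
    pts = set()
    for i, j in combinations(range(n), 2):
        # |b_i - c_i*lam| = |b_j - c_j*lam| at up to two values of lam.
        if c[i] != c[j]:
            pts.add((b[i] - b[j]) / (c[i] - c[j]))
        if c[i] + c[j] != 0:
            pts.add((b[i] + b[j]) / (c[i] + c[j]))
    pts = sorted(pts)
    if pts:
        samples = ([pts[0] - 1.0]
                   + [(p + q) / 2 for p, q in zip(pts, pts[1:])]
                   + [pts[-1] + 1.0])
    else:
        samples = [0.0]
    supports = set()
    for lam in samples:
        order = sorted(range(n), key=lambda i: -abs(b[i] - c[i] * lam))
        supports.add(frozenset(order[:sigma]))
    return supports

def solve_diagonal_k1(b, c, sigma):
    """Optimal value: solve the restricted problem for every candidate support."""
    return min(restricted_value(b, c, s)[0] for s in candidate_supports(b, c, sigma))
```

On small instances the result can be compared against brute force over all supports of size `sigma`; the candidate set is typically much smaller than the set of all subsets, which is the point of the partitioning argument.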

25 The block diagonal case

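26 The block diagonal case
$$\min\ \Big\|Ax - \Big(b - \sum_{\ell=1}^{k} c^\ell \lambda_\ell\Big)\Big\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n,\ \lambda \in \mathbb{R}^k$$

27 The block diagonal case
Same overall strategy:
1. Design an algorithm for the simpler problem obtained by fixing the variables $\lambda$.
2. Cover the space of the $\lambda$ variables with a polynomial number of regions such that in each region the algorithm yields the same optimal support.

28 The block diagonal case
First, we consider a simpler problem by fixing the variables $\lambda$:
$$\min\ \|Ax - b\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n$$

29 The block diagonal case
First, we consider a simpler problem by fixing the variables $\lambda$:
$$\min\ \left\|\begin{pmatrix} A^1 & & \\ & \ddots & \\ & & A^h \end{pmatrix} \begin{pmatrix} x^1 \\ \vdots \\ x^h \end{pmatrix} - \begin{pmatrix} b^1 \\ \vdots \\ b^h \end{pmatrix}\right\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n$$

30 The block diagonal case
First, we consider a simpler problem by fixing the variables $\lambda$:
$$\min\ \sum_{i=1}^{h} \|A^i x^i - b^i\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n$$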

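31 A simpler block diagonal problem
$$\min\ \sum_{i=1}^{h} \|A^i x^i - b^i\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n$$

32 A simpler block diagonal problem
$$\min\ \sum_{i=1}^{h} \|A^i x^i - b^i\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n$$
We have to decide how to distribute $\sigma$ among the different blocks.
We can do it recursively for support $1, 2, \dots, \sigma$.
At each iteration we only need to redistribute $O(\theta^3)$ supports, where $\theta$ is the maximum number of variables in a block.
The optimal support only depends on an ordering of $O(h\,\theta^3)$ values of the form
$$\min\big\{\|A^i x^i - b^i\|_2^2 \ :\ x^i \in \mathbb{R}^{n_i},\ |\operatorname{supp}(x^i)| \le j\big\}, \quad \text{for all } i, j.$$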
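To make the "distribute $\sigma$ among the blocks" step concrete, here is a small sketch that uses a plain dynamic program over blocks. This is a swapped-in technique, not the recursive redistribution scheme the slide alludes to, and it is chosen only because it is easy to verify. The input table `values` (per-block optimal values for each support size) is assumed to be precomputed, and the name `distribute_support` is hypothetical.

```python
def distribute_support(values, sigma):
    """Split a total support budget `sigma` across blocks.

    values[i][j] is assumed to hold the optimal value of block i when at most
    j of its variables may be nonzero (j = 0, ..., n_i), computed elsewhere.
    Returns (total objective, list with the number of nonzeros per block).
    """
    INF = float("inf")
    dp = [0.0] + [INF] * sigma   # dp[s]: best cost of processed blocks using exactly s nonzeros
    choices = []                 # choices[i][s]: nonzeros given to block i in that optimum
    for block in values:
        new = [INF] * (sigma + 1)
        pick = [0] * (sigma + 1)
        for s in range(sigma + 1):
            if dp[s] == INF:
                continue
            for j, v in enumerate(block):
                if s + j > sigma:
                    break
                if dp[s] + v < new[s + j]:
                    new[s + j] = dp[s] + v
                    pick[s + j] = j
        dp = new
        choices.append(pick)
    s = min(range(sigma + 1), key=lambda t: dp[t])
    total = dp[s]
    alloc = []
    for pick in reversed(choices):   # walk back through the blocks
        alloc.append(pick[s])
        s -= pick[s]
    alloc.reverse()
    return total, alloc
```

With `values = [[9.0, 4.0, 0.0], [5.0, 1.0], [6.0, 5.0, 3.0, 0.0]]` and `sigma = 3`, the budget goes to the first two blocks. The running time is polynomial in the number of blocks, the block size and `sigma`, but unlike the ordering-based scheme on the slide it does not directly expose which comparisons determine the optimal support.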

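33 Back to the block diagonal case
$$\min\ \Big\|Ax - \Big(b - \sum_{\ell=1}^{k} c^\ell \lambda_\ell\Big)\Big\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n,\ \lambda \in \mathbb{R}^k \qquad (P)$$

34 Back to the block diagonal case
$$\min\ \Big\|Ax - \Big(b - \sum_{\ell=1}^{k} c^\ell \lambda_\ell\Big)\Big\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n,\ \lambda \in \mathbb{R}^k \qquad (P)$$
We wish to partition all $\lambda$ vectors based on the optimal support they yield.
In each cell of the partition, we want to have an ordering of all the
$$\min\Big\{\Big\|A^i x^i - \Big(b^i - \sum_{\ell=1}^{k} c^i_\ell \lambda_\ell\Big)\Big\|_2^2 \ :\ x^i \in \mathbb{R}^{n_i},\ |\operatorname{supp}(x^i)| \le j\Big\}, \quad \text{for all } i, j,$$
where $c^i_\ell$ denotes the part of $c^\ell$ in the rows of block $i$.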

35 Partitioning the space
$$(i, j):\quad \min\Big\{\Big\|A^i x^i - \Big(b^i - \sum_{\ell=1}^{k} c^i_\ell \lambda_\ell\Big)\Big\|_2^2 \ :\ x^i \in \mathbb{R}^{n_i},\ |\operatorname{supp}(x^i)| \le j\Big\}$$
Issue 1: There is no such polyhedral decomposition of the $\lambda$ space.
Why? There are quadratic functions involved.
Solution: Linearization. Instead of the $\lambda$ space, we define a new space $S$ where we can linearize any quadratic monomial.
The dimension of $S$ is $O(k^2)$.
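A short derivation shows why quadratic functions of $\lambda$ appear and how a lifting can linearize them. This is one natural way to realize the linearization and is offered as a sketch; the slides do not specify the space $S$, and the symbols $T$, $A_T$, $P_T$, $Q_T$, $w$ and $y_{\ell m}$ are notation introduced here.

```latex
% Fix a block i and a support T; A_T = columns of A^i indexed by T.
% P_T = orthogonal projector onto the column space of A_T, Q_T = I - P_T.
w(\lambda) = b^i - \sum_{\ell=1}^{k} c^i_\ell \lambda_\ell, \qquad
\min_{x_T} \|A_T x_T - w\|_2^2 = w^\top Q_T\, w .

w^\top Q_T\, w
  = (b^i)^\top Q_T\, b^i
  - 2 \sum_{\ell=1}^{k} \big((b^i)^\top Q_T\, c^i_\ell\big)\, \lambda_\ell
  + \sum_{\ell=1}^{k} \sum_{m=1}^{k} \big((c^i_\ell)^\top Q_T\, c^i_m\big)\, \lambda_\ell \lambda_m .

% Lifting: one new coordinate per quadratic monomial,
% y_{\ell m} = \lambda_\ell \lambda_m for \ell \le m.
% The expression above is then linear in (\lambda, y), and
(\lambda, y) \in \mathbb{R}^{k + k(k+1)/2}, \qquad k + \tfrac{k(k+1)}{2} = O(k^2) .
```

In the lifted coordinates, comparing two such expressions amounts to checking on which side of a hyperplane the point $(\lambda, y)$ lies.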

36 Partitioning the space
$$(i, j):\quad \min\Big\{\Big\|A^i x^i - \Big(b^i - \sum_{\ell=1}^{k} c^i_\ell \lambda_\ell\Big)\Big\|_2^2 \ :\ x^i \in \mathbb{R}^{n_i},\ |\operatorname{supp}(x^i)| \le j\Big\}$$
Issue 2: The objective value of each $(i, j)$ depends on its optimal support.
Solution: Enumeration. We partition $S$ into polyhedra $P^t \subseteq S$, such that in each of them every $(i, j)$ has a fixed optimal support.

37 Partitioning the space
$$(i, j):\quad \min\Big\{\Big\|A^i x^i - \Big(b^i - \sum_{\ell=1}^{k} c^i_\ell \lambda_\ell\Big)\Big\|_2^2 \ :\ x^i \in \mathbb{R}^{n_i},\ |\operatorname{supp}(x^i)| \le j\Big\}$$
How?
For each fixed support, the optimal value of every $(i, j)$ is a quadratic function.
By comparing all pairs of quadratic functions, we obtain a number of hyperplanes in $S$ that is $O(h)$ for fixed $\theta$.
By the hyperplane arrangement theorem, we obtain $h^{O(k^2)}$ polyhedra $P^t$ in $S$.
The best quadratic function among those corresponding to $(i, j)$ yields the optimal support of $(i, j)$.

38 Partitioning the space
$$(i, j):\quad \min\Big\{\Big\|A^i x^i - \Big(b^i - \sum_{\ell=1}^{k} c^i_\ell \lambda_\ell\Big)\Big\|_2^2 \ :\ x^i \in \mathbb{R}^{n_i},\ |\operatorname{supp}(x^i)| \le j\Big\}$$
Now we can apply the old partitioning procedure:
In each $P^t$ every expression is a linear function in $S$.
For every two such expressions, we subdivide all points in $S$ based on which is larger.
In total we obtain $O(h\,\theta^3)$ hyperplanes.
By the hyperplane arrangement theorem, we obtain $(h\,\theta^3)^{O(k^2)}$ polyhedra $Q^{t,u} \subseteq P^t$.

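39 Wrapping up the block diagonal case
$$\min\ \Big\|Ax - \Big(b - \sum_{\ell=1}^{k} c^\ell \lambda_\ell\Big)\Big\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n,\ \lambda \in \mathbb{R}^k$$
Consider one polyhedron $Q^{t,u}$.
We obtain an ordering of all the
$$\min\Big\{\Big\|A^i x^i - \Big(b^i - \sum_{\ell=1}^{k} c^i_\ell \lambda_\ell\Big)\Big\|_2^2 \ :\ x^i \in \mathbb{R}^{n_i},\ |\operatorname{supp}(x^i)| \le j\Big\}, \quad \text{for all } i, j,$$
that holds for all $\lambda \in Q^{t,u}$.
The algorithm for the simpler block diagonal problem yields the same optimal support for all $\lambda$ in $Q^{t,u}$.

40 Wrapping up the block diagonal case
$$\min\ \Big\|Ax - \Big(b - \sum_{\ell=1}^{k} c^\ell \lambda_\ell\Big)\Big\|_2^2 \quad \text{s.t.}\quad |\operatorname{supp}(x)| \le \sigma,\ x \in \mathbb{R}^n,\ \lambda \in \mathbb{R}^k$$
Let $X$ be the set containing all these $(h\,\theta^3)^{O(k^2)}$ supports.
For each $\chi \in X$, the original problem with the additional constraints
$$x_i = 0 \quad \text{for all } i \notin \chi$$
can be solved in polynomial time.
The original problem has an optimal solution with support contained in $\chi$ for some $\chi \in X$.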


Subset selection in sparse matrices

Subset selection in sparse matrices Subset selection in sparse matrices Alberto Del Pia Santanu S. Dey Robert Weismantel October 5, 2018 Abstract In subset selection we search for the best linear predictor that involves a small subset of

More information

Minimizing Cubic and Homogeneous Polynomials over Integers in the Plane

Minimizing Cubic and Homogeneous Polynomials over Integers in the Plane Minimizing Cubic and Homogeneous Polynomials over Integers in the Plane Alberto Del Pia Department of Industrial and Systems Engineering & Wisconsin Institutes for Discovery, University of Wisconsin-Madison

More information

Machine Learning. Regularization and Feature Selection. Fabio Vandin November 14, 2017

Machine Learning. Regularization and Feature Selection. Fabio Vandin November 14, 2017 Machine Learning Regularization and Feature Selection Fabio Vandin November 14, 2017 1 Regularized Loss Minimization Assume h is defined by a vector w = (w 1,..., w d ) T R d (e.g., linear models) Regularization

More information

Minimizing Cubic and Homogeneous Polynomials Over Integer Points in the Plane

Minimizing Cubic and Homogeneous Polynomials Over Integer Points in the Plane Minimizing Cubic and Homogeneous Polynomials Over Integer Points in the Plane Robert Hildebrand IFOR - ETH Zürich Joint work with Alberto Del Pia, Robert Weismantel and Kevin Zemmer Robert Hildebrand (ETH)

More information

Sparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda

Sparse regression. Optimization-Based Data Analysis.   Carlos Fernandez-Granda Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic

More information

Linear Methods for Regression. Lijun Zhang

Linear Methods for Regression. Lijun Zhang Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived

More information

Integer Quadratic Programming is in NP

Integer Quadratic Programming is in NP Alberto Del Pia 1 Santanu S. Dey 2 Marco Molinaro 2 1 IBM T. J. Watson Research Center, Yorktown Heights. 2 School of Industrial and Systems Engineering, Atlanta Outline 1 4 Program: Definition Definition

More information

Approximation Algorithms

Approximation Algorithms Approximation Algorithms Chapter 26 Semidefinite Programming Zacharias Pitouras 1 Introduction LP place a good lower bound on OPT for NP-hard problems Are there other ways of doing this? Vector programs

More information

An Introduction to Sparse Approximation

An Introduction to Sparse Approximation An Introduction to Sparse Approximation Anna C. Gilbert Department of Mathematics University of Michigan Basic image/signal/data compression: transform coding Approximate signals sparsely Compress images,

More information

SCMA292 Mathematical Modeling : Machine Learning. Krikamol Muandet. Department of Mathematics Faculty of Science, Mahidol University.

SCMA292 Mathematical Modeling : Machine Learning. Krikamol Muandet. Department of Mathematics Faculty of Science, Mahidol University. SCMA292 Mathematical Modeling : Machine Learning Krikamol Muandet Department of Mathematics Faculty of Science, Mahidol University February 9, 2016 Outline Quick Recap of Least Square Ridge Regression

More information

Machine Learning. Regularization and Feature Selection. Fabio Vandin November 13, 2017

Machine Learning. Regularization and Feature Selection. Fabio Vandin November 13, 2017 Machine Learning Regularization and Feature Selection Fabio Vandin November 13, 2017 1 Learning Model A: learning algorithm for a machine learning task S: m i.i.d. pairs z i = (x i, y i ), i = 1,..., m,

More information

Mathematical foundations - linear algebra

Mathematical foundations - linear algebra Mathematical foundations - linear algebra Andrea Passerini passerini@disi.unitn.it Machine Learning Vector space Definition (over reals) A set X is called a vector space over IR if addition and scalar

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

Final Review. Yang Feng.   Yang Feng (Columbia University) Final Review 1 / 58 Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple

More information

Dimension Reduction Methods

Dimension Reduction Methods Dimension Reduction Methods And Bayesian Machine Learning Marek Petrik 2/28 Previously in Machine Learning How to choose the right features if we have (too) many options Methods: 1. Subset selection 2.

More information

Distribution-Free Distribution Regression

Distribution-Free Distribution Regression Distribution-Free Distribution Regression Barnabás Póczos, Alessandro Rinaldo, Aarti Singh and Larry Wasserman AISTATS 2013 Presented by Esther Salazar Duke University February 28, 2014 E. Salazar (Reading

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information

Sparsity in Underdetermined Systems

Sparsity in Underdetermined Systems Sparsity in Underdetermined Systems Department of Statistics Stanford University August 19, 2005 Classical Linear Regression Problem X n y p n 1 > Given predictors and response, y Xβ ε = + ε N( 0, σ 2

More information

LMI MODELLING 4. CONVEX LMI MODELLING. Didier HENRION. LAAS-CNRS Toulouse, FR Czech Tech Univ Prague, CZ. Universidad de Valladolid, SP March 2009

LMI MODELLING 4. CONVEX LMI MODELLING. Didier HENRION. LAAS-CNRS Toulouse, FR Czech Tech Univ Prague, CZ. Universidad de Valladolid, SP March 2009 LMI MODELLING 4. CONVEX LMI MODELLING Didier HENRION LAAS-CNRS Toulouse, FR Czech Tech Univ Prague, CZ Universidad de Valladolid, SP March 2009 Minors A minor of a matrix F is the determinant of a submatrix

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee227c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee227c@berkeley.edu

More information

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for 1 A cautionary tale Notes for 2016-10-05 You have been dropped on a desert island with a laptop with a magic battery of infinite life, a MATLAB license, and a complete lack of knowledge of basic geometry.

More information

CS534 Machine Learning - Spring Final Exam

CS534 Machine Learning - Spring Final Exam CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the

More information

TECHNICAL REPORT NO. 1091r. A Note on the Lasso and Related Procedures in Model Selection

TECHNICAL REPORT NO. 1091r. A Note on the Lasso and Related Procedures in Model Selection DEPARTMENT OF STATISTICS University of Wisconsin 1210 West Dayton St. Madison, WI 53706 TECHNICAL REPORT NO. 1091r April 2004, Revised December 2004 A Note on the Lasso and Related Procedures in Model

More information

Sparsity of Matrix Canonical Forms. Xingzhi Zhan East China Normal University

Sparsity of Matrix Canonical Forms. Xingzhi Zhan East China Normal University Sparsity of Matrix Canonical Forms Xingzhi Zhan zhan@math.ecnu.edu.cn East China Normal University I. Extremal sparsity of the companion matrix of a polynomial Joint work with Chao Ma The companion matrix

More information

Least Sparsity of p-norm based Optimization Problems with p > 1

Least Sparsity of p-norm based Optimization Problems with p > 1 Least Sparsity of p-norm based Optimization Problems with p > Jinglai Shen and Seyedahmad Mousavi Original version: July, 07; Revision: February, 08 Abstract Motivated by l p -optimization arising from

More information

Compressed Sensing and Neural Networks

Compressed Sensing and Neural Networks and Jan Vybíral (Charles University & Czech Technical University Prague, Czech Republic) NOMAD Summer Berlin, September 25-29, 2017 1 / 31 Outline Lasso & Introduction Notation Training the network Applications

More information

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some

More information

MS-C1620 Statistical inference

MS-C1620 Statistical inference MS-C1620 Statistical inference 10 Linear regression III Joni Virta Department of Mathematics and Systems Analysis School of Science Aalto University Academic year 2018 2019 Period III - IV 1 / 32 Contents

More information

ISyE 691 Data mining and analytics

ISyE 691 Data mining and analytics ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)

More information

Linear regression methods

Linear regression methods Linear regression methods Most of our intuition about statistical methods stem from linear regression. For observations i = 1,..., n, the model is Y i = p X ij β j + ε i, j=1 where Y i is the response

More information

1 Solution of a Large-Scale Traveling-Salesman Problem... 7 George B. Dantzig, Delbert R. Fulkerson, and Selmer M. Johnson

1 Solution of a Large-Scale Traveling-Salesman Problem... 7 George B. Dantzig, Delbert R. Fulkerson, and Selmer M. Johnson Part I The Early Years 1 Solution of a Large-Scale Traveling-Salesman Problem............ 7 George B. Dantzig, Delbert R. Fulkerson, and Selmer M. Johnson 2 The Hungarian Method for the Assignment Problem..............

More information

Compressed Sensing in Cancer Biology? (A Work in Progress)

Compressed Sensing in Cancer Biology? (A Work in Progress) Compressed Sensing in Cancer Biology? (A Work in Progress) M. Vidyasagar FRS Cecil & Ida Green Chair The University of Texas at Dallas M.Vidyasagar@utdallas.edu www.utdallas.edu/ m.vidyasagar University

More information

6. Regularized linear regression

6. Regularized linear regression Foundations of Machine Learning École Centrale Paris Fall 2015 6. Regularized linear regression Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe agathe.azencott@mines paristech.fr

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

j=1 u 1jv 1j. 1/ 2 Lemma 1. An orthogonal set of vectors must be linearly independent.

j=1 u 1jv 1j. 1/ 2 Lemma 1. An orthogonal set of vectors must be linearly independent. Lecture Notes: Orthogonal and Symmetric Matrices Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong taoyf@cse.cuhk.edu.hk Orthogonal Matrix Definition. Let u = [u

More information

Ellipsoidal Mixed-Integer Representability

Ellipsoidal Mixed-Integer Representability Ellipsoidal Mixed-Integer Representability Alberto Del Pia Jeff Poskin September 15, 2017 Abstract Representability results for mixed-integer linear systems play a fundamental role in optimization since

More information

Submodularity in Machine Learning

Submodularity in Machine Learning Saifuddin Syed MLRG Summer 2016 1 / 39 What are submodular functions Outline 1 What are submodular functions Motivation Submodularity and Concavity Examples 2 Properties of submodular functions Submodularity

More information

Lecture 1 Introduction

Lecture 1 Introduction L. Vandenberghe EE236A (Fall 2013-14) Lecture 1 Introduction course overview linear optimization examples history approximate syllabus basic definitions linear optimization in vector and matrix notation

More information

Deciding Emptiness of the Gomory-Chvátal Closure is NP-Complete, Even for a Rational Polyhedron Containing No Integer Point

Deciding Emptiness of the Gomory-Chvátal Closure is NP-Complete, Even for a Rational Polyhedron Containing No Integer Point Deciding Emptiness of the Gomory-Chvátal Closure is NP-Complete, Even for a Rational Polyhedron Containing No Integer Point Gérard Cornuéjols 1 and Yanjun Li 2 1 Tepper School of Business, Carnegie Mellon

More information

Quick Tour of Linear Algebra and Graph Theory

Quick Tour of Linear Algebra and Graph Theory Quick Tour of Linear Algebra and Graph Theory CS224w: Social and Information Network Analysis Fall 2012 Yu Wayne Wu Based on Borja Pelato s version in Fall 2011 Matrices and Vectors Matrix: A rectangular

More information

Nonlinear Programming Models

Nonlinear Programming Models Nonlinear Programming Models Fabio Schoen 2008 http://gol.dsi.unifi.it/users/schoen Nonlinear Programming Models p. Introduction Nonlinear Programming Models p. NLP problems minf(x) x S R n Standard form:

More information

Algebraic and Geometric ideas in the theory of Discrete Optimization

Algebraic and Geometric ideas in the theory of Discrete Optimization Algebraic and Geometric ideas in the theory of Discrete Optimization Jesús A. De Loera, UC Davis Three Lectures based on the book: Algebraic & Geometric Ideas in the Theory of Discrete Optimization (SIAM-MOS

More information

ORIE 4741: Learning with Big Messy Data. Regularization

ORIE 4741: Learning with Big Messy Data. Regularization ORIE 4741: Learning with Big Messy Data Regularization Professor Udell Operations Research and Information Engineering Cornell October 26, 2017 1 / 24 Regularized empirical risk minimization choose model

More information

The deterministic Lasso

The deterministic Lasso The deterministic Lasso Sara van de Geer Seminar für Statistik, ETH Zürich Abstract We study high-dimensional generalized linear models and empirical risk minimization using the Lasso An oracle inequality

More information

Gradient Descent. Ryan Tibshirani Convex Optimization /36-725

Gradient Descent. Ryan Tibshirani Convex Optimization /36-725 Gradient Descent Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: canonical convex programs Linear program (LP): takes the form min x subject to c T x Gx h Ax = b Quadratic program (QP): like

More information

Is the test error unbiased for these programs? 2017 Kevin Jamieson

Is the test error unbiased for these programs? 2017 Kevin Jamieson Is the test error unbiased for these programs? 2017 Kevin Jamieson 1 Is the test error unbiased for this program? 2017 Kevin Jamieson 2 Simple Variable Selection LASSO: Sparse Regression Machine Learning

More information

Randomness-in-Structured Ensembles for Compressed Sensing of Images

Randomness-in-Structured Ensembles for Compressed Sensing of Images Randomness-in-Structured Ensembles for Compressed Sensing of Images Abdolreza Abdolhosseini Moghadam Dep. of Electrical and Computer Engineering Michigan State University Email: abdolhos@msu.edu Hayder

More information

Introduction to Machine Learning. Regression. Computer Science, Tel-Aviv University,

Introduction to Machine Learning. Regression. Computer Science, Tel-Aviv University, 1 Introduction to Machine Learning Regression Computer Science, Tel-Aviv University, 2013-14 Classification Input: X Real valued, vectors over real. Discrete values (0,1,2,...) Other structures (e.g.,

More information

CSCI 1951-G Optimization Methods in Finance Part 10: Conic Optimization

CSCI 1951-G Optimization Methods in Finance Part 10: Conic Optimization CSCI 1951-G Optimization Methods in Finance Part 10: Conic Optimization April 6, 2018 1 / 34 This material is covered in the textbook, Chapters 9 and 10. Some of the materials are taken from it. Some of

More information

10725/36725 Optimization Homework 4

10725/36725 Optimization Homework 4 10725/36725 Optimization Homework 4 Due November 27, 2012 at beginning of class Instructions: There are four questions in this assignment. Please submit your homework as (up to) 4 separate sets of pages

More information

CSC 576: Variants of Sparse Learning

CSC 576: Variants of Sparse Learning CSC 576: Variants of Sparse Learning Ji Liu Department of Computer Science, University of Rochester October 27, 205 Introduction Our previous note basically suggests using l norm to enforce sparsity in

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

Math Matrix Algebra

Math Matrix Algebra Math 44 - Matrix Algebra Review notes - (Alberto Bressan, Spring 7) sec: Orthogonal diagonalization of symmetric matrices When we seek to diagonalize a general n n matrix A, two difficulties may arise:

More information

https://goo.gl/kfxweg KYOTO UNIVERSITY Statistical Machine Learning Theory Sparsity Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT OF INTELLIGENCE SCIENCE AND TECHNOLOGY 1 KYOTO UNIVERSITY Topics:

More information

Large-Scale L1-Related Minimization in Compressive Sensing and Beyond

Large-Scale L1-Related Minimization in Compressive Sensing and Beyond Large-Scale L1-Related Minimization in Compressive Sensing and Beyond Yin Zhang Department of Computational and Applied Mathematics Rice University, Houston, Texas, U.S.A. Arizona State University March

More information

Shrinkage Methods: Ridge and Lasso

Shrinkage Methods: Ridge and Lasso Shrinkage Methods: Ridge and Lasso Jonathan Hersh 1 Chapman University, Argyros School of Business hersh@chapman.edu February 27, 2019 J.Hersh (Chapman) Ridge & Lasso February 27, 2019 1 / 43 1 Intro and

More information

Direct Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina

Direct Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina Direct Learning: Linear Regression Parametric learning We consider the core function in the prediction rule to be a parametric function. The most commonly used function is a linear function: squared loss:

More information

Conic optimization under combinatorial sparsity constraints

Conic optimization under combinatorial sparsity constraints Conic optimization under combinatorial sparsity constraints Christoph Buchheim and Emiliano Traversi Abstract We present a heuristic approach for conic optimization problems containing sparsity constraints.

More information

Convex optimization. Javier Peña Carnegie Mellon University. Universidad de los Andes Bogotá, Colombia September 2014

Convex optimization. Javier Peña Carnegie Mellon University. Universidad de los Andes Bogotá, Colombia September 2014 Convex optimization Javier Peña Carnegie Mellon University Universidad de los Andes Bogotá, Colombia September 2014 1 / 41 Convex optimization Problem of the form where Q R n convex set: min x f(x) x Q,

More information

Minimizing the Difference of L 1 and L 2 Norms with Applications

Minimizing the Difference of L 1 and L 2 Norms with Applications 1/36 Minimizing the Difference of L 1 and L 2 Norms with Department of Mathematical Sciences University of Texas Dallas May 31, 2017 Partially supported by NSF DMS 1522786 2/36 Outline 1 A nonconvex approach:

More information

Data Mining Stat 588

Data Mining Stat 588 Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic

More information

1 Positive definiteness and semidefiniteness

1 Positive definiteness and semidefiniteness Positive definiteness and semidefiniteness Zdeněk Dvořák May 9, 205 For integers a, b, and c, let D(a, b, c) be the diagonal matrix with + for i =,..., a, D i,i = for i = a +,..., a + b,. 0 for i = a +

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

Linear Regression Models. Based on Chapter 3 of Hastie, Tibshirani and Friedman

Linear Regression Models. Based on Chapter 3 of Hastie, Tibshirani and Friedman Linear Regression Models Based on Chapter 3 of Hastie, ibshirani and Friedman Linear Regression Models Here the X s might be: p f ( X = " + " 0 j= 1 X j Raw predictor variables (continuous or coded-categorical

More information

COURSE ON LMI PART I.2 GEOMETRY OF LMI SETS. Didier HENRION henrion

COURSE ON LMI PART I.2 GEOMETRY OF LMI SETS. Didier HENRION   henrion COURSE ON LMI PART I.2 GEOMETRY OF LMI SETS Didier HENRION www.laas.fr/ henrion October 2006 Geometry of LMI sets Given symmetric matrices F i we want to characterize the shape in R n of the LMI set F

More information

Network Flows. 6. Lagrangian Relaxation. Programming. Fall 2010 Instructor: Dr. Masoud Yaghini

Network Flows. 6. Lagrangian Relaxation. Programming. Fall 2010 Instructor: Dr. Masoud Yaghini In the name of God Network Flows 6. Lagrangian Relaxation 6.3 Lagrangian Relaxation and Integer Programming Fall 2010 Instructor: Dr. Masoud Yaghini Integer Programming Outline Branch-and-Bound Technique

More information

A Tutorial on Compressive Sensing. Simon Foucart Drexel University / University of Georgia

A Tutorial on Compressive Sensing. Simon Foucart Drexel University / University of Georgia A Tutorial on Compressive Sensing Simon Foucart Drexel University / University of Georgia CIMPA13 New Trends in Applied Harmonic Analysis Mar del Plata, Argentina, 5-16 August 2013 This minicourse acts

More information

The Ellipsoid Algorithm

The Ellipsoid Algorithm The Ellipsoid Algorithm John E. Mitchell Department of Mathematical Sciences RPI, Troy, NY 12180 USA 9 February 2018 Mitchell The Ellipsoid Algorithm 1 / 28 Introduction Outline 1 Introduction 2 Assumptions

Expression Data Exploration: Association, Patterns, Factors & Regression Modelling

Expression Data Exploration: Association, Patterns, Factors & Regression Modelling Expression Data Exploration: Association, Patterns, Factors & Regression Modelling Exploring gene expression data Scale factors, median chip correlation on gene subsets for crude data quality investigation

Machine Learning for Biomedical Engineering. Enrico Grisan

Machine Learning for Biomedical Engineering. Enrico Grisan Machine Learning for Biomedical Engineering Enrico Grisan enrico.grisan@dei.unipd.it Curse of dimensionality Why are more features bad? Redundant features (useless or confounding) Hard to interpret and

Representations of All Solutions of Boolean Programming Problems

Representations of All Solutions of Boolean Programming Problems Representations of All Solutions of Boolean Programming Problems Utz-Uwe Haus and Carla Michini Institute for Operations Research Department of Mathematics ETH Zurich Rämistr. 101, 8092 Zürich, Switzerland

Symmetries in Experimental Design and Group Lasso Kentaro Tanaka and Masami Miyakawa

Symmetries in Experimental Design and Group Lasso Kentaro Tanaka and Masami Miyakawa Symmetries in Experimental Design and Group Lasso Kentaro Tanaka and Masami Miyakawa Workshop on computational and algebraic methods in statistics March 3-5, Sanjo Conference Hall, Hongo Campus, University

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 4 for Applied Multivariate Analysis Outline 1 Eigen values and eigen vectors Characteristic equation Some properties of eigendecompositions

Lecture 14: Shrinkage

Lecture 14: Shrinkage Lecture 14: Shrinkage Reading: Section 6.2 STATS 202: Data mining and analysis October 27, 2017 1 / 19 Shrinkage methods The idea is to perform a linear regression, while regularizing or shrinking the

The Strength of Multi-row Aggregation Cuts for Sign-pattern Integer Programs

The Strength of Multi-row Aggregation Cuts for Sign-pattern Integer Programs The Strength of Multi-row Aggregation Cuts for Sign-pattern Integer Programs Santanu S. Dey 1, Andres Iroume 1, and Guanyi Wang 1 1 School of Industrial and Systems Engineering, Georgia Institute of Technology

Sparse Interactions: Identifying High-Dimensional Multilinear Systems via Compressed Sensing

Sparse Interactions: Identifying High-Dimensional Multilinear Systems via Compressed Sensing Sparse Interactions: Identifying High-Dimensional Multilinear Systems via Compressed Sensing Bobak Nazer and Robert D. Nowak University of Wisconsin, Madison Allerton 10/01/10 Motivation: Virus-Host Interaction

Valid Inequalities for Optimal Transmission Switching

Valid Inequalities for Optimal Transmission Switching Valid Inequalities for Optimal Transmission Switching Hyemin Jeon Jeff Linderoth Jim Luedtke Dept. of ISyE UW-Madison Burak Kocuk Santanu Dey Andy Sun Dept. of ISyE Georgia Tech 19th Combinatorial Optimization

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix Scientific Computing with Case Studies SIAM Press, 2009 http://www.cs.umd.edu/users/oleary/sccswebpage Lecture Notes for Unit VII Sparse Matrix Computations Part 1: Direct Methods Dianne P. O Leary c 2008

Absolute value equations

Absolute value equations Linear Algebra and its Applications 419 (2006) 359 367 www.elsevier.com/locate/laa Absolute value equations O.L. Mangasarian, R.R. Meyer Computer Sciences Department, University of Wisconsin, 1210 West

SDP Relaxations for MAXCUT

SDP Relaxations for MAXCUT SDP Relaxations for MAXCUT from Random Hyperplanes to Sum-of-Squares Certificates CATS @ UMD March 3, 2017 Ahmed Abdelkader MAXCUT SDP SOS March 3, 2017 1 / 27 Overview 1 MAXCUT, Hardness and UGC 2 LP

Second-order cone programming

Second-order cone programming Outline Second-order cone programming, PhD Lehigh University Department of Industrial and Systems Engineering February 10, 2009 Outline 1 Basic properties Spectral decomposition The cone of squares The

Notes on the decomposition result of Karlin et al. [2] for the hierarchy of Lasserre by M. Laurent, December 13, 2012

Notes on the decomposition result of Karlin et al. [2] for the hierarchy of Lasserre by M. Laurent, December 13, 2012 Notes on the decomposition result of Karlin et al. [2] for the hierarchy of Lasserre by M. Laurent, December 13, 2012 We present the decomposition result of Karlin et al. [2] for the hierarchy of Lasserre

Warm up: Fix any a, b, c > 0. What is the $x \in \mathbb{R}$ that minimizes $ax^2 + bx + c$

Warm up. Fix any a, b, c > 0. 1. What is the $x \in \mathbb{R}$ that minimizes $ax^2 + bx + c$? 2. What is

COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017

COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017 COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University UNDERDETERMINED LINEAR EQUATIONS We

Convex envelopes, cardinality constrained optimization and LASSO. An application in supervised learning: support vector machines (SVMs)

Convex envelopes, cardinality constrained optimization and LASSO. An application in supervised learning: support vector machines (SVMs) ORF 523 Lecture 8 Princeton University Instructor: A.A. Ahmadi Scribe: G. Hall Any typos should be emailed to a a a@princeton.edu. 1 Outline Convexity-preserving operations Convex envelopes, cardinality

On mixed-integer sets with two integer variables

On mixed-integer sets with two integer variables On mixed-integer sets with two integer variables Sanjeeb Dash IBM Research sanjeebd@us.ibm.com Santanu S. Dey Georgia Inst. Tech. santanu.dey@isye.gatech.edu September 8, 2010 Oktay Günlük IBM Research

Lecture 4: Newton s method and gradient descent

Lecture 4: Newton s method and gradient descent Lecture 4: Newton s method and gradient descent Newton s method Functional iteration Fitting linear regression Fitting logistic regression Prof. Yao Xie, ISyE 6416, Computational Statistics, Georgia Tech

Introduction to Sparsity. Xudong Cao, Jake Dreamtree & Jerry 04/05/2012

Introduction to Sparsity. Xudong Cao, Jake Dreamtree & Jerry 04/05/2012 Introduction to Sparsity Xudong Cao, Jake Dreamtree & Jerry 04/05/2012 Outline Understanding Sparsity Total variation Compressed sensing(definition) Exact recovery with sparse prior(l 0 ) l 1 relaxation

COMS 4771 Lecture Fixed-design linear regression 2. Ridge and principal components regression 3. Sparse regression and Lasso

COMS 4771 Lecture Fixed-design linear regression 2. Ridge and principal components regression 3. Sparse regression and Lasso COMS 477 Lecture 6. Fixed-design linear regression 2. Ridge and principal components regression 3. Sparse regression and Lasso / 2 Fixed-design linear regression Fixed-design linear regression A simplified

Approximate Message Passing

Approximate Message Passing Approximate Message Passing Mohammad Emtiyaz Khan CS, UBC February 8, 2012 Abstract In this note, I summarize Sections 5.1 and 5.2 of Arian Maleki s PhD thesis. 1 Notation We denote scalars by small letters

Sparse analysis Lecture VII: Combining geometry and combinatorics, sparse matrices for sparse signal recovery

Sparse analysis Lecture VII: Combining geometry and combinatorics, sparse matrices for sparse signal recovery Sparse analysis Lecture VII: Combining geometry and combinatorics, sparse matrices for sparse signal recovery Anna C. Gilbert Department of Mathematics University of Michigan Sparse signal recovery measurements:

Regularization: Ridge Regression and the LASSO

Regularization: Ridge Regression and the LASSO Agenda Wednesday, November 29, 2006 Agenda Agenda 1 The Bias-Variance Tradeoff 2 Ridge Regression Solution to the l 2 problem Data Augmentation Approach Bayesian Interpretation The SVD and Ridge Regression

Online Dictionary Learning with Group Structure Inducing Norms

Online Dictionary Learning with Group Structure Inducing Norms Online Dictionary Learning with Group Structure Inducing Norms Zoltán Szabó 1, Barnabás Póczos 2, András Lőrincz 1 1 Eötvös Loránd University, Budapest, Hungary 2 Carnegie Mellon University, Pittsburgh,

Chordal structure in computer algebra: Permanents

Chordal structure in computer algebra: Permanents Chordal structure in computer algebra: Permanents Diego Cifuentes Laboratory for Information and Decision Systems Electrical Engineering and Computer Science Massachusetts Institute of Technology Joint

6-1 The Positivstellensatz P. Parrilo and S. Lall, ECC

6-1 The Positivstellensatz P. Parrilo and S. Lall, ECC 6-1 The Positivstellensatz P. Parrilo and S. Lall, ECC 2003 2003.09.02.10 6. The Positivstellensatz Basic semialgebraic sets Semialgebraic sets Tarski-Seidenberg and quantifier elimination Feasibility

Outline. Remedial Measures) Extra Sums of Squares Standardized Version of the Multiple Regression Model

Outline. Remedial Measures) Extra Sums of Squares Standardized Version of the Multiple Regression Model Outline 1 Multiple Linear Regression (Estimation, Inference, Diagnostics and Remedial Measures) 2 Special Topics for Multiple Regression Extra Sums of Squares Standardized Version of the Multiple Regression

Machine Learning Linear Regression. Prof. Matteo Matteucci

Machine Learning Linear Regression. Prof. Matteo Matteucci Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares

Week 8. 1 LP is easy: the Ellipsoid Method

Week 8. 1 LP is easy: the Ellipsoid Method Week 8 1 LP is easy: the Ellipsoid Method In 1979 Khachyan proved that LP is solvable in polynomial time by a method of shrinking ellipsoids. The running time is polynomial in the number of variables n,

Functional Analysis Review

Functional Analysis Review Outline 9.520: Statistical Learning Theory and Applications February 8, 2010 Outline 1 2 3 4 Vector Space Outline A vector space is a set V with binary operations +: V V V and : R V V such that for all
