Sparsity Models
Tong Zhang, Rutgers University

Topics
Standard sparse regression model
  algorithms: convex relaxation and greedy algorithms
  sparse recovery analysis: high-level view
Some extensions (complex regularization)
  structured sparsity
  graphical models
  matrix regularization

Modern Sparsity Analysis: Motivation
Modern datasets are often high dimensional
  statistical estimation suffers from the curse of dimensionality
Sparsity: a popular assumption for addressing the curse of dimensionality
  motivated by real applications
Challenges:
  formulation, with a focus on efficient computation
  mathematical analysis

Standard Sparse Regression
Model: $Y = X\bar\beta + \epsilon$
  $Y \in R^n$: observation
  $X \in R^{n \times p}$: design matrix
  $\bar\beta \in R^p$: parameter vector to be estimated
  $\epsilon \in R^n$: zero-mean stochastic noise with variance $\sigma^2$
High-dimensional setting: $n \ll p$
Sparsity: $\bar\beta$ has few nonzero components
  $\mathrm{supp}(\bar\beta) = \{j : \bar\beta_j \neq 0\}$
  $\|\bar\beta\|_0 = |\mathrm{supp}(\bar\beta)|$ is small: $\ll n$
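A minimal sketch (not part of the slides) of simulating data from this model with numpy; the sizes n, p, the sparsity level s, and the noise level are illustrative choices only. Later sketches reuse X, Y, and beta_bar from this snippet.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s, sigma = 100, 1000, 5, 0.5     # illustrative sizes: n << p, ||beta_bar||_0 = s << n

X = rng.standard_normal((n, p))                               # design matrix
beta_bar = np.zeros(p)                                        # true sparse parameter
support = rng.choice(p, size=s, replace=False)
beta_bar[support] = rng.choice([-1.0, 1.0], size=s) * (1.0 + rng.random(s))
eps = sigma * rng.standard_normal(n)                          # zero-mean noise, variance sigma^2
Y = X @ beta_bar + eps                                        # observation Y = X beta_bar + eps
```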

Algorithms for Standard Sparsity
$L_0$ regularization: the natural method (computationally inefficient)
  $\hat\beta_{L_0} = \arg\min_\beta \|Y - X\beta\|_2^2$ subject to $\|\beta\|_0 \le k$
$L_1$ regularization (Lasso): convex relaxation (computationally efficient)
  $\hat\beta_{L_1} = \arg\min_\beta \big[\|Y - X\beta\|_2^2 + \lambda \|\beta\|_1\big]$
Theoretical question: how well can we estimate the parameter $\bar\beta$ (recovery performance)?
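As a hedged illustration (not from the slides), the Lasso can be fit with scikit-learn; note that scikit-learn minimizes $(1/2n)\|Y - X\beta\|_2^2 + \alpha\|\beta\|_1$, so its alpha corresponds to $\lambda$ only up to the $1/(2n)$ scaling of the loss, and the value used here is arbitrary.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Reuses X, Y, beta_bar from the simulation sketch above.
lasso = Lasso(alpha=0.1, fit_intercept=False, max_iter=10000)
lasso.fit(X, Y)
beta_hat = lasso.coef_

print("estimated support:", np.flatnonzero(beta_hat))
print("true support:     ", np.sort(np.flatnonzero(beta_bar)))
print("||beta_hat - beta_bar||_2^2 =", float(np.sum((beta_hat - beta_bar) ** 2)))
```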

Greedy Algorithms for Standard Sparse Regularization
Reformulation: find a variable set $F \subset \{1, \dots, p\}$ to minimize
  $\min_\beta \|X\beta - Y\|_2^2$ subject to $\mathrm{supp}(\beta) \subset F$, $|F| \le k$
Forward greedy algorithm (OMP): select variables one by one
  Initialize the variable set $F_0 = \emptyset$ at $k = 0$
  Iterate $k = 1, \dots, p$:
    find the best variable $j$ to add to $F_{k-1}$ (maximum reduction of squared error)
    $F_k = F_{k-1} \cup \{j\}$
  Terminate with some criterion; output $\hat\beta$ by least-squares regression on the selected variables $F_k$
Theoretical question: recovery performance?
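A minimal sketch of the forward greedy (OMP) loop described above; it assumes roughly unit-norm columns of X (so the largest absolute correlation with the residual coincides with the largest reduction in squared error), and the fixed number of steps k is an illustrative stopping rule.

```python
import numpy as np

def omp(X, Y, k):
    """Forward greedy selection (OMP): repeatedly add the column most correlated
    with the current residual, then refit by least squares on the selected set."""
    n, p = X.shape
    F = []                                   # selected variable set
    residual = Y.copy()
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(X.T @ residual)))           # best variable to add
        if j in F:
            break
        F.append(j)
        coef, *_ = np.linalg.lstsq(X[:, F], Y, rcond=None)   # refit on selected variables
        residual = Y - X[:, F] @ coef
    beta_hat = np.zeros(p)
    beta_hat[F] = coef
    return beta_hat, F

# beta_hat_omp, F_hat = omp(X, Y, k=5)       # reuses X, Y from the earlier sketch
```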

Conditions and Results
Types of results (sparse recovery):
  Variable selection (can we find the nonzero variables): can we recover the true support $\bar F$? Is $\mathrm{supp}(\hat\beta) = \bar F$?
  Parameter estimation (how well can we estimate $\bar\beta$): can we recover the parameters? How small is $\|\hat\beta - \bar\beta\|_2^2$?
Are efficient algorithms (such as $L_1$ or OMP) good enough?
Yes, but they require conditions:
  Irrepresentable condition: for support recovery
  RIP (restricted isometry property): for parameter recovery

KKT Condition for the Lasso Solution
Lasso solution: $\hat\beta_{L_1} = \arg\min_\beta \big[\|Y - X\beta\|_2^2 + \lambda \|\beta\|_1\big]$
KKT condition at $\hat\beta = \hat\beta_{L_1}$: there exists a subgradient that vanishes, i.e., for all $j = 1, \dots, p$ ($X_j$ is the $j$-th column of $X$):
  $2 X_j^T (X\hat\beta - Y) + \lambda g_j = 0$ for some $g_j \in \partial|\hat\beta_j|$.
Subgradient of the $L_1$ norm: $\partial|u| = \mathrm{sign}(u)$, i.e., $1$ if $u > 0$, $-1$ if $u < 0$, and $[-1, 1]$ if $u = 0$.
If we can find a $\hat\beta$ that satisfies the KKT condition, then it is a Lasso solution. A slightly stronger condition implies uniqueness.
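A small numerical check of this stationarity condition at a fitted Lasso solution (a sketch, not part of the slides); the mapping $\lambda = 2 n \alpha$ between the slide's objective and scikit-learn's $1/(2n)$-scaled loss, and the tolerances, are assumptions of the sketch.

```python
import numpy as np
from sklearn.linear_model import Lasso

# KKT for the objective ||Y - X b||_2^2 + lambda ||b||_1:
#   2 X_j^T (X b - Y) = -lambda * sign(b_j)   if b_j != 0
#   |2 X_j^T (X b - Y)| <= lambda             if b_j == 0
alpha = 0.1
lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000).fit(X, Y)
b = lasso.coef_
lam = 2 * X.shape[0] * alpha                 # lambda corresponding to scikit-learn's alpha

grad = 2 * X.T @ (X @ b - Y)                 # gradient of the squared loss
active = b != 0
ok_active = np.allclose(grad[active], -lam * np.sign(b[active]), atol=1e-3 * lam)
ok_inactive = bool(np.all(np.abs(grad[~active]) <= lam * (1 + 1e-6)))
print("KKT on active set:", ok_active, "| KKT on inactive set:", ok_inactive)
```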

Feature Selection Consistency of Lasso
Idea: construct a solution and check the KKT condition.
Define $\hat\beta$ such that $\hat\beta_{\bar F}$ satisfies
  $2 X_{\bar F}^T (X_{\bar F} \hat\beta_{\bar F} - Y) + \lambda\, \mathrm{sign}(\bar\beta)_{\bar F} = 0$,
and set $\hat\beta_{\bar F^c} = 0$.
Condition A:
  $X_{\bar F}^T X_{\bar F}$ is full rank
  $\mathrm{sign}(\hat\beta_{\bar F}) = \mathrm{sign}(\bar\beta_{\bar F})$
  $2 |X_j^T (X_{\bar F} \hat\beta_{\bar F} - Y)| < \lambda$ for $j \notin \bar F$.
Under Condition A, $\hat\beta$ is the unique Lasso solution (it satisfies the KKT condition).

Irrepresentable Condition
The condition
  $\mu = \sup_{j \notin \bar F} \big|X_j^T X_{\bar F} (X_{\bar F}^T X_{\bar F})^{-1} \mathrm{sign}(\bar\beta)_{\bar F}\big| < 1$
is called the irrepresentable condition. It implies Condition A when $Y = X\bar\beta$ and $\lambda$ is sufficiently small.
Under the irrepresentable condition, if
  the noise is sufficiently small, and
  $\min_{j \in \bar F} |\bar\beta_j|$ is larger than the noise level,
then there exists an appropriate $\lambda$ such that Condition A holds. Thus the Lasso solution is unique and feature selection consistent.
A condition similar to the irrepresentable condition can be derived for OMP.
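For a concrete design matrix, the constant $\mu$ can be computed directly; the sketch below (not from the slides) assumes the support is taken from a known beta_bar, as in the earlier simulation.

```python
import numpy as np

def irrepresentable_constant(X, beta_bar):
    """mu = max_{j not in F} |X_j^T X_F (X_F^T X_F)^{-1} sign(beta_bar)_F|,
    where F is the support of beta_bar; mu < 1 is the irrepresentable condition."""
    F = np.flatnonzero(beta_bar)
    not_F = np.setdiff1d(np.arange(X.shape[1]), F)
    X_F = X[:, F]
    w = np.linalg.solve(X_F.T @ X_F, np.sign(beta_bar[F]))
    return float(np.max(np.abs(X[:, not_F].T @ (X_F @ w))))

# mu = irrepresentable_constant(X, beta_bar)   # reuses X, beta_bar from the earlier sketch
# print("mu =", mu, "-> condition", "holds" if mu < 1 else "fails")
```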

RIP Conditions
Feature selection consistency implies good parameter estimation. However, the irrepresentable condition is too strong.
RIP (restricted isometry property): a weaker condition that can be used to obtain parameter estimation results.
Definition of RIP: for some $c > 1$, the restricted condition number $\rho_+(c k)/\rho_-(c k)$ is bounded by a suitable constant, with $k = |\bar F| = \|\bar\beta\|_0$, where
  $\rho_+(s) = \sup_\beta \big\{ \beta^T X^T X \beta / \beta^T \beta : \|\beta\|_0 \le s \big\}$
  $\rho_-(s) = \inf_\beta \big\{ \beta^T X^T X \beta / \beta^T \beta : \|\beta\|_0 \le s \big\}$
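The restricted eigenvalues $\rho_\pm(s)$ can be computed by brute force for very small problems, which can help build intuition; this sketch (not part of the slides) enumerates all supports of size $s$ and is only feasible for small $p$ and $s$.

```python
import numpy as np
from itertools import combinations

def restricted_eigenvalues(X, s):
    """Brute-force rho_+(s), rho_-(s): extreme eigenvalues of X_F^T X_F over all
    supports F with |F| = s (covers all ||beta||_0 <= s by eigenvalue interlacing)."""
    p = X.shape[1]
    rho_plus, rho_minus = 0.0, np.inf
    for F in combinations(range(p), s):
        eig = np.linalg.eigvalsh(X[:, F].T @ X[:, F])
        rho_plus = max(rho_plus, eig[-1])
        rho_minus = min(rho_minus, eig[0])
    return rho_plus, rho_minus

# Small illustrative design only (enumeration is exponential in general):
# X_small = np.random.default_rng(0).standard_normal((50, 12)) / np.sqrt(50)
# print(restricted_eigenvalues(X_small, s=3))
```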

Results under the Restricted Isometry Property
Parameter estimation under RIP:
  $\|\bar\beta - \hat\beta\|_2^2 = O(\sigma^2 \|\bar\beta\|_0 \ln p / n)$, where $\sigma^2$ is the noise variance
  this result can be obtained both for Lasso and for OMP
  it is the best possible rate
Feature selection under RIP: neither procedure achieves feature selection consistency.
Improvement:
  a non-convex formulation is needed for optimal feature selection under RIP
  trickier to analyze: a general theory appeared only very recently

Complex Regularization: Structured Sparsity
Wavelet domain: the sparsity pattern is not random (structured)
[Figure: image domain vs. wavelet domain]
Can we take advantage of the structure?

Structured Sparsity Characterization
Observation:
  the sparsity pattern is the set of nonzero coefficients
  not all sparse patterns are equally likely
Our proposal: an information-theoretical characterization of structure: a sparsity pattern $F$ is associated with a cost $c(F)$
  $c(F)$ is the negative log-likelihood of $F$ (or a multiple of it)
Optimization problem:
  $\min_\beta \|X\beta - Y\|_2^2$ subject to $\|\beta\|_0 + c(\mathrm{supp}(\beta)) \le s$
  $c(\mathrm{supp}(\beta))$: cost for selecting the support $\mathrm{supp}(\beta)$
  $\|\beta\|_0$: cost for estimation after feature selection

Example: Group Structure
Variables are divided into pre-defined groups $G_1, \dots, G_{p/m}$, with $m$ variables per group.
[Figure, $m = 4$: groups $G_1, G_2, G_4, \dots, G_{p/m}$; nodes are variables, gray nodes are selected variables (groups 1, 2, 4)]
Assumption: coefficients are not completely random
  coefficients in each group are simultaneously (or nearly simultaneously) zero or nonzero
How can we take advantage of the group structure?

Example: Group Structure
Variables are divided into pre-defined groups $G_1, \dots, G_{p/m}$, with $m$ variables per group.
Assumption: coefficients in each group are simultaneously zero or nonzero.
Group sparsity pattern cost: $\|\bar\beta\|_0 + m^{-1} \|\bar\beta\|_0 \ln p$.
Standard sparsity pattern cost (for Lasso): $\|\bar\beta\|_0 \ln p$.
Theoretical question: can we take advantage of the group sparsity structure to improve on Lasso?

Convex Relaxation for Group Sparsity
$L_1$-$L_2$ convex relaxation (group Lasso):
  $\hat\beta = \arg\min_\beta \big[\|X\beta - Y\|_2^2 + \lambda \sum_j \|\beta_{G_j}\|_2\big]$.
This is supposed to take advantage of the group sparsity structure:
  within a group: $L_2$ regularization (does not encourage sparsity)
  across groups: $L_1$ regularization (encourages sparsity)
Question: what is the benefit of the group Lasso formulation?
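A minimal proximal-gradient sketch of this group Lasso objective (not part of the slides); the step size, iteration count, and the consecutive-group layout in the usage comment are illustrative assumptions.

```python
import numpy as np

def group_lasso(X, Y, groups, lam, n_iter=500):
    """Proximal gradient for min_b ||X b - Y||_2^2 + lam * sum_j ||b_{G_j}||_2.
    `groups` is a list of index arrays G_1, ..., G_{p/m}."""
    p = X.shape[1]
    b = np.zeros(p)
    step = 1.0 / (2 * np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant of the loss gradient
    for _ in range(n_iter):
        z = b - step * (2 * X.T @ (X @ b - Y))     # gradient step on the squared loss
        for G in groups:                           # proximal step = block soft-thresholding
            norm = np.linalg.norm(z[G])
            z[G] = 0.0 if norm <= step * lam else (1 - step * lam / norm) * z[G]
        b = z
    return b

# Example: consecutive groups of size m (illustrative):
# m = 4; groups = [np.arange(i, i + m) for i in range(0, X.shape[1], m)]
# beta_group = group_lasso(X, Y, groups, lam=5.0)
```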

Recovery Analysis for Lasso and Group Lasso
Simple sparsity: $s = \|\bar\beta\|_0$ variables out of $p$ variables
  information-theoretical complexity (log of the number of choices): $O(s \ln p)$
  statistical recovery performance: $\|\hat\beta - \bar\beta\|_2^2 = O(\sigma^2 \|\bar\beta\|_0 \ln p / n)$
Group sparsity: $g$ groups out of $p/m$ groups (ideally $g = \|\bar\beta\|_0 / m$)
  information-theoretical complexity (log of the number of choices): $O(g \ln(p/m))$
  statistical recovery performance for group Lasso: if $\mathrm{supp}(\bar\beta)$ is covered by $g$ groups, then under group RIP (weaker than RIP)
    $\|\hat\beta - \bar\beta\|_2^2 = O\Big(\frac{\sigma^2}{n}\big(\underbrace{g \ln(p/m)}_{\text{group selection}} + \underbrace{m g}_{\text{estimation after group selection}}\big)\Big)$

Group Sparsity: Correct Group Structure
[Figure: (a) Original, (b) Lasso, (c) Group Lasso]

Group Sparsity: Incorrect Group Structure
[Figure: (a) Original, (b) Lasso, (c) Group Lasso]

Matrix Formulation: Graphical Model Example
Learning gene interaction network structure

Formulation: Gaussian Graphical Model
Multi-dimensional Gaussian vectors: $X_1, \dots, X_n \sim N(\mu, \Sigma)$.
Precision matrix: $\Theta = \Sigma^{-1}$.
The nonzeros of the precision matrix give the graphical model structure:
  $P(X_i) \propto |\Theta|^{1/2} \exp\big[-\tfrac{1}{2} (X_i - \mu)^T \Theta (X_i - \mu)\big]$,
  where $|\cdot|$ denotes the determinant.
Estimation: $L_1$-regularized maximum likelihood estimator
  $\hat\Theta = \arg\min_\Theta \big[-\ln|\Theta| + \mathrm{tr}(\hat\Sigma\Theta) + \lambda \|\Theta\|_1\big]$
  $\|\Theta\|_1$: element-wise $L_1$ regularization to encourage sparsity
  $\hat\Sigma$: empirical covariance matrix.
Analysis exists (feature selection and parameter estimation): techniques similar to the $L_1$ analysis, but not fully satisfactory.
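This $L_1$-regularized estimator is the graphical lasso; below is a small sketch (not from the slides) using scikit-learn's GraphicalLasso on synthetic data, where the tridiagonal precision matrix, sample size, and alpha are illustrative choices.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
d = 10
# A sparse (tridiagonal) true precision matrix and the corresponding covariance.
Theta_true = np.eye(d) + np.diag(0.4 * np.ones(d - 1), 1) + np.diag(0.4 * np.ones(d - 1), -1)
Sigma_true = np.linalg.inv(Theta_true)
Z = rng.multivariate_normal(np.zeros(d), Sigma_true, size=500)

# L1-regularized maximum-likelihood estimation of the precision matrix;
# alpha plays the role of lambda in the slide's objective.
model = GraphicalLasso(alpha=0.05).fit(Z)
Theta_hat = model.precision_
print("estimated edge pattern:\n", (np.abs(Theta_hat) > 1e-3).astype(int))
```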

Matrix Completion
[Table: a 5 x 8 user-movie rating matrix (users U1-U5, movies M1-M8) with many missing entries marked "?"]
$m \times n$ matrix: $m$ users and $n$ movies with incomplete ratings
  can we fill in the missing values?
  assumptions are required:
    intuition: U2 and U3 have similar ratings on the observed entries, so assume they have similar preferences
    low-rank (rank-$r$) structure: map user $i$ to $u_i \in R^r$ and movie $j$ to $v_j \in R^r$, with rating $u_i^T v_j$. Let $\bar X$ be the true rating matrix: $\bar X \approx U V^T$ ($U$: $m \times r$, $V$: $n \times r$)

Formulation
Let $S$ = {observed $(i, j)$ entries}, and let $y_{ij}$ be the observed value for $(i, j) \in S$.
Let $\bar X$ be the true underlying rating matrix.
We want to find $X$ that fits the observed $y_{ij}$, assuming $\bar X$ is low-rank:
  $\min_{X \in R^{m \times n}} \sum_{(i,j) \in S} (X_{ij} - y_{ij})^2 + \lambda\, \mathrm{rank}(X)$.
$\mathrm{rank}(X)$ is a nonconvex function of $X$.
Convex relaxation: the trace norm $\|X\|_*$, defined as the sum of the singular values of $X$. The convex reformulation is
  $\min_{X \in R^{m \times n}} \sum_{(i,j) \in S} (X_{ij} - y_{ij})^2 + \lambda \|X\|_*$.
The solution of trace-norm regularization is low-rank.
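A minimal proximal-gradient sketch of this trace-norm formulation (not part of the slides): the proximal map of the trace norm is singular-value soft-thresholding; the step size, iteration count, and the usage example are illustrative assumptions.

```python
import numpy as np

def complete_matrix(Y_obs, mask, lam, step=0.5, n_iter=300):
    """Proximal gradient for min_X sum_{observed} (X_ij - y_ij)^2 + lam * ||X||_*.
    `Y_obs` holds observed values (zeros elsewhere), `mask` marks observed entries."""
    X = np.zeros_like(Y_obs, dtype=float)
    for _ in range(n_iter):
        grad = 2 * mask * (X - Y_obs)                # gradient of the observed squared loss
        U, s, Vt = np.linalg.svd(X - step * grad, full_matrices=False)
        s = np.maximum(s - step * lam, 0.0)          # soft-threshold the singular values
        X = (U * s) @ Vt
    return X

# Illustrative usage: a random rank-2 matrix with about half of its entries observed.
# rng = np.random.default_rng(0)
# M = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
# mask = rng.random(M.shape) < 0.5
# X_hat = complete_matrix(M * mask, mask, lam=1.0)
```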

Sparsity versus Low Rank
A vector $\beta \in R^p$: $p$ parameters; reduce dimension via sparsity
  $\|\beta\|_0$ is small
  the constraint $\|\beta\|_0 \le s$ is nonconvex
  convex relaxation: the convex hull of unit 1-sparse vectors, which gives the $L_1$ constraint $\|\beta\|_1 \le 1$
  vector solutions with $L_1$ regularization are sparse
A matrix $X \in R^{m \times n}$: $mn$ parameters; reduce dimension via low rank
  $X = \sum_{j=1}^r u_j v_j^T$, where $u_j \in R^m$ and $v_j \in R^n$ are vectors
  number of parameters: no more than $rm + rn$
  the rank constraint is nonconvex
  convex relaxation: the convex hull of unit rank-one matrices, which gives the trace-norm constraint $\|X\|_* \le 1$
  matrix solutions with trace-norm regularization are low-rank

Matrix Regularization
Example: mixed sparsity and low rank
  $Y$ (observed) $= \bar X_L$ (low-rank) $+ \bar X_S$ (sparse)
  $[\hat X_S, \hat X_L] = \arg\min \frac{1}{2\mu} \|(X_S + X_L) - Y\|_2^2 + \lambda \|X_S\|_1 + \underbrace{\|X_L\|_*}_{\text{trace norm}}$
Trace norm: the sum of the singular values of a matrix; it encourages a low-rank matrix.
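A hedged sketch (not from the slides) of one simple way to attack this sparse-plus-low-rank objective: alternating proximal updates, i.e., entrywise soft-thresholding for the $L_1$ term and singular-value soft-thresholding for the trace norm; the update scheme and iteration count are assumptions of the sketch, not necessarily the method used in the talk.

```python
import numpy as np

def sparse_plus_lowrank(Y, lam, mu, n_iter=200):
    """Alternating block minimization of
       1/(2 mu) ||X_S + X_L - Y||_F^2 + lam ||X_S||_1 + ||X_L||_*."""
    X_S = np.zeros_like(Y, dtype=float)
    X_L = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        # sparse part: prox of mu*lam*||.||_1 at Y - X_L (entrywise soft-threshold)
        R = Y - X_L
        X_S = np.sign(R) * np.maximum(np.abs(R) - mu * lam, 0.0)
        # low-rank part: prox of mu*||.||_* at Y - X_S (singular-value soft-threshold)
        U, s, Vt = np.linalg.svd(Y - X_S, full_matrices=False)
        X_L = (U * np.maximum(s - mu, 0.0)) @ Vt
    return X_S, X_L

# X_S_hat, X_L_hat = sparse_plus_lowrank(Y_matrix, lam=0.1, mu=1.0)  # Y_matrix: observed matrix
```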

Theoretical Analysis
We want to know under what conditions we can recover $\bar X_S$ and $\bar X_L$. The matrix is $m \times n$.
  $\bar X_S$: sparse (spike-like outliers)
    no more than $n_0$ outliers per row
    no more than $m_0$ outliers per column
  $\bar X_L$: rank is $r$
    incoherence: $\bar X_L$ is flat; no component is large
Question: how many outliers per row ($n_0$) and per column ($m_0$) can be tolerated while still recovering $\bar X_S$ and $\bar X_L$?
Partial answer (not completely satisfactory):
  if the sparsity pattern $\mathrm{supp}(\bar X_S)$ is random: exact recovery under the conditions $m_0 = O(m)$ and $n_0 = O(n)$
  if the sparsity pattern $\mathrm{supp}(\bar X_S)$ does not have to be random: $m_0 \le c\,(m/r)$ and $n_0 \le c\,(n/r)$ for some constant $c$ ($r$ is the rank of $\bar X_L$)

References
  Statistical Science special issue on sparsity and regularization
  Structured sparsity: F. Bach, et al.
  General theoretical analysis: S. Negahban et al.
  Graphical models: J. Lafferty et al.
  Nonconvex methods: C.-H. Zhang and T. Zhang
  ...
