Sparsity in Underdetermined Systems

Department of Statistics, Stanford University. August 19, 2005.

Classical Linear Regression Problem
> Given predictors X (n x p) and response y (n x 1).
> Linear model: y = Xβ + ε, with ε ~ N(0, σ²).
> Estimate β with β̂ = (X'X)^{-1} X'y.
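
For concreteness, a minimal numpy sketch of the classical estimate on made-up data (the sizes, coefficients, and noise level below are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5                                   # classical regime: p < n
X = rng.standard_normal((n, p))
beta = np.array([3.0, -2.0, 0.0, 1.5, 0.0])
y = X @ beta + rng.normal(scale=1.0, size=n)    # y = X beta + eps, eps ~ N(0, sigma^2)

# Least-squares estimate (X'X)^{-1} X'y; lstsq is the numerically stable way to compute it.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```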

Developing Trend
> The classical model requires p < n, but recent developments have pushed people beyond the classical model, to p > n.

New Data Types
> MicroArray Data: p is the number of genes, n is the number of patients.
> Financial Data: p is the number of stocks, prices, etc., n is the number of time points.
> Data Mining: automated data collection can imply large numbers of variables.
> Texture Classification in Images (e.g. satellite): p is the number of pixels, n is the number of images.

Estimating the model
> Can we find an estimate for β when p > n?
> George Box (1986), Effect Sparsity: the vast majority of factors have zero effect; only a small fraction actually affect the response.
> y = Xβ + ε can still be modeled, but now β must be sparse, containing a few nonzero elements, the remaining elements zero.

Strategies for Sparse Modeling
1. All Subsets Regression: fit all possible linear models for all levels of sparsity.
2. Forward Stepwise Regression: greedy approach that chooses each variable in the model sequentially by significance level.
3. LASSO (Tibshirani 1994), LARS (Efron, Hastie, Johnstone, Tibshirani 2002): shrink some coefficient estimates to zero.
4. Stochastic Search Variable Selection (SSVS) (George & McCulloch 1993): set a prior that stipulates only a few terms in the model.

LASSO and LARS: a quick tour
> LASSO solves: min_β ||y − Xβ||₂² s.t. ||β||₁ ≤ t, for a choice of t.
> LARS: a stepwise approximation to LASSO. Advantage: guaranteed to stop in n steps.
> Allowing variables to be removed from the current set of variables gives the LASSO solution.
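
As a hedged illustration (not the talk's own code), scikit-learn's lars_path traces both the plain LARS steps and, with the variable-removal rule, the full LASSO path; the sizes and sparsity level below are made up:

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(1)
n, p, k = 50, 100, 5                 # underdetermined: p > n, true model has k nonzeros
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = rng.uniform(1, 5, size=k)
y = X @ beta + rng.normal(scale=0.1, size=n)

# method='lar' is plain LARS (stops after at most n steps);
# method='lasso' adds the variable-removal rule and traces the full LASSO path.
alphas, active, coefs = lars_path(X, y, method='lasso')
print("variables entered first:", active[:10])
```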

A New Perspective
> Up until now we've described the statistical view of the problem when p > n.
> Now we introduce ideas from Signal Processing and a new tool for understanding regression when p > n, in the case of large p and n.
> Claim: this will allow us to see that, for certain problems, statistical solutions such as LASSO and LARS are just as good as all subsets regression.

Background from Signal Processing
> There exists a signal y, and several ortho-bases (e.g. sinusoids, wavelets, Gabor).
> The concatenation of several ortho-bases is a dictionary.
> Postulate that the signal is sparsely representable, i.e. made up from few components of the dictionary.
> Motivation: Image = Texture + Cartoon; Signal = Sinusoids + Spikes; Signal = CDMA + TDMA + FM + ...

Overcomplete Dictionaries
> Canonical basis B_C = I_n (the n x n identity; spikes).
> Standard Fourier basis B_F: n orthogonal columns φ(ω_k, 0), φ(ω_k, 1), where ω_k = 2πk, k = 0, ..., n/2, and 0/1 indicates cosine/sine.
> A = [B_C B_F], an n x 2n matrix, is an overcomplete dictionary.
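
A small numpy sketch of such a dictionary, concatenating the identity (spikes) with a real Fourier (sinusoid) basis; the exact frequency convention used here (2πkt/n on n samples) is a common choice, not necessarily the slide's:

```python
import numpy as np

def fourier_basis(n):
    """Real Fourier basis: normalized cosine/sine columns at frequencies 2*pi*k/n."""
    t = np.arange(n)
    cols = []
    for k in range(n // 2 + 1):
        for v in (np.cos(2 * np.pi * k * t / n), np.sin(2 * np.pi * k * t / n)):
            if np.linalg.norm(v) > 1e-10:        # drop the identically zero sine columns
                cols.append(v / np.linalg.norm(v))
    return np.column_stack(cols[:n])             # keep n orthogonal columns

n = 64
B_C = np.eye(n)                   # canonical (spike) basis
B_F = fourier_basis(n)            # Fourier (sinusoid) basis
A = np.hstack([B_C, B_F])         # overcomplete dictionary, n x 2n
print(A.shape)                    # (64, 128)
```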

Example: Image = Wavelets + Ridgelets (Starck 2000). [Figure: wavelet component and ridgelet component.]

Example: Image = Texture + Cartoon (Elad and Starck 2003). [Figure: cartoon component (curvelets) and texture component (local sinusoids).]

Example: Image = Texture + Cartoon (Elad and Starck 2003). [Figure: original image and cartoon component.]

Formal Signal Processing Problem Description
> Signal decomposition: y = Ax. With a noise term: y = Ax + z, z ~ N(0, σ²).
> Term-by-term correspondence:
  Regression:     y = Xβ + ε   (y: response, n observations; X: n x p predictors; β: coefficients; ε: noise)
  Decomposition:  y = Ax + z   (y: signal of length n; A: n x p matrix of dictionary elements; x: coefficients; z: noise)
> Here p/n = number of bases, so if the number of bases is greater than 1, then p > n.

Signal Processing Solutions
1. Matching Pursuit (Mallat, Zhang 1993): the analogue of Forward Stepwise Regression.
2. Basis Pursuit (Chen, Donoho 1994): a simple global optimization criterion,
   (P1): min_x ||x||₁ s.t. y = Ax.
3. Maximally Sparse Solution: intuitively most compelling, but not feasible!
   (P0): min_x ||x||₀ s.t. y = Ax.
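
A minimal sketch of (P1) as a linear program with scipy, using the standard split x = u − v with u, v ≥ 0 so the objective sum(u + v) equals ||x||₁; dimensions and sparsity below are illustrative:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||x||_1 subject to Ax = y via a linear program."""
    n, p = A.shape
    c = np.ones(2 * p)                       # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([A, -A])                # equality constraint A(u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:p], res.x[p:]
    return u - v

rng = np.random.default_rng(2)
n, p, k = 30, 60, 4
A = rng.standard_normal((n, p))
x0 = np.zeros(p)
x0[rng.choice(p, k, replace=False)] = rng.standard_normal(k)
y = A @ x0
x_hat = basis_pursuit(A, y)
print("recovery error:", np.linalg.norm(x_hat - x0))
```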

l 0 Problem Impossible! > We can t hope to do an all subsets search, but we are lucky! ( P) 1 is a convex problem, and it can sometimes solve ( P ). 0

(l₁, l₀) Equivalence
> Signal processing results show (P1) solves (P0) for certain problems:
> Donoho, Huo (IEEE IT, 2001)
> Donoho, Elad (PNAS, 2003)
> Tropp (IEEE IT, 2004)
> Gribonval (IEEE IT, 2004)
> Candès, Romberg, Tao (IEEE IT, to appear)

Phase Transition in a Random Matrix Model
> A is n x p with entries A_ij ~ N(0,1).
> y = Ax, where x has k random nonzeros, positions random.
> Phase plane (δ, ρ): ρ = k/n is the degree of sparsity, δ = n/p is the degree of underdetermination.
> Theorem (DLD 2005): there exists a critical ρ_W(δ) such that, for every ρ < ρ_W(δ), for the overwhelming majority of (y, A) pairs, (P1) solves (P0).

Phase Transition: (l₁, l₀) Equivalence. [Figure: phase plane with ρ = k/n versus δ = n/p; below the transition curve (P1) solves (P0), above it combinatorial search is required.]

Paradigm for study
> P is a property of an algorithm.
> (y, A) is a random ensemble.
> Find the phase transition for property P.
> Approach pioneered by Donoho, Drori, and Tsaig: many properties studied, many phase transitions.

Phase Diagram Approach
1. Generate y = Ax₀, where x₀ is sparse.
2. Run the full solution path.
3. Call the final path element x̂₁.
4. Property P: ||x̂₁ − x₀||₂ ≤ ε ||x₀||₂.
The following figures are due to Iddo Drori and Yaakov Tsaig.
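
A self-contained Monte Carlo sketch of one cell of such a phase diagram, using scikit-learn's LASSO path as the solution path; the grid point (δ = 0.25, ρ = 0.1), the tolerance, and the number of trials are illustrative choices, not the talk's settings:

```python
import numpy as np
from sklearn.linear_model import lars_path

def success_rate(n=50, p=200, k=5, trials=20, eps=1e-3, seed=0):
    """Fraction of trials in which the final LASSO-path element recovers x0."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        A = rng.standard_normal((n, p))
        x0 = np.zeros(p)
        x0[rng.choice(p, k, replace=False)] = rng.standard_normal(k)
        y = A @ x0
        _, _, coefs = lars_path(A, y, method="lasso")   # full LASSO path
        x_hat = coefs[:, -1]                            # final path element
        if np.linalg.norm(x_hat - x0) <= eps * np.linalg.norm(x0):
            hits += 1
    return hits / trials

# delta = n/p = 0.25, rho = k/n = 0.1: comfortably inside the recovery region,
# so the empirical success rate should be close to 1.
print(success_rate())
```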

Phase Diagram: Noiseless Model. [Figure: phase diagrams for LASSO and LARS, ρ = k/n versus δ = n/p; p = 200, n varies from 10 to 200.]

Phase Diagram: Noiseless Model. [Figure: phase diagram for Forward Stepwise, ρ = k/n versus δ = n/p; p = 200, n varies from 10 to 200.]

Phase Transition Surprises
> Surprise: LASSO, in the noiseless case, reaches the same final point as best subsets, for ρ < ρ_LASSO.
> Hoped for: LARS, in the noiseless case, is the same as best subsets, for ρ < ρ_LARS.
> Surprise: Stepwise, in the noiseless case, is only successful for ρ well below ρ_LASSO.

This implies a statistics question!
> Could results like these describe linear regression for noisy data?
> For example, when are LASSO, LARS, and Forward Stepwise just as good as all subsets regression?
> Reformulate the problems with noise:
  (P_{0,λ}): min_β ||y − Xβ||₂² + λ||β||₀
  (P_{1,λ}): min_β ||y − Xβ||₂² + λ||β||₁

Experiment Setup
> X is n x p, with random entries generated from N(0,1) and normalized columns.
> β is a p-vector with the first k entries drawn from U(0,100), remaining entries 0.
> ε is a random N(0,16) n-vector.
> Create y = Xβ + ε.
> We find the solution β̂ using an algorithm (LASSO, LARS, Forward Stepwise) with y and X as inputs and the appropriate stopping criterion.

Implementation of LASSO
> Engineering approach: stopping criterion ||y − Xβ̂||₂² ≤ ||z||₂² (stop once the residual falls to the noise level).
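
A sketch of one run of this experiment, assuming the data-generating setup from the previous slide and scikit-learn's lars_path for the LASSO path; the stopping index is taken as the first path element whose residual meets the criterion:

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(4)
n, p, k = 100, 200, 10
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)                      # normalized columns
beta = np.zeros(p)
beta[:k] = rng.uniform(0, 100, size=k)
eps = rng.normal(scale=4.0, size=n)                 # N(0, 16) noise
y = X @ beta + eps

alphas, _, coefs = lars_path(X, y, method="lasso")  # full LASSO path
resid = np.sum((y[:, None] - X @ coefs) ** 2, axis=0)
stop = np.argmax(resid <= np.sum(eps ** 2))         # first path element meeting the criterion
beta_hat = coefs[:, stop]
print("relative error:", np.linalg.norm(beta_hat - beta) / np.linalg.norm(beta))
```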

Questions > Will there be any phase transition? > Can we learn something about the properties of these algorithms from the Phase Diagram?

Property of Interest: Risk Ratio
> Ideal Risk: R_ideal(β̂) = tr{(X_I' X_I)^{-1}} σ², where X_I denotes the columns of X in the true model.
> Actual Risk: R(β̂) = E ||β̂ − β||₂².
> Compare the empirical risk to the notion of an oracle inequality (Donoho and Johnstone 1994): R(β̂) ≤ c₀ + c₁ R_ideal(β̂), with c₀ = σ² and c₁ = 2 log(p).
> The ratio R(β̂) / R_ideal(β̂) can be plotted in a Phase Diagram.
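
A rough numerical sketch of this risk ratio for a LASSO fit on the setup above; X_I here means the columns of X on the true support, and the sklearn penalty alpha and the number of Monte Carlo noise draws are arbitrary illustrative choices:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, p, k, sigma = 100, 200, 5, 4.0
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)                  # normalized columns, as in the setup
beta = np.zeros(p)
beta[:k] = rng.uniform(0, 100, size=k)

# Ideal risk: variance of the oracle least-squares fit restricted to the true support.
XI = X[:, :k]
R_ideal = np.trace(np.linalg.inv(XI.T @ XI)) * sigma**2

# Empirical risk E||beta_hat - beta||^2 of LASSO, averaged over noise draws.
sq_errors = []
for _ in range(25):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    fit = Lasso(alpha=0.1, fit_intercept=False, max_iter=10000).fit(X, y)
    sq_errors.append(np.sum((fit.coef_ - beta) ** 2))
print("risk ratio:", np.mean(sq_errors) / R_ideal)
```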

Phase Diagram: LASSO for the Noisy Model. [Figure: risk-ratio phase diagram; p = 200, n varies from 10 to 200.]

Phase Diagram: LARS for the Noisy Model. [Figure: risk-ratio phase diagram; p = 200, n varies from 10 to 200.]

Phase Diagram: Stepwise for the Noisy Model. [Figure: risk-ratio phase diagram; p = 200, n varies from 10 to 200.]

Experiences with the Noisy Case
> Phase diagrams are revealing and stimulating.
> Stepwise Regression falls apart at a critical sparsity level (why?).
> LARS in the same cases works very well!
> Suggests other interesting properties to study.
> Other algorithms: Forward Stagewise, Backward Elimination, SSVS.

Introducing SparseLab! http://www-stat.stanford.edu/~sparselab
> Make software solutions for sparse systems available.
> Growing research on sparsity and variable selection could advance faster if the research community had standard tools.
> SparseLab is a system to do this.

SparseLab in Depth
> Currently included solvers:
  IHT (Iterative Hard Thresholding)
  IRWLS (Iteratively Reweighted Least Squares)
  IST (Iterative Soft Thresholding)
  LARS (Least Angle Regression)
  MP (Matching Pursuit)
  OMP (Orthogonal Matching Pursuit)
  PDCO (Primal-Dual Barrier Method for Convex Objectives)
  Simplex
  Forward Stepwise

SparseLab in Depth
> Current examples: Compressed Sensing, Deconvolving mRNA, Error Correcting Codes, NMR Spectroscopy, NonNegative Fourier Time Frequency.

SparseLab in Depth
> Reproducible Research: SparseLab makes available the code to reproduce figures in published papers.
> Papers currently included:
  Extensions of Compressed Sensing (Tsaig, Donoho 2005)
  Breakdown of Equivalence between the Minimal l₁-norm Solution and the Sparsest Solution (Tsaig, Donoho 2004)
  High-Dimensional Centrally-Symmetric Polytopes With Neighborliness Proportional to Dimension (Donoho 2005)
> All open source!

Conclusions: Ambitions for the Thesis
> SparseLab publicly available, containing 5-10 papers, with intuitive interfaces for both Statistics and Signal Processing.
> Use of the Phase Diagram to investigate the behavior of LARS, LASSO, Forward Stepwise, Forward Stagewise, SSVS, and the Log Penalty in the noisy case.

Acknowledgments: Dave Donoho, Iddo Drori, Joshua Sweetkind-Singer, Yaakov Tsaig.