Large-Scale L1-Related Minimization in Compressive Sensing and Beyond


Large-Scale L1-Related Minimization in Compressive Sensing and Beyond
Yin Zhang, Department of Computational and Applied Mathematics, Rice University, Houston, Texas, U.S.A.
Arizona State University, March 5th, 2008

Outline:
CS: Application and Theory
Computational Challenges and Existing Algorithms
Fixed-Point Continuation: Theory to Algorithm
Exploiting Structures in TV Regularization
Acknowledgments: NSF DMS-0442065. Collaborators: Elaine Hale, Wotao Yin. Students: Yilun Wang, Junfeng Yang.

Compressive Sensing Fundamental
Recover a sparse signal from incomplete data.
Unknown signal x* ∈ R^n; measurements Ax* ∈ R^m with m < n; x* is sparse (#nonzeros ||x*||_0 < m).
If x* = arg min{ ||x||_1 : Ax = Ax* } is unique, then x* is recoverable.
The system Ax = Ax* is under-determined, and minimizing ||x||_1 favors sparse x.
Theory: ||x*||_0 < O(m / log(n/m)) implies recovery for random A (Donoho et al., Candès-Tao et al., 2005).
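As a quick numerical illustration of this recovery claim (my own sketch, not part of the talk), the l1 problem can be handed to a generic convex solver; the use of CVXPY and the problem sizes below are my assumptions.

```python
import numpy as np
import cvxpy as cp

# Sketch: recover a k-sparse x* in R^n from m < n random Gaussian measurements
# by solving min ||x||_1 s.t. Ax = Ax*. Sizes are illustrative only.
rng = np.random.default_rng(0)
n, m, k = 200, 80, 10
A = rng.standard_normal((m, n))
x_star = np.zeros(n)
x_star[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
b = A @ x_star

x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == b]).solve()
print("relative error:", np.linalg.norm(x.value - x_star) / np.linalg.norm(x_star))
```

With these sizes the relative error is typically near machine precision, matching the O(m/log(n/m)) sparsity regime quoted above.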

Application: Missing Data Recovery
[Figure: three panels over samples 0-1000 showing Complete data, Available data, and Recovered data.] The signal was synthesized from a few Fourier components.

Application: Missing Data Recovery II
[Images: Complete data, Available data, Recovered data.] 75% of the pixels were blacked out (made unknown).

Application: Missing Data Recovery III
[Images: Complete data, Available data, Recovered data.] 85% of the pixels were blacked out (made unknown).

How are missing data recovered?
The data vector f has a missing part u: f := [b; u], with b ∈ R^m known and u ∈ R^(n−m) missing.
Under a basis Φ, f has a representation x: f = Φx, i.e., with Φ partitioned into blocks [A; B],
    A x = b   and   B x = u.
Under favorable conditions (x is sparse and A is "good"),
    x = arg min{ ||x||_1 : Ax = b },
and we then recover the missing data as u = Bx.

Sufficient Condition for Recovery
Feasible set: F = {x : Ax = Ax*} = {x* + v : v ∈ Null(A)}.
Define S = {i : x*_i ≠ 0} and Z = {1, ..., n} \ S. For any v ∈ Null(A),
    ||x* + v||_1 = ||x*||_1 + (||v_Z||_1 − ||v_S||_1) + (||x*_S + v_S||_1 − ||x*_S||_1 + ||v_S||_1)
                 > ||x*||_1,   if ||v_Z||_1 > ||v_S||_1.
Hence x* is the unique minimizer if ||v||_1 > 2 ||v_S||_1 for all v ∈ Null(A) \ {0}.
Since ||v_S||_1 ≤ ||x*||_0^(1/2) ||v||_2, it suffices that
    ||v||_1 > 2 ||x*||_0^(1/2) ||v||_2,   for all v ∈ Null(A) \ {0}.
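To get a feel for this null-space condition, here is a small numerical probe (my own illustration, not from the talk): it samples random directions in Null(A) for a Gaussian A and compares ||v||_1/||v||_2 with the 2·sqrt(k) threshold. The sizes and sample count are arbitrary assumptions, and sampling only probes typical directions; it does not certify the worst case.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 64, 256, 8                 # measurements, dimension, sparsity level (assumed)
A = rng.standard_normal((m, n))

# Orthonormal basis for Null(A) from the full SVD: the last n-m right singular vectors.
_, _, Vt = np.linalg.svd(A)
N = Vt[m:].T                         # columns span Null(A), shape (n, n-m)

worst = np.inf
for _ in range(2000):
    v = N @ rng.standard_normal(n - m)          # random null-space vector
    worst = min(worst, np.linalg.norm(v, 1) / np.linalg.norm(v, 2))

print(f"smallest sampled ||v||_1/||v||_2 : {worst:.2f}")
print(f"threshold 2*sqrt(k)             : {2*np.sqrt(k):.2f}")
```

For these sizes the sampled ratios sit far above 2·sqrt(k), consistent with the concentration-of-measure argument on the next slides.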

l1-norm vs. Sparsity
Sufficient sparsity for unique recovery:
    ||x*||_0^(1/2) < (1/2) ||v||_1 / ||v||_2,   for all v ∈ Null(A) \ {0}.
By uniqueness, x ≠ x* and Ax = Ax* imply ||x||_0 > ||x*||_0 (any other feasible point at least as sparse as x* would itself be the unique l1 minimizer, a contradiction). Hence
    x* = arg min{ ||x||_1 : Ax = Ax* } = arg min{ ||x||_0 : Ax = Ax* },
i.e., the minimum l1-norm solution is also the maximally sparse solution.

In most subspaces, ||v||_1 ≫ ||v||_2
In R^n, 1 ≤ ||v||_1 / ||v||_2 ≤ sqrt(n). However, ||v||_1 ≫ ||v||_2 in most subspaces (due to concentration of measure).
Theorem (Kashin '77, Garnaev-Gluskin '84): Let A ∈ R^{m×n} be standard i.i.d. Gaussian. With probability above 1 − e^{−c_1 (n−m)},
    ||v||_1 / ||v||_2 ≥ c_2 sqrt( m / log(n/m) ),   for all v ∈ Null(A) \ {0},
where c_1 and c_2 are absolute constants.
Immediately, for random A and with high probability,
    ||x*||_0 < C m / log(n/m)   ⇒   x* is recoverable.

Signs Help
Theorem: There exist good measurement matrices A ∈ R^{m×n} such that if x* ≥ 0 and ||x*||_0 ≤ m/2, then
    x* = arg min{ ||x||_1 : Ax = Ax*, x ≥ 0 }.
In particular, (generalized) Vandermonde matrices (including partial DFT matrices) are good. (The condition x* ≥ 0 can be replaced by "sign(x*) is known".)

Discussion
Further results: better estimates on the constants (still uncertain); some non-random matrices are also good (e.g., partial transforms).
Implications of CS: theoretically, the sample size drops from n to O(k log(n/k)); the workload shifts from the encoder to the decoder; a new paradigm in data acquisition? In practice the compression ratio is not dramatic, but it may buy longer battery life for space devices, shorter scan times for MRI, ...

Related l1-Minimization Problems
    min { ||x||_1 : Ax = b }                              (noiseless)
    min { ||x||_1 : ||Ax − b|| ≤ ε }                      (noisy)
    min  µ||x||_1 + ||Ax − b||^2                          (unconstrained)
    min  µ||Φx||_1 + ||Ax − b||^2                         (Φ^{-1} may not exist)
    min  µ||G(x)||_1 + ||Ax − b||^2                       (G(·) may be nonlinear)
    min  µ||G(x)||_1 + ν||Φx||_1 + ||Ax − b||^2           (mixed form)
Φ may represent a wavelet or curvelet transform; ||G(x)||_1 can represent isotropic TV (total variation). The objectives are not necessarily strictly convex, and they are non-differentiable.

Algorithmic Challenges
Large-scale, non-smooth optimization problems with dense data, requiring low storage and fast algorithms:
1k × 1k 2D images give over 10^6 variables; "good" matrices are dense (random, transforms, ...); often (near) real-time processing is required.
Matrix factorizations are out of the question. Algorithms must be built on the matrix-vector products Av and A^T v.
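To make the "products only" requirement concrete, here is a minimal sketch (not from the talk) of a matrix-free measurement operator built from a partial DCT, exposing only Av and A^T v through SciPy's LinearOperator; the index set `idx`, the sizes, and the choice of DCT rather than DFT are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dct, idct
from scipy.sparse.linalg import LinearOperator

# Hypothetical partial-transform measurement matrix A = S*D, where D is the
# orthonormal DCT and S keeps m randomly chosen rows. A is never formed; only
# A@v and A^T@v are available, as required by large-scale first-order methods.
n, m = 4096, 1024
rng = np.random.default_rng(0)
idx = rng.choice(n, size=m, replace=False)    # rows kept by S (assumed)

def matvec(v):                                # A v
    return dct(v, norm='ortho')[idx]

def rmatvec(y):                               # A^T y
    z = np.zeros(n)
    z[idx] = y
    return idct(z, norm='ortho')              # D orthonormal, so D^T = D^{-1}

A = LinearOperator((m, n), matvec=matvec, rmatvec=rmatvec, dtype=np.float64)
b = A.matvec(np.ones(n))                      # example use of the operator
```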

Algorithm Classes (I)
Greedy algorithms: Matching Pursuit (Mallat-Zhang, 1993); OMP (Gilbert-Tropp, 2005); StOMP (Donoho et al., 2006); Chaining Pursuit (Gilbert et al., 2006); Cormode-Muthukrishnan (2006); HHS Pursuit (Gilbert et al., 2006).
Some require special encoding matrices.

Algorithm Classes (II)
Introducing extra variables, one can convert compressive sensing problems into smooth linear or second-order cone programs; e.g., splitting x = x+ − x− with x+, x− ≥ 0,
    min{ ||x||_1 : Ax = b }  ⇔  LP:  min{ e^T x+ + e^T x− : A x+ − A x− = b, x+, x− ≥ 0 }
(an LP sketch follows below).
Smooth optimization methods:
Projected gradient: GPSR (Figueiredo-Nowak-Wright, 2007).
Interior-point: l1-LS (Boyd et al., 2007), using preconditioned CG for the linear systems; l1-Magic (Romberg, 2006).
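As an illustration (not part of the talk), the LP reformulation above can be handed directly to an off-the-shelf solver; this sketch uses scipy.optimize.linprog on a small random instance, with sizes chosen arbitrarily.

```python
import numpy as np
from scipy.optimize import linprog

# With x = xp - xn, xp, xn >= 0:
#   min ||x||_1 s.t. Ax = b   becomes   min e^T xp + e^T xn s.t. A xp - A xn = b.
rng = np.random.default_rng(1)
m, n, k = 40, 120, 5
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
b = A @ x_true

c = np.ones(2 * n)                       # objective e^T xp + e^T xn
A_eq = np.hstack([A, -A])                # [A, -A] [xp; xn] = b
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None), method="highs")
x_hat = res.x[:n] - res.x[n:]
print("recovery error:", np.linalg.norm(x_hat - x_true))
```

This is of course only practical at small scale; the point of the rest of the talk is that large dense instances need first-order, matrix-free methods instead.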

Fixed-Point Shrinkage
For min µ||x||_1 + f(x), optimality is equivalent to the fixed-point equation
    x = Shrink(x − τ∇f(x), τµ),   where Shrink(y, t) = sign(y) · max(|y| − t, 0)  (componentwise).
Fixed-point iterations:
    x^{k+1} = Shrink(x^k − τ∇f(x^k), τµ).
This follows directly from forward-backward operator splitting (a long history in PDE and optimization since the 1950s). It was rediscovered in signal processing by many since the 2000s, and its convergence properties have been analyzed extensively.
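A minimal sketch of this iteration (my own illustration, not the speaker's code), specialized to the quadratic f(x) = ||Ax − b||^2 so that ∇f(x) = 2 A^T (Ax − b); the step size τ is chosen from the spectral norm of A, an assumption of this sketch.

```python
import numpy as np

def shrink(y, t):
    """Componentwise soft-thresholding: sign(y) * max(|y| - t, 0)."""
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def fixed_point_shrinkage(A, b, mu, num_iters=500):
    """Iterate x <- Shrink(x - tau*grad f(x), tau*mu) for f(x) = ||Ax - b||^2."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2   # Lipschitz constant of grad f
    tau = 0.99 / L                        # safe step size (assumed)
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = 2.0 * A.T @ (A @ x - b)
        x = shrink(x - tau * grad, tau * mu)
    return x
```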

Forward-Backward Operator Splitting
Derivation:
    min µ||x||_1 + f(x)
    ⇔ 0 ∈ µ ∂||x||_1 + ∇f(x)
    ⇔ −τ∇f(x) ∈ τµ ∂||x||_1
    ⇔ x − τ∇f(x) ∈ x + τµ ∂||x||_1
    ⇔ x − τ∇f(x) ∈ (I + τµ ∂||·||_1)(x)
    ⇔ {x} = (I + τµ ∂||·||_1)^{-1}(x − τ∇f(x))
    ⇔ x = Shrink(x − τ∇f(x), τµ).
Hence min µ||x||_1 + f(x) is equivalent to x = Shrink(x − τ∇f(x), τµ).

New Convergence Results
The following were obtained by E. Hale, W. Yin and Y. Zhang, 2007.
Finite convergence: for k ≥ O(1/(τµ)),
    x^k_j = 0                    if x*_j = 0,
    sign(x^k_j) = sign(x*_j)     if x*_j ≠ 0.
Rate of convergence, depending on the reduced Hessian:
    lim sup_{k→∞} ||x^{k+1} − x*|| / ||x^k − x*|| ≤ (κ(H_EE) − 1) / (κ(H_EE) + 1),
where H_EE is the sub-Hessian of f corresponding to the support E = {j : x*_j ≠ 0}.
The larger µ is, the sparser x* is, and the faster the convergence.

Fixed-Point Continuation
For each µ > 0, the fixed-point equation x = Shrink(x − τ∇f(x), τµ) defines a solution x = x(µ).
Idea: approximately follow the path x(µ).
FPC:
    Set µ to a large value; set an initial x.
    DO until µ reaches its prescribed value
        Adjust the stopping criterion.
        Starting from the current x, run fixed-point iterations until the criterion is met.
        Decrease µ.
    END DO
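Below is a rough sketch of such a continuation loop (my reading of the slide, not released FPC code), again for f(x) = ||Ax − b||^2; the initial µ, the decrease factor 0.25, and the tolerance schedule are illustrative assumptions.

```python
import numpy as np

def shrink(y, t):
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def fpc(A, b, mu_target, eta=0.25, inner_tol=1e-3, max_inner=200):
    """Continuation sketch: start from a large mu, warm-start each stage,
    and decrease mu toward mu_target."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2            # Lipschitz constant of grad f
    tau = 0.99 / L
    x = np.zeros(A.shape[1])
    mu = max(mu_target, np.linalg.norm(2.0 * A.T @ b, np.inf))  # x = 0 is optimal above this mu
    while True:
        for _ in range(max_inner):                 # fixed-point iterations at the current mu
            x_new = shrink(x - tau * (2.0 * A.T @ (A @ x - b)), tau * mu)
            if np.linalg.norm(x_new - x) <= inner_tol * max(1.0, np.linalg.norm(x)):
                x = x_new
                break
            x = x_new
        if mu <= mu_target:
            return x
        mu = max(mu_target, eta * mu)              # decrease mu toward its target value
```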

Continuation Makes It Kick
[Figure: relative error ||x − x_s|| / ||x_s|| (log scale, 10^0 down to 10^-3) versus iteration, for (a) µ = 200 over 150 iterations and (b) µ = 1200 over 800 iterations, each comparing "With Continuation" against "Without Continuation".]

Discussion
Continuation makes fixed-point shrinkage practical. FPC appears more robust than StOMP and GPSR, and is usually faster; l1-LS is generally slower. First-order methods slow down on less sparse problems, while second-order methods have their own set of problems. A comprehensive evaluation is still needed.

Total Variation Regularization
Discrete (isotropic) TV for a 2D variable u:
    TV(u) = Σ_{i,j} ||(Du)_{ij}||
(the 1-norm of the 2-norms of the first-order finite-difference vectors). It is convex, nonlinear and non-differentiable, and it is suitable when Du is sparse rather than u itself.
A mixed-norm formulation:
    min_u  µ TV(u) + λ ||Φu||_1 + ||Au − b||^2.
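For concreteness, a minimal sketch (not from the talk) of evaluating the isotropic TV value of an image; the replicate boundary handling is an assumption of this sketch.

```python
import numpy as np

def tv_isotropic(u):
    """Discrete isotropic TV: sum over pixels of the 2-norm of (Du)_ij, where D
    takes forward differences in the two coordinate directions (replicate boundary)."""
    dx = np.diff(u, axis=1, append=u[:, -1:])   # horizontal differences
    dy = np.diff(u, axis=0, append=u[-1:, :])   # vertical differences
    return np.sqrt(dx**2 + dy**2).sum()
```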

Alternating Minimization
Consider the case where the linear operator A is a convolution:
    min_u  µ Σ_{i,j} ||(Du)_{ij}|| + ||Au − b||^2.
Introducing auxiliary variables w_{ij} ∈ R^2 and a quadratic penalty:
    min_{u,w}  µ Σ_{i,j} ||w_{ij}|| + ρ ||w − Du||^2 + ||Au − b||^2.
Exploit the structure by alternating minimization:
For fixed u, w has a closed-form solution (a two-dimensional shrinkage).
For fixed w, the quadratic in u can be minimized with 3 FFTs.
(Similarly when A is a partial discrete Fourier matrix.)
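A rough sketch of the two subproblem solves (my own illustration under stated assumptions, not the speaker's code): periodic boundary conditions are assumed so that D and the convolution are diagonalized by the FFT, `kernel_fft` is the 2-D FFT of the blur kernel zero-padded to the image size, and all names are mine.

```python
import numpy as np

def grad(u):
    """Forward differences with periodic boundary: returns (du/dx, du/dy)."""
    return np.roll(u, -1, axis=1) - u, np.roll(u, -1, axis=0) - u

def w_step(u, mu, rho):
    """For fixed u, each w_ij solves a 2D shrinkage problem in closed form."""
    gx, gy = grad(u)
    norm = np.sqrt(gx**2 + gy**2)
    scale = np.maximum(norm - mu / (2.0 * rho), 0.0) / np.maximum(norm, 1e-12)
    return scale * gx, scale * gy

def u_step(wx, wy, b, kernel_fft, rho):
    """For fixed w, the quadratic in u is diagonal in Fourier space (a few FFTs)."""
    n1, n2 = b.shape
    # Impulse responses of the periodic forward-difference operators Dx, Dy.
    dx = np.zeros((n1, n2)); dx[0, 0], dx[0, -1] = -1.0, 1.0
    dy = np.zeros((n1, n2)); dy[0, 0], dy[-1, 0] = -1.0, 1.0
    Dx, Dy, K = np.fft.fft2(dx), np.fft.fft2(dy), kernel_fft
    rhs = rho * (np.conj(Dx) * np.fft.fft2(wx) + np.conj(Dy) * np.fft.fft2(wy)) \
          + np.conj(K) * np.fft.fft2(b)
    denom = rho * (np.abs(Dx)**2 + np.abs(Dy)**2) + np.abs(K)**2
    return np.real(np.fft.ifft2(rhs / denom))
```

Alternating w_step and u_step (optionally with an increasing penalty ρ) is the structure-exploiting loop described on the slide.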

MRI Reconstruction from 15% Fourier Coefficients
[Figure: six original 250 × 250 images and their reconstructions, with SNR 14.74, 16.12, 17.72, 16.40, 13.86 and 17.27 dB and per-image times t = 0.08-0.10 s.]
Reconstruction time about 0.1 s per image on a Dell PC (3 GHz Pentium).

Image Deblurring: Comparison to the Matlab Toolbox
Original image: 512 × 512; blurry & noisy input: SNR = 5.1 dB.
deconvlucy: SNR = 6.5 dB, t = 8.9 s. deconvreg: SNR = 10.8 dB, t = 4.4 s. deconvwnr: SNR = 10.8 dB, t = 1.4 s. MxNopt: SNR = 16.3 dB, t = 1.6 s.
For the 512 × 512 image, the CPU time was 1.6 seconds.

The End Thank You!