Compressive Sensing Theory and L1-Related Optimization Algorithms

Compressive Sensing Theory and L1-Related Optimization Algorithms. Yin Zhang, Department of Computational and Applied Mathematics, Rice University, Houston, Texas, USA. CAAM Colloquium, January 26, 2009.

Outline: What's Compressive Sensing (CS)? CS Theory: RIP and Non-RIP. Optimization Algorithms. Concluding Remarks. Acknowledgments. Collaborators: Wotao Yin, Elaine Hale. Students: Junfeng Yang, Yilun Wang. Funding Agencies: NSF, ONR.

MRI: Magnetic Resonance Imaging. MRI scan ⇒ Fourier coefficients ⇒ images. Is it possible to cut the scan time in half?

Numerical Simulation: FFT2(image) = Fourier coefficients. Pick 25% of the coefficients at random (with bias). Reconstruct the image from the 25% coefficients.
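The sampling step of this simulation is easy to sketch in numpy; the low-frequency-biased sampling rule below and the zero-filled inverse FFT "baseline" are illustrative stand-ins, not the sampling pattern or reconstruction model actually used in the talk (that model appears on the next slides).

```python
import numpy as np

def sample_fourier(image, keep=0.25, seed=0):
    """Keep roughly `keep` of the 2-D Fourier coefficients, biased toward low frequencies."""
    rng = np.random.default_rng(seed)
    F = np.fft.fft2(image)
    n1, n2 = image.shape
    # sampling probability decays with distance from the zero frequency (the "bias")
    k1 = np.minimum(np.arange(n1), n1 - np.arange(n1))[:, None]
    k2 = np.minimum(np.arange(n2), n2 - np.arange(n2))[None, :]
    prob = 1.0 / (np.sqrt(k1**2 + k2**2) + 1.0)
    prob *= keep * image.size / prob.sum()          # expected fraction kept is ~ keep
    mask = rng.random(image.shape) < np.minimum(prob, 1.0)
    return F * mask, mask

image = np.random.default_rng(1).random((64, 64))   # stand-in for an MR image
Fp, mask = sample_fourier(image)
naive = np.real(np.fft.ifft2(Fp))                   # zero-filled inverse FFT: the naive baseline
print(f"kept {mask.mean():.1%} of the coefficients")
```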

Simulation Result: original vs. reconstructed image. Image size: 350 × 350. Reconstruction time: 1s.

Image Reconstruction Model: min_u α TV(u) + β ||u||_1 + (1/2) ||F_p u - f_p||_2^2, where u is the unknown image, F_p is the partial Fourier matrix, f_p are the partial Fourier coefficients, and TV(u) = Σ_i ||(Du)_i|| is the 1-norm of the gradient magnitude. Compressive sensing may cut scan time by 1/2 or more (Lustig, Donoho and Pauly, MR in Medicine, 2008). Research on real-time algorithms is still needed.

CS Application: Single-pixel Camera. "Single-pixel Camera Has Multiple Futures", ScienceDaily (Oct. 20, 2008): "A terahertz version of the single-pixel camera developed by Rice University researchers could lead to breakthrough technologies in security, telecom, signal processing and medicine." ... Kelly Lab and Baraniuk group, ECE at Rice. http://www.dsp.ece.rice.edu/cscamera/

What's Compressive Sensing (CS)? Standard paradigm in signal processing: sample full data x ∈ R^n (subject to the Nyquist limit), then compress (transform + truncation); decoding is simple (inverse transform); acquisition can become a bottleneck (time, power, speed, ...). Paradigm shift, compressive sensing: acquire less data, b_i = a_i^T x, i = 1, ..., m, with m ≪ n; decoding is more costly: getting x from Ax = b. Advantage: reducing acquisition size from n to m.

CS Emerging Methodology. Signal x ∈ R^n. Encoding b = Ax ∈ R^m. Fewer measurements are taken (m < n), but there is no free lunch: prior information on the signal x is required, and a good measurement matrix A is needed. The prior info is sparsity: Ψx has many elements = 0 (or ||Ψx||_0 is small). When does it work? The sparsifying basis Ψ is known, A ∈ R^{m×n} is random-like, and m is sufficiently larger than ||Ψx||_0.

Sparsity is Common under Transforms. Many have sparse representations under known bases: Fourier, wavelets, curvelets, DCT, ... Figure: a signal before and after the DCT transform (left panel: Signal x, dense; right panel: Signal Ψx, sparse).

Decoding in CS. Given (A, b, Ψ), find the sparsest point: x* = arg min{ ||Ψx||_0 : Ax = b }. From combinatorial to convex optimization: x* = arg min{ ||Ψx||_1 : Ax = b }. The 1-norm is sparsity promoting (e.g., Santosa-Symes 85). Basis pursuit (Donoho et al 98). Many variants; e.g., ||Ax - b||_2 ≤ σ for noisy b. Greedy algorithms (e.g., Tropp-Gilbert 05, ...). Big question: when is ℓ0 = ℓ1?
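For small dense problems with Ψ = I, the convex relaxation can be solved as a plain linear program by splitting x into positive and negative parts; the sketch below uses scipy.optimize.linprog purely to make the decoding step concrete (the specialized solvers discussed later in the talk scale far better). The problem sizes and the function name are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """Solve min ||x||_1 s.t. Ax = b as an LP via x = xp - xn with xp, xn >= 0."""
    m, n = A.shape
    c = np.ones(2 * n)                                # objective: sum(xp) + sum(xn)
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b,
                  bounds=[(0, None)] * (2 * n), method="highs")
    return res.x[:n] - res.x[n:]

# tiny demo: exact recovery of a 5-sparse vector from 40 Gaussian measurements
rng = np.random.default_rng(0)
n, m, k = 100, 40, 5
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x_hat = basis_pursuit(A, A @ x_true)
print("recovery error:", np.linalg.norm(x_hat - x_true))
```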

Notation x(k): k-term Approximation. Keeping the k largest elements of x and setting the rest to 0 produces a k-term approximation of x, denoted by x(k). Figure: a vector x and its 20-term approximation x(20).
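A minimal helper for x(k), assuming "largest" means largest in magnitude:

```python
import numpy as np

def k_term_approx(x, k):
    """x(k): keep the k entries of largest magnitude, set the rest to 0."""
    xk = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]      # indices of the k largest |x_i|
    xk[idx] = x[idx]
    return xk

x = np.random.default_rng(1).standard_normal(40)
print(k_term_approx(x, 20))
```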

CS Theory, RIP Based. Assume Ψ = I. Let x* = arg min{ ||x||_1 : Ax = Ax̄ }. A celebrated result. Theorem (Candes-Tao 2005, C-Romberg-T 2005): if A ∈ R^{m×n} is iid standard normal, then with high probability (WHP), ||x* - x̄||_1 ≤ C(RIP_2k(A)) ||x̄ - x̄(k)||_1 for k = O(m/[1 + log(n/m)]) (k < m < n). Donoho (2005) obtained similar RIP-like results. Most subsequent analyses use RIP.

What Is RIP? Restricted Isometry Property: RIP_k(A) ∈ (0, 1) is the smallest σ such that, for some r > 0, (1 - σ) r ≤ ||Ax||_2 / ||x||_2 ≤ (1 + σ) r for all ||x||_0 = k. RIP_k(A) measures the conditioning of {[k columns of A]}. Candes-Tao theory requires RIP_2k(A) < 0.414. RIP_k(GA) can be arbitrarily bad for nonsingular G.
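Computing RIP_k(A) exactly is intractable, but its flavor can be probed by sampling random k-column submatrices and recording the spread of their squared singular values; the Monte Carlo sketch below only gives a lower bound on the true restricted-isometry deviation, and the sizes, trial count, and function name are illustrative assumptions.

```python
import numpy as np

def isometry_spread(A, k, trials=2000, seed=0):
    """Monte Carlo probe of RIP-type conditioning: min/max squared singular
    values of random k-column submatrices of A (a lower bound on the spread)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.inf, 0.0
    for _ in range(trials):
        S = rng.choice(A.shape[1], size=k, replace=False)
        s = np.linalg.svd(A[:, S], compute_uv=False)
        lo, hi = min(lo, s[-1] ** 2), max(hi, s[0] ** 2)
    return lo, hi

m, n, k = 64, 256, 8
A = np.random.default_rng(1).standard_normal((m, n)) / np.sqrt(m)   # columns ~ unit norm
print(isometry_spread(A, k))   # both numbers should hover around 1 for a good A
```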

Is RIP indispensable? Invariance of the solution w.r.t. nonsingular G: x* = arg min{ ||Ψx||_1 : GAx = Gb }. E.g., orthogonalize the rows of A so that GA = Q and QQ^T = I. Is GA always as good an encoding matrix as A is? Of course. But the Candes-Tao theory doesn't say so. Moreover, RIP conditions are known to be overly stringent, and RIP analysis is neither simple nor intuitive (in my view).

A Non-RIP Analysis (Z, 2008). Lemma: for any x, y ∈ R^n and p = 0, 1, √(||y||_0) < (1/2) ||x - y||_1 / ||x - y||_2 ⇒ ||y||_p < ||x||_p. Define F ≡ {x : Ax = Ax*} = x* + Null(A). Corollary: if the above holds for y = x* ∈ F and all x ∈ F \ {x*}, then x* = arg min{ ||x||_p : Ax = Ax* }, p = 0, 1. The larger the 1-norm vs 2-norm ratio in Null(A), the better. What really matters is Null(A), not the representation A.

In the entire space R^n, the 1-vs-2 norm ratio is mostly large: 1 ≤ ||v||_1 / ||v||_2 ≤ √n, but the ratio ≫ 1 in most subspaces (or WHP). A result from Kashin-Garnaev-Gluskin (1978, 84): in most (n - m)-dimensional subspaces S ⊂ R^n (or WHP), ||v||_1 / ||v||_2 > c_1 √( m / log(n/m) ), ∀ v ∈ S. In fact, Prob({good S}) → 1 as n - m → ∞. A random space will give the best chance. Random-like matrices should do OK too.
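A quick numerical probe of this claim, assuming nothing beyond a Gaussian A: draw random directions in Null(A) and compare ||v||_1/||v||_2 with √n and with √(m/log(n/m)) (the constant c_1 is ignored, and the sizes are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1024, 256
A = rng.standard_normal((m, n))

# orthonormal basis of Null(A): the last n - m right singular vectors of A
_, _, Vt = np.linalg.svd(A, full_matrices=True)
N = Vt[m:].T                                   # n x (n - m)

ratios = [np.linalg.norm(v, 1) / np.linalg.norm(v, 2)
          for v in (N @ rng.standard_normal((n - m, 1000))).T]

print(f"min ratio in Null(A): {min(ratios):.1f}, sqrt(n) = {np.sqrt(n):.1f}, "
      f"sqrt(m/log(n/m)) = {np.sqrt(m / np.log(n / m)):.1f}")
```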

The 1-vs-2 ratio in R^2: e.g., in most one-dimensional subspaces of R^2, ||v||_1 / ||v||_2 > 80% · √2.

An RIP-free Result. Let x* = arg min{ ||x||_1 : GAx = GAx̄ }. Theorem (Z, 2008): for all k < k* and ||x̄ - x̄(k)|| small, ||x* - x̄(k)||_p ≤ C(k/k*) ||P_r( x̄ - x̄(k) )||_p, p = 1 or 2, where P_r is the orthogonal projection onto Range(A^T), and k* ≥ c_1 m/[1 + log(n/m)] WHP if A_ij ~ N(0, 1). (Compare C-T: ||x* - x̄||_1 ≤ C(RIP_2k(A)) ||x̄ - x̄(k)||_1.) Key difference. RIP: the smaller the RIP constant, the more stable. RIP-free: the sparser, the more stable.

RIP-free Constant C(k/k*). C(ν) is given by an explicit formula for ν ∈ [0, 1). Figure: plot of C(ν) over ν ∈ [0, 0.9], with values increasing from roughly 2 to roughly 12.

Models with More Prior Information. Theoretical guarantees previously existed only for min{ ||Ψx||_1 : Ax = b } and min{ ||Ψx||_1 : Ax = b, x ≥ 0 }. The non-RIP analysis (Z, 2008) extends to min{ ||Ψx||_1 : Ax = b, x ∈ S }; e.g., min{ ||Ψx||_1 : Ax = b, ||x - x̂|| ≤ δ }; e.g., min{ ||Ψx||_1 + µ TV(x) : Ax = b }.

Uniform Recoverability. What types of random matrices are good? Standard normal (Candes-Tao 05, Donoho 05); Bernoulli and a couple more (Baraniuk-Davenport-DeVore-Wakin, 07); some partial orthonormal matrices (Rudelson-Vershynin, 06). Uniform recoverability (Z, 2008): all iid random matrices are asymptotically equally good as long as the (4 + δ)-th moment remains bounded; the proof uses a random determinant result by Girko (1998).

Algorithms for CS. Algorithmic challenges in CS: large dense matrices and (near) real-time processing; standard (simplex, interior-point) methods are not suitable. Optimization seems more robust than greedy methods, and in many cases it is faster than other approaches. Efficient algorithms can be built on the matrix-vector products Av and A^T v. Solution sparsity helps. Fast transforms help. Structured random matrices help.

Fixed-point Shrinkage. Algorithm for min_x ||x||_1 + µ f(x): x^{k+1} = Shrink( x^k - τ ∇f(x^k), τ/µ ), where Shrink(y, t) = y - Proj_{[-t, t]}(y). A first-order method; it follows from classic operator splitting. Discovered in signal processing by many since the 2000s. Convergence properties analyzed extensively.
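A minimal numpy version of this iteration for the common choice f(x) = (1/2)||Ax - b||^2, so that ∇f(x) = A^T(Ax - b); the step size τ = 1/||A||^2, the fixed iteration count, and the function names are illustrative choices, not the tuned rules of the actual FPC code.

```python
import numpy as np

def shrink(y, t):
    """Soft-thresholding: y - Proj_[-t, t](y), applied componentwise."""
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def fixed_point_shrinkage(A, b, mu, iters=500):
    """Iterate x^{k+1} = Shrink(x^k - tau*grad f(x^k), tau/mu) for
    min ||x||_1 + mu*f(x) with f(x) = 0.5*||Ax - b||^2."""
    tau = 1.0 / np.linalg.norm(A, 2) ** 2       # 1 / Lipschitz constant of grad f
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = shrink(x - tau * (A.T @ (A @ x - b)), tau / mu)
    return x
```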

New Convergence Results (Hale, Yin & Z, 2007). How can solution sparsity help a 1st-order method? Finite convergence: for all but a finite number of iterations, sign(x_j^k) = sign(x_j*) if x_j* ≠ 0, and x_j^k = 0 if x_j* = 0. q-linear rate depending on the reduced Hessian: lim sup_{k→∞} ||x^{k+1} - x*|| / ||x^k - x*|| ≤ (κ(H_EE) - 1) / (κ(H_EE) + 1), where H_EE is a sub-Hessian of f at x* (κ(H_EE) ≤ κ(H)), and E = supp(x*) (under a regularity condition). The sparser x* is, the faster the convergence.

FPC: Fixed-Point Continuation (say, µ_k = 2 µ_{k-1}). Figure: relative error vs. inner iteration for FPC with and without continuation, at µ = 200 and µ = 1200. (Numerical comparison results in Hale, Yin & Z, 2007.)
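Continuation solves a sequence of subproblems with increasing µ (doubling, as the slide suggests), warm-starting each from the previous solution; early, heavily regularized subproblems are cheap, and the warm starts make the final solve far faster than running the plain iteration from zero. This self-contained sketch mimics the idea only, with illustrative parameters and names.

```python
import numpy as np

def shrink(y, t):
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def fpc(A, b, mu_target, mu0=1.0, inner_iters=100):
    """Fixed-point continuation sketch for min ||x||_1 + (mu/2)||Ax - b||^2:
    solve a sequence of subproblems with mu doubling, each warm-started."""
    tau = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    mu = min(mu0, mu_target)
    while True:
        for _ in range(inner_iters):            # inner shrinkage iterations at this mu
            x = shrink(x - tau * (A.T @ (A @ x - b)), tau / mu)
        if mu >= mu_target:
            return x
        mu = min(2.0 * mu, mu_target)           # mu_k = 2*mu_{k-1}, capped at the target
```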

FPC-AS: FPC + Active Set. 1st-order CS methods slow down or fail when the sparsity approaches the threshold of the L0-L1 equivalence. Can the number of measurements be pushed to the limit? Active set: combining 1st and 2nd order for min_x ||x||_1 + (µ/2) ||Ax - b||_2^2. Use shrinkage to estimate the support and signs (1st order); fix the support and signs, solve the reduced QP (2nd order); repeat until convergence. Solved some hard problems on which other solvers failed. (Z. Wen, W. Yin, D. Goldfarb and Z, 2008.)
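A stripped-down, single-pass sketch of the active-set idea (not the actual FPC-AS algorithm): shrinkage iterations estimate the support E and signs s, then the reduced smooth problem min_z s^T z + (µ/2)||A_E z - b||^2 is solved from its normal equations. The safeguards, subspace optimization, and repetition of the real method are omitted, and all names and parameters are illustrative.

```python
import numpy as np

def shrink(y, t):
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def active_set_sketch(A, b, mu, shrink_iters=200):
    """Phase 1 (1st order): shrinkage iterations estimate support and signs.
    Phase 2 (2nd order): solve the reduced smooth problem on that support."""
    tau = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(shrink_iters):
        x = shrink(x - tau * (A.T @ (A @ x - b)), tau / mu)
    E = np.flatnonzero(x)                        # estimated support
    s = np.sign(x[E])                            # estimated signs
    AE = A[:, E]
    # min_z s^T z + (mu/2)||AE z - b||^2  =>  (AE^T AE) z = AE^T b - s/mu
    z = np.linalg.solve(AE.T @ AE + 1e-12 * np.eye(E.size), AE.T @ b - s / mu)
    x_refined = np.zeros_like(x)
    x_refined[E] = z
    return x_refined
```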

Results on a Difficult Problem: min ||x||_1 s.t. Ax = b, where A ∈ R^{m×n} is a partial DCT matrix (dense).

Table: Comparison of 6 solvers on problem Ameth6Xmeth2K151 (n = 1024, m = 512, k = 150, x_i = ±1)

Solver          Rel-Err   CPU
FPC             4.8e-01   60.8
spg-bp          4.3e-01   50.7
Cplex-primal    1.0e-12   19.8
Cplex-dual      9.7e-13   11.1
Cplex-barrier   2.7e-12   22.1
FPC-AS          7.3e-10   0.36

TV Regularization. Discrete total variation (TV) for an image: TV(u) = Σ_i ||D_i u|| (sum over all pixels), the 1-norm of the gradient magnitude. Advantage: able to capture sharp edges. Rudin-Osher-Fatemi 1992, Rudin-Osher 1994. Also useful in CS applications (e.g., MRI). Fast TV algorithms had been badly needed for many years.
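In code, the discrete isotropic TV is a few lines over forward differences; periodic boundary conditions are assumed here (an assumption made to match the FFT-based subproblem solves sketched after the next slide).

```python
import numpy as np

def tv(u):
    """Isotropic discrete TV: sum over pixels of the gradient magnitude ||D_i u||."""
    dx = np.roll(u, -1, axis=1) - u     # forward differences, periodic boundary
    dy = np.roll(u, -1, axis=0) - u
    return np.sum(np.sqrt(dx**2 + dy**2))
```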

FTVd: Fast TV Deconvolution (TVL2): min_u Σ_i ||D_i u|| + (µ/2) ||Ku - f||_2^2. Introducing w_i ∈ R^2 (grayscale) and a quadratic penalty: min_{u,w} Σ_i ( ||w_i|| + (β/2) ||w_i - D_i u||^2 ) + (µ/2) ||Ku - f||_2^2. In theory, u(β) → u* as β → ∞. In practice, β = 200 suffices. Alternating minimization: for fixed u, the {w_i} are solved by 2D shrinkage at O(N) cost; for fixed {w_i}, u is solved by 2 FFTs at O(N log N) cost. Extended to color (w_i ∈ R^2 → R^6), TVL1, and CS (Yang-Wang-Yin-Z, 07-08).
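Below is a minimal numpy sketch of this alternating-minimization scheme, specialized to denoising (K = I) with periodic boundary conditions so that the u-subproblem is solved by FFTs; the blur-kernel handling, color/TVL1 extensions, stopping rules, and parameter values of the actual FTVd code are not reproduced, and the function name is made up.

```python
import numpy as np

def tvl2_denoise(f, mu=8.0, beta=200.0, iters=40):
    """Alternating minimization for min_u sum_i ||D_i u|| + (mu/2)||u - f||^2
    (the FTVd splitting with K = I): w-step by 2-D shrinkage, u-step by FFTs."""
    Dx = lambda u: np.roll(u, -1, axis=1) - u        # forward differences, periodic
    Dy = lambda u: np.roll(u, -1, axis=0) - u
    DxT = lambda v: np.roll(v, 1, axis=1) - v        # adjoint operators
    DyT = lambda v: np.roll(v, 1, axis=0) - v

    # transfer functions of Dx, Dy under periodic BC: FFT of their impulse responses
    delta = np.zeros_like(f, dtype=float)
    delta[0, 0] = 1.0
    Fdx, Fdy = np.fft.fft2(Dx(delta)), np.fft.fft2(Dy(delta))
    denom = beta * (np.abs(Fdx) ** 2 + np.abs(Fdy) ** 2) + mu

    u = f.astype(float)
    for _ in range(iters):
        # w-step: closed-form 2-D shrinkage of the gradient field, O(N)
        gx, gy = Dx(u), Dy(u)
        mag = np.sqrt(gx ** 2 + gy ** 2)
        scale = np.maximum(mag - 1.0 / beta, 0.0) / np.maximum(mag, 1e-12)
        wx, wy = scale * gx, scale * gy
        # u-step: (beta*(Dx'Dx + Dy'Dy) + mu*I) u = beta*(Dx'wx + Dy'wy) + mu*f, via FFTs
        rhs = beta * (DxT(wx) + DyT(wy)) + mu * f
        u = np.real(np.fft.ifft2(np.fft.fft2(rhs) / denom))
    return u
```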

Figure: FTVd vs. Lagged Diffusivity (LD): running times (s) vs. blur kernel size (hsize, 4 to 20) on two tests (Test 1: Lena, 512 by 512; Test 2: Man, 1024 by 1024).

Color Deblurring with Impulsive Noise. Figure: results for RV noise at 40% (µ: 8, t: 161s, SNR: 16dB), 50% (µ: 4, t: 181s, SNR: 14dB), and 60% (µ: 2, t: 204s, SNR: 10dB).

Extension to CS. RecPF: Reconstruction from Partial Fourier data: min_u TV(u) + λ ||Ψu||_1 + µ ||F_p u - f_p||_2^2. Based on the same splitting and alternating idea (Yang-Z-Yin, 08). Matlab code: http://www.caam.rice.edu/~optimization/l1/recpf. Figure: 250×250 images, 17% coefficients, CPU 0.2s.

Concluding Remarks. What I have learned so far: CS can reduce data acquisition costs: less data, (almost) the same content. CS poses many algorithmic challenges: optimization algorithms for various models. Case-by-case studies are often required: finding and exploiting structures. Still a long way to go from theory to practice, but the potential and promise are real. Will CS be revolutionary?

Compressive Sensing Resources: http://www.dsp.ece.rice.edu/cs/. Papers and software (FPC, FTVd and RecPF) available at: http://www.caam.rice.edu/~optimization/l1. Thank You!

Happy Lunar New Year!