Enhanced Compressive Sensing and More

Enhanced Compressive Sensing and More
Yin Zhang
Department of Computational and Applied Mathematics, Rice University, Houston, Texas, U.S.A.
Nonlinear Approximation Techniques Using L1, Texas A&M University, May 16th, 2008

Outline
Collaborators: Wotao Yin, Elaine Hale. Students: Yilun Wang, Junfeng Yang. Acknowledgment: NSF DMS-0442065.
The outline of the talk:
- Compressive Sensing (CS): when is l0 = l1?
- An accessible proof and Enhanced CS
- Rice L1-Related Optimization Project
- An L1 algorithm: FPC
- A TV algorithm: FTVd
- Numerical results

Compressive Sensing (CS)
Recover a sparse signal from incomplete data:
- Unknown signal x* ∈ R^n
- Measurements: b = Ax* ∈ R^m, m < n
- x* is sparse: #nonzeros = ||x*||_0 < m
One solution to two problems?
- l0-problem: min{ ||x||_0 : Ax = b } -- the sparsest solution (hard)
- l1-problem: min{ ||x||_1 : Ax = b } -- a linear programming solution (easy)
Recoverability: when does the same x* solve both?
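
To make the "easy" problem concrete, here is a minimal sketch (not the talk's software) that solves the l1 problem as a linear program via the standard splitting x = u − v with u, v ≥ 0; the sizes and the SciPy-based solver are illustrative assumptions only.

```python
# Minimal sketch (not the talk's FPC/FTVd software): recover a sparse x* from
# b = A x* by solving min{ ||x||_1 : Ax = b } as a linear program with the
# splitting x = u - v, u >= 0, v >= 0. Sizes below are illustrative.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, k = 200, 80, 10                          # signal length, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)   # random Gaussian sensing matrix
b = A @ x_true

c = np.ones(2 * n)                             # sum(u) + sum(v) = ||x||_1
A_eq = np.hstack([A, -A])                      # A u - A v = b
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
x_hat = res.x[:n] - res.x[n:]
print("relative recovery error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```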

CS Recoverability
When does the following happen (with b = Ax*)?
{x*} = arg min{ ||x||_0 : Ax = b } = arg min{ ||x||_1 : Ax = b }
Answer: for a random A ∈ R^{m×n}, whenever ||x*||_0 < c·m/log(n/m).
Candes-Romberg-Tao, Donoho et al., 2005; Rudelson-Vershynin, 2005, 2006; Baraniuk-Davenport-DeVore-Wakin, 2007; ...

Recoverability Guarantees
Theoretical guarantees are available for:
min{ ||Φx||_1 : Ax = b },  min{ ||Φx||_1 : Ax = b, x ≥ 0 }  (Donoho-Tanner 2005, Z 2005)
What about these convex models?
- min{ ||Φx||_1 + µ·TV(x) : Ax = b }
- min{ ||Φx||_1 + µ||x − x̂|| : Ax = b }
- min{ ||Φx||_1 : Ax = b, Bx ≤ c, x ∈ [l, u] }

CS Analysis
When is l0 = l1?
- Most analyses are based on the notion of RIP: the Restricted Isometry Property
- Or on counting faces of polyhedrons
- Derivations are quite involved and not transparent
Can CS analysis be generalized to more models? Is there a simpler, gentler, more general analysis?
Yes, using the Kashin-Garnaev-Gluskin (KGG) inequality. (Extensions in Z, CAAM Report TR05-09)

KGG Result
l1-norm vs. l2-norm: sqrt(n) ≥ ||v||_1 / ||v||_2 ≥ 1 for all v ∈ R^n \ {0}.
However, ||v||_1 / ||v||_2 >> 1 in most subspaces of R^n.
Theorem (Kashin 77, Garnaev-Gluskin 84): Let A ∈ R^{m×n} be iid Gaussian. With probability > 1 − e^{−c_1(n−m)},
||v||_1 / ||v||_2 ≥ c_2 sqrt(m / log(n/m)),  for all v ∈ Null(A) \ {0},
where c_1 and c_2 are absolute constants.
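
A rough numerical sanity check of the KGG phenomenon (my own illustration, with assumed sizes): sample random directions in Null(A) for a Gaussian A and compare their l1/l2 ratios with the trivial bound sqrt(n) and with sqrt(m/log(n/m)). Random sampling only estimates typical ratios, not the exact minimum over the subspace.

```python
# Rough check of the KGG inequality (illustration only, assumed sizes): sample
# directions in Null(A) for a Gaussian A and look at their l1/l2 ratios.
import numpy as np

rng = np.random.default_rng(1)
n, m = 400, 200
A = rng.standard_normal((m, n))

_, _, Vt = np.linalg.svd(A)                    # full SVD of A
N = Vt[m:].T                                   # columns form a basis of Null(A)

V = N @ rng.standard_normal((n - m, 5000))     # random null-space directions
ratios = np.linalg.norm(V, 1, axis=0) / np.linalg.norm(V, 2, axis=0)
print("smallest sampled l1/l2 ratio:", ratios.min())
print("trivial upper bound sqrt(n) :", np.sqrt(n))
print("sqrt(m / log(n/m))          :", np.sqrt(m / np.log(n / m)))
```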

A Picture in 2D
(Figure omitted.) In most subspaces, ||v||_1 / ||v||_2 ≥ 0.8·sqrt(2) > 1.1.

Sparsest Point vs. lp-Minimizer, p ∈ (0, 1]
When does the following hold on C ⊂ R^n?
{x*} = arg min_{x∈C} ||x||_0 = arg min_{x∈C} ||x||_p
This means: (i) l0 = lp on C, and (ii) uniqueness of x*.
A sufficient condition entirely in terms of sparsity:
||x*||_0 < (1/2) ||v||_1^p / ||v||_2^p,  for all v ∈ (C − x*) \ {0}
(10-line, elementary proof skipped)
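
The skipped elementary argument plausibly runs along the following lines (my reconstruction, not quoted from the talk), with S = supp(x*), k = ||x*||_0, and a feasible perturbation v ∈ (C − x*) \ {0}; it shows x* is the unique lp minimizer, and a similar counting argument gives uniqueness of the sparsest point.

```latex
% S = supp(x^*), k = \|x^*\|_0, and v \in (C - x^*) \setminus \{0\} feasible.
% Assume k < \tfrac12 \, \|v\|_1^p / \|v\|_2^p for all such v.
\begin{align*}
\|x^* + v\|_p^p
  &\ge \|x^*\|_p^p - \|v_S\|_p^p + \|v_{S^c}\|_p^p
  && \text{(subadditivity of } t \mapsto t^p,\ 0 < p \le 1)\\
\|v_S\|_p^p
  &\le k\,\|v\|_\infty^p \le k\,\|v\|_2^p
   < \tfrac12\,\|v\|_1^p \le \tfrac12\,\|v\|_p^p
  && \text{(assumption, and } \|v\|_1 \le \|v\|_p \text{ for } p \le 1)\\
\Longrightarrow\ \|v_{S^c}\|_p^p
  &= \|v\|_p^p - \|v_S\|_p^p > \|v_S\|_p^p,
\end{align*}
% hence \|x^* + v\|_p^p > \|x^*\|_p^p: no other point of C does as well,
% so x^* is the unique \ell_p minimizer over C.
```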

Recoverability Proved and Generalized
For C = { x : Ax = b, x ∈ S } with S ⊂ R^n, we have C − x* = Null(A) ∩ (S − x*).
[l0 = lp]  ⇐  ||x*||_0 < (1/2) ||v||_1^p / ||v||_2^p,  for all v ∈ Null(A) ∩ (S − x*) \ {0}
For a Gaussian random A, by KGG, [l0 = lp] holds on C with high probability whenever ||x*||_0 < c(p)·m/log(n/m).
(Stability results are also available for noisy data)

Enhanced Compressive Sensing
ECS: with prior information x* ∈ S,
min{ ||x||_1 : Ax = b, x ∈ S }
We have shown that ECS recoverability is at least as good as CS recoverability.
More prior information (besides nonnegativity)?
- min{ ||x||_1 + µ·TV(x) : Ax = b }   ↔   S = { x : TV(x) ≤ δ }
- min{ ||x||_1 + µ||x − x̂|| : Ax = b }   ↔   S = { x : ||x − x̂|| ≤ δ }
... and many more possibilities.
More ECS models mean more algorithmic challenges for optimizers.

ECS vs. CS: A case study
Unknown signal x* close to a prior sparse signal x_p:
ECS: min{ ||x||_1 : Ax = b, ||x − x_p||_1 ≤ δ }
With 10% differences in supports and nonzero values:
(Figure: recovery rate and maximum relative error plots comparing ECS and CS.)

Rice L1-Related Optimization Project
Computational & Applied Mathematics Dept. in the Engineering School: Y. Z., Wotao Yin (Elaine Hale, left), and students.
Optimization algorithmic challenges in CS:
- Large-scale, (near) real-time processing
- Dense matrices, non-smooth objectives
- Traditional (simplex, interior-point) methods have trouble.
Can convex optimization be practical in CS?

Convex Optimization Works in CS
Convex optimization is generally more robust with respect to noise. Is it too slow for large-scale applications? In many cases it is faster than other approaches:
- Solution sparsity helps.
- Fast transforms help.
- Structured random matrices help.
- Efficient algorithms can be built on the products Av and A^T v.
Real-time algorithms are possible for problems with special structure (like MRI).
Two examples from our work: FPC and FTVd.

Forward-Backward Operator Splitting
Derivation (ideas dating back to the 1950s), for min_x ||x||_1 + µ f(x):
0 ∈ ∂||x||_1 + µ∇f(x)
⟺ −τ∇f(x) ∈ (τ/µ) ∂||x||_1
⟺ x − τ∇f(x) ∈ x + (τ/µ) ∂||x||_1 = (I + (τ/µ) ∂||·||_1)(x)
⟺ {x} = (I + (τ/µ) ∂||·||_1)^{-1} (x − τ∇f(x))
⟺ x = shrink(x − τ∇f(x), τ/µ)
Equivalence to a fixed point:
x minimizes ||x||_1 + µ f(x)  ⟺  x = Shrink(x − τ∇f(x), τ/µ)

Fixed-point Shrinkage
Algorithm for min_x ||x||_1 + µ f(x):
x^{k+1} = Shrink(x^k − τ∇f(x^k), τ/µ)
where Shrink(y, t) = y − Proj_{[−t,t]}(y) (componentwise soft-thresholding).
- A first-order method that follows from FB-operator splitting
- Discovered in signal processing by many since the 2000s
- Convergence properties analyzed extensively
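
A minimal sketch of this iteration for the common choice f(x) = (1/2)||Ax − b||^2; function names and defaults are illustrative, not the released FPC code.

```python
# Minimal sketch of the fixed-point shrinkage iteration for
#   min_x ||x||_1 + mu * f(x),  with f(x) = 0.5 * ||Ax - b||^2,
# i.e. x^{k+1} = Shrink(x^k - tau * grad f(x^k), tau/mu).
import numpy as np

def shrink(y, t):
    # Soft-thresholding: y - Proj_[-t, t](y), applied componentwise.
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def fixed_point_shrinkage(A, b, mu, tau, iters=500):
    # For convergence, tau should satisfy tau < 2 / lambda_max(A^T A).
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)               # gradient of f at x
        x = shrink(x - tau * grad, tau / mu)
    return x
```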

New Convergence Results (Hale, Yin & Z, 2007)
How can solution sparsity help a 1st-order method?
Finite convergence: for all but a finite number of iterations,
- x_j^k = 0 if x*_j = 0,
- sign(x_j^k) = sign(x*_j) if x*_j ≠ 0.
q-linear rate depending on the reduced Hessian:
limsup_{k→∞} ||x^{k+1} − x*|| / ||x^k − x*|| ≤ (κ(H_EE) − 1) / (κ(H_EE) + 1)
where H_EE is a sub-Hessian of f at x* (with κ(H_EE) ≤ κ(H)) and E = supp(x*), under a regularity condition.
The sparser x* is, the faster the convergence.

FPC: Fixed-Point Continuation
x(µ) := arg min_x ||x||_1 + µ f(x)
Idea: approximately follow the path x(µ).
FPC: set µ = µ_0 < µ_max and choose x^0. Do until µ ≥ µ_max:
1. Starting from x^0, do shrinkage iterations until converged.
2. Set µ = 2µ and set x^0 to the previous solution.
End Do
Smaller µ means a sparser x(µ) and faster convergence; convergence is also fast for larger µ thanks to warm starts. Generally effective, though it may slow down near the boundary.
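
A sketch of the continuation loop just described, doubling µ and warm-starting each subproblem from the previous solution; it reuses shrink() from the preceding sketch, and the parameter choices are illustrative, not the paper's defaults.

```python
# Sketch of the FPC continuation loop: solve for increasing mu, doubling mu
# each time and warm-starting from the previous solution.
import numpy as np

def fpc(A, b, mu_max, tau, mu0=None, inner_iters=200):
    mu = mu0 if mu0 is not None else mu_max / 2.0**6   # start with a small mu
    x = np.zeros(A.shape[1])                           # initial x^0
    while True:
        for _ in range(inner_iters):                   # inner shrinkage loop
            x = shrink(x - tau * (A.T @ (A @ x - b)), tau / mu)
        if mu >= mu_max:                               # reached the target mu
            return x
        mu = min(2.0 * mu, mu_max)                     # continuation step
```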

Continuation Makes It Kick
(Figure: relative error vs. inner iteration for FPC with and without continuation, shown for µ = 200 and µ = 1200.)
(Numerical comparison results in Hale, Yin & Z 2007)

Random Kronecker Products Plus FPC
Fully random matrices are computationally costly.
Random Kronecker products: A = A_1 ⊗ A_2 ∈ R^{m×n}, where we only store A_1 ((m/p) × (n/q)) and A_2 (p × q).
Ax = vec(A_2 X A_1^T), with x = vec(X), also requires much less computation.
Trial: n = 1M, m = 250k, ||x*||_0 = 25k, p = 500, q = 1000. Two 500-by-1000 matrices stored (a reduction of 0.5M times). FPC solved the problem in 47s on a PC.
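
A toy demonstration of the matrix-free product (assumed small sizes, not the 1M-variable trial): the Kronecker identity (A_1 ⊗ A_2) vec(X) = vec(A_2 X A_1^T) lets the solver apply A without ever forming it.

```python
# Toy demonstration (assumed small sizes): apply A = A1 (kron) A2 to a vector
# without forming A, via (A1 ⊗ A2) vec(X) = vec(A2 X A1^T), vec = column stacking.
import numpy as np

rng = np.random.default_rng(2)
m1, n1, p, q = 20, 30, 10, 15                    # A1 is m1 x n1, A2 is p x q
A1 = rng.standard_normal((m1, n1))
A2 = rng.standard_normal((p, q))
x = rng.standard_normal(n1 * q)                  # x = vec(X), X of size q x n1

X = x.reshape(q, n1, order="F")                  # undo the column stacking
y_fast = (A2 @ X @ A1.T).reshape(-1, order="F")  # never builds the (m1*p) x (n1*q) matrix
y_full = np.kron(A1, A2) @ x                     # explicit reference product
print(np.allclose(y_fast, y_full))               # True
```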

Total Variation Regularization
Discrete total variation (TV) of an image u:
TV(u) = Σ_i ||D_i u||   (sum over all pixels)
i.e., the 1-norm of the vector of 2-norms of the discrete gradients.
Advantage: able to capture sharp edges.
Rudin-Osher-Fatemi 1992, Rudin-Osher 1994; recent survey: Chan and Shen 2006.
Non-smoothness and non-linearity cause computational difficulties. Can TV be competitive in speed with other regularizers (say, Tikhonov-like ones)?
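
For concreteness, a small helper (my own illustration, not from the talk) computing the isotropic discrete TV of a 2D image with forward differences and a replicated boundary:

```python
# Isotropic discrete TV: TV(u) = sum over pixels of ||D_i u||, where D_i u is
# the forward-difference gradient at pixel i (replicated boundary).
import numpy as np

def total_variation(u):
    dx = np.diff(u, axis=1, append=u[:, -1:])  # horizontal differences
    dy = np.diff(u, axis=0, append=u[-1:, :])  # vertical differences
    return np.sqrt(dx**2 + dy**2).sum()
```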

Fast TV Deconvolution (FTVd)
TV + L2 model:
min_u Σ_i ||D_i u|| + (µ/2) ||Ku − f||^2
Introducing w_i ∈ R^2 and a quadratic penalty (Courant 1943):
min_{u,w} Σ_i ( ||w_i|| + (β/2) ||w_i − D_i u||^2 ) + (µ/2) ||Ku − f||^2
In theory, u(β) → u* as β → ∞. In practice, β = 200 suffices.
Alternating minimization:
- For fixed u, each w_i can be solved by a 2D shrinkage.
- For fixed w, the quadratic can be minimized by 3 FFTs.
(Wang, Yang, Yin & Z, 2007, 2008)
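
The w-subproblem has a closed form: for fixed u, each w_i solves min_w ||w|| + (β/2)||w − D_i u||^2, which is a 2D shrinkage of the gradient vector. A sketch of that step, vectorized over all pixels (helper name and the tiny safeguard constant are my own choices):

```python
# Closed-form w-step of the alternating minimization:
#   min_w ||w|| + (beta/2) ||w - g||^2,  g = D_i u,
# is solved by  w = max(||g|| - 1/beta, 0) * g / ||g||.
import numpy as np

def shrink2d(gx, gy, beta):
    mag = np.sqrt(gx**2 + gy**2)                         # ||D_i u|| per pixel
    scale = np.maximum(mag - 1.0 / beta, 0.0) / np.maximum(mag, 1e-12)
    return scale * gx, scale * gy
```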

FTVd
FTVd is a long-missing member of the half-quadratic class (Geman-Yang 95), using a 2D Huber-like approximation.
FTVd convergence:
- Finite convergence for the components w_i^k → 0 (sparsity helps).
- Strong q-linear convergence rates for the others; rates depend on submatrices (sparsity helps).
- Continuation accelerates practical convergence.
FTVd performance:
- Orders of magnitude faster than Lagged Diffusivity.
- Comparable in speed with Matlab deblurring, with better quality.
TV models have finally caught up in speed.

FTVd vs. Lagged Diffusivity
(Figure: running times in seconds vs. hsize for FTVd and Lagged Diffusivity (LD) on two tests.)
(Test 1: Lena 512 by 512; Test 2: Man 1024 by 1024)

FTVd vs. Others
- Blurry & noisy input: SNR 5.19 dB
- ForWaRD: SNR 12.46 dB, t = 4.88 s
- FTVd (β = 2^5): SNR 12.58 dB, t = 1.83 s
- deconvwnr: SNR 11.51 dB, t = 0.05 s
- deconvreg: SNR 11.20 dB, t = 0.34 s
- FTVd (β = 2^7): SNR 13.11 dB, t = 14.10 s

FTVd Extensions
Multi-channel image deblurring (paper forthcoming):
- cross-channel or within-channel blurring
- a small number of FFTs per iteration
- convergence results have been generalized
TV+L1 deblurring models (codes hard to find).
Other extensions:
- higher-order TV regularizers (reducing stair-casing)
- multi-term regularizations, multiple splittings
- locally weighted TV, weighted shrinkage
- reconstruction from partial Fourier coefficients (MRI):
min_u TV(u) + λ||Φu||_1 + (µ/2)||F_p(u) − f_p||^2

MRI Reconstruction from 15% Coefficients
(Figure: six 250 x 250 images reconstructed with SNR 14.74, 16.12, 17.72, 16.40, 13.86 and 17.27 dB in t = 0.09, 0.09, 0.08, 0.10, 0.08 and 0.10 s, respectively.)
250 by 250 images: time about 0.1 s on a PC (3 GHz Pentium D).

Numerical Results: Color Image Deblurring
Comparison to the Matlab toolbox on the 512 x 512 Lena image (original 512 x 512; blurry & noisy input SNR: 5.1 dB):
- deconvlucy: SNR = 6.5 dB, t = 8.9 s
- deconvreg: SNR = 10.8 dB, t = 4.4 s
- deconvwnr: SNR = 10.8 dB, t = 1.4 s
- MxNopt: SNR = 16.3 dB, t = 1.6 s

Summary: Take-Home Messages
- CS recoverability can be proved in one page via KGG.
- Prior information can never degrade CS recoverability, but may significantly enhance it.
- 1st-order methods can be fast thanks to solution sparsity (finite convergence, rates depending on sub-Hessians).
- TV models can be solved quickly if structure is exploited.
- Continuation is necessary to make algorithms practical.
- Rice has a long tradition in optimization algorithms/software.

The End
Software FPC and FTVd available at: http://www.caam.rice.edu/optimization/l1
Thank You!