Bhaskar Rao Department of Electrical and Computer Engineering University of California, San Diego


Bhaskar Rao Department of Electrical and Computer Engineering University of California, San Diego 1

Outline: Course Outline; Motivation for Course; Sparse Signal Recovery Problem; Applications; Computational Algorithms (Greedy Search, l1 norm minimization, Bayesian Methods); Performance Guarantees; Simulations; Conclusions 2

Topics: Sparse Signal Recovery Problem and Compressed Sensing; Uniqueness and the Spark; Greedy search techniques and their performance evaluation (coherence condition); l1 methods and their performance evaluation (restricted isometry property, RIP); Bayesian methods; Extensions (Block Sparsity, Multiple Measurement Vectors); Dictionary Learning; Message passing algorithms 3

Books
- Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, by Michael Elad
- Compressed Sensing: Theory and Applications, edited by Yonina C. Eldar and Gitta Kutyniok
- An Introduction to Compressive Sensing, collection editors: Richard Baraniuk, Mark A. Davenport, Marco F. Duarte, Chinmay Hegde
- A Mathematical Introduction to Compressive Sensing, by Simon Foucart and Holger Rauhut 4

Administrative details
- Who should take this class, and what background? Second-year graduate students (S/U recommended); optimization theory, estimation theory; an application to motivate the work is recommended
- Grades: homework, project
- Office hours: Tuesday 1-2pm, Office: Jacobs Hall 6407, Email: brao@ucsd.edu
- Class website: dsp.ucsd.edu 5

Outline: Motivation for Course; Sparse Signal Recovery Problem; Applications; Computational Algorithms (Greedy Search, l1 norm minimization, Bayesian Methods); Performance Guarantees; Simulations; Conclusions 6

Motivation: The concept of sparsity has many potential applications, and unification of the theory will provide synergy. Methods developed for solving the sparse signal recovery problem can be valuable tools for signal processing practitioners. Many interesting recent developments make the subject timely. 7

Outline: Motivation for Course; Sparse Signal Recovery Problem; Applications; Computational Algorithms (Greedy Search, l1 norm minimization, Bayesian Methods); Performance Guarantees; Simulations; Conclusions 8

Sparse Signal Recovery: Problem Description
b = A x + ε
- b is the n x 1 measurement vector
- A is the n x m measurement/dictionary matrix, with m >> n
- x is the m x 1 desired vector, which is sparse with r nonzero entries, r << m
- ε is the measurement noise 9
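As a concrete illustration of this model, here is a minimal sketch (with assumed sizes n = 20, m = 50, r = 4, not taken from the slides) that generates one instance of b = A x + ε:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 20, 50, 4                              # assumed sizes: n measurements, m unknowns, r nonzeros

A = rng.standard_normal((n, m)) / np.sqrt(n)     # n x m dictionary matrix, m >> n
x = np.zeros(m)
support = rng.choice(m, size=r, replace=False)   # r << m nonzero entries
x[support] = rng.standard_normal(r)

eps = 0.01 * rng.standard_normal(n)              # measurement noise
b = A @ x + eps                                  # n x 1 measurement vector
```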

Early Works
- R. R. Hocking and R. N. Leslie, Selection of the Best Subset in Regression Analysis, Technometrics, 1967.
- S. Singhal and B. S. Atal, Amplitude Optimization and Pitch Estimation in Multipulse Coders, IEEE Trans. Acoust., Speech, Signal Processing, 1989.
- S. D. Cabrera and T. W. Parks, Extrapolation and spectral estimation with iterative weighted norm modification, IEEE Trans. Acoust., Speech, Signal Processing, April 1991.
- Many more works.
Our first work: I. F. Gorodnitsky, B. D. Rao and J. George, Source Localization in Magnetoencephalography using an Iterative Weighted Minimum Norm Algorithm, IEEE Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, pp. 167-171, Oct. 1992. 10

Problem Statement
Noise-free case: given a measurement vector b and a dictionary A, find the weights x that solve
min_x Σ_{i=1}^m I(x_i ≠ 0) subject to b = A x,
where I(.) is the indicator function.
Noisy case: given a measurement vector b and a dictionary A, find the weights x that solve
min_x Σ_{i=1}^m I(x_i ≠ 0) subject to ||b − A x||_2 ≤ β 11

Complexity
- Searching over all possible subsets means a search over a total of C(m, r) subsets: combinatorial complexity. With m = 30, n = 20 and r = 10 there are about 3 x 10^7 subsets (very complex).
- A branch and bound algorithm can be used to find the optimal solution. The space of subsets searched is pruned, but the search may still be very complex.
- The indicator function is not continuous and so is not amenable to standard optimization tools.
- Challenge: find low-complexity methods with acceptable performance. 12
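The count quoted above can be checked directly; this one-liner is just the binomial coefficient C(30, 10):

```python
from math import comb

print(comb(30, 10))   # 30045015, i.e. about 3 x 10^7 candidate supports
```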

Outline: Motivation for Course; Sparse Signal Recovery Problem; Applications; Computational Algorithms (Greedy Search, l1 norm minimization, Bayesian Methods); Performance Guarantees; Simulations; Conclusions 13

Applications
- Signal Representation (Mallat, Coifman, Wickerhauser, Donoho, ...)
- EEG/MEG (Leahy, Gorodnitsky, Ioannides, ...)
- Functional Approximation and Neural Networks (Chen, Natarajan, Cun, Hassibi, ...)
- Bandlimited extrapolations and spectral estimation (Papoulis, Lee, Cabrera, Parks, ...)
- Speech Coding (Ozawa, Ono, Kroon, Atal, ...)
- Sparse channel equalization (Fevrier, Greenstein, Proakis, ...)
- Compressive Sampling (Donoho, Candes, Tao, ...)
- Magnetic Resonance Imaging (Lustig, ...)
- Cognitive Radio (Eldar, ...) 14

DFT Example
Measurement: y[l] = 2(cos ω0 l + cos ω1 l), l = 0, 1, 2, ..., n − 1, with n = 64, ω0 = (2π/64)(33/2), ω1 = (2π/64)(34/2).
Dictionary elements: a_l^(m) = [1, e^{jω_l}, e^{j2ω_l}, ..., e^{j(n−1)ω_l}]^T, with ω_l = (2π/m) l. Consider m = 64, 128, 256 and 512.
Questions: What is the result of a zero-padded DFT? When viewed as the problem of solving a linear system of equations (dictionary), what solution does the DFT give us? Are there more desirable solutions for this problem? 15

DFT Example
Note that
b = a_33^(128) + a_34^(128) + a_94^(128) + a_95^(128)
  = a_66^(256) + a_68^(256) + a_188^(256) + a_190^(256)
  = a_132^(512) + a_136^(512) + a_376^(512) + a_380^(512)
Consider the linear system of equations b = A^(m) x. The frequency components in the data are in the dictionaries A^(m) for m = 128, 256, 512. What solution among all possible solutions does the DFT compute? 16

DFT Example
[Figure: zero-padded DFT magnitudes of the measurement for m = 64, 128, 256 and 512.] 17
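A short sketch of the zero-padded DFT computation behind this figure (the signal parameters follow the slide; np.fft.fft zero-pads when m > n). The DFT returns a dense coefficient vector, one valid solution of b = A^(m) x but not the 4-sparse one; as noted later in the deck, it is in fact the minimum 2-norm solution.

```python
import numpy as np

n = 64
l = np.arange(n)
w0 = 2 * np.pi / 64 * (33 / 2)
w1 = 2 * np.pi / 64 * (34 / 2)
b = 2 * (np.cos(w0 * l) + np.cos(w1 * l))

for m in (64, 128, 256, 512):
    X = np.abs(np.fft.fft(b, n=m))                   # zero-padded m-point DFT magnitude
    print(m, np.count_nonzero(X > 0.01 * X.max()))   # many active bins, far from a 4-sparse solution
```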

Sparse Channel Estimation
Received sequence: r(i) = Σ_{j=0}^{m−1} s(i − j) c(j) + ε(i), i = 0, 1, ..., n − 1,
where s is the training sequence, c is the (sparse) channel impulse response, and ε is the noise. 18

Example: Sparse Channel Estimation
Formulated as a sparse signal recovery problem:
  [ r(0)   ]   [ s(0)    s(-1)   ...  s(-m+1) ] [ c(0)   ]   [ ε(0)   ]
  [ r(1)   ] = [ s(1)    s(0)    ...  s(-m+2) ] [ c(1)   ] + [ ε(1)   ]
  [  ...   ]   [  ...                         ] [  ...   ]   [  ...   ]
  [ r(n-1) ]   [ s(n-1)  s(n-2)  ...  s(n-m)  ] [ c(m-1) ]   [ ε(n-1) ]
Any relevant sparse recovery algorithm can then be used to estimate the sparse channel coefficients. 19
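A sketch of building this convolution matrix (sizes and channel taps are assumptions, not from the slide), using the convention s(i) = 0 for i < 0 so the matrix is Toeplitz:

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(1)
n, m = 100, 40                        # assumed sizes, not from the slide

s = rng.choice([-1.0, 1.0], size=n)   # training sequence, with s(i) = 0 for i < 0
c = np.zeros(m)
c[[2, 11, 27]] = [1.0, -0.6, 0.3]     # sparse channel impulse response

# S[i, j] = s(i - j): first column is s(0..n-1), first row is [s(0), 0, ..., 0]
S = toeplitz(s, np.r_[s[0], np.zeros(m - 1)])
r = S @ c + 0.01 * rng.standard_normal(n)
# (r, S) can now be handed to any sparse recovery algorithm to estimate c
```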

MEG/EEG Source Localization
[Figure: forward model relating the source space (x) to the sensor space (b) via Maxwell's equations.]

Compressive Sampling
- D. Donoho, Compressed Sensing, IEEE Trans. on Information Theory, 2006
- E. Candes and T. Tao, Near Optimal Signal Recovery from Random Projections: Universal Encoding Strategies, IEEE Trans. on Information Theory, 2006 21

Compressive Sampling: Transform Coding
[Diagram: a signal represented in a transform basis Ψ with sparse coefficient vector x.]
What is the problem here? Sampling at the Nyquist rate, then keeping only a small number of nonzero coefficients. Can we directly acquire the signal below the Nyquist rate? 22

Compressive Sampling
[Diagram: transform coding, y = Ψ x with x sparse, versus compressive sampling, b = Φ y = Φ Ψ x = A x.] 23

Compressive Sampling
[Diagram: b = Φ y = Φ Ψ x = A x, with A = Φ Ψ.]
Computation: 1. Solve for x such that A x = b. 2. Reconstruction: y = Ψ x.
Issues: need to recover the sparse signal x under the constraint A x = b; need to design the sampling matrix Φ. 24
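A sketch of this pipeline with assumed sizes, using the orthonormal DCT as the sparsifying basis Ψ and a Gaussian random matrix as Φ (neither choice is prescribed by the slide):

```python
import numpy as np
from scipy.fft import idct

rng = np.random.default_rng(4)
N, n, r = 256, 64, 5                          # assumed: signal length, measurements, sparsity

Psi = idct(np.eye(N), axis=0, norm='ortho')   # columns form an orthonormal (DCT) basis
x = np.zeros(N)
x[rng.choice(N, size=r, replace=False)] = rng.standard_normal(r)
y = Psi @ x                                   # signal, sparse in the basis Psi

Phi = rng.standard_normal((n, N)) / np.sqrt(n)   # random sampling matrix, n << N
b = Phi @ y                                   # compressive measurements
A = Phi @ Psi                                 # solve A x = b with any sparse solver,
                                              # then reconstruct y_hat = Psi @ x_hat
```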

Robust Linear Regression
X, y: data; c: regression coefficients; n: model noise. Model: y = X c + n.
The model noise splits into w, a sparse component (outliers), and ε, a Gaussian component (regular error).
Transform into an overcomplete representation: y = X c + Φ w + ε with Φ = I, i.e., y = [X, Φ] [c; w] + ε.
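A sketch of this overcomplete reformulation with assumed sizes and outlier positions, and Φ = I as on the slide:

```python
import numpy as np

rng = np.random.default_rng(5)
N, p = 50, 3                                   # assumed sizes
X = rng.standard_normal((N, p))
c = np.array([1.0, -2.0, 0.5])                 # regression coefficients
eps = 0.05 * rng.standard_normal(N)            # regular Gaussian error
w = np.zeros(N)
w[[7, 23]] = [4.0, -6.0]                       # sparse outlier component
y = X @ c + w + eps

A = np.hstack([X, np.eye(N)])                  # y = A @ [c; w] + eps
# estimating [c; w] with a sparsity penalty on w gives an outlier-robust fit
```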

Outline: Motivation for Course; Sparse Signal Recovery Problem; Applications; Computational Algorithms (Greedy Search, l1 norm minimization, Bayesian Methods); Performance Guarantees; Simulations; Conclusions 26

Potential Approaches
The exhaustive search has combinatorial complexity, so we need alternate strategies:
- Greedy search techniques: Matching Pursuit, Orthogonal Matching Pursuit
- Minimizing diversity measures: the indicator function is not continuous, so define surrogate cost functions that are more tractable and whose minimization leads to sparse solutions, e.g. l1 minimization
- Bayesian methods: MAP estimation; Empirical Bayes with parameterized priors (Sparse Bayesian Learning)
- Message passing algorithms 27

GREEDY SEARCH TECHNIQUES 28

Greedy Search Method: Matching Pursuit
Model: b = A x + ε. Initialize the residual r^(0) = b, and let S^(i) denote the set of indices selected after i steps.
- Select the column most aligned with the current residual: l = argmax_{1 ≤ j ≤ m} |a_j^T r^(i−1)|
- Update the index set: if l ∉ S^(i−1), set S^(i) = S^(i−1) ∪ {l}
- Remove the selected column's contribution from the residual: r^(i) = r^(i−1) − a_l a_l^T r^(i−1) = P_{a_l}^⊥ r^(i−1)
Practical stop criteria: a fixed number of iterations, ||r^(i)||_2 smaller than a threshold, or S^(i) staying the same. 29
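A minimal sketch of the matching pursuit loop just described, assuming the columns of A have unit l2 norm; the stop criteria mirror the slide (iteration cap and residual threshold):

```python
import numpy as np

def matching_pursuit(A, b, max_iter=20, tol=1e-6):
    """Matching pursuit, assuming unit-norm columns of A."""
    n, m = A.shape
    x = np.zeros(m)
    r = b.astype(float).copy()             # r^(0) = b
    for _ in range(max_iter):
        corr = A.T @ r
        l = int(np.argmax(np.abs(corr)))   # column most aligned with the residual
        x[l] += corr[l]                    # accumulate its coefficient
        r = r - A[:, l] * corr[l]          # remove its contribution from the residual
        if np.linalg.norm(r) < tol:        # practical stop criterion
            break
    return x
```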

Questions related to Matching Pursuit type algorithms: alternate search techniques; performance guarantees. 30

MINIMIZING DIVERSITY MEASURES 31

Inverse Techniques
For the system of equations A x = b, the solution set is characterized by {x_s : x_s = A^+ b + v, v ∈ N(A)}, where N(A) denotes the null space of A and A^+ = A^T (A A^T)^{-1}.
Minimum norm solution: the minimum l2 norm solution x_mn = A^+ b is a popular choice.
Noisy case: the regularized l2 norm solution is often employed and is given by x_reg = A^T (A A^T + λ I)^{-1} b. 32

Minimum 2-Norm Solution
Problem: the minimum l2 norm solution is not sparse.
Example:
A = [1 0 1; 0 1 1], b = [1; 0]
x_mn = [2/3, −1/3, 1/3]^T vs. x = [1, 0, 0]^T
The DFT also computes the minimum 2-norm solution. 33
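A sketch reproducing the numbers above: the pseudoinverse solution A^+ b for the 2 x 3 example, together with the regularized variant from the previous slide:

```python
import numpy as np

def min_norm(A, b):
    return A.T @ np.linalg.solve(A @ A.T, b)              # x_mn = A^T (A A^T)^{-1} b

def reg_min_norm(A, b, lam):
    n = A.shape[0]
    return A.T @ np.linalg.solve(A @ A.T + lam * np.eye(n), b)   # regularized l2 solution

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 0.0])
print(min_norm(A, b))      # [ 0.667, -0.333, 0.333 ]: dense, not the sparse [1, 0, 0]
```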

Diversity Measures
min_x Σ_{i=1}^m I(x_i ≠ 0) subject to b = A x
- Functionals whose minimization leads to sparse solutions.
- Many examples are found in the fields of economics, social science and information theory.
- These functionals are usually concave, which leads to difficult optimization problems. 34

Examples of Diversity Measures
l_p (p ≤ 1) diversity measure:
E^(p)(x) = Σ_{i=1}^m |x_i|^p, p ≤ 1
As p → 0,
lim_{p→0} E^(p)(x) = lim_{p→0} Σ_{i=1}^m |x_i|^p = Σ_{i=1}^m I(x_i ≠ 0)
Gaussian entropy:
E^(G)(x) = Σ_{i=1}^m ln(x_i^2 + ε) 35

l1 Diversity Measure
Noiseless case (Basis Pursuit):
min_x Σ_{i=1}^m |x_i| subject to A x = b
Noisy case, l1 regularization [Candes, Romberg, Tao]:
min_x Σ_{i=1}^m |x_i| subject to ||b − A x||_2 ≤ β
Lasso [Tibshirani], Basis Pursuit De-noising [Chen, Donoho, Saunders]:
min_x ||b − A x||_2^2 + λ Σ_{i=1}^m |x_i| 36
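A minimal basis pursuit sketch via the standard linear-programming reformulation (x = u − v with u, v ≥ 0), using scipy's linprog; this is one of many ways to solve the noiseless l1 problem:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """Solve min ||x||_1 subject to A x = b as a linear program."""
    n, m = A.shape
    c = np.ones(2 * m)                       # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([A, -A])                # constraint: A u - A v = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * m))
    u, v = res.x[:m], res.x[m:]
    return u - v

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 0.0])
print(basis_pursuit(A, b))                   # approximately [1, 0, 0]
```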

Attractiveness of l1 methods: convex optimization, with an associated rich class of optimization algorithms (interior-point methods, coordinate descent methods). Question: what is their ability to find the sparse solution? 37

Why does the diversity measure encourage sparse solutions?
min ||[x_1, x_2]^T||_p^p subject to a_1 x_1 + a_2 x_2 = b
[Figure: equal-norm contours of the l_p ball meeting the constraint line a_1 x_1 + a_2 x_2 = b in the (x_1, x_2) plane, for 0 < p < 1, p = 1 and p > 1; for p ≤ 1 the solution lands on a coordinate axis, i.e. it is sparse.] 38

Example with l1 diversity measure
A = [1 0 1; 0 1 1], b = [1; 0]
Noiseless case: x_BP = [1, 0, 0]^T (machine precision)
Noisy case: assume the measurement noise ε = [0.01, −0.01]^T
l1 regularization result: x_l1r = [0.986, 0, 8.77 x 10^-6]^T
Lasso result (λ = 0.05): x_lasso = [0.975, 0, 2.50 x 10^-5]^T 39
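For the noisy case, a hedged sketch using scikit-learn's Lasso on the same example; note that scikit-learn minimizes (1/(2n))||b − A x||_2^2 + α||x||_1, so α is not numerically the λ quoted above and the recovered values differ slightly, but the sparse support comes out the same:

```python
import numpy as np
from sklearn.linear_model import Lasso

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 0.0]) + np.array([0.01, -0.01])   # noisy measurements

# alpha chosen to roughly correspond to lambda = 0.05 under the slide's objective scaling
model = Lasso(alpha=0.0125, fit_intercept=False).fit(A, b)
print(model.coef_)                                   # approximately [0.985, 0, 0]
```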

Example with l1 diversity measure
Continue with the DFT example: y[l] = 2(cos ω0 l + cos ω1 l), l = 0, 1, 2, ..., n − 1, n = 64, ω0 = (2π/64)(33/2), ω1 = (2π/64)(34/2), m = 64, 128, 256, 512.
The DFT cannot separate the adjacent frequency components. Using l1 diversity measure minimization (m = 256):
[Figure: coefficients recovered by l1 minimization for m = 256.] 40

BAYESIAN METHODS 41

Bayesian Methods
Maximum A Posteriori (MAP) approach:
- Assume a sparsity-inducing prior on the latent variable x.
- Develop an appropriate MAP estimation algorithm: x̂ = argmax_x p(x | b) = argmax_x p(b | x) p(x)
Empirical Bayes:
- Assume a parameterized prior for the latent variable x (hyperparameters).
- Marginalize over the latent variable x and estimate the hyperparameters.
- Determine the posterior distribution of x and obtain a point estimate x̂ as the mean, mode or median of this density.
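As one concrete instance of the Empirical Bayes branch, here is a minimal sketch (not the course's reference code) of the EM updates for Sparse Bayesian Learning with prior x_i ~ N(0, γ_i) and an assumed known noise variance; the hyperparameters γ are re-estimated after marginalizing over x:

```python
import numpy as np

def sbl_em(A, b, sigma2=1e-3, n_iter=200):
    """EM updates for Sparse Bayesian Learning (noise variance assumed known)."""
    n, m = A.shape
    gamma = np.ones(m)                           # hyperparameters: prior variances of x
    mu = np.zeros(m)
    for _ in range(n_iter):
        Gamma = np.diag(gamma)
        Sigma_b = sigma2 * np.eye(n) + A @ Gamma @ A.T
        K = Gamma @ A.T @ np.linalg.inv(Sigma_b)
        mu = K @ b                               # posterior mean of x
        Sigma = Gamma - K @ A @ Gamma            # posterior covariance of x
        gamma = mu**2 + np.diag(Sigma)           # EM update of the hyperparameters
    return mu                                    # point estimate (posterior mean)
```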

Outline: Motivation for Course; Sparse Signal Recovery Problem; Applications; Computational Algorithms (Greedy Search, l1 norm minimization, Bayesian Methods); Performance Guarantees; Simulations; Conclusions 43

Important Questions
- When is the l0 solution unique?
- When is the l1 solution equivalent to that of l0 (noiseless case, noisy measurements)?
- What are the limits of recovery in the presence of noise?
- How to design the dictionary matrix A? 44

Outline: Motivation for Course; Sparse Signal Recovery Problem; Applications; Computational Algorithms (Greedy Search, l1 norm minimization, Bayesian Methods); Performance Guarantees; Simulations; Conclusions 45

Empirical Example
For each test case:
1. Generate a random dictionary A with 50 rows and 100 columns.
2. Generate a sparse coefficient vector x0.
3. Compute the signal via b = A x0 (noiseless).
4. Run BP and OMP, as well as a competing Bayesian method called SBL (more on this later), to try and correctly estimate x0.
5. Average over 1000 trials to compute the empirical probability of failure.
Repeat with different sparsity values, i.e., ||x0||_0 ranging from 10 to 30.
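A sketch of this Monte Carlo protocol; here scikit-learn's OMP stands in for the three algorithms compared on the slides, and the trial count and failure threshold are assumptions:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(2)
n, m, trials = 50, 100, 200

for r in (10, 20, 30):                                    # sparsity ||x0||_0
    fails = 0
    for _ in range(trials):
        A = rng.standard_normal((n, m))
        A /= np.linalg.norm(A, axis=0)                    # unit-norm columns
        x0 = np.zeros(m)
        x0[rng.choice(m, size=r, replace=False)] = rng.standard_normal(r)
        b = A @ x0                                        # noiseless measurements
        x_hat = orthogonal_mp(A, b, n_nonzero_coefs=r)    # swap in BP or SBL here
        fails += np.linalg.norm(x_hat - x0) > 1e-3 * np.linalg.norm(x0)
    print(r, fails / trials)                              # empirical probability of failure
```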

Amplitude Distribution
If the magnitudes of the nonzero elements in x0 are highly scaled, then the canonical sparse recovery problem should be easier.
[Figure: two example vectors x0 — scaled coefficients (easy) and uniform coefficients (hard).]
The (approximate) Jeffreys distribution produces sufficiently scaled coefficients such that the best solution can always be easily computed.
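A sketch of drawing approximately Jeffreys-distributed amplitudes (density proportional to 1/|x| on an assumed range [1e-3, 1]) versus unit-magnitude ones, to reproduce the "easy" and "hard" regimes described above:

```python
import numpy as np

rng = np.random.default_rng(3)
r = 10
lo, hi = 1e-3, 1.0                                             # assumed amplitude range

u = rng.uniform(size=r)
jeffreys = lo * (hi / lo) ** u * rng.choice([-1, 1], size=r)   # log-uniform magnitudes (easy)
unit = rng.choice([-1.0, 1.0], size=r)                         # unit magnitudes (hard)
```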

Sample Results (n = 50, m = 100)
[Figure: empirical error rate versus sparsity ||x0||_0, for approximate Jeffreys coefficients and for unit coefficients.]

Imaging Applications 1. Recovering fiber track geometry from diffusion weighted MR images [Ramirez-Manzanares et al. 2007]. 2. Multivariate autoregressive modeling of fmri time series for functional connectivity analyses [Harrison et al. 2003]. 3. Compressive sensing for rapid MRI [Lustig et al. 2007]. 4. MEG/EEG source localization [Sato et al. 2004; Friston et al. 2008].

Variants and Extensions: Block Sparsity; Multiple Measurement Vectors; Dictionary Learning; Scalable Algorithms; Message Passing Algorithms; Sparsity for more general inverse problems; More to come 50

Summary
- Sparse signal recovery is an interesting area with many potential applications.
- Methods developed for solving the sparse signal recovery problem can be valuable tools for signal processing practitioners.
- Rich set of computational algorithms, e.g., greedy search (OMP), l1 norm minimization (Basis Pursuit, Lasso), MAP methods (reweighted l1 and l2 methods), and Bayesian inference methods like SBL (which show great promise).
- Potential for great theory in support of performance guarantees for algorithms.
- The expectation is continued growth in the application domain as well as in algorithm development.