
Compressed Sensing Huichao Xue CS3750 Fall 2011

Table of Contents: Introduction (From News Reports; Abstract Definition). How it works (A review of the L1 norm; The Algorithm; Backgrounds for underdetermined linear systems). Theory behind Compressed Sensing (UUP explanation; Coherence and Recovery; RIP explanation).

Accidental Discovery In 2004, Candès accidentally discovered that L1-minimization fills in the blanks of an undersampled picture remarkably well. The recovered picture is not just a slight improvement over the undersampled input; it looks sharp and correct in every detail.

What will this technology bring to us? Being able to recover a signal from incomplete data is very important: it means less time spent on MRI or other sensing technologies, lighter storage requirements (we only need the incomplete data to recover everything we need), and lower energy consumption.

A very hot topic... A grep at http://dsp.rice.edu/cs turns up about 700 papers published on CS during these 7 years. It is applied to many fields (including, of course, Machine Learning). Prof. Candès was awarded the Waterman Prize [1]. [1] http://en.wikipedia.org/wiki/alan_t._waterman_award

Table of Contents: Introduction (From News Reports; Abstract Definition). How it works (A review of the L1 norm; The Algorithm; Backgrounds for underdetermined linear systems). Theory behind Compressed Sensing (UUP explanation; Coherence and Recovery; RIP explanation).

Abstract Definition Definition: Compressed Sensing or Compressive Sensing (CS) is about acquiring and recovering a sparse signal in the most efficient way possible (subsampling), with the help of an incoherent projecting basis. [2]
1. The signal needs to be sparse.
2. The technique acquires as few samples as possible.
3. Later, the original sparse signal can be recovered.
4. This is done with the help of an incoherent projecting basis.
[2] I found this definition here: https://sites.google.com/site/igorcarron2/cs

The big picture Note that there is no separate compression step in the framework. The compression happens while sensing, which is why this technique got the name Compressed Sensing.

Table of Contents: Introduction (From News Reports; Abstract Definition). How it works (A review of the L1 norm; The Algorithm; Backgrounds for underdetermined linear systems). Theory behind Compressed Sensing (UUP explanation; Coherence and Recovery; RIP explanation).

L p Norms Compressed Sensing's algorithm makes use of the L1 norm's properties, so let's review it. Definition: The L_p norm of a vector $x = (x_1, x_2, \dots, x_n)^T$ is defined as $\|x\|_p = \left( |x_1|^p + |x_2|^p + \dots + |x_n|^p \right)^{1/p}$ (1).

L p norms that are used in Compressed Sensing In particular: Definition: The L0 norm of x, $\|x\|_0$, is the number of non-zero entries in x. Definition: The L1 norm of x: $\|x\|_1 = |x_1| + |x_2| + \dots + |x_n|$ (2). Definition: The L2 norm of x: $\|x\|_2 = \left( |x_1|^2 + |x_2|^2 + \dots + |x_n|^2 \right)^{1/2}$ (3).
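
As a concrete illustration (a minimal sketch with numpy; the example vector is made up, not from the slides), the three norms can be computed directly:

```python
import numpy as np

x = np.array([0.0, 3.0, 0.0, -4.0, 0.0])   # example vector with 2 non-zero entries

l0 = np.count_nonzero(x)                   # L0 "norm": number of non-zero entries -> 2
l1 = np.sum(np.abs(x))                     # L1 norm: |3| + |-4| -> 7.0
l2 = np.sqrt(np.sum(x ** 2))               # L2 norm: sqrt(9 + 16) -> 5.0

print(l0, l1, l2)
```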

L p balls Here are illustrations of the L1 and L2 balls in 2-D space (figure not reproduced in this transcript): the L1 ball is a diamond with its corners on the axes, while the L2 ball is a circle.

Recovering f from an underdetermined linear system Consider the scenario y = Φf, where Φ is an M × N matrix with fewer rows than columns (M < N). We want to recover f from the given y and Φ. Is that even possible? There could be an infinite number of solutions for f. But what if we already know that f is sparse [3]? [3] Being sparse means having only a few non-zero values among all of f's dimensions.

Consider recovering x from a projection, given y and Φ. The possible solutions for x lie on a hyperplane (shown in yellow in the original figure). To narrow the solution down to a single point, we want to pick the sparsest x from that region. How do we define sparsity?

Comparison between L 1 and L 2 Norms will help us here. We hope: smaller norm ⇒ sparser. But which norm should we choose?

L 1 prefers sparsity Here, minimizing the L1 norm gives the better result, because in its solution x̂ most of the dimensions are exactly zero. Minimizing the L2 norm spreads small values across many dimensions, and they are not necessarily zero.

But L 1 isn't always better Consider the graph on the left, where we try to solve y = Φx for the given y and Φ. The original sparse vector x_0, which generates y under the linear transformation Φ, is shown in the graph. Solving y = Φx gives the hyperplane indicated by h. If we minimize the L1 norm over h, we can get a completely wrong result, lying on a different axis than x_0's. In Compressed Sensing, people develop conditions on Φ that guarantee this never happens.

Table of Contents: Introduction (From News Reports; Abstract Definition). How it works (A review of the L1 norm; The Algorithm; Backgrounds for underdetermined linear systems). Theory behind Compressed Sensing (UUP explanation; Coherence and Recovery; RIP explanation).

The algorithm's Context The linear system is underdetermined. We want f to be sparse.

The algorithm to find the proper f When there is no noise: $\min_{f \in \mathbb{R}^n} \|f\|_1$ s.t. $y = \Phi f$. When there is noise: $\min_{f \in \mathbb{R}^n} \|f\|_1$ s.t. $\|y - \Phi f\|_2 \le \epsilon$. Much of the literature is devoted to showing that, in most cases, this finds a very good solution.
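
To make the noiseless program concrete, here is a minimal sketch (not from the slides; the problem sizes, random seed, and the choice of scipy.optimize.linprog are assumptions for the demo). It recasts min ‖f‖_1 s.t. Φf = y as a linear program via the standard split f = u − v with u, v ≥ 0, and compares the result against the minimum-L2-norm (pseudoinverse) solution:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, k = 100, 40, 5                  # signal length, number of measurements, sparsity

f_true = np.zeros(n)                  # build a k-sparse ground-truth signal
f_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

Phi = rng.standard_normal((m, n))     # random Gaussian sensing matrix
y = Phi @ f_true                      # noiseless measurements

# Basis pursuit: min ||f||_1  s.t. Phi f = y, as an LP over f = u - v, u, v >= 0
c = np.ones(2 * n)                    # objective sum(u) + sum(v) equals ||f||_1
A_eq = np.hstack([Phi, -Phi])         # Phi u - Phi v = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
f_hat = res.x[:n] - res.x[n:]

# Minimum-L2-norm solution for comparison: dense, not sparse
f_l2 = np.linalg.pinv(Phi) @ y

print("L1 recovery error:", np.linalg.norm(f_hat - f_true))   # close to zero
print("L2 recovery error:", np.linalg.norm(f_l2 - f_true))    # much larger (dense solution)
```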

Table of Contents: Introduction (From News Reports; Abstract Definition). How it works (A review of the L1 norm; The Algorithm; Backgrounds for underdetermined linear systems). Theory behind Compressed Sensing (UUP explanation; Coherence and Recovery; RIP explanation).

Questions at this point When do we need to solve such underdetermined linear system problems? Is that really important? If it is important, how did people deal with this before CS was discovered? Why does CS always find a good solution?

When? And how did people handle that? The sound data, colored in red, is quite complicated. It is a time-domain representation, because the x-axis is time. Luckily, it also has another representation in the frequency domain, colored in blue. That representation has the benefit of being much simpler: only a few frequencies carry most of the signal.

A review of Fourier Transform This is a demonstration of how data in the time domain (lower graph) can also be constructed as a superposition of periodic signals (upper graph), each with a different frequency.

Formulas for Fourier Transform To go between the time domain and the frequency domain, we use Fourier Transforms: $H_n = \sum_{k=0}^{N-1} h_k \, e^{2\pi i k n / N}$ (4) and $h_k = \frac{1}{N} \sum_{n=0}^{N-1} H_n \, e^{-2\pi i k n / N}$ (5). Here H is the frequency-domain representation, and h is the time-domain signal. Note that both transformations are linear.
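
A quick numerical illustration (a sketch using numpy's FFT; the two-tone signal is an arbitrary example, not from the slides) of a signal that looks complicated in the time domain but is sparse in the frequency domain:

```python
import numpy as np

N = 256
t = np.arange(N)

# Time-domain signal: superposition of two periodic components
h = np.sin(2 * np.pi * 5 * t / N) + 0.5 * np.sin(2 * np.pi * 12 * t / N)

H = np.fft.fft(h)                       # frequency-domain representation

# Only a handful of coefficients carry essentially all the energy
large = np.sum(np.abs(H) > 1e-8)
print("non-negligible frequency coefficients:", large)   # 4: each tone at +/- its frequency
```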

Shannon-Nyquist Sampling Theorem Theorem: If a function x(t) contains no frequencies higher than B hertz, it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart. This basically says: x's frequency-domain representation is sparse, in the sense that all dimensions above B are zero. There is no information loss if we sample at twice the highest frequency. To reconstruct the signal, use the Fourier transform.

The mapping to the underdetermined linear system Here is the mapping between the equation y = Φf and the Shannon-Nyquist scenario: f is the low-frequency signal (its higher dimensions are all zero); Φ is the inverse Fourier transform; y holds our samples in the time basis.

What's different in CS Rather than trying to recover all information at the low frequencies, CS recovers the frequencies with high amplitudes. Under the assumption that only a few frequencies have high amplitudes, CS requires far fewer samples to recover them.

How is that possible? It sounds quite appealing, but how do we do it? How do we pick the measurements so that the peaks' information is preserved? Don't we need to know what the data look like beforehand? The big findings in CS: we only need the measurements to be incoherent with the sparse basis, and randomly generated measurements are, with high probability, incoherent with every basis.

Table of Contents: Introduction (From News Reports; Abstract Definition). How it works (A review of the L1 norm; The Algorithm; Backgrounds for underdetermined linear systems). Theory behind Compressed Sensing (UUP explanation; Coherence and Recovery; RIP explanation).

The Uniform Uncertainty Principle Definition: Φ obeys a UUP for sets of size K if $0.8 \, \frac{M}{N} \|f\|_2^2 \le \|\Phi f\|_2^2 \le 1.2 \, \frac{M}{N} \|f\|_2^2$ for every K-sparse vector f. Here M and N are the dimensions of y and f, respectively (Φ is an M × N matrix). Example: Φ obeys the UUP for $K \lesssim M / \log N$ when Φ = random Gaussian, Φ = random binary, or Φ = randomly selected Fourier samples (extra log factors apply). We call these types of measurements incoherent.
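
The near-isometry in this definition can be spot-checked numerically. Below is a minimal Monte Carlo sketch (an assumed setup, not from the slides: Gaussian entries scaled by 1/√N so that E‖Φf‖² = (M/N)‖f‖², matching the M/N factor above). It samples random K-sparse vectors rather than checking all of them, so it only illustrates the concentration; it does not verify the UUP.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 4096, 1024, 10

Phi = rng.standard_normal((M, N)) / np.sqrt(N)   # E ||Phi f||^2 = (M/N) ||f||^2

ratios = []
for _ in range(500):
    f = np.zeros(N)
    f[rng.choice(N, K, replace=False)] = rng.standard_normal(K)   # random K-sparse vector
    ratios.append(np.sum((Phi @ f) ** 2) / ((M / N) * np.sum(f ** 2)))

print(min(ratios), max(ratios))   # typically stays inside [0.8, 1.2] at these sizes
```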

Sparse Recovery UUP basically means preserving L2 norms. UUP for sets of size 2K [4] ⇒ there is only one K-sparse explanation for y. Therefore, say f_0 is K-sparse and we measure y = Φf_0: if we search for the sparsest vector that explains y, we will find f_0: $\min_f \#\{t : f(t) \neq 0\}$ s.t. $\Phi f = y$. Note that here we need to minimize the L0 norm, which is hard. Can we make it a convex optimization problem? [4] This basically means preserving L2 distances.
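
The L0 program above is a combinatorial search over supports. For a tiny problem it can be done by brute force, which makes the difficulty obvious. The sketch below (sizes and seed are made up for the demo, not from the slides) tries every candidate support of size 1, 2, ..., K and returns the smallest one that explains y exactly:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n, m, k = 12, 6, 2                       # tiny sizes; the search is exponential in n

f0 = np.zeros(n)
f0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
Phi = rng.standard_normal((m, n))
y = Phi @ f0

def sparsest_solution(Phi, y, max_k):
    """Brute-force L0 minimization: try every support of size 1..max_k, smallest first."""
    n = Phi.shape[1]
    for s in range(1, max_k + 1):
        for support in combinations(range(n), s):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(Phi[:, cols], y, rcond=None)
            if np.linalg.norm(Phi[:, cols] @ coef - y) < 1e-10:   # exact explanation found
                f = np.zeros(n)
                f[cols] = coef
                return f
    return None

f_hat = sparsest_solution(Phi, y, max_k=3)
print(np.allclose(f_hat, f0))   # True (with probability one over the random draw)
```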

Using the L 1 norm UUP for sets of size 4K ⇒ the following recovers f_0 exactly: $\min_f \|f\|_1$ s.t. $\Phi f = y$.

Table of Contents: Introduction (From News Reports; Abstract Definition). How it works (A review of the L1 norm; The Algorithm; Backgrounds for underdetermined linear systems). Theory behind Compressed Sensing (UUP explanation; Coherence and Recovery; RIP explanation).

Coherence Definition: The coherence between the sensing basis Φ and the representation basis Ψ is $\mu(\Phi, \Psi) = \sqrt{n} \, \max_{1 \le k, j \le n} |\langle \varphi_k, \psi_j \rangle|$. Here the sensing basis is used for sensing the object f, and the representation basis is used to represent f. Note that $\mu(\Phi, \Psi) \in [1, \sqrt{n}]$. Example: for the time-frequency pair, $\mu(\Phi, \Psi) = 1$; when Φ = Ψ, $\mu(\Phi, \Psi) = \sqrt{n}$.

Sample Size vs. Coherence [5] $\min_{\tilde{x} \in \mathbb{R}^n} \|\tilde{x}\|_1$ s.t. $y_k = \langle \varphi_k, \Psi \tilde{x} \rangle, \; k \in M$ (6). Theorem: Fix $f \in \mathbb{R}^n$ and suppose that the coefficient sequence x of f in the basis Ψ is S-sparse. Select m measurements in the Φ domain uniformly at random. Then if $m \ge C \, \mu^2(\Phi, \Psi) \, S \log n$ for some positive constant C, the solution to (6) is exact with overwhelming probability. If we choose the sensing basis uniformly at random among randomly generated orthobases, the coherence is likely to be about $\sqrt{2 \log n}$. This means $m \gtrsim S \log^2 n$ suffices. [5] This was developed by Candès and Romberg in 2007.
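
The coherence formula is easy to evaluate directly. The sketch below (numpy, with the DFT matrix normalized to be orthonormal; the setup is an illustration, not from the slides) computes μ(Φ, Ψ) for the two examples on the coherence slide: the spike/Fourier pair, which attains the minimal value 1, and Φ = Ψ, which attains the maximal value √n.

```python
import numpy as np

n = 256
Phi = np.eye(n)                                   # sensing basis: spikes (canonical/time basis)
Psi = np.fft.fft(np.eye(n)) / np.sqrt(n)          # representation basis: orthonormal DFT

def coherence(Phi, Psi):
    """mu(Phi, Psi) = sqrt(n) * max_{k,j} |<phi_k, psi_j>|, bases given as columns."""
    n = Phi.shape[0]
    return np.sqrt(n) * np.max(np.abs(Phi.conj().T @ Psi))

print(coherence(Phi, Psi))                        # 1.0: time-frequency pair, maximally incoherent
print(coherence(Phi, Phi), np.sqrt(n))            # sqrt(n) = 16.0: identical bases, maximally coherent
```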

Table of Contents: Introduction (From News Reports; Abstract Definition). How it works (A review of the L1 norm; The Algorithm; Backgrounds for underdetermined linear systems). Theory behind Compressed Sensing (UUP explanation; Coherence and Recovery; RIP explanation).

RIP (Restricted Isometry Property), aka UUP Definition: For each integer S = 1, 2, ..., define the isometry constant $\delta_S$ of a matrix A as the smallest number such that $(1 - \delta_S) \|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_S) \|x\|_2^2$ holds for all S-sparse vectors x.

RIP implies accurate reconstruction If the RIP holds, then the following linear program gives an accurate reconstruction: $\min_{\tilde{x} \in \mathbb{R}^n} \|\tilde{x}\|_1$ s.t. $A\tilde{x} = y \; (= Ax)$ (7). Theorem: Assume that $\delta_{2S} < \sqrt{2} - 1$. Then the solution $x^*$ to (7) obeys $\|x^* - x\|_2 \le C_0 \, \|x - x_S\|_1 / \sqrt{S}$ and $\|x^* - x\|_1 \le C_0 \, \|x - x_S\|_1$ for some constant $C_0$, where $x_S$ is the vector x with all but the largest S components set to 0.

RIP implies robustness $\min_{\tilde{x} \in \mathbb{R}^n} \|\tilde{x}\|_1$ s.t. $\|A\tilde{x} - y\|_2 \le \epsilon$ (8). Theorem: Assume that $\delta_{2S} < \sqrt{2} - 1$. Then the solution $x^*$ to (8) obeys $\|x^* - x\|_2 \le C_0 \, \|x - x_S\|_1 / \sqrt{S} + C_1 \epsilon$ for some constants $C_0$ and $C_1$.
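
The noisy program (8) is a second-order cone problem. Here is a minimal sketch using cvxpy (one possible off-the-shelf solver choice, not something prescribed by the slides; the sizes, seed, and noise level are assumptions for the demo):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(4)
n, m, S = 200, 80, 8
sigma = 0.01

x_true = np.zeros(n)
x_true[rng.choice(n, S, replace=False)] = rng.standard_normal(S)
A = rng.standard_normal((m, n)) / np.sqrt(m)       # normalized random sensing matrix
y = A @ x_true + sigma * rng.standard_normal(m)    # noisy measurements

eps = 1.1 * sigma * np.sqrt(m)                     # bound on the measurement-noise norm

x = cp.Variable(n)
problem = cp.Problem(cp.Minimize(cp.norm1(x)),     # program (8): min ||x||_1 s.t. ||Ax - y||_2 <= eps
                     [cp.norm2(A @ x - y) <= eps])
problem.solve()

print(np.linalg.norm(x.value - x_true))            # small, on the order of eps
```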

How do we find such A's? The relation between m and S is missing from the two theorems above. $\delta_{2S}$ provides the notion of incoherency, but what kinds of A, and how many rows m, support such a property? The answer: A can be m rows of random numbers, where $m \ge C \, S \log(n/S)$. You can't do much better than this.