Compressed Sensing and Sparse Recovery


ELE 538B: Sparsity, Structure and Inference
Compressed Sensing and Sparse Recovery
Yuxin Chen, Princeton University, Spring 2017

Outline
Restricted isometry property (RIP)
A RIPless theory

Motivation: wastefulness of data acquisition
Conventional paradigm for data acquisition: measure the full data, then compress (by discarding a large fraction of coefficients).
Problem: data is often highly compressible, and most of the acquired data can be thrown away without any perceptual loss.

Blind sensing
Ideally, if we knew a priori which coefficients are worth estimating, then we could simply measure those coefficients. Unfortunately, we often have no idea which coefficients are most relevant.
Compressed sensing: compression on the fly
mimics the behavior of the above ideal situation without pre-computing all coefficients;
often achieved by a random sensing mechanism.

"Why go to so much effort to acquire all the data when most of what we get will be thrown away? Can't we just directly measure the part that won't end up being thrown away?"
David Donoho

Setup: sparse recovery
Recover x ∈ R^p given y = Ax, where A = [a_1, ..., a_n]^* ∈ R^{n×p} (n ≪ p) is the sampling matrix, the a_i are the sampling vectors, and x is a sparse signal.
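
To fix ideas, here is a minimal numpy sketch of this measurement model. The specific dimensions and the Gaussian design are illustrative assumptions at this point (random designs are discussed later in the lecture):

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, k = 128, 512, 10                     # illustrative sizes with n << p

    # k-sparse signal: random support, random Gaussian amplitudes
    x = np.zeros(p)
    support = rng.choice(p, size=k, replace=False)
    x[support] = rng.standard_normal(k)

    # sampling matrix with i.i.d. N(0, 1/n) entries (one of the designs studied below)
    A = rng.standard_normal((n, p)) / np.sqrt(n)

    # compressed measurements: far fewer equations than unknowns
    y = A @ x
    print(y.shape)                             # (128,)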

Restricted isometry properties

Optimality for ℓ0 minimization

    minimize_{x ∈ R^p} ‖x‖_0   s.t. Ax = y

If there were another feasible x̂ ≠ x that is at least as sparse (‖x̂‖_0 ≤ k), then

    A(x̂ − x) = 0.   (6.1)

We don't want (6.1) to happen, so we hope that A(x̂ − x) ≠ 0 for every such x̂; note that x̂ − x is 2k-sparse. To account for all k-sparse x simultaneously, we hope that A_T has full column rank for every T with |T| ≤ 2k, where A_T consists of the columns of A at indices from T.

Restricted isometry property (RIP)

Definition 6.1 (Restricted isometry constant). The restricted isometry constant δ_k of A is the smallest quantity s.t.

    (1 − δ_k) ‖x‖_2^2 ≤ ‖Ax‖_2^2 ≤ (1 + δ_k) ‖x‖_2^2   (6.2)

holds for all k-sparse vectors x ∈ R^p. Equivalently, (6.2) says

    max_{S: |S| = k} ‖A_S^* A_S − I_k‖ = δ_k   (near orthonormality),

where A_S consists of the columns of A at indices from S.
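
For small instances, the restricted isometry constant can be computed by brute force directly from Definition 6.1, which is a useful way to get a feel for it. The sketch below enumerates all (p choose k) supports, so it is only practical for tiny p and k; the sizes are illustrative assumptions:

    import itertools
    import numpy as np

    def rip_constant(A, k):
        """Brute-force delta_k: max over |S| = k of ||A_S^* A_S - I_k|| (spectral norm)."""
        n, p = A.shape
        delta = 0.0
        for S in itertools.combinations(range(p), k):
            cols = list(S)
            G = A[:, cols].T @ A[:, cols] - np.eye(k)
            delta = max(delta, np.linalg.norm(G, 2))
        return delta

    rng = np.random.default_rng(1)
    n, p, k = 40, 12, 2
    A = rng.standard_normal((n, p)) / np.sqrt(n)   # N(0, 1/n) entries
    print(rip_constant(A, k))                      # typically well below 1 at these sizes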

A useful consequence of Definition 6.1 (homework): for any x_1, x_2 supported on disjoint subsets S_1, S_2 with |S_1| ≤ s_1 and |S_2| ≤ s_2,

    |⟨Ax_1, Ax_2⟩| ≤ δ_{s_1 + s_2} ‖x_1‖_2 ‖x_2‖_2.   (6.3)

RIP and ℓ0 minimization

    minimize_{x ∈ R^p} ‖x‖_0   s.t. Ax = y

Fact 6.2 (Exercise). Suppose x is k-sparse. If δ_{2k} < 1, then ℓ0-minimization is exact and unique.

RIP and ℓ1 minimization

Theorem 6.3. Suppose x is k-sparse. If δ_{2k} < √2 − 1, then ℓ1-minimization is exact and unique.

RIP implies success of ℓ1 minimization. This is a universal result: it works simultaneously for all k-sparse signals. As we will see later, many random designs satisfy this condition with near-optimal sample complexity.
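
Here ℓ1-minimization refers to basis pursuit, min ‖x‖_1 s.t. Ax = y, which can be recast as a linear program via the split x = x⁺ − x⁻ with x⁺, x⁻ ≥ 0. Below is a minimal sketch using scipy.optimize.linprog; the solver choice and problem sizes are illustrative assumptions, not the course's recommended algorithm:

    import numpy as np
    from scipy.optimize import linprog

    def basis_pursuit(A, y):
        """Solve min ||x||_1 s.t. Ax = y via min 1^T(x+ + x-) s.t. A(x+ - x-) = y, x+, x- >= 0."""
        n, p = A.shape
        c = np.ones(2 * p)
        A_eq = np.hstack([A, -A])
        res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * p))
        assert res.success
        return res.x[:p] - res.x[p:]

    rng = np.random.default_rng(2)
    n, p, k = 60, 200, 5
    x = np.zeros(p)
    x[rng.choice(p, k, replace=False)] = rng.standard_normal(k)
    A = rng.standard_normal((n, p)) / np.sqrt(n)
    x_hat = basis_pursuit(A, A @ x)
    print(np.linalg.norm(x_hat - x))               # near zero when recovery succeeds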

Proof of Theorem 6.3
Suppose x + h is feasible and obeys ‖x + h‖_1 ≤ ‖x‖_1. The goal is to show that h = 0 under RIP. The key is to decompose h into h_T + h_{T_1} + h_{T_2} + ..., where
T: locations of the k largest entries of x (its support);
T_1: locations of the k largest entries of h in T^c;
T_2: locations of the k largest entries of h in (T ∪ T_1)^c;
...

The proof proceeds by showing that
1. h_{T ∪ T_1} dominates h_{(T ∪ T_1)^c} (by the objective function);
2. (converse) h_{(T ∪ T_1)^c} dominates h_{T ∪ T_1} (by RIP + feasibility).
These can happen simultaneously only when h vanishes.

Step 1 (depending only on the objective function). Show that

    Σ_{j≥2} ‖h_{T_j}‖ ≤ (1/√k) ‖h_T‖_1.   (6.4)

This follows immediately by combining the following 2 observations:

(i) Since x + h is assumed to be a better estimate:

    ‖x‖_1 ≥ ‖x + h‖_1 = ‖x + h_T‖_1 + ‖h_{T^c}‖_1 ≥ ‖x‖_1 − ‖h_T‖_1 + ‖h_{T^c}‖_1

(the equality holds since T is the support of x; the last step is the triangle inequality), so that

    ‖h_{T^c}‖_1 ≤ ‖h_T‖_1.   (6.5)

(ii) Since the entries of h_{T_{j−1}} uniformly dominate those of h_{T_j} (j ≥ 2):

    ‖h_{T_j}‖ ≤ √k ‖h_{T_j}‖_∞ ≤ √k · (‖h_{T_{j−1}}‖_1 / k) = ‖h_{T_{j−1}}‖_1 / √k
    ⟹ Σ_{j≥2} ‖h_{T_j}‖ ≤ (1/√k) Σ_{j≥2} ‖h_{T_{j−1}}‖_1 = (1/√k) ‖h_{T^c}‖_1.   (6.6)

Combining (i) and (ii) gives (6.4).

Step 2 (using feasibility + RIP). Show that there exists ρ < 1 s.t.

    ‖h_{T ∪ T_1}‖ ≤ ρ Σ_{j≥2} ‖h_{T_j}‖.   (6.7)

If this claim holds, then

    ‖h_{T ∪ T_1}‖ ≤ ρ Σ_{j≥2} ‖h_{T_j}‖ ≤ ρ (1/√k) ‖h_T‖_1   (by (6.4))
        ≤ ρ (1/√k) (√k ‖h_T‖) = ρ ‖h_T‖ ≤ ρ ‖h_{T ∪ T_1}‖.   (6.8)

Since ρ < 1, we necessarily have h_{T ∪ T_1} = 0, which together with (6.5) yields h = 0.

We now prove (6.7). To connect h_{T ∪ T_1} with h_{(T ∪ T_1)^c}, we use feasibility:

    Ah = 0   ⟹   Ah_{T ∪ T_1} = − Σ_{j≥2} Ah_{T_j},

which taken collectively with RIP yields

    (1 − δ_{2k}) ‖h_{T ∪ T_1}‖^2 ≤ ‖Ah_{T ∪ T_1}‖^2 = − ⟨Ah_{T ∪ T_1}, Σ_{j≥2} Ah_{T_j}⟩.

It follows from (6.3) that for all j ≥ 2,

    |⟨Ah_{T ∪ T_1}, Ah_{T_j}⟩| ≤ |⟨Ah_T, Ah_{T_j}⟩| + |⟨Ah_{T_1}, Ah_{T_j}⟩|
        ≤ δ_{2k} (‖h_T‖ + ‖h_{T_1}‖) ‖h_{T_j}‖ ≤ √2 δ_{2k} ‖h_{T ∪ T_1}‖ ‖h_{T_j}‖,

which gives

    (1 − δ_{2k}) ‖h_{T ∪ T_1}‖^2 ≤ Σ_{j≥2} |⟨Ah_{T ∪ T_1}, Ah_{T_j}⟩| ≤ √2 δ_{2k} ‖h_{T ∪ T_1}‖ Σ_{j≥2} ‖h_{T_j}‖.

This establishes (6.7) with ρ := √2 δ_{2k} / (1 − δ_{2k}), which is < 1 iff δ_{2k} < √2 − 1.

Robustness for compressible signals

Theorem 6.4. If δ_{2k} < √2 − 1, then the solution x̂ to ℓ1-minimization obeys

    ‖x̂ − x‖ ≲ ‖x − x_k‖_1 / √k,

where x_k is the best k-term approximation of x.

Suppose the l-th largest entry of x is 1/l^α for some α > 1; then

    (1/√k) ‖x − x_k‖_1 = (1/√k) Σ_{l>k} l^{−α} ≲ k^{−α+0.5} ≪ 1,

so ℓ1-minimization works well in recovering compressible signals. The proof follows arguments similar to those in the proof of Theorem 6.3.
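
For completeness, the tail-sum step behind the k^{−α+0.5} bound can be written out; this is a standard integral comparison supplied here, not reproduced from the slides:

    \[
    \frac{1}{\sqrt{k}} \|x - x_k\|_1
      = \frac{1}{\sqrt{k}} \sum_{l > k} l^{-\alpha}
      \le \frac{1}{\sqrt{k}} \int_{k}^{\infty} t^{-\alpha}\, dt
      = \frac{1}{\sqrt{k}} \cdot \frac{k^{1-\alpha}}{\alpha - 1}
      = \frac{k^{0.5-\alpha}}{\alpha - 1}
      \asymp k^{-\alpha + 0.5} \ll 1 .
    \]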

Proof of Theorem 6.4
Step 1 (depending only on the objective function). Show that

    Σ_{j≥2} ‖h_{T_j}‖ ≤ (1/√k) ‖h_T‖_1 + (2/√k) ‖x − x_T‖_1.   (6.9)

This follows immediately by combining the following 2 observations:

(i) Since x + h is assumed to be a better estimate:

    ‖x_T‖_1 + ‖x_{T^c}‖_1 = ‖x‖_1 ≥ ‖x + h‖_1 = ‖x_T + h_T‖_1 + ‖x_{T^c} + h_{T^c}‖_1
        ≥ ‖x_T‖_1 − ‖h_T‖_1 + ‖h_{T^c}‖_1 − ‖x_{T^c}‖_1
    ⟹ ‖h_{T^c}‖_1 ≤ ‖h_T‖_1 + 2 ‖x_{T^c}‖_1.   (6.10)

(ii) Recall from (6.6) that Σ_{j≥2} ‖h_{T_j}‖ ≤ (1/√k) ‖h_{T^c}‖_1.

(The terms involving x_{T^c} = x − x_T are what differ from the proof of Theorem 6.3; they were highlighted in red on the slides.)

Step 2 (using feasibility + RIP). Recall from (6.7) that there exists ρ < 1 s.t.

    ‖h_{T ∪ T_1}‖ ≤ ρ Σ_{j≥2} ‖h_{T_j}‖.   (6.11)

If this claim holds, then

    ‖h_{T ∪ T_1}‖ ≤ ρ Σ_{j≥2} ‖h_{T_j}‖
        ≤ ρ (1/√k) { ‖h_T‖_1 + 2 ‖x_{T^c}‖_1 }   (by (6.10) and (6.6))
        ≤ ρ (1/√k) ( √k ‖h_T‖ + 2 ‖x_{T^c}‖_1 )
        = ρ ‖h_T‖ + (2ρ/√k) ‖x_{T^c}‖_1
        ≤ ρ ‖h_{T ∪ T_1}‖ + (2ρ/√k) ‖x_{T^c}‖_1
    ⟹ ‖h_{T ∪ T_1}‖ ≤ (2ρ / (1 − ρ)) · ‖x_{T^c}‖_1 / √k.   (6.12)

(Again, the terms involving x_{T^c} are the ones that differ from the proof of Theorem 6.3.)

Finally, putting the above together yields

    ‖h‖ ≤ ‖h_{T ∪ T_1}‖ + ‖h_{(T ∪ T_1)^c}‖
        ≤ ‖h_{T ∪ T_1}‖ + (1/√k) ‖h_T‖_1 + (2/√k) ‖x − x_T‖_1   (by (6.9))
        ≤ ‖h_{T ∪ T_1}‖ + ‖h_T‖ + (2/√k) ‖x − x_T‖_1
        ≤ 2 ‖h_{T ∪ T_1}‖ + (2/√k) ‖x − x_T‖_1
        ≤ (2(1 + ρ) / (1 − ρ)) · ‖x − x_T‖_1 / √k   (by (6.12)),

which proves Theorem 6.4.

Which design matrix satisfies RIP?
First example: i.i.d. Gaussian design.

Lemma 6.5. A random matrix A ∈ R^{n×p} with i.i.d. N(0, 1/n) entries satisfies δ_k < δ with high probability, as long as

    n ≳ (1/δ^2) k log (p/k).

This is where non-asymptotic random matrix theory enters.

Gaussian random matrices

Lemma 6.6 (see Vershynin '10). Suppose B ∈ R^{n×k} is composed of i.i.d. N(0, 1) entries. Then

    P( (1/√n) σ_max(B) > 1 + √(k/n) + t ) ≤ e^{−nt²/2},
    P( (1/√n) σ_min(B) < 1 − √(k/n) − t ) ≤ e^{−nt²/2}.

When n ≫ k, one has (1/n) B^* B ≈ I_k. Similar results (up to different constants) hold for i.i.d. sub-Gaussian matrices.
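
A quick numerical sanity check of Lemma 6.6 (the dimensions are illustrative assumptions, not values from the slides): draw one Gaussian matrix, rescale its extreme singular values by 1/√n, and compare against 1 ± √(k/n).

    import numpy as np

    rng = np.random.default_rng(3)
    n, k = 2000, 50
    B = rng.standard_normal((n, k))                    # i.i.d. N(0, 1) entries

    s = np.linalg.svd(B, compute_uv=False) / np.sqrt(n)
    print(s[0], "vs", 1 + np.sqrt(k / n))              # sigma_max / sqrt(n)
    print(s[-1], "vs", 1 - np.sqrt(k / n))             # sigma_min / sqrt(n)

    # consequence used below: (1/n) B^* B is close to I_k when n >> k
    print(np.linalg.norm(B.T @ B / n - np.eye(k), 2))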

Proof of Lemma 6.5
1. Fix any index subset S ⊆ {1, ..., p} with |S| = k. Then A_S (the submatrix of A consisting of the columns at indices from S) obeys

    ‖A_S^* A_S − I_k‖ ≤ O(√(k/n) + t)

with probability exceeding 1 − 2e^{−c_1 n t²}, where c_1 > 0 is a constant.

2. Taking a union bound over all S ⊆ {1, ..., p} with |S| = k yields

    δ_k = max_{S: |S| = k} ‖A_S^* A_S − I_k‖ ≤ O(√(k/n) + t)

with probability exceeding 1 − 2 (p choose k) e^{−c_1 n t²} ≥ 1 − 2 e^{k log(ep/k) − c_1 n t²}, using (p choose k) ≤ (ep/k)^k.

Thus, δ_k < δ with high probability as long as n ≳ δ^{−2} k log(p/k).

Other design matrices that satisfy RIP
Random matrices with i.i.d. sub-Gaussian entries, as long as n ≳ k log(p/k).
Random partial DFT matrices with n ≳ k log^4 p, where the rows of A are independently sampled from the rows of the DFT matrix F (Rudelson & Vershynin '08).
If you have learned the entropy method / generic chaining, check out Rudelson & Vershynin '08 and Candes & Plan '11.

Random convolution matrices with n ≳ k log^4 p, where the rows of A are independently sampled from the rows of the circulant matrix

    G = [ g_0      g_1      g_2    ...  g_{p−1} ]
        [ g_{p−1}  g_0      g_1    ...  g_{p−2} ]
        [ g_{p−2}  g_{p−1}  g_0    ...  g_{p−3} ]
        [  ...                                  ]
        [ g_1      g_2      g_3    ...  g_0     ]

with P(g_i = ±1) = 0.5 (Krahmer, Mendelson, & Rauhut '14).

RIP guarantees success of many other methods
Example: projected gradient descent (iterative hard thresholding), which alternates between
gradient descent: z^t ← x^t − µ_t A^* (Ax^t − y), where A^*(Ax^t − y) is the gradient of (1/2)‖y − Ax‖_2^2 at x^t;
projection: keep only the k largest (in magnitude) entries.

Iterative hard thresholding (IHT)

Algorithm 6.1 (Projected gradient descent / iterative hard thresholding)
for t = 0, 1, ...:

    x^{t+1} = P_k ( x^t − µ_t A^* (Ax^t − y) ),

where P_k(x) := argmin_{z: ‖z‖_0 = k} ‖z − x‖ is the best k-term approximation of x.
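
A minimal numpy sketch of Algorithm 6.1, using the constant step size µ_t ≡ 1 of Theorem 6.7 below. The fixed iteration count and the random test instance are illustrative assumptions:

    import numpy as np

    def hard_threshold(z, k):
        """P_k: keep the k largest-magnitude entries of z, zero out the rest."""
        out = np.zeros_like(z)
        idx = np.argpartition(np.abs(z), -k)[-k:]
        out[idx] = z[idx]
        return out

    def iht(A, y, k, mu=1.0, num_iters=200):
        """Iterative hard thresholding: x^{t+1} = P_k(x^t - mu * A^*(A x^t - y))."""
        x = np.zeros(A.shape[1])
        for _ in range(num_iters):
            x = hard_threshold(x - mu * (A.T @ (A @ x - y)), k)
        return x

    rng = np.random.default_rng(4)
    n, p, k = 100, 400, 5
    x_true = np.zeros(p)
    x_true[rng.choice(p, k, replace=False)] = rng.standard_normal(k)
    A = rng.standard_normal((n, p)) / np.sqrt(n)       # N(0, 1/n) design
    x_hat = iht(A, A @ x_true, k)
    print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))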

Geometric convergence of IHT under RIP

Theorem 6.7 (Blumensath & Davies '09). Suppose x is k-sparse and the RIP constant obeys δ_{3k} < 1/2. Then taking µ_t ≡ 1 gives

    ‖x^t − x‖ ≤ (2δ_{3k})^t ‖x^0 − x‖.

Under RIP, IHT attains ε-accuracy within O(log(1/ε)) iterations. Each iteration takes time proportional to a matrix-vector product. Drawback: one needs prior knowledge of k.

Numerical performance of IHT
[Figure: relative error ‖x^t − x‖ / ‖x‖ vs. iteration count t for a Gaussian design with A_{i,j} ~ N(0, 1/n) and k = 5.]

Proof of Theorem 6.7
Let z := x^t − A^*(Ax^t − y) = x^t − A^*A(x^t − x). By definition of P_k (x is k-sparse; x^{t+1} is the best k-sparse approximation of z),

    ‖x − z‖^2 ≥ ‖x^{t+1} − z‖^2 = ‖x^{t+1} − x‖^2 − 2⟨x^{t+1} − x, z − x⟩ + ‖z − x‖^2,

which gives

    ‖x^{t+1} − x‖^2 ≤ 2⟨x^{t+1} − x, z − x⟩ = 2⟨x^{t+1} − x, (I − A^*A)(x^t − x)⟩
        ≤ 2δ_{3k} ‖x^{t+1} − x‖ ‖x^t − x‖   (6.13)
    ⟹ ‖x^{t+1} − x‖ ≤ 2δ_{3k} ‖x^t − x‖,

as claimed. Here, (6.13) follows from the following fact (homework):

    |⟨u, (I − A^*A)v⟩| ≤ δ_s ‖u‖ ‖v‖   with s = |supp(u) ∪ supp(v)|.

A RIPless theory

Is RIP necessary?
RIP leads to a universal result holding simultaneously for all k-sparse x, but universality is often not needed, as we might only care about a particular x.
There may be a gap between the regime where RIP holds and the regime in which one has minimal measurements.
Certifying RIP is hard.
Can we develop a non-universal, RIPless theory?

A standard recipe
1. Write out the Karush-Kuhn-Tucker (KKT) optimality conditions, which typically involve certain dual variables.
2. Construct dual variables satisfying the KKT conditions.

Karush-Kuhn-Tucker (KKT) condition
Consider a convex problem

    minimize_x f(x)   s.t. Ax − y = 0

with Lagrangian L(x, ν) := f(x) + ν^*(Ax − y)   (ν: Lagrange multiplier).

If x is an optimizer, then the KKT optimality condition reads

    0 = ∇_ν L(x, ν),
    0 ∈ ∂_x L(x, ν).

Equivalently, the KKT optimality condition reads

    Ax − y = 0,
    0 ∈ ∂f(x) + A^* ν   (no constraint on ν).

KKT condition for ℓ1 minimization

    minimize_x ‖x‖_1   s.t. Ax − y = 0

If x is an optimizer, then the KKT optimality condition reads

    Ax − y = 0   (naturally satisfied, as x is the truth),
    0 ∈ ∂‖x‖_1 + A^* ν   (no constraint on ν)

⟺ ∃ u ∈ range(A^*) s.t.

    u_i = sign(x_i), if x_i ≠ 0;
    u_i ∈ [−1, 1],   else.

This depends only on the signs of the x_i, irrespective of their magnitudes.

Uniqueness

Theorem 6.8 (a sufficient and almost necessary condition). Let T := supp(x), and suppose A_T has full column rank. If u = A^* ν for some ν ∈ R^n s.t.

    u_i = sign(x_i), if x_i ≠ 0;
    u_i ∈ (−1, 1),   else,

then x is the unique solution to ℓ1 minimization.

Only slightly stronger than KKT! Such a ν is said to be a dual certificate (recall that ν is the Lagrange multiplier). Finding ν comes down to solving another convex problem.

Geometric interpretation of the dual certificate
[Figure: two-dimensional illustration with x = (0, x_2).] When |u_1| < 1, the solution is unique; when |u_1| = 1, the solution is non-unique. When we are able to find u ∈ range(A^*) s.t. u_2 = sign(x_2) and |u_1| < 1, then x (with x_1 = 0) is the unique solution to ℓ1 minimization.

Proof of Theorem 6.8
Let w ∈ ∂‖x‖_1 be the subgradient with

    w_i = sign(x_i), if i ∈ T (support of x);
    w_i = sign(h_i), else.

If x + h is an optimizer with h_{T^c} ≠ 0, then

    ‖x‖_1 ≥ ‖x + h‖_1 ≥ ‖x‖_1 + ⟨w, h⟩ = ‖x‖_1 + ⟨u, h⟩ + ⟨w − u, h⟩
        = ‖x‖_1 + ⟨A^*ν, h⟩ + Σ_{i ∉ T} ( sign(h_i) h_i − u_i h_i )   (assumption on u)
        = ‖x‖_1 + ⟨ν, Ah⟩ + Σ_{i ∉ T} ( |h_i| − u_i h_i )
        = ‖x‖_1 + Σ_{i ∉ T} ( |h_i| − u_i h_i )   (feasibility: Ah = 0)
        ≥ ‖x‖_1 + Σ_{i ∉ T} (1 − |u_i|) |h_i| > ‖x‖_1,

resulting in a contradiction. Further, when h_{T^c} = 0, one must have h_T = 0 by the left-invertibility of A_T, and hence h = h_T + h_{T^c} = 0.

Constructing dual certificates under Gaussian design
We illustrate how to construct dual certificates for the following setup:
x ∈ R^p is k-sparse;
the entries of A ∈ R^{n×p} are i.i.d. standard Gaussian;
the sample size obeys n ≳ k log p.

Find ν ∈ R^n s.t.

    (A^* ν)_T = sign(x_T),   (6.14)
    |(A^* ν)_i| < 1, ∀ i ∉ T.   (6.15)

Step 1: propose a ν compatible with the linear constraints (6.14). One candidate is the least-squares solution

    ν = A_T (A_T^* A_T)^{−1} sign(x_T)   (explicit expression).

The LS solution minimizes ‖ν‖, which will also be helpful when controlling (A^* ν)_i. From Lemma 6.6, A_T^* A_T is invertible when n ≳ k log p.

Step 2: verify (6.15), which amounts to controlling

    max_{i ∉ T} |⟨A_{:,i}, ν⟩|,

where A_{:,i} is the i-th column of A and ν = A_T (A_T^* A_T)^{−1} sign(x_T). Since A_{:,i} ~ N(0, I_n) and ν are independent for any i ∉ T,

    max_{i ∉ T} |⟨A_{:,i}, ν⟩| ≲ ‖ν‖ √(log p),

and ‖ν‖ can be bounded by

    ‖ν‖ = ‖A_T (A_T^* A_T)^{−1} sign(x_T)‖ = ‖(A_T^* A_T)^{−1/2} sign(x_T)‖ ≲ √k / √n,

since the eigenvalues of A_T^* A_T are all ≍ n. When n/(k log p) is sufficiently large, max_{i ∉ T} |⟨A_{:,i}, ν⟩| < 1.
Exercise: fill in the missing details.
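
A numerical sketch of Steps 1 and 2 under this Gaussian design (the sizes n, p, k are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(5)
    n, p, k = 300, 1000, 10
    T = np.sort(rng.choice(p, k, replace=False))
    x = np.zeros(p)
    x[T] = rng.standard_normal(k)

    A = rng.standard_normal((n, p))                    # i.i.d. standard Gaussian entries
    A_T = A[:, T]

    # Step 1: least-squares candidate nu = A_T (A_T^* A_T)^{-1} sign(x_T)
    nu = A_T @ np.linalg.solve(A_T.T @ A_T, np.sign(x[T]))

    # Step 2: check (6.14) and (6.15)
    u = A.T @ nu
    print(np.allclose(u[T], np.sign(x[T])))            # (6.14): equality on the support
    print(np.max(np.abs(np.delete(u, T))))             # (6.15): typically < 1 when n >> k log p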

More general random sampling
Consider a random design: each sampling vector a_i is independently drawn from a distribution F, i.e. a_i ~ F.
Incoherent sampling:
Isotropy: E[aa^*] = I for a ~ F, i.e. the components of a (i) have unit variance and (ii) are uncorrelated.
Incoherence: let µ(F) be the smallest quantity s.t. ‖a‖_∞^2 ≤ µ(F) with high probability for a ~ F.

Incoherence
We want µ(F) (resp. A) to be small (resp. dense)! What happens if the sampling vectors a_i are sparse?
Example: a_i ~ Uniform({√p e_1, ..., √p e_p}).
[Figure: y = Ax with A built from scaled standard basis rows; each measurement reveals a single coordinate of x, so y carries no information about the coordinates that are never sampled.]

Incoherent random sampling
[Figure]

A general RIPless theory

Theorem 6.9 (Candes & Plan '11). Suppose x ∈ R^p is k-sparse, and a_i ~ F independently with F isotropic. Then ℓ1 minimization is exact and unique with high probability, provided that

    n ≳ µ(F) k log p.

This is near-optimal even for highly structured sampling matrices.
Proof idea: produce an (approximate) dual certificate by a clever golfing scheme pioneered by David Gross.

Examples of incoherent sampling
Binary sensing: P(a[i] = ±1) = 0.5. Then E[aa^*] = I, ‖a‖_∞^2 = 1, µ = 1, so ℓ1-min succeeds if n ≳ k log p.
Partial Fourier transform: pick a random frequency f ~ Unif{0, 1/p, ..., (p−1)/p} or f ~ Unif[0, 1] and set a[i] = e^{j2πfi}. Then E[aa^*] = I, ‖a‖_∞^2 = 1, µ = 1, so ℓ1-min succeeds if n ≳ k log p, improving upon the RIP-based result (n ≳ k log^4 p).
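
A short numpy check of the partial-Fourier example: draw random on-grid frequencies, form the sampling vectors a[i] = e^{j2πfi}, and verify isotropy and incoherence empirically. The number of trials and the empirical averaging are illustrative; the slides only assert the population identities:

    import numpy as np

    rng = np.random.default_rng(6)
    p, trials = 64, 20000

    f = rng.integers(0, p, size=trials) / p            # f ~ Unif{0, 1/p, ..., (p-1)/p}
    i = np.arange(p)
    a = np.exp(1j * 2 * np.pi * np.outer(f, i))        # each row is one sampling vector

    print(np.max(np.abs(a)) ** 2)                      # ||a||_inf^2 = 1, so mu = 1
    cov = (a.T @ a.conj()) / trials                    # empirical E[a a^*]
    print(np.linalg.norm(cov - np.eye(p), 2))          # small: isotropy E[a a^*] = I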

Examples of incoherent sampling (continued)
Random convolution matrices: the rows of A are independently sampled from the rows of the circulant matrix

    G = [ g_0      g_1      g_2    ...  g_{p−1} ]
        [ g_{p−1}  g_0      g_1    ...  g_{p−2} ]
        [ g_{p−2}  g_{p−1}  g_0    ...  g_{p−3} ]
        [  ...                                  ]
        [ g_1      g_2      g_3    ...  g_0     ]

with P(g_i = ±1) = 0.5. One has E[aa^*] = I, ‖a‖_∞^2 = 1, µ = 1, so ℓ1-min succeeds if n ≳ k log p, improving upon the RIP-based result (n ≳ k log^4 p).

A general scheme for dual construction
Find ν ∈ R^n s.t.

    A_T^* ν = sign(x_T),   (6.16)
    ‖A_{T^c}^* ν‖_∞ < 1.   (6.17)

A candidate: the least-squares solution w.r.t. (6.16),

    ν = A_T (A_T^* A_T)^{−1} sign(x_T)   (explicit expression).

To verify (6.17), we need to control ‖A_{T^c}^* A_T (A_T^* A_T)^{−1} sign(x_T)‖_∞.
Issue 1: in general, A_{T^c} and A_T are dependent.
Issue 2: (A_T^* A_T)^{−1} is hard to deal with.

Key idea 1: use an iterative scheme to solve

    minimize_ν (1/2) ‖A_T^* ν − sign(x_T)‖^2:

for t = 1, 2, ...

    ν^(t) = ν^(t−1) − A_T ( A_T^* ν^(t−1) − sign(x_T) ),

where A_T(A_T^* ν^(t−1) − sign(x_T)) is the gradient of (1/2)‖A_T^* ν − sign(x_T)‖^2 at ν^(t−1). This converges to a solution obeying (6.16), and no inversion is involved.
Issue: complicated dependency across iterations.

Golfing scheme (Gross '11)
Key idea 2: sample splitting, i.e. use independent samples for each iteration to decouple statistical dependency. Partition A into L independent row blocks A^(1) ∈ R^{n_1×p}, ..., A^(L) ∈ R^{n_L×p}, and for t = 1, 2, ... take a stochastic-gradient step

    ν^(t) = ν^(t−1) − µ_t A_T^(t) ( A_T^* ν^(t−1) − sign(x_T) ),

whose update term lives in R^{n_t} (but we need ν ∈ R^n).

Instead, use the zero-padded update

    ν^(t) = ν^(t−1) − µ_t Ã_T^(t) ( A_T^* ν^(t−1) − sign(x_T) ),

where Ã^(t) ∈ R^{n×p} is obtained from A^(t) by zero-padding (so that the update, and hence ν^(t), lives in R^n).

In this update, the residual A_T^* ν^(t−1) − sign(x_T) depends only on A^(1), ..., A^(t−1), and is thus independent of A^(t), ..., A^(L) (exercise): this gives statistical independence across iterations. Each iteration brings us closer to the target (like each golf shot brings us closer to the hole).
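
A schematic numpy implementation of the golfing iteration. The sizes, the number of blocks L, and the step size µ_t = 1/n_t (natural here because (A_T^(t))^* A_T^(t) ≈ n_t I_k for standard Gaussian rows) are illustrative assumptions; this sketches the sample-splitting idea rather than the tuned construction of Gross '11 / Candes & Plan '11:

    import numpy as np

    rng = np.random.default_rng(7)
    n, p, k, L = 1200, 2000, 5, 4
    T = np.sort(rng.choice(p, k, replace=False))
    sgn = rng.choice([-1.0, 1.0], size=k)              # plays the role of sign(x_T)

    A = rng.standard_normal((n, p))                    # rows a_i with E[a a^*] = I
    blocks = np.array_split(np.arange(n), L)           # sample splitting: L independent row blocks

    nu = np.zeros(n)
    for rows in blocks:
        q = A[:, T].T @ nu - sgn                       # residual of constraint (6.16)
        # stochastic-gradient step using only this block (zero-padded update)
        nu[rows] -= (1.0 / len(rows)) * (A[np.ix_(rows, T)] @ q)

    u = A.T @ nu
    print(np.max(np.abs(u[T] - sgn)))                  # (6.16) approximately satisfied
    print(np.max(np.abs(np.delete(u, T))))             # (6.17): typically below 1 at these sizes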

References
[1] Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, E. Candes, J. Romberg, and T. Tao, IEEE Transactions on Information Theory, 2006.
[2] Compressed sensing, D. Donoho, IEEE Transactions on Information Theory, 2006.
[3] Near-optimal signal recovery from random projections: Universal encoding strategies?, E. Candes and T. Tao, IEEE Transactions on Information Theory, 2006.
[4] Lecture notes, Advanced topics in signal processing (ECE 8201), Y. Chi, 2015.
[5] A mathematical introduction to compressive sensing, S. Foucart and H. Rauhut, Springer, 2013.
[6] On sparse reconstruction from Fourier and Gaussian measurements, M. Rudelson and R. Vershynin, Communications on Pure and Applied Mathematics, 2008.

[7] Decoding by linear programming, E. Candes and T. Tao, IEEE Transactions on Information Theory, 2006.
[8] The restricted isometry property and its implications for compressed sensing, E. Candes, Comptes Rendus de l'Academie des Sciences, 2008.
[9] Introduction to the non-asymptotic analysis of random matrices, R. Vershynin, in Compressed Sensing: Theory and Applications, 2010.
[10] Iterative hard thresholding for compressed sensing, T. Blumensath and M. Davies, Applied and Computational Harmonic Analysis, 2009.
[11] Recovering low-rank matrices from few coefficients in any basis, D. Gross, IEEE Transactions on Information Theory, 2011.
[12] A probabilistic and RIPless theory of compressed sensing, E. Candes and Y. Plan, IEEE Transactions on Information Theory, 2011.

[13] Statistical learning with sparsity: the Lasso and generalizations, T. Hastie, R. Tibshirani, and M. Wainwright, 2015.
[14] Suprema of chaos processes and the restricted isometry property, F. Krahmer, S. Mendelson, and H. Rauhut, Communications on Pure and Applied Mathematics, 2014.