Scale Mixture Modeling of Priors for Sparse Signal Recovery

1 Scale Mixture Modeling of Priors for Sparse Signal Recovery. Bhaskar D. Rao, University of California, San Diego. Thanks to David Wipf, Jason Palmer, Zhilin Zhang and Ritwik Giri.

7 Outline
- Sparse Signal Recovery (SSR) Problem and some Extensions
- Scale Mixture Priors: Gaussian Scale Mixture (GSM), Laplacian Scale Mixture (LSM), Power Exponential Scale Mixture (PESM)
- Bayesian Methods: MAP estimation (Type I), Hierarchical Bayes (Type II)
- Experimental Results
- Summary

8 Problem Description: Sparse Signal Recovery (SSR). y = Φx + v, where y is an N × 1 measurement vector, Φ is an N × M dictionary matrix with M >> N, x is the M × 1 desired vector, which is sparse with k nonzero entries, and v is the measurement noise.
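
To make the notation concrete, here is a minimal Python sketch of the measurement model y = Φx + v; the dimensions, sparsity level, and noise level below are illustrative choices, not values from the talk.

```python
# Minimal sketch of the SSR measurement model y = Phi x + v (illustrative sizes).
import numpy as np

rng = np.random.default_rng(0)

N, M, k = 50, 200, 10                              # measurements, dictionary columns, sparsity
Phi = rng.standard_normal((N, M)) / np.sqrt(N)     # N x M dictionary, M >> N

x = np.zeros(M)                                    # M x 1 sparse vector with k nonzero entries
support = rng.choice(M, size=k, replace=False)
x[support] = rng.standard_normal(k)

sigma = 0.01
v = sigma * rng.standard_normal(N)                 # measurement noise
y = Phi @ x + v                                    # N x 1 measurement vector
```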

12 Extensions: Block Sparsity; Multiple Measurement Vectors (MMV); Block MMV; MMV with time-varying sparsity.

13 Multiple Measurement Vectors (MMV) Multiple measurements: L measurements Common Sparsity Profile: k nonzero rows

15 Applications: Signal Representation (Mallat, Coifman, Donoho, ...), EEG/MEG (Leahy, Gorodnitsky, Ioannides, ...), Robust Linear Regression and Outlier Detection (Jin, Giannakis, ...), Speech Coding (Ozawa, Ono, Kroon, ...), Compressed Sensing (Donoho, Candes, Tao, ...), Magnetic Resonance Imaging (Lustig, ...), Sparse Channel Equalization (Fevrier, Proakis, ...), Face Recognition (Wright, Yang, ...), Cognitive Radio (Eldar, ...), and many more.

19 Potential Algorithmic Approaches. Finding the optimal solution is NP-hard, so low-complexity algorithms with reasonable performance are needed. Greedy search techniques: Matching Pursuit (MP), Orthogonal Matching Pursuit (OMP), ... Minimizing diversity measures (regularization framework): tractable surrogate cost functions, e.g. ℓ1 minimization, ... Bayesian methods: make appropriate statistical assumptions on the solution (sparsity) through the choice of prior.
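
As one concrete instance of the greedy family above, here is a bare-bones Orthogonal Matching Pursuit sketch; running for exactly k iterations is a simplifying stopping rule, and the function name is just illustrative.

```python
# Bare-bones Orthogonal Matching Pursuit (OMP) sketch.
import numpy as np

def omp(Phi, y, k):
    """Greedily select k columns of Phi and least-squares fit on that support."""
    N, M = Phi.shape
    residual = y.copy()
    support = []
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        idx = int(np.argmax(np.abs(Phi.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Re-fit on the enlarged support and update the residual.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(M)
    x_hat[support] = coef
    return x_hat
```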

20 Bayesian Methods: Choice of Prior. Super-Gaussian distributions: heavy tailed, with a sharper peak at the origin compared to the Gaussian. Tractable representations using scale mixtures: Gaussian Scale Mixture (GSM), Laplacian Scale Mixture (LSM), Power Exponential Scale Mixture (PESM).

24 Gaussian Scale Mixtures. Separability: p(x) = Π_i p(x_i), with p(x_i) = ∫ p(x_i | γ_i) p(γ_i) dγ_i = ∫ N(x_i; 0, γ_i) p(γ_i) dγ_i. Theorem: a density p(x) that is symmetric with respect to the origin can be represented by a GSM iff p(√x) is completely monotonic on (0, ∞). Most of the sparse priors over x can be represented in this GSM form. [Palmer et al., 2006]

27 Examples of Gaussian Scale Mixtures.
Laplacian density: p(x; a) = (a/2) exp(−a|x|). Scale mixing density: p(γ) = (a²/2) exp(−(a²/2) γ), γ ≥ 0.
Student-t distribution: p(x; a, b) = (b^a Γ(a + 1/2)) / ((2π)^{1/2} Γ(a)) · (b + x²/2)^{−(a+1/2)}. Scale mixing density: Gamma distribution.
Generalized Gaussian: p(x; p) = 1 / (2Γ(1 + 1/p)) · exp(−|x|^p). Scale mixing density: positive alpha-stable density of order p/2.
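
The Laplacian example above can be checked numerically: draw γ from the exponential mixing density and then x | γ ∼ N(0, γ); the sample variance should match the Laplacian value 2/a². The parameter value and sample size below are arbitrary.

```python
# Numerical check of the Laplacian-as-GSM identity.
import numpy as np

rng = np.random.default_rng(1)
a = 2.0
n_samples = 200_000

# p(gamma) = (a^2/2) exp(-(a^2/2) gamma) is exponential with scale 2/a^2.
gamma = rng.exponential(scale=2.0 / a**2, size=n_samples)
x = rng.normal(loc=0.0, scale=np.sqrt(gamma))      # x | gamma ~ N(0, gamma)

print(x.var(), 2.0 / a**2)                         # both should be close to 0.5 for a = 2
```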

28 Generalized Scale Mixture Family. GSM corresponds to ℓ2-norm-based SSR algorithms; LSM corresponds to ℓ1-norm-based SSR algorithms. A generalized scale mixture is needed for a unified treatment of ℓ1- and ℓ2-minimization-based SSR.

30 Power Exponential Scale Mixture Distributions (PESM). Power Exponential distribution, also known as the Box and Tiao (BT) or Generalized Gaussian distribution (GGD): p_PE(x; 0, σ, p) = K exp(−|x|^p / σ^p). Scale mixture of power exponentials: p(x_i) = ∫ p(x_i | γ_i) p(γ_i) dγ_i = ∫ p_PE(x_i; 0, γ_i, p) p(γ_i) dγ_i.

31 Power Exponential Scale Mixture Distributions (PESM). Choice of p = 2: Gaussian Scale Mixtures (GSM), ℓ2-norm-minimization-based algorithms. Choice of p = 1: Laplacian Scale Mixtures (LSM), ℓ1-norm-minimization-based algorithms. PESM: unified treatment of both ℓ1- and ℓ2-based algorithms.

35 PESM Example: Generalized t distribution. Inverse Generalized Gamma (GG) for the scaling density: p(γ_i) = p_GG(γ_i; p, σ, q) = η (σ/γ_i)^{pq+1} exp(−(σ/γ_i)^p), which gives p_GT(x; σ, p, q) = K (1 + |x|^p / (q σ^p))^{−(q + 1/p)}. A wide class of heavy-tailed super-Gaussian densities can be represented by the GT with suitable shape parameters p and q.
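
A small sketch that evaluates the Generalized t density above for a few shape choices, to show how p and q control the tails; computing the normalizing constant K by quadrature is an implementation shortcut, not the closed-form expression.

```python
# Evaluate the Generalized t density with a numerically computed normalizer.
import numpy as np
from scipy.integrate import quad

def gt_density(x, sigma=1.0, p=1.0, q=1.0):
    unnorm = lambda t: (1.0 + np.abs(t)**p / (q * sigma**p)) ** (-(q + 1.0 / p))
    K = 1.0 / quad(unnorm, -np.inf, np.inf)[0]     # numerical normalizing constant
    return K * unnorm(x)

xs = np.linspace(-5.0, 5.0, 11)
print(gt_density(xs, p=2.0, q=50.0))               # large q, p = 2: close to a Gaussian shape
print(gt_density(xs, p=1.0, q=0.5))                # small q, p = 1: much heavier tails
```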

36 Variants of GT. Table: Variants of the Generalized t Distribution.
q | p | Distribution
q → ∞ | 2 | Normal
q → ∞ | 1 | Laplacian (Double Exponential)
q > 0 (degrees of freedom) | 2 | Student-t distribution
q > 0 (shape parameter) | 1 | Generalized Double Pareto (GDP)

39 Bayesian Methods MAP Estimation (Type I) Hierarchical Bayes (Type II)

40 MAP Estimation Framework (Type I). Problem statement: x̂ = arg max_x p(x | y) = arg max_x p(y | x) p(x). The choice of the Laplacian prior p(x) = (a/2) exp(−a|x|) together with a Gaussian likelihood leads to the familiar LASSO framework.
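
A minimal ISTA-style sketch for the resulting LASSO problem, written here in the common form min_x (1/2)‖y − Φx‖² + λ‖x‖₁; the step size, iteration count, and λ are illustrative assumptions, not the talk's settings.

```python
# ISTA (proximal gradient) sketch for the LASSO objective (1/2)||y - Phi x||^2 + lam*||x||_1.
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista_lasso(Phi, y, lam, n_iter=500):
    L = np.linalg.norm(Phi, 2) ** 2                # Lipschitz constant of the quadratic term
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)               # gradient of (1/2)||y - Phi x||^2
        x = soft_threshold(x - grad / L, lam / L)  # gradient step followed by the l1 prox
    return x
```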

42 Hierarchical Bayesian Framework (Type II). Problem statement: γ̂ = arg max_γ p(γ | y) = arg max_γ p(y | γ) p(γ). Using this estimate of γ we can compute the posterior of interest, p(x | y; γ̂).

44 Hierarchical Bayesian Framework (Type II). Potential advantages: averaging over x leads to fewer minima in p(γ | y); γ can tie several parameters together, leading to fewer parameters; the true posterior mass over the subspaces spanned by the nonzero indexes is maximized, instead of looking for the mode.
Bayesian LASSO: the Laplacian p(x) as a GSM:
p(x) = ∫ p(x | γ) p(γ) dγ = ∫ (1/√(2πγ)) exp(−x²/(2γ)) · (a²/2) exp(−(a²/2) γ) dγ = (a/2) exp(−a|x|).
[Bayesian Compressive Sensing Using Laplace Priors, Babacan et al.]

47 MAP Estimation (Type I) Framework. Problem statement: x̂ = arg max_x log p(x | y) = arg max_x [log p(y | x) + log p(x)]. Examples:
Prior Distribution | Penalty Function | SSR Algorithm
Normal | ‖x‖₂² | Ridge Regression
Laplacian | ‖x‖₁ | LASSO
Student-t distribution | Σ_i log(ɛ + x_i²) | Reweighted ℓ2 (Chartrand's)
Generalized Double Pareto | Σ_i log(ɛ + |x_i|) | Reweighted ℓ1 (Candes's)
PESM as a sparsity-promoting prior p(x): Unified Type I Framework.

49 Unified Type I Framework. Choice of prior p(x): any distribution in the PESM class. EM algorithm: complete-data log-likelihood log p(y, x, γ) = log p(y | x) + log p(x | γ) + log p(γ); hidden variable: γ; relevant posterior: p(γ | x, y) = p(γ | x) (from the Markov chain).

50 Unified Type I: E step. Q(x) = E_{γ|x}[ log p(y | x) + log p(x | γ) + log p(γ) ]. E step: only the second term depends on both x and γ, so compute E_{γ_i|x_i}[ 1/γ_i^p ].

52 Unified Type I: E step.
p'(x_i) = d/dx_i ∫₀^∞ p(x_i | γ_i) p(γ_i) dγ_i = −p |x_i|^{p−1} sign(x_i) p(x_i) ∫₀^∞ (1/γ_i^p) p(γ_i | x_i) dγ_i = −p |x_i|^{p−1} sign(x_i) p(x_i) E_{γ_i|x_i}[ 1/γ_i^p ].
E step: E_{γ_i|x_i}[ 1/γ_i^p ] = −p'(x_i) / ( p |x_i|^{p−1} sign(x_i) p(x_i) ).
Note: there is no need to know p(γ), as long as p(x) is known and has a PESM representation.

53 Unified Type I: M step.
M step: x̂^{(k+1)} = arg min_x (1/(2λ)) ‖y − Φx‖² + Σ_i w_i^{(k)} |x_i|^p, where w_i^{(k)} = E_{γ_i | x_i^{(k)}}[ 1/γ_i^p ].
Special case, Generalized t distribution: w_i^{(k)} = (q + 1/p) / ( q σ^p + |x_i^{(k)}|^p ).
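
A sketch of this unified Type I EM loop for p = 1 with the Generalized t weights above: the M step is a weighted LASSO solved by proximal gradient descent and the E step recomputes the weights. The values of λ, σ, q and the iteration counts are illustrative assumptions.

```python
# Unified Type I EM sketch for p = 1 (weighted LASSO M step, GT weights in the E step).
import numpy as np

def weighted_lasso_ista(Phi, y, w, lam, n_iter=200):
    """Minimize (1/(2*lam))||y - Phi x||^2 + sum_i w_i |x_i| by proximal gradient."""
    L = np.linalg.norm(Phi, 2) ** 2 / lam          # Lipschitz constant of the smooth term
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y) / lam
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - w / L, 0.0)   # per-coordinate soft threshold
    return x

def type1_em_gt(Phi, y, lam=0.01, sigma=1.0, q=1.0, p=1.0, n_outer=20):
    M = Phi.shape[1]
    x = np.zeros(M)
    w = np.ones(M)
    for _ in range(n_outer):
        x = weighted_lasso_ista(Phi, y, w, lam)               # M step (p = 1)
        w = (q + 1.0 / p) / (q * sigma**p + np.abs(x)**p)     # E step: GT weights from the slide
    return x
```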

55 Hierarchical Bayesian Framework (Type II). Estimate the posterior distribution of x using the estimated γ̂, i.e. p(x | y; γ̂). The choice of a GSM for p(x) leads to Sparse Bayesian Learning.

61 Sparse Bayesian Learning (Type II). y = Φx + v. Solving for the MAP estimate of γ: γ̂ = arg max_γ p(γ | y) = arg max_γ p(y | γ) p(γ). What is p(y | γ)? Given γ, x is Gaussian with mean zero and covariance matrix Γ = diag(γ), i.e. p(x | γ) = N(x; 0, Γ) = Π_i N(x_i; 0, γ_i). Then p(y | γ) = N(y; 0, Σ_y), where Σ_y = σ²I + ΦΓΦ^T and p(y | γ) = (2π)^{−N/2} |Σ_y|^{−1/2} exp(−(1/2) y^T Σ_y^{−1} y).

63 Sparse Bayesian Learning (Tipping). y = Φx + v. Solving for the optimal γ:
γ̂ = arg max_γ p(γ | y) = arg max_γ p(y | γ) p(γ) = arg min_γ log |Σ_y| + y^T Σ_y^{−1} y − 2 Σ_i log p(γ_i), where Σ_y = σ²I + ΦΓΦ^T and Γ = diag(γ).
Computational methods: many options exist for solving this optimization problem, e.g. Majorization-Minimization or Expectation-Maximization (EM).
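
The Type II objective above can be evaluated directly for a candidate γ; the sketch below assumes a flat (non-informative) prior on γ, so the Σ_i log p(γ_i) term is dropped.

```python
# Evaluate the SBL (Type II) cost log|Sigma_y| + y^T Sigma_y^{-1} y for a given gamma.
import numpy as np

def sbl_cost(gamma, Phi, y, sigma2):
    Sigma_y = sigma2 * np.eye(len(y)) + Phi @ np.diag(gamma) @ Phi.T
    _, logdet = np.linalg.slogdet(Sigma_y)
    return logdet + y @ np.linalg.solve(Sigma_y, y)
```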

67 Sparse Bayesian Learning. y = Φx + v. Computing the posterior: because of the convenient GSM choice, the posterior can be computed in closed form, p(x | y; γ̂) = N(μ_x, Σ_x), where
μ_x = E[x | y; γ̂] = Γ̂ Φ^T (σ²I + Φ Γ̂ Φ^T)^{−1} y
Σ_x = Cov[x | y; γ̂] = Γ̂ − Γ̂ Φ^T (σ²I + Φ Γ̂ Φ^T)^{−1} Φ Γ̂
μ_x can be used as a point estimate. Sparsity of μ_x is achieved through sparsity in γ. Another quantity of interest for the EM algorithm is E[x_i² | y, γ̂] = μ_x(i)² + Σ_x(i, i).
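
A direct transcription of the posterior formulas above; it forms and solves against the N × N matrix σ²I + ΦΓΦ^T explicitly, which is simple but not the most numerically careful route.

```python
# Posterior mean and covariance of x given y and a fixed gamma (SBL posterior).
import numpy as np

def sbl_posterior(gamma, Phi, y, sigma2):
    Gamma = np.diag(gamma)
    S = sigma2 * np.eye(len(y)) + Phi @ Gamma @ Phi.T                  # sigma^2 I + Phi Gamma Phi^T
    mu_x = Gamma @ Phi.T @ np.linalg.solve(S, y)                       # posterior mean
    Sigma_x = Gamma - Gamma @ Phi.T @ np.linalg.solve(S, Phi @ Gamma)  # posterior covariance
    return mu_x, Sigma_x
```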

72 EM algorithm: Updating γ. Treat (y, x) as the complete data and the vector x as the hidden variable: log p(y, x, γ) = log p(y | x) + log p(x | γ) + log p(γ).
E step: Q(γ | γ^k) = E_{x|y; γ^k}[ log p(y | x) + log p(x | γ) + log p(γ) ].
M step: γ^{k+1} = arg max_γ Q(γ | γ^k) = arg max_γ E_{x|y; γ^k}[ log p(x | γ) + log p(γ) ] = arg min_γ Σ_{i=1}^M ( E_{x|y; γ^k}[x_i²] / (2γ_i) + (1/2) log γ_i − log p(γ_i) ).
Solving this optimization problem with a non-informative prior p(γ) gives γ_i^{k+1} = E[x_i² | y, γ^k] = μ_x(i)² + Σ_x(i, i).
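
Putting the E and M steps together gives a minimal SBL-EM loop under the non-informative prior on γ; it reuses the sbl_posterior sketch above, and the noise variance, initialization, and iteration count are illustrative choices.

```python
# Minimal SBL-EM loop: E step via the posterior, M step gamma_i = E[x_i^2 | y, gamma].
import numpy as np

def sbl_em(Phi, y, sigma2=1e-4, n_iter=100):
    M = Phi.shape[1]
    gamma = np.ones(M)
    for _ in range(n_iter):
        mu_x, Sigma_x = sbl_posterior(gamma, Phi, y, sigma2)   # E step (sketch defined earlier)
        gamma = mu_x**2 + np.diag(Sigma_x)                     # M step
    return mu_x, gamma
```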

77 Type II (SBL) properties. Local minima are sparse, i.e. have at most N nonzero γ_i. The cost function p(γ | y) is generally much smoother than the associated MAP estimation objective p(x | y), so there are fewer local minima. At high signal-to-noise ratio, the global minimum is the sparsest solution; there are no structural problems. It attempts to approximate the posterior distribution p(x | y) in the region with significant mass.

83 Algorithmic Variants: fixed-point iteration based on setting the derivative of the objective function to zero (Tipping); sequential search for the significant γ's (Tipping and Faul); Majorization-Minimization based approach (Wipf and Nagarajan); reweighted ℓ1 and ℓ2 algorithms (Wipf and Nagarajan); Approximate Message Passing (AlShoukairi and Rao).

85 Type II using PESM. In the E step we need to compute a conditional expectation; a closed form may not be available, depending on the choice of p (the distributional parameter of the PESM). Alternative: MCMC techniques. LSM: using the fact that a Laplacian density has a GSM representation, a tractable 3-layer hierarchical model can be developed.

86 Simulation Results. Parameters: (1) N = 50, M = ; (2) dictionary elements drawn from a normal distribution with mean 0 and standard deviation 1; (3) distribution of nonzero elements: (I) zero-mean, unit-variance Gaussian; (II) Student-t distribution with ν = 3 degrees of freedom (super-Gaussian); (III) uniform ±1 random spikes.

87 Simulation Results: Gaussian. Figure: Recovery performance with Gaussian-distributed nonzero coefficients.

88 Simulation Results: Super-Gaussian. Figure: Recovery performance with super-Gaussian (Student-t) distributed nonzero coefficients.

89 Simulation Results: Uniform. Figure: Recovery performance with uniform spikes as nonzero coefficients.

90 Special Case: MMV Multiple measurements: L measurements Common Sparsity Profile: k nonzero rows

96 Bayesian Methods: GSM Extension. Representation for random vectors (rows for the MMV problem): X = √γ G, where G ∼ N(g; 0, B) and γ is a positive random variable independent of G, so that p(x) = ∫ p(x | γ) p(γ) dγ = ∫ N(x; 0, γB) p(γ) dγ. B = I if the row entries are assumed independent. One γ per row vector, so the complexity of estimating γ does not grow with L. The EM algorithm is also very tractable.
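
A sketch of the corresponding MMV EM update (in the spirit of M-SBL) under these assumptions, with one γ per row and B = I: the posterior is computed column-wise with a shared covariance, and the γ update averages E[x_i²] over the L columns. Initialization and noise level are illustrative.

```python
# MMV extension sketch: one gamma per row of X, B = I, EM updates as in the single-vector case.
import numpy as np

def mmv_sbl_em(Phi, Y, sigma2=1e-4, n_iter=100):
    N, M = Phi.shape
    gamma = np.ones(M)
    for _ in range(n_iter):
        Gamma = np.diag(gamma)
        S = sigma2 * np.eye(N) + Phi @ Gamma @ Phi.T
        Mu = Gamma @ Phi.T @ np.linalg.solve(S, Y)                         # M x L posterior means
        Sigma_x = Gamma - Gamma @ Phi.T @ np.linalg.solve(S, Phi @ Gamma)  # shared covariance
        gamma = (Mu**2).mean(axis=1) + np.diag(Sigma_x)                    # one gamma per row
    return Mu, gamma
```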

97 MMV Empirical Comparison: 1000 trials

105 Summary. Sparse Signal Recovery (SSR) and Compressed Sensing (CS) are interesting new signal processing tools with many potential applications. Many algorithmic options exist for solving the underlying sparse signal recovery problem: greedy search techniques, regularization methods, and Bayesian methods, among others. Bayesian methods offer interesting algorithmic options for the SSR problem: MAP methods (reweighted ℓ1 and ℓ2 methods) and hierarchical Bayesian methods (Sparse Bayesian Learning). They are versatile and can be more easily employed in problems with structure, and the algorithms can often be justified by studying the resulting objective functions.
