Scale Mixture Modeling of Priors for Sparse Signal Recovery

1 Scale Mixture Modeling of Priors for Sparse Signal Recovery. Bhaskar D. Rao, University of California, San Diego. Thanks to David Wipf, Jason Palmer, Zhilin Zhang and Ritwik Giri.

7 Outline
- Sparse Signal Recovery (SSR) Problem and some Extensions
- Scale Mixture Priors: Gaussian Scale Mixture (GSM), Laplacian Scale Mixture (LSM), Power Exponential Scale Mixture (PESM)
- Bayesian Methods: MAP estimation (Type I), Hierarchical Bayes (Type II)
- Experimental Results
- Summary

8 Problem Description: Sparse Signal Recovery (SSR). y = Φx + v, where y is an N × 1 measurement vector, Φ is an N × M dictionary matrix with M >> N, x is the M × 1 desired vector, which is sparse with k nonzero entries, and v is the measurement noise.
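
To make the notation concrete, here is a minimal Python sketch of the measurement model y = Φx + v; the dimensions, sparsity level, and noise level below are illustrative choices, not values from the talk.

```python
# Minimal sketch of the SSR measurement model y = Phi x + v (illustrative sizes).
import numpy as np

rng = np.random.default_rng(0)

N, M, k = 50, 200, 10                              # measurements, dictionary columns, sparsity
Phi = rng.standard_normal((N, M)) / np.sqrt(N)     # N x M dictionary, M >> N

x = np.zeros(M)                                    # M x 1 sparse vector with k nonzero entries
support = rng.choice(M, size=k, replace=False)
x[support] = rng.standard_normal(k)

sigma = 0.01
v = sigma * rng.standard_normal(N)                 # measurement noise
y = Phi @ x + v                                    # N x 1 measurement vector
```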

12 Extensions: Block Sparsity; Multiple Measurement Vectors (MMV); Block MMV; MMV with time-varying sparsity.

13 Multiple Measurement Vectors (MMV) Multiple measurements: L measurements Common Sparsity Profile: k nonzero rows

15 Applications: Signal Representation (Mallat, Coifman, Donoho, ...), EEG/MEG (Leahy, Gorodnitsky, Ioannides, ...), Robust Linear Regression and Outlier Detection (Jin, Giannakis, ...), Speech Coding (Ozawa, Ono, Kroon, ...), Compressed Sensing (Donoho, Candes, Tao, ...), Magnetic Resonance Imaging (Lustig, ...), Sparse Channel Equalization (Fevrier, Proakis, ...), Face Recognition (Wright, Yang, ...), Cognitive Radio (Eldar, ...), and many more.

19 Potential Algorithmic Approaches. Finding the optimal solution is NP-hard, so low-complexity algorithms with reasonable performance are needed. Greedy search techniques: Matching Pursuit (MP), Orthogonal Matching Pursuit (OMP), ... Minimizing diversity measures (regularization framework): tractable surrogate cost functions, e.g. ℓ1 minimization, ... Bayesian methods: make appropriate statistical assumptions on the solution (sparsity) through the choice of prior.
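
As one concrete instance of the greedy family above, here is a bare-bones Orthogonal Matching Pursuit sketch; running for exactly k iterations is a simplifying stopping rule, and the function name is just illustrative.

```python
# Bare-bones Orthogonal Matching Pursuit (OMP) sketch.
import numpy as np

def omp(Phi, y, k):
    """Greedily select k columns of Phi and least-squares fit on that support."""
    N, M = Phi.shape
    residual = y.copy()
    support = []
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        idx = int(np.argmax(np.abs(Phi.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Re-fit on the enlarged support and update the residual.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(M)
    x_hat[support] = coef
    return x_hat
```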

20 Bayesian Methods: Choice of Prior. Super-Gaussian distributions: heavy tailed, with a sharper peak at the origin compared to the Gaussian. Tractable representations using scale mixtures: Gaussian Scale Mixture (GSM), Laplacian Scale Mixture (LSM), Power Exponential Scale Mixture (PESM).

24 Gaussian Scale Mixtures. Separability: p(x) = Π_i p(x_i), with p(x_i) = ∫ p(x_i | γ_i) p(γ_i) dγ_i = ∫ N(x_i; 0, γ_i) p(γ_i) dγ_i. Theorem: a density p(x) that is symmetric with respect to the origin can be represented by a GSM iff p(√x) is completely monotonic on (0, ∞). Most of the sparse priors over x can be represented in this GSM form. [Palmer et al., 2006]

27 Examples of Gaussian Scale Mixtures.
Laplacian density: p(x; a) = (a/2) exp(−a|x|). Scale mixing density: p(γ) = (a²/2) exp(−(a²/2) γ), γ ≥ 0.
Student-t distribution: p(x; a, b) = (b^a Γ(a + 1/2)) / ((2π)^{1/2} Γ(a)) · (b + x²/2)^{−(a+1/2)}. Scale mixing density: Gamma distribution.
Generalized Gaussian: p(x; p) = 1 / (2Γ(1 + 1/p)) · exp(−|x|^p). Scale mixing density: positive alpha-stable density of order p/2.
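
The Laplacian example above can be checked numerically: draw γ from the exponential mixing density and then x | γ ∼ N(0, γ); the sample variance should match the Laplacian value 2/a². The parameter value and sample size below are arbitrary.

```python
# Numerical check of the Laplacian-as-GSM identity.
import numpy as np

rng = np.random.default_rng(1)
a = 2.0
n_samples = 200_000

# p(gamma) = (a^2/2) exp(-(a^2/2) gamma) is exponential with scale 2/a^2.
gamma = rng.exponential(scale=2.0 / a**2, size=n_samples)
x = rng.normal(loc=0.0, scale=np.sqrt(gamma))      # x | gamma ~ N(0, gamma)

print(x.var(), 2.0 / a**2)                         # both should be close to 0.5 for a = 2
```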

28 Generalized Scale Mixture Family. GSM corresponds to ℓ2-norm-based SSR algorithms; LSM corresponds to ℓ1-norm-based SSR algorithms. A generalized scale mixture is needed for a unified treatment of ℓ1- and ℓ2-minimization-based SSR.

30 Power Exponential Scale Mixture Distributions (PESM). Power Exponential distribution, also known as the Box and Tiao (BT) or Generalized Gaussian distribution (GGD): p_PE(x; 0, σ, p) = K exp(−|x|^p / σ^p). Scale mixture of power exponentials: p(x_i) = ∫ p(x_i | γ_i) p(γ_i) dγ_i = ∫ p_PE(x_i; 0, γ_i, p) p(γ_i) dγ_i.

31 Power Exponential Scale Mixture Distributions (PESM). Choice of p = 2: Gaussian Scale Mixtures (GSM), ℓ2-norm-minimization-based algorithms. Choice of p = 1: Laplacian Scale Mixtures (LSM), ℓ1-norm-minimization-based algorithms. PESM: unified treatment of both ℓ1- and ℓ2-based algorithms.

35 PESM Example: Generalized t distribution. Inverse Generalized Gamma (GG) for the scaling density: p(γ_i) = p_GG(γ_i; p, σ, q) = η (σ/γ_i)^{pq+1} exp(−(σ/γ_i)^p), which gives p_GT(x; σ, p, q) = K (1 + |x|^p / (q σ^p))^{−(q + 1/p)}. A wide class of heavy-tailed super-Gaussian densities can be represented by the GT with suitable shape parameters p and q.
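
A small sketch that evaluates the Generalized t density above for a few shape choices, to show how p and q control the tails; computing the normalizing constant K by quadrature is an implementation shortcut, not the closed-form expression.

```python
# Evaluate the Generalized t density with a numerically computed normalizer.
import numpy as np
from scipy.integrate import quad

def gt_density(x, sigma=1.0, p=1.0, q=1.0):
    unnorm = lambda t: (1.0 + np.abs(t)**p / (q * sigma**p)) ** (-(q + 1.0 / p))
    K = 1.0 / quad(unnorm, -np.inf, np.inf)[0]     # numerical normalizing constant
    return K * unnorm(x)

xs = np.linspace(-5.0, 5.0, 11)
print(gt_density(xs, p=2.0, q=50.0))               # large q, p = 2: close to a Gaussian shape
print(gt_density(xs, p=1.0, q=0.5))                # small q, p = 1: much heavier tails
```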

36 Variants of GT. Table: Variants of the Generalized t Distribution.
q | p | Distribution
q → ∞ | 2 | Normal
q → ∞ | 1 | Laplacian (Double Exponential)
q > 0 (degrees of freedom) | 2 | Student-t distribution
q > 0 (shape parameter) | 1 | Generalized Double Pareto (GDP)

39 Bayesian Methods MAP Estimation (Type I) Hierarchical Bayes (Type II)

40 MAP Estimation Framework (Type I). Problem statement: x̂ = arg max_x p(x | y) = arg max_x p(y | x) p(x). The choice of the Laplacian prior p(x) = (a/2) exp(−a|x|) together with a Gaussian likelihood leads to the familiar LASSO framework.
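
A minimal ISTA-style sketch for the resulting LASSO problem, written here in the common form min_x (1/2)‖y − Φx‖² + λ‖x‖₁; the step size, iteration count, and λ are illustrative assumptions, not the talk's settings.

```python
# ISTA (proximal gradient) sketch for the LASSO objective (1/2)||y - Phi x||^2 + lam*||x||_1.
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista_lasso(Phi, y, lam, n_iter=500):
    L = np.linalg.norm(Phi, 2) ** 2                # Lipschitz constant of the quadratic term
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)               # gradient of (1/2)||y - Phi x||^2
        x = soft_threshold(x - grad / L, lam / L)  # gradient step followed by the l1 prox
    return x
```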

42 Hierarchical Bayesian Framework (Type II). Problem statement: γ̂ = arg max_γ p(γ | y) = arg max_γ p(y | γ) p(γ). Using this estimate of γ we can compute the posterior of interest, p(x | y; γ̂).

44 Hierarchical Bayesian Framework (Type II). Potential advantages: averaging over x leads to fewer minima in p(γ | y); γ can tie several parameters together, leading to fewer parameters; the true posterior mass over the subspaces spanned by the nonzero indexes is maximized, instead of looking for the mode.
Bayesian LASSO: the Laplacian p(x) as a GSM:
p(x) = ∫ p(x | γ) p(γ) dγ = ∫ (1/√(2πγ)) exp(−x²/(2γ)) · (a²/2) exp(−(a²/2) γ) dγ = (a/2) exp(−a|x|).
[Bayesian Compressive Sensing Using Laplace Priors, Babacan et al.]

47 MAP Estimation (Type I) Framework. Problem statement: x̂ = arg max_x log p(x | y) = arg max_x [log p(y | x) + log p(x)]. Examples:
Prior Distribution | Penalty Function | SSR Algorithm
Normal | ‖x‖₂² | Ridge Regression
Laplacian | ‖x‖₁ | LASSO
Student-t distribution | Σ_i log(ɛ + x_i²) | Reweighted ℓ2 (Chartrand's)
Generalized Double Pareto | Σ_i log(ɛ + |x_i|) | Reweighted ℓ1 (Candes's)
PESM as a sparsity-promoting prior p(x): Unified Type I Framework.

49 Unified Type I Framework. Choice of prior p(x): any distribution in the PESM class. EM algorithm: complete-data log-likelihood log p(y, x, γ) = log p(y | x) + log p(x | γ) + log p(γ); hidden variable: γ; relevant posterior: p(γ | x, y) = p(γ | x) (from the Markov chain).

50 Unified Type I: E step. Q(x) = E_{γ|x}[ log p(y | x) + log p(x | γ) + log p(γ) ]. E step: only the second term depends on both x and γ, so compute E_{γ_i|x_i}[ 1/γ_i^p ].

52 Unified Type I: E step.
p'(x_i) = d/dx_i ∫₀^∞ p(x_i | γ_i) p(γ_i) dγ_i = −p |x_i|^{p−1} sign(x_i) p(x_i) ∫₀^∞ (1/γ_i^p) p(γ_i | x_i) dγ_i = −p |x_i|^{p−1} sign(x_i) p(x_i) E_{γ_i|x_i}[ 1/γ_i^p ].
E step: E_{γ_i|x_i}[ 1/γ_i^p ] = −p'(x_i) / ( p |x_i|^{p−1} sign(x_i) p(x_i) ).
Note: there is no need to know p(γ), as long as p(x) is known and has a PESM representation.

53 Unified Type I: M step.
M step: x̂^{(k+1)} = arg min_x (1/(2λ)) ‖y − Φx‖² + Σ_i w_i^{(k)} |x_i|^p, where w_i^{(k)} = E_{γ_i | x_i^{(k)}}[ 1/γ_i^p ].
Special case, Generalized t distribution: w_i^{(k)} = (q + 1/p) / ( q σ^p + |x_i^{(k)}|^p ).
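
A sketch of this unified Type I EM loop for p = 1 with the Generalized t weights above: the M step is a weighted LASSO solved by proximal gradient descent and the E step recomputes the weights. The values of λ, σ, q and the iteration counts are illustrative assumptions.

```python
# Unified Type I EM sketch for p = 1 (weighted LASSO M step, GT weights in the E step).
import numpy as np

def weighted_lasso_ista(Phi, y, w, lam, n_iter=200):
    """Minimize (1/(2*lam))||y - Phi x||^2 + sum_i w_i |x_i| by proximal gradient."""
    L = np.linalg.norm(Phi, 2) ** 2 / lam          # Lipschitz constant of the smooth term
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y) / lam
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - w / L, 0.0)   # per-coordinate soft threshold
    return x

def type1_em_gt(Phi, y, lam=0.01, sigma=1.0, q=1.0, p=1.0, n_outer=20):
    M = Phi.shape[1]
    x = np.zeros(M)
    w = np.ones(M)
    for _ in range(n_outer):
        x = weighted_lasso_ista(Phi, y, w, lam)               # M step (p = 1)
        w = (q + 1.0 / p) / (q * sigma**p + np.abs(x)**p)     # E step: GT weights from the slide
    return x
```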

55 Hierarchical Bayesian Framework (Type II). Estimate the posterior distribution of x using the estimated γ̂, i.e. p(x | y; γ̂). The choice of a GSM for p(x) leads to Sparse Bayesian Learning.

61 Sparse Bayesian Learning (Type II). y = Φx + v. Solving for the MAP estimate of γ: γ̂ = arg max_γ p(γ | y) = arg max_γ p(y | γ) p(γ). What is p(y | γ)? Given γ, x is Gaussian with mean zero and covariance matrix Γ = diag(γ), i.e. p(x | γ) = N(x; 0, Γ) = Π_i N(x_i; 0, γ_i). Then p(y | γ) = N(y; 0, Σ_y), where Σ_y = σ²I + ΦΓΦ^T and p(y | γ) = (2π)^{−N/2} |Σ_y|^{−1/2} exp(−(1/2) y^T Σ_y^{−1} y).

63 Sparse Bayesian Learning (Tipping). y = Φx + v. Solving for the optimal γ:
γ̂ = arg max_γ p(γ | y) = arg max_γ p(y | γ) p(γ) = arg min_γ log |Σ_y| + y^T Σ_y^{−1} y − 2 Σ_i log p(γ_i), where Σ_y = σ²I + ΦΓΦ^T and Γ = diag(γ).
Computational methods: many options exist for solving this optimization problem, e.g. Majorization-Minimization or Expectation-Maximization (EM).
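
The Type II objective above can be evaluated directly for a candidate γ; the sketch below assumes a flat (non-informative) prior on γ, so the Σ_i log p(γ_i) term is dropped.

```python
# Evaluate the SBL (Type II) cost log|Sigma_y| + y^T Sigma_y^{-1} y for a given gamma.
import numpy as np

def sbl_cost(gamma, Phi, y, sigma2):
    Sigma_y = sigma2 * np.eye(len(y)) + Phi @ np.diag(gamma) @ Phi.T
    _, logdet = np.linalg.slogdet(Sigma_y)
    return logdet + y @ np.linalg.solve(Sigma_y, y)
```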

67 Sparse Bayesian Learning. y = Φx + v. Computing the posterior: because of the convenient GSM choice, the posterior can be computed in closed form, p(x | y; γ̂) = N(μ_x, Σ_x), where
μ_x = E[x | y; γ̂] = Γ̂ Φ^T (σ²I + Φ Γ̂ Φ^T)^{−1} y
Σ_x = Cov[x | y; γ̂] = Γ̂ − Γ̂ Φ^T (σ²I + Φ Γ̂ Φ^T)^{−1} Φ Γ̂
μ_x can be used as a point estimate. Sparsity of μ_x is achieved through sparsity in γ. Another quantity of interest for the EM algorithm is E[x_i² | y, γ̂] = μ_x(i)² + Σ_x(i, i).
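
A direct transcription of the posterior formulas above; it forms and solves against the N × N matrix σ²I + ΦΓΦ^T explicitly, which is simple but not the most numerically careful route.

```python
# Posterior mean and covariance of x given y and a fixed gamma (SBL posterior).
import numpy as np

def sbl_posterior(gamma, Phi, y, sigma2):
    Gamma = np.diag(gamma)
    S = sigma2 * np.eye(len(y)) + Phi @ Gamma @ Phi.T                  # sigma^2 I + Phi Gamma Phi^T
    mu_x = Gamma @ Phi.T @ np.linalg.solve(S, y)                       # posterior mean
    Sigma_x = Gamma - Gamma @ Phi.T @ np.linalg.solve(S, Phi @ Gamma)  # posterior covariance
    return mu_x, Sigma_x
```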

72 EM algorithm: Updating γ. Treat (y, x) as the complete data and the vector x as the hidden variable: log p(y, x, γ) = log p(y | x) + log p(x | γ) + log p(γ).
E step: Q(γ | γ^k) = E_{x|y; γ^k}[ log p(y | x) + log p(x | γ) + log p(γ) ].
M step: γ^{k+1} = arg max_γ Q(γ | γ^k) = arg max_γ E_{x|y; γ^k}[ log p(x | γ) + log p(γ) ] = arg min_γ Σ_{i=1}^M ( E_{x|y; γ^k}[x_i²] / (2γ_i) + (1/2) log γ_i − log p(γ_i) ).
Solving this optimization problem with a non-informative prior p(γ) gives γ_i^{k+1} = E[x_i² | y, γ^k] = μ_x(i)² + Σ_x(i, i).
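
Putting the E and M steps together gives a minimal SBL-EM loop under the non-informative prior on γ; it reuses the sbl_posterior sketch above, and the noise variance, initialization, and iteration count are illustrative choices.

```python
# Minimal SBL-EM loop: E step via the posterior, M step gamma_i = E[x_i^2 | y, gamma].
import numpy as np

def sbl_em(Phi, y, sigma2=1e-4, n_iter=100):
    M = Phi.shape[1]
    gamma = np.ones(M)
    for _ in range(n_iter):
        mu_x, Sigma_x = sbl_posterior(gamma, Phi, y, sigma2)   # E step (sketch defined earlier)
        gamma = mu_x**2 + np.diag(Sigma_x)                     # M step
    return mu_x, gamma
```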

77 Type II (SBL) properties. Local minima are sparse, i.e. have at most N nonzero γ_i. The cost function p(γ | y) is generally much smoother than the associated MAP estimation objective p(x | y), so there are fewer local minima. At high signal-to-noise ratio, the global minimum is the sparsest solution; there are no structural problems. It attempts to approximate the posterior distribution p(x | y) in the region with significant mass.

83 Algorithmic Variants: fixed-point iteration based on setting the derivative of the objective function to zero (Tipping); sequential search for the significant γ's (Tipping and Faul); Majorization-Minimization based approach (Wipf and Nagarajan); reweighted ℓ1 and ℓ2 algorithms (Wipf and Nagarajan); Approximate Message Passing (AlShoukairi and Rao).

85 Type II using PESM. In the E step we need to compute a conditional expectation; a closed form may not be available, depending on the choice of p (the distributional parameter of the PESM). Alternative: MCMC techniques. LSM: using the fact that a Laplacian density has a GSM representation, a tractable 3-layer hierarchical model can be developed.

86 Simulation Results. Parameters: (1) N = 50, M = ; (2) dictionary elements drawn from a normal distribution with mean 0 and standard deviation 1; (3) distribution of nonzero elements: (I) zero-mean, unit-variance Gaussian; (II) Student-t distribution with ν = 3 degrees of freedom (super-Gaussian); (III) uniform ±1 random spikes.

87 Simulation Results: Gaussian. Figure: Recovery performance with Gaussian-distributed nonzero coefficients.

88 Simulation Results: Super-Gaussian. Figure: Recovery performance with super-Gaussian (Student-t) distributed nonzero coefficients.

89 Simulation Results: Uniform. Figure: Recovery performance with uniform spikes as nonzero coefficients.

90 Special Case: MMV Multiple measurements: L measurements Common Sparsity Profile: k nonzero rows

96 Bayesian Methods: GSM Extension. Representation for random vectors (rows for the MMV problem): X = √γ G, where G ∼ N(g; 0, B) and γ is a positive random variable independent of G, so that p(x) = ∫ p(x | γ) p(γ) dγ = ∫ N(x; 0, γB) p(γ) dγ. B = I if the row entries are assumed independent. One γ per row vector, so the complexity of estimating γ does not grow with L. The EM algorithm is also very tractable.
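
A sketch of the corresponding MMV EM update (in the spirit of M-SBL) under these assumptions, with one γ per row and B = I: the posterior is computed column-wise with a shared covariance, and the γ update averages E[x_i²] over the L columns. Initialization and noise level are illustrative.

```python
# MMV extension sketch: one gamma per row of X, B = I, EM updates as in the single-vector case.
import numpy as np

def mmv_sbl_em(Phi, Y, sigma2=1e-4, n_iter=100):
    N, M = Phi.shape
    gamma = np.ones(M)
    for _ in range(n_iter):
        Gamma = np.diag(gamma)
        S = sigma2 * np.eye(N) + Phi @ Gamma @ Phi.T
        Mu = Gamma @ Phi.T @ np.linalg.solve(S, Y)                         # M x L posterior means
        Sigma_x = Gamma - Gamma @ Phi.T @ np.linalg.solve(S, Phi @ Gamma)  # shared covariance
        gamma = (Mu**2).mean(axis=1) + np.diag(Sigma_x)                    # one gamma per row
    return Mu, gamma
```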

97 MMV Empirical Comparison: 1000 trials

105 Summary. Sparse Signal Recovery (SSR) and Compressed Sensing (CS) are interesting new signal processing tools with many potential applications. Many algorithmic options exist for solving the underlying sparse signal recovery problem: greedy search techniques, regularization methods, and Bayesian methods, among others. Bayesian methods offer interesting algorithmic options for the SSR problem: MAP methods (reweighted ℓ1 and ℓ2 methods) and hierarchical Bayesian methods (Sparse Bayesian Learning). They are versatile and can be more easily employed in problems with structure, and the algorithms can often be justified by studying the resulting objective functions.
