Efficient Variational Inference in Large-Scale Bayesian Compressed Sensing


1 Efficient Variational Inference in Large-Scale Bayesian Compressed Sensing
George Papandreou and Alan Yuille, Department of Statistics, University of California, Los Angeles
ICCV Workshop on Information Theory in Computer Vision, November 13, 2011, Barcelona, Spain

2 Inverse Image Problems: denoising, deblurring, inpainting.

3 The Sparse Linear Model
A hidden vector $x \in \mathbb{R}^N$ and noisy measurements $y \in \mathbb{R}^M$. Sparse linear model:
$P(x; \theta) \propto \prod_{k=1}^K t(g_k^T x), \qquad P(y \mid x; \theta) = N(y; Hx, \sigma^2 I)$
[Factor graph: sparsity factors $g_1, \dots, g_K$ and measurement factors $h_1, \dots, h_M$ connected to variables $x_1, \dots, x_N$]
Sparsity directions: $s = Gx$, with $G = [g_1^T; \dots; g_K^T]$
Measurement directions: $H = [h_1^T; \dots; h_M^T]$
Sparse potential: $t(s)$, e.g., Laplacian $t(s_k) = e^{-\tau_k |s_k|}$
Model parameters: $\theta = (G, H, \sigma^2)$
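
To make the notation concrete, here is a minimal sketch of the model in a toy 1-D denoising setting: $H$ is the identity, $G$ takes first differences, and the Laplacian potential then favors piecewise-constant signals. All sizes and values are illustrative assumptions, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 200                         # signal length
sigma, tau = 0.1, 10.0          # noise std and Laplacian scale
H = np.eye(N)                   # denoising: measure the signal directly
G = np.diff(np.eye(N), axis=0)  # first differences: rows of G are the g_k^T

# Piecewise-constant ground truth, so s = G x_true is sparse by construction.
x_true = np.repeat(rng.standard_normal(5), N // 5)
y = H @ x_true + sigma * rng.standard_normal(N)

s = G @ x_true
print("nonzero sparsity coefficients:", np.count_nonzero(np.abs(s) > 1e-12))
```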

4 Deterministic or Probabilistic Modeling?
Deterministic modeling: standard Compressive Sensing
- Find the minimum-energy configuration
- Same as finding the posterior MAP estimate
Probabilistic modeling: Bayesian Compressive Sensing
- Try to capture the full posterior distribution
- Suitable for learning parameters by maximum likelihood (ML)
- Harder than just a point estimate

5-6 Deterministic Modeling
MAP estimate as an optimization problem: the estimate is $\hat{x}_{\mathrm{MAP}} = \arg\min_x \phi_{\mathrm{MAP}}(x)$, where
$\phi_{\mathrm{MAP}}(x) = \sigma^{-2} \|y - Hx\|_2^2 - 2 \sum_{k=1}^K \log t(s_k), \qquad s_k = g_k^T x.$
Properties
- Modern optimization techniques allow us to find $\hat{x}_{\mathrm{MAP}}$ efficiently for large-scale problems.
- How much do we trust the solution? What about error bars?
- Is the MAP best in terms of PSNR performance?
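
As an illustration of the deterministic route, a sketch that minimizes a smoothed version of $\phi_{\mathrm{MAP}}$ with L-BFGS on the toy setup above; the $\epsilon$-smoothing of $|s_k|$ is my own device to make the objective differentiable, not part of the talk.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, sigma, tau = 200, 0.1, 10.0
H = np.eye(N)                               # toy denoising setup as above
G = np.diff(np.eye(N), axis=0)
x_true = np.repeat(rng.standard_normal(5), N // 5)
y = H @ x_true + sigma * rng.standard_normal(N)

def phi_map(x, eps=1e-8):
    # phi(x) = sigma^-2 ||y - Hx||^2 + 2*tau * sum_k sqrt(s_k^2 + eps);
    # the eps term smooths |s_k| so a quasi-Newton method applies.
    r = y - H @ x
    s = G @ x
    sm = np.sqrt(s**2 + eps)
    val = (r @ r) / sigma**2 + 2 * tau * sm.sum()
    grad = -2 * (H.T @ r) / sigma**2 + 2 * tau * (G.T @ (s / sm))
    return val, grad

res = minimize(phi_map, np.zeros(N), jac=True, method="L-BFGS-B")
x_map = res.x
print("MAP reconstruction error:", np.linalg.norm(x_map - x_true))
```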

7 Probabilistic Modeling
Work with the full posterior distribution:
$P(x \mid y) \propto N(y; Hx, \sigma^2 I) \prod_{k=1}^K t(g_k^T x).$
[Figure: posterior vs. prior/measure, from Seeger & Wipf, 10]

8 Probabilistic Modeling: Markov Chain Monte-Carlo vs. Variational Bayes
Markov Chain Monte-Carlo
- Draw samples from the posterior
- Typically model the prior with Gaussian mixtures and perform block Gibbs sampling
- Very general, but can be slow, and convergence is difficult to monitor
[Schmidt, Rao & Roth, 10], [Papandreou & Yuille, 10], ...
Variational Bayes
- Approximate the posterior distribution with a tractable parametric form
- Systematic error, but often guaranteed convergence
[Attias, 99], [Girolami, 01], [Lewicki & Sejnowski, 00], [Palmer et al., 05], [Levin et al., 11], [Seeger & Nickisch, 11], ...

9 Variational Bounding
Approximate the posterior distribution with a Gaussian:
$Q(x \mid y) \propto N(y; Hx, \sigma^2 I)\, e^{-\frac{1}{2} s^T \Gamma^{-1} s} = N(x; \hat{x}_Q, A^{-1}),$
with $\hat{x}_Q = A^{-1} b$, $A = \sigma^{-2} H^T H + G^T \Gamma^{-1} G$, $\Gamma = \mathrm{diag}(\gamma)$, and $b = \sigma^{-2} H^T y$.
Suitable for super-Gaussian priors: $t(s_k) = \sup_{\gamma_k > 0} e^{-s_k^2/(2\gamma_k) - h_k(\gamma_k)/2}$
Optimization problem: find the variational parameters $\gamma$ that give the tightest fit.
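
For a problem this small, $Q$ can be formed explicitly. A sketch, assuming the toy setup above and an arbitrary fixed $\gamma$ (the slides' point is precisely that $\gamma$ must be optimized):

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma = 200, 0.1
H = np.eye(N)                              # toy denoising setup as before
G = np.diff(np.eye(N), axis=0)             # K = N - 1 sparsity directions
x_true = np.repeat(rng.standard_normal(5), N // 5)
y = H @ x_true + sigma * rng.standard_normal(N)

gamma = np.full(G.shape[0], 0.1)           # some fixed variational parameters

# Q(x|y) = N(x_hat, A^-1), A = sigma^-2 H^T H + G^T Gamma^-1 G, b = sigma^-2 H^T y
A = H.T @ H / sigma**2 + G.T @ (G / gamma[:, None])
b = H.T @ y / sigma**2
x_hat = np.linalg.solve(A, b)

# Exact marginal variances along the sparsity directions: z_k = g_k^T A^-1 g_k.
Sigma = np.linalg.inv(A)
z = np.einsum("kn,nm,km->k", G, Sigma, G)
print(z[:5])
```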

10 Variational Bounding: Double-Loop Algorithm
Outer Loop: Variance Computation
Compute $z = \mathrm{diag}(G A^{-1} G^T)$, i.e., the vector of variances $z_k = \mathrm{Var}_Q(s_k \mid y)$ along the sparsity directions $s_k = g_k^T x$.
Inner Loop: Smoothed Estimation
Obtain the variational mean $\hat{x}_Q = \arg\min_x \phi_Q(x; z)$, where
$\phi_Q(x; z) = \sigma^{-2} \|y - Hx\|_2^2 - 2 \sum_{k=1}^K \log t\big((s_k^2 + z_k)^{1/2}\big)$
Update the variational parameters:
$\gamma_k^{-1} = -2\, \frac{d \log t(\sqrt{v})}{dv} \Big|_{v = \hat{s}_k^2 + z_k}$
Convex if the standard MAP problem is convex. See [Seeger & Nickisch, 11]. A sketch of the full loop follows below.
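
A dense-algebra sketch of the double loop for the Laplacian potential, where the update above reduces to $\gamma_k = \sqrt{\hat{s}_k^2 + z_k}/\tau$; iterating that update together with a re-solve for the mean is one simple way to carry out the inner minimization, and exact inversion stands in for the large-scale variance estimators of the next slides:

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma, tau = 200, 0.1, 10.0
H = np.eye(N)                                 # toy denoising setup as before
G = np.diff(np.eye(N), axis=0)
x_true = np.repeat(rng.standard_normal(5), N // 5)
y = H @ x_true + sigma * rng.standard_normal(N)
b = H.T @ y / sigma**2

def make_A(gamma):
    return H.T @ H / sigma**2 + G.T @ (G / gamma[:, None])

gamma = np.ones(G.shape[0])
x_hat = np.zeros(N)
for outer in range(5):
    # Outer loop: variances z_k = Var_Q(s_k | y) = diag(G A^-1 G^T),
    # computed exactly here; slides 11-14 are about doing this at scale.
    Sigma = np.linalg.inv(make_A(gamma))
    z = np.einsum("kn,nm,km->k", G, Sigma, G)
    for inner in range(15):
        # Inner loop: gamma_k = sqrt(s_hat_k^2 + z_k)/tau for the Laplacian,
        # then re-solve A(gamma) x_hat = b for the variational mean.
        s_hat = G @ x_hat
        gamma = np.sqrt(s_hat**2 + z) / tau
        x_hat = np.linalg.solve(make_A(gamma), b)

print("reconstruction error:", np.linalg.norm(x_hat - x_true))
```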

11-12 Variance Computation
Goal: estimate elements of $\Sigma = A^{-1}$, where $A = \sigma^{-2} H^T H + G^T \Gamma^{-1} G$.
- Direct inversion is hopeless ($N \approx 10^6$).
- Accurate and fast techniques exist for problems of special structure [Malioutov et al., 08].
- Lanczos iteration (only MVM required) [Schneider & Willsky, 01], [Seeger & Nickisch, 11].
This work: Monte-Carlo variance estimation.

13 Gaussian Sampling by Local Perturbations
[Factor graphs: the model's factors $g_1, \dots, g_K$ and $h_1, \dots, h_M$ over variables $x_1, \dots, x_N$, shown before and after local noise injection]
Gaussian MRF sampling by local noise injection:
1. Local perturbations: $\tilde{y} \sim N(0, \sigma^2 I)$ and $\beta \sim N(0, \Gamma^{-1})$
2. Gaussian mode: solve $A \tilde{x} = \sigma^{-2} H^T \tilde{y} + G^T \beta$
Then $\tilde{x} \sim N(0, A^{-1})$, where $A = \sigma^{-2} H^T H + G^T \Gamma^{-1} G$. [Papandreou & Yuille, 10]
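
A direct sketch of the perturb-and-solve sampler on the toy model, with an empirical check that the sample covariance matches $A^{-1}$; at large scale the solve would use conjugate gradients instead of a dense factorization:

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma = 200, 0.1
H = np.eye(N)                               # toy denoising setup as before
G = np.diff(np.eye(N), axis=0)
gamma = np.full(G.shape[0], 0.1)
A = H.T @ H / sigma**2 + G.T @ (G / gamma[:, None])

def sample():
    # 1. Local perturbations: y~ ~ N(0, sigma^2 I), beta ~ N(0, Gamma^-1)
    y_t = sigma * rng.standard_normal(H.shape[0])
    beta = rng.standard_normal(G.shape[0]) / np.sqrt(gamma)
    # 2. Gaussian mode: solve A x~ = sigma^-2 H^T y~ + G^T beta
    return np.linalg.solve(A, H.T @ y_t / sigma**2 + G.T @ beta)

samples = np.stack([sample() for _ in range(2000)])
emp = samples.var(axis=0)
exact = np.diag(np.linalg.inv(A))
print("max relative error of variances:", np.max(np.abs(emp - exact) / exact))
```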

14 Monte-Carlo Variance Estimation
Let $\tilde{x}_i \sim N(0, A^{-1})$, with $i = 1, \dots, N_s$. General-purpose Monte-Carlo variance estimators:
$\hat{\Sigma} = \frac{1}{N_s} \sum_{i=1}^{N_s} \tilde{x}_i \tilde{x}_i^T, \qquad \hat{z}_k = \frac{1}{N_s} \sum_{i=1}^{N_s} s_{k,i}^2,$
where $s_{k,i} = g_k^T \tilde{x}_i$.
Properties
- Marginal distribution of the estimates: $\hat{z}_k / z_k \sim \frac{1}{N_s} \chi^2(N_s)$.
- Unbiased: $E\{\hat{z}_k\} = z_k$.
- Relative error is $r = \mathrm{std}(\hat{z}_k)/z_k = \sqrt{2/N_s}$.
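
A sketch verifying the $\sqrt{2/N_s}$ relative-error prediction; for brevity the samples are drawn through a Cholesky factor of $A^{-1}$ rather than the perturbation sampler, which is statistically equivalent here:

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma = 200, 0.1
H = np.eye(N)                               # toy denoising setup as before
G = np.diff(np.eye(N), axis=0)
gamma = np.full(G.shape[0], 0.1)
A = H.T @ H / sigma**2 + G.T @ (G / gamma[:, None])

Sigma = np.linalg.inv(A)                    # exact, small problem only
z = np.einsum("kn,nm,km->k", G, Sigma, G)   # exact z_k for reference
L = np.linalg.cholesky(Sigma)               # x = L u, u ~ N(0,I)  =>  x ~ N(0, A^-1)

Ns = 20
z_hat = ((G @ (L @ rng.standard_normal((N, Ns))))**2).mean(axis=1)
print("unbiasedness check, mean(z_hat/z):", np.mean(z_hat / z))

# Repeat the experiment to check std(z_hat_k / z_k) = sqrt(2/Ns) for one k.
trials = [((G[0] @ (L @ rng.standard_normal((N, Ns))))**2).mean() / z[0]
          for _ in range(500)]
print(f"empirical std {np.std(trials):.3f} vs theory {np.sqrt(2 / Ns):.3f}")
```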

15 Monte-Carlo vs. Lanczos Variance Estimates
[Plot: variance estimates $\hat{z}_k$ against exact values $z_k$, comparing the Monte-Carlo estimator (SAMPLE) and LANCZOS with EXACT.]

16 Application: Image Deconvolution
Measurement equation: $y = k \ast x = Hx$.
- Non-blind deconvolution (known blur kernel $k$).
- Blind deconvolution (unknown blur kernel $k$).

17 Blind Image Deconvolution
Blur kernel recovery by maximum likelihood. ML objective:
$\hat{k} = \arg\max_k P(y; k) = \arg\max_k \int P(y, x; k)\, dx.$
Variational ML: $\hat{k} = \arg\max_k Q(y; k)$.
Contrast with $\arg\max_k \big(\max_x P(x, y; k)\big)$, the joint MAP over $x$ and $k$. [Fergus et al., 06], [Levin et al., 09].

18 Variational EM for Maximum Likelihood
Find $k$ by maximizing $Q(y; k)$ [Girolami, 01], [Levin et al., 11].
E-Step: given the current kernel estimate $k_t$, do variational Bayesian inference, i.e., fit $Q(x \mid y; k_t)$.
M-Step: maximize w.r.t. $k$ the expected complete log-likelihood $E_{Q(x \mid y; k_t)}\{\log Q(x, y; k)\}$. Equivalently, minimize w.r.t. $k$
$E_{Q(x \mid y; k_t)}\left\{ \tfrac{1}{2} \|y - Hx\|^2 \right\} = \tfrac{1}{2} \mathrm{tr}\big((H^T H)(A^{-1} + \hat{x}\hat{x}^T)\big) - y^T H \hat{x} + \mathrm{const} = \tfrac{1}{2} k^T R_{xx} k - r_{xy}^T k + \mathrm{const}$
The expected moments $R_{xx}$ are estimated by Gaussian sampling, as sketched below.
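
A 1-D sketch of the M-step quadratic, assuming posterior samples are available (here crude stand-ins, not a real E-step). The helper scipy.linalg.convolution_matrix builds $C(x)$ with $C(x)\, k = k \ast x$, so $R_{xx}$ and $r_{xy}$ become sample averages; the nonnegativity/normalization projection of the kernel is my own assumption:

```python
import numpy as np
from scipy.linalg import convolution_matrix

rng = np.random.default_rng(0)
N, L = 200, 5                                  # signal and kernel lengths
x_true = np.repeat(rng.standard_normal(5), N // 5)
k_true = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
y = np.convolve(x_true, k_true, mode="same") + 0.01 * rng.standard_normal(N)

# Stand-in posterior samples x_i ~ Q(x|y; k_t): noisy copies of x_true.
samples = [x_true + 0.05 * rng.standard_normal(N) for _ in range(50)]

# Expected moments R_xx = E{C(x)^T C(x)}, r_xy = E{C(x)^T y},
# where C(x) @ k == np.convolve(x, k, mode="same").
R_xx = np.zeros((L, L))
r_xy = np.zeros(L)
for x in samples:
    C = convolution_matrix(x, L, mode="same")
    R_xx += C.T @ C / len(samples)
    r_xy += C.T @ y / len(samples)

# M-step: minimize 1/2 k^T R_xx k - r_xy^T k, then keep the kernel
# nonnegative and normalized to sum to one.
k = np.linalg.solve(R_xx, r_xy)
k = np.clip(k, 0, None)
k /= k.sum()
print(np.round(k, 3))
```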

19-20 Summary of Computational Primitives
Smoothed estimation
Obtain the variational mean $\hat{x}_Q = \arg\min_x \phi_Q(x; z)$, where
$\phi_Q(x; z) = \sigma^{-2} \|y - Hx\|_2^2 - 2 \sum_{k=1}^K \log t\big((s_k^2 + z_k)^{1/2}\big)$
Inner loop of variational inference.
Sparse linear system
$Ax = b$, where $A = \sigma^{-2} H^T H + G^T \Gamma^{-1} G$. Used to estimate variances in the outer loop of variational inference and the moments $R_{xx}$ in blind image deconvolution.
Solve with preconditioned conjugate gradients.

21 Efficient Circulant Preconditioning
Approximate $A = \sigma^{-2} H^T H + G^T \Gamma^{-1} G$ with
$P = \sigma^{-2} H^T H + \bar{\gamma}^{-1} G^T G$, where $\bar{\gamma}^{-1} = (1/K) \sum_{k=1}^K \gamma_k^{-1}$ [Lefkimmiatis et al., 12].
Properties
- Thanks to the stationarity of $P$, DFT techniques apply.
- Optimality: $P = \arg\min_{X \in \mathcal{C}} \|X - A\|$ over the class $\mathcal{C}$ of stationary (circulant) matrices.
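
A 1-D periodic sketch of the preconditioner, assuming $H$ and $G$ are circular convolutions so that $P$ is diagonalized by the DFT; $P^{-1}$ is applied per Fourier coefficient and passed to SciPy's CG as the preconditioner $M$:

```python
import numpy as np
from scipy.linalg import circulant
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
N, sigma = 256, 0.1

h = np.zeros(N)
h[:3] = [0.25, 0.5, 0.25]                     # periodic blur filter
g = np.zeros(N)
g[0], g[-1] = 1.0, -1.0                       # periodic first difference
H, G = circulant(h), circulant(g)             # circular-convolution matrices

gamma = rng.uniform(0.05, 0.5, size=N)        # non-uniform variational params
A = H.T @ H / sigma**2 + G.T @ (G / gamma[:, None])
b = H.T @ rng.standard_normal(N) / sigma**2

# P = sigma^-2 H^T H + mean(1/gamma) G^T G is circulant: diagonal in the
# Fourier basis with eigenvalues p_hat, so P^-1 r costs two FFTs and a division.
p_hat = (np.abs(np.fft.fft(h))**2 / sigma**2
         + (1 / gamma).mean() * np.abs(np.fft.fft(g))**2)
Pinv = LinearOperator(
    (N, N),
    matvec=lambda r: np.real(np.fft.ifft(np.fft.fft(np.ravel(r)) / p_hat)))

x_pcg, _ = cg(A, b, M=Pinv, maxiter=30)
x_cg, _ = cg(A, b, maxiter=30)
print("PCG residual:", np.linalg.norm(A @ x_pcg - b))
print(" CG residual:", np.linalg.norm(A @ x_cg - b))
```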

22 Effect of Preconditioner
[Plot: residual versus iteration count for plain CG and preconditioned CG (PCG).]

23 Non-Blind Image Deblurring Example
[Figure panels: ground truth; blurred input (PSNR = 22.57 dB); our result (PSNR = 31.93 dB); VB standard-deviation map.]

24 Blind Image Deblurring Example
[Figure panels: ground truth; blurred input (PSNR = 22.57 dB); our result (PSNR = 27.54 dB); recovered kernel.]

25-26 Summary
Main Points
- Variational Bayesian inference using standard optimization primitives.
- Scalable to large-scale problems.
- Open question: Monte-Carlo or variational?
Our software is integrated in the glm-ie open source toolbox.
THANK YOU!
