
1 Deblurring Jupiter (sampling in GLIP faster than regularized inversion)
Colin Fox, Richard A. Norton, J. Andrés Christen

2 Topics...
- Backstory (?)
- Sampling in linear-Gaussian hierarchical models (posterior not Gaussian)
- Don't do MCMC
- Don't do Gibbs
- Don't do one-block
- Do do marginal then conditional sampling: marginalize over interesting unknowns; asymptotic cost is one linear solve per independent sample from the posterior
- THM :: MCMC over the posterior marginal for hyperparameters can be slick-as, then RTO

3 Linear Gaussian inverse problem :: image deblurring
A blurry gray-scale photograph of Jupiter in the methane band (780nm). The inverse problem is to recover the true unblurred image. We use the satellite (upper right) as the PSF k, so this is semi-blind deconvolution.
y = k ∗ x + η = Ax + η
In the continuous setting this is the prototypical ill-posed inverse problem: k is square integrable, so A is Hilbert-Schmidt, hence compact. The discrete problem reflects the ill-posedness.
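The forward model is simple to prototype. A minimal sketch in Python/NumPy, assuming periodic (circular) blur; the image, the 3x3 box PSF, and the noise precision are illustrative stand-ins, not values from the talk:

```python
import numpy as np

def blur(x, k):
    # Circular convolution Ax = k * x via FFT (image periodic at boundaries).
    return np.real(np.fft.ifft2(np.fft.fft2(k, s=x.shape) * np.fft.fft2(x)))

rng = np.random.default_rng(0)
x_true = rng.random((64, 64))              # stand-in for the true image
k = np.zeros((64, 64)); k[:3, :3] = 1 / 9  # stand-in 3x3 box-car PSF
gamma = 1e4                                # noise precision
y = blur(x_true, k) + rng.normal(0.0, gamma ** -0.5, x_true.shape)
```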

4 Regularized Fourier deconvolution
Evaluate the convolution as a circular convolution: no need to zero-pad, as the image is black (hence periodic) at the boundaries, giving efficient evaluation by FFT [Numerical Recipes, section 13.1]:
Ax = x ∗ k
Introduce a regularizing functional and a regularizing parameter λ. The solution x̂_λ satisfies
x̂_λ = arg min_x { ||Ax - y||^2 + λ x^T L x }
or the normal (generalized deconvolution) equations
(A^T A + λL) x̂_λ = A^T y
We take L to be the graph Laplacian on the nearest-neighbour pixel lattice, with periodic boundaries [Bardsley SISC 2014].
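Under periodic boundaries both A and L are diagonalized by the 2-D DFT, so the generalized deconvolution equations solve in O(n log n). A sketch (the function names are ours; `blur` is from the previous snippet):

```python
import numpy as np

def laplacian_symbol(shape):
    # Eigenvalues of the nearest-neighbour graph Laplacian with periodic
    # boundaries: the FFT of its 5-point stencil kernel.
    ker = np.zeros(shape)
    ker[0, 0] = 4.0
    ker[0, 1] = ker[0, -1] = ker[1, 0] = ker[-1, 0] = -1.0
    return np.real(np.fft.fft2(ker))

def tikhonov_deconvolve(y, k, lam):
    # Solve (A^T A + lam L) x = A^T y in the Fourier domain.
    a = np.fft.fft2(k, s=y.shape)        # eigenvalues of A
    ell = laplacian_symbol(y.shape)      # eigenvalues of L
    xhat = np.conj(a) * np.fft.fft2(y) / (np.abs(a) ** 2 + lam * ell)
    return np.real(np.fft.ifft2(xhat))
```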

5 L-curve & deblurred image
Choose the regularizing parameter by the L-curve criterion [P. C. Hansen (1992)]: plot the regularity x^T L x against the misfit ||y - Ax|| over a range of λ and take the value at the corner.
Determining λ requires 200 iterations/solves: 0.507 s for 200 solves (Matlab R2012b, Lenovo X230, Intel Core i5).
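Tracing the curve costs one Fourier solve per λ; a sketch using `tikhonov_deconvolve`, `blur`, `laplacian_symbol`, `y`, and `k` from the snippets above (the λ grid is illustrative):

```python
import numpy as np

lambdas = np.logspace(-6, 2, 200)
misfit, regularity = [], []
for lam in lambdas:
    x = tikhonov_deconvolve(y, k, lam)
    misfit.append(np.linalg.norm(blur(x, k) - y))
    # x^T L x evaluated spectrally (Parseval)
    xf = np.fft.fft2(x)
    regularity.append(np.sum(laplacian_symbol(x.shape) * np.abs(xf) ** 2).real / x.size)
# choose lam at the corner of the log-log plot of (misfit, regularity)
```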

6 Bayesian hierarchical model
All unknowns are treated as random variables; we use the same components as the regularization:
y | x, θ ∼ N(Ax, (γI)^{-1})   (likelihood)
x | θ ∼ N(µ, (δL)^{-1})   (prior)
θ = (γ, δ) ∼ π(θ)   (hyperprior)
where γ is the precision of the measurements and δ is the lumping constant in the true image.
This is a common model in statistics: y = observed data, x = latent field, θ = hyperparameters.
Since
π(y | x, θ) = (γ/2π)^{n/2} exp{ -(γ/2) ||Ax - y||^2 } and π(x | θ) = (δ/2π)^{n/2} (det L)^{1/2} exp{ -(δ/2) x^T L x },
Bayes' rule gives
π(x | y, θ) ∝ exp{ -(γ/2) ( ||Ax - y||^2 + (δ/γ) x^T L x ) }
so x_MAP = x̂_λ when λ = δ/γ (though expectations average over θ).

7 Posterior inference
The focus of inference is the posterior distribution
π(x, θ | y) = π(y | x, θ) π(x, θ) / π(y)
Solutions and uncertainties are posterior expectations of some function φ of x,
E[φ(x)] = ∫ φ(x) π(x, θ | y) dx dθ
which implicitly averages over the nuisance parameter θ.
When (x, θ)^{(1)}, ..., (x, θ)^{(N)} ∼ π(x, θ | y) are iterates of an ergodic Markov chain,
E[φ(x)] ≈ (1/N) Σ_{i=1}^{N} φ(x^{(i)})
with convergence guaranteed by a CLT [Kipnis & Varadhan, Central limit theorem for additive functionals of reversible Markov processes and applications to simple exclusions, Comm. Math. Phys. 1986].

8 Random-walk MCMC over the posterior
Most common is to form the posterior distribution
π(x, θ | y) = π(y | x, θ) π(x | θ) π(θ) / π(y)
The normalizing constant π(y) = ∫ π(y, x, θ) dx dθ is typically not available.
Most applications perform MCMC with Metropolis-Hastings dynamics using random-walk proposals (e.g. AM, MALA, HMC). This generates a Markov chain that converges in distribution to the posterior.
Convergence can be very slow due to correlations within the latent field x, and between x and the hyperparameters θ. Often 10^5 iterations are needed for suitable convergence to generate a sample for Monte Carlo estimates.
Easy to implement, but gives MCMC its reputation for being very computationally expensive. A generic random-walk Metropolis step is sketched below.
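For reference only, one step of this baseline scheme over the joint state (x, θ); `log_post` is an assumed function returning the unnormalized log posterior, and the step sizes are illustrative:

```python
import numpy as np

def rw_metropolis_step(x, theta, log_post, rng, step_x=0.01, step_t=0.1):
    # Joint random-walk proposal, accepted with the Metropolis-Hastings ratio.
    x_prop = x + step_x * rng.standard_normal(x.shape)
    t_prop = theta + step_t * rng.standard_normal(theta.shape)
    if np.log(rng.random()) < log_post(x_prop, t_prop) - log_post(x, theta):
        return x_prop, t_prop
    return x, theta
```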

9 Some thoughts on Monte Carlo
John von Neumann (1951): "Any one who considers [Monte Carlo methods] is... in a state of sin."
Alan Sokal (1996): "Monte Carlo is an extremely bad method; it should be used only when all alternative methods are worse."
Hammersley & Handscomb (1964): "... it will usually pay to scrutinize each part of a Monte Carlo experiment to see whether that part cannot be replaced by exact theoretical analysis..."

10 Marginal then conditional sampling
First sample from the marginal posterior distribution over the hyperparameters θ,
π(θ | y) = ∫ π(x, θ | y) dx
then from the full conditional distribution over x.
Algorithm 1: MTC sampling, from the marginal for θ then the full conditional for x
- draw θ ∼ π(θ | y)
- draw x ∼ π(x | y, θ)
Lemma 1. Algorithm 1 generates a sample from the posterior distribution, i.e., (x, θ) ∼ π(x, θ | y).
Proof. The density function over x and θ is π(x | y, θ) π(θ | y) = π(x, θ | y).
Independent θ gives independent (x, θ). A code skeleton of Algorithm 1 follows.
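Algorithm 1 as a skeleton, with the two draws abstracted into functions that the following slides make concrete:

```python
def mtc_sample(draw_theta_marginal, draw_x_conditional, rng):
    # Marginal then conditional: by Lemma 1 this is an exact draw
    # from the joint posterior pi(x, theta | y).
    theta = draw_theta_marginal(rng)      # theta ~ pi(theta | y)
    x = draw_x_conditional(theta, rng)    # x ~ pi(x | y, theta)
    return x, theta
```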

11 Marginal posterior for θ
Lemma 2.
π(θ | y) = π(y | θ, x) π(x | θ) π(θ) / ( π(x | θ, y) π(y) )
Proof. π(x, y, θ) = π(x | θ, y) π(y | θ) π(θ) and π(x, y, θ) = π(y | x, θ) π(x | θ) π(θ). Equate the two factorizations, divide by π(x | θ, y), and apply Bayes' rule; since π(y) ≠ 0, the result follows.
Can eliminate x to give (general Gaussian-linear model: Σ = noise covariance, Q = prior precision)
π(θ | y) ∝ sqrt( det(Σ^{-1}) det(Q) / det(Q + A^T Σ^{-1} A) )
  × exp{ -(1/2) (y - Aµ)^T Σ^{-1} A [ (A^T Σ^{-1} A)^{-1} - (A^T Σ^{-1} A + Q)^{-1} ] A^T Σ^{-1} (y - Aµ) } π(θ).   (1)
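A dense-algebra sketch of evaluating the log of (1) up to an additive constant, for problems small enough to form the matrices (and with A^T Σ^{-1} A invertible); `Sigma_inv`, `Q`, and `log_prior` are assumed callables supplying the θ-dependent pieces:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def log_marginal_dense(theta, y, A, mu, Sigma_inv, Q, log_prior):
    # log pi(theta | y) from (1), up to a theta-independent constant.
    Si = Sigma_inv(theta)
    Qp = Q(theta)
    M = A.T @ Si @ A                       # A^T Sigma^{-1} A
    r = A.T @ Si @ (y - A @ mu)
    quad = r @ (cho_solve(cho_factor(M), r) - cho_solve(cho_factor(M + Qp), r))
    logdets = (np.linalg.slogdet(Si)[1] + np.linalg.slogdet(Qp)[1]
               - np.linalg.slogdet(M + Qp)[1])
    return 0.5 * logdets - 0.5 * quad + log_prior(theta)
```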

12 Full conditional for x
For the linear-Gaussian hierarchical model the full conditional for x is the multivariate normal
x | y, θ ∼ N( µ_{x|y,θ}, Q_{x|y,θ}^{-1} )
µ_{x|y,θ} = µ + ( Q + A^T Σ^{-1} A )^{-1} A^T Σ^{-1} (y - Aµ)
Q_{x|y,θ} = Q + A^T Σ^{-1} A
For the Jupiter example
x | θ, y ∼ N( (A^T A + (δ/γ)L)^{-1} A^T y, (γA^T A + δL)^{-1} )
An independent sample is computed in O(n log n) operations by solving (the generalized deconvolution equations)
( γA^T A + δL ) x = γA^T y + w
where w = v_1 + v_2 with independent v_1 ∼ N(0, γA^T A) and v_2 ∼ N(0, δL).
[Oliver, He & Reynolds 1996; Wikle, Milliff, Nychka & Berliner 2001; Lalanne, Prévost & Chavel 2001; Tan, Li & Stoica 2010; Orieux, Féron & Giovannelli 2012; Bardsley 2012]
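In the Jupiter example every operator is circulant, so this draw is one more Fourier solve. A sketch reusing `laplacian_symbol` from above (filtering white noise gives v_1 and v_2 their stated covariances):

```python
import numpy as np

def sample_conditional(y, k, gamma, delta, rng):
    # Draw x | theta, y by solving (gamma A^T A + delta L) x = gamma A^T y + w
    # with w = v1 + v2, v1 ~ N(0, gamma A^T A), v2 ~ N(0, delta L).
    a = np.fft.fft2(k, s=y.shape)                 # eigenvalues of A
    ell = laplacian_symbol(y.shape)               # eigenvalues of L (>= 0)
    prec = gamma * np.abs(a) ** 2 + delta * ell   # posterior precision eigenvalues
    v1 = np.sqrt(gamma) * np.conj(a) * np.fft.fft2(rng.standard_normal(y.shape))
    v2 = np.sqrt(delta * ell) * np.fft.fft2(rng.standard_normal(y.shape))
    rhs = gamma * np.conj(a) * np.fft.fft2(y) + v1 + v2
    return np.real(np.fft.ifft2(rhs / prec))
```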

13 Block Gibbs sampling
Cycles through sampling from the full conditional distributions (aka Glauber dynamics, heat bath).
Algorithm 2: Gibbs sampling algorithm with blocking of the latent field; at state x, θ = (γ, δ)
- draw x | γ, δ, y ∼ N( (A^T A + (δ/γ)L)^{-1} A^T y, (γA^T A + δL)^{-1} )
- draw γ | x, δ, y ∼ Γ( n/2 + α_γ, (1/2)||Ax - y||^2 + β_γ )
- draw δ | x, γ, y ∼ Γ( n/2 + α_δ, (1/2) x^T L x + β_δ )
Most of the computational cost is the O(n log n) draw from the large Gaussian latent field.
Conjugate priors π(γ) = Γ(α_γ, β_γ) and π(δ) = Γ(α_δ, β_δ) make the full conditionals available (shape and rate parameters chosen, as in Bardsley, to be uninformative). A sketch of the full loop follows.
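A sketch of Algorithm 2 built from `sample_conditional`, `blur`, and `laplacian_symbol` above; the Gamma hyperparameters are illustrative weakly-informative choices, not the talk's:

```python
import numpy as np

def block_gibbs(y, k, n_iter, rng, a_g=1.0, b_g=1e-4, a_d=1.0, b_d=1e-4):
    n = y.size
    gamma, delta, chain = 1.0, 1.0, []
    for _ in range(n_iter):
        x = sample_conditional(y, k, gamma, delta, rng)
        res = blur(x, k) - y
        # NumPy's gamma takes a scale, i.e. 1/rate.
        gamma = rng.gamma(n / 2 + a_g, 1.0 / (0.5 * np.sum(res ** 2) + b_g))
        xLx = np.sum(laplacian_symbol(y.shape) * np.abs(np.fft.fft2(x)) ** 2).real / n
        delta = rng.gamma(n / 2 + a_d, 1.0 / (0.5 * xLx + b_d))
        chain.append((gamma, delta))
    return chain
```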

14 One-block sampler
Algorithm 3: One-block algorithm; at state (x, θ)
- draw θ' ∼ q(θ' | θ)
- draw x' ∼ π(x | θ', y)
- accept (x', θ') w.p. α((x, θ) → (x', θ')) = min{ 1, [π(x', θ' | y) π(x | θ, y) q(θ | θ')] / [π(x, θ | y) π(x' | θ', y) q(θ' | θ)] }, otherwise reject
The transition kernel for the hyperparameter θ targets π(θ | y), since
[π(x', θ' | y) π(x | θ, y) q(θ | θ')] / [π(x, θ | y) π(x' | θ', y) q(θ' | θ)] = [π(θ' | y) q(θ | θ')] / [π(θ | y) q(θ' | θ)].
Makes larger steps in the marginal for θ, rather than in the conditional θ | x.
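One one-block transition in code, with a random walk on θ = (log γ, log δ); `log_marginal` is an assumed evaluator of log π(θ | y) in those coordinates (including the change-of-variables Jacobian), e.g. built from the previous slides. Because the acceptance ratio telescopes, only the marginal enters:

```python
import numpy as np

def one_block_step(theta, x, y, k, log_marginal, rng, step=0.1):
    # theta = (log gamma, log delta); symmetric proposal, so q cancels.
    theta_prop = theta + step * rng.standard_normal(2)
    x_prop = sample_conditional(y, k, np.exp(theta_prop[0]), np.exp(theta_prop[1]), rng)
    if np.log(rng.random()) < log_marginal(theta_prop) - log_marginal(theta):
        return theta_prop, x_prop
    return theta, x      # on reject, keep the whole state (x, theta)
```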

15 Marginal then conditional (MTC) sampler
Implement the MTC sampler by performing MH MCMC on π(θ | y) to get an (effectively) independent sample θ ∼ π(θ | y), then draw x ∼ π(x | θ, y) to get an (effectively) independent sample (x, θ) from the full posterior distribution. Evaluate the MH ratio using π(θ | y).
Algorithm 4: Metropolis-Hastings algorithm on π(θ | y); at state θ
- draw θ' ∼ q(θ' | θ)
- accept θ' w.p. α(θ → θ') = min{ 1, [π(θ' | y) q(θ | θ')] / [π(θ | y) q(θ' | θ)] }, otherwise reject
General MCMC only over the low-dimensional space of hyperparameters; high correlations between θ and x are irrelevant.
The general linear-Gaussian model requires ratios of determinants of Σ^{-1}, Q and Q + A^T Σ^{-1} A, and differences of the arguments of the exponential.
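Algorithm 4 with the closing conditional draw, under the same assumptions (our `log_marginal` works in θ = (log γ, log δ) coordinates):

```python
import numpy as np

def mtc_mh(y, k, log_marginal, n_iter, rng, step=0.1):
    # MH over the 2-d marginal pi(theta | y); one conditional x-draw at the end.
    # Thinning the theta chain to about one IACT gives effectively independent
    # (x, theta) posterior samples, each costing one solve.
    theta = np.zeros(2)
    lp = log_marginal(theta)
    chain = []
    for _ in range(n_iter):
        theta_prop = theta + step * rng.standard_normal(2)
        lp_prop = log_marginal(theta_prop)
        if np.log(rng.random()) < lp_prop - lp:
            theta, lp = theta_prop, lp_prop
        chain.append(theta.copy())
    x = sample_conditional(y, k, np.exp(theta[0]), np.exp(theta[1]), rng)
    return chain, x
```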

16 O(1) evaluation of the ratio of determinants
For our Jupiter example the marginal posterior for θ simplifies to
π(θ | y) ∝ δ^{n/2} exp( -(1/2) g(λ) - (γ/2) f(λ) ) π(θ)
where λ = δ/γ,
f(λ) = (A^T y)^T ( (A^T A)^{-1} - (A^T A + λL)^{-1} ) (A^T y), and g(λ) = log det(A^T A + λL).
[Plots: f(λ) and g(λ) as functions of λ.]
A range of techniques can be used to give accurate approximation and efficient computation; one FFT-based route is sketched below.
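For the circulant Jupiter operators, f and g reduce to sums over FFT eigenvalues: one O(n log n) precomputation, then cheap evaluation for every λ thereafter. A sketch (assumes A is invertible, i.e. A^T A has no zero eigenvalues):

```python
import numpy as np

def marginal_terms(y, k):
    a2 = np.abs(np.fft.fft2(k, s=y.shape)) ** 2          # eigenvalues of A^T A
    ell = laplacian_symbol(y.shape)                      # eigenvalues of L
    # |DFT of A^T y|^2 / n, so the quadratic form becomes a plain sum (Parseval)
    b2 = np.abs(np.conj(np.fft.fft2(k, s=y.shape)) * np.fft.fft2(y)) ** 2 / y.size
    f = lambda lam: np.sum(b2 * (1.0 / a2 - 1.0 / (a2 + lam * ell)))
    g = lambda lam: np.sum(np.log(a2 + lam * ell))       # log det(A^T A + lam L)
    return f, g
```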

17 Autocorrelation
[Figure: autocorrelation of λ = δ/γ against lag, for all four sampling algorithms: block Gibbs, one-block, MTC option 1, MTC option 2.]

18 Posterior sample and histograms
[Left: the image component of a posterior sample, using the MTC Option 2 algorithm. Right: marginal posterior histograms for γ (the noise precision), δ (the prior precision), and λ = δ/γ (the regularization parameter).]

19 Computational efficiency
Computing cost per effective sample:
CCES = τ T / N
where T is the total (on-line) compute time, N is the length of the chain, and the integrated autocorrelation time
τ = 1 + 2 Σ_{k=1}^∞ ρ_k
is the length of chain with the same variance-reducing power as one independent sample. A sketch of the computation follows the table.
Table 1: Compute times and CCES (in seconds) for 10^4 steps of the sampling algorithms. Columns: burn-in, total time, acceptance rate, CCES for λ. Rows: Block Gibbs, One-block, MTC Option 1, MTC Option 2.
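Both diagnostics are a few lines to compute from a scalar chain; a sketch with a simple truncation of the autocorrelation sum (the cutoff rule is ours):

```python
import numpy as np

def iact(chain, max_lag=None):
    # Integrated autocorrelation time tau = 1 + 2 * sum_k rho_k,
    # truncated at max_lag (default: a tenth of the chain).
    z = np.asarray(chain, dtype=float) - np.mean(chain)
    acf = np.correlate(z, z, mode="full")[len(z) - 1:]
    acf /= acf[0]
    cut = max_lag or len(z) // 10
    return 1.0 + 2.0 * np.sum(acf[1:cut])

def cces(chain, total_time):
    # Computing cost per effective sample: CCES = tau * T / N.
    return iact(chain) * total_time / len(chain)
```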

20 Integrated autocorrelation times
Table 2: Integrated autocorrelation times (in iterations) of various statistics of interest. Columns: γ, δ, λ = δ/γ. Rows: Block Gibbs, One-block, MTC Option 1, MTC Option 2.

21 Compute time for images
Table 3: Compute time (seconds) for a regularized image or an independent image sample. Columns: time to λ, time to x, total time. Rows: Regularization, Block Gibbs, One-block, MTC Option 1, MTC Option 2.

22 Conclusions
- We factorized the problem into a large Gaussian/linear part and a small non-Gaussian/nonlinear part
- The marginal posterior over hyperparameters allows selection/sampling before deconvolution
- The small part is independent of discretization and data size... hence we can perform inference over function space without approximation
- A trace-class prior covariance potentially gives an analytic ratio of determinants, e.g. ... the Laplacian in 1 dimension ... the MAP is not in the support of the posterior

23 Thanks also to Dan Simpson, Håvard Rue
Thank You
