Hierarchical prior models for scalable Bayesian Tomography


1 Hierarchical prior models for scalable Bayesian Tomography
Ali Mohammad-Djafari
Laboratoire des Signaux et Systèmes (L2S), UMR8506 CNRS-CentraleSupélec-Univ Paris-Sud, Gif-sur-Yvette, France
Invited talk at WSTL, January 25-27, 2016, Szeged, Hungary

2 Contents
1. Computed Tomography, or how to see inside a body
2. Algebraic and regularization methods
3. Basic Bayesian approach
4. Two main steps: choosing an appropriate prior model; doing the computations efficiently
5. Unsupervised Bayesian methods: JMAP, Marginalization, VBA
6. Hierarchical prior modelling: sparsity enforcing through Student-t and IGSM; Gauss-Markov-Potts models
7. Computational tools: JMAP, Gibbs sampling (MCMC), VBA
8. Case study: image reconstruction with only two projections
9. Implementation issues: main GPU implementation steps (forward and back projections); multi-resolution implementation
10. Conclusions

3 Computed Tomography, or how to see inside a body
f(x, y) is a section of a real 3D body f(x, y, z); g_φ(r) is a line of the observed radiography g_φ(r, z).
Forward model: line integrals (Radon transform)
g_φ(r) = ∫_{L_{r,φ}} f(x, y) dl + ε_φ(r) = ∫∫ f(x, y) δ(r − x cos φ − y sin φ) dx dy + ε_φ(r)
Inverse problem (image reconstruction): given the forward model H (Radon transform) and a set of data g_{φ_i}(r), i = 1, ..., M, find f(x, y).

4 2D and 3D Computed Tomography
3D: g_φ(r_1, r_2) = ∫_{L_{r_1,r_2,φ}} f(x, y, z) dl
2D: g_φ(r) = ∫_{L_{r,φ}} f(x, y) dl
Forward problem: f(x, y) or f(x, y, z) → g_φ(r) or g_φ(r_1, r_2)
Inverse problem: g_φ(r) or g_φ(r_1, r_2) → f(x, y) or f(x, y, z)

5 Algebraic methods: discretization
f(x, y) = Σ_j f_j b_j(x, y), with b_j(x, y) = 1 if (x, y) ∈ pixel j and 0 otherwise
g(r, φ) = ∫_L f(x, y) dl  →  g_i = Σ_{j=1}^N H_ij f_j + ε_i
g_k = H_k f + ε_k, k = 1, ..., K  →  g = Hf + ε
where g_k is the projection at angle φ_k and g collects all the projections.
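As an illustration of this discretization, here is a minimal numpy sketch that builds a crude system matrix H for a parallel-beam geometry by sampling points along each ray; the pixel-indicator weights of the slide are approximated by counting samples rather than exact intersection lengths, and the grid size, angles and test phantom are arbitrary choices made only for this example.

```python
import numpy as np

def projection_matrix(n, angles, n_det=None, n_samples=200):
    """Crude parallel-beam system matrix H on an n x n pixel grid (sketch only).

    Each ray is sampled at n_samples points; H[i, j] accumulates how much of
    ray i falls inside pixel j, a rough stand-in for the exact intersection length."""
    n_det = n_det or n
    rs = np.linspace(-1, 1, n_det, endpoint=False) + 1.0 / n_det   # detector positions
    ts = np.linspace(-1, 1, n_samples)                             # parameter along the ray
    H = np.zeros((len(angles) * n_det, n * n))
    for a, phi in enumerate(angles):
        c, s = np.cos(phi), np.sin(phi)
        for d, r in enumerate(rs):
            # points on the line  x cos(phi) + y sin(phi) = r
            xs, ys = r * c - ts * s, r * s + ts * c
            inside = (np.abs(xs) < 1) & (np.abs(ys) < 1)
            ix = ((xs[inside] + 1) / 2 * n).astype(int)
            iy = ((ys[inside] + 1) / 2 * n).astype(int)
            np.add.at(H[a * n_det + d], iy * n + ix, 2.0 / n_samples)
    return H

# toy forward problem  g = H f + noise
n = 32
angles = np.deg2rad(np.arange(0.0, 180.0, 10.0))
H = projection_matrix(n, angles)
f_true = np.zeros((n, n)); f_true[10:22, 12:20] = 1.0
g = H @ f_true.ravel() + 0.01 * np.random.default_rng(0).normal(size=H.shape[0])
```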

6 Algebraic methods
g = [g_1; ...; g_K], H = [H_1; ...; H_K], g_k = H_k f + ε_k  →  g = Hf + ε
H is huge dimensional in both the 2D and 3D cases.
Hf corresponds to the forward projection; H^t g corresponds to the back projection (BP).
H may not be invertible, and may not even be square; H is, in general, ill-conditioned.
Generalized inverse / minimum norm solution:
f̂ = H^t (H H^t)^{-1} g = Σ_k H_k^t (H_k H_k^t)^{-1} g_k
which can be interpreted as the Filtered Back Projection solution.
Due to the ill-posedness of the inverse problem, generalized inversion and least squares (LS) methods:
f̂ = arg min_f { J(f) = ||g − Hf||² }  →  f̂ = (H^t H)^{-1} H^t g
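A quick numerical counterpart of these formulas, as a sketch reusing the toy H and g built above; numpy's pinv and lstsq stand in for the generalized inverse.

```python
import numpy as np

# Minimum-norm and least-squares solutions for the toy system g = H f (sketch).
f_mn = H.T @ np.linalg.pinv(H @ H.T) @ g       # f = H^t (H H^t)^{-1} g
f_ls, *_ = np.linalg.lstsq(H, g, rcond=None)   # arg min_f ||g - H f||^2
```

Both remain sensitive to noise and to the conditioning of H, which motivates the regularization discussed next.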

7 Inversion: deterministic methods, data matching
Observation model: g = Hf + ε, g_i = [Hf]_i + ε_i = h_i(f), i = 1, ..., M
Mismatch between the data and the output of the model: Δ(g, H(f))
f̂ = arg min_f { Δ(g, H(f)) }
Examples:
Δ_LS(g, H(f)) = ||g − H(f)||² = Σ_i |g_i − h_i(f)|²
Δ_Lp(g, H(f)) = ||g − H(f)||^p = Σ_i |g_i − h_i(f)|^p, 1 < p < 2
Δ_KL(g, H(f)) = Σ_i g_i ln( g_i / h_i(f) )
In general, this does not give satisfactory results for inverse problems.

8 Regularization theory
Inverse problems = ill-posed problems → need for prior information.
Functional space (Tikhonov): g = H(f) + ε, J(f) = ||g − H(f)||²_2 + λ||Df||²_2
Finite dimensional space (Phillips & Twomey): g = Hf + ε, J(f) = ||g − Hf||² + λ||f||²
Minimum norm LS (MNLS): J(f) = ||g − H(f)||² + λ||f||²
Classical regularization: J(f) = ||g − H(f)||² + λ||Df||²
More general regularization: J(f) = Q(g − H(f)) + λΩ(Df), or J(f) = Δ_1(g, H(f)) + λΔ_2(Df, f_0)
Limitations: the errors are implicitly considered white and Gaussian; only limited prior information on the solution can be expressed; there is a lack of tools for determining the hyperparameters.

9 Inversion: probabilistic methods
Taking account of errors and uncertainties → probability theory:
- Maximum Likelihood (ML)
- Minimum Inaccuracy (MI)
- Probability Distribution Matching (PDM)
- Maximum Entropy (ME) and Information Theory (IT)
- Bayesian Inference (Bayes)
Advantages: explicit account of the errors and noise; a large class of priors via explicit or implicit modelling; a coherent approach to combining the information content of the data and the priors.
Limitations: practical implementation and cost of computation.

10 Bayesian estimation approach
M: g = Hf + ε
Observation model M + hypothesis on the noise ε → p(g|f; M) = p_ε(g − Hf)
A priori information: p(f|M)
Bayes: p(f|g; M) = p(g|f; M) p(f|M) / p(g|M)
Link with regularization, Maximum A Posteriori (MAP):
f̂ = arg max_f { p(f|g) } = arg max_f { p(g|f) p(f) } = arg min_f { J(f) = −ln p(g|f) − ln p(f) }
Regularization: f̂ = arg min_f { J(f) = Q(g, Hf) + λΩ(f) }
with Q(g, Hf) = −ln p(g|f) and λΩ(f) = −ln p(f).

11 Case of linear models and Gaussian priors
g = Hf + ε
Prior knowledge on the noise: ε ~ N(0, v_ε I)  →  p(g|f) ∝ exp[ −(1/2v_ε) ||g − Hf||² ]
Prior knowledge on f: f ~ N(0, v_f (D^t D)^{-1})  →  p(f) ∝ exp[ −(1/2v_f) ||Df||² ]
A posteriori: p(f|g) ∝ exp[ −(1/2v_ε) ||g − Hf||² − (1/2v_f) ||Df||² ]
MAP: f̂ = arg max_f { p(f|g) } = arg min_f { J(f) } with J(f) = ||g − Hf||² + λ||Df||², λ = v_ε/v_f
Advantage: characterization of the solution, p(f|g) = N(f̂, Σ̂) with
f̂ = (H^t H + λ D^t D)^{-1} H^t g,  Σ̂ = v_ε (H^t H + λ D^t D)^{-1}
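A direct numerical transcription of these closed-form expressions, as a sketch reusing the toy H and g from above; D is taken here as a simple first-order difference operator and v_ε, v_f are arbitrary assumed values.

```python
import numpy as np

# Gaussian posterior: MAP / posterior mean and posterior covariance in closed form (sketch).
v_eps, v_f = 0.01, 1.0
lam = v_eps / v_f
N = H.shape[1]
D = np.eye(N) - np.eye(N, k=1)                 # crude 1-D difference as a stand-in for D
A = H.T @ H + lam * D.T @ D
f_hat = np.linalg.solve(A, H.T @ g)            # f = (H^t H + lam D^t D)^{-1} H^t g
Sigma = v_eps * np.linalg.inv(A)               # posterior covariance
f_std = np.sqrt(np.diag(Sigma))                # pixel-wise uncertainty
```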

12 MAP estimation with other priors
f̂ = arg min_f { J(f) } with J(f) = (1/v_ε) ||g − Hf||² + Ω(f)
Separable priors:
- Gaussian: p(f_j) ∝ exp[−α|f_j|²]  →  Ω(f) = α Σ_j |f_j|² = α||f||²_2
- Generalized Gaussian: p(f_j) ∝ exp[−α|f_j|^p], 1 < p < 2  →  Ω(f) = α Σ_j |f_j|^p = α||f||^p_p
- Gamma: p(f_j) ∝ f_j^α exp[−β f_j]  →  Ω(f) = Σ_j [ −α ln f_j + β f_j ]
- Beta: p(f_j) ∝ f_j^α (1 − f_j)^β  →  Ω(f) = −Σ_j [ α ln f_j + β ln(1 − f_j) ]
Markovian models: p(f_j|f) ∝ exp[ −α Σ_{i∈N_j} φ(f_j, f_i) ]  →  Ω(f) = α Σ_j Σ_{i∈N_j} φ(f_j, f_i)

13 Main advantages of the Bayesian approach
- MAP = regularization
- Posterior mean? Marginal MAP?
- More information in the posterior law than only its mode or its mean
- Tools for estimating the hyperparameters
- Tools for model selection
- More specific and specialized priors, particularly through hidden variables and hierarchical models
- More computational tools: Expectation-Maximization for computing the maximum likelihood parameters; MCMC for posterior exploration; Variational Bayes for analytical computation of the posterior marginals; ...

14 Two main steps in the Bayesian approach
Prior modeling:
- Separable: Gaussian, Gamma; sparsity enforcing: Generalized Gaussian, mixture of Gaussians, mixture of Gammas, ...
- Markovian: Gauss-Markov, GGM, ...
- Markovian with hidden variables (contours, region labels)
Choice of the estimator and computational aspects:
- MAP, posterior mean, marginal MAP
- MAP needs optimization algorithms; the posterior mean needs integration methods; marginal MAP and hyperparameter estimation need both integration and optimization
- Approximations: Gaussian approximation (Laplace), numerical exploration (MCMC), Variational Bayes (separable approximation)

15 Which images am I looking for?

16 Which image am I looking for?
(Example image classes shown on the slide: Gauss-Markov, Generalized Gauss-Markov, piecewise Gaussian, mixture of Gauss-Markov.)

17 Different prior models for signals and images: separable
Simple separable priors: Gaussian, Gamma, Generalized Gaussian
p(f) ∝ exp[ −α Σ_j φ(f_j) ]
Simple Markovian models: Gauss-Markov, Generalized Gauss-Markov
p(f) ∝ exp[ −α Σ_j Σ_{i∈N(j)} φ(f_j − f_i) ]
Hierarchical models with hidden variables: Bernoulli-Gaussian, Gaussian-Gamma
p(f|z) ∝ exp[ −Σ_j φ(f_j | z_j) ] and p(z) ∝ exp[ −Σ_j φ(z_j) ]
with different choices for φ(f_j | z_j) and φ(z_j).

18 Bayesian inference as hierarchical models
Simple case: g = Hf + ε
p(f|g, θ) ∝ p(g|f, θ_1) p(f|θ_2)   (graphical model: θ_2 → f, θ_1 → ε, then f and ε → g through H)
Objective: infer f
MAP: f̂ = arg max_f { p(f|g, θ) }
Posterior Mean (PM): f̂ = ∫ f p(f|g, θ) df
Example, Gaussian case:
p(g|f, θ_1) = N(g|Hf, θ_1 I), p(f|θ_2) = N(f|0, θ_2 I)  →  p(f|g, θ) = N(f|f̂, Σ̂)
MAP: f̂ = arg min_f { J(f) } with J(f) = (1/v_ε) ||g − Hf||² + (1/v_f) ||f||²
Posterior Mean (PM) = MAP: f̂ = (H^t H + λI)^{-1} H^t g, Σ̂ = v_ε (H^t H + λI)^{-1}, with λ = v_ε/v_f.

19 Gaussian model: simple separable and Markovian
g = Hf + ε, p(g|f, θ_1) = N(g|Hf, v_ε I)
Separable Gaussian case: p(f|v_f) = N(f|0, v_f I)
MAP: f̂ = arg min_f { J(f) } with J(f) = (1/v_ε) ||g − Hf||² + (1/v_f) ||f||²
Posterior Mean (PM) = MAP: p(f|g, θ) = N(f|f̂, Σ̂) with f̂ = (H^t H + λI)^{-1} H^t g, Σ̂ = v_ε (H^t H + λI)^{-1}, λ = v_ε/v_f
Gauss-Markov case: p(f|v_f, D) = N(f|0, v_f (D^t D)^{-1})
MAP: J(f) = (1/v_ε) ||g − Hf||² + (1/v_f) ||Df||²
Posterior Mean (PM) = MAP: f̂ = (H^t H + λ D^t D)^{-1} H^t g, Σ̂ = v_ε (H^t H + λ D^t D)^{-1}, λ = v_ε/v_f.

20 Bayesian inference (unsupervised case)
Unsupervised case: hyperparameter estimation
p(f, θ|g) ∝ p(g|f, θ_1) p(f|θ_2) p(θ)   (with hyperprior parameters α_0, β_0 on θ)
Objective: infer (f, θ)
JMAP: (f̂, θ̂) = arg max_{(f,θ)} { p(f, θ|g) }
Marginalization 1: p(f|g) = ∫ p(f, θ|g) dθ
Marginalization 2: p(θ|g) = ∫ p(f, θ|g) df, followed by θ̂ = arg max_θ { p(θ|g) } and f̂ = arg max_f { p(f|g, θ̂) }
MCMC Gibbs sampling: f ~ p(f|θ, g) → θ ~ p(θ|f, g), until convergence; use the generated samples to compute means and variances.
VBA: approximate p(f, θ|g) by q_1(f) q_2(θ); use q_1(f) to infer f and q_2(θ) to infer θ.

21 Variational Bayesian Approximation
Approximate p(f, θ|g) by q(f, θ|g) = q_1(f|g) q_2(θ|g) and then continue the computations.
Criterion: KL(q(f, θ|g) : p(f, θ|g))
KL(q : p) = ∫∫ q ln(q/p) = ∫∫ q_1 q_2 ln(q_1 q_2 / p) = ∫ q_1 ln q_1 + ∫ q_2 ln q_2 − <ln p>_q = −H(q_1) − H(q_2) − <ln p>_q
Alternate optimization algorithm: q_1 → q_2 → q_1 → q_2 → ...
q_1(f) ∝ exp[ <ln p(g, f, θ; M)>_{q_2(θ)} ]
q_2(θ) ∝ exp[ <ln p(g, f, θ; M)>_{q_1(f)} ]
p(f, θ|g) → (Variational Bayesian Approximation) → q_1(f) → f̂ and q_2(θ) → θ̂.

22 JMAP, Marginalization, VBA
JMAP: p(f, θ|g) → joint optimization → f̂, θ̂
Marginalization: p(f, θ|g) → marginalize over f → p(θ|g) → θ̂ → p(f|θ̂, g) → f̂
Variational Bayesian Approximation: p(f, θ|g) → q_1(f) q_2(θ) → f̂, θ̂

23 Variational Bayesian Approximation
Approximate p(f, θ|g) by q(f, θ) = q_1(f) q_2(θ) and then use them for any inference on f and θ respectively.
Criterion: KL(q(f, θ|g) : p(f, θ|g)), with KL(q : p) = ∫∫ q ln(q/p) = ∫∫ q_1 q_2 ln(q_1 q_2 / p)
Iterative algorithm: q_1 → q_2 → q_1 → q_2 → ...
q_1(f) ∝ exp[ <ln p(g, f, θ; M)>_{q_2(θ)} ]
q_2(θ) ∝ exp[ <ln p(g, f, θ; M)>_{q_1(f)} ]

24 VBA: choice of the family of laws q_1 and q_2
Case 1, Joint MAP:
q_1(f|f̂) = δ(f − f̂), q_2(θ|θ̂) = δ(θ − θ̂)  →  f̂ = arg max_f { p(f, θ̂|g; M) }, θ̂ = arg max_θ { p(f̂, θ|g; M) }
Case 2, EM:
q_1(f) ∝ p(f|θ, g), q_2(θ|θ̂) = δ(θ − θ̂)  →  Q(θ, θ̂) = <ln p(f, θ|g; M)>_{q_1(f|θ̂)}, θ̂ = arg max_θ { Q(θ, θ̂) }
Appropriate choice for inverse problems (exponential families and conjugate priors):
q_1(f) ∝ p(f|θ, g; M), q_2(θ) ∝ p(θ|f, g; M)
This accounts for the uncertainties of θ̂ when estimating f, and vice versa.

25 JMAP, EM and VBA
JMAP, alternate optimization algorithm:
θ̂^(0) → f̂ = arg max_f { p(f, θ̂|g) } → θ̂ = arg max_θ { p(f̂, θ|g) } → iterate → f̂
EM:
θ̂^(0) → q_1(f) = p(f|θ̂, g) → Q(θ, θ̂) = <ln p(f, θ|g)>_{q_1(f)} → θ̂ = arg max_θ { Q(θ, θ̂) } → iterate → q_1(f) → f̂
VBA:
θ̂^(0) → q_1(f) ∝ exp[ <ln p(f, θ|g)>_{q_2(θ)} ] → q_2(θ) ∝ exp[ <ln p(f, θ|g)>_{q_1(f)} ] → iterate → q_1(f) → f̂

26 Unsupervised Gaussian model
Unsupervised Gaussian case: g = Hf + ε, ε ~ N(0, v_ε I)
p(g|f, v_ε) = N(g|Hf, v_ε I), p(f|v_f) = N(f|0, v_f I)
p(v_ε) = IG(v_ε|α_ε0, β_ε0), p(v_f) = IG(v_f|α_f0, β_f0)
p(f, v_ε, v_f|g) ∝ p(g|f, v_ε) p(f|v_f) p(v_ε) p(v_f)
JMAP: (f̂, v̂_ε, v̂_f) = arg max_{(f, v_ε, v_f)} { p(f, v_ε, v_f|g) }, by alternate optimization:
f̂ = (H^t H + λI)^{-1} H^t g with λ = v̂_ε/v̂_f
v̂_ε = β̂_ε/α̂_ε, with β̂_ε = β_ε0 + ||g − Hf̂||², α̂_ε = α_ε0 + M/2
v̂_f = β̂_f/α̂_f, with β̂_f = β_f0 + ||f̂||², α̂_f = α_f0 + N/2
VBA: approximate p(f, v_ε, v_f|g) by q_1(f) q_2(v_ε) q_3(v_f), by alternate optimization:
q_1(f) = N(f|μ̂, Σ̂), q_2(v_ε) = IG(v_ε|α̂_ε, β̂_ε), q_3(v_f) = IG(v_f|α̂_f, β̂_f)

27 Unsupervised Gaussian model (details of the updates)
Same model: p(f, v_ε, v_f|g) ∝ p(g|f, v_ε) p(f|v_f) p(v_ε) p(v_f)
JMAP: (f̂, v̂_ε, v̂_f) = arg max_{(f, v_ε, v_f)} { p(f, v_ε, v_f|g) }
f̂ = (H^t H + λI)^{-1} H^t g with λ = v̂_ε/v̂_f
v̂_ε = β̂_ε/α̂_ε, with β̂_ε = β_ε0 + ||g − Hf̂||², α̂_ε = α_ε0 + M/2
v̂_f = β̂_f/α̂_f, with β̂_f = β_f0 + ||f̂||², α̂_f = α_f0 + N/2
VBA: approximate p(f, v_ε, v_f|g) by q_1(f) q_2(v_ε) q_3(v_f):
q_1(f) = N(f|μ̂, Σ̂) with μ̂ = (H^t H + λI)^{-1} H^t g, Σ̂ = v̂_ε (H^t H + λI)^{-1}, λ = v̂_ε/v̂_f
q_2(v_ε) = IG(v_ε|α̂_ε, β̂_ε): v̂_ε = β̂_ε/α̂_ε, β̂_ε = β_ε0 + <||g − Hf||²>, α̂_ε = α_ε0 + M/2
q_3(v_f) = IG(v_f|α̂_f, β̂_f): v̂_f = β̂_f/α̂_f, β̂_f = β_f0 + <||f||²>, α̂_f = α_f0 + N/2
with <||g − Hf||²> = ||g − Hμ̂||² + Tr(H Σ̂ H^t) and <||f||²> = ||μ̂||² + Tr(Σ̂).
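These updates translate almost line by line into code. A minimal sketch for the toy problem, reusing H and g from the earlier examples; the hyper-hyperparameters α_0, β_0 are assumed small "non-informative" values, and the update expressions simply mirror the slide.

```python
import numpy as np

# VBA alternate optimisation for the unsupervised Gaussian model (sketch).
M, N = H.shape
alpha_e0 = beta_e0 = alpha_f0 = beta_f0 = 1e-3
v_eps, v_f = 1.0, 1.0
HtH, Htg = H.T @ H, H.T @ g

for it in range(20):
    lam = v_eps / v_f
    A = HtH + lam * np.eye(N)
    mu = np.linalg.solve(A, Htg)               # mean of q1(f)
    Sigma = v_eps * np.linalg.inv(A)           # covariance of q1(f)
    e_res = np.sum((g - H @ mu) ** 2) + np.trace(H @ Sigma @ H.T)   # <||g - Hf||^2>
    e_f2 = np.sum(mu ** 2) + np.trace(Sigma)                        # <||f||^2>
    v_eps = (beta_e0 + e_res) / (alpha_e0 + M / 2)
    v_f = (beta_f0 + e_f2) / (alpha_f0 + N / 2)
```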

28 Sparsity enforcing models
Student-t model: St(f|ν) ∝ exp[ −((ν + 1)/2) log(1 + f²/ν) ]
Infinite Gaussian Scale Mixture (IGSM) equivalence:
St(f|ν) = ∫_0^∞ N(f|0, 1/z) G(z|α, β) dz, with α = β = ν/2
p(f|z) = Π_j p(f_j|z_j) = Π_j N(f_j|0, 1/z_j) ∝ exp[ −(1/2) Σ_j z_j f_j² ]
p(z|α, β) = Π_j G(z_j|α, β) ∝ Π_j z_j^{α−1} exp[−β z_j] ∝ exp[ Σ_j (α − 1) ln z_j − β z_j ]
p(f, z|α, β) ∝ exp[ −(1/2) Σ_j z_j f_j² + Σ_j (α − 1) ln z_j − β z_j ]
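A quick Monte-Carlo sanity check of this scale-mixture identity (sketch only; scipy's Student-t quantiles are used purely for comparison):

```python
import numpy as np
from scipy import stats

# Student-t as an infinite Gaussian scale mixture: draw z ~ Gamma(nu/2, rate=nu/2),
# then f | z ~ N(0, 1/z); the marginal of f is Student-t with nu degrees of freedom.
rng = np.random.default_rng(0)
nu = 3.0
z = rng.gamma(shape=nu / 2, scale=2 / nu, size=200_000)   # rate nu/2 <-> scale 2/nu
f = rng.normal(0.0, 1.0 / np.sqrt(z))
print(np.quantile(f, [0.05, 0.5, 0.95]))
print(stats.t(df=nu).ppf([0.05, 0.5, 0.95]))              # should roughly agree
```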

29 Non-stationary noise and the Infinite Gaussian Scale Mixture (IGSM) model
Non-stationary noise: g = Hf + ε, ε_i ~ N(ε_i|0, v_εi), i.e. ε ~ N(ε|0, V_ε = diag[v_ε1, ..., v_εM])
Student-t prior model and its equivalent IGSM:
f_j|v_fj ~ N(f_j|0, v_fj) and v_fj ~ IG(v_fj|α_f0, β_f0)  →  f_j ~ St(f_j|α_f0, β_f0)
p(g|f, v_ε) = N(g|Hf, V_ε), V_ε = diag[v_ε]
p(f|v_f) = N(f|0, V_f), V_f = diag[v_f]
p(v_ε) = Π_i IG(v_εi|α_ε0, β_ε0), p(v_f) = Π_j IG(v_fj|α_f0, β_f0)
p(f, v_ε, v_f|g) ∝ p(g|f, v_ε) p(f|v_f) p(v_ε) p(v_f)
JMAP: (f̂, v̂_ε, v̂_f) = arg max_{(f, v_ε, v_f)} { p(f, v_ε, v_f|g) }
VBA: approximate p(f, v_ε, v_f|g) by q_1(f) q_2(v_ε) q_3(v_f)

30 Sparse model in a transform domain (1)
g = Hf + ε, f = Dz, z sparse
p(g|z, v_ε) = N(g|HDz, v_ε I)
p(z|v_z) = N(z|0, V_z), V_z = diag[v_z]
p(v_ε) = IG(v_ε|α_ε0, β_ε0), p(v_z) = Π_j IG(v_zj|α_z0, β_z0)
p(z, v_ε, v_z|g) ∝ p(g|z, v_ε) p(z|v_z) p(v_ε) p(v_z)
JMAP: (ẑ, v̂_ε, v̂_z) = arg max_{(z, v_ε, v_z)} { p(z, v_ε, v_z|g) }, by alternate optimization:
ẑ = arg min_z { J(z) } with J(z) = (1/2v̂_ε) ||g − HDz||² + ||V_z^{-1/2} z||²
v̂_zj = (β_z0 + ẑ_j²) / (α_z0 + 1/2)
v̂_ε = (β_ε0 + ||g − HDẑ||²) / (α_ε0 + M/2)
VBA: approximate p(z, v_ε, v_z|g) by q_1(z) q_2(v_ε) q_3(v_z), by alternate optimization.
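A sketch of this JMAP alternate optimisation on the toy problem, reusing H and g from the earlier examples; the synthesis operator (the slide's D) is taken here as an orthonormal inverse-DCT matrix, named W to avoid clashing with the difference operator used before, and the hyperparameters are assumed small values.

```python
import numpy as np
from scipy.fft import idct

# JMAP alternate optimisation for g = H W z + eps with z sparse (sketch).
N = H.shape[1]
W = idct(np.eye(N), norm='ortho', axis=0)        # columns = DCT basis vectors (the slide's D)
A = H @ W
alpha_e0 = beta_e0 = alpha_z0 = beta_z0 = 1e-3
v_eps, v_z = 1.0, np.ones(N)

for it in range(20):
    # z-step: minimise (1/v_eps)||g - A z||^2 + z^t diag(1/v_z) z
    z = np.linalg.solve(A.T @ A / v_eps + np.diag(1.0 / v_z), A.T @ g / v_eps)
    # hyperparameter steps (expressions as on the slide)
    v_z = (beta_z0 + z ** 2) / (alpha_z0 + 0.5)
    v_eps = (beta_e0 + np.sum((g - A @ z) ** 2)) / (alpha_e0 + len(g) / 2)

f_sparse = (W @ z).reshape(n, n)                 # back to the image domain
```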

31 Sparse model in a transform domain (2)
g = Hf + ε, f = Dz + ξ, z sparse
p(g|f, v_ε) = N(g|Hf, v_ε I)
p(f|z) = N(f|Dz, v_ξ I)
p(z|v_z) = N(z|0, V_z), V_z = diag[v_z]
p(v_ε) = IG(v_ε|α_ε0, β_ε0), p(v_z) = Π_j IG(v_zj|α_z0, β_z0), p(v_ξ) = IG(v_ξ|α_ξ0, β_ξ0)
p(f, z, v_ε, v_z, v_ξ|g) ∝ p(g|f, v_ε) p(f|z) p(z|v_z) p(v_ε) p(v_z) p(v_ξ)
JMAP: (f̂, ẑ, v̂_ε, v̂_z, v̂_ξ) = arg max { p(f, z, v_ε, v_z, v_ξ|g) }, by alternate optimization.
VBA: approximate p(f, z, v_ε, v_z, v_ξ|g) by q_1(f) q_2(z) q_3(v_ε) q_4(v_z) q_5(v_ξ), by alternate optimization.

32 Gauss-Markov-Potts prior models for images
Variables: image f(r), labels z(r), contours c(r) = 1 − δ(z(r) − z(r'))
p(f(r)|z(r) = k, m_k, v_k) = N(m_k, v_k)
p(f(r)) = Σ_k P(z(r) = k) N(m_k, v_k)   (mixture of Gaussians)
Separable iid hidden variables: p(z) = Π_r p(z(r))
Markovian hidden variables: p(z) Potts-Markov:
p(z(r)|z(r'), r' ∈ V(r)) ∝ exp[ γ Σ_{r'∈V(r)} δ(z(r) − z(r')) ]
p(z) ∝ exp[ γ Σ_{r∈R} Σ_{r'∈V(r)} δ(z(r) − z(r')) ]

33 Four different cases
To each pixel of the image two variables are associated: f(r) and z(r).
- f|z Gaussian iid, z iid: mixture of Gaussians
- f|z Gauss-Markov, z iid: mixture of Gauss-Markov
- f|z Gaussian iid, z Potts-Markov: mixture of independent Gaussians (MIG with hidden Potts)
- f|z Markov, z Potts-Markov: mixture of Gauss-Markov (MGM with hidden Potts)

34 Gauss-Markov-Potts prior models for images (full hierarchical model)
g = Hf + ε
p(g|f, v_ε) = N(g|Hf, v_ε I), p(v_ε) = IG(v_ε|α_ε0, β_ε0)
p(f(r)|z(r) = k, m_k, v_k) = N(f(r)|m_k, v_k)
p(f|z, θ) = Π_k Π_{r∈R_k} a_k N(f(r)|m_k, v_k), with θ = {(a_k, m_k, v_k), k = 1, ..., K}
p(θ) = D(a|a_0) N(m|m_0, v_0) IG(v|α_0, β_0)
p(z|γ) ∝ exp[ γ Σ_r Σ_{r'∈N(r)} δ(z(r) − z(r')) ]   (Potts MRF)
p(f, z, θ|g) ∝ p(g|f, v_ε) p(f|z, θ) p(z|γ) p(θ)
Computation: MCMC (Gibbs sampling) or VBA (alternate optimization).

35 Bayesian computation and algorithms
Joint posterior probability law of all the unknowns f, z, θ:
p(f, z, θ|g) ∝ p(g|f, θ_1) p(f|z, θ_2) p(z|θ_3) p(θ)
Often the expression of p(f, z, θ|g) is complex; its optimization (for joint MAP) or its marginalization or integration (for marginal MAP or PM) is not easy.
Two main techniques:
- MCMC: needs the expressions of the conditionals p(f|z, θ, g), p(z|f, θ, g) and p(θ|f, z, g).
- VBA: approximate p(f, z, θ|g) by a separable law q(f, z, θ|g) = q_1(f) q_2(z) q_3(θ) and do all computations with these separable laws.

36 MCMC based algorithm
p(f, z, θ|g) ∝ p(g|f, z, θ_1) p(f|z, θ_2) p(z|θ_3) p(θ)
General Gibbs sampling scheme: f ~ p(f|ẑ, θ̂, g) → z ~ p(z|f̂, θ̂, g) → θ ~ p(θ|f̂, ẑ, g)
- Generate samples f using p(f|ẑ, θ̂, g) ∝ p(g|f, θ̂) p(f|ẑ, θ̂); when Gaussian, this can be done via optimization of a quadratic criterion.
- Generate samples z using p(z|f̂, θ̂, g) ∝ p(g|f̂, ẑ, θ̂) p(z); this often requires sampling a hidden discrete variable.
- Generate samples θ using p(θ|f̂, ẑ, g) ∝ p(g|f̂, σ²_ε I) p(f̂|ẑ, (m_k, v_k)) p(θ); the use of conjugate priors gives analytical expressions.
After convergence, use the samples to compute means and variances.
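As an illustration of the discrete-variable step, here is a sketch of one Gibbs sweep over Potts labels z given the current image and class parameters (m_k, v_k); the 4-neighbour system, the value of γ and the two-class parameters are assumptions made for this example, and f_hat comes from the earlier toy reconstruction.

```python
import numpy as np

# One Gibbs sweep over the hidden Potts labels z (sketch):
# p(z(r)=k | ...) is proportional to N(f(r) | m_k, v_k) * exp(gamma * #{neighbours labelled k}).
rng = np.random.default_rng(0)

def gibbs_sweep_labels(z, f_img, m, v, gamma=1.0):
    n1, n2 = z.shape
    K = len(m)
    for i in range(n1):
        for j in range(n2):
            logp = -0.5 * (f_img[i, j] - m) ** 2 / v - 0.5 * np.log(v)   # Gaussian log-likelihood per class
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):            # 4-neighbour system
                ii, jj = i + di, j + dj
                if 0 <= ii < n1 and 0 <= jj < n2:
                    logp += gamma * (np.arange(K) == z[ii, jj])
            p = np.exp(logp - logp.max())
            z[i, j] = rng.choice(K, p=p / p.sum())
    return z

# toy usage on the earlier reconstruction, with two classes
f_img = f_hat.reshape(n, n)
z_labels = (f_img > f_img.mean()).astype(int)
z_labels = gibbs_sweep_labels(z_labels, f_img, m=np.array([0.0, 1.0]), v=np.array([0.1, 0.1]))
```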

37 Computed Tomography with only two projections
g(r, φ) = ∫_L f(x, y) dl, f(x, y) = Σ_j f_j b_j(x, y), with b_j(x, y) = 1 if (x, y) ∈ pixel j and 0 otherwise
g_i = Σ_{j=1}^N H_ij f_j + ε_i  →  g = Hf + ε
Case study: reconstruction from 2 projections
g_1(x) = ∫ f(x, y) dy, g_2(y) = ∫ f(x, y) dx
Very ill-posed inverse problem:
f(x, y) = g_1(x) g_2(y) Ω(x, y), where Ω(x, y) is a copula: ∫ Ω(x, y) dx = 1 and ∫ Ω(x, y) dy = 1.
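In the discrete setting the two projections are simply the row and column sums of the image. The sketch below builds them for a toy phantom (sizes chosen arbitrarily) and checks that the separable solution g_1(x) g_2(y) / Σ g_1, i.e. the uniform-Ω member of the solution family, reproduces both projections exactly.

```python
import numpy as np

# Two projections of a discrete image = its row and column sums (sketch).
f_phantom = np.zeros((64, 64)); f_phantom[20:40, 25:50] = 1.0
g1 = f_phantom.sum(axis=1)                 # projection along y: one value per row x
g2 = f_phantom.sum(axis=0)                 # projection along x: one value per column y
f_indep = np.outer(g1, g2) / g1.sum()      # separable solution (uniform Omega)
assert np.allclose(f_indep.sum(axis=1), g1) and np.allclose(f_indep.sum(axis=0), g2)
```

Any image with these same row and column sums is an equally valid solution; this indeterminacy is exactly what the hierarchical priors of the following slides are meant to resolve.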

38 Simple example
A 2×2 image (f_1, ..., f_4) is observed through its horizontal and vertical sums (g_1, ..., g_4): Hf = g, and f = H^{-1} g only if H is invertible. Here H is rank deficient, so the problem has an infinite number of solutions; the same holds for the 3×3 image (f_1, ..., f_9) observed through six projection sums (g_1, ..., g_6).
How to find all those solutions? Which one is the good one? This requires prior information: to find a unique solution, one needs either more data or prior information.

39 Application in CT: reconstruction from 2 projections
g = Hf + ε, with g|f ~ N(Hf, σ² I) (Gaussian)
f|z: iid Gaussian or Gauss-Markov; z: iid or Potts; c(r) = 1 − δ(z(r) − z(r')) ∈ {0, 1} (binary contour variable)
p(f, z, θ|g) ∝ p(g|f, θ_1) p(f|z, θ_2) p(z|θ_3) p(θ)

40 Proposed algorithms
p(f, z, θ|g) ∝ p(g|f, θ_1) p(f|z, θ_2) p(z|θ_3) p(θ)
MCMC based general scheme: f ~ p(f|ẑ, θ̂, g) → z ~ p(z|f̂, θ̂, g) → θ ~ p(θ|f̂, ẑ, g)
Iterative algorithm:
- Estimate f using p(f|ẑ, θ̂, g) ∝ p(g|f, θ̂) p(f|ẑ, θ̂): needs optimization of a quadratic criterion.
- Estimate z using p(z|f̂, θ̂, g) ∝ p(g|f̂, ẑ, θ̂) p(z): needs sampling of a Potts Markov field.
- Estimate θ using p(θ|f̂, ẑ, g) ∝ p(g|f̂, σ²_ε I) p(f̂|ẑ, (m_k, v_k)) p(θ): conjugate priors give analytical expressions.
Variational Bayesian Approximation: approximate p(f, z, θ|g) by q_1(f) q_2(z) q_3(θ).

41 Results
(Figure: reconstructions from two projections — Original, Backprojection, Filtered BP, LS, Gauss-Markov + positivity, GM + Line process, GM + Label process, together with the estimated hidden label field ẑ and contour field ĉ.)

42 Implementation issues
In almost all the algorithms, the computation of f̂ needs an optimization algorithm. The criterion to optimize is often of the form
J(f) = ||g − Hf||² + λ||Df||²
Very often we use gradient based algorithms, which need to compute
∇J(f) = −2 H^t (g − Hf) + 2λ D^t D f
So, in the simplest case, at each step we have
f̂^(k+1) = f̂^(k) + α^(k) [ H^t (g − H f̂^(k)) − λ D^t D f̂^(k) ]
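Written out in code, the iteration is just a gradient-descent loop in which the forward projection Hf and the backprojection H^t(·) are the two expensive operations. A sketch reusing H, g and the difference operator D from the earlier examples; the step size comes from a rough Lipschitz bound and is an assumption of this example.

```python
import numpy as np

# Gradient descent on J(f) = ||g - H f||^2 + lam ||D f||^2 (sketch).
lam = 0.1
L = 2 * (np.linalg.norm(H, 2) ** 2 + lam * np.linalg.norm(D, 2) ** 2)   # Lipschitz bound on grad J
alpha = 1.0 / L
f_k = np.zeros(H.shape[1])
for k in range(200):
    residual = g - H @ f_k                                   # forward projection + residual
    grad = -2 * H.T @ residual + 2 * lam * D.T @ (D @ f_k)   # backprojection + regularization term
    f_k = f_k - alpha * grad
```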

43 Gradient based algorithms
f̂^(k+1) = f̂^(k) + α [ H^t (g − H f̂^(k)) − λ D^t D f̂^(k) ]
1. Compute ĝ = H f̂ (forward projection)
2. Compute δg = g − ĝ (error or residual)
3. Compute δf_1 = H^t δg (backprojection of the error)
4. Compute δf_2 = D^t D f̂ (correction due to regularization)
5. Update f̂^(k+1) = f̂^(k) + α [δf_1 − λ δf_2]
(Block diagram: initial estimated guess image f̂^(0) → forward projection ĝ = H f̂^(k) → compare with the measured projections g → correction term in projection space δg = g − ĝ → backprojection → correction term in image space δf = H^t δg − λ D^t D f̂^(k) → updated image f̂^(k+1).)
Steps 1 and 3 have the greatest computational cost and have been implemented on GPU.

44 Multi-resolution implementation
Scale 1 (black): g^(1) = H^(1) f^(1), grid N × N
Scale 2 (green): g^(2) = H^(2) f^(2), grid N/2 × N/2
Scale 3 (red): g^(3) = H^(3) f^(3), grid N/4 × N/4
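A coarse-to-fine sketch of the same idea on the earlier toy problem: solve on an N/4 grid, upsample the result as the initial guess for N/2, then for N. It reuses projection_matrix, H, g, angles and n from the sketches above; detector bins are simply averaged when going to a coarser scale, and an identity regularizer replaces D for brevity. This is purely illustrative and not the GPU scheme of the talk.

```python
import numpy as np

def solve_scale(Hs, gs, f0, lam=0.1, n_iter=100):
    """Gradient descent at one scale, warm-started from f0 (sketch)."""
    alpha = 1.0 / (2 * (np.linalg.norm(Hs, 2) ** 2 + lam))
    f = f0.copy()
    for _ in range(n_iter):
        f -= alpha * (-2 * Hs.T @ (gs - Hs @ f) + 2 * lam * f)
    return f

n_angles, n_det = len(angles), n
f_est = np.zeros((n // 4) ** 2)
for ns in (n // 4, n // 2, n):
    Hs = projection_matrix(ns, angles, n_det=ns)                 # coarse-scale operator
    gs = g.reshape(n_angles, ns, -1).mean(axis=2).ravel()        # re-binned (averaged) detector data
    f_est = solve_scale(Hs, gs, f_est)
    if ns < n:                                                   # upsample as the next initial guess
        f_est = np.kron(f_est.reshape(ns, ns), np.ones((2, 2))).ravel()
```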

45 Conclusions
- Computed Tomography is an inverse problem.
- Analytical methods have many limitations; algebraic methods push these limitations further; deterministic regularization methods push the limitations of ill-conditioning still further.
- The probabilistic, and in particular the Bayesian, approach has great potential.
- Hierarchical prior models with hidden variables are very powerful tools for the Bayesian approach to inverse problems.
- Gauss-Markov-Potts models for images incorporate hidden regions and contours.
- Main Bayesian computation tools: JMAP, MCMC and VBA.
- Applications in different imaging systems: X-ray CT, microwaves, PET, ultrasound, Optical Diffusion Tomography (ODT), acoustic source localization, ...
- Current projects: efficient implementation in 2D and 3D cases.
