A. Mohammad-Djafari, Bayesian Discrete Tomography from a few projections, March 21-23, 2016, Politecnico di Milano, Italy.


Slide 1: A Student-t based sparsity enforcing hierarchical prior for linear inverse problems and its efficient Bayesian computation for 2D and 3D Computed Tomography. Ali Mohammad-Djafari, Li Wang, Nicolas Gac and Folker Bleichrodt. Laboratoire des Signaux et Systèmes (L2S), UMR 8506 CNRS-CentraleSupélec-Univ. Paris-Sud, Gif-sur-Yvette, France. djafari@lss.supelec.fr. iTWIST 2016, Aug. 2016, Aalborg, Denmark.

Slide 2: Contents

1. Computed Tomography in 2D and 3D
2. Main classical methods
3. Basic Bayesian approach
4. Sparsity enforcing models through Student-t and IGSM
5. Computational tools: JMAP, EM, VBA
6. Implementation issues; main GPU implementation steps: forward and back projections; multi-resolution implementation
7. Some results
8. Conclusions

Slide 3: Computed Tomography: seeing inside a body

$f(x, y)$: a section of a real 3D body $f(x, y, z)$; $g_\phi(r)$: a line of the observed radiography $g_\phi(r, z)$.

Forward model: line integrals, or the Radon transform:

$$g_\phi(r) = \int_{L_{r,\phi}} f(x, y)\, dl + \epsilon_\phi(r) = \iint f(x, y)\, \delta(r - x\cos\phi - y\sin\phi)\, dx\, dy + \epsilon_\phi(r)$$

Inverse problem: image reconstruction. Given the forward model $H$ (Radon transform) and a set of data $g_{\phi_i}(r)$, $i = 1, \dots, M$, find $f(x, y)$.

Slide 4: 2D and 3D Computed Tomography

3D: $g_\phi(r_1, r_2) = \int_{L_{r_1, r_2, \phi}} f(x, y, z)\, dl$; 2D: $g_\phi(r) = \int_{L_{r,\phi}} f(x, y)\, dl$.

Forward problem: $f(x, y)$ or $f(x, y, z) \rightarrow g_\phi(r)$ or $g_\phi(r_1, r_2)$. Inverse problem: $g_\phi(r)$ or $g_\phi(r_1, r_2) \rightarrow f(x, y)$ or $f(x, y, z)$.

Slide 5: Algebraic methods: discretization

(Figure: the image $f(x, y)$ is discretized into $N$ pixels $f_1, \dots, f_N$ between source $S$ and detector $D$; a ray at angle $\phi$ produces the measurement $g_i$, with $H_{ij}$ the contribution of pixel $j$ to ray $i$.)

$$f(x, y) = \sum_j f_j\, b_j(x, y), \qquad b_j(x, y) = \begin{cases} 1 & \text{if } (x, y) \in \text{pixel } j \\ 0 & \text{else} \end{cases}$$

$$g(r, \phi) = \int_L f(x, y)\, dl \quad\Longrightarrow\quad g_i = \sum_{j=1}^{N} H_{ij} f_j + \epsilon_i$$

$$g_k = H_k f + \epsilon_k, \quad k = 1, \dots, K \qquad\Longrightarrow\qquad g = Hf + \epsilon$$

with $g_k$ the projection at angle $\phi_k$ and $g$ all the projections.

Slide 6: Algebraic methods

$$g = \begin{bmatrix} g_1 \\ \vdots \\ g_K \end{bmatrix}, \quad H = \begin{bmatrix} H_1 \\ \vdots \\ H_K \end{bmatrix}, \qquad g_k = H_k f + \epsilon_k \;\Longrightarrow\; g = Hf + \epsilon$$

- $H$ has huge dimensions in 2D, and even more so in 3D.
- $Hf$ corresponds to forward projection; $H^t g$ corresponds to back projection (BP).
- $H$ may not be invertible, and may not even be square.
- $H$ is, in general, ill-conditioned.
- In limited angle tomography $H$ is underdetermined, so the problem has an infinite number of solutions.
- Minimum norm solution (see the sketch below): $\hat f = H^t (H H^t)^{-1} g = \sum_k H_k^t (H_k H_k^t)^{-1} g_k$, which can be interpreted as the filtered back projection solution.
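The minimum norm solution is easy to reproduce numerically. The following is a minimal sketch (my toy setup, not the authors' code): a random dense matrix stands in for $H$, and LSQR started from zero converges to the same minimum-norm least squares solution as the explicit formula.

```python
import numpy as np
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)
N = 64 * 64                 # unknowns: a toy 64 x 64 image, flattened
M = 30 * 64                 # measurements: 30 angles x 64 detector bins (M < N)
H = rng.normal(size=(M, N)) / np.sqrt(M)   # random stand-in for the projection matrix
f_true = np.zeros(N)
f_true[rng.choice(N, size=50, replace=False)] = 1.0   # sparse toy object
g = H @ f_true + 0.01 * rng.normal(size=M)

# Explicit minimum norm solution f = H^t (H H^t)^{-1} g (an M x M solve)
f_mn = H.T @ np.linalg.solve(H @ H.T, g)

# LSQR from a zero start converges to the same minimum-norm solution
f_lsqr = lsqr(H, g, atol=1e-12, btol=1e-12)[0]
print(np.linalg.norm(f_mn - f_lsqr) / np.linalg.norm(f_mn))   # ~0
```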

Slide 7: Prior information or constraints

- Positivity: $f_j > 0$, i.e. $f_j \in \mathbb{R}_+$
- Boundedness: $0 < f_j < 1$, i.e. $f_j \in [0, 1]$
- Smoothness: $f_j$ depends on its neighborhood.
- Sparsity: many $f_j$ are zero.
- Sparsity in a transform domain: $f = Dz$ and many $z_j$ are zero.
- Discrete valued (DV): $f_j \in \{0, 1, \dots, K\}$
- Binary valued (BV): $f_j \in \{0, 1\}$
- Compactness: $f(r)$ is nonzero in one or a few non-overlapping compact regions.
- Combinations of the above constraints.

Main mathematical questions: which combinations result in a unique solution, and how to apply them?

Slide 8: Algebraic methods: regularization

- Minimum norm solution: minimize $\|f\|_2^2$ s.t. $Hf = g$: $\hat f = H^t (H H^t)^{-1} g$
- Least squares solution: $\hat f = \arg\min_f \{ J(f) = \|g - Hf\|^2 \}$: $\hat f = (H^t H)^{-1} H^t g$
- Quadratic regularization: $J(f) = \|g - Hf\|^2 + \lambda \|f\|_2^2$: $\hat f = (H^t H + \lambda I)^{-1} H^t g$
- L1 regularization: $J(f) = \|g - Hf\|^2 + \lambda \|f\|_1$
- Lpq regularization: $J(f) = \|g - Hf\|_p^p + \lambda \|Df\|_q^q$
- More general regularization: $J(f) = \sum_i \phi(g_i - [Hf]_i) + \lambda \sum_j \psi([Df]_j)$ with convex potential functions $\phi$ and $\psi$, or $J(f) = \Delta_1(g, Hf) + \lambda \Delta_2(f, f_0)$ with $\Delta_1$ and $\Delta_2$ any distances (L2, L1, ...) or divergences (KL).
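As an illustration, here is a hedged sketch of the quadratic case, solved matrix-free by conjugate gradients on the normal equations $(H^t H + \lambda D^t D) f = H^t g$. The finite-difference operator and the reuse of the toy $H$, $g$ from the previous sketch are my assumptions, not the authors' setup.

```python
import numpy as np
from scipy.sparse import diags, eye, kron, vstack
from scipy.sparse.linalg import LinearOperator, cg

def first_differences(n):
    """Horizontal and vertical first differences of an n x n image, stacked."""
    d = diags([-1.0, 1.0], [0, 1], shape=(n - 1, n))
    return vstack([kron(eye(n), d), kron(d, eye(n))]).tocsr()

n, lam = 64, 0.1
D = first_differences(n)
# H (shape (M, n*n)) and g (shape (M,)) reused from the previous toy sketch
normal_op = LinearOperator(
    (n * n, n * n),
    matvec=lambda f: H.T @ (H @ f) + lam * (D.T @ (D @ f)))
f_qr, info = cg(normal_op, H.T @ g, maxiter=500)   # info == 0 on convergence
```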

Slide 9: Deterministic approaches

- Iterative methods: SIRT, ART, quadratic or L1 regularization, block coordinate descent, multiplicative ART, ...
- Criterion: $J(f) = \|g - Hf\|^2 + \lambda \|Df\|^2$
- Gradient based algorithms: $\nabla J(f) = -2 H^t (g - Hf) + 2\lambda D^t D f$
- Simplest algorithm (gradient descent): $\hat f^{(k+1)} = \hat f^{(k)} + \alpha^{(k)} \left[ H^t (g - H \hat f^{(k)}) - \lambda D^t D \hat f^{(k)} \right]$
- More criteria: $J(f) = \sum_i \phi(g_i - [Hf]_i) + \lambda \sum_j \psi([Df]_j)$ with $\phi(t)$ and $\psi(t) \in \{ t^2, |t|, |t|^p, \dots \}$, or nonconvex ones.
- Constraints can be imposed at each iteration (example: DART).
- Mathematical studies of the uniqueness and convergence of these algorithms are necessary.
- Many specialized algorithms (ISTA, FISTA, ADMM, AMP, GAMP, ...) have been developed for L1 regularization, as sketched below.
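As a concrete instance of those specialized L1 solvers, here is a minimal ISTA sketch for the rescaled criterion $J(f) = \tfrac{1}{2}\|g - Hf\|^2 + \lambda \|f\|_1$; it is an illustration, not the implementation used in the talk.

```python
import numpy as np

def ista(H, g, lam, n_iter=300):
    """Iterative soft-thresholding for J(f) = 0.5 ||g - Hf||^2 + lam ||f||_1."""
    L = np.linalg.norm(H, 2) ** 2        # Lipschitz constant of the smooth part
    f = np.zeros(H.shape[1])
    for _ in range(n_iter):
        u = f - H.T @ (H @ f - g) / L    # gradient step on the quadratic term
        f = np.sign(u) * np.maximum(np.abs(u) - lam / L, 0.0)   # soft threshold
    return f

f_l1 = ista(H, g, lam=0.05)              # H, g from the toy sketches above
```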

Slide 10: Bayesian estimation approach

$\mathcal{M}$: $g = Hf + \epsilon$

- Observation model $\mathcal{M}$ plus a hypothesis on the noise $\epsilon$: $p(g|f; \mathcal{M}) = p_\epsilon(g - Hf)$
- A priori information: $p(f|\mathcal{M})$
- Bayes: $p(f|g; \mathcal{M}) = \dfrac{p(g|f; \mathcal{M})\, p(f|\mathcal{M})}{p(g|\mathcal{M})}$

Gaussian priors:
- Prior knowledge on the noise: $\epsilon \sim \mathcal{N}(0, v_\epsilon^2 I)$
- Prior knowledge on $f$: $f \sim \mathcal{N}(0, v_f^2 (D^t D)^{-1})$

A posteriori: $p(f|g) \propto \exp\left[ -\frac{1}{2 v_\epsilon^2} \|g - Hf\|^2 - \frac{1}{2 v_f^2} \|Df\|^2 \right]$

MAP: $\hat f = \arg\max_f \{ p(f|g) \} = \arg\min_f \{ J(f) \}$ with $J(f) = \|g - Hf\|^2 + \lambda \|Df\|^2$, $\lambda = v_\epsilon^2 / v_f^2$.

Advantage: characterization of the solution: $p(f|g) = \mathcal{N}(\hat f, \hat\Sigma)$ with $\hat f = (H^t H + \lambda D^t D)^{-1} H^t g$ and $\hat\Sigma = v_\epsilon^2 (H^t H + \lambda D^t D)^{-1}$.
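At toy sizes this posterior characterization can be computed explicitly. The sketch below (my illustration, reusing the toy $H$, $D$, $g$ from the earlier sketches) forms $\hat f$ and $\hat\Sigma$ and reads off a per-pixel uncertainty map from $\mathrm{diag}(\hat\Sigma)$.

```python
import numpy as np

v_eps, v_f = 1e-2, 1.0                   # noise and prior variances (toy values)
lam = v_eps / v_f                        # the slide's lambda, with v_* as variances
A = H.T @ H + lam * (D.T @ D).toarray()  # posterior precision (up to 1/v_eps)
A_inv = np.linalg.inv(A)                 # feasible only for small toy problems
f_hat = A_inv @ (H.T @ g)                # MAP estimate = posterior mean
Sigma = v_eps * A_inv                    # posterior covariance
pixel_std = np.sqrt(np.diag(Sigma))      # per-pixel uncertainty map
```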

Slide 11: Sparsity enforcing models

Three classes of models:
1. Generalized Gaussian (double exponential as a particular case)
2. Mixture models (two-Gaussian mixture, Bernoulli-Gaussian, ...)
3. Heavy tailed (Student-t and Cauchy)

Student-t model: $\mathcal{St}(f|\nu) \propto \exp\left[ -\frac{\nu + 1}{2} \log\left( 1 + f^2/\nu \right) \right]$

Infinite Gaussian Scale Mixture (IGSM) equivalence:

$$\mathcal{St}(f|\nu) = \int_0^\infty \mathcal{N}(f|0, 1/z)\, \mathcal{G}(z|\alpha, \beta)\, dz, \quad \text{with } \alpha = \beta = \nu/2$$

$$p(f|z) = \prod_j p(f_j|z_j) = \prod_j \mathcal{N}(f_j|0, 1/z_j) \propto \exp\left[ -\tfrac{1}{2} \textstyle\sum_j z_j f_j^2 \right]$$

$$p(z|\alpha, \beta) = \prod_j \mathcal{G}(z_j|\alpha, \beta) \propto \prod_j z_j^{\alpha - 1} \exp[-\beta z_j] = \exp\left[ \textstyle\sum_j (\alpha - 1) \ln z_j - \beta z_j \right]$$

$$p(f, z|\alpha, \beta) \propto \exp\left[ -\tfrac{1}{2} \textstyle\sum_j z_j f_j^2 + \sum_j (\alpha - 1) \ln z_j - \beta z_j \right]$$
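The IGSM identity is easy to verify numerically: drawing $z \sim \mathcal{G}(\nu/2, \nu/2)$ and then $f|z \sim \mathcal{N}(0, 1/z)$ must reproduce direct Student-t draws. A quick sketch (note NumPy's gamma is parameterized by shape and scale, so the rate $\beta = \nu/2$ becomes scale $2/\nu$):

```python
import numpy as np

rng = np.random.default_rng(1)
nu, n = 3.0, 200_000
z = rng.gamma(shape=nu / 2, scale=2.0 / nu, size=n)  # G(z | nu/2, rate nu/2)
f_mix = rng.normal(0.0, 1.0 / np.sqrt(z))            # f | z ~ N(0, 1/z)
f_direct = rng.standard_t(df=nu, size=n)             # direct Student-t draws

# The two samples should have matching quantiles (up to Monte Carlo error)
print(np.quantile(f_mix,    [0.90, 0.99]))
print(np.quantile(f_direct, [0.90, 0.99]))
```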

Slide 12: Sparse model in a transform domain

$f$ piecewise continuous $\Rightarrow$ the gradient of $f$ is sparse $\Rightarrow$ $z = $ Haar transform of $f$ is sparse.

- Analysis: $J(f) = \|g - Hf\|_2^2 + \lambda \|Df\|_1$
- Synthesis: $f = Dz$, $J(z) = \|g - HDz\|^2 + \lambda \|z\|_1$
- Explicit modelling: $f = Dz + \xi$, $z$ sparse, $\xi$ Gaussian with unknown variance.

Slide 13: Sparse model in a transform domain: hierarchical model

$g = Hf + \epsilon$, $f = Dz + \xi$, $z$ sparse.

- $p(g|f, v_\epsilon) = \mathcal{N}(g|Hf, V_\epsilon)$, $V_\epsilon = \mathrm{diag}[v_\epsilon]$
- $p(f|z, v_\xi) = \mathcal{N}(f|Dz, V_\xi)$, $V_\xi = \mathrm{diag}[v_\xi]$
- $p(z|v_z) = \mathcal{N}(z|0, V_z)$, $V_z = \mathrm{diag}[v_z]$
- $p(v_\epsilon) = \prod_i \mathcal{IG}(v_{\epsilon_i}|\alpha_{\epsilon 0}, \beta_{\epsilon 0})$
- $p(v_z) = \prod_j \mathcal{IG}(v_{z_j}|\alpha_{z0}, \beta_{z0})$
- $p(v_\xi) = \prod_j \mathcal{IG}(v_{\xi_j}|\alpha_{\xi 0}, \beta_{\xi 0})$

$$p(f, z, v_\epsilon, v_z, v_\xi | g) \propto p(g|f, v_\epsilon)\, p(f|z, v_\xi)\, p(z|v_z)\, p(v_\epsilon)\, p(v_z)\, p(v_\xi)$$

(Figure: the corresponding directed graphical model, with the hyperparameters $(\alpha_{\epsilon 0}, \beta_{\epsilon 0})$, $(\alpha_{z0}, \beta_{z0})$, $(\alpha_{\xi 0}, \beta_{\xi 0})$ at the top and the data $g$ at the bottom.)

JMAP: $(\hat f, \hat z, \hat v_\epsilon, \hat v_z, \hat v_\xi) = \arg\max_{(f, z, v_\epsilon, v_z, v_\xi)} \{ p(f, z, v_\epsilon, v_z, v_\xi | g) \}$, by alternate optimization.

VBA: approximate $p(f, z, v_\epsilon, v_z, v_\xi | g)$ by $q_1(f)\, q_2(z)\, q_3(v_\epsilon)\, q_4(v_z)\, q_5(v_\xi)$, by alternate optimization.

Slide 14: JMAP algorithm

$$[\hat f, \hat z, \hat v_z, \hat v_\epsilon, \hat v_\xi] = \arg\max_{(f, z, v_z, v_\epsilon, v_\xi)} \{ p(f, z, v_z, v_\epsilon, v_\xi | g) \}$$

Iterate $\hat f^{(k+1)} = \hat f^{(k)} - \gamma_f^{(k)} \nabla J(\hat f^{(k)})$ and $\hat z^{(k+1)} = \hat z^{(k)} - \gamma_z^{(k)} \nabla J(\hat z^{(k)})$, with the variance updates

$$\hat v_{z_j} = \frac{\beta_{z0} + \hat z_j^2 / 2}{\alpha_{z0} + 3/2}, \qquad \hat v_{\epsilon_i} = \frac{\beta_{\epsilon 0} + \tfrac{1}{2} (g_i - [H \hat f]_i)^2}{\alpha_{\epsilon 0} + 3/2}, \qquad \hat v_{\xi_j} = \frac{\beta_{\xi 0} + \tfrac{1}{2} (\hat f_j - [D \hat z]_j)^2}{\alpha_{\xi 0} + 3/2}$$

where

$$J(f) = \tfrac{1}{2} (g - Hf)^t V_\epsilon^{-1} (g - Hf) + \tfrac{1}{2} (f - Dz)^t V_\xi^{-1} (f - Dz)$$
$$J(z) = \tfrac{1}{2} (f - Dz)^t V_\xi^{-1} (f - Dz) + \tfrac{1}{2} z^t V_z^{-1} z$$

and the step sizes are

$$\gamma_f^{(k)} = \frac{\| \nabla J(\hat f^{(k)}) \|^2}{\| V_\epsilon^{-1/2} H \nabla J(\hat f^{(k)}) \|^2 + \| V_\xi^{-1/2} \nabla J(\hat f^{(k)}) \|^2}, \qquad \gamma_z^{(k)} = \frac{\| \nabla J(\hat z^{(k)}) \|^2}{\| V_\xi^{-1/2} D \nabla J(\hat z^{(k)}) \|^2 + \| V_z^{-1/2} \nabla J(\hat z^{(k)}) \|^2}$$

where $\nabla J(\cdot)$ is the gradient of $J(\cdot)$.
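A schematic NumPy version of this JMAP loop (my sketch with dense toy operators, not the authors' GPU code); it implements the gradient steps, the adaptive step sizes $\gamma_f$, $\gamma_z$ and the three variance updates above.

```python
import numpy as np

def jmap(H, D, g, a0, b0, n_iter=50, inner=3):
    """a0, b0: dicts of hyperparameters with keys 'eps', 'xi', 'z'."""
    f, z = np.zeros(H.shape[1]), np.zeros(D.shape[1])
    v_eps = np.ones(H.shape[0])
    v_xi, v_z = np.ones(H.shape[1]), np.ones(D.shape[1])
    for _ in range(n_iter):
        for _ in range(inner):                     # descent steps on f
            grad = H.T @ ((H @ f - g) / v_eps) + (f - D @ z) / v_xi
            gamma = (grad @ grad) / (((H @ grad) ** 2 / v_eps).sum()
                                     + (grad ** 2 / v_xi).sum())
            f -= gamma * grad
        for _ in range(inner):                     # descent steps on z
            grad = D.T @ ((D @ z - f) / v_xi) + z / v_z
            gamma = (grad @ grad) / (((D @ grad) ** 2 / v_xi).sum()
                                     + (grad ** 2 / v_z).sum())
            z -= gamma * grad
        v_z = (b0['z'] + 0.5 * z ** 2) / (a0['z'] + 1.5)      # variance updates
        v_eps = (b0['eps'] + 0.5 * (g - H @ f) ** 2) / (a0['eps'] + 1.5)
        v_xi = (b0['xi'] + 0.5 * (f - D @ z) ** 2) / (a0['xi'] + 1.5)
    return f, z, v_eps, v_xi, v_z
```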

Slide 15: Implementation issues

In almost all of these algorithms, the computation of $\hat f$ needs an optimization algorithm. The criterion to optimize is often of the form $J(f) = \|g - Hf\|_2^2 + \lambda \|Df\|_2^2$ or $J(z) = \|g - HDz\|^2 + \lambda \|z\|_1$. Very often we use gradient based algorithms, which need $\nabla J(f) = -2 H^t (g - Hf) + 2\lambda D^t D f$; in the simplest case, each step is

$$\hat f^{(k+1)} = \hat f^{(k)} + \alpha^{(k)} \left[ H^t (g - H \hat f^{(k)}) - \lambda D^t D \hat f^{(k)} \right]$$

1. Compute $\hat g = H \hat f$ (forward projection)
2. Compute $\delta g = g - \hat g$ (error, or residual)
3. Compute $\delta f_1 = H^t \delta g$ (back projection of the error)
4. Compute $\delta f_2 = \lambda D^t D \hat f$ and update $\hat f^{(k+1)} = \hat f^{(k)} + \alpha^{(k)} [\delta f_1 - \delta f_2]$

Steps 1 and 3 carry the main computational cost and have been implemented on GPU. In this work, we used the ASTRA Toolbox, as sketched below.
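Sketched below with the ASTRA Toolbox's Python bindings. Hedged: the geometry, sizes and step size are illustrative choices of mine, and the exact API can vary slightly between ASTRA versions; `astra.OpTomo` wraps the projector as a SciPy-style linear operator so that steps 1 and 3 run on the GPU.

```python
import numpy as np
import astra

n, n_angles, n_det = 256, 64, 384
vol_geom = astra.create_vol_geom(n, n)
angles = np.linspace(0, np.pi, n_angles, endpoint=False)
proj_geom = astra.create_proj_geom('parallel', 1.0, n_det, angles)
proj_id = astra.create_projector('cuda', proj_geom, vol_geom)  # 'linear' on CPU
Hop = astra.OpTomo(proj_id)          # Hop @ f: forward, Hop.T @ g: back projection

# g: the measured sinogram, flattened to shape (n_angles * n_det,)
f = np.zeros(n * n, dtype=np.float32)
alpha, lam = 1e-4, 0.1
for k in range(100):
    g_hat = Hop @ f                  # 1. forward projection
    dg = g - g_hat                   # 2. error / residual
    df1 = Hop.T @ dg                 # 3. back projection of the error
    df2 = lam * f                    # 4. regularization gradient (here D = I)
    f = f + alpha * (df1 - df2)
```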

Slide 16: Multi-resolution implementation

- Scale 1 (black): $g^{(1)} = H^{(1)} f^{(1)}$, grid $N \times N$
- Scale 2 (green): $g^{(2)} = H^{(2)} f^{(2)}$, grid $N/2 \times N/2$
- Scale 3 (red): $g^{(3)} = H^{(3)} f^{(3)}$, grid $N/4 \times N/4$

(Figure: the three nested discretization grids; the colors refer to the slide's figure.) A coarse-to-fine sketch follows.
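A coarse-to-fine sketch under stated assumptions: a hypothetical `make_operator(n)` factory returns $H^{(s)}$ for an $n \times n$ grid (e.g. an `astra.OpTomo` built at that resolution), `g_per_scale[s]` holds the corresponding rebinned data, and each coarse estimate is upsampled to initialize the next scale.

```python
import numpy as np

def multires_reconstruct(g_per_scale, make_operator, N, n_scales=3,
                         n_iter=100, alpha=1e-3):
    f = None
    for s in reversed(range(n_scales)):           # s = 2, 1, 0: coarse to fine
        n = N // (2 ** s)
        H = make_operator(n)                      # H^(s) for the n x n grid
        if f is None:
            f = np.zeros(n * n)
        else:                                     # upsample the coarser estimate
            f = np.kron(f.reshape(n // 2, n // 2), np.ones((2, 2))).ravel()
        g = g_per_scale[s]
        for _ in range(n_iter):                   # plain Landweber steps
            f = f + alpha * (H.T @ (g - H @ f))
    return f
```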

Slide 17: Results

Phantom: $128 \times 128 \times 128$. Projections: 128, 64 and 32. SNR = 40 dB. (Figure: reconstructions from 128, 64 and 32 projections.)

Slide 18: Results

Phantom: $128 \times 128 \times 128$. Projections: 128. SNR = 20 dB. (Figure: reconstructions by QR, TV and HHBM.)

Slide 19: Results with high SNR = 30 dB

Relative errors: $\delta f = \|f - \hat f\| / \|f\|$ and $\delta g = \|H \hat f - g\| / \|g\|$. (Figure: $\delta f$ and $\delta g$ for the compared methods.)

Slide 20: Results: high SNR = 30 dB and low SNR = 20 dB

$\delta f = \|f - \hat f\| / \|f\|$ for high SNR = 30 dB and for low SNR = 20 dB. (Figure: the two $\delta f$ comparisons.)

Slide 21: Variational Bayesian Approximation (VBA)

Depending on the case, we have to handle $p(f, \theta|g)$, $p(f, z, \theta|g)$, $p(f, w, \theta|g)$ or $p(f, w, z, \theta|g)$. Consider the simplest case: approximate $p(f, \theta|g)$ by $q(f, \theta|g) = q_1(f|g)\, q_2(\theta|g)$ and then continue the computations.

Criterion: $\mathrm{KL}(q(f, \theta|g) : p(f, \theta|g))$:

$$\mathrm{KL}(q : p) = \iint q \ln \frac{q}{p} = \iint q_1 q_2 \ln \frac{q_1 q_2}{p} = \int q_1 \ln q_1 + \int q_2 \ln q_2 - \langle \ln p \rangle_q = -\mathcal{H}(q_1) - \mathcal{H}(q_2) - \langle \ln p \rangle_q$$

Iterative algorithm $q_1 \rightarrow q_2 \rightarrow q_1 \rightarrow q_2 \rightarrow \cdots$:

$$q_1(f) \propto \exp\left[ \langle \ln p(g, f, \theta; \mathcal{M}) \rangle_{q_2(\theta)} \right], \qquad q_2(\theta) \propto \exp\left[ \langle \ln p(g, f, \theta; \mathcal{M}) \rangle_{q_1(f)} \right]$$

A toy example of this alternation is sketched below.
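To make the alternation concrete, here is a toy VBA (my sketch, not from the talk) for $g = Hf + \epsilon$ with a fixed Gaussian prior on $f$ and a single unknown noise variance $v_\epsilon \sim \mathcal{IG}(a_0, b_0)$; the two updates below are exactly the $q_1 \rightarrow q_2$ cycle with conjugate families.

```python
import numpy as np

def toy_vba(H, g, v_f=1.0, a0=1e-3, b0=1e-3, n_iter=30):
    """q1(f) = N(mu, Sigma), q2(v_eps) = IG(a, b), updated alternately."""
    M, N = H.shape
    HtH, Htg = H.T @ H, H.T @ g
    a, b = a0, b0
    for _ in range(n_iter):
        w = a / b                                     # <1/v_eps> under q2
        Sigma = np.linalg.inv(w * HtH + np.eye(N) / v_f)
        mu = w * (Sigma @ Htg)                        # update of q1(f)
        resid = g - H @ mu
        a = a0 + M / 2                                # update of q2(v_eps)
        b = b0 + 0.5 * (resid @ resid + (HtH * Sigma).sum())  # tr(H^t H Sigma)
    return mu, Sigma, a, b
```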

Slide 22: JMAP, marginalization, sampling and exploration, VBA

- JMAP: alternate optimization of the joint posterior $p(f, \theta|g)$: $\hat f = \arg\max_f \{ p(f|\hat\theta, g) \}$ and $\hat\theta = \arg\max_\theta \{ p(\theta|\hat f, g) \}$.
- Marginalization: marginalize the joint posterior over $f$ to obtain $p(\theta|g)$, estimate $\hat\theta$, then use $p(f|\hat\theta, g)$ to estimate $\hat f$.
- Sampling and exploration: Gibbs sampling: $f \sim p(f|\theta, g)$, $\theta \sim p(\theta|f, g)$; other sampling methods: IS, MH, slice sampling, ...
- Variational Bayesian Approximation: approximate $p(f, \theta|g)$ by $q_1(f)\, q_2(\theta)$, then use $q_1(f)$ for $\hat f$ and $q_2(\theta)$ for $\hat\theta$.

Slide 23: VBA: choice of the families of laws $q_1$ and $q_2$

Case 1: Joint MAP:
$$q_1(f|\tilde f) = \delta(f - \tilde f), \quad q_2(\theta|\tilde\theta) = \delta(\theta - \tilde\theta) \;\Longrightarrow\; \tilde f = \arg\max_f \{ p(f, \tilde\theta | g; \mathcal{M}) \}, \quad \tilde\theta = \arg\max_\theta \{ p(\tilde f, \theta | g; \mathcal{M}) \}$$

Case 2: EM:
$$q_1(f) \propto p(f|\theta, g), \quad q_2(\theta|\tilde\theta) = \delta(\theta - \tilde\theta) \;\Longrightarrow\; Q(\theta, \tilde\theta) = \langle \ln p(f, \theta | g; \mathcal{M}) \rangle_{q_1(f|\tilde\theta)}, \quad \tilde\theta = \arg\max_\theta \{ Q(\theta, \tilde\theta) \}$$

Appropriate choice for inverse problems:
$$q_1(f) \propto p(f|\hat\theta, g; \mathcal{M}), \qquad q_2(\theta) \propto p(\theta|\hat f, g; \mathcal{M})$$
which accounts for the uncertainties of $\hat\theta$ for $f$ and vice versa. Exponential families, conjugate priors.

Slide 24: JMAP, EM and VBA

JMAP, alternate optimization algorithm: start from $\hat\theta^{(0)}$ and iterate
$$\hat f = \arg\max_f \{ p(f, \hat\theta | g) \}, \qquad \hat\theta = \arg\max_\theta \{ p(\hat f, \theta | g) \}$$

EM: start from $\hat\theta^{(0)}$ and iterate
$$q_1(f) = p(f|\hat\theta, g), \qquad Q(\theta, \hat\theta) = \langle \ln p(f, \theta | g) \rangle_{q_1(f)}, \qquad \hat\theta = \arg\max_\theta \{ Q(\theta, \hat\theta) \}$$

VBA: start from an initial $q_2(\theta)$ (i.e. $\hat\theta^{(0)}$) and iterate
$$q_1(f) \propto \exp\left[ \langle \ln p(f, \theta | g) \rangle_{q_2(\theta)} \right], \qquad q_2(\theta) \propto \exp\left[ \langle \ln p(f, \theta | g) \rangle_{q_1(f)} \right]$$

Slide 25: VBA for the hierarchical model

Approximate $p(f, z, v_z, v_\xi, v_\epsilon | g)$ by $q_1(f)\, q_2(z)\, q_3(v_z)\, q_4(v_\xi)\, q_5(v_\epsilon)$ and do alternate optimization with

$$q_1(f) = \mathcal{N}(f|\mu_f, \Sigma_f), \qquad q_2(z) = \mathcal{N}(z|\mu_z, \Sigma_z)$$
$$q_3(v_z) = \prod_j \mathcal{IG}(v_{z_j}|\tilde\alpha_{z_j}, \tilde\beta_{z_j}), \quad q_4(v_\xi) = \prod_j \mathcal{IG}(v_{\xi_j}|\tilde\alpha_{\xi_j}, \tilde\beta_{\xi_j}), \quad q_5(v_\epsilon) = \prod_i \mathcal{IG}(v_{\epsilon_i}|\tilde\alpha_{\epsilon_i}, \tilde\beta_{\epsilon_i})$$

with the Gaussian updates

$$\Sigma_f = \left( H^t \hat V_\epsilon^{-1} H + \hat V_\xi^{-1} \right)^{-1}, \qquad \mu_f = \Sigma_f \left( H^t \hat V_\epsilon^{-1} g + \hat V_\xi^{-1} D \hat z \right)$$
$$\Sigma_z = \left( D^t \hat V_\xi^{-1} D + \hat V_z^{-1} \right)^{-1}, \qquad \mu_z = \Sigma_z\, D^t \hat V_\xi^{-1} \hat f$$

One update sweep is sketched below.
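A single VBA sweep over $q_1(f)$ and $q_2(z)$ for this model, in dense toy form (my sketch; $H$ and $D$ are plain arrays, and the diagonal variance matrices are passed as 1-D vectors). The $\mu_f$ update combines the data term and the prior mean $D \hat z$ as written above.

```python
import numpy as np

def vba_sweep(H, D, g, mu_z, v_eps, v_xi, v_z):
    """One update of q1(f) = N(mu_f, Sigma_f) and q2(z) = N(mu_z, Sigma_z)."""
    # Sigma_f = (H^t V_eps^{-1} H + V_xi^{-1})^{-1}
    Sigma_f = np.linalg.inv(H.T @ (H / v_eps[:, None]) + np.diag(1.0 / v_xi))
    # mu_f = Sigma_f (H^t V_eps^{-1} g + V_xi^{-1} D mu_z)
    mu_f = Sigma_f @ (H.T @ (g / v_eps) + (D @ mu_z) / v_xi)
    # Sigma_z = (D^t V_xi^{-1} D + V_z^{-1})^{-1},  mu_z = Sigma_z D^t V_xi^{-1} mu_f
    Sigma_z = np.linalg.inv(D.T @ (D / v_xi[:, None]) + np.diag(1.0 / v_z))
    mu_z = Sigma_z @ (D.T @ (mu_f / v_xi))
    return mu_f, Sigma_f, mu_z, Sigma_z
```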

Slide 26: JMAP versus VBA variance updates

JMAP:
$$[\hat f, \hat z, \hat v_z, \hat v_\epsilon, \hat v_\xi] = \arg\max_{(f, z, v_z, v_\epsilon, v_\xi)} \{ p(f, z, v_z, v_\epsilon, v_\xi | g) \}$$
iterate $\hat f^{(k+1)} = \hat f^{(k)} - \gamma_f^{(k)} \nabla J(\hat f^{(k)})$, $\hat z^{(k+1)} = \hat z^{(k)} - \gamma_z^{(k)} \nabla J(\hat z^{(k)})$, and

$$\hat v_{z_j} = \frac{\beta_{z0} + \hat z_j^2 / 2}{\alpha_{z0} + 3/2}, \qquad \hat v_{\epsilon_i} = \frac{\beta_{\epsilon 0} + \tfrac{1}{2} (g_i - [H \hat f]_i)^2}{\alpha_{\epsilon 0} + 3/2}, \qquad \hat v_{\xi_j} = \frac{\beta_{\xi 0} + \tfrac{1}{2} (\hat f_j - [D \hat z]_j)^2}{\alpha_{\xi 0} + 3/2}$$

In VBA, the expressions of $\hat v_{z_j}$, $\hat v_{\epsilon_i}$ and $\hat v_{\xi_j}$ are more complex: the squared terms are replaced by posterior expectations,

$$\hat v_{z_j} = \frac{\beta_{z0} + \langle z_j^2 \rangle / 2}{\alpha_{z0} + 3/2}, \qquad \hat v_{\epsilon_i} = \frac{\beta_{\epsilon 0} + \tfrac{1}{2} \langle (g_i - [H f]_i)^2 \rangle}{\alpha_{\epsilon 0} + 3/2}, \qquad \hat v_{\xi_j} = \frac{\beta_{\xi 0} + \tfrac{1}{2} \langle (f_j - [D z]_j)^2 \rangle}{\alpha_{\xi 0} + 3/2}$$

and their computation needs the diagonal elements of the posterior covariances. Specific techniques are needed to compute them efficiently.
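The slide leaves those techniques unspecified; one standard matrix-free option (my suggestion, not necessarily the authors' choice) is a Hutchinson-type probing estimator of $\mathrm{diag}(\Sigma)$, which needs only solves against the posterior precision $A = \Sigma^{-1}$:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def estimate_diag_cov(precision_matvec, n, n_probes=64, seed=0):
    """Estimate diag(Sigma), Sigma = A^{-1}, given only x -> A x."""
    rng = np.random.default_rng(seed)
    A = LinearOperator((n, n), matvec=precision_matvec)
    num, den = np.zeros(n), np.zeros(n)
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=n)      # Rademacher probe vector
        x, _ = cg(A, v, maxiter=200)             # x ~= Sigma v
        num += v * x
        den += v * v
    return num / den                             # elementwise diag(Sigma) estimate
```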

Slide 27: Conclusions

- Computed Tomography is an ill-posed inverse problem.
- Algebraic methods and regularization methods push the limitations further.
- The Bayesian approach has great potential for real applications where we need to estimate the hyperparameters (semi-supervised) and to quantify the uncertainties.
- Hierarchical prior models with hidden variables are very powerful tools for the Bayesian approach to inverse problems.
- Main Bayesian computation tools: JMAP, VBA, AMP and MCMC.
- Applications in other imaging systems: microwaves, PET, ultrasound, Optical Diffusion Tomography (ODT), ...
- Current projects: efficient implementation in 3D to reconstruct $1024 \times 1024 \times 1024$ volumes using GPU.
