Inverse Problems: From Regularization to Bayesian Inference
Inverse Problems: From Regularization to Bayesian Inference
An Overview on Prior Modeling and Bayesian Computation — Application to Computed Tomography
Ali Mohammad-Djafari, Groupe Problèmes Inverses, Laboratoire des Signaux et Systèmes, UMR 8506 CNRS - SUPELEC - Univ Paris Sud 11, Supélec, Plateau de Moulon, Gif-sur-Yvette, FRANCE. djafari@lss.supelec.fr
Applied Math Colloquium, UCLA, USA, July 17, 2009. Organizer: Stanley Osher
Content
- Image reconstruction in Computed Tomography: an ill-posed inverse problem
- Two main steps in the Bayesian approach: prior modeling and Bayesian computation
- Prior models for images: separable (Gaussian, GG, ...); Gauss-Markov and general one-layer Markovian models; hierarchical Markovian models with hidden variables (contours and regions): Gauss-Markov-Potts
- Bayesian computation: MCMC; Variational and Mean Field approximations (VBA, MFA)
- Application: Computed Tomography in NDT
- Conclusions and work in progress
- Questions and discussion
Computed Tomography: making an image of the interior of a body
- f(x, y): a section of a real 3D body f(x, y, z)
- g_φ(r): a line of the observed radiograph g_φ(r, z)
Forward model: line integrals (Radon transform)
  g_φ(r) = ∫_{L_{r,φ}} f(x, y) dl + ǫ_φ(r) = ∫∫ f(x, y) δ(r − x cos φ − y sin φ) dx dy + ǫ_φ(r)
Inverse problem (image reconstruction): given the forward model H (Radon transform) and a set of data g_{φ_i}(r), i = 1, ..., M, find f(x, y)
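A discretized version of this forward model can be sketched in a few lines of Python. This is not the speaker's code, only a minimal illustration: the line integrals at angle φ are realized by rotating the image with scipy's `rotate` and summing along columns, and the square phantom is a hypothetical example.

```python
import numpy as np
from scipy.ndimage import rotate

def radon_projection(f, phi_deg):
    """One parallel-beam projection g_phi(r): rotate the image by phi
    degrees and integrate along the line direction (columns)."""
    rotated = rotate(f, phi_deg, reshape=False, order=1)
    return rotated.sum(axis=0)

# hypothetical phantom: a centered square of density 1
f = np.zeros((64, 64))
f[24:40, 24:40] = 1.0

g0 = radon_projection(f, 0.0)    # projection at phi = 0
g90 = radon_projection(f, 90.0)  # projection at phi = 90 degrees
```

Each projection preserves the total mass of the object (∫ g_φ(r) dr = ∫∫ f dx dy), which is a quick sanity check on any discrete implementation.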
2D and 3D Computed Tomography
- 3D: g_φ(r₁, r₂) = ∫_{L_{r₁,r₂,φ}} f(x, y, z) dl
- 2D: g_φ(r) = ∫_{L_{r,φ}} f(x, y) dl
Forward problem: f(x, y) or f(x, y, z) → g_φ(r) or g_φ(r₁, r₂)
Inverse problem: g_φ(r) or g_φ(r₁, r₂) → f(x, y) or f(x, y, z)
X-ray Tomography
  g(r, φ) = −ln(I/I₀) = ∫_{L_{r,φ}} f(x, y) dl
  g(r, φ) = ∫∫_D f(x, y) δ(r − x cos φ − y sin φ) dx dy
f(x, y) —RT→ g(r, φ);  g(r, φ) —IRT?→ f(x, y)
Analytical inversion methods
Radon:
  g(r, φ) = ∫_{L_{r,φ}} f(x, y) dl = ∫∫_D f(x, y) δ(r − x cos φ − y sin φ) dx dy
  f(x, y) = (−1/2π²) ∫₀^π ∫_{−∞}^{+∞} [∂g(r, φ)/∂r] / (r − x cos φ − y sin φ) dr dφ
Filtered Backprojection method
  f(x, y) = (−1/2π²) ∫₀^π ∫_{−∞}^{+∞} [∂g(r, φ)/∂r] / (r − x cos φ − y sin φ) dr dφ
- Derivation D: ḡ(r, φ) = ∂g(r, φ)/∂r
- Hilbert transform H: g₁(r′, φ) = (1/π) ∫₀^∞ ḡ(r, φ) / (r − r′) dr
- Backprojection B: f(x, y) = (1/2π) ∫₀^π g₁(r = x cos φ + y sin φ, φ) dφ
  f(x, y) = B H D g(r, φ) = B F₁⁻¹ |Ω| F₁ g(r, φ)
Backprojection of filtered projections:
  g(r, φ) → FT F₁ → filter |Ω| → IFT F₁⁻¹ → g₁(r, φ) → backprojection B → f(x, y)
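The filter-then-backproject pipeline above can be sketched as follows. This is only an illustrative skeleton, not the speaker's implementation: normalization and sign conventions for the ramp filter |Ω| vary between texts, and the backprojection reuses scipy's `rotate` to smear each filtered projection across the image plane.

```python
import numpy as np
from scipy.ndimage import rotate

def ramp_filter(sino):
    """F^{-1} |Omega| F step: multiply each projection row by |Omega|
    in the Fourier domain (kills the DC component, boosts high freqs)."""
    freqs = np.abs(np.fft.fftfreq(sino.shape[-1]))
    return np.real(np.fft.ifft(np.fft.fft(sino, axis=-1) * freqs, axis=-1))

def backproject(sino, angles_deg, size):
    """Backprojection B: smear each (filtered) projection back over the
    image plane at its acquisition angle and accumulate."""
    recon = np.zeros((size, size))
    for g1, phi in zip(sino, angles_deg):
        recon += rotate(np.tile(g1, (size, 1)), phi, reshape=False, order=1)
    return recon * np.pi / (2 * len(angles_deg))

# toy usage with a hypothetical flat sinogram (all ones)
angles = np.linspace(0.0, 180.0, 8, endpoint=False)
fbp = backproject(ramp_filter(np.ones((8, 32))), angles, 32)
```

A flat projection carries only a DC component, so the ramp filter maps it to zero: the reconstruction of a constant sinogram is identically zero, which is a convenient unit test.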
Limitations: limited angle or noisy data
(figure: original; 64 proj.; 16 proj.; 8 proj.; projections limited to [0, π/2])
- Limited angle or noisy data
- Accounting for detector size
- Other measurement geometries: fan beam, ...
CT as a linear inverse problem
Fan-beam X-ray tomography: source positions, detector positions
  g(s_i) = ∫_{L_i} f(r) dl + ǫ(s_i)
Discretization: g = Hf + ǫ
g, f and H are of huge dimensions
Inversion: deterministic methods (data matching)
Observation model: g_i = h_i(f) + ǫ_i, i = 1, ..., M  →  g = H(f) + ǫ
Mismatch between data and model output: Δ(g, H(f))
  f̂ = arg min_f { Δ(g, H(f)) }
Examples:
- LS: Δ(g, H(f)) = ‖g − H(f)‖² = Σ_i |g_i − h_i(f)|²
- L_p: Δ(g, H(f)) = ‖g − H(f)‖^p = Σ_i |g_i − h_i(f)|^p, 1 < p < 2
- KL: Δ(g, H(f)) = Σ_i g_i ln( g_i / h_i(f) )
In general, this does not give satisfactory results for inverse problems.
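The three misfit criteria are straightforward to write down. A minimal sketch (the array values are hypothetical; the KL misfit assumes strictly positive data, as in photon-count tomography):

```python
import numpy as np

def delta_ls(g, Hf):
    """Least-squares misfit ||g - H(f)||^2."""
    return np.sum((g - Hf) ** 2)

def delta_lp(g, Hf, p):
    """L_p misfit sum_i |g_i - h_i(f)|^p, 1 < p < 2."""
    return np.sum(np.abs(g - Hf) ** p)

def delta_kl(g, Hf):
    """Kullback-Leibler misfit sum_i g_i ln(g_i / h_i(f)), positive data."""
    return np.sum(g * np.log(g / Hf))

g = np.array([1.0, 2.0, 3.0])    # hypothetical data
Hf = np.array([1.5, 2.0, 2.5])   # hypothetical model output
```

All three vanish when the model output matches the data exactly; they differ in how heavily they penalize large residuals (L_p with p < 2 is more robust to outliers than LS).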
Inversion: regularization theory
Inverse problems = ill-posed problems → need for prior information
- Functional space (Tikhonov): g = H(f) + ǫ → J(f) = ‖g − H(f)‖₂² + λ‖Df‖₂²
- Finite-dimensional space (Phillips & Twomey): g = H(f) + ǫ
  - Minimum-norm LS (MNLS): J(f) = ‖g − H(f)‖² + λ‖f‖²
  - Classical regularization: J(f) = ‖g − H(f)‖² + λ‖Df‖²
  - More general regularization: J(f) = Q(g − H(f)) + λΩ(Df), or J(f) = Δ₁(g, H(f)) + λΔ₂(f, f₀)
Limitations:
- Errors are implicitly assumed white and Gaussian
- Limited prior information on the solution
- Lack of tools for determining the hyperparameters
Bayesian estimation approach
M: g = Hf + ǫ
- Observation model M + hypothesis on the noise ǫ → p(g | f; M) = p_ǫ(g − Hf)
- A priori information: p(f | M)
- Bayes: p(f | g; M) = p(g | f; M) p(f | M) / p(g | M)
Link with regularization — Maximum A Posteriori (MAP):
  f̂ = arg max_f { p(f | g) } = arg max_f { p(g | f) p(f) } = arg min_f { −ln p(g | f) − ln p(f) }
with Q(g, Hf) = −ln p(g | f) and λΩ(f) = −ln p(f)
But Bayesian inference is not limited to MAP.
Case of linear models and Gaussian priors
  g = Hf + ǫ
- Hypothesis on the noise: ǫ ∼ N(0, σ_ǫ² I) → p(g | f) ∝ exp{ −‖g − Hf‖² / (2σ_ǫ²) }
- Hypothesis on f: f ∼ N(0, σ_f² (DᵀD)⁻¹) → p(f) ∝ exp{ −‖Df‖² / (2σ_f²) }
- A posteriori: p(f | g) ∝ exp{ −‖g − Hf‖²/(2σ_ǫ²) − ‖Df‖²/(2σ_f²) }
- MAP: f̂ = arg max_f {p(f | g)} = arg min_f {J(f)} with J(f) = ‖g − Hf‖² + λ‖Df‖², λ = σ_ǫ²/σ_f²
Advantage: characterization of the solution
  f | g ∼ N(f̂, P̂) with f̂ = P̂ Hᵀ g, P̂ = (HᵀH + λDᵀD)⁻¹
MAP estimation with other priors:
  f̂ = arg min_f {J(f)} with J(f) = ‖g − Hf‖² + λΩ(f)
Separable priors:
- Gaussian: p(f_j) ∝ exp{ −α|f_j|² } → Ω(f) = α Σ_j |f_j|²
- Gamma: p(f_j) ∝ f_j^α exp{ −βf_j } → Ω(f) = −Σ_j [α ln f_j − βf_j]
- Beta: p(f_j) ∝ f_j^α (1 − f_j)^β → Ω(f) = −Σ_j [α ln f_j + β ln(1 − f_j)]
- Generalized Gaussian: p(f_j) ∝ exp{ −α|f_j|^p }, 1 < p < 2 → Ω(f) = α Σ_j |f_j|^p
Markovian models: p(f_j | f) ∝ exp{ −α Σ_{i∈N_j} φ(f_j, f_i) } → Ω(f) = α Σ_j Σ_{i∈N_j} φ(f_j, f_i)
MAP estimation with Markovian priors
  f̂ = arg min_f {J(f)} with J(f) = ‖g − Hf‖² + λΩ(f), Ω(f) = Σ_j φ(f_j − f_{j−1})
with potential function φ(t):
- Convex: |t|^α; √(1 + t²) − 1; log(cosh(t)); Huber: φ(t) = t² if |t| ≤ T, 2T|t| − T² if |t| > T
- Non-convex: log(1 + t²); t²/(1 + t²); arctan(t²); truncated quadratic: φ(t) = t² if |t| ≤ T, T² if |t| > T
MAP estimation with different prior models
Model 1 (Gauss-Markov prior on f):
  g = Hf + ǫ, ǫ ∼ N(0, σ_ǫ² I); f = Cf + z, z ∼ N(0, σ_f² I), i.e. Df = z with D = I − C
  p(g | f) = N(Hf, σ_ǫ² I), p(f) = N(0, σ_f² (DᵀD)⁻¹)
  p(f | g) = N(f̂, P̂_f); f̂ = arg min_f {J(f)}, J(f) = ‖g − Hf‖² + λ‖Df‖²
  f̂ = P̂_f Hᵀ g, P̂_f = (HᵀH + λDᵀD)⁻¹
Model 2 (synthesis prior f = Wz):
  g = Hf + ǫ, ǫ ∼ N(0, σ_ǫ² I); f = Wz, z ∼ N(0, σ_z² I) → g = HWz + ǫ
  p(g | z) = N(HWz, σ_ǫ² I), p(z) = N(0, σ_z² I)
  p(z | g) = N(ẑ, P̂_z); ẑ = arg min_z {J(z)}, J(z) = ‖g − HWz‖² + λ‖z‖²
  ẑ = P̂_z Wᵀ Hᵀ g, P̂_z = (WᵀHᵀHW + λI)⁻¹
  f̂ = Wẑ, P̂_f = W P̂_z Wᵀ
MAP estimation with different prior models
- { g = Hf + ǫ; f ∼ N(0, σ_f² (DᵀD)⁻¹) } ≡ { g = Hf + ǫ; f = Cf + z with z ∼ N(0, σ_f² I); Df = z with D = I − C }
  f | g ∼ N(f̂, P̂_f) with f̂ = P̂_f Hᵀ g, P̂_f = (HᵀH + λDᵀD)⁻¹
  J(f) = −ln p(f | g) = ‖g − Hf‖² + λ‖Df‖²
- { g = Hf + ǫ; f ∼ N(0, σ_f² WWᵀ) } ≡ { g = Hf + ǫ; f = Wz with z ∼ N(0, σ_f² I) }
  z | g ∼ N(ẑ, P̂_z) with ẑ = P̂_z Wᵀ Hᵀ g, P̂_z = (WᵀHᵀHW + λI)⁻¹
  J(z) = −ln p(z | g) = ‖g − HWz‖² + λ‖z‖²
  f̂ = Wẑ, z: decomposition coefficients
MAP estimation and Compressed Sensing
  g = Hf + ǫ, f = Wz, W a codebook matrix, z coefficients
- Gaussian: p(z) = N(0, σ_z² I) ∝ exp{ −(1/2σ_z²) Σ_j |z_j|² }
  J(z) = −ln p(z | g) = ‖g − HWz‖² + λ Σ_j |z_j|²
- Generalized Gaussian (sparsity, β = 1): p(z) ∝ exp{ −λ Σ_j |z_j|^β }
  J(z) = −ln p(z | g) = ‖g − HWz‖² + λ Σ_j |z_j|^β
  ẑ = arg min_z {J(z)}, f̂ = Wẑ
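For β = 1 the criterion J(z) is the classical ℓ₁-regularized least-squares problem, which iterative soft-thresholding (ISTA) solves. The talk does not name an algorithm here; this is a standard sketch I add for illustration, written with the ½‖·‖² convention (which only rescales λ), and with A standing for the product HW:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1 (componentwise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, g, lam, n_iter=200):
    """ISTA for min_z 0.5 ||g - A z||^2 + lam ||z||_1, i.e. the MAP
    criterion with the beta = 1 generalized-Gaussian (sparsity) prior."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # gradient step on the data term, then shrinkage from the prior
        z = soft_threshold(z - A.T @ (A @ z - g) / L, lam / L)
    return z

# trivial check with A = I: the minimizer is soft_threshold(g, lam)
A = np.eye(2)
z_hat = ista(A, np.array([3.0, 0.1]), lam=1.0)   # -> close to [2.0, 0.0]
```

The small coefficient is thresholded exactly to zero, which is the sparsity-enforcing behaviour that motivates this prior in compressed sensing.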
Main advantages of the Bayesian approach
- MAP = regularization
- Posterior mean? Marginal MAP?
- More information in the posterior law than only its mode or its mean
- Meaning and tools for estimating hyperparameters
- Meaning and tools for model selection
- More specific and specialized priors, particularly through the hidden variables
- More computational tools: Expectation-Maximization for computing the maximum likelihood parameters; MCMC for posterior exploration; Variational Bayes for analytical computation of the posterior marginals; ...
Full Bayesian approach
M: g = Hf + ǫ
- Forward & errors model: p(g | f, θ₁; M)
- Prior model: p(f | θ₂; M)
- Hyperparameters θ = (θ₁, θ₂): p(θ | M)
- Bayes: p(f, θ | g; M) = p(g | f, θ; M) p(f | θ; M) p(θ | M) / p(g | M)
- Joint MAP: (f̂, θ̂) = arg max_{(f,θ)} { p(f, θ | g; M) }
- Marginalization: p(f | g; M) = ∫ p(f, θ | g; M) dθ; p(θ | g; M) = ∫ p(f, θ | g; M) df
- Posterior means: f̂ = ∫∫ f p(f, θ | g; M) df dθ; θ̂ = ∫∫ θ p(f, θ | g; M) df dθ
- Evidence of the model: p(g | M) = ∫∫ p(g | f, θ; M) p(f | θ; M) p(θ | M) df dθ
Two main steps in the Bayesian approach
Prior modeling:
- Separable: Gaussian, Generalized Gaussian, Gamma, mixture of Gaussians, mixture of Gammas, ...
- Markovian: Gauss-Markov, GGM, ...
- Separable or Markovian with hidden variables (contours, region labels)
Choice of the estimator and computational aspects:
- MAP, posterior mean, marginal MAP
- MAP needs optimization algorithms; posterior mean needs integration methods; marginal MAP needs integration and optimization
- Approximations: Gaussian approximation (Laplace); numerical exploration (MCMC); Variational Bayes (separable approximation)
Which images am I looking for?
Which image am I looking for?
- Gaussian: p(f_j | f_{j−1}) ∝ exp{ −α|f_j − f_{j−1}|² }
- Generalized Gaussian: p(f_j | f_{j−1}) ∝ exp{ −α|f_j − f_{j−1}|^p }
- Piecewise Gaussian: p(f_j | q_j, f_{j−1}) = N((1 − q_j) f_{j−1}, σ_f²)
- Mixture of Gaussians: p(f_j | z_j = k) = N(m_k, σ_k²)
Gauss-Markov-Potts prior models for images
In NDT applications of CT, the objects are in general composed of a finite number of materials, and the voxels corresponding to each material are grouped in compact regions. How to model this prior information?
  f(r), z(r) ∈ {1, ..., K}
  p(f(r) | z(r) = k, m_k, v_k) = N(m_k, v_k)
  p(f(r)) = Σ_k P(z(r) = k) N(m_k, v_k) — mixture of Gaussians
  p(z(r) | z(r′), r′ ∈ V(r)) ∝ exp{ γ Σ_{r′∈V(r)} δ(z(r) − z(r′)) }
Four different cases
To each pixel of the image are associated two variables, f(r) and z(r):
- f | z Gaussian iid, z iid: mixture of Gaussians
- f | z Gauss-Markov, z iid: mixture of Gauss-Markov
- f | z Gaussian iid, z Potts-Markov: mixture of independent Gaussians (MIG with hidden Potts)
- f | z Markov, z Potts-Markov: mixture of Gauss-Markov (MGM with hidden Potts)
Four different cases
- Case 1: mixture of Gaussians
- Case 2: mixture of Gauss-Markov
- Case 3: MIG with hidden Potts
- Case 4: MGM with hidden Potts
Four different cases
(figure: dependency structures of f(r), z(r), and the binary contour variable q(r, r′) ∈ {0, 1} for each of the four cases)
Case 1: f | z Gaussian iid, z iid
Independent Mixture of Independent Gaussians (IMIG):
  p(f(r) | z(r) = k) = N(m_k, v_k), ∀r ∈ R
  p(f(r)) = Σ_{k=1}^K α_k N(m_k, v_k), with Σ_k α_k = 1
  p(z) = Π_r p(z(r) = k) = Π_r α_k = Π_k α_k^{n_k}
Noting R_k = {r : z(r) = k}, R = ∪_k R_k and n_k the number of pixels in R_k, we have:
  m_z(r) = m_k, v_z(r) = v_k, α_z(r) = α_k, ∀r ∈ R_k
  p(f | z) = Π_{r∈R} N(m_z(r), v_z(r))
  p(z) = Π_r α_z(r) = Π_k α_k^{Σ_{r∈R_k} δ(z(r)−k)} = Π_k α_k^{n_k}
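Sampling an image from this IMIG prior takes two lines: draw the iid labels z, then draw each pixel from its class-conditional Gaussian. A minimal sketch with hypothetical parameters (K = 2 material classes, means and variances chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical K = 2 classes with parameters (m_k, v_k) and weights alpha_k
m = np.array([0.0, 10.0])        # class means m_k
v = np.array([1.0, 0.5])         # class variances v_k
alpha = np.array([0.7, 0.3])     # mixture weights, sum to 1

shape = (32, 32)
z = rng.choice(len(alpha), size=shape, p=alpha)   # iid labels z(r)
f = rng.normal(m[z], np.sqrt(v[z]))               # f(r) | z(r)=k ~ N(m_k, v_k)
```

Because z is iid, the sampled image has no spatial coherence between labels; introducing that coherence is exactly what the Potts prior of Cases 3 and 4 adds.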
Case 2: f | z Gauss-Markov, z iid
Independent Mixture of Gauss-Markov (IMGM):
  p(f(r) | z(r), z(r′), f(r′), r′ ∈ V(r)) = N(µ_z(r), v_z(r)), ∀r ∈ R
  µ_z(r) = (1/|V(r)|) Σ_{r′∈V(r)} µ_z(r′)
  µ_z(r′) = δ(z(r′) − z(r)) f(r′) + (1 − δ(z(r′) − z(r))) m_z(r′) = (1 − c(r′)) f(r′) + c(r′) m_z(r′)
  p(f | z) ∝ Π_r N(µ_z(r), v_z(r)) ∝ Π_k α_k N(m_k 1_k, Σ_k)
  p(z) = Π_k α_k^{n_k}
with 1_k = 1 for all r ∈ R_k and Σ_k a covariance matrix (n_k × n_k).
Case 3: f | z Gaussian iid, z Potts
Gaussian iid as in Case 1:
  p(f | z) = Π_{r∈R} N(m_z(r), v_z(r)) = Π_k Π_{r∈R_k} N(m_k, v_k)
Potts-Markov:
  p(z(r) | z(r′), r′ ∈ V(r)) ∝ exp{ γ Σ_{r′∈V(r)} δ(z(r) − z(r′)) }
  p(z) ∝ exp{ γ Σ_{r∈R} Σ_{r′∈V(r)} δ(z(r) − z(r′)) }
Case 4: f | z Gauss-Markov, z Potts
Gauss-Markov as in Case 2:
  p(f(r) | z(r), z(r′), f(r′), r′ ∈ V(r)) = N(µ_z(r), v_z(r)), ∀r ∈ R
  µ_z(r) = (1/|V(r)|) Σ_{r′∈V(r)} µ_z(r′)
  µ_z(r′) = δ(z(r′) − z(r)) f(r′) + (1 − δ(z(r′) − z(r))) m_z(r′)
  p(f | z) ∝ Π_r N(µ_z(r), v_z(r)) ∝ Π_k α_k N(m_k 1_k, Σ_k)
Potts-Markov as in Case 3:
  p(z) ∝ exp{ γ Σ_{r∈R} Σ_{r′∈V(r)} δ(z(r) − z(r′)) }
Summary of the two proposed models
- f | z Gaussian iid, z Potts-Markov (MIG with hidden Potts)
- f | z Markov, z Potts-Markov (MGM with hidden Potts)
Bayesian Computation
  p(f, z, θ | g) ∝ p(g | f, z, v_ǫ) p(f | z, m, v) p(z | γ, α) p(θ)
  θ = {v_ǫ, (α_k, m_k, v_k), k = 1, ..., K}; p(θ): conjugate priors
Direct computation and use of p(f, z, θ | g; M) is too complex. Possible approximations:
- Gauss-Laplace (Gaussian approximation)
- Exploration (sampling) using MCMC methods
- Separable approximation (variational techniques)
Main idea in Variational Bayesian methods: approximate p(f, z, θ | g; M) by q(f, z, θ) = q₁(f) q₂(z) q₃(θ)
- Choice of approximation criterion: KL(q : p)
- Choice of appropriate families of probability laws for q₁(f), q₂(z) and q₃(θ)
MCMC based algorithm
General scheme: p(f, z, θ | g) ∝ p(g | f, z, θ) p(f | z, θ) p(z) p(θ)
Iterate: f̂ ∼ p(f | ẑ, θ̂, g) → ẑ ∼ p(z | f̂, θ̂, g) → θ̂ ∼ p(θ | f̂, ẑ, g)
- Sample f from p(f | ẑ, θ̂, g) ∝ p(g | f, θ̂) p(f | ẑ, θ̂): needs optimization of a quadratic criterion.
- Sample z from p(z | f̂, θ̂, g) ∝ p(g | f̂, ẑ, θ̂) p(z): needs sampling of a Potts Markov field.
- Sample θ from p(θ | f̂, ẑ, g) ∝ p(g | f̂, σ_ǫ² I) p(f̂ | ẑ, (m_k, v_k)) p(θ): conjugate priors → analytical expressions.
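The second step, sampling the Potts label field, is typically done with site-by-site Gibbs updates. A minimal sketch of one prior-only sweep (no data term, which the full algorithm would add to the local probabilities; 4-neighbour lattice, function name mine):

```python
import numpy as np

def potts_gibbs_sweep(z, K, gamma, rng):
    """One Gibbs sweep over a 2D Potts field: resample each site from
    p(z(r) | neighbours) ∝ exp(gamma * #{neighbours with the same label})."""
    H, W = z.shape
    for i in range(H):
        for j in range(W):
            counts = np.zeros(K)
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < H and 0 <= nj < W:
                    counts[z[ni, nj]] += 1      # neighbour label votes
            p = np.exp(gamma * counts)
            z[i, j] = rng.choice(K, p=p / p.sum())
    return z

rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=(16, 16))       # random K = 2 initialization
z = potts_gibbs_sweep(z, 2, 2.0, rng)
```

With γ > 0 repeated sweeps make same-label neighbours more likely, so the field organizes into the compact homogeneous regions the Gauss-Markov-Potts model is designed to express.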
Application of CT in NDT: reconstruction from only 2 projections
  g₁(x) = ∫ f(x, y) dy, g₂(y) = ∫ f(x, y) dx
Given the marginals g₁(x) and g₂(y), find the joint distribution f(x, y).
Infinite number of solutions: f(x, y) = g₁(x) g₂(y) Ω(x, y)
where Ω(x, y) is a copula: ∫ Ω(x, y) dx = 1 and ∫ Ω(x, y) dy = 1.
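A discrete analogue makes the statement concrete: with the trivial copula Ω ≡ 1, the product f(x, y) = g₁(x) g₂(y) reproduces the given marginals exactly (a sketch with hypothetical normalized marginals):

```python
import numpy as np

# hypothetical normalized marginals (two projections)
g1 = np.array([0.2, 0.5, 0.3])
g2 = np.array([0.4, 0.6])

# one of the infinitely many joints consistent with them: Omega = 1
f = np.outer(g1, g2)     # f(x, y) = g1(x) * g2(y)
```

Summing f over y recovers g₁ and summing over x recovers g₂; any other valid copula Ω would give a different joint with the same two projections, which is exactly why two projections alone cannot determine f(x, y).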
Application in CT
- Forward model: g = Hf + ǫ, with g | f ∼ N(Hf, σ_ǫ² I)
- Gauss-Markov-Potts prior model: f | z Gaussian iid or Gauss-Markov; z iid or Potts
- Auxiliary binary variable (contours): q(r) ∈ {0, 1}, q = 1 − δ(z(r) − z(r′))
Unsupervised Bayesian estimation:
  p(f, z, θ | g) ∝ p(g | f, z, θ) p(f | z, θ) p(θ)
Results: 2D case
(figure: original; backprojection; filtered BP; LS; Gauss-Markov + positivity; GM + line process; GM + label process, with estimated contours ĉ and labels ẑ)
Some results in 3D case
(figure: M. Defrise phantom — Feldkamp vs. proposed method)
Some results in 3D case
(figure: Feldkamp vs. proposed method)
Some results in 3D case
(figure: experimental setup; a photograph of a metallic sponge; reconstruction by the proposed method)
Application: liquid evaporation in a metallic sponge
(figure: Time 0; Time 1; Time 2)
Conclusions
- The Bayesian estimation approach gives more tools than deterministic regularization to handle inverse problems
- Gauss-Markov-Potts models are useful priors for images, incorporating regions and contours
- Bayesian computation needs either multi-dimensional optimization or integration methods
- Different optimization and integration tools and approximations exist (Laplace, MCMC, Variational Bayes)
Work in progress and perspectives:
- Efficient implementation in 2D and 3D cases using GPU
- Application to other linear and non-linear inverse problems: X-ray CT, ultrasound and microwave imaging, PET, SPECT, ...
More informationMixtures of Gaussians. Sargur Srihari
Mixtures of Gaussians Sargur srihari@cedar.buffalo.edu 1 9. Mixture Models and EM 0. Mixture Models Overview 1. K-Means Clustering 2. Mixtures of Gaussians 3. An Alternative View of EM 4. The EM Algorithm
More informationPattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions
Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite
More informationMarginal density. If the unknown is of the form x = (x 1, x 2 ) in which the target of investigation is x 1, a marginal posterior density
Marginal density If the unknown is of the form x = x 1, x 2 ) in which the target of investigation is x 1, a marginal posterior density πx 1 y) = πx 1, x 2 y)dx 2 = πx 2 )πx 1 y, x 2 )dx 2 needs to be
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationParametric Techniques Lecture 3
Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to
More informationLecture 1b: Linear Models for Regression
Lecture 1b: Linear Models for Regression Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London c.archambeau@cs.ucl.ac.uk Advanced
More informationVariational Bayes. A key quantity in Bayesian inference is the marginal likelihood of a set of data D given a model M
A key quantity in Bayesian inference is the marginal likelihood of a set of data D given a model M PD M = PD θ, MPθ Mdθ Lecture 14 : Variational Bayes where θ are the parameters of the model and Pθ M is
More informationSTA 4273H: Sta-s-cal Machine Learning
STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our
More informationBAYESIAN MACHINE LEARNING: THEORETICAL FOUNDATIONS. Bayesian Machine Learning, Frederic Pennerath
BAYESIAN MACHINE LEARNING: THEORETICAL FOUNDATIONS Overview 1. Define a model family as a type of joint distribution P X = P X 1,, X m Digression on Bayesian Network. Predict / estimate outputs from a
More informationMarkov Chain Monte Carlo methods
Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning
More informationPractical Bayesian Optimization of Machine Learning. Learning Algorithms
Practical Bayesian Optimization of Machine Learning Algorithms CS 294 University of California, Berkeley Tuesday, April 20, 2016 Motivation Machine Learning Algorithms (MLA s) have hyperparameters that
More informationUSEFUL PROPERTIES OF THE MULTIVARIATE NORMAL*
USEFUL PROPERTIES OF THE MULTIVARIATE NORMAL* 3 Conditionals and marginals For Bayesian analysis it is very useful to understand how to write joint, marginal, and conditional distributions for the multivariate
More informationProbabilistic Reasoning in Deep Learning
Probabilistic Reasoning in Deep Learning Dr Konstantina Palla, PhD palla@stats.ox.ac.uk September 2017 Deep Learning Indaba, Johannesburgh Konstantina Palla 1 / 39 OVERVIEW OF THE TALK Basics of Bayesian
More informationStochastic Spectral Approaches to Bayesian Inference
Stochastic Spectral Approaches to Bayesian Inference Prof. Nathan L. Gibson Department of Mathematics Applied Mathematics and Computation Seminar March 4, 2011 Prof. Gibson (OSU) Spectral Approaches to
More informationIntroduction to Machine Learning. Lecture 2
Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for
More informationBetter restore the recto side of a document with an estimation of the verso side: Markov model and inference with graph cuts
June 23 rd 2008 Better restore the recto side of a document with an estimation of the verso side: Markov model and inference with graph cuts Christian Wolf Laboratoire d InfoRmatique en Image et Systèmes
More information(4D) Variational Models Preserving Sharp Edges. Martin Burger. Institute for Computational and Applied Mathematics
(4D) Variational Models Preserving Sharp Edges Institute for Computational and Applied Mathematics Intensity (cnt) Mathematical Imaging Workgroup @WWU 2 0.65 0.60 DNA Akrosom Flagellum Glass 0.55 0.50
More informationIntroduction to Bayesian Methods. Introduction to Bayesian Methods p.1/??
to Bayesian Methods Introduction to Bayesian Methods p.1/?? We develop the Bayesian paradigm for parametric inference. To this end, suppose we conduct (or wish to design) a study, in which the parameter
More informationVariational Methods in Bayesian Deconvolution
PHYSTAT, SLAC, Stanford, California, September 8-, Variational Methods in Bayesian Deconvolution K. Zarb Adami Cavendish Laboratory, University of Cambridge, UK This paper gives an introduction to the
More informationBayesian Methods: Naïve Bayes
Bayesian Methods: aïve Bayes icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior
More informationPattern Recognition and Machine Learning. Bishop Chapter 6: Kernel Methods
Pattern Recognition and Machine Learning Chapter 6: Kernel Methods Vasil Khalidov Alex Kläser December 13, 2007 Training Data: Keep or Discard? Parametric methods (linear/nonlinear) so far: learn parameter
More informationIntroduction to Statistical Learning Theory
Introduction to Statistical Learning Theory In the last unit we looked at regularization - adding a w 2 penalty. We add a bias - we prefer classifiers with low norm. How to incorporate more complicated
More informationChris Bishop s PRML Ch. 8: Graphical Models
Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular
More informationGraphical Models for Collaborative Filtering
Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,
More informationEstimation Theory. as Θ = (Θ 1,Θ 2,...,Θ m ) T. An estimator
Estimation Theory Estimation theory deals with finding numerical values of interesting parameters from given set of data. We start with formulating a family of models that could describe how the data were
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationBayesian inference J. Daunizeau
Bayesian inference J. Daunizeau Brain and Spine Institute, Paris, France Wellcome Trust Centre for Neuroimaging, London, UK Overview of the talk 1 Probabilistic modelling and representation of uncertainty
More informationInfinite latent feature models and the Indian Buffet Process
p.1 Infinite latent feature models and the Indian Buffet Process Tom Griffiths Cognitive and Linguistic Sciences Brown University Joint work with Zoubin Ghahramani p.2 Beyond latent classes Unsupervised
More informationLecture 6: Bayesian Inference in SDE Models
Lecture 6: Bayesian Inference in SDE Models Bayesian Filtering and Smoothing Point of View Simo Särkkä Aalto University Simo Särkkä (Aalto) Lecture 6: Bayesian Inference in SDEs 1 / 45 Contents 1 SDEs
More information