Hierarchical prior models for scalable Bayesian Tomography


1 Hierarchical prior models for scalable Bayesian Tomography
Ali Mohammad-Djafari
Laboratoire des Signaux et Systèmes (L2S), UMR8506 CNRS-CentraleSupélec-Univ Paris-Sud, Gif-sur-Yvette, France
Invited talk at WSTL, January 25-27, 2016, Szeged, Hungary

2 Contents
1. Computed Tomography, or how to see inside a body
2. Algebraic and regularization methods
3. Basic Bayesian approach
4. Two main steps: choosing an appropriate prior model; doing the computations efficiently
5. Unsupervised Bayesian methods: JMAP, Marginalization, VBA
6. Hierarchical prior modelling: sparsity enforcing through Student-t and IGSM; Gauss-Markov-Potts models
7. Computational tools: JMAP, Gibbs sampling (MCMC), VBA
8. Case study: image reconstruction with only two projections
9. Implementation issues: main GPU implementation steps (forward and back projections); multi-resolution implementation
10. Conclusions

3 Computed Tomography, or how to see inside a body
f(x, y) is a section of a real 3D body f(x, y, z); g_φ(r) is a line of the observed radiography g_φ(r, z).
Forward model: line integrals (Radon transform)
g_φ(r) = ∫_{L_{r,φ}} f(x, y) dl + ε_φ(r) = ∫∫ f(x, y) δ(r − x cos φ − y sin φ) dx dy + ε_φ(r)
Inverse problem (image reconstruction): given the forward model H (Radon transform) and a set of data g_{φ_i}(r), i = 1, ..., M, find f(x, y).

4 2D and 3D Computed Tomography
3D: g_φ(r_1, r_2) = ∫_{L_{r_1,r_2,φ}} f(x, y, z) dl
2D: g_φ(r) = ∫_{L_{r,φ}} f(x, y) dl
Forward problem: f(x, y) or f(x, y, z) → g_φ(r) or g_φ(r_1, r_2)
Inverse problem: g_φ(r) or g_φ(r_1, r_2) → f(x, y) or f(x, y, z)

5 Algebraic methods: discretization
f(x, y) = Σ_j f_j b_j(x, y), with b_j(x, y) = 1 if (x, y) ∈ pixel j and 0 otherwise
g(r, φ) = ∫_L f(x, y) dl  →  g_i = Σ_{j=1}^N H_ij f_j + ε_i
g_k = H_k f + ε_k, k = 1, ..., K  →  g = Hf + ε
where g_k is the projection at angle φ_k and g collects all the projections.
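As an illustration of this discretization, here is a minimal numpy sketch that builds a crude system matrix H for a parallel-beam geometry by sampling points along each ray; the pixel-indicator weights of the slide are approximated by counting samples rather than exact intersection lengths, and the grid size, angles and test phantom are arbitrary choices made only for this example.

```python
import numpy as np

def projection_matrix(n, angles, n_det=None, n_samples=200):
    """Crude parallel-beam system matrix H on an n x n pixel grid (sketch only).

    Each ray is sampled at n_samples points; H[i, j] accumulates how much of
    ray i falls inside pixel j, a rough stand-in for the exact intersection length."""
    n_det = n_det or n
    rs = np.linspace(-1, 1, n_det, endpoint=False) + 1.0 / n_det   # detector positions
    ts = np.linspace(-1, 1, n_samples)                             # parameter along the ray
    H = np.zeros((len(angles) * n_det, n * n))
    for a, phi in enumerate(angles):
        c, s = np.cos(phi), np.sin(phi)
        for d, r in enumerate(rs):
            # points on the line  x cos(phi) + y sin(phi) = r
            xs, ys = r * c - ts * s, r * s + ts * c
            inside = (np.abs(xs) < 1) & (np.abs(ys) < 1)
            ix = ((xs[inside] + 1) / 2 * n).astype(int)
            iy = ((ys[inside] + 1) / 2 * n).astype(int)
            np.add.at(H[a * n_det + d], iy * n + ix, 2.0 / n_samples)
    return H

# toy forward problem  g = H f + noise
n = 32
angles = np.deg2rad(np.arange(0.0, 180.0, 10.0))
H = projection_matrix(n, angles)
f_true = np.zeros((n, n)); f_true[10:22, 12:20] = 1.0
g = H @ f_true.ravel() + 0.01 * np.random.default_rng(0).normal(size=H.shape[0])
```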

6 Algebraic methods
g = [g_1; ...; g_K], H = [H_1; ...; H_K], g_k = H_k f + ε_k  →  g = Hf + ε
H is huge dimensional in both the 2D and 3D cases.
Hf corresponds to the forward projection; H^t g corresponds to the back projection (BP).
H may not be invertible, and may not even be square; H is, in general, ill-conditioned.
Generalized inverse / minimum norm solution:
f̂ = H^t (H H^t)^{-1} g = Σ_k H_k^t (H_k H_k^t)^{-1} g_k
which can be interpreted as the Filtered Back Projection solution.
Due to the ill-posedness of the inverse problem, generalized inversion and least squares (LS) methods:
f̂ = arg min_f { J(f) = ||g − Hf||² }  →  f̂ = (H^t H)^{-1} H^t g
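A quick numerical counterpart of these formulas, as a sketch reusing the toy H and g built above; numpy's pinv and lstsq stand in for the generalized inverse.

```python
import numpy as np

# Minimum-norm and least-squares solutions for the toy system g = H f (sketch).
f_mn = H.T @ np.linalg.pinv(H @ H.T) @ g       # f = H^t (H H^t)^{-1} g
f_ls, *_ = np.linalg.lstsq(H, g, rcond=None)   # arg min_f ||g - H f||^2
```

Both remain sensitive to noise and to the conditioning of H, which motivates the regularization discussed next.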

7 Inversion: deterministic methods, data matching
Observation model: g = Hf + ε, g_i = [Hf]_i + ε_i = h_i(f), i = 1, ..., M
Mismatch between the data and the output of the model: Δ(g, H(f))
f̂ = arg min_f { Δ(g, H(f)) }
Examples:
Δ_LS(g, H(f)) = ||g − H(f)||² = Σ_i |g_i − h_i(f)|²
Δ_Lp(g, H(f)) = ||g − H(f)||^p = Σ_i |g_i − h_i(f)|^p, 1 < p < 2
Δ_KL(g, H(f)) = Σ_i g_i ln( g_i / h_i(f) )
In general, this does not give satisfactory results for inverse problems.

8 Regularization theory
Inverse problems = ill-posed problems → need for prior information.
Functional space (Tikhonov): g = H(f) + ε, J(f) = ||g − H(f)||²_2 + λ||Df||²_2
Finite dimensional space (Phillips & Twomey): g = Hf + ε, J(f) = ||g − Hf||² + λ||f||²
Minimum norm LS (MNLS): J(f) = ||g − H(f)||² + λ||f||²
Classical regularization: J(f) = ||g − H(f)||² + λ||Df||²
More general regularization: J(f) = Q(g − H(f)) + λΩ(Df), or J(f) = Δ_1(g, H(f)) + λΔ_2(Df, f_0)
Limitations: the errors are implicitly considered white and Gaussian; only limited prior information on the solution can be expressed; there is a lack of tools for determining the hyperparameters.

9 Inversion: probabilistic methods
Taking account of errors and uncertainties → probability theory:
- Maximum Likelihood (ML)
- Minimum Inaccuracy (MI)
- Probability Distribution Matching (PDM)
- Maximum Entropy (ME) and Information Theory (IT)
- Bayesian Inference (Bayes)
Advantages: explicit account of the errors and noise; a large class of priors via explicit or implicit modelling; a coherent approach to combining the information content of the data and the priors.
Limitations: practical implementation and cost of computation.

10 Bayesian estimation approach
M: g = Hf + ε
Observation model M + hypothesis on the noise ε → p(g|f; M) = p_ε(g − Hf)
A priori information: p(f|M)
Bayes: p(f|g; M) = p(g|f; M) p(f|M) / p(g|M)
Link with regularization, Maximum A Posteriori (MAP):
f̂ = arg max_f { p(f|g) } = arg max_f { p(g|f) p(f) } = arg min_f { J(f) = −ln p(g|f) − ln p(f) }
Regularization: f̂ = arg min_f { J(f) = Q(g, Hf) + λΩ(f) }
with Q(g, Hf) = −ln p(g|f) and λΩ(f) = −ln p(f).

11 Case of linear models and Gaussian priors
g = Hf + ε
Prior knowledge on the noise: ε ~ N(0, v_ε I)  →  p(g|f) ∝ exp[ −(1/2v_ε) ||g − Hf||² ]
Prior knowledge on f: f ~ N(0, v_f (D^t D)^{-1})  →  p(f) ∝ exp[ −(1/2v_f) ||Df||² ]
A posteriori: p(f|g) ∝ exp[ −(1/2v_ε) ||g − Hf||² − (1/2v_f) ||Df||² ]
MAP: f̂ = arg max_f { p(f|g) } = arg min_f { J(f) } with J(f) = ||g − Hf||² + λ||Df||², λ = v_ε/v_f
Advantage: characterization of the solution, p(f|g) = N(f̂, Σ̂) with
f̂ = (H^t H + λ D^t D)^{-1} H^t g,  Σ̂ = v_ε (H^t H + λ D^t D)^{-1}
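A direct numerical transcription of these closed-form expressions, as a sketch reusing the toy H and g from above; D is taken here as a simple first-order difference operator and v_ε, v_f are arbitrary assumed values.

```python
import numpy as np

# Gaussian posterior: MAP / posterior mean and posterior covariance in closed form (sketch).
v_eps, v_f = 0.01, 1.0
lam = v_eps / v_f
N = H.shape[1]
D = np.eye(N) - np.eye(N, k=1)                 # crude 1-D difference as a stand-in for D
A = H.T @ H + lam * D.T @ D
f_hat = np.linalg.solve(A, H.T @ g)            # f = (H^t H + lam D^t D)^{-1} H^t g
Sigma = v_eps * np.linalg.inv(A)               # posterior covariance
f_std = np.sqrt(np.diag(Sigma))                # pixel-wise uncertainty
```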

12 MAP estimation with other priors
f̂ = arg min_f { J(f) } with J(f) = (1/v_ε) ||g − Hf||² + Ω(f)
Separable priors:
- Gaussian: p(f_j) ∝ exp[−α|f_j|²]  →  Ω(f) = α Σ_j |f_j|² = α||f||²_2
- Generalized Gaussian: p(f_j) ∝ exp[−α|f_j|^p], 1 < p < 2  →  Ω(f) = α Σ_j |f_j|^p = α||f||^p_p
- Gamma: p(f_j) ∝ f_j^α exp[−β f_j]  →  Ω(f) = Σ_j [ −α ln f_j + β f_j ]
- Beta: p(f_j) ∝ f_j^α (1 − f_j)^β  →  Ω(f) = −Σ_j [ α ln f_j + β ln(1 − f_j) ]
Markovian models: p(f_j|f) ∝ exp[ −α Σ_{i∈N_j} φ(f_j, f_i) ]  →  Ω(f) = α Σ_j Σ_{i∈N_j} φ(f_j, f_i)

13 Main advantages of the Bayesian approach
- MAP = regularization
- Posterior mean? Marginal MAP?
- More information in the posterior law than only its mode or its mean
- Tools for estimating the hyperparameters
- Tools for model selection
- More specific and specialized priors, particularly through hidden variables and hierarchical models
- More computational tools: Expectation-Maximization for computing the maximum likelihood parameters; MCMC for posterior exploration; Variational Bayes for analytical computation of the posterior marginals; ...

14 Two main steps in the Bayesian approach
Prior modeling:
- Separable: Gaussian, Gamma; sparsity enforcing: Generalized Gaussian, mixture of Gaussians, mixture of Gammas, ...
- Markovian: Gauss-Markov, GGM, ...
- Markovian with hidden variables (contours, region labels)
Choice of the estimator and computational aspects:
- MAP, posterior mean, marginal MAP
- MAP needs optimization algorithms; the posterior mean needs integration methods; marginal MAP and hyperparameter estimation need both integration and optimization
- Approximations: Gaussian approximation (Laplace), numerical exploration (MCMC), Variational Bayes (separable approximation)

15 Which images am I looking for?

16 Which image am I looking for?
(Example image classes shown on the slide: Gauss-Markov, Generalized Gauss-Markov, piecewise Gaussian, mixture of Gauss-Markov.)

17 Different prior models for signals and images: separable
Simple separable priors: Gaussian, Gamma, Generalized Gaussian
p(f) ∝ exp[ −α Σ_j φ(f_j) ]
Simple Markovian models: Gauss-Markov, Generalized Gauss-Markov
p(f) ∝ exp[ −α Σ_j Σ_{i∈N(j)} φ(f_j − f_i) ]
Hierarchical models with hidden variables: Bernoulli-Gaussian, Gaussian-Gamma
p(f|z) ∝ exp[ −Σ_j φ(f_j | z_j) ] and p(z) ∝ exp[ −Σ_j φ(z_j) ]
with different choices for φ(f_j | z_j) and φ(z_j).

18 Bayesian inference as hierarchical models
Simple case: g = Hf + ε
p(f|g, θ) ∝ p(g|f, θ_1) p(f|θ_2)   (graphical model: θ_2 → f, θ_1 → ε, then f and ε → g through H)
Objective: infer f
MAP: f̂ = arg max_f { p(f|g, θ) }
Posterior Mean (PM): f̂ = ∫ f p(f|g, θ) df
Example, Gaussian case:
p(g|f, θ_1) = N(g|Hf, θ_1 I), p(f|θ_2) = N(f|0, θ_2 I)  →  p(f|g, θ) = N(f|f̂, Σ̂)
MAP: f̂ = arg min_f { J(f) } with J(f) = (1/v_ε) ||g − Hf||² + (1/v_f) ||f||²
Posterior Mean (PM) = MAP: f̂ = (H^t H + λI)^{-1} H^t g, Σ̂ = v_ε (H^t H + λI)^{-1}, with λ = v_ε/v_f.

19 Gaussian model: simple separable and Markovian
g = Hf + ε, p(g|f, θ_1) = N(g|Hf, v_ε I)
Separable Gaussian case: p(f|v_f) = N(f|0, v_f I)
MAP: f̂ = arg min_f { J(f) } with J(f) = (1/v_ε) ||g − Hf||² + (1/v_f) ||f||²
Posterior Mean (PM) = MAP: p(f|g, θ) = N(f|f̂, Σ̂) with f̂ = (H^t H + λI)^{-1} H^t g, Σ̂ = v_ε (H^t H + λI)^{-1}, λ = v_ε/v_f
Gauss-Markov case: p(f|v_f, D) = N(f|0, v_f (D^t D)^{-1})
MAP: J(f) = (1/v_ε) ||g − Hf||² + (1/v_f) ||Df||²
Posterior Mean (PM) = MAP: f̂ = (H^t H + λ D^t D)^{-1} H^t g, Σ̂ = v_ε (H^t H + λ D^t D)^{-1}, λ = v_ε/v_f.

20 Bayesian inference (unsupervised case)
Unsupervised case: hyperparameter estimation
p(f, θ|g) ∝ p(g|f, θ_1) p(f|θ_2) p(θ)   (with hyperprior parameters α_0, β_0 on θ)
Objective: infer (f, θ)
JMAP: (f̂, θ̂) = arg max_{(f,θ)} { p(f, θ|g) }
Marginalization 1: p(f|g) = ∫ p(f, θ|g) dθ
Marginalization 2: p(θ|g) = ∫ p(f, θ|g) df, followed by θ̂ = arg max_θ { p(θ|g) } and f̂ = arg max_f { p(f|g, θ̂) }
MCMC Gibbs sampling: f ~ p(f|θ, g) → θ ~ p(θ|f, g), until convergence; use the generated samples to compute means and variances.
VBA: approximate p(f, θ|g) by q_1(f) q_2(θ); use q_1(f) to infer f and q_2(θ) to infer θ.

21 Variational Bayesian Approximation
Approximate p(f, θ|g) by q(f, θ|g) = q_1(f|g) q_2(θ|g) and then continue the computations.
Criterion: KL(q(f, θ|g) : p(f, θ|g))
KL(q : p) = ∫∫ q ln(q/p) = ∫∫ q_1 q_2 ln(q_1 q_2 / p) = ∫ q_1 ln q_1 + ∫ q_2 ln q_2 − <ln p>_q = −H(q_1) − H(q_2) − <ln p>_q
Alternate optimization algorithm: q_1 → q_2 → q_1 → q_2 → ...
q_1(f) ∝ exp[ <ln p(g, f, θ; M)>_{q_2(θ)} ]
q_2(θ) ∝ exp[ <ln p(g, f, θ; M)>_{q_1(f)} ]
p(f, θ|g) → (Variational Bayesian Approximation) → q_1(f) → f̂ and q_2(θ) → θ̂.

22 JMAP, Marginalization, VBA
JMAP: p(f, θ|g) → joint optimization → f̂, θ̂
Marginalization: p(f, θ|g) → marginalize over f → p(θ|g) → θ̂ → p(f|θ̂, g) → f̂
Variational Bayesian Approximation: p(f, θ|g) → q_1(f) q_2(θ) → f̂, θ̂

23 Variational Bayesian Approximation
Approximate p(f, θ|g) by q(f, θ) = q_1(f) q_2(θ) and then use them for any inference on f and θ respectively.
Criterion: KL(q(f, θ|g) : p(f, θ|g)), with KL(q : p) = ∫∫ q ln(q/p) = ∫∫ q_1 q_2 ln(q_1 q_2 / p)
Iterative algorithm: q_1 → q_2 → q_1 → q_2 → ...
q_1(f) ∝ exp[ <ln p(g, f, θ; M)>_{q_2(θ)} ]
q_2(θ) ∝ exp[ <ln p(g, f, θ; M)>_{q_1(f)} ]

24 VBA: choice of the family of laws q_1 and q_2
Case 1, Joint MAP:
q_1(f|f̂) = δ(f − f̂), q_2(θ|θ̂) = δ(θ − θ̂)  →  f̂ = arg max_f { p(f, θ̂|g; M) }, θ̂ = arg max_θ { p(f̂, θ|g; M) }
Case 2, EM:
q_1(f) ∝ p(f|θ, g), q_2(θ|θ̂) = δ(θ − θ̂)  →  Q(θ, θ̂) = <ln p(f, θ|g; M)>_{q_1(f|θ̂)}, θ̂ = arg max_θ { Q(θ, θ̂) }
Appropriate choice for inverse problems (exponential families and conjugate priors):
q_1(f) ∝ p(f|θ, g; M), q_2(θ) ∝ p(θ|f, g; M)
This accounts for the uncertainties of θ̂ when estimating f, and vice versa.

25 JMAP, EM and VBA
JMAP, alternate optimization algorithm:
θ̂^(0) → f̂ = arg max_f { p(f, θ̂|g) } → θ̂ = arg max_θ { p(f̂, θ|g) } → iterate → f̂
EM:
θ̂^(0) → q_1(f) = p(f|θ̂, g) → Q(θ, θ̂) = <ln p(f, θ|g)>_{q_1(f)} → θ̂ = arg max_θ { Q(θ, θ̂) } → iterate → q_1(f) → f̂
VBA:
θ̂^(0) → q_1(f) ∝ exp[ <ln p(f, θ|g)>_{q_2(θ)} ] → q_2(θ) ∝ exp[ <ln p(f, θ|g)>_{q_1(f)} ] → iterate → q_1(f) → f̂

26 Unsupervised Gaussian model
Unsupervised Gaussian case: g = Hf + ε, ε ~ N(0, v_ε I)
p(g|f, v_ε) = N(g|Hf, v_ε I), p(f|v_f) = N(f|0, v_f I)
p(v_ε) = IG(v_ε|α_ε0, β_ε0), p(v_f) = IG(v_f|α_f0, β_f0)
p(f, v_ε, v_f|g) ∝ p(g|f, v_ε) p(f|v_f) p(v_ε) p(v_f)
JMAP: (f̂, v̂_ε, v̂_f) = arg max_{(f, v_ε, v_f)} { p(f, v_ε, v_f|g) }, by alternate optimization:
f̂ = (H^t H + λI)^{-1} H^t g with λ = v̂_ε/v̂_f
v̂_ε = β̂_ε/α̂_ε, with β̂_ε = β_ε0 + ||g − Hf̂||², α̂_ε = α_ε0 + M/2
v̂_f = β̂_f/α̂_f, with β̂_f = β_f0 + ||f̂||², α̂_f = α_f0 + N/2
VBA: approximate p(f, v_ε, v_f|g) by q_1(f) q_2(v_ε) q_3(v_f), by alternate optimization:
q_1(f) = N(f|μ̂, Σ̂), q_2(v_ε) = IG(v_ε|α̂_ε, β̂_ε), q_3(v_f) = IG(v_f|α̂_f, β̂_f)

27 Unsupervised Gaussian model (details of the updates)
Same model: p(f, v_ε, v_f|g) ∝ p(g|f, v_ε) p(f|v_f) p(v_ε) p(v_f)
JMAP: (f̂, v̂_ε, v̂_f) = arg max_{(f, v_ε, v_f)} { p(f, v_ε, v_f|g) }
f̂ = (H^t H + λI)^{-1} H^t g with λ = v̂_ε/v̂_f
v̂_ε = β̂_ε/α̂_ε, with β̂_ε = β_ε0 + ||g − Hf̂||², α̂_ε = α_ε0 + M/2
v̂_f = β̂_f/α̂_f, with β̂_f = β_f0 + ||f̂||², α̂_f = α_f0 + N/2
VBA: approximate p(f, v_ε, v_f|g) by q_1(f) q_2(v_ε) q_3(v_f):
q_1(f) = N(f|μ̂, Σ̂) with μ̂ = (H^t H + λI)^{-1} H^t g, Σ̂ = v̂_ε (H^t H + λI)^{-1}, λ = v̂_ε/v̂_f
q_2(v_ε) = IG(v_ε|α̂_ε, β̂_ε): v̂_ε = β̂_ε/α̂_ε, β̂_ε = β_ε0 + <||g − Hf||²>, α̂_ε = α_ε0 + M/2
q_3(v_f) = IG(v_f|α̂_f, β̂_f): v̂_f = β̂_f/α̂_f, β̂_f = β_f0 + <||f||²>, α̂_f = α_f0 + N/2
with <||g − Hf||²> = ||g − Hμ̂||² + Tr(H Σ̂ H^t) and <||f||²> = ||μ̂||² + Tr(Σ̂).
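These updates translate almost line by line into code. A minimal sketch for the toy problem, reusing H and g from the earlier examples; the hyper-hyperparameters α_0, β_0 are assumed small "non-informative" values, and the update expressions simply mirror the slide.

```python
import numpy as np

# VBA alternate optimisation for the unsupervised Gaussian model (sketch).
M, N = H.shape
alpha_e0 = beta_e0 = alpha_f0 = beta_f0 = 1e-3
v_eps, v_f = 1.0, 1.0
HtH, Htg = H.T @ H, H.T @ g

for it in range(20):
    lam = v_eps / v_f
    A = HtH + lam * np.eye(N)
    mu = np.linalg.solve(A, Htg)               # mean of q1(f)
    Sigma = v_eps * np.linalg.inv(A)           # covariance of q1(f)
    e_res = np.sum((g - H @ mu) ** 2) + np.trace(H @ Sigma @ H.T)   # <||g - Hf||^2>
    e_f2 = np.sum(mu ** 2) + np.trace(Sigma)                        # <||f||^2>
    v_eps = (beta_e0 + e_res) / (alpha_e0 + M / 2)
    v_f = (beta_f0 + e_f2) / (alpha_f0 + N / 2)
```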

28 Sparsity enforcing models
Student-t model: St(f|ν) ∝ exp[ −((ν + 1)/2) log(1 + f²/ν) ]
Infinite Gaussian Scale Mixture (IGSM) equivalence:
St(f|ν) = ∫_0^∞ N(f|0, 1/z) G(z|α, β) dz, with α = β = ν/2
p(f|z) = Π_j p(f_j|z_j) = Π_j N(f_j|0, 1/z_j) ∝ exp[ −(1/2) Σ_j z_j f_j² ]
p(z|α, β) = Π_j G(z_j|α, β) ∝ Π_j z_j^{α−1} exp[−β z_j] ∝ exp[ Σ_j (α − 1) ln z_j − β z_j ]
p(f, z|α, β) ∝ exp[ −(1/2) Σ_j z_j f_j² + Σ_j (α − 1) ln z_j − β z_j ]
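A quick Monte-Carlo sanity check of this scale-mixture identity (sketch only; scipy's Student-t quantiles are used purely for comparison):

```python
import numpy as np
from scipy import stats

# Student-t as an infinite Gaussian scale mixture: draw z ~ Gamma(nu/2, rate=nu/2),
# then f | z ~ N(0, 1/z); the marginal of f is Student-t with nu degrees of freedom.
rng = np.random.default_rng(0)
nu = 3.0
z = rng.gamma(shape=nu / 2, scale=2 / nu, size=200_000)   # rate nu/2 <-> scale 2/nu
f = rng.normal(0.0, 1.0 / np.sqrt(z))
print(np.quantile(f, [0.05, 0.5, 0.95]))
print(stats.t(df=nu).ppf([0.05, 0.5, 0.95]))              # should roughly agree
```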

29 Non-stationary noise and the Infinite Gaussian Scale Mixture (IGSM) model
Non-stationary noise: g = Hf + ε, ε_i ~ N(ε_i|0, v_εi), i.e. ε ~ N(ε|0, V_ε = diag[v_ε1, ..., v_εM])
Student-t prior model and its equivalent IGSM:
f_j|v_fj ~ N(f_j|0, v_fj) and v_fj ~ IG(v_fj|α_f0, β_f0)  →  f_j ~ St(f_j|α_f0, β_f0)
p(g|f, v_ε) = N(g|Hf, V_ε), V_ε = diag[v_ε]
p(f|v_f) = N(f|0, V_f), V_f = diag[v_f]
p(v_ε) = Π_i IG(v_εi|α_ε0, β_ε0), p(v_f) = Π_j IG(v_fj|α_f0, β_f0)
p(f, v_ε, v_f|g) ∝ p(g|f, v_ε) p(f|v_f) p(v_ε) p(v_f)
JMAP: (f̂, v̂_ε, v̂_f) = arg max_{(f, v_ε, v_f)} { p(f, v_ε, v_f|g) }
VBA: approximate p(f, v_ε, v_f|g) by q_1(f) q_2(v_ε) q_3(v_f)

30 Sparse model in a transform domain (1)
g = Hf + ε, f = Dz, z sparse
p(g|z, v_ε) = N(g|HDz, v_ε I)
p(z|v_z) = N(z|0, V_z), V_z = diag[v_z]
p(v_ε) = IG(v_ε|α_ε0, β_ε0), p(v_z) = Π_j IG(v_zj|α_z0, β_z0)
p(z, v_ε, v_z|g) ∝ p(g|z, v_ε) p(z|v_z) p(v_ε) p(v_z)
JMAP: (ẑ, v̂_ε, v̂_z) = arg max_{(z, v_ε, v_z)} { p(z, v_ε, v_z|g) }, by alternate optimization:
ẑ = arg min_z { J(z) } with J(z) = (1/2v̂_ε) ||g − HDz||² + ||V_z^{-1/2} z||²
v̂_zj = (β_z0 + ẑ_j²) / (α_z0 + 1/2)
v̂_ε = (β_ε0 + ||g − HDẑ||²) / (α_ε0 + M/2)
VBA: approximate p(z, v_ε, v_z|g) by q_1(z) q_2(v_ε) q_3(v_z), by alternate optimization.
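A sketch of this JMAP alternate optimisation on the toy problem, reusing H and g from the earlier examples; the synthesis operator (the slide's D) is taken here as an orthonormal inverse-DCT matrix, named W to avoid clashing with the difference operator used before, and the hyperparameters are assumed small values.

```python
import numpy as np
from scipy.fft import idct

# JMAP alternate optimisation for g = H W z + eps with z sparse (sketch).
N = H.shape[1]
W = idct(np.eye(N), norm='ortho', axis=0)        # columns = DCT basis vectors (the slide's D)
A = H @ W
alpha_e0 = beta_e0 = alpha_z0 = beta_z0 = 1e-3
v_eps, v_z = 1.0, np.ones(N)

for it in range(20):
    # z-step: minimise (1/v_eps)||g - A z||^2 + z^t diag(1/v_z) z
    z = np.linalg.solve(A.T @ A / v_eps + np.diag(1.0 / v_z), A.T @ g / v_eps)
    # hyperparameter steps (expressions as on the slide)
    v_z = (beta_z0 + z ** 2) / (alpha_z0 + 0.5)
    v_eps = (beta_e0 + np.sum((g - A @ z) ** 2)) / (alpha_e0 + len(g) / 2)

f_sparse = (W @ z).reshape(n, n)                 # back to the image domain
```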

31 Sparse model in a transform domain (2)
g = Hf + ε, f = Dz + ξ, z sparse
p(g|f, v_ε) = N(g|Hf, v_ε I)
p(f|z) = N(f|Dz, v_ξ I)
p(z|v_z) = N(z|0, V_z), V_z = diag[v_z]
p(v_ε) = IG(v_ε|α_ε0, β_ε0), p(v_z) = Π_j IG(v_zj|α_z0, β_z0), p(v_ξ) = IG(v_ξ|α_ξ0, β_ξ0)
p(f, z, v_ε, v_z, v_ξ|g) ∝ p(g|f, v_ε) p(f|z) p(z|v_z) p(v_ε) p(v_z) p(v_ξ)
JMAP: (f̂, ẑ, v̂_ε, v̂_z, v̂_ξ) = arg max { p(f, z, v_ε, v_z, v_ξ|g) }, by alternate optimization.
VBA: approximate p(f, z, v_ε, v_z, v_ξ|g) by q_1(f) q_2(z) q_3(v_ε) q_4(v_z) q_5(v_ξ), by alternate optimization.

32 Gauss-Markov-Potts prior models for images
Variables: image f(r), labels z(r), contours c(r) = 1 − δ(z(r) − z(r'))
p(f(r)|z(r) = k, m_k, v_k) = N(m_k, v_k)
p(f(r)) = Σ_k P(z(r) = k) N(m_k, v_k)   (mixture of Gaussians)
Separable iid hidden variables: p(z) = Π_r p(z(r))
Markovian hidden variables: p(z) Potts-Markov:
p(z(r)|z(r'), r' ∈ V(r)) ∝ exp[ γ Σ_{r'∈V(r)} δ(z(r) − z(r')) ]
p(z) ∝ exp[ γ Σ_{r∈R} Σ_{r'∈V(r)} δ(z(r) − z(r')) ]

33 Four different cases
To each pixel of the image two variables are associated: f(r) and z(r).
- f|z Gaussian iid, z iid: mixture of Gaussians
- f|z Gauss-Markov, z iid: mixture of Gauss-Markov
- f|z Gaussian iid, z Potts-Markov: mixture of independent Gaussians (MIG with hidden Potts)
- f|z Markov, z Potts-Markov: mixture of Gauss-Markov (MGM with hidden Potts)

34 Gauss-Markov-Potts prior models for images (full hierarchical model)
g = Hf + ε
p(g|f, v_ε) = N(g|Hf, v_ε I), p(v_ε) = IG(v_ε|α_ε0, β_ε0)
p(f(r)|z(r) = k, m_k, v_k) = N(f(r)|m_k, v_k)
p(f|z, θ) = Π_k Π_{r∈R_k} a_k N(f(r)|m_k, v_k), with θ = {(a_k, m_k, v_k), k = 1, ..., K}
p(θ) = D(a|a_0) N(m|m_0, v_0) IG(v|α_0, β_0)
p(z|γ) ∝ exp[ γ Σ_r Σ_{r'∈N(r)} δ(z(r) − z(r')) ]   (Potts MRF)
p(f, z, θ|g) ∝ p(g|f, v_ε) p(f|z, θ) p(z|γ) p(θ)
Computation: MCMC (Gibbs sampling) or VBA (alternate optimization).

35 Bayesian computation and algorithms
Joint posterior probability law of all the unknowns f, z, θ:
p(f, z, θ|g) ∝ p(g|f, θ_1) p(f|z, θ_2) p(z|θ_3) p(θ)
Often the expression of p(f, z, θ|g) is complex; its optimization (for joint MAP) or its marginalization or integration (for marginal MAP or PM) is not easy.
Two main techniques:
- MCMC: needs the expressions of the conditionals p(f|z, θ, g), p(z|f, θ, g) and p(θ|f, z, g).
- VBA: approximate p(f, z, θ|g) by a separable law q(f, z, θ|g) = q_1(f) q_2(z) q_3(θ) and do all computations with these separable laws.

36 MCMC based algorithm
p(f, z, θ|g) ∝ p(g|f, z, θ_1) p(f|z, θ_2) p(z|θ_3) p(θ)
General Gibbs sampling scheme: f ~ p(f|ẑ, θ̂, g) → z ~ p(z|f̂, θ̂, g) → θ ~ p(θ|f̂, ẑ, g)
- Generate samples f using p(f|ẑ, θ̂, g) ∝ p(g|f, θ̂) p(f|ẑ, θ̂); when Gaussian, this can be done via optimization of a quadratic criterion.
- Generate samples z using p(z|f̂, θ̂, g) ∝ p(g|f̂, ẑ, θ̂) p(z); this often requires sampling a hidden discrete variable.
- Generate samples θ using p(θ|f̂, ẑ, g) ∝ p(g|f̂, σ²_ε I) p(f̂|ẑ, (m_k, v_k)) p(θ); the use of conjugate priors gives analytical expressions.
After convergence, use the samples to compute means and variances.
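As an illustration of the discrete-variable step, here is a sketch of one Gibbs sweep over Potts labels z given the current image and class parameters (m_k, v_k); the 4-neighbour system, the value of γ and the two-class parameters are assumptions made for this example, and f_hat comes from the earlier toy reconstruction.

```python
import numpy as np

# One Gibbs sweep over the hidden Potts labels z (sketch):
# p(z(r)=k | ...) is proportional to N(f(r) | m_k, v_k) * exp(gamma * #{neighbours labelled k}).
rng = np.random.default_rng(0)

def gibbs_sweep_labels(z, f_img, m, v, gamma=1.0):
    n1, n2 = z.shape
    K = len(m)
    for i in range(n1):
        for j in range(n2):
            logp = -0.5 * (f_img[i, j] - m) ** 2 / v - 0.5 * np.log(v)   # Gaussian log-likelihood per class
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):            # 4-neighbour system
                ii, jj = i + di, j + dj
                if 0 <= ii < n1 and 0 <= jj < n2:
                    logp += gamma * (np.arange(K) == z[ii, jj])
            p = np.exp(logp - logp.max())
            z[i, j] = rng.choice(K, p=p / p.sum())
    return z

# toy usage on the earlier reconstruction, with two classes
f_img = f_hat.reshape(n, n)
z_labels = (f_img > f_img.mean()).astype(int)
z_labels = gibbs_sweep_labels(z_labels, f_img, m=np.array([0.0, 1.0]), v=np.array([0.1, 0.1]))
```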

37 Computed Tomography with only two projections
g(r, φ) = ∫_L f(x, y) dl, f(x, y) = Σ_j f_j b_j(x, y), with b_j(x, y) = 1 if (x, y) ∈ pixel j and 0 otherwise
g_i = Σ_{j=1}^N H_ij f_j + ε_i  →  g = Hf + ε
Case study: reconstruction from 2 projections
g_1(x) = ∫ f(x, y) dy, g_2(y) = ∫ f(x, y) dx
Very ill-posed inverse problem:
f(x, y) = g_1(x) g_2(y) Ω(x, y), where Ω(x, y) is a copula: ∫ Ω(x, y) dx = 1 and ∫ Ω(x, y) dy = 1.
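In the discrete setting the two projections are simply the row and column sums of the image. The sketch below builds them for a toy phantom (sizes chosen arbitrarily) and checks that the separable solution g_1(x) g_2(y) / Σ g_1, i.e. the uniform-Ω member of the solution family, reproduces both projections exactly.

```python
import numpy as np

# Two projections of a discrete image = its row and column sums (sketch).
f_phantom = np.zeros((64, 64)); f_phantom[20:40, 25:50] = 1.0
g1 = f_phantom.sum(axis=1)                 # projection along y: one value per row x
g2 = f_phantom.sum(axis=0)                 # projection along x: one value per column y
f_indep = np.outer(g1, g2) / g1.sum()      # separable solution (uniform Omega)
assert np.allclose(f_indep.sum(axis=1), g1) and np.allclose(f_indep.sum(axis=0), g2)
```

Any image with these same row and column sums is an equally valid solution; this indeterminacy is exactly what the hierarchical priors of the following slides are meant to resolve.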

38 Simple example
A 2×2 image (f_1, ..., f_4) is observed through its horizontal and vertical sums (g_1, ..., g_4): Hf = g, and f = H^{-1} g only if H is invertible. Here H is rank deficient, so the problem has an infinite number of solutions; the same holds for the 3×3 image (f_1, ..., f_9) observed through six projection sums (g_1, ..., g_6).
How to find all those solutions? Which one is the good one? This requires prior information: to find a unique solution, one needs either more data or prior information.

39 Application in CT: reconstruction from 2 projections
g = Hf + ε, with g|f ~ N(Hf, σ² I) (Gaussian)
f|z: iid Gaussian or Gauss-Markov; z: iid or Potts; c(r) = 1 − δ(z(r) − z(r')) ∈ {0, 1} (binary contour variable)
p(f, z, θ|g) ∝ p(g|f, θ_1) p(f|z, θ_2) p(z|θ_3) p(θ)

40 Proposed algorithms
p(f, z, θ|g) ∝ p(g|f, θ_1) p(f|z, θ_2) p(z|θ_3) p(θ)
MCMC based general scheme: f ~ p(f|ẑ, θ̂, g) → z ~ p(z|f̂, θ̂, g) → θ ~ p(θ|f̂, ẑ, g)
Iterative algorithm:
- Estimate f using p(f|ẑ, θ̂, g) ∝ p(g|f, θ̂) p(f|ẑ, θ̂): needs optimization of a quadratic criterion.
- Estimate z using p(z|f̂, θ̂, g) ∝ p(g|f̂, ẑ, θ̂) p(z): needs sampling of a Potts Markov field.
- Estimate θ using p(θ|f̂, ẑ, g) ∝ p(g|f̂, σ²_ε I) p(f̂|ẑ, (m_k, v_k)) p(θ): conjugate priors give analytical expressions.
Variational Bayesian Approximation: approximate p(f, z, θ|g) by q_1(f) q_2(z) q_3(θ).

41 Results
(Figure: reconstructions from two projections — Original, Backprojection, Filtered BP, LS, Gauss-Markov + positivity, GM + Line process, GM + Label process, together with the estimated hidden label field ẑ and contour field ĉ.)

42 Implementation issues
In almost all the algorithms, the computation of f̂ needs an optimization algorithm. The criterion to optimize is often of the form
J(f) = ||g − Hf||² + λ||Df||²
Very often we use gradient based algorithms, which need to compute
∇J(f) = −2 H^t (g − Hf) + 2λ D^t D f
So, in the simplest case, at each step we have
f̂^(k+1) = f̂^(k) + α^(k) [ H^t (g − H f̂^(k)) − λ D^t D f̂^(k) ]
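Written out in code, the iteration is just a gradient-descent loop in which the forward projection Hf and the backprojection H^t(·) are the two expensive operations. A sketch reusing H, g and the difference operator D from the earlier examples; the step size comes from a rough Lipschitz bound and is an assumption of this example.

```python
import numpy as np

# Gradient descent on J(f) = ||g - H f||^2 + lam ||D f||^2 (sketch).
lam = 0.1
L = 2 * (np.linalg.norm(H, 2) ** 2 + lam * np.linalg.norm(D, 2) ** 2)   # Lipschitz bound on grad J
alpha = 1.0 / L
f_k = np.zeros(H.shape[1])
for k in range(200):
    residual = g - H @ f_k                                   # forward projection + residual
    grad = -2 * H.T @ residual + 2 * lam * D.T @ (D @ f_k)   # backprojection + regularization term
    f_k = f_k - alpha * grad
```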

43 Gradient based algorithms
f̂^(k+1) = f̂^(k) + α [ H^t (g − H f̂^(k)) − λ D^t D f̂^(k) ]
1. Compute ĝ = H f̂ (forward projection)
2. Compute δg = g − ĝ (error or residual)
3. Compute δf_1 = H^t δg (backprojection of the error)
4. Compute δf_2 = D^t D f̂ (correction due to regularization)
5. Update f̂^(k+1) = f̂^(k) + α [δf_1 − λ δf_2]
(Block diagram: initial estimated guess image f̂^(0) → forward projection ĝ = H f̂^(k) → compare with the measured projections g → correction term in projection space δg = g − ĝ → backprojection → correction term in image space δf = H^t δg − λ D^t D f̂^(k) → updated image f̂^(k+1).)
Steps 1 and 3 have the greatest computational cost and have been implemented on GPU.

44 Multi-resolution implementation
Scale 1 (black): g^(1) = H^(1) f^(1), grid N × N
Scale 2 (green): g^(2) = H^(2) f^(2), grid N/2 × N/2
Scale 3 (red): g^(3) = H^(3) f^(3), grid N/4 × N/4
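A coarse-to-fine sketch of the same idea on the earlier toy problem: solve on an N/4 grid, upsample the result as the initial guess for N/2, then for N. It reuses projection_matrix, H, g, angles and n from the sketches above; detector bins are simply averaged when going to a coarser scale, and an identity regularizer replaces D for brevity. This is purely illustrative and not the GPU scheme of the talk.

```python
import numpy as np

def solve_scale(Hs, gs, f0, lam=0.1, n_iter=100):
    """Gradient descent at one scale, warm-started from f0 (sketch)."""
    alpha = 1.0 / (2 * (np.linalg.norm(Hs, 2) ** 2 + lam))
    f = f0.copy()
    for _ in range(n_iter):
        f -= alpha * (-2 * Hs.T @ (gs - Hs @ f) + 2 * lam * f)
    return f

n_angles, n_det = len(angles), n
f_est = np.zeros((n // 4) ** 2)
for ns in (n // 4, n // 2, n):
    Hs = projection_matrix(ns, angles, n_det=ns)                 # coarse-scale operator
    gs = g.reshape(n_angles, ns, -1).mean(axis=2).ravel()        # re-binned (averaged) detector data
    f_est = solve_scale(Hs, gs, f_est)
    if ns < n:                                                   # upsample as the next initial guess
        f_est = np.kron(f_est.reshape(ns, ns), np.ones((2, 2))).ravel()
```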

45 Conclusions
- Computed Tomography is an inverse problem.
- Analytical methods have many limitations; algebraic methods push these limitations further; deterministic regularization methods push the limitations of ill-conditioning still further.
- The probabilistic, and in particular the Bayesian, approach has great potential.
- Hierarchical prior models with hidden variables are very powerful tools for the Bayesian approach to inverse problems.
- Gauss-Markov-Potts models for images incorporate hidden regions and contours.
- Main Bayesian computation tools: JMAP, MCMC and VBA.
- Applications in different imaging systems: X-ray CT, microwaves, PET, ultrasound, Optical Diffusion Tomography (ODT), acoustic source localization, ...
- Current projects: efficient implementation in 2D and 3D cases.
