Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis


1 Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis
Stéphanie Allassonnière, CIS, JHU, July 15th, 2008

2 Context : Computational Anatomy

Context and motivations :
* Describing shapes
* Shape matching : many elaborate registration theories
* Defining and inferring a population average (atlas) :
   - statistical estimation in the presence of unobserved variables
   - a proper statistical model = a generative model
   - consistency issues addressed
* Discrimination / Classification

3 Outline

Three generative statistical models and stochastic algorithms :
1 Bayesian Mixed Effect (BME) gray level Template
   1.1 Mathematical framework for deformable models
   1.2 Past approaches to compute a population average
   1.3 Generative statistical models
   1.4 Statistical estimation of the model parameters
   1.5 Experiments on the USPS database and 2D medical images
2 Bayesian Mixed Effect DTI Template
3 Noisy ICA


5 Warping and past approaches

Linearized deformations (non-rigid) : let ψ be the deformation from I_0 to I_1 and v a vector field :
ψ = Id + v
The resulting deformed image is given by :
I_1(x) ≈ I_0(x - v(x))
Advantages :
* Easy to use in computations
* Relevant for certain classes of shapes
Drawbacks :
* Invertibility is not guaranteed : overlaps may occur
* The shape topology may change!
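As a short illustration of the linearized warp I_1(x) ≈ I_0(x - v(x)), here is a minimal Python sketch (not from the talk; the bilinear interpolation and the shear field are illustrative choices):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_image(I0, v):
    """Apply a linearized deformation: I1(x) ~= I0(x - v(x)).

    I0 : (H, W) gray-level image.
    v  : (2, H, W) displacement field (row and column components).
    Interpolation is required because x - v(x) falls between pixel centers.
    """
    H, W = I0.shape
    rows, cols = np.mgrid[0:H, 0:W].astype(float)
    coords = np.stack([rows - v[0], cols - v[1]])   # evaluate I0 at x - v(x)
    return map_coordinates(I0, coords, order=1, mode='nearest')

# Example: a smooth horizontal shear; a large shear illustrates the drawback
# that Id - v may fail to be invertible and fold the image onto itself.
I0 = np.random.rand(64, 64)
v = np.zeros((2, 64, 64))
v[1] = 3.0 * np.sin(np.linspace(0.0, np.pi, 64))[:, None]
I1 = warp_image(I0, v)
```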

6 Warping and past approaches

Some solutions previously given (only for the template!) :
* Use one of the data images y_1, ..., y_n.


10 Warping and past approaches

Some solutions previously given (only for the template!) :
* Procrustes mean :
(Î, v̂_1, ..., v̂_n) = argmin_{I, v_1, ..., v_n} Σ_{i=1}^n ( (1/2) ||v_i||_V^2 + (1/(2σ^2)) ||y_i - φ_{v_i} · I||^2 )
A statistical interpretation (Glasbey & Mardia) :
y_i(x + v_i(x)) = I(x) + ε_x,  ε_x ~ N(0, σ^2),  ∀x ∈ Λ
Problems :
* Needs interpolation
* Unobserved warping variables
* Not a generative statistical model

11 Warping and past approaches

Issues :
* Describe the database as an i.i.d. sample from a parametric generative statistical model.
* Model parameters = template + deformation law (which defines the space V) + noise.
* Learn the parameters so as to avoid the previous problems :
   - an intrinsic template I_0,
   - a weighting term σ^2,
   - a global geometric behavior in the class.

12 Statistical Generative Model

Generative statistical model, deformable template framework :
y_i(r_{j,k}) = I_0(r_{j,k} - v_i(r_{j,k})) + σ ε_{j,k}
Conditions chosen for the template and the deformation fields : a parametric model of splines. Let (p_k)_{1≤k≤k_p} and (g_k)_{1≤k≤k_g} be two sets of control points (fixed, uniformly distributed) :
I_0(x) = (K_p α)(x) = Σ_{k=1}^{k_p} K_p(x, p_k) α(k),
v_β(x) = (K_g β)(x) = Σ_{k=1}^{k_g} K_g(x, g_k) β(k).
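A minimal numerical sketch of this spline parametrization (the Gaussian kernels, grid sizes and length scales are illustrative assumptions; the slide only fixes uniformly distributed control points):

```python
import numpy as np

def gaussian_kernel(x, centers, s):
    """K(x, c) = exp(-||x - c||^2 / (2 s^2)) for points x and centers c."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * s ** 2))

def grid(n):
    """n x n uniform grid of control points in [0, 1]^2 (fixed, as on the slide)."""
    ax = np.linspace(0.0, 1.0, n)
    return np.stack(np.meshgrid(ax, ax), axis=-1).reshape(-1, 2)

p = grid(8)   # k_p = 64 photometric control points
g = grid(5)   # k_g = 25 geometric control points

def template(x, alpha, sp=0.08):
    """I_alpha(x) = sum_k K_p(x, p_k) alpha(k) : gray levels at points x."""
    return gaussian_kernel(x, p, sp) @ alpha

def deformation(x, beta, sg=0.2):
    """v_beta(x) = sum_k K_g(x, g_k) beta(k) : 2D displacements at points x."""
    return gaussian_kernel(x, g, sg) @ beta   # beta has shape (k_g, 2)
```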

13 Statistical Generative Model

What to learn?
* Photometry : α, to code the template ; σ^2, the noise variance.
* Geometry : β_1^n, to code each deformation.
BUT : this does not give the geometrical behavior of the training set.
⇒ Introduce a prior on β : β_i ~ ν(dβ).
The parameters of ν become parameters to learn, and the deformations (β_i)_{1≤i≤n} become random hidden variables.

14 Statistical Generative Model

Generative model, one component per class :
(Γ_g, θ_p) ~ ν_g ⊗ ν_p  with θ_p = (α, σ^2)
β_1^n ~ ⊗_{i=1}^n N(0, Γ_g) | Γ_g
y_1^n ~ ⊗_{i=1}^n N(v_{β_i} · I_α, σ^2 Id_Λ) | β_1^n, θ_p
where ν_g(dΓ_g) and ν_p(dσ^2, dα) are prior laws on the parameters.
Remark : big structures have to be learned, even with a small training set.
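To make the hierarchy concrete, a sketch of one draw from this single-component model, reusing the template, deformation and grid helpers from the spline sketch above (all parameter values are arbitrary illustrations):

```python
import numpy as np

def sample_image(alpha, Gamma_g, sigma2, pixels, rng):
    """One draw: beta ~ N(0, Gamma_g), then y(r) = I_alpha(r - v_beta(r)) + noise."""
    k_g = Gamma_g.shape[0] // 2
    beta = rng.multivariate_normal(np.zeros(2 * k_g), Gamma_g).reshape(k_g, 2)
    warped = pixels - deformation(pixels, beta)       # r - v_beta(r)
    noise = np.sqrt(sigma2) * rng.standard_normal(len(pixels))
    return template(warped, alpha) + noise

rng = np.random.default_rng(0)
pixels = grid(32)                                     # 32 x 32 pixel lattice
y = sample_image(rng.random(64), 0.01 * np.eye(50), 1e-3, pixels, rng)
```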

15 Statistical Generative Model

Generative model (2), general case : mixtures of deformable templates (τ_m components per class).
Hidden random variables : the deformations (β_i)_{1≤i≤n} and the image labels (τ_i)_{1≤i≤n}.
ρ ~ ν_ρ
θ = (θ_g^τ, θ_p^τ)_{1≤τ≤τ_m} ~ ⊗_{τ=1}^{τ_m} (ν_g ⊗ ν_p)
τ_1^n ~ ⊗_{i=1}^n Σ_{τ=1}^{τ_m} ρ_τ δ_τ | ρ
β_1^n ~ ⊗_{i=1}^n N(0, Γ_g^{τ_i}) | θ, τ_1^n
y_1^n ~ ⊗_{i=1}^n N(v_{β_i} · I_{α_{τ_i}}, σ_{τ_i}^2 Id_Λ) | β_1^n, θ, τ_1^n

16 Statistical Generative Model

[Diagram : the hyperparameters (ρ, α, Γ, σ^2)_1, ..., (ρ, α, Γ, σ^2)_m are the population factors (fixed effects) ; they generate the individual factors (τ_i, β_i, ε_i)_{1≤i≤n} (random effects), which are unobserved and in turn generate the observations y_1, ..., y_n.]
Fig.: Mixed effect structure for our BME-template.

17 Estimation

How to learn the parameters? The MAP estimator : the parameters θ are estimated by maximizing the posterior likelihood :
θ̂ = argmax_θ p(θ | y)
where θ ∈ Θ = { (α, σ^2, Γ_g) : α ∈ R^{k_p}, σ^2 > 0, Γ_g ∈ Sym_{2k_g}^+(R) },
and Sym_{2k_g}^+(R) is the set of 2k_g × 2k_g positive definite symmetric matrices.
Let Θ* = { θ* ∈ Θ : E_P(log q(y | θ*)) = sup_{θ∈Θ} E_P(log q(y | θ)) },
where P denotes the distribution governing the observations.

18 Estimation

How to proceed in practice? Since β_1^n are unobserved variables, a natural approach to reach the MAP estimator is the EM algorithm. Iteration l of the algorithm :
* E step : compute the posterior law of β_i, i = 1, ..., n.
* M step : parameter update :
θ_{l+1} = argmax_θ E[ log q(θ, β_1^n, y_1^n) | y_1^n, θ_l ].
BUT : the E step is not tractable!

19 Estimation

Details of the maximization step, geometry :
Γ_{g,l+1} = (1 / (n + a_g)) ( n [ββ^t]_l + a_g Σ_g ),
where
[ββ^t]_l = (1/n) Σ_{i=1}^n ∫ ββ^t ν_{l,i}(β) dβ
is the empirical covariance matrix with respect to the posterior density. Importance of the prior!
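In code, this update is a one-line convex combination of the empirical posterior second moment and the prior matrix; a sketch (the Monte Carlo stand-in for the posterior integral is an assumption of mine):

```python
import numpy as np

def update_Gamma(post_samples, a_g, Sigma_g):
    """Gamma_{g,l+1} = (n [beta beta^t]_l + a_g Sigma_g) / (n + a_g).

    post_samples : list of (n_mc, 2 k_g) arrays, one per image, drawn from
                   the posterior of beta_i (approximating the integral).
    The prior term a_g Sigma_g keeps the update positive definite even when
    n is small compared with the dimension 2 k_g: hence the prior's importance.
    """
    n = len(post_samples)
    emp = sum(b.T @ b / len(b) for b in post_samples) / n   # [beta beta^t]_l
    return (n * emp + a_g * Sigma_g) / (n + a_g)
```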

20 Estimation

Solution proposed : a stochastic version of the EM algorithm. Idea : couple SAEM with an MCMC procedure (Delyon, Lavielle, Moulines ; Kuhn, Lavielle). One-component case, iteration l → l + 1 of the algorithm :
* Simulation step : β_{l+1} ~ Π_{θ_l}(β_l, ·), where Π_{θ_l}(β_l, ·) is the transition kernel of a convergent Markov chain having the posterior distribution as stationary distribution [here, given by a hybrid Gibbs sampler].
* Stochastic approximation : Q_{l+1}(θ) = Q_l(θ) + Δ_l [ log q(y, β_{l+1}, θ) - Q_l(θ) ], where (Δ_l) is a decreasing sequence of positive step sizes.
* Maximization step : θ_{l+1} = argmax_θ Q_{l+1}(θ).

21 Estimation

Stochastic version of the EM algorithm (2) : since our model is an exponential model,
q(y, β, θ) = exp{ -ψ(θ) + ⟨ S(y, β), φ(θ) ⟩ },
the stochastic approximation can be carried on the sufficient statistics S, so that the algorithm runs via :
s_{l+1} = s_l + Δ_l ( S(y, β_{l+1}) - s_l ).
Let L(s, θ) = -ψ(θ) + ⟨ s, φ(θ) ⟩, ℓ(θ) = log q(y, θ) and θ̂(s) = argmax_θ L(s, θ) ; then
θ_{l+1} = θ̂(s_{l+1}).
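Putting slides 20 and 21 together, a schematic MCMC-SAEM loop in its sufficient-statistics form (all callables are placeholders; step sizes 1/l are just one admissible choice of decreasing sequence):

```python
def mcmc_saem(y, theta0, beta0, kernel, suff_stat, theta_hat, n_iter=200):
    """Schematic MCMC-SAEM, one-component case.

    kernel(beta, theta) : one sweep of a Markov kernel Pi_theta whose
                          stationary law is the posterior of beta given y
                          (on slide 20, a hybrid Gibbs sampler).
    suff_stat(y, beta)  : the exponential-family statistics S(y, beta).
    theta_hat(s)        : the closed-form maximizer of L(s, theta).
    """
    theta, beta = theta0, beta0
    s = suff_stat(y, beta)
    for l in range(1, n_iter + 1):
        beta = kernel(beta, theta)                    # simulation step (MCMC)
        s = s + (1.0 / l) * (suff_stat(y, beta) - s)  # stochastic approximation
        theta = theta_hat(s)                          # maximization step
    return theta
```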

22 Estimation

Stochastic approximation with truncation on random boundaries : let (K_κ)_{κ≥0} be an increasing sequence of compact sets. Set κ_0 = 0, s_0 ∈ K_0 and β_0 ∈ K. For k ≥ 1 :
* compute s̄ = s_{k-1} + Δ_{k-1} ( S(y, β̄) - s_{k-1} ), where β̄ is sampled from the transition kernel Π_{θ_{k-1}}(β_{k-1}, ·) ;
* if s̄ ∈ K_{κ_{k-1}} and ||s̄ - s_{k-1}|| ≤ ε_{k-1}, set (s_k, β_k) = (s̄, β̄) and κ_k = κ_{k-1} ;
* else set (s_k, β_k) = (s̃, β̃) ∈ K_0 × K and κ_k = κ_{k-1} + 1, where (s̃, β̃) can be chosen in different ways.
θ_k = argmax_θ L(s_k, θ).
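A sketch of the truncation test itself, with the compact sets K_κ taken as nested norm balls (that specific choice, and the reset rule, are illustrative assumptions):

```python
import numpy as np

def truncated_step(s_prev, beta_prev, theta, step, kappa, eps, radii,
                   kernel, suff_stat, reset):
    """One SAEM iteration with truncation on random boundaries.

    radii[kappa] : radius of the current compact set K_kappa (norm balls here);
    eps          : the threshold epsilon_{k-1};
    reset()      : returns a fresh (s, beta) inside K_0 x K after a rejection.
    """
    beta = kernel(beta_prev, theta)
    s = s_prev + step * (suff_stat(beta) - s_prev)
    if np.linalg.norm(s) <= radii[kappa] and np.linalg.norm(s - s_prev) <= eps:
        return s, beta, kappa                # accept the move
    s, beta = reset()                        # project back, count a restart
    return s, beta, kappa + 1
```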

23 Multicomponent Model

Stochastic version of the EM algorithm for the multicomponent model : the intuitive generalization runs into the problem of trapping states. Image analysis interpretation : each iteration deforms the data to bring it closer to its current component, so the high-dimensional hidden variable β does not tend to move toward another component.
Solution proposed : use another simulation method, based on the Gibbs sampler for the deformation and on another law for the class of a given image.

24 Multicomponent Model

The new algorithm : transition step l → l + 1 using a hybrid Gibbs sampler on (β, τ).
* For each τ : run the hybrid Gibbs sampler on β given τ for N_l iterations, yielding β̂_τ^{(l+1)} ~ Π^{N_l}(· | τ).
* Draw τ^{(l+1)} from the discrete law with weights
p_{N_l}(τ) ∝ ( (1/N_l) Σ_{i_mc=1}^{N_l} [ f(β̂_{τ,(i_mc)}) / q(y, β̂_{τ,(i_mc)}, τ | θ, ρ) ] )^{-1}.
* Set β^{(l+1)} = β̂_{τ^{(l+1)},(N_l)}.
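A sketch of this transition, following the reconstructed weight formula above (gibbs_sweep, f and q_joint are placeholder callables, and the inverse-mean weighting mirrors the slide's expression as best it can be read from the transcription):

```python
import numpy as np

def transition(y, beta, theta, rho, gibbs_sweep, f, q_joint, n_classes, N_l,
               rng=np.random.default_rng()):
    """Joint simulation of (beta, tau): one Gibbs chain of length N_l per class,
    then a class draw from the reweighted discrete law."""
    chains, weights = [], np.empty(n_classes)
    for tau in range(n_classes):
        b, samples = beta, []
        for _ in range(N_l):
            b = gibbs_sweep(b, tau, theta)     # hybrid Gibbs sweep given tau
            samples.append(b)
        chains.append(samples)
        ratios = [f(s) / q_joint(y, s, tau, theta, rho) for s in samples]
        weights[tau] = 1.0 / np.mean(ratios)   # inverse of the empirical mean
    tau_new = rng.choice(n_classes, p=weights / weights.sum())
    return chains[tau_new][-1], tau_new        # beta^(l+1): last state of the chosen chain
```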

25 Multicomponent Model

Theoretical results : with these models and algorithms, we have proved important asymptotic results :
* Consistency of the MAP estimator
* Convergence of both stochastic algorithms

26 Experiments

MCMC-SAEM, template estimation.
Fig.: Left : one-component prototype. Right : two-component prototypes.

27 Experiments

MCMC-SAEM, the geometric distribution.
Fig.: Left : two examples from the training set. Right : two examples drawn from the prior geometric distribution.

28 Experiments

MCMC-SAEM, geometry. [Figure]

29 Experiments

In the presence of noise. [Figure]

30 Experiments

Medical images : splenium of the corpus callosum. 47 images of the corpus callosum (and part of the cerebellum).
Fig.: Left : gray-level mean of the 47 images. Right : template estimated with the stochastic algorithm.

31 Experiments

Medical images : splenium of the corpus callosum. The 47 images of the corpus callosum (and part of the cerebellum) are clustered into 2 components by the multicomponent model.
Fig.: Results from the estimation with the stochastic algorithm. Left : component 1. Right : component 2.

32 Experiments

Robustness of the algorithm : same hyper-parameters as for the previous gray-level images. [Figure]

33 Outline

Three generative statistical models and stochastic algorithms :
1 Bayesian Mixed Effect (BME) gray level Template
   1.1 Mathematical framework for deformable models
   1.2 Past approaches to compute a population average
   1.3 Generative statistical models
   1.4 Statistical estimation of the model parameters
   1.5 Experiments on the USPS database and 2D medical images
2 Bayesian Mixed Effect DTI Template
3 Noisy ICA

34 BME-DTI Template

Bayesian Mixed Effect DTI Template, goals of our approach :
* Estimate a template of Diffusion Tensor Images (DTI) on a given region of the anatomy.
* Use only the Diffusion Weighted Images (DWIs), i.e. the real vector of responses to a set of different gradient directions.

35 BME-DTI Template

[Diagram, previous approach : for each subject i, the DWIs (gradients G1, ..., Gm) are converted into a tensor image DTI_i by least-squares approximation ; the DTI template is then obtained as a mean of the DTI_i through energy minimization.]

36 BME-DTI Template

[Diagram, our approach : the DWIs (gradients G1, ..., Gm) are the observed data and the subject DTIs remain hidden ; the DTI template is estimated directly by maximum likelihood.]

37 BME-DTI Template

Results on synthetic data : two different template tensors, FA_1 = 0.6791, FA_2 = 0.6918, ADC_1 = 0.538 (the ADC_2 value did not survive transcription). Random samples with 15 subjects each.
[Table : bias, variance and MSE of the FA and ADC estimates for the LS, FAM-EM and SAEM estimators ; numeric entries lost in transcription.]

38 BME-DTI Template

Fig.: Oriented ellipsoids : true ellipsoid, least-squares ellipsoid, mode ellipsoid, stochastic ellipsoid.

39 Outline

Three generative statistical models and stochastic algorithms :
1 Bayesian Mixed Effect (BME) gray level Template
   1.1 Mathematical framework for deformable models
   1.2 Past approaches to compute a population average
   1.3 Generative statistical models
   1.4 Statistical estimation of the model parameters
   1.5 Experiments on the USPS database and 2D medical images
2 Bayesian Mixed Effect DTI Template
3 Noisy ICA

40 Noisy ICA

Model : observations X_1^n such that X_i = A Y_i + σ ε_i, with A the source (mixing) matrix, σ^2 the variance of the Gaussian noise, and Y_1^n the hidden variables.
Y_{1,1}^{n,p} ~ ⊗_{i=1}^n ⊗_{j=1}^p ν_η | η
X_1^n ~ ⊗_{i=1}^n N(A Y_i, σ^2 Id) | A, σ^2, Y_1^n
Various choices of the distribution ν_η are possible. The same MCMC-SAEM algorithm treats this estimation problem.
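A sketch of sampling from this model (the Laplace law for ν_η is only one of the "various choices" mentioned, picked here for illustration):

```python
import numpy as np

def sample_noisy_ica(A, sigma2, n, rng=np.random.default_rng(0)):
    """Draw n observations X_i = A Y_i + sigma eps_i with hidden sources Y."""
    d, p = A.shape
    Y = rng.laplace(size=(n, p))                  # i.i.d. sources, Y_ij ~ nu_eta
    X = Y @ A.T + np.sqrt(sigma2) * rng.standard_normal((n, d))
    return X, Y

# Example: three observed channels mixing p = 2 independent components.
A = np.array([[1.0, 0.5], [0.2, 1.0], [0.7, 0.3]])
X, Y = sample_noisy_ica(A, sigma2=0.1, n=500)
```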

41 Noisy ICA

Experiments : 11 subjects, 2 independent components. [Figure]

42 Conclusion

* A generative statistical model provides a proper statistical framework for designing and inferring a population average.
* The stochastic algorithm has multiple uses, even in difficult conditions.
Thank you!


44 MCMC-SAEM : The Geometric Distribution

Between 2 classes : [Figure]
Between 2 components in the same class : [Figure]
