Bayesian inference J. Daunizeau

Size: px

Start display at page:

Download "Bayesian inference J. Daunizeau"

Evelyn Arnold
5 years ago
Views:

1 Bayesian inference J. Daunizeau Brain and Spine Institute, Paris, France Wellcome Trust Centre for Neuroimaging, London, UK

2 Overview of the talk 1 Probabilistic modelling and representation of uncertainty 1.1 Bayesian paradigm 1.2 Hierarchical models 1.3 Frequentist versus Bayesian inference 2 Numerical Bayesian inference methods 2.1 Sampling methods 2.2 Variational methods (ReML, EM, VB) 3 SPM applications 3.1 amri segmentation 3.2 Decoding of brain images 3.3 Model-based fmri analysis (with spatial priors) 3.4 Dynamic causal modelling

3 Overview of the talk 1 Probabilistic modelling and representation of uncertainty 1.1 Bayesian paradigm 1.2 Hierarchical models 1.3 Frequentist versus Bayesian inference 2 Numerical Bayesian inference methods 2.1 Sampling methods 2.2 Variational methods (ReML, EM, VB) 3 SPM applications 3.1 amri segmentation 3.2 Decoding of brain images 3.3 Model-based fmri analysis (with spatial priors) 3.4 Dynamic causal modelling

numbers (D1) - should conform with intuition (D2) - should be

4 Bayesian paradigm probability theory: basics Degree of plausibility desiderata: - should be represented using real numbers (D1) - should conform with intuition (D2) - should be consistent (D3) normalization: a=2 marginalization: a=2 b=5 conditioning : (Bayes rule)

5 Bayesian paradigm deriving the likelihood function - Model of data with unknown parameters: y = f ( θ ) e.g., GLM: f ( θ ) = Xθ θ - But data is noisy: y = f ( θ ) + ε - Assume noise/residuals is small : f p( ε ) exp 1 2 ε 2 2σ ( σ ) P ε > ε Distribution of data, given fixed parameters: 1 p y y f 2 2σ ( θ ) exp ( ( θ )) 2

6 Bayesian paradigm likelihood, priors and the model evidence Likelihood: Prior: θ generative model m Bayes rule:

7 Bayesian paradigm forward and inverse problems forward problem ( ϑ, m) p y likelihood posterior distribution p ( ϑ y, m) inverse problem

8 Bayesian paradigm model comparison Principle of parsimony : «plurality should not be assumed without necessity» Model evidence: y=f(x) y = f(x) x model evidence p(y m) Occam s razor : space of all data sets

9 Hierarchical models principle inference causality

10 Hierarchical models directed acyclic graphs (DAGs)

define the null, e.g.: p( t H 0 ) Frequentist versus Bayesian inference a (quick) note on hypothesis testing H 0 :θ = 0 define two alternative models, e.g.: ( θ ) m : p m 0 0 ( θ ) = ( Σ) m : p m N 0, 1 1 1 if θ = 0 = 0 otherwise if ( * 0 ) t * P t > t H α ( > * 0 ) P t t H ( ) t t Y estimate parameters (obtain test stat.

11 define the null, e.g.: p( t H 0 ) Frequentist versus Bayesian inference a (quick) note on hypothesis testing H 0 :θ = 0 define two alternative models, e.g.: ( θ ) m : p m 0 0 ( θ ) = ( Σ) m : p m N 0, if θ = 0 = 0 otherwise if ( * 0 ) t * P t > t H α ( > * 0 ) P t t H ( ) t t Y estimate parameters (obtain test stat.) apply decision rule, i.e.: then reject H0 classical (null) hypothesis testing apply decision rule, e.g.: ( ) ( 1 ) y p ( Y m 0 ) p ( Y m 1 ) Bayesian Model Comparison Y space of all datasets P m0 y if α then accept m 0 P m y

12 Family-level inference trading model resolution against statistical power P(m 1 y) = 0.04 P(m 2 y) = 0.25 A B A B model selection error risk: ( = 1 ) = 1 max ( ) P e y P m y = 0.3 m P(m 2 y) = 0.01 P(m 2 y) = 0.7 A B A B u u

13 Family-level inference trading inference resolution against statistical power P(m 1 y) = 0.04 P(m 2 y) = 0.25 A B A B model selection error risk: ( = 1 ) = 1 max ( ) P e y P m y = 0.3 m P(m 2 y) = 0.01 P(m 2 y) = 0.7 family inference (pool statistical evidence) A B A B ( ) = P ( m y) P f y m f u P(f 1 y) = 0.05 u P(f 2 y) = 0.95 ( = 1 ) = 1 max ( ) P e y P f y = 0.05 f

14 Group-level model comparison preliminary: Polya s urn m m r i i = 1 = 0 i th marble is blue i th marble is purple = proportion of blue marbles in the urn r (binomial) probability of drawing a set of n marbles: m1 m2 mn n m ( ) ( 1 ) 1 i = p m r r r i= 1 m i Thus, our belief about the proportion of blue marbles is: i= 1 ( ) 1 n p r n m 1 mi 1 i p ( r m) p ( r) r ( 1 r) E r m = m n i= 1 i

15 Group-level model comparison what if we are colour blind? At least, we can measure how likely is the i th subject s data under each model! p( y1 m 1) p( y2 m2 ) p( yi m i ) p( yn mn ) m1 m2 y 1 y 2 r mn y n n (, ) ( ) ( i i ) ( i ) p r m y p r p y m p m r i= 1 Our belief about the proportion of models is: ( ) = p ( r, m y) p r y m Exceedance probability: ϕk = P( rk > rk ' k y)

16 Overview of the talk 1 Probabilistic modelling and representation of uncertainty 1.1 Bayesian paradigm 1.2 Hierarchical models 1.3 Frequentist versus Bayesian inference 2 Numerical Bayesian inference methods 2.1 Sampling methods 2.2 Variational methods (ReML, EM, VB) 3 SPM applications 3.1 amri segmentation 3.2 Decoding of brain images 3.3 Model-based fmri analysis (with spatial priors) 3.4 Dynamic causal modelling

17 Sampling methods MCMC example: Gibbs sampling

18 Variational methods VB / EM / ReML VB : maximize the free energy F(q) w.r.t. the approximate posterior q(θ) under some (e.g., mean field, Laplace) simplifying constraint θ 2 θ 1 p p ( θ1, θ2 y, m) ( θ1 or 2 y, m) q( θ 1 or 2 )

19 Overview of the talk 1 Probabilistic modelling and representation of uncertainty 1.1 Bayesian paradigm 1.2 Hierarchical models 1.3 Frequentist versus Bayesian inference 2 Numerical Bayesian inference methods 2.1 Sampling methods 2.2 Variational methods (ReML, EM, VB) 3 SPM applications 3.1 amri segmentation 3.2 Decoding of brain images 3.3 Model-based fmri analysis (with spatial priors) 3.4 Dynamic causal modelling

20 segmentation segmentation and and normalisation normalisation posterior posterior probability probability maps maps (PPMs) (PPMs) dynamic dynamic causal causal modelling modelling multivariate multivariate decoding decoding realignment smoothing general linear model normalisation statistical inference p <0.05 Gaussian field theory template

21 class variances amri segmentation mixture of Gaussians (MoG) model σ 1 σ 2 σ k µ 1 µ 2 i th voxel label y i c i λ i th voxel value class frequencies µ k class means grey matter white matter CSF

22 fixation cross + Decoding of brain images recognizing brain states from fmri pace >> response log-evidence of X-Y sparse mappings: effect of lateralization log-evidence of X-Y bilateral mappings: effect of spatial deployment

fmri time series analysis spatial priors and model

regions best explained by short-term memory model prior

coeff AR coeff (correlated noise) long-term memory design

23 fmri time series analysis spatial priors and model comparison short-term memory design matrix (X) PPM: regions best explained by short-term memory model prior variance of data noise GLM coeff prior variance of GLM coeff AR coeff (correlated noise) long-term memory design matrix (X) PPM: regions best explained by long-term memory model fmri time series

attention 15 models marginal likelihood ln p y m ( ) 0.10 PPC 0.

24 Dynamic Causal Modelling network structure identification m 1 m 2 m 3 m 4 attention attention attention attention PPC PPC PPC PPC stim V1 V5 stim V1 V5 stim V1 V5 stim V1 V5 attention 15 models marginal likelihood ln p y m ( ) 0.10 PPC 0.39 estimated effective synaptic strengths for best model (m 4 ) stim 0.26 V V m 1 m 2 m 3 m 4

25 DCMs and DAGs a note on causality θ time θ 13 θ 32 t + t 3 3 t 0 u u θ13 θ3 t x& = f ( x, u, θ ) u t t

26 I thank you for your attention.

27 A note on statistical significance lessons from the Neyman-Pearson lemma Neyman-Pearson lemma: the likelihood ratio (or Bayes factor) test Λ = ( 1) ( 0 ) p y H p y H is the most powerful test of size α = u ( Λ ) p u H 0 to test the null. what is the threshold u, above which the Bayes factor test yields a error I rate of 5%? ROC analysis 1 - error II rate MVB (Bayes factor) u=1.09, power=56% CCA (F-statistics) F=2.20, power=20% error I rate

Bayesian inference J. Daunizeau

Bayesian inference J. Daunizeau Brain and Spine Institute, Paris, France Wellcome Trust Centre for Neuroimaging, London, UK Overview of the talk 1 Probabilistic modelling and representation of uncertainty