Bayesian inference J. Daunizeau

Size: px

Start display at page:

Download "Bayesian inference J. Daunizeau"

Phyllis Ross
5 years ago
Views:

1 Bayesian inference J. Daunizeau Brain and Spine Institute, Paris, France Wellcome Trust Centre for Neuroimaging, London, UK

2 Overview of the talk 1 Probabilistic modelling and representation of uncertainty 1.1 Bayesian paradigm 1.2 Hierarchical models 1.3 Frequentist versus Bayesian inference 2 Notes on Bayesian inference 2.1 Variational methods (ReML, EM, VB) 2.2 Family inference 2.3 Group-level model comparison 3 SPM applications 3.1 amri segmentation 3.2 Decoding of brain images 3.3 Model-based fmri analysis (with spatial priors) 3.4 Dynamic causal modelling

3 Overview of the talk 1 Probabilistic modelling and representation of uncertainty 1.1 Bayesian paradigm 1.2 Hierarchical models 1.3 Frequentist versus Bayesian inference 2 Notes on Bayesian inference 2.1 Variational methods (ReML, EM, VB) 2.2 Family inference 2.3 Group-level model comparison 3 SPM applications 3.1 amri segmentation 3.2 Decoding of brain images 3.3 Model-based fmri analysis (with spatial priors) 3.4 Dynamic causal modelling

should be consistent (D3) normalization:

4 Bayesian paradigm probability theory: basics Degree of plausibility desiderata: - should be represented using real numbers (D1) - should conform with intuition (D2) - should be consistent (D3) normalization: a=2 marginalization: a=2 b=5 conditioning : (Bayes rule)

5 Bayesian paradigm deriving the likelihood function - Model of data with unknown parameters: y f e.g., GLM: f X - But data is noisy: y f - Assume noise/residuals is small : f exp p P Distribution of data, given fixed parameters: p y y f exp 2

6 Bayesian paradigm likelihood, priors and the model evidence Likelihood: Prior: generative model m Bayes rule:

7 Bayesian paradigm forward and inverse problems forward problem p y, m likelihood posterior distribution p y, m inverse problem

8 y=f(x) y = f(x) Bayesian paradigm model comparison Principle of parsimony : «plurality should not be assumed without necessity» Model evidence: Occam s razor : x model evidence p(y m) space of all data sets

9 Hierarchical models principle inference causality

10 Hierarchical models directed acyclic graphs (DAGs)

11 define the null, e.g.: p t H 0 Frequentist versus Bayesian inference a (quick) note on hypothesis testing H 0 : 0 define two alternative models, e.g.: m : p m 0 0 m : p m N 0, if 0 0 otherwise if 0 t * P t t * H P t t * H 0 t t Y estimate parameters (obtain test stat.) apply decision rule, i.e.: then reject H0 classical (null) hypothesis testing apply decision rule, e.g.: y Bayesian Model Comparison p Y m 0 p Y m 1 Y space of all datasets P m0 y if then accept m 0 P m y 1

12 Overview of the talk 1 Probabilistic modelling and representation of uncertainty 1.1 Bayesian paradigm 1.2 Hierarchical models 1.3 Frequentist versus Bayesian inference 2 Notes on Bayesian inference 2.1 Variational methods (ReML, EM, VB) 2.2 Family inference 2.3 Group-level model comparison 3 SPM applications 3.1 amri segmentation 3.2 Decoding of brain images 3.3 Model-based fmri analysis (with spatial priors) 3.4 Dynamic causal modelling

g., mean field, Laplace) simplifying constraint 2 1

13 Variational methods VB / EM / ReML VB : maximize the free energy F(q) w.r.t. the approximate posterior q(θ) under some (e.g., mean field, Laplace) simplifying constraint 2 1 p p, y, m or 2, q 1 or 2 y m

14 Family-level inference trading inference resolution against statistical power P(m 1 y) = 0.04 P(m 2 y) = 0.25 A B A B model selection error risk: 1 1 max P e y P m y 0.3 m P(m 2 y) = 0.01 P(m 2 y) = 0.7 A B A B u u

15 Family-level inference trading inference resolution against statistical power P(m 1 y) = 0.04 P(m 2 y) = 0.25 A B A B model selection error risk: 1 1 max P e y P m y 0.3 m P(m 2 y) = 0.01 P(m 2 y) = 0.7 family inference (pool statistical evidence) A B A B Pm y P f y m f u P(f 1 y) = 0.05 u P(f 2 y) = max P e y P f y 0.05 f

16 Group-level model comparison preliminary: Polya s urn m m r i i 1 0 i th marble is blue i th marble is purple = proportion of blue marbles in the urn r (binomial) probability of drawing a set of n marbles: m1 m2 mn n m 1 1 i p m r r r i1 m i Thus, our belief about the proportion of blue marbles is: i1 1 n pr n m 1m 1 i i pr m prr 1 r E r m m n i1 i

17 Group-level model comparison what if we are colour blind? At least, we can measure how likely is the i th subject s data under each model! p y1 m1 p y2 m2 p yi mi p yn mn m1 m2 y 1 y 2 r mn y n n, i i i p r m y p r p y m p m r i1 Our belief about the proportion of models is: pr, m y p r y m Exceedance probability: k Prk rk ' k y

18 Overview of the talk 1 Probabilistic modelling and representation of uncertainty 1.1 Bayesian paradigm 1.2 Hierarchical models 1.3 Frequentist versus Bayesian inference 2 Notes on Bayesian inference 2.1 Variational methods (ReML, EM, VB) 2.2 Family inference 2.3 Group-level model comparison 3 SPM applications 3.1 amri segmentation 3.2 Decoding of brain images 3.3 Model-based fmri analysis (with spatial priors) 3.4 Dynamic causal modelling

19 segmentation and normalisation posterior probability maps (PPMs) dynamic causal modelling multivariate decoding realignment smoothing general linear model normalisation statistical inference p <0.05 Gaussian field theory template

20 class variances amri segmentation mixture of Gaussians (MoG) model 1 2 k 1 2 i th voxel label y i c i i th voxel value class frequencies k class means grey matter white matter CSF

21 fixation cross + Decoding of brain images recognizing brain states from fmri pace >> response log-evidence of X-Y sparse mappings: effect of lateralization log-evidence of X-Y bilateral mappings: effect of spatial deployment

22 fmri time series analysis spatial priors and model comparison short-term memory design matrix (X) PPM: regions best explained by short-term memory model prior variance of data noise GLM coeff prior variance of GLM coeff AR coeff (correlated noise) long-term memory design matrix (X) PPM: regions best explained by long-term memory model fmri time series

V5 15 10 models marginal likelihood ln p y m attention 0.10 PPC 1.25 0.39 0.

23 Dynamic Causal Modelling network structure identification m 1 m 2 m 3 m 4 attention attention attention attention PPC PPC PPC PPC stim V1 V5 stim V1 V5 stim V1 V5 stim V1 V models marginal likelihood ln p y m attention 0.10 PPC estimated effective synaptic strengths for best model (m 4 ) 5 stim 0.26 V V5 0 m 1 m 2 m 3 m 4

24 I thank you for your attention.

25 A note on statistical significance lessons from the Neyman-Pearson lemma 1 - error II rate Neyman-Pearson lemma: the likelihood ratio (or Bayes factor) test p y H p y H is the most powerful test of size 1 0 u p u H 0 to test the null. what is the threshold u, above which the Bayes factor test yields a error I rate of 5%? ROC analysis MVB (Bayes factor) u=1.09, power=56% CCA (F-statistics) F=2.20, power=20% error I rate

Bayesian inference J. Daunizeau

Bayesian inference J. Daunizeau Brain and Spine Institute, Paris, France Wellcome Trust Centre for Neuroimaging, London, UK Overview of the talk 1 Probabilistic modelling and representation of uncertainty