Bayesian inference. J. Daunizeau, Brain and Spine Institute, Paris, France; Wellcome Trust Centre for Neuroimaging, London, UK
Overview of the talk
1 Probabilistic modelling and representation of uncertainty
1.1 Bayesian paradigm
1.2 Hierarchical models
1.3 Frequentist versus Bayesian inference
2 Notes on Bayesian inference
2.1 Variational methods (ReML, EM, VB)
2.2 Family inference
2.3 Group-level model comparison
3 SPM applications
3.1 aMRI segmentation
3.2 Decoding of brain images
3.3 Model-based fMRI analysis (with spatial priors)
3.4 Dynamic causal modelling
Bayesian paradigm: probability theory basics

Degree-of-plausibility desiderata:
- should be represented using real numbers (D1)
- should conform with intuition (D2)
- should be consistent (D3)

Normalization: $\sum_a P(a) = 1$
Marginalization: $P(b) = \sum_a P(a,b)$
Conditioning (Bayes rule): $P(a \mid b) = \dfrac{P(a,b)}{P(b)}$
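The three rules above can be checked numerically on a discrete joint distribution; the joint table below is a made-up example, not taken from the talk:

```python
import numpy as np

# hypothetical joint P(a, b): rows index a in {0,1}, columns index b in {0,1,2}
P_ab = np.array([[0.10, 0.20, 0.10],
                 [0.05, 0.25, 0.30]])

# normalization: the joint must sum to one
assert np.isclose(P_ab.sum(), 1.0)

# marginalization: P(b) = sum_a P(a, b)
P_b = P_ab.sum(axis=0)

# conditioning (Bayes rule): P(a | b) = P(a, b) / P(b)
P_a_given_b = P_ab / P_b

print(P_b)                       # marginal over b
print(P_a_given_b.sum(axis=0))   # each column sums to 1
```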
Bayesian paradigm: deriving the likelihood function

- Model of data with unknown parameters: $y = f(\theta)$, e.g. GLM: $f(\theta) = X\theta$
- But data are noisy: $y = f(\theta) + \varepsilon$
- Assume noise/residuals are small and Gaussian: $p(\varepsilon) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\dfrac{\varepsilon^2}{2\sigma^2}\right)$

Distribution of data, given fixed parameters: $p(y \mid \theta) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\dfrac{(y - f(\theta))^2}{2\sigma^2}\right)$
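A minimal sketch of this likelihood for the GLM case; the design matrix, parameter values and noise variance below are illustrative assumptions, not the talk's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# GLM: f(theta) = X @ theta, with i.i.d. Gaussian noise of variance sigma2
n, p = 50, 2
X = rng.standard_normal((n, p))
theta_true = np.array([1.0, -0.5])
sigma2 = 0.25
y = X @ theta_true + np.sqrt(sigma2) * rng.standard_normal(n)

def log_likelihood(theta, y, X, sigma2):
    """log p(y | theta) for the Gaussian-noise GLM."""
    resid = y - X @ theta
    return -0.5 * (n * np.log(2 * np.pi * sigma2) + resid @ resid / sigma2)

# the likelihood is much higher at the true parameters than far from them
assert log_likelihood(theta_true, y, X, sigma2) > log_likelihood(np.zeros(p), y, X, sigma2)
```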
Bayesian paradigm: likelihood, priors and the model evidence

Likelihood: $p(y \mid \theta, m)$
Prior: $p(\theta \mid m)$
Together these define the generative model $m$.
Bayes rule: $p(\theta \mid y, m) = \dfrac{p(y \mid \theta, m)\, p(\theta \mid m)}{p(y \mid m)}$
Bayesian paradigm: forward and inverse problems

Forward problem: the likelihood $p(y \mid \theta, m)$
Inverse problem: the posterior distribution $p(\theta \mid y, m)$
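For a Gaussian prior and Gaussian likelihood the inverse problem has a closed-form solution; the scalar numbers below are a made-up illustration of the precision-weighted update:

```python
# Hypothetical scalar example: prior theta ~ N(mu0, s0), likelihood y | theta ~ N(theta, s)
mu0, s0 = 0.0, 4.0   # prior mean and variance
s = 1.0              # noise variance
y = 2.0              # observed datum

# inverse problem: Bayes rule gives the posterior in closed form (precisions add,
# and the posterior mean is a precision-weighted average of prior mean and datum)
post_prec = 1.0 / s0 + 1.0 / s
post_var = 1.0 / post_prec
post_mean = post_var * (mu0 / s0 + y / s)

print(post_mean, post_var)  # the datum is shrunk toward the prior mean
```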
Bayesian paradigm: model comparison

Principle of parsimony: "plurality should not be assumed without necessity"

Model evidence: $p(y \mid m) = \int p(y \mid \theta, m)\, p(\theta \mid m)\, d\theta$

Occam's razor: [Figure: model evidence $p(y \mid m)$ plotted over the space of all data sets; a simple model concentrates its evidence on the few data sets it can generate, whereas a flexible model spreads it thinly]
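The Occam effect can be shown by marginalizing numerically; the two toy models below (a point-mass model versus a diffuse-prior model for a single Gaussian datum) are assumptions chosen for illustration:

```python
import numpy as np

# Toy setting: datum y | theta ~ N(theta, 1).
# Simple model m0 fixes theta = 0; flexible model m1 has prior theta ~ N(0, 10).
def gauss(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

theta = np.linspace(-20, 20, 4001)
dtheta = theta[1] - theta[0]

def evidence_m1(y):
    # p(y | m1) = integral of p(y | theta) p(theta | m1) dtheta (numerical marginalization)
    return np.sum(gauss(y, theta, 1.0) * gauss(theta, 0.0, 10.0)) * dtheta

def evidence_m0(y):
    return gauss(y, 0.0, 1.0)

# Occam's razor: for data the simple model can explain, it has the higher evidence...
assert evidence_m0(0.5) > evidence_m1(0.5)
# ...but the flexible model wins on data far from zero
assert evidence_m1(5.0) > evidence_m0(5.0)
```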
Hierarchical models: principle. [Figure: a hierarchical generative model, with causality running top-down from causes to data and inference running bottom-up]
Hierarchical models: directed acyclic graphs (DAGs)
Frequentist versus Bayesian inference: a (quick) note on hypothesis testing

Classical (null) hypothesis testing:
- define the null, e.g. $H_0: \theta = 0$
- estimate parameters and obtain a test statistic $t(Y)$
- apply the decision rule: if $P(t > t^* \mid H_0) \le \alpha$ (e.g. 0.05), then reject $H_0$

Bayesian model comparison:
- define two alternative models, e.g. $m_0: p(\theta \mid m_0) = \delta(\theta)$ (all mass at $\theta = 0$) and $m_1: p(\theta \mid m_1) = N(0, 1)$
- compare the evidences $p(y \mid m_0)$ and $p(y \mid m_1)$ over the space of all data sets
- apply the decision rule, e.g.: if $P(m_0 \mid y)$ exceeds some threshold, then accept $m_0$
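A small sketch of how the two decision rules can disagree on the same datum; the scalar setting (unit-variance likelihood, unit-variance prior under $m_1$, equal prior odds) is an assumption for illustration:

```python
import math

# Hypothetical scalar setting: y | theta ~ N(theta, 1).
# H0 / m0: theta = 0 exactly;  m1: theta ~ N(0, 1) a priori.
y = 2.1  # observed datum / test statistic

# classical test: two-sided p-value under H0
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(y) / math.sqrt(2))))

# Bayesian comparison: under m1 the marginal of y is N(0, 1 + 1) = N(0, 2)
def gauss(x, var):
    return math.exp(-0.5 * x * x / var) / math.sqrt(2 * math.pi * var)

bayes_factor_01 = gauss(y, 1.0) / gauss(y, 2.0)    # p(y | m0) / p(y | m1)
post_m0 = bayes_factor_01 / (1 + bayes_factor_01)  # equal prior odds

# the classical rule rejects H0 (p < 0.05), yet m0 retains substantial posterior mass
print(p_value, post_m0)
```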
Overview of the talk: 2 Notes on Bayesian inference (2.1 Variational methods (ReML, EM, VB); 2.2 Family inference; 2.3 Group-level model comparison)
Variational methods: VB / EM / ReML

VB: maximize the free energy $F(q)$ with respect to the approximate posterior $q(\theta)$, under some simplifying constraint (e.g. mean field, Laplace), so that $q(\theta_1)$ and $q(\theta_2)$ approximate the true posterior marginals $p(\theta_1 \mid y, m)$ and $p(\theta_2 \mid y, m)$.
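The standard decomposition that makes this work, written out for reference (with $S(q)$ the entropy of $q$ and the mean-field factorization as the example constraint):

```latex
F(q) = \big\langle \ln p(y,\theta \mid m) \big\rangle_q + S(q)
     = \ln p(y \mid m) - D_{\mathrm{KL}}\!\big[\, q(\theta) \,\|\, p(\theta \mid y, m) \,\big]
\qquad \text{with, e.g., } q(\theta) = q(\theta_1)\, q(\theta_2).
```

Since the KL divergence is non-negative, $F(q) \le \ln p(y \mid m)$: maximizing $F$ both tightens a lower bound on the log-evidence and drives $q$ toward the true posterior.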
Family-level inference: trading inference resolution against statistical power

Posterior model probabilities (example): $P(m_1 \mid y) = 0.04$, $P(m_2 \mid y) = 0.25$, $P(m_3 \mid y) = 0.01$, $P(m_4 \mid y) = 0.70$

Model selection error risk: $P(e \mid y) = 1 - \max_m P(m \mid y) = 0.3$

Family inference (pool statistical evidence): $P(f \mid y) = \sum_{m \in f} P(m \mid y)$, giving $P(f_1 \mid y) = 0.05$ and $P(f_2 \mid y) = 0.95$, so the error risk drops to $P(e \mid y) = 1 - \max_f P(f \mid y) = 0.05$.

[Figure: four network models (A, B nodes) grouped into two families by the presence of the modulatory input u]
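The pooling step is just a sum over family members; the sketch below reproduces the slide's numbers, with the family assignment (models 1 and 3 in one family, 2 and 4 in the other) assumed from the stated family probabilities:

```python
import numpy as np

# posterior model probabilities from the slide's example
p_m = np.array([0.04, 0.25, 0.01, 0.70])   # models m1..m4
families = {"f1": [0, 2], "f2": [1, 3]}    # assumed grouping: f1 = {m1, m3}, f2 = {m2, m4}

# model-level selection error risk: 1 - max_m P(m | y)
risk_models = 1 - p_m.max()

# family-level inference pools evidence: P(f | y) = sum over member models
p_f = {f: p_m[idx].sum() for f, idx in families.items()}
risk_families = 1 - max(p_f.values())

print(risk_models, risk_families)  # 0.30 vs 0.05
```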
Group-level model comparison: preliminary, Polya's urn

Let $m_i = 1$ if the $i$-th marble is blue and $m_i = 0$ if it is purple, and let $r$ be the proportion of blue marbles in the urn.

(Binomial) probability of drawing a set of $n$ marbles: $p(m \mid r) = \prod_{i=1}^{n} r^{m_i} (1 - r)^{1 - m_i}$

Thus, our belief about the proportion of blue marbles is $p(r \mid m) \propto p(m \mid r)\, p(r)$; with a flat prior $p(r) = 1$, the posterior expectation approaches the sample proportion, $E[r \mid m] \approx \frac{1}{n}\sum_{i=1}^{n} m_i$.
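With a flat prior this posterior is a Beta distribution in closed form; the particular draw of marbles below is a made-up example:

```python
import numpy as np

# hypothetical draw of n marbles: 1 = blue, 0 = purple
m = np.array([1, 1, 0, 1, 0, 1, 1, 1])
n = len(m)
k = int(m.sum())

# likelihood p(m | r) = prod r^{m_i} (1 - r)^{1 - m_i}; with a flat prior p(r) = 1,
# the posterior p(r | m) is Beta(k + 1, n - k + 1)
post_mean = (k + 1) / (n + 2)   # posterior expectation of r
sample_prop = k / n             # the two agree as n grows

print(post_mean, sample_prop)
```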
Group-level model comparison: what if we are colour blind?

At least, we can measure how likely the $i$-th subject's data are under each model: $p(y_i \mid m_i)$.

Joint posterior: $p(r, m \mid y) \propto p(r) \prod_{i=1}^{n} p(y_i \mid m_i)\, p(m_i \mid r)$

Our belief about the proportion of models in the population is the marginal $p(r \mid y) = \sum_m p(r, m \mid y)$.

Exceedance probability: $\varphi_k = P(r_k > r_{k'}, \forall k' \neq k \mid y)$
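Exceedance probabilities are usually computed by Monte Carlo sampling from the posterior over model frequencies; the Dirichlet parameters below are an assumption standing in for the counts such a fit would produce:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical Dirichlet posterior over model frequencies r (as in random-effects
# model selection); alpha would come from fitting the hierarchical model to the
# subject-wise evidences p(y_i | m_i).
alpha = np.array([3.0, 9.0])   # two models, model 2 dominant

samples = rng.dirichlet(alpha, size=100_000)

# exceedance probability: phi_k = P(r_k > r_k' for all k' != k | y),
# estimated as the fraction of samples in which model k has the largest frequency
phi = np.mean(samples.argmax(axis=1)[:, None] == np.arange(len(alpha)), axis=0)

print(phi)   # phi sums to 1; phi[1] is large here
```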
Overview of the talk: 3 SPM applications (3.1 aMRI segmentation; 3.2 Decoding of brain images; 3.3 Model-based fMRI analysis (with spatial priors); 3.4 Dynamic causal modelling)
[Figure: the SPM analysis pipeline: realignment, smoothing, normalisation (to a template), general linear model, statistical inference (Gaussian field theory, p < 0.05); Bayesian methods enter at several stages: segmentation and normalisation, posterior probability maps (PPMs), multivariate decoding, dynamic causal modelling]
aMRI segmentation: mixture of Gaussians (MoG) model

Each voxel value $y_i$ is generated from a hidden tissue-class label $c_i \in \{$grey matter, white matter, CSF$\}$, with class frequencies $\lambda_k$, class means $\mu_k$ and class variances $\sigma_k^2$.
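The inferential core of such a model is computing posterior label probabilities per voxel (the E-step of EM for a MoG); the 1-D intensities and class parameters below are simulated assumptions, not real aMRI data:

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical 1-D intensities from three tissue classes (e.g. GM, WM, CSF)
mu = np.array([0.3, 0.6, 0.9])        # class means
var = np.array([0.01, 0.01, 0.01])    # class variances
lam = np.array([0.4, 0.4, 0.2])       # class frequencies
c = rng.choice(3, size=500, p=lam)    # true (hidden) labels
y = rng.normal(mu[c], np.sqrt(var[c]))

def responsibilities(y, mu, var, lam):
    """E-step of EM for the MoG: posterior label probabilities p(c_i = k | y_i)."""
    lik = lam * np.exp(-0.5 * (y[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return lik / lik.sum(axis=1, keepdims=True)

gamma = responsibilities(y, mu, var, lam)
assert np.allclose(gamma.sum(axis=1), 1.0)   # each voxel's label posterior sums to one
```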
Decoding of brain images: recognizing brain states from fMRI

[Figure: trial structure with a fixation cross, stimulus presentation and response]

log-evidence of X-Y sparse mappings: effect of lateralization
log-evidence of X-Y bilateral mappings: effect of spatial deployment
fMRI time series analysis: spatial priors and model comparison

A hierarchical model of the fMRI time series: GLM coefficients with spatial prior variances, data noise with AR coefficients (correlated noise), and prior variances on both.

Short-term memory design matrix (X) yields a PPM of regions best explained by the short-term memory model; the long-term memory design matrix (X) yields a PPM of regions best explained by the long-term memory model.
Dynamic causal modelling: network structure identification

Four candidate models $m_1 \dots m_4$ differ in where attention modulates connections in a network of V1, V5 and PPC, driven by the stimulus.

[Figure: marginal likelihood $\ln p(y \mid m)$ for models $m_1$ to $m_4$; $m_4$ has the highest evidence]

[Figure: estimated effective synaptic strengths for the best model ($m_4$)]
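Given the per-model log-evidences, posterior model probabilities under a flat prior follow from a numerically stable softmax; the log-evidence values below are illustrative, not the figure's:

```python
import numpy as np

# hypothetical log-evidences ln p(y | m) for four candidate network structures
log_ev = np.array([0.0, 3.0, 5.0, 12.0])

# posterior model probabilities under a flat prior over models:
# subtract the max before exponentiating to avoid overflow
w = np.exp(log_ev - log_ev.max())
p_m = w / w.sum()

print(p_m)   # the model with the highest evidence dominates
```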
Thank you for your attention.
A note on statistical significance: lessons from the Neyman-Pearson lemma

Neyman-Pearson lemma: the likelihood ratio (or Bayes factor) test, which rejects $H_0$ when $\dfrac{p(y \mid H_1)}{p(y \mid H_0)} \ge u$, is the most powerful test of its size $\alpha = P\!\left(\dfrac{p(y \mid H_1)}{p(y \mid H_0)} \ge u \,\middle|\, H_0\right)$ for testing the null.

What is the threshold $u$ above which the Bayes factor test yields a type-I error rate of 5%?

ROC analysis (type-I error rate versus 1 minus the type-II error rate):
- MVB (Bayes factor): u = 1.09, power = 56%
- CCA (F-statistic): F = 2.20, power = 20%
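A minimal sketch of calibrating such a test, in a toy Gaussian detection problem where the likelihood ratio is monotone in the datum (so thresholding the datum itself implements the LR test); the effect size is an assumption:

```python
import math

# Toy detection problem: x ~ N(0, 1) under H0, x ~ N(d, 1) under H1.
d = 1.0  # hypothetical effect size

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# The likelihood ratio exp(d*x - d^2/2) is increasing in x, so the most powerful
# test thresholds x at u. Choose u for a 5% type-I error: P(x > u | H0) = 0.05.
u = 1.6449
alpha = 1 - Phi(u)        # type-I error rate (~0.05)
power = 1 - Phi(u - d)    # P(x > u | H1) = 1 - type-II error rate

print(alpha, power)
```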