Bayesian inference. Justin Chumbley, ETH and UZH. (Thanks to Jean Daunizeau for slides)

2 Overview of the talk
- Introduction: Bayesian inference
- Bayesian model comparison
- Group-level Bayesian model selection

4 Introduction: Bayesian inference
Probability theory: basics
Degrees of plausibility, desiderata:
(D1) should be represented using real numbers
(D2) should conform with intuition
(D3) should be consistent
Normalization: $\sum_a p(a) = 1$
Marginalization: $\sum_b p(a, b) = p(a)$
Conditioning (Bayes rule): $p(a, b) = p(a \mid b)\, p(b) = p(b \mid a)\, p(a)$
[figure: worked example with a = 2 and b = 5]
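The three rules above can be checked on a small discrete example. This is a minimal sketch; the joint table p(a, b) is a made-up illustration, not taken from the slides. Exact fractions are used so that normalization and Bayes rule hold without rounding error.

```python
from fractions import Fraction

# Hypothetical joint distribution p(a, b) over a in {1, 2} and b in {4, 5}.
joint = {
    (1, 4): Fraction(1, 10), (1, 5): Fraction(2, 10),
    (2, 4): Fraction(3, 10), (2, 5): Fraction(4, 10),
}

def marginal_a(a):
    """Marginalization: p(a) = sum_b p(a, b)."""
    return sum(p for (ai, _), p in joint.items() if ai == a)

def marginal_b(b):
    """Marginalization: p(b) = sum_a p(a, b)."""
    return sum(p for (_, bi), p in joint.items() if bi == b)

def cond_a_given_b(a, b):
    """Conditioning: p(a | b) = p(a, b) / p(b)."""
    return joint[(a, b)] / marginal_b(b)

def cond_b_given_a(b, a):
    """Conditioning: p(b | a) = p(a, b) / p(a)."""
    return joint[(a, b)] / marginal_a(a)

# Normalization: the joint sums to one.
total = sum(joint.values())

# Bayes rule: p(a | b) = p(b | a) p(a) / p(b).
bayes = cond_b_given_a(5, 2) * marginal_a(2) / marginal_b(5)
print(total, marginal_a(2), bayes, cond_a_given_b(2, 5))   # prints: 1 7/10 2/3 2/3
```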

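5 Introduction: Bayesian inference
Deriving the likelihood function
- Model of data with unknown parameters: $y = f(\theta)$, e.g., GLM: $f(\theta) = X\theta$
- But data is noisy: $y = f(\theta) + \varepsilon$
- Assume noise/residuals are "small": $p(\varepsilon) \propto \exp\left(-\frac{\varepsilon^2}{2\sigma^2}\right)$
[figure: Gaussian noise density]
→ Distribution of data, given fixed parameters: $p(y \mid \theta) \propto \exp\left(-\frac{(y - f(\theta))^2}{2\sigma^2}\right)$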
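As a sketch of the likelihood above for a GLM, assuming i.i.d. Gaussian noise with known standard deviation sigma, the log-likelihood of all data points is a constant minus the sum of squared residuals scaled by $2\sigma^2$. The function name `glm_log_likelihood` and the design matrix are hypothetical illustrations.

```python
import math

def glm_log_likelihood(y, X, theta, sigma):
    """Gaussian log-likelihood ln p(y | theta) for y = X theta + noise,
    with i.i.d. noise of standard deviation sigma (assumed known)."""
    n = len(y)
    # Predicted data f(theta) = X theta, computed row by row.
    f = [sum(x_ij * t_j for x_ij, t_j in zip(row, theta)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sse / (2 * sigma ** 2)

# Hypothetical design: intercept plus one regressor.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]          # generated exactly by theta = (1, 2)
print(glm_log_likelihood(y, X, [1.0, 2.0], sigma=1.0))   # best fit
print(glm_log_likelihood(y, X, [0.0, 2.0], sigma=1.0))   # worse fit
```

Shifting the intercept by one unit adds a residual of 1 to each of the four data points, which lowers the log-likelihood by exactly 4 / 2 = 2.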

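6 Introduction: Bayesian inference
Likelihood, priors and Bayes rule
Likelihood: $p(y \mid \theta, m)$
Prior: $p(\theta \mid m)$
(likelihood and prior together define the generative model m)
Bayes rule: $p(\theta \mid y, m) = \frac{p(y \mid \theta, m)\, p(\theta \mid m)}{p(y \mid m)}$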
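Bayes rule can be applied numerically on a grid of parameter values: multiply likelihood and prior pointwise, then divide by their integral, which is the evidence $p(y \mid m)$. This minimal sketch assumes a hypothetical model with prior $\theta \sim \mathcal{N}(0, 1)$ and likelihood $y \mid \theta \sim \mathcal{N}(\theta, 1)$, for which the exact answers are known: evidence $\mathcal{N}(y; 0, 2)$ and posterior mean $y / 2$.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

# Hypothetical generative model m: theta ~ N(0, 1), y | theta ~ N(theta, 1).
y_obs = 2.0
step = 1e-3
grid = [-10.0 + k * step for k in range(20001)]     # theta values from -10 to 10

prior = [normal_pdf(t, 0.0, 1.0) for t in grid]
likelihood = [normal_pdf(y_obs, t, 1.0) for t in grid]
joint = [l * p for l, p in zip(likelihood, prior)]

# Bayes rule: the evidence p(y | m) is the normalizing constant of the posterior.
evidence = sum(joint) * step
posterior = [j / evidence for j in joint]
post_mean = sum(t * q for t, q in zip(grid, posterior)) * step

print(round(evidence, 6), round(post_mean, 6))
```

Grid evaluation only scales to a handful of parameters, which is why the approximations on the following slides (VB, Laplace) are needed in practice.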

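8 Bayesian model comparison
Model evidence
Principle of parsimony: «plurality should not be assumed without necessity»
Model evidence: $p(y \mid m) = \int p(y \mid \theta, m)\, p(\theta \mid m)\, d\theta$
Occam's razor: [figure: fits y = f(x) of models of different complexity; model evidence p(y|m) plotted over the space of all data sets]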
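The Occam's razor effect can be shown with a hypothetical one-parameter model $y \sim \mathcal{N}(\theta, \sigma^2)$, $\theta \sim \mathcal{N}(0, \tau^2)$, where integrating $\theta$ out gives $y \sim \mathcal{N}(0, \sigma^2 + \tau^2)$. A wide prior (large tau, a more flexible model) spreads its evidence over more possible data sets. It therefore loses to the simple model when the data are unremarkable and wins only when the data are extreme. The prior widths are illustrative assumptions.

```python
import math

def log_evidence(y, sigma, tau):
    """ln p(y | m) for y ~ N(theta, sigma^2), theta ~ N(0, tau^2).
    Integrating theta out analytically gives y ~ N(0, sigma^2 + tau^2)."""
    var = sigma ** 2 + tau ** 2
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * y ** 2 / var

simple, flexible = 1.0, 10.0   # hypothetical prior widths tau of two candidate models
for y in (0.5, 30.0):
    better = "simple" if log_evidence(y, 1.0, simple) > log_evidence(y, 1.0, flexible) else "flexible"
    print(y, better)           # 0.5 -> simple, 30.0 -> flexible
```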

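9 Bayesian model selection
VB and the free energy
$\ln p(y \mid m) = F(q) + D_{KL}\big(q(\theta) \,\|\, p(\theta \mid y, m)\big)$, where the free energy is $F(q) = \langle \ln p(\theta, y \mid m) \rangle_q + S(q)$ and $S(q)$ is the entropy of $q$
Since the KL divergence is non-negative, $F(q)$ is a lower bound on $\ln p(y \mid m)$.
VB: maximize the free energy F(q) w.r.t. the approximate posterior q(θ) under some simplifying constraint (e.g., mean field, Laplace)
[figure: exact posterior $p(\theta_1, \theta_2 \mid y, m)$, its approximation $q$, and the marginals over $\theta_1$ or $\theta_2$]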
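The decomposition can be checked on a hypothetical conjugate Gaussian model, where everything is available in closed form. For a Gaussian $q(\theta) = \mathcal{N}(m, v)$, the sketch computes $F(q)$ and shows that it equals the exact log evidence when $q$ is the exact posterior (KL = 0) and falls below it otherwise. The model values y, s2, mu0 and t2 are assumptions.

```python
import math

# Hypothetical conjugate model: y | theta ~ N(theta, s2), theta ~ N(mu0, t2).
y, s2, mu0, t2 = 1.5, 1.0, 0.0, 4.0

def free_energy(m, v):
    """F(q) = <ln p(theta, y | m)>_q + S(q) for a Gaussian q(theta) = N(m, v)."""
    exp_loglik = -0.5 * math.log(2 * math.pi * s2) - ((y - m) ** 2 + v) / (2 * s2)
    exp_logprior = -0.5 * math.log(2 * math.pi * t2) - ((m - mu0) ** 2 + v) / (2 * t2)
    entropy = 0.5 * math.log(2 * math.pi * math.e * v)
    return exp_loglik + exp_logprior + entropy

# Exact log evidence: marginally, y ~ N(mu0, s2 + t2).
log_evidence = -0.5 * math.log(2 * math.pi * (s2 + t2)) - (y - mu0) ** 2 / (2 * (s2 + t2))

# Exact posterior (precision-weighted update).
post_v = 1.0 / (1.0 / s2 + 1.0 / t2)
post_m = post_v * (y / s2 + mu0 / t2)

print(log_evidence, free_energy(post_m, post_v), free_energy(0.0, 1.0))
```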

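10 Bayesian model selection
Laplace approximation and BIC
Laplace approximation: $q(\theta) = \mathcal{N}(\mu, \Sigma)$, which gives
$F_{Laplace} = \ln p(y \mid \mu, m) + \ln p(\mu \mid m) + \frac{p}{2} \ln 2\pi + \frac{1}{2} \ln |\Sigma|$, with p the number of parameters
BIC: Laplace approximation at the asymptotic limit $n \to \infty$ (n data points). The posterior covariance then shrinks as $\Sigma \approx (n I)^{-1}$, with I the Fisher information per data point, so $\frac{1}{2} \ln |\Sigma| \approx -\frac{p}{2} \ln n + O(1)$ and
$F_{Laplace} \approx \ln p(y \mid \mu, m) - \frac{p}{2} \ln n = \mathrm{BIC}$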
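To see how close the Laplace approximation and BIC get to the true log evidence, here is a sketch on a hypothetical non-Gaussian model: k successes in n Bernoulli trials with a uniform Beta(1, 1) prior, for which the exact evidence is $1/(n+1)$. Laplace fits a Gaussian at the posterior mode, using the curvature of the log joint. BIC keeps only the maximum log-likelihood and the $-\frac{p}{2}\ln n$ penalty, so its error stays of order one.

```python
import math

# Hypothetical model: k successes in n Bernoulli trials, theta ~ Beta(1, 1) (uniform prior).
n, k = 100, 40

def log_joint(theta):
    """ln p(y | theta, m) + ln p(theta | m); the uniform prior density is 1, so its log is 0."""
    log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_binom + k * math.log(theta) + (n - k) * math.log(1 - theta)

# Exact log evidence under a uniform prior: p(y | m) = 1 / (n + 1).
exact = -math.log(n + 1)

# Laplace: Gaussian around the posterior mode, variance = -1 / (second derivative of log joint).
mode = k / n
curvature = -k / mode ** 2 - (n - k) / (1 - mode) ** 2
post_var = -1.0 / curvature
laplace = log_joint(mode) + 0.5 * math.log(2 * math.pi) + 0.5 * math.log(post_var)

# BIC: log-likelihood at its maximum (equal to the mode under a flat prior) minus (p / 2) ln n, p = 1.
bic = log_joint(mode) - 0.5 * math.log(n)

print(round(exact, 3), round(laplace, 3), round(bic, 3))
```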

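11 Bayesian model comparison
A (quick) note on hypothesis testing
Classical (null) hypothesis testing:
- define the null, e.g.: $H_0: \theta = 0$, with null distribution $p(t \mid H_0)$
- estimate parameters (obtain test stat. $t^*$)
- apply decision rule, i.e.: if $P(t > t^* \mid H_0) \le \alpha$, then reject $H_0$
Bayesian model comparison:
- define two alternative models, e.g.: $m_0: p(\theta \mid m_0) = 1$ if $\theta = 0$, $0$ otherwise; $m_1: p(\theta \mid m_1) = \mathcal{N}(0, \Sigma)$
- apply decision rule, e.g.: if $P(m_0 \mid y) / P(m_1 \mid y) \le \alpha$, then accept $m_1$
[figure: $p(t \mid H_0)$ with tail area $P(t > t^* \mid H_0)$; $p(Y \mid m_0)$ and $p(Y \mid m_1)$ over the space of all datasets Y]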
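The two procedures can be run side by side on the same data. This sketch makes several assumptions: n observations $y_i \sim \mathcal{N}(\theta, \sigma^2)$ with known sigma, a two-sided z-test as the classical test, $m_1$ with prior $\mathcal{N}(0, \tau^2)$, and equal prior probabilities for $m_0$ and $m_1$. Because the sample mean is a sufficient statistic, the Bayes factor only needs its marginal density under each model.

```python
import math

def normal_logpdf(x, var):
    """Log density of N(0, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * x * x / var

def compare(ybar, n, sigma=1.0, tau=1.0):
    """Classical z-test of H0: theta = 0 versus Bayesian comparison of
    m0: theta = 0 and m1: theta ~ N(0, tau^2), from the sample mean ybar of n
    observations y_i ~ N(theta, sigma^2)."""
    se2 = sigma ** 2 / n                            # variance of ybar given theta
    z = ybar / math.sqrt(se2)
    p_value = math.erfc(abs(z) / math.sqrt(2))     # two-sided P(|t| > |t*| | H0)
    # Marginal density of ybar: N(0, se2) under m0, N(0, se2 + tau^2) under m1.
    log_bf10 = normal_logpdf(ybar, se2 + tau ** 2) - normal_logpdf(ybar, se2)
    p_m1 = 1.0 / (1.0 + math.exp(-log_bf10))       # P(m1 | y) with P(m0) = P(m1)
    return p_value, p_m1

print(compare(ybar=0.5, n=50))   # clear effect
print(compare(ybar=0.0, n=50))   # no effect
```

One difference shows in the second call: the classical test can only fail to reject $H_0$, whereas the Bayesian comparison can actively favour $m_0$ ($P(m_1 \mid y) < 0.5$).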

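13 Bayesian model comparison
Family-level inference
[figure: four candidate models m1 to m4 linking nodes A and B, driven by input u]
Model posteriors: P(m1|y) = 0.04, P(m2|y) = 0.25, P(m3|y) = 0.01, P(m4|y) = 0.70
Model selection error risk: $P(e \mid y) = 1 - \max_m P(m \mid y) = 0.3$
Family inference (pool statistical evidence): $P(f \mid y) = \sum_{m \in f} P(m \mid y)$
P(f1|y) = 0.05, P(f2|y) = 0.95
Family selection error risk: $P(e \mid y) = 1 - \max_f P(f \mid y) = 0.05$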
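Pooling evidence over families is just a sum of model posteriors. This sketch uses the four posteriors reported on the slide (0.04, 0.25, 0.01, 0.70), labelled m1 to m4 in that order. It assumes family f1 = {m1, m3} and f2 = {m2, m4}, the grouping that reproduces P(f1|y) = 0.05 and P(f2|y) = 0.95.

```python
# Model posteriors P(m | y) from the slide; labels and family membership are assumptions.
post = {"m1": 0.04, "m2": 0.25, "m3": 0.01, "m4": 0.70}
families = {"f1": ["m1", "m3"], "f2": ["m2", "m4"]}

# Family posterior: P(f | y) = sum of P(m | y) over the models m in family f.
fam_post = {f: sum(post[m] for m in members) for f, members in families.items()}

# Selection error risk: P(e | y) = 1 - max posterior.
risk_models = 1 - max(post.values())
risk_families = 1 - max(fam_post.values())
print(fam_post, risk_models, risk_families)
```

Even though no single model is selected with low risk (0.3), the question "which family?" is answered with a risk of only 0.05.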

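15 Group-level model selection
FFX-BMS analysis
FFX-BMS: all subjects are best described by a unique (unknown) model $m_j$:
$p(y \mid m_j) = \prod_{i=1}^{n} p(y_i \mid m_j)$, with $y = (y_1, y_2, \dots, y_n)$
$\Rightarrow \ln p(m_j \mid y) = \sum_{i=1}^{n} F_{ij} + \text{const}$, where $F_{ij}$ is the free energy of model $j$ for subject $i$ and the prior over the K models is flat
FFX-BMS still assumes that model parameters are different across subjects!
FFX-BMS is not invalid, but its main assumption has to be justifiable.
What if different subjects are best described by different models? → RFX-BMS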
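A minimal FFX-BMS sketch: sum each model's log evidence (free energy) over subjects, then normalize with a softmax, assuming a flat prior over models. The function name `ffx_bms` and the log-evidence matrix are hypothetical.

```python
import math

def ffx_bms(F):
    """Fixed-effects BMS: F[i][j] approximates ln p(y_i | m_j) for subject i and model j.
    Returns P(m_j | y) under a flat prior over models."""
    K = len(F[0])
    group = [sum(row[j] for row in F) for j in range(K)]   # ln p(y | m_j) = sum_i F_ij
    top = max(group)                                         # subtract the max for numerical stability
    w = [math.exp(g - top) for g in group]
    z = sum(w)
    return [wj / z for wj in w]

# Hypothetical log evidences: 3 subjects x 2 models.
F = [[-10.0, -12.0],
     [-11.0, -10.5],
     [-9.0, -10.0]]
print(ffx_bms(F))
```

Because log evidences add across subjects, a single subject with a very large log Bayes factor can dominate the group result. This is one practical reason to consider RFX-BMS when subjects may differ.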

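16 Group-level model selection
RFX-BMS: preliminary (Polya's urn)
$m_i = 1$ if the i-th marble is blue, $m_i = 0$ if the i-th marble is purple
r = proportion of blue marbles in the urn
(Binomial) probability of drawing a set of n marbles $m = (m_1, \dots, m_n)$:
$p(m \mid r) = \prod_{i=1}^{n} r^{m_i} (1 - r)^{1 - m_i}$
Thus, our belief about the proportion of blue marbles is:
$p(r \mid m) \propto p(r) \prod_{i=1}^{n} r^{m_i} (1 - r)^{1 - m_i}$, with $E[r \mid m] \approx \frac{1}{n} \sum_{i=1}^{n} m_i$ for large n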
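With a Beta prior on r, the urn posterior is again a Beta distribution, because the Beta prior is conjugate to this likelihood. The sketch assumes a uniform Beta(1, 1) prior (the slide leaves p(r) unspecified) and a hypothetical set of draws. The exact posterior mean $(\sum_i m_i + 1)/(n + 2)$ approaches the sample proportion as n grows.

```python
def urn_posterior(marbles, a=1.0, b=1.0):
    """Posterior over r (proportion of blue marbles) after observing marbles m_i in {0, 1}.
    With a Beta(a, b) prior, conjugacy gives r | m ~ Beta(a + sum m_i, b + n - sum m_i)."""
    n_blue = sum(marbles)
    n_purple = len(marbles) - n_blue
    alpha, beta = a + n_blue, b + n_purple
    mean = alpha / (alpha + beta)
    return alpha, beta, mean

draws = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]    # hypothetical draws: 7 blue, 3 purple
alpha, beta, mean = urn_posterior(draws)
print(alpha, beta, mean, sum(draws) / len(draws))
```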

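17 Group-level model selection
RFX-BMS: the group null
H1: reasonable prior assumption = [the urn is unbiased]: $E[r_k \mid H_1] = 1/K$
Exceedance probability: $\varphi_k = P(r_k > r_{k'}, \forall k' \ne k \mid m, H_1)$
H0: null prior assumption = [all frequencies are equal]: $H_0: r_k = 1/K$
Bayesian omnibus risk: $P_0 = p(H_0 \mid m) = \frac{p(m \mid H_0)}{p(m \mid H_0) + p(m \mid H_1)}$
Protected exceedance probability: $\tilde{\varphi}_k = \frac{P_0}{K} + (1 - P_0)\, \varphi_k$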
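In the urn picture, where each subject's model label is observed, all quantities on this slide are computable. This sketch assumes a uniform Dirichlet(1, ..., 1) prior on r under H1 and equal prior odds for H0 and H1. It estimates exceedance probabilities by Monte Carlo from the Dirichlet posterior, which makes it an illustration of the definitions rather than the estimation procedure of any particular toolbox. All function names are hypothetical.

```python
import math
import random

def log_p_m_given_h1(counts):
    """ln p(m | H1) for a labelled sequence with counts n_k, under r ~ Dirichlet(1, ..., 1):
    Gamma(K) * prod_k Gamma(n_k + 1) / Gamma(n + K)."""
    K, n = len(counts), sum(counts)
    return math.lgamma(K) + sum(math.lgamma(c + 1) for c in counts) - math.lgamma(n + K)

def log_p_m_given_h0(counts):
    """ln p(m | H0) when every frequency is fixed at r_k = 1/K."""
    return -sum(counts) * math.log(len(counts))

def exceedance_probs(counts, n_samples=20000, seed=0):
    """Monte Carlo EP: fraction of Dirichlet(1 + n_k) posterior samples in which r_k is largest."""
    rng = random.Random(seed)
    wins = [0] * len(counts)
    for _ in range(n_samples):
        g = [rng.gammavariate(1.0 + c, 1.0) for c in counts]   # unnormalized Dirichlet draw
        wins[g.index(max(g))] += 1                             # argmax is unchanged by normalizing
    return [w / n_samples for w in wins]

def protected_ep(counts):
    K = len(counts)
    l0, l1 = log_p_m_given_h0(counts), log_p_m_given_h1(counts)
    bor = 1.0 / (1.0 + math.exp(l1 - l0))      # P0 = p(H0 | m), equal prior odds on H0 and H1
    ep = exceedance_probs(counts)
    return bor, ep, [bor / K + (1 - bor) * e for e in ep]

print(protected_ep([12, 8]))    # hypothetical counts n_k of subjects labelled with each model
```

With 12 versus 8 subjects the omnibus risk stays high, so the protected EP of the leading model is pulled towards 1/K. With 30 versus 0 it becomes negligible.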

18 Group-level model selection
RFX-BMS: what if we are colour blind?
At least, we can measure how likely the i-th subject's data are under each model: $p(y_i \mid m_i)$
[figure: graphical model r → m_i → y_i for subjects i = 1, …, n]
Our belief about the proportion of models is:
$p(r, m \mid y) \propto p(r) \prod_{i=1}^{n} p(y_i \mid m_i)\, p(m_i \mid r)$, so that $p(r \mid y) = \sum_m p(r, m \mid y)$
Exceedance probability: $\varphi_k = P(r_k > r_{k'}, \forall k' \ne k \mid y)$
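When labels are hidden, the joint posterior $p(r, m \mid y)$ above can be explored by Gibbs sampling. The sampler alternates between drawing each subject's label given r and drawing r given the labels. This is a swapped-in technique: a sketch, not the variational scheme used by standard RFX-BMS software. It assumes a Dirichlet(alpha0) prior on r and per-subject log evidences as input. The function `rfx_bms_gibbs` and the data are hypothetical.

```python
import math
import random

def rfx_bms_gibbs(L, alpha0=1.0, n_iter=5000, burn_in=500, seed=0):
    """Random-effects BMS by Gibbs sampling (illustrative sketch).

    L[i][k] approximates ln p(y_i | m_k). The sampler alternates between
      m_i | r, y ~ Categorical(proportional to r_k * p(y_i | m_k)) and
      r | m      ~ Dirichlet(alpha0 + n_k),
    and returns posterior-mean model frequencies and exceedance probabilities.
    """
    rng = random.Random(seed)
    K = len(L[0])
    r = [1.0 / K] * K
    r_sum, wins, kept = [0.0] * K, [0] * K, 0
    for it in range(n_iter):
        # Sample each subject's model label given the current frequencies r.
        counts = [0] * K
        for row in L:
            logw = [math.log(max(r[k], 1e-300)) + row[k] for k in range(K)]
            top = max(logw)
            weights = [math.exp(v - top) for v in logw]
            counts[rng.choices(range(K), weights=weights)[0]] += 1
        # Sample r from its Dirichlet conditional via normalized Gamma draws.
        g = [rng.gammavariate(alpha0 + c, 1.0) for c in counts]
        total = sum(g)
        r = [gk / total for gk in g]
        if it >= burn_in:
            kept += 1
            wins[r.index(max(r))] += 1
            r_sum = [s + rk for s, rk in zip(r_sum, r)]
    return [s / kept for s in r_sum], [w / kept for w in wins]

# Hypothetical log evidences: 8 of 10 subjects favour model 1 by 3 nats, 2 favour model 2.
L = [[0.0, -3.0]] * 8 + [[-3.0, 0.0]] * 2
freq, ep = rfx_bms_gibbs(L)
print(freq, ep)
```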

19 Group-level model selection
RFX-BMS: protecting from DCM overconfidence
[figure: candidate models m1, m2, m3 with input u; log p(y|m1) - log p(y|m2); EP, BOR and protected EP plotted against r]

20 Group-level model selection
Frequentist versus Bayesian RFX analyses?
[figure: parameter estimates across subjects, with p-values]

21 Group-level model selection
RFX-BMS: between-condition comparison
Within-subject design: n subjects in 2 conditions.
Statistical evidence for a difference between conditions?
Compare 2 different hypotheses (at the group level):
f1: same model across conditions
f2: different models across conditions
[figure: each subject's data (y1, y2) in the two conditions is explained by a tuple of models, one per condition; tuples t1 to t4 enumerate the pairs of m1 and m2, and f1, f2 group these tuples into families]
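One way to build the tuple model space sketched on this slide is to score each tuple (model a in condition 1, model b in condition 2) by adding the per-condition log evidences. This assumes the two conditions are fitted independently given the tuple. Family f1 then collects the tuples with a == b and f2 those with a != b, and the tuple log evidences can be passed to RFX-BMS with family-level inference. The function name and data are hypothetical.

```python
import itertools

def tuple_log_evidence(L1, L2):
    """Build the tuple model space for a within-subject, two-condition design.
    L1[i][k], L2[i][k]: ln p(y_i | m_k) in conditions 1 and 2.
    Tuple (a, b) = model a in condition 1 and model b in condition 2, scored as
    ln p(y_i1, y_i2 | a, b) = L1[i][a] + L2[i][b] (independent fits assumed)."""
    K = len(L1[0])
    tuples = list(itertools.product(range(K), repeat=2))
    Lt = [[l1[a] + l2[b] for a, b in tuples] for l1, l2 in zip(L1, L2)]
    same = [t for t, (a, b) in enumerate(tuples) if a == b]    # family f1
    diff = [t for t, (a, b) in enumerate(tuples) if a != b]    # family f2
    return tuples, Lt, same, diff

L1 = [[-10.0, -12.0], [-9.0, -9.5]]     # hypothetical: 2 subjects, condition 1
L2 = [[-11.0, -10.0], [-8.0, -8.5]]     # same subjects, condition 2
tuples, Lt, f1, f2 = tuple_log_evidence(L1, L2)
print(tuples, Lt[0], f1, f2)
```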

22 Group-level model selection
RFX-BMS: between-group comparison
Between-subject design: 2 groups of n subjects each.
Statistical evidence for a difference between groups?
Compare 2 different hypotheses (at the group level):
H0: different groups come from the same population: $r^{(s)} = r^{(s')} = r$
H1: different groups come from different populations: $r^{(s)} \ne r^{(s')}$
[figure: graphical models linking the frequencies r to the model labels $m_1^{s}, \dots, m_n^{s}$ and $m_1^{s'}, \dots, m_{n'}^{s'}$, and these to the data $y_1^{s}, \dots, y_n^{s}$ and $y_1^{s'}, \dots, y_{n'}^{s'}$]
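In the simplified urn setting where every subject's model label is known, H0 and H1 can be compared exactly. H0 pools both groups under one shared r. H1 gives each group its own r, so its evidence is the product of the two groups' evidences. This sketch assumes known labels and a Dirichlet(1, ..., 1) prior. With hidden labels, one would instead compare the RFX-BMS evidences (free energies) of the pooled and separate analyses.

```python
import math

def log_dm(counts, alpha0=1.0):
    """ln p(m | H) for a labelled sequence with counts n_k, integrating
    r ~ Dirichlet(alpha0, ..., alpha0) out (Dirichlet-multinomial, sequence form)."""
    K, n = len(counts), sum(counts)
    return (math.lgamma(K * alpha0) - math.lgamma(n + K * alpha0)
            + sum(math.lgamma(c + alpha0) - math.lgamma(alpha0) for c in counts))

def log_bf_groups_differ(counts_s, counts_s2):
    """ln p(m | H1) - ln p(m | H0): H1 gives each group its own r, H0 shares one r."""
    pooled = [a + b for a, b in zip(counts_s, counts_s2)]
    return log_dm(counts_s) + log_dm(counts_s2) - log_dm(pooled)

print(log_bf_groups_differ([9, 1], [2, 8]))   # groups favour different models: positive
print(log_bf_groups_differ([5, 5], [5, 5]))   # identical groups: negative
```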

24 I thank you for your attention.

25 A note on statistical significance: lessons from the Neyman-Pearson lemma
Neyman-Pearson lemma: the likelihood ratio (or Bayes factor) test
$\Lambda = \frac{p(y \mid H_1)}{p(y \mid H_0)} \ge u$
is the most powerful test of size $\alpha = P(\Lambda \ge u \mid H_0)$ to test the null.
What is the threshold u above which the Bayes factor test yields an error I rate of 5%?
[figure: ROC analysis, 1 - error II rate against error I rate]
MVB (Bayes factor): u = .9, power = 56%
CCA (F-statistics): F = 2.2, power = 2%
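For a simple-versus-simple test, the threshold u that gives a 5% error I rate can be computed directly. This sketch assumes a hypothetical setting: one observation y, with H0: y ~ N(0, 1) and H1: y ~ N(mu, 1) for mu > 0. The likelihood ratio then increases monotonically with y, so thresholding the Bayes factor is equivalent to thresholding y. The resulting u depends on the models being compared; it is not a universal constant.

```python
import math
from statistics import NormalDist

# Hypothetical simple-vs-simple test: H0: y ~ N(0, 1), H1: y ~ N(mu, 1), one observation.
mu, alpha = 1.5, 0.05
std = NormalDist()

# Lambda(y) = p(y | H1) / p(y | H0) = exp(mu * y - mu^2 / 2) increases with y (mu > 0),
# so "Lambda >= u" is the same event as "y >= c" with c = (ln u + mu^2 / 2) / mu.
c = std.inv_cdf(1 - alpha)            # critical y giving size alpha under H0
u = math.exp(mu * c - mu ** 2 / 2)    # equivalent Bayes factor threshold
power = 1 - std.cdf(c - mu)           # P(y >= c | H1) = 1 - error II rate

print(round(u, 3), round(power, 3))
```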
