Bayesian inference J. Daunizeau

Bayesian inference. J. Daunizeau, Brain and Spine Institute, Paris, France; Wellcome Trust Centre for Neuroimaging, London, UK

Overview of the talk
1 Probabilistic modelling and representation of uncertainty
1.1 Bayesian paradigm
1.2 Hierarchical models
1.3 Frequentist versus Bayesian inference
2 Notes on Bayesian inference
2.1 Variational methods (ReML, EM, VB)
2.2 Family inference
2.3 Group-level model comparison
3 SPM applications
3.1 aMRI segmentation
3.2 Decoding of brain images
3.3 Model-based fMRI analysis (with spatial priors)
3.4 Dynamic causal modelling

Bayesian paradigm: probability theory basics
Desiderata for degrees of plausibility:
- should be represented using real numbers (D1)
- should conform with intuition (D2)
- should be consistent (D3)
normalization: $\sum_a p(a) = 1$
marginalization: $p(b) = \sum_a p(a,b)$
conditioning (Bayes rule): $p(a,b) = p(a\mid b)\,p(b) = p(b\mid a)\,p(a)$
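
As a minimal numerical illustration of these three rules (not part of the original slides; the joint distribution below is made up):

```python
import numpy as np

# Hypothetical joint distribution p(a, b) over two binary variables
# (rows: values of a, columns: values of b); the numbers are made up.
p_ab = np.array([[0.10, 0.20],
                 [0.30, 0.40]])

# Normalization: the joint distribution sums to one.
assert np.isclose(p_ab.sum(), 1.0)

# Marginalization: p(b) = sum_a p(a, b)
p_b = p_ab.sum(axis=0)

# Conditioning (Bayes rule): p(a | b) = p(a, b) / p(b)
p_a_given_b = p_ab / p_b

print("p(b) =", p_b)
print("p(a | b) =\n", p_a_given_b)
```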

Bayesian paradigm: deriving the likelihood function
- Model of data with unknown parameters: $y = f(\theta)$; e.g., GLM: $f(\theta) = X\theta$
- But data are noisy: $y = f(\theta) + \varepsilon$
- Assume the noise/residuals are "small", e.g. Gaussian: $p(\varepsilon) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{\varepsilon^2}{2\sigma^2}\right)$
- Distribution of the data, given fixed parameters: $p(y\mid\theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(y - f(\theta))^2}{2\sigma^2}\right)$
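
A minimal sketch of this Gaussian likelihood for a GLM, with a made-up design matrix, parameters and noise level (not from the slides):

```python
import numpy as np

# Toy GLM: y = X theta + eps, eps ~ N(0, sigma^2 I). All values are made up.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))          # design matrix
theta_true = np.array([1.0, -0.5])    # "unknown" parameters (ground truth here)
sigma = 0.3                           # noise standard deviation
y = X @ theta_true + sigma * rng.normal(size=20)

def log_likelihood(theta, y, X, sigma):
    """log p(y | theta) under i.i.d. Gaussian noise."""
    resid = y - X @ theta
    n = y.size
    return -0.5 * n * np.log(2 * np.pi * sigma**2) - 0.5 * np.sum(resid**2) / sigma**2

print(log_likelihood(theta_true, y, X, sigma))
```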

Bayesian paradigm: likelihood, priors and the model evidence
Likelihood: $p(y\mid\theta,m)$
Prior: $p(\theta\mid m)$
Together these define the generative model $m$.
Bayes rule: $p(\theta\mid y,m) = \frac{p(y\mid\theta,m)\,p(\theta\mid m)}{p(y\mid m)}$, with model evidence $p(y\mid m) = \int p(y\mid\theta,m)\,p(\theta\mid m)\,d\theta$

Bayesian paradigm: forward and inverse problems
Forward problem: the likelihood $p(y\mid\theta,m)$ maps parameters to data.
Inverse problem: the posterior distribution $p(\theta\mid y,m)$ maps the data back to the parameters.

Bayesian paradigm: model comparison
Principle of parsimony: "plurality should not be assumed without necessity"
Model evidence: $p(y\mid m) = \int p(y\mid\theta,m)\,p(\theta\mid m)\,d\theta$
Occam's razor: a complex model (e.g. a very flexible $y = f(x)$) must spread its evidence $p(y\mid m)$ over the space of all data sets, so a simpler model that concentrates its evidence on the observed data can score higher.
[Figure: model evidence $p(y\mid m)$ plotted over the space of all data sets, for models of increasing complexity.]
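
A toy numerical illustration of this automatic Occam's razor (the models, priors and data point below are made up for this sketch):

```python
import numpy as np
from scipy.stats import norm

# Model evidence p(y | m) = integral of p(y | theta) p(theta | m) over theta,
# computed on a grid for two models that differ only in prior flexibility.
y = 0.8                                      # a single observed data point
theta = np.linspace(-20, 20, 4001)           # integration grid
dtheta = theta[1] - theta[0]
likelihood = norm.pdf(y, loc=theta, scale=1.0)        # p(y | theta)

prior_simple = norm.pdf(theta, 0.0, 1.0)     # m0: narrow prior (simple model)
prior_complex = norm.pdf(theta, 0.0, 5.0)    # m1: broad prior (flexible model)

evidence_m0 = np.sum(likelihood * prior_simple) * dtheta
evidence_m1 = np.sum(likelihood * prior_complex) * dtheta
print(evidence_m0, evidence_m1)   # the simpler model wins for data near zero
```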

Hierarchical models: principle
Causality runs top-down through the generative model (higher levels cause lower levels, down to the data); inference runs bottom-up, from the observed data to the hidden causes.
[Figure: a multi-level hierarchical model with downward "causality" and upward "inference" arrows.]

Hierarchical models: directed acyclic graphs (DAGs)

Frequentist versus Bayesian inference: a (quick) note on hypothesis testing

Classical (null) hypothesis testing:
- define the null, e.g.: $H_0: \theta = 0$
- estimate parameters and obtain a test statistic $t = t(Y)$
- apply the decision rule: if $P(t > t^* \mid H_0) \le \alpha$ then reject $H_0$

Bayesian model comparison:
- define two alternative models, e.g.: $m_0: p(\theta\mid m_0) = \delta(\theta)$ (i.e. 1 if $\theta = 0$, 0 otherwise) and $m_1: p(\theta\mid m_1) = N(0,\Sigma)$
- compute the model evidences $p(Y\mid m_0)$ and $p(Y\mid m_1)$ over the space of all data sets
- apply the decision rule: if $P(m_0\mid y) \ge P(m_1\mid y)$ then accept $m_0$
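
A toy side-by-side of the two decision rules for a single Gaussian observation (the observation, noise level and prior width are made up; this is not the slide's example):

```python
import numpy as np
from scipy.stats import norm

# Classical test vs Bayesian model comparison on one observation
# y ~ N(theta, sigma^2). All numbers are made up.
y, sigma = 1.8, 1.0

# Classical test of H0: theta = 0 -> two-sided p-value
p_value = 2 * norm.sf(abs(y), loc=0.0, scale=sigma)

# Bayesian comparison of m0: theta = 0 vs m1: theta ~ N(0, 1).
# Under m1 the marginal distribution of y is N(0, sigma^2 + 1).
evidence_m0 = norm.pdf(y, 0.0, sigma)
evidence_m1 = norm.pdf(y, 0.0, np.sqrt(sigma**2 + 1.0))
post_m0 = evidence_m0 / (evidence_m0 + evidence_m1)   # equal prior model probabilities

print(f"p-value under H0: {p_value:.3f}")
print(f"P(m0 | y): {post_m0:.3f}")
```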

Overview of the talk
1 Probabilistic modelling and representation of uncertainty
1.1 Bayesian paradigm
1.2 Hierarchical models
1.3 Frequentist versus Bayesian inference
2 Notes on Bayesian inference
2.1 Variational methods (ReML, EM, VB)
2.2 Family inference
2.3 Group-level model comparison
3 SPM applications
3.1 aMRI segmentation
3.2 Decoding of brain images
3.3 Model-based fMRI analysis (with spatial priors)
3.4 Dynamic causal modelling

Variational methods: VB / EM / ReML
VB: maximize the free energy $F(q)$ with respect to the approximate posterior $q(\theta)$ under some simplifying constraint (e.g., mean field, Laplace), where
$F(q) = \left\langle \ln p(y,\theta\mid m) \right\rangle_q - \left\langle \ln q(\theta) \right\rangle_q = \ln p(y\mid m) - D_{KL}\!\left[\,q(\theta)\,\|\,p(\theta\mid y,m)\,\right]$
so that maximizing $F(q)$ both tightens a lower bound on the log evidence and drives $q(\theta)$ towards the true posterior.
[Figure: the joint posterior $p(\theta_1,\theta_2\mid y,m)$ and its factorised mean-field approximation $q(\theta_1)\,q(\theta_2)$.]
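
A minimal sketch of the Laplace-type approximation to the log evidence that such schemes rely on, for a 1-D conjugate toy model where the exact answer is available (the numbers are made up; this is not SPM's implementation):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

# Toy model: y ~ N(theta, sigma^2), theta ~ N(0, 1).
y, sigma = 1.2, 0.5

def neg_log_joint(theta):
    return -(norm.logpdf(y, theta, sigma) + norm.logpdf(theta, 0.0, 1.0))

# 1) Find the posterior mode (MAP estimate)
theta_map = minimize_scalar(neg_log_joint).x

# 2) Curvature (Hessian) at the mode via finite differences
h = 1e-4
hess = (neg_log_joint(theta_map + h) - 2 * neg_log_joint(theta_map)
        + neg_log_joint(theta_map - h)) / h**2

# 3) Laplace approximation: ln p(y) ~= ln p(y, theta_map) + 0.5 ln(2*pi / hess)
F_laplace = -neg_log_joint(theta_map) + 0.5 * np.log(2 * np.pi / hess)

# Exact log evidence for comparison: y ~ N(0, sigma^2 + 1)
log_evidence = norm.logpdf(y, 0.0, np.sqrt(sigma**2 + 1.0))
print(F_laplace, log_evidence)   # equal here because the model is Gaussian
```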

Family-level inference: trading inference resolution against statistical power
Example posterior model probabilities: $P(m_1\mid y) = 0.04$, $P(m_2\mid y) = 0.25$, $P(m_3\mid y) = 0.01$, $P(m_4\mid y) = 0.7$
Model selection error risk: $P(e\mid y) = 1 - \max_m P(m\mid y) \approx 0.3$
Family inference pools statistical evidence across the models within each family: $P(f\mid y) = \sum_{m\in f} P(m\mid y)$, e.g. $P(f_1\mid y) = 0.05$ and $P(f_2\mid y) = 0.95$
Family selection error risk: $P(e\mid y) = 1 - \max_f P(f\mid y) = 0.05$
[Figure: four models differing in connectivity structure (A, B, driving input u), partitioned into two families.]
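
A minimal sketch of this pooling, using the probabilities from the slide (the assignment of models to families is assumed here for illustration):

```python
# Posterior model probabilities from the slide; the grouping into families
# is assumed for this sketch.
post_model = {"m1": 0.04, "m2": 0.25, "m3": 0.01, "m4": 0.70}
families = {"f1": ["m1", "m3"], "f2": ["m2", "m4"]}

# Family inference: pool (sum) posterior probabilities within each family.
post_family = {f: sum(post_model[m] for m in models) for f, models in families.items()}
print(post_family)                       # ~ {'f1': 0.05, 'f2': 0.95}

# Selection error risk: P(e | y) = 1 - max P(. | y)
print(1 - max(post_model.values()))      # ~ 0.3 at the model level
print(1 - max(post_family.values()))     # ~ 0.05 at the family level
```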

Group-level model comparison: preliminary, Polya's urn
Let $m_i = 1$ if the $i$-th marble is blue and $m_i = 0$ if the $i$-th marble is purple, and let $r$ be the proportion of blue marbles in the urn.
(Binomial) probability of drawing a set of $n$ marbles: $p(m\mid r) = \prod_{i=1}^{n} r^{m_i}(1-r)^{1-m_i}$
Thus, our belief about the proportion of blue marbles is: $p(r\mid m) \propto p(m\mid r)\,p(r)$, with $E[r\mid m] \approx \frac{1}{n}\sum_{i=1}^{n} m_i$
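
A toy version of this urn update with a flat Beta prior on $r$ (the sequence of draws is made up):

```python
import numpy as np
from scipy.stats import beta

# n draws from the urn; m_i = 1 for a blue marble, 0 for a purple one.
m = np.array([1, 0, 1, 1, 0, 1, 1, 1])
n, k = m.size, m.sum()

# With a flat Beta(1, 1) prior on r, the posterior is Beta(1 + k, 1 + n - k).
posterior = beta(1 + k, 1 + n - k)
print("sample proportion:", k / n)                 # ~ E[r | m] for large n
print("posterior mean of r:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```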

Group-level model comparison: what if we are colour blind?
At least, we can measure how likely each subject's data are under each model: $p(y_1\mid m_1), p(y_2\mid m_2), \dots, p(y_n\mid m_n)$.
Joint belief about the model frequencies $r$ and the subject-wise model labels $m$: $p(r,m\mid y) \propto p(r)\,\prod_{i=1}^{n} p(y_i\mid m_i)\,p(m_i\mid r)$
Our belief about the proportion of models is then: $p(r\mid y) = \sum_m p(r,m\mid y)$
Exceedance probability: $\varphi_k = P\!\left(r_k > r_{k'},\ \forall k' \neq k \mid y\right)$
[Figure: hierarchical model with the frequencies $r$ at the top, subject-wise model labels $m_1,\dots,m_n$, and subject data $y_1,\dots,y_n$.]
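
A sketch of how exceedance probabilities can be read off a Dirichlet posterior over model frequencies by Monte Carlo (the Dirichlet parameters below are assumed; in SPM they come out of the group-level variational scheme):

```python
import numpy as np

# Exceedance probabilities phi_k = P(r_k > r_k' for all k' != k | y),
# estimated by sampling from a Dirichlet posterior over model frequencies r.
rng = np.random.default_rng(0)
alpha = np.array([3.5, 1.5, 9.0])          # posterior Dirichlet counts (assumed)
samples = rng.dirichlet(alpha, size=100_000)

winners = np.argmax(samples, axis=1)       # which model has the largest frequency
exceedance = np.bincount(winners, minlength=alpha.size) / samples.shape[0]
print(exceedance)                          # one probability per model, summing to 1
```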

Overview of the talk
1 Probabilistic modelling and representation of uncertainty
1.1 Bayesian paradigm
1.2 Hierarchical models
1.3 Frequentist versus Bayesian inference
2 Notes on Bayesian inference
2.1 Variational methods (ReML, EM, VB)
2.2 Family inference
2.3 Group-level model comparison
3 SPM applications
3.1 aMRI segmentation
3.2 Decoding of brain images
3.3 Model-based fMRI analysis (with spatial priors)
3.4 Dynamic causal modelling

[Figure: the SPM analysis pipeline (realignment, smoothing, normalisation to a template, general linear model, statistical inference via Gaussian field theory at p < 0.05), with the Bayesian applications highlighted: segmentation and normalisation, posterior probability maps (PPMs), dynamic causal modelling, multivariate decoding.]

aMRI segmentation: mixture of Gaussians (MoG) model
The $i$-th voxel value $y_i$ is generated from a hidden voxel label $c_i$, with class frequencies $\lambda_1,\dots,\lambda_k$, class means $\mu_1,\dots,\mu_k$ and class variances $\sigma_1^2,\dots,\sigma_k^2$ (classes: grey matter, white matter, CSF).
[Figure: graphical model of the MoG and the resulting grey matter, white matter and CSF probability maps.]
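
A toy illustration of the MoG idea on synthetic voxel intensities (this is not SPM's segmentation routine, which additionally uses spatial tissue priors and a bias-field model; the intensities and class labels below are made up):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic voxel intensities drawn from three made-up tissue classes.
rng = np.random.default_rng(0)
voxels = np.concatenate([rng.normal(0.3, 0.05, 1000),   # "CSF"
                         rng.normal(0.6, 0.05, 1000),   # "grey matter"
                         rng.normal(0.9, 0.05, 1000)])  # "white matter"

# Fit a 3-class mixture of Gaussians and read out p(class | voxel value).
mog = GaussianMixture(n_components=3, random_state=0).fit(voxels.reshape(-1, 1))
posteriors = mog.predict_proba(voxels.reshape(-1, 1))
print(mog.means_.ravel(), posteriors[:3].round(2))
```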

Decoding of brain images: recognizing brain states from fMRI
[Figure: a trial sequence (fixation cross "+", then pace >> response) and bar plots of the log-evidence of X-Y sparse mappings (effect of lateralization) and of X-Y bilateral mappings (effect of spatial deployment).]

fMRI time series analysis: spatial priors and model comparison
Hierarchical GLM for the fMRI time series: GLM coefficients with a prior variance (spatial prior), AR coefficients modelling correlated noise, and a prior variance of the data noise.
Model comparison between a short-term memory design matrix (X) and a long-term memory design matrix (X) yields PPMs of the regions best explained by each model.
[Figure: graphical model of the spatial-prior GLM, the two design matrices, and the corresponding posterior probability maps.]
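
A minimal sketch of a GLM with a Gaussian shrinkage prior on its coefficients, the basic ingredient behind these schemes (the scalar prior/noise variances and the data below are made up; real spatial priors couple coefficients across voxels and the noise follows an AR model):

```python
import numpy as np

# Toy Bayesian GLM: y = X beta + eps, beta ~ N(0, tau2 I), eps ~ N(0, sigma2 I).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))                 # design matrix
beta_true = np.array([2.0, 0.0, -1.0])
y = X @ beta_true + rng.normal(scale=1.0, size=100)

sigma2 = 1.0        # data noise variance (assumed known here)
tau2 = 0.5          # prior variance of the GLM coefficients

# Conjugate posterior over coefficients: N(mu, Sigma) with
# Sigma = (X'X / sigma2 + I / tau2)^-1 and mu = Sigma X'y / sigma2
Sigma = np.linalg.inv(X.T @ X / sigma2 + np.eye(3) / tau2)
mu = Sigma @ X.T @ y / sigma2
print("posterior mean of the GLM coefficients:", mu.round(2))
```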

Dynamic Causal Modelling: network structure identification
Four candidate models ($m_1$ to $m_4$) of the attention / PPC / V1 / V5 network (stim → V1 → V5), differing in where attention modulates the connections, are compared via their marginal likelihoods $\ln p(y\mid m)$; $m_4$ obtains the highest evidence.
[Figure: the four network graphs, a bar plot of $\ln p(y\mid m)$ for $m_1$ to $m_4$, and the estimated effective synaptic strengths for the best model ($m_4$).]
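
Given (approximate) log evidences for a set of DCMs, posterior model probabilities follow from Bayes rule with a flat prior over models; a sketch with made-up log-evidence values:

```python
import numpy as np

# Posterior model probabilities as a softmax of the log evidences
# ln p(y | m_i), assuming a flat prior over models. Values are made up.
log_evidence = np.array([2.0, 5.0, 9.0, 14.0])   # m1 .. m4 (assumed)
log_evidence -= log_evidence.max()               # for numerical stability
post = np.exp(log_evidence) / np.exp(log_evidence).sum()
print(post)                                      # m4 wins, as on the slide
```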

I thank you for your attention.

A note on statistical significance: lessons from the Neyman-Pearson lemma
Neyman-Pearson lemma: the likelihood ratio (or Bayes factor) test, reject when $\frac{p(y\mid H_1)}{p(y\mid H_0)} \ge u$, is the most powerful test of size $\alpha = P\!\left(\frac{p(y\mid H_1)}{p(y\mid H_0)} \ge u \,\middle|\, H_0\right)$ for testing the null.
What is the threshold $u$ above which the Bayes factor test yields an error-I rate of 5%?
ROC analysis (1 - error-II rate versus error-I rate): MVB (Bayes factor), u = 1.09, power = 56%; CCA (F-statistic), F = 2.20, power = 20%.
[Figure: ROC curves comparing the two tests.]
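
A toy Neyman-Pearson construction for two fully specified Gaussian hypotheses (the effect size is made up; this is not the MVB/CCA analysis of the slide):

```python
import numpy as np
from scipy.stats import norm

# H0: y ~ N(0, 1) vs H1: y ~ N(mu, 1). The likelihood ratio is monotone in y,
# so a size-0.05 likelihood-ratio test rejects when y exceeds the 95th
# percentile under H0; its power is P(y > threshold | H1).
mu = 1.5                                        # assumed effect size
threshold = norm.ppf(0.95)                      # fixes the error-I rate at 5%
power = norm.sf(threshold, loc=mu, scale=1.0)   # 1 - error-II rate
u = norm.pdf(threshold, mu, 1) / norm.pdf(threshold, 0, 1)  # LR at the threshold
print(f"y* = {threshold:.2f}, LR threshold u = {u:.2f}, power = {power:.2f}")
```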