Strategies for Discovering Mechanisms of Mind using fmri: 6 NUMBERS. Joseph Ramsey, Ruben Sanchez Romero and Clark Glymour


1 Strategies for Discovering Mechanisms of Mind using fmri: 6 NUMBERS Joseph Ramsey, Ruben Sanchez Romero and Clark Glymour

2 The Numbers: 20, 50, 5054, 9205, 8388, 500.

3 fmri and Mechanism. From indirect signals of local changes in blood oxygenation in small (2 mm³) brain regions (voxels) during a cognitive task, infer the CAUSOME: the causal relations between those regions in performing that task.

4 Why? By contrasting these mechanisms where there is a task with resting-state mechanisms where there is no task, one could try to discover sets of causal relations specific to kinds of cognitive tasks. Which, one hopes, would lead to: a profound refinement of "real estate" neuropsychology; an alignment of psychological descriptions with neural processes; and identification of processing disturbances in neuroatypicals.

5 Typical Task Experiments. These are data sets from Russ Poldrack's OpenfMRI project. Experiment 1: Subjects inflate balloons, judging whether to keep inflating them (risk). Experiment 3: Subjects judge nonsense words as rhyming or not. Experiment 5: Subjects are given a stake and bets (real money) which they can accept or reject.

6 Usual Procedure. 1. Obtain BOLD time series from a large region of the brain (e.g., whole brain or cortex). 2. Cluster the time series into regions of interest (ROIs) by anatomy, statistical clustering, or eyeballing. 3. Aggregate the BOLD measurements at each time point for the voxels within a cluster into a single measure (e.g., the average). 4. Apply a search procedure to estimate the causal relations among the clusters.
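
A minimal sketch of the data handling in steps 2-4, assuming the BOLD data are already loaded as a voxels-by-time numpy array; the helper name `roi_time_series` and the integer label scheme are illustrative placeholders, not any particular package's API.

```python
import numpy as np

def roi_time_series(bold, roi_labels):
    """Aggregate voxel-level BOLD (n_voxels x n_timepoints) into ROI averages.

    roi_labels: one integer label per voxel; -1 marks voxels assigned to no ROI.
    Returns an (n_rois x n_timepoints) array, one averaged series per ROI.
    """
    roi_labels = np.asarray(roi_labels)
    rois = sorted(set(roi_labels.tolist()) - {-1})
    return np.vstack([bold[roi_labels == r].mean(axis=0) for r in rois])

# Toy usage: 100 voxels, 160 time points, 4 ROIs of 25 voxels each.
rng = np.random.default_rng(0)
bold = rng.normal(size=(100, 160))
labels = np.repeat(np.arange(4), 25)
roi_data = roi_time_series(bold, labels)          # shape (4, 160)
# Step 4 would hand roi_data.T (time points x ROIs) to a causal search
# procedure such as PC or IMaGES (sketched later in this transcript).
```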

7 ROIs for Experiment 3: LOCC, ROCC; LIPL, RIPL; LMTG, RMTG; LACC, RACC; LIFG, RIFG. Not all are visible in this image.

8 Causal Search Output from Exp. 3, with the IMaGES search procedure.

9 Search Procedures: Guess; inverse covariance (partial correlation); PC; FCI; GES and IMaGES; Granger; LiNGAM; LISREL (GIMME); non-Gaussian scores; Friston.

10 Guess: Look at the ROIs and the data and guess what the causal connections are. This method is in fact used!

11 Inverse covariance (partial correlation). Calculate the covariance matrix of your data and invert it (possibly with an L1 penalty). Zero entries correspond to X ⊥ Y | Z, where Z = all the other variables. Problem: in the collider X->Z<-Y, X and Y are uncorrelated, but if you condition on Z they become correlated! So you think there's an edge where there's not: the method marries the parents! Note that just thresholding the covariance matrix leads to extra edges too: in the chain X->Y->Z, X is correlated with Z, so you get the edge X - Z!
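
A small simulation of both failure modes (the variable names and coefficients are illustrative): the precision matrix "marries the parents" of the collider X -> Z <- Y, and plain covariance thresholding adds an X - Z edge for the chain X -> Y -> Z.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Collider: X -> Z <- Y, with X and Y independent.
x = rng.normal(size=n)
y = rng.normal(size=n)
z = 0.8 * x + 0.8 * y + rng.normal(size=n)

data = np.column_stack([x, y, z])
precision = np.linalg.inv(np.cov(data, rowvar=False))
# The (X, Y) precision entry is far from zero: conditioning on Z
# makes the independent parents correlated, suggesting a spurious X - Y edge.
print("X-Y precision entry:", precision[0, 1])

# Chain: X -> Y -> Z. Plain covariance thresholding adds an X - Z edge.
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)
z = 0.8 * y + rng.normal(size=n)
print("corr(X, Z):", np.corrcoef(x, z)[0, 1])  # clearly nonzero
```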

12 PC. Assume the true model is a directed acyclic graph (DAG), that there are no unrecorded common causes, and i.i.d. sampling. Under these assumptions, the adjacency search finds correct adjacencies (undirected causal connections) in the large-sample limit. This works well for fmri and scales up to high-dimensional problems. PC can process multiple subjects by using a multi-subject conditional independence test. The orientation search does not work so well for fmri.
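
A from-scratch sketch of the PC adjacency phase for Gaussian data, using Fisher's z test of partial correlation; this is the textbook algorithm under the assumptions above, not the optimized implementation used in the work described here, and `alpha` is just an illustrative default.

```python
import numpy as np
from itertools import combinations
from math import erf, log, sqrt

def fisher_z_indep(corr, n, i, j, cond, alpha=0.01):
    """Accept 'X_i independent of X_j given cond' if the Fisher-z p-value > alpha."""
    idx = [i, j] + list(cond)
    prec = np.linalg.inv(corr[np.ix_(idx, idx)])
    r = -prec[0, 1] / sqrt(prec[0, 0] * prec[1, 1])      # partial correlation
    r = float(np.clip(r, -0.9999999, 0.9999999))
    z = 0.5 * log((1 + r) / (1 - r)) * sqrt(n - len(cond) - 3)
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p > alpha

def pc_adjacencies(data, alpha=0.01):
    """PC adjacency search: start with the complete graph, then remove every edge
    that a conditional independence test (with growing conditioning sets drawn
    from current neighborhoods) can explain away."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    adj = {frozenset((i, j)) for i, j in combinations(range(p), 2)}

    def neighbors(v):
        return {w for e in adj if v in e for w in e} - {v}

    depth = 0
    while any(len(neighbors(v) - e) >= depth for e in adj for v in e):
        for edge in list(adj):
            i, j = tuple(edge)
            removed = False
            for a, b in ((i, j), (j, i)):
                for cond in combinations(sorted(neighbors(a) - {b}), depth):
                    if fisher_z_indep(corr, n, a, b, cond, alpha):
                        adj.discard(edge)
                        removed = True
                        break
                if removed:
                    break
        depth += 1
    return adj

# Toy usage: for the chain X -> Y -> Z only the X-Y and Y-Z adjacencies survive.
rng = np.random.default_rng(2)
x = rng.normal(size=2000)
y = 0.7 * x + rng.normal(size=2000)
z = 0.7 * y + rng.normal(size=2000)
print(pc_adjacencies(np.column_stack([x, y, z])))   # adjacencies {0,1} and {1,2}
```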

13 FCI. Like PC, but does not assume that there are no unmeasured common causes; there may be latent variables! Uses the same adjacency search as PC, but the adjacencies are interpreted differently: not all of them represent direct causal paths. Does not scale up to high dimensions, but a modification (RFCI: Really Fast Causal Inference) scales up better. How much better?

14 GES, IMaGES. GES (Meek; Chickering) is a Bayesian score search; under essentially the same assumptions as PC, it is correct in the large-sample limit. IMaGES is a multi-subject version of GES: average the BIC scores, with a penalty term to counteract the effects of indirect measurement creating spurious associations among measured variables. Works well for fmri (Ramsey et al., 2010, 2011). Scales up so-so.
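
A hedged sketch of the core IMaGES scoring idea, averaging a per-subject linear-Gaussian BIC for one candidate DAG; the function names and the `penalty_discount` knob standing in for the extra penalty mentioned above are assumptions, and nothing here is the actual Tetrad code.

```python
import numpy as np
from math import log

def bic_score(data, dag_parents, penalty_discount=2.0):
    """BIC-style score of a linear-Gaussian DAG: sum of local regression scores.

    data: (n_samples x n_vars) array; dag_parents: dict node -> list of parents.
    penalty_discount > 1 strengthens the complexity penalty (an assumed knob
    playing the role of the penalty against spurious edges from indirect measurement).
    """
    n, _ = data.shape
    score = 0.0
    for node, parents in dag_parents.items():
        y = data[:, node]
        if parents:
            X = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            resid = y - X @ beta
        else:
            resid = y - y.mean()
        k = len(parents) + 1
        score += -n * log(resid.var()) - penalty_discount * k * log(n)
    return score

def images_score(datasets, dag_parents, penalty_discount=2.0):
    """IMaGES-style score: average the per-subject BIC of the same candidate DAG."""
    return np.mean([bic_score(d, dag_parents, penalty_discount) for d in datasets])

# Toy usage: two "subjects" generated from X -> Y; the model with the true
# adjacency scores higher than the empty graph.
rng = np.random.default_rng(5)
subjects = []
for _ in range(2):
    x = rng.normal(size=300)
    y = 0.7 * x + rng.normal(size=300)
    subjects.append(np.column_stack([x, y]))
print(images_score(subjects, {0: [], 1: [0]}),   # X -> Y
      images_score(subjects, {0: [], 1: []}))    # no edge
```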

15 Granger Causality. Asks: which variables can I condition on to best predict future states of variables, given current and past states? Regress Y_t on {X_{i,t-k}, Y_{t-k}} and take the significant X predictors to be causes of Y. The lags are critically important for this method. In particular, it is important that the lags be uniform across the data set, a condition which doesn't hold well for fmri data (though work has been done to adjust for this).
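
A minimal bivariate version of the idea: a lagged regression plus an F test on the X lags. The lag order, function names, and threshold are illustrative, and the code assumes the uniformly spaced lags that the slide warns are shaky for fmri.

```python
import numpy as np
from scipy import stats

def lagged_design(series, max_lag):
    """Rows t = max_lag..T-1; columns series[t-1], ..., series[t-max_lag]."""
    T = len(series)
    return np.column_stack([series[max_lag - k:T - k] for k in range(1, max_lag + 1)])

def granger_f_test(y, x, max_lag=2):
    """F test of 'all x-lag coefficients are zero' when predicting y from its own lags."""
    T = len(y)
    target = y[max_lag:]
    restricted = np.column_stack([np.ones(T - max_lag), lagged_design(y, max_lag)])
    full = np.column_stack([restricted, lagged_design(x, max_lag)])

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        r = target - X @ beta
        return r @ r

    rss_r, rss_f = rss(restricted), rss(full)
    df1 = max_lag
    df2 = full.shape[0] - full.shape[1]
    F = ((rss_r - rss_f) / df1) / (rss_f / df2)
    return F, stats.f.sf(F, df1, df2)

# Toy usage: x drives y with a one-step lag, so x should "Granger-cause" y.
rng = np.random.default_rng(3)
x = rng.normal(size=500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.5 * y[t - 1] + 0.6 * x[t - 1] + 0.5 * rng.normal()
print(granger_f_test(y, x, max_lag=2))   # large F, tiny p-value
```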

16 LiNGAM (Linear Non-Gaussian Acyclic Model). Runs Independent Components Analysis (ICA) and associates the ICA output with variables by reordering the estimated matrix into lower-triangular form. Infers causal connections *and* directions in one fell swoop.

17 LISREL (GIMME). Will defer; Kathleen Gates will talk about this. Basically, it models the causal structure over ROIs as a linear, Gaussian system. Amazingly effective for the case of non-stationary connections, that is, when the coefficient for X->Y changes over time.

18 Orientation Using Non-Gaussian Scores. Given adjacencies, non-Gaussian scoring can be effective at orienting edges. Hyvärinen and Smith (2013): estimate the direction between X and Y from the signs of skews, e.g., by comparing E[X²Y] with E[XY²]. Other methods due to Ramsey et al.: R1 and R2 assume that sums of independent variables are closer to Gaussian than the summands; R3 uses an information-theoretic measure based on a non-Gaussianity score; R4 adapts an independent-components method for cyclic graphs (Lacerda et al., 2008).
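
A toy illustration of the R1/R2 intuition only, not those algorithms and not the Hyvärinen-Smith scores: because a weighted sum of independent non-Gaussian terms looks more Gaussian than its summands, for a single adjacency one can point the edge away from the more non-Gaussian variable. Function names and the kurtosis measure are assumptions of this sketch.

```python
import numpy as np

def excess_kurtosis(v):
    """Sample excess kurtosis of a 1-D array (0 for a Gaussian)."""
    v = (v - v.mean()) / v.std()
    return np.mean(v ** 4) - 3.0

def orient_by_non_gaussianity(x, y):
    """Toy pairwise rule: point the edge away from the more non-Gaussian variable.

    Rationale (the R1/R2 assumption): if X -> Y, then Y is a weighted sum of the
    independent non-Gaussian terms X and noise, so Y tends to look more Gaussian
    than X. Illustrative only.
    """
    return "X -> Y" if abs(excess_kurtosis(x)) > abs(excess_kurtosis(y)) else "Y -> X"

# Toy usage: true model X -> Y with uniform (hence non-Gaussian) terms.
rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, size=50_000)
y = 0.8 * x + rng.uniform(-1, 1, size=50_000)
print(orient_by_non_gaussianity(x, y))   # typically prints "X -> Y"
```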

19 Friston Method. In the context of the Dynamic Causal Modelling paradigm, K. Friston proposes a method that scores a series of models and reports the model with the best score. This is possible thanks to a very fast scoring procedure. However, the number of possible models over a set of variables increases super-exponentially, so only models over very few variables can be considered. Problem: exhaustive enumeration of models versus search.
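
To make the super-exponential growth concrete, Robinson's recurrence counts the labeled DAGs on n nodes; a few lines of Python show why exhaustively scoring every model is only feasible for a handful of ROIs.

```python
from math import comb
from functools import lru_cache

@lru_cache(maxsize=None)
def num_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

for n in range(1, 11):
    print(n, num_dags(n))
# 3 nodes -> 25 DAGs, 5 -> 29,281, 10 -> about 4.2e18: exhaustive scoring
# of all models is hopeless beyond a very small number of ROIs.
```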

20 Testing Search Methods. Simulate, simulate, simulate. Torture animals and measure humans.

21 One Torture Test. Dawson et al. (2013) compared causal relations between visual cortex regions of macaque monkeys with causal relations between analogous areas in the human cortex, estimated by several search methods from fmri. Only adjacencies were identified; PC was the most accurate.

22 Lots of Simulation Data. Smith et al. (2011) tested identifications of simple graphical causal relations from simulated fmri in 28 conditions, comparing 35 search methods. None succeeded in identifying causal directions accurately. Their code (mainly by Smith and Woolrich), based on the best available biophysical model of the generation of the BOLD signal, can be used to simulate data from much more complex, higher-dimensional models.

23 Smith Study Conclusions. Bayes net methods (PC, GES, FCI, etc.) were effective at finding adjacencies. Inverse covariance methods were effective at finding adjacencies (because there were only a few unshielded colliders in the Smith models). Granger methods, in all versions tested, were ineffective at finding adjacencies (but are still frequently used). LiNGAM was ineffective at finding adjacencies. No methods were effective at finding orientations (except maybe Patel's; non-Gaussian methods other than LiNGAM were not tested).

24 20: The maximum number of variables (regions of interest) in any published attempt to infer causal mechanisms from empirical fmri data.

25 50: The number of nodes in the largest simulated model in the Smith study.

26 5054. 5054 = the number of 6 mm³ voxels in the cortex. 9 = the number of variables in the graphical causal models estimated by Poldrack for the pseudorhyme experiment.

27 9205: The number of directed edges found for one single-subject data set from experiment 1. Alpha = 0.01. Calculate the covariance matrix C over 5054 voxels; run the PC adjacency search over C; orient using R3. Side note: GES/IMaGES is not currently scalable to 5054 variables; I've scaled it up to about 800 variables.

28 8388: The number of directed edges found for one single-subject data set from experiment 3. Alpha = 0.05. Calculate the covariance matrix C over 5054 voxels; run the PC adjacency search over C; orient using R3. There are many missing (unrecorded) voxels, so maybe we should run RFCI instead. Future work: can RFCI be scaled up that far?

29 500. Why Trust the Estimates? Point 1: Simulation with 500 variables using the Smith code yields 90% precision and 50% recall. Point 2: In experiment 3, PC + R3 identifies the stimulus variable as a cause, not an effect, of the neural variables. Point 3: In experiment 3, PC + R3 agrees with experts' opinions of the directions of influence.

30 Details for the 500-variable simulation.

                    R1     R3     EB     Tanh   Skew   RSkew  Patel  SkewE  RSkewE  Average
Individual   AP     0.94   0.94   0.94   0.94   0.94   0.94   0.94   0.94   0.94
             AR     0.49   0.49   0.49   0.49   0.49   0.49   0.49   0.49   0.49
             PTPA   0.76   0.94   0.75   0.70   0.96   0.97   0.72   0.90   0.90
Group        AP     0.98   0.98   0.98   0.98   0.98   0.98   0.98   0.98   0.98
             AR     1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
             PTPA   0.99   1.00   0.77   0.77   1.00   1.00   0.76   0.94   0.95

AP = adjacency precision (# true positive adjacencies / (# true positive adjacencies + # false positive adjacencies)); AR = adjacency recall (# true positive adjacencies / (# true positive adjacencies + # false negative adjacencies)); PTPA = precision for true positive adjacencies (# true positive orientations / # true positive adjacencies). Results are averaged over 5 individual subjects and for the 5 subjects analyzed groupwise, for the 500-variable, 621-edge simulation described in the text. For the individual-subject case 6% of the estimated edges were false positives; for the group case, 2%.
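
A small helper matching the metric definitions in the caption, assuming directed edges are given as (tail, head) pairs; the function name is this sketch's own, but it can be handy for recomputing AP, AR, and PTPA on one's own simulations.

```python
def adjacency_orientation_metrics(true_edges, est_edges):
    """AP, AR, PTPA as defined above, for edges given as (tail, head) pairs."""
    true_adj = {frozenset(e) for e in true_edges}
    est_adj = {frozenset(e) for e in est_edges}
    tp_adj = true_adj & est_adj

    ap = len(tp_adj) / len(est_adj) if est_adj else float("nan")
    ar = len(tp_adj) / len(true_adj) if true_adj else float("nan")

    # PTPA: among correctly found adjacencies, the fraction whose estimated
    # orientation matches the true orientation.
    correct_orient = sum(1 for e in est_edges
                         if frozenset(e) in tp_adj and e in set(true_edges))
    ptpa = correct_orient / len(tp_adj) if tp_adj else float("nan")
    return ap, ar, ptpa

# Toy usage: true graph X->Y, Y->Z; estimate has X->Y, Z->Y, X->Z.
print(adjacency_orientation_metrics([("X", "Y"), ("Y", "Z")],
                                    [("X", "Y"), ("Z", "Y"), ("X", "Z")]))
# -> (0.666..., 1.0, 0.5)
```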

31 How Is It Possible? PC for adjacencies; non-Gaussian scores for directions. Programming challenges: scaling up to 5054 variables seems doable for PC and non-Gaussian orientation. RFCI? GES? Others? And this is not the limit of aspiration: we want to do 40,000 or more voxels!

32 Why the Low Recall for a Single Subject? With PC there's a tradeoff between computational complexity and information retrieved: the choice is to find only the largest effects or to have the computer never return. For PC this is a choice of alpha level, and higher alpha levels imply more false positive edges! Also, the sample size is small: 160.

33 What Are the Problems? Stability of PC under resampling. Disagreements in estimates from different non-Gaussian scoring procedures. Disagreements in estimates between subjects. Use of multiple-subject data simultaneously. And: feedback relations (addressed by one non-Gaussian method, R4); cancellation of correlations by positive and negative influences; latent variables; non-stationary time series (see GIMME!).

34 What Are the Potential Clinical Applications? Catherine Hanson will talk about some of that, I hope.

35 Thanks to many people: Steve Hanson, Catherine Hanson, Russ Poldrack, Stephen Smith, Patrick Hoyer, Aapo Hyvärinen, Cosma Shalizi, Peter Spirtes, Richard Scheines, Marloes Maathuis, Kathleen Gates, and many more! Sorry if I didn't list you!

36 Some few references.
Dawson, D. A., Cha, K., Lewis, L. B., Mendola, J. D., and Shmuel, A. (2013). Evaluation and calibration of functional network modeling methods based on known anatomical connections. NeuroImage, 67:331-343.
Hyvärinen, A., and Smith, S. M. (2013). Pairwise likelihood ratios for estimation of non-Gaussian structural equation models. The Journal of Machine Learning Research, 14:111-152.
Lacerda, G., Spirtes, P., Ramsey, J., and Hoyer, P. (2008). Discovering cyclic causal models by independent components analysis. Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence.
Ramsey, J., Hanson, S., Hanson, C., Halchenko, Y., Poldrack, R., and Glymour, C. (2010). Six problems for causal inference from fMRI. NeuroImage, 49:1545-1558.
Ramsey, J., Hanson, S., and Glymour, C. (2011). Multi-subject search correctly identifies causal connections and most causal directions in the DCM models of the Smith et al. simulation study. NeuroImage, 58(3):838-848.
Smith, S. M., Miller, K. L., Salimi-Khorshidi, G., Webster, M., Beckmann, C. F., Nichols, T. E., Ramsey, J., and Woolrich, M. W. (2011). Network modelling methods for FMRI. NeuroImage, 54(2):875-891.
