Bayesian inference of structural brain networks

Max Hinne a,b, Tom Heskes a, Christian F. Beckmann b, Marcel A. J. van Gerven b

a Radboud University Nijmegen, Institute for Computing and Information Sciences, Nijmegen, The Netherlands
b Radboud University Nijmegen, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands

Corresponding author: Max Hinne, Radboud University Nijmegen, Faculty of Science, Institute for Computing and Information Sciences, Postbus 9010, 6500 GL Nijmegen, The Netherlands. E-mail: mhinne@cs.ru.nl.

Abstract

Structural brain networks are used to model white-matter connectivity between spatially segregated brain regions. The presence, location and orientation of these white-matter tracts can be derived using diffusion-weighted magnetic resonance imaging in combination with probabilistic tractography. Unfortunately, as of yet, none of the existing approaches provide an undisputed way of inferring brain networks from the streamline distributions which tractography produces. State-of-the-art methods rely on an arbitrary threshold or, alternatively, yield weighted results that are difficult to interpret. In this paper, we provide a generative model that explicitly describes how structural brain networks lead to observed streamline distributions. This allows us to draw principled conclusions about brain networks, which we validate using simultaneously acquired resting-state functional MRI data. Inference may be further informed by a prior which combines connectivity estimates from multiple subjects. Based on this prior, we obtain networks that significantly improve on the conventional approach.

Keywords: structural connectivity, probabilistic tractography, hierarchical Bayesian model

1. Introduction

Human behavior ultimately arises through the interactions between multiple brain regions that together form networks, which can be characterized in terms of structural, functional and effective connectivity (Penny et al., 2006). Structural connectivity presupposes the existence of white-matter tracts that connect spatially segregated brain regions and thereby constrain the functional and effective connectivity between these regions. Hence, structural connectivity provides the scaffolding that is required to shape neuronal dynamics. Changes in structural brain networks have been related to various neurological disorders. For this reason, optimal inference of structural brain networks is of major importance in clinical neuroscience (Catani, 2007).

Inference of these networks entails two steps. The first is the estimation of the white-matter tracts. The second consists of obtaining the network that captures which regions are connected, based on the previously identified fibre tracts. In this paper, we focus on the latter step. For the first step, we use diffusion-weighted imaging (DWI), which is a prominent way to estimate structural connectivity of whole-brain networks in vivo. It is a variant of magnetic resonance imaging (MRI) which measures the restricted diffusion of water molecules, thereby providing an indirect measure of the presence and orientation of white-matter tracts. By following the principal diffusion direction in individual voxels, streamlines can be drawn that represent the structure of fibre bundles, connecting separate regions of grey matter. This process is known as deterministic tractography (Conturo et al., 1999; Chung et al., 2010; Shu et al., 2011).
Alternatively, fibres may be estimated using probabilistic tractography (Behrens et al., 2003, 2007; Friman et al., 2006; Jbabdi et al., 2007). This comprises a model of the principal diffusion direction that is then used to sample distributions of streamlines. Ultimately, the procedure results in a measure of uncertainty about where a hypothesized connection will terminate. A benefit of the probabilistic approach is that it explicitly takes uncertainty in the streamlining process into account.

Apart from studies focusing on particular tracts, much research has been devoted to the derivation of macroscopic connectivity properties, that is, whole-brain structural connectivity. Several approaches have been suggested to extract whole-brain networks from probabilistic tractography results (Robinson et al., 2008; Hagmann et al., 2007; Gong et al., 2009). Unfortunately, inference of whole-brain networks from probabilistic tractography estimates remains somewhat ad hoc. Typically, the underlying brain network is derived by thresholding the streamline distribution such that counts above or below threshold are taken to reflect the presence or absence of tracts, respectively. This approach is easy to implement, but it has a number of issues. First, the threshold is arbitrarily chosen to have a particular value. In a substantial part of the literature, the threshold that is used to transform the streamline distribution into a network is actually set to zero (Hagmann et al., 2007, 2008; Zalesky et al., 2010; Vaessen et al., 2010; Chung et al., 2011). However, probabilistic streamlining depends on the arbitrary number of samples that are drawn per voxel. This implies that, as more samples are drawn, more brain regions are likely to eventually become connected given a threshold at zero. Alternatively, the number of streamlines can be interpreted as a connection weight (Bassett et al., 2011; Zalesky et al., 2010; Robinson et al., 2010), or a relative threshold can be applied (Kaden et al., 2007). This way, the relative differences between connections remain respected. Unfortunately, the connection weights do not have a straightforward (probabilistic) interpretation. Simply normalizing these weights does not yield a true notion of connection probability. At most, it can be regarded as the conditional probability that a streamline ends in a particular voxel given the starting point of the streamline. In the case of a streamline distribution with, say, half of the streamlines starting at A ending in B and the other half ending in C, normalized streamline counts cannot distinguish between one edge with an uncertain end point and two edges with definite end points. Finally, several graph-theoretical measures, such as characteristic path length and clustering coefficient, are ill-defined for non-binary networks.

In general, it is problematic to use thresholding since it ignores the relative differences between streamline counts. Intuitively, one would expect that if, say, ninety percent of the streamlines starting at voxel A connect to voxel B, and ten percent connect voxel A to voxel C, then at the least the former has a higher probability of having a corresponding edge in the network than the latter, although both edges are possible as well. This is related to the burstiness phenomenon of words in document retrieval, where the occurrence of a rare word in a document makes its repeated occurrence more likely (Xu and Akella, 2010). Summarizing, the issue with thresholding approaches is that they consider each tract in isolation. This ignores the information that can be gained from the possible symmetry in streamline counts, as well as from the relative differences within a streamline distribution.

Another important observation is that the mentioned approaches do not easily support the integration of probabilistic streamlining data with other sources of information. Data is often not collected in isolation but rather acquired for multiple subjects, potentially using a multitude of imaging techniques. Multi-modal data fusion is needed in order to provide a coherent picture of brain function (Horwitz and Poeppel, 2002; Groves et al., 2011). The integration of multi-subject data is required for group-level inference, where the interest is in estimating a network that characterizes a particular population, for example, when comparing patients with controls in a clinical setting (Simpson et al., 2011).

In the following, we provide a Bayesian framework for the inference of whole-brain networks from streamline distributions. In our approach, we consider the distribution of (binary) networks that are supported by our data, instead of generating a single network based on an arbitrary threshold. Our approach relies on defining a generative model for whole-brain networks, which extends recent work on network inference in systems biology (Mukherjee and Speed, 2008) and consists of two ingredients. First, a network prior is defined in terms of the classical Erdős-Rényi model (Erdős and Rényi, 1960).
This prior is later extended to handle multi-subject data, capturing the notion that different subjects' brains tend to be similar. Second, we propose a forward model based on a Dirichlet compound multinomial distribution, which views the streamline distributions produced by probabilistic tractography as noisy data, thus completing the generative model. In order to validate our Bayesian framework, we make use of the often-reported observation that resting-state functional connectivity reflects structural connectivity (Koch et al., 2002; Greicius et al., 2009; Honey et al., 2009; Lv et al., 2010; Skudlarski et al., 2008; Park et al., 2008; Damoiseaux and Greicius, 2009). We show that structural networks derived from our generative model, informed by the connectivity of other subjects, provide a better fit to the (in)dependencies in resting-state functional MRI (rs-fMRI) data than the standard thresholding approach.

2. Material and methods

2.1. Data acquisition

Twenty healthy volunteers were scanned after giving informed written consent in accordance with the guidelines of the local ethics committee. A T1 structural scan, resting-state functional data and diffusion-weighted images were obtained using a Siemens Magnetom Trio 3T system at the Donders Centre for Cognitive Neuroimaging, Radboud University Nijmegen, The Netherlands. The rs-fMRI data were acquired using a multi-echo echo planar imaging (ME-EPI) sequence (voxel size 3.5 mm isotropic, matrix size 64 x 64, TEs = 6.9, 16.2, 25, 35 and 45 ms, 39 slices, GRAPPA factor 3, 6/8 partial Fourier). A total of 1030 volumes were obtained. An optimized acquisition order described by Cook et al. (2006) was used in the DWI protocol (voxel size 2.0 mm isotropic, TE = 101 ms, 70 slices, 256 directions at b = 1500 s/mm2 and 24 directions at b = 0).

2.2. Preprocessing of resting-state data

The multi-echo images obtained using the rs-fMRI acquisition protocol were combined using a custom Matlab script (MATLAB 7.7, The MathWorks Inc., Natick, MA, USA), which implements the procedure described by Poser et al. (2006) and also incorporates motion correction using functions from the SPM5 software package (Wellcome Department of Imaging Neuroscience, University College London, UK). Of the 1030 combined volumes, the first six were discarded to allow the system to reach a steady state.

Figure 1: a) Covariance matrices of the resting-state data for three randomly selected subjects. b) Axial view of RGB-FA maps of the diffusion-weighted images, again for three randomly selected subjects. The regions in the matrices are shown in the order in which they appear in the AAL atlas.

Tools from the Oxford FMRIB Software Library (FSL, FMRIB, Oxford, UK) were used for further processing. Brain extraction was performed using FSL BET (Smith, 2002). For each subject, probabilistic brain tissue maps were obtained using FSL FAST (Zhang et al., 2001). A zero-lag 6th-order Butterworth bandpass filter was applied to the functional data to retain only frequencies between 0.01 and 0.08 Hz. After preprocessing, the fMRI data were parcellated according to the Automated Anatomical Labeling (AAL) atlas (Tzourio-Mazoyer et al., 2002). Regions without voxels with gray-matter probability above 0.5 were discarded; the resulting region count was nearly constant across subjects (standard deviation 0.1). For these regions the functional data were summed and then standardized to have zero mean and unit standard deviation. The resulting data were used to compute the empirical covariance matrix $\hat{\Sigma}$. Example covariance matrices are shown in Fig. 1a.
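The filtering and covariance computation can be summarized in a few lines. The sketch below is a minimal illustration, not the authors' Matlab pipeline; the repetition time (2.0 s) and the array `ts` of region-averaged time series are hypothetical stand-ins, and `filtfilt` provides the zero-lag application of the Butterworth filter.

```python
# Minimal sketch of the band-pass + standardize + covariance step, under the
# assumptions stated above (region-averaged time series, hypothetical TR).
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_covariance(ts, tr=2.0, low=0.01, high=0.08):
    """ts: (T, K) array of T time points for K parcellated regions."""
    fs = 1.0 / tr                                   # sampling frequency in Hz
    b, a = butter(6, [low, high], btype='bandpass', fs=fs)
    filtered = filtfilt(b, a, ts, axis=0)           # filtfilt gives zero lag
    z = (filtered - filtered.mean(0)) / filtered.std(0)  # standardize regions
    return (z.T @ z) / len(z)                       # empirical covariance
```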
2.3. Preprocessing of diffusion imaging data

The preprocessing steps for the diffusion data were conducted using FSL FDT (Behrens et al., 2003) and consisted of correction for eddy currents and estimation of the diffusion parameters. Raw color-coded fractional anisotropy maps are shown in Fig. 1b. To obtain a measure of white-matter connectivity, we used FDT Probtrackx 2.0 (Behrens et al., 2003, 2007). As seed voxels for tractography we used those voxels that lie on the boundary between white matter and gray matter. For each of these voxels 5000 streamlines were drawn, with a fixed maximum number of steps per streamline. The streamlines were restricted by the fractional anisotropy to prevent them from wandering around in gray matter. Streamlines in which a sharp angle occurred, or that had a length of less than 2 mm, were discarded. The output thus obtained is a matrix N with $n_{ij}$ the number of streamlines drawn from voxel $i$ to voxel $j$. To transform this into the parcellated scheme dictated by the AAL atlas, the streamlines were summed over all voxels per region, resulting in an aggregated connectivity matrix that ranges over regions instead of voxels. Regions that had been removed after preprocessing the fMRI data were removed from the aggregated connectivity matrix as well.

2.4. Framework for structural connectivity estimation

In this section we derive our Bayesian approach to the inference of whole-brain structural networks. The quantity of interest in our framework is the posterior over structural networks, represented by the adjacency matrix $A$, given observed probabilistic streamlining data $N$ and hyperparameters $\xi$. An element $a_{ij} \in \{0, 1\}$ represents the absence or presence of an edge between brain regions $i$ and $j$. $A$ is taken to be a simple graph, such that $a_{ij} = a_{ji}$ and $a_{ii} = 0$. A brain region can either be interpreted as a voxel or as an aggregation of voxels as defined by a gray-matter parcellation. The posterior expresses our knowledge of structural connectivity given the data and background knowledge and is given by

$$P(A \mid N, \xi) \propto P(N \mid A, a^+, a^-)\, P(A \mid p) \quad (1)$$

with hyperparameters $\xi = (a^+, a^-, p)$, for which an interpretation will be given later on. In the following, for convenience, we will sometimes suppress the dependence on the hyperparameters.

To infer the posterior distribution, we must specify a prior $P(A)$ and a forward model $P(N \mid A)$, which together define a generative model of probabilistic streamlining data. Given these components, the posterior can be approximated using a Markov chain Monte Carlo algorithm, as described in detail in Section 2.5. We now proceed to formally define the components of the generative model, as shown in Fig. 2.

Figure 2: The generative model that describes how the observed streamline distribution N depends on the (hidden) connectivity probabilities X. These in turn depend on the hyperparameters $a^+$ and $a^-$ as well as the connectivity A, which is determined by hyperparameter p of the prior.

2.4.1. Forward model

We begin with a specification of the forward model $P(N \mid A)$. Here, we describe how the observed streamline distributions $N$ depend on the underlying network $A$ through latent streamline probabilities $X$. Assume there are $K$ brain regions for which we want to estimate the structural connectivity. We start by considering one region $i$ and the possible targets in which a postulated tract may terminate. Let $n_{ik}$ denote the number of streamlines which start in region $i$ and terminate in region $k$. We assume that $n_{ii} = 0$. Probabilistic tractography produces a distribution over target vertices $\mathbf{n}_i = (n_{i1}, \ldots, n_{iK})^\top$ by drawing $S$ streamlines, $N_i = \sum_{k=1}^K n_{ik} \leq S$ of them ending up in a target region (streamlines may also end in voxels outside any region of the parcellation, hence the inequality). A particular distribution $\mathbf{n}_i$ depends on the streamline probabilities. That is, we expect many streamlines between two regions when there is a high streamline probability and vice versa. This is captured by expressing the probability of a distribution $\mathbf{n}_i$ in terms of a multinomial distribution

$$P(\mathbf{n}_i \mid \mathbf{x}_i) \propto \prod_{j=1}^K x_{ij}^{n_{ij}},$$

in which $\mathbf{x}_i = (x_{i1}, \ldots, x_{iK})$ is a probability vector with $\sum_j x_{ij} = 1$. Each $x_{ij}$ represents the probability of drawing a streamline from region $i$ to region $j$. This streamlining probability itself depends on whether or not there actually exists a physical tract between region $i$ and region $j$. Let $\mathbf{a}_i$ denote the $i$-th row of $A$, indicating the connectivity between region $i$ and all other regions. Intuitively, we expect a high streamline probability when there is an edge in the network. Conversely, we expect a low probability when two regions are disconnected. Thus, the streamline probabilities depend on the actual white-matter connectivity as modeled by $A$. This is captured by modeling the distribution of streamline probabilities using a Dirichlet distribution

$$P(\mathbf{x}_i \mid \mathbf{a}_i, a^+, a^-) \propto \prod_{j=1}^K x_{ij}^{b_{ij} - 1},$$

where the shorthand notation $b_{ij} \equiv a_{ij} a^+ + (1 - a_{ij}) a^-$ is used. The $b_{ij}$ can be interpreted as the parameters that determine the probability of streamlining from region $i$ to region $j$ when an edge $a_{ij}$ is either present ($a^+$) or absent ($a^-$).

To obtain a single expression for the likelihood of an adjacency matrix, let $N = (\mathbf{n}_1; \ldots; \mathbf{n}_K)$ represent the combined probabilistic tractography data, i.e. for each of the $K$ regions a distribution of streamlines to all other regions. Similarly, let $X = (\mathbf{x}_1; \ldots; \mathbf{x}_K)$ denote the combined hidden connection probabilities and $A = (\mathbf{a}_1; \ldots; \mathbf{a}_K)$ the adjacency matrix for all brain regions. The likelihood of the network $A$ is expressed as

$$P(N \mid A, a^+, a^-) = \int P(N \mid X)\, P(X \mid A, a^+, a^-)\, dX. \quad (2)$$

By recognizing that the Dirichlet distribution is the conjugate prior of the multinomial distribution, it follows that Eq. (2) is a product of Dirichlet compound multinomial (DCM) distributions (Madsen et al., 2005; Xu and Akella, 2010; Minka, 2000). The DCM distribution assumes that, given a network, a probability vector can be drawn with large values where the network has edges and small values where the network is disconnected. This probability vector, in turn, can be used to sample from a multinomial distribution that reflects the probabilistic tractography outcome. For sufficiently small choices of the hyperparameters of the DCM, sampling from this multinomial reflects the burstiness behavior we observe in the streamline distributions, where some pairs of regions are connected by many streamlines, while most pairs have few or even zero streamlines.

2.4.2. Network prior

In order to define a prior on adjacency matrices, we adopt the Erdős-Rényi model, which states that the probability of an edge between regions $i$ and $j$ is given by parameter $p$ (Erdős and Rényi, 1960). This allows the prior to be expressed in terms of a product of binomial distributions:

$$P(A \mid p) = \prod_{i<j} p^{a_{ij}} (1 - p)^{1 - a_{ij}}.$$

Recall that $a_{ji} \equiv a_{ij}$ by definition, such that choosing $p = 0.5$ gives rise to a flat prior on simple graphs. A short simulation sketch of this generative process is given below.
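The following minimal sketch samples from the prior and the DCM forward model. The hyperparameter values and function names are illustrative assumptions, not the authors' implementation; it only makes the two-stage Dirichlet-multinomial construction concrete.

```python
# Minimal sketch of the generative model (Erdos-Renyi prior + DCM forward
# model), assuming illustrative hyperparameter values.
import numpy as np

rng = np.random.default_rng(0)

def sample_network(K, p):
    """Erdos-Renyi prior: simple graph with edge probability p."""
    A = np.triu(rng.random((K, K)) < p, 1)
    return (A | A.T).astype(int)               # symmetric, zero diagonal

def sample_streamlines(A, S=5000, a_plus=1.0, a_minus=0.1):
    """DCM forward model: Dirichlet streamline probabilities, then counts."""
    K = len(A)
    B = np.where(A == 1, a_plus, a_minus)      # b_ij = a_ij*a+ + (1-a_ij)*a-
    np.fill_diagonal(B, 1e-10)                 # effectively forbid i -> i
    N = np.zeros((K, K), dtype=int)
    for i in range(K):
        x = rng.dirichlet(B[i].astype(float))  # latent probabilities x_i
        N[i] = rng.multinomial(S, x)           # streamline counts n_i
    np.fill_diagonal(N, 0)
    return N

A = sample_network(K=10, p=0.5)
N = sample_streamlines(A)   # bursty: many streamlines on a few pairs,
                            # near-zero counts elsewhere
```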
2.4.3. Hierarchical model

So far, we assumed that data for each subject are analyzed independently. However, in practice, data for multiple subjects may be available, and data for one subject might inform the inference for another subject. The intuition is that brain connectivity will, to a certain extent, be similar across subjects. Therefore, borrowing statistical strength from other subjects should lower the susceptibility to noise and artifacts in a single subject. This can be achieved by formulating a hierarchical model, where subject-dependent parameters at the first level are tied by subject-independent parameters at the second level. Figure 3 depicts this hierarchical model.

Suppose streamline data $N = (N^{(1)}, \ldots, N^{(M)})$ are acquired for $M$ subjects. Let $A = (A^{(1)}, \ldots, A^{(M)})$ denote a vector whose element $A^{(m)}$ refers to the connectivity matrix for subject $m$. In the hierarchical model, we assume that the different subjects are related through a parent connectivity $\bar{A}$. The different $A^{(m)}$ are conditionally independent given $\bar{A}$.

Figure 3: The hierarchical model describes how the connectivity for a subject depends on its streamline distribution, but also on the connectivity in other subjects, as mediated through the parent network $\bar{A}$.

For a new subject $M+1$, the quantity of interest is the posterior marginal

$$P\big(A^{(M+1)} \mid N, N^{(M+1)}, \xi\big) \propto P\big(N^{(M+1)} \mid A^{(M+1)}, a^+, a^-\big)\, P\big(A^{(M+1)} \mid N, \xi\big).$$

We could approximate this marginal by sampling from the hierarchical model. However, this is a computationally demanding task, as it requires the simultaneous estimation of all of the adjacency matrices belonging to each of the subjects, as well as the parent network $\bar{A}$. Instead, we specify a prior based on the connectivity obtained for other subjects. This improvement over the Erdős-Rényi model defines a separate connection probability for each individual edge, instead of using a single parameter $p$ to specify the connection probability for complete networks. This multi-subject prior is derived from the hierarchical model in Appendix A and is equal to

$$P\big(A^{(M+1)} \mid N, \xi\big) = \prod_{i<j} p_{ij}^{a^{(M+1)}_{ij}} \big(1 - p_{ij}\big)^{1 - a^{(M+1)}_{ij}}, \quad (3)$$

where $p_{ij} \equiv \big(\sum_{m=1}^M \hat{a}^{(m)}_{ij} + 1\big) / (M + 2)$, with $\hat{a}^{(m)}_{ij}$ the maximum likelihood (ML) estimate for subject $m$. Hence, we derive a prior for subject $M+1$ from the ML estimates for subjects $1, \ldots, M$. These estimates can be obtained by running the single-subject models together with a flat prior. The multi-subject prior can subsequently be plugged into Eq. (1) to produce the posterior for subject $M+1$.

2.5. Approximate inference

Since the posterior (1) cannot be calculated analytically, we resort to an MCMC scheme to sample from this distribution (Mukherjee and Speed, 2008). We always start the sampling chain with a random symmetric adjacency matrix without self-loops. A new sample is proposed based on a previous network $A$ by flipping an edge, resulting in a network $A'$ which, because of the symmetry of $A$, implies $a'_{ij} = 1 - a_{ij}$ and $a'_{ji} = 1 - a_{ji}$. The acceptance of the proposed sample is determined by the ratio

$$\gamma = \frac{P(A' \mid N, \xi)}{P(A \mid N, \xi)}.$$

A proposed network becomes a new sample with probability $\min(1, \gamma)$, with $\log \gamma = \Delta L_{kl} + \Delta P_{kl}$. Here, $\Delta L_{kl}$ and $\Delta P_{kl}$ denote the change in log-likelihood and log-prior, respectively, after flipping edge $a_{kl}$. A complete derivation of these terms is given in Appendix B. The sample distributions were obtained for each subject by drawing ten parallel chains of 300,000 samples, discarding an initial burn-in phase and thinning the chains to reduce dependence between samples. The collection of $T$ accepted samples $\{A^{(1)}, \ldots, A^{(T)}\}$ forms an approximation of the posterior $P(A \mid N, \xi)$. The samples can be used to estimate posterior probabilities of network features, such as the probability of a specific connection. Assuming the Markov chain has converged, the posterior probability of a single connection is given by $\mathbb{E}[a_{ij} \mid N] = \frac{1}{T} \sum_{t=1}^T a^{(t)}_{ij}$. Other summary statistics of the distribution may be estimated in a similar manner. A compact sketch of this sampler is given below.
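The sketch below combines the multi-subject prior of Eq. (3) with the edge-flip Metropolis scheme of this section. Function names, the hyperparameter defaults and the chain length are illustrative assumptions; for clarity it recomputes the full log-likelihood at every proposal instead of using the incremental update of Appendix B.

```python
# Sketch of the edge-flip Metropolis sampler, under the assumptions above.
import numpy as np
from scipy.special import gammaln

def dcm_loglik(A, N, a_plus=1.0, a_minus=0.1):
    """DCM log-likelihood of Eq. (2), dropping terms independent of A.
    Assumes diag(N) = 0."""
    K = len(A)
    B = np.where(A == 1, a_plus, a_minus).astype(float)
    np.fill_diagonal(B, 0.0)                       # no self-connections
    off = ~np.eye(K, dtype=bool)
    row = gammaln(B.sum(1)) - gammaln(B.sum(1) + N.sum(1))
    return row.sum() + (gammaln(B[off] + N[off]) - gammaln(B[off])).sum()

def log_prior(A, P_edge):
    """Log of Eq. (3); a flat prior corresponds to P_edge = 0.5 everywhere."""
    i, j = np.triu_indices_from(A, k=1)
    a, p = A[i, j], P_edge[i, j]
    return np.sum(a * np.log(p) + (1 - a) * np.log1p(-p))

def multi_subject_prior(A_ml_list):
    """Laplace's rule: p_ij = (sum_m a_ij^(m) + 1) / (M + 2), as in Eq. (3)."""
    M = len(A_ml_list)
    return (np.sum(A_ml_list, axis=0) + 1.0) / (M + 2.0)

def sample_posterior(N, P_edge, n_samples=1000, seed=0):
    """Metropolis sampling over simple graphs using single edge flips."""
    rng = np.random.default_rng(seed)
    K = len(N)
    A = np.triu((rng.random((K, K)) < 0.5).astype(int), 1)
    A = A + A.T                                    # random symmetric start
    logp = dcm_loglik(A, N) + log_prior(A, P_edge)
    samples = []
    for _ in range(n_samples):
        k, l = rng.choice(K, size=2, replace=False)
        A2 = A.copy()
        A2[k, l] = A2[l, k] = 1 - A[k, l]          # propose one symmetric flip
        logp2 = dcm_loglik(A2, N) + log_prior(A2, P_edge)
        if np.log(rng.random()) < logp2 - logp:    # accept with prob min(1, gamma)
            A, logp = A2, logp2
        samples.append(A.copy())
    return samples   # E[a_ij | N] is approximated by np.mean(samples, axis=0)
```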
2.6. Validation of structural connectivity estimates

Functional connectivity is constrained by structural connectivity (Honey et al., 2010; Cabral et al., 2012). In other words, when there is functional connectivity, there is often structural connectivity, although structural connectivity is not a necessary requirement for functional connectivity (Honey et al., 2009). We exploit this relationship in the validation of structural connectivity estimates. This is achieved by constraining the conditional independence structure of functional activity by the structural connectivity (Marrelec et al., 2006; Smith et al., 2010; Varoquaux et al., 2010; Deligianni et al., 2011). Assume that a $K \times 1$ vector of BOLD responses $\mathbf{y}$ can be modeled by a zero-mean Gaussian density with inverse covariance matrix $Q$. That is,

$$P(\mathbf{y} \mid Q) = (2\pi)^{-K/2} |Q|^{1/2} \exp\Big\{-\tfrac{1}{2} \mathbf{y}^\top Q \mathbf{y}\Big\}. \quad (4)$$

Then, given acquired resting-state data $D = (\mathbf{y}_1; \ldots; \mathbf{y}_T)$ for $T$ time points, model estimation reduces to finding the maximum likelihood solution $\hat{Q} = \arg\max_Q \prod_t P(\mathbf{y}_t \mid Q)$. However, in general for fMRI data, $K > T$, which implies that the covariance matrix is not of full rank. Hence, finding its inverse requires suboptimal solutions such as the generalized inverse or pseudo-inverse (Ryali et al., 2011). As a solution to this problem, regularization approaches have been suggested to find sparse approximations of the inverse covariance matrix (Friedman et al., 2008; Huang et al., 2010). In our setup, the sparsity structure is readily available in the form of the structural connectivity $A$. In order to use $A$ as a constraint when estimating $\hat{Q}$, we can make use of the fact that variables $y_i$ and $y_j$ are conditionally independent if and only if $q_{ij} = 0$ (Dempster, 1972). That is, we can interpret Eq. (4) as a Gaussian Markov random field with respect to network $A$, such that $a_{ij} = 0 \Rightarrow q_{ij} = 0$ for all $i \neq j$ (Whittaker, 1990). We will use the notation $Q \in \mathcal{A}$ to denote that the independence structure in $Q$ is dictated by $A$.

Let $\hat{\Sigma} = \frac{1}{T} \sum_{t=1}^T \mathbf{y}_t \mathbf{y}_t^\top$ denote the empirical covariance matrix. As shown by Dahl et al. (2008), the ML estimate can be formulated as the following convex optimization problem:

$$\hat{Q} = \arg\max_{Q \in \mathcal{A}} \ell(Q) \quad \text{s.t.} \quad \{q_{ij} = 0 \mid a_{ij} = 0\},$$

where $\ell(Q) = (T/2)\big(\log \det Q - \mathrm{trace}(Q \hat{\Sigma})\big)$ is, up to a constant, the log-likelihood function of $Q$. We made use of a standard convex solver to find this constrained maximum likelihood estimate (Schmidt et al., 2007) and use it to define the score of a particular matrix $A$: $S(A) \equiv \ell(\hat{Q})$. By comparing scores for different structural connectivity estimates, we are able to quantify the performance of a structural network in terms of how well it fits the functional data.

Since we compare different models, we have to take model complexity into account. We could opt for the use of a penalty term such as the Bayesian information criterion. Here, however, we use a more stringent approach, in which we enforce constant model complexity. This is implemented by constraining the number of edges for all networks from one subject to be equal to that of the maximum likelihood (ML) solution $A_{\mathrm{ML}} = \arg\max_A P(N \mid A, a^+, a^-)$ of that particular subject. Recall that this maximum likelihood solution is equivalent to the solution obtained when using a flat prior in our generative model. For the multi-subject prior, the constraint on edge count is achieved by starting out with the converged ML solution and, subsequently, drawing new samples by simultaneously adding and removing an edge. For the thresholded networks, we choose a threshold such that the resulting number of edges is the same as that of the ML solution. Note that this approach is only a way to obtain a fair comparison between different structural networks and not a requirement of the model itself. The threshold was applied to the asymmetric streamline data, normalized according to the number of streamlines emanating from each region. Note that all added edges were symmetric.
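A minimal sketch of the constrained estimate and the resulting score is given below. It uses cvxpy as a generic convex solver, which is a stand-in for the solver of Schmidt et al. (2007) used in the paper; the function name and arguments are our own.

```python
# Sketch of the A-constrained ML precision estimate and the score S(A),
# using cvxpy as an assumed stand-in for the paper's convex solver.
import cvxpy as cp
import numpy as np

def score(A, Sigma, T):
    """S(A) = l(Q-hat) with q_ij forced to zero wherever a_ij = 0 (i != j)."""
    K = Sigma.shape[0]
    Q = cp.Variable((K, K), symmetric=True)
    constraints = [Q[i, j] == 0
                   for i in range(K) for j in range(K)
                   if i != j and A[i, j] == 0]
    # Maximize log det Q - trace(Q Sigma), the Gaussian log-likelihood shape
    problem = cp.Problem(cp.Maximize(cp.log_det(Q) - cp.trace(Sigma @ Q)),
                         constraints)
    problem.solve()
    Q_hat = Q.value
    _, logdet = np.linalg.slogdet(Q_hat)
    return (T / 2.0) * (logdet - np.trace(Q_hat @ Sigma))
```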
3. Results

In order to validate our framework, we made use of resting-state functional data acquired in conjunction with the diffusion imaging data. Specifically, we compared the fit to the functional data of structural networks obtained either by the standard thresholding approach or by the developed generative model. The fit to the functional data is quantified in terms of the score $S(A)$. We performed a comparison using either a flat prior (by choosing $p = 0.5$) or the multi-subject prior. For simplicity, the hyperparameters $a^+$ and $a^-$ were manually set to 1 and 0.1, respectively, as small values for the hyperparameters capture the burstiness phenomenon described in Section 1. For the thresholded approaches, we have one structural network estimate, denoted by $A_T$. In contrast, for our generative model, we have a posterior over structural networks, which gives rise to a distribution of scores $S(A^{(t)})$, where $t$ denotes the sample index.

3.1. Comparing ML estimates with thresholded networks

The sparsity of the maximum likelihood estimates $A_{\mathrm{ML}}$, as obtained with the flat prior, was fairly constant across subjects (standard deviation of 39.4 out of 6670 possible edges). As an example, Fig. 4 shows connectivity results for one subject. Although thresholding of streamline distributions is common practice, how exactly the threshold is applied varies between studies. To have a fair comparison, we investigated the impact of different thresholding approaches. We considered applying the threshold to the maximum, the mean and the minimum of $n_{ij}$ and $n_{ji}$, respectively. To compare our generative model with these approaches, we computed for each subject the fraction of samples from the posterior network distributions that scored higher than thresholding. Let $f_{\text{F-T}}$ be the fraction of samples for which the generative model with a flat prior scored higher than the thresholded network. The results for the distribution of $f_{\text{F-T}}$ over subjects, given the different threshold methods, are shown in Table 1. When the threshold is applied to the maximum of $n_{ij}$ and $n_{ji}$, the generative model outperforms thresholding. However, when either the mean or the minimum of $n_{ij}$ and $n_{ji}$ is used, samples obtained from the posterior with a flat prior score the same as thresholded networks, on average.

To explain this behavior, it is instructive to consider Eq. (B.3) in Appendix B. Given hyperparameters $a^+$ and $a^-$ that are very small compared to the elements of $N$, the change in log-likelihood after flipping edge $a_{ij}$ from absent to present boils down to

$$\Delta L_{ij} \approx (a^+ - a^-) \left[ \log\left(\frac{n_{ij}}{\sum_k n_{ik}}\right) + \log\left(\frac{n_{ji}}{\sum_k n_{jk}}\right) \right].$$

This expression nicely summarizes the ramifications of our model. When sampling over networks, the generative model takes symmetry between streamlines into account (which follows from the sum) and it considers the relative distribution of streamlines (which follows from the fractions). Note that the latter is equivalent to normalizing the streamlines, a required step for thresholding. Thresholding approaches can imitate this behavior of the DCM by thresholding on either the mean or the minimum of $n_{ij}$ and $n_{ji}$, and by normalizing the streamline distribution by the number of outgoing streamlines, as illustrated in the sketch below.
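To make the two terms concrete, here is a small numeric illustration of the approximation above, using the ninety/ten percent example from the introduction; the counts and hyperparameter values are invented for illustration.

```python
# Numeric illustration of the approximate change in log-likelihood;
# counts are hypothetical, not data from the study.
import numpy as np

def delta_L(n_ij, n_i_total, n_ji, n_j_total, a_plus=1.0, a_minus=0.1):
    """Approximate gain from turning edge (i, j) on."""
    return (a_plus - a_minus) * (np.log(n_ij / n_i_total) +
                                 np.log(n_ji / n_j_total))

# 90% of A's streamlines reach B, 10% reach C (and symmetrically back):
print(delta_L(4500, 5000, 4500, 5000))   # edge A-B: ~ -0.19
print(delta_L(500, 5000, 500, 5000))     # edge A-C: ~ -4.14
# The A-B edge incurs a far smaller penalty, so it is much more likely
# to be switched on during sampling, yet both edges remain possible.
```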

3.2. Multi-subject prior

Figure 4: a-e) Connectivity results for the subject for which sampling in conjunction with the multi-subject prior showed the largest improvement. Shown are a) the network obtained through the thresholding approach, b) the posterior connection probabilities according to the flat model, c) the posterior connection probabilities according to the multi-subject model, d) the streamline distribution on a log scale and e) the multi-subject prior based on the other subjects, as used in the multi-subject model. Panel f) shows the most salient differences in connectivity between the maximum a posteriori networks and the thresholding approach, across all subjects. The edges are color-coded. White edges indicate connections that were present in at least 6 subjects while not being part of the thresholded network. Orange edges show converse findings. All matrices are ordered according to the order of the AAL atlas.

Table 1: The fraction of samples that have a higher score than the thresholded networks. The fractions of samples from the distributions with a flat and a multi-subject prior are denoted $f_{\text{F-T}}$ and $f_{\text{M-T}}$, respectively. The different threshold approaches are max, mean and min. The p-values were obtained using a one-sample t-test with $\mu_0 = 0.5$.

  threshold | f_F-T  | p | f_M-T  | p
  max       |        |   | ± 0.07 | < 0.001
  mean      | 0.50 ± |   |        |
  min       | 0.49 ± |   |        |

With optimal threshold settings, it is possible to obtain thresholded networks that perform similarly to the networks we infer through the posterior distribution with a flat prior. However, our model is capable of incorporating additional constraints, such as the multi-subject prior. Let $f_{\text{M-T}}$ be the fraction of samples for which the DCM with the multi-subject prior scored higher than the thresholded network. The results for the distribution of $f_{\text{M-T}}$ over subjects, given the different threshold methods, are shown in Table 1. In addition, we compared the fraction of samples obtained with the multi-subject prior that scored higher than samples with the flat prior, $f_{\text{M-F}}$. We found that this distribution had a mean of 0.64 ± 0.04 ($p < 10^{-3}$).

The likelihood scores estimated for the distributions over samples, obtained using our approach in the presence of either the flat prior or the multi-subject prior, are shown in Fig. 5. In addition, the figure shows the score for the thresholded network, with the threshold applied to the minimum of $n_{ij}$ and $n_{ji}$. The distributions obtained using the multi-subject prior are narrower and therefore more consistent than those obtained with the flat prior. Moreover, likelihood scores obtained using the multi-subject prior tend to be of higher magnitude than those obtained using the flat prior. From these results we conclude that our model is on par with the most optimal thresholding approaches, but that it is capable of surpassing thresholded networks by using informative priors.

Lastly, Fig. 6 shows the connections for which our multi-subject approach differs most from those of thresholding, across all subjects. The edges correspond to those shown in Fig. 4f). The figure shows edges that are present in the maximum a posteriori networks while being absent in the corresponding thresholded networks for at least 6 subjects, and vice versa. The edges consistently and exclusively included by either of the approaches do not differ much in length. In fact, the mean edge lengths are very close: 17.6 ± 1.6 mm for threshold-favored edges and 17.0 ± 1.4 mm for DCM-favored edges. However, we do observe that when using the multi-subject prior, consistency for cerebellar and anterior cortical tracts is increased.

Figure 5: The scores $S(A)$ (horizontal axis) for the thresholded network $A_T$, for samples from the generative model given a flat prior (lower histogram, red), and given the multi-subject prior (upper histogram, green). The fraction of each bin that has a bright color corresponds to the fraction of the other distribution that is outperformed by this bin. These fractions are also shown in the table to the right; $f_{\text{F-T}}$ is the fraction of samples with a flat prior that outperform thresholding, $f_{\text{M-T}}$ is the fraction of samples with the multi-subject prior that outperform thresholding, and $f_{\text{M-F}}$ is the fraction of samples with the multi-subject prior that outperform samples with the flat prior. The subjects are ordered according to the performance of the multi-subject prior approach relative to the thresholded network.

4. Discussion

Standard thresholding approaches to the inference of whole-brain structural networks suffer from the fact that they rely on arbitrary thresholds, while assuming independence between tracts and ignoring prior knowledge. In order to overcome these problems, we have put forward a Bayesian framework for inference of structural brain networks from diffusion-weighted imaging. Our approach makes use of a Dirichlet compound multinomial distribution to model the streamline distribution obtained by probabilistic tractography. In addition, we defined a simple Erdős-Rényi prior as well as a multi-subject prior that uses connectivity estimates from other subjects as an additional source of information. The proposed methodology was validated using simultaneously acquired resting-state functional MRI data. The outcome of our experiments revealed that the generative model combined with a flat prior performs as well as the most optimal thresholded network. The use of an informative multi-subject prior instead created networks that significantly outperformed the thresholding approach. A comparison between the networks obtained with the multi-subject or flat prior showed that the former improved on the latter, thereby motivating the use of the multi-subject prior.

In our setup, the hyperparameters $a^+$ and $a^-$ were set by hand and the edge probability $p$ was chosen to result in a flat prior. Instead, these parameters could have been estimated from the streamline data using empirical Bayes, they could have been integrated out entirely in a fully Bayesian sampling approach, or they could be optimized according to the resting-state functional data. Note further that a fair comparison between networks required model complexity to be controlled. This was achieved via the constraint that networks obtained with either the multi-subject prior or with the thresholding approach had the same number of edges as the most probable network with a flat prior. While this is to the advantage of the thresholding approach, since no arbitrary threshold needs to be chosen, it can only impede networks obtained using the multi-subject prior, since that prior might support a different number of tracts.

Even given optimal settings for the thresholding approach, our approach shows clear benefits. Foremost, the DCM model intuitively assigns probabilities to the existence of edges in the inferred networks, providing a mechanism to cope with the uncertainty in the data. Moreover, the proposed generative model allows for intuitive and well-founded priors, such as the described multi-subject prior. The hierarchical model in Fig. 3 also allows for group-level inference (Robinson et al., 2010).

Figure 6: The most salient differences in connectivity between the maximum a posteriori networks with the multi-subject prior and the thresholded networks, across all subjects. The edges are color-coded. Blue, thick edges indicate connections that were present in at least 6 subjects while not being part of the thresholded network. Red, thin edges show converse findings. Nodes that are not adjacent to any of these edges are omitted.

This means that, given streamline data for multiple subjects, the generative model can be used to infer individual subject connectivity $A$ as well as the group-level parent network $\bar{A}$. This allows one to get a handle on group differences, for instance in the context of clinical neuroscience.

The current work focused mainly on the empirical validation of our theoretical framework using functional data. In future work, we will focus more on the interpretation of the obtained structural connectivity estimates. In this paper we used resting-state fMRI data as a means to validate whole-brain structural networks derived from diffusion-weighted imaging. A logical extension of our work is to derive connectivity based on the integration of these two imaging modalities. This example of Bayesian data fusion requires that we extend the generative model to take functional data into account as well (Rykhlevskaia et al., 2008; Sui et al., 2011). We can then use structural networks as an informed prior for the inference of functional connectivity, or infer structural connectivity from both modalities simultaneously.

An additional benefit of our framework is that the network sparsity follows directly from optimizing Eq. (1). In the thresholding approach, the network sparsity is a consequence of the specific threshold setting. As a byproduct of our study, we have observed that thresholding of streamlines benefits from considering the mean or minimum of the number of streamlines connecting A to B and vice versa. This in itself may lead to improvements in the analysis of structural connectivity.

Summarizing, we proposed a Bayesian framework which lays the foundations for a theoretically sound approach to the inference of whole-brain structural networks. This framework does not suffer from the issues which plague current thresholding approaches to structural connectivity estimation, and it has been shown to give rise to substantially improved structural connectivity estimates. The proposed generative model is easily modified to incorporate other sources of information, thereby further facilitating the estimation of whole-brain structural networks in vivo.

Acknowledgements

The authors gratefully acknowledge the support of the BrainGain Smart Mix Programme of the Netherlands Ministry of Economic Affairs and the Netherlands Ministry of Education, Culture and Science. The authors thank Erik van Oort and David Norris for the acquisition of the rs-fMRI and DWI data, and the anonymous reviewers for their valuable suggestions and comments to improve the quality of the paper.

Appendix A. Derivation of the multi-subject prior

We describe here the derivation of the multi-subject prior based on the maximum likelihood estimates for previously seen subjects, as described in Section 2.4.3.

The prior on $A \equiv A^{(M+1)}$ is given by

$$P(A \mid N, \xi) = \sum_{\bar{A}} P(A \mid \bar{A})\, P(\bar{A} \mid N, \xi) \propto \sum_{\bar{A}} P(A \mid \bar{A})\, P(\bar{A} \mid p) \prod_{m=1}^M \sum_{A^{(m)}} P\big(N^{(m)} \mid A^{(m)}, a^+, a^-\big)\, P\big(A^{(m)} \mid \bar{A}\big).$$

We approximate this quantity by assuming that the main contribution to the sum over $A^{(m)}$ is due to the ML solution

$$\hat{A}^{(m)} = \arg\max_{A^{(m)}} P\big(N^{(m)} \mid A^{(m)}, a^+, a^-\big).$$

Following the Erdős-Rényi model with $p = 0.5$ for $P(\bar{A} \mid p)$ gives a flat prior on simple graphs. Up to irrelevant constants, and keeping in mind that $\hat{A}^{(m)}$ depends on $N^{(m)}$, the prior is rewritten as

$$P(A \mid N, \xi) \propto \sum_{\bar{A}} P(A \mid \bar{A}) \prod_{m=1}^M P\big(\hat{A}^{(m)} \mid \bar{A}\big),$$

with $\hat{A} = \{\hat{A}^{(1)}, \ldots, \hat{A}^{(M)}\}$ the different ML solutions. We assume that the prior factorizes into

$$P\big(A^{(M+1)} \mid N, \xi\big) = \prod_{i<j} P\big(a^{(M+1)}_{ij} \mid N, \xi\big). \quad \text{(A.1)}$$

Next, we define the probability that $a^{(m)}_{ij}$, $m = 1, \ldots, M+1$, inherits the connectivity of the parent network $\bar{a}_{ij}$ by

$$P\big(a^{(m)}_{ij} = 1 \mid \bar{a}_{ij} = 1\big) = P\big(a^{(m)}_{ij} = 0 \mid \bar{a}_{ij} = 0\big) \equiv q_{ij},$$

with $q_{ij}$ close to 1. That is, each $a^{(m)}_{ij}$ is a copy of $\bar{a}_{ij}$ with unknown probability $q_{ij}$. The copying probabilities are assumed to be independent and have a flat prior. Estimating the prior probability for each edge is then nothing but an instance of Laplace's rule of succession. This says that, if we repeat an experiment that we know can result in a success (presence of an edge) or failure (absence of an edge) $M$ times independently, and get $\sum_{m=1}^M \hat{a}^{(m)}_{ij}$ successes, then our best estimate of the probability that the next repetition $a^{(M+1)}_{ij}$ will be a success is

$$P\big(a^{(M+1)}_{ij} = 1 \mid \hat{a}^{(1)}_{ij}, \ldots, \hat{a}^{(M)}_{ij}\big) = \frac{\sum_{m=1}^M \hat{a}^{(m)}_{ij} + 1}{M + 2} \equiv p_{ij}.$$

Plugging this into Eq. (A.1), we obtain the prior

$$P\big(A^{(M+1)} \mid N, \xi\big) = \prod_{i<j} p_{ij}^{a^{(M+1)}_{ij}} \big(1 - p_{ij}\big)^{1 - a^{(M+1)}_{ij}}.$$

Appendix B. MCMC sampling

We derive here the acceptance ratio $\gamma$ of a sample $A'$ in the sampling chain as a function of one edge flip in $A$ (see Section 2.5). Note that each of the $2^{K(K-1)/2}$ possible networks $A$ has a probability greater than zero of being constructed, which guarantees that the Markov chain is irreducible. The log acceptance ratio of a proposed sample can be calculated as $\log \gamma = \Delta L_{kl} + \Delta P_{kl}$, with $\Delta L_{kl}$ and $\Delta P_{kl}$ the change in log-likelihood and log-prior, respectively, after flipping edge $a_{kl}$. The sampling approach requires that we can efficiently update both the likelihood and the prior for new samples in the Markov chain. The log-likelihood contribution of region $i$ is given by

$$L_i = \log \frac{N_i!}{\prod_j n_{ij}!} + \log \frac{\Gamma\big(\sum_j b_{ij}\big)}{\Gamma\big(\sum_j b_{ij} + N_i\big)} + \sum_j \log \frac{\Gamma(b_{ij} + n_{ij})}{\Gamma(b_{ij})} \quad \text{(B.1)}$$

with $b_{ij} \equiv a_{ij} a^+ + (1 - a_{ij}) a^-$. The change in log-likelihood as a consequence of flipping edge $a_{kl}$ is defined as

$$\Delta L_{kl} = \log P(N \mid A', a^+, a^-) - \log P(N \mid A, a^+, a^-), \quad \text{(B.2)}$$

where $A'$ differs from $A$ only in that $a'_{kl} = a'_{lk} = 1 - a_{kl}$. Plugging (B.1) into (B.2) yields

$$\begin{aligned} \Delta L_{kl} = {} & \log\frac{\Gamma(b'_{kl} + n_{kl})}{\Gamma(b_{kl} + n_{kl})} + \log\frac{\Gamma(b'_{lk} + n_{lk})}{\Gamma(b_{lk} + n_{lk})} - 2\log\frac{\Gamma(b'_{kl})}{\Gamma(b_{kl})} \\ & + \log\frac{\Gamma\big(\sum_j b'_{kj}\big)}{\Gamma\big(\sum_j b_{kj}\big)} + \log\frac{\Gamma\big(\sum_j b'_{lj}\big)}{\Gamma\big(\sum_j b_{lj}\big)} \\ & - \log\frac{\Gamma\big(\sum_j (b'_{kj} + n_{kj})\big)}{\Gamma\big(\sum_j (b_{kj} + n_{kj})\big)} - \log\frac{\Gamma\big(\sum_j (b'_{lj} + n_{lj})\big)}{\Gamma\big(\sum_j (b_{lj} + n_{lj})\big)}, \end{aligned} \quad \text{(B.3)}$$

where $b'_{ij}$ denotes the value of $b_{ij}$ computed from $A'$ (only $b'_{kl} = b'_{lk}$ differ from their unprimed counterparts). The change in the log-prior as a consequence of flipping $a_{kl}$ to $1 - a_{kl}$ follows from the definition of the prior in Eq. (3):

$$\Delta P_{kl} = \log P(A' \mid N, \xi) - \log P(A \mid N, \xi) = \big(4 a'_{kl} - 2\big) \left[ \log\frac{p_{kl}}{1 - p_{kl}} + \log\frac{p_{lk}}{1 - p_{lk}} \right].$$

Here the edge probability $p_{kl}$ is the same for all edges in the case of the Erdős-Rényi model, and is estimated separately per edge in the case of the multi-subject prior.
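A direct transcription of the incremental update (B.3) follows. This is a sketch written for clarity rather than speed (an efficient implementation would cache the row sums); the function name and conventions are ours, and it assumes diag(N) = 0 as in the main text.

```python
# Sketch of the incremental DCM log-likelihood update of Eq. (B.3).
import numpy as np
from scipy.special import gammaln

def delta_loglik(A, N, k, l, a_plus=1.0, a_minus=0.1):
    """Change in DCM log-likelihood after flipping edge (k, l) of A."""
    def b_row(a_row, i):
        b = np.where(a_row == 1, a_plus, a_minus).astype(float)
        b[i] = 0.0                      # no self-connection parameter
        return b

    def row_delta(i, j):
        a_new = A[i].copy()
        a_new[j] = 1 - A[i, j]          # the flipped row of A'
        b, b2 = b_row(A[i], i), b_row(a_new, i)
        Ni = N[i].sum()
        return (gammaln(b2[j] + N[i, j]) - gammaln(b[j] + N[i, j])
                - gammaln(b2[j]) + gammaln(b[j])
                + gammaln(b2.sum()) - gammaln(b.sum())
                - gammaln(b2.sum() + Ni) + gammaln(b.sum() + Ni))

    # only rows k and l of the likelihood change under a symmetric flip
    return row_delta(k, l) + row_delta(l, k)
```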

References

Bassett, D.S., Brown, J.A., Deshpande, V., Carlson, J.M., Grafton, S.T., 2011. Conserved and variable architecture of human white matter connectivity. NeuroImage 54.
Behrens, T.E.J., Johansen-Berg, H., Jbabdi, S., Rushworth, M.F.S., Woolrich, M.W., 2007. Probabilistic diffusion tractography with multiple fibre orientations: What can we gain? NeuroImage 34.
Behrens, T.E.J., Woolrich, M.W., Jenkinson, M., Johansen-Berg, H., Nunes, R.G., Clare, S., Matthews, P.M., Brady, J.M., Smith, S.M., 2003. Characterization and propagation of uncertainty in diffusion-weighted MR imaging. Magnet. Reson. Med. 50.
Cabral, J., Hugues, E., Kringelbach, M.L., Deco, G., 2012. Modeling the outcome of structural disconnection on resting-state functional connectivity. NeuroImage 62.
Catani, M., 2007. From hodology to function. Brain 130.
Chung, H.W., Chou, M.C., Chen, C.Y., 2010. Principles and limitations of computational algorithms in clinical diffusion tensor MR tractography. Am. J. Neuroradiol. 32.
Chung, M.K., Adluru, N., Dalton, K.M., Alexander, A.L., Davidson, R.J., 2011. Scalable brain network construction on white matter fibers, in: SPIE Medical Imaging.
Conturo, T.E., Lori, N.F., Cull, T.S., Akbudak, E., Snyder, A.Z., Shimony, J.S., McKinstry, R.C., Burton, H., Raichle, M.E., 1999. Tracking neuronal fiber pathways in the living human brain. Proc. Natl. Acad. Sci. USA 96.
Cook, P.A., Bai, Y., Seunarine, K.K., Hall, M.G., Parker, G.J., Alexander, D.C., 2006. Camino: open-source diffusion-MRI reconstruction and processing, in: 14th Scientific Meeting of the International Society for Magnetic Resonance in Medicine, Seattle, WA, USA.
Dahl, J., Vandenberghe, L., Roychowdhury, V., 2008. Covariance selection for non-chordal graphs via chordal embedding. Optim. Method. Softw. 23.
Damoiseaux, J.S., Greicius, M.D., 2009. Greater than the sum of its parts: a review of studies combining structural connectivity and resting-state functional connectivity. Brain Struct. Funct. 213.
Deligianni, F., Varoquaux, G., Thirion, B., Robinson, E., Sharp, D.J., Edwards, A.D., Rueckert, D., 2011. A probabilistic framework to infer brain functional connectivity from anatomical connections, in: Information Processing in Medical Imaging, Springer.
Dempster, A.P., 1972. Covariance selection. Biometrics 28.
Erdős, P., Rényi, A., 1960. On the evolution of random graphs. Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5.
Friedman, J., Hastie, T., Tibshirani, R., 2008. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9.
Friman, O., Farnebäck, G., Westin, C.F., 2006. A Bayesian approach for stochastic white matter tractography. IEEE Trans. Med. Imag. 25.
Gong, G., Rosa-Neto, P., Carbonell, F., Chen, Z.J., He, Y., Evans, A., 2009. Age- and gender-related differences in the cortical anatomical network. J. Neurosci. 29.
Greicius, M.D., Supekar, K., Menon, V., Dougherty, R.F., 2009. Resting-state functional connectivity reflects structural connectivity in the default mode network. Cereb. Cortex 19.
Groves, A.R., Beckmann, C.F., Smith, S.M., Woolrich, M.W., 2011. Linked independent component analysis for multimodal data fusion. NeuroImage 54.
Hagmann, P., Cammoun, L., Gigandet, X., Meuli, R., Honey, C.J., Wedeen, V., Sporns, O., 2008. Mapping the structural core of human cerebral cortex. PLoS Biol. 6, e159.
Hagmann, P., Kurant, M., Gigandet, X., Thiran, P., Wedeen, V., Meuli, R., Thiran, J.P., 2007. Mapping human whole-brain structural networks with diffusion MRI. PLoS ONE 2, e597.
Honey, C.J., Thivierge, J.-P., Sporns, O., 2010. Can structure predict function in the human brain? NeuroImage 52.
Honey, C.J., Sporns, O., Cammoun, L., Gigandet, X., Thiran, J.P., Meuli, R., Hagmann, P., 2009. Predicting human resting-state functional connectivity from structural connectivity. Proc. Natl. Acad. Sci. USA 106.
Horwitz, B., Poeppel, D., 2002. How can EEG/MEG and fMRI/PET data be combined? Hum. Brain Mapp. 17, 1-3.
Huang, S., Li, J., Sun, L., Ye, J., Fleisher, A., Wu, T., Chen, K., Reiman, E., 2010. Learning brain connectivity of Alzheimer's disease by sparse inverse covariance estimation. NeuroImage 50.
Jbabdi, S., Woolrich, M.W., Andersson, J.L.R., Behrens, T.E.J., 2007. A Bayesian framework for global tractography. NeuroImage 37.
Kaden, E., Knösche, T.R., Anwander, A., 2007. Parametric spherical deconvolution: Inferring anatomical connectivity using diffusion MR imaging. NeuroImage 37.
Koch, M., Norris, D.G., Hund-Georgiadis, M., 2002. An investigation of functional and anatomical connectivity using magnetic resonance imaging. NeuroImage 16.
Lv, J., Guo, L., Hu, X., Zhang, T., Li, K., Zhang, D., Yang, J., Liu, T., 2010. Fiber-centered analysis of brain connectivities using DTI and resting state fMRI data. Med. Image Comput. Comput. Assist. Interv. 13.
Madsen, R.E., Kauchak, D., Elkan, C., 2005. Modeling word burstiness using the Dirichlet distribution, in: Proceedings of the 22nd International Conference on Machine Learning, ACM, New York, NY, USA.
Marrelec, G., Krainik, A., Duffau, H., Pélégrini-Issac, M., Lehéricy, S., Doyon, J., Benali, H., 2006. Partial correlation for functional brain interactivity investigation in functional MRI. NeuroImage 32.
Minka, T.P., 2000. Estimating a Dirichlet distribution. Technical Report, MIT.
Mukherjee, S., Speed, T.P., 2008. Network inference using informative priors. Proc. Natl. Acad. Sci. USA 105.
Park, C., Kim, S., Kim, Y., Kim, K., 2008. Comparison of the small-world topology between anatomical and functional connectivity in the human brain. Physica A 387.
Penny, W.D., Friston, K.J., Ashburner, J.T., Kiebel, S.J., Nichols, T.E. (Eds.), 2006. Statistical Parametric Mapping: The Analysis of Functional Brain Images. Academic Press Inc., 1st edition.
Robinson, E.C., Hammers, A., Ericsson, A., Edwards, A.D., Rueckert, D., 2010. Identifying population differences in whole-brain structural networks: a machine learning approach. NeuroImage 50.
Robinson, E.C., Valstar, M., Hammers, A., Ericsson, A., Edwards, A.D., Rueckert, D., 2008. Multivariate statistical analysis of whole brain structural networks obtained using probabilistic tractography. Med. Image Comput. Comput. Assist. Interv. 11.
Ryali, S., Chen, T., Supekar, K., Menon, V., 2011. Estimation of functional connectivity in fMRI data using stability selection-based sparse partial correlation with elastic net penalty. NeuroImage 59.
Rykhlevskaia, E., Gratton, G., Fabiani, M., 2008. Combining structural and functional neuroimaging data for studying brain connectivity: A review. Psychophysiology 45.
Schmidt, M., Fung, G., Rosales, R., 2007. Fast optimization methods for L1 regularization: A comparative study and two new approaches, in: Machine Learning: ECML 2007.
Shu, N., Liu, Y., Li, K., Duan, Y., Wang, J., Yu, C., Dong, H., Ye, J., He, Y., 2011. Diffusion tensor tractography reveals disrupted topological efficiency in white matter structural networks in multiple sclerosis. Cereb. Cortex 21.
Simpson, S.L., Hayasaka, S., Laurienti, P.J., 2011. Exponential Random Graph Modeling for complex brain networks. PLoS ONE 6.
Skudlarski, P., Jagannathan, K., Calhoun, V., Hampson, M., Skudlarska, B.A., Pearlson, G., 2008. Measuring brain connectivity: diffusion tensor imaging validates resting state temporal correlations. NeuroImage 43.
Smith, S.M., 2002. Fast robust automated brain extraction. Hum. Brain Mapp. 17.


More information

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal

More information

Stochastic Dynamic Causal Modelling for resting-state fmri

Stochastic Dynamic Causal Modelling for resting-state fmri Stochastic Dynamic Causal Modelling for resting-state fmri Ged Ridgway, FIL, Wellcome Trust Centre for Neuroimaging, UCL Institute of Neurology, London Overview Connectivity in the brain Introduction to

More information

Network Modeling and Functional Data Methods for Brain Functional Connectivity Studies. Kehui Chen

Network Modeling and Functional Data Methods for Brain Functional Connectivity Studies. Kehui Chen Network Modeling and Functional Data Methods for Brain Functional Connectivity Studies Kehui Chen Department of Statistics, University of Pittsburgh Nonparametric Statistics Workshop, Ann Arbor, Oct 06,

More information

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS023) p.3938 An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Vitara Pungpapong

More information

A prior distribution over model space p(m) (or hypothesis space ) can be updated to a posterior distribution after observing data y.

A prior distribution over model space p(m) (or hypothesis space ) can be updated to a posterior distribution after observing data y. June 2nd 2011 A prior distribution over model space p(m) (or hypothesis space ) can be updated to a posterior distribution after observing data y. This is implemented using Bayes rule p(m y) = p(y m)p(m)

More information

A hierarchical group ICA model for assessing covariate effects on brain functional networks

A hierarchical group ICA model for assessing covariate effects on brain functional networks A hierarchical group ICA model for assessing covariate effects on brain functional networks Ying Guo Joint work with Ran Shi Department of Biostatistics and Bioinformatics Emory University 06/03/2013 Ying

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

On the errors introduced by the naive Bayes independence assumption

On the errors introduced by the naive Bayes independence assumption On the errors introduced by the naive Bayes independence assumption Author Matthijs de Wachter 3671100 Utrecht University Master Thesis Artificial Intelligence Supervisor Dr. Silja Renooij Department of

More information

Advanced Topics and Diffusion MRI

Advanced Topics and Diffusion MRI Advanced Topics and Diffusion MRI Slides originally by Karla Miller, FMRIB Centre Modified by Mark Chiew (mark.chiew@ndcn.ox.ac.uk) Slides available at: http://users.fmrib.ox.ac.uk/~mchiew/teaching/ MRI

More information

Multivariate Regression Generalized Likelihood Ratio Tests for FMRI Activation

Multivariate Regression Generalized Likelihood Ratio Tests for FMRI Activation Multivariate Regression Generalized Likelihood Ratio Tests for FMRI Activation Daniel B Rowe Division of Biostatistics Medical College of Wisconsin Technical Report 40 November 00 Division of Biostatistics

More information

Bayesian time series classification

Bayesian time series classification Bayesian time series classification Peter Sykacek Department of Engineering Science University of Oxford Oxford, OX 3PJ, UK psyk@robots.ox.ac.uk Stephen Roberts Department of Engineering Science University

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Computing FMRI Activations: Coefficients and t-statistics by Detrending and Multiple Regression

Computing FMRI Activations: Coefficients and t-statistics by Detrending and Multiple Regression Computing FMRI Activations: Coefficients and t-statistics by Detrending and Multiple Regression Daniel B. Rowe and Steven W. Morgan Division of Biostatistics Medical College of Wisconsin Technical Report

More information

The effect of different number of diffusion gradients on SNR of diffusion tensor-derived measurement maps

The effect of different number of diffusion gradients on SNR of diffusion tensor-derived measurement maps J. Biomedical Science and Engineering, 009,, 96-101 The effect of different number of diffusion gradients on SNR of diffusion tensor-derived measurement maps Na Zhang 1, Zhen-Sheng Deng 1*, Fang Wang 1,

More information

Segmenting Thalamic Nuclei: What Can We Gain From HARDI?

Segmenting Thalamic Nuclei: What Can We Gain From HARDI? Segmenting Thalamic Nuclei: What Can We Gain From HARDI? Thomas Schultz Computation Institute, University of Chicago, Chicago IL, USA Abstract. The contrast provided by diffusion MRI has been exploited

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

The connection of dropout and Bayesian statistics

The connection of dropout and Bayesian statistics The connection of dropout and Bayesian statistics Interpretation of dropout as approximate Bayesian modelling of NN http://mlg.eng.cam.ac.uk/yarin/thesis/thesis.pdf Dropout Geoffrey Hinton Google, University

More information

Effective Connectivity & Dynamic Causal Modelling

Effective Connectivity & Dynamic Causal Modelling Effective Connectivity & Dynamic Causal Modelling Hanneke den Ouden Donders Centre for Cognitive Neuroimaging Radboud University Nijmegen Advanced SPM course Zurich, Februari 13-14, 2014 Functional Specialisation

More information

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, etworks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

9 Forward-backward algorithm, sum-product on factor graphs

9 Forward-backward algorithm, sum-product on factor graphs Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 9 Forward-backward algorithm, sum-product on factor graphs The previous

More information

10708 Graphical Models: Homework 2

10708 Graphical Models: Homework 2 10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves

More information

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

A graph contains a set of nodes (vertices) connected by links (edges or arcs) BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,

More information

Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics

Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics arxiv:1603.08163v1 [stat.ml] 7 Mar 016 Farouk S. Nathoo, Keelin Greenlaw,

More information

MULTISCALE MODULARITY IN BRAIN SYSTEMS

MULTISCALE MODULARITY IN BRAIN SYSTEMS MULTISCALE MODULARITY IN BRAIN SYSTEMS Danielle S. Bassett University of California Santa Barbara Department of Physics The Brain: A Multiscale System Spatial Hierarchy: Temporal-Spatial Hierarchy: http://www.idac.tohoku.ac.jp/en/frontiers/column_070327/figi-i.gif

More information

Chris Bishop s PRML Ch. 8: Graphical Models

Chris Bishop s PRML Ch. 8: Graphical Models Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular

More information

25 : Graphical induced structured input/output models

25 : Graphical induced structured input/output models 10-708: Probabilistic Graphical Models 10-708, Spring 2016 25 : Graphical induced structured input/output models Lecturer: Eric P. Xing Scribes: Raied Aljadaany, Shi Zong, Chenchen Zhu Disclaimer: A large

More information

Variational Principal Components

Variational Principal Components Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings

More information

Directed acyclic graphs and the use of linear mixed models

Directed acyclic graphs and the use of linear mixed models Directed acyclic graphs and the use of linear mixed models Siem H. Heisterkamp 1,2 1 Groningen Bioinformatics Centre, University of Groningen 2 Biostatistics and Research Decision Sciences (BARDS), MSD,

More information

Extracting the Core Structural Connectivity Network: Guaranteeing Network Connectedness Through a Graph-Theoretical Approach

Extracting the Core Structural Connectivity Network: Guaranteeing Network Connectedness Through a Graph-Theoretical Approach Extracting the Core Structural Connectivity Network: Guaranteeing Network Connectedness Through a Graph-Theoretical Approach Demian Wassermann, Dorian Mazauric, Guillermo Gallardo-Diez, Rachid Deriche

More information

Bayesian Learning in Undirected Graphical Models

Bayesian Learning in Undirected Graphical Models Bayesian Learning in Undirected Graphical Models Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK http://www.gatsby.ucl.ac.uk/ and Center for Automated Learning and

More information

13: Variational inference II

13: Variational inference II 10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational

More information

Automatic Differentiation Equipped Variable Elimination for Sensitivity Analysis on Probabilistic Inference Queries

Automatic Differentiation Equipped Variable Elimination for Sensitivity Analysis on Probabilistic Inference Queries Automatic Differentiation Equipped Variable Elimination for Sensitivity Analysis on Probabilistic Inference Queries Anonymous Author(s) Affiliation Address email Abstract 1 2 3 4 5 6 7 8 9 10 11 12 Probabilistic

More information

Model Comparison. Course on Bayesian Inference, WTCN, UCL, February Model Comparison. Bayes rule for models. Linear Models. AIC and BIC.

Model Comparison. Course on Bayesian Inference, WTCN, UCL, February Model Comparison. Bayes rule for models. Linear Models. AIC and BIC. Course on Bayesian Inference, WTCN, UCL, February 2013 A prior distribution over model space p(m) (or hypothesis space ) can be updated to a posterior distribution after observing data y. This is implemented

More information

Human Brain Networks. Aivoaakkoset BECS-C3001"

Human Brain Networks. Aivoaakkoset BECS-C3001 Human Brain Networks Aivoaakkoset BECS-C3001" Enrico Glerean (MSc), Brain & Mind Lab, BECS, Aalto University" www.glerean.com @eglerean becs.aalto.fi/bml enrico.glerean@aalto.fi" Why?" 1. WHY BRAIN NETWORKS?"

More information

bound on the likelihood through the use of a simpler variational approximating distribution. A lower bound is particularly useful since maximization o

bound on the likelihood through the use of a simpler variational approximating distribution. A lower bound is particularly useful since maximization o Category: Algorithms and Architectures. Address correspondence to rst author. Preferred Presentation: oral. Variational Belief Networks for Approximate Inference Wim Wiegerinck David Barber Stichting Neurale

More information

Bayesian System Identification based on Hierarchical Sparse Bayesian Learning and Gibbs Sampling with Application to Structural Damage Assessment

Bayesian System Identification based on Hierarchical Sparse Bayesian Learning and Gibbs Sampling with Application to Structural Damage Assessment Bayesian System Identification based on Hierarchical Sparse Bayesian Learning and Gibbs Sampling with Application to Structural Damage Assessment Yong Huang a,b, James L. Beck b,* and Hui Li a a Key Lab

More information

Lecture 13 : Variational Inference: Mean Field Approximation

Lecture 13 : Variational Inference: Mean Field Approximation 10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1

More information

Functional Connectivity and Network Methods

Functional Connectivity and Network Methods 18/Sep/2013" Functional Connectivity and Network Methods with functional magnetic resonance imaging" Enrico Glerean (MSc), Brain & Mind Lab, BECS, Aalto University" www.glerean.com @eglerean becs.aalto.fi/bml

More information

EEG/MEG Inverse Solution Driven by fmri

EEG/MEG Inverse Solution Driven by fmri EEG/MEG Inverse Solution Driven by fmri Yaroslav Halchenko CS @ NJIT 1 Functional Brain Imaging EEG ElectroEncephaloGram MEG MagnetoEncephaloGram fmri Functional Magnetic Resonance Imaging others 2 Functional

More information

Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang

Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang Chapter 4 Dynamic Bayesian Networks 2016 Fall Jin Gu, Michael Zhang Reviews: BN Representation Basic steps for BN representations Define variables Define the preliminary relations between variables Check

More information

Hierarchical Clustering Identifies Hub Nodes in a Model of Resting-State Brain Activity

Hierarchical Clustering Identifies Hub Nodes in a Model of Resting-State Brain Activity WCCI 22 IEEE World Congress on Computational Intelligence June, -5, 22 - Brisbane, Australia IJCNN Hierarchical Clustering Identifies Hub Nodes in a Model of Resting-State Brain Activity Mark Wildie and

More information

New Machine Learning Methods for Neuroimaging

New Machine Learning Methods for Neuroimaging New Machine Learning Methods for Neuroimaging Gatsby Computational Neuroscience Unit University College London, UK Dept of Computer Science University of Helsinki, Finland Outline Resting-state networks

More information

BMI/STAT 768 : Lecture 13 Sparse Models in Images

BMI/STAT 768 : Lecture 13 Sparse Models in Images BMI/STAT 768 : Lecture 13 Sparse Models in Images Moo K. Chung mkchung@wisc.edu March 25, 2018 1 Why sparse models are needed? If we are interested quantifying the voxel measurements in every voxel in

More information

Variational Bayesian Inference Techniques

Variational Bayesian Inference Techniques Advanced Signal Processing 2, SE Variational Bayesian Inference Techniques Johann Steiner 1 Outline Introduction Sparse Signal Reconstruction Sparsity Priors Benefits of Sparse Bayesian Inference Variational

More information

Bayesian Inference for the Multivariate Normal

Bayesian Inference for the Multivariate Normal Bayesian Inference for the Multivariate Normal Will Penny Wellcome Trust Centre for Neuroimaging, University College, London WC1N 3BG, UK. November 28, 2014 Abstract Bayesian inference for the multivariate

More information

Bayesian Models in Machine Learning

Bayesian Models in Machine Learning Bayesian Models in Machine Learning Lukáš Burget Escuela de Ciencias Informáticas 2017 Buenos Aires, July 24-29 2017 Frequentist vs. Bayesian Frequentist point of view: Probability is the frequency of

More information

17 : Markov Chain Monte Carlo

17 : Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo

More information

Will Penny. SPM for MEG/EEG, 15th May 2012

Will Penny. SPM for MEG/EEG, 15th May 2012 SPM for MEG/EEG, 15th May 2012 A prior distribution over model space p(m) (or hypothesis space ) can be updated to a posterior distribution after observing data y. This is implemented using Bayes rule

More information

Diffusion Tensor Imaging I. Jennifer Campbell

Diffusion Tensor Imaging I. Jennifer Campbell Diffusion Tensor Imaging I Jennifer Campbell Diffusion Imaging Molecular diffusion The diffusion tensor Diffusion weighting in MRI Alternatives to the tensor Overview of applications Diffusion Imaging

More information

Expectation propagation for signal detection in flat-fading channels

Expectation propagation for signal detection in flat-fading channels Expectation propagation for signal detection in flat-fading channels Yuan Qi MIT Media Lab Cambridge, MA, 02139 USA yuanqi@media.mit.edu Thomas Minka CMU Statistics Department Pittsburgh, PA 15213 USA

More information

Bayesian inference J. Daunizeau

Bayesian inference J. Daunizeau Bayesian inference J. Daunizeau Brain and Spine Institute, Paris, France Wellcome Trust Centre for Neuroimaging, London, UK Overview of the talk 1 Probabilistic modelling and representation of uncertainty

More information

Bayesian Networks in Educational Assessment

Bayesian Networks in Educational Assessment Bayesian Networks in Educational Assessment Estimating Parameters with MCMC Bayesian Inference: Expanding Our Context Roy Levy Arizona State University Roy.Levy@asu.edu 2017 Roy Levy MCMC 1 MCMC 2 Posterior

More information

Learning latent structure in complex networks

Learning latent structure in complex networks Learning latent structure in complex networks Lars Kai Hansen www.imm.dtu.dk/~lkh Current network research issues: Social Media Neuroinformatics Machine learning Joint work with Morten Mørup, Sune Lehmann

More information

NCNC FAU. Modeling the Network Architecture of the Human Brain

NCNC FAU. Modeling the Network Architecture of the Human Brain NCNC 2010 - FAU Modeling the Network Architecture of the Human Brain Olaf Sporns Department of Psychological and Brain Sciences Indiana University, Bloomington, IN 47405 http://www.indiana.edu/~cortex,

More information

Dynamic causal modeling for fmri

Dynamic causal modeling for fmri Dynamic causal modeling for fmri Methods and Models for fmri, HS 2015 Jakob Heinzle Structural, functional & effective connectivity Sporns 2007, Scholarpedia anatomical/structural connectivity - presence

More information

Introduction to Probabilistic Machine Learning

Introduction to Probabilistic Machine Learning Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning

More information

Part 1: Expectation Propagation

Part 1: Expectation Propagation Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

MINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS. Maya Gupta, Luca Cazzanti, and Santosh Srivastava

MINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS. Maya Gupta, Luca Cazzanti, and Santosh Srivastava MINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS Maya Gupta, Luca Cazzanti, and Santosh Srivastava University of Washington Dept. of Electrical Engineering Seattle,

More information

Discrete Mathematics and Probability Theory Fall 2015 Lecture 21

Discrete Mathematics and Probability Theory Fall 2015 Lecture 21 CS 70 Discrete Mathematics and Probability Theory Fall 205 Lecture 2 Inference In this note we revisit the problem of inference: Given some data or observations from the world, what can we infer about

More information

Basic Sampling Methods

Basic Sampling Methods Basic Sampling Methods Sargur Srihari srihari@cedar.buffalo.edu 1 1. Motivation Topics Intractability in ML How sampling can help 2. Ancestral Sampling Using BNs 3. Transforming a Uniform Distribution

More information

Learning and Memory in Neural Networks

Learning and Memory in Neural Networks Learning and Memory in Neural Networks Guy Billings, Neuroinformatics Doctoral Training Centre, The School of Informatics, The University of Edinburgh, UK. Neural networks consist of computational units

More information

RaRE: Social Rank Regulated Large-scale Network Embedding

RaRE: Social Rank Regulated Large-scale Network Embedding RaRE: Social Rank Regulated Large-scale Network Embedding Authors: Yupeng Gu 1, Yizhou Sun 1, Yanen Li 2, Yang Yang 3 04/26/2018 The Web Conference, 2018 1 University of California, Los Angeles 2 Snapchat

More information

Multiple-step Time Series Forecasting with Sparse Gaussian Processes

Multiple-step Time Series Forecasting with Sparse Gaussian Processes Multiple-step Time Series Forecasting with Sparse Gaussian Processes Perry Groot ab Peter Lucas a Paul van den Bosch b a Radboud University, Model-Based Systems Development, Heyendaalseweg 135, 6525 AJ

More information

Chapter 35 Bayesian STAI Anxiety Index Predictions Based on Prefrontal Cortex NIRS Data for the Resting State

Chapter 35 Bayesian STAI Anxiety Index Predictions Based on Prefrontal Cortex NIRS Data for the Resting State Chapter 35 Bayesian STAI Anxiety Index Predictions Based on Prefrontal Cortex NIRS Data for the Resting State Masakaze Sato, Wakana Ishikawa, Tomohiko Suzuki, Takashi Matsumoto, Takeo Tsujii, and Kaoru

More information

Based on slides by Richard Zemel

Based on slides by Richard Zemel CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we

More information

Graphical Models and Kernel Methods

Graphical Models and Kernel Methods Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.

More information

Introduction to Probabilistic Graphical Models

Introduction to Probabilistic Graphical Models Introduction to Probabilistic Graphical Models Sargur Srihari srihari@cedar.buffalo.edu 1 Topics 1. What are probabilistic graphical models (PGMs) 2. Use of PGMs Engineering and AI 3. Directionality in

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College

More information

Model Selection and Estimation of Multi-Compartment Models in Diffusion MRI with a Rician Noise Model

Model Selection and Estimation of Multi-Compartment Models in Diffusion MRI with a Rician Noise Model Model Selection and Estimation of Multi-Compartment Models in Diffusion MRI with a Rician Noise Model Xinghua Zhu, Yaniv Gur, Wenping Wang, P. Thomas Fletcher The University of Hong Kong, Department of

More information

Algorithmisches Lernen/Machine Learning

Algorithmisches Lernen/Machine Learning Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines

More information

Decoding conceptual representations

Decoding conceptual representations Decoding conceptual representations!!!! Marcel van Gerven! Computational Cognitive Neuroscience Lab (www.ccnlab.net) Artificial Intelligence Department Donders Centre for Cognition Donders Institute for

More information

From Pixels to Brain Networks: Modeling Brain Connectivity and Its Changes in Disease. Polina Golland

From Pixels to Brain Networks: Modeling Brain Connectivity and Its Changes in Disease. Polina Golland From Pixels to Brain Networks: Modeling Brain Connectivity and Its Changes in Disease Polina Golland MIT Computer Science and Artificial Intelligence Laboratory Joint work with Archana Venkataraman C.-F.

More information