Graph Frequency Analysis of Brain Signals

Weiyu Huang, Leah Goldsberry, Nicholas F. Wymbs, Scott T. Grafton, Danielle S. Bassett and Alejandro Ribeiro

arXiv: v1 [q-bio.NC] 2 Nov 2015

Abstract: This paper presents methods to analyze functional brain networks and signals from graph spectral perspectives. The notion of frequency and filters traditionally defined for signals supported on regular domains such as discrete time and image grids has recently been generalized to irregular graph domains, and defines brain graph frequencies associated with different levels of spatial smoothness across the brain regions. Brain network frequency also enables the decomposition of brain signals into pieces corresponding to smooth or vibrant variations. We relate graph frequency to principal component analysis when the networks of interest denote functional connectivity. The methods are utilized to analyze brain networks and signals as subjects master a simple motor skill. We observe that brain signals corresponding to different graph frequencies exhibit different levels of adaptability throughout learning. Further, we notice a strong association between the graph spectral properties of brain networks and the level of exposure to the tasks performed, and identify the most important frequency signatures at different levels of task familiarity.

Index Terms: Functional brain network, network theory, graph signal processing, fMRI, motor learning, filtering

I. INTRODUCTION

We consider functional brain signals that describe activities at brain regions and the networks depicting functional connectivities between regions, and demonstrate how information can be gleaned by analyzing them using tools from graph signal processing. Most often, brain activities [1], [2] and functional brain networks [3], [4] are analyzed separately as different entities. However, they are highly related since activities can be regarded as signals supported on the underlying networks. Brain networks, irrespective of whether their connectivity is functional or structural, tend to be stable for a window of time, entailing associations between brain regions during the captured time of interest. Brain activities can vary more frequently, forming multiple samples of brain signals supported on a common underlying network. The benefits of capturing network information in analyzing brain signals have been recognized and exploited in multiple domains including dictionary learning [5], rating predictions in recommendation systems [6], semi-supervised learning [7], image processing and denoising [8], image and video compression [9], and breast cancer diagnostics [10]. Graph signal processing generalizes notions of frequency and filters conventionally described for signals residing on regular domains, e.g., discrete time and image grids, to irregular graph and network domains. Different graph frequency components entail different levels of spatial variation with respect to the brain network. Graph frequency provides a method to equivalently represent a graph signal in two different domains: the vertex domain consisting of the nodes in the network, and the graph spectral domain incorporating graph frequencies. Therefore, brain signals can be manipulated in the spectral domain to shape their frequency content and favor different levels of interaction between neighbors in the network. Conventional frequency shaping has mainly been focused on two areas: (i) removing high frequency components with high variation to de-noise speech [11] or images [12], which has recently been generalized to improve the accuracy of recommendation systems [6] and breast cancer detection [10], or (ii) extracting frequency signatures that explain the most variation in observations to compress data [13] and improve classification [14]. These frequency methods have not been applied to analyze brain signals and networks to date, and it is not clear which kinds of variation of brain signals on brain networks are constructive or destructive for neurophysiological activities such as perception, recognition, motor performance, or general learning.

The main problem addressed in this paper is the application of graph signal processing tools to analyze brain signals and networks during motor learning. We present formal definitions of graph frequency and frequency decomposition of graph signals into pieces corresponding to different levels of spatial variation (Section II), and establish connections between graph signal processing and principal component analysis (Section II-C) when the networks denote functional connectivity in particular. The two experiments involving different motor tasks and different sets of participants considered in the paper are described (Section III). The graph frequencies of the functional brain networks of the participants are visualized and analyzed (Section IV), and are found to be highly associated across scan sessions in the same dataset and across datasets. A simple model to construct artificial networks with a few network descriptive parameters is described (Section IV-A) and used to analyze spectral properties of the brain networks (Section IV-B). The paper then utilizes graph frequency decomposition to visualize and investigate brain activities with different levels of spatial variation (Section V). It is observed that the decomposed signals associated with different graph frequencies exhibit markedly different levels of adaptability throughout learning (Section V-A), which is consistent across temporal scales and experiments. We also define the learning capabilities of subjects, and examine the importance of brain frequency components at different levels of task familiarity by evaluating their respective association with learning performance (Section VI). A strong correlation exists, and graph signal processing is utilized to recognize the most crucial frequency signatures at different levels of task familiarity.

Work in this paper is supported by NSF CCF, PHS NS44393, ARO ICB W911NF-9-1, the John D. and Catherine T. MacArthur Foundation, the Alfred P. Sloan Foundation, ARO ARL W911NF, ARO W911NF, the Office of Naval Research Young Investigator Program, NIH R1-HD86888, NSF BCS, and NSF BCS. The content is solely the responsibility of the authors and does not necessarily represent the official views of any of the funding agencies. W. Huang, L. Goldsberry, D. S. Bassett, and A. Ribeiro are with the Department of Electrical and Systems Engineering, University of Pennsylvania, 200 South 33rd Street, Philadelphia, PA. N. F. Wymbs is with the Department of Physical Medicine and Rehabilitation, Johns Hopkins University, Baltimore, MD 21218. S. T. Grafton is with the Department of Psychological and Brain Sciences, University of California at Santa Barbara, Santa Barbara, CA. whuang@seas.upenn.edu, goleah@seas.upenn.edu, nwymbs1@jhmi.edu, scott.grafton@psych.ucsb.edu, dsb@seas.upenn.edu, and aribeiro@seas.upenn.edu.

II. GRAPH SIGNAL PROCESSING

The interest of this paper is to study brain signals in which we are given a collection of measurements $x_i$ associated with each cortical region out of $n$ different brain regions. An example signal of this type is an fMRI reading in which $x_i$ estimates the level of activity of brain region $i$. The collection of $n$ measurements is henceforth grouped in the vector signal $x = [x_1, x_2, \ldots, x_n]^T \in \mathbb{R}^n$. A fundamental feature of the signal $x$ is the existence of an underlying pattern of structural or functional connectivity that couples the values of the signal $x$ at different brain regions. Irrespective of whether connectivity is functional or structural, our goal here is to describe tools that utilize this underlying brain network to analyze patterns in the neurophysiological signal $x$. We do so by modelling connectivity between brain regions with a network that is connected, weighted, undirected, and symmetric. Formally, we define a network as the pair $G = (\mathcal{V}, \mathcal{W})$, where $\mathcal{V} = \{1, \ldots, n\}$ is a set of $n$ vertices or nodes representing individual brain regions and $\mathcal{W} : \mathcal{V} \times \mathcal{V} \to \mathbb{R}_+$ is a map from the set of edges to the nonnegative reals that associates a weight $w_{ij}$ with the edge $(i,j) \in \mathcal{V} \times \mathcal{V}$. Since the network is undirected and symmetric we have that $w_{ij} = w_{ji}$ for all $(i,j)$. The weights $w_{ij} = w_{ji}$ represent the strength of the connection between regions $i$ and $j$, or, equivalently, the proximity or similarity between nodes $i$ and $j$. In terms of the signal $x$, this means that when the weight $w_{ij}$ is large, the signal values $x_i$ and $x_j$ tend to be related. Conversely, when the weight $w_{ij}$ is small or, in the extremum, when we have $w_{ij} = 0$, the signal values $x_i$ and $x_j$ are not directly related except for what is implied by their separate connections to other nodes.

We adopt the conventional definitions of the adjacency, degree, and Laplacian matrices for the weighted graph $G = (\mathcal{V}, \mathcal{W})$ [15, Chapter 1]. The adjacency matrix $A \in \mathbb{R}^{n \times n}_+$ is defined such that $A_{ij} = w_{ij}$ when $i \neq j$ and $A_{ij} = 0$ otherwise. The degree matrix $D \in \mathbb{R}^{n \times n}_+$ is a diagonal matrix with its $i$th diagonal element $D_{ii} = \sum_j w_{ij}$ denoting the sum of all the weights of edges out of node $i$. The Laplacian matrix is defined as the difference $L := D - A \in \mathbb{R}^{n \times n}$. The components of the Laplacian matrix are explicitly given by

$$L_{ij} = -w_{ij} \ \text{ for } i \neq j, \qquad L_{ii} = \sum_{j=1}^{n} w_{ij}. \quad (1)$$

Observe that the Laplacian is real, symmetric, diagonally dominant, and with strictly positive diagonal elements. As such, all of its eigenvalues are real and nonnegative, which implies that the matrix $L$ is positive semidefinite. Further observe that $L\mathbf{1} = \mathbf{0}$, with $\mathbf{1}$ and $\mathbf{0}$ denoting the vectors of all ones and all zeros respectively, which implies that $0$ is an eigenvalue of $L$ and that, as a consequence, $L$ is not positive definite. The eigenvector decomposition of $L$ is utilized in the following section to define the graph Fourier transform and the associated notion of graph frequencies.

A. Graph Fourier Transform and Graph Frequencies

Begin by considering the set $\{\lambda_k\}_{k=0,1,\ldots,n-1}$ of eigenvalues of the Laplacian $L$ and assume they are ordered so that $0 = \lambda_0 \leq \lambda_1 \leq \ldots \leq \lambda_{n-1}$. Each of these eigenvalues has an associated unit-norm eigenvector $v_k$ such that $L v_k = \lambda_k v_k$. Define the diagonal eigenvalue matrix $\Lambda := \mathrm{diag}(\lambda_0, \lambda_1, \ldots, \lambda_{n-1})$ and the eigenvector matrix

$$V := [v_0, v_1, \ldots, v_{n-1}]. \quad (2)$$

Because the graph Laplacian $L$ is real and symmetric it accepts the eigenvalue decomposition

$$L = V \Lambda V^H, \quad (3)$$

where $V^H$ represents the Hermitian (conjugate transpose) of the matrix $V$. The validity of (3) follows because the eigenvectors of symmetric matrices are orthogonal, so that the definition in (2) implies that $V^H V = I$. The eigenvector matrix $V$ is used to define the Graph Fourier Transform of the graph signal $x$ as we formally state next; see, e.g., [16].

Definition 1. Given a signal $x \in \mathbb{R}^n$ and a graph Laplacian $L \in \mathbb{R}^{n \times n}$ accepting the decomposition in (3), the Graph Fourier Transform (GFT) of $x$ with respect to $L$ is the signal

$$\hat{x} = V^H x. \quad (4)$$

The inverse GFT (iGFT) of $\hat{x}$ with respect to $L$ is defined as

$$x = V \hat{x}. \quad (5)$$

We say that $x$ and $\hat{x}$ form a GFT transform pair. Observe that since $V V^H = I$, the iGFT is, indeed, the inverse of the GFT. Given a signal $x$ we can compute the GFT as per (4). Given the transform $\hat{x}$ we can recover the original signal $x$ through the iGFT in (5).

There are several reasons that justify the association of the GFT with the Fourier transform. Mathematically, it is just a matter of definition that if the vectors $v_k$ in (3) are of the form $v_k = [1, e^{j2\pi k/n}, \ldots, e^{j2\pi k(n-1)/n}]^T$, the GFT and iGFT in Definition 1 reduce to the conventional time domain Fourier and inverse Fourier transforms. More deeply, it is not difficult to see that if the graph $G$ is a cycle, the vectors $v_k$ in (3) are indeed of this form. Since cycle graphs are representations of discrete periodic signals, it follows that the GFT of a time signal is equivalent to the conventional discrete Fourier transform; see, e.g., [17].

An important property of the GFT is that it encodes a notion of variability akin to the notion of variability that the Fourier transform encodes for temporal signals. To see this, define $\hat{x} = [\hat{x}_0, \ldots, \hat{x}_{n-1}]^T$ and expand the matrix product in (5) to express the original signal $x$ as

$$x = \sum_{k=0}^{n-1} \hat{x}_k v_k. \quad (6)$$

It follows from (6) that the iGFT allows us to write the signal $x$ as a sum of orthogonal components $v_k$ in which the contribution of $v_k$ to the signal $x$ is the GFT component $\hat{x}_k$. In conventional Fourier analysis, the eigenvectors $v_k = [1, e^{j2\pi k/n}, \ldots, e^{j2\pi k(n-1)/n}]^T$ carry a specific notion of variability encoded in the notion of frequency. When $k$ is close to zero (low frequencies), the corresponding complex exponential eigenvectors are smooth, slowly varying functions. When $k$ is close to $n$ (high frequencies), the corresponding complex exponential eigenfunctions fluctuate more rapidly in the discrete temporal domain. Thus, when we look at the frequency components $\hat{x}_k$ with small $k$ we are looking at how much slowly varying oscillations contribute to $x$; when we look at $\hat{x}_k$ with large $k$, we are examining the contribution of rapidly varying oscillations to the signal $x$.
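As a concrete illustration of Definition 1, the GFT and iGFT can be computed directly from the eigendecomposition of the Laplacian. The following is a minimal sketch in Python/NumPy on a small synthetic network; the weight matrix and signal are placeholders rather than data from the paper.

```python
import numpy as np

# Symmetric, weighted adjacency matrix of a small toy network (placeholder data).
A = np.array([[0.0, 0.8, 0.2, 0.0],
              [0.8, 0.0, 0.5, 0.1],
              [0.2, 0.5, 0.0, 0.9],
              [0.0, 0.1, 0.9, 0.0]])

D = np.diag(A.sum(axis=1))        # degree matrix
L = D - A                         # graph Laplacian, cf. (1)

# Eigendecomposition L = V Lambda V^H with eigenvalues in increasing order, cf. (2)-(3).
lam, V = np.linalg.eigh(L)        # eigh returns ascending eigenvalues for symmetric L

x = np.array([1.0, 0.9, 0.2, 0.1])   # a graph signal on the 4 nodes
x_hat = V.conj().T @ x               # GFT, cf. (4)
x_rec = V @ x_hat                    # iGFT, cf. (5)

assert np.allclose(x, x_rec)         # V V^H = I, so the iGFT inverts the GFT
print("graph frequencies:", lam)
print("GFT coefficients :", x_hat)
```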

In the graph setting, the graph Laplacian eigenvectors provide a similar notion of frequency. Indeed, define the total variation of the graph signal $x$ with respect to the Laplacian $L$ as

$$\mathrm{TV}(x) = x^H L x = \sum_{i < j} w_{ij} (x_i - x_j)^2, \quad (7)$$

where in the second equality we expanded the quadratic form and used the definition of the Laplacian entries in (1). It follows that the total variation $\mathrm{TV}(x)$ is a measure of how much the signal changes as measured with respect to the expectation on variability that is encoded by the weights $w_{ij}$. When $w_{ij}$ is large we expect $x_i$ and $x_j$ to be close, because a large weight $w_{ij}$ encodes the structure of the functional relationship between brain regions $i$ and $j$. If they are not related, the contribution of their difference to the total variation is amplified by the weight $w_{ij}$. If the weight $w_{ij}$ is small or null, the difference between the signal values $x_i$ and $x_j$ makes little or no contribution to the total variation. We can then think of a signal with small total variation as one that changes slowly over the graph and of signals with large total variation as those that change rapidly over the graph.

Consider now the total variation of the eigenvectors $v_k$ and use the facts that $L v_k = \lambda_k v_k$ and that $v_k^H v_k = 1$ to conclude that

$$\mathrm{TV}(v_k) = v_k^H L v_k = \lambda_k. \quad (8)$$

It follows from (8) and the fact that the eigenvalues are ordered as $0 = \lambda_0 \leq \lambda_1 \leq \ldots \leq \lambda_{n-1}$ that the total variations of the eigenvectors $v_k$ follow the same order. Combining this observation with the discussion following (7), we conclude that when $k$ is close to $0$, the eigenvectors $v_k$ vary slowly over the graph, whereas for $k$ close to $n$ the eigenvectors vary more rapidly.

Returning now to (6) we see that the GFT and iGFT allow us to decompose the brain signal $x$ into components that characterize different levels of variability. We say that the eigenvectors associated with small eigenvalues $\lambda_k$ in (6) represent low graph frequencies because they vary slowly on the network composed of connections between brain regions. The corresponding GFT coefficients $\hat{x}_k$ indicate how much these slowly varying signals contribute to the observed brain signal $x$. Likewise, the eigenvectors associated with large eigenvalues $\lambda_k$ in (6) represent high graph frequencies because they vary rapidly with respect to the underlying network of brain connectivity. The GFT coefficients $\hat{x}_k$ indicate how much these rapidly varying signals contribute to the observed brain signal $x$. The frequency analysis of temporal signals is useful because it permits analysis in terms of components that represent different levels of variability. We will demonstrate in Sections IV, V, and VI that graph frequency decompositions of brain signals enable analogous characterizations. Before that, we describe some basic notions of graph filtering and graph frequency decompositions in the following section.

B. Graph Filtering and Frequency Decompositions

Given a graph signal $x$ with its graph frequency components $\hat{x}$, we can take its frequency components corresponding to the lowest $K_L$ graph frequencies by defining

$$\hat{h}_L(k) = \begin{cases} 1, & \text{if } k \leq K_L - 1; \\ 0, & \text{otherwise}, \end{cases} \quad (9)$$

and performing the multiplication $\hat{x}_L = \hat{H}_L \hat{x}$ with $\hat{H}_L = \mathrm{diag}(\hat{h}_L)$ being the diagonal matrix of $\hat{h}_L$. This operation happens in the graph spectral domain. Utilizing the definitions of the GFT (4) and iGFT (5), this is equivalent to performing the following operations in the graph vertex domain:

$$x_L = V \hat{x}_L = V \hat{H}_L V^H x := H_L x, \quad (10)$$

with $H_L = V \hat{H}_L V^H$. Therefore, the graph signal $x_L$ associated with the low graph frequencies of $x$ can be considered as the multiplication of $H_L$ with $x$.

Fig. 1. An illustration of the frequency responses $\hat{h}_L$, $\hat{h}_M$, and $\hat{h}_H$ of the family of three filters used in graph signal frequency decomposition, with pass bands ending at $K_L$, $K_L + K_M$, and $n$, respectively. Because the gain of the pass bands is always 1 and the filters are mutually exclusive yet collectively exhaustive over the graph spectrum, they provide a method to decompose a given graph signal into pieces exhibiting different levels of fluctuation with respect to the underlying graph.

If we denote by $\Psi$ the Vandermonde matrix

$$\Psi = \begin{bmatrix} 1 & \lambda_0 & \cdots & \lambda_0^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & \lambda_{n-1} & \cdots & \lambda_{n-1}^{n-1} \end{bmatrix} \quad (11)$$

and define $h_L = \Psi^{-1} \hat{h}_L$, then $H_L$ can also be written as $H_L = \sum_{k=0}^{n-1} h_L(k) L^k$. The inverse $\Psi^{-1}$ exists if the graph Laplacian does not have duplicate eigenvalues, which is typically the case for functional brain networks. The structure of the Vandermonde matrix $\Psi$ implies that the coefficients $h_L$ corresponding to the frequency response $\hat{h}_L$ defined in (9) also concentrate on $h_L(k)$ for small index values $k$. Therefore, the graph signal $x_L$ corresponding to the low graph frequencies of $x$ can be interpreted as the aggregated effect of $x$ with powers of the graph Laplacian $L^k$ for small values of $k$. Because $L^0 x = x$ recovers the original signal, $Lx$ entails local interactions between neighbors, $L^2 x$ describes interactions between nodes via intermediate common neighbors, $L^3 x$ denotes communications between 3-hop neighbors, and so on, $x_L$ results from the localized interactions of $x$ in the graph vertex domain. From the graph spectral domain perspective, $x_L$ only has low graph frequency components of $x$ and therefore can be considered as the result of passing $x$ through a graph low-pass filter.

Filters have been utilized in traditional signal processing since the birth of the field to analyze data such as acoustic signals, speech, or images. Filters are favored because they allow one to shape frequencies: by choosing different coefficients at different frequencies, the output of the filter emphasizes different types of interactions. Graph signal processing enables the generalization of filters from regular domains, e.g., discrete time or image grids, to irregular domains, i.e., graphs and networks, but keeps the interpretation of filters as favoring different interactions between neighboring nodes.
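To make the low-pass operation in (9) and (10) concrete, the sketch below masks the GFT coefficients and also forms the equivalent vertex-domain operator $H_L = V \hat{H}_L V^H$; it assumes the eigenvector matrix V from the previous snippet and is illustrative only.

```python
import numpy as np

def graph_lowpass(V, x, K_L):
    """Keep the K_L lowest graph frequencies of x, cf. (9)-(10)."""
    n = V.shape[0]
    h_L = np.zeros(n)
    h_L[:K_L] = 1.0                       # ideal low-pass frequency response, cf. (9)
    x_hat = V.conj().T @ x                # GFT of the signal
    x_L = V @ (h_L * x_hat)               # iGFT of the masked coefficients
    H_L = V @ np.diag(h_L) @ V.conj().T   # equivalent vertex-domain operator, cf. (10)
    assert np.allclose(x_L, H_L @ x)      # spectral- and vertex-domain filtering agree
    return x_L, H_L

# Example: keep the two lowest graph frequencies of the signal x from the previous sketch.
# x_L, H_L = graph_lowpass(V, x, K_L=2)
```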

The aforementioned graph low-pass filter $H_L$ prefers localized interactions on the underlying network. Other types of graph filters can be defined analogously, leaning towards interactions between nodes in the network other than localized ones. In this paper, apart from the graph low-pass filter $H_L$, we also consider the graph band-pass filter $H_M$ and high-pass filter $H_H$, whose graph frequency responses are defined as

$$\hat{h}_M(k) = \begin{cases} 1, & \text{if } K_L \leq k \leq K_L + K_M - 1; \\ 0, & \text{otherwise}; \end{cases} \quad (12)$$

$$\hat{h}_H(k) = \begin{cases} 1, & \text{if } k \geq K_L + K_M; \\ 0, & \text{otherwise}. \end{cases} \quad (13)$$

In words, the low-pass filter takes the lowest $K_L$ graph frequencies, the band-pass filter captures the middle $K_M$ graph frequencies, and the high-pass filter only considers the highest $n - K_L - K_M$ frequencies. An illustration is shown in Figure 1. Notice that the three filters are defined such that the graph frequencies of their respective interest are mutually exclusive yet collectively exhaustive. As a result, if we denote by $x_L = H_L x$ the graph signal filtered by the graph low-pass filter $H_L$, and define $x_M$ as well as $x_H$ similarly, the fact that the frequency gain in the pass bands is always 1 implies that $\hat{x} = \hat{x}_L + \hat{x}_M + \hat{x}_H$ in the graph spectral domain. Equivalently, in the graph vertex domain,

$$x = x_L + x_M + x_H. \quad (14)$$

Therefore, the family of graph filters $H_L$, $H_M$, and $H_H$ provides us a method to decompose a given graph signal $x$ into three nonduplicating pieces, which respectively correspond to the part of the signal that is smooth and slowly varying on the network ($x_L$), the part that fluctuates rapidly with respect to the graph ($x_H$), and the remaining part ($x_M$). The frequency decomposition introduced in this section is also highly related to principal component analysis (PCA) [18] when the brain networks are constructed from functional connectivity estimates, as we describe in the next section.
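Before moving on, here is a minimal sketch of the three-band decomposition in (12)-(14), again assuming an eigenvector matrix V ordered by increasing graph frequency; the band sizes K_L and K_M are free parameters chosen by the analyst.

```python
import numpy as np

def frequency_decompose(V, x, K_L, K_M):
    """Split x into low, middle, and high graph frequency pieces, cf. (12)-(14)."""
    n = V.shape[0]
    x_hat = V.conj().T @ x                       # GFT coefficients
    bands = {"low": slice(0, K_L),
             "mid": slice(K_L, K_L + K_M),
             "high": slice(K_L + K_M, n)}
    pieces = {}
    for name, band in bands.items():
        mask = np.zeros(n)
        mask[band] = 1.0                         # ideal pass band with unit gain
        pieces[name] = V @ (mask * x_hat)        # back to the vertex domain
    # The pass bands are mutually exclusive and collectively exhaustive,
    # so the pieces sum back to the original signal, cf. (14).
    assert np.allclose(pieces["low"] + pieces["mid"] + pieces["high"], x)
    return pieces["low"], pieces["mid"], pieces["high"]
```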
C. Relationship to Principal Component Analysis

Functional brain networks have recently been extensively studied [19], [20]. In such scenarios, graph observations $Y \in \mathbb{R}^{p \times n}$ at $p$ time points are available, with the $i$th column $y_i$ denoting the observations at node $i$. The functional brain networks are then constructed so that the weight of an edge $w_{ij}$ is given by the Pearson correlation coefficient between the BOLD time series of node $i$ and the BOLD time series of node $j$. In this section, we connect PCA for brain signals [21], [22] with graph frequency decomposition when the networks are constructed from correlations, and demonstrate that PCA for brain networks is similar to a specific type of graph frequency decomposition.

To begin, given the observation matrix $Y$, PCA considers the covariance matrix $\Sigma = \mathrm{cov}(Y) \in \mathbb{R}^{n \times n}$, with its entry $\Sigma_{ij}$ computing the covariance of the observations $y_i$ and $y_j$, and evaluates the expansion of observations onto the set of eigenvectors $\{v_k^\Sigma\}$ of $\Sigma$. Functional brain networks are constructed with the adjacency matrix $A$ defined as

$$A_{ij} = \begin{cases} \text{correlation coefficient of } y_i \text{ and } y_j, & \text{if } i \neq j; \\ 0, & \text{otherwise}. \end{cases} \quad (15)$$

Denote by $\hat{\Sigma} = \mathrm{corrcoef}(Y) \in \mathbb{R}^{n \times n}$ the matrix representing the correlation coefficients between columns in $Y$; then $A$ can also be expressed as $A = \hat{\Sigma} - I$, since the correlation coefficient between $y_i$ and itself is always 1 for any $i$. Moreover, because the variances $\sigma(y_i)$ and $\sigma(y_j)$ at different nodes are not highly distinguished for different nodes $i$ and $j$ in the observations, the covariance between $y_i$ and $y_j$ is not significantly different from their correlation coefficient. Consequently, we can write $\hat{\Sigma} \approx \alpha_1 \Sigma$ for some real number $\alpha_1 \in \mathbb{R}$.

In graph signal processing, the network structure of interest is not the adjacency matrix $A$ but rather the graph Laplacian $L = D - A$. For functional brain networks, if the degree distribution is homogeneous, i.e., the entry $D_{ii}$, denoting the summation of correlations of the observation $y_i$ at node $i$ with the observations $y_l$ at all other nodes $l \neq i$, is not highly different from $D_{jj}$ for $j \neq i$, we can write $D \approx \alpha_2 I$ for some real number $\alpha_2 \in \mathbb{R}$. Although this is not always the case for functional brain networks, the degree distributions of most functional brain networks are not highly skewed. Therefore, combining the constructed relationships, we can write

$$L \approx (\alpha_2 + 1) I - \alpha_1 \Sigma. \quad (16)$$

This implies that the set of graph eigenvectors $\{v_k\}$ of $L$ is not highly different from the set of eigenvectors $\{v_k^\Sigma\}$. We emphasize, however, that the order of the eigenvectors in $\{v_k\}$ and $\{v_k^\Sigma\}$ flips, i.e., the graph eigenvector $v_k$ associated with a low frequency corresponds to the eigenvector $v_k^\Sigma$ associated with a high eigenvalue of $\Sigma$. Recall that in PCA, given a graph signal $x \in \mathbb{R}^n$, one is interested in computing its expansion $\langle x, v_k^\Sigma \rangle$ in terms of the eigenvectors $v_k^\Sigma$ associated with high eigenvalues, since these eigenvectors explain most of the variation of the observations. Therefore, PCA for functional brain networks is similar to evaluating the graph frequency coefficients for low graph frequencies, i.e., focusing on the segment of the brain signal $x$ that varies smoothly on the underlying network. Graph frequency decomposition provides a more general framework, since it enables shaping not only the low frequencies but all graph frequencies through the introduction of graph filters.

In this paper, we use notions of graph frequencies to study the graph spectral properties of brain networks as subjects master a new motor skill. Furthermore, we apply the frequency decomposition introduced in (14) to examine the adaptability of brain signals and their relationship to individual variability in learning rate. We begin by describing the task that subjects performed, and the neuroimaging data that were acquired.
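As a numerical aside before turning to the data, the approximation argument above can be checked on synthetic observations. The sketch below builds A from correlation coefficients as in (15), forms L = D - A, and compares each low-frequency Laplacian eigenvector with the corresponding high-variance eigenvector of the covariance matrix, remembering that the ordering flips. The data are simulated, not the experimental recordings, and the agreement is only approximate: it depends on how homogeneous the variances and degrees are.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 500, 20                                  # p time points, n regions (synthetic sizes)
Y = rng.standard_normal((p, n)) @ rng.standard_normal((n, n))  # columns with correlations
Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)        # standardize so covariance ~ correlation

Sigma_hat = np.corrcoef(Y, rowvar=False)        # correlation coefficients between columns
A = Sigma_hat - np.eye(n)                       # adjacency as in (15)
L = np.diag(A.sum(axis=1)) - A                  # graph Laplacian

lam, V = np.linalg.eigh(L)                      # graph frequencies, ascending
lam_S, V_S = np.linalg.eigh(np.cov(Y, rowvar=False))  # covariance eigenvalues, ascending

# k-th lowest graph frequency eigenvector vs. k-th largest principal component.
for k in range(3):
    overlap = abs(V[:, k] @ V_S[:, -(k + 1)])
    print(f"|<v_{k}, v_Sigma_{k}>| = {overlap:.3f}")  # near 1 when the approximation holds
```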

III. BRAIN SIGNALS DURING LEARNING

We considered two experiments in which subjects learned a simple motor task [23]-[27]. In the experiments, forty-seven right-handed participants (29 female, 18 male; mean age years) volunteered with informed consent in accordance with the University of California, Santa Barbara Internal Review Board. After exclusions for task accuracy, incomplete scans, and abnormal MRI, 38 participants were retained for subsequent analysis.

Twenty individuals participated in the first experimental framework. The experiment lasted 6 weeks, in which there were 4 scanning sessions, roughly at the start of the experiment, at the end of the 2nd week, at the end of the 4th week, and at the end of the experiment, respectively. During each scanning session, individuals performed a discrete sequence-production task in which they responded to sequentially presented stimuli with their dominant hand on a custom response box. Sequences were presented using a horizontal array of 5 square stimuli with the responses mapped from left to right such that the thumb corresponded to the leftmost stimulus. The next square in the sequence was highlighted immediately following each correct key press; the sequence was paused awaiting the depression of the appropriate key if an incorrect key was pressed. Each participant completed 6 different 10-element sequences, with each sequence consisting of two squares per key. Participants performed the same sequences at home between each two adjacent scanning sessions, however with different levels of exposure for different sequence types. Therefore, the number of trials completed by the participants after the end of each scanning session depends on the sequence type. There are 3 different sequence types (MIN, MOD, EXT) with 2 sequences per type. The number of trials of each sequence type completed after each scanning session, averaged over the 20 participants, is summarized in Figure 2.

Fig. 2 (table). Relationship between training duration, intensity, and depth for the first experimental framework; rows correspond to the MIN, MOD, and EXT sequence types and columns to scanning sessions 1 through 4. The values in the table denote the number of trials (i.e., depth) of each sequence type (i.e., intensity) completed after each scanning session (i.e., duration), averaged over the 20 participants.

Eighteen subjects participated in the second experimental framework. The experiment had 3 scanning sessions spanning three days. Each scanning session lasted roughly 2 hours and no training was performed at home between adjacent scanning sessions. Subjects responded to a visually cued sequence by generating responses using the four fingers of their nondominant hand on a custom response box. Visual cues were presented as a series of musical notes on a pseudo-musical staff with four lines such that the top line of the staff mapped to the leftmost key pressed with the pinkie finger. Each 12-note sequence, randomly ordered, contained three notes per line.

In both experiments participants were instructed to respond promptly and accurately. Repetitions (e.g., "11") and regularities such as trills (e.g., "121") and runs (e.g., "123") were excluded in all sequences. The order and number of sequence trials was identical for all participants. During scanning sessions, all participants completed the tasks inside the MRI scanner. Recordings with fMRI were conducted using a 3.0 T Siemens Trio with a 12-channel phased-array head coil. For each functional run, a single-shot echo planar imaging sequence that is sensitive to blood oxygen level dependent (BOLD) contrast was utilized to obtain 37 (the first experiment) or 33 (the second experiment) slices (3 mm thickness) per repetition time (TR), with a TR of 2,000 ms, an echo time of 30 ms, a flip angle of 90 degrees, a field of view of 192 mm, and an acquisition matrix. Image preprocessing was performed using the Oxford Center for Functional Magnetic Resonance Imaging of the Brain (FMRIB) Software Library (FSL), and motion correction was performed using FMRIB's linear image registration tool. The whole brain is parcellated into a set of $n = 112$ regions of interest that correspond to the 112 cortical and subcortical structures anatomically identified in FSL's Harvard-Oxford atlas.

For each individual fMRI dataset, we estimate regional mean BOLD time series by averaging voxel time series in each of the $n$ regions. We evaluate the coherence or correlation between the activity of all possible pairs of regions to construct $n \times n$ functional connectivity matrices $A$, whose individual elements are subjected to statistical testing. In the 3 day dataset, the value of all elements that do not pass the false discovery rate correction for multiple comparisons is set to zero; the values remain unchanged otherwise. In the 3 day experiment, a single brain network is constructed for each participant. In the 6 week experiment, due to the long duration of the experiment, we build a different brain network per scanning session per sequence type for each subject. We normalize the regional mean BOLD observations $\hat{x}(t)$ at any time $t$ and consider $x(t) = \hat{x}(t)/\|\hat{x}(t)\|_2$, such that the total energy of activities at all structures is consistent at different $t$, to avoid extreme spikes due to head motion or drift artifacts in fMRI.

IV. BRAIN NETWORK FREQUENCIES

In this section, we examine the fundamental building block of graph spectral analysis of brain networks, the graph Laplacian frequencies associated with brain networks, and verify the interpretation of the notions and orders of graph frequencies. For the brain network $A$ of each subject, we construct its Laplacian $L = D - A$ as in (1) and compute the eigenvectors $\{v_k\}_{k=0}^{n-1}$. In Section II-A, we stated that in the graph setting the graph Laplacian eigenvectors provide a notion of frequency: $v_k$ associated with a smaller index $k$ represents a low graph frequency and varies slowly across the network. To verify this interpretation, we compute the total variation $\mathrm{TV}(v_k)$ [cf. (8)] for each eigenvector $v_k$.

Fig. 3. (a) Total variation TV($v_k$) and (b) weighted zero crossings ZC($v_k$) of the graph Laplacian eigenvectors for the brain networks averaged across participants in the 6 week training experiment; the x-axis is the eigenvalue index. (c) and (d) present the corresponding values for the 3 day experiment. In both cases, the Laplacian eigenvectors associated with larger indexes (larger eigenvalues) vary more on the network and cross zero relatively more often, confirming the interpretation of the graph Laplacian eigenvalues as notions of frequencies.

Figure 3 (a) and (c) plot the total variation of all graph eigenvectors averaged across participants of the 6 week training experiment and the 3 day experiment, respectively. In both experiments, the Laplacian eigenvectors associated with larger indexes, and consequently larger eigenvalues, fluctuate more on the network. This corroborates our claim that Laplacian eigenvalues offer a notion of graph frequencies.

Besides total variation, the number of zero crossings is often used as a measure of the smoothness of signals with respect to an underlying network [16]. Since brain networks are weighted, we adopt a slightly modified version, weighted zero crossings, to investigate a given graph eigenvector $v_k$:

$$\mathrm{ZC}(v_k) = \frac{1}{2} \sum_{i \neq j} w_{ij} \, \mathbf{1}\{v_k(i)\, v_k(j) < 0\}. \quad (17)$$

In words, weighted zero crossings evaluate the weighted sum of the set of edges connecting a vertex with a positive signal value to a vertex with a negative signal value. The factor 1/2 is applied because the graph is symmetric and undirected, so each edge is counted twice in the sum. Figure 3 (b) and (d) show the weighted zero crossings of all graph eigenvectors averaged across subjects of the 6 week and 3 day experiments, respectively. The weighted zero crossings increase with the graph frequency index $k$ over most of the spectrum, as expected. However, eigenvectors associated with the very highest graph frequencies exhibit lower weighted zero crossings. A detailed analysis shows that this is because the functional brain networks are highly connected with nearly homogeneous degree distributions, and consequently each eigenvector of very high graph frequency tends to have a value of high magnitude at one vertex of high degree and similar values at the other nodes, resulting in smaller global zero crossings for the eigenvectors associated with the very highest frequencies.

It is also of interest to examine where the associated eigenvectors lie anatomically, and the relative strength of their values. To facilitate the presentation, we consider three sets of eigenvectors, $\{v_k\}_{k=0}^{K_L-1}$, $\{v_k\}_{k=K_L}^{K_L+K_M-1}$, and $\{v_k\}_{k=K_L+K_M}^{n-1}$, and compute the absolute magnitude at each of the $n$ cortical and subcortical regions averaged across participants and across all graph frequencies belonging to each of the three sets.

Fig. 4. Absolute magnitude at each of the n cortical structures averaged across participants in the 6 week experiment and averaged across all frequencies in (a) the set of low graph frequencies $\{v_k\}_{k=0}^{K_L-1}$, (b) the set of middle graph frequencies $\{v_k\}_{k=K_L}^{K_L+K_M-1}$, and (c) the set of high graph frequencies $\{v_k\}_{k=K_L+K_M}^{n-1}$. (d)-(f) present the average magnitudes for the 3 day experiment. Only brain regions with absolute magnitudes higher than a fixed threshold are colored. Magnitudes for high frequencies are highly alike across the datasets, and the brain regions with high magnitude values significantly overlap with the visual and sensorimotor modules.

Figure 4 presents the average magnitudes for the two experimental frameworks considered in the paper using BrainNet [28], where brain regions with absolute magnitudes lower than a fixed threshold are not colored. The most interesting finding relates to the eigenvectors associated with high graph frequencies. The magnitudes at different brain regions for high frequencies are significantly similar across the two datasets investigated, and the brain regions with high magnitude values largely coincide with the visual and sensorimotor cortices [29]. This is likely to be a consequence of the fact that visual and motor regions are more strongly connected with other structures, and hence an eigenvector with a high magnitude on visual or motor structures would result in high global spatial variation. The eigenvectors of low graph frequencies are more spread across the networks, resulting in low global variations. The middle graph frequencies are less interesting: the magnitudes at most regions do not pass the threshold, and little association can be found between the eigenvectors of the 6 week and 3 day experiments.
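Both diagnostics used in this section, the total variation in (8) and the weighted zero crossings in (17), are straightforward to compute once the Laplacian eigenvectors are available; a small sketch, assuming A, L, and the eigenpairs (lam, V) from the earlier snippets:

```python
import numpy as np

def total_variation(L, v):
    """TV(v) = v^H L v, cf. (7)-(8); equals lambda_k when v is the k-th eigenvector."""
    return float(np.real(v.conj() @ L @ v))

def weighted_zero_crossings(A, v):
    """ZC(v) = 1/2 * sum_{i != j} w_ij * 1{v(i) v(j) < 0}, cf. (17)."""
    sign_change = np.outer(v, v) < 0           # True where v takes opposite signs at i and j
    return 0.5 * float(np.sum(A * sign_change))

# tv = [total_variation(L, V[:, k]) for k in range(V.shape[1])]   # reproduces lam
# zc = [weighted_zero_crossings(A, V[:, k]) for k in range(V.shape[1])]
```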
A. Artificial Functional Brain Networks

An approach to analyzing complex networks that are not well fit by a few graph generation or growth parameters [30], [31] is to define a model to generate artificial networks. Examples of such models include the Erdős-Rényi model [32] for unweighted networks, the Barabási-Albert model for scale-free networks [33], the stochastic block model for community formation in unweighted networks [34], and recent developments and insights on weighted network models [31]. Here we present a framework to construct artificial networks that can be used to mimic the functional brain networks with only a few parameter inputs.

The desired output of the method is a symmetric network with edge weights between 0 and 1 and without self-loops. To begin, suppose the desired network has two different clusters of nodes $\mathcal{V}_1$ and $\mathcal{V}_o$. The algorithm requires the average edge weight $\mu_1$ for connections between nodes of the first cluster $\mathcal{V}_1$, the average edge weight $\mu_o$ for links between vertices of the other cluster $\mathcal{V}_o$, and the average edge weight $\mu_{1o}$ for inter-cluster connections. To reflect the fact that the edge weights on some links are independent of their joining vertices, for each edge within $\mathcal{V}_1$, with probability $p_\epsilon < 1$, its weight is randomly generated from the uniform distribution $\mathcal{U}[0, 1]$ between 0 and 1. For each edge within $\mathcal{V}_1$, with probability $1 - p_\epsilon$, its weight is randomly generated from the uniform distribution $\mathcal{U}[\mu_1 - \delta, \mu_1 + \delta]$. The parameter $\delta$ is selected such that the random variable has variance $\hat{\sigma}^2$, the desired variance in the brain networks. We also constructed networks such that edges for different types of connections are randomly generated with different $\delta$, resulting in different variance values; the networks exhibit negligible changes in their network properties, and for simplicity we use a constant variance for all kinds of connections. We also investigated the construction of networks using random variables drawn from other commonly used distributions, e.g., the Gaussian distribution and the Laplacian distribution. We found that the uniform distribution is able to imitate the networks most closely, and for this reason we use the uniform distribution here. The parameter $p_\epsilon$ determines the percentage of edges whose weights are selected irrespective of their actual locations, and we notice that a typical value between .025 and .1 is able to closely imitate functional brain networks. To further simulate the observation that different participants may possess distinctive brain networks, if the edge weight is randomly generated from a uniform distribution $\mathcal{U}[\mu_1 - \delta, \mu_1 + \delta]$, it is then perturbed by $w_u \sim \mathcal{U}[-u_\epsilon/2, u_\epsilon/2]$, where $u_\epsilon$ controls the level of perturbation; a typical value of $u_\epsilon$ closely mimicking the brain networks is between .05 and .15. The edge weights for connections within the cluster $\mathcal{V}_o$ are generated similarly, such that with probability $1 - p_\epsilon$ the edge weight is randomly chosen from the uniform distribution $\mathcal{U}[\mu_o - \delta, \mu_o + \delta]$ before being contaminated by $w_u \sim \mathcal{U}[-u_\epsilon/2, u_\epsilon/2]$. The edge weights for connections between the clusters $\mathcal{V}_1$ and $\mathcal{V}_o$ are formed analogously using $\mu_{1o}$. The method presented here can be easily generalized to analyze brain networks with more regions of interest, i.e., by specifying sets of regions of interest and by detailing the expected correlation values on each type of connection between different regions.

Remark 1. At one extreme we can make each node $i$ belong to a different set $\mathcal{V}_i = \{i\}$. Then the method requires as inputs the expected edge weights between any pair of nodes, or, alternatively speaking, the expected network. At the other extreme, there is only one set of nodes $\mathcal{V}$, and then the method is highly akin to a network with edge weights generated completely at random. Any construction of interest would have some prior knowledge regarding the community structure, something between the aforementioned two extremes. Therefore, the method proposed here can be used to see if a network constructed with a specific choice of community structure closely reproduces the key properties of the actual network, and can be used to examine the evolution of community structure in the brain with stimuli or throughout the process of mastering a particular task.
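A sketch of the two-cluster generator described above. The parameter names mirror the text (mu_1, mu_o, mu_1o, delta, p_eps, u_eps); the symmetrization and clipping steps are implementation choices added here to guarantee a symmetric network with weights in [0, 1] and no self-loops, and the example values at the end are illustrative rather than fitted to the brain data.

```python
import numpy as np

def artificial_network(n1, no, mu_1, mu_o, mu_1o, delta, p_eps=0.05, u_eps=0.1, seed=None):
    """Generate a symmetric weighted network with clusters V_1 (size n1) and V_o (size no)."""
    rng = np.random.default_rng(seed)
    n = n1 + no

    # Expected weight for each type of connection.
    mu = np.empty((n, n))
    mu[:n1, :n1] = mu_1                      # within the first cluster
    mu[n1:, n1:] = mu_o                      # within the other cluster
    mu[:n1, n1:] = mu[n1:, :n1] = mu_1o      # inter-cluster connections

    W = rng.uniform(mu - delta, mu + delta)           # weights around the cluster means
    W += rng.uniform(-u_eps / 2, u_eps / 2, (n, n))   # per-subject perturbation w_u
    noise = rng.uniform(0.0, 1.0, (n, n))             # location-independent weights
    W = np.where(rng.random((n, n)) < p_eps, noise, W)

    W = np.clip((W + W.T) / 2, 0.0, 1.0)     # symmetrize and keep weights in [0, 1]
    np.fill_diagonal(W, 0.0)                 # no self-loops
    return W

# Example with illustrative (not fitted) parameters for a 112-region network:
# W = artificial_network(n1=20, no=92, mu_1=0.6, mu_o=0.3, mu_1o=0.4, delta=0.1, seed=1)
```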
B. Spectral Properties of Brain Networks

In this section, we analyze the graph spectral properties of brain networks. To that end, given the graph Laplacian of an underlying graph structure, we examine the level of fluctuation of its eigenvectors on different types of connections in the brain network. Different sets of brain regions in the network are defined as modules [23]. More specifically, given an eigenvector $v_k$, its variation on the visual module is defined as

$$\mathrm{TV}_{\mathrm{visual}}(v_k) = \frac{\sum_{i,j \in \mathcal{V}_v,\, i \neq j} w_{ij} \big(v_k(i) - v_k(j)\big)^2}{\sum_{i,j \in \mathcal{V}_v,\, i \neq j} w_{ij}}, \quad (18)$$

where $\mathcal{V}_v$ denotes the set of nodes belonging to the visual module. Because $w_{ij} (v_k(i) - v_k(j))^2$ is the signal difference on the connection between nodes $i$ and $j$ within the visual module weighted by the edge weight $w_{ij}$, $\mathrm{TV}_{\mathrm{visual}}(v_k)$ computes the difference of the signals on the visual module per unit of edge weight. There are $n$ eigenvectors in total, and to facilitate interpretation we only consider the three sets of eigenvectors $\{v_k\}_{k=0}^{K_L-1}$, $\{v_k\}_{k=K_L}^{K_L+K_M-1}$, and $\{v_k\}_{k=K_L+K_M}^{n-1}$, corresponding to low graph frequencies, middle frequencies, and high frequencies, respectively. We then compute the visual module total variation $\mathrm{TV}^L_{\mathrm{visual}}$ averaged over the eigenvectors $\{v_k\}_{k=0}^{K_L-1}$, and $\mathrm{TV}^M_{\mathrm{visual}}$ as well as $\mathrm{TV}^H_{\mathrm{visual}}$ similarly. Besides $\mathrm{TV}_{\mathrm{visual}}$, we also examine the level of fluctuation of eigenvectors on edges uniquely within the motor module, denoted $\mathrm{TV}_{\mathrm{motor}}$, and on connections uniquely belonging to brain modules other than the visual and motor modules, i.e., $\mathrm{TV}_{\mathrm{others}}$. Further, there are links between two separate brain modules, and to assess the variation of eigenvectors on those links, we define the total variation between the visual and motor modules as

$$\mathrm{TV}_{\mathrm{visual\text{-}motor}}(v_k) = \frac{\sum_{i \in \mathcal{V}_v,\, j \in \mathcal{V}_m} w_{ij} \big(v_k(i) - v_k(j)\big)^2}{\sum_{i \in \mathcal{V}_v,\, j \in \mathcal{V}_m} w_{ij}}, \quad (19)$$

where $\mathcal{V}_m$ denotes the set of nodes belonging to the motor module. The total variation $\mathrm{TV}_{\mathrm{visual\text{-}others}}$ between the visual and other modules as well as the total variation $\mathrm{TV}_{\mathrm{motor\text{-}others}}$ between the motor and other modules are defined analogously. We chose to study the visual and motor modules separately from other brain modules because of their well-known associations with motor learning tasks [23], [35].

Figure 5 (a) presents boxplots of the variation of eigenvectors of different graph frequencies measured over different types of connections across participants, at the start of the six week training. Despite the fact that the total variation of the eigenvectors is identical to their corresponding eigenvalues (cf. (8)), and therefore should be higher for eigenvectors associated with higher frequencies, the variation on the other modules $\mathrm{TV}^L_{\mathrm{others}}$ of eigenvectors associated with low graph frequencies is higher than $\mathrm{TV}^H_{\mathrm{others}}$, the variation on the other modules of eigenvectors associated with high frequencies. This implies that the eigenvectors of low graph frequencies exhibit highly similar graph signal values for brain regions within the visual and motor modules, and that the growth in variation of the eigenvectors as the associated frequency increases is purely due to the increase of variation in the visual or motor modules. Put differently, low graph frequency components are expected to vary little, which is the case for the visual and motor modules, and the signals fluctuating significantly are more likely to be noise, which is the scenario for the other modules. Similar observations hold true for high frequency components.
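The module-restricted measures in (18) and (19) can be computed directly from the adjacency matrix; a sketch, assuming A is the n x n weighted adjacency matrix, v is one Laplacian eigenvector, and the node index lists for the modules (the visual_nodes and motor_nodes names below are hypothetical) come from the parcellation:

```python
import numpy as np

def tv_within(A, v, nodes):
    """Variation of eigenvector v on connections within one module, cf. (18)."""
    idx = np.asarray(nodes)
    W = A[np.ix_(idx, idx)]
    diff2 = (v[idx, None] - v[None, idx]) ** 2
    return float(np.sum(W * diff2) / np.sum(W))

def tv_between(A, v, nodes_a, nodes_b):
    """Variation of eigenvector v on connections between two modules, cf. (19)."""
    ia, ib = np.asarray(nodes_a), np.asarray(nodes_b)
    W = A[np.ix_(ia, ib)]
    diff2 = (v[ia, None] - v[None, ib]) ** 2
    return float(np.sum(W * diff2) / np.sum(W))

# tv_vis = tv_within(A, V[:, k], visual_nodes)
# tv_vis_mot = tv_between(A, V[:, k], visual_nodes, motor_nodes)
```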

Fig. 5. Spectral properties of brain networks in the 6 week experiment. (a) Left: Averaged total variation of the eigenvectors $v_k$ for 6 different types of connections of the brain, averaged across all eigenvectors associated with low graph frequencies $v_k \in \{v_k\}_{k=0}^{K_L-1}$ and across all participants and scan sessions. Middle: Across all eigenvectors with middle graph frequencies $v_k \in \{v_k\}_{k=K_L}^{K_L+K_M-1}$. Right: Across all eigenvectors with high graph frequencies $v_k \in \{v_k\}_{k=K_L+K_M}^{n-1}$. (b) Median total variations of brain networks across participants for different scanning sessions and different sequence types, plotted with respect to the level of exposure of participants to the sequence type at the scanning session. The relationship between training duration, intensity, and depth is summarized in Figure 2. The minimum value on the x-axis refers to the minimum exposure to sequences (all 3 sequence types of Session 1), and the maximum value on the x-axis denotes the maximum exposure to sequences (EXT sequences of Session 4). An association between the spectral properties of brain networks and the level of exposure is clearly observed. (c) Median total variations evaluated upon artificial networks. Spectral properties of actual brain networks can be closely simulated using a few parameters.

To better interpret the meaning of variations for specific types of connections, we construct artificial networks as described in Section IV-A with visual and motor modules as regions of interest, and consider the other modules to be brain regions other than the visual and motor modules. We observe that there are three contributing factors that cause the variation within a specific module to become higher for eigenvectors of higher frequencies and lower for eigenvectors of lower frequencies, i.e., to make the module more dominant in the variation of eigenvectors associated with high graph frequencies on the entire brain: (i) increases in the average edge weight for connections within the module, (ii) increases in the average edge weight for links between this module and other modules, and (iii) increases in the average edge weight for associations within the other modules. This can also be observed by analyzing closely the definition of total variation. If a module is highly connected, in order for an eigenvector associated with a low graph frequency to be smooth on the entire network, it has to be smooth on the specific module, resulting in a low value of the variation of an eigenvector associated with a low graph frequency with respect to the module of interest. Similarly, the increase in the variation on connections between two modules, e.g., between the visual and other modules, results from: (i) the growth in the average edge weight for connections between the visual and other modules, or (ii) the increase of the average weight for links within the other modules. We notice that the increase in the average weight for relations within the visual module does not contribute significantly, largely due to the fact that the number of nodes in the other modules is much larger than that in the visual module. Therefore, given a specific brain module, the level of variation of the eigenvectors associated with certain graph frequencies (how high for high graph frequencies and how low for low graph frequencies) can be interpreted as a measure of how strongly the module of interest is associated with modules that are connected with high edge weights: either the module of interest is highly connected itself, or it is connected with another brain module that is strongly associated.

The graph spectral properties in Figure 5 (a) are observed because (i) the visual and motor modules are themselves highly connected, and (ii) the visual module is also strongly linked with the motor module.

Next we study how the graph spectral properties of brain networks evolve as participants become more exposed to and familiar with the tasks. Figure 5 (b) illustrates the median of the variation of eigenvectors of different graph frequencies measured over different types of connections across subjects, at 10 different levels of exposure in the process of six week training. As participants become more acquainted with the assignment, their brain networks display lower variation in the visual and motor modules and higher variation in the other modules for low and middle graph frequencies, and the exact opposite is true for high graph frequencies. The association with training intensity is statistically significant (average Pearson correlation coefficient r = .8164, average p = .38). Following the interpretations based on the artificial network analysis, this evolution in the graph spectral properties of brain networks is mainly caused by the decrease in the values of connections within the visual and motor modules and between the visual and motor modules. An interesting observation is that the variation of eigenvectors associated with high frequencies declines with respect to the visual module much faster than with respect to the motor module, even though the visual module is more strongly connected throughout training compared to the motor module. A deeper analysis using artificial networks shows that this results from the following three factors: (i) though more strongly connected compared to the motor module, connections within the visual module weaken very quickly, (ii) the motor module is more closely connected with the other modules than the visual module is, and (iii) association levels within the other modules stay relatively constant. Therefore, as participants become more exposed to the tasks, compared to the visual module, the motor module becomes more strongly connected with modules that are themselves more associated.

The graph spectral properties of actual brain networks and their evolution can be closely imitated using artificial networks, as plotted in Figure 5 (c). We also used artificial networks with edge weights randomly generated from Gaussian distributions; the graph spectral properties of actual brain networks cannot be resembled as well, even with more parameters to tune in the construction. The analysis for the three day training dataset is highly similar and for this reason we do not present and analyze it separately here.

V. FREQUENCY DECOMPOSITION OF BRAIN SIGNALS

The previous sections focus on the study of brain networks and their graph spectral properties. In this section, we investigate brain signals from a graph signal processing perspective, and analyze the brain signals by examining the decomposed graph signals $x_L$, $x_M$, and $x_H$ with respect to the underlying brain networks. We note that all the decomposed signals are signals on the same underlying network and therefore can be visualized on the brain.
To facilitate interpretation, we compute the absolute magnitude of the decomposed signal $x_L$ for each brain region, averaged across all sample signals for each individual during a scan session and then averaged across all participants. A similar aggregation is applied for $x_M$ and $x_H$. Figure 6 presents the distribution of the decomposed signals corresponding to different levels of spatial variation on the brain for the first scan session (top row) and the last scan session (bottom row) in the 6 week experiment. Figure 7 exhibits how the decomposed signals are distributed across brain regions in the 3 day experiment. Brain regions with absolute magnitudes lower than a fixed threshold are not colored.

We note the difference between Figure 4 and Figures 6 and 7. In Figure 4, colors on brain regions denote magnitudes of the associated eigenvectors; there we purely examine properties of the brain networks. In Figures 6 and 7, colors on brain regions represent magnitudes of the respective decomposed signals; here our interest is in signals defined on the brain network. The two are related since the networks are constructed using the correlation or coherence of brain activities; however, they are conceptually different.

A deeper analysis yields many interesting aspects of graph frequency decomposition. First, for $x_L$, the magnitudes on adjacent brain regions tend to possess highly similar values, resulting in a more evenly spread brain signal distribution, whereas for $x_H$, neighboring signals can exhibit highly dissimilar values; this corroborates the motivation to use graph frequency decomposition to segment brain signals into pieces corresponding to different levels of spatial fluctuation. Second, decomposed signals of a specific level of variation, e.g., $x_L$, are highly similar across different scan sessions in an experiment as well as across the two experiments with different sets of participants. This reflects the fact that the frequency decomposition is formed by applying graph filters with different pass bands to the signals and therefore should express some consistent aspects of brain signals. Third, recall that we normalize the brain signals at every sample point for all subjects, and for this reason the signals $x_L$, $x_M$ and $x_H$ would be similarly distributed across the brain if nothing interesting happened in the decomposition. However, in both Figures 6 and 7, it is observed that many brain regions possess magnitudes higher than a threshold in $x_L$ and $x_H$, while not many brain regions pass the thresholding with respect to $x_M$. It has long been understood that the brain is a complex system combining some degree of disorganized behavior with some degree of regularity, and that the complexity of a system is high when order and disorder coexist [36], [37]. $x_L$ varies smoothly across the brain network and therefore can be regarded as regularity (order), whereas $x_H$ fluctuates vibrantly and consequently can be considered as randomness (disorder). This evokes the anticipation that graph frequency decomposition segments a brain signal $x$ into pieces $x_L$ and $x_H$, which reflect order and disorder and are therefore more interesting, as well as the remaining piece $x_M$. In the next two sections, we follow this anticipation and demonstrate that $x_L$ and $x_H$ display more adaptability and a more obvious association with learning rate across training.
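For reference, the per-region summaries shown in Figures 6 and 7 amount to averaging the absolute magnitudes of each decomposed signal over time samples and then over participants. The sketch below assumes a single array `signals` of shape (participants, samples, n) of normalized BOLD signals and, for simplicity, a single eigenvector matrix V together with the `frequency_decompose` helper sketched in Section II-B; in the paper each participant has their own network, so V would be participant-specific.

```python
import numpy as np

def mean_abs_magnitude(signals, V, K_L, K_M):
    """Average |x_L|, |x_M|, |x_H| per region over samples and then over participants."""
    acc = {"low": [], "mid": [], "high": []}
    for subject in signals:                       # each subject: shape (samples, n)
        per_subject = {k: [] for k in acc}
        for x in subject:
            x_L, x_M, x_H = frequency_decompose(V, x, K_L, K_M)
            per_subject["low"].append(np.abs(x_L))
            per_subject["mid"].append(np.abs(x_M))
            per_subject["high"].append(np.abs(x_H))
        for k in acc:
            acc[k].append(np.mean(per_subject[k], axis=0))   # average over samples
    return {k: np.mean(acc[k], axis=0) for k in acc}         # average over participants
```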
A. More Active Adaptation of Low and High Graph Frequency Components

We start by analyzing the adaptation of the decomposed signals with respect to different levels of spatial variation. To that end, we evaluate the variance of the decomposed signals over multiple temporal scales, over days and over minutes, for the two experiments [38], [39]. We describe the method specifically for $x_L$ for simplicity; similar computations were conducted for $x_M$ and $x_H$.

Fig. 6. Distribution of decomposed signals for the 6 week experiment. (a) Absolute magnitudes for all brain regions with respect to $x_L$, the brain signals varying smoothly across the network, averaged across all sample points for each individual and across all participants at the first scan session of the 6 week dataset. (b) With respect to $x_M$ and (c) with respect to $x_H$, the signals fluctuating vibrantly across the brain. (d), (e), and (f) are the averaged $x_L$, $x_M$ and $x_H$ at the last scan session of the 6 week dataset, respectively. Only regions with absolute magnitudes higher than a fixed threshold are colored.

Fig. 7. Distribution of decomposed signals for the 3 day experiment. (a), (b), and (c) are $x_L$, $x_M$ and $x_H$ averaged across all sample points for each subject and across participants in the 3 day experiment, respectively. Regions with absolute value less than a threshold are not colored.

At the macro (large) timescale, we average the decomposed signals $x_L$ over all sample points within each scanning session with a given sequence type, and evaluate the variance of the magnitudes of the averaged signals across all scanning sessions and sequence types [40], [41]. For the 6 week experiment, there are 4 scanning sessions and 3 different sequence types, so the variance is with respect to 12 points. For the 3 day experiment, there are 3 scanning sessions and only 1 sequence type, and therefore the variance is over 3 points. As for the micro (minute) scale, we average the decomposed signals $x_L$ over all sample points within each minute, and evaluate the variance of the magnitudes of the averaged signals across all minute windows for each scanning session with a given sequence type. The evaluated variance is then averaged across all participants of the experiment of interest. Figure 8 displays the variance of the decomposed signals $x_L$, $x_M$ and $x_H$ at the two different temporal scales for the two experiments. For the 6 week dataset, 3 session-sequence combinations, numbered proportionally to the level of exposure of participants to the sequence (1-MIN refers to the MIN sequence at session 1, 5 denotes the MIN sequence at session 4, and 9 entails the EXT sequence at session 3), are selected out of the 12 combinations in total for a cleaner illustration, but all the other session-sequence combinations exhibit similar properties. Following the definition of frequency decomposition as in (14), it is expected for the low


More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Collaborative Filtering via Graph Signal Processing

Collaborative Filtering via Graph Signal Processing Collaborative Filtering via Graph Signal Processing Weiyu Huang, Antonio G. Marques, and Alejandro Ribeiro Abstract This paper develops new designs for recommender systems inspired by recent advances in

More information

Introduction to Machine Learning

Introduction to Machine Learning 10-701 Introduction to Machine Learning PCA Slides based on 18-661 Fall 2018 PCA Raw data can be Complex, High-dimensional To understand a phenomenon we measure various related quantities If we knew what

More information

Spectral Perturbation of Small-World Networks with Application to Brain Disease Detection

Spectral Perturbation of Small-World Networks with Application to Brain Disease Detection Spectral Perturbation of Small-World Networks with Application to Brain Disease Detection Chenhui Hu May 4, 22 Introduction Many real life systems can be described by complex networks, which usually consist

More information

Spectral Clustering on Handwritten Digits Database

Spectral Clustering on Handwritten Digits Database University of Maryland-College Park Advance Scientific Computing I,II Spectral Clustering on Handwritten Digits Database Author: Danielle Middlebrooks Dmiddle1@math.umd.edu Second year AMSC Student Advisor:

More information

Sound Recognition in Mixtures

Sound Recognition in Mixtures Sound Recognition in Mixtures Juhan Nam, Gautham J. Mysore 2, and Paris Smaragdis 2,3 Center for Computer Research in Music and Acoustics, Stanford University, 2 Advanced Technology Labs, Adobe Systems

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning Christoph Lampert Spring Semester 2015/2016 // Lecture 12 1 / 36 Unsupervised Learning Dimensionality Reduction 2 / 36 Dimensionality Reduction Given: data X = {x 1,..., x

More information

Wavelet Transform And Principal Component Analysis Based Feature Extraction

Wavelet Transform And Principal Component Analysis Based Feature Extraction Wavelet Transform And Principal Component Analysis Based Feature Extraction Keyun Tong June 3, 2010 As the amount of information grows rapidly and widely, feature extraction become an indispensable technique

More information

Stationary Graph Processes: Nonparametric Spectral Estimation

Stationary Graph Processes: Nonparametric Spectral Estimation Stationary Graph Processes: Nonparametric Spectral Estimation Santiago Segarra, Antonio G. Marques, Geert Leus, and Alejandro Ribeiro Dept. of Signal Theory and Communications King Juan Carlos University

More information

Definition A finite Markov chain is a memoryless homogeneous discrete stochastic process with a finite number of states.

Definition A finite Markov chain is a memoryless homogeneous discrete stochastic process with a finite number of states. Chapter 8 Finite Markov Chains A discrete system is characterized by a set V of states and transitions between the states. V is referred to as the state space. We think of the transitions as occurring

More information

Discrete Signal Processing on Graphs: Sampling Theory

Discrete Signal Processing on Graphs: Sampling Theory IEEE TRANS. SIGNAL PROCESS. TO APPEAR. 1 Discrete Signal Processing on Graphs: Sampling Theory Siheng Chen, Rohan Varma, Aliaksei Sandryhaila, Jelena Kovačević arxiv:153.543v [cs.it] 8 Aug 15 Abstract

More information

CS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works

CS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works CS68: The Modern Algorithmic Toolbox Lecture #8: How PCA Works Tim Roughgarden & Gregory Valiant April 20, 206 Introduction Last lecture introduced the idea of principal components analysis (PCA). The

More information

GAUSSIAN PROCESS TRANSFORMS

GAUSSIAN PROCESS TRANSFORMS GAUSSIAN PROCESS TRANSFORMS Philip A. Chou Ricardo L. de Queiroz Microsoft Research, Redmond, WA, USA pachou@microsoft.com) Computer Science Department, Universidade de Brasilia, Brasilia, Brazil queiroz@ieee.org)

More information

Principal Components Analysis (PCA)

Principal Components Analysis (PCA) Principal Components Analysis (PCA) Principal Components Analysis (PCA) a technique for finding patterns in data of high dimension Outline:. Eigenvectors and eigenvalues. PCA: a) Getting the data b) Centering

More information

The Generalized Haar-Walsh Transform (GHWT) for Data Analysis on Graphs and Networks

The Generalized Haar-Walsh Transform (GHWT) for Data Analysis on Graphs and Networks The Generalized Haar-Walsh Transform (GHWT) for Data Analysis on Graphs and Networks Jeff Irion & Naoki Saito Department of Mathematics University of California, Davis SIAM Annual Meeting 2014 Chicago,

More information

Properties of Matrices and Operations on Matrices

Properties of Matrices and Operations on Matrices Properties of Matrices and Operations on Matrices A common data structure for statistical analysis is a rectangular array or matris. Rows represent individual observational units, or just observations,

More information

ORIE 4741: Learning with Big Messy Data. Spectral Graph Theory

ORIE 4741: Learning with Big Messy Data. Spectral Graph Theory ORIE 4741: Learning with Big Messy Data Spectral Graph Theory Mika Sumida Operations Research and Information Engineering Cornell September 15, 2017 1 / 32 Outline Graph Theory Spectral Graph Theory Laplacian

More information

Machine Learning (Spring 2012) Principal Component Analysis

Machine Learning (Spring 2012) Principal Component Analysis 1-71 Machine Learning (Spring 1) Principal Component Analysis Yang Xu This note is partly based on Chapter 1.1 in Chris Bishop s book on PRML and the lecture slides on PCA written by Carlos Guestrin in

More information

Functional Connectivity and Network Methods

Functional Connectivity and Network Methods 18/Sep/2013" Functional Connectivity and Network Methods with functional magnetic resonance imaging" Enrico Glerean (MSc), Brain & Mind Lab, BECS, Aalto University" www.glerean.com @eglerean becs.aalto.fi/bml

More information

Chapter 2. Linear Algebra. rather simple and learning them will eventually allow us to explain the strange results of

Chapter 2. Linear Algebra. rather simple and learning them will eventually allow us to explain the strange results of Chapter 2 Linear Algebra In this chapter, we study the formal structure that provides the background for quantum mechanics. The basic ideas of the mathematical machinery, linear algebra, are rather simple

More information

Spectral Generative Models for Graphs

Spectral Generative Models for Graphs Spectral Generative Models for Graphs David White and Richard C. Wilson Department of Computer Science University of York Heslington, York, UK wilson@cs.york.ac.uk Abstract Generative models are well known

More information

Walks, Springs, and Resistor Networks

Walks, Springs, and Resistor Networks Spectral Graph Theory Lecture 12 Walks, Springs, and Resistor Networks Daniel A. Spielman October 8, 2018 12.1 Overview In this lecture we will see how the analysis of random walks, spring networks, and

More information

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin 1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)

More information

Analysis of Spectral Kernel Design based Semi-supervised Learning

Analysis of Spectral Kernel Design based Semi-supervised Learning Analysis of Spectral Kernel Design based Semi-supervised Learning Tong Zhang IBM T. J. Watson Research Center Yorktown Heights, NY 10598 Rie Kubota Ando IBM T. J. Watson Research Center Yorktown Heights,

More information

The weighted spectral distribution; A graph metric with applications. Dr. Damien Fay. SRG group, Computer Lab, University of Cambridge.

The weighted spectral distribution; A graph metric with applications. Dr. Damien Fay. SRG group, Computer Lab, University of Cambridge. The weighted spectral distribution; A graph metric with applications. Dr. Damien Fay. SRG group, Computer Lab, University of Cambridge. A graph metric: motivation. Graph theory statistical modelling Data

More information

EEG- Signal Processing

EEG- Signal Processing Fatemeh Hadaeghi EEG- Signal Processing Lecture Notes for BSP, Chapter 5 Master Program Data Engineering 1 5 Introduction The complex patterns of neural activity, both in presence and absence of external

More information

15 Singular Value Decomposition

15 Singular Value Decomposition 15 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information

INTERPOLATION OF GRAPH SIGNALS USING SHIFT-INVARIANT GRAPH FILTERS

INTERPOLATION OF GRAPH SIGNALS USING SHIFT-INVARIANT GRAPH FILTERS INTERPOLATION OF GRAPH SIGNALS USING SHIFT-INVARIANT GRAPH FILTERS Santiago Segarra, Antonio G. Marques, Geert Leus, Alejandro Ribeiro University of Pennsylvania, Dept. of Electrical and Systems Eng.,

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms : Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA 2 Department of Computer

More information

Chapter Two Elements of Linear Algebra

Chapter Two Elements of Linear Algebra Chapter Two Elements of Linear Algebra Previously, in chapter one, we have considered single first order differential equations involving a single unknown function. In the next chapter we will begin to

More information

This section is an introduction to the basic themes of the course.

This section is an introduction to the basic themes of the course. Chapter 1 Matrices and Graphs 1.1 The Adjacency Matrix This section is an introduction to the basic themes of the course. Definition 1.1.1. A simple undirected graph G = (V, E) consists of a non-empty

More information

MACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA

MACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 1 MACHINE LEARNING Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 2 Practicals Next Week Next Week, Practical Session on Computer Takes Place in Room GR

More information

Relational Nonlinear FIR Filters. Ronald K. Pearson

Relational Nonlinear FIR Filters. Ronald K. Pearson Relational Nonlinear FIR Filters Ronald K. Pearson Daniel Baugh Institute for Functional Genomics and Computational Biology Thomas Jefferson University Philadelphia, PA Moncef Gabbouj Institute of Signal

More information

A Statistical Look at Spectral Graph Analysis. Deep Mukhopadhyay

A Statistical Look at Spectral Graph Analysis. Deep Mukhopadhyay A Statistical Look at Spectral Graph Analysis Deep Mukhopadhyay Department of Statistics, Temple University Office: Speakman 335 deep@temple.edu http://sites.temple.edu/deepstat/ Graph Signal Processing

More information

Superiorized Inversion of the Radon Transform

Superiorized Inversion of the Radon Transform Superiorized Inversion of the Radon Transform Gabor T. Herman Graduate Center, City University of New York March 28, 2017 The Radon Transform in 2D For a function f of two real variables, a real number

More information

EL-GY 6813/BE-GY 6203 Medical Imaging, Fall 2016 Final Exam

EL-GY 6813/BE-GY 6203 Medical Imaging, Fall 2016 Final Exam EL-GY 6813/BE-GY 6203 Medical Imaging, Fall 2016 Final Exam (closed book, 1 sheets of notes double sided allowed, no calculator or other electronic devices allowed) 1. Ultrasound Physics (15 pt) A) (9

More information

Recipes for the Linear Analysis of EEG and applications

Recipes for the Linear Analysis of EEG and applications Recipes for the Linear Analysis of EEG and applications Paul Sajda Department of Biomedical Engineering Columbia University Can we read the brain non-invasively and in real-time? decoder 1001110 if YES

More information

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2015 Soleymani Dimensionality Reduction: Feature Selection vs. Feature Extraction Feature selection Select a subset of a given

More information

Lecture Notes 5: Multiresolution Analysis

Lecture Notes 5: Multiresolution Analysis Optimization-based data analysis Fall 2017 Lecture Notes 5: Multiresolution Analysis 1 Frames A frame is a generalization of an orthonormal basis. The inner products between the vectors in a frame and

More information

Spectral Clustering. Zitao Liu

Spectral Clustering. Zitao Liu Spectral Clustering Zitao Liu Agenda Brief Clustering Review Similarity Graph Graph Laplacian Spectral Clustering Algorithm Graph Cut Point of View Random Walk Point of View Perturbation Theory Point of

More information

Dimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas

Dimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas Dimensionality Reduction: PCA Nicholas Ruozzi University of Texas at Dallas Eigenvalues λ is an eigenvalue of a matrix A R n n if the linear system Ax = λx has at least one non-zero solution If Ax = λx

More information

Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x =

Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x = Linear Algebra Review Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1 x x = 2. x n Vectors of up to three dimensions are easy to diagram.

More information

Spectral Methods for Subgraph Detection

Spectral Methods for Subgraph Detection Spectral Methods for Subgraph Detection Nadya T. Bliss & Benjamin A. Miller Embedded and High Performance Computing Patrick J. Wolfe Statistics and Information Laboratory Harvard University 12 July 2010

More information

The Principal Component Analysis

The Principal Component Analysis The Principal Component Analysis Philippe B. Laval KSU Fall 2017 Philippe B. Laval (KSU) PCA Fall 2017 1 / 27 Introduction Every 80 minutes, the two Landsat satellites go around the world, recording images

More information

Robustness of Principal Components

Robustness of Principal Components PCA for Clustering An objective of principal components analysis is to identify linear combinations of the original variables that are useful in accounting for the variation in those original variables.

More information

Statistical Analysis Aspects of Resting State Functional Connectivity

Statistical Analysis Aspects of Resting State Functional Connectivity Statistical Analysis Aspects of Resting State Functional Connectivity Biswal s result (1995) Correlations between RS Fluctuations of left and right motor areas Why studying resting state? Human Brain =

More information

Data Mining. Dimensionality reduction. Hamid Beigy. Sharif University of Technology. Fall 1395

Data Mining. Dimensionality reduction. Hamid Beigy. Sharif University of Technology. Fall 1395 Data Mining Dimensionality reduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 42 Outline 1 Introduction 2 Feature selection

More information

Dominant Feature Vectors Based Audio Similarity Measure

Dominant Feature Vectors Based Audio Similarity Measure Dominant Feature Vectors Based Audio Similarity Measure Jing Gu 1, Lie Lu 2, Rui Cai 3, Hong-Jiang Zhang 2, and Jian Yang 1 1 Dept. of Electronic Engineering, Tsinghua Univ., Beijing, 100084, China 2 Microsoft

More information

Deep Learning Book Notes Chapter 2: Linear Algebra

Deep Learning Book Notes Chapter 2: Linear Algebra Deep Learning Book Notes Chapter 2: Linear Algebra Compiled By: Abhinaba Bala, Dakshit Agrawal, Mohit Jain Section 2.1: Scalars, Vectors, Matrices and Tensors Scalar Single Number Lowercase names in italic

More information

Statistical Learning Notes III-Section 2.4.3

Statistical Learning Notes III-Section 2.4.3 Statistical Learning Notes III-Section 2.4.3 "Graphical" Spectral Features Stephen Vardeman Analytics Iowa LLC January 2019 Stephen Vardeman (Analytics Iowa LLC) Statistical Learning Notes III-Section

More information

Learning Graphs from Data: A Signal Representation Perspective

Learning Graphs from Data: A Signal Representation Perspective 1 Learning Graphs from Data: A Signal Representation Perspective Xiaowen Dong*, Dorina Thanou*, Michael Rabbat, and Pascal Frossard arxiv:1806.00848v1 [cs.lg] 3 Jun 2018 The construction of a meaningful

More information

Neuroscience Introduction

Neuroscience Introduction Neuroscience Introduction The brain As humans, we can identify galaxies light years away, we can study particles smaller than an atom. But we still haven t unlocked the mystery of the three pounds of matter

More information

The homogeneous Poisson process

The homogeneous Poisson process The homogeneous Poisson process during very short time interval Δt there is a fixed probability of an event (spike) occurring independent of what happened previously if r is the rate of the Poisson process,

More information

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti

More information

A Graph Signal Processing Perspective on Functional Brain Imaging

A Graph Signal Processing Perspective on Functional Brain Imaging A Graph Signal Processing Perspective on Functional Brain Imaging This article addresses how the signal processing view on brain graphs can provide additional insights into brain network analysis. By Weiyu

More information

Spectral Processing. Misha Kazhdan

Spectral Processing. Misha Kazhdan Spectral Processing Misha Kazhdan [Taubin, 1995] A Signal Processing Approach to Fair Surface Design [Desbrun, et al., 1999] Implicit Fairing of Arbitrary Meshes [Vallet and Levy, 2008] Spectral Geometry

More information

Markov Chains and Spectral Clustering

Markov Chains and Spectral Clustering Markov Chains and Spectral Clustering Ning Liu 1,2 and William J. Stewart 1,3 1 Department of Computer Science North Carolina State University, Raleigh, NC 27695-8206, USA. 2 nliu@ncsu.edu, 3 billy@ncsu.edu

More information

MULTICHANNEL SIGNAL PROCESSING USING SPATIAL RANK COVARIANCE MATRICES

MULTICHANNEL SIGNAL PROCESSING USING SPATIAL RANK COVARIANCE MATRICES MULTICHANNEL SIGNAL PROCESSING USING SPATIAL RANK COVARIANCE MATRICES S. Visuri 1 H. Oja V. Koivunen 1 1 Signal Processing Lab. Dept. of Statistics Tampere Univ. of Technology University of Jyväskylä P.O.

More information

Lecture 13. Principal Component Analysis. Brett Bernstein. April 25, CDS at NYU. Brett Bernstein (CDS at NYU) Lecture 13 April 25, / 26

Lecture 13. Principal Component Analysis. Brett Bernstein. April 25, CDS at NYU. Brett Bernstein (CDS at NYU) Lecture 13 April 25, / 26 Principal Component Analysis Brett Bernstein CDS at NYU April 25, 2017 Brett Bernstein (CDS at NYU) Lecture 13 April 25, 2017 1 / 26 Initial Question Intro Question Question Let S R n n be symmetric. 1

More information

Designing Information Devices and Systems II Fall 2018 Elad Alon and Miki Lustig Homework 9

Designing Information Devices and Systems II Fall 2018 Elad Alon and Miki Lustig Homework 9 EECS 16B Designing Information Devices and Systems II Fall 18 Elad Alon and Miki Lustig Homework 9 This homework is due Wednesday, October 31, 18, at 11:59pm. Self grades are due Monday, November 5, 18,

More information

Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices

Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Vahid Dehdari and Clayton V. Deutsch Geostatistical modeling involves many variables and many locations.

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular

More information

SAMPLE OF THE STUDY MATERIAL PART OF CHAPTER 1 Introduction to Linear Algebra

SAMPLE OF THE STUDY MATERIAL PART OF CHAPTER 1 Introduction to Linear Algebra SAMPLE OF THE STUDY MATERIAL PART OF CHAPTER 1 Introduction to 1.1. Introduction Linear algebra is a specific branch of mathematics dealing with the study of vectors, vector spaces with functions that

More information

Effective Resistance and Schur Complements

Effective Resistance and Schur Complements Spectral Graph Theory Lecture 8 Effective Resistance and Schur Complements Daniel A Spielman September 28, 2015 Disclaimer These notes are not necessarily an accurate representation of what happened in

More information

A Riemannian Framework for Denoising Diffusion Tensor Images

A Riemannian Framework for Denoising Diffusion Tensor Images A Riemannian Framework for Denoising Diffusion Tensor Images Manasi Datar No Institute Given Abstract. Diffusion Tensor Imaging (DTI) is a relatively new imaging modality that has been extensively used

More information

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis. Vector spaces DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Vector space Consists of: A set V A scalar

More information

A Random Dot Product Model for Weighted Networks arxiv: v1 [stat.ap] 8 Nov 2016

A Random Dot Product Model for Weighted Networks arxiv: v1 [stat.ap] 8 Nov 2016 A Random Dot Product Model for Weighted Networks arxiv:1611.02530v1 [stat.ap] 8 Nov 2016 Daryl R. DeFord 1 Daniel N. Rockmore 1,2,3 1 Department of Mathematics, Dartmouth College, Hanover, NH, USA 03755

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Laurenz Wiskott Institute for Theoretical Biology Humboldt-University Berlin Invalidenstraße 43 D-10115 Berlin, Germany 11 March 2004 1 Intuition Problem Statement Experimental

More information

Designing Information Devices and Systems II Fall 2015 Note 5

Designing Information Devices and Systems II Fall 2015 Note 5 EE 16B Designing Information Devices and Systems II Fall 01 Note Lecture given by Babak Ayazifar (9/10) Notes by: Ankit Mathur Spectral Leakage Example Compute the length-8 and length-6 DFT for the following

More information

4 Bias-Variance for Ridge Regression (24 points)

4 Bias-Variance for Ridge Regression (24 points) Implement Ridge Regression with λ = 0.00001. Plot the Squared Euclidean test error for the following values of k (the dimensions you reduce to): k = {0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500,

More information

Signal and Information Processing. Alejandro Ribeiro

Signal and Information Processing. Alejandro Ribeiro Signal and Information Processing Alejandro Ribeiro April 26, 16 Contents 1 Principal Component Analysis 3 11 The DFT and idft as Hermitian Matrices 3 111 Properties of the DFT Matrix 4 112 The Product

More information

Pre-sessional Mathematics for Big Data MSc Class 2: Linear Algebra

Pre-sessional Mathematics for Big Data MSc Class 2: Linear Algebra Pre-sessional Mathematics for Big Data MSc Class 2: Linear Algebra Yuri Kalnishkan September 22, 2018 Linear algebra is vitally important for applied mathematics We will approach linear algebra from a

More information

Statistical Analysis of fmrl Data

Statistical Analysis of fmrl Data Statistical Analysis of fmrl Data F. Gregory Ashby The MIT Press Cambridge, Massachusetts London, England Preface xi Acronyms xv 1 Introduction 1 What Is fmri? 2 The Scanning Session 4 Experimental Design

More information

4 Introduction to modeling longitudinal data

4 Introduction to modeling longitudinal data 4 Introduction to modeling longitudinal data We are now in a position to introduce a basic statistical model for longitudinal data. The models and methods we discuss in subsequent chapters may be viewed

More information

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis Massimiliano Pontil 1 Today s plan SVD and principal component analysis (PCA) Connection

More information

Spectral Graph Theory Lecture 2. The Laplacian. Daniel A. Spielman September 4, x T M x. ψ i = arg min

Spectral Graph Theory Lecture 2. The Laplacian. Daniel A. Spielman September 4, x T M x. ψ i = arg min Spectral Graph Theory Lecture 2 The Laplacian Daniel A. Spielman September 4, 2015 Disclaimer These notes are not necessarily an accurate representation of what happened in class. The notes written before

More information

A Dimensionality Reduction Framework for Detection of Multiscale Structure in Heterogeneous Networks

A Dimensionality Reduction Framework for Detection of Multiscale Structure in Heterogeneous Networks Shen HW, Cheng XQ, Wang YZ et al. A dimensionality reduction framework for detection of multiscale structure in heterogeneous networks. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 27(2): 341 357 Mar. 2012.

More information

B553 Lecture 5: Matrix Algebra Review

B553 Lecture 5: Matrix Algebra Review B553 Lecture 5: Matrix Algebra Review Kris Hauser January 19, 2012 We have seen in prior lectures how vectors represent points in R n and gradients of functions. Matrices represent linear transformations

More information

Stochastic Structural Dynamics Prof. Dr. C. S. Manohar Department of Civil Engineering Indian Institute of Science, Bangalore

Stochastic Structural Dynamics Prof. Dr. C. S. Manohar Department of Civil Engineering Indian Institute of Science, Bangalore Stochastic Structural Dynamics Prof. Dr. C. S. Manohar Department of Civil Engineering Indian Institute of Science, Bangalore Lecture No. # 33 Probabilistic methods in earthquake engineering-2 So, we have

More information

20 Unsupervised Learning and Principal Components Analysis (PCA)

20 Unsupervised Learning and Principal Components Analysis (PCA) 116 Jonathan Richard Shewchuk 20 Unsupervised Learning and Principal Components Analysis (PCA) UNSUPERVISED LEARNING We have sample points, but no labels! No classes, no y-values, nothing to predict. Goal:

More information

1 Adjacency matrix and eigenvalues

1 Adjacency matrix and eigenvalues CSC 5170: Theory of Computational Complexity Lecture 7 The Chinese University of Hong Kong 1 March 2010 Our objective of study today is the random walk algorithm for deciding if two vertices in an undirected

More information

A Tutorial on Data Reduction. Principal Component Analysis Theoretical Discussion. By Shireen Elhabian and Aly Farag

A Tutorial on Data Reduction. Principal Component Analysis Theoretical Discussion. By Shireen Elhabian and Aly Farag A Tutorial on Data Reduction Principal Component Analysis Theoretical Discussion By Shireen Elhabian and Aly Farag University of Louisville, CVIP Lab November 2008 PCA PCA is A backbone of modern data

More information

Outlines: (June 11, 1996) Instructor:

Outlines: (June 11, 1996) Instructor: Magnetic Resonance Imaging (June 11, 1996) Instructor: Tai-huang Huang Institute of Biomedical Sciences Academia Sinica Tel. (02) 2652-3036; Fax. (02) 2788-7641 E. mail: bmthh@ibms.sinica.edu.tw Reference:

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

Principal Components Analysis

Principal Components Analysis Principal Components Analysis Santiago Paternain, Aryan Mokhtari and Alejandro Ribeiro March 29, 2018 At this point we have already seen how the Discrete Fourier Transform and the Discrete Cosine Transform

More information

Machine Learning - MT & 14. PCA and MDS

Machine Learning - MT & 14. PCA and MDS Machine Learning - MT 2016 13 & 14. PCA and MDS Varun Kanade University of Oxford November 21 & 23, 2016 Announcements Sheet 4 due this Friday by noon Practical 3 this week (continue next week if necessary)

More information