SIGNAL PROCESSING INSPIRED GRAPH THEORETIC METHODS FOR UNDERSTANDING FUNCTIONAL CONNECTIVITY OF THE BRAIN. Marcos Efren Bolaños


SIGNAL PROCESSING INSPIRED GRAPH THEORETIC METHODS FOR UNDERSTANDING FUNCTIONAL CONNECTIVITY OF THE BRAIN

By

Marcos Efren Bolaños

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Electrical Engineering

2012

ABSTRACT

SIGNAL PROCESSING INSPIRED GRAPH THEORETIC METHODS FOR UNDERSTANDING FUNCTIONAL CONNECTIVITY OF THE BRAIN

By Marcos Efren Bolaños

Functional brain networks underlying cognitive control processes have been of central interest in neuroscience. A great deal of empirical and theoretical work now suggests that frontal networks, in particular the medial prefrontal cortex (mpfc) and lateral prefrontal cortex (lpfc), are involved in cognitive control. The most common way to study functional brain networks has been through measures of connectivity such as coherence, synchrony, and mutual information. However, it has been noted that functional connectivity measures are limited to quantifying pairwise relationships between brain regions and do not describe the overall organization of the brain network. Recently, researchers have adapted tools from graph theory to address this issue. Graph theory can model a network by a set of vertices and edges upon which complex network analysis may be applied. With respect to the functional brain network, the vertices represent the individual neural assemblies and the edges are weighted by their pair-wise phase synchrony. Most graph theoretic measures, however, are limited to sparsely connected unweighted graphs. Therefore, some of the existing graph measures cannot be applied directly to fully connected weighted graphs. In this thesis, existing graph measures and graph theoretic approaches are modified specifically for the analysis of the functional brain network. First, new weighted clustering coefficient and path length measures are introduced for quantifying the local weighted small-world index of the brain. These measures are based on modeling the edge weights as probabilities which represent the reliability of information flowing across these edges. These measures differ from conventional measures by considering all possible connections with varying strengths of connectivity and do not

require arbitrary thresholding of the weighted connectivity matrix, i.e., they can be applied directly to a fully connected weighted graph. Next, concepts from signal processing are adapted to graphs to identify central vertices and anomalies within a network. These include new energy and entropy measures for graphs. The proposed graph energy measure outperforms existing definitions of graph energy for local anomaly detection because it is computed from the most relevant spectral content extracted from the graph's Laplacian matrix. A new definition of entropy rate, based on modeling the adjacency matrix of a graph as a Markov process, is introduced to quantify the local complexity of a weighted graph. Finally, we introduce a hierarchical consensus clustering algorithm that uses the well-known Fiedler vector to reveal a hierarchical structure of the brain network across various modular resolutions. The proposed methods are applied to error-related negativity (ERN) data; the ERN is a response-locked negative deflection of the brain event-related potential observed following errors in performance tasks. Previous research shows that the primary neural generator of the ERN is the anterior cingulate cortex (ACC) and that there are significant differences in connectivity patterns between mpfc and lpfc for error and correct responses. The proposed graph theoretic approaches give a succinct representation of the functional networks involved during action-monitoring and cognitive control and provide insight into the reorganization of the neural networks during error processing. The small-world measures reveal increased local functional segregation and integration among electrodes in the mpfc and lpfc during error responses compared to correct responses. The mpfc region of the brain network also demonstrated increased energy and complexity, indicating the presence of an anomalous perturbation located around the FCz electrode. Finally, the hierarchical consensus clustering algorithm revealed an increase in modularity across the mpfc during error responses, indicating a reorganization of the underlying functional network.

ACKNOWLEDGMENT

I want to thank my family back in El Paso, Texas for their moral support while I pursued my Ph.D. so far from home. I missed them so much during my time here and I hope to continue making them proud. My brothers, Andres and Efren, and my parents, Ofelia and Arturo, have kept me grounded and helped me remember what the important things in life really are. I love you all so very much and I owe all my success to you. I especially want to thank Whitney Rantz for being my rock as I progressed through this challenging journey. You are a remarkable woman with such a great heart and I thank God for bringing you into my life. Thank you for all your encouragement and love. Thank you to Mark, Sue, and Kayla Rantz for making me feel like a part of their family these past few years and helping me feel at home. Also, thank you so much for purchasing my Ph.D. regalia. It is one of the most generous gifts anybody has ever given me and I will always be very grateful. Many thanks to Milton Martinez and Juan Nava for being awesome friends. It is really a blessing that the three of us El Paso guys ended up in the same state. You guys will always be brothers to me. I would also like to express my deep-felt gratitude to my advisor, Dr. Selin Aviyente, for her advice and encouragement. She never ceased believing in me and constantly pushed me to bring out the best in my work. Also, many thanks to my lab partners Ali Mutlu, Ying Liu, and Suhaily Cardona. I'm glad we were able to share the frustrations and celebrations of research work together. I also wish to thank the other members of my committee, Dr. Rama, Dr. Chakrabartty, and Dr. Esfahanian. Their suggestions, comments, and additional guidance were invaluable to the completion of this work. As a special note, Dr. Percy Pierre was a driving force in my success through his help in obtaining a graduate fellowship to pay for my studies, representing me in front

of the qualifier exam committee, and offering a word of advice when I needed it. I also want to extend my thanks to Dr. Barbara O'Kelly for personally calling me and inviting me to visit Michigan State, as well as for making my transition into the Ph.D. program a smooth experience.

TABLE OF CONTENTS

List of Tables
List of Figures

Chapter 1 Introduction
    Background
        Physiology of the Human Brain
        The Electroencephalogram (EEG)
        Measures of Functional Connectivity
        Graph Theory
        Graph Topologies
            Lattice Graphs
            Random Graphs
            Star Graphs
            Scale-Free Networks
        Benchmark Data
            Simulated Data
            Real-World Networks
            Biological Data
    Thesis Outline

Chapter 2 Small-World Type Measures for Weighted Graphs
    Introduction
    Clustering Coefficient
        Conventional Clustering Coefficient
        Existing Definitions for Weighted Clustering Coefficient
        Newly Proposed Weighted Clustering Coefficient
        A Random Walk Based Clustering Coefficient Measure for Binary Networks
    Characteristic Path Length
        Conventional Characteristic Path Length
        Existing Definition for Weighted Characteristic Path Length
        Newly Proposed Weighted Characteristic Path Length
        A Random Walk Based Characteristic Path Length Measure for Binary Networks
    Small-World Measure of a Network
        Conventional Small-World Measure
        Newly Proposed Weighted Small-World
    Mapping of Weighted Graphs to Binary Graphs
        Existing Approaches for Transforming Weighted Graphs to Binary Graphs
        Weighted to Binary Graph Transformation: Alternative to Matrix Thresholding
    Results
        Random Walk Based Clustering Coefficient/Path Length Measures for Binary Benchmark Networks
        Weighted Graph Measures on Benchmark Data
        Weighted Graph Measures on ERN EEG Data
            Bivariate Phase-Synchrony Measures
            Graph Measures
            Relationship between Graph and Bivariate Measures
        Optimal Transformation From Weighted to Binary Matrix: ERN/EEG Data
    Conclusions

Chapter 3 Local Feature Extraction from Weighted Graphs
    Introduction
    Existing Feature Measures for Graphs
        Hub Centrality
        Provincial and Connector Hubs
    Graph Energy
        Existing Measures of Graph Energy
        Laplacian Hückel Graph Energy
        Localized Laplacian Hückel Graph Energy
    Graph Entropy
        Existing Measures of Graph Entropy
        Proposed Graph Entropy Rate
            Matrix Bandwidth Minimization
            Scanning Function of the Permuted Matrix
            Quantization of the Weighted Connectivity Matrix
        Localized Graph Entropy Rate
    Results
        Comparison Between Energy Measures
        Laplacian-Hückel Energy of a Biological Network: EEG ERN data
        Comparison Between Local Graph Entropy Measures
        Entropy Rate Across Binary Graph Topologies
        Entropy with Respect to Permutation and Scanning Function
        Graph Entropy with Respect to Order of the Markov Process
        Entropy Rate of a Biological Network: EEG ERN data
    Conclusions

Chapter 4 Hierarchical Multi-Subject Community Detection for Weighted Neuronal Networks
    Introduction
    Background on Community Detection on Graphs
        Community Detection on Graphs
            Girvan-Newman
            Spectral Clustering
            Consensus Clustering
        Quality Measures for Community Structures
    Hierarchical Modular Detection Algorithms for Weighted Neuronal Networks
        Overlapping Community Structures in Weighted Networks
        Hierarchical Spectral Partitioning Algorithm for Identifying a Common Community Structure among Multiple Subjects
    New Weighted Graph Clustering Quality Measures
        Comparing Graph Partitions to Multiple Random Networks
        Homogeneity and Completeness
    Results
        Overlapping Clusters in the Brain Network During the Error-Related Negativity
        Hierarchical Spectral Consensus Clustering Analysis
        Comparisons with Averaging and Conventional Voting Consensus Measures
        Hierarchical Modular Structure of the Brain During the Error-Related Negativity
        Inter and Intra Edge Phase Synchrony
        Local Hubs of the Brain Network
    Conclusions

Chapter 5 Summary and Future Work
    Summary
    Future Work

Bibliography

LIST OF TABLES

Table 1.1 Basic graph measures for undirected binary and weighted graphs

Table 2.1 Comparisons between the proposed and existing weighted clustering coefficients for different variations of a triplet

Table 2.2 Correlations between multivariate small-world and bivariate (FCz-F5/FCz-F6) measures for both error and correct responses individually and the error-correct differences (all correlations significant; p < 0.001)

Table 3.1 Highest energy contributing nodes in the karate network compared with their corresponding clustering coefficient and betweenness centrality

Table 3.2 Entropy computed for various graph models using the proposed measure. Fifty realizations of each model were generated and an average entropy is reported. Each value is normalized by that of a random network of equivalent order

Table 3.3 Graph entropy computed for various graph models using Burda's definition. Fifty realizations of each model were generated and an average entropy is reported. Each value is normalized by that of a random network of equivalent order

Table 3.4 Entropy rates for three real scale-free networks using the proposed measure and Burda's measures. All measures are normalized by the entropy rate of random networks with equivalent order

Table 3.5 Bits/symbol achieved with the Lempel-Ziv algorithm

Table 3.6 Bits/symbol achieved using the Lempel-Ziv algorithm for four real scale-free networks

Table 4.1 Top 3 electrodes with largest differences between Correct and Error participation scores

Table 4.2 Number of modules identified through optimization of Q and λ in simulated data

LIST OF FIGURES

Figure 1.1 Lobes of the cerebral cortex. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation

Figure 1.2 10/20 EEG electrode arrangement

Figure 1.3 Example of an undirected 6-node graph

Figure 1.4 Zachary Karate Club Network

Figure 2.1 Nodes A and B have similar path length; however, connectivity in (b) is less robust than in (a). In (b), connectivity between nodes A and C is more robust than connectivity between nodes A and B, despite A and C having a longer path length than A and B

Figure 2.2 Markov small-world measures averaged over 100 benchmark networks of order N = 50. a) The proposed binary clustering coefficients of the simulated small-world networks are much greater than those of random networks, i.e. γ ≫ 1; b) The proposed binary path lengths of the simulated small-world networks are similar to those of random networks, i.e. λ ≈ 1

Figure 2.3 Markov and conventional clustering coefficients applied to the binary karate network

Figure 2.4 Markov and conventional path lengths applied to the binary karate network

Figure 2.5 Nodal weighted clustering coefficients computed using Onnela's clustering coefficient and the proposed clustering coefficient for Zachary's Karate club network

Figure 2.6 Nodal weighted path lengths computed using Stam's path length and the proposed path length for Zachary's Karate club network

Figure 2.7 Connectivity plot based on the RID-TFPS measure for electrode pairs whose phase synchrony is significantly larger for error responses compared to correct responses (Wilcoxon rank sum, p < 0.001)

Figure 2.8 Topomap of the differences in the nodal weighted clustering coefficient, path length, and small-world parameter for error and correct responses. These differences are computed for both the previously published and proposed measures. Topographic significance plots are provided for each measure

Figure 2.9 Optimal binary graph generated from weighted connectivity matrices for error responses. Hubs of the network are indicated in color. Hubs are identified based on the number of shortest paths in which each node participates. Green indicates the most important local hubs, blue indicates nodes globally important for connecting distant portions of the graph, and red is the FCz node (also an important local hub)

Figure 2.10 Optimal binary graphs generated from weighted connectivity matrices for correct responses. Hubs of the network are indicated in color. Hubs are identified based on the number of shortest paths in which each node participates. Green indicates the most important local hubs, blue indicates nodes globally important for connecting distant portions of the graph, and red is the FCz node (also an important local hub)

Figure 2.11 Comparisons between small-world calculations using the conventional thresholding method and the proposed optimal binary transformation approach

Figure 3.1 Local energy indices for the n = 34 karate club network: 1-hop Energy Analysis

Figure 3.2 Local energy indices for the n = 34 karate club network: 2-hop Energy Analysis

Figure 3.3 Local energy indices for the n = 34 karate club network: 3-hop Energy Analysis

Figure 3.4 1-hop neighborhood Energy differences (Error - Correct)

Figure 3.5 2-hop neighborhood Energy differences (Error - Correct)

Figure 3.6 Comparisons between nodal graph entropy measures for identifying anomalous regions in a simulated network with n = 60 nodes. The network is composed of four complete graphs and a random graph performing as a perturbation in the network

Figure 3.7 Comparisons between nodal graph entropy rate measures for identifying important regions in the karate club network with n = 34 nodes

Figure 3.8 Average graph entropy, computed over 50 simulations of each network type, with respect to various permutation and scanning functions across graph topologies each with 128 nodes. a) Random networks; b) Modular networks

Figure 3.9 Average graph entropy, computed over 50 simulations of each network type, with respect to various permutation and scanning functions across graph topologies each with 128 nodes. a) Scale-Free networks; b) Star networks

Figure 3.10 Average graph entropy, computed over 50 simulations of each network type, with respect to various permutation and scanning functions across graph topologies each with 128 nodes (Lattice networks)

Figure 3.11 Effects of order value on graph entropy measures for various graph topologies

Figure 3.12 Rate/Distortion curves (average and standard deviation across all subjects) for each response type: a) error; b) correct

Figure 3.13 Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 2 levels

Figure 3.14 Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 3 levels

Figure 3.15 Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 4 levels

Figure 3.16 Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 5 levels

Figure 3.17 Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 6 levels

Figure 3.18 Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 7 levels

Figure 3.19 Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 8 levels

Figure 3.20 Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 9 levels

Figure 3.21 Significance (Wilcoxon Rank Sum) plot for Error vs Correct local entropy values using Lloyd-Max quantizer optimization for Q = 10 levels

Figure 4.1 Weighted matrix to adjacency matrix for a given threshold

Figure 4.2 Participation scores of various clusterings derived from various thresholds of the network

Figure 4.3 Average participation scores in terms of network degree

Figure 4.4 Variations in homogeneity and completeness where the true number of modules is 3: a) High homogeneity and low completeness; b) High completeness and low homogeneity; c) High homogeneity and completeness

Figure 4.5 Modularity with respect to degree and number of clusters for Correct responses

Figure 4.6 Modularity with respect to degree and number of clusters for Error responses

Figure 4.7 Six overlapping clusters derived for Correct responses

Figure 4.8 Six overlapping clusters derived for Error responses

Figure 4.9 Hierarchical decomposition of the brain network during an Error response

Figure 4.10 Hierarchical decomposition of the brain network during a Correct response

Figure 4.11 Significance cluster plots for Error responses

Figure 4.12 Significance cluster plots for Correct responses

Figure 4.13 Cross-Module average synchrony for Error responses across four clusters

Figure 4.14 Cross-Module average synchrony for Correct responses across 2 clusters

Figure 4.15 Normalized differences between Hub Identification Measures between Error and Correct: a) Within Module Z-Score; b) Participation Coefficient

Chapter 1 Introduction

To this day, the human brain remains one of the most intriguing objects of interest in the scientific community. It is a complex network composed of billions of neurons and is capable of performing parallel information processing millions of times faster and more efficiently than any supercomputer in existence. In contrast to man-made systems, its performance tends to degrade gracefully under partial damage, and it can learn to reorganize itself to bypass damaged areas. Furthermore, the brain supports intelligence and self-awareness simultaneously. The dynamics of communication among the distributed regions responsible for these remarkable characteristics of the brain are not yet fully understood by the neuroscience community. Experimental approaches to human cognition have been improved with the development of advanced functional neuroimaging techniques; however, interpretations of activation patterns are limited due to a lack of information pertaining to the nature of structure-function mappings of the brain. In response to this problem, the Human Connectome Project was initiated in July 2009 and is sponsored by sixteen components of the National Institutes of Health (NIH) [1]. The goals of this five-year project are to increase the understanding of how functional brain states emerge from their underlying structural substrate, provide new mechanistic insights into how brain function

is affected if this structural substrate is disrupted, and allow connectome data and research to be easily accessible to the entire scientific community [2]. Ultimately, the project will facilitate the advancement of imaging and analyzing brain connections, resulting in improved sensitivity, resolution, and utility, thereby accelerating progress in the emerging field of human connectomics [3]: the application of neural imaging techniques to increase the resolution of maps of the brain's neural connections. This dissertation focuses on the connectome goal of understanding how function is associated with an underlying structure using advanced neuroimaging techniques. Neuroimaging techniques such as fMRI, electroencephalography, and magnetoencephalography make it possible to study brain activity with high spatial and temporal resolution [4] and to better understand the processes underlying human cognition. However, interpretation of these data is limited due to the lack of understanding of the organizational structure of the brain. The brain is theoretically believed to function according to two primary organizational principles: functional segregation and functional integration [5]. Functional segregation describes the tendency of neural assemblies to merge into specialized groups segregated across the cortical regions of the brain. Functional integration describes the coordinated activation of these segregated groups, which enables cognitive processes and motor responses. The interplay of these principles is thought to contribute to the complexity of the brain network. Currently, three types of networks are recognized as forming the overall brain network: the anatomical connectivity network (physical links between neural assemblies), the functional connectivity network (statistical dependencies between neural assemblies), and the effective connectivity network (causal interactions between neural assemblies).
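As a toy illustration of the "statistical dependencies" that define a functional connectivity network (the dissertation itself uses phase synchrony, not correlation), a connectivity matrix can be built from any pairwise dependency measure; here plain Pearson correlation between simulated channel time series:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 4 "channels": two share a common driving signal, two are independent noise.
n_samples = 1000
common = rng.standard_normal(n_samples)
x = np.vstack([
    common + 0.3 * rng.standard_normal(n_samples),  # channel 0
    common + 0.3 * rng.standard_normal(n_samples),  # channel 1, coupled to channel 0
    rng.standard_normal(n_samples),                 # channel 2, independent
    rng.standard_normal(n_samples),                 # channel 3, independent
])

# Functional connectivity matrix: absolute pairwise correlation.
fc = np.abs(np.corrcoef(x))

# Coupled channels show a strong statistical dependency; independent ones do not.
print(fc[0, 1] > 0.8, fc[2, 3] < 0.2)
```

Any of the measures named above (coherence, synchrony, mutual information) could replace `np.corrcoef` here; the result is always a fully connected weighted matrix of pairwise dependencies, which is exactly the object the graph measures in later chapters operate on.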
Functional connectivity can be quantified using neuroimaging techniques, which can record data from the macroscale anatomical regions of the brain, combined with statistical measures of dependency that

quantify the relationships between distributed regions of the brain. Functional connectivity is defined as the temporal correlations between spatially remote neurophysiological events [6] and is the key to understanding how the coordinated and integrated activity of the human brain takes place [7]. Depending on the measure used to quantify it, functional connectivity may reflect linear or nonlinear interactions [8] or interactions observed at various time scales [9]. The literature has demonstrated that local and long-range synchronization of oscillatory neural activity facilitates the exchange of information throughout the brain. The strength of these interactions can quantify pair-wise connectivity in the brain, i.e., functional connectivity. Unfortunately, functional connectivity does not describe a link between function and organization of the brain but only describes the pair-wise associations of neural populations. The challenge is to define methods which reveal the underlying complex network associated with the observed pairwise connectivity patterns. Recently, graph theoretical methods have been implemented to address this problem. Graph theory describes the important properties of complex systems by quantifying the topologies of their respective network representations [8]. It has been used for analyzing social networks, the internet, power grids, protein-protein networks, and more recently the brain [10, 11, 12, 7]. Using graph theory, He et al. [13] showed that Alzheimer's disease patients had abnormal topological organization of the whole-brain structural networks, with a decrease of hubness in the temporal and parietal regions. These changes suggest a shift to more local processing and a disrupted structural integrity of the larger-scale brain systems [14]. Bassett et al. [15] reported that schizophrenia patients are associated with abnormal topology in the multimodal brain network, characterized by a reduced organizational hierarchy, an increased distance between network nodes, and a general loss of hubness in the frontal lobe. In an EEG study on epilepsy, Dellen et al.

[16] reported that local graph measures describing temporal lobe functional network connectivity are lower, and correlate with a more random network, in patients with longer histories of epilepsy. Finally, Leistedt et al. [17] showed that decreases in small-world connectivity, associated with problems in information processing across the brain's functional network, may be indicative of depressive disease. Although these studies have demonstrated the usefulness of graph theory for revealing functional networks from neuroimaging data, the implementation of graph theory is limited. The reason is that these measures were originally employed as modeling tools for random graphs and graphs whose properties are well understood, e.g., bipartite, tree, planar, and cyclic graphs [18]. However, the properties observed in these well-understood graphs are rarely encountered in real-world networks. This naturally poses the challenge of developing biologically meaningful measures for the topologies observed in biological networks. In the application of graph theoretic measures to the study of functional brain networks, several problems are encountered. First, most graph theoretical tools are only applicable to binary networks, in which edges are unweighted, unlike the case of functional connectivity, where a weight is assigned to all node pairs. Second, graph theory measures are typically applied to sparse networks, whereas functional connectivity matrices are fully connected. Typically, the weighted information is thresholded to obtain a binary network, thus discarding important information contained in the edge weights. There is also a lack of a sufficient set of local measures to describe local connectivity: most measures characterize the network as a whole rather than by its subregions.
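The thresholding step criticized above can be sketched in a few lines; the cutoff value is an arbitrary illustrative choice, and the sketch makes explicit what is lost:

```python
import numpy as np

rng = np.random.default_rng(1)

# A toy fully connected weighted "connectivity" matrix (symmetric, zero diagonal),
# standing in for pairwise phase-synchrony values in [0, 1].
n = 6
w = rng.uniform(0.1, 1.0, size=(n, n))
w = (w + w.T) / 2
np.fill_diagonal(w, 0.0)

# Conventional approach: threshold to a binary adjacency matrix.
threshold = 0.5  # arbitrary choice -- the resulting topology depends heavily on it
a = (w > threshold).astype(int)

# Every surviving edge now has weight 1: the graded strength information is gone,
# and sub-threshold edges have vanished entirely.
print("edges kept:", a.sum() // 2, "of", n * (n - 1) // 2)
```

Raising or lowering `threshold` produces a different binary graph from the same data, which is precisely the arbitrariness the weighted measures of Chapter 2 are designed to avoid.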
Last, to integrate the abstractness of graph theory with the analysis of a real biological network, methods borrowed from signal processing, information theory, and statistical physics need to be incorporated into the design of new graph measures. In this dissertation, the newly proposed graph measures and methods are specifically designed to evaluate functional connectivity matrices obtained from

multichannel neurophysiological data. In this dissertation, new graph measures, a weighted clustering coefficient and a weighted path length, are introduced to define a weighted small-world measure for the brain network without resorting to thresholding of the weighted data. Also, methods from signal processing and information theory are used to develop new local measures of graph energy and graph entropy for weighted graphs, to quantify each node's contribution to network connectivity and to identify anomalous subregions. Finally, a representation of the hierarchical modular organization common across multiple brain networks is extracted using a bi-partitioning algorithm based on spectral decomposition of these networks. The proposed measures are applied to functional connectivity graphs formed by computing pairwise phase synchrony between neural oscillations. A recently introduced time-varying phase synchrony measure [19] was used to quantify the statistical dependencies between neural populations in data containing the error-related negativity (ERN), gathered during an EEG study [20]. The ERN is a brain potential response that occurs following performance errors in a speeded reaction time task. The study tested the hypothesis that the medial prefrontal cortex (mpfc) interacts with the lateral prefrontal cortex (lpfc) in a dynamic loop during goal-directed performance. This theoretical network uses a monitoring system, in particular the anterior cingulate cortex (ACC), to signal the need for enhanced cognitive control during conditions of conflict or after an error. After an error, the action-monitoring system (mpfc) is proposed to function as an alarm, recruiting the control network (lpfc) to enhance attention. Recently, Cavanagh et al. [21] presented evidence that synchronized neural oscillations may be one mechanism underlying this functional communication between action-monitoring and cognitive control networks across the mpfc and lpfc regions.
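The time-varying phase synchrony measure of [19] is beyond a short sketch, but the classical phase-locking value (PLV), a standard stand-in, illustrates how pairwise phase synchrony between two oscillations can be quantified. All signal parameters below are illustrative, and the analytic signal is computed via an FFT (equivalent to a Hilbert transform for even-length input):

```python
import numpy as np

rng = np.random.default_rng(2)

def analytic_signal(x):
    """Analytic signal via FFT; assumes even-length input."""
    n = x.size
    spec = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:n // 2] = 2.0   # double positive frequencies
    h[n // 2] = 1.0     # keep the Nyquist bin
    return np.fft.ifft(spec * h)

def plv(a, b):
    """Phase-locking value: |mean phase-difference phasor|, 1 = perfect locking."""
    dphi = np.angle(analytic_signal(a)) - np.angle(analytic_signal(b))
    return np.abs(np.mean(np.exp(1j * dphi)))

fs, dur = 250, 4.0                 # illustrative sampling rate (Hz) and duration (s)
t = np.arange(0, dur, 1 / fs)      # 1000 samples (even length)

# Two theta-band (~6 Hz) oscillations with a fixed phase offset, plus noise.
x = np.sin(2 * np.pi * 6 * t) + 0.3 * rng.standard_normal(t.size)
y = np.sin(2 * np.pi * 6 * t + np.pi / 4) + 0.3 * rng.standard_normal(t.size)

print(plv(x, y) > 0.8)                       # strongly phase locked despite the offset
print(plv(x, rng.standard_normal(t.size)))   # near 0 for unrelated noise
```

Computing such a value for every electrode pair yields exactly the fully connected weighted matrix that the proposed graph measures take as input.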
The proposed graph measures will give a succinct representation of the functional networks involved during action-monitoring and cognitive control and provide insight into the reorganization of the neural networks during error processing.

1.2 Background

In this section, a short review of human brain physiology, neuroimaging techniques, metrics quantifying functional connectivity, and graph theory concepts is presented. This study is particularly interested in evaluating scale-free networks. Benchmark scale-free networks, which are used to test the proposed graph measures, are also introduced. Finally, a description of the biological dataset evaluated within this dissertation, an EEG study performed at the University of Minnesota, is presented.

1.2.1 Physiology of the Human Brain

Neurons extend throughout the human body and vary in structure depending on their specific purpose. A typical neuron possesses three distinct components: the soma, axon, and dendrites. The soma is the cell body of the neuron, containing the nucleus and the fluid separating it from other cell bodies; the axon allows the neuron to transmit electrical information to other neurons. The axon extends from the soma into far regions of brain tissue, where it ultimately terminates and splits into smaller branches. Only one axon can ever extend from a soma; however, numerous dendrites may exist. Signals deriving from other neurons are received either by the soma directly or by its dendrites. Dendrites can extend and split into branches many times but become thinner the further out they reach, unlike axons, which maintain a fairly constant width [22].

Communication among neurons is due to the action potential, abrupt depolarizations that are the basis for neural signaling in the electrically excitable domains [22]. Neurons transmit electrical information via ion channels, which are opened and closed according to the concentrations of sodium, potassium, and chloride ions. At rest, these three chemicals remain at stable concentrations and

cause the neuron to have a resting-state potential. Typically, this resting potential is -70 mV, indicating the cell membrane is polarized such that the inside is more negative than the outside. As the chemical concentrations change, depolarization, an increase in positive polarity, or hyperpolarization, an increase in negative polarity, occurs within the cell. When a motor or cognitive response occurs, the neurons follow a three-phase action potential. The first phase begins with a depolarization of the cell, in which the potential increases to +56 mV. This is immediately followed by a hyperpolarization phase, which decreases the potential to -93 mV, past the resting potential. Finally, the cell depolarizes again and the potential returns to -70 mV. The action potential is dependent upon the magnitude of the stimulus received by the neuron. This magnitude is referred to as the threshold, which typically ranges between 0 and 10 mV. When the stimulus causes the resting potential to reach a value within this range, the action potential activates; otherwise, no action potential takes place. There is no intermediate action potential, hence the "all or none" concept of a neuron. After an action potential occurs, there exists a refractory period in which the neuron will not have another action potential even if it receives a stimulus surpassing the threshold. The threshold and refractory period vary with each type of neuron, but the "all or none" concept remains true for all neurons.

Neuron function and structure vary throughout the brain, but for the purpose of evaluating functional connectivity, neural regions of the cerebral cortex are of primary interest. The cerebral cortex is the top layer of the human brain and can be categorized into four primary sections called lobes: Frontal, Occipital, Parietal, and Temporal (Figure 1.1). The cortex is also split into two functional halves, or hemispheres, by the midline of the brain called the corpus callosum.
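The "all or none" rule and the three-phase trajectory described above can be captured in a toy sketch. The resting and hyperpolarized potentials are taken as negative (-70 and -93 mV), consistent with the text's note that the inside of the membrane is more negative at rest; the 10 mV threshold and the list-of-phases representation are illustrative simplifications:

```python
# Membrane potentials (mV) for the three-phase action potential described in the text.
RESTING = -70.0
PEAK = 56.0
HYPERPOLARIZED = -93.0

def action_potential(stimulus_mv, threshold_mv=10.0, refractory=False):
    """All-or-none rule: a sufficient stimulus yields the full three-phase
    trajectory; an insufficient stimulus, or a refractory neuron, yields
    no response at all -- there is no intermediate action potential."""
    if refractory or stimulus_mv < threshold_mv:
        return [RESTING]                                  # no action potential
    return [RESTING, PEAK, HYPERPOLARIZED, RESTING]       # always the same shape

print(action_potential(12.0))                    # full spike
print(action_potential(5.0))                     # sub-threshold: nothing
print(action_potential(12.0, refractory=True))   # refractory period blocks firing
```

Note that the response never scales with the stimulus: any supra-threshold input produces the identical trajectory, which is exactly the "all or none" behavior.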
The functions associated with the frontal lobe are motor functions, reasoning, judgement, impulse control, planning, and memory. The functions associated with

Figure 1.1: Lobes of the cerebral cortex. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation.

the occipital lobe include vision and color recognition. The parietal lobe processes information, registers sensations of pain, controls speech and visual perception, and maintains spatial orientation. Finally, the temporal lobe is associated with emotion, hearing, memory, and speech [22]. The neurons in these regions are densely packed and may be classified as pyramidal or non-pyramidal. The action potentials of pyramidal cells along the cerebral cortex can be monitored through the electroencephalogram.

The Electroencephalogram (EEG)

Numerous neuroimaging methods exist, such as Magnetoencephalography (MEG), functional Magnetic Resonance Imaging (fMRI), Positron Emission Tomography (PET), and the electroencephalogram (EEG). To examine functional integration in the temporal frame, there is a need to

characterize the temporal dynamics of neural networks with millisecond accuracy. Neuroimaging techniques with high temporal resolution, such as the EEG, are the most appropriate tools for examining the dynamic interactions of neural networks. The electroencephalogram is composed of a collection of oscillating electrical signals observed at each channel, i.e. electrode, on the surface of the scalp. The electrodes used for recording an EEG are typically assembled in a 10/20 arrangement (Figure 1.2), such that the physical distance between adjacent electrodes is 10% or 20% of the total front-to-back or left-to-right distance of the skull. Each electrode is assigned a letter representing a lobe and a number for the hemisphere. The letters F, T, C, P, and O represent the frontal, temporal, central, parietal, and occipital regions respectively. Although there is no central lobe, the label exists to help indicate location. Electrodes located along the midline of the skull are represented by these letters accompanied by the letter z. Letters accompanied by even numbers are electrodes belonging to the right hemisphere, and those with odd numbers represent the left hemisphere [23]. Electroencephalogram oscillations can be separated into different frequency bands and classified as five types of waves: delta (1-4 Hz), theta (4-7 Hz), alpha (8-12 Hz), beta (12-30 Hz), and gamma (above 30 Hz). Delta waves are very slow, high-amplitude oscillations observed during sleep. Theta waves are mostly observed in children; in older adults they can be observed only during drowsy or meditative states. They are also associated with moments of enhanced creativity and daydreaming. Alpha waves are associated strongly with the occipital lobe. These waves are highly observable during wakeful relaxation when the eyes are closed. The waves disappear when the eyes are open or when the individual is asleep. Beta waves are associated with active and anxious thinking.
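The band decomposition just listed can be sketched numerically with a generic FFT band-power computation. The function name is ours, and the gamma upper bound of 100 Hz is an assumption, since the exact range varies by source.

```python
import numpy as np

# Band edges from the text; the gamma upper limit (100 Hz) is assumed.
BANDS = {"delta": (1, 4), "theta": (4, 7), "alpha": (8, 12),
         "beta": (12, 30), "gamma": (30, 100)}

def band_powers(x, fs):
    """Sum of squared FFT magnitudes of signal x (sampled at fs Hz) per band."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2
    return {name: power[(freqs >= lo) & (freqs < hi)].sum()
            for name, (lo, hi) in BANDS.items()}

# A 10 Hz sinusoid (the eyes-closed "alpha" case) dominates the alpha band.
fs = 256
t = np.arange(0, 2, 1 / fs)
p = band_powers(np.sin(2 * np.pi * 10 * t), fs)
print(max(p, key=p.get))  # alpha
```

In practice an EEG analysis would band-pass filter each channel rather than summing raw FFT power, but the band bookkeeping is the same.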
Current research suggests gamma waves are associated with conscious perception [24].

Figure 1.2: 10/20 EEG electrode arrangement.

Measures of Functional Connectivity

After selecting an appropriate neuroimaging technique, a metric to quantify functional connectivity needs to be defined. Indices used for quantifying functional connectivity include linear measures, such as correlation and coherence, and nonlinear measures, such as phase synchrony and generalized synchronization measures [25]. Phase synchronization between neuronal oscillations has attracted a lot of attention in the last

decade for understanding functional integration during cognition and attention tasks. It has also increased understanding of the underlying networks involved in psychopathologies. Synchrony is a commonly used measure of functional connectivity between neuronal oscillations and quantifies the relationship between the temporal structures of the signals regardless of signal amplitude. The amount of synchrony between two signals is usually quantified by estimating the instantaneous phase of the individual signals around the frequency of interest. The two main current approaches to isolating the instantaneous phase of a signal are the Hilbert transform and the complex wavelet transform. In both methods, the goal is to obtain an expression for the signal in terms of its instantaneous amplitude a(t) and phase φ(t) at the frequency of interest as x(t,ω) = a(t)exp(j(ωt + φ(t))). This formulation can be repeated for different frequencies, and the relationship between the temporal organization of two signals, x and y, can be observed through their instantaneous phase difference Φ_xy(t) = nφ_x(t) - mφ_y(t), where n and m are integers that indicate the ratios of possible frequency locking. Most studies focus on the case n = m = 1. In neuroscience, the focus is on the case when the phase difference is approximately constant over a limited time window. This is defined as a period of phase locking between two events. In this study, a new time-varying measure of phase synchrony [19], based on the complex time-frequency distribution known as the Rihaczek distribution [26], is used to quantify the nonlinear dependencies between the neural assemblies of the brain. For a signal x(t), the Rihaczek distribution is expressed as

C(t,\omega) = \frac{1}{\sqrt{2\pi}}\, x(t)\, X^*(\omega)\, e^{-j\omega t} \quad (1.1)

and measures the complex energy of the signal at time t and frequency ω, where X(ω) is the Fourier transform of x(t). One of the disadvantages of the Rihaczek distribution is the existence of cross-terms for multicomponent signals. In order to remove these cross-terms, a reduced interference version of the Rihaczek distribution was introduced by applying a kernel function, such as the Choi-Williams (CW) kernel \phi(\theta,\tau) = \exp(-(\theta\tau)^2/\sigma), to filter the cross-terms and obtain

C(t,\omega) = \iint \exp\!\left(-\frac{(\theta\tau)^2}{\sigma}\right) \exp\!\left(j\frac{\theta\tau}{2}\right) A(\theta,\tau)\, e^{-j(\theta t + \tau\omega)}\, d\tau\, d\theta \quad (1.2)

where A(\theta,\tau) = \int x(u + \frac{\tau}{2})\, x^*(u - \frac{\tau}{2})\, e^{j\theta u}\, du is the ambiguity function of the signal and \exp(j\theta\tau/2) is the kernel corresponding to the Rihaczek distribution [26]. The phase difference between two signals based on this complex distribution is computed as

\Phi_{12}(t,\omega) = \arg\!\left[ \frac{C_1(t,\omega)\, C_2^*(t,\omega)}{|C_1(t,\omega)|\, |C_2(t,\omega)|} \right] \quad (1.3)

where C_1(t,ω) and C_2(t,ω) refer to the complex energy distributions of the two signals x_1(t) and x_2(t) respectively, and a synchrony measure quantifying the inter-trial variability of the phase differences, the phase locking value (PLV), is defined as

PLV(t,\omega) = \frac{1}{N} \left| \sum_{k=1}^{N} \exp\!\left(j\Phi_{12}^{k}(t,\omega)\right) \right| \quad (1.4)

where N is the number of trials and \Phi_{12}^{k}(t,\omega) is the time-varying phase estimate between two electrodes for the kth trial. If the phase difference varies little across the trials, PLV is close to 1.

Graph Theory

Graphs can be used to model relationships in a variety of complex networks, including social and biological networks. In recent years, developments in the quantitative analysis of complex networks, based largely on graph theory, have been applied to studies of functional brain networks [27]. An undirected graph, G = (V,E), is defined by a set of vertices, V = {v_1, v_2, ..., v_n}, of order

n and a set of edges, E = {e_{ij} | v_i, v_j ∈ V}, of size m (Figure 1.3). The graph can be represented by an n × n binary adjacency matrix, A, such that

A(i,j) = \begin{cases} 1 & \text{if an edge exists between } v_i \text{ and } v_j \\ 0 & \text{otherwise} \end{cases} \quad (1.5)

or by a weighted matrix W such that W(i,j) = w_{ij}, where w_{ij} can be any real number quantifying the relationship between vertices v_i and v_j. Another class of graphs are directed graphs, or di-graphs, which assign direction to the edges, i.e. w_{ij} is not necessarily equal to w_{ji}. In this dissertation, the focus is on weighted symmetric graphs where w_{ij} ∈ [0,1]. The most fundamental measures that define the properties of a graph or di-graph are the degree of nodes, cost of the network, and the distance matrix (Table 1.1).

Figure 1.3: Example of an undirected 6 node graph.

This phase synchrony measure quantifies the dependent relationships between neural regions, which are represented by the EEG electrodes, and can be represented as an undirected weighted graph. The strength of the relationship between an electrode pair, i.e. edge weight, quantified by the phase locking value, is represented by a single number for a particular time and frequency window, obtained by averaging PLV(t,ω) in that time-frequency window. This is repeated for all

Graph Measure        | Symbol   | Equation                          | Description
Adjacency Matrix     | A        |                                   | 1 indicates an edge (0 otherwise).
Connectivity Matrix  | W        |                                   | Set of all pair-wise strengths.
Edge Weight          | W_{ij}   |                                   | Edge value.
Binary Node Degree   | k_i      | k_i = \sum_{j=1}^{n} A_{ij}       | Number of node neighbors.
Weighted Node Degree | k_i^w    | k_i^w = \sum_{j=1}^{n} W_{ij}     | Total weight of all neighbors.
Degree Matrix        | K        | K_{ii} = k_i                      | Node degrees along diagonal.
Laplacian Matrix     | L        | L = K - W                         |
Path                 | P_{ij}   |                                   | Set of edges between a node pair.
Shortest Path        | SP_{ij}  | SP_{ij} = min P_{ij}              |
Distance Matrix      | D        |                                   | All pair-wise shortest paths.

Table 1.1: Basic graph measures for undirected binary and weighted graphs.

electrode pairs, and the resulting phase locking values generate a weighted connectivity matrix W, where element w_{ij} ∈ W is the phase locking value between electrodes v_i and v_j for a particular time-frequency window.

Graph Topologies

In this section, a range of graph topologies often evaluated in graph theory research is presented. This includes lattice, random, star, and scale-free graphs.

Lattice Graphs

Ball and Coxeter [28] define a lattice graph as the graph obtained by taking the n² ordered pairs of the first n positive integers as nodes and drawing an edge between all pairs having exactly one number in common. This type of graph is highly regular and organized and represents the end of the spectrum of graphs opposite to random graphs.

Random Graphs

There are various methods for generating a random graph. The most popular random graph model is described by Erdős and Rényi [29]. Given a finite set S and a function P : S → [0,1] where \sum_{s \in S} P(s) = 1, we call (S,P) a probability space with sample space S and probability function P.
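The edge-wise construction behind the Erdős-Rényi model can be sketched as a minimal sampler (the function name is ours):

```python
import random

def erdos_renyi(n, p, seed=None):
    """Sample a labeled simple graph on n nodes: each of the
    N = n(n-1)/2 possible edges is included independently with
    probability p, so a particular graph with m edges is drawn
    with probability p**m * (1 - p)**(N - m)."""
    rng = random.Random(seed)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p}

# With p = 1/2 every labeled graph is equally likely; for n = 100
# the expected number of edges is N/2 = 2475.
print(len(erdos_renyi(100, 0.5, seed=1)))
```

Because each of the N edge slots is an independent Bernoulli trial, the probability of any specific graph factorizes exactly as the product described in the text.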

Recall that G_n denotes the set of all labeled simple graphs on n nodes. Given a probability space (G_n, P), selecting an element of G_n according to the probability function P is called a random graph. Let p be a real number between 0 and 1, and consider the function P : G_n → [0,1] such that for each graph G in G_n,

P(G) = p^m (1-p)^{N-m} \quad (1.6)

where m is the size of the graph G and N = \binom{n}{2}. By the binomial theorem,

(a+b)^r = \sum_{k=0}^{r} \binom{r}{k} a^k b^{r-k} \quad (1.7)

so (G_n, P) is indeed a probability space. The above (G_n, P) model is known as the Erdős-Rényi model of random graphs. In the Erdős-Rényi model, when p = 1/2, the probability of selecting a graph G ∈ G_n at random is (1/2)^N. This is interpreted as selecting a member of G_n where all members have the same probability of being chosen [30]. Watts et al. [31] presented another method of generating a random network by rewiring the edges of a lattice graph. Each edge is rewired with probability p, where 0 ≤ p ≤ 1.

Star Graphs

Star graphs are connected graphs composed of a set of nodes such that the degree of node v_1 is d(v_1) = n - 1 and the degree of all other nodes is d(v_i) = 1 for i ≠ 1.

Scale-Free Networks

A scale-free network is a type of non-random network whose degree distribution fits a power law model, p(k) ∝ k^{-α}, such that 2 ≤ α ≤ 3. Many real-world networks, such as social networks, power grids, and neural networks, follow a scale-free topology. Watts et al. [31] investigated networks generated by rewiring the nodes and edges of a regular network, each with probability p. A regular

network of N nodes and k edges is a network where all nodes are connected to the same number of edges. Thus, a 5-regular network has five edges connected to each of its nodes. Watts et al. [31] explored the unknown region between a regular network (p = 0) and a completely disorganized random network (p = 1). A small-world network was generated by using regular networks whose edges were rewired with 0 < p < 1. Small-world networks demonstrated features that distinguished them from other networks in the sense that they possessed meaningful structural topologies. Small-world networks possess degree distributions similar to a power law model and are therefore considered scale-free. This dissertation will use the terms small world and scale free interchangeably.

Benchmark Data

In this section, the various networks used to test the proposed measures in this dissertation are presented. This includes simulated networks and real-world networks.

Simulated Data

Scale-free graphs were generated using the LFR benchmark software [32], which generates networks of a known underlying structure, such that the membership of segregated clusters is known. The LFR benchmark is capable of generating graphs of any size with increasing complexity. Complexity is defined here as the total number of edges between clusters, k_out, with respect to the total number of edges, m.

Real-World Networks

Four different real-world networks were used to evaluate some of the measures presented in this dissertation:

A karate club social network [33]: Zachary's karate club, Figure 1.4, is a social network of friendships among 34 members of a university karate club. The network represents

the split within the group following disputes among the members. The relationships between members are presented as an unweighted matrix, such that an edge exists only if two members were friends in the class, and as a weighted matrix, where entries represent the number of situations in and outside the club in which interactions occurred. The senior members of the karate group are numbered 1, 33, and 34. This is a useful network since the functional role of each of the nodes is well known, and it is revealed to have a scale-free structure such that the degree distribution follows a power law with α ≈ 2.

Figure 1.4: Zachary Karate Club Network.

Dolphin network [34]: This network is derived from a social network analysis of statistically significant frequent associations between 62 bottlenose dolphins in New Zealand.

Les Miserables [35]: This fictional social network describes the co-appearance of 77 characters in the novel Les Miserables.

The North American airport network [36]: This network is a summary of the frequent flight/connection patterns associated with 456 U.S. airports.
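A classic way to see how such power-law degree distributions arise is preferential attachment (Barabási-Albert-style growth). This is a different generator from the LFR benchmark used in this dissertation; the sketch below is purely illustrative.

```python
import random
from collections import Counter

def preferential_attachment(n, m=2, seed=0):
    """Grow a graph where each new node links to m existing nodes chosen
    with probability proportional to their current degree, producing the
    heavy-tailed ("rich get richer") degrees of a scale-free network."""
    rng = random.Random(seed)
    repeated = list(range(m))   # each node listed once per unit of degree
    edges = []
    for v in range(m, n):
        chosen = set()
        while len(chosen) < m:                 # m distinct targets
            chosen.add(rng.choice(repeated))
        for u in chosen:
            edges.append((v, u))
            repeated += [v, u]                 # update degree weights
    return edges

deg = Counter()
for v, u in preferential_attachment(2000):
    deg[v] += 1
    deg[u] += 1
# A few hubs collect far more links than the median node.
print(max(deg.values()), sorted(deg.values())[len(deg) // 2])
```

The resulting maximum degree is typically an order of magnitude above the median, the same hub-dominated pattern seen in the karate club and airport networks.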

Biological Data

The proposed phase synchrony measure, in conjunction with graph measures, is applied to a set of EEG data containing the error-related negativity (ERN). The ERN is an event-related potential that occurs following performance errors in a speeded reaction time task. Previously reported EEG data [20] from 62 channels (10/20 system) were utilized. This included 90 undergraduate students (34 male) from the University of Minnesota (two of the original 92 participants were dropped due to artifacts rendering computation of the TFPS values problematic). Full methodological details of the recording are available in the previous report. The task was a common speeded-response letter (H/S) flanker task, and error and correct response-locked trials from each subject were utilized. A random subset of correct trials was selected to equate the number of error and correct trials for each participant. Before computing the phase-synchrony measures, all EEG epochs were converted to current source density (CSD) using published methods [37, 38]. This was done to accentuate local activity (e.g. to better index small-world properties) and to attenuate distal activity (e.g. volume conduction). Aviyente [19] reported increased phase synchrony associated with the ERN in the theta frequency band (4-7 Hz) and ERN time window (25-75 ms) for error responses compared to correct responses. It is expected that the graph measures proposed in this dissertation will corroborate Aviyente's findings. Functional networks underlying control processes have been an important recent focus of study and provide a reasonable test of the current measure of functional connectivity. In particular, a great deal of empirical and theoretical work suggests that frontal networks involving the medial prefrontal cortex (mPFC) and lateral prefrontal cortex (lPFC) are centrally involved in cognitive control processes [39, 40, 41, 42, 43, 21].
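The phase locking value of Eq. (1.4) is straightforward to compute once instantaneous phases are in hand. This sketch assumes the phases have already been extracted (e.g. from a time-frequency distribution, as described earlier); the function and variable names are ours.

```python
import numpy as np

def plv(phi1, phi2):
    """Phase locking value, Eq. (1.4): |mean over trials of exp(j*dphi)|.

    phi1, phi2: instantaneous-phase arrays of shape (n_trials, n_times).
    Returns one value per time point, in [0, 1]."""
    return np.abs(np.exp(1j * (phi1 - phi2)).mean(axis=0))

rng = np.random.default_rng(0)
base = 2 * np.pi * 6 * np.linspace(0, 1, 200)        # 6 Hz (theta) phase ramp
phi1 = np.tile(base, (50, 1))                        # 50 trials
locked = base + 0.5 + rng.normal(0, 0.1, (50, 200))  # constant lag + jitter
unlocked = rng.uniform(0, 2 * np.pi, (50, 200))      # no phase relation
print(plv(phi1, locked).mean())    # near 1: consistent phase difference
print(plv(phi1, unlocked).mean())  # near 0: random phase differences
```

A constant phase lag across trials still yields PLV near 1, which is exactly why PLV indexes phase consistency rather than zero-lag identity.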
Time-frequency (TF) analysis techniques have been shown to be particularly well suited to isolating these control-related processes in the theta band

(3-7 Hz), because they can disentangle relevant overlapping event-related activity occurring at different frequencies [44, 45, 20, 46, 47]. Recently, Cavanagh and colleagues [21] presented evidence that synchronized neural oscillations in the theta band may be one mechanism that underlies functional communication between networks involving mPFC and lPFC regions during the ERN. They quantified dynamics of the ERN by computing bivariate theta-band wavelet-based phase synchrony between midline-frontal and lateral-frontal sites. Specifically, they found increased synchronization in the theta band between F5-FCz and F6-FCz for errors, and this difference significantly predicted post-error slowing. Together, these findings support the view that TFPS is sensitive to changes in functional connectivity in frontal networks underlying cognitive control, and suggest the hypothesis that this increased connectivity would correspond to increased small-world network behavior.

1.3 Thesis Outline

This thesis is organized as follows. First, in Chapter 2, graph measures classifying a network as small-world are reviewed for unweighted and weighted connection matrices. An important limitation of current small-world measures is that they are mostly restricted to binary undirected graphs, which require the transformation of the pairwise connectivity indices, e.g. bivariate phase synchrony values, to 0 or 1 by means of a thresholding operation. This thresholding operation loses the information about connectivity strengths between pairs of electrodes and raises the issue of the choice of the optimal threshold. In many studies using small-world measures, a range of thresholds is chosen, corresponding to graphs with different degrees, to avoid the arbitrary choice of a threshold.
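Such a threshold sweep can be sketched as follows, with synthetic symmetric weights standing in for a PLV connectivity matrix (the 62-channel size mirrors the EEG montage described above):

```python
import numpy as np

def binarize(W, tau):
    """Map a weighted connectivity matrix W (entries in [0, 1]) to a
    binary adjacency matrix: an edge survives iff its weight exceeds tau."""
    A = (W > tau).astype(int)
    np.fill_diagonal(A, 0)
    return A

rng = np.random.default_rng(3)
W = rng.uniform(size=(62, 62))
W = (W + W.T) / 2                       # symmetric, like pairwise PLVs
# One binary graph per threshold; edge density falls as tau rises,
# which is why the results depend on the (arbitrary) choice of tau.
for tau in (0.2, 0.5, 0.8):
    A = binarize(W, tau)
    print(tau, A.sum() / (62 * 61))     # fraction of possible edges kept
```

Every graph measure must then be recomputed at each threshold, which is the computational burden the following paragraph describes.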
Although this approach does not suffer from the choice of the threshold, it increases the computational complexity of the analysis and results in uncertainty about which threshold is best for representing the underlying network. A weighted network, on the other hand,

provides further relevant information by indicating the varying levels of strength of the pairwise connections and may be especially useful in filtering the influence of weak and potentially non-significant links. Recently, generalizations of small-world measures, including the weighted clustering coefficient and weighted path length, have been introduced. However, these measures do not incorporate multiple and longer paths between nodes, which may significantly contribute to integration in larger and sparser networks. Therefore, Chapter 2 presents new weighted clustering coefficient and weighted path length measures that take all possible connections between nodes into account to compute the small-worldness of the network. Also, for situations when thresholding is absolutely required, a process to construct an optimal binary mapping of the biological network is presented. In Chapter 3, concepts from signal processing and information theory are employed to introduce new local measures, or features, to describe graph anomalies and hubs. A new measure of localized graph energy quantifies a brain region's local contribution to the functional global organizational structure during error processing. This method is based on the idea of h-hop neighborhoods of a node, which can describe overall graph connectivity, similar to time scaling and shifting of a signal. The proposed measure of energy, Laplacian Hückel energy, is based on the spectral features of the graph, which provide greater insight into the structural characteristics of the network, unlike other published measures of graph energy. Furthermore, Chapter 3 presents how the global and local complexities of unweighted and weighted graphs are quantified using a new measure of graph entropy rate. The adjacency and connectivity matrices are modeled as a Markov process and entropy is computed according to state transitions in these matrices.
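One standard way to set up such a Markov model is the random walk on the weighted graph, with transition probabilities given by normalized edge weights. This is an illustrative construction, not necessarily the exact definition developed in Chapter 3.

```python
import numpy as np

def walk_entropy_rate(W):
    """Entropy rate (bits per step) of the random walk whose transition
    matrix is P[i, j] = W[i, j] / strength(i), with stationary
    distribution pi[i] = strength(i) / total weight."""
    strength = W.sum(axis=1)
    P = W / strength[:, None]
    pi = strength / strength.sum()
    H = 0.0
    for i in range(len(W)):
        p = P[i][P[i] > 0]              # skip zero-probability transitions
        H -= pi[i] * (p * np.log2(p)).sum()
    return float(H)

# Complete graph on 5 equal-weight nodes: each step is uniform over the
# 4 neighbors, so the rate is log2(4) = 2 bits; a star graph is far more
# predictable because every other step returns to the hub.
K5 = np.ones((5, 5)) - np.eye(5)
print(walk_entropy_rate(K5))  # 2.0
```

Dense, homogeneous topologies maximize this rate, while hub-dominated ones compress well, which is the intuition behind using entropy rate to compare graph complexities.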
The entropy rates of various graph topologies are investigated to determine their complexity. The entropy rate measure is extended to evaluate the local properties of graphs to identify anomalies and hubs of the network. Ultimately, the goal of the proposed entropy rate measure is to design a generalized graph compression algorithm. In Chapter 4, multivariate analysis is performed on connectivity matrices by implementing new hierarchical spectral clustering techniques over a group of networks. Clustering algorithms are typically applied to sparse and unweighted networks, whereas the proposed methods are designed specifically for fully connected and weighted networks. CONGA, a hierarchical clustering method, is implemented in a new clustering process to identify an optimal community structure in which nodes are allowed to participate in multiple clusters. Finally, most clustering algorithms compute an average connectivity matrix when evaluating multiple networks, thus disregarding variance across subjects. A spectral clustering algorithm implementing the Fiedler vector identifies a community structure common to a set of multiple connectivity matrices without first averaging the matrices. These techniques are applied to EEG data to identify which brain regions form highly synchronized segregated groups during error processing. In Chapter 5, the contributions of this dissertation are summarized, along with the final evaluation of the ERN dataset. Finally, future work pertaining to clustering, graph compression, and dynamic brain networks is discussed.
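Fiedler-vector clustering of a single network works as follows: form the graph Laplacian L = K - W (Table 1.1), take the eigenvector of the second-smallest eigenvalue, and split the nodes by sign. A minimal sketch on a toy two-community graph (function name ours):

```python
import numpy as np

def fiedler_partition(W):
    """Bipartition a weighted graph by the sign pattern of the Fiedler
    vector (eigenvector of the second-smallest Laplacian eigenvalue)."""
    L = np.diag(W.sum(axis=1)) - W      # Laplacian L = K - W
    vals, vecs = np.linalg.eigh(L)      # symmetric matrix: use eigh
    return vecs[:, np.argsort(vals)[1]] >= 0

# Two dense 4-node blocks joined by a single weak link.
W = np.zeros((8, 8))
W[:4, :4] = W[4:, 4:] = 1.0
np.fill_diagonal(W, 0)
W[3, 4] = W[4, 3] = 0.1
labels = fiedler_partition(W)
print(labels)  # one block all True, the other all False
```

The dissertation's contribution extends this basic idea to a community structure shared across multiple subjects' matrices, rather than a single (or averaged) matrix as here.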

Chapter 2

Small-World Type Measures for Weighted Graphs

2.1 Introduction

Many complex networks, such as social and biological networks, demonstrate a phenomenon referred to as small world. The concept of small world originally stems from sociology and refers to a network composed of a sparsely connected set of nodes which form dense clusters (producing a high clustering coefficient) linked by short path lengths [31]. A small world is thought to lie between two extreme configurations: the regular lattice and the random graph, which are defined in Chapter 1. The advantage of characterizing a complex network as small world is that it offers a quantitative means to evaluate varying levels of non-randomness in the network. The frequent occurrence of dense clusters introduces redundancy in the network, thus increasing the stability of the system and its robustness in case of node breakdown [48]. Short path lengths permit rapid communication and robustness to random network attacks [49]. In the brain network, a high clustering coefficient promotes functional overlap of densely connected neuronal elements. These

elements act as the building blocks of the cortical architecture. Short path length promotes effective interactions between neuronal elements across cortical regions, which contributes to functional integration. Categorizing a network as a small world presents the opportunity to correlate a network's structure to its dynamic properties. In a study on scientific collaborations, Newman et al. [50] reported that underlying small-world structures have an impact on the dynamics of complex networks by demonstrating that a small-world structure influenced the speed of information and idea dissemination in the academic community. Kogut et al. [51] demonstrated that German corporate firms with higher clustering coefficients and lower path lengths within their ownership structure are more likely to be involved in takeovers and restructuring. Uzzi et al. [52] evaluated the small world of Broadway musicals from 1945 to 1989 and found that the varying small-world properties affected the creativity of Broadway musicals. The literature thus shows a clear correlation between small-world structure and network behavior. Recently, this functional dependence on network structure has been exploited for understanding the topology of structural and functional brain networks. The literature suggests functional connectivity of the human brain network can be explained and modeled according to the fundamental organizational principles of a small world. Sporns et al. [27] discovered robust small-world properties and non-random patterns in functional connection matrices of invertebrate and mammalian cerebral cortices, including the macaque cortex and cat cortex. Further studies demonstrate that large-scale cortical networks exhibit a specific composition of structural and functional modules [53], small-world attributes, i.e. high clustering and short path lengths [54, 55], short wiring length [56] and high complexity [57]. Studies of patterns of functional connectivity in the brain, e.g.
coherence or correlation, among cortical regions have demonstrated that functional brain networks also exhibit small-world [58, 59] and scale-free properties [60], possibly reflecting an underlying structural organization of neural connections. In general, it has been hypothesized that the large-scale organization of neural connections in the brain may impose important constraints on the functional dynamics underlying cognition [61, 62]. Small world has also been effective in distinguishing between healthy and pathological functional brain networks. In a magnetoencephalographic (MEG) study [7], it was shown that patients with Alzheimer's disease demonstrated significant decreases in small world compared to healthy brains, indicating a disruption in effective interactions between and across cortical regions. A similar conclusion was reached by Supekar et al. [63], who evaluated patients through task-free functional magnetic resonance imaging (fMRI). Similar disruptions in functional connectivity have been revealed in an EEG study on schizophrenia [64] and in an MRI study on heroin-dependent individuals [65]. These studies indicate that an analysis of small world describes the effective integration of multiple segregated sources of information in the brain and how decreases in small-world structure are indicative of cortical dysfunction due to disease. Quantifying the small-world measure of brain networks is limited because the small-world measures, clustering coefficient and path length, are originally defined for unweighted networks. However, most functional brain networks are defined through weighted networks, where the edge weights correspond to the temporal dependence between different brain regions. In this chapter, we introduce new methods for representing the small-world behavior of weighted brain networks. There has been limited investigation into developing equivalent clustering coefficient and path length measures for weighted networks [66], and even fewer studies with applications to real-world weighted networks such as the brain network.
Therefore, we introduce new measures of weighted clustering coefficient and weighted path length and demonstrate their applicability to quantifying

a weighted small-world measure for the human brain network and other real-world networks. The literature typically overcomes problems encountered with weighted networks through thresholding. This methodology is not optimal because thresholding can eliminate important connectivity information. Also, the resulting network is biased by the arbitrary selection of a threshold. There does not exist a systematic way of choosing an optimal threshold, and therefore studies have usually evaluated the unweighted small-world measure of a network across multiple thresholds. This leads to high computational times, particularly where multiple-subject data are available. In large part, these problems with thresholding would be avoided using the weighted graph measures we propose in this chapter, but we also offer an alternative to thresholding through the introduction of a systematic method for transforming the weighted network to a binary network whose small-world characteristics are maximized. Both the weighted graph measures and the weighted-to-binary network transformation algorithm are applied to well-known networks and to a biological network obtained from ERN EEG data, with the intent of demonstrating that cognitive control is associated with increased functional connectivity in the brain and therefore promotes an increase in small-world structure.

2.2 Clustering Coefficient

The clustering coefficient of an unweighted network, a metric necessary for computing the small-world index of a network, is the fraction of triangles around a node and is equivalent to the fraction of the node's neighbors that are neighbors of each other. In this section, the conventional measure of clustering coefficient is described along with a discussion of its fundamental weaknesses. This background is followed by proposed measures for addressing these weaknesses.
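The triangle-fraction definition just described can be computed directly from the adjacency matrix, using the fact that the i-th diagonal entry of A³ counts closed 3-walks from node i (twice the number of triangles through i). A sketch with our own function name:

```python
import numpy as np

def clustering_coefficient(A):
    """Per-node clustering: closed triplets at i divided by the
    k_i*(k_i-1)/2 possible links among i's neighbors, for a binary
    symmetric adjacency matrix A with zero diagonal."""
    k = A.sum(axis=1)
    triangles = np.diag(A @ A @ A) / 2     # closed triplets centered at i
    pairs = k * (k - 1) / 2                # neighbor pairs that could link
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(pairs > 0, triangles / pairs, 0.0)

# Complete graph K4: every neighborhood is fully linked, so C(i) = 1.
K4 = np.ones((4, 4)) - np.eye(4)
print(clustering_coefficient(K4))         # [1. 1. 1. 1.]
# A 3-node path has no triangles at all: C(i) = 0 everywhere.
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(clustering_coefficient(path))       # [0. 0. 0.]
```

Averaging the returned vector gives the network-level clustering coefficient used in the small-world index.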

2.2.1 Conventional Clustering Coefficient

The clustering coefficient of a node v_i quantifies its ability to form a complete graph among its neighbors, where all nodes are connected to one another [31]. For a binary graph, the clustering coefficient is equivalent to the fraction of the node's neighbors that are also neighbors of each other and is defined as

C(i) = \frac{t_i}{\frac{1}{2} k_i (k_i - 1)} \quad (2.1)

where t_i is the number of closed triplets centered at v_i and k_i is the degree of v_i. A closed triplet centered at v_i is defined as a set of three nodes, including v_i, which are linked by three edges, thus forming a triangle. An open triplet, on the other hand, is composed of three nodes but with only two edges between the nodes. The conventional clustering coefficient is in the range [0,1], where C(i) = 0 corresponds to a node v_i with degree k_i = 0, k_i = 1, or one where none of the neighbors share a common edge, and C(i) = 1 if all of v_i's neighbors are connected to each other. The clustering coefficient is important for determining the extent to which nodes group into communities. This clustering information is summarized for a particular network simply by computing the average clustering coefficient C = \frac{1}{N} \sum_{i=1}^{N} C(i), where N is the number of nodes in the network. A fundamental weakness of the conventional clustering coefficient is that it is originally defined for binary networks, i.e. it is independent of edge weights. It is well known that network characterization is not limited to topology but extends to the dynamic flow of information described by the edge weights. For example, regions with homogeneous activity within the network could be important for understanding the functionality of the network. The amount of information flow characterizing the connections is fundamental for a complete description and characterization of networks [67]. In this chapter, a new method is proposed for computing the clustering coefficient in

weighted networks by treating edge weights as probabilities. The clustering coefficient is defined as the probability of a node participating in a closed triplet with respect to all open and closed triplets for which it is a member. Furthermore, we argue that the conventional measure of clustering coefficient does not completely characterize the importance of a node in binary networks. This is because it does not fully incorporate the degree of a node, which is an indicator of network centrality. We end this section on clustering coefficient by also proposing a similar clustering coefficient for binary networks that improves the identification of central nodes in the network.

2.2.2 Existing Definitions for Weighted Clustering Coefficient

The literature presents various attempts at developing or implementing a clustering coefficient measure designed for weighted networks [68, 69, 66, 70]. Weighted networks contain information pertaining to the dynamic flow of information between node pairs, and therefore a clustering coefficient should reflect the information flow through closed triplets. Barrat et al. [71] were the first to propose such a weighted measure as

C_1^w(i) = \frac{1}{k_i^w (k_i - 1)} \sum_{j,h \neq i} \frac{w_{ij} + w_{ih}}{2}\, a_{ij} a_{jh} a_{ih} \quad (2.2)

where a_{ij} = 1 if there is an edge between nodes v_i and v_j and 0 otherwise, and k_i^w = \sum_{j=1}^{N} w_{ij} is the strength of node v_i. This measure quantifies the local cohesiveness within a closed triplet centered at v_i. However, this measure considers only the average weight of the edges connected to v_i within a triplet and neglects to consider all edges that form the triplet, i.e. it does not take into account the weight w_{jh}. Ignoring the weight w_{jh} will result in clustering coefficients that do not capture the total connectivity within the closed triplet. Another closely related and widely used weighted clustering measure, defined by Onnela et al. [72] and previously implemented in applications to functional brain network analysis [7, 8, 73],

addresses this issue and takes into account all of the edges that form the triplet as

C^w_2(i) = 1 / [k_i(k_i - 1)] Σ_{j,h ≠ i} (w_ij w_ih w_jh)^{1/3}    (2.3)

This measure normalizes the geometric mean of the edges that form the triplet by the total possible number of connections for node v_i. However, Onnela's measure requires information about the underlying binary network, i.e., k_i. Therefore, if the network is fully connected and weighted, thresholding is required to obtain a binary adjacency matrix. If the network is not thresholded, every node is assigned an equal degree value. This results in a constant integer in the denominator and causes all nodes, regardless of their participation in the network, to be normalized equally, thus obscuring the uniqueness of each node's connection strength. In the event that a sparse binary representation accompanies the weights, the degree has a strong influence on the clustering coefficient, obscuring the information provided by the weights. The binary degree k_i clearly introduces biases into the clustering coefficient computation that should be strongly avoided.

Zhang et al. proposed a weighted clustering coefficient definition to address this issue [74], inspired by the definition in [75], which computes the clustering coefficient using only the weighted edges:

C^w_3(i) = [Σ_{j,h ≠ i} w_ij w_jh w_hi] / [Σ_{j,h ≠ i, j ≠ h} w_ij w_ih]

The numerator computes the total triplet intensity for a particular reference node, and the denominator normalizes the clustering coefficient to values between 0 and 1 using the underlying open triplets centered at node v_i. Although Zhang's method presents an alternative to Onnela's requirement to use the binary node degree, the method is prone to false identification of triplets [66]. A false positive identification occurs when at least one of the edges within the closed triplet has a significantly lower weight than the other edges and the clustering coefficient is still large. This false identification indicates that a "strongly" linked closed triplet is identified as equivalently "strong" as an open triplet.

2.2.3 Newly Proposed Weighted Clustering Coefficient

There is a need both to decrease the number of false positives and to obtain a measure that is independent of the node degree. This motivates extending Onnela's and Zhang's clustering coefficients by incorporating the total triplet (closed and open) intensities in the denominator to provide a complete characterization of a node's connectivity strength through

C^w_4(i) = [Σ_{j,h ≠ i} w_ij w_jh w_hi] / [Σ_{j,h ≠ i} w_ih w_hj + Σ_{j,h ≠ i} w_ij w_jh w_hi]    (2.4)

which treats the pairwise weights as probabilities. In this definition, w_ij is treated as the probability of information flowing through a direct path between nodes v_i and v_j without the use of intermediate nodes. The clustering coefficient is computed as the ratio of the total probability of forming closed triplets to the total probability of all possible triplets with respect to node v_i. This approach guarantees a normalized clustering coefficient between 0 and 1 using all of the edge weights. Our interpretation of an open triplet differs from Zhang's, where an open triplet of v_i is defined as one in which v_i is the center node. Our measure instead defines an open triplet of v_i as one in which v_i is the starting point (or end point) of a connected set of three nodes. Unlike Zhang's definition, our definition of an open triplet incorporates the weighted edge that is not incident upon v_i. This provides a more complete representation of the node's ability to integrate itself with the rest of

the network, since it takes into account all possible paths originating from v_i to all other nodes in the triplet. In this way, the likelihood of a false positive is reduced, i.e., the proposed measure quantifies connectivity intensities primarily unique to "strong" closed triplets.

Table 2.1 compares the clustering coefficients computed using Barrat's, Onnela's, Zhang's, and the proposed measure for different variations of a triplet. The dotted edges represent "weak" links and the solid edges represent "strong" links; the solid node represents the reference node. One hundred triangles were simulated for each of the four configurations, where weights for the "weak" links were selected from a uniform distribution between 0 and 0.2 and "strong" links were assigned weights chosen from a uniform distribution between 0.8 and 1. From the table, the proposed measure is shown to reduce the number of false positives compared to Zhang's measure, is a more complete representation of clustering than Barrat's measure since it incorporates the third edge in the calculation, and provides a significantly larger clustering coefficient value for the strongest closed triplet than Onnela's and Barrat's since the binary nodal degree is not a factor.

Table 2.1: Comparisons between the proposed and existing weighted clustering coefficients for different variations of a triplet.
C_B        0.05 ± ± ± ± 0.03
C_O        0.05 ± ± ± ± 0.03
C_Z        0.10 ± ± ± ± 0.06
C_proposed 0.10 ± ± ± ±
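The behavior summarized in Table 2.1 can be checked directly from the definitions above. The following NumPy sketch (function names are illustrative, not from the dissertation) implements Eqs. (2.2)-(2.4) and Zhang's measure, and reproduces the false-positive case in which a triangle with one weak edge still receives a large value from Zhang's measure but not from the proposed one:

```python
import numpy as np

def barrat_cc(W, i):
    """Barrat et al., Eq. (2.2): mean weight of v_i's edges inside triangles."""
    A = (W > 0).astype(float)
    k, s = A[i].sum(), W[i].sum()
    if k < 2:
        return 0.0
    n = len(W)
    total = sum(0.5 * (W[i, j] + W[i, h]) * A[i, j] * A[j, h] * A[i, h]
                for j in range(n) for h in range(n) if i != j and j != h and h != i)
    return total / (s * (k - 1))

def onnela_cc(W, i):
    """Onnela et al., Eq. (2.3): geometric mean of triangle edge weights."""
    k = (W[i] > 0).sum()
    if k < 2:
        return 0.0
    n = len(W)
    total = sum((W[i, j] * W[i, h] * W[j, h]) ** (1.0 / 3.0)
                for j in range(n) for h in range(n) if i != j and j != h and h != i)
    return total / (k * (k - 1))

def zhang_cc(W, i):
    """Zhang: closed-triplet intensity over open triplets centered at v_i."""
    n = len(W)
    num = sum(W[i, j] * W[j, h] * W[h, i]
              for j in range(n) for h in range(n) if i != j and j != h and h != i)
    den = sum(W[i, j] * W[i, h]
              for j in range(n) for h in range(n) if i != j and j != h and h != i)
    return num / den if den > 0 else 0.0

def proposed_cc(W, i):
    """Proposed, Eq. (2.4): closed / (open + closed); v_i is an endpoint of open triplets."""
    n = len(W)
    closed = sum(W[i, j] * W[j, h] * W[h, i]
                 for j in range(n) for h in range(n) if i != j and j != h and h != i)
    opened = sum(W[i, h] * W[h, j]
                 for j in range(n) for h in range(n) if i != j and j != h and h != i)
    return closed / (opened + closed) if opened + closed > 0 else 0.0

# Triangle with one weak edge incident to reference node 0 (Zhang's false-positive case)
W_fp = np.array([[0.00, 0.90, 0.05],
                 [0.90, 0.00, 0.90],
                 [0.05, 0.90, 0.00]])
# Triangle with three uniformly strong edges
W_strong = np.array([[0.0, 0.9, 0.9],
                     [0.9, 0.0, 0.9],
                     [0.9, 0.9, 0.0]])
```

On W_fp, Zhang's measure remains near 0.9 while the proposed measure drops below 0.1; on W_strong, Barrat's measure saturates at 1 (on an isolated triangle it reduces to the binary coefficient), whereas the proposed measure yields w/(1 + w) ≈ 0.47.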

2.2.4 A Random Walk Based Clustering Coefficient Measure for Binary Networks

As demonstrated by the proposed weighted clustering coefficient, the connectivity of a network is best represented when the edges are weighted. Unfortunately, all edges in a binary graph are equally weighted, and therefore all triplets in the network are considered equal. This property makes the conventional clustering coefficient less discriminative than the weighted clustering coefficient. This shortcoming can be demonstrated with a simple example: if nodes v_i and v_j are linked to 20 and 5 neighboring nodes, respectively, and all their neighbors share edges with each other, then C(i) = 1 and C(j) = 1. However, it would be accurate to describe v_i and its local connections as more modular than those of v_j. If these nodes were part of a real-world network, node v_i could be characterized as an anomaly, since it is less probable for there to naturally exist (20 choose 2) = 190 edges shared among the neighbors of v_i than (5 choose 2) = 10 edges among the neighbors of v_j. Therefore, the clustering coefficient can be made more discriminative for the purpose of identifying and ranking submodules of a binary network. In this section, a more discriminative clustering coefficient, based on the proposed weighted clustering coefficient, is described.

A challenge in applying the proposed weighted clustering coefficient to a binary network is that there are no distinct weights associated with the edges. To overcome this, each edge can be assigned a weight equal to its probability of being selected by a random walk. The probability of selecting edge e_ij is 1/k_i, i.e., the reciprocal of node v_i's degree [76]. Since the direction of a random walk depends only on the previously visited node, the binary graph can be modeled as a first-order Markov chain [77]. Transitions between all node pairs, i.e., states of the Markov chain, can be represented by a transition matrix P, where

P(i, j) = 1/k_i if e_ij = 1, and P(i, j) = 0 otherwise    (2.5)

where k_i is the degree of node v_i. The nth power of the transition matrix, P^n, determines the probability of a random walk between nodes, i.e., P^n(i, j) is the probability of moving from v_i to v_j in exactly n state changes. Therefore, the total probabilities of v_i participating in an open triplet and in a closed triplet are computed as

p(v_i^open) = Σ_{j ≠ i} [P^2]_ij    (2.6)

p(v_i^closed) = [P^3]_ii    (2.7)

and a modified clustering coefficient can be defined as

C(i) = p(v_i^closed) / [p(v_i^open) + p(v_i^closed)]    (2.8)

This clustering coefficient for binary graphs is more discriminative than the conventional measure since weights are assigned to the edges. Moreover, the proposed clustering coefficient still lies between 0 and 1.

2.3 Characteristic Path Length

Path length, the second metric necessary for computing the small-world index of a network, is computed as the average number of edges contained in the shortest paths between node pairs [31]. In this section, the conventional measure of path length is described along with a discussion

of its fundamental weaknesses. This background is followed by proposed measures for addressing these weaknesses.

2.3.1 Conventional Characteristic Path Length

While the clustering coefficient describes how nodes group together, the characteristic path length quantifies the extent to which individual nodes are segregated from each other. The characteristic path length L_i of a node v_i is computed using the distance matrix D of the graph, which contains the shortest pairwise distances, such that

L_i = (1/N) Σ_{j=1}^{N} D_ij    (2.9)

The average network path length is defined as L = (1/N) Σ_{i=1}^{N} L_i.

Unlike for the clustering coefficient, the literature is much more limited in offering alternative methods for quantifying path length. The primary interest in the literature has been developing search algorithms that discover shortest paths in networks with negative and positive weights in minimal computational time [78], or computing shortest paths for networks containing negative cycles [79]. With particular regard to small-world analysis, however, there are limited options for computing the path length of weighted and binary networks. The conventional path length is computed from a shortest-path cost function. Small-world analyses reported in the literature [7, 80] typically focus on one type of shortest-path cost function: the minimum weighted sum (or minimum number of edges in a binary graph) between a node pair. This definition may not be robust for a variety of reasons: 1) there may exist multiple shortest paths; 2) the shortest path may not always be used in information sharing across real-world networks; 3) a path with the minimum weighted sum may not always be the path with maximum probability. In graph theory, there are many other possible cost functions for computing the minimum distance between a node

pair [81]. Therefore, a new path length measure can be proposed using a new and more robust cost function for weighted and binary graphs.

2.3.2 Existing Definition for Weighted Characteristic Path Length

To retain the functional information introduced by edge weights, the path length measure should be extended to weighted graphs. One such method is proposed by Stam et al. [7], where the weighted path length is defined as

L(i) = 1/(N - 1) Σ_{j ≠ i} 1/w_ij    (2.10)

This measure suggests that the stronger the relationship between a node pair, the shorter the weighted path length should be. This definition takes the strength of the edge connecting two nodes as the interaction intensity between them, i.e., it considers only the strength of the single-edge path. However, two nodes in a complex network, particularly the brain, can interact through multi-edge paths whose connection strengths may be larger than the weight of the single-edge path. The above method therefore fails to capture the true flow of information throughout the network. For example, if nodes v_i and v_j do not share an edge, they are assigned a pairwise weight of w_ij = 0. This implies that a direct path between the two nodes does not exist. However, it does not imply that v_i and v_j cannot exchange information, even if only indirectly. The weighted path length measure above does not consider the possibility of any information flow between these two nodes.

2.3.3 Newly Proposed Weighted Characteristic Path Length

Based on the disadvantages of the existing weighted path length measure, it is appropriate to define a new measure that considers the path length between two nodes to be equivalent

to the maximum possible connection strength between them. Among all possible connections between nodes v_i and v_j, the shortest path is now defined as the one with the highest probability of connection, which corresponds to the maximum product of edge weights along a path linking the node pair. Such a method is in line with the fundamental basis of the conventional path length measure. First, the connectivity strength, or the amount of information flow, between two nodes is defined as the product of the edge weights along the path P_ij linking the node pair, where P_ij = {w_{i,q_1}, w_{q_1,q_2}, ..., w_{q_{m-1},q_m}} is the set of weighted edges in the path and Q = {q_1, q_2, ..., q_m} is the sequence of nodes along the path P_ij. An efficient search algorithm, based on a modification of the Floyd-Warshall algorithm [82], is applied to the graph. The algorithm identifies a path that maximizes the cost function

F_ij = w_{i,q_1} w_{q_m,j} Π_{l=2}^{m} w_{q_{l-1},q_l}

The proposed weighted path length is then defined as

L(i) = 1/(N - 1) Σ_{j ≠ i} 1/F_ij    (2.11)

The weighted path length of node v_i is thus based on the average probability of forming a path from v_i to every other node in the network. The advantages of this proposed measure are that it is based on the connection weights and considers all paths between nodes. The only stipulation on the evaluated connectivity matrix is that its entries must lie in the range [0, 1], so that as the number of edges in a path increases, the probability of linking a node pair through that path decreases. In some applications, the edges of a graph may be associated with a probability of reliability [83]. For situations where this information is unavailable, normalizing the weights of the connectivity matrix by the maximum connectivity weight satisfies this requirement.
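The maximum-reliability search and Eq. (2.11) can be sketched with a Floyd-Warshall-style recursion over the product cost. This is a hypothetical minimal implementation (names are illustrative; the dissertation's modified Floyd-Warshall may differ in detail):

```python
import numpy as np

def max_product_paths(W):
    """All-pairs maximum edge-weight products (path 'reliabilities').

    W is symmetric with entries in [0, 1]; W[i, j] = 0 means no direct edge.
    Floyd-Warshall applies because maximizing products of weights <= 1 is a
    shortest-path problem in the (max, *) semiring.
    """
    F = W.astype(float).copy()
    np.fill_diagonal(F, 1.0)
    for k in range(len(F)):
        # Allow paths routed through intermediate node k.
        F = np.maximum(F, np.outer(F[:, k], F[k, :]))
    return F

def weighted_path_length(W):
    """Proposed weighted path length, Eq. (2.11): mean of 1/F_ij over j != i."""
    F = max_product_paths(W)
    n = len(F)
    return np.array([(1.0 / F[i, np.arange(n) != i]).mean() for i in range(n)])

# Chain where the two-edge route 0-1-2 (0.9 * 0.9 = 0.81) beats the direct edge 0-2 (0.1)
W = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.9],
              [0.1, 0.9, 0.0]])
F = max_product_paths(W)
L = weighted_path_length(W)
```

Because all weights lie in [0, 1], F_ij <= 1 and hence L(i) >= 1, matching the lower bound of the conventional binary measure.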

2.3.4 A Random Walk Based Characteristic Path Length Measure for Binary Networks

The conventional measure of shortest distance is defined as the minimum total sum of weights along a path. Practical applications, such as the optimal processing of parallel algorithms in a multiprocessor system [84], adopt this cost function for computing shortest paths in order to identify the best choice for facilitating information flow in a network. Yu et al. [85] argue that a single cost function, such as the one implemented for small-world analysis, is not guaranteed to reveal the most robust connections in a network. Consider the examples in Figure 2.1, which show variations in connectivity among three nodes. In both Figures 2.1a and 2.1b, nodes A and B have a shortest path of 1; however, the robustness of their connections differs in each case. The arrangement in 2.1a shows a more robust connection between A and B, whereas the single edge linking them in 2.1b may easily be disrupted by a random attack. Therefore, the connectivity in 2.1a is more robust than in 2.1b, although the value of the shortest path is the same in both cases. Further evaluation of Figure 2.1b by the conventional path length would indicate that the connection between A and B, with a shortest path of 1, is more optimal than the connection between A and C, with a shortest path length of 2. However, it is clear that the connectivity between A and C is more robust than that between A and B. This simple example illustrates the importance of robustness for real-world networks, where perturbations are likely to occur [86, 87, 88]. Therefore, a path length computed from a different cost function than the one used in conventional small-world analysis should convey the robustness of a network.

Figure 2.1: Nodes A and B have similar path length; however, the connectivity in (b) is less robust than in (a). In (b), the connectivity between nodes A and C is more robust than that between nodes A and B, despite A and C having a longer path length than A and B.

The new cost function is based on the N × N Markov transition matrix introduced in Section 2.2.4. The matrix is characterized as absorbing if it contains at least one node that behaves as a sink [77], i.e., the probability of a directed random walk along a path from the sink to all other nodes is zero. All other nodes are considered transient, since a random walk may traverse between them in any sequence, i.e., the probabilities of traversal along these bi-directed paths are non-zero. The matrix P representing the transition matrix of an absorbing Markov chain is defined as

P = [ Q  R ]
    [ 0  1 ]    (2.12)

where Q is the (N-1) × (N-1) matrix containing the transition probabilities among the transient nodes and R is the (N-1) × 1 vector containing the transition probabilities from the transient nodes to the absorbing node. The expected number of states needed to reach the absorbing state from any transient state is given by M = I + Q + Q^2 + ..., where M is the fundamental matrix of the absorbing Markov chain and can be shown to equal M = (I - Q)^{-1} [89]. The following theorem is then used to

determine the expected number of state transitions necessary for reaching the absorbing node.

Theorem 1. Let t_ij be the expected number of steps before the Markov chain is absorbed at node v_i, given that the chain starts at node v_j. Also, let T be the (N-1) × 1 vector whose jth entry is t_ij. Then

T = Mc    (2.13)

where c is the (N-1) × 1 ones vector and M is defined above [90].

By transforming a binary graph into an absorbing Markov chain model, path length can be computed as the expected value of the distance between any node pair. Since the graphs of interest in this study are undirected and connected, they can be characterized as ergodic. Therefore, to compute path length, these ergodic networks are converted into absorbing networks by first selecting any node to be the absorbing state. If v_i is the selected node, then P is modified such that the entire ith row is set to zero with the exception of P_ii, which is set to 1. This designates v_i as the absorbing node. If the initial state begins at node v_j, the average distance between nodes v_j and v_i is the average number of state changes expected before being absorbed by v_i. T is then computed for each node by iteratively designating each node as the new and only absorbing state. Averaging T for each absorbing node yields the path length of that node. Algorithm 1 describes the process of computing path length. In the algorithm, K is the diagonal degree matrix of a graph G, where K_ii = k_i (and 0 otherwise), and I is the identity matrix.

Algorithm 1 Markov Chain Path Length for Binary Graphs
Input: N × N binary graph G = (V, E) with adjacency matrix A.
Output: path length vector L.
1: P = K^{-1} A  {transition matrix of graph G}
2: z = N - 1
3: for i = 1 to |V| do
4:   Q = P with the row and column of v_i removed
5:   M = (I_{z×z} - Q)^{-1}
6:   c = z × 1 ones vector
7:   T = Mc
8:   L(i) = (1/z) Σ_j T_j
9: end for
10: return L

This process identifies the expected length of paths traversed frequently between pairs of nodes. The expected length of a path is not necessarily the shortest, but rather the most probable and robust. Therefore, the proposed path length measure L is an improved extension of the conventional measure implemented in small-world analysis. Since the new path length is based on the evaluation of multiple alternative paths, it is also a more complete representation of network connectivity.

2.4 Small-World Measure of a Network

In this section, the conventional measure of small-world organization implemented throughout the literature [27, 7, 91] is discussed. Since applications of the small-world measure have primarily involved binary graphs, an extension to weighted graphs is presented.

2.4.1 Conventional Small-World Measure

Watts et al. [31] computed the small-world value of a network using the clustering coefficient and path length. First, the clustering coefficient and path length are normalized with respect to the clustering coefficient and path length of a random network. The normalized values are denoted γ and λ, respectively. A network with the small-world property is expected to demonstrate γ ≫ 1 and λ ≈ 1. The small-world value is computed as σ = γ/λ, where σ ≫ 1 is indicative of a small-world topology. Although the small-world measure is primarily defined for binary networks, a weighted small-world measure has been discussed by Latora et al. [80, 92]; however, they do not describe a formal definition that utilizes a weighted clustering coefficient and weighted path length. Here, a weighted extension to the small-world

measure is presented using the proposed weighted clustering coefficient and weighted path length defined in Sections 2.2 and 2.3.

2.4.2 Newly Proposed Weighted Small-World Measure

The proposed weighted clustering coefficient and path length are used to define a weighted small-world measure. Normalization of the two parameters is performed in a manner similar to that described by Stam et al. [7]. The normalized weighted clustering coefficient and normalized weighted path length are computed by dividing the network's clustering coefficient and path length by the corresponding average measures, C^w_rand(i) and L^w_rand(i), of an ensemble of surrogate graphs. The surrogate graphs are generated by randomly permuting the weighted edges of the original connectivity matrices. With the normalized weighted clustering coefficient and path length defined as C̃^w(i) = C^w(i)/C^w_rand(i) and L̃^w(i) = L^w(i)/L^w_rand(i), we extend the conventional small-world definition to evaluate the local weighted small-world index at each node of the weighted network as

σ^w(i) = C̃^w(i) / L̃^w(i)

A weighted network is characterized as small-world if σ^w(i) ≫ 1.

2.5 Mapping of Weighted Graphs to Binary Graphs

In this section, fundamental issues associated with weighted-to-binary mappings of networks are discussed. Since the edges of real-world networks are typically associated with non-binary weights, there exist common approaches for obtaining a binary transformation. The weighted measures presented in Sections 2.2 and 2.3 circumvent this problem; however, for cases when a binary representation is absolutely necessary, an alternative transformation algorithm is offered.
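Before turning to binary mappings, the weighted small-world computation of Section 2.4.2 can be sketched end to end. The snippet below is a simplified stand-in, not the dissertation's implementation: clustering uses the proposed closed/(open + closed) ratio of Eq. (2.4), path length uses the maximum-product reliability of Section 2.3.3, and surrogates are obtained by permuting the upper-triangular weights:

```python
import numpy as np

rng = np.random.default_rng(7)

def clustering_w(W):
    """Proposed weighted clustering coefficient, Eq. (2.4), per node."""
    n = len(W)
    C = np.zeros(n)
    for i in range(n):
        closed = opened = 0.0
        for j in range(n):
            for h in range(n):
                if i != j and j != h and h != i:
                    closed += W[i, j] * W[j, h] * W[h, i]
                    opened += W[i, h] * W[h, j]
        C[i] = closed / (opened + closed) if opened + closed > 0 else 0.0
    return C

def path_length_w(W):
    """Proposed weighted path length, Eq. (2.11), via max-product Floyd-Warshall."""
    F = W.astype(float).copy()
    np.fill_diagonal(F, 1.0)
    n = len(F)
    for k in range(n):
        F = np.maximum(F, np.outer(F[:, k], F[k, :]))
    return np.array([(1.0 / F[i, np.arange(n) != i]).mean() for i in range(n)])

def surrogate(W):
    """Randomly permute the upper-triangular weights, preserving symmetry."""
    iu = np.triu_indices(len(W), 1)
    S = np.zeros_like(W)
    S[iu] = rng.permutation(W[iu])
    return S + S.T

def small_world_w(W, n_surr=5):
    """Local weighted small-world index sigma_w(i) with surrogate normalization."""
    C, L = clustering_w(W), path_length_w(W)
    C_rand = np.mean([clustering_w(surrogate(W)) for _ in range(n_surr)], axis=0)
    L_rand = np.mean([path_length_w(surrogate(W)) for _ in range(n_surr)], axis=0)
    return (C / C_rand) / (L / L_rand)

# Example: a random fully connected weighted network with entries in (0.05, 0.95)
n = 10
U = rng.uniform(0.05, 0.95, (n, n))
W = np.triu(U, 1) + np.triu(U, 1).T
sigma = small_world_w(W)
```

For a fully connected matrix of phase-synchrony values, σ^w(i) ≫ 1 at a node would indicate locally small-world organization; the surrogate count is kept small here only for speed.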

2.5.1 Existing Approaches for Transforming Weighted Graphs to Binary Graphs

Most current graph theoretic measures are defined primarily for binary adjacency matrices [27, 93, 94]. There is little research on weighted adjacency matrices, and this poses a problem for applying the conventional measures to weighted adjacency matrices and for performing comparisons between weighted graphs. A review of the thresholding techniques currently implemented in weighted graph studies is presented by Wijk et al. [95]. There are two primary methods for transforming a fully connected weighted graph into a sparse binary network: 1) choosing an arbitrary threshold τ, for which all entries of the weighted matrix greater than or equal to τ are mapped to 1 and those below are mapped to 0; 2) applying a range of thresholds [τ_1, τ_2] to the weighted matrices and obtaining a collection of binary matrices, one for each threshold. These two approaches are either purely arbitrary or computationally exhaustive. Furthermore, comparisons across different networks are unreliable, since τ can cause variations in sparsity across networks. Choosing a threshold that maintains connectivity without sacrificing sparsity also continues to be a challenge. The method proposed in this section offers a third option: an optimal weighted-to-binary transformation. The advantage of an optimal transformation cost function is that it reduces the dependence of graph measures on τ and instead utilizes the strength of connectivity among node pairs.

2.5.2 Weighted to Binary Graph Transformation: Alternative to Matrix Thresholding

To address the connectivity issue presented by thresholding, it is proposed that a backbone of the network first be identified. In computer networking, a backbone is a larger transmission

line that carries data gathered from the smaller lines that interconnect with it [96]. We extend this terminology to describe the underlying primary structure of a graph. The backbone ensures that all nodes are reachable from any starting node. The question is how to choose the best backbone of the network. In graph theory, a spanning tree is a subgraph that contains all nodes in the network, has exactly n - 1 edges, and in which each node is reachable from every other node. There are no cycles in a spanning tree, i.e., removing any edge would disconnect the tree. In a complete graph with n nodes, there are n^{n-2} possible spanning trees [18]. There exists a spanning tree whose total cost, defined as the sum of all weights in the tree, is minimal with respect to all possible spanning trees of the weighted graph; this minimum-cost spanning tree is called a minimum spanning tree (MST). In the proposed work, we use the MST as the backbone of the new binary transformation of the weighted graph. To extract the MST corresponding to a given graph, we use Prim's algorithm [18], which is capable of identifying the MST of any graph with negative and positive weights. If W_G is the weighted connectivity matrix of a graph G, then W_H = 1 - W_G defines the transformation from G to H. In this manner, the highest weights are transformed into the lowest weights. Applying Prim's algorithm to H produces a minimum spanning tree containing the edges of the node pairs with the highest weights in G.

After identifying the backbone, i.e., the tree, of the new graph, the tree can be expanded into a cyclic graph by adding edges based on a user-defined cost function. One such cost function in the realm of real-world network analysis, for example, identifies a degree distribution that follows a power-law model. Another cost function is the small-world parameter, defined by the clustering coefficient and path length. We propose a cost function that maximizes the clustering coefficient and minimizes the path length by forming cycles in the minimum spanning

tree using nodes separated by at most λ edges in the tree, where 1 ≤ λ ≤ n. λ can be incremented as long as the bounds set by the cost functions are obeyed. Since the focus of this thesis is analyzing the connectivity of real-world networks, a cost function that bounds the degree distribution according to a power-law model and a cost function that facilitates small-world behavior are implemented. Algorithm 2 describes the process for generating a binary graph from a fully connected weighted graph. The function T = MST(G) identifies the binary adjacency matrix representing the minimum spanning tree of graph G, and D_T = dist(T) is the pairwise distance matrix of the minimum spanning tree T. The variables [φ_1, φ_2] are the lower and upper bounds on the degree exponent α estimated for the degree distribution; as presented in Chapter 1, for a power-law model 2 ≤ α ≤ 3. Similarly, the clustering coefficient is bounded by the limits [γ_1, γ_2]. From previous studies [94, 50], the clustering coefficients of real-world networks have been observed to lie in the range 0.2 ≤ γ ≤ 0.9.

Algorithm 2 Optimal Transformation from a Weighted to a Binary Adjacency Matrix
Input: an n-node weighted graph G = (V, E) with connectivity matrix W_G, where e_ij ∈ E ⇒ w_ij ∈ W_G.
Output: optimal binary matrix A.
1: W_H ← 1 - W_G
2: T = MST(H)
3: A ← T
4: D_T = dist(T)
5: λ ← 2
6: while λ < n do
7:   A ← A ∪ {e_ij | D_T(i, j) ≤ λ}
8:   if φ_1 ≤ α ≤ φ_2 and γ_1 ≤ C ≤ γ_2 then
9:     λ ← λ + 1
10:  else
11:    return A
12:  end if
13: end while
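The backbone-plus-expansion idea of Algorithm 2 can be sketched in NumPy. This is a simplified illustration under stated assumptions, not the full procedure: Prim's algorithm runs on the complete distance matrix H = 1 - W, tree distances come from a Floyd-Warshall pass, and only the upper clustering bound γ_2 is enforced (the degree-exponent test is omitted):

```python
import numpy as np

def prim_mst(H):
    """Prim's algorithm on a complete distance matrix H; returns a binary MST adjacency."""
    n = len(H)
    A = np.zeros((n, n), dtype=int)
    visited = np.zeros(n, dtype=bool)
    visited[0] = True
    dist = H[0].astype(float).copy()
    parent = np.zeros(n, dtype=int)
    for _ in range(n - 1):
        j = int(np.argmin(np.where(visited, np.inf, dist)))  # nearest unvisited node
        A[j, parent[j]] = A[parent[j], j] = 1
        visited[j] = True
        better = (~visited) & (H[j] < dist)
        dist[better] = H[j][better]
        parent[better] = j
    return A

def hop_distances(A):
    """All-pairs hop counts on a binary adjacency matrix (Floyd-Warshall)."""
    D = np.where(A > 0, 1.0, np.inf)
    np.fill_diagonal(D, 0.0)
    for k in range(len(A)):
        D = np.minimum(D, np.add.outer(D[:, k], D[k, :]))
    return D

def binary_clustering_mean(A):
    """Average conventional clustering coefficient, Eq. (2.1)."""
    A = A.astype(float)
    k = A.sum(axis=1)
    t = np.diag(A @ A @ A) / 2.0                  # closed triplets centered at each node
    denom = k * (k - 1) / 2.0
    return np.divide(t, denom, out=np.zeros_like(t), where=denom > 0).mean()

def weighted_to_binary(W, gamma2=0.9):
    """MST backbone of H = 1 - W, then lambda-hop expansion under a clustering bound."""
    H = 1.0 - W
    np.fill_diagonal(H, 0.0)
    A = prim_mst(H)                               # backbone keeps the strongest edges of W
    D = hop_distances(A)                          # distances measured in the tree
    lam, n = 2, len(W)
    while lam < n:
        B = A.copy()
        B[(D > 1) & (D <= lam)] = 1               # close cycles within lam tree hops
        if binary_clustering_mean(B) > gamma2:
            break                                 # bound violated: keep previous A
        A, lam = B, lam + 1
    return A

rng = np.random.default_rng(3)
n = 8
U = rng.uniform(0.05, 0.95, (n, n))
W = np.triu(U, 1) + np.triu(U, 1).T
A = weighted_to_binary(W)
```

Because the MST backbone is always retained, the result is connected and sparse by construction, unlike an arbitrarily thresholded matrix.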

2.6 Results

In this section, the performance of the proposed graph theoretic measures is demonstrated on simulated data derived from benchmark software, on the popular Zachary karate social network, and on neural connectivity matrices derived from the ERN EEG study.

2.6.1 Random Walk Based Clustering Coefficient/Path Length Measures for Binary Benchmark Networks

The proposed Markov clustering coefficient and path length measures are first tested to determine whether they can distinguish between different network topologies, including unweighted random, scale-free (small-world), and lattice graphs. One hundred networks of each topology with order N = 50 were generated using the LFR benchmark generator [32], and the proposed graph measures were applied. The average clustering coefficient and average path length were computed across all nodes for all 100 networks. Figure 2.2 demonstrates C_random < C_small-world ≪ C_lattice and L_random ≈ L_small-world ≪ L_lattice using the Markov measures. These figures verify that the proposed measures are suitable for defining a modified small-world parameter. Similar results were observed for N = 100 and N = 250 and are not reported here.

Next, the conventional measures are compared to the Markov measures as applied to the unweighted karate network. Figures 2.3 and 2.4 demonstrate the differences between the measures. The proposed measure provides very different yet accurate information about the network's clustering properties compared to the conventional measure. For example, nodes 1, 33, and 34 are assigned the largest clustering coefficients, as expected, since they represent the three dojo leaders within the karate club. They were assigned significantly lower values by the conventional measure, whereas nodes 15 and 16, students not participating in leadership roles, were assigned larger

values by the conventional measure. This is because nodes 15 and 16 form all possible triplets among their neighbors. However, this obscures the importance of nodes 1, 33, and 34, which have more direct connections than nodes 15 and 16 (a problem previously discussed in Section 2.2). The proposed clustering coefficient metric is therefore better suited for evaluating which nodes promote centrality within the community structure of the network, i.e., which act as important central hubs. In this case, the dojo leaders of the club were accurately identified, since it is their responsibility to bring students together.

Similarly for path length, the proposed measure assigns the nodes corresponding to dojo leaders the lowest path lengths. When these nodes were chosen as absorption states in the Markov chain, very few state transitions were required to link them with all other nodes in the network. Therefore, nodes 1, 33, and 34 have very robust connections and are less likely to be disconnected from the network in the case of failures among node pairs. Node 12 possesses the largest path length, indicating that it is at the highest risk of being disconnected from the network. This student has no social ties with anyone in the class except the dojo leader. Therefore, if dojo leader 1 fails to provide adequate attention to student 12, the student may very well drop out of the club. The path length information obtained from the conventional measure is nearly uniform across nodes, does not accurately portray the differences in nodal robustness, and is therefore less discriminative.

2.6.2 Weighted Graph Measures on Benchmark Data

The performance of the proposed weighted graph measures was verified by applying them to the weighted version of Zachary's karate club and comparing the results to those obtained with existing weighted measures, in particular Onnela's weighted clustering coefficient and Stam's characteristic path length. The first analysis was performed using Onnela's and the proposed

61 weighted clustering coefficient indicates an increased clustering coefficient for the central nodes 1, 33, and 34 and decreased clustering coefficient for outer boundary nodes 15 and 16, who are only regular members of the club. The relationships maintained by the senior members are important for forming the two large communities in the karate club. This indication of network topology and the identification of the central nodes are not available through Onnela s measure. This difference between Onnela s and the proposed weighted clustering coefficients is reflected by a low correlation coefficient of 0.3. The weighted path length described by Stam et. al. and the proposed measure in this paper are next applied to the karate network. Based on Figure 2.6, the two measures are closely related for most nodes, indicated by a correlation coefficient of 0.7. However, there are some differences between the two measures. For example, based on the proposed measure node 17 has the largest path length, whereas for Stam s measure nodes 10, 12, 18 and 19 have the largest path lengths. The proposed measure yields the highest path length for node 17 since that node is not directly connected to any of the central nodes, i.e. nodes 1, 33 and 34, whereas nodes 10, 12, 18 and 19 are all directly connected to a central node. Since the proposed measure computes the path length by considering all possible connections between a particular node and the rest of the network, it is a better indicator of nodes that are more isolated from the rest of the network. In general, most nodes in the karate network were assigned increased path lengths since all shortest paths, which may be composed of more than one edge, were considered. Therefore, the proposed path length measure presents a much more complete representation of network connectivity. Also, the minimum of the proposed weighted path length measure is 1, similar to the conventional measure for binary networks. 
Unlike Stam's measure, this permits a weighted small-world measure to be computed according to the same principles that define the conventional small-world measure for binary networks.

Weighted Graph Measures on ERN EEG Data

Bivariate Phase-Synchrony Measures

The RID-TFPS measure was applied to trials from each electrode pair in the 62-channel montage, separately for each subject and condition (i.e. error and correct, where correct trials were randomly selected to match the number of error trials). The average phase synchrony corresponding to the theta frequency band (4-7 Hz) and the ERN time window (25-75 ms) was recorded into separate 62 x 62 connectivity matrices, one for each subject (90) and condition (2), producing 180 total connectivity matrices. Figure 2.7 illustrates the significant phase-synchrony differences between error and correct responses based on a Wilcoxon rank-sum test (p < 0.001). From this figure, it is clear that error trials produce stronger connections among a number of sites than correct trials, with a dense representation around FCz (consistent with mPFC). Importantly, the bivariate TFPS effects (error > correct) reported by Cavanagh et al. [21] are replicated in the current data (F5-FCz, t=2.26, p<.026; F6-FCz, t=3.84, p<.001). However, this plot makes it clear that there are many more significant electrode pairs, and that these two a priori pairs do not fully represent the complexity of the functional connectivity. Thus, although the distribution of pairwise error-correct phase-synchrony differences is useful for determining which electrodes are most phase-synchronous during the ERN, it does not serve to characterize the network behavior corresponding to cognitive control, or how such networks reorganize during error trials.

Graph Measures

To assess control-related network characteristics using the graph theoretic framework, the proposed weighted clustering coefficient and weighted path length measures were applied to all of the connectivity matrices. The weighted graph measures were evaluated for each electrode

across participants for the error and correct conditions. This resulted in 62 x 90 (electrode by participant) matrices for each condition and each measure: the clustering coefficient, the path length, and the overall small-world measure. Significant error-correct differences (Wilcoxon p < 0.01) for the proposed measures are presented in Figure 2.8. The clustering coefficient was significantly greater during error responses than correct responses in the frontal region, centered on site FCz, as hypothesized, suggesting increased community formation or functional segregation for error responses relative to correct. Results also indicate that path length was shorter during error responses relative to correct responses, supporting the idea that connectivity was increased among the observed network's sub-communities. Both measures are indicative of increased functional organization among the neuronal regions in the frontocentral region of the brain for error responses, consistent with involvement of the mPFC. Like the constituent clustering and path length parameters, the proposed weighted small-world measure evidenced significant error-correct differences in the same frontocentral regions. Together, these results indicate a significant increase in network efficiency and functional connectivity during error trials relative to correct.

We also compared the performance of the proposed measures with previously published weighted graph measures on the same ERN data set. Figure 2.8 shows the topographic distributions for the error-correct differences and the associated significance of these differences for both sets of measures. Both the existing and proposed measures show significant differences in the small-world index in the mPFC and lPFC areas, with the significance being higher for the proposed small-world index.
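The electrode-wise significance testing used throughout this section can be sketched as follows. The 62 x 90 electrode-by-participant matrices are simulated here, and scipy's `ranksums` stands in for the Wilcoxon rank-sum test; nothing below comes from the actual data set.

```python
# Sketch: per-electrode Wilcoxon rank-sum tests between conditions.
# error_cc / correct_cc play the role of the 62 x 90 (electrode x subject)
# clustering-coefficient matrices described above; both are simulated.
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(0)
n_elec, n_subj = 62, 90
correct_cc = rng.normal(0.5, 0.1, (n_elec, n_subj))
error_cc = correct_cc + rng.normal(0.05, 0.1, (n_elec, n_subj))  # shifted

pvals = np.array([ranksums(error_cc[e], correct_cc[e]).pvalue
                  for e in range(n_elec)])
significant = np.where(pvals < 0.01)[0]   # electrodes passing the threshold
```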
Similarly, the path length and clustering coefficients show significant differences in similar areas for both the proposed and existing measures, with the proposed measures yielding increased significance. These results indicate that the proposed weighted graph measures capture similar information about the network topology compared to the existing measures while

being more sensitive to the differences in network topology involved in error relative to correct responses.

Relationship between Graph and Bivariate Measures

Next, we assessed whether the significant graph theoretic measures accounted for the bivariate F5-FCz and F6-FCz relationships. This is important for verifying that these new results are directly relevant to the previously published effects of Cavanagh and colleagues [21]. First, correlations between the three graph theoretic measures and the two bivariate measures were computed (the small-world, clustering coefficient, and path length measures correlated with the FCz-F5 and FCz-F6 bivariate TFPS measures; presented in Table 2.2). For both the error and correct conditions independently, the correlations were high. This verified that a majority of the variance in the bivariate measures was indeed represented in the graph theoretic measures. The correlations for the error-correct difference were also significant, but were qualitatively smaller than those for the conditions independently. Thus, we next evaluated whether the significant bivariate error-correct differences for FCz-F5 and FCz-F6 could be explained by the graph theoretic measures. This was accomplished by taking the variance unique to the bivariate measures (i.e. the residual variance in each bivariate measure after removing variance shared with the overall small-world measure) and assessing it for remaining error-correct differences. This residual variance was not significantly different from zero (both one-sample ts < 1), verifying that the small-world measure accounted for the previously reported bivariate error-correct differences. Next, additional analyses were conducted to assess whether one of the constituent small-world measures (i.e. the clustering coefficient or the path length measure) was specifically or uniquely responsible for the relationship between the small-world and bivariate measures.
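The residual-variance check just described can be sketched as follows. All names and data are illustrative: per-subject small-world and bivariate TFPS values are simulated, both conditions are pooled into a single regression so that condition-specific residual means remain free to differ, and a paired t-test then probes for remaining error-correct differences. This pooling choice is an assumption about the analysis, not a detail taken from the study.

```python
# Sketch: residualize the bivariate measure on the small-world measure,
# then test whether an error-correct difference survives in the residuals.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 90                                              # subjects
sw_err = rng.normal(0.60, 0.05, n)                  # small-world, error
sw_corr = rng.normal(0.50, 0.05, n)                 # small-world, correct
biv_err = 0.8 * sw_err + rng.normal(0, 0.02, n)     # bivariate TFPS, error
biv_corr = 0.8 * sw_corr + rng.normal(0, 0.02, n)   # bivariate TFPS, correct

# Pool conditions and regress the bivariate measure on small-world.
sw = np.concatenate([sw_err, sw_corr])
biv = np.concatenate([biv_err, biv_corr])
X = np.column_stack([np.ones(2 * n), sw])           # intercept + predictor
beta, *_ = np.linalg.lstsq(X, biv, rcond=None)
resid = biv - X @ beta                              # variance unique to bivariate

# Paired t-test on the residual error-correct difference.
t_stat, p_val = stats.ttest_rel(resid[:n], resid[n:])
```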
To accomplish this, regressions were conducted using each of the significant bivariate error-correct differences in turn as the dependent variable, with the associated

clustering coefficient and path length as independent variables. Results indicated that the clustering coefficient carried some unique variance beyond what was shared with path length, for both FCz-F5 phase synchrony (R=.60, F(1,89)=23.94, p<.001; clustering coefficient, t(89)=3.72, p<.001; path length, t<1, ns) and FCz-F6 phase synchrony (R=.52, F(1,89)=16.19, p<.001; clustering coefficient, t=4.49, p<.001; path length, t(89)=1.92, p<.058).

Table 2.2: Correlations between the multivariate small-world and bivariate (FCz-F5/FCz-F6) measures for both error and correct responses individually and for the error-correct differences (all correlations significant; p < 0.001).

                 Small World        Clustering Coefficient   Path Length
                 F5-FCz   F6-FCz    F5-FCz   F6-FCz          F5-FCz   F6-FCz
Error
Correct
Error-Correct

Optimal Transformation From Weighted to Binary Matrix: ERN/EEG Data

The proposed method for transforming a weighted matrix into a sparse binary matrix was applied to the EEG ERN data. The 90 phase-synchrony matrices were averaged into a single matrix for the correct and error responses, and these two matrices were then input into the transformation algorithm. The graphs in Figures 2.9 and 2.10 represent the optimal binary transformations of the original weighted connectivity matrices. The red node indicates the location of the FCz electrode. The graph representing error responses contains dense connections among the frontal and frontocentral nodes centered at FCz. As hypothesized, stronger interactions are expected between these regions during the ERN. Modules of parietal and central nodes are also formed farther along the graph. The graph representing the correct responses similarly contains dense connections between

FCz and frontal and central nodes, but FCz is not as centrally located as it is in the error graph. For comparison, the small-world parameter for binary graphs was computed for binary matrices generated by thresholding. Binary graphs corresponding to a range of threshold values, 0 ≤ τ ≤ 1, sampled with a fixed step size, were formed. Figure 2.11 reports the small-world parameter with respect to network degree (which depends on the threshold) generated for different τ for both error and correct responses. Small-world values computed for disconnected binary graphs are not reported. The small-world measure was also computed for the optimal binary graphs generated for error and correct responses using the proposed method. From inspection of Figure 2.11, the small-world parameter computed via thresholding is larger for error than for correct responses for very sparse representations (graphs with average degree approximately between 13 and 20). The small-world parameter computed for the optimal binary graph transformations is larger for error as well. Furthermore, the average degrees of the optimal binary graphs are sparse (15 for error and 14 for correct). These results demonstrate that the optimal binary transformation is capable of identifying a binary graph structure with characteristics similar to those of the graph derived from the best threshold value. This approach saves computational time since numerous thresholds do not need to be inspected, eliminates the dependence on τ altogether, and maintains connectivity.

2.7 Conclusions

In this chapter, we introduced new graph theoretic measures, the clustering coefficient and path length, for binary and weighted graphs. The proposed measures offer improvements over existing small-world measures. By using graph measures that are directly applicable to weighted networks, the need for thresholding the elements of connectivity matrices is circumvented and the complete information about the strength of the connections between nodes is taken into account. The proposed measures also improve upon existing weighted graph measures by

considering all of the alternative ways information can flow in a network rather than only through a node's immediate neighbors. This improvement is accomplished by adopting a probabilistic framework in which the edge weights are treated as probabilities of connection. These weighted measures demonstrated the integration and segregation of members of the karate network. The proposed weighted measures were also used to assess the small-world properties of the ERN EEG data. Topographic results indicated that frontal areas showed increased clustering and decreased path length (i.e. increased small-worldness) in response to errors, relative to correct responses. This supports the hypothesis that the brain increases organization across the frontal brain regions, driven by the anterior cingulate cortex, during error processing.

The usefulness of these weighted measures inspired improvements to the conventional measures for binary graphs. The challenge was to associate distinct weights with edges across a binary network. This was addressed by modeling the adjacency matrix as a Markov chain. Edge weights were assigned based on the probability of a random walk moving between nodes, where the probability was computed as the reciprocal of the node's degree. Both the clustering coefficient and path length defined for binary graphs in this manner were shown to be more indicative of a node's hubness and an edge's robustness, as demonstrated through simulated binary networks.

Finally, in the case of the weighted-to-binary graph transformation, the need for arbitrary thresholding was circumvented through the introduction of an optimal weighted-to-binary transformation algorithm. The algorithm transforms a fully connected weighted graph into a sparse binary graph while simultaneously preserving connectivity, small-worldness, and a degree sequence whose distribution is modeled as a power law. This transformation algorithm was applied to the average phase-synchrony matrices from the ERN EEG study. Two different structures

representing the brain network were generated, revealing the increased presence of hub nodes in both the frontal and frontocentral regions during error responses.

Figure 2.2: Markov small-world measures averaged over 100 benchmark networks of order N = 50. (a) The proposed binary clustering coefficients of the simulated small-world networks are much greater than those of random networks, i.e. γ ≫ 1. (b) The proposed binary path lengths of the simulated small-world networks are similar to those of random networks, i.e. λ ≈ 1.

Figure 2.3: Markov (a) and conventional (b) clustering coefficients applied to the binary karate network, plotted as normalized clustering coefficients per node.

Figure 2.4: Markov (a) and conventional (b) path lengths applied to the binary karate network, plotted as normalized path lengths per node.

Figure 2.5: Nodal weighted clustering coefficients computed using the proposed clustering coefficient (a) and Onnela's clustering coefficient (b) for Zachary's karate club network.

Figure 2.6: Nodal weighted path lengths computed using the proposed path length (a) and Stam's path length (b) for Zachary's karate club network.

Figure 2.7: Connectivity plot based on the RID-TFPS measure for electrode pairs whose phase synchrony is significantly larger for error responses than for correct responses (Wilcoxon rank sum, p ≤ 0.001).

Figure 2.8: Topographic maps of the differences in the nodal weighted clustering coefficient, path length, and small-world parameter between error and correct responses. These differences are computed for both the previously published and the proposed measures; topographic significance plots are provided for each measure.

Figure 2.9: Optimal binary graph generated from the weighted connectivity matrices for error responses, with nodes placed at their electrode locations in the 62-channel montage. Hubs of the network are indicated in color and are identified based on the number of shortest paths in which each node participates. Green indicates the most important local hubs, blue indicates nodes globally important for connecting distant portions of the graph, and red is the FCz node (also an important local hub).

Figure 2.10: Optimal binary graph generated from the weighted connectivity matrices for correct responses. Hubs of the network are indicated in color and are identified based on the number of shortest paths in which each node participates. Green indicates the most important local hubs, blue indicates nodes globally important for connecting distant portions of the graph, and red is the FCz node (also an important local hub).

Figure 2.11: Comparison of small-world values computed using the conventional thresholding method and the proposed optimal binary transformation approach, plotted against network degree K for error and correct responses.
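The thresholding baseline compared in Figure 2.11 can be sketched as below. The phase-synchrony matrix is simulated, and for each connected binary graph only the average degree, clustering, and path length are recorded; normalizing against matched random graphs to obtain the small-world index is left out for brevity, so this is an illustrative outline rather than the study's procedure.

```python
# Sketch: sweep a threshold tau over a weighted connectivity matrix,
# binarize, skip disconnected graphs, and record basic statistics.
import numpy as np
import networkx as nx

rng = np.random.default_rng(2)
W = rng.random((62, 62))
W = (W + W.T) / 2                    # symmetric "phase synchrony" matrix
np.fill_diagonal(W, 0)

results = []
for tau in np.arange(0.05, 1.0, 0.05):
    A = (W > tau).astype(int)
    G = nx.from_numpy_array(A)
    if not nx.is_connected(G):       # disconnected graphs are not reported
        continue
    k = A.sum() / len(A)             # average degree at this threshold
    C = nx.average_clustering(G)
    L = nx.average_shortest_path_length(G)
    results.append((tau, k, C, L))
```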

Chapter 3

Local Feature Extraction from Weighted Graphs

3.1 Introduction

Chapter 2 demonstrated how complex networks may be modeled as graphs composed of vertices and edges and may be characterized as small-world based on graph theoretic measures such as the clustering coefficient and path length. The small-world measure, however, is as much a mathematical abstraction as the graph it describes [97]. The small-world measure and other existing graph measures, e.g. girth, diameter, and degree, fall short in providing a functional interpretation of the network familiar in research fields such as signal processing and information theory. This makes it difficult to interpret the physical significance of conventional graph measures when describing non-abstract systems such as biological networks. The goal of this chapter is to complement current graph measures with non-abstract measures possessing physical meanings supported by well-established theory from other fields of research. Furthermore, these new measures can serve to identify local discriminative features of the graph.

Feature extraction is widely explored and is an important tool in signal processing, machine learning, and pattern recognition [98, 99, 100]. Feature extraction usually maps high-dimensional input data onto a lower-dimensional feature space which reflects the inherent structure of the original data [101]. In unsupervised learning, these discovered features are the most representative of the original data and can be used to identify similarities and differences between data sets or to detect anomalies within the data. Adapting existing feature extraction methods from signal processing to graphs is difficult because graphs, particularly those representing real-world networks or systems, are semi-structured and non-Euclidean [102]. Nevertheless, various measures and algorithms have been proposed for extracting features from graphs using well-known methods from different fields of research. For example, in a study of elastic graph matching for facial expression recognition, Zafeiriou et al. [103] implemented a kernel-based technique for discriminant feature extraction in graphs. Fei et al. [104] adapted data mining techniques to graphs by demonstrating that the frequency of subgraphs can be used as a feature extraction method for graph classification. Miller et al. [105] demonstrated anomaly detection in subgraphs through spectral decomposition of the modularity matrix and, using a test statistic, determined the presence of noise in a graph. Ortega et al. [106] proposed a lifting wavelet transform on sensor networks for de-noising applications. It is evident from these published measures that local graph feature extraction methods based on signal processing concepts would provide a more detailed and multiscale view of network organization. A close examination could reveal the hubness of individual nodes, the roles of subgraphs in maintaining connectivity, the compressibility of sparsely connected graphs, or the existence of noise or anomalies.

The most common local feature extraction method applied to graphs is centrality, which can be used to describe the roles individual edges and vertices serve in connectivity [107]. However, the

current centrality measures such as betweenness or closeness centrality do not have any physical meaning in signal processing. In this chapter, established measures from signal processing and information theory, energy and entropy, are extended to binary and weighted graphs. These measures are designed to detect anomalous subgraphs in the network as well as to classify the hubness of nodes. The proposed definitions of graph energy and graph entropy are evaluated on simulated and real-world networks, including the ERN data, and compared with the performance of recently published measures.

3.2 Existing Feature Measures for Graphs

In this section, a review of some of the most common feature measures for graphs is presented. These measures have been used in the literature to compare graphs with each other, understand network structure, and evaluate the flow of information across the network.

3.2.1 Hub Centrality

A hub is defined as a node within a network which takes on the role of maintaining connectivity, sometimes also known as an influential node. The hubness of a node can be quantified by betweenness, closeness, eigenvector, and degree centrality [108]. Betweenness centrality [109], the most commonly utilized centrality measure, quantifies the fraction of shortest paths within the network that travel through node v_i such that

C_B(i) = \sum_{m \neq i \neq n \in V} \frac{\sigma_{mn}(i)}{\sigma_{mn}}    (3.1)

where σ_mn is the number of shortest paths between nodes v_m and v_n and σ_mn(i) is the number of those shortest paths which travel through node v_i. Closeness centrality [110] is based on the mean geodesic distance between node v_i and all other nodes such that

C_C(i) = \frac{1}{\sum_{j \in V \setminus \{i\}} D(i,j)}    (3.2)

where D is the distance matrix whose pairwise entries are the minimum number of edges required to form a path between v_i and v_j. Degree centrality [110] quantifies a node's importance based on the number of direct connections such that

C_D(i) = \frac{k_i}{N-1}.    (3.3)

Finally, eigenvector centrality, introduced by Bonacich [111], quantifies a node's importance based on the principal eigenvector of the adjacency matrix such that

C_E = \frac{1}{\lambda} A C_E    (3.4)

where λ is the most dominant eigenvalue of the adjacency matrix A and C_E is the eigenvector associated with λ, such that C_E(i) is the centrality score for v_i. Among these four centrality measures, eigenvector centrality is the most reliable indicator of true graph connectivity [111] and is similar to a commonly used signal processing method, principal component analysis (PCA) [112]. Bonacich argued that a centrality measure based on degree, e.g. closeness, betweenness, or degree centrality, weights every edge connection in the network equally, while the eigenvector weights edge connections according to their true centralities. Furthermore, eigenvector centrality is a weighted sum of not only direct connections, as is the case for degree centrality, but indirect connections of every length, so that it accounts for the entire underlying connection pattern across the network.
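A brief sketch of these four centrality measures on a standard test graph follows; networkx supplies each measure directly, and eigenvector centrality is additionally computed by power iteration on the adjacency matrix to mirror Eq. (3.4). The kite graph is only an illustrative choice.

```python
# Sketch: betweenness, closeness, degree, and eigenvector centrality.
import numpy as np
import networkx as nx

G = nx.krackhardt_kite_graph()        # classic 10-node centrality testbed
bet = nx.betweenness_centrality(G)    # C_B: fraction of shortest paths through i
clo = nx.closeness_centrality(G)      # C_C: based on summed geodesic distances
deg = nx.degree_centrality(G)         # C_D: k_i / (N - 1)

# Power iteration converges to the principal eigenvector of A (Perron vector).
A = nx.to_numpy_array(G)
x = np.ones(len(A))
for _ in range(200):
    x = A @ x
    x /= np.linalg.norm(x)
eig = {i: x[i] for i in range(len(A))}
```

In this graph the highest-degree node also dominates the eigenvector scores, while betweenness singles out the bridge nodes, which is exactly the distinction Bonacich's argument turns on.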

3.2.2 Provincial and Connector Hubs

Another method for evaluating the centrality of a node is to classify it as a provincial hub, a connector hub, or a non-hub [113] according to its role in a community structure. A dense group of highly connected nodes in a network is referred to as a cluster. A node is considered provincial if it is primarily responsible for linking nodes together within a cluster. A connector hub is a node which links together multiple clusters and is located along the inter-edges of two or more clusters. A non-hub does not play any important leadership role in maintaining network connectivity and may be the outer node of a cluster. Node categorization is performed by computing a within-cluster score, z_i, and a participation coefficient, P_i. Nodes with similar roles are expected to have similar relative within-cluster connectivity. A cluster set is defined as C = {c_1, c_2, ..., c_s}, where c_j ⊆ V and s is the number of clusters in the community structure. If Φ(i) = {c_j | v_i ∈ c_j}, then

z_i = \frac{\kappa_i - \bar{\kappa}_{\Phi(i)}}{\sigma_{\kappa_{\Phi(i)}}}    (3.5)

where κ_i is the number of edges shared between v_i and nodes within its cluster, i.e. the intra-cluster edges for node v_i, \bar{\kappa}_{\Phi(i)} is the average of κ_i over all the nodes in Φ(i), and σ_{κ_{Φ(i)}} is the standard deviation of κ_i over those nodes. The within-cluster degree z-score measures how well connected node v_i is to other nodes in its cluster. Roles are also determined by connections between clusters. For example, two nodes with the same z-score play different roles if one of them is connected to several nodes in other clusters while the other is not. The participation coefficient P_i of node v_i is

P_i = 1 - \sum_{j=1}^{s} \left( \frac{\kappa_{i c_j}}{k_i} \right)^2    (3.6)

where κ_{ic_j} is the number of links from v_i to nodes in cluster c_j and k_i is the total degree of node v_i. The participation coefficient of a node is therefore close to 1 if its links are uniformly distributed among all the clusters and 0 if all its links are within its own cluster. Guimera et al. [113] describe a method to classify a node as a provincial hub, connector hub, or non-hub based on the value of the node's within-cluster score and participation coefficient. For example, a node with a within-cluster score greater than 2 and a participation coefficient less than 0.3 is categorized as a provincial hub. However, the problem with the classification technique of [113] is that a priori knowledge about cluster membership is required. Therefore, a reliable clustering algorithm is first necessary to identify the clusters and their nodal memberships.

3.3 Graph Energy

In this section, current definitions of graph energy are presented. This is followed by the introduction of a new graph energy metric based on the Laplacian matrix and an algorithm for detecting local and global structural anomalies in a graph.

3.3.1 Existing Measures of Graph Energy

Energy is a well-defined metric in signal processing for quantifying the norm of a finite signal. To measure the stability and reliability of connectivity in a graph, a global energy measure was first defined by Gutman et al. [114] as

E(G) = \sum_{i=1}^{n} |\lambda_i|    (3.7)

where λ_1, λ_2, ..., λ_n are the eigenvalues of the adjacency matrix. If a graph is not connected, its

energy is the sum of the energies of its connected components; therefore, isolated vertices do not contribute to or affect the total energy. Energy is affected, however, by the number of edges in the graph. Day et al. [115] report that graph energy changes if edges are removed from or added to the graph. Therefore, if a node which is linked to the network by at least one edge is removed, the energy will change. Another measure of energy, Hückel energy, was introduced by Coulson [116] to determine the energies of molecular orbitals in hydrocarbon systems. The vertices of the graph associated with a given molecule are in one-to-one correspondence with the carbon atoms of the hydrocarbon system. Hückel energy, defined as

HE(G) = \begin{cases} 2 \sum_{i=1}^{r} \lambda_i & \text{if } n = 2r \\ 2 \sum_{i=1}^{r} \lambda_i + \lambda_{r+1} & \text{if } n = 2r+1, \end{cases}    (3.8)

ensures that the total π-electron energy of a conjugated hydrocarbon is simply the energy of the corresponding molecular graph. Balakrishnan [117] argues that Hückel energy provides a better approximation of the total π-electron energy than Gutman's definition. This is attributed to using only a subset of the leading eigenvalues associated with the adjacency matrix. This approach can be thought of as revealing the internal structure of the data such that the variance in the data is best explained [118]. Coulson's approach is also supported by Allefeld et al. [119], who demonstrated that the leading eigenvalues correspond more strongly to the strength of local network connectivity than the trailing eigenvalues. The information revealed by Hückel energy, however, is limited since the spectral information of the adjacency matrix has been shown to be less indicative of structural content than the spectral content of the Laplacian matrix [120]. Furthermore, these measures of graph energy do not evaluate the local properties of graphs. Individual nodes or subgraphs may be responsible for the energy content of a graph due to their local connections.
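The two definitions above can be sketched numerically as follows (an illustration with arbitrary test graphs, not the dissertation's code): Gutman's energy sums the absolute adjacency eigenvalues, while Hückel energy doubles the sum of the leading half of the (descending) spectrum, adding the middle eigenvalue when the order is odd.

```python
# Sketch: Gutman energy (Eq. 3.7) and Hückel energy (Eq. 3.8) from the
# adjacency spectrum of a graph.
import numpy as np
import networkx as nx

def gutman_energy(G):
    lam = np.linalg.eigvalsh(nx.to_numpy_array(G))
    return np.abs(lam).sum()                        # sum of |eigenvalues|

def huckel_energy(G):
    lam = np.sort(np.linalg.eigvalsh(nx.to_numpy_array(G)))[::-1]  # descending
    n = len(lam)
    r = n // 2
    he = 2 * lam[:r].sum()                          # doubled leading half
    if n % 2 == 1:
        he += lam[r]                                # middle eigenvalue, odd n
    return he

G = nx.cycle_graph(6)
e_gutman, e_huckel = gutman_energy(G), huckel_energy(G)
```

For the 6-cycle the adjacency spectrum is {2, 1, 1, -1, -1, -2}, so both definitions give 8; they differ on graphs whose spectrum is not symmetric about zero.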
In the next section, an extension of Hückel energy, Laplacian Hückel Energy, is presented along with an algorithm for

computing local energy indices.

3.3.2 Laplacian Hückel Graph Energy

It has been shown that the Laplacian matrix retains greater structural information about a graph than the adjacency matrix [120, 121], where the Laplacian matrix of a graph is defined as

L_{ij} = \begin{cases} k_i & \text{if } i = j \\ -1 & \text{if } A_{ij} = 1 \\ 0 & \text{otherwise.} \end{cases}    (3.9)

Therefore, an energy measure computed from the eigenspectrum, μ_1 ≥ μ_2 ≥ ... ≥ μ_n, of the Laplacian matrix is proposed. The Laplacian matrix is positive semi-definite and its eigenvalues are all non-negative. While the adjacency matrix eigenvalues have a zero-mean distribution, i.e. \frac{1}{n} \sum_{i=1}^{n} \lambda_i = 0, the Laplacian matrix does not share this property since all its eigenvalues are non-negative. Computing the trace of the Laplacian matrix as

tr(L) = \sum_i L_{ii}    (3.10)
      = \sum_i \sum_j A_{ij}    (3.11)
      = \sum_i k_i    (3.12)

and using \sum_{i=1}^{n} \mu_i = tr(L), it can be shown that \frac{1}{n} \sum_{i=1}^{n} \mu_i = \frac{1}{n} \sum_{i=1}^{n} k_i, i.e. the average of the Laplacian eigenvalues is equivalent to the average degree of the network. The average is subtracted from

each of the Laplacian eigenvalues to render a zero-mean distribution, such that \bar{\mu}_i = \mu_i - \frac{1}{n} \sum_{i=1}^{n} k_i. Now \sum_{i=1}^{n} \bar{\mu}_i = \sum_{i=1}^{n} \lambda_i = 0, and the adjacency matrix eigenvalues in the original definition of Hückel energy are substituted with the new eigenvalues of the Laplacian matrix to obtain the Laplacian Hückel Energy measure, defined as

LHE(G) = \begin{cases} 2 \sum_{i=1}^{r} \bar{\mu}_i & \text{if } n = 2r \\ 2 \sum_{i=1}^{r} \bar{\mu}_i + \bar{\mu}_{r+1} & \text{if } n = 2r+1. \end{cases}    (3.13)

This new definition of graph energy incorporates the advantage that most structural information is contained in the first few largest eigenvalues of the Laplacian matrix. It is also expected to be more discriminative of energy content than Gutman's and the conventional Hückel energy, since the spectrum of the Laplacian matrix is superior to that of the adjacency matrix at representing structural connectivity.

3.3.3 Localized Laplacian Hückel Graph Energy

In this section, the lack of signal processing inspired local graph indices is addressed by extending the proposed Laplacian Hückel Energy to evaluate the energy content of nodes and their neighborhoods across the network. This index can be used to quantify the importance of a node based on the magnitude of its contribution to the total graph energy. The h-hop neighborhood of a node v_i is defined as H(i) = {v_j ∈ V | D(i,j) ≤ h}, where h ∈ {1, 2, 3, ..., d}. Therefore, V_h(i) = (v_i ∪ H(i)) and E_h(i) = {e_{kl} ∈ E | v_k, v_l ∈ V_h(i)}, where G_h(i) = (V_h(i), E_h(i)) is the subgraph of G containing node v_i and its h-hop neighbors. To define the nodal energy contribution of a node, the subgraph G_h(i) obtained from the node of interest v_i and all of its h-hop neighbors is considered. This limits the energy dependencies of G to the internal structure of G_h(i) only. Removing a node or subgraph and computing the effect on the original network's

energy is not an accurate representation of energy contribution, since the original network's total energy is not the sum of its subgraphs' individual energies. Therefore, instead of removing the nodes and edges of G_h(i), randomizations of G_h(i)'s internal structure are considered, i.e. changing the positions of edges, to determine how a perturbation in the local neighborhood of node v_i affects the energy of the whole graph G. The number of nodes, n = |V_h(i)|, and edges, m = |E_h(i)|, are preserved to maintain a constant Cost, where Cost = \frac{2m}{n(n-1)} and 0 ≤ Cost ≤ 1, and a constant graph order across randomizations. As Cost approaches 1, fewer permutations of the subgraph are possible. When Cost = 1, G_h(i) is complete and therefore cannot be permuted; in this case the node is deemed redundant and unnecessary for maintaining the structural integrity of the network, i.e. it contributes zero energy to G since removing it will not change connectivity in the graph. A permuted G_h(i) is denoted PG_h(i). If the total possible number of edges in the subgraph is α = \frac{n(n-1)}{2}, then the number of possible PG_h(i)'s is

Algorithm 3 Nodal contribution to graph energy
 1: Input: Graph G = (V, E).
 2: Output: d x n matrix β containing nodal energy contributions.
 3: n = |V|.
 4: Compute the n x n distance matrix, D, of G.
 5: for i = 1 to n do
 6:   for h = 1 to d do
 7:     Find G_h(i).
 8:     m = |E_h(i)|, n = |V_h(i)|.
 9:     α = n(n-1)/2.
10:     for r = 1 to |PG_h(i)| do
11:       G_r = G \ G_h(i) ∪ PG_h^r(i).
12:       energy(r) = LHE(G_r).
13:     end for
14:     LHE_rand = mean(energy).
15:     β_h(i) = |LHE(G) - LHE_rand|.
16:   end for
17: end for

|PG_h(i)| = \binom{\alpha}{m} - 1.    (3.14)

Therefore, a new subgraph, PG_h^r(i), is formed from each permutation, where the integer r ∈ {1, 2, 3, ..., |PG_h(i)|}. The Laplacian Hückel Energy of the new graph G_r, where G_r = G \ G_h(i) ∪ PG_h^r(i), is evaluated for each randomization of G_h(i), and an average change in energy is computed. The node associated with the greatest change in energy is deemed the most important node for maintaining structural integrity, as long as the following constraints are adhered to during the randomization of the subgraphs:

- Every G_r generated must be a connected graph, i.e. no isolated vertices or multiple components.
- G_h(i) is connected, but the permuted version, PG_h(i), does not need to be connected.

The importance of node v_i, associated with the subgraph G_h(i), in preserving the structural connectivity of G is proportional to |LHE(G) - LHE_rand|, i.e. the larger the difference between the two energies within a particular h-hop neighborhood, the more important the node is to the network. The h-hop neighborhood determines whether the energy change is due to the center node's local or more extensive global relationships.

3.4 Graph Entropy

Entropy is another classic measure commonly used in signal processing and information theory. In this section, current definitions of graph entropy and graph entropy rate are presented. This is followed by the introduction of a new graph entropy rate measure based on modeling the adjacency matrix as a Markov process. This new definition of graph entropy is utilized in an algorithm designed to quantify the complexity of graph subregions as well as to detect local anomalies.
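Before turning to entropy, the global Laplacian Hückel Energy of Eq. (3.13) can be sketched numerically as follows; the function and test graph are illustrative choices, not the dissertation's code.

```python
# Sketch of the Laplacian Hückel Energy (Eq. 3.13): center the Laplacian
# eigenvalues by the average degree, then apply the Hückel form to the
# leading (zero-mean) eigenvalues.
import numpy as np
import networkx as nx

def laplacian_huckel_energy(G):
    L = nx.laplacian_matrix(G).toarray().astype(float)
    mu = np.sort(np.linalg.eigvalsh(L))[::-1]    # descending Laplacian spectrum
    mu = mu - mu.mean()                          # subtract avg degree -> zero mean
    n = len(mu)
    r = n // 2
    lhe = 2 * mu[:r].sum()                       # doubled leading half
    if n % 2 == 1:
        lhe += mu[r]                             # middle term for odd n
    return lhe

G = nx.watts_strogatz_graph(20, 4, 0.1, seed=3)
lhe = laplacian_huckel_energy(G)
```

As a sanity check, the complete graph K4 has Laplacian eigenvalues {4, 4, 4, 0} with mean 3 (its average degree), so the centered spectrum is {1, 1, 1, -3} and LHE = 4.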

3.4.1 Existing Measures of Graph Entropy

In information theory [122], entropy is a measure of the uncertainty associated with a random variable. The original definition of graph entropy was introduced by Körner [123] and quantifies a lower bound on the complexity of graphs. It is defined as H(G) = min_{X,Y} I(X; Y), where the minimum is taken over all pairs of random variables X, Y such that X is a uniformly selected vertex in G and Y is a set of vertices containing X. Similar to the basic properties of entropy in information theory, if G contains no edges, then H(G) = 0, and if G is complete, such that every node is connected to all of the other nodes, then H(G) = log2(n), where n = |V| is the number of nodes. Implementation of Körner's definition of entropy, however, is NP-hard, which makes its evaluation for real-world networks impractical.

A more recent definition for computing entropy in real-world networks was proposed by Dehmer et al. [124]. Dehmer proposed to quantify the complexity of a graph using Shannon's definition of entropy such that the probability distribution is computed from a metrical property or graph invariant, e.g. node degree or closeness centrality. Entropy is defined as

I_f(G) = − Σ_{i=1}^{|V|} [ f(v_i) / Σ_{j=1}^{|V|} f(v_j) ] log [ f(v_i) / Σ_{j=1}^{|V|} f(v_j) ]    (3.15)

where f is an arbitrary information functional. The set S_j(v_i, G) = {v ∈ V : d(v_i, v) = j, j ≥ 1} is called the j-sphere of v_i regarding G, similar to the previously defined h-hop neighborhood, and d(v_i, v) denotes the shortest distance between nodes v_i and v. For example, the information functional computed from node degree is

f_V(v_i) = α^( c_1 |S_1(v_i, G)| + c_2 |S_2(v_i, G)| + ... + c_ρ |S_ρ(v_i, G)| )    (3.16)

where c_k > 0 is an arbitrary real positive coefficient, 1 ≤ k ≤ ρ, ρ is the maximum j-sphere level evaluated, and α > 0. Dehmer's method is a good approach to incorporating local connectivity information into an overall graph entropy. However, there are three noticeable disadvantages of Dehmer's method. First, the many variables in the information functional, and the sensitivity to the type of graph metric incorporated, bias the entropy: entropy will be significantly different across varying α and graph metrics. Second, Dehmer's solution evaluates random realizations of c_k which have no actual significance or relationship to the structure of the network. This also makes it difficult to compare across multiple graphs if the same realizations of c_k are not used. Finally, as discussed in Section 3.2, node degree is a weak metric for understanding the structural connectivity of the network. The entropy measure is dependent on this graph metric, and therefore further research is necessary to determine which graph metric best represents the graph.

Another local entropy measure was proposed by Shetty et al. [125] for identifying the local hubs in the social network of the Enron corporation. Shetty computed a probability distribution based on the frequency of certain correspondences within the company. The novelty is not in the definition of entropy, since he directly implements Shannon's definition, but rather in the procedure for evaluating the contribution of individual nodes to the entropy of the entire network. Nodes which have the most effect on graph entropy when they are removed are thought to be important to network connectivity.

Next, measures for computing the entropy rate of a graph are discussed. The first was presented in an application by Demetrius et al. [126], who implemented the Kolmogorov-Sinai entropy rate by evaluating random walks along the graph. A second measure of graph entropy rate, proposed by Burda et al. [127], computes the entropy rate of a graph as log λ_1, where λ_1 is the largest eigenvalue of the adjacency matrix. Sinatra et al. [128] recently applied Burda's definition to

quantify the maximum level of information diffusion across a network. Burda's measure, however, is plagued by a problem similar to that observed in Dehmer's measure: a dependence on node degree, which is the weakest measure of network connectivity [129]. Finally, Szpankowski et al. [130] proved that the structural entropy of an Erdös-Rényi random graph is C(n, 2) h(p) − n log n + o(n), where h(p) = −p log p − (1−p) log(1−p) is the entropy rate of a conventional memoryless binary source. Szpankowski is the first to present a graph coding scheme based on a measure of entropy rate. However, entropy is computed under the assumption that all graphs of order n and size m are equiprobable and that edges are formed with uniform probability. This is not the case in real-world networks, which follow a power-law distribution. Burda's entropy rate is generalizable to any graph but is not suitable for evaluating graph compressibility, since a graph cannot be uniquely reconstructed solely from knowledge of its degree sequence. Szpankowski offers a novel compression method for graphs; however, it is developed from examining random graphs only. This motivates the need for a new entropy measure which considers the unique structure across an array of graph topologies, not only random graphs, and can also lead to an appropriate and practical graph coding algorithm.

3.4.2 Proposed Graph Entropy Rate

In this section, a new measure of graph entropy rate is proposed for unweighted (binary) and weighted graphs by modeling their adjacency and connectivity matrices as a Markov process. The adjacency matrix is directly related to the structure of a graph, and therefore, intuitively, the spatial coherence of elements in the matrix will reflect the pairwise relationships among nodes in the network. For example, nodes in a random graph do not have an organized relationship, and it is therefore expected that this random relationship will be reflected in a lack of spatial coherence among elements in the adjacency matrix.
To examine the variation in adjacency matrix element

dependencies, similar to pixel dependencies in an image [131, 132, 133], it is proposed that the adjacency matrix be modeled as a Markov process. The entropy rate of a Markov process X_1, X_2, ..., X_r is formally defined as

H(χ) = − Σ_i Σ_j π_i P_ij log2 P_ij    (3.17)

where P = [P_ij] is the transition probability matrix such that i, j = 1, 2, ..., n, P_ij = Pr(X_{r+1} = j | X_r = i), and π is the stationary distribution. In this section, we first introduce a new definition of the entropy rate of a binary graph, H(G), by transforming the binary adjacency matrix into a stochastic process X_1, X_2, ..., X_r, for r = 1, 2, ..., through permutation and scanning operations. If X is an m-th order Markov process, we define the general form of the entropy rate for a graph as

H(G; m) = min_{Z, ψ} H_x(Z A Z^T; ψ)    (3.18)

where Z is a permutation matrix applied to the adjacency matrix, A, and ψ is the particular scanning function of the upper triangular matrix elements. These two variables define our transition matrix, P. Equation 3.18 can be extended to evaluate the entropy characteristics of a weighted graph such that

H_w(G; m) = min_{Z, ψ, C_Q} H_x(Z W Z^T; ψ; C_Q)    (3.19)

where C_Q is a codebook generated by a quantizer.
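As a concrete illustration of Eq. (3.17), the first-order entropy rate of any scanned binary sequence can be estimated directly. The sketch below is illustrative, not the dissertation's exact estimator; in particular it approximates the stationary distribution π by the empirical state-occupancy frequencies:

```python
from math import log2

def markov_entropy_rate(seq):
    """First-order estimate of Eq. (3.17): H = -sum_i pi_i sum_j P_ij log2 P_ij,
    with P estimated from transition counts of a binary scan sequence and
    pi approximated by empirical state frequencies."""
    counts = {0: [0, 0], 1: [0, 0]}
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    total = len(seq) - 1
    H = 0.0
    for i in (0, 1):
        row = sum(counts[i])
        if row == 0:
            continue
        pi_i = row / total
        for j in (0, 1):
            if counts[i][j]:
                p_ij = counts[i][j] / row
                H -= pi_i * p_ij * log2(p_ij)
    return H

# A perfectly alternating scan (all transitional sub-sequences) is
# deterministic, so its entropy rate is 0:
print(markov_entropy_rate([0, 1] * 50))   # -> 0.0
```

A maximally incoherent scan, with all four sub-sequences equally likely, approaches 1 bit per symbol, which is the behavior reported for random graphs in Section 3.5.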

Matrix Bandwidth Minimization

For a graph, G, composed of n nodes, there exist n! permutations, i.e. isomorphisms, of G. The structural information of the graph is retained across all isomorphisms of G. Therefore, to maximize the spatial coherence of the adjacency matrix elements, similar elements in the adjacency matrix are grouped together by implementing an optimization technique, matrix bandwidth minimization [134]. A labeling f of G assigns the integers {1, 2, ..., n} to the vertices of G [135]. Let f(v) be the label of vertex v, where each node has a different label. The bandwidth minimization function generates a permutation matrix, Z, an n × n binary orthogonal matrix in which each row and column contains precisely a single 1 and zeros everywhere else. The permuted versions of the binary adjacency and weighted connectivity matrices are obtained, respectively, as Z A Z^T and Z W Z^T.

For binary graphs, the bandwidth of a node v_i, B_f(v_i), is the maximum of the differences between f(v_i) and the labels of its adjacent nodes. That is,

B_f(v_i) = max_{v_j ∈ N(v_i)} { |f(v_i) − f(v_j)| }    (3.20)

where N(v_i) = {v_j : A_ij = 1} for j = {1, 2, ..., n}. The bandwidth of a binary graph G with respect to a labeling f is then B_f(G) = max_{v_i ∈ V} {B_f(v_i)}.

For weighted graphs, the weighted graph bandwidth minimization (WGBM) [134] cost function identifies a permutation of the connectivity matrix which places similar weights close together. A labeling f of a weighted graph G assigns positive integers to the vertices of G. Let f(v) ∈ {1, 2, ..., n} be the label of vertex v, where each node has a different label. The objective of the WGBM is to minimize the maximum weighted label difference of an edge, i.e. the edge weight times the label difference. That is,

B_f(v_i) = max_{v_j ∈ N(v_i)} { w_ij |f(v_i) − f(v_j)| }    (3.21)

For a fully connected graph, e.g. pairwise phase synchrony matrices, N(v_i) = V \ {v_i}, where V is the set of all vertices in the graph. The bandwidth of a weighted graph G with respect to a labeling f is then B_f(G) = max_{v_i ∈ V} B_f(v_i).

Z is generated by reordering the rows and columns of an identity matrix according to the optimal labeling of the unweighted or weighted bandwidth minimization cost function, which can be based on node degree [136], clustering algorithms [137], or centrality measures [110]. For example, DBSCAN, a density-based clustering algorithm proposed by Ester et al. [138], applies a Euclidean distance measure and identifies the number of clusters, for binary and weighted graphs, starting from the estimated density distribution of the corresponding nodes. The advantage of this method is that a priori knowledge of the number of expected clusters is not required. van Dongen [139] introduced a fast and scalable unsupervised clustering algorithm, the Markov Cluster Algorithm (MCL), for binary and weighted graphs based on the simulation of random flow in graphs. Ding et al. [140] introduced a spectral bandwidth minimization method (Spectral BWM) which is a cross between clustering and node centrality measures. The permutation matrix Z is formed by assigning labels to nodes based on their co-occurrence in a cluster and can maximize spatial coherence in the adjacency matrix. However, bandwidth minimization cost functions based on these three clustering algorithms are primarily suitable for highly modular networks only.

An alternative and much more generalizable cost function is the Cuthill-McKee algorithm [141]. It is a variant of the breadth-first search of graphs and assigns labels based on the degree relationship of adjacent nodes. An extension of the Cuthill-McKee algorithm to weighted graphs is proposed by Lin et al. [142]. The weighted algorithm has a complexity of O(n^3), where

n is the number of vertices in the graph, and it is generalizable to all weighted graphs. Walker et al. [136] introduced Approximate Minimum Degree (AMD) for the purpose of reordering a symmetric sparse matrix prior to numerical factorization. The degree may be the binary or weighted definition. Again, Z is generated such that it satisfies the reordering conditions set by Cuthill-McKee or AMD reordering. In general, Cuthill-McKee is far superior to AMD in minimizing the matrix bandwidth [143].

Scanning Function of the Permuted Matrix

The upper triangle elements of the permuted binary adjacency matrix are mapped to a 1-dimensional vector of dimension 1 × k, where k = n(n−1)/2 and n = |V|. In a binary sequence under a first-order Markov process, there exist four possible state transitions: 0→0, 0→1, 1→0, and 1→1. The states 0→0 and 1→1 are non-transitional sub-sequences, and 0→1 and 1→0 are transitional sub-sequences. The permutation matrix, Z, plays a major role in the entropy rate measure, H(G), since the entropy rate bounds are dictated by the frequency of occurrence of these sub-sequences, i.e. the magnitude of spatial coherence among adjacent elements. Entropy will be minimized if similar states are positioned next to each other in a permuted connectivity matrix, since the homogeneity of these sub-sequences is increased.

A scanning function is implemented to extract this sequence from the permuted adjacency matrix. The scanning function, ψ, preserves the spatial coherence of the permuted 2-dimensional adjacency matrix when it is mapped to the 1-dimensional vector. This approach facilitates convergence to the true lower bound on the entropy rate. In image processing, there are various methods for scanning pixel values into a vector. Four of the most common approaches are raster scanning [144], Cantor diagonal scanning [145], Morton Z-order scanning [146], and Hilbert curve scanning [147]. Raster scanning extracts elements in the matrix either in an alternating horizontal

pattern or an alternating vertical pattern. Cantor diagonal scanning extracts elements along the diagonals of the matrix. Both raster and Cantor diagonal scanning preserve the locality of element pairs in the matrix but fail to maximally preserve locality among blocks of elements. Morton Z-order scanning is a function which maps multidimensional data to a one-dimensional vector while maximally preserving the spatial locality of the data points. The z-value of a point in multiple dimensions is calculated by interleaving the binary representations of its coordinate values; the scanning pattern resembles a group of Z's linked to each other. Morton Z-order scanning is better suited than raster scanning for grouping regions of similar pixels in large blocks of an image [148]. The Hilbert curve is a continuous fractal space-filling curve whose discrete approximation is useful because it provides a mapping between 1-D and 2-D space that preserves locality. If (x, y) are the coordinates of a point within the unit square and d is the distance along the curve when it reaches that point, then points that have nearby d values will also have nearby (x, y) values. Kamata et al. [145] report that the neighborhood preservation of the Hilbert curve scan is better than that of the raster, Cantor diagonal, and Morton Z-order scanning functions.

Quantization of the Weighted Connectivity Matrix

Prior to matrix permutation and scanning of the weighted connectivity matrix, quantization is applied across the connectivity matrix in order to establish a finite alphabet of states, Q. Matrix elements are assigned the value pertaining to their level of quantization. A non-uniform quantizer is best suited for determining these levels, since the distribution of values in the connectivity matrix of a real-world network is typically non-uniform, e.g. most real-world networks have a power-law distribution of their weighted node degrees.
Therefore, non-uniform quantization is more generalizable than uniform quantization to connectivity matrices representing a variety of graph topologies. For a desired number of quantization levels, the Lloyd-Max algorithm [149] offers an optimal solution for

minimizing distortion, typically quantified by the mean-square quantization error.

3.4.3 Localized Graph Entropy Rate

In this section, the proposed graph entropy rate measure is extended to evaluate the local complexity of a network. The connectivity topology centered around individual nodes is expected to have an impact on overall network complexity. An algorithm is presented here which quantifies the magnitude of importance of individual nodes based on their contribution to network entropy. The proposed graph entropy measure is first applied to the original adjacency or connectivity matrix of graph G. Next, a node v_i is selected from the network and its corresponding row and column are removed from the matrix, i.e. all edges incident upon v_i in G are deleted. Once again, the proposed graph entropy measure is applied to the transformed graph. This is repeated for each node to obtain n entropy difference measures. The process is formally described in Algorithm 4. Nodes that contribute significantly to network complexity are expected to be associated with a large difference in entropy. The algorithm is applicable to both binary and weighted graphs under the condition that the appropriate graph entropy measure, H(G) or H_w(G), is implemented.

Algorithm 4 Local Graph Entropy
Input: An n-order graph G = (V, E).
Output: Δh_i ∀ v_i ∈ V.
1: h ← H(G).
2: for i = 1 to n do
3:   G′ = G \ v_i.
4:   h′ ← H(G′).
5:   Δh_i = h − h′.
6: end for
7: return
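Algorithm 4 can be sketched as follows. The graph entropy function is deliberately pluggable; the degree-distribution Shannon entropy used here is only a simple stand-in for the proposed Markov measure, and the function names are illustrative:

```python
from math import log2

def local_entropy_contributions(adj, H):
    """Algorithm 4 sketch: for each node, delete its row/column from the
    adjacency matrix and record the entropy change h - h'."""
    n = len(adj)
    h = H(adj)
    deltas = []
    for i in range(n):
        # Remove row i and column i, i.e. all edges incident on node i.
        sub = [[adj[r][c] for c in range(n) if c != i]
               for r in range(n) if r != i]
        deltas.append(h - H(sub))
    return deltas

def degree_entropy(adj):
    """Stand-in graph entropy: Shannon entropy of the degree distribution."""
    degs = [sum(row) for row in adj]
    total = sum(degs)
    if total == 0:
        return 0.0
    return -sum(d / total * log2(d / total) for d in degs if d)

# 4-node star: removing the hub (node 0) changes entropy the most.
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
deltas = local_entropy_contributions(star, degree_entropy)
print(max(range(4), key=lambda i: abs(deltas[i])))   # -> 0 (the hub)
```

With the proposed Markov measure substituted for the stand-in, this reproduces the procedure used for the local entropy indices in Section 3.5.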

3.5 Results

In this section, the proposed Laplacian Hückel Energy and Markov Graph Entropy Rate measures are applied to real-world networks. The performance of these measures, as well as the local indices based on them, is compared to published measures. The real-world networks evaluated are the Zachary karate network and the brain networks represented by the phase synchrony matrices obtained from the ERN EEG study.

3.5.1 Comparison Between Energy Measures

In this section, the local energy index using the proposed Laplacian Hückel Energy is compared to the local indices of Gutman's and conventional Hückel energies. These three measures are applied to the Zachary karate network (Figures 3.1, 3.2, and 3.3). The Laplacian Hückel Energy was more discriminative than the other two energy measures. In the 1-hop neighborhood, only the Laplacian Hückel Energy associated very high energy contribution with nodes 1 and 33, the dojo leaders of the two karate groups in the network. In the 2-hop neighborhood, the Laplacian Hückel Energy was much more discriminative than the other two measures and identified nodes 3, 9, 14, 20, 30, and 31 as the nodes which contributed most to network energy. These particular nodes are responsible for long-range connectivity since they represent the karate members associated with both groups in the karate network. The set of nodes identified as contributing most to energy by the Gutman and Hückel energy measures includes those identified by the Laplacian Hückel Energy but also nodes whose long-range connections are not necessary for maintaining connectivity across the network. These represent the karate members who are primarily associated with one karate group but not both. In the 3-hop neighborhood, the Gutman, Hückel, and Laplacian Hückel energies associate high energy contribution with most of the nodes. Only the Laplacian Hückel Energy indicates node 17 as less important for maintaining connectivity, since it is the most distant node and its removal

would not deteriorate overall connectivity. Therefore, the Laplacian Hückel Energy is a much more discriminative local energy measure than the Gutman and Hückel energies in the karate network.

Next, the nodal Laplacian Hückel energies are compared to the betweenness centrality (B_i) and conventional clustering coefficient (C_i) of nodes in the karate network (Table 3.1). The conventional clustering coefficient fails to recognize the importance of nodes 1, 33, and 34, assigning them very low clustering coefficient values. This is due to the clustering coefficient only considering the interdependent relationships among nodes within the 1-hop neighborhood, rather than the relationships a node has with the entire graph and its effect on the global energy of the network. The ranking of betweenness centrality agrees with the proposed measure for the nine nodes shown in Table 3.1. However, it does not indicate whether the node is important at a local or a more global scale.

Table 3.1: Highest energy contributing nodes in the karate network compared with their corresponding clustering coefficient and betweenness centrality.
Node | h-hop | Energy Contribution | C_i | B_i

3.5.2 Laplacian-Hückel Energy of a Biological Network: EEG ERN Data

The local energy measure was applied to the ERN data set to determine which regions of the brain contributed most to network energy. An average matrix was first computed for both Error and Correct data across all subjects. Next, a threshold was applied to each average matrix to derive a sparsely connected weighted graph such that the average degree was k = 10. This value of k

Figure 3.1: Local energy indices for the n = 34 karate club network, 1-hop energy analysis: (a) Gutman, (b) Hückel, (c) Laplacian Hückel Energy.

was chosen such that varying h-hop levels could be evaluated. Evaluating energy for the average matrix without thresholding is equivalent to only one neighborhood containing all nodes, i.e. the 1-hop neighborhood, which did not provide useful localized energy information. A value of k = 10 allowed 5 different neighborhood levels to be observed, thus demonstrating how increases in energy

Figure 3.2: Local energy indices for the n = 34 karate club network, 2-hop energy analysis: (a) Gutman, (b) Hückel, (c) Laplacian Hückel Energy.

contribution propagated across neighborhoods of increasing size or scale. Energy was reevaluated for graph mappings of the ERN data with average degrees of k = 40, 30, 20 to determine the extent to which k influenced energy in the biological network. Increasing k resulted in a larger set of h-hop neighborhoods, i.e. beyond h = 5, yet the variations in nodal energy were significant for only

Figure 3.3: Local energy indices for the n = 34 karate club network, 3-hop energy analysis: (a) Gutman, (b) Hückel, (c) Laplacian Hückel Energy.

the h = 1, h = 2, and h = 3 neighborhoods. Therefore, the biological significance of the graph was evaluated with k = 10. Figures 3.4 and 3.5 reveal that the brain regions most influential to network energy contribution are located around the frontal and frontal/central regions of the brain. Previous work indicates that

the mpfc is expected to be more influential to communication across the brain during an error response than a correct response. This is shown to be true in the 1-hop neighborhood of Error − Correct energy contribution differences (Figure 3.4). FCz, FC2, and FC4 demonstrated significantly (p ≤ 0.01, Wilcoxon rank-sum) greater energy contributions to the network during error responses compared to correct responses. The 2-hop neighborhood demonstrated significantly greater energy during error responses (p ≤ 0.04, Wilcoxon rank-sum) in the lateral regions of the brain, particularly for the right lateral nodes. This may be indicative of nodes in the lpfc communicating with distant neural regions due to activation of the ACC. Increased activation is also observed in the occipital region but is expected since the data was collected from a vision-based test. Similar anomalies are observed in the 3-hop, 4-hop, and 5-hop neighborhoods (not shown) with decreasing significance.

3.5.3 Comparison Between Local Graph Entropy Measures

In this section, the local entropy index using the proposed Markov Graph Entropy Rate is compared to the local indices using Burda's and Szpankowski's entropy rate definitions. First, a scale-free network composed of n = 60 nodes was generated. A subregion of the network, consisting of 17 nodes, was perturbed by randomly re-wiring the connections according to the Erdös-Rényi model with probability p = 0.5. This perturbed region contains the subset of nodes indicated by dark black node borders in Figure 3.6. The three local indices were computed for both the simulated network (Figure 3.6) and Zachary's karate network (Figure 3.7). All entropy measures are normalized by the maximum node entropy. The graphs in Figure 3.6 demonstrate that the proposed Markov Graph Entropy Rate measure is capable of identifying the nodes in the network contributing most to network connectivity.
The node members of the perturbed region are all assigned magnitudes greater than the rest of the nodes

Figure 3.4: 1-hop neighborhood energy differences (Error − Correct).

in the network. Szpankowski's entropy generates results opposite those of the proposed entropy measure: according to Szpankowski's entropy, the subregions of the network with a more organized structure are deemed more complex than the subregion with random connectivity. Burda's entropy rate fails to identify the perturbed sub-network and assigns a high entropy rate to only one of the organized sub-networks; it chooses neither the perturbed network, as the proposed measure does, nor all the organized sub-networks, as Szpankowski's does. The Markov Graph Entropy Rate is therefore more discriminative than Burda's entropy and, unlike Szpankowski's, correctly identifies the perturbed network.

Figure 3.5: 2-hop neighborhood energy differences (Error − Correct).

The graphs in Figure 3.7 reveal a very similar evaluation of node contribution to network complexity. Nodes 1 and 34 are identified by all three methods as being important, and in general the nodes centered in the network contributed to overall complexity. The similarity in nodal entropy contributions between all three methods may be due to the small order of the network. However, this result indicates there is consistency between the proposed entropy measure and published measures.

3.5.4 Entropy Rate Across Binary Graph Topologies

Graph entropy¹ was computed for five types of artificial graphs: a star graph, a scale-free Barabási-Albert graph, an Erdös-Rényi random graph, three modular graphs with 10, 15, and 25 clusters, and a lattice graph. Fifty instances of each graph were generated for increasing order, n ∈ {50, 250, 500, 1000}.

Figure 3.6: Comparisons between nodal graph entropy measures for identifying anomalous regions in a simulated network with n = 60 nodes: (a) Markov Graph Entropy Rate, (b) Szpankowski, (c) Burda. The network is composed of four complete graphs and a random graph performing as a perturbation in the network.

Entropy was computed using both the proposed measure (Table 3.2) and that proposed by Burda in [127] (Table 3.3). Burda's measure is computed as H(G) = log2 λ_1, where λ_1 is the largest eigenvalue of the adjacency matrix. Graph entropy was also computed for the following published undirected scale-free networks (Table 3.4):

1. Dolphin network: a social network of frequent associations between 62 dolphins in New Zealand [34].

Figure 3.7: Comparisons between nodal graph entropy rate measures for identifying important regions in the karate club network with n = 34 nodes: (a) Markov Graph Entropy Rate, (b) Szpankowski, (c) Burda.

2. Les Miserables: a co-appearance network of 77 characters in the novel Les Miserables [35].

3. The North American airport network: a flight connection network composed of 456 airports [36].

¹ In this section, we employ our definition of the entropy rate to evaluate the graph entropy of different classes of popular graph topologies. Due to the finite nature of the graphs used in our simulations, we are effectively evaluating an idealized measure of the graph entropy and not the true entropy rate, which theoretically requires infinitely large graphs. Our objective is still the same, though, and that is to gain an insight about the entropy bound associated with different types of graph topologies.

Graph entropy was computed assuming a first-order Markov process; matrix bandwidth minimization was performed using the reverse Cuthill-McKee (RCM) algorithm [141], and the horizontal raster scanning function was implemented. Results indicated random graphs had an entropy of H ≈ 1 for all n using the proposed entropy rate. Burda's entropy rate revealed that random graphs had the largest entropy rate across all the graph topologies investigated here. This is expected since random graphs do not have an organized structure in which an underlying pattern can be identified. For the rest of these results, the entropy values for all other topologies are normalized with respect to the entropy of the random networks to allow a comparison between the proposed and Burda's entropy rate measures.

Table 3.2 reveals that both the lattice and star graphs possess trivial structures which result in a low entropy. However, the more complex structures underlying the modular, scale-free, and random graphs are expected to have a much larger entropy than star and lattice graphs. In contrast to the structure of random graphs, scale-free and modular graphs possess underlying organized structures. This permits the Cuthill-McKee algorithm to group together more 1s, i.e. generating more non-transitional sub-sequences composed of 1s. The scanned sequence results in a significantly lower entropy than for a random network. Across all scale-free graphs generated, H ≈ 0.5 for low n and monotonically decreased with increased n. This is caused by the occurrence of longer non-transitional 0 sub-sequences generated by the growing network sparsity as n increases. Entropy across the modular networks is low because the individual clusters generate sub-regions of increased spatial coherence between neighboring elements in the permuted adjacency matrix. In general, the organized structures of scale-free and modular graphs are appropriately described by the lower measures of entropy.

The application of our proposed measure to the real-world networks (Table 3.4) reveals that they all possess entropy values similar to the simulated scale-free and modular networks. This is expected since these real-world networks are classified in the literature as scale-free, their degree distributions following power-law distributions. It is important to note that the expected complexity of each of the observed graphs is reflected by its respective entropy as quantified by the proposed definition.

The definition of graph entropy rate by Burda assigns large entropy values to both lattice and random graphs. This measure is therefore misleading in indicating that a simple lattice is as complex as a random network. According to Sinatra et al. [128], information diffusion across a lattice and a random network will be similar since most nodes will have similar degrees. A network with a fairly homogeneous distribution of degrees will permit all paths in the network to have equal probability of being traversed by a random walk, i.e. Burda's entropy rate will increase. This is a consequence of Burda's measure being highly dependent upon node degree. Analysis of the real-world network entropy (Table 3.4) reveals that the dolphin network can be characterized as scale-free according to Burda. However, Burda's measure characterizes the Airport and Les Miserables networks more like random networks. The increased order and sparsity of these two networks may be the cause of this misclassification by Burda's measure.

Table 3.2: Entropy computed for various graph models using the proposed measure. Fifty realizations of each model were generated and an average entropy is reported. Each value is normalized by that of a random network of equivalent order.
(Columns: n = 50, n = 250, n = 500, n = 1000)
Random (Erdös-Rényi): 0.99 ± ... ±10⁻³
Modular (10 clusters): 0.46 ± ... ± 0.01
Modular (15 clusters): 0.38 ± ... ± 0.01
Modular (25 clusters): 0.27 ± ... ± 0.01
Scale-Free (Barabási-Albert): 0.50 ± ... ±10⁻³
Lattice: 0.22 ± ... ±10⁻³
Star
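Burda's baseline, H(G) = log2 λ_1, used for Table 3.3, needs only the dominant eigenvalue of the adjacency matrix. A minimal power-iteration sketch (an illustration, assuming a connected, nonnegative symmetric adjacency matrix):

```python
from math import log2, sqrt

def largest_eigenvalue(adj, iters=200):
    """Power iteration for the dominant eigenvalue of a nonnegative
    symmetric adjacency matrix (Perron-Frobenius eigenvalue)."""
    n = len(adj)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        w = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sqrt(sum(x * x for x in w))   # norm of A v -> lambda_1
        v = [x / lam for x in w]
    return lam

def burda_entropy(adj):
    """Burda's graph entropy rate: log2 of the largest adjacency eigenvalue."""
    return log2(largest_eigenvalue(adj))

# Complete graph K5: lambda_1 = n - 1 = 4, so H = 2 bits.
K5 = [[0 if i == j else 1 for j in range(5)] for i in range(5)]
print(round(burda_entropy(K5), 6))   # -> 2.0
```

The degree-dependence criticized in the text is visible here: the dominant eigenvalue is bounded by the maximum degree, so graphs with similar degree sequences receive similar entropy rates regardless of their finer structure.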

Table 3.3: Graph entropy computed for various graph models using Burda's definition. Fifty realizations of each model were generated and an average entropy is reported. Each value is normalized by that of a random network of equivalent order.
(Columns: n = 50, n = 250, n = 500, n = 1000)
Random (Erdös-Rényi): 1 ± ... ±10⁻³
Modular (10 clusters): 0.54 ± ... ±10⁻³
Modular (15 clusters): 0.45 ± ... ±10⁻³
Modular (25 clusters): 0.34 ± ... ±10⁻³
Scale-Free (Barabási-Albert): 0.53 ± ... ± 0.08
Lattice: 0.79 ± ... ±10⁻³
Star

Table 3.4: Entropy rates for three real scale-free networks using the proposed measure and Burda's measure. All measures are normalized by the entropy rate of random networks of equivalent order.
(Columns: Proposed Measure, Burda's Measure)
Dolphin (n = 62)
Les Miserables (n = 77)
Airport (n = 456)

After computing the entropy rate of each network, the upper triangular sequences were coded using the Lempel-Ziv compression algorithm [150]. Tables 3.5 and 3.6 report the achieved compression for each network. Using the universal Lempel-Ziv coding algorithm, the elements within the adjacency matrix upper triangle, with the exception of random graphs, could on average be represented with less than 1 bit. The lowest bits/symbol occurred for the lattice, star, and scale-free graphs, particularly at higher graph orders. Across all n, the variations in compression are in agreement with the variations in entropy computed by our proposed measure. In general, the bits/symbol achieved by Lempel-Ziv, as compared with the computed entropy of both simulated and real-world networks, indicate that Lempel-Ziv may not be the best coding algorithm for this application; our entropy measures indicate that the graphs can be compressed further.
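The bits/symbol figures in Tables 3.5 and 3.6 come from Lempel-Ziv coding of the scanned sequence. An LZ78-style estimate can be sketched as follows; the exact coder and cost accounting used in the dissertation may differ from this illustration:

```python
import random
from math import ceil, log2

def lz78_bits_per_symbol(seq):
    """LZ78-style estimate of bits/symbol for a binary scan sequence:
    each parsed phrase is charged ceil(log2 k) bits for its dictionary
    pointer (k phrases parsed so far) plus 1 bit for the new symbol."""
    phrases = {(): 0}
    cur = ()
    for s in seq:
        cur += (s,)
        if cur not in phrases:       # new phrase: register and restart
            phrases[cur] = len(phrases)
            cur = ()
    c = len(phrases) - 1             # number of completed phrases
    bits = sum(ceil(log2(k)) + 1 for k in range(1, c + 1))
    return bits / len(seq)

# Long runs (a well-permuted, structured graph) compress far below
# 1 bit/symbol; a pseudo-random scan does not.
rng = random.Random(0)
regular = [0] * 500 + [1] * 500
noisy = [rng.randint(0, 1) for _ in range(1000)]
print(lz78_bits_per_symbol(regular), lz78_bits_per_symbol(noisy))
```

This matches the qualitative pattern of Table 3.5: sequences from lattice, star, and well-ordered scale-free graphs yield few long phrases and low bits/symbol, while random-graph sequences stay near 1 bit/symbol.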

Table 3.5: Bits/symbol achieved with the Lempel-Ziv algorithm. Rows: Star, Scale-Free (Barabási-Albert), Random (Erdös-Rényi), Modular (10, 15, and 25 clusters), and Lattice; columns: n = 50, 250, 500, 1000.

Table 3.6: Bits/symbol achieved using the Lempel-Ziv algorithm for three real scale-free networks.
Dolphin (n = 62): 0.74
Les Miserable (n = 77): 0.63
Airport Network (n = 456)

Entropy with Respect to Permutation and Scanning Function

In this section, the proposed entropy rate was evaluated across graph topologies, each with 100 node members, by varying the combinations of permutation cost functions and scanning functions. For the analysis, 50 simulations of each graph topology were generated. The plots in Figures 3.8 through 3.10 show the average entropy rate for each permutation/scanning function pair. The permutation functions tested include MCL, Spectral BWM, DBscan, Cuthill-Mckee, and AMD. The scanning functions include horizontal and vertical raster, Cantor, diagonal, Morton, and Hilbert scans. According to the plots for random and star graphs, the entropy rate remained fairly similar across all combinations; the choice of permutation/scanning functions therefore does not make a large difference in computing the entropy rate for either topology. For modular and scale-free graphs, the entropy rate remained midway between that of a random and a star graph. However, the entropy rate was minimized for modular graphs using MCL as the permutation function combined with a vertical raster scanning function. MCL performed the best for forming denser

regions of 1s in the adjacency matrix. The scale-free graph entropy rate was minimized using the Cuthill-Mckee algorithm as the permutation function. There was no significant difference in entropy among the scanning functions. Interestingly, there was a large gap in entropy for lattice graphs with respect to certain permutation/scanning combinations. Spectral BWM, DBscan, and MCL performed poorly; the results were no better than if no permutation had been applied to the adjacency matrix. Lattice graphs are not modular, which may explain why these clustering algorithms did not perform well. The lowest entropy rates were computed using either Cuthill-Mckee or AMD as the permutation function. The choice of scanning function for the lattice graph made no significant difference to the entropy rate.

Graph Entropy with Respect to Order of the Markov Process

In this section, we evaluate the effect of the choice of Markov process order on the computed entropy rate. Figure 3.11 reports the relationship between Markov order and graph entropy rate across graph topologies. Entropy was computed using the Cuthill-Mckee algorithm for permutation with a horizontal raster scanning function. The results indicate that the entropy rate of random graphs does not change significantly as the assumed order increases. This is expected since random graphs do not have much structure, so increasing the Markov model order has no effect. Modular graph entropy decreased significantly (p ≤ 0.05, Wilcoxon rank-sum) when transitioning from order 1 to order 2. This change in entropy reflects the isolated subregions of the permuted adjacency matrix containing a dense collection of 1s; since these regions are finite in size, the entropy change is restricted to a low Markov order. Scale-free graphs behave similarly to modular graphs, with the exception that entropy decreases significantly (p ≤ 0.05, Wilcoxon rank-sum) at higher orders.
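The order dependence described here can be reproduced with a toy estimator. The sketch below computes the empirical conditional entropy of a binary sequence under an assumed Markov order k; the blocky test sequence is an illustrative stand-in for a raster-scanned, permuted adjacency matrix, not the thesis data.

```python
# Sketch: empirical entropy rate of a binary sequence under an assumed
# Markov order k, i.e. H(X_t | X_{t-k}, ..., X_{t-1}).  Toy stand-in for
# the thesis pipeline (permute adjacency matrix, scan, model as Markov).
from collections import Counter
from math import log2

def markov_entropy_rate(seq, k):
    """Estimate H(next symbol | previous k symbols) in bits/symbol."""
    ctx_counts = Counter()
    pair_counts = Counter()
    for t in range(k, len(seq)):
        ctx = tuple(seq[t - k:t])
        ctx_counts[ctx] += 1
        pair_counts[(ctx, seq[t])] += 1
    total = sum(ctx_counts.values())
    h = 0.0
    for (ctx, sym), n in pair_counts.items():
        p_joint = n / total              # P(context, symbol)
        p_cond = n / ctx_counts[ctx]     # P(symbol | context)
        h -= p_joint * log2(p_cond)
    return h

# A blocky sequence (dense runs of 1s, mimicking a modular adjacency
# matrix): entropy decreases with assumed order, and reaches 0 once the
# context window spans the block structure.
blocky = ([1] * 8 + [0] * 8) * 200
print([round(markov_entropy_rate(blocky, k), 3) for k in (1, 2, 4, 8)])
```

For the constant-period sequence above, an order-8 context determines the next symbol exactly, so the estimated entropy rate drops to zero; shorter contexts leave residual uncertainty at the block boundaries.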
The entropy of lattice graphs does not change significantly across orders. This may indicate that the lower entropy rate bound has been achieved. Finally, the entropy rate for

Figure 3.8: Average graph entropy, computed over 50 simulations of each network type, with respect to various permutation and scanning functions across graph topologies each with 128 nodes. a) Random networks; b) Modular networks.
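The permutation-plus-scan pipeline compared in these figures can be sketched as follows. The BFS ordering below is a simplified stand-in for the Cuthill-Mckee implementation, the raster scan is one of the scanning functions ψ, and the example graph is a toy.

```python
# Sketch: bandwidth-reducing permutation followed by a horizontal raster
# scan of the adjacency matrix, turning a graph into a symbol sequence.
# The BFS ordering is a simplified stand-in for Cuthill-Mckee.
from collections import deque

def bfs_ordering(adj):
    """Visit vertices breadth-first, neighbors in ascending-degree order
    (the core idea of the Cuthill-Mckee heuristic)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    order, seen = [], [False] * n
    for start in sorted(range(n), key=deg.__getitem__):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            nbrs = [u for u in range(n) if adj[v][u] and not seen[u]]
            for u in sorted(nbrs, key=deg.__getitem__):
                seen[u] = True
                queue.append(u)
    return order

def permute_and_scan(adj, order):
    """Apply the permutation, then raster-scan the strict upper triangle."""
    n = len(adj)
    P = [[adj[order[i]][order[j]] for j in range(n)] for i in range(n)]
    return [P[i][j] for i in range(n) for j in range(i + 1, n)]

# Two 4-cliques joined by one bridge edge: after reordering, the 1s form
# dense blocks near the diagonal, which is what lowers the Markov
# entropy estimate of the scanned sequence.
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3),
         (4, 5), (4, 6), (5, 6), (4, 7), (5, 7), (6, 7), (3, 4)]
n = 8
A = [[0] * n for _ in range(n)]
for a, b in edges:
    A[a][b] = A[b][a] = 1
seq = permute_and_scan(A, bfs_ordering(A))
print(seq)
```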

Figure 3.9: Average graph entropy, computed over 50 simulations of each network type, with respect to various permutation and scanning functions across graph topologies each with 128 nodes. a) Scale-Free networks; b) Star networks.

Figure 3.10: Average graph entropy, computed over 50 simulations of each network type, with respect to various permutation and scanning functions across graph topologies each with 128 nodes (Lattice networks).

star graphs decreases significantly up to the maximum order evaluated, 8. Star graphs are among the simplest topologies and are therefore less of a challenge to compress. The only information required to reconstruct a star graph is the number of nodes in the network and the label of the center node.

Entropy Rate of a Biological Network: EEG ERN data

We are primarily interested in understanding the change in complexity across neural networks involved during cognitive control. The proposed local weighted graph entropy measure was applied to each matrix containing the pairwise phase-locking values for error and correct responses across subjects. Each matrix was quantized for Q = 2 to 10 levels. The quantization errors, measured as the mean square error (MSE), were evaluated across all subjects and are reported in Figure 3.12. Rate/distortion plots are provided to demonstrate the variations in quantization error when

Figure 3.11: Effects of order value on graph entropy measures for various graph topologies.

applying the Lloyd-Max algorithm across all phase synchrony matrices. Figure 3.12 reveals low quantization errors across all quantization levels Q evaluated, as well as for levels beyond Q = 10. The variance across subjects, indicated by the error bars, is also small across the number of Markov states. This indicates that the phase synchrony connectivity information was not significantly compromised by quantization. Bandwidth minimization on the weighted matrices was performed using the weighted version of the Cuthill-Mckee algorithm, and scanning was performed using the horizontal raster scan. The entropy differences for all 62 electrodes, representing different neural regions, were also computed across subjects. The topographical plots obtained for quantization levels 2 ≤ Q ≤ 10, representing the significant differences (Wilcoxon rank-sum) in local graph entropy between response types, are reported in Figures 3.13 through 3.21. From these results, it can be seen that nodes located around the frontal-central region of the brain contributed significantly more to the complexity of the underlying brain network during error

Figure 3.12: Rate/distortion curves (average and standard deviation across all subjects) for each response type: a) error; b) correct.

responses vs. correct responses. As expected, the entropy contributions of nodes located around FCz, in the vicinity of the ACC, were significantly greater (p ≤ 0.02, Wilcoxon rank-sum) in error responses. Across all realizations of the topographic plots, the difference between Error and

Figure 3.13: Significance (Wilcoxon rank-sum) plot for Error vs. Correct local entropy values using Lloyd-Max quantizer optimization for Q = 2 levels.

Correct responses remained more significant around the frontal-central region nodes than in other brain regions. These significant differences occurred regardless of the number of quantization levels. Beyond quantization levels of Q = 10, distortion decreased significantly; however, there was no significant difference between the response types in terms of the local entropy measures.
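The quantization step can be illustrated with a minimal one-dimensional Lloyd quantizer (the Lloyd-Max design reduces to Lloyd's algorithm in 1-D). The beta-distributed inputs are synthetic stand-ins for the phase-locking values, and all parameters here are illustrative.

```python
# Sketch: 1-D Lloyd quantizer for values in [0, 1], as used to map phase
# synchrony weights onto Q discrete levels.  The input data are
# synthetic stand-ins for real phase-locking values.
import random

def lloyd_max(data, Q, iters=100):
    """Alternate nearest-level assignment and centroid update (Lloyd)."""
    levels = [(q + 0.5) / Q for q in range(Q)]   # uniform initial codebook
    for _ in range(iters):
        buckets = [[] for _ in range(Q)]
        for x in data:
            q = min(range(Q), key=lambda i: (x - levels[i]) ** 2)
            buckets[q].append(x)
        levels = [sum(b) / len(b) if b else levels[i]
                  for i, b in enumerate(buckets)]
    mse = sum(min((x - l) ** 2 for l in levels) for x in data) / len(data)
    return levels, mse

rng = random.Random(1)
plv = [rng.betavariate(2, 5) for _ in range(2000)]   # synthetic synchrony values

# Distortion falls as the number of levels Q grows, mirroring the
# rate/distortion behaviour shown in Figure 3.12.
for Q in (2, 4, 8, 10):
    _, mse = lloyd_max(plv, Q)
    print(Q, round(mse, 5))
```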

3.6 Conclusions

In this chapter, new local feature extraction techniques were introduced for graphs. Well-known signal processing and information theoretic measures were adapted to explain the underlying structure of a graph and to identify possible anomalies within the network. With applications to the karate network and the ERN data, the localized graph energy measure was shown to identify the nodes which contribute most to the structural dynamics of the graph. The proposed Laplacian Hückel energy was shown to be more discriminative than Gutman's or the conventional Hückel energy definitions. Furthermore, the proposed energy measure identified the local neural populations of the brain which contributed most to the energy of the brain network. Energy during error responses was significantly greater than that for correct responses, particularly in the mpfc and lpfc regions. The greatest 1-hop neighborhood energy was observed in the mpfc and the greatest 2-hop neighborhood energy was observed in the lpfc. Next, the proposed definition of graph entropy rate outperformed published entropy measures by correctly classifying various graph topologies in terms of their structural complexity. The proposed entropy rate, computed by modeling the adjacency matrix as a Markov process, revealed the node members in the perturbed region of a simulated network. Compared to the entropy rate definitions proposed by Burda and Szpankowski, the proposed measure was more discriminative in determining the magnitude of importance of each node within the network. The entropy measure also revealed local brain regions of increased complexity during the ERN. These areas were primarily situated in the frontal-central region of the brain, as expected according to the hypothesis stated in this thesis. Challenges still remain for adapting signal processing and information theoretic measures to graphs. These challenges include computing exact energy and entropy bounds based on the number

of nodes, edges, and topology of the network. Szpankowski presented a method for computing the lower bound on the entropy rate of random graphs; the next step is to compute this lower bound across all topologies, especially for scale-free networks. A coding method which achieves the entropy rates presented in this thesis would also be beneficial for compressing all types of graphs. The graph compression and reconstruction algorithms should be applicable to both binary and weighted networks.

Figure 3.14: Significance (Wilcoxon rank-sum) plot for Error vs. Correct local entropy values using Lloyd-Max quantizer optimization for Q = 3 levels.

Figure 3.15: Significance (Wilcoxon rank-sum) plot for Error vs. Correct local entropy values using Lloyd-Max quantizer optimization for Q = 4 levels.

Figure 3.16: Significance (Wilcoxon rank-sum) plot for Error vs. Correct local entropy values using Lloyd-Max quantizer optimization for Q = 5 levels.

Figure 3.17: Significance (Wilcoxon rank-sum) plot for Error vs. Correct local entropy values using Lloyd-Max quantizer optimization for Q = 6 levels.

Figure 3.18: Significance (Wilcoxon rank-sum) plot for Error vs. Correct local entropy values using Lloyd-Max quantizer optimization for Q = 7 levels.

Figure 3.19: Significance (Wilcoxon rank-sum) plot for Error vs. Correct local entropy values using Lloyd-Max quantizer optimization for Q = 8 levels.

Figure 3.20: Significance (Wilcoxon rank-sum) plot for Error vs. Correct local entropy values using Lloyd-Max quantizer optimization for Q = 9 levels.

Figure 3.21: Significance (Wilcoxon rank-sum) plot for Error vs. Correct local entropy values using Lloyd-Max quantizer optimization for Q = 10 levels.

Chapter 4

Hierarchical Multi-Subject Community Detection for Weighted Neuronal Networks

4.1 Introduction

One important characteristic of complex networks is modularity. A module is composed of a densely connected set of nodes, with sparse connections between modules in the network [151] [152]. The node members of each module, or cluster, share a common trait, function, or other similar characteristic. Identifying this inhomogeneity of connections is important for understanding how network topology and function are related. Applications of community detection include understanding the functional pathways of metabolic networks [113], discovering topic-related webpages on the internet [153], and evaluating the rapid spread of disease throughout populations [41]. It is hypothesized that the modular structure of complex neuronal networks is indicative of robustness [154] and contributes to functionality [113] by compartmentalizing specific functions within certain cortical regions without perturbing the rest of the network [155]. Intra-module associations are thought to describe the segregation of information processing while

the inter-module associations testify to the integration of information processing across distant brain regions [4]. This modular structure is therefore a compelling model for emphasizing the information-distributing nature of the brain's structure and function mappings [27, 61]. Community detection is an important topic in the study of complex networks, yet there does not exist a precise formulation for uncovering the "correct" underlying structure; therefore, a vast array of methodologies for community detection and cluster identification have been proposed throughout the literature [156] [119]. These methodologies are inspired by stochastic theory [157], [158], signal processing [159], and genetic algorithms [160]. The availability of highly complex networks has created the need for fast algorithms which find reasonably good partitions. The primary reason for the existence of numerous community detection methodologies is the lack of a priori knowledge of what a natural partitioning of a network really is. How many communities should exist in the network? Should the sizes of the communities be equal to each other, or differ significantly? For example, should a network composed of 100 vertices split into two groups of 50, or into groups of 25 and 75? Is all the information provided in the network important, or may some of it be classified as noise or anomalies? Should vertices be restricted to membership in one cluster, or be allowed to participate in multiple clusters? Many of these questions could be answered with a priori knowledge of the community structure of the network; however, the availability of this knowledge would negate the need for a clustering algorithm. Community detection therefore becomes a bootstrapping problem which depends solely upon the original data representing the graph, i.e. the weighted or unweighted connectivity matrix.
There are a variety of clustering algorithms which address this problem, including spectral clustering [161], Girvan-Newman [151], and k-means [162]. These and other algorithms found in the literature have been designed for a variety of applications. This chapter primarily addresses community

detection issues unique to analyzing the modular structure of the human brain. Identification of functional modules or communities in the brain was originally addressed using methods like principal component analysis [163] and independent component analysis [164], which place non-physiological constraints, such as orthogonality and independence, on the obtained components. Recently, methods from the spectral analysis of graphs have been used to extract modules [165, 140, 119, 166] by mapping the functional connections to a multi-dimensional subspace defined by a set of eigenvectors. This approach, however, does not reveal a hierarchical decomposition of the network. Meunier et al. [167] argue that most information processing systems possess a multi-scale modularity characteristic, i.e. they are hierarchically decomposable into a finite number of modular levels. This property allows a complex system to be more rapidly and robustly assembled since it is embedded in a relatively low-dimensional space. Therefore, a hierarchical decomposition of a complex network, particularly the human brain, is a natural representation of modular organization. A key challenge in identifying the modular organization of brain networks is determining a common structure across multiple subjects. Current work either focuses on obtaining the community structure of the average connectivity network or on analyzing each subject individually and obtaining a common modular structure using a voting technique. Averaging neglects the variance across subjects, whereas conventional voting methods either require a priori knowledge of cluster labels or require one to first acquire the modular structure of each subject [168]. Voting does not permit convergence to an optimal modular structure since it requires only one iteration of the process. Various studies [169, 170] have proposed alternatives for identifying similarities across multiple subjects by introducing random effects analysis and Bayesian classifiers.
With an application specific to module identification across multiple subjects, a possible solution would be to

implement a new consensus-based approach [74, 171, 172] in which the best modular organization is identified by combining information shared across multiple modular structures with information provided by previous decompositions of the data set. Integrating modular information shared across multiple subjects decreases the uncertainty that node pairs are randomly clustered together, and incorporating information from previous modular structures maintains consistency throughout the hierarchical tree. In this chapter, the most common methods designed for community detection are reviewed and their fundamental differences are highlighted. Furthermore, this chapter addresses challenges pertaining to thresholding weighted graphs, community quality measures for weighted graphs, and cluster analysis across multiple subjects. The proposed community detection methods are applied to weighted graphs obtained by computing pairwise phase synchrony values from EEG data collected during a study of cognitive control in the brain based on the error-related negativity. In particular, this study focuses on understanding the modular organization of the brain during error processing.

4.2 Background on Community Detection on Graphs

In this section, we provide a review of various community detection measures, the implementation of consensus clustering, and quality measures for quantifying the "goodness" of a discovered community structure.

Community Detection on Graphs

Numerous methods for identifying community structures within a graph exist throughout the literature [173]. These include spectral and hierarchical approaches. In this section, we briefly review some of the most popular published community detection algorithms.

Girvan-Newman

One of the pioneers of community detection in complex networks is M. Newman, who, along with M. Girvan, developed an iteratively divisive hierarchical algorithm referred to as the Girvan-Newman method [151]. The algorithm iteratively removes the edges in a graph that are deemed weakest compared to all existing edges, where the strength of an edge is defined through betweenness centrality. Betweenness centrality is defined as the fraction of shortest paths within the network that travel through each edge, $e_{ij}$, such that

$$C_B(e_{ij}) = \sum_{m \neq n \in V} \frac{\sigma_{mn}(e_{ij})}{\sigma_{mn}} \quad (4.1)$$

where $\sigma_{mn}(e_{ij})$ is the number of shortest paths between nodes $v_m$ and $v_n$ which contain edge $e_{ij}$ and $\sigma_{mn}$ is the total number of shortest paths between $v_m$ and $v_n$. The edge assigned the largest betweenness centrality is considered important for maintaining global network connectivity, i.e. edges inter-connecting clusters will have high betweenness centrality and edges within clusters will have lower betweenness centrality. The Girvan-Newman algorithm computes the edge betweenness for all edges and removes the edge with the largest $C_B$ from the graph. The removal process is repeated until the graph is partitioned into a desired number of clusters. The Girvan-Newman algorithm performs well for identifying the boundaries of clusters and is easily implementable. However, it has the disadvantage of running in $O(m^2 n)$ time on an arbitrary network with $m$ edges and $n$ vertices, and would therefore be computationally expensive for non-sparse networks. Although it can be applied to weighted graphs, it is designed for unweighted networks since betweenness centrality evaluates shortest paths. Given a fully connected network, the shortest path will always be of length one and betweenness centrality would have no effect on the results. The algorithm also does not consider overlapping clusters, where vertices may belong to

more than one cluster. Finally, connectivity is based on shortest paths rather than all possible paths, thus ignoring much of the available connectivity information, particularly in a weighted network. The problem of overlapping clusters is addressed by an extension of the Girvan-Newman algorithm, the Cluster-Overlap Newman Girvan Algorithm (CONGA), proposed by Gregory et al. [174]. CONGA extends the Girvan-Newman algorithm by incorporating a split betweenness for splitting nodes into various clusters when a node is strongly connected to more than one cluster. CONGA divides the betweenness of every edge by its corresponding weight such that

$$C_B(e_{ij})_{new} = \frac{C_B(e_{ij})}{W_{ij}} \quad (4.2)$$

where $W_{ij}$ is the weight associated with the edge between nodes $v_i$ and $v_j$. This is followed by the calculation of each node's split betweenness. Each node, $v_i$, is split into two nodes, $v_{i1}$ and $v_{i2}$, and an imaginary edge is placed between them. Using Eq. 4.2, the edge betweenness of the imaginary edge is computed; this is referred to as the split betweenness. If there exists a node with split betweenness greater than the edge betweenness of all the edges connected to it, then the node is split. Otherwise, the edge with maximum edge betweenness is removed as in the Girvan-Newman algorithm. The process repeats until all nodes are isolated or until the desired number of clusters is found. CONGA presents its results as a set of community structures $C = \{c_1, c_2, c_3, \ldots, c_k\}$ such that community $c_1$ contains one cluster, community $c_2$ contains two clusters, and community $c_k$ contains $k$ clusters. CONGA addresses the problem of overlapping clusters but still depends on shortest paths to define clusters and is not suitable for fully connected weighted graphs.

Spectral Clustering

Another commonly used method for graph clustering is known as spectral clustering, based on the eigenvectors and eigenvalues of a matrix describing the graph. The spectral decomposition of a

graph can be applied to its adjacency matrix, $A$, Laplacian matrix, $L$, normalized Laplacian matrix, $\mathcal{L}$, and modularity matrix, $B$. The spectral properties can be used for identifying the number of clusters in the network and for identifying memberships within a cluster. Since the matrices that are decomposed are symmetric, they have an orthonormal basis of eigenvectors, $\{u_1, u_2, \ldots, u_n\}$, with real eigenvalues $\lambda_1 \leq \lambda_2 \leq \ldots \leq \lambda_n$ such that $M u_i = \lambda_i u_i$ for all $i$, where $M$ is any symmetric matrix of size $n \times n$. Although spectral clustering can be performed on the adjacency matrix, it is typically performed on the Laplacian of the graph since it provides a better representation of network structure [161]. The Laplacian of a graph is defined as $L = K - A$, where $K$ is the diagonal matrix containing the degree of each vertex, such that

$$L_{ij} = \begin{cases} k_i & \text{if } i = j \\ -1 & \text{if } A_{ij} = 1 \\ 0 & \text{otherwise} \end{cases} \quad (4.3)$$

The Laplacian matrix is positive semi-definite and therefore its eigenvalues are all non-negative. The multiplicity of the eigenvalue $\lambda = 0$ indicates the number of connected components in the graph. Provided we are only interested in connected graphs, the multiplicity of $\lambda = 0$ is 1. It has been shown that the normalized Laplacian, $\mathcal{L}$, provides significantly better clustering results than its unnormalized counterpart [161]. $\mathcal{L}$ is defined as $\mathcal{L} = K^{-1/2} L K^{-1/2}$ such that

$$\mathcal{L}_{ij} = \begin{cases} 1 & \text{if } i = j \\ -\frac{1}{\sqrt{k_i k_j}} & \text{if } i \text{ and } j \text{ are adjacent} \\ 0 & \text{otherwise} \end{cases} \quad (4.4)$$

The normalized Laplacian matrix has eigenvalue properties similar to those of the unnormalized Laplacian matrix but has the additional advantage of bounded eigenvalues, $0 \leq \lambda \leq 2$, where $\lambda = 2$ is indicative of a bipartite component within the graph. It has been shown that the number of clusters within a network is revealed by the largest eigengap [175] [176] [177] [178]. To find the largest eigengap, the eigenvalues are first arranged in descending order, i.e. $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_i \geq \ldots \geq \lambda_N$, where $N$ is the number of nodes in the network. Each successive pair of eigenvalues forms an eigengap, such that the $i$th eigengap is computed as $\lambda_i - \lambda_{i+1}$. The number of communities is then defined by the location of the largest eigengap, e.g. there are $i$ clusters if the $i$th gap is the largest eigengap. Another popular approach to community discovery is the use of the eigenvectors to project the network into a lower-dimensional representation [165] [140] [119] [166]. A common method of utilizing the eigenvectors is to take a subset, $\{u_1, u_2, \ldots, u_c\}$ where $c < n$, of the computed eigenvectors and form an $n \times c$ matrix whose columns are the $c$ eigenvectors. The maximum number of clusters, $k$, which can be detected by any clustering algorithm is then limited to $k \leq c$. The elements along the $i$th row of the matrix represent the projected node vector corresponding to node $v_i$. One of the most popular methods for evaluating a set of eigenvectors to identify cluster membership is the k-means algorithm [162], which minimizes the objective function

$$J = \sum_{j=1}^{k} \sum_{i=1}^{n} \| x_{ij} - c_j \|^2 \quad (4.5)$$

where $\| x_{ij} - c_j \|^2$ is a distance measure between a data point $x_{ij}$, i.e. the $j$th element of the $i$th eigenvector, and the cluster centre $c_j$. The initial $c_j$ is chosen randomly and designated the seed node. However, one significant problem with the k-means method is the uncertainty about the number of

and selection of the eigenvectors to utilize in the process. Another problem pertains to the selection of the initial $c_j$, which can affect the resulting clusters. This remains a challenge in many other eigenvector-based clustering techniques such as fuzzy k-means, generalized Synchronization Cluster Analysis [119], the Ng-Jordan-Weiss algorithm (NJW) [179], and power iteration clustering (PIC) [180]. An alternative to choosing a subset of eigenvectors of a network is to evaluate only one eigenvector for the purpose of bi-partitioning, i.e. identifying a minimal cut of the graph. This eliminates the problem of searching for the optimal set of eigenvectors. According to Holzrichter et al. [181], the optimal minimal cut of a graph matrix is defined by the eigenvector, $u_2$, associated with the second smallest eigenvalue, $\lambda_2$, of the Laplacian or normalized Laplacian matrix. This eigenvector is referred to as the Fiedler vector, $u_F$, and defines a set of two clusters $\{C_1, C_2\}$ where $v_i \in C_1$ if $u_F(i) \geq 0$ and $v_i \in C_2$ if $u_F(i) < 0$. The Fiedler partition can be applied iteratively to each successive partition in order to achieve a clustering with $k > 2$. The partitioning of a graph will be referred to by the syntax FiedlerPartition(G), which splits graph $G$ into two clusters, $G_1$ and $G_2$, such that $G_1 \cup G_2 = V$. The advantage of using the Fiedler vector over other eigenvectors is that the partitions performed on the network are guaranteed to be optimal, compared to identifying multiple splits in one iteration, since the Fiedler vector is the optimal relaxed solution to a min-cut problem. The Fiedler vector also allows one to obtain the hierarchical structure of the network.
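The FiedlerPartition operation described above can be sketched numerically (assuming numpy is available); the two-clique test graph is illustrative.

```python
# Sketch: FiedlerPartition(G) via the unnormalized Laplacian L = K - A.
# The eigenvector of the second-smallest eigenvalue (the Fiedler vector)
# is sign-split into two clusters, as described in the text.
import numpy as np

def fiedler_partition(A):
    """Split the vertex set by the sign of the Fiedler vector of L = K - A."""
    K = np.diag(A.sum(axis=1))
    L = K - A
    vals, vecs = np.linalg.eigh(L)   # eigenvalues returned in ascending order
    u_f = vecs[:, 1]                 # Fiedler vector (lambda_2)
    c1 = [i for i in range(len(A)) if u_f[i] >= 0]
    c2 = [i for i in range(len(A)) if u_f[i] < 0]
    return c1, c2

# Two 4-cliques joined by a single bridge edge (3, 4): the minimal cut
# separates the cliques, and the sign split recovers it (which clique
# gets which sign is arbitrary).
A = np.zeros((8, 8))
for grp in ([0, 1, 2, 3], [4, 5, 6, 7]):
    for a in grp:
        for b in grp:
            if a != b:
                A[a, b] = 1
A[3, 4] = A[4, 3] = 1
c1, c2 = fiedler_partition(A)
print(sorted(c1), sorted(c2))
```

Applying the same routine recursively to each induced subgraph yields the hierarchical decomposition mentioned in the text.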
Finally, the Fiedler partitioning method can be applied to a fully connected weighted matrix without the need for thresholding, as opposed to the Girvan-Newman algorithm.

Consensus Clustering

In many clustering problems, it is common to apply different algorithms to the same data and then use a consensus method to combine the results [182]. Two popular consensus methods are consensus averaging and the HyperGraph-Partitioning Algorithm (HGPA). The first approach

averages $m$ connectivity matrices to obtain $\hat{W} = [\hat{w}_{ij}]$ where

$$\hat{w}_{ij} = \frac{1}{m} \sum_{r=1}^{m} w^{(r)}_{ij} \quad (4.6)$$

A clustering algorithm, e.g. FiedlerPartition(G), can then be applied to $\hat{W}$ to obtain a hierarchical community structure. Inter-subject variability is not considered in this approach. The computational complexity of $O(kN^3)$ is attractive for identifying a community structure for numerous graphs; however, confidence in the resulting clustering is low since the variance between the multiple similarity matrices is neglected. The variance between matrices should be considered in order to decrease the uncertainty that a node pair is randomly assigned to the same cluster. A second option for obtaining a common community structure across multiple graphs is to identify the community structure of each individual graph, and then implement a process which uses the information common across the multiple community structures to identify a global community structure. With the availability of multiple graphs, the nodes within each similarity matrix are first partitioned into a set of clusters $C^{(r)} = \{c^{(r)}_1, c^{(r)}_2, \ldots, c^{(r)}_k\}$, i.e. base clusterings, using a clustering algorithm, where $k$ is restricted to be the same across all graphs. These base clusterings can be used to identify a common community structure using the HyperGraph-Partitioning Algorithm (HGPA) [183]. HGPA treats each cluster across all base clusterings as a hyperedge within a single global graph. The algorithm is a multilevel graph partitioning system which partitions this graph in three steps: 1) compress the graph by collapsing hyperedges, 2) partition the compressed graph using a minimum-cut objective function, and 3) decompress the partitions and repeat the process. HGPA has a computational complexity of $O(kNh)$, where $h$ is the number of hyperedges; however, its overall complexity depends on the total complexity of the clustering algorithms used to obtain the base clusterings.
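The averaging approach of Eq. (4.6) can be sketched as follows (assuming numpy is available); the synthetic subject matrices and the Fiedler-based clustering step are illustrative stand-ins for the real data and for FiedlerPartition(G).

```python
# Sketch: consensus averaging (Eq. 4.6): average m subject connectivity
# matrices, then cluster the mean matrix.  The subject matrices here are
# a synthetic two-module weight pattern plus noise; the clustering step
# is a simple Fiedler sign split standing in for FiedlerPartition(G).
import numpy as np

rng = np.random.default_rng(0)
n, m = 10, 8
base = np.full((n, n), 0.2)
base[:5, :5] = base[5:, 5:] = 0.8            # two modules of 5 nodes each
subjects = []
for _ in range(m):
    W = base + rng.normal(0, 0.05, (n, n))   # per-subject noisy weights
    W = (W + W.T) / 2                        # keep matrices symmetric
    np.fill_diagonal(W, 0)
    subjects.append(W)

W_hat = sum(subjects) / m                    # Eq. (4.6): element-wise mean

L = np.diag(W_hat.sum(axis=1)) - W_hat       # weighted Laplacian of W_hat
u_f = np.linalg.eigh(L)[1][:, 1]             # Fiedler vector
c1 = sorted(i for i in range(n) if u_f[i] >= 0)
c2 = sorted(i for i in range(n) if u_f[i] < 0)
print(c1, c2)
```

As noted in the text, the one-line average discards inter-subject variance; two node sets that co-cluster in every subject and two that co-cluster only on average can produce the same $\hat{W}$.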
HGPA has the disadvantage of generating clusters of approximately

equal sizes, even though in real networks equally sized clusters are unlikely.

Quality Measures for Community Structures

When a community structure is discovered, it is difficult to determine whether it is a suitable representation of the network of interest without a priori knowledge. Therefore, it is necessary to use a metric which quantifies the quality of the discovered community structure. The literature overwhelmingly supports the use of a measure known as modularity. Modularity was developed by Newman et al. [156] to determine whether the number of edges between clusters is lower than expected in a random graph, i.e. a less dense arrangement of inter-connecting edges, and whether the number of edges within clusters is higher than expected in a random graph, i.e. a highly dense arrangement of intra-connecting edges. Modularity compares a detected community structure with a network of equivalent size where edges are randomly placed between node pairs with probability $Z_{ij} = k_i k_j$, where $k_i$ is the degree of node $v_i$. This value of $Z_{ij}$ maintains a fair comparison between the actual network and the random network, such that the expected degree of node $v_i$ in the random network is equivalent to the actual degree of node $v_i$ in $G$, i.e. $\sum_j Z_{ij} = k_i$. Since the random connection between $v_i$ and $v_j$ depends on their degrees in graph $G$, $Z_{ij}$ is modified as $Z_{ij} = \frac{k_i k_j}{2m}$, where the factor of 2 accounts for counting each edge twice since the network is symmetric. If $A$ is the adjacency matrix of $G$, then modularity is proportional to $A_{ij} - Z_{ij}$ summed over all node pairs belonging to the same cluster, such that

$$Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - Z_{ij} \right] \sigma(c_i, c_j) \quad (4.7)$$

where $\sigma(c_i, c_j) = 1$ if nodes $v_i$ and $v_j$ are in the same cluster and 0 otherwise, and $m$ is the number of edges in the network. This can also be rewritten as

Q = (1/2m) Σ_ij [B_ij] σ(c_i, c_j)   (4.8)

where B_ij = A_ij − k_i k_j / (2m) is the modularity matrix. Modularity may be negative or positive, where positive values indicate the possible presence of a community structure. Therefore, one can quantify the quality of a community structure by evaluating the value of Q. The modularity matrix B is a symmetric matrix which may consist of both positive and negative values. However, if all the entries of B are negative, then the corresponding network is considered indivisible, i.e., any clustering algorithm will not reveal a good community structure and Q will be assigned low, possibly negative, values. The modularity matrix B can also be used to perform spectral clustering in the same way as the Laplacian matrix. Modularity for weighted graphs can be defined by modifying Equation 4.8 as

Q = (1/2m) Σ_ij [W_ij − s_i s_j / (2m)] σ(i, j)   (4.9)

such that m is the number of edges, s_i is the strength of node v_i (the sum of the ith row of W), and σ(i, j) = 1 if v_i and v_j are in the same cluster and 0 otherwise. In ideal cases, clusters will not overlap. However, in real-world networks this may not always be the case. Therefore it is imperative to use a modified version of modularity which considers nodes belonging to more than one community. Nicosia et al. [184] proposed an improved modularity measure where a belonging factor α_{i,c} is defined for each node v_i to quantify the strength of node v_i in cluster c: α_{i,c} = 1/s for i ∈ c, where s is the number of clusters the node belongs to. Next, a coefficient defining the belonging strength of an edge within a particular cluster, c, is calculated as

β(e_ij, c) = 1 / [(1 + e^{−f(α_{i,c})})(1 + e^{−f(α_{j,c})})]

where f(α_{i,c}) is a linear scaling function, defined as f(x) = 60x − 30 in the literature. The modified modularity is defined as

Q = (1/2m) Σ_c Σ_ij [β(e_ij, c) w(e_ij) − β²(e_ij, c) k_i k_j / (2m)] σ(c_i, c_j)   (4.10)

If no clusters overlap, then β(e_ij, c) = 1 and the extended version of modularity produces the same results as the original modularity presented in Eq. (4.8).

4.3 Hierarchical Modular Detection Algorithms for Weighted Neuronal Networks

In this section, two methods for evaluating the modular structure of a fully connected weighted network are presented. The first combines CONGA and Nicosia's modularity in an iterative manner to identify the best overlapping modular structure of neuronal data. The second algorithm is a hierarchical consensus clustering method which utilizes the properties of the Fiedler vector for identifying a community structure common across multiple neuronal data sets.

4.3.1 Overlapping Community Structures in Weighted Networks

Modularity has been established as a reliable method to determine whether a community structure is optimal. In the case of weighted and symmetric graphs representing the functional connectivity between neuronal populations, several problems exist in determining the optimal cluster arrangement. First, clustering algorithms typically require fully connected weighted graphs to be mapped to binary graphs by thresholding. Thresholding retains the important information pertaining to possible clusters while filtering out the unimportant information. As discussed in previous chapters, choosing a threshold which optimally represents the connectivity structure of a network remains a challenge. Sparsely weighted graphs may be evaluated by Girvan-Newman and CONGA by scaling edge betweenness by the weight of the edge; however, this idea fails for non-sparse or

fully connected weighted matrices, as is the case for neuronal data where each pair of regions is assigned a weighted relationship. These methods fail since all edges would be considered equally important according to the definition of edge betweenness centrality. Therefore, thresholding is an important issue, and discovering an optimal threshold which preserves the cluster information hidden within the data is necessary. Second, most clustering algorithms also assume a node does not participate in multiple clusters. In real-world networks, the expectation of a community structure in which clusters are not allowed to overlap is an unreasonable assumption. Overlapping clusters are increasingly expected as the network becomes more complex or larger in size [185]. Clusters within real-world networks, such as neural assemblies in the brain, must interact with each other in order for the network to function properly. These interactions may be due to nodal members common to multiple clusters. CONGA is one of the few clustering algorithms which considers overlapping clusters, and therefore it is the basis of this optimal cluster identification process. Nicosia's modularity is applied to evaluate the resulting overlapping community structures. Considering these challenges, the goal is to identify a single sparse binary mapping of the weighted data which results in the optimal community structure based on threshold (T), number of clusters (k), and modularity (Q). We present an exhaustive search over the threshold and the number of clusters, and determine the best combination of the two based on Nicosia's modularity, since this permits us to evaluate a hierarchical soft clustering of the data. Using CONGA, the nodes of the weighted matrix are categorized into 2 ≤ K ≤ k clusters for each binary mapping generated by T, where 0 ≤ T ≤ 1.
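The search above scores candidate structures with modularity. As a simpler, non-overlapping reference point, the weighted modularity of Eq. (4.9) for a hard partition can be sketched as follows (an illustrative toy, not the dissertation's implementation):

```python
import numpy as np

def weighted_modularity(W, labels):
    """Q = (1/2m) * sum_ij [W_ij - s_i * s_j / 2m] * delta(c_i, c_j), Eq. (4.9)."""
    s = W.sum(axis=1)                          # node strengths (row sums)
    two_m = W.sum()                            # 2m for a symmetric weight matrix
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]  # delta(c_i, c_j)
    B = W - np.outer(s, s) / two_m
    return (B * same).sum() / two_m

# Two triangles joined by a single edge; partitioning along the triangles
# gives the textbook value Q = 5/14.
W = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
Q = weighted_modularity(W, [0, 0, 0, 1, 1, 1])  # -> 5/14 ~ 0.357
```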
The contribution of each node to the community structures is defined by an N × N participation matrix PM_{T,c_k}(i, j) = σ(c_i^k, c_j^k) for each community structure generated by each T, where σ(c_i^k, c_j^k) = 1 if nodes v_i and v_j participate in the same clusters within the community structure c_k and 0 otherwise. Modularity, Q_T, is also computed for each thresholded graph. A participation score for each node v_i within c_k is computed as PS_{T,c_k}(i) = (1/(N−1)) Σ_{j=1}^N PM_{T,c_k}(i, j), which indicates how central each node is within its respective cluster. This is applied to all PM matrices generated for the network. Next, all PS_{T,c_k} are categorized according to the degree range, R, within which the associated threshold T falls. This generates a set of degree ranges, {R_1, R_2, ..., R_j} ∈ R, where j is the number of degree-range categories; the scores within each range are averaged together and stacked side by side with all c_k to form the N × k participation score matrix PS(R_j). The modularity values are averaged to obtain a 1 × k modularity vector such that Q_k(R_j) = (1/|T_{R_j}|) Σ_{T ∈ R_j} Q(T). The following steps summarize the evaluation procedure.

Step 1: Acquire the fully connected weighted matrix, G = (V, E), where {w_ij ∈ E | v_i, v_j ∈ V}.

Step 2: Iteratively threshold G by T, 0 ≤ T ≤ 1, to obtain a binary adjacency matrix (Figure 4.1).

Figure 4.1: Weighted matrix to adjacency matrix for a given threshold
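The binary mapping of Step 2 (Figure 4.1) is a simple element-wise comparison; a minimal sketch (the ≥ convention for keeping an edge is an assumption of this illustration):

```python
import numpy as np

def threshold_graph(W, T):
    """Map a weighted matrix to a binary adjacency matrix: keep edges with w_ij >= T."""
    A = (W >= T).astype(int)
    np.fill_diagonal(A, 0)  # no self-loops
    return A

W = np.array([[0.0, 0.9, 0.3],
              [0.9, 0.0, 0.6],
              [0.3, 0.6, 0.0]])
A = threshold_graph(W, 0.5)  # keeps only the 0.9 and 0.6 edges
```

Sweeping T from 0 to 1 produces the family of binary graphs over which the exhaustive search operates.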

Step 3: Apply CONGA and acquire K clusters such that 2 ≤ K ≤ k.

Step 4: Compute Nicosia's modularity for each community structure (Equation 4.10).

Step 5: Obtain participation matrices and participation scores across thresholds and different community structures, K (Figure 4.2).

Figure 4.2: Participation Scores of various clusterings derived from various thresholds of the network

Step 6: Average participation scores based on the corresponding average degree, R (Figure 4.3). The final participation score for node v_i is defined as

PS(i) = (1/(k(N−1))) Σ_j PS_i(R_j)   (4.11)

Figure 4.3: Average Participation scores in terms of network degree

A large PS(i) is indicative of a node which participates in overlapping clusters with high frequency, i.e., a possible central hub of the network. Next, to evaluate average participation scores for the entire network with respect to degree range R_j and community structure c_k, compute

P(R_j) = Σ_{i=1}^N PS_i(R_j)   (4.12)

and form the j × k participation matrix, PSmat, where the jth row is P(R_j). Finally, to evaluate optimal modularity, the 1 × k modularity vectors of each degree range R_j, containing the average modularity for each community structure, Q_k(R_j), are assembled to form a j × k average modularity matrix. Maximum modularity can then be determined with respect to degree range R_j and community structure c_k.
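Under the definitions above, the participation matrix and per-node participation score for one community structure can be sketched as follows. Since CONGA allows overlap, memberships are sets; reading σ as "share at least one cluster" is an assumption of this sketch, and all names are illustrative:

```python
import numpy as np

def participation_scores(memberships):
    """memberships[i] = set of cluster ids node i belongs to (overlap allowed).

    PM(i, j) = 1 if nodes i and j share at least one cluster (one reading of
    sigma in the text); PS(i) = (1/(N-1)) * sum_{j != i} PM(i, j).
    """
    N = len(memberships)
    PM = np.zeros((N, N), dtype=int)
    for i in range(N):
        for j in range(N):
            if i != j and memberships[i] & memberships[j]:
                PM[i, j] = 1
    PS = PM.sum(axis=1) / (N - 1)
    return PM, PS

# Node 1 overlaps both clusters, so it co-participates with every other node
members = [{0}, {0, 1}, {1}, {1}]
PM, PS = participation_scores(members)  # PS[1] is the largest score
```

A node sitting in many overlapping clusters, like node 1 here, scores highest, which is exactly the "possible central hub" interpretation of a large PS(i).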

4.3.2 Hierarchical Spectral Partitioning Algorithm for Identifying a Common Community Structure among Multiple Subjects

Since the literature has consistently demonstrated that the Fiedler vector is highly effective in partitioning graphs into clusters [186, 187], in this study we use the Fiedler vector to perform consensus clustering across multiple weighted graphs. The original connectivity matrices are bi-partitioned into two clusters using the Fiedler partitioning method. This results in a cluster matrix T^r such that

T^r(i, j) = 1 if nodes v_i, v_j are in the same cluster, and 0 otherwise,

where r = {1, 2, ..., m} and m is the number of connectivity matrices evaluated, e.g., the number of subjects in the brain study. In order to find the common community structure across multiple graphs, we introduce a co-occurrence matrix P where

P(i, j) = (1/m) Σ_{r=1}^m T^r(i, j)

and P(i, j) ∈ [0, 1]. P(i, j) is the probability that a pair of nodes are members of the same cluster across the multiple graphs. The similarity matrix reflects the strength of a direct relationship between a node pair, whereas P reflects the likelihood that a pair of nodes are in the same cluster across all subjects. The Laplacian matrix of P is computed and the Fiedler vector is found to form a bi-partition of P into a community structure composed of clusters c_1 and c_{−1}. Since P represents the probability that a node pair should be clustered together, the Fiedler partition of P represents the community structure common to all graphs. The initial partition set, C = {c_1, c_{−1}}, contains 2 clusters, but if k > 2 is desired, the process can be repeated by selecting a cluster in C to partition. In this case, c_1 or c_{−1} may be selected based on a user-defined cost function. Next, sub-matrices are extracted from the original connectivity matrices such that they only contain the nodes of the chosen cluster.
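The Fiedler bi-partition used throughout this section reduces to splitting nodes by the sign of the eigenvector paired with the second-smallest Laplacian eigenvalue. A minimal sketch, assuming a connected graph (names illustrative):

```python
import numpy as np

def fiedler_partition(W):
    """Bi-partition a weighted graph by the sign of the Fiedler vector.

    L = D - W; the eigenvector of the second-smallest eigenvalue splits V
    into V1 (non-negative entries) and V2 (negative entries).
    """
    D = np.diag(W.sum(axis=1))
    L = D - W
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    u_f = eigvecs[:, 1]                   # Fiedler vector
    V1 = [j for j in range(len(W)) if u_f[j] >= 0]
    V2 = [j for j in range(len(W)) if u_f[j] < 0]
    return V1, V2

# Two dense blocks joined by one weak edge split cleanly
W = np.zeros((6, 6))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0)
W[2, 3] = W[3, 2] = 0.1
V1, V2 = fiedler_partition(W)
```

Note that the eigenvector's overall sign is arbitrary, so which block lands in V1 versus V2 can flip between runs or libraries; only the split itself is meaningful.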

Algorithm 5 Probabilistic Fiedler Clustering Algorithm
1: Input: m N × N dimensional graphs, G = {G^1, G^2, ..., G^m}, with vertices V = {v_1, v_2, ..., v_N} and edges E^r = {w^r_ij : v_i, v_j ∈ V} such that G^r = (V, E^r) and r = {1, 2, ..., m}.
2: Input: Number of clusters, k.
3: Output: k clusters C = {c_1, c_2, ..., c_k} where c_j ⊆ V.
4: C = ∅
5: for t = 2 to k do
6:   P = 0_{|V| × |V|}
7:   for r = 1 to m do
8:     Extract the submatrix Ĝ^r ⊆ G^r such that Ĝ^r = (V, E^r)
9:     (V_1, V_2) = FiedlerPartition(Ĝ^r)
10:    P(i, j) = P(i, j) + T^r(i, j), where T^r(i, j) = 1 if v_i, v_j ∈ V_1 or v_i, v_j ∈ V_2, and 0 otherwise
11:  end for
12:  P = P / m
13:  (V_1, V_2) = FiedlerPartition(P)
14:  C = C ∪ {V_1, V_2}
15:  if t ≠ k then
16:    V = c_i where i = argmax_q {H_{c_q}} and c_q ∈ C
17:    E^r = {w^r_ij : v_i, v_j ∈ V}
18:    C = C \ {c_i}
19:  end if
20: end for

These sub-matrices are used to derive the new sub-co-occurrence matrix, P^y, where y = 1 if cluster c_1 was selected and y = −1 if cluster c_{−1} was selected. The Fiedler partition results in two new clusters, c_1^y and c_{−1}^y. The final cluster set is C = {c_{−y}, c_1^y, c_{−1}^y}, which is a concatenation of the two new clusters with the original cluster that was not chosen for bi-partitioning. Algorithm 5 describes this process for obtaining community structures for a given number of clusters. The algorithm, in this case, implements the homogeneity cost function, H_{c_q}, for selecting sub-clusters for bi-partitioning. Homogeneity is a quality function which will be described further in the next section.
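The co-occurrence step at the heart of Algorithm 5 — accumulating T^r over subjects and averaging — can be sketched as follows, taking each subject's bi-partition as given (an illustrative fragment, not the full iterative algorithm):

```python
import numpy as np

def cooccurrence_matrix(partitions, N):
    """P(i, j): fraction of subjects whose bi-partition puts nodes i and j together.

    partitions: list of (V1, V2) node-index pairs, one per subject.
    """
    P = np.zeros((N, N))
    for V1, V2 in partitions:
        T = np.zeros((N, N))
        for side in (V1, V2):       # T^r(i, j) = 1 if i, j fall on the same side
            for i in side:
                for j in side:
                    T[i, j] = 1
        P += T
    return P / len(partitions)

# Three subjects; two agree on the split of 4 nodes, one dissents
parts = [([0, 1], [2, 3]), ([0, 1], [2, 3]), ([0, 2], [1, 3])]
P = cooccurrence_matrix(parts, 4)
```

Running the Fiedler partition on P (line 13 of Algorithm 5) then yields the split best supported across subjects, rather than the split of any one subject.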

Algorithm 6 Fiedler Partition
1: Input: graph G = (V, E).
2: Output: Vertex sets V_1 and V_2.
3: Compute the Laplacian matrix, L, of G.
4: Compute the |V| eigenvectors, u, and eigenvalues, λ, of L.
5: Order the eigenvalues in ascending order: λ_1 ≤ λ_2 ≤ ... ≤ λ_{|V|}.
6: u_F = u_i where i = argmin_q {λ_q : λ_q ≠ 0}.
7: for j = 1 to |V| do
8:   Assign v_j to V_1 if u_F(j) ≥ 0, and to V_2 if u_F(j) < 0.
9: end for

4.4 New Weighted Graph Clustering Quality Measures

The primary problem with iteratively dividing a network is determining the optimal number of clusters. The most commonly used optimization measure, modularity [156], compares a community structure to the expected community structure of a single random graph such that there exists a high number of edges within clusters and a low number of edges between clusters. Unfortunately, modularity does not always attain its highest value for the true community structure [188] and can reveal a suboptimal structure, which may be due to the simplicity of the random model computed through s_i s_j / (2m) in Equation 4.9. In this section we propose alternative measures for quantifying the quality of a cluster partition across weighted networks.

4.4.1 Comparing Graph Partitions to Multiple Random Networks

In this section, it is proposed that the detected weighted community structure be compared to numerous randomly weighted community structures. It is expected that vertex pairs connected with large weights are more likely to be in the same cluster than pairs connected with a low weight. A community matrix is defined as

C^k(i, j) = W_ij if v_i and v_j are in the same cluster, and 0 otherwise   (4.13)

where k is the number of clusters. The random community matrix is obtained by randomly assigning each node to one of the k clusters and computing Equation 4.13 for the random assignments to obtain C^(k,p)_rand, where p = {1, 2, ..., r} indexes the permutations. The membership size of each cluster may differ between the random model and the detected community structure for each p, but the number of clusters must be the same. A large p is recommended so that numerous random clustering possibilities are evaluated. The average intra-cluster weight corresponding to each clustering matrix is computed as

Ĉ^k = (1/(N(N−1))) Σ_ij C^k(i, j)   (4.14)

and

Ĉ^(k,p)_rand = (1/(N(N−1))) Σ_ij C^(k,p)_rand(i, j)   (4.15)

Averaging over all p, Ĉ^k_rand = (1/p) Σ_p Ĉ^(k,p)_rand provides the random intra-cluster value. Then, k is found such that Q = Ĉ^k − Ĉ^k_rand is maximum. A maximum value indicates that the vertices within a cluster share significantly larger weights than if they were clustered with any other set of vertices. This method relies heavily on weights and demonstrates how the weight distribution is a significant contributor to community structure.
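The comparison in Eqs. (4.13)-(4.15) can be sketched as follows. The random baseline here assigns nodes to k clusters uniformly at random, which is one reading of the text; all names are illustrative:

```python
import numpy as np

def avg_intra_weight(W, labels):
    """Eq. (4.14): average weight over node pairs assigned to the same cluster."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)
    N = len(W)
    return (W * same).sum() / (N * (N - 1))

def random_baseline(W, k, p, rng):
    """Eq. (4.15) averaged over p random node-to-cluster assignments."""
    N = len(W)
    vals = [avg_intra_weight(W, rng.integers(0, k, size=N)) for _ in range(p)]
    return float(np.mean(vals))

rng = np.random.default_rng(0)
W = np.zeros((6, 6))
W[:3, :3] = 0.9
W[3:, 3:] = 0.9
np.fill_diagonal(W, 0)
W += 0.1 * (W == 0)          # weak inter-cluster weights
np.fill_diagonal(W, 0)
# Q = C_hat^k - C_hat^k_rand: positive when the detected clusters beat chance
Q = avg_intra_weight(W, [0, 0, 0, 1, 1, 1]) - random_baseline(W, k=2, p=200, rng=rng)
```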

4.4.2 Homogeneity and Completeness

By the definition of a cluster, the pairwise connections within a cluster must be stronger than the inter-cluster connections. In this section, measures that evaluate the quality of a particular clustering structure based on the distributions of the inter-cluster and intra-cluster edges across m graphs are proposed. These distributions are defined similarly to probability mass functions (pmfs). Prior to defining the pmfs of intra-cluster and inter-cluster edges, a function that maps the continuous edge values to a discrete alphabet is defined as f : W_i^(r) → S_i^(r), where W_i^(r) = {w_ij^(r) ∈ [0, 1]} refers to the ith row of the rth similarity matrix, S_i^(r) = {s_ij^(r) ∈ {1, 2, ..., N}}, and r = {1, 2, ..., m}. The elements of each row of the connectivity matrix across subjects are mapped to discrete integer values between 1 and N to eliminate the variation of edge strengths across subjects and extract only relational information about the pairwise edge strengths. The rank function is proposed to perform this mapping, such that the node pair with the largest similarity measure is assigned 1 and the weakest node pair is assigned N. When a particular cluster set C = {c_1, c_2, ..., c_k} is identified, the probability mass function of intra-cluster ranks for a particular cluster, c_t, is defined as

P^intra_{c_t}(β) = F^intra_{c_t}(β) / Σ_{β=1}^N F^intra_{c_t}(β)   (4.16)

where

F^intra_{c_t}(β) = Σ_{r=1}^m Σ_{i,j} δ(s^r_ij, β),  v_i, v_j ∈ c_t   (4.17)

with t ∈ {1, 2, ..., k}, β ∈ {1, 2, ..., N}, and δ(x, y) = 1 if x = y and 0 otherwise. Similarly, the probability mass function of inter-cluster ranks for cluster c_t is defined as

P^inter_{c_t}(β) = F^inter_{c_t}(β) / Σ_{β=1}^N F^inter_{c_t}(β)   (4.18)

where

F^inter_{c_t}(β) = Σ_{r=1}^m Σ_{i,j} δ(s^r_ij, β),  v_i ∈ c_t; v_j ∉ c_t   (4.19)

The distribution of edge ranks within a cluster is expected to be concentrated around smaller values compared to the distribution of ranks between clusters, since this would be indicative of optimal segregation in a community structure. This is quantified using a one-sided Mann-Whitney U test such that the relationship between the medians, x̃, is expected to be x̃(P^intra_{c_t}) < x̃(P^inter_{c_t}) with a significance of p ≤ α. A score is assigned to cluster c_t as

Q_{c_t} = 1 if x̃(P^intra_{c_t}) < x̃(P^inter_{c_t}), and 0 otherwise   (4.20)

This is repeated for all k clusters and an average quality score is computed as Q = (1/k) Σ_{t=1}^k Q(c_t). Q will always be within the range of 0 and 1, where values closer to one indicate that most clusters contain strong within-cluster relationships and weak between-cluster relationships across all graphs.

The proposed quality measure ensures intra-cluster relationships are significantly stronger than inter-cluster relationships. However, the next step is to quantify homogeneity and completeness, two principal characteristics which further determine the quality of a clustering. A homogeneous cluster contains only data points which belong to the same class, while a complete cluster contains all possible data points within the sample space (Figure 4.4).

Figure 4.4: Variations in homogeneity and completeness where the true number of modules is 3; a) High homogeneity and low completeness; b) High completeness and low homogeneity; c) High homogeneity and completeness.

Homogeneity and completeness are approximately inversely proportional to each other, such that a cluster with high (low) homogeneity will have low (high) completeness. Similar measures, such as the F-measure [189] and V-measure [190], have been used in the literature to quantify the quality of a cluster. Both measures, however, require a priori knowledge of class labels. In most cases, this ground truth is unknown and therefore alternative measures of cluster accuracy are needed.
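The rank-based quality score Q of Eq. (4.20) can be sketched with scipy's one-sided Mann-Whitney U test. This sketch operates on the raw rank samples rather than on the pmfs, which is equivalent for the median comparison (an assumption of this illustration; names are not from the dissertation):

```python
import numpy as np
from scipy.stats import mannwhitneyu, rankdata

def cluster_quality(S_list, clusters, alpha=0.01):
    """Fraction of clusters whose intra-cluster ranks are significantly
    smaller than their inter-cluster ranks (rank 1 = strongest edge)."""
    score = 0
    for c in clusters:
        intra, inter = [], []
        for S in S_list:                 # one rank matrix per subject
            for i in c:
                for j in range(len(S)):
                    if i == j:
                        continue
                    (intra if j in c else inter).append(S[i][j])
        stat, p = mannwhitneyu(intra, inter, alternative='less')
        score += (p <= alpha)
    return score / len(clusters)

# Rank each row of a toy similarity matrix (1 = most similar)
W = np.zeros((6, 6))
W[:3, :3] = 0.9
W[3:, 3:] = 0.9
np.fill_diagonal(W, 0)
W += 0.1 * (W == 0)
np.fill_diagonal(W, 0)
S = np.vstack([rankdata(-row) for row in W])
Q = cluster_quality([S], [{0, 1, 2}, {3, 4, 5}], alpha=0.05)
```

With both clusters clearly segregated, every per-cluster test passes and Q reaches its maximum of 1.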

For this reason, the fact that homogeneity is inversely proportional to the variance of edge ranks within a cluster is exploited. If a particular cluster is homogeneous, then we would expect the pairwise connection strengths among the members of that cluster to be close to each other, implying that the ranks of the weights have small variance. We propose to quantify this variation through a measure of normalized entropy, such that the lowest homogeneity score is obtained for a cluster containing a uniform representation of ranks, or large variation among the edge weights, and the maximum homogeneity score is obtained for a cluster containing only one rank. Therefore, a normalized entropy measure of the cluster's intra-cluster rank distribution is indicative of homogeneity. Homogeneity is defined as

H_{c_t} = 1 − H(P^intra_{c_t}) / log₂ N

and will be maximum when H(P^intra_{c_t}) << log₂ N. Similarly, we define a metric to quantify completeness using the relative entropy between the inter-cluster rank and intra-cluster rank distributions. The rank distributions for inter-cluster and intra-cluster edges are expected to differ from each other to maximize completeness. The similarity between two distributions is commonly quantified using divergence measures. Here, we propose to use the Jensen-Shannon divergence measure [191] since it is symmetric and bounded. With respect to completeness, a divergence measure approaching 1 is synonymous with increased completeness. Completeness of a cluster is therefore defined as C_{c_t} = JS(P^intra_{c_t}, P^inter_{c_t}), where

JS(p, q) = (1/2) [ Σ_i p(i) log₂( p(i) / (0.5(p(i) + q(i))) ) + Σ_i q(i) log₂( q(i) / (0.5(p(i) + q(i))) ) ]

with p = P^intra_{c_t} and q = P^inter_{c_t}. The average completeness, Ĉ = (1/k) Σ_{t=1}^k C_{c_t}, and the average homogeneity, Ĥ = (1/k) Σ_{t=1}^k H_{c_t}, are computed across all clusters. An approach to optimize both homogeneity and completeness is to maximize a cost function,

λ = Ĉγ + Ĥ(1 − γ)   (4.21)

where γ ∈ [0, 1], for the whole community structure, while maintaining an acceptable user-defined value of Q.

4.5 Results

4.5.1 Overlapping Clusters in the Brain Network During the Error-Related Negativity

Since the data are in the form of fully connected weighted matrices, the proposed optimization clustering process using CONGA is well suited, since it evaluates a range of thresholds to identify an optimal community structure. First, the phase synchrony data from all 90 subjects were averaged together to form a single Error and a single Correct response matrix. Then, the network was evaluated for 2 ≤ k ≤ 10 clusters, and the degree range, R, and cluster size, k, which resulted in different modularity values, Q, were identified. Finally, the difference between the participation scores belonging to the two response types, Error and Correct, for each degree range is found as C−E = ||PSmat_error − PSmat_correct||_F.

Table 4.1: Top 3 electrodes with largest differences between Correct and Error participation scores

Degree Range, R   ||C−E||_F   Electrode 1   Electrode 2   Electrode 3
                              F8            AF3           CP
                              FC4           AF8           AF
                              CP2           CP4           FT
                              P2            AF8           C
                              F1            P2            F
                              F6            F1            FC4

Figures 4.5 and 4.6 are plots of modularity versus degree range R_j and cluster size k for Correct and Error responses, respectively. Clusters with high modularity were observed for sparsely connected graphs, R ≤ 25, and for smaller numbers of clusters, k ≤ 8. This is an expected result since

sparsely connected binary mappings of the phase synchrony index computed from EEG exhibit small-world network characteristics and therefore result in tightly formed clusters [192]. Thus, a high modularity would be expected for these lower degree ranges, where an underlying organized structure is observed. A two-sample t-test was applied for each electrode to determine the electrodes which showed significant differences between Error and Correct based on the participation score. Table 4.1 lists the top three electrodes with respect to the degree ranges that demonstrate the most significant (p ≤ 0.05) difference between Correct and Error responses. The table shows that the largest differences occur for degrees 21 ≤ R ≤ 25 and 26 ≤ R ≤ 30. The electrodes that demonstrated the most significant difference between Correct and Error responses are FC4, AF8, AF3, CP2, CP4, and FT7, corresponding to the frontal and central regions of the brain, in accordance with prior work [19]. Finally, the optimal size, k, for a clustering community was determined by finding the k which maximizes Σ_{i=1}^N (PS^C_{R,c_k}(i) − PS^E_{R,c_k}(i)) in 21 ≤ R ≤ 25 for 2 ≤ k ≤ 10, and is found to be equal to 6. The clustering communities corresponding to the largest difference between Correct and Error responses are shown in Figures 4.7 and 4.8. These optimal clusters have an average modularity of 0.5 for Correct responses and 0.45 for Error responses. It is observed that many of the nodes participate in more than two clusters during both Correct and Error responses. The plots associated with the Correct response demonstrate global, non-localized participation of neural regions during the task response test, as demonstrated by the large overlapping clusters spread around the topographic line plots. However, the plot of Figure 4.8b strongly indicates that the neural assemblies located around the anterior cingulate cortex formed a highly synchronized cluster specifically in response to the ERN. As expected, this region shows the primary distinguishing graph features between Correct data and Error data, since the ACC would only be activated during an error response.
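For reference, the Section 4.4.2 scores that drive this optimization can be computed numerically. A sketch for a single cluster's rank pmfs, with toy pmfs and illustrative function names:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability bins."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def homogeneity(p_intra, N):
    """H_c = 1 - H(P_intra) / log2(N): 1 for a single-rank cluster, 0 for uniform."""
    return 1.0 - entropy(p_intra) / np.log2(N)

def js_divergence(p, q):
    """Symmetric, bounded Jensen-Shannon divergence (base 2, so JS in [0, 1])."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)
    return 0.5 * (sum(pi * np.log2(pi / mi) for pi, mi in zip(p, m) if pi > 0)
                  + sum(qi * np.log2(qi / mi) for qi, mi in zip(q, m) if qi > 0))

N = 4
p_intra = [1.0, 0.0, 0.0, 0.0]       # all intra-cluster edges share one rank
p_inter = [0.0, 0.0, 0.5, 0.5]       # inter-cluster edges sit at weaker ranks
h = homogeneity(p_intra, N)          # -> 1.0 (perfectly homogeneous)
c = js_divergence(p_intra, p_inter)  # -> 1.0 (disjoint supports)
```

Averaging h and c over all clusters and combining them via λ = Ĉγ + Ĥ(1 − γ) reproduces the cost function of Eq. (4.21).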

Therefore, the proposed optimization technique identified the distinguishing features of neuronal data represented by fully connected weighted matrices.

4.5.2 Hierarchical Spectral Consensus Clustering Analysis

In this section, the proposed Fiedler Consensus Algorithm is implemented for revealing the hierarchical modular decomposition across multiple graphs. The optimal modular organization is identified by maximizing the quality measure Q, the homogeneity Ĥ, and the completeness score Ĉ proposed in Section 4.4.2. First, the results obtained for a set of simulated networks are compared to those obtained through averaging the similarity matrices and HGPA. Next, the proposed measures are used to identify a modular structure which best describes the multivariate relationships across multiple subjects from connectivity graphs obtained from EEG data. The parameters of the quality measure, Q, and cost function, λ, were set to α = 0.01 and γ = 0.5, respectively.

Comparisons with Averaging and Conventional Voting Consensus Measures

Ninety simulated networks consisting of 64 nodes and composed of 8 modules were generated to evaluate the Fiedler consensus method against two other consensus clustering methods, averaging and HGPA. A mixing parameter η, 0 ≤ η ≤ 1, was selected such that the difficulty of identifying the true modular structure increases with decreasing η. The pairwise elements of each of the simulated connectivity matrices ranged between 0 and 1. The weights of the intra-modular edges were drawn from a log-normal distribution with mean µ_intra = 0.8 and standard deviation σ_intra = 0.1. The weights of the inter-modular edges were drawn from a log-normal distribution with mean µ_inter = (1 − η)µ_intra and standard deviation σ_inter = 0.1. Across all networks, each module had 8 members. Using the proposed Fiedler Consensus Algorithm, the averaging method, and HGPA, the networks were evaluated for 2 ≤ k ≤ 20 modules. The best modular structure was selected where Q and λ were maximum. Table 4.2 reports the number of modules identified by each method with respect to varying degrees of inter-modular strength. The proposed approach performed best at identifying the correct number of modules for η = 1, 0.5, 0.1, while averaging and HGPA were increasingly less accurate. Therefore, the Fiedler Consensus Algorithm, along with the maximization of Q and λ, is suitable for identifying the optimal number of modules in a network.

Table 4.2: Number of modules identified through optimization of Q and λ in simulated data.

Method/η   Fiedler Method   Averaging   HGPA

Hierarchical Modular Structure of the Brain During the Error-Related Negativity

The new time-varying measure of phase synchrony, in conjunction with graph measures, was applied to a set of EEG data containing the error-related negativity (ERN). First, the proposed Fiedler consensus approach was applied to the sets of Error and Correct data in order to identify an optimal modular structure. Hierarchical decomposition of the networks stopped when Q and λ were maximized. Optimization revealed that Error responses were best represented by a structure composed of 4 modules (Figure 4.9), with Q = 1 and λ = 0.17, and Correct responses by 2 modules (Figure 4.10), with Q = 1. Figures 4.11 and 4.12 show sparser versions of Figures 4.9 and 4.10 in which only node pairs found to possess significantly greater phase synchrony (Wilcoxon sign-rank, p ≤ 0.05) during Error responses compared to Correct responses are shown. The optimal modular organization describing Error responses consists of three distinct modules located around the mpfc and lpfc brain regions. The appearance of the central module is driven by increased mpfc activity, which may be indicative of the activation of a monitoring system, the anterior

cingulate cortex (ACC), which signals the need for enhanced cognitive control during conditions of conflict or after an error. The optimal modular organization of the brain during Correct responses does not indicate any segregation among nodes in the frontal region; the only partition is between the parietal and frontal regions of the brain. Since an error was not committed, activation of the ACC did not occur and the brain network therefore forewent the need for increased cognitive control.

Inter- and Intra-Edge Phase Synchrony

To further assess the statistical significance of the resulting modular structures of the brain, Q and λ were computed for random connectivity matrices. A random matrix was generated by randomly shuffling the values of each of the original connectivity matrices while preserving the average degree of the network. One hundred random matrices were generated for each subject across Correct and Error responses. The random connectivity matrices were partitioned into 2 ≤ k ≤ 10 clusters using the Fiedler Consensus Algorithm, and the resulting Q, λ, and weighted modularity values were compared against those of the Correct and Error response modular networks. The test indicated that the Q and λ resulting from the Correct and Error modular networks were greater than 100% of the corresponding random measures. Furthermore, no discernible modular structure was observed in any of the random graph partitions. Next, average intra-modular and inter-modular edge phase synchronies were computed for the Correct and Error matrices and are presented in Figures 4.13 and 4.14. These plots describe the segregated and integrated processes occurring in the brain. Strong functional connectivity is expected within modules; although weaker, connectivity is also expected between modules in order to facilitate the integration of information processing. Evaluation of the Error response cross-module average synchrony matrix indicates that the mpfc module shares greater (p ≤ 0.05) phase

synchronization with the left lpfc and right lpfc modules than with the parietal module. There is less cross-module integration between the Correct response modules. Furthermore, increased functional connectivity within the mpfc region during the ERN is illustrated by higher intra-modular synchrony compared to the other three modules (p ≤ 0.01).

Local Hubs of the Brain Network

Finally, the within-module degree and participation coefficient were computed for all nodes across subjects. Figure 4.15 shows the average differences between the measures for Error and Correct responses. The within-module degree was significantly greater (p ≤ 0.05) for Error along the far left and right lateral prefrontal cortex regions as well as in the medial prefrontal cortex. This increase in phase synchrony intensity may indicate greater enforcement of cognitive control by the mpfc. The participation coefficient was significantly greater (p ≤ 0.01) in Error across the entire prefrontal region of the brain. This may be indicative of increased integration of information processing between the monitoring and control regions of the brain.

4.6 Conclusions

In this chapter, two new approaches for community detection in fully connected weighted graphs, which represent the multivariate relationships among neural populations, were presented. First, using CONGA in an optimization process, a method was proposed to identify an optimal community structure composed of overlapping clusters, based on thresholding and a recent definition of modularity. The proposed implementation of CONGA permitted modules to overlap and identified a hierarchical decomposition of the brain network. Furthermore, our proposed algorithm makes it possible to identify the optimal modular structure representative of human brain connectivity without a priori information on the number of clusters or cluster size.
A hierarchical spectral partitioning algorithm using the Fiedler vector for discovering a common community structure across multiple subjects was also presented. By considering the cluster tendencies of nodes across multiple subjects, a community structure representative of all evaluated data can be generated. This approach is an improvement over conventional methods such as averaging, which neglects the variance information contained across data sets. Finally, alternative measures to the modularity function, an indicator of cluster quality, were presented and shown to be effective in evaluating community structures identified in weighted networks. The proposed measures were applied to the set of EEG data representing the error-related negativity and revealed that neural assemblies located around the anterior cingulate cortex tend to participate in a common cluster. Both the CONGA optimization process and the Fiedler algorithm show that node FCz is at the center of this cluster only during Error responses. This is indicative of a more inhomogeneous structure in the frontal region of the brain during Error responses compared to Correct responses in the task-related test. This observation can be further explored by focusing the analysis on the frontal region of the brain. Data associated with the parietal region of the brain may contribute noise to the clustering algorithms, and therefore ignoring these regions will most likely reveal further hierarchical levels of the community structure.

Figure 4.5: Modularity with respect to degree and number of clusters for Correct responses (axes: average degree range R vs. number of clusters k).

Figure 4.6: Modularity with respect to degree and number of clusters for Error responses (axes: average degree range R vs. number of clusters k).

Figure 4.7: Six overlapping clusters derived for Correct responses (panels a-f).

Figure 4.8: Six overlapping clusters derived for Error responses (panels a-f).

Figure 4.9: Hierarchical decomposition of the brain network during an Error response.

Figure 4.10: Hierarchical decomposition of the brain network during a Correct response.

Figure 4.11: Significance cluster plots for Error responses.

Figure 4.12: Significance cluster plots for Correct responses.

Figure 4.13: Cross-module average synchrony for Error responses across four clusters.

Figure 4.14: Cross-module average synchrony for Correct responses across two clusters.

Figure 4.15: Normalized differences in hub identification measures between Error and Correct responses: (a) within-module z-score; (b) participation coefficient.
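The two hub measures compared in Figure 4.15 follow the Guimera-Amaral definitions. A minimal weighted-graph sketch, assuming node strength stands in for degree (the exact weighted formulation used in the study may differ), with a toy network rather than study data:

```python
import numpy as np

def hub_measures(W, labels):
    """Within-module degree z-score and participation coefficient for a
    weighted graph, with node strength standing in for degree."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    modules = np.unique(labels)
    k = W.sum(axis=1)  # total node strength

    # Strength of each node into each module (n x n_modules).
    k_to = np.stack([W[:, labels == m].sum(axis=1) for m in modules], axis=1)

    # z-score of within-module strength, relative to the node's own module
    # (assumes within-module strengths are not all identical).
    z = np.zeros(len(labels))
    for j, m in enumerate(modules):
        members = labels == m
        kin = k_to[members, j]
        z[members] = (kin - kin.mean()) / kin.std()

    # Participation coefficient: 0 if all strength stays in one module,
    # approaching 1 as strength spreads evenly across modules.
    P = 1.0 - ((k_to / k[:, None]) ** 2).sum(axis=1)
    return z, P

# Toy weighted network: node 2 is the intra-module hub of the first triad,
# while node 1 has no cross-module links at all.
W = np.array([[0.0, 0.8, 0.9, 0.1, 0.0, 0.0],
              [0.8, 0.0, 0.9, 0.0, 0.0, 0.0],
              [0.9, 0.9, 0.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 0.0, 0.9, 0.9],
              [0.0, 0.0, 0.0, 0.9, 0.0, 0.7],
              [0.0, 0.0, 0.1, 0.9, 0.7, 0.0]])
z, P = hub_measures(W, [0, 0, 0, 1, 1, 1])
print(np.round(z, 2), np.round(P, 3))
```

Nodes with high z are local hubs within their module; nodes with high P are connector hubs integrating modules, the pattern reported above for the prefrontal region during Error responses.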


More information

Chapter 9. Nerve Signals and Homeostasis

Chapter 9. Nerve Signals and Homeostasis Chapter 9 Nerve Signals and Homeostasis A neuron is a specialized nerve cell that is the functional unit of the nervous system. Neural signaling communication by neurons is the process by which an animal

More information

Introduction to Machine Learning Midterm Exam

Introduction to Machine Learning Midterm Exam 10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but

More information

Stochastic modeling of a serial killer

Stochastic modeling of a serial killer This paper appeared in the Journal of Theoretical Biology (24) 355: 6 Stochastic modeling of a serial killer M.V. Simkin and V.P. Roychowdhury Department of Electrical Engineering, University of California,

More information

Ch. 5. Membrane Potentials and Action Potentials

Ch. 5. Membrane Potentials and Action Potentials Ch. 5. Membrane Potentials and Action Potentials Basic Physics of Membrane Potentials Nerve and muscle cells: Excitable Capable of generating rapidly changing electrochemical impulses at their membranes

More information

INSPIRATION. The Neuron Doctrine...Henriech Wilhelm. The neuron is the anatomic, genetic, system.

INSPIRATION. The Neuron Doctrine...Henriech Wilhelm. The neuron is the anatomic, genetic, system. Denver Ncube 2010 INSPIRATION The Neuron Doctrine...Henriech Wilhelm Waldeyer (1891). The neuron is the anatomic, genetic, trophic and functional unit of the nervous system. BACKGROUND The differences

More information

How to make computers work like the brain

How to make computers work like the brain How to make computers work like the brain (without really solving the brain) Dileep George a single special machine can be made to do the work of all. It could in fact be made to work as a model of any

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

Modelling stochastic neural learning

Modelling stochastic neural learning Modelling stochastic neural learning Computational Neuroscience András Telcs telcs.andras@wigner.mta.hu www.cs.bme.hu/~telcs http://pattern.wigner.mta.hu/participants/andras-telcs Compiled from lectures

More information

Lecture 4: Feed Forward Neural Networks

Lecture 4: Feed Forward Neural Networks Lecture 4: Feed Forward Neural Networks Dr. Roman V Belavkin Middlesex University BIS4435 Biological neurons and the brain A Model of A Single Neuron Neurons as data-driven models Neural Networks Training

More information