Correlation Networks

Correlation Networks: Analysis of Biological Networks. April 24, 2010

Review

We have seen that network theory has long been employed in specific fields such as the social sciences, e.g. social networks and networks of actors. Since the late 1990s this research area has gained wider publicity, e.g. through the WWW and different kinds of biological networks. In fact, network theory can be applied to almost every scientific branch and also to many aspects of our everyday lives. In this regard, we need to keep in mind that edges (links) represent some sort of relation between individual vertices (nodes). How these relations are interpreted depends on what the network is attempting to represent. Correlation networks have gained popularity especially in molecular biology.

Introduction

Recently, biological science has experienced a revival of systems biology. The technological ability to sequence the genome of virtually any organism, together with the development of various high-throughput assays, triggered this revival:

Sequencing-based genomics
Microarray-based transcriptomics
GC-MS-based metabolomics
MS-based proteomics

Intro cont.

These technologies allow us to monitor all components of the cellular inventory simultaneously: gene transcripts, proteins, and metabolites. Multiparallel omics technologies open up the possibility of gaining comprehensive insight into biological systems and their complexity. However, because of the complexity and the diverse activity and functionality of the different cellular elements, gathering qualitative and quantitative data alone does not suffice to understand these systems. We need statistical measures that describe interrelations: correlation analysis.

General remarks

Correlation values can be viewed from two perspectives:

1. Probability: here we ask whether the correlation originates from mere coincidence or from a real connection. Weak connections sometimes appear significant, because the significance of a correlation coefficient depends not only on its strength but also on the number of examined samples. Thus, when a large number of samples is examined, weak correlations can become significant.

2. Strength: here we consider the strength of the interaction.

Correlation networks are a priori undirected. This is important when attempting to interpret networks based on correlation.
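The dependence of significance on sample size can be made concrete with a small sketch. It uses Fisher's z-transformation with a normal approximation, which is an assumption made here for brevity (the slides themselves obtain p-values by permutation tests); the function name `fisher_z_pvalue` and the chosen values of r and n are hypothetical.

```python
import math


def fisher_z_pvalue(r: float, n: int) -> float:
    """Approximate two-sided p-value for the null hypothesis rho = 0.

    Uses Fisher's z-transformation with a normal approximation.
    Assumes n > 3 and |r| < 1.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# The same weak correlation, r = 0.2, at increasing sample sizes.
for n in (10, 100, 1000):
    print(n, fisher_z_pvalue(0.2, n))
```

Under this approximation, r = 0.2 is far from significant with 10 samples but highly significant with 1000, although the strength of the relationship is unchanged.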

Samples, Profiles, and Replica Set

The basic sources of data in molecular and biological science are samples. In principle, the sample describes the biological material used for analyses. The sample represents a part of an organism or the organism in its entirety. It can be specified by various attributes: treatment, growth condition, age, etc. To construct correlation networks, a set of diverse variables has to be measured, which is made possible by high-throughput technologies. The primary profile is simply the readout of the technology platform, e.g. a chromatogram. Primary profiles are converted into secondary profiles by various algorithms and normalization.

Samples cont.

In general, such profiles consist of a number of diverse variables, up to many thousands, each with their respective observations. These observations describe semi-quantitative, quantitative, or relative values of the abundance of the variables. Secondary profiles can contain further attributes, such as values or scores, which may be used by further algorithms for normalization. After normalization and data preprocessing, we can proceed to the construction of correlation networks.

Construction and analyses of correlation networks

1 Data/profiles: generation and/or collection
2 Data matrix: selection and assembly of experiments
3 Correlation matrix: equations, robustness
4 Graph/network conversion
  4.0 Boolean
  4.1 Correlation
  4.2 Distance
5 Graph analysis: global, sub-networks, guide-gene driven
6 Interpretation
7 Evaluation

A practical example - Metabolite Network

We utilized a set of 76 introgression lines (ILs) of the tomato. Each IL can be considered a genotype and therefore a condition. We were interested in the metabolic profile of the seed. Dry seeds of the 2004 season were gathered in Akko, Israel, with approximately 6 replicas per IL. The seeds were subjected to GC-MS.

Data and profiles

The first step of correlation network analysis is the generation and/or collection of profile data. These data are usually derived from multiplex high-throughput technologies. An alternative source is the ever-growing number of public repositories, which provide access to thousands of profiles. We chose a GC-MS-based approach.

Methodological approach

[Figure: GC-MS workflow. Extraction, followed by a chromatogram (abundance vs. retention time) and a mass spectrum (intensity % vs. m/z). Compound classes covered: amino acids, organic acids, sugars, etc.]

Data Set and Matrix

With a set of profiles in hand, one can select and assemble data sets and ultimately convert them into data matrices. Complex matrices comprise profiles that differ with respect to genotypes, environmental conditions, treatments, developmental stages, etc. The experimental description may aid in the selection. Even so, the generalized matrix should be interpreted with care. For GC-MS-based analysis we use a software tool named TagFinder to construct metabolite matrices.

Data set and matrix cont.

Var1  Cond.     Var2  Var_n  met1  met2  met3  met_n
data  Gt1_rep1  data  data   data  data  data  data
data  Gt1_rep2  data  data   data  data  data  data
data  Gt1_rep3  data  data   data  data  data  data
data  Gt3_rep1  data  data   data  data  data  data

This m x n matrix contains many variables that are needed for raw data processing but are not useful for correlation analysis. Further processing, including normalization, is required.

Further processing with R, MATLAB, or Excel

Var1  Cond.     Var2  Var_n  met1  met2  met3  met_n
data  Gt1_rep1  data  data   data  data  data  data
data  Gt1_rep2  data  data   data  data  data  data
data  Gt1_rep3  data  data   data  data  data  data
data  Gt3_rep1  data  data   data  data  data  data

Extract the relevant data only, verify the data, normalize the data, compute fold changes, etc. We end up with an m x n matrix and proceed with correlation analysis:

Cond.     met1  met2  met3  met4  met5  met6  met_n
Gt1_rep1  data  data  data  data  data  data  data
Gt1_rep2  data  data  data  data  data  data  data
Gt1_rep3  data  data  data  data  data  data  data
Gt3_rep1  data  data  data  data  data  data  data
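The processing steps named above (extract, verify, normalize) might look like the following minimal sketch. The column names, the intensity values, and the choice of a log2 fold change relative to each metabolite's median are all assumptions for illustration; the slides do not specify which normalization was used.

```python
import math
from statistics import median

# Hypothetical raw rows: condition label, a bookkeeping variable, metabolite intensities.
raw = [
    {"cond": "Gt1_rep1", "var1": 0.1, "met1": 100.0, "met2": 8.0},
    {"cond": "Gt1_rep2", "var1": 0.3, "met1": 200.0, "met2": 4.0},
    {"cond": "Gt3_rep1", "var1": 0.2, "met1": 400.0, "met2": 2.0},
]
metabolites = ["met1", "met2"]


def preprocess(rows, columns):
    """Return {metabolite: [normalized value per sample]}."""
    # 1. Extract the relevant columns only.
    data = {c: [row[c] for row in rows] for c in columns}
    # 2. Verify: every value must be a positive number so that log2 is defined.
    for c, values in data.items():
        if any(v is None or v <= 0 for v in values):
            raise ValueError(f"missing or non-positive value in column {c}")
    # 3. Normalize: log2 fold change relative to the metabolite's median (assumed choice).
    return {
        c: [math.log2(v / median(values)) for v in values]
        for c, values in data.items()
    }


normalized = preprocess(raw, metabolites)
print(normalized)
```

The result keeps only the metabolite columns, one list of values per metabolite, which is the shape needed for the correlation step.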

Correlation Matrix

The next step is to convert this variable-by-unit matrix into a symmetrical variable-by-variable correlation matrix. Often the Pearson product-moment correlation r is used.

m x n variable-by-unit matrix:

     m1   m2   m3   m4   m_n
c1   1.2  1.1  8.5  6.0  1.3
c2   4.5  2.2  5.6  6.1  1.5
c3   3.0  9.1  7.4  4.0  1.2

The Pearson correlation algorithm turns this into an m x m variable-by-variable correlation matrix:

      m1    m2    m3    m4    m_n
m1    1    -0.3   0.9   0.4   0.2
m2   -0.3   1     0.8   0.2   0.5
m3    0.9   0.8   1    -0.1   0.8
m4    0.4   0.2  -0.1   1     0.5
m_n   0.2   0.5   0.8   0.5   1
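As a sketch of this conversion, the following code computes Pearson's r for every pair of variables and collects the results in a symmetrical matrix. The toy data and the function names `pearson` and `correlation_matrix` are hypothetical and are not taken from the slides.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either variable is constant.
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Variable-by-variable matrix as a nested dict: matrix[a][b] = r(a, b)."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical data: one list of observations per metabolite.
data = {
    "met1": [1.0, 2.0, 3.0, 4.0],
    "met2": [2.0, 4.0, 6.0, 9.0],
    "met3": [4.0, 3.0, 2.0, 1.0],
}
corr = correlation_matrix(data)
for name, row in corr.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

The diagonal is 1 and the matrix is symmetrical, so only one triangle needs to be stored or tested in practice.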

Correlation Matrix cont.

Correlation values are usually expressed as values from -1 to 1. Depending on the scientific background, the strength of a correlation is defined differently. In biological science an r value > 0.5 or < -0.5 is considered to be strong. It is pivotal to bear in mind that the correlation suggested by the Pearson product-moment coefficient is based entirely on a mathematical equation. The r value measures the linearity of the relationship between the two variables being compared. We cannot infer any biochemical or biomolecular processes from correlation values. Correlation values suggest an interdependency of two variables without determining which variable represents the cause and which variable responds.

Correlation Matrix cont.

A special case of Pearson's r is the non-parametric Spearman rank-order correlation, rho = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)), where d_i = x_i - y_i is the difference between the ranks of observation i on the two variables and n is the number of observations (this form holds when there are no tied ranks). It can describe linear and non-linear (monotonic) relationships, and it is more robust to outliers. However, in biological data we usually achieve similar results with the Pearson product-moment correlation. Another type of correlation is canonical correlation.
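The rank-difference definition can be sketched in a few lines. The example assumes no tied values, since the rank-difference formula holds only in that case; the data and the function names are hypothetical. A plain Pearson function is included only for comparison on a monotonic but non-linear relationship.

```python
import math


def ranks(values):
    """1-based ranks; assumes all values are distinct (no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank-order correlation via rank differences (no ties)."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def pearson(x, y):
    """Pearson product-moment correlation, for comparison."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical monotonic, non-linear relationship: y = exp(x).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [math.exp(v) for v in x]
print("Spearman:", spearman(x, y))
print("Pearson: ", pearson(x, y))
```

Because the ranks of x and y agree exactly, Spearman's coefficient is 1, while Pearson's r stays below 1 for the same data.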

Heatmap representation

Probability, Confidence, and Power

Beyond measuring the relationship of variables, the correlation obtained can be used for further statistical testing or comparison. Often such statistical analysis requires Fisher's z-transformation. P-values, however, are obtained by permutation tests:

Original data    Permutation of Y
X  Y             X  Y
1  1             1  3
2  2             2  1
3  3             3  2
4  4             4  5
5  5             5  4

The Pearson correlation of the original data is r_1. Y is permuted n times, and each permutation gives a new Pearson correlation (r_2 for the permutation shown). Every permutation whose correlation is at least as large as r_1 contributes 1/n to p, so p is the fraction of permutations that reach or exceed the observed correlation. A p-value < 0.05 (5 %) is considered to be significant.
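A permutation test of this kind can be sketched as follows, using the X and Y values from the slide. Comparing absolute correlations (a two-sided test), the number of permutations, the fixed random seed, and the function names are assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Fraction of Y permutations whose |r| reaches the observed |r|."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(pearson(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson(x, y_perm)) >= observed:
            hits += 1
    # The +1 terms are a common convention that avoids p = 0;
    # the plain fraction hits / n_perm is the more direct reading.
    return (hits + 1) / (n_perm + 1)


x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]
p_value = permutation_pvalue(x, y)
print("r =", pearson(x, y), "p =", p_value)
```

With only five observations there are just 120 possible orderings of Y, so the attainable p-values are coarse; with real profile data the number of permutations bounds the smallest p-value that can be resolved.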

Probability heatmap

Network Matrix

The next step in bringing the correlations into actual network form is discretization of the correlation matrix, which yields an unweighted network. This can be done with the p-value matrix generated before. We have to decide which p-value shall be our cutoff (0.05 or 0.01). Only relationships whose p-value falls under the threshold are integrated into, and eventually displayed in, the network. Alternatively, we can combine r values with p-values: we consider solely relationships that fall under a certain p-value threshold, say 0.01, and exceed an r value of 0.7. Eventually, based on our result, we can construct an adjacency matrix.

Adjacency matrices

Discretization matrix: a truth (Boolean) table of significant (1) and insignificant (0) relationships.

     m1  m2  m3  m4  m5
m1   0   1   0   1   0
m2   1   0   1   0   1
m3   0   1   0   1   0
m4   1   0   1   0   1
m5   0   1   0   1   0

Weighted adjacency matrix: significant relationships are presented as their actual absolute r values.

     m1    m2    m3    m4    m5
m1   0     0.9   0     0.97  0
m2   0.9   0     0.83  0     0.78
m3   0     0.83  0     0.96  0
m4   0.97  0     0.96  0     0.77
m5   0     0.78  0     0.77  0
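The discretization step can be sketched as a threshold on both matrices. The cutoffs of 0.01 for p and 0.7 for |r| follow the example values in the text; the small r and p matrices and the function name `adjacency` are hypothetical.

```python
def adjacency(r, p, p_cut=0.01, r_cut=0.7):
    """Build Boolean and weighted adjacency matrices from r and p matrices.

    An edge is kept when p is below p_cut and |r| exceeds r_cut.
    The diagonal is left at zero (no self-loops).
    """
    n = len(r)
    boolean = [[0] * n for _ in range(n)]
    weighted = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and p[i][j] < p_cut and abs(r[i][j]) > r_cut:
                boolean[i][j] = 1
                weighted[i][j] = abs(r[i][j])
    return boolean, weighted


# Hypothetical 3 x 3 correlation and p-value matrices.
r = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, -0.8],
    [0.2, -0.8, 1.0],
]
p = [
    [0.0, 0.001, 0.6],
    [0.001, 0.0, 0.005],
    [0.6, 0.005, 0.0],
]
boolean, weighted = adjacency(r, p)
for row in boolean:
    print(row)
```

Note that the strong negative correlation between the second and third variable becomes an ordinary edge with weight 0.8, because the weighted matrix stores absolute r values.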

The Network

[Figures: the network drawn with three layouts: Fruchterman-Reingold, circle, and Kamada-Kawai]

Correlation Network Analysis

The analysis and interpretation of the network depend entirely on the biological question behind the experiment. We can use the core network to identify subgraphs and study them in detail. Alternatively, if we are interested in networks originating from different genotypes, conditions, treatments, etc., we may want to compare the graphs. In that case we are interested in which nodes are conserved throughout all networks. Can we identify common motifs? What about hubs? Special score equations have been developed to address these questions. In this sense, the interpretation, just like the analysis, depends on the question raised.
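Two of the questions above, hubs and conservation across networks, can be illustrated with a small sketch. Node degree is used here as a simple stand-in for the special score equations mentioned in the text, which the slides do not spell out. Network A reuses the Boolean adjacency matrix from the adjacency-matrix slide; network B and all function names are hypothetical.

```python
names = ["m1", "m2", "m3", "m4", "m5"]

# Network A: the Boolean adjacency matrix shown on the adjacency-matrix slide.
network_a = [
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
]
# Network B: hypothetical second condition on the same nodes.
network_b = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
]


def degrees(adj, labels):
    """Number of edges attached to each node."""
    return {labels[i]: sum(adj[i]) for i in range(len(labels))}


def edges(adj, labels):
    """Undirected edges as a set of two-element frozensets."""
    n = len(labels)
    return {
        frozenset((labels[i], labels[j]))
        for i in range(n)
        for j in range(i + 1, n)
        if adj[i][j]
    }


degree_a = degrees(network_a, names)
max_degree = max(degree_a.values())
hubs = sorted(node for node, d in degree_a.items() if d == max_degree)
conserved = edges(network_a, names) & edges(network_b, names)

print("hubs in A:", hubs)
print("conserved edges:", sorted(sorted(edge) for edge in conserved))
```

Storing edges as frozensets makes the comparison independent of edge direction, which fits the undirected nature of correlation networks.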
