Clustering of Pathogenic Genes in Human Co-regulatory Network. Michael Colavita Mentor: Soheil Feizi Fifth Annual MIT PRIMES Conference May 17, 2015

Similar documents
Network Motifs of Pathogenic Genes in Human Regulatory Network

Preface. Contributors

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

1 Matrix notation and preliminaries from spectral graph theory

Logic Regression: Biological Motivation Cyclic Gene Study

Proteomics Systems Biology

Spectral Alignment of Networks Soheil Feizi, Gerald Quon, Muriel Medard, Manolis Kellis, and Ali Jadbabaie

Network Infusion to Infer Information Sources in Networks Soheil Feizi, Ken Duffy, Manolis Kellis, and Muriel Medard

Physical network models and multi-source data integration

Network motifs in the transcriptional regulation network (of Escherichia coli):

Inferring Transcriptional Regulatory Networks from High-throughput Data

Learning in Bayesian Networks

Peddie Summer Day School

OCR Biology Checklist

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin

OCR Biology Checklist

Biological Networks Analysis

Statistical aspects of prediction models with high-dimensional data

Introduction to Bioinformatics

Lecture Notes for Fall Network Modeling. Ernest Fraenkel

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH

networks in molecular biology Wolfgang Huber

Self Similar (Scale Free, Power Law) Networks (I)

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Communities, Spectral Clustering, and Random Walks

Graph Wavelets to Analyze Genomic Data with Biological Networks

Canada s Experience with Chemicals Assessment and Management and its Application to Nanomaterials

Computational Biology: Basics & Interesting Problems

Computational methods for predicting protein-protein interactions

A general co-expression network-based approach to gene expression analysis: comparison and applications

When Dictionary Learning Meets Classification

Mathematics, Genomics, and Cancer

How much non-coding DNA do eukaryotes require?

Discovering molecular pathways from protein interaction and ge

Lowndes County Biology II Pacing Guide Approximate

Graph Theory and Networks in Biology

Machine Learning for Data Science (CS4786) Lecture 11

Androgen-independent prostate cancer

The Role of Network Science in Biology and Medicine. Tiffany J. Callahan Computational Bioscience Program Hunter/Kahn Labs

Networks and Their Spectra

Basic modeling approaches for biological systems. Mahesh Bule

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University

V 5 Robustness and Modularity

Inverse Power Method for Non-linear Eigenproblems

Predicting Protein Functions and Domain Interactions from Protein Interactions

Total

Networks in systems biology

Cellular Systems Biology or Biological Network Analysis

Introduction to Biology with Lab

Clustering. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein. Some slides adapted from Jacques van Helden

Clustering. Genome 373 Genomic Informatics Elhanan Borenstein. Some slides adapted from Jacques van Helden

Clustering. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein. Some slides adapted from Jacques van Helden

Principles of Genetics

Network Biology: Understanding the cell s functional organization. Albert-László Barabási Zoltán N. Oltvai

Measuring TF-DNA interactions

SYSTEMS BIOLOGY 1: NETWORKS

Types of biological networks. I. Intra-cellurar networks

Networks. Can (John) Bruce Keck Founda7on Biotechnology Lab Bioinforma7cs Resource

Data Mining and Analysis: Fundamental Concepts and Algorithms

Predicting causal effects in large-scale systems from observational data

Gene Ontology and overrepresentation analysis

Bioinformatics. Transcriptome

NETWORK BIOLOGY: UNDERSTANDING THE CELL S FUNCTIONAL ORGANIZATION

Francisco M. Couto Mário J. Silva Pedro Coutinho

Functional Characterization and Topological Modularity of Molecular Interaction Networks

The Generalized Haar-Walsh Transform (GHWT) for Data Analysis on Graphs and Networks

CHAPTER 6 : LITERATURE REVIEW

P E R E N C O - C H R I S T M A S P A R T Y

Latent Variable models for GWAs

Genome Assembly. Sequencing Output. High Throughput Sequencing

Inferring Transcriptional Regulatory Networks from Gene Expression Data II

Miller & Levine Biology

Systems biology and biological networks

Feature gene selection method based on logistic and correlation information entropy

Prentice Hall. Biology: Concepts and Connections, 6th edition 2009, (Campbell, et. al) High School

Computational Systems Biology

Spectral Graph Theory and You: Matrix Tree Theorem and Centrality Metrics

Simulation of Gene Regulatory Networks

Graph Alignment and Biological Networks

Biological networks CS449 BIOINFORMATICS

COMPETENCY GOAL 1: The learner will develop abilities necessary to do and understand scientific inquiry.

Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday

Introduction to clustering methods for gene expression data analysis

Overview of clustering analysis. Yuehua Cui

Chapter 1. Vectors, Matrices, and Linear Spaces

V 6 Network analysis

STA141C: Big Data & High Performance Statistical Computing

Linear models for the joint analysis of multiple. array-cgh profiles

STATE UNIVERSITY OF NEW YORK COLLEGE OF TECHNOLOGY CANTON, NEW YORK. COURSE OUTLINE BIOL 310 The Human Genome. Prepared By: Ron Tavernier

Statistical Physics approach to Gene Regulatory Networks

Module Based Neural Networks for Modeling Gene Regulatory Networks

Learning (from) Networks

Erzsébet Ravasz Advisor: Albert-László Barabási

Introduction to Spectral Graph Theory and Graph Clustering

BIOLOGICAL PHYSICS MAJOR OPTION QUESTION SHEET 1 - will be covered in 1st of 3 supervisions

CS 556: Computer Vision. Lecture 13

Introduction to clustering methods for gene expression data analysis

Spectral Clustering. Guokun Lai 2016/10

Clustering. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein. Some slides adapted from Jacques van Helden

Transcription:

Clustering of Pathogenic Genes in Human Co-regulatory Network Michael Colavita Mentor: Soheil Feizi Fifth Annual MIT PRIMES Conference May 17, 2015

Topics Background Genetic Background Regulatory Networks The Human Regulatory Network Co-regulatory Networks Modularity Purpose and Methods Implementation Results Clustering Algorithm Goals Algorithmic Basis Initial Method and Progress

Genetic Background Genes have regulatory effects on each other Upregulation Downregulation Transcription factors have regulatory effects Target genes do not affect other genes Both can be subject to regulation by other genes Figure: The central dogma of molecular biology including gene regulation

Genetic Regulatory Networks Method of storing regulatory information in a computationally accessible format Captures regulatory dynamics of a genome Allows for the use of algorithms from the field of graph theory Nodes represent genes Figure: A small section of the human regulatory network Directed edges indicate upregulatory effects Edge weights indicate strength of regulatory activity

The Human Regulatory Network Primary dataset used for pulling regulation data Created by combining datasets into a unified network Co-expression network Motif network ChIP network 2757 transcription factors 16464 target genes ~1,000,000 regulatory relationships (cutoff =.95)

Co-regulatory Networks Capture different relationships than regulatory networks Nodes still represent genes; edges represent similar regulatory profiles R a Ç R b R a È R b ³ C Undirected network Clustering is better defined

Topics Background Genetic Background Regulatory Networks The Human Regulatory Network Co-regulatory Networks Modularity Purpose and Methods Implementation Results Clustering Algorithm Goals Algorithmic Basis Initial Methodology and Progress

Motivations for Analysis Pathogenic genes are associated with a specific genetic disease (dbgap) Search for differences and patterns in how pathogenic genes are regulated Understanding the basis of genetic diseases Applications to gene therapy

Preliminary Analysis: Modularity Method of examining the types of connections in the network: Non-pathogenic - Non-pathogenic Pathogenic Pathogenic Non-pathogenic - Pathogenic How does the number of edges between nodes of the same classification compare to the expected value (null model)? Assortative (preference for same classification) Disassortative (preference for different)

Hypothesis Test and P-value Example p =.05

Modularity Testing Analyzed 45 diseases across the network of 19,221 genes MATLAB for parallel operations Possible outcomes: Insignificant (p > 0.05) Assortative (modular) Disassortative

Modularity Results 12/45 (26.7%) diseases displayed statistically significant assortativity (p < 0.05) Clopidogrel a, b, j, k, l (p = 0.01) Cardiovascular disease risk T1D Type 1 Multiple Sclerosis Psoriasis More connections between similarly classified genes than expected

Implications Suggests that the network contains communities of pathogenic and nonpathogenic genes Potential for statistically significant clusters based on pathogenicity Significance of the co-regulatory structure Suggests that pathogenic genes share common regulatory characteristics

Topics Background Genetic Background Regulatory Networks The Human Regulatory Network Co-regulatory Networks Modularity Purpose and Methods Implementation Results Clustering Algorithm Goals Algorithmic Basis Initial Methodology and Progress

Clustering Another point of interest for genetic diseases Typically based on connectivity Searching for cohesive regulatory units Based on modularity Provides more information about how genes interact Identifies patterns in regulatory profiles

Co-regulatory Clustering Goals Identify clusters by combining network structure with pathogenicity classification Combine co-regulation with common genetic disease associations Clusters should indicate groups of pathogenic genes that share regulatory profiles Indicates regulatory patterns that can lead to genetic disease

Algorithmic Basis: Spectral Partitioning Goal: divide a network into two groups such that the modularity is minimized Method: use the sign of values in the second eigenvector of the graph Laplacian to determine classification Estimation stemming from a constraint relaxation

Algorithmic Basis: Spectral Clustering Similar to spectral partitioning, but produces k clusters Basic Algorithm: Define a similarity matrix that quantifies the similarity between two vertices Use the similarity matrix to produce a graph Laplacian Use the values in the first k eigenvectors as input to the k-means algorithm

Current Hybrid Algorithm Construct a similarity matrix S capturing the structure of the network and the classifications of vertices Assign a similarity value based on pure connectivity Scale these values for each pairing using their classifications Genes of the same type will have a higher similarity Genes of different types will have a lower similarity Spectral clustering is applied to S

Future Goals Continue work on the clustering algorithm Incorporate shortest path and other measures of distance into the similarity matrix Refine similarity values for pairings Extend study to examine genetic disease information Linear mixed model for genome-wide association studies

Thank You To MIT PRIMES for providing this research opportunity To my mentor Soheil Feizi for his support, suggestions, and assistance To Professor Manolis Kellis for suggesting the project