Network Motifs of Pathogenic Genes in Human Regulatory Network

Similar documents
Clustering of Pathogenic Genes in Human Co-regulatory Network. Michael Colavita Mentor: Soheil Feizi Fifth Annual MIT PRIMES Conference May 17, 2015

Introduction to Bioinformatics

Logic Regression: Biological Motivation Cyclic Gene Study

Preface. Contributors

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

CSI 445/660 Part 6 (Centrality Measures for Networks) 6 1 / 68

Measuring TF-DNA interactions

Chapter 15 Active Reading Guide Regulation of Gene Expression

Self Similar (Scale Free, Power Law) Networks (I)

Cellular Systems Biology or Biological Network Analysis

Networks in systems biology

Networks. Can (John) Bruce Keck Founda7on Biotechnology Lab Bioinforma7cs Resource

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin

Boolean models of gene regulatory networks. Matthew Macauley Math 4500: Mathematical Modeling Clemson University Spring 2016

networks in molecular biology Wolfgang Huber

Network motifs in the transcriptional regulation network (of Escherichia coli):

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Written Exam 15 December Course name: Introduction to Systems Biology Course no

Simulation of Gene Regulatory Networks

Network Infusion to Infer Information Sources in Networks Soheil Feizi, Ken Duffy, Manolis Kellis, and Muriel Medard

CHAPTER : Prokaryotic Genetics

How much non-coding DNA do eukaryotes require?

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

Networks & pathways. Hedi Peterson MTAT Bioinformatics

Predicting Protein Functions and Domain Interactions from Protein Interactions

Androgen-independent prostate cancer

Clustering. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein. Some slides adapted from Jacques van Helden

Spectral Alignment of Networks Soheil Feizi, Gerald Quon, Muriel Medard, Manolis Kellis, and Ali Jadbabaie

NETWORK BIOLOGY AND COMPLEX DISEASES. Ahto Salumets

Introduction to Bioinformatics

Proteomics Systems Biology

Graph Theory and Networks in Biology

Biological Networks Analysis

Computational Biology: Basics & Interesting Problems

Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday

OCR Biology Checklist

OCR Biology Checklist

Graph Alignment and Biological Networks

BIOLOGY IB DIPLOMA PROGRAM PARENTS ORIENTATION DAY

Context dependent visualization of protein function

Package NarrowPeaks. September 24, 2012

86 Part 4 SUMMARY INTRODUCTION

Virginia Western Community College BIO 101 General Biology I

Discovering molecular pathways from protein interaction and ge

Introduction to Hidden Markov Models for Gene Prediction ECE-S690

Bio 101 General Biology 1

Clustering. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein. Some slides adapted from Jacques van Helden

A Semi-Supervised Dimensionality Reduction Method to Reduce Batch Effects in Genomic Data

Inferring Transcriptional Regulatory Networks from Gene Expression Data II

Introduction to Systems Biology

Graph Theory and Networks in Biology arxiv:q-bio/ v1 [q-bio.mn] 6 Apr 2006

Stochastic processes and

Clustering Lecture 1: Basics. Jing Gao SUNY Buffalo

V 5 Robustness and Modularity

Bioinformatics. Transcriptome

Gene Ontology. Shifra Ben-Dor. Weizmann Institute of Science

Network Biology-part II

Phylogenetic Networks, Trees, and Clusters

Learning in Bayesian Networks

Understanding Science Through the Lens of Computation. Richard M. Karp Nov. 3, 2007

Discovering Correlation in Data. Vinh Nguyen Research Fellow in Data Science Computing and Information Systems DMD 7.

Principal component analysis and the asymptotic distribution of high-dimensional sample eigenvectors

Network deconvolution as a general method to distinguish direct dependencies in networks

Learning from Sensor Data: Set II. Behnaam Aazhang J.S. Abercombie Professor Electrical and Computer Engineering Rice University

Clustering and Network

Sig2GRN: A Software Tool Linking Signaling Pathway with Gene Regulatory Network for Dynamic Simulation

A general co-expression network-based approach to gene expression analysis: comparison and applications

BTRY 7210: Topics in Quantitative Genomics and Genetics

25 : Graphical induced structured input/output models

Genetic transcription and regulation

UNIVERSITY OF CALIFORNIA SANTA BARBARA DEPARTMENT OF CHEMICAL ENGINEERING. CHE 154: Engineering Approaches to Systems Biology Spring Quarter 2004

AP Bio Module 16: Bacterial Genetics and Operons, Student Learning Guide

Gene Regula*on, ChIP- X and DNA Mo*fs. Statistics in Genomics Hongkai Ji

De novo identification of motifs in one species. Modified from Serafim Batzoglou s lecture notes

Miller & Levine Biology

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH

Physical network models and multi-source data integration

Spectra of Adjacency and Laplacian Matrices

Computational Genomics. Reconstructing dynamic regulatory networks in multiple species

Chromatin Structure and Transcriptional Activity Shaping Genomic Landscapes in Eukaryotes

Mathematics, Genomics, and Cancer

Basic Synthetic Biology circuits

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?

Progress on Parallel Chip-Firing

L3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus

Studying Life. Lesson Overview. Lesson Overview. 1.3 Studying Life

Biological Networks. Gavin Conant 163B ASRC

CSE 291. Assignment Spectral clustering versus k-means. Out: Wed May 23 Due: Wed Jun 13

1 Matrix notation and preliminaries from spectral graph theory

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Regulation of Gene Expression

Control and synchronization in systems coupled via a complex network

The focus of this study is whether evolutionary medicine can be considered a distinct scientific discipline.

CONTENTS. P A R T I Genomes 1. P A R T II Gene Transcription and Regulation 109

Cybergenetics: Control theory for living cells

Introduction Centrality Measures Implementation Applications Limitations Homework. Centrality Metrics. Ron Hagan, Yunhe Feng, and Jordan Bush

Latent Variable models for GWAs

Overview of clustering analysis. Yuehua Cui

Data preprocessing. DataBase and Data Mining Group 1. Data set types. Tabular Data. Document Data. Transaction Data. Ordered Data

Lecture 7: Simple genetic circuits I

Transcription:

Network Motifs of Pathogenic Genes in Human Regulatory Network Michael Colavita Mentor: Soheil Feizi Fourth Annual MIT PRIMES Conference May 18, 2014

Topics Background Genetics Regulatory Networks The Human Regulatory Network Network Motifs Questions and Methods Sparse Disconnect Low Distance Clustering Network Metrics Clustering Detection Method Clusters Found

Genetic Background Cell s genes have regulatory effects on each other Upregulation Downregulation Transcription factors control the expression of other genes Target genes have no regulatory effects Both can be subject to regulation by other genes Figure: The central dogma of molecular biology with regulation of gene expression

Genetic Regulatory Networks Medium for storing regulatory information for computational analysis Captures regulatory dynamics of a genome Nodes represent genes Figure: A sample of the human regulatory network Edges indicate upregulatory effects Edge weights indicate strength of regulatory activity

The Human Regulatory Network Primary dataset used for regulation data Created by combining datasets into a unified network Co-expression network Motif network ChIP network 2757 transcription factors 16464 target genes ~1,000,000 regulatory relationships (cutoff =.95)

Topics Background Genetics Regulatory Networks The Human Regulatory Network Network Motifs Questions and Methods Sparse Disconnect Low Distance Clustering Network Metrics Clustering Detection Method Clusters Found

Network Motifs of Pathogenic Genes Motifs are recurring patterns within the network Patterns in structure Consistent high or low enrichment for given metrics Indegree/Outdegree Eigenvector/Betweenness Centrality Clustering Coefficient Do certain network motifs lead to genetic disease through positive feedback?

Motivation for Motif Identification Examining motifs of pathogenic genes (dbgap) Genes associated with genetic disease Understanding the regulatory behavior behind genetic diseases Investigating larger scale regulatory structures Possible regulatory basis behind genetic disease

Method of Motif Detection 1. Generate a binary network from the top 5% of edges. 2. Compute enrichment of pathogenicity over a given network metric. 3. Compute p-value by comparing this enrichment to that of a randomized disease.

P-value Example p =.05

Network Motifs Identified Analyzed 45 diseases in the network of 19,221 genes Identified two major motifs so far Sparse disconnect Low distance clustering

Sparse Disconnect Visualization

Pathogenic Motifs: Sparse Disconnect Exhibited in age-related macular degeneration (types 1a and 1b) 4 diseases found with this motif Enrichment of high indegree (p = 0.0080) Enrichment of low outdegree (p =.0137) Outdegree = 2 Low density within pathogenic subnetwork (p =.0161) Pathogenic transcription factors and target genes are disconnected 25+ components Indegree = 3

Sparse Disconnect Visualization

Low Distance Clustering Visualization

Pathogenic Motifs: Low Distance Clustering Exhibited in schizophrenia (type 2) Enrichment for both high indegree (p =.0084) and high outdegree (p =.0548) Positive feedback Enrichment for high betweenness centrality (p =.0481) and high eigenvector centrality (p =.0605) High density within pathogenic sub-network (p =.0239) 99% of genes are in a single connected component

Low Distance Clustering Visualization

Network Metrics Enrichment of indegree or outdegree was present in 36% of diseases Indegree Outdegree Both Neither Centrality measures were enriched in 9% of diseases Betweenness Centrality Both Neither No diseases were consistently enriched over the genes clustering coefficient

Topics Background Genetics Regulatory Networks The Human Regulatory Network Network Motifs Questions and Methods Sparse Disconnect Low Distance Clustering Network Metrics Clustering Detection Method Clusters Found

Clustering Another point of interest for genetic diseases Searching for cohesive regulatory units Provides more information about how the pathogenic genes interact

Cluster Detection Detects clusters through spectral clustering Simplest form: uses network s algebraic connectivity to divide the nodes into two groups Maximize cluster density and minimize cluster count 1 0.8 0.6 0.4 0.2 0 Density Enrichment (p-value) Number of Clusters Clusters become too small Cutoff for statistical significance

Spectral Clustering Goal: divide a network into two clusters such that the number of edges between k clusters is minimized Method: Combined spectral clustering with the k-means algorithm to optimize clusters

Age-related Macular Degeneration (type 1b) Clustering k = 3

Cardiovascular Disease Risk (type 1b) Clustering k = 2

Future Goals Continue search for pathogenic motifs Identify additional clusters Different clustering algorithms Investigate GO terms within clusters

Thank You To MIT PRIMES for this engaging and challenging research opportunity To my mentor Soheil Feizi for his assistance and guidance throughout the project To Professor Manolis Kellis for suggesting the project