Gene Regulatory Networks II Computational Genomics Seyoung Kim

Transcription

1 Gene Regulatory Networks II Computational Genomics Seyoung Kim

2 Goal: Discover Structure and Function of Complex Systems in the Cell
Identify the different regulators and their target genes that are involved in the system.
Represent the relationship between regulators and their target genes as a network. Nodes: entities (regulators, target genes). Edges: regulatory relationships.
High-level goal: use high-throughput data to discover patterns of combinatorial regulation and to understand how the activity of genes involved in related biological processes is coordinated and interconnected.

3 Overview
Bayesian networks (networks with directed edges): module networks and their extensions
  Module network (Segal et al., Nature Genetics 2003): a gene module's activity is determined by the expression levels of regulator genes
  Geronemo (Lee et al., PNAS 2006): a gene module's activity is determined by the expression levels of regulator genes and by SNPs
  Lirnet (Lee et al., PLoS Genetics 2009): incorporates prior knowledge
  CONEXIC (Akavia et al., Cell 2010): cancer data analysis for copy number variation and gene expression data
Gaussian graphical models (networks with undirected edges) and their extensions

4 Probabilistic Graphical Models
[Figure: Bayesian network over X_1, ..., X_6 with edges X_1 → X_2, X_2 → X_3, X_1 → X_4, X_4 → X_5, X_2 → X_6, X_5 → X_6; each node is annotated with its conditional distribution.]
The joint distribution on (X_1, X_2, ..., X_N) factors according to the parent-of relations defined by the edges E:
$p(X_1, X_2, X_3, X_4, X_5, X_6) = p(X_1)\, p(X_2 \mid X_1)\, p(X_3 \mid X_2)\, p(X_4 \mid X_1)\, p(X_5 \mid X_4)\, p(X_6 \mid X_2, X_5)$
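To make the factorization concrete, the following sketch evaluates the joint probability of one assignment by multiplying one conditional probability per node. It assumes binary variables, and every probability value in it is made up for illustration; only the parent sets come from the slide.

```python
from itertools import product

# Parent sets taken from the factorization on the slide.
PARENTS = {
    "X1": (),
    "X2": ("X1",),
    "X3": ("X2",),
    "X4": ("X1",),
    "X5": ("X4",),
    "X6": ("X2", "X5"),
}

# Hypothetical CPDs for binary variables: P(node = 1 | parent values).
P_ONE = {
    "X1": {(): 0.3},
    "X2": {(0,): 0.2, (1,): 0.7},
    "X3": {(0,): 0.1, (1,): 0.6},
    "X4": {(0,): 0.5, (1,): 0.9},
    "X5": {(0,): 0.4, (1,): 0.8},
    "X6": {(0, 0): 0.05, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.95},
}


def joint_probability(assignment):
    """Multiply p(node | parents) over all nodes for one full assignment."""
    prob = 1.0
    for node, parents in PARENTS.items():
        parent_values = tuple(assignment[p] for p in parents)
        p_one = P_ONE[node][parent_values]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


def total_probability():
    """Sum the joint over all 2^6 assignments; a valid factorization sums to 1."""
    names = list(PARENTS)
    return sum(
        joint_probability(dict(zip(names, values)))
        for values in product((0, 1), repeat=len(names))
    )
```

Because each factor is a proper conditional distribution, the product sums to one over all assignments, which `total_probability()` can be used to check.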

5 Learning Bayesian Networks
Density estimation: model the data distribution in the population; learn both the graph structure and the associated probability density.
Probabilistic inference: prediction, classification.
[Figure: data feeding into the Bayesian network over X_1, ..., X_6, each node annotated with its conditional distribution.]
$p(X_1, X_2, X_3, X_4, X_5, X_6) = p(X_1)\, p(X_2 \mid X_1)\, p(X_3 \mid X_2)\, p(X_4 \mid X_1)\, p(X_5 \mid X_4)\, p(X_6 \mid X_2, X_5)$

6 The Module Network Idea
Unlike methods that represent individual genes, module-based methods explicitly model modular structure. This helps by reducing dependence on possibly noisy measurements for individual genes (information is combined among genes in the same module) and by raising statistical significance.

7 The Module Network Idea
Share parameters and dependencies between variables with similar behavior.
[Figure: a Bayesian network over MSFT, MOT, DELL, INTL, AMAT, HPQ with one CPD per variable (CPD 1 to CPD 6), next to a module network with one shared CPD per module: Module I (MSFT, CPD 1), Module II (MOT, DELL, INTL, CPD 2), Module III (AMAT, HPQ, CPD 3).]
Slides from the presentation by Segal et al., UAI03

8 Learning Module Network
Module network model definition. Learning the model. Experimental results.

9 Module Network Components
Module assignment function A(·): A(MSFT) = M_I; A(MOT) = A(DELL) = A(INTL) = M_II; A(AMAT) = A(HPQ) = M_III.
[Figure: Module I = {MSFT}; Module II = {MOT, DELL, INTL}; Module III = {AMAT, HPQ}.]

10 Module Network Components
Module assignment function.
Set of parents for each module: Pa(M_I) = ∅; Pa(M_II) = {MSFT}; Pa(M_III) = {DELL, INTL}.
[Figure: Module I = {MSFT}; Module II = {MOT, DELL, INTL}; Module III = {AMAT, HPQ}.]

11 Module Network Components
Module assignment function. Set of parents for each module. Conditional probability density (CPD) template for each module.
[Figure: Module I = {MSFT}; Module II = {MOT, DELL, INTL}; Module III = {AMAT, HPQ}.]

12 Ground Bayesian Network
A module network induces a ground BN over X. A module network defines a coherent probability distribution over X if the ground BN is acyclic.
[Figure: the module network (Module I = {MSFT}; Module II = {MOT, DELL, INTL}; Module III = {AMAT, HPQ}) next to its ground Bayesian network over MSFT, MOT, DELL, INTL, AMAT, HPQ.]

13 Module Graph
Nodes correspond to modules. There is an edge M_i → M_j if at least one variable in M_i is a parent of M_j.
Theorem: the ground BN is acyclic if the module graph is acyclic. Acyclicity can therefore be checked efficiently using the module graph.
[Figure: module graph M_I → M_II → M_III for Module I = {MSFT}, Module II = {MOT, DELL, INTL}, Module III = {AMAT, HPQ}.]
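The module-graph construction and the acyclicity check can be sketched in a few lines. This is an illustrative sketch only: the identifiers `M1`, `M2`, `M3` and both function names are assumptions introduced here, while the assignment and parent sets are those of the running stock example.

```python
# Module assignment and module parent sets from the running example.
ASSIGNMENT = {
    "MSFT": "M1",
    "MOT": "M2", "DELL": "M2", "INTL": "M2",
    "AMAT": "M3", "HPQ": "M3",
}
MODULE_PARENTS = {"M1": set(), "M2": {"MSFT"}, "M3": {"DELL", "INTL"}}


def module_graph(assignment, module_parents):
    """Edge Mi -> Mj whenever some variable assigned to Mi is a parent of Mj."""
    edges = {module: set() for module in module_parents}
    for child_module, parents in module_parents.items():
        for parent_var in parents:
            edges[assignment[parent_var]].add(child_module)
    return edges


def is_acyclic(edges):
    """Depth-first search; reaching a node on the current path means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in edges}

    def visit(node):
        color[node] = GREY
        for succ in edges[node]:
            if color[succ] == GREY:
                return False
            if color[succ] == WHITE and not visit(succ):
                return False
        color[node] = BLACK
        return True

    return all(visit(n) for n in edges if color[n] == WHITE)
```

The check runs on a graph with one node per module rather than one node per variable, which is why it stays cheap even when modules contain many genes.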

14 Learning Module Network
Module network model definition. Learning the model. Experimental results.

15 Learning Overview
Given data D, find the assignment function A and structure S that maximize the Bayesian score.
[Equation not recovered; its surviving labels are: marginal data likelihood (marginal likelihood), assignment / structure prior, data likelihood, parameter prior.]
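Only the term labels of this slide's equation survived extraction. One form consistent with those labels and with the usual Bayesian score for structure learning is given below. It should be read as a reconstruction under that assumption, not as the slide's exact notation; the symbol θ for the CPD parameters is introduced here.

```latex
\begin{align}
P(S, A \mid D) &\propto
  \underbrace{P(S, A)}_{\text{assignment / structure prior}}
  \;\underbrace{P(D \mid S, A)}_{\text{marginal data likelihood}} \\
P(D \mid S, A) &= \int
  \underbrace{P(D \mid S, A, \theta)}_{\text{data likelihood}}
  \;\underbrace{P(\theta \mid S, A)}_{\text{parameter prior}} \, d\theta \\
\mathrm{score}(S, A : D) &= \log P(S, A) + \log P(D \mid S, A)
\end{align}
```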

16 Bayesian Score Decomposition
The Bayesian score decomposes by modules: the term for module j depends only on module j's parents and module j's variables.

17 Bayesian Score Decomposition
The Bayesian score decomposes by modules.
[Figure: Module I = {MSFT}; Module II = {MOT, DELL, INTL}; Module III = {AMAT, HPQ}.]

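18 Likelihood Function
The module score decomposes by variables in the module. For Module II, whose variables are DELL, MOT and INTL and whose parent is MSFT:
$\mathrm{Score}_{M_2}(\mathrm{MSFT}, X_2 : D) = \mathrm{Score}(\mathrm{DELL}, \mathrm{MSFT} : D) + \mathrm{Score}(\mathrm{MOT}, \mathrm{MSFT} : D) + \mathrm{Score}(\mathrm{INTL}, \mathrm{MSFT} : D)$
[Figure: the module network repeated over Instance 1, Instance 2 and Instance 3, with shared parameters θ_{M_I}, θ_{M_II | MSFT} and θ_{M_III | DELL, INTL}.]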

19 Algorithm Overview
Find the assignment function A and structure S that maximize the Bayesian score.
[Figure: flow chart. "Find initial assignment A" leads into a loop between "Improve structure" (dependency structure S) and "Improve assignments" (assignment function A).]

20 Learning Dependency Structure
Heuristic search with operators: add/delete a parent for a module.
[Figure: Module I = {MSFT}; Module II = {MOT, DELL, INTL}; Module III = {AMAT, HPQ}.]

21 Learning Dependency Structure
Heuristic search with operators: add/delete a parent for a module.
Handle acyclicity: it can be checked efficiently on the module graph.
[Figure: module graph M_I, M_II, M_III and the module network, with the candidate parent edge from INTL to Module I crossed out.]

22 Learning Dependency Structure
Heuristic search with operators: add/delete a parent for a module.
Handle acyclicity: it can be checked efficiently on the module graph.
[Figure: module graph M_I, M_II, M_III and the module network, with the candidate parent edge from INTL to Module I crossed out and a candidate parent edge from INTL to Module III.]

23 Learning Dependency Structure
Heuristic search with operators: add/delete a parent for a module.
Handle acyclicity: it can be checked efficiently on the module graph.
Efficient computation: after applying an operator for module M_j, only the scores of operators for module M_j need to be updated.
[Figure: Module I = {MSFT}; Module II = {MOT, DELL, INTL}; Module III = {AMAT, HPQ}.]

24 Algorithm Overview
Find the assignment function A and structure S that maximize the Bayesian score.
[Figure: flow chart. "Find initial assignment A" leads into a loop between "Improve assignments" (assignment function A) and "Improve structure" (dependency structure S).]

25 Learning Assignment Function
[Figure: DELL shown outside the modules, next to Module I = {MSFT}, Module II = {MOT, INTL}, Module III = {AMAT, HPQ}.]

26 Learning Assignment Function
A(DELL) = M_I, score: 0.7
[Figure: DELL next to Module I = {MSFT}, Module II = {MOT, INTL}, Module III = {AMAT, HPQ}.]

27 Learning Assignment Function
A(DELL) = M_I, score: 0.7
A(DELL) = M_II, score: 0.9
[Figure: DELL next to Module I = {MSFT}, Module II = {MOT, INTL}, Module III = {AMAT, HPQ}.]

28 Learning Assignment Function
A(DELL) = M_I, score: 0.7
A(DELL) = M_II, score: 0.9
A(DELL) = M_III, score: cyclic!
[Figure: DELL next to Module I = {MSFT}, Module II = {MOT, INTL}, Module III = {AMAT, HPQ}.]
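The reassignment step for DELL can be sketched as below: try each module, discard choices that make the module graph cyclic, and keep the best-scoring legal one. The lookup table `TOY_SCORE` is a hypothetical stand-in for the Bayesian score (in a real learner the score would be recomputed from data). The values 0.7 and 0.9 are the ones on the slides; the value for `M3` is made up to show that a cyclic choice is rejected regardless of its score.

```python
# Parent sets from the running example; DELL is the variable being reassigned.
MODULE_PARENTS = {"M1": set(), "M2": {"MSFT"}, "M3": {"DELL", "INTL"}}
ASSIGNMENT = {"MSFT": "M1", "MOT": "M2", "INTL": "M2", "AMAT": "M3", "HPQ": "M3"}

# Hypothetical score per candidate module for DELL (M3's value is invented).
TOY_SCORE = {"M1": 0.7, "M2": 0.9, "M3": 1.5}


def has_cycle(assignment, module_parents):
    """True if the module graph induced by the assignment contains a cycle."""
    edges = {m: set() for m in module_parents}
    for child, parents in module_parents.items():
        for var in parents:
            edges[assignment[var]].add(child)
    # Repeatedly peel off modules with no incoming edge from remaining modules.
    remaining = set(edges)
    while remaining:
        sources = {m for m in remaining
                   if not any(m in edges[p] for p in remaining)}
        if not sources:
            return True  # every remaining module has an incoming edge
        remaining -= sources
    return False


def best_module_for(variable, assignment, module_parents, score):
    """Try each module for `variable`; keep the best-scoring legal choice."""
    best = None
    for module in module_parents:
        trial = dict(assignment, **{variable: module})
        if has_cycle(trial, module_parents):
            continue  # illegal move, like A(DELL) = M_III on the slide
        if best is None or score[module] > score[best]:
            best = module
    return best
```

Assigning DELL to Module III is cyclic because DELL is itself a parent of Module III, so the module would become its own parent in the module graph.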

29 Learning Module Network
Module network model definition. Learning the model. Experimental results.

30 Learning Module Network from Gene Expression Data

31 Determine Combinatorial Control From Gene Expression Data Under Multiple Conditions

32 Resulting Module
Rows: genes. Columns: arrays (conditions).
Segal et al., Nature Genetics 2003

33 Experimental Design
Hypothesis: regulator X activates process Y.
Experiment: knock out X and repeat the experiment.
[Figure: regulation tree with true/false branches on HAP4 and Ypl230W; the regulator X is marked with "?".]

34 Biological Experiments Validation
Were the differentially expressed genes predicted as targets? Rank modules by enrichment for differentially expressed genes. All regulators regulate predicted modules.
Ypl230w
  #    Module                                 Significance
  39   Protein folding                        7/23, 1e
       Cell differentiation                   6/41, 2e
       Glycolysis and folding                 5/37, 4e
       Mitochondrial and protein fate         5/37, 4e-2
Ppt1
  #    Module                                 Significance
  14   Ribosomal and phosphate metabolism     8/32, 9e-3
  11   Amino acid and purine metabolism       11/53, 1e-2
  15   mRNA, rRNA and tRNA processing         9/43, 2e-2
  39   Protein folding                        6/23, 2e-2
  30   Cell cycle                             7/30, 2e-2
Kin82
  #    Module                                 Significance
  3    Energy and osmotic stress I            8/31, 1e-4
  2    Energy, osmolarity & cAMP signaling    9/64, 6e-3
  15   mRNA, rRNA and tRNA processing         6/43, 2e-2
Segal et al., Nature Genetics, 2003

35 Extensions of Module Network: Geronemo (Lee et al., PNAS 2006)
Both gene expression levels and SNPs can be regulators for the gene expression levels of other genes.
[Figure legend: purple regulator = expression regulator; blue regulator = SNP regulator.]

36 Extensions of Module Network: Lirnet (Lee et al., PLoS Genetics 2009)
Incorporates prior knowledge (regulatory features) on gene-expression regulators and SNP regulators.
[Figure: regulatory features (on the left) and learned regulatory priors (purple bar).]

37 Extensions of Module Network: CONEXIC (Akavia et al., Cell 2010)
Extends module networks to handle cancer copy number variation and gene expression data in order to find cancer-causing (driver) mutations.
A driver mutation should occur in multiple tumors more often than would be expected by chance.

38 Extensions of Module Network: CONEXIC (Akavia et al., Cell 2010)
A driver mutation may be associated (correlated) with the expression of a group of genes that form a module.
A driver may be over-expressed due to amplification of the DNA encoding it or due to the action of other factors. The target genes correlate with driver gene expression rather than with driver copy number.

39 Overview
Bayesian networks (networks with directed edges): module networks and their extensions
  Module network (Segal et al., Nature Genetics 2003): a gene module's activity is determined by the expression levels of regulator genes
  Geronemo (Lee et al., PNAS 2006): a gene module's activity is determined by the expression levels of regulator genes and by SNPs
  Lirnet (Lee et al., PLoS Genetics 2009): incorporates prior knowledge
  CONEXIC (Akavia et al., Cell 2010): cancer data analysis for copy number variation and gene expression data
Gaussian graphical models (networks with undirected edges) and their extensions

40 Gaussian Graphical Models
The gene expressions for K genes, Y = {y_1, ..., y_K}, are Gaussian distributed:
$Y \sim N(0_K, \Theta^{-1})$
where 0_K is a vector of K zeros and Θ is the K-by-K inverse covariance matrix.
The inverse covariance matrix Θ then encodes a Gaussian graphical model: non-zero elements in Θ correspond to edges.

41 Gaussian Graphical Models
Non-zero elements in Θ correspond to edges.
[Figure: a 5-by-5 matrix Θ with rows and columns y_1, ..., y_5, next to the Gaussian graphical model over nodes y_1, ..., y_5 that it encodes.]

42 Gaussian Graphical Models
Non-zero elements in Θ correspond to edges.
[Figure: a 5-by-5 matrix Θ with rows and columns y_1, ..., y_5, next to the Gaussian graphical model over nodes y_1, ..., y_5 that it encodes.]
The nonzero/zero pattern of y_5's column matches the neighbors of node y_5.
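Reading a graph off Θ is mechanical, as the short sketch below suggests. The slide's matrix did not survive extraction, so the 5-by-5 matrix `THETA` here is hypothetical, and the function names are likewise introduced only for illustration.

```python
# Hypothetical symmetric 5x5 inverse covariance matrix (illustrative values).
THETA = [
    [2.0, 0.6, 0.0, 0.0, 0.0],
    [0.6, 2.0, 0.5, 0.0, 0.4],
    [0.0, 0.5, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.0, 0.7],
    [0.0, 0.4, 0.0, 0.7, 2.0],
]


def edges_from_precision(theta, tol=1e-12):
    """Undirected edges (i, j), 1-based, for each non-zero off-diagonal entry."""
    k = len(theta)
    return [
        (i + 1, j + 1)
        for i in range(k)
        for j in range(i + 1, k)
        if abs(theta[i][j]) > tol
    ]


def neighbors(theta, node, tol=1e-12):
    """Neighbors of a 1-based node, read from the non-zeros in its column."""
    col = node - 1
    return [
        i + 1
        for i in range(len(theta))
        if i != col and abs(theta[i][col]) > tol
    ]
```

With this matrix, the fifth column has non-zero off-diagonal entries in rows 2 and 4, so `neighbors(THETA, 5)` lists y_2 and y_4 as the neighbors of y_5.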

43 Probabilistic Graphical Models
Statistics and Mechanics are independent of each other conditional on Algebra.

44 Learning a Sparse Gaussian Graphical Model
Minimize the negative log-likelihood of the data with an L1 penalty:
$\hat{\Theta} = \arg\min_{\Theta} \; -\log\det\Theta + \mathrm{tr}(S\Theta) + \lambda \lVert \Theta \rVert_1$
where tr(A) is the trace of matrix A, S is the K-by-K sample covariance matrix, and ‖Θ‖_1 is the L1 regularization term.
The optimization problem is convex!
Many software packages exist (e.g., BIG&QUIC, Hsieh et al., NIPS 2013; FastGGM, Wang et al., PLoS Comp Bio 2016).
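To show what the penalized objective measures, the sketch below evaluates -log det Θ + tr(SΘ) + λ‖Θ‖_1 for a given candidate Θ. It is not a solver and is unrelated to the packages named above. It assumes that the L1 norm sums the absolute values of all entries, including the diagonal (some formulations penalize only off-diagonal entries), and the function names are introduced here.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L^T; fails if a is not positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def glasso_objective(theta, s, lam):
    """Evaluate -log det(theta) + tr(s theta) + lam * sum |theta_ij|."""
    k = len(theta)
    low = cholesky(theta)
    # log det(theta) = 2 * sum of logs of the Cholesky diagonal.
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(k))
    trace = sum(s[i][j] * theta[j][i] for i in range(k) for j in range(k))
    l1 = sum(abs(theta[i][j]) for i in range(k) for j in range(k))
    return -log_det + trace + lam * l1
```

With λ = 0 the objective is minimized by the inverse of S (when S is invertible); increasing λ trades likelihood for zeros in Θ, that is, for fewer edges in the graph.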

45 Extensions of Gaussian Graphical Models
Network with hubs (Tan et al., JMLR 2014)
Network with block structures (Tan et al., UAI 2009)
[Figure: network learned from glioblastoma expression data, with hubs shown as pink nodes.]

46 Summary
Modeling gene networks with Bayesian networks
  Probabilistic model for learning modules of variables and their structural dependencies
  Module networks have improved performance over Bayesian networks: statistical robustness, interpretability
  Reconstruction of many known regulatory modules and prediction of targets for unknown regulators
Modeling gene networks with undirected networks
  Gaussian graphical models are extremely popular: fast learning methods are available (more efficient than Bayesian network learning)


More information

From genes to func.on Gene regula.on and transcrip.on

From genes to func.on Gene regula.on and transcrip.on From genes to func.on Gene regula.on and transcrip.on Systems biology for system engineers Part 2 Sofia Pe(ersson Informa.on Coding Dept. of Electrical Engineering Linköping University The eukaryo.c cell

More information

11 : Gaussian Graphic Models and Ising Models

11 : Gaussian Graphic Models and Ising Models 10-708: Probabilistic Graphical Models 10-708, Spring 2017 11 : Gaussian Graphic Models and Ising Models Lecturer: Bryon Aragam Scribes: Chao-Ming Yen 1 Introduction Different from previous maximum likelihood

More information

25 : Graphical induced structured input/output models

25 : Graphical induced structured input/output models 10-708: Probabilistic Graphical Models 10-708, Spring 2013 25 : Graphical induced structured input/output models Lecturer: Eric P. Xing Scribes: Meghana Kshirsagar (mkshirsa), Yiwen Chen (yiwenche) 1 Graph

More information

Learning Bayesian Networks Does Not Have to Be NP-Hard

Learning Bayesian Networks Does Not Have to Be NP-Hard Learning Bayesian Networks Does Not Have to Be NP-Hard Norbert Dojer Institute of Informatics, Warsaw University, Banacha, 0-097 Warszawa, Poland dojer@mimuw.edu.pl Abstract. We propose an algorithm for

More information

Evolu&on of Cellular Interac&on Networks. Pedro Beltrao Krogan and Lim UCSF

Evolu&on of Cellular Interac&on Networks. Pedro Beltrao Krogan and Lim UCSF Evolu&on of Cellular Interac&on Networks Pedro Beltrao Krogan and Lim Labs @ UCSF Point muta&ons Recombina&on Duplica&ons Mutants Mutants Point muta&ons Recombina&on Duplica&ons Changes in protein- protein,

More information

Research Statement on Statistics Jun Zhang

Research Statement on Statistics Jun Zhang Research Statement on Statistics Jun Zhang (junzhang@galton.uchicago.edu) My interest on statistics generally includes machine learning and statistical genetics. My recent work focus on detection and interpretation

More information

CS 2750: Machine Learning. Bayesian Networks. Prof. Adriana Kovashka University of Pittsburgh March 14, 2016

CS 2750: Machine Learning. Bayesian Networks. Prof. Adriana Kovashka University of Pittsburgh March 14, 2016 CS 2750: Machine Learning Bayesian Networks Prof. Adriana Kovashka University of Pittsburgh March 14, 2016 Plan for today and next week Today and next time: Bayesian networks (Bishop Sec. 8.1) Conditional

More information

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Graphical Models. Lecture 3: Local Condi6onal Probability Distribu6ons. Andrew McCallum

Graphical Models. Lecture 3: Local Condi6onal Probability Distribu6ons. Andrew McCallum Graphical Models Lecture 3: Local Condi6onal Probability Distribu6ons Andrew McCallum mccallum@cs.umass.edu Thanks to Noah Smith and Carlos Guestrin for some slide materials. 1 Condi6onal Probability Distribu6ons

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Systems biology Introduction to Bioinformatics Systems biology: modeling biological p Study of whole biological systems p Wholeness : Organization of dynamic interactions Different behaviour of the individual

More information

Predicting Protein Functions and Domain Interactions from Protein Interactions

Predicting Protein Functions and Domain Interactions from Protein Interactions Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput

More information

{ p if x = 1 1 p if x = 0

{ p if x = 1 1 p if x = 0 Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =

More information

Self Similar (Scale Free, Power Law) Networks (I)

Self Similar (Scale Free, Power Law) Networks (I) Self Similar (Scale Free, Power Law) Networks (I) E6083: lecture 4 Prof. Predrag R. Jelenković Dept. of Electrical Engineering Columbia University, NY 10027, USA {predrag}@ee.columbia.edu February 7, 2007

More information

Causal Graphical Models in Systems Genetics

Causal Graphical Models in Systems Genetics 1 Causal Graphical Models in Systems Genetics 2013 Network Analysis Short Course - UCLA Human Genetics Elias Chaibub Neto and Brian S Yandell July 17, 2013 Motivation and basic concepts 2 3 Motivation

More information

Principles of Gene Expression

Principles of Gene Expression Principles of Gene Expression I. Introduc5on Genome : the en*re set of genes (transcrip*on units) of an organism Transcriptome : the en*re set of marns found in a cell at a given *me Proteome : the en*re

More information

Machine learning for Dynamic Social Network Analysis

Machine learning for Dynamic Social Network Analysis Machine learning for Dynamic Social Network Analysis Manuel Gomez Rodriguez Max Planck Ins7tute for So;ware Systems UC3M, MAY 2017 Interconnected World SOCIAL NETWORKS TRANSPORTATION NETWORKS WORLD WIDE

More information

Multi Omics Clustering. ABDBM Ron Shamir

Multi Omics Clustering. ABDBM Ron Shamir Multi Omics Clustering ABDBM Ron Shamir 1 Outline Introduction Cluster of Clusters (COCA) icluster Nonnegative Matrix Factorization (NMF) Similarity Network Fusion (SNF) Multiple Kernel Learning (MKL)

More information

Least Squares Parameter Es.ma.on

Least Squares Parameter Es.ma.on Least Squares Parameter Es.ma.on Alun L. Lloyd Department of Mathema.cs Biomathema.cs Graduate Program North Carolina State University Aims of this Lecture 1. Model fifng using least squares 2. Quan.fica.on

More information

UVA CS / Introduc8on to Machine Learning and Data Mining

UVA CS / Introduc8on to Machine Learning and Data Mining UVA CS 4501-001 / 6501 007 Introduc8on to Machine Learning and Data Mining Lecture 13: Probability and Sta3s3cs Review (cont.) + Naïve Bayes Classifier Yanjun Qi / Jane, PhD University of Virginia Department

More information

Reasoning Under Uncertainty: Belief Network Inference

Reasoning Under Uncertainty: Belief Network Inference Reasoning Under Uncertainty: Belief Network Inference CPSC 322 Uncertainty 5 Textbook 10.4 Reasoning Under Uncertainty: Belief Network Inference CPSC 322 Uncertainty 5, Slide 1 Lecture Overview 1 Recap

More information

Quan&fying Uncertainty. Sai Ravela Massachuse7s Ins&tute of Technology

Quan&fying Uncertainty. Sai Ravela Massachuse7s Ins&tute of Technology Quan&fying Uncertainty Sai Ravela Massachuse7s Ins&tute of Technology 1 the many sources of uncertainty! 2 Two days ago 3 Quan&fying Indefinite Delay 4 Finally 5 Quan&fying Indefinite Delay P(X=delay M=

More information

Differen'al Privacy with Bounded Priors: Reconciling U+lity and Privacy in Genome- Wide Associa+on Studies

Differen'al Privacy with Bounded Priors: Reconciling U+lity and Privacy in Genome- Wide Associa+on Studies Differen'al Privacy with Bounded Priors: Reconciling U+lity and Privacy in Genome- Wide Associa+on Studies Florian Tramèr, Zhicong Huang, Erman Ayday, Jean- Pierre Hubaux ACM CCS 205 Denver, Colorado,

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of

More information

networks in molecular biology Wolfgang Huber

networks in molecular biology Wolfgang Huber networks in molecular biology Wolfgang Huber networks in molecular biology Regulatory networks: components = gene products interactions = regulation of transcription, translation, phosphorylation... Metabolic

More information

2 : Directed GMs: Bayesian Networks

2 : Directed GMs: Bayesian Networks 10-708: Probabilistic Graphical Models 10-708, Spring 2017 2 : Directed GMs: Bayesian Networks Lecturer: Eric P. Xing Scribes: Jayanth Koushik, Hiroaki Hayashi, Christian Perez Topic: Directed GMs 1 Types

More information

Introduction to Particle Filters for Data Assimilation

Introduction to Particle Filters for Data Assimilation Introduction to Particle Filters for Data Assimilation Mike Dowd Dept of Mathematics & Statistics (and Dept of Oceanography Dalhousie University, Halifax, Canada STATMOS Summer School in Data Assimila5on,

More information

25 : Graphical induced structured input/output models

25 : Graphical induced structured input/output models 10-708: Probabilistic Graphical Models 10-708, Spring 2016 25 : Graphical induced structured input/output models Lecturer: Eric P. Xing Scribes: Raied Aljadaany, Shi Zong, Chenchen Zhu Disclaimer: A large

More information

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison 10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:

More information

Bayesian Networks. Motivation

Bayesian Networks. Motivation Bayesian Networks Computer Sciences 760 Spring 2014 http://pages.cs.wisc.edu/~dpage/cs760/ Motivation Assume we have five Boolean variables,,,, The joint probability is,,,, How many state configurations

More information

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it? Proteomics What is it? Reveal protein interactions Protein profiling in a sample Yeast two hybrid screening High throughput 2D PAGE Automatic analysis of 2D Page Yeast two hybrid Use two mating strains

More information

An Introduction to Reversible Jump MCMC for Bayesian Networks, with Application

An Introduction to Reversible Jump MCMC for Bayesian Networks, with Application An Introduction to Reversible Jump MCMC for Bayesian Networks, with Application, CleverSet, Inc. STARMAP/DAMARS Conference Page 1 The research described in this presentation has been funded by the U.S.

More information

Discovering molecular pathways from protein interaction and ge

Discovering molecular pathways from protein interaction and ge Discovering molecular pathways from protein interaction and gene expression data 9-4-2008 Aim To have a mechanism for inferring pathways from gene expression and protein interaction data. Motivation Why

More information

Lecture 5: Bayesian Network

Lecture 5: Bayesian Network Lecture 5: Bayesian Network Topics of this lecture What is a Bayesian network? A simple example Formal definition of BN A slightly difficult example Learning of BN An example of learning Important topics

More information

Introduction to Probabilistic Graphical Models

Introduction to Probabilistic Graphical Models Introduction to Probabilistic Graphical Models Kyu-Baek Hwang and Byoung-Tak Zhang Biointelligence Lab School of Computer Science and Engineering Seoul National University Seoul 151-742 Korea E-mail: kbhwang@bi.snu.ac.kr

More information

Bayesian Networks (Part II)

Bayesian Networks (Part II) 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Bayesian Networks (Part II) Graphical Model Readings: Murphy 10 10.2.1 Bishop 8.1,

More information

Classifica(on and predic(on omics style. Dr Nicola Armstrong Mathema(cs and Sta(s(cs Murdoch University

Classifica(on and predic(on omics style. Dr Nicola Armstrong Mathema(cs and Sta(s(cs Murdoch University Classifica(on and predic(on omics style Dr Nicola Armstrong Mathema(cs and Sta(s(cs Murdoch University Classifica(on Learning Set Data with known classes Prediction Classification rule Data with unknown

More information

Learning P-maps Param. Learning

Learning P-maps Param. Learning Readings: K&F: 3.3, 3.4, 16.1, 16.2, 16.3, 16.4 Learning P-maps Param. Learning Graphical Models 10708 Carlos Guestrin Carnegie Mellon University September 24 th, 2008 10-708 Carlos Guestrin 2006-2008

More information

Prof. Dr. Ralf Möller Dr. Özgür L. Özçep Universität zu Lübeck Institut für Informationssysteme. Tanya Braun (Exercises)

Prof. Dr. Ralf Möller Dr. Özgür L. Özçep Universität zu Lübeck Institut für Informationssysteme. Tanya Braun (Exercises) Prof. Dr. Ralf Möller Dr. Özgür L. Özçep Universität zu Lübeck Institut für Informationssysteme Tanya Braun (Exercises) Slides taken from the presentation (subset only) Learning Statistical Models From

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models David Sontag New York University Lecture 4, February 16, 2012 David Sontag (NYU) Graphical Models Lecture 4, February 16, 2012 1 / 27 Undirected graphical models Reminder

More information