Nested Effects Models at Work

1 Nested Effects Models at Work
Tutorial Session: Network Modelling in Systems Biology with R, 21/06/2010
Prof. Dr. Holger Fröhlich, Algorithmic Bioinformatics, Bonn-Aachen International Center for Information Technology (B-IT)

2 Learning From Perturbation Effects
[Figure: perturbations P1 to P4, each with its measured effects]
Readouts: microarrays, RNAi screening, reverse phase protein arrays, mass spec, RNAseq, ...

3 Break a system to learn how it works

4 Pathways from RNAi Data: an Example

5 Pathways from RNAi Data: an Example
Computational approach: Bayesian Networks - Nested Effects Models
Markowetz et al., 2005, 2007; Fröhlich et al., 2007, 2008a, b, 2009; Tresch and Markowetz, 2008; Zeller et al., 2008; Vaske et al., 2009; Anchang et al., 2009

6 Principle Idea of Nested Effects Models
Distinguish between perturbed genes (S1, S2, S3, S4, connected by the signaling graph Φ) and observed effects (linked to the perturbed genes by θ).
- Measure downstream effects of each knockdown
- Network reconstruction is based on observed effects under different perturbations
[Figure: matrix of observed effects (rows) by perturbed genes S1, S2, S3, S4 (columns)]
Markowetz et al., 2005

7 Nested Effects Models (NEMs) are transitively closed graphs that explain the nested structure of downstream effects.
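The nesting can be made concrete with a small sketch in base R. The three-gene chain, the one-effect-gene-per-perturbed-gene attachment, and the helper name `transitive_closure` are all assumptions chosen for illustration; none of this is code from the nem package.

```r
# Toy signaling graph on three perturbed genes: S1 -> S2 -> S3 (assumed example).
genes <- c("S1", "S2", "S3")
phi <- matrix(0L, 3, 3, dimnames = list(genes, genes))
phi["S1", "S2"] <- 1L
phi["S2", "S3"] <- 1L
diag(phi) <- 1L  # each perturbed gene influences its own effect genes

# Transitive closure via Warshall's algorithm (hypothetical helper).
transitive_closure <- function(adj) {
  n <- nrow(adj)
  for (k in seq_len(n)) {
    for (i in seq_len(n)) {
      for (j in seq_len(n)) {
        if (adj[i, k] == 1L && adj[k, j] == 1L) adj[i, j] <- 1L
      }
    }
  }
  adj
}

phi_closed <- transitive_closure(phi)

# Attach one effect gene to each perturbed gene (assumption for illustration).
theta <- diag(3)
dimnames(theta) <- list(genes, c("E1", "E2", "E3"))

# Expected effects: perturbing the row gene changes effect gene k
# if the row gene reaches the gene that k is attached to.
expected <- (phi_closed %*% theta > 0) * 1L
print(expected)
```

In this toy case the effects of perturbing S3 are a subset of those of S2, which are in turn a subset of those of S1; that subset pattern is what the model exploits.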

8 Likelihood of the Signaling Graph (Φ)
Two different approaches:
- Bayesian: integrate over effects linkage graphs Θ, assuming $P(\Theta \mid \Phi) = P(\Theta)$:
  $$P(D \mid \Phi) = \sum_{\Theta} P(D \mid \Phi, \Theta)\, P(\Theta)$$
  Markowetz et al., 2005; Fröhlich et al., 2007, 2008
- Take the MAP/ML estimator for Θ:
  $$\hat{\Theta} = \arg\max_{\Theta} P(D \mid \Phi, \Theta)\, P(\Theta), \qquad P(\Phi \mid D, \hat{\Theta}) = \frac{P(D \mid \Phi, \hat{\Theta})\, P(\Phi)}{P(D)}$$
  Tresch et al., 2008


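10 Calculation of Effect Likelihoods
Factorization of the likelihood under i.i.d. assumption:
$$P(D \mid \Phi) = \sum_{\Theta} P(D \mid \Phi, \Theta)\, P(\Theta) = \prod_{k \in E} \sum_{s \in S} \prod_{t \in S} P(D_{tk} \mid \Phi, \Theta_{sk} = 1)\, P(\Theta_{sk} = 1) \propto \prod_{k \in E} \sum_{s \in S} \prod_{t \in S} P(D_{tk} \mid m_{tk} = \Phi_{ts})$$
1. Model for binary data D with fixed error probabilities α (false positive rate) and β (false negative rate):
$$P(D_{tk} \mid m_{tk}) = \begin{cases} 1 - \beta & \text{if } D_{tk} = 1,\ m_{tk} = 1 \\ \beta & \text{if } D_{tk} = 0,\ m_{tk} = 1 \\ \alpha & \text{if } D_{tk} = 1,\ m_{tk} = 0 \\ 1 - \alpha & \text{if } D_{tk} = 0,\ m_{tk} = 0 \end{cases}$$
Markowetz et al., 2005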
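A minimal sketch of this factorized likelihood for binary data, in base R. It assumes a uniform prior over attachment positions (so the sum over s becomes a mean) and stores the data with perturbations t in rows and effect genes k in columns, matching the index order of D_tk. The function name `nem_loglik_binary` and the toy matrices are hypothetical, not part of the nem package.

```r
# Log marginal likelihood of a signaling graph for binary effect data.
# D:   perturbations (rows, index t) x effect genes (columns, index k), entries 0/1
# phi: transitively closed adjacency matrix, phi[t, s] = 1 if t is upstream of s
nem_loglik_binary <- function(D, phi, alpha, beta) {
  n <- nrow(phi)
  loglik <- 0
  for (k in seq_len(ncol(D))) {
    per_position <- numeric(n)
    for (s in seq_len(n)) {
      m <- phi[, s]  # m[t] = phi[t, s]: effect expected when t is perturbed?
      p <- ifelse(m == 1,
                  ifelse(D[, k] == 1, 1 - beta, beta),
                  ifelse(D[, k] == 1, alpha, 1 - alpha))
      per_position[s] <- prod(p)
    }
    # Assumed uniform prior over the n attachment positions.
    loglik <- loglik + log(mean(per_position))
  }
  loglik
}

# Toy chain S1 -> S2 -> S3 (already transitively closed), noise-free data
# with effect gene k attached to perturbed gene k.
phi_chain <- matrix(c(1, 1, 1,
                      0, 1, 1,
                      0, 0, 1), nrow = 3, byrow = TRUE)
D_toy <- phi_chain
ll_chain <- nem_loglik_binary(D_toy, phi_chain, alpha = 0.1, beta = 0.2)
ll_reversed <- nem_loglik_binary(D_toy, t(phi_chain), alpha = 0.1, beta = 0.2)
print(c(chain = ll_chain, reversed = ll_reversed))
```

On this toy data the generating chain scores higher than the reversed chain, which is the comparison a structure search relies on.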

11 Modeling Continuous Data
2. Data D are computed as p-values for significant change when comparing interventions to non-interventions.
- Under the null hypothesis (i.e. expecting no effect), p-values are distributed uniformly.
- Under the alternative hypothesis (i.e. expecting an effect), there is a high density for small p-values and a strong decrease for increasing p-values [Pounds et al., 2003].
$$f(D_{tk}) = \pi_{1k} + \pi_{2k}\,\mathrm{Beta}(D_{tk}, \alpha_t, 1) + \pi_{3k}\,\mathrm{Beta}(D_{tk}, 1, \beta_t)$$
-> fit via EM algorithm
$$P(D_{tk} \mid m_{tk}) = \begin{cases} \dfrac{f(D_{tk}) - f(1)}{1 - f(1)} & \text{if } m_{tk} = 1 \\ 1 & \text{if } m_{tk} = 0 \end{cases}$$
Fröhlich et al., 2008
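The mixture density and the derived effect density can be sketched in base R as follows. The mixture weights and shape parameters are made-up values for illustration; in practice they would come from the EM fit, which is not shown here. The function names are hypothetical.

```r
# Three-component mixture for p-values: uniform + Beta(a, 1) + Beta(1, b).
# w = c(pi1, pi2, pi3) are mixture weights (assumed to sum to 1).
mixture_density <- function(p, w, a, b) {
  w[1] + w[2] * dbeta(p, a, 1) + w[3] * dbeta(p, 1, b)
}

# Density of a p-value given that an effect is expected (m = 1):
# subtract the flat part f(1) and renormalize. For m = 0 the density is 1.
effect_density <- function(p, w, a, b) {
  f1 <- mixture_density(1, w, a, b)
  (mixture_density(p, w, a, b) - f1) / (1 - f1)
}

# Illustrative parameter values (assumptions, not fitted to any data).
w <- c(0.6, 0.1, 0.3)
a <- 0.3
b <- 10

small_p <- effect_density(0.001, w, a, b)
large_p <- effect_density(0.5, w, a, b)
print(c(small_p = small_p, large_p = large_p))
```

With these values, small p-values receive a much higher density under the effect model than under the null density of 1, while p-values near 1 receive almost none.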

12 Using NEMs in R
library(nem)
load("raw_pvaluesboutros2002.rda")
D = getDensityMatrix(pvalues)


14 How to Infer the Network Structure?
Complete enumeration of all topologies (Markowetz et al., 2005):
1. Choose a candidate graph over S1, S2, S3, S4
2. Calculate its score with the likelihood model, e.g. using Bayesian statistics (average over E-gene positions)
3. Propose a different topology and repeat
Combinatorial explosion:
- n = 4: 355 possible networks
- n = 10: ~10^27 possible networks
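The count for n = 4 can be reproduced by brute force. The sketch below, in base R, enumerates every directed graph on n nodes (with self-loops fixed) and keeps those that are transitively closed; the function name is hypothetical. It only scales to very small n, since there are 2^(n(n-1)) raw directed graphs to check; for n = 10 that is 2^90, roughly 1.2 x 10^27, the same order of magnitude as the figure quoted for n = 10.

```r
# Count reflexive, transitively closed graphs on n labeled nodes by enumeration.
count_nem_topologies <- function(n) {
  off_diag <- which(!diag(n))      # linear indices of the n*(n-1) possible edges
  n_edges <- length(off_diag)
  count <- 0
  for (code in 0:(2^n_edges - 1)) {
    adj <- diag(n)
    # Decode the integer `code` into one 0/1 entry per possible edge.
    adj[off_diag] <- (code %/% 2^(seq_len(n_edges) - 1)) %% 2
    # Transitively closed iff every two-step path is also a direct edge.
    if (all((adj %*% adj > 0) == (adj > 0))) count <- count + 1
  }
  count
}

print(count_nem_topologies(3))
print(count_nem_topologies(4))
```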

15 Heuristics for Large Networks (> 4 S-Genes)
- MCMC sampling: time consuming; a neighborhood relation in transitively closed graphs is difficult
- Greedy hill climbing (Fröhlich et al., Bioinformatics, 2008)
- Module networks (Fröhlich et al., BMC Bioinformatics, 2007; Fröhlich et al., Bioinformatics, 2008)
- Triplets inference (Markowetz et al., Bioinformatics, 2007)
- Alternating MAP optimization (Tresch and Markowetz, Stat. Appl. Mol. Biol., 2008)

16 Large Scale Networks: Module Networks
Problem: complete enumeration of all network hypotheses is only possible for small networks (< 5 S-genes)
Solution: divide and conquer
1. Find the highest scoring subnetworks for modules of S-genes
2. Estimate connections between modules

17 Large Scale Networks: (a) Module Networks
[Figure: module network over S-genes S1 to S9; plot of log-likelihood by network]
Fröhlich et al., 2007, 2008


18 Network Inference with the nem-package
control = set.default.parameters(unique(colnames(D)), type="CONTmLLBayes")
mynem = nem(D, inference="ModuleNetwork", control=control, verbose=FALSE)
plot.nem(mynem, SCC=FALSE, D=D, draw.lines=TRUE)

19 Automated Selection of Relevant E-Genes (Feature Selection)
Motivation: irrelevant E-genes can degrade network estimation accuracy
1. Select only E-genes with a positive contribution to the model's log-likelihood.
2. Re-estimate the network with the new set of E-genes.
3. Iterate the process until convergence.
Fröhlich et al., 2008

20 Network Inference with the nem-package
D2 = BoutrosRNAiDiscrete[,9:16]
control = set.default.parameters(unique(colnames(D2)), selEGenes=TRUE)
mynem2 = nem(D2, inference="triples", control=control, verbose=FALSE)
plot.nem(mynem2, D=D2, draw.lines=TRUE)

21 Incorporation of Prior Knowledge
Bias scoring such that known interactions are considered: Bayesian prior on network structure
$$P(\Phi) = \prod_{i,j} P(\Phi_{ij})$$
$$P(\Phi_{ij} \mid \nu) = \frac{1}{2\nu} \exp\left(-\frac{|\Phi_{ij} - \hat{\Phi}_{ij}|}{\nu}\right)$$
Φ = signaling graph, $\hat{\Phi}$ = prior belief, ν = hyperparameter of the Laplace distribution (scale of prior). Small ν: complete trust in prior; large ν: complete trust in data.
$$P(\Phi_{ij}) = \int_0^{\infty} P(\Phi_{ij} \mid \nu)\, P(\nu)\, d\nu, \qquad \nu \sim \mathrm{InvGamma}(1, 0.5)$$
$$P(\Phi_{ij}) = \frac{1}{\left(1 + 2\,|\Phi_{ij} - \hat{\Phi}_{ij}|\right)^{2}}$$
Fröhlich et al., 2008
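As a sanity check, the marginalization over ν can be evaluated numerically in base R. The sketch assumes the shape-scale parameterization of the inverse gamma distribution (shape 1, scale 0.5); under that assumption, integrating the Laplace density against the inverse gamma density gives the closed form 1 / (1 + 2|x|)^2, where x is the deviation of an edge from the prior belief. Function names are hypothetical.

```r
# Integrand: Laplace(x | scale nu) * InvGamma(nu | shape 1, scale 0.5),
# evaluated in log space to avoid overflow for very small nu.
prior_integrand <- function(nu, x) {
  log_laplace <- -log(2 * nu) - abs(x) / nu
  log_invgamma <- log(0.5) - 2 * log(nu) - 0.5 / nu  # assumed parameterization
  exp(log_laplace + log_invgamma)
}

# Numerical marginal prior for a deviation x from the prior belief.
marginal_prior_numeric <- function(x) {
  integrate(prior_integrand, lower = 0, upper = Inf, x = x)$value
}

# Closed form obtained under the same assumption.
marginal_prior_closed <- function(x) 1 / (1 + 2 * abs(x))^2

deviations <- c(0, 0.5, 1)
numeric_values <- sapply(deviations, marginal_prior_numeric)
closed_values <- marginal_prior_closed(deviations)
print(rbind(numeric = numeric_values, closed = closed_values))
```

The marginal prior has heavier tails than a Laplace density with fixed scale, so an edge that contradicts the prior belief is penalized but not ruled out.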

22 Using Prior Knowledge with the nem-package
control = set.default.parameters(unique(colnames(D)), selEGenes=TRUE, type="CONTmLLMAP", Pm=diag(4))
mynem3 = nem(D, control=control, verbose=FALSE)
plot.nem(mynem3, SCC=FALSE, D=D, draw.lines=TRUE)

23 Statistical Stability and Significance
How stable is the inferred network? Do small changes of E-genes lead to different network hypotheses?
- Use non-parametric bootstrap: repeatedly sample n E-genes with replacement
Is the inferred network better than random?
- Randomly permute node labels and check whether the random network has a higher likelihood
[Figure: network over nodes P, Q, R, S with edge label 0.7; node labels permuted across slides 23 to 25]
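The bootstrap idea can be sketched in base R without the nem package. To keep the example self-contained, network inference is replaced by a deliberately simple stand-in (`infer_by_subsets`, a hypothetical helper that draws an edge whenever one gene's effects are a subset of another's); it is not the inference method used by the package. The toy data, with effect genes in rows and perturbed genes in columns, are also an assumption.

```r
set.seed(1)

# Toy data: chain S1 -> S2 -> S3, ten noise-free effect genes per perturbed gene.
genes <- c("S1", "S2", "S3")
phi <- matrix(c(1, 1, 1,
                0, 1, 1,
                0, 0, 1), nrow = 3, byrow = TRUE, dimnames = list(genes, genes))
attachment <- rep(1:3, each = 10)
D <- t(phi[, attachment])  # rows: effect genes, columns: perturbed genes

# Stand-in inference (assumption): edge up -> down if every effect of
# perturbing `down` is also an effect of perturbing `up`.
infer_by_subsets <- function(D) {
  n <- ncol(D)
  adj <- matrix(0, n, n, dimnames = list(colnames(D), colnames(D)))
  for (up in seq_len(n)) {
    for (down in seq_len(n)) {
      if (up == down || sum(D[, down]) == 0) next
      adj[up, down] <- as.numeric(all(D[D[, down] == 1, up] == 1))
    }
  }
  adj
}

# Non-parametric bootstrap: resample effect genes (rows) with replacement
# and record how often each edge is inferred.
n_boot <- 100
edge_freq <- matrix(0, 3, 3, dimnames = list(genes, genes))
for (b in seq_len(n_boot)) {
  rows <- sample(nrow(D), replace = TRUE)
  edge_freq <- edge_freq + infer_by_subsets(D[rows, , drop = FALSE])
}
edge_freq <- edge_freq / n_boot
print(edge_freq)
```

The resulting matrix holds bootstrap edge frequencies; edges that appear in most resamples can be read as stable, in the spirit of the edge probabilities shown in the bootstrap plot.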

26 Bootstrapping and Significance Calculation with the nem-package
control = set.default.parameters(unique(colnames(D)), type="CONTmLLBayes", Pm=diag(4))
mynem.boot = nem.bootstrap(D, nboot=100, control=control)
plot.nem(mynem.boot, SCC=FALSE, plot.probs=TRUE)
nem.calcSignificance(D, N=1000, mynem.boot)
p = (label permutation test)

27 Summary
- Inference of features of signaling pathways from high dimensional, targeted perturbation effects
- Different likelihood models: discretized data, p-value log-densities
- Algorithms for inference of large networks: Module Networks, Triplets, Greedy hill climbing, ...
- Possibility to integrate prior knowledge
- Automatic selection of relevant E-genes
- Various plotting and analysis methods: non-parametric bootstrap, label permutation p-values

28 Acknowledgements
Div. Molecular Genome Analysis, DKFZ
Bioinformatics: - Tim Beißbarth - Christian Bender - Marc Johannes
Expression Profiling: - Holger Sültmann - Marc Fellmann - Ruprecht Kuner - Sabrina Belauger
Proteomics: - Christian Löbke - Özgür Sahin - Dorit Arlt
