Inferring Protein-Signaling Networks II


Inferring Protein-Signaling Networks II
Lecture 15, Nov 16, 2011
CSE 527 Computational Biology, Fall 2011
Instructor: Su-In Lee; TA: Christopher Miles
Monday & Wednesday 12:00-1:20, Johnson Hall (JHN)

Closing the Loop
Thank you for your participation in the survey!
Things that helped you:
- A very diverse set of topics
- Well organized ("Who is with me?")
- Quality of the slides
Things that did not help you:
- Lack of depth (depth is intended to be achieved through the problem sets and projects)
- HW problems need improved clarity

Outline
- Inferring the protein-signaling network
  - Computational problem
  - Structure score
  - Structure search
  - Interventional modeling
  - Evaluation of results
- Conclusion

Key learning points:
- Structure learning of Bayesian networks
- Intervention modeling
- Evaluation of the inferred biological network

Inferring Signaling Networks
Signaling networks are full of post-translational regulatory mechanisms (e.g., phosphorylation).
(Figure: the signaling network, with the measured proteins indicated.)

Computational Problem
Given data D, a cells-by-proteins matrix of measurements (praf, pmek, plcg, PIP2, PIP3, p44/42, pakts473, P38, pjnk), the goal is to infer the causal interaction network among Raf, Mek, Plc, PIP2, PIP3, P44/42, Akt, P38, and Jnk.
This is structure learning of a Bayesian network: a general machine learning problem applicable to many areas of network biology.

Structure Search
Score-based learning algorithm: given the set of all possible structures and a scoring function, select the highest-scoring network structure.
Greedy hill-climbing search makes the single edge change that most increases the graph score:
- Add an edge
- Remove an edge
- Reverse an edge
(Moves are constrained so that the graph stays acyclic.)
Efficiency hinges on score decomposition.
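As a concrete sketch of the search space, the three edge moves with the acyclicity constraint can be enumerated as follows (a minimal illustration, not the course's code; graphs are represented as sets of (parent, child) edges):

```python
from itertools import permutations

def has_cycle(edges, nodes):
    """Depth-first search for a directed cycle."""
    adj = {n: [] for n in nodes}
    for p, c in edges:
        adj[p].append(c)
    state = {n: 0 for n in nodes}  # 0 = unseen, 1 = on stack, 2 = done
    def dfs(u):
        state[u] = 1
        for v in adj[u]:
            if state[v] == 1 or (state[v] == 0 and dfs(v)):
                return True
        state[u] = 2
        return False
    return any(state[n] == 0 and dfs(n) for n in nodes)

def neighbors(edges, nodes):
    """All graphs one edge-add, -removal, or -reversal away that stay acyclic."""
    for p, c in permutations(nodes, 2):
        e = (p, c)
        if e in edges:
            yield edges - {e}                     # remove an edge (always acyclic)
            rev = (edges - {e}) | {(c, p)}
            if not has_cycle(rev, nodes):
                yield rev                         # reverse an edge
        elif not has_cycle(edges | {e}, nodes):
            yield edges | {e}                     # add an edge

# From the chain A -> B -> C, adding C -> A would close a cycle and is skipped.
nbrs = list(neighbors({("A", "B"), ("B", "C")}, ["A", "B", "C"]))
assert {("A", "B"), ("B", "C"), ("C", "A")} not in nbrs
```

Hill-climbing then repeatedly scores these neighbors and moves to the best one; with a decomposable score, only the families whose parent sets changed need re-scoring.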

Score Decomposition
Greedy hill-climbing compares the current graph G and a candidate G' (say, G' adds one edge into node 4) in terms of the structure score. If score decomposability holds,

Score(G) = S_1(X_1, Pa_1) + S_2(X_2, Pa_2) + ... + S_6(X_6, Pa_6),

where S_i(X_i, Pa_i) is the family score (FamScore) of node i given its parents Pa_i. Then, moving from G to G', we only need to re-compute S_4(X_4, Pa_4). Do the commonly used structure scores decompose this way?

Maximum Likelihood
Find the graph G that maximizes P(Data = D | Graph = G, Theta_MLE).
Notation:
- D = data; G = graph
- theta = CPT values for each node; theta_ijk = P(X_i = k | Parents(i) = j)
- N_ijk = number of times X_i = k and Parents(i) = j in the data
- K = number of discrete levels of each variable
The maximum-likelihood estimate is

theta_ijk^ML = N_ijk / sum_k' N_ijk'

For example, with three binary variables and data D = {(0,0,0), (0,0,1), (1,1,1), (1,0,0), (1,1,1), (1,0,1), ..., (1,0,0), (1,0,1)}, the estimate of P(X_3 = 0 | X_1 = 0, X_2 = 0) is [# of (0,0,0)] / [# of (0,0,*)].
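The ML estimate above can be sketched in a few lines (a toy illustration; the variable names X1-X3 and the dict-based sample format are assumptions, not the lecture's code):

```python
from collections import Counter

def mle_cpt(data, child, parents):
    """theta_ijk = N_ijk / sum_k' N_ijk': count, then normalize per parent config."""
    joint = Counter()     # N_ijk, keyed by (parent config j, child value k)
    marginal = Counter()  # N_ij = sum_k N_ijk
    for sample in data:   # each sample maps variable name -> discrete value
        j = tuple(sample[p] for p in parents)
        joint[(j, sample[child])] += 1
        marginal[j] += 1
    return {(j, k): n / marginal[j] for (j, k), n in joint.items()}

# The slide's binary example: estimate P(X3 = 0 | X1 = 0, X2 = 0)
D = [(0,0,0), (0,0,1), (1,1,1), (1,0,0), (1,1,1), (1,0,1), (1,0,0), (1,0,1)]
data = [{"X1": a, "X2": b, "X3": c} for a, b, c in D]
cpt = mle_cpt(data, "X3", ["X1", "X2"])
print(cpt[((0, 0), 0)])  # [# of (0,0,0)] / [# of (0,0,*)] = 1/2 = 0.5
```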

Structure Score
The Bayesian score is the marginal likelihood

P(Data = D | Graph = G) = Integral of P(D | G, theta) P(theta | G) dtheta

With each family distributed as a Multinomial over the K levels and a Dirichlet prior theta_ij ~ Dir(alpha_ij1, ..., alpha_ijK) over the simplex {sum_k theta_ijk = 1}, the integral has a closed form built from the Dirichlet normalizer:

P(D | G) = prod_i prod_j [ Gamma(alpha_ij) / Gamma(alpha_ij + N_ij) ] prod_k [ Gamma(alpha_ijk + N_ijk) / Gamma(alpha_ijk) ],

where alpha_ij = sum_k alpha_ijk and N_ij = sum_k N_ijk.

References:
- D. Heckerman. A Tutorial on Learning with Bayesian Networks. 1997, revised 1999.
- G. Cooper and E. Herskovits. A Bayesian Method for the Induction of Probabilistic Networks from Data. Machine Learning, 9, 1992.

Structure Score (continued)
Taking logs, the score becomes log P(D | G) + log P(G), a sum of one term per family plus the structure prior: decomposability!

Bayesian Score
P(Data = D | Graph = G) = Integral of P(D | G, theta) P(theta | G) dtheta
The corresponding parameter estimates are

theta_ijk^BS = (N_ijk + alpha_ijk) / sum_k' (N_ijk' + alpha_ijk'),

where the alpha_ijk act as imaginary (pseudo-) counts. For example, the estimate of P(X_3 = 0 | X_1 = 0, X_2 = 0) becomes [# of (0,0,0) + alpha_C10] / [# of (0,0,*) + alpha_C10 + alpha_C11].
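For concreteness, the closed-form family score can be written with log-gamma functions (a sketch assuming uniform pseudo-counts alpha_ijk = alpha; not the course's implementation):

```python
import math
from collections import Counter

def log_family_score(data, child, parents, levels, alpha=1.0):
    """log prod_j [ Gamma(K*alpha)/Gamma(K*alpha + N_ij)
                    * prod_k Gamma(alpha + N_ijk)/Gamma(alpha) ]"""
    K = len(levels)
    N = Counter()
    for s in data:  # each sample maps variable name -> discrete value
        N[(tuple(s[p] for p in parents), s[child])] += 1
    score = 0.0
    for j in {j for (j, _) in N}:  # observed parent configurations
        n_ij = sum(N[(j, k)] for k in levels)
        score += math.lgamma(K * alpha) - math.lgamma(K * alpha + n_ij)
        for k in levels:
            score += math.lgamma(alpha + N[(j, k)]) - math.lgamma(alpha)
    return score

# Decomposability in action: with Y an exact copy of X, the family (Y | X)
# outscores the family (Y | no parents) on the same eight samples.
data = [{"X": v, "Y": v} for v in [0, 0, 0, 0, 1, 1, 1, 1]]
with_parent = log_family_score(data, "Y", ["X"], levels=[0, 1])
without = log_family_score(data, "Y", [], levels=[0, 1])
print(with_parent > without)  # True
```

Because the total log-score is the sum of such family terms, a single edge change only requires re-evaluating the families whose parent sets changed.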

Structure Learning Algorithm
Greedy hill-climbing search: make the single edge change that most increases the graph score.
- Add an edge into node 4: update only S_4(X_4, Pa_4)
- Remove an edge into node 3: update only S_3(X_3, Pa_3)
- Reverse an edge: update the family scores of both endpoints
Repeat single-edge changes until the score no longer increases (a local maximum).

Model Averaging
Generate N graphs from bootstrapped data: resample the dataset with replacement N times and learn one structure per resample.

Confidence(feature f) = (# graphs containing f) / (# graphs)

Select a confidence threshold and keep the features above it.
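The bootstrap-and-count procedure above can be sketched generically; learn_structure stands in for any structure learner (such as the greedy search), and the trivial learner in the demo is purely illustrative:

```python
import random

def edge_confidence(data, learn_structure, n_boot=100, seed=0):
    """Model averaging: learn one graph per bootstrap resample and score
    each edge by the fraction of graphs that contain it."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]  # draw with replacement
        for edge in learn_structure(resample):       # set of (parent, child)
            counts[edge] = counts.get(edge, 0) + 1
    return {e: c / n_boot for e, c in counts.items()}

# A learner that always returns the edge A -> B gives it confidence 1.0;
# real learners return different graphs on different resamples.
conf = edge_confidence([0, 1, 2], lambda d: {("A", "B")}, n_boot=10)
print(conf)  # {('A', 'B'): 1.0}
```

Edges above a chosen confidence threshold (say, 0.8) are kept in the final averaged network.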

Causal Networks
A Bayes net is NOT automatically a causal net: it encodes conditional independencies, e.g., (A ⊥ E), (B ⊥ D | A, E), (C ⊥ A, D, E | B), (D ⊥ B, C, E | A), (E ⊥ A, D).
Does structure learning of a Bayesian network reveal causal relationships? Not necessarily.
Pe'er D. Bayesian Network Analysis of Signaling Networks: A Primer. Science STKE, April 2005.

Causal Networks: Simple Example
For two variables X and Y, given observational data D = {(0,0), (0,1), (0,0), ..., (1,1)}, the Bayesian scores of the graphs X -> Y and Y -> X come out very similar (e.g., log-scores around -6.78 for both).
Data D containing intervention samples can improve the accuracy of causal inference:

D = { (0,0), (0,1), [do(X=1), 1], (0,0), [do(X=0), 0], [do(X=0), 0], (1,1), ..., (1,1) }

Observation vs. Intervention Data
The training data D is a cells-by-proteins matrix (praf, pmek, plcg, PIP2, PIP3, p44/42, pakts473, P38, pjnk). Some samples are measured with no intervention, others under intervention conditions: in each intervention condition, a chemical known to inhibit or activate a certain protein is added. How do we treat the data from the intervention conditions?

Intervention Modeling
Say the real network is X -> Y. Observational/interventional data would look like:

D = { (0,0), (1,1), (0,0), (0,0), (1,1), (0,1), (1,1), (1,0), (0,0),
      [do(X=0), 0], [do(X=1), 1], [do(X=0), 0], [do(X=0), 0], [do(X=1), 1],
      [1, do(Y=0)], [1, do(Y=1)], [0, do(Y=0)], [0, do(Y=1)], [0, do(Y=1)], ... }

How should the scores be computed in each case? Consider the ML score for the two candidate graphs G_1: X -> Y and G_2: Y -> X.

Intervention Modeling (continued)
When computing the structure score, the training samples in D are assumed to come from the same underlying network. However, do(X=0/1) means that X is forced to the value 0/1 and does not depend on its parents. We should therefore treat the intervention samples differently when computing the score: when estimating the MLE for X's CPD, ignore the samples with do(X=0/1), and likewise ignore the do(Y=0/1) samples when estimating Y's CPD.

Which of G_1 and G_2 is likely to score higher on this data? The dependency between X and Y is modeled better by G_1 (X -> Y): Y tracks the values forced by do(X), while X is unrelated to the values forced by do(Y). Intervention samples help!
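The "ignore counts for the intervened node" rule can be sketched as follows (the (values, intervened-set) sample format is an assumption for illustration):

```python
from collections import Counter

def family_counts(data, child, parents):
    """N_ijk for one family, skipping samples in which `child` itself was
    set by intervention: do(child = v) says nothing about how child depends
    on its parents. Interventions on other variables are kept."""
    N = Counter()
    for values, intervened in data:  # intervened: set of do()-controlled vars
        if child in intervened:
            continue                 # forced value -> exclude from child's CPD
        N[(tuple(values[p] for p in parents), values[child])] += 1
    return N

# Toy data for X -> Y: the do(Y=1) sample is dropped when counting for Y,
# but the do(X=1) sample still informs Y's dependence on X.
D = [({"X": 0, "Y": 0}, set()),
     ({"X": 1, "Y": 1}, set()),
     ({"X": 1, "Y": 1}, {"X"}),
     ({"X": 0, "Y": 1}, {"Y"})]
print(sum(family_counts(D, "Y", ["X"]).values()))  # 3
```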

Intervention Modeling (continued)
With intervention data, compute the score using a modified family score S_i*: ignore the counts for the intervened node. For example, with the intervention conditions "no intervention", "inhibit protein 3, do(X_3 = 0)", and "activate protein 6, do(X_6 = 1)", the samples from the inhibitor condition are excluded from node 3's family score, and those from the activator condition from node 6's.
G. Cooper and E. Herskovits. A Bayesian Method for the Induction of Probabilistic Networks from Data. Machine Learning, 9, 1992.

Overview
Conditions in multi-well format (perturbation a, perturbation b, ..., perturbation n) are measured by multiparameter flow cytometry, yielding a dataset of single cells per condition; Bayesian network analysis then produces an influence diagram of the measured variables.
Source: Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data. Sachs et al., Science (2005).

Signaling Networks Example
Classic signaling network and points of intervention in the human T cell (a white blood cell); the measured proteins are indicated.
Intervention conditions:
- Stimuli: anti-CD3, anti-CD28, ICAM-2
- Inhibitors of: Akt, ..., PIP3, Mek
- Activators of: ...
Source: Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data. Sachs et al., Science (2005).

Inferred Network
(Figure: the inferred network over phospho-proteins and phospho-lipids, with the nodes perturbed in the data marked: Plc, Raf, Jnk, P38, Mek, PIP3, P44/42, PIP2, Akt.)
Source: Sachs et al., Science (2005).

Evaluation I
(Figure: the inferred network, with direct-phosphorylation edges highlighted among Plc, Raf, Jnk, P38, Mek, PIP3, P44/42 (Erk), PIP2, and Akt.)
Source: Sachs et al., Science (2005).

Features of Approach
Direct phosphorylation (e.g., Mek -> Erk) is difficult to detect using other forms of high-throughput data:
- Protein-protein interaction data
- Microarrays
Source: Sachs et al., Science (2005).

Evaluation II
(Figure: the inferred network, now distinguishing indirect-signaling edges from direct-phosphorylation edges.)
Source: Sachs et al., Science (2005).

Features of Approach: Indirect Signaling
- Indirect connections can be found even when the intermediate molecule(s) are not measured, e.g., an influence on Jnk acting through unmeasured Mapkkk and Mapkk intermediates.
- Explaining away: unnecessary edges do NOT appear. With Raf -> Mek -> Erk in the network, no redundant direct Raf -> Erk edge is added.

Indirect Signaling: A Complex Example
The inferred Raf -> Mek edge: is this a mistake? The real picture involves Ras and two distinct phospho-forms of Raf (S497 and S259) feeding into Mek. The measurements are phospho-protein specific, and there is more than one pathway of influence.

Evaluation
(Figure: the inferred network compared against the expected pathway, with edges classified as direct phosphorylation or indirect signaling: 15/17 classic edges.)
Source: Sachs et al., Science (2005).

Evaluation (continued)
(Figure: edges classified against the expected pathway as reported, reversed, or missed.)
- 15/17 classic edges
- 17/17 reported edges
- 3 missed

Prediction
Erk1/2 was unperturbed in the data, yet the model makes two predictions along the Raf -> Mek -> Erk1/2 cascade:
- Erk1/2 influences Akt (an influence previously reported in colon cancer cell lines)
- While correlated with it, Erk1/2 does not influence PKA

Validation
siRNA knockdown of Erk1/Erk2; select the transfected cells; measure Akt and PKA in control (stimulated) vs. Erk1 siRNA (stimulated) conditions by flow cytometry (APC-A: p-Akt-647; PE-A: p-PKA-546). P-Akt changed significantly (P = 9.4e-5), while P-PKA did not.

Conclusions
Many limitations:
- Interventions are required
- Flow cytometry: 4-12 proteins, no time-series data
- Bayesian networks: no feedback loops
Advantages:
- In vivo measurement
- No a priori knowledge needed
Enablers of accurate inference:
- Network interventions
- Sufficient numbers of single cells

What We've Covered So Far
- ML basics (Bayesian networks, MLE, EM)
- Genetics (association studies, phasing, linkage analysis)
- Systems biology (gene regulation, gene interaction)
What next?

Sequence Analysis (5 lectures)
- Sequencing techniques
- Sequence alignment
- Comparative genomics


More information

Lecture 25: November 27

Lecture 25: November 27 10-725: Optimization Fall 2012 Lecture 25: November 27 Lecturer: Ryan Tibshirani Scribes: Matt Wytock, Supreeth Achar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have

More information

Abstract. Three Methods and Their Limitations. N-1 Experiments Suffice to Determine the Causal Relations Among N Variables

Abstract. Three Methods and Their Limitations. N-1 Experiments Suffice to Determine the Causal Relations Among N Variables N-1 Experiments Suffice to Determine the Causal Relations Among N Variables Frederick Eberhardt Clark Glymour 1 Richard Scheines Carnegie Mellon University Abstract By combining experimental interventions

More information

hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference

hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference CS 229 Project Report (TR# MSB2010) Submitted 12/10/2010 hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference Muhammad Shoaib Sehgal Computer Science

More information

Learning Bayesian Networks for Biomedical Data

Learning Bayesian Networks for Biomedical Data Learning Bayesian Networks for Biomedical Data Faming Liang (Texas A&M University ) Liang, F. and Zhang, J. (2009) Learning Bayesian Networks for Discrete Data. Computational Statistics and Data Analysis,

More information

Causal learning without DAGs

Causal learning without DAGs Journal of Machine Learning Research 1 (2000) 1-48 Submitted 4/00; Published 10/00 Causal learning without s David Duvenaud Daniel Eaton Kevin Murphy Mark Schmidt University of British Columbia Department

More information

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012 Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood

More information

Programming Assignment 4: Image Completion using Mixture of Bernoullis

Programming Assignment 4: Image Completion using Mixture of Bernoullis Programming Assignment 4: Image Completion using Mixture of Bernoullis Deadline: Tuesday, April 4, at 11:59pm TA: Renie Liao (csc321ta@cs.toronto.edu) Submission: You must submit two files through MarkUs

More information

Probabilistic modeling. The slides are closely adapted from Subhransu Maji s slides

Probabilistic modeling. The slides are closely adapted from Subhransu Maji s slides Probabilistic modeling The slides are closely adapted from Subhransu Maji s slides Overview So far the models and algorithms you have learned about are relatively disconnected Probabilistic modeling framework

More information

Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling

Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 009 Mark Craven craven@biostat.wisc.edu Sequence Motifs what is a sequence

More information

4: Parameter Estimation in Fully Observed BNs

4: Parameter Estimation in Fully Observed BNs 10-708: Probabilistic Graphical Models 10-708, Spring 2015 4: Parameter Estimation in Fully Observed BNs Lecturer: Eric P. Xing Scribes: Purvasha Charavarti, Natalie Klein, Dipan Pal 1 Learning Graphical

More information

CS 7180: Behavioral Modeling and Decision- making in AI

CS 7180: Behavioral Modeling and Decision- making in AI CS 7180: Behavioral Modeling and Decision- making in AI Learning Probabilistic Graphical Models Prof. Amy Sliva October 31, 2012 Hidden Markov model Stochastic system represented by three matrices N =

More information

Computational methods for predicting protein-protein interactions

Computational methods for predicting protein-protein interactions Computational methods for predicting protein-protein interactions Tomi Peltola T-61.6070 Special course in bioinformatics I 3.4.2008 Outline Biological background Protein-protein interactions Computational

More information

Logistic Regression. Jia-Bin Huang. Virginia Tech Spring 2019 ECE-5424G / CS-5824

Logistic Regression. Jia-Bin Huang. Virginia Tech Spring 2019 ECE-5424G / CS-5824 Logistic Regression Jia-Bin Huang ECE-5424G / CS-5824 Virginia Tech Spring 2019 Administrative Please start HW 1 early! Questions are welcome! Two principles for estimating parameters Maximum Likelihood

More information

CSC321 Lecture 18: Learning Probabilistic Models

CSC321 Lecture 18: Learning Probabilistic Models CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling

More information

GWAS IV: Bayesian linear (variance component) models

GWAS IV: Bayesian linear (variance component) models GWAS IV: Bayesian linear (variance component) models Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS IV: Bayesian

More information

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1)

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1) HW 1 due today Parameter Estimation Biometrics CSE 190 Lecture 7 Today s lecture was on the blackboard. These slides are an alternative presentation of the material. CSE190, Winter10 CSE190, Winter10 Chapter

More information

Qualifier: CS 6375 Machine Learning Spring 2015

Qualifier: CS 6375 Machine Learning Spring 2015 Qualifier: CS 6375 Machine Learning Spring 2015 The exam is closed book. You are allowed to use two double-sided cheat sheets and a calculator. If you run out of room for an answer, use an additional sheet

More information

ECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4

ECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4 ECE52 Tutorial Topic Review ECE52 Winter 206 Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides ECE52 Tutorial ECE52 Winter 206 Credits to Alireza / 4 Outline K-means, PCA 2 Bayesian

More information

Exact model averaging with naive Bayesian classifiers

Exact model averaging with naive Bayesian classifiers Exact model averaging with naive Bayesian classifiers Denver Dash ddash@sispittedu Decision Systems Laboratory, Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15213 USA Gregory F

More information

Generative Clustering, Topic Modeling, & Bayesian Inference

Generative Clustering, Topic Modeling, & Bayesian Inference Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 12-14, 2017 Prof. Michael Paul Unsupervised Naïve Bayes Last week

More information

Learning With Bayesian Networks. Markus Kalisch ETH Zürich

Learning With Bayesian Networks. Markus Kalisch ETH Zürich Learning With Bayesian Networks Markus Kalisch ETH Zürich Inference in BNs - Review P(Burglary JohnCalls=TRUE, MaryCalls=TRUE) Exact Inference: P(b j,m) = c Sum e Sum a P(b)P(e)P(a b,e)p(j a)p(m a) Deal

More information

Fast Regularization Paths via Coordinate Descent

Fast Regularization Paths via Coordinate Descent KDD August 2008 Trevor Hastie, Stanford Statistics 1 Fast Regularization Paths via Coordinate Descent Trevor Hastie Stanford University joint work with Jerry Friedman and Rob Tibshirani. KDD August 2008

More information

Generating executable models from signaling network connectivity and semi-quantitative proteomic measurements

Generating executable models from signaling network connectivity and semi-quantitative proteomic measurements Generating executable models from signaling network connectivity and semi-quantitative proteomic measurements Derek Ruths 1 Luay Nakhleh 2 1 School of Computer Science, McGill University, Quebec, Montreal

More information

Bayesian Methods: Naïve Bayes

Bayesian Methods: Naïve Bayes Bayesian Methods: aïve Bayes icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior

More information

CSE 473: Artificial Intelligence Autumn 2011

CSE 473: Artificial Intelligence Autumn 2011 CSE 473: Artificial Intelligence Autumn 2011 Bayesian Networks Luke Zettlemoyer Many slides over the course adapted from either Dan Klein, Stuart Russell or Andrew Moore 1 Outline Probabilistic models

More information

Introduction to Probabilistic Machine Learning

Introduction to Probabilistic Machine Learning Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning

More information

Learning Bayes Net Structures

Learning Bayes Net Structures Learning Bayes Net Structures KF, Chapter 15 15.5 (RN, Chapter 20) Some material taken from C Guesterin (CMU), K Murphy (UBC) 1 2 Learning Bayes Nets Known Structure Unknown Data Complete Missing Easy

More information

Bayesian Networks BY: MOHAMAD ALSABBAGH

Bayesian Networks BY: MOHAMAD ALSABBAGH Bayesian Networks BY: MOHAMAD ALSABBAGH Outlines Introduction Bayes Rule Bayesian Networks (BN) Representation Size of a Bayesian Network Inference via BN BN Learning Dynamic BN Introduction Conditional

More information

Announcements. CS 188: Artificial Intelligence Spring Probability recap. Outline. Bayes Nets: Big Picture. Graphical Model Notation

Announcements. CS 188: Artificial Intelligence Spring Probability recap. Outline. Bayes Nets: Big Picture. Graphical Model Notation CS 188: Artificial Intelligence Spring 2010 Lecture 15: Bayes Nets II Independence 3/9/2010 Pieter Abbeel UC Berkeley Many slides over the course adapted from Dan Klein, Stuart Russell, Andrew Moore Current

More information

Probabilistic Graphical Models for Image Analysis - Lecture 1

Probabilistic Graphical Models for Image Analysis - Lecture 1 Probabilistic Graphical Models for Image Analysis - Lecture 1 Alexey Gronskiy, Stefan Bauer 21 September 2018 Max Planck ETH Center for Learning Systems Overview 1. Motivation - Why Graphical Models 2.

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

CSE 473: Artificial Intelligence Autumn Topics

CSE 473: Artificial Intelligence Autumn Topics CSE 473: Artificial Intelligence Autumn 2014 Bayesian Networks Learning II Dan Weld Slides adapted from Jack Breese, Dan Klein, Daphne Koller, Stuart Russell, Andrew Moore & Luke Zettlemoyer 1 473 Topics

More information