types of codon models
1 part 3: analysis of natural selection pressure

omega models: types of codon models

if codons i and j differ at more than one position: Q_ij = 0
otherwise:
  Q_ij = π_j    for a synonymous transversion
  Q_ij = κπ_j   for a synonymous transition
  Q_ij = ωπ_j   for a non-synonymous transversion
  Q_ij = ωκπ_j  for a non-synonymous transition

Goldman and Yang (1994); Muse and Gaut (1994)
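The piecewise rule above can be sketched in code. This is a toy illustration over a small subset of the standard genetic code (the codon assignments are real, but a full implementation builds the 61x61 generator matrix and rescales it so branch lengths are in expected substitutions per codon):

```python
# Sketch of the omega-model rate rules over a toy subset of the genetic code
# (a handful of codons only; not the full 61x61 matrix).

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

CODE = {
    "TTT": "Phe", "TTC": "Phe",
    "TTA": "Leu", "TTG": "Leu", "CTT": "Leu",
    "GTT": "Val",
}

def is_transition(a, b):
    """A transition swaps purine<->purine or pyrimidine<->pyrimidine."""
    return ((a in PURINES and b in PURINES) or
            (a in PYRIMIDINES and b in PYRIMIDINES))

def q_ij(i, j, kappa, omega, pi_j):
    """Unnormalized rate from codon i to codon j under the piecewise rule."""
    diffs = [(a, b) for a, b in zip(i, j) if a != b]
    if len(diffs) != 1:
        return 0.0                        # codons differ at >1 position
    a, b = diffs[0]
    rate = kappa * pi_j if is_transition(a, b) else pi_j
    if CODE[i] != CODE[j]:                # non-synonymous: multiply by omega
        rate *= omega
    return rate

print(q_ij("TTT", "TTC", kappa=2.0, omega=0.5, pi_j=0.1))  # synonymous ts: kappa*pi_j
```

In a real codon model the diagonal of Q is set so rows sum to zero, and the whole matrix is normalized so that one time unit equals one expected substitution per codon.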
2 this codon model: M0 (same ω for all branches, same ω for all sites)

omega model rate matrix Q_ij as on slide 1 (Goldman and Yang 1994; Muse and Gaut 1994)

[figure: a four-taxon tree (branch lengths t1..t5) with a single ω on every branch, shown alongside a codon alignment]

two basic types of models:
- branch models (ω varies among branches)
- site models (ω varies among sites)
3 interpretation of a branch model

[figure: four-taxon tree with a distinct ω on one (foreground) branch]

episodic adaptive evolution of a novel function, with ω > 1 on the foreground branch

branch models*

variation in ω among branches:
- Yang, 1998: fixed effects
- Bielawski and Yang, 2003: fixed effects
- Seo et al., 2004: auto-correlated rates
- Kosakovsky Pond and Frost, 2005: genetic algorithm
- Dutheil et al., 2012: clustering algorithm

* these methods can be useful when selection pressure is strongly episodic
4 site models*

[figure: codon alignment with variable sites highlighted]

variation in ω among sites:
- Yang and Swanson, 2002: fixed effects (ML)
- Bao, Gu and Bielawski, 2006: fixed effects (ML)
- Massingham and Goldman, 2005: site-wise (LRT)
- Kosakovsky Pond and Frost, 2005: site-wise (LRT)
- Nielsen and Yang, 1998: mixture model (ML)
- Kosakovsky Pond, Frost and Muse, 2005: mixture model (ML)
- Huelsenbeck and Dyer, 2004; Huelsenbeck et al., 2006: mixture (Bayesian)
- Rubenstein et al., 2011: mixture model (ML)
- Bao, Gu, Dunn and Bielawski, 2008 & 2011: mixture (LiBaC/MBC)
- Murrell et al., 2013: mixture (Bayesian)

* useful when some sites evolve under diversifying selection pressure over long periods of time; this is not a comprehensive list

site models: discrete model (M3)

mixture-model likelihood:

P(x_h) = Σ_{i=0}^{K-1} p_i P(x_h | ω_i)

conditional likelihood calculation (see part 1)

[figure: discrete distribution over three site classes ω0, ω1, ω2, with ω2 = 2.0]
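The mixture-model likelihood can be illustrated with a quick numeric sketch; the per-class conditional likelihoods below are hypothetical numbers standing in for the pruning (conditional likelihood) calculation:

```python
# Numeric sketch of the M3 mixture likelihood at one codon site:
# P(x_h) = sum_i p_i * P(x_h | w_i).

p = [0.85, 0.10, 0.05]            # class proportions (hypothetical)
site_lik = [1e-4, 5e-4, 2e-3]     # P(x_h | w_i) for w0, w1, w2 (hypothetical)

total = sum(pi * li for pi, li in zip(p, site_lik))
print(total)   # about 2.35e-4
```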
5 interpretation of a sites-model

diversifying selection (frequency dependent) at 5% of sites, with ω2 = 2

[figure: discrete distribution over site classes ω0, ω1, and ω2 = 2.0, with the fraction of sites on the y-axis]

models for variation among branches & sites

[figure: four-taxon tree and codon alignment]

- branch models (ω varies among branches)
- site models (ω varies among sites)
- branch-site models (combine the features of the above models)
6 models for variation among branches & sites

variation in ω among branches & sites:
- Yang and Nielsen, 2002: fixed+mixture (ML)
- Forsberg and Christiansen, 2003: fixed+mixture (ML)
- Bielawski and Yang, 2004: fixed+mixture (ML)
- Guindon et al., 2004: switching (ML)
- Zhang et al., 2005: fixed+mixture (ML)
- Kosakovsky Pond et al., 2011, 2012: full mixture (ML)

* these methods can be useful when selection pressures change over time at just a fraction of sites
* it can be a challenge to apply these methods properly (more about this later)

branch-site Model B

mixture-model likelihood:

P(x_h) = Σ_{i=0}^{K-1} p_i P(x_h | ω_i)

Foreground branch only: ω0 = 0.1, ω1 = 0.9, ω2 = 5.55; ω for background branches comes from site classes 0 and 1 (0.1 or 0.9)
7 two scenarios can yield branch-sites with dN/dS > 1

[figure: site-class distributions for the foreground (FG) branch]

- a fraction of sites have shifting balance on a fixed peak (same function): ω0 = 0.1, ω1 = 0.9, ω_FG = 5.55
- episodic adaptive evolution at a fraction of sites, for a novel function

branch-site codon models cannot tell which scenario is correct without external information! Jones et al. (2016) MBE

omega models: model-based inference

if codons i and j differ at more than one position: Q_ij = 0
otherwise:
  Q_ij = π_j    for a synonymous transversion
  Q_ij = κπ_j   for a synonymous transition
  Q_ij = ωπ_j   for a non-synonymous transversion
  Q_ij = ωκπ_j  for a non-synonymous transition

Goldman and Yang (1994); Muse and Gaut (1994)
8 model-based inference: 3 analytical tasks

- task 1. parameter estimation (e.g., ω)
- task 2. hypothesis testing
- task 3. make predictions (e.g., sites having ω > 1)

task 1: parameter estimation

- t, κ, ω = unknown constants, estimated by ML
- π's = empirical [GY: F3x4 or F61 in lab]
- use a numerical hill-climbing algorithm to maximize the likelihood function
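"Numerical hill-climbing" can be shown in miniature. This sketch maximizes a one-parameter toy Poisson log-likelihood by golden-section search; it is not a real codon-model likelihood, and the counts are hypothetical:

```python
import math

# Maximize a toy one-parameter log-likelihood by bracketed hill-climbing.
counts = [3, 1, 4, 2, 0, 2]   # hypothetical per-branch change counts

def log_lik(rate):
    # Poisson log-likelihood, up to an additive constant
    return sum(c * math.log(rate) - rate for c in counts)

def golden_section_max(f, lo, hi, tol=1e-8):
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) < f(d):
            a = c        # maximum lies in [c, b]
        else:
            b = d        # maximum lies in [a, d]
    return (a + b) / 2

mle = golden_section_max(log_lik, 0.01, 10.0)
print(round(mle, 4))   # the analytic MLE is the sample mean, 2.0
```

Real implementations (e.g., codeml) optimize many parameters at once with quasi-Newton methods, but the idea is the same: climb the likelihood surface until no higher point can be found.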
9 task 1: parameter estimation

Parameters: t and ω. Gene: acetylcholine α receptor (mouse vs. human, via their common ancestor).

[figure: likelihood surface over t and ω, with the lnL value at the maximum ("sooner or later you'll get it")]

task 2: statistical significance

- task 1. parameter estimation (e.g., ω)
- task 2. hypothesis testing (LRT)
- task 3. prediction / site identification
10 task 2: likelihood ratio test for positive selection

H0: variable selective pressure but NO positive selection (M1a)
H1: variable selective pressure with positive selection (M2a)

Compare 2Δℓ = 2(ℓ1 - ℓ0) with a χ² distribution.

Model 1a (M1a): ω̂0 = 0.5 (ω1 = 1, fixed)
Model 2a (M2a): ω̂0 = 0.5 (ω1 = 1, fixed), ω̂2 = 3.25

task 2: likelihood ratio test for positive selection

H0: beta-distributed variable selective pressure (M7)
H1: beta plus positive selection (M8)

Compare 2Δℓ = 2(ℓ1 - ℓ0) with a χ² distribution.

[figure: M7, a beta distribution of the ω ratio across sites; M8, beta plus an additional class with ω ratio > 1]
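The LRT arithmetic is simple enough to sketch directly. The lnL values below are hypothetical; for the M1a-vs-M2a comparison there are 2 degrees of freedom (M2a adds p2 and ω2), and the χ² survival function with df = 2 has the closed form exp(-x/2):

```python
import math

# Sketch of the M1a-vs-M2a likelihood ratio test (hypothetical lnL values).
lnL0 = -1234.56   # null model (M1a)
lnL1 = -1228.01   # alternative model (M2a)

lrt = 2 * (lnL1 - lnL0)        # 2*delta(lnL)
p_value = math.exp(-lrt / 2)   # chi-square survival function, df = 2
print(round(lrt, 2), round(p_value, 4))
```

For other degrees of freedom you would use a general χ² survival function (e.g., scipy.stats.chi2.sf) rather than this df = 2 shortcut.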
11 task 3: identify the selected sites

- task 1. parameter estimation (e.g., ω)
- task 2. hypothesis testing
- task 3. prediction / site identification (Bayes rule)

task 3: which sites have dN/dS > 1?

model: 9% of sites have ω > 1; Bayes rule identifies sites 4, 2 & 3

[figure: codon alignment and protein structure; the identified sites are in contact in the structure]
12 review: the mixture likelihood (model M3)

P(x_h) = Σ_{i=0}^{K-1} p(ω_i) P(x_h | ω_i)

(total probability = sum over site classes of prior × likelihood)

- site class 0: ω0 = 0.03, p0 = 0.85 (85% of codon sites)
- site class 1: ω1 = 0.4, p1 = 0.10 (10% of codon sites)
- site class 2: ω2 = 4.0, p2 = 0.05 (5% of codon sites)

Bayes rule for identifying selected sites:

P(ω2 | x_h) = P(ω2) P(x_h | ω2) / Σ_{i=0}^{K-1} P(ω_i) P(x_h | ω_i)

posterior probability of the hypothesis (ω2) = prior probability of the hypothesis × likelihood of the hypothesis, divided by the marginal (total) probability of the data
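Bayes rule can be worked through numerically with the class proportions from this slide; the per-class site likelihoods P(x_h | ω_i) are hypothetical placeholders for the pruning calculation:

```python
# Posterior probability that a site belongs to class 2 (the w > 1 class).
priors = [0.85, 0.10, 0.05]      # class proportions from the M3 fit
site_lik = [1e-6, 2e-5, 4e-4]    # P(x_h | w_i), hypothetical

marginal = sum(p * l for p, l in zip(priors, site_lik))   # total probability
posterior_w2 = priors[2] * site_lik[2] / marginal
print(round(posterior_w2, 3))    # about 0.875
```

Note how a small prior (0.05) can still yield a high posterior when the class-2 likelihood dominates.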
13 task 3: Bayes rule for which sites have dN/dS > 1

[figure: posterior probabilities of the three site classes at each codon site along the gene]

- site class 0: ω0 = 0.03 (strong purifying selection)
- site class 1: ω1 = 0.4 (weak purifying selection)
- site class 2: ω2 = 4 (positive selection)

NOTE: The posterior probability should NOT be interpreted as a P-value; it can be interpreted as a measure of relative support, although there is rarely any attempt at calibration.

task 3: Bayes rule for which sites have dN/dS > 1

Bayes rule: empirical Bayes and bootstrap variants
- NEB, Naive Empirical Bayes (Nielsen and Yang, 1998): assumes no MLE errors
- BEB, Bayes Empirical Bayes (Yang et al., 2005): accommodates MLE errors for some model parameters via uniform priors
- SBA, Smoothed bootstrap aggregation (Mingrone et al., MBE 33): accommodates MLE errors via bootstrapping; ameliorates biases and MLE instabilities with kernel smoothing and aggregation
14 critical question: Have the requirements for maximum likelihood inference been met? (rarely addressed in real data analyses)

regularity conditions have been met:

[figure: distributions of p_{ω>1} and ω̂_{ω>1} under M2a, showing normal MLE uncertainty]

large sample size with regularity conditions: MLEs are approximately unbiased and minimum variance,

θ̂ ~ N(θ, I(θ̂)^{-1})
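The asymptotic-normality claim can be made concrete with a one-parameter toy model. For a Poisson model with n observations the Fisher information is n/θ, so an approximate 95% Wald interval is θ̂ ± 1.96·sqrt(θ̂/n); the counts here are hypothetical, and none of this is a codon-model calculation:

```python
import math

# Wald interval from theta_hat ~ N(theta, I(theta_hat)^-1), Poisson toy model.
counts = [3, 1, 4, 2, 0, 2]
n = len(counts)
mle = sum(counts) / n                 # Poisson MLE: the sample mean
se = math.sqrt(mle / n)               # sqrt of inverse Fisher information
ci = (mle - 1.96 * se, mle + 1.96 * se)
print(round(ci[0], 3), round(ci[1], 3))   # about 0.868 3.132
```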
15 regularity conditions have NOT been met:

[figure: distributions of p_{ω>1} and ω̂_{ω>1} under M2a, showing MLE instabilities]

bootstrapping can be used to diagnose this problem: Bielawski et al. (2016) Curr. Protoc. Bioinf. 56:6.5; Mingrone et al., MBE 33

MLE instabilities (M2a) arise from:
- small sample sizes and θ on a boundary
- continuous θ has been discretized (e.g., M2a)
- non-Gaussian, over-dispersed, divergence among datasets
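The bootstrap diagnostic can be sketched in miniature: resample sites with replacement, re-estimate a parameter that is constrained to a boundary (here a mean constrained to be non-negative, standing in for a boundary-constrained ω parameter), and look for a spike of bootstrap replicates piled on the boundary. All numbers are hypothetical:

```python
import random

# Nonparametric bootstrap of a boundary-constrained estimator.
random.seed(1)
site_scores = [-0.4, 0.1, -0.2, 0.3, -0.5, 0.2, -0.1, 0.0]  # hypothetical

def constrained_mle(data):
    return max(0.0, sum(data) / len(data))   # boundary at zero

boot = [constrained_mle(random.choices(site_scores, k=len(site_scores)))
        for _ in range(2000)]
at_boundary = sum(b == 0.0 for b in boot) / len(boot)
print(at_boundary)   # a large spike at the boundary signals MLE instability
```

When a substantial fraction of replicates sits exactly on the boundary, the usual normal approximation to the MLE's sampling distribution does not hold.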
part 3: analysis of natural selection pressure

markov models are good: phenomenological codon models do have many benefits:
o principled framework for statistical inference
o avoiding ad hoc corrections
QuickTime and a TIFF (Uncompressed) decompressor are needed to see this picture. Clay Carter Department of Biology QuickTime and a TIFF (LZW) decompressor are needed to see this picture. Ornamental tobacco
More informationIntroduction: MLE, MAP, Bayesian reasoning (28/8/13)
STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this
More informationCSC321 Lecture 18: Learning Probabilistic Models
CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling
More informationPattern Recognition. Parameter Estimation of Probability Density Functions
Pattern Recognition Parameter Estimation of Probability Density Functions Classification Problem (Review) The classification problem is to assign an arbitrary feature vector x F to one of c classes. The
More information1 Data Arrays and Decompositions
1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationSome of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!
Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis
More informationSTATS 200: Introduction to Statistical Inference. Lecture 29: Course review
STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout
More informationECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4
ECE52 Tutorial Topic Review ECE52 Winter 206 Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides ECE52 Tutorial ECE52 Winter 206 Credits to Alireza / 4 Outline K-means, PCA 2 Bayesian
More informationParameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn
Parameter estimation and forecasting Cristiano Porciani AIfA, Uni-Bonn Questions? C. Porciani Estimation & forecasting 2 Temperature fluctuations Variance at multipole l (angle ~180o/l) C. Porciani Estimation
More informationDetection theory 101 ELEC-E5410 Signal Processing for Communications
Detection theory 101 ELEC-E5410 Signal Processing for Communications Binary hypothesis testing Null hypothesis H 0 : e.g. noise only Alternative hypothesis H 1 : signal + noise p(x;h 0 ) γ p(x;h 1 ) Trade-off
More informationMathematical statistics
October 1 st, 2018 Lecture 11: Sufficient statistic Where are we? Week 1 Week 2 Week 4 Week 7 Week 10 Week 14 Probability reviews Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More information