codon substitution models and the analysis of natural selection pressure

Size: px
Start display at page:

Download "codon substitution models and the analysis of natural selection pressure"

Transcription

1 codon substitution models and the analysis of natural selection pressure Joseph P. Bielawski Department of Biology Department of Mathematics & Statistics Dalhousie University introduction morphological adaptation 1

2 protein structure introduction Troponin C: fast skeletal Troponin C: cardiac and slow skeletal gene sequences introduction human cow rabbit rat opossum GTG CTG TCT CCT GCC GAC AAG ACC AAC GTC AAG GCC GCC TGG GGC AAG GTT GGC GCG CAC G.C T...GC A.. A.. A.T.AA A.C AGC G.A.AT A.. AA. TG...G A...GC..G GA. C....G AT...G.GC GCT GGC GAG TAT GGT GCG GAG GCC CTG GAG AGG ATG TTC CTG TCC TTC CCC ACC ACC AAG.CT AG..G. G.. T.. GG..G..A. C.. GCT G...CC.CA.CC.CC ACC TAC TTC CCG CAC TTC GAC CTG AGC CAC GGC TCT GCC CAG GTT AAG GGC CAC GGC AAG..G G.. T.C.C..AG A.C.C. T.T A.T G.A.C..CT TC..C. A.C C.. 2

3 introduction Powerful analytical tools: 1. Population genetic data 2. Comparative analysis of codon sequences 3. Comparative analysis of amino acid sequences there is no single statistic which is best for testing the most general departures from neutrality! (Watterson 1977) overview introduction 1. introduction to modeling codon evolution 2. model based inference 3. PAML introduction & real data exercises 3

4 outline part I 1. introduction to the ω ratio 2. markov model of codon evolution 3. codon models for ω variation over branches & sites 4. model realism vs. model complexity an index of natural selection pressure 1. the ω ratio Kimura (1968) d S : number of synonymous substitutions per synonymous site (K S ) d N : number of nonsynonymous substitutions per nonsynonymous site (K A ) ω : the ratio d N /d S ; it measures selection at the protein level The genetic code determines how random changes to the gene brought about by the process of mutation will impact the function of the encoded protein. 4

5 index of natural selection pressure: ω ratio 1. the ω ratio rate ratio mode example ω < 1 purifying (negative) selection histones ω =1 Neutral Evolution pseudogenes ω > 1 Diversifying (positive) selection MHC, Lysin the basics 1. the ω ratio Why use d N and d S? (Why not use raw counts?) Example of counts: 300 codon gene from a pair of species 5 synonymous differences 5 nonsynonymous differences 5/5 = 1 Why don t we conclude that rates are equal (i.e., neutral evolution)? 5

6 the basics 1. the ω ratio Relative proportion of different types of mutations in hypothetical protein coding sequence. Expected number of changes (proportion) Type All 3 Positions 1 st positions 2 nd positions 3 rd positions Total mutations 549 (100) 183 (100) 183 (100) 183 (100) Synonymous 134 (25) 8 (4) 0 (0) 126 (69) Nonsyonymous 392 (71) 166 (91) 176 (96) 57 (27) nonsense 23 (4) 9 (5) 7 (4) 7 (4) Modified from Li and Graur (1991). Note that we assume a hypothetical model where all codons are used equally and that all types of point mutations are equally likely. the basics 1. the ω ratio Why use d N and d S? Same example, but using d N and d S : Synonymous sites = 25.5% S = % = Nonsynonymous sites = 74.5% N = % = So, d S = 5/229.5 = d N = 5/670.5 = d N /d S (ω) = 0.34, purifying selection!!! 6

7 mutational opportunity 1. the ω ratio Relative proportion of different types of mutations in hypothetical protein coding sequence. Expected number of changes (proportion) Type All 3 Positions 1 st positions 2 nd positions 3 rd positions Total mutations 549 (100) 183 (100) 183 (100) 183 (100) Synonymous 134 (25) 8 (4) 0 (0) 126 (69) Nonsyonymous 392 (71) 166 (91) 176 (96) 57 (27) nonsense 23 (4) 9 (5) 7 (4) 7 (4) Modified from Li and Graur (1991). Note that we assume a hypothetical model where all codons are used equally and that all types of point mutations are equally likely. Note: by framing the counting of sites in this way we are using a mutational opportunity definition of the sites Not everyone agrees that this is the best approach. For an alternative view see Bierne and Eyre-Walker 2003 Genetics 168: real data have biases (Drosophila GstD1 gene) 1. the ω ratio transitions vs. transversions: A G ts/tv = 2.71 C T preferred vs. un-preferred codons: Partial codon usage table for the GstD gene of Drosophila Phe F TTT 0 Ser S TCT 0 Tyr Y TAT 1 Cys C TGT 0 TTC 27 TCC 15 TAC 22 TGC 6 Leu L TTA 0 TCA 0 *** * TAA 0 *** * TGA 0 TTG 1 TCG 1 TAG 0 Trp W TGG Leu L CTT 2 Pro P CCT 1 His H CAT 0 Arg R CGT 1 CTC 2 CCC 15 CAC 4 CGC 7 CTA 0 CCA 3 Gln Q CAA 0 CGA 0 CTG 29 CCG 1 CAG 14 CGG

8 1. the ω ratio corrections and estimation bias in d S 4 3 low codon bias 4 3 med codon bias true simple model 2 d S d S 2 ts/tv + codon bias d S t high codon bias t d S t extreme codon bias t Data from: Dunn, Bielawski, and Yang (2001) Genetics, 157: x 1 x 2! x 3 x 4 t 1 t 2 t 3 j t 4 t 0 k Markov models of codon evolution A G C T 1. assumptions are explicit 2. corrections are not ad hoc 3. explicit use of a phylogeny improves power 4. principled framework for modelling and inference of the biology Goldman & Yang 1994 MBE 11: Muse & Gaut 1994 MBE 11:

9 GY-style codon models (mechanistic) some important parameters: o transition/transversion rate ratio: κ o biased codon usage: π j for codon j o nonsynonymous/synonymous rate ratio: ω = d N /d S GY-style codon models (mechanistic) let s model a change to CTG Synonymous CTC (Leu) CTG (Leu): π CTG TTC (Leu) CTG (Leu): κπ CTG Nonsynonymous GTG (Val) CTG (Leu): ωπ CTG CCG (Pro) CTG (Leu): κωπ CTG 9

10 GY-style codon models (mechanistic) From codon below: TTT (Phe) TTC (Phe) TTA (Leu) to codon below: TTG (Leu) CTT (Leu) CTC (Leu) TTT (Phe) κπ TTC ωπ TTA ωπ TTG ωκπ TTT 0 0 TTC (Phe) κπ TTT ωπ TTA ωπ TTG 0 ωκπ CTC 0 TTA (Leu) ωπ TTT ωπ TTC TTG (Leu) ωπ TTT ωπ TTC κπ TTA CTT (Leu) ωκπ TTT κπ CTC 0 CTC (Leu) 0 ωκπ TTC 0 0 κπ TTT 0 GGG (Gly) GGG (Gly) * This is equivalent to the codon model of Goldman and Yang (1994). Parameter ω is the ratio d N /d S, κ is the transition/transversion rate ratio, and π i is the equilibrium frequency of the target codon (i). P(t) = {p ij (t)} = e Qt GY-style codon models (mechanistic) (Goldman & Yang 1994 MBE 11: ) q ij 0 π j, = κπ j, ωπ j, ωκπ j, if i and j differ at 2 or 3 positions for syn.transversion for syn.transition for nonsyn. transversion for nonsyn. transition P(t) = {p ij (t)} = e Qt 10

11 likelihood of the data at a site (only two codons) CCC CCT t 0 t 1 k ( t ) p ( ) Lh ( CCC, CCT ) = π k pkccc 0 kcct t1 k Note: analysis is typically done by using an unrooted tree likelihood of the data at all sites The likelihood of observing the entire sequence alignment is the product of the probabilities at each site. N L = L 1 L 2 L 3 L N = h= 1 L h The log likelihood is a sum over all sites. l = ln{l} = ln{l 1 } + ln{l 2 } + ln{l 3 } + + ln{l N } = h= N 1 ln{ L h } 11

12 we made some progress the good: we now have a framework for o avoiding ad hoc corrections of counting methods o computation of transition probabilities * o principled framework for statistical inference a new issue: averaging ω over a pair of sequences has very low power to detect positive selection if the question is about when or where ω > 1! * Computation of transition probabilities accomplishes, in just one step, (1) a proper correction for multiple substitutions, (2) weighting for alternative pathways between codons and (3) is the basis for estimating the values of the model parameters from the data in hand. selection pressure (ω) varies in time mya mya 35 mya mya ε γ G γ A δ β Chrom. 11 β globin gene cluster Our question: When? 12

13 selection pressure (ω) varies in time mya mya mya 35 mya Chrom. 11 ε γ G γ A δ β β globin gene cluster b.l. (my) Fraction of t ω (Tot = 565my) (Tot = 1) Grey branches: ω = 0.2 Black branches: ω = 0.5 Blue branches: ω = 1.2 If we average over the tree, we do NOT detect positive selection; ω = selection pressure (ω) varies among sites Our question: Where? 13

14 selection pressure (ω) varies among sites If we average over sites, we do NOT detect positive selection; ω = 0.31 ATG. TAA Purifying: d N /d S = 0.01 Neutral: d N /d S = 1 Adaptive: d N /d S = 2 likelihood of a phylogeny problem: averaging ω over a pair has very low power if the questions are about when or where! solution: phylogenetic estimation of selection pressure x 1 x 2! x 3 x 4 t 1 t 2 j t 3 t 4 t 0 k 14

15 likelihood of a phylogeny x 1 t 1 t 2 t 0 t x 3 3 t 4 x 2 x 4 x 1 x 2! x 3 x 4 t 1 t 2 j t 3 t 4 Sum over all possible codons at ancestral nodes {TTT, TTC,, GGG} t 0 k L ) ( xh) k kx4 4 kx3 3 kj 0 jx2 2 jx1 1 k j = [ π p ( t ) p ( t ) p ( t ) p ( t ) p ( t ] improvements. 3. ω variation x 1 x 2 x 3 x 4 t 1 :ω 1 t 2 :ω 1 t 3 :ω 0 t 4 :ω 0 j t 0 :ω 0 k ω 1 ω 0 ω 1 ω 0 ω 1 GTG CTG TCT CCT GCC GAC AAG ACC AAC GTC AAG GCC GCC TGG GGC AAG GTT G.C T.. A.. A.T.AA A.C G.A.AT A.. AA. TG...G A....G GA. C....G AT...G 3.1. branch models (ω varies among branches) 3.2. site models (ω varies among sites) 15

16 3. ω variation 3.1. branch models * x 1 x 2 x 3 x 4 t 1 :ω 1 t 2 :ω 1 j t 3 :ω 0 t 4 :ω 0 t 0 :ω 0 k Variation (ω) among branches: Yang, 1998 Bielawski and Yang, 2003 Seo et al Kosakovsky Pond and Frost, 2005 Dutheil et al Approach fixed effects fixed effects auto-correlated rates genetic algorithm clustering algorithm * These methods can be useful when selection pressure is strongly episodic 3.1. branch models 3. ω variation x 1 x 2 x 3 x 4 t 1 :ω 1 t 2 :ω 1 j t 3 :ω 0 t 4 :ω 0 t 0 :ω 0 o species colonization of a new niche k o altered context for gene expression o gene duplication event(s) o lateral gene transfers (LGTs) o cross-species virus transmission & host switching o organismal adaptive radiations 16

17 3. ω variation 3.2. site models * GTG CTG TCT CCT GCC GAC AAG ACC AAC GTC AAG GCC GCC TGG GGC AAG GTT GGC GCG CAC G.C T...GC A.. A.. A.T.AA A.C AGC G.A.AT A.. AA. TG...G A...GC..G GA. C....G AT...G.GC Variation (ω) among sites: Yang and Swanson, 2002 Bao, Gu and Bielawski, 2006 Massingham and Goldman, 2005 Kosakovsky Pond and Frost, 2005 Nielsen and Yang, 1998 Kosakovsky Pond, Frost and Muse, 2005 Huelsenbeck and Dyer, 2004; Huelsenbeck et al Rubenstein et al Bao, Gu, Dunn and Bielawski 2008 & 2011 Murell et al Approach fixed effects (ML) fixed effects (ML) site wise (LRT) site wise (LRT) mixture model (ML) mixture model (ML) mixture (Bayesian) mixture model (ML) mixture (LiBaC/MBC) mixture (Bayesian) Useful when at some sites evolve under diversifying selection pressure over long periods of time This is not a comprehensive list site models: M-series 3. ω variation Model Code NP Parameters One-ratio M0 1 ω Neutral M1a 2 P 0, ω 0, Selection M2a 4 p 0, p 1, ω 0, ω 2 Discrete M3 2K-1 p 0, p 1,, p K-2 ω 0, ω 1,, ω K-2 Frequency M4 5 p 0, p 1,, p 4 Gamma M5 2 α, β 2Gamma M6 4 p 0, α 0, β 0, α 1 Beta M7 2 p, q Beta&ω M8 4 p 0, p, q, ω Beta&gamma M9 5 p 0, p, q, α, β Beta&normal+1 M10 5 p 0, p, q, α, β Beta&normal>1 M11 5 p 0, p, q, μ, α 0&2normal>1 M12 5 p 0, p 1, μ 2, α 1, α 2 3normal>0 M13 6 p 0, p 1, μ 2, α 0, α 1, α 2 17

18 discrete model K 1 human cow rabbit rat opossum 3. ω variation GTG CTG TCT..G CCT G.C G.A GA. GCC.AT GAC AAG ACC T.. AAC C.. GTC A.. A....G AAG GCC A.T AA. GCC TG. AT. TGG GGC.AA..G AAG GTT A.C A....G GGC GCG.GC AGC.GC.GC CAC A.. GCT.G..G. GGC GAG.CT.CC TAT GGT.CA GCG.A. GAG GCC GAG C...CC AGG ATG.CC TTC CTG T.. GCT TCC AG. GG. G.. TTC CCC ACC ACC AAG ACC TAC TTC CCG T.T CAC TTC A.T GAC CTG T.C G.A CTG G.. 1 AGC C TC CAC.C..C. GGC TCT..G GCC.AG CAG GTT A.C A.C AAG C.. GGC.C..CT CAC GGC AAG G.. P(x h ) = pi P(x h ωi ) i =0 ω = 2.0 ω = 0.01 ω = site models globin model directional selection human cow rabbit rat opossum 3. ω variation GTG CTG TCT..G CCT G.C G.A GA. GCC.AT GAC AAG ACC T.. AAC C.. GTC A.. A....G AAG GCC A.T AA. GCC TG. AT. TGG GGC.AA..G AAG GTT A.C A....G GGC GCG.GC AGC.GC.GC CAC A.. GCT.G..G. GGC GAG.CT.CC TAT GGT.CA GCG.A. GAG GCC CTG G.. GAG C...CC AGG ATG.CC TTC CTG T.. GCT TCC AG. GG. G.. TTC CCC ACC ACC AAG ACC TAC TTC CCG T.T CAC TTC A.T GAC CTG T.C G.A AGC.C. TC..C..C...G.AG A.C A.C C...C..CT G.. CAC GGC TCT GCC CAG GTT AAG GGC CAC GGC AAG MHC model diversifying site models selection Site models detect diversifying positive selection; absence of signal under a sites model does not mean changes in a protein did not have fitness consequences (Bielawski et al PNAS 101: ) 18

19 3. ω variation 3.2. site models GTG CTG TCT CCT GCC GAC AAG ACC AAC GTC AAG GCC GCC TGG GGC AAG GTT GGC GCG CAC G.C T...GC A.. A.. A.T.AA A.C AGC G.A.AT A.. AA. TG...G A...GC..G GA. C....G AT...G.GC o genetic incompatibilities in human infertility o non-hormonal contraception drugs o identify pathogenicity genes o venom-anti-venom co-evolution o identify candidate genes for drug therapies o identify immune and defense system genes o vaccine design o aid functional classification of unknown genes o incorporate in models of protein 2D and 3D structure we made some more progress. 3. ω variation x 1 x 2 x 3 x 4 t 1 :ω 1 t 2 :ω 1 t 3 :ω 0 t 4 :ω 0 j t 0 :ω 0 k ω 1 ω 0 ω 1 ω 0 ω 1 GTG CTG TCT CCT GCC GAC AAG ACC AAC GTC AAG GCC GCC TGG GGC AAG GTT G.C T.. A.. A.T.AA A.C G.A.AT A.. AA. TG...G A....G GA. C....G AT...G 3.1. branch models (ω varies among branches) 3.2. site models (ω varies among sites) 3.3. branch-site models (combines the features of above models) 19

20 3.3. branch-site model: model B 3. ω variation P( x h ) = K 1 i= 0 p P( x i h ω ) i Foreground branch only ω ω ω = 0.01 = 0.90 = 5.55 ω for background branches are from site-classes 1 and 2 (0.01 or 0.90) 3.3. models for variation in branches & sites 3. ω variation Variation (ω) among branches & sites: Yang and Nielsen, 2002 Forsberg and Christiansen, 2003 Bielawski and Yang, 2004 Giundon et al., 2004 Zhang et al Kosakovsky Pond et al. 2011, 2012 Approach fixed+mixture (ML) fixed+mixture (ML) fixed+mixture (ML) switching (ML) fixed+mixture (ML) full mixture (ML) * These methods can be useful when selection pressures change over time at just a fraction of sites * It can be a challenge to apply these methods properly (more abut this later). 20

21 a MODEL is an intentional simplification of a complex situation designed to eliminate extraneous detail in order to focus attention on the essentials of the situation! (Daniel L. Hartl) (images from Paul Lewis) there is a kind of tension between model realism and model complexity UNDER FIT OVER FIT bias total error total error variance error (low) model complexity (high) 21

22 AA exchangeabilties 4. realism vs. complexity intentional simplification: all of the models listed above treat these two amino acid substitutions in the same way! 1. ATA (Ile) è TTA (Leu): ωπ TTA (conservative) 2. ATA (Ile) è AAA (Lys): ωπ AAA (radical) Q matrix for M0 codon models 4. realism vs. complexity From codon below: TTT (Phe) TTC (Phe) TTA (Leu) to codon below: TTG (Leu) CTT (Leu) CTC (Leu) TTT (Phe) κπ TTC ωπ TTA ωπ TTG ωκπ TTT 0 0 TTC (Phe) κπ TTT ωπ TTA ωπ TTG 0 ωκπ CTC 0 TTA (Leu) ωπ TTT ωπ TTC TTG (Leu) ωπ TTT ωπ TTC κπ TTA CTT (Leu) ωκπ TTT κπ CTC 0 CTC (Leu) 0 ωκπ TTC 0 0 κπ TTT 0 GGG (Gly) GGG (Gly) * This is equivalent to the codon model of Goldman and Yang (1994). Parameter ω is the ratio d N /d S, κ is the transition/transversion rate ratio, and π i is the equilibrium frequency of the target codon (i). 22

23 COLD R matrix (log scale) 4. realism vs. complexity structure of rate matrix: GY-M0 M0 R matrix (log scale) 2 parameters (κ and ω) too sparse? Structure of a Q matrix (log scaled and binned) derived from M0 for Abolone sperm lysin isons of estimated R-matrices the R matrix on logarithm derived by scale: our selected model using F R-matrices by COLD and estimated R matrix by fitting method and the R matrix derived from M0 for structure of rate matrix: ECM ECM R matrix (log scale) 1830 parameters 4. realism vs. complexity COLD R matrix (log scale) way too al framework for codon models of phylogeny, which many! e the currently used models and provides the flexatrix to be decided by a number ECM (Kosiol et al. of 2007) related for all sequences in the factors. Pandit database Structure of a Q matrix (log scaled and binned) derived y to construct the model matrix to include some of Figure 8: Pattern comparisons of estimated R-matr tors to model the rateecm matrix. R matrix, The factors estimated that will R-matrices by COLD and M0 for Lysin data. 23 re data dependent; this is achieved through model

24 codon rate matrix 4. realism vs. complexity UNDER FIT? OVER FIT? bias total error total error variance error the R matrix derived by our selected model using F 61 as codon frequency fitting method and the R matrix derived from M0 for Lysin data in Figure 8. COLD R matrix (log scale) M0 R matrix (log scale) ECM R matrix (log scale) COLD R matrix (log scale) M0 R matrix (log scale) (low) model complexity (high) Figure 8: Pattern comparisons of estimated R-matrices on logarithm scale: ECM R matrix, estimated R-matrices by COLD and estimated R matrix by M0 for Lysin data. Figure 8: Pattern comparisons of estimated R-matrices on logarithm scale: ECM R matrix, estimated R-matrices by COLD and estimated R matrix by M0 for Lysin data. 6 Conclusion 6 Conclusion We have presented a general framework for codon models of phylogeny, which We have presented a general framework for codon models of phylogeny, which is general enough to include the currently used models and provides the flexibility to allow the rate matrix to be decided by a number of related factors. 4. realism vs. iscomplexity general enough to include the currently used models and provides the flexibility asderived codon to allow the by frequency rateour matrixselected to be decidedmodel by a number using of related Ffactors. 61 as trix derived by our selected model trying the using Rto matrix Ffind 61 as a derived balance codon by frequency our selected model the using R matrix F 61 We further proposed a way to construct the model matrix to include some of We further proposed a way to construct the model matrix to include some of thod the possibly andimportant the Rfactors matrix to model derived the rate matrix. from fitting TheM0 factorsfor method thatlysin will and datathe in RFigure matrix 8. derived fromfitting M0 for method Lysinand datathe inrfigure matrix 8. derived from M0 for Lysin the possibly important factors to model the rate matrix. The factors that will be included in the model are data dependent; this is achieved through model be included in the model are data dependent; this is achieved through model selection. Our model typically results in a richer pattern for the R matrix R matrix (log scale) COLD R matrix (log scale) ECM R M0 matrix R matrix (log (log scale) scale) COLD R matrix (log scale) ECM R M0 matrix R matrix (log (log scale) scale) COLD R matrix (log scale) M0 R m selection. Our model typically results in a richer pattern for the R matrix than the currently available codon models. The proposed model framework than the currently available codon models. The proposed model framework allows us to fit more realistic models to the data but avoids the unnecessarily allows us to fit more realistic models to the data but avoids the unnecessarily complicated models by applying model selection. complicated models by applying model selection parameters : 2 (fitted) 23 (fitted) 1830 (not fitted) lnl (Lysin) : Pattern comparisons of estimatedfigure R-matrices 8: Pattern on logarithm comparisons scale: of estimatedfigure R-matrices 8: Pattern on logarithm comparisons scale: of estimated R-matrices on atrix, estimated R-matrices by COLD ECMand R matrix, estimated estimated R matrix R-matrices by by COLD ECMand R matrix, estimated estimated R matrix R-matrices by by COLD and estim sin data. M0 forquestion: Lysin data. does this impact inference of M0 positive for Lysin selection? data. preliminary answer: sometimes (but, oftentimes no) nclusion 6 Conclusion 6 Conclusion resented a general framework for codon We have models presented of phylogeny, a general which framework for codon We have models presented of phylogeny, a general which framework for codon models of enough to include the currently used is general models enough and provides to include the flexallow the rate matrix to be decided the currently used is general modelsenough and provides to include the flexibility the currently used models and 24 by a to number allow of the related rate matrix factors. to be decidedibility by a to number allow of therelated rate matrix factors. be decided by a number r proposed a way to construct the model matrix to include some of We further proposed a way to construct thewe model further matrix proposed to include a way some to construct of the model matrix t

25 modeling amino acid exchangeabilities 4. realism vs. complexity multiple amino acid exhangeabilities: model type Yang and Goldman 1994; Yang et al. 1998; PCP Sanudiin et al. 2005; Wong et al PCP Conant and Stadler (2009) PCP Kosiol et al. 2007; De Maio et al ECM è MEP * Doron-Faigenboin & Pupko 2007 MEP * Miyazawa (2011) MEP (LCEP) * Zoller and Schneider (2012) MEP (LCEP) * Delport et al GPP (GA) Zaheri, Dib and Salamin (2014) GPP * Bielawski et al. (in prep.) GPP * PCP: physiochemcially constrained parametric model MEP: mixed empirical and parametric model ECM: empirical codon model GPP: general purpose parametric model * models permitting double and triple changes among codons lots of options codon models o methods have different pros and cons (that s OK) o researchers have different preferences (that s OK) o I use a wide variety of different methods; you might have different preferences (that s OK) o most important: KNOW YOUR DATA and MAKE INFORMED DECISIONS. no single method will be suitable for all situations 25

types of codon models

types of codon models part 3: analysis of natural selection pressure omega models! types of codon models if i and j differ by > π j for synonymous tv. Q ij = κπ j for synonymous ts. ωπ j for non-synonymous tv. ωκπ j for non-synonymous

More information

part 3: analysis of natural selection pressure

part 3: analysis of natural selection pressure part 3: analysis of natural selection pressure markov models are good phenomenological codon models do have many benefits: o principled framework for statistical inference o avoiding ad hoc corrections

More information

Practical Bioinformatics

Practical Bioinformatics 5/2/2017 Dictionaries d i c t i o n a r y = { A : T, T : A, G : C, C : G } d i c t i o n a r y [ G ] d i c t i o n a r y [ N ] = N d i c t i o n a r y. h a s k e y ( C ) Dictionaries g e n e t i c C o

More information

SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA

SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS 1 Prokaryotes and Eukaryotes 2 DNA and RNA 3 4 Double helix structure Codons Codons are triplets of bases from the RNA sequence. Each triplet defines an amino-acid.

More information

part 4: phenomenological load and biological inference. phenomenological load review types of models. Gαβ = 8π Tαβ. Newton.

part 4: phenomenological load and biological inference. phenomenological load review types of models. Gαβ = 8π Tαβ. Newton. 2017-07-29 part 4: and biological inference review types of models phenomenological Newton F= Gm1m2 r2 mechanistic Einstein Gαβ = 8π Tαβ 1 molecular evolution is process and pattern process pattern MutSel

More information

Codon-model based inference of selection pressure. (a very brief review prior to the PAML lab)

Codon-model based inference of selection pressure. (a very brief review prior to the PAML lab) Codon-model based inference of selection pressure (a very brief review prior to the PAML lab) an index of selection pressure rate ratio mode example dn/ds < 1 purifying (negative) selection histones dn/ds

More information

Advanced topics in bioinformatics

Advanced topics in bioinformatics Feinberg Graduate School of the Weizmann Institute of Science Advanced topics in bioinformatics Shmuel Pietrokovski & Eitan Rubin Spring 2003 Course WWW site: http://bioinformatics.weizmann.ac.il/courses/atib

More information

SUPPORTING INFORMATION FOR. SEquence-Enabled Reassembly of β-lactamase (SEER-LAC): a Sensitive Method for the Detection of Double-Stranded DNA

SUPPORTING INFORMATION FOR. SEquence-Enabled Reassembly of β-lactamase (SEER-LAC): a Sensitive Method for the Detection of Double-Stranded DNA SUPPORTING INFORMATION FOR SEquence-Enabled Reassembly of β-lactamase (SEER-LAC): a Sensitive Method for the Detection of Double-Stranded DNA Aik T. Ooi, Cliff I. Stains, Indraneel Ghosh *, David J. Segal

More information

High throughput near infrared screening discovers DNA-templated silver clusters with peak fluorescence beyond 950 nm

High throughput near infrared screening discovers DNA-templated silver clusters with peak fluorescence beyond 950 nm Electronic Supplementary Material (ESI) for Nanoscale. This journal is The Royal Society of Chemistry 2018 High throughput near infrared screening discovers DNA-templated silver clusters with peak fluorescence

More information

Supplemental data. Pommerrenig et al. (2011). Plant Cell /tpc

Supplemental data. Pommerrenig et al. (2011). Plant Cell /tpc Supplemental Figure 1. Prediction of phloem-specific MTK1 expression in Arabidopsis shoots and roots. The images and the corresponding numbers showing absolute (A) or relative expression levels (B) of

More information

Why do more divergent sequences produce smaller nonsynonymous/synonymous

Why do more divergent sequences produce smaller nonsynonymous/synonymous Genetics: Early Online, published on June 21, 2013 as 10.1534/genetics.113.152025 Why do more divergent sequences produce smaller nonsynonymous/synonymous rate ratios in pairwise sequence comparisons?

More information

Supplementary Information for

Supplementary Information for Supplementary Information for Evolutionary conservation of codon optimality reveals hidden signatures of co-translational folding Sebastian Pechmann & Judith Frydman Department of Biology and BioX, Stanford

More information

Crick s early Hypothesis Revisited

Crick s early Hypothesis Revisited Crick s early Hypothesis Revisited Or The Existence of a Universal Coding Frame Ryan Rossi, Jean-Louis Lassez and Axel Bernal UPenn Center for Bioinformatics BIOINFORMATICS The application of computer

More information

Clay Carter. Department of Biology. QuickTime and a TIFF (Uncompressed) decompressor are needed to see this picture.

Clay Carter. Department of Biology. QuickTime and a TIFF (Uncompressed) decompressor are needed to see this picture. QuickTime and a TIFF (Uncompressed) decompressor are needed to see this picture. Clay Carter Department of Biology QuickTime and a TIFF (LZW) decompressor are needed to see this picture. Ornamental tobacco

More information

Number-controlled spatial arrangement of gold nanoparticles with

Number-controlled spatial arrangement of gold nanoparticles with Electronic Supplementary Material (ESI) for RSC Advances. This journal is The Royal Society of Chemistry 2016 Number-controlled spatial arrangement of gold nanoparticles with DNA dendrimers Ping Chen,*

More information

3. Evolution makes sense of homologies. 3. Evolution makes sense of homologies. 3. Evolution makes sense of homologies

3. Evolution makes sense of homologies. 3. Evolution makes sense of homologies. 3. Evolution makes sense of homologies Richard Owen (1848) introduced the term Homology to refer to structural similarities among organisms. To Owen, these similarities indicated that organisms were created following a common plan or archetype.

More information

SSR ( ) Vol. 48 No ( Microsatellite marker) ( Simple sequence repeat,ssr),

SSR ( ) Vol. 48 No ( Microsatellite marker) ( Simple sequence repeat,ssr), 48 3 () Vol. 48 No. 3 2009 5 Journal of Xiamen University (Nat ural Science) May 2009 SSR,,,, 3 (, 361005) : SSR. 21 516,410. 60 %96. 7 %. (),(Between2groups linkage method),.,, 11 (),. 12,. (, ), : 0.

More information

Characterization of Pathogenic Genes through Condensed Matrix Method, Case Study through Bacterial Zeta Toxin

Characterization of Pathogenic Genes through Condensed Matrix Method, Case Study through Bacterial Zeta Toxin International Journal of Genetic Engineering and Biotechnology. ISSN 0974-3073 Volume 2, Number 1 (2011), pp. 109-114 International Research Publication House http://www.irphouse.com Characterization of

More information

Sequence Divergence & The Molecular Clock. Sequence Divergence

Sequence Divergence & The Molecular Clock. Sequence Divergence Sequence Divergence & The Molecular Clock Sequence Divergence v simple genetic distance, d = the proportion of sites that differ between two aligned, homologous sequences v given a constant mutation/substitution

More information

Modelling and Analysis in Bioinformatics. Lecture 1: Genomic k-mer Statistics

Modelling and Analysis in Bioinformatics. Lecture 1: Genomic k-mer Statistics 582746 Modelling and Analysis in Bioinformatics Lecture 1: Genomic k-mer Statistics Juha Kärkkäinen 06.09.2016 Outline Course introduction Genomic k-mers 1-Mers 2-Mers 3-Mers k-mers for Larger k Outline

More information

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1 Supplementary Figure 1 Zn 2+ -binding sites in USP18. (a) The two molecules of USP18 present in the asymmetric unit are shown. Chain A is shown in blue, chain B in green. Bound Zn 2+ ions are shown as

More information

Electronic supplementary material

Electronic supplementary material Applied Microbiology and Biotechnology Electronic supplementary material A family of AA9 lytic polysaccharide monooxygenases in Aspergillus nidulans is differentially regulated by multiple substrates and

More information

NSCI Basic Properties of Life and The Biochemistry of Life on Earth

NSCI Basic Properties of Life and The Biochemistry of Life on Earth NSCI 314 LIFE IN THE COSMOS 4 Basic Properties of Life and The Biochemistry of Life on Earth Dr. Karen Kolehmainen Department of Physics CSUSB http://physics.csusb.edu/~karen/ WHAT IS LIFE? HARD TO DEFINE,

More information

Protein Threading. Combinatorial optimization approach. Stefan Balev.

Protein Threading. Combinatorial optimization approach. Stefan Balev. Protein Threading Combinatorial optimization approach Stefan Balev Stefan.Balev@univ-lehavre.fr Laboratoire d informatique du Havre Université du Havre Stefan Balev Cours DEA 30/01/2004 p.1/42 Outline

More information

SUPPLEMENTARY DATA - 1 -

SUPPLEMENTARY DATA - 1 - - 1 - SUPPLEMENTARY DATA Construction of B. subtilis rnpb complementation plasmids For complementation, the B. subtilis rnpb wild-type gene (rnpbwt) under control of its native rnpb promoter and terminator

More information

Table S1. Primers and PCR conditions used in this paper Primers Sequence (5 3 ) Thermal conditions Reference Rhizobacteria 27F 1492R

Table S1. Primers and PCR conditions used in this paper Primers Sequence (5 3 ) Thermal conditions Reference Rhizobacteria 27F 1492R Table S1. Primers and PCR conditions used in this paper Primers Sequence (5 3 ) Thermal conditions Reference Rhizobacteria 27F 1492R AAC MGG ATT AGA TAC CCK G GGY TAC CTT GTT ACG ACT T Detection of Candidatus

More information

Introduction to Molecular Phylogeny

Introduction to Molecular Phylogeny Introduction to Molecular Phylogeny Starting point: a set of homologous, aligned DNA or protein sequences Result of the process: a tree describing evolutionary relationships between studied sequences =

More information

Regulatory Sequence Analysis. Sequence models (Bernoulli and Markov models)

Regulatory Sequence Analysis. Sequence models (Bernoulli and Markov models) Regulatory Sequence Analysis Sequence models (Bernoulli and Markov models) 1 Why do we need random models? Any pattern discovery relies on an underlying model to estimate the random expectation. This model

More information

The Trigram and other Fundamental Philosophies

The Trigram and other Fundamental Philosophies The Trigram and other Fundamental Philosophies by Weimin Kwauk July 2012 The following offers a minimal introduction to the trigram and other Chinese fundamental philosophies. A trigram consists of three

More information

Supporting Information for. Initial Biochemical and Functional Evaluation of Murine Calprotectin Reveals Ca(II)-

Supporting Information for. Initial Biochemical and Functional Evaluation of Murine Calprotectin Reveals Ca(II)- Supporting Information for Initial Biochemical and Functional Evaluation of Murine Calprotectin Reveals Ca(II)- Dependence and Its Ability to Chelate Multiple Nutrient Transition Metal Ions Rose C. Hadley,

More information

Building a Multifunctional Aptamer-Based DNA Nanoassembly for Targeted Cancer Therapy

Building a Multifunctional Aptamer-Based DNA Nanoassembly for Targeted Cancer Therapy Supporting Information Building a Multifunctional Aptamer-Based DNA Nanoassembly for Targeted Cancer Therapy Cuichen Wu,, Da Han,, Tao Chen,, Lu Peng, Guizhi Zhu,, Mingxu You,, Liping Qiu,, Kwame Sefah,

More information

Supplemental Figure 1.

Supplemental Figure 1. A wt spoiiiaδ spoiiiahδ bofaδ B C D E spoiiiaδ, bofaδ Supplemental Figure 1. GFP-SpoIVFA is more mislocalized in the absence of both BofA and SpoIIIAH. Sporulation was induced by resuspension in wild-type

More information

Supporting Information

Supporting Information Supporting Information T. Pellegrino 1,2,3,#, R. A. Sperling 1,#, A. P. Alivisatos 2, W. J. Parak 1,2,* 1 Center for Nanoscience, Ludwig Maximilians Universität München, München, Germany 2 Department of

More information

Evolvable Neural Networks for Time Series Prediction with Adaptive Learning Interval

Evolvable Neural Networks for Time Series Prediction with Adaptive Learning Interval Evolvable Neural Networs for Time Series Prediction with Adaptive Learning Interval Dong-Woo Lee *, Seong G. Kong *, and Kwee-Bo Sim ** *Department of Electrical and Computer Engineering, The University

More information

TM1 TM2 TM3 TM4 TM5 TM6 TM bp

TM1 TM2 TM3 TM4 TM5 TM6 TM bp a 467 bp 1 482 2 93 3 321 4 7 281 6 21 7 66 8 176 19 12 13 212 113 16 8 b ATG TCA GGA CAT GTA ATG GAG GAA TGT GTA GTT CAC GGT ACG TTA GCG GCA GTA TTG CGT TTA ATG GGC GTA GTG M S G H V M E E C V V H G T

More information

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Probabilistic modeling and molecular phylogeny

Probabilistic modeling and molecular phylogeny Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical University of Denmark (DTU) What is a model? Mathematical

More information

Codon Distribution in Error-Detecting Circular Codes

Codon Distribution in Error-Detecting Circular Codes life Article Codon Distribution in Error-Detecting Circular Codes Elena Fimmel, * and Lutz Strüngmann Institute for Mathematical Biology, Faculty of Computer Science, Mannheim University of Applied Sciences,

More information

Aoife McLysaght Dept. of Genetics Trinity College Dublin

Aoife McLysaght Dept. of Genetics Trinity College Dublin Aoife McLysaght Dept. of Genetics Trinity College Dublin Evolution of genome arrangement Evolution of genome content. Evolution of genome arrangement Gene order changes Inversions, translocations Evolution

More information

Supplemental Table 1. Primers used for cloning and PCR amplification in this study

Supplemental Table 1. Primers used for cloning and PCR amplification in this study Supplemental Table 1. Primers used for cloning and PCR amplification in this study Target Gene Primer sequence NATA1 (At2g393) forward GGG GAC AAG TTT GTA CAA AAA AGC AGG CTT CAT GGC GCC TCC AAC CGC AGC

More information

Using algebraic geometry for phylogenetic reconstruction

Using algebraic geometry for phylogenetic reconstruction Using algebraic geometry for phylogenetic reconstruction Marta Casanellas i Rius (joint work with Jesús Fernández-Sánchez) Departament de Matemàtica Aplicada I Universitat Politècnica de Catalunya IMA

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION DOI:.8/NCHEM. Conditionally Fluorescent Molecular Probes for Detecting Single Base Changes in Double-stranded DNA Sherry Xi Chen, David Yu Zhang, Georg Seelig. Analytic framework and probe design.. Design

More information

The role of the FliD C-terminal domain in pentamer formation and

The role of the FliD C-terminal domain in pentamer formation and The role of the FliD C-terminal domain in pentamer formation and interaction with FliT Hee Jung Kim 1,2,*, Woongjae Yoo 3,*, Kyeong Sik Jin 4, Sangryeol Ryu 3,5 & Hyung Ho Lee 1, 1 Department of Chemistry,

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION SUPPLEMENTARY INFORMATION DOI:.38/NCHEM.246 Optimizing the specificity of nucleic acid hyridization David Yu Zhang, Sherry Xi Chen, and Peng Yin. Analytic framework and proe design 3.. Concentration-adjusted

More information

Evolutionary Analysis of Viral Genomes

Evolutionary Analysis of Viral Genomes University of Oxford, Department of Zoology Evolutionary Biology Group Department of Zoology University of Oxford South Parks Road Oxford OX1 3PS, U.K. Fax: +44 1865 271249 Evolutionary Analysis of Viral

More information

evoglow - express N kit distributed by Cat.#: FP product information broad host range vectors - gram negative bacteria

evoglow - express N kit distributed by Cat.#: FP product information broad host range vectors - gram negative bacteria evoglow - express N kit broad host range vectors - gram negative bacteria product information distributed by Cat.#: FP-21020 Content: Product Overview... 3 evoglow express N -kit... 3 The evoglow -Fluorescent

More information

Sex-Linked Inheritance in Macaque Monkeys: Implications for Effective Population Size and Dispersal to Sulawesi

Sex-Linked Inheritance in Macaque Monkeys: Implications for Effective Population Size and Dispersal to Sulawesi Supporting Information http://www.genetics.org/cgi/content/full/genetics.110.116228/dc1 Sex-Linked Inheritance in Macaque Monkeys: Implications for Effective Population Size and Dispersal to Sulawesi Ben

More information

Supplementary Information

Supplementary Information Electronic Supplementary Material (ESI) for RSC Advances. This journal is The Royal Society of Chemistry 2014 Directed self-assembly of genomic sequences into monomeric and polymeric branched DNA structures

More information

evoglow - express N kit Cat. No.: product information broad host range vectors - gram negative bacteria

evoglow - express N kit Cat. No.: product information broad host range vectors - gram negative bacteria evoglow - express N kit broad host range vectors - gram negative bacteria product information Cat. No.: 2.1.020 evocatal GmbH 2 Content: Product Overview... 4 evoglow express N kit... 4 The evoglow Fluorescent

More information

Re- engineering cellular physiology by rewiring high- level global regulatory genes

Re- engineering cellular physiology by rewiring high- level global regulatory genes Re- engineering cellular physiology by rewiring high- level global regulatory genes Stephen Fitzgerald 1,2,, Shane C Dillon 1, Tzu- Chiao Chao 2, Heather L Wiencko 3, Karsten Hokamp 3, Andrew DS Cameron

More information

Biosynthesis of Bacterial Glycogen: Primary Structure of Salmonella typhimurium ADPglucose Synthetase as Deduced from the

Biosynthesis of Bacterial Glycogen: Primary Structure of Salmonella typhimurium ADPglucose Synthetase as Deduced from the JOURNAL OF BACTERIOLOGY, Sept. 1987, p. 4355-4360 0021-9193/87/094355-06$02.00/0 Copyright X) 1987, American Society for Microbiology Vol. 169, No. 9 Biosynthesis of Bacterial Glycogen: Primary Structure

More information

The 3 Genomic Numbers Discovery: How Our Genome Single-Stranded DNA Sequence Is Self-Designed as a Numerical Whole

The 3 Genomic Numbers Discovery: How Our Genome Single-Stranded DNA Sequence Is Self-Designed as a Numerical Whole Applied Mathematics, 2013, 4, 37-53 http://dx.doi.org/10.4236/am.2013.410a2004 Published Online October 2013 (http://www.scirp.org/journal/am) The 3 Genomic Numbers Discovery: How Our Genome Single-Stranded

More information

Evolutionary dynamics of abundant stop codon readthrough in Anopheles and Drosophila

Evolutionary dynamics of abundant stop codon readthrough in Anopheles and Drosophila biorxiv preprint first posted online May. 3, 2016; doi: http://dx.doi.org/10.1101/051557. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved.

More information

THE MATHEMATICAL STRUCTURE OF THE GENETIC CODE: A TOOL FOR INQUIRING ON THE ORIGIN OF LIFE

THE MATHEMATICAL STRUCTURE OF THE GENETIC CODE: A TOOL FOR INQUIRING ON THE ORIGIN OF LIFE STATISTICA, anno LXIX, n. 2 3, 2009 THE MATHEMATICAL STRUCTURE OF THE GENETIC CODE: A TOOL FOR INQUIRING ON THE ORIGIN OF LIFE Diego Luis Gonzalez CNR-IMM, Bologna Section, Via Gobetti 101, I-40129, Bologna,

More information

Encoding of Amino Acids and Proteins from a Communications and Information Theoretic Perspective

Encoding of Amino Acids and Proteins from a Communications and Information Theoretic Perspective Jacobs University Bremen Encoding of Amino Acids and Proteins from a Communications and Information Theoretic Perspective Semester Project II By: Dawit Nigatu Supervisor: Prof. Dr. Werner Henkel Transmission

More information

It is the author's version of the article accepted for publication in the journal "Biosystems" on 03/10/2015.

It is the author's version of the article accepted for publication in the journal Biosystems on 03/10/2015. It is the author's version of the article accepted for publication in the journal "Biosystems" on 03/10/2015. The system-resonance approach in modeling genetic structures Sergey V. Petoukhov Institute

More information

ChemiScreen CaS Calcium Sensor Receptor Stable Cell Line

ChemiScreen CaS Calcium Sensor Receptor Stable Cell Line PRODUCT DATASHEET ChemiScreen CaS Calcium Sensor Receptor Stable Cell Line CATALOG NUMBER: HTS137C CONTENTS: 2 vials of mycoplasma-free cells, 1 ml per vial. STORAGE: Vials are to be stored in liquid N

More information

Objective: You will be able to justify the claim that organisms share many conserved core processes and features.

Objective: You will be able to justify the claim that organisms share many conserved core processes and features. Objective: You will be able to justify the claim that organisms share many conserved core processes and features. Do Now: Read Enduring Understanding B Essential knowledge: Organisms share many conserved

More information

An Analytical Model of Gene Evolution with 9 Mutation Parameters: An Application to the Amino Acids Coded by the Common Circular Code

An Analytical Model of Gene Evolution with 9 Mutation Parameters: An Application to the Amino Acids Coded by the Common Circular Code Bulletin of Mathematical Biology (2007) 69: 677 698 DOI 10.1007/s11538-006-9147-z ORIGINAL ARTICLE An Analytical Model of Gene Evolution with 9 Mutation Parameters: An Application to the Amino Acids Coded

More information

Near-instant surface-selective fluorogenic protein quantification using sulfonated

Near-instant surface-selective fluorogenic protein quantification using sulfonated Electronic Supplementary Material (ESI) for rganic & Biomolecular Chemistry. This journal is The Royal Society of Chemistry 2014 Supplemental nline Materials for ear-instant surface-selective fluorogenic

More information

Supplementary information. Porphyrin-Assisted Docking of a Thermophage Portal Protein into Lipid Bilayers: Nanopore Engineering and Characterization.

Supplementary information. Porphyrin-Assisted Docking of a Thermophage Portal Protein into Lipid Bilayers: Nanopore Engineering and Characterization. Supplementary information Porphyrin-Assisted Docking of a Thermophage Portal Protein into Lipid Bilayers: Nanopore Engineering and Characterization. Benjamin Cressiot #, Sandra J. Greive #, Wei Si ^#,

More information

AtTIL-P91V. AtTIL-P92V. AtTIL-P95V. AtTIL-P98V YFP-HPR

AtTIL-P91V. AtTIL-P92V. AtTIL-P95V. AtTIL-P98V YFP-HPR Online Resource 1. Primers used to generate constructs AtTIL-P91V, AtTIL-P92V, AtTIL-P95V and AtTIL-P98V and YFP(HPR) using overlapping PCR. pentr/d- TOPO-AtTIL was used as template to generate the constructs

More information

Timing molecular motion and production with a synthetic transcriptional clock

Timing molecular motion and production with a synthetic transcriptional clock Timing molecular motion and production with a synthetic transcriptional clock Elisa Franco,1, Eike Friedrichs 2, Jongmin Kim 3, Ralf Jungmann 2, Richard Murray 1, Erik Winfree 3,4,5, and Friedrich C. Simmel

More information

Supplementary Information

Supplementary Information Supplementary Information Arginine-rhamnosylation as new strategy to activate translation elongation factor P Jürgen Lassak 1,2,*, Eva Keilhauer 3, Max Fürst 1,2, Kristin Wuichet 4, Julia Gödeke 5, Agata

More information

RELATING PHYSICOCHEMMICAL PROPERTIES OF AMINO ACIDS TO VARIABLE NUCLEOTIDE SUBSTITUTION PATTERNS AMONG SITES ZIHENG YANG

RELATING PHYSICOCHEMMICAL PROPERTIES OF AMINO ACIDS TO VARIABLE NUCLEOTIDE SUBSTITUTION PATTERNS AMONG SITES ZIHENG YANG RELATING PHYSICOCHEMMICAL PROPERTIES OF AMINO ACIDS TO VARIABLE NUCLEOTIDE SUBSTITUTION PATTERNS AMONG SITES ZIHENG YANG Department of Biology (Galton Laboratory), University College London, 4 Stephenson

More information

Understanding relationship between homologous sequences

Understanding relationship between homologous sequences Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective

More information

Capacity of DNA Data Embedding Under. Substitution Mutations

Capacity of DNA Data Embedding Under. Substitution Mutations Capacity of DNA Data Embedding Under Substitution Mutations Félix Balado arxiv:.3457v [cs.it] 8 Jan Abstract A number of methods have been proposed over the last decade for encoding information using deoxyribonucleic

More information

Maximum Likelihood in Phylogenetics

Maximum Likelihood in Phylogenetics Maximum Likelihood in Phylogenetics June 1, 2009 Smithsonian Workshop on Molecular Evolution Paul O. Lewis Department of Ecology & Evolutionary Biology University of Connecticut, Storrs, CT Copyright 2009

More information

Pathways and Controls of N 2 O Production in Nitritation Anammox Biomass

Pathways and Controls of N 2 O Production in Nitritation Anammox Biomass Supporting Information for Pathways and Controls of N 2 O Production in Nitritation Anammox Biomass Chun Ma, Marlene Mark Jensen, Barth F. Smets, Bo Thamdrup, Department of Biology, University of Southern

More information

Diversity of Chlamydia trachomatis Major Outer Membrane

Diversity of Chlamydia trachomatis Major Outer Membrane JOURNAL OF ACTERIOLOGY, Sept. 1987, p. 3879-3885 Vol. 169, No. 9 0021-9193/87/093879-07$02.00/0 Copyright 1987, American Society for Microbiology Diversity of Chlamydia trachomatis Major Outer Membrane

More information

Chain-like assembly of gold nanoparticles on artificial DNA templates via Click Chemistry

Chain-like assembly of gold nanoparticles on artificial DNA templates via Click Chemistry Electronic Supporting Information: Chain-like assembly of gold nanoparticles on artificial DNA templates via Click Chemistry Monika Fischler, Alla Sologubenko, Joachim Mayer, Guido Clever, Glenn Burley,

More information

Using an Artificial Regulatory Network to Investigate Neural Computation

Using an Artificial Regulatory Network to Investigate Neural Computation Using an Artificial Regulatory Network to Investigate Neural Computation W. Garrett Mitchener College of Charleston January 6, 25 W. Garrett Mitchener (C of C) UM January 6, 25 / 4 Evolution and Computing

More information

Identification of a Locus Involved in the Utilization of Iron by Haemophilus influenzae

Identification of a Locus Involved in the Utilization of Iron by Haemophilus influenzae INFECrION AND IMMUNITY, OCt. 1994, p. 4515-4525 0019-9567/94/$04.00+0 Copyright 1994, American Society for Microbiology Vol. 62, No. 10 Identification of a Locus Involved in the Utilization of Iron by

More information

HADAMARD MATRICES AND QUINT MATRICES IN MATRIX PRESENTATIONS OF MOLECULAR GENETIC SYSTEMS

HADAMARD MATRICES AND QUINT MATRICES IN MATRIX PRESENTATIONS OF MOLECULAR GENETIC SYSTEMS Symmetry: Culture and Science Vol. 16, No. 3, 247-266, 2005 HADAMARD MATRICES AND QUINT MATRICES IN MATRIX PRESENTATIONS OF MOLECULAR GENETIC SYSTEMS Sergey V. Petoukhov Address: Department of Biomechanics,

More information

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally

More information

Edinburgh Research Explorer

Edinburgh Research Explorer Edinburgh Research Explorer Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens; a review of the considerable

More information

Insects act as vectors for a number of important diseases of

Insects act as vectors for a number of important diseases of pubs.acs.org/synthbio Novel Synthetic Medea Selfish Genetic Elements Drive Population Replacement in Drosophila; a Theoretical Exploration of Medea- Dependent Population Suppression Omar S. Abari,,# Chun-Hong

More information

The Mathematics of Phylogenomics

The Mathematics of Phylogenomics arxiv:math/0409132v2 [math.st] 27 Sep 2005 The Mathematics of Phylogenomics Lior Pachter and Bernd Sturmfels Department of Mathematics, UC Berkeley [lpachter,bernd]@math.berkeley.edu February 1, 2008 The

More information

In this article, we investigate the possible existence of errordetection/correction

In this article, we investigate the possible existence of errordetection/correction EYEWIRE BY DIEGO LUIS GONZALEZ, SIMONE GIANNERINI, AND RODOLFO ROSA In this article, we investigate the possible existence of errordetection/correction mechanisms in the genetic machinery by means of a

More information

How DNA barcoding can be more effective in microalgae. identification: a case of cryptic diversity revelation in Scenedesmus

How DNA barcoding can be more effective in microalgae. identification: a case of cryptic diversity revelation in Scenedesmus How DNA barcoding can be more effective in microalgae identification: a case of cryptic diversity revelation in Scenedesmus (Chlorophyceae) Shanmei Zou, Cong Fei, Chun Wang, Zhan Gao, Yachao Bao, Meilin

More information

Maximum Likelihood in Phylogenetics

Maximum Likelihood in Phylogenetics Maximum Likelihood in Phylogenetics 26 January 2011 Workshop on Molecular Evolution Český Krumlov, Česká republika Paul O. Lewis Department of Ecology & Evolutionary Biology University of Connecticut,

More information

Supplemental Figure 1. Differences in amino acid composition between the paralogous copies Os MADS17 and Os MADS6.

Supplemental Figure 1. Differences in amino acid composition between the paralogous copies Os MADS17 and Os MADS6. Supplemental Data. Reinheimer and Kellogg (2009). Evolution of AGL6-like MADSbox genes in grasses (Poaceae): ovule expression is ancient and palea expression is new Supplemental Figure 1. Differences in

More information

FliZ Is a Posttranslational Activator of FlhD 4 C 2 -Dependent Flagellar Gene Expression

FliZ Is a Posttranslational Activator of FlhD 4 C 2 -Dependent Flagellar Gene Expression JOURNAL OF BACTERIOLOGY, July 2008, p. 4979 4988 Vol. 190, No. 14 0021-9193/08/$08.00 0 doi:10.1128/jb.01996-07 Copyright 2008, American Society for Microbiology. All Rights Reserved. FliZ Is a Posttranslational

More information

SENCA: A codon substitution model to better estimate evolutionary processes

SENCA: A codon substitution model to better estimate evolutionary processes SENCA: A codon substitution model to better estimate evolutionary processes Fanny Pouyet Marc Bailly-Bechet, Dominique Mouchiroud and Laurent Guéguen LBBE - UCB Lyon - UMR 5558 June 2015, Porquerolles,

More information

T R K V CCU CG A AAA GUC T R K V CCU CGG AAA GUC. T Q K V CCU C AG AAA GUC (Amino-acid

T R K V CCU CG A AAA GUC T R K V CCU CGG AAA GUC. T Q K V CCU C AG AAA GUC (Amino-acid Lecture 11 Increasing Model Complexity I. Introduction. At this point, we ve increased the complexity of models of substitution considerably, but we re still left with the assumption that rates are uniform

More information

Symmetry Studies. Marlos A. G. Viana

Symmetry Studies. Marlos A. G. Viana Symmetry Studies Marlos A. G. Viana aaa aac aag aat caa cac cag cat aca acc acg act cca ccc ccg cct aga agc agg agt cga cgc cgg cgt ata atc atg att cta ctc ctg ctt gaa gac gag gat taa tac tag tat gca gcc

More information

Supporting Information. An Electric Single-Molecule Hybridisation Detector for short DNA Fragments

Supporting Information. An Electric Single-Molecule Hybridisation Detector for short DNA Fragments Supporting Information An Electric Single-Molecule Hybridisation Detector for short DNA Fragments A.Y.Y. Loh, 1 C.H. Burgess, 2 D.A. Tanase, 1 G. Ferrari, 3 M.A. Maclachlan, 2 A.E.G. Cass, 1 T. Albrecht*

More information

Bayes Empirical Bayes Inference of Amino Acid Sites Under Positive Selection

Bayes Empirical Bayes Inference of Amino Acid Sites Under Positive Selection Bayes Empirical Bayes Inference of Amino Acid Sites Under Positive Selection Ziheng Yang,* Wendy S.W. Wong, and Rasmus Nielsen à *Department of Biology, University College London, London, United Kingdom;

More information

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species Schedule Bioinformatics and Computational Biology: History and Biological Background (JH) 0.0 he Parsimony criterion GKN.0 Stochastic Models of Sequence Evolution GKN 7.0 he Likelihood criterion GKN 0.0

More information

Statistics for Biology and Health. Series Editors K. Dietz, M. Gail, K. Krickeberg, A. Tsiatis, J. Samet

Statistics for Biology and Health. Series Editors K. Dietz, M. Gail, K. Krickeberg, A. Tsiatis, J. Samet Statistics for Biology and Health Series Editors K. Dietz, M. Gail, K. Krickeberg, A. Tsiatis, J. Samet Rasmus Nielsen Editor Statistical Methods in Molecular Evolution 5 Maximum Likelihood Methods for

More information

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution Massachusetts Institute of Technology 6.877 Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution 1. Rates of amino acid replacement The initial motivation for the neutral

More information

Lecture IV A. Shannon s theory of noisy channels and molecular codes

Lecture IV A. Shannon s theory of noisy channels and molecular codes Lecture IV A Shannon s theory of noisy channels and molecular codes Noisy molecular codes: Rate-Distortion theory S Mapping M Channel/Code = mapping between two molecular spaces. Two functionals determine

More information

The Journal of Animal & Plant Sciences, 28(5): 2018, Page: Sadia et al., ISSN:

The Journal of Animal & Plant Sciences, 28(5): 2018, Page: Sadia et al., ISSN: The Journal of Animal & Plant Sciences, 28(5): 2018, Page: 1532-1536 Sadia et al., ISSN: 1018-7081 Short Communication BIOINFORMATICS ANALYSIS OF CODON USAGE BIAS AND RNA SECONDARY STRUCTURES FOR SALT

More information

Lecture 15: Realities of Genome Assembly Protein Sequencing

Lecture 15: Realities of Genome Assembly Protein Sequencing Lecture 15: Realities of Genome Assembly Protein Sequencing Study Chapter 8.10-8.15 1 Euler s Theorems A graph is balanced if for every vertex the number of incoming edges equals to the number of outgoing

More information

DNA sequence analysis of the imp UV protection and mutation operon of the plasmid TP110: identification of a third gene

DNA sequence analysis of the imp UV protection and mutation operon of the plasmid TP110: identification of a third gene QD) 1990 Oxford University Press Nucleic Acids Research, Vol. 18, No. 17 5045 DNA sequence analysis of the imp UV protection and mutation operon of the plasmid TP110: identification of a third gene David

More information

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Ziheng Yang Department of Biology, University College, London An excess of nonsynonymous substitutions

More information

Supplementary Figure 1. Schematic of split-merger microfluidic device used to add transposase to template drops for fragmentation.

Supplementary Figure 1. Schematic of split-merger microfluidic device used to add transposase to template drops for fragmentation. Supplementary Figure 1. Schematic of split-merger microfluidic device used to add transposase to template drops for fragmentation. Inlets are labelled in blue, outlets are labelled in red, and static channels

More information

Glucosylglycerate phosphorylase, a novel enzyme specificity involved in compatible solute metabolism

Glucosylglycerate phosphorylase, a novel enzyme specificity involved in compatible solute metabolism Supplementary Information for Glucosylglycerate phosphorylase, a novel enzyme specificity involved in compatible solute metabolism Jorick Franceus, Denise Pinel, Tom Desmet Corresponding author: Tom Desmet,

More information

Accuracy and Power of the Likelihood Ratio Test in Detecting Adaptive Molecular Evolution

Accuracy and Power of the Likelihood Ratio Test in Detecting Adaptive Molecular Evolution Accuracy and Power of the Likelihood Ratio Test in Detecting Adaptive Molecular Evolution Maria Anisimova, Joseph P. Bielawski, and Ziheng Yang Department of Biology, Galton Laboratory, University College

More information

Evolutionary Change in Nucleotide Sequences. Lecture 3

Evolutionary Change in Nucleotide Sequences. Lecture 3 Evolutionary Change in Nucleotide Sequences Lecture 3 1 So far, we described the evolutionary process as a series of gene substitutions in which new alleles, each arising as a mutation ti in a single individual,

More information