Polymorphism due to multiple amino acid substitutions at a codon site within

Size: px
Start display at page:

Download "Polymorphism due to multiple amino acid substitutions at a codon site within"

Transcription

1 Polymorphism due to multiple amino acid substitutions at a codon site within Ciona savignyi Nilgun Donmez* 1, Georgii A. Bazykin 1, Michael Brudno* 2, Alexey S. Kondrashov *Department of Computer Science, and Banting and Best Department of Medical Research, University of Toronto, Toronto ON M5S 3G4 Canada Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), Moscow , Russia Life Sciences Institute and Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI , USA Running head: Polymorphism due to multiple substitutions Keywords: polymorphism, positive selection, non-synonymous substitutions, twosubstitution codons, Ciona savignyi 1 These two authors contributed equally to this work. 2 Corresponding author: Michael Brudno 10 King s College Rd Pratt Bldg Rm 283 Toronto ON M5S 3G4 Canada brudno@cs.toronto.edu Tel.: Fax:

2 Abstract We compared two haploid genotypes of one Ciona savignyi individual and identified codons at which these genotypes differ by two non-synonymous substitutions. Using the Ciona intestinalis genome as an outgroup, we showed that both substitutions tend to occur in the same genotype. Only in 53 (34.4%) of 154 codons, one substitution occurred in each of the two genotypes, although 77 (50%) of such codons are to be expected if substitutions were independent. We considered two feasible evolutionary causes for the observed pattern: substitutions driven by positive selection and compensatory substitutions, as well as several potential biases. However, none of these explanations is fully compelling, and data on multiple genotypes of C. savignyi would help to elucidate the causes of this pattern. Natural populations possess very different levels of nucleotide diversity, ranging from <0.001 to >0.1 (Snoke et al 2006; Lynch 2007). Among multicellular organisms, the highest nucleotide diversity has so far been observed in a marine ascidian Ciona savignyi, where two haploid genotypes (haplotypes) sequenced from the same individual differ from each other at 8% of nucleotide sites (Small et al 2007). Such a high diversity, which appears to be primarily due to a large effective population size of the species, makes C. savignyi an attractive model organism for population genetic studies. Some phenomena and patterns that can be investigated in C. savignyi may be almost impossible to study in less polymorphic species. 2

3 Here we consider one such phenomenon: the presence, in different haplotypes of C. savignyi, of allelic codons that differ from each other at two or three nucleotide sites. Such allelic codons are exceedingly rare in less diverse populations. When codons that differ from each other by multiple nonsynonymous substitutions were observed in different species, an excess of substitutions within the same lineage was interpreted as a sign of positive selection (Bazykin et al 2004, 2006). In this paper we report a similar excess in two haplotypes of C. savignyi, and analyze its possible causes. MATERIALS AND METHODS C. intestinalis annotations constituting 14,002 genes were downloaded from ftp://ftp.jgi-psf.org/pub/jgi_data/ciona/v2.0. The C. savignyi genome (version 2.01) was downloaded from We used the alignment of the two haploid genotypes, A and B, available at the site and described in (Small et al. 2007). Translated C. intestinalis genes were queried against both haplotypes of C. savignyi using protein2genome functionality of the alignment software Exonerate (Slater and Birney 2005). The best hits for a gene on both haplotypes were required to have a normalized score of at least 3.0 (calculated by dividing the Exonerate raw score by the protein length involved in the alignment). In case of a tie for the best hit in either of the haplotypes, or if the normalized score of the best hits on each haplotype differed by more than 0.5, the protein was eliminated to avoid potential paralog problems. In addition, the locations of the alignments on each haplotype were checked to ensure that they correspond to the same position (+/- 5 nucleotides) in the global alignment of the two haplotypes (Small et al 2007). If the intersection of the 3

4 pairwise alignments was less than 100 codons or covered less than 75% of the gene, the gene was not considered for further analysis. Finally, alignments with ambiguities in nucleotide sequences or internal stop codons were discarded. For all of the remaining genes, the two sets of pairwise Exonerate alignments were merged into 3-way alignments as follows: the intersection of C. intestinalis vs. haplotype A and C. intestinalis vs. haplotype B alignments was taken based on the C. intestinalis amino acid, while introducing extra gaps whenever one of the pairwise alignments had a gap in the C. intestinalis sequence that the other did not. The remaining dataset consisted of 5,478 triplets of orthologous genes. We further masked codons that were not flanked, on each side, by gapless alignments of 10 amino acids or more, with at least 5 matches between the 2 haplotypes, and at least 3 matches between each haplotype and C. intestinalis. To remove the effect of insertion and deletion sequencing errors, we also masked frame-shifted regions: those in which the same DNA sequence of length 4 or more occurred in the aligned C. savignyi sequences with a shift of +/-1. The full alignments and the list of codons are available at The last common ancestor (LCA) codon for a pair of allelic codons in the two C. savignyi haplotypes was determined as follows. When the two allelic codons differed from each other at one nucleotide site, and encoded the same amino acid, we assumed that if the homologous C. intestinalis codon (outgroup, O) coincided either with the codon from haplotype A or with the codon from haplotype B, LCA coincided with O. In other words, 4

5 we assumed parsimony, which implies that when O coincides with A (B), the single synonymous nucleotide substitution occurred in the lineage that led to B (A). If A and B differ from each other at one nucleotide site but encode different amino acids, we assumed that if O either coincides with A (B) or differs from both A and B but still encodes the same amino acid as A (B), the LCA also encodes this amino acid, and that the only nonsynonymous substitution occurred in the lineage that led to B (A). When O encodes an amino acid different from that encoded by both A and B, we assumed that the LCA could not be determined. When the two allelic codons differ from each other at two nucleotide sites, we also assumed that the LCA coincided with O either if O coincided with A or B, or if the O codon was intermediate between codons A and B, in the sense that O differed from both A and B by a single nucleotide. In addition, when both allelic codons and the two intermediates all encoded different amino acids we assumed that the LCA encoded the same amino acid as O if O encoded the same amino acid as A, B, or one of the intermediate codons. Otherwise, we assumed that LCA could not be determined. Pairs of codons for which one of the two possible intermediate codons is a stop codon were not considered. When the two allelic codons differ from each other at three nucleotide sites, we assumed that the LCA coincided with O either if O coincided with A or B, or if the O codon was intermediate between codons A and B, in the sense that O differed from one of them at one nucleotide site and from the other one at two nucleotide sites. In all other cases, we 5

6 assumed that LCA could not be determined, as it is usually impossible to establish with confidence that all the substitutions between the two allelic codons were nonsynonymous. Pairs of codons for which one of the six possible intermediate codons is a stop codon were not considered. Gene-specific synonymous and nonsynonymous evolutionary distances were estimated by codeml program of the PAML package (Yang et al. 1997) from pairwise nucleotide alignments for the two C. savignyi haplotypes and each haplotype and the C. intestinalis genome, taken from the triple alignments. When the distances were estimated for correlation with the occurrence of a variable codon in some region, that codon itself was excluded in distance estimation. RESULTS Patterns in distribution of multiple substitutions within a codon Among the 1,251,343 homologous codons in the 5,478 analyzed genes, 93.46% are identical in the two haplotypes, and 6.40%, 0.12% and 0.005% differ at one, two and three nucleotide sites, respectively (no-, one-, two- and three-substitution codons). The mean evolutionary distance between the haplotypes is at synonymous sites and at non-synonymous sites, in agreement with Small et al. (2007). Among codons with a single synonymous substitution between the haplotype A and haplotype B, O (outgroup) coincides with either A or B in 56% of cases (Table 1). Among codons with a single nonsynonymous substitution between A and B, O encoded the same amino acid as either A or B in 60% of cases (Table 1). 6

7 There are 1610 codons separated by two substitutions, including 288 codons separated by two synonymous substitutions (such codons are rare, as they must code for either Leu or Arg), and 249 codons separated by two non-synonymous substitutions (Table 2). If the substitutions were independent, we would expect both of them to occur in the same lineage with the probability of ~50% (Bazykin et al., 2004). In agreement with this expectation, in codons where both substitutions were synonymous, they occur in the same lineage approximately in half of the cases. In contrast, two non-synonymous substitutions occur in the same lineage in 66% of the cases, and in different lineages in only 34% of the cases (Table 2). Clumping of nonsynonymous substitutions is more pronounced in highly conserved genes and gene regions (Table 2). When two nonsynonymous substitutions occur in the same lineage, they tend to occur in the lineage which has a higher rate of nonsynonymous (86 of 101; chi-square, P < ), but not necessarily synonymous (55 of 101; chi-square, P = 0.573), substitutions in this gene. In contrast, when two synonymous substitutions occur in the same lineage, there is no significant difference in the rate of synonymous or nonsynonymous substitutions between the two lineages (chi-square, P > 0.05). Clumping is also present in 66 three-substitution codons: while all substitutions are expected to occur in the same lineage in only 25% of cases, the observed value is 46% (chi-square, P = 0.029; Table 3). 7

8 Analysis In the following subsections, we will consider three possible explanations for the observed clumping of non-synonymous substitutions: positive selection, compensatory mutations, and potential biases in the Last Common Ancestor identification, as well as other biases and phenomena that could lead to an excess of substitutions in the same lineage. Two-substitution polymorphisms due to positive selection Positive selection can lead to clumping of non-synonymous substitutions within a codon (Bazykin 2004, 2006). However, in contrast to the previous observations of such clumping, here we are dealing with intrapopulation polymorphisms, and transitive polymorphisms due to positive selection-driven allele replacements are short-lived (see Crow and Kimura, 1970). Thus, it is not clear whether positive selection driving both of the substitutions that convert the codon found in haplotype A into the codon found in haplotype B can provide a quantitatively feasible explanation. We will roughly estimate the number of codons that differ by two nonsynonymous substitutions between two haplotypes that segregate within a population under positive selection. Let us assume (unrealistically) that both of these substitutions occur simultaneously. Then, in the absence of dominance, the deterministic dynamics of replacement of codon A with codon B are described by the fundamental equation dp/dt = sp(1-p) (1) 8

9 where p is the frequency of codon B, 1-p is the frequency of codon A, and s is the selective advantage of codon B over codon A. The solution to this equation is given by (2) (see Crow and Kimura, 1970). If two haploid genotypes are sampled from the population every generation, the total number of heterozygous combinations over the whole course of a substitution is T = 2p(t)(1 p(t))dt = 2/s (3) Thus, if we observe k heterozygous loci under positive selection within a pair of complete haploid genotypes, the per-generation number of positive selection-driven allele replacements required to explain this fact is ks/2. The excess of codons where both nonsynonymous substitutions occurred in the lineage of the same haplotype suggests that k ~50 (Table 2). Thus, even if selection is very weak (say, s = 10-5 ), we still need to assume that there is a positive selection-driven replacement that results in a 2-substitution codon every 4000 generations. Probably this rate of positive selection-driven evolution is too high (Eyre-Walker 2006), because selection that simultaneously favors two nonsynonymous substitutions must occur only in a minority of cases. Moreover, our calculations underestimate the required prevalence of positive selection, because in 9

10 reality the two substitutions necessary to convert codon A into codon B cannot occur simultaneously. Codon A will be replaced not by codon B directly, but by an intermediate codon which has selective advantage over A, and codons A and B would coexist for a much shorter period than the two alleles where one directly replaces the other, as assumed in equation (1). Thus, simple positive selection favoring both changes is unlikely to be the leading cause of the observed pattern. Two-substitution polymorphisms due to compensatory mutations A second potential explanation for the observed pattern is compensatory evolution. Let us assume that codons A and B have the same fitness, but the intermediate codons have a reduced fitness, 1-s. In this case, in each pair of the substitutions, only the second (from an intermediate codon to either A or B) is driven by positive selection, and there is no long-term increase in fitness. Then, the deterministic equilibrium frequency of the intermediate codons will be 2m/s, where m is the per-nucleotide mutation rate, and codons A and B appear from these intermediate codons, due to mutation, at the rate m*(2m/s) = 2m 2 /s. If we treat coexistence of A and B as a selectively neutral polymorphism with the "effective mutation rate" m eff = 2m 2 /s, the expected nucleotide diversity is π = 4N e m eff = 8N e m 2 /s. Assuming m = 10-8, N e = 10 6 (Small et al. 2007), and s = 10-5, we obtain π ~ Thus, in order to explain the 50 extra codons where the variants found in haplotypes A and B differ by two non-synonymous substitutions (Table 2), we need to assume that such compensatory selection operates on 5*10 5 codons, i. e. on 40% of all codons. The assumption that such a large fraction of protein sites are under this kind of selection, with at least two amino acids conferring the same high fitness, does 10

11 not seem to be very likely. Of course, if the selection against the intermediate codons is weaker, a smaller fraction of codons under compensatory selection will suffice: if s = 10-6 only 4% of the codons could be under this selection. As s declines past 10-6, however, selection becomes inefficient given N e = 10 6, and the intermediate codon(s) will become effectively neutral. Biased misidentification of the ancestral codon The observed clumping could also be caused by biased misidentification of the LCA codon for the two C. savignyi haplotypes. Because C. intestinalis is not a close outgroup for within-species polymorphism in C. savignyi, in a substantial fraction of cases, LCA cannot be identified under the assumption of parsimony (Table 2), and in some cases, LCA is likely to be misidentified. Unbiased mistakes in identification of the LCA codon will not produce the observed pattern: if, in the case of a two-substitution pair of C. savignyi codons A and B, the LCA codon was drawn randomly from the four possible codons (A, B, and the two intermediates), the two substitution that distinguish A from B will be attributed to the same and the two different haplotypes with equal probabilities. However, there may also be a systematic bias in misidentification of LCA, due to two reasons. First, it may be possible that only one of the two intermediate codons confers a high fitness, and the other one confers a low fitness. For example, if the codons A and B are AAT (encoding Asn) and GGT (encoding Gly), the intermediate codon AGT (encoding Ser) may be fit, and the intermediate codon GAT (encoding Asp) may be unfit. Then, if 11

12 the outgroup is very distant from the LCA, it will carry the fit intermediate codon in only 1/3 of cases (assuming that at equilibrium codons AAT, GGT, and AGT are equally common). As a result, one could conclude that for 2/3 of codon pairs, both substitutions occurred in one C. savignyi haplotype, because the LCA (as revealed by the outgroup) coincides with the other haplotype. Unfortunately, C. intestinalis is the closest known outgroup that can be used to polarize the C. savignyi polymorphism. We investigated the potential effect of biased misidentification of LCA by testing the robustness of the excess of multiple substitutions in the same lineage in interspecific divergence (Bazykin et al 2004) to the choice of the outgroup. We identified the codons with 2 non-synonymous substitutions between human and mouse, and determined LCA in 3 ways: i) using only sites where dog and opossum carry the same codon (most confident), ii) using dog as an outgroup, and iii) using opossum as an outgroup (least confident). The corresponding fractions of codons where both substitutions in two-substitution human-mouse codons occurred in the same lineage (amino acid-level pattern) were 76%, 69%, and 67%, respectively. Thus, the excess of codons where both substitutions were attributed to the same lineage diminishes when a more distant outgroup is used. These data argue against biased misidentification of the LCA as an explanation of the observed pattern in divergence between independent evolutionary lineages, although this conclusion does not necessarily apply to withinspecies polymorphism of C. savignyi. 12

13 Second, biased misidentification of LCA may appear if the amino acid composition of proteins is out of equilibrium (Jordan et al. 2005). If, for a double-substitution codon, the amino acid substitution that creates the terminal amino acid from the intermediate amino acid is more likely than the reciprocal substitution, some clumping could be an artefact of systematic errors in inferring the LCA from the outgroup codon (Bazykin et al. 2004). To test this hypothesis, we used the data on codons identical between the two C. savignyi haplotypes to infer the rates of each nonsynonymous substitution between C. intestinalis and C. savignyi. Next, for each 2-substitution pair of codons, we compared the rate of substitutions from each of the two intermediate codons into each of the two terminal codons (a total of four substitutions) with the rate of the reciprocal substitutions. A substitution from intermediate to terminal codon can lead to false excess of clumping for the given 2-substitution codon if its rate is higher than the rate of the reciprocal substitution. We compared the clumping between 2-substitution codons with 0 or 1 false excess amino acid substitutions ( false deficit codons) with that in 2-substitution codons with 2-4 false excess amino acid substitutions ( false excess codons). The difference between the two values was not large (Table 2; chi-square, P > 0.1), arguing against nonequilibrium composition of proteins as a leading cause for the observed clumping. Other explanations The observed clumping might also be due to errors in alignment, or closely correlated nearby single nucleotide sequencing errors. We believe, however, that our filtering criteria are effective in eliminating such cases: we make sure that the two orthologous C. savignyi exons are also aligned to each other in the haplotype alignments, and that the 13

14 level of conservation between the two exons in C. savignyi is above a threshold. The observed elevated prevalence of nonsynonymous substitutions in the vicinity of codons where haplotypes A and B differ by two nonsynonymous substitutions is unlikely to be due to sequencing errors, because this effect is absent when A and B differ by two synonymous substitutions. Moreover, the clumping of nonsynonymous substitutions is the strongest when the nearby codons are completely conserved between the A and B haplotypes, further reducing the chance that sequencing or alignment errors play a dominant role. This clumping cannot be explained by hypermutable CpG dinucleotides, because substitutions are also clumped in codon pairs where no intermediate codons contain CpGs (Table 2). Due to the small number of 2-substitution amino acids with mutations at 1 st and 3 rd nucleotides (Table 2), we cannot immediately dismiss that some of the clumping is due to mutation events spanning two adjacent nucleotides. In the interspecific case, however, Bazykin et al (2004) rejected this explanation in the presence of a larger dataset. DISCUSSION Our analysis of differences between the two haploid genotypes from one C. savignyi individual follows closely that for differences between mouse and rat genomes (Bazykin et al. 2004). Surprisingly, the results of these analyses are also rather similar to each other. Our data reveal clumping of within-population non-synonymous polymorphisms at the same codon that is similar in magnitude to clumping of non-synonymous substitutions 14

15 that distinguish different genomes (Tables 2 and 3). Clumping of non-synonymous substitutions in different mammalian species (Bazykin et al. 2004) and HIV-1 strains (Bazykin et al. 2006) was interpreted as the signature of positive selection which occurred at some time during the divergence of the analyzed lineages. Although the exact fraction of positive selection-driven amino acid substitutions remains controversial, it is almost certainly substantial (Eyre-Walker 2006). In contrast, the role of positive selection in maintaining polymorphism at the molecular level is believed to be small (Kimura 1983). Indeed, our rough estimates suggest that the observed clumping of non-synonymous substitutions cannot be easily explained by positive selection. We considered two different scenarios, assuming that either both, or only one, of the two non-synonymous differences between the two C. savignyi codons are favored by positive selection, and in both cases it takes very high prevalence of positive selection to explain the observed clumping. Another feasible explanation for the observed clumping of non-synonymous substitutions is the biased misidentification of the LCA. Qualitatively, this is feasible if only one intermediate codon has a high enough fitness to be present. Quantitatively, however, this mechanism appears to be unlikely to explain what we see. We also considered the effect of sequencing errors and correlated mutations at adjacent nucleotides, and do not believe these to be the leading causes of the observed clumping. 15

16 The claim that multiple non-synonymous substitutions at a codon tend to occur in clumps (Bazykin et al. 2004) was disputed by Friedman and Hughes (2005). They argued that codons with two nonsynonymous substitutions occur less frequently than codons with one synonymous and one non-synonymous substitution. Because in most genes nonsynonymous substitutions are much rarer than synonymous substitutions, their result is certainly true. This, however, in no way affects the argument that codons with two nonsynonymous substitutions in one lineage are overrepresented compared to codons with two non-synonymous mutations in different lineages. The model-free approach of Friedman and Hughes (2005) was also criticized by Yang (2006). In summary, we observe a pattern the clumping of non-synonymous substitutions that distinguish the two known haplotypes of C. savignyi that appears to be real but lacks a definite explanation. The situation is even more intriguing because the analogous pattern in divergence of different lineages can be naturally explained by positive selection. We believe that the sequencing and analysis of additional haplotypes of C. savignyi will resolve this puzzle. In particular, if positive selection does play a role in the observed clumping, nucleotide diversity in the vicinity of a young, derived allele created by several advantageous substitutions must be reduced due to hitch-hiking (e. g., Evans et al., 2005, Helgason et al. 2007). Simultaneously the allele frequencies of intermediate codons will make it possible to ascertain adaptive landscapes associated with individual polymorphic codons and to elucidate the factors responsible for our observations. ACKNOWLEDGEMENTS 16

17 This study was supported by National Science and Engineering Research Council of Canada (NSERC) Discovery grant to M.B., and grants from the Russian Academy of Sciences (programs Biodiversity and Molecular and Cellular Biology ) and Russian Foundation of Basic Research ( and ). G.A.B. was partially supported by the Russian Science Support Foundation. Literature Cited Bazykin, G. A., F. A. Kondrashov, A. Y. Ogurtsov, S. Sunyaev, and A. S. Kondrashov, 2004 Positive selection at sites of multiple amino acid replacements since ratmouse divergence. Nature 429: Bazykin, G. A., J. Dushoff, S. A. Levin, and A. S. Kondrashov, 2006 Bursts of nonsynonymous substitutions in HIV-1 evolution reveal instances of positive selection at conservative protein sites. Proc. Natl. Acad. Sci. USA 103: Crow, J. F., and M. Kimura, 1970 An Introduction to Population Genetics Theory. Harper & Row: New York. Evans, P. D., S. L. Gilbert, N. Mekel-Bobrov, E. J. Vallender, J. R. Anderson et al., 2005 Microcephalin, a gene regulating brain size, continues to evolve adaptively in humans. Science 309: Eyre-Walker, A The genomic rate of adaptive evolution. Trends Ecol. Evol. 21:

18 Friedman, R., and A. L. Hughes, 2005 The pattern of nucleotide difference at individual codons among mouse, rat, and human. Mol. Biol. Evol. 22: Huelsenbeck, J. P., S. Jain, S. W. Frost, and S. L. Pond, 2006 A Dirichlet process model for detecting positive selection in protein-coding DNA sequences. Proc. Natl. Acad. Sci. USA. 103: Helgason, A, S. Palsson, G. Thorleifsson, S. F. A. Grant, V. Emilsson et al., 2007 Refining the impact of TCF7L2 gene variants on type 2 diabetes and adaptive evolution. Nat. Genet. 39: Jordan, I. K., F. A. Kondrashov, I. A. Adzhubei, Yu. I. Wolf, E. V. Koonin et al., 2005 A universal trend of amino acid gain and loss in protein evolution. Nature 433: Kimura M., 1983 The Neutral Theory of Molecular Evolution. Cambridge: Cambridge Univ. Press. Lynch M., The Origins of Genome Architecture. Sinauer: Sunderland. Slater, G. S., and E. Birney, 2005 Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics. 6: 31. Small, K. S., M. Brudno, M. M. Hill, and A. Sidow, 2007 Extreme genomic variation in a natural population. Proc. Natl. Acad. Sci. USA. 104: Snoke, M. S., T. U. Berendonk, D. Barth, and M. Lynch, 2006 Large global effective population sizes in Paramecium. Mol. Biol. Evol. 23:

19 Yang Z., 2006 On the Varied Pattern of Evolution of 2 Fungal Genomes: A Critique of Hughes and Friedman. Mol. Biol. Evol. 23: Yang Z., 1997 PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:

20 Table 1. Divergence at codons where haplotypes A and B differ at 1 nucleotide site a Substitution in lineage of A a Substitution in lineage of B a LCA unknown Synonymous substitution 20,023 (50.0%) 20,052 (50.0%) 31,901 (44.3%) Non-synonymous substitution 2,452 (49.7%) 2,481 (50.3%) 3,236 (39.6%) a Frequencies in the first two columns are only within codons where the LCA is known (see Methods for details). 20

21 Table 2. Divergence at codons where haplotypes A and B differ at 2 nucleotide sites Both substitutions in same haplotype Substitutions in different haplotypes LCA unknown Two synonymous substitutions 82 (43.9%) 105 (56.2%) 101 (35.1%) One synonymous and one nonsynonymous substitution 117 (43.0%) 155 (57.0%) 404 (59.8%) Two synonymous or two nonsynonymous substitutions 52 (69.3%) 23 (30.7%) 69 (47.9%) None or one synonymous substitution, two or one nonsynonymous substitutions 68 (68.7%) 31 (31.3%) 126 (56.0%) Two non-synonymous substitutions 101 (65.6%) 53 (34.4%) 95 (38.2%) Codons: Possible false excess 63 (68.5%) 29 (31.5%) 74 (44.6%) Possible false deficit 38 (61.3%) 24 (38.7%) 21 (25.3%) 1,3-substitution a 13 (56.5%) 10 (43.5%) 10 (30.3%) CpG-free b 69 (66.3%) 35 (33.7%) 62 (37.3%) Regions: c Very strong conservation 12 (92.3%) 1 (07.7%) 4 (23.5%) Strong conservation 15 (78.9%) 4 (21.1%) 11 (36.7%) Moderate conservation 19 (55.9%) 15 (44.1%) 18 (34.6%) All others 55 (62.5%) 33 (37.5%) 62 (41.3%) Genes: d Low D n 18 (94.7%) 1 (05.3%) 4 (17.4%) Medium D n 28 (58.3%) 20 (41.7%) 32 (40.0%) 21

22 High D n 55 (63.2%) 32 (36.8%) 59 (40.4%) a 1,3-substitution codons are those where the two C. savignyi haplotypes differ from each other at the first and third nucleotide sites. b CpG-free codons are those in which neither of the two possible intermediate states between rat and mouse codons includes CpG context, neither inside the codon nor on its boundary. c Regions with very strong, strong and moderate conservation are those in which the codon under consideration is flanked from each side by gapless alignments of two C. savignyi genomes and C. intestinalis of length 10 or more each with 9 or 10, 8, and 7 invariant amino acids, respectively. d Genes were split into 3 bins of equal size (low, medium and high D n ) according to the average of D n values between C. intestinalis and each of the haplotypes of C. savignyi. 22

23 Table 3. Divergence at codons where genomes A and B differ at 3 nucleotide sites a Number of paths All substitution in Two substitutions in LCA unknown involving synonymous same lineage same lineage, one substitutions substitution in the other lineage Six 9 (52.9%) 8 (47.1%) 10 (37.0%) Five 0 (00.0%) 1 (100.0%) 5 (83.3%) Four 2 (100.0%) 0 (00.0%) 1 (33.3%) Three 1 (50.0%) 1 (50.0%) 6 (75.0%) Two 0 (00.0%) 4 (100.0%) 6 (60.0%) One 0 (00.0%) 0 (00.0%) 3 (100.0%) None 0 (00.0%) 0 (00.0%) 0 (00.0%) Total 12 (46.2%) 14 (53.8%) 31 (54.4%) a There are six evolutionary paths between the two codons that differ from each other at all three sites, depending on the order in which the substitutions occur. 23

Drosophila melanogaster and D. simulans, two fruit fly species that are nearly

Drosophila melanogaster and D. simulans, two fruit fly species that are nearly Comparative Genomics: Human versus chimpanzee 1. Introduction The chimpanzee is the closest living relative to humans. The two species are nearly identical in DNA sequence (>98% identity), yet vastly different

More information

Fitness landscapes and seascapes

Fitness landscapes and seascapes Fitness landscapes and seascapes Michael Lässig Institute for Theoretical Physics University of Cologne Thanks Ville Mustonen: Cross-species analysis of bacterial promoters, Nonequilibrium evolution of

More information

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information # Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Details of PRF Methodology In the Poisson Random Field PRF) model, it is assumed that non-synonymous mutations at a given gene are either

More information

Lecture Notes: BIOL2007 Molecular Evolution

Lecture Notes: BIOL2007 Molecular Evolution Lecture Notes: BIOL2007 Molecular Evolution Kanchon Dasmahapatra (k.dasmahapatra@ucl.ac.uk) Introduction By now we all are familiar and understand, or think we understand, how evolution works on traits

More information

7. Tests for selection

7. Tests for selection Sequence analysis and genomics 7. Tests for selection Dr. Katja Nowick Group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute for Brain Research www. nowicklab.info

More information

C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

More information

SWEEPFINDER2: Increased sensitivity, robustness, and flexibility

SWEEPFINDER2: Increased sensitivity, robustness, and flexibility SWEEPFINDER2: Increased sensitivity, robustness, and flexibility Michael DeGiorgio 1,*, Christian D. Huber 2, Melissa J. Hubisz 3, Ines Hellmann 4, and Rasmus Nielsen 5 1 Department of Biology, Pennsylvania

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

Supplementary Information for Hurst et al.: Causes of trends of amino acid gain and loss

Supplementary Information for Hurst et al.: Causes of trends of amino acid gain and loss Supplementary Information for Hurst et al.: Causes of trends of amino acid gain and loss Methods Identification of orthologues, alignment and evolutionary distances A preliminary set of orthologues was

More information

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Ziheng Yang Department of Biology, University College, London An excess of nonsynonymous substitutions

More information

Understanding relationship between homologous sequences

Understanding relationship between homologous sequences Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective

More information

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/8/16

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/8/16 Genome Evolution Outline 1. What: Patterns of Genome Evolution Carol Eunmi Lee Evolution 410 University of Wisconsin 2. Why? Evolution of Genome Complexity and the interaction between Natural Selection

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)

More information

Effects of Gap Open and Gap Extension Penalties

Effects of Gap Open and Gap Extension Penalties Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

SEQUENCE DIVERGENCE,FUNCTIONAL CONSTRAINT, AND SELECTION IN PROTEIN EVOLUTION

SEQUENCE DIVERGENCE,FUNCTIONAL CONSTRAINT, AND SELECTION IN PROTEIN EVOLUTION Annu. Rev. Genomics Hum. Genet. 2003. 4:213 35 doi: 10.1146/annurev.genom.4.020303.162528 Copyright c 2003 by Annual Reviews. All rights reserved First published online as a Review in Advance on June 4,

More information

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME MATHEMATICAL MODELING AND THE HUMAN GENOME Hilary S. Booth Australian National University, Australia Keywords: Human genome, DNA, bioinformatics, sequence analysis, evolution. Contents 1. Introduction:

More information

On the optimality of the standard genetic code: the role of stop codons

On the optimality of the standard genetic code: the role of stop codons On the optimality of the standard genetic code: the role of stop codons Sergey Naumenko 1*, Andrew Podlazov 1, Mikhail Burtsev 1,2, George Malinetsky 1 1 Department of Non-linear Dynamics, Keldysh Institute

More information

Gene regulation: From biophysics to evolutionary genetics

Gene regulation: From biophysics to evolutionary genetics Gene regulation: From biophysics to evolutionary genetics Michael Lässig Institute for Theoretical Physics University of Cologne Thanks Ville Mustonen Johannes Berg Stana Willmann Curt Callan (Princeton)

More information

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Bio 1B Lecture Outline (please print and bring along) Fall, 2007 Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution

More information

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of

More information

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

More information

Processes of Evolution

Processes of Evolution 15 Processes of Evolution Forces of Evolution Concept 15.4 Selection Can Be Stabilizing, Directional, or Disruptive Natural selection can act on quantitative traits in three ways: Stabilizing selection

More information

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Phylogeny: the evolutionary history of a species

More information

Neutral Theory of Molecular Evolution

Neutral Theory of Molecular Evolution Neutral Theory of Molecular Evolution Kimura Nature (968) 7:64-66 King and Jukes Science (969) 64:788-798 (Non-Darwinian Evolution) Neutral Theory of Molecular Evolution Describes the source of variation

More information

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Supplementary Note S2 Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Phylogenetic trees reconstructed by a variety of methods from either single-copy orthologous loci (Class

More information

Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium. November 12, 2012

Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium. November 12, 2012 Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium November 12, 2012 Last Time Sequence data and quantification of variation Infinite sites model Nucleotide diversity (π) Sequence-based

More information

Single alignment: Substitution Matrix. 16 march 2017

Single alignment: Substitution Matrix. 16 march 2017 Single alignment: Substitution Matrix 16 march 2017 BLOSUM Matrix BLOSUM Matrix [2] (Blocks Amino Acid Substitution Matrices ) It is based on the amino acids substitutions observed in ~2000 conserved block

More information

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018 CONCEPT OF SEQUENCE COMPARISON Natapol Pornputtapong 18 January 2018 SEQUENCE ANALYSIS - A ROSETTA STONE OF LIFE Sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of

More information

Computational approaches for functional genomics

Computational approaches for functional genomics Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding

More information

Fixation of Deleterious Mutations at Critical Positions in Human Proteins

Fixation of Deleterious Mutations at Critical Positions in Human Proteins Fixation of Deleterious Mutations at Critical Positions in Human Proteins Author Sankarasubramanian, Sankar Published 2011 Journal Title Molecular Biology and Evolution DOI https://doi.org/10.1093/molbev/msr097

More information

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/1/18

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/1/18 Genome Evolution Outline 1. What: Patterns of Genome Evolution Carol Eunmi Lee Evolution 410 University of Wisconsin 2. Why? Evolution of Genome Complexity and the interaction between Natural Selection

More information

In-Depth Assessment of Local Sequence Alignment

In-Depth Assessment of Local Sequence Alignment 2012 International Conference on Environment Science and Engieering IPCBEE vol.3 2(2012) (2012)IACSIT Press, Singapoore In-Depth Assessment of Local Sequence Alignment Atoosa Ghahremani and Mahmood A.

More information

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally

More information

Bioinformatics Exercises

Bioinformatics Exercises Bioinformatics Exercises AP Biology Teachers Workshop Susan Cates, Ph.D. Evolution of Species Phylogenetic Trees show the relatedness of organisms Common Ancestor (Root of the tree) 1 Rooted vs. Unrooted

More information

Comparative Bioinformatics Midterm II Fall 2004

Comparative Bioinformatics Midterm II Fall 2004 Comparative Bioinformatics Midterm II Fall 2004 Objective Answer, part I: For each of the following, select the single best answer or completion of the phrase. (3 points each) 1. Deinococcus radiodurans

More information

BLAST. Varieties of BLAST

BLAST. Varieties of BLAST BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database

More information

Sequence analysis and comparison

Sequence analysis and comparison The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species

More information

Graduate Funding Information Center

Graduate Funding Information Center Graduate Funding Information Center UNC-Chapel Hill, The Graduate School Graduate Student Proposal Sponsor: Program Title: NESCent Graduate Fellowship Department: Biology Funding Type: Fellowship Year:

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

O 3 O 4 O 5. q 3. q 4. Transition

O 3 O 4 O 5. q 3. q 4. Transition Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

More information

Chromosomal rearrangements in mammalian genomes : characterising the breakpoints. Claire Lemaitre

Chromosomal rearrangements in mammalian genomes : characterising the breakpoints. Claire Lemaitre PhD defense Chromosomal rearrangements in mammalian genomes : characterising the breakpoints Claire Lemaitre Laboratoire de Biométrie et Biologie Évolutive Université Claude Bernard Lyon 1 6 novembre 2008

More information

Phylogenetic inference

Phylogenetic inference Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

More information

Multiple Alignment of Genomic Sequences

Multiple Alignment of Genomic Sequences Ross Metzger June 4, 2004 Biochemistry 218 Multiple Alignment of Genomic Sequences Genomic sequence is currently available from ENTREZ for more than 40 eukaryotic and 157 prokaryotic organisms. As part

More information

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value

More information

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei"

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS Masatoshi Nei" Abstract: Phylogenetic trees: Recent advances in statistical methods for phylogenetic reconstruction and genetic diversity analysis were

More information

2 Genome evolution: gene fusion versus gene fission

2 Genome evolution: gene fusion versus gene fission 2 Genome evolution: gene fusion versus gene fission Berend Snel, Peer Bork and Martijn A. Huynen Trends in Genetics 16 (2000) 9-11 13 Chapter 2 Introduction With the advent of complete genome sequencing,

More information

Evolutionary Analysis of Viral Genomes

Evolutionary Analysis of Viral Genomes University of Oxford, Department of Zoology Evolutionary Biology Group Department of Zoology University of Oxford South Parks Road Oxford OX1 3PS, U.K. Fax: +44 1865 271249 Evolutionary Analysis of Viral

More information

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

CHAPTERS 24-25: Evidence for Evolution and Phylogeny CHAPTERS 24-25: Evidence for Evolution and Phylogeny 1. For each of the following, indicate how it is used as evidence of evolution by natural selection or shown as an evolutionary trend: a. Paleontology

More information

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT 3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode

More information

Basic Local Alignment Search Tool

Basic Local Alignment Search Tool Basic Local Alignment Search Tool Alignments used to uncover homologies between sequences combined with phylogenetic studies o can determine orthologous and paralogous relationships Local Alignment uses

More information

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

More information

Unifying theories of molecular, community and network evolution 1

Unifying theories of molecular, community and network evolution 1 Carlos J. Melián National Center for Ecological Analysis and Synthesis, University of California, Santa Barbara Microsoft Research Ltd, Cambridge, UK. Unifying theories of molecular, community and network

More information

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison 10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:

More information

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree) I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

More information

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Introduction to Bioinformatics online course : IBT Jonathan Kayondo Learning Objectives Understand

More information

Supporting Information

Supporting Information Supporting Information Hammer et al. 10.1073/pnas.1109300108 SI Materials and Methods Two-Population Model. Estimating demographic parameters. For each pair of sub-saharan African populations we consider

More information

Bioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre

Bioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre Bioinformatics Scoring Matrices David Gilbert Bioinformatics Research Centre www.brc.dcs.gla.ac.uk Department of Computing Science, University of Glasgow Learning Objectives To explain the requirement

More information

Natural selection on the molecular level

Natural selection on the molecular level Natural selection on the molecular level Fundamentals of molecular evolution How DNA and protein sequences evolve? Genetic variability in evolution } Mutations } forming novel alleles } Inversions } change

More information

Intraspecific gene genealogies: trees grafting into networks

Intraspecific gene genealogies: trees grafting into networks Intraspecific gene genealogies: trees grafting into networks by David Posada & Keith A. Crandall Kessy Abarenkov Tartu, 2004 Article describes: Population genetics principles Intraspecific genetic variation

More information

Cladistics and Bioinformatics Questions 2013

Cladistics and Bioinformatics Questions 2013 AP Biology Name Cladistics and Bioinformatics Questions 2013 1. The following table shows the percentage similarity in sequences of nucleotides from a homologous gene derived from five different species

More information

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution Massachusetts Institute of Technology 6.877 Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution 1. Rates of amino acid replacement The initial motivation for the neutral

More information

Multiple Whole Genome Alignment

Multiple Whole Genome Alignment Multiple Whole Genome Alignment BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 206 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by

More information

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate.

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate. OEB 242 Exam Practice Problems Answer Key Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate. First, recall

More information

Genomes and Their Evolution

Genomes and Their Evolution Chapter 21 Genomes and Their Evolution PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from

More information

EVOLUTIONARY DISTANCE MODEL BASED ON DIFFERENTIAL EQUATION AND MARKOV PROCESS

EVOLUTIONARY DISTANCE MODEL BASED ON DIFFERENTIAL EQUATION AND MARKOV PROCESS August 0 Vol 4 No 005-0 JATIT & LLS All rights reserved ISSN: 99-8645 wwwjatitorg E-ISSN: 87-95 EVOLUTIONAY DISTANCE MODEL BASED ON DIFFEENTIAL EUATION AND MAKOV OCESS XIAOFENG WANG College of Mathematical

More information

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies 1 What is phylogeny? Essay written for the course in Markov Chains 2004 Torbjörn Karfunkel Phylogeny is the evolutionary development

More information

Supporting information for Demographic history and rare allele sharing among human populations.

Supporting information for Demographic history and rare allele sharing among human populations. Supporting information for Demographic history and rare allele sharing among human populations. Simon Gravel, Brenna M. Henn, Ryan N. Gutenkunst, mit R. Indap, Gabor T. Marth, ndrew G. Clark, The 1 Genomes

More information

Supporting Information

Supporting Information Supporting Information Weghorn and Lässig 10.1073/pnas.1210887110 SI Text Null Distributions of Nucleosome Affinity and of Regulatory Site Content. Our inference of selection is based on a comparison of

More information

There are 3 parts to this exam. Use your time efficiently and be sure to put your name on the top of each page.

There are 3 parts to this exam. Use your time efficiently and be sure to put your name on the top of each page. EVOLUTIONARY BIOLOGY EXAM #1 Fall 2017 There are 3 parts to this exam. Use your time efficiently and be sure to put your name on the top of each page. Part I. True (T) or False (F) (2 points each). Circle

More information

Proceedings of the SMBE Tri-National Young Investigators Workshop 2005

Proceedings of the SMBE Tri-National Young Investigators Workshop 2005 Proceedings of the SMBE Tri-National Young Investigators Workshop 25 Control of the False Discovery Rate Applied to the Detection of Positively Selected Amino Acid Sites Stéphane Guindon,* Mik Black,*à

More information

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches Int. J. Bioinformatics Research and Applications, Vol. x, No. x, xxxx Phylogenies Scores for Exhaustive Maximum Likelihood and s Searches Hyrum D. Carroll, Perry G. Ridge, Mark J. Clement, Quinn O. Snell

More information

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics.

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics. Evolutionary Genetics (for Encyclopedia of Biodiversity) Sergey Gavrilets Departments of Ecology and Evolutionary Biology and Mathematics, University of Tennessee, Knoxville, TN 37996-6 USA Evolutionary

More information

Non-independence in Statistical Tests for Discrete Cross-species Data

Non-independence in Statistical Tests for Discrete Cross-species Data J. theor. Biol. (1997) 188, 507514 Non-independence in Statistical Tests for Discrete Cross-species Data ALAN GRAFEN* AND MARK RIDLEY * St. John s College, Oxford OX1 3JP, and the Department of Zoology,

More information

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS.

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS. !! www.clutchprep.com CONCEPT: OVERVIEW OF EVOLUTION Evolution is a process through which variation in individuals makes it more likely for them to survive and reproduce There are principles to the theory

More information

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013 Sequence Alignments Dynamic programming approaches, scoring, and significance Lucy Skrabanek ICB, WMC January 31, 213 Sequence alignment Compare two (or more) sequences to: Find regions of conservation

More information

Febuary 1 st, 2010 Bioe 109 Winter 2010 Lecture 11 Molecular evolution. Classical vs. balanced views of genome structure

Febuary 1 st, 2010 Bioe 109 Winter 2010 Lecture 11 Molecular evolution. Classical vs. balanced views of genome structure Febuary 1 st, 2010 Bioe 109 Winter 2010 Lecture 11 Molecular evolution Classical vs. balanced views of genome structure - the proposal of the neutral theory by Kimura in 1968 led to the so-called neutralist-selectionist

More information

Group activities: Making animal model of human behaviors e.g. Wine preference model in mice

Group activities: Making animal model of human behaviors e.g. Wine preference model in mice Lecture schedule 3/30 Natural selection of genes and behaviors 4/01 Mouse genetic approaches to behavior 4/06 Gene-knockout and Transgenic technology 4/08 Experimental methods for measuring behaviors 4/13

More information

D. Incorrect! That is what a phylogenetic tree intends to depict.

D. Incorrect! That is what a phylogenetic tree intends to depict. Genetics - Problem Drill 24: Evolutionary Genetics No. 1 of 10 1. A phylogenetic tree gives all of the following information except for. (A) DNA sequence homology among species. (B) Protein sequence similarity

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of omputer Science San José State University San José, alifornia, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology

More information

Classical Selection, Balancing Selection, and Neutral Mutations

Classical Selection, Balancing Selection, and Neutral Mutations Classical Selection, Balancing Selection, and Neutral Mutations Classical Selection Perspective of the Fate of Mutations All mutations are EITHER beneficial or deleterious o Beneficial mutations are selected

More information

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline

More information

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species Schedule Bioinformatics and Computational Biology: History and Biological Background (JH) 0.0 he Parsimony criterion GKN.0 Stochastic Models of Sequence Evolution GKN 7.0 he Likelihood criterion GKN 0.0

More information

KaKs Calculator: Calculating Ka and Ks Through Model Selection and Model Averaging

KaKs Calculator: Calculating Ka and Ks Through Model Selection and Model Averaging Method KaKs Calculator: Calculating Ka and Ks Through Model Selection and Model Averaging Zhang Zhang 1,2,3#, Jun Li 2#, Xiao-Qian Zhao 2,3, Jun Wang 1,2,4, Gane Ka-Shu Wong 2,4,5, and Jun Yu 1,2,4 * 1

More information

7.36/7.91 recitation CB Lecture #4

7.36/7.91 recitation CB Lecture #4 7.36/7.91 recitation 2-19-2014 CB Lecture #4 1 Announcements / Reminders Homework: - PS#1 due Feb. 20th at noon. - Late policy: ½ credit if received within 24 hrs of due date, otherwise no credit - Answer

More information

Estimating selection on non-synonymous mutations. Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh,

Estimating selection on non-synonymous mutations. Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Genetics: Published Articles Ahead of Print, published on November 19, 2005 as 10.1534/genetics.105.047217 Estimating selection on non-synonymous mutations Laurence Loewe 1, Brian Charlesworth, Carolina

More information

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Homology Modeling. Roberto Lins EPFL - summer semester 2005 Homology Modeling Roberto Lins EPFL - summer semester 2005 Disclaimer: course material is mainly taken from: P.E. Bourne & H Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton,

More information

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics:

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics: Homework Assignment, Evolutionary Systems Biology, Spring 2009. Homework Part I: Phylogenetics: Introduction. The objective of this assignment is to understand the basics of phylogenetic relationships

More information

Protocol S1. Replicate Evolution Experiment

Protocol S1. Replicate Evolution Experiment Protocol S Replicate Evolution Experiment 30 lines were initiated from the same ancestral stock (BMN, BMN, BM4N) and were evolved for 58 asexual generations using the same batch culture evolution methodology

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S3 (box) Methods Methods Genome weighting The currently available collection of archaeal and bacterial genomes has a highly biased distribution of isolates across taxa. For example,

More information

Supplementary text for the section Interactions conserved across species: can one select the conserved interactions?

Supplementary text for the section Interactions conserved across species: can one select the conserved interactions? 1 Supporting Information: What Evidence is There for the Homology of Protein-Protein Interactions? Anna C. F. Lewis, Nick S. Jones, Mason A. Porter, Charlotte M. Deane Supplementary text for the section

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

Concepts and Methods in Molecular Divergence Time Estimation

Concepts and Methods in Molecular Divergence Time Estimation Concepts and Methods in Molecular Divergence Time Estimation 26 November 2012 Prashant P. Sharma American Museum of Natural History Overview 1. Why do we date trees? 2. The molecular clock 3. Local clocks

More information

EECS730: Introduction to Bioinformatics

EECS730: Introduction to Bioinformatics EECS730: Introduction to Bioinformatics Lecture 07: profile Hidden Markov Model http://bibiserv.techfak.uni-bielefeld.de/sadr2/databasesearch/hmmer/profilehmm.gif Slides adapted from Dr. Shaojie Zhang

More information

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline Phylogenetics Todd Vision iology 522 March 26, 2007 pplications of phylogenetics Studying organismal or biogeographic history Systematics ating events in the fossil record onservation biology Studying

More information

MiGA: The Microbial Genome Atlas

MiGA: The Microbial Genome Atlas December 12 th 2017 MiGA: The Microbial Genome Atlas Jim Cole Center for Microbial Ecology Dept. of Plant, Soil & Microbial Sciences Michigan State University East Lansing, Michigan U.S.A. Where I m From

More information

Divergence Pattern of Duplicate Genes in Protein-Protein Interactions Follows the Power Law

Divergence Pattern of Duplicate Genes in Protein-Protein Interactions Follows the Power Law Divergence Pattern of Duplicate Genes in Protein-Protein Interactions Follows the Power Law Ze Zhang,* Z. W. Luo,* Hirohisa Kishino,à and Mike J. Kearsey *School of Biosciences, University of Birmingham,

More information

Motivating the need for optimal sequence alignments...

Motivating the need for optimal sequence alignments... 1 Motivating the need for optimal sequence alignments... 2 3 Note that this actually combines two objectives of optimal sequence alignments: (i) use the score of the alignment o infer homology; (ii) use

More information