Jungreis et al. Supplementary Materials

Size: px
Start display at page:

Download "Jungreis et al. Supplementary Materials"

Transcription

1 Supplemental Text S1: What we mean by functional To understand what we mean when we refer to functional readthrough candidates, it is helpful to review the nature of the signal detected by PhyloSF. PhyloSF gets most of its information from substitutions, so regions with very high nucleotide-level conservation have very little signal and tend to get scores near 0, intermediate between the positive scores of most coding regions and negative ones of most noncoding regions. For a region to get a high PhyloSF score it needs to have many substitutions, and these substitutions need to be ones that make up a higher proportion of the substitutions typical of coding regions than of those typical of noncoding regions, such as synonymous substitutions and substitutions between amino acids with similar biochemical properties. nlike high amino acid conservation, which could be a side effect of selective constraint on the DN sequence unrelated to translation, a high PhyloSF score can only occur when selection has allowed some variation, but has preferred variation that preserves the biochemical properties of a translation product. Furthermore, selective pressure favoring the act of translation rather than its product, say because of some regulatory effect of translation, would have a different signature. Thus, PhyloSF detects the signature of a fitness advantage of the translation product itself. Functional Translational Readthrough has been defined as translational readthrough that leads to functions different from the parent protein (Schueren and Thoms 2016). While PhyloSF detects the evolutionary signature of a peptide extension that serves some function, it alone does not show that this function is different from that of the parent protein. However, the fact that leaky stop codon contexts have been conserved for most of our readthrough candidates provides evidence that the extended protein is functionally different from the parent; indeed, if the two were functionally identical then why would there be selective pressure to preserve readthrough and to keep the extension functional? While these evolutionary signatures provide evidence for Functional Translational Readthrough in evolutionary time, only experimental validation can determine if readthrough of any particular transcript has continued to the present (though SNV data from the nopheles gambiae 1000 genomes project provide evidence for this in the aggregate). Furthermore, while we have evidence that the extended proteins do have functions distinct from the parent, we do not know what their functions are. For these reasons we have referred to them as readthrough candidates. Supplement Page 1

2 Supplemental Text S2: Estimate for number of orthologous readthrough pairs due to convergent evolution We estimated that the number of pairs of orthologous readthrough stop codons descended from a nonreadthrough ancestral stop codon that have become readthrough independently in the two clades through convergent evolution as roughly the product of the number, N, of non-readthrough ancestral stop codons that have descendent stop codons in the two clades for which we can detect orthology, times the probability, P, that in both clades a non-readthrough ancestral stop codon will have developed readthrough detectable by our process. We estimate N to be around 7188 by taking the number of stop codons in. gambiae in genes that have a Diptera-level OrthoDB ortholog in D. melanogaster, 8562, times the fraction of. gambiae readthrough stop codons having an orthologous gene but not found by orthology, that also have an end-orthologous transcript in D. melanogaster, 0.853, minus the number of orthologous readthrough pairs, 115, an estimate for the number to be excluded due to being readthrough in the ancestor. This somewhat overestimates N, because it estimates the number of pairs that are end-orthologous rather than the number satisfying the stricter condition of being stop-orthologous, and because when we subtract orthologous readthrough pairs we ignore the fact that some of our ambiguous readthrough could actually be readthrough. lthough it subtracts an estimate for the number of ancestral readthrough stop codons using orthologous readthrough pairs, which treats convergent evolution cases as ancestral, the result is little changed if we do not make this subtraction. To detect readthrough in both species, we need to detect it in one or the other species without using orthology, but can then use orthology to detect it in the other. We can estimate the probability that a stop codon that was not readthrough in the ancestor has developed readthrough in one species, and that we can detect this readthrough without orthology, by counting the number of non-ancient readthrough candidates in that species (all of which were found without using orthology) and dividing by the total number of stop codons in that species minus the number of ancient readthrough candidates. We can estimate the probability that the orthologous stop codon is readthrough in the other species and that we will be able to detect it, possibly using orthology, using our estimate for the number of readthrough stop codons in each species for which the PhyloSF-Ψ Emp score of the second ORF is positive, since that is roughly the criteria we used to determine if a stop codon is readthrough once we know it is stop orthologous to a readthrough stop codon in the other species, minus the number of ancient readthrough regions having positive PhyloSF-Ψ Emp, divided by the total number of stop codons in that species minus the number of ancient readthrough candidates. arrying out these estimates we find that the probability that a non-readthrough ancestral stop codon will have become readthrough in both clades is approximately However, while this calculation accounts for the fact that detection of readthrough in one species helps us detect readthrough in the other, it does not take into account that some genes, such as longer ones, could have a greater tendency to become readthrough, in both species, than others. ccounting for the distributions of first ORF lengths of readthrough candidates and of all other genes increases the probability that a non-readthrough ancestral stop codon will have developed readthrough in both clades by about 30%, which gives us an estimate for P of Supplement Page 2

3 ombining these estimates of N and P, we estimate that the number of pairs of orthologous readthrough stop codons that have become readthrough independently from a non-readthrough ancestral stop codon through convergent evolution in the two clades is around 7188 * = caveat is that there could be unknown confounders other than first ORF length that make an ancestral gene more likely to develop readthrough, though such confounders would have be to considerably more influential than first ORF length to change the result much in comparison to the total number of ancient readthrough regions. Supplement Page 3

4 Supplemental Text S3: Z curve test provides an underestimate for the number of readthrough regions There are several reasons that the results of our Z curve test is likely to be an underestimate for the actual number of functional readthrough regions. First, the test only counts readthrough regions at least 10 codons long and that have Z curve score greater than 0. s discussed in the main text, using our PhyloSF test we found evidence that more than half of functional readthrough regions are less than 10 codons long. Furthermore, only about half of the readthrough regions of our readthrough candidates have Z curve score > 0 (Supplemental Figure S5,D). Since the readthrough regions of our readthrough candidates are the ones having the highest coding potential as measured by PhyloSF, we would expect them also to have higher Z curve scores than most of the other readthrough regions, so probably fewer than half of functional readthrough regions have Z curve score > 0. ombining these results, we find that the estimate provided by our test is probably less than 25% of the actual number of annotated stop codons that have functional readthrough. Second, our test only looks at annotated stop codons, which could be considerably fewer than the number of actual stop codons in species having incomplete annotations. Third, there are several factors that could inflate the counts of second ORFs having positive Z curve scores in frames 1 and 2 but not frame 0, which would decrease any frame-0 excess and thus our estimate. When an annotated stop codon lies within a coding region in another frame, the second ORF of that stop codon in the other frame is likely to have a high Z curve score, which would inflate the count in frame 1 or 2; we exclude any stop codons that are within annotated coding regions in another frame, but stop codons within unannotated coding regions in other frames would bias our readthrough estimate downward. lso, while our estimate corrects for recent nonsense substitutions and sequencing errors that read a sense codon as a stop codon, which would inflate the count in frame 0, we do not correct for recent indels and sequencing indels, which would inflate the counts in frames 1 and 2 and bias our readthrough estimate downward. Supplement Page 4

5 T- T- T- T-T T-T T- T-T T- T- T- T- T- Readthrough,.gam. Readthrough, D.mel. Non-readthrough,.gam. Non-readthrough, D.mel. 20% 10% 0% 10% 20% 30% 40% Fraction of stop codons with specified context Supplemental Figure S1. Stop codon contexts in. gambiae and D. melanogaster. sage of stop codon context (stop codon and subsequent base) sorted in order of decreasing frequency among the 12,058 non-readthrough, nonmitochondrial, annotated. gambiae stop codons (dark blue), with less frequent stop codons (top, e.g., T-) experimentally associated with translational leakage in other species, and most frequent (bottom, e.g., T-) associated with efficient termination. Error bars show standard error of mean. ontext frequencies for. gambiae readthrough candidates (red) are almost the opposite of those of non-readthrough transcripts, suggesting a preference for leaky context, with 36% using T- and almost none using T-. Frequencies shown are for our unbiased by stop codon subset of readthrough candidates and exclude double-stop readthrough candidates. The frequencies for both readthrough and nonreadthrough stop codons are similar to those for D. melanogaster (pink and light blue, respectively). Supplement Page 5

6 Protein Length (mino cids) gam. RT.gam. NonRT D.mel. RT D.mel. NonRT B Number of DS exons gam. RT.gam. NonRT D.mel. RT D.mel. NonRT Mean length of an intron (NT) gam. RT.gam. NonRT D.mel. RT D.mel. NonRT D Nonreadthrough Ortholog Number of DS exons of RT NonRT pairs.gam. RT D.mel. RT E Nonreadthrough Ortholog Mean length of an intron of RT NonRT pairs.gam. RT D.mel. RT Readthrough andidate 5e+01 5e+02 5e+03 5e+04 Readthrough andidate Number of DS exons RT paired with NonRT NonRT paired with RT ll NonRT that have an ortholog Mean length of an intron (NT) RT paired with NonRT NonRT paired with RT ll NonRT that have an ortholog Supplemental Figure S2. Readthrough genes are longer than non-readthrough. (-) Box plots for lengths of unextended proteins (), number of exons in the coding portion of the transcript (B), and average length of an intron in each transcript having at least one intron (), for. gambiae readthrough candidates (red),. ambia non-readthrough transcripts (green), D. melanogaster readthrough candidates (orange), and D. melanogaster non-readthrough transcripts (gray). By all three measures, readthrough candidates are significantly longer than non-readthrough transcripts, in both species. (D,E) pper panels are scatter plots showing number of coding exons (D) and average intron length (E) for each. gambiae readthrough candidate orthologous to a non-readthrough D. melanogaster transcript versus the corresponding value for its non-readthrough ortholog (black dots) and similarly for D. melanogaster readthrough candidates orthologous to nonreadthrough. gambiae transcripts (red dots). Lower panels are box plots showing number of coding exons (D) and average intron length (E) of those readthrough candidates in either species that are orthologous to a non-readthrough transcript in the other (red), the corresponding lengths of the paired non-readthrough transcripts (green), and the lengths of all non-readthrough transcripts in genes that have orthologs in the other species (orange). nlike the similar comparison for protein lengths shown in Figure 3E, there is no clear relationship among the number of exons or average intron length of readthrough candidates, their non-readthrough orthologs, and other non-readthrough transcripts. Supplement Page 6

7 P R FBtr These parts Homologous appear stem loops similar P R FBtr P R FBtr P R FBtr Supplemental Figure S3. RN structures for readthrough-readthrough pairs. onserved, stable, RN structures predicted by RNz in the 100-nt regions downstream from (and including) candidate readthrough stop codons for the four stoporthologous pairs of readthrough candidates in. gambiae (left) and D. melanogaster (right) for which a structure was predicted in both. There is clear homology between stem loops near the 5 ends of the second pair of structures (red ovals). Other than that, we see no obvious similarity between the predicted structures in the two species, offering the possibility that it is the presence of a stable structure that is functional rather than particular features of that structure. Supplement Page 7

8 Readthrough structures and stop codons 80.0% 70.0% 60.0% 50.0% 40.0% 30.0% T T T 20.0% 10.0% 0.0%.gam. Structure.gam. No Structure D.mel. Structure D.mel. No Structure Supplemental Figure S4. Stop codon usage in candidates having predicted structures. Frequencies of usage of T (blue), T (red), and T (green) first stop codons among. gambiae readthrough candidates having predicted structures (first group),. gambiae readthrough candidates lacking predicted structures (second group), D. melanogaster readthrough candidates having predicted structures (third group), and D. melanogaster readthrough candidates lacking predicted structures (fourth group), with error bars showing standard error of mean. lthough most readthrough candidates in both species use T, readthrough candidates having structures in D. melanogaster show a preference for T, prompting speculation that a leaky stop codon context might not be necessary for readthrough in the presence of an RN structure. However, in. gambiae there is only a small and not-statistically significant depletion for T stop codons among readthrough candidates having predicted structures. Supplement Page 8

9 .gam. Readthrough andidates B D.mel. Readthrough andidates Readthrough Regions 1st ORF 3' ends 3rd ORFs Readthrough Regions 1st ORF 3' ends 3rd ORFs PhyloSF Per odon PhyloSF Per odon.gam. Readthrough andidates D D.mel. Readthrough andidates 2nd ORFs 1st ORF 3' ends 3rd ORFs 2nd ORFs 1st ORF 3' ends 3rd ORFs Z urve score Z urve score E RT RT pair 2nd ORF codons D.mel gam. Supplemental Figure S5. Readthrough regions under less constraint than other coding regions. (-D) Distribution of coding potential as measured by PhyloSF score per codon ( and B) or Z curve score ( and D) of readthrough regions (red), same-sized coding regions at the 3 end of the first ORF of readthrough candidates (black), and noncoding third ORFs of readthrough candidates (cyan), for. gambiae ( and ) and D. melanogaster (B and D). Double readthrough candidates and readthrough candidates whose third ORF is 0-length or contains degenerate bases have been excluded. In all four cases, coding potential of readthrough regions tends to be intermediate between that of noncoding regions and other coding regions, suggesting that they have been under some purifying selection at the amino acid level within each clade, but less so than other coding regions. (E) Log-scale scatter plot showing the lengths in codons of the readthrough regions in. gambiae and D. melanogaster for each pair of stop-orthologous readthrough candidates. lthough for many pairs the lengths are similar in the two species (circles along diagonal line), there are also many pairs for which the lengths are quite different, suggesting that in many cases readthrough regions have remained functional despite large changes in length. Supplement Page 9

10 gamp3_aa * Q L S T P S K S K L * gamp3 T TT T T T T T T T gams1 T TT T T T T T T T gamm1 T TT T T T T T T T merm1 T TT T T T T T T T arad1 T TT T T T T T T T quas1 T TT T T T T T T T mel1 T TT T T T T T T T chr1 T TT T T T T T T T T epie1 T TT T T T TT T T T minm1 T TT T T T TT T T T T cul1 T TT T T T TT T T T T funf1 T TT T T T TT T T T T stes1 T TT T T T TT T T T T stei2 T TT T T T TT T T T T macm1 T TT T T T TT T T T T farf1 T TT T T T T T T T T dirw1 T TT T T T T T T T T sins1 T TT T T T T T T T T T atre1 T TT T T T T T T T T dar2 T TT T T T T T T T T T albs1 T TT T T T T T T T T T P B dmel_aa * Q L N P K S K L * dmel T TT T T T TT T dsim T TT T T T TT T dsec T TT T T T TT T dyak T TT T T T TT T dere T TT T T T TT T deug T TT T T T TT T dbia T TT T T T TT T dtak T TT T T T TT T dfic T TT T T T TT T dele T TT T T T TT T drho T TT T T T TT T dkik T TT T T T TT T dana T TT T T T TT T dbip T TT T T T TT T dper T TT T T T TT T dpse T TT T T T TT T dwil T TT T T T TT T dmoj T TT T T T T T dvir T TT T T T TT T dgri T TT T T T TT T 1969 dmel_aa * V Q H H P Y L Y Y P P E D D P Y Y N S R M * dmel T T T T TT T T TT T TT T--- T T T T dsim T T T T T TT T TT T TT T--- T T T T dsec T T T T T TT T TT T T TT T--- T T T T dyak T T T T T TT T TT T T TT T--- T T T T dere T T T T T TT T TT T T TT T--- T T T T deug T T T T T TT T TT T T T T T--- T T T T dbia T T T T T TT T TT T T T T--- T T T T dtak T T T T TT TT T TT T T T--- T T T T T dfic T T T T T T T TT T T T T--- T T T T dele T T T T T T T T T T T--- T T T T drho T T T T T T T TT T T T--- T T T T dkik T T T T T TT T T T T T--- T T T dana T T T T T TT T T T T TT--- T T T dbip T T T T T TT T TT T T TT--- T T dper T T T T T T TT T T --- T TT T T--- T T T dpse T T T T T T TT T T --- T TT T T--- T T T dwil T T T-- -T T TT T T TT --- T --- T TT TT T T T T dmoj T T T-- - TT TT T TT T --- TT T T TT T T T--- T T T dvir T T T-- - TT TTT T TT T --- TTT T T TT T T T--- T T Tsp86D Supplemental Figure S6. Peroxisomal targeting signals. lignments of readthrough regions of. gambiae gene P among 21 nopheles genomes (), its D. melanogaster ortholog, 1969, among 20 Drosophila genomes (B), and D. melanogaster gene Tsp86D (). predicted peroxisomal targeting signal in the final 12 amino acids of the extension of P (yellow highlighting) has been conserved across all 21 mosquitoes and 20 flies, despite the presence of several radical amino acid substitutions within the nopheles lineage (red highlighting) and two amino acid insertions or deletions between the two clades (red circles). The predicted peroxisomal targeting signal in Tsp86D is conserved as far as D. kikkawai but not in D. ananassae or beyond. Supplement Page 10

11 gamp3_aa * V E L F R T R L P S S F V V S L N P L T L R P W W W W H T S Q V Q * N R N R N R F F gamp3 T T T TT T T T TT T TT TT T TT T T TT T T T T T T T T T TTT T TT gams1 T T T TT T T T TT T TT TT T TTT T T TT T T T T T T T T T TTT T TT gamm1 T T T TT T T T TT T TT TT T TT T T TT T T T T T T T T T TTT T TT merm1 T T T TT T T T TT T TT TT T TTT T T TT T T T T T T T T T TTT T TT arad1 T T T TT T T T TT T TT TT T T T T TT T T T T T T T T T TTT T TT quas1 T T T TT T T T TT T TT TT T TT T T TT T T T T T T T T T TTT T TT mel1 T T T TT T T T TT T TT TT T TTT T T TT T T T T T T T T T T TTT T TT chr1 T T T TT T T T TT T TT TT T TTT T T TT T T T --- T T T T - -- T TT- T TT epie1 T T T TT T T T TT T TT TT T TTT T T TT T T T T --- T T T T TT TT T- T T minm1 T T T T T T T TT T TT TT T TTT T T TT TT T T T T T T T - T T TT T TT cul1 T T T T T T T TT T TT TT T TTT T T TT TT T T T T T - T TT TT T TT funf1 T T T TT T T T TT T TT TT T TTT T T TT TTT T T T T T T - TT --- T TTT T TT stes1 T T T TT T T T TT T T TT T TT T T TT TT T T T T T T T T TT T T TTT T TT stei2 T T T TT T T T TT T T TT T TT T T TT TT T T T T T T T T TT T T TTT T TT macm1 T T T TT T T T TT T TT TT T TT T T TT TT T T T T T T T T TT T T T T TT farf1 T T TT - -- TT T T TT TT T TT T TT T T T T TT T T TT T --- T T TT- T TT dirw1 T T T TT - -- TT TT T T TT T T TT- T TT sins1 T T TT - -- TT T TT TTT T TT T T TT T T -T T T T T - -- T T T T T atre1 T T TT - -- TT T T TTT T TT T T T T T T TT T T T - -- T T T T T T T T TT dar2 T T T TT - -- T T T TT T T - -- T -- T TT T TT T T T - -- TT T T albs1 T T T TT - -- TT T T TT T T - -- TT -- T T TT TT T T T - -- T T T T Supplemental Figure S7. Readthrough candidates identified using other features of coding regions. lignment including readthrough region of. gambiae readthrough candidate P R. lthough the PhyloSF+Stop score of this readthrough region, 10.9, is below our threshold of 17.0, it was included among our list of readthrough candidates because it exhibits other features characteristic of coding regions that are not accounted for by PhyloSF+Stop. In particular, the reading frame is preserved in all species despite the presence of many indels (gray) but degrades soon after the second stop codon (orange) and there are several synonymous substitutions in the second stop codon. (ll insertions relative to. gambiae before the second stop codon are frame preserving but are not shown in order to make the image more compact.) We included 40 second ORFs having PhyloSF+Stop between 5.0 and 17.0 in our list of. gambiae readthrough candidates based on a subjective assessment of these and other features characteristic of protein coding regions. Supplement Page 11

12 dmel_aa Y N S P P R F L * R F R V P D H H Y F S M P F * I N K I R Y N * N * dmel T T TT TTT TT T TTT T T T TT TTT T T TT T T T T T T T dsim T T TT TTT TT T TTT T T T TT TTT T T TT T T T T T T T dsec T T TT TTT TT T TTT T T T TT TTT T T TT T T T T T T T dyak T T TT TTT TT T TTT T T T T TT TTT T T TT T T T T T T T dere T T TT TTT TT T TTT T T T TT TTT T T TT T T- - T T T T T dana TTT T T T T TTT TT T T TT - T TT TT T T TT T- --- T TT T- T dpse TTT TT TTT TT T TT T T T TT TTT T T TT T - -- TT T T TT T dper TTT TT TTT TT T TT T T T TT TTT T T TT T - -- TT T T TT T dwil T T TTT TT T T T T T T TT T TT T T T TT T dvir TT T TT TT TTT TTT T T T T T TT T TT T TT T T T T T dmoj T TT T TTT TT T TT T T T TTT TT TTT T TTT T T T TT T TT T dgri TT T T T TTT TT T TT T T TT TT TTT T TT T T T T TT T TT T gamp3_aa Y H S P P R F L * K R L Q V F P D R S L F V M P F * T K K K K H E T T gamp3 TT T TT TT T T TT TT T T T T TT T T T TT T T gams1 TT T TT TT T T TT TT T T T T TT T T T TT T -- gamm1 TT T TT TT T T TT TT T T T T TT T T T TT T - T merm1 TT T TT TT T T TT TT T T T T TT T T T TT T --- T arad1 TT T TT TT T T TT TT T T T T TT T T T TT T T quas1 TT T TT TT T T T TT T T T T TT T T T TT T T mel1 TT T TT TT T T TT TT T T T T TT T T T TT T. T chr1 TT T TT TT T T TT TTT T T T T TT TT T T TT T T epie1 TT T TT TT T T TT TT T T T T TT T T T TT T T T minm1 TT T TT TT T T T TT TT T T T TT T T T TT T T cul1 TT T TT TT T T T TT TT T T T TT T T T TT T funf1 TT T TT TT T T TT TT T T T TT T T T TT T stes1 TT T TT TT T T TT TT T T T T TT T T T TT T -- - T stei2 TT T TT TT T T TT TT T T T T TT T T T TT T -- - T macm1 TT T TT TT T TT T TT T T T T TT T T T TT T T farf1 TT T TT TT T T TT TTT T T T T TT T T T TT T dirw1 TT T TT TT T T TT TTT T T T T TT T T T TT T sins1 TT TT TT TT T TT T TTT T T T T TT T T T TT T atre1 TT TT TT TT T TT T TTT T T T T TT T T T TT T dar2 TT T TT TT T T T TTT T T T T TT T T T TT T albs1 TT T TT TT T T T TTT T T T T TT T T T TT T Supplemental Figure S8. Readthrough identified using orthology. lignment of the readthrough region of D. melanogaster readthrough candidate FBtr (upper panel) identified using orthology to. gambiae readthrough candidate P R (lower panel), with many matching amino acids both within the readthrough region and before the first stop codon (yellow highlighting). FBtr had not been previously identified as readthrough, but satisfied our criteria for readthrough given orthology to a readthrough candidate in the other clade. In order to facilitate cross-clade comparisons, we identified 51 D. melanogaster and 21. gambiae readthrough candidates in this way. Supplement Page 12

13 enomes enome Browser Tools Mirrors Downloads My Data View Help bout s S enome Browser on D. melanogaster pr (BDP R5/dm3) move <<< << < > >> >>> zoom in 1.5x 3x 10x base zoom out 1.5x 3x 10x 10 B chr3r:16,505,520-16,535,237 29,718 bp. enter position, gene symbol or search terms ncient Paired ncient npaired Double-stop RT Orthologs Paralogs Orthologs Orthologs 109.gam.... P R P R P R D.mel. FBtr dmel_aa Y V * Y E T P K S T P L T P D P E N E S Y L E F S T * T F P dmel T T T TT T T TT T T T T T TT TTT T TT T dsim T T T TT T T TT T T T T TT TTT T TT T dsec T T T TT T T TT T T T T TT TTT T TT T dyak T T T TT T T TT T T T TT TTT T T T dere T T T TT T TT T T T T TT TTT T T T dana T T T TT T T T T TT TTT T dpse TT T T TT T T T T TT T TTT T T dper TT T T TT T T T T TT T TTT T T dwil TT T T TT T T T T T T T T T T T T TT T TTT T T T dvir TT T T TT -- - T --- T T T T TT TTT T T dmoj T T T TT -- - T --- T T T T TT TTT T T T dgri TT T T TT -- - T --- T T T T TT TTT T T gamp3_aa Y V * * R V T R V R I V H H H P V R I T R V I R F * E E S P S S S gamp3 T T T T T T T T T T TT T TT TTT T T T T T gams1 T T T T T T T T T T TT T TT TTT T T gamm1 T T T T T T T T T T TT T TT TTT T T merm1 T T T T T T T T T T TT T TT TTT T T arad1 T T T T T T T T T T TT T TT TTT T T quas1 T T T T T T T T T T TT T TT TTT T T mel1 T T T T T T T T T T TT T TT TTT T T chr1 T T T T T T T T - -- T T TT T TT TTT T epie1 T T T T T T T T T T TT T TT TTT T T minm1 T T T T T T T TT - -- T T T TT T TT TTT T cul1 T T T T T T T TT - -- T T T TT T TT TTT T funf1 T T T T T T T TT - -- T T T TT T TT TTT T stes1 T T T T T T T T T - -- T T T TT T TT TTT T stei2 T T T T T T T T T - -- T T T TT T TT TTT T macm1 T T T T T T T T T - -- T T TT T TT TTT T farf1 T T T T T T T T - --T T T T T TT TTT T dirw1 T T T T T T T T - --T T T TT T TT TTT T sins1 T T T T T T T T - --T T T TT T TT TTT T atre1 T T T T T T T T - --T T T TT T TT TTT T dar2 T T T T T T T TT T T T T TT TTT T T albs1 T T T T T T T TT T T TT T TT TTT T T P R P R P RB P RB P R P R P R FBtr FBtr FBtr FBtr FBtr FBtr FBtr D dmel_aa D N D L * * Q E E L E P L R N S R T I I S L I D L * V Q * T L R K ancestor T TT T T T T T T T T TT T T T T TT T T T T dmel TT T T T T T T T T T TT TT T T T T T T T T dsim TT T T T T T T T T T TT TT T T TT T T T T T dsec TT T T T T T T T T T TT TT T T T T T T T T dyak TT T T TT T T T T T T TT TT T T T T T T T T dere TT T T T T T T T T T TT TT T T T T T T T T T dana T T T T T T T T T T T T TT T T T T T T --- TT -- dpse TT T T T TT- -- TT T T T T T TT T T T T T T dper TT T T T TT- -- TT T T T T T TT T T T T T T dwil T TT T T T T T T T T T T T TT T T T T TT T T TT T dvir T T TT T T T T T T T TT T T T T TT T T T T T dmoj T TT T T T T T T T TT T T T T T dgri T TT T T T T T T T TT T T T T TT T. gamp3_aa F D L * * P S T E S V Y L R K P D L E R F L I H L * R Q E D L ancestor TT T TT T T T T T T T T TT TT T T T T T gamp3 TTT TT TT T T T T T TT T T TT TT T T T T T gams1 TTT TT TT T T T T T TT T T TT TT T T T T T gamm1 TTT TT TT T T T T T TT T T TT TT T T T T T merm1 TTT TT TT T T T T T T TT T T TT TT T T T T T arad1 TTT TT TT T T T T T TT T T TT TT T T T T T quas1 TTT TT TT T T T T T TT T T TT TT T T T T T mel1 TTT TT TT T T T T T TT T T TT TT T T T T T chr1 TT T TT TT T T T T T T TT T T TT TT TT T T T T epie1 TT TT TT T T T T T T T TT T T TT TT T T T T T minm1 TTT TT TT T T T T T T TT TT T TTT TT T T T T T cul1 TTT TT TT T T T T T T T TT T TTT TT T T T T T funf1 TT TT TT T T T T T T TT T T TTT TT T T T T T stes1 TT TT TT T T T T T T TT T T TTT TT T T T T T stei2 TT TT TT T T T T T T TT T T TTT TT T T T T T macm1 TTT T TT TT T T T T T T TT T T T TTT TT T T T T T farf1 TT T TT T T T T T T T T TT T T T T T dirw1 TT T TT T T T T T T T T TT T T T T T sins1 TT T TT T T T T T T T T TT TT T T T T T T atre1 TT T TT T T T T T T T T T TT T T T T T T T T dar2 TTT T TT T T T T T T T T TT TT T T T T albs1 TT T TT T T T T T T T T TT TT T T T T Supplement Page 13

14 Supplemental Figure S9. ncient readthrough pairing. () Pairing of ancient readthrough candidates in. gambiae and D. melanogaster. We defined a readthrough candidate to be ancient if it is stop-orthologous to a readthrough candidate in the other species at the Diptera level and is not a double-stop readthrough candidate. In most cases (109) there is a one-toone pairing of orthologous ancient readthrough candidates (green boxes). However, in two cases there are two paralogous. gambiae readthrough candidates orthologous to the same D. melanogaster candidate, so we excluded one of the paralogs (white boxes) from analyses that required a one-to-one pairing. There is one case of four homologous readthrough candidates, however we were able to determine that these split into two orthologous pairs (panel B). There are three D. melanogaster readthrough candidates orthologous to double-stop readthrough candidates in. gambiae (blue rectangles), so these were excluded from analyses that required a one-to-one pairing of non-double-stop readthrough candidates (panel ). Finally, there is one pair of orthologous double-stop readthrough candidates, which were excluded from most of our analyses (panel D). (B) Four-way homologous readthrough candidates. Two. gambiae readthrough candidates and two D. melanogaster readthrough candidates, all homologous at the Diptera level, displayed using the S genome browser. In each species, the first ORFs of the two readthrough candidate transcripts are four-exon alternative splice variants of a single gene, with the first three exons shared. The downstream fourth exons in the two species are more similar to each other than either is to the alternative exons, and the same is true of the upstream fourth exons, suggesting that these two genes are descended from a gene in the common ancestor having a similar configuration, and that the alternative final exons were formed by an earlier duplication of a final exon containing a readthrough stop codon. () Double-stop readthrough orthologous to single-stop readthrough. lignments including the readthrough regions of D. melanogaster readthrough candidate FBtr and. gambiae double-stop readthrough candidate P RB. It is possible that the second T stop codon in the double first stop codon of P RB is related by a single nucleotide substitution to the TT tyrosine codon immediately 3 of the first stop codon in FBtr We note that although there is very little cross-clade similarity between the readthrough regions at the amino acid level, within each of the two clades there are several alignment columns immediately after the first stop codon and near the end of the readthrough region that have no synonymous substitutions, suggesting that there might be some overlapping constraint at the nucleotide level in addition to any constraint on the amino acid sequences. (D) Orthologous double-stop readthrough candidates. lignments including the readthrough regions of orthologous double-stop readthrough candidates FBtr and P R. The common double-t stop codon suggests that these might be descended from a double-stop readthrough transcript in the common ancestor. Supplement Page 14

15 gamp3_aa D S L Y K R * L L R V R S L F S L * * Q R E L * V L W R gamp3 T T T T T TT T T T TT T T T T --- T T TT TT T gams1 T T T T T TT T T T TT T T T T --- T T T TT T T gamm1 T T T T T TT T T T TT T T T T --- T T T TT T T merm1 T T T T T TT T T T TT T T T T --- T T T TT T T arad1 T T T T T TT T T T TT T T T T --- T T TT TT T quas1 T T T T T TT T T T TT T T T T --- T T T TT T T mel1 T T T T T TT T T T TT T T T T --- T T T TT T T chr1 T T T T T TT T T T TT T T T T --- T T T T TT T T T epie1 T T T T T TT T T T TT T T T --- T T T minm1 T T T T T TT T T T TT T T T T --- T T T cul1 T T T T T TT T T T T TT T T T T --- T T T funf1 T T T T T TT T TT T T TT T T T T --- T T T stes1 T T T T T TT T T T TT T T T T --- T T T T T T stei2 T T T T T TT T T T TT T T T T --- T T T T T T macm1 T T T T T TT T T T TT T T T T --- T T T T. T farf1 T T T T T T TT T T T TT T T T T T T dirw1 T T T T T TT T T T T TT T T T T --- T T T sins1 T T T T T TT T T T TTT T T T T T T T atre1 T T T T T T TT TT T T TTT T T T T T T T dar2 T T T T T T T T T T T TT T T T T T T albs1 T T T T T T T T T T T TT T T T T T T B gamp3_aa I D L L * K T K * * Q L L K R R F * N D T H R R gamp3 T T TT TT T T T T T T TTT T T T gams1 T T TT TT T T T T T T TTT T T T merm1 T T TT TT T T T T TT T TTT T T T arad1 T T TT TT T T T T T T TTT T T T quas1 T T TT TT T T T T TT T TTT T TT T mel1 T T TT TT T T T T TT T TTT T T T chr1 TT T T TT T TT T T TT T TTT T TTT TT T epie1 TT T TT TT T TT T T TT T T TTT T T-- T T minm1 TT T TT TT T TT T T TT T TTT T T- --T cul1 TT T TT TT T TT T T TT T TTT T T- --T funf1 TT T TT TT T TT T T TT T TTT T TT- T--T TT stes1 T T TT TT T TT T T TT T T TTT T TT TT T stei2 T T TT TT T TT T T TT T T TTT T TT TT T farf1 T T TT TT T TT T T TTT T T T dirw1 T T T TT TT T T T T TT T TT T T T-- --T sins1 T T T TT T T T T T T TT T T --- T-- T atre1 T T T TT T T T T T T TTT T T T -- T dar2 T T T T TT T TT T TT T T TT T -TT - -- T T albs1 T T T TT T TT T T T T TT T Supplemental Figure S10. Single-double readthrough. lignments showing triple readthrough for P R () and P R (B), in which a single readthrough is followed by a double-stop readthrough in some species. In P R, the second T stop codon in the. gambiae double stop codon is aligned to a likely-ancestral T ysteine codon in several species, though the presence of indels makes the history of the particular codon uncertain. In P R the second T stop codon in the double stop codon appears to be ancestral, and has become a TT Leucine codon in. darlingi. Supplement Page 15

16 Percent of SNVs 80% 70% 60% 50% 40% 30% 20% SNVs synonymous in frame Frame0 Frame1 Frame2 10% 0% Before 1st Stop In RT Region <er 2nd Stop B umulative Distribution SNVs Before 1st Stop Non synonymous Synonymous umulative Distribution SNVs In RT Region Non synonymous Synonymous umulative Distribution SNVs fter 2nd Stop Non synonymous Synonymous Derived llele Frequency Derived llele Frequency Derived llele Frequency Supplemental Figure S11. Polymorphism evidence supports recent protein-coding selection. () Single nucleotide variants (SNVs) in. gambiae show a strong bias toward synonymous codon changes in readthrough regions (middle) and same-sized coding regions 5 of readthrough stop codons (left), but not in same-sized noncoding regions 3 of second stop codons of readthrough candidates (right), providing evidence that readthrough regions are under purifying selection at the amino acid level within the. gambiae population. For each type of region we show the fraction of SNVs that would be synonymous if translated in each of three frames, with frame 0 matching the translated frame of the coding region of the gene. Error bars show the Standard Error of the Mean (SEM). (B) umulative distributions of derived allele frequencies for SNVs that would be synonymous (red) or non-synonymous (black) if translated in the frame of the coding region of the gene, for the same three sets of regions. Derived allele frequencies are lower for non-synonymous SNVs than for synonymous ones, in both coding and readthrough regions, indicating that they are under greater purifying selection, whereas in noncoding regions there is no significant difference, providing further evidence that purifying selection at the amino acid level in readthrough regions has continued in the. gambiae population. Supplement Page 16

17 .gam. 5 codon regions B.gam. 10 codon regions PsiEmp coding Psi normal coding PsiEmp noncoding Psi normal noncoding PsiEmp coding Psi normal coding PsiEmp noncoding Psi normal noncoding Density PhyloSF scores corresponding to PhyloSF-Ψ EMP = 17 and PhyloSF-Ψ = 17. Density Normal approxima8on overes8mates density in right tail of noncoding distribu8on Raw PhyloSF Score Raw PhyloSF Score.gam. 20 codon regions D.gam. 40 codon regions PsiEmp coding Empirical coding Psi normal coding PsiEmp noncoding Empirical noncoding Psi normal noncoding PsiEmp coding Empirical coding Psi normal coding PsiEmp noncoding Empirical noncoding Psi normal noncoding Density Density Small sample size makes tail of empirical distribu8on unreliable for regions >> 10 codons Raw PhyloSF Score Raw PhyloSF Score Supplement Page 17

18 E D.mel. 5 codon regions F D.mel. 10 codon regions PsiEmp coding Psi normal coding PsiEmp noncoding Psi normal noncoding PsiEmp coding Psi normal coding PsiEmp noncoding Psi normal noncoding Density Density Raw PhyloSF Score Raw PhyloSF Score D.mel. 20 codon regions H D.mel. 40 codon regions PsiEmp coding Empirical coding Psi normal coding PsiEmp noncoding Empirical noncoding Psi normal noncoding PsiEmp coding Empirical coding Psi normal coding PsiEmp noncoding Empirical noncoding Psi normal noncoding Density Density Raw PhyloSF Score Raw PhyloSF Score Supplemental Figure S12. Empirical score distributions allow PhyloSF-Ψ Emp to achieve higher sensitivity than PhyloSF-Ψ while maintaining high specificity. oding (black) and noncoding (red) PhyloSF score distributions used to define PhyloSF-Ψ Emp for. gambiae (-D) and D. melanogaster (E-H), and normal approximations used to define PhyloSF-Ψ (magenta and orange dashed lines), for regions of lengths 5, 10, 20, and 40 codons. For lengths less than or equal to 10, the distributions used to define PhyloSF-Ψ Emp were calculated directly from training regions of that length, whereas for greater lengths PhyloSF-Ψ Emp was defined by scaling the distributions for regions of length 10. For lengths greater than 10 we show the actual distributions of coding (green) and noncoding (cyan) training regions of those lengths for reference, even though they were not used in calculating PhyloSF-Ψ Emp. The bump in the right tail of the cyan curve for. gambiae 40-codon regions (D) is presumably a result of sampling error due to the small number of noncoding training regions of that length; such sampling errors are the reason that we scaled the length-10 distribution rather than interpolating densities through scores of actual regions of greater lengths. PhyloSF scores for regions of each length for which PhyloSF-Ψ Emp and PhyloSF-Ψ are equal to our score threshold of 17 (solid and dashed vertical blue lines, respectively) are the scores that are approximately 50 times more likely to occur in coding regions than noncoding regions. Because the normal approximations overestimate the densities in the right tail of the noncoding score distributions, PhyloSF-Ψ overestimates the PhyloSF score needed to achieve this high specificity; by using the more accurate empirical densities, PhyloSF-Ψ Emp allows us to detect coding regions having lower PhyloSF score, achieving greater sensitivity while maintaining this high specificity. Supplement Page 18

19 Supplement Page 19 Supplemental Figure S13. Score distribution scale factors used for definition of PhyloSF-Ψ Emp. Mean PhyloSF scores of coding (black circles) and noncoding (red circles) training regions of each length used to define PhyloSF-Ψ Emp for. gambiae () and D. melanogaster (B) and intervals one standard deviation above and below (black and red vertical lines). For lengths greater than 10 codons, PhyloSF-Ψ Emp was defined by scaling the distributions for regions of length 10 codons using the means and standard deviations of the scores of training regions of each length up to 60 codons, and an extrapolation using linear regression on the means and the logs of the standard deviations of regions of length from 30 to 60 codons for regions greater than 60 codons (green and cyan curves) Region length (codons) Raw PhyloSF score PhyloSF PsiEmp scale factors for D.mel mean + stdev for coding training regions mean + stdev for noncoding training regions extrapolation for longer coding regions extrapolation for longer noncoding regions Region length (codons) Raw PhyloSF score PhyloSF PsiEmp scale factors for.gam mean + stdev for coding training regions mean + stdev for noncoding training regions extrapolation for longer coding regions extrapolation for longer noncoding regions B

C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

More information

Lecture 17. Comparative genomics I: Genome annotation using evolutionary signatures

Lecture 17. Comparative genomics I: Genome annotation using evolutionary signatures 6.047/6.878/HST.507 Computational Biology: Genomes, Networks, Evolution Lecture 17 Comparative genomics I: Genome annotation using evolutionary signatures 1 Module V: Comparative genomics and evolution

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION med!1,2 Wild-type (N2) end!3 elt!2 5 1 15 Time (minutes) 5 1 15 Time (minutes) med!1,2 end!3 5 1 15 Time (minutes) elt!2 5 1 15 Time (minutes) Supplementary Figure 1: Number of med-1,2, end-3, end-1 and

More information

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison 10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:

More information

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: m Eukaryotic mrna processing Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: Cap structure a modified guanine base is added to the 5 end. Poly-A tail

More information

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11 The Eukaryotic Genome and Its Expression Lecture Series 11 The Eukaryotic Genome and Its Expression A. The Eukaryotic Genome B. Repetitive Sequences (rem: teleomeres) C. The Structures of Protein-Coding

More information

Annotation of Drosophila grimashawi Contig12

Annotation of Drosophila grimashawi Contig12 Annotation of Drosophila grimashawi Contig12 Marshall Strother April 27, 2009 Contents 1 Overview 3 2 Genes 3 2.1 Genscan Feature 12.4............................................. 3 2.1.1 Genome Browser:

More information

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Bio 1B Lecture Outline (please print and bring along) Fall, 2007 Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution

More information

O 3 O 4 O 5. q 3. q 4. Transition

O 3 O 4 O 5. q 3. q 4. Transition Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

More information

Cryo-EM data collection, refinement and validation statistics

Cryo-EM data collection, refinement and validation statistics 1 Table S1 Cryo-EM data collection, refinement and validation statistics Data collection and processing CPSF-160 WDR33 (EMDB-7114) (PDB 6BM0) CPSF-160 WDR33 (EMDB-7113) (PDB 6BLY) CPSF-160 WDR33 CPSF-30

More information

Evolutionary dynamics of abundant stop codon readthrough in Anopheles and Drosophila

Evolutionary dynamics of abundant stop codon readthrough in Anopheles and Drosophila biorxiv preprint first posted online May. 3, 2016; doi: http://dx.doi.org/10.1101/051557. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved.

More information

Gene Family Evolution across 12 Drosophila Genomes

Gene Family Evolution across 12 Drosophila Genomes Gene Family Evolution across 12 Drosophila Genomes Matthew W. Hahn 1,2*, Mira V. Han 2, Sang-Gook Han 2 1 Department of Biology, Indiana University, Bloomington, Indiana, United States of America, 2 School

More information

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

Sequence analysis and comparison

Sequence analysis and comparison The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species

More information

Chromosomal Rearrangement Inferred From Comparisons of 12 Drosophila Genomes

Chromosomal Rearrangement Inferred From Comparisons of 12 Drosophila Genomes Copyright Ó 2008 by the Genetics Society of America DOI: 10.1534/genetics.107.086108 Chromosomal Rearrangement Inferred From Comparisons of 12 Drosophila Genomes Arjun Bhutkar,*,,1 Stephen W. Schaeffer,

More information

Eukaryotic vs. Prokaryotic genes

Eukaryotic vs. Prokaryotic genes BIO 5099: Molecular Biology for Computer Scientists (et al) Lecture 18: Eukaryotic genes http://compbio.uchsc.edu/hunter/bio5099 Larry.Hunter@uchsc.edu Eukaryotic vs. Prokaryotic genes Like in prokaryotes,

More information

Sequence Alignment (chapter 6)

Sequence Alignment (chapter 6) Sequence lignment (chapter 6) he biological problem lobal alignment Local alignment Multiple alignment Introduction to bioinformatics, utumn 6 Background: comparative genomics Basic question in biology:

More information

Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are:

Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are: Comparative genomics and proteomics Species available Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are: Vertebrates: human, chimpanzee, mouse, rat,

More information

Drosophila melanogaster and D. simulans, two fruit fly species that are nearly

Drosophila melanogaster and D. simulans, two fruit fly species that are nearly Comparative Genomics: Human versus chimpanzee 1. Introduction The chimpanzee is the closest living relative to humans. The two species are nearly identical in DNA sequence (>98% identity), yet vastly different

More information

Expanded View Figures

Expanded View Figures Molecular Systems iology Evolutionary divergence of mouse P Mei-Sheng Xiao et al Expanded View Figures Nucleotide percentage (%) 0 20 40 60 80 100 T G Nucleotide percentage (%) 0 20 40 60 80 100 T G Frequency

More information

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of

More information

Computational approaches for functional genomics

Computational approaches for functional genomics Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding

More information

Supplementary Information for Discovery and characterization of indel and point mutations

Supplementary Information for Discovery and characterization of indel and point mutations Supplementary Information for Discovery and characterization of indel and point mutations using DeNovoGear Avinash Ramu 1 Michiel J. Noordam 1 Rachel S. Schwartz 2 Arthur Wuster 3 Matthew E. Hurles 3 Reed

More information

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018 CONCEPT OF SEQUENCE COMPARISON Natapol Pornputtapong 18 January 2018 SEQUENCE ANALYSIS - A ROSETTA STONE OF LIFE Sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of

More information

objective functions...

objective functions... objective functions... COFFEE (Notredame et al. 1998) measures column by column similarity between pairwise and multiple sequence alignments assumes that the pairwise alignments are optimal assumes a set

More information

GEP Annotation Report

GEP Annotation Report GEP Annotation Report Note: For each gene described in this annotation report, you should also prepare the corresponding GFF, transcript and peptide sequence files as part of your submission. Student name:

More information

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/8/16

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/8/16 Genome Evolution Outline 1. What: Patterns of Genome Evolution Carol Eunmi Lee Evolution 410 University of Wisconsin 2. Why? Evolution of Genome Complexity and the interaction between Natural Selection

More information

Comparing whole genomes

Comparing whole genomes BioNumerics Tutorial: Comparing whole genomes 1 Aim The Chromosome Comparison window in BioNumerics has been designed for large-scale comparison of sequences of unlimited length. In this tutorial you will

More information

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value

More information

SEQUENCE DIVERGENCE,FUNCTIONAL CONSTRAINT, AND SELECTION IN PROTEIN EVOLUTION

SEQUENCE DIVERGENCE,FUNCTIONAL CONSTRAINT, AND SELECTION IN PROTEIN EVOLUTION Annu. Rev. Genomics Hum. Genet. 2003. 4:213 35 doi: 10.1146/annurev.genom.4.020303.162528 Copyright c 2003 by Annual Reviews. All rights reserved First published online as a Review in Advance on June 4,

More information

EARLY ONLINE RELEASE. Daniel A. Pollard, Venky N. Iyer, Alan M. Moses, Michael B. Eisen

EARLY ONLINE RELEASE. Daniel A. Pollard, Venky N. Iyer, Alan M. Moses, Michael B. Eisen EARLY ONLINE RELEASE This is a provisional PDF of the author-produced electronic version of a manuscript that has been accepted for publication. Although this article has been peer-reviewed, it was posted

More information

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid.

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid. 1. A change that makes a polypeptide defective has been discovered in its amino acid sequence. The normal and defective amino acid sequences are shown below. Researchers are attempting to reproduce the

More information

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS CRYSTAL L. KAHN and BENJAMIN J. RAPHAEL Box 1910, Brown University Department of Computer Science & Center for Computational Molecular Biology

More information

Basic Local Alignment Search Tool

Basic Local Alignment Search Tool Basic Local Alignment Search Tool Alignments used to uncover homologies between sequences combined with phylogenetic studies o can determine orthologous and paralogous relationships Local Alignment uses

More information

Extensive identification and analysis of conserved small ORFs in animals

Extensive identification and analysis of conserved small ORFs in animals Mackowiak et al. Genome Biology (2015) 16:179 DOI 10.1186/s13059-015-0742-x RESEARCH Open Access Extensive identification and analysis of conserved small ORFs in animals Sebastian D. Mackowiak 1, Henrik

More information

Background: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6)

Background: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6) Sequence lignment (chapter ) he biological problem lobal alignment Local alignment Multiple alignment Background: comparative genomics Basic question in biology: what properties are shared among organisms?

More information

1. In most cases, genes code for and it is that

1. In most cases, genes code for and it is that Name Chapter 10 Reading Guide From DNA to Protein: Gene Expression Concept 10.1 Genetics Shows That Genes Code for Proteins 1. In most cases, genes code for and it is that determine. 2. Describe what Garrod

More information

Sequence alignment methods. Pairwise alignment. The universe of biological sequence analysis

Sequence alignment methods. Pairwise alignment. The universe of biological sequence analysis he universe of biological sequence analysis Word/pattern recognition- Identification of restriction enzyme cleavage sites Sequence alignment methods PstI he universe of biological sequence analysis - prediction

More information

CRITICA: Coding Region Identification Tool Invoking Comparative Analysis

CRITICA: Coding Region Identification Tool Invoking Comparative Analysis CRITICA: Coding Region Identification Tool Invoking Comparative Analysis Jonathan H. Badger and Gary J. Olsen Department of Microbiology, University of Illinois Gene recognition is essential to understanding

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Icm/Dot secretion system region I in 41 Legionella species.

Nature Genetics: doi: /ng Supplementary Figure 1. Icm/Dot secretion system region I in 41 Legionella species. Supplementary Figure 1 Icm/Dot secretion system region I in 41 Legionella species. Homologs of the effector-coding gene lega15 (orange) were found within Icm/Dot region I in 13 Legionella species. In four

More information

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline Phylogenetics Todd Vision iology 522 March 26, 2007 pplications of phylogenetics Studying organismal or biogeographic history Systematics ating events in the fossil record onservation biology Studying

More information

BLAST. Varieties of BLAST

BLAST. Varieties of BLAST BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database

More information

Genome-wide analysis of the MYB transcription factor superfamily in soybean

Genome-wide analysis of the MYB transcription factor superfamily in soybean Du et al. BMC Plant Biology 2012, 12:106 RESEARCH ARTICLE Open Access Genome-wide analysis of the MYB transcription factor superfamily in soybean Hai Du 1,2,3, Si-Si Yang 1,2, Zhe Liang 4, Bo-Run Feng

More information

Name: Monster Synthesis Activity

Name: Monster Synthesis Activity Name: Monster Synthesis ctivity Purpose: To examine how an organism s DN determines their phenotypes. blem: How can traits on a particular chromosome be determined? How can these traits determine the characteristics

More information

Evolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites

Evolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites Evolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites Paper by: James P. Balhoff and Gregory A. Wray Presentation by: Stephanie Lucas Reviewed

More information

Package vhica. April 5, 2016

Package vhica. April 5, 2016 Type Package Package vhica April 5, 2016 Title Vertical and Horizontal Inheritance Consistence Analysis Version 0.2.4 Date 2016-04-04 Author Arnaud Le Rouzic Suggests ape, plotrix, parallel, seqinr, gtools

More information

Supplementary text for the section Interactions conserved across species: can one select the conserved interactions?

Supplementary text for the section Interactions conserved across species: can one select the conserved interactions? 1 Supporting Information: What Evidence is There for the Homology of Protein-Protein Interactions? Anna C. F. Lewis, Nick S. Jones, Mason A. Porter, Charlotte M. Deane Supplementary text for the section

More information

Pyrobayes: an improved base caller for SNP discovery in pyrosequences

Pyrobayes: an improved base caller for SNP discovery in pyrosequences Pyrobayes: an improved base caller for SNP discovery in pyrosequences Aaron R Quinlan, Donald A Stewart, Michael P Strömberg & Gábor T Marth Supplementary figures and text: Supplementary Figure 1. The

More information

Pairwise & Multiple sequence alignments

Pairwise & Multiple sequence alignments Pairwise & Multiple sequence alignments Urmila Kulkarni-Kale Bioinformatics Centre 411 007 urmila@bioinfo.ernet.in Basis for Sequence comparison Theory of evolution: gene sequences have evolved/derived

More information

Unlocated Arthropod genes and ways to find them

Unlocated Arthropod genes and ways to find them Unlocated Arthropod genes and ways to find them Many bug genes are hard to find - Daphnia s many tandems were lost for a bit Duplicate genes, a bain and a boon Genome tile expression picks out many more

More information

Figure S1: Mitochondrial gene map for Pythium ultimum BR144. Arrows indicate transcriptional orientation, clockwise for the outer row and

Figure S1: Mitochondrial gene map for Pythium ultimum BR144. Arrows indicate transcriptional orientation, clockwise for the outer row and Figure S1: Mitochondrial gene map for Pythium ultimum BR144. Arrows indicate transcriptional orientation, clockwise for the outer row and counterclockwise for the inner row, with green representing coding

More information

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution Massachusetts Institute of Technology 6.877 Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution 1. Rates of amino acid replacement The initial motivation for the neutral

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of omputer Science San José State University San José, alifornia, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology

More information

Student Handout Fruit Fly Ethomics & Genomics

Student Handout Fruit Fly Ethomics & Genomics Student Handout Fruit Fly Ethomics & Genomics Summary of Laboratory Exercise In this laboratory unit, students will connect behavioral phenotypes to their underlying genes and molecules in the model genetic

More information

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT 3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode

More information

Reading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype

Reading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype Lecture Series 7 From DNA to Protein: Genotype to Phenotype Reading Assignments Read Chapter 7 From DNA to Protein A. Genes and the Synthesis of Polypeptides Genes are made up of DNA and are expressed

More information

Getting statistical significance and Bayesian confidence limits for your hidden Markov model or score-maximizing dynamic programming algorithm,

Getting statistical significance and Bayesian confidence limits for your hidden Markov model or score-maximizing dynamic programming algorithm, Getting statistical significance and Bayesian confidence limits for your hidden Markov model or score-maximizing dynamic programming algorithm, with pairwise alignment of sequences as an example 1,2 1

More information

Synteny Portal Documentation

Synteny Portal Documentation Synteny Portal Documentation Synteny Portal is a web application portal for visualizing, browsing, searching and building synteny blocks. Synteny Portal provides four main web applications: SynCircos,

More information

Understanding relationship between homologous sequences

Understanding relationship between homologous sequences Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective

More information

Sequence analysis and Genomics

Sequence analysis and Genomics Sequence analysis and Genomics October 12 th November 23 rd 2 PM 5 PM Prof. Peter Stadler Dr. Katja Nowick Katja: group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute

More information

A Method for Aligning RNA Secondary Structures

A Method for Aligning RNA Secondary Structures Method for ligning RN Secondary Structures Jason T. L. Wang New Jersey Institute of Technology J Liu, JTL Wang, J Hu and B Tian, BM Bioinformatics, 2005 1 Outline Introduction Structural alignment of RN

More information

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Introduction to Bioinformatics online course : IBT Jonathan Kayondo Learning Objectives Understand

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

Practical considerations of working with sequencing data

Practical considerations of working with sequencing data Practical considerations of working with sequencing data File Types Fastq ->aligner -> reference(genome) coordinates Coordinate files SAM/BAM most complete, contains all of the info in fastq and more!

More information

8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage

8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage 8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage Chris M. Rands 1, Stephen Meader 1, Chris P. Ponting 1 *, Gerton Lunter 2

More information

In Genomes, Two Types of Genes

In Genomes, Two Types of Genes In Genomes, Two Types of Genes Protein-coding: [Start codon] [codon 1] [codon 2] [ ] [Stop codon] + DNA codons translated to amino acids to form a protein Non-coding RNAs (NcRNAs) No consistent patterns

More information

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/1/18

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/1/18 Genome Evolution Outline 1. What: Patterns of Genome Evolution Carol Eunmi Lee Evolution 410 University of Wisconsin 2. Why? Evolution of Genome Complexity and the interaction between Natural Selection

More information

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

CHAPTERS 24-25: Evidence for Evolution and Phylogeny CHAPTERS 24-25: Evidence for Evolution and Phylogeny 1. For each of the following, indicate how it is used as evidence of evolution by natural selection or shown as an evolutionary trend: a. Paleontology

More information

MegAlign Pro Pairwise Alignment Tutorials

MegAlign Pro Pairwise Alignment Tutorials MegAlign Pro Pairwise Alignment Tutorials All demo data for the following tutorials can be found in the MegAlignProAlignments.zip archive here. Tutorial 1: Multiple versus pairwise alignments 1. Extract

More information

Homology and Information Gathering and Domain Annotation for Proteins

Homology and Information Gathering and Domain Annotation for Proteins Homology and Information Gathering and Domain Annotation for Proteins Outline Homology Information Gathering for Proteins Domain Annotation for Proteins Examples and exercises The concept of homology The

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Lecture : p he biological problem p lobal alignment p Local alignment p Multiple alignment 6 Background: comparative genomics p Basic question in biology: what properties

More information

Lecture Notes: BIOL2007 Molecular Evolution

Lecture Notes: BIOL2007 Molecular Evolution Lecture Notes: BIOL2007 Molecular Evolution Kanchon Dasmahapatra (k.dasmahapatra@ucl.ac.uk) Introduction By now we all are familiar and understand, or think we understand, how evolution works on traits

More information

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information # Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Details of PRF Methodology In the Poisson Random Field PRF) model, it is assumed that non-synonymous mutations at a given gene are either

More information

Exploring Evolution & Bioinformatics

Exploring Evolution & Bioinformatics Chapter 6 Exploring Evolution & Bioinformatics Jane Goodall The human sequence (red) differs from the chimpanzee sequence (blue) in only one amino acid in a protein chain of 153 residues for myoglobin

More information

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) Contents Alignment algorithms Needleman-Wunsch (global alignment) Smith-Waterman (local alignment) Heuristic algorithms FASTA BLAST

More information

Supporting Information

Supporting Information Supporting Information Das et al. 10.1073/pnas.1302500110 < SP >< LRRNT > < LRR1 > < LRRV1 > < LRRV2 Pm-VLRC M G F V V A L L V L G A W C G S C S A Q - R Q R A C V E A G K S D V C I C S S A T D S S P E

More information

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Phylogeny: the evolutionary history of a species

More information

Organic Chemistry Option II: Chemical Biology

Organic Chemistry Option II: Chemical Biology Organic Chemistry Option II: Chemical Biology Recommended books: Dr Stuart Conway Department of Chemistry, Chemistry Research Laboratory, University of Oxford email: stuart.conway@chem.ox.ac.uk Teaching

More information

Sequence Alignment Techniques and Their Uses

Sequence Alignment Techniques and Their Uses Sequence Alignment Techniques and Their Uses Sarah Fiorentino Since rapid sequencing technology and whole genomes sequencing, the amount of sequence information has grown exponentially. With all of this

More information

Illegitimate translation causes unexpected gene expression from on-target out-of-frame alleles

Illegitimate translation causes unexpected gene expression from on-target out-of-frame alleles Illegitimate translation causes unexpected gene expression from on-target out-of-frame alleles created by CRISPR-Cas9 Shigeru Makino, Ryutaro Fukumura, Yoichi Gondo* Mutagenesis and Genomics Team, RIKEN

More information

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p.110-114 Arrangement of information in DNA----- requirements for RNA Common arrangement of protein-coding genes in prokaryotes=

More information

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate.

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate. OEB 242 Exam Practice Problems Answer Key Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate. First, recall

More information

Introduction to protein alignments

Introduction to protein alignments Introduction to protein alignments Comparative Analysis of Proteins Experimental evidence from one or more proteins can be used to infer function of related protein(s). Gene A Gene X Protein A compare

More information

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships Chapter 26: Phylogeny and the Tree of Life You Must Know The taxonomic categories and how they indicate relatedness. How systematics is used to develop phylogenetic trees. How to construct a phylogenetic

More information

Supporting Information

Supporting Information Supporting Information Weghorn and Lässig 10.1073/pnas.1210887110 SI Text Null Distributions of Nucleosome Affinity and of Regulatory Site Content. Our inference of selection is based on a comparison of

More information

Markov Models & DNA Sequence Evolution

Markov Models & DNA Sequence Evolution 7.91 / 7.36 / BE.490 Lecture #5 Mar. 9, 2004 Markov Models & DNA Sequence Evolution Chris Burge Review of Markov & HMM Models for DNA Markov Models for splice sites Hidden Markov Models - looking under

More information

species Nicolas Bargues and Emmanuelle Lerat *

species Nicolas Bargues and Emmanuelle Lerat * Bargues and Lerat Mobile DNA (2017) 8:7 DOI 10.1186/s13100-017-0090-3 RESEARCH Open Access Evolutionary history of LTRretrotransposons among 20 Drosophila species Nicolas Bargues and Emmanuelle Lerat *

More information

Copyright Mark Brandt, Ph.D A third method, cryogenic electron microscopy has seen increasing use over the past few years.

Copyright Mark Brandt, Ph.D A third method, cryogenic electron microscopy has seen increasing use over the past few years. Structure Determination and Sequence Analysis The vast majority of the experimentally determined three-dimensional protein structures have been solved by one of two methods: X-ray diffraction and Nuclear

More information

Motivating the need for optimal sequence alignments...

Motivating the need for optimal sequence alignments... 1 Motivating the need for optimal sequence alignments... 2 3 Note that this actually combines two objectives of optimal sequence alignments: (i) use the score of the alignment o infer homology; (ii) use

More information

Comparative Bioinformatics Midterm II Fall 2004

Comparative Bioinformatics Midterm II Fall 2004 Comparative Bioinformatics Midterm II Fall 2004 Objective Answer, part I: For each of the following, select the single best answer or completion of the phrase. (3 points each) 1. Deinococcus radiodurans

More information

Comparative Genomics II

Comparative Genomics II Comparative Genomics II Advances in Bioinformatics and Genomics GEN 240B Jason Stajich May 19 Comparative Genomics II Slide 1/31 Outline Introduction Gene Families Pairwise Methods Phylogenetic Methods

More information

GCD3033:Cell Biology. Transcription

GCD3033:Cell Biology. Transcription Transcription Transcription: DNA to RNA A) production of complementary strand of DNA B) RNA types C) transcription start/stop signals D) Initiation of eukaryotic gene expression E) transcription factors

More information

Biology. Biology. Slide 1 of 26. End Show. Copyright Pearson Prentice Hall

Biology. Biology. Slide 1 of 26. End Show. Copyright Pearson Prentice Hall Biology Biology 1 of 26 Fruit fly chromosome 12-5 Gene Regulation Mouse chromosomes Fruit fly embryo Mouse embryo Adult fruit fly Adult mouse 2 of 26 Gene Regulation: An Example Gene Regulation: An Example

More information

Mathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007

Mathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007 -2 Transcript Alignment Assembly and Automated Gene Structure Improvements Using PASA-2 Mathangi Thiagarajan mathangi@jcvi.org Rice Genome Annotation Workshop May 23rd, 2007 About PASA PASA is an open

More information

Computational Identification of Evolutionarily Conserved Exons

Computational Identification of Evolutionarily Conserved Exons Computational Identification of Evolutionarily Conserved Exons Adam Siepel Center for Biomolecular Science and Engr. University of California Santa Cruz, CA 95064, USA acs@soe.ucsc.edu David Haussler Howard

More information

Introduction to Hidden Markov Models for Gene Prediction ECE-S690

Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Outline Markov Models The Hidden Part How can we use this for gene prediction? Learning Models Want to recognize patterns (e.g. sequence

More information

From gene to protein. Premedical biology

From gene to protein. Premedical biology From gene to protein Premedical biology Central dogma of Biology, Molecular Biology, Genetics transcription replication reverse transcription translation DNA RNA Protein RNA chemically similar to DNA,

More information

1. Contains the sugar ribose instead of deoxyribose. 2. Single-stranded instead of double stranded. 3. Contains uracil in place of thymine.

1. Contains the sugar ribose instead of deoxyribose. 2. Single-stranded instead of double stranded. 3. Contains uracil in place of thymine. Protein Synthesis & Mutations RNA 1. Contains the sugar ribose instead of deoxyribose. 2. Single-stranded instead of double stranded. 3. Contains uracil in place of thymine. RNA Contains: 1. Adenine 2.

More information

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010 BLAST Database Searching BME 110: CompBio Tools Todd Lowe April 8, 2010 Admin Reading: Read chapter 7, and the NCBI Blast Guide and tutorial http://www.ncbi.nlm.nih.gov/blast/why.shtml Read Chapter 8 for

More information

Interpolated Markov Models for Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey

Interpolated Markov Models for Gene Finding. BMI/CS 776  Spring 2015 Colin Dewey Interpolated Markov Models for Gene Finding BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2015 Colin Dewey cdewey@biostat.wisc.edu Goals for Lecture the key concepts to understand are the following the

More information