Jungreis et al. Supplementary Materials
|
|
- Pamela Williams
- 5 years ago
- Views:
Transcription
1 Supplemental Text S1: What we mean by functional To understand what we mean when we refer to functional readthrough candidates, it is helpful to review the nature of the signal detected by PhyloSF. PhyloSF gets most of its information from substitutions, so regions with very high nucleotide-level conservation have very little signal and tend to get scores near 0, intermediate between the positive scores of most coding regions and negative ones of most noncoding regions. For a region to get a high PhyloSF score it needs to have many substitutions, and these substitutions need to be ones that make up a higher proportion of the substitutions typical of coding regions than of those typical of noncoding regions, such as synonymous substitutions and substitutions between amino acids with similar biochemical properties. nlike high amino acid conservation, which could be a side effect of selective constraint on the DN sequence unrelated to translation, a high PhyloSF score can only occur when selection has allowed some variation, but has preferred variation that preserves the biochemical properties of a translation product. Furthermore, selective pressure favoring the act of translation rather than its product, say because of some regulatory effect of translation, would have a different signature. Thus, PhyloSF detects the signature of a fitness advantage of the translation product itself. Functional Translational Readthrough has been defined as translational readthrough that leads to functions different from the parent protein (Schueren and Thoms 2016). While PhyloSF detects the evolutionary signature of a peptide extension that serves some function, it alone does not show that this function is different from that of the parent protein. However, the fact that leaky stop codon contexts have been conserved for most of our readthrough candidates provides evidence that the extended protein is functionally different from the parent; indeed, if the two were functionally identical then why would there be selective pressure to preserve readthrough and to keep the extension functional? While these evolutionary signatures provide evidence for Functional Translational Readthrough in evolutionary time, only experimental validation can determine if readthrough of any particular transcript has continued to the present (though SNV data from the nopheles gambiae 1000 genomes project provide evidence for this in the aggregate). Furthermore, while we have evidence that the extended proteins do have functions distinct from the parent, we do not know what their functions are. For these reasons we have referred to them as readthrough candidates. Supplement Page 1
2 Supplemental Text S2: Estimate for number of orthologous readthrough pairs due to convergent evolution We estimated that the number of pairs of orthologous readthrough stop codons descended from a nonreadthrough ancestral stop codon that have become readthrough independently in the two clades through convergent evolution as roughly the product of the number, N, of non-readthrough ancestral stop codons that have descendent stop codons in the two clades for which we can detect orthology, times the probability, P, that in both clades a non-readthrough ancestral stop codon will have developed readthrough detectable by our process. We estimate N to be around 7188 by taking the number of stop codons in. gambiae in genes that have a Diptera-level OrthoDB ortholog in D. melanogaster, 8562, times the fraction of. gambiae readthrough stop codons having an orthologous gene but not found by orthology, that also have an end-orthologous transcript in D. melanogaster, 0.853, minus the number of orthologous readthrough pairs, 115, an estimate for the number to be excluded due to being readthrough in the ancestor. This somewhat overestimates N, because it estimates the number of pairs that are end-orthologous rather than the number satisfying the stricter condition of being stop-orthologous, and because when we subtract orthologous readthrough pairs we ignore the fact that some of our ambiguous readthrough could actually be readthrough. lthough it subtracts an estimate for the number of ancestral readthrough stop codons using orthologous readthrough pairs, which treats convergent evolution cases as ancestral, the result is little changed if we do not make this subtraction. To detect readthrough in both species, we need to detect it in one or the other species without using orthology, but can then use orthology to detect it in the other. We can estimate the probability that a stop codon that was not readthrough in the ancestor has developed readthrough in one species, and that we can detect this readthrough without orthology, by counting the number of non-ancient readthrough candidates in that species (all of which were found without using orthology) and dividing by the total number of stop codons in that species minus the number of ancient readthrough candidates. We can estimate the probability that the orthologous stop codon is readthrough in the other species and that we will be able to detect it, possibly using orthology, using our estimate for the number of readthrough stop codons in each species for which the PhyloSF-Ψ Emp score of the second ORF is positive, since that is roughly the criteria we used to determine if a stop codon is readthrough once we know it is stop orthologous to a readthrough stop codon in the other species, minus the number of ancient readthrough regions having positive PhyloSF-Ψ Emp, divided by the total number of stop codons in that species minus the number of ancient readthrough candidates. arrying out these estimates we find that the probability that a non-readthrough ancestral stop codon will have become readthrough in both clades is approximately However, while this calculation accounts for the fact that detection of readthrough in one species helps us detect readthrough in the other, it does not take into account that some genes, such as longer ones, could have a greater tendency to become readthrough, in both species, than others. ccounting for the distributions of first ORF lengths of readthrough candidates and of all other genes increases the probability that a non-readthrough ancestral stop codon will have developed readthrough in both clades by about 30%, which gives us an estimate for P of Supplement Page 2
3 ombining these estimates of N and P, we estimate that the number of pairs of orthologous readthrough stop codons that have become readthrough independently from a non-readthrough ancestral stop codon through convergent evolution in the two clades is around 7188 * = caveat is that there could be unknown confounders other than first ORF length that make an ancestral gene more likely to develop readthrough, though such confounders would have be to considerably more influential than first ORF length to change the result much in comparison to the total number of ancient readthrough regions. Supplement Page 3
4 Supplemental Text S3: Z curve test provides an underestimate for the number of readthrough regions There are several reasons that the results of our Z curve test is likely to be an underestimate for the actual number of functional readthrough regions. First, the test only counts readthrough regions at least 10 codons long and that have Z curve score greater than 0. s discussed in the main text, using our PhyloSF test we found evidence that more than half of functional readthrough regions are less than 10 codons long. Furthermore, only about half of the readthrough regions of our readthrough candidates have Z curve score > 0 (Supplemental Figure S5,D). Since the readthrough regions of our readthrough candidates are the ones having the highest coding potential as measured by PhyloSF, we would expect them also to have higher Z curve scores than most of the other readthrough regions, so probably fewer than half of functional readthrough regions have Z curve score > 0. ombining these results, we find that the estimate provided by our test is probably less than 25% of the actual number of annotated stop codons that have functional readthrough. Second, our test only looks at annotated stop codons, which could be considerably fewer than the number of actual stop codons in species having incomplete annotations. Third, there are several factors that could inflate the counts of second ORFs having positive Z curve scores in frames 1 and 2 but not frame 0, which would decrease any frame-0 excess and thus our estimate. When an annotated stop codon lies within a coding region in another frame, the second ORF of that stop codon in the other frame is likely to have a high Z curve score, which would inflate the count in frame 1 or 2; we exclude any stop codons that are within annotated coding regions in another frame, but stop codons within unannotated coding regions in other frames would bias our readthrough estimate downward. lso, while our estimate corrects for recent nonsense substitutions and sequencing errors that read a sense codon as a stop codon, which would inflate the count in frame 0, we do not correct for recent indels and sequencing indels, which would inflate the counts in frames 1 and 2 and bias our readthrough estimate downward. Supplement Page 4
5 T- T- T- T-T T-T T- T-T T- T- T- T- T- Readthrough,.gam. Readthrough, D.mel. Non-readthrough,.gam. Non-readthrough, D.mel. 20% 10% 0% 10% 20% 30% 40% Fraction of stop codons with specified context Supplemental Figure S1. Stop codon contexts in. gambiae and D. melanogaster. sage of stop codon context (stop codon and subsequent base) sorted in order of decreasing frequency among the 12,058 non-readthrough, nonmitochondrial, annotated. gambiae stop codons (dark blue), with less frequent stop codons (top, e.g., T-) experimentally associated with translational leakage in other species, and most frequent (bottom, e.g., T-) associated with efficient termination. Error bars show standard error of mean. ontext frequencies for. gambiae readthrough candidates (red) are almost the opposite of those of non-readthrough transcripts, suggesting a preference for leaky context, with 36% using T- and almost none using T-. Frequencies shown are for our unbiased by stop codon subset of readthrough candidates and exclude double-stop readthrough candidates. The frequencies for both readthrough and nonreadthrough stop codons are similar to those for D. melanogaster (pink and light blue, respectively). Supplement Page 5
6 Protein Length (mino cids) gam. RT.gam. NonRT D.mel. RT D.mel. NonRT B Number of DS exons gam. RT.gam. NonRT D.mel. RT D.mel. NonRT Mean length of an intron (NT) gam. RT.gam. NonRT D.mel. RT D.mel. NonRT D Nonreadthrough Ortholog Number of DS exons of RT NonRT pairs.gam. RT D.mel. RT E Nonreadthrough Ortholog Mean length of an intron of RT NonRT pairs.gam. RT D.mel. RT Readthrough andidate 5e+01 5e+02 5e+03 5e+04 Readthrough andidate Number of DS exons RT paired with NonRT NonRT paired with RT ll NonRT that have an ortholog Mean length of an intron (NT) RT paired with NonRT NonRT paired with RT ll NonRT that have an ortholog Supplemental Figure S2. Readthrough genes are longer than non-readthrough. (-) Box plots for lengths of unextended proteins (), number of exons in the coding portion of the transcript (B), and average length of an intron in each transcript having at least one intron (), for. gambiae readthrough candidates (red),. ambia non-readthrough transcripts (green), D. melanogaster readthrough candidates (orange), and D. melanogaster non-readthrough transcripts (gray). By all three measures, readthrough candidates are significantly longer than non-readthrough transcripts, in both species. (D,E) pper panels are scatter plots showing number of coding exons (D) and average intron length (E) for each. gambiae readthrough candidate orthologous to a non-readthrough D. melanogaster transcript versus the corresponding value for its non-readthrough ortholog (black dots) and similarly for D. melanogaster readthrough candidates orthologous to nonreadthrough. gambiae transcripts (red dots). Lower panels are box plots showing number of coding exons (D) and average intron length (E) of those readthrough candidates in either species that are orthologous to a non-readthrough transcript in the other (red), the corresponding lengths of the paired non-readthrough transcripts (green), and the lengths of all non-readthrough transcripts in genes that have orthologs in the other species (orange). nlike the similar comparison for protein lengths shown in Figure 3E, there is no clear relationship among the number of exons or average intron length of readthrough candidates, their non-readthrough orthologs, and other non-readthrough transcripts. Supplement Page 6
7 P R FBtr These parts Homologous appear stem loops similar P R FBtr P R FBtr P R FBtr Supplemental Figure S3. RN structures for readthrough-readthrough pairs. onserved, stable, RN structures predicted by RNz in the 100-nt regions downstream from (and including) candidate readthrough stop codons for the four stoporthologous pairs of readthrough candidates in. gambiae (left) and D. melanogaster (right) for which a structure was predicted in both. There is clear homology between stem loops near the 5 ends of the second pair of structures (red ovals). Other than that, we see no obvious similarity between the predicted structures in the two species, offering the possibility that it is the presence of a stable structure that is functional rather than particular features of that structure. Supplement Page 7
8 Readthrough structures and stop codons 80.0% 70.0% 60.0% 50.0% 40.0% 30.0% T T T 20.0% 10.0% 0.0%.gam. Structure.gam. No Structure D.mel. Structure D.mel. No Structure Supplemental Figure S4. Stop codon usage in candidates having predicted structures. Frequencies of usage of T (blue), T (red), and T (green) first stop codons among. gambiae readthrough candidates having predicted structures (first group),. gambiae readthrough candidates lacking predicted structures (second group), D. melanogaster readthrough candidates having predicted structures (third group), and D. melanogaster readthrough candidates lacking predicted structures (fourth group), with error bars showing standard error of mean. lthough most readthrough candidates in both species use T, readthrough candidates having structures in D. melanogaster show a preference for T, prompting speculation that a leaky stop codon context might not be necessary for readthrough in the presence of an RN structure. However, in. gambiae there is only a small and not-statistically significant depletion for T stop codons among readthrough candidates having predicted structures. Supplement Page 8
9 .gam. Readthrough andidates B D.mel. Readthrough andidates Readthrough Regions 1st ORF 3' ends 3rd ORFs Readthrough Regions 1st ORF 3' ends 3rd ORFs PhyloSF Per odon PhyloSF Per odon.gam. Readthrough andidates D D.mel. Readthrough andidates 2nd ORFs 1st ORF 3' ends 3rd ORFs 2nd ORFs 1st ORF 3' ends 3rd ORFs Z urve score Z urve score E RT RT pair 2nd ORF codons D.mel gam. Supplemental Figure S5. Readthrough regions under less constraint than other coding regions. (-D) Distribution of coding potential as measured by PhyloSF score per codon ( and B) or Z curve score ( and D) of readthrough regions (red), same-sized coding regions at the 3 end of the first ORF of readthrough candidates (black), and noncoding third ORFs of readthrough candidates (cyan), for. gambiae ( and ) and D. melanogaster (B and D). Double readthrough candidates and readthrough candidates whose third ORF is 0-length or contains degenerate bases have been excluded. In all four cases, coding potential of readthrough regions tends to be intermediate between that of noncoding regions and other coding regions, suggesting that they have been under some purifying selection at the amino acid level within each clade, but less so than other coding regions. (E) Log-scale scatter plot showing the lengths in codons of the readthrough regions in. gambiae and D. melanogaster for each pair of stop-orthologous readthrough candidates. lthough for many pairs the lengths are similar in the two species (circles along diagonal line), there are also many pairs for which the lengths are quite different, suggesting that in many cases readthrough regions have remained functional despite large changes in length. Supplement Page 9
10 gamp3_aa * Q L S T P S K S K L * gamp3 T TT T T T T T T T gams1 T TT T T T T T T T gamm1 T TT T T T T T T T merm1 T TT T T T T T T T arad1 T TT T T T T T T T quas1 T TT T T T T T T T mel1 T TT T T T T T T T chr1 T TT T T T T T T T T epie1 T TT T T T TT T T T minm1 T TT T T T TT T T T T cul1 T TT T T T TT T T T T funf1 T TT T T T TT T T T T stes1 T TT T T T TT T T T T stei2 T TT T T T TT T T T T macm1 T TT T T T TT T T T T farf1 T TT T T T T T T T T dirw1 T TT T T T T T T T T sins1 T TT T T T T T T T T T atre1 T TT T T T T T T T T dar2 T TT T T T T T T T T T albs1 T TT T T T T T T T T T P B dmel_aa * Q L N P K S K L * dmel T TT T T T TT T dsim T TT T T T TT T dsec T TT T T T TT T dyak T TT T T T TT T dere T TT T T T TT T deug T TT T T T TT T dbia T TT T T T TT T dtak T TT T T T TT T dfic T TT T T T TT T dele T TT T T T TT T drho T TT T T T TT T dkik T TT T T T TT T dana T TT T T T TT T dbip T TT T T T TT T dper T TT T T T TT T dpse T TT T T T TT T dwil T TT T T T TT T dmoj T TT T T T T T dvir T TT T T T TT T dgri T TT T T T TT T 1969 dmel_aa * V Q H H P Y L Y Y P P E D D P Y Y N S R M * dmel T T T T TT T T TT T TT T--- T T T T dsim T T T T T TT T TT T TT T--- T T T T dsec T T T T T TT T TT T T TT T--- T T T T dyak T T T T T TT T TT T T TT T--- T T T T dere T T T T T TT T TT T T TT T--- T T T T deug T T T T T TT T TT T T T T T--- T T T T dbia T T T T T TT T TT T T T T--- T T T T dtak T T T T TT TT T TT T T T--- T T T T T dfic T T T T T T T TT T T T T--- T T T T dele T T T T T T T T T T T--- T T T T drho T T T T T T T TT T T T--- T T T T dkik T T T T T TT T T T T T--- T T T dana T T T T T TT T T T T TT--- T T T dbip T T T T T TT T TT T T TT--- T T dper T T T T T T TT T T --- T TT T T--- T T T dpse T T T T T T TT T T --- T TT T T--- T T T dwil T T T-- -T T TT T T TT --- T --- T TT TT T T T T dmoj T T T-- - TT TT T TT T --- TT T T TT T T T--- T T T dvir T T T-- - TT TTT T TT T --- TTT T T TT T T T--- T T Tsp86D Supplemental Figure S6. Peroxisomal targeting signals. lignments of readthrough regions of. gambiae gene P among 21 nopheles genomes (), its D. melanogaster ortholog, 1969, among 20 Drosophila genomes (B), and D. melanogaster gene Tsp86D (). predicted peroxisomal targeting signal in the final 12 amino acids of the extension of P (yellow highlighting) has been conserved across all 21 mosquitoes and 20 flies, despite the presence of several radical amino acid substitutions within the nopheles lineage (red highlighting) and two amino acid insertions or deletions between the two clades (red circles). The predicted peroxisomal targeting signal in Tsp86D is conserved as far as D. kikkawai but not in D. ananassae or beyond. Supplement Page 10
11 gamp3_aa * V E L F R T R L P S S F V V S L N P L T L R P W W W W H T S Q V Q * N R N R N R F F gamp3 T T T TT T T T TT T TT TT T TT T T TT T T T T T T T T T TTT T TT gams1 T T T TT T T T TT T TT TT T TTT T T TT T T T T T T T T T TTT T TT gamm1 T T T TT T T T TT T TT TT T TT T T TT T T T T T T T T T TTT T TT merm1 T T T TT T T T TT T TT TT T TTT T T TT T T T T T T T T T TTT T TT arad1 T T T TT T T T TT T TT TT T T T T TT T T T T T T T T T TTT T TT quas1 T T T TT T T T TT T TT TT T TT T T TT T T T T T T T T T TTT T TT mel1 T T T TT T T T TT T TT TT T TTT T T TT T T T T T T T T T T TTT T TT chr1 T T T TT T T T TT T TT TT T TTT T T TT T T T --- T T T T - -- T TT- T TT epie1 T T T TT T T T TT T TT TT T TTT T T TT T T T T --- T T T T TT TT T- T T minm1 T T T T T T T TT T TT TT T TTT T T TT TT T T T T T T T - T T TT T TT cul1 T T T T T T T TT T TT TT T TTT T T TT TT T T T T T - T TT TT T TT funf1 T T T TT T T T TT T TT TT T TTT T T TT TTT T T T T T T - TT --- T TTT T TT stes1 T T T TT T T T TT T T TT T TT T T TT TT T T T T T T T T TT T T TTT T TT stei2 T T T TT T T T TT T T TT T TT T T TT TT T T T T T T T T TT T T TTT T TT macm1 T T T TT T T T TT T TT TT T TT T T TT TT T T T T T T T T TT T T T T TT farf1 T T TT - -- TT T T TT TT T TT T TT T T T T TT T T TT T --- T T TT- T TT dirw1 T T T TT - -- TT TT T T TT T T TT- T TT sins1 T T TT - -- TT T TT TTT T TT T T TT T T -T T T T T - -- T T T T T atre1 T T TT - -- TT T T TTT T TT T T T T T T TT T T T - -- T T T T T T T T TT dar2 T T T TT - -- T T T TT T T - -- T -- T TT T TT T T T - -- TT T T albs1 T T T TT - -- TT T T TT T T - -- TT -- T T TT TT T T T - -- T T T T Supplemental Figure S7. Readthrough candidates identified using other features of coding regions. lignment including readthrough region of. gambiae readthrough candidate P R. lthough the PhyloSF+Stop score of this readthrough region, 10.9, is below our threshold of 17.0, it was included among our list of readthrough candidates because it exhibits other features characteristic of coding regions that are not accounted for by PhyloSF+Stop. In particular, the reading frame is preserved in all species despite the presence of many indels (gray) but degrades soon after the second stop codon (orange) and there are several synonymous substitutions in the second stop codon. (ll insertions relative to. gambiae before the second stop codon are frame preserving but are not shown in order to make the image more compact.) We included 40 second ORFs having PhyloSF+Stop between 5.0 and 17.0 in our list of. gambiae readthrough candidates based on a subjective assessment of these and other features characteristic of protein coding regions. Supplement Page 11
12 dmel_aa Y N S P P R F L * R F R V P D H H Y F S M P F * I N K I R Y N * N * dmel T T TT TTT TT T TTT T T T TT TTT T T TT T T T T T T T dsim T T TT TTT TT T TTT T T T TT TTT T T TT T T T T T T T dsec T T TT TTT TT T TTT T T T TT TTT T T TT T T T T T T T dyak T T TT TTT TT T TTT T T T T TT TTT T T TT T T T T T T T dere T T TT TTT TT T TTT T T T TT TTT T T TT T T- - T T T T T dana TTT T T T T TTT TT T T TT - T TT TT T T TT T- --- T TT T- T dpse TTT TT TTT TT T TT T T T TT TTT T T TT T - -- TT T T TT T dper TTT TT TTT TT T TT T T T TT TTT T T TT T - -- TT T T TT T dwil T T TTT TT T T T T T T TT T TT T T T TT T dvir TT T TT TT TTT TTT T T T T T TT T TT T TT T T T T T dmoj T TT T TTT TT T TT T T T TTT TT TTT T TTT T T T TT T TT T dgri TT T T T TTT TT T TT T T TT TT TTT T TT T T T T TT T TT T gamp3_aa Y H S P P R F L * K R L Q V F P D R S L F V M P F * T K K K K H E T T gamp3 TT T TT TT T T TT TT T T T T TT T T T TT T T gams1 TT T TT TT T T TT TT T T T T TT T T T TT T -- gamm1 TT T TT TT T T TT TT T T T T TT T T T TT T - T merm1 TT T TT TT T T TT TT T T T T TT T T T TT T --- T arad1 TT T TT TT T T TT TT T T T T TT T T T TT T T quas1 TT T TT TT T T T TT T T T T TT T T T TT T T mel1 TT T TT TT T T TT TT T T T T TT T T T TT T. T chr1 TT T TT TT T T TT TTT T T T T TT TT T T TT T T epie1 TT T TT TT T T TT TT T T T T TT T T T TT T T T minm1 TT T TT TT T T T TT TT T T T TT T T T TT T T cul1 TT T TT TT T T T TT TT T T T TT T T T TT T funf1 TT T TT TT T T TT TT T T T TT T T T TT T stes1 TT T TT TT T T TT TT T T T T TT T T T TT T -- - T stei2 TT T TT TT T T TT TT T T T T TT T T T TT T -- - T macm1 TT T TT TT T TT T TT T T T T TT T T T TT T T farf1 TT T TT TT T T TT TTT T T T T TT T T T TT T dirw1 TT T TT TT T T TT TTT T T T T TT T T T TT T sins1 TT TT TT TT T TT T TTT T T T T TT T T T TT T atre1 TT TT TT TT T TT T TTT T T T T TT T T T TT T dar2 TT T TT TT T T T TTT T T T T TT T T T TT T albs1 TT T TT TT T T T TTT T T T T TT T T T TT T Supplemental Figure S8. Readthrough identified using orthology. lignment of the readthrough region of D. melanogaster readthrough candidate FBtr (upper panel) identified using orthology to. gambiae readthrough candidate P R (lower panel), with many matching amino acids both within the readthrough region and before the first stop codon (yellow highlighting). FBtr had not been previously identified as readthrough, but satisfied our criteria for readthrough given orthology to a readthrough candidate in the other clade. In order to facilitate cross-clade comparisons, we identified 51 D. melanogaster and 21. gambiae readthrough candidates in this way. Supplement Page 12
13 enomes enome Browser Tools Mirrors Downloads My Data View Help bout s S enome Browser on D. melanogaster pr (BDP R5/dm3) move <<< << < > >> >>> zoom in 1.5x 3x 10x base zoom out 1.5x 3x 10x 10 B chr3r:16,505,520-16,535,237 29,718 bp. enter position, gene symbol or search terms ncient Paired ncient npaired Double-stop RT Orthologs Paralogs Orthologs Orthologs 109.gam.... P R P R P R D.mel. FBtr dmel_aa Y V * Y E T P K S T P L T P D P E N E S Y L E F S T * T F P dmel T T T TT T T TT T T T T T TT TTT T TT T dsim T T T TT T T TT T T T T TT TTT T TT T dsec T T T TT T T TT T T T T TT TTT T TT T dyak T T T TT T T TT T T T TT TTT T T T dere T T T TT T TT T T T T TT TTT T T T dana T T T TT T T T T TT TTT T dpse TT T T TT T T T T TT T TTT T T dper TT T T TT T T T T TT T TTT T T dwil TT T T TT T T T T T T T T T T T T TT T TTT T T T dvir TT T T TT -- - T --- T T T T TT TTT T T dmoj T T T TT -- - T --- T T T T TT TTT T T T dgri TT T T TT -- - T --- T T T T TT TTT T T gamp3_aa Y V * * R V T R V R I V H H H P V R I T R V I R F * E E S P S S S gamp3 T T T T T T T T T T TT T TT TTT T T T T T gams1 T T T T T T T T T T TT T TT TTT T T gamm1 T T T T T T T T T T TT T TT TTT T T merm1 T T T T T T T T T T TT T TT TTT T T arad1 T T T T T T T T T T TT T TT TTT T T quas1 T T T T T T T T T T TT T TT TTT T T mel1 T T T T T T T T T T TT T TT TTT T T chr1 T T T T T T T T - -- T T TT T TT TTT T epie1 T T T T T T T T T T TT T TT TTT T T minm1 T T T T T T T TT - -- T T T TT T TT TTT T cul1 T T T T T T T TT - -- T T T TT T TT TTT T funf1 T T T T T T T TT - -- T T T TT T TT TTT T stes1 T T T T T T T T T - -- T T T TT T TT TTT T stei2 T T T T T T T T T - -- T T T TT T TT TTT T macm1 T T T T T T T T T - -- T T TT T TT TTT T farf1 T T T T T T T T - --T T T T T TT TTT T dirw1 T T T T T T T T - --T T T TT T TT TTT T sins1 T T T T T T T T - --T T T TT T TT TTT T atre1 T T T T T T T T - --T T T TT T TT TTT T dar2 T T T T T T T TT T T T T TT TTT T T albs1 T T T T T T T TT T T TT T TT TTT T T P R P R P RB P RB P R P R P R FBtr FBtr FBtr FBtr FBtr FBtr FBtr D dmel_aa D N D L * * Q E E L E P L R N S R T I I S L I D L * V Q * T L R K ancestor T TT T T T T T T T T TT T T T T TT T T T T dmel TT T T T T T T T T T TT TT T T T T T T T T dsim TT T T T T T T T T T TT TT T T TT T T T T T dsec TT T T T T T T T T T TT TT T T T T T T T T dyak TT T T TT T T T T T T TT TT T T T T T T T T dere TT T T T T T T T T T TT TT T T T T T T T T T dana T T T T T T T T T T T T TT T T T T T T --- TT -- dpse TT T T T TT- -- TT T T T T T TT T T T T T T dper TT T T T TT- -- TT T T T T T TT T T T T T T dwil T TT T T T T T T T T T T T TT T T T T TT T T TT T dvir T T TT T T T T T T T TT T T T T TT T T T T T dmoj T TT T T T T T T T TT T T T T T dgri T TT T T T T T T T TT T T T T TT T. gamp3_aa F D L * * P S T E S V Y L R K P D L E R F L I H L * R Q E D L ancestor TT T TT T T T T T T T T TT TT T T T T T gamp3 TTT TT TT T T T T T TT T T TT TT T T T T T gams1 TTT TT TT T T T T T TT T T TT TT T T T T T gamm1 TTT TT TT T T T T T TT T T TT TT T T T T T merm1 TTT TT TT T T T T T T TT T T TT TT T T T T T arad1 TTT TT TT T T T T T TT T T TT TT T T T T T quas1 TTT TT TT T T T T T TT T T TT TT T T T T T mel1 TTT TT TT T T T T T TT T T TT TT T T T T T chr1 TT T TT TT T T T T T T TT T T TT TT TT T T T T epie1 TT TT TT T T T T T T T TT T T TT TT T T T T T minm1 TTT TT TT T T T T T T TT TT T TTT TT T T T T T cul1 TTT TT TT T T T T T T T TT T TTT TT T T T T T funf1 TT TT TT T T T T T T TT T T TTT TT T T T T T stes1 TT TT TT T T T T T T TT T T TTT TT T T T T T stei2 TT TT TT T T T T T T TT T T TTT TT T T T T T macm1 TTT T TT TT T T T T T T TT T T T TTT TT T T T T T farf1 TT T TT T T T T T T T T TT T T T T T dirw1 TT T TT T T T T T T T T TT T T T T T sins1 TT T TT T T T T T T T T TT TT T T T T T T atre1 TT T TT T T T T T T T T T TT T T T T T T T T dar2 TTT T TT T T T T T T T T TT TT T T T T albs1 TT T TT T T T T T T T T TT TT T T T T Supplement Page 13
14 Supplemental Figure S9. ncient readthrough pairing. () Pairing of ancient readthrough candidates in. gambiae and D. melanogaster. We defined a readthrough candidate to be ancient if it is stop-orthologous to a readthrough candidate in the other species at the Diptera level and is not a double-stop readthrough candidate. In most cases (109) there is a one-toone pairing of orthologous ancient readthrough candidates (green boxes). However, in two cases there are two paralogous. gambiae readthrough candidates orthologous to the same D. melanogaster candidate, so we excluded one of the paralogs (white boxes) from analyses that required a one-to-one pairing. There is one case of four homologous readthrough candidates, however we were able to determine that these split into two orthologous pairs (panel B). There are three D. melanogaster readthrough candidates orthologous to double-stop readthrough candidates in. gambiae (blue rectangles), so these were excluded from analyses that required a one-to-one pairing of non-double-stop readthrough candidates (panel ). Finally, there is one pair of orthologous double-stop readthrough candidates, which were excluded from most of our analyses (panel D). (B) Four-way homologous readthrough candidates. Two. gambiae readthrough candidates and two D. melanogaster readthrough candidates, all homologous at the Diptera level, displayed using the S genome browser. In each species, the first ORFs of the two readthrough candidate transcripts are four-exon alternative splice variants of a single gene, with the first three exons shared. The downstream fourth exons in the two species are more similar to each other than either is to the alternative exons, and the same is true of the upstream fourth exons, suggesting that these two genes are descended from a gene in the common ancestor having a similar configuration, and that the alternative final exons were formed by an earlier duplication of a final exon containing a readthrough stop codon. () Double-stop readthrough orthologous to single-stop readthrough. lignments including the readthrough regions of D. melanogaster readthrough candidate FBtr and. gambiae double-stop readthrough candidate P RB. It is possible that the second T stop codon in the double first stop codon of P RB is related by a single nucleotide substitution to the TT tyrosine codon immediately 3 of the first stop codon in FBtr We note that although there is very little cross-clade similarity between the readthrough regions at the amino acid level, within each of the two clades there are several alignment columns immediately after the first stop codon and near the end of the readthrough region that have no synonymous substitutions, suggesting that there might be some overlapping constraint at the nucleotide level in addition to any constraint on the amino acid sequences. (D) Orthologous double-stop readthrough candidates. lignments including the readthrough regions of orthologous double-stop readthrough candidates FBtr and P R. The common double-t stop codon suggests that these might be descended from a double-stop readthrough transcript in the common ancestor. Supplement Page 14
15 gamp3_aa D S L Y K R * L L R V R S L F S L * * Q R E L * V L W R gamp3 T T T T T TT T T T TT T T T T --- T T TT TT T gams1 T T T T T TT T T T TT T T T T --- T T T TT T T gamm1 T T T T T TT T T T TT T T T T --- T T T TT T T merm1 T T T T T TT T T T TT T T T T --- T T T TT T T arad1 T T T T T TT T T T TT T T T T --- T T TT TT T quas1 T T T T T TT T T T TT T T T T --- T T T TT T T mel1 T T T T T TT T T T TT T T T T --- T T T TT T T chr1 T T T T T TT T T T TT T T T T --- T T T T TT T T T epie1 T T T T T TT T T T TT T T T --- T T T minm1 T T T T T TT T T T TT T T T T --- T T T cul1 T T T T T TT T T T T TT T T T T --- T T T funf1 T T T T T TT T TT T T TT T T T T --- T T T stes1 T T T T T TT T T T TT T T T T --- T T T T T T stei2 T T T T T TT T T T TT T T T T --- T T T T T T macm1 T T T T T TT T T T TT T T T T --- T T T T. T farf1 T T T T T T TT T T T TT T T T T T T dirw1 T T T T T TT T T T T TT T T T T --- T T T sins1 T T T T T TT T T T TTT T T T T T T T atre1 T T T T T T TT TT T T TTT T T T T T T T dar2 T T T T T T T T T T T TT T T T T T T albs1 T T T T T T T T T T T TT T T T T T T B gamp3_aa I D L L * K T K * * Q L L K R R F * N D T H R R gamp3 T T TT TT T T T T T T TTT T T T gams1 T T TT TT T T T T T T TTT T T T merm1 T T TT TT T T T T TT T TTT T T T arad1 T T TT TT T T T T T T TTT T T T quas1 T T TT TT T T T T TT T TTT T TT T mel1 T T TT TT T T T T TT T TTT T T T chr1 TT T T TT T TT T T TT T TTT T TTT TT T epie1 TT T TT TT T TT T T TT T T TTT T T-- T T minm1 TT T TT TT T TT T T TT T TTT T T- --T cul1 TT T TT TT T TT T T TT T TTT T T- --T funf1 TT T TT TT T TT T T TT T TTT T TT- T--T TT stes1 T T TT TT T TT T T TT T T TTT T TT TT T stei2 T T TT TT T TT T T TT T T TTT T TT TT T farf1 T T TT TT T TT T T TTT T T T dirw1 T T T TT TT T T T T TT T TT T T T-- --T sins1 T T T TT T T T T T T TT T T --- T-- T atre1 T T T TT T T T T T T TTT T T T -- T dar2 T T T T TT T TT T TT T T TT T -TT - -- T T albs1 T T T TT T TT T T T T TT T Supplemental Figure S10. Single-double readthrough. lignments showing triple readthrough for P R () and P R (B), in which a single readthrough is followed by a double-stop readthrough in some species. In P R, the second T stop codon in the. gambiae double stop codon is aligned to a likely-ancestral T ysteine codon in several species, though the presence of indels makes the history of the particular codon uncertain. In P R the second T stop codon in the double stop codon appears to be ancestral, and has become a TT Leucine codon in. darlingi. Supplement Page 15
16 Percent of SNVs 80% 70% 60% 50% 40% 30% 20% SNVs synonymous in frame Frame0 Frame1 Frame2 10% 0% Before 1st Stop In RT Region <er 2nd Stop B umulative Distribution SNVs Before 1st Stop Non synonymous Synonymous umulative Distribution SNVs In RT Region Non synonymous Synonymous umulative Distribution SNVs fter 2nd Stop Non synonymous Synonymous Derived llele Frequency Derived llele Frequency Derived llele Frequency Supplemental Figure S11. Polymorphism evidence supports recent protein-coding selection. () Single nucleotide variants (SNVs) in. gambiae show a strong bias toward synonymous codon changes in readthrough regions (middle) and same-sized coding regions 5 of readthrough stop codons (left), but not in same-sized noncoding regions 3 of second stop codons of readthrough candidates (right), providing evidence that readthrough regions are under purifying selection at the amino acid level within the. gambiae population. For each type of region we show the fraction of SNVs that would be synonymous if translated in each of three frames, with frame 0 matching the translated frame of the coding region of the gene. Error bars show the Standard Error of the Mean (SEM). (B) umulative distributions of derived allele frequencies for SNVs that would be synonymous (red) or non-synonymous (black) if translated in the frame of the coding region of the gene, for the same three sets of regions. Derived allele frequencies are lower for non-synonymous SNVs than for synonymous ones, in both coding and readthrough regions, indicating that they are under greater purifying selection, whereas in noncoding regions there is no significant difference, providing further evidence that purifying selection at the amino acid level in readthrough regions has continued in the. gambiae population. Supplement Page 16
17 .gam. 5 codon regions B.gam. 10 codon regions PsiEmp coding Psi normal coding PsiEmp noncoding Psi normal noncoding PsiEmp coding Psi normal coding PsiEmp noncoding Psi normal noncoding Density PhyloSF scores corresponding to PhyloSF-Ψ EMP = 17 and PhyloSF-Ψ = 17. Density Normal approxima8on overes8mates density in right tail of noncoding distribu8on Raw PhyloSF Score Raw PhyloSF Score.gam. 20 codon regions D.gam. 40 codon regions PsiEmp coding Empirical coding Psi normal coding PsiEmp noncoding Empirical noncoding Psi normal noncoding PsiEmp coding Empirical coding Psi normal coding PsiEmp noncoding Empirical noncoding Psi normal noncoding Density Density Small sample size makes tail of empirical distribu8on unreliable for regions >> 10 codons Raw PhyloSF Score Raw PhyloSF Score Supplement Page 17
18 E D.mel. 5 codon regions F D.mel. 10 codon regions PsiEmp coding Psi normal coding PsiEmp noncoding Psi normal noncoding PsiEmp coding Psi normal coding PsiEmp noncoding Psi normal noncoding Density Density Raw PhyloSF Score Raw PhyloSF Score D.mel. 20 codon regions H D.mel. 40 codon regions PsiEmp coding Empirical coding Psi normal coding PsiEmp noncoding Empirical noncoding Psi normal noncoding PsiEmp coding Empirical coding Psi normal coding PsiEmp noncoding Empirical noncoding Psi normal noncoding Density Density Raw PhyloSF Score Raw PhyloSF Score Supplemental Figure S12. Empirical score distributions allow PhyloSF-Ψ Emp to achieve higher sensitivity than PhyloSF-Ψ while maintaining high specificity. oding (black) and noncoding (red) PhyloSF score distributions used to define PhyloSF-Ψ Emp for. gambiae (-D) and D. melanogaster (E-H), and normal approximations used to define PhyloSF-Ψ (magenta and orange dashed lines), for regions of lengths 5, 10, 20, and 40 codons. For lengths less than or equal to 10, the distributions used to define PhyloSF-Ψ Emp were calculated directly from training regions of that length, whereas for greater lengths PhyloSF-Ψ Emp was defined by scaling the distributions for regions of length 10. For lengths greater than 10 we show the actual distributions of coding (green) and noncoding (cyan) training regions of those lengths for reference, even though they were not used in calculating PhyloSF-Ψ Emp. The bump in the right tail of the cyan curve for. gambiae 40-codon regions (D) is presumably a result of sampling error due to the small number of noncoding training regions of that length; such sampling errors are the reason that we scaled the length-10 distribution rather than interpolating densities through scores of actual regions of greater lengths. PhyloSF scores for regions of each length for which PhyloSF-Ψ Emp and PhyloSF-Ψ are equal to our score threshold of 17 (solid and dashed vertical blue lines, respectively) are the scores that are approximately 50 times more likely to occur in coding regions than noncoding regions. Because the normal approximations overestimate the densities in the right tail of the noncoding score distributions, PhyloSF-Ψ overestimates the PhyloSF score needed to achieve this high specificity; by using the more accurate empirical densities, PhyloSF-Ψ Emp allows us to detect coding regions having lower PhyloSF score, achieving greater sensitivity while maintaining this high specificity. Supplement Page 18
19 Supplement Page 19 Supplemental Figure S13. Score distribution scale factors used for definition of PhyloSF-Ψ Emp. Mean PhyloSF scores of coding (black circles) and noncoding (red circles) training regions of each length used to define PhyloSF-Ψ Emp for. gambiae () and D. melanogaster (B) and intervals one standard deviation above and below (black and red vertical lines). For lengths greater than 10 codons, PhyloSF-Ψ Emp was defined by scaling the distributions for regions of length 10 codons using the means and standard deviations of the scores of training regions of each length up to 60 codons, and an extrapolation using linear regression on the means and the logs of the standard deviations of regions of length from 30 to 60 codons for regions greater than 60 codons (green and cyan curves) Region length (codons) Raw PhyloSF score PhyloSF PsiEmp scale factors for D.mel mean + stdev for coding training regions mean + stdev for noncoding training regions extrapolation for longer coding regions extrapolation for longer noncoding regions Region length (codons) Raw PhyloSF score PhyloSF PsiEmp scale factors for.gam mean + stdev for coding training regions mean + stdev for noncoding training regions extrapolation for longer coding regions extrapolation for longer noncoding regions B
C3020 Molecular Evolution. Exercises #3: Phylogenetics
C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from
More informationLecture 17. Comparative genomics I: Genome annotation using evolutionary signatures
6.047/6.878/HST.507 Computational Biology: Genomes, Networks, Evolution Lecture 17 Comparative genomics I: Genome annotation using evolutionary signatures 1 Module V: Comparative genomics and evolution
More informationSUPPLEMENTARY INFORMATION
med!1,2 Wild-type (N2) end!3 elt!2 5 1 15 Time (minutes) 5 1 15 Time (minutes) med!1,2 end!3 5 1 15 Time (minutes) elt!2 5 1 15 Time (minutes) Supplementary Figure 1: Number of med-1,2, end-3, end-1 and
More information10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison
10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:
More informationNewly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:
m Eukaryotic mrna processing Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: Cap structure a modified guanine base is added to the 5 end. Poly-A tail
More informationThe Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11
The Eukaryotic Genome and Its Expression Lecture Series 11 The Eukaryotic Genome and Its Expression A. The Eukaryotic Genome B. Repetitive Sequences (rem: teleomeres) C. The Structures of Protein-Coding
More informationAnnotation of Drosophila grimashawi Contig12
Annotation of Drosophila grimashawi Contig12 Marshall Strother April 27, 2009 Contents 1 Overview 3 2 Genes 3 2.1 Genscan Feature 12.4............................................. 3 2.1.1 Genome Browser:
More informationBio 1B Lecture Outline (please print and bring along) Fall, 2007
Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution
More informationO 3 O 4 O 5. q 3. q 4. Transition
Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in
More informationCryo-EM data collection, refinement and validation statistics
1 Table S1 Cryo-EM data collection, refinement and validation statistics Data collection and processing CPSF-160 WDR33 (EMDB-7114) (PDB 6BM0) CPSF-160 WDR33 (EMDB-7113) (PDB 6BLY) CPSF-160 WDR33 CPSF-30
More informationEvolutionary dynamics of abundant stop codon readthrough in Anopheles and Drosophila
biorxiv preprint first posted online May. 3, 2016; doi: http://dx.doi.org/10.1101/051557. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved.
More informationGene Family Evolution across 12 Drosophila Genomes
Gene Family Evolution across 12 Drosophila Genomes Matthew W. Hahn 1,2*, Mira V. Han 2, Sang-Gook Han 2 1 Department of Biology, Indiana University, Bloomington, Indiana, United States of America, 2 School
More informationAlgorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment
Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot
More informationSequence analysis and comparison
The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species
More informationChromosomal Rearrangement Inferred From Comparisons of 12 Drosophila Genomes
Copyright Ó 2008 by the Genetics Society of America DOI: 10.1534/genetics.107.086108 Chromosomal Rearrangement Inferred From Comparisons of 12 Drosophila Genomes Arjun Bhutkar,*,,1 Stephen W. Schaeffer,
More informationEukaryotic vs. Prokaryotic genes
BIO 5099: Molecular Biology for Computer Scientists (et al) Lecture 18: Eukaryotic genes http://compbio.uchsc.edu/hunter/bio5099 Larry.Hunter@uchsc.edu Eukaryotic vs. Prokaryotic genes Like in prokaryotes,
More informationSequence Alignment (chapter 6)
Sequence lignment (chapter 6) he biological problem lobal alignment Local alignment Multiple alignment Introduction to bioinformatics, utumn 6 Background: comparative genomics Basic question in biology:
More informationEnsembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are:
Comparative genomics and proteomics Species available Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are: Vertebrates: human, chimpanzee, mouse, rat,
More informationDrosophila melanogaster and D. simulans, two fruit fly species that are nearly
Comparative Genomics: Human versus chimpanzee 1. Introduction The chimpanzee is the closest living relative to humans. The two species are nearly identical in DNA sequence (>98% identity), yet vastly different
More informationExpanded View Figures
Molecular Systems iology Evolutionary divergence of mouse P Mei-Sheng Xiao et al Expanded View Figures Nucleotide percentage (%) 0 20 40 60 80 100 T G Nucleotide percentage (%) 0 20 40 60 80 100 T G Frequency
More informationSequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University
Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of
More informationComputational approaches for functional genomics
Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding
More informationSupplementary Information for Discovery and characterization of indel and point mutations
Supplementary Information for Discovery and characterization of indel and point mutations using DeNovoGear Avinash Ramu 1 Michiel J. Noordam 1 Rachel S. Schwartz 2 Arthur Wuster 3 Matthew E. Hurles 3 Reed
More informationCONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018
CONCEPT OF SEQUENCE COMPARISON Natapol Pornputtapong 18 January 2018 SEQUENCE ANALYSIS - A ROSETTA STONE OF LIFE Sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of
More informationobjective functions...
objective functions... COFFEE (Notredame et al. 1998) measures column by column similarity between pairwise and multiple sequence alignments assumes that the pairwise alignments are optimal assumes a set
More informationGEP Annotation Report
GEP Annotation Report Note: For each gene described in this annotation report, you should also prepare the corresponding GFF, transcript and peptide sequence files as part of your submission. Student name:
More informationOutline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/8/16
Genome Evolution Outline 1. What: Patterns of Genome Evolution Carol Eunmi Lee Evolution 410 University of Wisconsin 2. Why? Evolution of Genome Complexity and the interaction between Natural Selection
More informationComparing whole genomes
BioNumerics Tutorial: Comparing whole genomes 1 Aim The Chromosome Comparison window in BioNumerics has been designed for large-scale comparison of sequences of unlimited length. In this tutorial you will
More informationBioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment
Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value
More informationSEQUENCE DIVERGENCE,FUNCTIONAL CONSTRAINT, AND SELECTION IN PROTEIN EVOLUTION
Annu. Rev. Genomics Hum. Genet. 2003. 4:213 35 doi: 10.1146/annurev.genom.4.020303.162528 Copyright c 2003 by Annual Reviews. All rights reserved First published online as a Review in Advance on June 4,
More informationEARLY ONLINE RELEASE. Daniel A. Pollard, Venky N. Iyer, Alan M. Moses, Michael B. Eisen
EARLY ONLINE RELEASE This is a provisional PDF of the author-produced electronic version of a manuscript that has been accepted for publication. Although this article has been peer-reviewed, it was posted
More information(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid.
1. A change that makes a polypeptide defective has been discovered in its amino acid sequence. The normal and defective amino acid sequences are shown below. Researchers are attempting to reproduce the
More informationA PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS
A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS CRYSTAL L. KAHN and BENJAMIN J. RAPHAEL Box 1910, Brown University Department of Computer Science & Center for Computational Molecular Biology
More informationBasic Local Alignment Search Tool
Basic Local Alignment Search Tool Alignments used to uncover homologies between sequences combined with phylogenetic studies o can determine orthologous and paralogous relationships Local Alignment uses
More informationExtensive identification and analysis of conserved small ORFs in animals
Mackowiak et al. Genome Biology (2015) 16:179 DOI 10.1186/s13059-015-0742-x RESEARCH Open Access Extensive identification and analysis of conserved small ORFs in animals Sebastian D. Mackowiak 1, Henrik
More informationBackground: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6)
Sequence lignment (chapter ) he biological problem lobal alignment Local alignment Multiple alignment Background: comparative genomics Basic question in biology: what properties are shared among organisms?
More information1. In most cases, genes code for and it is that
Name Chapter 10 Reading Guide From DNA to Protein: Gene Expression Concept 10.1 Genetics Shows That Genes Code for Proteins 1. In most cases, genes code for and it is that determine. 2. Describe what Garrod
More informationSequence alignment methods. Pairwise alignment. The universe of biological sequence analysis
he universe of biological sequence analysis Word/pattern recognition- Identification of restriction enzyme cleavage sites Sequence alignment methods PstI he universe of biological sequence analysis - prediction
More informationCRITICA: Coding Region Identification Tool Invoking Comparative Analysis
CRITICA: Coding Region Identification Tool Invoking Comparative Analysis Jonathan H. Badger and Gary J. Olsen Department of Microbiology, University of Illinois Gene recognition is essential to understanding
More informationNature Genetics: doi: /ng Supplementary Figure 1. Icm/Dot secretion system region I in 41 Legionella species.
Supplementary Figure 1 Icm/Dot secretion system region I in 41 Legionella species. Homologs of the effector-coding gene lega15 (orange) were found within Icm/Dot region I in 13 Legionella species. In four
More informationPhylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline
Phylogenetics Todd Vision iology 522 March 26, 2007 pplications of phylogenetics Studying organismal or biogeographic history Systematics ating events in the fossil record onservation biology Studying
More informationBLAST. Varieties of BLAST
BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database
More informationGenome-wide analysis of the MYB transcription factor superfamily in soybean
Du et al. BMC Plant Biology 2012, 12:106 RESEARCH ARTICLE Open Access Genome-wide analysis of the MYB transcription factor superfamily in soybean Hai Du 1,2,3, Si-Si Yang 1,2, Zhe Liang 4, Bo-Run Feng
More informationName: Monster Synthesis Activity
Name: Monster Synthesis ctivity Purpose: To examine how an organism s DN determines their phenotypes. blem: How can traits on a particular chromosome be determined? How can these traits determine the characteristics
More informationEvolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites
Evolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites Paper by: James P. Balhoff and Gregory A. Wray Presentation by: Stephanie Lucas Reviewed
More informationPackage vhica. April 5, 2016
Type Package Package vhica April 5, 2016 Title Vertical and Horizontal Inheritance Consistence Analysis Version 0.2.4 Date 2016-04-04 Author Arnaud Le Rouzic Suggests ape, plotrix, parallel, seqinr, gtools
More informationSupplementary text for the section Interactions conserved across species: can one select the conserved interactions?
1 Supporting Information: What Evidence is There for the Homology of Protein-Protein Interactions? Anna C. F. Lewis, Nick S. Jones, Mason A. Porter, Charlotte M. Deane Supplementary text for the section
More informationPyrobayes: an improved base caller for SNP discovery in pyrosequences
Pyrobayes: an improved base caller for SNP discovery in pyrosequences Aaron R Quinlan, Donald A Stewart, Michael P Strömberg & Gábor T Marth Supplementary figures and text: Supplementary Figure 1. The
More informationPairwise & Multiple sequence alignments
Pairwise & Multiple sequence alignments Urmila Kulkarni-Kale Bioinformatics Centre 411 007 urmila@bioinfo.ernet.in Basis for Sequence comparison Theory of evolution: gene sequences have evolved/derived
More informationUnlocated Arthropod genes and ways to find them
Unlocated Arthropod genes and ways to find them Many bug genes are hard to find - Daphnia s many tandems were lost for a bit Duplicate genes, a bain and a boon Genome tile expression picks out many more
More informationFigure S1: Mitochondrial gene map for Pythium ultimum BR144. Arrows indicate transcriptional orientation, clockwise for the outer row and
Figure S1: Mitochondrial gene map for Pythium ultimum BR144. Arrows indicate transcriptional orientation, clockwise for the outer row and counterclockwise for the inner row, with green representing coding
More informationMassachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution
Massachusetts Institute of Technology 6.877 Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution 1. Rates of amino acid replacement The initial motivation for the neutral
More informationAlgorithms in Bioinformatics
Algorithms in Bioinformatics Sami Khuri Department of omputer Science San José State University San José, alifornia, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology
More informationStudent Handout Fruit Fly Ethomics & Genomics
Student Handout Fruit Fly Ethomics & Genomics Summary of Laboratory Exercise In this laboratory unit, students will connect behavioral phenotypes to their underlying genes and molecules in the model genetic
More information3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT
3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode
More informationReading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype
Lecture Series 7 From DNA to Protein: Genotype to Phenotype Reading Assignments Read Chapter 7 From DNA to Protein A. Genes and the Synthesis of Polypeptides Genes are made up of DNA and are expressed
More informationGetting statistical significance and Bayesian confidence limits for your hidden Markov model or score-maximizing dynamic programming algorithm,
Getting statistical significance and Bayesian confidence limits for your hidden Markov model or score-maximizing dynamic programming algorithm, with pairwise alignment of sequences as an example 1,2 1
More informationSynteny Portal Documentation
Synteny Portal Documentation Synteny Portal is a web application portal for visualizing, browsing, searching and building synteny blocks. Synteny Portal provides four main web applications: SynCircos,
More informationUnderstanding relationship between homologous sequences
Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective
More informationSequence analysis and Genomics
Sequence analysis and Genomics October 12 th November 23 rd 2 PM 5 PM Prof. Peter Stadler Dr. Katja Nowick Katja: group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute
More informationA Method for Aligning RNA Secondary Structures
Method for ligning RN Secondary Structures Jason T. L. Wang New Jersey Institute of Technology J Liu, JTL Wang, J Hu and B Tian, BM Bioinformatics, 2005 1 Outline Introduction Structural alignment of RN
More informationModule: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment
Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Introduction to Bioinformatics online course : IBT Jonathan Kayondo Learning Objectives Understand
More informationComputational Biology: Basics & Interesting Problems
Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information
More informationPractical considerations of working with sequencing data
Practical considerations of working with sequencing data File Types Fastq ->aligner -> reference(genome) coordinates Coordinate files SAM/BAM most complete, contains all of the info in fastq and more!
More information8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage
8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage Chris M. Rands 1, Stephen Meader 1, Chris P. Ponting 1 *, Gerton Lunter 2
More informationIn Genomes, Two Types of Genes
In Genomes, Two Types of Genes Protein-coding: [Start codon] [codon 1] [codon 2] [ ] [Stop codon] + DNA codons translated to amino acids to form a protein Non-coding RNAs (NcRNAs) No consistent patterns
More informationOutline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/1/18
Genome Evolution Outline 1. What: Patterns of Genome Evolution Carol Eunmi Lee Evolution 410 University of Wisconsin 2. Why? Evolution of Genome Complexity and the interaction between Natural Selection
More informationCHAPTERS 24-25: Evidence for Evolution and Phylogeny
CHAPTERS 24-25: Evidence for Evolution and Phylogeny 1. For each of the following, indicate how it is used as evidence of evolution by natural selection or shown as an evolutionary trend: a. Paleontology
More informationMegAlign Pro Pairwise Alignment Tutorials
MegAlign Pro Pairwise Alignment Tutorials All demo data for the following tutorials can be found in the MegAlignProAlignments.zip archive here. Tutorial 1: Multiple versus pairwise alignments 1. Extract
More informationHomology and Information Gathering and Domain Annotation for Proteins
Homology and Information Gathering and Domain Annotation for Proteins Outline Homology Information Gathering for Proteins Domain Annotation for Proteins Examples and exercises The concept of homology The
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics Lecture : p he biological problem p lobal alignment p Local alignment p Multiple alignment 6 Background: comparative genomics p Basic question in biology: what properties
More informationLecture Notes: BIOL2007 Molecular Evolution
Lecture Notes: BIOL2007 Molecular Evolution Kanchon Dasmahapatra (k.dasmahapatra@ucl.ac.uk) Introduction By now we all are familiar and understand, or think we understand, how evolution works on traits
More informationBustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #
Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Details of PRF Methodology In the Poisson Random Field PRF) model, it is assumed that non-synonymous mutations at a given gene are either
More informationExploring Evolution & Bioinformatics
Chapter 6 Exploring Evolution & Bioinformatics Jane Goodall The human sequence (red) differs from the chimpanzee sequence (blue) in only one amino acid in a protein chain of 153 residues for myoglobin
More informationCISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)
CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) Contents Alignment algorithms Needleman-Wunsch (global alignment) Smith-Waterman (local alignment) Heuristic algorithms FASTA BLAST
More informationSupporting Information
Supporting Information Das et al. 10.1073/pnas.1302500110 < SP >< LRRNT > < LRR1 > < LRRV1 > < LRRV2 Pm-VLRC M G F V V A L L V L G A W C G S C S A Q - R Q R A C V E A G K S D V C I C S S A T D S S P E
More informationPhylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?
Phylogeny and systematics Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Phylogeny: the evolutionary history of a species
More informationOrganic Chemistry Option II: Chemical Biology
Organic Chemistry Option II: Chemical Biology Recommended books: Dr Stuart Conway Department of Chemistry, Chemistry Research Laboratory, University of Oxford email: stuart.conway@chem.ox.ac.uk Teaching
More informationSequence Alignment Techniques and Their Uses
Sequence Alignment Techniques and Their Uses Sarah Fiorentino Since rapid sequencing technology and whole genomes sequencing, the amount of sequence information has grown exponentially. With all of this
More informationIllegitimate translation causes unexpected gene expression from on-target out-of-frame alleles
Illegitimate translation causes unexpected gene expression from on-target out-of-frame alleles created by CRISPR-Cas9 Shigeru Makino, Ryutaro Fukumura, Yoichi Gondo* Mutagenesis and Genomics Team, RIKEN
More informationOrganization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p
Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p.110-114 Arrangement of information in DNA----- requirements for RNA Common arrangement of protein-coding genes in prokaryotes=
More informationQ1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate.
OEB 242 Exam Practice Problems Answer Key Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate. First, recall
More informationIntroduction to protein alignments
Introduction to protein alignments Comparative Analysis of Proteins Experimental evidence from one or more proteins can be used to infer function of related protein(s). Gene A Gene X Protein A compare
More informationChapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships
Chapter 26: Phylogeny and the Tree of Life You Must Know The taxonomic categories and how they indicate relatedness. How systematics is used to develop phylogenetic trees. How to construct a phylogenetic
More informationSupporting Information
Supporting Information Weghorn and Lässig 10.1073/pnas.1210887110 SI Text Null Distributions of Nucleosome Affinity and of Regulatory Site Content. Our inference of selection is based on a comparison of
More informationMarkov Models & DNA Sequence Evolution
7.91 / 7.36 / BE.490 Lecture #5 Mar. 9, 2004 Markov Models & DNA Sequence Evolution Chris Burge Review of Markov & HMM Models for DNA Markov Models for splice sites Hidden Markov Models - looking under
More informationspecies Nicolas Bargues and Emmanuelle Lerat *
Bargues and Lerat Mobile DNA (2017) 8:7 DOI 10.1186/s13100-017-0090-3 RESEARCH Open Access Evolutionary history of LTRretrotransposons among 20 Drosophila species Nicolas Bargues and Emmanuelle Lerat *
More informationCopyright Mark Brandt, Ph.D A third method, cryogenic electron microscopy has seen increasing use over the past few years.
Structure Determination and Sequence Analysis The vast majority of the experimentally determined three-dimensional protein structures have been solved by one of two methods: X-ray diffraction and Nuclear
More informationMotivating the need for optimal sequence alignments...
1 Motivating the need for optimal sequence alignments... 2 3 Note that this actually combines two objectives of optimal sequence alignments: (i) use the score of the alignment o infer homology; (ii) use
More informationComparative Bioinformatics Midterm II Fall 2004
Comparative Bioinformatics Midterm II Fall 2004 Objective Answer, part I: For each of the following, select the single best answer or completion of the phrase. (3 points each) 1. Deinococcus radiodurans
More informationComparative Genomics II
Comparative Genomics II Advances in Bioinformatics and Genomics GEN 240B Jason Stajich May 19 Comparative Genomics II Slide 1/31 Outline Introduction Gene Families Pairwise Methods Phylogenetic Methods
More informationGCD3033:Cell Biology. Transcription
Transcription Transcription: DNA to RNA A) production of complementary strand of DNA B) RNA types C) transcription start/stop signals D) Initiation of eukaryotic gene expression E) transcription factors
More informationBiology. Biology. Slide 1 of 26. End Show. Copyright Pearson Prentice Hall
Biology Biology 1 of 26 Fruit fly chromosome 12-5 Gene Regulation Mouse chromosomes Fruit fly embryo Mouse embryo Adult fruit fly Adult mouse 2 of 26 Gene Regulation: An Example Gene Regulation: An Example
More informationMathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007
-2 Transcript Alignment Assembly and Automated Gene Structure Improvements Using PASA-2 Mathangi Thiagarajan mathangi@jcvi.org Rice Genome Annotation Workshop May 23rd, 2007 About PASA PASA is an open
More informationComputational Identification of Evolutionarily Conserved Exons
Computational Identification of Evolutionarily Conserved Exons Adam Siepel Center for Biomolecular Science and Engr. University of California Santa Cruz, CA 95064, USA acs@soe.ucsc.edu David Haussler Howard
More informationIntroduction to Hidden Markov Models for Gene Prediction ECE-S690
Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Outline Markov Models The Hidden Part How can we use this for gene prediction? Learning Models Want to recognize patterns (e.g. sequence
More informationFrom gene to protein. Premedical biology
From gene to protein Premedical biology Central dogma of Biology, Molecular Biology, Genetics transcription replication reverse transcription translation DNA RNA Protein RNA chemically similar to DNA,
More information1. Contains the sugar ribose instead of deoxyribose. 2. Single-stranded instead of double stranded. 3. Contains uracil in place of thymine.
Protein Synthesis & Mutations RNA 1. Contains the sugar ribose instead of deoxyribose. 2. Single-stranded instead of double stranded. 3. Contains uracil in place of thymine. RNA Contains: 1. Adenine 2.
More informationBLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010
BLAST Database Searching BME 110: CompBio Tools Todd Lowe April 8, 2010 Admin Reading: Read chapter 7, and the NCBI Blast Guide and tutorial http://www.ncbi.nlm.nih.gov/blast/why.shtml Read Chapter 8 for
More informationInterpolated Markov Models for Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey
Interpolated Markov Models for Gene Finding BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2015 Colin Dewey cdewey@biostat.wisc.edu Goals for Lecture the key concepts to understand are the following the
More information