6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008
|
|
- Jessie Henderson
- 6 years ago
- Views:
Transcription
1 MIT OpenCourseWare / Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit:
2 Computational Biology: Genomes, Networks, Evolution Computational Gene Prediction and Generalized Hidden Markov Models Lecture 8 September 30, 2008
3 Today Gene Prediction Overview HMMs for Gene Prediction GHMMs for Gene Prediction Genscan
4 Genome Annotation Genome Sequence RNA Transcription Translation Protein
5 Eukaryotic Gene Structure ATG complete mrna coding segment TGA exon intron exon intron exon ATG... GT AG... GT AG... TGA start codon donor site acceptor site donor site acceptor site stop codon Courtesy of William Majoros. Used with permission.
6 Translation Chain of amino acids transfer RNA one amino acid Anti-codon (3 bases) Codon (3 bases) messenger RNA (mrna) Courtesy of William Majoros.Used with permission. Ribosome (performs translation)
7 Genetic Code Acid A C GCA GCC GCG GCT G H GGA GGC GGG GGT M N ATG S T AGC AGT TCA TCC TCG TCT ACA D E F Codons Acid Codons Acid Codons Acid Codons TGC TGT CAC CAT AAC AAT ACT GAC GAT I K ATA ATC ATT P Q CCA CCC CCG CCT V W GTA GTC GTG GTT TGG GAA GAG TTC TTT L AAA AAG CTA CTC CTG CTT TTA R CAA CAG AGA AGG CGA CGC CGG CGT Y ACC ACG TAC TAT Each amino acid is encoded by one or more codons. Each codon encodes a single amino acid. The third position of the codon is the most likely to vary, for a given amino acid. Figure by MIT OpenCourseWare.
8 Gene Prediction as Parsing Given a genome sequence, we wish to label each nucleotide according to the parts of genes Exon, intron, intergenic, etc The sequence of labels must follow the syntax of genes e.g. exons must be followed by introns or intergenic not by other exons We wish to find the optimal parsing of a sequence by some measure
9 Features A feature is any DNA subsequence of biological significance. For practical reasons, we recognize two broad classes of features: signals short, fixed-length features content regions variable-length features Courtesy of William Majoros. Used with permission.
10 Signals Associated with short fixed(-ish) length sequences Start Codon - ATG Stop Codons TAA, TAG,TGA 5 3 E I E I E 5 Splice Site (Acceptor) 3 Splice Site (Donor)
11 Content Regions Content regions often have characteristic base composition Example Recall: often multiple codons for each amino acid All codons are not used equally Characteristic higher order nucleotide statistics in coding sequences (hexanucleotides) P exon (X i X i-1, X i-2, X i-3, X i-4, X i-5 ) 5 3 E I E I E P(X i X i-1, X i-2, X i-3, X i-4, X i-5 ) = P(X i )
12 Extrinsic Evidence intron exon Gene Gene Prediction Algorithms BLAST Hits HMMer Domains EST Alignments Neurospora crassa (a fungus)
13 HMMs for Gene Prediction States correspond to gene and genomic regions (exons, introns, intergenic, etc) State transitions ensure legal parses Emission matrices describe nucleotide statistics for each state
14 A (Very) Simple HMM Donor T Intron Acceptor A Donor G Acceptor G the Markov model: Start Codon G Start Codon T Exon Stop Codon G Stop Codon T Start Codon A Intergenic Stop Codon A q 0 Courtesy of William Majoros. Used with permission.
15 A Generative Model We can use this HMM to generate a sequence and state labeling The initial state is q 0 Donor T Intron Acceptor A Choose a subsequent state, conditioned on the current state, according to a jk =P(q k q j ) Choose a nucleotide to emit from the state emissions matrix e k (X i ) Donor G Start Codon G Start Codon T Exon Acceptor G Stop Codon G Stop Codon T Repeat until number of nucleotides equals desired length of sequence Start Codon A Intergenic Stop Codon A q 0
16 But We Usually Have the Sequence Donor T Intron Acceptor A Donor G Acceptor G the Markov model: Start Codon G Start Codon T Exon Stop Codon G Stop Codon T Start Codon A Intergenic Stop Codon A q 0 the input sequence: AGCTAGCAGTATGTCATGGCATGTTCGGAGGTAGTACGTAGAGGTAGCTAGTATAGGTCGATAGTACGCGA What is the best state labeling? Courtesy of William Majoros. Used with permission.
17 Finding The Most Likely Path A sensible choice is to choose π that maximizes P[π x] This is equivalent to finding path π* that maximizes total joint probability P[ x, π ]: P(x,π) = a 0π1 * Π i e πi (x i ) a πi π i+1 start emission transition How do we select π efficiently?
18 A (Very) Simple HMM Donor T Intron Acceptor A Donor G Acceptor G the Markov model: Start Codon G Start Codon T Exon Stop Codon G Stop Codon T Start Codon A Intergenic Stop Codon A q 0 the input sequence: the most probable path: the gene prediction: AGCTAGCAGTATGTCATGGCATGTTCGGAGGTAGTACGTAGAGGTAGCTAGTATAGGTCGATAGTACGCGA Courtesy of William Majoros. Used with permission. exon 1 exon 2 exon 3
19 A Real HMM Gene Predictor Title page of journal article removed due to copyright restrictions. The article is the following: Krogh, Anders, I. Saira Mian, and David Haussler. "A Hidden Markov Model That Finds Genes in E.coli DNA." Nucleic Acids Research 22, no. 22 (1994):
20 HMM Limitations The HMM framework imposes constraints on state paths Donor T Intron Acceptor A Donor G Acceptor G Start Codon G Exon Stop Codon G Start Codon T Stop Codon T Start Codon A Intergenic Stop Codon A q 0
21 Human Exon Lengths Courtesy of Christopher Burge. Used with permission. Burge, MIT PhD Thesis
22 Fungal Intron Lengths 50% 40% N. crassa 50% 40% S. cerevisiae 30% 30% 20% 20% 10% 10% 0% 50% 40% C. neoformans 0% 50% 40% S. pombe 30% 30% 20% 20% 10% 10% 0% 0% Nucleotides Nucleotides Figure by MIT OpenCourseWare. Kupfer et al., (2004) Eukaryotic Cell
23 HMM Emissions Donor T Intron Acceptor A HMMs typically emit a single nucleotide per state Donor G Acceptor G Exon P(T)=1 P(A),P(G),P(C)=0 Intergenic q 0
24 Generalized HMMs (GHMMs) GHMMs emit more than one symbol per state Emissions probabilities modeled by any arbitrary probabilistic model Feature lengths are explicitly modeled W.H. Majaros ( Courtesy of William Majoros. Used with permission. Human Introns Courtesy of Elsevier, Inc. Used with permission. Burge, Karlin (1997)
25 GHMM Elements States Observations Initial state probabilities Transition Probabilities Q V π i = P(q 0 =i) a jk = P(q i =k q i-1 =j) Like HMMs Duration Probabilities f k (d)=p(state k of length d) Emission Probabilities e k (X α,α+d )=P k (X α X α+d q k,d) Now emit a subsequence
26 Model Abstraction in GHMMs W.H. Majaros ( Models must return the probability of a subsequence given a state and duration Courtesy of William Majoros. Used with permission.
27 GHMM Submodel Examples 1. WMM (Weight Matrix) L 1 i=0 P i (x i ) 2. Nth-order Markov Chain (MC) n 1 i=0 L 1 P(x i x 0...x i 1 ) P(x i x i n...x i 1 ) i=n 3. Three-Periodic Markov Chain (3PMC) 5. Codon Bias n 1 i=0 P(x α+3i x α+3i+1 x α+3i+2 ) L 1 i=0 P ( f + i)(mod 3) (x i ) 6. MDD Ref: Burge C (1997) Identification of complete gene structures in human genomic DNA. PhD thesis. Stanford University. 7. Interpolated Markov Model Ref: Salzberg SL, Delcher AL, Kasif S, White O (1998) Microbial gene identification using interpolated Markov models. Nucleic Acids Research 26: P IMM e (s g 0...g k 1 ) = λ G k P e (s g 0...g k 1 )+ (1 λ G k )P IMM e (s g 1...g k 1 ) if k > 0 P e (s) if k = 0 Courtesy of William Majoros. Used with permission.
28 GHMMs as Generative Models Like HMM, we can use a GHMM to generate a sequence and state labeling Choose initial state, q 1, from π i Choose a duration d 1, from length distribution f q1 (d) Generate a subsequence from state model e q1 (X 1,d ) Choose a subsequent state, q 2, conditioned on the current state, according to a jk =P(q k q j ) Repeat until number of nucleotides equals desired length of sequence Given a sequence, how do we pick a state labeling (segmentation)?
29 The Viterbi Algorithm - HMMs State 1 2 V k (i) K x 1 x 2 x 3..x N Input: x = x1 xn Initialization: V 0 (0)=1, V k (0) = 0, for all k > 0 Iteration: V k (i) = e K (x i ) max j a jk V j (i-1) Termination: P(x, π*) = max k V k (N) Traceback: Follow max pointers back In practice: Use log scores for computation Running time and space: Time: O(K 2 N) Space: O(KN)
30 HMM Viterbi Recursion hidden states V j (i-1) j a jk k e k V k (i) observations x i-1 x i Assume we know V j for the previous time step (i-1) Calculate V k (i) = e k (x i ) * max j ( V j (i-1) a jk ) current max this emission max ending in state j at step i-1 Transition from state j all possible previous states j
31 GHMM Recursion hidden states V j (i-1) j a jk k e k V k (i) observations x i-1 x i We could have come from state j at the last position
32 GHMM Recursion But we could also have come from state j 4 positions ago (State k could have duration 4 so far)
33 GHMM Recursion In general, we could have come from state j d positions ago (State k could have duration d so far)
34 GHMM Recursion This leads to the following recursion equation: () max max ( ) ( ) ( ) V i = P x x k V i d P d k a k i i d j jk j d Max over all prev states and state durations Prob of subsequence given state k Max ending in state j at step i-d Prob that state k has duration d Transition from j->k
35 Comparing GHMMs and HMMs HMM O(K 2 L) V k (i) = e k (x i ) * max j ( V j (i-1) a jk ) current max this emission max ending in state j at step i-1 Transition from state j GHMM O(K 2 L 3 ) all possible previous states j Similar modifications needed for forward and backward algorithms () max max ( ) ( ) ( ) V i = P x x k V i d P d k a k i i d j jk j d Max over all prev states and state durations Prob of subsequence given state k Max ending in state j at step i-d Prob that state k has duration d Transition from j->k
36 Courtesy of Elsevier, Inc. Used with permission.
37 Genscan - Burge and Karlin, (1997) Explicit State Duration GHMM 5 th order markov models for coding and non-coding sequences Each CDS frame has own model WAM models for start/stop codons and acceptor sites MDD model for donor sites Separate parameters for regions of different GC content Model +/- strand concurrently Courtesy of Elsevier, Inc. Used with permission.
38 Types of Exons Three types of exons are defined, for convenience: initial exons extend from a start codon to the first donor site; internal exons extend from one acceptor site to the next donor site; final exons extend from the last acceptor site to the stop codon; single exons (which occur only in intronless genes) extend from the start codon to the stop codon: Courtesy of William Majoros. Used with permission.
39 Intron and Exon Phase Codon Phase 0 Intron Exon Intron Exon Phase 0 Exon Phase 1 Intron Phase 1 Exon Exon Intron Exon Phase 2 Intron Phase 2 Exon Exon Intron Exon
40 Two State Types in Genscan D-type represented by diamonds C-type represented by circles D states are always followed by C and vice versa C states always preceded by same D- state Courtesy of Elsevier, Inc. Used with permission.
41 Two State Types in Genscan D-Type geometric length distribution Intergenic regions UTRs Introns P Xa, c = P ( Xa, b) P ( Xb+ 1, c) ( ) k k k f ( d) = P(state duration d state k)=p f ( d 1) k Sequence models are factorable : k k Increasing duration by one changes probability by constant factor
42 Two State Types in Genscan C-Type general length distributions and sequence generating modes Exons (initial, internal, terminal) Promotors Poly-A Signals
43 Genscan Inference Genscan uses same basic forward, backward, and viterbi algorithms as generic GHMMs But assumptions about C, and D states reduce algorithmic complexity
44 Genscan Inference C-state List D states State 1 2 V k (i) C states K C2 C1 x 1 x 2 x 3..x N L k (i) C1, duration 2, previous D-state=1 C2, duration 3, previous D-state=2
45 Genscan Viterbi Induction Max from previous step in this state Extend D-type state by one step (factorable state) Probability of emiting one more nucleotide from state k Was in state K in previous step ( ) ( ) V i+ 1 = max [ V i p e ( X ), k k k k i+ 1 c c-type states ( ) c max { V i d 1 P(y c)(1-p ) P(d c)p(x..x c) P(k c) p e ( X )}] Max probability of ending D type state y c at position i-d-1 y c y i-d i k k i+ 1 c Probability of transition from y c to c Terminate D- type state length Probability that C state was duration d Probability of subsequence of length d from state c Probability of transition from c to k Probability of nucleotide from state k Just transitioned from c-type state c of duration d which previously transitioned from D-type state y c
46 The gene finder will later be deployed for use in predicting the rest of the organism s genes. The way in which the model parameters are inferred during training can significantly affect the accuracy of the deployed program. Courtesy of William Majoros. Used with permission. Training A Gene Predictor During training of a gene finder, only a subset K of an organism s gene set will be available for training:
47 Training A Gene Predictor Courtesy of William Majoros. Used with permission.
48 Gene Prediction Accuracy Gene predictions can be evaluated in terms of true positives (predicted features that are real), true negatives (non-predicted features that are not real), false positives (predicted features that are not real), and false negatives (real features that were not predicted: These definitions can be applied at the whole-gene, whole-exon, or individual nucleotide level to arrive at three sets of statistics. Courtesy of William Majoros. Used with permission.
49 Sn = Accuracy Metrics SMC = TP TP + FN Sp = TP TP + FP TP +TN TP +TN + FP + FN F = 2 Sn Sp Sn + Sp CC = (TP TN ) (FN FP) (TP + FN ) (TN + FP) (TP + FP) (TN + FN ). ACP = 1 n TP TP + FN + TP TP + FP + TN TN + FP + TN TN + FN, AC = 2(ACP 0.5). Courtesy of William Majoros. Used with permission.
50 More Information sroom.html
Practical Bioinformatics
5/2/2017 Dictionaries d i c t i o n a r y = { A : T, T : A, G : C, C : G } d i c t i o n a r y [ G ] d i c t i o n a r y [ N ] = N d i c t i o n a r y. h a s k e y ( C ) Dictionaries g e n e t i c C o
More informationSEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA
SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS 1 Prokaryotes and Eukaryotes 2 DNA and RNA 3 4 Double helix structure Codons Codons are triplets of bases from the RNA sequence. Each triplet defines an amino-acid.
More informationCrick s early Hypothesis Revisited
Crick s early Hypothesis Revisited Or The Existence of a Universal Coding Frame Ryan Rossi, Jean-Louis Lassez and Axel Bernal UPenn Center for Bioinformatics BIOINFORMATICS The application of computer
More informationSUPPORTING INFORMATION FOR. SEquence-Enabled Reassembly of β-lactamase (SEER-LAC): a Sensitive Method for the Detection of Double-Stranded DNA
SUPPORTING INFORMATION FOR SEquence-Enabled Reassembly of β-lactamase (SEER-LAC): a Sensitive Method for the Detection of Double-Stranded DNA Aik T. Ooi, Cliff I. Stains, Indraneel Ghosh *, David J. Segal
More informationSupplementary Information for
Supplementary Information for Evolutionary conservation of codon optimality reveals hidden signatures of co-translational folding Sebastian Pechmann & Judith Frydman Department of Biology and BioX, Stanford
More informationHigh throughput near infrared screening discovers DNA-templated silver clusters with peak fluorescence beyond 950 nm
Electronic Supplementary Material (ESI) for Nanoscale. This journal is The Royal Society of Chemistry 2018 High throughput near infrared screening discovers DNA-templated silver clusters with peak fluorescence
More informationAdvanced topics in bioinformatics
Feinberg Graduate School of the Weizmann Institute of Science Advanced topics in bioinformatics Shmuel Pietrokovski & Eitan Rubin Spring 2003 Course WWW site: http://bioinformatics.weizmann.ac.il/courses/atib
More informationClay Carter. Department of Biology. QuickTime and a TIFF (Uncompressed) decompressor are needed to see this picture.
QuickTime and a TIFF (Uncompressed) decompressor are needed to see this picture. Clay Carter Department of Biology QuickTime and a TIFF (LZW) decompressor are needed to see this picture. Ornamental tobacco
More informationRegulatory Sequence Analysis. Sequence models (Bernoulli and Markov models)
Regulatory Sequence Analysis Sequence models (Bernoulli and Markov models) 1 Why do we need random models? Any pattern discovery relies on an underlying model to estimate the random expectation. This model
More informationSupplemental data. Pommerrenig et al. (2011). Plant Cell /tpc
Supplemental Figure 1. Prediction of phloem-specific MTK1 expression in Arabidopsis shoots and roots. The images and the corresponding numbers showing absolute (A) or relative expression levels (B) of
More informationSUPPLEMENTARY DATA - 1 -
- 1 - SUPPLEMENTARY DATA Construction of B. subtilis rnpb complementation plasmids For complementation, the B. subtilis rnpb wild-type gene (rnpbwt) under control of its native rnpb promoter and terminator
More informationCharacterization of Pathogenic Genes through Condensed Matrix Method, Case Study through Bacterial Zeta Toxin
International Journal of Genetic Engineering and Biotechnology. ISSN 0974-3073 Volume 2, Number 1 (2011), pp. 109-114 International Research Publication House http://www.irphouse.com Characterization of
More informationNSCI Basic Properties of Life and The Biochemistry of Life on Earth
NSCI 314 LIFE IN THE COSMOS 4 Basic Properties of Life and The Biochemistry of Life on Earth Dr. Karen Kolehmainen Department of Physics CSUSB http://physics.csusb.edu/~karen/ WHAT IS LIFE? HARD TO DEFINE,
More informationDNA Feature Sensors. B. Majoros
DNA Feature Sensors B. Majoros What is Feature Sensing? A feature is any DNA subsequence of biological significance. For practical reasons, we recognize two broad classes of features: signals short, fixed-length
More informationSSR ( ) Vol. 48 No ( Microsatellite marker) ( Simple sequence repeat,ssr),
48 3 () Vol. 48 No. 3 2009 5 Journal of Xiamen University (Nat ural Science) May 2009 SSR,,,, 3 (, 361005) : SSR. 21 516,410. 60 %96. 7 %. (),(Between2groups linkage method),.,, 11 (),. 12,. (, ), : 0.
More informationElectronic supplementary material
Applied Microbiology and Biotechnology Electronic supplementary material A family of AA9 lytic polysaccharide monooxygenases in Aspergillus nidulans is differentially regulated by multiple substrates and
More informationNumber-controlled spatial arrangement of gold nanoparticles with
Electronic Supplementary Material (ESI) for RSC Advances. This journal is The Royal Society of Chemistry 2016 Number-controlled spatial arrangement of gold nanoparticles with DNA dendrimers Ping Chen,*
More informationTM1 TM2 TM3 TM4 TM5 TM6 TM bp
a 467 bp 1 482 2 93 3 321 4 7 281 6 21 7 66 8 176 19 12 13 212 113 16 8 b ATG TCA GGA CAT GTA ATG GAG GAA TGT GTA GTT CAC GGT ACG TTA GCG GCA GTA TTG CGT TTA ATG GGC GTA GTG M S G H V M E E C V V H G T
More informationNature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1
Supplementary Figure 1 Zn 2+ -binding sites in USP18. (a) The two molecules of USP18 present in the asymmetric unit are shown. Chain A is shown in blue, chain B in green. Bound Zn 2+ ions are shown as
More informationTable S1. Primers and PCR conditions used in this paper Primers Sequence (5 3 ) Thermal conditions Reference Rhizobacteria 27F 1492R
Table S1. Primers and PCR conditions used in this paper Primers Sequence (5 3 ) Thermal conditions Reference Rhizobacteria 27F 1492R AAC MGG ATT AGA TAC CCK G GGY TAC CTT GTT ACG ACT T Detection of Candidatus
More informationModelling and Analysis in Bioinformatics. Lecture 1: Genomic k-mer Statistics
582746 Modelling and Analysis in Bioinformatics Lecture 1: Genomic k-mer Statistics Juha Kärkkäinen 06.09.2016 Outline Course introduction Genomic k-mers 1-Mers 2-Mers 3-Mers k-mers for Larger k Outline
More informationSupplementary Information
Electronic Supplementary Material (ESI) for RSC Advances. This journal is The Royal Society of Chemistry 2014 Directed self-assembly of genomic sequences into monomeric and polymeric branched DNA structures
More informationSupplemental Figure 1.
A wt spoiiiaδ spoiiiahδ bofaδ B C D E spoiiiaδ, bofaδ Supplemental Figure 1. GFP-SpoIVFA is more mislocalized in the absence of both BofA and SpoIIIAH. Sporulation was induced by resuspension in wild-type
More informationSupporting Information
Supporting Information T. Pellegrino 1,2,3,#, R. A. Sperling 1,#, A. P. Alivisatos 2, W. J. Parak 1,2,* 1 Center for Nanoscience, Ludwig Maximilians Universität München, München, Germany 2 Department of
More informationevoglow - express N kit distributed by Cat.#: FP product information broad host range vectors - gram negative bacteria
evoglow - express N kit broad host range vectors - gram negative bacteria product information distributed by Cat.#: FP-21020 Content: Product Overview... 3 evoglow express N -kit... 3 The evoglow -Fluorescent
More informationSupporting Information for. Initial Biochemical and Functional Evaluation of Murine Calprotectin Reveals Ca(II)-
Supporting Information for Initial Biochemical and Functional Evaluation of Murine Calprotectin Reveals Ca(II)- Dependence and Its Ability to Chelate Multiple Nutrient Transition Metal Ions Rose C. Hadley,
More informationEvolvable Neural Networks for Time Series Prediction with Adaptive Learning Interval
Evolvable Neural Networs for Time Series Prediction with Adaptive Learning Interval Dong-Woo Lee *, Seong G. Kong *, and Kwee-Bo Sim ** *Department of Electrical and Computer Engineering, The University
More informationBuilding a Multifunctional Aptamer-Based DNA Nanoassembly for Targeted Cancer Therapy
Supporting Information Building a Multifunctional Aptamer-Based DNA Nanoassembly for Targeted Cancer Therapy Cuichen Wu,, Da Han,, Tao Chen,, Lu Peng, Guizhi Zhu,, Mingxu You,, Liping Qiu,, Kwame Sefah,
More informationProtein Threading. Combinatorial optimization approach. Stefan Balev.
Protein Threading Combinatorial optimization approach Stefan Balev Stefan.Balev@univ-lehavre.fr Laboratoire d informatique du Havre Université du Havre Stefan Balev Cours DEA 30/01/2004 p.1/42 Outline
More information6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008
MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, etworks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationSUPPLEMENTARY INFORMATION
SUPPLEMENTARY INFORMATION DOI:.38/NCHEM.246 Optimizing the specificity of nucleic acid hyridization David Yu Zhang, Sherry Xi Chen, and Peng Yin. Analytic framework and proe design 3.. Concentration-adjusted
More information3. Evolution makes sense of homologies. 3. Evolution makes sense of homologies. 3. Evolution makes sense of homologies
Richard Owen (1848) introduced the term Homology to refer to structural similarities among organisms. To Owen, these similarities indicated that organisms were created following a common plan or archetype.
More informationevoglow - express N kit Cat. No.: product information broad host range vectors - gram negative bacteria
evoglow - express N kit broad host range vectors - gram negative bacteria product information Cat. No.: 2.1.020 evocatal GmbH 2 Content: Product Overview... 4 evoglow express N kit... 4 The evoglow Fluorescent
More informationWhy do more divergent sequences produce smaller nonsynonymous/synonymous
Genetics: Early Online, published on June 21, 2013 as 10.1534/genetics.113.152025 Why do more divergent sequences produce smaller nonsynonymous/synonymous rate ratios in pairwise sequence comparisons?
More informationSUPPLEMENTARY INFORMATION
DOI:.8/NCHEM. Conditionally Fluorescent Molecular Probes for Detecting Single Base Changes in Double-stranded DNA Sherry Xi Chen, David Yu Zhang, Georg Seelig. Analytic framework and probe design.. Design
More informationCodon Distribution in Error-Detecting Circular Codes
life Article Codon Distribution in Error-Detecting Circular Codes Elena Fimmel, * and Lutz Strüngmann Institute for Mathematical Biology, Faculty of Computer Science, Mannheim University of Applied Sciences,
More informationEvolutionary dynamics of abundant stop codon readthrough in Anopheles and Drosophila
biorxiv preprint first posted online May. 3, 2016; doi: http://dx.doi.org/10.1101/051557. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved.
More informationEncoding of Amino Acids and Proteins from a Communications and Information Theoretic Perspective
Jacobs University Bremen Encoding of Amino Acids and Proteins from a Communications and Information Theoretic Perspective Semester Project II By: Dawit Nigatu Supervisor: Prof. Dr. Werner Henkel Transmission
More information6.047/6.878/HST.507 Computational Biology: Genomes, Networks, Evolution. Lecture 05. Hidden Markov Models Part II
6.047/6.878/HST.507 Computational Biology: Genomes, Networks, Evolution Lecture 05 Hidden Markov Models Part II 1 2 Module 1: Aligning and modeling genomes Module 1: Computational foundations Dynamic programming:
More informationThe 3 Genomic Numbers Discovery: How Our Genome Single-Stranded DNA Sequence Is Self-Designed as a Numerical Whole
Applied Mathematics, 2013, 4, 37-53 http://dx.doi.org/10.4236/am.2013.410a2004 Published Online October 2013 (http://www.scirp.org/journal/am) The 3 Genomic Numbers Discovery: How Our Genome Single-Stranded
More informationpart 3: analysis of natural selection pressure
part 3: analysis of natural selection pressure markov models are good phenomenological codon models do have many benefits: o principled framework for statistical inference o avoiding ad hoc corrections
More informationThe Trigram and other Fundamental Philosophies
The Trigram and other Fundamental Philosophies by Weimin Kwauk July 2012 The following offers a minimal introduction to the trigram and other Chinese fundamental philosophies. A trigram consists of three
More informationInterpolated Markov Models for Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey
Interpolated Markov Models for Gene Finding BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2015 Colin Dewey cdewey@biostat.wisc.edu Goals for Lecture the key concepts to understand are the following the
More informationSupplemental Table 1. Primers used for cloning and PCR amplification in this study
Supplemental Table 1. Primers used for cloning and PCR amplification in this study Target Gene Primer sequence NATA1 (At2g393) forward GGG GAC AAG TTT GTA CAA AAA AGC AGG CTT CAT GGC GCC TCC AAC CGC AGC
More informationThe role of the FliD C-terminal domain in pentamer formation and
The role of the FliD C-terminal domain in pentamer formation and interaction with FliT Hee Jung Kim 1,2,*, Woongjae Yoo 3,*, Kyeong Sik Jin 4, Sangryeol Ryu 3,5 & Hyung Ho Lee 1, 1 Department of Chemistry,
More informationRe- engineering cellular physiology by rewiring high- level global regulatory genes
Re- engineering cellular physiology by rewiring high- level global regulatory genes Stephen Fitzgerald 1,2,, Shane C Dillon 1, Tzu- Chiao Chao 2, Heather L Wiencko 3, Karsten Hokamp 3, Andrew DS Cameron
More informationO 3 O 4 O 5. q 3. q 4. Transition
Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in
More informationSex-Linked Inheritance in Macaque Monkeys: Implications for Effective Population Size and Dispersal to Sulawesi
Supporting Information http://www.genetics.org/cgi/content/full/genetics.110.116228/dc1 Sex-Linked Inheritance in Macaque Monkeys: Implications for Effective Population Size and Dispersal to Sulawesi Ben
More informationChemiScreen CaS Calcium Sensor Receptor Stable Cell Line
PRODUCT DATASHEET ChemiScreen CaS Calcium Sensor Receptor Stable Cell Line CATALOG NUMBER: HTS137C CONTENTS: 2 vials of mycoplasma-free cells, 1 ml per vial. STORAGE: Vials are to be stored in liquid N
More informationUsing algebraic geometry for phylogenetic reconstruction
Using algebraic geometry for phylogenetic reconstruction Marta Casanellas i Rius (joint work with Jesús Fernández-Sánchez) Departament de Matemàtica Aplicada I Universitat Politècnica de Catalunya IMA
More informationFrom DNA to protein, i.e. the central dogma
From DNA to protein, i.e. the central dogma DNA RNA Protein Biochemistry, chapters1 5 and Chapters 29 31. Chapters 2 5 and 29 31 will be covered more in detail in other lectures. ph, chapter 1, will be
More informationThe Computational Problem. We are given a sequence of DNA and we wish to know which subsequence or concatenation of subsequences constitutes a gene.
GENE FINDING The Computational Problem We are given a sequence of DNA and we wish to know which subsequence or concatenation of subsequences constitutes a gene. The Computational Problem Confounding Realities:
More informationTiming molecular motion and production with a synthetic transcriptional clock
Timing molecular motion and production with a synthetic transcriptional clock Elisa Franco,1, Eike Friedrichs 2, Jongmin Kim 3, Ralf Jungmann 2, Richard Murray 1, Erik Winfree 3,4,5, and Friedrich C. Simmel
More informationNear-instant surface-selective fluorogenic protein quantification using sulfonated
Electronic Supplementary Material (ESI) for rganic & Biomolecular Chemistry. This journal is The Royal Society of Chemistry 2014 Supplemental nline Materials for ear-instant surface-selective fluorogenic
More informationHMMs and biological sequence analysis
HMMs and biological sequence analysis Hidden Markov Model A Markov chain is a sequence of random variables X 1, X 2, X 3,... That has the property that the value of the current state depends only on the
More informationTHE MATHEMATICAL STRUCTURE OF THE GENETIC CODE: A TOOL FOR INQUIRING ON THE ORIGIN OF LIFE
STATISTICA, anno LXIX, n. 2 3, 2009 THE MATHEMATICAL STRUCTURE OF THE GENETIC CODE: A TOOL FOR INQUIRING ON THE ORIGIN OF LIFE Diego Luis Gonzalez CNR-IMM, Bologna Section, Via Gobetti 101, I-40129, Bologna,
More informationSupplementary Information
Supplementary Information Arginine-rhamnosylation as new strategy to activate translation elongation factor P Jürgen Lassak 1,2,*, Eva Keilhauer 3, Max Fürst 1,2, Kristin Wuichet 4, Julia Gödeke 5, Agata
More informationHidden Markov Models (I)
GLOBEX Bioinformatics (Summer 2015) Hidden Markov Models (I) a. The model b. The decoding: Viterbi algorithm Hidden Markov models A Markov chain of states At each state, there are a set of possible observables
More informationpart 4: phenomenological load and biological inference. phenomenological load review types of models. Gαβ = 8π Tαβ. Newton.
2017-07-29 part 4: and biological inference review types of models phenomenological Newton F= Gm1m2 r2 mechanistic Einstein Gαβ = 8π Tαβ 1 molecular evolution is process and pattern process pattern MutSel
More informationComparative Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey
Comparative Gene Finding BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2015 Colin Dewey cdewey@biostat.wisc.edu Goals for Lecture the key concepts to understand are the following: using related genomes
More informationData Mining in Bioinformatics HMM
Data Mining in Bioinformatics HMM Microarray Problem: Major Objective n Major Objective: Discover a comprehensive theory of life s organization at the molecular level 2 1 Data Mining in Bioinformatics
More informationPathways and Controls of N 2 O Production in Nitritation Anammox Biomass
Supporting Information for Pathways and Controls of N 2 O Production in Nitritation Anammox Biomass Chun Ma, Marlene Mark Jensen, Barth F. Smets, Bo Thamdrup, Department of Biology, University of Southern
More informationAtTIL-P91V. AtTIL-P92V. AtTIL-P95V. AtTIL-P98V YFP-HPR
Online Resource 1. Primers used to generate constructs AtTIL-P91V, AtTIL-P92V, AtTIL-P95V and AtTIL-P98V and YFP(HPR) using overlapping PCR. pentr/d- TOPO-AtTIL was used as template to generate the constructs
More informationInsects act as vectors for a number of important diseases of
pubs.acs.org/synthbio Novel Synthetic Medea Selfish Genetic Elements Drive Population Replacement in Drosophila; a Theoretical Exploration of Medea- Dependent Population Suppression Omar S. Abari,,# Chun-Hong
More informationEvolutionary Analysis of Viral Genomes
University of Oxford, Department of Zoology Evolutionary Biology Group Department of Zoology University of Oxford South Parks Road Oxford OX1 3PS, U.K. Fax: +44 1865 271249 Evolutionary Analysis of Viral
More informationMarkov Models & DNA Sequence Evolution
7.91 / 7.36 / BE.490 Lecture #5 Mar. 9, 2004 Markov Models & DNA Sequence Evolution Chris Burge Review of Markov & HMM Models for DNA Markov Models for splice sites Hidden Markov Models - looking under
More information1/22/13. Example: CpG Island. Question 2: Finding CpG Islands
I529: Machine Learning in Bioinformatics (Spring 203 Hidden Markov Models Yuzhen Ye School of Informatics and Computing Indiana Univerty, Bloomington Spring 203 Outline Review of Markov chain & CpG island
More informationIntroduction to Molecular Phylogeny
Introduction to Molecular Phylogeny Starting point: a set of homologous, aligned DNA or protein sequences Result of the process: a tree describing evolutionary relationships between studied sequences =
More informationIntroduction to Hidden Markov Models for Gene Prediction ECE-S690
Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Outline Markov Models The Hidden Part How can we use this for gene prediction? Learning Models Want to recognize patterns (e.g. sequence
More informationIdentification of a Locus Involved in the Utilization of Iron by Haemophilus influenzae
INFECrION AND IMMUNITY, OCt. 1994, p. 4515-4525 0019-9567/94/$04.00+0 Copyright 1994, American Society for Microbiology Vol. 62, No. 10 Identification of a Locus Involved in the Utilization of Iron by
More informationMotif Finding Algorithms. Sudarsan Padhy IIIT Bhubaneswar
Motif Finding Algorithms Sudarsan Padhy IIIT Bhubaneswar Outline Gene Regulation Regulatory Motifs The Motif Finding Problem Brute Force Motif Finding Consensus and Pattern Branching: Greedy Motif Search
More informationChain-like assembly of gold nanoparticles on artificial DNA templates via Click Chemistry
Electronic Supporting Information: Chain-like assembly of gold nanoparticles on artificial DNA templates via Click Chemistry Monika Fischler, Alla Sologubenko, Joachim Mayer, Guido Clever, Glenn Burley,
More informationLecture 15: Programming Example: TASEP
Carl Kingsford, 0-0, Fall 0 Lecture : Programming Example: TASEP The goal for this lecture is to implement a reasonably large program from scratch. The task we will program is to simulate ribosomes moving
More informationSupporting Information. An Electric Single-Molecule Hybridisation Detector for short DNA Fragments
Supporting Information An Electric Single-Molecule Hybridisation Detector for short DNA Fragments A.Y.Y. Loh, 1 C.H. Burgess, 2 D.A. Tanase, 1 G. Ferrari, 3 M.A. Maclachlan, 2 A.E.G. Cass, 1 T. Albrecht*
More informationBiosynthesis of Bacterial Glycogen: Primary Structure of Salmonella typhimurium ADPglucose Synthetase as Deduced from the
JOURNAL OF BACTERIOLOGY, Sept. 1987, p. 4355-4360 0021-9193/87/094355-06$02.00/0 Copyright X) 1987, American Society for Microbiology Vol. 169, No. 9 Biosynthesis of Bacterial Glycogen: Primary Structure
More informationIt is the author's version of the article accepted for publication in the journal "Biosystems" on 03/10/2015.
It is the author's version of the article accepted for publication in the journal "Biosystems" on 03/10/2015. The system-resonance approach in modeling genetic structures Sergey V. Petoukhov Institute
More informationToday s Lecture: HMMs
Today s Lecture: HMMs Definitions Examples Probability calculations WDAG Dynamic programming algorithms: Forward Viterbi Parameter estimation Viterbi training 1 Hidden Markov Models Probability models
More informationFliZ Is a Posttranslational Activator of FlhD 4 C 2 -Dependent Flagellar Gene Expression
JOURNAL OF BACTERIOLOGY, July 2008, p. 4979 4988 Vol. 190, No. 14 0021-9193/08/$08.00 0 doi:10.1128/jb.01996-07 Copyright 2008, American Society for Microbiology. All Rights Reserved. FliZ Is a Posttranslational
More informationSymmetry Studies. Marlos A. G. Viana
Symmetry Studies Marlos A. G. Viana aaa aac aag aat caa cac cag cat aca acc acg act cca ccc ccg cct aga agc agg agt cga cgc cgg cgt ata atc atg att cta ctc ctg ctt gaa gac gag gat taa tac tag tat gca gcc
More informationEvidence for RNA editing in mitochondria of all major groups of
Proc. Natl. Acad. Sci. USA Vol. 91, pp. 629-633, January 1994 Plant Biology Evidence for RNA editing in mitochondria of all major groups of land plants except the Bryophyta RUDOLF HIESEL, BRUNO COMBETTES*,
More informationIntroduction to Hidden Markov Models (HMMs)
Introduction to Hidden Markov Models (HMMs) But first, some probability and statistics background Important Topics 1.! Random Variables and Probability 2.! Probability Distributions 3.! Parameter Estimation
More informationObjective: You will be able to justify the claim that organisms share many conserved core processes and features.
Objective: You will be able to justify the claim that organisms share many conserved core processes and features. Do Now: Read Enduring Understanding B Essential knowledge: Organisms share many conserved
More informationGenome Sequencing & DNA Sequence Analysis
7.91 / 7.36 / BE.490 Lecture #1 Feb. 24, 2004 Genome Sequencing & DNA Sequence Analysis Chris Burge What is a Genome? A genome is NOT a bag of proteins What s in the Human Genome? Outline of Unit II: DNA/RNA
More informationCISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II)
CISC 889 Bioinformatics (Spring 24) Hidden Markov Models (II) a. Likelihood: forward algorithm b. Decoding: Viterbi algorithm c. Model building: Baum-Welch algorithm Viterbi training Hidden Markov models
More informationevoglow yeast kit distributed by product information Cat.#: FP-21040
evoglow yeast kit product information distributed by Cat.#: FP-21040 Flavin-mononucleotide-based Fluorescent Protein (FbFP) evoglow basic kit Cat.# FP-21010 Quantity 20 µg each General Information Fluorescent
More informationThe Mathematics of Phylogenomics
arxiv:math/0409132v2 [math.st] 27 Sep 2005 The Mathematics of Phylogenomics Lior Pachter and Bernd Sturmfels Department of Mathematics, UC Berkeley [lpachter,bernd]@math.berkeley.edu February 1, 2008 The
More informationSupplemental Figure 1. Differences in amino acid composition between the paralogous copies Os MADS17 and Os MADS6.
Supplemental Data. Reinheimer and Kellogg (2009). Evolution of AGL6-like MADSbox genes in grasses (Poaceae): ovule expression is ancient and palea expression is new Supplemental Figure 1. Differences in
More informationBMI/CS 576 Fall 2016 Final Exam
BMI/CS 576 all 2016 inal Exam Prof. Colin Dewey Saturday, December 17th, 2016 10:05am-12:05pm Name: KEY Write your answers on these pages and show your work. You may use the back sides of pages as necessary.
More informationAoife McLysaght Dept. of Genetics Trinity College Dublin
Aoife McLysaght Dept. of Genetics Trinity College Dublin Evolution of genome arrangement Evolution of genome content. Evolution of genome arrangement Gene order changes Inversions, translocations Evolution
More informationThe Cell Cycle & Cell Division. Cell Function Cell Cycle. What does the cell do = cell physiology:
Cell Function 2404 What does the cell do = cell physiology: 1. Cell Cycle & Cell Division 2. Membrane Transport 3. Secretion 4. Membrane Potential 5. Metabolism 6. Cellular Interactions 7. Cellular Control:
More informationDNA sequence analysis of the imp UV protection and mutation operon of the plasmid TP110: identification of a third gene
QD) 1990 Oxford University Press Nucleic Acids Research, Vol. 18, No. 17 5045 DNA sequence analysis of the imp UV protection and mutation operon of the plasmid TP110: identification of a third gene David
More informationSupplementary information. Porphyrin-Assisted Docking of a Thermophage Portal Protein into Lipid Bilayers: Nanopore Engineering and Characterization.
Supplementary information Porphyrin-Assisted Docking of a Thermophage Portal Protein into Lipid Bilayers: Nanopore Engineering and Characterization. Benjamin Cressiot #, Sandra J. Greive #, Wei Si ^#,
More informationGlucosylglycerate phosphorylase, a novel enzyme specificity involved in compatible solute metabolism
Supplementary Information for Glucosylglycerate phosphorylase, a novel enzyme specificity involved in compatible solute metabolism Jorick Franceus, Denise Pinel, Tom Desmet Corresponding author: Tom Desmet,
More informationCodon-model based inference of selection pressure. (a very brief review prior to the PAML lab)
Codon-model based inference of selection pressure (a very brief review prior to the PAML lab) an index of selection pressure rate ratio mode example dn/ds < 1 purifying (negative) selection histones dn/ds
More informationSupplemental Figure 1. Phenotype of ProRGA:RGAd17 plants under long day
Supplemental Figure 1. Phenotype of ProRGA:RGAd17 plants under long day conditions. Photo was taken when the wild type plant started to bolt. Scale bar represents 1 cm. Supplemental Figure 2. Flowering
More informationNEW DNA CYCLIC CODES OVER RINGS
NEW DNA CYCLIC CODES OVER RINGS NABIL BENNENNI, KENZA GUENDA AND SIHEM MESNAGER arxiv:1505.06263v1 [cs.it] 23 May 2015 Abstract. This paper is dealing with DNA cyclic codes which play an important role
More information1. In most cases, genes code for and it is that
Name Chapter 10 Reading Guide From DNA to Protein: Gene Expression Concept 10.1 Genetics Shows That Genes Code for Proteins 1. In most cases, genes code for and it is that determine. 2. Describe what Garrod
More informationHow DNA barcoding can be more effective in microalgae. identification: a case of cryptic diversity revelation in Scenedesmus
How DNA barcoding can be more effective in microalgae identification: a case of cryptic diversity revelation in Scenedesmus (Chlorophyceae) Shanmei Zou, Cong Fei, Chun Wang, Zhan Gao, Yachao Bao, Meilin
More information160, and 220 bases, respectively, shorter than pbr322/hag93. (data not shown). The DNA sequence of approximately 100 bases of each
JOURNAL OF BACTEROLOGY, JUlY 1988, p. 3305-3309 0021-9193/88/073305-05$02.00/0 Copyright 1988, American Society for Microbiology Vol. 170, No. 7 Construction of a Minimum-Size Functional Flagellin of Escherichia
More informationVL Algorithmen und Datenstrukturen für Bioinformatik ( ) WS15/2016 Woche 16
VL Algorithmen und Datenstrukturen für Bioinformatik (19400001) WS15/2016 Woche 16 Tim Conrad AG Medical Bioinformatics Institut für Mathematik & Informatik, Freie Universität Berlin Based on slides by
More information