Evolutionary Analysis of Viral Genomes
- Robert Stafford
1 Evolutionary Analysis of Viral Genomes. Lecture 1: Quantifying Genetic Diversity. Oliver G. Pybus, Department of Zoology, University of Oxford. Evolutionary Biology Group, Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, U.K. http://evolve.zoo.ox.ac.uk
2 Information in Molecular Sequences Biological sequences (DNA, RNA, protein) contain information about the processes and events that formed them. This evolutionary information is often scrambled, fragmentary, hidden, or lost. Our aim is to use mathematical models to recover and interpret this information.
3 Information in Molecular Sequences Sequences carry information about: genetic distances; phylogenetic relationships; rates of evolution; dates of historical events. Evolutionary / population processes: recombination rates; migration rates among subpopulations; natural selection & adaptation; population size change.
4 Information in Molecular Sequences Mutation is the source of all sequence differences. Single nucleotide polymorphisms: silent / replacement; transitions / transversions. Length polymorphisms: insertions / deletions. Recombination generates new combinations of mutations. Natural selection and genetic drift act to change the frequency of mutations within a population.
5 Types of Mutation
Transitions (TS): purine-to-purine or pyrimidine-to-pyrimidine (A↔G or C↔T).
Transversions (TV): purine-to-pyrimidine or pyrimidine-to-purine (A↔C, A↔T, G↔C or G↔T).
Silent (synonymous): encoded amino acid is unchanged.
Replacement (non-synonymous): encoded amino acid is changed.
Example:
  Cys Ile Ser Thr Ser
  TGT ATC TCT ACG AGC
  TGT ATA TCT ATG AGC
  Cys Ile Ser Met Ser
Silent change: ATC → ATA (Ile → Ile). Replacement change: ACG → ATG (Thr → Met).
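The two classifications on this slide can be sketched in a few lines of Python. This is only an illustration, assuming the standard genetic code; the function names `substitution_type` and `coding_effect` are made up for this example and do not come from the lecture.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third position fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

PURINES = {"A", "G"}  # C and T are the pyrimidines


def substitution_type(base_before, base_after):
    """Classify a single-base change as a transition or a transversion."""
    if base_before == base_after:
        return "identical"
    same_class = (base_before in PURINES) == (base_after in PURINES)
    return "transition" if same_class else "transversion"


def coding_effect(codon_before, codon_after):
    """Classify a codon change as silent (synonymous) or replacement."""
    same_aa = CODON_TABLE[codon_before] == CODON_TABLE[codon_after]
    return "silent" if same_aa else "replacement"


# The two changes from the slide's example.
print(substitution_type("C", "A"), coding_effect("ATC", "ATA"))
print(substitution_type("C", "T"), coding_effect("ACG", "ATG"))
```

The first change in the slide's example (ATC → ATA) is a silent transversion, and the second (ACG → ATG) is a replacement transition, so the two classifications are independent of each other.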
6 MOLECULAR SEQUENCES → (alignment methods; BIOINFORMATICS) → ALIGNMENT → (sequence evolution models; MOLECULAR EVOLUTION) → GENETIC DISTANCES → (phylogenetic inference; PHYLOGENETICS) → EVOLUTIONARY TREE (time scale = genetic distance) → (molecular clock models; PHYLOGENETICS) → EVOLUTIONARY TREE (time scale = years) → (coalescent models; POPULATION GENETICS) → POPULATION PROCESSES (e.g. adaptation, migration, population size change)
8 Sequence Alignment Homology: similarity of a character among organisms due to inheritance from a shared common ancestor. Positional homology: equivalent nucleotide/amino acid positions within a sequence. Alignment: a proposed assignment of positional homology for a set of gene/protein sequences.
9 Sequence Alignment For example, how should these two sequences be aligned? Seq1: ATGCGTCGTT Seq2: ATCCGCGTC
10 Sequence Alignment
Like this?
  Seq1: ATGCGTCGTT
  Seq2: ATCCG-CGTC
(7 homologous sites + 2 mismatches + 1 insertion/deletion)
Or like this?
  Seq1: AT--GCGTCGTT
  Seq2: ATCCGCGTC---
(7 homologous sites + 0 mismatches + 2 insertions/deletions)
11 Sequence Alignment Most alignment methods start by assigning relative weights to mismatches versus insertions/deletions. Different types of mismatch (e.g. transitions and transversions) can be weighted differently. The weights are used to calculate a total score for each possible alignment. Algorithms then search for the alignment with the best total score.
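To make the role of the weights concrete, here is a minimal Python sketch that scores the two candidate alignments of Seq1 and Seq2 under different penalties. The scoring scheme (+1 per match, a penalty per mismatch and per gap event) and the function names are assumptions chosen for illustration; real alignment programs use more elaborate schemes and search over all alignments rather than comparing two.

```python
def summarize_alignment(row_a, row_b):
    """Count matches, mismatches and gap events in a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    matches = mismatches = gap_events = 0
    previous_gap_row = None  # which row carried a gap in the previous column
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            gap_row = 0 if x == "-" else 1
            if gap_row != previous_gap_row:
                gap_events += 1  # a new run of gaps starts here
            previous_gap_row = gap_row
        else:
            previous_gap_row = None
            if x == y:
                matches += 1
            else:
                mismatches += 1
    return matches, mismatches, gap_events


def alignment_score(row_a, row_b, mismatch_penalty, gap_penalty):
    """Total score: +1 per match, minus the weighted mismatches and gap events."""
    matches, mismatches, gap_events = summarize_alignment(row_a, row_b)
    return matches - mismatch_penalty * mismatches - gap_penalty * gap_events


first = ("ATGCGTCGTT", "ATCCG-CGTC")
second = ("AT--GCGTCGTT", "ATCCGCGTC---")

for mismatch_penalty, gap_penalty in [(1, 3), (3, 1)]:
    print(mismatch_penalty, gap_penalty,
          alignment_score(*first, mismatch_penalty, gap_penalty),
          alignment_score(*second, mismatch_penalty, gap_penalty))
```

With gaps weighted more heavily than mismatches the first alignment scores higher; with mismatches weighted more heavily the second one does, which is why the choice of weights matters.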
12 Sequence Alignment ClustalX is a commonly used alignment program. Alignment algorithms provide a useful first draft. Further adjustment by hand is often needed to correct errors.
13 Sequence Alignment Many pathogen nucleotide sequences are highly divergent and are therefore difficult to align: Seq1: GAAGGAAGCTCCTGGTTACTCCTGGGATCC Seq2: GAGGGTTCCTATCTATTAATTGGTAGC Seq3: GACGGCAGTGCATGGCTTTTGGGCAGT Seq4: GATGGGTCAGCTTACCTCCTGGCCGGGTCA
14 Sequence Alignment Considering the amino acid translation of the nucleotide sequences can make things easier:
  Seq1: GAA GGA AGC TCC TGG TTA CTC CTG GGA TCC
  Seq2: GAG GGT TCC --- TAT CTA TTA ATT GGT AGC
  Seq3: GAC GGC AGT GCA TGG --- CTT TTG GGC AGT
  Seq4: GAT GGG TCA GCT TAC CTC CTG GCC GGG TCA

  Seq1: Glu Gly Ser Ser Trp Leu Leu Leu Gly Ser
  Seq2: Glu Gly Ser  -  Tyr Leu Leu Ile Gly Ser
  Seq3: Asp Gly Ser Ala Trp  -  Leu Leu Gly Ser
  Seq4: Asp Gly Ser Ala Tyr Leu Leu Ala Gly Ser
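The translation step can be sketched as follows. This is an illustrative helper, assuming the standard genetic code and a codon-aligned input in which gaps come in whole triplets; `translate_aligned` is a hypothetical name, and one-letter amino acid codes are used instead of the three-letter codes on the slide.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third position fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_aligned(nucleotides):
    """Translate a codon-aligned sequence; a gap codon '---' becomes '-'."""
    if len(nucleotides) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    residues = []
    for start in range(0, len(nucleotides), 3):
        codon = nucleotides[start:start + 3]
        if codon == "---":
            residues.append("-")
        else:
            # Codons with partial gaps or ambiguous bases are reported as X.
            residues.append(CODON_TABLE.get(codon, "X"))
    return "".join(residues)


aligned = {
    "Seq1": "GAAGGAAGCTCCTGGTTACTCCTGGGATCC",
    "Seq2": "GAGGGTTCC---TATCTATTAATTGGTAGC",
    "Seq3": "GACGGCAGTGCATGG---CTTTTGGGCAGT",
    "Seq4": "GATGGGTCAGCTTACCTCCTGGCCGGGTCA",
}
for name, sequence in aligned.items():
    print(name, translate_aligned(sequence))
```

Conserved columns in the protein rows (for example the Gly and Ser positions) are easier to spot than in the nucleotide rows, which is what makes the translated view useful as a guide when adjusting an alignment by hand.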
16 Genetic Distances
SIVcpz  ATGGGTGCGA GAGCGTCAGT TCTAACAGGG GGAAAATTAG ATCGCTGGGA
HIV-1   ATGGGTGCGA GAGCGTCAGT ATTAAGCGGG GGAGAATTAG ATCGATGGGA

SIVcpz  AAAAGTTCGG CTTAGGCCCG GGGGAAGAAA AAGATATATG ATGAAACATT
HIV-1   AAAAATTCGG TTAAGGCCAG GGGGAAAGAA AAAATATAAA TTAAAACATA

SIVcpz  TAGTATGGGC AAGCAGGGAG CTGGAAAGAT TCGCATGTGA CCCCGGGCTA
HIV-1   TAGTATGGGC AAGCAGGGAG CTAGAACGAT TCGCAGTTAA TCCTGGCCTG

SIVcpz  ATGGAAAGTA AGGAAGGATG TACTAAATTG TTACAACAAT TAGAGCCAGC
HIV-1   TTAGAAACAT CAGAAGGCTG TAGACAAATA CTGGGACAGC TACAACCATC

SIVcpz  TCTCAAAACA GGCTCAGAAG GACTGCGGTC CTTGTTTAAC ACTCTGGCAG
HIV-1   CCTTCAGACA GGATCAGAAG AACTTAGATC ATTATATAAT ACAGTAGCAA

SIVcpz  TACTGTGGTG CATACATAGT GACATCACTG TAGAAGACAC ACAGAAAGCT
HIV-1   CCCTCTATTG TGTGCATCAA AGGATAGAGA TAAAAGACAC CAAGGAAGCT

SIVcpz  CTAGAACAGC TAAAGCGGCA TCATGGAGAA CAACAGAGCA AAACTGAAAG
HIV-1   TTAGACAAGA TAGAG--GAA -----GAGCA AAACAAAAGT AA---GAAAA

SIVcpz  TAACTCAGGA AGCCGTGAAG GGGGAGCCAG TCAAGGCGCT AGTGCCTCTG
HIV-1   AAGCACAGCA AGC-----AG CAGCTGACA- -CAGGACAC- AG--CAGC--

SIVcpz  CTGGCATTAG TGGAAATTAC
HIV-1   CAGG--TCAG CCAAAATTAC

420 sites, 121 differences, 22 indels
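Counting differences in an alignment like this one is simple to automate. The sketch below is an assumption about how such a count could be done, not the procedure used for the slide: it skips every column that contains a gap and reports the observed proportion of differing sites (often called the p-distance). It is demonstrated on a short toy alignment rather than the SIVcpz/HIV-1 data.

```python
def p_distance(row_a, row_b, gap="-"):
    """Return (differences, compared sites, proportion) for two aligned rows.

    Columns in which either row has a gap are skipped, mirroring the
    assumption that insertions and deletions are ignored.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    differences = compared = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            continue  # indel column: not counted as a site
        compared += 1
        if x != y:
            differences += 1
    proportion = differences / compared if compared else float("nan")
    return differences, compared, proportion


# Toy example: 10 aligned columns, 1 gap column, 2 mismatches.
print(p_distance("ATGCGTCGTT", "ATCCG-CGTC"))
```

Whether gap columns are excluded from the denominator, as here, or handled differently is a design choice that changes the resulting proportion slightly.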
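17 The Multiple Substitution Problem Two sequences can share the same base at a site for different reasons. Identical by descent: both sequences inherit the base unchanged from their common ancestor. Convergent evolution: the sequences arrive at the same base through independent substitutions. Over time, multiple substitutions at a site can make sequences look identical even though changes have occurred. [Figure: small trees illustrating identity by descent and convergent evolution]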
18 The Multiple Substitution Problem After sufficient time, the sequence will be random because so many substitutions have occurred. For nucleotide sequences with equal base frequencies, this means that 25% of sites will be identical by chance. Hence maximum sequence divergence = 75%.
19 The Multiple Substitution Problem When divergence is low, the observed number of changes is similar to the true number of changes (genetic distance). When divergence is high, the observed number of changes underestimates the true genetic distance ("saturation"). [Figure: genetic distance (0% to 200%) against time, with curves for the actual and the observed distance; the gap between them is labelled "hidden information".]
20 The Multiple Substitution Problem A statistical model of sequence evolution can be used to accurately estimate genetic distance. These are called nucleotide substitution models. [Figure: genetic distance (0% to 200%) against time, with curves for the actual and the observed distance.]
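As a concrete illustration of such a correction, the sketch below uses the Jukes-Cantor (JC69) model, the simplest of the nucleotide substitution models covered in this lecture. Under JC69 the expected proportion of differing sites is p = 3/4 (1 - e^(-4d/3)) for a true distance d in substitutions per site, which inverts to d = -3/4 ln(1 - 4p/3). The function names are chosen for this example only.

```python
import math


def expected_p_distance(true_distance):
    """Expected observed proportion of differing sites under JC69."""
    return 0.75 * (1.0 - math.exp(-4.0 * true_distance / 3.0))


def jc69_distance(p):
    """Correct an observed proportion p of differing sites under JC69."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 correction requires 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Observed divergence saturates towards 75% as the true distance grows.
for d in (0.05, 0.5, 1.0, 2.0):
    p = expected_p_distance(d)
    print(f"true d = {d:.2f}  observed p = {p:.3f}  corrected = {jc69_distance(p):.2f}")
```

At low divergence the observed and corrected values are almost equal; at high divergence the observed proportion approaches 0.75 while the corrected distance keeps growing, which is the saturation effect described above.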
21 Nucleotide substitution models Time-reversible Markov process with four states (A, C, G, T). Each nucleotide site evolves independently. A and G are purines. C and T are pyrimidines. a, b, c, d, e, f = relative rate parameters: a (A↔C), b (A↔G), c (A↔T), d (C↔G), e (C↔T), f (G↔T). Transitions (b, e) are drawn in red and transversions (a, c, d, f) in blue. [Figure: the four bases joined by arrows labelled a to f]
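22 Nucleotide substitution models Q is the instantaneous rate matrix of the Markov process. Elements represent the instantaneous rate of change from base X to base Y (rows and columns in the order A, C, G, T):

$$
Q = \begin{pmatrix}
\cdot & \mu a \pi_C & \mu b \pi_G & \mu c \pi_T \\
\mu a \pi_A & \cdot & \mu d \pi_G & \mu e \pi_T \\
\mu b \pi_A & \mu d \pi_C & \cdot & \mu f \pi_T \\
\mu c \pi_A & \mu e \pi_C & \mu f \pi_G & \cdot
\end{pmatrix}
$$

Diagonal elements (shown as ·) are set so that the rows sum to zero. μ = nucleotide substitution rate. π_X = frequency of base X (usually estimated from the data).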
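A short Python sketch of how such a rate matrix can be assembled is given below. It assumes the row and column order A, C, G, T and the pairing of rate parameters with base pairs used in this lecture (a: A↔C, b: A↔G, c: A↔T, d: C↔G, e: C↔T, f: G↔T); the function name `build_q` and the example parameter values are hypothetical.

```python
BASES = "ACGT"

# Relative rate parameter attached to each unordered pair of bases.
PAIR_RATE = {
    ("A", "C"): "a", ("A", "G"): "b", ("A", "T"): "c",
    ("C", "G"): "d", ("C", "T"): "e", ("G", "T"): "f",
}


def build_q(rates, freqs, mu=1.0):
    """Build the 4x4 instantaneous rate matrix as a list of rows."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                key = (x, y) if (x, y) in PAIR_RATE else (y, x)
                q[i][j] = mu * rates[PAIR_RATE[key]] * freqs[y]
        # The diagonal is still zero here, so this makes the row sum to zero.
        q[i][i] = -sum(q[i])
    return q


# Example values only: transitions (b, e) twice as fast as transversions.
example_rates = {"a": 1.0, "b": 2.0, "c": 1.0, "d": 1.0, "e": 2.0, "f": 1.0}
example_freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
Q = build_q(example_rates, example_freqs)

for base, row in zip(BASES, Q):
    print(base, [round(value, 3) for value in row])
```

Because each off-diagonal entry is a symmetric rate multiplied by the frequency of the target base, the matrix satisfies π_X Q_XY = π_Y Q_YX, which is the time-reversibility property.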
23 Nucleotide substitution models Table 1.1. The models of nucleotide substitution

Model  Description                                       Parameter constraints  Equal freq.?  Reference
REV    The most general, time-reversible, Markov model   none                   no            (e.g. Yang, 1994a)
TrN    TVs; purine TSs; and pyrimidine TSs               a=c=d=f                no            (Tamura and Nei, 1993)
HKY    TVs and TSs                                       a=c=d=f, b=e           no            (Hasegawa et al., 1985)
F81    One substitution type                             a=b=c=d=e=f            no            (Felsenstein, 1981)
K3ST   A-T, C-G TVs; A-C, G-T TVs; and TSs               a=f, c=d, b=e          yes           (Kimura, 1981)
K2P    TVs and TSs                                       a=c=d=f, b=e           yes           (Kimura, 1980)
JC69   One substitution type                             a=b=c=d=e=f            yes           (Jukes and Cantor, 1969)

Equal freq.? = equal base frequencies (πx)? TS = transition; TV = transversion.
24 Nucleotide substitution models Suppose we have 2 sequences, A & B, which have n sites. The genetic distance between them is t. The units of t are substitutions per site (μ × time). [Diagram: sequences A and B joined by a branch of length t] If the nucleotide substitution rate of the sequences (μ) is known, then t can be converted into time (months or years).
25 Nucleotide substitution models At site x, sequence A has base i and sequence B has base j. For this site, the probability distribution of the genetic distance t between A and B is:

$$P^{x}_{i,j}(t) = \left[ e^{Qt} \right]_{i,j}$$

Thus the distribution of genetic distance, given A & B, is:

$$P(t) = \prod_{x=1}^{n} P^{x}_{i,j}(t)$$

The value t can be estimated using maximum likelihood.
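The maximum likelihood step can be illustrated with the simplest model, JC69, for which the entries of e^(Qt) have a closed form when t is measured in substitutions per site: 1/4 + 3/4 e^(-4t/3) for identical bases and 1/4 - 1/4 e^(-4t/3) for different bases. The sketch below is an assumption-laden toy: it uses a ternary search over t (a choice made for brevity, not a method from the lecture) and made-up sequences.

```python
import math


def jc69_site_probability(same_base, t):
    """JC69 probability of the observed pair at one site, given distance t."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same_base else 0.25 - 0.25 * decay


def log_likelihood(seq_a, seq_b, t):
    """Log of the product over sites of the per-site probabilities."""
    return sum(math.log(jc69_site_probability(x == y, t))
               for x, y in zip(seq_a, seq_b))


def ml_distance(seq_a, seq_b, low=1e-6, high=5.0, iterations=200):
    """Maximise the log-likelihood over t by ternary search."""
    for _ in range(iterations):
        left = low + (high - low) / 3.0
        right = high - (high - low) / 3.0
        if log_likelihood(seq_a, seq_b, left) < log_likelihood(seq_a, seq_b, right):
            low = left
        else:
            high = right
    return (low + high) / 2.0


# Made-up gap-free alignment: 100 sites, 10 of which differ.
seq_a = "A" * 100
seq_b = "A" * 90 + "C" * 10
estimate = ml_distance(seq_a, seq_b)
closed_form = -0.75 * math.log(1.0 - 4.0 * 0.10 / 3.0)
print(round(estimate, 4), round(closed_form, 4))
```

For JC69 the numerical estimate agrees with the closed-form correction, which is a useful sanity check before moving to models where no closed form exists.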
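26 Nucleotide substitution models Under the HKY model, the substitution probability is:

$$
P^{x}_{i,j}(t) =
\begin{cases}
\pi_j + \pi_j \left( \dfrac{1}{\Pi_j} - 1 \right) e^{-\mu t} + \left( \dfrac{\Pi_j - \pi_j}{\Pi_j} \right) e^{-\mu t (1 + \Pi_j (\kappa - 1))} & (i = j) \\[2ex]
\pi_j + \pi_j \left( \dfrac{1}{\Pi_j} - 1 \right) e^{-\mu t} - \left( \dfrac{\pi_j}{\Pi_j} \right) e^{-\mu t (1 + \Pi_j (\kappa - 1))} & \text{(transition)} \\[2ex]
\pi_j \left( 1 - e^{-\mu t} \right) & \text{(transversion)}
\end{cases}
$$

$$
\Pi_j =
\begin{cases}
\pi_A + \pi_G & \text{if } j \text{ is a purine} \\
\pi_C + \pi_T & \text{if } j \text{ is a pyrimidine}
\end{cases}
\qquad
\kappa = \text{transition rate} / \text{transversion rate}
$$

More complex models are calculated numerically.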
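The HKY substitution probabilities can be written directly as a small function, which also makes them easy to sanity-check. This sketch assumes the standard HKY closed form with π_j the frequency of base j, Π_j the summed frequency of the purines or pyrimidines (whichever class j belongs to) and κ the transition/transversion rate ratio; the function name and the example values are illustrative only.

```python
import math

PURINES = "AG"
PYRIMIDINES = "CT"


def hky_probability(i, j, t, freqs, kappa, mu=1.0):
    """Probability that base i is replaced by base j after time t under HKY."""
    pi_j = freqs[j]
    group = PURINES if j in PURINES else PYRIMIDINES
    big_pi_j = sum(freqs[base] for base in group)
    decay = math.exp(-mu * t)
    group_decay = math.exp(-mu * t * (1.0 + big_pi_j * (kappa - 1.0)))
    shared = pi_j + pi_j * (1.0 / big_pi_j - 1.0) * decay
    if i == j:
        return shared + (big_pi_j - pi_j) / big_pi_j * group_decay
    if (i in PURINES) == (j in PURINES):  # transition
        return shared - pi_j / big_pi_j * group_decay
    return pi_j * (1.0 - decay)  # transversion


# Example values only.
freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
for i in "ACGT":
    row = [hky_probability(i, j, 0.3, freqs, kappa=2.0) for j in "ACGT"]
    print(i, [round(p, 4) for p in row], "row sum =", round(sum(row), 6))
```

Each row sums to one, the probabilities reduce to the identity matrix at t = 0, and they approach the base frequencies as t becomes large, which are three quick checks that the formula has been transcribed consistently.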
27 Among-site rate heterogeneity Some sites evolve slowly, others evolve rapidly. Among-site rate heterogeneity models let μ vary among nucleotide sites. The codon-position model defines 3 relative rates, one for each codon position. The third codon position usually evolves faster than positions 1 and 2. The gamma model supposes that μ is distributed according to a one-parameter gamma distribution. The substitution probability P(t) is then integrated across this distribution.
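28 Among-site rate heterogeneity The gamma model has one parameter, the shape parameter α. Small values of α (below 1) imply strong rate heterogeneity among sites; large values of α imply nearly equal rates.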
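For the JC69 model, integrating over a gamma distribution of rates gives a closed-form distance, d = 3/4 α [(1 - 4p/3)^(-1/α) - 1], where p is the observed proportion of differing sites. This formula is a standard result that is not shown on the slides; the sketch below uses it only to illustrate how α changes the estimate, and the function names are hypothetical.

```python
import math


def jc69_distance(p):
    """JC69 distance with equal rates at all sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc69_gamma_distance(p, alpha):
    """JC69 distance with gamma-distributed rates (shape parameter alpha)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("requires 0 <= p < 0.75")
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)


p = 0.30
print("equal rates:", round(jc69_distance(p), 4))
for alpha in (0.5, 1.0, 5.0, 1000.0):
    print("alpha =", alpha, "->", round(jc69_gamma_distance(p, alpha), 4))
```

The smaller α is, the larger the corrected distance for the same observed p, because fast-evolving sites saturate early and hide more substitutions; as α grows the estimate converges to the equal-rates JC69 value.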
29 Genetic Distances: Assumptions Insertions and deletions are ignored. Substitutions are reversible (non-reversible models are possible, but computationally difficult). Nucleotide base frequencies (πx) do not change through time (stationarity). All nucleotide sites evolve independently. But correlation among sites may arise from: i. Epistatic interactions among mutations; ii. DNA/RNA secondary structure (stem-loops, tRNA, rRNA); iii. Mutations that change more than one site at once.
30 Genetic Distances: Example Estimated genetic distances between SIVcpz and HIVlai, under different nucleotide substitution models:
Observed % different sites =
Jukes-Cantor (JC69) =
Kimura 2-parameter (K2P) =
Hasegawa-Kishino-Yano (HKY) =
General reversible (REV) =
General reversible + gamma (REV+gamma) = 1.017
SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA
SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS 1 Prokaryotes and Eukaryotes 2 DNA and RNA 3 4 Double helix structure Codons Codons are triplets of bases from the RNA sequence. Each triplet defines an amino-acid.
More informationPractical Bioinformatics
5/2/2017 Dictionaries d i c t i o n a r y = { A : T, T : A, G : C, C : G } d i c t i o n a r y [ G ] d i c t i o n a r y [ N ] = N d i c t i o n a r y. h a s k e y ( C ) Dictionaries g e n e t i c C o
More informationAdvanced topics in bioinformatics
Feinberg Graduate School of the Weizmann Institute of Science Advanced topics in bioinformatics Shmuel Pietrokovski & Eitan Rubin Spring 2003 Course WWW site: http://bioinformatics.weizmann.ac.il/courses/atib
More information"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky
MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally
More informationSupplementary Information for
Supplementary Information for Evolutionary conservation of codon optimality reveals hidden signatures of co-translational folding Sebastian Pechmann & Judith Frydman Department of Biology and BioX, Stanford
More informationIntroduction to Molecular Phylogeny
Introduction to Molecular Phylogeny Starting point: a set of homologous, aligned DNA or protein sequences Result of the process: a tree describing evolutionary relationships between studied sequences =
More informationMassachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution
Massachusetts Institute of Technology 6.877 Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution 1. Rates of amino acid replacement The initial motivation for the neutral
More informationSequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment
Sequence Analysis 17: lecture 5 Substitution matrices Multiple sequence alignment Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of
More informationLecture Notes: Markov chains
Computational Genomics and Molecular Biology, Fall 5 Lecture Notes: Markov chains Dannie Durand At the beginning of the semester, we introduced two simple scoring functions for pairwise alignments: a similarity
More information3. Evolution makes sense of homologies. 3. Evolution makes sense of homologies. 3. Evolution makes sense of homologies
Richard Owen (1848) introduced the term Homology to refer to structural similarities among organisms. To Owen, these similarities indicated that organisms were created following a common plan or archetype.
More informationSUPPORTING INFORMATION FOR. SEquence-Enabled Reassembly of β-lactamase (SEER-LAC): a Sensitive Method for the Detection of Double-Stranded DNA
SUPPORTING INFORMATION FOR SEquence-Enabled Reassembly of β-lactamase (SEER-LAC): a Sensitive Method for the Detection of Double-Stranded DNA Aik T. Ooi, Cliff I. Stains, Indraneel Ghosh *, David J. Segal
More informationCharacterization of Pathogenic Genes through Condensed Matrix Method, Case Study through Bacterial Zeta Toxin
International Journal of Genetic Engineering and Biotechnology. ISSN 0974-3073 Volume 2, Number 1 (2011), pp. 109-114 International Research Publication House http://www.irphouse.com Characterization of
More informationSequence Divergence & The Molecular Clock. Sequence Divergence
Sequence Divergence & The Molecular Clock Sequence Divergence v simple genetic distance, d = the proportion of sites that differ between two aligned, homologous sequences v given a constant mutation/substitution
More informationCrick s early Hypothesis Revisited
Crick s early Hypothesis Revisited Or The Existence of a Universal Coding Frame Ryan Rossi, Jean-Louis Lassez and Axel Bernal UPenn Center for Bioinformatics BIOINFORMATICS The application of computer
More informationLecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26
Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 4 (Models of DNA and
More informationWhy do more divergent sequences produce smaller nonsynonymous/synonymous
Genetics: Early Online, published on June 21, 2013 as 10.1534/genetics.113.152025 Why do more divergent sequences produce smaller nonsynonymous/synonymous rate ratios in pairwise sequence comparisons?
More informationHow Molecules Evolve. Advantages of Molecular Data for Tree Building. Advantages of Molecular Data for Tree Building
How Molecules Evolve Guest Lecture: Principles and Methods of Systematic Biology 11 November 2013 Chris Simon Approaching phylogenetics from the point of view of the data Understanding how sequences evolve
More informationProtein Threading. Combinatorial optimization approach. Stefan Balev.
Protein Threading Combinatorial optimization approach Stefan Balev Stefan.Balev@univ-lehavre.fr Laboratoire d informatique du Havre Université du Havre Stefan Balev Cours DEA 30/01/2004 p.1/42 Outline
More informationSSR ( ) Vol. 48 No ( Microsatellite marker) ( Simple sequence repeat,ssr),
48 3 () Vol. 48 No. 3 2009 5 Journal of Xiamen University (Nat ural Science) May 2009 SSR,,,, 3 (, 361005) : SSR. 21 516,410. 60 %96. 7 %. (),(Between2groups linkage method),.,, 11 (),. 12,. (, ), : 0.
More informationUsing algebraic geometry for phylogenetic reconstruction
Using algebraic geometry for phylogenetic reconstruction Marta Casanellas i Rius (joint work with Jesús Fernández-Sánchez) Departament de Matemàtica Aplicada I Universitat Politècnica de Catalunya IMA
More informationSupplemental data. Pommerrenig et al. (2011). Plant Cell /tpc
Supplemental Figure 1. Prediction of phloem-specific MTK1 expression in Arabidopsis shoots and roots. The images and the corresponding numbers showing absolute (A) or relative expression levels (B) of
More informationProbabilistic modeling and molecular phylogeny
Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical University of Denmark (DTU) What is a model? Mathematical
More informationPOPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics
POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the
More informationUnderstanding relationship between homologous sequences
Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective
More informationAlgorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment
Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot
More informationClay Carter. Department of Biology. QuickTime and a TIFF (Uncompressed) decompressor are needed to see this picture.
QuickTime and a TIFF (Uncompressed) decompressor are needed to see this picture. Clay Carter Department of Biology QuickTime and a TIFF (LZW) decompressor are needed to see this picture. Ornamental tobacco
More informationSUPPLEMENTARY DATA - 1 -
- 1 - SUPPLEMENTARY DATA Construction of B. subtilis rnpb complementation plasmids For complementation, the B. subtilis rnpb wild-type gene (rnpbwt) under control of its native rnpb promoter and terminator
More informationSupporting Information for. Initial Biochemical and Functional Evaluation of Murine Calprotectin Reveals Ca(II)-
Supporting Information for Initial Biochemical and Functional Evaluation of Murine Calprotectin Reveals Ca(II)- Dependence and Its Ability to Chelate Multiple Nutrient Transition Metal Ions Rose C. Hadley,
More informationpart 4: phenomenological load and biological inference. phenomenological load review types of models. Gαβ = 8π Tαβ. Newton.
2017-07-29 part 4: and biological inference review types of models phenomenological Newton F= Gm1m2 r2 mechanistic Einstein Gαβ = 8π Tαβ 1 molecular evolution is process and pattern process pattern MutSel
More informationLecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22
Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 24. Phylogeny methods, part 4 (Models of DNA and
More informationTaming the Beast Workshop
Workshop David Rasmussen & arsten Magnus June 27, 2016 1 / 31 Outline of sequence evolution: rate matrices Markov chain model Variable rates amongst different sites: +Γ Implementation in BES2 2 / 31 genotype
More informationThe Trigram and other Fundamental Philosophies
The Trigram and other Fundamental Philosophies by Weimin Kwauk July 2012 The following offers a minimal introduction to the trigram and other Chinese fundamental philosophies. A trigram consists of three
More informationSubstitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A
GAGATC 3:G A 6:C T Common Ancestor ACGATC 1:A G 2:C A Substitution = Mutation followed 5:T C by Fixation GAAATT 4:A C 1:G A AAAATT GAAATT GAGCTC ACGACC Chimp Human Gorilla Gibbon AAAATT GAAATT GAGCTC ACGACC
More informationBio 1B Lecture Outline (please print and bring along) Fall, 2007
Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution
More informationLecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)
Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from
More informationEarly History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species
Schedule Bioinformatics and Computational Biology: History and Biological Background (JH) 0.0 he Parsimony criterion GKN.0 Stochastic Models of Sequence Evolution GKN 7.0 he Likelihood criterion GKN 0.0
More informationHigh throughput near infrared screening discovers DNA-templated silver clusters with peak fluorescence beyond 950 nm
Electronic Supplementary Material (ESI) for Nanoscale. This journal is The Royal Society of Chemistry 2018 High throughput near infrared screening discovers DNA-templated silver clusters with peak fluorescence
More informationNSCI Basic Properties of Life and The Biochemistry of Life on Earth
NSCI 314 LIFE IN THE COSMOS 4 Basic Properties of Life and The Biochemistry of Life on Earth Dr. Karen Kolehmainen Department of Physics CSUSB http://physics.csusb.edu/~karen/ WHAT IS LIFE? HARD TO DEFINE,
More information7. Tests for selection
Sequence analysis and genomics 7. Tests for selection Dr. Katja Nowick Group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute for Brain Research www. nowicklab.info
More informationMutation models I: basic nucleotide sequence mutation models
Mutation models I: basic nucleotide sequence mutation models Peter Beerli September 3, 009 Mutations are irreversible changes in the DNA. This changes may be introduced by chance, by chemical agents, or
More informationMolecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016
Molecular phylogeny - Using molecular sequences to infer evolutionary relationships Tore Samuelsson Feb 2016 Molecular phylogeny is being used in the identification and characterization of new pathogens,
More informationModelling and Analysis in Bioinformatics. Lecture 1: Genomic k-mer Statistics
582746 Modelling and Analysis in Bioinformatics Lecture 1: Genomic k-mer Statistics Juha Kärkkäinen 06.09.2016 Outline Course introduction Genomic k-mers 1-Mers 2-Mers 3-Mers k-mers for Larger k Outline
More information6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008
MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationAoife McLysaght Dept. of Genetics Trinity College Dublin
Aoife McLysaght Dept. of Genetics Trinity College Dublin Evolution of genome arrangement Evolution of genome content. Evolution of genome arrangement Gene order changes Inversions, translocations Evolution
More informationCodon Distribution in Error-Detecting Circular Codes
life Article Codon Distribution in Error-Detecting Circular Codes Elena Fimmel, * and Lutz Strüngmann Institute for Mathematical Biology, Faculty of Computer Science, Mannheim University of Applied Sciences,
More informationElectronic supplementary material
Applied Microbiology and Biotechnology Electronic supplementary material A family of AA9 lytic polysaccharide monooxygenases in Aspergillus nidulans is differentially regulated by multiple substrates and
More informationAn Analytical Model of Gene Evolution with 9 Mutation Parameters: An Application to the Amino Acids Coded by the Common Circular Code
Bulletin of Mathematical Biology (2007) 69: 677 698 DOI 10.1007/s11538-006-9147-z ORIGINAL ARTICLE An Analytical Model of Gene Evolution with 9 Mutation Parameters: An Application to the Amino Acids Coded
More informationQuantifying sequence similarity
Quantifying sequence similarity Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, February 16 th 2016 After this lecture, you can define homology, similarity, and identity
More informationSupplemental Table 1. Primers used for cloning and PCR amplification in this study
Supplemental Table 1. Primers used for cloning and PCR amplification in this study Target Gene Primer sequence NATA1 (At2g393) forward GGG GAC AAG TTT GTA CAA AAA AGC AGG CTT CAT GGC GCC TCC AAC CGC AGC
More informationIn: P. Lemey, M. Salemi and A.-M. Vandamme (eds.). To appear in: The. Chapter 4. Nucleotide Substitution Models
In: P. Lemey, M. Salemi and A.-M. Vandamme (eds.). To appear in: The Phylogenetic Handbook. 2 nd Edition. Cambridge University Press, UK. (final version 21. 9. 2006) Chapter 4. Nucleotide Substitution
More informationEvolvable Neural Networks for Time Series Prediction with Adaptive Learning Interval
Evolvable Neural Networs for Time Series Prediction with Adaptive Learning Interval Dong-Woo Lee *, Seong G. Kong *, and Kwee-Bo Sim ** *Department of Electrical and Computer Engineering, The University
More informationTHE MATHEMATICAL STRUCTURE OF THE GENETIC CODE: A TOOL FOR INQUIRING ON THE ORIGIN OF LIFE
STATISTICA, anno LXIX, n. 2 3, 2009 THE MATHEMATICAL STRUCTURE OF THE GENETIC CODE: A TOOL FOR INQUIRING ON THE ORIGIN OF LIFE Diego Luis Gonzalez CNR-IMM, Bologna Section, Via Gobetti 101, I-40129, Bologna,
More informationSequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University
Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of
More informationMolecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço
Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço jcarrico@fm.ul.pt Charles Darwin (1809-1882) Charles Darwin s tree of life in Notebook B, 1837-1838 Ernst Haeckel (1934-1919)
More informationpart 3: analysis of natural selection pressure
part 3: analysis of natural selection pressure markov models are good phenomenological codon models do have many benefits: o principled framework for statistical inference o avoiding ad hoc corrections
More informationAmira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut
Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological
More informationMaximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington.
Maximum Likelihood This presentation is based almost entirely on Peter G. Fosters - "The Idiot s Guide to the Zen of Likelihood in a Nutshell in Seven Days for Dummies, Unleashed. http://www.bioinf.org/molsys/data/idiots.pdf
More informationIn: M. Salemi and A.-M. Vandamme (eds.). To appear. The. Phylogenetic Handbook. Cambridge University Press, UK.
In: M. Salemi and A.-M. Vandamme (eds.). To appear. The Phylogenetic Handbook. Cambridge University Press, UK. Chapter 4. Nucleotide Substitution Models THEORY Korbinian Strimmer () and Arndt von Haeseler
More informationPhylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University
Phylogenetics: Distance Methods COMP 571 - Spring 2015 Luay Nakhleh, Rice University Outline Evolutionary models and distance corrections Distance-based methods Evolutionary Models and Distance Correction
More informationRegulatory Sequence Analysis. Sequence models (Bernoulli and Markov models)
Regulatory Sequence Analysis Sequence models (Bernoulli and Markov models) 1 Why do we need random models? Any pattern discovery relies on an underlying model to estimate the random expectation. This model
More informationLecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM).
1 Bioinformatics: In-depth PROBABILITY & STATISTICS Spring Semester 2011 University of Zürich and ETH Zürich Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). Dr. Stefanie Muff
More informationMaximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018
Maximum Likelihood Tree Estimation Carrie Tribble IB 200 9 Feb 2018 Outline 1. Tree building process under maximum likelihood 2. Key differences between maximum likelihood and parsimony 3. Some fancy extras
More informationDr. Amira A. AL-Hosary
Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological
More informationNumber-controlled spatial arrangement of gold nanoparticles with
Electronic Supplementary Material (ESI) for RSC Advances. This journal is The Royal Society of Chemistry 2016 Number-controlled spatial arrangement of gold nanoparticles with DNA dendrimers Ping Chen,*
More informationLecture 4. Models of DNA and protein change. Likelihood methods
Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/36
More informationWhat Is Conservation?
What Is Conservation? Lee A. Newberg February 22, 2005 A Central Dogma Junk DNA mutates at a background rate, but functional DNA exhibits conservation. Today s Question What is this conservation? Lee A.
More informationLecture Notes: BIOL2007 Molecular Evolution
Lecture Notes: BIOL2007 Molecular Evolution Kanchon Dasmahapatra (k.dasmahapatra@ucl.ac.uk) Introduction By now we all are familiar and understand, or think we understand, how evolution works on traits
More informationNature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1
Supplementary Figure 1 Zn 2+ -binding sites in USP18. (a) The two molecules of USP18 present in the asymmetric unit are shown. Chain A is shown in blue, chain B in green. Bound Zn 2+ ions are shown as
More informationLecture 4. Models of DNA and protein change. Likelihood methods
Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/39
More informationCONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018
CONCEPT OF SEQUENCE COMPARISON Natapol Pornputtapong 18 January 2018 SEQUENCE ANALYSIS - A ROSETTA STONE OF LIFE Sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of
More informationBioinformatics. Part 8. Sequence Analysis An introduction. Mahdi Vasighi
Bioinformatics Sequence Analysis An introduction Part 8 Mahdi Vasighi Sequence analysis Some of the earliest problems in genomics concerned how to measure similarity of DNA and protein sequences, either
More informationTM1 TM2 TM3 TM4 TM5 TM6 TM bp