Lecture 4. Models of DNA and protein change. Likelihood methods


1 Lecture 4. Models of DNA and protein change. Likelihood methods
Joe Felsenstein, Department of Genome Sciences and Department of Biology
2 The Jukes-Cantor model (1969)
[Figure: the four bases A, G, C and T, with each base changing to each of the other three at rate u/3.]
The simplest symmetrical model of DNA evolution.
3 Transition probabilities under the Jukes-Cantor model
- All sites change independently.
- All sites have the same stochastic process working at them.
- Make up a fictional kind of event, such that when it happens the site changes to one of the 4 bases chosen at random (equiprobably).
- Assertion: having these events occur at rate $\frac{4}{3}u$ is the same as having the Jukes-Cantor model events occur at rate $u$. (A fictional event leaves the base unchanged one time in four, so actual changes occur at rate $\frac{3}{4} \cdot \frac{4}{3}u = u$.)
- The probability that none of these fictional events happens in time $t$ is $\exp(-\frac{4}{3}ut)$.
- No matter how many of these fictional events occur, provided it is not zero, the chance of ending up at a particular base is $\frac{1}{4}$.
4 Jukes-Cantor transition probabilities, cont'd
Putting all this together, the probability of changing to C, given the site is currently at A, in time $t$ is
\[ \mathrm{Prob}(C \mid A, t) = \tfrac{1}{4}\left(1 - e^{-\frac{4}{3}ut}\right) \]
while
\[ \mathrm{Prob}(A \mid A, t) = e^{-\frac{4}{3}ut} + \tfrac{1}{4}\left(1 - e^{-\frac{4}{3}ut}\right) \]
or
\[ \mathrm{Prob}(A \mid A, t) = \tfrac{1}{4}\left(1 + 3e^{-\frac{4}{3}ut}\right) \]
so that the total probability of change is
\[ \mathrm{Prob}(\text{change} \mid t) = \tfrac{3}{4}\left(1 - e^{-\frac{4}{3}ut}\right) \]
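As a quick numerical check of the Jukes-Cantor probabilities, here is a minimal Python sketch. The function names and the values of u and t are illustrative choices, not part of the lecture. It confirms that the probability of staying plus three times the probability of changing to one particular base adds up to 1.

```python
import math

def jc_prob(same, u, t):
    """Jukes-Cantor probability of ending at one particular base after time t.

    same=True gives Prob(A | A, t); same=False gives Prob(C | A, t).
    """
    decay = math.exp(-4.0 * u * t / 3.0)  # chance that no fictional event occurred
    if same:
        return 0.25 * (1.0 + 3.0 * decay)
    return 0.25 * (1.0 - decay)

def jc_prob_change(u, t):
    """Total probability that the base differs after time t."""
    return 0.75 * (1.0 - math.exp(-4.0 * u * t / 3.0))

u, t = 1.0, 0.5                      # illustrative rate and time
p_same = jc_prob(True, u, t)
p_diff = jc_prob(False, u, t)
row_sum = p_same + 3.0 * p_diff      # one way to stay, three ways to change
print(p_same, p_diff, row_sum, jc_prob_change(u, t))
```

For very long times the probability of change approaches 3/4, which is why observed differences per site saturate under this model.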
5 Fraction of sites different, Jukes-Cantor
[Figure: differences per site (vertical axis) against branch length (horizontal axis).]
The fraction of sites different after branches of different length, under the Jukes-Cantor model.
6 Kimura's (1980) K2P model of DNA change,
[Figure: the four bases; the transitions A <-> G and C <-> T occur at rate a, and each of the four transversions (A <-> C, A <-> T, G <-> C, G <-> T) occurs at rate b.]
which allows for different rates of transitions and transversions.
7 Motoo Kimura
[Photograph: Motoo Kimura, with family in Mishima, Japan, in the 1960s.]
8 Transition probabilities for the K2P model
with two kinds of events:
I. At rate α, if the site has a purine (A or G), choose one of the two purines at random and change to it. If the site has a pyrimidine (C or T), choose one of the pyrimidines at random and change to it.
II. At rate β, choose one of the 4 bases at random and change to it.
By proper choice of α and β one can achieve the overall rate of change and Ts/Tn ratio R you want. For rate of change 1, the transition probabilities are (warning: terminological tangle)
\[ \mathrm{Prob}(\text{transition} \mid t) = \tfrac{1}{4} - \tfrac{1}{2}\exp\left(-\tfrac{2R+1}{R+1}\,t\right) + \tfrac{1}{4}\exp\left(-\tfrac{2}{R+1}\,t\right) \]
\[ \mathrm{Prob}(\text{transversion} \mid t) = \tfrac{1}{2} - \tfrac{1}{2}\exp\left(-\tfrac{2}{R+1}\,t\right) \]
(the transversion probability is the sum of the probabilities of both kinds of transversions).
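The two-event construction can be turned directly into code. The sketch below is an illustration rather than the lecture's own derivation: it assumes total rate of change 1, solves for the event rates as β = 2/(R+1) and α = (2R-1)/(R+1) (which requires R >= 1/2), and then combines the chances that events of type I or II did or did not occur. The function name is hypothetical.

```python
import math

def k2p_probs(R, t):
    """Return (P(transition), P(transversion)) after time t under K2P.

    Assumes total rate of change 1 and Ts/Tn ratio R >= 0.5.
    """
    beta = 2.0 / (R + 1.0)               # rate of type II events (any of 4 bases)
    alpha = (2.0 * R - 1.0) / (R + 1.0)  # rate of type I events (within purines/pyrimidines)
    no_ii = math.exp(-beta * t)          # no type II event happened
    no_i = math.exp(-alpha * t)          # no type I event happened
    # After at least one type II event the base is uniform over all 4 bases;
    # with only type I events it is uniform over the 2 bases of its own class.
    p_transition = 0.25 * (1.0 - no_ii) + 0.5 * no_ii * (1.0 - no_i)
    p_transversion = 0.5 * (1.0 - no_ii)
    return p_transition, p_transversion

p_ts, p_tv = k2p_probs(R=2.0, t=0.3)     # illustrative values
print(p_ts, p_tv, p_ts + p_tv)
```

With R = 1/2 the type I rate is zero and the total probability of change reduces to the Jukes-Cantor value, which is a convenient consistency check.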
9 Transitions, transversions expected
[Figure: expected differences (total differences, transitions, transversions) against time (branch length), for R = 10.]
Transitions and transversions expected in different amounts of branch length under the K2P model, for Ts/Tn = 10.
10 Transitions, transversions expected
[Figure: expected differences (total differences, transitions, transversions) against time (branch length), for R = 2.]
Transitions and transversions expected in different amounts of branch length under the K2P model, for Ts/Tn = 2.
11 Other commonly used models include:
Two models that specify the equilibrium base frequencies (you provide the frequencies $\pi_A$, $\pi_C$, $\pi_G$, $\pi_T$ and they are set up to have an equilibrium which achieves them), and also let you control the transition/transversion ratio:
The Hasegawa-Kishino-Yano (1985) model:
\[
\begin{array}{c|cccc}
\text{from} \backslash \text{to} & A & G & C & T \\ \hline
A & - & \alpha\pi_G + \beta\pi_G & \alpha\pi_C & \alpha\pi_T \\
G & \alpha\pi_A + \beta\pi_A & - & \alpha\pi_C & \alpha\pi_T \\
C & \alpha\pi_A & \alpha\pi_G & - & \alpha\pi_T + \beta\pi_T \\
T & \alpha\pi_A & \alpha\pi_G & \alpha\pi_C + \beta\pi_C & -
\end{array}
\]
12 My F84 model
\[
\begin{array}{c|cccc}
\text{from} \backslash \text{to} & A & G & C & T \\ \hline
A & - & \alpha\pi_G + \beta\frac{\pi_G}{\pi_R} & \alpha\pi_C & \alpha\pi_T \\
G & \alpha\pi_A + \beta\frac{\pi_A}{\pi_R} & - & \alpha\pi_C & \alpha\pi_T \\
C & \alpha\pi_A & \alpha\pi_G & - & \alpha\pi_T + \beta\frac{\pi_T}{\pi_Y} \\
T & \alpha\pi_A & \alpha\pi_G & \alpha\pi_C + \beta\frac{\pi_C}{\pi_Y} & -
\end{array}
\]
where $\pi_R = \pi_A + \pi_G$ and $\pi_Y = \pi_C + \pi_T$ (the equilibrium frequencies of purines and pyrimidines).
Both of these models have formulas for the transition probabilities, and both are subcases of a slightly more general class of models, the Tamura-Nei model (1993).
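To make the HKY and F84 rate tables concrete, the following Python sketch builds both rate matrices from the same parameters and checks that the supplied base frequencies really are an equilibrium (the net flow of probability into each base is zero). The function names, the frequencies, and the values of α and β are hypothetical choices for illustration.

```python
BASES = "AGCT"
PURINES = {"A", "G"}

def is_transition(i, j):
    """True when i and j are both purines or both pyrimidines."""
    return (i in PURINES) == (j in PURINES)

def rate_matrix(pi, alpha, beta, model="HKY"):
    """Build the HKY or F84 instantaneous rate matrix as nested dicts q[from][to]."""
    pi_r = pi["A"] + pi["G"]   # equilibrium frequency of purines
    pi_y = pi["C"] + pi["T"]   # equilibrium frequency of pyrimidines
    q = {i: {} for i in BASES}
    for i in BASES:
        for j in BASES:
            if i == j:
                continue
            rate = alpha * pi[j]
            if is_transition(i, j):
                if model == "HKY":
                    rate += beta * pi[j]
                else:  # F84 divides by the frequency of the base class
                    rate += beta * pi[j] / (pi_r if j in PURINES else pi_y)
            q[i][j] = rate
        q[i][i] = -sum(q[i].values())  # each row sums to zero
    return q

def net_flow(q, pi):
    """Net rate of probability flow into each base when frequencies equal pi."""
    return {j: sum(pi[i] * q[i][j] for i in BASES) for j in BASES}

pi = {"A": 0.3, "G": 0.2, "C": 0.1, "T": 0.4}   # illustrative frequencies
hky = rate_matrix(pi, alpha=1.0, beta=2.0, model="HKY")
f84 = rate_matrix(pi, alpha=1.0, beta=2.0, model="F84")
print(net_flow(hky, pi))
print(net_flow(f84, pi))
```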
13 Reversibility
[Figure: two states i and j, with equilibrium frequencies $\pi_i$ and $\pi_j$ and probabilities of change $P_{ij}$ and $P_{ji}$ between them.]
\[ \pi_i P_{ij} = \pi_j P_{ji} \]
14 The General Time-Reversible model (GTR)
It maintains "detailed balance" so that the probability of starting at (say) A and ending at (say) T in evolution is the same as the probability of starting at T and ending at A:
\[
\begin{array}{c|cccc}
\text{from} \backslash \text{to} & A & G & C & T \\ \hline
A & - & \alpha\pi_G & \beta\pi_C & \gamma\pi_T \\
G & \alpha\pi_A & - & \delta\pi_C & \epsilon\pi_T \\
C & \beta\pi_A & \delta\pi_G & - & \upsilon\pi_T \\
T & \gamma\pi_A & \epsilon\pi_G & \upsilon\pi_C & -
\end{array}
\]
And there is of course the general 12-parameter model, which has arbitrary rates for each of the 12 possible changes (from each of the 4 nucleotides to each of the 3 others).
(Neither of these has formulas for the transition probabilities, but those can be computed numerically.)
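One common way to get GTR transition probabilities numerically is to exponentiate the rate matrix, P(t) = exp(Qt). The sketch below is a standard-library illustration, not the method of any particular package: it uses a truncated Taylor series with scaling and squaring (in practice an eigendecomposition or a library routine would normally be used). All function names and parameter values are assumptions made for the example.

```python
import math

def gtr_rate_matrix(pi, a, b, c, d, e, f):
    """GTR rate matrix in base order A, G, C, T.

    a..f are the symmetric parameters for the pairs AG, AC, AT, GC, GT, CT.
    """
    s = [[0, a, b, c],
         [a, 0, d, e],
         [b, d, 0, f],
         [c, e, f, 0]]
    q = [[s[i][j] * pi[j] for j in range(4)] for i in range(4)]
    for i in range(4):
        q[i][i] = -sum(q[i])   # diagonal makes each row sum to zero
    return q

def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_probs(q, t, squarings=10, terms=10):
    """Approximate exp(Q t): Taylor series on Q t / 2**squarings, then repeated squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    m = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, m)]  # m**k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

pi = [0.3, 0.2, 0.2, 0.3]                               # illustrative frequencies
q = gtr_rate_matrix(pi, 2.0, 0.5, 0.7, 0.6, 0.4, 1.8)   # illustrative rates
p = transition_probs(q, 0.5)
for row in p:
    print([round(v, 4) for v in row])
```

With all six parameters equal to 4/3 and all frequencies equal to 1/4, this reduces to the Jukes-Cantor model with u = 1, which gives a closed-form value to compare against.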
15 Relation between models
There are many other models, but these are the most widely used ones. Here is a general scheme of which models are subcases of which other ones, from most general to most restricted (number of parameters in parentheses):
General 12-parameter model (12)
General time-reversible model (9)
Tamura-Nei (6)
HKY (5) and F84 (5)
Kimura K2P (2)
Jukes-Cantor (1)
16 Rate variation among sites
In reality, rates of evolution are not constant among sites.
- Fortunately, in the transition probability formulas, rates come in as simple multiples of times.
- Thus if we know the rates at two sites, we can compute the probabilities of change by simply, for each site, multiplying all branch lengths by the appropriate rate.
- If we don't know the rates, we can imagine averaging them over a distribution of rates. Usually the Gamma distribution is used.
- In practice a discrete histogram of rates approximates the integration. (For the Gamma it seems best to use Generalized Laguerre Quadrature to pick the rates and frequencies in the histogram.)
- Also, there are actually autocorrelations, with neighboring sites having similar rates of change. This can be handled by Hidden Markov Models, which we cover later.
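The averaging over a discrete histogram of rates can be sketched as follows, using the Jukes-Cantor probability of change as the per-site formula. The three rates and their weights are made-up values chosen so that the mean rate is 1; they are not Gamma or Laguerre quadrature points, and the function names are hypothetical.

```python
import math

def jc_change_prob(branch_length, rate=1.0):
    """Jukes-Cantor probability of a difference; the rate simply multiplies the branch length."""
    return 0.75 * (1.0 - math.exp(-4.0 * rate * branch_length / 3.0))

def averaged_change_prob(branch_length, rates, weights):
    """Average the probability of change over a discrete histogram of rates."""
    return sum(w * jc_change_prob(branch_length, r) for r, w in zip(rates, weights))

# Hypothetical three-category histogram (weights sum to 1, mean rate is 1).
rates = [0.25, 1.0, 2.5]
weights = [0.4, 0.4, 0.2]

t = 0.5
single_rate = jc_change_prob(t)
mixed_rates = averaged_change_prob(t, rates, weights)
print(single_rate, mixed_rates)
```

Because the probability of change is a concave function of the rate, the averaged value is smaller than the value at the mean rate: ignoring rate variation makes a given branch length predict more differences than will actually be seen.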
17 A pioneer of protein evolution
[Photograph: Margaret Dayhoff, about 1966.]
18 Models of amino acid change in proteins
There are a variety of models put forward since the mid-1960s:
1. Amino acid transition matrices
- Dayhoff (1968) model. Tabulation of empirical changes in closely related pairs of proteins, normalized. The PAM100 matrix, for example, is the expected transition matrix given 1 substitution per position.
- Jones, Taylor and Thornton (1992) recalculated PAM matrices (the JTT matrix) from a much larger set of data.
- Jones, Taylor, and Thornton (1994a, 1994b) have tabulated a separate mutation data matrix for transmembrane proteins.
- Koshi and Goldstein (1995) have described the tabulation of further context-dependent mutation data matrices.
- Henikoff and Henikoff (1992) have tabulated the BLOSUM matrix for conserved motifs in gene families.
2. Goldman and Yang (1994) pioneered codon-based models (see next screen).
19 Approaches to protein sequence models
Making a model for protein sequence evolution (a not very practical approach):
1. Use a good model of DNA evolution.
2. Use the appropriate genetic code.
3. When an amino acid changes, accept it with a probability that declines as the amino acids become more different.
4. Fit this to empirical information on protein evolution.
5. Take into account variation of rate from site to site.
6. Take into account correlation of rate variation in adjacent sites.
7. How about protein structure? Secondary structure? 3-D structure?
20 Likelihoods and odds ratios
Bayes' Theorem relates prior and posterior probabilities of an hypothesis H:
\[ \mathrm{Prob}(H \mid D) = \mathrm{Prob}(H \text{ and } D) / \mathrm{Prob}(D) = \mathrm{Prob}(D \mid H)\,\mathrm{Prob}(H) / \mathrm{Prob}(D) \]
The ratio of posterior probabilities of two hypotheses, $H_1$ and $H_2$, can be written, putting this into its "odds ratio" form ($\mathrm{Prob}(D)$ cancels):
\[ \frac{\mathrm{Prob}(H_1 \mid D)}{\mathrm{Prob}(H_2 \mid D)} = \frac{\mathrm{Prob}(D \mid H_1)}{\mathrm{Prob}(D \mid H_2)} \cdot \frac{\mathrm{Prob}(H_1)}{\mathrm{Prob}(H_2)} \]
Note that this says that the posterior odds in favor of $H_1$ over $H_2$ are the product of prior odds and a likelihood ratio.
The likelihood of the hypothesis H is the probability of the observed data given it, $\mathrm{Prob}(D \mid H)$. This is not the same as the probability of the hypothesis given the data. That is the posterior probability of H and requires that we also have a believable prior probability $\mathrm{Prob}(H)$.
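A small numerical illustration of the odds-ratio form follows. The prior odds and the two likelihood values are invented for the example, and converting odds to a posterior probability assumes that the two hypotheses are the only possibilities.

```python
def posterior_odds(prior_odds, lik_h1, lik_h2):
    """Posterior odds of H1 over H2 = prior odds times the likelihood ratio."""
    return prior_odds * (lik_h1 / lik_h2)

def odds_to_probability(odds):
    """Posterior probability of H1, assuming H1 and H2 are the only two hypotheses."""
    return odds / (1.0 + odds)

# Invented numbers: H1 starts out disfavored 1 to 4, but the data are
# four times as probable under H1 as under H2.
prior = 0.25
odds = posterior_odds(prior, lik_h1=0.02, lik_h2=0.005)
print(odds, odds_to_probability(odds))
```

Here the likelihood ratio of about 4 cancels the prior odds of 1/4, leaving the two hypotheses roughly equally probable after seeing the data.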
21 Rationale of likelihood inference
If the data consists of $n$ items that are conditionally independent given the hypothesis $i$,
\[ \mathrm{Prob}(D \mid H_i) = \mathrm{Prob}(D^{(1)} \mid H_i)\,\mathrm{Prob}(D^{(2)} \mid H_i) \cdots \mathrm{Prob}(D^{(n)} \mid H_i) \]
and we can then write the likelihood ratio as a product of ratios:
\[ \frac{\mathrm{Prob}(D \mid H_1)}{\mathrm{Prob}(D \mid H_2)} = \prod_{i=1}^{n} \frac{\mathrm{Prob}(D^{(i)} \mid H_1)}{\mathrm{Prob}(D^{(i)} \mid H_2)} \]
If the amount of data is large, the likelihood ratio terms will dominate and push the result towards the correct hypothesis. This can console us somewhat for the lack of a believable prior.
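The product of per-item ratios is usually accumulated as a sum of logarithms. The sketch below uses a short coin-toss sequence as stand-in data and two arbitrary hypotheses about the heads probability (0.5 and 0.9); both hypotheses and the function name are assumptions for illustration. It also shows how repeating the same evidence makes the log likelihood ratio grow in proportion to the amount of data.

```python
import math

def log_likelihood_ratio(tosses, p1, p2):
    """Sum of per-toss log ratios Prob(toss | H1) / Prob(toss | H2)."""
    total = 0.0
    for toss in tosses:
        lik1 = p1 if toss == "H" else 1.0 - p1
        lik2 = p2 if toss == "H" else 1.0 - p2
        total += math.log(lik1 / lik2)
    return total

data = "HHTTHTHHTTT"          # 5 heads, 6 tails
p1, p2 = 0.5, 0.9             # H1: fair coin; H2: heavily biased coin (both hypothetical)

log_lr = log_likelihood_ratio(data, p1, p2)
log_lr_tenfold = log_likelihood_ratio(data * 10, p1, p2)
print(math.exp(log_lr), log_lr, log_lr_tenfold)
```

Even a prior of 100 to 1 against the fair coin (a log prior odds of about -4.6) is outweighed by the eleven tosses here, and ten times as much data would swamp almost any prior.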
22 Properties of likelihood inference
Likelihood inference has (usually) properties of
- Consistency. As the number of data items $n$ gets large, we converge to the correct hypothesis with probability 1.
- Efficiency. Asymptotically, the likelihood estimate has the smallest possible variance (it need not be best for any finite number $n$ of data points).
23 A simple example: coin tossing
If we toss a coin which has heads probability $p$ and get HHTTHTHHTTT, the likelihood is
\[ L = \mathrm{Prob}(D \mid p) = pp(1-p)(1-p)p(1-p)pp(1-p)(1-p)(1-p) = p^5(1-p)^6 \]
so that trying to maximize it we get
\[ \frac{dL}{dp} = 5p^4(1-p)^6 - 6p^5(1-p)^5 \]
24 Finding the ML estimate
and searching for a value of $p$ for which the slope is zero:
\[ \frac{dL}{dp} = p^4(1-p)^5\left(5(1-p) - 6p\right) = 0 \]
which has roots at 0, 1, and 5/11.
25 Log likelihoods
Alternatively, we could maximize not $L$ but its logarithm. This turns products into sums:
\[ \ln L = 5\ln p + 6\ln(1-p) \]
whereby
\[ \frac{d(\ln L)}{dp} = \frac{5}{p} - \frac{6}{1-p} = 0 \]
so that finally
\[ \hat{p} = 5/11 \]
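The coin-tossing estimate can be confirmed numerically. This sketch evaluates the log likelihood for 5 heads and 6 tails on a grid of values of p and checks that the derivative of ln L vanishes at 5/11; the grid resolution and function names are arbitrary choices.

```python
import math

def log_likelihood(p, heads=5, tails=6):
    """ln L = heads * ln p + tails * ln(1 - p)."""
    return heads * math.log(p) + tails * math.log(1.0 - p)

def score(p, heads=5, tails=6):
    """Derivative of ln L with respect to p."""
    return heads / p - tails / (1.0 - p)

grid = [k / 1000.0 for k in range(1, 1000)]   # p = 0.001, ..., 0.999
p_grid = max(grid, key=log_likelihood)        # best grid point
p_hat = 5.0 / 11.0                            # closed-form ML estimate

print(p_grid, p_hat, score(p_hat))
```

The grid maximum lands within one grid step of 5/11, and the score at 5/11 is zero up to rounding error.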
26 Likelihood curve for coin tosses
[Figure: likelihood (vertical axis) plotted against p (horizontal axis).]
27 Likelihood on trees
[Figure: a tree with tips A, C, C, C, G, interior nodes x, y, z, w, and branch lengths $t_1, \ldots, t_8$.]
A tree, with branch lengths, and the data at a single site. This example is used to describe calculation of the likelihood.
Since the sites evolve independently on the same tree,
\[ L = \mathrm{Prob}(D \mid T) = \prod_{i=1}^{m} \mathrm{Prob}\left(D^{(i)} \mid T\right) \]
28 Likelihood at one site on a tree
We can compute this by summing over all assignments of states x, y, z and w to the interior nodes:
\[ \mathrm{Prob}\left(D^{(i)} \mid T\right) = \sum_x \sum_y \sum_z \sum_w \mathrm{Prob}(A, C, C, C, G, x, y, z, w \mid T) \]
29 Computing the terms
For each combination of states, the Markov process allows us to express it as a product of probabilities of a series of changes, with the probability that we start in state x:
\[
\begin{aligned}
\mathrm{Prob}(A, C, C, C, G, x, y, z, w \mid T) = {} & \mathrm{Prob}(x)\,\mathrm{Prob}(y \mid x, t_6)\,\mathrm{Prob}(A \mid y, t_1)\,\mathrm{Prob}(C \mid y, t_2) \\
& \mathrm{Prob}(z \mid x, t_8)\,\mathrm{Prob}(C \mid z, t_3) \\
& \mathrm{Prob}(w \mid z, t_7)\,\mathrm{Prob}(C \mid w, t_4)\,\mathrm{Prob}(G \mid w, t_5)
\end{aligned}
\]
30 Computing the terms
Summing this up, there are 256 terms in this case:
\[
\begin{aligned}
\sum_x \sum_y \sum_z \sum_w {} & \mathrm{Prob}(x)\,\mathrm{Prob}(y \mid x, t_6)\,\mathrm{Prob}(A \mid y, t_1)\,\mathrm{Prob}(C \mid y, t_2) \\
& \mathrm{Prob}(z \mid x, t_8)\,\mathrm{Prob}(C \mid z, t_3) \\
& \mathrm{Prob}(w \mid z, t_7)\,\mathrm{Prob}(C \mid w, t_4)\,\mathrm{Prob}(G \mid w, t_5)
\end{aligned}
\]
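The 256-term sum can be written out as a brute-force loop. The sketch below is only an illustration: it assumes the Jukes-Cantor model with u = 1 for every branch, a uniform probability of 1/4 for the state x at the root, and branch lengths that are all set to 0.1. None of these values come from the lecture, and the function names are hypothetical.

```python
import math
from itertools import product

BASES = "ACGT"

def jc(start, end, t, u=1.0):
    """Jukes-Cantor Prob(end | start, t)."""
    decay = math.exp(-4.0 * u * t / 3.0)
    return 0.25 * (1.0 + 3.0 * decay) if start == end else 0.25 * (1.0 - decay)

def site_likelihood(tips, t):
    """Sum over all states of interior nodes x, y, z, w.

    tips: the five tip states, in the order of the branches t1, t2, t3, t4, t5.
    t: dict mapping branch index 1..8 to branch length.
    Returns the likelihood and the number of terms summed.
    """
    s1, s2, s3, s4, s5 = tips
    total, n_terms = 0.0, 0
    for x, y, z, w in product(BASES, repeat=4):
        term = (0.25                                   # Prob(x), assumed uniform
                * jc(x, y, t[6]) * jc(y, s1, t[1]) * jc(y, s2, t[2])
                * jc(x, z, t[8]) * jc(z, s3, t[3])
                * jc(z, w, t[7]) * jc(w, s4, t[4]) * jc(w, s5, t[5]))
        total += term
        n_terms += 1
    return total, n_terms

branch = {k: 0.1 for k in range(1, 9)}   # hypothetical branch lengths
lik, n_terms = site_likelihood("ACCCG", branch)
print(lik, n_terms)
```

The number of terms grows as 4 raised to the number of interior nodes, so this direct summation is practical only for very small trees.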
31 References (models)
Barry, D., and J. A. Hartigan. Statistical analysis of hominoid molecular evolution. Statistical Science 2. [Early use of full 12-parameter model]
Dayhoff, M. O., and R. V. Eck. 1968. Atlas of Protein Sequence and Structure. National Biomedical Research Foundation, Silver Spring, Maryland. [Dayhoff's PAM model for proteins]
Goldman, N., and Z. Yang. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Molecular Biology and Evolution 11. [Codon-based protein/DNA models]
Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution 22. [HKY model]
Henikoff, S., and J. G. Henikoff. 1992. Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences, USA 89. [BLOSUM protein model]
32 References (models)
Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation of mutation data matrices from protein sequences. Computer Applications in the Biosciences (CABIOS) 8. [JTT model for proteins]
Jones, D. T., W. R. Taylor, and J. M. Thornton. 1994a. A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochemistry 33. [JTT membrane protein model]
Jones, D. T., W. R. Taylor, and J. M. Thornton. 1994b. A mutation data matrix for transmembrane proteins. FEBS Letters 339. [JTT membrane protein model]
Jukes, T. H., and C. Cantor. 1969. Evolution of protein molecules. In Mammalian Protein Metabolism, ed. M. N. Munro. Academic Press, New York. [Jukes-Cantor model]
Kimura, M. 1980. A simple model for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution 16. [Kimura's 2-parameter model]
33 References (models)
Koshi, J. M., and R. A. Goldstein. 1995. Context-dependent optimal substitution matrices. Protein Engineering 8. [Generating other kinds of protein model matrices]
Lanave, C., G. Preparata, C. Saccone, and G. Serio. A new method for calculating evolutionary substitution rates. Journal of Molecular Evolution 20. [General reversible model]
Lockhart, P. J., M. A. Steel, M. D. Hendy, and D. Penny. Recovering evolutionary trees under a more realistic model of sequence evolution. Molecular Biology and Evolution 11. [The LogDet distance for correcting for changing base composition]
Tamura, K., and M. Nei. 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution 10. [Tamura-Nei model]
34 References (likelihood and Bayesian)
Barry, D., and J. A. Hartigan. Statistical analysis of hominoid molecular evolution. Statistical Science 2. [ML with full 12-parameter model, estimated on each branch]
Edwards, A. W. F., and L. L. Cavalli-Sforza. Reconstruction of evolutionary trees. In Phenetic and Phylogenetic Classification, ed. V. H. Heywood and J. McNeill. Systematics Association Publ. No. 6, London. [First paper on likelihood for phylogenies]
Felsenstein, J. Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution 17. [Made likelihood practical for n species]
Felsenstein, J. Maximum likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Systematic Zoology 22. [The "pruning" algorithm]
Fisher, R. A. On an absolute criterion for fitting frequency curves. Messenger of Mathematics 41. [First modern paper introducing likelihood]
Fisher, R. A. On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, A 222. [Likelihood in generality]
35 References (likelihood and Bayesian)
Kashyap, R. L., and S. Subas. Statistical estimation of parameters in a phylogenetic tree using a dynamic model of the substitutional process. Journal of Theoretical Biology 47. [Second paper applying likelihood to molecular sequences]
Neyman, J. Molecular studies of evolution: a source of novel statistical problems. In Statistical Decision Theory and Related Topics, ed. S. S. Gupta and J. Yackel. Academic Press, New York. [First application of likelihood to molecular sequences]
36 How it was done
This projection was produced using the prosper style in LaTeX, using LaTeX to make a .dvi file, using dvips to turn this into a PostScript file, using ps2pdf to make it into a PDF file, and displaying the slides in Adobe Acrobat Reader.
Result: nice slides using freeware.
More informationLab 9: Maximum Likelihood and Modeltest
Integrative Biology 200A University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS" Spring 2010 Updated by Nick Matzke Lab 9: Maximum Likelihood and Modeltest In this lab we re going to use PAUP*
More informationWeek 8: Testing trees, Bootstraps, jackknifes, gene frequencies
Week 8: Testing trees, ootstraps, jackknifes, gene frequencies Genome 570 ebruary, 2016 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.1/69 density e log (density) Normal distribution:
More informationEVOLUTIONARY DISTANCES
EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:
More informationIntegrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley
Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley B.D. Mishler Feb. 14, 2018. Phylogenetic trees VI: Dating in the 21st century: clocks, & calibrations;
More informationMaximum Likelihood in Phylogenetics
Maximum Likelihood in Phylogenetics 26 January 2011 Workshop on Molecular Evolution Český Krumlov, Česká republika Paul O. Lewis Department of Ecology & Evolutionary Biology University of Connecticut,
More informationEfficiencies of maximum likelihood methods of phylogenetic inferences when different substitution models are used
Molecular Phylogenetics and Evolution 31 (2004) 865 873 MOLECULAR PHYLOGENETICS AND EVOLUTION www.elsevier.com/locate/ympev Efficiencies of maximum likelihood methods of phylogenetic inferences when different
More informationMutation models I: basic nucleotide sequence mutation models
Mutation models I: basic nucleotide sequence mutation models Peter Beerli September 3, 009 Mutations are irreversible changes in the DNA. This changes may be introduced by chance, by chemical agents, or
More informationIn: P. Lemey, M. Salemi and A.M. Vandamme (eds.). To appear in: The. Chapter 4. Nucleotide Substitution Models
In: P. Lemey, M. Salemi and A.M. Vandamme (eds.). To appear in: The Phylogenetic Handbook. 2 nd Edition. Cambridge University Press, UK. (final version 21. 9. 2006) Chapter 4. Nucleotide Substitution
More informationUnderstanding relationship between homologous sequences
Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective
More informationPhylogenetic Inference using RevBayes
Phylogenetic Inference using RevBayes Model section using Bayes factors Sebastian Höhna 1 Overview This tutorial demonstrates some general principles of Bayesian model comparison, which is based on estimating
More informationKaKs Calculator: Calculating Ka and Ks Through Model Selection and Model Averaging
Method KaKs Calculator: Calculating Ka and Ks Through Model Selection and Model Averaging Zhang Zhang 1,2,3#, Jun Li 2#, XiaoQian Zhao 2,3, Jun Wang 1,2,4, Gane KaShu Wong 2,4,5, and Jun Yu 1,2,4 * 1
More informationPhylogenetic Tree Reconstruction
I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven
More informationEdward Susko Department of Mathematics and Statistics, Dalhousie University. Introduction. Installation
1 dist est: Estimation of RatesAcrossSites Distributions in Phylogenetic Subsititution Models Version 1.0 Edward Susko Department of Mathematics and Statistics, Dalhousie University Introduction The
More informationLocal Alignment Statistics
Local Alignment Statistics Stephen Altschul National Center for Biotechnology Information National Library of Medicine National Institutes of Health Bethesda, MD Central Issues in Biological Sequence Comparison
More informationPhylogenetic Assumptions
Substitution Models and the Phylogenetic Assumptions Vivek Jayaswal Lars S. Jermiin COMMONWEALTH OF AUSTRALIA Copyright htregulation WARNING This material has been reproduced and communicated to you by
More informationEstimating Divergence Dates from Molecular Sequences
Estimating Divergence Dates from Molecular Sequences Andrew Rambaut and Lindell Bromham Department of Zoology, University of Oxford The ability to date the time of divergence between lineages using molecular
More informationStatistical nonmolecular phylogenetics: can molecular phylogenies illuminate morphological evolution?
Statistical nonmolecular phylogenetics: can molecular phylogenies illuminate morphological evolution? 30 July 2011. Joe Felsenstein Workshop on Molecular Evolution, MBL, Woods Hole Statistical nonmolecular
More informationPhylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz
Phylogenetic Trees What They Are Why We Do It & How To Do It Presented by Amy Harris Dr Brad Morantz Overview What is a phylogenetic tree Why do we do it How do we do it Methods and programs Parallels
More informationLie Markov models. Jeremy Sumner. School of Physical Sciences University of Tasmania, Australia
Lie Markov models Jeremy Sumner School of Physical Sciences University of Tasmania, Australia Stochastic Modelling Meets Phylogenetics, UTAS, November 2015 Jeremy Sumner Lie Markov models 1 / 23 The theory
More informationSequence Alignment: Scoring Schemes. COMP 571 Luay Nakhleh, Rice University
Sequence Alignment: Scoring Schemes COMP 571 Luay Nakhleh, Rice University Scoring Schemes Recall that an alignment score is aimed at providing a scale to measure the degree of similarity (or difference)
More informationTHEORY. Based on sequence Length According to the length of sequence being compared it is of following two types
Exp 11 THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between
More informationInferring Speciation Times under an Episodic Molecular Clock
Syst. Biol. 56(3):453 466, 2007 Copyright c Society of Systematic Biologists ISSN: 10635157 print / 1076836X online DOI: 10.1080/10635150701420643 Inferring Speciation Times under an Episodic Molecular
More informationBayesian Analysis of Elapsed Times in ContinuousTime Markov Chains
Bayesian Analysis of Elapsed Times in ContinuousTime Markov Chains Marco A. R. Ferreira 1, Marc A. Suchard 2,3,4 1 Department of Statistics, University of Missouri at Columbia, USA 2 Department of Biomathematics
More informationMaximum Likelihood in Phylogenetics
Maximum Likelihood in Phylogenetics 29 July 2014 Workshop on Molecular Evolution Woods Hole, Massachusetts Paul O. Lewis Department of Ecology & Evolutionary Biology Paul O. Lewis (2014 Woods Hole Workshop
More informationMaximum Likelihood Estimation on Large Phylogenies and Analysis of Adaptive Evolution in Human Influenza Virus A
J Mol Evol (2000) 51:423 432 DOI: 10.1007/s002390010105 SpringerVerlag New York Inc. 2000 Maximum Likelihood Estimation on Large Phylogenies and Analysis of Adaptive Evolution in Human Influenza Virus
More informationInferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution
Today s topics Inferring phylogeny Introduction! Distance methods! Parsimony method!"#$%&'(!)* +,.'/01!23454(6!7!2845*0&4'9#6!:&454(6 ;?@AB=C?DEF Overview of phylogenetic inferences Methodology Methods
More informationPhylogenetic inference
Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis) advantages of different information types
More informationScoring Matrices. Shifra BenDor Irit Orr
Scoring Matrices Shifra BenDor Irit Orr Scoring matrices Sequence alignment and database searching programs compare sequences to each other as a series of characters. All algorithms (programs) for comparison
More informationHow Molecules Evolve. Advantages of Molecular Data for Tree Building. Advantages of Molecular Data for Tree Building
How Molecules Evolve Guest Lecture: Principles and Methods of Systematic Biology 11 November 2013 Chris Simon Approaching phylogenetics from the point of view of the data Understanding how sequences evolve
More informationBLAST: Target frequencies and information content Dannie Durand
Computational Genomics and Molecular Biology, Fall 2016 1 BLAST: Target frequencies and information content Dannie Durand BLAST has two components: a fast heuristic for searching for similar sequences
More informationModeling Noise in Genetic Sequences
Modeling Noise in Genetic Sequences M. Radavičius 1 and T. Rekašius 2 1 Institute of Mathematics and Informatics, Vilnius, Lithuania 2 Vilnius Gediminas Technical University, Vilnius, Lithuania 1. Introduction:
More information7.36/7.91 recitation CB Lecture #4
7.36/7.91 recitation 2192014 CB Lecture #4 1 Announcements / Reminders Homework:  PS#1 due Feb. 20th at noon.  Late policy: ½ credit if received within 24 hrs of due date, otherwise no credit  Answer
More informationarxiv: v1 [qbio.pe] 4 Sep 2013
Version dated: September 5, 2013 Predicting ancestral states in a tree arxiv:1309.0926v1 [qbio.pe] 4 Sep 2013 Predicting the ancestral character changes in a tree is typically easier than predicting the
More informationMolecular phylogeny  Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016
Molecular phylogeny  Using molecular sequences to infer evolutionary relationships Tore Samuelsson Feb 2016 Molecular phylogeny is being used in the identification and characterization of new pathogens,
More informationBioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre
Bioinformatics Scoring Matrices David Gilbert Bioinformatics Research Centre www.brc.dcs.gla.ac.uk Department of Computing Science, University of Glasgow Learning Objectives To explain the requirement
More informationLikelihoods and Phylogenies
Likelihoods and Phylogenies Joe Felsenstein Department of enome Sciences and Department of Biology University of Washington, Seattle Likelihoods and Phylogenies p.1/68 n ideal parsimony method? Ideally,
More informationEvolutionary Models. Evolutionary Models
Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment
More informationTheoretical Basis of Likelihood Methods in Molecular Phylogenetic Inference
Theoretical Basis of Likelihood Methods in Molecular Phylogenetic Inference Rhiju Das, Centre of Mathematics and Physical Sciences applied to Life science and EXperimental biology (CoMPLEX), University
More informationA Statistical Test of Phylogenies Estimated from Sequence Data
A Statistical Test of Phylogenies Estimated from Sequence Data WenHsiung Li Center for Demographic and Population Genetics, University of Texas A simple approach to testing the significance of the branching
More informationWeek 6: Restriction sites, RAPDs, microsatellites, likelihood, hidden Markov models
Week 6: Restriction sites, RAPDs, microsatellites, likelihood, hidden Markov models Genome 570 February, 2012 Week 6: Restriction sites, RAPDs, microsatellites, likelihood, hidden Markov models p.1/63
More information"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley
"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley B.D. Mishler Feb. 1, 2011. Qualitative character evolution (cont.)  comparing
More informationNatural selection on the molecular level
Natural selection on the molecular level Fundamentals of molecular evolution How DNA and protein sequences evolve? Genetic variability in evolution } Mutations } forming novel alleles } Inversions } change
More informationCounting phylogenetic invariants in some simple cases. Joseph Felsenstein. Department of Genetics SK50. University of Washington
Counting phylogenetic invariants in some simple cases Joseph Felsenstein Department of Genetics SK50 University of Washington Seattle, Washington 98195 Running Headline: Counting Phylogenetic Invariants
More information