A COMMON APPROACH TO FINDING THE OPTIMAL SCENARIOS OF A MARKOV STOCHASTIC PROCESS OVER A PHYLOGENETIC TREE


Article: BIOINFORMATICS
DOI:.554/bbeq.22.7
Biotechnol. & Biotechnol. Eq. 2 26(5)

Petar Konovski
University of London, Department of Computer Science and Information Systems, Birkbeck, London, UK
petar@dcs.bbk.ac.uk

ABSTRACT
Inferring phylogenetic trees is a general approach to reconstructing the evolutionary histories of organisms. Several criteria and algorithms are used to estimate the events over a phylogenetic tree. In the present work, a common approach is proposed together with an effective algorithm. The approach aims to unify the application of the Maximum parsimony and Maximum likelihood criteria over a phylogenetic tree with various edge lengths. The events over an edge of the phylogenetic tree are described by a Markov stochastic process, and explicit formulae are given for the simplest case. Maximum likelihood is reformulated from the point of view of Information Theory as Minimum surprisal. Additionally, a new scoring criterion, Minimum entropy, is proposed.

Keywords: phylogenetic tree, maximum parsimony, maximum likelihood, Markov stochastic process, minimum entropy

Introduction
Mapping evolutionary events on a phylogenetic tree is a widely used approach to elucidating the inheritance of given traits in a set of species. Here we attempt to create a common framework for the various approaches to such mapping and to introduce some improvements to the existing algorithms.

Maximum parsimony
Maximum parsimony (MP) is the oldest approach to estimating the events over a phylogenetic tree. It is the simplest one and inevitably has its limitations, but it is nevertheless still widely used.
For the purpose of estimating phylogenetic events, MP can be formulated as explaining the present-day distribution of hereditary traits in the investigated group of species using minimal assumptions about the events which have happened in the common ancestors. In our case, we consider only two events that can happen in the nodes of the tree: a gain of the trait and a loss of the trait. Mere inheritance is not considered an event: it happens by default every time the trait is present in the ancestor and no loss happens. Because losses of traits are expected to happen more often than gains, a gain penalty coefficient $G > 1$ is introduced, while the loss penalty is assumed to be equal to 1. Thus, we try to minimize the sum of the penalties. Formally, the scoring function over the nodes $\nu$ of the tree is

$$F(\nu) = \begin{cases} 1 & \text{if a loss happens,} \\ G & \text{if a gain happens,} \\ 0 & \text{if nothing happens,} \end{cases} \qquad \text{(Eq. 1)}$$

and the goal of Maximum parsimony is to find

$$\min \sum_{\nu \in T} F(\nu),$$

where the minimum is taken over all speculative event sets which can explain the distribution of the trait in the leaves. In this case, minimal penalties ⇔ minimum expenditure ⇔ maximum parsimony.

An efficient algorithm, PARS (PARsimonious Scenario), for computing the Maximum parsimony over a given phylogenetic tree is given by Mirkin et al. (6). Although (6) assumes that the gain and loss penalties are uniform over the tree, the recursive structure of PARS allows the use of node-specific penalties.

Maximum parsimony was used in different areas long before the rise of contemporary evolutionary theory. In fact, it can be considered an implementation of Occam's razor.

An example of a phylogenetic tree
In the example shown, the protein content of 26 prokaryotic species is investigated.
Given the present-day protein content of the species (represented by the leaves of the tree), the protein content of their ancestors (represented by the internal nodes of the tree) is estimated using Maximum parsimony. The proteins are grouped in 337 COGs (Clusters of Orthologous Groups of proteins). A graphic representation of the phylogenetic tree is given in Fig. 1; for every node of the tree, the numbers of gained and lost COGs are shown. The graphics were generated by the ER package (in testing phase). The data set used is the same as in (6). The data are extracted from the COG database (12).

Fig. 1. Gains/losses of 337 COGs mapped over a phylogenetic tree.

Maximum likelihood
In phylogenetic research, Maximum likelihood (ML) is widely used nowadays and is considered more accurate than MP. It is a general statistical method, and its application in phylogenetic research originates from Farris (4). The method relies on estimating the likelihoods of gain or loss in every node of the tree. This allows the calculations to be applied to trees with variable edge lengths, or with gain and loss rates which vary from edge to edge. The variable edge lengths reflect the variable time intervals between the various ancestor-child pairs represented in the tree.

Now the probabilities of gains and losses are no longer equal for different nodes. Let $\gamma_\nu$ and $\lambda_\nu$ be the probabilities of gain and loss in node $\nu$, and let $\bar\gamma_\nu = 1 - \gamma_\nu$ and $\bar\lambda_\nu = 1 - \lambda_\nu$. The scoring function is

$$L(\nu) = \begin{cases} \lambda_\nu & \text{if a loss happens,} \\ \gamma_\nu & \text{if a gain happens,} \\ \bar\lambda_\nu & \text{if no loss happens,} \\ \bar\gamma_\nu & \text{if no gain happens.} \end{cases} \qquad \text{(Eq. 2)}$$

Then we search for

$$\max \prod_{\nu \in T} L(\nu)$$

over all speculative event sets which can explain the pattern in the leaves.

The main obstacle for the algorithm is the initial set of values for $\gamma_\nu$ and $\lambda_\nu$, because at the beginning the researcher has information about the contents of the leaves only. This is why the algorithm is initially fed with these values from an external source (e.g., probabilities calculated from the output of MP). After that, an iterative process can be applied, calculating new values of $\gamma_\nu$ and $\lambda_\nu$ from the newly obtained gains and losses. This iterative procedure, together with the first open problem about it (described below), originates from Mirkin et al. (7). The procedure is part of the algorithm MALS (MAximum Likelihood Scenario) (7), on which some of the following considerations are based.

Because the gains and losses obtained for a single trait are too scarce to draw reliable conclusions about the node-dependent gain and loss rates, these rates are calculated using the data for all traits (e.g., in the example cited above, for all 337 COGs). This is based on the assumption that the gain and loss rates are equal for all traits, though they can be node-specific.

Open problems: The following problems concerning the iteration procedure still need answers:

1. Is the iteration process always convergent? So far, no situation has been observed in which the iteration enters an endless cycle. On data similar to the example given above, the iteration procedure reaches an extremum within a maximum of … iterations.
2. Are there any local extrema in the iteration procedure? Chor et al. (1) give counterexamples of phylogenetic trees for which the Maximum likelihood function has multiple extrema, even a continuum of extrema. A similar maximum likelihood problem is investigated analytically by Vandev et al. (13).

Materials and Methods

The Phylogenetic Scenarios from the Point of View of Information Theory
Claude Shannon, in his work (11), founded contemporary Information Theory, or the Mathematical Theory of Communication. We will use some of its basic elements. The ideas of Claude Shannon have influenced various branches of engineering and science, from communication lines and data storage to cryptography and compression algorithms.

The information content of the outcome of a random event
Let X be a discrete random event and $\omega$ be one of its outcomes, with probability $P(\omega)$. Then the knowledge that the outcome is $\omega$ gives us the amount of information (also called self-information or surprisal)

$$I(\omega) = \log\frac{1}{P(\omega)} = -\log P(\omega).$$

If the base of the logarithm is 2, the measure is in the well-known bits; if the base is e, the measure is in nats; if the base is 10, the measure is in hartleys.

Example: When flipping a fair coin, the knowledge that the outcome is heads gives us a surprisal of $-\log_2(1/2) = 1$ bit.

The information content of a scenario over a phylogenetic tree
Let $\omega$ be one of the alternative scenarios over the phylogenetic tree T. As mentioned before, it is described by its set of specific gains and losses in the nodes of the tree. The probability of the scenario is

$$P(\omega) = \prod_{\nu \in T} L(\nu)$$

and its information content is

$$I(\omega) = -\sum_{\nu \in T} \log L(\nu).$$
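The two definitions above can be checked numerically. The following is a minimal Python sketch, assuming a scenario is represented simply by the list of per-node scores $L(\nu)$ it selects; the helper names `surprisal` and `scenario_information` and the two toy scenarios are hypothetical, introduced only for illustration. It shows that summing the per-node surprisals gives $-\log$ of the product, so the scenario with the larger probability always has the smaller information content.

```python
import math


def surprisal(p: float, base: float = 2.0) -> float:
    """Self-information -log_base(p) of an outcome with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log(p) / math.log(base)


def scenario_information(node_scores: list[float], base: float = 2.0) -> float:
    """I(omega) = -sum log L(nu), for a scenario given by its per-node scores L(nu)."""
    return sum(surprisal(p, base) for p in node_scores)


# Two hypothetical scenarios, each listed as the scores L(nu) it selects.
scenario_a = [0.9, 0.05, 0.8, 0.95]
scenario_b = [0.9, 0.20, 0.3, 0.95]

for scores in (scenario_a, scenario_b):
    likelihood = math.prod(scores)
    info = scenario_information(scores)
    # -log of the product equals the sum of the per-node surprisals.
    assert math.isclose(info, -math.log2(likelihood))
    print(f"P = {likelihood:.4f}, I = {info:.3f} bits")
```

Here `scenario_b` has the larger probability and the smaller information content; ranking scenarios by either quantity gives the same order.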
The phylogenetic scenario which minimizes $I(\omega)$ maximizes the likelihood function, and vice versa. This follows from the fact that the function $-\log$ is strictly decreasing. Hence, one can use the scoring function

$$I(\nu) = \begin{cases} -\log\lambda_\nu & \text{if a loss happens,} \\ -\log\gamma_\nu & \text{if a gain happens,} \\ -\log\bar\lambda_\nu & \text{if no loss happens,} \\ -\log\bar\gamma_\nu & \text{if no gain happens} \end{cases} \qquad \text{(Eq. 3)}$$

instead of (Eq. 2), and search for

$$\min \sum_{\nu \in T} I(\nu).$$

It can be seen that the function we intend to minimize is very similar to that of Maximum parsimony. This allows us to apply a common minimization algorithm.

The entropy of a random variable
Another concept from Information Theory which can be useful is the entropy of a random variable. It is a different entity from the physical concept of entropy, though the two have some important similarities. Now let X be a random variable with outcomes $\{x_1, \dots, x_n\}$ (not necessarily numbers) which happen with the corresponding probabilities $\{\lambda_1, \dots, \lambda_n\}$. If the outcome of the experiment is $x_i$, the information content of the outcome is $I(x_i) = -\log\lambda_i$, and the entropy of X is defined as

$$H(X) = \sum_{i=1}^{n} \lambda_i I(x_i) = -\sum_{i=1}^{n} \lambda_i \log\lambda_i.$$

The entropy of a scenario over a phylogenetic tree
Another approach which can benefit from the common framework is to investigate the entropy of the scenarios over a phylogenetic tree. As before, let $\gamma_\nu$ and $\lambda_\nu$ be the probabilities of gain and loss in node $\nu$, with $\bar\gamma_\nu = 1 - \gamma_\nu$ and $\bar\lambda_\nu = 1 - \lambda_\nu$. The scoring function is

$$H(\nu) = \begin{cases} -\lambda_\nu\log\lambda_\nu & \text{if a loss happens,} \\ -\gamma_\nu\log\gamma_\nu & \text{if a gain happens,} \\ -\bar\lambda_\nu\log\bar\lambda_\nu & \text{if no loss happens,} \\ -\bar\gamma_\nu\log\bar\gamma_\nu & \text{if no gain happens.} \end{cases} \qquad \text{(Eq. 4)}$$

The Phylogenetic Tree Events As Outputs of a Markov Process
It is a commonly accepted paradigm that, given the genetic content of a specific organism, the genetic content of its children depends on this information only and does not depend on the genetic content of other ancestors and siblings. This justifies declaring that the processes of inheritance have the Markov property.
Following this point of view, the natural way to introduce lengths for the edges of a phylogenetic tree is to consider them as time intervals. So, when talking about edge lengths, we assume that they are the time intervals between the considered events, often without being able to specify the measurement units of this time.

Definitions and examples
Let A be a parent node, S be one of its children, and the edge length be arbitrary. Let the trait of interest be described by a set of characters $\{c_1, c_2, \dots, c_m\}$, only one of which can be present in A or in S. The outcomes of the process can be described by the matrix $(P_{ij})$, where $P_{ij}$ is the probability of replacing the character $c_i$ (if present in A) by the character $c_j$ in S.

Example 1: In microevolution studies of Single Nucleotide Polymorphisms (SNPs), the characters can be chosen as {∅, A, C, G, T} (empty, Adenine, Cytosine, Guanine, Thymine). At a specified position in a DNA string, all possible events are described by a 5 × 5 matrix containing the corresponding probabilities.

Example 2: When a given amino acid is substituted by another one in a polypeptide chain, the matrix describing the process is 20 × 20. If the character ∅ (absence of the amino acid) is also considered, the matrix becomes 21 × 21.

Example 3: In our case, we need only two characters: 0, the trait k is absent, and 1, the trait k is present. The matrix is 2 × 2.

Formulation as a Markov process: Kolmogorov's forward equation
In the rest of this section, we follow Ross (9). Let g be the gain rate and l be the loss rate over the edge. The infinitesimal generator of the Markov process is given by the matrix

$$M = \begin{pmatrix} -g & g \\ l & -l \end{pmatrix}.$$

The probability that a given character i will be replaced by the character j along an edge of length t is $P_{ij}(t)$, where i and j take the values 0 or 1. $P(t)$ is the solution of Kolmogorov's forward equations, which in matrix form can be written as $P'(t) = P(t)\,M$. The solution is $P(t) = \exp(Mt)$, where $\exp(Mt)$ is the matrix exponential of $Mt$.

The solutions
Luckily, in the case of two characters the solutions can be found analytically:

$$P_{00}(t) = \frac{l}{l+g} + \frac{g}{l+g}\,e^{-(l+g)t}$$
The trait is absent in the parent and is still absent after time t.

$$P_{01}(t) = \frac{g}{l+g} - \frac{g}{l+g}\,e^{-(l+g)t}$$
The trait is absent in the parent and is present after time t (gain).

$$P_{10}(t) = \frac{l}{l+g} - \frac{l}{l+g}\,e^{-(l+g)t}$$
The trait is present in the parent and is absent after time t (loss).

$$P_{11}(t) = \frac{g}{l+g} + \frac{l}{l+g}\,e^{-(l+g)t}$$
The trait is present in the parent and is still present after time t.

Comment
Though the infinitesimal rates g and l and the time t are additional unknown parameters, this approach gives us a basis for evaluating the node-dependent probabilities of gain and loss in some situations.
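The closed-form solutions above can be verified against the definition $P(t) = \exp(Mt)$. The sketch below is plain Python with no external libraries; the function names are hypothetical, and the rates g, l and the edge length t are arbitrary illustrative values, not estimates from the paper's data. It evaluates the four formulas and compares them with a truncated Taylor series of the matrix exponential built from the generator M.

```python
import math


def closed_form_p(g: float, l: float, t: float) -> list[list[float]]:
    """P(t) for the two-state process; state 0 = trait absent, 1 = trait present."""
    s = g + l
    e = math.exp(-s * t)
    return [
        [l / s + g / s * e, g / s - g / s * e],  # from absent: P00, P01
        [l / s - l / s * e, g / s + l / s * e],  # from present: P10, P11
    ]


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def expm_series(m: list[list[float]], t: float, terms: int = 60) -> list[list[float]]:
    """exp(M t) by a truncated Taylor series; adequate when the entries of M t are small."""
    mt = [[x * t for x in row] for row in m]
    term = [[1.0, 0.0], [0.0, 1.0]]   # (M t)^0 / 0!
    total = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = [[x / n for x in row] for row in matmul(term, mt)]  # (M t)^n / n!
        total = [[total[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return total


g, l, t = 0.3, 1.2, 0.7  # illustrative values only
M = [[-g, g], [l, -l]]   # infinitesimal generator
P = closed_form_p(g, l, t)
P_series = expm_series(M, t)
for i in range(2):
    assert math.isclose(sum(P[i]), 1.0)  # each row is a probability distribution
    for j in range(2):
        assert math.isclose(P[i][j], P_series[i][j], rel_tol=1e-9)
```

The Taylor series is used only because it is short and dependency-free; for large $(l+g)t$ a scaling-and-squaring matrix exponential would be the safer numerical choice.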
Conversely, given estimates of the gain and loss probabilities, one can estimate t, g and l (from a single edge, only the products $g t$ and $l t$ can be recovered, since multiplying both rates by a constant and dividing t by it leaves $P(t)$ unchanged). This approach also explains the four scoring coefficients attached to each node in the previous considerations: though they are not the time-dependent probabilities found above, they are functions of them or rough approximations of them (as in the case of Maximum parsimony). The nature of these solutions shows how to use any information about the time spent during the transition from the parent to the child.

Results and Discussion

A Generalized Minimization Algorithm

Preliminary notes
The algorithm described is a generalization of a set of algorithms which have been reinvented several times and applied in special cases, sometimes as MP and sometimes as ML reconstruction. The first idea of such an approach was developed by Fitch (5), using a set-theoretic approach, and was applied to MP for nucleotide substitution reconstruction. A formal description was given by Sankoff (10) and applied to MP. An implementation for ML in the case of amino-acid substitutions was made by Pupko et al. (8).

The generalized approach aims to deal with a class of assessments which includes Maximum parsimony, Minimum surprisal (equivalent to Maximum likelihood) and Minimum entropy. The approach covers the case of trees with different edge lengths and the case when the events have different rates over different parts of the tree. Owing to the Markov property of the processes over the phylogenetic tree, a wide class of scoring approaches leads to a set of weighting coefficients per node. In our case, there are four coefficients. In the corresponding weighting function, exactly one of them appears as an addend representing the corresponding node, and it is chosen according to uniform rules.

Notations
Let the investigated phylogenetic tree T be binary and rooted, with root R, and let $L \subset T$ be the set of its leaves. For $\nu \in T \setminus L$ we denote the subtree with root $\nu$ as $U(\nu)$.
Let $K = \{k_1, \dots, k_m\}$ be a set of independent hereditary characters whose presence or absence in the leaves is given as initial data. In the further presentation, only a fixed $k \in K$ will be considered. The basic building block of the tree is an ancestor node A with two children $S_1$ and $S_2$. There are four cases for a child S, each with its own weighting coefficient:

- Loss: $k \in A$ and $k \notin S$: $l_S$;
- Gain: $k \notin A$ and $k \in S$: $g_S$;
- Not loss (inheritance): $k \in A$ and $k \in S$: $\bar l_S$;
- Not gain: $k \notin A$ and $k \notin S$: $\bar g_S$.

In particular, for the scoring approaches considered above, we have (node subscripts omitted):
- For Maximum parsimony: $l = 1$, $g = G$, $\bar l = 0$, $\bar g = 0$.
- For Minimum surprisal: $l = -\log\lambda$, $g = -\log\gamma$, $\bar l = -\log\bar\lambda$, $\bar g = -\log\bar\gamma$.
- For Minimum entropy: $l = -\lambda\log\lambda$, $g = -\gamma\log\gamma$, $\bar l = -\bar\lambda\log\bar\lambda$, $\bar g = -\bar\gamma\log\bar\gamma$.

The elementary score function
For every $A \in T \setminus L$ and conditionally on the trait k, we define a triad of Boolean variables $(k \in A,\ k \in S_1,\ k \in S_2)$, where 1 means that k is present and 0 that it is absent. The number of all such triads is $2^3 = 8$. The elementary score function $\Phi$ is defined on A (actually, on its associated triad) as follows:

$$\Phi(A) = \begin{cases}
\bar g_{S_1} + \bar g_{S_2}, & (0,0,0), \\
\bar g_{S_1} + g_{S_2}, & (0,0,1), \\
g_{S_1} + \bar g_{S_2}, & (0,1,0), \\
g_{S_1} + g_{S_2}, & (0,1,1), \\
l_{S_1} + l_{S_2}, & (1,0,0), \\
l_{S_1} + \bar l_{S_2}, & (1,0,1), \\
\bar l_{S_1} + l_{S_2}, & (1,1,0), \\
\bar l_{S_1} + \bar l_{S_2}, & (1,1,1).
\end{cases} \qquad \text{(Eq. 5)}$$

Note that it does not take into account the coefficients attached to A, but only those of its children.

Recursive definition of the minimizing functions
For every $\nu \in T \setminus (L \cup \{R\})$ (i.e., an internal node which is not the root) we define

$$\Lambda(\nu) = \min \sum_{A \in U(\nu) \setminus L} \Phi(A),$$

where the minimum is taken over all possible scenarios. This is the value of the minimum reached in the subtree whose root is $\nu$, without taking into account the weighting coefficients of $\nu$ itself. The following two restrictions of $\Lambda$ will be used: $\Lambda^0(\nu) = \Lambda(\nu \mid k \notin \nu)$ and $\Lambda^1(\nu) = \Lambda(\nu \mid k \in \nu)$.

Note: Finally, we define $\Lambda(R) = \min(\Lambda^0(R),\ \Lambda^1(R) + g_R)$. This is used when we calculate the global minimum as $\Lambda(R)$. Adding $g_R$ (and choosing a proper gain weight for the root node) is important from a conceptual point of view. If we do not add such a penalty for the root, the minimization procedure tends to move the origin of the trait towards the root. In other words, the root would become a source of free gains, contrary to common evolutionary knowledge. This is especially true if a gain is considered less likely to happen than a loss.
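The three coefficient sets listed above and the elementary score function $\Phi$ of Eq. 5 can be sketched as follows. This is an illustrative Python sketch, not the author's implementation: the `Coefficients` class and all function names are hypothetical, natural logarithms are assumed, and the inputs G, $\lambda$ and $\gamma$ are whatever the caller supplies.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Coefficients:
    """The four weighting coefficients attached to a child node S."""
    loss: float      # l_S:     k in A and k not in S
    gain: float      # g_S:     k not in A and k in S
    not_loss: float  # l-bar_S: k in A and k in S (inheritance)
    not_gain: float  # g-bar_S: k not in A and k not in S


def parsimony_coeffs(G: float) -> Coefficients:
    """Maximum parsimony: loss 1, gain G, no cost otherwise."""
    return Coefficients(loss=1.0, gain=G, not_loss=0.0, not_gain=0.0)


def surprisal_coeffs(lam: float, gam: float) -> Coefficients:
    """Minimum surprisal: minus log of the loss/gain probabilities and their complements."""
    return Coefficients(-math.log(lam), -math.log(gam),
                        -math.log(1 - lam), -math.log(1 - gam))


def entropy_coeffs(lam: float, gam: float) -> Coefficients:
    """Minimum entropy: each coefficient is a single term -p log p."""
    def term(p: float) -> float:
        return -p * math.log(p)
    return Coefficients(term(lam), term(gam), term(1 - lam), term(1 - gam))


def edge_cost(in_parent: bool, in_child: bool, c: Coefficients) -> float:
    """Coefficient selected for one parent -> child edge."""
    if in_parent:
        return c.not_loss if in_child else c.loss
    return c.gain if in_child else c.not_gain


def phi(triad: tuple[bool, bool, bool], c1: Coefficients, c2: Coefficients) -> float:
    """Elementary score Phi(A) of Eq. 5 for the triad (k in A, k in S1, k in S2)."""
    in_a, in_s1, in_s2 = triad
    return edge_cost(in_a, in_s1, c1) + edge_cost(in_a, in_s2, c2)


mp = parsimony_coeffs(G=2.0)
print(phi((False, True, True), mp, mp))  # two gains under MP: 2G = 4.0
```

Because the criteria differ only in the `Coefficients` they produce, the same `phi` serves MP, Minimum surprisal and Minimum entropy, which is the point of the common framework.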
The following assertions hold:

$$\Lambda^0(A) = \min\begin{cases}
\Lambda^0(S_1) + \Lambda^0(S_2) + \bar g_{S_1} + \bar g_{S_2}, \\
\Lambda^0(S_1) + \Lambda^1(S_2) + \bar g_{S_1} + g_{S_2}, \\
\Lambda^1(S_1) + \Lambda^0(S_2) + g_{S_1} + \bar g_{S_2}, \\
\Lambda^1(S_1) + \Lambda^1(S_2) + g_{S_1} + g_{S_2}.
\end{cases} \qquad \text{(Eq. 6)}$$

Proof: The expressions in the minimum cover the full set of events which can happen under the condition $k \notin A$. In every case, the corresponding score coefficients are added.

$$\Lambda^1(A) = \min\begin{cases}
\Lambda^0(S_1) + \Lambda^0(S_2) + l_{S_1} + l_{S_2}, \\
\Lambda^0(S_1) + \Lambda^1(S_2) + l_{S_1} + \bar l_{S_2}, \\
\Lambda^1(S_1) + \Lambda^0(S_2) + \bar l_{S_1} + l_{S_2}, \\
\Lambda^1(S_1) + \Lambda^1(S_2) + \bar l_{S_1} + \bar l_{S_2}.
\end{cases} \qquad \text{(Eq. 7)}$$

Proof: The expressions in the minimum cover the full set of events which can happen under the condition $k \in A$. In every case, the corresponding score coefficients are added.

Note: In real-world biological examples, the gain and loss coefficients are much bigger than the not-gain and not-loss coefficients. This comes from the fact that mutations are relatively rare events. Thus, the last case in Eq. 6 and the first case in Eq. 7 will practically never qualify for the overall minimum and can be excluded. However, they arise naturally from the logic of the explanation, as they stand for scenarios which are not impossible.

The case of the leaves
The functions $\Phi$ and $\Lambda$ defined above use data from the children of the corresponding node. When the node is a leaf, they must be defined explicitly. So, if $S \in L$, the following settings conform with the general definitions: $\Lambda(S) = \Phi(S) = 0$, where only the restriction consistent with the observed state of k in S is admissible and the other one is set to $+\infty$ (e.g., if $k \in S$, then $\Lambda^1(S) = 0$ and $\Lambda^0(S) = +\infty$).

An overview of the implementation
If we have already calculated $\Lambda^0(S_1)$, $\Lambda^0(S_2)$, $\Lambda^1(S_1)$ and $\Lambda^1(S_2)$:
- Calculate $\Lambda^0(A)$ and $\Lambda^1(A)$ using Eq. 6 and Eq. 7;

- Remember the index of the expression at which each minimum is achieved;
  Note: It may happen that a minimum is achieved at more than one expression. This can lead to finding alternative scenarios, all of which are extremal. For now, we do not investigate this possibility.
- On reaching the root of the tree, calculate $\Lambda(R) = \min(\Lambda^0(R),\ \Lambda^1(R) + g_R)$;
- Remember the expression at which this minimum is achieved. (This determines whether $k \in R$, i.e., which of the two competing alternatives wins);
- Using the stored index of the minimal expression, determine the presence or absence of k in the children of R, and continue recursively.

Comparison with PARS and MALS algorithms
Given the same scoring coefficients, the minimization algorithm above can achieve the same results (a particular scenario) as any other algorithm. Therefore, the results are expected to be identical to those obtained by PARS (Maximum parsimony) and MALS (Maximum likelihood) if the proper scoring coefficients are chosen. Unlike PARS and MALS, no lists of nodes are kept here, which should simplify the implementation and allow it to be applied to much bigger phylogenetic trees.

The considered approach and proposed algorithm can handle a wide class of phylogenetic trees and scoring criteria. The scoring approaches include the most popular ones, Maximum parsimony and Maximum likelihood. In the given context, Maximum parsimony is a variant of the same optimization approach with a simple and uniform set of scoring coefficients. The iteration procedure described reaches consistency between the mapping of the gains and losses and the corresponding probabilities used. As mentioned above, the uniqueness of the extremum for the iterative ML approach is not guaranteed. The conditions under which the problem has a single extremum remain to be elucidated.

An additional scoring criterion, Minimum entropy, is proposed.
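The recursion of Eq. 6 and Eq. 7, together with the leaf settings, the root rule $\Lambda(R) = \min(\Lambda^0(R), \Lambda^1(R) + g_R)$ and the backtracking steps of the implementation overview, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's code and not PARS or MALS: the `Node` class and the functions `edge`, `minimize`, `backtrack` and `solve` are hypothetical names; each non-root node carries the four coefficients of the edge entering it (defaulting, purely for illustration, to MP with G = 2); the leaf restriction that contradicts the observed state is set to infinity; and ties are broken by keeping the first minimal expression, so only one of possibly several optimal scenarios is returned.

```python
import itertools
import math
from dataclasses import dataclass, field
from typing import Optional

INF = math.inf


@dataclass
class Node:
    """Tree node; the four coefficients belong to the edge entering this node."""
    name: str
    loss: float = 1.0      # l
    gain: float = 2.0      # g (defaults: MP with a hypothetical G = 2)
    not_loss: float = 0.0  # l-bar
    not_gain: float = 0.0  # g-bar
    children: list["Node"] = field(default_factory=list)
    present: Optional[bool] = None              # observed state of k (leaves only)
    lam: dict = field(default_factory=dict)     # {0: Lambda^0, 1: Lambda^1}
    choice: dict = field(default_factory=dict)  # {state of node: (state S1, state S2)}


def edge(parent: int, child: int, c: Node) -> float:
    """Coefficient of child c for the given parent/child states (0 absent, 1 present)."""
    if parent:
        return c.not_loss if child else c.loss
    return c.gain if child else c.not_gain


def minimize(v: Node) -> None:
    """Bottom-up pass computing Lambda^0(v) and Lambda^1(v) (Eq. 6 and Eq. 7)."""
    if not v.children:  # leaf: only the observed state is admissible
        v.lam = {0: INF if v.present else 0.0, 1: 0.0 if v.present else INF}
        return
    s1, s2 = v.children
    minimize(s1)
    minimize(s2)
    for a in (0, 1):
        # The four expressions of Eq. 6 (a = 0) or Eq. 7 (a = 1), in the same order.
        options = [
            (s1.lam[x] + s2.lam[y] + edge(a, x, s1) + edge(a, y, s2), (x, y))
            for x, y in itertools.product((0, 1), repeat=2)
        ]
        # min() keeps the first minimal expression, so ties are broken arbitrarily.
        v.lam[a], v.choice[a] = min(options, key=lambda o: o[0])


def backtrack(v: Node, state: int, out: dict) -> None:
    """Top-down pass: read the stored minimal expressions to fix every node's state."""
    out[v.name] = state
    for child, child_state in zip(v.children, v.choice.get(state, ())):
        backtrack(child, child_state, out)


def solve(root: Node, g_root: float) -> tuple[float, dict]:
    """Lambda(R) = min(Lambda^0(R), Lambda^1(R) + g_R) and one optimal scenario."""
    minimize(root)
    absent, present = root.lam[0], root.lam[1] + g_root
    scenario: dict = {}
    backtrack(root, 0 if absent <= present else 1, scenario)
    return min(absent, present), scenario


X = Node("X", children=[Node("a", present=True), Node("b", present=True)])
Y = Node("Y", children=[Node("c", present=False), Node("d", present=True)])
R = Node("R", children=[X, Y])
score, scenario = solve(R, g_root=2.0)
print(score)     # 3.0: gain at the root, loss at c
print(scenario)  # {'R': 1, 'X': 1, 'a': 1, 'b': 1, 'Y': 1, 'c': 0, 'd': 1}
```

In this toy tree the alternative without the trait at the root needs two gains (at X and at d) and scores 4, so the root gain wins. Only two numbers and one stored choice per state are kept at each node, in line with the remark that no lists of nodes are needed.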
Minimum entropy fits the same approach as MP and ML, differing only in the set of scoring coefficients. Whether Minimum entropy, as formulated, is a useful approach is a subject for further research.

Some authors have used the log likelihood as a simplification step in Maximum likelihood computations, but it was not recognized as the negative of the information content of a phylogenetic scenario; see the recent examples (3, 14), among many others. In fact, this can be considered a reinvention of the concept of information, though in a different context, and it shows that useful scientific paradigms inevitably appear when there is a need for them.

Acknowledgements
This work was partially supported by DCSIS, Birkbeck, University of London, London, UK. It is a further extension of previously published results (6, 7), and the author is grateful to Prof. Boris Mirkin and Prof. Trevor Fenner from DCSIS, Birkbeck, for their guidance. The author would also like to thank the anonymous referees of the first version of the paper for their constructive criticism.

REFERENCES
1. Chor B., Hendy M.D., Holland B.R., Penny D. (2000) Mol. Biol. Evol., 17.
2. Cohen O., Pupko T. (2010) Mol. Biol. Evol., 27.
3. Guindon S., Dufayard J.-F., et al. (2010) Syst. Biol., 59.
4. Farris J.S. (1973) Syst. Zool.
5. Fitch W. (1971) Syst. Zool., 20.
6. Mirkin B.G., Fenner T.I., Galperin M.Y., Koonin E.V. (2003) BMC Evol. Biol., 3(2).
7. Mirkin B.G., Camargo R., Fenner T.I., Loizou G., Kellam P. (2006) In: Proceedings of the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (D. Ashlock, Ed.), Piscataway.
8. Pupko T., Pe'er I., Shamir R., Graur D. (2000) Mol. Biol. Evol., 17.
9. Ross S.M. (1996) Stochastic Processes, John Wiley & Sons, New York.
10. Sankoff D. (1975) SIAM J. Appl. Math., 28.
11. Shannon C. (1948) Bell Syst. Tech. J., 27.
12. Tatusov R.L., Natale D.A., et al. (2001) Nucleic Acids Res., 29.
13. Vandev D., Prodanova K., Petkov V. (2003) Application of Mathematics in Engineering and Economics, BULVEST 2000, Sofia.
14. Yang J., Benyamin B., et al. (2010) Nat. Genet.


More information

Tree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny

More information

Page 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence

Page 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence Page Hidden Markov models and multiple sequence alignment Russ B Altman BMI 4 CS 74 Some slides borrowed from Scott C Schmidler (BMI graduate student) References Bioinformatics Classic: Krogh et al (994)

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

Chapter 7: Models of discrete character evolution

Chapter 7: Models of discrete character evolution Chapter 7: Models of discrete character evolution pdf version R markdown to recreate analyses Biological motivation: Limblessness as a discrete trait Squamates, the clade that includes all living species

More information

Molecular Evolution & Phylogenetics

Molecular Evolution & Phylogenetics Molecular Evolution & Phylogenetics Heuristics based on tree alterations, maximum likelihood, Bayesian methods, statistical confidence measures Jean-Baka Domelevo Entfellner Learning Objectives know basic

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200 Spring 2018 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200 Spring 2018 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200 Spring 2018 University of California, Berkeley D.D. Ackerly Feb. 26, 2018 Maximum Likelihood Principles, and Applications to

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from

More information

Multimedia Communications. Mathematical Preliminaries for Lossless Compression

Multimedia Communications. Mathematical Preliminaries for Lossless Compression Multimedia Communications Mathematical Preliminaries for Lossless Compression What we will see in this chapter Definition of information and entropy Modeling a data source Definition of coding and when

More information

Fast computation of maximum likelihood trees by numerical approximation of amino acid replacement probabilities

Fast computation of maximum likelihood trees by numerical approximation of amino acid replacement probabilities Computational Statistics & Data Analysis 40 (2002) 285 291 www.elsevier.com/locate/csda Fast computation of maximum likelihood trees by numerical approximation of amino acid replacement probabilities T.

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor Biological Networks:,, and via Relative Description Length By: Tamir Tuller & Benny Chor Presented by: Noga Grebla Content of the presentation Presenting the goals of the research Reviewing basic terms

More information

Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:

Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17: Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:50 5001 5 Multiple Sequence Alignment The first part of this exposition is based on the following sources, which are recommended reading:

More information

Evolutionary Models. Evolutionary Models

Evolutionary Models. Evolutionary Models Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment

More information

Letter to the Editor. Department of Biology, Arizona State University

Letter to the Editor. Department of Biology, Arizona State University Letter to the Editor Traditional Phylogenetic Reconstruction Methods Reconstruct Shallow and Deep Evolutionary Relationships Equally Well Michael S. Rosenberg and Sudhir Kumar Department of Biology, Arizona

More information

Properties of normal phylogenetic networks

Properties of normal phylogenetic networks Properties of normal phylogenetic networks Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA swillson@iastate.edu August 13, 2009 Abstract. A phylogenetic network is

More information

Proof Techniques (Review of Math 271)

Proof Techniques (Review of Math 271) Chapter 2 Proof Techniques (Review of Math 271) 2.1 Overview This chapter reviews proof techniques that were probably introduced in Math 271 and that may also have been used in a different way in Phil

More information

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression) Using phylogenetics to estimate species divergence times... More accurately... Basics and basic issues for Bayesian inference of divergence times (plus some digression) "A comparison of the structures

More information

Phylogenetics: Parsimony

Phylogenetics: Parsimony 1 Phylogenetics: Parsimony COMP 571 Luay Nakhleh, Rice University he Problem 2 Input: Multiple alignment of a set S of sequences Output: ree leaf-labeled with S Assumptions Characters are mutually independent

More information

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D 7.91 Lecture #5 Database Searching & Molecular Phylogenetics Michael Yaffe B C D B C D (((,B)C)D) Outline Distance Matrix Methods Neighbor-Joining Method and Related Neighbor Methods Maximum Likelihood

More information

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree) I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

More information

Information in Biology

Information in Biology Lecture 3: Information in Biology Tsvi Tlusty, tsvi@unist.ac.kr Living information is carried by molecular channels Living systems I. Self-replicating information processors Environment II. III. Evolve

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200B Spring 2011 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley B.D. Mishler Feb. 1, 2011. Qualitative character evolution (cont.) - comparing

More information

Notes on induction proofs and recursive definitions

Notes on induction proofs and recursive definitions Notes on induction proofs and recursive definitions James Aspnes December 13, 2010 1 Simple induction Most of the proof techniques we ve talked about so far are only really useful for proving a property

More information

Rapid evolution of the cerebellum in humans and other great apes

Rapid evolution of the cerebellum in humans and other great apes Rapid evolution of the cerebellum in humans and other great apes Article Accepted Version Barton, R. A. and Venditti, C. (2014) Rapid evolution of the cerebellum in humans and other great apes. Current

More information

Inference in Graphical Models Variable Elimination and Message Passing Algorithm

Inference in Graphical Models Variable Elimination and Message Passing Algorithm Inference in Graphical Models Variable Elimination and Message Passing lgorithm Le Song Machine Learning II: dvanced Topics SE 8803ML, Spring 2012 onditional Independence ssumptions Local Markov ssumption

More information

RECOVERING NORMAL NETWORKS FROM SHORTEST INTER-TAXA DISTANCE INFORMATION

RECOVERING NORMAL NETWORKS FROM SHORTEST INTER-TAXA DISTANCE INFORMATION RECOVERING NORMAL NETWORKS FROM SHORTEST INTER-TAXA DISTANCE INFORMATION MAGNUS BORDEWICH, KATHARINA T. HUBER, VINCENT MOULTON, AND CHARLES SEMPLE Abstract. Phylogenetic networks are a type of leaf-labelled,

More information

Final Exam, Machine Learning, Spring 2009

Final Exam, Machine Learning, Spring 2009 Name: Andrew ID: Final Exam, 10701 Machine Learning, Spring 2009 - The exam is open-book, open-notes, no electronics other than calculators. - The maximum possible score on this exam is 100. You have 3

More information

Probability and Statistics

Probability and Statistics Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 4: IT IS ALL ABOUT DATA 4a - 1 CHAPTER 4: IT

More information

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science Phylogeny and Evolution Gina Cannarozzi ETH Zurich Institute of Computational Science History Aristotle (384-322 BC) classified animals. He found that dolphins do not belong to the fish but to the mammals.

More information

Introduction to Information Theory

Introduction to Information Theory Introduction to Information Theory Impressive slide presentations Radu Trîmbiţaş UBB October 2012 Radu Trîmbiţaş (UBB) Introduction to Information Theory October 2012 1 / 19 Transmission of information

More information

Information in Biology

Information in Biology Information in Biology CRI - Centre de Recherches Interdisciplinaires, Paris May 2012 Information processing is an essential part of Life. Thinking about it in quantitative terms may is useful. 1 Living

More information

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization. Yanbin Yin Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

UNIT I INFORMATION THEORY. I k log 2

UNIT I INFORMATION THEORY. I k log 2 UNIT I INFORMATION THEORY Claude Shannon 1916-2001 Creator of Information Theory, lays the foundation for implementing logic in digital circuits as part of his Masters Thesis! (1939) and published a paper

More information

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline Page Evolutionary Trees Russ. ltman MI S 7 Outline. Why build evolutionary trees?. istance-based vs. character-based methods. istance-based: Ultrametric Trees dditive Trees. haracter-based: Perfect phylogeny

More information

Clustering Proteins and Reconstructing Evolutionary Events

Clustering Proteins and Reconstructing Evolutionary Events Clustering Proteins and Reconstructing Evolutionary Events Boris Mirkin 1,2 1 School of Computer Science, Birkbeck University of London, Malet Street, London, WC1 7HX, UK, mirkin@dcs.bbk.ac.uk 2 Department

More information

#A32 INTEGERS 10 (2010), TWO NEW VAN DER WAERDEN NUMBERS: w(2; 3, 17) AND w(2; 3, 18)

#A32 INTEGERS 10 (2010), TWO NEW VAN DER WAERDEN NUMBERS: w(2; 3, 17) AND w(2; 3, 18) #A32 INTEGERS 10 (2010), 369-377 TWO NEW VAN DER WAERDEN NUMBERS: w(2; 3, 17) AND w(2; 3, 18) Tanbir Ahmed ConCoCO Research Laboratory, Department of Computer Science and Software Engineering, Concordia

More information

should be presented and explained in the combined species tree (Fitch, 1970; Goodman et al., 1979). The gene divergence can be the results of either s

should be presented and explained in the combined species tree (Fitch, 1970; Goodman et al., 1979). The gene divergence can be the results of either s On a Mirkin-Muchnik-Smith Conjecture for Comparing Molecular Phylogenies Louxin Zhang lxzhang@iss.nus.sg BioInformatics Center Institute of Systems Science Heng Mui Keng Terrace Singapore 119597 Abstract

More information

Phylogenetic inference

Phylogenetic inference Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

More information

AVL Trees. Manolis Koubarakis. Data Structures and Programming Techniques

AVL Trees. Manolis Koubarakis. Data Structures and Programming Techniques AVL Trees Manolis Koubarakis 1 AVL Trees We will now introduce AVL trees that have the property that they are kept almost balanced but not completely balanced. In this way we have O(log n) search time

More information

Streaming Algorithms for Optimal Generation of Random Bits

Streaming Algorithms for Optimal Generation of Random Bits Streaming Algorithms for Optimal Generation of Random Bits ongchao Zhou Electrical Engineering Department California Institute of echnology Pasadena, CA 925 Email: hzhou@caltech.edu Jehoshua Bruck Electrical

More information

Phylogeny: building the tree of life

Phylogeny: building the tree of life Phylogeny: building the tree of life Dr. Fayyaz ul Amir Afsar Minhas Department of Computer and Information Sciences Pakistan Institute of Engineering & Applied Sciences PO Nilore, Islamabad, Pakistan

More information

Phylogenetic trees 07/10/13

Phylogenetic trees 07/10/13 Phylogenetic trees 07/10/13 A tree is the only figure to occur in On the Origin of Species by Charles Darwin. It is a graphical representation of the evolutionary relationships among entities that share

More information

CS6375: Machine Learning Gautam Kunapuli. Decision Trees

CS6375: Machine Learning Gautam Kunapuli. Decision Trees Gautam Kunapuli Example: Restaurant Recommendation Example: Develop a model to recommend restaurants to users depending on their past dining experiences. Here, the features are cost (x ) and the user s

More information

Computational Biology

Computational Biology Computational Biology Lecture 6 31 October 2004 1 Overview Scoring matrices (Thanks to Shannon McWeeney) BLAST algorithm Start sequence alignment 2 1 What is a homologous sequence? A homologous sequence,

More information

SIGNAL COMPRESSION Lecture 7. Variable to Fix Encoding

SIGNAL COMPRESSION Lecture 7. Variable to Fix Encoding SIGNAL COMPRESSION Lecture 7 Variable to Fix Encoding 1. Tunstall codes 2. Petry codes 3. Generalized Tunstall codes for Markov sources (a presentation of the paper by I. Tabus, G. Korodi, J. Rissanen.

More information

Statistics 992 Continuous-time Markov Chains Spring 2004

Statistics 992 Continuous-time Markov Chains Spring 2004 Summary Continuous-time finite-state-space Markov chains are stochastic processes that are widely used to model the process of nucleotide substitution. This chapter aims to present much of the mathematics

More information

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei"

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS Masatoshi Nei" Abstract: Phylogenetic trees: Recent advances in statistical methods for phylogenetic reconstruction and genetic diversity analysis were

More information

First generation sequencing and pairwise alignment (High-tech, not high throughput) Analysis of Biological Sequences

First generation sequencing and pairwise alignment (High-tech, not high throughput) Analysis of Biological Sequences First generation sequencing and pairwise alignment (High-tech, not high throughput) Analysis of Biological Sequences 140.638 where do sequences come from? DNA is not hard to extract (getting DNA from a

More information

Machine Learning

Machine Learning Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 13, 2011 Today: The Big Picture Overfitting Review: probability Readings: Decision trees, overfiting

More information

Molecular Evolution and Phylogenetic Tree Reconstruction

Molecular Evolution and Phylogenetic Tree Reconstruction 1 4 Molecular Evolution and Phylogenetic Tree Reconstruction 3 2 5 1 4 2 3 5 Orthology, Paralogy, Inparalogs, Outparalogs Phylogenetic Trees Nodes: species Edges: time of independent evolution Edge length

More information

BINF6201/8201. Molecular phylogenetic methods

BINF6201/8201. Molecular phylogenetic methods BINF60/80 Molecular phylogenetic methods 0-7-06 Phylogenetics Ø According to the evolutionary theory, all life forms on this planet are related to one another by descent. Ø Traditionally, phylogenetics

More information

Population Genetics: a tutorial

Population Genetics: a tutorial : a tutorial Institute for Science and Technology Austria ThRaSh 2014 provides the basic mathematical foundation of evolutionary theory allows a better understanding of experiments allows the development

More information

Computational statistics

Computational statistics Computational statistics Combinatorial optimization Thierry Denœux February 2017 Thierry Denœux Computational statistics February 2017 1 / 37 Combinatorial optimization Assume we seek the maximum of f

More information

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME MATHEMATICAL MODELING AND THE HUMAN GENOME Hilary S. Booth Australian National University, Australia Keywords: Human genome, DNA, bioinformatics, sequence analysis, evolution. Contents 1. Introduction:

More information

arxiv: v5 [q-bio.pe] 24 Oct 2016

arxiv: v5 [q-bio.pe] 24 Oct 2016 On the Quirks of Maximum Parsimony and Likelihood on Phylogenetic Networks Christopher Bryant a, Mareike Fischer b, Simone Linz c, Charles Semple d arxiv:1505.06898v5 [q-bio.pe] 24 Oct 2016 a Statistics

More information

In-Depth Assessment of Local Sequence Alignment

In-Depth Assessment of Local Sequence Alignment 2012 International Conference on Environment Science and Engieering IPCBEE vol.3 2(2012) (2012)IACSIT Press, Singapoore In-Depth Assessment of Local Sequence Alignment Atoosa Ghahremani and Mahmood A.

More information

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS.

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS. !! www.clutchprep.com CONCEPT: OVERVIEW OF EVOLUTION Evolution is a process through which variation in individuals makes it more likely for them to survive and reproduce There are principles to the theory

More information

Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood

Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood For: Prof. Partensky Group: Jimin zhu Rama Sharma Sravanthi Polsani Xin Gong Shlomit klopman April. 7. 2003 Table of Contents Introduction...3

More information

Analytic Solutions for Three Taxon ML MC Trees with Variable Rates Across Sites

Analytic Solutions for Three Taxon ML MC Trees with Variable Rates Across Sites Analytic Solutions for Three Taxon ML MC Trees with Variable Rates Across Sites Benny Chor Michael Hendy David Penny Abstract We consider the problem of finding the maximum likelihood rooted tree under

More information

Lecture 9 Evolutionary Computation: Genetic algorithms

Lecture 9 Evolutionary Computation: Genetic algorithms Lecture 9 Evolutionary Computation: Genetic algorithms Introduction, or can evolution be intelligent? Simulation of natural evolution Genetic algorithms Case study: maintenance scheduling with genetic

More information

Algorithms in Computational Biology (236522) spring 2008 Lecture #1

Algorithms in Computational Biology (236522) spring 2008 Lecture #1 Algorithms in Computational Biology (236522) spring 2008 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours: 15:30-16:30/by appointment TA: Ilan Gronau, Taub 700, tel 4894 Office hours:??

More information

BLAST: Target frequencies and information content Dannie Durand

BLAST: Target frequencies and information content Dannie Durand Computational Genomics and Molecular Biology, Fall 2016 1 BLAST: Target frequencies and information content Dannie Durand BLAST has two components: a fast heuristic for searching for similar sequences

More information

A (short) introduction to phylogenetics

A (short) introduction to phylogenetics A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

More information

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM).

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). 1 Bioinformatics: In-depth PROBABILITY & STATISTICS Spring Semester 2011 University of Zürich and ETH Zürich Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). Dr. Stefanie Muff

More information

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

CHAPTERS 24-25: Evidence for Evolution and Phylogeny CHAPTERS 24-25: Evidence for Evolution and Phylogeny 1. For each of the following, indicate how it is used as evidence of evolution by natural selection or shown as an evolutionary trend: a. Paleontology

More information

Reconstruction of certain phylogenetic networks from their tree-average distances

Reconstruction of certain phylogenetic networks from their tree-average distances Reconstruction of certain phylogenetic networks from their tree-average distances Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA swillson@iastate.edu October 10,

More information