# Probabilistic modeling and molecular phylogeny

Size: px
Start display at page:

Transcription

1 Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical University of Denmark (DTU)

2 What is a model? Mathematical models are: Incomprehensible Useless No fun at all

3 What is a model? Mathematical models are: Incomprehensible Useless No fun at all

4 What is a model? Model = hypothesis!!! Hypothesis (as used in most biological research): Precisely stated, but qualitative Allows you to make qualitative predictions Arithmetic model: Mathematically explicit (parameters) Allows you to make quantitative predictions

5 The Scientific Method Observation of data Model of how system works Prediction(s) about system behavior (simulation)

6 Modeling: An example

7 Modeling: An example y = ax + b Simple 2-parameter model

8 Modeling: An example y = ax + b Predictions based on model

9 Model Fit, parameter estimation Measure of how well the model fits the data: sum of squared errors (SSE) Best parameter estimates: those that give the smallest SSE (least squares model fitting) y = ax + b

10 Model Fit, parameter estimation Measure of fit between model and data: sum of squared errors (SSE) Best parameter estimates: those that give the smallest SSE (least squares) y = 1.24x

11 Probabilistic modeling applied to phylogeny Observed data: multiple alignment of sequences H.sapiens globin A G G G A T T C A M.musculus globin!! A C G G T T T - A R.rattus globin A C G G A T T - A Probabilistic model parameters (simplest case): Tree topology and branch lengths Nucleotide-nucleotide substitution rates (or substitution probabilities) Nucleotide frequenciesπ: π A, π C, π G, π T

12 The maximum likelihood approach I Starting point:! You have some observed data and a probabilistic model for how the observed data was produced Example: Data: result of tossing coin 10 times - 7 heads, 3 tails Model: coin has probability p for heads, 1-p for tails.! The probability of observing h heads among n tosses is: Goal:! You want to find the best estimate of the (unknown) parameter value based on the observations. (here the only parameter is p )

13 The maximum likelihood approach II Likelihood = Probability (Data Model) Maximum likelihood: Best estimate is the set of parameter values which gives the highest possible likelihood.

14 Maximum likelihood: coin tossing example Data: result of tossing coin 10 times - 7 heads, 3 tails Model: coin has probability p for heads, 1-p for tails.

15 Probabilistic modeling applied to phylogeny Observed data: multiple alignment of sequences H.sapiens globin A G G G A T T C A M.musculus globin!! A C G G T T T - A R.rattus globin A C G G A T T - A Probabilistic model parameters (simplest case): Tree topology and branch lengths Nucleotide-nucleotide substitution rates (or substitution probabilities) Nucleotide frequenciesπ: π A, π C, π G, π T

16 Other models of nucleotide substitution

17 General Time Reversible Model Time-reversibility: The amount of change from state x to y is equal to the amount of change from y to x π A x P AG = π G x P GA => π A x π G x a = π G x π A x a

18 More elaborate models of evolution Codon-codon substitution rates! (64 x 64 matrix of codon substitution rates) Different mutation rates at different sites in the gene! (the gamma-distribution of mutation rates) Molecular clocks! (same mutation rate on all branches of the tree). Different substitution matrices on different branches of the tree! (e.g., strong selection on branch leading to specific group of animals) Etc., etc.

19 Different rates at different sites: the gamma distribution

20 Computing the probability of one column in an alignment given tree topology and other parameters C C t 1 t 2 A A C G G A T T C A A C G G T T T - A A A G G A T T - A A G G G T T T - A t 3 A t 5 t 4 A G Columns in alignment contain homologous nucleotides Assume tree topology, branch lengths, and other parameters are given. Assume ancestral states were A and A. Start computation at any internal or external node. Pr = π C P CA (t 1 ) P AC (t 2 ) P AA (t 3 ) P AG (t 4 ) P AA (t 5 )

21 Computing the probability of an entire alignment given tree topology and other parameters Probability must be summed over all possible combinations of ancestral nucleotides. (Here we have two internal nodes giving 16 possible combinations) Probability of individual columns are multiplied to give the overall probability of the alignment, i.e., the likelihood of the model. Often the log of the probability is used (log likelihood)

22 Tree 1 Tree 2 Node 1 Node 2 Likelihood Node 1 Node 2 Likelihood A A A A A C A C A G A G A T A T C A C A C C C C C G C G C T C T G A G A G C G C G G G G G T G T T A T A T C T C T G T G T T T T Sum Sum

23 Tree 1 Tree 2 Node 1 Node 2 Likelihood Node 1 Node 2 Likelihood A A A A A C A C A G A G A T A T C A C A C C C C C G C G C T C T G A G A G C G C G G G G G T G T T A T A T C T C T G T G T T T T Sum Sum

24 Maximum likelihood phylogeny: heuristic tree search Data: sequence alignment Model parameters: nucleotide frequencies, nucleotide substitution rates, tree topology, branch lengths. Choose random initial values for all parameters, compute likelihood Change parameter values slightly in a direction so likelihood improves Repeat until maximum found Results: (1) ML estimate of tree topology (2) ML estimate of branch lengths (3) ML estimate of other model parameters (4) Measure of how well model fits data (likelihood).

25 Model Selection? Measure of fit between model and data (e.g., SSE, likelihood, etc.) How do we compare different types of models? y = 1.24x

26 Over-fitting: More parameters always result in a better fit to the data, but not necessarily in a better description y = ax + b 2 parameter model Good description, poor fit y = ax 6 +bx 5 +cx 4 +dx 3 +ex 2 +fx+g 7 parameter model Poor description, good fit

27 Selecting the best model: the likelihood ratio test The fit of two alternative models can be compared using the ratio of their likelihoods:!! LR =! P(Data M1) = L,M1!!! P(Data M2) L,M2 Note that LR > 1 if model 1 has the highest likelihood For nested models it can be shown that!! Δ = 2*ln(LR) = 2* (lnl,m1 - lnl,m2)! follows a Χ 2 distribution with degrees of freedom equal to the number of extra parameters in the most complicated model.! This makes it possible to perform stringent statistical tests to determine which model (hypothesis) best describes the data

28 Asking biological questions in a likelihood ratio testing framework Fit two alternative, nested models to the data. Record optimized likelihood and number of free parameters for each fitted model. Test if alternative (parameter-rich) model is significantly better than null-model, given number of additional parameters (n extra ): Compute Δ = 2 x (lnl Alternative - lnl Null ) Compare Δ to Χ 2 distribution with n extra degrees of freedom Depending on models compared, different biological questions can be addressed (presence of molecular clock, presence of positive selection, difference in mutation rates among sites or branches, etc.)

29 Likelihood ratio test example: Which model fits best: JC or K2P? A C G T A - α α α C α - α α G α α - α T α α α - A C G T A - β α β C β - β α G α β - β T β α β - Jukes and Cantor model (JC): All nucleotides have same frequency All substitutions have same rate Kimura 2 parameter model (K2P): All nucleotides have same frequency Transitions and transversions have different rate => K2P has one extra parameter compared to JC

30 Likelihood ratio test example: Which model fits best: JC or K2P? Starting point: set of DNA sequences, fit JC and K2P models to data, record likelihoods JC:! lnl = K2P:! lnl = K2P has better fit than JC: lnl K2P > lnl JC Test whether K2P is significantly better Δ= 2 x (lnl Alternative - lnl Null ) = 2 x ( ) = 6.2 Degrees of freedom = 1 (K2P has one extra parameter compared to JC)

31 Likelihood ratio test example: Which model fits best: JC or K2P? Δ = 2 x (lnl Alternative - lnl Null ) = 6.2 Degrees of freedom = 1 Critical value (5% level) = Statistic = 6.2 => 1% < p < 5% => Difference is significant => K2P is significantly better description than JC

32 Positive selection I: synonymous and non-synonymous mutations 20 amino acids, 61 codons Most amino acids encoded by more than one codon Not all mutations lead to a change of the encoded amino acid Synonymous mutations are rarely selected against 1/3 synonymous 2/3 nonsynymous nucleotide site ATA (Ile) GTA (Val) TTA (Leu) 1 non-synonymous nucleotide site CGA (Arg) CCA (Pro) CTA (Leu) CAA (Gln) CTC (Leu) CTG (Leu) CTT (Leu) 1 synonymous nucleotide site

33 Positive selection II: non-synonymous and synonymous mutation rates contain information about selective pressure dn: rate of non-synonymous mutations per non-synonymous site ds: rate of synonymous mutations per synonymous site Recall: Evolution is a two-step process:! (1) Mutation (random)! (2) Selection (non-random) Randomly occurring mutations will lead to dn/ds=1. Significant deviations from this most likely caused by subsequent selection. dn/ds < 1: Higher rate of synonymous mutations: negative (purifying) selection dn/ds > 1: Higher rate of non-synonymous mutations: positive selection

34 Exercise: positive selection in HIV? Fit two alternative models to HIV data: M1: two classes of sites with different dn/ds ratios: Class 1 has dn/ds<1 Class 2 has dn/ds=1 M2: three distinct classes with different dn/ds ratios: Class 1 has dn/ds<1 Class 2 has dn/ds=1 Class 3 has dn/ds>1 Use likelihood ratio test to examine if M2 is significantly better than M1, If M2 significantly better than M1 AND if some codons belong to the class with dn/ds>1 (the positively selected class) then you have statistical evidence for positive selection. Most likely reason: immune escape (i.e., sites must be in epitopes) : Codons showing dn/ds > 1: likely epitopes

### Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018

Maximum Likelihood Tree Estimation Carrie Tribble IB 200 9 Feb 2018 Outline 1. Tree building process under maximum likelihood 2. Key differences between maximum likelihood and parsimony 3. Some fancy extras

### Natural selection on the molecular level

Natural selection on the molecular level Fundamentals of molecular evolution How DNA and protein sequences evolve? Genetic variability in evolution } Mutations } forming novel alleles } Inversions } change

### "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally

### Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington.

Maximum Likelihood This presentation is based almost entirely on Peter G. Fosters - "The Idiot s Guide to the Zen of Likelihood in a Nutshell in Seven Days for Dummies, Unleashed. http://www.bioinf.org/molsys/data/idiots.pdf

### Sequence Divergence & The Molecular Clock. Sequence Divergence

Sequence Divergence & The Molecular Clock Sequence Divergence v simple genetic distance, d = the proportion of sites that differ between two aligned, homologous sequences v given a constant mutation/substitution

### Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

### 7. Tests for selection

Sequence analysis and genomics 7. Tests for selection Dr. Katja Nowick Group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute for Brain Research www. nowicklab.info

### POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

### Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

### Dr. Amira A. AL-Hosary

Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

### Molecular Evolution, course # Final Exam, May 3, 2006

Molecular Evolution, course #27615 Final Exam, May 3, 2006 This exam includes a total of 12 problems on 7 pages (including this cover page). The maximum number of points obtainable is 150, and at least

### Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

### Evolutionary Analysis of Viral Genomes

University of Oxford, Department of Zoology Evolutionary Biology Group Department of Zoology University of Oxford South Parks Road Oxford OX1 3PS, U.K. Fax: +44 1865 271249 Evolutionary Analysis of Viral

### Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution

Massachusetts Institute of Technology 6.877 Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution 1. Rates of amino acid replacement The initial motivation for the neutral

### Understanding relationship between homologous sequences

Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective

### Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Distance Methods COMP 571 - Spring 2015 Luay Nakhleh, Rice University Outline Evolutionary models and distance corrections Distance-based methods Evolutionary Models and Distance Correction

### Introduction to Molecular Phylogeny

Introduction to Molecular Phylogeny Starting point: a set of homologous, aligned DNA or protein sequences Result of the process: a tree describing evolutionary relationships between studied sequences =

### Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 4 (Models of DNA and

### Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

Estimating Phylogenies (Evolutionary Trees) II Biol4230 Thurs, March 2, 2017 Bill Pearson wrp@virginia.edu 4-2818 Jordan 6-057 Tree estimation strategies: Parsimony?no model, simply count minimum number

### Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from

### Phylogenetics. BIOL 7711 Computational Bioscience

Consortium for Comparative Genomics! University of Colorado School of Medicine Phylogenetics BIOL 7711 Computational Bioscience Biochemistry and Molecular Genetics Computational Bioscience Program Consortium

### SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA

SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS 1 Prokaryotes and Eukaryotes 2 DNA and RNA 3 4 Double helix structure Codons Codons are triplets of bases from the RNA sequence. Each triplet defines an amino-acid.

### C.DARWIN ( )

C.DARWIN (1809-1882) LAMARCK Each evolutionary lineage has evolved, transforming itself, from a ancestor appeared by spontaneous generation DARWIN All organisms are historically interconnected. Their relationships

### Using algebraic geometry for phylogenetic reconstruction

Using algebraic geometry for phylogenetic reconstruction Marta Casanellas i Rius (joint work with Jesús Fernández-Sánchez) Departament de Matemàtica Aplicada I Universitat Politècnica de Catalunya IMA

### Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood

Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood For: Prof. Partensky Group: Jimin zhu Rama Sharma Sravanthi Polsani Xin Gong Shlomit klopman April. 7. 2003 Table of Contents Introduction...3

### Phylogenetic Tree Reconstruction

I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

### Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Phylogenetics Todd Vision iology 522 March 26, 2007 pplications of phylogenetics Studying organismal or biogeographic history Systematics ating events in the fossil record onservation biology Studying

### Reconstruire le passé biologique modèles, méthodes, performances, limites

Reconstruire le passé biologique modèles, méthodes, performances, limites Olivier Gascuel Centre de Bioinformatique, Biostatistique et Biologie Intégrative C3BI USR 3756 Institut Pasteur & CNRS Reconstruire

### Phylogenetics: Parsimony and Likelihood. COMP Spring 2016 Luay Nakhleh, Rice University

Phylogenetics: Parsimony and Likelihood COMP 571 - Spring 2016 Luay Nakhleh, Rice University The Problem Input: Multiple alignment of a set S of sequences Output: Tree T leaf-labeled with S Assumptions

### Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Additive distances Let T be a tree on leaf set S and let w : E R + be an edge-weighting of T, and assume T has no nodes of degree two. Let D ij = e P ij w(e), where P ij is the path in T from i to j. Then

### Phylogenetic inference

Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

### Lecture 4. Models of DNA and protein change. Likelihood methods

Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/36

### Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment

Sequence Analysis 17: lecture 5 Substitution matrices Multiple sequence alignment Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of

### How should we go about modeling this? Model parameters? Time Substitution rate Can we observe time or subst. rate? What can we observe?

How should we go about modeling this? gorilla GAAGTCCTTGAGAAATAAACTGCACACACTGG orangutan GGACTCCTTGAGAAATAAACTGCACACACTGG Model parameters? Time Substitution rate Can we observe time or subst. rate? What

### Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Phylogenetic Trees What They Are Why We Do It & How To Do It Presented by Amy Harris Dr Brad Morantz Overview What is a phylogenetic tree Why do we do it How do we do it Methods and programs Parallels

### Quantifying sequence similarity

Quantifying sequence similarity Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, February 16 th 2016 After this lecture, you can define homology, similarity, and identity

### "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200 Spring 2018 University of California, Berkeley

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200 Spring 2018 University of California, Berkeley D.D. Ackerly Feb. 26, 2018 Maximum Likelihood Principles, and Applications to

### Maximum Likelihood in Phylogenetics

Maximum Likelihood in Phylogenetics June 1, 2009 Smithsonian Workshop on Molecular Evolution Paul O. Lewis Department of Ecology & Evolutionary Biology University of Connecticut, Storrs, CT Copyright 2009

### Phylogenetics: Likelihood

1 Phylogenetics: Likelihood COMP 571 Luay Nakhleh, Rice University The Problem 2 Input: Multiple alignment of a set S of sequences Output: Tree T leaf-labeled with S Assumptions 3 Characters are mutually

### EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

### The Causes and Consequences of Variation in. Evolutionary Processes Acting on DNA Sequences

The Causes and Consequences of Variation in Evolutionary Processes Acting on DNA Sequences This dissertation is submitted for the degree of Doctor of Philosophy at the University of Cambridge Lee Nathan

### T R K V CCU CG A AAA GUC T R K V CCU CGG AAA GUC. T Q K V CCU C AG AAA GUC (Amino-acid

Lecture 11 Increasing Model Complexity I. Introduction. At this point, we ve increased the complexity of models of substitution considerably, but we re still left with the assumption that rates are uniform

### Lecture 4. Models of DNA and protein change. Likelihood methods

Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/39

### Inferring Molecular Phylogeny

Dr. Walter Salzburger he tree of life, ustav Klimt (1907) Inferring Molecular Phylogeny Inferring Molecular Phylogeny 55 Maximum Parsimony (MP): objections long branches I!! B D long branch attraction

### GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS.

!! www.clutchprep.com CONCEPT: OVERVIEW OF EVOLUTION Evolution is a process through which variation in individuals makes it more likely for them to survive and reproduce There are principles to the theory

### Molecular Evolution and Comparative Genomics

Molecular Evolution and Comparative Genomics --- the phylogenetic HMM model 10-810, CMB lecture 5---Eric Xing Some important dates in history (billions of years ago) Origin of the universe 15 ±4 Formation

### "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley B.D. Mishler Feb. 1, 2011. Qualitative character evolution (cont.) - comparing

Tree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny

### O 3 O 4 O 5. q 3. q 4. Transition

Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

### Practical Bioinformatics

5/2/2017 Dictionaries d i c t i o n a r y = { A : T, T : A, G : C, C : G } d i c t i o n a r y [ G ] d i c t i o n a r y [ N ] = N d i c t i o n a r y. h a s k e y ( C ) Dictionaries g e n e t i c C o

### Accuracy and Power of the Likelihood Ratio Test in Detecting Adaptive Molecular Evolution

Accuracy and Power of the Likelihood Ratio Test in Detecting Adaptive Molecular Evolution Maria Anisimova, Joseph P. Bielawski, and Ziheng Yang Department of Biology, Galton Laboratory, University College

### 9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

### Lab 9: Maximum Likelihood and Modeltest

Integrative Biology 200A University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS" Spring 2010 Updated by Nick Matzke Lab 9: Maximum Likelihood and Modeltest In this lab we re going to use PAUP*

### Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 24. Phylogeny methods, part 4 (Models of DNA and

### PHYLOGENY ESTIMATION AND HYPOTHESIS TESTING USING MAXIMUM LIKELIHOOD

Annu. Rev. Ecol. Syst. 1997. 28:437 66 Copyright c 1997 by Annual Reviews Inc. All rights reserved PHYLOGENY ESTIMATION AND HYPOTHESIS TESTING USING MAXIMUM LIKELIHOOD John P. Huelsenbeck Department of

### Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2016 University of California, Berkeley. Parsimony & Likelihood [draft]

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2016 University of California, Berkeley K.W. Will Parsimony & Likelihood [draft] 1. Hennig and Parsimony: Hennig was not concerned with parsimony

### Substitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A

GAGATC 3:G A 6:C T Common Ancestor ACGATC 1:A G 2:C A Substitution = Mutation followed 5:T C by Fixation GAAATT 4:A C 1:G A AAAATT GAAATT GAGCTC ACGACC Chimp Human Gorilla Gibbon AAAATT GAAATT GAGCTC ACGACC

### Phylogenetics: Building Phylogenetic Trees

1 Phylogenetics: Building Phylogenetic Trees COMP 571 Luay Nakhleh, Rice University 2 Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary model should

### CHAPTERS 24-25: Evidence for Evolution and Phylogeny

CHAPTERS 24-25: Evidence for Evolution and Phylogeny 1. For each of the following, indicate how it is used as evidence of evolution by natural selection or shown as an evolutionary trend: a. Paleontology

### Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Phylogenetics: Building Phylogenetic Trees COMP 571 - Fall 2010 Luay Nakhleh, Rice University Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary

### C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

### Taming the Beast Workshop

Workshop David Rasmussen & arsten Magnus June 27, 2016 1 / 31 Outline of sequence evolution: rate matrices Markov chain model Variable rates amongst different sites: +Γ Implementation in BES2 2 / 31 genotype

### Pairwise & Multiple sequence alignments

Pairwise & Multiple sequence alignments Urmila Kulkarni-Kale Bioinformatics Centre 411 007 urmila@bioinfo.ernet.in Basis for Sequence comparison Theory of evolution: gene sequences have evolved/derived

### Variance and Covariances of the Numbers of Synonymous and Nonsynonymous Substitutions per Site

Variance and Covariances of the Numbers of Synonymous and Nonsynonymous Substitutions per Site Tatsuya Ota and Masatoshi Nei Institute of Molecular Evolutionary Genetics and Department of Biology, The

### Maximum Likelihood in Phylogenetics

Maximum Likelihood in Phylogenetics 26 January 2011 Workshop on Molecular Evolution Český Krumlov, Česká republika Paul O. Lewis Department of Ecology & Evolutionary Biology University of Connecticut,

### Multiple Alignment. Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis

Multiple Alignment Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis gorm@cbs.dtu.dk Refresher: pairwise alignments 43.2% identity; Global alignment score: 374 10 20

### Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of

### Evolutionary Theory and Principles of Phylogenetics. Lucy Skrabanek ICB, WMC February 25, 2010

Evolutionary Theory and Principles of Phylogenetics Lucy Skrabanek ICB, WMC February 25, 2010 Theory of evolution Evolution: process of change over time 2 main competing models Phyletic gradualism Punctuated

### Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Ziheng Yang Department of Biology, University College, London An excess of nonsynonymous substitutions

### Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics:

Homework Assignment, Evolutionary Systems Biology, Spring 2009. Homework Part I: Phylogenetics: Introduction. The objective of this assignment is to understand the basics of phylogenetic relationships

### POPULATION GENETICS Biology 107/207L Winter 2005 Lab 5. Testing for positive Darwinian selection

POPULATION GENETICS Biology 107/207L Winter 2005 Lab 5. Testing for positive Darwinian selection A growing number of statistical approaches have been developed to detect natural selection at the DNA sequence

### Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

### Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Bioinformatics 1 Biology, Sequences, Phylogenetics Part 4 Sepp Hochreiter Klausur Mo. 30.01.2011 Zeit: 15:30 17:00 Raum: HS14 Anmeldung Kusss Contents Methods and Bootstrapping of Maximum Methods Methods

### Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM).

1 Bioinformatics: In-depth PROBABILITY & STATISTICS Spring Semester 2011 University of Zürich and ETH Zürich Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). Dr. Stefanie Muff

### Hidden Markov Models for biological sequence analysis

Hidden Markov Models for biological sequence analysis Master in Bioinformatics UPF 2017-2018 http://comprna.upf.edu/courses/master_agb/ Eduardo Eyras Computational Genomics Pompeu Fabra University - ICREA

Feinberg Graduate School of the Weizmann Institute of Science Advanced topics in bioinformatics Shmuel Pietrokovski & Eitan Rubin Spring 2003 Course WWW site: http://bioinformatics.weizmann.ac.il/courses/atib

### Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species

Schedule Bioinformatics and Computational Biology: History and Biological Background (JH) 0.0 he Parsimony criterion GKN.0 Stochastic Models of Sequence Evolution GKN 7.0 he Likelihood criterion GKN 0.0

### Evolutionary Change in Nucleotide Sequences. Lecture 3

Evolutionary Change in Nucleotide Sequences Lecture 3 1 So far, we described the evolutionary process as a series of gene substitutions in which new alleles, each arising as a mutation ti in a single individual,

### Theory of Evolution Charles Darwin

Theory of Evolution Charles arwin 858-59: Origin of Species 5 year voyage of H.M.S. eagle (83-36) Populations have variations. Natural Selection & Survival of the fittest: nature selects best adapted varieties

### Evolutionary Models. Evolutionary Models

Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment

### Reading for Lecture 13 Release v10

Reading for Lecture 13 Release v10 Christopher Lee November 15, 2011 Contents 1 Evolutionary Trees i 1.1 Evolution as a Markov Process...................................... ii 1.2 Rooted vs. Unrooted Trees........................................

### Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline

### APPENDIX : PARTIAL FRACTIONS

APPENDIX : PARTIAL FRACTIONS Appendix : Partial Fractions Given the expression x 2 and asked to find its integral, x + you can use work from Section. to give x 2 =ln( x 2) ln( x + )+c x + = ln k x 2 x+

### A (short) introduction to phylogenetics

A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

### Bayesian Inference. Anders Gorm Pedersen. Molecular Evolution Group Center for Biological Sequence Analysis Technical University of Denmark (DTU)

Bayesian Inference Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical University of Denmark (DTU) Background: Conditional probability A P (B A) = A,B P (A,

### RELATING PHYSICOCHEMMICAL PROPERTIES OF AMINO ACIDS TO VARIABLE NUCLEOTIDE SUBSTITUTION PATTERNS AMONG SITES ZIHENG YANG

RELATING PHYSICOCHEMMICAL PROPERTIES OF AMINO ACIDS TO VARIABLE NUCLEOTIDE SUBSTITUTION PATTERNS AMONG SITES ZIHENG YANG Department of Biology (Galton Laboratory), University College London, 4 Stephenson

### Supplementary Information for

Supplementary Information for Evolutionary conservation of codon optimality reveals hidden signatures of co-translational folding Sebastian Pechmann & Judith Frydman Department of Biology and BioX, Stanford

### Codon-model based inference of selection pressure. (a very brief review prior to the PAML lab)

Codon-model based inference of selection pressure (a very brief review prior to the PAML lab) an index of selection pressure rate ratio mode example dn/ds < 1 purifying (negative) selection histones dn/ds

### How Molecules Evolve. Advantages of Molecular Data for Tree Building. Advantages of Molecular Data for Tree Building

How Molecules Evolve Guest Lecture: Principles and Methods of Systematic Biology 11 November 2013 Chris Simon Approaching phylogenetics from the point of view of the data Understanding how sequences evolve

### Chapter 16: Reconstructing and Using Phylogenies

Chapter Review 1. Use the phylogenetic tree shown at the right to complete the following. a. Explain how many clades are indicated: Three: (1) chimpanzee/human, (2) chimpanzee/ human/gorilla, and (3)chimpanzee/human/

### Why do more divergent sequences produce smaller nonsynonymous/synonymous

Genetics: Early Online, published on June 21, 2013 as 10.1534/genetics.113.152025 Why do more divergent sequences produce smaller nonsynonymous/synonymous rate ratios in pairwise sequence comparisons?

### How to read and make phylogenetic trees Zuzana Starostová

How to read and make phylogenetic trees Zuzana Starostová How to make phylogenetic trees? Workflow: obtain DNA sequence quality check sequence alignment calculating genetic distances phylogeny estimation

### Consensus Methods. * You are only responsible for the first two

Consensus Trees * consensus trees reconcile clades from different trees * consensus is a conservative estimate of phylogeny that emphasizes points of agreement * philosophy: agreement among data sets is

### Probability and Statistics. Terms and concepts

Probability and Statistics Joyeeta Dutta Moscato June 30, 2014 Terms and concepts Sample vs population Central tendency: Mean, median, mode Variance, standard deviation Normal distribution Cumulative distribution

### The Logit Model: Estimation, Testing and Interpretation

The Logit Model: Estimation, Testing and Interpretation Herman J. Bierens October 25, 2008 1 Introduction to maximum likelihood estimation 1.1 The likelihood function Consider a random sample Y 1,...,

### Today. Statistical Learning. Coin Flip. Coin Flip. Experiment 1: Heads. Experiment 1: Heads. Which coin will I use? Which coin will I use?

Today Statistical Learning Parameter Estimation: Maximum Likelihood (ML) Maximum A Posteriori (MAP) Bayesian Continuous case Learning Parameters for a Bayesian Network Naive Bayes Maximum Likelihood estimates

### Plan for today. ! Part 1: (Hidden) Markov models. ! Part 2: String matching and read mapping

Plan for today! Part 1: (Hidden) Markov models! Part 2: String matching and read mapping! 2.1 Exact algorithms! 2.2 Heuristic methods for approximate search (Hidden) Markov models Why consider probabilistics

### Lecture Notes: BIOL2007 Molecular Evolution

Lecture Notes: BIOL2007 Molecular Evolution Kanchon Dasmahapatra (k.dasmahapatra@ucl.ac.uk) Introduction By now we all are familiar and understand, or think we understand, how evolution works on traits