Molecular Evolution, course # Final Exam, May 3, 2006

Similar documents
Phylogenetic Tree Reconstruction

Phylogenetics: Building Phylogenetic Trees

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018

Probabilistic modeling and molecular phylogeny

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Constructing Evolutionary/Phylogenetic Trees

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

Reconstruire le passé biologique modèles, méthodes, performances, limites

Constructing Evolutionary/Phylogenetic Trees

How should we go about modeling this? Model parameters? Time Substitution rate Can we observe time or subst. rate? What can we observe?

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

7. Tests for selection

Inferring Molecular Phylogeny

C.DARWIN ( )

Dr. Amira A. AL-Hosary

A (short) introduction to phylogenetics

Lab 9: Maximum Likelihood and Modeltest

EVOLUTIONARY DISTANCES

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Discrete & continuous characters: The threshold model

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment

BINF6201/8201. Molecular phylogenetic methods

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington.

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Lecture 4. Models of DNA and protein change. Likelihood methods

Phylogenetics. BIOL 7711 Computational Bioscience

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2016 University of California, Berkeley. Parsimony & Likelihood [draft]

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Phylogenetic analyses. Kirsi Kostamo

Understanding relationship between homologous sequences

Quantifying sequence similarity

Processes of Evolution

Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Natural selection on the molecular level

How to read and make phylogenetic trees Zuzana Starostová

Phylogenetic inference

BMI/CS 776 Lecture 4. Colin Dewey

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22

The Idiot s Guide to the Zen of Likelihood in a Nutshell in Seven Days for Dummies, Unleashed


Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Letter to the Editor. Department of Biology, Arizona State University

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Anatomy of a tree. clade is group of organisms with a shared ancestor. a monophyletic group shares a single common ancestor = tapirs-rhinos-horses

Inferring Speciation Times under an Episodic Molecular Clock

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26

Using algebraic geometry for phylogenetic reconstruction

Systematics - Bio 615

Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week (Jan 27 & 29):

Evolutionary Tree Analysis. Overview

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

Evolutionary Models. Evolutionary Models

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution

Chapter 7: Models of discrete character evolution

What is Phylogenetics

Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics:

Substitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A

Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço

Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences

Molecular Evolution & Phylogenetics Traits, phylogenies, evolutionary models and divergence time between sequences

Preliminaries. Download PAUP* from: Tuesday, July 19, 16

Phylogenetics: Parsimony and Likelihood. COMP Spring 2016 Luay Nakhleh, Rice University

Taming the Beast Workshop

arxiv: v1 [q-bio.pe] 4 Sep 2013

CREATING PHYLOGENETIC TREES FROM DNA SEQUENCES

Mutation models I: basic nucleotide sequence mutation models

Lecture Notes: Markov chains

Phylogene)cs. IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, Joyce Nzioki

Estimating Evolutionary Trees. Phylogenetic Methods

Effects of Gap Open and Gap Extension Penalties

Molecular Evolution and Phylogenetic Tree Reconstruction

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

林仲彥. Dec 4,

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Andrew/CS ID: Midterm Solutions, Fall 2006

Molecular Evolution & Phylogenetics

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

进化树构建方法的概率方法 第 4 章 : 进化树构建的概率方法 问题介绍. 部分 lid 修改自 i i f l 的 ih l i

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

KaKs Calculator: Calculating Ka and Ks Through Model Selection and Model Averaging

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004,

Week 5: Distance methods, DNA and protein models

Evolutionary Analysis of Viral Genomes

What Is Conservation?

MULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

Transcription:

Molecular Evolution, course #27615 Final Exam, May 3, 2006 This exam includes a total of 12 problems on 7 pages (including this cover page). The maximum number of points obtainable is 150, and at least 85 points are required to pass. Note that different problems are worth different amounts of points. Problems should be answered in the space provided on these pages. The duration of the exam is 4 hours (13:00-17:00). All non-living resources may be used (books, internet including exercise web-pages, notes, etc. You should, however, not contact your uncle who is a professor of evolutionary biology at Harvard). You can work on your own, or in the groups you worked in during the computer exercises. Please do not discuss the test outside your group. Members of the group must sign below: Name Student ID number Signature 1

Table of chi-square critical values Significance level DF 0.9950 0.9750 0.9000 0.5000 0.1000 0.0500 0.0100 0.0010 1 0.0000 0.0010 0.0158 0.4549 2.7055 3.8415 6.6349 10.8276 2 0.0100 0.0506 0.2107 1.3863 4.6052 5.9915 9.2103 13.8155 3 0.0717 0.2158 0.5844 2.3660 6.2514 7.8147 11.3449 16.2662 4 0.2070 0.4844 1.0636 3.3567 7.7794 9.4877 13.2767 18.4668 5 0.4117 0.8312 1.6103 4.3515 9.2364 11.0705 15.0863 20.5150 6 0.6757 1.2373 2.2041 5.3481 10.6446 12.5916 16.8119 22.4577 7 0.9893 1.6899 2.8331 6.3458 12.0170 14.0671 18.4753 24.3219 8 1.3444 2.1797 3.4895 7.3441 13.3616 15.5073 20.0902 26.1245 9 1.7349 2.7004 4.1682 8.3428 14.6837 16.9190 21.6660 27.8772 10 2.1559 3.2470 4.8652 9.3418 15.9872 18.3070 23.2093 29.5883 11 2.6032 3.8157 5.5778 10.3410 17.2750 19.6751 24.7250 31.2641 12 3.0738 4.4038 6.3038 11.3403 18.5493 21.0261 26.2170 32.9095 13 3.5650 5.0088 7.0415 12.3398 19.8119 22.3620 27.6882 34.5282 14 4.0747 5.6287 7.7895 13.3393 21.0641 23.6848 29.1412 36.1233 15 4.6009 6.2621 8.5468 14.3389 22.3071 24.9958 30.5779 37.6973 16 5.1422 6.9077 9.3122 15.3385 23.5418 26.2962 31.9999 39.2524 17 5.6972 7.5642 10.0852 16.3382 24.7690 27.5871 33.4087 40.7902 18 6.2648 8.2307 10.8649 17.3379 25.9894 28.8693 34.8053 42.3124 19 6.8440 8.9065 11.6509 18.3377 27.2036 30.1435 36.1909 43.8202 20 7.4338 9.5908 12.4426 19.3374 28.4120 31.4104 37.5662 45.3147 21 8.0337 10.2829 13.2396 20.3372 29.6151 32.6706 38.9322 46.7970 22 8.6427 10.9823 14.0415 21.3370 30.8133 33.9244 40.2894 48.2679 23 9.2604 11.6886 14.8480 22.3369 32.0069 35.1725 41.6384 49.7282 24 9.8862 12.4012 15.6587 23.3367 33.1962 36.4150 42.9798 51.1786 25 10.5197 13.1197 16.4734 24.3366 34.3816 37.6525 44.3141 52.6197 26 11.1602 13.8439 17.2919 25.3365 35.5632 38.8851 45.6417 54.0520 27 11.8076 14.5734 18.1139 26.3363 36.7412 40.1133 46.9629 55.4760 28 12.4613 15.3079 18.9392 27.3362 37.9159 41.3371 48.2782 56.8923 29 13.1211 16.0471 19.7677 28.3361 39.0875 42.5570 49.5879 58.3012 30 13.7867 16.7908 20.5992 29.3360 40.2560 43.7730 50.8922 59.7031 31 14.4578 17.5387 21.4336 30.3359 41.4217 44.9853 52.1914 61.0983 32 15.1340 18.2908 22.2706 31.3359 42.5847 46.1943 53.4858 62.4872 33 15.8153 19.0467 23.1102 32.3358 43.7452 47.3999 54.7755 63.8701 34 16.5013 19.8063 23.9523 33.3357 44.9032 48.6024 56.0609 65.2472 35 17.1918 20.5694 24.7967 34.3356 46.0588 49.8018 57.3421 66.6188 36 17.8867 21.3359 25.6433 35.3356 47.2122 50.9985 58.6192 67.9852 37 18.5858 22.1056 26.4921 36.3355 48.3634 52.1923 59.8925 69.3465 38 19.2889 22.8785 27.3430 37.3355 49.5126 53.3835 61.1621 70.7029 39 19.9959 23.6543 28.1958 38.3354 50.6598 54.5722 62.4281 72.0547 40 20.7065 24.4330 29.0505 39.3353 51.8051 55.7585 63.6907 73.4020 2

1: (5 points) You are performing an experiment on a population of haploid, asexual organisms. Assume that the organisms exhibit exponential growth with discrete, non-overlapping generations, and that genetic drift can be ignored. At a certain locus you observe two alleles: Frequent and Rare. The absolute fitness of both alleles is R = 5 offspring/generation. At the start of the experiment the Rare allele is present in 7% of the population. What is the frequency of the Rare allele after 10 generations? 2: (10 points) Below we have listed four widely used nucleotide substitution models and a list of four PAUP model specifications. Use lines to connect the model names with the corresponding PAUP specifications: (A) Jukes and Cantor model (JC) (B) General Time Reversible model (GTR) (C) Felsenstein 81 (F81) (D) Kimura 2 parameter model (K2P) (1) lset nst=1 basefreq=empirical (2) lset nst=2 basefreq=equal tratio=estimate (3) lset nst=1 basefreq=equal (4) lset nst=6 basefreq=empirical rmatrix=estimate 3: (5 points) You have a strictly bifurcating, unrooted tree with 37 tips. In how many different positions can you place a root? 4: (15 points) You are using parsimony to reconstruct the phylogeny of a set of 9 sequences. Each of your sequences is 1 nucleotide long (!). What is the length of the two trees below? Which tree is the best one according to the parsimony criterion? 3

5: (15 points) Below is an alignment of two 45 bp long DNA sequences: Seq_1: AGCTGGCATTGCGTTATTATCATTGCATGCATCTTGTCATATGGC Seq_2: AGCAGAGTCTCCGCTGTTATCGATGTTTCCATCTTGCCTTACAGT What is the (uncorrected) distance between these two sequences? The above result does not take multiple substitutions into account. Assuming the Jukes and Cantor model of evolution, compute the estimated, corrected distance between the sequences: Now, assume the Kimura 2-parameter model of evolution, and again compute the estimated, corrected distance between the sequences: 6: (5 points) Under the so-called HKY85 model of evolution, the likelihood of a tree will depend on where the root is placed 1) True 2) False 7: (5 points) The likelihood of a model is (select one): 1) The probability of the data given the model P(Data Model) 2) The probability of the model given the data P(Model Data) 3) The probability of the model before any evidence has been collected P(Model) 4) The total probability of the data P(Data) 8: (5 points) Branch and bound searching of tree-space is guaranteed to always find the best possible tree: 1) True 2) False Heuristic searching of tree-space is guaranteed to always find the best possible tree: 1) True 2) False 4

9: (15 points) As a consequence of your graduate studies at the University of Copenhagen you have become interested in the pubic louse - Pthirus pubis. You therefore want to construct a maximum likelihood tree for a set of sequences from 35 species of lice. Your first step is to decide which model to use. Unfortunately your supervisor has run out of funding and you only have access to some extremely cheap Swedish software that is able to fit just two different models to your data: the Kimura 2-parameter model (K2P) and the general time reversible model (GTR). Below we have listed the number of free parameters in the different models (K) along with the maximal log likelihood (lnl) of the fitted models. Use likelihood ratio testing to decide which model is best (use a significance level of 5%). Model K lnl K2P 68-1245.98 GTR 75-1235.31 10: (10 points) Below we have listed the names of five different methods for reconstructing phylogenies as well as five very brief descriptions of the methods. Use lines to connect the method names with the corresponding descriptions: (A) Maximum likelihood (B) Least squares (1) The ultimate goal of this method is to find the full posterior probability distribution over trees and other parameters (2) The best tree is the one implying fewest mutations (C) Parsimony (D) Bayesian (E) Minimum evolution (3) For a given tree topology, the best branch lengths are those having the smallest deviation between observed and patristic distances; the best tree is the shortest one. (4) The best tree, and the best branch lengths for a given tree, are those having the smallest deviation between observed and patristic distances. (5) The best tree is the one giving the highest probability of the alignment having evolved from some common ancestor. 5

11: (35 points) You are investigating an outbreak of Sundbus elsinorii virus. This virus attacks only Latvian chipmunks and is characteristic in having a genome that is merely 2 nucleotides long (its fascinating life-cycle involves heavy use of programmed frame-shifting during translation of the encoded poly-protein). From a population of chipmunks living in cages at the Latvia University of Agriculture (Latvijas Lauksaimniecibas Universitate) you have obtained 6 fullgenome sequences that are known to be related as shown on the unrooted tree below. Note that two of the sequences are placed at the ancestral nodes. All branches have the same length t 1. Indicated below the tree is the time-reversible substitution-probability matrix corresponding to this branch length (specifically, the matrix is of the F81 type). The equilibrium frequencies of the nucleotides are: π A =0.2, π C =0.3, π G =0.4, π T =0.1. Compute the probability of the data given the information provided here. (Tip: perform the computation one nucleotide position at a time and combine the results). to A C G T A 0.79 0.08 0.10 0.03 From C 0.05 0.82 0.10 0.03 t 1 G 0.05 0.08 0.84 0.03 T 0.05 0.08 0.10 0.77 6

12: (25 points) During a ski trip in Norway with 5 friends from your water polo team, you suddenly remember that your supervisor asked you to finish a Bayesian analysis of a set of mitochondrial sequences (obtained from a population of Latvian chipmunks) just before you left. Regrettably, you forgot to do this. However, you fortunately brought a printout of the full data set (in fasta-format) with you to Norway. Since there is no electricity (not to mention a computer) in the ski hut you decide to do the analysis by hand and then send the result to your supervisor by ordinary surface mail. To simplify the job somewhat, you decide to use the Kimura 2 parameter substitution model in a discretized version where only 6 possible values of the transition/transversion ratio are considered: T=1.0, T=2.0, T=3.0, T=4.0, T=5.0, and T=6.0. Based on your previous experience with mitochondrial sequences you have specified the following prior probabilities for the six possible parameter values: Prior probabilities: P(T=1.0) = 0.05 P(T=2.0) = 0.10 P(T=3.0) = 0.25 P(T=4.0) = 0.40 P(T=5.0) = 0.15 P(T=6.0) = 0.05 For each of these possible values you have furthermore computed the probability of the data given that particular parameter value (the likelihood): Likelihoods: P(D T=1.0) = 0.01 P(D T=2.0) = 0.05 P(D T=3.0) = 0.15 P(D T=4.0) = 0.23 P(D T=5.0) = 0.78 P(D T=6.0) = 0.15 Find the posterior probability of the six possible parameter values: P(T=1.0 D) = P(T=2.0 D) = P(T=3.0 D) = P(T=4.0 D) = P(T=5.0 D) = P(T=6.0 D) = 7