Realism and Instrumentalism in models of molecular evolution


Galileo
Realism and Instrumentalism in models of molecular evolution
David Penny
Montpellier, June 08

Overview
sites free to vary
summing sources of error
rates of molecular evolution
estimates of time intervals
do we know anything? (flat priors)

Human/chimp divergence
1) Ramapithecus = 12 Ma, HC = 5 ± 1 Ma
But Ramapithecus is in Asia, HCG in Africa. Is 18-20 Ma a better estimate for the divergence?
2) Ramapithecus = 18 Ma, HC = 7.5 ± 1.5 Ma
Or should we combine the uncertainties? In this case, I would rather not leave it as a conditional estimate; we need both.
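One possible reading of "combine uncertainties", sketched in Python. The two estimates on the slide are the same ratio (HC = 5/12 of the calibration, ± 1/12) scaled by two different calibration dates. If the calibration itself is given an error, first-order propagation for independent errors adds the relative errors in quadrature. The calibration of 15 ± 3 Ma, the independence assumption and the function name `scaled_date` are illustrative assumptions, not values or methods taken from the slides.

```python
import math

def scaled_date(calib, calib_sd, ratio, ratio_sd):
    """Date = calibration * ratio, with first-order error propagation
    for independent errors (relative errors add in quadrature)."""
    date = calib * ratio
    rel = math.sqrt((calib_sd / calib) ** 2 + (ratio_sd / ratio) ** 2)
    return date, date * rel

# Ratio implied by the slide: HC = 5 +/- 1 Ma when Ramapithecus = 12 Ma.
ratio, ratio_sd = 5 / 12, 1 / 12

# Conditional estimates (calibration treated as exact) reproduce the slide.
print(scaled_date(12, 0, ratio, ratio_sd))   # about (5.0, 1.0)
print(scaled_date(18, 0, ratio, ratio_sd))   # about (7.5, 1.5)

# Hypothetical uncertain calibration of 15 +/- 3 Ma (an assumption, not from the slides).
combined_date, combined_sd = scaled_date(15, 3, ratio, ratio_sd)
print(round(combined_date, 2), round(combined_sd, 2))
```

With these assumed numbers the combined interval is wider than either conditional one, which is the point of not reporting only a conditional estimate.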

sites free to vary
rate k_aa (× 10^9 /yr)
fibrinopeptides    8.3
lysozyme           2.0
hemoglobin α       1.2
cytochrome c       0.3
histone H4         0.01
Dickerson (1971) explained the differences by the proportion of sites free to vary.
A change of function should show a rate change.
realism
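A small sketch of the arithmetic behind the "sites free to vary" explanation. If every free site changed at the same underlying rate, the ratio of two protein rates would equal the ratio of their proportions of free sites. Using fibrinopeptides as the reference in which essentially all sites are free is an assumption made here for illustration, as is the helper name `implied_free_fraction`.

```python
# Rates from the slide (Dickerson 1971), in the slide's units of k_aa x 10^9 per year.
rates = {
    "fibrinopeptides": 8.3,
    "lysozyme": 2.0,
    "hemoglobin alpha": 1.2,
    "cytochrome c": 0.3,
    "histone H4": 0.01,
}

# Assumption for illustration: fibrinopeptides are taken as the reference
# in which essentially all sites are free to vary.
reference = rates["fibrinopeptides"]

def implied_free_fraction(rate, reference_rate=reference):
    """Fraction of sites free to vary implied by a rate, if every free site
    changes at the same underlying rate as in the reference protein."""
    return rate / reference_rate

for protein, rate in rates.items():
    print(f"{protein:18s} {implied_free_fraction(rate):.4f}")
```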

we use a tiny fraction of the information in the data

Alignment (original sequence order)    Reordered Alignment (shuffled/reordered)
AIIFLNSALGPSPELFPIILATKVL              ASAGPSPPATPLLIIIILLFFNEKV
AIMFLNSALGPPTELFPVILATKVL              ASAGPPTPATPLLIMVILLFFNEKV
SIMFLNHTLNPTPELFPIILATETL              SHTNPTPPATPLLIMIILLFFNEET
TILFLNSSLGLQPEVTPTVLATKTL              TSSGLQPPATPLLILTVLVTFNEKT
TLLFLNSMLKPPSELFPIILATKTL              TSMKPPSPATPLLLLIILLFFNEKT
ALLFLNSTLNPPTELFPLILATKTL              ASTNPPTPATPLLLLLILLFFNEKT
AILFLNSFLNPPKEFFPIILATKIL              ASFNPPKPATPLLILIILFFFNEKI

c columns, c! alignments
If c = 1000, we use 1/1000! of the information
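The point of the slide can be sketched in a few lines of Python: a model that treats columns as independent and identically distributed only sees how often each site pattern occurs, so any reordering of whole columns gives it the same data. The sketch uses the first three sequences of the original alignment and its own random permutation (the seed and the helper names are illustrative choices), then computes log10 of 1000! to show the size of the number on the slide.

```python
import math
import random
from collections import Counter

# First three sequences of the original alignment on the slide.
alignment = [
    "AIIFLNSALGPSPELFPIILATKVL",
    "AIMFLNSALGPPTELFPVILATKVL",
    "SIMFLNHTLNPTPELFPIILATETL",
]

def columns(aln):
    """Return the alignment as a list of columns (site patterns)."""
    return ["".join(col) for col in zip(*aln)]

def shuffle_columns(aln, seed=0):
    """Reorder whole columns with one random permutation shared by all rows."""
    order = list(range(len(aln[0])))
    random.Random(seed).shuffle(order)
    return ["".join(row[i] for i in order) for row in aln]

shuffled = shuffle_columns(alignment)

# An i.i.d. model only sees how often each site pattern occurs,
# so the original and the reordered alignment look identical to it.
same_patterns = Counter(columns(alignment)) == Counter(columns(shuffled))
print(same_patterns)

# log10(c!) for c = 1000 columns, via the log-gamma function.
log10_orderings = math.lgamma(1001) / math.log(10)
print(round(log10_orderings, 1))
```

So 1000! has more than 2500 digits, and a column-order-blind method cannot distinguish any of those orderings.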

sites change
X-ray crystallographers: the strongest conclusion we have is that the same site may be fixed in some species and variable in others.
Molecular phylogeneticists: our methods (such as the Gamma distribution) assume sites are in the SAME rate class across the entire tree (AND, we only need one parameter, so there).

simulation results with standard model
[Figure: number of internal edges correct (out of 6) against millions of years (log scale, 5 to 2000); neighbor joining, 9 taxa, 1000 columns, i.i.d.]

Calculated results, Δ ¼ + n e^(-qt)
loss of information
[Figure: curves labelled 0.01, 0.005, 0.002 and 0.001; vertical axis from -0.2 to 1, horizontal axis from 1 to 10000 (log scale)]

simulation results with covarion model
[Figure: percentage of trees correct (0% to 120%) against a log-scale axis from 0.1 to 10; curves for d = 0.001, 0.100, 0.500, 1.000, 2.000, 5.000 and infinite]

do rates exist!!!
We go ON and ON and ON and ON about molecular clocks. Should we??

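not enough information to recover the full model
Seq 1 and Seq 2, two states (R, Y). Each edge carries the matrix
  1-γ   δ
  γ     1-δ
Parameters: 2 (edge to Seq 1) + 2 (edge to Seq 2) + 1 (composition at root, (P_R, 1 - P_R))
5 required, 3 available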

two taxa, two codes
Divergence matrix, F_ij:
             Seq 2
             R    Y
Seq 1   R    α    β
        Y    γ    *
Site patterns (Seq 1, Seq 2):
R R   α
R Y   β
Y R   γ
Y Y   *
Three independent parameters estimated
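A minimal sketch of how such a divergence matrix could be tabulated from two aligned sequences. It assumes the two codes are purines (R: A, G) and pyrimidines (Y: C, T); the toy sequences and the function name `divergence_matrix` are made up for illustration. The four frequencies sum to 1, which is why only three of them are independent.

```python
from collections import Counter

# Two-state recoding: purines (A, G) -> R, pyrimidines (C, T) -> Y.
RY = {"A": "R", "G": "R", "C": "Y", "T": "Y"}

def divergence_matrix(seq1, seq2):
    """Return F as a dict {(state1, state2): frequency} for two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = Counter((RY[a], RY[b]) for a, b in zip(seq1, seq2))
    total = sum(pairs.values())
    return {(i, j): pairs[(i, j)] / total for i in "RY" for j in "RY"}

# Toy sequences, made up for illustration only.
F = divergence_matrix("AGCTTACG", "ACCTGATG")
for i in "RY":
    print(i, [round(F[(i, j)], 3) for j in "RY"])
```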

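three taxa
Seq 1, Seq 2 and Seq 3, two states (R, Y). Each edge carries the matrix
  1-γ   δ
  γ     1-δ
Parameters: 2 + 2 + 2 (one matrix per edge) + 1 (composition at root, (P_R, 1 - P_R))
7 required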

four character states
Each edge carries the matrix
  *  α  β  γ
  δ  *  ε  φ
  η  ι  *  ϕ
  κ  λ  µ  *
Seq 1, Seq 2 and Seq 3
Parameters: 12 + 12 + 12 (one matrix per edge) + 3 (composition at root)
39 required
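The counts quoted on the slides (5 required against 3 available, 7 required, 39 required) follow from simple bookkeeping, sketched here in Python. The sketch assumes one unconstrained transition matrix on the edge leading to each sequence plus a free composition at the root, and counts the available values as the number of site patterns minus one. The function names are illustrative.

```python
def parameters_required(n_taxa, n_states):
    """Free parameters of the full model with one edge per sequence: one general
    transition matrix per edge plus the composition at the root."""
    per_edge = n_states * (n_states - 1)   # each of the n_states columns sums to 1
    root = n_states - 1                    # root frequencies sum to 1
    return n_taxa * per_edge + root

def values_available(n_taxa, n_states):
    """Independent site-pattern frequencies: all patterns minus one,
    because the frequencies sum to 1."""
    return n_states ** n_taxa - 1

for n_taxa, n_states in [(2, 2), (3, 2), (3, 4)]:
    print(n_taxa, n_states,
          parameters_required(n_taxa, n_states),
          values_available(n_taxa, n_states))
```

Under these assumptions two sequences with two states give 5 against 3, three sequences with two states give 7 against 7, and three sequences with four states give 39 against 63.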

tensor, 3D matrix
0.001279  0.000071  0.000071  0.000853
0.007819  0.002701  0.004265  0.000284
0.011231  0.000142  0.006682  0.001990
0.000995  0.000284  0.000426  0.000284
0.274950  0.002985  0.007961  0.009383
0.003838  0.004407  0.000711  0.000426
0.010520  0.000284  0.188371  0.000284
0.001564  0.004691  0.000426  0.001137
0.009667  0.003838  0.023742  0.004834
0.002985  0.201166  0.000426  0.003554
0.001137  0.000995  0.002275  0.000711
0.006682  0.001279  0.000426  0.143588
0.001848  0.000426  0.001848  0.000853
0.015496  0.005118  0.000853  0.007819
0.000284  0.000569  0.000853  0.000995
0.000569  0.000142  0.001564  0.002132
64 - 1 = 63 values, but a sparse matrix!

primary diagonal
Gymnure, Mole and Shrew

        T         C         A         G
T T  0.274950  0.007961  0.003838  0.000711
T C  0.009667  0.023742  0.002985  0.000426
T A  0.001848  0.001848  0.015496  0.000853
T G  0.000569  0.000142  0.001564  0.002132
C T  0.011231  0.006682  0.000995  0.000426
C C  0.010520  0.188371  0.001564  0.000426
C A  0.001137  0.002275  0.006682  0.000426
C G  0.000284  0.000569  0.000853  0.000995
A T  0.007819  0.002701  0.004265  0.000284
A C  0.002985  0.009383  0.004407  0.000426
A A  0.003838  0.004834  0.201166  0.003554
A G  0.000426  0.000853  0.005118  0.007819
G T  0.001279  0.000071  0.000071  0.000853
G C  0.000142  0.001990  0.000284  0.000284
G A  0.000284  0.000284  0.004691  0.001137
G G  0.000995  0.000711  0.001279  0.143588

secondary diagonals
Gymnure (moon rat), Mole, Shrew

        T         C         A         G
T T  0.274950  0.007961  0.003838  0.000711
T C  0.009667  0.023742  0.002985  0.000426
T A  0.001848  0.001848  0.015496  0.000853
T G  0.000569  0.000142  0.001564  0.002132
C T  0.011231  0.006682  0.000995  0.000426
C C  0.010520  0.188371  0.001564  0.000426
C A  0.001137  0.002275  0.006682  0.000426
C G  0.000284  0.000569  0.000853  0.000995
A T  0.007819  0.002701  0.004265  0.000284
A C  0.002985  0.009383  0.004407  0.000426
A A  0.003838  0.004834  0.201166  0.003554
A G  0.000426  0.000853  0.005118  0.007819
G T  0.001279  0.000071  0.000071  0.000853
G C  0.000142  0.001990  0.000284  0.000284
G A  0.000284  0.000284  0.004691  0.001137
G G  0.000995  0.000711  0.001279  0.143588

moon rat, 1+2

      T      C      A      G
T   0.955  0.148  0.087  0.028
C   0.025  0.803  0.025  0.009
A   0.018  0.043  0.876  0.076
G   0.002  0.006  0.012  0.887

        T             C             A             G
T   .955 ± .004   .150 ± .013   .087 ± .009   .029 ± .008
C   .025 ± .003   .800 ± .014   .025 ± .005   .009 ± .003
A   .018 ± .003   .044 ± .006   .877 ± .011   .077 ± .011
G   .002 ± .001   .006 ± .002   .012 ± .002   .886 ± .015

therefore we believe in symmetric models
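The closing remark can be checked against the numbers themselves. The sketch below compares each off-diagonal entry of the moon rat matrix with its mirror image and expresses the difference in units of the combined standard error. Treating the two entries of a pair as independent and combining their errors in quadrature is an assumption made here, not something stated on the slide; the function name `asymmetry` is illustrative.

```python
STATES = "TCAG"

# Moon rat matrix with standard errors from the slide (rows and columns in T, C, A, G order).
estimate = [
    [0.955, 0.150, 0.087, 0.029],
    [0.025, 0.800, 0.025, 0.009],
    [0.018, 0.044, 0.877, 0.077],
    [0.002, 0.006, 0.012, 0.886],
]
std_err = [
    [0.004, 0.013, 0.009, 0.008],
    [0.003, 0.014, 0.005, 0.003],
    [0.003, 0.006, 0.011, 0.011],
    [0.001, 0.002, 0.002, 0.015],
]

def asymmetry(est, se):
    """For each pair i < j, return (|difference|, difference / combined SE, label),
    sorted with the largest difference first."""
    rows = []
    for i in range(4):
        for j in range(i + 1, 4):
            diff = est[i][j] - est[j][i]
            combined = (se[i][j] ** 2 + se[j][i] ** 2) ** 0.5
            rows.append((abs(diff), diff / combined, STATES[i] + "/" + STATES[j]))
    return sorted(rows, reverse=True)

for size, z, pair in asymmetry(estimate, std_err):
    print(f"{pair}: difference {size:.3f}, about {abs(z):.1f} combined standard errors")
```

Under these assumptions the T/C pair differs by many standard errors, which is hard to reconcile with a symmetric model for this lineage.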

mole, shrew and moon rat

mole
      T      C      A      G
T   0.976  0.062  0.021  0.013
C   0.017  0.931  0.020  0.007
A   0.006  0.006  0.948  0.012
G   0.001  0.001  0.010  0.968

shrew
      T      C      A      G
T   0.977  0.038  0.024  0.011
C   0.020  0.951  0.020  0.003
A   0.002  0.009  0.942  0.011
G   0.001  0.001  0.015  0.976

moon rat
      T      C      A      G
T   0.955  0.148  0.087  0.028
C   0.025  0.803  0.025  0.009
A   0.018  0.043  0.876  0.076
G   0.002  0.006  0.012  0.887
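A rough way to compare the three lineages is to look at how far each diagonal falls below 1. The sketch below takes an unweighted mean over the four states, which ignores base composition; that simplification and the name `mean_change` are assumptions made for illustration.

```python
# Diagonals of the three matrices on the slide, in T, C, A, G order.
diagonals = {
    "mole":     [0.976, 0.931, 0.948, 0.968],
    "shrew":    [0.977, 0.951, 0.942, 0.976],
    "moon rat": [0.955, 0.803, 0.876, 0.887],
}

def mean_change(diagonal):
    """Unweighted mean probability that a state has changed along the edge."""
    return 1 - sum(diagonal) / len(diagonal)

change = {taxon: mean_change(d) for taxon, d in diagonals.items()}
for taxon, value in change.items():
    print(f"{taxon:9s} {value:.3f}")
```

On this crude measure the moon rat edge shows more than twice the change of the mole or shrew edges.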

change in rate, change in process
[Figure: the four-state matrix
  *  α  β  γ
  δ  *  ε  φ
  η  ι  *  ϕ
  κ  λ  µ  *
shown four times, with the labels "change in rate" and "change in process"]

do we know anything?
the curse of flat priors
the "we know nothing" syndrome

Probability of a partition
Taxa: Armadillo, Aardvark, Tenrec, Hedgehog, Gymnure, Mole, Shrew, LClawShrew, Horse, IndRhino, Cat, Dog, Elephant, Dugong, HarbSeal, GreySeal, FurSeal, BrownBear, Pig, Cow, Hippo, BlueWhale, SpermWhale, HecDolphin, Alpaca, FlyingFox, Rhinolophus, JFEbat, LTailBat, PipBat, Rabbit, Pika, Squirrel, Dormouse, GuineaPig, CaneRat, Mouse, Vole, TreeShrew, Baboon, Gibbon, Tarsier, Loris
# binary trees, b(n) = (2n - 5)!! = 1 × 3 × 5 × 7 × ... × (2n - 5)
[Figure: tree of these taxa with the groups Xenarthra, Afrotheria, Supraprimates and Laurasiatheria marked; numeric labels on the figure: 4, 2, 18, 27, 6, 27, 18, 27, 18, 5.68x10 18]

Probability of a partition 2
# binary trees, b(n) = (2n - 5)!! = 1 × 3 × 5 × 7 × ... × (2n - 5)
Two groups: b(n_1 + 1) · b(n_2 + 1) / b(n_t)
Three groups: b(n_1 + 1) · b(n_2 + 1) · b(n_3 + 1) / b(n_t)
In general: b(n_1 + 1) · b(n_2 + 1) · ... · b(n_i + 1) / b(n_t)
[Figure labels: 7, 8, 6; 2, 7, 2, 7, 6]
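The expressions on this slide are easy to evaluate exactly. The sketch below implements b(n) as a product of odd numbers and the product-over-groups ratio as written on the slide, using exact fractions. The group sizes passed in are illustrative choices and the function name `partition_probability` is hypothetical. For more than three groups a further factor for how the groups themselves are arranged would presumably be needed; the sketch follows the slide's expression as written.

```python
from fractions import Fraction

def b(n):
    """Number of unrooted binary trees on n labelled taxa: (2n - 5)!!,
    with b(n) = 1 for n <= 3."""
    count = 1
    for odd in range(3, 2 * n - 4, 2):
        count *= odd
    return count

def partition_probability(group_sizes):
    """Probability, with all binary trees equally likely, following the slide's
    formula: product of b(n_i + 1) over the groups, divided by b(n_t)."""
    n_total = sum(group_sizes)
    numerator = 1
    for size in group_sizes:
        numerator *= b(size + 1)
    return Fraction(numerator, b(n_total))

# Two groups of 7 and 8 taxa (illustrative sizes, n_t = 15).
print(partition_probability([7, 8]))

# A prespecified pair among 40 taxa: 1/75, about 0.013.
print(float(partition_probability([2, 38])))
```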

40 birds
Taxa: white-tailed trogon, pileated woodpecker, ivory billed toucan, barn owl, morepork, New Zealand kingfisher, peach-faced lovebird, dollar bird, kakapo, budgerigar, Eurasian buzzard, Blyth's hawk eagle, osprey, roadrunner, rook, superb lyre bird, rifleman, New Zealand long-tailed cuckoo, rockhopper penguin, little blue penguin, Kerguelen petrel, black-browed albatross, Oriental white stork, Australian pelican, flamingo, red-throated loon, frigatebird, gray-headed broadbill, fuscous flycatcher, forest falcon, peregrine falcon, Australian owlet nightjar, Ruby-throated hummingbird, common swift, great potoo, ruddy turnstone, southern black-backed gull, blackish oystercatcher, great crested grebe, Australasian little grebe
Group labels on the tree: KingWood, Owls, Parrots, Conglomerati, Cuckoos, Passerines, Shorebirds, CAM
[Figure: tree of the 40 birds, with nine branches marked *]

P(n,k) = R(k) B(n-k+1) / B(n)
probability, with n taxa, of observing a prespecified clade of size k
with n = 40:
k = 2, P ≈ 0.013
k = 3, P ≈ 0.0026 (parrots)
k = 4, P ≈ 7.12 × 10^-6
k = 5, P ≈ 5.84 × 10^-8
cuckoo, roadrunner

A: 4th, 5th, 6th
B: R(k)
C: kC1, B(n - k)
D: kC2, B(n - k)
E: 6C2 4C2, B(n - 6)

potoo, owlet-nightjar, owl, barn owl, swift, hummingbird (6)

Where next in Phylogeny?
allow realism in phylogeny
set the biological question
we have some bad failures
we need a range of alternatives
Belief is the curse of the thinking class

tensor, 2-states
Seq 1, Seq 2 and Seq 3, two states (R, Y): a 2 × 2 × 2 tensor of site patterns
Pattern (Seq 1, Seq 2, Seq 3):
R R R   α
R R Y   β
R Y R   γ
R Y Y   δ
Y R R   ε
Y R Y   φ
Y Y R   η
Y Y Y   *
7 available!