Phylogenetics Topic 1: An overview


Introduction

"The affinities of all beings of the same class have sometimes been represented by a great tree. I believe this simile largely speaks the truth. The green budding twigs may represent existing species; and those produced during former years may represent the long succession of extinct species... and this connection of the former and present buds by ramifying branches may well represent the classification of all extinct and living species in groups subordinate to groups."
Charles Darwin, in Chapter IV of On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life.

A fundamental concept of the theory of evolution, independently developed by Charles Robert Darwin and Alfred Russel Wallace and published jointly in 1858, is that species share a common origin and have subsequently diverged through time. Interestingly, both men came to use the simile of a great tree to illustrate this notion of descent with modification, and ever since biologists have been using tree-like diagrams to describe the pattern and timing of events that gave rise to the earth's biodiversity. The branching pattern of the tree represents the splitting of biological lineages, and the lengths of the branches can be used to signify the age of those events. Today, biologists call these tree-like diagrams phylogenies.

[Figure: Unrooted tree diagram drawn in the margin of one of Charles Darwin's notebooks.]

[Figure: Phylogenetic tree used in The Origin of Species.]

Darwin wasn't just thinking about classification based on phylogenies. He used them to visualize the process of divergence within species and the splitting of populations into separate species. Darwin used the figure from The Origin of Species to illustrate divergence of variants within species; over time, successively more variation accumulates. Eventually some of this variation forms the basis for new species.

The biological discipline dedicated to reconstructing organismal phylogenies is called phylogenetics. Parallel advances in a number of fields led to tremendous growth in phylogenetics over the last 40 years. First, beginning in the 1960s, sophisticated techniques were developed and refined for the purpose of reconstructing phylogenies from the actual features, or characters, of organisms. Second, phylogenetics grew beyond its traditional application to the classification of living organisms. Recognition that phylogenies can provide an evolutionary framework for studying a wide variety of problems led to their application in almost every other subdiscipline of biology. Third, rapid increases in computational power meant that programs implementing phylogeny reconstruction algorithms could accommodate very large amounts of data. Lastly, the revolution in molecular biotechnology opened up a vast new source of characters for phylogenetic analysis.

Before discussing the wide-ranging applications of phylogenies, it is necessary to define some essential terminology. An imaginary species phylogeny is presented in figure 1a as a guide. The lines of the phylogeny, called branches, represent species, and the bifurcation points, called nodes, represent speciation events. The tips of the terminal branches are present-day species, and each node represents a species that is the common ancestor of all its descendants, or daughter species. For example, in figure 1a the species at node B is the most recent common ancestor of present-day species 1, 2, and 3, and is not an ancestor of species 4 or 5. Furthermore, the group composed of ancestor B and all its descendants (species 1, 2, 3, and A) is called a clade, or a monophyletic group. Smaller clades are composed of A and all its descendants, and D and all its descendants.

It must be noted that phylogenetics is not restricted to species. Phylogenetic methods can be used to depict kinship of individuals within a local group or population, relationships among populations or subspecies, relationships among taxonomic lineages above the species level (e.g., supraspecific categories such as genera and families), relationships among genes within populations, or relationships among different genes within a gene family.

[Figure 1: An imaginary species phylogeny, shown rooted (a), unrooted (b), and with informative branch lengths (c).]
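The terminology above (node, most recent common ancestor, clade) can be made concrete with a small sketch. The topology below is an assumption: the figure itself is not reproduced here, so the tree is only one arrangement consistent with the description of figure 1a (A is placed as the ancestor of species 1 and 2, and D as the ancestor of species 4 and 5). The names PARENT, ancestors, mrca and clade are illustrative, not part of any library.

```python
# Child -> parent map for a hypothetical tree consistent with the text:
# root C splits into B and D; B splits into A and species 3;
# A splits into species 1 and 2; D splits into species 4 and 5.
PARENT = {
    "1": "A", "2": "A", "A": "B", "3": "B",
    "4": "D", "5": "D", "B": "C", "D": "C",
}

def ancestors(node):
    """Return the path from a node up to the root, starting with the node itself."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path

def mrca(tips):
    """Most recent common ancestor: the first node shared by every tip's path to the root."""
    tips = list(tips)
    common = ancestors(tips[0])
    for tip in tips[1:]:
        lineage = set(ancestors(tip))
        common = [n for n in common if n in lineage]
    return common[0]

def clade(node):
    """A node together with all of its descendants (a monophyletic group)."""
    members = {node}
    for child, parent in PARENT.items():
        if parent == node:
            members |= clade(child)
    return members

print(mrca(["1", "2", "3"]))   # B
print(sorted(clade("B")))      # ['1', '2', '3', 'A', 'B']
```

Under this assumed topology, species 4 and 5 fall outside clade("B"), which matches the statement that B is not an ancestor of either.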

The phylogeny in figure 1a is rooted at node C, allowing us to infer which ancestral species gave rise to which present-day species. Without a root, a phylogeny looks very different; compare figure 1a with 1b, which differ only in the placement of a root. The importance of placing a root on a phylogeny should now be clear: without a root, biologists cannot distinguish between what is ANCESTRAL and what is DERIVED (descendant). We will return to the concept of a root in topic 3 [methods]. Rooted phylogenies allow biologists to distinguish similar characteristics due to common descent (HOMOLOGY) from similar characteristics due to convergence from different ancestors (ANALOGY) (see figure 2). However, most methods of phylogenetic inference produce unrooted trees, and the location of the root also must be inferred.

[Figure 2: Homology versus analogy.]

Rooted phylogenies allow biologists to infer CHARACTER POLARITY, the evolutionary relationship between two or more states of a given character. Say we have a character with two states, a and b. By mapping them onto a rooted phylogeny we may determine, for example, that b preceded a in evolutionary history; hence a is the derived state and b is the primitive state.

In the earlier examples (figures 1a and 1b), branch lengths were not intended to convey any information. The phylogeny in figure 1c illustrates how branch lengths can show how much change has occurred along a branch. In the case of molecular characters, if the rate of evolution is constant over time (the so-called molecular clock), the branches will show the relative divergence times of the lineages. For example, figure 1c indicates that the divergence of species 1 and 2 was much more recent than the divergence of species 4 and 5. Moreover, if the divergence dates of some points in the phylogeny are known from the fossil record (calibration points), and the characters are evolving in a clock-like fashion, the phylogeny can be used to estimate divergences absent from the fossil record.

Below is an example of a real dataset (COII and cyt b gene sequences of selected mammals) where the branch lengths have been estimated once by assuming clock-like molecular evolution and again without such an assumption.

[Figure: Two trees for the same mammals (Felis, Canis, Ursus, Bos, Hippopotamus, Physeter, Balaenoptera, Rhinoceros, Equus), each with a marked root and a scale bar of 0.1. Branch lengths estimated under the assumption of the molecular clock: tips are contemporary, and the distance from the root to each tip is the same. Branch lengths estimated without the assumption of the molecular clock: tips do not appear contemporary, and the distance from the root to each tip is not the same.]
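The difference between clock-like and non-clock-like branch lengths, and the use of a calibration point, can be sketched in a few lines. Everything in the example is hypothetical: the tree shape, the branch lengths (arbitrary units), the node names X and Y, and the 30-million-year calibration are invented for illustration and are not taken from the mammal dataset.

```python
import math

# A node is (name, branch_length_to_parent, children). Hypothetical values.
CLOCK_TREE = ("root", 0.0, [
    ("X", 3.0, [("sp1", 1.0, []), ("sp2", 1.0, [])]),
    ("Y", 1.0, [("sp4", 3.0, []), ("sp5", 3.0, [])]),
])
NONCLOCK_TREE = ("root", 0.0, [
    ("X", 3.0, [("sp1", 1.0, []), ("sp2", 2.5, [])]),
    ("Y", 1.0, [("sp4", 3.0, []), ("sp5", 3.0, [])]),
])

def root_to_tip(tree, depth=0.0):
    """Return {tip name: summed branch length from the top of this subtree}."""
    name, length, children = tree
    depth += length
    if not children:
        return {name: depth}
    distances = {}
    for child in children:
        distances.update(root_to_tip(child, depth))
    return distances

def is_ultrametric(tree, tol=1e-9):
    """True if every tip is the same distance from the root (clock-like tree)."""
    d = list(root_to_tip(tree).values())
    return all(math.isclose(x, d[0], abs_tol=tol) for x in d)

def find(tree, name):
    """Return the subtree whose top node has the given name, or None."""
    if tree[0] == name:
        return tree
    for child in tree[2]:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None

def node_height(tree, name):
    """Distance from a node down to its tips; only meaningful on a clock-like tree."""
    node = find(tree, name)
    return max(root_to_tip(node).values()) - node[1]

def estimate_age(tree, calib_node, calib_age, target):
    """Scale the target node's height by the rate implied by one calibration node."""
    return node_height(tree, target) * calib_age / node_height(tree, calib_node)

print(is_ultrametric(CLOCK_TREE))                # True
print(is_ultrametric(NONCLOCK_TREE))             # False
print(estimate_age(CLOCK_TREE, "Y", 30.0, "X"))  # 10.0
```

If node Y were dated to 30 million years by a fossil, the clock assumption would place node X at one third of that age, because its height is one third of Y's. The same calculation on the non-clock tree would not be justified.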

The phylogenetic comparative method

Evolutionary biologists use the comparative method to discover common evolutionary patterns, and to understand the causes of those patterns. The key to this approach is discovering correlated patterns of evolution between different characters of organisms, or between characters of organisms and aspects of the environment that they inhabit. Most comparative studies attempt to address the adaptive significance of biological variation, although many patterns ultimately require non-adaptive explanations. Since Darwin's time, the comparative method has remained one of the most important analytical tools of evolutionary biologists. However, comparative biology has recently undergone a major transformation: the realization that the characteristics of species could be correlated due to shared ancestry, taken alongside the major developments in the field of phylogenetics, meant that evolutionary biologists had to examine comparative trends together with phylogenetic relatedness.

What is the problem? Standard statistical methods for assessing correlation treat data drawn from different species as independent. Because species are hierarchically related by the phylogeny, they cannot be treated as if drawn independently from the same distribution.

Let's consider a hypothetical example. Consider a phenotype (say, the size of a primate's big toe; Y) and an ecological variable (say, the frequency of things that a big toe can be stubbed into; X). Suppose you have gone to great trouble to collect measurements for the size of the big toe and the stubbiness of the habitat, and you are interested in the significance of any regression of Y on X. So, you plot your data and you find what appears to be a significant correlation.

[Figure: Hypothetical dataset for phenotype (Y) and ecological variable (X).]

Now consider that at some point in early history two species diverged in toe size and colonized two different habitats. At that point in time there are only two data points. They lie on a straight line, but the correlation cannot be significant: with only two points, the regression has zero degrees of freedom.

[Figure: Two-point dataset from early in evolutionary history; axes Y and X.]

Now consider that some evolutionary time has passed and each of these two species has given rise to 100 descendant species. By this accident of history, all the descendants in one clade will have a larger toe and tend to be in one habitat type, and the descendants of the other species will have a smaller toe and tend to be in the other habitat type. If our sample of data came from these two clades, we would have effectively sampled only two species.
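The two-clade scenario above can be reproduced numerically. The sketch below uses invented values: each clade is a small grid of points in which X and Y are exactly uncorrelated, and the two clades differ only in their mean. The clade means (2 and 12) and the grid offsets are assumptions chosen to keep the arithmetic transparent; they are not data from the notes.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Within a clade, every combination of X offset and Y offset occurs once,
# so X and Y are uncorrelated by construction.
offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
little_toe = [(2 + dx, 2 + dy) for dx, dy in offsets]    # hypothetical clade mean (2, 2)
big_toe = [(12 + dx, 12 + dy) for dx, dy in offsets]     # hypothetical clade mean (12, 12)

def r(points):
    """Correlation between the X and Y coordinates of a list of (X, Y) points."""
    xs, ys = zip(*points)
    return pearson(xs, ys)

pooled = r(little_toe + big_toe)
print(round(pooled, 3))            # 0.974
print(r(little_toe), r(big_toe))   # 0.0 0.0
```

The pooled correlation is close to 1 even though there is no relationship at all within either clade; the apparent trend is produced entirely by the single old divergence between the two groups.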

[Figure: Phylogeny of two groups of close relatives, a big-toe clade and a little-toe clade. Each clade is the product of recent diversification; the two are separated by an old divergence of big-toed and little-toed primates.]

If we code our data to indicate the clade of origin, we see that the correlation is an illusion generated by two clusters with different mean values.

[Figure: Hypothetical dataset with points coloured according to clade of origin (little-toed clade, big-toed clade); axes Y and X.]

One way to analyze these data is to use a method called FELSENSTEIN'S INDEPENDENT CONTRASTS. The phylogeny is divided into pairs of branches that diverge from a common node, and the difference (contrast) in trait value across each pair is computed; these contrasts are independent of one another. Under a Brownian motion model, the expected variance of each contrast is proportional to the summed lengths of the branches involved, so each contrast can be standardized by its branch lengths. The standardized contrasts can be considered drawn from a normal distribution with a mean of zero.

An alternative approach is to use ANCESTRAL CHARACTER STATE RECONSTRUCTION, a statistical method of inferring the most likely character state at each ancestral node of a phylogeny. These ancestral reconstructions are then used to infer and count the number of times that a trait of interest has evolved on a phylogeny. Both approaches take a particular topology as given, and additional steps must be employed to take into account the error associated with a particular estimate of a phylogeny.

Joseph Felsenstein, in the paper that laid the foundation for the modern transformation of comparative biology (Felsenstein, 1985, Am Nat 125:1-15), wrote that "phylogenies are fundamental to comparative biology; there is no doing it without taking them into account." Phylogenetically related species will be more similar in both phenotype and lifestyle than distantly related species, and modern comparative methods must attempt to distinguish between similarities due to similar adaptive pressures and similarities due to descent from common ancestors.
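A minimal sketch of how independent contrasts are computed for one trait on a fully bifurcating tree, following the standard recursion: take the difference between two sister values, divide by the square root of their summed branch lengths, then replace the pair with a weighted average and lengthen the branch below it. The three-species tree, its trait values and its branch lengths are hypothetical, and the tuple encoding is only an illustrative choice.

```python
from math import sqrt

# A tip is (trait_value, branch_length).
# An internal node is ((left_node, right_node), branch_length).
AB = (((1.0, 1.0), (3.0, 1.0)), 1.0)   # hypothetical sister species A and B
C = (5.0, 2.0)                          # hypothetical species C
ROOT = ((AB, C), 0.0)

def pic(node, out):
    """Return (value, adjusted branch length) for a node.

    Standardized contrasts are appended to `out`, one per internal node.
    """
    payload, length = node
    if not isinstance(payload, tuple):      # tip: value and branch length as given
        return float(payload), length
    left, right = payload
    x1, v1 = pic(left, out)
    x2, v2 = pic(right, out)
    # Contrast between sisters, scaled by its expected standard deviation
    # under Brownian motion (variance proportional to v1 + v2).
    out.append((x1 - x2) / sqrt(v1 + v2))
    # Ancestral estimate: average weighted by the inverse of branch length.
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # The branch below is lengthened to reflect uncertainty in that estimate.
    return value, length + v1 * v2 / (v1 + v2)

contrasts = []
pic(ROOT, contrasts)
print([round(c, 4) for c in contrasts])   # [-1.4142, -1.6036]
```

For a comparative test, contrasts would be computed in the same way for a second trait, and the two sets of contrasts compared by a regression forced through the origin. A tree with n tips yields n - 1 contrasts, which is why 200 species in two young clades contribute far less independent evidence about the old divergence than 200 raw data points suggest.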

APPLICATIONS OF PHYLOGENETICS

Phylogenies can have practical value in almost every branch of biology, a fact that has become widely recognized only in the last decade. This expansion, however, makes it impossible to review all the applications of phylogenies; instead, some examples are presented that include both classic and novel applications.

1. Systematics, classification, and taxonomy. Perhaps the most traditional application of phylogenetics is classification and systematics. Biological classifications are systems that organize the diversity of life, and systematics is the study of that diversity relative to some kind of specified relationship. Biologists generally agree that classification and systematics of species and supraspecific taxa should reflect the natural organization of biological diversity. The discipline devoted to producing a classification that portrays the evolutionary relationships of species and supraspecific lineages is called phylogenetic systematics. Narrowly defined, phylogenetic systematics has two basic components: (i) phylogenetic inference and (ii) production of a hierarchical classification system that exactly reflects the phylogenetic relationships. However, this definition has been broadened by some biologists to include many aspects of comparative evolutionary biology.

[Figure: Ernst Haeckel's tree of life, drawn sometime in the late 1800s. Labelled regions include Menschen, Affen, non-mammalian vertebrates, invertebrates, and Protozoa.]

Haeckel placed Menschen ("Men") at the top of the tree among the Affen ("Apes"). Haeckel was among the first to suggest that man's ancestry was among the Great Apes. This tree was a tree of men, and Haeckel's placement of Menschen at the top was intentional. The tree and its associated system of classification differ from modern ones in that they are based on the notion of linear progress (like a ladder) from the most primitive single-celled organisms upwards to man (at the very top). Haeckel considered the things near the top as more evolved and the things near the bottom as primitive.

Ernst Haeckel (1834-1919) was a German biologist and scientific illustrator. He was one of the first popularizers of Darwin's theory of evolution. The tree in the figure is from his book General Morphology founded on the descent theory.

If a classification system is to be phylogenetic, the naming of species and supraspecific taxa (taxonomy) must reflect their phylogenetic relationships. For this reason, named taxa must comprise MONOPHYLETIC GROUPS; i.e., a named taxon must represent a group descended from a single ancestral species, and all descendants of that ancestor must be included in the named taxon. A monophyletic group is also called a CLADE. This means that if a named taxon includes the common ancestor and only some of its descendants (PARAPHYLY), or does not include the most recent common ancestor of its members (POLYPHYLY), it is not acceptable in a phylogenetic classification.

[Figure: Monophyly, paraphyly and polyphyly, shown on two copies of the same tree. One copy marks a monophyletic group (clade); the other marks a paraphyletic group (AHJGFDE) and a polyphyletic group (BC).]

Take the traditional class Reptilia as an example. The traditional Reptilia included the crocodylomorphs (alligators and crocodiles), the lepidosauromorphs (lizards, snakes, and relatives) and the anapsids (turtles and relatives). Phylogenetic analyses, however, indicated that the common ancestor of reptiles was also the ancestor of birds and mammals, which had been placed in different classes. Therefore, the traditional taxonomic grouping called Reptilia was paraphyletic. Practitioners of phylogenetic systematics point out that by using the traditional classification one neglects to recognize a phylogenetic relationship between birds and crocodylomorphs, and between mammals and extinct synapsid reptiles.

[Figure: The old Reptilia as an example of classification based on a paraphyletic group. Labelled groups: Aves (birds); Ornithischia (some plant-eating dinosaurs), with much additional dinosaur diversity; Crocodylomorpha (gators and crocs); Lepidosauromorpha (lizards, snakes, etc.); anapsids (turtles and relatives); the diversity of extinct mammal-like reptiles; mammals (synapsids). Annotations: the old Reptilia is a GRADE; Amniota is a clade.]

The ultimate goal of phylogenetic systematics is a phylogenetic history of all life on earth, the proverbial Tree of Life. A multiauthored internet project is dedicated to achieving this goal. Individual parts of the Tree of Life are authored by biologists around the world, each working on a specific group of organisms, and are published electronically on the World Wide Web. When completed, it will provide a phylogenetic history for all life on earth, a unified taxonomy, and a means of searching and retrieving information about the characteristics of organisms. You can check the progress of this project by visiting the Tree of Life website (http://phylogeny.arizona.edu/tree/phylogeny.html).
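The monophyly requirement can be checked mechanically once a tree is given. The sketch below is deliberately simplified and rests on assumptions: it uses only five living groups named in the notes, and the branching order (in particular the position of the turtles) is an illustrative choice, not a result stated in the notes. The function names are hypothetical.

```python
# Nested pairs encode an assumed, simplified tree of living amniotes.
TREE = ("Mammalia",
        ("Anapsida",
         ("Lepidosauromorpha",
          ("Crocodylomorpha", "Aves"))))

def tips(node):
    """Set of tip names below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(tips(child) for child in node))

def smallest_clade(node, group):
    """Tip set of the smallest clade that contains every member of `group`."""
    children = [] if isinstance(node, str) else node
    for child in children:
        if group <= tips(child):
            return smallest_clade(child, group)
    return tips(node)

def is_monophyletic(group):
    """True if the group is exactly the set of tips descended from one ancestor."""
    group = set(group)
    return smallest_clade(TREE, group) == group

def missing_descendants(group):
    """Tips that must be added to the group to make it a clade."""
    group = set(group)
    return smallest_clade(TREE, group) - group

old_reptilia = {"Anapsida", "Lepidosauromorpha", "Crocodylomorpha"}
print(is_monophyletic(old_reptilia))            # False
print(missing_descendants(old_reptilia))        # {'Aves'}
print(is_monophyletic(old_reptilia | {"Aves"})) # True
```

On this reduced tree only birds are missing from the traditional Reptilia. With the extinct mammal-like reptiles included, as in the figure, the smallest enclosing clade would widen to Amniota and mammals would be missing as well.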

2. Biogeography. Biogeography is the study of the distribution of biological diversity in space and time. The subdiscipline devoted to understanding the underlying historical factors that have influenced biogeographic diversity is called historical biogeography. By considering the relationships of taxa, their geographic distributions, and the geological history of the regions they occupy, biogeographers can sometimes infer the historical importance of dispersals and geographic isolation, and make inferences about modes of speciation. The methods of historical biogeography also can be applied to uncover geographic patterns of genetic variation within species (a pursuit called phylogeography). Phylogeographers use molecular data to infer an intraspecific gene phylogeny that is then mapped onto the geographic distribution of the species.

Phylogeography allows one to test hypotheses such as whether geographic or environmental factors have been historically important barriers to gene flow. Phylogeographic analysis of mouse lemurs contradicts the expected east-west disjunction for Madagascar, and suggests a completely novel north-south disjunction. The observed phylogenetic tree was inferred from mitochondrial DNA gene sequences.

[Figure: Phylogeography of mouse lemurs in Madagascar. WEST: low elevation and dry. EAST: high elevation and wet. Adapted from separate figures in A. D. Yoder (2004), in press.]

3. Health sciences. With recent advances in DNA sequencing technology, phylogenetic analysis of genes has developed into an important tool for tracking the evolution and spread of infectious diseases. Epidemiological questions that can be addressed by phylogenetic analysis of DNA sequences include: (i) what was the origin of an emerging disease; (ii) was there a single origin, or has a disease entered a population in different locations or at different times; (iii) how was the infectious disease spread; (iv) what was the source of a particular transmission event (see slides); (v) how does the disease organism evolve resistance to its host; (vi) how does the host immune system evolve resistance to the disease; and (vii) are there species closely related to the known pathogens that might be able to cause disease in humans?

The case of HIV (human immunodeficiency virus) illustrates the utility of phylogenetics in epidemiology. Phylogenetic analysis indicated that HIV consists of two main types (HIV-1 and HIV-2) and numerous subtypes. Furthermore, it showed that HIV-1 and HIV-2 entered the human population from different sources, as HIV-1 is more closely related to chimpanzee SIVs (simian immunodeficiency viruses), and HIV-2 is more closely related to mangabey monkey SIVs. Because different subtypes within HIV-1 are related to different lineages of chimpanzee SIV, and different subtypes of HIV-2 are related to different lineages of mangabey SIV, it seems likely that both HIV-1 and HIV-2 jumped from other primates to humans multiple times. Different subtypes also are prevalent in different human populations or geographic regions, indicating that HIV spread through the human population by different routes and at different times. These phylogenetic analyses illustrate that differences between humans and other primates provide only a weak barrier to transmission of this virus, suggesting the disturbing possibility that new subtypes could enter the human population in the future.

4. Agriculture. Applications of phylogenetics to agriculture are similar to those in epidemiology, but the questions are about the origin and spread of pest species rather than infectious diseases. Agricultural questions include: (i) what was the origin of a pest; (ii) how did the pest spread through agriculture; (iii) how did some pest organisms evolve resistance to pesticides; and (iv) are there species closely related to known pests that might also cause agricultural problems?

Fusarium graminearum is a fungal pathogen of commercially important species of grains. Phylogenetic analysis indicates substantial genetic divergence among strains in different agricultural settings. Genetic divergence among strains of Fusarium indicates that movement of crops among different agricultural settings must be carefully monitored to prevent introduction of foreign strains. Local crops are likely to be much less resistant to the foreign strains of Fusarium than to the local strain.

[Figure: Phylogenetic tree inferred from the combined sequences of six single-copy nuclear genes (7,120 bp) by the method of maximum parsimony. Numbers above the nodes are bootstrap proportions. Adapted from O'Donnell et al. (2000) PNAS 97:7905-7910.]
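The Fusarium tree was inferred by maximum parsimony, which prefers the tree requiring the fewest character changes. The core counting step can be sketched with Fitch's algorithm for a single character on a fixed, bifurcating tree. The strain names, the nucleotide states and the two candidate trees are hypothetical; a real analysis would sum this count over every site and search over many trees, which this sketch does not attempt.

```python
def fitch(node, states):
    """Return (possible ancestral states, minimum number of changes) for a subtree.

    A tip is a string naming a strain; an internal node is a pair (left, right).
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    s1, c1 = fitch(left, states)
    s2, c2 = fitch(right, states)
    common = s1 & s2
    if common:                      # children can agree: no extra change needed
        return common, c1 + c2
    return s1 | s2, c1 + c2 + 1     # children disagree: one change on this split

# One hypothetical nucleotide site scored in four hypothetical strains.
site = {"strain1": "A", "strain2": "A", "strain3": "G", "strain4": "G"}

tree_a = (("strain1", "strain2"), ("strain3", "strain4"))
tree_b = (("strain1", "strain3"), ("strain2", "strain4"))

print(fitch(tree_a, site)[1])   # 1
print(fitch(tree_b, site)[1])   # 2
```

For this site, the tree that groups strains sharing the same state needs one change and the alternative needs two, so parsimony favours the first.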

5. Conservation. Tragically, while biologists work to assess and study the diversity of life, human activities are causing a loss of biodiversity at a rate unmatched in evolutionary history. Conservation biology is the discipline dedicated to preserving biodiversity. Phylogenetic systematics and taxonomy play a fundamental role in this effort; for how can we conserve biological diversity if we do not have a natural system to organize and study it? However, there also are more direct applications of phylogenetics, including: (i) identifying genetically distinct breeding populations that require separate protection and management; (ii) assessing kinship of individuals to populations so that appropriate breeding stock can be identified for captive breeding programs; (iii) assessing kinship of dead or captive individuals for the purpose of conservation law enforcement, i.e., molecular forensics; and (iv) guiding the collection and organization of long-term storage of germ-plasm in seed banks. Note that when working with evolutionary divergences below the species level, phylogenetics overlaps broadly with population genetics, where sophisticated methods based on gene genealogies are widely used.

The phylogeography of mouse lemurs presented above also illustrates how the phylogenetic framework has important applications to conservation biology. Before the phylogeographic study of the mouse lemurs, the important environmental barrier to migration was perceived to be the elevation and wetness of the habitat, suggesting that important conservation decisions might be made independently for each side of an east-west disjunction; that notion could not have been more incorrect. It seems that the primary disjunction should be north-south, although the situation is in reality much more complicated than that.

The comparative method has recently become a popular approach to examining risks of extinction and invasiveness. Excerpts from a recent review of both the powers and pitfalls of this method in conservation biology are presented in the accompanying figure. This article highlights three uses of the comparative method in conservation: (i) developing predictive models for risk assessment; (ii) identifying the general ecological principles that cause conservation problems; and (iii) identifying and using endangering traits as triage to prioritize research and conservation efforts. Potential pitfalls are: (i) the large and expensive sample sizes required for the method to have high power; and (ii) problems with using correlation-based methods to identify causal mechanisms. Despite the limitations, it seems that the comparative method will grow to be one of many essential tools for conservation research. A hypothetical example from this paper illustrates how applying Fisher's exact test to the raw data (ignoring phylogenetic non-independence) overestimates the relationship between extinction risk and body size.

[Figure: Should we use a Fisher exact test?]
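The overestimation described above can be illustrated with a small one-sided Fisher exact test. The counts are invented for illustration and are not the example from the review: twenty species in which all ten large-bodied species are threatened and all ten small-bodied species are not, compared with the same pattern counted once per clade on the assumption that the twenty species form two clades, each fixed for one body size.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins whose top-left cell is at least as large as the observed a.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    tail = sum(comb(row1, k) * comb(row2, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / total

# Rows: large-bodied, small-bodied. Columns: threatened, not threatened.
p_species = fisher_one_sided(10, 0, 0, 10)   # every species counted as independent
p_clades = fisher_one_sided(1, 0, 0, 1)      # one observation per clade

print(p_species)   # about 5.4e-06
print(p_clades)    # 0.5
```

Treating the twenty species as independent gives an apparently overwhelming result, while the single underlying evolutionary event provides no statistical support at all. Phylogenetically informed methods aim to sit between these extremes by weighting species according to their shared history.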

6. Linguistics. An interesting application of phylogenetic methods is to the discipline of linguistics. In particular, maximum likelihood methods have been applied to infer phylogenies of language groups, to estimate the dates of the most recent common ancestors of those groups, to identify parts of the language tree with low support, and to test specific hypotheses about the process of language evolution. A particularly interesting example is the study by Gray and Atkinson (2003), who used phylogenetic methods to test two theories for the origin of the Indo-European language group: (1) this language group was spread into Europe by Kurgan horsemen beginning around 6,000 years before present (BP) [Kurgan theory]; and (2) this language group spread into Europe with the expansion of agriculture from 8,000-9,500 years BP [Anatolian theory]. The phylogenetic analysis and dating of the origin of the Indo-European languages by Gray and Atkinson (2003) was in striking agreement with the Anatolian farming theory; their estimate was 7,800-9,800 years BP. Interestingly, this result is consistent with a recent genetic study of human populations that supports a Near-Eastern Neolithic contribution to the European gene pool.

Data: Cognate word forms were sampled from 87 languages. Three extinct languages thought to be more distantly related than the extant languages were included for the purpose of rooting the tree. Cognates were coded as present or absent (1 or 0) for each language. The final dataset was a binary matrix of 2,449 cognates.

Methods: Phylogenetic analysis was conducted under a stochastic model of binary character evolution that allowed for unequal character state frequencies and a heterogeneous rate of evolution among cognates. Bayesian methods were used to infer the tree topology shown in the figure. Values above each branch (in black) are Bayesian posterior probabilities. Divergence times were estimated by first assuming maximum and minimum divergence dates for 11 calibration nodes on the phylogeny. A semiparametric, likelihood-based method was used to infer the divergence dates for the nodes of the phylogeny.

[Figure: Language phylogeny and divergence dates support the Anatolian-origin theory of the Indo-European language family. Annotations: root; estimated date of ancestral node; extinct languages used as outgroups. From Gray and Atkinson (2003) Nature 426:435-439.]
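The presence/absence coding of cognates described under "Data" can be sketched as follows. The languages, cognate-set labels and memberships are invented for illustration and are not taken from the Gray and Atkinson dataset, which has 87 rows and 2,449 columns; the function name is hypothetical.

```python
def cognate_matrix(cognate_sets, languages):
    """Code each language as 1 (present) or 0 (absent) for every cognate set.

    Rows follow the order of `languages`; columns follow the insertion
    order of `cognate_sets`.
    """
    return {
        lang: [1 if lang in members else 0 for members in cognate_sets.values()]
        for lang in languages
    }

# Hypothetical cognate sets: each label maps to the languages that share it.
cognate_sets = {
    "water_1": {"English", "German"},
    "water_2": {"French", "Spanish"},
    "three_1": {"English", "German", "French", "Spanish"},
}
languages = ["English", "German", "French", "Spanish"]

matrix = cognate_matrix(cognate_sets, languages)
for lang in languages:
    print(lang, matrix[lang])
# English [1, 0, 1]
# German [1, 0, 1]
# French [0, 1, 1]
# Spanish [0, 1, 1]
```

Each column then plays the role that a binary character plays in a biological dataset, so the same models of character evolution can be applied to it.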