Phylogeny and Molecular Evolution: Introduction
Credit: Serafim Batzoglou (UPGMA slides), http://www.stanford.edu/class/cs262/slides; notes by Nir Friedman, Dan Geiger, Shlomo Moran, Ron Shamir, Sagi Snir, and Michal Ziv-Ukelson; Durbin et al.; Jones and Pevzner's lecture notes; the Bioinformatics Algorithms book by Phillip Compeau and Pavel Pevzner.
Characterizing Evolution
Since Darwin, anatomical and behavioral features were the dominant criteria for deriving evolutionary relationships between species.
Because these observations are relatively subjective, the evolutionary relationships derived from them were often inconclusive or were later proved incorrect.
How did the panda evolve?
For roughly 100 years, scientists were unable to determine which family the giant panda belongs to.
In 1870, Père Armand David returned to Paris from China with the bones of the mysterious creature, which he called simply the "black and white bear."
Biologists examined the bones and concluded that they more closely resembled the bones of a red panda (raccoon family) than those of bears.
In 1985, Steve O'Brien et al. solved the giant panda classification problem using DNA sequences: the giant panda is a bear.
Images: giant panda, red panda
Evolutionary Tree of Bears and Raccoons (O'Brien 1985)
O'Brien's study used about 500,000 nucleotides to construct the evolutionary tree of bears and raccoons.
Note that bears and raccoons diverged just 35 million years ago and share many morphological features.
Human closer to Dog or Mouse?
Which Monkey is Human Closest to?
Evolutionary Trees: DNA-based Approach
In the 1960s, Emile Zuckerkandl and Linus Pauling brought the reconstruction of evolutionary relationships from DNA into the spotlight.
In the first few years after Zuckerkandl and Pauling proposed using DNA for evolutionary studies, the possibility of reconstructing evolutionary trees by DNA analysis was hotly debated.
It is now a dominant approach to studying evolution.
Emile Zuckerkandl on human-gorilla evolutionary relationships:
"From the point of hemoglobin structure, it appears that gorilla is just an abnormal human, or man an abnormal gorilla, and the two species form actually one continuous population."
Emile Zuckerkandl, Classification and Human Evolution, 1963
Gaylord Simpson vs. Emile Zuckerkandl:
Simpson's reply to Zuckerkandl's statement on hemoglobin:
"From any point of view other than that properly specified, that is of course nonsense. What the comparison really indicates is that hemoglobin is a bad choice and has nothing to tell us about attributes, or indeed tells us a lie."
Gaylord Simpson, Science, 1964
Who are closer?
Different Trees Obtained Based on Different Genes
Panels: beta globin; dopamine D4 receptor
Evolutionary Tree of Humans
Around the time the giant panda riddle was solved, a DNA-based model of the human evolutionary tree led to the Out of Africa Hypothesis, which claims that our most ancient ancestor lived in Africa roughly 200,000 years ago.
Human Evolutionary Tree (cont'd)
Based on the mitochondrial DNA (16,587 bp) of 53 individuals
http://www.mun.ca/biology/scarr/out_of_africa2.htm
The Origin of Humans: Out of Africa vs Multiregional Hypothesis
Out of Africa:
- Humans evolved in Africa ~150,000 years ago.
- Humans migrated out of Africa, replacing other humanoids around the globe.
- There is no direct descent from Neanderthals.
Multiregional:
- Humans evolved in the last two million years as a single species, with independent appearance of modern traits in different areas.
- Humans migrated out of Africa, mixing with other humanoids on the way.
- There is genetic continuity from Neanderthals to humans.
Human Migration Out of Africa http://www.becominghuman.org
Evolutionary Tree of Humans (mtDNA)
The evolutionary tree separates one group of Africans from a group containing all five populations.
Vigilant, Stoneking, Harpending, Hawkes, and Wilson (1991)
mtDNA analysis supports Out of Africa Hypothesis
African origin of humans inferred from:
- The African population was the most diverse (sub-populations had more time to diverge).
- The evolutionary tree separated one group of Africans from a group containing all five populations.
- The tree was rooted on the branch between the groups of greatest difference.
Two Neanderthal Discoveries
Feldhofer, Germany
Mezmaiskaya, Caucasus
Distance: about 2,500 km
Two Neanderthal Discoveries
Is there a connection between Neanderthals and today's Europeans?
If humans did not evolve from Neanderthals, whom did we evolve from?
Multiregional Hypothesis?
- May predict some genetic continuity from the Neanderthals through the Cro-Magnons up to today's Europeans
- Can explain the occurrence of varying regional characteristics
Sequencing Neanderthal's mtDNA
- mtDNA from Neanderthal bone is used because it is up to 1,000x more abundant than nuclear DNA.
- DNA decays over time, and only a small amount of ancient DNA can be recovered (upper limit: 100,000 years).
- PCR of mtDNA is difficult: the fragments are too short, and human DNA may be mixed in.
Neanderthals vs Humans: surprisingly large divergence
- AMH vs Neanderthal: 22 substitutions and 6 indels in a 357 bp region
- AMH vs AMH: only 8 substitutions
AMH = Anatomically Modern Human
New Fossil (Manot Cave) Supports OOA
This means several things:
1. Unless and until other fossil evidence is found, AMHs, once they left Africa, came through the Sinai and Levant region, stopping in what is now modern-day Israel before migrating outwards into Europe and the rest of Asia.
2. This discovery conclusively shows that AMHs were living near, and perhaps even next to, Neanderthals as early as 60,000 years ago.
3. The Out of Africa (OOA) hypothesis becomes the best-evidenced hypothesis regarding how early humans migrated and conquered the planet.
Phylogenetic Trees Applied as Crime Evidence
Phylogenetic Analysis of HIV Virus
Lafayette, Louisiana, 1994
- A woman claimed her ex-lover (who was a physician) injected her with HIV+ blood.
- Records show the physician had drawn blood from an HIV+ patient that day.
- But how could it be proved that the blood from that HIV+ patient ended up in the woman?
HIV is the virus that causes AIDS.
HIV Transmission
- HIV has a high mutation rate, which can be used to trace paths of transmission.
- Two people who got the virus from two different people will have very different HIV sequences.
- Three different tree reconstruction methods (including parsimony) were used to track changes in two genes in HIV (gp120 and RT).
HIV Transmission
- Multiple samples were taken from the patient, the woman, and controls (unrelated HIV+ people).
- In every reconstruction, the woman's sequences were found to have evolved from the patient's sequences, indicating a close relationship between the two.
- Nesting of the victim's sequences within the patient's sequences indicated that the direction of transmission was from patient to victim.
- This was the first time phylogenetic analysis was used as evidence in a court case (Metzker et al., 2002).
Evolutionary Tree Leads to Conviction
Study evolution?
If two sequences from different organisms are similar, they may have a common ancestor (they are homologues).
Sequence alignment, both pairwise and multiple, can therefore help construct the phylogenetic tree.
-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A
This alignment has 4 gaps and 2 mismatches, so its cost under unit edit costs is 6.
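A minimal sketch of how the alignment above can be scored, assuming unit costs for mismatches and gaps. The function names `alignment_columns` and `edit_distance` are illustrative and not from the lecture. The first function tallies the columns of a given alignment. The second computes the edit distance of the ungapped sequences with the standard dynamic program. The cost of any particular alignment is only an upper bound on the edit distance.

```python
def alignment_columns(row_a: str, row_b: str, gap: str = "-") -> dict:
    """Count match, mismatch, and gap columns of a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(row_a, row_b):
        if a == gap and b == gap:
            raise ValueError("a column cannot consist of gaps only")
        if a == gap or b == gap:
            counts["gap"] += 1
        elif a == b:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


def edit_distance(s: str, t: str) -> int:
    """Unit-cost edit distance (substitution, insertion, deletion)."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,              # delete a
                            curr[j - 1] + 1,          # insert b
                            prev[j - 1] + (a != b)))  # match or substitute
        prev = curr
    return prev[-1]


row_a = "-GCGC-ATGGATTGAGCGA"
row_b = "TGCGCCATTGAT-GACC-A"
counts = alignment_columns(row_a, row_b)
alignment_cost = counts["mismatch"] + counts["gap"]
distance = edit_distance(row_a.replace("-", ""), row_b.replace("-", ""))
print(counts, alignment_cost, distance)
```

Comparing `alignment_cost` with `distance` shows whether the displayed alignment is optimal under these assumed costs.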
Phylogenetic Tree Reconstruction
Input: four nucleotide sequences, AAG, AAA, GGA, and AGA, taken from four species.
Question: Which evolutionary tree best explains these sequences?
One answer (the parsimony principle): pick a tree that has the minimum total number of substitutions of symbols between species and their ancestors in the evolutionary tree (also called a phylogenetic tree).
Example tree: leaves AAG, AAA, GGA, AGA; all three internal nodes labeled AAA; the edges into AAG, GGA, and AGA carry 1, 2, and 1 substitutions. Total #substitutions = 4.
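The total for a fully labeled tree can be checked by summing the per-edge substitution counts. The sketch below assumes each edge is given as a (parent label, child label) pair and that the three internal nodes are all labeled AAA, as in the example tree. The exact pairing of the leaves is an assumption, but it does not change the total here because every internal label is the same. The names `hamming` and `parsimony_score` are illustrative.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def parsimony_score(edges) -> int:
    """Total substitutions over all (parent, child) edges of a labeled tree."""
    return sum(hamming(parent, child) for parent, child in edges)


# Example tree with every internal node labeled AAA (leaf pairing assumed).
all_aaa_edges = [
    ("AAA", "AAA"),  # root -> first internal node
    ("AAA", "AAA"),  # root -> second internal node
    ("AAA", "AAG"),  # 1 substitution
    ("AAA", "AAA"),  # 0 substitutions
    ("AAA", "GGA"),  # 2 substitutions
    ("AAA", "AGA"),  # 1 substitution
]
total = parsimony_score(all_aaa_edges)
print(total)
```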
Example Continued
There are many possible trees. For example:
Left tree: leaves AAG, AAA, GGA, AGA; internal nodes AAA, AAA, AGA.
Right tree: leaves AAG, AGA, AAA, GGA; internal nodes AAA, AAA, AAA.
Example Continued
There are many possible trees. For example:
Left tree: leaves AAG, AAA, GGA, AGA; internal nodes AAA, AAA, AGA; three edges carry 1 substitution each. Total #substitutions = 3.
Right tree: leaves AAG, AGA, AAA, GGA; internal nodes AAA, AAA, AAA; the edges into AAG, AGA, and GGA carry 1, 1, and 2 substitutions. Total #substitutions = 4.
The left tree is better than the right tree.
Questions:
- Does this principle yield realistic phylogenetic trees? (Evolution)
- How can we compute the best tree efficiently? (Computer Science)
- What is the probability of substitutions given the data? (Learning)
- Is the best tree found significantly better than others? (Statistics)
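For a fixed topology, the minimum number of substitutions over all choices of internal labels can be computed with Fitch's algorithm. The slides do not name this method, so it is introduced here only as one possible illustration. The sketch assumes a rooted binary tree written as nested pairs of leaf sequences, and it scores the three ways of pairing the four example leaves. The leaf pairings of the left and right trees are inferred from the order in which the leaves are listed.

```python
def fitch_site(tree, site):
    """Return (candidate states, substitutions) for one sequence position."""
    if isinstance(tree, str):           # leaf: its state is known
        return {tree[site]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, site)
    right_set, right_cost = fitch_site(right, site)
    common = left_set & right_set
    if common:                          # children agree: no extra substitution
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def fitch_score(tree, length):
    """Minimum total substitutions for this topology, summed over positions."""
    return sum(fitch_site(tree, site)[1] for site in range(length))


# The three ways to split four leaves into two pairs.
topologies = {
    "(AAG,AAA),(GGA,AGA)": (("AAG", "AAA"), ("GGA", "AGA")),
    "(AAG,AGA),(AAA,GGA)": (("AAG", "AGA"), ("AAA", "GGA")),
    "(AAG,GGA),(AAA,AGA)": (("AAG", "GGA"), ("AAA", "AGA")),
}
scores = {name: fitch_score(tree, 3) for name, tree in topologies.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Under these assumptions the pairing (AAG,AAA),(GGA,AGA) scores 3 and the other two score 4, which agrees with the totals given for the left and right trees. Scoring every topology this way is only feasible for a handful of species, because the number of topologies grows very quickly with the number of leaves.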
Tree Reconstruction
How are these trees built from sequences? First, a little background.
Rooted and Unrooted Trees
Rooted Trees
- Infer an evolutionary ancestor
- Leaves represent existing species
- Internal vertices represent hypothetical ancestors
- Can be viewed as directed trees from the root to the leaves
Tree of life
Unrooted Trees
- Do not infer an evolutionary ancestor, and therefore cannot be viewed as directed graphs
- Otherwise, they are like rooted trees
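The difference between the two kinds of tree can be made concrete: once a root is chosen, every edge of an unrooted tree acquires a direction. The sketch below assumes the unrooted tree is stored as an undirected adjacency mapping and places a new root vertex on a chosen edge. The function name `root_on_edge`, the vertex names, and the default name "root" are all hypothetical.

```python
from collections import deque


def root_on_edge(adjacency, u, v, root="root"):
    """Place a new root on edge (u, v) and return a {child: parent} map."""
    if v not in adjacency[u]:
        raise ValueError("u and v must be adjacent")
    if root in adjacency:
        raise ValueError("root name already used by a vertex")
    # Copy the undirected tree so the caller's data is left unchanged.
    adj = {node: set(neighbors) for node, neighbors in adjacency.items()}
    adj[u].discard(v)
    adj[v].discard(u)
    adj[root] = {u, v}
    adj[u].add(root)
    adj[v].add(root)
    # Breadth-first search from the root orients every edge parent -> child.
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return parent


# Unrooted tree with leaves a, b, c, d and internal vertices u, v.
unrooted = {
    "a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
    "u": ["a", "b", "v"], "v": ["c", "d", "u"],
}
rooted_middle = root_on_edge(unrooted, "u", "v")
rooted_at_a = root_on_edge(unrooted, "u", "a")
print(rooted_middle)
print(rooted_at_a)
```

The same unrooted tree gives different ancestor relationships depending on the edge chosen, which is why an unrooted tree by itself does not identify an ancestor.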
Binary trees
Biologists often work with binary weighted trees:
- every internal vertex has degree 3
- if the tree is rooted, then the root has degree 2
- every edge has a positive weight (or length)
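The three conditions above can be checked mechanically. The sketch below assumes the tree is given as a list of (vertex, vertex, weight) edges. The function name `is_binary_weighted_tree` and the example vertex names are illustrative. It also checks that the edges actually form a tree (connected, with one fewer edge than vertices), which the definition takes for granted.

```python
from collections import defaultdict


def is_binary_weighted_tree(edges, rooted=False):
    """Check degree and weight conditions for a binary weighted tree."""
    degree = defaultdict(int)
    adjacency = defaultdict(set)
    for u, v, weight in edges:
        if weight <= 0:                      # every edge needs a positive weight
            return False
        degree[u] += 1
        degree[v] += 1
        adjacency[u].add(v)
        adjacency[v].add(u)
    vertices = list(degree)
    if not vertices or len(edges) != len(vertices) - 1:
        return False                         # a tree has |V| - 1 edges
    seen = {vertices[0]}
    stack = [vertices[0]]
    while stack:                             # depth-first connectivity check
        node = stack.pop()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    if len(seen) != len(vertices):
        return False
    # Leaves have degree 1, internal vertices degree 3, a root degree 2.
    if any(d not in (1, 2, 3) for d in degree.values()):
        return False
    degree_two = sum(1 for d in degree.values() if d == 2)
    return degree_two == (1 if rooted else 0)


unrooted_edges = [("a", "u", 1.0), ("b", "u", 2.0), ("u", "v", 0.5),
                  ("c", "v", 1.5), ("d", "v", 1.0)]
rooted_edges = [("root", "u", 0.25), ("root", "v", 0.25), ("a", "u", 1.0),
                ("b", "u", 2.0), ("c", "v", 1.5), ("d", "v", 1.0)]
print(is_binary_weighted_tree(unrooted_edges))
print(is_binary_weighted_tree(rooted_edges, rooted=True))
```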
End Lecture