Constructing Evolutionary Trees
HIV Evolutionary Tree
[Figure: evolutionary tree of SIVs (monkeys) and HIV (human), with the human infection events marked. Leaves: human HIV/M, HIV/N, HIV/O, HIV/A, and HIV/B; chimpanzee SIV; red-capped mangabey SIV; drill SIV; vervet monkey SIV; tantalus monkey SIV; sooty mangabey SIV; Sykes's monkey SIV; greater spot-nosed monkey SIV; De Brazza's monkey SIV.]
But how did biologists generate this?
Constructing a Distance Matrix
$D_{i,j}$ = number of differing symbols between the i-th and j-th species.

SPECIES  ALIGNMENT     DISTANCE MATRIX
                       Chimp  Human  Seal  Whale
Chimp    ACGTAGGCCT      0      3      6     4
Human    ATGTAAGACT      3      0      7     5
Seal     TCGAGAGCAC      6      7      0     2
Whale    TCGAAAGCAT      4      5      2     0
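The definition of $D_{i,j}$ above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code; the function names `hamming` and `distance_matrix` are made up for this example.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs: list[str]) -> list[list[int]]:
    """D[i][j] = number of differing symbols between sequences i and j."""
    return [[hamming(a, b) for b in seqs] for a in seqs]


species = ["Chimp", "Human", "Seal", "Whale"]
alignment = ["ACGTAGGCCT", "ATGTAAGACT", "TCGAGAGCAC", "TCGAAAGCAT"]

D = distance_matrix(alignment)
for name, row in zip(species, D):
    print(f"{name:<6}", *row)
```

Running it on the four aligned sequences reproduces the mammal matrix, including the Seal/Whale distance of 2.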
Trees
Tree: connected network containing no cycles.
Leaves (degree = 1): present-day species.
Internal nodes (degree ≥ 2): ancestral species.
[Figure: tree of life. Labels: LIFE, bacteria, archaebacteria, EUKARYOTES, protoctists, PLANTS, green algae, mosses, ferns, flowering seed plants, non-flowering seed plants, fungi, ANIMALS, sponges, cnidarians, flatworms, lophophorates, rotifers, roundworms, segmented worms, mollusks, echinoderms, ARTHROPODS, crustaceans, insects, chelicerates, VERTEBRATES, cartilaginous fish, bony fish, TETRAPODS, amphibians, AMNIOTES, mammals, turtles, snakes & lizards, crocodiles & birds.]
Exercise: Design a Tree struct with a few methods.
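One possible shape for such a tree type is sketched below in Python (classes rather than a struct). Everything here is an assumption made for illustration: the class names `Node` and `Tree`, the choice to store neighbors with edge weights, and the particular methods. Other designs are just as reasonable.

```python
class Node:
    """One vertex of a tree: a present-day species (leaf) or an ancestor."""

    def __init__(self, label=""):
        self.label = label
        self.neighbors = []  # list of (Node, edge_weight) pairs

    def degree(self):
        return len(self.neighbors)

    def is_leaf(self):
        return self.degree() == 1


class Tree:
    def __init__(self):
        self.nodes = []

    def add_node(self, label=""):
        node = Node(label)
        self.nodes.append(node)
        return node

    def add_edge(self, u, v, weight=1.0):
        u.neighbors.append((v, weight))
        v.neighbors.append((u, weight))

    def leaves(self):
        return [n for n in self.nodes if n.is_leaf()]

    def internal_nodes(self):
        return [n for n in self.nodes if n.degree() >= 2]

    def is_tree(self):
        """Connected with no cycles: all nodes reachable, edges = nodes - 1."""
        if not self.nodes:
            return False
        seen = {self.nodes[0]}
        stack = [self.nodes[0]]
        while stack:
            for nbr, _weight in stack.pop().neighbors:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        n_edges = sum(n.degree() for n in self.nodes) // 2
        return len(seen) == len(self.nodes) and n_edges == len(self.nodes) - 1


# Hypothetical four-leaf example: leaves A, B, C, D and two unnamed ancestors.
t = Tree()
a, b, c, d = (t.add_node(name) for name in "ABCD")
x, y = t.add_node(), t.add_node()
for u, v in [(a, x), (b, x), (x, y), (c, y), (d, y)]:
    t.add_edge(u, v)

print([n.label for n in t.leaves()], len(t.internal_nodes()), t.is_tree())
```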
Trees
Rooted tree: one node is designated as the root (the most recent common ancestor).
[Figure: rooted tree, with time running from the most recent ancestor to the present day.]

Trees
Unrooted tree: no node is designated as the root (we haven't inferred the location of the most recent ancestor).
[Figure: unrooted tree. Image source: mathoverflow.net]
Distance-Based Phylogeny
Distance-Based Phylogeny Problem: Construct an evolutionary tree from a distance matrix.
Input: A distance matrix.
Output: The unrooted tree corresponding to this distance matrix.
Think: Is this problem clearly stated?
We haven't stated what "corresponding to" means, so this isn't a computational problem!
Fitting a Tree to a Matrix
A tree fits a distance matrix if the length of the path between every pair of leaves equals the corresponding matrix entry.

        Chimp  Human  Seal  Whale
Chimp     0      3      6     4
Human     3      0      7     5
Seal      6      7      0     2
Whale     4      5      2     0

Exercise: Find a tree fitting this distance matrix.
[Figure: solution tree. Chimp (edge weight 1) and Human (edge weight 2) attach to one internal node; Seal (edge weight 2) and Whale (edge weight 0) attach to a second internal node; the two internal nodes are joined by an edge of weight 3.]
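To make "fitting" concrete, the sketch below checks whether a weighted tree reproduces a distance matrix by walking the tree from each leaf. The edge weights (1, 2, 3, 2, 0) are the ones that reproduce the mammal matrix; the internal node names `a` and `b` and the function names are hypothetical.

```python
from itertools import combinations


def leaf_distances(edges, leaves):
    """Path length between every pair of leaves in a weighted tree."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))

    def dist_from(src):
        # Depth-first walk; in a tree each node is reached by exactly one path.
        dist = {src: 0}
        stack = [src]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + w
                    stack.append(v)
        return dist

    all_dist = {leaf: dist_from(leaf) for leaf in leaves}
    return {(p, q): all_dist[p][q] for p, q in combinations(leaves, 2)}


def fits(edges, leaves, D):
    """True if every leaf-to-leaf path length equals the matrix entry."""
    index = {leaf: i for i, leaf in enumerate(leaves)}
    return all(d == D[index[p]][index[q]]
               for (p, q), d in leaf_distances(edges, leaves).items())


leaves = ["Chimp", "Human", "Seal", "Whale"]
D = [[0, 3, 6, 4],
     [3, 0, 7, 5],
     [6, 7, 0, 2],
     [4, 5, 2, 0]]
edges = [("Chimp", "a", 1), ("Human", "a", 2), ("a", "b", 3),
         ("Seal", "b", 2), ("Whale", "b", 0)]

print(fits(edges, leaves, D))
```

For example, the Chimp-to-Seal path has length 1 + 3 + 2 = 6, matching the matrix entry.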
Return to Distance-Based Phylogeny
Distance-Based Phylogeny Problem: Construct an evolutionary tree from a distance matrix.
Input: A distance matrix.
Output: The unrooted tree fitting this distance matrix.
Think: Is this problem clearly formulated?
Return to Distance-Based Phylogeny
Exercise: Design an algorithm (pseudocode) that will construct a tree fitting any input distance matrix (mammal matrix reproduced below for reference).

        Chimp  Human  Seal  Whale
Chimp     0      3      6     4
Human     3      0      7     5
Seal      6      7      0     2
Whale     4      5      2     0
Return to Distance-Based Phylogeny Exercise: What tree does your method construct for the following matrix? i j k l i 0 3 j 3 0 3 k 0 3 l 3 3 0
Sometimes, No Tree Fits
Exercise: Find the tree fitting this distance matrix.

     i  j  k  l
i    0  3  4  3
j    3  0  4  5
k    4  4  0  2
l    3  5  2  0

No solution!
Additive matrix: a distance matrix for which there exists an unrooted tree fitting it.
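The slides do not say how to test whether a matrix is additive. One standard characterization, brought in here as an outside technique, is the four-point condition: for every four indices i, j, k, l, the two largest of the sums $D_{i,j} + D_{k,l}$, $D_{i,k} + D_{j,l}$, and $D_{i,l} + D_{j,k}$ must be equal. The sketch below assumes a symmetric matrix with a zero diagonal and nonnegative entries; the name `is_additive` is made up for this example.

```python
from itertools import combinations_with_replacement


def is_additive(D):
    """Four-point condition check (assumes D is symmetric, zero-diagonal,
    and nonnegative). Allowing repeated indices also covers the triangle
    inequality."""
    n = len(D)
    for i, j, k, l in combinations_with_replacement(range(n), 4):
        sums = sorted([D[i][j] + D[k][l],
                       D[i][k] + D[j][l],
                       D[i][l] + D[j][k]])
        if sums[1] != sums[2]:  # the two largest sums must agree
            return False
    return True


mammals = [[0, 3, 6, 4],
           [3, 0, 7, 5],
           [6, 7, 0, 2],
           [4, 5, 2, 0]]
ijkl = [[0, 3, 4, 3],
        [3, 0, 4, 5],
        [4, 4, 0, 2],
        [3, 5, 2, 0]]

print(is_additive(mammals), is_additive(ijkl))
```

For the i, j, k, l matrix the three sums are 3 + 2 = 5, 4 + 5 = 9, and 3 + 4 = 7; the two largest differ, which is consistent with no tree fitting it. Exact equality is fine for integer matrices; floating-point data would need a tolerance.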
Sometimes, More Than One Tree Fits

        Chimp  Human  Seal  Whale
Chimp     0      3      6     4
Human     3      0      7     5
Seal      6      7      0     2
Whale     4      5      2     0

[Figure: two trees fitting this matrix. The first has edge weights Chimp 1, Human 2, Seal 2, Whale 0, and an internal edge of weight 3. The second fits the same distances but contains an additional node that splits one edge into two shorter ones.]
Which Tree is Better?
Degree: number of edges touching a node.
Simple tree: tree with no nodes of degree 2.
Theorem: There is a unique simple tree fitting an additive matrix.
[Figure: the two candidate trees for the mammal matrix, one simple and one with a node of degree 2.]
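The definitions of degree and simple tree translate directly into code. In this sketch a tree is a list of `(node, node, weight)` edges; the node names `a`, `b`, and `x`, and the second tree (which splits the Seal edge into 0.5 + 1.5) are hypothetical illustrations, not a reconstruction of the slide's figure.

```python
from collections import Counter


def degrees(edges):
    """Degree of each node: the number of edges touching it."""
    deg = Counter()
    for u, v, _weight in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def is_simple(edges):
    """A simple tree has no nodes of degree 2."""
    return all(d != 2 for d in degrees(edges).values())


simple_tree = [("Chimp", "a", 1), ("Human", "a", 2), ("a", "b", 3),
               ("Seal", "b", 2), ("Whale", "b", 0)]

# Same leaf-to-leaf distances, but node x of degree 2 sits on the Seal edge.
split_tree = [("Chimp", "a", 1), ("Human", "a", 2), ("a", "b", 3),
              ("Seal", "x", 0.5), ("x", "b", 1.5), ("Whale", "b", 0)]

print(is_simple(simple_tree), is_simple(split_tree))
```

Any degree-2 node can be removed by merging its two edges into one whose weight is their sum, which is why restricting to simple trees loses nothing.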
Reformulating Distance-Based Phylogeny
Distance-Based Phylogeny Problem: Construct an evolutionary tree from a distance matrix.
Input: A distance matrix.
Output: The simple tree fitting this distance matrix (if this matrix is additive).
But what should we do about non-additive matrices?
Heuristic: A quick, practical method that does not necessarily solve a given problem.
Modeling Speciations
Researchers often assume that all internal nodes correspond to speciations, where one species splits into two.
Unrooted binary tree: every node has degree 1 or 3.
Rooted binary tree: an unrooted binary tree with a root (of degree 2) on one of its edges.
[Figure: unrooted and rooted binary trees on the leaves Squirrel Monkey, Baboon, Orangutan, Gorilla, Chimpanzee, Bonobo, and Human.]
Ultrametric Trees
Molecular clock: assigns an age to each node in the tree (age of leaves = 0).
Edge weights: correspond to the difference in ages of the nodes the edge connects.
Ultrametric tree: the distance from the root to any leaf is the same (i.e., the age of the root).
[Figure: rooted binary tree on the leaves Squirrel Monkey, Baboon, Orangutan, Gorilla, Chimpanzee, Bonobo, and Human, with an age at each node and a weight on each edge.]
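The link between node ages, edge weights, and the ultrametric property can be checked on a small example. The tree and ages below are invented for illustration (they are not the primate tree from the slide), and the function names are hypothetical.

```python
def edge_weights(children, age):
    """Weight of each parent-child edge = age(parent) - age(child)."""
    return {(p, c): age[p] - age[c]
            for p, kids in children.items() for c in kids}


def root_to_leaf_distances(children, age, root):
    """Sum of edge weights along the path from the root to each leaf."""
    w = edge_weights(children, age)
    out = {}
    stack = [(root, 0)]
    while stack:
        node, d = stack.pop()
        kids = children.get(node, [])
        if not kids:  # no children: this node is a leaf
            out[node] = d
        for c in kids:
            stack.append((c, d + w[(node, c)]))
    return out


# Hypothetical rooted binary tree: root -> (x, C), x -> (A, B).
children = {"root": ["x", "C"], "x": ["A", "B"]}
age = {"root": 5, "x": 2, "A": 0, "B": 0, "C": 0}

dists = root_to_leaf_distances(children, age, "root")
print(dists)
print(len(set(dists.values())) == 1)  # ultrametric: all distances agree
```

Because each edge weight is a difference of ages, the weights along any root-to-leaf path telescope to age(root) - age(leaf), which is the age of the root when every leaf has age 0.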
UPGMA: A Clustering Heuristic

1. Form a cluster for each present-day species, each containing a single leaf. Every leaf has age 0.

     i  j  k  l
i    0  3  4  3
j    3  0  4  5
k    4  4  0  2
l    3  5  2  0

2. Find the two closest clusters $C_1$ and $C_2$ according to the average distance

$$D_{\mathrm{avg}}(C_1, C_2) = \frac{\sum_{i \in C_1,\, j \in C_2} D_{i,j}}{|C_1| \cdot |C_2|}$$

where $|C|$ denotes the number of elements in $C$. Here the closest clusters are {k} and {l}, at distance 2.

3. Merge $C_1$ and $C_2$ into a single cluster $C$. Here $C$ = {k, l}.

4. Form a new node for $C$ and connect it to $C_1$ and $C_2$ by an edge. Set the age of $C$ to $D_{\mathrm{avg}}(C_1, C_2)/2$. Here {k, l} has age 2/2 = 1.

5. Update the distance matrix by computing the average distance between each pair of clusters.

          i    j    {k, l}
i         0    3    3.5
j         3    0    4.5
{k, l}    3.5  4.5  0

6. Iterate until a single cluster contains all species. Next, i and j merge into {i, j} with age 3/2 = 1.5:

          {i, j}  {k, l}
{i, j}    0       4
{k, l}    4       0

Finally, {i, j} and {k, l} merge into the root with age 4/2 = 2.

[Figure: resulting tree. Root (age 2) connects to {i, j} (age 1.5) by an edge of weight 0.5 and to {k, l} (age 1) by an edge of weight 1; i and j hang from {i, j} by edges of weight 1.5; k and l hang from {k, l} by edges of weight 1.]
UPGMA: A Clustering Heuristic
UPGMA(D):
1. Form a cluster for each present-day species, each containing a single leaf.
2. Find the two closest clusters $C_1$ and $C_2$ according to the average distance $D_{\mathrm{avg}}(C_1, C_2) = \sum_{i \in C_1,\, j \in C_2} D_{i,j} \,/\, (|C_1| \cdot |C_2|)$, where $|C|$ denotes the number of elements in $C$.
3. Merge $C_1$ and $C_2$ into a single cluster $C$.
4. Form a new node for $C$ and connect it to $C_1$ and $C_2$ by an edge. Set the age of $C$ to $D_{\mathrm{avg}}(C_1, C_2)/2$.
5. Update the distance matrix by computing the average distance between each pair of clusters.
6. Iterate steps 2-5 until a single cluster contains all species.
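The six steps can be sketched in Python as follows. This is a minimal, unoptimized illustration under stated assumptions: instead of maintaining an updated matrix (step 5), it recomputes each average distance from the original matrix, which gives the same values; ties are broken by whichever pair is encountered first; and the cluster labels such as `(k,l)` and the return format are choices made for this example only.

```python
from itertools import combinations


def upgma(D, names):
    """Return (root label, age of every node, children of every internal node)."""
    # Step 1: one cluster per leaf; each cluster stores its leaf indices.
    clusters = {name: [i] for i, name in enumerate(names)}
    age = {name: 0.0 for name in names}
    children = {}

    def d_avg(a, b):
        total = sum(D[i][j] for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    # Step 6: repeat steps 2-5 until one cluster remains.
    while len(clusters) > 1:
        # Step 2: closest pair of clusters by average distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: d_avg(*pair))
        d = d_avg(a, b)
        # Step 3: merge them into a new cluster.
        new = f"({a},{b})"
        clusters[new] = clusters.pop(a) + clusters.pop(b)
        # Step 4: new node, connected to both merged clusters, with age d/2.
        age[new] = d / 2
        children[new] = (a, b)
        # Step 5 is implicit: d_avg works from the original matrix.
    root = next(iter(clusters))
    return root, age, children


D = [[0, 3, 4, 3],
     [3, 0, 4, 5],
     [4, 4, 0, 2],
     [3, 5, 2, 0]]
root, age, children = upgma(D, ["i", "j", "k", "l"])

print(root)
for parent, kids in children.items():
    for kid in kids:
        # Edge weight = difference in ages of the nodes the edge connects.
        print(f"{parent} -> {kid}: {age[parent] - age[kid]}")
```

On this matrix the sketch merges k and l first (age 1), then i and j (age 1.5), and finally the two clusters at the root (age 2).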
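UPGMA Doesn't Fit a Tree to a Matrix

     i  j  k  l
i    0  3  4  3
j    3  0  4  5
k    4  4  0  2
l    3  5  2  0

[Figure: UPGMA tree for this matrix. Root (age 2) with children {i, j} (age 1.5) and {k, l} (age 1); leaves i, j, k, l at age 0.]
In this tree, every leaf on one side of the root is at distance 4 from every leaf on the other side, yet the matrix has $D_{i,l} = 3$ and $D_{j,l} = 5$, so the tree does not fit the matrix.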
Quick Quiz Exercise: Apply UPGMA to the following matrix. i j k l i 0 0 9 j 0 0 7 k 9 7 0 8 l 8 0