Phylogenetic trees 07/10/13
A tree is the only figure to occur in On the Origin of Species by Charles Darwin. It is a graphical representation of the evolutionary relationships among entities that share a common ancestor.
The phylogenetic tree of a group of genes does not necessarily reflect the phylogenetic tree of their host species, because of gene duplication and lateral gene transfer. Homologous genes are classified by how they diverged:
Orthologs: genes that diverged due to speciation.
Paralogs: genes that diverged due to gene duplication.
Xenologs: genes that diverged due to lateral gene transfer.
Some Background
All trees we consider will be binary.
The length of a branch, or edge, indicates the amount of evolutionary divergence.
True biological trees have a root.
Distance Matrix Methods
Not as bad as they may seem from an initial look: remarkably little information is lost.
Introduced by Cavalli-Sforza and Edwards (1967) and Fitch and Margoliash (1967). Influenced by clustering algorithms.
General idea:
1. Calculate a measure of distance between each pair of taxa.
2. Find a tree that predicts the observed set of distances as closely as possible.
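The first step of the general idea, computing a distance for each pair of taxa, can be sketched as follows. The slides do not say which distance measure is used, so this sketch assumes the simplest one, the p-distance (fraction of differing sites between two aligned sequences of equal length). The function names `p_distance` and `distance_matrix` and the toy sequences are illustrative only.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which sequences a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs):
    """Build a symmetric matrix D[s][t] (dict of dicts) from named sequences."""
    names = list(seqs)
    D = {name: {name: 0.0} for name in names}
    for s, t in combinations(names, 2):
        D[s][t] = D[t][s] = p_distance(seqs[s], seqs[t])
    return D


# Toy aligned sequences (made up for illustration).
toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TCGTACGA"}
D = distance_matrix(toy)
print(D["A"]["B"], D["A"]["C"], D["B"]["C"])
```

In practice a corrected distance (one that accounts for multiple substitutions at the same site) is usually preferred over the raw p-distance; the matrix layout stays the same.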
Distance methods
Input: distance matrix between species.
Outline: cluster species together.
Initially, clusters are singletons.
At each iteration, combine the two closest clusters to get a new one.
Unweighted Pair Group Method using Arithmetic Averages (UPGMA) Despite its formidable acronym, the method is simple and intuitively appealing. It works by clustering the sequences, at each stage combining two clusters and, at the same time, creating a new node on the tree. Thus, the tree can be imagined as being assembled upwards, each node being added above the others, and the edge lengths being determined by the difference in the heights of the nodes at the top and bottom of an edge.
An example showing how UPGMA produces a rooted phylogenetic tree
UPGMA
1. Initialize n clusters, where cluster i contains only sequence i. Every leaf has height 0.
2. Find the closest pair of clusters i, j, using the distances in matrix D.
3. Make them neighbors in the tree by adding a new node (ij) at height $D_{i,j}/2$, so that every leaf below (ij) lies at distance $D_{i,j}/2$ from it.
4. Update the distance matrix D: for every other cluster k, compute $D_{(ij),k} = \frac{n_i D_{i,k} + n_j D_{j,k}}{n_i + n_j}$, where $n_i$ and $n_j$ are the sizes of clusters i and j respectively.
5. Delete the columns and rows for i and j in D and add new ones corresponding to cluster (ij), with distances as computed above.
6. Go to step 2 until only one cluster is left.
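The UPGMA steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a reference implementation: the matrix is assumed to be a complete symmetric dict of dicts, clusters are represented as nested tuples, ties are broken by taking the first closest pair found, and the function name `upgma` and the four-taxon matrix are made up for the example.

```python
def upgma(D):
    """Return (root, height, edges) for a symmetric dict-of-dicts matrix D.

    Clusters are nested tuples of leaf names; height maps each node to its
    height; edges lists (parent, child, edge_length).
    """
    dist = {a: dict(row) for a, row in D.items()}  # working copy of D
    size = {a: 1 for a in dist}                    # step 1: singleton clusters
    height = {a: 0.0 for a in dist}                # leaves sit at height 0
    edges = []
    while len(dist) > 1:
        keys = list(dist)
        # Step 2: closest pair of clusters.
        i, j = min(
            ((a, b) for x, a in enumerate(keys) for b in keys[x + 1:]),
            key=lambda p: dist[p[0]][p[1]],
        )
        # Step 3: new node (ij) at height D_ij / 2; edge lengths are
        # differences between parent and child heights.
        new = (i, j)
        height[new] = dist[i][j] / 2
        edges.append((new, i, height[new] - height[i]))
        edges.append((new, j, height[new] - height[j]))
        # Step 4: size-weighted average distance to every other cluster k.
        row = {
            k: (size[i] * dist[i][k] + size[j] * dist[j][k]) / (size[i] + size[j])
            for k in keys
            if k not in (i, j)
        }
        # Step 5: drop rows/columns of i and j, add those of (ij).
        del dist[i], dist[j]
        for k in dist:
            del dist[k][i], dist[k][j]
            dist[k][new] = row[k]
        dist[new] = row
        size[new] = size[i] + size[j]
    root = next(iter(dist))
    return root, height, edges


# Hypothetical clock-like distances between four taxa.
D = {
    "A": {"A": 0, "B": 2, "C": 4, "D": 6},
    "B": {"A": 2, "B": 0, "C": 4, "D": 6},
    "C": {"A": 4, "B": 4, "C": 0, "D": 6},
    "D": {"A": 6, "B": 6, "C": 6, "D": 0},
}
root, height, edges = upgma(D)
print(root)          # nested tuples describing the rooted tree
print(height[root])  # height of the root
```

On this matrix A and B are joined first at height 1, then C at height 2, then D at height 3, so the edge from the root down to D has length 3 while the edge down to the cluster containing A, B and C has length 1.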
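Limitations of UPGMA
The resulting tree is always ultrametric: all leaves are at the same distance from the root.
This assumes that the evolution rate is constant in all branches (a molecular clock). When rates differ between lineages, UPGMA can produce the wrong tree.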
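Whether a distance matrix is compatible with an ultrametric tree can be checked with the three-point condition: for every triple of taxa, the two largest of the three pairwise distances must be equal. The sketch below assumes a dict-of-dicts matrix; the function name `is_ultrametric`, the tolerance, and both example matrices are illustrative.

```python
from itertools import combinations


def is_ultrametric(D, tol=1e-9):
    """Three-point condition: in every triple the two largest distances tie."""
    for a, b, c in combinations(list(D), 3):
        _, middle, largest = sorted((D[a][b], D[a][c], D[b][c]))
        if abs(largest - middle) > tol:
            return False
    return True


# Clock-like distances: every triple has its two largest distances equal.
clock = {
    "A": {"B": 2, "C": 4, "D": 6},
    "B": {"A": 2, "C": 4, "D": 6},
    "C": {"A": 4, "B": 4, "D": 6},
    "D": {"A": 6, "B": 6, "C": 6},
}
# Unequal rates: the triple (A, B, C) has distances 4, 5, 7.
no_clock = {
    "A": {"B": 4, "C": 5},
    "B": {"A": 4, "C": 7},
    "C": {"A": 5, "B": 7},
}
print(is_ultrametric(clock), is_ultrametric(no_clock))
```

A matrix that fails this check can still be fed to UPGMA, but the ultrametric tree it returns cannot reproduce the input distances exactly.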
Neighbor joining
1. Initialization: same as UPGMA.
2. For each species i compute $u_i = \sum_{k \neq i} \frac{D_{i,k}}{n - 2}$, where n is the current number of clusters.
3. Select i and j for which $D_{i,j} - u_i - u_j$ is minimum.
4. Make them neighbors in the tree by adding a new node (ij), and set the distances from (ij) to i and j as $D_{i,(ij)} = \frac{1}{2}(D_{i,j} + u_i - u_j)$ and $D_{j,(ij)} = \frac{1}{2}(D_{i,j} + u_j - u_i)$.
5. Update the distance matrix D: for every other cluster k, compute $D_{(ij),k} = \frac{1}{2}(D_{i,k} + D_{j,k} - D_{i,j})$.
6. Delete the columns and rows for i and j in D and add new ones corresponding to cluster (ij), with distances as computed above.
7. Go to step 2 until two nodes/clusters are left, then connect these two by an edge whose length is their remaining distance.
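The neighbor-joining steps can be sketched in Python as below, assuming $u_i$ is the sum of distances from i to all other clusters divided by n - 2. This is a minimal illustration rather than a reference implementation: the matrix is assumed to be a complete symmetric dict of dicts, clusters are nested tuples, ties are broken by taking the first minimal pair, and the function name `neighbor_joining` and the four-taxon matrix are made up for the example.

```python
def neighbor_joining(D):
    """Return the edges (node, node, length) of the unrooted NJ tree.

    D is a symmetric dict-of-dicts distance matrix; internal nodes are
    represented as nested tuples of the nodes they join.
    """
    dist = {a: dict(row) for a, row in D.items()}  # working copy of D
    edges = []
    while len(dist) > 2:
        n = len(dist)
        keys = list(dist)
        # u_i: summed distance to all other clusters, divided by n - 2.
        u = {a: sum(dist[a][k] for k in keys if k != a) / (n - 2) for a in keys}
        # Pick the pair minimizing D_ij - u_i - u_j.
        i, j = min(
            ((a, b) for x, a in enumerate(keys) for b in keys[x + 1:]),
            key=lambda p: dist[p[0]][p[1]] - u[p[0]] - u[p[1]],
        )
        new = (i, j)
        d_ij = dist[i][j]
        # Branch lengths from the new node (ij) to i and to j.
        edges.append((new, i, 0.5 * (d_ij + u[i] - u[j])))
        edges.append((new, j, 0.5 * (d_ij + u[j] - u[i])))
        # Distance from (ij) to every other cluster k.
        row = {
            k: 0.5 * (dist[i][k] + dist[j][k] - d_ij)
            for k in keys
            if k not in (i, j)
        }
        # Drop rows/columns of i and j, add those of (ij).
        del dist[i], dist[j]
        for k in dist:
            del dist[k][i], dist[k][j]
            dist[k][new] = row[k]
        dist[new] = row
    # Two clusters remain: connect them directly.
    a, b = dist
    edges.append((a, b, dist[a][b]))
    return edges


# Hypothetical additive distances from a tree with unequal branch lengths:
# A:1 and B:3 on one side, C:2 and D:4 on the other, internal edge 2.
D = {
    "A": {"A": 0, "B": 4, "C": 5, "D": 7},
    "B": {"A": 4, "B": 0, "C": 7, "D": 9},
    "C": {"A": 5, "B": 7, "C": 0, "D": 6},
    "D": {"A": 7, "B": 9, "C": 6, "D": 0},
}
edges = neighbor_joining(D)
for edge in edges:
    print(edge)
```

Unlike UPGMA, the result is an unrooted tree and the branches to A and B come out with different lengths (1 and 3), so no constant rate of evolution is assumed.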