Dr. rer. nat. Gong Jing Cancer Research Center Medicine School of Shandong University 2012.11.09 1
Chapter 4 Phylogenetic Tree 2
Phylogeny
Evidence from morphological (形态学的), biochemical, and gene sequence data suggests that all organisms on Earth are genetically related. The genealogical (谱系的) relationships among living things can be represented by a vast evolutionary tree, the tree of life, which therefore represents the phylogeny of organisms.
A phylogeny is a tree representation of the evolutionary history relating the species we are interested in.
How to Study the Evolutionary History
The most direct evidence comes from fossils, but the fossil record is scattered, incomplete, and unsystematic.
How to Study the Evolutionary History
Comparative morphology and comparative anatomy (解剖学) can establish the general framework of evolution, but many details remain controversial.
How to Study the Evolutionary History
Computational molecular evolution: the phylogenetic tree.
Evolution happens at the level of molecules: DNA, RNA, and protein.
Basic assumptions:
1) Nucleic acid and protein sequences contain all the information about the evolutionary history of species.
2) Molecular clock: the rate of evolutionary change (the number of amino acid differences) of a given protein is approximately constant over time and across lineages.
=> The more similar two homologous proteins are, the more recently they diverged from their common ancestor.
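The molecular clock assumption can be illustrated with a small calculation. The sketch below counts the differences between two aligned protein fragments and, under an assumed constant substitution rate, converts that distance into a rough divergence time. The sequences, the rate, and the function names are hypothetical and chosen only for illustration; the calculation also ignores multiple substitutions at the same site.

```python
def p_distance(seq_a, seq_b):
    """Fraction of aligned, gap-free positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore any column in which either sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free positions to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def divergence_time(distance, rate_per_site_per_year):
    """Time since the common ancestor under a strict molecular clock.

    Both lineages accumulate changes independently,
    so distance = 2 * rate * time.
    """
    return distance / (2 * rate_per_site_per_year)


# Hypothetical aligned protein fragments (not real data).
seq_1 = "MKTAYIAKQR"
seq_2 = "MKTAYIGKQR"

d = p_distance(seq_1, seq_2)  # one difference over ten sites
t = divergence_time(d, rate_per_site_per_year=1e-9)  # assumed rate
print(f"p-distance = {d}, estimated divergence time = {t:.3g} years")
```

A smaller distance gives a smaller divergence time, which is the sense in which more similar homologous proteins share a more recent common ancestor.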
How to Study the Evolutionary History
Homologous genes are genes that derive from a common ancestor. They can be related in three ways:
Orthologs (直系同源): homologs separated by a speciation event. Speciation is the process by which a common ancestor gives rise to two subgroups that slowly drift away from their common genetic makeup and become distinct species. Orthologs usually have similar functions and structures.
Paralogs (间接同源): homologs separated by a duplication event, meaning that a gene was duplicated within a genome. One duplicate may keep the original function while the other acquires a new one.
Xenologs (异同源): "xeno" comes from the Greek word for foreigner. Xenologs result from a lateral transfer, that is, a direct DNA transfer between two species. One of the species therefore contains a gene whose history differs from that of the genome into which it was inserted. Lateral transfer is well documented among bacteria, including pathogenic species; claimed transfers between bacteria and humans are much rarer and disputed.
Phylogenetic Tree
What is a phylogenetic tree used for?
Determining, for a given protein or gene, the closest relatives of the organism that you're interested in.
Discovering the function of a new protein or gene.
Retracing the origin of a gene.
Phylogenetic Tree
Concepts: leaf (outer node), branch (lineage), root, inner node.
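These concepts map directly onto a simple data structure. The sketch below is one possible encoding, not a standard library format: a leaf is a string, an inner node is a tuple of its children, and the root is the outermost tuple. The species names and helper functions are hypothetical.

```python
def leaves(node):
    """Return the leaf (outer node) names below a node, left to right."""
    if isinstance(node, str):
        return [node]
    names = []
    for child in node:
        names.extend(leaves(child))
    return names


def count_inner_nodes(node):
    """Count inner nodes below and including this node (the root counts as one)."""
    if isinstance(node, str):
        return 0
    return 1 + sum(count_inner_nodes(child) for child in node)


def count_branches(node):
    """Count branches (lineages): one per parent-child connection."""
    if isinstance(node, str):
        return 0
    return len(node) + sum(count_branches(child) for child in node)


# Hypothetical rooted tree: the outermost tuple is the root.
tree = (("human", "chimp"), ("mouse", "rat"))

print(leaves(tree))             # the four leaves
print(count_inner_nodes(tree))  # root plus two inner nodes
print(count_branches(tree))     # two branches leaving each inner node
```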
Phylogenetic Tree
Depending on what the branches represent, phylogenetic trees have different names:
Cladogram: branch lengths do not mean anything.
Change-based phylogram: branch lengths indicate numbers of evolutionary changes.
Time-based phylogram: inner nodes indicate branching time points.
All these trees represent the same evolutionary relationships.
Phylogenetic Tree
There are many different ways to represent the information found in a phylogenetic tree.
Phylogenetic Tree
Branches can be rotated at a node without changing the relationships among the outer nodes.
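One way to check that rotating branches leaves the relationships unchanged is to put each tree into an order-independent form and compare the results. The sketch below assumes a nested-tuple encoding (a leaf is a string, an inner node is a tuple of children); the encoding, the leaf names, and the `canonical` helper are illustrative assumptions.

```python
def canonical(node):
    """Return an order-independent form of a tree.

    A leaf is returned unchanged; the children of an inner node are
    put into a fixed order, so any rotation gives the same result.
    """
    if isinstance(node, str):
        return node
    # Sort on repr so that strings and tuples can be ordered together.
    return tuple(sorted((canonical(child) for child in node), key=repr))


original = (("A", "B"), ("C", "D"))
rotated = (("D", "C"), ("B", "A"))    # same tree, branches rotated
different = (("A", "C"), ("B", "D"))  # different relationships

print(canonical(original) == canonical(rotated))    # same relationships
print(canonical(original) == canonical(different))  # different grouping
```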
Choosing Right Sequences for the Right Tree
Workflow: choose the right sequences, do a multiple sequence alignment, build a phylogenetic tree.
Should you do this on the protein or on the DNA sequence?
If the DNA sequences are > 70% identical: build a DNA multiple sequence alignment.
If the DNA sequences are < 70% identical and code for proteins: translate them into proteins and build the protein multiple sequence alignment.
If your sequences are too similar at the protein level, you can thread the DNA sequences back onto the protein alignment using pal2nal: http://www.bork.embl.de/pal2nal/.
In practice, unless your sequences are almost identical, it is easier to keep working at the protein level.
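The 70% rule can be sketched as a small helper. The code below assumes the two DNA sequences have already been aligned to the same length; the sequences and function names are hypothetical, and the handling of exactly 70% identity (treated here as "protein") is an assumption, since the rule only covers the cases above and below the threshold.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of gap-free aligned positions that are identical."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free positions to compare")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def choose_alignment_level(dna_a, dna_b, threshold=70.0):
    """Suggest aligning at the DNA level only above the identity threshold."""
    if percent_identity(dna_a, dna_b) > threshold:
        return "DNA"
    return "protein"


# Hypothetical pre-aligned DNA fragments (not real data).
print(choose_alignment_level("ATGGCCAAGT", "ATGGCTAAGT"))  # 9 of 10 sites match
print(choose_alignment_level("ATGGCCAAGT", "ATGACTTAGA"))  # 6 of 10 sites match
```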
Choosing Right Sequences for the Right Tree
Paralogs of a large human gene family: the tree tells the story of this gene family.
Orthologs from different species: the tree looks much like a species tree.
Algorithms of Tree Reconstruction
Maximum Parsimony (MP, 最大简约法): closely related sequences; accurate; number of sequences <= 12.
Distance (Neighbor Joining, NJ, 邻接法): distantly or closely related sequences; not very accurate.
Maximum Likelihood (ML, 最大似然法): distantly related sequences; very accurate.
Speed: Distance > Maximum Parsimony > Maximum Likelihood.
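To show why the distance approach is fast, here is a minimal sketch of the textbook neighbor-joining procedure working on a pairwise distance matrix. It is not the implementation used by ClustalW or any other program; the taxon names and distances are a small made-up example, and the function name is hypothetical. The output is a Newick string for an unrooted tree, so the position of the outermost split is arbitrary.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Build a tree from a symmetric distance matrix by neighbor joining.

    Assumes at least two taxa. Returns a Newick string.
    """
    # Active clusters: id -> Newick fragment, plus their pairwise distances.
    labels = {i: name for i, name in enumerate(names)}
    d = {i: {j: float(matrix[i][j]) for j in labels if j != i} for i in labels}
    next_id = len(names)

    while len(labels) > 2:
        n = len(labels)
        # Total distance from each cluster to all the others.
        r = {i: sum(d[i].values()) for i in labels}
        # Join the pair that minimises Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j).
        i, j = min(
            combinations(sorted(labels), 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        # Branch lengths from i and j to the new inner node.
        len_i = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        len_j = d[i][j] - len_i
        # Distances from the new inner node to every remaining cluster.
        new_row = {
            k: 0.5 * (d[i][k] + d[j][k] - d[i][j])
            for k in labels
            if k not in (i, j)
        }
        new_id = next_id
        next_id += 1
        labels[new_id] = f"({labels[i]}:{len_i:g},{labels[j]}:{len_j:g})"
        for old in (i, j):
            del labels[old]
            del d[old]
        for k, dist in new_row.items():
            del d[k][i]
            del d[k][j]
            d[k][new_id] = dist
        d[new_id] = new_row

    # Two clusters remain: connect them with the last branch.
    a, b = sorted(labels)
    return f"({labels[a]}:{d[a][b]:g},{labels[b]});"


# Made-up distances between five taxa.
taxa = ["a", "b", "c", "d", "e"]
distances = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(taxa, distances))
```

Each round joins one pair and shrinks the matrix by one row, so the tree is finished after a fixed number of rounds. Parsimony and likelihood methods instead have to score many candidate trees.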
Preparing Your Multiple Sequence Alignment
Computing your multiple sequence alignment:
ClustalW: http://www.ebi.ac.uk/tools/msa/clustalw2/
MUSCLE: http://www.ebi.ac.uk/tools/msa/muscle/
T-Coffee: http://tcoffee.crg.cat/
Before using your MSA to build a tree, you must make sure that it is as accurate as possible.
Remove bad columns that affect the tree quality:
1. Make sure there are as many gap-free columns as possible.
2. Remove the extremities of your multiple alignment.
3. Remove the gap-rich regions of your alignment.
4. Be sure to keep the most informative blocks.
Preparing Your Multiple Sequence Alignment
1. Make sure there are as many gap-free columns as possible. [Figure: alignment with the columns to remove marked]
Preparing Your Multiple Sequence Alignment
2. Remove the poorly aligned ends (extremities) of your multiple alignment. [Figure: alignment with the columns to remove marked]
Preparing Your Multiple Sequence Alignment
3. Remove the gap-rich regions of your alignment. [Figure: alignment with the columns to remove marked]
Preparing Your Multiple Sequence Alignment
4. Be sure to keep the most informative blocks. [Figure: alignment with the columns to keep marked]
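The cleaning rules above can also be applied programmatically. The sketch below drops every alignment column whose fraction of gaps exceeds a chosen limit; with the default limit of 0 only gap-free columns are kept. The alignment, the threshold parameter, and the function name are hypothetical, and a real alignment may still need manual inspection to keep the most informative blocks.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.0):
    """Keep only the columns whose gap fraction is at most max_gap_fraction.

    alignment maps sequence names to aligned sequences of equal length.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences must have the same aligned length")
    keep = [
        col
        for col in range(length)
        if sum(seq[col] == "-" for seq in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {
        name: "".join(seq[col] for col in keep)
        for name, seq in alignment.items()
    }


# Hypothetical alignment with gappy ends and one gap-rich inner column.
msa = {
    "seq1": "--MKT-AYI",
    "seq2": "-AMKTGAYI",
    "seq3": "--MRT-AY-",
}
for name, seq in trim_gappy_columns(msa).items():
    print(name, seq)
```

Raising `max_gap_fraction` keeps columns that contain a few gaps, which can be useful when strict trimming would leave too few columns.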
How to Delete Columns with Word
While pressing the Alt key on your keyboard, use the mouse to select entire columns in your alignment (Alt + mouse drag). When you've selected everything you want to remove, press the Delete key to remove the selected block.
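The same block deletion can be scripted instead of done by hand. The sketch below removes a range of columns from every sequence of an alignment held in a dictionary; the column numbers are 1-based and inclusive, as you would read them off the alignment. The data and the function name are hypothetical.

```python
def delete_columns(alignment, start, end):
    """Remove columns start..end (1-based, inclusive) from every sequence."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return {
        name: seq[: start - 1] + seq[end:]
        for name, seq in alignment.items()
    }


# Hypothetical two-sequence alignment.
block = {"seq1": "ABCDEFG", "seq2": "abcdefg"}
print(delete_columns(block, 3, 5))  # columns 3 to 5 are removed
```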
Computing Your Tree!
The guide tree that the alignment program builds to order the alignment is NOT a phylogenetic tree.
Computing Your Tree
English Courses for Graduate Students
EMBL ClustalW: http://www.ebi.ac.uk/tools/phylogeny/clustalw2_phylogeny
Computing Your Tree
EMBL ClustalW: http://www.ebi.ac.uk/tools/phylogeny/clustalw2_phylogeny
Files shown: sequences.fasta, clustalw.aln
Computing Your Tree
This tree is much more accurate than a guide tree!
Computing Your Tree
A phylogram is a phylogenetic tree whose branch lengths are proportional to the amount of character change. In a cladogram, the branch lengths do not represent any change.
[Figure: Phylogram Tree]
[Figure: Cladogram Tree]
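The difference between the two views can be seen in the tree text itself. The sketch below assumes the tree is written in Newick format, where a branch length follows a colon after each node; removing the lengths turns the phylogram description into a cladogram description with the same branching order. The example tree and the function name are hypothetical.

```python
import re


def to_cladogram(newick):
    """Strip branch lengths (':' followed by a number) from a Newick string."""
    return re.sub(r":[0-9.eE+-]+", "", newick)


# Hypothetical phylogram with branch lengths.
phylogram = "((A:0.1,B:0.2):0.05,C:0.3);"
print(to_cladogram(phylogram))  # same topology, no branch lengths
```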
Computing Your Tree
Choosing different display options gives different representations of the tree.
Displaying Your Tree
The easiest way to save your tree is to make a screen capture with the Print Screen (PrtScn) key on your keyboard. You can then paste this image (Ctrl + V) into your favorite application (PowerPoint, Windows Paint, etc.).
Displaying Your Tree
Tree file: MyTree.ph
Displaying Your Tree
Phylodendron: http://iubio.bio.indiana.edu/treeapp/treeprint-form.html
Input file: MyTree.ph
Displaying Your Tree
Phylodendron: http://iubio.bio.indiana.edu/treeapp/treeprint-form.html
Right-click the tree image to save it as MyTree.png.
Files used in this chapter: sequences.fasta, clustalw.aln, MyTree.ph, MyTree.png