Elements of Bioinformatics 14F01 TP5 - Phylogenetic analysis 10 December 2012 - Corrections -
Exercise 1
Non-vertebrate chordates generally possess 2 homologs and vertebrates 3 or more gene copies; a Drosophila homolog serves as the outgroup.
A first duplication event (giving rise to POPD1 and the POPD2/POPD3 ancestor) occurred before the speciation of Ciona and the vertebrate lineage, and probably after the divergence of arthropods. The second duplication event happened in vertebrates and gave rise to the subfamilies POPD2 and POPD3. POPD2 and POPD3 are thus more closely related to each other than to subfamily POPD1.
Orthologs of POPD1_HUMAN (Q8NE79): the chordate POPD1 clade and Drosophila bves; all other genes are paralogs.
Orthologs of Ciona POPD1: same as for human POPD1.
Orthologs of Ciona POPD23: the vertebrate families POPD2 and POPD3, and most probably Drosophila bves.
PhyML settings. Tree search: NNIs. Initial tree: BIONJ. Model of amino acid substitution: JTT. Number of taxa: 41. Log-likelihood: -7592.66297. Discrete gamma model: yes (number of categories: 8; gamma shape parameter: 1.764). Proportion of invariant sites: 0.061.
Further phylogenetic trees are available at http://wiki.isb-sib.ch/swisstree/st001_evidence
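The orthology assignments above follow from one rule: two genes are orthologs if their last common ancestor in the gene tree is a speciation node, and paralogs if it is a duplication node. The sketch below applies that rule to a toy version of the POPD topology. The leaf names, the nested-tuple encoding and the function names are illustrative assumptions, and the tree assumes that the first duplication happened after the divergence of arthropods, as the text considers probable.

```python
# Toy POPD gene tree (topology only). Internal nodes are
# (event, left_child, right_child); leaves are gene names.
# Leaf names are hypothetical labels chosen for this sketch.
GENE_TREE = (
    "speciation",                          # arthropods vs. chordates
    "bves_DROME",
    (
        "duplication",                     # POPD1 vs. POPD2/POPD3 ancestor
        ("speciation", "POPD1_CIONA", "POPD1_HUMAN"),
        (
            "speciation",                  # Ciona vs. vertebrates
            "POPD23_CIONA",
            ("duplication", "POPD2_HUMAN", "POPD3_HUMAN"),
        ),
    ),
)


def leaves(node):
    """Return the set of gene names below a node."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def relationship(node, gene_a, gene_b):
    """Classify two distinct genes by the event at their last common ancestor."""
    if isinstance(node, str):
        raise ValueError("genes must be two distinct leaves of the tree")
    event, left, right = node
    in_left, in_right = leaves(left), leaves(right)
    if gene_a in in_left and gene_b in in_left:
        return relationship(left, gene_a, gene_b)
    if gene_a in in_right and gene_b in in_right:
        return relationship(right, gene_a, gene_b)
    if {gene_a, gene_b} <= (in_left | in_right):
        # The two genes split at this node: it is their last common ancestor.
        return "ortholog" if event == "speciation" else "paralog"
    raise ValueError("gene not found in tree")


for other in sorted(leaves(GENE_TREE) - {"POPD1_HUMAN"}):
    print("POPD1_HUMAN vs", other, "->", relationship(GENE_TREE, "POPD1_HUMAN", other))
```

If the first duplication had instead preceded the arthropod divergence, the root event would change and Drosophila bves would no longer be orthologous to both chordate subfamilies, which is why the text hedges with "most probably".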
Exercise 2 Branch support
Exercise 3
Exercise 3 eggNOG: was not available when preparing the slides.
Results are from last year and might have changed slightly.
Exercise 3 HOVERGEN
Exercise 4 BLAST result: the vertebrate sequences closest to gem-1 seem to be MIRO2.
Further below (same BLAST result as on the previous page), there is a 2nd vertebrate protein group, annotated as MIRO1, again with similarity over the full length of the protein. We copy all the data from the species of interest to infer the phylogenies. Please note that there is no MIRO1 gene for X. tropicalis.
The phylogenetic tree was reconstructed using the 1-click mode with default parameter settings. The topology of the resulting tree corresponds to the species tree with one exception: chicken and frog appear to form a monophyletic group, which is probably not the case. The tree further suggests that the vertebrate genes MIRO1 and MIRO2 are paralogs and that both are co-orthologs of the fungal gene gem-1 and the invertebrate gene MIRO.
Tree robustness: Bootstrap replicates are created randomly, so results for the same analysis can differ slightly between runs. It is thus more meaningful to perform the analysis with 1000 replicates, which gives more stable support values.
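To illustrate why replicate results vary, here is a minimal sketch of how one bootstrap replicate is drawn: alignment columns are sampled with replacement until the replicate has the original length. The toy sequences and function names are assumptions made for this example. The second function uses the standard binomial approximation for the sampling error of a support value, which shrinks with the square root of the number of replicates.

```python
import math
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # The same column indices are used for every sequence,
    # so each resampled column is a real column of the alignment.
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def support_standard_error(support, n_replicates):
    """Binomial standard error of a support proportion estimated from replicates."""
    return math.sqrt(support * (1.0 - support) / n_replicates)


# Hypothetical toy alignment, for illustration only.
toy = {"seqA": "MKTAYIAK", "seqB": "MKSAYLAK", "seqC": "MRTAFIAK"}

rep_seed1 = bootstrap_replicate(toy, random.Random(1))
rep_seed2 = bootstrap_replicate(toy, random.Random(2))
print(rep_seed1)
print(rep_seed2)

for n in (100, 1000):
    print(n, "replicates: standard error at 80% support =",
          round(support_standard_error(0.8, n), 4))
```

Different seeds generally yield different replicates, while a fixed seed makes a run reproducible. Under this approximation, a branch with 80% support has a standard error of about 4 percentage points with 100 replicates and about 1.3 with 1000.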
Exercise 5
Dataset 2
Phylogenetic analysis parameters: Alignment: MUSCLE (default); Data model: Gblocks stringent; Tree-building: PhyML (JTT, gamma(4), shape parameter estimated, invariable sites fixed (0.0)); Test for topology robustness: aLRT (SH).
The 1st tree includes species from Archaea and Bacteria, which form the 2 major branches of the tree. The 2nd dataset consists of sequences derived from Eukaryotes. A phylogenetic tree reconstructed with both datasets suggests that the eukaryotic sequences are more closely related to Bacteria than to Archaea. This result is not in agreement with the species tree. An explanation could be that Eukaryotes acquired this gene from Bacteria by horizontal gene transfer (HGT).
Phylogenetic tree of datasets 1, 2 and 3. The sequences of the 3rd dataset consist of eukaryotic homologs. The result suggests that the eukaryotic sequences are possibly derived from a single gene transfer event. Sequence annotation indicates that one group of the eukaryotic proteins (SYAM) is mitochondrial. A possible scenario: Eukaryotes could have acquired a gene copy by endosymbiotic gene transfer (EGT) from the mitochondrion to the nucleus, followed by a gene duplication event. One of the copies subsequently evolved into the cytoplasmic form.
Analysis protocol: Alignment: MUSCLE (default); Data model: Gblocks stringent; Tree-building: PhyML (JTT, gamma(4), shape parameter estimated, invariable sites fixed (0.0)); Test for topology robustness: aLRT (SH).
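For readers who want to repeat the tree-building step outside the web pipeline, the sketch below maps the listed settings onto a command line for a locally installed PhyML. This is an assumption, not the procedure used in the course: the function name and file name are hypothetical, and the flags are given as they appear in the PhyML 3.x command-line help, so they should be checked against `phyml --help` for the installed version. The alignment and Gblocks curation steps are not covered.

```python
def phyml_command(alignment_path):
    """Build a PhyML argument list matching the settings in the analysis protocol.

    Assumes a PhyML 3.x executable named 'phyml' and a PHYLIP-format,
    already curated protein alignment at alignment_path.
    """
    return [
        "phyml",
        "-i", alignment_path,  # input alignment (PHYLIP format)
        "-d", "aa",            # amino-acid data
        "-m", "JTT",           # JTT substitution model
        "-c", "4",             # 4 discrete gamma rate categories
        "-a", "e",             # gamma shape parameter estimated
        "-v", "0.0",           # proportion of invariable sites fixed at 0.0
        "-b", "-4",            # SH-like aLRT branch support
    ]


# Hypothetical file name, for illustration only.
cmd = phyml_command("datasets_1_2_3.phy")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the executable and the alignment file exist.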
Exercise 6 Can you root this tree?
Clade labels in the tree: Eukaryota; Archaea; Eukaryota; Bacteria; Eukaryota.
Clade labels in detail: Eukaryota, nuclear; Archaea; Eukaryota, mitochondrial, nuclear encoded; Bacteria; Eukaryota, chloroplast (arath: nuclear encoded; CYCME: chloroplast encoded).
To be improved:
Add more sequence data to the dataset.
Use alignment methods such as ProbCons or T-Coffee.
Analyze individual branches, e.g. cyanobacterial and chloroplast data, or proteobacterial and mitochondrial-derived sequences (with meaningful outgroups).
Use ML methods or MrBayes.
Calculate support values for internal nodes.
Run other tests, which we have not discussed here.
Can this tree be rooted by an outgroup? No, as no species is known that diverged prior to the divergence of the 2 major branches of life!