Hillis DM Inferring complex phylogenies. Nature 383:

Hillis DM. 1996. Inferring complex phylogenies. Nature 383: 130-131. Triangles: parsimony Squares: neighbor-joining (under specified model) Circles: UPGMA

Designing your phylogenetic analysis Choice of data (morphology/behavior/molecules ) Choosing your ingroup/outgroup taxa Choosing the amount of information More genes or more taxa?

Wheeler WC. 1992. Extinction, sampling, and molecular phylogenetics. In: Novacek MJ and Wheeler QD, eds. Extinction and phylogeny. New York: Columbia University Press. 205-215. Lecointre G, Philippe H, Van Le HL, and Le Guyader H. 1993. Species sampling has a major impact on phylogenetic inference. Molecular Phylogenetics and Evolution 2: 205-224. Graybeal A. 1998. Is it better to add taxa or characters to a difficult phylogenetic problem? Systematic Biology 47: 9-17. Bremer B, Jansen RK, Oxelman B, Backlund M, Lantz H, and Kim K-J. 1999. More characters or more taxa for a robust phylogeny--case study from the coffee family (Rubiaceae). Systematic Biology 48: 413-435. Giribet G, and Carranza S. 1999. What can 18S rdna do for bivalve phylogeny? Journal of Molecular Evolution 48: 256-261. Pollock DD, Zwickl DJ, McGuire JA, and Hillis DM. 2002. Increased taxon sampling is advantageous for phylogenetic inference. Systematic Biology 51: 664-671. Zwickl DJ, and Hillis DM. 2002. Increased taxon sampling greatly reduces phylogenetic error. Systematic Biology 51: 588-598. Hillis DM, Pollock DD, McGuire JA, and Zwickl DJ. 2003. Is sparse taxon sampling a problem for phylogenetic inference? Systematic Biology 52: 124-126. Rokas A, and Carroll SB. 2005. More genes or more taxa? The relative contribution of gene number and taxon number to phylogenetic accuracy. Molecular Biology and Evolution 22: 1337-1344. Hedtke SM, Townsend TM, and Hillis DM. 2006. Resolution of phylogenetic conflict in large data sets by increased taxon sampling. Systematic Biology 55: 522-529.

Zwickl DJ, and Hillis DM. 2002. Increased taxon sampling greatly reduces phylogenetic error. Systematic Biology 51: 588-598.

Bremer B, Jansen RK, Oxelman B, Backlund M, Lantz H, and Kim K-J. 1999. More characters or more taxa for a robust phylogeny--case study from the coffee family (Rubiaceae). Systematic Biology 48: 413-435.

Graybeal A. 1998. Is it better to add taxa or characters to a difficult phylogenetic problem? Systematic Biology 47: 9-17.

Philippe H, Lartillot N, and Brinkmann H. 2005. Multigene analyses of bilaterian animals corroborate the monophyly of Ecdysozoa, Lophotrochozoa and Protostomia. Molecular Biology and Evolution 22: 1246-1253. Rokas A, Krüger D, and Carroll SB. 2005. Animal evolution and the molecular signature of radiations compressed in time. Science 310: 1933-1938. Delsuc F, Brinkmann H, Chourrout D, and Philippe H. 2006. Tunicates and not cephalochordates are the closest living relatives of vertebrates. Nature 439: 965-968. Bourlat SJ, Juliusdottir T, Lowe CJ, Freeman R, Aronowicz J, Kirschner M, Lander ES, Thorndyke M, Nakano H, Kohn AB, Heyland A, Moroz LL, Copley RR, and Telford MJ. 2006. Deuterostome phylogeny reveals monophyletic chordates and the new phylum Xenoturbellida. Nature 444: 85-88.

Gene accumulation Curves (Rarefied with 50 random addition replicates)

ESTs processing Raw trace files need to be processed: trace2dbest EST reads are then assembled using PartiGene (with original quality files allowed) PartiGene calls CLOBB to assign sequences to clusters of like sequences and PHRAP to assemble the EST sequences assigned to each cluster into a contiguous sequence for that cluster The single contiguous sequence for each cluster is then translated with prot4est

Orthology assignment A local database of all Homo sapiens, Canis familiaris, Gallus gallus, Drosophila melanogaster, and Anopheles gambiae sequences that have orthology assignments in Homologene is constructed Masked sequences were queried against these sequences with blastp The blast results were passed to TribeMCL for Markov Chain Clustering (MCL). (The MCL inflation parameter was varied in intervals of 0.1 from 1.4 to 2.5 to identify the value that generated the maximum number of clusters) TribeMCL groups that had no sequences from the homologene dataset were discarded, as were groups that had homologene sequences that were assigned to multiple TribeMCL groups Only groups with sequences from at least 33% of the taxa are preserved The number of sequences for each species represented within each group is calculated, and groups with a median greater than one are discarded

EST data set After passing the taxon sampling and orthology criteria, we obtained 196 genes for 56 species On average, each species was represented by 48% of the genes The combined matrix consists of 32,975 amino acids (19,362 positions are parsimony-informative) Our matrix of 196 genes contains only 50 of the genes included in the 146-gene matrix of Philippe et al. (2005)