Tree Building Activity

Introduction

In this activity, you will construct phylogenetic trees using phenotypic similarity (cartoon microbe pictures) and genotypic similarity (real microbe sequences). For the phenotypic similarity exercise, you'll build a similarity matrix by hand to practice the process of tree-building on a small scale. For the genotypic similarity exercise, you'll use a specialized computer program to see how bioinformatics allows us to implement this same process on a much bigger scale.

Phenotypic Similarity: determining similarity by hand

Your worksheets include all the tables necessary for completing this process. The instructions below work through a short demo data set as a guide for how to work with your cartoon microbes. Below are the example bacteria used in the demo.
1. Determine the phenotype for each bacterium and fill out the trait table. It's usually easier to use short abbreviations or +/- markings to track traits, but for demonstration purposes everything has been written out in full.

                  Shape    Color    Protein      Cilia
    Bacterium A   Rod      Green    Surface      Yes
    Bacterium B   Round    Blue     Cytoplasm    Yes
    Bacterium C   Round    Grey     Surface      No
    Bacterium D   Round    Blue     Cytoplasm    No

2. Determine similarity by comparing the traits of two bacteria and recording the number of differences between them in the similarity matrix. (We call it a similarity matrix because fewer differences indicate more similarity.) KEEP IN MIND: comparing bacterium A to bacterium B is the same as comparing bacterium B to bacterium A, so you only need to fill out half of the matrix. For the demo, the unnecessary cells are greyed out (shown here as -).

          A     B     C     D
    A     -     3     3     4
    B     -     -     3     1
    C     -     -     -     2
    D     -     -     -     -
3. Use the similarity matrix to determine which two bacteria are most similar. Remember, you recorded the number of differences, so the fewest differences indicate the most similarity. When you've determined the two most similar bacteria (B and D in the demo, with 1 difference), begin your phylogenetic tree by joining them to a common ancestor node.

4. Adjust your similarity matrix so that it compares your remaining bacteria to the new common ancestor node. To do this, combine the rows and columns for the bacteria you just joined. The new values for those cells come from averaging the original comparisons. In this example, B compared to C had 3 differences and D compared to C had 2 differences, so BD compared to C has 2.5 differences. Likewise, A compared to B had 3 differences and A compared to D had 4, so A compared to BD has 3.5.

          A     BD    C
    A     -     3.5   3
    BD    -     -     2.5
    C     -     -     -

5. To finish this process, repeat steps 3 and 4 until all of the bacteria are included in the tree. The rest of the steps are shown below, with matrices on the left and the tree they produce on the right.
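For anyone who wants to check their hand-built matrices, steps 2 through 5 can be sketched in a few lines of Python. This is only an illustration, not part of the worksheet: the function names (count_differences, join_closest, build_tree) are made up for this sketch, ties are broken alphabetically as an arbitrary choice, and each joined node is compared to the rest by taking the plain average of the two rows being combined, as in step 4. Established methods such as UPGMA weight that average by how many bacteria each node contains, so their numbers can differ once a node holds more than two bacteria.

```python
from itertools import combinations

# Trait table from the demo (step 1): shape, color, protein, cilia.
TRAITS = {
    "A": ("Rod", "Green", "Surface", "Yes"),
    "B": ("Round", "Blue", "Cytoplasm", "Yes"),
    "C": ("Round", "Grey", "Surface", "No"),
    "D": ("Round", "Blue", "Cytoplasm", "No"),
}


def count_differences(traits):
    """Step 2: count the differing traits for every pair of bacteria."""
    matrix = {}
    for x, y in combinations(traits, 2):
        diffs = sum(a != b for a, b in zip(traits[x], traits[y]))
        # A frozenset key means (A, B) and (B, A) share one cell.
        matrix[frozenset((x, y))] = diffs
    return matrix


def join_closest(matrix):
    """Steps 3-4: join the most similar pair, then average its comparisons."""
    # Step 3: fewest differences = most similar (ties broken by name,
    # which is an assumption of this sketch).
    pair = min(matrix, key=lambda p: (matrix[p], sorted(map(str, p))))
    x, y = sorted(pair, key=str)
    joined = (x, y)
    others = set().union(*matrix) - pair
    # Keep every comparison that does not involve the joined pair.
    new_matrix = {p: d for p, d in matrix.items() if not (p & pair)}
    # Step 4: the new node's value is the average of the two old values.
    for other in others:
        old_x = matrix[frozenset((x, other))]
        old_y = matrix[frozenset((y, other))]
        new_matrix[frozenset((joined, other))] = (old_x + old_y) / 2
    return joined, new_matrix


def build_tree(matrix):
    """Step 5: repeat steps 3 and 4 until everything is in one tree."""
    joined = None
    while matrix:
        joined, matrix = join_closest(matrix)
    return joined


differences = count_differences(TRAITS)
print(build_tree(differences))  # ((('B', 'D'), 'C'), 'A')
```

The nested parentheses read like the tree itself: B and D are joined first, C joins that pair next, and A joins last.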
Genotypic Similarity: the power of bioinformatics

Next you're going to work with purely genetic data to do the same thing. Instead of working through each step by hand, you're going to use an online program to compare the DNA sequences, calculate their similarities, and produce a final tree. While you're working through these steps, think about how this compares to what you did before. What was easier by hand with phenotypes? What is easier with computers and DNA? Which of these processes is easier to scale up: if you had to look at hundreds or thousands of organisms, which process would you prefer to use?

1. Get your genotype information (a FASTA file). You can open the file in Notepad. Each sequence in a FASTA file is stored as two components: a header line, which begins with > and contains identifying information for the sequence; and, starting on a new line, the sequence itself, written using the single-letter nucleotide representation. Multiple sequences can be stored in the same FASTA file, so long as each sequence has both components. Below is an example FASTA file, with very short sequences.

2. Next, copy the sequences into the program you're going to use. We're using a free online program called Clustal Omega (http://www.ebi.ac.uk/tools/msa/clustalo/). Copy and paste the contents of your FASTA file into the input box on Clustal's main page. Be sure to select the type of sequence you're using (DNA, RNA, or Protein). Below is a screenshot to demonstrate.
3. Once you run the program, you'll get three types of information: sequence alignments, a similarity matrix, and a phylogenetic tree.

    a. Sequence Alignment: the common regions of the sequences are lined up so that differences are easier to spot.

    b. Similarity Matrix: the percentage of each pair of compared sequences that is identical. Clustal labels only the rows; the columns follow the same order.

    c. Phylogenetic Tree: relationships from the similarity matrix are shown as distances back to a common ancestor. Distances can be shown in different ways, but the relative relationships remain the same.
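To see what the FASTA layout and the similarity matrix mean in practice, here is a minimal Python sketch that reads FASTA text and prints a percent-identity table. It is not how Clustal Omega works internally: the sequences and their names (microbe_1, etc.) are made up for illustration, the function names are hypothetical, and the sketch assumes the sequences are already aligned to the same length, whereas Clustal performs the alignment for you and may treat gaps differently.

```python
def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: everything after ">" identifies the sequence.
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines: a sequence may wrap over several lines.
            records[header] += line.upper()
    return records


def percent_identity(seq_a, seq_b):
    """Percent of positions at which two aligned sequences match."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100 * matches / len(seq_a)


# Hypothetical, already-aligned sequences invented for this example.
EXAMPLE = """>microbe_1
ACGTACGTAC
>microbe_2
ACGTACGTTC
>microbe_3
ACGAACGTTG
"""

records = parse_fasta(EXAMPLE)
names = list(records)
# One row per sequence; columns follow the same order as the rows.
for row in names:
    values = [f"{percent_identity(records[row], records[col]):6.1f}"
              for col in names]
    print(row, *values)
```

Note that this table counts matches, so larger numbers mean more similar, while the hand-built matrix counted differences, so smaller numbers meant more similar.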