Visualising Phylogenetic Trees


Wan Nazmee Wan Zainon & Paul Calder
School of Informatics and Engineering
Flinders University of South Australia
PO Box 2100, Adelaide 5001, South Australia
wanz1@infoeng.flinders.edu.au calder@infoeng.flinders.edu.au

Abstract

This paper describes techniques for visualising pairs of similar trees. Our aim is to develop ways of presenting the information so as to highlight both the common structure of the trees and their points of difference. The impetus for the work comes from the field of bioinformatics, where geneticists construct complex phylogenetic trees to represent the evolution of species or genes. But the techniques can also be used for other tree-structured data such as file systems, parse trees, decision trees, and organisational hierarchies. To investigate our techniques, we have built a prototype application that reads and displays phylogenetic trees in the popular Nexus format. The application incorporates a variety of interactive and automated visualisation techniques, and is implemented in Java. We are working with biologists to see how well the techniques work for real-world data.

Keywords: Interactive visualisation, phylogenetic trees, bioinformatics.

1 Introduction

Tree-structured data occurs in many domains: file systems, parse trees, organisational hierarchies, and classification schemes of many kinds. The impetus for the work described in this paper is the domain of phylogenetic classification, which is used by geneticists to describe possible evolutionary relationships between species or individuals. Although we have developed our techniques specifically for that domain, many of them could also be applied to other domains that use similar trees.

This paper presents techniques for visualising pairs of phylogenetic trees in order to emphasise the similarity of the trees while also highlighting how they differ.
We have implemented these techniques in the context of a prototype tool for interactively visualising phylogenetic trees, and are in the process of evaluating the effectiveness of the tool for real phylogenetic data.

Copyright 2006, Australian Computer Society, Inc. This paper appeared at the Seventh Australasian User Interface Conference (AUIC2006), Hobart, Australia. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 50. Wayne Piekarski, Ed. Reproduction for academic, not-for-profit purposes permitted provided this text is included.

The remainder of this paper is organised as follows. Section 2 provides an introduction to the bioinformatics basis of phylogenetic trees and outlines other work that has investigated the visualisation and comparison of such data. Section 3 presents our approach to the problem and details several of the algorithms we use to compute visualisations. Section 4 briefly describes the implementation of the prototype visualisation tool, shows examples of its interface and discusses its use.

2 Related Work

2.1 Bioinformatics Context

Biologists and geneticists use phylogenetic trees to represent the evolutionary interrelationships between collections of related species or genes. The discovery and analysis of those relationships may help in many practical applications such as drug discovery, forensics, disease control, and ecological modelling.

Biologists construct phylogenetic trees by examining the phenotypes or genotypes of a collection of organisms and attempting to infer the evolutionary process by which the organisms came to be. For example, a geneticist might obtain DNA sequence data from a range of species or from individuals within a population. Then, by comparing the sequences, she could infer how the sampled organisms might have evolved via a series of mutations, each caused by one change in the DNA sequence.
This hypothesised evolutionary history is then represented as a "tree of life" showing how possible ancestors could have led to the current organisms.

Bioinformaticists have devised a range of algorithms, based on strategies such as Maximum Likelihood (Felsenstein et al. 1982) and Maximum Parsimony (Farris 1983), for computing such phylogenetic trees. However, there is no "gold standard"; current practice dictates that several different methods be applied to the sequence data (Thorup and Farach 1994). When this happens, biologists often need to compare several similar trees in order to get a more complete picture of the relationships involved. A similar situation arises when several species have evolved in close association (co-evolution); the biologist might be interested in understanding how the phylogenetic tree for one species compares with that for the co-evolved species.

In its simplest form, a phylogenetic tree is drawn as a rooted binary tree. Each leaf node represents an actual species or organism; each internal node represents a hypothetical ancestor at which mutation is assumed to have occurred (and which therefore has exactly two branches).

Figure 1: Fictitious phylogenetic trees (Tree 1 and Tree 2)

For example, Figure 1 shows two (clearly fictitious) trees that suggest two possible ways in which four present-day species might be related. Tree 1 implies that […] and […] diverged recently from a common ancestor, that the […]/[…] ancestor and […] share a more distant common ancestry, and finally that the whole […]/[…]/[…] tree split from the […] branch even further in the past. Tree 2, on the other hand, suggests that a common ancestor split into two branches, one ultimately leading to […] and […] and the other to […] and […].

Real phylogenetic trees will of course be much larger and thus more complex. Understanding such trees requires visual inspection, structural comparison, and interactive manipulation and exploration, and thus presents a number of visualisation challenges (Carrizo 2004). Biologists faced with inadequate visualisation tools for comparing trees have had to rely instead on paper, tape, and highlighter pens (Munzner et al. 2003).

2.2 Tree Comparison Techniques

Bioinformaticists use a variety of techniques to compare phylogenetic trees. Section 3.1 describes how we apply and extend some of these techniques in visualising trees.

Consensus trees are widely used to summarise the agreement between a set of trees. A consensus tree represents a "lowest common denominator" of two or more trees; it depicts those aspects that the individual trees all agree on. Bryant (1997) reports on a variety of methods for creating a consensus tree, including the strict, majority rule, semistrict, and Nelson and Adams techniques. For example, the strict consensus tree of the trees in Figure 1 is as follows:

[Figure: strict consensus tree of Tree 1 and Tree 2]

The consensus tree indicates that both trees agree that […] and […] had a recent common ancestor, but disagree about how […] and […] fit into the picture. The best that can be said is that […] and […] both shared a common ancestor with the […]/[…] ancestor at some time in the past.
Note that a consensus tree includes all of the original leaf nodes, but is normally not fully resolved; areas of disagreement generally result in interior nodes with more than two branches.

An agreement subtree is a subtree that is common to two or more trees. Conceptually, a subtree can be obtained by pruning leaf nodes (and collapsing the parent internal nodes) from the original tree; an agreement subtree is one that can be extracted in this manner from all of the trees. A greatest agreement subtree (GAS) is an agreement subtree with the greatest number of leaf nodes. For example, the trees in Figure 1 have two greatest agreement subtrees:

[Figure: the two greatest agreement subtrees of Tree 1 and Tree 2]

Note that a greatest agreement subtree does not normally include all of the leaf nodes (unless, of course, the trees are identical).

A triplet is a subtree on three leaf nodes and represents the smallest informative subtree of a rooted tree. The structure of a tree is fully characterised by enumerating the structure of its triplets. For example, Tree 1 has the following triplets:

[Figure: triplets of Tree 1]

Triplets can be used as a basis for quantifying the difference between rooted trees. Using this approach, the structural difference between two trees is the number of triplets whose structure is different in the two trees. For example, Tree 2 has the following triplets.

[Figure: triplets of Tree 2]

Since 2 triplets (the second and third) are different from the corresponding triplets in Tree 1, the structural triplet difference between the two trees is 2.

The nearest neighbour interchange (NNI) technique (Robinson 1971) is also used to quantify the difference between trees. A nearest neighbour interchange is an interchange of two nearest neighbour branches. The NNI difference between two trees is the minimum number of such interchanges needed to convert one tree into the other. NNI is usually applied to unrooted trees, but can be adapted for rooted trees. For rooted trees, the nearest neighbours of a branch are the sub-branches (if they exist) of its sibling.
For example, in Tree 1 the nearest neighbours of the […] branch are the […] and […] branches, and the nearest neighbours of the […] branch are the […] branch and the (unlabelled) common […]/[…] ancestor branch.
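
The two distance measures described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the prototype. The leaf names A to D are hypothetical stand-ins for the species in Figure 1, and the two topologies are reconstructed from the verbal description of that figure; the helper names (`canon`, `restrict`, `triplet_difference`, `nni_neighbours`) are likewise assumptions made for the example. Trees are nested 2-tuples, with strings as leaves.

```python
from itertools import combinations

# Hypothetical labels: topologies reconstructed from the description of Figure 1.
TREE1 = ("D", (("A", "B"), "C"))
TREE2 = (("A", "B"), ("C", "D"))

def leaves(t):
    """Set of leaf labels below node t."""
    return {t} if isinstance(t, str) else leaves(t[0]) | leaves(t[1])

def canon(t):
    """Order-independent form of a tree, so that branch order is ignored."""
    return t if isinstance(t, str) else frozenset(canon(c) for c in t)

def restrict(t, keep):
    """Prune every leaf not in `keep` and collapse the emptied parents."""
    if isinstance(t, str):
        return t if t in keep else None
    kids = [r for r in (restrict(c, keep) for c in t) if r is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)

def triplet_difference(t1, t2):
    """Number of leaf triplets whose structure differs between t1 and t2."""
    return sum(
        canon(restrict(t1, set(tr))) != canon(restrict(t2, set(tr)))
        for tr in combinations(sorted(leaves(t1)), 3)
    )

def nni_neighbours(t):
    """Trees one rooted NNI step away: a branch swaps with a sub-branch of its sibling."""
    if isinstance(t, str):
        return []
    out = []
    for i in (0, 1):
        branch, sibling = t[i], t[1 - i]
        if not isinstance(sibling, str):
            s0, s1 = sibling
            out.append((s0, (branch, s1)))  # branch exchanged with s0
            out.append((s1, (s0, branch)))  # branch exchanged with s1
        # The same kind of move applied deeper inside child i.
        for sub in nni_neighbours(t[i]):
            out.append((sub, t[1]) if i == 0 else (t[0], sub))
    return out

one_step = {canon(n) for n in nni_neighbours(TREE1)}
print(triplet_difference(TREE1, TREE2))  # 2
print(canon(TREE2) in one_step)          # True
```

Under these assumptions the sketch gives a triplet difference of 2 and finds Tree 2 among the trees one interchange away from Tree 1, in line with the values quoted in the text.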

Using this definition of nearest neighbours, the following trees can all be obtained by one NNI step from Tree 1.

[Figure: trees obtained by one NNI step from Tree 1]

Since one of these (the bottom right) is structurally identical with Tree 2, the NNI difference between the two trees is 1.

2.3 Tools for Phylogenetic Tree Analysis and Visualisation

Biologists use many applications to analyse and understand phylogenetic data. This section briefly describes four of the most popular tools that are freely available over the Internet. A comprehensive list of other tools is provided on the PHYLIP web site (Felsenstein 2005).

Gibas and Jambeck (2001) report that the most widely used phylogenetic analysis package is PHYLIP (Felsenstein 2005), which contains more than 30 programs that implement different phylogenetic algorithms. It has programs for tree plotting, heuristic tree search, interactive tree manipulation, and other phylogenetic analysis methods. Table 1 lists the PHYLIP programs that users are most likely to use to analyse protein and DNA sequence data.

Program     Use
PROTPARS    Infers phylogenies from protein sequences using the parsimony method
NEIGHBOR    Infers phylogenies from distance matrix data using either pairwise clustering or neighbour joining methods
DRAWGRAM    Draws a rooted tree based on output from one of the phylogeny inference programs
CONSENSE    Computes a consensus tree from a group of phylogenies
RETREE      Allows interactive manipulation of a tree

Table 1: Selected PHYLIP programs

The COMPONENT application (Roderic 1993) can both display and analyse phylogenetic trees. Its emphasis is on computing comparative metrics between trees, although it includes simple interactive editing operations such as rearranging tree branches, deleting nodes, and rerooting trees.

Mesquite (Maddison and Maddison 2005) is a system that its developers describe as "a modular system for evolutionary analysis". Available modules include components for construction and comparison of phylogenetic trees.
The TreeSet Visualisation module (Klinger and Amenta 2002) produces point-set visualisations that suggest clustering within large sets of trees.

TreeJuxtaposer (Munzner et al. 2003) supports structural comparison of trees. The tool can highlight parts of several trees that are structurally similar, although its emphasis is on efficiently handling very large trees (up to several hundred thousand nodes) rather than on identifying the specific differences between the trees.

3 Visualising Tree Differences

Our approach to visualising tree similarities and differences makes use of the fact that a tree with unordered branches can be drawn in many arrangements. In a phylogenetic tree, the order in which branches appear is usually less important than the structural relationships between nodes. In such cases, we can take advantage of this flexibility to draw a pair of trees so as to highlight both their similarities and differences.

Our technique is to draw the pair of trees face-to-face, with the arrangement of each tree chosen to best emphasise the similarities and highlight the differences. For example, the trees in Figure 1 could be drawn as follows:

[Figure: Tree 1 and Tree 2 drawn face-to-face]

This arrangement shows the greatest agreement subtree ([…], […], […]) and also how the differing node ([…]) connects in the two trees. In essence, it suggests that in one case […] diverged from the […]/[…] line, whereas in the other it diverged from the […] line.

Typical phylogenetic trees can often have 5 or more nodes, and since the number of possible arrangements of a fully resolved tree of size $n$ is $2^{n-1}$ (each of the $n-1$ internal nodes can be drawn with its two branches in either order), it is usually impractical to determine the best arrangement manually. To help in the process we have considered several strategies for automatically arranging the trees.

The minimum triplet difference (MTD) algorithm computes arrangements of two trees for which the difference, as measured by triplet arrangement pattern, is minimised.
The maximum branch similarity (MBS) algorithm arranges one tree so that its branches have as many leaf nodes as possible in common with the corresponding branch in the other tree.

The all-but-n (ABn) algorithm attempts to arrange the common structures of the two trees so that the nodes that differ can be drawn in alignment.

3.1 Minimum Triplet Difference

Nodes in a triplet can be labelled in 3 distinct ways, and there are 4 distinct arrangements for each labelling, making a total of 12 possible labelled triplet patterns, as shown in Figure 2.

Figure 2: Labelled triplet arrangement patterns (patterns A to L)

The nodes in the figure are labelled to suggest how the triplet pattern is assigned to a particular labelled tree. The label […] is assigned to the tree node with the lowest ordinal number (in some domain-specific ordering) of the three triplet nodes. Similarly, the label […] is assigned to the tree node with the highest ordinal number, and the label […] is assigned to the tree node with the intermediate ordinal number. For example, using an alphabetic ordering for the labels in the trees of Figure 1 and considering the triplet ([…], […], […]), label […] would map to […] (the label with the lowest ordering), […] to […], and […] to […] (the highest ordering). Thus this triplet in Tree 1 would match pattern A, and the same triplet in Tree 2 would match pattern J.

The triplet difference between two trees is computed by considering all triplets and counting the number of triplets for which the pattern in the two trees is different. For example, Table 2 lists the triplet patterns for all four of the triplets in the trees of Figure 1. Since three of the triplets have different patterns in the two trees, the triplet difference for these tree arrangements is 3.

Triplet            Tree 1 pattern   Tree 2 pattern
([…], […], […])    A                J
([…], […], […])    A                D
([…], […], […])    A                D
([…], […], […])    G                G

Table 2: Triplet patterns for Tree 1 and Tree 2

Rows: Tree 2 arrangement; columns: Tree 1 arrangement
3 4 4 4 2 4 3 4
3 4 4 4 2 4 3 4
4 3 4 4 4 2 4 3
4 3 4 4 4 2 4 3
3 4 2 4 4 4 3 4
3 4 2 4 4 4 3 4
4 3 4 2 4 4 4 3
4 3 4 2 4 4 4 3
Table 3: Triplet difference matrix for all possible arrangements of Tree 1 and Tree 2

The minimum triplet difference (MTD) algorithm finds an arrangement for each tree that minimises the triplet difference. In principle, the algorithm considers each possible arrangement of each of the two trees, then chooses the pair of arrangements for which the triplet difference is smallest. In general, there may be many pairs of arrangements with the same minimum triplet difference; MTD does not specify which such pair should be chosen. For example, Table 3 lists the triplet difference for each pairing of the 8 possible arrangements of Tree 1 with the 8 possible arrangements of Tree 2. In this case the minimum difference is 2, which is achieved by 8 pairs, of which one is as follows:

[Figure: one pair of arrangements with the minimum triplet difference]

3.2 Maximum Branch Similarity

The maximum branch similarity (MBS) algorithm arranges one tree so that the branches of each internal node have the largest number of leaf nodes in common with the corresponding branches of the equivalent node in the other tree. For example, consider the original arrangements of the trees in Figure 1. The set of leaf nodes comprised by the upper branch of the root node of Tree 1 is {[…]}, and the set comprised by the lower branch is {[…], […], […]}. Similarly, the Tree 2 root node upper branch comprises {[…], […]} and its lower branch {[…], […]}. Thus for this arrangement there are no nodes common to the upper branches, and only one ([…]) common to the lower branches, for a total common node count of 1. However, if the branches of the Tree 2 root node were exchanged, then the upper branches would have 1 common node ([…]) and the lower branches would have 2 common nodes ([…] and […]), for a total common node count of 3. Thus MBS indicates that the root node of Tree 2 should be flipped (its branches swapped), giving the following arrangement:

[Figure: Tree 1 and Tree 2 after flipping the root node of Tree 2]

The algorithm then recursively considers the upper and lower children of the original nodes, ultimately terminating at the leaf nodes. In this simple example, no further swaps occur since the upper branch of Tree 1 is already a leaf, and since flipping the lower branch of Tree 2 would not result in an increase in the number of common nodes (both alternatives have only 1 node in common).

3.3 All-But-n

We have explored a class of algorithms, which we call All-But-n (ABn), that can arrange trees to maximise leaf node alignment in a face-to-face display where the GAS of the two trees is almost as large as the trees themselves (in other words, where the trees differ with respect to just a few nodes).

The simplest situation (AB1) occurs for trees for which the GAS includes all but one node. In this case, the aim of the algorithm is to choose an arrangement for the GAS so that, when the differing node is re-inserted into the tree (which will be in a different position in the two trees), the differing nodes will be aligned. For example, the trees of Figure 1, which have a GAS that excludes the single node […], could be drawn as follows.

[Figure: Tree 1 and Tree 2 arranged so that the differing node is aligned]

AB1 partitions the GAS into three components at the nearest common ancestor (NCA) of the points in the two original trees at which the different node is attached to the GAS.
The component above the NCA (the outer tree) plays no further part in the algorithm. The algorithm proceeds by arranging the upper and lower inner branches of the NCA so that the missing-node attachment point for one tree is on the lower boundary of the upper inner branch, while for the other tree it is on the upper boundary of the lower inner branch. Then, when the two trees are constructed around face-to-face copies of the GAS, the missing node insertion points will coincide.

Since it is always possible to arrange a tree so that any one particular node is on the tree boundary, it is always possible to achieve this arrangement when the GAS is only one node short of the full trees. When more than one node must be pruned (and subsequently reinserted), the situation is more complex; sometimes full alignment can be achieved, but sometimes only partial alignment is possible. A full explanation of the ABn algorithm is beyond the scope of this paper.

4 A Visual Tree Comparison Tool

We have implemented a prototype application for visualising pairs of phylogenetic trees and used it as a vehicle for developing and evaluating our ideas. The application is implemented in Java using the Swing components.

Figure 3 shows the prototype tool displaying two 5-node trees. The program can read standard Nexus-format tree files (David et al. 1997) and display a selected pair of trees. It provides controls for specifying basic parameters of the tree display, including the separation between branches and the depth of each node. The information display area at the bottom of the window provides basic information about the trees and is used largely for debugging.

Figure 3 shows the trees displayed in the raw arrangement specified in the Nexus file; in this example, that arrangement does not make it easy to compare the trees. However, the node connection display (between the two trees), which visually connects common leaf nodes in the two trees, provides some indication of similarities in the trees.
Horizontal connection lines (coloured green in the application) indicate nodes whose vertical position is the same in the two trees. Clearly, if the two trees (or parts of the trees) are identical, then they can be drawn so that all nodes are aligned, in which case the connection display would consist entirely of parallel horizontal lines. In Figure 3, few nodes are aligned (the exception is a group of 3 towards the top of the display). Slanted connection lines (coloured red or yellow depending on whether the position of the node in the left tree is higher or lower than that in the right tree) indicate nodes that are not aligned. However, parallel slanted lines indicate groups of nodes whose relative positions are the same in the two trees, suggesting a similar structure for those groups in the two trees. Figure 3 shows several such groups.
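
The reading of the connection display described above can be sketched as a small classification routine. This is a minimal sketch rather than the prototype's code: it assumes that each tree is reduced to its top-to-bottom list of leaf labels, that leaves are evenly spaced with no inserted gaps, and that the function names `classify_connections` and `parallel_groups` are invented for the example. The mapping of the two slanted cases to red and yellow is not specified here, so the sketch only reports which side is higher.

```python
def classify_connections(left_order, right_order):
    """Classify each common leaf by comparing its row in the two trees.

    Row 0 is the top of the display. Returns a dict mapping each leaf to
    'aligned' (horizontal line), 'left higher' or 'left lower' (slanted lines).
    """
    right_pos = {leaf: row for row, leaf in enumerate(right_order)}
    result = {}
    for row, leaf in enumerate(left_order):
        if leaf not in right_pos:
            continue  # leaf appears in only one tree: no connection line
        other = right_pos[leaf]
        if row == other:
            result[leaf] = "aligned"
        elif row < other:
            result[leaf] = "left higher"
        else:
            result[leaf] = "left lower"
    return result


def parallel_groups(left_order, right_order):
    """Group consecutive left-tree leaves whose connection lines are parallel.

    Lines are parallel when the row offset between the two trees is equal.
    Returns a list of (offset, [leaves]) runs in left-tree order.
    """
    right_pos = {leaf: row for row, leaf in enumerate(right_order)}
    groups = []
    for row, leaf in enumerate(left_order):
        if leaf not in right_pos:
            continue
        offset = right_pos[leaf] - row
        if groups and groups[-1][0] == offset:
            groups[-1][1].append(leaf)
        else:
            groups.append((offset, [leaf]))
    return groups


left = ["A", "B", "C", "D", "E"]    # hypothetical leaf orders
right = ["A", "C", "D", "E", "B"]
print(classify_connections(left, right))
print(parallel_groups(left, right))  # [(0, ['A']), (3, ['B']), (-1, ['C', 'D', 'E'])]
```

In this hypothetical example the run C, D, E shares an offset of -1, which corresponds to a group of parallel slanted lines: the three leaves keep their relative positions in both trees even though none of them is aligned.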

Figure 3: Visualisation tool interface

Figure 4: Collapsing interior nodes

4.1 Using The Application

To rearrange the trees (in order to better compare them), the user can use a combination of manual interaction and automatic rearrangement. The palette on the left of Figure 3 includes tools for interactively modifying the tree appearance, including selecting tree nodes, collapsing selected branches, controlling the spacing between branches of a node, swapping the upper and lower branches of a node, and manually setting branch colours and line thicknesses.

The collapse tool is used to temporarily hide various parts of the tree, as shown in Figure 4. Collapsing nodes enhances visibility, especially for larger trees, because it enables the user to focus on specific parts of the tree while ignoring other parts. Collapsed nodes can subsequently be expanded (and themselves arranged) once their containing structure has been dealt with.

The insert gap and decrease gap tools are used to add or remove space between branches in order to arrange a group of nodes so that they are located at the same level in both of the trees. The flip tool is used to swap the positions of the branches of a given interior node, which allows manual manipulation of the tree arrangement and may provide a simpler view of the tree structures.

The visualisation tool currently implements the MTD and MBS automatic rearrangement algorithms, but not the ABn algorithm. To apply the algorithms, the user selects a branch (or perhaps the entire tree) in both left and right trees, then invokes the desired algorithm. The application computes the new arrangements, then redraws the trees with the selected nodes rearranged.

4.2 Evaluation

Informal evaluation of our prototype visualisation tool has shown that a combination of automatic rearrangement and manual rearrangement is often effective in rapidly generating an arrangement that facilitates tree comparison, even for quite large trees.
For example, Figure 5 shows an arrangement of the trees in Figure 3 for which most nodes are aligned. The arrangement was achieved by a combination of MBS (applied to the whole trees to align high-level structure), MTD (to sort out the tangles indicated by groups of nearly parallel connecting lines), manual node flipping (to fine-tune a few branches), and manual gap insertion (to move relatively aligned groups into absolute alignment).
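
The two automatic rearrangements used in this example, MBS and MTD (Sections 3.2 and 3.1), can be sketched as follows. This is a hedged illustration and not the prototype's Java code. Trees are nested 2-tuples written upper branch first; the leaf names A to D and the two topologies are hypothetical reconstructions of the Figure 1 trees. The MTD sketch is a brute-force search, and it stands in for the 12 labelled patterns of Figure 2 by comparing the ordered subtree that each triplet induces, on the assumption that two drawings share a pattern exactly when those ordered subtrees are identical.

```python
from itertools import combinations

# Hypothetical reconstruction of Figure 1 (upper branch written first).
TREE1 = ("D", (("A", "B"), "C"))
TREE2 = (("A", "B"), ("C", "D"))

def leaves(t):
    """Set of leaf labels below node t."""
    return {t} if isinstance(t, str) else leaves(t[0]) | leaves(t[1])

def mbs(ref, other):
    """Flip nodes of `other` so its branches share more leaves with `ref`."""
    if isinstance(ref, str) or isinstance(other, str):
        return other  # recursion stops when either side is a leaf
    keep = len(leaves(ref[0]) & leaves(other[0])) + len(leaves(ref[1]) & leaves(other[1]))
    swap = len(leaves(ref[0]) & leaves(other[1])) + len(leaves(ref[1]) & leaves(other[0]))
    if swap > keep:  # flip only when it strictly increases the common count
        other = (other[1], other[0])
    return (mbs(ref[0], other[0]), mbs(ref[1], other[1]))

def arrangements(t):
    """All drawings of t: every internal node may be flipped or not."""
    if isinstance(t, str):
        return [t]
    out = []
    for up in arrangements(t[0]):
        for low in arrangements(t[1]):
            out += [(up, low), (low, up)]
    return out

def restrict(t, keep):
    """Ordered subtree induced by the leaves in `keep`."""
    if isinstance(t, str):
        return t if t in keep else None
    kids = [r for r in (restrict(c, keep) for c in t) if r is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)

def triplet_difference(a, b):
    """Number of triplets that are drawn differently in arrangements a and b."""
    return sum(
        restrict(a, set(tr)) != restrict(b, set(tr))
        for tr in combinations(sorted(leaves(a)), 3)
    )

def mtd(t1, t2):
    """Brute-force search over all pairs of arrangements."""
    scored = [(triplet_difference(a, b), a, b)
              for a in arrangements(t1) for b in arrangements(t2)]
    best = min(d for d, _, _ in scored)
    return best, [(a, b) for d, a, b in scored if d == best]

print(mbs(TREE1, TREE2))              # (('C', 'D'), ('A', 'B'))
print(triplet_difference(TREE1, TREE2))  # 3
best, pairs = mtd(TREE1, TREE2)
print(best, len(pairs))               # 2 8
```

Under these assumptions the sketch agrees with the worked examples in Section 3: MBS flips only the root of Tree 2, the original arrangements have a triplet difference of 3, and the exhaustive search finds a minimum of 2 reached by 8 pairs. The exhaustive search visits $2^{n-1} \times 2^{n-1}$ pairs, so it is only practical for very small trees.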

Figure 5: An arrangement with greater alignment

Note, however, that alignment of nodes does not necessarily indicate commonality of structure, although it does make it much easier to see such commonality. Figure 6 shows the same arrangement as Figure 5, but with common leaf-level branches highlighted in colour. The colouring algorithm finds nodes that have the same siblings in both trees, then recursively examines their parents. Note that not all aligned nodes have common structure (although most do) and that not all nodes with common structure are aligned (although most are).

Figure 6: The final presentation

Our current investigations suggest that the combination of alignment (to simplify the display) and colouring (to identify common structures) is a promising way to understand the two trees. We are working with our bioinformaticist colleagues to validate and further develop our ideas and to determine whether interactive visualisation is a viable technique for data of this kind.

5 Conclusion

Information visualisation can play a major role in the analysis of phylogenetic data by allowing geneticists to visually compare and therefore better understand their data. We have developed and are in the process of evaluating a prototype tool that domain specialists who deal with phylogenies can use to help understand the data that they confront. Although we have not yet tested this, we believe that our ideas will also be of value in other domains where similarly structured data is used, and where comparisons are key to understanding the implications of that data.

Acknowledgements

We gratefully acknowledge the contribution of Rejmond Sejic, who built an early version of the prototype tool and implemented the MTD algorithm as part of his Honours project (Sejic 2004). Thank you also to our School of Biological Sciences colleagues Dr Cathy Abbott and Assoc. Prof. Mike Schwarz for their valuable insights and bioinformatics expertise.

References

Bryant, D. (1997): Building Trees, Hunting for Trees and Comparing Trees. Ph.D. Thesis, University of Canterbury.

Carrizo, S. F. (2004): Phylogenetic Trees: An Information Visualisation Perspective. In Proc. 2nd Asia-Pacific Bioinformatics Conference (APBC2004), Dunedin, New Zealand, Australian Computer Society, Inc.

David, R. M., David, L. S., and Wayne, P. M. (1997): NEXUS: An Extensible File Format for Systematic Information. Systematic Biology, 46(4):590-621.

Farris, J. S. (1983): The logical basis of phylogenetic analysis. In Advances in Cladistics, Platnick N.I. & Funk V.A., eds, pp. 1-36. Columbia Uni. Press, New York.

Felsenstein, J. (accessed 27/1/25): PHYLIP web site. http://evolution.genetics.washington.edu/phylip.html

Felsenstein, J., Sawyer, S., and Kochin, R. (1982): An efficient method for matching nucleic acid sequences. Nucleic Acids Research 10(1):133-139.

Gibas, C. and Jambeck, P. (2001): Bioinformatics Computer Skills. O'Reilly, USA.

Klinger, J. and Amenta, N. (2002): Case Study: Visualizing Sets of Evolutionary Trees. In Proc. IEEE Symposium on Information Visualization, Boston, Massachusetts, USA.

Maddison, W. P. and Maddison, D. R. (2005): Mesquite: a modular system for evolutionary analysis. Version 1.6 http://mesquiteproject.org

Munzner, T., Guimbretiere, F., Tasiran, S., Zhang, L. and Zhou, Y. (2003): TreeJuxtaposer: Scalable Tree Comparison Using Focus+Context with Guaranteed Visibility. In Proc. SIGGRAPH 2003.

Robinson, D. F. (1971): Comparison of labeled trees with valency three. J. Combin. Theory 11:105-119.

Roderic, D. M. (1993): Component 2. User Guide, http://taxonomy.zoology.gla.ac.uk/rod/cplite/manual.html (last accessed 8/8/25).

Sejic, R. (2004): Visual Comparison of Phylogenetic Trees. Honours Thesis, Flinders University of South Australia.

Thorup, M. and Farach, M. (1994): Fast Comparison of Evolutionary Trees. In Proc. 5th Annual ACM-SIAM Symposium on Discrete Algorithms.