Exploring Treespace
Katherine St. John
Lehman College & the Graduate Center, City University of New York
20 June 2011
(Joint work with the Treespace Working Group, CUNY: Ann Marie Alcocer, Kadian Brown, Alan Joseph Caceres, Juan Castillo, Samantha Daley, John De Jesus, Eric Ford, Kaitlin Hansen, Michael Hintze, Daniele Ippolito, Jinnie Lee, Oliver Mendez, & Diquan Moore)

Phylogenetic Trees
Leaves represent extant (living) species.
Internal nodes represent extinct species.
If rooted, the root represents the ancestor of all the species.
[Figure: tree sketch, Charles Darwin, 1837]

Some Uses of Evolutionary Trees
  classifying species
  building the Tree of Life
  designing the flu vaccine and other drugs
  determining the origins of HIV infection
[Figure: David Hillis, 2002]

How Many Phylogenetic Trees?
Semple & Steel 2003: the number of unrooted binary trees on $n$ leaves is
$$\#\text{ of trees} = 1 \cdot 3 \cdot 5 \cdots (2n-5) = (2n-5)!! \sim \frac{1}{\sqrt{\pi}}\, 2^{n-2}\, n!\, n^{-5/2} \sim \frac{1}{2\sqrt{2}} \left(\frac{2}{e}\right)^{n} n^{n-2}$$
(For $n \approx 50$, there are more possible tree topologies than there are atoms in the universe.)
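
The count and its Stirling-type estimate can be checked numerically. The following is a minimal sketch, not part of the talk; the function names `num_unrooted_binary_trees` and `stirling_estimate` are made up for illustration, and the estimate is the $\frac{1}{2\sqrt{2}}(2/e)^n n^{n-2}$ form of the asymptotic formula.

```python
import math

def num_unrooted_binary_trees(n: int) -> int:
    """Exact count (2n-5)!! = 1*3*5*...*(2n-5) for n >= 3 labelled leaves."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    return math.prod(range(1, 2 * n - 4, 2))

def stirling_estimate(n: int) -> float:
    """Asymptotic estimate (1 / (2*sqrt(2))) * (2/e)^n * n^(n-2)."""
    return (2 / math.e) ** n * n ** (n - 2) / (2 * math.sqrt(2))

# Print n, the exact count, and the ratio estimate / exact.
for n in (5, 10, 20, 50):
    exact = num_unrooted_binary_trees(n)
    print(n, exact, f"{stirling_estimate(n) / exact:.3f}")
```

The ratio in the last column should approach 1 as $n$ grows, which is all that the $\sim$ relation asserts.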

Searching for Optimal Trees
Local search techniques prevail:
  At each step, choose the next tree from its neighbors.
  There are many different ways to define neighbors.
  Most rely on an underlying metric between trees.
[Figure: NNI treespace for n = 6]

Popular Metrics
Focus on three popular metrics:
  Nearest Neighbor Interchange (NNI)
  Subtree Prune and Regraft (SPR)
  Tree Bisection and Reconnection (TBR)
Note that $\mathrm{NNI} \subseteq \mathrm{SPR} \subseteq \mathrm{TBR}$: every NNI move is an SPR move, and every SPR move is a TBR move.
[Figure: example tree on leaves A to G]

NNI Metric
The NNI distance between two trees is the minimal number of NNI moves needed to transform one into the other (NP-hard, DasGupta et al. 1997).
[Figure: NNI example on two trees with leaves A to G]

SPR Distance
SPR distance is the minimal number of SPR moves that transforms one tree into the other.
SPR for rooted trees is NP-hard (Bordewich & Semple 2005).
SPR for unrooted trees is NP-hard (Hickey et al. 2008).
[Figure: SPR example on three trees with leaves A to G]

TBR Distance
TBR distance is the minimal number of TBR moves that transforms one tree into the other.
TBR for unrooted trees is NP-hard (Allen & Steel 2001).
[Figure: TBR example on four trees with leaves A to G]

Treespace
For every n, treespace is the space of all phylogenetic trees on n taxa, under a fixed metric.
The size of neighborhoods varies by metric (Allen & Steel, 2001):

  Metric   General                n = 5
  NNI      $2n - 6$               4
  SPR      $2(n-3)(2n-7)$         12
  TBR      $< (2n-3)(n-3)^2$      12

[Figure: treespace for n = 5 under NNI (Bastert et al., 2002)]
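
For small n, NNI treespace can be built explicitly and the neighborhood size checked. The sketch below is an illustration under stated assumptions, not the construction used in the talk: each unrooted binary tree is viewed as hanging from leaf 0 and stored as the set of its clusters (for every edge, the leaves on the side away from leaf 0), and two binary trees are taken to be NNI neighbors exactly when they differ in one cluster each. The helper names `insert_leaf`, `all_trees`, and `nni_graph` are hypothetical.

```python
from itertools import combinations

def insert_leaf(tree, k):
    """Yield each tree made by attaching new leaf k to one edge of `tree`."""
    for s in tree:  # the cluster s identifies the edge being subdivided
        new = {frozenset([k]), s, s | {k}}
        for x in tree:
            if x != s:
                # clusters strictly containing s lie above the edge and gain k
                new.add(x | {k} if x > s else x)
        yield frozenset(new)

def all_trees(n):
    """All unrooted binary trees on leaves 0..n-1 (n >= 3) as cluster sets."""
    trees = [frozenset({frozenset([1]), frozenset([2]), frozenset([1, 2])})]
    for k in range(3, n):
        trees = [t for tree in trees for t in insert_leaf(tree, k)]
    return trees

def nni_graph(trees):
    """Adjacency sets: i ~ j when the trees differ in exactly one cluster each."""
    adj = {i: set() for i in range(len(trees))}
    for i, j in combinations(range(len(trees)), 2):
        if len(trees[i] ^ trees[j]) == 2:
            adj[i].add(j)
            adj[j].add(i)
    return adj

n = 5
trees = all_trees(n)
adj = nni_graph(trees)
# Number of trees, and the set of neighborhood sizes that occur.
print(len(trees), {len(nbrs) for nbrs in adj.values()})
```

For n = 5 this gives the 15 trees of the figure, each with $2n - 6 = 4$ NNI neighbors, in line with the first row of the table.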

Bryant's Challenge: Walking Through Trees
Combinatorial Challenge I:
An NNI-walk is a sequence $T_1, T_2, \ldots, T_k$ of unrooted binary phylogenetic trees where each consecutive pair of trees differs by a single NNI.
What is the shortest NNI-walk that passes through all binary trees on n leaves?
Suppose we are given a tree T. What is the shortest NNI-walk that passes through all the trees that lie at most one SPR (subtree prune and regraft) move from T?
[Photo: David Bryant]

Hamiltonicity
Recall that a graph is Hamiltonian if there exists a path that visits every node exactly once.
Caceres, Daley, De Jesus, Hintze, Moore, & St. John, 2011:
  Theorem: For all n, treespace is Hamiltonian for the SPR and TBR metrics.
  Theorem: For $n \le 7$, treespace is Hamiltonian for the NNI metric.
  Open: What is the shortest path that visits every tree in NNI treespace for $n \ge 8$?
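
For the smallest interesting case, a Hamiltonian path in NNI treespace can be found by brute force. The sketch below is only an illustration and is not the inductive argument of the theorem: it encodes each tree on leaves 0..4 as a set of clusters (for every edge, the leaves on the side away from leaf 0), assumes two binary trees are NNI neighbors exactly when they differ in one cluster each, and runs a plain backtracking search. All function names are hypothetical.

```python
from itertools import combinations

def insert_leaf(tree, k):
    """Yield each tree made by attaching new leaf k to one edge of `tree`."""
    for s in tree:
        new = {frozenset([k]), s, s | {k}}
        for x in tree:
            if x != s:
                new.add(x | {k} if x > s else x)
        yield frozenset(new)

def all_trees(n):
    """All unrooted binary trees on leaves 0..n-1 (n >= 3) as cluster sets."""
    trees = [frozenset({frozenset([1]), frozenset([2]), frozenset([1, 2])})]
    for k in range(3, n):
        trees = [t for tree in trees for t in insert_leaf(tree, k)]
    return trees

def nni_graph(trees):
    """Adjacency sets: i ~ j when the trees differ in exactly one cluster each."""
    adj = {i: set() for i in range(len(trees))}
    for i, j in combinations(range(len(trees)), 2):
        if len(trees[i] ^ trees[j]) == 2:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def hamiltonian_path(adj):
    """Backtracking search; returns a list of nodes or None if no path exists."""
    total = len(adj)

    def extend(path, seen):
        if len(path) == total:
            return True
        for v in sorted(adj[path[-1]]):
            if v not in seen:
                seen.add(v)
                path.append(v)
                if extend(path, seen):
                    return True
                seen.discard(v)
                path.pop()
        return False

    for start in adj:
        path = [start]
        if extend(path, {start}):
            return path
    return None

adj = nni_graph(all_trees(5))
path = hamiltonian_path(adj)
print(path)
```

Backtracking is exponential in the worst case, so this is only practical for very small n; the point is to make the definition concrete on the 15-tree space.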

Proof Sketch of Walks of Treespace
By induction on n.
Assume there is a path for treespace of n-taxa trees, and build from it a path for treespace of (n + 1)-taxa trees.
Expand each n-taxa tree into the $2n - 3$ (n + 1)-taxa trees obtained by attaching the new leaf to one of its $2n - 3$ edges.
For SPR (and TBR), there is a Hamiltonian path in each such group. (Not true for NNI: only a 2-walk.)
Glue the paths together to make a path for the whole space.

SPR Neighborhoods
Bryant's Conjecture: What is the shortest NNI-walk on an SPR-neighborhood?
[Figure: an SPR-neighborhood for n = 6]
Theorem: For $n \ge 6$, there exist SPR-neighborhoods where the NNI-walk cannot be a Hamiltonian path.
Proof Idea: The SPR-neighborhoods of caterpillar trees all have 4 isolated triangles.

SPR Neighborhoods
Bryant's Conjecture: What is the shortest NNI-walk on an SPR-neighborhood?
[Figure: the SPR-neighborhoods for n = 7, for the caterpillar tree and for the other tree topology]
The shortest walks are 8 and 6 steps longer than a Hamiltonian path.

Summary
Caceres, Daley, De Jesus, Hintze, Moore, & St. John, 2011:
  Theorem: For all n, treespace is Hamiltonian for the SPR and TBR metrics.
  Theorem: For $n \le 7$, treespace is Hamiltonian for the NNI metric.
  Open: What is the shortest path that visits every tree in NNI treespace for $n \ge 8$?
  Theorem: For $n \ge 6$, the minimal NNI-walk of an SPR-neighborhood is not Hamiltonian.
  Open: What is the shortest NNI-walk of an SPR-neighborhood?
[Figure: NNI treespace for n = 6]

Treespace Working Group
A team of undergraduate students in mathematics and computer science contributed to this work: Ann Marie Alcocer, Kadian Brown, Alan Caceres, Samantha Daley, John De Jesus, Kaitlin Hansen, Michael Hintze, Daniele Ippolito, Jinnie Lee, Oliver Mendez, and Diquan Moore.

Acknowledgments
The organizers and the Isaac Newton Institute
The US National Science Foundation for their generous support
The New York Louis Stokes Alliance for Minority Participation in Research for student funding
And last, but not least, Sean