Maximum Agreement Subtrees

Maximum Agreement Subtrees
Seth Sullivant, North Carolina State University
March 24, 2018

Phylogenetics
Problem. Given a collection of species, find the tree that explains their history.
[Figure from Yeates, Meier, Wiegman, 2015.]

Rooted Binary $X$-Trees
Definition. A rooted tree $T$ has a distinguished vertex $\rho$, the root. A rooted binary phylogenetic $X$-tree $T$ is a binary tree with a distinguished root vertex whose leaves are labeled by $X$.
[Figure: a rooted binary phylogenetic tree with leaves labeled 1, 6, 4, 2, 5, 7, 8, 3.]
In phylogenetics, we only have access to data on extant (not extinct) species. We have no data or information about the species corresponding to internal nodes of the tree.

Induced subtrees
Let $X$ be a label set with $n = |X|$, and let $T$ be a binary rooted phylogenetic $X$-tree. Given $S \subseteq X$, $T|_S$ is the binary restriction tree: keep only the leaves in $S$ and suppress any vertex left with a single child.
[Figure: a tree with leaves 3, 4, 6, 8, 1, 2, 5, 7 and its restriction to the leaves 3, 2, 5.]
Definition. Given binary rooted phylogenetic $X$-trees $T_1$ and $T_2$,
$$\mathrm{MAST}(T_1, T_2) = \max\{\#S : S \subseteq X \text{ and } T_1|_S = T_2|_S\}.$$
This is the size of a maximum agreement subtree.
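The definition can be checked directly on small trees. The sketch below is only an illustration under assumptions that are not in the slides: a leaf is encoded as an integer label, an internal node as a 2-tuple of subtrees, and the helper names (`leaf_set`, `restrict`, `canonical`, `mast_brute_force`) are hypothetical. It tries every subset $S$, so it is exponential in $n$.

```python
from itertools import combinations

# Assumed encoding (not from the slides): a leaf is an int label,
# an internal node is a 2-tuple (left, right) of subtrees.

def leaf_set(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return {tree}
    return leaf_set(tree[0]) | leaf_set(tree[1])

def restrict(tree, keep):
    """Return T|_S: drop leaves outside `keep`, suppress one-child vertices."""
    if not isinstance(tree, tuple):
        return tree if tree in keep else None
    left = restrict(tree[0], keep)
    right = restrict(tree[1], keep)
    if left is None:
        return right
    if right is None:
        return left
    return (left, right)

def canonical(tree):
    """Forget the left/right order of children so equal trees compare equal."""
    if not isinstance(tree, tuple):
        return tree
    return frozenset((canonical(tree[0]), canonical(tree[1])))

def mast_brute_force(t1, t2):
    """Largest #S with T1|_S = T2|_S; assumes both trees share one leaf set."""
    labels = sorted(leaf_set(t1))
    for size in range(len(labels), 0, -1):
        for subset in combinations(labels, size):
            keep = set(subset)
            if canonical(restrict(t1, keep)) == canonical(restrict(t2, keep)):
                return size
    return 0

comb_tree = (((1, 2), 3), 4)       # hypothetical example trees
balanced_tree = ((1, 2), (3, 4))
print(mast_brute_force(comb_tree, balanced_tree))
```

Because any two leaves always restrict to the same two-leaf tree, the result is at least 2 whenever $n \ge 2$.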

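Example
[Figure: two trees $T_1$ and $T_2$ on the leaves 1 through 8 (leaf orders as drawn: 3, 4, 6, 8, 1, 2, 5, 7 and 1, 6, 4, 2, 5, 7, 8, 3), together with three-leaf agreement subtrees on the leaf sets {3, 2, 5}, {6, 5, 7}, and {4, 2, 7}.]
$\mathrm{MAST}(T_1, T_2) = 3$.
Theorem (Steel-Warnow 1993). There is an $O(n^2)$ algorithm to compute $\mathrm{MAST}(T_1, T_2)$ for binary rooted phylogenetic $X$-trees.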
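A minimal sketch of a dynamic program over pairs of subtrees is given below. It uses the textbook recursion for rooted binary trees with memoization; it is not claimed to be the exact Steel-Warnow procedure or to achieve the $O(n^2)$ bound. The nested-tuple encoding (integer leaves, 2-tuples for internal nodes) and the function names are assumptions made for illustration.

```python
from functools import lru_cache

# Assumed encoding (not from the slides): a leaf is an int label,
# an internal node is a 2-tuple (left, right) of subtrees.

def leaves(tree):
    """Return the frozenset of leaf labels below `tree`."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])

def mast(t1, t2):
    """Size of a maximum agreement subtree of two rooted binary trees."""

    @lru_cache(maxsize=None)
    def go(a, b):
        # If either side is a single leaf, at most one leaf can be shared.
        if not isinstance(a, tuple) or not isinstance(b, tuple):
            return 1 if leaves(a) & leaves(b) else 0
        a1, a2 = a
        b1, b2 = b
        return max(
            go(a1, b1) + go(a2, b2),   # match the children straight across
            go(a1, b2) + go(a2, b1),   # match the children crosswise
            go(a1, b), go(a2, b),      # agreement lies inside one child of a
            go(a, b1), go(a, b2),      # agreement lies inside one child of b
        )

    return go(t1, t2)

comb_tree = (((1, 2), 3), 4)       # hypothetical example trees
balanced_tree = ((1, 2), (3, 4))
print(mast(comb_tree, balanced_tree))
```

The first two options in the `max` are the ones that reappear in the inequality used for the Yule-Harding lower bound.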

What is the distribution of $\mathrm{MAST}(T_1, T_2)$?
Problem. Determine the distribution of $\mathrm{MAST}(T_1, T_2)$ for reasonably nice probability distributions on rooted binary trees, such as the uniform distribution and the Yule-Harding distribution.
Remark. Simulations [Bryant-McKenzie-Steel 2003] suggest that under both the uniform distribution and the Yule-Harding distribution, $E[\mathrm{MAST}(T_1, T_2)] \approx c\sqrt{n}$ where $n = |X|$, for some constant $c$ depending on the distribution.

Motivation: Comparing New Phylogenetic Methods
Suppose we come up with a new phylogenetic method. This method takes a data set $D$ and constructs the tree $M(D)$. If we know the correct tree $T$, we can evaluate the method by computing $\mathrm{MAST}(T, M(D))$. If $\mathrm{MAST}(T, M(D))$ is consistently small (for lots of different $D$), then we conclude that the new method does not work well.
How small is small? Is it smaller than what you would expect to see by random chance? We need to know the distribution of $\mathrm{MAST}(T, T')$.

Motivation: Cospeciation
Let $T_H$ be a phylogenetic tree of host species and $T_P$ a phylogenetic tree of parasite species. Hosts and parasites are paired, so $T_H$ and $T_P$ have the same label set.
If $\mathrm{MAST}(T_H, T_P)$ is large, reject the hypothesis that $T_H$ and $T_P$ evolved independently; that is, a large $\mathrm{MAST}(T_H, T_P)$ suggests cospeciation.
To perform the hypothesis test, we need the distribution of $\mathrm{MAST}(T_1, T_2)$ for random trees under the null hypothesis of independence.
[Figure from Hafner, M.S., Nadler, S.A. (1988) Nature 332: 258-259.]

Motivation: Cool Math
Suppose that both $T_1$ and $T_2$ are comb trees.
[Figure: two comb trees on 9 leaves, one with leaves labeled $1, 2, \ldots, 9$ and the other with leaves labeled $w_1, w_2, \ldots, w_9$.]
A maximum agreement subtree corresponds to a longest increasing subsequence of the permutation $w = w_1 w_2 \cdots w_9$, whose length is denoted $L(w)$.
$\mathrm{MAST}(T_1, T_2)$ for uniformly random comb trees is equivalent to $L(w)$ for uniformly random permutations $w \in S_n$.
Theorem (Baik-Deift-Johansson 1999).
$$E[L(w)] = 2\sqrt{n} - c\,n^{1/6} + o(n^{1/6}), \qquad c \approx 1.77108,$$
$$\frac{L(w) - 2\sqrt{n}}{n^{1/6}} \to \text{Tracy-Widom random variable}.$$
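For experimenting with the comb-tree case, $L(w)$ can be computed quickly. The sketch below uses the standard patience-sorting technique with binary search (a swapped-in textbook method, not something taken from the talk); the function names and the Monte Carlo helper are hypothetical. Note that the correspondence with MAST may be off by a small additive amount, because the two leaves of the bottom cherry of a comb tree are interchangeable.

```python
import random
from bisect import bisect_left

def lis_length(seq):
    """Length of a longest strictly increasing subsequence (patience sorting)."""
    tails = []  # tails[i] = smallest last value of an increasing run of length i+1
    for value in seq:
        i = bisect_left(tails, value)
        if i == len(tails):
            tails.append(value)
        else:
            tails[i] = value
    return len(tails)

def mean_lis(n, trials, seed=0):
    """Monte Carlo estimate of E[L(w)] for uniformly random w in S_n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        w = list(range(1, n + 1))
        rng.shuffle(w)
        total += lis_length(w)
    return total / trials

# Compare a simulated mean with the leading term 2*sqrt(n).
n = 400
print(mean_lis(n, trials=200), 2 * n ** 0.5)
```

For moderate $n$ the simulated mean sits noticeably below $2\sqrt{n}$, in line with the negative $n^{1/6}$ correction term in the theorem.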

Random Trees
Biologists are interested in models for random trees as models for speciation processes.
Uniform distribution: Select a tree uniformly from all $(2n-3)!!$ rooted binary phylogenetic trees.
Yule-Harding distribution: Grow a random tree by successively splitting leaves selected uniformly at random, then apply leaf labels at random.
Other models: β-splitting model, α-splitting model, etc.
[Figure: a tree with leaves labeled 1, 5, 3, 4, 2.]
Question. How well do the different random tree models match the shape and structure of phylogenetic trees occurring in nature?
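Both models are easy to sample from. The following is a rough sketch under assumptions not found in the slides: trees are nested tuples with integer leaves, and all function names are hypothetical. The uniform sampler uses the standard fact that attaching leaf $m$ to a uniformly chosen edge (counting an edge above the root) of a tree on $m-1$ leaves reaches each of the $(2n-3)!!$ trees exactly once.

```python
import random

def count_rooted_binary_trees(n):
    """(2n-3)!!, the number of rooted binary phylogenetic trees on n leaves."""
    result = 1
    for odd in range(1, 2 * n - 2, 2):
        result *= odd
    return result

def count_nodes(tree):
    if not isinstance(tree, tuple):
        return 1
    return 1 + count_nodes(tree[0]) + count_nodes(tree[1])

def insert_above(tree, target, leaf):
    """Attach `leaf` on the edge above the node with preorder index `target`."""
    if target == 0:
        return (tree, leaf)
    left, right = tree
    left_size = count_nodes(left)
    if target <= left_size:
        return (insert_above(left, target - 1, leaf), right)
    return (left, insert_above(right, target - 1 - left_size, leaf))

def uniform_tree(n, rng):
    """Uniform rooted binary tree on leaves 1..n via random edge insertion."""
    tree = 1
    for leaf in range(2, n + 1):
        tree = insert_above(tree, rng.randrange(2 * (leaf - 1) - 1), leaf)
    return tree

def split_leaf(tree, target, new_leaf):
    """Replace the leaf `target` by the cherry (target, new_leaf)."""
    if not isinstance(tree, tuple):
        return (tree, new_leaf) if tree == target else tree
    return (split_leaf(tree[0], target, new_leaf),
            split_leaf(tree[1], target, new_leaf))

def relabel(tree, mapping):
    if not isinstance(tree, tuple):
        return mapping[tree]
    return (relabel(tree[0], mapping), relabel(tree[1], mapping))

def yule_harding_tree(n, rng):
    """Split uniformly chosen leaves, then apply leaf labels at random."""
    tree = 1
    for new_leaf in range(2, n + 1):
        tree = split_leaf(tree, rng.randrange(1, new_leaf), new_leaf)
    labels = list(range(1, n + 1))
    rng.shuffle(labels)
    return relabel(tree, dict(zip(range(1, n + 1), labels)))

def leaf_labels(tree):
    if not isinstance(tree, tuple):
        return [tree]
    return leaf_labels(tree[0]) + leaf_labels(tree[1])

rng = random.Random(2018)
print(count_rooted_binary_trees(5))
print(uniform_tree(5, rng))
print(yule_harding_tree(5, rng))
```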

Properties of Random Trees
Proposition. Both Yule-Harding and uniform random trees satisfy exchangeability and sampling consistency.
Exchangeability: relabeling the leaves does not change the probability of a tree. [Figure: $P(T) = P(T')$ for a tree $T$ with leaves labeled 1, 2, 3, 4, 5 and the same tree $T'$ with leaves relabeled 1, 5, 3, 4, 2.]
Sampling consistency: if $T$ is a random tree and $S \subseteq X$, then $T|_S$ is a random tree from the same distribution on the leaf label set $S$.
Theorem (Aldous). The expected depth of a uniformly random tree is $\Theta(\sqrt{n})$. The expected depth of a Yule-Harding random tree is $\Theta(\log n)$.

Conjecture About The Maximum Agreement Subtree
Conjecture. For any exchangeable, sampling consistent distribution on rooted binary phylogenetic $X$-trees, $E[\mathrm{MAST}(T_1, T_2)] = \Theta(\sqrt{n})$ where $n = |X|$.
Recall that $f(n) = \Theta(\sqrt{n})$ means that there are positive constants $c$ and $C$ such that $c\sqrt{n} \le f(n) \le C\sqrt{n}$ for all sufficiently large $n$. Note that the constants $c$ and $C$ might depend on the probability distribution.
We hope further to show that, asymptotically, $E[\mathrm{MAST}(T_1, T_2)] \sim d\sqrt{n}$ for some $d$ (depending on the distribution) as $n \to \infty$.

Upper bounds
Theorem (BHLSSS: Bernstein, Ho, Long, Steel, St. John, Sullivant). For any exchangeable, sampling consistent distribution on rooted binary phylogenetic trees, $E[\mathrm{MAST}(T_1, T_2)] = O(\sqrt{n})$.
Proof sketch for the uniform distribution. For $S \subseteq X$ let $X_S = 1$ if $T_1|_S = T_2|_S$, and $X_S = 0$ otherwise. Let
$$Y_{n,k} = \sum_{S \subseteq X,\ \#S = k} X_S = \text{number of agreement sets of size } k.$$
Then
$$E[Y_{n,k}] = \binom{n}{k} P(X_S = 1) = \binom{n}{k} \frac{1}{(2k-3)!!} \to 0 \quad \text{if } k > c\sqrt{n}.$$
Since $E[Y_{n,k}] \to 0$ as $n$ grows, $P(\mathrm{MAST}(T_1, T_2) > c\sqrt{n})$ goes to 0.
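The first-moment quantity in the proof sketch can be evaluated exactly for small $n$. The sketch below is an illustration with hypothetical function names; it finds the first $k$ at which $E[Y_{n,k}]$ drops below 1, which gives a sense of where the $c\sqrt{n}$ cutoff sits. It makes no claim about the constant used in the BHLSSS paper.

```python
from fractions import Fraction
from math import comb

def odd_double_factorial(m):
    """m!! for odd m >= -1, with the convention (-1)!! = 1!! = 1."""
    result = 1
    for odd in range(3, m + 1, 2):
        result *= odd
    return result

def expected_agreement_sets(n, k):
    """E[Y_{n,k}] = C(n,k) / (2k-3)!! under the uniform distribution."""
    return Fraction(comb(n, k), odd_double_factorial(2 * k - 3))

def first_moment_cutoff(n):
    """Smallest k >= 2 with E[Y_{n,k}] < 1, or None if there is none."""
    for k in range(2, n + 1):
        if expected_agreement_sets(n, k) < 1:
            return k
    return None

for n in (100, 400, 1600):
    k = first_moment_cutoff(n)
    print(n, k, k / n ** 0.5)
```

Since $P(\mathrm{MAST}(T_1, T_2) \ge k) \le E[Y_{n,k}]$ by Markov's inequality, the printed ratio $k/\sqrt{n}$ gives a rough numerical feel for the constant in the upper bound.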

Lower Bounds: Uniform Distribution
Theorem (BHLSSS). Under the uniform distribution on trees, $E[\mathrm{MAST}(T_1, T_2)] = \Omega(n^{1/8})$.
Proof sketch. The expected depth of a uniform random tree is $\Theta(n^{1/2})$. So with high probability there is a subset $S \subseteq X$ such that $T_1|_S$ is a comb tree of size $c\,n^{1/2}$.
Similarly, with high probability there is a subset $S' \subseteq S$ with $\#S' = \Theta(n^{1/4})$ such that $T_1|_{S'}$ and $T_2|_{S'}$ are both comb trees.
By sampling consistency, $T_1|_{S'}$ and $T_2|_{S'}$ are uniformly random comb trees with $\Theta(n^{1/4})$ leaves.
By Baik-Deift-Johansson, $T_1|_{S'}$ and $T_2|_{S'}$ have an agreement subtree of expected size $\Theta(n^{1/8})$.

Lower Bounds: Yule-Harding Distribution
Theorem (BHLSSS). Under the Yule-Harding distribution on trees, $E[\mathrm{MAST}(T_1, T_2)] = \Omega(n^{\alpha})$, where $\alpha$ is the positive root of $2^{2-\alpha} = (\alpha+1)(\alpha+2)$ ($\alpha \approx 0.344$).
From the Steel-Warnow algorithm, we see that for trees $T_1$ and $T_2$ of the following shapes,
[Figure: $T_1$ with root subtrees $A$ and $B$; $T_2$ with root subtrees $C$ and $D$.]
$$\mathrm{MAST}(T_1, T_2) \ge \max\big(\mathrm{MAST}(A, C) + \mathrm{MAST}(B, D),\ \mathrm{MAST}(A, D) + \mathrm{MAST}(B, C)\big).$$
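The exponent $\alpha$ can be recovered numerically from the stated equation. The sketch below uses plain bisection on $[0, 1]$; the function names, the bracket, and the tolerance are choices made here for illustration.

```python
def exponent_equation(alpha):
    """2^(2 - alpha) - (alpha + 1)(alpha + 2); zero at the exponent alpha."""
    return 2.0 ** (2.0 - alpha) - (alpha + 1.0) * (alpha + 2.0)

def positive_root(lo=0.0, hi=1.0, tol=1e-12):
    """Bisection; the function is positive at 0 (value 2) and negative at 1 (value -4)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if exponent_equation(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

alpha = positive_root()
print(round(alpha, 4))
```

The left side of the equation decreases in $\alpha$ while the right side increases for $\alpha \ge 0$, so the positive root is unique and bisection finds it.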

Lower Bounds: Fixed Tree Shape
Theorem (Misra-S.). Let $T_1$ and $T_2$ be uniformly random trees with the same tree shape with $n$ leaves. Then $E[\mathrm{MAST}(T_1, T_2)] = \Theta(\sqrt{n})$.
The idea comes from random comb trees and the connection to longest increasing subsequences.
[Figure: two comb trees on 9 leaves, one with leaves labeled $1, 2, \ldots, 9$ and the other with leaves labeled $w_1, w_2, \ldots, w_9$.]

The Simplest Proof of the $\Omega(\sqrt{n})$ Lower Bound
Let $w_1 w_2 \cdots w_n$ be a uniformly random permutation. Break it into blocks of length $k$:
$$B_1 B_2 \cdots B_{n/k} = (w_1 \cdots w_k)\,(w_{k+1} \cdots w_{2k}) \cdots (w_{n-k+1} \cdots w_n).$$
Call block $B_i$ awesome if one of the numbers $(i-1)k+1, \ldots, ik$ appears in that block.
Note that the awesome blocks give an increasing subsequence (but probably not the longest).
Example: 1 5 6 11 8 9 2 3 16 15 13 4 7 14 10 12. Here $n = 16$; taking $k = 4$, blocks $B_1$, $B_2$, and $B_4$ are awesome and give the increasing subsequence 1, 8, 14.
So if we can get a lower bound on the expected number of awesome blocks, that will give a lower bound on the length of the longest increasing subsequence.

A block $B_i$ is awesome if one of the numbers $(i-1)k+1, \ldots, ik$ appears in that block. The probability that $B_i$ is awesome is approximately
$$1 - \left(1 - \frac{k}{n}\right)^{k}.$$
The expected number of awesome blocks is then approximately
$$\frac{n}{k}\left(1 - \left(1 - \frac{k}{n}\right)^{k}\right).$$
Taking $k = \sqrt{n}$, the expected number of awesome blocks is
$$\sqrt{n}\left(1 - \left(1 - \frac{1}{\sqrt{n}}\right)^{\sqrt{n}}\right) \approx (1 - e^{-1})\sqrt{n}.$$
Proposition. The expected length of the longest increasing subsequence of a uniformly random permutation is bounded below by $(1 - e^{-1})\sqrt{n}$.
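The block argument is simple enough to run. The sketch below, with hypothetical function names, extracts one witness value from each awesome block (these witnesses form an increasing subsequence) and evaluates the approximate expected count from the formulas above. It assumes $k$ divides $n$ and ignores any leftover entries otherwise.

```python
def awesome_witnesses(perm, k):
    """Return one value from each awesome block, in block order.

    Block i (1-based) holds positions (i-1)k+1..ik and is awesome if it
    contains a value in the same range (i-1)k+1..ik.
    """
    witnesses = []
    for i in range(len(perm) // k):
        block = perm[i * k:(i + 1) * k]
        hits = [v for v in block if i * k + 1 <= v <= (i + 1) * k]
        if hits:
            witnesses.append(hits[0])
    return witnesses

def approx_expected_awesome(n, k):
    """(n/k) * (1 - (1 - k/n)^k), the approximation used in the argument."""
    return (n / k) * (1.0 - (1.0 - k / n) ** k)

# The permutation shown in the talk's example, with n = 16 and k = 4.
perm = [1, 5, 6, 11, 8, 9, 2, 3, 16, 15, 13, 4, 7, 14, 10, 12]
print(awesome_witnesses(perm, 4))
print(approx_expected_awesome(16, 4))
```

The witnesses increase automatically: a witness from block $i$ is at most $ik$, while any witness from a later block is at least $ik + 1$ and sits at a later position.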

Extending These Ideas for Trees of Same Shape
Proposition. The leaf set of any tree on $n$ leaves can be grouped into at least $\frac{n}{4k}$ blobs of size between $k$ and $2k-2$.
The blobs yield a scaffold tree which can force a structure for certain agreement subtrees between two trees of the same shape.
[Figure: a tree with $n = 17$ leaves labeled 1 through 17, grouped into blobs with $k = 3$, and the resulting scaffold tree.]

Let $T_1, T_2$ be two random trees with the same shape, with corresponding blobs $B_1(T_i), \ldots, B_{n/4k}(T_i)$ for $i = 1, 2$.
Call a blob $B_j$ awesome if $B_j(T_1) \cap B_j(T_2) \neq \emptyset$. The expected number of awesome blobs is at least
$$\frac{n}{4k}\left(1 - \left(1 - \frac{k}{n}\right)^{k}\right).$$
Awesome blobs give an agreement subtree between $T_1$ and $T_2$ that is a subtree of the scaffold. Taking $k = \sqrt{n}$ gives:
Proposition. If $T_1$ and $T_2$ are uniformly random trees with $n$ leaves and the same shape, then
$$E[\mathrm{MAST}(T_1, T_2)] \ge \frac{1 - e^{-1}}{4}\sqrt{n}.$$

Trees with Same Shape
Theorem (Misra-S.). Let $T_1$ and $T_2$ be uniformly random trees with the same tree shape with $n$ leaves. Then $E[\mathrm{MAST}(T_1, T_2)] = \Theta(\sqrt{n})$.
Conjecture (Martin-Thatte 2013). If $T_1$ and $T_2$ are arbitrary completely balanced trees with $n$ leaves, then $\mathrm{MAST}(T_1, T_2) \ge \sqrt{n}$.

Summary: Now We Know How Little We Know
Computing the distribution of $\mathrm{MAST}(T_1, T_2)$ is a generalization of hard problems in combinatorial probability.
Simulations suggest that $E[\mathrm{MAST}(T_1, T_2)] \approx c\,n^{1/2}$ for the uniform and Yule-Harding distributions.
We have upper bounds of the form $C n^{1/2}$ for all exchangeable, sampling consistent distributions.
We have lower bounds of the form $c\,n^{\alpha}$ for the uniform and Yule-Harding distributions, fixed shape, and some β-splitting examples.
Question. Is $E[\mathrm{MAST}(T_1, T_2)] \sim c\,n^{1/2}$ universal for all exchangeable, sampling consistent distributions? What else can be said about the distribution of $\mathrm{MAST}(T_1, T_2)$?

References
Aldous, David. Probability distributions on cladograms. Random discrete structures (Minneapolis, MN, 1993), 1-18, IMA Vol. Math. Appl., 76, Springer, New York, 1996.
Baik, Jinho; Deift, Percy; Johansson, Kurt. On the distribution of the length of the longest increasing subsequence of random permutations. J. Amer. Math. Soc. 12 (1999), no. 4, 1119-1178.
Bernstein, Daniel Irving; Ho, Lam Si Tung; Long, Colby; Steel, Mike; St. John, Katherine; Sullivant, Seth. Bounds on the expected size of the maximum agreement subtree. SIAM J. Discrete Math. 29 (2015), no. 4, 2065-2074.
Bryant, David; McKenzie, Andy; Steel, Mike. The size of a maximum agreement subtree for random binary trees. Bioconsensus (Piscataway, NJ, 2000/2001), 55-65, DIMACS Ser. Discrete Math. Theoret. Comput. Sci., 61, Amer. Math. Soc., Providence, RI, 2003.
Martin, Daniel M.; Thatte, Bhalchandra D. The maximum agreement subtree problem. Discrete Appl. Math. 161 (2013), no. 13-14, 1805-1817.