Lecture 9: Identifiability of Markov Models
MATH285K - Spring 2010
Lecturer: Sebastien Roch
References: [SS03, Chapter 8].

Previous class

THM 9.1 (Uniqueness of tree metric representation) Let $\delta$ be a tree metric on $X$. Up to isomorphism, there is a unique tree metric representation of $\delta$.

1 Markov Chain on a Tree

We describe a standard model of nucleotide substitution. Let $\mathcal{C}$ be a finite character state space, e.g., $\mathcal{C} = \{\mathrm{A}, \mathrm{G}, \mathrm{C}, \mathrm{T}\}$. Let $\mathbb{T}_n$ be the set of rooted phylogenetic trees on $X = [n]$ and $\mathcal{M}_{\mathcal{C}}$ be the set of all transition matrices on $\mathcal{C}$, i.e., $|\mathcal{C}| \times |\mathcal{C}|$ nonnegative matrices whose rows sum to 1. The set of probability distributions on $\mathcal{C}$ is denoted by $\Delta_{\mathcal{C}}$.

DEF 9.2 (Markov Chain on a Tree (MCT)) Let $\mathcal{T} = (T, \phi) \in \mathbb{T}_n$ with $T = (V, E)$ rooted at $\rho$, $P = \{P^e\}_{e \in E} \in \mathcal{M}_{\mathcal{C}}^E$, and $\mu_\rho \in \Delta_{\mathcal{C}}$. A Markov chain on a tree $(\mathcal{T}, P, \mu_\rho)$ is the following stochastic process $\Xi_V = \{\Xi_v\}_{v \in V}$:

- Pick a state $\Xi_\rho$ for $\rho$ according to $\mu_\rho$.
- Moving away from the root toward the leaves, apply to each edge $e = \{u, v\} \in E$ with $u <_T v$ (i.e., $u$ the parent of $v$) the transition matrix $P^e$, independently from everything else.

We denote by $\mu_V$ the distribution on $\mathcal{C}^V$ so obtained. For $\xi_V = \{\xi_v\}_{v \in V} \in \mathcal{C}^V$, the distribution $\mu_V$ can be written explicitly as
$$\mu_V(\xi_V) = \mu_\rho(\xi_\rho) \prod_{e = \{u,v\} \in E,\; u <_T v} P^e_{\xi_u, \xi_v} = \mathbb{P}[\Xi_\rho = \xi_\rho] \prod_{e = \{u,v\} \in E,\; u <_T v} \mathbb{P}[\Xi_v = \xi_v \mid \Xi_u = \xi_u], \qquad (1)$$
where the second equality follows from the construction of the process.
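To make DEF 9.2 and formula (1) concrete, here is a minimal Python sketch; it is not part of the original notes, and the tree, matrices, and variable names are illustrative assumptions. It samples an MCT by drawing the root state from $\mu_\rho$ and propagating along the edges, and evaluates the joint probability $\mu_V(\xi_V)$ via the product in (1).

```python
import numpy as np

rng = np.random.default_rng(0)
C = 4  # |C|, e.g. {A, G, C, T} encoded as 0..3

# Hypothetical rooted tree as a dict child -> parent; root is "rho".
parent = {"u": "rho", "a": "u", "b": "u", "c": "rho"}
root = "rho"

def random_transition_matrix(k, rng):
    """A random k x k stochastic matrix (nonnegative rows summing to 1)."""
    M = rng.random((k, k)) + 0.1
    return M / M.sum(axis=1, keepdims=True)

mu_rho = np.full(C, 1.0 / C)                               # root distribution
P = {v: random_transition_matrix(C, rng) for v in parent}  # P^e for e = {parent[v], v}

def sample_mct():
    """Draw one state vector Xi_V: root from mu_rho, then each edge independently."""
    xi = {root: rng.choice(C, p=mu_rho)}
    for v in parent:  # this dict happens to list each parent before its children
        xi[v] = rng.choice(C, p=P[v][xi[parent[v]]])
    return xi

def joint_probability(xi):
    """Formula (1): mu_V(xi_V) = mu_rho(xi_rho) * prod_e P^e[xi_u, xi_v]."""
    p = mu_rho[xi[root]]
    for v in parent:
        p *= P[v][xi[parent[v]], xi[v]]
    return p

xi = sample_mct()
print(xi, joint_probability(xi))
```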

For $W \subseteq V$, $\xi_W = \{\xi_w\}_{w \in W} \in \mathcal{C}^W$ and $W^c = V \setminus W$, we let the marginal $\mu_W$ at $W$ be
$$\mu_W(\xi_W) = \sum_{\xi_{W^c} \in \mathcal{C}^{W^c}} \mu_V((\xi_W, \xi_{W^c})).$$
With a slight abuse of notation, for $v \in V$ and $a, b, c \in X$ we let $\mu_v = \mu_{\{v\}}$, $\mu_{a,b} = \mu_{\{\phi(a), \phi(b)\}}$, $\mu_{a,b,c} = \mu_{\{\phi(a), \phi(b), \phi(c)\}}$, and $\mu_X = \mu_{\phi(X)}$.

MCTs satisfy a natural generalization of the Markov property of discrete-time Markov processes. For $W, Z \subseteq V$, we denote by $\mu_{W \mid Z}$ the conditional distribution of $\Xi_W$ given $\Xi_Z$.

THM 9.3 (Markov Property) Fix $u \in V$ and $v$, a child of $u$, i.e., $u <_T v$ and $\{u, v\} \in E$. Let $W, Z \subseteq V \setminus \{v\}$ be such that every $w \in W$ is a descendant of $v$ and no $z \in Z$ is a descendant of $v$. Then
$$\mu_{W \mid Z \cup \{u\}}(\xi_W \mid \xi_{Z \cup \{u\}}) = \mu_{W \mid \{u\}}(\xi_W \mid \xi_u),$$
for all $\xi_V \in \mathcal{C}^V$. In other words, $\Xi_W$ and $\Xi_Z$ are conditionally independent given $\Xi_u$.

Proof: Clear from the construction.

The main statistical problem we are interested in is the following. We will see below that reconstructing the root of the model is not in general possible. We denote by $\mathcal{T}_{-\rho}$ the phylogenetic tree $\mathcal{T}$ without its root.

DEF 9.4 (Reconstruction Problem) Let $\Xi = \{\Xi^1_X, \ldots, \Xi^k_X\}$ be i.i.d. samples from an MCT $(\mathcal{T}, P, \mu_\rho)$. Given $\Xi$, the tree reconstruction problem (TRP) consists in finding a phylogenetic $X$-tree $\hat{\mathcal{T}}$ such that $\hat{\mathcal{T}} = \mathcal{T}_{-\rho}$. Fix $\varepsilon > 0$. Given $\Xi$, the full reconstruction problem (FRP) consists in finding an MCT $(\hat{\mathcal{T}}, \hat{P}, \hat{\mu}_{\hat{\rho}})$ such that the corresponding distribution $\hat{\mu}_X$ satisfies
$$\|\mu_X - \hat{\mu}_X\|_1 = \sum_{\xi_X \in \mathcal{C}^X} |\mu_X(\xi_X) - \hat{\mu}_X(\xi_X)| \le \varepsilon.$$
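Both ingredients of DEF 9.4 are simple array operations once a joint distribution is stored with one axis per vertex: marginalization sums out the axes in $W^c$, and the FRP criterion is an $L_1$ distance between two such arrays. The sketch below uses an assumed toy setup (the vertex names and the stand-in distributions are not from the notes).

```python
import numpy as np

C = 2
vertices = ["rho", "u", "a", "b"]  # hypothetical vertex order; leaves X = {a, b}
rng = np.random.default_rng(1)

# Stand-in joint distribution mu_V over C^V: a normalized nonnegative array.
mu_V = rng.random((C,) * len(vertices))
mu_V /= mu_V.sum()

def marginal(mu_V, W):
    """mu_W(xi_W) = sum over xi_{W^c} of mu_V((xi_W, xi_{W^c})); W listed in vertex order."""
    keep = [vertices.index(w) for w in W]
    drop = tuple(i for i in range(len(vertices)) if i not in keep)
    return mu_V.sum(axis=drop)

# A slightly perturbed joint, standing in for an estimated model.
hat_mu_V = mu_V + 0.01 * rng.random(mu_V.shape)
hat_mu_V /= hat_mu_V.sum()

mu_X = marginal(mu_V, ["a", "b"])          # leaf marginal mu_X
hat_mu_X = marginal(hat_mu_V, ["a", "b"])  # estimated leaf marginal

# FRP criterion: || mu_X - hat(mu)_X ||_1 <= epsilon
l1 = np.abs(mu_X - hat_mu_X).sum()
print(mu_X.shape, l1)
```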

2 Issues with Identifiability

For the reconstruction to be well-posed, it must be that the model is identifiable.

DEF 9.5 (Identifiability) Let $\Lambda = \{\lambda_\theta\}_{\theta \in \Theta}$ be a family of distributions parametrized by $\theta \in \Theta$. We say that $\Lambda$ is identifiable if: $\lambda_\theta \equiv \lambda_{\theta'}$ if and only if $\theta = \theta'$.

We will show below that the tree reconstruction problem is identifiable under the following conditions:

- $\mu_\rho > 0$, and
- $\forall e \in E$, $\det P^e \neq 0, \pm 1$.

We denote by $\Theta_n$ the family of MCTs on $X = [n]$ satisfying the conditions above.

Re-rooting. We begin by explaining why we restrict ourselves to reconstructing unrooted trees.

THM 9.6 (Re-rooting an MCT) Let $(\mathcal{T}, P, \mu_\rho) \in \Theta_n$ with $\mathcal{T} = (T, \phi)$ and $T = (V, E)$ rooted at $\rho$. Then for any $u \in V$ we can re-root the MCT at $u$ without changing the distribution of the process.

Proof: It suffices to show that the model can be re-rooted at a neighbour $u$ of $\rho$. Let $\bar{T}$ be the tree $T$ re-rooted at $u$. Let $e = \{\rho, u\}$ and define
$$\bar{P}^e = U_u^{-1} (P^e)^t U_\rho,$$
where $U_v$ is the $|\mathcal{C}| \times |\mathcal{C}|$ diagonal matrix with diagonal $\mu_v$. Note that the assumptions defining $\Theta_n$ imply that $\mu_u > 0$. Indeed, since $\mu_u = \mu_\rho P^e$ (interpreting $\mu_v$ as a row vector) and $\mu_\rho > 0$, a zero component in $\mu_u$ would imply the existence of a zero column in $P^e$, in which case we would have $\det P^e = 0$, a contradiction. Let $(\bar{\mathcal{T}}, \bar{P}, \bar{\mu}_{\bar{\rho}})$ be the MCT with $\bar{\mathcal{T}} = (\bar{T}, \phi)$ and $\bar{T} = T$ rooted at $\bar{\rho} = u$, $\bar{P}$ being the same as $P$ except for the matrix along $e$, which is now $\bar{P}^e$, and $\bar{\mu}_{\bar{\rho}} = \mu_u$. Let $\bar{\mu}_X$ be the corresponding distribution. We claim that $\mu_X \equiv \bar{\mu}_X$. Note that, by Bayes' rule, for $\alpha, \beta \in \mathcal{C}$,
$$\bar{P}^e_{\alpha, \beta} = \mathbb{P}[\Xi_\rho = \beta \mid \Xi_u = \alpha],$$
where $\Xi_V \sim (\mathcal{T}, P, \mu_\rho)$. The result then follows from (1).
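The re-rooting formula $\bar{P}^e = U_u^{-1} (P^e)^t U_\rho$ is easy to check numerically. The sketch below uses illustrative, randomly generated matrices (not from the notes); it verifies that $\bar{P}^e$ is stochastic and that it is exactly the Bayes reversal of $P^e$, so the joint law of $(\Xi_\rho, \Xi_u)$ along the edge is unchanged after re-rooting at $u$.

```python
import numpy as np

rng = np.random.default_rng(2)
C = 3

P_e = rng.random((C, C)) + 0.1
P_e /= P_e.sum(axis=1, keepdims=True)  # transition matrix on e = {rho, u}
mu_rho = rng.random(C) + 0.1
mu_rho /= mu_rho.sum()                 # root distribution, mu_rho > 0

mu_u = mu_rho @ P_e                    # marginal at u (row-vector convention)
P_e_bar = np.diag(1.0 / mu_u) @ P_e.T @ np.diag(mu_rho)   # U_u^{-1} (P^e)^t U_rho

# bar(P)^e is stochastic, and it reproduces the same joint law on the edge:
# mu_rho(alpha) P^e[alpha, beta] == mu_u(beta) bar(P)^e[beta, alpha].
joint_down = np.diag(mu_rho) @ P_e        # entry (alpha, beta)
joint_up = (np.diag(mu_u) @ P_e_bar).T    # entry (alpha, beta) after transposing
assert np.allclose(P_e_bar.sum(axis=1), 1.0)
assert np.allclose(joint_down, joint_up)
print("re-rooted edge reproduces the same joint law")
```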

Determinantal conditions. We show through an example that the determinantal conditions above are natural.

EX 9.7 Fix $\mathcal{C} = \{0, 1\}$ and $X = \{a, b, c, d\}$. Let $q_1 = ab|cd$ and $q_2 = ac|bd$ be two quartet trees on $X$. Assign to each edge of $q_1$ and $q_2$ the same transition matrix $P$. Denote by $\mu_1$ and $\mu_2$ the corresponding distributions. We consider two cases:

- Suppose $\det P = 0$. Then it must be that
$$P = \begin{pmatrix} p & 1-p \\ p & 1-p \end{pmatrix},$$
for some $0 < p < 1$. But then the endpoints of any edge are independent. (Notice that, in general, this is not the case when $|\mathcal{C}| > 2$.) In particular, the states at the leaves are independent and $\mu_1 \equiv \mu_2$. Therefore, the model is not identifiable in that case.

- Suppose $P$ is the identity matrix, in which case $\det P = 1$. Then all states are equal and, again, $\mu_1 \equiv \mu_2$.

3 Log-Det Distance

Our main result is the following.

THM 9.8 (Tree Identifiability) Let $(\mathcal{T}, P, \mu_\rho) \in \Theta_n$ with corresponding distribution $\mu_V$. Then $\mathcal{T}_{-\rho}$ can be obtained from $\mu_X$. In fact, pairwise marginals $\{\mu_{a,b}\}_{a,b \in X}$ suffice to derive $\mathcal{T}_{-\rho}$.

Proof: The proof relies on Theorem 9.1 through the following metric:

DEF 9.9 (Logdet Distance) For $a, b \in X$, let $P^{ab}$ be defined as follows:
$$\forall \alpha, \beta \in \mathcal{C}, \quad P^{ab}_{\alpha, \beta} = \mathbb{P}[\Xi_{\phi(b)} = \beta \mid \Xi_{\phi(a)} = \alpha].$$
The logdet distance between $a$ and $b$ is the dissimilarity map
$$\delta(a, b) = -\frac{1}{2} \log \det[P^{ab} P^{ba}].$$

We claim that $\delta$ is a tree metric on $T$ with edge weights
$$w_e = -\frac{1}{2} \log \det[P^e \bar{P}^e] > 0,$$
where $\bar{P}^e$ is defined as above. Let $u$ be the most recent common ancestor of $a$ and $b$ in $T$. Let $e_1, \ldots, e_m$ (resp. $e'_1, \ldots, e'_{m'}$) be the path between $u$ and $a$ (resp. $b$). Then, re-rooting at $a$ and $b$ respectively, we get
$$P^{ab} = \bar{P}^{e_m} \cdots \bar{P}^{e_1} P^{e'_1} \cdots P^{e'_{m'}}, \qquad P^{ba} = \bar{P}^{e'_{m'}} \cdots \bar{P}^{e'_1} P^{e_1} \cdots P^{e_m}.$$
The result then follows from the multiplicativity of the determinant.
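As a numerical sanity check on DEF 9.9 and the claimed edge weights, the sketch below (with assumed, randomly generated matrices, not taken from the notes) computes the logdet distance between the two leaves of a two-edge tree $a - u - b$ and confirms that it equals $w_{e_1} + w_{e_2}$, exactly as the multiplicativity of the determinant predicts.

```python
import numpy as np

rng = np.random.default_rng(3)
C = 4

def random_stochastic(k):
    M = rng.random((k, k)) + 0.1
    return M / M.sum(axis=1, keepdims=True)

mu_u = rng.random(C) + 0.1
mu_u /= mu_u.sum()                                    # distribution at the root u
P1, P2 = random_stochastic(C), random_stochastic(C)   # P^{e1} on {u,a}, P^{e2} on {u,b}

mu_a, mu_b = mu_u @ P1, mu_u @ P2
P1_bar = np.diag(1.0 / mu_a) @ P1.T @ np.diag(mu_u)   # reversal of e1 (a -> u)
P2_bar = np.diag(1.0 / mu_b) @ P2.T @ np.diag(mu_u)   # reversal of e2 (b -> u)

P_ab = P1_bar @ P2   # transition a -> b: up e1, then down e2
P_ba = P2_bar @ P1   # transition b -> a: up e2, then down e1

def logdet_dist(M, N):
    """delta = -1/2 log det[M N]; abs() only guards against sign round-off."""
    return -0.5 * np.log(abs(np.linalg.det(M @ N)))

delta_ab = logdet_dist(P_ab, P_ba)
w1 = -0.5 * np.log(abs(np.linalg.det(P1 @ P1_bar)))   # edge weight w_{e1}
w2 = -0.5 * np.log(abs(np.linalg.det(P2 @ P2_bar)))   # edge weight w_{e2}
assert np.isclose(delta_ab, w1 + w2)
print(delta_ab, w1 + w2)
```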

4 Chang's Eigenvector Decomposition

We showed that the tree structure can be deduced from pairwise distributions at the leaves. Interestingly, this is not the case for transition matrices. For an example, see [Cha96]. Instead, one must use three-way distributions. To avoid the possibility of relabeling the states at interior vertices, we also require the following assumption.

DEF 9.10 (Reconstructibility from Rows) A class of transition matrices $\mathcal{M}$ is reconstructible from rows if for each $P \in \mathcal{M}$ and each permutation matrix $\Pi \neq I$ (i.e., a stochastic matrix with exactly one 1 in each row and column) we have $\Pi P \notin \mathcal{M}$.

For instance, the class of matrices whose diagonal entry is strictly largest in each column is reconstructible from rows.
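A quick illustration of DEF 9.10 (an assumed example, not from the notes): for a stochastic matrix whose diagonal entry is strictly largest in each column, every non-identity row permutation destroys that property, so this class is indeed reconstructible from rows.

```python
import numpy as np
from itertools import permutations

def diag_largest_in_columns(M):
    """True iff M[j, j] > M[i, j] for every column j and every row i != j."""
    k = M.shape[0]
    return all(M[j, j] > M[i, j] for j in range(k) for i in range(k) if i != j)

# A 3 x 3 stochastic matrix in the class (rows sum to 1, diagonal largest per column).
P = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.7, 0.2],
              [0.1, 0.2, 0.7]])
assert diag_largest_in_columns(P)

# Every non-identity row permutation Pi takes Pi @ P out of the class.
for perm in permutations(range(3)):
    if perm == (0, 1, 2):
        continue
    Pi = np.eye(3)[list(perm)]  # permutation matrix reordering the rows
    assert not diag_largest_in_columns(Pi @ P)
print("no nontrivial row permutation stays in the class, as DEF 9.10 requires")
```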

THM 9.11 (Full Identifiability) Let $\Theta'_n$ be a subclass of $\Theta_n$ reconstructible from rows. Let $(\mathcal{T}, P, \mu_\rho) \in \Theta'_n$ with corresponding distribution $\mu_V$. Then, up to the placement of the root, $(\mathcal{T}, P, \mu_\rho)$ can be derived from $\mu_X$. In fact, three-way marginals $\{\mu_{a,b,c}\}_{a,b,c \in X}$ suffice for this purpose.

Proof: From the previous theorem, we can derive the unrooted tree structure. Because the transition matrix on a path is the product of the transition matrices on the corresponding edges, it is easy to see that it suffices to recover the transition matrices for the case $n = 3$. Let $\mathcal{T}$ be a tree on three leaves $\{a, b, c\}$ rooted at the interior vertex $m$. Denote by $\Xi_V$ the state vector. By abuse of notation, we write $\Xi_x = \Xi_{\phi(x)}$ for $x \in X$.

Chang's clever solution to the identifiability problem relies on the following calculation [Cha96]. Fix $\gamma \in \mathcal{C}$. Note that
$$\begin{aligned}
\mathbb{P}[\Xi_c = \gamma, \Xi_b = j \mid \Xi_a = i]
&= \sum_{k \in \mathcal{C}} \mathbb{P}[\Xi_m = k, \Xi_c = \gamma, \Xi_b = j \mid \Xi_a = i] \\
&= \sum_{k \in \mathcal{C}} \mathbb{P}[\Xi_m = k \mid \Xi_a = i]\, \mathbb{P}[\Xi_c = \gamma \mid \Xi_m = k, \Xi_a = i]\, \mathbb{P}[\Xi_b = j \mid \Xi_c = \gamma, \Xi_m = k, \Xi_a = i] \\
&= \sum_{k \in \mathcal{C}} \mathbb{P}[\Xi_m = k \mid \Xi_a = i]\, \mathbb{P}[\Xi_c = \gamma \mid \Xi_m = k]\, \mathbb{P}[\Xi_b = j \mid \Xi_m = k].
\end{aligned}$$

Writing $P^{ab,\gamma}$ for the matrix with entries $P^{ab,\gamma}_{i,j} = \mathbb{P}[\Xi_c = \gamma, \Xi_b = j \mid \Xi_a = i]$, we obtain in matrix form
$$P^{ab,\gamma} = P^{am} \,\mathrm{Diag}(P^{mc}_\gamma)\, P^{mb},$$
where $P^{mc}_\gamma$ denotes the column of $P^{mc}$ indexed by $\gamma$, or, since $(P^{ab})^{-1} = (P^{mb})^{-1} (P^{am})^{-1}$,
$$(P^{ab})^{-1} P^{ab,\gamma} = (P^{mb})^{-1} \,\mathrm{Diag}(P^{mc}_\gamma)\, P^{mb}.$$

Notice that:

- The L.H.S. depends only on $\mu_X$.
- The R.H.S. is an eigenvector decomposition.

We want to extract $P^{mb}$ from the decomposition above. There are two issues:

1. If the entries of $P^{mc}_\gamma$ are not distinct, the eigenvector decomposition is not unique.
2. The eigenvectors are not ordered.

The second point is taken care of by the reconstructibility-from-rows assumption. The solution to the first issue is to force the entries to be distinct. For instance, pick a random vector $(G_\gamma)_{\gamma \in \mathcal{C}}$ where each entry is an independent $N(0, 1)$. Define
$$P^{ab,G} = \sum_{\gamma \in \mathcal{C}} P^{ab,\gamma} G_\gamma, \qquad \tilde{P}^{mc} = \sum_{\gamma \in \mathcal{C}} P^{mc}_\gamma G_\gamma.$$
Then, with probability 1, the entries of $\tilde{P}^{mc}$ are distinct and
$$(P^{ab})^{-1} P^{ab,G} = (P^{mb})^{-1} \,\mathrm{Diag}(\tilde{P}^{mc})\, P^{mb}.$$
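The recovery step can be carried out numerically. In the sketch below (illustrative parameters, not from the notes; the model is rooted at the leaf $a$, which THM 9.6 allows), the matrices $P^{ab,\gamma}$, $P^{ab}$ and $P^{ab,G}$ are built from assumed transition matrices, and the rows of $P^{mb}$ are read off the left eigenvectors of $(P^{ab})^{-1} P^{ab,G}$. The rows come out in an arbitrary order; fixing that order is exactly the role of the reconstructibility-from-rows assumption, so here they are simply matched by sorting the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(4)
C = 3

def random_stochastic(k):
    M = rng.random((k, k)) + 0.1
    return M / M.sum(axis=1, keepdims=True)

# Three-leaf model rooted at leaf a, hidden interior vertex m, leaves b and c.
P_am, P_mb, P_mc = (random_stochastic(C) for _ in range(3))

# P^{ab,gamma}[i, j] = P[Xi_c = gamma, Xi_b = j | Xi_a = i] and P^{ab} = P^{am} P^{mb}.
P_ab_gamma = [P_am @ np.diag(P_mc[:, g]) @ P_mb for g in range(C)]
P_ab = P_am @ P_mb  # equals the sum over gamma of P^{ab,gamma}

# Gaussian weights make the eigenvalues (entries of tilde P^{mc}) distinct a.s.
G = rng.standard_normal(C)
P_ab_G = sum(G[g] * P_ab_gamma[g] for g in range(C))
M = np.linalg.inv(P_ab) @ P_ab_G  # = (P^{mb})^{-1} Diag(tilde P^{mc}) P^{mb}

# Rows of P^{mb} are left eigenvectors of M; normalize each to sum to 1.
eigvals, left_vecs = np.linalg.eig(M.T)  # columns of left_vecs: left eigenvectors of M
rows = left_vecs.real.T
rows /= rows.sum(axis=1, keepdims=True)

# Compare with the truth after aligning both by eigenvalue (entries of tilde P^{mc}).
recovered = rows[np.argsort(eigvals.real)]
truth = P_mb[np.argsort(P_mc @ G)]
print(np.allclose(recovered, truth, atol=1e-8))  # expected: True
```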

Further reading

The definitions and results discussed here were taken from Chapter 8 of [SS03]. Much more on the subject can be found in that excellent monograph. See also [SS03] for the relevant bibliographic references. The identifiability proofs are from [Ste94, Cha96].

References

[Cha96] Joseph T. Chang. Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. Math. Biosci., 137(1):51-73, 1996.

[SS03] Charles Semple and Mike Steel. Phylogenetics, volume 24 of Oxford Lecture Series in Mathematics and its Applications. Oxford University Press, Oxford, 2003.

[Ste94] M. Steel. Recovering a tree from the leaf colourations it generates under a Markov model. Appl. Math. Lett., 7(2):19-23, 1994.