Using algebraic geometry for phylogenetic reconstruction Marta Casanellas i Rius (joint work with Jesús Fernández-Sánchez) Departament de Matemàtica Aplicada I Universitat Politècnica de Catalunya IMA Workshop Applications in Biology, Dynamics, and Statistics March 8, 2007 M. Casanellas (UPC) Using algebraic geometry... IMA, March 2007 1 / 36
Outline
1 Algebraic evolutionary models
2 Phylogenetic inference using algebraic geometry
3 The geometry of the Kimura variety
4 New results on simulated data
Phylogenetic reconstruction
Given n current species, e.g. HUMAN, GORILLA, CHIMP, together with part of their genome and an alignment.
Goal: to reconstruct their ancestral relationships (phylogeny).
Alignment
Mutations, deletions and insertions of nucleotides occur along the speciation process.
Given two DNA sequences, an alignment is a correspondence between them that accounts for their differences. The optimal alignment is the one that minimizes the number of mutations, deletions and insertions.
seq 1: ACGTAGCTAAGTTA...
seq 2: ACCGAGACCCAGTA...
A possible alignment is:
seq 1  A C G T A G C T A A G T T A
seq 2  A C C G A G A C C C A G T A
Phylogenetic reconstruction
Given n current species, e.g. HUMAN, GORILLA, CHIMP, together with part of their genome and an alignment:
AACTTCGAGGCTTACCGCTG
AAGGTCGATGCTCACCGATG
AACGTCTATGCTCACCGATG
Goal: to reconstruct their ancestral relationships (phylogeny).
Algebraic evolutionary models
Assume that all sites of the alignment evolve equally and independently.
At each node of the tree we put a random variable taking values in {A, C, G, T}.
$t_i$ = branch length (represents the number of mutations per site along that branch).
Variables at the leaves are observed and variables at the interior nodes are hidden.
At each branch we write a matrix (substitution matrix) with the probabilities of a nucleotide at the parent node being substituted by another at its child. With rows and columns indexed by A, C, G, T:
\[
S = \begin{pmatrix}
P(A \mid A) & P(C \mid A) & P(G \mid A) & P(T \mid A) \\
P(A \mid C) & P(C \mid C) & P(G \mid C) & P(T \mid C) \\
P(A \mid G) & P(C \mid G) & P(G \mid G) & P(T \mid G) \\
P(A \mid T) & P(C \mid T) & P(G \mid T) & P(T \mid T)
\end{pmatrix}
\]
The entries of $S_i$ are unknown parameters.
Example (Group-based models. Kimura 3-parameter)
Root node has uniform distribution: $\pi_A = \dots = \pi_T = 0.25$.
With rows and columns indexed by A, C, G, T:
\[
S_i = \begin{pmatrix}
a_i & b_i & c_i & d_i \\
b_i & a_i & d_i & c_i \\
c_i & d_i & a_i & b_i \\
d_i & c_i & b_i & a_i
\end{pmatrix},
\qquad a_i + b_i + c_i + d_i = 1
\]
Kimura 2-parameter: $b_i = d_i$. Jukes-Cantor: $b_i = c_i = d_i$.
Hidden Markov process
We denote the joint distribution of the observed variables $X_1, X_2, X_3$ as $p_{x_1 x_2 x_3} = \mathrm{Prob}(X_1 = x_1, X_2 = x_2, X_3 = x_3)$.
\[
p_{x_1 x_2 x_3} = \sum_{y_4,\, y_r \in \{A,C,G,T\}} \pi_{y_r}\, S_1(x_1, y_r)\, S_4(y_4, y_r)\, S_2(x_2, y_4)\, S_3(x_3, y_4)
\]
$p_{x_1 x_2 x_3}$ is a homogeneous polynomial in the parameters whose degree is the number of edges.
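The sum over hidden states can be evaluated directly. The following is a minimal sketch, assuming Kimura 3-parameter matrices on the four edges and a uniform root distribution; the numerical edge parameters and the helper names `kimura_matrix` and `joint_distribution` are hypothetical and chosen only for illustration.

```python
import itertools

import numpy as np


def kimura_matrix(a, b, c, d):
    """Kimura 3-parameter matrix, rows and columns ordered A, C, G, T."""
    return np.array([
        [a, b, c, d],
        [b, a, d, c],
        [c, d, a, b],
        [d, c, b, a],
    ])


def joint_distribution(S1, S2, S3, S4, pi):
    """Return p[x1, x2, x3] by summing over the hidden states y_r and y_4."""
    p = np.zeros((4, 4, 4))
    for x1, x2, x3 in itertools.product(range(4), repeat=3):
        for yr, y4 in itertools.product(range(4), repeat=2):
            p[x1, x2, x3] += (pi[yr] * S1[x1, yr] * S4[y4, yr]
                              * S2[x2, y4] * S3[x3, y4])
    return p


# Hypothetical edge parameters (each row sums to 1), for illustration only.
pi = np.full(4, 0.25)
S1 = kimura_matrix(0.85, 0.05, 0.07, 0.03)
S2 = kimura_matrix(0.90, 0.03, 0.05, 0.02)
S3 = kimura_matrix(0.80, 0.06, 0.10, 0.04)
S4 = kimura_matrix(0.88, 0.04, 0.06, 0.02)

p = joint_distribution(S1, S2, S3, S4, pi)
print(p.sum())                   # total probability of all 64 patterns
print(p[0, 0, 0], p[1, 1, 1])    # p_AAA and p_CCC
```

Each entry of `p` is a sum of products of one root probability and four matrix entries, one per edge, which is the homogeneity statement on the slide.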
The evolutionary model defines a polynomial map
\[
\varphi : \mathbb{R}^d \to \mathbb{R}^{4^n}, \qquad
\theta = (\theta_1, \dots, \theta_d) \mapsto (p_{AA \dots A},\, p_{AA \dots C},\, p_{AA \dots G},\, \dots,\, p_{TT \dots T})
\]
\[
\varphi : \Delta^{d-1} \to \Delta^{4^n - 1}, \qquad \varphi : \mathbb{C}^d \to \mathbb{C}^{4^n}
\]
Algebraic variety $V = \overline{\operatorname{im} \varphi}$, closure in the Zariski topology.
Given an alignment, we can estimate the joint probability $p_{x_1 x_2 x_3}$ as the relative frequency of column $x_1 x_2 x_3$ in the alignment. In the theoretical model, this would be a point on the variety $V$.
Goal: use the ideal of $V$, $I(V) \subset \mathbb{R}[p_{AA \dots A}, p_{AA \dots C}, \dots, p_{TT \dots T}]$, to infer the topology of the phylogenetic tree.
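The estimation step can be sketched as follows, using the three-sequence example alignment shown earlier in the talk. The function name `empirical_distribution` is hypothetical; patterns that never occur are simply absent from the result, meaning relative frequency 0.

```python
from collections import Counter


def empirical_distribution(sequences):
    """Relative frequency of each column pattern in a gap-free alignment."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("aligned sequences must have equal length")
    # zip(*sequences) iterates over the columns of the alignment.
    counts = Counter("".join(column) for column in zip(*sequences))
    return {pattern: count / length for pattern, count in counts.items()}


alignment = [
    "AACTTCGAGGCTTACCGCTG",
    "AAGGTCGATGCTCACCGATG",
    "AACGTCTATGCTCACCGATG",
]
p_hat = empirical_distribution(alignment)
print(p_hat["AAA"])   # column AAA appears in 4 of the 20 columns
```

With real data the estimated point only lies near the variety, which is why the invariants are evaluated rather than required to vanish exactly.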
Computing the ideal (Eriksson, Ranestad, Sturmfels, Sullivant 05)
Some generators of $I(V)$ depend only on the model chosen (not on the topology). E.g. for Jukes-Cantor they are:
$\sum p_{x_1 x_2 x_3} = 1$
$p_{AAA} = p_{CCC} = p_{GGG} = p_{TTT}$ (4 terms)
$p_{AAC} = p_{AAG} = p_{AAT} = \dots = p_{TTG}$ (12 terms)
$p_{ACA} = p_{AGA} = p_{ATA} = \dots = p_{TGT}$ (12 terms)
$p_{CAA} = p_{GAA} = p_{TAA} = \dots = p_{GTT}$ (12 terms)
$p_{ACG} = p_{ACT} = p_{AGT} = \dots = p_{CGT}$ (24 terms)
For this model, the unique polynomial that detects the phylogeny of the three species has degree 3.
Definition (Cavender, Felsenstein 87)
The generators of $I(V)$ are called phylogenetic invariants.
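The term counts in the Jukes-Cantor linear invariants come from grouping the 64 site patterns by which leaves carry equal nucleotides. A small sketch that reproduces the counts 4, 12, 12, 12, 24; the class labels and the function name `jc_class` are hypothetical.

```python
from collections import Counter
from itertools import product


def jc_class(pattern):
    """Classify a 3-leaf site pattern by which positions coincide."""
    x1, x2, x3 = pattern
    if x1 == x2 == x3:
        return "all equal"        # e.g. AAA
    if x1 == x2:
        return "x1 = x2"          # e.g. AAC
    if x1 == x3:
        return "x1 = x3"          # e.g. ACA
    if x2 == x3:
        return "x2 = x3"          # e.g. CAA
    return "all distinct"         # e.g. ACG


class_sizes = Counter(jc_class(p) for p in product("ACGT", repeat=3))
for name, size in class_sizes.items():
    print(name, size)
```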
Problem: computation of invariants
Computational algebra software fails to compute the ideal for 4 species!
Kimura 3-parameter, 4 species: 8002 generators [Small trees webpage].
Group-based models. Discrete Fourier transform
Group-based models (Kimura and Jukes-Cantor): $G = \mathbb{Z}_2 \times \mathbb{Z}_2$, A = (0,0), C = (0,1), G = (1,0), T = (1,1).
Substitution matrices are functions on the group: $S_i(g_p, g_c) = f_i(g_p - g_c)$, $g_c, g_p \in G$.
\[
p_{g_1, \dots, g_m} = p(g_1, \dots, g_m) = \frac{1}{4} \sum \prod_{e \in \text{edges}} f_e\big(g_{p(e)} - g_{c(e)}\big),
\]
where the sum is over all possible values at interior nodes.
Discrete Fourier transform of $f : G \to \mathbb{C}$:
\[
\hat{f}(\chi) = \sum_{g \in G} \chi(g) f(g), \qquad \chi \in \operatorname{Hom}(G, \mathbb{C}^*) = \hat{G}
\]
Convolution:
\[
(f_1 * f_2)(g) = \sum_{h \in G} f_1(h) f_2(g - h) \implies \widehat{f_1 * f_2} = \hat{f}_1 \cdot \hat{f}_2
\]
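The convolution property can be checked numerically on $\mathbb{Z}_2 \times \mathbb{Z}_2$. This is a sketch under the assumption that characters are indexed by group elements via $\chi_u(g) = (-1)^{u_1 g_1 + u_2 g_2}$; the function names and the two test vectors are hypothetical.

```python
import numpy as np

# Elements of Z2 x Z2 in the order A, C, G, T used on the slide.
GROUP = [(0, 0), (0, 1), (1, 0), (1, 1)]


def character(chi, g):
    """Value at g of the character indexed by the group element chi."""
    return (-1) ** ((chi[0] * g[0] + chi[1] * g[1]) % 2)


def fourier(f):
    """Discrete Fourier transform: f_hat(chi) = sum_g chi(g) f(g)."""
    return np.array([
        sum(character(chi, g) * f[i] for i, g in enumerate(GROUP))
        for chi in GROUP
    ])


def convolve(f1, f2):
    """(f1 * f2)(g) = sum_h f1(h) f2(g - h); in Z2 x Z2, g - h = g + h."""
    out = np.zeros(4)
    for i, g in enumerate(GROUP):
        for j, h in enumerate(GROUP):
            diff = ((g[0] + h[0]) % 2, (g[1] + h[1]) % 2)
            out[i] += f1[j] * f2[GROUP.index(diff)]
    return out


# Two hypothetical functions on the group (first rows of Kimura matrices).
f1 = np.array([0.85, 0.05, 0.07, 0.03])
f2 = np.array([0.90, 0.03, 0.05, 0.02])

lhs = fourier(convolve(f1, f2))
rhs = fourier(f1) * fourier(f2)
print(np.allclose(lhs, rhs))
```

Convolving two such functions corresponds to multiplying two Kimura matrices, so the transform turns composition along a path into a pointwise product.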
Group-based models
Theorem (Evans, Speed)
For a group-based model (i.e. Kimura 3, Kimura 2 or Jukes-Cantor) on a tree $T$, the discrete Fourier transform of the joint distribution $p(g_1, \dots, g_n)$ has the following form:
\[
q(\chi_1, \dots, \chi_n) = \prod_{e \in \text{edges}} \hat{f}_e\Big( \prod_{l \in \text{leaves below } e} \chi_l \Big)
\]
Using a linear change of coordinates, one gets a monomial parameterization of the variety associated to the model,
\[
\varphi : \mathbb{C}^d \to \mathbb{C}^{4^n}, \qquad
\theta = (\theta_1, \dots, \theta_d) \mapsto (q_{AA \dots A},\, q_{AA \dots C},\, q_{AA \dots G},\, \dots,\, q_{TT \dots T}),
\]
so that it is a toric variety. The ideal is generated by binomials.
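Fourier transform for Kimura 3-parameter model
Parameters:
\[
S^e = \begin{pmatrix}
a^e & b^e & c^e & d^e \\
b^e & a^e & d^e & c^e \\
c^e & d^e & a^e & b^e \\
d^e & c^e & b^e & a^e
\end{pmatrix}
\]
Fourier parameters:
\[
P^e = \begin{pmatrix}
P^e_A & 0 & 0 & 0 \\
0 & P^e_C & 0 & 0 \\
0 & 0 & P^e_G & 0 \\
0 & 0 & 0 & P^e_T
\end{pmatrix},
\qquad
P^e_A = a^e + b^e + c^e + d^e, \quad P^e_C = a^e - b^e + c^e - d^e, \ \dots
\]
Simplex: $\Delta^3 = \{a^e + b^e + c^e + d^e = 1\}$ corresponds to $\{P^e_A = 1\}$.
Coordinates: $p_{x_1 \dots x_n} \leftrightarrow q_{g_1 \dots g_n}$, $g_1 + \dots + g_n = 0$.
Simplex: $\{\sum p_{x_1 \dots x_n} = 1\}$ corresponds to $\{q_{A \dots A} = 1\}$.
Linear invariants: $q_{g_1 \dots g_n} = 0$ if $g_1 + \dots + g_n \neq 0$ in $\mathbb{Z}_2 \times \mathbb{Z}_2$.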
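The passage from $S^e$ to the diagonal matrix $P^e$ is a change of basis by the character table of $\mathbb{Z}_2 \times \mathbb{Z}_2$. A minimal numerical sketch follows; the slide only spells out $P^e_A$ and $P^e_C$, so the assignment of the remaining two sign patterns to $G$ and $T$ below is an assumption, as are the example parameters.

```python
import numpy as np

# Character table of Z2 x Z2; columns are the group elements A, C, G, T.
H = np.array([
    [1,  1,  1,  1],   # P_A = a + b + c + d
    [1, -1,  1, -1],   # P_C = a - b + c - d
    [1,  1, -1, -1],   # assumed P_G = a + b - c - d
    [1, -1, -1,  1],   # assumed P_T = a - b - c + d
])

# Hypothetical Kimura 3-parameter values on one edge.
a, b, c, d = 0.85, 0.05, 0.07, 0.03
S = np.array([
    [a, b, c, d],
    [b, a, d, c],
    [c, d, a, b],
    [d, c, b, a],
])

P = H @ np.array([a, b, c, d])   # Fourier parameters of the edge
diagonalized = H @ S @ H / 4     # H is symmetric and H @ H = 4 * identity

print(P)
print(np.allclose(diagonalized, np.diag(P)))
```

The first Fourier parameter equals 1 exactly when the row of the substitution matrix sums to 1.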
For the Kimura 3-parameter model, we are interested in $V := \overline{\operatorname{im}(\varphi)}$, where
\[
\varphi : \prod_{e \in E(T)} \mathbb{C}^4 \to \mathbb{C}^{4^{n-1}}, \qquad
(P^e_A, P^e_C, P^e_G, P^e_T)_e \mapsto (q_{x_1 \dots x_n})_{\{x_1, \dots, x_n \,\mid\, x_1 + \dots + x_n = 0\}}
\]
But the probability simplex satisfies $q_{A \dots A} = 1$.
Definition
The Kimura variety of the phylogenetic tree $T$ is $W := V \cap \{q_{A \dots A} = 1\}$.
Recursive construction of invariants
Computational algebra software fails for 5 leaves, even in the Fourier parameterization.
Theorem (Sturmfels, Sullivant 05)
For any group-based model, they give an explicit algorithm for obtaining the generators of the ideal of phylogenetic invariants of an n-leaved tree from those of an unrooted tree of 3 leaves. The generators are binomials of degree at most 4.
Phylogenetic inference using algebraic geometry
1990-1995: Biologists claim that phylogenetic invariants are not useful for phylogenetic reconstruction. In particular, a work of Huelsenbeck 95 reveals the inefficiency of Lake's method of invariants.
Lake only used linear invariants!
(Eriksson 05) for the General Markov Model.
(Casanellas, Garcia, Sullivant 05) for Kimura 3-parameter.
Phylogenetic inference using algebraic geometry
Model: Unrooted tree of 4 leaves, Kimura 3-parameter.
Naive algorithm: Take the 1-norm $\| \cdot \|_1$ of the evaluation of all invariants on the point given by the relative frequencies of the columns of the alignment. Obtain a score for each possible topology and choose the one with the smallest score.
Huelsenbeck 95: assesses the performance of different phylogenetic reconstruction methods on the following tree space.
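The scoring step might look roughly like the sketch below. It assumes the score is the 1-norm of the vector of invariant evaluations, which is one reading of the slide. The three polynomials are toy placeholders, not actual Kimura invariants, and the function names and the sample point are hypothetical.

```python
def topology_score(invariants, point):
    """1-norm of the vector of invariant evaluations at the given point."""
    return sum(abs(f(point)) for f in invariants)


def choose_topology(invariants_by_topology, point):
    """Score every candidate topology and return the one with the smallest score."""
    scores = {
        name: topology_score(invariants, point)
        for name, invariants in invariants_by_topology.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Toy placeholder polynomials, one list per 4-leaf topology.
toy_invariants = {
    "12|34": [lambda p: p[0] * p[3] - p[1] * p[2]],
    "13|24": [lambda p: p[0] * p[2] - p[1] * p[3]],
    "14|23": [lambda p: p[0] * p[1] - p[2] * p[3]],
}

point = (0.1, 0.2, 0.3, 0.6)   # stand-in for the observed relative frequencies
best, scores = choose_topology(toy_invariants, point)
print(best, scores)
```

In the actual method each topology would come with its own full list of invariants, and the point would be the vector of column frequencies of the alignment.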
Results on simulated data (Huelsenbeck)
[Figure: Huelsenbeck's results for Lake invariants, NJ and ML at alignment lengths 100, 500 and 1000. Huelsenbeck, Syst. Biol., 1995]
Casanellas and Fernández-Sánchez studies on the Casanellas, Garcia, Sullivant method
[Figure: results at alignment lengths 100, 500 and 1000]
Why use phylogenetic invariants?
1 We do not need to estimate parameters of the model.
2 The algebraic model allows species within the tree to evolve at different mutation rates.
In the non-algebraic version, one has $S_i = \exp(Q t_i)$, where $Q$ is a fixed matrix that represents the instantaneous mutation rate of all the species in the tree.
In the algebraic model, one allows different rates at each branch of the tree.
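For comparison, the continuous-time construction $S_i = \exp(Q t_i)$ can be sketched as follows. The rate values and branch lengths are hypothetical, and the matrix exponential is computed through an eigendecomposition, which is valid here because a Kimura-type rate matrix is symmetric.

```python
import numpy as np


def kimura_rate_matrix(beta, gamma, delta):
    """Symmetric Kimura 3-parameter rate matrix; each row sums to 0."""
    alpha = -(beta + gamma + delta)
    return np.array([
        [alpha, beta, gamma, delta],
        [beta, alpha, delta, gamma],
        [gamma, delta, alpha, beta],
        [delta, gamma, beta, alpha],
    ])


def transition_matrix(Q, t):
    """exp(Q t) for a symmetric Q, via its orthogonal eigendecomposition."""
    eigenvalues, eigenvectors = np.linalg.eigh(Q)
    return eigenvectors @ np.diag(np.exp(eigenvalues * t)) @ eigenvectors.T


Q = kimura_rate_matrix(0.10, 0.30, 0.05)   # one rate matrix for the whole tree
S_short = transition_matrix(Q, 0.2)         # short branch
S_long = transition_matrix(Q, 1.5)          # long branch

print(S_short.round(4))
print(S_long.round(4))
```

Here every branch shares the same `Q` and differs only in `t`, whereas the algebraic model lets each edge carry an arbitrary Kimura matrix.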
For instance, using algebraic geometry one should be able to reconstruct the biologically correct tree:
[Figure: trees on dog, human, chimp, rat, mouse, chicken]
(Al-Aidroos, Snir 05), chapter 21 in the ASCB book.
Simulations on non-homogeneous trees (different rate matrices)
[Figure: simulation results]
The global geometry of the Kimura variety
Recall that $V := \overline{\operatorname{im}(\varphi)}$, where
\[
\varphi : \prod_{e \in E(T)} \mathbb{C}^4 \to \mathbb{C}^{4^{n-1}}, \qquad
(P^e_A, P^e_C, P^e_G, P^e_T)_e \mapsto (q_{x_1 \dots x_n})_{\{x_1, \dots, x_n \,\mid\, x_1 + \dots + x_n = 0\}}
\]
But the probability simplex satisfies $q_{A \dots A} = 1$.
Definition
The Kimura variety of the phylogenetic tree $T$ is $W := V \cap \{q_{A \dots A} = 1\}$.
\[
W = \varphi\Big( \prod_{e \in E(T)} \big( \mathbb{C}^4 \cap \{P^e_A = 1\} \big) \Big)
\]
Local complete intersection
$V \subset \mathbb{A}^N$ algebraic variety, $\mu$ = minimum number of generators of $I(V)$; then $\mu \geq \operatorname{codim}(V) = N - \dim(V)$.
$V$ is called a complete intersection if $\mu = \operatorname{codim}(V)$.
Example: the Kimura variety $W \subset \mathbb{C}^{4^{n-1} - 1}$ has codimension $4^{n-1} - 6n + 8$, n = number of leaves. For n = 4, the minimum number of generators of the Kimura variety is 8002 whereas its codimension is 48!
On a neighborhood of a smooth point, any variety is a local complete intersection (i.e. it can be defined by $\operatorname{codim}(V)$ equations).
Goal:
1 Find the singular points.
2 Provide a set of local generators at non-singular points.
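The codimension formula can be recovered by a parameter count. This sketch assumes that the dimension of $W$ is $3(2n - 3)$: three free Fourier parameters on each of the $2n - 3$ edges, once $P^e_A = 1$ is imposed. The function name `kimura_codimension` is hypothetical.

```python
def kimura_codimension(n):
    """Codimension of the Kimura variety of an n-leaf tree in its ambient space."""
    ambient_dimension = 4 ** (n - 1) - 1     # coordinates q with zero sum, minus q_{A...A}
    variety_dimension = 3 * (2 * n - 3)      # assumed: 3 free parameters per edge
    return ambient_dimension - variety_dimension


for n in (3, 4, 5):
    print(n, kimura_codimension(n))
```

For n = 4 this gives 48, far fewer than the 8002 minimal generators of the full ideal.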
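Case n = 3
The parameterization $\varphi$ is:
\[
\varphi : \mathbb{C}^9 \to \mathbb{C}^{15}, \qquad
\big( (1, P^1_C, P^1_G, P^1_T),\, (1, P^2_C, P^2_G, P^2_T),\, (1, P^3_C, P^3_G, P^3_T) \big) \mapsto \big( q_{xyz} = P^1_x P^2_y P^3_z \big)_{\{x + y + z = 0\}}
\]
Let $H = (\mathbb{Z}_2 \times \mathbb{Z}_2, \cdot)$, written multiplicatively with $\varepsilon, \delta \in \{\pm 1\}$, and let $(\varepsilon, \delta) \in H$ act on $\mathbb{C}^9$ sending $(P^1, P^2, P^3)$ to
\[
\big( (1, \varepsilon P^1_C, \delta P^1_G, \varepsilon\delta P^1_T),\, (1, \varepsilon P^2_C, \delta P^2_G, \varepsilon\delta P^2_T),\, (1, \varepsilon P^3_C, \delta P^3_G, \varepsilon\delta P^3_T) \big).
\]
E.g. for $(\varepsilon, \delta) = (-1, 1)$, in probability parameters the group action permutes $A \leftrightarrow C$ and $G \leftrightarrow T$.
Proposition
The Kimura variety $W = \operatorname{im}(\varphi)$ on $T_3$ is the affine GIT quotient $\mathbb{C}^9 /\!/ H$.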
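That the monomial map is constant on orbits of $H$ can be checked numerically. The sketch below encodes A, C, G, T as 0, 1, 2, 3 so that addition in $\mathbb{Z}_2 \times \mathbb{Z}_2$ becomes bitwise XOR; the function names and the random test point are hypothetical.

```python
from itertools import product

import numpy as np

# Triples (x, y, z) with x + y + z = 0, leaving out AAA since q_AAA = 1.
ZERO_SUM_TRIPLES = [
    (x, y, z)
    for x, y, z in product(range(4), repeat=3)
    if x ^ y ^ z == 0 and (x, y, z) != (0, 0, 0)
]


def phi(P):
    """Monomial map: q_xyz = P[0, x] * P[1, y] * P[2, z]."""
    return np.array([P[0, x] * P[1, y] * P[2, z] for x, y, z in ZERO_SUM_TRIPLES])


def act(eps, delta, P):
    """Scale the (A, C, G, T) entries on every edge by (1, eps, delta, eps * delta)."""
    return P * np.array([1, eps, delta, eps * delta])


rng = np.random.default_rng(0)
P = rng.uniform(0.1, 1.0, size=(3, 4))
P[:, 0] = 1.0   # P_A = 1 on each of the three edges

invariant = all(
    np.allclose(phi(act(eps, delta, P)), phi(P))
    for eps, delta in product([1, -1], repeat=2)
)
print(len(ZERO_SUM_TRIPLES), invariant)
```

The check only shows that $\varphi$ factors through the quotient; it does not by itself prove that the image equals the quotient.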
Arbitrary n
$T$ unrooted n-leaved tree ($2n - 3$ edges, $n - 2$ interior nodes).
Extending the action of $H$ to the $n - 2$ interior nodes of $T$, we have
Theorem
The Kimura variety $W$ is isomorphic to the affine GIT quotient $(\mathbb{C}^3)^{2n-3} /\!/ H^{n-2}$.
Corollary
$W = \operatorname{im}(\varphi)$, no closure needed.
$|\varphi^{-1}(q)| \leq 4^{n-2}$ and there is just one preimage with biological meaning, for $q \in W$.
We are able to determine the singular points. In particular, biologically meaningful points $q \in W_+ := \varphi\big(\prod_e \Delta^3_+\big)$ are not singular.
In Fourier parameters:
[Figure: $\Delta^3_+$]
In probability parameters this is transformed into:
\[
S = \begin{pmatrix}
a & b & c & d \\
b & a & d & c \\
c & d & a & b \\
d & c & b & a
\end{pmatrix},
\qquad
\begin{aligned}
a + b + c + d &= 1 \\
a + b - c - d &> 0 \\
a - b + c - d &> 0 \\
a - b - c + d &> 0
\end{aligned}
\]
(rows and columns indexed by A, C, G, T), or equivalently $a + b > 1/2$, $a + c > 1/2$, $a + d > 1/2$.
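The two forms of the inequalities are related by substituting $a + b + c + d = 1$. A short worked derivation:

```latex
\begin{aligned}
a + b - c - d &= 2(a + b) - (a + b + c + d) = 2(a + b) - 1 > 0 &&\iff a + b > \tfrac{1}{2}, \\
a - b + c - d &= 2(a + c) - (a + b + c + d) = 2(a + c) - 1 > 0 &&\iff a + c > \tfrac{1}{2}, \\
a - b - c + d &= 2(a + d) - (a + b + c + d) = 2(a + d) - 1 > 0 &&\iff a + d > \tfrac{1}{2}.
\end{aligned}
```

In words, on each edge the probability of no change plus the probability of any single type of substitution must exceed one half.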
Local complete intersection
Case n = 3 (Sturmfels, Sullivant 05): $I(V)$ is minimally generated by 16 cubics and 18 quartics.
Lemma
The following six quartics
\[
\begin{aligned}
& q_{AAA} q_{ATT} q_{TCG} q_{TGC} - q_{ACC} q_{AGG} q_{TAT} q_{TTA}, \\
& q_{CCA} q_{CTG} q_{TAT} q_{TGC} - q_{CAC} q_{CGT} q_{TCG} q_{TTA}, \\
& q_{AGG} q_{ATT} q_{CAC} q_{CCA} - q_{AAA} q_{ACC} q_{CGT} q_{CTG}, \\
& q_{ACC} q_{ATT} q_{GAG} q_{GGA} - q_{AAA} q_{AGG} q_{GCT} q_{GTC}, \\
& q_{CAC} q_{CTG} q_{GCT} q_{GGA} - q_{CCA} q_{CGT} q_{GAG} q_{GTC}, \\
& q_{GGA} q_{GTC} q_{TAT} q_{TCG} - q_{GAG} q_{GCT} q_{TGC} q_{TTA}
\end{aligned}
\]
generate a local complete intersection that defines $W$ at each point $q \in W_+$. It does not depend on the point $q$.
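That these quartics vanish on the image of the parameterization $q_{xyz} = P^1_x P^2_y P^3_z$ can be checked numerically. The sketch below reads each quartic as a binomial, i.e. the difference of its two monomials, which is an assumption about signs lost in the slide text; the helper names and the random test point are hypothetical.

```python
import numpy as np

INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}

# Each quartic is stored as (monomial, monomial), read as their difference.
QUARTICS = [
    (("AAA", "ATT", "TCG", "TGC"), ("ACC", "AGG", "TAT", "TTA")),
    (("CCA", "CTG", "TAT", "TGC"), ("CAC", "CGT", "TCG", "TTA")),
    (("AGG", "ATT", "CAC", "CCA"), ("AAA", "ACC", "CGT", "CTG")),
    (("ACC", "ATT", "GAG", "GGA"), ("AAA", "AGG", "GCT", "GTC")),
    (("CAC", "CTG", "GCT", "GGA"), ("CCA", "CGT", "GAG", "GTC")),
    (("GGA", "GTC", "TAT", "TCG"), ("GAG", "GCT", "TGC", "TTA")),
]


def q(P, word):
    """Fourier coordinate q_xyz = P[0, x] * P[1, y] * P[2, z]."""
    x, y, z = (INDEX[letter] for letter in word)
    return P[0, x] * P[1, y] * P[2, z]


def evaluate_quartics(P):
    """Evaluate all six binomials at the point parameterized by P."""
    return np.array([
        np.prod([q(P, w) for w in left]) - np.prod([q(P, w) for w in right])
        for left, right in QUARTICS
    ])


rng = np.random.default_rng(0)
P = rng.uniform(0.1, 1.0, size=(3, 4))
P[:, 0] = 1.0   # P_A = 1 on each of the three edges

values = evaluate_quartics(P)
print(np.max(np.abs(values)))
```

The number of quartics, six, matches the codimension of $W$ for n = 3, which is what a local complete intersection requires.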
Proof.
1 $J = (f_1, \dots, f_6)$ defines a variety $X$ containing $W$, and both have the same dimension.
2 The points $q \in W_+$ are smooth points of $X$ and $W$.
3 Both varieties coincide locally.
Local complete intersection
Arbitrary n
quartics: $J_3$, $J_{n-1}$
quadrics: $Q$, non-redundant $2 \times 2$ minors from flattening
Theorem
$J_3$, $J_{n-1}$ and $Q$ generate a local complete intersection that defines $W$ at the biologically meaningful points $q \in W_+$.
New results on simulated data
4 leaves. Instead of all generators, 48 polynomials that generate locally.
[Figure: results at alignment lengths 100, 500 and 1000]
(Eriksson, Yao 07) Machine learning approach, 52 invariants that perform better on this parameter space.