Using algebraic geometry for phylogenetic reconstruction

Size: px
Start display at page:

Download "Using algebraic geometry for phylogenetic reconstruction"

Transcription

1 Using algebraic geometry for phylogenetic reconstruction Marta Casanellas i Rius (joint work with Jesús Fernández-Sánchez) Departament de Matemàtica Aplicada I Universitat Politècnica de Catalunya IMA Workshop Applications in Biology, Dynamics, and Statistics March 8, 2007 M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

2 Outline 1 Algebraic evolutionary models 2 Phylogenetic inference using algebraic geometry 3 The geometry of the Kimura variety 4 New results on simulated data M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

3 Outline 1 Algebraic evolutionary models 2 Phylogenetic inference using algebraic geometry 3 The geometry of the Kimura variety 4 New results on simulated data M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

4 Outline 1 Algebraic evolutionary models 2 Phylogenetic inference using algebraic geometry 3 The geometry of the Kimura variety 4 New results on simulated data M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

5 Outline 1 Algebraic evolutionary models 2 Phylogenetic inference using algebraic geometry 3 The geometry of the Kimura variety 4 New results on simulated data M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

6 Outline 1 Algebraic evolutionary models 2 Phylogenetic inference using algebraic geometry 3 The geometry of the Kimura variety 4 New results on simulated data M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

7 Phylogenetic reconstruction Given n current species, e.g. HUMAN, GORILLA, CHIMP, given part of their genome and an alignment alignment Goal: to reconstruct their ancestral relationships (phylogeny): M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

8 Phylogenetic reconstruction Given n current species, e.g. HUMAN, GORILLA, CHIMP, given part of their genome and an alignment alignment Goal: to reconstruct their ancestral relationships (phylogeny): M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

9 Alignment Mutations, deletions and insertions of nucleotides occur along the speciation process. Given two DNA sequences, an alignment is a correspondence between them that accounts for their differences. The optimal alignment is the one that minimizes the number of mutations, deletions and insertions. seq 1 : ACGTAGCTAAGTTA... seq 2 : ACCGAGACCCAGTA... A possible alignment is: seq 1 A C G T A G C T A A G T T A seq 2 A C C G A G A C C C A G T A M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

10 Alignment Mutations, deletions and insertions of nucleotides occur along the speciation process. Given two DNA sequences, an alignment is a correspondence between them that accounts for their differences. The optimal alignment is the one that minimizes the number of mutations, deletions and insertions. seq 1 : ACGTAGCTAAGTTA... seq 2 : ACCGAGACCCAGTA... A possible alignment is: seq 1 A C G T A G C T A A G T T A seq 2 A C C G A G A C C C A G T A M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

11 Phylogenetic reconstruction Given n current species, e.g. HUMAN, GORILLA, CHIMP, given part of their genome and an alignment alignment AACTTCGAGGCTTACCGCTG AAGGTCGATGCTCACCGATG AACGTCTATGCTCACCGATG Goal: to reconstruct their ancestral relationships (phylogeny): M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

12 Phylogenetic reconstruction Given n current species, e.g. HUMAN, GORILLA, CHIMP, given part of their genome and an alignment alignment AACTTCGAGGCTTACCGCTG AAGGTCGATGCTCACCGATG AACGTCTATGCTCACCGATG Goal: to reconstruct their ancestral relationships (phylogeny): M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

13 Algebraic evolutionary models Assume that all sites of the alignment evolve equally and independently. At each node of the tree we put a random variable taking values in {A,C,G,T} t i = branch length (represents the number of mutations per site along that branch) Variables at the leaves are observed and variables at the interior nodes are hidden. At each branch we write a matrix (substitution matrix) with the probabilities of a nucleotide at the parent node being substituted by another at its child. S = A C G T A C G T P(A A) P(C A) P(G A) P(T A) P(A C) P(C C) P(G C) P(T C) P(A G) P(C G) P(G G) P(T G) P(A T ) P(C T ) P(G T ) P(T T ) M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

14 Algebraic evolutionary models Assume that all sites of the alignment evolve equally and independently. At each node of the tree we put a random variable taking values in {A,C,G,T} t i = branch length (represents the number of mutations per site along that branch) Variables at the leaves are observed and variables at the interior nodes are hidden. At each branch we write a matrix (substitution matrix) with the probabilities of a nucleotide at the parent node being substituted by another at its child. S = A C G T A C G T P(A A) P(C A) P(G A) P(T A) P(A C) P(C C) P(G C) P(T C) P(A G) P(C G) P(G G) P(T G) P(A T ) P(C T ) P(G T ) P(T T ) M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

15 The entries of S i are unknown parameters. Example (Group-based models. Kimura 3-parameters) Root node has uniform distribution: π A =... = π T = 0.25 S i = A C G T A C G T ( ai b i c i d i b i a i d i c i c i d i a i b i d i c i b i a i Kimura 2-parameters: b i = d i. Jukes-Cantor: b i = c i = d i. ), a i + b i + c i + d i = 1 M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

16 The entries of S i are unknown parameters. Example (Group-based models. Kimura 3-parameters) Root node has uniform distribution: π A =... = π T = 0.25 S i = A C G T A C G T ( ai b i c i d i b i a i d i c i c i d i a i b i d i c i b i a i Kimura 2-parameters: b i = d i. Jukes-Cantor: b i = c i = d i. ), a i + b i + c i + d i = 1 M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

17 The entries of S i are unknown parameters. Example (Group-based models. Kimura 3-parameters) Root node has uniform distribution: π A =... = π T = 0.25 S i = A C G T A C G T ( ai b i c i d i b i a i d i c i c i d i a i b i d i c i b i a i Kimura 2-parameters: b i = d i. Jukes-Cantor: b i = c i = d i. ), a i + b i + c i + d i = 1 M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

18 Hidden Markov process We denote the joint distribution of the observed variables X 1, X 2, X 3 as p x1 x 2 x 3 = Prob(X 1 = x 1, X 2 = x 2, X 3 = x 3 ). p x1 x 2 x 3 = π yr S 1 (x 1, y r )S 4 (y 4, y r )S 2 (x 2, y 4 )S 3 (x 3, y 4 ) y 4,y r {A,C,G,T} p x1 x 2 x 3 is a homogeneous polynomial on the parameters whose degree is the number of edges. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

19 The evolutionary model defines a polynomial map ϕ : R d R 4n θ = (θ 1,..., θ d ) (p AA...A, p AA...C, p AA...G,..., p TT...T ) ϕ : d 1 4n 1 ϕ : C d C 4n Algebraic variety V = imϕ, closure in Zariski topology. Given an alignment, we can estimate the joint probability p x1,x 2,x 3 as the relative frequency of column x 1 x 2 x 3 in the alignment. In the theoretical model, this would be a point on the variety V. Goal: use the ideal of V, I(V ) R[p AA...A, p AA...C,..., p TT...T ], to infer the topology of the phylogenetic tree. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

20 The evolutionary model defines a polynomial map ϕ : R d R 4n θ = (θ 1,..., θ d ) (p AA...A, p AA...C, p AA...G,..., p TT...T ) ϕ : d 1 4n 1 ϕ : C d C 4n Algebraic variety V = imϕ, closure in Zariski topology. Given an alignment, we can estimate the joint probability p x1,x 2,x 3 as the relative frequency of column x 1 x 2 x 3 in the alignment. In the theoretical model, this would be a point on the variety V. Goal: use the ideal of V, I(V ) R[p AA...A, p AA...C,..., p TT...T ], to infer the topology of the phylogenetic tree. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

21 The evolutionary model defines a polynomial map ϕ : R d R 4n θ = (θ 1,..., θ d ) (p AA...A, p AA...C, p AA...G,..., p TT...T ) ϕ : d 1 4n 1 ϕ : C d C 4n Algebraic variety V = imϕ, closure in Zariski topology. Given an alignment, we can estimate the joint probability p x1,x 2,x 3 as the relative frequency of column x 1 x 2 x 3 in the alignment. In the theoretical model, this would be a point on the variety V. Goal: use the ideal of V, I(V ) R[p AA...A, p AA...C,..., p TT...T ], to infer the topology of the phylogenetic tree. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

22 Computing the ideal (Eriksson Ranestad Sturmfels Sullivant 05) Some generators of I(V ) depend only on the model chosen (not on the topology). E.g. for Jukes-Cantor they are: p x1 x 2 x 3 = 1 p AAA = p CCC = p GGG = p TTT 4 terms p AAC = p AAG = p AAT = = p TTG 12 terms p ACA = p AGA = p ATA = = p TGT 12 terms p CAA = p GAA = p TAA = = p GTT 12 terms p ACG = p ACT = p AGT = = p CGT 24 terms For this model, the unique polynomial that detects the phylogeny of the three species has degree 3. Definition (Cavender Felsenstein 87) The generators of I(V ) are called phylogenetic invariants. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

23 Computing the ideal (Eriksson Ranestad Sturmfels Sullivant 05) Some generators of I(V ) depend only on the model chosen (not on the topology). E.g. for Jukes-Cantor they are: p x1 x 2 x 3 = 1 p AAA = p CCC = p GGG = p TTT 4 terms p AAC = p AAG = p AAT = = p TTG 12 terms p ACA = p AGA = p ATA = = p TGT 12 terms p CAA = p GAA = p TAA = = p GTT 12 terms p ACG = p ACT = p AGT = = p CGT 24 terms For this model, the unique polynomial that detects the phylogeny of the three species has degree 3. Definition (Cavender Felsenstein 87) The generators of I(V ) are called phylogenetic invariants. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

24 Computing the ideal (Eriksson Ranestad Sturmfels Sullivant 05) Some generators of I(V ) depend only on the model chosen (not on the topology). E.g. for Jukes-Cantor they are: p x1 x 2 x 3 = 1 p AAA = p CCC = p GGG = p TTT 4 terms p AAC = p AAG = p AAT = = p TTG 12 terms p ACA = p AGA = p ATA = = p TGT 12 terms p CAA = p GAA = p TAA = = p GTT 12 terms p ACG = p ACT = p AGT = = p CGT 24 terms For this model, the unique polynomial that detects the phylogeny of the three species has degree 3. Definition (Cavender Felsenstein 87) The generators of I(V ) are called phylogenetic invariants. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

25 Problem: computation of invariants Computational algebra software fail to compute the ideal for 4 species! Kimura 3-parameter, 4 species, 8002 generators like: [Small trees webpage] M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

26 Problem: computation of invariants Computational algebra software fail to compute the ideal for 4 species! Kimura 3-parameter, 4 species, 8002 generators like: [Small trees webpage] M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

27 Group-based models. Discrete Fourier transform Group-based models (Kimura and Jukes-Cantor): G = Z 2 Z 2, A=(0,0),C=(0,1),G=(1,0),T=(1,1). Substitution matrices are functions on the group: S i (g p, g c ) = f i (g p g c ), g c, g p G. p g1,...,g m = p(g 1,..., g m ) = 1 4 e edges sum over all possible values at interior nodes. f e (g p(e) g c(e) ) Discrete Fourier transform f : G C ˆf (χ) = χ(g)f (g), χ Hom(G, C ) = G g G Convolution: (f 1 f 2 )(g) = h G f 1 (h)f 2 (g h) = f 1 f 2 = f 1 f 2 M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

28 Group based models. Theorem (Evans Speed) For a group-based model (i.e. Kimura 3, Kimura 2 or Jukes-Cantor) on a tree T, the discrete Fourier transform of the joint distribution p(g 1,..., g n ) has the following form q(χ 1,..., χ m ) = e edges f e( l leaves below e χ l ) Using a linear change of coordinates, one gets a monomial parameterization of the variety associated to the model ϕ : C d C 4n θ = ( θ 1,..., θ d ) (q AA...A, q AA...C, q AA...G,..., q TT...T ) so that it is a toric variety. The ideal is generated by binomials. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

29 Fourier transform for Kimura 3-parameter model Parameters S e = a e b e c e d e b e a e d e c e c e d e a e b e d e c e b e a e Fourier parameters Pe = simplex: 3 3 a e+b e+c e+d e=1 P e A =1 PA e PC e PG e PT e P e A =ae +b e +c e +d e, P e C =ae b e +c e d e... coordinates: p x1...x n q g1...g n, g 1 + +g n=0 simplex: 4n 1 4n 1 1 px1...x n = 1 q A...A = 1 Linear invariants: q g1...g n = 0 if g g n 0 in Z 2 Z 2. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

30 For the Kimura 3-parameter model, we are interested in V := im(ϕ) where ϕ : e E(T ) C 4 C 4n 1 (P e A,Pe C,Pe G,Pe T ) e (q x1...x n ) {x1,...,x n x 1 + +x n=0} But 4n 1 1 {q A...A = 1}. Definition The Kimura variety of the phylogenetic tree T is W := V {q A...A = 1}. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

31 Recursive construction of invariants Computational algebra software: even in Fourier parameterization, fail for 5 leaves. Theorem (Sturmfels Sullivant 05) For any group-based model, they give an explicit algorithm for obtaining the generators of the ideal of phylogenetic invariants of an n-leaved tree from those of an unrooted tree of 3 leaves. The generators are binomials of degree 4. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

32 Outline 1 Algebraic evolutionary models 2 Phylogenetic inference using algebraic geometry 3 The geometry of the Kimura variety 4 New results on simulated data M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

33 Phylogenetic inference using algebraic geometry : Biologists claim that phylogenetic invariants are not useful for phylogenetic reconstruction. Lake only used linear invariants! In particular, a work of Huelsenbeck 95 reveals the inefficiency of Lake s method of invariants. (Eriksson 05) for General Markov Model. (C Garcia Sullivant 05) for Kimura 3-parameter. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

34 Phylogenetic inference using algebraic geometry : Biologists claim that phylogenetic invariants are not useful for phylogenetic reconstruction. Lake only used linear invariants! In particular, a work of Huelsenbeck 95 reveals the inefficiency of Lake s method of invariants. (Eriksson 05) for General Markov Model. (C Garcia Sullivant 05) for Kimura 3-parameter. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

35 Phylogenetic inference using algebraic geometry Model: Unrooted tree of 4 leaves, Kimura 3-parameters. Naive algorithm: Take 1 of the evaluation of all invariants on the point given by the relative frequencies of the columns of the alignment. Obtain a score for each possible topology and choose the one with smaller score. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

36 Phylogenetic inference using algebraic geometry Model: Unrooted tree of 4 leaves, Kimura 3-parameters. Naive algorithm: Take 1 of the evaluation of all invariants on the point given by the relative frequencies of the columns of the alignment. Obtain a score for each possible topology and choose the one with smaller score. Huelsenbeck 95: assesses the performance of different phylogenetic reconstruction methods on the following tree space. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

37 Results on simulated data (Huelsenbeck) Huelsenbeck s results: Length 100 Length 500 Length 1000 Lake invariants NJ ML Huelsenbeck, Syst. Biol., 1995 M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

38 C Fernandez-Sanchez studies on C Garcia Sullivant method Length 100 Length 500 Length 1000 M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

39 Why using phylogenetic invariants 1 We do not need to estimate parameters of the model. 2 The algebraic model allows species within the tree to evolve at different mutation rates In the non-algebraic version, one has: S i = exp(q t i ) where Q is a fixed matrix that represents the instantaneous mutation rate of all the species in the tree. In the algebraic model, one is allowing different rates at each branch of the tree. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

40 For instance using algebraic geometry one should be able to reconstruct the biologically correct tree: dog human chimp rat mouse chicken (Al-Aidroos, Snir 05) chapter 21 in ASCB book. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

41 For instance using algebraic geometry one should be able to reconstruct the biologically correct tree: dog human chimp rat mouse chicken (Al-Aidroos, Snir 05) chapter 21 in ASCB book. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

42 For instance using algebraic geometry one should be able to reconstruct the biologically correct tree: dog human chimp rat mouse chicken dog human chimp rat mouse chicken (Al-Aidroos, Snir 05) chapter 21 in ASCB book. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

43 Simulations on non-homogeneous trees (different rate matrices) M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

44 Outline 1 Algebraic evolutionary models 2 Phylogenetic inference using algebraic geometry 3 The geometry of the Kimura variety 4 New results on simulated data M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

45 The global geometry of the Kimura variety Recall that V := im(ϕ) where ϕ : C 4 C 4n 1 e E(T ) (P e A,Pe C,Pe G,Pe T ) e (q x1...x n ) {x1,...,x n x 1 + +x n=0} But 4n 1 1 {q A...A = 1}. Definition The Kimura variety of the phylogenetic tree T is W := V {q A...A = 1}. W = ϕ( e E(T ) (C4 {P e A = 1})). M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

46 Local complete intersection V A N algebraic variety, µ minimum number of generators of I(V ), then µ codim(v ) = N dim(v ). V is called a complete intersection if µ = codim(v ). Example: the Kimura variety W C 4n 1 1 has codimension 4 n 1 6n + 8, n = number of leaves. For n = 4, the minimum number of generators of the Kimura variety is 8002 whereas its codimension is 48! On a neighborhood of a smooth point, any variety is a local complete intersection (i.e. it can be defined by codim(v ) equations). Goal: 1 Find the singular points. 2 Provide a set of local generators at non-singular points. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

47 Local complete intersection V A N algebraic variety, µ minimum number of generators of I(V ), then µ codim(v ) = N dim(v ). V is called a complete intersection if µ = codim(v ). Example: the Kimura variety W C 4n 1 1 has codimension 4 n 1 6n + 8, n = number of leaves. For n = 4, the minimum number of generators of the Kimura variety is 8002 whereas its codimension is 48! On a neighborhood of a smooth point, any variety is a local complete intersection (i.e. it can be defined by codim(v ) equations). Goal: 1 Find the singular points. 2 Provide a set of local generators at non-singular points. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

48 Local complete intersection V A N algebraic variety, µ minimum number of generators of I(V ), then µ codim(v ) = N dim(v ). V is called a complete intersection if µ = codim(v ). Example: the Kimura variety W C 4n 1 1 has codimension 4 n 1 6n + 8, n = number of leaves. For n = 4, the minimum number of generators of the Kimura variety is 8002 whereas its codimension is 48! On a neighborhood of a smooth point, any variety is a local complete intersection (i.e. it can be defined by codim(v ) equations). Goal: 1 Find the singular points. 2 Provide a set of local generators at non-singular points. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

49 Case n = 3 The parameterization ϕ is: ϕ : C 9 C 15 ((1,P 1 C,P1 G,P1 T ),(1,P2 C,P2 G,P2 T ),(1,P3 C,P3 G,P3 T )) (qxyz=p1 x P2 y P3 z ) {x+y+z=0} Let H = (Z 2 Z 2, ) and let (ε, δ) H act on C 9 sending (P 1, P 2, P 3 ) to ((1,εP 1 C,δP1 G,εδP1 T ),(1,εP2 C,δP2 G,εδP2 T ),(1,εP3 C,δP3 G,εδP3 T )). E.g. (ε, δ) = (1, 1), in probability parameters the group action permutes A C and G T. Proposition The Kimura variety W = im(ϕ) on T 3 is the affine GIT quotient C 9 //H. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

50 Arbitrary n T unrooted n-leaved tree (2n 3 edges, n 2 interior nodes) Extending the action of H to the n 2 interior nodes of T, we have Theorem The Kimura variety W is isomorphic to the affine GIT quotient Corollary (C 3 ) 2n 3 //H n 2 W = im(ϕ), no closure needed. ϕ 1 (q) 4 n 2 and there is just one preimage with biological meaning, q W. We are able to determine the singular points. In particular, biologically meaningful points q W + := ϕ( 3 + ) are not singular. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

51 Arbitrary n T unrooted n-leaved tree (2n 3 edges, n 2 interior nodes) Extending the action of H to the n 2 interior nodes of T, we have Theorem The Kimura variety W is isomorphic to the affine GIT quotient Corollary (C 3 ) 2n 3 //H n 2 W = im(ϕ), no closure needed. ϕ 1 (q) 4 n 2 and there is just one preimage with biological meaning, q W. We are able to determine the singular points. In particular, biologically meaningful points q W + := ϕ( 3 + ) are not singular. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

52 Arbitrary n T unrooted n-leaved tree (2n 3 edges, n 2 interior nodes) Extending the action of H to the n 2 interior nodes of T, we have Theorem The Kimura variety W is isomorphic to the affine GIT quotient Corollary (C 3 ) 2n 3 //H n 2 W = im(ϕ), no closure needed. ϕ 1 (q) 4 n 2 and there is just one preimage with biological meaning, q W. We are able to determine the singular points. In particular, biologically meaningful points q W + := ϕ( 3 + ) are not singular. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

53 Arbitrary n T unrooted n-leaved tree (2n 3 edges, n 2 interior nodes) Extending the action of H to the n 2 interior nodes of T, we have Theorem The Kimura variety W is isomorphic to the affine GIT quotient Corollary (C 3 ) 2n 3 //H n 2 W = im(ϕ), no closure needed. ϕ 1 (q) 4 n 2 and there is just one preimage with biological meaning, q W. We are able to determine the singular points. In particular, biologically meaningful points q W + := ϕ( 3 + ) are not singular. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

54 In Fourier parameters: 3 + = In probability parameters this is transformed into: S = A C G T A C G T ( a b c d b a d c c d a b d c b a ) a + b + c + d = 1 a + b c d > 0 a b + c d > 0 a b c + d > 0 M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

55 In Fourier parameters: 3 + = In probability parameters this is transformed into: S = A C G T A C G T ( a b c d b a d c c d a b d c b a ) a + b + c + d = 1 a + b > 1/2 a + c > 1/2 a + d > 1/2 M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

56 Local complete intersection Case n = 3 (Sturmfels Sullivant 05): I(V ) is minimally generated by 16 cubics and 18 quartics. Lemma The following six quartics q AAA q ATT q TCG q TGC q ACC q AGG q TAT q TTA, q CCA q CTG q TAT q TGC q CAC q CGT q TCG q TTA, q AGG q ATT q CAC q CCA q AAA q ACC q CGT q CTG, q ACC q ATT q GAG q GGA q AAA q AGG q GCT q GTC, q CAC q CTG q GCT q GGA q CCA q CGT q GAG q GTC, q GGA q GTC q TAT q TCG q GAG q GCT q TGC q TTA generate a local complete intersection that defines W at each point q W +. It does not depend on the point q. Skip proof M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

57 Proof. 1 J = (f 1..., f 6 ) defines a variety X containing W and both have the same dimension. 2 The points q W + are smooth points of X and W. 3 Both varieties coincide locally. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

58 Local complete intersection Arbitrary n quartics: quadrics: Q, non-redundant 2 2 minors from flattening Theorem J 3 J n 1 Q generate a local complete intersection that defines W at the biologically meaningful points q W +. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

59 Local complete intersection Arbitrary n quartics: quadrics: Q, non-redundant 2 2 minors from flattening Theorem J 3 J n 1 Q generate a local complete intersection that defines W at the biologically meaningful points q W +. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

60 Local complete intersection Arbitrary n quartics: J 3 quadrics: Q, non-redundant 2 2 minors from flattening Theorem J 3 J n 1 Q generate a local complete intersection that defines W at the biologically meaningful points q W +. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

61 Local complete intersection Arbitrary n quartics: J 3 quadrics: Q, non-redundant 2 2 minors from flattening Theorem J 3 J n 1 Q generate a local complete intersection that defines W at the biologically meaningful points q W +. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

62 Local complete intersection Arbitrary n quartics: J 3 J n 1 quadrics: Q, non-redundant 2 2 minors from flattening Theorem J 3 J n 1 Q generate a local complete intersection that defines W at the biologically meaningful points q W +. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

63 Local complete intersection Arbitrary n quartics: J 3 J n 1 quadrics: Q, non-redundant 2 2 minors from flattening Theorem J 3 J n 1 Q generate a local complete intersection that defines W at the biologically meaningful points q W +. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

64 Local complete intersection Arbitrary n quartics: J 3 J n 1 quadrics: Q, non-redundant 2 2 minors from flattening Theorem J 3 J n 1 Q generate a local complete intersection that defines W at the biologically meaningful points q W +. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

65 Outline 1 Algebraic evolutionary models 2 Phylogenetic inference using algebraic geometry 3 The geometry of the Kimura variety 4 New results on simulated data M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

66 New results on simulated data 4 leaves Instead of all generators, 48 polynomials that generate locally Length 100 Length 500 Length 1000 M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

67 New results on simulated data 4 leaves Instead of all generators, 48 polynomials that generate locally Length 100 Length 500 Length 1000 M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

68 New results on simulated data 4 leaves Instead of all generators, 48 polynomials that generate locally Length 100 Length 500 Length 1000 (Eriksson Yao 07) Machine learning approach, 52 invariants that perform better on this parameter space. M. Casanellas (UPC) Using algebraic geometry... IMA, March / 36

Phylogenetic invariants versus classical phylogenetics

Phylogenetic invariants versus classical phylogenetics Phylogenetic invariants versus classical phylogenetics Marta Casanellas Rius (joint work with Jesús Fernández-Sánchez) Departament de Matemàtica Aplicada I Universitat Politècnica de Catalunya Algebraic

More information

Practical Bioinformatics

Practical Bioinformatics 5/2/2017 Dictionaries d i c t i o n a r y = { A : T, T : A, G : C, C : G } d i c t i o n a r y [ G ] d i c t i o n a r y [ N ] = N d i c t i o n a r y. h a s k e y ( C ) Dictionaries g e n e t i c C o

More information

PHYLOGENETIC ALGEBRAIC GEOMETRY

PHYLOGENETIC ALGEBRAIC GEOMETRY PHYLOGENETIC ALGEBRAIC GEOMETRY NICHOLAS ERIKSSON, KRISTIAN RANESTAD, BERND STURMFELS, AND SETH SULLIVANT Abstract. Phylogenetic algebraic geometry is concerned with certain complex projective algebraic

More information

The Generalized Neighbor Joining method

The Generalized Neighbor Joining method The Generalized Neighbor Joining method Ruriko Yoshida Dept. of Mathematics Duke University Joint work with Dan Levy and Lior Pachter www.math.duke.edu/ ruriko data mining 1 Challenge We would like to

More information

Algebraic Statistics Tutorial I

Algebraic Statistics Tutorial I Algebraic Statistics Tutorial I Seth Sullivant North Carolina State University June 9, 2012 Seth Sullivant (NCSU) Algebraic Statistics June 9, 2012 1 / 34 Introduction to Algebraic Geometry Let R[p] =

More information

Phylogenetic Algebraic Geometry

Phylogenetic Algebraic Geometry Phylogenetic Algebraic Geometry Seth Sullivant North Carolina State University January 4, 2012 Seth Sullivant (NCSU) Phylogenetic Algebraic Geometry January 4, 2012 1 / 28 Phylogenetics Problem Given a

More information

Advanced topics in bioinformatics

Advanced topics in bioinformatics Feinberg Graduate School of the Weizmann Institute of Science Advanced topics in bioinformatics Shmuel Pietrokovski & Eitan Rubin Spring 2003 Course WWW site: http://bioinformatics.weizmann.ac.il/courses/atib

More information

SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA

SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS 1 Prokaryotes and Eukaryotes 2 DNA and RNA 3 4 Double helix structure Codons Codons are triplets of bases from the RNA sequence. Each triplet defines an amino-acid.

More information

SUPPORTING INFORMATION FOR. SEquence-Enabled Reassembly of β-lactamase (SEER-LAC): a Sensitive Method for the Detection of Double-Stranded DNA

SUPPORTING INFORMATION FOR. SEquence-Enabled Reassembly of β-lactamase (SEER-LAC): a Sensitive Method for the Detection of Double-Stranded DNA SUPPORTING INFORMATION FOR SEquence-Enabled Reassembly of β-lactamase (SEER-LAC): a Sensitive Method for the Detection of Double-Stranded DNA Aik T. Ooi, Cliff I. Stains, Indraneel Ghosh *, David J. Segal

More information

Characterization of Pathogenic Genes through Condensed Matrix Method, Case Study through Bacterial Zeta Toxin

Characterization of Pathogenic Genes through Condensed Matrix Method, Case Study through Bacterial Zeta Toxin International Journal of Genetic Engineering and Biotechnology. ISSN 0974-3073 Volume 2, Number 1 (2011), pp. 109-114 International Research Publication House http://www.irphouse.com Characterization of

More information

Regulatory Sequence Analysis. Sequence models (Bernoulli and Markov models)

Regulatory Sequence Analysis. Sequence models (Bernoulli and Markov models) Regulatory Sequence Analysis Sequence models (Bernoulli and Markov models) 1 Why do we need random models? Any pattern discovery relies on an underlying model to estimate the random expectation. This model

More information

High throughput near infrared screening discovers DNA-templated silver clusters with peak fluorescence beyond 950 nm

High throughput near infrared screening discovers DNA-templated silver clusters with peak fluorescence beyond 950 nm Electronic Supplementary Material (ESI) for Nanoscale. This journal is The Royal Society of Chemistry 2018 High throughput near infrared screening discovers DNA-templated silver clusters with peak fluorescence

More information

Crick s early Hypothesis Revisited

Crick s early Hypothesis Revisited Crick s early Hypothesis Revisited Or The Existence of a Universal Coding Frame Ryan Rossi, Jean-Louis Lassez and Axel Bernal UPenn Center for Bioinformatics BIOINFORMATICS The application of computer

More information

Supplemental data. Pommerrenig et al. (2011). Plant Cell /tpc

Supplemental data. Pommerrenig et al. (2011). Plant Cell /tpc Supplemental Figure 1. Prediction of phloem-specific MTK1 expression in Arabidopsis shoots and roots. The images and the corresponding numbers showing absolute (A) or relative expression levels (B) of

More information

Clay Carter. Department of Biology. QuickTime and a TIFF (Uncompressed) decompressor are needed to see this picture.

Clay Carter. Department of Biology. QuickTime and a TIFF (Uncompressed) decompressor are needed to see this picture. QuickTime and a TIFF (Uncompressed) decompressor are needed to see this picture. Clay Carter Department of Biology QuickTime and a TIFF (LZW) decompressor are needed to see this picture. Ornamental tobacco

More information

Introduction to Molecular Phylogeny

Introduction to Molecular Phylogeny Introduction to Molecular Phylogeny Starting point: a set of homologous, aligned DNA or protein sequences Result of the process: a tree describing evolutionary relationships between studied sequences =

More information

SSR ( ) Vol. 48 No ( Microsatellite marker) ( Simple sequence repeat,ssr),

SSR ( ) Vol. 48 No ( Microsatellite marker) ( Simple sequence repeat,ssr), 48 3 () Vol. 48 No. 3 2009 5 Journal of Xiamen University (Nat ural Science) May 2009 SSR,,,, 3 (, 361005) : SSR. 21 516,410. 60 %96. 7 %. (),(Between2groups linkage method),.,, 11 (),. 12,. (, ), : 0.

More information

GEOMETRY OF THE KIMURA 3-PARAMETER MODEL arxiv:math/ v1 [math.ag] 27 Feb 2007

GEOMETRY OF THE KIMURA 3-PARAMETER MODEL arxiv:math/ v1 [math.ag] 27 Feb 2007 GEOMETRY OF THE KIMURA 3-PARAMETER MODEL arxiv:math/0702834v1 [math.ag] 27 Feb 2007 MARTA CASANELLAS AND JESÚS FERNÁNDEZ-SÁNCHEZ Abstract. TheKimura3-parametermodelonatreeofnleavesisoneofthemost used in

More information

Supplementary Information for

Supplementary Information for Supplementary Information for Evolutionary conservation of codon optimality reveals hidden signatures of co-translational folding Sebastian Pechmann & Judith Frydman Department of Biology and BioX, Stanford

More information

arxiv: v1 [q-bio.pe] 27 Oct 2011

arxiv: v1 [q-bio.pe] 27 Oct 2011 INVARIANT BASED QUARTET PUZZLING JOE RUSINKO AND BRIAN HIPP arxiv:1110.6194v1 [q-bio.pe] 27 Oct 2011 Abstract. Traditional Quartet Puzzling algorithms use maximum likelihood methods to reconstruct quartet

More information

Why do more divergent sequences produce smaller nonsynonymous/synonymous

Why do more divergent sequences produce smaller nonsynonymous/synonymous Genetics: Early Online, published on June 21, 2013 as 10.1534/genetics.113.152025 Why do more divergent sequences produce smaller nonsynonymous/synonymous rate ratios in pairwise sequence comparisons?

More information

SUPPLEMENTARY DATA - 1 -

SUPPLEMENTARY DATA - 1 - - 1 - SUPPLEMENTARY DATA Construction of B. subtilis rnpb complementation plasmids For complementation, the B. subtilis rnpb wild-type gene (rnpbwt) under control of its native rnpb promoter and terminator

More information

Evolvable Neural Networks for Time Series Prediction with Adaptive Learning Interval

Evolvable Neural Networks for Time Series Prediction with Adaptive Learning Interval Evolvable Neural Networs for Time Series Prediction with Adaptive Learning Interval Dong-Woo Lee *, Seong G. Kong *, and Kwee-Bo Sim ** *Department of Electrical and Computer Engineering, The University

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

Evolutionary Analysis of Viral Genomes

Evolutionary Analysis of Viral Genomes University of Oxford, Department of Zoology Evolutionary Biology Group Department of Zoology University of Oxford South Parks Road Oxford OX1 3PS, U.K. Fax: +44 1865 271249 Evolutionary Analysis of Viral

More information

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1 Supplementary Figure 1 Zn 2+ -binding sites in USP18. (a) The two molecules of USP18 present in the asymmetric unit are shown. Chain A is shown in blue, chain B in green. Bound Zn 2+ ions are shown as

More information

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally

More information

Metric learning for phylogenetic invariants

Metric learning for phylogenetic invariants Metric learning for phylogenetic invariants Nicholas Eriksson nke@stanford.edu Department of Statistics, Stanford University, Stanford, CA 94305-4065 Yuan Yao yuany@math.stanford.edu Department of Mathematics,

More information

Introduction to Algebraic Statistics

Introduction to Algebraic Statistics Introduction to Algebraic Statistics Seth Sullivant North Carolina State University January 5, 2017 Seth Sullivant (NCSU) Algebraic Statistics January 5, 2017 1 / 28 What is Algebraic Statistics? Observation

More information

Phylogenetics: Parsimony and Likelihood. COMP Spring 2016 Luay Nakhleh, Rice University

Phylogenetics: Parsimony and Likelihood. COMP Spring 2016 Luay Nakhleh, Rice University Phylogenetics: Parsimony and Likelihood COMP 571 - Spring 2016 Luay Nakhleh, Rice University The Problem Input: Multiple alignment of a set S of sequences Output: Tree T leaf-labeled with S Assumptions

More information

Number-controlled spatial arrangement of gold nanoparticles with

Number-controlled spatial arrangement of gold nanoparticles with Electronic Supplementary Material (ESI) for RSC Advances. This journal is The Royal Society of Chemistry 2016 Number-controlled spatial arrangement of gold nanoparticles with DNA dendrimers Ping Chen,*

More information

NSCI Basic Properties of Life and The Biochemistry of Life on Earth

NSCI Basic Properties of Life and The Biochemistry of Life on Earth NSCI 314 LIFE IN THE COSMOS 4 Basic Properties of Life and The Biochemistry of Life on Earth Dr. Karen Kolehmainen Department of Physics CSUSB http://physics.csusb.edu/~karen/ WHAT IS LIFE? HARD TO DEFINE,

More information

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

TM1 TM2 TM3 TM4 TM5 TM6 TM bp

TM1 TM2 TM3 TM4 TM5 TM6 TM bp a 467 bp 1 482 2 93 3 321 4 7 281 6 21 7 66 8 176 19 12 13 212 113 16 8 b ATG TCA GGA CAT GTA ATG GAG GAA TGT GTA GTT CAC GGT ACG TTA GCG GCA GTA TTG CGT TTA ATG GGC GTA GTG M S G H V M E E C V V H G T

More information

3. Evolution makes sense of homologies. 3. Evolution makes sense of homologies. 3. Evolution makes sense of homologies

3. Evolution makes sense of homologies. 3. Evolution makes sense of homologies. 3. Evolution makes sense of homologies Richard Owen (1848) introduced the term Homology to refer to structural similarities among organisms. To Owen, these similarities indicated that organisms were created following a common plan or archetype.

More information

Electronic supplementary material

Electronic supplementary material Applied Microbiology and Biotechnology Electronic supplementary material A family of AA9 lytic polysaccharide monooxygenases in Aspergillus nidulans is differentially regulated by multiple substrates and

More information

Table S1. Primers and PCR conditions used in this paper Primers Sequence (5 3 ) Thermal conditions Reference Rhizobacteria 27F 1492R

Table S1. Primers and PCR conditions used in this paper Primers Sequence (5 3 ) Thermal conditions Reference Rhizobacteria 27F 1492R Table S1. Primers and PCR conditions used in this paper Primers Sequence (5 3 ) Thermal conditions Reference Rhizobacteria 27F 1492R AAC MGG ATT AGA TAC CCK G GGY TAC CTT GTT ACG ACT T Detection of Candidatus

More information

Algebraic Invariants of Phylogenetic Trees

Algebraic Invariants of Phylogenetic Trees Harvey Mudd College November 21st, 2006 The Problem DNA Sequence Data Given a set of aligned DNA sequences... Human: A G A C G T T A C G T A... Chimp: A G A G C A A C T T T G... Monkey: A A A C G A T A

More information

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive. Additive distances Let T be a tree on leaf set S and let w : E R + be an edge-weighting of T, and assume T has no nodes of degree two. Let D ij = e P ij w(e), where P ij is the path in T from i to j. Then

More information

The Mathematics of Phylogenomics

The Mathematics of Phylogenomics arxiv:math/0409132v2 [math.st] 27 Sep 2005 The Mathematics of Phylogenomics Lior Pachter and Bernd Sturmfels Department of Mathematics, UC Berkeley [lpachter,bernd]@math.berkeley.edu February 1, 2008 The

More information

Open Problems in Algebraic Statistics

Open Problems in Algebraic Statistics Open Problems inalgebraic Statistics p. Open Problems in Algebraic Statistics BERND STURMFELS UNIVERSITY OF CALIFORNIA, BERKELEY and TECHNISCHE UNIVERSITÄT BERLIN Advertisement Oberwolfach Seminar Algebraic

More information

Supporting Information

Supporting Information Supporting Information T. Pellegrino 1,2,3,#, R. A. Sperling 1,#, A. P. Alivisatos 2, W. J. Parak 1,2,* 1 Center for Nanoscience, Ludwig Maximilians Universität München, München, Germany 2 Department of

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION SUPPLEMENTARY INFORMATION DOI:.38/NCHEM.246 Optimizing the specificity of nucleic acid hyridization David Yu Zhang, Sherry Xi Chen, and Peng Yin. Analytic framework and proe design 3.. Concentration-adjusted

More information

Example: Hardy-Weinberg Equilibrium. Algebraic Statistics Tutorial I. Phylogenetics. Main Point of This Tutorial. Model-Based Phylogenetics

Example: Hardy-Weinberg Equilibrium. Algebraic Statistics Tutorial I. Phylogenetics. Main Point of This Tutorial. Model-Based Phylogenetics Example: Hardy-Weinberg Equilibrium Algebraic Statistics Tutorial I Seth Sullivant North Carolina State University July 22, 2012 Suppose a gene has two alleles, a and A. If allele a occurs in the population

More information

Phylogeny: traditional and Bayesian approaches

Phylogeny: traditional and Bayesian approaches Phylogeny: traditional and Bayesian approaches 5-Feb-2014 DEKM book Notes from Dr. B. John Holder and Lewis, Nature Reviews Genetics 4, 275-284, 2003 1 Phylogeny A graph depicting the ancestor-descendent

More information

Geometry of Phylogenetic Inference

Geometry of Phylogenetic Inference Geometry of Phylogenetic Inference Matilde Marcolli CS101: Mathematical and Computational Linguistics Winter 2015 References N. Eriksson, K. Ranestad, B. Sturmfels, S. Sullivant, Phylogenetic algebraic

More information

Supporting Information for. Initial Biochemical and Functional Evaluation of Murine Calprotectin Reveals Ca(II)-

Supporting Information for. Initial Biochemical and Functional Evaluation of Murine Calprotectin Reveals Ca(II)- Supporting Information for Initial Biochemical and Functional Evaluation of Murine Calprotectin Reveals Ca(II)- Dependence and Its Ability to Chelate Multiple Nutrient Transition Metal Ions Rose C. Hadley,

More information

Building a Multifunctional Aptamer-Based DNA Nanoassembly for Targeted Cancer Therapy

Building a Multifunctional Aptamer-Based DNA Nanoassembly for Targeted Cancer Therapy Supporting Information Building a Multifunctional Aptamer-Based DNA Nanoassembly for Targeted Cancer Therapy Cuichen Wu,, Da Han,, Tao Chen,, Lu Peng, Guizhi Zhu,, Mingxu You,, Liping Qiu,, Kwame Sefah,

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION DOI:.8/NCHEM. Conditionally Fluorescent Molecular Probes for Detecting Single Base Changes in Double-stranded DNA Sherry Xi Chen, David Yu Zhang, Georg Seelig. Analytic framework and probe design.. Design

More information

Evolutionary Models. Evolutionary Models

Evolutionary Models. Evolutionary Models Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment

More information

Modelling and Analysis in Bioinformatics. Lecture 1: Genomic k-mer Statistics

Modelling and Analysis in Bioinformatics. Lecture 1: Genomic k-mer Statistics 582746 Modelling and Analysis in Bioinformatics Lecture 1: Genomic k-mer Statistics Juha Kärkkäinen 06.09.2016 Outline Course introduction Genomic k-mers 1-Mers 2-Mers 3-Mers k-mers for Larger k Outline

More information

The Trigram and other Fundamental Philosophies

The Trigram and other Fundamental Philosophies The Trigram and other Fundamental Philosophies by Weimin Kwauk July 2012 The following offers a minimal introduction to the trigram and other Chinese fundamental philosophies. A trigram consists of three

More information

Supplemental Table 1. Primers used for cloning and PCR amplification in this study

Supplemental Table 1. Primers used for cloning and PCR amplification in this study Supplemental Table 1. Primers used for cloning and PCR amplification in this study Target Gene Primer sequence NATA1 (At2g393) forward GGG GAC AAG TTT GTA CAA AAA AGC AGG CTT CAT GGC GCC TCC AAC CGC AGC

More information

Molecular Evolution and Phylogenetic Tree Reconstruction

Molecular Evolution and Phylogenetic Tree Reconstruction 1 4 Molecular Evolution and Phylogenetic Tree Reconstruction 3 2 5 1 4 2 3 5 Orthology, Paralogy, Inparalogs, Outparalogs Phylogenetic Trees Nodes: species Edges: time of independent evolution Edge length

More information

Protein Threading. Combinatorial optimization approach. Stefan Balev.

Protein Threading. Combinatorial optimization approach. Stefan Balev. Protein Threading Combinatorial optimization approach Stefan Balev Stefan.Balev@univ-lehavre.fr Laboratoire d informatique du Havre Université du Havre Stefan Balev Cours DEA 30/01/2004 p.1/42 Outline

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

Probabilistic modeling and molecular phylogeny

Probabilistic modeling and molecular phylogeny Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical University of Denmark (DTU) What is a model? Mathematical

More information

19 Tree Construction using Singular Value Decomposition

19 Tree Construction using Singular Value Decomposition 19 Tree Construction using Singular Value Decomposition Nicholas Eriksson We present a new, statistically consistent algorithm for phylogenetic tree construction that uses the algeraic theory of statistical

More information

Phylogeny of Mixture Models

Phylogeny of Mixture Models Phylogeny of Mixture Models Daniel Štefankovič Department of Computer Science University of Rochester joint work with Eric Vigoda College of Computing Georgia Institute of Technology Outline Introduction

More information

Supplemental Figure 1.

Supplemental Figure 1. A wt spoiiiaδ spoiiiahδ bofaδ B C D E spoiiiaδ, bofaδ Supplemental Figure 1. GFP-SpoIVFA is more mislocalized in the absence of both BofA and SpoIIIAH. Sporulation was induced by resuspension in wild-type

More information

ELIZABETH S. ALLMAN and JOHN A. RHODES ABSTRACT 1. INTRODUCTION

ELIZABETH S. ALLMAN and JOHN A. RHODES ABSTRACT 1. INTRODUCTION JOURNAL OF COMPUTATIONAL BIOLOGY Volume 13, Number 5, 2006 Mary Ann Liebert, Inc. Pp. 1101 1113 The Identifiability of Tree Topology for Phylogenetic Models, Including Covarion and Mixture Models ELIZABETH

More information

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057 Estimating Phylogenies (Evolutionary Trees) II Biol4230 Thurs, March 2, 2017 Bill Pearson wrp@virginia.edu 4-2818 Jordan 6-057 Tree estimation strategies: Parsimony?no model, simply count minimum number

More information

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

More information

arxiv: v1 [q-bio.pe] 23 Nov 2017

arxiv: v1 [q-bio.pe] 23 Nov 2017 DIMENSIONS OF GROUP-BASED PHYLOGENETIC MIXTURES arxiv:1711.08686v1 [q-bio.pe] 23 Nov 2017 HECTOR BAÑOS, NATHANIEL BUSHEK, RUTH DAVIDSON, ELIZABETH GROSS, PAMELA E. HARRIS, ROBERT KRONE, COLBY LONG, ALLEN

More information

Phylogenetics: Parsimony

Phylogenetics: Parsimony 1 Phylogenetics: Parsimony COMP 571 Luay Nakhleh, Rice University he Problem 2 Input: Multiple alignment of a set S of sequences Output: ree leaf-labeled with S Assumptions Characters are mutually independent

More information

Sex-Linked Inheritance in Macaque Monkeys: Implications for Effective Population Size and Dispersal to Sulawesi

Sex-Linked Inheritance in Macaque Monkeys: Implications for Effective Population Size and Dispersal to Sulawesi Supporting Information http://www.genetics.org/cgi/content/full/genetics.110.116228/dc1 Sex-Linked Inheritance in Macaque Monkeys: Implications for Effective Population Size and Dispersal to Sulawesi Ben

More information

Counting phylogenetic invariants in some simple cases. Joseph Felsenstein. Department of Genetics SK-50. University of Washington

Counting phylogenetic invariants in some simple cases. Joseph Felsenstein. Department of Genetics SK-50. University of Washington Counting phylogenetic invariants in some simple cases Joseph Felsenstein Department of Genetics SK-50 University of Washington Seattle, Washington 98195 Running Headline: Counting Phylogenetic Invariants

More information

Reconstruire le passé biologique modèles, méthodes, performances, limites

Reconstruire le passé biologique modèles, méthodes, performances, limites Reconstruire le passé biologique modèles, méthodes, performances, limites Olivier Gascuel Centre de Bioinformatique, Biostatistique et Biologie Intégrative C3BI USR 3756 Institut Pasteur & CNRS Reconstruire

More information

Reading for Lecture 13 Release v10

Reading for Lecture 13 Release v10 Reading for Lecture 13 Release v10 Christopher Lee November 15, 2011 Contents 1 Evolutionary Trees i 1.1 Evolution as a Markov Process...................................... ii 1.2 Rooted vs. Unrooted Trees........................................

More information

Supplementary Information

Supplementary Information Electronic Supplementary Material (ESI) for RSC Advances. This journal is The Royal Society of Chemistry 2014 Directed self-assembly of genomic sequences into monomeric and polymeric branched DNA structures

More information

Evidence for Evolution: Change Over Time (Make Up Assignment)

Evidence for Evolution: Change Over Time (Make Up Assignment) Lesson 7.2 Evidence for Evolution: Change Over Time (Make Up Assignment) Name Date Period Key Terms Adaptive radiation Molecular Record Vestigial organ Homologous structure Strata Divergent evolution Evolution

More information

Codon Distribution in Error-Detecting Circular Codes

Codon Distribution in Error-Detecting Circular Codes life Article Codon Distribution in Error-Detecting Circular Codes Elena Fimmel, * and Lutz Strüngmann Institute for Mathematical Biology, Faculty of Computer Science, Mannheim University of Applied Sciences,

More information

The role of the FliD C-terminal domain in pentamer formation and

The role of the FliD C-terminal domain in pentamer formation and The role of the FliD C-terminal domain in pentamer formation and interaction with FliT Hee Jung Kim 1,2,*, Woongjae Yoo 3,*, Kyeong Sik Jin 4, Sangryeol Ryu 3,5 & Hyung Ho Lee 1, 1 Department of Chemistry,

More information

A new parametrization for binary hidden Markov modes

A new parametrization for binary hidden Markov modes A new parametrization for binary hidden Markov models Andrew Critch, UC Berkeley at Pennsylvania State University June 11, 2012 See Binary hidden Markov models and varieties [, 2012], arxiv:1206.0500,

More information

part 4: phenomenological load and biological inference. phenomenological load review types of models. Gαβ = 8π Tαβ. Newton.

part 4: phenomenological load and biological inference. phenomenological load review types of models. Gαβ = 8π Tαβ. Newton. 2017-07-29 part 4: and biological inference review types of models phenomenological Newton F= Gm1m2 r2 mechanistic Einstein Gαβ = 8π Tαβ 1 molecular evolution is process and pattern process pattern MutSel

More information

The 3 Genomic Numbers Discovery: How Our Genome Single-Stranded DNA Sequence Is Self-Designed as a Numerical Whole

The 3 Genomic Numbers Discovery: How Our Genome Single-Stranded DNA Sequence Is Self-Designed as a Numerical Whole Applied Mathematics, 2013, 4, 37-53 http://dx.doi.org/10.4236/am.2013.410a2004 Published Online October 2013 (http://www.scirp.org/journal/am) The 3 Genomic Numbers Discovery: How Our Genome Single-Stranded

More information

USING INVARIANTS FOR PHYLOGENETIC TREE CONSTRUCTION

USING INVARIANTS FOR PHYLOGENETIC TREE CONSTRUCTION USING INVARIANTS FOR PHYLOGENETIC TREE CONSTRUCTION NICHOLAS ERIKSSON Abstract. Phylogenetic invariants are certain polynomials in the joint probability distribution of a Markov model on a phylogenetic

More information

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from

More information

part 3: analysis of natural selection pressure

part 3: analysis of natural selection pressure part 3: analysis of natural selection pressure markov models are good phenomenological codon models do have many benefits: o principled framework for statistical inference o avoiding ad hoc corrections

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

Phylogenetic inference

Phylogenetic inference Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

More information

BMI/CS 776 Lecture 4. Colin Dewey

BMI/CS 776 Lecture 4. Colin Dewey BMI/CS 776 Lecture 4 Colin Dewey 2007.02.01 Outline Common nucleotide substitution models Directed graphical models Ancestral sequence inference Poisson process continuous Markov process X t0 X t1 X t2

More information

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016 Molecular phylogeny - Using molecular sequences to infer evolutionary relationships Tore Samuelsson Feb 2016 Molecular phylogeny is being used in the identification and characterization of new pathogens,

More information

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM).

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). 1 Bioinformatics: In-depth PROBABILITY & STATISTICS Spring Semester 2011 University of Zürich and ETH Zürich Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). Dr. Stefanie Muff

More information

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree) I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

More information

Phylogenetic Networks, Trees, and Clusters

Phylogenetic Networks, Trees, and Clusters Phylogenetic Networks, Trees, and Clusters Luay Nakhleh 1 and Li-San Wang 2 1 Department of Computer Science Rice University Houston, TX 77005, USA nakhleh@cs.rice.edu 2 Department of Biology University

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

More information

evoglow - express N kit distributed by Cat.#: FP product information broad host range vectors - gram negative bacteria

evoglow - express N kit distributed by Cat.#: FP product information broad host range vectors - gram negative bacteria evoglow - express N kit broad host range vectors - gram negative bacteria product information distributed by Cat.#: FP-21020 Content: Product Overview... 3 evoglow express N -kit... 3 The evoglow -Fluorescent

More information

The Algebraic Degree of Semidefinite Programming

The Algebraic Degree of Semidefinite Programming The Algebraic Degree ofsemidefinite Programming p. The Algebraic Degree of Semidefinite Programming BERND STURMFELS UNIVERSITY OF CALIFORNIA, BERKELEY joint work with and Jiawang Nie (IMA Minneapolis)

More information

Evolutionary dynamics of abundant stop codon readthrough in Anopheles and Drosophila

Evolutionary dynamics of abundant stop codon readthrough in Anopheles and Drosophila biorxiv preprint first posted online May. 3, 2016; doi: http://dx.doi.org/10.1101/051557. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved.

More information

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington

More information

Molecular Evolution, course # Final Exam, May 3, 2006

Molecular Evolution, course # Final Exam, May 3, 2006 Molecular Evolution, course #27615 Final Exam, May 3, 2006 This exam includes a total of 12 problems on 7 pages (including this cover page). The maximum number of points obtainable is 150, and at least

More information

evoglow - express N kit Cat. No.: product information broad host range vectors - gram negative bacteria

evoglow - express N kit Cat. No.: product information broad host range vectors - gram negative bacteria evoglow - express N kit broad host range vectors - gram negative bacteria product information Cat. No.: 2.1.020 evocatal GmbH 2 Content: Product Overview... 4 evoglow express N kit... 4 The evoglow Fluorescent

More information

Commuting birth-and-death processes

Commuting birth-and-death processes Commuting birth-and-death processes Caroline Uhler Department of Statistics UC Berkeley (joint work with Steven N. Evans and Bernd Sturmfels) MSRI Workshop on Algebraic Statistics December 18, 2008 Birth-and-death

More information

Mathematical Biology. Phylogenetic mixtures and linear invariants for equal input models. B Mike Steel. Marta Casanellas 1 Mike Steel 2

Mathematical Biology. Phylogenetic mixtures and linear invariants for equal input models. B Mike Steel. Marta Casanellas 1 Mike Steel 2 J. Math. Biol. DOI 0.007/s00285-06-055-8 Mathematical Biology Phylogenetic mixtures and linear invariants for equal input models Marta Casanellas Mike Steel 2 Received: 5 February 206 / Revised: July 206

More information

arxiv: v5 [q-bio.pe] 24 Oct 2016

arxiv: v5 [q-bio.pe] 24 Oct 2016 On the Quirks of Maximum Parsimony and Likelihood on Phylogenetic Networks Christopher Bryant a, Mareike Fischer b, Simone Linz c, Charles Semple d arxiv:1505.06898v5 [q-bio.pe] 24 Oct 2016 a Statistics

More information

3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) Content HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University,

More information

ChemiScreen CaS Calcium Sensor Receptor Stable Cell Line

ChemiScreen CaS Calcium Sensor Receptor Stable Cell Line PRODUCT DATASHEET ChemiScreen CaS Calcium Sensor Receptor Stable Cell Line CATALOG NUMBER: HTS137C CONTENTS: 2 vials of mycoplasma-free cells, 1 ml per vial. STORAGE: Vials are to be stored in liquid N

More information

Re- engineering cellular physiology by rewiring high- level global regulatory genes

Re- engineering cellular physiology by rewiring high- level global regulatory genes Re- engineering cellular physiology by rewiring high- level global regulatory genes Stephen Fitzgerald 1,2,, Shane C Dillon 1, Tzu- Chiao Chao 2, Heather L Wiencko 3, Karsten Hokamp 3, Andrew DS Cameron

More information