Using algebraic geometry for phylogenetic reconstruction


Marta Casanellas i Rius (joint work with Jesús Fernández-Sánchez)
Departament de Matemàtica Aplicada I, Universitat Politècnica de Catalunya
IMA Workshop: Applications in Biology, Dynamics, and Statistics, March 8, 2007
M. Casanellas (UPC), Using algebraic geometry..., IMA, March 2007
Outline
1. Algebraic evolutionary models
2. Phylogenetic inference using algebraic geometry
3. The geometry of the Kimura variety
4. New results on simulated data
Phylogenetic reconstruction
Given: n current species (e.g. HUMAN, GORILLA, CHIMP), part of their genomes, and an alignment of those sequences.
Goal: to reconstruct their ancestral relationships (phylogeny).
Alignment
Mutations, deletions and insertions of nucleotides occur along the speciation process.
Given two DNA sequences, an alignment is a correspondence between them that accounts for their differences.
The optimal alignment is the one that minimizes the number of mutations, deletions and insertions.
  seq 1: ACGTAGCTAAGTTA...
  seq 2: ACCGAGACCCAGTA...
A possible alignment is:
  seq 1: A C G T A G C T A A G T T A
  seq 2: A C C G A G A C C C A G T A
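The cost being minimized on the Alignment slide can be illustrated with a minimal Python sketch. It assumes unit cost for every mutation, insertion and deletion (the slide does not fix a scoring scheme), and the function names `mismatches` and `edit_distance` are illustrative, not from the talk.

```python
def mismatches(s1, s2):
    """Count differing columns in a gapless alignment of two equal-length sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(s1, s2) if x != y)


def edit_distance(s1, s2):
    """Minimum number of substitutions, insertions and deletions turning s1 into s2.

    Assumption: every event has cost 1 (standard Levenshtein dynamic program).
    """
    prev = list(range(len(s2) + 1))
    for i, x in enumerate(s1, start=1):
        curr = [i]
        for j, y in enumerate(s2, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # match or substitute
        prev = curr
    return prev[-1]


seq1 = "ACGTAGCTAAGTTA"
seq2 = "ACCGAGACCCAGTA"
print("mismatches in the gapless alignment:", mismatches(seq1, seq2))
print("optimal number of events:", edit_distance(seq1, seq2))
```

The gapless alignment is one candidate; the dynamic program searches over all alignments, so its value can never exceed the mismatch count.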
Phylogenetic reconstruction
Given n current species (e.g. HUMAN, GORILLA, CHIMP), part of their genomes, and an alignment such as
  AACTTCGAGGCTTACCGCTG
  AAGGTCGATGCTCACCGATG
  AACGTCTATGCTCACCGATG
Goal: to reconstruct their ancestral relationships (phylogeny).
Algebraic evolutionary models
Assume that all sites of the alignment evolve equally and independently.
At each node of the tree we put a random variable taking values in {A, C, G, T}.
$t_i$ = branch length (represents the number of mutations per site along that branch).
Variables at the leaves are observed and variables at the interior nodes are hidden.
At each branch we write a matrix (substitution matrix) with the probabilities of a nucleotide at the parent node being substituted by another at its child (rows and columns indexed by A, C, G, T):
\[
S = \begin{pmatrix}
P(A \mid A) & P(C \mid A) & P(G \mid A) & P(T \mid A) \\
P(A \mid C) & P(C \mid C) & P(G \mid C) & P(T \mid C) \\
P(A \mid G) & P(C \mid G) & P(G \mid G) & P(T \mid G) \\
P(A \mid T) & P(C \mid T) & P(G \mid T) & P(T \mid T)
\end{pmatrix}
\]
The entries of $S_i$ are unknown parameters.
Example (Group-based models: Kimura 3-parameter)
Root node has uniform distribution: $\pi_A = \dots = \pi_T = 0.25$.
\[
S_i = \begin{pmatrix}
a_i & b_i & c_i & d_i \\
b_i & a_i & d_i & c_i \\
c_i & d_i & a_i & b_i \\
d_i & c_i & b_i & a_i
\end{pmatrix},
\qquad a_i + b_i + c_i + d_i = 1
\]
(rows and columns indexed by A, C, G, T).
Kimura 2-parameter: $b_i = d_i$. Jukes-Cantor: $b_i = c_i = d_i$.
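As a small illustration of how the three group-based models nest, the following Python sketch builds the Kimura 3-parameter matrix and obtains the Kimura 2-parameter and Jukes-Cantor matrices as special cases. The function names and numeric values are hypothetical.

```python
def k3p_matrix(b, c, d):
    """Kimura 3-parameter substitution matrix; a is fixed by a + b + c + d = 1."""
    a = 1.0 - b - c - d
    return [[a, b, c, d],
            [b, a, d, c],
            [c, d, a, b],
            [d, c, b, a]]


def k2p_matrix(b, c):
    """Kimura 2-parameter: the special case b = d."""
    return k3p_matrix(b, c, b)


def jc_matrix(b):
    """Jukes-Cantor: the special case b = c = d."""
    return k3p_matrix(b, b, b)


S = k3p_matrix(0.05, 0.02, 0.03)
for row in S:
    print(row)
```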
Hidden Markov process
We denote the joint distribution of the observed variables $X_1, X_2, X_3$ as $p_{x_1x_2x_3} = \mathrm{Prob}(X_1 = x_1, X_2 = x_2, X_3 = x_3)$.
\[
p_{x_1x_2x_3} = \sum_{y_4,\, y_r \in \{A,C,G,T\}} \pi_{y_r}\, S_1(x_1, y_r)\, S_4(y_4, y_r)\, S_2(x_2, y_4)\, S_3(x_3, y_4)
\]
$p_{x_1x_2x_3}$ is a homogeneous polynomial in the parameters whose degree is the number of edges.
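The sum over hidden states can be evaluated directly. The sketch below assumes the rooted 3-leaf tree of the formula (root r with child leaf 1 and interior node 4, which has leaves 2 and 3), Kimura 3-parameter matrices on every edge and a uniform root; the helper names and parameter values are made up for illustration.

```python
from itertools import product

NUC = "ACGT"


def k3p_matrix(b, c, d):
    """Kimura 3-parameter matrix with a = 1 - b - c - d."""
    a = 1.0 - b - c - d
    return [[a, b, c, d], [b, a, d, c], [c, d, a, b], [d, c, b, a]]


def joint_distribution(S1, S2, S3, S4, pi=(0.25, 0.25, 0.25, 0.25)):
    """Return p[x1x2x3], summing over the hidden states y_r (root) and y_4."""
    p = {}
    for x1, x2, x3 in product(range(4), repeat=3):
        total = 0.0
        for yr, y4 in product(range(4), repeat=2):
            total += pi[yr] * S1[x1][yr] * S4[y4][yr] * S2[x2][y4] * S3[x3][y4]
        p[NUC[x1] + NUC[x2] + NUC[x3]] = total
    return p


p = joint_distribution(k3p_matrix(0.05, 0.02, 0.03),
                       k3p_matrix(0.04, 0.01, 0.02),
                       k3p_matrix(0.06, 0.03, 0.01),
                       k3p_matrix(0.02, 0.02, 0.05))
print(len(p), sum(p.values()))
```

Each of the 64 coordinates is a sum of products of one entry from each of the four matrices, which is the degree-4 homogeneity noted on the slide.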
The evolutionary model defines a polynomial map
\[
\varphi : \mathbb{R}^d \to \mathbb{R}^{4^n}, \qquad \theta = (\theta_1, \dots, \theta_d) \mapsto (p_{AA\ldots A},\, p_{AA\ldots C},\, p_{AA\ldots G},\, \dots,\, p_{TT\ldots T}).
\]
On stochastic parameters, $\varphi$ takes values in the probability simplex $\Delta^{4^n - 1}$; over the complex numbers it extends to $\varphi : \mathbb{C}^d \to \mathbb{C}^{4^n}$.
Algebraic variety: $V = \overline{\operatorname{im} \varphi}$, closure in the Zariski topology.
Given an alignment, we can estimate the joint probability $p_{x_1x_2x_3}$ as the relative frequency of column $x_1x_2x_3$ in the alignment. In the theoretical model, this would be a point on the variety $V$.
Goal: use the ideal of $V$, $I(V) \subset \mathbb{R}[p_{AA\ldots A}, p_{AA\ldots C}, \dots, p_{TT\ldots T}]$, to infer the topology of the phylogenetic tree.
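Estimating the point of the variety from data amounts to counting columns. A minimal Python sketch, using the three example sequences shown earlier in the talk (the function name `column_frequencies` is illustrative):

```python
from collections import Counter


def column_frequencies(sequences):
    """Relative frequency of each alignment column, e.g. 'AAC' for three species."""
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    n_sites = lengths.pop()
    if n_sites == 0:
        raise ValueError("alignment is empty")
    counts = Counter("".join(column) for column in zip(*sequences))
    return {column: k / n_sites for column, k in counts.items()}


alignment = ["AACTTCGAGGCTTACCGCTG",
             "AAGGTCGATGCTCACCGATG",
             "AACGTCTATGCTCACCGATG"]
p_hat = column_frequencies(alignment)
for column, freq in sorted(p_hat.items()):
    print(column, freq)
```

Columns that never occur are simply absent from the dictionary and should be read as frequency 0.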
Computing the ideal (Eriksson, Ranestad, Sturmfels, Sullivant 05)
Some generators of $I(V)$ depend only on the model chosen (not on the topology). E.g. for Jukes-Cantor they are:
  $\sum p_{x_1x_2x_3} = 1$
  $p_{AAA} = p_{CCC} = p_{GGG} = p_{TTT}$ (4 terms)
  $p_{AAC} = p_{AAG} = p_{AAT} = \dots = p_{TTG}$ (12 terms)
  $p_{ACA} = p_{AGA} = p_{ATA} = \dots = p_{TGT}$ (12 terms)
  $p_{CAA} = p_{GAA} = p_{TAA} = \dots = p_{GTT}$ (12 terms)
  $p_{ACG} = p_{ACT} = p_{AGT} = \dots = p_{CGT}$ (24 terms)
For this model, the unique polynomial that detects the phylogeny of the three species has degree 3.
Definition (Cavender, Felsenstein 87): The generators of $I(V)$ are called phylogenetic invariants.
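The class sizes 4, 12, 12, 12, 24 in the Jukes-Cantor linear invariants can be recovered by grouping the 64 patterns by which leaves share a nucleotide. This is a quick Python check; the grouping key is an illustrative choice, not notation from the talk.

```python
from collections import defaultdict
from itertools import product


def jc_classes():
    """Group the 64 three-leaf patterns by their equality pattern.

    Key = (x1 == x2, x1 == x3, x2 == x3); under Jukes-Cantor all patterns
    with the same key have the same probability.
    """
    classes = defaultdict(list)
    for x1, x2, x3 in product("ACGT", repeat=3):
        key = (x1 == x2, x1 == x3, x2 == x3)
        classes[key].append(x1 + x2 + x3)
    return dict(classes)


classes = jc_classes()
for key, members in classes.items():
    print(key, len(members), members[:3])
```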
Problem: computation of invariants
Computational algebra software fails to compute the ideal for 4 species!
Kimura 3-parameter, 4 species: 8002 generators [Small trees webpage].
Group-based models. Discrete Fourier transform
Group-based models (Kimura and Jukes-Cantor): $G = \mathbb{Z}_2 \times \mathbb{Z}_2$, A = (0,0), C = (0,1), G = (1,0), T = (1,1).
Substitution matrices are functions on the group: $S_i(g_p, g_c) = f_i(g_p - g_c)$, $g_c, g_p \in G$.
\[
p_{g_1,\dots,g_m} = p(g_1, \dots, g_m) = \frac{1}{4} \sum \prod_{e \in \text{edges}} f_e\bigl(g_{p(e)} - g_{c(e)}\bigr),
\]
where the sum is over all possible values at interior nodes.
Discrete Fourier transform: for $f : G \to \mathbb{C}$,
\[
\hat f(\chi) = \sum_{g \in G} \chi(g) f(g), \qquad \chi \in \operatorname{Hom}(G, \mathbb{C}^*) \cong G.
\]
Convolution: $(f_1 * f_2)(g) = \sum_{h \in G} f_1(h) f_2(g - h) \;\Longrightarrow\; \widehat{f_1 * f_2} = \hat f_1 \cdot \hat f_2$.
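The convolution property is easy to confirm numerically on $\mathbb{Z}_2 \times \mathbb{Z}_2$. The sketch below assumes the usual identification of characters with group elements, $\chi_h(g) = (-1)^{h \cdot g}$; function names and test values are illustrative.

```python
from itertools import product

# Group elements in the order A=(0,0), C=(0,1), G=(1,0), T=(1,1).
GROUP = list(product((0, 1), repeat=2))


def add(g, h):
    """Addition in Z_2 x Z_2 (every element is its own inverse, so g - h = g + h)."""
    return ((g[0] + h[0]) % 2, (g[1] + h[1]) % 2)


def character(chi, g):
    """Assumed identification Hom(G, C*) = G: chi(g) = (-1)^(chi . g)."""
    return (-1) ** (chi[0] * g[0] + chi[1] * g[1])


def fourier(f):
    """Discrete Fourier transform of a function f given as a dict on GROUP."""
    return {chi: sum(character(chi, g) * f[g] for g in GROUP) for chi in GROUP}


def convolve(f1, f2):
    """(f1 * f2)(g) = sum over h of f1(h) f2(g - h)."""
    return {g: sum(f1[h] * f2[add(g, h)] for h in GROUP) for g in GROUP}


f1 = dict(zip(GROUP, (0.90, 0.05, 0.02, 0.03)))
f2 = dict(zip(GROUP, (0.85, 0.04, 0.06, 0.05)))
lhs = fourier(convolve(f1, f2))
rhs = {chi: fourier(f1)[chi] * fourier(f2)[chi] for chi in GROUP}
print(lhs)
print(rhs)
```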
Group-based models
Theorem (Evans, Speed): For a group-based model (i.e. Kimura 3, Kimura 2 or Jukes-Cantor) on a tree $T$, the discrete Fourier transform of the joint distribution $p(g_1, \dots, g_n)$ has the following form:
\[
q(\chi_1, \dots, \chi_n) = \prod_{e \in \text{edges}} \hat f_e\Bigl(\prod_{l \in \text{leaves below } e} \chi_l\Bigr)
\]
Using a linear change of coordinates, one gets a monomial parameterization of the variety associated to the model,
\[
\varphi : \mathbb{C}^d \to \mathbb{C}^{4^n}, \qquad \theta = (\theta_1, \dots, \theta_d) \mapsto (q_{AA\ldots A},\, q_{AA\ldots C},\, q_{AA\ldots G},\, \dots,\, q_{TT\ldots T}),
\]
so that it is a toric variety. The ideal is generated by binomials.
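Fourier transform for Kimura 3-parameter model
Parameters and Fourier parameters:
\[
S^e = \begin{pmatrix}
a^e & b^e & c^e & d^e \\
b^e & a^e & d^e & c^e \\
c^e & d^e & a^e & b^e \\
d^e & c^e & b^e & a^e
\end{pmatrix}
\quad\longleftrightarrow\quad
P^e = \operatorname{diag}\bigl(P^e_A, P^e_C, P^e_G, P^e_T\bigr),
\]
\[
P^e_A = a^e + b^e + c^e + d^e, \qquad P^e_C = a^e - b^e + c^e - d^e, \qquad \dots
\]
simplex: $\Delta^3 \longleftrightarrow \Delta^3$, with $a^e + b^e + c^e + d^e = 1 \longleftrightarrow P^e_A = 1$
coordinates: $p_{x_1 \ldots x_n} \longleftrightarrow q_{g_1 \ldots g_n}$, $g_1 + \dots + g_n = 0$
simplex: $\Delta^{4^n - 1} \longleftrightarrow \Delta^{4^{n-1} - 1}$, with $\sum p_{x_1 \ldots x_n} = 1 \longleftrightarrow q_{A \ldots A} = 1$
Linear invariants: $q_{g_1 \ldots g_n} = 0$ if $g_1 + \dots + g_n \neq 0$ in $\mathbb{Z}_2 \times \mathbb{Z}_2$.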
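The Fourier parameters are the eigenvalues of the Kimura 3-parameter matrix, with the characters of $\mathbb{Z}_2 \times \mathbb{Z}_2$ as eigenvectors. The Python sketch below checks this numerically. The sign patterns for $P_G$ and $P_T$ are an assumption derived from the character table with A = (0,0), C = (0,1), G = (1,0), T = (1,1); they agree with the inequalities given later in the talk for probability parameters.

```python
def k3p_matrix(a, b, c, d):
    """Kimura 3-parameter matrix with rows and columns indexed by A, C, G, T."""
    return [[a, b, c, d], [b, a, d, c], [c, d, a, b], [d, c, b, a]]


# Character values on (A, C, G, T); assumed identification chi_h(g) = (-1)^(h . g).
CHARACTERS = {
    "A": [1, 1, 1, 1],
    "C": [1, -1, 1, -1],
    "G": [1, 1, -1, -1],
    "T": [1, -1, -1, 1],
}


def fourier_parameters(a, b, c, d):
    """P_x = sum over g of chi_x(g) f(g), where f = (a, b, c, d) on (A, C, G, T)."""
    f = [a, b, c, d]
    return {x: sum(s * v for s, v in zip(signs, f)) for x, signs in CHARACTERS.items()}


a, b, c, d = 0.90, 0.05, 0.02, 0.03
S = k3p_matrix(a, b, c, d)
P = fourier_parameters(a, b, c, d)
for x, vec in CHARACTERS.items():
    image = [sum(S[i][j] * vec[j] for j in range(4)) for i in range(4)]
    print(x, P[x], image)
```

For each character, the printed image equals the character vector scaled by the matching Fourier parameter.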
For the Kimura 3-parameter model, we are interested in $V := \overline{\operatorname{im}(\varphi)}$, where
\[
\varphi : \prod_{e \in E(T)} \mathbb{C}^4 \to \mathbb{C}^{4^{n-1}}, \qquad
\bigl(P^e_A, P^e_C, P^e_G, P^e_T\bigr)_e \mapsto \bigl(q_{x_1 \ldots x_n}\bigr)_{\{x_1, \dots, x_n \,\mid\, x_1 + \dots + x_n = 0\}}.
\]
But $\Delta^{4^{n-1} - 1} \subset \{q_{A \ldots A} = 1\}$.
Definition: The Kimura variety of the phylogenetic tree $T$ is $W := V \cap \{q_{A \ldots A} = 1\}$.
Recursive construction of invariants
Computational algebra software: even in the Fourier parameterization, it fails for 5 leaves.
Theorem (Sturmfels, Sullivant 05): For any group-based model, there is an explicit algorithm for obtaining the generators of the ideal of phylogenetic invariants of an n-leaved tree from those of an unrooted tree of 3 leaves. The generators are binomials of degree at most 4.
Phylogenetic inference using algebraic geometry
Biologists claim that phylogenetic invariants are not useful for phylogenetic reconstruction. In particular, a work of Huelsenbeck 95 reveals the inefficiency of Lake's method of invariants.
But Lake only used linear invariants!
Invariant-based methods: (Eriksson 05) for the General Markov Model; (Casanellas, Garcia, Sullivant 05) for Kimura 3-parameter.
Phylogenetic inference using algebraic geometry
Model: unrooted tree of 4 leaves, Kimura 3-parameter.
Naive algorithm: take the 1-norm of the evaluation of all invariants at the point given by the relative frequencies of the columns of the alignment. This gives a score for each possible topology; choose the topology with the smallest score.
Huelsenbeck 95 assesses the performance of different phylogenetic reconstruction methods on the following tree space.
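The scoring step of the naive algorithm can be sketched in a few lines of Python. Everything concrete here is hypothetical: the real method evaluates the full generating set of each topology's ideal, whereas the toy "invariants" below are placeholder functions chosen only to make the example run.

```python
def score(invariants, point):
    """1-norm of the invariants evaluated at the empirical point."""
    return sum(abs(f(point)) for f in invariants)


def best_topology(invariants_by_topology, point):
    """Score every topology and return the one with the smallest score."""
    scores = {name: score(fs, point) for name, fs in invariants_by_topology.items()}
    return min(scores, key=scores.get), scores


# Toy stand-ins for the invariants of the three 4-leaf topologies (hypothetical).
toy_invariants = {
    "12|34": [lambda q: q["u"] - q["v"]],
    "13|24": [lambda q: q["u"] - q["w"]],
    "14|23": [lambda q: q["v"] - q["w"]],
}
point = {"u": 0.30, "v": 0.31, "w": 0.10}
winner, scores = best_topology(toy_invariants, point)
print(winner, scores)
```

On real data no invariant vanishes exactly, which is why the comparison is between scores rather than a test for zero.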
Results on simulated data (Huelsenbeck)
Huelsenbeck's results [figure: performance of Lake invariants, NJ and ML for alignment lengths 100, 500 and 1000].
Huelsenbeck, Syst. Biol., 1995
Casanellas, Fernández-Sánchez: studies on the Casanellas-Garcia-Sullivant method [figure: results for alignment lengths 100, 500 and 1000].
Why use phylogenetic invariants
1. We do not need to estimate the parameters of the model.
2. The algebraic model allows species within the tree to evolve at different mutation rates.
In the non-algebraic version, one has $S_i = \exp(Q\, t_i)$, where $Q$ is a fixed matrix that represents the instantaneous mutation rate of all the species in the tree. In the algebraic model, one allows different rates at each branch of the tree.
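To make the homogeneous case concrete, the Python sketch below computes $S_i = \exp(Q\, t_i)$ for one shared rate matrix $Q$ and two branch lengths. It uses a truncated Taylor series for the matrix exponential, which is adequate only for small $Q t$; the rate values and function names are illustrative assumptions.

```python
import math


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(M, terms=30):
    """Matrix exponential by truncated Taylor series (fine for small entries)."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def k3p_rate_matrix(beta, gamma, delta):
    """Rate matrix with the Kimura 3-parameter pattern; rows sum to zero."""
    alpha = -(beta + gamma + delta)
    return [[alpha, beta, gamma, delta], [beta, alpha, delta, gamma],
            [gamma, delta, alpha, beta], [delta, gamma, beta, alpha]]


def substitution_matrix(Q, t):
    """S = exp(Q t) for branch length t."""
    return mat_exp([[q * t for q in row] for row in Q])


Q = k3p_rate_matrix(0.3, 0.1, 0.2)       # one Q shared by every branch
S_short = substitution_matrix(Q, 0.1)
S_long = substitution_matrix(Q, 0.5)
print(S_short[0])
print(S_long[0])
```

In the algebraic model each branch instead gets its own free matrix of the Kimura pattern, so no common $Q$ is imposed.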
For instance, using algebraic geometry one should be able to reconstruct the biologically correct tree [figure: two trees on the leaves dog, human, chimp, rat, mouse, chicken].
(Al-Aidroos, Snir 05), chapter 21 in the ASCB book.
Simulations on nonhomogeneous trees (different rate matrices)
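The global geometry of the Kimura variety
Recall that $V := \overline{\operatorname{im}(\varphi)}$, where
\[
\varphi : \prod_{e \in E(T)} \mathbb{C}^4 \to \mathbb{C}^{4^{n-1}}, \qquad
\bigl(P^e_A, P^e_C, P^e_G, P^e_T\bigr)_e \mapsto \bigl(q_{x_1 \ldots x_n}\bigr)_{\{x_1, \dots, x_n \,\mid\, x_1 + \dots + x_n = 0\}}.
\]
But $\Delta^{4^{n-1} - 1} \subset \{q_{A \ldots A} = 1\}$.
Definition: The Kimura variety of the phylogenetic tree $T$ is $W := V \cap \{q_{A \ldots A} = 1\}$.
\[
W = \varphi\Bigl(\prod_{e \in E(T)} \bigl(\mathbb{C}^4 \cap \{P^e_A = 1\}\bigr)\Bigr).
\]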
Local complete intersection
$V \subset \mathbb{A}^N$ algebraic variety, $\mu$ = minimum number of generators of $I(V)$; then $\mu \geq \operatorname{codim}(V) = N - \dim(V)$.
$V$ is called a complete intersection if $\mu = \operatorname{codim}(V)$.
Example: the Kimura variety $W \subset \mathbb{C}^{4^{n-1} - 1}$ has codimension $4^{n-1} - 6n + 8$, where $n$ = number of leaves. For $n = 4$, the minimum number of generators of the ideal of the Kimura variety is 8002, whereas its codimension is 48!
On a neighborhood of a smooth point, any variety is a local complete intersection (i.e. it can be defined by $\operatorname{codim}(V)$ equations).
Goal:
1. Find the singular points.
2. Provide a set of local generators at nonsingular points.
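The codimension formula follows from counting: the ambient space has dimension $4^{n-1} - 1$, and $W$ has dimension $3(2n - 3)$, three free Fourier parameters on each of the $2n - 3$ edges. A short Python check of the arithmetic (the function name is illustrative):

```python
def kimura_codimension(n):
    """Codimension of the Kimura variety W for an unrooted tree with n leaves."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    ambient = 4 ** (n - 1) - 1       # coordinates with zero index sum, minus q_{A...A} = 1
    dimension = 3 * (2 * n - 3)      # three free Fourier parameters per edge
    return ambient - dimension


for n in range(3, 7):
    print(n, kimura_codimension(n))
```

For $n = 3$ this gives 6 and for $n = 4$ it gives 48, the number of local equations needed at a smooth point.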
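Case n = 3
The parameterization $\varphi$ is:
\[
\varphi : \mathbb{C}^9 \to \mathbb{C}^{15}, \qquad
\bigl((1, P^1_C, P^1_G, P^1_T),\, (1, P^2_C, P^2_G, P^2_T),\, (1, P^3_C, P^3_G, P^3_T)\bigr) \mapsto \bigl(q_{xyz} = P^1_x P^2_y P^3_z\bigr)_{\{x + y + z = 0\}}
\]
Let $H = (\mathbb{Z}_2 \times \mathbb{Z}_2, \cdot)$, written multiplicatively with $\varepsilon, \delta \in \{\pm 1\}$, and let $(\varepsilon, \delta) \in H$ act on $\mathbb{C}^9$ by sending $(P^1, P^2, P^3)$ to
\[
\bigl((1, \varepsilon P^1_C, \delta P^1_G, \varepsilon\delta P^1_T),\, (1, \varepsilon P^2_C, \delta P^2_G, \varepsilon\delta P^2_T),\, (1, \varepsilon P^3_C, \delta P^3_G, \varepsilon\delta P^3_T)\bigr).
\]
E.g. for $(\varepsilon, \delta) = (-1, 1)$, in probability parameters the group action permutes A with C and G with T.
Proposition: The Kimura variety $W = \operatorname{im}(\varphi)$ on $T_3$ is the affine GIT quotient $\mathbb{C}^9 /\!/ H$.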
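That $\varphi$ is constant on $H$-orbits can be checked numerically for the 3-leaf tree. The Python sketch below uses made-up parameter values, and the helper names are illustrative.

```python
from itertools import product

INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}   # binary digits encode Z_2 x Z_2

# Index triples with x + y + z = 0 in Z_2 x Z_2 (XOR of the encodings is 0).
ZERO_SUM_TRIPLES = [t for t in product("ACGT", repeat=3)
                    if INDEX[t[0]] ^ INDEX[t[1]] ^ INDEX[t[2]] == 0]


def phi(P1, P2, P3):
    """Monomial parameterization q_xyz = P1_x * P2_y * P3_z on zero-sum triples."""
    return {x + y + z: P1[x] * P2[y] * P3[z] for x, y, z in ZERO_SUM_TRIPLES}


def act(eps, delta, P):
    """Action of (eps, delta) in H on the Fourier parameters of one edge."""
    return {"A": P["A"], "C": eps * P["C"], "G": delta * P["G"],
            "T": eps * delta * P["T"]}


P1 = {"A": 1.0, "C": 0.8, "G": 0.7, "T": 0.6}
P2 = {"A": 1.0, "C": 0.9, "G": 0.5, "T": 0.4}
P3 = {"A": 1.0, "C": 0.3, "G": 0.2, "T": 0.95}
q = phi(P1, P2, P3)
for eps, delta in product((1, -1), repeat=2):
    q_moved = phi(act(eps, delta, P1), act(eps, delta, P2), act(eps, delta, P3))
    print((eps, delta), max(abs(q[k] - q_moved[k]) for k in q))
```

There are 16 zero-sum triples; fixing $q_{AAA} = 1$ leaves the 15 coordinates of the target space.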
Arbitrary n
$T$ unrooted n-leaved tree ($2n - 3$ edges, $n - 2$ interior nodes).
Extending the action of $H$ to the $n - 2$ interior nodes of $T$, we have:
Theorem: The Kimura variety $W$ is isomorphic to the affine GIT quotient $(\mathbb{C}^3)^{2n-3} /\!/ H^{n-2}$.
Corollary:
- $W = \operatorname{im}(\varphi)$, no closure needed.
- $|\varphi^{-1}(q)| \leq 4^{n-2}$ for $q \in W$, and there is just one preimage with biological meaning.
- We are able to determine the singular points. In particular, biologically meaningful points $q \in W_+ := \varphi\bigl(\prod_{e \in E(T)} \Delta^3_+\bigr)$ are not singular.
In Fourier parameters: $\Delta^3_+ = \{P_A = 1,\; P_C > 0,\; P_G > 0,\; P_T > 0\}$.
In probability parameters this is transformed into:
\[
S = \begin{pmatrix}
a & b & c & d \\
b & a & d & c \\
c & d & a & b \\
d & c & b & a
\end{pmatrix},
\qquad
\begin{aligned}
a + b + c + d &= 1, \\
a + b - c - d &> 0, \\
a - b + c - d &> 0, \\
a - b - c + d &> 0.
\end{aligned}
\]
Since $a + b + c + d = 1$, the three inequalities are equivalent to $a + b > 1/2$, $a + c > 1/2$, $a + d > 1/2$.
Local complete intersection
Case n = 3 (Sturmfels, Sullivant 05): $I(V)$ is minimally generated by 16 cubics and 18 quartics.
Lemma: The following six quartics
\[
\begin{aligned}
& q_{AAA} q_{ATT} q_{TCG} q_{TGC} - q_{ACC} q_{AGG} q_{TAT} q_{TTA}, \\
& q_{CCA} q_{CTG} q_{TAT} q_{TGC} - q_{CAC} q_{CGT} q_{TCG} q_{TTA}, \\
& q_{AGG} q_{ATT} q_{CAC} q_{CCA} - q_{AAA} q_{ACC} q_{CGT} q_{CTG}, \\
& q_{ACC} q_{ATT} q_{GAG} q_{GGA} - q_{AAA} q_{AGG} q_{GCT} q_{GTC}, \\
& q_{CAC} q_{CTG} q_{GCT} q_{GGA} - q_{CCA} q_{CGT} q_{GAG} q_{GTC}, \\
& q_{GGA} q_{GTC} q_{TAT} q_{TCG} - q_{GAG} q_{GCT} q_{TGC} q_{TTA}
\end{aligned}
\]
generate a local complete intersection that defines $W$ at each point $q \in W_+$. It does not depend on the point $q$.
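That the six quartics vanish on the image of the monomial parameterization $q_{xyz} = P^1_x P^2_y P^3_z$ can be confirmed numerically. The Python sketch below evaluates each binomial at one arbitrarily chosen parameter point; the values and helper names are illustrative, and this only checks membership in the ideal, not the local complete intersection claim.

```python
from itertools import product
from math import prod

INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}   # binary digits encode Z_2 x Z_2

# Each binomial is (factors of the first monomial, factors of the second monomial).
QUARTICS = [
    (("AAA", "ATT", "TCG", "TGC"), ("ACC", "AGG", "TAT", "TTA")),
    (("CCA", "CTG", "TAT", "TGC"), ("CAC", "CGT", "TCG", "TTA")),
    (("AGG", "ATT", "CAC", "CCA"), ("AAA", "ACC", "CGT", "CTG")),
    (("ACC", "ATT", "GAG", "GGA"), ("AAA", "AGG", "GCT", "GTC")),
    (("CAC", "CTG", "GCT", "GGA"), ("CCA", "CGT", "GAG", "GTC")),
    (("GGA", "GTC", "TAT", "TCG"), ("GAG", "GCT", "TGC", "TTA")),
]


def phi(P1, P2, P3):
    """q_xyz = P1_x * P2_y * P3_z for index triples with x + y + z = 0."""
    return {x + y + z: P1[x] * P2[y] * P3[z]
            for x, y, z in product("ACGT", repeat=3)
            if INDEX[x] ^ INDEX[y] ^ INDEX[z] == 0}


def evaluate(binomial, q):
    """Value of (first monomial) - (second monomial) at the point q."""
    left, right = binomial
    return prod(q[k] for k in left) - prod(q[k] for k in right)


P1 = {"A": 1.0, "C": 0.8, "G": 0.7, "T": 0.6}
P2 = {"A": 1.0, "C": 0.9, "G": 0.5, "T": 0.4}
P3 = {"A": 1.0, "C": 0.3, "G": 0.2, "T": 0.95}
q = phi(P1, P2, P3)
values = [evaluate(binomial, q) for binomial in QUARTICS]
print(values)
```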
Proof.
1. $J = (f_1, \dots, f_6)$ defines a variety $X$ containing $W$, and both have the same dimension.
2. The points $q \in W_+$ are smooth points of $X$ and $W$.
3. Both varieties coincide locally.
Local complete intersection: arbitrary n
quartics: $J_3$, $J_{n-1}$
quadrics: $Q$, nonredundant $2 \times 2$ minors from flattening
Theorem: The polynomials in $J_3$, $J_{n-1}$ and $Q$ generate a local complete intersection that defines $W$ at the biologically meaningful points $q \in W_+$.
New results on simulated data: 4 leaves
Instead of all generators, 48 polynomials that generate locally [figure: results for alignment lengths 100, 500 and 1000].
(Eriksson, Yao 07): machine learning approach, 52 invariants that perform better on this parameter space.
More information"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky
MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION  theory that groups of organisms change over time so that descendeants differ structurally
More informationMetric learning for phylogenetic invariants
Metric learning for phylogenetic invariants Nicholas Eriksson nke@stanford.edu Department of Statistics, Stanford University, Stanford, CA 943054065 Yuan Yao yuany@math.stanford.edu Department of Mathematics,
More informationIntroduction to Algebraic Statistics
Introduction to Algebraic Statistics Seth Sullivant North Carolina State University January 5, 2017 Seth Sullivant (NCSU) Algebraic Statistics January 5, 2017 1 / 28 What is Algebraic Statistics? Observation
More informationPhylogenetics: Parsimony and Likelihood. COMP Spring 2016 Luay Nakhleh, Rice University
Phylogenetics: Parsimony and Likelihood COMP 571  Spring 2016 Luay Nakhleh, Rice University The Problem Input: Multiple alignment of a set S of sequences Output: Tree T leaflabeled with S Assumptions
More informationNumbercontrolled spatial arrangement of gold nanoparticles with
Electronic Supplementary Material (ESI) for RSC Advances. This journal is The Royal Society of Chemistry 2016 Numbercontrolled spatial arrangement of gold nanoparticles with DNA dendrimers Ping Chen,*
More informationNSCI Basic Properties of Life and The Biochemistry of Life on Earth
NSCI 314 LIFE IN THE COSMOS 4 Basic Properties of Life and The Biochemistry of Life on Earth Dr. Karen Kolehmainen Department of Physics CSUSB http://physics.csusb.edu/~karen/ WHAT IS LIFE? HARD TO DEFINE,
More information6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008