Phylogenetic Networks with Recombination


1 Phylogenetic Networks with Recombination October

2 Recombination "All DNA is recombinant DNA... [The] natural process of recombination and mutation have acted throughout evolution... Genetic exchange works constantly to blend and rearrange chromosomes, most obviously during meiosis..." (J. Watson) We are interested in reconstructing the history of mutations and recombinations that created the observed SNP (binary) sequences. Our thesis is that recombination networks can be constructed by efficient algorithms using genome variation data in populations, and that those networks reflect true recombination history sufficiently well to help resolve or clarify many biological issues.

3 Crossing Over The best understood form of recombination, occurring during every meiosis, is single-crossover recombination, also called crossing-over. Figure: A single-crossover recombination between two parental sequences, producing a recombinant. The prefix (underlined) contributed by parental sequence 1 consists of the first three characters of SNP sequence 1. The suffix (underlined) contributed by parental sequence 2 consists of the last two characters of SNP sequence 2.
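The single-crossover operation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `single_crossover`, the example sequences, and the convention that the crossover point is the 1-based index of the first site taken from the suffix parent are assumptions, not taken from the slides.

```python
def single_crossover(prefix_parent: str, suffix_parent: str, crossover_point: int) -> str:
    """Return the recombinant of two equal-length binary SNP sequences.

    Assumed convention: sites are numbered from 1, and crossover_point is the
    first site contributed by suffix_parent. Sites 1 .. crossover_point - 1
    come from prefix_parent.
    """
    if len(prefix_parent) != len(suffix_parent):
        raise ValueError("parental sequences must have the same length")
    if not 1 < crossover_point <= len(prefix_parent):
        raise ValueError("crossover point must leave a non-empty prefix and suffix")
    cut = crossover_point - 1  # convert to a 0-based slice position
    return prefix_parent[:cut] + suffix_parent[cut:]


# Hypothetical five-site example: prefix of length three, suffix of length two.
recombinant = single_crossover("01011", "10100", 4)
print(recombinant)
```

With five sites and crossover point 4, the recombinant takes its first three characters from the first parent and its last two from the second, matching the shape of the example in the figure caption.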

4 Figure: A graphical representation of a single-crossover recombination event. The parent labeled P contributes the prefix, and the parent labeled S contributes the suffix of the recombinant sequence. The crossover point is written above the recombination node.

5 SNP sequences The input consists of binary sequences modeling SNP sequences. The SNP sites are linearly ordered and lie together on a single chromosome, but are generally not physically contiguous. We follow the infinite sites model: a SNP site mutates exactly once in the history of the sequences.

6 Genealogical Network and Ancestral Recombination Graph Figure: An ARG N with two recombination nodes, and the matrix M of sequences (rows a through g) that are derived by N.

7 Genealogical Network and Ancestral Recombination Graph Figure: An ARG N with three recombination nodes.

8 Two parts of this talk 1. Idealized Association Mapping using ARGs - a present and future application. 2. The phenomena of invisible and Steiner nodes in ARGs and in cluster-based phylogenetic networks.

9 Association Mapping: an example of the use of ARGs We examine a very idealized example in order to illustrate the logic. Definition In a pure-mendelian disease, there is a single (causal) site c in the genome, and a single causal state (say 1) for c, such that any individual in the population will have the disease if and only if they have state 1 at site c. We have a sample of known diseased individuals (Cases) and of non-diseased individuals (Controls), and we have a binary (SNP) sequence for each sampled individual.

10 Figure: The true ARG and the disease status of the individuals. We want to determine where site c is in the genome, and when the mutation happened there.

11 The first thing we can deduce is that the mutation occurs during the time represented by the edge labeled 2. The location of c can then be deduced as follows. Figure: The intervals (deduced from individuals d and f) where c might be located.

12 Where is c? From the non-diseased d we deduce that c is after site 2, and from the diseased f we deduce that it is before site 4. Hence, we conclude that c is in the open interval (2,4). That is the finest deduction possible that is consistent with the data and ARG.

13 Association Mapping Summary If the true ARG were known, it would provide the optimal amount of information for mapping: no extra information would be available from the genotypes. Not only would disease-associated regions be identified, but the ARG would give the ages of the causative mutations. [?]

14 Invisible and Steiner Nodes The need for invisible nodes is a key technical problem in both ARGs and cluster-based phylogenetic networks. Steiner nodes are an additional problem for ARGs. What do we know about invisible and Steiner nodes? Four types of results. 1. The History Lower Bound and Rec-Invisible Nodes Given a set of sequences M, the History Bound of Myers and Griffiths is a lower bound on the number of recombinations needed in any ARG that creates M (with all-zero ancestral sequence). It is also a lower bound on the number of reticulation nodes in any softwired phylogenetic network for an input set of clusters. It is defined only by the algorithms that compute it.

15 The computation of the history bound uses three Rules Initially, set M̃ to the input M. As the algorithm proceeds, rows and columns of M̃ will be deleted. Let M̃ denote the current remaining submatrix of M as the algorithm executes. The algorithm executes three Rules. The first two are: Rule Dc: If a column c of M̃ contains at most one entry with value 1, then remove column c from M̃. Rule Dr: If two rows in M̃ are identical, remove one.

16 Algorithm Clean(M) Execute Rules Dc and Dr on M in any order until no further applications of Rules Dc or Dr are possible. Call the resulting matrix M̃. Note that the execution of Rule Dc may create the conditions where Rule Dr applies, and the converse is also true. Since Rules Dc and Dr can be applied in any order, and to different columns and rows, it is conceivable that different executions of Algorithm Clean could produce different results. However, the next Lemma shows that this cannot happen.

17 Lemma The resulting submatrix M̃ of M created by running Algorithm Clean on M is invariant over all executions of Algorithm Clean. Lemma Assuming there is a perfect phylogeny with all-zero ancestral sequence for M, Algorithm Clean reduces M to a matrix containing a single row with no entries.
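Rules Dc and Dr and Algorithm Clean can be sketched as follows. This is a minimal illustration, assuming the matrix is given as a list of equal-length binary strings (one per row); the function name `clean`, the returned column labels, and the two small test matrices are hypothetical and not from the slides.

```python
def clean(rows, cols=None):
    """Apply Rules Dc and Dr until neither applies.

    rows: list of equal-length strings over {'0', '1'}.
    cols: optional list of column labels (defaults to 1, 2, ...).
    Returns (remaining_rows, remaining_column_labels).
    """
    rows = list(rows)
    if cols is None:
        cols = list(range(1, len(rows[0]) + 1)) if rows else []
    changed = True
    while changed:
        changed = False
        # Rule Dc: drop every column with at most one entry equal to 1.
        keep = [j for j in range(len(cols))
                if sum(r[j] == "1" for r in rows) > 1]
        if len(keep) < len(cols):
            rows = ["".join(r[j] for j in keep) for r in rows]
            cols = [cols[j] for j in keep]
            changed = True
        # Rule Dr: keep one copy of each distinct row (order preserved).
        unique = list(dict.fromkeys(rows))
        if len(unique) < len(rows):
            rows = unique
            changed = True
    return rows, cols


# A matrix with a perfect phylogeny (all-zero ancestor) collapses completely.
print(clean(["000", "100", "110", "001"]))
# A matrix whose two sites show all four pairs 00, 01, 10, 11 is left unchanged.
print(clean(["00", "10", "11", "01"]))
```

The sketch applies the two rules in a fixed order; by the first Lemma above, any other order would leave the same submatrix.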

18 The Third Destructive Rule Rule Dt: If neither Rule Dc nor Dr can be applied, pick a row r in the current M̃ (other than the all-zero row that corresponds to the ancestral sequence) and remove row r from M̃.

19 Computing the History Bound
Algorithm CHB(M)
  Set CLB(M) = 0. Set M̃ to M.
  While (M̃ contains more than one row or contains some entries)
    Execute Algorithm Clean on M̃.
    Select a row r in M̃ and remove it, i.e., apply Rule Dt to M̃.
    Set CLB(M) = CLB(M) + 1.
  End While
  Return CLB(M)
The History Bound for M is the minimum CLB(M) value over all possible executions of Algorithm CHB(M).
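Because the History Bound is the minimum over all executions of Algorithm CHB, one way to compute it for very small inputs is to try every choice of row for Rule Dt. The sketch below does this with memoization. It is an illustration only, with several assumptions: rows are binary strings, the helper names are hypothetical, the all-zero row is never chosen by Rule Dt (following the statement of that rule), and Clean is applied before the stopping test so that an input with a perfect phylogeny gets bound 0. The running time is exponential in the number of rows.

```python
from functools import lru_cache


def _clean(rows):
    """Rules Dc and Dr applied to exhaustion; returns a sorted tuple of rows."""
    rows = list(rows)
    changed = True
    while changed:
        changed = False
        ncols = len(rows[0]) if rows else 0
        # Rule Dc: keep only columns with at least two 1-entries.
        keep = [j for j in range(ncols)
                if sum(r[j] == "1" for r in rows) > 1]
        if len(keep) < ncols:
            rows = ["".join(r[j] for j in keep) for r in rows]
            changed = True
        # Rule Dr: remove duplicate rows.
        unique = list(dict.fromkeys(rows))
        if len(unique) < len(rows):
            rows = unique
            changed = True
    return tuple(sorted(rows))


def history_bound(matrix):
    """Minimum number of Rule Dt applications over all executions (brute force)."""

    @lru_cache(maxsize=None)
    def best(rows):
        rows = _clean(rows)
        if len(rows) <= 1:
            # After cleaning, a single remaining row has no entries left.
            return 0
        # Rule Dt: try every row except the all-zero (ancestral) row.
        candidates = [i for i, r in enumerate(rows) if "1" in r]
        return 1 + min(best(rows[:i] + rows[i + 1:]) for i in candidates)

    return best(tuple(sorted(matrix)))


print(history_bound(["000", "100", "110", "001"]))  # perfect phylogeny input
print(history_bound(["00", "10", "11", "01"]))      # two incompatible sites
```

Sorting the rows gives a canonical key for the cache, since the order of rows does not affect which rules apply.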

20 Example Figure: The input M, with rows r1 through r6. No application of Rule Dc or Dr is possible. So pick a row, say r6, for Rule Dt.

21 Example Figure: Now apply Rule Dc to column 4.

22 Example Figure: Now apply Rule Dr, and remove row r5.

23 Example Figure: Now apply Rule Dc twice to remove columns 1 and 6.

24 Example Figure: Now apply Rule Dt to remove row r4.

25 Example Figure: Now apply Rule Dt again to remove row r3.

26 Example Figure: Now apply Rule Dc twice to remove columns 2 and 5.

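27 Example Figure: The remaining matrix has the single column 3, with entry 1 in both rows r1 and r2. Now apply Rule Dr to remove row r2.

28 Example Figure: The remaining matrix has the single column 3 and the single row r1, with entry 1. Now apply Rule Dc to remove column 3 to obtain a single row with no entries.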

29 A graphical view of Algorithm CHB and the History Bound Consider an ARG N for M, and an execution of Algorithm CHB. Each application of a destructive rule Dc, Dr or Dt removes a column or row from M̃. We will define ARG destruction rules to reduce N in parallel with the execution of Algorithm CHB. We let Ñ denote the remaining portion of N, and use M̃, as before, to denote the current matrix derived from M.

30 ARG destruction Rules and Facts:
1. When Rule Dc removes a column c, remove label c on an edge into a leaf.
2. When Rule Dr removes a row in M̃ with sequence s, remove one of two sibling leaves, each of which is labeled with sequence s.
3. Fact: There is an execution of Algorithm CHB where each application of Rule Dt removes a row in M̃ whose sequence s labels a recombination node x in Ñ, and also labels the leaf-child of x.
4. When Rule Dt removes sequence s from M̃, modify the current Ñ by removing the leaf labeled s and then successively removing edges that are ancestral to the leaf, following any such path backwards until the path reaches a node that has out-degree at least two.

31 Each time we modify Ñ, we also contract any resulting node incident with one in-edge and one out-edge. Lemma Using the destructive rules, during an execution of Algorithm CHB, each ARG Ñ will be an ARG that generates the sequences in the corresponding matrix M̃. Lemma When M̃ is a single row with no sites, Ñ will consist only of the root node of N.

32 Example Figure: The original ARG N.

33 Example Figure: ARG Ñ after the first application of Rule Dt.

34 Example Figure: ARG Ñ after application of Rule Dc, removing edge label 6.

35 Example Figure: ARG Ñ after the second application of Rule Dt. That application of Rule Dt also removes the recombination node labeled (originally ). That node is rec-invisible. Next, Algorithm Clean will remove all entries of M, and the remaining tree will be reduced to the root node.

36 Now we return to invisible nodes Definition A recombination node x in an ARG N is called rec-visible if there is some path from x to a leaf of N that does not contain a recombination node other than x. Otherwise it is called rec-invisible, i.e., every path from x to a leaf encounters another recombination node. Rec-visible nodes are also called normal in the cluster-model literature (S. Willson). Theorem The History-Bound for M will be strictly less than Rmin^0(M) if there is a MinARG N for M containing a rec-invisible recombination node x. The proof involves showing that the phenomenon seen in the example always occurs if there is a rec-invisible node in any MinARG for M.
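The rec-visible definition translates directly into a graph search. The following is a small sketch, assuming the ARG is stored as a dictionary mapping each node to its list of children, that recombination nodes are exactly the nodes with two or more parents, and that leaves are nodes with no children. The node names and the example network are hypothetical.

```python
def recombination_nodes(children):
    """Nodes with in-degree at least two in the directed graph `children`."""
    indegree = {}
    for node, kids in children.items():
        indegree.setdefault(node, 0)
        for kid in kids:
            indegree[kid] = indegree.get(kid, 0) + 1
    return {node for node, d in indegree.items() if d >= 2}


def is_rec_visible(children, x):
    """True if some path from x to a leaf avoids every recombination node other than x."""
    rec = recombination_nodes(children)
    stack, seen = [x], {x}
    while stack:
        node = stack.pop()
        kids = children.get(node, [])
        if not kids:
            return True  # reached a leaf without meeting another recombination node
        for kid in kids:
            if kid not in rec and kid not in seen:
                seen.add(kid)
                stack.append(kid)
    return False


# Hypothetical ARG: x has parents a and b; y has parents p and x.
arg = {
    "root": ["a", "b"],
    "a": ["x", "p"],
    "b": ["x", "leaf_b"],
    "p": ["y", "leaf_p"],
    "x": ["y"],
    "y": ["leaf_y"],
}
print(is_rec_visible(arg, "y"))  # y reaches leaf_y directly
print(is_rec_visible(arg, "x"))  # every path from x passes through y
```

In this example x is rec-invisible because its only child is the recombination node y.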

37 Normal Cluster Networks Many nice things happen when the data (splits) comes from Normal or Regular or Tree-Child networks. (Baroni, Semple, Steel, Willson, Kelk, van Iersel, Valiente, Nakhleh and more.) Bad things sometimes happen when the data comes from non-normal networks. (Song, Kelk, van Iersel.)

38 Stating the Theorem differently, in order for the History-Bound to have a chance to be equal to Rmin^0(M), no MinARG N for M can have a rec-invisible node. Equivalently, if the History-Bound is tight for M, then in every MinARG for M, every recombination node must be normal.

39 Corollary The difference between Rmin^0(M) and the History-Bound for M is at least as large as the minimum number of rec-invisible recombination nodes in the MinARGs for M. The proof involves a closer look at the proof of the Theorem, showing that the phenomenon occurs for each rec-invisible node in any MinARG for M. This suggests that the history bound is generally weak for the cluster-based model. It can be increased in the ARG model by use of the composite method, but that does not work for the cluster model. The graphical interpretation also allows us to establish a lower bound on the History-Bound.

40 Definition A recombination node v in an ARG is called hyper-visible if no path from v reaches another recombination node. Theorem The History-Bound for M is at least as large as the minimum number of hyper-visible recombination nodes over all the ARGs for M. Corollary The History-Bound is tight for M if every ARG for M has at least Rmin^0(M) hyper-visible recombination nodes. This can happen only if in every MinARG for M, every recombination node is hyper-visible.
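Hyper-visibility is a stricter test than rec-visibility: no descendant of v at all may be a recombination node. A minimal sketch, under the same assumed representation (a dictionary from each node to its children, with recombination nodes taken to be the nodes having two or more parents); the names and the example network are hypothetical.

```python
def is_hyper_visible(children, v):
    """True if no directed path from v reaches another recombination node."""
    # Count parents to identify recombination nodes (in-degree >= 2).
    parents = {}
    for node, kids in children.items():
        for kid in kids:
            parents[kid] = parents.get(kid, 0) + 1
    rec = {node for node, count in parents.items() if count >= 2}

    # Search all proper descendants of v.
    stack = list(children.get(v, []))
    seen = set(stack)
    while stack:
        node = stack.pop()
        if node in rec:
            return False
        for kid in children.get(node, []):
            if kid not in seen:
                seen.add(kid)
                stack.append(kid)
    return True


# Hypothetical ARG: x has parents a and b; y has parents p and x.
arg = {
    "root": ["a", "b"],
    "a": ["x", "p"],
    "b": ["x", "leaf_b"],
    "p": ["y", "leaf_p"],
    "x": ["y"],
    "y": ["leaf_y"],
}
print(is_hyper_visible(arg, "y"))
print(is_hyper_visible(arg, "x"))
```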

41 Result 2. A necessary and sufficient condition (NASC) for no Steiner nodes Definition A Steiner node in an ARG N is a node that is labeled with a sequence not in the input set M. Steiner nodes are common, but why? When do we need them? Definition Let D_r(M) and D_c(M) be the number of distinct rows and distinct columns of M, respectively. The Haplotype Lower Bound on M, denoted H(M), is defined to be D_r(M) - D_c(M) - 1. Theorem If M is generated on an ARG N whose root sequence is not in M, then the number of recombination nodes in N must be at least D_r(M) - D_c(M) = H(M) + 1.
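The Haplotype Lower Bound is a direct count, so it is easy to compute. The sketch below follows the definition on this slide literally (distinct rows minus distinct columns minus one), assuming the matrix is a list of equal-length binary strings; the function name and the test matrix are hypothetical.

```python
def haplotype_bound(rows):
    """H(M) = D_r(M) - D_c(M) - 1 for a matrix given as a list of row strings."""
    distinct_rows = len(set(rows))
    ncols = len(rows[0]) if rows else 0
    # Each column is read top to bottom as a tuple so that columns can be compared.
    distinct_cols = len({tuple(r[j] for r in rows) for j in range(ncols)})
    return distinct_rows - distinct_cols - 1


# Four distinct rows over two distinct columns: H(M) = 4 - 2 - 1.
print(haplotype_bound(["00", "10", "11", "01"]))
```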

42 For simplicity, we will assume that the root sequence is specified and is in M. Close examination of the proof of the Theorem leads to the following: Theorem If N is an ARG for M that has only H(M) recombination nodes (and hence is a MinARG), and every site in M is distinct, then N has no Steiner nodes, and no edge is labeled with more than one mutation. Theorem If M is derived on an ARG N with no Steiner nodes and at most one mutation per edge, then N contains exactly H(M) recombination nodes. Hence N is a MinARG for M. Moreover, no MinARG for M with one mutation per edge can have any Steiner nodes.

43 Since the Haplotype Bound is always less than or equal to the History Bound, we have Theorem M can be derived on an ARG with no Steiner nodes and one mutation per edge only if every MinARG for M (with one mutation per edge) has no rec-invisible recombination nodes.

44 Topic 3 Definition We define the incompatibility graph G(M) for M as the graph containing one node for each site in M, and an edge connecting two nodes c and d if and only if sites c and d are incompatible, i.e. contain all four binary pairs 0,0; 0,1; 1,0; 1,1. Definition A connected component C of a graph is a maximal subgraph such that for any pair of nodes (u,v) in C there is at least one path between u and v in the subgraph C. A trivial component has only one node, and no edges.
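The two definitions above can be sketched together: build G(M) by testing every pair of sites for all four binary pairs, then collect connected components. This is an illustrative sketch that assumes M is a list of equal-length binary strings and numbers sites from 1; the function names and the example matrix are hypothetical.

```python
from itertools import combinations


def incompatibility_graph(rows):
    """Adjacency sets of G(M); sites are numbered 1..m."""
    m = len(rows[0]) if rows else 0
    adjacency = {site: set() for site in range(1, m + 1)}
    for c, d in combinations(range(m), 2):
        # Sites are incompatible when all four pairs 00, 01, 10, 11 occur.
        if len({(r[c], r[d]) for r in rows}) == 4:
            adjacency[c + 1].add(d + 1)
            adjacency[d + 1].add(c + 1)
    return adjacency


def connected_components(adjacency):
    """List of node sets, one per connected component, via depth-first search."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


M = ["0000", "1000", "1100", "0100", "0011"]
graph = incompatibility_graph(M)
components = connected_components(graph)
non_trivial = [comp for comp in components if len(comp) > 1]
print(components, non_trivial)
```

In this example only sites 1 and 2 are incompatible, so they form the single non-trivial component, while sites 3 and 4 are trivial components.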

45 Figure: Example of an ARG, the matrix M of sequences a through g that it derives, and the incompatibility graph for M.

46 Theorem Let G(M) be the incompatibility graph for the set of sequences M. Then there is an ARG N that derives M, where every blob in N contains all and only the sites of a single non-trivial connected component of G(M), and every compatible site is on a cut-edge of N. Definition An ARG N is called fully-decomposed if it has the structure specified above.

47 Figure: Example

48 Figure: ARG that is not fully decomposed, for the same sequences.

49 visibility Theorem Let N be an ARG for M where all the nodes in N are visible. Then there is a fully-decomposed ARG for M with the same number of recombination nodes as N. Corollary There is no fully-decomposed MinARG for M only if every MinARG for M has at least one Steiner node.

50 Next we establish a relationship between the haplotype bound for M, H(M), and full-decomposition. Theorem Assume that the set of sequences M contains no duplicate sites. If the haplotype bound is tight for M, then there is a fully-decomposed MinARG for M. Here we state the most general result established for the existence of a fully-decomposed MinARG. Theorem There is a fully-decomposed MinARG for M if there is a MinARG N such that incompatibility graphs G(M) and G(L_N) have the same number of connected components.


More information

Tree sets. Reinhard Diestel

Tree sets. Reinhard Diestel 1 Tree sets Reinhard Diestel Abstract We study an abstract notion of tree structure which generalizes treedecompositions of graphs and matroids. Unlike tree-decompositions, which are too closely linked

More information

A CLUSTER REDUCTION FOR COMPUTING THE SUBTREE DISTANCE BETWEEN PHYLOGENIES

A CLUSTER REDUCTION FOR COMPUTING THE SUBTREE DISTANCE BETWEEN PHYLOGENIES A CLUSTER REDUCTION FOR COMPUTING THE SUBTREE DISTANCE BETWEEN PHYLOGENIES SIMONE LINZ AND CHARLES SEMPLE Abstract. Calculating the rooted subtree prune and regraft (rspr) distance between two rooted binary

More information

Haploid & diploid recombination and their evolutionary impact

Haploid & diploid recombination and their evolutionary impact Haploid & diploid recombination and their evolutionary impact W. Garrett Mitchener College of Charleston Mathematics Department MitchenerG@cofc.edu http://mitchenerg.people.cofc.edu Introduction The basis

More information

The Multi-State Perfect Phylogeny Problem with Missing and Removable Data: Solutions via Integer-Programming and Chordal Graph Theory

The Multi-State Perfect Phylogeny Problem with Missing and Removable Data: Solutions via Integer-Programming and Chordal Graph Theory The Multi-State Perfect Phylogeny Problem with Missing and Removable Data: Solutions via Integer-Programming and Chordal Graph Theory Dan Gusfield Department of Computer Science, University of California,

More information

arxiv: v1 [cs.ds] 21 May 2013

arxiv: v1 [cs.ds] 21 May 2013 Easy identification of generalized common nested intervals Fabien de Montgolfier 1, Mathieu Raffinot 1, and Irena Rusu 2 arxiv:1305.4747v1 [cs.ds] 21 May 2013 1 LIAFA, Univ. Paris Diderot - Paris 7, 75205

More information

Haplotype Inference Constrained by Plausible Haplotype Data

Haplotype Inference Constrained by Plausible Haplotype Data Haplotype Inference Constrained by Plausible Haplotype Data Michael R. Fellows 1, Tzvika Hartman 2, Danny Hermelin 3, Gad M. Landau 3,4, Frances Rosamond 1, and Liat Rozenberg 3 1 The University of Newcastle,

More information

Restricted trees: simplifying networks with bottlenecks

Restricted trees: simplifying networks with bottlenecks Restricted trees: simplifying networks with bottlenecks Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA swillson@iastate.edu February 17, 2011 Abstract. Suppose N

More information

Part V. Matchings. Matching. 19 Augmenting Paths for Matchings. 18 Bipartite Matching via Flows

Part V. Matchings. Matching. 19 Augmenting Paths for Matchings. 18 Bipartite Matching via Flows Matching Input: undirected graph G = (V, E). M E is a matching if each node appears in at most one Part V edge in M. Maximum Matching: find a matching of maximum cardinality Matchings Ernst Mayr, Harald

More information

arxiv: v4 [q-bio.pe] 7 Jul 2016

arxiv: v4 [q-bio.pe] 7 Jul 2016 Complexity and algorithms for finding a perfect phylogeny from mixed tumor samples Ademir Hujdurović a,b Urša Kačar c Martin Milanič a,b Bernard Ries d Alexandru I. Tomescu e arxiv:1506.07675v4 [q-bio.pe]

More information

Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin

Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin CHAPTER 1 1.2 The expected homozygosity, given allele

More information

Exact and Approximate Equilibria for Optimal Group Network Formation

Exact and Approximate Equilibria for Optimal Group Network Formation Exact and Approximate Equilibria for Optimal Group Network Formation Elliot Anshelevich and Bugra Caskurlu Computer Science Department, RPI, 110 8th Street, Troy, NY 12180 {eanshel,caskub}@cs.rpi.edu Abstract.

More information

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline Page Evolutionary Trees Russ. ltman MI S 7 Outline. Why build evolutionary trees?. istance-based vs. character-based methods. istance-based: Ultrametric Trees dditive Trees. haracter-based: Perfect phylogeny

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Graph coloring, perfect graphs

Graph coloring, perfect graphs Lecture 5 (05.04.2013) Graph coloring, perfect graphs Scribe: Tomasz Kociumaka Lecturer: Marcin Pilipczuk 1 Introduction to graph coloring Definition 1. Let G be a simple undirected graph and k a positive

More information

CREATING PHYLOGENETIC TREES FROM DNA SEQUENCES

CREATING PHYLOGENETIC TREES FROM DNA SEQUENCES INTRODUCTION CREATING PHYLOGENETIC TREES FROM DNA SEQUENCES This worksheet complements the Click and Learn developed in conjunction with the 2011 Holiday Lectures on Science, Bones, Stones, and Genes:

More information

Shortest paths with negative lengths

Shortest paths with negative lengths Chapter 8 Shortest paths with negative lengths In this chapter we give a linear-space, nearly linear-time algorithm that, given a directed planar graph G with real positive and negative lengths, but no

More information

Connectivity and tree structure in finite graphs arxiv: v5 [math.co] 1 Sep 2014

Connectivity and tree structure in finite graphs arxiv: v5 [math.co] 1 Sep 2014 Connectivity and tree structure in finite graphs arxiv:1105.1611v5 [math.co] 1 Sep 2014 J. Carmesin R. Diestel F. Hundertmark M. Stein 20 March, 2013 Abstract Considering systems of separations in a graph

More information

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics.

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics. Evolutionary Genetics (for Encyclopedia of Biodiversity) Sergey Gavrilets Departments of Ecology and Evolutionary Biology and Mathematics, University of Tennessee, Knoxville, TN 37996-6 USA Evolutionary

More information

Linear Algebra II. 2 Matrices. Notes 2 21st October Matrix algebra

Linear Algebra II. 2 Matrices. Notes 2 21st October Matrix algebra MTH6140 Linear Algebra II Notes 2 21st October 2010 2 Matrices You have certainly seen matrices before; indeed, we met some in the first chapter of the notes Here we revise matrix algebra, consider row

More information

Information Theory and Statistics Lecture 2: Source coding

Information Theory and Statistics Lecture 2: Source coding Information Theory and Statistics Lecture 2: Source coding Łukasz Dębowski ldebowsk@ipipan.waw.pl Ph. D. Programme 2013/2014 Injections and codes Definition (injection) Function f is called an injection

More information

Exact Algorithms for Dominating Induced Matching Based on Graph Partition

Exact Algorithms for Dominating Induced Matching Based on Graph Partition Exact Algorithms for Dominating Induced Matching Based on Graph Partition Mingyu Xiao School of Computer Science and Engineering University of Electronic Science and Technology of China Chengdu 611731,

More information

Realization Plans for Extensive Form Games without Perfect Recall

Realization Plans for Extensive Form Games without Perfect Recall Realization Plans for Extensive Form Games without Perfect Recall Richard E. Stearns Department of Computer Science University at Albany - SUNY Albany, NY 12222 April 13, 2015 Abstract Given a game in

More information

AGENDA Go Over DUT; offer REDO opportunity Notes on Intro to Evolution Cartoon Activity

AGENDA Go Over DUT; offer REDO opportunity Notes on Intro to Evolution Cartoon Activity Date: Number your notebook and label the top the following: EVEN Pages-LEFT SIDE Page 176- Concept Map Page 178- Sequence Page 180- Vocabulary Page 182- Warm Ups Page 184- Cartoon Questions HN- Natural

More information

Phylogenetic networks: overview, subclasses and counting problems Philippe Gambette

Phylogenetic networks: overview, subclasses and counting problems Philippe Gambette ANR-FWF-MOST meeting 2018-10-30 - Wien Phylogenetic networks: overview, subclasses and counting problems Philippe Gambette Outline An introduction to phylogenetic networks Classes of phylogenetic networks

More information

1 Efficient Transformation to CNF Formulas

1 Efficient Transformation to CNF Formulas 1 Efficient Transformation to CNF Formulas We discuss an algorithm, due to Tseitin [?], which efficiently transforms an arbitrary Boolean formula φ to a CNF formula ψ such that ψ has a model if and only

More information

Week 4. (1) 0 f ij u ij.

Week 4. (1) 0 f ij u ij. Week 4 1 Network Flow Chapter 7 of the book is about optimisation problems on networks. Section 7.1 gives a quick introduction to the definitions of graph theory. In fact I hope these are already known

More information

Unit 4 Review - Genetics. UNIT 4 Vocabulary topics: Cell Reproduction, Cell Cycle, Cell Division, Genetics

Unit 4 Review - Genetics. UNIT 4 Vocabulary topics: Cell Reproduction, Cell Cycle, Cell Division, Genetics Unit 4 Review - Genetics Sexual vs. Asexual Reproduction Mendel s Laws of Heredity Patterns of Inheritance Meiosis and Genetic Variation Non-Mendelian Patterns of Inheritance Cell Reproduction/Cell Cycle/

More information

The Lander-Green Algorithm. Biostatistics 666 Lecture 22

The Lander-Green Algorithm. Biostatistics 666 Lecture 22 The Lander-Green Algorithm Biostatistics 666 Lecture Last Lecture Relationship Inferrence Likelihood of genotype data Adapt calculation to different relationships Siblings Half-Siblings Unrelated individuals

More information

Family Trees for all grades. Learning Objectives. Materials, Resources, and Preparation

Family Trees for all grades. Learning Objectives. Materials, Resources, and Preparation page 2 Page 2 2 Introduction Family Trees for all grades Goals Discover Darwin all over Pittsburgh in 2009 with Darwin 2009: Exploration is Never Extinct. Lesson plans, including this one, are available

More information

Cographs; chordal graphs and tree decompositions

Cographs; chordal graphs and tree decompositions Cographs; chordal graphs and tree decompositions Zdeněk Dvořák September 14, 2015 Let us now proceed with some more interesting graph classes closed on induced subgraphs. 1 Cographs The class of cographs

More information

Genetic Engineering and Creative Design

Genetic Engineering and Creative Design Genetic Engineering and Creative Design Background genes, genotype, phenotype, fitness Connecting genes to performance in fitness Emergent gene clusters evolved genes MIT Class 4.208 Spring 2002 Evolution

More information

Enumeration and symmetry of edit metric spaces. Jessie Katherine Campbell. A dissertation submitted to the graduate faculty

Enumeration and symmetry of edit metric spaces. Jessie Katherine Campbell. A dissertation submitted to the graduate faculty Enumeration and symmetry of edit metric spaces by Jessie Katherine Campbell A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

More information

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS CRYSTAL L. KAHN and BENJAMIN J. RAPHAEL Box 1910, Brown University Department of Computer Science & Center for Computational Molecular Biology

More information

SIGNAL COMPRESSION Lecture 7. Variable to Fix Encoding

SIGNAL COMPRESSION Lecture 7. Variable to Fix Encoding SIGNAL COMPRESSION Lecture 7 Variable to Fix Encoding 1. Tunstall codes 2. Petry codes 3. Generalized Tunstall codes for Markov sources (a presentation of the paper by I. Tabus, G. Korodi, J. Rissanen.

More information

27: Case study with popular GM III. 1 Introduction: Gene association mapping for complex diseases 1

27: Case study with popular GM III. 1 Introduction: Gene association mapping for complex diseases 1 10-708: Probabilistic Graphical Models, Spring 2015 27: Case study with popular GM III Lecturer: Eric P. Xing Scribes: Hyun Ah Song & Elizabeth Silver 1 Introduction: Gene association mapping for complex

More information

On the Subnet Prune and Regraft distance

On the Subnet Prune and Regraft distance On the Subnet Prune and Regraft distance Jonathan Klawitter and Simone Linz Department of Computer Science, University of Auckland, New Zealand jo. klawitter@ gmail. com, s. linz@ auckland. ac. nz arxiv:805.07839v

More information

The Gauss-Jordan Elimination Algorithm

The Gauss-Jordan Elimination Algorithm The Gauss-Jordan Elimination Algorithm Solving Systems of Real Linear Equations A. Havens Department of Mathematics University of Massachusetts, Amherst January 24, 2018 Outline 1 Definitions Echelon Forms

More information

Family Trees for all grades. Learning Objectives. Materials, Resources, and Preparation

Family Trees for all grades. Learning Objectives. Materials, Resources, and Preparation page 2 Page 2 2 Introduction Family Trees for all grades Goals Discover Darwin all over Pittsburgh in 2009 with Darwin 2009: Exploration is Never Extinct. Lesson plans, including this one, are available

More information

The Inflation Technique for Causal Inference with Latent Variables

The Inflation Technique for Causal Inference with Latent Variables The Inflation Technique for Causal Inference with Latent Variables arxiv:1609.00672 (Elie Wolfe, Robert W. Spekkens, Tobias Fritz) September 2016 Introduction Given some correlations between the vocabulary

More information

Maximising the number of induced cycles in a graph

Maximising the number of induced cycles in a graph Maximising the number of induced cycles in a graph Natasha Morrison Alex Scott April 12, 2017 Abstract We determine the maximum number of induced cycles that can be contained in a graph on n n 0 vertices,

More information

Solving the Maximum Agreement Subtree and Maximum Comp. Tree problems on bounded degree trees. Sylvain Guillemot, François Nicolas.

Solving the Maximum Agreement Subtree and Maximum Comp. Tree problems on bounded degree trees. Sylvain Guillemot, François Nicolas. Solving the Maximum Agreement Subtree and Maximum Compatible Tree problems on bounded degree trees LIRMM, Montpellier France 4th July 2006 Introduction The Mast and Mct problems: given a set of evolutionary

More information

Evolutionary Tree Analysis. Overview

Evolutionary Tree Analysis. Overview CSI/BINF 5330 Evolutionary Tree Analysis Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Backgrounds Distance-Based Evolutionary Tree Reconstruction Character-Based

More information

Matroid Secretary for Regular and Decomposable Matroids

Matroid Secretary for Regular and Decomposable Matroids Matroid Secretary for Regular and Decomposable Matroids Michael Dinitz Weizmann Institute of Science mdinitz@cs.cmu.edu Guy Kortsarz Rutgers University, Camden guyk@camden.rutgers.edu Abstract In the matroid

More information