Haplotyping estimation from aligned single nucleotide polymorphism fragments has attracted increasing

Size: px
Start display at page:

Download "Haplotyping estimation from aligned single nucleotide polymorphism fragments has attracted increasing"

Transcription

1 INFORMS Journal on Computing Vol. 22, No. 2, Spring 2010, pp issn eissn informs doi /ijoc INFORMS A Class Representative Model for Pure Parsimony Haplotyping Daniele Catanzaro Graphs and Mathematical Optimization, Computer Science Department, Université Libre de Bruxelles, B-1050 Brussels, Belgium, dacatanz@ulb.ac.be Alessandra Godi Istituto di Analisi dei Sistemi ed Informatica, Consiglio Nazionale delle Ricerche, Roma Italy, alessandragodi@inwind.it Martine Labbé Graphs and Mathematical Optimization, Computer Science Department, Université Libre de Bruxelles, B-1050 Brussels, Belgium, mlabbe@ulb.ac.be Haplotyping estimation from aligned single nucleotide polymorphism fragments has attracted increasing attention in recent years because of its importance in the analysis of fine-scale genetic data. Its application fields range from mapping of complex disease genes to inferring population histories, passing through designing drugs, functional genomics, and pharmacogenetics. The literature proposes several criteria for haplotyping populations, each of them characterized by biological motivations. One of the most important haplotyping criteria is parsimony, which consists of finding the minimum number of haplotypes necessary to explain a given set of genotypes. Parsimonious haplotype estimation is an -hard problem for which the literature has proposed several integer programming (IP) models. Here, we describe a new polynomial-sized IP model based on the concept of class representatives, already used for the coloring problem. We propose valid inequalities to strengthen our model and show, through computational experiments, that our model outperforms the best IP models currently known in literature. Key words: haplotype inference; computational biology; integer programming History: Accepted by Harvey Greenberg, former Area Editor for Computational Biology and Medical Applications; received October 2008; revised January 2009, February 2009, March 2009; accepted March Published online in Articles in Advance July 29, Introduction Diploid organisms, such as humans, are characterized by having the DNA organized in pairs of chromosomes, of which one copy is inherited from the father and the other from the mother. The recent completion of the sequencing phase of the Human Genome Project (Venter et al. 2001) showed that such copies are extremely similar and that the genomes of two different individuals are identical in more than 99% of the overall number of nucleotides. Nevertheless, differences at the genomic level (also known as polymorphisms) occur, on average, every 1,000 bases (Chakravarti 1998) and are (excluding the recombination process) the predominant form of human variation as well as of genetic diseases (Hoehe et al. 2000, Terwilliger and Weiss 1998). When a site (i.e., the position of a specific nucleotide) of the genome shows a statistically significant variability within a population (i.e., a set of individuals) it is then called a single nucleotide polymorphism (SNP). Specifically, a site is considered a SNP if for a minority of the population a certain nucleotide is observed (called the least frequent allele) while for the rest of the population a different nucleotide is observed (the most frequent allele). For a given SNP, an individual can be either homozygous (i.e., possess the same allele on both chromosomes) or heterozygous (i.e., possess two different alleles). The values of a set of SNPs on a particular chromosome region define a haplotype. Haplotyping an individual therefore consists of determining, for each copy of a given chromosome region, a pair of haplotypes. Haplotyping populations of individuals has attracted an increasing amount of attention in recent years (Helmuth 2001, Marshall 1999) because of its importance in the analysis of fine-scale genetic data (Clark et al. 1998, Schwartz et al. 2002). For example, haplotypes are necessary in evolutionary studies to extract the information needed to detect diseases and to reduce the number of tests to be carried out. In 195

2 196 INFORMS Journal on Computing 22(2), pp , 2010 INFORMS functional genomics, haplotypes are used to discover a functional gene or to study an altered response of an organism to a particular therapy. In human pharmacogenetics, haplotypes explain why people react differently to different types or amounts of drugs. In fact, because SNPs affect the structure and function of proteins and enzymes, they may influence the way in which a drug is absorbed and metabolized. Direct sequencing of haplotypes via experimental methods is both time-consuming and expensive, and therefore current molecular sequencing methods generally provide more general genotype information. Specifically, genotype data provide information about the multiplicity of each SNP allele of a given individual, i.e., knowledge about its homozygous or heterozygous nature. Unfortunately, a drawback of using genotype data is that information regarding which heterozygous site SNP variants came from the same chromosome copy remains unknown (Wang and Xu 2003). Hence, in silico haplotyping methods become attractive alternatives and in some cases the only viable way for haplotyping populations (Lancia et al. 2004). The simplest way for haplotyping a population is described in Bonizzoni et al. (2003), Gusfield (2003), and Lancia et al. (2004) and can be resumed as follows: first, to experimentally obtain genotype data, and subsequently, for each individual, to retrieve the haplotypes computationally i.e., to find a set of haplotypes such that, if they are assumed to be the corresponding set of chromosome copies, then computing the multiplicity of each SNP allele one can obtain exactly the genotypes given. However, this approach requires the presence of some haplotyping criterion. Several criteria have been proposed for haplotyping populations, each of them based on biological motivations (see, for example, Bafna et al. 2003, Clark et al. 1998, Eskin et al. 2003, Excoffier and Slatkin 1995, Fallin and Schork 2000, Lancia and Rizzi 2006, Niu et al. 2002, Qin et al. 2002, Stephens and Donnelly 2003, Stephens et al. 2001). In this article we consider the parsimony criterion (Lancia et al. 2004). The idea at the core of the parsimony criterion is that under many plausible explanations of an observed phenomenon, the one requiring the fewest assumptions should be preferred (Semple and Steel 2003). Hence, because the number of distinct haplotypes observed in a population is much smaller than the number of possible haplotypes, the parsimony-based approaches aim to determine the minimum number of different haplotypes that, combined in pairs over time, have given rise to a set of observed genotypes (Lancia et al. 2004). The problem of haplotyping populations under parsimony (hereafter denoted as the pure parsimony haplotyping (PPH) problem) is known to be -hard (Lancia et al. 2004). This result justifies the development of several enumerative optimization algorithms that aim to solve exactly instances of PPH. Specifically, Gusfield (2003) first proposed an integerprogramming model to tackle instances of PPH. The author described a model, exponential in size, characterized by two kinds of variables one for haplotypes and the other for haplotype pairs and by the exhaustive generation of the set of all haplotypes compatible with some genotype in the input. Similar integer programming models were also used by Brown and Harrower (2004) and Bertolazzi et al. (2008). To minimize the number of distinct haplotypes, Brown and Harrower (2004) proposed constructing haplotype vectors by associating a variable to each site; they subsequently used constraints to establish the exact haplotype structures. On the other hand, Bertolazzi et al. (2008) first formulated PPH as a minimization problem characterized by a polynomial number of variables and constraints. Then the authors turned the problem into a maximization problem and strengthened the model by using clique inequalities, symmetry breaking, inequalities, and dominance relations. Whereas Gusfield (2003) and Bertolazzi et al. (2008) used commercial mixed-integer programing solvers (CPLEX and Xpress-MP, respectively) to get solutions to their models, Brown and Harrower (2004) used a branch-and-cut algorithm to solve their polynomial model. A comparison of their results shows that the Brown and Harrower (2004) polynomial model is well suited for big dimension samples, whereas the Gusfield (2003) and Bertolazzi et al. (2008) models are more efficient for medium dimensions, and specifically, when the recombination level (i.e., the parameter that affects the structure of the haplotypes) increases. So far, to our best knowledge, data sets containing 68 genotypes and 75 SNPs represent the limit size instances of PPH that can be exactly analyzed (Brown and Harrower 2006). In this article we present a new polynomialsized integer programming (IP) model for PPH. (See the Online Supplement, available at pubs.informs.org/ecompanion.html/, for codes and data.) The idea at the core of the model is based on class representatives already proposed by Campelo et al. (2008) for the coloring problem (Garey and Johnson 2003). Computational experiments show that our model outperforms the existing polynomial IP models for PPH (Bonizzoni et al. 2003, Brown and Harrower 2006, Gusfield 2003, Lancia et al. 2004) both on real and simulated data, and allows haplotyping of populations containing hundreds of genotypes and SNPs. The model is compact, easy to implement, solvable with standard solvers, and usable in those cases for which the parsimony criterion is well suited for haplotyping populations. 2. Notation and Problem Formulation Assume that a population of diploid individuals is given and that for each individual the knowledge of

3 INFORMS Journal on Computing 22(2), pp , 2010 INFORMS 197 Individual Chromosome region DNA sequence Paternal A T A G C T G C C C A A A T Maternal A T A G G T G C C C A T A T Paternal haplotype C C A } Maternal haplotype G C T genotype Figure 1 SNP1 SNP2 SNP3 An Example of Haplotypes and SNPs a specific (paternal and maternal) chromosome region is available. The sites of the chromosome regions from the individuals at which polymorphism arises are called SNP. Given a SNP, the most frequent allele in the population is encoded with the value 0 and the least frequent allele with the value 1. For an individual in the population, a SNP is homozygous if the corresponding paternal and maternal nucleotides are equal, and heterozygous otherwise. The set of SNPs of a paternal (or maternal) chromosome region of an individual defines a haplotype. Haplotypes can be encoded as strings of fixed length over an alphabet = 0 1. For example, consider the paternal and maternal chromosome regions from an individual as shown in Figure 1. Encode the first SNPs (i.e., C and G ) as 0 and 1, respectively; the second SNPs (i.e., C and C ) as 1; and the third SNPs (i.e., A and T ) as 0 and 1, respectively. Then the paternal haplotype is the string (010) and the maternal haplotype is the string (111). Because at each SNP only three possibilities can arise (homozygous of type 0 or 1, or heterozygous), a genotype can be encoded as strings of fixed length over an alphabet = 0 1 2, where entries equal to 0 (or 1) denote homozygous sites of type 0 (or 1), and entries equal to 2 denote heterozygous sites. Given a pair of haplotypes h i and h j, define the operator sum among h i and h j as the genotype g k whose pth entry is h ip if h ip = h jp, and 2 otherwise. For example, the genotype obtained by summing the paternal and maternal haplotypes in Figure 1 is g k = 212. We say that a genotype g k is resolved from the pair of haplotypes h i and h j if g k = h i h j. An instance of PPH is specified by indicating a set of m genotypes. A feasible solution to PPH is constituted by (i) a set H of (at most 2m (Lancia et al. 2004)) haplotypes and (ii) for each genotype g k, a pair of haplotypes, say, h i h j, resolving g k such that g k = h i h j. An optimal (or minimum) solution to PPH is a feasible solution in which the set H has the minimum cardinality (Blain et al. 2009). As an example, an instance of PPH and two alternative (minimum) solutions are shown in Figure 2. It is worth nothing that, although having the same set of haplotypes, Solutions 1 and 2 of Figure 2 are different. Bipartite graphs can be used to represent feasible solutions to PPH (see Blain et al. 2009). Specifically, a feasible solution can be represented by such a graph H E in which each vertex g k is of degree 2 and the two other vertices, say, h i and h j, adjacent to g k satisfy g k = h i h j (i.e., h i and h j are mates for g k ; see Blain et al. 2009). The bipartite graphs corresponding to Solutions 1 and 2 of Figure 2 are depicted in Figures 3(a) and 3(b), respectively. In the bipartite graph representation of a solution, each haplotype is incident to a subset of genotypes. The family of the subsets of genotypes induced by the haplotypes of a feasible solution satisfies the following three properties: (1) each subset of genotypes shares one haplotype, (2) each genotype belongs to exactly two subsets, and (3) every pair of subsets intersects in at most one genotype. As an example, the haplotypes of Solution 1 in Figure 2 induce the family of subsets of Figure 4(a) satisfying properties (1) (3). Specifically, the subsets are induced by the following four haplotypes: h 1 = 1100 inducing the subset S 1 = , h 2 = 0011 inducing the subset S 2 = , h 3 = 1001 inducing the subset S 3 = 1202, and finally, h 4 = 0110 inducing the subset S 4 = Similarly, the haplotypes of Solution 2 in Figure 2 induce the family of subsets of Figure 4(b) satisfying properties (1) (3). Specifically, the subsets are induced by the following four haplotypes: h 1 = 1100 inducing the subset S 1 = 1202, h 2 = 0011 inducing the subset S 2 = 0212, h 3 = 1001 inducing the subset S 3 = , and finally, h 4 = 0110 inducing the subset S 4 = Integer Programming Models In this section we describe an IP model for PPH. First, we describe a basic model that requires a polynomial number of variables and constraints. We subsequently show how to reduce the model size by exploiting the particular structure and properties of the variables involved. Finally, we provide valid inequalities to further strengthen the model Basic Model The integer programming model for PPH that we propose is based on the class representatives with the smallest index, similar to the one proposed by Campelo et al. (2008) to tackle the coloring problem (Garey and Johnson 2003). Following the properties of a feasible solution presented in the previous section, we associate an index to each subset S of genotypes induced by a haplotype h. Specifically, if i is the smallest index of a genotype belonging to S, then i is the index associated to S and the subset will be denoted as S i. Because each genotype g k belongs to exactly two subsets, it may happen that g k is itself the genotype with the smallest index in

4 198 INFORMS Journal on Computing 22(2), pp , 2010 INFORMS Instance of PPH Genotypes SNP Genotype Genotype Figure 2 Haplotypes Haplotype 1 Haplotype 2 Haplotype 3 Haplotype 4 Haplotypes Haplotype 1 Haplotype 2 Haplotype 3 Haplotype 4 Genotype 3 SNP SNP Minimum solution 1 Minimum solution Genotype 1 = Haplotype 1 Haplotype 2 Genotype 2 = Haplotype 1 Haplotype 3 Genotype 3 = Haplotype 2 Haplotype 4 Genotype 1 Graphical Representation of an Instance of PPH and Two Alternative Solutions both subsets. In this case a dummy genotype g k is added, and the subset S k is created. As an example, one can imagine that the haplotype h 1 induces the subset S i = g i g j g k, h 2 induces the subset S i = g i g l g r g s, h 3 induces the subset S k = g k g l g s g t, and so on. We remark that the index k can be considered only if k was previously used, i.e., if the subset S k already exists. For the sake of Figure 3 h 1 h 2 h 3 h 4 (a) g 1 g 2 g 3 h 1 h 2 h 3 h 4 (b) Bipartite Graph Representation of Solutions 1 and 2 of Figure 2 g 1 g 2 g 3 Conflation Conflation = Haplotype 3 Haplotype 4 Genotype 2 = Haplotype 1 Haplotype 3 Genotype 3 = Haplotype 2 Haplotype 4 notation, in the rest of the paper we will refer to the genotype g k as the genotype k. Because at most 2m haplotypes are necessary to resolve m genotypes (Lancia et al. 2004), the indices i of the subsets S i can vary inside K K, where K = 1 m and K = 1 m. Assume that an order is defined on K K in such a way that 1 < 1 < 2 < 2 < <m<m. Define x i, i K K, as a decision variable equal to 1 if, in the solution, there exists a haplotype inducing a subset S i of genotypes whose smallest index genotype is i, and 0 otherwise. Denote y k ij, k K i j K K, i<j, as a decision variable equal to 1 if the genotype k belongs to the subsets S i and S j, and 0 otherwise. Finally, denote as the set of the p SNPs characterizing each genotype g k in, and let z ip, i K K, p, be a decision variable equal to 1 if the haplotype inducing the subset S i of genotypes has such a value at the pth site, and 0 otherwise. Variables z ip describe explicitly the haplotypes of the solution. Then, an integer programming model for PPH follows: Formulation 1. Basic Model BM min x i (1) i K K s.t. x i x i i K (2)

5 INFORMS Journal on Computing 22(2), pp , 2010 INFORMS 199 (a) (b) Figure 4 Genotype 1 = 2222 Genotype 1 = 2222 Haplotype 1 = 1100 Haplotype 4 = 0110 Haplotype 3 = 1001 Haplotype 2 = 0011 Genotype 2 = 1202 Genotype 2 = 1202 Haplotype 1 = 1100 Genotype 3 = 0212 Haplotype 4 = 0110 Haplotype 3 = 1001 Genotype 3 = 0212 Haplotype 2 = 0011 Examples of Subsets of Genotypes Induced by the Two Alternative Solutions of Figure 2 i j K K y k ij j K K j i 1 k K (3) y k ij + j K K j<i y k ji x i k K i K K (4) y k kk x k k K (5) z kp = z k p = 0 k K p g kp = 0 (6) z kp = z k p = 1 k K p g kp = 1 (7) z kp + z k p = 1 k K p g kp = 2 (8) z ip 1 y k ij j K K j i y k ji j K K j<i p k K g kp =0 i K K i k k (9) z ip y k ij + j K K j i y k ji j K K j<i p k K g kp =1 i K K i k k (10) z ip + z jp y k ij p k K g kp = 2 i j K K (11) z ip + z jp 2 y k ij p k K g kp = 2 i j K K (12) x i z ip y k ij 0 1 (13) The objective function (1) represents the number of distinct haplotypes or equivalently the cardinality of H. Because the index i is considered only if i is already used, constraints (2) imply that if the haplotype h i is not used, then h i should not be used. Constraints (3) impose that each genotype g k must belong to exactly two subsets S i S j, and constraints (4) force x i to be 1, i.e., to take haplotype h i into account, if some genotype g k is resolved by h i. Constraints (5) are a consequence of the definition of the dummy genotype k ; actually, they constitute a special version of constraints (4) when genotype k is resolved by haplotype k. Constraints (6) (8) translate the sum operation among haplotypes. Specifically, constraints (6) impose that the pth SNP of the haplotype h i (h i ), which induces the subset S i (S i ), must be set to 0 when genotype g i has its pth SNP equal to 0. By analogy, constraints (7) impose that the pth SNP of the haplotype h i (h i ) must be set to 1 when genotype g i has its pth SNP equal to 1. Constraints (8) impose that exactly one among the pth SNPs of the haplotypes h i and h i can be set to 1 when genotype g i has its pth SNP is equal to 2. Constraints (9) establish the relations between variables z ip and yij k. Specifically, they force the pth SNP of the haplotype h i to be equal to 0 when at least one genotype g k, whose pth entry is equal to 0, belongs to the induced subset S i. By analogy, constraints (10) force the pth SNP of the haplotype h i to be equal to 1 when at least one genotype g k, whose pth entry is equal to 1, belongs to the induced subset S i. Finally, constraints (11) and (12) impose that the sum operation among haplotypes is respected for any pair of haplotypes h i and h j in the solution Reducing Model Size It is worth noting that the particular nature of the set of indices K K can be exploited to reduce the size of BM. In fact, given that yij k = 1 if and only if k belongs to two subsets having g i and g j for the smallest index genotype, we need to define variables yij k only when i<j k or i = k and j = k. For example, variable y1 2 1 does not need to be defined as well as all variables yii k for all i k K, k i. Similarly, variables yk ik or y k k i (depending on whether k is smaller or larger than i) do not need to be defined for i K K with i k and i k. In fact, if y k ik = 1, then k belongs to two subsets, one represented by i and the other by k, which contradicts the assumption that the dummy genotype k can be considered only if k is already used. By extending this analysis to all the possible cases in which variables yij k are redundant and assuming that variable y11 1 = 1, it is easy to see that variables yk ij do not need to be defined whenever they belong to one of the following sets: R 1 = y k ij k K i j K K j<i<k (14)

6 200 INFORMS Journal on Computing 22(2), pp , 2010 INFORMS R 2 = y k ik k K i K K i k 1 (15) R 3 = y k ii k K i K K 2 i k 1 (16) The sets of redundant variables can be further expanded by observing that for each triplet of genotypes g i g j g k such that the respective pth SNP is g ip = 0, g jp = 0, and g kp = 2, variable y k ij is necessarily equal to 0 since the containment of the genotype g k to the subsets S i and S j would violate the sum operator among haplotypes at least at the pth SNP. Extending this argument to all the possible combinations of triplets of genotypes that violate the haplotype sum operator, we see that the following proposition, whose proof is omitted, holds: Proposition 1. The set of variables R 4 = y k ij i j k K p g kp = 2 g ip = g jp 2 (17) is redundant. Similar arguments can be used to prove the following proposition: Proposition 2. The set of variables R 5 = { y k ij i j k K p g kp + g ip = 1 or g kp + g jp = 1 } (18) is redundant. Proof. By contradiction, if the set R 5 is not redundant, then for some p, genotype k may belong to two subsets S i and S j. Without loss of generality, assume g kp = 0 and g ip = 1. Because haplotype h i is associated with the subset S i and explains genotype i, it must have the pth SNP equal to 1; otherwise, the sum operator would be violated. In turn, this implies that genotype k cannot be resolved by h i because a necessary condition for its resolution is that the pth SNP of h i be equal to 0. Hence, independent of j, yij k = 0 in any feasible solution of PPH. Note that removing the redundant variables yij k of Propositions 1 and 2 can be performed in O m 3 n. Finally, a similar process of reduction can be applied to the variables z ip both by removing those whose value is fixed by constraints (6) and (7) and by the set of redundant variables (17) and (18). In this way, only variables z ip involved in constraints (8) need to be defined. In conclusion, denoting Y as the set of variables yij k, R = q R q, and Y = Y \R, we can be reduce BM to the following: Formulation 2. Reduced Model RM min x i (19) i K K s.t. x i x i i K (20) i j y k ij Y \R y k ij y k ij + j yij k Y \R j i 1 k K (21) j y k ij Y j<i y k ji x i k K i y k ij Y (22) y k kk x k k K (23) z kp + z k p = 1 k K p g kp = 2 (24) z ip 1 y k ij p k K z ip z ip y k ij z jp y k ij j y k ij Y j i j y k ij Y j i z ip 1 y k ij z jp 1 y k ij z ip + z jp y k ij y k ji j yij k Y j<i i y k ij Y g kp = 0 g ip = 2 (25) y k ij + p k K y k ji j yij k Y j<i i y k ij Y g kp = 1 g ip = 2 (26) p k K i j y k ij Y g ip = 2 g jp = 0 g kp = 2 (27) p k K i j y k ij Y g ip = 0 g jp = 2 g kp = 2 (28) p k K i j y k ij Y g ip = 2 g jp = 1 g kp = 2 (29) p k K i j y k ij Y g ip = 1 g jp = 2 g kp = 2 (30) p k K i j y k ij Y g ip = 2 g jp = 2 g kp = 2 (31) z ip + z jp 2 y k ij p k K i j y k ij Y g ip = 2 g jp = 2 g kp = 2 (32) x i z ip y k ij 0 1 (33) Note that the integrality of variables yij k suffices to guarantee the integrality of variables x i. Hence, we denote by RMM the formulation in which the integrality condition on variables x i is relaxed Strengthening Inequalities In this section, we provide valid inequalities to strengthen RM. Proposition 3. The inequality k K y k ij Y y k ij x i i j y k ij Y (34)

7 INFORMS Journal on Computing 22(2), pp , 2010 INFORMS 201 Proof. Two subsets S i and S j of a feasible solution can share at most one genotype. So it follows that k K yij k 0 1. If the solution is such that k K yij k = 0, then the inequality reduces to x i 0, which is trivially valid. If the solution is such that k K yij k = 1, then genotype k belongs to the subset represented by genotype i, which in turn implies x i = 1, and the inequality is again valid. Similarly, it is easy to see that the following proposition holds: Proposition 4. The inequality y k ij x j i j y k ij Y (35) k K y k ij Y Note that k K y k ij 0 1 also implies k S y k ij 0 1, for any S K. Let 1 k = k K g kp = 1. The following proposition holds: Proposition 5. The inequality z ip y k ij p i j y k ij Y (36) k 1 k yk ij Y Proof. If the solution is such that k k 1 yk ij = 0, then the inequality reduces to z ip 0, which is trivially valid. If the solution is such that k k 1 yk ij = 1, then the inequality reduces to z ip = 1, which is again valid. Note that inequality (36) only applies if g ip = 2. In fact, if g ip = 1, (36) is redundant, and if g ip = 0, the corresponding variables yij k are not defined. By analogy, fixing i, j, and p, consider 0 k = k K g kp = 0. Then, the following proposition holds: Proposition 6. The inequality z ip x i y k ij p i j y k ij Y (37) k 0 k yk ij Y Proof. If the solution is such that k k 0 yk ij = 0, then the inequality reduces to z ip x i, which is trivially valid. If the solution is such that k k 0 yk ij = 1, then genotype k belongs to the subset represented by genotype i, which implies x i = 1 and thus z ip = 0. Also in this case, the inequality (37) only applies if g ip = 2. In fact, if g ip = 0 and if g ip = 1, the corresponding variables yij k are not defined. Defining 2 k = k K g kp = 2, we have the following proposition: Proposition 7. The inequality z ip + z jp y k ij + 2 k 2 k yk ij Y k 1 k yk ij Y y k ij p i j y k ij Y (38) Proof. The right-hand side of the inequality k k 2 yk ij + 2 k k 1 yk ij If the solution is such that k k 2 yk ij + 2 k k 1 yk ij = 0, then the inequality reduces to z ip + z jp 0, which is trivially valid. If the solution is such that k p 2 yk ij = 1, then the inequality reduces to z ip + z jp 1, which is still valid. Finally, if the solution is such that k k 1 yk ij = 1, then the inequality reduces to z ip +z jp = 2, which is again valid. In fact, if haplotypes h i and h j resolve genotype k, then they must be characterized by having the pth SNP equal to 1. Similar arguments can be used to prove the following propositions: Proposition 8. The inequality z ip + z jp x i + x j y k ij 2 k 2 k yk ij Y k 0 k yk ij Y y k ij p i j y k ij Y (39) A distinct class of inequalities are obtained by introducing the concept of conflict among genotypes (Bertolazzi et al. 2008) at the pth SNP. Specifically, consider a pair of two genotypes g i g j and assume that there exists a conflict; i.e., one of the two genotypes (e.g., g i ) is such that g ip = 1, whereas the corresponding entry of the other is such that g jp = 0 (or vice versa). Then this condition is sufficient to exclude common haplotypes explaining g i g j because their existence would violate the haplotype sum operator. The rationale can be extended to all the subset of genotypes having conflicts and can be formalized as follows. Define CG as a graph whose set of vertices is the set of genotypes and whose set of edges contains all the pairs of genotypes g i g j that are in conflict. Then, the following proposition holds: Proposition 9. Let C be a clique in CG, then the inequality y k ij x i i y k ij Y C CG (40) k C j yij k Y j i Proof. Only one genotype in C can be shared by two subsets S i and S j of a feasible solution. This implies that k C j yij k Y j i yk ij 0 1. If the solution is such that k C j yij k Y j i yk ij = 0, then the inequality reduces to x i 0, which is valid. If the solution is such that k C j yij k Y j i yk ij = 1, then the genotype k belongs to the subset represented by genotype i, which in turn implies x i = 1; hence, the inequality is again valid.

8 202 INFORMS Journal on Computing 22(2), pp , 2010 INFORMS Proposition 9 can be further extended to variables z ip. Specifically, define the set Ep 1 = g i g j 1 k : q q p g iq + g jq = 1 for all p, and consider the graph CG 1 p = 1 k E1 p CG. Then the following proposition holds: Proposition 10. The inequality z ip y k ij + k C 1 j yij k Y j i k C 1 j yij k Y j<i y k ji p i y k ij Y clique C 1 CG 1 p (41) By analogy, defining the set Ep 0 = g i g j 0 k : q q p g iq + g jq = 1, for all p, and considering the graph CG 0 p = 0 k E0 p CG, we have the following proposition: Proposition 11. The inequality z ip x i y k ij k C 0 j yij k Y j i k C 0 j yij k Y j<i y k ji p i y k ij Y clique C 0 CG 0 p (42) The proofs of Propositions 10 and 11 are quite similar to the ones for Propositions 7 and 8 and are omitted. Finally, observe that Propositions 10 and 11 imply Proposition 9 and only apply if g ip = Experiments In this section we analyze the performances of our model to solve the pure parsimony haplotyping problem. Our experiments were motivated by a number of goals namely, to evaluate, with respect to BM, the benefits obtained by removing the redundant variables and including the valid inequalities previously described; to compare the performances of our model with the ones obtained by Brown and Harrower s HybridIP model (Brown and Harrower 2006), currently the best model for PPH; and finally, to allow the analysis of larger data sets with respect to the ones currently analyzed. Similar to Brown and Harrower (2006), we emphasize that our experiments aim simply to evaluate the runtime performance of our model for solving PPH. We neither attempt to study the efficiency of PPH for haplotype inference nor compare the accuracy of our algorithm to haplotype inference solvers that do not use the parsimony criterion; this analysis has been performed by both Gusfield (2003) and Wang and Xu (2003), and we refer the interested reader to these articles Implementation We implemented BM and RM by means of Mosel 2.0 of Xpress-MP, Optimizer version 18, running on a Pentium 4, 3.2 GHz, equipped with 2 GB of RAM, and operating system Gentoo release 7 (kernel linux ). We have combined RM with different constraints obtaining two other models specifically, RM including the valid inequalities (hereafter indicated with SM) and RMM including the valid inequalities and such that variables x i are declared continuous (hereafter indicated with SMM). To evaluate the benefits of the valid inequalities, we have deactivated the Xpress Optimizer automatic cuts and the pre-solving strategy. Finally, we have used the Xpress-MP primal heuristic to generate the first upper bound Separation Oracle for the Valid Inequalities When using models SM and SMM, the valid inequalities (34) and (35) are loaded together with RM. On the contrary, the valid inequalities (36) (39) are dynamically generated by means of a separation oracle. Specifically, let x ȳ z be a current solution at a given node of the search tree. For each p, we build the set 1 k and for all i j such that yk ij Y, and we check if z ip < k k 1 yk ij Y ȳk ij. If so, the inequality z ip y k ij k k 1 yk ij Y is added. To reduce the computational overhead, no more than 10 cuts are added at each node of the search tree. A similar procedure is used to separate inequalities (37) (39). If no inequality of types (36) (39) is added to the formulation, then the separation oracle checks for violated clique inequalities (41) (42). Before running the exact search, we precompute the graph CG for the instance analyzed. At a given node of the search tree, we subsequently pick at random an index p and an index i yij k Y, and we dynamically construct the graph CG 1 p and set w k = j yij k Y j i ȳk ij + j yij k Y j<iȳk ji, k CG1 p. If z ip k w p k, then there does not exist any violated inequality in CG 1 p 1, and we select at random a different index i; otherwise, we proceed by finding a maximum clique in CG 1 p. After testing alternative (heuristic and exact) solution strategies for finding a maximum clique in a graph, a well-known hard problem (Pardalos and Xue 1994), we opted for Nemhauser and Trotter s exact algorithm (Pardalos and Xue 1994), which proved to be, for the data sets analyzed, a good trade-off between speed (a few milliseconds) and efficiency. Specifically, Nemhauser and Trotter s algorithm consists in solving the model Formulation 3. k 1 max z o = k=1 w k t k (43) s.t. t i + t j 1 i j E 1 p (44) t i 0 1 (45)

9 INFORMS Journal on Computing 22(2), pp , 2010 INFORMS 203 where t i denotes a decision variable equal to 1 if vertex t i 1 k belongs to the clique and 0 otherwise, and E p 1 is the complement of the edge set E1 p. If the optimal value of the objective function is greater than z ip, then the inequality z ip k C 1 j yij k Y j i y k ij + k C 1 j yij k Y j<i is added to the formulation. A similar procedure is used for separating the clique inequalities (42). To speed up the cut generation, branching priorities on the integer variables were introduced. Specifically, first the branches are performed on variables z ip, then on variables x i, and finally on variables y k ij. Moreover, the search is performed previously on variables z ip whose relaxed values are in the interval and subsequently on those whose relaxed values are in the interval Basic and Reduced Models on Simulated Data Sets Analogously to Brown and Harrower (2006), we used Hudson s ms program (Hudson 1990) to generate benchmark simulated data sets to compare the performances of BM and RM. Under the neutral evolution hypothesis (Kimura 1983), ms is able to generate many independent replicate samples (haplotypes) through a variety of assumptions about migration, recombination rate, and population size. A coalescent approach (Hudson 1990) is at the core of the simulator: a random genealogy of the sample is generated, and then mutations are randomly placed on the genealogy. A parameter called recombination level allows one to tune the heterogeneity degree of the haplotypes in a population: as r increases, haplotypes become increasingly different from each other, and hence the number of heterozygous sites in genotype samples tends to become large. Using a strategy similar to the one already proposed by Brown and Harrower (2006) we generated three collections: 10 data sets containing 80 samples of 50 SNPs each; 10 data sets containing 140 samples of 70 SNPs each; and 10 data sets containing 200 samples of 100 SNPs each, respectively. For each data set, the recombination level was randomly chosen in The reason for choosing nonzero recombination levels was due to the fact that positive recombination levels may lead to more difficult PPH instances (Brown and Harrower 2006). To obtain data sets of genotypes, sample data sets were paired randomly so that all of them were used. This process led to the generation of three data sets of 10 instances, each containing 40 genotypes, 70 genotypes, and 100 genotypes, respectively. Duplicate genotypes were removed though a preprocessing step. It is worth noting that this reduction does y k ji Table 1 Overview of the Data Sets Analyzed to Compare the Basic and the Reduced Models Max no. Recombination Mean no. of 2s per No. of Genotypes SNPs level of 2s sites genotype (mean) data sets Random in Random in Random in not interfere with the value of the optimal solution or reduce the gap between the optimal solution of the IP problem and its relaxation. Furthermore, Brown and Harrower s preprocessing step consisting in the elimination of duplicate columns was also implemented (Brown and Harrower 2006). Table 1 summarizes the details about the genotype data sets used, whereas Table 2 shows the results obtained by BM and RM. The second and third columns of Table 2 indicate the mean and the standard deviation of the time (RAM) required by BM and RM to solve a given data set. The fourth column indicates the maximum number of branch-and-bound branches required by BM and RM to solve a given data set. Finally, the fifth column shows the overall number of instances solved by BM and RM on each data set. In general, numerical experiments show that RM performs better than BM both in terms of runtime and memory. The reduction of variables yij k and z ip (and the corresponding constraints) performed by RM approaches 99% of the overall number of variables and constraints required by BM. As a consequence, this strategy leads to a model requiring only a few megabytes of RAM, whereas several hundred of megabytes of RAM are required by the BM model (the main reason the BM model is unable to tackle instances of PPH larger than 40 genotypes). Because of its better performance, we prefer RM to BM for the remainder of our numerical analysis Performance Analysis on Brown and Harrower s Data Sets We used Brown and Harrower s data sets (Brown and Harrower 2006) for testing the performances of our models versus Brown and Harrower s HybridIP stateof-the-art model for PPH. Specifically, using Hudson s (1990) ms program, the authors created two families of data sets (called the uniform and nonuniform data sets) by randomly pairing the resulting haplotypes. The distinction in the two simulated methods comes in how the random pairing is performed. In the uniform data sets the haplotypes are randomly paired by sampling uniformly from the set of distinct haplotypes. In the nonuniform data sets the haplotypes are sampled uniformly from the collection of haplotypes generated by the coalescent process. In this collection,

10 204 INFORMS Journal on Computing 22(2), pp , 2010 INFORMS Table 2 Results Obtained by the Basic and the Reduced Models on the Data Sets Analyzed Time (s) RAM required (MB) Max no. of branches Fraction solved Data sets BM RM BM RM BM RM BM RM 40 50s ± ± ± 20 5 ± /10 10/ s ± 0 94 > ± 3 1 0/10 10/ s ± 2 33 > ± 6 1 0/10 10/10 haplotypes may not be unique, so some haplotypes are sampled with higher frequency than others. As shown in Table 3, both the uniform and nonuniform data sets consist of collections of 30 or 50 genotypes having 10, 30, 50, 75, or 100 SNPs each. Each data set contains a number of instances varying between 15 and 50. Brown and Harrower (2006) also considered biological inputs from chromosomes 10 and 21 over all four HapMap (International HapMap Consortium 2004) populations. For each input length, the authors selected sequences of lengths 30, 50, and 75, giving a total of eight data sets consisting of three instances each. Table 4 summarizes the main characteristics of Brown and Harrower s biological data sets. We show in Tables 5 7 the performances of our models on the uniform, nonuniform, and biological data sets, respectively. Specifically, the columns of each table evidence the mean, the maximum, and the minimum of the solution time, the gap (i.e., the difference between the optimal value found and the value of linear relaxation at the root node of the search tree, divided by the optimal value), the number of branches performed, and the number of cuts added to solve each group of instances belonging to a given data set Uniform Data Sets. Brown and Harrower (2006) showed that the HybridIP model is able to Table 3 Characteristics of Brown and Harrower s (2006) Artificial Data Sets Max no. of 2s Mean no. of 2s Recombination Genotypes SNPs in a genotype in a genotype level Uniform Nonuniform n.a n.a n.a n.a n.a. solve instances of the data sets having a genotype length of 30 or less. Specifically, the authors observed that HybridIP needs a run time ranging from 7 seconds to 2 minutes to solve instances of a data set having genotype length of 10, and from 30 seconds to an hour to solve instances of the data set having genotype length of 30. However, HybridIP is only partially able to solve instances of the data sets having a genotype length of 30 or greater. In fact, HybridIP solved 70% of the instances of the data set having a genotype length of 50, 60% of instances of the data set having a genotype length of 75, and only 30% of the instances of the data set having a genotype length of 100. The run times range from five seconds to two hours. On the contrary, our numerical experiments (see Table 5) show that RM on average is able to solve instances of the data sets having genotype length of 10 in about 8 seconds, although five instances of the data set 50 10r16 (specifically, instances 02, 04, 06, 11, and 13) took a bit longer (22.44, , , , and Table 4 Characteristics of Brown and Harrower s (2006) Biological Data Sets Max no. of 2s Mean no. of 2s Data set Genotypes SNPs in a genotype in a genotype Biological Test-chr10-CEU Test-chr10-CEU Test-chr10-CEU Test-chr10-HCB Test-chr10-HCB Test-chr10-HCB Test-chr10-JPT Test-chr10-JPT Test-chr10-JPT Test-chr10-YRI Test-chr10-YRI Test-chr10-YRI Test-chr21-CEU Test-chr21-CEU Test-chr21-CEU Test-chr21-HCB Test-chr21-HCB Test-chr21-HCB Test-chr21-JPT Test-chr21-JPT Test-chr21-JPT Test-chr21-YRI Test-chr21-YRI Test-chr21-YRI

11 INFORMS Journal on Computing 22(2), pp , 2010 INFORMS 205 Table 5 Performances of RM Time (sec.) Gap (%) Nodes Data set Average Max Min Average Max Min Average Max Min Uniform r r Nonuniform Biological CHR10-CEU CHR10-HCB CHR10-JPT CHR10-YRI CHR21-CEU CHR21-HCB CHR21-JPT CHR21-YRI seconds, respectively). Instances of the data set having a genotype length of 30 on average have been solved in about seconds, with two instances (02 and 08) taking more than the average time (53.42 and seconds, respectively). The remaining ones took less than nine seconds. Instances of the data sets having genotype length greater than 30 on average were solved in at most seconds, with a maximum solution of hence, much less than Table 6 Performances of SM HybridIP. Moreover, the time taken by the Xpress-MP primal heuristic to obtain the first feasible solution was always smaller than one second even in the most difficult instance. However, it is worth noting that the hardware used in our experiments is twice as fast than that of Brown and Harrower (2006); hence, a direct comparison of the runtimes is unfair. The comparison of the gap and the branches instead provides a more independent insight about the effectiveness Time (sec.) Gap (%) Nodes Cuts Data set Average Max Min Average Max Min Average Max Min Average Max Min Uniform r r Nonuniform Biological CHR10-CEU CHR10-HCB CHR10-JPT CHR10-YRI CHR21-CEU CHR21-HCB CHR21-JPT CHR21-YRI

12 206 INFORMS Journal on Computing 22(2), pp , 2010 INFORMS Table 7 Performances of SMM Time (sec.) Gap (%) Nodes Cuts Data set Average Max Min Average Max Min Average Max Min Average Max Min Uniform r r Nonuniform Biological CHR10-CEU CHR10-HCB CHR10-JPT CHR10-YRI CHR21-CEU CHR21-HCB CHR21-JPT CHR21-YRI of our model. To this end, we observe that, unlike HybridIP, the gap is 0 in at least the 50% of the instances analyzed and that several instances are typically solved without branching. SM shows better performances than RM (see Table 6) both in terms of solution times and gaps. In fact, on average, SM was able to solve instances of the data sets having genotype length of 10 in about 5.8 seconds, and maximum and minimum solution times were always smaller than RM. The first two data sets were solved at the root node, whereas an instance of the third data set needed at most 49 branches to reach the optimum. The efficiency of the valid inequalities is confirmed by both the smaller gaps than RM and the average number of cuts added during the exact search. This trend is also confirmed for the data sets in which genotype length was greater than or equal to 30, with the only exception being instance 19 of the data set 30 50, which took seconds versus seconds of RM. This instance had a gap of (versus 0.0 of RM), a number of branches equal to 1,109 (versus 75 of RM), and the number of cuts added equal to 741. The anomaly of this instance seems to be due to the valid inequalities (34) and (35), which negatively interfere with the primal heuristic of the Xpress Optimizer. Finally, SMM shows worse running time performances than RM and SM (see Table 7) in all uniform data sets. This is in contrast to the average, maximum, and minimum gaps, which are always smaller than RM and SM. Once again, the anomalous behavior is due to the valid inequalities (34) and (35), which interfere both with the primal heuristic of the Xpress Optimizer and with the automatic branching strategy Nonuniform Data Sets. Brown and Harrower (2006) showed that on the nonuniform data sets the HybridIP solves the instances having lengths 10 and 30. Specifically, the instances having genotypes of length 10 are solved in 1 second, and ten out of the fifteen length 30 instances are solved in less than 15 seconds. However, the HybridIP was only able to solve six instances having genotype length of 50, although those instances were solved in 39 seconds or less. Again, the Hybrid was able to solve 5 of the 15 instances with a genotype length of 75, two of which were solved in less than one minute. Finally, the HybridIP is able to solve three instances having genotype length 100, two of which solved in under two minutes. Numerical experiments show that our models outperform HybridIP on the nonuniform data sets. In fact, as shown in Table 5, RM on average is able to solve instances of the data sets having genotype length of 10 in about seconds, with a maximum runtime of seconds; instances having genotype length of 30 in about seconds, with a maximum runtime of seconds; instances having genotype length of 50 in about seconds, with a maximum runtime of seconds; instances having genotype length of 75 in about seconds, with a maximum

A Class Representative Model for Pure Parsimony Haplotyping

A Class Representative Model for Pure Parsimony Haplotyping A Class Representative Model for Pure Parsimony Haplotyping Daniele Catanzaro, Alessandra Godi, and Martine Labbé June 5, 2008 Abstract Haplotyping estimation from aligned Single Nucleotide Polymorphism

More information

SAT in Bioinformatics: Making the Case with Haplotype Inference

SAT in Bioinformatics: Making the Case with Haplotype Inference SAT in Bioinformatics: Making the Case with Haplotype Inference Inês Lynce 1 and João Marques-Silva 2 1 IST/INESC-ID, Technical University of Lisbon, Portugal ines@sat.inesc-id.pt 2 School of Electronics

More information

The pure parsimony haplotyping problem: overview and computational advances

The pure parsimony haplotyping problem: overview and computational advances Intl. Trans. in Op. Res. 16 (2009) 561 584 DOI: 10.1111/j.1475-3995.2009.00716.x INTERNATIONAL TRANSACTIONS IN OPERATIONAL RESEARCH The pure parsimony haplotyping problem: overview and computational advances

More information

Efficient Haplotype Inference with Boolean Satisfiability

Efficient Haplotype Inference with Boolean Satisfiability Efficient Haplotype Inference with Boolean Satisfiability Joao Marques-Silva 1 and Ines Lynce 2 1 School of Electronics and Computer Science University of Southampton 2 INESC-ID/IST Technical University

More information

Haplotyping as Perfect Phylogeny: A direct approach

Haplotyping as Perfect Phylogeny: A direct approach Haplotyping as Perfect Phylogeny: A direct approach Vineet Bafna Dan Gusfield Giuseppe Lancia Shibu Yooseph February 7, 2003 Abstract A full Haplotype Map of the human genome will prove extremely valuable

More information

An Overview of Combinatorial Methods for Haplotype Inference

An Overview of Combinatorial Methods for Haplotype Inference An Overview of Combinatorial Methods for Haplotype Inference Dan Gusfield 1 Department of Computer Science, University of California, Davis Davis, CA. 95616 Abstract A current high-priority phase of human

More information

Allen Holder - Trinity University

Allen Holder - Trinity University Haplotyping - Trinity University Population Problems - joint with Courtney Davis, University of Utah Single Individuals - joint with John Louie, Carrol College, and Lena Sherbakov, Williams University

More information

On the Fixed Parameter Tractability and Approximability of the Minimum Error Correction problem

On the Fixed Parameter Tractability and Approximability of the Minimum Error Correction problem On the Fixed Parameter Tractability and Approximability of the Minimum Error Correction problem Paola Bonizzoni, Riccardo Dondi, Gunnar W. Klau, Yuri Pirola, Nadia Pisanti and Simone Zaccaria DISCo, computer

More information

Mathematical Approaches to the Pure Parsimony Problem

Mathematical Approaches to the Pure Parsimony Problem Mathematical Approaches to the Pure Parsimony Problem P. Blain a,, A. Holder b,, J. Silva c, and C. Vinzant d, July 29, 2005 Abstract Given the genetic information of a population, the Pure Parsimony problem

More information

Efficient Haplotype Inference with Answer Set Programming

Efficient Haplotype Inference with Answer Set Programming Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008) Efficient Haplotype Inference with Answer Set Programming Esra Erdem and Ferhan Türe Faculty of Engineering and Natural

More information

Integer Programming in Computational Biology. D. Gusfield University of California, Davis Presented December 12, 2016.!

Integer Programming in Computational Biology. D. Gusfield University of California, Davis Presented December 12, 2016.! Integer Programming in Computational Biology D. Gusfield University of California, Davis Presented December 12, 2016. There are many important phylogeny problems that depart from simple tree models: Missing

More information

Complexity and Approximation of the Minimum Recombination Haplotype Configuration Problem

Complexity and Approximation of the Minimum Recombination Haplotype Configuration Problem Complexity and Approximation of the Minimum Recombination Haplotype Configuration Problem Lan Liu 1, Xi Chen 3, Jing Xiao 3, and Tao Jiang 1,2 1 Department of Computer Science and Engineering, University

More information

The Pure Parsimony Problem

The Pure Parsimony Problem Haplotyping and Minimum Diversity Graphs Courtney Davis - University of Utah - Trinity University Some Genetics Mother Paired Gene Representation Physical Trait ABABBA AAABBB Physical Trait ABA AAA Mother

More information

Solution of Large-scale LP Problems Using MIP Solvers: Repeated Assignment Problem

Solution of Large-scale LP Problems Using MIP Solvers: Repeated Assignment Problem The Eighth International Symposium on Operations Research and Its Applications (ISORA 09) Zhangjiajie, China, September 20 22, 2009 Copyright 2009 ORSC & APORC, pp. 190 197 Solution of Large-scale LP Problems

More information

Integer Programming Formulations for the Minimum Weighted Maximal Matching Problem

Integer Programming Formulations for the Minimum Weighted Maximal Matching Problem Optimization Letters manuscript No. (will be inserted by the editor) Integer Programming Formulations for the Minimum Weighted Maximal Matching Problem Z. Caner Taşkın Tınaz Ekim Received: date / Accepted:

More information

Critical Reading of Optimization Methods for Logical Inference [1]

Critical Reading of Optimization Methods for Logical Inference [1] Critical Reading of Optimization Methods for Logical Inference [1] Undergraduate Research Internship Department of Management Sciences Fall 2007 Supervisor: Dr. Miguel Anjos UNIVERSITY OF WATERLOO Rajesh

More information

Learning ancestral genetic processes using nonparametric Bayesian models

Learning ancestral genetic processes using nonparametric Bayesian models Learning ancestral genetic processes using nonparametric Bayesian models Kyung-Ah Sohn October 31, 2011 Committee Members: Eric P. Xing, Chair Zoubin Ghahramani Russell Schwartz Kathryn Roeder Matthew

More information

Claw-free Graphs. III. Sparse decomposition

Claw-free Graphs. III. Sparse decomposition Claw-free Graphs. III. Sparse decomposition Maria Chudnovsky 1 and Paul Seymour Princeton University, Princeton NJ 08544 October 14, 003; revised May 8, 004 1 This research was conducted while the author

More information

Haplotype Inference Constrained by Plausible Haplotype Data

Haplotype Inference Constrained by Plausible Haplotype Data Haplotype Inference Constrained by Plausible Haplotype Data Michael R. Fellows 1, Tzvika Hartman 2, Danny Hermelin 3, Gad M. Landau 3,4, Frances Rosamond 1, and Liat Rozenberg 3 1 The University of Newcastle,

More information

Phylogenetic Networks, Trees, and Clusters

Phylogenetic Networks, Trees, and Clusters Phylogenetic Networks, Trees, and Clusters Luay Nakhleh 1 and Li-San Wang 2 1 Department of Computer Science Rice University Houston, TX 77005, USA nakhleh@cs.rice.edu 2 Department of Biology University

More information

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics.

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics. Evolutionary Genetics (for Encyclopedia of Biodiversity) Sergey Gavrilets Departments of Ecology and Evolutionary Biology and Mathematics, University of Tennessee, Knoxville, TN 37996-6 USA Evolutionary

More information

Mathematical models in population genetics II

Mathematical models in population genetics II Mathematical models in population genetics II Anand Bhaskar Evolutionary Biology and Theory of Computing Bootcamp January 1, 014 Quick recap Large discrete-time randomly mating Wright-Fisher population

More information

Introduction to population genetics & evolution

Introduction to population genetics & evolution Introduction to population genetics & evolution Course Organization Exam dates: Feb 19 March 1st Has everybody registered? Did you get the email with the exam schedule Summer seminar: Hot topics in Bioinformatics

More information

Technische Universität Dresden Institute of Numerical Mathematics

Technische Universität Dresden Institute of Numerical Mathematics Technische Universität Dresden Institute of Numerical Mathematics An Improved Flow-based Formulation and Reduction Principles for the Minimum Connectivity Inference Problem Muhammad Abid Dar Andreas Fischer

More information

The Minimum k-colored Subgraph Problem in Haplotyping and DNA Primer Selection

The Minimum k-colored Subgraph Problem in Haplotyping and DNA Primer Selection The Minimum k-colored Subgraph Problem in Haplotyping and DNA Primer Selection M.T. Hajiaghayi K. Jain K. Konwar L.C. Lau I.I. Măndoiu A. Russell A. Shvartsman V.V. Vazirani Abstract In this paper we consider

More information

An Exact Algorithm for the Steiner Tree Problem with Delays

An Exact Algorithm for the Steiner Tree Problem with Delays Electronic Notes in Discrete Mathematics 36 (2010) 223 230 www.elsevier.com/locate/endm An Exact Algorithm for the Steiner Tree Problem with Delays Valeria Leggieri 1 Dipartimento di Matematica, Università

More information

Tree-average distances on certain phylogenetic networks have their weights uniquely determined

Tree-average distances on certain phylogenetic networks have their weights uniquely determined Tree-average distances on certain phylogenetic networks have their weights uniquely determined Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA swillson@iastate.edu

More information

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

More information

Humans have two copies of each chromosome. Inherited from mother and father. Genotyping technologies do not maintain the phase

Humans have two copies of each chromosome. Inherited from mother and father. Genotyping technologies do not maintain the phase Humans have two copies of each chromosome Inherited from mother and father. Genotyping technologies do not maintain the phase Genotyping technologies do not maintain the phase Recall that proximal SNPs

More information

via Tandem Mass Spectrometry and Propositional Satisfiability De Novo Peptide Sequencing Renato Bruni University of Perugia

via Tandem Mass Spectrometry and Propositional Satisfiability De Novo Peptide Sequencing Renato Bruni University of Perugia De Novo Peptide Sequencing via Tandem Mass Spectrometry and Propositional Satisfiability Renato Bruni bruni@diei.unipg.it or bruni@dis.uniroma1.it University of Perugia I FIMA International Conference

More information

On the Complexity of SNP Block Partitioning Under the Perfect Phylogeny Model

On the Complexity of SNP Block Partitioning Under the Perfect Phylogeny Model On the Complexity of SNP Block Partitioning Under the Perfect Phylogeny Model Jens Gramm Wilhelm-Schickard-Institut für Informatik, Universität Tübingen, Germany. Tzvika Hartman Dept. of Computer Science,

More information

Reconstruction of certain phylogenetic networks from their tree-average distances

Reconstruction of certain phylogenetic networks from their tree-average distances Reconstruction of certain phylogenetic networks from their tree-average distances Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA swillson@iastate.edu October 10,

More information

Improvements to Core-Guided Binary Search for MaxSAT

Improvements to Core-Guided Binary Search for MaxSAT Improvements to Core-Guided Binary Search for MaxSAT A.Morgado F.Heras J.Marques-Silva CASL/CSI, University College Dublin Dublin, Ireland SAT, June 2012 Outline Motivation Previous Algorithms - Overview

More information

A Survey of Computational Methods for Determining Haplotypes

A Survey of Computational Methods for Determining Haplotypes A Survey of Computational Methods for Determining Haplotypes Bjarni V. Halldórsson, Vineet Bafna, Nathan Edwards, Ross Lippert, Shibu Yooseph, and Sorin Istrail Informatics Research, Celera Genomics/Applied

More information

Estimating Recombination Rates. LRH selection test, and recombination

Estimating Recombination Rates. LRH selection test, and recombination Estimating Recombination Rates LRH selection test, and recombination Recall that LRH tests for selection by looking at frequencies of specific haplotypes. Clearly the test is dependent on the recombination

More information

A Phylogenetic Network Construction due to Constrained Recombination

A Phylogenetic Network Construction due to Constrained Recombination A Phylogenetic Network Construction due to Constrained Recombination Mohd. Abdul Hai Zahid Research Scholar Research Supervisors: Dr. R.C. Joshi Dr. Ankush Mittal Department of Electronics and Computer

More information

SINGLE nucleotide polymorphisms (SNPs) are differences

SINGLE nucleotide polymorphisms (SNPs) are differences IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 3, NO. 3, JULY-SEPTEMBER 2006 303 Islands of Tractability for Parsimony Haplotyping Roded Sharan, Bjarni V. Halldórsson, and Sorin

More information

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics:

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics: Homework Assignment, Evolutionary Systems Biology, Spring 2009. Homework Part I: Phylogenetics: Introduction. The objective of this assignment is to understand the basics of phylogenetic relationships

More information

Properties of normal phylogenetic networks

Properties of normal phylogenetic networks Properties of normal phylogenetic networks Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA swillson@iastate.edu August 13, 2009 Abstract. A phylogenetic network is

More information

OPTIMIZATION TECHNIQUES FOR EDIT VALIDATION AND DATA IMPUTATION

OPTIMIZATION TECHNIQUES FOR EDIT VALIDATION AND DATA IMPUTATION Proceedings of Statistics Canada Symposium 2001 Achieving Data Quality in a Statistical Agency: a Methodological Perspective OPTIMIZATION TECHNIQUES FOR EDIT VALIDATION AND DATA IMPUTATION Renato Bruni

More information

An Application of Integer Linear Programming to Haplotyping Inference by Parsimony Problem

An Application of Integer Linear Programming to Haplotyping Inference by Parsimony Problem Università degli Studi Roma Tre Dottorato di Ricerca in Informatica e Automazione XVIII Ciclo 2005 An Application of Integer Linear Programming to Haplotyping Inference by Parsimony Problem Alessandra

More information

THE EXISTENCE AND USEFULNESS OF EQUALITY CUTS IN THE MULTI-DEMAND MULTIDIMENSIONAL KNAPSACK PROBLEM LEVI DELISSA. B.S., Kansas State University, 2014

THE EXISTENCE AND USEFULNESS OF EQUALITY CUTS IN THE MULTI-DEMAND MULTIDIMENSIONAL KNAPSACK PROBLEM LEVI DELISSA. B.S., Kansas State University, 2014 THE EXISTENCE AND USEFULNESS OF EQUALITY CUTS IN THE MULTI-DEMAND MULTIDIMENSIONAL KNAPSACK PROBLEM by LEVI DELISSA B.S., Kansas State University, 2014 A THESIS submitted in partial fulfillment of the

More information

Introduction to Bin Packing Problems

Introduction to Bin Packing Problems Introduction to Bin Packing Problems Fabio Furini March 13, 2015 Outline Origins and applications Applications: Definition: Bin Packing Problem (BPP) Solution techniques for the BPP Heuristic Algorithms

More information

Chapter 13 Meiosis and Sexual Reproduction

Chapter 13 Meiosis and Sexual Reproduction Biology 110 Sec. 11 J. Greg Doheny Chapter 13 Meiosis and Sexual Reproduction Quiz Questions: 1. What word do you use to describe a chromosome or gene allele that we inherit from our Mother? From our Father?

More information

Phylogenetic Networks with Recombination

Phylogenetic Networks with Recombination Phylogenetic Networks with Recombination October 17 2012 Recombination All DNA is recombinant DNA... [The] natural process of recombination and mutation have acted throughout evolution... Genetic exchange

More information

Scheduling a conference to minimize attendee preference conflicts

Scheduling a conference to minimize attendee preference conflicts MISTA 2015 Scheduling a conference to minimize attendee preference conflicts Jeffrey Quesnelle Daniel Steffy Abstract This paper describes a conference scheduling (or timetabling) problem where, at the

More information

IMA Preprint Series # 2176

IMA Preprint Series # 2176 SOLVING LARGE DOUBLE DIGESTION PROBLEMS FOR DNA RESTRICTION MAPPING BY USING BRANCH-AND-BOUND INTEGER LINEAR PROGRAMMING By Zhijun Wu and Yin Zhang IMA Preprint Series # 2176 ( October 2007 ) INSTITUTE

More information

This is a survey designed for mathematical programming people who do not know molecular biology and

This is a survey designed for mathematical programming people who do not know molecular biology and INFORMS Journal on Computing Vol. 16, No. 3, Summer 2004, pp. 211 231 issn 0899-1499 eissn 1526-5528 04 1603 0211 informs doi 10.1287/ijoc.1040.0073 2004 INFORMS Opportunities for Combinatorial Optimization

More information

The Chromatic Number of Ordered Graphs With Constrained Conflict Graphs

The Chromatic Number of Ordered Graphs With Constrained Conflict Graphs The Chromatic Number of Ordered Graphs With Constrained Conflict Graphs Maria Axenovich and Jonathan Rollin and Torsten Ueckerdt September 3, 016 Abstract An ordered graph G is a graph whose vertex set

More information

arxiv: v3 [cs.ds] 24 Jul 2018

arxiv: v3 [cs.ds] 24 Jul 2018 New Algorithms for Weighted k-domination and Total k-domination Problems in Proper Interval Graphs Nina Chiarelli 1,2, Tatiana Romina Hartinger 1,2, Valeria Alejandra Leoni 3,4, Maria Inés Lopez Pujato

More information

arxiv: v1 [cs.cc] 5 Dec 2018

arxiv: v1 [cs.cc] 5 Dec 2018 Consistency for 0 1 Programming Danial Davarnia 1 and J. N. Hooker 2 1 Iowa state University davarnia@iastate.edu 2 Carnegie Mellon University jh38@andrew.cmu.edu arxiv:1812.02215v1 [cs.cc] 5 Dec 2018

More information

CS 6820 Fall 2014 Lectures, October 3-20, 2014

CS 6820 Fall 2014 Lectures, October 3-20, 2014 Analysis of Algorithms Linear Programming Notes CS 6820 Fall 2014 Lectures, October 3-20, 2014 1 Linear programming The linear programming (LP) problem is the following optimization problem. We are given

More information

SEQUENTIAL AND SIMULTANEOUS LIFTING IN THE NODE PACKING POLYHEDRON JEFFREY WILLIAM PAVELKA. B.S., Kansas State University, 2011

SEQUENTIAL AND SIMULTANEOUS LIFTING IN THE NODE PACKING POLYHEDRON JEFFREY WILLIAM PAVELKA. B.S., Kansas State University, 2011 SEQUENTIAL AND SIMULTANEOUS LIFTING IN THE NODE PACKING POLYHEDRON by JEFFREY WILLIAM PAVELKA B.S., Kansas State University, 2011 A THESIS Submitted in partial fulfillment of the requirements for the degree

More information

Exhaustive search. CS 466 Saurabh Sinha

Exhaustive search. CS 466 Saurabh Sinha Exhaustive search CS 466 Saurabh Sinha Agenda Two different problems Restriction mapping Motif finding Common theme: exhaustive search of solution space Reading: Chapter 4. Restriction Mapping Restriction

More information

AUTHORIZATION TO LEND AND REPRODUCE THE THESIS. Date Jong Wha Joanne Joo, Author

AUTHORIZATION TO LEND AND REPRODUCE THE THESIS. Date Jong Wha Joanne Joo, Author AUTHORIZATION TO LEND AND REPRODUCE THE THESIS As the sole author of this thesis, I authorize Brown University to lend it to other institutions or individuals for the purpose of scholarly research. Date

More information

DISTINGUISHING PARTITIONS AND ASYMMETRIC UNIFORM HYPERGRAPHS

DISTINGUISHING PARTITIONS AND ASYMMETRIC UNIFORM HYPERGRAPHS DISTINGUISHING PARTITIONS AND ASYMMETRIC UNIFORM HYPERGRAPHS M. N. ELLINGHAM AND JUSTIN Z. SCHROEDER In memory of Mike Albertson. Abstract. A distinguishing partition for an action of a group Γ on a set

More information

Research Reports on Mathematical and Computing Sciences

Research Reports on Mathematical and Computing Sciences ISSN 1342-284 Research Reports on Mathematical and Computing Sciences Exploiting Sparsity in Linear and Nonlinear Matrix Inequalities via Positive Semidefinite Matrix Completion Sunyoung Kim, Masakazu

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

Population Genetics I. Bio

Population Genetics I. Bio Population Genetics I. Bio5488-2018 Don Conrad dconrad@genetics.wustl.edu Why study population genetics? Functional Inference Demographic inference: History of mankind is written in our DNA. We can learn

More information

Integer Programming for Phylogenetic Network Problems

Integer Programming for Phylogenetic Network Problems Integer Programming for Phylogenetic Network Problems D. Gusfield University of California, Davis Presented at the National University of Singapore, July 27, 2015.! There are many important phylogeny problems

More information

K 4 -free graphs with no odd holes

K 4 -free graphs with no odd holes K 4 -free graphs with no odd holes Maria Chudnovsky 1 Columbia University, New York NY 10027 Neil Robertson 2 Ohio State University, Columbus, Ohio 43210 Paul Seymour 3 Princeton University, Princeton

More information

p(d g A,g B )p(g B ), g B

p(d g A,g B )p(g B ), g B Supplementary Note Marginal effects for two-locus models Here we derive the marginal effect size of the three models given in Figure 1 of the main text. For each model we assume the two loci (A and B)

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Computational Complexity and Genetic Algorithms

Computational Complexity and Genetic Algorithms Computational Complexity and Genetic Algorithms BART RYLANDER JAMES FOSTER School of Engineering Department of Computer Science University of Portland University of Idaho Portland, Or 97203 Moscow, Idaho

More information

Inference of A Minimum Size Boolean Function by Using A New Efficient Branch-and-Bound Approach From Examples

Inference of A Minimum Size Boolean Function by Using A New Efficient Branch-and-Bound Approach From Examples Published in: Journal of Global Optimization, 5, pp. 69-9, 199. Inference of A Minimum Size Boolean Function by Using A New Efficient Branch-and-Bound Approach From Examples Evangelos Triantaphyllou Assistant

More information

On the Approximate Linear Programming Approach for Network Revenue Management Problems

On the Approximate Linear Programming Approach for Network Revenue Management Problems On the Approximate Linear Programming Approach for Network Revenue Management Problems Chaoxu Tong School of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853,

More information

Complexity Theory VU , SS The Polynomial Hierarchy. Reinhard Pichler

Complexity Theory VU , SS The Polynomial Hierarchy. Reinhard Pichler Complexity Theory Complexity Theory VU 181.142, SS 2018 6. The Polynomial Hierarchy Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität Wien 15 May, 2018 Reinhard

More information

Outline. Complexity Theory EXACT TSP. The Class DP. Definition. Problem EXACT TSP. Complexity of EXACT TSP. Proposition VU 181.

Outline. Complexity Theory EXACT TSP. The Class DP. Definition. Problem EXACT TSP. Complexity of EXACT TSP. Proposition VU 181. Complexity Theory Complexity Theory Outline Complexity Theory VU 181.142, SS 2018 6. The Polynomial Hierarchy Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität

More information

Classical Complexity and Fixed-Parameter Tractability of Simultaneous Consecutive Ones Submatrix & Editing Problems

Classical Complexity and Fixed-Parameter Tractability of Simultaneous Consecutive Ones Submatrix & Editing Problems Classical Complexity and Fixed-Parameter Tractability of Simultaneous Consecutive Ones Submatrix & Editing Problems Rani M. R, Mohith Jagalmohanan, R. Subashini Binary matrices having simultaneous consecutive

More information

Introduction to Complexity Theory

Introduction to Complexity Theory Introduction to Complexity Theory Read K & S Chapter 6. Most computational problems you will face your life are solvable (decidable). We have yet to address whether a problem is easy or hard. Complexity

More information

Monotone Submodular Maximization over a Matroid

Monotone Submodular Maximization over a Matroid Monotone Submodular Maximization over a Matroid Yuval Filmus January 31, 2013 Abstract In this talk, we survey some recent results on monotone submodular maximization over a matroid. The survey does not

More information

Calculation of IBD probabilities

Calculation of IBD probabilities Calculation of IBD probabilities David Evans and Stacey Cherny University of Oxford Wellcome Trust Centre for Human Genetics This Session IBD vs IBS Why is IBD important? Calculating IBD probabilities

More information

Detecting Backdoor Sets with Respect to Horn and Binary Clauses

Detecting Backdoor Sets with Respect to Horn and Binary Clauses Detecting Backdoor Sets with Respect to Horn and Binary Clauses Naomi Nishimura 1,, Prabhakar Ragde 1,, and Stefan Szeider 2, 1 School of Computer Science, University of Waterloo, Waterloo, Ontario, N2L

More information

Scheduling with Advanced Process Control Constraints

Scheduling with Advanced Process Control Constraints Scheduling with Advanced Process Control Constraints Yiwei Cai, Erhan Kutanoglu, John Hasenbein, Joe Qin July 2, 2009 Abstract With increasing worldwide competition, high technology manufacturing companies

More information

Processes of Evolution

Processes of Evolution 15 Processes of Evolution Forces of Evolution Concept 15.4 Selection Can Be Stabilizing, Directional, or Disruptive Natural selection can act on quantitative traits in three ways: Stabilizing selection

More information

Fundamental Domains for Integer Programs with Symmetries

Fundamental Domains for Integer Programs with Symmetries Fundamental Domains for Integer Programs with Symmetries Eric J. Friedman Cornell University, Ithaca, NY 14850, ejf27@cornell.edu, WWW home page: http://www.people.cornell.edu/pages/ejf27/ Abstract. We

More information

CS 583: Approximation Algorithms: Introduction

CS 583: Approximation Algorithms: Introduction CS 583: Approximation Algorithms: Introduction Chandra Chekuri January 15, 2018 1 Introduction Course Objectives 1. To appreciate that not all intractable problems are the same. NP optimization problems,

More information

On the Tightness of an LP Relaxation for Rational Optimization and its Applications

On the Tightness of an LP Relaxation for Rational Optimization and its Applications OPERATIONS RESEARCH Vol. 00, No. 0, Xxxxx 0000, pp. 000 000 issn 0030-364X eissn 526-5463 00 0000 000 INFORMS doi 0.287/xxxx.0000.0000 c 0000 INFORMS Authors are encouraged to submit new papers to INFORMS

More information

Lectures 6, 7 and part of 8

Lectures 6, 7 and part of 8 Lectures 6, 7 and part of 8 Uriel Feige April 26, May 3, May 10, 2015 1 Linear programming duality 1.1 The diet problem revisited Recall the diet problem from Lecture 1. There are n foods, m nutrients,

More information

Shorelines of islands of tractability: Algorithms for parsimony and minimum perfect phylogeny haplotyping problems

Shorelines of islands of tractability: Algorithms for parsimony and minimum perfect phylogeny haplotyping problems 1 arxiv:q-bio/0605024v3 [q-bio.ot] 12 Jan 2007 Shorelines of islands of tractability: Algorithms for parsimony and minimum perfect phylogeny haplotyping problems Leo van Iersel, Judith Keijsper, Steven

More information

Efficient Approximation for Restricted Biclique Cover Problems

Efficient Approximation for Restricted Biclique Cover Problems algorithms Article Efficient Approximation for Restricted Biclique Cover Problems Alessandro Epasto 1, *, and Eli Upfal 2 ID 1 Google Research, New York, NY 10011, USA 2 Department of Computer Science,

More information

Minimum Linear Arrangements

Minimum Linear Arrangements Minimum Linear Arrangements Rafael Andrade, Tibérius Bonates, Manoel Câmpelo, Mardson Ferreira ParGO - Research team in Parallel computing, Graph theory and Optimization Department of Statistics and Applied

More information

On Dominator Colorings in Graphs

On Dominator Colorings in Graphs On Dominator Colorings in Graphs Ralucca Michelle Gera Department of Applied Mathematics Naval Postgraduate School Monterey, CA 994, USA ABSTRACT Given a graph G, the dominator coloring problem seeks a

More information

Travelling Salesman Problem

Travelling Salesman Problem Travelling Salesman Problem Fabio Furini November 10th, 2014 Travelling Salesman Problem 1 Outline 1 Traveling Salesman Problem Separation Travelling Salesman Problem 2 (Asymmetric) Traveling Salesman

More information

On Two Class-Constrained Versions of the Multiple Knapsack Problem

On Two Class-Constrained Versions of the Multiple Knapsack Problem On Two Class-Constrained Versions of the Multiple Knapsack Problem Hadas Shachnai Tami Tamir Department of Computer Science The Technion, Haifa 32000, Israel Abstract We study two variants of the classic

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

On the Power of Robust Solutions in Two-Stage Stochastic and Adaptive Optimization Problems

On the Power of Robust Solutions in Two-Stage Stochastic and Adaptive Optimization Problems MATHEMATICS OF OPERATIONS RESEARCH Vol. 35, No., May 010, pp. 84 305 issn 0364-765X eissn 156-5471 10 350 084 informs doi 10.187/moor.1090.0440 010 INFORMS On the Power of Robust Solutions in Two-Stage

More information

BBM402-Lecture 20: LP Duality

BBM402-Lecture 20: LP Duality BBM402-Lecture 20: LP Duality Lecturer: Lale Özkahya Resources for the presentation: https://courses.engr.illinois.edu/cs473/fa2016/lectures.html An easy LP? which is compact form for max cx subject to

More information

An Optimization-Based Heuristic for the Split Delivery Vehicle Routing Problem

An Optimization-Based Heuristic for the Split Delivery Vehicle Routing Problem An Optimization-Based Heuristic for the Split Delivery Vehicle Routing Problem Claudia Archetti (1) Martin W.P. Savelsbergh (2) M. Grazia Speranza (1) (1) University of Brescia, Department of Quantitative

More information

The Complexity of Maximum. Matroid-Greedoid Intersection and. Weighted Greedoid Maximization

The Complexity of Maximum. Matroid-Greedoid Intersection and. Weighted Greedoid Maximization Department of Computer Science Series of Publications C Report C-2004-2 The Complexity of Maximum Matroid-Greedoid Intersection and Weighted Greedoid Maximization Taneli Mielikäinen Esko Ukkonen University

More information

Essential facts about NP-completeness:

Essential facts about NP-completeness: CMPSCI611: NP Completeness Lecture 17 Essential facts about NP-completeness: Any NP-complete problem can be solved by a simple, but exponentially slow algorithm. We don t have polynomial-time solutions

More information

IE418 Integer Programming

IE418 Integer Programming IE418: Integer Programming Department of Industrial and Systems Engineering Lehigh University 2nd February 2005 Boring Stuff Extra Linux Class: 8AM 11AM, Wednesday February 9. Room??? Accounts and Passwords

More information

Lecture 4: NP and computational intractability

Lecture 4: NP and computational intractability Chapter 4 Lecture 4: NP and computational intractability Listen to: Find the longest path, Daniel Barret What do we do today: polynomial time reduction NP, co-np and NP complete problems some examples

More information

Solving the Order-Preserving Submatrix Problem via Integer Programming

Solving the Order-Preserving Submatrix Problem via Integer Programming Solving the Order-Preserving Submatrix Problem via Integer Programming Andrew C. Trapp, Oleg A. Prokopyev Department of Industrial Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261,

More information

arxiv: v5 [q-bio.pe] 24 Oct 2016

arxiv: v5 [q-bio.pe] 24 Oct 2016 On the Quirks of Maximum Parsimony and Likelihood on Phylogenetic Networks Christopher Bryant a, Mareike Fischer b, Simone Linz c, Charles Semple d arxiv:1505.06898v5 [q-bio.pe] 24 Oct 2016 a Statistics

More information

Q Expected Coverage Achievement Merit Excellence. Punnett square completed with correct gametes and F2.

Q Expected Coverage Achievement Merit Excellence. Punnett square completed with correct gametes and F2. NCEA Level 2 Biology (91157) 2018 page 1 of 6 Assessment Schedule 2018 Biology: Demonstrate understanding of genetic variation and change (91157) Evidence Q Expected Coverage Achievement Merit Excellence

More information

Some hard families of parameterised counting problems

Some hard families of parameterised counting problems Some hard families of parameterised counting problems Mark Jerrum and Kitty Meeks School of Mathematical Sciences, Queen Mary University of London {m.jerrum,k.meeks}@qmul.ac.uk September 2014 Abstract

More information

Revisiting the Hamiltonian p-median problem: a new formulation on directed graphs and a branch-and-cut algorithm

Revisiting the Hamiltonian p-median problem: a new formulation on directed graphs and a branch-and-cut algorithm Revisiting the Hamiltonian p-median problem: a new formulation on directed graphs and a branch-and-cut algorithm Tolga Bektaş 1, Luís Gouveia 2, Daniel Santos 2 1 Centre for Operational Research, Management

More information

Integer Linear Programming Models for 2-staged Two-Dimensional Knapsack Problems. Andrea Lodi, Michele Monaci

Integer Linear Programming Models for 2-staged Two-Dimensional Knapsack Problems. Andrea Lodi, Michele Monaci Integer Linear Programming Models for 2-staged Two-Dimensional Knapsack Problems Andrea Lodi, Michele Monaci Dipartimento di Elettronica, Informatica e Sistemistica, University of Bologna Viale Risorgimento,

More information