Dictionary Matching in Elastic-Degenerate Texts with Applications in Searching VCF Files On-line
1 Dictionary Matching in Elastic-Degenerate Texts with Applications in Searching VCF Files On-line
MatBio 18, Solon P. Pissis and Ahmad Retha, King's College London, 02-Aug-2018
2 Contents
Pan-genomes and data structures.
Elastic-degenerate strings and other definitions.
EDSM vs MEDSM searches.
Multi-EDSM search example.
Experiments.
3 Introduction
A pan-genome is a set of genomes from the same species or from closely related species. Various data structures have been proposed for storing pan-genomes:
De Bruijn graphs [PanGenome Consortium, Brief. Bioinform., 2016]
Variation graphs [J. Sirén, ALENEX, 2017]
Bloom Filter Trie [Holley et al., Algorithms Mol. Biol., 2016]
Suffix-tree-based structures [Baier et al., Bioinformatics, 2016]
Elastic-degenerate strings [Huang et al., Bioinformatics, 2013]
All but the last are off-line solutions.
4 Elastic-Degenerate Strings
Consider the following multiple sequence alignment (MSA) of three closely related sequences:
ATGAAGGGTA--TTTTA
ATGAAGGGTATATTTTA
ATGATGG----TTTTA
Its compacted representation as an ED string T of length n and total size N is:
T = ATGA A G T GG TA TATA ε TTTTA
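The slide shows the compaction only by example. One simple way to derive an ED string from an MSA is sketched below, under our own assumptions: a column is conserved when every row holds the same non-gap letter and becomes a deterministic position, while each maximal run of other columns becomes one degenerate position holding the distinct gap-free row substrings, with "" standing for ε. The function name `msa_to_ed` is hypothetical, and the slide's own grouping of variable regions may differ.

```python
from typing import List


def msa_to_ed(rows: List[str], gap: str = "-") -> List[List[str]]:
    """Compact an MSA into an ED string: one list of alternatives per position."""
    width = len(rows[0])
    assert all(len(r) == width for r in rows), "MSA rows must have equal length"

    def conserved(col: int) -> bool:
        letters = {r[col] for r in rows}
        return len(letters) == 1 and gap not in letters

    ed: List[List[str]] = []
    col = 0
    while col < width:
        if conserved(col):
            ed.append([rows[0][col]])  # deterministic position
            col += 1
            continue
        start = col
        while col < width and not conserved(col):
            col += 1                   # extend the variable run
        variants = {r[start:col].replace(gap, "") for r in rows}
        if variants != {""}:           # skip runs made only of gaps
            ed.append(sorted(variants))  # degenerate position ("" is epsilon)
    return ed
```

For the hypothetical alignment `["ACGT-A", "ACCTTA", "ACGT-A"]` this yields `[["A"], ["C"], ["C", "G"], ["T"], ["", "T"], ["A"]]`, that is, AC{C, G}T{ε, T}A.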
5 Definitions
An elastic-degenerate (ED) string consists of deterministic and non-deterministic (degenerate) positions or segments.
The alphabet has fixed size, \(|\Sigma| = O(1)\); here \(\Sigma = \{A, C, G, T\}\). In addition, ε represents a deleted segment.
An ED string has length n, where each character of a deterministic segment and each degenerate position counts as one position.
The total size N of an ED string X, counting \(|\varepsilon| = 1\), is defined by:
\[ N = \sum_{i=0}^{n-1} \sum_{j=0}^{|X[i]|-1} |X[i][j]|. \]
Problem EDSM: report the ending position i, \(0 \le i < n\), of every match of a pattern p in the ED string.
Problem MEDSM: given a set P of patterns, report \(\langle i, j \rangle\) whenever pattern \(P_j\), \(0 \le j < |P|\), has a match ending at position i.
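The two measures n and N can be computed directly from the definition. This minimal sketch assumes a representation (ours, not the authors') in which an ED string is a list of positions, each position a list of alternative strings, each deterministic letter its own one-element position, and "" standing for ε:

```python
from typing import List

EDString = List[List[str]]  # one list of alternatives per position; "" is epsilon


def ed_length(x: EDString) -> int:
    """n: each deterministic letter and each degenerate position counts once."""
    return len(x)


def ed_size(x: EDString) -> int:
    """N: sum of |X[i][j]| over all positions and alternatives, with |eps| = 1."""
    return sum(max(len(s), 1) for position in x for s in position)
```

For the hypothetical ED string AC{A, GT}T{TA, ε}G, that is `[["A"], ["C"], ["A", "GT"], ["T"], ["TA", ""], ["G"]]`, the length is n = 6 and the total size is N = 1 + 1 + 3 + 1 + 3 + 1 = 10.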
6 Variant Call Format (VCF)
1000 Genomes Project, Phase 3: 2504 individuals, with many SNPs and other variants.
VCF files record the position of the variants of every sample relative to the reference genome:
#CHROM POS ID REF ALT HG00096 HG00097 HG00143
rs T
When these variants are applied to the reference TTTGTTAT, the samples' two alleles are:
          allele 0    allele 1
HG00096   TTTGTTAT    TTTGTTATT
HG00097   TTTGTTATT   TTTGTTATT
HG00143   TTTGTTAT    TTTGTTAT
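The VCF record on the slide survives only in fragments. As a hypothetical reconstruction that is consistent with the allele table, a single insertion with REF = T and ALT = TT at 1-based POS 8, with phased genotypes 0|1, 1|1 and 0|0, produces exactly those sequences. The sketch below applies such a record and also turns it into an ED string with one degenerate position {REF, ALT}; all function names are ours, and multi-allelic sites, unphased (/) genotypes and overlapping variants are deliberately not handled.

```python
from typing import Dict, List, Tuple


def apply_variant(ref: str, pos: int, ref_allele: str, alt: str, allele: int) -> str:
    """Return the sequence for one allele index (0 = REF, 1 = ALT); POS is 1-based."""
    i = pos - 1
    assert ref[i:i + len(ref_allele)] == ref_allele, "REF does not match reference"
    if allele == 0:
        return ref
    return ref[:i] + alt + ref[i + len(ref_allele):]


def haplotypes(ref: str, pos: int, ref_allele: str, alt: str,
               genotypes: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Expand phased genotypes such as '0|1' into both allele sequences."""
    out: Dict[str, Tuple[str, str]] = {}
    for sample, gt in genotypes.items():
        a0, a1 = (int(g) for g in gt.split("|"))
        out[sample] = (apply_variant(ref, pos, ref_allele, alt, a0),
                       apply_variant(ref, pos, ref_allele, alt, a1))
    return out


def variant_to_ed(ref: str, pos: int, ref_allele: str, alt: str) -> List[List[str]]:
    """ED string: one degenerate position {REF, ALT}, every other letter deterministic."""
    i = pos - 1
    return ([[c] for c in ref[:i]]
            + [sorted({ref_allele, alt})]
            + [[c] for c in ref[i + len(ref_allele):]])
```

Representing the variant as a degenerate position, rather than materialising every sample's sequence, is what lets an ED-string matcher scan all samples at once.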
7 EDSM Example
Problem EDSM: let T be an ED string in which we want to find p = AAA. We find a match ending at position i = 2:
A T = A AA ε A
and another match ending at position i = 4:
T = A A AA ε A
8 MEDSM Example
Problem MEDSM: let T be an ED string in which we want to find P = AA, AA.
T = A A AA ε A
Our matches are reported as pairs (position i, pattern j).
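To pin down the EDSM and MEDSM semantics, here is a deliberately naive baseline; it is not the Multi-EDSM algorithm. It keeps the set of pattern-prefix lengths that are still open at each segment boundary, lets ε pass them through unchanged, and reports position i whenever an occurrence ends inside some string of \(T[i]\) (we assume, as in the examples, that a match is reported at the position holding its last letter). The list-of-alternatives representation and the function names are our assumptions.

```python
from typing import List, Set, Tuple

EDString = List[List[str]]  # one list of alternatives per position; "" is epsilon


def edsm_naive(x: EDString, p: str) -> List[int]:
    """Report every i such that an occurrence of p ends inside T[i]."""
    m = len(p)
    assert m > 0, "pattern must be non-empty"
    active: Set[int] = set()  # lengths k, 1 <= k < m, of prefixes open at the boundary
    hits: List[int] = []
    for i, position in enumerate(x):
        nxt: Set[int] = set()
        found = False
        for s in position:
            if s == "":             # epsilon: open prefixes pass through unchanged
                nxt |= active
                continue
            # continue each open prefix at offset 0, or start fresh at any offset
            starts = [(k, 0) for k in active] + [(0, o) for o in range(len(s))]
            for k, o in starts:
                t = s[o:]
                take = min(m - k, len(t))
                if p[k:k + take] != t[:take]:
                    continue
                if k + take == m:
                    found = True    # the occurrence ends inside s
                else:
                    nxt.add(k + take)  # s is exhausted, the prefix stays open
        if found:
            hits.append(i)
        active = nxt
    return hits


def medsm_naive(x: EDString, patterns: List[str]) -> List[Tuple[int, int]]:
    """Report every pair (i, j) such that patterns[j] ends at position i."""
    return sorted((i, j) for j, p in enumerate(patterns) for i in edsm_naive(x, p))
```

On the hypothetical ED string AC{A, GT}T{TA, ε}G, the pattern TG is found only through the ε branch, ending at position 5, so `medsm_naive` reports `[(2, 0), (5, 1)]` for the patterns CA and TG.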
9 Algorithm Multi-EDSM
Three (or four) steps are performed at each position \(T[i]\):
1. Memorize the positions of pattern prefixes.
2. Check whether pattern prefixes can be extended.
3. Check whether a pattern suffix ends here, and report it.
4. If \(|T[i][j]| \ge m_{\min}\), where \(m_{\min}\) is the length of the shortest pattern, do full pattern matching and report any occurrence found.
Preprocessing stores the letter positions of P = AA, AA in one bitvector \(I[c]\) per letter \(c \in \{A, C, G, T\}\).
This takes \(O(M + \sigma\lceil M/w\rceil)\) time and \(O(\sigma\lceil M/w\rceil)\) space, where w is the computer word size, usually 64.
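On a plain (non-degenerate) text, steps 1 to 3 reduce to the classic multi-pattern Shift-And update over the letter bitvectors \(I\). The following minimal sketch assumes non-empty patterns concatenated without separators; Python integers stand in for arrays of \(\lceil M/w\rceil\) machine words, and the names `preprocess` and `shift_and` are hypothetical rather than taken from the authors' code.

```python
from typing import Dict, List, Tuple


def preprocess(patterns: List[str]) -> Tuple[Dict[str, int], int, Dict[int, int]]:
    """Build I[c], a mask of pattern starts, and a map from end bit to pattern index.

    Bit q of I[c] is set when the concatenation P_0 P_1 ... has letter c at q.
    """
    I: Dict[str, int] = {}
    starts, ends, q = 0, {}, 0
    for j, p in enumerate(patterns):
        assert p, "patterns must be non-empty"
        starts |= 1 << q
        for c in p:
            I[c] = I.get(c, 0) | (1 << q)
            q += 1
        ends[q - 1] = j            # last letter of pattern j
    return I, starts, ends


def shift_and(text: str, patterns: List[str]) -> List[Tuple[int, int]]:
    """Report (i, j) for every occurrence of patterns[j] ending at text[i]."""
    I, starts, ends = preprocess(patterns)
    end_mask = sum(1 << q for q in ends)
    d, out = 0, []
    for i, c in enumerate(text):
        # bit q set: a prefix of some pattern, ending at its letter q, ends at text[i]
        d = ((d << 1) | starts) & I.get(c, 0)
        hits = d & end_mask
        for q, j in ends.items():
            if hits >> q & 1:
                out.append((i, j))
    return out
```

A bit shifted out of the last letter of one pattern lands on the first letter of the next pattern, but that bit is forced to 1 by `starts` anyway, so no separator is needed here.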
10 Algorithm Multi-EDSM (continued)
In preprocessing we also construct the suffix tree \(ST_P\) of P = AA#AA$ in \(O(M)\) time.
[Figure: suffix tree \(ST_P\); some nodes omitted.]
11 Algorithm Multi-EDSM (continued)
The tree is used to answer OccVec queries: given a string \(\alpha\), return a bitvector marking the starting positions where \(\alpha\) occurs in P.
Lemma. Given an integer parameter \(\tau\), where \(1 \le \tau \le \lceil M/w\rceil\), a data structure of size \(O(\lceil M/\tau\rceil\lceil M/w\rceil)\) can be constructed in \(O(\lceil M/\tau\rceil\lceil M/w\rceil)\) time and space, answering OccVec queries in \(O(|\alpha| + \tau)\) time.
But how?
Lemma. Given a k-ary tree, where k is the maximum number of children of a node, an equivalent binary tree \(T_P\), in which every node has degree at most 3, can be generated in \(O(M)\) time and space.
Lemma. We partition \(T_P\), whose node degree is at most 3, into \(O(M/\tau)\) disjoint connected subgraphs (micro trees). Each micro tree contains up to \(\tau\) nodes, plus upper and lower boundary nodes connecting it to other micro trees.
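Before looking at how the lemma speeds it up, it helps to fix what an OccVec query returns. This reference implementation simply scans the concatenated pattern string, so it costs far more than \(O(|\alpha| + \tau)\); the patterns ACA and CAA in the usage note are hypothetical, and bit q of the result corresponds to position q of the concatenation.

```python
def occvec_naive(p_concat: str, alpha: str) -> int:
    """Bitvector with bit q set iff alpha occurs in p_concat starting at q."""
    if not alpha:
        raise ValueError("alpha must be non-empty")
    bv = 0
    q = p_concat.find(alpha)
    while q != -1:
        bv |= 1 << q
        q = p_concat.find(alpha, q + 1)  # restart at q + 1 to keep overlapping hits
    return bv
```

For P = ACA#CAA$, `occvec_naive("ACA#CAA$", "CA")` marks positions 1 and 4, that is, `0b10010`.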
12 Algorithm Multi-EDSM (continued)
By setting \(\tau = \lceil M/w\rceil\), any OccVec query takes at most \(O(|\alpha| + \tau)\) time.
[Figure: micro-tree partition of \(T_P\), showing nodes v and u.]
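As a quick check of how this choice of \(\tau\) connects the lemma to an \(O(M)\) preprocessing bound, substitute \(\tau = \lceil M/w\rceil\) into the size bound of the data structure, using \(\lceil x\rceil \le x + 1\):

```latex
\tau = \left\lceil \frac{M}{w} \right\rceil
\;\Longrightarrow\;
\left\lceil \frac{M}{\tau} \right\rceil \left\lceil \frac{M}{w} \right\rceil
\le \left( \frac{M}{\tau} + 1 \right) \tau
= M + \tau
\le M + \frac{M}{w} + 1
= O(M)
```

With the same \(\tau\), the query bound reads \(O(|\alpha| + \lceil M/w\rceil)\).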
13 Algorithm Multi-EDSM (continued)
At each boundary node v we store a bitvector \(b_v\) marking the starting positions of all suffixes in the subtree of v. To answer OccVec(\(\alpha\)), we spell \(\alpha\) down the tree to a node v:
If v is a boundary node of some micro tree, we simply return a pointer to \(b_v\); this takes constant time.
If v is not a boundary node, we find the bottom boundary node u (if it exists) of its micro tree. We create a new empty bitvector \(b'\) and, if u exists, set \(b' = b_u\). Then we traverse the part of the micro tree rooted at v, obtaining the suffix number of each leaf and setting the corresponding bit in \(b'\). We return a pointer to the updated \(b'\); this takes \(O(\tau)\) time.
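The query procedure can be sketched under two simplifying assumptions of our own: an uncompacted suffix trie replaces the binary suffix tree \(T_P\), and a greedy bottom-up rule replaces the micro-tree partition of the lemma, marking a node as a boundary once its unmarked region reaches \(\tau\) nodes. A query spells \(\alpha\), returns the stored bitvector at a boundary node, and otherwise walks the fewer than \(\tau\) unmarked nodes below v, OR-ing in the stored bitvectors of any lower boundary nodes it reaches. All names are hypothetical, and the trie has \(O(M^2)\) nodes, so this illustrates the idea rather than the stated bounds.

```python
from typing import Dict, Optional, Tuple


class Node:
    """Node of an uncompacted suffix trie (a stand-in for the suffix tree)."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.suffix: Optional[int] = None       # start of the suffix ending here
        self.boundary_bv: Optional[int] = None  # stored b_v, boundary nodes only


def build_trie(p_concat: str) -> Node:
    root = Node()
    for start in range(len(p_concat)):
        node = root
        for ch in p_concat[start:]:
            if ch not in node.children:
                node.children[ch] = Node()
            node = node.children[ch]
        node.suffix = start
    return root


def mark_boundaries(node: Node, tau: int) -> Tuple[int, int]:
    """Return (size of the node's unmarked region, full bitvector of its subtree)."""
    region, bv = 1, 0
    if node.suffix is not None:
        bv |= 1 << node.suffix
    for child in node.children.values():
        child_region, child_bv = mark_boundaries(child, tau)
        region += child_region
        bv |= child_bv
    if region >= tau:
        node.boundary_bv = bv  # store b_v; a new region starts above this node
        region = 0
    return region, bv


def occvec(root: Node, alpha: str) -> int:
    node = root
    for ch in alpha:                 # spell alpha down the trie
        nxt = node.children.get(ch)
        if nxt is None:
            return 0                 # alpha does not occur in P
        node = nxt
    if node.boundary_bv is not None:
        return node.boundary_bv      # boundary node: reuse the stored bitvector
    result, stack = 0, [node]
    while stack:                     # walk the small unmarked region below v
        cur = stack.pop()
        if cur is not node and cur.boundary_bv is not None:
            result |= cur.boundary_bv  # lower boundary: take its bitvector, stop
            continue
        if cur.suffix is not None:
            result |= 1 << cur.suffix
        stack.extend(cur.children.values())
    return result
```

Unlike the slide, where each micro tree has a single bottom boundary node u, this greedy variant may reach several lower boundary nodes and OR their bitvectors together; the answers are the same, only the cost analysis differs.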
14 MEDSM Example
Back to our problem of finding P = AA, AA in T = A A AA ε A, using the letter bitvectors I.
At position 1, we mark the positions of pattern prefixes in the bitvector B.
At position 2, we are able to extend the prefix of \(P_0\) stored in B and report a match.
Also at position 2, we do full pattern matching and report \(P_1\).
We finish position 2 with an updated prefix bitvector B.
15 MEDSM Example (continued)
Still looking for P = AA, AA in T = A A AA ε A.
Remember B from the previous segment (2). At position 3 we run OccVec() to obtain a bitvector \(B'\). Note that positions are pre-shifted right and boundary 1s are excluded. We compute \(B \,\&\, B'\) to mark valid infixes and shift the result by \(|\alpha|\). We have matched A so far.
The presence of ε means we simply take the bitwise OR (set union) of the previously matched prefixes with the prefixes found at the current position.
16 MEDSM Example (continued)
Still looking for P = AA, AA in T = A A AA ε A.
At position 4, we try to extend the prefix with OccVec() on the first word, but it fails on the first letter. Recall that B holds the prefix positions memorised from segments 2 and 3. We then try the next word, where we are able to extend the prefix and complete the pattern, reporting a match for \(P_1\).
We perform these three (or four) steps for every segment. Searching takes \(O(N\lceil M/w\rceil)\) time and requires \(O(\lceil M/w\rceil)\) extra space.
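The per-segment bookkeeping of the walk-through (AND with OccVec, shift by \(|\alpha|\), OR for ε) can be put together in a compact sketch. It assumes that patterns and text use only A, C, G, T, so the separator '#' that closes each pattern never occurs in T, and it uses a naive `occvec` in place of the suffix-tree query. The function and its layout are ours, not the authors' implementation, and it does not achieve the \(O(N\lceil M/w\rceil)\) bound.

```python
from typing import Dict, List, Set, Tuple

EDString = List[List[str]]  # one list of alternatives per position; "" is epsilon


def multi_edsm_sketch(x: EDString, patterns: List[str]) -> List[Tuple[int, int]]:
    s = "#".join(patterns) + "#"   # '#' closes every pattern
    ends: Dict[int, int] = {}      # bit of a '#' -> index of the pattern it closes
    starts, q = 0, 0
    for j, p in enumerate(patterns):
        starts |= 1 << q
        q += len(p)
        ends[q] = j
        q += 1
    end_mask = sum(1 << e for e in ends)

    def occvec(alpha: str) -> int:  # naive stand-in for the suffix-tree OccVec
        return sum(1 << k for k in range(len(s)) if s.startswith(alpha, k))

    def report(bits: int, i: int, out: Set[Tuple[int, int]]) -> None:
        for e, j in ends.items():
            if bits >> e & 1:
                out.add((i, j))

    b = 0                          # B: bit q set means the next expected letter is s[q]
    out: Set[Tuple[int, int]] = set()
    for i, position in enumerate(x):
        nb = 0
        for t in position:
            if t == "":            # epsilon: union with the previous prefixes
                nb |= b
                continue
            # steps 2-3: extend memorised prefixes by t[:l]; landing on '#' is a match
            for l in range(1, len(t) + 1):
                ext = (b & occvec(t[:l])) << l
                report(ext & end_mask, i, out)
                if l == len(t):
                    nb |= ext & ~end_mask
            # step 1: new prefixes that start inside t and run to its end
            for o in range(len(t)):
                new = (starts & occvec(t[o:])) << (len(t) - o)
                report(new & end_mask, i, out)
                nb |= new & ~end_mask
            # step 4: whole occurrences inside t
            for j, p in enumerate(patterns):
                if p in t:
                    out.add((i, j))
        b = nb
    return sorted(out)
```

On the hypothetical ED string AC{A, GT}T{TA, ε}G, the prefix T of TG is carried across the ε branch at position 4 by the OR, and the match is reported at position 5.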
17 Experiments: Time Performance
DNA; randomly generated synthetic ED texts; 10% degenerate positions.
[Figure: Time performance of Multi-EDSM. (a) Log processing time (s) vs. total pattern length M, on a fixed ED text of length n = … (N = …). (b) Log processing time (s) vs. total ED text size N, on a fixed set of patterns of total length M = ….]
18 Experiments: Multi-EDSM vs EDSM-BV
DNA; randomly generated synthetic ED texts; 10% degenerate positions.
[Figure: Elapsed-time comparison (s) of Multi-EDSM and EDSM-BV vs. number of patterns, with an ED text of total size N = … and sets of randomly generated patterns of length 40 each.]
19 Experiments: MAW Validation
We designed a three-stage pipeline for determining the validity of minimal absent words (MAWs) discovered in the human genome.
A minimal absent word (MAW) y of a text x is a word that is absent from x although all of its proper factors occur in x.
We searched a list of patterns (maximum length 12) with combined length M = …. We found that 73% of the MAWs were invalid, leaving only … potential MAWs.
We checked the work of Silva et al. [Bioinformatics, 2015], who had identified 3 MAWs present in Ebola virus genomes but absent from the human reference genome... yet they are present in the human pan-genome:
ID    Sequence  Position  Variant ID  Sample ID  Ethnicity
RAW1  TTTGGAT   6:        rs          NA18606    Han Chinese
RAW2  TAGTATG   1:        rs          HG02146    Peruvian
RAW3  TAGGAAA   15:       rs          HG03598    Bengali
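The MAW definition can be checked directly. The sketch below relies on the standard observation that it suffices to test the two maximal proper factors, y without its first letter and y without its last letter, because every proper factor of y is a factor of one of them. `is_maw` is a hypothetical helper, and its substring tests are meant for illustration only, not for genome-scale validation.

```python
def is_maw(y: str, x: str) -> bool:
    """True iff y is absent from x while its two maximal proper factors occur in x."""
    if y in x:
        return False      # also covers the empty word, which occurs in every x
    if len(y) == 1:
        return True       # the only proper factor is the empty word
    return y[1:] in x and y[:-1] in x
```

For x = ACGT, the word AA is a MAW (A occurs, AA does not), while ACA is absent but not minimal, since its factor CA is also absent.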
20 Conclusion
Multi-EDSM is an algorithm that solves the Multiple Elastic-Degenerate String Matching (MEDSM) problem on-line.
Preprocessing time and space (with \(\tau = \lceil M/w\rceil\)) is \(O(M)\).
Multi-EDSM solves the EDSM problem in \(O(N\lceil m/w\rceil)\) time and \(O(m)\) space, whereas EDSM-BV requires \(O(m\lceil m/w\rceil)\) space.
Search time is \(O(N\lceil M/w\rceil)\).
Multi-EDSM is a robust and fast algorithm. It is a useful tool for searching VCF data on-line, especially for partially indexed data.
Simplified implementation of Multi-EDSM:
Other tools:
Problem: Shortest Common Superstring The Greedy Algorithm for Shortest Common Superstrings Course Discrete Biological Models (Modelli Biologici Discreti) Zsuzsanna Lipták Laurea Triennale in Bioinformatica
More informationLecture 1: September 25, A quick reminder about random variables and convexity
Information and Coding Theory Autumn 207 Lecturer: Madhur Tulsiani Lecture : September 25, 207 Administrivia This course will cover some basic concepts in information and coding theory, and their applications
More informationString Range Matching
String Range Matching Juha Kärkkäinen, Dominik Kempa, and Simon J. Puglisi Department of Computer Science, University of Helsinki Helsinki, Finland firstname.lastname@cs.helsinki.fi Abstract. Given strings
More informationText matching of strings in terms of straight line program by compressed aleshin type automata
Text matching of strings in terms of straight line program by compressed aleshin type automata 1 A.Jeyanthi, 2 B.Stalin 1 Faculty, 2 Assistant Professor 1 Department of Mathematics, 2 Department of Mechanical
More informationEquivalence relations
Equivalence relations R A A is an equivalence relation if R is 1. reflexive (a, a) R 2. symmetric, and (a, b) R (b, a) R 3. transitive. (a, b), (b, c) R (a, c) R Example: Let S be a relation on people
More informationApproximating Shortest Superstring Problem Using de Bruijn Graphs
Approximating Shortest Superstring Problem Using de Bruijn Advanced s Course Presentation Farshad Barahimi Department of Computer Science University of Lethbridge November 19, 2013 This presentation is
More informationTheoretical aspects of ERa, the fastest practical suffix tree construction algorithm
Theoretical aspects of ERa, the fastest practical suffix tree construction algorithm Matevž Jekovec University of Ljubljana Faculty of Computer and Information Science Oct 10, 2013 Text indexing problem
More informationBloom Filters, general theory and variants
Bloom Filters: general theory and variants G. Caravagna caravagn@cli.di.unipi.it Information Retrieval Wherever a list or set is used, and space is a consideration, a Bloom Filter should be considered.
More informationarxiv: v2 [cs.ds] 8 Apr 2016
Optimal Dynamic Strings Paweł Gawrychowski 1, Adam Karczmarz 1, Tomasz Kociumaka 1, Jakub Łącki 2, and Piotr Sankowski 1 1 Institute of Informatics, University of Warsaw, Poland [gawry,a.karczmarz,kociumaka,sank]@mimuw.edu.pl
More informationarxiv: v1 [cs.ds] 25 Nov 2009
Alphabet Partitioning for Compressed Rank/Select with Applications Jérémy Barbay 1, Travis Gagie 1, Gonzalo Navarro 1 and Yakov Nekrich 2 1 Department of Computer Science University of Chile {jbarbay,
More informationDefine M to be a binary n by m matrix such that:
The Shift-And Method Define M to be a binary n by m matrix such that: M(i,j) = iff the first i characters of P exactly match the i characters of T ending at character j. M(i,j) = iff P[.. i] T[j-i+.. j]
More informationLongest Gapped Repeats and Palindromes
Discrete Mathematics and Theoretical Computer Science DMTCS vol. 19:4, 2017, #4 Longest Gapped Repeats and Palindromes Marius Dumitran 1 Paweł Gawrychowski 2 Florin Manea 3 arxiv:1511.07180v4 [cs.ds] 11
More informationA conjecture on the alphabet size needed to produce all correlation classes of pairs of words
A conjecture on the alphabet size needed to produce all correlation classes of pairs of words Paul Leopardi Thanks: Jörg Arndt, Michael Barnsley, Richard Brent, Sylvain Forêt, Judy-anne Osborn. Mathematical
More informationMARKOV CHAINS A finite state Markov chain is a sequence of discrete cv s from a finite alphabet where is a pmf on and for
MARKOV CHAINS A finite state Markov chain is a sequence S 0,S 1,... of discrete cv s from a finite alphabet S where q 0 (s) is a pmf on S 0 and for n 1, Q(s s ) = Pr(S n =s S n 1 =s ) = Pr(S n =s S n 1
More informationString Search. 6th September 2018
String Search 6th September 2018 Search for a given (short) string in a long string Search problems have become more important lately The amount of stored digital information grows steadily (rapidly?)
More informationDeterministic Finite Automata (DFAs)
CS/ECE 374: Algorithms & Models of Computation, Fall 28 Deterministic Finite Automata (DFAs) Lecture 3 September 4, 28 Chandra Chekuri (UIUC) CS/ECE 374 Fall 28 / 33 Part I DFA Introduction Chandra Chekuri
More informationShortest paths with negative lengths
Chapter 8 Shortest paths with negative lengths In this chapter we give a linear-space, nearly linear-time algorithm that, given a directed planar graph G with real positive and negative lengths, but no
More informationSection 1 (closed-book) Total points 30
CS 454 Theory of Computation Fall 2011 Section 1 (closed-book) Total points 30 1. Which of the following are true? (a) a PDA can always be converted to an equivalent PDA that at each step pops or pushes
More information