Efficient Algorithms for Regular Expression Constrained Sequence Alignment
1 Efficient Algorithms for Regular Expression Constrained Sequence Alignment. Yun-Sheng Chung, Chin Lung Lu, and Chuan Yi Tang. Department of Computer Science, National Tsing Hua University, Taiwan; Department of Biological Science and Technology, National Chiao Tung University, Taiwan. 17th Annual Symposium on Combinatorial Pattern Matching.
2 Regular Expression Constraints. PROSITE contains biologically important sites and motifs. PROSITE motifs can be represented by regular expressions. P-loop motif: [AG]-x(4)-G-K-[ST]. Regular expression: (A + G)ΣΣΣΣGK(S + T).
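The motif-to-regular-expression step on this slide can be sketched in Python. This is only an illustration under stated assumptions: the helper name `prosite_to_regex` is hypothetical, the output uses Python `re` syntax rather than the (A + G)ΣΣΣΣ notation of the slide, and only a small part of the PROSITE syntax is handled (`x`, `[..]`, `{..}`, and `(n)` or `(n,m)` repeats). The test sequence is an arbitrary illustrative fragment.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into Python `re` syntax (sketch)."""
    pieces = []
    for element in pattern.rstrip(".").split("-"):
        # An element is a core (x, a letter, [..] or {..}) plus an optional repeat.
        m = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        # "[..]" sets and single letters are already valid regex pieces.
        if lo and hi:
            core += "{%s,%s}" % (lo, hi)
        elif lo:
            core += "{%s}" % lo
        pieces.append(core)
    return "".join(pieces)

P_LOOP = prosite_to_regex("[AG]-x(4)-G-K-[ST]")
hit = re.search(P_LOOP, "TGFPSVGKTKDDA")
print(P_LOOP, hit.group() if hit else None)
```

The four `x` positions become `.{4}`, which corresponds to the ΣΣΣΣ run in the slide's notation.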
3 Regular Expression Constraints. In an alignment, it is reasonable to expect functional sites to be aligned together. Introduced by Arslan (CPM 2005). [Figure: example alignments of the sequences TGFPSVGKTKDDA and TFSVAKDDDGKSA, each of which contains an occurrence of the P-loop motif.] (G + A)ΣΣΣΣGK(S + T): the P-loop motif.
4 Outline: Related works; Weighted automata; Arslan's algorithm; Our algorithms; Conclusion and future works.
5 Related Works. Tang et al. (CSB 2002, JBCB 2003) introduced the Constrained Multiple Sequence Alignment (CMSA) problem. Constraint: a sequence of characters. Example: sequences from the RNase family share the conserved sequence of residues H, K, H; in this case the constraint is H, K, H. [Figure: example alignment illustrating the constraint H, K, H.] Chin et al. (CSB 2003, JBCB 2005) then proposed a more efficient algorithm for pairwise CSA, along with a 2-approximation algorithm for CMSA.
6 Related Works. Tsai et al. (Bioinformatics 2004) generalized the definition of constraints. Constraint: a sequence of strings, allowing mismatches. Example: $P_1$ = AGCC, $P_2$ = CG, ɛ = 0.5. MuSiC: a web tool. [Figure: example with the sequences AGCCCG and AGUCCG.] Lu and Huang (Bioinformatics 2005) gave a more space-efficient algorithm, MuSiC-ME.
7 RECSA. Arslan (CPM 2005) introduced the regular expression constrained sequence alignment (RECSA) problem. Constraint: a regular expression R, for example R = (G + A)ΣΣΣΣGK(S + T). [Figure: example alignment under the constraint R.] In this work, we give more time- and space-efficient algorithms.
8 Weighted Automata: Topology. The regular expression constraint R is converted into an ɛ-free NFA A. The topology of a weighted automaton M is essentially the same as the topology of the product machine of A. Example: R = a; $\Sigma^- = \Sigma \cup \{-\}$. [Figure: the NFA A with states $q_0$, $q_1$ and an arc labeled a, next to the automaton M with states $(q_0, q_0)$, $(q_1, q_0)$, $(q_0, q_1)$, $(q_1, q_1)$ and an arc labeled (a, a).]
9 Weighted Automata. Let δ be the transition function of A. Then $\delta_M$, the transition function of M, is defined as
$$\delta_M((p, q), (a, b)) = \begin{cases} \delta(p, a) \times \delta(q, b) & \text{if } (p, q) \notin (F \times F) \cup \{(q_0, q_0)\}, \\ (\delta(p, a) \times \delta(q, b)) \cup \{(p, q)\} & \text{otherwise.} \end{cases}$$
[Figure: the NFA A with states $q_0$, $q_1$ and an arc labeled a, next to the automaton M with states $(q_0, q_0)$, $(q_1, q_0)$, $(q_0, q_1)$, $(q_1, q_1)$ and an arc labeled (a, a).]
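A minimal Python sketch of this transition function follows. It rests on two assumptions that the transcript does not state outright: the case split is read so that a self-loop is added exactly at $(q_0, q_0)$ and at pairs of final states, and a gap symbol `-` is taken to leave the corresponding copy of A in its current state. The names `delta_M`, `delta`, `finals` and `q0` are illustrative only.

```python
def delta_M(delta, finals, q0, state, label):
    """Successors of `state` = (p, q) in M under the column `label` = (a, b).

    delta maps (state, symbol) to a set of states of the NFA A.
    """
    p, q = state
    a, b = label
    # Assumption: a gap "-" does not move that copy of A.
    next_p = {p} if a == "-" else delta.get((p, a), set())
    next_q = {q} if b == "-" else delta.get((q, b), set())
    successors = {(p2, q2) for p2 in next_p for q2 in next_q}
    # Self-loop at the start pair and at pairs of final states (assumed reading).
    if state == (q0, q0) or (p in finals and q in finals):
        successors.add(state)
    return successors

# The NFA for R = a: a single arc q0 --a--> q1, with q1 final.
DELTA = {("q0", "a"): {"q1"}}
FINALS = {"q1"}
print(delta_M(DELTA, FINALS, "q0", ("q0", "q0"), ("a", "a")))
```

With this reading, columns before and after the constrained region are absorbed by the self-loops, while the region itself must follow arcs of A in both coordinates.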
10 Weighted Automata and Alignments. $S_1[1..3]$ = tca, $S_2[1..3]$ = atc, R = a. [Figure: an alignment A with rows t c a and a t c (column layout not preserved), traced on the automaton M with states $(q_0, q_0)$, $(q_1, q_0)$, $(q_0, q_1)$, $(q_1, q_1)$ and an arc labeled (a, a).]
15 Weighted Automata and Alignments. $S_1[1..3]$ = tca, $S_2[1..3]$ = atc, R = a. [Figure: an alignment A with rows t c a and a t c (column layout not preserved), score: 1, traced on the automaton M with states $(q_0, q_0)$, $(q_1, q_0)$, $(q_0, q_1)$, $(q_1, q_1)$ and an arc labeled (a, a).]
16 Weighted Automata and Alignments. A reaches the final state if and only if A is a feasible constrained alignment. [Figure: an alignment A with rows t c a and a t c (column layout not preserved), score: 0, traced on the automaton M with states $(q_0, q_0)$, $(q_1, q_0)$, $(q_0, q_1)$, $(q_1, q_1)$ and an arc labeled (a, a).]
17 Weighted Automata: Scores. $S_1[1..3]$ = tca, $S_2[1..3]$ = atc, R = a. Scoring: a match scores 1; anything else scores 0. [Figure: the weighted automaton $M_{3,3}$, annotated with two alignments $A_1$ and $A_2$ (rows t c a and a t c; column layout not preserved) and an (a, a) arc of score 1.]
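18 Weighted Automata: Computation. Let $a = S_1[i_1]$ and $b = S_2[i_2]$. One weighted automaton is kept per table cell; rows are indexed by prefixes of $S_1$ and columns by prefixes of $S_2$:
      ɛ          a          t
ɛ   $M_{0,0}$  $M_{0,1}$  $M_{0,2}$
t   $M_{1,0}$  $M_{1,1}$  $M_{1,2}$
c   $M_{2,0}$  $M_{2,1}$  $M_{2,2}$
Cell $M_{i_1,i_2}$ is computed from its neighbours $M_{i_1-1,i_2-1}$, $M_{i_1-1,i_2}$ and $M_{i_1,i_2-1}$:
$$M_{i_1,i_2} = \max\{M^{(a,b)}_{i_1-1,i_2-1},\; M^{(a,-)}_{i_1-1,i_2},\; M^{(-,b)}_{i_1,i_2-1}\}$$
$$W_{i_1,i_2}(p, q) = \max\{W^{(a,b)}_{i_1-1,i_2-1}(p, q),\; W^{(a,-)}_{i_1-1,i_2}(p, q),\; W^{(-,b)}_{i_1,i_2-1}(p, q)\}$$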
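The recurrence can be sketched as a direct dynamic program over the states of M. This is an unoptimized illustration, not the algorithm of Arslan or of the authors: it keeps the whole table (so it does not achieve the linear-space bounds discussed later), it assumes that a gap leaves the corresponding copy of A in place, and it assumes self-loops at $(q_0, q_0)$ and at pairs of final states. The names `recsa_score`, `delta`, `finals` and `gamma` are hypothetical.

```python
NEG_INF = float("-inf")

def recsa_score(s1, s2, delta, q0, finals, gamma):
    """Best score of an alignment of s1 and s2 that drives M to a final state."""
    def step(state, a, b):
        p, q = state
        next_p = {p} if a == "-" else delta.get((p, a), set())
        next_q = {q} if b == "-" else delta.get((q, b), set())
        succ = {(x, y) for x in next_p for y in next_q}
        if state == (q0, q0) or (p in finals and q in finals):
            succ.add(state)  # assumed self-loops outside the constrained region
        return succ

    n1, n2 = len(s1), len(s2)
    # W[i1][i2] maps a state (p, q) of M to the best score reaching it.
    W = [[{} for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    W[0][0][(q0, q0)] = 0
    for i1 in range(n1 + 1):
        for i2 in range(n2 + 1):
            moves = []
            if i1 and i2:
                moves.append((i1 - 1, i2 - 1, s1[i1 - 1], s2[i2 - 1]))
            if i1:
                moves.append((i1 - 1, i2, s1[i1 - 1], "-"))
            if i2:
                moves.append((i1, i2 - 1, "-", s2[i2 - 1]))
            for j1, j2, a, b in moves:
                for state, score in W[j1][j2].items():
                    for nxt in step(state, a, b):
                        cand = score + gamma(a, b)
                        if cand > W[i1][i2].get(nxt, NEG_INF):
                            W[i1][i2][nxt] = cand
    return max((v for (p, q), v in W[n1][n2].items()
                if p in finals and q in finals), default=NEG_INF)

# R = a, with "match: scored 1; other: scored 0" as on the slides.
delta_a = {("q0", "a"): {"q1"}}
match_score = lambda a, b: 1 if a == b else 0
print(recsa_score("tca", "atc", delta_a, "q0", {"q1"}, match_score))
```

For tca and atc under R = a, the two occurrences of a must fall in the same column region, so the matches t/t and c/c are not available and the constrained optimum is lower than the unconstrained one.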
19 Weighted Automata: Insertion. $S_1[1..3]$ = tca, $S_2[1..4]$ = atca. Additional edit operation: (-, a), with score 0. [Figure: the automaton $M_{3,3}$ and the automaton $M^{(-,a)}_{3,3}$ obtained from it; stored alignments are extended along arcs labeled (a, a), (-, a) and $\Sigma^-$.] The computation can be done by finding in $M_{3,3}$ all the arcs whose label matches the edit operation.
20 Arslan's Algorithm. Let r be the size of the transition function of M. In the algorithm by Arslan, the whole automaton is constructed and searched for each $(i_1, i_2)$, hence taking O(r) time and space per cell. Total time and space: $O(rn^2)$ and $O(rn)$, respectively. Let V be the set of states in A. The number of arcs in A is $\Theta(|\Sigma||V|^2)$ in the worst case. r is the number of arcs in M, which is $\Theta(|\Sigma|^2|V|^4)$ in the worst case. [Figure: an automaton with arcs labeled a, c, t, g.]
21 Time and Space Complexity. Worst-case complexity for Arslan's algorithm: $O(|\Sigma|^2|V|^4n^2)$ time and $O(|\Sigma|^2|V|^4n)$ space. The space complexity is for the computation of the optimal score only. It is not mentioned how to reconstruct the optimal alignment without affecting the space complexity. In this work we propose two algorithms. Worst-case time: $O(|V|^3n^2)$ and $O(|V|^2 \log|V|\, n^2)$, respectively. Space: $O(|V|^2n)$, with the optimal alignment reconstructed.
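As a rough worked comparison of these worst-case bounds (a ratio of asymptotic bounds, not a measured speed-up), the factors saved by the first algorithm can be written out. The value $|\Sigma| = 20$ is the usual amino-acid alphabet size and is an assumption added here for illustration.

```latex
\[
\frac{|\Sigma|^{2}|V|^{4}n^{2}}{|V|^{3}n^{2}} = |\Sigma|^{2}|V|,
\qquad
\frac{|\Sigma|^{2}|V|^{4}n}{|V|^{2}n} = |\Sigma|^{2}|V|^{2},
\qquad
|\Sigma| = 20 \;\Rightarrow\; |\Sigma|^{2} = 400 .
\]
```

The new bounds do not depend on $|\Sigma|$ at all, which matters most for protein sequences.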
22 The First Algorithm. It is more efficient to separate the transitions from the scores. [Figure: the (a, a) arc of score 1 in M is replaced by the NFA A together with a score table L indexed by $q_0$ and $q_1$.] Worst-case space complexity: $O(|\Sigma|^2|V|^4n) \rightarrow O(|V|^2n)$.
23 Search in A, not M. Consider the computation of $M^{(a,a)}$ from $M$ for a fixed cell. [Figure: the NFA A with an arc labeled a from $q_0$ to $q_1$, and three $|V| \times |V|$ tables indexed by $q_0$ and $q_1$: the table L of the cell, a temporary table, and the resulting table $L^{(a,a)}$.]
24 The First Algorithm. Let $L'$ be a temporary $|V| \times |V|$ table initialized to be all $-\infty$.
for $p \in V$ do
    for $p' \in V$ with $T[p', S_1[i_1]; p] = 1$ do
        for $q' \in V$ do
            $L'[p, q'] \leftarrow \max\{L'[p, q'],\; L_{i_1-1,i_2-1}[p', q'] + \gamma(S_1[i_1], S_2[i_2])\}$;
for $q \in V$ do
    for $q' \in V$ with $T[q', S_2[i_2]; q] = 1$ do
        for $p \in V$ do
            $L^{(S_1[i_1],S_2[i_2])}_{i_1-1,i_2-1}[p, q] \leftarrow \max\{L^{(S_1[i_1],S_2[i_2])}_{i_1-1,i_2-1}[p, q],\; L'[p, q']\}$;
The insertion and deletion cases are similar.
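The two passes above can be sketched in Python as follows. The sketch covers only the moves along arcs of A for the substitution case; the self-loops of M are not handled here. The representation is an assumption made for illustration: `T[(p, c)]` is the set of states reachable from `p` on symbol `c`, tables are nested dictionaries, and the names `substitution_step` and `gamma` are hypothetical.

```python
NEG_INF = float("-inf")

def substitution_step(L_prev, T, a, b, gamma, states):
    """Compute the (a, b)-successor table from L_prev in two O(|V|^3) passes."""
    # Pass 1: move the first coordinate p' -> p along a, adding the column score.
    tmp = {p: {q: NEG_INF for q in states} for p in states}
    for p_src in states:
        for p in T.get((p_src, a), ()):
            for q_src in states:
                cand = L_prev[p_src][q_src] + gamma(a, b)
                if cand > tmp[p][q_src]:
                    tmp[p][q_src] = cand
    # Pass 2: move the second coordinate q' -> q along b.
    out = {p: {q: NEG_INF for q in states} for p in states}
    for q_src in states:
        for q in T.get((q_src, b), ()):
            for p in states:
                if tmp[p][q_src] > out[p][q]:
                    out[p][q] = tmp[p][q_src]
    return out

# R = a: one arc q0 --a--> q1; start with score 0 at (q0, q0) only.
STATES = ["q0", "q1"]
T_A = {("q0", "a"): {"q1"}}
L0 = {p: {q: NEG_INF for q in STATES} for p in STATES}
L0["q0"]["q0"] = 0
L_aa = substitution_step(L0, T_A, "a", "a", lambda x, y: 1 if x == y else 0, STATES)
print(L_aa["q1"]["q1"])
```

Each pass touches at most $|V|^3$ triples and never enumerates pairs of symbols, which is where the independence from $|\Sigma|$ comes from.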
25 Reconstruct the Alignment. If we know which substrings of $S_1$ and $S_2$ are aligned to satisfy the constraint in an optimal constrained alignment, then the optimal alignment can be reconstructed easily in O(n) space. [Figure: $S_1$ and $S_2$, each with the substring matched to R marked.] The indices of such substrings can be found by doing some additional bookkeeping during the computation, without affecting the space complexity mentioned previously.
26 Reconstruct the Alignment. [Figure: from $M_{2,0}$ (score 0) the arc (a, a) leads to $M^{(a,a)}_{2,0}$ (score 1); the stored alignment is extended from the columns for t c to the columns for t c a over a.] Bookkeeping recorded with the score: start: (3, 1); end: (3, 1).
27 The Second Algorithm. The second algorithm requires $|V| = O(\log n)$. Let $T[p, a]$ be the bit vector $(T[p, a; p_1], \ldots, T[p, a; p_{|V|}])$. Under the assumption, $T[p, a]$ can be stored in a single word. We take $L^{(S_1[i_1],-)}_{i_1-1,i_2}[p, q]$ as an example. The other two cases are similar.
28 The Second Algorithm. The arcs shown are all transitions in δ labeled with $S_1[i_1]$. $V = \{p_1, p_2, p_3, p_4\}$. Fix $q \in V$. $T[p_1, S_1[i_1]] = (1, 0, 1, 0)$, $T[p_2, S_1[i_1]] = (0, 1, 1, 0)$, $T[p_3, S_1[i_1]] = (1, 0, 0, 1)$, $T[p_4, S_1[i_1]] = (0, 1, 1, 1)$. [Figure: arcs from the entries $L_{i_1-1,i_2}[p, q]$ to the entries $L^{(S_1[i_1],-)}_{i_1-1,i_2}[p, q]$.]
29 The Second Algorithm. Y: bit vector representing the states p in V such that $L^{(S_1[i_1],-)}_{i_1-1,i_2}[p, q]$ has not yet been set. Initially: Y = (1, 1, 1, 1). $T[p_3, S_1[i_1]] \wedge Y = (1, 0, 0, 1)$. [Figure: arcs from the entries $L_{i_1-1,i_2}[p, q]$ to the entries $L^{(S_1[i_1],-)}_{i_1-1,i_2}[p, q]$.]
30 The Second Algorithm. $Y \leftarrow Y \oplus (T[p_3, S_1[i_1]] \wedge Y) = (1, 1, 1, 1) \oplus (1, 0, 0, 1) = (0, 1, 1, 0)$. $T[p_1, S_1[i_1]] \wedge Y = (0, 0, 1, 0)$. [Figure: arcs from the entries $L_{i_1-1,i_2}[p, q]$ to the entries $L^{(S_1[i_1],-)}_{i_1-1,i_2}[p, q]$.]
31 The Second Algorithm. $Y \leftarrow Y \oplus (T[p_1, S_1[i_1]] \wedge Y) = (0, 1, 1, 0) \oplus (0, 0, 1, 0) = (0, 1, 0, 0)$. $T[p_4, S_1[i_1]] \wedge Y = (0, 1, 0, 0)$. [Figure: arcs from the entries $L_{i_1-1,i_2}[p, q]$ to the entries $L^{(S_1[i_1],-)}_{i_1-1,i_2}[p, q]$.]
32 The Second Algorithm. $Y \leftarrow Y \oplus (T[p_4, S_1[i_1]] \wedge Y) = (0, 1, 0, 0) \oplus (0, 1, 0, 0) = (0, 0, 0, 0)$. $T[p_2, S_1[i_1]] \wedge Y = (0, 0, 0, 0)$. [Figure: arcs from the entries $L_{i_1-1,i_2}[p, q]$ to the entries $L^{(S_1[i_1],-)}_{i_1-1,i_2}[p, q]$.]
33 The Second Algorithm. [Figure: arcs from the entries $L_{i_1-1,i_2}[p, q]$ to the entries $L^{(S_1[i_1],-)}_{i_1-1,i_2}[p, q]$.] Time: $O(|V| \log |V|)$ for each fixed q; $O(|V|^2 \log |V|)$ for the entire $L^{(S_1[i_1],-)}_{i_1-1,i_2}$.
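The walk-through on the preceding slides can be replayed with a small Python sketch. It is written under several assumptions: the source states are visited in decreasing order of their current score (which is what makes the first value written to an entry the maximum), the scores in `prev` are made-up values chosen only so that the visiting order is $p_3, p_1, p_4, p_2$ as in the example, Python integers stand in for machine words, and all function names are hypothetical.

```python
NEG_INF = float("-inf")

def to_mask(vec):
    """Pack a 0/1 tuple into an int; component k becomes bit k."""
    return sum(bit << k for k, bit in enumerate(vec))

def to_vec(mask, n):
    """Unpack an int into a 0/1 tuple of length n."""
    return tuple((mask >> k) & 1 for k in range(n))

def deletion_column(T_bits, prev, gap_score):
    """For one fixed q: new[p] = max over arcs p' -> p of prev[p'] + gap_score."""
    n = len(prev)
    new = [NEG_INF] * n
    Y = (1 << n) - 1                      # states whose entry is not yet set
    trace = []
    # Sorting costs O(|V| log |V|); best sources are handled first.
    for src in sorted(range(n), key=lambda p: prev[p], reverse=True):
        hit = T_bits[src] & Y             # unset successors of src
        trace.append(to_vec(hit, n))
        Y ^= hit                          # hit is a subset of Y, so this clears it
        while hit:
            low = hit & -hit              # lowest set bit
            new[low.bit_length() - 1] = prev[src] + gap_score
            hit ^= low
    return new, trace

# Bit vectors T[p_k, S_1[i_1]] from the example; prev holds hypothetical scores.
T_EXAMPLE = [to_mask(v) for v in
             [(1, 0, 1, 0), (0, 1, 1, 0), (1, 0, 0, 1), (0, 1, 1, 1)]]
new, trace = deletion_column(T_EXAMPLE, prev=[5, 1, 7, 3], gap_score=0)
print(new, trace)
```

Each entry is written at most once per fixed q, so the inner loop runs at most $|V|$ times in total and the sort dominates the cost.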
34 Conclusion and Future Works. More time- and space-efficient algorithms for regular expression constrained sequence alignment are proposed. A multiple alignment version is important. Generalize the weighted automata structure to obtain optimal solutions: suggested by Arslan (CPM 2005). Progressive heuristics: e.g., Lu and Huang's work (Bioinformatics 2005). Approximation algorithms: in preparation. A local alignment version would also be useful. It has a similar motivation to PHI-BLAST and may work complementarily by taking a different approach.
35 Thank you for your attention.
More informationTheory of Computation (I) Yijia Chen Fudan University
Theory of Computation (I) Yijia Chen Fudan University Instructor Yijia Chen Homepage: http://basics.sjtu.edu.cn/~chen Email: yijiachen@fudan.edu.cn Textbook Introduction to the Theory of Computation Michael
More informationConsensus Optimizing Both Distance Sum and Radius
Consensus Optimizing Both Distance Sum and Radius Amihood Amir 1, Gad M. Landau 2, Joong Chae Na 3, Heejin Park 4, Kunsoo Park 5, and Jeong Seop Sim 6 1 Bar-Ilan University, 52900 Ramat-Gan, Israel 2 University
More informationGEETANJALI INSTITUTE OF TECHNICAL STUDIES, UDAIPUR I
GEETANJALI INSTITUTE OF TECHNICAL STUDIES, UDAIPUR I Internal Examination 2017-18 B.Tech III Year VI Semester Sub: Theory of Computation (6CS3A) Time: 1 Hour 30 min. Max Marks: 40 Note: Attempt all three
More informationTheory of Computation
Thomas Zeugmann Hokkaido University Laboratory for Algorithmics http://www-alg.ist.hokudai.ac.jp/ thomas/toc/ Lecture 3: Finite State Automata Motivation In the previous lecture we learned how to formalize
More informationAn Introduction to Bioinformatics Algorithms Hidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationFORMAL LANGUAGES, AUTOMATA AND COMPUTATION
FORMAL LANGUAGES, AUTOMATA AND COMPUTATION DECIDABILITY ( LECTURE 15) SLIDES FOR 15-453 SPRING 2011 1 / 34 TURING MACHINES-SYNOPSIS The most general model of computation Computations of a TM are described
More informationWeek 10: Homology Modelling (II) - HHpred
Week 10: Homology Modelling (II) - HHpred Course: Tools for Structural Biology Fabian Glaser BKU - Technion 1 2 Identify and align related structures by sequence methods is not an easy task All comparative
More information8 Grundlagen der Bioinformatik, SS 09, D. Huson, April 28, 2009
8 Grundlagen der Bioinformatik, SS 09, D. Huson, April 28, 2009 2 Pairwise alignment We will discuss: 1. Strings 2. Dot matrix method for comparing sequences 3. Edit distance and alignment 4. The number
More informationFinite Automata and Regular languages
Finite Automata and Regular languages Huan Long Shanghai Jiao Tong University Acknowledgements Part of the slides comes from a similar course in Fudan University given by Prof. Yijia Chen. http://basics.sjtu.edu.cn/
More informationThe Separating Words Problem
The Separating Words Problem Jeffrey Shallit School of Computer Science University of Waterloo Waterloo, Ontario N2L 3G1 Canada shallit@cs.uwaterloo.ca https://www.cs.uwaterloo.ca/~shallit 1/54 The Simplest
More informationIntroduction to Formal Languages, Automata and Computability p.1/42
Introduction to Formal Languages, Automata and Computability Pushdown Automata K. Krithivasan and R. Rama Introduction to Formal Languages, Automata and Computability p.1/42 Introduction We have considered
More informationDeterministic Finite Automata (DFAs)
Algorithms & Models of Computation CS/ECE 374, Spring 29 Deterministic Finite Automata (DFAs) Lecture 3 Tuesday, January 22, 29 L A TEXed: December 27, 28 8:25 Chan, Har-Peled, Hassanieh (UIUC) CS374 Spring
More informationCS 121, Section 2. Week of September 16, 2013
CS 121, Section 2 Week of September 16, 2013 1 Concept Review 1.1 Overview In the past weeks, we have examined the finite automaton, a simple computational model with limited memory. We proved that DFAs,
More informationAutomata and Formal Languages - CM0081 Finite Automata and Regular Expressions
Automata and Formal Languages - CM0081 Finite Automata and Regular Expressions Andrés Sicard-Ramírez Universidad EAFIT Semester 2018-2 Introduction Equivalences DFA NFA -NFA RE Finite Automata and Regular
More informationAutomata and Formal Languages - CM0081 Non-Deterministic Finite Automata
Automata and Formal Languages - CM81 Non-Deterministic Finite Automata Andrés Sicard-Ramírez Universidad EAFIT Semester 217-2 Non-Deterministic Finite Automata (NFA) Introduction q i a a q j a q k The
More informationComputational Models - Lecture 5 1
Computational Models - Lecture 5 1 Handout Mode Iftach Haitner and Yishay Mansour. Tel Aviv University. April 10/22, 2013 1 Based on frames by Benny Chor, Tel Aviv University, modifying frames by Maurice
More informationDiscovering Most Classificatory Patterns for Very Expressive Pattern Classes
Discovering Most Classificatory Patterns for Very Expressive Pattern Classes Masayuki Takeda 1,2, Shunsuke Inenaga 1,2, Hideo Bannai 3, Ayumi Shinohara 1,2, and Setsuo Arikawa 1 1 Department of Informatics,
More informationAutomata & languages. A primer on the Theory of Computation. Laurent Vanbever. ETH Zürich (D-ITET) September,
Automata & languages A primer on the Theory of Computation Laurent Vanbever www.vanbever.eu ETH Zürich (D-ITET) September, 24 2015 Last week was all about Deterministic Finite Automaton We saw three main
More informationTobias Markus. January 21, 2015
Automata Advanced Seminar Computer Engineering January 21, 2015 (Advanced Seminar Computer Engineering ) Automata January 21, 2015 1 / 35 1 2 3 4 5 6 obias Markus (Advanced Seminar Computer Engineering
More informationCpSc 421 Final Exam December 15, 2006
CpSc 421 Final Exam December 15, 2006 Do problem zero and six of problems 1 through 9. If you write down solutions for more that six problems, clearly indicate those that you want graded. Note that problems
More informationHidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationFormal Definition of a Finite Automaton. August 26, 2013
August 26, 2013 Why a formal definition? A formal definition is precise: - It resolves any uncertainties about what is allowed in a finite automaton such as the number of accept states and number of transitions
More information