CSE182-L7. Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding CSE182
- Gary Robertson
- 5 years ago
Transcription
1 CSE182-L7 Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding CSE182
2 Bell Labs Honors Pattern matching CSE182
3 Just the Facts Consider the set of all substrings of the query string of fixed length W. Prob. of exact match to a random database string is very low. Prob. of exact match to a true homolog is very high. Keyword Search (exact matches) is MUCH faster than sequence alignment 10/28/14 CSE182
4 Speeding up via an exact-match heuristic. Consider a query string of length m and a db string of length n. Start by looking for exact matches of keywords of length W between the query and the database string. Wherever there is an exact match, perform a SW local alignment.
5 Why is BLAST fast? Assume that keyword searching does not consume any time and that alignment computation is the expensive step. Query m = 1000, random db n = $10^{7}$, no TP. SW = O(nm) = $1000 \times 10^{7} = 10^{10}$ computations. BLAST, W = 11: E(#11-mer hits) = $1000 \times (1/4)^{11} \times 10^{7} \approx 2384$. Number of computations = $2384 \times 100 \times 100 = 2.384 \times 10^{7}$. Ratio = $10^{10} / (2.384 \times 10^{7}) \approx 420$. Further speed improvements are possible.
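The arithmetic on this slide can be re-checked with a few lines of Python. This is only a sketch: the name `window` is an assumption that reads the slide's factor 100*100 as a 100 x 100 region aligned around each keyword hit, and the variable names are illustrative.

```python
# Back-of-the-envelope comparison using the slide's numbers.
m = 1000        # query length
n = 10**7       # database length
W = 11          # keyword length
window = 100    # assumed side of the region aligned around each hit

sw_cost = n * m                          # full Smith-Waterman table: 10^10 cells
expected_hits = m * n * (1 / 4) ** W     # chance W-mer matches in random DNA
blast_cost = expected_hits * window * window
speedup = sw_cost / blast_cost

print(f"expected hits: {expected_hits:.0f}")
print(f"BLAST cost:    {blast_cost:.3e}")
print(f"speedup:       {speedup:.1f}")
```

The computed ratio is about 419.4, which the slide rounds to 420.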
6 Keyword (Dictionary) Matching. How fast can we match keywords? Hash table/db index? What is the size of the hash table, for m=11? Suffix trees? What is the size of the suffix trees? Trie-based search: we will do this in class.
7 The last step in Blast. We have discussed: alignments; db filtering using keywords; scoring matrices; E-values and P-values. The last step: database filtering requires us to scan a large sequence fast for matching keywords.
8 Dictionary Matching. Dictionary: 1:POTATO, 2:POTASSIUM, 3:TASTE. Database: POTASTPOTATO. Q: Given k words ($s_i$ has length $l_i$) and a database of size n, find all matches to these words in the database string. How fast can this be done?
9 Dict. Matching & string matching. How fast can you do it if you only had one word of length m? Trivial algorithm: O(nm) time. With O(m) pre-processing, search takes O(n) time. Dictionary matching: the trivial algorithm takes $(l_1 + l_2 + l_3)n$ time; using a keyword tree, $l_p n$ ($l_p$ is the length of the longest pattern); Aho-Corasick: O(n) after O($l_1 + l_2 + \dots$) preprocessing. We will consider the most general case.
10 Direct Algorithm. [Figure: the dictionary patterns compared against the database POTASTPOTATO at successive positions.] Observations: When we mismatch, we (should) know something about where the next match will be. When there is a mismatch, we (should) know something about other patterns in the dictionary as well.
11 The Trie Automaton. Construct an automaton A from the dictionary. A[v,x] describes the transition from node v to a node w upon reading x; in the figure, A[u,T] = v and A[u,S] = w. There is a special root node r. Some nodes are terminal, and are labeled with the index of the dictionary word. [Figure: trie with root r for 1:POTATO, 2:POTASSIUM, 3:TASTE; terminal nodes carry the labels 1, 2, 3.]
12 An O($l_p n$) algorithm for keyword matching. Start with the first position in the db, and the root node. If the transition succeeds: increment the current pointer and move to the new node; if it is a terminal node, report success. Else: retract the current pointer, increment the start pointer, move to the root, and repeat.
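A minimal Python sketch of this trie-based scan, under stated assumptions: the names `build_trie` and `trie_search` are illustrative (they do not come from the slides), positions are 0-based, and the start pointer `l` and current pointer `c` follow the slide's description.

```python
def build_trie(words):
    """Return (transitions, terminal): transitions[v][x] = w, terminal[v] = word index."""
    transitions = [{}]                  # node 0 plays the role of the root r
    terminal = {}
    for index, word in enumerate(words, start=1):
        v = 0
        for x in word:
            if x not in transitions[v]:
                transitions.append({})
                transitions[v][x] = len(transitions) - 1
            v = transitions[v][x]
        terminal[v] = index             # label the node with the dictionary index
    return transitions, terminal


def trie_search(db, words):
    """Report (start, word_index) for every dictionary word occurring in db."""
    transitions, terminal = build_trie(words)
    hits = []
    for l in range(len(db)):            # start pointer
        v, c = 0, l                     # current node and current pointer
        while c < len(db) and db[c] in transitions[v]:
            v = transitions[v][db[c]]   # successful transition
            c += 1
            if v in terminal:
                hits.append((l, terminal[v]))
        # On failure the loop ends: c is retracted and the scan restarts
        # from the root at the next start position.
    return hits


print(trie_search("POTASTPOTATO", ["POTATO", "POTASSIUM", "TASTE"]))
```

Each start position walks at most $l_p$ edges before failing, which is where the O($l_p n$) bound comes from.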
13 Illustration: [Figure: database POTASTPOTATO with start pointer l and current pointer c, and the trie for 1:POTATO, 2:POTASSIUM, 3:TASTE with current node v.]
14 Idea for improving the time. Suppose we have partially matched pattern i (indicated by l and c), but fail subsequently. If some other pattern j is to match, then some prefix of pattern j must equal a suffix of the first c-l characters of pattern i. [Figure: database POTASTPOTATO with pointers l and c, with pattern i and pattern j aligned beneath it; dictionary 1:POTATO, 2:POTASSIUM, 3:TASTE.]
15 An O(n) alg. for keyword matching. Start with the first position in the db, and the root node. If the transition succeeds: increment the current pointer and move to the new node; if it is a terminal node, report success. Else, if at the root: increment the current pointer, move the start pointer, and stay at the root. Else: move the start pointer forward and move to the failure node.
16 Failure function. Every node v corresponds to a string $s_v$ that is a prefix of some pattern. Define F[v] to be the node u such that $s_u$ is the longest proper suffix of $s_v$. If we fail to match at v, we should jump to F[v] and commence matching from there. Let lp[v] = $|s_u|$. [Figure: the trie for POTATO, POTASSIUM, TASTE with nodes labeled $n_1$ to $n_{10}$.]
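The failure function and the resulting linear scan can be sketched in Python as follows. This is a hedged illustration rather than the lecture's own code: the names `build_automaton`, `node_of`, and `search` are hypothetical, node numbers do not correspond to the figure's $n_i$ labels, and `depth[F[v]]` plays the role of lp[v]. Unlike the slide's outline, the scan also walks the failure chain at each step so that words ending inside a longer partial match are still reported.

```python
from collections import deque


def build_automaton(words):
    """Trie plus failure links F (here `fail`); node 0 is the root."""
    goto, terminal, depth = [{}], {}, [0]
    for index, word in enumerate(words, start=1):
        v = 0
        for x in word:
            if x not in goto[v]:
                goto.append({})
                depth.append(depth[v] + 1)
                goto[v][x] = len(goto) - 1
            v = goto[v][x]
        terminal[v] = index
    fail = [0] * len(goto)
    queue = deque(goto[0].values())     # children of the root fail to the root
    while queue:                        # breadth-first, so shallower links exist first
        v = queue.popleft()
        for x, w in goto[v].items():
            u = fail[v]
            while u and x not in goto[u]:
                u = fail[u]
            fail[w] = goto[u].get(x, 0)
            queue.append(w)
    return goto, terminal, fail, depth


def node_of(goto, s):
    """Node reached from the root by spelling s (s must be a prefix of a word)."""
    v = 0
    for x in s:
        v = goto[v][x]
    return v


def search(db, words):
    """Report (start, word_index) for every occurrence, scanning db once."""
    goto, terminal, fail, depth = build_automaton(words)
    hits, v = [], 0
    for c, x in enumerate(db):
        while v and x not in goto[v]:
            v = fail[v]                 # the start pointer jumps forward
        v = goto[v].get(x, 0)
        u = v
        while u:                        # report words that end at position c
            if u in terminal:
                hits.append((c + 1 - depth[u], terminal[u]))
            u = fail[u]
    return hits


words = ["POTATO", "POTASSIUM", "TASTE"]
goto, terminal, fail, depth = build_automaton(words)
v = node_of(goto, "POTAS")
print(depth[fail[v]])                   # length of the longest suffix of POTAS that is a prefix
print(search("POTASTPOTATO", words))
```

For the node spelling POTAS, the failure link points to the node spelling TAS, so a mismatch after POTAS resumes matching inside TASTE instead of returning to the root.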
17 Illustration. What is F($n_{10}$)? What is F($n_5$)? F($n_3$)? lp($n_{10}$)? [Figure: the same trie with labeled nodes.]
18 Illustration. Database POTASTPOTATO, l = 1, c = 1. [Figure: trie with current node v marked.]
19 Illustration. Database POTASTPOTATO, l = 1, c = 2. [Figure: trie with current node v marked.]
20 Illustration. Database POTASTPOTATO, l = 1, c = 6. [Figure: trie with current node v marked.]
21 Illustration. Database POTASTPOTATO, l = 3, c = 6. [Figure: trie with current node v marked.]
22 Illustration. Database POTASTPOTATO, l = 3, c = 7. [Figure: trie with current node v marked.]
23 Illustration. Database POTASTPOTATO, l = 7, c = 7. [Figure: trie with current node v marked.]
24 Illustration. Database POTASTPOTATO, l = 7, c = 8. [Figure: trie with current node v marked.]
25 Illustration. Database POTASTPOTATO, l = 7, c = 7. [Figure: trie with current node v marked.]
26 Time analysis. In each step, either c is incremented or l is incremented. Neither pointer is ever decremented (lp[v] < c-l). l and c do not exceed n. Total time <= 2n.
27 Blast: Putting it all together. Input: query of length m, database of size n. Select word-size, scoring matrix, gap penalties, E-value cutoff.
28 Blast Steps. 1. Generate an automaton of all query keywords. 2. Scan the database using a dictionary matching algorithm (O(n) time). Identify all hits. 3. Extend each hit using a variant of the local alignment algorithm. Use the scoring matrix and gap penalties. 4. For each alignment with score S, compute the E-value and the P-value. Sort by increasing E-value until the cut-off is reached. 5. Output results.
29 BLAST output
30 Distant hits
31 Family assignment question. Query A has a distant match to B and C from the database. Is A similar to B, or to C? Should A inherit the function of B, or of C?
32 Silly Quiz Skin patterns Facial Features Fa 07 CSE182
33 Not all features (residues) are important. Skin patterns; facial features.
34 Diverged family members provide key features.
35 Protein sequence motifs. Premise: The sequence of a protein gives clues about its structure and function. Not all residues are equally important in determining function. Suppose we knew the key residues of a family. If our query matches in those residues, it is a member. Otherwise, it is not. How can we identify these key residues?
36 Regular expressions as protein sequence motifs. C-X-[DE]-X{10,12}-C-X-C--[STYLV]
37 The sequence analysis perspective. Zinc Finger motif (Prosite database): C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H, with 2 conserved C and 2 conserved H. How can we search a database using these motifs? The motif is described using a regular expression. What is a regular expression?
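One way to search with such a motif is to translate it into the syntax of a general-purpose regular-expression engine. The sketch below uses Python's `re` module; `prosite_to_regex` is a hypothetical helper that handles only the notation appearing in this pattern (`x` for any residue, `(a,b)` for repeat counts, `[...]` for alternatives), not the full Prosite syntax, and the toy sequence is made up for illustration.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple Prosite-style pattern into Python regex syntax."""
    parts = []
    for element in pattern.split("-"):
        element = element.replace("x", ".").replace("X", ".")    # any residue
        element = element.replace("(", "{").replace(")", "}")    # repeat counts
        parts.append(element)
    return "".join(parts)


zinc_finger = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
regex = prosite_to_regex(zinc_finger)
print(regex)

# Made-up sequence built to contain one instance of the motif.
toy = "MK" + "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H" + "GG"
match = re.search(regex, toy)
print(match.start(), match.group())
```

`re.search` reports the leftmost match anywhere in the sequence, which corresponds to asking whether some substring of the database belongs to the motif.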
38 End of L CSE182
39 Regular Expressions. Concise representation of a set of strings over an alphabet $\Sigma$. Described by a string over $\{\Sigma, \cdot, *, +\}$. R is a r.e. if and only if: $R = \{\varepsilon\}$ (base case); $R = \{\sigma\}$, $\sigma \in \Sigma$ (base case); $R = R_1 + R_2$ (union of strings); $R = R_1 \cdot R_2$ (concatenation); $R = R_1^{*}$ (0 or more repetitions).
40 Regular Expression. Q: Let Σ = {A,C,E}. Is (A+C)*EEC* a regular expression? *(A+C)? AC*..E? Q: When is a string s in a regular expression? R = (A+C)*EEC*. Is CEEC in R? AEC? ACEE?
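The membership questions can be checked mechanically. The sketch below assumes Python's `re` syntax, in which the slide's union (A+C) is written as the character class `[AC]`; `fullmatch` requires the whole string to belong to R.

```python
import re

# (A+C)*EEC* from the slide, rewritten in Python regex syntax.
R = re.compile(r"[AC]*EEC*")

results = {s: R.fullmatch(s) is not None for s in ["CEEC", "AEC", "ACEE"]}
for s, in_R in results.items():
    print(s, in_R)
```

CEEC and ACEE are in R; AEC is not, because R requires two consecutive E's.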
41 Regular Expression & Automata. Every R.E. can be expressed by an automaton (a directed graph) with the following properties: the automaton has a start and an end node; each edge is labeled with a symbol from Σ, or ε. Suppose R is described by automaton A. Then s ∈ R if and only if there is a path from start to end in A labeled with s.
42 Examples: Regular Expression & Automata. [Figure: automaton for (A+C)*EEC* with a start node, edges labeled A, C, E, E, C, and an end node.]
43 Constructing automata from R.E. [Figure: one automaton construction per case, R = {ε}; R = {σ}, σ ∈ Σ; R = R_1 + R_2; R = R_1 R_2; R = R_1*, with sub-automata joined by ε edges.]
44 Matching Regular expressions. A string s belongs to R if and only if there is a path from START to END in $R_A$ labeled by s. Given a regular expression R (automaton $R_A$) and a database D, is there a substring D[b..c] that matches $R_A$ (D[b..c] ∈ R)? Simpler Q: Is D[1..c] accepted by the automaton of R?
45 Alg. for matching R.E. If D[1..c] is accepted by the automaton $R_A$, there is a path labeled D[1]..D[c] that goes from START to END in $R_A$. [Figure: a path from START to END with edges labeled D[1], ε, D[2], ..., D[c].]
46 Alg. for matching R.E. If D[1..c] is accepted by the automaton $R_A$, there is a path labeled D[1]..D[c] that goes from START to END in $R_A$. That is, there is a path labeled D[1]..D[c-1] from START to some node u, and a path labeled D[c] from u to END.
47 D.P. to match regular expression. Define: A[u,σ] = automaton node reached from u after reading σ. Eps(u) = set of all nodes reachable from node u using epsilon transitions. N[c] = subset of nodes reachable from the START node after reading D[1..c]. Q: When is v ∈ N[c]?
48 D.P. to match regular expression. Q: When is v ∈ N[c]? A: If for some u ∈ N[c-1], w = A[u, D[c]] and v ∈ {w} ∪ Eps(w).
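The recurrence can be sketched in Python as below. Everything concrete here is an assumption made for illustration: the automaton is a hand-built one for (A+C)*EEC* with one ε edge, node numbers are arbitrary, transitions map to sets of nodes (so several targets per symbol are allowed), and N[0] is taken to be the ε-closure of START.

```python
def eps_closure(nodes, eps):
    """All nodes reachable from `nodes` using zero or more epsilon transitions."""
    closure, stack = set(nodes), list(nodes)
    while stack:
        u = stack.pop()
        for v in eps.get(u, ()):
            if v not in closure:
                closure.add(v)
                stack.append(v)
    return closure


def reachable_sets(D, delta, eps, start):
    """Return the list N, where N[c] holds the nodes reachable after reading D[1..c]."""
    N = [eps_closure({start}, eps)]
    for symbol in D:
        step = set()
        for u in N[-1]:                      # some u in N[c-1]
            step |= delta.get((u, symbol), set())
        N.append(eps_closure(step, eps))     # add everything in Eps(w)
    return N


# Hand-built automaton for (A+C)*EEC*: START = 0, END = 3.
START, END = 0, 3
delta = {(0, "A"): {0}, (0, "C"): {0}, (0, "E"): {1}, (1, "E"): {2}, (3, "C"): {3}}
eps = {2: {3}}

N = reachable_sets("CEEC", delta, eps, START)
accepted_prefixes = [c for c in range(len(N)) if END in N[c]]
print(N)
print(accepted_prefixes)
```

Each N[c] is computed from N[c-1] alone, so the table is filled in one left-to-right pass over D.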
49 Algorithm CSE182
50 The final step. We have answered the question: Is D[1..c] accepted by R? Yes, if END ∈ N[c]. We need to answer: is D[l..c] (for some l and some c) accepted by R? D[l..c] ∈ R for some l if and only if D[1..c] ∈ Σ*R.
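The reduction from substring matching to prefix matching can be checked on a small example. This sketch uses Python's `re` module in place of the automaton; the database string and the name `end_positions` are made up for illustration, and Σ = {A, C, E} is written as the character class `[ACE]`.

```python
import re

R = r"[AC]*EEC*"           # (A+C)*EEC*
SIGMA_STAR = r"[ACE]*"     # any string over the alphabet {A, C, E}


def end_positions(D):
    """Positions c (1-based) such that D[l..c] is in R for some l."""
    ends = []
    for c in range(1, len(D) + 1):
        # Direct definition: try every start position l.
        some_substring = any(re.fullmatch(R, D[l:c]) for l in range(c))
        # Reduction from the slide: the prefix D[1..c] is in Sigma* R.
        whole_prefix = re.fullmatch(SIGMA_STAR + R, D[:c]) is not None
        assert some_substring == whole_prefix
        if whole_prefix:
            ends.append(c)
    return ends


print(end_positions("AECEECA"))
```

Prepending Σ* lets the prefix-matching procedure absorb the unknown start position l, so no separate loop over l is needed.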
51 END of L CSE182
More informationLecture 4 : Adaptive source coding algorithms
Lecture 4 : Adaptive source coding algorithms February 2, 28 Information Theory Outline 1. Motivation ; 2. adaptive Huffman encoding ; 3. Gallager and Knuth s method ; 4. Dictionary methods : Lempel-Ziv
More informationCFG PSA Algorithm. Sequence Alignment Guided By Common Motifs Described By Context Free Grammars
FG PS lgorithm Sequence lignment Guided By ommon Motifs Described By ontext Free Grammars motivation Find motifs- conserved regions that indicate a biological function or signature. Other algorithm do
More informationDeterministic Finite Automata (DFAs)
Algorithms & Models of Computation CS/ECE 374, Fall 27 Deterministic Finite Automata (DFAs) Lecture 3 Tuesday, September 5, 27 Sariel Har-Peled (UIUC) CS374 Fall 27 / 36 Part I DFA Introduction Sariel
More informationSingle alignment: Substitution Matrix. 16 march 2017
Single alignment: Substitution Matrix 16 march 2017 BLOSUM Matrix BLOSUM Matrix [2] (Blocks Amino Acid Substitution Matrices ) It is based on the amino acids substitutions observed in ~2000 conserved block
More informationINF 4130 / /8-2014
INF 4130 / 9135 26/8-2014 Mandatory assignments («Oblig-1», «-2», and «-3»): All three must be approved Deadlines around: 25. sept, 25. oct, and 15. nov Other courses on similar themes: INF-MAT 3370 INF-MAT
More informationHashing Techniques For Finite Automata
Hashing Techniques For Finite Automata Hady Zeineddine Logic Synthesis Course Project - Spring 2007 Professor Adnan Aziz 1. Abstract This report presents two hashing techniques - Alphabet and Power-Set
More information2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity.
Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity. A global picture of the protein universe will help us to understand
More informationCS21 Decidability and Tractability
CS21 Decidability and Tractability Lecture 2 January 5, 2018 January 5, 2018 CS21 Lecture 2 1 Outline Finite Automata Nondeterministic Finite Automata Closure under regular operations NFA, FA equivalence
More informationMining Emerging Substrings
Mining Emerging Substrings Sarah Chan Ben Kao C.L. Yip Michael Tang Department of Computer Science and Information Systems The University of Hong Kong {wyschan, kao, clyip, fmtang}@csis.hku.hk Abstract.
More informationSequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University
Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of
More informationCSE 311: Foundations of Computing. Lecture 23: Finite State Machine Minimization & NFAs
CSE : Foundations of Computing Lecture : Finite State Machine Minimization & NFAs State Minimization Many different FSMs (DFAs) for the same problem Take a given FSM and try to reduce its state set by
More informationBio nformatics. Lecture 3. Saad Mneimneh
Bio nformatics Lecture 3 Sequencing As before, DNA is cut into small ( 0.4KB) fragments and a clone library is formed. Biological experiments allow to read a certain number of these short fragments per
More informationOptimizing Finite Automata
Optimizing Finite Automata We can improve the DFA created by MakeDeterministic. Sometimes a DFA will have more states than necessary. For every DFA there is a unique smallest equivalent DFA (fewest states
More informationConverting SLP to LZ78 in almost Linear Time
CPM 2013 Converting SLP to LZ78 in almost Linear Time Hideo Bannai 1, Paweł Gawrychowski 2, Shunsuke Inenaga 1, Masayuki Takeda 1 1. Kyushu University 2. Max-Planck-Institut für Informatik Recompress SLP
More informationHierarchical Overlap Graph
Hierarchical Overlap Graph B. Cazaux and E. Rivals LIRMM & IBC, Montpellier 8. Feb. 2018 arxiv:1802.04632 2018 B. Cazaux & E. Rivals 1 / 29 Overlap Graph for a set of words Consider the set P := {abaa,
More informationTHEORY. Based on sequence Length According to the length of sequence being compared it is of following two types
Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between
More information