RNA Secondary Structure Prediction
- Ralf Rose
- 5 years ago
1 RNA Secondary Structure Prediction
2 RNA structure prediction methods: base-pair maximization; context-free grammar parsing; free energy methods; covariance models
3 The Nussinov-Jacobson Algorithm [Figure: dynamic-programming matrix for the sequence ACAGUUGCA (q = 9), with the sequence along both axes]
4 SCFG Version: The Nussinov algorithm can be converted to a stochastic context-free grammar: S → W; W → aW | cW | gW | uW; W → Wa | Wc | Wg | Wu; W → aWu | cWg | uWa | gWc; W → WW | ε
5 SCFGs: Stochastic context-free grammars (SCFGs) have also been used to model RNA secondary structure. Examples: tRNAscan-SE; a program created to find snoRNAs. Grammars are built from a training set of data and then applied to candidate sequences to see whether they fit into the language.
6 SCFGs: SCFGs allow the detection of sequences belonging to a family, such as tRNAs, group I introns, snoRNAs, and snRNAs.
7 SCFGs: Any RNA secondary structure without pseudoknots can be described by an SCFG (see Durbin et al., p. )
8 Transformational Grammars: First described by the linguist Noam Chomsky in the 1950s. (Yes, the same Noam Chomsky who has expressed various dissident political views over the years!)
9 13 June
11 Transformational Grammars: Very important in computer science, most notably in compiler design; covered in detail in compiler and automata classes.
12 Transformational Grammars: Idea: take an output (a sentence, an RNA structure) and determine whether it can be produced using a set of rules. A grammar consists of a set of symbols and production rules. The symbols can be terminal (emitting) symbols or nonterminal symbols.
17 Grammar for Palindromes: Consider palindromic DNA sequences. There are five possible terminal symbols: {a, c, g, t, ε} (ε represents the blank terminal symbol).
18 Grammar for Palindromes: Production rules, where S and W are nonterminal symbols: S → W; W → aWa | cWc | gWg | tWt; W → a | c | g | t | ε
19 Derivation of Sequences: Using these production rules, a derivation of the palindromic sequence acttgttca follows: S ⇒ W ⇒ aWa ⇒ acWca ⇒ actWtca ⇒ acttWttca ⇒ acttgttca
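A small sketch of the palindrome grammar follows. It assumes "palindrome" means a string that reads the same in both directions, which is exactly what the rules W → aWa | cWc | gWg | tWt encode; the reverse-complement sense is not covered. It also assumes W → a | c | g | t | ε as the terminating rules. The sketch checks membership in the language of S and rebuilds the derivation of acttgttca. The function names `derives_from_w`, `in_palindrome_language`, and `derivation` are hypothetical.

```python
def derives_from_w(s: str) -> bool:
    """True if W =>* s under W -> aWa | cWc | gWg | tWt | a | c | g | t | eps."""
    if len(s) == 0:
        return True                      # W -> eps
    if len(s) == 1:
        return s in "acgt"               # W -> a | c | g | t
    # W -> xWx: the outer symbols must match and be a valid base.
    return s[0] == s[-1] and s[0] in "acgt" and derives_from_w(s[1:-1])


def in_palindrome_language(s: str) -> bool:
    """S -> W is the only S rule, so L(S) = L(W)."""
    return derives_from_w(s)


def derivation(s: str) -> list[str]:
    """Sentential forms of the derivation of s (assumes s is in the language)."""
    steps = ["S", "W"]
    left, right = "", ""
    while len(s) > 1:
        # Apply W -> xWx for the current outermost symbol x.
        left, right = left + s[0], s[-1] + right
        s = s[1:-1]
        steps.append(left + "W" + right)
    steps.append(left + s + right)       # final W -> single base, or W -> eps
    return steps


print(" => ".join(derivation("acttgttca")))
```

Because every W rule emits symbols symmetrically, the recursion peels one matching pair off each end per step. That is the same order in which the derivation above grows from the outside in.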
21 SCFGs for RNA: Base-paired columns are modeled by pairwise-emitting nonterminals (aWu, uWa, gWc, cWg, ...); single-stranded columns are modeled by leftwise-emitting nonterminals (aW, cW, gW, uW, ...) when possible.
22 Parse Trees: A context-free grammar can be aligned to a sequence using a parse tree. The root of the tree is the nonterminal start symbol, S; the leaves are terminal symbols; the internal nodes are nonterminals. Reading the leaves from left to right gives the result of the production.
24 Parse Tree [Figure: parse tree for acttgttca, with root S, a chain of internal W nodes, and leaves a c t t g t t c a]
30 Amirkabir University of Technology, Department of Computer Engineering. CYK (Cocke-Younger-Kasami) Parsing Algorithm. Seyed Mohammad Hossein Moattar. Natural Language Processing.
31 Parsing Algorithms: CFGs are the basis for describing the (syntactic) structure of natural-language (NL) sentences; thus, parsing algorithms are at the core of NL analysis systems. Recognition vs. parsing: recognition decides membership in the language; parsing is recognition plus producing a parse tree for the input. Is parsing more difficult than recognition (in time complexity)? Ambiguity: an input may have exponentially many parses.
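To make the "exponentially many parses" point concrete, here is a minimal sketch that counts parse trees for a hypothetical, deliberately ambiguous grammar S → S S | a. The grammar is an illustrative assumption and does not come from the slides. The counts are the Catalan numbers, which grow exponentially. A dynamic-programming table can still count them in polynomial time without listing them. For this particular grammar all spans of the same length behave identically, so memoizing on span length alone is enough.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_parses(n: int) -> int:
    """Number of distinct parse trees for the string a^n under S -> S S | a."""
    if n == 1:
        return 1  # only S -> a
    # Any parse of a span of length n >= 2 starts with S -> S S, splitting after k symbols;
    # the left and right subtrees are chosen independently, so their counts multiply.
    return sum(count_parses(k) * count_parses(n - k) for k in range(1, n))


print([count_parses(n) for n in range(1, 9)])
```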
32 CYK (Cocke-Younger-Kasami): One of the earliest recognition and parsing algorithms. The standard version of CYK can only recognize languages defined by context-free grammars in Chomsky Normal Form (CNF). It is also possible to extend the CYK algorithm to handle some grammars that are not in CNF, but the extensions are harder to understand. It is based on a dynamic programming approach: build solutions compositionally from sub-solutions, and store sub-solutions so they can be reused whenever necessary. Recognition version: decide whether S ⇒* w.
33 CYK Algorithm: The CYK algorithm for the membership problem is as follows.
Let the input string be a sequence of n letters a1 ... an.
Let the grammar contain r nonterminal symbols R1 ... Rr, and let R1 be the start symbol.
Let P[n,n,r] be an array of booleans. Initialize all elements of P to false.
For each i = 1 to n
    For each unit production Rj -> ai, set P[i,1,j] = true
For each i = 2 to n                -- length of span
    For each j = 1 to n-i+1        -- start of span
        For each k = 1 to i-1      -- partition of span
            For each production RA -> RB RC
                If P[j,k,B] and P[j+k,i-k,C] then set P[j,i,A] = true
If P[1,n,1] is true
    Then the string is a member of the language
    Else the string is not a member of the language
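The membership procedure above can be sketched in Python as follows. This version uses 0-based positions and stores a set of nonterminal names per (length, start) cell instead of a boolean array over nonterminals, which is equivalent. The grammar encoding (`UNARY`, `BINARY`) and the function name `cyk_recognize` are assumptions for illustration. The grammar is the one from the worked example that follows.

```python
# Grammar G from the worked example: S -> AB | XB, T -> AB | XB, X -> AT, A -> a, B -> b.
UNARY = [("A", "a"), ("B", "b")]                      # rules A -> a
BINARY = [("S", "A", "B"), ("S", "X", "B"),
          ("T", "A", "B"), ("T", "X", "B"),
          ("X", "A", "T")]                             # rules A -> B C


def cyk_recognize(word: str, unary, binary, start: str = "S") -> bool:
    """CYK membership test for a grammar in Chomsky Normal Form.

    P[length][i] holds the nonterminals that derive word[i : i + length]
    (0-based start i); it plays the role of the boolean array P[j, i, A].
    """
    n = len(word)
    if n == 0:
        return False  # this sketch does not handle an S -> epsilon rule
    P = [[set() for _ in range(n)] for _ in range(n + 1)]  # P[0] is unused
    for i, letter in enumerate(word):                       # unit productions
        P[1][i] = {A for A, a in unary if a == letter}
    for length in range(2, n + 1):                          # length of span
        for i in range(n - length + 1):                     # start of span
            for k in range(1, length):                      # left part has k letters
                left, right = P[k][i], P[length - k][i + k]
                for A, B, C in binary:
                    if B in left and C in right:
                        P[length][i].add(A)
    return start in P[n][0]


print(cyk_recognize("aaabbb", UNARY, BINARY))  # True
```

The three nested loops over length, start, and partition give the familiar O(n³ · |G|) running time.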
34 CYK Pseudocode
On input x = x1 x2 ... xn:
for (i = 1 to n)                  // create middle diagonal
    for (each var. A)
        if (A → xi) add A to table[i-1][i]
for (d = 2 to n)                  // d-th diagonal
    for (i = 0 to n-d)
        for (k = i+1 to i+d-1)
            for (each var. A)
                for (each var. B in table[i][k])
                    for (each var. C in table[k][i+d])
                        if (A → BC) add A to table[i][i+d]
return S ∈ table[0][n] ? ACCEPT : REJECT
35 CYK Algorithm: This algorithm considers every possible consecutive subsequence of the sequence of letters and sets P[i,j,k] to true if the subsequence of length j starting at position i can be generated from Rk. Once it has considered subsequences of length 1, it goes on to subsequences of length 2, and so on. For subsequences of length 2 and greater, it considers every possible partition of the subsequence into two parts and checks whether there is some production P -> Q R such that Q matches the first part and R matches the second part. If so, it records P as matching the whole subsequence. Once this process is complete, the sentence is recognized by the grammar if the subsequence containing the entire string is matched by the start symbol.
36 CYK Algorithm for Deciding Context-Free Languages: Q: Consider the grammar G given by S → AB | XB, T → AB | XB, X → AT, A → a, B → b. 1. Is x = aaabbb in L(G)?
37 CYK Algorithm for Deciding Context-Free Languages: Now look at aaabbb (a a a b b b).
38 CYK Algorithm for Deciding Context-Free Languages: 1) Write variables for all length-1 substrings: a a a b b b yields A A A B B B.
39 CYK Algorithm for Deciding Context-Free Languages: 2) Write variables for all length-2 substrings: the only derivable one is ab (positions 3-4), generated by S and T.
40 CYK Algorithm for Deciding Context-Free Languages: 3) Write variables for all length-3 substrings: the only derivable one is aab (positions 2-4), generated by X.
41 CYK Algorithm for Deciding Context-Free Languages: 4) Write variables for all length-4 substrings: the only derivable one is aabb (positions 2-5), generated by S and T.
42 CYK Algorithm for Deciding Context-Free Languages: 5) Write variables for all length-5 substrings: the only derivable one is aaabb (positions 1-5), generated by X.
43 CYK Algorithm for Deciding Context-Free Languages: 6) Write variables for all length-6 substrings: aaabbb (positions 1-6) is generated by S and T. S is included, so the string is accepted!
44 CYK Algorithm for Deciding Context-Free Languages: We can also use a table for the same purpose, with rows indexed by start position (0-5) and columns by end position (1-6).
45 CYK Algorithm for Deciding Context-Free Languages: 1. Variables for length-1 substrings: entries table[i][i+1] for start positions 0-5 hold A; A; A; B; B; B.
46 CYK Algorithm for Deciding Context-Free Languages: 2. Variables for length-2 substrings: entries table[i][i+2] for start positions 0-4 hold -; -; S,T; -; -.
47 CYK Algorithm for Deciding Context-Free Languages: 3. Variables for length-3 substrings: entries table[i][i+3] for start positions 0-3 hold -; X; -; -.
48 CYK Algorithm for Deciding Context-Free Languages: 4. Variables for length-4 substrings: entries table[i][i+4] for start positions 0-2 hold -; S,T; -.
49 CYK Algorithm for Deciding Context-Free Languages: 5. Variables for length-5 substrings: entries table[i][i+5] for start positions 0-1 hold X; -.
50 CYK Algorithm for Deciding Context-Free Languages: 6. Variables for the length-6 substring: table[0][6] holds S,T. ACCEPTED! The completed table:
start\end 1     2     3     4     5     6
0:        A     -     -     -     X     S,T
1:              A     -     X     S,T   -
2:                    A     S,T   -     -
3:                          B     -     -
4:                                B     -
5:                                      B
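The triangular table can be reproduced with a short sketch that follows the table[start][end] convention of the pseudocode. Here table[i][j] covers the substring from position i up to, but not including, position j. The helper names `cyk_table` and `show` and the grammar encoding are hypothetical. The printed layout only approximates the slide's table.

```python
# Grammar G: S -> AB | XB, T -> AB | XB, X -> AT, A -> a, B -> b.
UNARY = [("A", "a"), ("B", "b")]
BINARY = [("S", "A", "B"), ("S", "X", "B"),
          ("T", "A", "B"), ("T", "X", "B"), ("X", "A", "T")]


def cyk_table(word: str, unary, binary):
    """table[i][j] = variables deriving word[i:j] (start at i, end at j)."""
    n = len(word)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, letter in enumerate(word):            # main diagonal: length-1 substrings
        table[i][i + 1] = {A for A, a in unary if a == letter}
    for d in range(2, n + 1):                    # d-th diagonal: substrings of length d
        for i in range(n - d + 1):
            j = i + d
            for k in range(i + 1, j):            # split word[i:j] into word[i:k] + word[k:j]
                for A, B, C in binary:
                    if B in table[i][k] and C in table[k][j]:
                        table[i][j].add(A)
    return table


def show(table, n: int) -> None:
    """Print the upper triangle; '-' marks an empty entry."""
    print("start\\end " + "".join(f"{j:<6}" for j in range(1, n + 1)))
    for i in range(n):
        cells = ["" if j <= i else (",".join(sorted(table[i][j])) or "-")
                 for j in range(1, n + 1)]
        print(f"{i}:".ljust(10) + "".join(f"{c:<6}" for c in cells))


show(cyk_table("aaabbb", UNARY, BINARY), 6)
```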
51 Parsing results: We keep the results for every w_ij in a table. Note that we only need to fill in entries up to the diagonal: the longest substring starting at i has length n-i+1.
52 Constructing the parse tree: We need to construct parse trees for the string w. Idea: keep back-pointers to the table entries that we combine; at the end, reconstruct a parse from the back-pointers. This allows us to find all parse trees.
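The back-pointer idea can be sketched as follows. Each (start, end, variable) cell remembers the first split point and child variables that produced it. The tree is then rebuilt recursively as nested tuples. This sketch keeps only one back-pointer per cell, so it returns a single parse. Storing a list of back-pointers per cell would be needed to enumerate all parse trees. The names `cyk_parse` and `back` and the tuple tree format are hypothetical.

```python
# Grammar G: S -> AB | XB, T -> AB | XB, X -> AT, A -> a, B -> b.
UNARY = [("A", "a"), ("B", "b")]
BINARY = [("S", "A", "B"), ("S", "X", "B"),
          ("T", "A", "B"), ("T", "X", "B"), ("X", "A", "T")]


def cyk_parse(word: str, unary, binary, start: str = "S"):
    """Return one parse tree as nested tuples (variable, children...), or None."""
    n = len(word)
    back = {}  # (i, j, A) -> terminal string (leaf) or (k, B, C) back-pointer
    for i, letter in enumerate(word):
        for A, a in unary:
            if a == letter:
                back[(i, i + 1, A)] = letter
    for d in range(2, n + 1):
        for i in range(n - d + 1):
            j = i + d
            for k in range(i + 1, j):
                for A, B, C in binary:
                    if (i, j, A) not in back and (i, k, B) in back and (k, j, C) in back:
                        back[(i, j, A)] = (k, B, C)  # remember the split and the children

    def build(i: int, j: int, A: str):
        entry = back[(i, j, A)]
        if isinstance(entry, str):
            return (A, entry)                        # leaf: A -> terminal
        k, B, C = entry
        return (A, build(i, k, B), build(k, j, C))   # follow both back-pointers

    return build(0, n, start) if (0, n, start) in back else None


print(cyk_parse("aaabbb", UNARY, BINARY))
```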
53 References: Hopcroft and Ullman, Introduction to Automata Theory, Languages, and Computation, Section 6.3, pp.; "CYK algorithm", Wikipedia, the free encyclopedia; a presentation by Zeph Grunschlag.
54 The Nussinov-Jacobson Algorithm [Figure: dynamic-programming matrix for the sequence ACAGUUGCA (q = 9), with the sequence along both axes]
55 The Nussinov-Jacobson Algorithm [Figure: dynamic-programming matrix for the sequence ACAGUUGCA]
56 The Nussinov-Jacobson Algorithm [Figure: matrix for ACAGUUGCA annotated with the indices i < q, j, q-1, and q]
57 [Figure: co-terminus foldings (example sequence AUCAUGGCAU) and partitionable foldings (example sequence ACAGUUGCA)]
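58 Another way to write the Nussinov-Jacobson recursion
Initialization:
$$\gamma(i, i-1) = 0 \quad \text{for } i = 2 \text{ to } L, \qquad \gamma(i, i) = 0 \quad \text{for } i = 1 \text{ to } L$$
Recursion:
$$\gamma(i, j) = \max \begin{cases} \gamma(i+1, j) \\ \gamma(i, j-1) \\ \gamma(i+1, j-1) + \mathrm{BasePairScore}(i, j) \\ \max_{i < k < j} \left[ \gamma(i, k) + \gamma(k+1, j) \right] \end{cases}$$
[Figure: two special cases of partitionable folding: co-terminus folding and partitionable folding]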
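A minimal Python sketch of this recursion and its traceback follows, using 0-based indices. It assumes BasePairScore(i, j) is 1 for Watson-Crick pairs (A-U, G-C) and 0 otherwise. It also assumes no minimum hairpin-loop length, since the recursion as written imposes none. The function name `nussinov` is hypothetical.

```python
def nussinov(seq: str) -> tuple[int, list[tuple[int, int]]]:
    """Base-pair maximization; returns (max pairs, one optimal list of 0-based pairs)."""
    # Assumption: BasePairScore = 1 for Watson-Crick pairs only; no minimum loop length.
    canonical = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

    def score(i: int, j: int) -> int:
        return 1 if (seq[i], seq[j]) in canonical else 0

    L = len(seq)
    if L == 0:
        return 0, []
    # gamma[i][j] for i <= j; cells with i > j stay 0, which covers gamma(i, i-1) = 0.
    gamma = [[0] * L for _ in range(L)]
    for span in range(1, L):                             # gamma(i, i) = 0 already
        for i in range(L - span):
            j = i + span
            best = max(gamma[i + 1][j],                  # i unpaired
                       gamma[i][j - 1],                  # j unpaired
                       gamma[i + 1][j - 1] + score(i, j))  # i pairs with j
            for k in range(i + 1, j):                    # bifurcation, i < k < j
                best = max(best, gamma[i][k] + gamma[k + 1][j])
            gamma[i][j] = best

    # Traceback: re-derive which case produced each cell, starting from (0, L-1).
    pairs: list[tuple[int, int]] = []
    stack = [(0, L - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if gamma[i][j] == gamma[i + 1][j]:
            stack.append((i + 1, j))
        elif gamma[i][j] == gamma[i][j - 1]:
            stack.append((i, j - 1))
        elif score(i, j) and gamma[i][j] == gamma[i + 1][j - 1] + 1:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if gamma[i][j] == gamma[i][k] + gamma[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return gamma[0][L - 1], sorted(pairs)


print(nussinov("ACAGUUGCA"))
```

For ACAGUUGCA (nine bases), four nested pairs is the most any structure can have, and the fill stage reaches that bound.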
59 SCFG version of the Nussinov-Jacobson algorithm: Stochastic context-free grammars make use of production rules such as W → aW | cW | gW | uW (i unpaired). Every production rule has an associated probability parameter. The maximum-probability parse is equivalent to the maximum-probability secondary structure.
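To illustrate how rule probabilities turn a parse into a score, the sketch below assigns made-up probabilities to the Nussinov-style rules. The probabilities for each nonterminal sum to 1, and an ε rule is assumed so that derivations terminate. The sketch then compares two derivations of the same string "gc". The probability of a derivation is the product of its rule probabilities, so the more probable parse (here, g paired with c) corresponds to the more probable structure. All numbers and the names `RULES`, `check_normalized`, `apply_leftmost`, and `derivation_log_prob` are hypothetical, not trained values.

```python
import math

# Hypothetical rule probabilities (illustrative only); "" stands for W -> epsilon.
RULES = {
    ("S", "W"): 1.0,
    ("W", "aW"): 0.1, ("W", "cW"): 0.1, ("W", "gW"): 0.1, ("W", "uW"): 0.1,        # i unpaired
    ("W", "Wa"): 0.05, ("W", "Wc"): 0.05, ("W", "Wg"): 0.05, ("W", "Wu"): 0.05,    # j unpaired
    ("W", "aWu"): 0.05, ("W", "cWg"): 0.05, ("W", "gWc"): 0.05, ("W", "uWa"): 0.05,  # pairs
    ("W", "WW"): 0.1, ("W", ""): 0.1,                                               # bifurcation, end
}


def check_normalized(rules) -> bool:
    """Probabilities of all rules sharing a left-hand side must sum to 1."""
    totals = {}
    for (lhs, _), p in rules.items():
        totals[lhs] = totals.get(lhs, 0.0) + p
    return all(math.isclose(t, 1.0) for t in totals.values())


def apply_leftmost(steps) -> str:
    """Apply (lhs, rhs) rules to the leftmost occurrence of lhs, starting from S."""
    form = "S"
    for lhs, rhs in steps:
        pos = form.index(lhs)  # raises ValueError if the rule does not apply
        form = form[:pos] + rhs + form[pos + 1:]
    return form


def derivation_log_prob(steps) -> float:
    """Log probability of a derivation = sum of log rule probabilities."""
    return sum(math.log(RULES[rule]) for rule in steps)


PAIRED = [("S", "W"), ("W", "gWc"), ("W", "")]                 # g pairs with c
UNPAIRED = [("S", "W"), ("W", "gW"), ("W", "cW"), ("W", "")]   # both bases unpaired
print(apply_leftmost(PAIRED), math.exp(derivation_log_prob(PAIRED)))
print(apply_leftmost(UNPAIRED), math.exp(derivation_log_prob(UNPAIRED)))
```

Working in log space turns products into sums, which is the form the CYK recursion for this SCFG uses.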
60 SCFG Version of the Nussinov-Jacobson Algorithm: The algorithm can be converted to a stochastic context-free grammar: S → W; W → aW | cW | gW | uW; W → Wa | Wc | Wg | Wu; W → aWu | cWg | uWa | gWc; W → WW | ε
61 Needed terminology: The inside-outside (recursive dynamic programming) algorithm for SCFGs in Chomsky normal form is the natural counterpart of the forward-backward algorithm for HMMs. The best-path variant of the inside-outside algorithm is the Cocke-Younger-Kasami (CYK) algorithm; it finds the maximum-probability alignment of the SCFG to the sequence.
62 CYK for Nussinov-style RNA SCFG
Initialization:
$$\gamma(i, i-1) = -\infty \quad \text{for } i = 2 \text{ to } L, \qquad \gamma(i, i) = \max\left[\log p(x_i W),\ \log p(W x_i)\right] \quad \text{for } i = 1 \text{ to } L$$
This parallels the fill stage of the Nussinov algorithm; the principal difference is that the SCFG description is a probabilistic model.
Recursion:
$$\gamma(i, j) = \max \begin{cases} \gamma(i+1, j) + \log p(x_i W) \\ \gamma(i, j-1) + \log p(W x_j) \\ \gamma(i+1, j-1) + \log p(x_i W x_j) \\ \max_{i < k < j} \left[ \gamma(i, k) + \gamma(k+1, j) \right] + \log p(WW) \end{cases}$$
[Figure: two special cases of partitionable folding: co-terminus folding and partitionable folding]
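A sketch of this log-space CYK fill with a Nussinov-style traceback follows. It uses 1-based indices so that γ(i, i−1) can be stored directly. All log-probabilities are hypothetical illustrative values, not a trained model. The initialization follows the slide literally, so a single-base cell scores only its emitting rule and no separate W → ε probability is added. The names `scfg_cyk`, `LOG_LEFT`, `LOG_RIGHT`, `LOG_PAIR`, and `LOG_BIF` are assumptions.

```python
import math

NEG_INF = float("-inf")
# Hypothetical log-probabilities (illustrative only).
LOG_LEFT = {b: math.log(0.1) for b in "acgu"}    # W -> x_i W   (i unpaired)
LOG_RIGHT = {b: math.log(0.05) for b in "acgu"}  # W -> W x_j   (j unpaired)
LOG_PAIR = {p: math.log(0.05) for p in [("a", "u"), ("c", "g"), ("g", "c"), ("u", "a")]}
LOG_BIF = math.log(0.1)                          # W -> W W


def scfg_cyk(x: str):
    """Return (gamma(1, L), 1-based base pairs of the max-probability parse)."""
    L = len(x)
    s = " " + x                                       # 1-based indexing, as in the slides
    g = [[NEG_INF] * (L + 1) for _ in range(L + 2)]   # g[i][j] = gamma(i, j); gamma(i, i-1) = -inf
    for i in range(1, L + 1):
        g[i][i] = max(LOG_LEFT[s[i]], LOG_RIGHT[s[i]])
    for span in range(1, L):
        for i in range(1, L - span + 1):
            j = i + span
            best = max(g[i + 1][j] + LOG_LEFT[s[i]], g[i][j - 1] + LOG_RIGHT[s[j]])
            if (s[i], s[j]) in LOG_PAIR:
                best = max(best, g[i + 1][j - 1] + LOG_PAIR[(s[i], s[j])])
            for k in range(i + 1, j):                 # bifurcation, i < k < j
                best = max(best, g[i][k] + g[k + 1][j] + LOG_BIF)
            g[i][j] = best

    # Traceback: recompute each candidate and follow the one that equals the stored value.
    pairs = []
    stack = [(1, L)] if L else []
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue                                  # single bases contain no pairs
        v = g[i][j]
        if v == g[i + 1][j] + LOG_LEFT[s[i]]:
            stack.append((i + 1, j))
        elif v == g[i][j - 1] + LOG_RIGHT[s[j]]:
            stack.append((i, j - 1))
        elif (s[i], s[j]) in LOG_PAIR and v == g[i + 1][j - 1] + LOG_PAIR[(s[i], s[j])]:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if v == g[i][k] + g[k + 1][j] + LOG_BIF:
                    stack += [(i, k), (k + 1, j)]
                    break
    return (g[1][L] if L else NEG_INF), sorted(pairs)


print(scfg_cyk("gaac"))
```

Because γ(i, i−1) = −∞, two adjacent bases can never pair. With these particular numbers, "gc" stays unpaired, while "gaac" folds into one g-c pair enclosing two unpaired a's.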
63 CYK for Nussinov-style RNA SCFG (2): log P(x, π̂) is the log likelihood of the optimal structure π̂ given the SCFG model. The traceback to find the secondary structure corresponding to the best score is performed analogously to the traceback in the Nussinov algorithm.
64 Example of RNA Structure SCFG: For the RNA structure of the following sequence, produced by MFOLD, a derivation can be constructed (5′ to 3′): GCUUACGACCAUAUCACGUUGAAUGCAC GCCAUCCCGUCCGAUCUGGCAAGUUAAG CAACGUUGAGUCCAGUUAGUACUUGGAU CGGAGACGGCCUGGGAAUCCUGGAUGU UGUAAGCU
65 Example Construction: S ⇒ W ⇒ Wu ⇒ gWcu ⇒ gcWgcu ⇒ gcuWagcu ⇒ gcuuWaagcu ⇒ gcuuaWuaagcu ⇒ gcuuacWguaagcu ⇒ gcuuacgWuguaagcu ⇒ gcuuacgaWuuguaagcu ⇒ gcuuacgacWguuguaagcu ⇒ gcuuacgaccWguuguaagcu ⇒ gcuuacgaccaWguuguaagcu ⇒ ...
66 CYK for Nussinov-style RNA SCFG: A good starting example, but too simple to be an accurate RNA folder. The algorithm does not consider important structural features, such as preferences for certain loop lengths, or nearest-neighbour effects in the structure caused by stacking interactions between neighbouring base pairs in a stem.
Parsing Unger s Parser Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Winter 2016/17 1 / 21 Table of contents 1 Introduction 2 The Parser 3 An Example 4 Optimizations 5 Conclusion 2 / 21 Introduction
More informationOn the Sizes of Decision Diagrams Representing the Set of All Parse Trees of a Context-free Grammar
Proceedings of Machine Learning Research vol 73:153-164, 2017 AMBN 2017 On the Sizes of Decision Diagrams Representing the Set of All Parse Trees of a Context-free Grammar Kei Amii Kyoto University Kyoto
More informationSection 1 (closed-book) Total points 30
CS 454 Theory of Computation Fall 2011 Section 1 (closed-book) Total points 30 1. Which of the following are true? (a) a PDA can always be converted to an equivalent PDA that at each step pops or pushes
More informationAdvanced Natural Language Processing Syntactic Parsing
Advanced Natural Language Processing Syntactic Parsing Alicia Ageno ageno@cs.upc.edu Universitat Politècnica de Catalunya NLP statistical parsing 1 Parsing Review Statistical Parsing SCFG Inside Algorithm
More informationRNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology"
RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology" Day 1" Many biologically interesting roles for RNA" RNA secondary structure prediction" 3 4 Approaches to Structure
More informationSimplification of CFG and Normal Forms. Wen-Guey Tzeng Computer Science Department National Chiao Tung University
Simplification of CFG and Normal Forms Wen-Guey Tzeng Computer Science Department National Chiao Tung University Normal Forms We want a cfg with either Chomsky or Greibach normal form Chomsky normal form
More informationSimplification of CFG and Normal Forms. Wen-Guey Tzeng Computer Science Department National Chiao Tung University
Simplification of CFG and Normal Forms Wen-Guey Tzeng Computer Science Department National Chiao Tung University Normal Forms We want a cfg with either Chomsky or Greibach normal form Chomsky normal form
More informationCS Pushdown Automata
Chap. 6 Pushdown Automata 6.1 Definition of Pushdown Automata Example 6.2 L ww R = {ww R w (0+1) * } Palindromes over {0, 1}. A cfg P 0 1 0P0 1P1. Consider a FA with a stack(= a Pushdown automaton; PDA).
More informationDecision problem of substrings in Context Free Languages.
Decision problem of substrings in Context Free Languages. Mauricio Osorio, Juan Antonio Navarro Abstract A context free grammar (CFG) is a set of symbols and productions used to define a context free language.
More informationCS20a: summary (Oct 24, 2002)
CS20a: summary (Oct 24, 2002) Context-free languages Grammars G = (V, T, P, S) Pushdown automata N-PDA = CFG D-PDA < CFG Today What languages are context-free? Pumping lemma (similar to pumping lemma for
More informationComputational Models - Lecture 4 1
Computational Models - Lecture 4 1 Handout Mode Iftach Haitner and Yishay Mansour. Tel Aviv University. April 3/8, 2013 1 Based on frames by Benny Chor, Tel Aviv University, modifying frames by Maurice
More informationComputational Models - Lecture 4 1
Computational Models - Lecture 4 1 Handout Mode Iftach Haitner. Tel Aviv University. November 21, 2016 1 Based on frames by Benny Chor, Tel Aviv University, modifying frames by Maurice Herlihy, Brown University.
More informationAutomata Theory CS F-08 Context-Free Grammars
Automata Theory CS411-2015F-08 Context-Free Grammars David Galles Department of Computer Science University of San Francisco 08-0: Context-Free Grammars Set of Terminals (Σ) Set of Non-Terminals Set of
More informationLECTURER: BURCU CAN Spring
LECTURER: BURCU CAN 2017-2018 Spring Regular Language Hidden Markov Model (HMM) Context Free Language Context Sensitive Language Probabilistic Context Free Grammar (PCFG) Unrestricted Language PCFGs can
More informationAutomata Theory - Quiz II (Solutions)
Automata Theory - Quiz II (Solutions) K. Subramani LCSEE, West Virginia University, Morgantown, WV {ksmani@csee.wvu.edu} 1 Problems 1. Induction: Let L denote the language of balanced strings over Σ =
More informationLogic. proof and truth syntacs and semantics. Peter Antal
Logic proof and truth syntacs and semantics Peter Antal antal@mit.bme.hu 10/9/2015 1 Knowledge-based agents Wumpus world Logic in general Syntacs transformational grammars Semantics Truth, meaning, models
More informationPushdown Automata (Pre Lecture)
Pushdown Automata (Pre Lecture) Dr. Neil T. Dantam CSCI-561, Colorado School of Mines Fall 2017 Dantam (Mines CSCI-561) Pushdown Automata (Pre Lecture) Fall 2017 1 / 41 Outline Pushdown Automata Pushdown
More information11. Automata and languages, cellular automata, grammars, L-systems
11. Automata and languages, cellular automata, grammars, L-systems 11.1 Automata and languages Automaton (pl. automata): in computer science, a simple model of a machine or of other systems. ( a simplification
More information1. (a) Explain the procedure to convert Context Free Grammar to Push Down Automata.
Code No: R09220504 R09 Set No. 2 II B.Tech II Semester Examinations,December-January, 2011-2012 FORMAL LANGUAGES AND AUTOMATA THEORY Computer Science And Engineering Time: 3 hours Max Marks: 75 Answer
More informationHarvard CS 121 and CSCI E-207 Lecture 10: CFLs: PDAs, Closure Properties, and Non-CFLs
Harvard CS 121 and CSCI E-207 Lecture 10: CFLs: PDAs, Closure Properties, and Non-CFLs Harry Lewis October 8, 2013 Reading: Sipser, pp. 119-128. Pushdown Automata (review) Pushdown Automata = Finite automaton
More information