Computing if a token can follow

Similar documents
Follow sets. LL(1) Parsing Table

CYK Algorithm for Parsing General Context-Free Grammars

h>p://lara.epfl.ch Compiler Construc/on 2011 CYK Algorithm and Chomsky Normal Form

Predictive parsing as a specific subclass of recursive descent parsing complexity comparisons with general parsing

CS153: Compilers Lecture 5: LL Parsing

Syntax Analysis Part I

n Top-down parsing vs. bottom-up parsing n Top-down parsing n Introduction n A top-down depth-first parser (with backtracking)

Syntax Analysis Part I

Top-Down Parsing and Intro to Bottom-Up Parsing

Compiler Construction Lent Term 2015 Lectures (of 16)

Compiler Construction Lent Term 2015 Lectures (of 16)

CSCI Compiler Construction

CS415 Compilers Syntax Analysis Top-down Parsing

Parsing Algorithms. CS 4447/CS Stephen Watt University of Western Ontario

Introduction to Bottom-Up Parsing

CONTEXT FREE GRAMMAR AND

Syntax Analysis Part III

Introduction to Bottom-Up Parsing

Introduction to Bottom-Up Parsing

Compiler Construction Lectures 13 16

Introduction to Bottom-Up Parsing

Syntactic Analysis. Top-Down Parsing

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 3. Y.N. Srikant

CSE302: Compiler Design

Syntax Analysis Part I. Position of a Parser in the Compiler Model. The Parser. Chapter 4

Bottom-Up Parsing. Ÿ rm E + F *idÿ rm E +id*idÿ rm T +id*id. Ÿ rm F +id*id Ÿ rm id + id * id

CS20a: summary (Oct 24, 2002)

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

EXAM. CS331 Compiler Design Spring Please read all instructions, including these, carefully

Compiler Design Spring 2017

THEORY OF COMPILATION

EXAM. CS331 Compiler Design Spring Please read all instructions, including these, carefully

Announcement: Midterm Prep

CS 406: Bottom-Up Parsing

Foundations of Informatics: a Bridging Course

Bottom-Up Syntax Analysis

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

CMSC 330: Organization of Programming Languages. Pushdown Automata Parsing

I 1 : {S S } I 2 : {S X ay, Y X } I 3 : {S Y } I 4 : {X b Y, Y X, X by, X c} I 5 : {X c } I 6 : {S Xa Y, Y X, X by, X c} I 7 : {X by } I 8 : {Y X }

Even More on Dynamic Programming

Shift-Reduce parser E + (E + (E) E [a-z] In each stage, we shift a symbol from the input to the stack, or reduce according to one of the rules.

LL(1) Grammar and parser

Computer Science 160 Translation of Programming Languages

Parsing -3. A View During TD Parsing

Lecture 11 Sections 4.5, 4.7. Wed, Feb 18, 2009

Administrivia. Test I during class on 10 March. Bottom-Up Parsing. Lecture An Introductory Example

LR2: LR(0) Parsing. LR Parsing. CMPT 379: Compilers Instructor: Anoop Sarkar. anoopsarkar.github.io/compilers-class

Bottom up parsing. General idea LR(0) SLR LR(1) LALR To best exploit JavaCUP, should understand the theoretical basis (LR parsing);

Announcements. H6 posted 2 days ago (due on Tue) Midterm went well: Very nice curve. Average 65% High score 74/75

Creating a Recursive Descent Parse Table

Chap. 7 Properties of Context-free Languages

EXAM. Please read all instructions, including these, carefully NAME : Problem Max points Points 1 10 TOTAL 100

CS 314 Principles of Programming Languages

INF5110 Compiler Construction

Compiler Principles, PS4

Syntax Directed Transla1on

Grammars and Context Free Languages

Notes for Comp 497 (Comp 454) Week 10 4/5/05

Context Free Grammars

Context Free Grammars: Introduction. Context Free Grammars: Simplifying CFGs

CMPT-825 Natural Language Processing. Why are parsing algorithms important?

Context free languages

CA Compiler Construction

Syntax Analysis, VI Examples from LR Parsing. Comp 412

Lecture 11 Context-Free Languages

MA/CSSE 474 Theory of Computation

Context Free Grammars: Introduction

Syntax Analysis, VII The Canonical LR(1) Table Construction. Comp 412

Harvard CS 121 and CSCI E-207 Lecture 12: General Context-Free Recognition

Grammars and Context Free Languages

Course Script INF 5110: Compiler construction

RNA Secondary Structure Prediction

Note: In any grammar here, the meaning and usage of P (productions) is equivalent to R (rules).

THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET

Compiler Design. Spring Syntactic Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Context- Free Parsing with CKY. October 16, 2014

Review. Earley Algorithm Chapter Left Recursion. Left-Recursion. Rule Ordering. Rule Ordering

CISC4090: Theory of Computation

Syntax Analysis - Part 1. Syntax Analysis

Compiler construction

TAFL 1 (ECS-403) Unit- III. 3.1 Definition of CFG (Context Free Grammar) and problems. 3.2 Derivation. 3.3 Ambiguity in Grammar

Lecture VII Part 2: Syntactic Analysis Bottom-up Parsing: LR Parsing. Prof. Bodik CS Berkley University 1

Lecture 12 Simplification of Context-Free Grammars and Normal Forms

Computational Models - Lecture 3

Context-Free Languages (Pre Lecture)

Syntax Analysis (Part 2)

LR(1) Parsers Part III Last Parsing Lecture. Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved.

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

Parsing with CFGs L445 / L545 / B659. Dept. of Linguistics, Indiana University Spring Parsing with CFGs. Direction of processing

Parsing with CFGs. Direction of processing. Top-down. Bottom-up. Left-corner parsing. Chart parsing CYK. Earley 1 / 46.

Lecture 17: Language Recognition

Simplification and Normalization of Context-Free Grammars

Context-Free Grammars (and Languages) Lecture 7

Introduction to Theory of Computing

Parsing VI LR(1) Parsers

CKY & Earley Parsing. Ling 571 Deep Processing Techniques for NLP January 13, 2016

What we have done so far

A parsing technique for TRG languages

CFLs and Regular Languages. CFLs and Regular Languages. CFLs and Regular Languages. Will show that all Regular Languages are CFLs. Union.

From EBNF to PEG. Roman R. Redziejowski. Concurrency, Specification and Programming Berlin 2012

Transcription:

Computing if a token can follow first(b 1... B p ) = {a B 1...B p... aw } follow(x) = {a S......Xa... } There exists a derivation from the start symbol that produces a sequence of terminals and nonterminals of the form...xa... (the token a follows the non-terminal X)

Rule for Computing Follow Given X ::= YZ (for reachable X) then first(z) follow(y) and follow(x) follow(z) Now take care of nullable ones as well: For each rule X ::= Y 1... Y p... Y q... Y r follow(y p ) should contain: first(y p+1 Y p+2...y r ) S => Xa => YZa => Yba also follow(x) if nullable(y p+1 Y p+2 Y r ) S ::= Xa X ::= YZ Y ::= b Z ::= c

Compute nullable, first, follow stmtlist ::= stmt stmtlist stmt ::= assign block assign ::= ID = ID ; block ::= beginof ID stmtlist ID ends Compute follow (for that we need nullable,first)

Conclusion of the Solution The grammar is not LL(1) because we have nullable(stmtlist) first(stmt) follow(stmtlist) = {ID} If a recursive-descent parser sees ID, it does not know if it should finish parsing stmtlist or parse another stmt

LL(1) Grammar - good for building recursive descent parsers Grammar is LL(1) if for each nonterminal X first sets of different alternatives of X are disjoint if nullable(x), first(x) must be disjoint from follow(x) For each LL(1) grammar we can build recursive-descent parser Each LL(1) grammar is unambiguous If a grammar is not LL(1), we can sometimes transform it into equivalent LL(1) grammar

Table for LL(1) Parser: Example S ::= B EOF (1) B ::= B (B) (1) (2) nullable: B first(s) = { ( } follow(s) = {} first(b) = { ( } follow(b) = { ), (, EOF } Parsing table: EOF ( ) S {1} {1} {} B {1} {1,2} {1} empty entry: when parsing S, if we see ), report error parse conflict - choice ambiguity: grammar not LL(1) 1 is in entry because ( is in follow(b) 2 is in entry because ( is in first(b(b))

Table for LL(1) Parsing Tells which alternative to take, given current token: choice : Nonterminal x Token -> Set[Int] A ::= (1) B 1... B p (2) C 1... C q (3) D 1... D r For example, when parsing A and seeing token t choice(a,t) = {2} means: parse alternative 2 (C 1... C q ) choice(a,t) = {1} means: parse alternative 3 (D 1... D r ) choice(a,t) = {} means: report syntax error choice(a,t) = {2,3} : not LL(1) grammar if t first(c 1... C q ) add 2 to choice(a,t) if t follow(a) add K to choice(a,t) where K is nullable alternative

Transform Grammar for LL(1) S ::= B EOF B ::= B (B) (1) (2) Transform the grammar so that parsing table has no conflicts. Old parsing table: EOF ( ) S {1} {1} {} B {1} {1,2} {1} conflict - choice ambiguity: grammar not LL(1) 1 is in entry because ( is in follow(b) 2 is in entry because ( is in first(b(b)) S ::= B EOF B ::= (B) B (1) (2) Left recursion is bad for LL(1) S B EOF ( ) choice(a,t)

Parse Table is Code for Generic Parser var stack : Stack[GrammarSymbol] // terminal or non-terminal stack.push(eof); stack.push(startnonterminal); var lex = new Lexer(inputFile) while (true) { X = stack.pop t = lex.curent if (isterminal(x)) if (t==x) if (X==EOF) return success else lex.next // eat token t else parseerror("expected " + X) else { // non-terminal cs = choice(x)(t) // look up parsing table cs match { // result is a set case {i} => { // exactly one choice rhs = p(x,i) // choose correct right-hand side stack.push(reverse(rhs)) } case {} => parseerror("parser expected an element of " + unionofall(choice(x))) case _ => crash( parse table with conflicts - grammar was not LL(1)") } }

What if we cannot transform the grammar into LL(1)? 1) Redesign your language 2) Use a more powerful parsing technique

LL(1) regular LR(0) LALR(1) SLR deterministic = LR(1) unambiguous context-free Languages context-sensitive decidable semi-decidable

Remark: Grammars and Languages Language S is a set of words For each language S, there can be multiple possible grammars G such that S=L(G) Language S is Non-ambiguous if there exists a non-ambiguous grammar for it LL(1) if there is an LL(1) grammar for it Even if a language has ambiguous grammar, it can still be non-ambiguous if it also has a non-ambiguous grammar

Parsing General Grammars: Why Can be difficult or impossible to make grammar unambiguous Some inputs are more complex than simple programming languages mathematical formulas: x = y /\ z? (x=y) /\ z x = (y /\ z) future programming languages natural language: I saw the man with the telescope.

Ambiguity 1) 2) I saw the man with the telescope.

CYK Parsing Algorithm C: John Cocke and Jacob T. Schwartz (1970). Programming languages and their compilers: Preliminary notes. Technical report, Courant Institute of Mathematical Sciences, New York University. Y: Daniel H. Younger (1967). Recognition and parsing of context-free languages in time n 3. Information and Control 10(2): 189 208. K: T. Kasami (1965). An efficient recognition and syntax-analysis algorithm for context-free languages. Scientific report AFCRL-65-758, Air Force Cambridge Research Lab, Bedford, MA.

Two Steps in the Algorithm 1) Transform grammar to normal form called Chomsky Normal Form (Noam Chomsky, mathematical linguist) 2) Parse input using transformed grammar dynamic programming algorithm a method for solving complex problems by breaking them down into simpler steps. It is applicable to problems exhibiting the properties of overlapping subproblems (>WP)

Chomsky Normal Form Essentially, only binary rules Concretely, these kinds of rules: X ::= Y Z binary rule X,Y,Z - non-terminals X ::= a non-terminal as a name for token S ::= only for top-level symbol S

Balanced Parentheses Grammar Original grammar G S ( S ) S S Modified grammar in Chomsky Normal Form: S S S N ( N S) N ( N ) S S N S) S N ) N ( ( N ) ) Terminals: ( ) Nonterminals: S S N S) N ) N (

Idea How We Obtained the Grammar S ( S ) S N ( N S) N ( N ) N ( ( N S) S N ) N ) ) Chomsky Normal Form transformation can be done fully mechanically

Steps: Transforming Grammars into Chomsky Normal Form 1. remove unproductive symbols 2. remove unreachable symbols 3. remove epsilons (no non-start nullable symbols) 4. remove single non-terminal productions X::=Y 5. transform productions w/ more than 3 on RHS 6. make terminals occur alone on right-hand side

4) Eliminating single productions Single production is of the form X ::=Y where X,Y are non-terminals program ::= stmtseq stmtseq ::= stmt stmt ; stmtseq stmt ::= assignment whilestmt assignment ::= expr = expr whilestmt ::= while (expr) stmt

4) Eliminate single productions - Result Generalizes removal of epsilon transitions from non-deterministic automata program ::= expr = expr while (expr) stmt stmt ; stmtseq stmtseq ::= expr = expr while (expr) stmt stmt ; stmtseq stmt ::= expr = expr while (expr) stmt assignment ::= expr = expr whilestmt ::= while (expr) stmt

4) Single Production Terminator If there is single production X ::=Y put an edge (X,Y) into graph If there is a path from X to Z in the graph, and there is rule Z ::= s 1 s 2 s n then add rule X ::= s 1 s 2 s n At the end, remove all single productions. program ::= expr = expr while (expr) stmt stmt ; stmtseq stmtseq ::= expr = expr while (expr) stmt stmt ; stmtseq stmt ::= expr = expr while (expr) stmt

5) No more than 2 symbols on RHS becomes stmt ::= while (expr) stmt stmt ::= while stmt 1 stmt 1 ::= ( stmt 2 stmt 2 ::= expr stmt 3 stmt 3 ::= ) stmt

6) A non-terminal for each terminal becomes stmt ::= while (expr) stmt stmt ::= N while stmt 1 stmt 1 ::= N ( stmt 2 stmt 2 ::= expr stmt 3 stmt 3 ::= N ) stmt N while ::= while N ( ::= ( N ) ::= )

Parsing using CYK Algorithm Transform grammar into Chomsky Form: 1. remove unproductive symbols 2. remove unreachable symbols 3. remove epsilons (no non-start nullable symbols) 4. remove single non-terminal productions X::=Y 5. transform productions of arity more than two 6. make terminals occur alone on right-hand side Have only rules X ::= Y Z, X ::= t, and possibly S ::= Apply CYK dynamic programming algorithm

Dynamic Programming to Parse Input Assume Chomsky Normal Form, 3 types of rules: S S (only for the start non-terminal) N j t (names for terminals) N i N j N k Decomposing long input: (just 2 non-terminals on RHS) N i N j N k ( ( ( ) ( ) ) ( ) ) ( ( ) ) find all ways to parse substrings of length 1,2,3,

S N ( N S) N ( N ) S S Parsing an Input N S) S N ) N ( ( N ) ) 7 6 5 4 3 2 1 ambiguity N ( N ( N ) N ( N ) N ( N ) N ) ( ( ) ( ) ( ) )

S S S w pq substring from p to q d pq all non-terminals that could expand to w pq Algorithm Idea 7 6 Initially d pp has N w(p,p) key step of the algorithm: if X Y Z is a rule, Y is in d p r, and Z is in d (r+1)q then put X into d pq (p r < q), in increasing value of (q-p) 5 4 3 2 1 N ( N ( N ) N ( N ) N ( N ) N ) ( ( ) ( ) ( ) )

Algorithm INPUT: grammar G in Chomsky normal form word w to parse using G OUTPUT: true iff (w in L(G)) N = w var d : Array[N][N] for p = 1 to N { d(p)(p) = {X G contains X->w(p)} for q in {p + 1.. N} d(p)(q) = {} } for k = 2 to N // substring length for p = 0 to N-k // initial position for j = 1 to k-1 // length of first half val r = p+j-1; val q = p+k-1; for (X::=Y Z) in G if Y in d(p)(r) and Z in d(r+1)(q) d(p)(q) = d(p)(q) union {X} return S in d(0)(n-1) What is the running time as a function of grammar size and the size of input? O( ) ( ( ) ( ) ( ) )

Parsing another Input S N ( N S) N ( N ) S S N S) S N ) N ( ( N ) ) 7 6 5 4 3 2 1 N ( N ) N ( N ) N ( N ) N ( N ) ( ) ( ) ( ) ( )

Number of Parse Trees Let w denote word ()()() it has two parse trees Give a lower bound on number of parse trees of the word w n (n is positive integer) w 5 is the word ()()() ()()() ()()() ()()() ()()() CYK represents all parse trees compactly can re-run algorithm to extract first parse tree, or enumerate parse trees one by one

Earley s Algorithm also parser arbitrary grammars J. Earley, "An efficient context-free parsing algorithm", Communications of the Association for Computing Machinery, 13:2:94-102, 1970.

CYK vs Earley s Parser Comparison Z ::= X Y Z parses w pq CYK: if d pr parses X and d (r+1)q parses Y, then in d pq stores symbol Z Earley s parser: in set S q stores item (Z ::= XY., p) Move forward, similar to top-down parsers Use dotted rules to avoid binary rules ( ( ) ( ) ( ) )

Example: expressions D ::= e EOF e ::= ID e e e == e Rules with a dot inside D ::=. e EOF e. EOF e EOF. e ::=. ID ID.. e e e. e e. e e e.. e == e e. == e e ==. e e == e.

ID - ID == ID EOF ID ID- ID-ID ID-ID== ID-ID==ID ID - -ID -ID== -ID==ID - ID ID== ID==ID ID == ==ID == ID ID EOF S ::=. e EOF e. EOF e EOF. e ::=. ID ID.. e e e. e e. e e e.. e == e e. == e e ==. e e == e.

ID - ID == ID EOF ID ID- ID-ID ID-ID== ID-ID==ID ID - -ID -ID== -ID==ID - ID ID== ID==ID ID == ==ID == ID ID EOF S ::=. e EOF e. EOF e EOF. e ::=. ID ID.. e e e. e e. e e e.. e == e e. == e e ==. e e == e.