h>p://lara.epfl.ch Compiler Construc/on 2011 CYK Algorithm and Chomsky Normal Form

Similar documents
CYK Algorithm for Parsing General Context-Free Grammars

Computing if a token can follow

Follow sets. LL(1) Parsing Table

CSCI Compiler Construction

Predictive parsing as a specific subclass of recursive descent parsing complexity comparisons with general parsing

Lecture 12 Simplification of Context-Free Grammars and Normal Forms

Context Free Grammars: Introduction. Context Free Grammars: Simplifying CFGs

CS20a: summary (Oct 24, 2002)

Finite Automata Theory and Formal Languages TMV026/TMV027/DIT321 Responsible: Ana Bove

This lecture covers Chapter 7 of HMU: Properties of CFLs

Simplification of CFG and Normal Forms. Wen-Guey Tzeng Computer Science Department National Chiao Tung University

Simplification of CFG and Normal Forms. Wen-Guey Tzeng Computer Science Department National Chiao Tung University

CS 373: Theory of Computation. Fall 2010

Announcement: Midterm Prep

Chap. 7 Properties of Context-free Languages

Simplification and Normalization of Context-Free Grammars

The Pumping Lemma for Context Free Grammars

CONTEXT FREE GRAMMAR AND

Context Free Grammars: Introduction

CISC4090: Theory of Computation

Context-Free Grammars (and Languages) Lecture 7

Note: In any grammar here, the meaning and usage of P (productions) is equivalent to R (rules).

6.1 The Pumping Lemma for CFLs 6.2 Intersections and Complements of CFLs

AC68 FINITE AUTOMATA & FORMULA LANGUAGES DEC 2013

Finite Automata and Formal Languages TMV026/DIT321 LP Useful, Useless, Generating and Reachable Symbols

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

NPDA, CFG equivalence

CPS 220 Theory of Computation

Homework 4 Solutions. 2. Find context-free grammars for the language L = {a n b m c k : k n + m}. (with n 0,

MA/CSSE 474 Theory of Computation

Grammars and Context Free Languages

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

Grammars and Context Free Languages

Plan for 2 nd half. Just when you thought it was safe. Just when you thought it was safe. Theory Hall of Fame. Chomsky Normal Form

Lecture 17: Language Recognition

TAFL 1 (ECS-403) Unit- III. 3.1 Definition of CFG (Context Free Grammar) and problems. 3.2 Derivation. 3.3 Ambiguity in Grammar

Lecture 11 Context-Free Languages

Context-Free Languages (Pre Lecture)

Parsing. Context-Free Grammars (CFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 26

Ogden s Lemma for CFLs

CFLs and Regular Languages. CFLs and Regular Languages. CFLs and Regular Languages. Will show that all Regular Languages are CFLs. Union.

Formal Languages, Grammars and Automata Lecture 5

Context Free Grammars

Foundations of Informatics: a Bridging Course

EXAM. CS331 Compiler Design Spring Please read all instructions, including these, carefully

Syntax Analysis Part I

Syntax Analysis Part I

Automata Theory Final Exam Solution 08:10-10:00 am Friday, June 13, 2008

Parsing beyond context-free grammar: Parsing Multiple Context-Free Grammars

Computational Models - Lecture 4

Syntax Analysis Part III

Theory of Computation - Module 3

Einführung in die Computerlinguistik

Context-Free Grammar

Parsing Algorithms. CS 4447/CS Stephen Watt University of Western Ontario

Syntax Directed Transla1on

CKY & Earley Parsing. Ling 571 Deep Processing Techniques for NLP January 13, 2016

THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET

Notes for Comp 497 (Comp 454) Week 10 4/5/05

Administrivia. Test I during class on 10 March. Bottom-Up Parsing. Lecture An Introductory Example

MTH401A Theory of Computation. Lecture 17

SYLLABUS. Introduction to Finite Automata, Central Concepts of Automata Theory. CHAPTER - 3 : REGULAR EXPRESSIONS AND LANGUAGES

Chap. 1.2 NonDeterministic Finite Automata (NFA)

Properties of context-free Languages

CS153: Compilers Lecture 5: LL Parsing

Computability Theory

Eng. Maha Talaat Page 1

Recitation 4: Converting Grammars to Chomsky Normal Form, Simulation of Context Free Languages with Push-Down Automata, Semirings

Context-Free Grammars: Normal Forms

Introduction to Theory of Computing

CMSC 330: Organization of Programming Languages. Pushdown Automata Parsing

PUSHDOWN AUTOMATA (PDA)

Even More on Dynamic Programming

Non-context-Free Languages. CS215, Lecture 5 c

Computational Models - Lecture 3

Computational Models - Lecture 5 1

Bottom-Up Syntax Analysis

CS 406: Bottom-Up Parsing

Properties of Context-free Languages. Reading: Chapter 7

CSCI 1010 Models of Computa3on. Lecture 17 Parsing Context-Free Languages

CS415 Compilers Syntax Analysis Top-down Parsing

3515ICT: Theory of Computation. Regular languages

Context Free Language Properties

Top-Down Parsing and Intro to Bottom-Up Parsing

Lecture VII Part 2: Syntactic Analysis Bottom-up Parsing: LR Parsing. Prof. Bodik CS Berkley University 1

An Efficient Context-Free Parsing Algorithm. Speakers: Morad Ankri Yaniv Elia

Final exam study sheet for CS3719 Turing machines and decidability.

Pushdown Automata: Introduction (2)

Theory of Computation (II) Yijia Chen Fudan University

Compiler Design. Spring Syntactic Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Syntax Analysis Part I. Position of a Parser in the Compiler Model. The Parser. Chapter 4

Harvard CS 121 and CSCI E-207 Lecture 12: General Context-Free Recognition

Fall 1999 Formal Language Theory Dr. R. Boyer. Theorem. For any context free grammar G; if there is a derivation of w 2 from the

Grammar formalisms Tree Adjoining Grammar: Formal Properties, Parsing. Part I. Formal Properties of TAG. Outline: Formal Properties of TAG

Closure Properties of Regular Languages. Union, Intersection, Difference, Concatenation, Kleene Closure, Reversal, Homomorphism, Inverse Homomorphism

COMP-330 Theory of Computation. Fall Prof. Claude Crépeau. Lec. 10 : Context-Free Grammars

Tree Adjoining Grammars

Context-Free Grammars. 2IT70 Finite Automata and Process Theory

CSC 4181Compiler Construction. Context-Free Grammars Using grammars in parsers. Parsing Process. Context-Free Grammar

A* Search. 1 Dijkstra Shortest Path

Transcription:

h>p://lara.epfl.ch Compiler Construc/on 2011 CYK Algorithm and Chomsky Normal Form

S à N ( N S) N ( N ) S S Parsing an Input N S) à S N ) N ( à ( N ) à ) 7 6 5 4 3 2 1 ambiguity N ( N ( N ) N ( N ) N ( N ) N ) ( ( ) ( ) ( ) )

Algorithm Idea S à S S w pq substring from p to q d pq all non- terminals that could expand to w pq Ini/ally d pp has N w(p,p) key step of the algorithm: if X à Y Z is a rule, Y is in d p r, and Z is in d (r+1)q then put X into d pq (p r < q), in increasing value of (q- p) 7 6 5 4 3 2 1 N ( N ( N ) N ( N ) N ( N ) N ) ( ( ) ( ) ( ) )

Algorithm INPUT: grammar G in Chomsky normal form word w to parse using G OUTPUT: true iff (w in L(G)) N = w var d : Array[N][N] for p = 1 to N { d(p)(p) = {X G contains X- >w(p)} for q in {p + 1.. N} d(p)(q) = {} } for k = 2 to N // substring length for p = 0 to N- k // ini/al posi/on for j = 1 to k- 1 // length of first half val r = p+j- 1; val q = p+k- 1; for (X::=Y Z) in G if Y in d(p)(r) and Z in d(r+1)(q) d(p)(q) = d(p)(q) union {X} return S in d(0)(n- 1) What is the running /me as a func/on of grammar size and the size of input? O( ) ( ( ) ( ) ( ) )

Parsing another Input S à N ( N S) N ( N ) S S N S) à S N ) N ( à ( N ) à ) 7 6 5 4 3 2 1 N ( N ) N ( N ) N ( N ) N ( N ) ( ) ( ) ( ) ( )

Number of Parse Trees Let w denote word ()()() it has two parse trees Give a lower bound on number of parse trees of the word w n (n is posi/ve integer) w 5 is the word ()()() ()()() ()()() ()()() ()()() CYK represents all parse trees compactly can re- run algorithm to extract first parse tree, or enumerate parse trees one by one

Algorithm Idea S à S S w pq substring from p to q d pq all non- terminals that could expand to w pq Ini/ally d pp has N w(p,p) key step of the algorithm: if X à Y Z is a rule, Y is in d p r, and Z is in d (r+1)q then put X into d pq (p r < q), in increasing value of (q- p) 7 6 5 4 3 2 1 N ( N ( N ) N ( N ) N ( N ) N ) ( ( ) ( ) ( ) )

Transforming to Chomsky Form Steps: 1. remove unproduc/ve symbols 2. remove unreachable symbols 3. remove epsilons (no non- start nullable symbols) 4. remove single non- terminal produc/ons X::=Y 5. transform produc/ons of arity more than two 6. make terminals occur alone on right- hand side

1) Unproduc/ve non- terminals How to compute them? What is funny about this grammar: stmt ::= iden/fier := iden/fier while (expr) stmt if (expr) stmt else stmt expr ::= term + term term term term ::= factor * factor factor ::= ( expr ) There is no deriva/on of a sequence of tokens from expr Why? In every step will have at least one expr, term, or factor If it cannot derive sequence of tokens we call it unproduc(ve

1) Unproduc/ve non- terminals Produc/ve symbols are obtained using these two rules (what remains is unproduc/ve) Terminals are produc/ve If X::= s 1 s 2 s n is rule and each s i is produc/ve then X is produc/ve stmt ::= iden/fier := iden/fier while (expr) stmt if (expr) stmt else stmt expr ::= term + term term term term ::= factor * factor factor ::= ( expr ) program ::= stmt stmt program Delete unproduc/ve symbols. Will the meaning of top- level symbol (program) change?

2) Unreachable non- terminals What is funny about this grammar with star/ng terminal program program ::= stmt stmt program stmt ::= assignment whilestmt assignment ::= expr = expr ifstmt ::= if (expr) stmt else stmt whilestmt ::= while (expr) stmt expr ::= iden/fier No way to reach symbol ifstmt from program

2) Unreachable non- terminals What is funny about this grammar with star/ng terminal program program ::= stmt stmt program stmt ::= assignment whilestmt assignment ::= expr = expr ifstmt ::= if (expr) stmt else stmt whilestmt ::= while (expr) stmt expr ::= iden/fier What is the general algorithm?

2) Unreachable non- terminals Reachable terminals are obtained using the following rules (the rest are unreachable) star/ng non- terminal is reachable (program) If X::= s 1 s 2 s n is rule and X is reachable then each non- terminal among s 1 s 2 s n is reachable Delete unreachable symbols. Will the meaning of top- level symbol (program) change?

2) Unreachable non- terminals What is funny about this grammar with star/ng terminal program program ::= stmt stmt program stmt ::= assignment whilestmt assignment ::= expr = expr ifstmt ::= if (expr) stmt else stmt whilestmt ::= while (expr) stmt expr ::= iden/fier

3) Removing Empty Strings Ensure only top- level symbol can be nullable program ::= stmtseq stmtseq ::= stmt stmt ; stmtseq stmt ::= assignment whilestmt blockstmt blockstmt ::= { stmtseq } assignment ::= expr = expr whilestmt ::= while (expr) stmt expr ::= iden/fier How to do it in this example?

3) Removing Empty Strings - Result program ::= stmtseq stmtseq ::= stmt stmt ; stmtseq ; stmtseq stmt ; ; stmt ::= assignment whilestmt blockstmt blockstmt ::= { stmtseq } { } assignment ::= expr = expr whilestmt ::= while (expr) stmt whilestmt ::= while (expr) expr ::= iden/fier

3) Removing Empty Strings - Algorithm Compute the set of nullable non- terminals Add extra rules If X::= s 1 s 2 s n is rule then add new rules of form X::= r 1 r 2 r n where r i is either s i or, if s i is nullable then r i can also be the empty string (so it disappears) Remove all empty right- hand sides If star/ng symbol S was nullable, then introduce a new start symbol S instead, and add rule S ::= S

3) Removing Empty Strings Since stmtseq is nullable, the rule blockstmt ::= { stmtseq } gives blockstmt ::= { stmtseq } { } Since stmtseq and stmt are nullable, the rule stmtseq ::= stmt stmt ; stmtseq gives stmtseq ::= stmt stmt ; stmtseq ; stmtseq stmt ; ;

4) Elimina/ng single produc/ons Single produc/on is of the form X ::=Y where X,Y are non- terminals program ::= stmtseq stmtseq ::= stmt stmt ; stmtseq stmt ::= assignment whilestmt assignment ::= expr = expr whilestmt ::= while (expr) stmt

4) Eliminate single produc/ons - Result Generalizes removal of epsilon transi/ons from non- determinis/c automata program ::= expr = expr while (expr) stmt stmt ; stmtseq stmtseq ::= expr = expr while (expr) stmt stmt ; stmtseq stmt ::= expr = expr while (expr) stmt assignment ::= expr = expr whilestmt ::= while (expr) stmt

4) Single Produc/on Terminator If there is single produc/on X ::=Y put an edge (X,Y) into graph If there is a path from X to Z in the graph, and there is rule Z ::= s 1 s 2 s n then add rule X ::= s 1 s 2 s n At the end, remove all single productions. program ::= expr = expr while (expr) stmt stmt ; stmtseq stmtseq ::= expr = expr while (expr) stmt stmt ; stmtseq stmt ::= expr = expr while (expr) stmt

5) No more than 2 symbols on RHS becomes stmt ::= while (expr) stmt stmt ::= while stmt 1 stmt 1 ::= ( stmt 2 stmt 2 ::= expr stmt 3 stmt 3 ::= ) stmt

6) A non- terminal for each terminal becomes stmt ::= while (expr) stmt stmt ::= N while stmt 1 stmt 1 ::= N ( stmt 2 stmt 2 ::= expr stmt 3 stmt 3 ::= N ) stmt N while ::= while N ( ::= ( N ) ::= )

Parsing using CYK Algorithm Transform grammar into Chomsky Form: 1. remove unproduc/ve symbols 2. remove unreachable symbols 3. remove epsilons (no non- start nullable symbols) 4. remove single non- terminal produc/ons X::=Y 5. transform produc/ons of arity more than two 6. make terminals occur alone on right- hand side Have only rules X ::= Y Z, X ::= t, and possibly S ::= Apply CYK dynamic programming algorithm

Algorithm Idea S à S S w pq substring from p to q d pq all non- terminals that could expand to w pq Ini/ally d pp has N w(p,p) key step of the algorithm: if X à Y Z is a rule, Y is in d p r, and Z is in d (r+1)q then put X into d pq (p r < q), in increasing value of (q- p) 7 6 5 4 3 2 1 N ( N ( N ) N ( N ) N ( N ) N ) ( ( ) ( ) ( ) )

Earley s Algorithm J. Earley, "An efficient context-free parsing algorithm", Communications of the Association for Computing Machinery, 13:2:94-102, 1970.

CYK vs Earley s Parser Comparison Z ::= X Y Z parses w pq CYK: if d pr parses X and d (r+1)q parses Y, then in d pq stores symbol Z Earley s parser: in set S q stores item (Z ::= XY., p) Move forward, similar to top- down parsers Use do>ed rules to avoid binary rules ( ( ) ( ) ( ) )