An Efficient Context-Free Parsing Algorithm. Speakers: Morad Ankri Yaniv Elia

Outline Yaniv: Introduction, Terminology, Informal Explanation, The Recognizer. Morad: Example, Time and Space Bounds, Empirical Results, Practical Use.

Introduction: The Author Jay Earley, who published "An Efficient Context-Free Parsing Algorithm" in Communications of the ACM, 1970.

Introduction cont. A grammar: the rules governing the use of a language. Types of grammar: regular expressions, context-free, context-sensitive, recursively enumerable.

Introduction cont. The Chomsky hierarchy: Recursively Enumerable (any rules) ⊃ Context-Sensitive (rules like AB -> CD, can express a^n b^n c^n) ⊃ Context-Free (rules like A -> abc, can express a^n b^n) ⊃ Regular (rules like S -> aB, can express a*b*).

Introduction cont. Representing sentence structure: not just FSTs! Issue: recursion is potentially infinite: a + a + a + ... We capture constituent structure: basic units => terminals; subcategorization => non-terminals; hierarchy => parse tree.

Introduction cont. Context-free grammars (BNF grammars) allow a simple and precise description of sentences built up from smaller blocks. Why "context-free"? Non-terminals can be rewritten without regard to the context in which they occur. Parsing algorithms for these grammars play a large role in the implementation of compilers and interpreters (e.g. Yacc, Bison, JavaCC).

Introduction cont. Types of parsing algorithms: general algorithms handle all context-free grammars; restricted algorithms handle sub-classes of grammars and tend to be more efficient.

Introduction cont. Earley's algorithm compares favorably with other general parsing algorithms: it can parse all context-free languages, and it executes in cubic time, O(n^3), in the general case, O(n^2) for unambiguous grammars, and linear time for almost all LR(k) grammars. It performs particularly well when the rules are written left-recursively.

Terminology Language: a set of strings over a finite set of terminal symbols. Terminal symbols are represented by lowercase letters: a, b, c. Non-terminal symbols (syntactic classes) are represented by capital letters: A, B, C.

Terminology - cont. Strings of either terminals or non-terminals are represented by Greek letters: α, β, γ. The empty string is λ. α^k = αα...α (k times). |α| is the number of symbols in α.
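The string notation can be mirrored directly in code; a small illustrative sketch (Python, with a tuple of symbol names standing in for a string of grammar symbols):

```python
# α as a string of grammar symbols, here a Python tuple of symbol names
alpha = ("a", "B")          # the string aB (terminal a, non-terminal B)

# α^k = α repeated k times
print(alpha * 3)            # ('a', 'B', 'a', 'B', 'a', 'B')

# |α| = the number of symbols in α
print(len(alpha))           # 2

# λ, the empty string, is the empty sequence
lam = ()
print(len(lam))             # 0
```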

Terminology - cont. Productions (rewriting rules): a finite set of rules, represented as A -> α. The root of the grammar: a non-terminal which stands for "sentence". Alternatives: the productions with a particular non-terminal D on their left sides.

Terminology - cont. Example: T -> P, T -> T * P, P -> a. Root: T. Terminals: *, a. Non-terminals: T, P. Each line, e.g. T -> T * P, is a production rule; T -> P and T -> T * P are the alternatives for T.

Terminology - cont. Given a context-free grammar G: α => β iff there are γ, δ, η, A s.t. α = γAδ, β = γηδ, and A -> η is a production. α =*=> β (β is derived from α) iff there are strings α_0, α_1, ..., α_m s.t. α = α_0 => α_1 => ... => α_m = β. The sequence α_0, α_1, ..., α_m is called a derivation.
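A single derivation step α => β is just the rewriting of one non-terminal occurrence; a minimal sketch (Python; the function name and representation are ours, not the paper's):

```python
def derive_step(alpha, pos, production):
    """Apply production A -> eta at position pos of the sentential form alpha:
    alpha = gamma A delta  becomes  beta = gamma eta delta."""
    lhs, rhs = production
    assert alpha[pos] == lhs, "rewritten symbol must match the production's left side"
    return alpha[:pos] + rhs + alpha[pos + 1:]

# E => E + T => T + T, using productions E -> E + T and E -> T
form = ("E",)
form = derive_step(form, 0, ("E", ("E", "+", "T")))
print(form)   # ('E', '+', 'T')
form = derive_step(form, 0, ("E", ("T",)))
print(form)   # ('T', '+', 'T')
```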

Terminology - cont. Sentential form: a string α derived from the root of the grammar (R =*=> α). Sentence: a sentential form consisting entirely of terminals. Derivation tree (a.k.a. parse tree): a representation of a sentential form reflecting the steps made in deriving it.

Terminology - cont. Example, deriving a * a + a: E => E + T (by E -> E + T) => T + T (E -> T) => T + P (T -> P) => T * P + P (T -> T * P) => P * P + P (T -> P) => a * P + P (P -> a) => a * a + P (P -> a) => a * a + a (P -> a). [Parse tree for a * a + a.]

Terminology - cont. Note: the derivation is not unique for a given derivation tree! The same tree also arises from: E => E + T (E -> E + T) => E + P (T -> P) => T + P (E -> T) => T * P + P (T -> T * P) => P * P + P (T -> P) => a * P + P (P -> a) => a * a + P (P -> a) => a * a + a (P -> a). [Same parse tree as before.]

Terminology - cont. Note: the derivation is not unique for a given derivation tree! A parse tree represents the steps deriving it, but not their order. Both of the following derivations yield the same tree: (1) E => E + T => T + T => T + P => T * P + P => P * P + P => a * P + P => a * a + P => a * a + a; (2) E => E + T => E + P => T + P => T * P + P => P * P + P => a * P + P => a * a + P => a * a + a.

Terminology - cont. Degree of ambiguity: the number of distinct derivation trees of a sentence. Unambiguous sentence: a sentence whose degree of ambiguity is 1. Unambiguous grammar: a grammar containing only unambiguous sentences. Bounded ambiguity: a grammar with a bound b on the degree of ambiguity of its sentences.
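On small inputs the degree of ambiguity can be computed by brute force, counting distinct derivation trees over spans of the input. A sketch (Python; the deliberately ambiguous grammar E -> E + E | a and the helper names are ours for illustration):

```python
from functools import lru_cache

GRAMMAR = {"E": [("E", "+", "E"), ("a",)]}   # ambiguous: E -> E + E | a

def count_trees(sym, s):
    """Number of distinct derivation trees by which sym derives the string s."""
    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        if symbol not in GRAMMAR:                       # terminal symbol
            return 1 if (j == i + 1 and s[i] == symbol) else 0
        return sum(count_seq(rhs, i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        if not rhs:
            return 1 if i == j else 0
        first, rest = rhs[0], rhs[1:]
        # the grammar has no ε-productions, so every remaining symbol
        # must consume at least one input symbol (also rules out the
        # infinite left recursion E -> E + E on an unshrinking span)
        return sum(count(first, i, k) * count_seq(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return count(sym, 0, len(s))

print(count_trees("E", "a+a"))      # 1: unambiguous sentence
print(count_trees("E", "a+a+a"))    # 2: (a+a)+a and a+(a+a)
```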

Terminology - cont. The recognizer: an algorithm which takes a string as input and accepts or rejects it depending on whether or not the string is a sentence of the grammar. The parser: a recognizer which also outputs the set of all legal derivation trees for the string.

Informal Explanation How does the recognizer work? It scans an input string X_1, X_2, ..., X_n from left to right, looking ahead some fixed number k of symbols. As each symbol X_i is scanned, a set of states S_i is constructed representing the condition of the recognition process at that point in the scan.

Informal Explanation Each state in the set represents: a production s.t. we are currently scanning a portion of the input string which is derived from its right side; a point in that production which shows how much of the production's right side we have recognized so far; a k-symbol string which is a syntactically allowed successor to that instance of the production; and a pointer back to the position in the input string at which we began to look for that instance of the production.

Informal Explanation Example: in grammar AE, with k = 1, S_0 starts as the single state Φ -> ·E & , & , 0 (Φ is a new non-terminal and & a new terminal; Φ -> E & is the production rule, the dot marks the point reached in it, "&" is the k-symbol look-ahead string (k = 1), and 0 is the pointer back to the input string position).

Informal Explanation The algorithm uses dynamic programming to do a parallel top-down search in (worst case) O(n^3) time. A single left-to-right pass fills out n + 1 state sets. Think of the state sets as sitting between words in the input string, keeping track of the states of the parse at these positions. For each word position, a set of states represents all partial parse trees generated to date; e.g. the state set S_0 contains all partial parse trees generated at the beginning of the sentence.

Informal Explanation How to recognize a sentence? When we go over a state in S_i, there are 3 cases: the dot is not at the end of the state and stands before a non-terminal symbol => Predictor; the dot is not at the end and stands before a terminal symbol => Scanner; the dot is at the end of the state => Completer.

Informal Explanation The predictor operation: if the dot is before a non-terminal symbol, it adds new states to the current state set, one for each alternative of that non-terminal in the grammar. Formally: S_j : A -> α·Bβ , l_1 , i gives S_j : B -> ·γ , l_2 , j for each alternative B -> γ (l_2 = the first k symbols of βl_1). Why? If an instance of B starts here, one of its alternatives must match the input next, followed by the first k symbols of whatever may follow B.

S_0: Φ -> ·E & , & , 0 | E -> ·E + T , & , 0 | E -> ·T , & , 0 | E -> ·E + T , + , 0 | E -> ·T , + , 0 | T -> ·a , & , 0 | T -> ·a , + , 0. Grammar: Φ -> E &, E -> E + T, E -> T, T -> a. Input string: a + a.

Informal Explanation The scanner operation: if the dot is before a terminal symbol, compare that symbol with X_{i+1}; if they match, add the state to the next state set, with the dot moved over one symbol. Formally: S_i : A -> α·aβ , l_1 , f gives S_{i+1} : A -> αa·β , l_1 , f when a = X_{i+1}. Why? The terminal matched the input, so this instance of the production has recognized one more symbol of its right side.

S_0: Φ -> ·E & , & , 0 | E -> ·E + T , & , 0 | E -> ·T , & , 0 | E -> ·E + T , + , 0 | E -> ·T , + , 0 | T -> ·a , & , 0 | T -> ·a , + , 0. S_1: T -> a· , & , 0 | T -> a· , + , 0. Grammar: Φ -> E &, E -> E + T, E -> T, T -> a. Input string: a + a.

Informal Explanation The completer: if the dot of a state is at the end of its production, it compares the look-ahead string with X_{i+1} ... X_{i+k}. If they match, it goes back to the state set S_f indicated by the pointer and adds all states from S_f which have the completed non-terminal to the right of the dot; for each of these states the dot is placed after this non-terminal.

S_0: as before. S_1: T -> a· , & , 0 | T -> a· , + , 0 | E -> T· , & , 0 | E -> T· , + , 0 | Φ -> E·& , & , 0 | E -> E·+ T , & , 0 | E -> E·+ T , + , 0.

Informal Explanation After going over all states in S_i, we move on to S_{i+1}. If the algorithm ever produces an S_{i+1} consisting of the single state Φ -> E &· , & , 0 then we have correctly scanned E and the & symbol and we are finished with the string, which means the input string is a sentence of the grammar!

The Recognizer A precise description of the recognizer. Given: input string X_1 ... X_n, grammar G. We arbitrarily number the productions 1 ... d-1, where each production p is of the form D_p -> C_{p1} ... C_{pm} (m = # of symbols in the alternative). We add a 0-th production: D_0 -> R & (R is the root of the grammar).

The Recognizer Definition: a state is a quadruple <p, j, f, α>: p is the number of the production rule (0 <= p <= d-1); j is the position of the dot in that production's right side (0 <= j <= m); f is the number of the state set that created this state (0 <= f <= n+1); α is the look-ahead string. A state set is an ordered set of states. A final state is one in which j = m. We add a state to a state set by putting it last in the ordered set (unless it is already a member).

The Recognizer Definition: H_k(γ) is the set of all k-symbol terminal strings which begin some string derived from γ: H_k(γ) = { α : α is terminal, |α| = k, and ∃β s.t. γ =*=> αβ }. It is used in forming the look-ahead strings for the states.
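H_k is essentially the familiar FIRST_k function; a fixpoint sketch for k = 1 over the ε-free grammar of the running example (Python; the function name h1 and the representation are ours):

```python
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("a",)],
}

def h1(gamma):
    """H_1(gamma): the 1-symbol terminal strings that begin some string
    derived from gamma. Since this grammar has no ε-productions, only
    the first symbol of gamma matters for k = 1."""
    first = {sym: set() for sym in GRAMMAR}
    changed = True
    while changed:                       # fixpoint iteration over the rules
        changed = False
        for lhs, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                head = rhs[0]
                new = first[head] if head in GRAMMAR else {head}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    head = gamma[0]
    return first[head] if head in GRAMMAR else {head}

print(h1(("E", "&")))   # {'a'}: every string derived from E begins with a
print(h1(("+", "T")))   # {'+'}
```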

The Recognizer The recognizer is a function of 3 arguments, REC(G, X_1 ... X_n, k), computed as follows: // initialization: Let X_{n+i} = & (for each 1 <= i <= k+1). Let S_i be empty (for each 0 <= i <= n+1). Add <0, 0, 0, &^k> to S_0.

The Recognizer For i <- 0 step 1 until n do begin: process the states of S_i in order, performing one of the following three operations on each state s = <p, j, f, α>:

The Recognizer (1) Predictor: If s is nonfinal and C_{p(j+1)} is a non-terminal, then for each q s.t. C_{p(j+1)} = D_q, and for each β ∈ H_k(C_{p(j+2)} ... C_{pm} α), add <q, 0, i, β> to S_i.

The Recognizer (2) Completer: If s is final and α = X_{i+1} ... X_{i+k}, then for each <q, l, g, β> ∈ S_f (after all states have been added to S_f) s.t. C_{q(l+1)} = D_p, add <q, l+1, g, β> to S_i.

The Recognizer (3) Scanner: If s is nonfinal and C_{p(j+1)} is terminal, then if C_{p(j+1)} = X_{i+1}, add <p, j+1, f, α> to S_{i+1}.

The Recognizer // rejection condition: If S_{i+1} is empty, return rejection. // acceptance condition: If i = n and S_{i+1} = {<0, 2, 0, &^k>}, return acceptance. End.
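The three operations above can be sketched as a compact recognizer. This version drops the look-ahead string (k = 0) for brevity, which the complexity discussion later notes does not change the worst-case bound; the names and state representation are ours, not the paper's:

```python
def earley_recognize(grammar, root, tokens):
    """Earley recognizer without look-ahead. A state is
    (lhs, rhs, dot, origin): production lhs -> rhs, dot position,
    and the index of the state set in which the instance began."""
    n = len(tokens)
    S = [dict() for _ in range(n + 1)]   # dicts as ordered sets S_0 .. S_n

    def add(i, state):
        S[i].setdefault(state, None)

    for rhs in grammar[root]:
        add(0, (root, rhs, 0, 0))

    for i in range(n + 1):
        worklist = list(S[i])
        pos = 0
        while pos < len(worklist):       # process states in order, even new ones
            lhs, rhs, dot, origin = worklist[pos]
            pos += 1
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:                     # Predictor
                    for alt in grammar[nxt]:
                        st = (nxt, alt, 0, i)
                        if st not in S[i]:
                            add(i, st)
                            worklist.append(st)
                elif i < n and tokens[i] == nxt:       # Scanner
                    add(i + 1, (lhs, rhs, dot + 1, origin))
            else:                                      # Completer
                for plhs, prhs, pdot, porig in list(S[origin]):
                    if pdot < len(prhs) and prhs[pdot] == lhs:
                        st = (plhs, prhs, pdot + 1, porig)
                        if st not in S[i]:
                            add(i, st)
                            worklist.append(st)
    return any(lhs == root and dot == len(rhs) and origin == 0
               for (lhs, rhs, dot, origin) in S[n])

GRAMMAR = {"E": [("E", "+", "T"), ("T",)], "T": [("a",)]}
print(earley_recognize(GRAMMAR, "E", ["a", "+", "a"]))   # True
print(earley_recognize(GRAMMAR, "E", ["a", "+"]))        # False
```

Note that the left-recursive rule E -> E + T poses no problem: prediction at position i adds each production instance to S_i only once.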

The Recognizer Notes: The ordering imposed on state sets is not important to their meaning; it is simply a device which allows their members to be processed correctly by the algorithm. i cannot become greater than n without either rejection or acceptance occurring. The & symbol appears only in production zero.

Outline revisited Yaniv: Introduction, Terminology, Informal Explanation, The Recognizer. Morad: Example, Time and Space Bounds, Empirical Results, Practical Use.

Grammar: Terminals: {a, +} Non-terminals: {E, T} Productions: E -> E + T, E -> T, T -> a Root: E Look-ahead: k = 1 Input string: a + a

Building S_0 (input a + a, padded with &):
Put the initial state in S_0: Φ -> ·E & , & , 0
Predictor (dot before E): add E -> ·E + T , & , 0
Predictor: add E -> ·T , & , 0
Predictor: add E -> ·E + T , + , 0
Predictor: add E -> ·T , + , 0
Predictor (dot before T): add T -> ·a , & , 0
Predictor: E -> ·E + T and E -> ·T with look-ahead + would be re-predicted, but the states already exist
Predictor: add T -> ·a , + , 0

Scanning X_1 = a, building S_1:
Scanner: add T -> a· , & , 0
Scanner: add T -> a· , + , 0
Completer on T -> a· , & , 0: look-ahead (&) is not equal to X_2 (+), do nothing
Completer on T -> a· , + , 0: look-ahead is equal; add all states from S_0 with the dot before T, dot moved: E -> T· , & , 0 and E -> T· , + , 0
Completer on E -> T· , & , 0: look-ahead is not equal, do nothing
Completer on E -> T· , + , 0: look-ahead is equal; add all states from S_0 with the dot before E: Φ -> E·& , & , 0, E -> E·+ T , & , 0, E -> E·+ T , + , 0
Predictor: nothing to do
Scanner on Φ -> E·& : symbol & is not equal to X_2, do nothing

Scanning X_2 = +, building S_2:
Scanner: add E -> E +·T , & , 0
Scanner: add E -> E +·T , + , 0
Predictor (dot before T): add T -> ·a , & , 2
Predictor: add T -> ·a , + , 2

Scanning X_3 = a, building S_3:
Scanner: add T -> a· , & , 2
Scanner: add T -> a· , + , 2
Completer on T -> a· , & , 2: look-ahead (&) is equal to X_4 (&); add all states from S_2 with the dot before T: E -> E + T· , & , 0 and E -> E + T· , + , 0
Completer on T -> a· , + , 2: look-ahead is not equal, do nothing
Completer on E -> E + T· , & , 0: look-ahead is equal; add all states from S_0 with the dot before E: Φ -> E·& , & , 0, E -> E·+ T , & , 0, E -> E·+ T , + , 0
Completer on E -> E + T· , + , 0: look-ahead is not equal, do nothing
Scanner on E -> E·+ T: symbol + is not equal to X_4, do nothing
Scanner on Φ -> E·& : symbol & is equal to X_4; building S_4: add Φ -> E &· , & , 0

S_4 consists of the single state Φ -> E &· , & , 0: we have reached the final state, so the string belongs to the grammar.

Time and Space Bounds In general the running time of the algorithm is O(n^3). A state is <p, j, f, α>: p is the number of the production rule; j is the position in the production rule; f is the number of the state set that created this state; α is the look-ahead. The number of states in any state set S_i is O(i): p, j and α are bounded by properties of the grammar, and f is bounded by i.

Time and Space Bounds cont. The scanner and predictor operations each execute a bounded number of steps per state in any state set, so the total time for processing the states of S_i with the scanner and predictor is O(i). The completer executes O(i) steps for each state it processes in the worst case, because it may have to add O(f) states from S_f, the state set pointed back to. So it takes O(i^2) steps in S_i. Summing over all of the state sets gives us O(n^3) steps. This bound holds even if the look-ahead is not used.
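Summing the per-state-set work gives the bounds just stated:

```latex
\underbrace{\sum_{i=0}^{n} O(i^2)}_{\text{general case}} \;=\; O(n^3),
\qquad
\underbrace{\sum_{i=0}^{n} O(i)}_{\text{unambiguous case}} \;=\; O(n^2).
```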

Time and Space Bounds cont. Only the completer is O(i^2). In what cases does the completer need only O(i) steps? After the completer has been applied to state set S_i there are O(i) states in it. So unless some of the states were added in more than one way, it took the completer only O(i) steps to complete its operation.

Time and Space Bounds cont. In case the grammar is unambiguous and reduced, we can show that each such state gets added in only one way. Assume that a state D_q -> C_{q1} ... C_{q(j+1)}· ... C_{qm} , α , f is added to S_i in two different ways by the completer. Then we have two final states in S_i: D_{p1} -> A_{p1,1} ... A_{p1,m1}· , X_{i+1} ... X_{i+k} , f_1 and D_{p2} -> A_{p2,1} ... A_{p2,m2}· , X_{i+1} ... X_{i+k} , f_2 with C_{q(j+1)} = D_{p1} = D_{p2} and (p_1 ≠ p_2 or f_1 ≠ f_2).

Time and Space Bounds cont. That means the state D_q -> C_{q1} ... ·C_{q(j+1)} ... C_{qm} , α , f appears in both state sets S_{f1} and S_{f2}, so we have two derivations: R =*=> X_1 ... X_f D_q β => X_1 ... X_f C_{q1} ... C_{q(j+1)} ... C_{qm} β =*=> X_1 ... X_{f1} A_{p1,1} ... A_{p1,m1} β_1 =*=> X_1 ... X_i β_1 and R =*=> X_1 ... X_f D_q β => X_1 ... X_f C_{q1} ... C_{q(j+1)} ... C_{qm} β =*=> X_1 ... X_{f2} A_{p2,1} ... A_{p2,m2} β_2 =*=> X_1 ... X_i β_2

Time and Space Bounds cont. Since p_1 ≠ p_2 or f_1 ≠ f_2, the derivations of X_1 ... X_i are represented by different derivation trees, and therefore there is an ambiguous sentence X_1 ... X_i α for some α. So if the grammar is unambiguous, the completer executes O(i) steps per state set and the time is bounded by O(n^2). This running time also holds for grammars with bounded ambiguity.

Time and Space Bounds cont. For LR(k) grammars the running time is O(n). Space: the algorithm uses O(n) state sets, each containing O(n) states, therefore the space bound is O(n^2) in general.

Empirical Results The algorithm was tested against other context-free parsing algorithms, and its running time was similar to or better than theirs. It was also as good as specialized algorithms that run fast but only on specific types of grammars (like Knuth's algorithm, which works only on LR(k) grammars, in O(n)).

Practical Use Changing the recognizer into a parser: each time the completer adds a state E -> αD·β , g, construct a pointer from that instance of D to the state D -> γ· , f which caused the completer to do the operation. Following these pointers afterwards yields the derivation trees.

Running the example a + a again with pointer construction, S_4 again consists of the single state Φ -> E &· , & , 0, and the pointers spell out the parse tree: Φ -> E &; E -> E + T; the left E -> T -> a; the right T -> a. We've reached the final state: the string belongs to the grammar.

Practical Use cont. The algorithm can also handle context-free grammars that make use of the Kleene star notation: A -> (B C)* D. Any state of the form A -> α·(β)*γ , f or A -> α(β·)*γ , f is replaced by the two states A -> α(·β)*γ , f and A -> α(β)*·γ , f (the dot may re-enter the starred group or skip past it).

Thank you