An Efficient Context-Free Parsing Algorithm. Speakers: Morad Ankri, Yaniv Elia

1 An Efficient Context-Free Parsing Algorithm Speakers: Morad Ankri, Yaniv Elia

2 Outline. Yaniv: Introduction, Terminology, Informal Explanation, The Recognizer. Morad: Example, Time and Space Bounds, Empirical Results, Practical Use.

3 Introduction: The Author

4 Introduction cont. Grammar: the rules governing the use of a language. Types of grammar: regular (regular expressions), context-free, context-sensitive, recursively enumerable.

5 Introduction cont. The Chomsky hierarchy of grammars, each class properly containing the next: Recursively Enumerable (any rules); Context-Sensitive (rules like AB → CD), which can describe a^n b^n c^n; Context-Free (rules like A → abc), which can describe a^n b^n; Regular Expression (rules like S → aB), which can describe a*b*.

6 Introduction cont. Representing sentence structure: not just FSTs! Issue: recursion is potentially infinite: a + a + a + ... We want to capture constituent structure: basic units => terminals; subcategorization => non-terminals; hierarchical structure => parse tree.

7 Introduction cont. Context-free grammars (BNF grammars) allow a simple and precise description of sentences which are built from smaller blocks. Why "context-free"? Non-terminals can be rewritten without regard to the context in which they occur. Parsing algorithms for these grammars play a large role in the implementation of compilers and interpreters (e.g. Yacc, Bison, JavaCC).

8 Introduction cont. Types of parsing algorithms: general algorithms handle all context-free grammars; restricted algorithms handle sub-classes of grammars and tend to be more efficient.

9 Introduction cont. Earley's algorithm compares favorably with other general parsing algorithms: it can parse all context-free languages, executes in cubic time O(n³) in the general case, in O(n²) for unambiguous grammars, and in linear time for almost all LR(k) grammars. It performs particularly well when the rules are written left-recursively.

10 Terminology Language: a set of strings over a finite set of terminal symbols. Terminal symbols are represented by lowercase letters: a, b, c. Non-terminal symbols represent syntactic classes and are represented by capital letters: A, B, C.

11 Terminology - cont. Strings of either terminals or non-terminals are represented by Greek letters: α, β, γ. The empty string is λ. α^k = αα…α (k times). |α| is the number of symbols in α.

12 Terminology - cont. Productions (rewriting rules): a finite set of rules, each written A → α. The root of the grammar: a non-terminal which stands for "sentence". Alternatives: the productions with a particular non-terminal D on their left sides.

13 Terminology - cont. Example: T → P, T → T * P, P → a. Root: T. Terminals: a, *. Non-terminals: T, P. Each line is a production rule; the two productions with T on the left are its alternatives.

14 Terminology - cont. Given a context-free grammar G: α => β iff there are γ, δ, η, A s.t. α = γAδ, β = γηδ, and A → η is a production. α =>* β (β is derived from α) iff there are strings α_0, α_1, …, α_m s.t. α = α_0 => α_1 => … => α_m = β. The sequence α_0, α_1, …, α_m is called a derivation.

15 Terminology - cont. Sentential form: a string α s.t. α is derived from the root of the grammar (R =>* α). Sentence: a sentential form consisting entirely of terminals. Derivation tree (a.k.a. parse tree): a representation of a sentential form reflecting the steps made in deriving it.

16 Terminology - cont. Example: a derivation of a * a + a: E => E + T (by E → E + T) => T + T (E → T) => T + P (T → P) => T * P + P (T → T * P) => P * P + P (T → P) => a * P + P (P → a) => a * a + P (P → a) => a * a + a (P → a). (The slide also shows the corresponding derivation tree.)

17 Terminology - cont. Note: a derivation is not unique for a given derivation tree! Another derivation of a * a + a: E => E + T (E → E + T) => E + P (T → P) => T + P (E → T) => T * P + P (T → T * P) => P * P + P (T → P) => a * P + P (P → a) => a * a + P (P → a) => a * a + a (P → a). (Same derivation tree as on the previous slide.)

18 Terminology - cont. Note: a derivation is not unique for a given derivation tree! A parse tree represents the steps made in deriving the sentence, but not their order: the two derivations on the previous slides use exactly the same production instances in different orders, so they correspond to the same parse tree.

19 Terminology - cont. Degree of ambiguity: the number of distinct derivation trees of a sentence. Unambiguous sentence: a sentence whose degree of ambiguity is 1. Unambiguous grammar: a grammar containing only unambiguous sentences. Bounded ambiguity: a grammar with a bound b on the degree of ambiguity of its sentences.

20 Terminology - cont. The recognizer: an algorithm which takes a string as input and accepts or rejects it depending on whether or not the string is a sentence of the grammar. The parser: a recognizer which also outputs the set of all legal derivation trees for the string.

21 Informal Explanation How does the recognizer work? It scans an input string X_1, X_2, …, X_n from left to right, looking ahead some fixed number k of symbols. As each symbol X_i is scanned, a set of states S_i is constructed representing the condition of the recognition process at that point in the scan.

22 Informal Explanation Each state in the set represents: a production, such that we are currently scanning a portion of the input string which is derived from its right side; a point in that production which shows how much of the production's right side we have recognized so far; a k-symbol string which is a syntactically allowed successor to that instance of the production (the look-ahead string); and a pointer back to the position in the input string at which we began to look for that instance of the production.
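To make the four components concrete, here is a minimal sketch of such a state as a Python value. The class name, field names and the default end-marker look-ahead are my own choices for illustration, not notation from the paper.

from dataclasses import dataclass
from typing import Optional, Tuple

# A production is a pair (left side, right side), e.g. ("E", ("E", "+", "T")).
Production = Tuple[str, Tuple[str, ...]]

@dataclass(frozen=True)
class State:
    """One Earley state: a production, the dot position in its right side,
    the origin pointer, and a look-ahead string."""
    prod: Production        # the production being matched
    dot: int                # how many right-side symbols are recognized so far
    origin: int             # input position where we started looking for it
    lookahead: str = "&"    # k-symbol successor string (k = 1 here)

    def next_symbol(self) -> Optional[str]:
        """Symbol just after the dot, or None if the state is final."""
        return self.prod[1][self.dot] if self.dot < len(self.prod[1]) else None

# The state written "E → .E + T, &, 0" on the following slides:
s = State(("E", ("E", "+", "T")), dot=0, origin=0)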

23 Informal Explanation Example: in grammar AE, with k = 1, S_0 starts as the single state Φ → .E &, &, 0: a new production rule Φ → E & (Φ a new non-terminal, & a new terminal), a point (the dot) showing how much of its right side has been recognized, a k-symbol look-ahead string (& here, since k = 1), and a pointer back to input string position 0.

24 Informal Explanation The algorithm uses dynamic programming to do a parallel top-down search in (worst case) O(n³) time. A single left-to-right pass fills out n+1 state sets. Think of the state sets as sitting between the words of the input string, keeping track of the states of the parse at these positions. For each word position, a set of states represents all partial parse trees generated to date; e.g. the state set S_0 contains all partial parse trees generated at the beginning of the sentence.

25 Informal Explanation How do we recognize a sentence? When we go over a state in S_i, there are 3 cases: the dot is not at the end of the state and stands before a non-terminal symbol => Predictor; the dot is not at the end and stands before a terminal symbol => Scanner; the dot is at the end of the state => Completer.

26 Informal Explanation The predictor operation: if the dot is before a non-terminal symbol, it adds new states to the current state set, one new state for each alternative of that non-terminal in the grammar. Formally: if S_j contains the state A → α.Bβ, l_1, i, then for each production B → γ add to S_j the state B → .γ, l_2, j, where l_2 ranges over the first k symbols of strings derived from β l_1. Why? Because whatever this B eventually derives must begin at position j.
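A sketch of the predictor in the same style, reusing the hypothetical State class from above. The look-ahead computation (H_k) is omitted here, which loses some pruning but not correctness.

def predict(state, i, grammar, state_sets):
    """Predictor: the dot is before a nonterminal B, so add B -> .gamma
    (with origin i) to the current set S_i for every alternative of B.
    grammar maps each nonterminal to a list of right-hand-side tuples."""
    b = state.next_symbol()
    for rhs in grammar[b]:
        new = State((b, rhs), dot=0, origin=i)
        if new not in state_sets[i]:        # state sets behave as ordered sets
            state_sets[i].append(new)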

27 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 (Grammar and input string are shown on the slide. Each entry is a state in the compressed form used on the following slides: "E.E + T & 0" stands for the state E → .E + T with look-ahead & and origin pointer 0.)

28 Informal Explanation The scanner operation: if the dot is before a terminal symbol, compare that symbol with X_{j+1}; if they match, add the state to the next state set with the dot moved over one symbol. Formally: if S_j contains A → α.aβ, l_1, i (a a terminal) and a = X_{j+1}, then add A → αa.β, l_1, i to S_{j+1}. Why? Because the terminal after the dot must be the next input symbol for the production to keep matching.
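The scanner in the same sketch; note that it only ever writes into the next state set.

def scan(state, i, tokens, state_sets):
    """Scanner: the dot is before a terminal; if it matches the next input
    symbol X_{i+1} (tokens[i] with 0-based indexing), copy the state into
    S_{i+1} with the dot moved over that symbol."""
    if i < len(tokens) and state.next_symbol() == tokens[i]:
        moved = State(state.prod, state.dot + 1, state.origin, state.lookahead)
        if moved not in state_sets[i + 1]:
            state_sets[i + 1].append(moved)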

29 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 Grammar: Input string: S 1. &

30 Informal Explanation The completer: if the dot of a state is at the end of its production, it compares the look-ahead string with X_{i+1} … X_{i+k}. If they match, it goes back to the state set S_f indicated by the pointer and adds to the current set a copy of every state from S_f which has the derived non-terminal to the right of the dot; for each of these states the dot is placed after this non-terminal.
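The completer, continuing the same sketch; the look-ahead comparison against X_{i+1} … X_{i+k} is again left out, so every final state triggers completion.

def complete(state, i, state_sets):
    """Completer: the dot has reached the end, so the nonterminal on the
    left side has been recognized.  Go back to the state set named by the
    origin pointer and advance every state that was waiting for it."""
    recognized = state.prod[0]
    for waiting in state_sets[state.origin]:
        if waiting.next_symbol() == recognized:
            moved = State(waiting.prod, waiting.dot + 1,
                          waiting.origin, waiting.lookahead)
            if moved not in state_sets[i]:
                state_sets[i].append(moved)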

31 S 0 S 1. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 Φ.E & E.E + T E.T E.E + T E.T T.a T.a & & & + + &

32 Informal Explanation After going over all states in S_i, we move on to S_{i+1}. If the algorithm ever produces an S_{i+1} consisting of the single state Φ → E & ., &, 0, then we have correctly scanned E and the & symbol and we are finished with the string, which means the input string is a sentence of the grammar!

33 The Recognizer A precise description of the recognizer. Given: an input string X_1 … X_n and a grammar G. We arbitrarily number the productions 1 … d-1, where each production p is of the form D_p → C_p1 … C_pm (m = number of symbols in the alternative). We add a 0-th production D_0 → R & (R is the root of the grammar).

34 The Recognizer Definition: a state is a quadruple <p, j, f, α>: p is the number of the production rule (0 ≤ p ≤ d-1); j is the position of the dot in that production's right side (0 ≤ j ≤ m); f is the number of the state set in which this instance of the production began (0 ≤ f ≤ n+1); α is the look-ahead string. A state set is an ordered set of states. A final state is one in which j = m. We add a state to a state set by putting it last in the ordered set (unless it is already a member).

35 The Recognizer Definition: H_k(γ) is the set of all k-symbol terminal strings which begin some string derived from γ: H_k(γ) = { α : α is a terminal string, |α| = k, and there exists β s.t. γ =>* αβ }. It is used in forming the look-ahead strings for the states.
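For k = 1, H_1 is essentially the classical FIRST_1 set. Below is a self-contained fixed-point sketch (my own helper, not code from the paper); the empty string "" stands for "can derive the empty string", and the padding with the & end marker is not modeled.

def first_1(grammar, terminals):
    """Compute FIRST_1 for every nonterminal by iterating to a fixed point.
    Returns the table plus a helper giving H_1 of any symbol sequence."""
    first = {nt: set() for nt in grammar}

    def h1(symbols):
        out, nullable = set(), True
        for sym in symbols:
            if sym in terminals:
                out.add(sym)
                nullable = False
                break
            out |= first[sym] - {""}
            if "" not in first[sym]:
                nullable = False
                break
        if nullable:
            out.add("")        # the whole sequence can derive the empty string
        return out

    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = h1(rhs)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first, h1

# With E -> E + T | T and T -> a:  FIRST_1(E) == FIRST_1(T) == {"a"}
firsts, h1 = first_1({"E": [("E", "+", "T"), ("T",)], "T": [("a",)]}, {"a", "+"})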

36 The Recognizer The recognizer is a function of 3 arguments, REC(G, X_1 … X_n, k), computed as follows. Initialization: let X_{n+i} = & for each 1 ≤ i ≤ k+1; let S_i be empty for each 0 ≤ i ≤ n+1; add <0, 0, 0, &^k> to S_0.

37 The Recognizer For i ← 0 step 1 until n do begin: process the states of S_i in order, performing one of the following three operations on each state s = <p, j, f, α>:

38 The Recognizer (1) Predictor: if s is non-final and C_{p(j+1)} is a non-terminal, then for each q s.t. C_{p(j+1)} = D_q, and for each β ∈ H_k(C_{p(j+2)} … C_{pm} α), add <q, 0, i, β> to S_i.

39 The Recognizer (2) Completer: if s is final and α = X_{i+1} … X_{i+k}, then for each <q, l, g, β> ∈ S_f (after all states have been added to S_f) s.t. C_{q(l+1)} = D_p, add <q, l+1, g, β> to S_i.

40 The Recognizer (3) Scanner: if s is non-final and C_{p(j+1)} is a terminal, then if C_{p(j+1)} = X_{i+1}, add <p, j+1, f, α> to S_{i+1}.

41 The Recognizer Rejection condition: if S_{i+1} is empty, return rejection. Acceptance condition: if i = n and S_{i+1} = {<0, 2, 0, &>}, return acceptance. End.
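Putting the pieces together: a sketch of the driving loop, combining the hypothetical State class and the predict/scan/complete sketches from the earlier slides. It keeps the & end marker and production 0 but, as noted, skips the look-ahead tests, so it prunes less than the algorithm with k = 1.

def recognize(grammar, root, tokens):
    """Earley recognition of tokens (a list of terminal symbols) against
    grammar with the given root nonterminal.  Returns True iff accepted."""
    aug = list(tokens) + ["&"]                       # X_1 .. X_n followed by &
    start = ("Φ", (root, "&"))                       # production 0: Φ -> R &
    state_sets = [[] for _ in range(len(aug) + 1)]   # S_0 .. S_{n+1}
    state_sets[0].append(State(start, dot=0, origin=0))

    for i in range(len(aug) + 1):
        for state in state_sets[i]:              # the list grows as we iterate,
            nxt = state.next_symbol()            # so states are processed in order
            if nxt is None:                      # dot at the end
                complete(state, i, state_sets)
            elif nxt in grammar:                 # dot before a non-terminal
                predict(state, i, grammar, state_sets)
            elif i < len(aug):                   # dot before a terminal
                scan(state, i, aug, state_sets)
        if i < len(aug) and not state_sets[i + 1]:
            return False                         # rejection: S_{i+1} stayed empty
    return State(start, dot=2, origin=0) in state_sets[len(aug)]   # acceptance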

42 The Recognizer Notes: the ordering imposed on state sets is not important to their meaning; it is simply a device which allows their members to be processed correctly by the algorithm. i cannot become greater than n without either rejection or acceptance occurring. The & symbol appears only in production zero.

43 Outline revisited. Yaniv: Introduction, Terminology, Informal Explanation, The Recognizer. Morad: Example, Time and Space Bounds, Empirical Results, Practical Use.

44 Example Grammar: E → E + T, E → T, T → a. Terminals: {a, +}. Non-terminals: {E, T}. Root: E. Look-ahead: k = 1. Input string: a + a.
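Encoding this example for the hypothetical recognize sketch from the Recognizer slides (the dictionary layout is my own choice; the productions can be read off the state sets that follow):

# E -> E + T | T,   T -> a
GRAMMAR = {"E": [("E", "+", "T"), ("T",)], "T": [("a",)]}

print(recognize(GRAMMAR, "E", ["a", "+", "a"]))   # expected: True  (a + a)
print(recognize(GRAMMAR, "E", ["a", "+"]))        # expected: False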

45 S 0 Φ.E & & 0 Put the initial state in S 0

46 S 0 Φ.E & & 0 E.E + T & 0 Predictor

47 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 Predictor

48 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 Predictor

49 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 Predictor

50 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 Predictor

51 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 Predictor: state already exists

52 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 Predictor: state already exists

53 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 Predictor

54 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 1. & 0 Scanner

55 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 1. & Scanner

56 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 1. & Completer look ahead is not equal.

57 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 1. & & Completer look ahead is equal. add all states from S 0 that the dot is before T.

58 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 1. & & Completer look ahead is not equal.

59 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 1. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 Completer look ahead is equal. add all states from S 0 that the dot is before E.

60 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 1. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 Predictor nothing to do.

61 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 1. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 Scanner symbol is not equal.

62 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 2 E E +. T & 0 Scanner. S 1. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0

63 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 2 E E +. T & 0 E E +. T + 0 Scanner. S 1. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0

64 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 2 E E +. T & 0 E E +. T + 0 T.a & 2 Predictor. S 1. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0

65 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 2 E E +. T & 0 E E +. T + 0 T.a & 2 T.a + 2 Predictor. S 1. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0

66 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 2 S 3 E E +. T & 0 E E +. T + 0 T.a & 2 T.a + 2. & 2 Scanner. S 1. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0

67 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 2 S 3 E E +. T & 0 E E +. T + 0 T.a & 2 T.a + 2. & Scanner. S 1. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0

68 S 0 S 1 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 S 2 S 3 E E +. T & 0 E E +. T + 0 T.a & 2 T.a + 2. & & Completer: look-ahead is equal; add all states from S 2 that have the dot before T.

69 S 0 S 1 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 S 2 S 3 E E +. T & 0 E E +. T + 0 T.a & 2 T.a + 2. & & Completer look ahead is not equal.

70 S 0 S 1 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 S 2 S 3 E E +. T & 0 E E +. T + 0 T.a & 2 T.a + 2. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 Completer look ahead is equal. add all states from S 0 that the dot is before E.

71 S 0 S 1 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 S 2 S 3 E E +. T & 0 E E +. T + 0 T.a & 2 T.a + 2. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 Completer look ahead is not equal.

72 S 0 S 1 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 S 2 S 3 S 4 E E +. T & 0 E E +. T + 0 T.a & 2 T.a + 2. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 Φ E &. & 0 Scanner.

73 S 0 S 1 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 S 2 S 3 S 4 E E +. T & 0 E E +. T + 0 T.a & 2 T.a + 2. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 Φ E &. & 0 Scanner symbol is not equal.

74 S 0 S 1 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 S 2 S 3 S 4 E E +. T & 0 E E +. T + 0 T.a & 2 T.a + 2. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 Φ E &. & 0 Scanner symbol is not equal.

75 S 0 S 1 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 S 2 S 3 S 4 E E +. T & 0 E E +. T + 0 T.a & 2 T.a + 2. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 Φ E &. & 0 We've reached the final state: the string belongs to the grammar.

76 Time and Space Bounds In general the running time of the algorithm is O(n³). Recall S_i = {<p, j, f, α>}: p is the number of the production rule; j is the position of the dot in the production rule; f is the number of the state set that created this state; α is the look-ahead string. The number of states in any state set S_i is O(i): p, j and α are bounded by properties of the grammar, while f is bounded by i.

77 Time and Space Bounds cont. The scanner and predictor operations each execute a bounded number of steps per state in any state set, so the total time for processing the states of S_i with the scanner and predictor is O(i). The completer executes O(i) steps for each state it processes in the worst case, because it may have to add O(i) states from the state set pointed back to; so it takes O(i²) steps in S_i. Summing over all of the state sets gives O(n³) steps. This bound holds even if the look-ahead is not used.
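The final summation step is just the elementary estimate below, written out for completeness:

\[ \sum_{i=0}^{n} c\,i^{2} \;\le\; c \sum_{i=0}^{n} n^{2} \;=\; c\,(n+1)\,n^{2} \;=\; O(n^{3}). \]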

78 Time and Space Bounds cont. Only the completer is O(i²). In what cases will the completer need only O(i) steps? After the completer has been applied to the state set S_i, there are O(i) states in it. So unless some of those states were added in more than one way, the completer took only O(i) steps to complete its operation.

79 Time and Space Bounds cont. In case the grammar is unambiguous and reduced, we can show that each such state gets added in only one way. Assume that the state D_q → C_q,1 … C_q,(j+1) . … C_q,m, α, f is added to S_i in two different ways by the completer. Then we have two final states in S_i: D_p1 → A_p1,1 … A_p1,m1 ., X_{i+1} … X_{i+k}, f_1 and D_p2 → A_p2,1 … A_p2,m2 ., X_{i+1} … X_{i+k}, f_2, with C_q,(j+1) = D_p1 = D_p2 and (p_1 ≠ p_2 or f_1 ≠ f_2).

80 Time and Space Bounds cont. That means that state sets S_f1 and S_f2 each contain a state D_q → C_q,1 … . C_q,(j+1) … C_q,m, α, f with the dot before C_q,(j+1), together with the two final states above in S_i. So now we have two derivations of the same prefix: R =>* X_1 … X_f D_q β => X_1 … X_f C_q,1 … C_q,(j+1) … C_q,m β =>* X_1 … X_f1 A_p1,1 … A_p1,m1 β_1 =>* X_1 … X_i β_1, and R =>* X_1 … X_f D_q β => X_1 … X_f C_q,1 … C_q,(j+1) … C_q,m β =>* X_1 … X_f2 A_p2,1 … A_p2,m2 β_2 =>* X_1 … X_i β_2.

81 Time and Space Bounds cont. Since p_1 ≠ p_2 or f_1 ≠ f_2, the derivations of X_1 … X_i are represented by different derivation trees. Therefore there is an ambiguous sentence X_1 … X_i α for some α. So if the grammar is unambiguous, the completer executes O(i) steps per state set and the total time is bounded by O(n²). The same bound also holds for grammars of bounded ambiguity.

82 Time and Space Bounds cont. For LR(k) grammars the running time is O(n). Space: the algorithm uses O(n) state sets, each containing O(n) states, so the space bound is O(n²) in general.

83 Empirical Results The algorithm was compared empirically with other context-free parsing algorithms, and its running time was similar to or better than theirs. It was also competitive with specialized algorithms that are fast but handle only restricted classes of grammars (like Knuth's algorithm, which works only on LR(k) grammars in O(n)).

84 Practical Use Changing the recognizer into a parser: each time the completer adds a state E → αD.β, g, construct a pointer from the instance of D in that state to the state D → γ., f which caused the completer to do the operation. (The slide illustrates the pointer from the D in E → αDβ down to the completed D → γ.)
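A sketch of how the earlier completer sketch could record such pointers, assuming the hypothetical State class is given one extra field, children: Tuple[State, ...] = (). This is only an illustration: making the children part of a state's identity keeps one state per derivation, which is fine for unambiguous grammars but can blow up on highly ambiguous ones.

def complete_with_pointers(state, i, state_sets):
    """Like complete(), but when a waiting state is advanced over the
    recognized nonterminal D, it also records a pointer (here: a child
    reference) to the final state D -> gamma. that caused the advance.
    Assumes State has the extra 'children' tuple field described above."""
    recognized = state.prod[0]
    for waiting in state_sets[state.origin]:
        if waiting.next_symbol() == recognized:
            moved = State(waiting.prod, waiting.dot + 1, waiting.origin,
                          waiting.lookahead, waiting.children + (state,))
            if moved not in state_sets[i]:
                state_sets[i].append(moved)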

85 (The example is now re-run, this time constructing the parser's pointers.) S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 1. & Completer: look-ahead is not equal.

86 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 1. & & Completer look ahead is equal. add all states from S 0 that the dot is before T.

87 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 1. & & Completer look ahead is not equal.

88 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 1. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 Completer look ahead is equal. add all states from S 0 that the dot is before E.

89 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 1. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 Predictor nothing to do.

90 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 1. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 Scanner symbol is not equal.

91 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 2 E E +. T & 0 Scanner. S 1. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0

92 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 2 E E +. T & 0 E E +. T + 0 Scanner. S 1. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0

93 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 2 E E +. T & 0 E E +. T + 0 T.a & 2 Predictor. S 1. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0

94 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 2 E E +. T & 0 E E +. T + 0 T.a & 2 T.a + 2 Predictor. S 1. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0

95 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 2 S 3 E E +. T & 0 E E +. T + 0 T.a & 2 T.a + 2. & 2 Scanner. S 1. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0

96 S 0 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0 S 2 S 3 E E +. T & 0 E E +. T + 0 T.a & 2 T.a + 2. & Scanner. S 1. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0

97 S 0 S 1 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 S 2 S 3 E E +. T & 0 E E +. T + 0 T.a & 2 T.a + 2. & & Completer: look-ahead is equal; add all states from S 2 that have the dot before T.

98 S 0 S 1 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 S 2 S 3 E E +. T & 0 E E +. T + 0 T.a & 2 T.a + 2. & & Completer look ahead is not equal.

99 S 0 S 1 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 S 2 S 3 E E +. T & 0 E E +. T + 0 T.a & 2 T.a + 2. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 Completer look ahead is equal. add all states from S 0 that the dot is before E.

100 S 0 S 1 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 S 2 S 3 E E +. T & 0 E E +. T + 0 T.a & 2 T.a + 2. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 Completer look ahead is not equal.

101 S 0 S 1 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 S 2 S 3 S 4 E E +. T & 0 E E +. T + 0 T.a & 2 T.a + 2. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 Φ E &. & 0 Scanner.

102 S 0 S 1 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 S 2 S 3 S 4 E E +. T & 0 E E +. T + 0 T.a & 2 T.a + 2. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 Φ E &. & 0 Scanner symbol is not equal.

103 S 0 S 1 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 S 2 S 3 S 4 E E +. T & 0 E E +. T + 0 T.a & 2 T.a + 2. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 Φ E &. & 0 Scanner symbol is not equal.

104 S 0 S 1 Φ.E & & 0 E.E + T & 0 E.T & 0 E.E + T + 0 E.T + 0 T.a & 0 T.a + 0. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 S 2 S 3 S 4 E E +. T & 0 E E +. T + 0 T.a & 2 T.a + 2. & & Φ E. & & 0 E E. + T & 0 E E. + T + 0 Φ E &. & 0 We've reached the final state: the string belongs to the grammar. (The slide also shows the parse tree for a + a built from the pointers: Φ → E &, E → E + T, with the left E deriving T → a and the right T deriving a.)

105 Practical Use cont. The algorithm can also handle context-free grammars which make use of the Kleene star notation, e.g. A → (B C)* D. Any state of the form A → α.(β)*γ, f or A → α(β.)*γ, f is replaced by the two states A → α(.β)*γ, f and A → α(β)*.γ, f.
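An alternative way to get the same effect without modifying the dot-moving rules is to rewrite the starred group with a fresh nonterminal. The helper below is a hypothetical sketch; it deliberately avoids ε-rules, and the left-recursive form it produces is exactly the shape on which Earley's algorithm is said to perform well.

def desugar_star(lhs, starred, after, fresh):
    """Rewrite  lhs -> (starred)* after  into plain context-free alternatives
    using the (assumed unused) nonterminal name 'fresh'."""
    return {
        lhs:   [tuple(after), (fresh,) + tuple(after)],        # A -> after | X after
        fresh: [tuple(starred), (fresh,) + tuple(starred)],    # X -> starred | X starred
    }

# A -> (B C)* D  becomes  A -> D | X D,  X -> B C | X B C
print(desugar_star("A", ["B", "C"], ["D"], "X"))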

107 Thank you
