CMSC 330: Organization of Programming Languages. Pushdown Automata Parsing

Chomsky Hierarchy Categorization of various languages and grammars Each is strictly more restrictive than the previous First described by Noam Chomsky in 1956 Type-0: any formal grammar (Turing machines) Type-1: context-sensitive grammars (linear bounded automata) Type-2: context-free grammars (pushdown automata, PDAs) Type-3: regular grammars/regular expressions (finite state automata, NFAs/DFAs) CMSC 330

Implementing context-free languages Problem: enforcing balanced language constructs Solution: add a stack CMSC 330 3

Pushdown Automaton (PDA) A pushdown automaton (PDA) is an abstract machine similar to a DFA Has a finite set of states and transitions Also has a pushdown stack Moves of the PDA are as follows: an input symbol is read and the top symbol on the stack is read; based on both inputs, the machine enters a new state, and pushes zero or more symbols onto the pushdown stack, or pops zero or more symbols from the stack A string is accepted if the input has ended AND the stack is empty CMSC 330 4

Power of PDAs PDAs are more powerful than DFAs The language aⁿbⁿ, which cannot be recognized by a DFA, can easily be recognized by a PDA Push all a symbols onto the stack For each b, pop an a off the stack If the end of input is reached at the same time that the stack becomes empty, the string is accepted CMSC 330 5
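As a concrete illustration (a sketch, not part of the slides), the aⁿbⁿ strategy above can be simulated in OCaml with an explicit stack; the function name is_anbn and the list used as the stack are assumptions for this example.

(* Simulate the a^n b^n PDA with an explicit stack:
   push an 'A' for each 'a', then pop one 'A' for each 'b';
   accept iff the input ends exactly when the stack is empty. *)
let is_anbn (input : char list) : bool =
  (* "push" plays the role of the state that reads a's,
     "pop" the state that reads b's *)
  let rec push stack = function
    | 'a' :: rest -> push ('A' :: stack) rest
    | chars       -> pop stack chars
  and pop stack chars =
    match stack, chars with
    | 'A' :: tl, 'b' :: rest -> pop tl rest
    | [], []                 -> true      (* end of input, empty stack *)
    | _, _                   -> false
  in
  push [] input

For example, is_anbn ['a'; 'a'; 'b'; 'b'] evaluates to true, while is_anbn ['a'; 'b'; 'a'; 'b'] evaluates to false.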

Formal Definition
Q: finite set of states
Σ: input alphabet
Γ: stack alphabet
δ: transitions from (Q × Γ) to (Q × Γ) on (Σ ∪ {ε})
q0: start state (member of Q)
Z: initial stack symbol (member of Γ)
F: set of accepting states (subset of Q)
CMSC 330
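As a worked instance (an illustration added here, not taken from the slides), the aⁿbⁿ machine from the previous slide could be written as Q = {p, q}, Σ = {a, b}, Γ = {A, Z}, q0 = p, and initial stack symbol Z, with transitions (p, a, Z) → (p, AZ), (p, a, A) → (p, AA), (p, b, A) → (q, ε), (q, b, A) → (q, ε), (p, ε, Z) → (q, ε), and (q, ε, Z) → (q, ε). Each transition reads at most one input symbol and the stack top, then pushes or pops as described earlier; the stack empties exactly when the whole input has the form aⁿbⁿ, which matches the acceptance condition on the earlier slide.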

Parsing There are many efficient techniques for turning strings into parse trees They all have strange names, like LL(k), SLR(k), LR(k) They use various forms of PDAs We will look at one very simple technique: recursive descent parsing This is a top-down parsing algorithm because we're going to begin at the start symbol and try to produce the string We won't actually formally construct any PDAs CMSC 330 7

Example
E → id = n | { L }
L → E ; L | ε
Here n is an integer and id is an identifier One input might be { x = 3; { y = 4; }; } This would get turned into a list of tokens { x = 3 ; { y = 4 ; } ; } And we want to turn it into a parse tree CMSC 330 8

Example (cont'd)
E → id = n | { L }
L → E ; L | ε
Input: { x = 3; { y = 4; }; }
The parse tree drawn on the slide corresponds to the leftmost derivation
E ⇒ { L } ⇒ { E ; L } ⇒ { x = 3 ; L } ⇒ { x = 3 ; E ; L } ⇒ { x = 3 ; { L } ; L } ⇒ { x = 3 ; { E ; L } ; L } ⇒ { x = 3 ; { y = 4 ; L } ; L } ⇒ { x = 3 ; { y = 4 ; } ; L } ⇒ { x = 3 ; { y = 4 ; } ; }
CMSC 330 9

Parsing Algorithm Goal: determine if we can produce a string from the grammar's start symbol At each step, we'll keep track of two facts What tree node are we trying to match? What is the next token (lookahead) of the input string? CMSC 330 10

Parsing Algorithm There are three cases: If we're trying to match a terminal and the next token (lookahead) is that token, then succeed, advance the lookahead, and continue If we're trying to match a nonterminal, then pick which production to apply based on the lookahead Otherwise, fail with a parsing error CMSC 330 11

Example (cont'd)
E → id = n | { L }
L → E ; L | ε
The slide repeats the parse tree from before, now alongside the token list { x = 3 ; { y = 4 ; } ; } with an arrow marking the lookahead, the current token being matched
CMSC 330 12

Definition of First(γ) First(γ), for any terminal or nonterminal γ, is the set of initial terminals of all strings that γ may expand to We'll use this to decide what production to apply CMSC 330 13

Definition of First(γ), cont'd
For a terminal a, First(a) = { a }
For a nonterminal N:
If N → ε, then add ε to First(N)
If N → α1 α2 ... αn (note the αi are all the symbols on the right side of one single production), then:
Add First(α1 α2 ... αn) to First(N), where First(α1 α2 ... αn) is defined as
First(α1) if ε ∉ First(α1)
Otherwise (First(α1) − {ε}) ∪ First(α2 ... αn)
If ε ∈ First(αi) for all i, 1 ≤ i ≤ n, then add ε to First(N)
CMSC 330 14

Examples
Grammar 1:
E → id = n | { L }
L → E ; L | ε
First(id) = { id } First("=") = { "=" } First(n) = { n } First("{") = { "{" } First("}") = { "}" } First(";") = { ";" }
First(E) = { id, "{" }
First(L) = { id, "{", ε }
Grammar 2 (same, but E may also be empty):
E → id = n | { L } | ε
L → E ; L | ε
First of each terminal is unchanged
First(E) = { id, "{", ε }
First(L) = { id, "{", ";", ε }
CMSC 330 15
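The definition on the previous slide can be turned into a small fixed-point computation. The sketch below is an illustration only (the symbol type, the grammar encoding, and all function names are assumptions, not from the slides); it computes First for Grammar 1 above.

(* Compute First sets for  E -> id = n | { L }   and   L -> E ; L | eps
   by applying the rules from the previous slide until nothing changes.
   The string "eps" stands for ε. *)
type symbol = T of string | N of string        (* terminal / nonterminal *)

let grammar = [                                 (* [] encodes an ε production *)
  ("E", [T "id"; T "="; T "n"]);
  ("E", [T "{"; N "L"; T "}"]);
  ("L", [N "E"; T ";"; N "L"]);
  ("L", []);
]

module S = Set.Make (String)

(* First of one symbol, given the current table for nonterminals *)
let first_sym table = function
  | T a -> S.singleton a
  | N n -> (try List.assoc n table with Not_found -> S.empty)

(* First of a string of symbols alpha1 ... alphan, per the slide *)
let rec first_string table = function
  | [] -> S.singleton "eps"
  | x :: rest ->
      let fx = first_sym table x in
      if S.mem "eps" fx
      then S.union (S.remove "eps" fx) (first_string table rest)
      else fx

(* Iterate until the table stops growing *)
let first_table () =
  let nts = List.sort_uniq compare (List.map fst grammar) in
  let step table =
    List.map (fun n ->
        let rhss = List.filter (fun (lhs, _) -> lhs = n) grammar in
        let sets = List.map (fun (_, rhs) -> first_string table rhs) rhss in
        (n, List.fold_left S.union S.empty sets))
      nts
  in
  let rec fix table =
    let table' = step table in
    if List.for_all2 (fun (_, a) (_, b) -> S.equal a b) table table'
    then table' else fix table'
  in
  fix (List.map (fun n -> (n, S.empty)) nts)

Running first_table () yields First(E) = { id, "{" } and First(L) = { id, "{", eps }, matching the slide.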

Implementing a Recursive Descent Parser For each terminal symbol a, create a function parse_a, which: If the lookahead is a, it consumes the lookahead by advancing the lookahead to the next token, and returns Otherwise it fails with a parse error For each nonterminal N, create a function parse_n This function is called when we're trying to parse a part of the input which corresponds to (or can be derived from) N parse_s for the start symbol S begins the process CMSC 330 16

Implementing a Recursive Descent Parser, cont'd The body of parse_n for a nonterminal N does the following: Let N → β1 | ... | βk be the productions of N Here βi is the entire right side of a production: a sequence of terminals and nonterminals Pick the production N → βi such that the lookahead is in First(βi) It must be that First(βi) ∩ First(βj) = ∅ for i ≠ j If there is no such production, but N → ε, then return Otherwise, fail with a parse error Suppose βi = α1 α2 ... αn. Then call parse_α1 (); ... ; parse_αn () to match the expected right-hand side, and return CMSC 330 17

Example
E → id = n | { L }
L → E ; L | ε

let parse_term t =
  if lookahead = t
  then lookahead := <next token>
  else raise <Parse error>

let rec parse_e () =
  if lookahead = 'id' then begin
    parse_term 'id';
    parse_term '=';
    parse_term 'n'
  end
  else if lookahead = '{' then begin
    parse_term '{';
    parse_l ();
    parse_term '}'
  end
  else raise <Parse error>
CMSC 330 18

Example (cont'd)
E → id = n | { L }
L → E ; L | ε

(* mutually recursive with the previous let rec *)
and parse_l () =
  if lookahead = 'id' || lookahead = '{' then begin
    parse_e ();
    parse_term ';';
    parse_l ()
  end
  (* else return (not an error) *)
CMSC 330 19
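To run the parser (a usage sketch under the same assumptions as the slides: a lexer supplies tokens and lookahead is a mutable cell holding the current one), one would set lookahead to the first token and call parse_e (); if the call returns without raising a parse error and the whole input has been consumed, the string is in the language.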

Things to Notice If you draw the execution trace of the parser as a tree, then you get the parse tree This is a predictive parser because we use the lookahead to determine exactly which production to use CMSC 330 20

Limitations: Overlapping First Sets This parsing strategy may fail on certain grammars because the First sets overlap This doesn't mean the grammar is not usable in a parser, just not in this type of parser Consider parsing the grammar E → n + E | n First(n + E) = { n } = First(n), so both alternatives for E start with the same lookahead and we can't use this technique Exercise: Rewrite this grammar so it becomes amenable to our parsing technique CMSC 330 21
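One possible answer to the exercise (a standard left-factoring sketch, not given on the slide): E → n E' and E' → + E | ε. Now the single production for E starts with n, and the two alternatives for E' start with + and ε respectively, so one token of lookahead again suffices.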

Limitations: Left Recursion How about the grammar S → S a | ε First(S a) = { a }, so we're OK as far as choosing which production But the body of parse_s() has an infinite loop: if (lookahead = "a") then parse_s() recurses before consuming any input This technique cannot handle left recursion Exercise: rewrite this grammar to be right-recursive CMSC 330 22
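One possible answer (a sketch, not given on the slide): S → a S | ε. This generates the same language, a*, but each recursive call to parse_s() now happens only after an a has been consumed, so the parser always makes progress.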

Expr Grammar for Top-Down Parsing
E → T E'
E' → ε | + E
T → P T'
T' → ε | * T
P → n | ( E )
Notice we can always decide what production to choose with only one symbol of lookahead CMSC 330 23
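For reference (computed here, not shown on the slide): First(P) = { n, ( }, First(T') = { *, ε }, First(T) = { n, ( }, First(E') = { +, ε }, and First(E) = { n, ( }. Within E' and within T' the alternatives have disjoint First sets, and the grammar has no left recursion, which is why one token of lookahead always suffices.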

Interesting Question Recursive descent parsers are a form of pushdown automata But where's the stack? CMSC 330 24

What's Wrong with Parse Trees? We don't actually use parse trees to do translation Parse trees contain too much information E.g., they have parentheses and they have extra nonterminals for precedence This extra stuff is needed for parsing But when we want to reason about languages, it gets in the way (it's too much detail) CMSC 330 25

Abstract Syntax Trees (ASTs) An abstract syntax tree is a more compact, abstract representation of a parse tree, with only the essential parts (The slide shows a parse tree and the corresponding AST side by side) CMSC 330 26

ASTs (cont'd) Intuitively, ASTs correspond to the data structure you'd use to represent strings in the language Note that grammars describe trees (so do OCaml datatypes, which we'll see later) E → a | b | c | E+E | E-E | E*E | (E) CMSC 330 27
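A possible OCaml datatype for this grammar (a sketch; the constructor names are assumptions, not from the slides):

(* AST for E → a | b | c | E+E | E-E | E*E | (E) *)
type expr =
  | Var of char                (* a, b, or c *)
  | Add of expr * expr         (* E + E *)
  | Sub of expr * expr         (* E - E *)
  | Mul of expr * expr         (* E * E *)
  (* no constructor for ( E ): parentheses guide parsing
     but carry no meaning, so they do not appear in the AST *)

(* the string "a * (b + c)" would be represented as *)
let example = Mul (Var 'a', Add (Var 'b', Var 'c'))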

Producing an AST To produce an AST, we modify the parse() functions to construct the AST along the way CMSC 330 28
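For instance (a sketch of the idea, reusing the statement grammar E → id = n | { L } from earlier; the stmt type, its constructors, and the helpers lookahead_is_id, parse_id_token, and parse_int_token are assumptions, not from the slides), parse_e can return a tree node instead of just consuming tokens:

type stmt =
  | Assign of string * int     (* id = n *)
  | Block of stmt list         (* { L } *)

let rec parse_e () : stmt =
  if lookahead_is_id () then begin
    let x = parse_id_token () in      (* consume id, return its name *)
    parse_term '=';
    let n = parse_int_token () in     (* consume n, return its value *)
    Assign (x, n)
  end
  else if lookahead = '{' then begin
    parse_term '{';
    let body = parse_l () in          (* parse_l now returns a stmt list *)
    parse_term '}';
    Block body
  end
  else raise <Parse error>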

General parsing algorithms LL parsing Scans input Left-to-right Builds Leftmost derivations Sometimes called top-down parsing Implemented with tables or a recursive descent algorithm LR parsing Scans input Left-to-right Builds Rightmost derivations (in reverse) Sometimes called bottom-up parsing Usually implemented with a shift-reduce algorithm LL(k) means LL with k-symbol lookahead CMSC 330 29

General parsing algorithms Recursive descent parsers are easy to write They're unable to handle certain kinds of grammars More powerful techniques generally require tool support, such as yacc and bison LR(k), SLR(k) [Simple LR(k)], and LALR(k) [Lookahead LR(k)] are all techniques used today to build efficient parsers Recursive descent is a form of LL(k) parsing You'll study more about parsing in CMSC 430 CMSC 330 30

Context-free Grammars in Practice Regular expressions and finite automata are used to turn raw text into a string of tokens E.g., if, then, identifier, etc. Whitespace and comments are simply skipped These tokens are the input for the next phase of compilation This process is called lexing Lexer generators include lex and flex Grammars and pushdown automata are used to turn tokens into parse trees and/or ASTs This process is called parsing Parser generators include yacc and bison The compiler produces object code from ASTs CMSC 330 31

The Compilation Process CMSC 330 32