Compilers
Spring term
Alfonso Ortega: alfonso.ortega@uam.es
Enrique Alfonseca: enrique.alfonseca@uam.es
Chapter: Syntactic analysis. Introduction. Bottom-up analysis
Syntax Analyser: Concepts

The syntax analyser analyses the context-independent (context-free) structures in the programming language.

A context-independent grammar is a quadruple $\langle \Sigma_T, \Sigma_N, \mathit{axiom}, \mathit{rules} \rangle$ where $\mathit{axiom} \in \Sigma_N$, $\mathit{rules} \subseteq \Sigma_N \times \Sigma^*$ and $\Sigma = \Sigma_T \cup \Sigma_N$. In other words, the production rules have the following syntax: $A \to v$, with $A \in \Sigma_N$ and $v \in \Sigma^*$.

The syntax analyser is also called the parser. It usually controls all the actions performed by the compiler.

The analysis can be understood as the procedure that finds the syntactic derivation tree of the source program (if the program is correct), which is a way of reducing the whole program to the grammar's axiom.

Remember that, after the morphological analysis, the terminal symbols in the grammar are the syntactic units.

Language features which are not context-independent: declaring identifiers before their use [Aho]

Consider the following language: $L = \{ wcw \mid w \in (a|b)^* \}$

These are some of the words in the language:
- aabcaab
- aca
- bcb

We can consider it an abstraction of a language that requires identifiers to be declared before their use: before the c we would have the declaration of the variable, and after the c we would find its use.
- In aabcaab the identifier would be aab.
- In aca the identifier would be a.
- In bcb the identifier would be b.

This is a very simplified example of the real situation described (the variable might be used many times, for instance).
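The declare-then-use abstraction above can be sketched in a few lines of Python. This is only an illustration: the function name `declared_before_use` is hypothetical, and the check relies on remembering the declared identifier, which is exactly the extra memory a context-independent grammar does not have.

```python
def declared_before_use(word: str) -> bool:
    """Return True if word has the form w c w, with w over the alphabet {a, b}."""
    # Split at the first 'c': the left part plays the role of the declaration,
    # the right part plays the role of the use.
    declaration, separator, use = word.partition("c")
    if not separator:
        return False                      # no 'c' at all
    if set(declaration + use) - {"a", "b"}:
        return False                      # w may only contain a and b
    # The "use" must repeat the "declaration" exactly.
    return declaration == use


for example in ("aabcaab", "aca", "bcb", "aabcab"):
    print(example, declared_before_use(example))
```

The first three words are the ones listed above; the last one uses an identifier (ab) that differs from the declared one (aab).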
It can be proved, using the pumping lemma, that the previous language is not context-independent. Therefore, any real programming language in which variables have to be declared before their use will NOT be context-independent.

Possible solutions:
- Use a formalism more expressive than context-independent grammars, and an architecture which is more powerful than pushdown automata.
- Add to the grammar a mechanism which allows us to solve the problem: the symbols table. This is the most usual procedure.

Language features which are not context-independent: checking the number of parameters [Aho]

Let us consider the following language: $L = \{ a^n b^m c^n d^m \mid n \geq 1, m \geq 1 \}$

These are some of the words in this language:
- aaabcccd
- abbcdd
- aaaaabbcccccdd

The coincidence between the number of repetitions of a's and c's can be interpreted as, for instance, checking that the number of parameters used when calling a subroutine is the same as when the subroutine was declared.

In aaabcccd the first function has been declared with three arguments (three a's) and then called with three arguments (three c's). In the same way, the second function is declared and called with one parameter (b and d). The other two examples are alike.

Observe that this language is a big simplification with respect to the real situation described (e.g. there are no function names or parameter types).
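As a rough illustration of the correspondence just described, the following Python sketch checks membership in the language by counting. The name `arity_matches` is hypothetical, and the sketch assumes that n and m are at least 1. The regular expression only checks the overall shape of the word; the comparison of the counts is the part that a context-independent grammar cannot express.

```python
import re

# Shape of the word: a block of a's, then b's, then c's, then d's.
_SHAPE = re.compile(r"(a+)(b+)(c+)(d+)")


def arity_matches(word: str) -> bool:
    """Return True if word is a^n b^m c^n d^m with n, m >= 1 (assumed bounds)."""
    match = _SHAPE.fullmatch(word)
    if match is None:
        return False
    n_decl, m_decl, n_call, m_call = (len(group) for group in match.groups())
    # Declared arity (a's, b's) must equal the arity used in the call (c's, d's).
    return n_decl == n_call and m_decl == m_call


for example in ("aaabcccd", "abbcdd", "aaaaabbcccccdd", "aabcd"):
    print(example, arity_matches(example))
```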
It can be proved, using the pumping lemma, that the previous language is not context-independent. Any real programming language in which procedures are called with parameters, and in which their number or type must be checked, will NOT be context-independent.

Possible solutions:
- Use a formalism more expressive than context-independent grammars, and an architecture which is more powerful than pushdown automata.
- Add to the grammar a mechanism which allows us to solve the problem: the symbols table. This is the most usual procedure.

Context-independent structures in programming languages: structured programming

Consider the following fragment of the ASPLE grammar:

    <stm train> ::= <statement>
                  | <statement> ; <stm train>
    <cond stm>  ::= if <exp> then <stm train> fi
                  | if <exp> then <stm train> else <stm train> fi
    <loop stm>  ::= while <exp> do <stm train> end
                  | repeat <stm train> until <exp>

These rules represent, respectively, the following structures for flow control:
- Block of sentences.
- Conditional flow.
- Loops.

The existence of these rules shows that these structures are context-independent and that the language is not regular: nested loops, for instance, produce the pattern $\text{repeat}^n \ldots \text{until}^n$.
Context-independent structures in programming languages: Conclusions

We will be required to use:
- Context-independent grammars.
- A symbols table.

Syntax Analyser: Strategies for syntactic analysis

The various techniques for parsing can be grouped in two main types.

First approach (top-down): the starting points are the axiom, the grammar and the program transformed into a sequence of syntactic units.
- Try the different derivation options for each non-terminal that we find.
- Advance simultaneously along the program and the leaves of the derivation tree under construction whenever both coincide.
- Continue until either
  - the whole program has been generated from the axiom using the grammar rules, so the program is syntactically correct; or
  - every possible option for each non-terminal symbol (including the axiom) has been tried and the source program cannot be obtained from the grammar, so the program is syntactically wrong.
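The top-down strategy can be sketched as a small backtracking recogniser in Python. This is a minimal sketch under stated assumptions, not the course's implementation: it uses only the <stm train> rule of the ASPLE fragment, treats <statement> as a single token called "stmt" (a simplification), and the names `GRAMMAR` and `derives` are hypothetical. It also assumes the grammar has no left recursion, otherwise the search would not terminate.

```python
# Assumed toy grammar: <stm train> ::= <statement> | <statement> ; <stm train>
# "stmt" and ";" are terminals (syntactic units); "train" is the axiom.
GRAMMAR = {
    "train": [["stmt"], ["stmt", ";", "train"]],
}


def derives(symbols, tokens):
    """Return True if the sentential form `symbols` can derive exactly `tokens`."""
    if not symbols:
        return not tokens                 # success only if the input is exhausted
    head, rest = symbols[0], symbols[1:]
    if head not in GRAMMAR:
        # Terminal leaf: it must coincide with the next token of the program.
        return bool(tokens) and tokens[0] == head and derives(rest, tokens[1:])
    # Non-terminal: try every derivation option, backtracking on failure.
    return any(derives(option + rest, tokens) for option in GRAMMAR[head])


print(derives(["train"], ["stmt", ";", "stmt"]))
print(derives(["train"], ["stmt", ";"]))
```

For the first input, the first option for "train" fails because tokens remain, and the recogniser backtracks to the second option.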
Second approach (bottom-up):
- Having the sequence of tokens returned by the morphological analyser, we start reading it, trying to match it with the right-hand sides of the grammar rules. There might be several possibilities for this.
- When a match is found, we substitute the matched sequence of tokens with the nonterminal at the left-hand side of the rule, and continue the same analysis.
- If, at any time, none of the sequences in the string corresponds to any of the right-hand sides of the rules, we backtrack and try a different substitution.
- When every option has been systematically tried and the analysis could not be completed, the program must be syntactically incorrect.
- The other way of finishing the analysis is when we finally obtain the axiom of the grammar alone. In this case, the program is syntactically correct.

Bottom-up analysis: Introduction

The input string is scanned, looking for coincidences with the right-hand sides of the rules in the grammar.
- While the terminal symbols match the right-hand side of any rule, we advance along the input.
- When we find the complete right-hand side of a rule, it is reduced: that right-hand side is substituted with the rule's left-hand side.

In this way, the string that must be reduced to the axiom changes, reproducing the derivation tree of the string from the bottom upwards.
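The bottom-up strategy with backtracking can be sketched in Python as an exhaustive search over reductions. This is only an illustration under assumptions: it reuses the <stm train> rule of the ASPLE fragment with <statement> simplified to a token "stmt", and the names `RULES`, `AXIOM` and `reduces_to_axiom` are hypothetical. Practical parsers, such as the LR analysers, avoid this blind search.

```python
# Assumed toy grammar, written as (left-hand side, right-hand side) pairs.
RULES = [
    ("train", ("stmt",)),
    ("train", ("stmt", ";", "train")),
]
AXIOM = "train"


def reduces_to_axiom(form, failed=None):
    """Return True if the tuple of symbols `form` can be reduced to the axiom."""
    if failed is None:
        failed = set()                    # forms already explored
    if form == (AXIOM,):
        return True                       # only the axiom is left: correct
    if form in failed:
        return False
    failed.add(form)
    for lhs, rhs in RULES:
        size = len(rhs)
        for start in range(len(form) - size + 1):
            if form[start:start + size] == rhs:
                # Reduce: replace the right-hand side with the left-hand side.
                reduced = form[:start] + (lhs,) + form[start + size:]
                if reduces_to_axiom(reduced, failed):
                    return True
                # Otherwise backtrack and try a different substitution.
    return False


print(reduces_to_axiom(("stmt", ";", "stmt")))
print(reduces_to_axiom(("stmt", ";")))
```

On the first input, reducing the leftmost "stmt" first leads to a dead end, so the search backtracks and reduces the rightmost one instead.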
LR analysis: Concepts

There are two main operations in bottom-up parsing: shift and reduce. Intuitively:

Reduction: it occurs when all the components in the right-hand side of a rule, $A \to \alpha_1 \ldots \alpha_m$, have been matched with symbols in the input string. In essence, analysers replace the right-hand side with the nonterminal symbol at the left of the rule, constructing the new reduced string. The final aim is to reduce that string to the axiom.

[Figure: the matched symbols $\alpha_1 \ldots \alpha_m$, followed by the rest of the input, are replaced by $A$.]

Shift: it occurs when the terminals found in the string, one by one, match the corresponding part of the right-hand side of a rule in the grammar. The analysers note the circumstance and store the position reached in the right-hand side of all the rules that might finally be reduced in the input. We shall see a few examples of this process.

[Figure: a set of rules containing the terminal $t_j$ in their right-hand sides, before and after shifting $t_j$.]
LR Analysis: General LR algorithm: introduction

This kind of analysis needs two steps:
1. Constructing a table for syntactic analysis.
2. Using that table in the analysis.

Step 2 is the same for all subtypes of the LR algorithm. They differ in step 1.

The table has the following structure:
- As many columns as symbols in the grammar (terminals and nonterminals).
  - The columns corresponding to terminal symbols determine the action that the parser performs in each case.
  - The columns corresponding to nonterminals determine a go-to function.
- As many rows as analysis states (seen afterwards).

LR Analysis: General LR algorithm: use of the table

An LR parser is a pushdown automaton associated with a context-independent grammar, which has been modified so as to simplify as much as possible its coding, management and performance. The following figure shows the structure of the parser:

[Figure: structure of the LR parser, showing the INPUT, the STACK (state $s_m$ on top, above $X_m$, $s_{m-1}$, $X_{m-1}$, ...), the ANALYSIS TABLE (action columns for $\Sigma_T$ and go-to columns for $\Sigma_N$) and the OUTPUT.]
LR Analysis: General LR algorithm: input, stack and table

The analyser has the following components:
- The input, which initially holds the string that we want to analyse, followed by an end-of-string symbol.
- The stack, where symbols are pushed in pairs, in the order $X_m\, s_m$, where $X_m$ is a terminal or a nonterminal and $s_m$ is a state.
- The table, which contains:
  - In the action columns, instructions about what the analyser should do if it is in the state corresponding to the row and it reads the symbol indicated by the column. Possible actions are:
    - ss: shift, and go to state s.
    - rP: reduce using the production rule P.
    - Accept: end the analysis, accepting the string.
    - Error: end the analysis with an error.
  - In the go-to columns, the transitions between the states. Therefore, these table cells contain names of states.

Its behaviour is given by the algorithm of the analyser.

LR Analysis: General LR algorithm: introductory example

Let us consider the following grammar:

[Grammar G: the symbols and rules were lost in extraction.]
The grammar will be extended with a new axiom, whose only purpose is to add the end-of-string symbol:

[Extended grammar: the symbols and rules were lost in extraction.]

Let us assume that the analysis table is the following one (we shall see how to build it later):

[Table: LR analysis table for the introductory example, with action and go-to columns; the cell contents were lost in extraction.]
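LR Analysis: Algorithm of the analyser

The algorithm can be summarised as follows:

    State LRAnalyser(analysis_table, input, stack, grammar)
    /* input contains the string w to be analysed, followed by the end-of-string symbol */
    {
      pointer current_symbol = input[0];
      state current_state;
      push(stack, 0);                      /* initial state */
      while (true)                         /* Unending loop */
      {
        current_state = top(stack);        /* the state on top of the stack drives the table lookup */
        if (analysis_table[current_state, current_symbol] == ss)        /* shift, go to state s */
        {
          push(stack, current_symbol);
          push(stack, s);
          current_symbol++;
        }
        else if (analysis_table[current_state, current_symbol] == rj)   /* reduce with rule j */
        {
          /* Assume the j-th rule is A → α */
          perform 2 * length(α) times: pop(stack);   /* one symbol and one state per element of α */
          s' = top(stack);
          push(stack, A);
          push(stack, analysis_table[s', A]);        /* go-to entry */
          printf("Reduce: A → α");
        }
        else if (analysis_table[current_state, current_symbol] == accept)
          return "ACCEPTED STRING";
        else                               /* empty cell */
          return "Syntactic error: REJECTED STRING";
      }
    }

LR Analysis: introductory example

[Figures: step-by-step trace of the analyser on the introductory example, one slide per step; the table and stack contents were lost in extraction.]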
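Since the table of the introductory example is not available here, the following Python sketch runs the same table-driven loop on an assumed toy grammar: rule 1 is E → E + n and rule 2 is E → n, with "$" as the end-of-string symbol. The grammar, the hand-built table and the names `RULES`, `ACTION`, `GOTO` and `lr_parse` are all assumptions made for illustration, not the example used in the course.

```python
# Assumed grammar: (1) E -> E + n   (2) E -> n
RULES = {1: ("E", ("E", "+", "n")), 2: ("E", ("n",))}

# Hand-built table for the assumed grammar. Missing cells mean "error".
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2), (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1), (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1}


def lr_parse(tokens):
    """Return the rule numbers used in the reductions, or None on a syntax error."""
    tokens = list(tokens) + ["$"]         # append the end-of-string symbol
    stack = [0]                           # initial state
    position = 0
    reductions = []
    while True:
        state = stack[-1]
        kind, arg = ACTION.get((state, tokens[position]), ("error", None))
        if kind == "shift":
            stack += [tokens[position], arg]      # push symbol, then state
            position += 1
        elif kind == "reduce":
            lhs, rhs = RULES[arg]
            del stack[len(stack) - 2 * len(rhs):]  # pop 2 * |rhs| entries
            stack += [lhs, GOTO[(stack[-1], lhs)]]  # push A and the go-to state
            reductions.append(arg)
        elif kind == "accept":
            return reductions
        else:
            return None                   # empty cell: rejected string


print(lr_parse(["n", "+", "n"]))   # [2, 1]
print(lr_parse(["n", "+"]))        # None
```

For the input n + n the analyser first reduces the leading n with rule 2 and later reduces E + n with rule 1, which reproduces the derivation tree from the bottom upwards.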
LR Analysis: exercise

Problem: Repeat the analysis performed in the previous example, using the analysis table and input string shown on the next page.

LR Analysis: example

[Table: analysis table for the exercise (state rows, action and go-to columns with d and r entries) and its input string; the cell contents were lost in extraction.]