Topics in IT: Parsing and Pattern Recognition
Week: Context-Free Parsing
College of Information Science and Engineering, Ritsumeikan University
this week
  ambiguity
    in natural language
    in machine languages
  top-down, breadth-first parser: Earley's algorithm
  obtaining a derivation from the parser chart
ambiguity
  Jungyun Seo and Robert F. Simmons, "Syntactic Graphs: A Representation for the Union of All Ambiguous Parse Trees"
  [Figure 6: Graph Representation and Parse Trees of a Highly Ambiguous Sentence. Sentence: "Time flies like an arrow."]
  OED: the most-used …00 words average … meanings each
  excerpt: "... each reading, because each node can be a modifier node only once in one reading. Therefore, we can focus on the arcs pointing to the same node as ambiguous points. In terms of triples, any two triples with identical modifier terms reveal a point of ambiguity, where a modifier term is dominated by more than one node. In the example in Figure …, the syntactic ambiguities ... each position must participate in every syntactic reading of a syntactic graph, every node which is not a root node and has only one in-arc must always be included in every syntactic reading. Such unambiguous nodes are common to the intersections of all possible readings. When we know the exact locations of several pieces in a jigsaw puzzle, it is much easier to place the other ..."
ambiguity
  Jungyun Seo and Robert F. Simmons, "Syntactic Graphs: A Representation for the Union of All Ambiguous Parse Trees"
  [Figure …: Syntactic Graph of the Example Sentence. Sentence: "I saw a man on the hill with a telescope."]
  excerpt: "... trees in a shared, packed-parse forest). We claim that a syntactic graph represented by the triples and an exclusion matrix contains all important syntactic information in the parse forest. In the next section, we motivate this work with an example. Then we briefly introduce X (X-bar) theory ..."
  excerpt: "I saw a man on the hill with a telescope. I cleaned the lens to get a better view. When we read the first sentence, we cannot determine whether the man has a telescope or the telescope is used to see the man. This is known as the PP-attachment problem, and many researchers have proposed various ways to solve it (Frazier and Fodor 1979; Shubert 198…, ..."
ambiguity
  any grammar can be made ambiguous by adding duplicates:
    two derivations for the empty string:
      A → ɛ    B → ɛ
    an infinite number of derivations for the empty string:
      A → A | ɛ
    left- and right-recursive derivations (and any combination of them):
      A → A A | ɛ    and...    A → A + A | A - A
  a classic example, the "dangling else":
    if (x) if (y) P; else Q;
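To get a feeling for how quickly ambiguity grows with a rule such as A → A + A | A - A, the sketch below counts the distinct parse trees for an expression with a given number of operands. It assumes the rule is given one extra alternative for a single operand, which is a hypothetical addition made only so that derivations can terminate. The name `count_trees` is also hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_trees(operands: int) -> int:
    """Count parse trees for `operands` operands joined by binary operators.

    Assumed grammar (the operand alternative is a hypothetical addition):
        A -> A + A | A - A | operand
    The operators are fixed by the input, so only the bracketing varies.
    """
    if operands == 1:
        return 1  # a single operand has exactly one tree
    # Choose which operator is at the root: k operands go to its left,
    # the remaining operands go to its right.
    return sum(count_trees(k) * count_trees(operands - k)
               for k in range(1, operands))


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, "operands:", count_trees(n), "parse trees")
```

Under these assumptions an input with three operands already has two parse trees, and the count grows rapidly from there, which is why a parser that tolerates ambiguity needs a compact way to represent all derivations.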
ambiguity
  not a problem in traditional languages
    designed/implemented by one person
    who is experienced in grammars
    who would never allow the above examples in their grammar
  but: there are languages with extensible grammars
    an end user can add rules to an existing grammar
    without complete knowledge of the base grammar
    without experience in language design
  and while we cannot stop them breaking the language...
    we can attempt to produce correct results even when ambiguity is introduced
ambiguity
  recursive descent cannot handle these constructions
  something more powerful is required:
    embracing multiple derivations (for ambiguity)
    immune to infinite left-recursion
    immune to infinite right-recursion
    etc.
  in other words: a technique to parse any CFG
  we already saw NFAs dealing easily with ambiguity
    the parallel matching algorithm
    let's review how...
parallel recognition with NFAs
  let's try to match an input string against a regular expression
  begin by adding just the start state to the first set (input position 0)
  [Figure: the NFA, with the input string and the set of permitted states at each input position]
parallel recognition with NFAs: Success!
  [Figure: final frame, showing the input string and the set of permitted states at each input position]
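The parallel technique reviewed above can be sketched in a few lines. This is only an illustration: the NFA below is a hypothetical one for the pattern (a|b)*c, not the automaton drawn on the slides, and the names `TRANSITIONS`, `closure` and `match` are assumptions made for this sketch.

```python
from typing import Dict, Iterable, Set, Tuple

EPS = ""  # marks an epsilon (empty) transition

# Hypothetical NFA for the pattern (a|b)*c, chosen only for illustration.
TRANSITIONS: Dict[Tuple[int, str], Set[int]] = {
    (0, EPS): {1, 3},  # either enter the loop or go straight on to 'c'
    (1, "a"): {2},
    (1, "b"): {2},
    (2, EPS): {1, 3},  # go round the loop again or leave it
    (3, "c"): {4},
}
START, ACCEPT = 0, 4


def closure(states: Iterable[int]) -> Set[int]:
    """Extend a state set with every state reachable by epsilon transitions."""
    seen = set(states)
    todo = list(seen)
    while todo:
        s = todo.pop()
        for t in TRANSITIONS.get((s, EPS), ()):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen


def match(text: str) -> bool:
    """Keep one set of permitted states per input position; never back-track."""
    current = closure({START})  # the set for input position 0
    for ch in text:
        moved: Set[int] = set()
        for s in current:
            moved |= TRANSITIONS.get((s, ch), set())
        current = closure(moved)  # the set for the next input position
        if not current:
            return False          # no state can continue
    return ACCEPT in current


if __name__ == "__main__":
    for sample in ["abac", "c", "ab", "abca"]:
        print(repr(sample), match(sample))
```

Each input character is inspected exactly once, and all alternatives are carried forward together in the state set. The Earley parser applies the same idea to context-free grammars.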
top-down parsing
  that technique was useful to turn a back-tracking NFA implementation into a parallel (non-back-tracking) NFA implementation
  we saw (last week) how a top-down back-tracking parser can parse input according to a (restricted) CFG grammar by
    expanding sentential forms
    remembering positions as parser items
    backing up when the derivation cannot continue
  we will remove back-tracking from top-down parsing using a technique similar to the one we used to remove back-tracking from NFA matching
  first, a review of back-tracking top-down parsing...
top-down parsing: back-tracking
  grammar for a list of two or more digits:
    L → N , L
    L → N , N
    N → [0-9]
  [Table: numbered trace with the columns "production", "sentential form" and "input". The parser keeps expanding L → N , L until the input no longer matches, then back-tracks to an earlier step and tries a different production for L (L → N , N). This happens twice before the input is fully matched; the steps that remain form the derivation.]
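A minimal sketch of a back-tracking top-down parser for the list grammar above. It is not the implementation from the previous week; it uses Python generators so that every alternative production is retried automatically when a later step fails. The names `GRAMMAR`, `parse`, `parse_sequence` and `accepts`, and the sample inputs, are assumptions made for illustration.

```python
from typing import Dict, Iterator, List

# L -> N , L | N , N      N -> [0-9]
GRAMMAR: Dict[str, List[List[str]]] = {
    "L": [["N", ",", "L"], ["N", ",", "N"]],
    "N": [[d] for d in "0123456789"],
}


def parse(symbol: str, text: str, pos: int) -> Iterator[int]:
    """Yield every position at which `symbol` can end when it starts at pos."""
    if symbol not in GRAMMAR:                 # a terminal symbol
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for production in GRAMMAR[symbol]:        # try each production in turn
        yield from parse_sequence(production, text, pos)


def parse_sequence(symbols: List[str], text: str, pos: int) -> Iterator[int]:
    """Match a sequence of symbols, backing up whenever a later symbol fails."""
    if not symbols:
        yield pos
        return
    for mid in parse(symbols[0], text, pos):
        # If nothing below succeeds, the loop asks parse() for its next
        # alternative: that is the back-tracking step.
        yield from parse_sequence(symbols[1:], text, mid)


def accepts(text: str) -> bool:
    """True if some derivation of L consumes the whole input."""
    return any(end == len(text) for end in parse("L", text, 0))


if __name__ == "__main__":
    for sample in ["1,2,3", "1,2", "1", "1,2,"]:
        print(repr(sample), accepts(sample))
```

This sketch would recurse forever on a left-recursive rule such as S → S + M, which is one of the limitations the parallel approach removes.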
example: parallel parsing of CFGs
  using the grammar
    P → S              # the start rule
    S → S + M | M
    M → M * T | T
    T → [0-9]
  let's consider parsing a sentence of the form d + d * d, where each d is a single digit
  the initial input position will be before the first token: • d + d * d
  the initial state (item) set will be: S0 = { (P → • S) }

  the state sets, as they are built:
    S0 =  1  (P → • S)          initial state
          2  (S → • S + M)      predict from 1
          3  (S → • M)          predict from 1
          4  (M → • M * T)      predict from 3
          5  (M → • T)          predict from 3
          6  (T → • [0-9])      predict from 5
    S1 =  1  (T → [0-9] •)      scan from S0.6

  inspecting item 1 of S0:
    the item's position (•) is immediately before the non-terminal S
    we predict an S might appear at this input position
    we add items corresponding to each production of S to the set (there are two of them)
  inspecting item 2:
    the items corresponding to each production of S are already in the set, so there is nothing to do
  inspecting item 3:
    we predict an M might appear at this input position
    we add items for each production of M to the set (there are two of them)
  inspecting item 4:
    we predict an M might appear at this input position
    the items corresponding to each production of M are already in the set, so there is nothing to do
  inspecting item 5:
    we predict a T might appear at this input position
    add an item corresponding to its production
  inspecting item 6:
    the current item is positioned before a terminal
    we check the input for the same terminal, and if present...
    add an item to the next set representing the parser state after scanning past the terminal
  there are no more states left in set S0, so...
    advance the input position to the next terminal symbol
    repeat the process with the next set, S1
  inspecting item 1 of S1:
    the current item is positioned at the end of its production
    we have completely recognised a T
    we can advance past it in the set where it was originally predicted: • T becomes T •
    but, how do we know that the item we are looking for is number 5 in set S0?
example: parallel parsing of CFGs
  let's add the original input position to our state set items
  instead of storing sets of (X → α • β)
    we will store sets of (j : X → α • β)
    where j is the input position where this X was originally predicted
  then, when we reach the end of an item, such as (j : X → α β •)
    we look in set S_j for any items that originally predicted X
    those items will contain α • X β
    which can now be advanced to α X • β
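One possible way to represent such items is sketched below. The class name `State`, its field names and the helper `complete` are assumptions made for illustration. The origin position j is stored alongside the production and the dot, so that a finished item can find the items that predicted it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class State:
    """An item (j : X -> alpha . beta): origin j, production X -> rhs, dot index."""
    origin: int
    lhs: str
    rhs: Tuple[str, ...]
    dot: int = 0

    def next_symbol(self) -> Optional[str]:
        """The symbol immediately after the dot, or None at the end of the item."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def advanced(self) -> "State":
        """The same item with the dot moved one symbol to the right."""
        return State(self.origin, self.lhs, self.rhs, self.dot + 1)

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"({self.origin} : {self.lhs} → {' '.join(symbols)})"


def complete(finished: State, sets: List[List[State]]) -> List[State]:
    """Look in the set where `finished` was predicted and advance every item
    whose dot is immediately before the non-terminal that has been matched."""
    return [s.advanced()
            for s in sets[finished.origin]
            if s.next_symbol() == finished.lhs]


if __name__ == "__main__":
    s0 = [
        State(0, "P", ("S",)),
        State(0, "S", ("S", "+", "M")),
        State(0, "S", ("M",)),
        State(0, "M", ("M", "*", "T")),
        State(0, "M", ("T",)),
        State(0, "T", ("[0-9]",)),
    ]
    finished_t = State(0, "T", ("[0-9]",), dot=1)
    for state in complete(finished_t, [s0]):
        print(state)
```

Because the origin travels with the item, there is no need to remember item numbers such as "number 5 in set S0": the lookup is driven entirely by the origin and the symbol after the dot.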
rebooted example
  using the grammar
    P → S              # the start rule
    S → S + M | M
    M → M * T | T
    T → [0-9]
  let's consider parsing a sentence of the form d + d * d, where each d is a single digit
  the initial input position will be: • d + d * d
  the initial state set will be: S0 = { (0 : P → • S) }

complete example
  the finished chart (S0 is before the first token, S5 is after the last):
    S0                      (0 : P → • S)  (0 : S → • S + M)  (0 : S → • M)  (0 : M → • M * T)  (0 : M → • T)  (0 : T → • [0-9])
    S1 (after the digit)    (0 : T → [0-9] •)  (0 : M → T •)  (0 : M → M • * T)  (0 : S → M •)  (0 : S → S • + M)  (0 : P → S •)
    S2 (after +)            (0 : S → S + • M)  (2 : M → • M * T)  (2 : M → • T)  (2 : T → • [0-9])
    S3 (after the digit)    (2 : T → [0-9] •)  (2 : M → T •)  (2 : M → M • * T)  (0 : S → S + M •)  (0 : S → S • + M)  (0 : P → S •)
    S4 (after *)            (2 : M → M * • T)  (4 : T → • [0-9])
    S5 (after the digit)    (4 : T → [0-9] •)  (2 : M → M * T •)  (2 : M → M • * T)  (0 : S → S + M •)  (0 : S → S • + M)  (0 : P → S •)

  S0 is built by prediction, exactly as before; its last item scans the first digit into S1

  processing S1:
    (0 : T → [0-9] •)   we completed a T at position 0, so copy the corresponding item from S0 to S1 with • T changed to T •
    (0 : M → T •)       we completed an M at position 0, so copy the corresponding items to S1 with • M changed to M •
    (0 : M → M • * T)   this item requires * on the input, which is not there; move on to the next item
    (0 : S → M •)       we have completed an S at position 0; copy the corresponding items with • S changed to S •
    (0 : S → S • + M)   we need to scan +, which is the next input symbol, so copy the current item to the next set, with • + changed to + •
    (0 : P → S •)       we have also completed the start rule P at position 0; this input (d) would be a valid parse, but we are not at the end of the input, so we can continue
    there are no more items left in S1, so advance the input to the next terminal symbol and begin considering the next state set, S2

  processing S2:
    (0 : S → S + • M)   predict an M; add two items corresponding to the two productions for M
    (2 : M → • M * T)   predict an M again; all corresponding items are already present in S2, so do nothing
    (2 : M → • T)       predict a T; add the corresponding item
    (2 : T → • [0-9])   we need to scan a digit, which we have, so add the scanned item to the next state set, S3
    no more items left in S2; advance the input to the next terminal, and consider the next state set, S3

  processing S3:
    (2 : T → [0-9] •)   we have completed a T that began in S2; add the corresponding item to the set, changing • T to T •
    (2 : M → T •)       we have completed an M that began in S2; add the corresponding items to the set, changing • M to M •
    (2 : M → M • * T)   we need to scan *, which is present on the input; add the corresponding item to the next set, changing • * to * •
    (0 : S → S + M •)   we have completed an S that began in S0; add the corresponding items to the set, changing • S to S •
    (0 : S → S • + M)   we need to scan +, which is not present; move on to the next item
    (0 : P → S •)       we have completed P, which began in S0; this input (d + d) would be a valid parse, but there is more input
    no more items in S3; advance the input and consider the next state set, S4

  processing S4:
    (2 : M → M * • T)   predict a T; add the corresponding item
    (4 : T → • [0-9])   we need to scan a digit, which is present on the input; copy the item to the next set, with • [0-9] changed to [0-9] •
    no more items in S4; advance the input and consider S5

  processing S5:
    (4 : T → [0-9] •)   completed a T that began in S4; copy the corresponding item to the current set, with • T changed to T •
    (2 : M → M * T •)   completed an M that began in S2; copy the corresponding items to the current set, with • M changed to M •
    (2 : M → M • * T)   need to scan *, which is not present; ignore the item and move on to the next item
    (0 : S → S + M •)   completed an S that began in S0; copy the corresponding items, with • S changed to S •
    (0 : S → S • + M)   need to scan +, which is not present on the input; move on to the next item
    (0 : P → S •)       completed P (the start symbol), which began in S0 (the start of the input); this (d + d * d) is a valid parse of the input; there is no more input: accept this as one possible derivation
    there are no more items in this set, and no more sets
      there are no more possible parses of the input
      all derivations accepted so far are possible parses of the input

  we can formulate this process as a few simple rules...
parsing CFGs: items
  let
    upper-case letters (e.g., X or Y) be non-terminal symbols
    lower-case letters (e.g., p or q) be terminal symbols
    greek letters (e.g., α or β) be any sequence of symbols
  an item is a production with a dot indicating the current position: X → α • β
  and so...
    X → • α β γ   = an item that may soon match X
    X → α • β γ   = an item that has begun matching X
    X → α β • γ   = an item that has almost matched X
    X → α β γ •   = an item that has finished matching X
parsing CFGs: state sets
  a state is a tuple containing an item and an input position: (i : X → α • β)
    the item's production says what we are trying to match
    the dot says how much of the production we have matched
    the position i indicates where the production began: the origin position of X
  for every input position, the parser generates a state set
    position 0 is before the first token of the input sentence
    position n is the position after accepting the n-th token
    the state set at input position k is called S_k
  if the start symbol is S and the start rule is S → α
    then, initially, S0 = { (0 : S → • α) }
parsing CFGs
  there are three possible forms that each state might take:
    state in S_k            interpretation
    (j : X → α • p β)       we expect to see the terminal p next on the input
    (j : X → α • Y β)       we expect to match the entire non-terminal Y next
    (j : X → γ •)           we have completely matched an X that began at input position j
  each case suggests an appropriate response:
    state in S_k            response
    (j : X → α • p β)       try to scan p at the current position
    (j : X → α • Y β)       predict we might see Y at the current position
    (j : X → γ •)           complete the matching of X in state set S_j
parsing CFGs
  each response
    state in S_k            response
    (j : X → α • p β)       try to scan p at the current position
    (j : X → α • Y β)       predict we might see Y at the current position
    (j : X → γ •)           complete the matching of X in set S_j
  translates into manipulations of (additions to) the state sets:
    state in S_k            new state(s)
    (j : X → α • p β)       scan p: if the next input token is p, then add (j : X → α p • β) to state set S_{k+1}
    (j : X → α • Y β)       predict Y: for every production Y → γ in the grammar for Y, add (k : Y → • γ) to S_k
    (j : X → γ •)           complete X: for all states in S_j of the form (i : Y → α • X β), add (i : Y → α X • β) to S_k
parsing CFGs
  the result is a chart of the paths taken during parsing
    both successful and unsuccessful
  all successful derivations will have a completion of the start rule
  steps taken within derivations can be found easily:
    identify a completion of the initial (start) rule
    follow completions and scans backwards to the initial rule in S0
    mark each step encountered
  a parse tree can then be reconstructed from the marked steps
    either following scans and predictions forwards (top-down)
    or following scans and completions backwards (bottom-up)
parsing CFGs
  first valid parse: d
    S0.1  (P → • S, 0)         initial state
    S0.3  (S → • M, 0)         predict from 1
    S0.5  (M → • T, 0)         predict from 3
    S0.6  (T → • [0-9], 0)     predict from 5
    S1.1  (T → [0-9] •, 0)     scan from S0.6 = (T → • [0-9], 0)
    S1.2  (M → T •, 0)         complete from 1, S0.5 = (M → • T, 0)
    S1.4  (S → M •, 0)         complete from 2, S0.3 = (S → • M, 0)
    S1.6  (P → S •, 0)         complete from 4, S0.1 = (P → • S, 0)
parsing CFGs
  another valid parse: d + d
    S0.1  (P → • S, 0)         initial state
    S0.3  (S → • M, 0)         predict from 1
    S0.5  (M → • T, 0)         predict from 3
    S0.6  (T → • [0-9], 0)     predict from 5
    S1.1  (T → [0-9] •, 0)     scan from S0.6 = (T → • [0-9], 0)
    S1.2  (M → T •, 0)         complete from 1, S0.5 = (M → • T, 0)
    S1.4  (S → M •, 0)         complete from 2, S0.3 = (S → • M, 0)
    S1.5  (S → S • + M, 0)     complete from 4, S0.2 = (S → • S + M, 0)
    S2.1  (S → S + • M, 0)     scan from S1.5 = (S → S • + M, 0)
    S2.3  (M → • T, 2)         predict from 1
    S2.4  (T → • [0-9], 2)     predict from 3
    S3.1  (T → [0-9] •, 2)     scan from S2.4
    S3.2  (M → T •, 2)         complete from 1, S2.3 = (M → • T, 2)
    S3.4  (S → S + M •, 0)     complete from 2, S2.1 = (S → S + • M, 0)
    S3.6  (P → S •, 0)         complete from 4, S0.1 = (P → • S, 0)
parsing CFGs
  final valid parse: d + d * d
    S0.1  (P → • S, 0)         initial state
    S0.3  (S → • M, 0)         predict from 1
    S0.5  (M → • T, 0)         predict from 3
    S0.6  (T → • [0-9], 0)     predict from 5
    S1.1  (T → [0-9] •, 0)     scan from S0.6
    S1.2  (M → T •, 0)         complete from 1, S0.5
    S1.4  (S → M •, 0)         complete from 2, S0.3
    S1.5  (S → S • + M, 0)     complete from 4, S0.2
    S2.1  (S → S + • M, 0)     scan from S1.5
    S2.3  (M → • T, 2)         predict from 1
    S2.4  (T → • [0-9], 2)     predict from 3
    S3.1  (T → [0-9] •, 2)     scan from S2.4
    S3.2  (M → T •, 2)         complete from 1, S2.3
    S3.3  (M → M • * T, 2)     complete from 2, S2.2
    S4.1  (M → M * • T, 2)     scan from S3.3
    S4.2  (T → • [0-9], 4)     predict from 1
    S5.1  (T → [0-9] •, 4)     scan from S4.2
    S5.2  (M → M * T •, 2)     complete from 1, S4.1
    S5.4  (S → S + M •, 0)     complete from 2, S2.1
    S5.6  (P → S •, 0)         complete from 4, S0.1
Earley parsing
  this form of chart parser was invented by Jay Earley in 1968
    hence we call it an Earley parser
  it can parse any context-free grammar
    LL and LR compatible grammars in linear time: O(n)
    any non-ambiguous grammar in at most quadratic time: O(n^2)
    any ambiguous CF grammar in (at most) cubic time: O(n^3)
    where n is the size of the input sentence
  it performs particularly well with left-recursive rules
    very good for left-associative operators (e.g., most arithmetic operators)
  it will deliver all valid parses of the input
    and useful information even if there are no valid parses
  Earley parsers are popular for natural language processing
Earley parsing
  function earley-parse(input, grammar) =
    add (0 : S → • γ) to S0
    for each state set S0, S1, ..., S_k
      for each state s ∈ S_k
        if s = (j : X → α • p β)                  /* scan */
          if input[k] = p
            add (j : X → α p • β) to S_{k+1}
        else if s = (j : X → α • Y β)             /* predict */
          for each (Y → γ) ∈ grammar
            add (k : Y → • γ) to S_k
        else /* s = (j : X → γ •) */              /* complete */
          for each (i : A → α • X β) ∈ S_j
            add (i : A → α X • β) to S_k
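The pseudocode can be turned into a small runnable recogniser. The following is a sketch under stated assumptions rather than a reference implementation: states are plain tuples (origin, lhs, rhs, dot), the terminal class [0-9] is handled by a hypothetical helper `matches`, the sample input "1+2*3" is chosen only for illustration, and productions with an empty right-hand side are not supported (they need extra care in the complete step).

```python
from typing import Dict, List, Sequence, Tuple

State = Tuple[int, str, Tuple[str, ...], int]  # (origin, lhs, rhs, dot)

GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "P": [("S",)],                      # the start rule
    "S": [("S", "+", "M"), ("M",)],
    "M": [("M", "*", "T"), ("T",)],
    "T": [("[0-9]",)],
}


def matches(terminal: str, token: str) -> bool:
    """Hypothetical helper: does an input token match a terminal symbol?"""
    if terminal == "[0-9]":
        return len(token) == 1 and token in "0123456789"
    return terminal == token


def earley_parse(tokens: Sequence[str], grammar, start: str = "P"):
    """Return (accepted, state_sets) for the token sequence."""
    sets: List[List[State]] = [[] for _ in range(len(tokens) + 1)]

    def add(k: int, state: State) -> None:
        if state not in sets[k]:        # state sets contain no duplicates
            sets[k].append(state)

    for rhs in grammar[start]:
        add(0, (0, start, rhs, 0))

    for k in range(len(tokens) + 1):
        i = 0
        while i < len(sets[k]):         # the set may grow while we walk it
            origin, lhs, rhs, dot = sets[k][i]
            i += 1
            if dot < len(rhs) and rhs[dot] not in grammar:      # scan
                if k < len(tokens) and matches(rhs[dot], tokens[k]):
                    add(k + 1, (origin, lhs, rhs, dot + 1))
            elif dot < len(rhs):                                # predict
                for gamma in grammar[rhs[dot]]:
                    add(k, (k, rhs[dot], gamma, 0))
            else:                                               # complete
                for o2, l2, r2, d2 in list(sets[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(k, (o2, l2, r2, d2 + 1))

    accepted = any(o == 0 and l == start and d == len(r)
                   for o, l, r, d in sets[-1])
    return accepted, sets


if __name__ == "__main__":
    ok, chart = earley_parse(list("1+2*3"), GRAMMAR)
    print("accepted:", ok)
    for k, state_set in enumerate(chart):
        print(f"S{k}: {len(state_set)} states")
```

The sketch only recognises the input. To recover derivations, each state would also need to record how it was produced (scan, predict or complete), so that completions and scans can be followed backwards from the finished start rule.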
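homework
  practise Earley parsing using an input of the form 9 * d + d, where each d is a single digit
  grammar:
    P → S              # the start rule
    S → S + M | M
    M → M * T | T
    T → [0-9]
  state sets to fill in:
    S0   9   S1   *   S2   d   S3   +   S4   d   S5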
glossary
  complete
    a step in an Earley parser that finishes recognising a production. All states that were predicting the corresponding non-terminal at the position where the rule began can be advanced and added to the current state set.
  extensible
    a language whose syntax (or semantics) can be extended by the user, who is often not an expert in the domain of syntax (or semantics).
  item
    a parser state, used as an element of a state set or parser state machine, containing a production rule and a dot representing the progress that has been made in recognising the right-hand side of that rule.
  origin position
    the input position at which the recognising of a production's right-hand side was begun.
  position
    a numerical offset from the start of the sentence, measured in tokens.
  predict
    a step in an Earley parser that predicts seeing a non-terminal next in the input. All productions corresponding to the non-terminal are added to the current state set.
  scan
    a step in an Earley parser that makes progress by advancing a dot over a terminal symbol when the next input symbol matches it. Progress is noted by adding a new item to the next state set with the dot moved to after the terminal symbol in the production's right-hand side.
  state
    a tuple in an Earley parser combining a parsing item with an origin position recording where in the sentence the parsing of the rule was begun.
  state set
    in an Earley parser, a set of states associated with a specific position in the input.
  tuple
    a pair of values that are associated with each other. In an Earley parser, each state is a tuple that combines a parser item with an input position.