Computation of Substring Probabilities in Stochastic Grammars

Ana L. N. Fred
Instituto de Telecomunicações, Instituto Superior Técnico, IST-Torre Norte, Av. Rovisco Pais, Lisboa, Portugal
Abstract. The computation of the probability of string generation according to stochastic grammars, given only some of the symbols that compose it, underlies pattern recognition problems concerning prediction and/or recognition based on partial observations. This paper presents algorithms for the computation of substring probabilities in stochastic regular languages. Situations covered include prefix, suffix and island probabilities. The computational time complexity of the algorithms is analyzed.

1 Introduction

The computation of the probability of string generation according to stochastic grammars, given only some of the symbols that compose it, underlies some pattern recognition problems such as prediction and recognition of patterns based on partial observations. Examples of this type of problem are illustrated in [2-4], in the context of automatic speech understanding, and in [5, 6], concerning the prediction of a particular physiological state based on the syntactic analysis of electro-encephalographic signals. Another potential application, in the area of image understanding, is the recognition of partially occluded objects based on their string contour descriptions.

Expressions for the computation of substring probabilities according to stochastic context-free grammars written in Chomsky Normal Form have been proposed [1-3]. In this paper, algorithms are described for the computation of substring probabilities for regular-type languages expressed by stochastic grammars with rules of the form

    σ → F_i    F_i → α    F_i → α F_j    α ∈ Σ⁺, F_i, F_j ∈ V_N

(σ representing the grammar start symbol, and V_N and Σ corresponding to the non-terminal and terminal symbol sets, respectively).
This type of grammar arises, for instance, in the process of grammatical inference based on Crespi-Reghizzi's method [7], when structural samples assume the form [...[dc[fgb[e[cd[ab]]]] (meaning some sort of temporal alignment of sub-patterns).
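To fix ideas, a stochastic grammar of the form above can be held in a small data structure. The following is a minimal sketch, with a hypothetical encoding (the names `grammar`, `check_normalized` and the rule triples are illustration only, not the paper's notation): each non-terminal F_i maps to a list of (terminal string α, optional next non-terminal F_j, probability) triples, and the start symbol σ maps to the probabilities of its rules σ → F_i.

```python
# Hypothetical encoding of a stochastic grammar with rules
# sigma -> F_i, F_i -> alpha, F_i -> alpha F_j (alpha a non-empty terminal string).
grammar = {
    # start symbol: probability of each rule sigma -> F_i
    "sigma": {"F1": 0.6, "F2": 0.4},
    # F_i rules: (terminal string, next non-terminal or None, rule probability)
    "F1": [("ab", "F2", 0.5), ("a", None, 0.5)],
    "F2": [("b", "F1", 0.3), ("ba", None, 0.7)],
}

def check_normalized(g):
    """Rule probabilities attached to each non-terminal should sum to 1."""
    ok = abs(sum(g["sigma"].values()) - 1.0) < 1e-9
    for nt, rules in g.items():
        if nt == "sigma":
            continue
        ok = ok and abs(sum(p for _, _, p in rules) - 1.0) < 1e-9
    return ok

print(check_normalized(grammar))  # True
```

This toy grammar is reused conceptually by the algorithm sketches further below; any consistent rule encoding would do equally well.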
The algorithms presented next exploit the particular structure of these grammars, being essentially dynamic programming methods. Situations described include the recognition of fixed length strings (probability of exact recognition, section 3.1; highest probability interpretation, section 3.2) and arbitrary length strings (prefix, section 3.3; suffix, section 3.5; island probabilities, section 3.4). The computational complexity of the methods is analyzed in terms of worst-case time complexity. Minor and obvious modifications of the above algorithms enable the computation of probabilities according to the standard form of regular grammars.

2 Notation and Definitions

Let G = (V_N, Σ, R_s, σ) be a stochastic context-free grammar, where V_N is the finite set of non-terminal symbols or syntactical categories, Σ is the set of terminal symbols (vocabulary), R_s is a finite set of productions of the form p_i : A → α, with A ∈ V_N, α ∈ (Σ ∪ V_N)*, the star representing any combination of symbols in the set, p_i being the rule probability, and σ ∈ V_N is the start symbol. When rules take the particular form A → aB or A → a, with A, B ∈ V_N and a ∈ Σ, the grammar is designated finite-state or regular. Throughout the text the symbols A, G and σ will be used to represent non-terminal symbols.

The following definitions are also useful:

    α ≝ w_1…w_n, a finite sequence of terminal symbols
    α ⇒ β ≝ derivation of β from α through the application of an arbitrary number of rules
    C_T(α) ≝ number of terminal symbols in α (with repetitions)
    C_N(α) ≝ number of non-terminal symbols in α (with repetitions)
    n_min(A, G) ≝ min{ n, max{ C_T(α) : A → αG } }

The following graphical notation is used to represent derivation trees:

- Arcs are associated with the direct application of rules; for instance, the rule σ → w_1 w_2 G is represented by an arc from σ to the sequence w_1 w_2 G.
- A triangle represents the derivation trees having the top non-terminal symbol as root and leading to the string on the base of the triangle: A ⇒ w_1…w_n.
3 Algorithms Description

3.1 Probability of Derivation of the Exact String, Pr(A ⇒ w_i…w_{i+n})

Let Pr(A ⇒ w_i…w_{i+n}) be the probability of all derivation trees having A as root and generating exactly w_i…w_{i+n}. According to the type of grammars considered, this probability can be computed as follows (see figure 1):

    Pr(σ ⇒ w_i…w_{i+n}) = ∑_G Pr(σ → G) Pr(G ⇒ w_i…w_{i+n})    (1)

    Pr(A ⇒ w_i…w_{i+n}), A ≠ σ
      = Pr(A → w_i…w_{i+n}) + ∑_G ∑_{k=1}^{n_min(A,G)} Pr(A → w_i…w_{i+k−1} G) Pr(G ⇒ w_{i+k}…w_{i+n})    (2)

Fig. 1. Terms of expression (2).

This can be summarized by the iterative algorithm:

1. For all non-terminal symbols A ∈ V_N − {σ}, determine Pr(A → w_{i+n}).
2. For k = 1, …, n and for all non-terminal symbols A ∈ V_N − {σ}, compute:

    Pr(A ⇒ w_{i+n−k}…w_{i+n})
      = Pr(A → w_{i+n−k}…w_{i+n}) + ∑_G ∑_{j=1}^{n_min(A,G)} Pr(A → w_{i+n−k}…w_{i+n−k+j−1} G) Pr(G ⇒ w_{i+n−k+j}…w_{i+n})    (3)
3. Pr(σ ⇒ w_i…w_{i+n}) = ∑_G Pr(σ → G) Pr(G ⇒ w_i…w_{i+n})    (4)

This corresponds to filling in the matrix of figure 2, column by column, from right to left. In this figure, each element corresponds to the probability of all derivation trees for the substring at the top of the column, with root in the non-terminal symbol indicated at the left of the row, i.e. Pr(A_i ⇒ w_j…w_{i+n}).

Fig. 2. Array representing the syntactic recognition of strings according to the algorithm; rows are indexed by the non-terminals A_1, …, A_{|V_N|} and columns by the suffixes w_i…w_{i+n}, …, w_{i+n−1} w_{i+n}, w_{i+n}.

Notice that the calculations in step 2 are based only on direct rules and on access to previously computed probabilities in columns to the right of the position under calculation.

Computational complexity analysis. For strings of length n, the computational cycle is repeated n times, all non-terminal symbols being considered. Associating to the terms in equation (3)

    A: Pr(A → w_{i+n−k}…w_{i+n})
    B: ∑_G ∑_{j=1}^{n_min(A,G)} Pr(A → w_{i+n−k}…w_{i+n−k+j−1} G) Pr(G ⇒ w_{i+n−k+j}…w_{i+n})

the worst-case time complexity is of the order

    O( (|V_N| − 1) [step 1] + ∑_{k=1}^{n−1} (|V_N| − 1)(cost of A + cost of B) [step 2] + |V_N| [step 3] ) = O(|V_N| n)
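The right-to-left suffix table of section 3.1 can be sketched in a few lines. This is an illustrative implementation only, using a hypothetical rule encoding (dicts of (terminal string, optional next non-terminal, probability) triples), not the paper's notation:

```python
# Dynamic program for the exact-string probability Pr(sigma => w):
# table[k][A] = Pr(A => w[k:]), filled from the shortest suffix to the full string,
# mirroring the right-to-left column filling of the parsing matrix (Fig. 2).

def exact_prob(rules, start_rules, w):
    """rules: {A: [(terminals, next non-terminal or None, prob)]}
    start_rules: {A: Pr(sigma -> A)};  w: the input string."""
    n = len(w)
    table = [dict() for _ in range(n + 1)]
    for k in range(n, -1, -1):
        for A, prods in rules.items():
            p = 0.0
            for terms, G, pr in prods:
                m = len(terms)
                if w[k:k + m] != terms:
                    continue
                if G is None:
                    if k + m == n:          # A -> w[k:] exactly (eq. (2), first term)
                        p += pr
                elif k + m <= n:            # A -> terms G, G derives the remainder
                    p += pr * table[k + m].get(G, 0.0)
            table[k][A] = p
    # eq. (4): combine the start-symbol rules
    return sum(pr * table[0].get(A, 0.0) for A, pr in start_rules.items())

rules = {"F1": [("ab", "F2", 0.5), ("a", None, 0.5)],
         "F2": [("b", "F1", 0.3), ("ba", None, 0.7)]}
start = {"F1": 0.6, "F2": 0.4}
print(exact_prob(rules, start, "abba"))
```

For the toy grammar, "abba" is derived via sigma → F1, F1 → ab F2, then F2 → ba (0.7) or F2 → b F1, F1 → a (0.15), giving 0.6 · 0.5 · 0.85 = 0.255; each suffix column is computed once per non-terminal, matching the linear-in-n behavior claimed above.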
3.2 Maximum Probability of Derivation, P_m(A ⇒ w_i…w_{i+n})

Let P_m(A ⇒ w_1…w_n) denote the highest probability over all derivation trees having A as root and producing exactly w_1…w_n, and define the matrix M_u as follows:

    M_u[i, j] = P_m(A_i ⇒ w_j…w_n),  i = 1, …, |V_N|,  j = 1, …, n    (5)

Observing that

    P_m(σ ⇒ w_1…w_n) = max_G { Pr(σ → G) P_m(G ⇒ w_1…w_n) }    (6)

and, for A ≠ σ,

    P_m(A ⇒ w_1…w_n) = max{ Pr(A → w_1…w_n), max_{i,G} { Pr(A → w_1…w_i G) P_m(G ⇒ w_{i+1}…w_n) } }    (7)

the following algorithm computes the desired probability:

1. For i = 1, …, |V_N|, A_i ≠ σ:
   M_u[i, n] = Pr(A_i → w_n) if (A_i → w_n) ∈ R_s, and 0 otherwise.
2. For j = n − 1, …, 1 and for i = 1, …, |V_N|, A_i ≠ σ:
   M_u[i, j] = max{ Pr(A_i → w_j…w_n), max_{k>j, l} { Pr(A_i → w_j…w_k A_l) M_u[l, k+1] } }
3. For i such that A_i = σ: M_u[i, 1] = max_j { Pr(σ → A_j) M_u[j, 1] }.

This algorithm corresponds to the computation of an array similar to the one in figure 2, but where probabilities refer to maximum values of single derivations instead of the total probability of derivation when ambiguous grammars are considered. Based on the similarity between this algorithm and the one developed in section 3.1, it is straightforward to conclude that it runs in O(|V_N| n) time.
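As the text notes, section 3.2 differs from section 3.1 only in replacing sums by maxima over the same table. A sketch under the same hypothetical rule encoding as the other examples (names and encoding are illustrative, not the paper's):

```python
# Viterbi-style variant: table[k][A] = P_m(A => w[k:]), the highest probability
# over single derivations, obtained by swapping the sums of eq. (3) for maxima.

def max_derivation_prob(rules, start_rules, w):
    n = len(w)
    table = [dict() for _ in range(n + 1)]
    for k in range(n, -1, -1):
        for A, prods in rules.items():
            best = 0.0
            for terms, G, pr in prods:
                m = len(terms)
                if w[k:k + m] != terms:
                    continue
                if G is None:
                    if k + m == n:                      # A -> w[k:] exactly
                        best = max(best, pr)
                elif k + m <= n:                        # A -> terms G
                    best = max(best, pr * table[k + m].get(G, 0.0))
            table[k][A] = best
    # step 3: maximize over the start-symbol rules
    return max((pr * table[0].get(A, 0.0) for A, pr in start_rules.items()),
               default=0.0)

rules = {"F1": [("ab", "F2", 0.5), ("a", None, 0.5)],
         "F2": [("b", "F1", 0.3), ("ba", None, 0.7)]}
start = {"F1": 0.6, "F2": 0.4}
print(max_derivation_prob(rules, start, "abba"))
```

For "abba" the best single derivation takes F2 → ba (0.7) rather than summing both F2 derivations, giving 0.6 · 0.5 · 0.7 = 0.21 versus the total probability 0.255 of the previous section; keeping back-pointers alongside `best` would recover the most probable parse itself.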
3.3 Prefix Probability, Pr(A ⇒ w_i…w_{i+n} ⋯)

The probability of all derivation trees having A as root and generating arbitrary length strings with the prefix w_i…w_{i+n}, denoted Pr(A ⇒ w_i…w_{i+n} ⋯), can be expressed as follows:

    Pr(σ ⇒ w_i…w_{i+n} ⋯) = ∑_G Pr(σ → G) Pr(G ⇒ w_i…w_{i+n} ⋯)    (8)

    Pr(A ⇒ w_i…w_{i+n} ⋯), A ≠ σ
      = Pr(A → w_i…w_{i+n}) + ∑_G ∑_{k=1}^{n_min(A,G)} Pr(A → w_i…w_{i+k−1} G) Pr(G ⇒ w_{i+k}…w_{i+n} ⋯)
        + ∑_G Pr(A → w_i…w_{i+n} G) + ∑_G ∑_{k∈N} Pr(A → w_i…w_{i+n} v_1…v_k G)
      = Pr(A → w_i…w_{i+n}) + ∑_G ∑_{k=1}^{n_min(A,G)} Pr(A → w_i…w_{i+k−1} G) Pr(G ⇒ w_{i+k}…w_{i+n} ⋯)
        + ∑_G ∑_{k∈N₀} Pr(A → w_i…w_{i+n} v_1…v_k G)    (9)

(the sums involving v_1…v_k running over the corresponding rules of the grammar).

Fig. 3. Terms of expression (9).

The previous expression suggests the following iterative algorithm:

1. For all non-terminal symbols A ∈ V_N − {σ}, determine
   Pr(A → w_{i+n}) + ∑_{k∈N} Pr(A → w_{i+n} v_1…v_k)
2. For k = 1, …, n and for all non-terminal symbols A ∈ V_N − {σ}, compute

    Pr(A ⇒ w_{i+n−k}…w_{i+n} ⋯)
      = Pr(A → w_{i+n−k}…w_{i+n})
        + ∑_G [ ∑_{j=1}^{k_min(A,G)} Pr(A → w_{i+n−k}…w_{i+n−k+j−1} G) Pr(G ⇒ w_{i+n−k+j}…w_{i+n} ⋯)
        + ∑_{j∈N₀} Pr(A → w_{i+n−k}…w_{i+n} v_1…v_j G) ]    (10)

3. Pr(σ ⇒ w_i…w_{i+n} ⋯) = ∑_G Pr(σ → G) Pr(G ⇒ w_i…w_{i+n} ⋯)    (11)

This algorithm, like the one of section 3.1, corresponds to filling in a parsing matrix similar to the one in figure 2. The similarity with the algorithm in section 3.1 leads to the conclusion that this algorithm has O(|V_N| n) time complexity.

3.4 Island Probability, Pr(A ⇒ ⋯ w_i…w_{i+n} ⋯)

The island probability is Pr(A ⇒ ⋯ w_i…w_{i+n} ⋯), the probability of all derivation trees with root in A that generate arbitrary length sequences containing the subsequence w_i…w_{i+n}. Let

    P_R(A → βG) ≝ ∑_{v∈Σ*} Pr(A → vβG)    (12)

the probability of rewriting A by sequences having βG as suffix; in particular, P_R(A → G) is the probability of rewriting A by sequences with the non-terminal symbol G as suffix. One can write

    Pr(σ ⇒ ⋯ w_i…w_{i+n} ⋯) = ∑_G Pr(σ → G) Pr(G ⇒ ⋯ w_i…w_{i+n} ⋯)    (13)

    Pr(A ⇒ ⋯ w_i…w_{i+n} ⋯), A ≠ σ
      = ∑_G P_R(A → G) Pr(G ⇒ ⋯ w_i…w_{i+n} ⋯)
        + ∑_G ∑_{j=1}^{n_min(A,G)} P_R(A → w_i…w_{i+j−1} G) Pr(G ⇒ w_{i+j}…w_{i+n} ⋯)
        + ∑_G P_R(A → w_i…w_{i+n} G) Pr(G ⇒ ⋯)  [Pr(G ⇒ ⋯) = 1]
        + ∑_G ∑_{j,k} Pr(A → v_1…v_k w_i…w_{i+n} z_1…z_j G) Pr(G ⇒ ⋯)  [Pr(G ⇒ ⋯) = 1]
        + ∑_{j,k} Pr(A → v_1…v_k w_i…w_{i+n} z_1…z_j)    (14)

Fig. 4. Terms of expression (14).

For sufficiently long strings (n > max_G { C_T(α) : G → α }) the last three terms do not exist; they are therefore ignored from now on:

    Pr(A ⇒ ⋯ w_i…w_{i+n} ⋯), A ≠ σ
      = ∑_G P_R(A → G) Pr(G ⇒ ⋯ w_i…w_{i+n} ⋯)
        + ∑_G ∑_{j=1}^{n_min(A,G)} P_R(A → w_i…w_{i+j−1} G) Pr(G ⇒ w_{i+j}…w_{i+n} ⋯)    (15)

Recursively applying the above expression and after some manipulation (details can be found in [6]) one obtains:

    Pr(A ⇒ ⋯ w_i…w_{i+n} ⋯), A ≠ σ
      = ∑_{A'} Q_R(A ⇒ A') ∑_G ∑_{j∈N} P_R(A' → w_i…w_{i+j−1} G) Pr(G ⇒ w_{i+j}…w_{i+n} ⋯)
        + ∑_G ∑_{j∈N} P_R(A → w_i…w_{i+j−1} G) Pr(G ⇒ w_{i+j}…w_{i+n} ⋯)    (16)

with

    Q_R(A ⇒ G) = Pr(A ⇒ ⋯G)
      = P_R(A → G) + ∑_{A₁} P_R(A → A₁) P_R(A₁ → G)
        + ∑_{A₁,A₂} P_R(A → A₁) P_R(A₁ → A₂) P_R(A₂ → G) + …    (17)

Q_R(A ⇒ G) obeys the equation

    Q_R(A ⇒ G) = ∑_{A'} P_R(A → A') Q_R(A' ⇒ G) + P_R(A → G)    (18)
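The recurrence for Q_R is a linear system, Q_R = P_R Q_R + P_R, which is exactly what the closed form Q_R = P_R [I − P_R]⁻¹ solves. A minimal numpy sketch with toy (hypothetical) values of P_R, shown only to illustrate the off-line computation:

```python
import numpy as np

# P_R[i, j]: total probability of rules rewriting non-terminal i into a
# string whose last symbol is non-terminal j (toy values for illustration).
P_R = np.array([[0.0, 0.5],
                [0.3, 0.0]])

# Closed form of eq. (18): Q_R = P_R (I - P_R)^{-1}.
I = np.eye(P_R.shape[0])
Q_R = P_R @ np.linalg.inv(I - P_R)

# Numerically preferable equivalent: solve X (I - P_R) = P_R,
# i.e. (I - P_R)^T X^T = P_R^T, instead of forming the inverse.
Q_R_solve = np.linalg.solve((I - P_R).T, P_R.T).T

assert np.allclose(Q_R, Q_R_solve)
print(Q_R)
```

Since Q_R sums the series P_R + P_R² + P_R³ + …, the inverse exists whenever the spectral radius of P_R is below 1, which holds for a consistent stochastic grammar; this step is done once off-line, independently of any input string.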
Defining the matrices

    P_R[A, G] = P_R(A → G)    (19)
    Q_R[A, G] = Q_R(A ⇒ G)    (20)

Q_R is given by [6]:

    Q_R = P_R [I − P_R]⁻¹    (21)

The algorithm can thus be described as:

1. Off-line computation of Q_R = P_R [I − P_R]⁻¹.
2. On-line computation of Pr(G ⇒ w_n ⋯), Pr(G ⇒ w_{n−1} w_n ⋯), …, Pr(G ⇒ w_2…w_n ⋯) for all non-terminal symbols G ∈ V_N − {σ}, using the algorithm in section 3.3.
3. For all non-terminal symbols A ∈ V_N − {σ} compute
   Pr(A ⇒ ⋯ w_i…w_{i+n} ⋯) = ∑_{A'} Q_R(A ⇒ A') ∑_G ∑_{j∈N} P_R(A' → w_i…w_{i+j−1} G) Pr(G ⇒ w_{i+j}…w_{i+n} ⋯) + ∑_G ∑_{j∈N} P_R(A → w_i…w_{i+j−1} G) Pr(G ⇒ w_{i+j}…w_{i+n} ⋯)
4. Pr(σ ⇒ ⋯ w_i…w_{i+n} ⋯) = ∑_G Pr(σ → G) Pr(G ⇒ ⋯ w_i…w_{i+n} ⋯)

The required on-line computations have the following time complexity:

    O( (|V_N| − 1)(n − 1) [step 2] + (|V_N| − 1)|V_N| [step 3] + |V_N| [step 4] ) = max( O(|V_N| n), O(|V_N|²) )

3.5 Suffix Probability, Pr(A ⇒ ⋯ w_i…w_{i+n})

Let Pr(A ⇒ ⋯ w_i…w_{i+n}) be the probability of all derivation trees with root in A generating arbitrary length strings having w_i…w_{i+n} as a suffix. This probability can be derived as follows:

    Pr(σ ⇒ ⋯ w_i…w_{i+n}) = ∑_G Pr(σ → G) Pr(G ⇒ ⋯ w_i…w_{i+n})    (22)
    Pr(A ⇒ ⋯ w_i…w_{i+n}), A ≠ σ
      = ∑_G P_R(A → G) Pr(G ⇒ ⋯ w_i…w_{i+n})
        + ∑_G ∑_{j=1}^{n_min(A,G)} P_R(A → w_i…w_{i+j−1} G) Pr(G ⇒ w_{i+j}…w_{i+n})
        + P_R(A → w_i…w_{i+n})    (23)

Fig. 5. Terms of expression (23).

For sufficiently long strings (n > max_G { C_T(α) : G → α }) the third term does not exist, so it is ignored henceforth. Recursively applying the resulting expression to the second part of the first term one obtains:

    Pr(A ⇒ ⋯ w_i…w_{i+n}), A ≠ σ
      = ∑_{A₁} ∑_{j∈N} P_R(A → w_i…w_{i+j−1} A₁) Pr(A₁ ⇒ w_{i+j}…w_{i+n})
        + ∑_{A₁,A₂} ∑_{j∈N} P_R(A → A₁) P_R(A₁ → w_i…w_{i+j−1} A₂) Pr(A₂ ⇒ w_{i+j}…w_{i+n})
        + … + ∑_{A₁,…,A_k} ∑_{j∈N} P_R(A → A₁) P_R(A₁ → A₂) … P_R(A_{k−1} → w_i…w_{i+j−1} A_k) Pr(A_k ⇒ w_{i+j}…w_{i+n}) + …    (24)

      = ∑_{A'} Q_R(A ⇒ A') ∑_G ∑_{j∈N} P_R(A' → w_i…w_{i+j−1} G) Pr(G ⇒ w_{i+j}…w_{i+n})
        + ∑_G ∑_{j∈N} P_R(A → w_i…w_{i+j−1} G) Pr(G ⇒ w_{i+j}…w_{i+n})    (25)

The algorithm is then:

1. Off-line computation of Q_R = P_R [I − P_R]⁻¹.
2. On-line computation of Pr(G ⇒ w_n), Pr(G ⇒ w_{n−1} w_n), …, Pr(G ⇒ w_2…w_n) for all non-terminal symbols G ∈ V_N − {σ}, using the algorithm in section 3.1.
3. For all non-terminal symbols A ∈ V_N − {σ} compute
   Pr(A ⇒ ⋯ w_i…w_{i+n}) = ∑_{A'} Q_R(A ⇒ A') ∑_G ∑_{j∈N} P_R(A' → w_i…w_{i+j−1} G) Pr(G ⇒ w_{i+j}…w_{i+n}) + ∑_G ∑_{j∈N} P_R(A → w_i…w_{i+j−1} G) Pr(G ⇒ w_{i+j}…w_{i+n})
4. Pr(σ ⇒ ⋯ w_i…w_{i+n}) = ∑_G Pr(σ → G) Pr(G ⇒ ⋯ w_i…w_{i+n})

The on-line computations have the time complexity:

    O( (|V_N| − 1)(n − 1) [step 2] + (|V_N| − 1) [step 3] + |V_N| [step 4] ) = O(|V_N| n)

4 Conclusions

This paper described several algorithms for the computation of substring probabilities according to grammars written in the form

    σ → F_i    F_i → α    F_i → α F_j    α ∈ Σ⁺, F_i, F_j ∈ V_N

Table 1 summarizes the probabilities considered here and the order of complexity of the associated algorithms.

    Algorithm                 | Expression                    | Time complexity
    Fixed length strings      | Pr(A ⇒ w_1…w_n)               | O(|V_N| n)
                              | P_m(A ⇒ w_1…w_n)              | O(|V_N| n)
    Arbitrary length strings  | Pr(A ⇒ w_1…w_n ⋯)  (prefix)   | O(|V_N| n)
                              | Pr(A ⇒ ⋯ w_1…w_n)  (suffix)   | O(|V_N| n)
                              | Pr(A ⇒ ⋯ w_1…w_n ⋯) (island)  | max( O(|V_N| n), O(|V_N|²) )

Table 1. Summary of proposed algorithms for the computation of substring probabilities.
More general algorithms for the computation of substring probabilities according to stochastic context-free grammars, written in Chomsky Normal Form, can be found in [1-3]. However, the latter have O(n³) time complexity [2]. The algorithms proposed herein, exhibiting time complexity linear in the string length, represent a computationally appealing alternative whenever the application at hand can adequately be modeled by the types of grammars described above. Examples of application of the algorithms described in this paper can be found in [5, 6, 8, 9].

References

1. F. Jelinek, J. D. Lafferty and R. L. Mercer. Basic Methods of Probabilistic Context Free Grammars. In Speech Recognition and Understanding. Recent Advances, pages 345-360. Springer-Verlag, 1992.
2. A. Corazza, R. De Mori, R. Gretter, and G. Satta. Computation of Probabilities for an Island-Driven Parser. IEEE Trans. Pattern Anal. Machine Intell., vol. 13, no. 9, pages 936-949, 1991.
3. A. Corazza, R. De Mori, and G. Satta. Computation of Upper-Bounds for Stochastic Context-Free Languages. In Proceedings AAAI-92, pages 344-349, 1992.
4. A. Corazza, R. De Mori, R. Gretter, and G. Satta. Some Recent Results on Stochastic Language Modelling. In Advances in Structural and Syntactic Pattern Recognition, pages 163-183. World Scientific.
5. A. L. N. Fred, A. C. Rosa, and J. M. N. Leitão. Predicting REM in Sleep EEG Using a Structural Approach. In E. S. Gelsema and L. N. Kanal, editors, Pattern Recognition in Practice IV, pages 107-117. Elsevier Science Publishers.
6. A. L. N. Fred. Structural Pattern Recognition: Applications in Automatic Sleep Analysis. PhD thesis, Technical University of Lisbon.
7. K. S. Fu and T. L. Booth. Grammatical Inference: Introduction and Survey, Parts I and II. IEEE Trans. Pattern Anal. Machine Intell., PAMI-8:343-359, May 1986.
8. A. L. N. Fred and T. Paiva. Sleep Dynamics and Sleep Disorders: a Syntactic Approach to Hypnogram Classification. In 10th Nordic-Baltic Conference on Biomedical Engineering and 1st International Conference on Bioelectromagnetism, pages 395-396, Tampere, Finland, June.
9. A. L. N. Fred, J. S. Marques, and P. M. Jorge. Hidden Markov Models vs Syntactic Modeling in Object Recognition. In Proc. Intl. Conference on Image Processing, ICIP'97, 1997.
In CADE-1 Workshop on Visual Reasoning, New Brunswick, NJ, July 1996. Diagram-based Formalisms for the Verication of Reactive Systems Anca Browne, Luca de Alfaro, Zohar Manna, Henny B. Sipma and Tomas
More informationLearning from Sequential and Time-Series Data
Learning from Sequential and Time-Series Data Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts Sridhar Mahadevan: CMPSCI 689 p. 1/? Sequential and Time-Series Data Many real-world applications
More informationProbabilistic Context Free Grammars. Many slides from Michael Collins
Probabilistic Context Free Grammars Many slides from Michael Collins Overview I Probabilistic Context-Free Grammars (PCFGs) I The CKY Algorithm for parsing with PCFGs A Probabilistic Context-Free Grammar
More informationAutomata Theory CS F-08 Context-Free Grammars
Automata Theory CS411-2015F-08 Context-Free Grammars David Galles Department of Computer Science University of San Francisco 08-0: Context-Free Grammars Set of Terminals (Σ) Set of Non-Terminals Set of
More informationA note on stochastic context-free grammars, termination and the EM-algorithm
A note on stochastic context-free grammars, termination and the EM-algorithm Niels Richard Hansen Department of Applied Mathematics and Statistics, University of Copenhagen, Universitetsparken 5, 2100
More informationContext-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing
Context-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing Natural Language Processing CS 4120/6120 Spring 2017 Northeastern University David Smith with some slides from Jason Eisner & Andrew
More informationProbabilistic Context-Free Grammars. Michael Collins, Columbia University
Probabilistic Context-Free Grammars Michael Collins, Columbia University Overview Probabilistic Context-Free Grammars (PCFGs) The CKY Algorithm for parsing with PCFGs A Probabilistic Context-Free Grammar
More informationSome hierarchies for the communication complexity measures of cooperating grammar systems
Some hierarchies for the communication complexity measures of cooperating grammar systems Juraj Hromkovic 1, Jarkko Kari 2, Lila Kari 2 June 14, 2010 Abstract We investigate here the descriptional and
More informationHorst Bunke IAM
A Fast Algorithm for Finding the Nearest Neighbor of a Word in a Dictionary Horst Bunke IAM-93-25 November 993 A Fast Algorithm for Finding the Nearest Neighbor of a Word in a Dictionary Horst Bunke Abstract
More informationAspects of Tree-Based Statistical Machine Translation
Aspects of Tree-Based Statistical Machine Translation Marcello Federico Human Language Technology FBK 2014 Outline Tree-based translation models: Synchronous context free grammars Hierarchical phrase-based
More informationA Polynomial Time Algorithm for Parsing with the Bounded Order Lambek Calculus
A Polynomial Time Algorithm for Parsing with the Bounded Order Lambek Calculus Timothy A. D. Fowler Department of Computer Science University of Toronto 10 King s College Rd., Toronto, ON, M5S 3G4, Canada
More informationAdministrivia. Test I during class on 10 March. Bottom-Up Parsing. Lecture An Introductory Example
Administrivia Test I during class on 10 March. Bottom-Up Parsing Lecture 11-12 From slides by G. Necula & R. Bodik) 2/20/08 Prof. Hilfinger CS14 Lecture 11 1 2/20/08 Prof. Hilfinger CS14 Lecture 11 2 Bottom-Up
More informationEven More on Dynamic Programming
Algorithms & Models of Computation CS/ECE 374, Fall 2017 Even More on Dynamic Programming Lecture 15 Thursday, October 19, 2017 Sariel Har-Peled (UIUC) CS374 1 Fall 2017 1 / 26 Part I Longest Common Subsequence
More informationLecture VII Part 2: Syntactic Analysis Bottom-up Parsing: LR Parsing. Prof. Bodik CS Berkley University 1
Lecture VII Part 2: Syntactic Analysis Bottom-up Parsing: LR Parsing. Prof. Bodik CS 164 -- Berkley University 1 Bottom-Up Parsing Bottom-up parsing is more general than topdown parsing And just as efficient
More informationUndecidability of ground reducibility. for word rewriting systems with variables. Gregory KUCHEROV andmichael RUSINOWITCH
Undecidability of ground reducibility for word rewriting systems with variables Gregory KUCHEROV andmichael RUSINOWITCH Key words: Theory of Computation Formal Languages Term Rewriting Systems Pattern
More informationThe Power of Amnesia: Learning Probabilistic. Automata with Variable Memory Length
The Power of Amnesia: Learning Probabilistic Automata with Variable Memory Length DANA RON YORAM SINGER NAFTALI TISHBY Institute of Computer Science, Hebrew University, Jerusalem 9904, Israel danar@cs.huji.ac.il
More informationGenerating All Permutations by Context-Free Grammars in Chomsky Normal Form
Generating All Permutations by Context-Free Grammars in Chomsky Normal Form Peter R.J. Asveld Department of Computer Science, Twente University of Technology P.O. Box 217, 7500 AE Enschede, the Netherlands
More informationHow to Pop a Deep PDA Matters
How to Pop a Deep PDA Matters Peter Leupold Department of Mathematics, Faculty of Science Kyoto Sangyo University Kyoto 603-8555, Japan email:leupold@cc.kyoto-su.ac.jp Abstract Deep PDA are push-down automata
More informationA characterization of consistency of model weights given partial information in normal linear models
Statistics & Probability Letters ( ) A characterization of consistency of model weights given partial information in normal linear models Hubert Wong a;, Bertrand Clare b;1 a Department of Health Care
More informationLinear Algebra (part 1) : Matrices and Systems of Linear Equations (by Evan Dummit, 2016, v. 2.02)
Linear Algebra (part ) : Matrices and Systems of Linear Equations (by Evan Dummit, 206, v 202) Contents 2 Matrices and Systems of Linear Equations 2 Systems of Linear Equations 2 Elimination, Matrix Formulation
More informationOn the Myhill-Nerode Theorem for Trees. Dexter Kozen y. Cornell University
On the Myhill-Nerode Theorem for Trees Dexter Kozen y Cornell University kozen@cs.cornell.edu The Myhill-Nerode Theorem as stated in [6] says that for a set R of strings over a nite alphabet, the following
More informationNonterminal complexity of programmed grammars
Theoretical Computer Science 296 (2003) 225 251 www.elsevier.com/locate/tcs Nonterminal complexity of programmed grammars Henning Fernau School of Electrical Engineering and Computer Science, University
More informationNote that neither ; nor are syntactic constituents of content models. It is not hard to see that the languages denoted by content models are exactly t
Unambiguity of Extended Regular Expressions in SGML Document Grammars Anne Bruggemann-Klein Abstract In the Standard Generalized Markup Language (SGML), document types are dened by context-free grammars
More informationLECTURER: BURCU CAN Spring
LECTURER: BURCU CAN 2017-2018 Spring Regular Language Hidden Markov Model (HMM) Context Free Language Context Sensitive Language Probabilistic Context Free Grammar (PCFG) Unrestricted Language PCFGs can
More informationCKY & Earley Parsing. Ling 571 Deep Processing Techniques for NLP January 13, 2016
CKY & Earley Parsing Ling 571 Deep Processing Techniques for NLP January 13, 2016 No Class Monday: Martin Luther King Jr. Day CKY Parsing: Finish the parse Recognizer à Parser Roadmap Earley parsing Motivation:
More informationIn: Proc. BENELEARN-98, 8th Belgian-Dutch Conference on Machine Learning, pp 9-46, 998 Linear Quadratic Regulation using Reinforcement Learning Stephan ten Hagen? and Ben Krose Department of Mathematics,
More informationParsing. Probabilistic CFG (PCFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 22
Parsing Probabilistic CFG (PCFG) Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Winter 2017/18 1 / 22 Table of contents 1 Introduction 2 PCFG 3 Inside and outside probability 4 Parsing Jurafsky
More informationChapter 14 (Partially) Unsupervised Parsing
Chapter 14 (Partially) Unsupervised Parsing The linguistically-motivated tree transformations we discussed previously are very effective, but when we move to a new language, we may have to come up with
More informationNon-elementary Lower Bound for Propositional Duration. Calculus. A. Rabinovich. Department of Computer Science. Tel Aviv University
Non-elementary Lower Bound for Propositional Duration Calculus A. Rabinovich Department of Computer Science Tel Aviv University Tel Aviv 69978, Israel 1 Introduction The Duration Calculus (DC) [5] is a
More information1 Introduction This work follows a paper by P. Shields [1] concerned with a problem of a relation between the entropy rate of a nite-valued stationary
Prexes and the Entropy Rate for Long-Range Sources Ioannis Kontoyiannis Information Systems Laboratory, Electrical Engineering, Stanford University. Yurii M. Suhov Statistical Laboratory, Pure Math. &
More informationA fast algorithm to generate necklaces with xed content
Theoretical Computer Science 301 (003) 477 489 www.elsevier.com/locate/tcs Note A fast algorithm to generate necklaces with xed content Joe Sawada 1 Department of Computer Science, University of Toronto,
More informationFigure : Learning the dynamics of juggling. Three motion classes, emerging from dynamical learning, turn out to correspond accurately to ballistic mot
Learning multi-class dynamics A. Blake, B. North and M. Isard Department of Engineering Science, University of Oxford, Oxford OX 3PJ, UK. Web: http://www.robots.ox.ac.uk/vdg/ Abstract Standard techniques
More informationA DOP Model for LFG. Rens Bod and Ronald Kaplan. Kathrin Spreyer Data-Oriented Parsing, 14 June 2005
A DOP Model for LFG Rens Bod and Ronald Kaplan Kathrin Spreyer Data-Oriented Parsing, 14 June 2005 Lexical-Functional Grammar (LFG) Levels of linguistic knowledge represented formally differently (non-monostratal):
More informationCourse 495: Advanced Statistical Machine Learning/Pattern Recognition
Course 495: Advanced Statistical Machine Learning/Pattern Recognition Lecturer: Stefanos Zafeiriou Goal (Lectures): To present discrete and continuous valued probabilistic linear dynamical systems (HMMs
More informationComputability and Complexity
Computability and Complexity Decidability, Undecidability and Reducibility; Codes, Algorithms and Languages CAS 705 Ryszard Janicki Department of Computing and Software McMaster University Hamilton, Ontario,
More informationLecture Notes on Inductive Definitions
Lecture Notes on Inductive Definitions 15-312: Foundations of Programming Languages Frank Pfenning Lecture 2 September 2, 2004 These supplementary notes review the notion of an inductive definition and
More information