Congruence based approaches

Size: px
Start display at page:

Download "Congruence based approaches"

Transcription

1 Congruence based approaches Learnable representations for languages Alexander Clark Department of Computer Science Royal Holloway, University of London August 00 ESSLLI, 00

2 Outline Distributional learning Congruence classes Syntactic monoid Canonical CFG Algorithms Substitutable languages MAT learner NTS languages Conclusion

3 Context free grammar Tuple Σ, V, P, S V is a set of non-terminals S V is a start symbol P is a set of productions of the form V (V Σ), which we write as N α

4 Context free grammar Tuple Σ, V, P, S V is a set of non-terminals S V is a start symbol P is a set of productions of the form V (V Σ), which we write as N α Derivation relation: βnγ βαγ when N α P L(G, N){w N w} L(G) = {w S w}

5 Almost Chomsky normal form All rule are of the form N PQ N a N λ Multiple start symbols: G = Σ, V, P, I I V is a set of initial symbols. L(G) = S I L(G, S)

6 Distributional learning Several reasons to take distributional learning seriously: Cognitively plausible (Saffran et al,, Mintz, 00) It works in practice: large scale lexical induction (Curran, J. 00) Linguists use it as a constituent structure test (Carnie, A, 00) Historically, PSGs were intended to be the output from distributional learning algorithms: Chomsky (/00) The concept of "phrase structure grammar" was explicitly designed to express the richest system that could reasonable be expected to result from the application of Harris-type procedures to a corpus.

7 Distributional Learning Zellig Harris (4, ) Here as throughout these procedures X and Y are substitutable if for every utterance which includes X we can find (or gain native acceptance for) an utterance which is identical except for having Y in the place of X

8 Empirical work on Distributional Learning Real corpora Sample is not just of grammatical sentences Also semantically well-formed Also true in some non-technical sense Empirical distribution is very complex Distributional similarity in real corpora often reflects semantic relatedness.

9 Various notions of context Local syntactic context Immediately preceding and following word If the candidate has an outstanding examination result Word has context (candidate, an) Wide bag-of-words context Skip stop words Set of words occurring in the same sentence/discourse. Probabilistic models Vector of counts of frequent words Schütze

10 Distribution Full context Context (or environment) A context is just a pair of strings (l, r) Σ Σ. (l, r) u = lur f = (l, r). Special context (λ, λ) Given a language L Σ. Distribution of a string C L (u) = {(l, r) lur L} = {f f u L} Distributional Learning models/exploits the distribution of strings;

11 Simple CFL L = {a n b n n 0} L = {λ, ab, aabb... } Distribution of aab (λ, b) aab = aabb L Example (a, bb) aab = aaabbb L (λ, abb) aab = aababb L

12 Simple CFL L = {a n b n n 0} L = {λ, ab, aabb... } Distribution of aab (λ, b) aab = aabb L Example (a, bb) aab = aaabbb L (λ, abb) aab = aababb L C L (aab) = {(λ, b), (a, bb),... (a i, b i+ )... } C L (aaabb) = C L (aab) C L (a) = {(λ, b), (a, bb), (λ, abb)... }

13 Outline Distributional learning Congruence classes Syntactic monoid Canonical CFG Algorithms Substitutable languages MAT learner NTS languages Conclusion

14 Congruence classes and the syntactic monoid Congruence classes u v iff C L (u) = C L (v) Write [u] for class of u Two strings are congruent if they are perfectly substitutable in every context. Words: Tuesday/Wednesday, cat/dog, man/student

15 Congruence classes of a regular language Example L = (ab) [λ] = {λ} [a] = {a, aba, ababa,... } [b] = {b, bab, babab,... } [ab] = {ab, abab,... } [ba] = {ba, baba,... } [bb] every other string with C L (u) =

16 Regular language Theorem A regular languages has finitely many congruence classes. Proof Later. Number of congruence classes may be exponential in size of minimal DFA: Q Q.

17 Example blowup S0 a b b S a b S a a b S a b S4 a b S

18 Outline Distributional learning Congruence classes Syntactic monoid Canonical CFG Algorithms Substitutable languages MAT learner NTS languages Conclusion

19 Monoid Simple algebraic structure Definition Associative operation with an element u = u = u Examples Strings under concatenation Numbers under addition Matrices under multiplication

20 Syntactic monoid Concatenation If u u and v v then uv u v [u][v] [uv] Example L = (ab) [b][ab] = {b, bab,... }{ab, abab... } = {bab, babab... } {b, bab... } = [b] Syntactic monoid Σ / L [u] [v] = [uv] Associative and [λ] is identity

21 Zero Suppose Sub(L) is not equal to Σ There are strings that do not occur as a substring of any string in L. Example L = {(ab) } bb Sub(L) C L (bb) = Fact: if u Sub(L) then uv Sub(L) So we may have a congruence class 0; 0 X = 0 = X 0

22 Example L = {(ab) } [λ] [bb] [a] [b] [ab] [ba] [λ] [λ] [bb] [a] [b] [ab] [ba] [bb] [bb] [bb] [bb] [bb] [bb] [bb] [a] [a] [bb] [bb] [ab] [bb] [a] [b] [b] [bb] [ba] [bb] [b] [bb] [ab] [ab] [bb] [a] [bb] [ab] [bb] [ba] [ba] [bb] [bb] [b] [bb] [ba]

23 Exercise Language {w w a = w b } Language of equal numbers of a s and b s Question What is the syntactic monoid of this? What is it isomorphic too?

24 Outline Distributional learning Congruence classes Syntactic monoid Canonical CFG Algorithms Substitutable languages MAT learner NTS languages Conclusion

25 Context free grammar Suppose we have a grammar with non-terminals N, P, Q We have a rule N PQ This means that Y (N) Y (P)Y (Q).

26 Context free grammar Suppose we have a grammar with non-terminals N, P, Q We have a rule N PQ This means that Y (N) Y (P)Y (Q). Backwards Given a collection of sets of strings X, Y, Z Suppose X YZ Then we add a rule X YZ.

27 Congruence classes Partition of the strings Congruence classes have nice properties! [u][v] [uv] [uv] [u][v] [u] [v] [u][v] L = {a n b n n 0} [a] = {a} [abb] = {abb, aabbb,... } [a][abb] = {aabb, aaabbb... } [aabb] = [ab] So we have a rule [ab] [a][aab]

28 Constructing CFG from congruence classes One non-terminal per congruence class [uv] [u][v] [a] a [λ] λ I = {[u] [u] L} Multiple start symbols.

29 Example L = {a n b n n 0} Grammar I = {[ab], [λ]}

30 Example L = {a n b n n 0} Grammar I = {[ab], [λ]} [a] a, [b] b, [λ] λ

31 Example L = {a n b n n 0} Grammar I = {[ab], [λ]} [a] a, [b] b, [λ] λ [ab] [aab][b], [ab] [a][b], [ab] [a][abb] [aab] [a][ab], [abb] [ab][b]

32 Example L = {a n b n n 0} Grammar I = {[ab], [λ]} [a] a, [b] b, [λ] λ [ab] [aab][b], [ab] [a][b], [ab] [a][abb] [aab] [a][ab], [abb] [ab][b] Plus [ba] [b][ba]... Plus [a] [λ][a]...

33 Problems If we can figure out whether u L v then we can write down a grammar. Problem A In general we will have an infinite set of congruence classes Pick some finite subset of them This does not give us every CFL {a n b m n < m} Problem B Hard to tell whether u L v We need some way of testing this.

34 Outline Distributional learning Congruence classes Syntactic monoid Canonical CFG Algorithms Substitutable languages MAT learner NTS languages Conclusion

35 Three algorithms All based on the same representational idea:. Substitutable languages from positive data. Congruential languages with queries. NTS languages with stochastic positive examples

36 Outline Distributional learning Congruence classes Syntactic monoid Canonical CFG Algorithms Substitutable languages MAT learner NTS languages Conclusion

37 Two ideas Chomsky review of Greenberg, let us say that two units A and B are substitutable if there are expressions X and Y such that XAY and XBY are sentences of L.; substitutable if whenever XAY is a sentence of L then so is XBY and whenever XBY is a sentence of L so is XAY (i.e. A and B are completely mutually substitutable). These are the simplest and most basic notions. Problem: we need substitutability but what we observe is substitutability

38 Old concept John Myhill, 0 commenting on Bar-Hillel I shall call a system regular if the following holds for all expressions µ, ν and all wffs φ, ψ each of which contains an occurrence of ν: If the result of writing µ for some occurrence of ν in φ is a wff, so is the result of writing µ for any occurrence of ν in ψ. Nearly all formal systems so far constructed are regular; ordinary word-languages are conspicuously not so. Clark and Eyraud, 00 A language is substitutable if lur, lvr, l ur L means that l vr L.

39 substitutable and reversible Clark and Eyraud, 00 A language is substitutable if lur, lvr, l ur L means that l vr L. Angluin, A language is reversible if ur, vr, ur L means that vr L.

40 A Bad Intuition One context in common in enough The cat died The dog died So cat and dog are congruent

41 A Bad Intuition One context in common in enough The cat died The dog died So cat and dog are congruent He is an Englishman He is thin So thin and an Englishman are congruent.

42 Result Clark and Eyraud, 00/00 Polynomial result The class of substitutable context free languages is polynomially identifiable in the limit from positive data only. Polynomial characteristic set Polynomial update time Why the delay?

43 A Simple Algorithm Non-technical description Given a sample of strings W = {w,..., w n }. Define a graph G = N, E N is the set of all non-empty substrings (factors) of W. E = {(u, v) (l, r), lur W lvr W }. Define a grammar in Chomsky normal form The set of non-terminals is the set of components of the graph G Have productions for: [a] a Add rules for non terminals: [w] [u][v] iff [w] = [uv].

44 Simple linguistically motivated example the man who is hungry died. the man ordered dinner. the man died. the man is hungry. is the man hungry? the man is ordering dinner. is the man who is hungry ordering dinner? is the man who hungry is ordering dinner?

45 Substitution graph Auxiliary fronting example

46 Derivations [is the man hungry?] [is the man hungry] [?] [is the man] [hungry] [?] [is] [the man] [hungry] [?] [is] [the man] [who is hungry] [hungry] [?] [is] [the man] [who is hungry] [ordering dinner] [?]. Note that this will not work for all data since English is not substitutable (Berwick, Coen and Niyogi, p.c.)

47 Substitution graph Learnable example p p p p e j d a m g p e j d a m g n p e j d a m g n n p e j d a m g n n o m p e j d a m g n n o m h p e j d a m g n n o m h e p e j d a m g n n o m h e r p e j d a m g o 0 p e j d a m g o b 4 p e j d a m g o b j 4 p e j d a m g o b j r p e j d a m g o b j r t p e j d a m g o b j p e j d a m g o b j o p n 4 p e g b t s j r i s p s p s h p m t p r l q o p d 0 p r l q o p d i p r n c c m p r n c c m t p c p p c n j a i q o j d p n p j a p o p p o 4 p o m c 4 p k 4 e e l b p e l b p m e l b p b 0 e s 0 e s n i s h b t f e n p p q q e j d a m g n n o m h e r e j d a m g o b j r t e j d a m g o b j o p n 4 e g b t s j r i s e o 44 e o s e o c 4 e o g s i j t 4 t l p t b 4 t l p t b c 0 t s 0 t s p e j d a m g 4 t s p e j d a m g o 4 t s p e j d a m g o b 4 t s p e j d a m g o b j 4 t s p e j d a m g o b j r t s p e j d a m g o b j c t s q l j a d i d o 0 t s q l j a d i d o b 0 t s q l j a d i d o b j 0 0 t s s h l o 0 t s s h l o b 0 t s s h l o b j 0 t s d 0 t s d j o 0 0 t s d j o b 0 0 t s d j o b j 0 0 t s d j o b j r t s d j o b j c t s d k o b s j j s k o t s d k o b s j j s k o b t s d k o b s j j s k o b j t s d k o b s j j s k o b j r t s o n m e d k d c o 0 0 t s o n m e d k d c o b 0 0 t s o n m e d k d c o b j 0 0 t s f i 0 0 t s f i o t s f i o b t s f i o b j t s f i o b j r t s a s j e i o t s a s j e i o b t s a s j e i o b j t h 4 t r b n r f r b t n n l t j k q i e 0 t f k i i e l b p i t b g i m d l a 4 i t b g i m d l a c i i p j a p o p i i n r o i d d h h i d o g h q i m 00 i n 4 i n n i n n o m h e r i j i o t m p e t 4 i a n m i a n m d 4 i a n m d d i a n m f l 4 l i l i q n 0 l i q n j 0 l i q n j l l l n g t r l n g t r t l n g t r t m l n g t r t b l g d s q i 0 l g d s q i o 0 l g d s q i o f 0 l g d s q i o f i 0 l g d s q i o f i n l b p m q q p q p p q p k q e e i s j q i k q r n k q l j a d q l j a d i q l j a d i d q l j a d i d n q l j a d i d n n q l j a d i d n n o m q l j a d i d n n o m h q l j a d i d n n o m q l j a d i d n n o m h e r q l j a d i d o 0 q l j a d i d o b 0 q l j a d i d o b j 0 q s q m h n q r k c q b n t l t q b n t l t p 0 q b n t l t k q o r t b s m q o r t b s m i q f s 0 q f s c q a f l a c t 4 s p e j d a m g 4 s p e j d a m g o 4 s p e j d a m g o b j 4 s p e j d a m g o b j r s p e j d a m g o b j c s t j k q i e 0 s l s q 4 s q l j a d i d 0 s q l j a d i d o 0 s q l j a d i d o b j 0 s s h l 0 s s h l o 0 s s h l o b j 0 s d t m h a s d j 0 0 s d j o 0 0 s d j o b j 0 0 s d j o b j r s d j o b j c s d k o b s j j s k s d k o b s j j s k o s d k o b s j j s k o b j s d k o b s j j s k o b j r s h 4 s h p m t s h p m t d s h p m t d d 44 s h p m t f 0 s h l 0 s h l n s h l n n s h l n n o m s h l n n o m h s h l n n o m h e s h l n n o m h e r s h l o 0 s h l o b 0 s h l o b j 0 s m k 4 s n i s h b t f s n f a e s n f a e j s b i e i q l 0 s o s o n m e d k d c 0 0 s o n m e d k d c o 0 0 s o n m e d k d c o b j 0 0 s o g g 0 s f s f i s f i o s f i o b j s f i o b j r s f q s a s a s j e 0 s a s j e i s a s j e i o s a s j e i o b j s a m j c s b h d e i d 4 0 d e l b p d t g d l n g t r t d c n j a i q o j d d j d j n d j n n d j n n o m d j n n o m h d j n n o m h e d j n n o m h e r d j o d j o b 0 0 d j o b j 0 0 d j o b j r d j o b j r t d o g h q 0 d k o b s j j s k 0 4 d k o b s j j s k n d k o b s j j s k n n d k o b s j j s k n n o m d k o b s j j s k n n o m h d k o b s j j s k n n o d k o b s j j s k n n o m h e r d k o b s j j s k o 4 d k o b s j j s k o b d k o b s j j s k o b j d k o b s j j s k o b j r d k o b s j j s k o b j r t d f c h p m t h p m t d h p m t f 0 h t s p e j d a m g 4 h t s p e j d a m g o 4 h t s p e j d a m g o b 4 h t s p e j d a m g o b j 4 h t s p e j d a m g o b j r h t s p e j d a m g o b j c h t s q l j a d i d 0 h t s q l j a d i d o 0 h t s q l j a d i d o b j 0 h t s s h l 0 h t s s h l o 0 h 0 h t s s h l o b j 0 h t s d j 0 0 h t s d j o 0 0 h t s d j o b 0 h t s d j o b j 0 0 h t s d j o b j r h t s d j o b j c h t s d k o b s j j s k h t s d k o b s j j s k o h t s d k o b s j j s k o b j h t s d k o b s j j s k o b j r h t s o n m e d k d c 0 0 h t s o n m e d k d c o 0 0 h t s o n m e d k d c o b j 0 0 h t s f 0 h t s f i h t s f i o h t s f i o b 0 0 h t s f i o b j h t s f i o b j r 0 h t s a s j e i h t s a s j e i o h t s a s j e i o b j h l n n o m h e r h d s c h h t s p e j d a m g o 4 h h t s p e j d a m g o b 4 h h t s p e j d a m g o b j h h t s p e j d a m g o b j r t 4 h h t s p e j d a m g o b j c h h t s p e j d a m g o b j o p n 4 h h t s q l j a d i d o b 0 h h t s s h l o b 0 0 h h t s d j o b 0 0 h h t s d j o b j r t h h t s d j o b j c h h t s d k o b s j j s k o b h h t s d k o b s j j s k o b j r t h h t s o n m e d k d c o b 0 0 h h t s f 0 h h t s f h h t s f i o b h h t s h h t s f i o b j r t h h t s a s j e 0 h h t s a s j e i o b h r a c j o h f h n e t j e p g h n e t j e p g n h j l t m q b h g j q c n c b h g j q c n c b j h f i t g a n p 0 h f i t g a n p p 0 h f i t g a n p p c 0 m 4 m l m h l b a m h g m m m 4 0 m m m c m n l c m n l c j 4 m k 0 m f o 0 m f o a 0 m a i a r 4 4 r p n i q o d p q r p n i q o d p q d r p n i q o d p q d d r p n i q o d p q f 4 0 r t 4 r t l l 4 c p c l i q n j l l c q e e i s c q e e i s j c q e e i s j l l c s d t m h a 0 0 c m c m f o 0 c m f o a c m f o a p 0 4 c m f o a k c r t l l 4 c c q e e i s j l l c c a 4 c n j a i q o j d c n j a i q o j d o 0 c n j a i q o j d f c j c j a 40 c o p n l l c f j c f k d i i l e s e l l n p r l q o p d 0 n e t j e p g n n e t j e p g n i 0 n i s k n i o t m p e t 4 n l 0 4 n l h b d q j e 4 n h r a c j o h f 4 4 n m e d k d c n n o m h e r n r b e b m b o 4 n r o 4 4 n j a i q o j d n j a i q o j d s n j a i q o j d o 0 n j a i q o j d k n f t s p i n f a e j j 4 j e l b p j t e l j i d d h h j l t m q b j l t m q b i j l i q n j j s b i e i q l 4 j d e i d j d a m g o b j r t j d a m g o b j h e j h r q e c m q j m h l b j m h l b a j r p n i q o d p q d j r t j r g h j c q e e i s j j c s d t m h j c s d t m h a j n e o k c i i c 4 0 j n n j n n o j n n o m j n n o m h j n n o m h e j g t j o j o p n j o m c d 0 j o b 4 j o b j 44 j o b j r t j o b j c q e e i s j 4 j f k d i i l e s e j a j a p o p j a l g d s q i o f i 0 j a l g d s q i o f i i 0 j a s j a h n e t j e p g n j a h n e t j e p g n i j a h j l t m q b j a h j l t m q b i j a n p r l q o p d 4 j a n p r l q o p d i 4 j a a e n p p q q 4 4 j a a e n p p q q i 4 4 g l f h h 4 g r f k j 0 g r f k j i 0 g r f k j i d 0 4 g b t s j r i s g o b j r t g o b j c q e e i s j g o b j o p n 4 b c h a f 04 b c h a f d 4 b c h a f d d 40 b c h a f f b c c 4 4 b c c a b n t l t b n t l t p 0 b j i d d h h b j l i q n j b j s b i e i q l 4 b j d e i d b j m h l b a b j r t b j r g h b j r g h p 4 b j r g h k b j c q e e i s j b j c s d t m h a b j g t b j g t p 4 b j g t k b j o p n b j f k d i i l e s e o o p n o p n l l o e o g s i o e o g s i j o i o s o m c o m c d 4 o m c d d 0 4 o m c f o n m e d k d c o n m e d k d c n o n m e d k d c n n o n m e d k d c n n o m o n m e d k d c n n o m h o n m e d k d c n n o m h e o n m e d k d c n n o m h e r o n m e d k d c o 0 0 o n m e d k d c o b 0 0 o n m e d k d c o b j 0 0 o g h q o g h q o 4 o g g 0 o b j i d d h h o b j l i q n j o b j s o b j d e i d o b j m h l b o b j m h l b a o b j c s d t m h o b j c s d t m h a o b j o p n o b j f k d i i l e s e o o j j e l l f s 4 o a k o a k m 0 k t t k t t p 0 4 k t t p c 0 k t f k 0 0 k t f k d 4 k t f k d d 0 4 k l g d s q i o f i 0 k s m k k s o g g 0 k d t g 0 0 k d t g d k d t g d d 4 0 k c p 0 k c n j a i q o j d 0 k n 0 k n t 4 k n t m k n l k n l h b d q j e k o e o g s i j k o g h q k o b s j j s k 0 4 k o b s j j s k n n k o b s j j s k n n o k o b s j j s k n n o m k o b s j j s k n n o m h k o b s j j s k n n o m h e k o b s j j s k o 4 k o b s j j s k o b k o b s j j s k o b j k o b s j j s k o b j r t k f j j b r k k f j j b r k p k f j j b r k k k a k a o 0 k a f f f i f i n 0 f i n n f i n n o m f i n n o m h f i n n o m h e f i n n o m h e r f i o f i o b f i o b j f i o b j r 4 f i o b j r t 4 f i f p 0 f q f c f c i k 4 f c i k a f c i k a p f c i k a k f o l 4 4 f o l p f k d i i l e s e f k d i i l e s e l l a 0 a p a e n p p q a e n p p q q 4 a e n p p q q n a e o g d s 0 4 a i a l g d s q i o f a l g d s q i o f i i 0 a q o r t b s m 0 4 a s a s j e a s j e i a s j e i n a s j e i n n a s j e i n n o m a s j e i n n o m h a s j e i n n o m h e a s j e i n n o m h e r a s j e i o a s j e i o b a s j e i o b j a h n e t j e p g a h j l t m q b i a m j c s b h a m g o b j r t a m g o b j o p n 4 a n p r l q o p d i 4 a n r b e b m b o 4 4 a k a k m a f l a c t 4 a f s a j j f e a f s a j j f e d 4 a f s a j j f e d d 4 a f s a j j f e f 4 4 4

48 Substitution graph Unlearnable languages

49 Substitutable languages Some very basic languages are not substitutable: L = {a, aa} L = {a n b n n > 0} Dyck language The very strict requirement for contexts to be disjoint is unrealistic. If we move to a probabilistic learning approach, we can test for congruence with Ĉ(u) Ĉ(v).

50 Congruence class results Positive data alone lur L and lvr L implies u L v Polynomial result from positive data. (Clark and Eyraud, 00) k-l substitutable languages, (Yoshinaka 00) Stochastic data If data is generated from a PCFG PAC-learn unambiguous NTS languages, (Clark, 00) Membership queries An efficient query-learning result (Clark, 00) Pick a finite set of contexts F Test if C L (u) F = C L (v) F

51 Outline Distributional learning Congruence classes Syntactic monoid Canonical CFG Algorithms Substitutable languages MAT learner NTS languages Conclusion

52 ICGI 00 Use query learning just as with LSTAR the MAT model. Membership queries Equivalence queries counterexamples Efficient learning but not for all CFLs.

53 Algorithm Maintain two sets: A set of strings K ; includes Σ and λ A set of contexts F ; includes (λ, λ) We have rows for K And also for KK every pair.

54 Observation table K a set of strings and F a set of contexts f f f f 4 f f f f f f 0 k k k k k 4 k k k

55 Observation table K a set of strings and F a set of contexts (λ, λ) (a, λ) (λ, b) λ a b ab

56 bcc aab bbcc bc abc aabb ab c b a λ Observation table K a set of strings and F a set of contexts (aaabb, bccc) (λ, abbccc) (aa, bbc) (abb, cc) (λ, λ) (aaabbc, λ) (aaab, bccc) (aa, bbbc) (abbb, cc)

57 bcc aab abc aabb ab λ bbcc bc c b a Observation table K a set of strings and F a set of contexts (λ, λ) (aaabb, bccc) (λ, abbccc) (aa, bbbc) (aa, bbc) (abb, cc) (aaabbc, λ) (aaab, bccc) (abbb, cc)

58 a Substitutable L = {a n cb n n 0} (a, b) (ac, b) (λ, b) (aac, b) (λ, λ) (λ, cb) (a, λ) (ac, λ) aacb ac acbb cb aacbb acb c b

59 Example Artificial f f f f 4 f f f f f f 0 k k k k k 4 k k k

60 Example Artificial f f f f 4 f f f f f f 0 k k k k k 4 k k k

61 Example Artificial f f f f 4 f f f f f f 0 k k k k k 4 k k k

62 Example Artificial f f f f 4 f f f f f f 0 k k k k k 4 k k k

63 Example Artificial f f f f 4 f f f f f f 0 k k k k k 4 k k k

64 Example Artificial f f f f 4 f f f f f f 0 k k k k k 4 k k k

65 Constructing CFG Given the observation table: Non-terminals are equivalence classes of rows: w N if N is [w] Add N PQ if there is a u P, v Q and uv N. Initial non-terminals are those rows with (λ, λ): S N iff N L

66 (λ, λ) (a, λ) (λ, b) λ 0 0 a 0 0 b 0 0 ab 0 0 aab 0 0 abb 0 0 aa ba bb bab 0 0 aba 0 0 abab 0 0 Example Dyck language

67 Example Dyck language (λ, λ) (a, λ) (λ, b) λ 0 0 ab 0 0 abab 0 0 a 0 0 aab 0 0 aba 0 0 b 0 0 abb 0 0 bab 0 0 aa ba bb Discard rows with no element in K.

68 Example Dyck language Three classes {λ, ab, abab} Call this S {a, aab, aba} A {b, bab, abb} B. Add rules S AB, S SS A AS, SA, a B BS, SB, b Note that there is no rule X AA.

69 Closure LSTAR In the LSTAR algorithm, the table needed to be closed. For every transition we needed to represent the state at the end. For every u K, a Σ, we needed a v K such that (ua) L = v L. This algorithm: non-regular language There are an infinite number of congruence classes We cannot have a closed table Doesn t matter With a DFA unique derivation With a CFG lots of different trees

70 Consistency The example is consistent: If u F u and v F v, then uv F u v. If it is not consistent then we know something is wrong and we can make it consistent by adding a feature. If (l, r) is a context that distinguishes uv and u v One of (lu, r), (l, vr), (lu, r), (l, v r) will split u, u or v, v. But it might take exponential time each context could be twice the size of the previous one. Not essential so leave it out.

71 Undergenerating If we get a counter-example w L, then we add Sub(w) to K. Lemma Each positive counter-example will give us at least one of the non-terminals in the target. Thus we will get at most n positive counterexamples for a CFG of size n

72 Overgeneralising If we have enough contexts then we won t overgeneralise. If we overgeneralise, we don t have enough contexts. Problem S w, S I, but w L. Triple w, N, (l, r) N w All elements of N should have (l, r) (l, r) C L (w)

73 FindContext Recursive Goal: find a set of strings N and a context (l, r) that splits N. N w N PQ uv = w Assume all elements of N have the feature (l, r) So we must have u v N, u P, v Q.

74 Table Possibilities luvr NO lu vr lu v r YES luv r Query the two gaps

75 Table luvr NO lu vr NO luv r YES lu v r YES v cannot be congruent to v as (lu, r) and (lu, r) C L (v ) but not C L (v).

76 Table luvr NO lu vr YES luv r NO lu v r YES u cannot be congruent to u as (l, v r) and (l, vr) C L (u ) but not C L (u).

77 Table luvr NO lu vr YES luv r YES lu v r YES u cannot be congruent to u : (l, vr) v cannot be congruent to v : (lu, r)

78 Table 4 luvr NO lu vr NO luv r NO lu v r YES u cannot be congruent to u : (l, v r) v cannot be congruent to v : (lu, r)

79 Recursion w, N, (l, r) Start at root with (λ, λ) Go down through the parse tree: At each step we have a triple such that at least one element of N has the context (l, r), but w does not. Terminate when we can split N If we get down to a leaf we will always terminate, as w N

80 4 4 Algorithm Result: A CFG G K Σ {λ}, K = K ; F {(λ, λ)}; D = L {λ} ; G = K, D, F ; while true do if Equiv(G) returns correct then return G ; w Equiv(G) ; if w is not in L(G) then K K Sub(w) ; else F AddContexts(G,w); G MakeGrammar(K, D, F) ; Algorithm : LearnCFG

81 Example Step 0 (λ, λ) λ

82 Step (λ, λ) λ a 0 b 0 ab aa 0 ba 0 bb 0 abb 0 aba 0 aab 0 bab 0 abab Example

83 Step (λ, λ) (a, λ) λ 0 a 0 0 b 0 ab 0 aa 0 0 ba 0 0 bb 0 0 abb 0 aba 0 0 aab 0 0 bab 0 abab 0 Example

84 Step (λ, λ) (a, λ) (λ, b) λ 0 0 a 0 0 b 0 0 ab 0 0 aa ba bb abb 0 0 aba 0 0 aab 0 0 bab 0 0 abab 0 0 Example

85 Result Clark, ICGI 00 Theorem This algorithm polynomially learns the class of congruential context free languages from MQs and EQs Language class A CFG is congruential if N u, N v implies u L v Observation The same approach works for positive data and membership queries alone

86 Outline Distributional learning Congruence classes Syntactic monoid Canonical CFG Algorithms Substitutable languages MAT learner NTS languages Conclusion

87 NTS Languages Boasson and Senizergues Up to now we have relied on undecidable language theoretic properties. NTS is a decidable syntactic property. Definition A grammar G is non terminally separated (NTS) iff for every N, M if N α and M uαv then M unv.

88 NTS Languages Boasson and Senizergues Up to now we have relied on undecidable language theoretic properties. NTS is a decidable syntactic property. Definition A grammar G is non terminally separated (NTS) iff for every N, M if N α and M uαv then M unv. For we like sheep have been led astray.

89 NTS Languages Boasson and Senizergues Up to now we have relied on undecidable language theoretic properties. NTS is a decidable syntactic property. Definition A grammar G is non terminally separated (NTS) iff for every N, M if N α and M uαv then M unv. For we like sheep have been led astray. Informally If there is a string like We like sheep that can be a sentence, whenever you see an occurrence of We like sheep it can be an S.

90 Congruence classes of NTS languages Congruence classes of non-terminals Suppose G is an NTS grammar, and N is a non-terminal, then if N u and N v then u L v.

91 Congruence classes of NTS languages Congruence classes of non-terminals Suppose G is an NTS grammar, and N is a non-terminal, then if N u and N v then u L v. Thus we have a partial equivalence between the congruence classes of the language, and the non-terminals. Decidable property of CFG. Decidable equivalence The MAT algorithm can learn all NTS languages.

92 PCFG Suppose we have a PCFG which defines a probability distribution D. Context distribution C D (u)[l, r] = P D(lur) E D (u) Note l,r P D(lur) = E D (u), which could be greater than.

93 Unambiguous NTS languages If we have an unambiguous NTS language then: Any two strings generated from the same non-terminal will have the same probabilistic distribution. Ambiguous NTS languages Some strings in {w N w} could have different sets of derivations. Add production N w with large parameter.

94 PAC-learning Unambiguous NTS grammars If we restrict the distributions to those generated by a PCFG with the same structure then we can learn.

95 PAC-learning Unambiguous NTS grammars If we restrict the distributions to those generated by a PCFG with the same structure then we can learn. Parameters µ -Distinguishability: same as with PDFAs

96 PAC-learning Unambiguous NTS grammars If we restrict the distributions to those generated by a PCFG with the same structure then we can learn. Parameters µ -Distinguishability: same as with PDFAs Separability: contexts are sufficiently far apart: A PCFG is ν-separable for some ν > 0 if for every pair of strings u, v in Sub(L(G)) such that u v, it is the case that L (C u C v ) ν min(l (C u ), L (C v ))

97 PAC-learning Unambiguous NTS grammars If we restrict the distributions to those generated by a PCFG with the same structure then we can learn. Parameters µ -Distinguishability: same as with PDFAs Separability: contexts are sufficiently far apart: A PCFG is ν-separable for some ν > 0 if for every pair of strings u, v in Sub(L(G)) such that u v, it is the case that L (C u C v ) ν min(l (C u ), L (C v )) Reachability: A PCFG is µ -reachable, if for every non-terminal N V there is a string u such that N G u and L (C u ) > µ.

98 Result Theorem The class of unambiguous NTS grammars is PAC-learnable, with parameters µ, ν, µ, when the distributions are generated by a PCFG with the same structure as the NTS grammar.

99 Result Theorem The class of unambiguous NTS grammars is PAC-learnable, with parameters µ, ν, µ, when the distributions are generated by a PCFG with the same structure as the NTS grammar. Requirement for unambiguity is very strong with NTS grammars. Too many stratificational parameters?

100 Outline Distributional learning Congruence classes Syntactic monoid Canonical CFG Algorithms Substitutable languages MAT learner NTS languages Conclusion

101 Includes Language class Still limited All regular languages (syntactic monoid is finite) Dyck language {a n b n n 0}... Many simple languages are not in this class: Palindromes over {a, b} L = {a n b n n 0} {a n b n n 0} L = {a n b n c m n, m 0} {a m b n c n n, m 0} (Some subtle differences in classes but we conjecture that they define the same class)

102 Linguistic modelling This seems close to the base model that is used in a lot of empirical work. Limitations Inadequate for natural language if Σ is a set of words: Exact substitutability is too strict Scholz and Pullum (00): fond versus proud cat is not perfectly substitutable with dog Words are almost never perfectly substitutable; phrases sometimes. The classes are far too small they are treated as completely unrelated

103 NP DT N company companies associate furniture sheep oil the a an this those some

104 NP DT N company companies associate furniture sheep oil the a an this those some Every determiner and noun is in a different congruence class Different rule for each combination

105 Regular language Theorem A regular languages has finitely many congruence classes. Proof Take a DFA for the language L with state set q Define f w : Q Q such that f w (q) = δ(q, w) There are finitely many functions and if f u = f v then u L v. Number of congruence classes may be exponential in size of minimal DFA: Q Q.

106 Monoid of regular languages f uv = f u f v If u has δ(q, u) = q and v has δ(q, v) = q then uv has δ(q, uv) = q There is some structure in the congruence classes; we can view the function as being composed of simpler basis functions.

107 Summary Context free Congruence based approach Define a set of primitives Congruence classes Derivation relation Syntactic monoid Language class Subclass of CF languages Inference algorithms Testing congruence We have made the change from regular to CF but results are too weak.

Learning Context Free Grammars with the Syntactic Concept Lattice

Learning Context Free Grammars with the Syntactic Concept Lattice Learning Context Free Grammars with the Syntactic Concept Lattice Alexander Clark Department of Computer Science Royal Holloway, University of London alexc@cs.rhul.ac.uk ICGI, September 2010 Outline Introduction

More information

Efficient, correct, unsupervised learning of context-sensitive languages

Efficient, correct, unsupervised learning of context-sensitive languages Efficient, correct, unsupervised learning of context-sensitive languages Alexander Clark Department of Computer Science Royal Holloway, University of London alexc@cs.rhul.ac.uk Abstract A central problem

More information

Automata Theory CS F-08 Context-Free Grammars

Automata Theory CS F-08 Context-Free Grammars Automata Theory CS411-2015F-08 Context-Free Grammars David Galles Department of Computer Science University of San Francisco 08-0: Context-Free Grammars Set of Terminals (Σ) Set of Non-Terminals Set of

More information

Computational Models - Lecture 4 1

Computational Models - Lecture 4 1 Computational Models - Lecture 4 1 Handout Mode Iftach Haitner and Yishay Mansour. Tel Aviv University. April 3/8, 2013 1 Based on frames by Benny Chor, Tel Aviv University, modifying frames by Maurice

More information

Computational Models - Lecture 4 1

Computational Models - Lecture 4 1 Computational Models - Lecture 4 1 Handout Mode Iftach Haitner. Tel Aviv University. November 21, 2016 1 Based on frames by Benny Chor, Tel Aviv University, modifying frames by Maurice Herlihy, Brown University.

More information

FLAC Context-Free Grammars

FLAC Context-Free Grammars FLAC Context-Free Grammars Klaus Sutner Carnegie Mellon Universality Fall 2017 1 Generating Languages Properties of CFLs Generation vs. Recognition 3 Turing machines can be used to check membership in

More information

Computational Models - Lecture 4

Computational Models - Lecture 4 Computational Models - Lecture 4 Regular languages: The Myhill-Nerode Theorem Context-free Grammars Chomsky Normal Form Pumping Lemma for context free languages Non context-free languages: Examples Push

More information

The View Over The Horizon

The View Over The Horizon The View Over The Horizon enumerable decidable context free regular Context-Free Grammars An example of a context free grammar, G 1 : A 0A1 A B B # Terminology: Each line is a substitution rule or production.

More information

Context-Free Grammars and Languages. Reading: Chapter 5

Context-Free Grammars and Languages. Reading: Chapter 5 Context-Free Grammars and Languages Reading: Chapter 5 1 Context-Free Languages The class of context-free languages generalizes the class of regular languages, i.e., every regular language is a context-free

More information

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Lecture 14 Ana Bove May 14th 2018 Recap: Context-free Grammars Simplification of grammars: Elimination of ǫ-productions; Elimination of

More information

60-354, Theory of Computation Fall Asish Mukhopadhyay School of Computer Science University of Windsor

60-354, Theory of Computation Fall Asish Mukhopadhyay School of Computer Science University of Windsor 60-354, Theory of Computation Fall 2013 Asish Mukhopadhyay School of Computer Science University of Windsor Pushdown Automata (PDA) PDA = ε-nfa + stack Acceptance ε-nfa enters a final state or Stack is

More information

Computational Models - Lecture 3

Computational Models - Lecture 3 Slides modified by Benny Chor, based on original slides by Maurice Herlihy, Brown University. p. 1 Computational Models - Lecture 3 Equivalence of regular expressions and regular languages (lukewarm leftover

More information

Context-Free Grammars and Languages

Context-Free Grammars and Languages Context-Free Grammars and Languages Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

More information

Theory of Computation - Module 3

Theory of Computation - Module 3 Theory of Computation - Module 3 Syllabus Context Free Grammar Simplification of CFG- Normal forms-chomsky Normal form and Greibach Normal formpumping lemma for Context free languages- Applications of

More information

Lecture 12 Simplification of Context-Free Grammars and Normal Forms

Lecture 12 Simplification of Context-Free Grammars and Normal Forms Lecture 12 Simplification of Context-Free Grammars and Normal Forms COT 4420 Theory of Computation Chapter 6 Normal Forms for CFGs 1. Chomsky Normal Form CNF Productions of form A BC A, B, C V A a a T

More information

THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET

THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET Regular Languages and FA A language is a set of strings over a finite alphabet Σ. All languages are finite or countably infinite. The set of all languages

More information

Context-Free Grammar

Context-Free Grammar Context-Free Grammar CFGs are more powerful than regular expressions. They are more powerful in the sense that whatever can be expressed using regular expressions can be expressed using context-free grammars,

More information

Context Free Languages. Automata Theory and Formal Grammars: Lecture 6. Languages That Are Not Regular. Non-Regular Languages

Context Free Languages. Automata Theory and Formal Grammars: Lecture 6. Languages That Are Not Regular. Non-Regular Languages Context Free Languages Automata Theory and Formal Grammars: Lecture 6 Context Free Languages Last Time Decision procedures for FAs Minimum-state DFAs Today The Myhill-Nerode Theorem The Pumping Lemma Context-free

More information

CSCI 340: Computational Models. Regular Expressions. Department of Computer Science

CSCI 340: Computational Models. Regular Expressions. Department of Computer Science CSCI 340: Computational Models Regular Expressions Chapter 4 Department of Computer Science Yet Another New Method for Defining Languages Given the Language: L 1 = {x n for n = 1 2 3...} We could easily

More information

Note: In any grammar here, the meaning and usage of P (productions) is equivalent to R (rules).

Note: In any grammar here, the meaning and usage of P (productions) is equivalent to R (rules). Note: In any grammar here, the meaning and usage of P (productions) is equivalent to R (rules). 1a) G = ({R, S, T}, {0,1}, P, S) where P is: S R0R R R0R1R R1R0R T T 0T ε (S generates the first 0. R generates

More information

CS375: Logic and Theory of Computing

CS375: Logic and Theory of Computing CS375: Logic and Theory of Computing Fuhua (Frank) Cheng Department of Computer Science University of Kentucky 1 Table of Contents: Week 1: Preliminaries (set algebra, relations, functions) (read Chapters

More information

Context Free Grammars

Context Free Grammars Automata and Formal Languages Context Free Grammars Sipser pages 101-111 Lecture 11 Tim Sheard 1 Formal Languages 1. Context free languages provide a convenient notation for recursive description of languages.

More information

CISC4090: Theory of Computation

CISC4090: Theory of Computation CISC4090: Theory of Computation Chapter 2 Context-Free Languages Courtesy of Prof. Arthur G. Werschulz Fordham University Department of Computer and Information Sciences Spring, 2014 Overview In Chapter

More information

Languages, regular languages, finite automata

Languages, regular languages, finite automata Notes on Computer Theory Last updated: January, 2018 Languages, regular languages, finite automata Content largely taken from Richards [1] and Sipser [2] 1 Languages An alphabet is a finite set of characters,

More information

Concordia University Department of Computer Science & Software Engineering

Concordia University Department of Computer Science & Software Engineering Concordia University Department of Computer Science & Software Engineering COMP 335/4 Theoretical Computer Science Winter 2015 Assignment 3 1. In each case, what language is generated by CFG s below. Justify

More information

HW 3 Solutions. Tommy November 27, 2012

HW 3 Solutions. Tommy November 27, 2012 HW 3 Solutions Tommy November 27, 2012 5.1.1 (a) Online solution: S 0S1 ɛ. (b) Similar to online solution: S AY XC A aa ɛ b ɛ C cc ɛ X axb aa b Y by c b cc (c) S X A A A V AV a V V b V a b X V V X V (d)

More information

CS A Term 2009: Foundations of Computer Science. Homework 2. By Li Feng, Shweta Srivastava, and Carolina Ruiz.

CS A Term 2009: Foundations of Computer Science. Homework 2. By Li Feng, Shweta Srivastava, and Carolina Ruiz. CS3133 - A Term 2009: Foundations of Computer Science Prof. Carolina Ruiz Homework 2 WPI By Li Feng, Shweta Srivastava, and Carolina Ruiz Chapter 4 Problem 1: (10 Points) Exercise 4.3 Solution 1: S is

More information

download instant at Assume that (w R ) R = w for all strings w Σ of length n or less.

download instant at  Assume that (w R ) R = w for all strings w Σ of length n or less. Chapter 2 Languages 3. We prove, by induction on the length of the string, that w = (w R ) R for every string w Σ. Basis: The basis consists of the null string. In this case, (λ R ) R = (λ) R = λ as desired.

More information

Theory of Computation (Classroom Practice Booklet Solutions)

Theory of Computation (Classroom Practice Booklet Solutions) Theory of Computation (Classroom Practice Booklet Solutions) 1. Finite Automata & Regular Sets 01. Ans: (a) & (c) Sol: (a) The reversal of a regular set is regular as the reversal of a regular expression

More information

Parsing. Context-Free Grammars (CFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 26

Parsing. Context-Free Grammars (CFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 26 Parsing Context-Free Grammars (CFG) Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Winter 2017/18 1 / 26 Table of contents 1 Context-Free Grammars 2 Simplifying CFGs Removing useless symbols Eliminating

More information

Properties of Context-Free Languages

Properties of Context-Free Languages Properties of Context-Free Languages Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

More information

Section 1 (closed-book) Total points 30

Section 1 (closed-book) Total points 30 CS 454 Theory of Computation Fall 2011 Section 1 (closed-book) Total points 30 1. Which of the following are true? (a) a PDA can always be converted to an equivalent PDA that at each step pops or pushes

More information

Introduction to Theory of Computing

Introduction to Theory of Computing CSCI 2670, Fall 2012 Introduction to Theory of Computing Department of Computer Science University of Georgia Athens, GA 30602 Instructor: Liming Cai www.cs.uga.edu/ cai 0 Lecture Note 3 Context-Free Languages

More information

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY 15-453 FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY REVIEW for MIDTERM 1 THURSDAY Feb 6 Midterm 1 will cover everything we have seen so far The PROBLEMS will be from Sipser, Chapters 1, 2, 3 It will be

More information

TAFL 1 (ECS-403) Unit- III. 3.1 Definition of CFG (Context Free Grammar) and problems. 3.2 Derivation. 3.3 Ambiguity in Grammar

TAFL 1 (ECS-403) Unit- III. 3.1 Definition of CFG (Context Free Grammar) and problems. 3.2 Derivation. 3.3 Ambiguity in Grammar TAFL 1 (ECS-403) Unit- III 3.1 Definition of CFG (Context Free Grammar) and problems 3.2 Derivation 3.3 Ambiguity in Grammar 3.3.1 Inherent Ambiguity 3.3.2 Ambiguous to Unambiguous CFG 3.4 Simplification

More information

Theory of Computer Science

Theory of Computer Science Theory of Computer Science C1. Formal Languages and Grammars Malte Helmert University of Basel March 14, 2016 Introduction Example: Propositional Formulas from the logic part: Definition (Syntax of Propositional

More information

Learning Regular Sets

Learning Regular Sets Learning Regular Sets Author: Dana Angluin Presented by: M. Andreína Francisco Department of Computer Science Uppsala University February 3, 2014 Minimally Adequate Teachers A Minimally Adequate Teacher

More information

Automata Theory Final Exam Solution 08:10-10:00 am Friday, June 13, 2008

Automata Theory Final Exam Solution 08:10-10:00 am Friday, June 13, 2008 Automata Theory Final Exam Solution 08:10-10:00 am Friday, June 13, 2008 Name: ID #: This is a Close Book examination. Only an A4 cheating sheet belonging to you is acceptable. You can write your answers

More information

This lecture covers Chapter 7 of HMU: Properties of CFLs

This lecture covers Chapter 7 of HMU: Properties of CFLs This lecture covers Chapter 7 of HMU: Properties of CFLs Chomsky Normal Form Pumping Lemma for CFs Closure Properties of CFLs Decision Properties of CFLs Additional Reading: Chapter 7 of HMU. Chomsky Normal

More information

Homework 4 Solutions. 2. Find context-free grammars for the language L = {a n b m c k : k n + m}. (with n 0,

Homework 4 Solutions. 2. Find context-free grammars for the language L = {a n b m c k : k n + m}. (with n 0, Introduction to Formal Language, Fall 2016 Due: 21-Apr-2016 (Thursday) Instructor: Prof. Wen-Guey Tzeng Homework 4 Solutions Scribe: Yi-Ruei Chen 1. Find context-free grammars for the language L = {a n

More information

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY 15-453 FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY Chomsky Normal Form and TURING MACHINES TUESDAY Feb 4 CHOMSKY NORMAL FORM A context-free grammar is in Chomsky normal form if every rule is of the form:

More information

Computational Models - Lecture 5 1

Computational Models - Lecture 5 1 Computational Models - Lecture 5 1 Handout Mode Iftach Haitner. Tel Aviv University. November 28, 2016 1 Based on frames by Benny Chor, Tel Aviv University, modifying frames by Maurice Herlihy, Brown University.

More information

In English, there are at least three different types of entities: letters, words, sentences.

In English, there are at least three different types of entities: letters, words, sentences. Chapter 2 Languages 2.1 Introduction In English, there are at least three different types of entities: letters, words, sentences. letters are from a finite alphabet { a, b, c,..., z } words are made up

More information

Chomsky Normal Form and TURING MACHINES. TUESDAY Feb 4

Chomsky Normal Form and TURING MACHINES. TUESDAY Feb 4 Chomsky Normal Form and TURING MACHINES TUESDAY Feb 4 CHOMSKY NORMAL FORM A context-free grammar is in Chomsky normal form if every rule is of the form: A BC A a S ε B and C aren t start variables a is

More information

Pushdown automata. Twan van Laarhoven. Institute for Computing and Information Sciences Intelligent Systems Radboud University Nijmegen

Pushdown automata. Twan van Laarhoven. Institute for Computing and Information Sciences Intelligent Systems Radboud University Nijmegen Pushdown automata Twan van Laarhoven Institute for Computing and Information Sciences Intelligent Systems Version: fall 2014 T. van Laarhoven Version: fall 2014 Formal Languages, Grammars and Automata

More information

A* Search. 1 Dijkstra Shortest Path

A* Search. 1 Dijkstra Shortest Path A* Search Consider the eight puzzle. There are eight tiles numbered 1 through 8 on a 3 by three grid with nine locations so that one location is left empty. We can move by sliding a tile adjacent to the

More information

Propositional logic. First order logic. Alexander Clark. Autumn 2014

Propositional logic. First order logic. Alexander Clark. Autumn 2014 Propositional logic First order logic Alexander Clark Autumn 2014 Formal Logic Logical arguments are valid because of their form. Formal languages are devised to express exactly that relevant form and

More information

Pushdown Automata. Reading: Chapter 6

Pushdown Automata. Reading: Chapter 6 Pushdown Automata Reading: Chapter 6 1 Pushdown Automata (PDA) Informally: A PDA is an NFA-ε with a infinite stack. Transitions are modified to accommodate stack operations. Questions: What is a stack?

More information

Theory of Computation (II) Yijia Chen Fudan University

Theory of Computation (II) Yijia Chen Fudan University Theory of Computation (II) Yijia Chen Fudan University Review A language L is a subset of strings over an alphabet Σ. Our goal is to identify those languages that can be recognized by one of the simplest

More information

Finite Automata and Formal Languages TMV026/DIT321 LP Useful, Useless, Generating and Reachable Symbols

Finite Automata and Formal Languages TMV026/DIT321 LP Useful, Useless, Generating and Reachable Symbols Finite Automata and Formal Languages TMV026/DIT321 LP4 2012 Lecture 13 Ana Bove May 7th 2012 Overview of today s lecture: Normal Forms for Context-Free Languages Pumping Lemma for Context-Free Languages

More information

Context-Free Grammars: Normal Forms

Context-Free Grammars: Normal Forms Context-Free Grammars: Normal Forms Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

More information

Foundations of Informatics: a Bridging Course

Foundations of Informatics: a Bridging Course Foundations of Informatics: a Bridging Course Week 3: Formal Languages and Semantics Thomas Noll Lehrstuhl für Informatik 2 RWTH Aachen University noll@cs.rwth-aachen.de http://www.b-it-center.de/wob/en/view/class211_id948.html

More information

Languages. Languages. An Example Grammar. Grammars. Suppose we have an alphabet V. Then we can write:

Languages. Languages. An Example Grammar. Grammars. Suppose we have an alphabet V. Then we can write: Languages A language is a set (usually infinite) of strings, also known as sentences Each string consists of a sequence of symbols taken from some alphabet An alphabet, V, is a finite set of symbols, e.g.

More information

NPDA, CFG equivalence

NPDA, CFG equivalence NPDA, CFG equivalence Theorem A language L is recognized by a NPDA iff L is described by a CFG. Must prove two directions: ( ) L is recognized by a NPDA implies L is described by a CFG. ( ) L is described

More information

Non-context-Free Languages. CS215, Lecture 5 c

Non-context-Free Languages. CS215, Lecture 5 c Non-context-Free Languages CS215, Lecture 5 c 2007 1 The Pumping Lemma Theorem. (Pumping Lemma) Let be context-free. There exists a positive integer divided into five pieces, Proof for for each, and..

More information

Context-Free Grammars. 2IT70 Finite Automata and Process Theory

Context-Free Grammars. 2IT70 Finite Automata and Process Theory Context-Free Grammars 2IT70 Finite Automata and Process Theory Technische Universiteit Eindhoven May 18, 2016 Generating strings language L 1 = {a n b n n > 0} ab L 1 if w L 1 then awb L 1 production rules

More information

Homework 4. Chapter 7. CS A Term 2009: Foundations of Computer Science. By Li Feng, Shweta Srivastava, and Carolina Ruiz

Homework 4. Chapter 7. CS A Term 2009: Foundations of Computer Science. By Li Feng, Shweta Srivastava, and Carolina Ruiz CS3133 - A Term 2009: Foundations of Computer Science Prof. Carolina Ruiz Homework 4 WPI By Li Feng, Shweta Srivastava, and Carolina Ruiz Chapter 7 Problem: Chap 7.1 part a This PDA accepts the language

More information

1. (a) Explain the procedure to convert Context Free Grammar to Push Down Automata.

1. (a) Explain the procedure to convert Context Free Grammar to Push Down Automata. Code No: R09220504 R09 Set No. 2 II B.Tech II Semester Examinations,December-January, 2011-2012 FORMAL LANGUAGES AND AUTOMATA THEORY Computer Science And Engineering Time: 3 hours Max Marks: 75 Answer

More information

Properties of Context-Free Languages. Closure Properties Decision Properties

Properties of Context-Free Languages. Closure Properties Decision Properties Properties of Context-Free Languages Closure Properties Decision Properties 1 Closure Properties of CFL s CFL s are closed under union, concatenation, and Kleene closure. Also, under reversal, homomorphisms

More information

Notes for Comp 497 (454) Week 10

Notes for Comp 497 (454) Week 10 Notes for Comp 497 (454) Week 10 Today we look at the last two chapters in Part II. Cohen presents some results concerning the two categories of language we have seen so far: Regular languages (RL). Context-free

More information

C1.1 Introduction. Theory of Computer Science. Theory of Computer Science. C1.1 Introduction. C1.2 Alphabets and Formal Languages. C1.

C1.1 Introduction. Theory of Computer Science. Theory of Computer Science. C1.1 Introduction. C1.2 Alphabets and Formal Languages. C1. Theory of Computer Science March 20, 2017 C1. Formal Languages and Grammars Theory of Computer Science C1. Formal Languages and Grammars Malte Helmert University of Basel March 20, 2017 C1.1 Introduction

More information

Notes for Comp 497 (Comp 454) Week 10 4/5/05

Notes for Comp 497 (Comp 454) Week 10 4/5/05 Notes for Comp 497 (Comp 454) Week 10 4/5/05 Today look at the last two chapters in Part II. Cohen presents some results concerning context-free languages (CFL) and regular languages (RL) also some decidability

More information

MA/CSSE 474 Theory of Computation

MA/CSSE 474 Theory of Computation MA/CSSE 474 Theory of Computation CFL Hierarchy CFL Decision Problems Your Questions? Previous class days' material Reading Assignments HW 12 or 13 problems Anything else I have included some slides online

More information

Natural Language Processing : Probabilistic Context Free Grammars. Updated 5/09

Natural Language Processing : Probabilistic Context Free Grammars. Updated 5/09 Natural Language Processing : Probabilistic Context Free Grammars Updated 5/09 Motivation N-gram models and HMM Tagging only allowed us to process sentences linearly. However, even simple sentences require

More information

CS5371 Theory of Computation. Lecture 7: Automata Theory V (CFG, CFL, CNF)

CS5371 Theory of Computation. Lecture 7: Automata Theory V (CFG, CFL, CNF) CS5371 Theory of Computation Lecture 7: Automata Theory V (CFG, CFL, CNF) Announcement Homework 2 will be given soon (before Tue) Due date: Oct 31 (Tue), before class Midterm: Nov 3, (Fri), first hour

More information

Grammars and Context-free Languages; Chomsky Hierarchy

Grammars and Context-free Languages; Chomsky Hierarchy Regular and Context-free Languages; Chomsky Hierarchy H. Geuvers Institute for Computing and Information Sciences Version: fall 2015 H. Geuvers Version: fall 2015 Huygens College 1 / 23 Outline Regular

More information

Author: Vivek Kulkarni ( )

Author: Vivek Kulkarni ( ) Author: Vivek Kulkarni ( vivek_kulkarni@yahoo.com ) Chapter-3: Regular Expressions Solutions for Review Questions @ Oxford University Press 2013. All rights reserved. 1 Q.1 Define the following and give

More information

Definition: A grammar G = (V, T, P,S) is a context free grammar (cfg) if all productions in P have the form A x where

Definition: A grammar G = (V, T, P,S) is a context free grammar (cfg) if all productions in P have the form A x where Recitation 11 Notes Context Free Grammars Definition: A grammar G = (V, T, P,S) is a context free grammar (cfg) if all productions in P have the form A x A V, and x (V T)*. Examples Problem 1. Given the

More information

Theory Of Computation UNIT-II

Theory Of Computation UNIT-II Regular Expressions and Context Free Grammars: Regular expression formalism- equivalence with finite automata-regular sets and closure properties- pumping lemma for regular languages- decision algorithms

More information

What we have done so far

What we have done so far What we have done so far DFAs and regular languages NFAs and their equivalence to DFAs Regular expressions. Regular expressions capture exactly regular languages: Construct a NFA from a regular expression.

More information

COSE212: Programming Languages. Lecture 1 Inductive Definitions (1)

COSE212: Programming Languages. Lecture 1 Inductive Definitions (1) COSE212: Programming Languages Lecture 1 Inductive Definitions (1) Hakjoo Oh 2018 Fall Hakjoo Oh COSE212 2018 Fall, Lecture 1 September 5, 2018 1 / 10 Inductive Definitions Inductive definition (induction)

More information

COMP-330 Theory of Computation. Fall Prof. Claude Crépeau. Lec. 10 : Context-Free Grammars

COMP-330 Theory of Computation. Fall Prof. Claude Crépeau. Lec. 10 : Context-Free Grammars COMP-330 Theory of Computation Fall 2017 -- Prof. Claude Crépeau Lec. 10 : Context-Free Grammars COMP 330 Fall 2017: Lectures Schedule 1-2. Introduction 1.5. Some basic mathematics 2-3. Deterministic finite

More information

Grammars and Context Free Languages

Grammars and Context Free Languages Grammars and Context Free Languages H. Geuvers and A. Kissinger Institute for Computing and Information Sciences Version: fall 2015 H. Geuvers & A. Kissinger Version: fall 2015 Talen en Automaten 1 / 23

More information

Einführung in die Computerlinguistik

Einführung in die Computerlinguistik Einführung in die Computerlinguistik Context-Free Grammars (CFG) Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 22 CFG (1) Example: Grammar G telescope : Productions: S NP VP NP

More information

UNIT II REGULAR LANGUAGES

UNIT II REGULAR LANGUAGES 1 UNIT II REGULAR LANGUAGES Introduction: A regular expression is a way of describing a regular language. The various operations are closure, union and concatenation. We can also find the equivalent regular

More information

Automata and Computability. Solutions to Exercises

Automata and Computability. Solutions to Exercises Automata and Computability Solutions to Exercises Spring 27 Alexis Maciel Department of Computer Science Clarkson University Copyright c 27 Alexis Maciel ii Contents Preface vii Introduction 2 Finite Automata

More information

Computability and Complexity

Computability and Complexity Computability and Complexity Rewriting Systems and Chomsky Grammars CAS 705 Ryszard Janicki Department of Computing and Software McMaster University Hamilton, Ontario, Canada janicki@mcmaster.ca Ryszard

More information

Ogden s Lemma for CFLs

Ogden s Lemma for CFLs Ogden s Lemma for CFLs Theorem If L is a context-free language, then there exists an integer l such that for any u L with at least l positions marked, u can be written as u = vwxyz such that 1 x and at

More information

Fundamentele Informatica II

Fundamentele Informatica II Fundamentele Informatica II Answer to selected exercises 5 John C Martin: Introduction to Languages and the Theory of Computation M.M. Bonsangue (and J. Kleijn) Fall 2011 5.1.a (q 0, ab, Z 0 ) (q 1, b,

More information

Miscellaneous. Closure Properties Decision Properties

Miscellaneous. Closure Properties Decision Properties Miscellaneous Closure Properties Decision Properties 1 Closure Properties of CFL s CFL s are closed under union, concatenation, and Kleene closure. Also, under reversal, homomorphisms and inverse homomorphisms.

More information

COSE212: Programming Languages. Lecture 1 Inductive Definitions (1)

COSE212: Programming Languages. Lecture 1 Inductive Definitions (1) COSE212: Programming Languages Lecture 1 Inductive Definitions (1) Hakjoo Oh 2017 Fall Hakjoo Oh COSE212 2017 Fall, Lecture 1 September 4, 2017 1 / 9 Inductive Definitions Inductive definition (induction)

More information

Probabilistic Context-Free Grammar

Probabilistic Context-Free Grammar Probabilistic Context-Free Grammar Petr Horáček, Eva Zámečníková and Ivana Burgetová Department of Information Systems Faculty of Information Technology Brno University of Technology Božetěchova 2, 612

More information

Lecture 11 Context-Free Languages

Lecture 11 Context-Free Languages Lecture 11 Context-Free Languages COT 4420 Theory of Computation Chapter 5 Context-Free Languages n { a b : n n { ww } 0} R Regular Languages a *b* ( a + b) * Example 1 G = ({S}, {a, b}, S, P) Derivations:

More information

Using Multiplicity Automata to Identify Transducer Relations from Membership and Equivalence Queries

Using Multiplicity Automata to Identify Transducer Relations from Membership and Equivalence Queries Using Multiplicity Automata to Identify Transducer Relations from Membership and Equivalence Queries Jose Oncina Dept. Lenguajes y Sistemas Informáticos - Universidad de Alicante oncina@dlsi.ua.es September

More information

Automata and Computability. Solutions to Exercises

Automata and Computability. Solutions to Exercises Automata and Computability Solutions to Exercises Fall 28 Alexis Maciel Department of Computer Science Clarkson University Copyright c 28 Alexis Maciel ii Contents Preface vii Introduction 2 Finite Automata

More information

SFWR ENG 2FA3. Solution to the Assignment #4

SFWR ENG 2FA3. Solution to the Assignment #4 SFWR ENG 2FA3. Solution to the Assignment #4 Total = 131, 100%= 115 The solutions below are often very detailed on purpose. Such level of details is not required from students solutions. Some questions

More information

6.8 The Post Correspondence Problem

6.8 The Post Correspondence Problem 6.8. THE POST CORRESPONDENCE PROBLEM 423 6.8 The Post Correspondence Problem The Post correspondence problem (due to Emil Post) is another undecidable problem that turns out to be a very helpful tool for

More information

The Pumping Lemma for Context Free Grammars

The Pumping Lemma for Context Free Grammars The Pumping Lemma for Context Free Grammars Chomsky Normal Form Chomsky Normal Form (CNF) is a simple and useful form of a CFG Every rule of a CNF grammar is in the form A BC A a Where a is any terminal

More information

Theory Bridge Exam Example Questions

Theory Bridge Exam Example Questions Theory Bridge Exam Example Questions Annotated version with some (sometimes rather sketchy) answers and notes. This is a collection of sample theory bridge exam questions. This is just to get some idea

More information

Simplification of CFG and Normal Forms. Wen-Guey Tzeng Computer Science Department National Chiao Tung University

Simplification of CFG and Normal Forms. Wen-Guey Tzeng Computer Science Department National Chiao Tung University Simplification of CFG and Normal Forms Wen-Guey Tzeng Computer Science Department National Chiao Tung University Normal Forms We want a cfg with either Chomsky or Greibach normal form Chomsky normal form

More information

Simplification of CFG and Normal Forms. Wen-Guey Tzeng Computer Science Department National Chiao Tung University

Simplification of CFG and Normal Forms. Wen-Guey Tzeng Computer Science Department National Chiao Tung University Simplification of CFG and Normal Forms Wen-Guey Tzeng Computer Science Department National Chiao Tung University Normal Forms We want a cfg with either Chomsky or Greibach normal form Chomsky normal form

More information

Learning cover context-free grammars from structural data

Learning cover context-free grammars from structural data Learning cover context-free grammars from structural data Mircea Marin Gabriel Istrate West University of Timişoara, Romania 11th International Colloquium on Theoretical Aspects of Computing ICTAC 2014

More information

VTU QUESTION BANK. Unit 1. Introduction to Finite Automata. 1. Obtain DFAs to accept strings of a s and b s having exactly one a.

VTU QUESTION BANK. Unit 1. Introduction to Finite Automata. 1. Obtain DFAs to accept strings of a s and b s having exactly one a. VTU QUESTION BANK Unit 1 Introduction to Finite Automata 1. Obtain DFAs to accept strings of a s and b s having exactly one a.(5m )( Dec-2014) 2. Obtain a DFA to accept strings of a s and b s having even

More information

Chapter 6. Properties of Regular Languages

Chapter 6. Properties of Regular Languages Chapter 6 Properties of Regular Languages Regular Sets and Languages Claim(1). The family of languages accepted by FSAs consists of precisely the regular sets over a given alphabet. Every regular set is

More information

Chapter 5: Context-Free Languages

Chapter 5: Context-Free Languages Chapter 5: Context-Free Languages Peter Cappello Department of Computer Science University of California, Santa Barbara Santa Barbara, CA 93106 cappello@cs.ucsb.edu Please read the corresponding chapter

More information

Computational Theory

Computational Theory Computational Theory Finite Automata and Regular Languages Curtis Larsen Dixie State University Computing and Design Fall 2018 Adapted from notes by Russ Ross Adapted from notes by Harry Lewis Curtis Larsen

More information

CS500 Homework #2 Solutions

CS500 Homework #2 Solutions CS500 Homework #2 Solutions 1. Consider the two languages Show that L 1 is context-free but L 2 is not. L 1 = {a i b j c k d l i = j k = l} L 2 = {a i b j c k d l i = k j = l} Answer. L 1 is the concatenation

More information

CISC 4090 Theory of Computation

CISC 4090 Theory of Computation CISC 4090 Theory of Computation Context-Free Languages and Push Down Automata Professor Daniel Leeds dleeds@fordham.edu JMH 332 Languages: Regular and Beyond Regular: Captured by Regular Operations a b

More information

Theory of Computation

Theory of Computation Fall 2002 (YEN) Theory of Computation Midterm Exam. Name:... I.D.#:... 1. (30 pts) True or false (mark O for true ; X for false ). (Score=Max{0, Right- 1 2 Wrong}.) (1) X... If L 1 is regular and L 2 L

More information

Pushdown Automata. Notes on Automata and Theory of Computation. Chia-Ping Chen

Pushdown Automata. Notes on Automata and Theory of Computation. Chia-Ping Chen Pushdown Automata Notes on Automata and Theory of Computation Chia-Ping Chen Department of Computer Science and Engineering National Sun Yat-Sen University Kaohsiung, Taiwan ROC Pushdown Automata p. 1

More information