Categorial Grammar. Larry Moss NASSLLI. Indiana University


Categorial Grammar (CG)
CG is the tradition in grammar that is closest to the work that we'll do in this course. Reason: in CG, syntax and semantics are closely related.
This set of slides first covers the basics of syntax in CG. The specific system is applicative categorial grammar, also called the Ajdukiewicz-Bar-Hillel form. (But if these don't mean anything to you, don't worry.)
Then we'll look at how semantics interfaces with syntax. Later on, we'll extend the syntax to be more realistic.

Syntactic categories in CG: basic syntactic categories
A categorial grammar always begins with basic categories. You should think of these as simple syntactic categories.
In our linguistic applications, we will usually take N, NP, and S. But we could just as well take other basic categories, and to make this point, I'll present other choices as well, coming from formal language theory.

Syntactic categories in CG: slash categories
If C and D are categories, so are C\D and C/D.
It's very important to see the difference between the two slashes! I personally have names for these:
  \  look left
  /  look right
(mnemonic: go from bottom to top)
But they have other names: backslash and slash, over and under.
In both cases, the category to the right of the slash is what the expression looks for, and the category to the left of the slash is the result: S\NP looks left for an NP and yields an S.
The overall idea is that they are directional versions of the usual division notation for fractions: $\frac{X}{Y}$ corresponds to both X\Y and X/Y.

Examples of Categories
First, here are some categories when the basic ones are S and NP:
  S/S
  S\NP
  (S\NP)/NP
  ((S\NP)/NP)/S
The idea: these categories are going to play the role of parts of speech like nouns, noun phrases, adjectives, etc.

Another set of examples
This time, let's let the basic categories be U, S, T, and Y. Here are some categories:
  Y
  S/T
  (S\(U/Y))\U
There are infinitely many categories.
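To make the recursive definition of categories concrete, here is a minimal sketch. The encoding is an assumption made for illustration, not notation from the slides: a basic category is a string, and a slash category is a triple `(result, slash, argument)`. The helper names `show` and `categories` are hypothetical.

```python
# A category is either a basic category (a string) or a triple
# (result, slash, argument), where slash is "\\" (look left) or "/" (look right).
# This encoding is an illustrative assumption, not notation from the slides.

def show(cat, top=True):
    """Render a category, parenthesizing nested slash categories."""
    if isinstance(cat, str):
        return cat
    result, slash, arg = cat
    text = f"{show(result, False)}{slash}{show(arg, False)}"
    return text if top else f"({text})"

def categories(basic, depth):
    """All categories over `basic` whose slashes are nested at most `depth` deep."""
    cats = set(basic)
    for _ in range(depth):
        cats |= {(c, s, d) for c in cats for d in cats for s in ("\\", "/")}
    return cats

VP = ("S", "\\", "NP")   # S\NP
TV = (VP, "/", "NP")     # (S\NP)/NP

print(show(TV))                                              # (S\NP)/NP
print(show((("S", "\\", ("U", "/", "Y")), "\\", "U")))       # (S\(U/Y))\U
print([len(categories({"S", "NP"}, d)) for d in range(3)])   # [2, 10, 202]
```

The counts grow without bound as the nesting depth increases, which is one way to see that there are infinitely many categories.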

Lexicon
A lexicon is a set of pairs, each consisting of a symbol together with a category.
Our first lexicon:
  (Dana, NP)  (Kim, NP)
  (smiled, S\NP)  (laughed, S\NP)  (cried, S\NP)
  (praised, (S\NP)/NP)  (teased, (S\NP)/NP)  (interviewed, (S\NP)/NP)
In general, a given word is usually associated with more than one category.

Another example of a lexicon
This time, we use S, T, U, X, and Y as the basic categories. The symbols here are the letters a, b, and c.
Lexicon:
  (a, T/X)  (a, S/X)  (b, X)  (b, X/T)
  (a, U/Y)  (a, S/Y)  (c, Y)  (c, Y/U)

Our first lexicon: explanation
One uses a lexicon to make parse trees like
  praised: (S\NP)/NP  +  Kim: NP       =>  praised Kim: S\NP
  Dana: NP  +  praised Kim: S\NP       =>  Dana praised Kim: S
If we take a tree t whose root is of category S\NP and put a tree u whose root is of category NP on the left of t, and then add a new root, the whole tree will be of category S.
If we take a tree t whose root is of category (S\NP)/NP and put a tree u whose root is of category NP on the right of t, and then add a new root, the whole tree will be of category S\NP.

Example using our first lexicon
  praised: (S\NP)/NP  +  Kim: NP       =>  praised Kim: S\NP
  Dana: NP  +  praised Kim: S\NP       =>  Dana praised Kim: S
The leaves must match the categories in the lexicon, and going up we use the construction principles that we just saw.
The key point here is that S\NP is the category of verb phrases.
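The two construction principles can be sketched as a single function. This is only an illustration under an assumed encoding (basic categories as strings, slash categories as triples `(result, slash, argument)`); the name `combine` is hypothetical.

```python
def combine(left, right):
    """Category of the phrase 'left right', or None if neither rule applies."""
    # Look right: X/Y followed by Y gives X.
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    # Look left: Y followed by X\Y gives X.
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None

VP = ("S", "\\", "NP")   # S\NP
TV = (VP, "/", "NP")     # (S\NP)/NP

praised_kim = combine(TV, "NP")     # praised Kim
print(praised_kim == VP)            # True
print(combine("NP", praised_kim))   # S
print(combine(praised_kim, "NP"))   # None: a verb phrase does not look right
```

For simplicity the sketch returns the first rule that applies; a full parser would collect every possible result.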

A second lexicon
Lexicon:
  (a, T/X)  (a, S/X)  (b, X)  (b, X/T)
  (a, U/Y)  (a, S/Y)  (c, Y)  (c, Y/U)
Let's parse abab as a string of category S:
  a: T/X  +  b: X       =>  ab: T
  b: X/T  +  ab: T      =>  bab: X
  a: S/X  +  bab: X     =>  abab: S

Grammars and languages
A categorial grammar is a pair G = (Lex, C) where Lex is a lexicon (over some set of atomic categories), and C is some fixed category.
The language of G is the set of sequences of symbols from the lexicon which can be parsed by some tree whose root is labeled C.

Example of a grammar and its language
Let's start with $G_1 = (\mathrm{Lex}_1, S)$, where $\mathrm{Lex}_1$ is our first lexicon, repeated below:
  (Dana, NP)  (Kim, NP)
  (smiled, S\NP)  (laughed, S\NP)  (cried, S\NP)
  (praised, (S\NP)/NP)  (teased, (S\NP)/NP)  (interviewed, (S\NP)/NP)
The language of this grammar $G_1$ is the set containing the following 18 sequences of lexical items:
  Dana smiled              Dana laughed            Dana cried
  Kim smiled               Kim laughed             Kim cried
  Dana praised Dana        Dana teased Dana        Dana interviewed Dana
  Dana praised Kim         Dana teased Kim         Dana interviewed Kim
  Kim praised Dana         Kim teased Dana         Kim interviewed Dana
  Kim praised Kim          Kim teased Kim          Kim interviewed Kim

Second example of a grammar and its language
Let's next consider $G_2 = (\mathrm{Lex}_2, S)$, where $\mathrm{Lex}_2$ is our second lexicon, repeated below:
  (a, T/X)  (a, S/X)  (b, X)  (b, X/T)
  (a, U/Y)  (a, S/Y)  (c, Y)  (c, Y/U)
The language of this grammar $G_2$ is harder to determine. It turns out to be
  {ab, abab, ababab, abababab, ..., ac, acac, acacac, acacacac, ...}
that is, one or more copies of ab, or one or more copies of ac.
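The two example languages can be checked by brute force. The following is a minimal sketch, not part of the original slides: it assumes categories are encoded as strings or triples `(result, slash, argument)`, and it uses a chart recognizer in the style of CYK. The function names `combine`, `root_categories`, and `language` are hypothetical.

```python
from itertools import product

VP = ("S", "\\", "NP")
TV = (VP, "/", "NP")
LEX1 = {"Dana": {"NP"}, "Kim": {"NP"},
        "smiled": {VP}, "laughed": {VP}, "cried": {VP},
        "praised": {TV}, "teased": {TV}, "interviewed": {TV}}
LEX2 = {"a": {("T", "/", "X"), ("S", "/", "X"), ("U", "/", "Y"), ("S", "/", "Y")},
        "b": {"X", ("X", "/", "T")},
        "c": {"Y", ("Y", "/", "U")}}

def combine(left, right):
    """All categories obtainable by putting `left` immediately before `right`."""
    out = set()
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        out.add(left[0])
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        out.add(right[0])
    return out

def root_categories(words, lex):
    """Categories of all parse trees spanning the non-empty sequence `words`."""
    n = len(words)
    chart = {(i, i + 1): set(lex.get(w, ())) for i, w in enumerate(words)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            chart[i, j] = {c for k in range(i + 1, j)
                           for a in chart[i, k] for b in chart[k, j]
                           for c in combine(a, b)}
    return chart[0, n]

def language(lex, goal, max_len):
    """Every sequence of at most `max_len` symbols that parses with root `goal`."""
    return [ws for n in range(1, max_len + 1)
            for ws in product(lex, repeat=n)
            if goal in root_categories(ws, lex)]

print(len(language(LEX1, "S", 4)))
# 18
print(sorted("".join(ws) for ws in language(LEX2, "S", 6)))
# ['ab', 'abab', 'ababab', 'ac', 'acac', 'acacac']
```

The search is bounded by a maximum length, so it only confirms the claimed languages up to that bound; it is not a proof.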

Our third lexicon
$\mathrm{Lex}_3$:
  (Dana, NP)  (Kim, NP)
  (smiled, S\NP)  (laughed, S\NP)  (cried, S\NP)
  (praised, (S\NP)/NP)  (teased, (S\NP)/NP)  (interviewed, (S\NP)/NP)
  (joyfully, (S\NP)\(S\NP))  (carefully, (S\NP)\(S\NP))  (excitedly, (S\NP)\(S\NP))

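Two examples using adverbs
First example:
  smiled: S\NP  +  joyfully: (S\NP)\(S\NP)              =>  smiled joyfully: S\NP
  Dana: NP  +  smiled joyfully: S\NP                    =>  Dana smiled joyfully: S
Second example (criticized does not appear in the third lexicon; it is assumed here to have the transitive-verb category (S\NP)/NP, like praised):
  criticized: (S\NP)/NP  +  Dana: NP                    =>  criticized Dana: S\NP
  criticized Dana: S\NP  +  carefully: (S\NP)\(S\NP)    =>  criticized Dana carefully: S\NP
  Kim: NP  +  criticized Dana carefully: S\NP           =>  Kim criticized Dana carefully: S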


NP coordination
To get
  Farid and Bettina and Cynthia left
we add to the lexicon
  (and, (NP\NP)/NP)
Writing F, B, and C for Farid, Bettina, and Cynthia:
  and: (NP\NP)/NP  +  C: NP            =>  and C: NP\NP
  B: NP  +  and C: NP\NP               =>  B and C: NP
  and: (NP\NP)/NP  +  B and C: NP      =>  and B and C: NP\NP
  F: NP  +  and B and C: NP\NP         =>  F and B and C: NP
  F and B and C: NP  +  left: S\NP     =>  F and B and C left: S

Semantics begins here
We have seen how to build complex categories in CG using the two directional slashes \ and /. We now carry out a parallel development on the semantic side, but with a few differences.
We start with a syntax for semantics, generating a set of semantic types.

Semantic Types
We begin with a set $T_0$ of basic types.
Every basic type is a type.
If σ and τ are types, so is (σ, τ).
The set of all types is written T.

Our main example is when the basic types are e and t, standing for entity and truth value.

By the way, σ is the lower-case Greek letter sigma, and τ is similarly the lower-case Greek letter tau.

Semantic Types: examples
Note that we often drop the comma, as is customary.
  e
  t
  (et)
  ((et), t)
  ((et)((et)t))
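As a small illustration of the inductive definition, types can be modeled as nested pairs. This encoding is an assumption made for the sketch: a basic type is a string and (σ, τ) is a Python 2-tuple. The hypothetical helper `show` prints a type with the commas dropped.

```python
def show(ty):
    """Print a type in the comma-free notation, e.g. ("e", "t") as (et)."""
    if isinstance(ty, str):
        return ty
    sigma, tau = ty
    return f"({show(sigma)}{show(tau)})"

et = ("e", "t")
print(show(et))                # (et)
print(show((et, "t")))         # ((et)t)
print(show((et, (et, "t"))))   # ((et)((et)t))
```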

Semantic Domains
Let D be a function which assigns sets to the basic types. In our setting, the basic types are usually e and t, so we would have D(e) and D(t). We'll write these as $D_e$ and $D_t$, since this is what everyone does.
This function D can be extended to all types by the rule
  $D_{(\sigma\tau)} = (D_\sigma \to D_\tau)$
This is the set of all functions from $D_\sigma$ to $D_\tau$.
As a function, the domain of D is the set T of all types.

The idea behind the equation $D_{(\sigma\tau)} = (D_\sigma \to D_\tau)$
Think of X\Y and X/Y in terms of functions: output\input and output/input.
A phrase v of category X\Y will be interpreted by a function
  $[\![v]\!]_{X\backslash Y}$ : Y interpretations → X interpretations
A phrase v of category X/Y will be interpreted by a function
  $[\![v]\!]_{X/Y}$ : Y interpretations → X interpretations
Either way, putting one phrase after another corresponds to function application.

Semantic Domains: Example
Let's use $D_e$ = {a, b, c, d}, and $D_t$ = 2 = {0, 1}.
Incidentally, we'll always use the set 2 = {0, 1} = {false, true} for $D_t$. $D_e$ is our set of entities, and this can be anything.

  type σ       $D_\sigma$
  e            {a, b, c, d}
  t            2
  (et)         functions f : {a, b, c, d} → 2
  ((et), t)    functions from the set above to 2
  (tt)         the four functions from 2 to 2
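To get a feel for how quickly these domains grow, here is a minimal sketch that counts and, for small cases, enumerates them. It assumes types are encoded as strings or pairs, and it represents a function as a Python dict; the names `size` and `functions` are hypothetical.

```python
from itertools import product

BASE = {"e": {"a", "b", "c", "d"}, "t": {0, 1}}

def size(ty):
    """Number of elements of D_ty: |D_(sigma tau)| = |D_tau| ** |D_sigma|."""
    if isinstance(ty, str):
        return len(BASE[ty])
    sigma, tau = ty
    return size(tau) ** size(sigma)

def functions(dom, cod):
    """Every function from `dom` to `cod`, each represented as a dict."""
    dom = sorted(dom)
    return [dict(zip(dom, values)) for values in product(sorted(cod), repeat=len(dom))]

et = ("e", "t")
print(size(et))           # 16
print(size((et, "t")))    # 65536
print(functions(BASE["t"], BASE["t"]))
# [{0: 0, 1: 0}, {0: 0, 1: 1}, {0: 1, 1: 0}, {0: 1, 1: 1}]
```

Even with only four entities, the domain for type ((et), t) already has 65536 elements, so enumeration is practical only for the smallest types.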

Connecting the syntactic categories with the semantic types
Here is how we connect the syntactic categories with the semantic types.
Let $\mathrm{Cat}_0$ be the set of basic categories in the syntax, and let Cat be the full set of categories. We start with a function $k : \mathrm{Cat}_0 \to T$.
  syntactic category X    semantic type k(X)
  S                       t
  N                       (et)
  NP                      ((et)t)
We then extend this to all syntactic categories by using function spaces for both directional slashes. We'll call the extended function, written here as $\bar{k}$, the extension of k.

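A picture of k and $\bar{k}$, Act I: k has domain $\mathrm{Cat}_0$
[Figure: arrows labeled k from the basic categories S, NP, and N in $\mathrm{Cat}_0$ to types in T, with S ↦ t, N ↦ et, and NP ↦ (et, t). The types e, (e, (et, t)), and (et, (et, t)) are also drawn in T.]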

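A picture of k and $\bar{k}$, Act II: $\bar{k}$ extends k
[Figure: arrows labeled $\bar{k}$ from categories in Cat to types in T, with NP/N ↦ (et, (et, t)), S ↦ t, NP ↦ (et, t), N ↦ et, S\NP ↦ ((et, t), t), and N\N ↦ (et, et). The type e is also drawn in T.]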

Connecting the syntactic categories with the semantic types
  syntactic category X    name               semantic type $\bar{k}(X)$
  S                       sentence           t
  N                       noun               (et)
  NP                      noun phrase        ((et)t)
  N/N                     adjective          ((et)(et))
  S\NP                    verb phrase        (((et)t)t)
  (S\NP)\(S\NP)           adverb             ((((et)t)t)(((et)t)t))
  (S\NP)/NP               transitive verb    (((et)t)(((et)t)t))
  NP/N                    determiner         (et, (et, t))
In more detail on how $\bar{k}$ is defined on the transitive verb category,
  $\bar{k}((S\backslash NP)/NP) = (((et)t)(((et)t)t))$
we are using the general definition that explains the chart:
  $\bar{k}(X\backslash Y) = (\bar{k}(Y), \bar{k}(X))$
  $\bar{k}(Y/X) = (\bar{k}(X), \bar{k}(Y))$

You can also see why people who work on this need special abbreviations for commonly found semantic types.
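The chart can be recomputed mechanically from the assignment on basic categories. Below is a minimal sketch under an assumed encoding: categories are strings or triples `(result, slash, argument)`, and types are strings or pairs. The names `K0`, `k_bar`, and `show` are hypothetical.

```python
K0 = {"S": "t", "N": ("e", "t"), "NP": (("e", "t"), "t")}   # k on basic categories

def k_bar(cat):
    """Extend k to all categories: both slashes become (argument type, result type)."""
    if isinstance(cat, str):
        return K0[cat]
    result, _slash, arg = cat
    return (k_bar(arg), k_bar(result))

def show(ty):
    """Print a type with the commas dropped."""
    if isinstance(ty, str):
        return ty
    return f"({show(ty[0])}{show(ty[1])})"

VP = ("S", "\\", "NP")
chart = {
    "N/N": ("N", "/", "N"),
    "S\\NP": VP,
    "(S\\NP)\\(S\\NP)": (VP, "\\", VP),
    "(S\\NP)/NP": (VP, "/", "NP"),
    "NP/N": ("NP", "/", "N"),
}
for name, cat in chart.items():
    print(name, show(k_bar(cat)))
# N/N ((et)(et))
# S\NP (((et)t)t)
# (S\NP)\(S\NP) ((((et)t)t)(((et)t)t))
# (S\NP)/NP (((et)t)(((et)t)t))
# NP/N ((et)((et)t))
```

Note that the direction of the slash plays no role on the semantic side: only which category is the argument and which is the result matters.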

What do we need to do semantics of a CG? That is, what is a model?
We start with a specific CG, used for syntax.
We have a set T of semantic types, and sets $D_\sigma$, for $\sigma \in T$.
Next, we must have a function
  $k : \mathrm{CAT}_0 \to T$
and this induces an extension
  $\bar{k} : \mathrm{CAT} \to T$
This connects the syntactic categories with the semantic types.

A lexical interpretation function $[\![\ ]\!]$ is a function which takes an item in the lexicon, say (w, X), and gives some $[\![w]\!]_X \in D_{\bar{k}(X)}$.
This is what it takes to give a model.

The main point of the semantics, again
Given: a categorial lexicon, $k : \mathrm{CAT}_0 \to T$, and a lexical interpretation function.
Every parse tree in the grammar has a semantic correlate. Every node in the parse tree, say (v, X), has a correlate on the semantic side of the form $[\![v]\!]_X \in D_{\bar{k}(X)}$.
Each use of the CG cancellation rules
  $v_1$ : X/Y  +  $v_2$ : Y       =>  $v_1 v_2$ : X
  $v_1$ : Y    +  $v_2$ : X\Y     =>  $v_1 v_2$ : X
corresponds to function application:
  $[\![v_1 v_2]\!]_X = [\![v_1]\!]_{X/Y}([\![v_2]\!]_Y)$
  $[\![v_1 v_2]\!]_X = [\![v_2]\!]_{X\backslash Y}([\![v_1]\!]_Y)$
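The correspondence between cancellation and function application can be sketched on a toy model. Everything about the model is invented for illustration: the entities a and b, the facts in `SMILERS` and `PRAISE`, and the choice to let Dana and Kim denote the properties-of-a and properties-of-b functions. Python booleans stand in for the truth values 0 and 1, and `interpret` is a hypothetical name.

```python
SMILERS = {"a"}              # hypothetical facts: a smiled
PRAISE = {("a", "b")}        # hypothetical facts: a praised b

VP = ("S", "\\", "NP")
TV = (VP, "/", "NP")

# Each word gets one (category, meaning) pair. NP has type ((et)t), so an NP
# meaning takes a property (a function from entities to truth values).
LEX = {
    "Dana": ("NP", lambda prop: prop("a")),
    "Kim": ("NP", lambda prop: prop("b")),
    "smiled": (VP, lambda subj: subj(lambda x: x in SMILERS)),
    "praised": (TV, lambda obj: lambda subj:
                subj(lambda x: obj(lambda y: (x, y) in PRAISE))),
}

def interpret(tree):
    """Return (category, meaning) for a tree: a word or a pair (left, right)."""
    if isinstance(tree, str):
        return LEX[tree]
    (lcat, lval), (rcat, rval) = interpret(tree[0]), interpret(tree[1])
    if isinstance(lcat, tuple) and lcat[1] == "/" and lcat[2] == rcat:
        return lcat[0], lval(rval)     # X/Y then Y: apply the left meaning
    if isinstance(rcat, tuple) and rcat[1] == "\\" and rcat[2] == lcat:
        return rcat[0], rval(lval)     # Y then X\Y: apply the right meaning
    raise ValueError("no cancellation rule applies")

print(interpret(("Dana", "smiled")))               # ('S', True)
print(interpret(("Dana", ("praised", "Kim"))))     # ('S', True)
print(interpret(("Kim", ("praised", "Dana"))))     # ('S', False)
```

In each case the syntactic functor, the phrase carrying the slash category, is also the semantic function, regardless of whether it stands on the left or the right.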

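Algebra as grammar
We take a single base type r, and as our lexicon we take
  plus : (r/r)/r    minus : (r/r)/r    times : (r/r)/r    div2 : (r/r)/r
  v : r    w : r    x : r    y : r    z : r
  1 : r    2 : r
Each operation looks right for an r and yields an r/r, which in turn looks right for a second r and yields an r.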

We get terms in Polish notation
  plus: (r/r)/r  +  v: r            =>  plus v: r/r
  plus v: r/r  +  w: r              =>  plus v w: r
  minus: (r/r)/r  +  z: r           =>  minus z: r/r
  minus z: r/r  +  plus v w: r      =>  minus z plus v w: r
This would correspond to the term usually written z − (v + w).

Semantics
The semantics will use higher-order (one-place) functions on the real numbers.
We take $D_r = \mathbb{R}$. Then automatically, $D_{(rr)} = (\mathbb{R} \to \mathbb{R})$ and $D_{(r(rr))} = (\mathbb{R} \to (\mathbb{R} \to \mathbb{R}))$.

Semantics
As one particular model, we take
  $[\![v]\!] = 4$    $[\![w]\!] = 2$    $[\![x]\!] = 65$    $[\![y]\!] = 3$    $[\![z]\!] = 0$
  $[\![1]\!] = 1$    $[\![2]\!] = 2$
  $[\![\mathrm{plus}]\!](x)(y) = x + y$
  $[\![\mathrm{minus}]\!](x)(y) = x - y$
  $[\![\mathrm{times}]\!](x)(y) = x \cdot y$
  $[\![\mathrm{div2}]\!](x)(y) = \frac{x}{2^{y}}$
The part in the middle is standard; it would not be sensible to use any other choice.

We get terms in Polish notation
For the term minus z plus v w, after some calculation,
  $[\![\text{minus z plus v w}]\!] = [\![\mathrm{minus}]\!]([\![z]\!])([\![\mathrm{plus}]\!]([\![v]\!])([\![w]\!])) = [\![z]\!] - ([\![v]\!] + [\![w]\!]) = 0 - (4 + 2) = -6$
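The calculation can be reproduced with curried functions, mirroring the one-place, higher-order semantics. This is a minimal sketch; the name `evaluate` is hypothetical, and the reading of div2 as t divided by 2 to the power u is an assumption about the intended definition.

```python
MODEL = {
    "v": 4, "w": 2, "x": 65, "y": 3, "z": 0, "1": 1, "2": 2,
    "plus": lambda a: lambda b: a + b,
    "minus": lambda a: lambda b: a - b,
    "times": lambda a: lambda b: a * b,
    "div2": lambda a: lambda b: a / 2 ** b,   # assumed reading: a divided by 2**b
}

def evaluate(tokens):
    """Evaluate a Polish-notation term; returns (value, unread tokens)."""
    head, rest = tokens[0], tokens[1:]
    value = MODEL[head]
    # A function meaning keeps consuming the terms to its right,
    # one argument at a time, until a number is reached.
    while callable(value):
        arg, rest = evaluate(rest)
        value = value(arg)
    return value, rest

value, unread = evaluate("minus z plus v w".split())
print(value, unread)   # -6 []
```

Each pass through the loop corresponds to one use of the look-right cancellation rule in the parse tree.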

We get terms in Polish notation
We are interested in a term corresponding to
  $f(v, w, x, y, z) = \frac{x - y}{2^{z - (v + w)}}$
To fit it all on the screen, let's drop the types:
  minus  +  x                                 =>  minus x
  minus x  +  y                               =>  minus x y
  div2  +  minus x y                          =>  div2 minus x y
  plus  +  v                                  =>  plus v
  plus v  +  w                                =>  plus v w
  minus  +  z                                 =>  minus z
  minus z  +  plus v w                        =>  minus z plus v w
  div2 minus x y  +  minus z plus v w         =>  div2 minus x y minus z plus v w
div2(t)(u) is supposed to mean $\frac{t}{2^{u}}$.

Can we determine the polarities of the variables from the tree?
Go from the root to the leaves, marking green for ↑ and red for ↓.
The rule for propagating colors is: the right branches of completed nodes for div2 and minus flip colors. Otherwise, we keep colors as we go up the tree.

[Figure: the parse tree of div2 minus x y minus z plus v w, with each node colored by its polarity.]

This agrees with what we saw before: f(v↑, w↑, x↑, y↓, z↓).
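The color-propagation rule can be sketched as a short recursive procedure. This is an illustration only: it reads a Polish-notation term into a tree, starts with a positive sign at the root, and flips the sign on the second argument of minus and div2. The names `parse` and `polarities` are hypothetical, +1 and -1 stand in for the two colors, and times is left out because the rule as stated does not cover it.

```python
BINARY = {"plus", "minus", "div2"}     # operations covered by the rule
FLIPS_SECOND = {"minus", "div2"}       # their second argument flips polarity

def parse(tokens):
    """Read one Polish-notation term; returns (tree, unread tokens)."""
    head, rest = tokens[0], tokens[1:]
    if head not in BINARY:
        return head, rest              # a variable or constant
    first, rest = parse(rest)
    second, rest = parse(rest)
    return (head, first, second), rest

def polarities(tree, sign=+1, out=None):
    """Map each leaf to the set of signs (+1 or -1) of its occurrences."""
    out = {} if out is None else out
    if isinstance(tree, str):
        out.setdefault(tree, set()).add(sign)
        return out
    op, first, second = tree
    polarities(first, sign, out)
    polarities(second, -sign if op in FLIPS_SECOND else sign, out)
    return out

tree, _ = parse("div2 minus x y minus z plus v w".split())
print(polarities(tree))
# {'x': {1}, 'y': {-1}, 'z': {-1}, 'v': {1}, 'w': {1}}
```

A variable that occurred with both signs would end up with the set {1, -1}, meaning the procedure assigns it no single polarity.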

This algorithm has a history
It was first proposed in CG by van Benthem in the 1990s to formalize the ↑, ↓ notation. His proposal was then worked out by Sanchez-Valencia. (Older versions exist: e.g., Sommers.)
Versions of it are even implemented in real-world CL systems:
  Rowan Nairn, Cleo Condoravdi, and Lauri Karttunen. Computing relative polarity for textual inference. In Proceedings of ICoS-5 (Inference in Computational Semantics), Buxton, UK, 2006.
(Karttunen was an IU Linguistics PhD and spoke about this work at a distinguished alum talk here a few years ago.)