O ste n D ahl Un iv o f StOGKholin THE limterpretatlon OF BOUND PRONOUNS This p a p e r is a r e p o r t o n w o r k in p r o g r e s s w i t h the a i m of simulating o n a c o m p u t e r s o m e a s p e c t s o f t h e p r o c e s s o f u n d e rstanding sentences, m o re sp ec if ic al ly the interpretation of so-called bound pronouns. The w o r k connects directly to some earlier p a p e r s of m i n e (Dahl 1 9 8 3 a and 1983b), w h e r e t h o s e pr ob le ms were di scussed from a theoretical point of view. A b o u n d p r o n o u n is, r o u g h l y s p e a k i n g, a p r o n o u n w h i c h b e h a v e s analogously to a b o u n d v a r i a b l e in logic. T h e r e a r e at l e a s t two k i n d s of c r i t e r i a for r e g a r d i n g a p r o n o u n as bound: (i) syntactic c r i t e r i a - s o m e k i n d s of p r o n o u n s, s u c h as reflexives, r e c i p r o c a l s a n d s o - c a l l e d i o g o p h o r i c s, m u s t f i nd their a n t e c e d e n t s in s y n t a c t i c a l l y d e f i n e d d o m a i n s, (ii) semantic criteria - so me pron ou ns cannot be as si gn ed referents 'in the w o r l d ' b u t c a n be u n d e r s t o o d o n l y if r e g a r d e d as re fe re nt ia ll y d e p e n d e n t on t h e i r a n t e c e d e n t s : th i s c o n c e r n s e.g. p r o n o u n s b o u n d by q u a n t i f i e d N P s and w h - p h r a s e s (e.g. himself in N o b o d y li ke s h i m s e 1 f ). A l t h o u g h the c l a s s e s of pronouns d e l i m i t e d by t h e s e c r i t e r i a a r e n o t q u i t e i d e n t i c a l, they o v e r l a p to s u c h an e x t e n t t h a t in m o s t case s, t h e y c a n be regarded as equivalent. In m y e a r l i e r pa p e r s, 1 h a v e d i s c u s s e d s o m e c a s e s of b o u n d pronouns w h i c h are t r o u b l e s o m e for the c u r r e n t t h e o r i e s t h a t account for b o u n d p r o n o u n s by t r a n s l a t i n g t h e m i n t o s o m e k i n d of logical notation using bound variable or e q uivalent devices. Th os e case s include: (i) ' s l o p p y i d e n t i t y ' c a s e s (as in J o h n l o v e s hi s w ife a n d so does B i l l, w h e r e Bill m a y be un de rs to od to love either his o w n or John's wife) (ii) ' d i s l o c a t e d b o u n d pronouns', t h a t is p r o n o u n s t h a t h a v e -A9- The interpretation of bound pronouns Ö sten Dahl, pages 49-57
been 'moved' (presupposing a transformational analysis) out of the s c o p e of t h e i r bi n d e r s, e.g. H i m self, e v e r y o n e d e s p i s e s, and in part icular am on g those (iii) p r o n o u n s w i t h ' r e l a t i o n a l ' (Engdatil 1985) or 'sec o n d - order' r e a d i n g s, as in a s e n t e n c e s u c h as T h ^ o n l y womari e v e r y En gl is hm an a d m i res is his m o t h e r, the i n t e r p r e t a t i o n of w h i c h cannot be r e n d e r e d w i t h o u t h a v i n g r e c o u r s e to s e c o n d - o r d e r logic The m a i n idea put forward in my papers was that the t r o u blesome cases could be accounted for if the location of the antecedent in the syntactic structure w e r e considered an integral part of a boun d pronoun' s interpretation. So far, two ma i n versions of the pr on ou n in terpretation p r o g r a m have b e e n d e v e l o p e d. In Da h l 1985, 1 r e p o r t an a t t e m p t to co ns tr uc t a s y n t a c t i c p a r s e r t o b e u s e d o n a n 8 - b i t microcomputer, w r i t t e n in the L I S P d i a l e c t m u L I S P. The f i r s t version of the p r o n o u n i n t e r p r e t a t i o n p r o g r a m ( h e n c e f o r t h 'Version l') u s e d a s o m e w h a t m o r e e l a b o r a t e v e r s i o n of t h i s parser as its base, t h a t is, the s e m a n t i c p a r t of the p r o g r a m took s y n t actically analysed sentences as its input and assigned referents to the noun phrases in them. In addition, the p r o g r a m had what can be called r u d i m e n t a r y conversational competence: the s e n t e n c e s p r o c e s s e d w e r e c o m p a r e d w i t h a d a t a b a s e a n d depending o n th e t y p e of s e n t e n c e, t h e a p p r o p r i a t e a c t i o n w a s taken: in the c a s e of i n t e r r o g a t i v e s e n t e n c e s, an a n s w e r w a s given, in th e c a s e of a d e c l a r a t i v e s e n t e n c e, t h e p r o p o s i t i o n was added to the database. The reference a s s i g n m e n t process in Version 1 w o r k e d in an t o p - d o w n f a s h i o n, a s s i g n i n g r e f e r e n t s first to the highest NPs in the syntactic structure. Extensive use w a s ma d e of t e m p o r a r y registers, wh er e proc e s s e d NPs (with identified r e f e r e n t s ) w e r e s t o r e d so as to be r e t r i e v e d l a t e r on w h e n n e e d e d as a n t e c e d e n t s of p r o n o u n s. W h e n an a n t e c e d e n t was f o u n d for a p r o n o u n, b o t h the r e f e r e n t a n d t h e l o c a t i o n (that is, w h e r e it w a s f o u n d in t h e r e g i s t e r s ) o f t h e a n tecedent was stored on the prop e r t y list of the pronoun. This info rm at io n w a s t h e n e x p l o i t e d by t h e m e c h a n i s m s u s e d in t h e -50-50
'troublesome cases' l i s t e d above. V e r s i o n 1 w a s t h u s a b l e to handle b o t h s l o p p y i d e n t i t y a n d at le a s t s o m e ' r e l a t i o n a l questions', e.g. the f o l l o w i n g : (1) W h o m d o e s e v e r y m a n love, h i s w i f e or M a r y? However, V e r s i o n 1 w a s r a t h e r sl ow, w i t h p r o c e s s i n g t i m e s up to half a minute for processing s o me sentences (this wo u l d include both syntactic parsing, reference assignment, c o m p a r i s o n w i th the d a t a b a s e a n d a p p r o p r i a t e r e a c t i o n ). T h e r e w e r e s e v e r a l reasons f o r t h a t, i n c l u d i n g i n h e r e n t l i m i t a t i o n s in t h e hard ware a n d s o f t w a r e u s e d. Of a m o r e d i r e c t l i n g u i s t i c relevance, h o w e v e r, w e r e the f o l l o w i n g c i r c u m s t a n c e s : The syntactic a n d s e m a n t i c c o m p o n e n t s of the s y s t e m s w e r e w h o l l y au tonomous f r o m e a c h o t h e r, a n d i n d e e d w o r k e d in r a t h e r di fferent fashions: the syntactic parser was strictly bottomup, s y s t e m a t i c a l l y t a k i n g i n t o c o n s i d e r a t i o n s all p o s s i b l e analyses of the s e n t e n c e s, w h e r e a s the s e m a n t i c s, as h a s already b e e n p o i n t e d out, w o r k e d f r o m the top down, a n d w i t h the p r i n c i p l e o f a l w a y s c h o o s i n g t h e f i r s t p o s s i b l e alternative. W h e n I c o n s i d e r e d the s l o w n e s s of V e r s i o n 1 a n d a l so r e a l i z e d t h a t w h a t t h e s e m a n t i c p a r t of it d i d w a s l a r g e l y repeating th e s y n t a c t i c a n a l y s i s of the s e n t e n c e, it a p p e a r e d to m e t h a t it m i g h t be f r u i t f u l to t r y a n d b u i l d a s y s t e m w h e r e syntactic and sema n t i c anal ys is wo ul d be done in an integrated fashion. This, h o w e v e r, p u t s t r o n g e r d e m a n d s o n th e p a r s i n g mechanism, since it required a m o r e inte ll ig en t w a y of handling structural ambiguities. Version 2, then, has been designed to m e e t these demands. It is written in the M S - D O S v e r s i o n of m u L I S P and h a s b e e n r u n o n several kinds of IBM c o m p a t i b l e computers. It has not yet been de veloped as f u l l y as V e r s i o n 1 (the ' c o n v e r s a t i o n a l ' p a r t h a s not b e e n i m p l e m e n t e d, for i n s t a n c e ) but its p e r f o r m a n c e is significantly b e t t e r t h a n t h a t of V e r s i o n 1, p a r t l y d u e to better h a r d w a r e but a l s o d u e to a m o r e e f f i c i e n t s t r u c t u r e of the p a r s i n g m e c h a n i s m. Tnus, the p a r s i n g t i m e ( i n c l u d i n g reference a s s i g n m e n t ) is a b o u t 20 m i l l i s e c o n d s p e r w o r d on an IBM AT c o m p u t e r. P e r h a p s th e m a i n a d v a n t a g e is t h a t t h i s t i m e -51-51
is m o r e or less linear, w h e r e a s the p a r s i n g t i m e p e r w o r d in Ve rs io n 1 grew very rapidly w i t h the length of sentences. The s y n t a c t i c a n a l y s i s in V e r s i o n 2 is d o n e a c c o r d i n g to the following principles: (i) the o u t p u t is a L I S P s t r u c t u r e w h i c h c a n be c h a r a c t e r i z e d as an ' a l m o s t u n l a b e l l e d b r a c k e t i n g ', tliat is, w i t h v e r y f e w exceptions, t h e s y n t a c t i c c a t e g o r y o f a c o n s t i t u e n t (which h a s the f o r m of a list) is not e x p l i c i t l y m a r k e d but h a s to be d e d u c e d f r o m the l e x i c a l c a t e g o r y of its 'head', t h a t is the first m e m b e r (CAK; of the list (ii) p a r s i n g is d o n e f r o m left to r i g h t in a m o r e or less d e terministic w a y (iii) the fact that the category of a constituent is in general kn ow n w h e n y o u h a v e i d e n t i f i e d its h e a d or its f i r s t w o r d (which is often the same thing) is s y s t e m a t i c a l l y expl oi te d in predicting what c o mes next (iv) b a c k t r a c k i n g is m a d e b y a s y s t e m a t i c u s e o f l o c a l pa ra me te rs of L I S P f u n c t i o n s : e v e r y t i m e a n e w w o r d is p a r s e d a call is m a d e to a f u n c t i o n a n d the p a r t i a l s t r u c t u r e b u i l t so far is p a s s e d t o t h a t f u n c t i o n a s a p a r a m e t e r - if t h e cont in ue d parse does not succeed, one a u t o m a t i c a l l y returns to the pr ev io us state (v) at a n y p o i n t in the p a r s i n g p r o c e s s, the p a r t i a l a n a l y s i s arrived at so far is r e p r e s e n t e d as a s i n g l e s t a c k of 'active constituents' ( c a l l e d the A C T 1V E S T A C K ), t h a t is, c o n s t i t u e n t s that ha v e not y e t b e e n f i n i s h e d. T o s h o w w h a t the p a r s i n g of a sentence m a y look like, w e s h o w the s u c c e s s i v e s t a g e s of the parsing of (2) in (3). -52-52
(2) John believes that Mary loves Bill (3) (expression to be parsed:) (ACTIVESTACK:) 1: John b e liev es that Mary loves B i l l NIL 2: b e lie v e s th a t Mary lo v e s B i l l ((John) VP (S)) 3: that Mary lo v e s B i l l (NP (believe -s) (S (John))) 4: Mary lo v e s B i l l (s (that) (believe -s) (S (John))) 5: loves B i l l ((Mary) VP (S) (that)(bel ieve -s) (S (John))) 6: B i l l (NP (love -s) (S (Mary)) (that) (believe -s) (S (John))) 7: NIL ((Bill) (love -s) (S (Mary)) (that) (believe -s) (S (John))) (close all constituents) (S (John) (believe -s (that (S (Mary) (love -s (Bill)))))) The a s s i g n m e n t of referents to HPs is done daring the syntactic analysis, m o r e s p e c i f i c a l l y, w h e n the n o u n p h r a s e in q u e s t i o n is 'closed', i.e. m o v e d o f f the s t a c k of a c t i v e c o n s t i t u e n t s. When a r e f e r e n t is a s s i g n e d to a n o u n ph ra se, a ' d otted pair' re pr es en ti ng the r e f e r e n t is a d d e d to the l i st w h i c h r e p r e s e n t s the c o n s t i t u e n t in the s t r u c t u r e. At p r e s e n t, the s y s t e m can handle t h r e e k i n d s of MPs: p r o p e r n a m e s, b o u n d p r o n o u n s, a n d NPs w i t h a po ss es si ve in the d e t e r m i n e r slot. For proper nouns, the a s s i g n m e n t p r o c e s s is t r i v i a l : the p r o p e r n a m e i t s e l f is used as a reference indicator. Thus, tlie LISP e x pression to the left of th e a r r o w is c o n v e r t e d i n t o the o n e to t h e r i g h t of t h e a r r o w : (4) ( J o h n ) -----> (John (REF. John)) Some p e o p l e m a y be d i s t u r b e d by t h i s r a t h e r v a c u o u s p r o c e s s : the p o i n t h e r e is t h a t s i n c e w e a r e not d i r e c t l y c o n c e r n e d w i t h h o w p r o p e r n a m e s ar e i n t e r p r e t e d w e d o not w a n t to i n t r o d u c e any c o m p l i c a t i o n s here. Of course, w e c o u l d e a s i l y p l u g in a -53-53
routine that puts in a r e fe r e n tia l index or tlie lik e. For N P s w i t h a p o s s e s s i v e d e t e r m i n e r, the p r i n c i p l e is a l s o slightly ad hoc: the r e f e r e n t of th e p o s s e s s i v e e x p r e s s i o n is first d e t e r m i n e d, t h e n the p r o p e r t y li s t of t h a t r e f e r e n t is ex am in ed to see if there is so m e pr op er ty w h i c h coincides w i t h the h e a d n o u n of the NP: in t h a t case, the v a l u e of t h a t p r operty be co me s the referent of the wh o l e NP. For instance, if we h a v e the NP J ohn^s w ife a n d w e fi n d the i t e m ( w i f e. M a r y ) o n John's prop e r t y list, then the referent of John's wi f e is taken to be Mary. The mo s t interesting part of the referent a s s i g n m e n t proc ed ur e is t h a t w h i c h a s s i g n s a n t e c e d e n t s and r e f e r e n t s to b o u n d pronouns. T h e a s s u m p t i o n is t h a t the a n t e c e d e n t of a b o u n d p r onoun is t o be f o u n d a m o n g t h e N P s t h a t c - c o m m a n d it. A c cording to the current definition, a node x c - c o m m a n d s a node y if a n d o n l y if th e n o d e t h a t i m m e d i a t e l y d o m i n a t e s x a l s o d o minates In the p r e s e n t s y s t e m, tlie c - c o m m a n d e r s of an NP that is b e i n g 'closed' are a l w a y s p r e c i s e l y t h o s e NPs t h a t a r e immediate c o n s t i t u e n t s of the m e m b e r s of the A C T I V E S T A C K. Fo r instance, w h e n t h e N P B _l^ in (2) a b o v e is c l o s e d, t h e ACTIVESTACK looks as follows: (5) (love -s) (S (Mary)) (that) ( b e l i e v e -s) (S (John))) The c - c o m m a n d e r s in (b) are thus Mary and J o h n. This m a k e s it p o s s i b l e to f o r m u l a t e a r e l a t i v e l y s i m p l e a l g o r i t h m for finding the possible antecedents. In addi ti on to assigning a r e f e r e n t to a p r o n o u n, the a l g o r i t h m a l s o s t o r e s the distance (in nodes) b e t w e e n the pr on ou n and its antecedent. The po int of this wi l l become clear later. In mulisp f o r m a l i s m the m a i n a n t e c e ndent-finding function looks as f o llows (some irrelevant details have been left out): -54-54
(6) (DEFUN ANTECEDENT (LAMBDA (X Y XNP XNODE DIST) (SETQ Y (CDR ACTIVESTACK)) Define Y as the ACTIVESTACK minus the NP under consideration. (SETQ DIST 0) Set the variable DIST to 0. (LOOP Repeat until Y is empty or antecedent is f ound: ((NULL Y) NIL) (SETQ XNODE (POP Y)) Set XNODE to next member of Y. (SETQ XNP (FIRSTNP XNODE)) Find the first NP in XNODE: call it XNP. ((AND (MEMBER (CAR X) REFLPROLIST) If the pronoun is reflexive and (EQ (CAT XNODE) S) ) XNODE is a sentence, then (PUT X 'ANTEC-DIST DIST) set the antecedent-distance to DIST and (AGREE X XNP) ) the antecedent to XNP, if it agrees with the pronoun, else to NIL, ((AND (if the pronoun is non-reflexive:) (AGREE X XNP) if XNP agrees with the pronoun then (NOT (AND unless the pronoun is non-possessive (NOT (POSSESSIVE X)) and (EQ (GET XNP REF) (GET (SUBJECT) REF)) )) ) XNP is coreferent with the subject of the sentence, (PUT X 'ANTEC-DIST DIST)then set the antecedent-distance to DIST XNP ) and the antecedent to XNP. XNP ) (SETQ DIST (ADDl DIST)) ) ) )) Add 1 to DIST. This is c e r t a i n l y a s i m p l i f i e d rule; for in s t a n c e, it a s s u m e s that t h e a n t e c e d e n t of a r e f l e x i v e is a l w a y s the s u b j e c t. However, in m o s t simple cases, it assigns the closest possible antecedent to any bound pronoun. Let us n o w h a v e a c l o s e r lo o k at the a n t e c e d e n t d i s t a n c e parameter. I t s f u n c t i o n is t o d e f i n e t h e l o c a t i o n o f t h e antecedent of a p r o n o u n ; t h i s i n f o r m a t i o n is a b o v e a l l u s e f u l w h e n the s t o r e d i n t e r p r e t a t i o n of t h e c o n s t i t u e n t w h i c h -55-55
contains the pronoun is retrieved later on. We shall illustrate what t h i s m e a n s by l o o k i n g at the w a y in w h i c h th e p r o g r a m handies 'sloppy identity'. Consider the again the e x a m p l e from the beginning of the paper; (7) J o hn loves his w i f e and so does Bill At p r e s e n t, the p r o g r a m is o n l y a b l e to h a n d l e a s o m e w h a t unidiomatic paraphrase of (7); (8) John loves his wi f e and Bill too. Basically, t h e f o l l o w i n g is w h a t h a p p e n s w h e n (8) is interpreted by the s y s t e m ; First, th e c l a u s e J o h n l o v e s his wife is pa rs ed. A s s u m i n g t h a t the s y s t e m k n o w s t h a t M a r y is John's w i fe, it w i l l a s s i g n M a r y as a r e f e r e n t to hi s w i f e. Then, the r e d u c e d c l a u s e B ill t o o is parsed. A f t e r the s u b j e c t NP Bill the s y stem expects a verb phrase; it takes the pa rt ic le too as a signal of an elliptical VP. Every time a VP is parsed, it b e c o m e s the v a l u e of the v a r i a b l e L A S T V P ; in th i s case, LASTVP is loves his w i f e. The pars ed version of this expr ession is n o w c o p i e d i n t o t h e p l a c e w h e r e the VP s h o u l d o c c u r in t h e el li pt ic al s e n t e n c e. W h e n t h i s h a p p e n s, the NP h i s w ife is ag ain subjected to the reference a s s i g n m e n t process - however, since it is the second time, the a n tecedent is found not by the function A N T E C E D E N T b u t by a n o t h e r c a l l e d F I N D - A N T E C E D E N T - AGAIN. T h i s f u n c t i o n l o o k s a t t h e a n t e c e d e n t d i s t a n c e a s sociated w i t h t h e p r o n o u n h i s a n d t r i e s to f i n d t h e NP at the c o rresponding p l a c e in the tree. In t h i s case, it is B i l l, so the referent of his w i f e is n o w taken to be Bill's wife. V e r s i o n 2 h a s n o t y e t b e e n d e v e l o p e d so far t h a t it c a n t a k e care of t h e o t h e r p r o b l e m a t i c c a s e s of b o u n d p r o n o u n s, b u t in prin c i p l e s i m i l a r m e c h a n i s m s as the o n e m e n t i o n e d s h o u l d be su ff ic ie nt to s o l v e the p r o b l e m s, as w a s d e m o n s t r a t e d b y Version 1. The po in t is that the a n t e cedent distance p a r a m e t e r a p proach is i n h e r e n t l y m o r e p o w e r f u l t h a n th e c o m m o n w a y o f di splaying c o r e f e r e n c e r e l a t i o n s, viz. by r e f e r e n t i a l i n d i c e s or m u ltiple occurrences of the same variable letter, in that it -56-56
has a meaningful interpretation also out of context. The a b o v e a c c o u n t h a s b e e n l a c k i n g in e x p l i c i t n e s s in v a r i o u s ways. Th er e are t w o reasons for this: the rather early stage of d e velopment of the p r o g r a m and the limited space available. The long-range a i m of the u n d e r t a k i n g is to p r o v i d e a s m a l l y e t po we rf ul 'module' for p r o c e s s i n g n a t u r a l l a n g u a g e s s e n t e n c e s and texts, w h e r e the pr on ou n i n t e rpretation m e c h a n i s m wi l l only be a s m a l l part. H o p e f u l l y, th e w o r k o n the 'module' w i l l be po ss ib le to shed light on some qu estions of general theoretical i n t e r e s t. RE FERENCES Dahl, Ö. 1983a. On th e n a t u r e of b o u n d p r o n o u n s. P I L U S 48, Dept, of Li ng ui st ic s, Univ. of Stockholm. Dahl, Ö. 1983b. B o u n d p r o n o u n s in an i n t e g r a t e d p r o c e s s mo del. In F. K a r l s s o n, ed.. P a p e r s f r o m the S e v e n t h S c a n d i n a v i a n Co nference of L i n g u i s t i c s. U n i v e r s i t y of H e l s i n k i, Dept, of Ge ne ra l Linguistics. Dahl, U. 1985. S y n t a c t i c P a r s i n g o n a M i c r o c o m p u t e r. In S. Bäckman a n d G. K j e l l m e r, eds., Lite rature ^ n t e d t o A ^ v a r EJii: 9 d a n d L k ^}i5[^an. Go th en bu rg Studies in English 60. Göteborg: Acta Universitatis G o t h o b u r g e n s i s. Engdahl, E. 1985. T h e S y n t a x a n d S e m a n t i c s of Q u e s t i o n s w i t h Special Reference to S w e d i s h. Dordrecht: Reidel. -57-57