Report Documentation Page


Report Documentation Page
Form Approved, OMB No. 0704-0188

Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if it does not display a currently valid OMB control number.

1. REPORT DATE: JUN 2000
2. REPORT TYPE:
3. DATES COVERED: 00-06-2000 to 00-06-2000
4. TITLE AND SUBTITLE: Oxygen: A Language Independent Linearization Engine
5a. CONTRACT NUMBER:
5b. GRANT NUMBER:
5c. PROGRAM ELEMENT NUMBER:
5d. PROJECT NUMBER:
5e. TASK NUMBER:
5f. WORK UNIT NUMBER:
6. AUTHOR(S):
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Language and Media Processing Laboratory, Institute for Advanced Computer Studies, University of Maryland, College Park, MD, 20742-3275
8. PERFORMING ORGANIZATION REPORT NUMBER:
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES):
10. SPONSOR/MONITOR'S ACRONYM(S):
11. SPONSOR/MONITOR'S REPORT NUMBER(S):
12. DISTRIBUTION/AVAILABILITY STATEMENT: Approved for public release; distribution unlimited
13. SUPPLEMENTARY NOTES: The original document contains color images.
14. ABSTRACT:
15. SUBJECT TERMS:
16. SECURITY CLASSIFICATION OF: a. REPORT unclassified; b. ABSTRACT unclassified; c. THIS PAGE unclassified
17. LIMITATION OF ABSTRACT:
18. NUMBER OF PAGES: 11
19a. NAME OF RESPONSIBLE PERSON:

Standard Form 298 (Rev. 8-98), Prescribed by ANSI Std Z39-18

oxyGen: A Language Independent Linearization Engine

Nizar Habash
Institute for Advanced Computer Studies
University of Maryland
College Park, MD 20740
phone: +1 (301) 405-6768
fax: +1 (301) 314-9658
habash@umiacs.umd.edu
http://umiacs.umd.edu/labs/clip

Abstract

This paper describes a language independent linearization engine, oxyGen. This system compiles target language grammars into programs that take feature graphs as inputs and generate word lattices that can be passed along to the statistical extraction module of the generation system Nitrogen. The grammars are written using a flexible and powerful language, oxyL, that has the power of a programming language but focuses on natural language realization. This engine has been used successfully in creating an English linearization program that is currently used as part of a Chinese-English machine translation system.

1 Introduction

This paper describes a language independent realization engine, oxyGen. This system compiles linearization grammars into programs that run independently of the grammar and the compilation engine. The grammars are written in oxyL, a powerful and flexible natural language grammar description language. The syntax of oxyL is described in the paper. Currently, the input to the compiled grammar is a feature graph and the output is a word lattice to be fed into the statistical extraction module of the generation engine Nitrogen (Langkilde and Knight 1998a, 1998b, 1998c).

2 Research context

The work described in this paper has been developed as part of an interlingual Chinese-English Machine Translation system at the University of Maryland, College Park (Dorr et al. 1998; Traum and Habash 2000). The focus of this paper is only on the Linearization sub-module of the realization module in the generation component of the MT system. The realization module discussed is Nitrogen, a hybrid rule-based/statistical realization engine (Langkilde and Knight 1998a, 1998b, 1998c). The system consists of two components, Linearization and Statistical Extraction (Graph 1). First, a Feature Graph (FG) representation of the sentence to realize is converted into a word lattice of possible word sequence renderings, i.e. linearized. Then, unigram and bigram statistics are used to determine the most probable set of paths along the word lattice.
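The two-stage process described above, linearization into a lattice followed by statistical selection of a path, can be pictured with a small sketch. The Python below is illustrative only and is not taken from Nitrogen or oxyGen: the nested-tuple lattice encoding, the function names `paths`, `score` and `best_path`, the penalty for unseen bigrams, and the toy bigram scores are all assumptions made for the example.

```python
from itertools import product

# Hypothetical lattice encoding (an assumption of this sketch):
#   ("WRD", word)             a single word
#   ("SEQ", part, part, ...)  parts realized in the given order
#   ("OR", alt, alt, ...)     exactly one alternative is chosen

def paths(node):
    """Return every word sequence (as a tuple) encoded by a lattice node."""
    kind, *rest = node
    if kind == "WRD":
        return [(rest[0],)]
    if kind == "OR":
        return [p for alt in rest for p in paths(alt)]
    if kind == "SEQ":
        # Combine one path from each part, concatenated in order.
        return [sum(combo, ()) for combo in product(*(paths(p) for p in rest))]
    raise ValueError(f"unknown lattice node type: {kind}")

def score(words, bigram_logprob, unseen=-10.0):
    """Sum toy bigram log-probabilities over adjacent word pairs."""
    return sum(bigram_logprob.get(pair, unseen) for pair in zip(words, words[1:]))

def best_path(lattice, bigram_logprob):
    """Pick the highest-scoring word sequence in the lattice."""
    return max(paths(lattice), key=lambda p: score(p, bigram_logprob))

lattice = ("SEQ",
           ("WRD", "reduced"),
           ("OR", ("WRD", "the"), ("WRD", "a")),
           ("OR",
            ("SEQ", ("WRD", "export"), ("WRD", "textile")),
            ("SEQ", ("WRD", "textile"), ("WRD", "export"))),
           ("WRD", "quota"))

# Made-up scores, for illustration only.
toy_bigrams = {("reduced", "the"): -1.0, ("reduced", "a"): -2.0,
               ("textile", "export"): -1.5, ("export", "quota"): -1.0}

print(len(paths(lattice)))                        # 4
print(" ".join(best_path(lattice, toy_bigrams)))  # reduced the textile export quota
```

Enumerating every path is exponential in the number of alternatives, so this only works for toy lattices; a practical extractor would presumably search the lattice more cleverly, which this sketch does not attempt to show.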

(Graph 1)

The particular form of FGs exemplified in this paper is a modified version of Nitrogen's Abstract Meaning Representation (AMR) for our MT system's purposes (Dorr et al. 1998). AMRs are labeled directed feature graphs written using the syntax for the Penman Sentence Plan Language (Penman 1989):

    <AMR>   ::= ( <label> {<role> <value>}+ )
    <value> ::= <AMR> | <terminal>
    (BNF 1)

Every node in an AMR has a label and one or more role-value pairs. Roles, i.e. features, are marked by a colon prefix except for the default role :instance, which can be represented as a forward slash /. Values can be meaning-carrying terminal tokens or AMR nodes. Meaning-carrying tokens can be semantic concepts such as china or love, syntactic categories such as N or V, or plain surface text strings such as "Once upon a time". The roles and concepts of AMRs are a mix of syntactic and semantic significance: there are semantic roles such as :LCS-AG (lexical conceptual structure agent) and syntactic categories such as ADV. The following is an example AMR for "The United States unilaterally reduced the China textile export quota":

    (a1 / reduce :CAT V
        :LCS-AG (a2 / united_states :CAT N)
        :LCS-TH (a3 / quota :CAT N
                    :LCS-MOD-THING (a4 / china :CAT N)
                    :LCS-MOD-THING (a5 / textile :CAT N)
                    :LCS-MOD-THING (a6 / export :CAT N))
        :LCS-MOD-MANNER (a8 / unilaterally :CAT ADV))
    (AMR 1)

In this example, (a2 / united_states :CAT N) is the agent of the concept reduce. Similarly, N is the category of the concept united_states. The basic role :instance, or /, is always present in a non-ambiguous AMR. An ambiguous AMR, i.e., a conglomeration of different AMRs, has one or more role-value pairs using the special role :OR. For example, a variant of the above AMR in which the root concept is three-way ambiguous would look as follows at the top node:

    (# :OR (# / reduce... )
       :OR (# / cut... )
       :OR (# / decrease... ))
    (AMR 2)

Since such ambiguity can occur anywhere in an AMR, it presents a challenge to writing simple linearization rules whose application is conditional upon specific AMR role combinations at different depths. This issue is addressed later in this paper.
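To make the AMR notation concrete, here is a minimal sketch of a reader for the syntax in (BNF 1). It is an illustration written for this text, not code from Nitrogen or oxyGen: the function names `tokenize`, `parse` and `is_ambiguous` and the `(label, pairs)` representation are assumptions, and multi-word surface strings such as "Once upon a time" are deliberately not handled.

```python
import re

def tokenize(text):
    """Split an AMR string into parentheses and whitespace-delimited atoms."""
    return re.findall(r"\(|\)|[^\s()]+", text)

def parse(tokens):
    """Parse one <AMR> ::= ( <label> {<role> <value>}+ ) from a token list.

    Returns (label, pairs). `pairs` is a list of (role, value) tuples so that
    repeated roles such as :LCS-MOD-THING or :OR keep their order.
    """
    if not tokens or tokens.pop(0) != "(":
        raise SyntaxError("expected '('")
    label = tokens.pop(0)
    pairs = []
    while tokens and tokens[0] != ")":
        role = tokens.pop(0)
        if role == "/":              # "/" is shorthand for the :instance role
            role = ":instance"
        if not role.startswith(":"):
            raise SyntaxError(f"expected a role, got {role!r}")
        if not tokens or tokens[0] == ")":
            raise SyntaxError(f"role {role} has no value")
        value = parse(tokens) if tokens[0] == "(" else tokens.pop(0)
        pairs.append((role, value))
    if not tokens:
        raise SyntaxError("missing ')'")
    tokens.pop(0)                    # consume the closing parenthesis
    return (label, pairs)

def is_ambiguous(amr):
    """True if the top node of the AMR carries at least one :OR role."""
    _label, pairs = amr
    return any(role == ":OR" for role, _ in pairs)

amr1 = parse(tokenize(
    "(a1 / reduce :CAT V"
    " :LCS-AG (a2 / united_states :CAT N)"
    " :LCS-TH (a3 / quota :CAT N"
    "   :LCS-MOD-THING (a4 / china :CAT N)"
    "   :LCS-MOD-THING (a5 / textile :CAT N)"
    "   :LCS-MOD-THING (a6 / export :CAT N)))"
))
amr2 = parse(tokenize("(# :OR (# / reduce) :OR (# / cut) :OR (# / decrease))"))

print(amr1[0], amr1[1][0])
print(is_ambiguous(amr1), is_ambiguous(amr2))
```

Keeping the role-value pairs in a list rather than a dictionary is a deliberate choice in this sketch: a dictionary would silently drop all but one of the repeated :LCS-MOD-THING or :OR entries.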

The output of the Linearization module is a word lattice of possible word sequence renderings. It includes ambiguous paths resulting from under-specified features, such as definiteness, and undetermined relative word orders, such as that of modifiers. The following is a possible word lattice corresponding to (AMR 1).

    (SEQ (WRD "*start-sentence*" BOS)
         (WRD "united states" NOUN)
         (WRD "unilaterally" ADJ)
         (WRD "reduced" VERB)
         (OR (WRD "the" ART) (WRD "a" ART) (WRD "an" ART))
         (WRD "china" ADJ)
         (OR (SEQ (WRD "export" ADJ) (WRD "textile" ADJ))
             (SEQ (WRD "textile" ADJ) (WRD "export" ADJ)))
         (WRD "quota" NOUN)
         (WRD "*end-sentence*" EOS))
    (WL 1)

Then the statistical extraction module evaluates the different paths represented in the word lattice using unigram and bigram statistics and returns the following:

    united states unilaterally reduced the china textile export quota. [LENGTH 10, SCORE -41.657174]
    united states unilaterally reduced a china textile export quota. [LENGTH 10, SCORE -42.817673]
    united states unilaterally reduced the china export textile quota. [LENGTH 10, SCORE -42.867434]
    united states unilaterally reduced a china export textile quota. [LENGTH 10, SCORE -44.027932]
    united states unilaterally reduced an china textile export quota. [LENGTH 10, SCORE -44.746711]
    united states unilaterally reduced an china export textile quota. [LENGTH 10, SCORE -45.956971]

The focus of this paper is on the implementation techniques of the Linearization module of the realization system.

3 Motivation

The Linearization module is basically an implementation of a set of rules, a grammar, that governs the relative word ordering (syntax) and word form (morphology) of a target language. A linearization grammar can be implemented declaratively or procedurally.

In the declarative approach, the system contains a grammar description formalism and a linearization engine that interprets the grammar online and applies its rules to the input sentence representation. The advantages of this approach are reusability, easy extendibility and language independence. Its main drawback is slow speed. Nitrogen's Linearization module is an example of this approach. It provides rules to decompose an AMR and order the results linearly. The Nitrogen grammar description formalism uses a recasting mechanism to transform AMRs into other AMRs. Besides the slowness inherited from the paradigm of its implementation, Nitrogen's grammar formalism is limited and inflexible:

Rule application is conditional upon equality of concepts or existence of roles at the top level of an AMR only. This makes it impossible to write a single rule that is conditioned upon a combination of features at different levels. Cascading features is a workaround for this problem, but it increases the size of the grammar and aggravates the speed problem.
Recasting operations are limited to adding feature-value pairs and introducing new nodes. A thematic hierarchy ordering, in which thematic roles such as agent and theme are recast as syntactic roles such as subject and object, cannot be implemented in a single recast operation. Again, cascading of features is the only way to do this. An implementation of thematic hierarchies using cascading features is discussed in (Dorr et al. 1998).

There is no mechanism to perform range-unbounded or computationally complex transformations. For example, number formatting is a transformation problem that requires access to functions such as multiplication and addition, which are not available to the grammar. One instance of this problem appeared in our system when translating Chinese numbers represented as multiples of units of 10,000. For example, 80,000 is the concept 8 modified by the concept 10,000. Multiplying Chinese number concepts and formatting them into English number sequences was necessary, and it is impossible to do using recasting without enumerating all combinations.

The procedural approach to Linearization grammars uses a programming language to implement the rules of the grammar. The main advantages of this approach are flexibility, power and speed. Having access to the full computing power of a programming language opens a lot of possibilities for efficient implementation. It also frees the linearizer's designer from the restrictions of a limited declarative grammar by providing access to the operating system, databases, the web, etc. However, a major disadvantage of this approach is that the linguistic knowledge is coupled with the programming code. This hard-coding of grammar rules makes the system rather redundant, difficult to understand and debug, non-reusable and language specific.

4 oxyGen

The oxyGen approach to implementing the Linearization module is a hybrid of the declarative and procedural paradigms. oxyGen uses a linearization grammar description language to write declarative grammar rules, which are then compiled into a programming language for efficient performance.
oxyGen contains three elements: a linearization grammar description language (oxyL), an oxyL to Lisp compiler (oxycompile) and a run-time support library (oxyrun). Target language linearization grammars written in oxyL are compiled off-line into oxyGen Linearizers using oxycompile (Graph 2).

(Graph 2)

oxyGen Linearizers are Lisp programs that require the oxyrun library of basic functions in order to execute (Graph 3). They take AMRs as input and create word lattices that are passed on to some Statistical Extraction unit.

(Graph 3)

This implementation maximizes the advantages and minimizes the disadvantages inherent in the declarative and procedural paradigms.

First, the separation between the linearization engine (oxycompile and oxyrun) and the linearization grammar (oxyL) combines in one system the best of two worlds: the simplicity and focus of a declarative grammar with the power and efficiency of a procedural implementation. It also provides language independence and reusability, since the needs of the target language are only addressed in its specific oxyL grammar.

Secondly, the run-time separation between language-specific code (the compiled oxyL file, or oxyGen Linearizer) and language-independent code (oxyrun) allows for efficient resource sharing, especially when running multiple linearizers for different languages at the same time, as in multilingual generation.

Finally, oxyGen's linearization grammar description language, oxyL, is as powerful as a regular programming language but with the focus on linearization needs. This is accomplished by providing powerful linearization mechanisms for the most common needs of a linearization grammar, and by allowing code in a standard programming language (Lisp) to be embedded for efficient implementation of the more language-specific realization problems (e.g., Chinese number formatting). oxyL linearization grammars are also simple, clear, concise and easily extendible. An example of the simplicity of oxyL grammars is that redundant issues such as the handling of :OR ambiguities are hidden from the linearization grammar designer and are treated only in the compiler and support library.

The following section describes oxyL's syntax and the mechanism of application of oxyL rules.

5 oxyL

In many ways, oxyL is similar to the language Nitrogen grammars are written in; however, it has several special features that make it more powerful. First, oxyL linearization rules can be conditionally applied using general Boolean expressions and embedded if-then-else control flow structures, which allow for powerful and compact linearization grammars. Second, oxyL provides accessibility functions that can return the value of any descendant of the AMR. Contrast these two features with the conditions of application in Nitrogen's grammar, which are flat if-then structures and use only equality of roles or role-value combinations at the top level of the AMR.
Third, oxyL provides recasting mechanisms that are more powerful than Nitrogen's. For example, a thematic hierarchy recast in oxyL is implemented in a single rule, whereas in Nitrogen it requires as many rules as the number of hierarchy slots. Finally, oxyL can embed calls to Lisp functions that can be included in the oxyL file. This feature provides oxyL linearization grammars with access to all the tools available to a programming language. The rest of this section describes oxyL's syntax.

5.1 oxyL Basic Tokens

The function of different tokens in oxyL is marked through their form using a prefix symbol: variables are prefixed with a dollar sign (e.g. $form, $tense), role names are prefixed with a colon (e.g. :agent, :cat) and functions are prefixed with an ampersand (e.g. &eq, &ProperNameHash). Some of oxyL's functions resemble Lisp functions (e.g. &eq and eq). However, their implementation is different in oxyGen since ambiguity has to be handled. So &eq, for example, is aware of the existence of :OR-ed AMRs, in which matching one of the possible :OR alternatives is enough to return true, whereas Lisp eq is not.

In addition to general functions, oxyL has a special class of functions called referential functions. These functions, which are prefixed with an at sign (e.g. @agent, @this), are used to access values corresponding to specific roles of the current AMR. For example, @LCS-AG returns the value corresponding to the role :LCS-AG. If the current AMR is (AMR 1) in section 2, @LCS-AG returns (a2 / united_states :cat n). The instance role, /, is returned using the special referential function @inst. A referential function can specify the path from the current AMR's root to any value under it by concatenating the references along that path. For instance, if the current AMR is (AMR 1), @LCS-AG.CAT returns N.
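The path-following behaviour of referential functions and the ambiguity-aware comparison described for &eq can be sketched as follows. This is a hedged illustration, not the oxyrun implementation: the names `values_of`, `ref`, `alternatives` and `oxy_eq`, the `(label, pairs)` tuple encoding of AMRs, and the case-insensitive matching of role names are all assumptions of the sketch. It also simply collects matching values in a Python list and does not model how oxyL itself packages multiple matches.

```python
def values_of(amr, role):
    """Return all values stored under `role` at the top level of an AMR."""
    _label, pairs = amr
    return [value for r, value in pairs if r.lower() == role.lower()]

def ref(amr, path):
    """Follow a dotted role path such as "LCS-AG.CAT" from the root of `amr`.

    The step "inst" is treated as the :instance role (written "/" in AMRs).
    """
    current = [amr]
    for step in path.split("."):
        role = ":instance" if step.lower() == "inst" else ":" + step
        current = [v for node in current if isinstance(node, tuple)
                   for v in values_of(node, role)]
    return current

def alternatives(value):
    """Expand a node whose roles are all :OR into its alternatives."""
    if isinstance(value, tuple):
        _label, pairs = value
        if pairs and all(r == ":OR" for r, _ in pairs):
            return [alt for _, v in pairs for alt in alternatives(v)]
    return [value]

def oxy_eq(value, target):
    """True if any :OR alternative of `value` equals `target`."""
    return any(alt == target for alt in alternatives(value))

# A fragment of (AMR 1), written directly in the sketch's tuple encoding.
amr1 = ("a1", [(":instance", "reduce"),
               (":CAT", "V"),
               (":LCS-AG", ("a2", [(":instance", "united_states"),
                                   (":CAT", "N")]))])

print(ref(amr1, "LCS-AG.CAT"))
print(ref(amr1, "inst"))
print(oxy_eq(("#", [(":OR", "reduce"), (":OR", "cut")]), "cut"))
```

A plain equality test would return false for the last call because the ambiguous node is not itself equal to "cut"; expanding the :OR alternatives first is what lets a single condition hold when any one reading matches.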
If the current AMR contains multiple instances of the same role, as with :LCS-MOD-THING in (AMR 1), the values are combined in an :OR structure. For example, if the current AMR is (AMR 1), @LCS-TH.LCS-MOD-THING.INST returns (# :OR china :OR textile :OR export). Access to the full current AMR is provided through the self-referential function @this. For example, @this.agent is equal to @agent.

The last oxyL basic token type is macros, which are prefixed with a circumflex (e.g. ^NP-NOM). Macros are treated like variables except that, while variables appear as is in the compiled grammar, macros are substituted in the compiler. The use of macros makes the grammar description more concise. For example, if a set of role-value pairs is very commonly used, such as (:Form NP :Case NOM), it can be referred to using a single macro, ^NP-NOM.

5.2 oxyL File

An oxyL file contains a set of declarations. Some are obligatory (marked below with an asterisk) for proper compilation into Lisp code. Others introduce symbols that could be used eventually in the grammar rules, such as global variables or special Lisp functions. The following is a list of these declarations:

    Declaration      Function                                        Example
    :Language *      Name of generated grammar                       :Language English
    :SupportCode     User-defined Lisp functions                     :SupportCode ( <lisp code> )
    :SupportInclude  Lisp file to load at run time                   :SupportInclude support.lisp
    :CLASS           Defines a class of roles                        :CLASS :THETA (:AG :TH :GOAL :SRC)
    :GLOBAL          Declares a global variable                      :GLOBAL $
    :MACRO           Declares a macro                                :MACRO ^NP-ACC (:CAT N :CASE ACC)
    :MORPH *         Defines the morphological generation function   :SupportInclude EnglMorph.lisp
                                                                     :MORPH (&Morph @word @morphemes)
    :RULES *         Defines the grammar                             :RULES <Linearization-Grammar>
    * Obligatory declarations

In all Lisp supporting code introduced through :SupportInclude or :SupportCode, interfacing functions must be prefixed with an &, like oxyL general functions.

A :Class is a "super" role. It is a cover symbol that can be used to reference different classes of roles. For example, :THETA can be defined to refer to all thematic roles and :MOD can refer to all types of modifiers.
Once a class is defined, referential functions can be used for it. Internally, class roles and regular roles are processed differently, but that is hidden from the user. The syntax of the oxyL grammar rules declared using :RULES is described in the next section.

5.3 oxyL Target Language Grammar

    <GRAMMAR>  ::= <RULE>+
    <RULE>     ::= ( [== <ASSIGN>] {?? <COND> -> <RESULT>}* [-> <RESULT>] )
    <ASSIGN>   ::= ( (<variable> <value>)+ )
    <COND>     ::= <BooleanExpression>
    <RESULT>   ::= <RULE> | <SEQUENCE>
    <SEQUENCE> ::= ( {<AMR> | <RECAST>}+ ) | (OR <SEQUENCE> <SEQUENCE>+)
    <RECAST>   ::= ( <AMR> {<RECAST-OP> <RECAST-OP-ARGS>}+ )
    (BNF 2)

(BNF 2) describes the syntax of an oxyL grammar. A grammar consists of a set of ordered rules, each of which is considered for application over the current AMR. Each rule has an optional assignment section, introduced with ==, in which local variables are defined. The second part of a rule is an optional condition and result pair that can be repeated multiple times. Conditions are introduced with ?? and results with ->. Finally, a rule can end with an optional result that is treated as the default if all conditions fail. A result can be a rule in itself, with all of the described portions, or it can be a sequence of AMRs or AMR-returning tokens such as variables or functions. The ability to embed rules within rules and declare local variables with deep scope allows users to limit the size of the grammar and increase the speed of its application logarithmically. The linear order of AMRs in the result specifies the linear order of the surface forms corresponding to these AMRs. The grammar is run recursively over each one of the different AMRs. This process continues until terminal values, i.e., surface forms, are reached. Consider the following oversimplified rule:

    (== (($form @form))
     ?? (&eq $form S) ->
        (?? (&eq @voice Passive) -> (@object (&passivize @inst) by @subject)
         -> (@subject @inst @object)))
    (Rule 1)

Initially, this rule takes the value of the role :form in the current AMR and assigns it to the variable $form. If the value of $form equals S, a second check on the voice of the current AMR is done. If the voice is passive, the passive word order is realized. Otherwise, the active voice word order is realized. The grammar is then called recursively over the AMRs of @subject, @object and @inst. The function &passivize takes the AMR of @inst as input and can return a passive verb AMR that gets processed by the grammar or a terminal word sequence.

In addition to AMRs, a linearization sequence can contain AMR recast operations. A recast operation is made out of an AMR followed by one or more pairs of recast operator and recast operator arguments. Recast operations modify AMRs before they are recursively run through the grammar. The recast mechanism is very useful in restructuring the current AMR or any of its components. For example, the ++ recast operator adds role-value pairs to an AMR. This is useful in cases such as adding case marking roles on the subject and object AMRs where such case markers are not specified in the original, more semantic, representation. (Rule 1) described above could be modified to specify case as follows:

    (== (($form @form))
     ?? (&eq $form S) ->
        (?? (&eq @voice Passive) -> ((@object ++ (:case nom)) (&passivize @inst) by (@subject ++ (:case gen)))
         -> ((@subject ++ (:case nom)) @inst (@object ++ (:case acc)))))
    (Rule 2)

The following is a list of oxyL recast operators with their usage formalism and functionality:

    Name              OP   Usage                                      Function
    Add               ++   (AMR ++ (:role0 value0 :role1 value1))     Add role-value pairs to AMR
    Delete            --   (AMR -- (:role0 :role1))                   Remove all :roleN-value pairs
    Replace           &&   (AMR && (:role0 value0 :role1 value1))     Replace values of :roleN
    Simple Recast     <<   (AMR << (:new / :old0 :old1)) *            Rename all existing :oldN as :new
    Hierarchy Recast  <!   (AMR <! (:new0 :new1 / :old0 :old1)) *     Hierarchically rename available :oldN as :newN
    Morph             +-   (AMR +- morpheme)                          Invoke the morphological generation function on the AMR if it is a value, or on its instance

    * The use of / here is different from its role as a shorthand for :inst.

6 Evaluation

In this section, oxyGen is evaluated based on speed of performance, size of grammar, expressiveness of the grammar description language, reusability and readability/writability. The evaluation context is provided by comparing an oxyGen linearization grammar for English to two other implementations, one procedural (using Lisp) and one declarative (using the Nitrogen linearization module). Three comparable linearization grammars are used to calculate speed and size. All three were actually implemented at different stages of development in the Chinese-English MT system mentioned in section 2.

Speed: Two tests were performed. The first test uses a small corpus of 100 simple AMRs with an average of 17 particles (label, role or terminal value) per AMR. The second test uses a corpus of 213 AMRs representing translated Chinese news article sentences. These averaged 463 nodes and 7 :ORs per AMR. The following table contains the times spent on average per system in milliseconds. The Lisp implementation is the fastest, followed by oxyGen.
Nitrogen lags behind considerably.

              Procedural (Lisp)   oxyGen       Declarative (Nitrogen)
    Test 1    3.84 ms             37.67 ms     630.56 ms
    Test 2    11.50 ms            278.45 ms    17028.00 ms

Size: The following table contains the size of the three implementations in lines of code (loc). The oxyGen code size is the sum of the oxyL grammar (192 loc) and the Lisp English support functions (62 loc). The Nitrogen code size is the sum of Nitrogen's English grammar (1655 loc) and an extension grammar to make it compatible with our system (375 loc). Clearly, oxyGen is the most compact of the three.

              Procedural (Lisp)   oxyGen       Declarative (Nitrogen)
    Size      763 loc             252 loc      2030 loc

Expressiveness: Lisp and oxyGen are equally expressive in the sense of their accessibility to computational tools as described earlier, whereas Nitrogen falls behind.

Reusability: Both Nitrogen and oxyGen are language independent, an advantage over any procedural implementation.

Readability/Writability: All three approaches need a certain amount of training. However, oxyGen's simple syntax is an advantage over Lisp (for linearization purposes, that is). Its compact, powerful rules are an advantage over Nitrogen's simple rule mechanisms.
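The speed and size figures reported above are easier to compare as ratios. The short Python sketch below does that arithmetic on the numbers exactly as they appear in the two tables (the size entry for oxyGen uses the table's total of 252 loc). The names `TIMES_MS`, `SIZE_LOC` and `relative_to` are illustrative only and not part of oxyGen.

```python
# Average time per system in milliseconds, copied from the speed table.
TIMES_MS = {
    "Test 1": {"Lisp": 3.84, "oxyGen": 37.67, "Nitrogen": 630.56},
    "Test 2": {"Lisp": 11.50, "oxyGen": 278.45, "Nitrogen": 17028.00},
}
# Code size in lines of code, copied from the size table.
SIZE_LOC = {"Lisp": 763, "oxyGen": 252, "Nitrogen": 2030}


def relative_to(values: dict, baseline: str) -> dict:
    """Express every entry as a multiple of the baseline entry (one decimal)."""
    base = values[baseline]
    return {name: round(value / base, 1) for name, value in values.items()}


if __name__ == "__main__":
    for test, times in TIMES_MS.items():
        print(test, "time relative to Lisp:", relative_to(times, "Lisp"))
    print("Size relative to oxyGen:", relative_to(SIZE_LOC, "oxyGen"))
```

On these figures, oxyGen is roughly 10 times slower than the Lisp implementation on Test 1 and roughly 24 times slower on Test 2, while the Lisp and Nitrogen implementations are about 3 and 8 times larger than the oxyGen code, respectively.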

Overall: oxyGen has the best overall performance of the three systems.

                               Procedural (Lisp)   oxyGen   Declarative (Nitrogen)
    Speed                      +                   0        -
    Size                       0                   +        -
    Expressiveness             +                   +        -
    Reusability                -                   +        +
    Readability/Writability    -                   +        -

7 Future Work

This project is still in its initial phases and more work is needed. As far as the oxyL language definition and the runtime support library oxyRun are concerned, more tools and function libraries are needed, such as meta-level functions that return information about the current AMR, e.g., its role under its parent AMR, the number of theta roles or modifiers in it, its total depth, etc. Such information can be very helpful for sentence planning purposes. Other function libraries can be created to handle generation in specific domains such as time/date formatting, newspaper titles, etc. As for oxyCompile, more debugging tools and error handling routines are needed to make the system more robust and user-friendly. Independently of the engine itself, more oxyL grammars for other languages are needed to test the system's extendibility. Arabic and Spanish generation are especially under consideration since we currently have all the needed resources given our LCS-Based Machine Translation paradigm.

A possible extension to the oxyGen suite could be to allow different input formats while still using the same common engine.
Other possible input formats besides Penman sentence planning include NMSU F-Structures, XML and CycL. Such an endeavor would require a higher level of separation between the compiler and the input format, which has to be specified to the compiler through some input language definition grammar. Another area for possible future work is the use of oxyGen as part of NLP applications besides machine translation, such as text summarization.

8 Conclusion

I have presented a language-independent linearization engine that compiles target language grammars into programs that take abstract meaning representations as input and generate word lattices that can be passed along to a statistical extraction module. The grammars are written using a flexible and powerful language, oxyL, that has the power of a programming language but focuses on natural language realization. This approach was evaluated to be more efficient than other purely declarative or procedural approaches.

9 Acknowledgements

This work has been supported by NSA Contract MDA904-96-C-1250 and NSF PFF/PECASE Award IRI-9629108. I would like to thank members of the CLIP lab for helpful conversations and advice, especially Bonnie Dorr, Philip Resnik, David Traum and Amy Weinberg.
I would also like to thank Kevin Knight and Irene Langkilde for making the Nitrogen system available and for their help with understanding the Nitrogen grammar formalism.

10 References

Dorr, Bonnie, Nizar Habash and David Traum. A Thematic Hierarchy for Efficient Generation from Lexical Conceptual Structure. In Proceedings of the Third Conference of the Association for Machine Translation in the Americas (AMTA), pages 333--343, Langhorne, PA, 1998.

Knight, Kevin and Vasileios Hatzivassiloglou. Two-Level, Many-Paths Generation. In Proceedings of ACL-91, pages 143--151, 1991.

Langkilde, Irene and Kevin Knight. Generating Word Lattices from Abstract Meaning Representation. Technical Report, Information Science Institute, University of Southern California, 1998a.

Langkilde, Irene and Kevin Knight. Generation that Exploits Corpus-Based Statistical Knowledge. In Proceedings of COLING-ACL 98, pages 704--710, 1998b.

Langkilde, Irene and Kevin Knight. The Practical Value of N-Grams in Generation. In International Natural Language Generation Workshop, 1998c.

Penman. The Penman Documentation. Technical report, USC/Information Sciences Institute, 1989.

Traum, David and Nizar Habash. Generation from Lexical Conceptual Structures. In Proceedings of Workshop on Applied Interlinguas, NAACL/ANLP 2000, Seattle, Washington, 2000.