Sentence Boundary Detection in Biomedical Texts Using Context Morphological Features


Mini-Micro Systems, Vol. 27, No. 1, Jan. 2006

YU Zhong-hua 1, ZHANG Rong 2, TANG Chang-jie 1, ZUO Jie 1, ZHANG Tian-qing 1
1 (Computer Science School, Sichuan University, Chengdu 610065, China)
2 (Network Education School, Sichuan University, Chengdu 610065, China)
E-mail: yuzhonghua@cs.scu.edu.cn

Abstract: A sentence boundary detection algorithm is proposed for information extraction from biomedical texts, designed around the characteristics of these texts and the special requirements of information extraction. The algorithm is based on context morphological features and supervised learning. In contrast to algorithms developed for sentence boundary detection in common English texts, it uses no special vocabulary or grammatical-level information and decides sentence boundaries from the morphological information of the context words alone. A maximum entropy detector and an SVM detector are built on these features. Experiments on Medline abstracts show that the algorithm achieves recognition accuracy above 99%, and that the maximum entropy and SVM methods perform about equally well on sentence boundary disambiguation. The experiments also show that morphological-level information alone, without supplementary vocabulary or grammatical-level information, gives approximately the same performance as algorithms that use such supplementary information achieve on common English texts.

Key words: natural language processing; biomedical information extraction; sentence boundary detection; machine learning

CLC number: TP391    Document code: A    Article ID: 1000-1220(2006)01-0180-05

1 [The Chinese body text of this section did not survive extraction. The surviving fragments mention GenBank and five enumerated items with the citations (1) [1, 2], (2) [3], (3) [4-8], (4) [9], and (5) [10, 11].]

Received: 2004-07-13. Grant numbers: 60073046, 20020610007. Years of birth given in the author notes: 1967, 1963, 1946, 1977, 1972.

1994-2006 China Academic Journal Electronic Publishing House. All rights reserved. http://www.cnki.net

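[The Chinese body text of this page did not survive extraction. The recoverable fragments, table, and formulas follow.]

Examples of periods that do not end a sentence: "3.14", "U.S.", "Mr.", and "The president lives in Washington D.C." [16]. The marks "?" and "!" are also listed. The biomedical texts come from MEDLINE [17].

2 [Section title lost.] The section raises three numbered points, (1) to (3), and cites [12] and [13-15]. Surviving figures: 100 (next to [12]); 1,570 (MEDLINE); 12,200; 0.9% (WSJ, Wall Street Journal [16]). The tool name Flex also appears.

Table 1 [caption lost; the header fragments cite [13] and [16] with MEDLINE, and [13], [14], [15] with WSJ]

                Column 1    Column 2
Precision       99.93%      99.56%
Recall          71.03%      76.95%
F-Measure       83.04%      86.81%
Error-rate      29.02%      16.25%

WSJ values as extracted: 0.39%, 22%.

The measures Precision, Recall, F-Measure, and Error-rate are defined over the counts P-Positive, P-Negative, N-Positive, and N-Negative. Read off the formulas, P-Positive counts true boundaries labeled as boundaries, N-Positive counts non-boundaries labeled as boundaries, and P-Negative counts true boundaries labeled as non-boundaries. The surviving text mentions Information Extraction and Information Retrieval in connection with the F-Measure.

$$\text{Precision} = \frac{\text{P-Positive}}{\text{P-Positive} + \text{N-Positive}}$$

$$\text{Recall} = \frac{\text{P-Positive}}{\text{P-Positive} + \text{P-Negative}}$$

$$\text{F-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$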
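As a quick check of how Precision, Recall, and F-Measure fit together, the following Python sketch computes them from raw counts. The function name `boundary_metrics` and the counts 7103, 5, and 2897 are hypothetical: the source reports no raw counts, and these were chosen only so that the results round to the first column of Table 1.

```python
def boundary_metrics(p_positive, n_positive, p_negative):
    """Precision, Recall and F-Measure from the counts used in the formulas.

    p_positive: true boundaries labeled as boundaries
    n_positive: non-boundaries labeled as boundaries
    p_negative: true boundaries labeled as non-boundaries
    (This reading of the count names is inferred from the formulas.)
    """
    precision = p_positive / (p_positive + n_positive)
    recall = p_positive / (p_positive + p_negative)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts, not taken from the paper.
precision, recall, f_measure = boundary_metrics(7103, 5, 2897)
print(f"{precision:.2%} {recall:.2%} {f_measure:.2%}")  # 99.93% 71.03% 83.04%
```

With almost no false positives and many missed boundaries, Precision stays near 100% while Recall pulls the F-Measure down, which is the pattern seen in both columns of Table 1.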

$$\text{Error-rate} = \frac{\text{P-Negative} + \text{N-Positive}}{\text{P-Positive} + \text{P-Negative}}$$

[Most of the Chinese body text on this page did not survive extraction. The recoverable fragments follow.]

The surviving discussion cites the F-Measure values 86.81% and 83.04% from Table 1. It also discusses acronyms in Medline, with ACE (angiotensin-converting enzyme) as an example, and the abbreviation "E" for "Ehrlichia" in Figure 2. Feature F5 is mentioned next to the acronym discussion.

Algorithm [caption lost]
Input: W; A, an array A[1], A[2], ... built from the Medline abstract; K, associated with W and A [descriptions lost]
Output: TRUE or FALSE

    bool bretvalue = false;
    for (i = K-1; i > 0; i--) {
        // strcat(W, W′) joins W and W′; strlen(W′) is the length of W′
        if (∃W′ ((A[i] == strcat(W, W′)) && (strlen(W′) > 0))) {
            bretvalue = true;
            break;
        }
    }
    return (bretvalue);

4.1 [Section title lost.] The section mentions shallow parsing and defines 11 features, F1 to F11. The feature names survive but their definitions do not.

Figure 2: Medline abstract [caption partly lost], with sentence boundaries marked by <S> </S>:

Ehrlichia chaffeensis, an obligatory intracellular bacterium of monocytes or macrophages, is the etiologic agent of human monocytic ehrlichiosis<S> </S> Our previous study showed that gamma interferon (IFN-gamma) added prior to or at early stage of infection inhibited infection of human monocytes with E. chaffeensis; however, after 24 h of infection, IFN-gamma had no antiehrlichial effect<S> </S> To test whether

Table 2 [caption lost]: feature vectors over the 11 features for Medline text, with Label in (+1, -1)

F1  F2  F3  F4  F5  F6  F7  F8  F9  F10  F11  Label
0   1   0   0   0   1   1   0   0   0    0    +1
1   0   0   0   1   1   1   0   0   0    0    -1
0   1   0   0   0   1   1   0   0   0    0    +1

4.2 [Section title lost; the opening cites [18].]
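The loop in the pseudocode fragment on this page can be sketched in Python as follows. This is one reading of the garbled source, not the authors' implementation. It assumes that A holds the words of the abstract, that K is the position of W in A, and that the test asks whether W is a proper prefix of an earlier word, as "E" is of "Ehrlichia" in the sample abstract. The function name `is_prefix_of_earlier_word` and the shortened word list are hypothetical.

```python
def is_prefix_of_earlier_word(words, k):
    """Return True if words[k] is a proper prefix of some earlier word.

    Mirrors the test A[i] == strcat(W, W') with strlen(W') > 0,
    scanning backwards from the word just before position k.
    """
    w = words[k]
    for i in range(k - 1, -1, -1):
        earlier = words[i]
        # Proper prefix: earlier starts with w and is strictly longer.
        if earlier.startswith(w) and len(earlier) > len(w):
            return True
    return False


# Shortened word list taken from the sample abstract, punctuation removed.
words = ["Ehrlichia", "chaffeensis", "inhibited", "infection", "of",
         "human", "monocytes", "with", "E", "chaffeensis"]
print(is_prefix_of_earlier_word(words, words.index("E")))  # True
```

Under this reading, a period after "E" would count as part of an abbreviation because "Ehrlichia" appears earlier in the same abstract, so no external abbreviation dictionary is needed.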

[Most of the Chinese body text of Sections 4.2 to 5 did not survive extraction. The recoverable formulas, names, and figures follow.]

4.2 (continued) A feature function is a binary function of a context x and a label y. The surviving example is

$$f(x, y) = \begin{cases} 1 & \text{if } y = +1 \text{ and feature } F_2 \text{ of } x \text{ equals } 1 \\ 0 & \text{otherwise} \end{cases}$$

Given n feature functions, the model p(y|x) must reproduce the empirical expectation of each one, where \tilde{p} denotes the empirical distribution:

$$\sum_{x, y} \tilde{p}(x)\, p(y \mid x)\, f_i(x, y) = \sum_{x, y} \tilde{p}(x, y)\, f_i(x, y), \qquad i = 1, \dots, n$$

The entropy of the model is

$$H(p) = -\sum_{x, y} \tilde{p}(x)\, p(y \mid x) \log p(y \mid x)$$

Parameter estimation algorithms named in the text are GIS (Generalized Iterative Scaling) [19] and IIS (Improved Iterative Scaling) [20]. Available implementations are cited as [21, 22]; Amis [21] is named explicitly, http://www-tsujii.is.s.u-tokyo.ac.jp/~yusuke/amis/

4.3 Support Vector Machines (SVM) are attributed to Vapnik [23], with applications cited as [24] and [7]. Implementations available on the Internet include SVMlight [25] and LIBSVM [26]; LIBSVM [26] is named explicitly, http://www.csie.ntu.edu.tw/~cjlin/libsvm/

5 [Section title lost.] Surviving figures: 1,570; Medline (13,851; 12,200); a split of 80% and 20%; and accuracy of 99%.

Table 3 [caption and column labels lost]

                Column 1    Column 2
Precision       98.55%      99.19%
Recall          99.59%      99.06%
F-Measure       99.07%      99.13%
Error-rate      1.87%       1.75%

References:
[1] Collier N, Nobata C, Tsujii J. Extracting the names of genes and gene products with a hidden Markov model [C]. In: Proc. of the 18th International Conference on Computational Linguistics (COLING-2000), Saarbrucken, Germany.
[2] Fukuda et al. Toward information extraction: Identifying protein names from biomedical papers [C]. In: Proc. of the Pacific Symposium on Biocomputing 98 (PSB 98), Hawaii.
[3] Liu H, Johnson S B, Friedman C. Automatic resolution of ambiguous terms based on machine learning and conceptual relations in the UMLS [J]. Journal of the American Medical Informatics Association, 2002, 9(6): 621-636.
[4] Chang J, Schutze H, Altman R. Creating an online dictionary of abbreviations from MEDLINE [J]. Journal of the American Medical Informatics Association, 2002, 9(6): 612-620.
[5] Schwartz A, Hearst M. A simple algorithm for identifying abbreviation definitions in biomedical text [C]. In: Proc. of the Pacific Symposium on Biocomputing 2003 (PSB 2003).
[6] Pakhomov S. Semi-supervised maximum entropy based approach to acronym and abbreviation normalization in medical texts [C]. In: Proc. of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), 160-167.
[7] Yu Z, Tsuruoka Y, Tsujii J. Automatic resolution of ambiguous abbreviations in biomedical texts using support vector machines and one sense per discourse hypothesis [C]. In: Proc. of SIGIR 03 Workshop on Text Analysis and Search for Bioinformatics, ACM, 57-62.
[8] Yu H et al. Mapping abbreviations to full forms in biomedical articles [J]. Journal of the American Medical Informatics Association, 2002, (9): 262-272.
[9] Castaño J, Zhang J, Pustejovsky J. Anaphora resolution in biomedical literature [C]. International Symposium on Reference Resolution, 2002.
[10] Yu H et al. Automatic extraction of gene and protein synonyms from Medline and journal articles [C]. In: Proc. of AMIA Symp, 2002: 919-923.
[11] Rindflesch L et al. Edgar: Extraction of drugs, genes and relations from the biomedical literature [C]. In: Proc. of the Pacific Symposium on Biocomputing 2000 (PSB 2000).
[12] Aberdeen J et al. Description of the Alembic system used for MUC-6 [C]. In: Proceedings of the Sixth Message Understanding Conference (MUC-6), Morgan Kaufmann.
[13] Palmer D D, Hearst M A. Adaptive multilingual sentence boundary disambiguation [J]. Computational Linguistics, 1997, 23(3): 241-267.
[14] Reynar J C, Ratnaparkhi A. A maximum entropy approach to identifying sentence boundaries [C]. In: Proceedings of the Fifth ACL Conference on Applied Natural Language Processing (ANLP 97), Washington, D.C., 1997.
[15] Mikheev A. Tagging sentence boundaries [C]. In: NAACL 2000, ACL, 2000, 264-271.
[16] Wang H, Huang W. Bondec - A sentence boundary detector [EB/OL]. http://nlp.stanford.edu/courses/cs224n/2003/fp/huangy/final-project.doc
[17] Medline [EB/OL]. http://www.nlm.nih.gov/
[18] Berger A L et al. A maximum entropy approach to natural language processing [J]. Computational Linguistics, 1996, 22(1): 39-68.
[19] Darroch J N, Ratcliff D. Generalized iterative scaling for log-linear models [J]. The Annals of Mathematical Statistics, 1972, 43(5): 1470-1480.
[20] Della P S et al. Inducing features of random fields [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1995, 19(4): 380-393.
[21] http://www-tsujii.is.s.u-tokyo.ac.jp/~yusuke/amis/
[22] http://nlp.stanford.edu/downloads/classifier.shtml
[23] Cortes C, Vapnik V. Support-vector networks [J]. Machine Learning, 1995, 20(11): 273-297.
[24] Thorsten J. Text categorization with support vector machines: Learning with many relevant features [C]. European Conference on Machine Learning (ECML), 1998.
[25] http://www.cs.cornell.edu/people/tj/svm_light/
[26] http://www.csie.ntu.edu.tw/~cjlin/libsvm/