Kernels

A kernel K is a function of two objects, for example, two sentence/tree pairs (x1, y1) and (x2, y2):

  K((x1, y1), (x2, y2))

Intuition: K((x1, y1), (x2, y2)) is a measure of the similarity between (x1, y1) and (x2, y2).

Formally: K((x1, y1), (x2, y2)) is a kernel if it can be shown that there is some feature-vector mapping Φ(x, y) such that for all x1, y1, x2, y2:

  K((x1, y1), (x2, y2)) = Φ(x1, y1) · Φ(x2, y2)
A (Trivial) Example of a Kernel

Given an existing feature-vector representation Φ, define

  K((x1, y1), (x2, y2)) = Φ(x1, y1) · Φ(x2, y2)
A More Interesting Kernel

Given an existing feature-vector representation Φ, define

  K((x1, y1), (x2, y2)) = (1 + Φ(x1, y1) · Φ(x2, y2))^2

This can be shown to be an inner product in a new space Φ′, where Φ′ contains all quadratic terms of Φ.

More generally,

  K((x1, y1), (x2, y2)) = (1 + Φ(x1, y1) · Φ(x2, y2))^p

can be shown to be an inner product in a new space Φ′, where Φ′ contains all polynomial terms of Φ up to degree p.

Question: can we come up with specialized kernels for NLP structures?
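As a sketch (not from the original slides), the p = 2 case can be checked numerically in two dimensions: the kernel (1 + x·y)^2 equals an explicit inner product over a space containing all quadratic terms of the input.

```python
import math

def poly_kernel(x, y, p=2):
    """(1 + x . y)^p, computed without building the expanded feature space."""
    return (1 + sum(a * b for a, b in zip(x, y))) ** p

def quadratic_features(x):
    """Explicit feature map for p = 2, two dimensions:
    (1 + x.y)^2 = 1 + 2*x1*y1 + 2*x2*y2 + (x1*y1)^2 + (x2*y2)^2 + 2*x1*x2*y1*y2."""
    x1, x2 = x
    s = math.sqrt(2)
    return [1.0, s * x1, s * x2, x1 * x1, x2 * x2, s * x1 * x2]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y = (3.0, -1.0), (2.0, 4.0)
# Implicit and explicit computations agree:
assert abs(poly_kernel(x, y) - dot(quadratic_features(x), quadratic_features(y))) < 1e-9
```

The point of the kernel trick is the left-hand side: it touches only the original coordinates, while the equivalent feature space grows polynomially with p.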
NLP Structures

Trees, e.g., a parse tree for "John saw Mary":

  (S (NP John) (VP saw (NP Mary)))

Tagged sequences, e.g., named entity tagging:

  Napoleon   Bonaparte   was   exiled   to   Elba
  S          C           N     N        N    S

where S = Start entity, C = Continue entity, N = Not an entity.
Feature Vectors: Φ

Φ defines the representation of a structure: Φ maps a structure to a feature vector in R^d.

Example: the parse tree for "She announced a program to promote safety in trucks and vans" maps under Φ to a vector such as ⟨1, 0, 2, 0, 0, 15, 5⟩.
Features

A feature is a function on a structure, e.g.,

  h(x) = number of times a given subtree is seen in x

For the example trees T1 and T2 (figures omitted): h(T1) = 1 and h(T2) = 2.
Feature Vectors

A set of functions h1 ... hd defines a feature vector

  Φ(x) = ⟨h1(x), h2(x), ..., hd(x)⟩

For the example trees: Φ(T1) = ⟨1, 0, 0, 3⟩ and Φ(T2) = ⟨2, 0, 1, 1⟩.
All Subtrees Representation [Bod, 1998]

Given:
- Non-terminal symbols {A, B, ...}
- Terminal symbols {a, b, c, ...}
- An infinite set of subtrees
- An infinite set of features, e.g.,

  h3(x, y) = number of times a given subtree is seen in (x, y)
All Sub-fragments for Tagged Sequences

Given:
- Terminal symbols {a, b, c, ...}
- State symbols {S, C, N}
- An infinite set of sub-fragments
- An infinite set of features, e.g.,

  h3(x) = number of times a given sub-fragment is seen in x
Inner Products

  Φ(x) = ⟨h1(x), h2(x), ..., hd(x)⟩

Inner product ("kernel") between two structures T1 and T2:

  Φ(T1) · Φ(T2) = Σ_{i=1}^{d} hi(T1) hi(T2)

For the example trees:

  Φ(T1) = ⟨1, 0, 0, 3⟩
  Φ(T2) = ⟨2, 0, 1, 1⟩
  Φ(T1) · Φ(T2) = 1·2 + 0·0 + 0·1 + 3·1 = 5
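The worked example is trivial to reproduce; a one-function sketch:

```python
def inner_product(phi1, phi2):
    """Kernel as an explicit inner product between feature vectors."""
    return sum(a * b for a, b in zip(phi1, phi2))

phi_t1 = [1, 0, 0, 3]
phi_t2 = [2, 0, 1, 1]
assert inner_product(phi_t1, phi_t2) == 5  # 1*2 + 0*0 + 0*1 + 3*1
```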
All Subtrees Representation

Given:
- Non-terminal symbols {A, B, ...}
- Terminal symbols {a, b, c, ...}
- An infinite set of subtrees

Step 1: choose an (arbitrary) mapping from subtrees to integers:

  hi(x) = number of times subtree i is seen in x
  Φ(x) = ⟨h1(x), h2(x), h3(x), ...⟩
All Subtrees Representation

Φ is now huge, but the inner product Φ(T1) · Φ(T2) can be computed efficiently using dynamic programming.
Computing the Inner Product

Define: N1 and N2 are the sets of nodes in T1 and T2 respectively, and

  Ii(x) = 1 if the i-th subtree is rooted at x, 0 otherwise.

It follows that

  hi(T1) = Σ_{n1 ∈ N1} Ii(n1)   and   hi(T2) = Σ_{n2 ∈ N2} Ii(n2)

so

  Φ(T1) · Φ(T2) = Σ_i hi(T1) hi(T2)
                = Σ_i (Σ_{n1 ∈ N1} Ii(n1)) (Σ_{n2 ∈ N2} Ii(n2))
                = Σ_{n1 ∈ N1} Σ_{n2 ∈ N2} Σ_i Ii(n1) Ii(n2)
                = Σ_{n1 ∈ N1} Σ_{n2 ∈ N2} Δ(n1, n2)

where Δ(n1, n2) = Σ_i Ii(n1) Ii(n2) is the number of common subtrees rooted at n1 and n2.
An Example

For two trees T1 and T2 (figures omitted), Φ(T1) · Φ(T2) is the sum of Δ(n1, n2) over all pairs of nodes, one from each tree. Most of these terms are 0 (pairs of nodes with different productions). Some are non-zero, e.g., Δ = 4 for one pair of matching nodes.
Recursive Definition of Δ(n1, n2)

- If the productions at n1 and n2 are different, Δ(n1, n2) = 0.
- Else if n1 and n2 are pre-terminals, Δ(n1, n2) = 1.
- Else

    Δ(n1, n2) = Π_{j=1}^{nc(n1)} (1 + Δ(ch(n1, j), ch(n2, j)))

  where nc(n1) is the number of children of node n1, and ch(n1, j) is the j-th child of n1.
An Illustration of the Recursion

How many subtrees do two nodes have in common? i.e., what is Δ(n1, n2)?

In the example (figure omitted), the two nodes have matching productions, and their child pairs have Δ = 4 and Δ = 1 respectively, so

  Δ(n1, n2) = (1 + 4) × (1 + 1) = 10
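A minimal sketch of the recursion in Python. The tuple encoding of trees is my own (the slides fix no data structure): a node is (label, children), where each child is either another node or a word string, and a pre-terminal is a node whose children are all words.

```python
def production(node):
    """The rule at a node: its label plus the sequence of child labels/words."""
    label, children = node
    return (label, tuple(c if isinstance(c, str) else c[0] for c in children))

def is_preterminal(node):
    return all(isinstance(c, str) for c in node[1])

def delta(n1, n2):
    """Number of common subtrees rooted at n1 and n2 (the slide's recursion)."""
    if production(n1) != production(n2):
        return 0
    if is_preterminal(n1):
        return 1
    result = 1
    for c1, c2 in zip(n1[1], n2[1]):
        result *= 1 + delta(c1, c2)
    return result

def nodes(tree):
    """All internal nodes of a tree, root included."""
    yield tree
    for child in tree[1]:
        if not isinstance(child, str):
            yield from nodes(child)

def tree_kernel(t1, t2):
    """Phi(T1) . Phi(T2) = sum of delta over all node pairs."""
    return sum(delta(a, b) for a in nodes(t1) for b in nodes(t2))

# A toy tree whose root reproduces the slide's arithmetic:
# delta at B is (1+1)(1+1) = 4, delta at C is 1, so the root gives (1+4)(1+1) = 10.
t = ("A", (("B", (("D", ("d",)), ("E", ("e",)))), ("C", ("c",))))
assert delta(t, t) == 10
assert tree_kernel(t, t) == 10 + 4 + 1 + 1 + 1  # the five matching node pairs
```

For real parse trees one would memoize delta over node pairs — that is the dynamic programming the slides refer to; the plain recursion is shown here for clarity.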
The Inner Product for Tagged Sequences

Define N1 and N2 to be the sets of states in T1 and T2 respectively. By a similar argument,

  Φ(T1) · Φ(T2) = Σ_{n1 ∈ N1} Σ_{n2 ∈ N2} Δ(n1, n2)

where Δ(n1, n2) is the number of common sub-fragments at n1, n2. In the example sequences (figure omitted), one matching pair of states has Δ = 4.
The Recursive Definition for Tagged Sequences

Define N(n) = the state following n, and W(n) = the word at state n.

Define π[W(n1), W(n2)] = 1 if W(n1) = W(n2), and 0 otherwise.

Then, if the labels at n1 and n2 are the same,

  Δ(n1, n2) = (1 + π[W(n1), W(n2)]) × (1 + Δ(N(n1), N(n2)))

E.g., in the example sequences, for a pair of states with matching word a whose following states have Δ = 4:

  Δ(n1, n2) = (1 + π[a, a]) × (1 + 4) = (1 + 1) × (1 + 4) = 10
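A sketch of the tagged-sequence case, again with an encoding of my own: a sequence is a list of (state, word) pairs, and positions past the end of either sequence contribute Δ = 0.

```python
def seq_delta(s1, s2, i, j):
    """Common sub-fragments starting at positions i, j (the slide's recursion)."""
    if i >= len(s1) or j >= len(s2) or s1[i][0] != s2[j][0]:
        return 0  # past the end, or different state labels: no common fragment
    pi = 1 if s1[i][1] == s2[j][1] else 0
    return (1 + pi) * (1 + seq_delta(s1, s2, i + 1, j + 1))

def seq_kernel(s1, s2):
    """Phi(T1) . Phi(T2) = sum of delta over all pairs of positions."""
    return sum(seq_delta(s1, s2, i, j)
               for i in range(len(s1)) for j in range(len(s2)))

s1 = [("S", "a"), ("C", "b"), ("C", "c")]
s2 = [("S", "a"), ("C", "b"), ("C", "d")]
# At the start the words match (pi = 1) and the suffix contributes 4,
# so delta = (1 + 1) * (1 + 4) = 10, matching the slide's arithmetic.
assert seq_delta(s1, s2, 0, 0) == 10
assert seq_kernel(s1, s2) == 17
```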
Refinements of the Kernels

Include the log probability from the baseline model:
- Φ(T1) is the representation under the all sub-fragments kernel
- L(T1) is the log probability under the baseline model

New representation Φ′ where

  Φ′(T1) · Φ′(T2) = β L(T1) L(T2) + Φ(T1) · Φ(T2)

(this includes L(T1) as an additional component with weight √β). It allows the perceptron to use the original ranking as a default.
Refinements of the Kernels

Downweighting larger sub-fragments:

  Σ_{i=1}^{∞} λ^{SIZE_i} hi(T1) hi(T2)

where 0 < λ ≤ 1, and SIZE_i is the number of states/rules in the i-th fragment.

This is a simple modification to the recursive definitions, e.g.,

  Δ(n1, n2) = λ (1 + π[W(n1), W(n2)]) × (1 + Δ(N(n1), N(n2)))
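The λ modification is a one-line change to the sequence recursion sketched earlier (my own code, following the slide's formula):

```python
def seq_delta_lam(s1, s2, i, j, lam):
    """Downweighted recursion: each matched state contributes a factor of lam."""
    if i >= len(s1) or j >= len(s2) or s1[i][0] != s2[j][0]:
        return 0
    pi = 1 if s1[i][1] == s2[j][1] else 0
    return lam * (1 + pi) * (1 + seq_delta_lam(s1, s2, i + 1, j + 1, lam))

s1 = [("S", "a"), ("C", "b"), ("C", "c")]
s2 = [("S", "a"), ("C", "b"), ("C", "d")]
assert seq_delta_lam(s1, s2, 0, 0, 1.0) == 10   # lam = 1 recovers the original
assert seq_delta_lam(s1, s2, 0, 0, 0.5) < 10    # lam < 1 shrinks large fragments
```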
Refinement of the Tagging Kernel

Make sub-fragments sensitive to spelling features (e.g., capitalization). Define

  π[x, y] = 1 if x and y are identical
  π[x, y] = 0.5 if x and y share the same capitalization features

E.g., compare

  N        N    S
  exiled   to   Elba

  N        N    S
  exiled   to   [Cap]

Sub-fragments now include capitalization features, e.g.,

  N        N    S
  No-cap   to   [Cap]

  N        N        S
  No-cap   No-cap   [Cap]
Experimental Results

Parsing Wall Street Journal (≤ 100 words, 2416 sentences):

  MODEL   LR      LP      CBs    0 CBs   ≤2 CBs
  CO99    88.1%   88.3%   1.06   64.0%   85.1%
  VP      88.6%   88.9%   0.99   66.5%   86.3%

VP gives a 5.1% relative reduction in error (CO99 = my thesis parser).

Named entity tagging on web data:

  MODEL         P       R       F
  Max-ent       84.4%   86.3%   85.3%
  Perc.         86.1%   89.1%   87.6%
  Improvement   10.9%   20.4%   15.6%

VP gives a 15.6% relative reduction in error.
Summary

- For any representation Φ(x), efficient computation of Φ(x) · Φ(y) allows efficient learning through the kernel form of the perceptron.
- Dynamic programming can be used to calculate Φ(x) · Φ(y) under all sub-fragments representations.
- Several refinements of the inner products:
  - including probabilities from a baseline model
  - downweighting larger sub-fragments
  - sensitivity to spelling features
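The "kernel form of the perceptron" keeps a mistake count α_i per training example and predicts with sign(Σ_i α_i y_i K(x_i, x)), so Φ is never built explicitly. A minimal sketch of that idea (my own code, using a quadratic kernel as K; the slides do not give an implementation):

```python
def poly_kernel(x, z, p=2):
    return (1 + sum(a * b for a, b in zip(x, z))) ** p

def train_kernel_perceptron(data, kernel, epochs=10):
    """data: list of (x, y) with y in {-1, +1}; returns dual weights alpha."""
    alpha = [0] * len(data)
    for _ in range(epochs):
        for t, (x, y) in enumerate(data):
            score = sum(a * yi * kernel(xi, x)
                        for a, (xi, yi) in zip(alpha, data) if a)
            if y * score <= 0:        # mistake-driven update
                alpha[t] += 1
    return alpha

def predict(alpha, data, kernel, x):
    score = sum(a * yi * kernel(xi, x) for a, (xi, yi) in zip(alpha, data) if a)
    return 1 if score > 0 else -1

# XOR-style data: not linearly separable, but separable under the quadratic kernel.
data = [((1, 1), 1), ((-1, -1), 1), ((1, -1), -1), ((-1, 1), -1)]
alpha = train_kernel_perceptron(data, poly_kernel)
assert all(predict(alpha, data, poly_kernel, x) == y for x, y in data)
```

The same training loop works unchanged with the tree or tagged-sequence kernels in place of `poly_kernel`, which is the point of the summary's first bullet.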