Combining Statistical and Knowledge-based Spoken Language Understanding in Conditional Models


COLING/ACL06, Association for Computational Linguistics, Sydney, Australia, 2006

Combining Statistical and Knowledge-based Spoken Language Understanding in Conditional Models

Ye-Yi Wang, Alex Acero, Milind Mahajan
Microsoft Research
One Microsoft Way
Redmond, WA 98052, USA

John Lee
Spoken Language Systems
MIT CSAIL
Cambridge, MA 02139, USA

Abstract

Spoken Language Understanding (SLU) addresses the problem of extracting semantic meaning conveyed in an utterance. The traditional knowledge-based approach to this problem is very expensive -- it requires joint expertise in natural language processing and speech recognition, and best practices in language engineering for every new domain. On the other hand, a statistical learning approach needs a large amount of annotated data for model training, which is seldom available in practical applications outside of large research labs. A generative HMM/CFG composite model, which integrates easy-to-obtain domain knowledge into a data-driven statistical learning framework, has previously been introduced to reduce the data requirement. The major contribution of this paper is the investigation of integrating prior knowledge and statistical learning in a conditional model framework. We also study and compare conditional random fields (CRFs) with perceptron learning for SLU. Experimental results show that the conditional models achieve more than 20% relative reduction in slot error rate over the HMM/CFG model, which had already achieved an SLU accuracy at the same level as the best results reported on the ATIS data.

1 Introduction

Spoken Language Understanding (SLU) addresses the problem of extracting meaning conveyed in an utterance. Traditionally, the problem is solved with a knowledge-based approach, which requires joint expertise in natural language processing and speech recognition, and best practices in language engineering for every new domain. In the past decade many statistical learning approaches have been proposed, most of which exploit generative models, as surveyed in (Wang, Deng et al., 2005). While the data-driven approach addresses the difficulties in knowledge engineering, it requires a large amount of labeled data for model training, which is seldom available in practical applications outside of large research labs.

To alleviate the problem, a generative HMM/CFG composite model has previously been introduced (Wang, Deng et al., 2005). It integrates a knowledge-based approach into a statistical learning framework, utilizing prior knowledge to compensate for the dearth of training data. In the ATIS evaluation (Price, 1990), this model achieves the same level of understanding accuracy (5.3% error rate on the standard ATIS evaluation) as the best system (5.5% error rate), which is a semantic parsing system based on a manually developed grammar.

Discriminative training has been widely used for acoustic modeling in speech recognition (Bahl, Brown et al., 1986; Juang, Chou et al., 1997; Povey and Woodland, 2002). Most of the methods use the same generative model framework, exploit the same features, and apply discriminative training for parameter optimization. Along the same lines, we have recently exploited conditional models by directly porting the HMM/CFG model to Hidden Conditional Random Fields (HCRFs) (Gunawardana, Mahajan et al., 2005), but failed to obtain any improvement. This is mainly due to the vast parameter space, with the parameters settling at local optima. We then simplified the original model structure by removing the hidden variables, and introduced a number of important overlapping and non-homogeneous features. The resulting Conditional Random Fields (CRFs) (Lafferty, McCallum et al., 2001) yielded a 21% relative improvement in SLU accuracy.
We also applied a much simpler perceptron learning algorithm to the conditional model and observed improved SLU accuracy as well. In this paper, we will first introduce the generative HMM/CFG composite model, then discuss the problem of directly porting the model to HCRFs, and finally introduce the CRFs and the features that obtain the best SLU result on the ATIS test data. We compare the CRF and perceptron training performances on the task.

2 Generative Models

The HMM/CFG composite model (Wang, Deng et al., 2005) adopts a pattern recognition approach to SLU. Given a word sequence W, an SLU component needs to find the semantic representation of the meaning M that has the maximum a posteriori probability Pr(M | W):

\[ \hat{M} = \arg\max_{M} \Pr(M \mid W) = \arg\max_{M} \Pr(W \mid M) \Pr(M) \]

The composite model integrates domain knowledge by setting the topology of the prior model, Pr(M), according to the domain semantics, and by using PCFG rules as part of the lexicalization model Pr(W | M).

The domain semantics define an application's semantic structure with semantic frames. Figure 1 shows a simplified example of three semantic frames in the ATIS domain. The two frames with the "toplevel" attribute are also known as commands. The filler attribute of a slot specifies the semantic object that can fill it. Each slot may be associated with a CFG rule, and the filler semantic object must be instantiated by a word string that is covered by that rule. For example, the string "Seattle" is covered by the City rule in a CFG. It can therefore fill the ACity (ArrivalCity) or the DCity (DepartureCity) slot, and instantiate a Flight frame. This frame can then fill the Flight slot of a ShowFlight frame. Figure 2 shows a semantic representation according to these frames.

    <frame name="ShowFlight" toplevel="1">
      <slot name="Flight" filler="Flight"/>
    </frame>
    <frame name="GroundTrans" toplevel="1">
      <slot name="City" filler="City"/>
    </frame>
    <frame name="Flight">
      <slot name="DCity" filler="City"/>
      <slot name="ACity" filler="City"/>
    </frame>

Figure 1. Simplified domain semantics for the ATIS domain.

The semantic prior model comprises the HMM topology and state transition probabilities. The topology is determined by the domain semantics, and the transition probabilities can be estimated from training data. Figure 3 shows the topology of the underlying states in the statistical model for the semantic frames in Figure 1. On top is the transition network for the two top-level commands. At the bottom is a zoomed-in view of the Flight sub-network. State 1 and state 4 are called precommands. State 3 and state 6 are called postcommands. States 2, 5, 8 and 9 represent slots. A slot is actually a three-state sequence: the slot state is preceded by a preamble state and followed by a postamble state, both represented by black circles. They provide contextual clues for the slot's identity.

    <ShowFlight>
      <Flight>
        <DCity filler="City">Seattle</DCity>
        <ACity filler="City">Boston</ACity>
      </Flight>
    </ShowFlight>

Figure 2. The semantic representation for "Show me the flights departing from Seattle arriving at Boston" is an instantiation of the semantic frames in Figure 1.

Figure 3. The HMM/CFG model's state topology, as determined by the semantic frames in Figure 1.

The lexicalization model, Pr(W | M), depicts the process of sentence generation from the topology by estimating the distribution of words emitted by a state. It uses state-dependent n-grams to model the precommands, postcommands, preambles and postambles, and uses knowledge-based CFG rules to model the slot fillers. These rules help compensate for the dearth of domain-specific data. In the remainder of this paper we will say a string is covered by a CFG non-terminal (NT), or equivalently, is CFG-covered for s, if the string can be parsed by the CFG rule corresponding to the slot s.
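To make the decomposition Pr(W | M) Pr(M) concrete, the following minimal Python sketch scores one fully aligned (word, state) sequence under a composite model. All state names, tables, and probabilities here are hypothetical toy values, not the paper's parameters; a real implementation would use smoothed per-state n-grams and a PCFG for the fillers, and would sum over hidden alignments with EM as discussed below.

    import math

    # Toy parameters (hypothetical, for illustration only).
    TRANSITIONS = {("<s>", "PreDCity"): 0.5, ("PreDCity", "DCity"): 1.0,
                   ("DCity", "PreACity"): 0.9, ("PreACity", "ACity"): 1.0}
    UNIGRAMS = {"PreDCity": {"from": 0.6, "departing": 0.4},
                "PreACity": {"to": 0.7, "arriving": 0.3}}
    CFG_COVERS = {"City": {"seattle", "boston"}}        # CFG rule: City covers these strings
    SLOT_FILLER = {"DCity": "City", "ACity": "City"}    # slot -> CFG non-terminal of its filler

    def log_score(words, states):
        """log Pr(M) + log Pr(W | M) for one aligned (word, state) sequence."""
        logp, prev = 0.0, "<s>"
        for w, s in zip(words, states):
            if s != prev:                               # semantic prior: state transitions
                logp += math.log(TRANSITIONS.get((prev, s), 1e-9))
            if s in SLOT_FILLER:                        # lexicalization: CFG-covered filler
                logp += 0.0 if w in CFG_COVERS[SLOT_FILLER[s]] else math.log(1e-9)
            else:                                       # lexicalization: state-dependent unigram
                logp += math.log(UNIGRAMS.get(s, {}).get(w, 1e-9))
            prev = s
        return logp

    print(log_score(["from", "seattle", "to", "boston"],
                    ["PreDCity", "DCity", "PreACity", "ACity"]))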

Given the semantic representation in Figure 2, the state sequence through the model topology in Figure 3 is deterministic, as shown in Figure 4. However, the words are not aligned to the states in the shaded boxes. The parameters in their corresponding n-gram models can be estimated with an EM algorithm that treats the alignments as hidden variables.

Figure 4. Word/state alignments. The segmentation of the word sequences in the shaded region is hidden.

The HMM/CFG composite model was evaluated in the ATIS domain (Price, 1990). The model was trained with ATIS3 category A training data (~1700 annotated sentences) and tested with the 1993 ATIS3 category A test sentences (470 sentences with 1702 reference slots). The slot insertion-deletion-substitution error rate (SER) on the test set is 5.0%, leading to a 5.3% semantic error rate in the standard end-to-end ATIS evaluation, which is slightly better than the best manually developed system (5.5%). Moreover, a steep drop in the error rate is observed after training with only the first two hundred sentences. This demonstrates that the inclusion of prior knowledge in the statistical model helps alleviate the data sparseness problem.

3 Conditional Models

We investigated the application of conditional models to SLU. The problem is formulated as assigning a label l to each element in an observation o. Here, o consists of a word sequence and a list of CFG non-terminals (NTs) that cover its subsequences, as illustrated in Figure 5. The task is to label "two" as the Num-of-tickets slot of the ShowFlight command, and "Washington D.C." as the ArrivalCity slot of the same command. To do so, the model must be able to resolve several kinds of ambiguities:

1. Filler/non-filler ambiguity, e.g., "two" can either fill a Num-of-tickets slot, or its homonym "to" can form part of the preamble of an ArrivalCity slot.
2. CFG ambiguity, e.g., "Washington" can be CFG-covered as either City or State.
3. Segmentation ambiguity, e.g., [Washington] [D.C.] vs. [Washington D.C.].
4. Semantic label ambiguity, e.g., "Washington D.C." can fill either an ArrivalCity or a DepartureCity slot.

Figure 5. The observation includes a word sequence and the subsequences covered by CFG non-terminals.

3.1 CRFs and HCRFs

Conditional Random Fields (CRFs) (Lafferty, McCallum et al., 2001) are undirected conditional graphical models that assign the conditional probability of a state (label) sequence s with respect to a vector of features f(s, o). They are of the following form:

\[ p(s \mid o; \lambda) = \frac{1}{z(o; \lambda)} \exp\left( \lambda \cdot f(s, o) \right) \qquad (1) \]

Here \( z(o; \lambda) = \sum_{s} \exp\left( \lambda \cdot f(s, o) \right) \) normalizes the distribution over all possible state sequences. The parameter vector λ is trained conditionally (discriminatively). If we assume that s is a Markov chain given o and the feature functions only depend on two adjacent states, then

\[ p(s \mid o; \lambda) = \frac{1}{z(o; \lambda)} \exp\left( \sum_{k} \sum_{t} \lambda_k f_k(s_{t-1}, s_t, o, t) \right) \qquad (2) \]
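As a concrete illustration of Eqs. (1) and (2), the sketch below computes p(s | o; λ) for a linear-chain model in plain Python. The label set, feature templates, and weights are toy assumptions rather than the paper's model, and the partition function z(o; λ) is computed by brute-force enumeration for clarity:

    import math
    from itertools import product

    STATES = ["PDC", "DCity", "PAC", "ACity"]           # toy label set

    def feats(prev_s, s, words, t):
        """Features on adjacent states and the current word, as in Eq. (2)."""
        return [("TR", prev_s, s), ("UG", s, words[t])]

    def seq_score(states, words, lam):
        return sum(lam.get(f, 0.0)
                   for t in range(len(words))
                   for f in feats(states[t - 1] if t else "<s>", states[t], words, t))

    def log_z(words, lam):
        """Partition function over all label sequences (brute force for clarity)."""
        return math.log(sum(math.exp(seq_score(list(ss), words, lam))
                            for ss in product(STATES, repeat=len(words))))

    def log_p(states, words, lam):
        """log p(s | o; lambda), Eq. (2)."""
        return seq_score(states, words, lam) - log_z(words, lam)

    lam = {("UG", "PAC", "to"): 2.0, ("TR", "PAC", "ACity"): 1.5}   # toy weights
    print(math.exp(log_p(["PAC", "ACity"], ["to", "boston"], lam)))

For real sentences the enumeration over all label sequences is intractable, which is exactly why the Markov assumption in Eq. (2) matters: it allows z(o; λ) to be computed with a forward recursion instead.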

In some cases, it may be natural to exploit features on variables that are not directly observed. For example, a feature for the Flight preamble may be defined in terms of an observed word and an unobserved state in the shaded region in Figure 4:

\[ f_{\text{FlightInit},\text{flights}}(s_{t-1}, s_t, o, t) = \begin{cases} 1 & \text{if } s_t = \text{FlightInit} \wedge o_t = \text{``flights''} \\ 0 & \text{otherwise} \end{cases} \qquad (3) \]

In this case, the state sequence s is only partially observed in the meaning representation M: M(s_5) = "DCity" and M(s_8) = "ACity" for the words "Seattle" and "Boston". The states for the remaining words are hidden. Let Γ(M) represent the set of all state sequences that satisfy the constraints imposed by M. To obtain the conditional probability of M, we need to sum over all possible labels for the hidden states:

\[ p(M \mid o; \lambda) = \frac{1}{z(o; \lambda)} \sum_{s \in \Gamma(M)} \exp\left( \sum_{k} \sum_{t} \lambda_k f_k(s_{t-1}, s_t, o, t) \right) \qquad (4) \]

CRFs with features dependent on hidden state variables are called Hidden Conditional Random Fields (HCRFs). They have been applied to tasks such as phonetic classification (Gunawardana, Mahajan et al., 2005) and object recognition (Quattoni, Collins et al., 2004).

3.2 Conditional Model Training

We train CRFs and HCRFs with gradient-based optimization algorithms that maximize the log posterior. The gradient of the objective function is

\[ \nabla_{\lambda} L(\lambda) = E_{P(s \mid o, l)}\left[ f(s, o) \right] - E_{P(s \mid o)}\left[ f(s, o) \right] \]

which is the difference between the conditional expectation of the feature vector given the observation sequence and the label sequence, and the conditional expectation given the observation sequence alone. With the Markov assumption in Eq. (2), these expectations can be computed using a forward-backward-like dynamic programming algorithm. For CRFs, whose features do not depend on hidden state sequences, the first expectation is simply the feature counts given the observation and label sequences.

In this work, we applied stochastic gradient descent (SGD) (Kushner and Yin, 1997) for parameter optimization. In our experiments on several different tasks, it is faster than L-BFGS (Nocedal and Wright, 1999), a quasi-Newton optimization algorithm.

3.3 CRFs and Perceptron Learning

Perceptron training for conditional models (Collins, 2002) is an approximation to the SGD algorithm, using feature counts from the Viterbi label sequence in lieu of expected feature counts. It eliminates the need for a forward-backward algorithm to collect the expected counts, and hence greatly speeds up model training. This algorithm can be viewed as using the minimum margin of a training example (i.e., the difference between the log conditional probability of the reference label sequence and that of the Viterbi label sequence) as the objective function instead of the conditional probability:

\[ L'(\lambda) = \log P(l \mid o; \lambda) - \max_{l'} \log P(l' \mid o; \lambda) \]

Here again, o is the observation and l is its reference label sequence. In perceptron training, the parameter updating stops when the Viterbi label sequence is the same as the reference label sequence. In contrast, the optimization based on the log posterior probability objective function keeps pulling probability mass from all incorrect label sequences to the reference label sequence until convergence. In both perceptron and CRF training, we average the parameters over training iterations (Collins, 2002).
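The contrast between the two updates can be sketched in a few lines of Python. The following toy implementation of the perceptron update (after Collins, 2002) uses Viterbi feature counts in place of the expected counts that CRF training would obtain from forward-backward; the label set and feature templates are illustrative assumptions, not the paper's:

    from collections import Counter, defaultdict

    STATES = ["PDC", "DCity", "PAC", "ACity"]           # toy label set

    def feats(prev_s, s, words, t):
        return [("TR", prev_s, s), ("UG", s, words[t])]

    def feat_counts(words, labels):
        c = Counter()
        for t in range(len(words)):
            c.update(feats(labels[t - 1] if t else "<s>", labels[t], words, t))
        return c

    def viterbi(words, lam):
        """Highest-scoring label sequence under the current weights."""
        paths = {"<s>": (0.0, [])}
        for t in range(len(words)):
            paths = {s: max((p + sum(lam[f] for f in feats(q, s, words, t)), path + [s])
                            for q, (p, path) in paths.items())
                     for s in STATES}
        return max(paths.values())[1]

    def perceptron_step(words, gold, lam):
        """Add reference feature counts, subtract Viterbi feature counts."""
        guess = viterbi(words, lam)
        if guess != gold:          # no update once the decode matches the reference
            delta = feat_counts(words, gold)
            delta.subtract(feat_counts(words, guess))
            for f, v in delta.items():
                lam[f] += v

    lam = defaultdict(float)
    for _ in range(3):             # a few passes over one toy example
        perceptron_step(["to", "boston"], ["PAC", "ACity"], lam)
    print(viterbi(["to", "boston"], lam))

The parameter averaging over training iterations that the paper applies to both algorithms is omitted from this sketch.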

4 Porting the HMM/CFG Model to HCRFs

In our first experiment, we would like to exploit the discriminative training capability of a conditional model without changing the HMM/CFG model's topology and feature set. Since the state sequence is only partially labeled, an HCRF is used to model the conditional distribution of the labels.

4.1 Features

We used the same state topology and features as those in the HMM/CFG composite model. The following indicator features are included:

Command prior features capture the a priori likelihood of different top-level commands:

\[ f^{PR}_{c}(s_{t-1}, s_t, o, t) = \begin{cases} 1 & \text{if } t = 0 \wedge C(s_t) = c \\ 0 & \text{otherwise} \end{cases}, \quad c \in \text{CommandSet} \]

Here C(s) stands for the name of the command that corresponds to the transition network containing state s.

State transition features capture the likelihood of transition from one state to another:

\[ f^{TR}_{s_1,s_2}(s_{t-1}, s_t, o, t) = \begin{cases} 1 & \text{if } s_{t-1} = s_1 \wedge s_t = s_2 \\ 0 & \text{otherwise} \end{cases} \]

where s_1 → s_2 is a legal transition according to the state topology.

Unigram and bigram features capture the likelihoods of words emitted by a state:

\[ f^{UG}_{s,w}(s_{t-1}, s_t, o, t) = \begin{cases} 1 & \text{if } s_t = s \wedge o_t = w \\ 0 & \text{otherwise} \end{cases} \]

\[ f^{BG}_{s,w_1 w_2}(s_{t-1}, s_t, o, t) = \begin{cases} 1 & \text{if } s_{t-1} = s \wedge s_t = s \wedge o_{t-1} = w_1 \wedge o_t = w_2 \\ 0 & \text{otherwise} \end{cases} \]

for all s such that ¬isFiller(s), and all w, w_1 w_2 in the training data. The condition ¬isFiller(s) restricts s to a pre- or postamble state rather than a slot filler state.
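Expressed as code, these templates are just indicator functions over (s_{t-1}, s_t, o, t). The sketch below is a hypothetical rendering, with stand-in implementations of C(s) and isFiller(s); the real model instantiates one such feature per command, per legal transition, and per observed n-gram:

    def command_of(state):                  # C(s): the command network containing s
        return "GroundTrans" if state.startswith("GT") else "ShowFlight"

    def is_filler(state):                   # isFiller(s): true for slot filler states
        return state in {"ACity", "DCity"}

    def f_pr(c):                            # command prior feature: fires at t = 0
        return lambda prev_s, s, o, t: int(t == 0 and command_of(s) == c)

    def f_tr(s1, s2):                       # state transition feature
        return lambda prev_s, s, o, t: int((prev_s, s) == (s1, s2))

    def f_ug(state, w):                     # unigram feature, only for non-filler states
        return lambda prev_s, s, o, t: int(not is_filler(state) and s == state and o[t] == w)

    def f_bg(state, w1, w2):                # bigram feature within one non-filler state
        def f(prev_s, s, o, t):
            return int(not is_filler(state) and (prev_s, s) == (state, state)
                       and t > 0 and (o[t - 1], o[t]) == (w1, w2))
        return f

    f = f_ug("PreACity", "to")              # example instantiation
    print(f("PDC", "PreACity", ["to", "boston"], 0))   # -> 1

The huge number of parameters produced by instantiating such indicators over a smoothed vocabulary is one of the issues discussed in the next subsection.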

4.2 Experiments

The model is trained with SGD, with the parameters initialized in two ways. The flat start initialization sets all parameters to 0. The generative model initialization uses the parameters trained by the HMM/CFG model. Figure 6 shows the test set slot error rates (SER) at different training iterations. With the flat start initialization (top curve), the error rate never comes close to the 5% baseline error rate of the HMM/CFG model. With the generative model initialization, the error rate is reduced to 4.8% at the second iteration, but the model quickly gets over-trained afterwards.

Figure 6. Test set slot error rates (in %) at different training iterations. The top curve is for the flat start initialization, the bottom for the generative model initialization.

The failure of the direct porting of the generative model to the conditional model can be attributed to the following reasons:

1. The conditional log-likelihood function is no longer a convex function due to the summation over hidden variables. This makes the model highly likely to settle on a local optimum. The fact that the flat start initialization failed to achieve the accuracy of the generative model initialization is a clear indication of the problem.

2. In order to account for words in the test data, the n-grams in the generative model are properly smoothed with back-offs to the uniform distribution over the vocabulary. This results in a huge number of parameters, many of which cannot be estimated reliably in the conditional model, given that model regularization is not as well studied as for n-grams.

3. The hidden variables make parameter estimation less reliable, given only a small amount of training data.

5 CRFs for SLU

An important lesson we have learned from the previous experiment is that we should not think generatively when applying conditional models. While it is important to find cues that help identify the slots, there is no need to exhaustively model the generation of every word in a sentence. Hence, the distinctions between pre- and postcommands, and between pre- and postambles, are no longer necessary. Every word that appears between two slots is labeled as the preamble state of the second slot, as illustrated in Figure 7. This labeling scheme effectively removes the hidden variables and simplifies the model to a CRF. It not only expedites model training, but also prevents parameters from settling at a local optimum, because the log conditional probability is now a convex function.

Figure 7. Once the slots are marked in the simplified model topology, the state sequence is fully marked, leaving no hidden variables and resulting in a CRF. Here, PAC stands for the preamble for arrival city, and PDC for the preamble for departure city.

The command prior and state transition features (with fewer states) are the same as in the HCRF model. For unigrams and bigrams, only those that occur in front of a CFG-covered string are considered. If the string is CFG-covered for slot s, then the unigram and bigram features for the preamble state of s are included. Suppose the words "that departs" occur at positions t-1 and t, in front of the word "Seattle", which is CFG-covered by the non-terminal City. Since City can fill a DepartureCity or an ArrivalCity slot, the four following features are introduced: \( f^{UG}_{\text{PDC,that}}, f^{UG}_{\text{PAC,that}}, f^{BG}_{\text{PDC,that departs}} \) and \( f^{BG}_{\text{PAC,that departs}} \). Formally,

\[ f^{UG}_{s,w}(s_{t-1}, s_t, o, t) = \begin{cases} 1 & \text{if } s_t = s \wedge o_t = w \\ 0 & \text{otherwise} \end{cases} \]

\[ f^{BG}_{s,w_1 w_2}(s_{t-1}, s_t, o, t) = \begin{cases} 1 & \text{if } s_{t-1} = s \wedge s_t = s \wedge o_{t-1} = w_1 \wedge o_t = w_2 \\ 0 & \text{otherwise} \end{cases} \]

for all s such that ¬isFiller(s), and all w, w_1 w_2 such that, in the training data, w and w_1 w_2 appear in front of a sequence that is CFG-covered for s.

5.1 Additional Features

One advantage of CRFs over generative models is the ease with which overlapping features can be incorporated. In this section, we describe three additional feature sets.

The first set addresses a side effect of not modeling the generation of every word in a sentence. Suppose a preamble state has never occurred in a position that is confusable with a slot state s, and a word that is CFG-covered for s has never occurred as part of the preamble state in the training data. Then the unigram feature of that word for the preamble state has weight 0, and there is thus no penalty for mislabeling the word as the preamble. This is one of the most common errors observed in the development set. The chunk coverage for preamble words feature is introduced to model the likelihood of a CFG-covered word being labeled as a preamble:

\[ f^{CC}_{c,NT}(s_{t-1}, s_t, o, t) = \begin{cases} 1 & \text{if } C(s_t) = c \wedge \text{covers}(NT, o_t) \wedge \text{isPre}(s_t) \\ 0 & \text{otherwise} \end{cases} \]

where isPre(s) indicates that s is a preamble state.

Often, the identity of a slot depends on the preamble of the previous slot. For example, "at two PM" is a DepartureTime in "flight from Seattle to Boston at two PM", but it is an ArrivalTime in "flight departing from Seattle arriving in Boston at two PM". In both cases, the previous slot is ArrivalCity, so the state transition features are not helpful for disambiguation. The identity of the time slot depends not on the ArrivalCity slot, but on its preamble. Our second feature set, previous-slot context, introduces this dependency to the model:

\[ f^{PC}_{s_1,s_2,w}(s_{t-1}, s_t, o, t) = \begin{cases} 1 & \text{if } s_{t-1} = s_1 \wedge s_t = s_2 \wedge w \in \Theta(s_{t-1}, o, t) \wedge \text{isFiller}(s_t) \wedge \text{Slot}(s_1) \neq \text{Slot}(s_2) \\ 0 & \text{otherwise} \end{cases} \]

Here Slot(s) stands for the slot associated with the state s, which can be a filler state or a preamble state, as shown in Figure 7. Θ(s, o, t) is the set of k words (where k is an adjustable window size) in front of the longest sequence that ends at position t-1 and that is CFG-covered by Slot(s).

The third feature set is intended to penalize erroneous segmentation, such as segmenting "Washington D.C." into two separate City slots. The chunk coverage for slot boundary feature is activated when a slot boundary is covered by a CFG non-terminal NT, i.e., when the words in two consecutive slots ("Washington" and "D.C.") can also be covered by one single slot:

\[ f^{SB}_{c,NT}(s_{t-1}, s_t, o, t) = \begin{cases} 1 & \text{if } C(s_t) = c \wedge \text{covers}(NT, o_{t-1} o_t) \wedge \text{isFiller}(s_{t-1}) \wedge \text{isFiller}(s_t) \wedge s_{t-1} \neq s_t \\ 0 & \text{otherwise} \end{cases} \]

This feature set shares its weights with the chunk coverage features for preamble words, and does not introduce any new parameters.

    Features                       # of Param.    SER
    Command Prior                  6
    +State Transition
    +Unigrams
    +Bigrams
    +Chunk Cov. Preamble Word
    +Previous-Slot Context
    +Chunk Cov. Slot Boundaries

Table 1. Number of additional parameters and the slot error rate after each new feature set is introduced.
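As one illustration of how such overlapping features can be computed in practice, the sketch below implements a hypothetical chunk coverage indicator; covers() is backed here by a precomputed set of CFG spans, standing in for the chart of a real CFG parser, and the state names are illustrative:

    # Hypothetical CFG chart: non-terminal "City" covers words o[2:4], "Washington D.C."
    CFG_SPANS = {("City", 2, 4)}

    def covers(nt, t):
        """True if position t falls inside a span covered by non-terminal nt."""
        return any(n == nt and i <= t < j for n, i, j in CFG_SPANS)

    def is_pre(state):                      # preamble states, e.g. "PAC", "PDC"
        return state.startswith("P")

    def command_of(state):                  # C(s), as in Section 4.1
        return "ShowFlight"

    def f_cc(c, nt):
        """Chunk coverage for preamble words: a CFG-covered word labeled as a preamble."""
        def f(prev_s, s, o, t):
            return int(command_of(s) == c and covers(nt, t) and is_pre(s))
        return f

    f = f_cc("ShowFlight", "City")
    o = ["flight", "to", "washington", "d.c."]
    print(f("PAC", "PAC", o, 2))            # mislabeling "washington" as a preamble -> 1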

5.2 Experiments

Since the objective function is convex, the choice of optimization algorithm does not make any significant difference in SLU accuracy. We trained the model with SGD. Other optimization algorithms, like Stochastic Meta-Descent (Vishwanathan, Schraudolph et al., 2006), can be used to speed up convergence. The training stopping criterion is cross-validated on the development set (a minimal sketch of this criterion appears at the end of this subsection).

Table 1 shows the number of new parameters and the slot error rate (SER) on the test data after each new feature set is introduced. The new features improve the prediction of slot identities and reduce the SER by 21% relative to the generative HMM/CFG composite model.

The figures below show in detail the impact of the n-gram, previous-slot context and chunk coverage features. The chunk coverage feature has three settings: 0 stands for no chunk coverage features; 1 for chunk coverage features for preamble words only; and 2 for both preamble words and slot boundaries.

Figure 8 shows the impact of the order of the n-gram features. Zero-order means that no lexical features for preamble states are included. As the figure illustrates, the inclusion of CFG rules for slot filler states and domain-specific knowledge about command priors and slot transitions has already produced a reasonable SER under 15%. Unigram features for preamble states cut the error by more than 50%, while the impact of bigram features is not consistent -- it yields a small positive or negative difference depending on the other experimental parameter settings.

[Figure 8: slot error rate vs. n-gram order (0, 1, 2), one curve per chunk coverage setting.]

Figure 8. Effects of the order of n-grams on SER. The window size for the previous-slot context features is 2.

Figure 9 shows the impact of the CFG chunk coverage feature. Coverage for both preamble words and slot boundaries helps improve the SLU accuracy.

Figure 9. Effects of the chunk coverage feature. The window size for the previous-slot context feature is 2. The three lines correspond to different n-gram orders, where 0-gram indicates that no preamble lexical features are used.

Figure 10 shows the impact of the window size for the previous-slot context feature. Here, 0 means that the previous-slot context feature is not used. When the window size is k, the k words in front of the longest previous CFG-covered word sequence are included as the previous-slot unigram context features. As the figure illustrates, this feature significantly reduces SER, while the window size does not make any significant difference.

Figure 10. Effects of the window size of the previous-slot context feature. The three lines represent different orders of n-grams (0, 1, and 2). Chunk coverage features for both preamble words and slot boundaries are used.

It is important to note that overlapping features like f^CC, f^SB and f^PC could not be easily incorporated into a generative model.
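As referenced above, training stops by cross-validation on the development set. A minimal sketch of such an early-stopping loop, with hypothetical train_pass and dev_slot_error_rate callbacks (assumptions for illustration, not the paper's code):

    import copy

    def train_early_stopping(weights, train_pass, dev_slot_error_rate, max_epochs=20):
        """Return the weights from the epoch with the lowest dev-set SER."""
        best_err, best_w = float("inf"), copy.deepcopy(weights)
        for _ in range(max_epochs):
            train_pass(weights)                   # one SGD or perceptron pass over the data
            err = dev_slot_error_rate(weights)    # insertion-deletion-substitution rate
            if err < best_err:
                best_err, best_w = err, copy.deepcopy(weights)
        return best_w, best_err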

5.3 CRFs vs. Perceptrons

Table 2 compares the perceptron and CRF training algorithms, using chunk coverage features for both preamble words and slot boundaries, with which the best accuracy results are achieved. Both improve upon the 5% baseline SER of the generative HMM/CFG model. CRF training outperforms the perceptron in most settings, except for the one with unigram features for preamble states and with window size 1 -- the model with the fewest parameters. One possible explanation is as follows. The objective function in CRF training is a convex function, and so SGD can find its single global optimum. In contrast, the objective function for the perceptron, which is the difference between two convex functions, is not convex. The gradient ascent approach in perceptron training is hence more likely to settle on a local optimum as the model becomes more complicated.

            PSWSize=1                PSWSize=2
            Perceptron    CRFs       Perceptron    CRFs
    n=1     3.76%         4.11%      4.23%         3.94%
    n=2     4.76%         4.14%      4.58%         3.94%

Table 2. Perceptron vs. CRF training. Chunk coverage features are used for both preamble words and slot boundaries. PSWSize stands for the window size of the previous-slot context feature; n is the order of the n-gram features.

The biggest advantage of perceptron learning is its speed. It directly counts the occurrences of features given an observation with its reference label sequence and its Viterbi label sequence, with no need to collect expected feature counts with a forward-backward-like algorithm. Not only is each iteration faster, but fewer iterations are required when using SLU accuracy on a cross-validation set as the stopping criterion. Overall, perceptron training is 5 to 8 times faster than CRF training.

6 Conclusions

This paper has introduced a conditional model framework that integrates statistical learning with a knowledge-based approach to SLU. We have shown that a conditional model reduces the SLU slot error rate by more than 20% over the generative HMM/CFG composite model. The improvement was mostly due to the introduction of new overlapping features into the model. We have also discussed our experience in directly porting a generative model to a conditional model, and demonstrated that it may not be beneficial at all if we still think generatively in conditional modeling; more specifically, replicating the feature set of a generative model in a conditional model may not help much. The key benefit of conditional models is the ease with which they can incorporate overlapping and non-homogeneous features. This is consistent with the findings in the application of conditional models to POS tagging (Lafferty, McCallum et al., 2001).

The paper has also compared different training algorithms for conditional models. In most cases, CRF training is more accurate; however, perceptron training is much faster.

References

Bahl, L., P. Brown, et al. 1986. Maximum mutual information estimation of hidden Markov model parameters for speech recognition. IEEE International Conference on Acoustics, Speech, and Signal Processing.

Collins, M. 2002. Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms. EMNLP, Philadelphia, PA.

Gunawardana, A., M. Mahajan, et al. 2005. Hidden conditional random fields for phone classification. Eurospeech, Lisbon, Portugal.

Juang, B.-H., W. Chou, et al. 1997. Minimum classification error rate methods for speech recognition. IEEE Transactions on Speech and Audio Processing 5(3): 257-265.

Kushner, H. J. and G. G. Yin. 1997. Stochastic Approximation Algorithms and Applications. Springer-Verlag.

Lafferty, J., A. McCallum, et al. 2001. Conditional random fields: probabilistic models for segmenting and labeling sequence data. ICML.

Nocedal, J. and S. J. Wright. 1999. Numerical Optimization. Springer-Verlag.

Povey, D. and P. C. Woodland. 2002. Minimum phone error and I-smoothing for improved discriminative training. IEEE International Conference on Acoustics, Speech, and Signal Processing.

Price, P. 1990. Evaluation of spoken language systems: the ATIS domain. DARPA Speech and Natural Language Workshop, Hidden Valley, PA.
Quattoni, A., M. Collins and T. Darrell. 2004. Conditional random fields for object recognition. NIPS.

Vishwanathan, S. V. N., N. N. Schraudolph, et al. 2006. Accelerated training of conditional random fields with stochastic meta-descent. The Learning Workshop, Snowbird, Utah.

Wang, Y.-Y., L. Deng, et al. 2005. Spoken language understanding --- an introduction to the statistical framework. IEEE Signal Processing Magazine 22(5): 16-31.


In this chapter the model of free motion under gravity is extended to objects projected at an angle. When you have completed it, you should

In this chapter the model of free motion under gravity is extended to objects projected at an angle. When you have completed it, you should Cambridge Universiy Press 978--36-60033-7 Cambridge Inernaional AS and A Level Mahemaics: Mechanics Coursebook Excerp More Informaion Chaper The moion of projeciles In his chaper he model of free moion

More information

Non-parametric techniques. Instance Based Learning. NN Decision Boundaries. Nearest Neighbor Algorithm. Distance metric important

Non-parametric techniques. Instance Based Learning. NN Decision Boundaries. Nearest Neighbor Algorithm. Distance metric important on-parameric echniques Insance Based Learning AKA: neares neighbor mehods, non-parameric, lazy, memorybased, or case-based learning Copyrigh 2005 by David Helmbold 1 Do no fi a model (as do LDA, logisic

More information

Explaining Total Factor Productivity. Ulrich Kohli University of Geneva December 2015

Explaining Total Factor Productivity. Ulrich Kohli University of Geneva December 2015 Explaining Toal Facor Produciviy Ulrich Kohli Universiy of Geneva December 2015 Needed: A Theory of Toal Facor Produciviy Edward C. Presco (1998) 2 1. Inroducion Toal Facor Produciviy (TFP) has become

More information

Chapter 2. First Order Scalar Equations

Chapter 2. First Order Scalar Equations Chaper. Firs Order Scalar Equaions We sar our sudy of differenial equaions in he same way he pioneers in his field did. We show paricular echniques o solve paricular ypes of firs order differenial equaions.

More information

14 Autoregressive Moving Average Models

14 Autoregressive Moving Average Models 14 Auoregressive Moving Average Models In his chaper an imporan parameric family of saionary ime series is inroduced, he family of he auoregressive moving average, or ARMA, processes. For a large class

More information

Inventory Control of Perishable Items in a Two-Echelon Supply Chain

Inventory Control of Perishable Items in a Two-Echelon Supply Chain Journal of Indusrial Engineering, Universiy of ehran, Special Issue,, PP. 69-77 69 Invenory Conrol of Perishable Iems in a wo-echelon Supply Chain Fariborz Jolai *, Elmira Gheisariha and Farnaz Nojavan

More information

Self assessment due: Monday 4/29/2019 at 11:59pm (submit via Gradescope)

Self assessment due: Monday 4/29/2019 at 11:59pm (submit via Gradescope) CS 188 Spring 2019 Inroducion o Arificial Inelligence Wrien HW 10 Due: Monday 4/22/2019 a 11:59pm (submi via Gradescope). Leave self assessmen boxes blank for his due dae. Self assessmen due: Monday 4/29/2019

More information

Lab #2: Kinematics in 1-Dimension

Lab #2: Kinematics in 1-Dimension Reading Assignmen: Chaper 2, Secions 2-1 hrough 2-8 Lab #2: Kinemaics in 1-Dimension Inroducion: The sudy of moion is broken ino wo main areas of sudy kinemaics and dynamics. Kinemaics is he descripion

More information

A Video Vehicle Detection Algorithm Based on Improved Adaboost Algorithm Weiguang Liu and Qian Zhang*

A Video Vehicle Detection Algorithm Based on Improved Adaboost Algorithm Weiguang Liu and Qian Zhang* A Video Vehicle Deecion Algorihm Based on Improved Adaboos Algorihm Weiguang Liu and Qian Zhang* Zhongyuan Universiy of Technology, Zhengzhou 450000, China lwg66123@163.com, 2817343431@qq.com *The corresponding

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION SUPPLEMENTARY INFORMATION DOI: 0.038/NCLIMATE893 Temporal resoluion and DICE * Supplemenal Informaion Alex L. Maren and Sephen C. Newbold Naional Cener for Environmenal Economics, US Environmenal Proecion

More information