Inteligencia Artificial. Revista Iberoamericana de Inteligencia Artificial ISSN:


Inteligencia Artificial. Revista Iberoamericana de Inteligencia Artificial
Asociación Española para la Inteligencia Artificial, España

Tomassi, Diego; Milone, Diego; Forzani, Liliana
Minimum Classification Error Training of Hidden Markov Models for Sequential Data in the Wavelet Domain
Inteligencia Artificial. Revista Iberoamericana de Inteligencia Artificial, vol. 13, núm. 44, 2009, pp.
Asociación Española para la Inteligencia Artificial, Valencia, España

Inteligencia Artificial 44 (2009). doi: /ia.v13i

INTELIGENCIA ARTIFICIAL
http://erevista.aepia.org/

Minimum Classification Error Training of Hidden Markov Models for Sequential Data in the Wavelet Domain

Diego Tomassi
Laboratorio de Investigación en Señales e Inteligencia Computacional, FICH, Universidad Nacional del Litoral - CONICET, Argentina
diegotomassi@gmail.com

Diego Milone
Laboratorio de Investigación en Señales e Inteligencia Computacional, FICH, Universidad Nacional del Litoral - CONICET, Argentina
dmilone@fich.unl.edu.ar

Liliana Forzani
Instituto de Matemática Aplicada Litoral, FIQ, Universidad Nacional del Litoral - CONICET, Argentina
liliana.forzani@gmail.com

Abstract. In the last years there has been increasing interest in developing discriminative training methods for hidden Markov models, with the aim of improving their performance in classification and pattern recognition tasks. Although several advances have been made in this area, they have been targeted almost exclusively at standard models whose conditional observations are given by a Gaussian mixture density. In parallel with this development, a special kind of hidden Markov model defined in the wavelet domain has found widespread use in the signal and image processing community. Nevertheless, these models have typically been restricted to fully-tied parameter training using a single sequence and maximum likelihood estimates. This paper takes a step forward in the development of sequential pattern recognizers based on wavelet-domain hidden Markov models by introducing a new discriminative training method. The learning strategy relies on the minimum classification error approach and provides re-estimation formulas for fully non-tied models. Numerical experiments on a simple phoneme recognition task show important improvement over the recognition rate achieved by the same models trained under the maximum likelihood estimation approach.

Keywords: Style, Revista Iberoamericana de Inteligencia Artificial, Sample document.
1 Introduction

Hidden Markov models have proven successful in dealing with sequential data, being at the core of state-of-the-art methods for applications such as speech recognition [15] and sequence alignment in bioinformatics [3]. Within this modeling framework, maximum likelihood estimation has been the standard approach for learning parameters from data, taking advantage of the efficiency of the expectation-maximization (EM) algorithm [6]. The rationale behind this is that minimum Bayes risk can be attained by picking the class which maximizes the posterior probability given the observation sequence. This probability can be further replaced, via Bayes' rule, by the likelihood and an estimate of the class prior. Thus, within this framework the classifier design involves in fact a distribution approximation task.

ISSN: (on-line)
© AEPIA and the authors

The key observation to be noticed is that what is actually used in most cases is a plug-in maximum a posteriori approach: true class posterior probabilities are supposed to equal those for the models linked to each class. When this is true and the set of training signals is large enough, the above approach is in fact the best we can do. However, these assumptions usually do not hold for pattern classification tasks involving real-world data. When there is high variability in the data or when training samples are limited, model posteriors cannot be expected to match the true class posteriors, and the Bayes risk becomes an unattainable lower bound. To overcome these limitations, in recent years there has been a growing interest in discriminative training of hidden Markov models [8]. Unlike the previous distribution approach to parameter estimation, these methods aim to reduce the classification error by using training samples from all classes simultaneously and to maximize the dissimilarity between models of different classes. Several criteria have been proposed to drive the learning process, giving rise to methods such as Maximum Mutual Information [2] and Minimum Classification Error [9, 4]. The most widely used of those methods is Minimum Classification Error (MCE) training. When applied to parameter estimation in hidden Markov models, this is an HMM-based discriminant analysis approach in which a soft approximation of the 0-1 loss is used to model the decision risk of the classifier. The learning problem becomes an optimization problem which directly links the design of the classifier to its expected performance, and it is usually carried out by the generalized probabilistic descent (GPD) method [10]. MCE training has been shown to outperform the conventional maximum likelihood approach in many applications. This success has also stimulated several efforts both to ground the method on a more principled basis [12, 1] and to improve its efficiency in real-world applications [7].
Nevertheless, most of these works deal only with standard hidden Markov models whose observation densities are given by Gaussian mixtures. A very special kind of hidden Markov model comprises those defined in the wavelet domain. The best known of these models is the hidden Markov tree (HMT), which was introduced in [5] to account for statistical dependencies between coefficients in wavelet representations of signals and images. Although the HMT has found widespread use in applications, it is not well suited to sequential pattern recognition tasks because it cannot handle variable-length sequences. This is due to the use of the discrete wavelet transform, which makes the structure of the representation depend on the length of the signal. To relax this limitation, a composite HMM-HMT architecture was proposed in [13], in which an HMT models the observation density of each state of an external HMM. An EM algorithm for parameter estimation was derived in [13] for fully-coupled non-tied models, and promising preliminary results for signal denoising and classification were reported in [14] and [13], respectively. In this paper we take a step forward in the development of sequential pattern classifiers in the wavelet domain by introducing a new discriminative training algorithm for the HMM-HMT model. It relies on the minimum classification error criterion and is solved through the GPD approach. The proposed algorithm focuses on fully non-tied models in the wavelet domain. Using these models instead of Gaussian mixtures as observation densities requires modifications to the standard MCE approach in order to avoid numerical issues. We provide re-estimation formulas for all the parameters in the model and carry out simple phoneme recognition experiments to compare the performance of the proposed algorithm against the same model trained by the standard EM approach. The paper is organized as follows: Section 2 reviews the composite HMM-HMT model and notation; re-estimation formulas for the proposed algorithm are given in Section 3; and experimental results for phoneme recognition are shown in Section 4.
Conclusions and future work are outlined in Section 5.

2 The HMM-HMT model

The HMM-HMT architecture is a composition of two Markovian models in which the HMT serves as the observation density for each state of the HMM. Long-term dependencies are modeled by the external HMM, while the HMT models short-term dependencies in the wavelet domain. To make the following sections clear, we next summarize the main definitions and notation for the HMM-HMT model. Further details can be found in [13].

2.1 Model definition and notation

In order to model a sequence W = w^1, w^2, ..., w^T, with w^t ∈ R^N, we define a continuous HMM with the structure ϑ = ⟨Q, A, π, B⟩, where Q is the set of states; A = {a_ij} is the matrix of state-transition probabilities, so that a_ij is the probability of transition from state i to state j; π is the initial state probability vector; and B = {b_k(w^t)} is the set of observation densities. We will suppose that Q takes values q^t ∈ 1, 2, ..., N_Q. In addition, let w^t = [w^t_1, w^t_2, ..., w^t_N], with w^t_n ∈ R, be the vector of coefficients of the wavelet representation of a signal¹. The HMT in state k of the HMM can be defined with the structure θ^k = ⟨U^k, R^k, κ^k, ε^k, F^k⟩, where U^k is the set of nodes in the tree; R^k is the set of states in all the nodes of the tree; κ^k are the probabilities for the initial states in the root node; ε^k = [ε^k_{u,mn}] is the array whose elements hold the conditional probability of node u being in state m given that the state in its parent node ρ(u) is n; and F^k = {f^k_{u,m}(w_u)} is the set of observation densities for the wavelet coefficients; that is, f^k_{u,m}(w_u) is the probability of observing the wavelet coefficient w_u in state m (in node u). In particular, we assume that wavelet coefficients are conditionally Gaussian given the state in the node of the tree, so f^k_{u,m}(w_u) = N(w_u; μ^k_{u,m}, σ^k_{u,m}), where N(·) denotes the Gaussian density. In later developments we will also denote by R^k_u the set of states in node u, which takes values r_u ∈ 1, 2, ..., M.

2.2 Likelihood of the observations

The likelihood of the first-order HMM for conditionally independent observations is given by [15]:

L_\Theta(W) = \sum_q \prod_t a_{q^{t-1} q^t}\, b_{q^t}(w^t),   (1)

where the observation density for each HMM state is given by (see [5]):

b_{q^t}(w^t) = \sum_{r^t} \prod_u \epsilon^{q^t}_{u,\, r_u r_{\rho(u)}}\, f^{q^t}_{u, r_u}(w^t_u),   (2)

with r^t = [r_1, r_2, ..., r_N] a combination of hidden states in the HMT nodes. Thus, the complete likelihood for the joint HMM-HMT model is:

L_\Theta(W) = \sum_q \prod_t a_{q^{t-1} q^t} \sum_{r^t} \prod_u \epsilon^{q^t}_{u,\, r_u r_{\rho(u)}}\, f^{q^t}_{u, r_u}(w^t_u)   (3)
            = \sum_q \sum_R \prod_t a_{q^{t-1} q^t} \prod_u \epsilon^{q^t}_{u,\, r_u r_{\rho(u)}}\, f^{q^t}_{u, r_u}(w^t_u)   (4)
            = \sum_q \sum_R L_\Theta(W, q, R),   (5)

where a_{01} = π_1 = 1.
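As an aside, the sum-product structure of (1) is exactly what the standard forward recursion computes. The sketch below is a minimal log-domain implementation, assuming a hypothetical array `log_b[t, k]` standing in for log b_k(w^t) (which in the HMM-HMT is supplied by the tree density (2)); it is an illustration under those assumptions, not the paper's implementation.

```python
import numpy as np

def log_likelihood(log_b, log_A, log_pi):
    """Forward recursion for the outer-HMM likelihood (1), in the log domain.

    log_b  : (T, NQ) array, log_b[t, k] ~ log b_k(w^t) (stand-in for the HMT density)
    log_A  : (NQ, NQ) array of log transition probabilities log a_ij
    log_pi : (NQ,) array of log initial-state probabilities
    """
    T, _ = log_b.shape
    log_alpha = log_pi + log_b[0]                # alpha_1(k) = pi_k * b_k(w^1)
    for t in range(1, T):
        m = log_alpha.max()                      # shift for numerical stability
        log_alpha = m + np.log(np.exp(log_alpha - m) @ np.exp(log_A)) + log_b[t]
    m = log_alpha.max()
    return m + np.log(np.sum(np.exp(log_alpha - m)))
```

Working in the log domain matters here precisely because, as discussed in Section 3, wavelet-domain likelihoods are typically far smaller than Gaussian-mixture ones.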
The sign \sum_q denotes that the sum is over all possible state sequences q = q^1, q^2, ..., q^T, and \sum_R accounts for all possible sequences of all possible combinations of hidden states r^1, r^2, ..., r^T in the nodes of each tree. See [13] for details about the HMM-HMT model and the EM algorithm for training it. We will refer to L_\Theta(W, q, R) as the joint likelihood of the observations and the states of the model.

3 Algorithm formulation

The MCE approach to classifier design involves a set of discriminant functions optimized in a competitive way in order to achieve the least classification error over the training sample. Discriminant functions are those functions which measure the degree of membership of an observation to a given class, thus characterizing the decision rule of the classifier. Let {g_j(W; Λ)} be a parameterized set of such discriminant functions for the classification task at hand, W be an observation, Λ be the whole parameter set, and

¹For a wavelet analysis up to J levels, skipping the coarsest approximation coefficient, N = 2^J - 1.

C(W) be the decision of the classifier. The classifier will decide that observation W belongs to class i when

C(W) = \arg\max_j g_j(W; Λ) = i.   (6)

To train a set of HMMs within this framework, the discriminant function of each class is chosen to be a function of the joint likelihood for the HMM of that class. In order to put the proposed algorithm for HMM-HMT models in context, we first review the basics of the MCE approach.

3.1 General MCE approach

A main feature of the MCE training method is that the model update is competitive with regard to classes. That is, all models are updated simultaneously and the strength of the update depends on how confusing the decision is to the classifier. Within this framework, minimization of the classification error is pursued through a three-step process:

1. Simulation of the classifier decision. This is carried out by defining a function d_i(W; Λ): R → R which is usually chosen to take a negative value when the classifier decision is right and a positive one otherwise. Following the decision rule (6), for a training sequence that belongs to class i, this function can be written as

d_i(W; Λ) = -g_i(W; Λ) + \max_{j \neq i} \{g_j(W; Λ)\}.

However, the max operation is not differentiable, so what is used in practice is a soft approximation to it. The function d_i(W; Λ) is often referred to as the misclassification function.

2. Soft approximation of the 0-1 loss. The simulated classifier decision is embedded in a soft, differentiable function which approximates the non-continuous 0-1 loss. A common choice for this approximation is the sigmoid function, defined as

l(d_i(W; Λ)) = l_i(W; Λ) = \frac{1}{1 + \exp(-\gamma d_i(W; Λ) + \beta)}.   (7)

The parameter γ controls the sharpness of the sigmoid, and the bias β is usually set to zero.

3. Minimization of the empirical classification risk. Let M be the number of classes in the problem, and let Ω_i stand for the set of patterns which belong to class i. The classification risk conditioned on W can be written as

l(W; Λ) = \sum_{i=1}^{M} l_i(W; Λ)\, I(W \in Ω_i),   (8)

where I(·) is the indicator function.
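Steps 1 and 2 can be sketched in a few lines; the function names are ours, and the soft-max inside the misclassification measure uses the usual log-sum-exp form (a sketch of the generic MCE quantities, not of the wavelet-domain variant introduced later in Section 3).

```python
import numpy as np

def misclassification(g, i, eta=4.0):
    """Soft version of d_i = -g_i + max_{j != i} g_j, using
    d_i = -g_i + (1/eta) * log( (1/(M-1)) * sum_{j != i} exp(eta * g_j) )."""
    others = np.delete(np.asarray(g, dtype=float), i)
    m = np.max(eta * others)                       # log-sum-exp shift
    soft_max = (m + np.log(np.mean(np.exp(eta * others - m)))) / eta
    return -g[i] + soft_max

def sigmoid_loss(d, gamma=1.0, beta=0.0):
    """Smooth 0-1 loss l(d) = 1 / (1 + exp(-gamma * d + beta)) from (7)."""
    return 1.0 / (1.0 + np.exp(-gamma * d + beta))
```

A correct, confident decision (g_i well above the rest) gives a large negative d_i and a loss near zero, so the descent step below barely moves the parameters.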
The expected risk then reads

L(Λ) = E_W[l(W; Λ)].   (9)

The GPD approach to MCE training is an on-line scheme which aims at minimizing (9) by updating the whole set of parameters Λ in the steepest-descent direction of the loss. Starting from an initial estimate \hat{Λ}_0, the τ-th iteration of the algorithm can be summarized as

\hat{Λ}_{τ+1} = \hat{Λ}_τ - α_τ \left. \frac{\partial l(W_τ; Λ)}{\partial Λ} \right|_{Λ=\hat{Λ}_τ}.   (10)

The updating process is often carried out with each training signal. Under mild conditions, it can be shown that \hat{Λ} converges to Λ* with probability one, provided the learning rate α_τ → 0 as τ → ∞ [10].
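The iteration (10) is plain stochastic steepest descent. A minimal sketch, using a hypothetical quadratic loss in place of the classifier loss, shows the role of the decreasing learning rate α_τ:

```python
import numpy as np

def gpd_step(params, grad, alpha):
    """One GPD update: Lambda <- Lambda - alpha_tau * dl/dLambda."""
    return params - alpha * grad

# Hypothetical smooth loss ||params - target||^2 standing in for l(W; Lambda).
target = np.array([1.0, -2.0])
params = np.zeros(2)
alpha_0 = 0.5
for tau in range(1, 200):
    grad = 2.0 * (params - target)                   # steepest-descent direction
    params = gpd_step(params, grad, alpha_0 / tau)   # alpha_tau -> 0 as tau grows
```

With a schedule that decays toward zero the iterates settle instead of oscillating around the minimizer, which mirrors the convergence condition quoted from [10].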

3.2 Proposed algorithm

We start by choosing the functional form of the discriminant functions g_j(W; Λ). In order for the method to be useful for training the model, we must preserve some link between these functions and the HMM. A common choice is to define g_j(W; Λ) as a function of the joint likelihood L_\Theta(W, q, R) [4]. In particular, we will initially consider the following functional form based on Viterbi decoding:

g(W; Λ) = \log\left( \max_{q,R} L_\Theta(W, q, R) \right)
        = \sum_t \left[ \log a_{\hat{q}^{t-1}\hat{q}^t} + \sum_u \log \epsilon^{\hat{q}^t}_{u,\, \hat{r}_u \hat{r}_{\rho(u)}} + \sum_u \log f^{\hat{q}^t}_{u, \hat{r}_u}(w^t_u) \right].

In the expression above, \hat{q} and \hat{r} refer to the states that achieve maximum joint likelihood. Next, we must define the misclassification function d_i(W; Λ). For HMMs with Gaussian-mixture observations and discriminant functions defined as above, it is standard practice to choose it as

d_i(W) = -g_i(W; Λ) + \log \left[ \frac{1}{M-1} \sum_{j \neq i} e^{g_j(W; Λ)\eta} \right]^{1/\eta}.   (11)

As η becomes arbitrarily large, the term in brackets approximates, up to a constant, the supremum of {g_j(W; Λ)} over all j different from i. However, likelihoods for the HMT model are typically much smaller than those found for Gaussian mixtures. As a result, g_j(W; Λ) often takes extremely low values for W ∉ Ω_j, and the exponentiation gives rise to numerical underflow. Therefore, we define the misclassification function to be

d_i(W; Λ) = 1 - \frac{ \left[ \frac{1}{M-1} \sum_{j \neq i} g_j(W; Λ)^\eta \right]^{1/\eta} }{ g_i(W; Λ) }.   (12)

To avoid restricting η to be an even integer, we also redefine the discriminant functions to be positive-valued:

g_i(W; Λ) = -\log\left( \max_{q,R} L_\Theta(W, q, R) \right).   (13)

For the approximation of the 0-1 loss, we follow standard practice and choose a sigmoid function as defined in (7). As GPD is a gradient-based optimization method, we must introduce some transformation of the parameters to allow for such an unconstrained optimization to be valid [9]. To constrain the a_ij to be a probability measure, we define \tilde{a}_{ij} so that a_{ij} = \exp(\tilde{a}_{ij}) / \sum_m \exp(\tilde{a}_{im}). A similar transformation is needed for the analogous probabilities in the internal HMTs, so we define \tilde{\epsilon}^k_{u,mn} so that \epsilon^k_{u,mn} = \exp(\tilde{\epsilon}^k_{u,mn}) / \sum_p \exp(\tilde{\epsilon}^k_{u,pn}).
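The softmax reparameterization can be sketched as follows (helper names are ours); the same form applies to both the HMM transitions \tilde{a}_{ij} and the tree transitions \tilde{\epsilon}^k_{u,mn}:

```python
import numpy as np

def softmax_constrain(a_tilde):
    """Map an unconstrained row a_tilde back to transition probabilities:
    a_ij = exp(a_tilde_ij) / sum_m exp(a_tilde_im)."""
    e = np.exp(a_tilde - np.max(a_tilde))   # shift for numerical stability
    return e / np.sum(e)

def softmax_unconstrain(a_row):
    """One valid inverse (the softmax is invariant to additive constants)."""
    return np.log(a_row)
```

Gradient steps are taken on the unconstrained parameters; mapping back through the softmax keeps each transition row a proper probability distribution regardless of the step size.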
We also need to constrain the Gaussian variances to be positive-valued. Thus, we define \tilde{\sigma}^k_{u,m} so that \tilde{\sigma}^k_{u,m} = \log \sigma^k_{u,m}. Finally, we scale the Gaussian means in the conditional densities of the wavelet coefficients in order to improve numerical computations [4]. Following previous works, we define the transformed means \tilde{\mu}^k_{u,m} to be \tilde{\mu}^k_{u,m} = \mu^k_{u,m} / \sigma^k_{u,m}.

3.3 Estimation of Gaussian means

Let us assume that the τ-th training sequence W_τ belongs to Ω_i, and denote by Λ^{(j)} the subset of Λ corresponding to the model for class j. To simplify notation, let l_i, d_j and g_j stand for l_i(W; Λ), d_j(W; Λ) and g_j(W; Λ), respectively. The updating process works on the transformed parameters \tilde{\mu}^{(j)k}_{u,m} and is given by

\tilde{\mu}^{(j)k}_{u,m} \leftarrow \tilde{\mu}^{(j)k}_{u,m} - α_τ \left. \frac{\partial l_i(W_τ; Λ)}{\partial \tilde{\mu}^{(j)k}_{u,m}} \right|_{Λ=\hat{Λ}_τ}.   (14)

Applying the chain rule of differentiation, we get for j = i:

\tilde{\mu}^{(i)k}_{u,m} \leftarrow \tilde{\mu}^{(i)k}_{u,m} - α_τ\, γ\, l_i (1 - l_i)\, \frac{d_i - 1}{g_i} \sum_t δ(\hat{q}^t - k)\, δ(\hat{r}^t_u - m) \left[ \frac{w^t_u - \hat{\mu}^{(i)k}_{u,m}}{\hat{\sigma}^{(i)k}_{u,m}} \right].   (15)

For j ≠ i, the same procedure leads to:

\tilde{\mu}^{(j)k}_{u,m} \leftarrow \tilde{\mu}^{(j)k}_{u,m} - α_τ\, γ\, l_i (1 - l_i)(1 - d_i)\, \frac{g_j^{\eta-1}}{\sum_{k \neq i} g_k^\eta} \sum_t δ(\hat{q}^t - k)\, δ(\hat{r}^t_u - m) \left[ \frac{w^t_u - \hat{\mu}^{(j)k}_{u,m}}{\hat{\sigma}^{(j)k}_{u,m}} \right].   (16)

3.4 Estimation of Gaussian variances

The updating process for the Gaussian variances is completely analogous to the one shown above for the means. Assuming again that the τ-th training sequence W_τ belongs to Ω_i, the updating process for j = i reads:

\tilde{\sigma}^{(i)k}_{u,m} \leftarrow \tilde{\sigma}^{(i)k}_{u,m} - α_τ\, γ\, l_i (1 - l_i)\, \frac{d_i - 1}{g_i} \sum_t δ(\hat{q}^t - k)\, δ(\hat{r}^t_u - m) \left[ \left( \frac{w^t_u - \hat{\mu}^{(i)k}_{u,m}}{\hat{\sigma}^{(i)k}_{u,m}} \right)^2 - 1 \right].   (17)

For j ≠ i, we get:

\tilde{\sigma}^{(j)k}_{u,m} \leftarrow \tilde{\sigma}^{(j)k}_{u,m} - α_τ\, γ\, l_i (1 - l_i)(1 - d_i)\, \frac{g_j^{\eta-1}}{\sum_{k \neq i} g_k^\eta} \sum_t δ(\hat{q}^t - k)\, δ(\hat{r}^t_u - m) \left[ \left( \frac{w^t_u - \hat{\mu}^{(j)k}_{u,m}}{\hat{\sigma}^{(j)k}_{u,m}} \right)^2 - 1 \right].   (18)

3.5 Estimation of state-transition probabilities in the HMT

Working as above, it can be shown that the updating formula for the transformed parameters \tilde{\epsilon}^{(j)k}_{u,mn} reads, for j = i:

\tilde{\epsilon}^{(i)k}_{u,mn} \leftarrow \tilde{\epsilon}^{(i)k}_{u,mn} - α_τ\, γ\, l_i (1 - l_i)\, \frac{d_i - 1}{g_i} \sum_t \left\{ δ(\hat{q}^t - k,\, \hat{r}^t_u - m,\, \hat{r}^t_{\rho(u)} - n) - \sum_p δ(\hat{q}^t - k,\, \hat{r}^t_u - p,\, \hat{r}^t_{\rho(u)} - n)\, \hat{\epsilon}^{(i)k}_{u,mn} \right\},   (19)

and for j ≠ i:

\tilde{\epsilon}^{(j)k}_{u,mn} \leftarrow \tilde{\epsilon}^{(j)k}_{u,mn} - α_τ\, γ\, l_i (1 - l_i)(1 - d_i)\, \frac{g_j^{\eta-1}}{\sum_{k \neq i} g_k^\eta} \sum_t \left\{ δ(\hat{q}^t - k,\, \hat{r}^t_u - m,\, \hat{r}^t_{\rho(u)} - n) - \sum_p δ(\hat{q}^t - k,\, \hat{r}^t_u - p,\, \hat{r}^t_{\rho(u)} - n)\, \hat{\epsilon}^{(j)k}_{u,mn} \right\}.   (20)
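To make the structure of (15) concrete, here is a small sketch of the true-class mean update for a single node u and tree state m. The function and its argument names are hypothetical stand-ins; the Viterbi paths q̂, r̂ and the quantities l_i, d_i, g_i are assumed to have been computed already.

```python
def update_mean_true_class(mu_tilde, w, q_hat, r_hat, k, m, u,
                           l_i, d_i, g_i, mu_hat, sigma_hat,
                           alpha, gamma=1.0):
    """One application of (15): accumulate (w_u^t - mu_hat)/sigma_hat over the
    frames whose Viterbi states hit HMM state k and tree state m at node u,
    then scale by the common MCE factor gamma*l_i*(1-l_i)*(d_i-1)/g_i."""
    acc = 0.0
    for t in range(len(w)):
        if q_hat[t] == k and r_hat[t][u] == m:     # the two Kronecker deltas
            acc += (w[t][u] - mu_hat) / sigma_hat
    return mu_tilde - alpha * gamma * l_i * (1.0 - l_i) * ((d_i - 1.0) / g_i) * acc
```

Note the sign: for a confusable decision (d_i near 0) the factor (d_i - 1)/g_i is negative, so the update pulls the true-class mean toward the observed coefficients.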

3.6 Estimation of state-transition probabilities in the HMM

Similarly to Section 3.5, the updating formulas for the transformed state-transition probabilities \tilde{a}^{(j)}_{sj}, using an i-class sequence, read:

\tilde{a}^{(i)}_{sj} \leftarrow \tilde{a}^{(i)}_{sj} - α_τ\, γ\, l_i (1 - l_i)\, \frac{d_i - 1}{g_i} \left\{ \sum_{t=1}^{T} δ(\hat{q}^{t-1} - s,\, \hat{q}^t - j) - \sum_{t=1}^{T} δ(\hat{q}^{t-1} - s)\, \hat{a}^{(i)}_{sj} \right\},   (21)

and for j ≠ i:

\tilde{a}^{(j)}_{sj} \leftarrow \tilde{a}^{(j)}_{sj} - α_τ\, γ\, l_i (1 - l_i)(1 - d_i)\, \frac{g_j^{\eta-1}}{\sum_{k \neq i} g_k^\eta} \left\{ \sum_{t=1}^{T} δ(\hat{q}^{t-1} - s,\, \hat{q}^t - j) - \sum_{t=1}^{T} δ(\hat{q}^{t-1} - s)\, \hat{a}^{(j)}_{sj} \right\}.   (22)

4 Experimental results

In order to assess the proposed training method, we carry out a simple automatic speech recognition test using phonemes from the TIMIT database [16]. In particular, we use the phonemes eh, ih and jh, and compare the recognition rates achieved by the proposed method against those for the same models trained only with the EM algorithm. In all experiments we use left-to-right hidden Markov models with N_Q = 3. The observation density for each state is given by an HMT with two states per node. The sequence analysis is performed on a short-term basis using Hamming windows of 256 samples, with 50% overlap between consecutive frames. On each frame, a full dyadic discrete wavelet decomposition is carried out using Daubechies wavelets with four vanishing moments [11]. In a first set of experiments, we show numerically that the recognition rate achieved with the EM algorithm attains an upper bound which cannot be surpassed either by increasing the number of re-estimations of the algorithm or by enlarging the training set. We next test the improvement in recognition rate after adding a discriminative stage to the training process.

4.1 How much improvement can the EM algorithm achieve?

Discriminative training methods usually use the maximum-likelihood estimates provided by the EM algorithm as initial values for the competitive process. Thus, it is fair to ask whether better performance could be achieved just by using more training sequences or by increasing the number of re-estimations in the EM algorithm alone.
To answer this question, we first perform a two-phoneme recognition task using models trained with the EM algorithm only and training sets of increasing sizes. The number of re-estimations was fixed at 5. The results obtained for the {eh, ih} pair are given in Fig. 1(a). The results shown are averages over ten trials for each size of the training set, and error bars indicate standard deviations. The results suggest that performance is in fact improved when we enlarge very small training sets. However, adding sequences to the training set beyond 50 samples does not translate into models achieving higher recognition rates. The effect of fixing the size of the training set and increasing the number of re-estimations used in the EM algorithm is shown in Fig. 1(b). The values given correspond to a training sample comprising 50 sequences. It can be seen that recognition rates remain fairly the same as the number of re-estimations increases. All of these results confirm that, for models trained only with the EM algorithm, performance is upper bounded and no significant improvement can be expected just by increasing the number of re-estimations or by adding sequences to the training set.

Figure 1: Recognition rates for EM training only: a) varying the size of the training set; b) increasing the number of re-estimations.

4.2 MCE training for phoneme recognition

In order to get some insight into the learning process, we first consider a classification task comprising only two phonemes. It is straightforward to see that the proposed misclassification function reduces to

d_1(W; Λ) = 1 - \frac{g_2(W; Λ)}{g_1(W; Λ)}.

When the classifier decision is right, the second term on the right-hand side of the above expression is bigger than one, and the misclassification function takes a negative value. As this decision becomes stronger, d_1(W; Λ) becomes more negative and the resulting loss (7) goes to zero. We then see from the updating formulas in Secs. 3.3-3.6 that no update is performed in such a case. So, the algorithm preserves model parameters that do well when classifying the current training signal. On the other hand, if the current training sequence is strongly misclassified, d_1(W; Λ) will tend to 1. In this case, whether the algorithm updates the parameters or not will depend on the value of γ in (7). As γ becomes larger, the loss approximation goes to one faster, and even though the classifier is making a wrong decision, no parameter update is carried out. Thus, parameter updates take place only when models are confusable, and the update is strongest when the current training sequence is equally likely for both of them.

Numerical experiments were carried out for each pair of the considered phonemes. Fifty sequences from each class were used for training, and another set of twenty sequences from each class was used for testing. Five re-estimation steps were used in the EM algorithm, along with a Viterbi flat start. Parameters for the MCE learning stage were set to γ = 1, β = 0, and η = 4. The learning rate α_τ was linearly decreased during training, starting from α_0 = 2.5. Five trials were performed, varying the number of competitive iterations through the whole training set.
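The two-phoneme behavior described above is easy to check numerically. The helper names below are ours; `update_strength` is the derivative factor γ l (1 - l) of the sigmoid that scales every re-estimation formula in Section 3.

```python
import math

def two_class_d(g1, g2):
    """Proposed misclassification measure for M = 2: d_1 = 1 - g_2/g_1,
    with g_i = -log max L positive-valued (smaller g_i means more likely)."""
    return 1.0 - g2 / g1

def update_strength(d, gamma=1.0, beta=0.0):
    """Factor gamma * l * (1 - l) multiplying every update; it peaks at d = 0
    (maximally confusable models) and vanishes for confident decisions."""
    l = 1.0 / (1.0 + math.exp(-gamma * d + beta))
    return gamma * l * (1.0 - l)
```

For instance, a confident correct decision (g_1 = 1, g_2 = 10) gives d_1 = -9 and a near-zero strength, while d_1 = 0 gives the maximum γ/4; and a larger γ also shrinks the strength for strongly misclassified sequences, as discussed above.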
The first three rows of Table 1 show the recognition rates achieved for each pair of phonemes. Consistent performance improvements are obtained for the three pairs of phonemes. For the pairs {eh, jh} and {ih, jh}, the recognition rate increases monotonically to an upper bound as the number of iterations of the algorithm increases. The recognition rate for the pair {eh, ih} shows some oscillations as the number of iterations increases. Nevertheless, it is clearly seen that discriminatively training the models significantly improves the recognition rate of the classifier. We next repeat the above experiment considering the three phonemes jointly. The results obtained are shown in the last row of Table 1. Although the recognition rate oscillates as the number of iterations increases, the improvements remain bigger than 10% up to 35 iterations. Further MCE iterations seem to decrease performance. It should be noticed that adding the phoneme jh to the classification task results in higher recognition rates than for the {eh, ih} pair alone. This is because the former is an unvoiced phoneme and it is easier to discriminate from the pair of voiced phonemes.

Table 1: Recognition rates vs. MCE iterations over the whole training set, for the phoneme sets {eh, ih}, {ih, jh}, {eh, jh} and {eh, ih, jh} (EM baseline and rates after increasing numbers of MCE iterations).

5 Conclusions

This paper introduces a new method for discriminative training of hidden Markov models whose observations are sequences in the wavelet domain. The algorithm is based on the MCE/GPD approach and allows for the training of fully non-tied HMM-HMT models. Simple speech recognition experiments show that the proposed method achieves important improvements in recognition rate over training with the standard EM algorithm alone. More extensive numerical experiments should be carried out in order to test the model with other speech material as well as with other patterns. In addition, further work should be targeted at optimally setting the parameters of the GPD optimization.

Acknowledgements

This work was carried out with financial support from UNL (CAI+D), ANPCyT (PAE-PICT) and CONICET.

References

[1] M. Afify, X. Li, and H. Jiang. Statistical analysis of minimum classification error learning for Gaussian and hidden Markov model classifiers. IEEE Transactions on Audio, Speech, and Language Processing, 15. doi: /TASL.

[2] L.R. Bahl, P.F. Brown, P.V. De Souza, and R.L. Mercer. Maximum mutual information estimation of HMM parameters for speech recognition. In Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 86), pages 49-52, 1986.

[3] P. Baldi and S. Brunak. Bioinformatics: The Machine Learning Approach. MIT Press, Cambridge, Massachusetts.

[4] W. Chou. Minimum classification error rate (MCE) approach in pattern recognition. In W. Chou and B.-H. Juang, editors, Pattern Recognition in Speech and Language Processing. CRC Press.

[5] M. Crouse, R. Nowak, and R. Baraniuk. Wavelet-based statistical signal processing using hidden Markov models. IEEE Trans. on Signal Proc., 46, 1998. doi: /.

[6] A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39:1-38, 1977.

[7] X.
He and L. Deng. A new look at discriminative training for hidden Markov models. Pattern Recognition Letters, 28. doi: /j.patrec.

[8] X. He, L. Deng, and W. Chou. Discriminative learning in sequential pattern recognition. IEEE Signal Processing Magazine, 25:14-36. doi: /MSP.

[9] B.-H. Juang, W. Chou, and C.-H. Lee. Minimum classification error rate methods for speech recognition. IEEE Transactions on Speech and Audio Processing, 5, 1997. doi: /.

[10] S. Katagiri, B.-H. Juang, and C.-H. Lee. Pattern recognition using a family of design algorithms based upon the generalized probabilistic descent method. Proceedings of the IEEE, 86, 1998. doi: /.

[11] S. Mallat. A Wavelet Tour of Signal Processing. Second Edition. Academic Press, 1999.

[12] E. McDermott and S. Katagiri. A derivation of minimum classification error from the theoretical classification risk using Parzen estimation. Computer Speech and Language, 18. doi: /S.

[13] D.H. Milone and L.E. Di Persia. An EM algorithm to learn sequences in the wavelet domain. Lecture Notes in Computer Science, 4827, 2007. doi: /.

[14] D.H. Milone, L.E. Di Persia, and D.R. Tomassi. Signal denoising with hidden Markov models using hidden Markov trees as observation densities. In Proc. of the IEEE MLSP 2008 Workshop, 2008. doi: /MLSP.

[15] L. Rabiner and B.-H. Juang. Fundamentals of Speech Recognition. Prentice-Hall, New Jersey, 1993.

[16] V. Zue, S. Seneff, and J. Glass. Speech database development: TIMIT and beyond. Speech Communication, 9, 1990.


ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H. ACE 56 Fall 005 Lecure 5: he Simple Linear Regression Model: Sampling Properies of he Leas Squares Esimaors by Professor Sco H. Irwin Required Reading: Griffihs, Hill and Judge. "Inference in he Simple

More information

Online Convex Optimization Example And Follow-The-Leader

Online Convex Optimization Example And Follow-The-Leader CSE599s, Spring 2014, Online Learning Lecure 2-04/03/2014 Online Convex Opimizaion Example And Follow-The-Leader Lecurer: Brendan McMahan Scribe: Sephen Joe Jonany 1 Review of Online Convex Opimizaion

More information

Notes on Kalman Filtering

Notes on Kalman Filtering Noes on Kalman Filering Brian Borchers and Rick Aser November 7, Inroducion Daa Assimilaion is he problem of merging model predicions wih acual measuremens of a sysem o produce an opimal esimae of he curren

More information

GMM - Generalized Method of Moments

GMM - Generalized Method of Moments GMM - Generalized Mehod of Momens Conens GMM esimaion, shor inroducion 2 GMM inuiion: Maching momens 2 3 General overview of GMM esimaion. 3 3. Weighing marix...........................................

More information

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB Elecronic Companion EC.1. Proofs of Technical Lemmas and Theorems LEMMA 1. Le C(RB) be he oal cos incurred by he RB policy. Then we have, T L E[C(RB)] 3 E[Z RB ]. (EC.1) Proof of Lemma 1. Using he marginal

More information

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t...

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t... Mah 228- Fri Mar 24 5.6 Marix exponenials and linear sysems: The analogy beween firs order sysems of linear differenial equaions (Chaper 5) and scalar linear differenial equaions (Chaper ) is much sronger

More information

Chapter 2. First Order Scalar Equations

Chapter 2. First Order Scalar Equations Chaper. Firs Order Scalar Equaions We sar our sudy of differenial equaions in he same way he pioneers in his field did. We show paricular echniques o solve paricular ypes of firs order differenial equaions.

More information

13.3 Term structure models

13.3 Term structure models 13.3 Term srucure models 13.3.1 Expecaions hypohesis model - Simples "model" a) shor rae b) expecaions o ge oher prices Resul: y () = 1 h +1 δ = φ( δ)+ε +1 f () = E (y +1) (1) =δ + φ( δ) f (3) = E (y +)

More information

Ensamble methods: Boosting

Ensamble methods: Boosting Lecure 21 Ensamble mehods: Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Schedule Final exam: April 18: 1:00-2:15pm, in-class Term projecs April 23 & April 25: a 1:00-2:30pm in CS seminar room

More information

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power Alpaydin Chaper, Michell Chaper 7 Alpaydin slides are in urquoise. Ehem Alpaydin, copyrigh: The MIT Press, 010. alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/ ehem/imle All oher slides are based on Michell.

More information

Ensamble methods: Bagging and Boosting

Ensamble methods: Bagging and Boosting Lecure 21 Ensamble mehods: Bagging and Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Ensemble mehods Mixure of expers Muliple base models (classifiers, regressors), each covers a differen par

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Probabilisic reasoning over ime So far, we ve mosly deal wih episodic environmens Excepions: games wih muliple moves, planning In paricular, he Bayesian neworks we ve seen so far describe

More information

Testing for a Single Factor Model in the Multivariate State Space Framework

Testing for a Single Factor Model in the Multivariate State Space Framework esing for a Single Facor Model in he Mulivariae Sae Space Framework Chen C.-Y. M. Chiba and M. Kobayashi Inernaional Graduae School of Social Sciences Yokohama Naional Universiy Japan Faculy of Economics

More information

Problem Set 5. Graduate Macro II, Spring 2017 The University of Notre Dame Professor Sims

Problem Set 5. Graduate Macro II, Spring 2017 The University of Notre Dame Professor Sims Problem Se 5 Graduae Macro II, Spring 2017 The Universiy of Nore Dame Professor Sims Insrucions: You may consul wih oher members of he class, bu please make sure o urn in your own work. Where applicable,

More information

Notes for Lecture 17-18

Notes for Lecture 17-18 U.C. Berkeley CS278: Compuaional Complexiy Handou N7-8 Professor Luca Trevisan April 3-8, 2008 Noes for Lecure 7-8 In hese wo lecures we prove he firs half of he PCP Theorem, he Amplificaion Lemma, up

More information

Lecture Notes 2. The Hilbert Space Approach to Time Series

Lecture Notes 2. The Hilbert Space Approach to Time Series Time Series Seven N. Durlauf Universiy of Wisconsin. Basic ideas Lecure Noes. The Hilber Space Approach o Time Series The Hilber space framework provides a very powerful language for discussing he relaionship

More information

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions Muli-Period Sochasic Models: Opimali of (s, S) Polic for -Convex Objecive Funcions Consider a seing similar o he N-sage newsvendor problem excep ha now here is a fixed re-ordering cos (> 0) for each (re-)order.

More information

Article from. Predictive Analytics and Futurism. July 2016 Issue 13

Article from. Predictive Analytics and Futurism. July 2016 Issue 13 Aricle from Predicive Analyics and Fuurism July 6 Issue An Inroducion o Incremenal Learning By Qiang Wu and Dave Snell Machine learning provides useful ools for predicive analyics The ypical machine learning

More information

R t. C t P t. + u t. C t = αp t + βr t + v t. + β + w t

R t. C t P t. + u t. C t = αp t + βr t + v t. + β + w t Exercise 7 C P = α + β R P + u C = αp + βr + v (a) (b) C R = α P R + β + w (c) Assumpions abou he disurbances u, v, w : Classical assumions on he disurbance of one of he equaions, eg. on (b): E(v v s P,

More information

INTRODUCTION TO MACHINE LEARNING 3RD EDITION

INTRODUCTION TO MACHINE LEARNING 3RD EDITION ETHEM ALPAYDIN The MIT Press, 2014 Lecure Slides for INTRODUCTION TO MACHINE LEARNING 3RD EDITION alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/~ehem/i2ml3e CHAPTER 2: SUPERVISED LEARNING Learning a Class

More information

Two Coupled Oscillators / Normal Modes

Two Coupled Oscillators / Normal Modes Lecure 3 Phys 3750 Two Coupled Oscillaors / Normal Modes Overview and Moivaion: Today we ake a small, bu significan, sep owards wave moion. We will no ye observe waves, bu his sep is imporan in is own

More information

Estimation of Poses with Particle Filters

Estimation of Poses with Particle Filters Esimaion of Poses wih Paricle Filers Dr.-Ing. Bernd Ludwig Chair for Arificial Inelligence Deparmen of Compuer Science Friedrich-Alexander-Universiä Erlangen-Nürnberg 12/05/2008 Dr.-Ing. Bernd Ludwig (FAU

More information

A variational radial basis function approximation for diffusion processes.

A variational radial basis function approximation for diffusion processes. A variaional radial basis funcion approximaion for diffusion processes. Michail D. Vreas, Dan Cornford and Yuan Shen {vreasm, d.cornford, y.shen}@ason.ac.uk Ason Universiy, Birmingham, UK hp://www.ncrg.ason.ac.uk

More information

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model Modal idenificaion of srucures from roving inpu daa by means of maximum likelihood esimaion of he sae space model J. Cara, J. Juan, E. Alarcón Absrac The usual way o perform a forced vibraion es is o fix

More information

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé Bias in Condiional and Uncondiional Fixed Effecs Logi Esimaion: a Correcion * Tom Coupé Economics Educaion and Research Consorium, Naional Universiy of Kyiv Mohyla Academy Address: Vul Voloska 10, 04070

More information

CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK

CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK 175 CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK 10.1 INTRODUCTION Amongs he research work performed, he bes resuls of experimenal work are validaed wih Arificial Neural Nework. From he

More information

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 17

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 17 EES 16A Designing Informaion Devices and Sysems I Spring 019 Lecure Noes Noe 17 17.1 apaciive ouchscreen In he las noe, we saw ha a capacior consiss of wo pieces on conducive maerial separaed by a nonconducive

More information

Final Spring 2007

Final Spring 2007 .615 Final Spring 7 Overview The purpose of he final exam is o calculae he MHD β limi in a high-bea oroidal okamak agains he dangerous n = 1 exernal ballooning-kink mode. Effecively, his corresponds o

More information

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Simulaion-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Week Descripion Reading Maerial 2 Compuer Simulaion of Dynamic Models Finie Difference, coninuous saes, discree ime Simple Mehods Euler Trapezoid

More information

Tom Heskes and Onno Zoeter. Presented by Mark Buller

Tom Heskes and Onno Zoeter. Presented by Mark Buller Tom Heskes and Onno Zoeer Presened by Mark Buller Dynamic Bayesian Neworks Direced graphical models of sochasic processes Represen hidden and observed variables wih differen dependencies Generalize Hidden

More information

Ensemble Confidence Estimates Posterior Probability

Ensemble Confidence Estimates Posterior Probability Ensemble Esimaes Poserior Probabiliy Michael Muhlbaier, Aposolos Topalis, and Robi Polikar Rowan Universiy, Elecrical and Compuer Engineering, Mullica Hill Rd., Glassboro, NJ 88, USA {muhlba6, opali5}@sudens.rowan.edu

More information

Unit Root Time Series. Univariate random walk

Unit Root Time Series. Univariate random walk Uni Roo ime Series Univariae random walk Consider he regression y y where ~ iid N 0, he leas squares esimae of is: ˆ yy y y yy Now wha if = If y y hen le y 0 =0 so ha y j j If ~ iid N 0, hen y ~ N 0, he

More information

A Primal-Dual Type Algorithm with the O(1/t) Convergence Rate for Large Scale Constrained Convex Programs

A Primal-Dual Type Algorithm with the O(1/t) Convergence Rate for Large Scale Constrained Convex Programs PROC. IEEE CONFERENCE ON DECISION AND CONTROL, 06 A Primal-Dual Type Algorihm wih he O(/) Convergence Rae for Large Scale Consrained Convex Programs Hao Yu and Michael J. Neely Absrac This paper considers

More information

Hidden Markov Models. Adapted from. Dr Catherine Sweeney-Reed s slides

Hidden Markov Models. Adapted from. Dr Catherine Sweeney-Reed s slides Hidden Markov Models Adaped from Dr Caherine Sweeney-Reed s slides Summary Inroducion Descripion Cenral in HMM modelling Exensions Demonsraion Specificaion of an HMM Descripion N - number of saes Q = {q

More information

Kriging Models Predicting Atrazine Concentrations in Surface Water Draining Agricultural Watersheds

Kriging Models Predicting Atrazine Concentrations in Surface Water Draining Agricultural Watersheds 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Kriging Models Predicing Arazine Concenraions in Surface Waer Draining Agriculural Waersheds Paul L. Mosquin, Jeremy Aldworh, Wenlin Chen Supplemenal Maerial Number

More information

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power

Learning a Class from Examples. Training set X. Class C 1. Class C of a family car. Output: Input representation: x 1 : price, x 2 : engine power Alpaydin Chaper, Michell Chaper 7 Alpaydin slides are in urquoise. Ehem Alpaydin, copyrigh: The MIT Press, 010. alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/ ehem/imle All oher slides are based on Michell.

More information

Overview. COMP14112: Artificial Intelligence Fundamentals. Lecture 0 Very Brief Overview. Structure of this course

Overview. COMP14112: Artificial Intelligence Fundamentals. Lecture 0 Very Brief Overview. Structure of this course OMP: Arificial Inelligence Fundamenals Lecure 0 Very Brief Overview Lecurer: Email: Xiao-Jun Zeng x.zeng@mancheser.ac.uk Overview This course will focus mainly on probabilisic mehods in AI We shall presen

More information

Non-parametric techniques. Instance Based Learning. NN Decision Boundaries. Nearest Neighbor Algorithm. Distance metric important

Non-parametric techniques. Instance Based Learning. NN Decision Boundaries. Nearest Neighbor Algorithm. Distance metric important on-parameric echniques Insance Based Learning AKA: neares neighbor mehods, non-parameric, lazy, memorybased, or case-based learning Copyrigh 2005 by David Helmbold 1 Do no fi a model (as do LTU, decision

More information

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes Represening Periodic Funcions by Fourier Series 3. Inroducion In his Secion we show how a periodic funcion can be expressed as a series of sines and cosines. We begin by obaining some sandard inegrals

More information

A new flexible Weibull distribution

A new flexible Weibull distribution Communicaions for Saisical Applicaions and Mehods 2016, Vol. 23, No. 5, 399 409 hp://dx.doi.org/10.5351/csam.2016.23.5.399 Prin ISSN 2287-7843 / Online ISSN 2383-4757 A new flexible Weibull disribuion

More information

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017 Two Popular Bayesian Esimaors: Paricle and Kalman Filers McGill COMP 765 Sep 14 h, 2017 1 1 1, dx x Bel x u x P x z P Recall: Bayes Filers,,,,,,, 1 1 1 1 u z u x P u z u x z P Bayes z = observaion u =

More information

Some Basic Information about M-S-D Systems

Some Basic Information about M-S-D Systems Some Basic Informaion abou M-S-D Sysems 1 Inroducion We wan o give some summary of he facs concerning unforced (homogeneous) and forced (non-homogeneous) models for linear oscillaors governed by second-order,

More information

Appendix to Creating Work Breaks From Available Idleness

Appendix to Creating Work Breaks From Available Idleness Appendix o Creaing Work Breaks From Available Idleness Xu Sun and Ward Whi Deparmen of Indusrial Engineering and Operaions Research, Columbia Universiy, New York, NY, 127; {xs2235,ww24}@columbia.edu Sepember

More information

Module 2 F c i k c s la l w a s o s f dif di fusi s o i n

Module 2 F c i k c s la l w a s o s f dif di fusi s o i n Module Fick s laws of diffusion Fick s laws of diffusion and hin film soluion Adolf Fick (1855) proposed: d J α d d d J (mole/m s) flu (m /s) diffusion coefficien and (mole/m 3 ) concenraion of ions, aoms

More information

m = 41 members n = 27 (nonfounders), f = 14 (founders) 8 markers from chromosome 19

m = 41 members n = 27 (nonfounders), f = 14 (founders) 8 markers from chromosome 19 Sequenial Imporance Sampling (SIS) AKA Paricle Filering, Sequenial Impuaion (Kong, Liu, Wong, 994) For many problems, sampling direcly from he arge disribuion is difficul or impossible. One reason possible

More information

Christos Papadimitriou & Luca Trevisan November 22, 2016

Christos Papadimitriou & Luca Trevisan November 22, 2016 U.C. Bereley CS170: Algorihms Handou LN-11-22 Chrisos Papadimiriou & Luca Trevisan November 22, 2016 Sreaming algorihms In his lecure and he nex one we sudy memory-efficien algorihms ha process a sream

More information

Let us start with a two dimensional case. We consider a vector ( x,

Let us start with a two dimensional case. We consider a vector ( x, Roaion marices We consider now roaion marices in wo and hree dimensions. We sar wih wo dimensions since wo dimensions are easier han hree o undersand, and one dimension is a lile oo simple. However, our

More information

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still.

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still. Lecure - Kinemaics in One Dimension Displacemen, Velociy and Acceleraion Everyhing in he world is moving. Nohing says sill. Moion occurs a all scales of he universe, saring from he moion of elecrons in

More information

Supplement for Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence

Supplement for Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence Supplemen for Sochasic Convex Opimizaion: Faser Local Growh Implies Faser Global Convergence Yi Xu Qihang Lin ianbao Yang Proof of heorem heorem Suppose Assumpion holds and F (w) obeys he LGC (6) Given

More information

GENERALIZATION OF THE FORMULA OF FAA DI BRUNO FOR A COMPOSITE FUNCTION WITH A VECTOR ARGUMENT

GENERALIZATION OF THE FORMULA OF FAA DI BRUNO FOR A COMPOSITE FUNCTION WITH A VECTOR ARGUMENT Inerna J Mah & Mah Sci Vol 4, No 7 000) 48 49 S0670000970 Hindawi Publishing Corp GENERALIZATION OF THE FORMULA OF FAA DI BRUNO FOR A COMPOSITE FUNCTION WITH A VECTOR ARGUMENT RUMEN L MISHKOV Received

More information

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter Sae-Space Models Iniializaion, Esimaion and Smoohing of he Kalman Filer Iniializaion of he Kalman Filer The Kalman filer shows how o updae pas predicors and he corresponding predicion error variances when

More information

How to Deal with Structural Breaks in Practical Cointegration Analysis

How to Deal with Structural Breaks in Practical Cointegration Analysis How o Deal wih Srucural Breaks in Pracical Coinegraion Analysis Roselyne Joyeux * School of Economic and Financial Sudies Macquarie Universiy December 00 ABSTRACT In his noe we consider he reamen of srucural

More information

Introduction to Probability and Statistics Slides 4 Chapter 4

Introduction to Probability and Statistics Slides 4 Chapter 4 Inroducion o Probabiliy and Saisics Slides 4 Chaper 4 Ammar M. Sarhan, asarhan@mahsa.dal.ca Deparmen of Mahemaics and Saisics, Dalhousie Universiy Fall Semeser 8 Dr. Ammar Sarhan Chaper 4 Coninuous Random

More information

Introduction D P. r = constant discount rate, g = Gordon Model (1962): constant dividend growth rate.

Introduction D P. r = constant discount rate, g = Gordon Model (1962): constant dividend growth rate. Inroducion Gordon Model (1962): D P = r g r = consan discoun rae, g = consan dividend growh rae. If raional expecaions of fuure discoun raes and dividend growh vary over ime, so should he D/P raio. Since

More information

4.1 Other Interpretations of Ridge Regression

4.1 Other Interpretations of Ridge Regression CHAPTER 4 FURTHER RIDGE THEORY 4. Oher Inerpreaions of Ridge Regression In his secion we will presen hree inerpreaions for he use of ridge regression. The firs one is analogous o Hoerl and Kennard reasoning

More information

Online Appendix to Solution Methods for Models with Rare Disasters

Online Appendix to Solution Methods for Models with Rare Disasters Online Appendix o Soluion Mehods for Models wih Rare Disasers Jesús Fernández-Villaverde and Oren Levinal In his Online Appendix, we presen he Euler condiions of he model, we develop he pricing Calvo block,

More information

ACE 564 Spring Lecture 7. Extensions of The Multiple Regression Model: Dummy Independent Variables. by Professor Scott H.

ACE 564 Spring Lecture 7. Extensions of The Multiple Regression Model: Dummy Independent Variables. by Professor Scott H. ACE 564 Spring 2006 Lecure 7 Exensions of The Muliple Regression Model: Dumm Independen Variables b Professor Sco H. Irwin Readings: Griffihs, Hill and Judge. "Dumm Variables and Varing Coefficien Models

More information

Empirical Process Theory

Empirical Process Theory Empirical Process heory 4.384 ime Series Analysis, Fall 27 Reciaion by Paul Schrimpf Supplemenary o lecures given by Anna Mikusheva Ocober 7, 28 Reciaion 7 Empirical Process heory Le x be a real-valued

More information

EXPLICIT TIME INTEGRATORS FOR NONLINEAR DYNAMICS DERIVED FROM THE MIDPOINT RULE

EXPLICIT TIME INTEGRATORS FOR NONLINEAR DYNAMICS DERIVED FROM THE MIDPOINT RULE Version April 30, 2004.Submied o CTU Repors. EXPLICIT TIME INTEGRATORS FOR NONLINEAR DYNAMICS DERIVED FROM THE MIDPOINT RULE Per Krysl Universiy of California, San Diego La Jolla, California 92093-0085,

More information

Air Traffic Forecast Empirical Research Based on the MCMC Method

Air Traffic Forecast Empirical Research Based on the MCMC Method Compuer and Informaion Science; Vol. 5, No. 5; 0 ISSN 93-8989 E-ISSN 93-8997 Published by Canadian Cener of Science and Educaion Air Traffic Forecas Empirical Research Based on he MCMC Mehod Jian-bo Wang,

More information

Linear Response Theory: The connection between QFT and experiments

Linear Response Theory: The connection between QFT and experiments Phys540.nb 39 3 Linear Response Theory: The connecion beween QFT and experimens 3.1. Basic conceps and ideas Q: How do we measure he conduciviy of a meal? A: we firs inroduce a weak elecric field E, and

More information

Lecture 9: September 25

Lecture 9: September 25 0-725: Opimizaion Fall 202 Lecure 9: Sepember 25 Lecurer: Geoff Gordon/Ryan Tibshirani Scribes: Xuezhi Wang, Subhodeep Moira, Abhimanu Kumar Noe: LaTeX emplae couresy of UC Berkeley EECS dep. Disclaimer:

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Noes for EE7C Spring 018: Convex Opimizaion and Approximaion Insrucor: Moriz Hard Email: hard+ee7c@berkeley.edu Graduae Insrucor: Max Simchowiz Email: msimchow+ee7c@berkeley.edu Ocober 15, 018 3

More information

Morning Time: 1 hour 30 minutes Additional materials (enclosed):

Morning Time: 1 hour 30 minutes Additional materials (enclosed): ADVANCED GCE 78/0 MATHEMATICS (MEI) Differenial Equaions THURSDAY JANUARY 008 Morning Time: hour 30 minues Addiional maerials (enclosed): None Addiional maerials (required): Answer Bookle (8 pages) Graph

More information

Single and Double Pendulum Models

Single and Double Pendulum Models Single and Double Pendulum Models Mah 596 Projec Summary Spring 2016 Jarod Har 1 Overview Differen ypes of pendulums are used o model many phenomena in various disciplines. In paricular, single and double

More information

Non-parametric techniques. Instance Based Learning. NN Decision Boundaries. Nearest Neighbor Algorithm. Distance metric important

Non-parametric techniques. Instance Based Learning. NN Decision Boundaries. Nearest Neighbor Algorithm. Distance metric important on-parameric echniques Insance Based Learning AKA: neares neighbor mehods, non-parameric, lazy, memorybased, or case-based learning Copyrigh 2005 by David Helmbold 1 Do no fi a model (as do LDA, logisic

More information

The Optimal Stopping Time for Selling an Asset When It Is Uncertain Whether the Price Process Is Increasing or Decreasing When the Horizon Is Infinite

The Optimal Stopping Time for Selling an Asset When It Is Uncertain Whether the Price Process Is Increasing or Decreasing When the Horizon Is Infinite American Journal of Operaions Research, 08, 8, 8-9 hp://wwwscirporg/journal/ajor ISSN Online: 60-8849 ISSN Prin: 60-8830 The Opimal Sopping Time for Selling an Asse When I Is Uncerain Wheher he Price Process

More information

MATH 5720: Gradient Methods Hung Phan, UMass Lowell October 4, 2018

MATH 5720: Gradient Methods Hung Phan, UMass Lowell October 4, 2018 MATH 5720: Gradien Mehods Hung Phan, UMass Lowell Ocober 4, 208 Descen Direcion Mehods Consider he problem min { f(x) x R n}. The general descen direcions mehod is x k+ = x k + k d k where x k is he curren

More information

Chapter 3 Boundary Value Problem

Chapter 3 Boundary Value Problem Chaper 3 Boundary Value Problem A boundary value problem (BVP) is a problem, ypically an ODE or a PDE, which has values assigned on he physical boundary of he domain in which he problem is specified. Le

More information

Math 333 Problem Set #2 Solution 14 February 2003

Math 333 Problem Set #2 Solution 14 February 2003 Mah 333 Problem Se #2 Soluion 14 February 2003 A1. Solve he iniial value problem dy dx = x2 + e 3x ; 2y 4 y(0) = 1. Soluion: This is separable; we wrie 2y 4 dy = x 2 + e x dx and inegrae o ge The iniial

More information

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature On Measuring Pro-Poor Growh 1. On Various Ways of Measuring Pro-Poor Growh: A Shor eview of he Lieraure During he pas en years or so here have been various suggesions concerning he way one should check

More information

Sequential Importance Resampling (SIR) Particle Filter

Sequential Importance Resampling (SIR) Particle Filter Paricle Filers++ Pieer Abbeel UC Berkeley EECS Many slides adaped from Thrun, Burgard and Fox, Probabilisic Roboics 1. Algorihm paricle_filer( S -1, u, z ): 2. Sequenial Imporance Resampling (SIR) Paricle

More information

5. Stochastic processes (1)

5. Stochastic processes (1) Lec05.pp S-38.45 - Inroducion o Teleraffic Theory Spring 2005 Conens Basic conceps Poisson process 2 Sochasic processes () Consider some quaniy in a eleraffic (or any) sysem I ypically evolves in ime randomly

More information

6.2 Transforms of Derivatives and Integrals.

6.2 Transforms of Derivatives and Integrals. SEC. 6.2 Transforms of Derivaives and Inegrals. ODEs 2 3 33 39 23. Change of scale. If l( f ()) F(s) and c is any 33 45 APPLICATION OF s-shifting posiive consan, show ha l( f (c)) F(s>c)>c (Hin: In Probs.

More information

Temporal probability models

Temporal probability models Temporal probabiliy models CS194-10 Fall 2011 Lecure 25 CS194-10 Fall 2011 Lecure 25 1 Ouline Hidden variables Inerence: ilering, predicion, smoohing Hidden Markov models Kalman ilers (a brie menion) Dynamic

More information

Random Walk with Anti-Correlated Steps

Random Walk with Anti-Correlated Steps Random Walk wih Ani-Correlaed Seps John Noga Dirk Wagner 2 Absrac We conjecure he expeced value of random walks wih ani-correlaed seps o be exacly. We suppor his conjecure wih 2 plausibiliy argumens and

More information

3.1 More on model selection

3.1 More on model selection 3. More on Model selecion 3. Comparing models AIC, BIC, Adjused R squared. 3. Over Fiing problem. 3.3 Sample spliing. 3. More on model selecion crieria Ofen afer model fiing you are lef wih a handful of

More information

Block Diagram of a DCS in 411

Block Diagram of a DCS in 411 Informaion source Forma A/D From oher sources Pulse modu. Muliplex Bandpass modu. X M h: channel impulse response m i g i s i Digial inpu Digial oupu iming and synchronizaion Digial baseband/ bandpass

More information

Mathematical Theory and Modeling ISSN (Paper) ISSN (Online) Vol 3, No.3, 2013

Mathematical Theory and Modeling ISSN (Paper) ISSN (Online) Vol 3, No.3, 2013 Mahemaical Theory and Modeling ISSN -580 (Paper) ISSN 5-05 (Online) Vol, No., 0 www.iise.org The ffec of Inverse Transformaion on he Uni Mean and Consan Variance Assumpions of a Muliplicaive rror Model

More information

Exponential Weighted Moving Average (EWMA) Chart Under The Assumption of Moderateness And Its 3 Control Limits

Exponential Weighted Moving Average (EWMA) Chart Under The Assumption of Moderateness And Its 3 Control Limits DOI: 0.545/mjis.07.5009 Exponenial Weighed Moving Average (EWMA) Char Under The Assumpion of Moderaeness And Is 3 Conrol Limis KALPESH S TAILOR Assisan Professor, Deparmen of Saisics, M. K. Bhavnagar Universiy,

More information