IDIAP RESEARCH REPORT

A NEURAL NETWORK FOR CLASSIFICATION WITH INCOMPLETE DATA

Andrew C. Morris

IDIAP-RR 00-23

August 2000

REDUCED VERSION TO APPEAR IN Int. Conf. on Spoken Language Processing, ICSLP 2000, Beijing

Dalle Molle Institute for Perceptual Artificial Intelligence, P.O. Box 592, Martigny, Valais, Switzerland
email: secretariat@idiap.ch


A NEURAL NETWORK FOR CLASSIFICATION WITH INCOMPLETE DATA

Andrew C. Morris

IDIAP-RR 00-23, August 2000

Abstract. If the data vector for input to an automatic classifier is incomplete, the optimal estimate for each class probability must be calculated as the expected value of the classifier output. We identify a form of Radial Basis Function (RBF) classifier whose expected outputs can easily be evaluated in terms of the original function parameters. Two ways are described in which this classifier can be applied to robust automatic speech recognition, depending on whether or not the position of missing data is known.


Contents

1. Introduction
2. IDCN architecture
2.1 Position of missing data given
2.2 Position of missing data unknown
3. IDCN training
3.1 Parameter initialisation
3.2 Error gradient calculation
3.3 Gradient descent iteration
4. Recognition with missing data
5. Summary and conclusion
Acknowledgements
Appendix A: Using HTK for both Gaussians and output layer weights initialisation
Appendix B: Derivation of IDCN error gradient equations
Appendix C: Derivation of expected class posterior probabilities
References


1. Introduction

In any realistic automatic recognition task it is common that part of the input feature vector x to be classified is corrupted by some kind of noise process, and the recognition performance of a system which is not trained to expect this kind of noise will degrade dramatically as the noise level increases. In many cases this problem can be reduced by applying some kind of noise removal or data enhancement process. But there are also many situations in which some feature components are irretrievable. The approach taken in this case depends on the extent to which it is possible to identify which features have been corrupted.

If the position of missing features is given, then the estimate for the posterior probability for each class which is best, in the sense that it gives the maximum probability of correct classification, can be obtained as the expected value of the classifier output for that class, conditioned on any available constraints on the missing data [10]. The main problem with this approach is that for most classifiers the expected value of the class probability outputs cannot be obtained as a simple closed form expression from the classifier parameters.

If the position of missing data is not known, one successful approach [6, 11, 12] has been to train a separate classifier for each possible position of missing data and then to combine the posteriors for each class as a weighted sum over all classifiers. Even with equal weights this approach shows some robustness to missing data, because uncertain classifiers tend to contribute equal and therefore small probabilities to each class. The problem with this approach is that the number of different possible positions of missing data is generally far too large to allow training of a separate classifier for each position.

In this paper we present a particular form of Radial Basis Function (RBF) classifier in which the output layer uses Bayes' rule to directly transform pooled mixture likelihoods from the RBF layer into a posteriori class probabilities [2, 3, 8, 17]. Even though the output units are non-linear, the expected outputs of this classifier, for any given missing data components, are a simple function of the original classifier parameters. The use of closely related RBF networks for recognition with missing data is not new [1], but to the author's knowledge the particular form of incomplete data classification network (IDCN) described here has not been used before in either of the techniques presented in this report.

In Section 2 we present the IDCN architecture and describe how it can be applied in two different kinds of HMM/ANN hybrid system for automatic speech recognition (ASR), depending on whether or not the position of missing data is known. In Section 3 we describe various ways in which the IDCN can be trained for ASR. Section 4 shows how network outputs (class posterior probabilities) are calculated when some of the input features are missing. In Section 5 the work is summarised, problems arising are briefly discussed and new ways forward are suggested.

2. IDCN architecture

[Figure 1: RBF network used here for classification with incomplete data. Input units x_1 ... x_{n_x} feed RBF units y_1 ... y_{n_y}, which model p(x | r_j); output units z_1 ... z_{n_z} model P(s_k | x). The output layer uses Bayes' rule to directly transform pooled mixture likelihoods from the RBF layer into a posteriori class probabilities.]

The network has one input, one hidden and one output layer, as shown in Fig. 1. Each RBF unit y_j in the hidden layer uses a diagonal covariance Gaussian y_j(x) to model the probability density p(x | r_j) for input vector x having been generated by this Gaussian, while each output unit uses a function z_k(x) to model the posterior probability that x is from output class k. If r_j denotes that x was generated by Gaussian j, and s_k that x is from class k, then:

    y_j(x) = p(x | r_j) = N(x; \mu_j, v_j)                                            (1)

    z_k(x) = P(s_k | x) = p(x, s_k) / p(x) = net_k / p(x)                             (2)

    net_k = \sum_j P(x, r_j, s_k) = \sum_j P(r_j, s_k) p(x | r_j, s_k)
          = \sum_j P(r_j, s_k) p(x | r_j) = \sum_j w_{jk} y_j(x)                      (3)

    p(x) = \sum_k \sum_j p(x, r_j, s_k) = \sum_k net_k                                (4)

Although the above structure of the IDCN does not change, the way in which it is applied depends on whether the position of missing input data is known.

2.1 Position of missing data given

The IDCN can be used as a front end to a conventional HMM based ASR system, whereby the log likelihoods which are normally calculated from the Gaussian mixture models for each hidden state are replaced, during decoding, by log scaled likelihoods from the IDCN (obtained by dividing each posterior by its class prior P(s_k), then taking the logarithm). This comprises a form of HMM/ANN based ASR system [3] which is suitable for use with missing data when the position of missing data is given. The main potential advantage of this model over the purely HMM based missing-data-theory system described in [10] is that the ANN is discriminatively trained and provides a more powerful model for capturing spectral dynamics.
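To make the flow of Eqs. (1)-(4) concrete, here is a minimal sketch of the IDCN forward pass in Python, assuming diagonal covariance Gaussians. The array names and shapes (mu, var, W) are illustrative choices, not taken from the report.

```python
import numpy as np

def idcn_forward(x, mu, var, W):
    """x: (n_x,) input; mu, var: (n_y, n_x) Gaussian parameters;
    W[j, k] = P(r_j, s_k), entries >= 0 and summing to one overall.
    Returns (y, net, z): RBF likelihoods, joint terms, class posteriors."""
    # Eq. (1): y_j(x) = N(x; mu_j, v_j), a product of univariate Gaussians,
    # computed in the log domain to avoid underflow in high dimensions
    log_y = -0.5 * np.sum((x - mu) ** 2 / var + np.log(2 * np.pi * var), axis=1)
    y = np.exp(log_y)                 # (n_y,)
    net = W.T @ y                     # Eq. (3): net_k = sum_j w_jk y_j(x)
    p_x = net.sum()                   # Eq. (4): p(x) = sum_k net_k
    z = net / p_x                     # Eq. (2): z_k(x) = P(s_k | x)
    return y, net, z
```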

2.2 Position of missing data unknown

In principle a single IDCN can be used to replace the 2^d different ANN experts which are normally required [12] to cover all possible selections of missing features from a d-dimensional feature vector. Provided that the combined features input to each ANN expert are merely concatenated (i.e. no compression, orthogonalisation, or other transformation is applied), the expected posteriors for each position of missing features can be computed directly from the IDCN parameters, and then simply combined in a linearly weighted sum [11] or geometrically weighted product [5].

3. IDCN training

Classifier parameters to be trained are the mean and variance vectors in Eq. (1) for each Gaussian RBF unit, and the output layer weights w_{jk} in Eq. (3). In order for the performance of this classifier to compete with that of the MLP, it is essential that all parameters are trained together, and with a discriminative objective function. Unsupervised discriminative training is also possible, using minimum classification error techniques [9]. However, in this article we take the simpler approach of training by supervised gradient descent. During training the softmax function is used to constrain the weights w_{jk} = P(r_j, s_k) to lie in [0, 1] and sum to one:

    w_{jk} = e^{\alpha_{jk}} / \sum_{l,m} e^{\alpha_{lm}}                             (5)

This gives the full set of parameters to be trained as (\mu_{ij}, v_{ij}, \alpha_{jk}), for 1 ≤ i ≤ n_x, 1 ≤ j ≤ n_y, 1 ≤ k ≤ n_z.

3.1 Parameter initialisation

Any hill climbing procedure can encounter problems with local minima, so that system performance may be very sensitive to the initial parameter values used. In the context of the TIDigits connected digits ASR task, the following two methods were tested [13] for initialising the RBF layer parameters (means, variances, and priors P(r_j)):

- Randomly assign each data point to an RBF centre, followed by k-means clustering and likelihood maximisation by Expectation Maximisation (EM).
- Use HTK (version 1.5) [18] to train a set of 400 pooled Gaussians, using the Baum-Welch forward-backward training algorithm, with embedded realignment. As well as training the RBF layer parameters, HTK also trains mix weights P(r_j | s_k) for each of the hidden states, as specified by whatever HMM structure is to be used in recognition.

Whichever of the above methods was used, the trained HMM model was also used to provide a training data segmentation, from which we can estimate P(s_k). Once the Gaussian parameters were initialised, two methods were tested for initialising the weights w_{jk}, using the given segmentation:

- Use HMM trained mix weights P(r_j | s_k) only:

    w_{jk} = P(r_j, s_k) = P(r_j | s_k) P(s_k)                                        (6)

- Use HMM trained Gaussians only (see Appendix A for derivation of this rule):

    P(r_j) = \sum_k P(r_j | s_k) P(s_k),
    P(s_k | r_j) = \sum_{x \in s_k} y_j(x) / \sum_x y_j(x),
    w_{jk} = P(r_j) P(s_k | r_j)                                                      (7)
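As a rough illustration of the second initialisation rule, the sketch below computes w_{jk} from Eq. (7). The inputs Y (frame likelihoods y_j(x_t)), labels (the integer state segmentation) and P_r_given_s (the HTK-trained mix weights P(r_j | s_k)) are assumed, illustratively named quantities.

```python
import numpy as np

def init_weights_method2(Y, labels, P_r_given_s, n_z):
    """Y: (T, n_y) with Y[t, j] = y_j(x_t); labels: (T,) ints in 0..n_z-1;
    P_r_given_s: (n_y, n_z). Returns W with W[j, k] = P(r_j) P(s_k | r_j)."""
    P_s = np.bincount(labels, minlength=n_z) / len(labels)  # state priors P(s_k)
    P_r = P_r_given_s @ P_s                # P(r_j) = sum_k P(r_j|s_k) P(s_k)
    # P(s_k | r_j) = sum_{x in s_k} y_j(x) / sum_x y_j(x), per Eq. (7)
    num = np.zeros((Y.shape[1], n_z))
    for k in range(n_z):
        num[:, k] = Y[labels == k].sum(axis=0)
    P_s_given_r = num / num.sum(axis=1, keepdims=True)
    W = P_r[:, None] * P_s_given_r         # w_jk = P(r_j) P(s_k | r_j)
    return W / W.sum()                     # enforce sum_jk w_jk = 1
```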

Of these different RBF layer and output layer initialisation methods, the best results by far were obtained using RBFs trained with HTK, and output weights trained using Eq. (7). Before gradient descent training, the auxiliary parameters \alpha_{jk} were then initialised as:

    \alpha_{jk} = \log(w_{jk})                                                        (8)

3.2 Error gradient calculation

Whichever error function E is used, the derivatives of E with respect to each of the model parameters were obtained by the usual error back propagation (EBP) approach, first calculating the delta values for each output unit. See Appendix B for details of the EBP algorithm and derivation of Eqs. (9)-(12):

    \delta_k = \partial E / \partial net_k
             = ( \partial E / \partial z_k - \sum_l (\partial E / \partial z_l) z_l ) / p(x)    (9)

    \partial E / \partial \mu_{ij} = ( (x_i - \mu_{ij}) / v_{ij} ) y_j \sum_k w_{jk} \delta_k   (10)

    \partial E / \partial v_{ij} = ( (x_i - \mu_{ij})^2 / (2 v_{ij}^2) - 1 / (2 v_{ij}) ) y_j \sum_k w_{jk} \delta_k   (11)

    \partial E / \partial \alpha_{jk} = w_{jk} ( 1 - w_{jk} ) y_j \delta_k            (12)

If \tau_l is the target posterior for class l, then for three common error functions (to be minimised):

    E = \sum_{n,l} ( z_l(x_n) - \tau_l(x_n) )^2 : mean square error                   (13)

    E = - \sum_{n,l} \tau_l(x_n) \log z_l(x_n) : cross-entropy                        (14)

    E = - \sum_{n,l} z_l(x_n) \tau_l(x_n) : correlation                               (15)

we have \partial E / \partial z_l (dropping the n subscript) as:

    \partial E / \partial z_l = 2 ( z_l - \tau_l ) : mean square error                (16)

    \partial E / \partial z_l = - \tau_l / z_l : cross-entropy                        (17)

    \partial E / \partial z_l = - \tau_l : correlation                                (18)

Best results here used the cross-entropy objective.
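The following sketch evaluates Eqs. (9)-(12) for a single frame under the cross-entropy objective of Eq. (17), reusing idcn_forward from the sketch in Section 2; tau is an assumed target posterior vector (e.g. one-hot state labels). This is a hedged illustration under the naming conventions above, not the report's code.

```python
import numpy as np

def idcn_gradients(x, tau, mu, var, W):
    y, net, z = idcn_forward(x, mu, var, W)
    p_x = net.sum()
    dE_dz = -tau / z                                   # Eq. (17)
    # Eq. (9): delta_k = (dE/dz_k - sum_l dE/dz_l z_l) / p(x)
    delta = (dE_dz - np.dot(dE_dz, z)) / p_x
    wd = W @ delta                                     # sum_k w_jk delta_k, (n_y,)
    scale = (y * wd)[:, None]
    dE_dmu = ((x - mu) / var) * scale                  # Eq. (10)
    dE_dvar = ((x - mu) ** 2 / (2 * var ** 2) - 1 / (2 * var)) * scale   # Eq. (11)
    dE_dalpha = W * (1 - W) * np.outer(y, delta)       # Eq. (12)
    return dE_dmu, dE_dvar, dE_dalpha
```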

3.3 Gradient descent iteration

A constant momentum factor \theta is used, together with an adaptable learning rate \varphi_t [4]. With g_t = \partial E / \partial w_t, g_0 = 0, dw_0 = 0, \varphi_0 = 1, we have (where \hat{g} denotes a unit vector):

    \varphi_{t+1} = \varphi_t ( 1 - 0.5 \hat{dw}_t \cdot \hat{g}_t )                  (19)

    dw_{t+1} = \varphi_{t+1} ( \theta dw_t - \hat{g}_t )                              (20)

    w_{t+1} = w_t + dw_{t+1}                                                          (21)

Training continues until the correct state classification rate on the cross-validation set stops increasing. The gradient with respect to all IDCN parameters was evaluated, and all parameters updated, using all frames from a fixed number of utterances selected at random from the full training set. We found that very small samples led rapidly to one or more RBFs developing zero priors, from which they could not escape. As a compromise between processing speed and performance level at convergence, we settled on samples of 100 utterances.

It was found that further training of the RBF parameters by EM after gradient descent training had converged, followed by application of Eq. (7), inevitably resulted in a very rapid increase in data likelihood, accompanied by an equally dramatic fall in classification accuracy. As a result this technique was not used.

4. Recognition with missing data

As outlined in Section 2, the way in which the IDCN is incorporated into a recognition system depends on whether or not the position of missing data is given. If it is given, then expected posterior probabilities for each state need to be calculated just once, for the given position of missing data. Otherwise the expected posteriors need to be calculated for all possible positions of missing data, and averaged [3]. Whichever is the case, for any given position of missing data we may denote the present and missing components of the feature vector by (x_p, x_m). The estimate for P(s_k | x) which results in the highest probability of correct classification is then given by the expected value of the classifier output function, conditioned on x_p and any knowledge \kappa_m which may constrain the missing data values [10]. For the RBF classifier presented here this leads to the following estimates. If nothing is known about the missing data, then (see Appendix C for derivation of Eqs. (22)-(24)):

    \hat{z}_k(x) = E[ P(s_k | x) | x_p ] \propto \sum_j w_{jk} y_j(x_p)               (22)

If each missing feature has a limited range r_m of possible values (as is the case for filterbank features, which are bounded below by zero and above by their observed value):

    \hat{z}_k(x) = E[ P(s_k | x) | x_p, x_m \in r_m ]                                 (23)

                 \propto \sum_j w_{jk} y_j(x_p) \int_{r_m} y_j(x_m) dx_m              (24)

In Eqs. (22) and (24), y_j(x_p) is the marginal diagonal Gaussian over the indicated components of x. Posteriors \hat{z}_k(x) are obtained by scaling the above values to sum to one across all classes.
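For the unbounded case of Eq. (22), marginalising a diagonal Gaussian simply drops the missing dimensions, so the expected posteriors need only the present components. A minimal sketch under the naming of the earlier sketches, with mask an assumed boolean vector marking present features:

```python
import numpy as np

def expected_posteriors_unbounded(x, mask, mu, var, W):
    """Eq. (22): E[P(s_k|x) | x_p] up to a common scale factor, then
    rescaled to sum to one. mask[i] is True where x_i is present."""
    # marginal y_j(x_p): product of univariate Gaussians over present dims
    d2 = ((x - mu) ** 2 / var + np.log(2 * np.pi * var))[:, mask]
    y_p = np.exp(-0.5 * d2.sum(axis=1))
    theta = W.T @ y_p                  # sum_j w_jk y_j(x_p), per class k
    return theta / theta.sum()         # scale to sum to one across classes
```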

It should be noted that it is only due to the consistent probabilistic interpretation of each stage of processing by this network that it is so simple to obtain the marginal posteriors in this way, directly from the full system parameters.

5. Summary and conclusion

We have shown that an RBF network in which the output layer uses Bayes' rule to directly transform pooled mixture likelihoods from the RBF layer into a posteriori class probabilities is a suitable candidate network for classification with missing data. This is because it can be discriminatively trained, and the expected values of its posterior class probability outputs can readily be evaluated as a simple function of the original model parameters. We have further shown how this network can be incorporated into two different approaches to robust ASR.

For the case where the position of missing data is known, we can integrate the IDCN into an HMM system by replacing the usual state likelihoods by scaled state likelihoods output from the IDCN. In this case the posteriors based system should show some advantage over the likelihood based system due to discriminative training. However, ASR tests [13] have shown that severe problems arise with local minima during IDCN training by gradient descent, to the point that very little performance improvement is possible after parameter initialisation through normal non-discriminative EM based HMM training. In fact, performance of the IDCN/HMM system was almost identical to that of a Gaussian mixture likelihood based HMM system using the same missing feature theory and the same method for detecting missing data [16]. It is possible that the performance of the IDCN in this case could be improved by use of a more effective discriminative HMM training procedure, such as MCE [9] and/or boosting [15].

When the position of missing data is not known, the IDCN offers a new approach to multi-stream processing which should permit large numbers of feature streams to be combined with greatly reduced effort. This approach remains to be tested.

Acknowledgements

This work was supported by the EC/OFES (European Community / Swiss Federal Office for Education and Science) RESPITE project (REcognition of Speech by Partial Information TEchniques). Recognition tests for the methods presented in this report were carried out in collaboration with the speech group at Sheffield University, U.K.

Appendix A: Using HTK for both Gaussians and output layer weights initialisation

HTK can be used to estimate the Gaussian parameters \mu_j, v_j and mix weights P(r_j | s_k) for each pooled Gaussian r_j and hidden state s_k. The trained HMMs can then be used to produce a state level segmentation. From this segmentation we can directly estimate state priors P(s_k) from the relative frequency of occurrence of each state in the training data (footnote 1). The number of free parameters to be trained should first be reduced by combining P(r_j | s_k) and P(s_k) into P(r_j, s_k). We have tested two ways of doing this.

Method 1 uses Eq. (6), Section 3.1:

    w_{jk} = P(r_j, s_k) = P(r_j | s_k) P(s_k)

Method 2 starts as method 1, but then estimates w_{jk} = P(r_j, s_k) = P(r_j) P(s_k | r_j), by first estimating P(r_j) using Eq. (7), and then estimating P(s_k | r_j), also using Eq. (7):

    P(s_k | r_j) = \sum_{x \in s_k} y_j(x) / \sum_x y_j(x)

This is derived as follows, estimating each probability by a sum over the training frames x:

    P(s_k | r_j) = P(r_j, s_k) / P(r_j)                                               (25)

    = \sum_x p(x, r_j, s_k) / \sum_x p(x, r_j)                                        (26)

    = \sum_x P(s_k | x, r_j) p(x, r_j) / \sum_x p(x, r_j)                             (27)

    = \sum_{x \in s_k} y_j(x) / \sum_x y_j(x)                                         (28)

where the segmentation fixes P(s_k | x, r_j) to one when frame x is labelled s_k and zero otherwise, and p(x, r_j) = P(r_j) y_j(x), so that P(r_j) cancels in the ratio.

In the ASR tests made, it was found that method 2 gave far better recognition results. However, it is not clear why this should be so, and so this result may not generalise to other databases.

Footnote 1: If one or more states occur only a small number of times (so that the variance of the relative error in the relative frequency estimate is unacceptably high), then all state prior estimates should be weighted towards the uniform prior 1/n. This is a commonly used probability estimate correction, which is directly related to the so called m-estimate, where the weighting factor is proportional to the prior degree of belief that the probabilities are all equal.
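As a small illustration of the prior-smoothing correction described in footnote 1, the following shrinks the relative-frequency estimates towards the uniform prior 1/n_z with an m-estimate; the smoothing strength m is an assumed parameter, not a value from the report.

```python
import numpy as np

def smoothed_state_priors(labels, n_z, m=10.0):
    """m-estimate of P(s_k): (count_k + m * (1/n_z)) / (N + m).
    labels: (T,) integer state labels from the segmentation."""
    counts = np.bincount(labels, minlength=n_z)
    return (counts + m / n_z) / (counts.sum() + m)   # -> uniform as m grows
```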

Appendix B: Derivation of IDCN error gradient equations

Although in the present case the neural network under consideration has only one hidden layer, it is still helpful to make use of the error back propagation (EBP) theoretical framework, which is based around the idea that, for any network with connections between adjacent layers only, the contributions to the error gradient for parameters in any given layer can be obtained in terms of quantities \delta which are first evaluated for each node in the layer above:

    z_l = net_l / p(x), where net_k = \sum_m w_{mk} y_m(x), p(x) = \sum_m net_m       (29)

The delta rule makes use of the chain rule for partial differentials, as follows (\delta_{kl} here denotes the Kronecker delta):

    \delta_k = \partial E / \partial net_k
             = \sum_l (\partial E / \partial z_l)(\partial z_l / \partial net_k)
             = \sum_l (\partial E / \partial z_l)(\delta_{kl} - z_l) / p(x)
             = ( \partial E / \partial z_k - \sum_l (\partial E / \partial z_l) z_l ) / p(x)    (30)

We can obtain the error gradient with respect to the output layer parameters as follows:

    \partial E / \partial w_{jk} = (\partial E / \partial net_k)(\partial net_k / \partial w_{jk}) = \delta_k y_j(x)    (31)

    \partial E / \partial \alpha_{jk} = (\partial E / \partial w_{jk})(\partial w_{jk} / \partial \alpha_{jk})          (32)

    \partial w_{jk} / \partial \alpha_{jk} = \partial / \partial \alpha_{jk} [ e^{\alpha_{jk}} / \sum_{l,m} e^{\alpha_{lm}} ]
                                           = e^{\alpha_{jk}} / \sum_{l,m} e^{\alpha_{lm}} - ( e^{\alpha_{jk}} / \sum_{l,m} e^{\alpha_{lm}} )^2    (33)

                                           = w_{jk} ( 1 - w_{jk} )                    (34)

    \partial E / \partial \alpha_{jk} = \delta_k y_j w_{jk} ( 1 - w_{jk} )            (35)

The error gradient for parameters \mu_{ij} and v_{ij} in the hidden layer can be obtained in a similar way, as follows:

    \partial E / \partial \mu_{ij} = (\partial E / \partial y_j)(\partial y_j / \partial \mu_{ij}),
    where \partial E / \partial y_j = \sum_k (\partial E / \partial net_k)(\partial net_k / \partial y_j) = \sum_k \delta_k w_{jk}    (36)

    \partial y_j / \partial \mu_{ij} = y_j (x_i - \mu_{ij}) / v_{ij},
    so \partial E / \partial \mu_{ij} = ( (x_i - \mu_{ij}) / v_{ij} ) y_j \sum_k w_{jk} \delta_k    (37)

    \partial y_j / \partial v_{ij} = y_j ( (x_i - \mu_{ij})^2 / (2 v_{ij}^2) - 1 / (2 v_{ij}) ),
    so \partial E / \partial v_{ij} = ( (x_i - \mu_{ij})^2 / (2 v_{ij}^2) - 1 / (2 v_{ij}) ) y_j \sum_k w_{jk} \delta_k    (38)
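One way to sanity-check the derivation of Eqs. (29)-(38) is a finite-difference comparison against the analytic gradients; the sketch below does this for the means under the cross-entropy objective, reusing idcn_forward and idcn_gradients from the earlier sketches. Purely illustrative.

```python
import numpy as np

def cross_entropy(x, tau, mu, var, W):
    _, _, z = idcn_forward(x, mu, var, W)
    return -np.sum(tau * np.log(z))                    # Eq. (14), one frame

def check_mu_gradient(x, tau, mu, var, W, eps=1e-6):
    """Max absolute difference between analytic dE/dmu (Eq. (37)) and
    central finite differences of the cross-entropy error."""
    analytic = idcn_gradients(x, tau, mu, var, W)[0]
    numeric = np.zeros_like(mu)
    for idx in np.ndindex(*mu.shape):
        m_plus, m_minus = mu.copy(), mu.copy()
        m_plus[idx] += eps
        m_minus[idx] -= eps
        numeric[idx] = (cross_entropy(x, tau, m_plus, var, W)
                        - cross_entropy(x, tau, m_minus, var, W)) / (2 * eps)
    return np.max(np.abs(analytic - numeric))
```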

Appendix C: Derivation of expected class posterior probabilities

When a parametric classifier z_k(x) is trained to estimate posterior class probabilities P(s_k | x), and the position of missing components in the data vector x is given, so that it can be partitioned into present and missing parts (x_p, x_m), then the estimate for the posterior probability for each class which is best, in the sense that it gives the maximum probability of correct classification, is given by the expected value of the network output, conditioned on the data which is not missing and on any available knowledge \kappa_m which may be used to constrain the missing data values (footnote 1) [10]:

    \hat{z}_k(x) = E[ z_k(x) | x_p, \kappa_m ] = E[ P(s_k | x) | x_p, \kappa_m ]
                 = \int ( p(s_k, x) / p(x) ) p(x_m | x_p, \kappa_m) dx_m              (39)

Here we will consider just two missing-data conditions: one in which nothing at all is known about the missing data values, and another in which the missing data is known to lie within a given range r_m. In the second case we have:

    p(x_m | x_p, \kappa_m) = p(x_m | x_p) / \int_{r_m} p(x_m | x_p) dx_m
                             when x_m \in r_m, else 0                                 (40)

so, using p(x) = p(x_p) p(x_m | x_p):

    \hat{z}_k(x) = \int_{r_m} p(s_k, x) dx_m / ( p(x_p) \int_{r_m} p(x_m | x_p) dx_m )
                 = A \int_{r_m} p(s_k, x) dx_m                                        (41)

where A is independent of k. The integral can easily be evaluated as follows:

    \int_{r_m} p(s_k, x) dx_m = \sum_j w_{jk} \int_{r_m} y_j(x) dx_m
                              = \sum_j w_{jk} N_j(x_p) \int_{r_m} N_j(x_m | x_p) dx_m    (42)

Here we consider only the case of diagonal covariance, so that N_j(x_m | x_p) = N_j(x_m) [14], and

    \hat{z}_k(x) = A \sum_j w_{jk} N_j(x_p) \int_{r_m} N_j(x_m) dx_m                  (43)

The integral in Eq. (43) can easily be evaluated as the product of univariate Gaussian integrals, each of which can be evaluated using the C standard erf function. If the missing data is unbounded then the integral is just unity and can be ignored. As \sum_k P(s_k | x, \kappa_m) = 1, the constant A can be eliminated, to obtain \hat{z}_k(x) as follows:

    \theta_k(x) = \sum_j w_{jk} N_j(x_p) \int_{r_m} N_j(x_m) dx_m                     (44)

    \hat{z}_k(x) = \theta_k(x) / \sum_l \theta_l(x)                                   (45)

Footnote 1: Note that the analysis presented in [1] regarding posteriors estimation with missing data is incorrect.
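A hedged sketch of the bounded-missing-data computation of Eqs. (43)-(45), with each univariate Gaussian integral evaluated through the error function as described above. The inputs mask, lo and hi (present-feature mask and per-dimension bounds on the missing features) are assumptions for illustration.

```python
import numpy as np
from scipy.special import erf

def expected_posteriors_bounded(x, mask, lo, hi, mu, var, W):
    """Eqs. (43)-(45). mask[i] True if x_i is present; lo, hi: (n_x,)
    bounds on the missing features; mu, var: (n_y, n_x); W: (n_y, n_z)."""
    sd = np.sqrt(var)
    # marginal N_j(x_p): product of univariate Gaussians over present dims
    log_yp = -0.5 * np.sum(((x - mu) ** 2 / var
                            + np.log(2 * np.pi * var))[:, mask], axis=1)
    # integral of N_j(x_m) over [lo, hi], per dimension, via erf:
    # 0.5 * (erf((hi - mu)/(sd*sqrt(2))) - erf((lo - mu)/(sd*sqrt(2))))
    cdf = 0.5 * (erf((hi - mu) / (sd * np.sqrt(2)))
                 - erf((lo - mu) / (sd * np.sqrt(2))))
    log_int = np.sum(np.log(cdf[:, ~mask] + 1e-300), axis=1)
    theta = W.T @ np.exp(log_yp + log_int)   # Eq. (44)
    return theta / theta.sum()               # Eq. (45)
```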

References

[1] Ahmed, S. & Tresp, V. (1993) "Some solutions to the missing feature problem in vision", in Advances in Neural Information Processing Systems 5, Morgan Kaufmann, San Mateo.
[2] Bishop, C. (1995) Neural Networks for Pattern Recognition, Clarendon Press, Oxford.
[3] Bourlard, H. & Morgan, N. (1993) Connectionist Speech Recognition, Kluwer Academic Publishers, Boston.
[4] Chan, L.W. & Fallside, F. (1987) "An adaptive training algorithm for back-propagation networks", Computer Speech and Language 2.
[5] Hagen, A. & Morris, A.C. (in press) "Comparison of HMM experts with MLP experts in the full combination multi-band approach to robust ASR", Proc. ICSLP 2000.
[6] Hagen, A., Morris, A.C. & Bourlard, H. (1998) "Sub-band based speech recognition in noisy conditions: The full-combination approach", Research Report IDIAP-RR.
[7] Hermansky, H., Ellis, D. & Sharma, S. (2000) "Tandem connectionist feature stream extraction for conventional HMM systems", Proc. ICASSP 2000.
[8] Lippmann, R.P. & Carlson, B.A. (1997) "Using missing feature theory to actively select features for robust speech recognition with interruptions, filtering and noise", Proc. Eurospeech 97.
[9] McDermott, E. & Katagiri, S. (1994) "Prototype-based minimum classification error / generalised probabilistic descent training for various speech units", Computer Speech and Language 8.
[10] Morris, A.C., Cooke, M. & Green, P. (1998) "Some solutions to the missing feature problem in data classification, with application to noise robust ASR", Proc. ICASSP'98.
[11] Morris, A.C., Hagen, A. & Bourlard, H. (1999) "The full-combination subbands approach to noise robust HMM/ANN based ASR", Proc. Eurospeech 99.
[12] Morris, A.C., Hagen, A., Glotin, H. & Bourlard, H. (in press) "Multi-stream adaptive evidence combination for noise robust ASR", Speech Communication.
[13] Morris, A.C., Josifovski, L., Bourlard, H., Cooke, M. & Green, P. (in press) "A neural network for classification with incomplete data: application to robust ASR", Proc. ICSLP 2000.
[14] Morrison, D.F. (1990) Multivariate Statistical Methods, 3rd edition, McGraw-Hill.
[15] Schwenk, H. (1999) "Using boosting to improve a hybrid HMM/neural network speech recogniser", Proc. ICASSP 99.
[16] Vizinho, A., Green, P., Cooke, M. & Josifovski, L. (1999) "Missing data theory, spectral subtraction and signal-to-noise estimation for robust ASR: An integrated study", Proc. Eurospeech 99.
[17] White, H. (1989) "Learning in artificial neural networks: A statistical perspective", Neural Computation 1.
[18] Young, S.J. & Woodland, P.C. (1993) HTK Version 1.5: User, Reference and Programmer Manual, Cambridge University Engineering Dept., Speech Group.
