Regression Using Support Vector Machines: Basic Foundations


Regression Using Support Vector Machines: Basic Foundations
Technical Report, December 2004
Aly Farag and Refaat M. Mohamed
Computer Vision and Image Processing Laboratory
Electrical and Computer Engineering Department
University of Louisville, Louisville, KY 40292

Support Vector Machines (SVM) were developed by Vapnik [1] to solve the classification problem, but recently SVM have been successfully extended to regression and density estimation problems [2]. SVM are gaining popularity due to many attractive features and promising empirical performance. For instance, the formulation of SVM density estimation employs the Structural Risk Minimization (SRM) principle, which has been shown to be superior to the traditional Empirical Risk Minimization (ERM) principle employed in conventional learning algorithms (e.g. neural networks) [3]. SRM minimizes an upper bound on the generalization error, as opposed to ERM, which minimizes the error on the training data. This difference makes SVM more attractive in statistical learning applications.

The traditional formulation of the SVM density estimation problem raises a quadratic optimization problem of the same size as the training data set. This computationally demanding optimization problem prevents the SVM from being the default choice of the pattern recognition community [4]. Several approaches have been introduced for circumventing the above shortcomings of SVM learning. These include simpler optimization criteria for SVM design (e.g. the kernel ADATRON [5]), specialized QP algorithms such as the conjugate gradient method, decomposition techniques (which break down the large QP problem into a series of smaller QP sub-problems), the sequential minimal optimization (SMO) algorithm and its various extensions [6], Nystrom approximations [7], greedy Bayesian methods [8], and the chunking algorithm [9]. Recently, active learning has become a popular paradigm for reducing the sample complexity of large-scale learning tasks (e.g. [10-12]). In active learning, instead of learning from random samples, the learner has the ability to select its own training data. This is done iteratively, and the output of one step is used to select the examples for the next step.

This tutorial presents the mathematical foundations of the SVM regression algorithm. Then, it presents a new learning algorithm which uses Mean Field (MF) theory. MF methods provide efficient approximations which are able to cope with the complexity of probabilistic data models [13]. MF methods replace the intractable task of computing high dimensional sums and integrals by the much easier problem of solving a system of linear equations. The regression problem is formulated so that the MF method can be used to approximate the learning procedure in a way that avoids the quadratic programming optimization. This approach is suitable for high dimensional regression problems, and several experimental examples are presented.

1 Problem Statement and Some Basic Principles

The regression problem can be stated as: given a training data set $D = \{(y_i, t_i),\ i = 1, 2, \ldots, n\}$ of input vectors $y_i$ and associated targets $t_i$, the goal is to fit a function $g(y)$ which approximates the relation inherent in the data set points and which can be used later on to infer the output $t$ for a new input data point $y$.

Any practical regression algorithm has a loss function $L(t, g(y))$, which describes how the estimated function deviates from the true one. Many forms for the loss function can be found in the literature, e.g. linear, quadratic, exponential, etc. In this tutorial, Vapnik's loss function is used, which is known as the $\varepsilon$-insensitive loss function and is defined as:

$$L(t, g(y)) = \begin{cases} 0 & \text{if } |t - g(y)| \le \varepsilon \\ |t - g(y)| - \varepsilon & \text{otherwise} \end{cases} \qquad (1)$$

Figure 1: The soft margin loss function.

where $\varepsilon > 0$ is a predefined constant which controls the noise tolerance. With the $\varepsilon$-insensitive loss function, the goal is to find $g(y)$ that has at most $\varepsilon$ deviation from the actually obtained targets $t_i$ for all training data, and is at the same time as flat as possible. In other words, the regression algorithm does not care about errors as long as they are smaller than $\varepsilon$, but will not accept any deviation larger than this.
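
As a concrete illustration of Eq. (1), the short Python sketch below (not part of the original report; the array values and the choice of $\varepsilon$ are illustrative assumptions) evaluates the $\varepsilon$-insensitive loss for a vector of targets and predictions:

```python
import numpy as np

def eps_insensitive_loss(t, g, eps=0.1):
    """Vapnik's epsilon-insensitive loss of Eq. (1):
    zero inside the eps-tube, linear outside it."""
    residual = np.abs(t - g)
    return np.maximum(residual - eps, 0.0)

# Toy check: deviations of 0.05 and 0.30 with eps = 0.1
t = np.array([1.00, 1.00])
g = np.array([1.05, 1.30])
print(eps_insensitive_loss(t, g, eps=0.1))  # -> [0.  0.2]
```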

For pedagogical reasons, the following discussion begins by describing the case of linear functions $g$, taking the form:

$$g(y) = w \cdot y + b \qquad (2)$$

where $w \in Y$, $Y$ is the input space, $b \in \mathbb{R}$, and $w \cdot y$ is the dot product of the vectors $w$ and $y$.

2 Classical Formulation of the Regression Problem

As stated before, the goal of a regression algorithm is to fit a flat function to the data points. Flatness in the case of Eq. (2) means that one seeks a small $w$. One way to ensure this flatness is to minimize the norm $\|w\|^2$. Thus, the regression problem can be written as a convex optimization problem:

$$\text{minimize} \quad \frac{1}{2}\|w\|^2 \qquad (3)$$
$$\text{subject to} \quad \begin{cases} t_i - (w \cdot y_i + b) \le \varepsilon \\ (w \cdot y_i + b) - t_i \le \varepsilon \end{cases} \qquad (4)$$

The implied assumption in Eq. (4) is that such a function $g$ actually exists that approximates all pairs $(y_i, t_i)$ with $\varepsilon$ precision, or in other words, that the convex optimization problem is feasible. Sometimes, however, this may not be the case, or we may also want to allow for some errors. Analogously to the soft margin loss function [14], which was adapted to SVM by Vapnik [15], slack variables $\zeta_i, \zeta_i^*$ can be introduced to cope with otherwise infeasible constraints of the optimization problem in Eq. (4). Hence the formulation stated in [15] is attained:

$$\text{minimize} \quad \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\zeta_i + \zeta_i^*) \qquad (5)$$
$$\text{subject to} \quad \begin{cases} t_i - (w \cdot y_i + b) \le \varepsilon + \zeta_i \\ (w \cdot y_i + b) - t_i \le \varepsilon + \zeta_i^* \\ \zeta_i, \zeta_i^* \ge 0 \end{cases} \qquad (6)$$

The constant $C > 0$ determines the trade-off between the flatness of $g$ and the amount up to which deviations larger than $\varepsilon$ are tolerated. This corresponds to dealing with the so-called $\varepsilon$-insensitive loss function described before.
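
To make the primal problem of Eqs. (5)-(6) concrete, the following minimal Python sketch sets it up with the cvxpy modeling package. This is illustrative only and is not the report's implementation: the use of cvxpy, the synthetic data, and the values of C and $\varepsilon$ are all assumptions.

```python
import numpy as np
import cvxpy as cp

# Synthetic 1-D training data (illustrative only)
rng = np.random.default_rng(0)
Y = rng.uniform(-3, 3, size=(40, 1))
t = np.sin(Y[:, 0]) + 0.1 * rng.standard_normal(40)

C, eps = 10.0, 0.1
n, d = Y.shape

w = cp.Variable(d)
b = cp.Variable()
zeta = cp.Variable(n, nonneg=True)       # slack above the tube
zeta_star = cp.Variable(n, nonneg=True)  # slack below the tube

# Objective of Eq. (5) and constraints of Eq. (6)
objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(zeta + zeta_star))
constraints = [
    t - (Y @ w + b) <= eps + zeta,
    (Y @ w + b) - t <= eps + zeta_star,
]
cp.Problem(objective, constraints).solve()
print(w.value, b.value)
```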

As shown in Fig. 1, only the points outside the shaded region contribute to the cost, insofar as the deviations are penalized in a linear fashion. It turns out that in most cases the optimization problem of Eq. (6) can be solved more easily in its dual formulation. Moreover, the dual formulation provides the key for extending the SVM to nonlinear functions. Hence, a standard dualization method utilizing Lagrange multipliers will be described next.

2.1 Dual problem and quadratic programming

The minimization problem in Eq. (6) is called the primal objective function. The key idea of the dual problem is to construct a Lagrange function from the primal objective function and the corresponding constraints, by introducing a dual set of variables. It can be shown that the Lagrange function has a saddle point with respect to the primal and dual variables at the solution (for details see e.g. [16], [17]). The primal objective function with its constraints is transformed to the Lagrange function as follows:

$$L = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\zeta_i + \zeta_i^*) - \sum_{i=1}^{n} (\lambda_i \zeta_i + \lambda_i^* \zeta_i^*) - \sum_{i=1}^{n} \alpha_i \left( \varepsilon + \zeta_i - t_i + (w \cdot y_i + b) \right) - \sum_{i=1}^{n} \alpha_i^* \left( \varepsilon + \zeta_i^* + t_i - (w \cdot y_i + b) \right) \qquad (7)$$

Here $L$ is the Lagrangian and $\alpha_i, \alpha_i^*, \lambda_i, \lambda_i^*$ are Lagrange multipliers. Hence the dual variables in Eq. (7) have to satisfy positivity constraints:

$$\alpha_i, \alpha_i^*, \lambda_i, \lambda_i^* \ge 0. \qquad (8)$$

It follows from the saddle point condition that the partial derivatives of $L$ with respect to the primal variables $(w, b, \zeta_i, \zeta_i^*)$ have to vanish for optimality (note that $\alpha_i^{(*)}$ refers to both $\alpha_i$ and $\alpha_i^*$):

$$\partial_b L = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0 \qquad (9)$$
$$\partial_w L = w - \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\, y_i = 0 \qquad (10)$$
$$\partial_{\zeta_i^{(*)}} L = C - \alpha_i^{(*)} - \lambda_i^{(*)} = 0 \qquad (11)$$

Substituting from Eqs. (9), (10), and (11) into Eq. (7) yields the dual optimization problem:

$$\text{maximize} \quad -\frac{1}{2} \sum_{i,j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)(y_i \cdot y_j) - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} t_i (\alpha_i - \alpha_i^*)$$
$$\text{subject to} \quad \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0 \quad \text{and} \quad \alpha_i, \alpha_i^* \in [0, C] \qquad (12)$$

In deriving Eq. (12), the dual variables $\lambda_i, \lambda_i^*$ are eliminated through the condition in Eq. (11), which can be reformulated as $\lambda_i^{(*)} = C - \alpha_i^{(*)}$. Eq. (10) can be rewritten as $w = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\, y_i$, thus:

$$g(y) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)(y_i \cdot y) + b \qquad (13)$$

This is the so-called Support Vector Machines regression expansion, i.e. $w$ can be completely described as a linear combination of the training patterns $y_i$. In a sense, the complexity of a function's representation by SVs is independent of the dimensionality of the input space $Y$, and depends only on the number of SVs. Moreover, the complete algorithm can be described in terms of dot products between the data. Even when evaluating $g(y)$, the value of $w$ does not need to be computed explicitly. These observations will come in handy for the formulation of a nonlinear extension.

2.2 Support Vectors

The Karush-Kuhn-Tucker (KKT) conditions [18, 19] are the basis for the Lagrangian solution. These conditions state that at the solution point, the product between dual variables and constraints has to vanish, i.e.:

$$\alpha_i \left( \varepsilon + \zeta_i - t_i + w \cdot y_i + b \right) = 0$$
$$\alpha_i^* \left( \varepsilon + \zeta_i^* + t_i - w \cdot y_i - b \right) = 0 \qquad (14)$$

$$(C - \alpha_i)\, \zeta_i = 0$$
$$(C - \alpha_i^*)\, \zeta_i^* = 0 \qquad (15)$$
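
To make the expansion of Eq. (13) concrete, the following minimal Python sketch (not from the report; the dual coefficients here are made-up values standing in for the solution of the QP in Eq. (12)) evaluates the linear SVM regression expansion at new points:

```python
import numpy as np

def svr_predict_linear(Y_train, alpha, alpha_star, b, Y_new):
    """Evaluate g(y) = sum_i (alpha_i - alpha_i*) (y_i . y) + b  (Eq. 13)."""
    coef = alpha - alpha_star      # (n,) dual coefficients from Eq. (12)
    gram = Y_train @ Y_new.T       # (n, m) dot products y_i . y
    return coef @ gram + b         # (m,) predictions

# Tiny illustrative call with made-up coefficients
Y_train = np.array([[0.0], [1.0], [2.0]])
alpha = np.array([0.0, 0.5, 0.0])
alpha_star = np.array([0.2, 0.0, 0.0])
print(svr_predict_linear(Y_train, alpha, alpha_star, b=0.1,
                         Y_new=np.array([[1.5]])))
```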

Several useful conclusions can be drawn from these conditions. Firstly, only samples $(y_i, t_i)$ with corresponding $\alpha_i^{(*)} = C$ lie outside the $\varepsilon$-insensitive tube. Secondly, $\alpha_i \alpha_i^* = 0$, i.e. there can never be a set of dual variables $\alpha_i, \alpha_i^*$ which are both simultaneously nonzero. This allows us to conclude:

$$\varepsilon - t_i + w \cdot y_i + b \ge 0 \quad \text{and} \quad \zeta_i = 0 \qquad \text{if } \alpha_i < C \qquad (16)$$
$$\varepsilon - t_i + w \cdot y_i + b \le 0 \qquad \text{if } \alpha_i > 0 \qquad (17)$$

with the analogous conditions holding for $\alpha_i^*$ and $\zeta_i^*$, with $\varepsilon + t_i - w \cdot y_i - b$ in place of $\varepsilon - t_i + w \cdot y_i + b$. (18)

A final note has to be made regarding the sparsity of the SVM expansion. From Eq. (14) it follows that only for $|t_i - g(y_i)| \ge \varepsilon$ may the Lagrange multipliers be nonzero; in other words, for all samples inside the $\varepsilon$-tube (i.e. the shaded region in Fig. 1) the $\alpha_i, \alpha_i^*$ vanish: for $|t_i - g(y_i)| < \varepsilon$ the second factor in Eq. (14) is nonzero, hence $\alpha_i, \alpha_i^*$ have to be zero so that the KKT conditions are satisfied. Therefore there is a sparse expansion of $w$ in terms of $y_i$ (i.e. not all $y_i$ are needed to describe $w$). The training samples that come with nonvanishing coefficients are called Support Vectors.

2.3 Computing b

There are many ways to compute the value of $b$ in Eq. (13). One such way can be found in [20]:

$$b = -\frac{1}{2}\, \left( w \cdot (y_r + y_s) \right) \qquad (19)$$

where $y_r$ and $y_s$ are support vectors (i.e. input vectors which have a nonzero value of $\alpha$ or $\alpha^*$, respectively).

3 Nonlinear Regression: The Kernel Trick

The next step is to make the SVM algorithm nonlinear. This, for instance, could be achieved by simply preprocessing the training patterns $y_i$ by a map $\Psi: Y \to I$ into some feature space $I$, as described in [1], and then applying the standard SVM regression algorithm. Here is a brief look at an example given in [1].

Example 1 (Quadratic features in $\mathbb{R}^2$)

Consider the map $\Psi: \mathbb{R}^2 \to \mathbb{R}^3$ with $\Psi(y_1, y_2) = (y_1^2, \sqrt{2}\, y_1 y_2, y_2^2)$. (The subscripts refer to the components of $y \in \mathbb{R}^2$.) Training a linear SVM on the preprocessed features would yield a quadratic function. While this approach seems reasonable in the particular example above, it can easily become computationally infeasible for both polynomial features of higher order and higher dimensionality.

3.1 Mapping via the Kernel

To overcome the infeasibility of the above approach, the key observation is that the feature map of Example 1 can be rewritten as:

$$(y_1^2, \sqrt{2}\, y_1 y_2, y_2^2) \cdot (y_1'^2, \sqrt{2}\, y_1' y_2', y_2'^2) = (y \cdot y')^2 \qquad (20)$$

As noted in the previous section, the SVM algorithm only depends on dot products between patterns $y_i$. Hence it suffices to know $K(y, y') = \Psi(y) \cdot \Psi(y')$ rather than $\Psi$ explicitly, which allows us to restate the SVM optimization problem as:

$$\text{maximize} \quad -\frac{1}{2} \sum_{i,j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, K(y_i, y_j) - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} t_i (\alpha_i - \alpha_i^*)$$
$$\text{subject to} \quad \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0 \quad \text{and} \quad \alpha_i, \alpha_i^* \in [0, C] \qquad (21)$$

Likewise, the expansion of $g$ in Eq. (13) may be written as $w = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\, \Psi(y_i)$ and:

$$g(y) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\, K(y_i, y) + b \qquad (22)$$

An important note here is that in the nonlinear setting, the optimization problem corresponds to finding the flattest function in feature space, not in input space. The details of the conditions for admissible SVM kernel functions can be found in [4]; roughly speaking, any positive semi-definite Reproducing Kernel Hilbert Space (RKHS) kernel function is an admissible SVM kernel. Probably the most used kernel in the literature is the Radial Basis Gaussian Function (RBGF), which is defined as:

$$K(y, y') = \exp\left( -\frac{1}{2}\, (y - y')\, \Lambda^{-1}\, (y - y')^T \right) \qquad (23)$$

where $\Lambda$ is a parameter.
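
The following minimal Python sketch implements the RBGF kernel of Eq. (23) with a diagonal $\Lambda$ and the kernelized prediction of Eq. (22). It is illustrative only; the diagonal form of $\Lambda$, the data values, and the dual coefficients are assumptions, not values from the report.

```python
import numpy as np

def rbgf_kernel(Y1, Y2, lam):
    """Gaussian kernel of Eq. (23) with a diagonal Lambda = diag(lam)."""
    diff = Y1[:, None, :] - Y2[None, :, :]           # (n1, n2, d)
    return np.exp(-0.5 * np.sum(diff**2 / lam, axis=-1))

def svr_predict_kernel(Y_train, alpha, alpha_star, b, Y_new, lam):
    """g(y) = sum_i (alpha_i - alpha_i*) K(y_i, y) + b   (Eq. 22)."""
    K = rbgf_kernel(Y_train, Y_new, lam)              # (n, m)
    return (alpha - alpha_star) @ K + b

# Illustrative call with made-up coefficients
Y_train = np.array([[0.0], [1.0], [2.0]])
alpha, alpha_star = np.array([0.3, 0.0, 0.1]), np.array([0.0, 0.4, 0.0])
print(svr_predict_kernel(Y_train, alpha, alpha_star, 0.0,
                         np.array([[0.5], [1.5]]), lam=np.array([1.0])))
```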

4 Statistical Formulation of the Regression Problem

As stated before, the main problem with the classical formulation of the SVM regression is the optimization problem in Eq. (21). The solution of such an optimization problem is of order $O(n^3)$, which is highly infeasible for a large training sample size $n$. This section presents another formulation of the SVM regression which overcomes this problem.

To construct a Bayesian framework under the assumed loss function in Eq. (1), an exponential model is employed. In this model, the likelihood of the true output $t$ at a given point $y$, provided that the machine output is $g(y)$, denoted $p(t \mid g(y))$, is assumed to follow the relationship:

$$p(t \mid g(y)) = \frac{C}{2(\varepsilon C + 1)}\, \exp\{ -C\, L(t, g(y)) \} \qquad (24)$$

Since the elements of the training sample are assumed to be statistically independent random vectors, the probabilistic interpretation of the SVM regression can be considered to have the following likelihood:

$$p(T \mid g(D)) = \left( \frac{C}{2(\varepsilon C + 1)} \right)^n \exp\left\{ -C \sum_{i=1}^{n} L(t_i, g(y_i)) \right\} \qquad (25)$$

where $T = [t_1, t_2, \ldots, t_n]$ and $g(D) = [g(y_1), g(y_2), \ldots, g(y_n)]$.

Since the SVM is considered as a maximum a posteriori probability estimator with a Gaussian prior, the prior probability distribution of the prediction $g(y)$ is assumed to be a Gaussian Process (GP). Generally, a GP is a stochastic process which is completely specified by its mean vector and covariance matrix. Thus, the prior probability for a sample $D$ can be expressed as a GP with zero mean (for simplicity) and a covariance function $K(y, y')$ as:

$$p(g(D)) = \frac{1}{\sqrt{(2\pi)^n \det(K_n)}}\, \exp\left\{ -\frac{1}{2}\, g(D)\, K_n^{-1}\, g(D)^T \right\} \qquad (26)$$

where $K_n = [K(y_i, y_j)]$ is the covariance matrix at the points of $D$. From Bayes' theorem:

$$p(g(D) \mid D) = \frac{p(D \mid g(D))\, p(g(D))}{p(D)} = \frac{M \exp\left\{ -C \sum_{i=1}^{n} L(t_i, g(y_i)) \right\} \frac{1}{\sqrt{(2\pi)^n \det(K_n)}} \exp\left\{ -\frac{1}{2}\, g(D)\, K_n^{-1}\, g(D)^T \right\}}{p(D)} \qquad (27)$$
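
To illustrate the zero-mean GP prior of Eq. (26), the sketch below builds the covariance matrix $K_n$ with the RBGF kernel and draws prior samples of $g(D)$. This is an illustration only; the scalar bandwidth, the jitter term added for numerical stability, and the input grid are assumptions.

```python
import numpy as np

def gram_matrix(Y, lam=1.0):
    """Covariance matrix K_n = [K(y_i, y_j)] using the RBGF kernel of Eq. (23)."""
    d2 = np.sum((Y[:, None, :] - Y[None, :, :])**2, axis=-1)
    return np.exp(-0.5 * d2 / lam)

rng = np.random.default_rng(0)
Y = np.linspace(-5, 5, 50).reshape(-1, 1)
K_n = gram_matrix(Y, lam=2.0) + 1e-8 * np.eye(len(Y))  # small jitter for stability

# Draw three functions g(D) ~ N(0, K_n) from the GP prior of Eq. (26)
L = np.linalg.cholesky(K_n)
samples = L @ rng.standard_normal((len(Y), 3))
print(samples.shape)  # (50, 3)
```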

where

$$M = \left( \frac{C}{2(\varepsilon C + 1)} \right)^n.$$

Let:

$$I = \int \frac{\exp\left\{ -C \sum_{i=1}^{n} L(t_i, g(y_i)) - \frac{1}{2}\, g(D)\, K_n^{-1}\, g(D)^T \right\}}{\sqrt{(2\pi)^n \det(K_n)}}\, dg(D) = \int N(g(D) \mid 0, K_n)\, \exp\left\{ -C \sum_{i=1}^{n} L(t_i, g(y_i)) \right\} dg(D) \qquad (28)$$

where $N(g(D) \mid 0, K_n)$ is a normal distribution with zero mean and covariance matrix $K_n$. Then the normalizing constant $p(D)$ can be expressed as:

$$p(D) = M\, I \qquad (29)$$

From the above discussion, it can be noted that the estimate of the posterior prediction distribution $p(g(D) \mid D)$ is the one which maximizes the numerator of Eq. (27). Equivalently, the MAP estimate is the one which minimizes:

$$\min_{g(D)} \quad C \sum_{i=1}^{n} L(t_i, g(y_i)) + \frac{1}{2}\, g(D)\, K_n^{-1}\, g(D)^T \qquad (30)$$

The traditional SVM formulation [3] stops at this point and uses quadratic programming optimization, introducing Lagrange multipliers to solve Eq. (30). The size of the optimization problem is the same as the size of the training sample. Thus, if the size of the training sample increases, the optimization problem becomes infeasible (in time and accuracy considerations), if it is feasible at all. Thus, learning algorithms are needed which avoid such a quadratic optimization. In the following, a learning algorithm which accommodates this requirement is presented.

5 Mean Field Theory for Learning of SVM Regression

An approximate formulation of the SVM regression algorithm is desirable to avoid raising the quadratic programming problem of the classical formulation. Recently, the authors of [13] suggested an advanced approach which utilizes some principles of mean field theory to cope with the Gaussian process classification problem. The basic idea of the mean field theory is to approximate the statistics of a random variable which is correlated to other random variables by assuming that the influence of the other variables can be compressed into a single effective mean field with a rather simple distribution.

In this paper, this approach is used to approximate a distribution for the SVM output $g(y_i)$, corresponding to an instance $y_i$ from the training data set, given the rest of the training data set $D^{\setminus i}$, i.e. $p(g(y_i) \mid D^{\setminus i})$. The derivation of this approximation is discussed next.

Using the posterior prediction distribution $p(g(D) \mid D)$ defined in Eq. (27), the prediction (expectation) at a new test point $y$ is given by:

$$\langle g(y) \rangle = \int g(y)\, p(g(y) \mid D)\, dg(y) = \int g(y)\, p(g(y), g(D) \mid D)\, dg(y)\, dg(D) \qquad (31)$$

Substituting from Eq. (27) into Eq. (31), and with some mathematical reduction:

$$\langle g(y) \rangle = \frac{M}{p(D)} \int g(y)\, \frac{1}{\sqrt{(2\pi)^{n+1} \det(K_{n+1})}}\, A\; dg(y)\, dg(D) \qquad (32)$$

where:

$$A = \exp\left\{ -C \sum_{i=1}^{n} L(t_i, g(y_i)) - \frac{1}{2}\, g(D, y)\, K_{n+1}^{-1}\, g(D, y)^T \right\},$$

$$K_{n+1} = \begin{pmatrix} K_n & K_n(y)^T \\ K_n(y) & K(y, y) \end{pmatrix}, \quad \text{and} \quad K_n(y) = [K(y_1, y), K(y_2, y), \ldots, K(y_n, y)].$$

But:

$$g(y) \exp\left\{ -\frac{1}{2}\, g(D, y)\, K_{n+1}^{-1}\, g(D, y)^T \right\} = -\sum_{i=1}^{n+1} K(y, y_i)\, \frac{\partial}{\partial g(y_i)} \exp\left\{ -\frac{1}{2}\, g(D, y)\, K_{n+1}^{-1}\, g(D, y)^T \right\} \qquad (33)$$

Substituting from Eq. (33) into Eq. (32) and integrating by parts, the prediction reduces to a kernel expansion:

$$\langle g(y) \rangle = \sum_{i=1}^{n} w_i\, K(y, y_i) \qquad (34)$$

where $w_i$ is a constant defined as:

$$w_i = \frac{M}{p(D)} \int N(g(D) \mid 0, K_n)\, \frac{\partial}{\partial g(y_i)} \exp\left\{ -C \sum_{j=1}^{n} L(t_j, g(y_j)) \right\} dg(D) \qquad (35)$$

The weights $w_i$ are estimated using the training sample. One way to facilitate this estimation is to assume a distribution for the expected output corresponding to an instance which is left out of the training data set. This idea is known as the Leave-One-Out principle, in which one instance $y_i$ is tentatively taken away (left out) from the training sample and its corresponding weight $w_i$ is estimated using the remaining data instances and the assumed distribution, which is defined as:

$$p(g(y_i) \mid D^{\setminus i}) = \frac{\int B\, dg(D^{\setminus y_i})}{\int B\, dg(D)} \qquad (36)$$

where $B = N(g(D) \mid 0, K_n)\, \exp\{ -C \sum_{j \ne i} L(t_j, g(y_j)) \}$, $D^{\setminus i}$ is obtained by removing the training data pattern $(y_i, t_i)$ from $D$, and $g(D^{\setminus y_i})$ is obtained by removing the instance $y_i$ from the sample $D$ (so that the numerator integrates over all outputs except $g(y_i)$). It can be noted that $p(g(y_i) \mid D^{\setminus i})$ is the predictive distribution at the test point $y_i$ given the data set $D^{\setminus i}$. With the predictive distribution $p(g(y_i) \mid D^{\setminus i})$, an average (expected) value is defined by:

$$\langle V \rangle_{\setminus i} = \int V\, p(g(y_i) \mid D^{\setminus i})\, dg(y_i) \qquad (37)$$

where $\langle V \rangle_{\setminus i}$ denotes the expected value of $V$ given only the data sample $D^{\setminus i}$. Substituting from Eq. (29) into Eq. (35) for the normalizing constant $p(D)$, and then using Eq. (36) and Eq. (37), the weight coefficient $w_i$ in Eq. (35) can be rewritten as:

$$w_i = \frac{\left\langle \frac{\partial}{\partial g(y_i)} \exp\{ -C\, L(t_i, g(y_i)) \} \right\rangle_{\setminus i}}{\left\langle \exp\{ -C\, L(t_i, g(y_i)) \} \right\rangle_{\setminus i}} \qquad (38)$$

Thus, the weight coefficients in Eq. (34) can be obtained from the rate of variation of the likelihood with respect to the local predictive distribution $p(g(y_i) \mid D^{\setminus i})$. Depending on the form of the local predictive distribution $p(g(y_i) \mid D^{\setminus i})$, a formula for calculating this weight can be obtained. In this paper, a Gaussian approximation is used for $p(g(y_i) \mid D^{\setminus i})$, which has the form:

$$p(g(y_i) \mid D^{\setminus i}) \approx \frac{1}{\sqrt{2\pi}\, \sigma_i} \exp\left\{ -\frac{\left( g(y_i) - \langle g(y_i) \rangle_{\setminus i} \right)^2}{2\sigma_i^2} \right\} \qquad (39)$$

where the variance is defined as $\sigma_i^2 = \langle g(y_i)^2 \rangle_{\setminus i} - \langle g(y_i) \rangle_{\setminus i}^2$.

Inserting Eq. (39) into Eq. (37) and evaluating Eq. (38), the weight coefficients can be obtained as:

$$w_i = \frac{F\left( \langle g(y_i) \rangle_{\setminus i}, \sigma_i \right)}{G\left( \langle g(y_i) \rangle_{\setminus i}, \sigma_i \right)} = \frac{F}{G} \qquad (40)$$

where:

$$F = \frac{C}{2} \exp\left\{ C\left( \langle g(y_i) \rangle_{\setminus i} - t_i + \varepsilon \right) + \frac{C^2 \sigma_i^2}{2} \right\} \left( 1 - \operatorname{erf}\left\{ \frac{\langle g(y_i) \rangle_{\setminus i} - t_i + \varepsilon + C\sigma_i^2}{\sqrt{2}\, \sigma_i} \right\} \right) - \frac{C}{2} \exp\left\{ C\left( t_i - \langle g(y_i) \rangle_{\setminus i} + \varepsilon \right) + \frac{C^2 \sigma_i^2}{2} \right\} \left( 1 - \operatorname{erf}\left\{ \frac{t_i - \langle g(y_i) \rangle_{\setminus i} + \varepsilon + C\sigma_i^2}{\sqrt{2}\, \sigma_i} \right\} \right)$$

and

$$G = \frac{1}{2} \operatorname{erf}\left\{ \frac{t_i - \langle g(y_i) \rangle_{\setminus i} + \varepsilon}{\sqrt{2}\, \sigma_i} \right\} - \frac{1}{2} \operatorname{erf}\left\{ \frac{t_i - \langle g(y_i) \rangle_{\setminus i} - \varepsilon}{\sqrt{2}\, \sigma_i} \right\} + \frac{1}{2} \exp\left\{ C\left( \langle g(y_i) \rangle_{\setminus i} - t_i + \varepsilon \right) + \frac{C^2 \sigma_i^2}{2} \right\} \left( 1 - \operatorname{erf}\left\{ \frac{\langle g(y_i) \rangle_{\setminus i} - t_i + \varepsilon + C\sigma_i^2}{\sqrt{2}\, \sigma_i} \right\} \right) + \frac{1}{2} \exp\left\{ C\left( t_i - \langle g(y_i) \rangle_{\setminus i} + \varepsilon \right) + \frac{C^2 \sigma_i^2}{2} \right\} \left( 1 - \operatorname{erf}\left\{ \frac{t_i - \langle g(y_i) \rangle_{\setminus i} + \varepsilon + C\sigma_i^2}{\sqrt{2}\, \sigma_i} \right\} \right) \qquad (41)$$

Equations (40) and (41) are called the Mean Field equations corresponding to the weight coefficient $w_i$. To evaluate the weight coefficients in Eq. (40), it is required to obtain both the mean $\langle g(y_i) \rangle_{\setminus i}$ and the variance $\sigma_i^2$ of the assumed Gaussian model for the local predictive distribution $p(g(y_i) \mid D^{\setminus i})$. The detailed derivation of both $\langle g(y_i) \rangle_{\setminus i}$ and $\sigma_i^2$ based on the mean field theory can be found in [13]; only the final results are summarized here. The posterior average at $y_i$ is given by:

$$\langle g(y_i) \rangle = \sum_{j=1}^{n} w_j\, K(y_i, y_j) \qquad (42)$$

From [13], the following results are obtained:

$$\langle g(y_i) \rangle_{\setminus i} \approx \langle g(y_i) \rangle - \sigma_i^2\, w_i \qquad (43)$$

and

$$\sigma_i^2 \approx \frac{1}{\left[ (\Sigma + K_n)^{-1} \right]_{ii}} - \Sigma_i \qquad (44)$$

where $\Sigma = \operatorname{diag}(\Sigma_1, \Sigma_2, \ldots, \Sigma_n)$ and

$$\Sigma_i = -\left( \frac{\partial w_i}{\partial \langle g(y_i) \rangle_{\setminus i}} \right)^{-1} - \sigma_i^2.$$

The expression for $\partial w_i / \partial \langle g(y_i) \rangle_{\setminus i}$ can be obtained by differentiating Equations (40) and (41); since $F = \partial G / \partial \langle g(y_i) \rangle_{\setminus i}$, it can be written as:

$$\frac{\partial w_i}{\partial \langle g(y_i) \rangle_{\setminus i}} = \frac{1}{G\left( \langle g(y_i) \rangle_{\setminus i}, \sigma_i \right)}\, \frac{\partial F\left( \langle g(y_i) \rangle_{\setminus i}, \sigma_i \right)}{\partial \langle g(y_i) \rangle_{\setminus i}} - w_i^2 \qquad (45)$$

where the mass of the local predictive distribution inside the $\varepsilon$-tube,

$$IG = \int_{t_i - \varepsilon}^{t_i + \varepsilon} p(g(y_i) \mid D^{\setminus i})\, dg(y_i) = \frac{1}{2} \operatorname{erf}\left\{ \frac{t_i - \langle g(y_i) \rangle_{\setminus i} + \varepsilon}{\sqrt{2}\, \sigma_i} \right\} - \frac{1}{2} \operatorname{erf}\left\{ \frac{t_i - \langle g(y_i) \rangle_{\setminus i} - \varepsilon}{\sqrt{2}\, \sigma_i} \right\},$$

enters the evaluation of $\partial F / \partial \langle g(y_i) \rangle_{\setminus i}$.

6 Summary of the Proposed MF-Based SVM Regression Algorithm

The implementation steps of the proposed approach for density estimation using SVM, with the mean field theory applied to the learning process, are presented below:

1. Consider the training data set D.
2. Set a learning rate η and randomly initialize the $w_i$'s.

3. Choose a kernel $K(y, y')$ and, accordingly, calculate the covariance matrix $K_n$; let $\sigma_i^2 = [K_n]_{ii}$.
4. Iterate steps 5 and 6 until convergence of the $w_i$'s.
5. Inner loop: For $i = 1, 2, \ldots, n$ do:
   5.1 Calculate $\langle g(y_i) \rangle$ from Eq. (42).
   5.2 Calculate $\langle g(y_i) \rangle_{\setminus i}$ from Eq. (43).
   5.3 Calculate $F$ and $G$ from Eq. (41).
   5.4 Update $w_i$ by: $w_i = w_i + \eta \left( \frac{F}{G} - w_i \right)$.
6. Outer loop: After every $M$ iterations of the $w_i$ updates, update $\sigma_i^2$ from Eq. (44).

6.1 Remarks on the MF-Based SVM Density Estimation Algorithm

1. The most computationally expensive step in the above algorithm is the inversion of the matrix $K_n + \Sigma$ in step 6. So, it is recommended that step 6 of the outer loop iterate less frequently than step 5 of the inner loop. For example, after $M = 10$ iterations of updating $w$, there is one update of $\sigma_i^2$.
2. The optimization needed to obtain the weights is carried out in the feature space, i.e. after applying the kernel function to the input samples.
3. Since the optimization is done in the feature space, the optimization does not depend on the input space dimensionality, and neither does the density estimation procedure.

7 Sample Results

In this section, a sample of experimental results on the SVM regression is introduced. In this experiment, a data set of 41 points is generated from a mixture of Gaussian functions:

$$f(y) = N(-10, 9) + N(0, 2.5) + N(10, 9) \qquad (46)$$
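
A toy experiment of this kind can be reproduced with the short Python sketch below. It is not the report's MATLAB implementation: scikit-learn's QP-based SVR is used here only as a stand-in for the classical formulation, and the input range, absence of noise, and hyperparameter values (C, ε, γ) are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.svm import SVR

# 41 sample points and the mixture-of-Gaussians target of Eq. (46)
y = np.linspace(-20, 20, 41).reshape(-1, 1)           # assumed input range
t = (norm.pdf(y[:, 0], loc=-10, scale=3) +            # N(-10, 9): variance 9
     norm.pdf(y[:, 0], loc=0, scale=np.sqrt(2.5)) +   # N(0, 2.5)
     norm.pdf(y[:, 0], loc=10, scale=3))              # N(10, 9)

# Classical (QP-based) epsilon-SVR with an RBF kernel as a stand-in
model = SVR(kernel="rbf", C=100.0, epsilon=0.005, gamma=0.1)  # assumed hyperparameters
model.fit(y, t)

approx = model.predict(y)
print("max abs error:", np.max(np.abs(approx - t)))
```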

Figure 2: Function approximation for a mixture of Gaussians function using: (a) the classical formulation, and (b) the statistical formulation of the SVM regression algorithm. (Each panel plots the true function together with its approximation.)

Figure 2 shows the results of applying the SVM regression algorithm to approximate f(y) and illustrates the good performance of the SVM regression algorithm with both types of implementation. The superiority of the statistically based formulation would appear with data sets of large size. The SVM regression is implemented using MATLAB software. The following link contains both the classical and the statistically based implementations.

8 Conclusion

An overview of the mathematical foundations of SVM regression was introduced. The basics of the regression process and the idea of the soft margin loss function were discussed. The classical formulation of the SVM regression algorithm was introduced, along with its shortcomings. The formulation of SVM regression in a statistical setup was then discussed, with the advantage of avoiding the shortcomings of the classical formulation.

References

[1] V. Vapnik, The Nature of Statistical Learning Theory, Second Edition, Springer, New York, 2001.

[2] Refaat M. Mohamed and Aly A. Farag, "Classification of Multispectral Data Using Support Vector Machines Approach for Density Estimation," IEEE Seventh International Conference on Intelligent Engineering Systems (INES'03), Assiut, Egypt, March 2003.

[3] V. Vapnik, S. Golowich, and A. Smola, "Support Vector Method for Multivariate Density Estimation," Advances in Neural Information Processing Systems, Vol. 12, MIT Press.

[4] B. Scholkopf, C. Burges, and A. Smola, Advances in Kernel Methods - Support Vector Learning, MIT Press, Cambridge, MA, 1999.

[5] T. Friess, N. Cristianini, and C. Campbell, "The Kernel ADATRON Algorithm: A Fast and Simple Learning Procedure for Support Vector Machines," 15th International Conference on Machine Learning, July 24-27, 1998, Madison, Wisconsin, USA.

[6] J. Platt, "Fast Training of Support Vector Machines Using Sequential Minimal Optimization," in Advances in Kernel Methods - Support Vector Learning, MIT Press, Cambridge, MA, 1999.

[7] C. Williams and M. Seeger, "Using the Nystrom Method to Speed Up Kernel Machines," Advances in Neural Information Processing Systems, Vol. 14, 2001.

[8] M. Tipping and A. Faul, "Fast Marginal Likelihood Maximization for Sparse Bayesian Models," International Workshop on AI and Statistics, 2003.

[9] C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," Data Mining and Knowledge Discovery, Vol. 2, No. 2, pp. 121-167, 1998.

[10] P. Mitra, C. Murthy, and S. Pal, "A Probabilistic Active Support Vector Learning Algorithm," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 3, March 2004.

[11] D. Cohn, Z. Ghahramani, and M. Jordan, "Active Learning with Statistical Models," Journal of Artificial Intelligence Research, Vol. 4, pp. 129-145, 1996.

[12] D. MacKay, "Information-Based Objective Functions for Active Data Selection," Neural Computation, Vol. 4, No. 4, 1992.

[13] M. Opper and O. Winther, "Gaussian Processes for Classification: Mean Field Algorithms," Neural Computation, Vol. 12, 2000.

[14] K. P. Bennett and O. L. Mangasarian, "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets," Optimization Methods and Software, Vol. 1, pp. 23-34, 1992.

[15] C. Cortes and V. Vapnik, "Support Vector Networks," Machine Learning, Vol. 20, pp. 273-297, 1995.

[16] O. L. Mangasarian, Nonlinear Programming, McGraw-Hill, New York, 1969.

[17] R. J. Vanderbei, "LOQO User's Manual - Version 3.10," Technical Report SOR-97-08, Princeton University, Statistics and Operations Research, 1997.

[18] W. Karush, "Minima of Functions of Several Variables with Inequalities as Side Constraints," Master's thesis, Dept. of Mathematics, University of Chicago, 1939.

[19] H. W. Kuhn and A. W. Tucker, "Nonlinear Programming," 2nd Berkeley Symposium on Mathematical Statistics and Probability, pp. 481-492, Berkeley, 1951.

[20] S. R. Gunn, "Support Vector Machines for Classification and Regression," Technical Report, University of Southampton, School of Electronics and Computer Science, 1998.
