Kernel Methods and SVMs

Statistical Machine Learning Notes 7
Instructor: Justin Domke

Contents

1 Introduction
2 Kernel Ridge Regression
3 The Kernel Trick
4 Support Vector Machines
5 Examples
6 Kernel Theory
  6.1 Kernel algebra
  6.2 Understanding Polynomial Kernels via Kernel Algebra
  6.3 Mercer's Theorem
7 Our Story so Far
8 Discussion
  8.1 SVMs as Template Methods
  8.2 Theoretical Issues

1 Introduction

Support Vector Machines (SVMs) are a very successful and popular set of techniques for classification. Historically, SVMs emerged after the neural network boom of the 80s and early 90s. People were surprised to see that SVMs with little to no tweaking could compete with neural networks involving a great deal of manual engineering. It remains true today that SVMs are among the best off-the-shelf classification methods. If you want to get good results with a minimum of messing around, SVMs are a very good choice.

Unlike the other classification methods we discuss, it is not convenient to begin with a concise definition of SVMs, or even to say what exactly a support vector is. There is a set of ideas that must be understood first. Most of these you have already seen in the notes on linear methods, basis expansions, and template methods. The biggest remaining concept is known as the kernel trick. In fact, this idea is so fundamental that many people have advocated that SVMs be renamed Kernel Machines.

It is worth mentioning that the standard presentation of SVMs is based on the concept of margin. For lack of time, this perspective on SVMs will not be presented here. If you will be working seriously with SVMs, you should familiarize yourself with the margin perspective to enjoy a full understanding.

(Warning: These notes are probably the most technically challenging in this course, particularly if you don't have a strong background in linear algebra, Lagrange multipliers, and optimization. Kernel methods simply use more mathematical machinery than most of the other techniques we cover, so you should be prepared to put in some extra effort. Enjoy!)

2 Kernel Ridge Regression

We begin by not talking about SVMs, or even about classification. Instead, we revisit ridge regression, with a slight change of notation. Let the set of inputs be $\{(x_i, y_i)\}$, where $i$ indexes the samples. The problem is to minimize

$$ \sum_i (x_i \cdot w - y_i)^2 + \lambda\, w \cdot w. $$

If we take the derivative with respect to $w$ and set it to zero, we get

$$ 0 = \sum_i 2 x_i (x_i^T w - y_i) + 2\lambda w $$
$$ w = \Big(\sum_i x_i x_i^T + \lambda I\Big)^{-1} \sum_i x_i y_i. $$

Now, let's consider a different derivation, making use of some Lagrange duality. If we introduce a new variable $z_i$, and constrain it to be the difference between $w \cdot x_i$ and $y_i$, we have

$$ \min_{w,z}\; \frac{1}{2}\sum_i z_i^2 + \frac{1}{2}\lambda\, w \cdot w \qquad (2.1) $$
$$ \text{s.t. } z_i = x_i \cdot w - y_i. $$

Using $\alpha_i$ to denote the Lagrange multipliers, this has the Lagrangian

$$ L = \frac{1}{2}\sum_i z_i^2 + \frac{1}{2}\lambda\, w \cdot w + \sum_i \alpha_i (x_i \cdot w - y_i - z_i). $$

Recall our foray into Lagrange duality. We can solve the original problem by doing

$$ \max_\alpha\; \min_{w,z}\; L(w, z, \alpha). $$

To begin, we attack the inner minimization: for fixed $\alpha$, we would like to solve for the minimizing $w$ and $z$. We can do this by setting the derivatives of $L$ with respect to $z_i$ and $w$ to zero. Doing this, we can find¹

$$ z_i = \alpha_i, \qquad w = -\frac{1}{\lambda}\sum_i \alpha_i x_i. \qquad (2.2) $$

So, we can solve the problem by maximizing the Lagrangian (with respect to $\alpha$), where we substitute the above expressions for $z$ and $w$. Thus, we have an unconstrained maximization

$$ \max_\alpha\; L(w^*(\alpha), z^*(\alpha), \alpha). $$

¹ $0 = \frac{dL}{dz_i} = z_i - \alpha_i$, $\qquad 0 = \frac{dL}{dw} = \lambda w + \sum_i \alpha_i x_i$.

Before diving into the details of that, we can already notice something very interesting happening here: $w$ is given by a sum of the input vectors $x_i$, weighted by $-\alpha_i/\lambda$. If we were so inclined, we could avoid explicitly computing $w$, and predict a new point $x$ directly from the data as

$$ f(x) = x \cdot w = -\frac{1}{\lambda}\sum_i \alpha_i\, x \cdot x_i. $$

Now, let $k(x, x_i) = x \cdot x_i$ be the kernel function. For now, just think of this as a change of notation. Using this, we can again write the ridge regression predictions as

$$ f(x) = -\frac{1}{\lambda}\sum_i \alpha_i\, k(x, x_i). $$

Thus, all we really need is the inner product of $x$ with each of the training elements $x_i$. We will return to why this might be useful later. First, let's return to doing the maximization over the Lagrange multipliers $\alpha$, to see if anything similar happens there. The math below looks really complicated. However, all we are doing is substituting the expressions for $z$ and $w$ from Eq. 2.2, then doing a lot of manipulation.

$$ \max_\alpha \min_{w,z} L = \max_\alpha\; \frac{1}{2}\sum_i \alpha_i^2 + \frac{\lambda}{2}\Big(-\frac{1}{\lambda}\sum_i \alpha_i x_i\Big)\cdot\Big(-\frac{1}{\lambda}\sum_j \alpha_j x_j\Big) + \sum_i \alpha_i\Big(-\frac{1}{\lambda}\sum_j \alpha_j\, x_i \cdot x_j - y_i - \alpha_i\Big) $$
$$ = \max_\alpha\; \frac{1}{2}\sum_i \alpha_i^2 + \frac{1}{2\lambda}\sum_i\sum_j \alpha_i \alpha_j\, x_i \cdot x_j - \frac{1}{\lambda}\sum_i\sum_j \alpha_i \alpha_j\, x_i \cdot x_j - \sum_i \alpha_i (y_i + \alpha_i) $$
$$ = \max_\alpha\; -\frac{1}{2}\sum_i \alpha_i^2 - \frac{1}{2\lambda}\sum_i\sum_j \alpha_i \alpha_j\, x_i \cdot x_j - \sum_i \alpha_i y_i $$
$$ = \max_\alpha\; -\frac{1}{2}\sum_i \alpha_i^2 - \frac{1}{2\lambda}\sum_i\sum_j \alpha_i \alpha_j\, k(x_i, x_j) - \sum_i \alpha_i y_i. $$

Again, we only need inner products. If we define the matrix $K$ by $K_{ij} = k(x_i, x_j)$, then we can rewrite this in a punchier vector notation as

$$ \max_\alpha \min_{w,z} L = \max_\alpha\; -\frac{1}{2}\alpha \cdot \alpha - \frac{1}{2\lambda}\,\alpha^T K \alpha - \alpha \cdot y. $$

Here, we use a capital $K$ to denote the matrix with entries $K_{ij}$ and a lowercase $k$ to denote the kernel function $k(\cdot,\cdot)$. Note that most literature on kernel machines mildly abuses notation by using the capital letter $K$ for both.

The thing on the right is just a quadratic in $\alpha$. As such, we can find the optimum as the solution of a linear system². What is important is the observation that, again, we only need the inner products of the data $k(x_i, x_j) = x_i \cdot x_j$ to do the optimization over $\alpha$. Then, once we have solved for $\alpha$, we can predict $f(x)$ for new $x$ again using only inner products. If someone tells us all the inner products, we don't need the original data $\{x_i\}$ at all!

² It is easy to show (by taking the gradient) that the optimum is at $\alpha = -\big(\frac{1}{\lambda} K + I\big)^{-1} y$.
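To make this concrete, here is a minimal numpy sketch of kernel ridge regression in exactly this dual form (the function names are mine, not part of the notes). With the linear kernel it must reproduce ordinary ridge regression, which gives a useful check:

```python
import numpy as np

def linear_kernel(X, Z):
    """Gram matrix K[i, j] = x_i . z_j for the rows of X and Z."""
    return X @ Z.T

def kernel_ridge_fit(K, y, lam):
    """Dual variables from the footnote above: alpha = -(K / lambda + I)^(-1) y."""
    return -np.linalg.solve(K / lam + np.eye(len(y)), y)

def kernel_ridge_predict(K_new, alpha, lam):
    """f(x) = -(1/lambda) sum_i alpha_i k(x, x_i), with K_new[m, i] = k(x_new_m, x_i)."""
    return -(K_new @ alpha) / lam

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=40)
lam = 0.1

alpha = kernel_ridge_fit(linear_kernel(X, X), y, lam)
f_dual = kernel_ridge_predict(linear_kernel(X, X), alpha, lam)

# with the linear kernel this must agree with ordinary (primal) ridge regression
w = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
print(np.allclose(f_dual, X @ w))  # True
```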

3 The Kernel Trick

So we can work completely with inner products, rather than the vectors themselves. So what?

One way of looking at things is that we can implicitly use basis expansions. If we want to take $x_i$ and transform it into some fancy feature space $\phi(x_i)$, we can replace the kernel function by

$$ K_{ij} = k(x_i, x_j) = \phi(x_i) \cdot \phi(x_j). $$

The point of talking about this is that for certain basis expansions, we can compute $k$ very cheaply without ever explicitly forming $\phi(x_i)$ or $\phi(x_j)$. This can mean a huge computational savings. A nice example of this is the kernel function

$$ k(x, v) = (x \cdot v)^2. $$

We can see that

$$ k(x, v) = \Big(\sum_i x_i v_i\Big)^2 = \Big(\sum_i x_i v_i\Big)\Big(\sum_j x_j v_j\Big) = \sum_i \sum_j x_i x_j\, v_i v_j. $$

It is not hard to see that $k(x, v) = \phi(x) \cdot \phi(v)$, where $\phi$ is a quadratic basis expansion containing all pairwise products, $\phi_m(x) = x_i x_j$. For example, in two dimensions,

$$ k(x, v) = (x_1 v_1 + x_2 v_2)^2 = x_1 x_1 v_1 v_1 + x_1 x_2 v_1 v_2 + x_2 x_1 v_2 v_1 + x_2 x_2 v_2 v_2, $$

while the basis expansions are

$$ \phi(x) = (x_1 x_1,\; x_1 x_2,\; x_2 x_1,\; x_2 x_2), \qquad \phi(v) = (v_1 v_1,\; v_1 v_2,\; v_2 v_1,\; v_2 v_2). $$

It is not hard to work out that $k(x, v) = \phi(x) \cdot \phi(v)$. However, notice that we can compute $k(x, v)$ in time $O(d)$, rather than the $O(d^2)$ time it would take to explicitly compute $\phi(x) \cdot \phi(v)$. This is the "kernel trick": getting around the computational expense in computing large basis expansions by directly computing kernel functions.

Notice, however, that the kernel trick changes nothing, nada, zero about the statistical issues with huge basis expansions. We get exactly the same predictions as if we computed the basis expansion explicitly, and used traditional linear methods. We just compute the predictions in a different way.

In fact, we can invent a new kernel function $k(x, v)$ and, as long as it obeys certain rules, use it in the above algorithm, without explicitly thinking about basis expansions at all. Some common examples are:

    name                     k(x, v)
    Linear                   $x \cdot v$
    Polynomial               $(r + x \cdot v)^d$, for some $r, d > 0$
    Radial Basis Function    $\exp(-\gamma \|x - v\|^2)$, $\gamma > 0$
    Gaussian                 $\exp\big(-\frac{1}{2\sigma^2} \|x - v\|^2\big)$
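A quick numerical illustration of this particular kernel trick (a small sketch; `quad_features` below is just the explicit $O(d^2)$ expansion described above):

```python
import numpy as np
from itertools import product

def quad_features(x):
    """Explicit quadratic expansion phi(x): all pairwise products x_i * x_j (d^2 entries)."""
    return np.array([x[i] * x[j] for i, j in product(range(len(x)), repeat=2)])

rng = np.random.default_rng(0)
x, v = rng.normal(size=5), rng.normal(size=5)

k_direct = (x @ v) ** 2                            # O(d) work
k_explicit = quad_features(x) @ quad_features(v)   # O(d^2) work
print(np.isclose(k_direct, k_explicit))            # True
```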

We will return below to the question of what kernel functions are "legal", meaning there is some feature space $\phi$ such that $k(x, v) = \phi(x) \cdot \phi(v)$.

Now, what exactly was it about ridge regression that let us get away with working entirely with inner products? How much could we change the problem, and preserve this? We really need two things to happen:

1. When we take $dL/dw = 0$, we need to be able to solve for $w$, and the solution needs to be a linear combination of the input vectors $x_i$.

2. When we substitute this solution back into the Lagrangian, we need to get a solution that simplifies down into inner products only.

Notice that this leaves us a great deal of flexibility. For example, we could replace the least-squares criterion $\sum_i z_i^2$ with an alternative (convex) measure. We could also change the way in which we measure errors from $z_i = w \cdot x_i - y_i$ to something else (although with some restrictions).

4 Support Vector Machines

Now, we turn to the binary classification problem. Support Vector Machines result from minimizing the hinge loss $(1 - y_i\, w \cdot x_i)_+$ with ridge regularization:

$$ \min_w\; \sum_i (1 - y_i\, w \cdot x_i)_+ + \lambda \|w\|^2. $$

This is equivalent to (for $c = 1/(2\lambda)$)

$$ \min_w\; c \sum_i (1 - y_i\, w \cdot x_i)_+ + \frac{1}{2}\|w\|^2. $$

Because the hinge loss is non-differentiable, we introduce new variables $z_i$, creating a constrained optimization

$$ \min_{z,w}\; c \sum_i z_i + \frac{1}{2}\|w\|^2 \qquad (4.1) $$
$$ \text{s.t. } z_i \ge 1 - y_i\, w \cdot x_i, \quad z_i \ge 0. $$

Introducing new constraints to simplify an objective like this seems strange at first, but isn't too hard to understand. Notice the constraints are exactly equivalent to forcing that $z_i \ge (1 - y_i\, w \cdot x_i)_+$. But since we are minimizing the sum of all the $z_i$, the optimization will make each $z_i$ as small as possible, and so $z_i$ will be the hinge loss for example $i$, no more, no less.

Introducing Lagrange multipliers $\alpha_i \ge 0$ to enforce that $z_i \ge 1 - y_i\, w \cdot x_i$ and $\mu_i \ge 0$ to enforce that $z_i \ge 0$, we get the Lagrangian

$$ L = c \sum_i z_i + \frac{1}{2}\|w\|^2 + \sum_i \alpha_i (1 - y_i\, w \cdot x_i - z_i) + \sum_i \mu_i(-z_i). $$

A bunch of manipulation changes this to

$$ L = \sum_i z_i (c - \mu_i - \alpha_i) + \frac{1}{2}\|w\|^2 + \sum_i \alpha_i - w \cdot \sum_i \alpha_i y_i x_i. $$

As ever, Lagrangian duality states that we can solve our original problem by doing

$$ \max_{\alpha,\mu}\; \min_{z,w}\; L. $$

For now, we work on the inner minimization. For a particular $\alpha$ and $\mu$, we want to minimize with respect to $z$ and $w$. By setting $dL/dw = 0$, we find that

$$ w = \sum_i \alpha_i y_i x_i. $$

Meanwhile, setting $dL/dz_i = 0$ gives that $\alpha_i = c - \mu_i$. If we substitute these expressions, we find that $\mu$ disappears. However, notice that since $\mu_i \ge 0$ we must have that $\alpha_i \le c$.

$$ \max_{\alpha,\mu}\; \min_{z,w}\; L = \max_{0 \le \alpha_i \le c}\; \sum_i \alpha_i + \frac{1}{2}\sum_i\sum_j \alpha_i \alpha_j y_i y_j\, x_i \cdot x_j - \sum_i\sum_j \alpha_i \alpha_j y_i y_j\, x_i \cdot x_j $$
$$ = \max_{0 \le \alpha_i \le c}\; \sum_i \alpha_i - \frac{1}{2}\sum_i\sum_j \alpha_i \alpha_j y_i y_j\, x_i \cdot x_j. $$
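As a concrete (if naive) illustration, the box-constrained problem above can be solved by projected gradient ascent; the sketch below does this for a toy linearly separable dataset. This is not how serious SVM solvers work, as discussed next, and the function names are mine:

```python
import numpy as np

def svm_dual_fit(X, y, c, iters=5000):
    """Projected gradient ascent on the dual above:
    max_{0 <= alpha_i <= c} sum_i alpha_i - 0.5 sum_ij alpha_i alpha_j y_i y_j x_i.x_j."""
    M = (y[:, None] * y[None, :]) * (X @ X.T)
    step = 1.0 / (np.linalg.eigvalsh(M).max() + 1e-12)   # 1 / Lipschitz constant of the gradient
    alpha = np.zeros(len(y))
    for _ in range(iters):
        grad = 1.0 - M @ alpha                           # gradient of the dual objective
        alpha = np.clip(alpha + step * grad, 0.0, c)     # project back onto the box [0, c]
    return alpha

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 2))
y = np.sign(X @ np.array([1.5, -1.0]))                   # linearly separable labels in {-1, +1}

alpha = svm_dual_fit(X, y, c=10.0)
w = (alpha * y) @ X                                      # w = sum_i alpha_i y_i x_i
print("train accuracy:", np.mean(np.sign(X @ w) == y))
print("nonzero alphas:", int(np.sum(alpha > 1e-6)), "of", len(alpha))
```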

This is a maximization of a quadratic objective, under linear constraints. That is, this is a quadratic program. Historically, QP solvers were first used to solve SVM problems. However, as these scale poorly to large problems, a huge amount of effort has been devoted to faster solvers (often based on coordinate ascent and/or online optimization). This area is still evolving. However, software is widely available now for solvers that are quite fast in practice.

Now, as we saw above that $w = \sum_i \alpha_i y_i x_i$, we can classify new points $x$ by

$$ f(x) = \sum_i \alpha_i y_i\, x \cdot x_i. $$

Clearly, this can be kernelized. If we do so, we can compute the Lagrange multipliers by the SVM optimization

$$ \max_{0 \le \alpha_i \le c}\; \sum_i \alpha_i - \frac{1}{2}\sum_i\sum_j \alpha_i \alpha_j y_i y_j\, k(x_i, x_j), \qquad (4.2) $$

which is again a quadratic program. We can classify new points by the SVM classification rule

$$ f(x) = \sum_i \alpha_i y_i\, k(x, x_i). \qquad (4.3) $$

Since we have kernelized both the learning optimization and the classification rule, we are again free to replace $k$ with any of the variety of kernel functions we saw before.

Now, finally, we can define what a support vector is. Notice that Eq. 4.2 is the maximization of a quadratic function of $\alpha$, under the box constraints that $0 \le \alpha_i \le c$. It often happens that $\alpha_i$ "wants" to be negative (in terms of the quadratic function), but is prevented from this by the constraints. Thus, $\alpha$ is often sparse. The training points $x_i$ with $\alpha_i > 0$ are the support vectors.

This has some interesting consequences. First of all, clearly if $\alpha_i = 0$, we don't need to include the corresponding term in Eq. 4.3. This is potentially a big savings. If all $\alpha_i$ are nonzero, then we would need to explicitly compute the kernel function with all inputs, and our time complexity is similar to a nearest-neighbor method. If we only have a few nonzero $\alpha_i$, then we only have to compute a few kernel functions, and our complexity is similar to that of a normal linear method.

Another interesting property of the sparsity of $\alpha$ is that non-support vectors don't affect the solution. Let's see why. What does it mean if $\alpha_i = 0$? Well, recall that the multiplier $\alpha_i$ is enforcing the constraint that

$$ z_i \ge 1 - y_i\, w \cdot x_i. \qquad (4.4) $$

If $\alpha_i = 0$ at the solution, then this means, informally speaking, that we didn't really need to enforce this constraint at all: if we threw it out of the optimization, it would still automatically be obeyed. How could this be? Recall that the original optimization in Eq. 4.1 is trying to minimize all the $z_i$. There are two things stopping each $z_i$ from flying down to $-\infty$: the constraint in Eq. 4.4 above, and the constraint that $z_i \ge 0$. If the constraint above can be removed without changing the solution, then it must be that $z_i = 0$. Thus, $\alpha_i = 0$ implies that $1 - y_i\, w \cdot x_i \le 0$, or, equivalently, that $y_i\, w \cdot x_i \ge 1$. Thus non-support vectors are points that are very well classified, that are comfortably on the right side of the linear boundary.

Now, imagine we take some $x_i$ with $z_i = 0$, and remove it from the training set. It is pretty easy to see that this is equivalent to taking the optimization

$$ \min_{z,w}\; c \sum_i z_i + \frac{1}{2}\|w\|^2 $$
$$ \text{s.t. } z_i \ge 1 - y_i\, w \cdot x_i, \quad z_i \ge 0, $$

and just dropping the constraint that $z_i \ge 1 - y_i\, w \cdot x_i$, meaning that $z_i$ decouples from the other variables, and the optimization will pick $z_i = 0$. But, as we saw above, this has no effect. Thus, removing a non-support vector from the training set has no impact on the resulting classification rule.
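This claim is easy to check empirically. The sketch below (assuming scikit-learn is available; its SVC also fits an intercept term, which these notes omit, but the property still holds) trains an SVM, retrains on the support vectors alone, and compares the two decision functions:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=200))   # labels in {-1, +1}

clf = SVC(kernel="rbf", gamma=2.0, C=10.0).fit(X, y)
sv = clf.support_                      # indices i with alpha_i > 0

# retrain using only the support vectors
clf_sv = SVC(kernel="rbf", gamma=2.0, C=10.0).fit(X[sv], y[sv])

# the two decision functions agree (up to solver tolerance) on new points
X_new = rng.normal(size=(50, 2))
diff = np.abs(clf.decision_function(X_new) - clf_sv.decision_function(X_new)).max()
print(diff)   # ~0
```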

5 Examples

In class, we saw some examples of running SVMs. Here are many more. Each figure below shows the predictions and the sorted values of $\alpha_i$.

[Figure: Dataset A, $c = 10$, $k(x,v) = x \cdot v$.]
[Figure: Dataset A, $c = 10^3$, $k(x,v) = x \cdot v$.]
[Figure: Dataset A, $c = 10^5$, $k(x,v) = x \cdot v$.]

[Figure: Dataset A, $c = 10$, $k(x,v) = 1 + x \cdot v$.]
[Figure: Dataset A, $c = 10^3$, $k(x,v) = 1 + x \cdot v$.]
[Figure: Dataset A, $c = 10^5$, $k(x,v) = 1 + x \cdot v$.]

[Figure: Dataset B, $c = 10^5$, $k(x,v) = 1 + x \cdot v$.]
[Figure: Dataset B, $c = 10^5$, $k(x,v) = (1 + x \cdot v)^5$.]
[Figure: Dataset B, $c = 10^5$, $k(x,v) = (1 + x \cdot v)^{10}$.]

[Figure: Dataset C (dataset B with noise), $c = 10^5$, $k(x,v) = 1 + x \cdot v$.]
[Figure: Dataset C, $c = 10^5$, $k(x,v) = (1 + x \cdot v)^5$.]
[Figure: Dataset C, $c = 10^5$, $k(x,v) = (1 + x \cdot v)^{10}$.]

[Figure: Dataset C (dataset B with noise), $c = 10^5$, $k(x,v) = \exp(-2\|x-v\|^2)$.]
[Figure: Dataset C, $c = 10^5$, $k(x,v) = \exp(-2\|x-v\|^2)$.]
[Figure: Dataset C, $c = 10^5$, $k(x,v) = \exp(-2\|x-v\|^2)$.]

6 Kernel Theory

We now return to the issue of what makes a valid kernel $k(x, v)$, where "valid" means there exists some feature space $\phi$ such that $k(x, v) = \phi(x) \cdot \phi(v)$.

6.1 Kernel algebra

We can construct complex kernel functions from simple ones, using an algebra of composition rules³. Interestingly, these rules can be understood from parallel compositions in feature space.

To take an example, suppose we have two valid kernel functions $k_a$ and $k_b$. If we define a new kernel function by

$$ k(x, v) = k_a(x, v) + k_b(x, v), $$

$k$ will be valid. To see why, consider the feature spaces $\phi_a$ and $\phi_b$ corresponding to $k_a$ and $k_b$. If we define $\phi$ by just concatenating $\phi_a$ and $\phi_b$,

$$ \phi(x) = (\phi_a(x), \phi_b(x)), $$

then $\phi$ is the feature space corresponding to $k$. To see this, note

$$ \phi(x) \cdot \phi(v) = (\phi_a(x), \phi_b(x)) \cdot (\phi_a(v), \phi_b(v)) = \phi_a(x) \cdot \phi_a(v) + \phi_b(x) \cdot \phi_b(v) = k_a(x, v) + k_b(x, v) = k(x, v). $$

We can make a table of kernel composition rules, along with the dual feature space composition rules.

        kernel composition                                      feature composition
    a)  $k(x,v) = k_a(x,v) + k_b(x,v)$                          $\phi(x) = (\phi_a(x), \phi_b(x))$
    b)  $k(x,v) = f\, k_a(x,v)$, $f > 0$                        $\phi(x) = \sqrt{f}\, \phi_a(x)$
    c)  $k(x,v) = k_a(x,v)\, k_b(x,v)$                          $\phi_m(x) = \phi_{ai}(x)\, \phi_{bj}(x)$
    d)  $k(x,v) = x^T A v$, $A$ positive semi-definite          $\phi(x) = L^T x$, where $A = L L^T$
    e)  $k(x,v) = x^T M^T M v$, $M$ arbitrary                   $\phi(x) = M x$

³ This material is based on class notes from Michael Jordan.
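As a quick numerical check of rule (a) (a small sketch, using the linear and squared-linear kernels as the two valid kernels):

```python
import numpy as np

# two simple valid kernels with known feature maps
phi_a = lambda x: x                               # k_a(x, v) = x . v
phi_b = lambda x: np.outer(x, x).ravel()          # k_b(x, v) = (x . v)^2

def phi_sum(x):
    """Rule (a): concatenating the feature maps gives the feature map of k_a + k_b."""
    return np.concatenate([phi_a(x), phi_b(x)])

rng = np.random.default_rng(0)
x, v = rng.normal(size=3), rng.normal(size=3)
k_sum = x @ v + (x @ v) ** 2
print(np.isclose(k_sum, phi_sum(x) @ phi_sum(v)))  # True
```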

We have already proven rule (a). Let's prove some of the others. Rule (b) is quite easy to understand:

$$ \phi(x) \cdot \phi(v) = \sqrt{f}\,\phi_a(x) \cdot \sqrt{f}\,\phi_a(v) = f\, \phi_a(x) \cdot \phi_a(v) = f\, k_a(x, v) = k(x, v). $$

Rule (c) is more complex. It is important to understand the notation. If

$$ \phi_a(x) = (\phi_{a1}(x), \phi_{a2}(x), \phi_{a3}(x)), \qquad \phi_b(x) = (\phi_{b1}(x), \phi_{b2}(x)), $$

then $\phi$ contains all six pairs:

$$ \phi(x) = \big( \phi_{a1}(x)\phi_{b1}(x),\; \phi_{a2}(x)\phi_{b1}(x),\; \phi_{a3}(x)\phi_{b1}(x),\; \phi_{a1}(x)\phi_{b2}(x),\; \phi_{a2}(x)\phi_{b2}(x),\; \phi_{a3}(x)\phi_{b2}(x) \big). $$

With that understanding, we can prove rule (c) via

$$ \phi(x) \cdot \phi(v) = \sum_m \phi_m(x)\,\phi_m(v) = \sum_i\sum_j \phi_{ai}(x)\phi_{bj}(x)\,\phi_{ai}(v)\phi_{bj}(v) $$
$$ = \Big(\sum_i \phi_{ai}(x)\phi_{ai}(v)\Big)\Big(\sum_j \phi_{bj}(x)\phi_{bj}(v)\Big) = \big(\phi_a(x) \cdot \phi_a(v)\big)\big(\phi_b(x) \cdot \phi_b(v)\big) = k_a(x, v)\, k_b(x, v) = k(x, v). $$
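The same kind of numerical check works for rule (c). In the sketch below, $\phi_a$ and $\phi_b$ are again the feature maps of the linear and squared-linear kernels, and the product feature map is built exactly as in the proof above:

```python
import numpy as np

# feature maps of two simple valid kernels
phi_a = lambda x: x                               # k_a(x, v) = x . v
phi_b = lambda x: np.outer(x, x).ravel()          # k_b(x, v) = (x . v)^2

def phi_product(x):
    """Rule (c): all products phi_ai(x) * phi_bj(x), exactly as in the proof above."""
    return np.outer(phi_a(x), phi_b(x)).ravel()

rng = np.random.default_rng(0)
x, v = rng.normal(size=3), rng.normal(size=3)
k_product = (x @ v) * (x @ v) ** 2                # k_a(x, v) * k_b(x, v) = (x . v)^3
print(np.isclose(k_product, phi_product(x) @ phi_product(v)))  # True
```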

Rule (d) follows from the well-known result in linear algebra that a symmetric positive semi-definite matrix $A$ can be factored as $A = L L^T$. With that known, clearly

$$ \phi(x) \cdot \phi(v) = (L^T x) \cdot (L^T v) = x^T L L^T v = x^T A v = k(x, v). $$

We can alternatively think of rule (d) as saying that $k(x, v) = x^T M^T M v$ corresponds to the basis expansion $\phi(x) = M x$ for any $M$. That gives rule (e).

6.2 Understanding Polynomial Kernels via Kernel Algebra

So, we have all these rules for combining kernels. What do they tell us? Rules (a), (b), and (c) essentially tell us that polynomial combinations of valid kernels are valid kernels. Using this, we can understand the meaning of polynomial kernels.

First off, for some scalar variable $x$, consider a polynomial kernel of the form $k(x, v) = (xv)^d$. To what basis expansion does this kernel correspond? We can build this up stage by stage:

$$ k(x, v) = xv \quad\Rightarrow\quad \phi(x) = (x) $$
$$ k(x, v) = (xv)^2 \quad\Rightarrow\quad \phi(x) = (x^2) \quad \text{by rule (c)} $$
$$ k(x, v) = (xv)^3 \quad\Rightarrow\quad \phi(x) = (x^3) \quad \text{by rule (c)}. $$

If we work with vectors, we find that $k(x, v) = (x \cdot v)$ corresponds to $\phi(x) = x$, while (by rule (c)) $k(x, v) = (x \cdot v)^2$ corresponds to a feature space with all pairwise terms

$$ \phi_m(x) = x_i x_j, \quad 1 \le i, j \le n. $$

Similarly, $k(x, v) = (x \cdot v)^3$ corresponds to a feature space with all triplets

$$ \phi_m(x) = x_i x_j x_k, \quad 1 \le i, j, k \le n. $$
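As a sanity check of the $d = 3$ case (a small sketch; the explicit expansion has $n^3$ entries while the kernel takes $O(n)$ time):

```python
import numpy as np
from itertools import product

def cubic_features(x):
    """All degree-3 monomials x_i * x_j * x_k: the feature space of k(x, v) = (x . v)^3."""
    n = len(x)
    return np.array([x[i] * x[j] * x[k] for i, j, k in product(range(n), repeat=3)])

rng = np.random.default_rng(0)
x, v = rng.normal(size=4), rng.normal(size=4)
print(np.isclose((x @ v) ** 3, cubic_features(x) @ cubic_features(v)))  # True
```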

More generally, $k(x, v) = (x \cdot v)^d$ corresponds to a feature space with terms

$$ \phi_m(x) = x_{i_1} x_{i_2} \cdots x_{i_d}, \quad 1 \le i_1, \dots, i_d \le n. \qquad (6.1) $$

Thus, a polynomial kernel is equivalent to a polynomial basis expansion, with all terms of order $d$. This is pretty surprising, even though the word "polynomial" is in front of both of these terms!

Again, we should reiterate the computational savings here. In general, computing a polynomial basis expansion will take time $O(n^d)$. However, computing a polynomial kernel only takes time $O(n)$. Again, though, we have only defeated the computational issue with high-degree polynomial basis expansions. The statistical properties are unchanged.

Now, consider the kernel $k(x, v) = (r + x \cdot v)^d$. What is the impact of adding the constant $r$? Notice that this is equivalent to simply taking the vectors $x$ and $v$ and prepending a constant of $\sqrt{r}$ to them. Thus, this kernel corresponds to a polynomial expansion with constant terms added. One way to write this would be

$$ \phi_m(x) = x_{i_1} x_{i_2} \cdots x_{i_d}, \quad 0 \le i_1, \dots, i_d \le n, \qquad (6.2) $$

where we consider $x_0$ to be equal to $\sqrt{r}$. Thus, this kernel is equivalent to a polynomial basis expansion with all terms of order less than or equal to $d$.

An interesting question is the impact of the constant $r$. Should we set it large or small? What is the impact of this choice? Notice that the lower-order terms in the basis expansion in Eq. 6.2 will have many factors of $x_0$, and so get multiplied by $\sqrt{r}$ to a high power. Meanwhile, high-order terms will have few or no factors of $x_0$, and so get multiplied by $\sqrt{r}$ to a low power. Thus, a large constant $r$ has the effect of making the low-order terms larger, relative to high-order terms.

Recall that if we make the basis expansion larger, this has the effect of reducing the regularization penalty, since the same classification rule can be accomplished with a smaller weight. Thus, if we make part of a basis expansion larger, those parts of the basis expansion will tend to play a larger role in the final classification rule. Thus, using a larger constant $r$ has the effect of making the low-order parts of the polynomial expansion in Eq. 6.2 tend to have more impact.

6.3 Mercer's Theorem

One thing we might worry about is whether the SVM optimization is convex in $\alpha$. The concern is whether

$$ \sum_i \sum_j \alpha_i \alpha_j y_i y_j\, k(x_i, x_j) \qquad (6.3) $$

is convex with respect to $\alpha$. We can show that if $k$ is a valid kernel function, then the kernel matrix $K$ must be positive semi-definite:

$$ z^T K z = \sum_i\sum_j z_i K_{ij} z_j = \sum_i\sum_j z_i\, \phi(x_i) \cdot \phi(x_j)\, z_j = \Big(\sum_i z_i \phi(x_i)\Big) \cdot \Big(\sum_j z_j \phi(x_j)\Big) = \Big\| \sum_i z_i \phi(x_i) \Big\|^2 \ge 0. $$

We can also show that, if $K$ is positive semi-definite, then the SVM optimization is concave. The thing to see is that

$$ \sum_i\sum_j \alpha_i \alpha_j y_i y_j\, k(x_i, x_j) = \alpha^T \mathrm{diag}(y)\, K\, \mathrm{diag}(y)\, \alpha = \alpha^T M \alpha, $$

where $M = \mathrm{diag}(y)\, K\, \mathrm{diag}(y)$. It is not hard to show that $M$ is positive semi-definite.

So this is very nice: if we use any valid kernel function, we can be assured that the optimization that we need to solve in order to recover the Lagrange multipliers $\alpha$ will be concave. (The equivalent of convex when we are doing a maximization instead of a minimization.)

Now, we still face the question: do there exist invalid kernel functions that also yield positive semi-definite kernel matrices? It turns out that the answer is no. This result is known as Mercer's theorem.

A kernel function is valid if and only if the corresponding kernel matrix is positive semi-definite for all training sets $\{x_i\}$.

This is very convenient: the valid kernel functions are exactly those that yield optimization problems that we can reliably solve. However, notice that Mercer's theorem refers to all sets of points $\{x_i\}$. An invalid kernel can yield a positive semi-definite kernel matrix for some particular training set. All we know is that, for an invalid kernel, there is some training set that yields a non positive semi-definite kernel matrix.
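This suggests a practical, one-sided check: compute the kernel matrix on some sample of points and look at its smallest eigenvalue. A negative eigenvalue proves the kernel is invalid; a nonnegative one on a particular sample proves nothing, as noted above. A minimal sketch, using the RBF kernel and the (invalid) squared Euclidean distance:

```python
import numpy as np

def min_kernel_eigenvalue(k, X):
    """Smallest eigenvalue of the kernel matrix K[i, j] = k(x_i, x_j) on a sample of points."""
    K = np.array([[k(a, b) for b in X] for a in X])
    return np.linalg.eigvalsh(K).min()

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))

rbf = lambda a, b: np.exp(-2.0 * np.sum((a - b) ** 2))   # a valid kernel
bad = lambda a, b: np.sum((a - b) ** 2)                   # squared distance: not a valid kernel

print(min_kernel_eigenvalue(rbf, X))   # >= 0 (up to round-off)
print(min_kernel_eigenvalue(bad, X))   # strictly negative here, so "bad" cannot be valid
```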

7 Our Story so Far

There were a lot of technical details here. It is worth taking a look back to do a conceptual overview and make sure we haven't missed the big picture.

The starting point for SVMs is minimizing the hinge loss with ridge regularization, i.e.

$$ w^* = \arg\min_w\; c \sum_i (1 - y_i\, w \cdot x_i)_+ + \frac{1}{2}\|w\|^2. $$

Fundamentally, SVMs are just fitting an optimization like this. The difference is that they perform the optimization in a different way, and they allow you to work efficiently with powerful basis expansions / kernel functions.

Act 1. We proved that if $w^*$ is the vector of weights that results from this optimization, then we could alternatively calculate $w^*$ as $w^* = \sum_i \alpha_i y_i x_i$, where the $\alpha_i$ are given by the optimization

$$ \max_{0 \le \alpha_i \le c}\; \sum_i \alpha_i - \frac{1}{2}\sum_i\sum_j \alpha_i \alpha_j y_i y_j\, x_i \cdot x_j. \qquad (7.1) $$

With that optimization solved, we can classify a new point $x$ by

$$ f(x) = x \cdot w^* = \sum_i \alpha_i y_i\, x \cdot x_i. \qquad (7.2) $$

Thus, if we want to, we can think of the variables $\alpha_i$ as being the main things that we are fitting, rather than the weights $w$.

Act 2. Next, we noticed that the above optimization (Eq. 7.1) and classifier (Eq. 7.2) only depend on the inner products of the data elements. Thus, we could replace the inner products in these expressions with kernel evaluations, giving the optimization and the classification rule

$$ \max_{0 \le \alpha_i \le c}\; \sum_i \alpha_i - \frac{1}{2}\sum_i\sum_j \alpha_i \alpha_j y_i y_j\, k(x_i, x_j), \qquad (7.3) $$

$$ f(x) = \sum_i \alpha_i y_i\, k(x, x_i), \qquad (7.4) $$

where $k(x, v) = x \cdot v$.

Act 3. Now, imagine that instead of directly working with the data, we wanted to work with some basis expansion. This would be easy to accomplish just by switching the kernel function to be $k(x, v) = \phi(x) \cdot \phi(v)$. However, we also noticed that for some basis expansions, like polynomials, we could compute $k(x, v)$ much more efficiently than explicitly forming the basis expansions and then taking the inner product. We called this computational trick the kernel trick.

Act 4. Finally, we developed a kernel algebra, which allowed us to understand how we can combine different kernel functions, and what this means in feature space. We also saw Mercer's theorem, which tells us which kernel functions are and are not legal. Happily, this corresponded with the SVM optimization problem being convex.

8 Discussion

8.1 SVMs as Template Methods

Regardless of what we say, at the end of the day, support vector machines make their predictions through the classification rule

$$ f(x) = \sum_i \alpha_i y_i\, k(x, x_i). $$

Intuitively, $k(x, x_i)$ measures how similar $x$ is to training example $x_i$. This bears a strong resemblance to K-NN classification, where we use $k$ as the measure of similarity, rather than something like the Euclidean distance. Thus, SVMs can be seen as glorified template methods, where the amount that each point $x_i$ participates in predictions is reweighted in the learning stage. This is a view usually espoused by SVM skeptics, but a reasonable one. Remember, however, that there is absolutely nothing wrong with template methods.
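The template view is easy to see in standard software. As a sketch (assuming scikit-learn; its SVC stores the weights $\alpha_i y_i$ in `dual_coef_` and also adds an intercept term, which these notes omit), the decision function is literally a weighted sum of similarities to the stored support vectors:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = np.sign(X[:, 0] ** 2 + X[:, 1] ** 2 - 1.0)          # labels in {-1, +1}
gamma = 2.0

clf = SVC(kernel="rbf", gamma=gamma, C=100.0).fit(X, y)

# reproduce the decision function "by hand" as a weighted sum of similarities
# to the stored templates (the support vectors); dual_coef_ holds alpha_i * y_i
X_new = rng.normal(size=(5, 2))
K = np.exp(-gamma * ((X_new[:, None, :] - clf.support_vectors_[None, :, :]) ** 2).sum(-1))
by_hand = K @ clf.dual_coef_[0] + clf.intercept_[0]
print(np.allclose(by_hand, clf.decision_function(X_new)))   # True
```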

8.2 Theoretical Issues

An advantage of SVMs is that rigorous theoretical guarantees can often be given for their performance. It is possible to use these theoretical bounds to do model selection, rather than, e.g., cross validation. However, at the moment, these theoretical guarantees are rather loose in practice, meaning that SVMs perform significantly better than the bounds can show. As such, one can often get better practical results by using more heuristic model selection procedures like cross validation. We will see this when we get to learning theory.
