Kernel Methods and SVMs
Statistical Machine Learning Notes 7
Instructor: Justin Domke

Kernel Methods and SVMs

Contents

1 Introduction
2 Kernel Ridge Regression
3 The Kernel Trick
4 Support Vector Machines
5 Examples
6 Kernel Theory
  6.1 Kernel algebra
  6.2 Understanding Polynomial Kernels via Kernel Algebra
  6.3 Mercer's Theorem
7 Our Story so Far
8 Discussion
  8.1 SVMs as Template Methods
  8.2 Theoretical Issues
1 Introduction

Support Vector Machines (SVMs) are a very successful and popular set of techniques for classification. Historically, SVMs emerged after the neural network boom of the 80s and early 90s. People were surprised to see that SVMs with little to no tweaking could compete with neural networks involving a great deal of manual engineering. It remains true today that SVMs are among the best off-the-shelf classification methods. If you want to get good results with a minimum of messing around, SVMs are a very good choice.

Unlike the other classification methods we discuss, it is not convenient to begin with a concise definition of SVMs, or even to say what exactly a support vector is. There is a set of ideas that must be understood first. Most of these you have already seen in the notes on linear methods, basis expansions, and template methods. The biggest remaining concept is known as the kernel trick. In fact, this idea is so fundamental that many people have advocated that SVMs be renamed Kernel Machines.

It is worth mentioning that the standard presentation of SVMs is based on the concept of margin. For lack of time, this perspective on SVMs will not be presented here. If you will be working seriously with SVMs, you should familiarize yourself with the margin perspective to enjoy a full understanding.

(Warning: These notes are probably the most technically challenging in this course, particularly if you don't have a strong background in linear algebra, Lagrange multipliers, and optimization. Kernel methods simply use more mathematical machinery than most of the other techniques we cover, so you should be prepared to put in some extra effort. Enjoy!)

2 Kernel Ridge Regression

We begin by not talking about SVMs, or even about classification. Instead, we revisit ridge regression, with a slight change of notation. Let the set of inputs be {(x_i, y_i)}, where i indexes the samples. The problem is to minimize

    Σ_i (x_i · w − y_i)² + λ w · w.

If we take the derivative with respect to w and set it to zero, we get
    0 = Σ_i 2 x_i (x_i^T w − y_i) + 2 λ w
    w = ( Σ_i x_i x_i^T + λ I )^{-1} Σ_i x_i y_i.

Now, let's consider a different derivation, making use of some Lagrange duality. If we introduce a new variable z_i, and constrain it to be the difference between w · x_i and y_i, we have

    min_{w,z}  (1/2) Σ_i z_i² + (1/2) λ w · w        (2.1)
    s.t.  z_i = x_i · w − y_i.

Using α_i to denote the Lagrange multipliers, this has the Lagrangian

    L = (1/2) Σ_i z_i² + (1/2) λ w · w + Σ_i α_i (x_i · w − y_i − z_i).

Recall our foray into Lagrange duality. We can solve the original problem by doing

    max_α min_{w,z} L(w, z, α).

To begin, we attack the inner minimization: for fixed α, we would like to solve for the minimizing w and z. We can do this by setting the derivatives of L with respect to z and w to zero. Doing this, we find¹

    z_i = α_i,    w = −(1/λ) Σ_i α_i x_i.        (2.2)

So, we can solve the problem by maximizing the Lagrangian (with respect to α), where we substitute the above expressions for z and w. Thus, we have an unconstrained maximization

    max_α L(w(α), z(α), α).

¹ 0 = dL/dz_i = z_i − α_i,  and  0 = dL/dw = λ w + Σ_i α_i x_i.
Before diving into the details of that, we can already notice something very interesting happening here: w is given by a sum of the input vectors x_i, weighted by −α_i/λ. If we were so inclined, we could avoid explicitly computing w, and predict a new point x directly from the data as

    f(x) = x · w = −(1/λ) Σ_i α_i x · x_i.

Now, let k(x, x_i) = x · x_i be the kernel function. For now, just think of this as a change of notation. Using this, we can again write the ridge regression predictions as

    f(x) = −(1/λ) Σ_i α_i k(x, x_i).

Thus, all we really need is the inner product of x with each of the training elements x_i. We will return to why this might be useful later. First, let's return to doing the maximization over the Lagrange multipliers α, to see if anything similar happens there. The math below looks really complicated. However, all we are doing is substituting the expressions for z and w from Eq. 2.2, then doing a lot of manipulation.

    max_α min_{w,z} L
      = max_α  (1/2) Σ_i α_i² + (λ/2) || (1/λ) Σ_j α_j x_j ||² + Σ_i α_i ( −(1/λ) Σ_j α_j x_i · x_j − y_i − α_i )
      = max_α  (1/2) Σ_i α_i² + (1/(2λ)) Σ_i Σ_j α_i α_j x_i · x_j − (1/λ) Σ_i Σ_j α_i α_j x_i · x_j − Σ_i α_i (y_i + α_i)
      = max_α  −(1/2) Σ_i α_i² − (1/(2λ)) Σ_i Σ_j α_i α_j x_i · x_j − Σ_i α_i y_i
      = max_α  −(1/2) Σ_i α_i² − (1/(2λ)) Σ_i Σ_j α_i α_j k(x_i, x_j) − Σ_i α_i y_i.

Again, we only need inner products. If we define the matrix K by K_ij = k(x_i, x_j), then we can rewrite this in a punchier vector notation as
    max_α min_{w,z} L = max_α  −(1/2) α · α − (1/(2λ)) α^T K α − α · y.

Here, we use a capital K to denote the matrix with entries K_ij and a lowercase k to denote the kernel function k(·,·). Note that most literature on kernel machines mildly abuses notation by using the capital letter K for both.

The thing on the right is just a quadratic in α. As such, we can find the optimum as the solution of a linear system². What is important is the observation that, again, we only need the inner products of the data k(x_i, x_j) = x_i · x_j to do the optimization over α. Then, once we have solved for α, we can predict f(x) for new x, again using only inner products. If someone tells us all the inner products, we don't need the original data {x_i} at all!

² It is easy to show (by taking the gradient) that the optimum is at α = −( (1/λ) K + I )^{-1} y.

3 The Kernel Trick

So we can work completely with inner products, rather than the vectors themselves. So what?

One way of looking at things is that we can implicitly use basis expansions. If we want to take x_i, and transform it into some fancy feature space φ(x_i), we can replace the kernel function by

    K_ij = k(x_i, x_j) = φ(x_i) · φ(x_j).

The point of talking about this is that for certain basis expansions, we can compute k very cheaply without ever explicitly forming φ(x_i) or φ(x_j). This can mean a huge computational savings. A nice example of this is the kernel function

    k(x, v) = (x · v)².

We can see that
    k(x, v) = ( Σ_i x_i v_i )²
            = ( Σ_i x_i v_i ) ( Σ_j x_j v_j )
            = Σ_i Σ_j x_i x_j v_i v_j.

It is not hard to see that k(x, v) = φ(x) · φ(v), where φ is a quadratic basis expansion with one component φ_m(x) = x_i x_j for each pair (i, j). For example, in two dimensions,

    k(x, v) = (x_1 v_1 + x_2 v_2)² = x_1 x_1 v_1 v_1 + x_1 x_2 v_1 v_2 + x_2 x_1 v_2 v_1 + x_2 x_2 v_2 v_2,

while the basis expansions are

    φ(x) = (x_1 x_1, x_1 x_2, x_2 x_1, x_2 x_2),
    φ(v) = (v_1 v_1, v_1 v_2, v_2 v_1, v_2 v_2).

It is not hard to work out that k(x, v) = φ(x) · φ(v). However, notice that we can compute k(x, v) in time O(d), rather than the O(d²) time it would take to explicitly compute φ(x) · φ(v).

This is the kernel trick: getting around the computational expense in computing large basis expansions by directly computing kernel functions. Notice, however, that the kernel trick changes nothing, nada, zero about the statistical issues with huge basis expansions. We get exactly the same predictions as if we computed the basis expansion explicitly, and used traditional linear methods. We just compute the predictions in a different way.

In fact, we can invent a new kernel function k(x, v), and, as long as it obeys certain rules, use it in the above algorithm, without explicitly thinking about basis expansions at all. Some common examples are:

    name                     k(x, v)
    Linear                   x · v
    Polynomial               (r + x · v)^d, for some r, d > 0
    Radial Basis Function    exp( −γ ||x − v||² ),  γ > 0
    Gaussian                 exp( −(1/(2σ²)) ||x − v||² )
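The dual ridge regression derivation and the quadratic-kernel example above can be checked numerically. The sketch below (in Python with NumPy, which these notes do not use; the data, dimensions, and value of λ are made up for illustration) fits ridge regression two ways: explicitly in the quadratic feature space φ(x), and in the dual form using only the kernel k(x, v) = (x · v)², then confirms the predictions agree.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 20, 3, 0.5                     # made-up sizes and ridge penalty
X = rng.normal(size=(n, d))                # training inputs x_i
y = rng.normal(size=n)                     # training targets y_i
X_new = rng.normal(size=(5, d))            # points to predict at

def phi(x):
    # explicit quadratic basis expansion: all pairwise products x_i x_j
    return np.outer(x, x).ravel()

# --- primal ridge regression in the explicit feature space ---
Phi = np.array([phi(x) for x in X])        # n x d^2 design matrix
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(d * d), Phi.T @ y)
f_primal = np.array([phi(x) @ w for x in X_new])

# --- dual (kernelized) ridge regression, never forming phi explicitly ---
def k(x, v):
    return (x @ v) ** 2                    # quadratic kernel, O(d) to evaluate

K = np.array([[k(xi, xj) for xj in X] for xi in X])
alpha = -np.linalg.solve(K / lam + np.eye(n), y)        # see footnote 2 above
f_dual = np.array([-(1 / lam) * sum(a * k(x, xi) for a, xi in zip(alpha, X))
                   for x in X_new])        # prediction rule f(x) = -(1/lam) sum_i alpha_i k(x, x_i)

print(np.allclose(f_primal, f_dual))       # True: same predictions, computed differently
```

The point of the dual form is that only k appears; swapping in any kernel from the table above changes the implicit feature space without changing the rest of the computation.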
We will return below to the question of what kernel functions are legal, meaning there is some feature space φ such that k(x, v) = φ(x) · φ(v).

Now, what exactly was it about ridge regression that let us get away with working entirely with inner products? How much could we change the problem, and preserve this? We really need two things to happen:

1. When we take dL/dw = 0, we need to be able to solve for w, and the solution needs to be a linear combination of the input vectors x_i.

2. When we substitute this solution back into the Lagrangian, we need to get an expression that simplifies down into inner products only.

Notice that this leaves us a great deal of flexibility. For example, we could replace the least-squares criterion Σ_i z_i² with an alternative (convex) measure. We could also change the way in which we measure errors from z_i = w · x_i − y_i to something else (although with some restrictions).

4 Support Vector Machines

Now, we turn to the binary classification problem. Support Vector Machines result from minimizing the hinge loss (1 − y_i w · x_i)_+ with ridge regularization:

    min_w  Σ_i (1 − y_i w · x_i)_+ + λ ||w||².

This is equivalent to (for c = 1/(2λ))

    min_w  c Σ_i (1 − y_i w · x_i)_+ + (1/2) ||w||².

Because the hinge loss is non-differentiable, we introduce new variables z_i, creating a constrained optimization

    min_{z,w}  c Σ_i z_i + (1/2) ||w||²        (4.1)
    s.t.  z_i ≥ 1 − y_i w · x_i
          z_i ≥ 0.
Introducing new constraints to simplify an objective like this seems strange at first, but isn't too hard to understand. Notice the constraints are exactly equivalent to forcing that z_i ≥ (1 − y_i w · x_i)_+. But since we are minimizing the sum of all the z_i, the optimization will make each one as small as possible, and so z_i will be the hinge loss for example i, no more, no less.

Introducing Lagrange multipliers α_i to enforce that z_i ≥ 1 − y_i w · x_i and µ_i to enforce that z_i ≥ 0, we get the Lagrangian

    L = c Σ_i z_i + (1/2) ||w||² + Σ_i α_i (1 − y_i w · x_i − z_i) + Σ_i µ_i (−z_i).

A bunch of manipulation changes this to

    L = Σ_i z_i (c − µ_i − α_i) + (1/2) ||w||² + Σ_i α_i − w · Σ_i α_i y_i x_i.

As ever, Lagrangian duality states that we can solve our original problem by doing

    max_{α ≥ 0, µ ≥ 0} min_{z,w} L.

For now, we work on the inner minimization. For a particular α and µ, we want to minimize with respect to z and w. By setting dL/dw = 0, we find that

    w = Σ_i α_i y_i x_i.

Meanwhile, setting dL/dz_i = 0 gives that α_i = c − µ_i. If we substitute these expressions, we find that µ disappears. However, notice that since µ_i ≥ 0 we must have that α_i ≤ c.

    max_{α,µ} min_{z,w} L
      = max_{0 ≤ α_i ≤ c}  Σ_i α_i + (1/2) Σ_i Σ_j α_i α_j y_i y_j x_i · x_j − Σ_i Σ_j α_i α_j y_i y_j x_i · x_j
      = max_{0 ≤ α_i ≤ c}  Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j x_i · x_j.
This is a maximization of a quadratic objective, under linear constraints. That is, this is a quadratic program. Historically, QP solvers were first used to solve SVM problems. However, as these scale poorly to large problems, a huge amount of effort has been devoted to faster solvers (often based on coordinate ascent and/or online optimization). This area is still evolving. However, software is widely available now for solvers that are quite fast in practice.

Now, as we saw above that w = Σ_i α_i y_i x_i, we can classify new points x by

    f(x) = Σ_i α_i y_i x · x_i.

Clearly, this can be kernelized. If we do so, we can compute the Lagrange multipliers by the SVM optimization

    max_{0 ≤ α_i ≤ c}  Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j k(x_i, x_j),        (4.2)

which is again a quadratic program. We can classify new points by the SVM classification rule

    f(x) = Σ_i α_i y_i k(x, x_i).        (4.3)

Since we have kernelized both the learning optimization and the classification rule, we are again free to replace k with any of the variety of kernel functions we saw before.

Now, finally, we can define what a support vector is: a support vector is a training point x_i whose multiplier α_i is nonzero at the solution. Notice that Eq. 4.2 is the maximization of a quadratic function of α, under the box constraints 0 ≤ α_i ≤ c. It often happens that α_i wants to be negative (in terms of the quadratic function), but is prevented from this by the constraints. Thus, α is often sparse.

This has some interesting consequences. First of all, clearly if α_i = 0, we don't need to include the corresponding term in Eq. 4.3. This is potentially a big savings. If all α_i are nonzero, then we would need to explicitly compute the kernel function with all inputs, and our time complexity is similar to a nearest-neighbor method. If we only have a few nonzero α_i, then we only have to compute a few kernel functions, and our complexity is similar to that of a normal linear method.

Another interesting property of the sparsity of α is that non-support vectors don't affect the solution. Let's see why. What does it mean if α_i = 0? Well, recall that the multiplier α_i is enforcing the constraint that

    z_i ≥ 1 − y_i w · x_i.        (4.4)
If α_i = 0 at the solution, then this means, informally speaking, that we didn't really need to enforce this constraint at all: if we threw it out of the optimization, it would still automatically be obeyed. How could this be? Recall that the original optimization in Eq. 4.1 is trying to minimize all the z_i. There are two things stopping each z_i from flying down to −∞: the constraint in Eq. 4.4 above, and the constraint that z_i ≥ 0. If the constraint above can be removed without changing the solution, then it must be that z_i = 0. Thus, α_i = 0 implies that 1 − y_i w · x_i ≤ 0, or, equivalently, that y_i w · x_i ≥ 1. Thus non-support vectors are points that are very well classified, that are comfortably on the right side of the linear boundary.

Now, imagine we take some x_i with z_i = 0, and remove it from the training set. It is pretty easy to see that this is equivalent to taking the optimization

    min_{z,w}  c Σ_j z_j + (1/2) ||w||²
    s.t.  z_j ≥ 1 − y_j w · x_j
          z_j ≥ 0,

and just dropping the constraint that z_i ≥ 1 − y_i w · x_i, meaning that z_i decouples from the other variables, and the optimization will pick z_i = 0. But, as we saw above, this has no effect. Thus, removing a non-support vector from the training set has no impact on the resulting classification rule.

5 Examples

In class, we saw some examples of running SVMs. Here are many more.
(Figures. Each figure in this section shows, for one combination of dataset, constant c, and kernel, the resulting SVM predictions together with the values α_i plotted against sorted indices. The combinations shown are:)

Dataset A, c = 10, k(x, v) = x · v.
Dataset A, c = 10^3, k(x, v) = x · v.
Dataset A, c = 10^5, k(x, v) = x · v.
Dataset A, c = 10, k(x, v) = 1 + x · v.
Dataset A, c = 10^3, k(x, v) = 1 + x · v.
Dataset A, c = 10^5, k(x, v) = 1 + x · v.
Dataset B, c = 10^5, k(x, v) = 1 + x · v.
Dataset B, c = 10^5, k(x, v) = (1 + x · v)^5.
Dataset B, c = 10^5, k(x, v) = (1 + x · v)^10.
Dataset C (dataset B with noise), c = 10^5, k(x, v) = 1 + x · v.
Dataset C, c = 10^5, k(x, v) = (1 + x · v)^5.
Dataset C, c = 10^5, k(x, v) = (1 + x · v)^10.
Dataset C (dataset B with noise), c = 10^5, k(x, v) = exp(−2 ||x − v||²).
Dataset C, c = 10^5, k(x, v) = exp(−2 ||x − v||²).
Dataset C, c = 10^5, k(x, v) = exp(−2 ||x − v||²).
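Examples like these are straightforward to regenerate. The sketch below is a rough illustration only, not the code behind the original figures: the toy dataset, the RBF width, and the value of c are made up, and a bound-constrained quasi-Newton method stands in for a proper QP solver, which is fine at this scale. It maximizes the dual of Eq. 4.2 directly and then applies the classification rule of Eq. 4.3.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, c = 40, 10.0                                   # made-up training size and box constant
X = rng.normal(size=(n, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1.0, -1.0)    # a toy non-linearly-separable labeling

def k(a, b):
    return np.exp(-2.0 * np.sum((a - b) ** 2))    # RBF kernel, as in the last figures

K = np.array([[k(xi, xj) for xj in X] for xi in X])
Q = (y[:, None] * y[None, :]) * K                 # Q_ij = y_i y_j k(x_i, x_j)

def neg_dual(alpha):                              # minimize the negative of Eq. 4.2
    return -(alpha.sum() - 0.5 * alpha @ Q @ alpha)

def neg_dual_grad(alpha):
    return -(np.ones(n) - Q @ alpha)

res = minimize(neg_dual, x0=np.zeros(n), jac=neg_dual_grad,
               method="L-BFGS-B", bounds=[(0.0, c)] * n)
alpha = res.x

support = alpha > 1e-6                            # the support vectors
print("support vectors:", support.sum(), "of", n)

def f(x):                                         # classification rule, Eq. 4.3
    return sum(a * yi * k(x, xi) for a, yi, xi in zip(alpha, y, X) if a > 1e-6)

train_acc = np.mean(np.sign([f(x) for x in X]) == y)
print("training accuracy:", train_acc)
```

Note that, as in these notes, there is no separate intercept term, so the dual has only the box constraints 0 ≤ α_i ≤ c; standard SVM software usually adds an intercept and a corresponding equality constraint Σ_i α_i y_i = 0.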
6 Kernel Theory

We now return to the issue of what makes a valid kernel k(x, v), where valid means there exists some feature space φ such that k(x, v) = φ(x) · φ(v).

6.1 Kernel algebra

We can construct complex kernel functions from simple ones, using an algebra of composition rules³. Interestingly, these rules can be understood from parallel compositions in feature space.

To take an example, suppose we have two valid kernel functions k_a and k_b. If we define a new kernel function by

    k(x, v) = k_a(x, v) + k_b(x, v),

k will be valid. To see why, consider the feature spaces φ_a and φ_b corresponding to k_a and k_b. If we define φ by just concatenating φ_a and φ_b,

    φ(x) = (φ_a(x), φ_b(x)),

then φ is the feature space corresponding to k. To see this, note

    φ(x) · φ(v) = (φ_a(x), φ_b(x)) · (φ_a(v), φ_b(v))
                = φ_a(x) · φ_a(v) + φ_b(x) · φ_b(v)
                = k_a(x, v) + k_b(x, v)
                = k(x, v).

We can make a table of kernel composition rules, along with the dual feature space composition rules.

       kernel composition                                feature composition
    a) k(x, v) = k_a(x, v) + k_b(x, v)                   φ(x) = (φ_a(x), φ_b(x))
    b) k(x, v) = f k_a(x, v),  f > 0                     φ(x) = √f φ_a(x)
    c) k(x, v) = k_a(x, v) k_b(x, v)                     φ_m(x) = φ_ai(x) φ_bj(x)
    d) k(x, v) = x^T A v,  A positive semi-definite      φ(x) = L^T x, where A = L L^T
    e) k(x, v) = x^T M^T M v,  M arbitrary               φ(x) = M x

³ This material is based on class notes from Michael Jordan.
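These composition rules are easy to sanity-check numerically. The short sketch below is illustrative only; the two feature maps φ_a and φ_b are arbitrary choices, not anything from the notes. It verifies rules (a) and (c) on a pair of random points: the sum of kernels matches the concatenated feature map, and the product of kernels matches the feature map of all pairwise products.

```python
import numpy as np

def phi_a(x):                       # an arbitrary 3-dimensional feature map
    return np.array([x[0], x[1] ** 2, x[0] * x[1]])

def phi_b(x):                       # an arbitrary 2-dimensional feature map
    return np.array([1.0, x[0] + x[1]])

def k_a(x, v): return phi_a(x) @ phi_a(v)
def k_b(x, v): return phi_b(x) @ phi_b(v)

rng = np.random.default_rng(2)
x, v = rng.normal(size=2), rng.normal(size=2)

# rule (a): k_a + k_b  <->  concatenation of the feature maps
phi_sum = lambda z: np.concatenate([phi_a(z), phi_b(z)])
print(np.isclose(k_a(x, v) + k_b(x, v), phi_sum(x) @ phi_sum(v)))    # True

# rule (c): k_a * k_b  <->  all pairwise products phi_ai(z) * phi_bj(z)
phi_prod = lambda z: np.outer(phi_a(z), phi_b(z)).ravel()
print(np.isclose(k_a(x, v) * k_b(x, v), phi_prod(x) @ phi_prod(v)))  # True
```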
We have already proven rule (a). Let's prove some of the others. Rule (b) is quite easy to understand:

    φ(x) · φ(v) = √f φ_a(x) · √f φ_a(v) = f φ_a(x) · φ_a(v) = f k_a(x, v) = k(x, v).

Rule (c) is more complex. It is important to understand the notation. If

    φ_a(x) = (φ_a1(x), φ_a2(x), φ_a3(x))
    φ_b(x) = (φ_b1(x), φ_b2(x)),

then φ contains all six pairs:

    φ(x) = ( φ_a1(x) φ_b1(x), φ_a2(x) φ_b1(x), φ_a3(x) φ_b1(x),
             φ_a1(x) φ_b2(x), φ_a2(x) φ_b2(x), φ_a3(x) φ_b2(x) ).

With that understanding, we can prove rule (c) via

    φ(x) · φ(v) = Σ_m φ_m(x) φ_m(v)
                = Σ_i Σ_j φ_ai(x) φ_bj(x) φ_ai(v) φ_bj(v)
                = ( Σ_i φ_ai(x) φ_ai(v) ) ( Σ_j φ_bj(x) φ_bj(v) )
                = ( φ_a(x) · φ_a(v) ) ( φ_b(x) · φ_b(v) )
                = k_a(x, v) k_b(x, v)
                = k(x, v).

Rule (d) follows from the well known result in linear algebra that a symmetric positive semi-definite matrix A can be factored as A = L L^T. With that known, clearly
    φ(x) · φ(v) = (L^T x) · (L^T v) = x^T L L^T v = x^T A v = k(x, v).

We can alternatively think of rule (d) as saying that k(x, v) = x^T M^T M v corresponds to the basis expansion φ(x) = M x for any M. That gives rule (e).

6.2 Understanding Polynomial Kernels via Kernel Algebra

So, we have all these rules for combining kernels. What do they tell us? Rules (a), (b), and (c) essentially tell us that polynomial combinations of valid kernels are valid kernels. Using this, we can understand the meaning of polynomial kernels.

First off, for some scalar variable x, consider a polynomial kernel of the form k(x, v) = (xv)^d. To what basis expansion does this kernel correspond? We can build this up stage by stage.

    k(x, v) = xv         φ(x) = (x)
    k(x, v) = (xv)²      φ(x) = (x²)     by rule (c)
    k(x, v) = (xv)³      φ(x) = (x³)     by rule (c)
    ...

If we work with vectors, we find that k(x, v) = (x · v) corresponds to φ(x) = x, while (by rule (c)) k(x, v) = (x · v)² corresponds to a feature space with all pairwise terms

    φ_m(x) = x_i x_j,   1 ≤ i, j ≤ n.

Similarly, k(x, v) = (x · v)³ corresponds to a feature space with all triplets

    φ_m(x) = x_i x_j x_k,   1 ≤ i, j, k ≤ n.
More generally, k(x, v) = (x · v)^d corresponds to a feature space with terms

    φ_m(x) = x_{i_1} x_{i_2} · · · x_{i_d},   1 ≤ i_1, ..., i_d ≤ n.        (6.1)

Thus, a polynomial kernel is equivalent to a polynomial basis expansion, with all terms of order d. This is pretty surprising, even though the word polynomial is in front of both of these terms!

Again, we should reiterate the computational savings here. In general, computing a polynomial basis expansion will take time O(n^d). However, computing a polynomial kernel only takes time O(n). Again, though, we have only defeated the computational issue with high-degree polynomial basis expansions. The statistical properties are unchanged.

Now, consider the kernel k(x, v) = (r + x · v)^d. What is the impact of adding the constant r? Notice that this is equivalent to simply taking the vectors x and v and prepending a constant of √r to them. Thus, this kernel corresponds to a polynomial expansion with constant terms added. One way to write this would be

    φ_m(x) = x_{i_1} x_{i_2} · · · x_{i_d},   0 ≤ i_1, ..., i_d ≤ n,        (6.2)

where we consider x_0 to be equal to √r. Thus, this kernel is equivalent to a polynomial basis expansion with all terms of all orders less than or equal to d.

An interesting question is the impact of the constant r. Should we set it large or small? What is the impact of this choice? Notice that the lower-order terms in the basis expansion in Eq. 6.2 will have many factors of x_0, and so get multiplied by √r to a high power. Meanwhile, high-order terms will have few or no factors of x_0, and so get multiplied by √r to a low power. Thus, a large factor of r has the effect of making the low-order terms larger, relative to high-order terms.

Recall that if we make the basis expansion larger, this has the effect of reducing the regularization penalty, since the same classification rule can be accomplished with a smaller weight. Thus, if we make part of a basis expansion larger, those parts of the basis expansion will tend to play a larger role in the final classification rule. Thus, using a larger constant r has the effect of making the low-order parts of the polynomial expansion in Eq. 6.2 tend to have more impact.
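The "prepend √r" observation is easy to verify directly. In the short sketch below (purely illustrative; the vectors and the values of r and d are made up), the inhomogeneous polynomial kernel (r + x · v)^d equals the plain polynomial kernel applied to the augmented vectors (√r, x) and (√r, v).

```python
import numpy as np

rng = np.random.default_rng(3)
x, v = rng.normal(size=4), rng.normal(size=4)
r, d = 2.5, 3                                   # made-up constant and degree

lhs = (r + x @ v) ** d                          # inhomogeneous polynomial kernel

x_aug = np.concatenate([[np.sqrt(r)], x])       # prepend x_0 = sqrt(r)
v_aug = np.concatenate([[np.sqrt(r)], v])
rhs = (x_aug @ v_aug) ** d                      # plain polynomial kernel on augmented vectors

print(np.isclose(lhs, rhs))                     # True
```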
6.3 Mercer's Theorem

One thing we might worry about is whether the SVM optimization is concave in α. The concern is whether

    Σ_i Σ_j α_i α_j y_i y_j k(x_i, x_j)        (6.3)

is convex with respect to α. We can show that if k is a valid kernel function, then the kernel matrix K must be positive semi-definite:

    z^T K z = Σ_i Σ_j z_i K_ij z_j
            = Σ_i Σ_j z_i φ(x_i) · φ(x_j) z_j
            = ( Σ_i z_i φ(x_i) ) · ( Σ_j z_j φ(x_j) )
            = || Σ_i z_i φ(x_i) ||² ≥ 0.

We can also show that, if K is positive semi-definite, then the SVM optimization is concave. The thing to see is that

    Σ_i Σ_j α_i α_j y_i y_j k(x_i, x_j) = α^T diag(y) K diag(y) α = α^T M α,

where M = diag(y) K diag(y). It is not hard to show that M is positive semi-definite. So this is very nice: if we use any valid kernel function, we can be assured that the optimization that we need to solve in order to recover the Lagrange multipliers α will be concave. (The equivalent of convex when we are doing a maximization instead of a minimization.)

Now, we still face the question: do there exist invalid kernel functions that also yield positive semi-definite kernel matrices? It turns out that the answer is no. This result is known as Mercer's theorem.

    A kernel function is valid if and only if the corresponding kernel matrix is positive semi-definite for all training sets {x_i}.

This is very convenient: the valid kernel functions are exactly those that yield optimization problems that we can reliably solve. However, notice that Mercer's theorem refers to all sets of points {x_i}. An invalid kernel can yield a positive semi-definite kernel matrix for some particular training set. All we know is that, for an invalid kernel, there is some training set that yields a non positive semi-definite kernel matrix.
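Mercer's theorem suggests a quick empirical check: compute a kernel matrix and look at its smallest eigenvalue. The sketch below is for illustration only; the sample points are random, and the "bad kernel" is just one convenient example of a similarity function that is not a valid kernel. An RBF kernel matrix comes out positive semi-definite, while using the squared Euclidean distance directly as a "kernel" produces a matrix with a negative eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 2))

def kernel_matrix(k, X):
    return np.array([[k(a, b) for b in X] for a in X])

rbf = lambda a, b: np.exp(-0.5 * np.sum((a - b) ** 2))       # a valid kernel
sqdist = lambda a, b: np.sum((a - b) ** 2)                    # not a valid kernel

for name, k in [("RBF", rbf), ("squared distance", sqdist)]:
    K = kernel_matrix(k, X)
    min_eig = np.linalg.eigvalsh(K).min()
    print(f"{name}: smallest eigenvalue = {min_eig:.4f}")
    # RBF is >= 0 up to rounding; squared distance is clearly negative,
    # so no feature map phi can reproduce it as an inner product.
```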
7 Our Story so Far

There were a lot of technical details here. It is worth taking a look back to do a conceptual overview and make sure we haven't missed the big picture. The starting point for SVMs is minimizing the hinge loss with ridge regularization, i.e.

    w* = arg min_w  c Σ_i (1 − y_i w · x_i)_+ + (1/2) ||w||².

Fundamentally, SVMs are just fitting an optimization like this. The difference is that they perform the optimization in a different way, and they allow you to work efficiently with powerful basis expansions / kernel functions.

Act 1. We proved that if w* is the vector of weights that results from this optimization, then we could alternatively calculate w* as w* = Σ_i α_i y_i x_i, where the α_i are given by the optimization

    max_{0 ≤ α_i ≤ c}  Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j x_i · x_j.        (7.1)

With that optimization solved, we can classify a new point x by

    f(x) = x · w* = Σ_i α_i y_i x · x_i.        (7.2)

Thus, if we want to, we can think of the variables α_i as being the main things that we are fitting, rather than the weights w.

Act 2. Next, we noticed that the above optimization (Eq. 7.1) and classifier (Eq. 7.2) only depend on the inner products of the data elements. Thus, we could replace the inner products in these expressions with kernel evaluations, giving the optimization and the classification rule

    max_{0 ≤ α_i ≤ c}  Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j k(x_i, x_j),        (7.3)
    f(x) = Σ_i α_i y_i k(x, x_i),        (7.4)

where, for now, k(x, v) = x · v.

Act 3. Now, imagine that instead of directly working with the data, we wanted to work with some basis expansion. This would be easy to accomplish just by switching the kernel function to be k(x, v) = φ(x) · φ(v). However, we also noticed that for some basis expansions, like polynomials, we could compute k(x, v) much more efficiently than explicitly forming the basis expansions and then taking the inner product. We called this computational trick the kernel trick.

Act 4. Finally, we developed a kernel algebra, which allowed us to understand how we can combine different kernel functions, and what this means in feature space. We also saw Mercer's theorem, which tells us which kernel functions are and are not legal. Happily, the legal kernel functions are exactly those for which the SVM optimization problem is concave, and hence reliably solvable.

8 Discussion

8.1 SVMs as Template Methods

Regardless of what we say, at the end of the day, support vector machines make their predictions through the classification rule

    f(x) = Σ_i α_i y_i k(x, x_i).

Intuitively, k(x, x_i) measures how similar x is to training example x_i. This bears a strong resemblance to K-NN classification, with k playing the role of the distance metric, rather than something like the Euclidean distance. Thus, SVMs can be seen as glorified template methods, where the amount that each point x_i participates in predictions is reweighted in the learning stage. This is a view usually espoused by SVM skeptics, but a reasonable one. Remember, however, that there is absolutely nothing wrong with template methods.
8.2 Theoretical Issues

An advantage of SVMs is that rigorous theoretical guarantees can often be given for their performance. It is possible to use these theoretical bounds to do model selection, rather than, e.g., cross validation. However, at the moment, these theoretical guarantees are rather loose in practice, meaning that SVMs perform significantly better than the bounds can show. As such, one can often get better practical results by using more heuristic model selection procedures like cross validation. We will see this when we get to learning theory.