CSCI567 Machine Learning (Fall 2014)
1 CSCI567 Machine Learning (Fall 2014). Drs. Sha & Liu. October 9, 2014. Drs. Sha & Liu ({feisha,yaliu.cs}@usc.edu), CSCI567 Machine Learning (Fall 2014), October 9, / 49
2 Outline. 1 Administration 2 Review of last lecture 3 Support vector machines 4 Geometric Understanding of SVM
3 Administration. Quiz #1: Tuesday Oct, pm, TTH301. Some exceptions are handled case by case.
4 Outline. 1 Administration 2 Review of last lecture: Kernel methods; Kernelized machine learning methods 3 Support vector machines 4 Geometric Understanding of SVM
5 Review of last lecture: Kernel methods. How to do nonlinear prediction without specifying nonlinear basis functions? Definition of kernel function: a (positive semidefinite) kernel function k(·, ·) is a bivariate function that satisfies the following properties. For any x_m and x_n, k(x_m, x_n) = k(x_n, x_m) and k(x_m, x_n) = φ(x_m)^T φ(x_n) for some function φ(·). Examples we have seen: k(x_m, x_n) = (x_m^T x_n)^2 and k(x_m, x_n) = [sin(2π(x_{m1} − x_{n1})) / (x_{m1} − x_{n1})] · [sin(2π(x_{m2} − x_{n2})) / (x_{m2} − x_{n2})].
7 Review of last lecture: Kernel methods. Conditions for being a positive semidefinite kernel function. Mercer theorem (loosely): a bivariate function k(·, ·) is a positive semidefinite kernel function if and only if, for any N and any x_1, x_2, ..., x_N, the matrix K = [ k(x_1, x_1) k(x_1, x_2) ... k(x_1, x_N) ; k(x_2, x_1) k(x_2, x_2) ... k(x_2, x_N) ; ... ; k(x_N, x_1) k(x_N, x_2) ... k(x_N, x_N) ] is positive semidefinite. We also refer to k(·, ·) as a positive semidefinite kernel.
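Mercer's condition above can be checked numerically on a finite sample. The following is a minimal numpy sketch (the helper names `kernel_matrix` and `is_psd` are made up for illustration): it builds the kernel matrix for the degree-2 polynomial kernel from the slides on a handful of random points and verifies symmetry and positive semidefiniteness via eigenvalues.

```python
import numpy as np

def kernel_matrix(k, X):
    """Build the N x N matrix K with K[m, n] = k(X[m], X[n])."""
    N = len(X)
    return np.array([[k(X[m], X[n]) for n in range(N)] for m in range(N)])

def is_psd(K, tol=1e-10):
    """A symmetric matrix is PSD iff all its eigenvalues are >= 0 (up to tolerance)."""
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))

# The degree-2 polynomial kernel from the slides: k(x_m, x_n) = (x_m^T x_n)^2
k_poly2 = lambda xm, xn: float(xm @ xn) ** 2

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))          # 5 random points in R^3
K = kernel_matrix(k_poly2, X)

assert np.allclose(K, K.T)               # symmetry
assert is_psd(K)                         # Mercer's condition holds on this sample
```

Note this only checks one sample; Mercer's theorem requires PSD-ness for every N and every choice of points.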
8 Review of last lecture: Kernel methods. Flashback: why use kernel functions? Without specifying φ(·), the kernel matrix K = [k(x_i, x_j)]_{i,j=1..N} is exactly the same as K = ΦΦ^T = [φ(x_i)^T φ(x_j)]_{i,j=1..N}.
11 Review of last lecture: Kernel methods. Kernel functions (recap of the definition and examples above). Examples that are not kernels: k(x_m, x_n) = ||x_m − x_n||_2^2 is not our desired kernel function, as it cannot be written as an inner product between two vectors.
14 Review of last lecture: Kernel methods. Examples of kernel functions. Polynomial kernel function with degree d: k(x_m, x_n) = (x_m^T x_n + c)^d, for c ≥ 0 and d a positive integer. Gaussian kernel, RBF kernel, or Gaussian RBF kernel: k(x_m, x_n) = e^{−||x_m − x_n||_2^2 / 2σ^2}. Most of these kernels have parameters to be tuned: d, c, σ^2, etc. They are hyperparameters and are often tuned on holdout data or with cross-validation.
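Both kernel families above are easy to compute as full matrices with vectorized numpy. A minimal sketch (the function names and default hyperparameter values are illustrative choices, not from the slides); it also spot-checks the Mercer condition on a random sample:

```python
import numpy as np

def poly_kernel(X, c=1.0, d=3):
    """K[m, n] = (x_m^T x_n + c)^d, with c >= 0 and integer d >= 1."""
    return (X @ X.T + c) ** d

def rbf_kernel(X, sigma2=1.0):
    """K[m, n] = exp(-||x_m - x_n||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # pairwise squared distances
    return np.exp(-D2 / (2 * sigma2))

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 2))

for K in (poly_kernel(X), rbf_kernel(X)):
    assert np.allclose(K, K.T)
    assert np.min(np.linalg.eigvalsh(K)) > -1e-9   # PSD on this sample, as Mercer requires

assert np.allclose(np.diag(rbf_kernel(X)), 1.0)    # RBF: k(x, x) = 1 for every x
```

The hyperparameters c, d, and σ² enter exactly where the formulas above place them, which is why they must be tuned on held-out data rather than read off the training fit.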
17 Review of last lecture: Kernel methods. Why is ||x_m − x_n||_2^2 not a positive semidefinite kernel? Use the definition of a positive semidefinite kernel function. We choose N = 2 and compute the matrix K = [ 0, ||x_1 − x_2||_2^2 ; ||x_1 − x_2||_2^2, 0 ]. This matrix cannot be positive semidefinite, as it has both negative and positive eigenvalues. (The sum of the diagonal elements is called the trace of a matrix, and it equals the sum of the matrix's eigenvalues; in our case, the trace is zero.)
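The trace argument above is easy to verify numerically. A short sketch with two concrete points (the points themselves are arbitrary choices for illustration): the 2×2 matrix has zero trace, so its eigenvalues must sum to zero, and since the off-diagonal entry is nonzero, one eigenvalue is strictly negative.

```python
import numpy as np

x1, x2 = np.array([0.0, 0.0]), np.array([1.0, 2.0])
d2 = float(np.sum((x1 - x2) ** 2))       # ||x1 - x2||^2

K = np.array([[0.0, d2],
              [d2,  0.0]])               # trace = 0, so the eigenvalues sum to zero

eig = np.linalg.eigvalsh(K)              # sorted ascending
assert np.isclose(eig.sum(), 0.0)        # zero trace
assert eig[0] < 0 < eig[1]               # one negative, one positive eigenvalue: not PSD
```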
18 Review of last lecture: Kernel methods. There are infinitely many kernels to use! Rules for composing kernels (a partial list): if k(x_m, x_n) is a kernel, then ck(x_m, x_n) is also a kernel if c > 0. If both k_1(x_m, x_n) and k_2(x_m, x_n) are kernels, then αk_1(x_m, x_n) + βk_2(x_m, x_n) is also a kernel if α, β ≥ 0. If both k_1(x_m, x_n) and k_2(x_m, x_n) are kernels, then k_1(x_m, x_n)k_2(x_m, x_n) is also a kernel. If k(x_m, x_n) is a kernel, then e^{k(x_m, x_n)} is also a kernel. In practice, deciding which kernel to use, or which kernels to compose into a new kernel, remains something of a black art, though most people start with polynomial and Gaussian RBF kernels.
19 Review of last lecture: Kernelized machine learning methods. Kernelization trick. Many learning methods depend on computing inner products between features; we have seen the example of regularized least squares. For those methods, we can use a kernel function in place of the inner products, i.e., kernelizing the methods, thus introducing nonlinear features/basis. We will present one more example to illustrate this trick by kernelizing the nearest neighbor classifier. When we talk about support vector machines, we will see the trick one more time.
20 Review of last lecture: Kernelized machine learning methods. Kernelized nearest neighbor classifier. In the nearest neighbor classifier, the most important quantity to compute is the (squared) distance between two data points x_m and x_n: d(x_m, x_n) = ||x_m − x_n||_2^2 = x_m^T x_m + x_n^T x_n − 2 x_m^T x_n. We replace all the inner products in the distance with a kernel function k(·, ·), arriving at the kernelized distance d_kernel(x_m, x_n) = k(x_m, x_m) + k(x_n, x_n) − 2k(x_m, x_n). This is equivalent to computing the distance between φ(x_m) and φ(x_n): d_kernel(x_m, x_n) = d(φ(x_m), φ(x_n)), where φ(·) is the nonlinear mapping function implied by the kernel function. The nearest neighbor of a point x is thus found with arg min_n d_kernel(x_n, x).
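The kernelized distance and the resulting 1-nearest-neighbor rule can be sketched in a few lines of numpy (function names are illustrative). With the linear kernel k(u, v) = u^T v, the kernelized distance reduces to the ordinary squared Euclidean distance, which makes a convenient sanity check:

```python
import numpy as np

def kernel_distance(k, xm, xn):
    """d_kernel(x_m, x_n) = k(x_m, x_m) + k(x_n, x_n) - 2 k(x_m, x_n)."""
    return k(xm, xm) + k(xn, xn) - 2 * k(xm, xn)

def nn_predict(k, X_train, y_train, x):
    """1-nearest-neighbor prediction using the kernelized distance."""
    dists = [kernel_distance(k, xn, x) for xn in X_train]
    return y_train[int(np.argmin(dists))]

# With the linear kernel, this is ordinary Euclidean 1-NN.
k_lin = lambda u, v: float(u @ v)
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]])
y_train = np.array([-1, -1, 1])

assert nn_predict(k_lin, X_train, y_train, np.array([0.2, 0.1])) == -1
assert nn_predict(k_lin, X_train, y_train, np.array([2.8, 3.1])) == 1
```

Swapping `k_lin` for an RBF or polynomial kernel gives nearest neighbors in the implied feature space without ever computing φ(·).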
24 Review of last lecture: Kernelized machine learning methods. Take-home exercise. You have seen examples of kernelizing linear regression and nearest neighbor. But can you kernelize the following? Decision tree; logistic (or multinomial logistic) regression.
26 Outline. 1 Administration 2 Review of last lecture 3 Support vector machines: Hinge loss; Primal formulation of SVM; Basic Lagrange duality theory; Dual formulation of SVM; A very simple example 4 Geometric Understanding of SVM
27 Support vector machines. One of the most commonly used machine learning algorithms. Convex optimization for classification and regression. It incorporates kernel tricks to define nonlinear decision boundaries or regression functions. It provides theoretical guarantees on generalization errors.
28 Support vector machines: Hinge loss. Definition. Assuming the label y ∈ {−1, 1} and the decision rule h(x) = sign(f(x)) with f(x) = w^T φ(x) + b, l_hinge(f(x), y) = 0 if yf(x) ≥ 1, and 1 − yf(x) otherwise. Intuition: penalize more if incorrectly classified (the left branch up to the kink point). Convenient shorthand: l_hinge(f(x), y) = max(0, 1 − yf(x)).
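The piecewise definition and its `max` shorthand are equivalent, which a few concrete evaluations make plain (a minimal sketch; the function name is illustrative):

```python
def hinge_loss(f, y):
    """l_hinge(f(x), y) = max(0, 1 - y f(x)), with y in {-1, +1}."""
    return max(0.0, 1.0 - y * f)

assert hinge_loss(2.0, +1) == 0.0    # correct with margin >= 1: no penalty
assert hinge_loss(0.5, +1) == 0.5    # correct but inside the margin: small penalty
assert hinge_loss(-1.0, +1) == 2.0   # misclassified: penalty grows linearly
assert hinge_loss(-2.0, -1) == 0.0   # correct for the negative class
```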
30 Support vector machines: Hinge loss. Properties. It upper-bounds the 0/1 loss function; optimizing it leads to reduced classification errors. Namely, we use the hinge loss function as a surrogate for the true error function we care about. This function is not differentiable at the kink point!
31 Support vector machines: Primal formulation of SVM. Primal formulation of support vector machines (SVM). Minimizing the total hinge loss on all the training data: min_{w,b} Σ_n max(0, 1 − y_n[w^T φ(x_n) + b]) + (λ/2)||w||_2^2, which is analogous to regularized least squares in that it balances two terms (the loss and the regularizer). Conventionally, we rewrite the objective function as min_{w,b} C Σ_n max(0, 1 − y_n[w^T φ(x_n) + b]) + (1/2)||w||_2^2, where C is identified as 1/λ. We further rewrite it into another equivalent form: min_{w,b,{ξ_n}} C Σ_n ξ_n + (1/2)||w||_2^2 s.t. max(0, 1 − y_n[w^T φ(x_n) + b]) = ξ_n, ∀n.
34 Support vector machines: Primal formulation of SVM. Primal formulation: min_{w,b,{ξ_n}} C Σ_n ξ_n + (1/2)||w||_2^2 s.t. 1 − y_n[w^T φ(x_n) + b] ≤ ξ_n, ξ_n ≥ 0, ∀n, where the ξ_n are called slack variables. Remarks: This is a convex quadratic program: the objective function is quadratic in w and the constraints are linear (inequality) constraints in w and ξ_n. Given φ(·), we can solve the optimization problem efficiently, as it is convex, for example using Matlab's quadprog() function. Moreover, there are efficient algorithms for solving this problem that take advantage of the special structure of the objective function and the constraints. (We will not discuss them; most existing SVM implementations/packages implement such efficient algorithms.)
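Short of a QP solver like quadprog, the unconstrained hinge-loss form of the primal from the previous slide can be minimized directly by sub-gradient descent, since the hinge is convex but non-differentiable only at the kink. This is a hedged numpy sketch, not the efficient structured solvers the slide alludes to; φ is taken to be the identity map, and the learning rate and iteration count are arbitrary choices:

```python
import numpy as np

def svm_primal_subgradient(X, y, C=1.0, lr=0.01, iters=2000, seed=0):
    """Minimize C * sum_n max(0, 1 - y_n (w^T x_n + b)) + 0.5 ||w||^2
    by sub-gradient descent (phi is the identity map here)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[1]) * 0.01
    b = 0.0
    for _ in range(iters):
        margins = y * (X @ w + b)
        active = margins < 1                     # points with nonzero hinge loss
        grad_w = w - C * (y[active, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# A tiny linearly separable problem.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = svm_primal_subgradient(X, y)
assert np.all(np.sign(X @ w + b) == y)           # all training points classified correctly
```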
38 Support vector machines: Basic Lagrange duality theory. Key concepts you should know: What do primal and dual mean? How SVM exploits the dual formulation, thus resulting in the use of kernel functions for nonlinear classification. What do support vectors mean? Our roadmap: We will tell you what the dual looks like; we will show you how it is derived.
39 Support vector machines: Dual formulation of SVM. Dual formulation. The dual is also a convex quadratic program: max_α Σ_n α_n − (1/2) Σ_{m,n} y_m y_n α_m α_n φ(x_m)^T φ(x_n) s.t. 0 ≤ α_n ≤ C, Σ_n α_n y_n = 0. Remarks: The optimization is convex, as the objective function is concave. (Take-home exercise: please verify.) There are N dual variables α_n, one for each constraint 1 − y_n[w^T φ(x_n) + b] ≤ ξ_n in the primal formulation.
42 Support vector machines: Dual formulation of SVM. Kernelized SVM. We replace the inner products φ(x_m)^T φ(x_n) with a kernel function: max_α Σ_n α_n − (1/2) Σ_{m,n} y_m y_n α_m α_n k(x_m, x_n) s.t. 0 ≤ α_n ≤ C, Σ_n α_n y_n = 0, as in kernelized linear regression and kernelized nearest neighbor. We only need to define a kernel function, and we automatically get (nonlinearly) mapped features and the support vector machine constructed with those features.
43 Support vector machines: Dual formulation of SVM. Recovering the solution to the primal formulation. Weights: w = Σ_n y_n α_n φ(x_n), a linear combination of the input features! Bias: b = y_n − w^T φ(x_n) = y_n − Σ_m y_m α_m k(x_m, x_n), for any n with C > α_n > 0. Making a prediction on a test point x: h(x) = sign(w^T φ(x) + b) = sign(Σ_n y_n α_n k(x_n, x) + b). Again, to make a prediction, it suffices to know the kernel function.
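The recovery formulas for b and h(x) can be sketched directly, assuming a dual solution α is already in hand. Here the α values are supplied by hand for a symmetric two-point toy problem with a linear kernel (for that problem the dual optimum does work out to α_1 = α_2 = 0.5 when C ≥ 0.5, but the point of the sketch is only the recovery step, not the solve):

```python
import numpy as np

def svm_bias(alpha, X, y, k, C):
    """Recover b from any support vector with 0 < alpha_n < C:
    b = y_n - sum_m y_m alpha_m k(x_m, x_n)."""
    n = int(np.argmax((alpha > 1e-8) & (alpha < C - 1e-8)))
    return y[n] - sum(y[m] * alpha[m] * k(X[m], X[n]) for m in range(len(X)))

def svm_predict(alpha, b, X, y, k, x):
    """h(x) = sign(sum_n y_n alpha_n k(x_n, x) + b)."""
    f = sum(y[n] * alpha[n] * k(X[n], x) for n in range(len(X))) + b
    return int(np.sign(f))

k_lin = lambda u, v: float(u @ v)
X = np.array([[1.0], [-1.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])             # dual solution, assumed here rather than computed
C = 1.0

b = svm_bias(alpha, X, y, k_lin, C)
assert np.isclose(b, 0.0)                # by symmetry of the two points
assert svm_predict(alpha, b, X, y, k_lin, np.array([2.0])) == 1
assert svm_predict(alpha, b, X, y, k_lin, np.array([-0.5])) == -1
```

Notice that both functions touch the data only through k(·, ·), which is exactly the point of the slide.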
46 Support vector machines: Dual formulation of SVM. Derivation of the dual. We will derive the dual formulation, as the process reveals some interesting and important properties of SVM; in particular, why is it called support vector? Recipe: Formulate a Lagrangian function that incorporates the constraints, through introducing dual variables. Minimize the Lagrangian function to solve for the primal variables. Put the primal variables into the Lagrangian and express it in terms of the dual variables. Maximize the Lagrangian with respect to the dual variables. Recover the solution (for the primal variables) from the dual variables.
47 Support vector machines: A very simple example. A simple example. Consider the following convex quadratic program: min (1/2)x^2 s.t. −x ≤ 0, 2x − 3 ≤ 0. The Lagrangian is (note that we do not have equality constraints): L(x, μ) = (1/2)x^2 + μ_1(−x) + μ_2(2x − 3) = (1/2)x^2 + (2μ_2 − μ_1)x − 3μ_2, under the constraint that μ_1 ≥ 0 and μ_2 ≥ 0. Its dual problem is max_{μ_1 ≥ 0, μ_2 ≥ 0} min_x L(x, μ) = max_{μ_1 ≥ 0, μ_2 ≥ 0} min_x (1/2)x^2 + (2μ_2 − μ_1)x − 3μ_2.
49 Support vector machines: A very simple example. Example (cont'd). We solve min_x L(x, μ) first; now it is unconstrained. The optimal x is attained by setting ∂/∂x [(1/2)x^2 + (2μ_2 − μ_1)x − 3μ_2] = 0, which gives x = −(2μ_2 − μ_1). This gives us the dual objective function, by substituting the solution into the objective: g(μ) = min_x (1/2)x^2 + (2μ_2 − μ_1)x − 3μ_2 = −(1/2)(2μ_2 − μ_1)^2 − 3μ_2. We get our dual problem: max_{μ_1 ≥ 0, μ_2 ≥ 0} −(1/2)(2μ_2 − μ_1)^2 − 3μ_2. We will solve the dual next.
52 Support vector machines: A very simple example. Solving the dual. Note that g(μ) = −(1/2)(2μ_2 − μ_1)^2 − 3μ_2 ≤ 0 for all μ_1 ≥ 0, μ_2 ≥ 0. Thus, to maximize the function, the optimal solution is μ*_1 = 0, μ*_2 = 0. This brings us back to the optimal solution for x: x* = −(2μ*_2 − μ*_1) = 0. Namely, we have arrived at the same solution as the one we could guess from the primal formulation.
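The toy problem is small enough to check both sides numerically by brute force over grids: the primal minimum over the feasible set 0 ≤ x ≤ 1.5 sits at x* = 0, the dual maximum sits at μ* = (0, 0), and the two optimal values agree (zero duality gap), as strong duality predicts for this convex QP. A numpy sketch:

```python
import numpy as np

# Primal: min x^2 / 2  s.t.  -x <= 0 and 2x - 3 <= 0  (i.e. 0 <= x <= 1.5).
xs = np.linspace(0.0, 1.5, 301)
primal_vals = 0.5 * xs**2
assert np.isclose(xs[np.argmin(primal_vals)], 0.0)      # primal optimum at x* = 0

# Dual: max g(mu) = -0.5 (2 mu_2 - mu_1)^2 - 3 mu_2  over  mu_1, mu_2 >= 0.
mu = np.linspace(0.0, 2.0, 101)
M1, M2 = np.meshgrid(mu, mu, indexing="ij")
g = -0.5 * (2 * M2 - M1) ** 2 - 3 * M2
i, j = np.unravel_index(np.argmax(g), g.shape)
assert mu[i] == 0.0 and mu[j] == 0.0                    # dual optimum at mu* = (0, 0)
assert g[i, j] == 0.0                                   # zero duality gap: g(mu*) equals primal value
```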
53 Support vector machines: Dual formulation of SVM. Deriving the dual for SVM. Lagrangian: L(w, b, {ξ_n}, {α_n}, {λ_n}) = C Σ_n ξ_n + (1/2)||w||_2^2 − Σ_n λ_n ξ_n + Σ_n α_n {1 − y_n[w^T φ(x_n) + b] − ξ_n}, under the constraint that α_n ≥ 0 and λ_n ≥ 0.
54 Support vector machines: Dual formulation of SVM. Minimizing the Lagrangian. Taking derivatives with respect to the primal variables: ∂L/∂w = w − Σ_n y_n α_n φ(x_n) = 0; ∂L/∂b = Σ_n α_n y_n = 0; ∂L/∂ξ_n = C − λ_n − α_n = 0. This gives rise to equations linking the primal variables and the dual variables, as well as new constraints on the dual variables: w = Σ_n y_n α_n φ(x_n); Σ_n α_n y_n = 0; C − λ_n − α_n = 0.
58 Support vector machines: Dual formulation of SVM. Rewriting the Lagrangian in terms of the dual variables. Substitute the solution for the primal variables back into the Lagrangian: g({α_n}, {λ_n}) = L(w, b, {ξ_n}, {α_n}, {λ_n}) = Σ_n (C − α_n − λ_n)ξ_n + (1/2)||Σ_n y_n α_n φ(x_n)||_2^2 + Σ_n α_n − (Σ_n α_n y_n) b − Σ_n α_n y_n (Σ_m y_m α_m φ(x_m))^T φ(x_n) = Σ_n α_n + (1/2)||Σ_n y_n α_n φ(x_n)||_2^2 − Σ_{m,n} α_m α_n y_m y_n φ(x_m)^T φ(x_n) = Σ_n α_n − (1/2) Σ_{m,n} α_m α_n y_m y_n φ(x_m)^T φ(x_n). Several terms vanish because of the constraints Σ_n α_n y_n = 0 and C − λ_n − α_n = 0.
62 Support vector machines: Dual formulation of SVM. The dual problem. Maximizing the dual under the constraints: max g({α_n}, {λ_n}) = Σ_n α_n − (1/2) Σ_{m,n} y_m y_n α_m α_n k(x_m, x_n) s.t. α_n ≥ 0, Σ_n α_n y_n = 0, C − λ_n − α_n = 0, λ_n ≥ 0, ∀n. We can simplify: the objective function does not depend on λ_n, so we can convert the equality constraint involving λ_n into an inequality constraint α_n ≤ C: {C − λ_n − α_n = 0, λ_n ≥ 0} ⇔ {λ_n = C − α_n ≥ 0} ⇔ {α_n ≤ C}.
64 Support vector machines: Dual formulation of SVM. Final form: max_α Σ_n α_n − (1/2) Σ_{m,n} y_m y_n α_m α_n φ(x_m)^T φ(x_n) s.t. 0 ≤ α_n ≤ C, Σ_n α_n y_n = 0.
65 Support vector machines: Dual formulation of SVM. Recovering the solution. The primal variable w is identified as w = Σ_n α_n y_n φ(x_n). To identify b, we need something else.
67 Support vector machines: Dual formulation of SVM. Complementary slackness and support vectors. At the optimal solution to both the primal and the dual, the following must be satisfied for every inequality constraint (these are called the KKT conditions): λ_n ξ_n = 0; α_n {1 − ξ_n − y_n[w^T φ(x_n) + b]} = 0. From the first condition, if α_n < C, then λ_n = C − α_n > 0, so ξ_n = 0. Thus, in conjunction with the second condition, we know that if C > α_n > 0, then 1 − y_n[w^T φ(x_n) + b] = 0, so b = y_n − w^T φ(x_n), as y_n ∈ {−1, 1}. Training samples whose α_n > 0 are called support vectors. (We will discuss their geometric interpretation later.)
69 Outline. 1 Administration 2 Review of last lecture 3 Support vector machines 4 Geometric Understanding of SVM
70 Geometric Understanding of SVM. Intuition: where to put the decision boundary? Consider the binary classification problem in the figure. We have assumed, for convenience, that the training dataset is separable: there is a decision boundary that separates the two classes perfectly. [Figure: several candidate separating boundaries, each labeled H.] There are infinitely many ways of placing the decision boundary H: w^T φ(x) + b = 0! Our intuition, however, is to put the decision boundary in the middle of the two classes as much as possible. In other words, we want the decision boundary to be as far from every point as possible, as long as it classifies every point correctly.
71 Geometric Understanding of SVM. Distances. The distance from a point φ(x) to the decision boundary is d_H(φ(x)) = |w^T φ(x) + b| / ||w||_2. (We derived this in the recitation/quiz 0; please re-verify it as a take-home exercise.) We can remove the absolute value by exploiting the fact that the decision boundary classifies every point in the training dataset correctly; namely, (w^T φ(x) + b) and x's label y have the same sign. The distance is then d_H(φ(x)) = y[w^T φ(x) + b] / ||w||_2.
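The point-to-hyperplane distance formula above is one line of numpy. A small sketch (with φ taken as the identity and an arbitrary example hyperplane); keeping the sign instead of the absolute value records which side of the boundary the point lies on, which is exactly what multiplying by the label y exploits:

```python
import numpy as np

def signed_distance(w, b, x):
    """Signed distance from phi(x) = x to the hyperplane w^T x + b = 0:
    (w^T x + b) / ||w||_2; its magnitude is the usual distance."""
    return (w @ x + b) / np.linalg.norm(w)

w, b = np.array([3.0, 4.0]), -5.0        # ||w||_2 = 5

assert np.isclose(signed_distance(w, b, np.array([1.0, 0.5])), 0.0)   # on the boundary
assert np.isclose(signed_distance(w, b, np.array([3.0, 4.0])), 4.0)   # (9 + 16 - 5) / 5
assert signed_distance(w, b, np.array([0.0, 0.0])) < 0                # other side
```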
72 Geometric Understanding of SVM. Maximizing the margin. Margin: the margin is defined as the smallest distance over all the training points: margin = min_n y_n[w^T φ(x_n) + b] / ||w||_2. Since we are interested in finding a w that puts all points as far as possible from the decision boundary, we maximize the margin: max_w min_n y_n[w^T φ(x_n) + b] / ||w||_2 = max_w (1/||w||_2) min_n y_n[w^T φ(x_n) + b].
74 Geometric Understanding of SVM. Rescaled margin. Since the margin does not change if we scale (w, b) by a constant factor c (as w^T φ(x) + b = 0 and (cw)^T φ(x) + (cb) = 0 are the same decision boundary), we fix the scale by forcing min_n y_n[w^T φ(x_n) + b] = 1. In this case, our margin becomes margin = 1/||w||_2; precisely, the closest point to the decision boundary has a distance of 1/||w||_2. [Figure: the hyperplanes w^T φ(x) + b = 1, w^T φ(x) + b = 0, and w^T φ(x) + b = −1, with the gap 1/||w||_2 marked.]
76 Geometric Understanding of SVM. Primal formulation. Combining everything we have, for a separable training dataset, we aim to max_w 1/||w||_2 such that y_n[w^T φ(x_n) + b] ≥ 1, ∀n. This is equivalent to min_w (1/2)||w||_2^2 s.t. y_n[w^T φ(x_n) + b] ≥ 1, ∀n. This starts to look like our first formulation of SVM. For this geometric intuition, SVM is called a max margin (or large margin) classifier, and the constraints are called large margin constraints.
77 Geometric Understanding of SVM: SVM for non-separable data

Suppose there are training data points that cannot be classified correctly no matter how we choose w. For those data points,

y_n [w^T φ(x_n) + b] ≤ 0 for any w.

Thus, the previous constraint

y_n [w^T φ(x_n) + b] ≥ 1, ∀ n

is no longer feasible.
78 Geometric Understanding of SVM: SVM for non-separable data

Suppose there are training data points that cannot be classified correctly no matter how we choose w. For those data points, y_n [w^T φ(x_n) + b] ≤ 0 for any w. Thus, the previous constraint y_n [w^T φ(x_n) + b] ≥ 1, ∀ n, is no longer feasible.

To deal with this issue, we introduce slack variables ξ_n to help:

y_n [w^T φ(x_n) + b] ≥ 1 - ξ_n, ∀ n

where we also require ξ_n ≥ 0. Note that even hard points that cannot be classified correctly can satisfy the above constraint through the slack variable (we can keep increasing ξ_n until the inequality is met).
79 Geometric Understanding of SVM: Primal formulation with slack variables

We obviously do not want ξ_n to go to infinity, so we balance their sizes by penalizing them toward zero as much as possible:

min_{w, b, ξ} (1/2) ||w||_2^2 + C Σ_n ξ_n
s.t. y_n [w^T φ(x_n) + b] ≥ 1 - ξ_n, ∀ n
     ξ_n ≥ 0, ∀ n

where C is our tradeoff (hyper)parameter. This is precisely the primal formulation we first derived for the SVM.
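For any fixed (w, b), the smallest feasible slacks are ξ_n = max(0, 1 - y_n [w^T φ(x_n) + b]), which makes the penalized objective easy to evaluate. A minimal sketch, with made-up data and a made-up classifier:

```python
import numpy as np

X = np.array([[2.0, 2.0], [0.2, 0.1], [-1.0, -1.0], [0.5, 0.5]])
y = np.array([1, 1, -1, -1])           # last point is misclassified by w below
w, b = np.array([1.0, 1.0]), 0.0       # arbitrary candidate classifier
C = 10.0                               # tradeoff hyperparameter

# Minimal slack satisfying y_n [w^T x_n + b] >= 1 - xi_n with xi_n >= 0.
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))

objective = 0.5 * w @ w + C * xi.sum()
print(xi, objective)
```

Here the second point is classified correctly but falls inside the margin (0 < ξ < 1), while the last point is misclassified (ξ > 1), previewing the case analysis that follows.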
80 Geometric Understanding of SVM: Meaning of support vectors in SVMs

Complementary slackness: at the optimum, we must have

α_n {1 - ξ_n - y_n [w^T φ(x_n) + b]} = 0, ∀ n

That means, for some n, α_n = 0. Additionally, our optimal solution is given by

w = Σ_n α_n y_n φ(x_n) = Σ_{n: α_n > 0} α_n y_n φ(x_n)

In words, the solution is determined only by those training samples whose corresponding α_n is strictly positive. Those samples are called support vectors. Non-support vectors, whose α_n = 0, can be removed from the training dataset: this removal does not affect the optimal solution (i.e., after the removal, if we train another SVM classifier on the reduced dataset, the optimal solution is the same as the one on the original dataset).
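This removal property can be observed with any soft-margin SVM solver. A sketch using scikit-learn's SVC with a linear kernel on made-up data: refitting on the support vectors alone recovers the same (w, b) up to solver tolerance.

```python
import numpy as np
from sklearn.svm import SVC

# Two made-up Gaussian blobs, labels in {-1, +1}.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + [2, 2], rng.randn(20, 2) - [2, 2]])
y = np.array([1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
sv = clf.support_                      # indices n with alpha_n > 0

# Refit on the support vectors only: same solution (up to tolerance).
clf2 = SVC(kernel="linear", C=1.0).fit(X[sv], y[sv])
print(len(sv),
      np.allclose(clf.coef_, clf2.coef_, atol=1e-4),
      np.allclose(clf.intercept_, clf2.intercept_, atol=1e-4))
```

Typically only a small fraction of the 40 points end up as support vectors; all other points could be discarded without changing the classifier.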
81 Geometric Understanding of SVM: Who are the support vectors? Case analysis

For a support vector (α_n > 0), complementary slackness gives

1 - ξ_n - y_n [w^T φ(x_n) + b] = 0

We have three cases:

ξ_n = 0: this implies y_n [w^T φ(x_n) + b] = 1. These are points that lie exactly 1/||w||_2 away from the decision boundary, i.e., on the margin.

0 < ξ_n < 1: these are points that can be classified correctly but do not satisfy the large margin constraint; they have smaller distances to the decision boundary.

ξ_n > 1: these are points that are misclassified.
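The three cases can be read off numerically by computing each point's slack. A minimal sketch with made-up data and a made-up classifier whose margin hyperplanes are w^T x + b = ±1:

```python
import numpy as np

X = np.array([[2.0, 2.0], [0.6, 0.4], [0.2, 0.1], [-2.0, -2.0], [0.5, 0.5]])
y = np.array([1, 1, 1, -1, -1])
w, b = np.array([1.0, 1.0]), 0.0       # made-up classifier, margin = 1/||w||_2

activations = y * (X @ w + b)
xi = np.maximum(0.0, 1.0 - activations)

on_margin     = np.isclose(xi, 0) & np.isclose(activations, 1)  # xi = 0, on margin
inside_margin = (xi > 0) & (xi < 1)    # correct, but violates the margin constraint
misclassified = xi > 1
# Points with xi = 0 and activation > 1 lie strictly outside the margin
# (typically not support vectors).
print(on_margin, inside_margin, misclassified)
```

Here one point sits exactly on the margin, one is correct but inside the margin, and one is misclassified; the remaining two are safely beyond the margin with ξ = 0.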
82 Geometric Understanding of SVM: Visualization of how training data points are categorized

(Figure: decision boundary H: w^T φ(x) + b = 0 between the margin hyperplanes w^T φ(x) + b = 1 and w^T φ(x) + b = -1, with points labeled ξ = 0, ξ < 1, and ξ > 1.) Support vectors are those circled with the orange line.