CSCI567 Machine Learning (Fall 2014)

1 CSCI567 Machine Learning (Fall 2014). Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu). October 9, 2014.

2 Outline
1 Administration
2 Review of last lecture
3 Support vector machines
4 Geometric Understanding of SVM

3 Administration: Quiz #1
Tuesday Oct, pm, TTH301. Some exceptions are handled case by case.

4 Outline
1 Administration
2 Review of last lecture: kernel methods; kernelized machine learning methods
3 Support vector machines
4 Geometric Understanding of SVM

5-6 Review of last lecture: Kernel methods
How to do nonlinear prediction without specifying nonlinear basis functions?
Definition of kernel function: a (positive semidefinite) kernel function k(·,·) is a bivariate function that satisfies the following properties: for any x_m and x_n, k(x_m, x_n) = k(x_n, x_m) and k(x_m, x_n) = φ(x_m)^T φ(x_n) for some function φ(·).
Examples we have seen:
k(x_m, x_n) = (x_m^T x_n)²
k(x_m, x_n) = [sin(2π(x_m1 − x_n1)) / (x_m1 − x_n1)] · [sin(2π(x_m2 − x_n2)) / (x_m2 − x_n2)]

7 Conditions for being a positive semidefinite kernel function
Mercer's theorem (loosely): a bivariate function k(·,·) is a positive semidefinite kernel function if and only if, for any N and any x_1, x_2, ..., x_N, the matrix
K = [ k(x_1, x_1)  k(x_1, x_2)  ...  k(x_1, x_N) ]
    [ k(x_2, x_1)  k(x_2, x_2)  ...  k(x_2, x_N) ]
    [     ...          ...      ...      ...     ]
    [ k(x_N, x_1)  k(x_N, x_2)  ...  k(x_N, x_N) ]
is positive semidefinite. We also refer to k(·,·) as a positive semidefinite kernel.

8 Flashback: why use kernel functions?
Without specifying φ(·), the kernel matrix K with entries k(x_i, x_j) is exactly the same as
K = ΦΦ^T = [ φ(x_1)^T φ(x_1)  φ(x_1)^T φ(x_2)  ...  φ(x_1)^T φ(x_N) ]
           [ φ(x_2)^T φ(x_1)  φ(x_2)^T φ(x_2)  ...  φ(x_2)^T φ(x_N) ]
           [       ...              ...        ...        ...       ]
           [ φ(x_N)^T φ(x_1)  φ(x_N)^T φ(x_2)  ...  φ(x_N)^T φ(x_N) ]

9-11 Kernel functions (definition and examples as on slides 5-6)
Examples that are not kernels: k(x_m, x_n) = ‖x_m − x_n‖₂² is not our desired kernel function, as it cannot be written as an inner product between two vectors.

14-16 Examples of kernel functions
Polynomial kernel function with degree d:
k(x_m, x_n) = (x_m^T x_n + c)^d
for c ≥ 0 and d a positive integer.
Gaussian kernel (RBF kernel, or Gaussian RBF kernel):
k(x_m, x_n) = e^{−‖x_m − x_n‖₂² / 2σ²}
Most of these kernels have parameters to be tuned: d, c, σ², etc. They are hyperparameters and are often tuned on holdout data or with cross-validation.

17 Why is ‖x_m − x_n‖₂² not a positive semidefinite kernel?
Use the definition of a positive semidefinite kernel function. We choose N = 2 and compute the matrix
K = [ 0              ‖x_1 − x_2‖₂² ]
    [ ‖x_1 − x_2‖₂²  0             ]
This matrix cannot be positive semidefinite, as it has both negative and positive eigenvalues. (The sum of the diagonal elements is called the trace of a matrix, and it equals the sum of the matrix's eigenvalues; in our case the trace is zero, so unless both eigenvalues are zero, one of them must be negative.)
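This argument is easy to check numerically; a minimal NumPy sketch (the two points are arbitrary illustrative values):

```python
import numpy as np

x1, x2 = np.array([0.0, 0.0]), np.array([1.0, 2.0])  # arbitrary example points
d2 = np.sum((x1 - x2) ** 2)                          # squared Euclidean distance

# "kernel" matrix for k(xm, xn) = ||xm - xn||_2^2 with N = 2
K = np.array([[0.0, d2],
              [d2, 0.0]])

print(np.linalg.eigvalsh(K))   # one negative, one positive eigenvalue -> not PSD
```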

18 There are infinitely many kernels to use!
Rules for composing kernels (this is just a partial list; a numeric sanity check follows the list):
if k(x_m, x_n) is a kernel, then c·k(x_m, x_n) is also a kernel if c > 0.
if both k_1(x_m, x_n) and k_2(x_m, x_n) are kernels, then αk_1(x_m, x_n) + βk_2(x_m, x_n) is also a kernel if α, β ≥ 0.
if both k_1(x_m, x_n) and k_2(x_m, x_n) are kernels, then k_1(x_m, x_n)·k_2(x_m, x_n) is also a kernel.
if k(x_m, x_n) is a kernel, then e^{k(x_m, x_n)} is also a kernel.
In practice, choosing which kernel to use, or which kernels to compose into a new one, remains something of a black art, though most people start with polynomial and Gaussian RBF kernels.
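The closure rules can be sanity-checked numerically by composing kernel matrices on random data and confirming that the smallest eigenvalue stays nonnegative; a sketch under assumed toy data (a spot check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                  # 20 random points in R^3

lin = X @ X.T                                 # linear kernel matrix
rbf = np.exp(-np.square(X[:, None] - X[None, :]).sum(-1) / 2.0)  # Gaussian RBF, sigma^2 = 1

for K in (3.0 * lin, 0.5 * lin + 2.0 * rbf, lin * rbf, np.exp(lin)):
    # smallest eigenvalue should be >= 0, up to numerical round-off
    print(np.linalg.eigvalsh(K).min())
```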

19 Kernelization trick
Many learning methods depend on computing inner products between features; we have seen the example of regularized least squares. For those methods, we can use a kernel function in place of the inner products, i.e., kernelize the methods, thus introducing nonlinear features/basis functions. We will present one more example to illustrate this trick, by kernelizing the nearest neighbor classifier. When we talk about support vector machines next lecture, we will see the trick one more time.

20-23 Kernelized nearest neighbor classifier
In the nearest neighbor classifier, the most important quantity to compute is the (squared) distance between two data points x_m and x_n:
d(x_m, x_n) = ‖x_m − x_n‖₂² = x_m^T x_m + x_n^T x_n − 2 x_m^T x_n
We replace all the inner products in the distance with a kernel function k(·,·), arriving at the kernelized distance
d_kernel(x_m, x_n) = k(x_m, x_m) + k(x_n, x_n) − 2 k(x_m, x_n)
This distance is equivalent to computing the distance between φ(x_m) and φ(x_n):
d_kernel(x_m, x_n) = d(φ(x_m), φ(x_n))
where φ(·) is the nonlinear mapping function implied by the kernel function. The nearest neighbor of a point x is thus found with
arg min_n d_kernel(x_n, x)
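A minimal sketch of the kernelized 1-nearest-neighbor rule in NumPy; the Gaussian RBF kernel and the toy data are illustrative assumptions:

```python
import numpy as np

def rbf(a, b, sigma2=1.0):
    """Gaussian RBF kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma2))

def kernel_1nn(Xtrain, ytrain, x, k=rbf):
    # d_kernel(x_n, x) = k(x_n, x_n) + k(x, x) - 2 k(x_n, x)
    dists = [k(xn, xn) + k(x, x) - 2.0 * k(xn, x) for xn in Xtrain]
    return ytrain[int(np.argmin(dists))]

# toy usage
Xtr = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]])
ytr = np.array([-1, -1, 1])
print(kernel_1nn(Xtr, ytr, np.array([2.6, 2.9])))   # -> 1
```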

24 Take-home exercise
You have seen examples of kernelizing linear regression and nearest neighbor. But can you kernelize the following? Decision tree; logistic (or multinomial logistic) regression.

26 Outline
1 Administration
2 Review of last lecture
3 Support vector machines: hinge loss; primal formulation of SVM; basic Lagrange duality theory; dual formulation of SVM; a very simple example
4 Geometric Understanding of SVM

27 Support vector machines
One of the most commonly used machine learning algorithms. It is convex optimization for classification and regression; it incorporates kernel tricks to define nonlinear decision boundaries or regression functions; and it provides theoretical guarantees on generalization errors.

28-29 Hinge loss
Definition: assuming the label y ∈ {−1, 1} and the decision rule h(x) = sign(f(x)) with f(x) = w^T φ(x) + b,
ℓ_hinge(f(x), y) = 0 if y·f(x) ≥ 1, and 1 − y·f(x) otherwise
Intuition: penalize more if incorrectly classified (the left branch to the kink point).
Convenient shorthand: ℓ_hinge(f(x), y) = max(0, 1 − y·f(x))
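The shorthand translates directly into vectorized code; a minimal NumPy sketch:

```python
import numpy as np

def hinge_loss(f, y):
    """Elementwise hinge loss max(0, 1 - y * f(x)) for scores f and labels y in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y * f)

print(hinge_loss(np.array([2.0, 0.5, -1.0]), np.array([1, 1, 1])))  # [0.  0.5 2. ]
```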

30 Hinge loss: properties
It upper-bounds the 0/1 loss function (the black line in the slide's figure), so optimizing it leads to reduced classification errors; namely, we use the hinge loss function as a surrogate for the true error function we care about. This function is not differentiable at the kink point!

31-33 Primal formulation of support vector machines (SVM)
Minimizing the total hinge loss on all the training data:
min_{w,b} Σ_n max(0, 1 − y_n[w^T φ(x_n) + b]) + (λ/2) ‖w‖₂²
which is analogous to regularized least squares, balancing two terms (the loss and the regularizer).
Conventionally, we rewrite the objective function as
min_{w,b} C Σ_n max(0, 1 − y_n[w^T φ(x_n) + b]) + (1/2) ‖w‖₂²
where C is identified as 1/λ.
We further rewrite it into another equivalent form:
min_{w,b,{ξ_n}} C Σ_n ξ_n + (1/2) ‖w‖₂²
s.t. max(0, 1 − y_n[w^T φ(x_n) + b]) = ξ_n, ∀n

34-37 Primal formulation of SVM
Primal formulation:
min_{w,b,{ξ_n}} C Σ_n ξ_n + (1/2) ‖w‖₂²
s.t. 1 − y_n[w^T φ(x_n) + b] ≤ ξ_n, ∀n
ξ_n ≥ 0, ∀n
where the ξ_n are called slack variables.
Remarks:
This is a convex quadratic program: the objective function is quadratic in w and the constraints are linear (inequality) constraints in w and ξ_n.
Given φ(·), we can solve the optimization problem efficiently as it is convex, for example using Matlab's quadprog() function.
There are also more efficient algorithms that take advantage of the special structure of the objective function and the constraints. (We will not discuss them; most existing SVM implementations/packages implement such efficient algorithms.)
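In the same spirit as the quadprog() remark above, here is a minimal sketch that assembles this primal QP over the stacked variable z = (w, b, ξ) and hands it to CVXOPT's solvers.qp; the linear kernel (φ = identity), the toy data, the value of C, and the tiny ridge added for numerical stability are all assumptions of this sketch:

```python
import numpy as np
from cvxopt import matrix, solvers

solvers.options['show_progress'] = False

def svm_primal(X, y, C=1.0):
    """Linear SVM primal (phi = identity) as a QP over z = (w, b, xi)."""
    N, D = X.shape
    n_var = D + 1 + N
    P = np.zeros((n_var, n_var))
    P[:D, :D] = np.eye(D)                    # (1/2) ||w||_2^2
    P += 1e-8 * np.eye(n_var)                # tiny ridge so the KKT system stays nonsingular
    q = np.hstack([np.zeros(D + 1), C * np.ones(N)])   # C * sum_n xi_n

    # constraints: -y_n (w^T x_n + b) - xi_n <= -1  and  -xi_n <= 0
    G = np.zeros((2 * N, n_var))
    G[:N, :D] = -y[:, None] * X
    G[:N, D] = -y
    G[:N, D + 1:] = -np.eye(N)
    G[N:, D + 1:] = -np.eye(N)
    h = np.hstack([-np.ones(N), np.zeros(N)])

    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    z = np.array(sol['x']).ravel()
    return z[:D], z[D]                       # w, b

X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = svm_primal(X, y, C=10.0)
print(w, b)
```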

38 Basic Lagrange duality theory
Key concepts you should know:
What do primal and dual mean?
How SVM exploits the dual formulation, which results in using kernel functions for nonlinear classification.
What do support vectors mean?
Our roadmap: we will tell you what the dual looks like, and we will show you how it is derived.

39-41 Dual formulation of SVM
The dual is also a convex quadratic program:
max_α Σ_n α_n − (1/2) Σ_{m,n} y_m y_n α_m α_n φ(x_m)^T φ(x_n)
s.t. 0 ≤ α_n ≤ C, ∀n
Σ_n α_n y_n = 0
Remarks:
The optimization is convex, as the objective function is concave. (Take-home exercise: please verify.)
There are N dual variables α_n, one for each constraint 1 − y_n[w^T φ(x_n) + b] ≤ ξ_n in the primal formulation.

42 Kernelized SVM
We replace the inner products φ(x_m)^T φ(x_n) with a kernel function:
max_α Σ_n α_n − (1/2) Σ_{m,n} y_m y_n α_m α_n k(x_m, x_n)
s.t. 0 ≤ α_n ≤ C, ∀n
Σ_n α_n y_n = 0
as in kernelized linear regression and kernelized nearest neighbor. We only need to define a kernel function, and we automatically get (nonlinearly) mapped features and the support vector machine constructed with those features.
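The kernelized dual is likewise a standard QP; a minimal CVXOPT sketch (the linear kernel, the toy data, C, and the small ridge on Q are illustrative assumptions):

```python
import numpy as np
from cvxopt import matrix, solvers

solvers.options['show_progress'] = False

def svm_dual(K, y, C=1.0):
    """Solve the kernel SVM dual given a precomputed kernel matrix K."""
    N = len(y)
    Q = np.outer(y, y) * K                     # Q_mn = y_m y_n k(x_m, x_n)
    Q += 1e-8 * np.eye(N)                      # small ridge for numerical stability
    q = -np.ones(N)                            # maximize sum(alpha) -> minimize -1^T alpha
    G = np.vstack([-np.eye(N), np.eye(N)])     # 0 <= alpha_n <= C
    h = np.hstack([np.zeros(N), C * np.ones(N)])
    A = matrix(y[None, :])                     # sum_n alpha_n y_n = 0
    sol = solvers.qp(matrix(Q), matrix(q), matrix(G), matrix(h), A, matrix(0.0))
    return np.array(sol['x']).ravel()

X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T                                    # linear kernel for illustration
alpha = svm_dual(K, y, C=10.0)
print(alpha)
```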

43-45 Recovering the solution to the primal formulation
Weights:
w = Σ_n y_n α_n φ(x_n)
A linear combination of the input features!
b:
b = y_n − w^T φ(x_n) = y_n − Σ_m y_m α_m k(x_m, x_n), for any n with C > α_n > 0
Making a prediction on a test point x:
h(x) = sign(w^T φ(x) + b) = sign(Σ_n y_n α_n k(x_n, x) + b)
Again, to make a prediction, it suffices to know the kernel function.
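Continuing the dual sketch above (alpha, K, X, y as computed there), b and the prediction need only kernel evaluations; the tolerance used to pick an n with 0 < α_n < C is an assumption:

```python
import numpy as np

def recover_b(alpha, K, y, C, tol=1e-6):
    # any n with 0 < alpha_n < C gives b = y_n - sum_m y_m alpha_m k(x_m, x_n)
    # (assumes at least one such n exists)
    n = int(np.argmax((alpha > tol) & (alpha < C - tol)))
    return y[n] - np.sum(y * alpha * K[:, n])

def predict(alpha, b, X, y, x, kernel=lambda a, c: a @ c):
    # h(x) = sign( sum_n y_n alpha_n k(x_n, x) + b ); linear kernel by default
    score = sum(y[n] * alpha[n] * kernel(X[n], x) for n in range(len(y)))
    return np.sign(score + b)
```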

46 Derivation of the dual
We will derive the dual formulation, as the process reveals some interesting and important properties of SVM. In particular, why is it called a support vector machine?
Recipe:
Formulate a Lagrangian function that incorporates the constraints, through introducing dual variables.
Minimize the Lagrangian function to solve for the primal variables.
Put the primal variables back into the Lagrangian and express it in terms of the dual variables.
Maximize the Lagrangian with respect to the dual variables.
Recover the solution (for the primal variables) from the dual variables.

47-48 A very simple example
Consider this example of convex quadratic programming:
min (1/2) x²
s.t. x ≥ 0
2x − 3 ≤ 0
The Lagrangian is (note that we do not have equality constraints)
L(x, µ) = (1/2) x² + µ_1 (−x) + µ_2 (2x − 3) = (1/2) x² + (2µ_2 − µ_1) x − 3µ_2
under the constraint that µ_1 ≥ 0 and µ_2 ≥ 0.
Its dual problem is
max_{µ_1 ≥ 0, µ_2 ≥ 0} min_x L(x, µ) = max_{µ_1 ≥ 0, µ_2 ≥ 0} min_x (1/2) x² + (2µ_2 − µ_1) x − 3µ_2

49-51 Example (cont'd)
We solve min_x L(x, µ) first; it is now unconstrained. The optimal x is attained by
∂/∂x [ (1/2) x² + (2µ_2 − µ_1) x − 3µ_2 ] = 0  ⟹  x = −(2µ_2 − µ_1)
This gives us the dual objective function, by substituting the solution into the objective function:
g(µ) = min_x (1/2) x² + (2µ_2 − µ_1) x − 3µ_2 = −(1/2)(2µ_2 − µ_1)² − 3µ_2
We get our dual problem:
max_{µ_1 ≥ 0, µ_2 ≥ 0} −(1/2)(2µ_2 − µ_1)² − 3µ_2
We will solve the dual next.

52 Solving the dual
Note that g(µ) = −(1/2)(2µ_2 − µ_1)² − 3µ_2 ≤ 0 for all µ_1 ≥ 0, µ_2 ≥ 0. Thus, to maximize the function, the optimal solution is µ_1* = 0, µ_2* = 0. This brings us back to the optimal solution for x:
x* = −(2µ_2* − µ_1*) = 0
Namely, we have arrived at the same solution as the one we guessed from the primal formulation.
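The toy problem is small enough to verify numerically; a sketch using SciPy's SLSQP solver (inequality constraints written in SciPy's fun(x) ≥ 0 convention):

```python
import numpy as np
from scipy.optimize import minimize

res = minimize(
    lambda x: 0.5 * x[0] ** 2,
    x0=np.array([1.0]),
    method='SLSQP',
    constraints=[{'type': 'ineq', 'fun': lambda x: x[0]},               # x >= 0
                 {'type': 'ineq', 'fun': lambda x: 3.0 - 2.0 * x[0]}],  # 2x - 3 <= 0
)
print(res.x)   # ~[0.], matching the dual's answer x* = 0
```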

53 Deriving the dual for SVM
The Lagrangian is
L(w, b, {ξ_n}, {α_n}, {λ_n}) = C Σ_n ξ_n + (1/2) ‖w‖₂² − Σ_n λ_n ξ_n + Σ_n α_n {1 − y_n[w^T φ(x_n) + b] − ξ_n}
under the constraint that α_n ≥ 0 and λ_n ≥ 0, ∀n.

54-57 Minimizing the Lagrangian
Taking derivatives with respect to the primal variables:
∂L/∂w = w − Σ_n y_n α_n φ(x_n) = 0
∂L/∂b = Σ_n α_n y_n = 0
∂L/∂ξ_n = C − λ_n − α_n = 0
This gives rise to equations linking the primal variables and the dual variables, as well as to new constraints on the dual variables:
w = Σ_n y_n α_n φ(x_n)
Σ_n α_n y_n = 0
C − λ_n − α_n = 0

58-61 Rewriting the Lagrangian in terms of the dual variables
Substitute the solution for the primal back into the Lagrangian:
g({α_n}, {λ_n}) = L(w, b, {ξ_n}, {α_n}, {λ_n})
= Σ_n (C − α_n − λ_n) ξ_n + (1/2) ‖Σ_n y_n α_n φ(x_n)‖₂² + Σ_n α_n − b Σ_n α_n y_n − Σ_n α_n y_n (Σ_m y_m α_m φ(x_m))^T φ(x_n)
= Σ_n α_n + (1/2) ‖Σ_n y_n α_n φ(x_n)‖₂² − Σ_{m,n} α_m α_n y_m y_n φ(x_m)^T φ(x_n)
= Σ_n α_n − (1/2) Σ_{m,n} α_m α_n y_m y_n φ(x_m)^T φ(x_n)
Several terms vanish because of the constraints Σ_n α_n y_n = 0 and C − λ_n − α_n = 0.

62-63 The dual problem
Maximizing the dual under the constraints:
max g({α_n}, {λ_n}) = Σ_n α_n − (1/2) Σ_{m,n} y_m y_n α_m α_n k(x_m, x_n)
s.t. α_n ≥ 0, ∀n
Σ_n α_n y_n = 0
C − λ_n − α_n = 0, λ_n ≥ 0, ∀n
We can simplify: since the objective function does not depend on λ_n, we can convert the equality constraint involving λ_n into an inequality constraint α_n ≤ C:
C − λ_n − α_n = 0, λ_n ≥ 0  ⟹  λ_n = C − α_n ≥ 0  ⟹  α_n ≤ C

64 Final form
max_α Σ_n α_n − (1/2) Σ_{m,n} y_m y_n α_m α_n φ(x_m)^T φ(x_n)
s.t. 0 ≤ α_n ≤ C, ∀n
Σ_n α_n y_n = 0

65-66 Recovering the solution
The primal variable w is identified as
w = Σ_n α_n y_n φ(x_n)
To identify b, we need something else.

67-68 Complementary slackness and support vectors
At the optimal solution to both the primal and the dual, the following must be satisfied for every inequality constraint (these are called the KKT conditions):
λ_n ξ_n = 0
α_n {1 − ξ_n − y_n[w^T φ(x_n) + b]} = 0
If α_n < C, then λ_n = C − α_n > 0, and the first condition forces ξ_n = 0. Thus, in conjunction with the second condition, we know that if C > α_n > 0, then
1 − y_n[w^T φ(x_n) + b] = 0, i.e., b = y_n − w^T φ(x_n)
since y_n ∈ {−1, 1}. Training samples whose α_n > 0 are called support vectors. (We will discuss their geometric interpretation later.)

69 Outline
1 Administration
2 Review of last lecture
3 Support vector machines
4 Geometric Understanding of SVM

70 Intuition: where to put the decision boundary?
Consider the binary classification problem in the figure (which shows several candidate boundaries H). We have assumed, for convenience, that the training dataset is separable: there is a decision boundary that separates the two classes perfectly. There are infinitely many ways of putting the decision boundary H: w^T φ(x) + b = 0! Our intuition, however, is to put the decision boundary in the middle of the two classes as much as possible. In other words, we want the decision boundary to be as far from every point as possible, as long as it classifies every point correctly.

71 Distances
The distance from a point φ(x) to the decision boundary H is
d_H(φ(x)) = |w^T φ(x) + b| / ‖w‖₂
(We derived this in the recitation/quiz 0. Please re-verify it as a take-home exercise.) We can remove the absolute value by exploiting the fact that the decision boundary classifies every point in the training dataset correctly; namely, w^T φ(x) + b and x's label y are of the same sign. The distance is now
d_H(φ(x)) = y[w^T φ(x) + b] / ‖w‖₂
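A short numeric illustration of the distance formula (w, b, and the points are arbitrary):

```python
import numpy as np

w, b = np.array([3.0, 4.0]), -1.0       # boundary: 3*x1 + 4*x2 - 1 = 0, ||w||_2 = 5
X = np.array([[1.0, 2.0], [0.1, 0.1]])
y = np.array([1.0, -1.0])               # both points correctly classified here

print(np.abs(X @ w + b) / np.linalg.norm(w))   # unsigned distances
print(y * (X @ w + b) / np.linalg.norm(w))     # same values, absolute value removed via labels
```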

72-73 Maximizing the margin
Margin: the margin is defined as the smallest distance over all the training points:
margin = min_n y_n[w^T φ(x_n) + b] / ‖w‖₂
Since we are interested in finding a w that puts all points as distant as possible from the decision boundary, we maximize the margin:
max_w min_n y_n[w^T φ(x_n) + b] / ‖w‖₂ = max_w (1/‖w‖₂) min_n y_n[w^T φ(x_n) + b]
(Figure: the boundary H: w^T φ(x) + b = 0 and the distance (w^T φ(x) + b)/‖w‖₂ of a point to it.)

74-75 Rescaled margin
Since the margin does not change if we scale (w, b) by a constant factor c (as w^T φ(x) + b = 0 and (cw)^T φ(x) + (cb) = 0 are the same decision boundary), we fix the scale by forcing
min_n y_n[w^T φ(x_n) + b] = 1
In this case, our margin becomes
margin = 1/‖w‖₂
Precisely, the closest point to the decision boundary has a distance of 1/‖w‖₂.
(Figure: the boundary H: w^T φ(x) + b = 0 with the margin hyperplanes w^T φ(x) + b = 1 and w^T φ(x) + b = −1 at distance 1/‖w‖₂.)

76 Primal formulation
Combining everything we have, for a separable training dataset, we aim to
max_w 1/‖w‖₂ such that y_n[w^T φ(x_n) + b] ≥ 1, ∀n
This is equivalent to
min_w (1/2) ‖w‖₂² s.t. y_n[w^T φ(x_n) + b] ≥ 1, ∀n
This starts to look like our first formulation for SVM. For this geometric intuition, SVM is called a max margin (or large margin) classifier, and the constraints are called large margin constraints.

77-78 SVM for non-separable data
Suppose there are training data points that cannot be classified correctly no matter how we choose w. For those data points, y_n[w^T φ(x_n) + b] ≤ 0 for any w. Thus, the previous constraint y_n[w^T φ(x_n) + b] ≥ 1, ∀n is no longer feasible.
To deal with this issue, we introduce slack variables ξ_n to help:
y_n[w^T φ(x_n) + b] ≥ 1 − ξ_n
where we also require ξ_n ≥ 0. Note that, even for hard points that cannot be classified correctly, the slack variables make the above constraint satisfiable (we can keep increasing ξ_n until the inequality is met).

79 SVM primal formulation with slack variables
We obviously do not want the ξ_n to go to infinity, so we balance their sizes by penalizing them toward zero as much as possible:
min_{w,b,{ξ_n}} (1/2) ‖w‖₂² + C Σ_n ξ_n
s.t. y_n[w^T φ(x_n) + b] ≥ 1 − ξ_n, ∀n
ξ_n ≥ 0, ∀n
where C is our tradeoff (hyper)parameter. This is precisely the primal formulation we first got for SVM.

80 Meaning of support vectors in SVMs
Complementary slackness: at the optimum, we must have
α_n {1 − ξ_n − y_n[w^T φ(x_n) + b]} = 0, ∀n
That means that for some n, α_n = 0. Additionally, our optimal solution is given by
w = Σ_n α_n y_n φ(x_n) = Σ_{n: α_n > 0} α_n y_n φ(x_n)
In words, our solution is determined only by those training samples whose corresponding α_n is strictly positive. Those samples are called support vectors. Non-support vectors, whose α_n = 0, can be removed from the training dataset, and this removal will not affect the optimal solution (i.e., after the removal, if we construct another SVM classifier on the reduced dataset, the optimal solution is the same as the one on the original dataset).

81 Who are the support vectors? Case analysis
Since support vectors satisfy 1 − ξ_n − y_n[w^T φ(x_n) + b] = 0, we have the following cases:
ξ_n = 0: this implies y_n[w^T φ(x_n) + b] = 1. These are points that are exactly 1/‖w‖₂ away from the decision boundary.
0 < ξ_n < 1: these are points that are classified correctly but do not satisfy the large margin constraint; they have smaller distances to the decision boundary.
ξ_n > 1: these are points that are misclassified.
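This case analysis can be read off numerically from ξ_n = max(0, 1 − y_n f(x_n)); a sketch assuming a trained linear-kernel solution (w, b) and the dual variables α:

```python
import numpy as np

def categorize_support_vectors(X, y, w, b, alpha, tol=1e-6):
    """Bucket the support vectors (alpha_n > 0) by their slack xi_n."""
    xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
    sv = alpha > tol                                # support vectors
    on_margin     = sv & (xi <= tol)                # y_n f(x_n) = 1: distance 1/||w||_2
    inside_margin = sv & (xi > tol) & (xi < 1.0)    # correct, but violating the margin
    misclassified = sv & (xi > 1.0)                 # wrong side of the boundary
    return on_margin, inside_margin, misclassified
```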

82 Visualization of how training data points are categorized
(Figure: the boundary H: w^T φ(x) + b = 0 with the margin hyperplanes w^T φ(x) + b = ±1; points labeled ξ = 0, ξ < 1, and ξ > 1. Support vectors are those circled with the orange line.)

CSCI567 Machine Learning (Fall 2014)

CSCI567 Machine Learning (Fall 2014) CSCI567 Machie Learig (Fall 2014) Drs. Sha & Liu {feisha,yaliu.cs}@usc.edu October 14, 2014 Drs. Sha & Liu ({feisha,yaliu.cs}@usc.edu) CSCI567 Machie Learig (Fall 2014) October 14, 2014 1 / 49 Outlie Admiistratio

More information

Boosting. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 1, / 32

Boosting. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 1, / 32 Boostig Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machie Learig Algorithms March 1, 2017 1 / 32 Outlie 1 Admiistratio 2 Review of last lecture 3 Boostig Professor Ameet Talwalkar CS260

More information

Linear Support Vector Machines

Linear Support Vector Machines Liear Support Vector Machies David S. Roseberg The Support Vector Machie For a liear support vector machie (SVM), we use the hypothesis space of affie fuctios F = { f(x) = w T x + b w R d, b R } ad evaluate

More information

Support vector machine revisited

Support vector machine revisited 6.867 Machie learig, lecture 8 (Jaakkola) 1 Lecture topics: Support vector machie ad kerels Kerel optimizatio, selectio Support vector machie revisited Our task here is to first tur the support vector

More information

10-701/ Machine Learning Mid-term Exam Solution

10-701/ Machine Learning Mid-term Exam Solution 0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it

More information

Chapter 7. Support Vector Machine

Chapter 7. Support Vector Machine Chapter 7 Support Vector Machie able of Cotet Margi ad support vectors SVM formulatio Slack variables ad hige loss SVM for multiple class SVM ith Kerels Relevace Vector Machie Support Vector Machie (SVM)

More information

Linear Classifiers III

Linear Classifiers III Uiversität Potsdam Istitut für Iformatik Lehrstuhl Maschielles Lere Liear Classifiers III Blaie Nelso, Tobias Scheffer Cotets Classificatio Problem Bayesia Classifier Decisio Liear Classifiers, MAP Models

More information

Support Vector Machines and Kernel Methods

Support Vector Machines and Kernel Methods Support Vector Machies ad Kerel Methods Daiel Khashabi Fall 202 Last Update: September 26, 206 Itroductio I Support Vector Machies the goal is to fid a separator betwee data which has the largest margi,

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 11

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 11 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract We will itroduce the otio of reproducig kerels ad associated Reproducig Kerel Hilbert Spaces (RKHS). We will cosider couple

More information

Introduction to Machine Learning DIS10

Introduction to Machine Learning DIS10 CS 189 Fall 017 Itroductio to Machie Learig DIS10 1 Fu with Lagrage Multipliers (a) Miimize the fuctio such that f (x,y) = x + y x + y = 3. Solutio: The Lagragia is: L(x,y,λ) = x + y + λ(x + y 3) Takig

More information

18.657: Mathematics of Machine Learning

18.657: Mathematics of Machine Learning 8.657: Mathematics of Machie Learig Lecturer: Philippe Rigollet Lecture 0 Scribe: Ade Forrow Oct. 3, 05 Recall the followig defiitios from last time: Defiitio: A fuctio K : X X R is called a positive symmetric

More information

Definitions and Theorems. where x are the decision variables. c, b, and a are constant coefficients.

Definitions and Theorems. where x are the decision variables. c, b, and a are constant coefficients. Defiitios ad Theorems Remember the scalar form of the liear programmig problem, Miimize, Subject to, f(x) = c i x i a 1i x i = b 1 a mi x i = b m x i 0 i = 1,2,, where x are the decisio variables. c, b,

More information

Optimization Methods MIT 2.098/6.255/ Final exam

Optimization Methods MIT 2.098/6.255/ Final exam Optimizatio Methods MIT 2.098/6.255/15.093 Fial exam Date Give: December 19th, 2006 P1. [30 pts] Classify the followig statemets as true or false. All aswers must be well-justified, either through a short

More information

Machine Learning for Data Science (CS 4786)

Machine Learning for Data Science (CS 4786) Machie Learig for Data Sciece CS 4786) Lecture & 3: Pricipal Compoet Aalysis The text i black outlies high level ideas. The text i blue provides simple mathematical details to derive or get to the algorithm

More information

6.867 Machine learning, lecture 7 (Jaakkola) 1

6.867 Machine learning, lecture 7 (Jaakkola) 1 6.867 Machie learig, lecture 7 (Jaakkola) 1 Lecture topics: Kerel form of liear regressio Kerels, examples, costructio, properties Liear regressio ad kerels Cosider a slightly simpler model where we omit

More information

1 Duality revisited. AM 221: Advanced Optimization Spring 2016

1 Duality revisited. AM 221: Advanced Optimization Spring 2016 AM 22: Advaced Optimizatio Sprig 206 Prof. Yaro Siger Sectio 7 Wedesday, Mar. 9th Duality revisited I this sectio, we will give a slightly differet perspective o duality. optimizatio program: f(x) x R

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio

More information

Linear Regression Demystified

Linear Regression Demystified Liear Regressio Demystified Liear regressio is a importat subject i statistics. I elemetary statistics courses, formulae related to liear regressio are ofte stated without derivatio. This ote iteds to

More information

6.867 Machine learning

6.867 Machine learning 6.867 Machie learig Mid-term exam October, ( poits) Your ame ad MIT ID: Problem We are iterested here i a particular -dimesioal liear regressio problem. The dataset correspodig to this problem has examples

More information

Machine Learning. Ilya Narsky, Caltech

Machine Learning. Ilya Narsky, Caltech Machie Learig Ilya Narsky, Caltech Lecture 4 Multi-class problems. Multi-class versios of Neural Networks, Decisio Trees, Support Vector Machies ad AdaBoost. Reductio of a multi-class problem to a set

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

b i u x i U a i j u x i u x j

b i u x i U a i j u x i u x j M ath 5 2 7 Fall 2 0 0 9 L ecture 1 9 N ov. 1 6, 2 0 0 9 ) S ecod- Order Elliptic Equatios: Weak S olutios 1. Defiitios. I this ad the followig two lectures we will study the boudary value problem Here

More information

Machine Learning for Data Science (CS 4786)

Machine Learning for Data Science (CS 4786) Machie Learig for Data Sciece CS 4786) Lecture 9: Pricipal Compoet Aalysis The text i black outlies mai ideas to retai from the lecture. The text i blue give a deeper uderstadig of how we derive or get

More information

w (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ.

w (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ. 2 5. Weighted umber of late jobs 5.1. Release dates ad due dates: maximimizig the weight of o-time jobs Oce we add release dates, miimizig the umber of late jobs becomes a sigificatly harder problem. For

More information

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach STAT 425: Itroductio to Noparametric Statistics Witer 28 Lecture 7: Desity Estimatio: k-nearest Neighbor ad Basis Approach Istructor: Ye-Chi Che Referece: Sectio 8.4 of All of Noparametric Statistics.

More information

Outline. Linear regression. Regularization functions. Polynomial curve fitting. Stochastic gradient descent for regression. MLE for regression

Outline. Linear regression. Regularization functions. Polynomial curve fitting. Stochastic gradient descent for regression. MLE for regression REGRESSION 1 Outlie Liear regressio Regularizatio fuctios Polyomial curve fittig Stochastic gradiet descet for regressio MLE for regressio Step-wise forward regressio Regressio methods Statistical techiques

More information

IP Reference guide for integer programming formulations.

IP Reference guide for integer programming formulations. IP Referece guide for iteger programmig formulatios. by James B. Orli for 15.053 ad 15.058 This documet is iteded as a compact (or relatively compact) guide to the formulatio of iteger programs. For more

More information

1 Review and Overview

1 Review and Overview CS9T/STATS3: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #6 Scribe: Jay Whag ad Patrick Cho October 0, 08 Review ad Overview Recall i the last lecture that for ay family of scalar fuctios F, we

More information

Lecture 4. Hw 1 and 2 will be reoped after class for every body. New deadline 4/20 Hw 3 and 4 online (Nima is lead)

Lecture 4. Hw 1 and 2 will be reoped after class for every body. New deadline 4/20 Hw 3 and 4 online (Nima is lead) Lecture 4 Homework Hw 1 ad 2 will be reoped after class for every body. New deadlie 4/20 Hw 3 ad 4 olie (Nima is lead) Pod-cast lecture o-lie Fial projects Nima will register groups ext week. Email/tell

More information

TEACHER CERTIFICATION STUDY GUIDE

TEACHER CERTIFICATION STUDY GUIDE COMPETENCY 1. ALGEBRA SKILL 1.1 1.1a. ALGEBRAIC STRUCTURES Kow why the real ad complex umbers are each a field, ad that particular rigs are ot fields (e.g., itegers, polyomial rigs, matrix rigs) Algebra

More information

1 The Primal and Dual of an Optimization Problem

1 The Primal and Dual of an Optimization Problem CS 189 Itroductio to Machie Learig Fall 2017 Note 18 Previously, i our ivestigatio of SVMs, we forulated a costraied optiizatio proble that we ca solve to fid the optial paraeters for our hyperplae decisio

More information

Support Vector Machines, Kernel SVM

Support Vector Machines, Kernel SVM Support Vector Machines, Kernel SVM Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 27, 2017 1 / 40 Outline 1 Administration 2 Review of last lecture 3 SVM

More information

The Method of Least Squares. To understand least squares fitting of data.

The Method of Least Squares. To understand least squares fitting of data. The Method of Least Squares KEY WORDS Curve fittig, least square GOAL To uderstad least squares fittig of data To uderstad the least squares solutio of icosistet systems of liear equatios 1 Motivatio Curve

More information

SVM for Statisticians

SVM for Statisticians SVM for Statisticias Youyi Fog Fred Hutchiso Cacer Research Istitute November 13, 2011 1 / 21 Primal Problem ad Pealized Loss Fuctio Miimize J over b, β ad ξ uder some costraits J = 1 2 β 2 + C ξ i (1)

More information

Machine Learning Theory (CS 6783)

Machine Learning Theory (CS 6783) Machie Learig Theory (CS 6783) Lecture 2 : Learig Frameworks, Examples Settig up learig problems. X : istace space or iput space Examples: Computer Visio: Raw M N image vectorized X = 0, 255 M N, SIFT

More information

Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece 1, 1, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet

More information

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece,, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet as

More information

Lecture 7: October 18, 2017

Lecture 7: October 18, 2017 Iformatio ad Codig Theory Autum 207 Lecturer: Madhur Tulsiai Lecture 7: October 8, 207 Biary hypothesis testig I this lecture, we apply the tools developed i the past few lectures to uderstad the problem

More information

Polynomials with Rational Roots that Differ by a Non-zero Constant. Generalities

Polynomials with Rational Roots that Differ by a Non-zero Constant. Generalities Polyomials with Ratioal Roots that Differ by a No-zero Costat Philip Gibbs The problem of fidig two polyomials P(x) ad Q(x) of a give degree i a sigle variable x that have all ratioal roots ad differ by

More information

Outline. CSCI-567: Machine Learning (Spring 2019) Outline. Prof. Victor Adamchik. Mar. 26, 2019

Outline. CSCI-567: Machine Learning (Spring 2019) Outline. Prof. Victor Adamchik. Mar. 26, 2019 Outlie CSCI-567: Machie Learig Sprig 209 Gaussia mixture models Prof. Victor Adamchik 2 Desity estimatio U of Souther Califoria Mar. 26, 209 3 Naive Bayes Revisited March 26, 209 / 57 March 26, 209 2 /

More information

Optimally Sparse SVMs

Optimally Sparse SVMs A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but

More information

Recurrence Relations

Recurrence Relations Recurrece Relatios Aalysis of recursive algorithms, such as: it factorial (it ) { if (==0) retur ; else retur ( * factorial(-)); } Let t be the umber of multiplicatios eeded to calculate factorial(). The

More information

MIDTERM 3 CALCULUS 2. Monday, December 3, :15 PM to 6:45 PM. Name PRACTICE EXAM SOLUTIONS

MIDTERM 3 CALCULUS 2. Monday, December 3, :15 PM to 6:45 PM. Name PRACTICE EXAM SOLUTIONS MIDTERM 3 CALCULUS MATH 300 FALL 08 Moday, December 3, 08 5:5 PM to 6:45 PM Name PRACTICE EXAM S Please aswer all of the questios, ad show your work. You must explai your aswers to get credit. You will

More information

Lesson 10: Limits and Continuity

Lesson 10: Limits and Continuity www.scimsacademy.com Lesso 10: Limits ad Cotiuity SCIMS Academy 1 Limit of a fuctio The cocept of limit of a fuctio is cetral to all other cocepts i calculus (like cotiuity, derivative, defiite itegrals

More information

ECE-S352 Introduction to Digital Signal Processing Lecture 3A Direct Solution of Difference Equations

ECE-S352 Introduction to Digital Signal Processing Lecture 3A Direct Solution of Difference Equations ECE-S352 Itroductio to Digital Sigal Processig Lecture 3A Direct Solutio of Differece Equatios Discrete Time Systems Described by Differece Equatios Uit impulse (sample) respose h() of a DT system allows

More information

Intelligent Systems I 08 SVM

Intelligent Systems I 08 SVM Itelliget Systems I 08 SVM Stefa Harmelig & Philipp Heig 12. December 2013 Max Plack Istitute for Itelliget Systems Dptmt. of Empirical Iferece 1 / 30 Your feeback Ejoye most Laplace approximatio gettig

More information

Introduction to Optimization Techniques. How to Solve Equations

Introduction to Optimization Techniques. How to Solve Equations Itroductio to Optimizatio Techiques How to Solve Equatios Iterative Methods of Optimizatio Iterative methods of optimizatio Solutio of the oliear equatios resultig form a optimizatio problem is usually

More information

Binary classification, Part 1

Binary classification, Part 1 Biary classificatio, Part 1 Maxim Ragisky September 25, 2014 The problem of biary classificatio ca be stated as follows. We have a radom couple Z = (X,Y ), where X R d is called the feature vector ad Y

More information

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality

More information

10/2/ , 5.9, Jacob Hays Amit Pillay James DeFelice

10/2/ , 5.9, Jacob Hays Amit Pillay James DeFelice 0//008 Liear Discrimiat Fuctios Jacob Hays Amit Pillay James DeFelice 5.8, 5.9, 5. Miimum Squared Error Previous methods oly worked o liear separable cases, by lookig at misclassified samples to correct

More information

Summary: CORRELATION & LINEAR REGRESSION. GC. Students are advised to refer to lecture notes for the GC operations to obtain scatter diagram.

Summary: CORRELATION & LINEAR REGRESSION. GC. Students are advised to refer to lecture notes for the GC operations to obtain scatter diagram. Key Cocepts: 1) Sketchig of scatter diagram The scatter diagram of bivariate (i.e. cotaiig two variables) data ca be easily obtaied usig GC. Studets are advised to refer to lecture otes for the GC operatios

More information

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015 ECE 8527: Itroductio to Machie Learig ad Patter Recogitio Midterm # 1 Vaishali Ami Fall, 2015 tue39624@temple.edu Problem No. 1: Cosider a two-class discrete distributio problem: ω 1 :{[0,0], [2,0], [2,2],

More information

Topics Machine learning: lecture 2. Review: the learning problem. Hypotheses and estimation. Estimation criterion cont d. Estimation criterion

Topics Machine learning: lecture 2. Review: the learning problem. Hypotheses and estimation. Estimation criterion cont d. Estimation criterion .87 Machie learig: lecture Tommi S. Jaakkola MIT CSAIL tommi@csail.mit.edu Topics The learig problem hypothesis class, estimatio algorithm loss ad estimatio criterio samplig, empirical ad epected losses

More information

Admin REGULARIZATION. Schedule. Midterm 9/29/16. Assignment 5. Midterm next week, due Friday (more on this in 1 min)

Admin REGULARIZATION. Schedule. Midterm 9/29/16. Assignment 5. Midterm next week, due Friday (more on this in 1 min) Admi Assigmet 5! Starter REGULARIZATION David Kauchak CS 158 Fall 2016 Schedule Midterm ext week, due Friday (more o this i 1 mi Assigmet 6 due Friday before fall break Midterm Dowload from course web

More information

Algorithms for Clustering

Algorithms for Clustering CR2: Statistical Learig & Applicatios Algorithms for Clusterig Lecturer: J. Salmo Scribe: A. Alcolei Settig: give a data set X R p where is the umber of observatio ad p is the umber of features, we wat

More information

The Growth of Functions. Theoretical Supplement

The Growth of Functions. Theoretical Supplement The Growth of Fuctios Theoretical Supplemet The Triagle Iequality The triagle iequality is a algebraic tool that is ofte useful i maipulatig absolute values of fuctios. The triagle iequality says that

More information

Differentiable Convex Functions

Differentiable Convex Functions Differetiable Covex Fuctios The followig picture motivates Theorem 11. f ( x) f ( x) f '( x)( x x) ˆx x 1 Theorem 11 : Let f : R R be differetiable. The, f is covex o the covex set C R if, ad oly if for

More information

Questions and answers, kernel part

Questions and answers, kernel part Questios ad aswers, kerel part October 8, 205 Questios. Questio : properties of kerels, PCA, represeter theorem. [2 poits] Let F be a RK defied o some domai X, with feature map φ(x) x X ad reproducig kerel

More information

Math 155 (Lecture 3)

Math 155 (Lecture 3) Math 55 (Lecture 3) September 8, I this lecture, we ll cosider the aswer to oe of the most basic coutig problems i combiatorics Questio How may ways are there to choose a -elemet subset of the set {,,,

More information

Lecture 2: Monte Carlo Simulation

Lecture 2: Monte Carlo Simulation STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?

More information

Empirical Process Theory and Oracle Inequalities

Empirical Process Theory and Oracle Inequalities Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi


The picture in figure 1.1 helps us to see that the area represents the distance traveled. Figure 1: Area represents distance travelled

1. Lecture: Area. Area and distance traveled; approximating area by rectangles; summation; the area under a parabola. 1.1 Area and distance. Suppose we have the following information about the velocity of a particle, how …
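A small sketch of the rectangle approximation described here (our illustration; the particle-velocity data from the notes is not reproduced):

```python
def riemann_sum(f, a, b, n):
    # Left-endpoint rectangles: sum of f(x_i) * dx over n subintervals of [a, b].
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

# Area under the parabola f(x) = x^2 on [0, 1]; the exact value is 1/3.
print(riemann_sum(lambda x: x * x, 0.0, 1.0, 1000))  # ~0.33283
```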


Intro to Learning Theory

Lecture 1, October 18, 2016. Intro to Learning Theory. Ruth Urner. 1 Machine Learning and Learning Theory. Coming soon. 2 Formal Framework. 2.1 Basic notions. In our formal model for machine learning, the instances to be classified …


Lecture 22: Review for Exam 2. 1 Basic Model Assumptions (without Gaussian Noise)

1. Basic Model Assumptions (without Gaussian Noise). We model one continuous response variable Y as a linear function of p numerical predictors, plus noise: Y = β_0 + β_1 X_1 + … + β_p X_p + ε …
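A compact numerical illustration of fitting this model by least squares (our sketch; the simulated data and coefficient values are assumptions, not from the excerpted notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5])
y = 1.0 + X @ beta_true + rng.normal(scale=0.1, size=n)  # beta_0 = 1, plus noise

A = np.column_stack([np.ones(n), X])  # prepend an intercept column of ones
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta_hat)                       # approx [1.0, 2.0, -1.0, 0.5]
```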


ECON 3150/4150, Spring term Lecture 3

Introduction; finding the best fit by regression; residuals and R-sq; regression and causality; summary and next step. ECON 3150/4150, Spring term 2014, Lecture 3. Ragnar Nymoen, University of Oslo. 21 January 2014 …


Properties and Tests of Zeros of Polynomial Functions

The Remainder and Factor Theorems: Synthetic division can be used to find the values of polynomials in a sometimes easier way than substitution. This is shown by …
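A short sketch of synthetic division (Horner's scheme), the technique the excerpt names; the example polynomial is ours:

```python
def synthetic_division(coeffs, c):
    # coeffs: highest degree first. Returns (quotient coeffs, remainder),
    # so that p(x) = (x - c) * q(x) + r, with r = p(c).
    out = [coeffs[0]]
    for a in coeffs[1:]:
        out.append(a + c * out[-1])
    return out[:-1], out[-1]

# p(x) = x^3 - 6x^2 + 11x - 6; dividing by (x - 3) leaves remainder 0,
# so x = 3 is a zero and p(x) = (x - 3)(x^2 - 3x + 2).
q, r = synthetic_division([1, -6, 11, -6], 3)
print(q, r)  # [1, -3, 2] 0
```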


(A sequence can also be thought of as the list of function values attained for a function f : ℕ → X, where f(n) = x_n for n ≥ 1.)

MATH 337: Sequences. Dr. Neal, WKU. Let X be a metric space with distance function d. We shall define the general concept of sequence and limit in a metric space, then apply the results in particular to some special …


Optimization Methods: Linear Programming Applications Assignment Problem 1. Module 4 Lecture Notes 3. Assignment Problem

Optimization Methods: Linear Programming Applications. Module 4, Lecture Notes 3: Assignment Problem. Introduction. In the previous lecture, we discussed one of the benchmark problems, called the transportation …
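For a concrete feel for the assignment problem, a tiny solver run (the cost matrix is our made-up example; SciPy's linear_sum_assignment implements a Hungarian-style algorithm):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[i, j] = cost of assigning worker i to job j (made-up numbers)
cost = np.array([[4, 1, 3],
                 [2, 0, 5],
                 [3, 2, 2]])

rows, cols = linear_sum_assignment(cost)
print(list(zip(rows, cols)))   # optimal worker -> job pairs
print(cost[rows, cols].sum())  # minimal total cost: 5
```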


1 Approximating Integrals using Taylor Polynomials

Seunghee Ye, Ma 8: Week 7, Nov. Week 7 Summary: This week, we will learn how we can approximate integrals using Taylor series and numerical methods. Topics: Approximating Integrals using Taylor Polynomials; Definitions …
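A worked instance of the idea (the integrand here is our illustration, not necessarily the one in the excerpted notes): expand the integrand in a Taylor series and integrate term by term.

```latex
\int_0^1 e^{x^2}\,dx
  = \int_0^1 \sum_{n=0}^{\infty} \frac{x^{2n}}{n!}\,dx
  = \sum_{n=0}^{\infty} \frac{1}{n!\,(2n+1)}
  = 1 + \frac{1}{3} + \frac{1}{10} + \frac{1}{42} + \cdots \approx 1.4627.
```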


Ratio Test computation: lim_{n→∞} [(4x²)^{n+1} · n·3ⁿ] / [(n+1)·3^{n+1} · (4x²)ⁿ] = 4x²/3

Exam Problems. Given the series Σ_{n=1}^∞ (4x²)ⁿ/(n·3ⁿ), find the values of x for which this power series converges; also state clearly what the radius of convergence is. We start by setting up the Ratio Test: …


Sequences. Notation. Convergence of a Sequence

A sequence is essentially just a list. Definition (Sequence of Real Numbers). A sequence of real numbers is a function Z ∩ (m, ∞) → R for some real number m. Don't let the description of the domain confuse you; it …


Information-based Feature Selection

Farzan Farnia, Abbas Kazerouni, Afshin Babveyh. Email: {farnia,abbask,afshinb}@stanford.edu. 1 Introduction. Feature selection is a topic of great interest in applications dealing with …


Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 3

Machine Learning Theory, Tübingen University, WS 2016/2017, Lecture 3. Ilya Tolstikhin. Abstract: In this lecture we will prove the VC-bound, which provides a high-probability excess risk bound for the ERM algorithm when …


18.01 Calculus Jason Starr Fall 2005

Lecture 18. October 5, 2005. Homework: Problem Set 5 Part I: (c). Practice Problems. Course Reader: 3G 1, 3G, 3G 4, 3G 5. 1. Approximating Riemann integrals. Often, there is no simpler expression for the antiderivative …


Ma 530 Infinite Series I

Ma 530 Infinite Series I. Please note that in addition to the material below, this lecture incorporated material from the Visual Calculus web site. The material on sequences is at Visual Sequences. (To use this li…


ALGEBRAIC COMBINATORICS, LECTURE 8. TUESDAY, 2/16/2016

1. Self-conjugate Partitions. Recall that, given a partition λ, we may …


Ma 530 Introduction to Power Series

Ma 530 Introduction to Power Series. Please note that there is material on power series at Visual Calculus. Some of this material was used as part of the presentation of the topics that follow. What is a Power …


Clustering. CM226: Machine Learning for Bioinformatics, Fall 2016. Sriram Sankararaman. Acknowledgments: Fei Sha, Ameet Talwalkar.

Clustering. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar. Clusterig CM226: Machie Learig for Bioiformatics. Fall 216 Sriram Sakararama Ackowledgmets: Fei Sha, Ameet Talwalkar Clusterig 1 / 42 Admiistratio HW 1 due o Moday. Email/post o CCLE if you have questios.


PROBLEM SET 5 SOLUTIONS. 126 = 3·37 + 15, 37 = 2·15 + 7, 15 = 2·7 + 1, 7 = 7·1.

Math 7, Spring 06. Notations. Given a real number x, we will define sequences (a_k), (x_k), (p_k), (q_k) as in lecture. 1. (a) (5 pts) Find the simple continued fraction representations of 126/37 …
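The Euclidean divisions in the heading translate directly into the continued fraction digits; a small sketch (ours) that reproduces them:

```python
def continued_fraction(p, q):
    # Repeated division: p = a*q + r gives the next continued-fraction digit a.
    digits = []
    while q:
        a, r = divmod(p, q)
        digits.append(a)
        p, q = q, r
    return digits

# 126 = 3*37 + 15, 37 = 2*15 + 7, 15 = 2*7 + 1, 7 = 7*1
print(continued_fraction(126, 37))  # [3, 2, 2, 7], i.e. 126/37 = 3 + 1/(2 + 1/(2 + 1/7))
```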


REGRESSION WITH QUADRATIC LOSS

MAXIM RAGINSKY. Regression with quadratic loss is another basic problem studied in statistical learning theory. We have a random couple Z = (X, Y), where, as before, X is an R^d …


Lecture Notes for Analysis Class

Topological Spaces. A topology for a set X is a collection T of subsets of X such that: (a) X and the empty set are in T; (b) unions of elements of T are in T; (c) finite intersections …


Homework Set #3 - Solutions

EE 15 - Applications of Convex Optimization in Signal Processing and Communications. Dr. Andre Tkacenko, JPL, Third Term 11-1. Homework Set #3 - Solutions. 1. (a) Note that x is closer to x_k than to x_l in the Euclidean norm …


Beurling Integers: Part 2

Isomorphisms. Devin Platt. July 11, 2015. 1 Prime Factorization Sequences. In the last article we introduced the Beurling generalized integers, which can be represented as a sequence of real numbers …


Rademacher Complexity

EECS 598: Statistical Learning Theory, Winter 2014, Topic 0: Rademacher Complexity. Lecturer: Clayton Scott. Scribes: Yan Deng, Kevin Moon. Disclaimer: These notes have not been subjected to the usual scrutiny reserved for …


Notes on iteration and Newton's method. Iteration

Notes on iteration and Newton s method. Iteration Notes o iteratio ad Newto s method Iteratio Iteratio meas doig somethig over ad over. I our cotet, a iteratio is a sequece of umbers, vectors, fuctios, etc. geerated by a iteratio rule of the type 1 f


Sequences and Series of Functions

Chapter 6: Sequences and Series of Functions. 6.1. Convergence of a Sequence of Functions. Pointwise Convergence. Definition 6.1. Let, for each n ∈ N, a function f_n : A → R be defined. If, for each x ∈ A, the sequence (f_n(x)) converges …


Linear Programming and the Simplex Method

Abstract: This article is an introduction to Linear Programming and to using the Simplex method for solving LP problems in primal form. What is Linear Programming? Linear Programming is …
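A quick primal-form LP solved with an off-the-shelf solver (the toy problem is ours; SciPy's linprog minimizes, so the objective is negated to maximize):

```python
from scipy.optimize import linprog

# Maximize 3x + 2y subject to x + y <= 4, x + 3y <= 6, x >= 0, y >= 0.
res = linprog(c=[-3, -2],
              A_ub=[[1, 1], [1, 3]],
              b_ub=[4, 6],
              bounds=[(0, None), (0, None)],
              method="highs")
print(res.x, -res.fun)  # optimum at (4, 0) with objective value 12
```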


Machine Learning Brett Bernstein

Week Lecture: Concept Check Exercises. Starred problems are optional. Statistical Learning Theory. Suppose A = Y = R and X is some other set. Furthermore, assume P_{X,Y} is a discrete …


Inverse Matrix. B = A⁻¹, meaning that matrix B is an inverse of matrix A.

Two square matrices A and B of dimensions n × n are called inverses to one another if the following holds: AB = BA = I. (1.1) The notion is dual, but we often write B = A⁻¹, meaning that matrix B is an inverse of matrix …
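A numerical check of the two defining identities (the matrix is our example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [5.0, 3.0]])
B = np.linalg.inv(A)  # here B = [[3, -1], [-5, 2]]

# Verify AB = BA = I.
print(np.allclose(A @ B, np.eye(2)), np.allclose(B @ A, np.eye(2)))  # True True
```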


1 Generating functions for balls in boxes

Math 566, Fall 05. Some notes on generating functions. Given a sequence a_0, a_1, a_2, …, a_n, …, a generating function is some way of representing the sequence as a function. There are many ways to do this, with the most common ways …


The z-Transform. 7.1 Introduction. 7.2 The z-Transform. Derivation of the z-transform: input x[n] = zⁿ to an LTI system h[n]; z = re^{jΩ}.

7.1 Introduction. Generalize the complex sinusoidal representation offered by the DTFT to a representation of complex exponential signals. Obtain more general characteristics for discrete-time LTI systems. 7.2 …


4. Linear Classification. Kai Yu

4. Linear Classification. Kai Yu. Linear Classifiers: the simplest classification model; helps to understand nonlinear models; arguably the most useful classification method! …


Chapter 7 Isoperimetric problem

Chapter 7: Isoperimetric problem. Recall that the isoperimetric problem (see the introduction for its connection with Dido's problem) is one of the most classical problems of shape optimization. It can be formulated …


Markov Decision Processes

Definitions; stationary policies; the value improvement algorithm, the policy improvement algorithm, and linear programming for discounted cost and average cost criteria.
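A minimal value-iteration sketch for a discounted MDP (entirely our illustration: the two-state transition probabilities, rewards, and discount factor are made up):

```python
import numpy as np

P = np.array([  # P[a, s, s'] = transition probability under action a
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.5, 0.5], [0.0, 1.0]],  # action 1
])
R = np.array([[1.0, 0.0],      # R[a, s] = expected immediate reward
              [2.0, -1.0]])
gamma = 0.9                    # discount factor

V = np.zeros(2)
for _ in range(500):
    Q = R + gamma * (P @ V)    # Q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] V[s']
    V_new = Q.max(axis=0)      # greedy Bellman backup over actions
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
print(V, Q.argmax(axis=0))     # optimal values and a greedy policy
```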


Enumerative & Asymptotic Combinatorics

C50 Enumerative & Asymptotic Combinatorics: Stirling and Lagrange, Spring 2003. This section of the notes contains proofs of Stirling's formula and the Lagrange Inversion Formula. Stirling's formula. Theorem 1 (Stirling's …


Fall 2013 MTH431/531 Real analysis Section Notes

Fall 2013, MTH431/531 Real analysis, Section 8.1-8.2 Notes. Yi Su, 2013.11.1. 1. Definition of uniform convergence. We look at a sequence of functions f_n(x) and study the convergence property. Notice we have two parameters …


Math Solutions to homework 6

Math 175 - Solutions to homework 6. Cédric De Groote. November 16, 2017. Problem 1 (8.11 in the book): Let K be a compact Hermitian operator on a Hilbert space H and let the kernel of K be {0}. Show that there …
