Boosting. Professor Ameet Talwalkar. CS260 Machine Learning Algorithms, March 1.


1 Boosting. Professor Ameet Talwalkar. CS260 Machine Learning Algorithms, March 1.

2 Outline: 1 Administration. 2 Review of last lecture. 3 Boosting.

3 Grade Policy and Final Exam. Final grades: HWs (30%), midterm (30%), and final exam (40%) of the final grade. The final grades will be curved so that the median grade is either a B or B+ (I have not yet decided). I may increase the weight of the final for students who do much better on the final than on the midterm (I don't have a strict policy in place, though). Final exam: cumulative, but with more emphasis on new material. On the last day of class (Wednesday, 3/15).

4 Outline: 1 Administration. 2 Review of last lecture. 3 Boosting.

5 Support vector machines (SVM). Primal formulation: min_{w,b,{ξ_n}} (1/2)‖w‖²_2 + C Σ_n ξ_n, s.t. y_n [w^T φ(x_n) + b] ≥ 1 − ξ_n and ξ_n ≥ 0, ∀n. Two equivalent interpretations.

6 Support vector machines (SVM). Primal formulation: min_{w,b,{ξ_n}} (1/2)‖w‖²_2 + C Σ_n ξ_n, s.t. y_n [w^T φ(x_n) + b] ≥ 1 − ξ_n and ξ_n ≥ 0, ∀n. Two equivalent interpretations: Geometric: maximizing the (soft) margin. Optimization: minimizing hinge loss with L2 regularization.
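The second interpretation can be made concrete in a few lines of code. Below is a minimal sketch (my own illustration, not from the slides) of full-batch subgradient descent on the regularized hinge-loss objective above, with an identity feature map; the function name, step size, and iteration count are illustrative choices.

```python
import numpy as np

def svm_primal_subgradient(X, y, C=1.0, lr=1e-3, iters=500):
    """Subgradient descent on (1/2)||w||^2 + C * sum_n max(0, 1 - y_n (w^T x_n + b)).
    X: (N, d) features, y: (N,) labels in {-1, +1}; phi(x) = x for simplicity."""
    N, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(iters):
        margins = y * (X @ w + b)
        viol = margins < 1                       # points inside the margin or misclassified
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```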

7 Hinge Loss. Upper bound for the 0/1 loss function (black line). Convex surrogate to the 0/1 loss.

8 Hinge Loss. Upper bound for the 0/1 loss function (black line). Convex surrogate to the 0/1 loss, though others exist as well. Hinge loss is less sensitive to outliers than exponential (or logistic) loss. Logistic loss has a natural probabilistic interpretation. We can optimize exponential loss efficiently in a greedy manner (AdaBoost).
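To make the comparison among these surrogates concrete, here is a small sketch (not part of the slides) that evaluates the 0/1 loss and the three surrogates as functions of the margin y f(x); the base-2 scaling of the logistic loss is one common convention that keeps it an upper bound on the 0/1 loss.

```python
import numpy as np

# Losses as functions of the margin z = y * f(x); all surrogates upper-bound the 0/1 loss.
z = np.linspace(-2.0, 2.0, 9)
zero_one    = (z < 0).astype(float)
hinge       = np.maximum(0.0, 1.0 - z)           # SVM
exponential = np.exp(-z)                         # AdaBoost
logistic    = np.log2(1.0 + np.exp(-z))          # logistic regression (base-2 scaling)
print(np.column_stack([z, zero_one, hinge, exponential, logistic]))
```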

9 Constrained Optimization. Primal formulation: min_{w,b,{ξ_n}} (1/2)‖w‖²_2 + C Σ_n ξ_n, s.t. y_n [w^T φ(x_n) + b] ≥ 1 − ξ_n and ξ_n ≥ 0, ∀n. When working with constrained optimization problems with inequality constraints, we can write down primal and dual problems.

10 Constrained Optimization. Primal formulation: min_{w,b,{ξ_n}} (1/2)‖w‖²_2 + C Σ_n ξ_n, s.t. y_n [w^T φ(x_n) + b] ≥ 1 − ξ_n and ξ_n ≥ 0, ∀n. When working with constrained optimization problems with inequality constraints, we can write down primal and dual problems. The dual solution is always a lower bound on the primal solution (weak duality). The duality gap equals 0 under certain conditions (strong duality), and in such cases we can solve either the primal or the dual problem. Strong duality holds for the SVM problem, and in particular the KKT conditions are necessary and sufficient for the optimal solution.

11 Dual formulation of SVM and Kernel SVMs. max_α Σ_n α_n − (1/2) Σ_{m,n} y_m y_n α_m α_n φ(x_m)^T φ(x_n), s.t. 0 ≤ α_n ≤ C, Σ_n α_n y_n = 0. The dual problem is also a convex quadratic program.

12 Dual formulation of SVM and Kernel SVMs. max_α Σ_n α_n − (1/2) Σ_{m,n} y_m y_n α_m α_n φ(x_m)^T φ(x_n), s.t. 0 ≤ α_n ≤ C, Σ_n α_n y_n = 0. The dual problem is also a convex quadratic program involving N dual variables α_n. Kernel SVM:

13 Dual formulation of SVM and Kernel SVMs. max_α Σ_n α_n − (1/2) Σ_{m,n} y_m y_n α_m α_n φ(x_m)^T φ(x_n), s.t. 0 ≤ α_n ≤ C, Σ_n α_n y_n = 0. The dual problem is also a convex quadratic program involving N dual variables α_n. Kernel SVM: Replace the inner products φ(x_m)^T φ(x_n) with a kernel function k(x_m, x_n) when solving the dual problem. Show that we can recover primal predictions at test time without relying explicitly on φ(·).

14 Recovering the primal solution using dual variables. Why do we care? Using solely a kernel function, we can solve the dual optimization problem and make predictions at test time! Prediction only depends on support vectors, i.e., points with α_n > 0!

15 Recovering the primal solution using dual variables. Why do we care? Using solely a kernel function, we can solve the dual optimization problem and make predictions at test time! Prediction only depends on support vectors, i.e., points with α_n > 0! Weights: w = Σ_n y_n α_n φ(x_n), a linear combination of the input features. Offset: b = [y_n − w^T φ(x_n)]

16 Recovering the primal solution using dual variables. Why do we care? Using solely a kernel function, we can solve the dual optimization problem and make predictions at test time! Prediction only depends on support vectors, i.e., points with α_n > 0! Weights: w = Σ_n y_n α_n φ(x_n), a linear combination of the input features. Offset: b = [y_n − w^T φ(x_n)] = [y_n − Σ_m y_m α_m k(x_m, x_n)], for any n with C > α_n > 0. Prediction on a test point x: h(x) = sign(w^T φ(x) + b) = sign(Σ_n y_n α_n k(x_n, x) + b).
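As a concrete illustration of these formulas, here is a minimal sketch (not from the slides) of kernel SVM prediction using only the dual variables; the RBF kernel and the numerical tolerance are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def svm_predict(x, X, y, alpha, C, kernel=rbf_kernel, tol=1e-8):
    """h(x) = sign(sum_n y_n alpha_n k(x_n, x) + b), with b recovered from any
    margin support vector (0 < alpha_n < C). Only support vectors contribute."""
    sv = np.where(alpha > tol)[0]                          # support vectors
    decision = sum(y[n] * alpha[n] * kernel(X[n], x) for n in sv)
    m = np.where((alpha > tol) & (alpha < C - tol))[0][0]  # one margin support vector
    b = y[m] - sum(y[n] * alpha[n] * kernel(X[n], X[m]) for n in sv)
    return np.sign(decision + b)
```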

17 Deriving the dual for SVM. Primal SVM: min_{w,b,{ξ_n}} (1/2)‖w‖²_2 + C Σ_n ξ_n, s.t. y_n [w^T φ(x_n) + b] ≥ 1 − ξ_n, ξ_n ≥ 0, ∀n.

18 Deriving the dual for SVM. Primal SVM: min_{w,b,{ξ_n}} (1/2)‖w‖²_2 + C Σ_n ξ_n, s.t. y_n [w^T φ(x_n) + b] ≥ 1 − ξ_n, ξ_n ≥ 0, ∀n. Lagrangian: L(w, b, {ξ_n}, {α_n}, {λ_n}) = C Σ_n ξ_n + (1/2)‖w‖²_2 − Σ_n λ_n ξ_n + Σ_n α_n {1 − y_n [w^T φ(x_n) + b] − ξ_n}, under the constraints that α_n ≥ 0 and λ_n ≥ 0.

19 Minimizing the Lagrangian. Taking derivatives with respect to the primal variables: ∂L/∂w = w − Σ_n y_n α_n φ(x_n) = 0; ∂L/∂b = −Σ_n α_n y_n = 0; ∂L/∂ξ_n = C − λ_n − α_n = 0.

20 Minimizing the Lagrangian. Taking derivatives with respect to the primal variables: ∂L/∂w = w − Σ_n y_n α_n φ(x_n) = 0; ∂L/∂b = −Σ_n α_n y_n = 0; ∂L/∂ξ_n = C − λ_n − α_n = 0. These equations link the primal variables and the dual variables and provide new constraints on the dual variables: w = Σ_n y_n α_n φ(x_n); Σ_n α_n y_n = 0; C − λ_n − α_n = 0.

21 Rearrange the Lagrangian and incorporate these constraints. Recall: L(·) = C Σ_n ξ_n + (1/2)‖w‖²_2 − Σ_n λ_n ξ_n + Σ_n α_n {1 − y_n [w^T φ(x_n) + b] − ξ_n}, where α_n ≥ 0 and λ_n ≥ 0. Constraints from the partial derivatives: Σ_n α_n y_n = 0 and C − λ_n − α_n = 0. g({α_n}, {λ_n}) = L(w, b, {ξ_n}, {α_n}, {λ_n})

22 Rearrange the Lagrangian and incorporate these constraints. Recall: L(·) = C Σ_n ξ_n + (1/2)‖w‖²_2 − Σ_n λ_n ξ_n + Σ_n α_n {1 − y_n [w^T φ(x_n) + b] − ξ_n}, where α_n ≥ 0 and λ_n ≥ 0. Constraints from the partial derivatives: Σ_n α_n y_n = 0 and C − λ_n − α_n = 0. g({α_n}, {λ_n}) = L(w, b, {ξ_n}, {α_n}, {λ_n}) = Σ_n (C − α_n − λ_n) ξ_n

23 Rearrange the Lagrangian and incorporate these constraints. Recall: L(·) = C Σ_n ξ_n + (1/2)‖w‖²_2 − Σ_n λ_n ξ_n + Σ_n α_n {1 − y_n [w^T φ(x_n) + b] − ξ_n}, where α_n ≥ 0 and λ_n ≥ 0. Constraints from the partial derivatives: Σ_n α_n y_n = 0 and C − λ_n − α_n = 0. g({α_n}, {λ_n}) = L(w, b, {ξ_n}, {α_n}, {λ_n}) = Σ_n (C − α_n − λ_n) ξ_n + (1/2)‖Σ_n y_n α_n φ(x_n)‖²_2

24 Rearrange the Lagrangian and incorporate these constraints. Recall: L(·) = C Σ_n ξ_n + (1/2)‖w‖²_2 − Σ_n λ_n ξ_n + Σ_n α_n {1 − y_n [w^T φ(x_n) + b] − ξ_n}, where α_n ≥ 0 and λ_n ≥ 0. Constraints from the partial derivatives: Σ_n α_n y_n = 0 and C − λ_n − α_n = 0. g({α_n}, {λ_n}) = L(w, b, {ξ_n}, {α_n}, {λ_n}) = Σ_n (C − α_n − λ_n) ξ_n + (1/2)‖Σ_n y_n α_n φ(x_n)‖²_2 + Σ_n α_n − (Σ_n α_n y_n) b

25 Rearrange the Lagrangian and incorporate these constraints. Recall: L(·) = C Σ_n ξ_n + (1/2)‖w‖²_2 − Σ_n λ_n ξ_n + Σ_n α_n {1 − y_n [w^T φ(x_n) + b] − ξ_n}, where α_n ≥ 0 and λ_n ≥ 0. Constraints from the partial derivatives: Σ_n α_n y_n = 0 and C − λ_n − α_n = 0. g({α_n}, {λ_n}) = L(w, b, {ξ_n}, {α_n}, {λ_n}) = Σ_n (C − α_n − λ_n) ξ_n + (1/2)‖Σ_n y_n α_n φ(x_n)‖²_2 + Σ_n α_n − (Σ_n α_n y_n) b − Σ_n α_n y_n (Σ_m y_m α_m φ(x_m))^T φ(x_n)

26 Rearrange the Lagrangian and incorporate these constraints. Recall: L(·) = C Σ_n ξ_n + (1/2)‖w‖²_2 − Σ_n λ_n ξ_n + Σ_n α_n {1 − y_n [w^T φ(x_n) + b] − ξ_n}, where α_n ≥ 0 and λ_n ≥ 0. Constraints from the partial derivatives: Σ_n α_n y_n = 0 and C − λ_n − α_n = 0. g({α_n}, {λ_n}) = L(w, b, {ξ_n}, {α_n}, {λ_n}) = Σ_n (C − α_n − λ_n) ξ_n + (1/2)‖Σ_n y_n α_n φ(x_n)‖²_2 + Σ_n α_n − (Σ_n α_n y_n) b − Σ_n α_n y_n (Σ_m y_m α_m φ(x_m))^T φ(x_n) = Σ_n α_n + (1/2)‖Σ_n y_n α_n φ(x_n)‖²_2 − Σ_{m,n} α_m α_n y_m y_n φ(x_m)^T φ(x_n)

27 Rearrange the Lagrangian and incorporate these constraints. Recall: L(·) = C Σ_n ξ_n + (1/2)‖w‖²_2 − Σ_n λ_n ξ_n + Σ_n α_n {1 − y_n [w^T φ(x_n) + b] − ξ_n}, where α_n ≥ 0 and λ_n ≥ 0. Constraints from the partial derivatives: Σ_n α_n y_n = 0 and C − λ_n − α_n = 0. g({α_n}, {λ_n}) = L(w, b, {ξ_n}, {α_n}, {λ_n}) = Σ_n (C − α_n − λ_n) ξ_n + (1/2)‖Σ_n y_n α_n φ(x_n)‖²_2 + Σ_n α_n − (Σ_n α_n y_n) b − Σ_n α_n y_n (Σ_m y_m α_m φ(x_m))^T φ(x_n) = Σ_n α_n + (1/2)‖Σ_n y_n α_n φ(x_n)‖²_2 − Σ_{m,n} α_m α_n y_m y_n φ(x_m)^T φ(x_n) = Σ_n α_n − (1/2) Σ_{m,n} α_m α_n y_m y_n φ(x_m)^T φ(x_n)

28 The dual problem. Maximizing the dual under the constraints: max_α g({α_n}, {λ_n}) = Σ_n α_n − (1/2) Σ_{m,n} y_m y_n α_m α_n k(x_m, x_n), s.t. α_n ≥ 0, Σ_n α_n y_n = 0, C − λ_n − α_n = 0, λ_n ≥ 0, ∀n.

29 The dual problem. Maximizing the dual under the constraints: max_α g({α_n}, {λ_n}) = Σ_n α_n − (1/2) Σ_{m,n} y_m y_n α_m α_n k(x_m, x_n), s.t. α_n ≥ 0, Σ_n α_n y_n = 0, C − λ_n − α_n = 0, λ_n ≥ 0, ∀n. We can simplify, as the objective function does not depend on λ_n: C − λ_n − α_n = 0, λ_n ≥ 0 ⇒ λ_n = C − α_n ≥ 0

30 The dual problem. Maximizing the dual under the constraints: max_α g({α_n}, {λ_n}) = Σ_n α_n − (1/2) Σ_{m,n} y_m y_n α_m α_n k(x_m, x_n), s.t. α_n ≥ 0, Σ_n α_n y_n = 0, C − λ_n − α_n = 0, λ_n ≥ 0, ∀n. We can simplify, as the objective function does not depend on λ_n: C − λ_n − α_n = 0, λ_n ≥ 0 ⇒ λ_n = C − α_n ≥ 0 ⇒ 0 ≤ α_n ≤ C

31 Simplified Dual. max_α Σ_n α_n − (1/2) Σ_{m,n} y_m y_n α_m α_n φ(x_m)^T φ(x_n), s.t. 0 ≤ α_n ≤ C, Σ_n α_n y_n = 0.
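For small problems, this simplified dual can be handed directly to a generic constrained solver. Below is a minimal sketch (my own illustration, not from the slides) using scipy's SLSQP on a precomputed kernel matrix; a real implementation would use a dedicated QP or SMO solver.

```python
import numpy as np
from scipy.optimize import minimize

def solve_svm_dual(K, y, C):
    """Solve max_a sum_n a_n - 1/2 sum_{m,n} a_m a_n y_m y_n K[m,n]
    subject to 0 <= a_n <= C and sum_n a_n y_n = 0."""
    N = len(y)
    Q = (y[:, None] * y[None, :]) * K
    obj = lambda a: 0.5 * a @ Q @ a - a.sum()          # minimize the negated dual
    grad = lambda a: Q @ a - np.ones(N)
    cons = [{"type": "eq", "fun": lambda a: a @ y, "jac": lambda a: y}]
    res = minimize(obj, np.zeros(N), jac=grad, bounds=[(0.0, C)] * N,
                   constraints=cons, method="SLSQP")
    return res.x                                       # the dual variables alpha_n
```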

32 Outline: 1 Administration. 2 Review of last lecture. 3 Boosting: AdaBoost; Derivation of AdaBoost.

33 Boosting. High-level idea: combine a lot of classifiers. Sequentially construct / identify these classifiers, h_t(·), one at a time. Use weak classifiers to arrive at a complex decision boundary (strong classifier), where β_t is the contribution of each weak classifier.

34 Boosting. High-level idea: combine a lot of classifiers. Sequentially construct / identify these classifiers, h_t(·), one at a time. Use weak classifiers to arrive at a complex decision boundary (strong classifier), where β_t is the contribution of each weak classifier: h[x] = sign[Σ_{t=1}^T β_t h_t(x)]

35 Boosting. High-level idea: combine a lot of classifiers. Sequentially construct / identify these classifiers, h_t(·), one at a time. Use weak classifiers to arrive at a complex decision boundary (strong classifier), where β_t is the contribution of each weak classifier: h[x] = sign[Σ_{t=1}^T β_t h_t(x)]. Our plan: Describe the AdaBoost algorithm. Derive the algorithm.

36 AdaBoost Algorithm. Given: N samples {x_n, y_n}, where y_n ∈ {+1, −1}, and some way of constructing weak (or base) classifiers.

37 AdaBoost Algorithm. Given: N samples {x_n, y_n}, where y_n ∈ {+1, −1}, and some way of constructing weak (or base) classifiers. Initialize weights w_1(n) = 1/N for every training sample.

38 AdaBoost Algorithm. Given: N samples {x_n, y_n}, where y_n ∈ {+1, −1}, and some way of constructing weak (or base) classifiers. Initialize weights w_1(n) = 1/N for every training sample. For t = 1 to T: 1. Train a weak classifier h_t(x) using the current weights w_t(n), by minimizing ε_t = Σ_n w_t(n) I[y_n ≠ h_t(x_n)]

39 AdaBoost Algorithm. Given: N samples {x_n, y_n}, where y_n ∈ {+1, −1}, and some way of constructing weak (or base) classifiers. Initialize weights w_1(n) = 1/N for every training sample. For t = 1 to T: 1. Train a weak classifier h_t(x) using the current weights w_t(n), by minimizing ε_t = Σ_n w_t(n) I[y_n ≠ h_t(x_n)] (the weighted classification error)

40 AdaBoost Algorithm. Given: N samples {x_n, y_n}, where y_n ∈ {+1, −1}, and some way of constructing weak (or base) classifiers. Initialize weights w_1(n) = 1/N for every training sample. For t = 1 to T: 1. Train a weak classifier h_t(x) using the current weights w_t(n), by minimizing ε_t = Σ_n w_t(n) I[y_n ≠ h_t(x_n)] (the weighted classification error) 2. Compute the contribution for this classifier

41 AdaBoost Algorithm. Given: N samples {x_n, y_n}, where y_n ∈ {+1, −1}, and some way of constructing weak (or base) classifiers. Initialize weights w_1(n) = 1/N for every training sample. For t = 1 to T: 1. Train a weak classifier h_t(x) using the current weights w_t(n), by minimizing ε_t = Σ_n w_t(n) I[y_n ≠ h_t(x_n)] (the weighted classification error) 2. Compute the contribution for this classifier: β_t = (1/2) log((1 − ε_t)/ε_t)

42 AdaBoost Algorithm. Given: N samples {x_n, y_n}, where y_n ∈ {+1, −1}, and some way of constructing weak (or base) classifiers. Initialize weights w_1(n) = 1/N for every training sample. For t = 1 to T: 1. Train a weak classifier h_t(x) using the current weights w_t(n), by minimizing ε_t = Σ_n w_t(n) I[y_n ≠ h_t(x_n)] (the weighted classification error) 2. Compute the contribution for this classifier: β_t = (1/2) log((1 − ε_t)/ε_t) 3. Update weights on the training points

43 AdaBoost Algorithm. Given: N samples {x_n, y_n}, where y_n ∈ {+1, −1}, and some way of constructing weak (or base) classifiers. Initialize weights w_1(n) = 1/N for every training sample. For t = 1 to T: 1. Train a weak classifier h_t(x) using the current weights w_t(n), by minimizing ε_t = Σ_n w_t(n) I[y_n ≠ h_t(x_n)] (the weighted classification error) 2. Compute the contribution for this classifier: β_t = (1/2) log((1 − ε_t)/ε_t) 3. Update weights on the training points: w_{t+1}(n) ∝ w_t(n) e^{−β_t y_n h_t(x_n)}

44 AdaBoost Algorithm. Given: N samples {x_n, y_n}, where y_n ∈ {+1, −1}, and some way of constructing weak (or base) classifiers. Initialize weights w_1(n) = 1/N for every training sample. For t = 1 to T: 1. Train a weak classifier h_t(x) using the current weights w_t(n), by minimizing ε_t = Σ_n w_t(n) I[y_n ≠ h_t(x_n)] (the weighted classification error) 2. Compute the contribution for this classifier: β_t = (1/2) log((1 − ε_t)/ε_t) 3. Update weights on the training points: w_{t+1}(n) ∝ w_t(n) e^{−β_t y_n h_t(x_n)} and normalize them such that Σ_n w_{t+1}(n) = 1

45 AdaBoost Algorithm. Given: N samples {x_n, y_n}, where y_n ∈ {+1, −1}, and some way of constructing weak (or base) classifiers. Initialize weights w_1(n) = 1/N for every training sample. For t = 1 to T: 1. Train a weak classifier h_t(x) using the current weights w_t(n), by minimizing ε_t = Σ_n w_t(n) I[y_n ≠ h_t(x_n)] (the weighted classification error) 2. Compute the contribution for this classifier: β_t = (1/2) log((1 − ε_t)/ε_t) 3. Update weights on the training points: w_{t+1}(n) ∝ w_t(n) e^{−β_t y_n h_t(x_n)} and normalize them such that Σ_n w_{t+1}(n) = 1. Output the final classifier: h[x] = sign[Σ_{t=1}^T β_t h_t(x)]
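The algorithm above translates almost line by line into code. Below is a minimal sketch (not from the slides); train_weak is an assumed helper that returns a classifier h with h(X) in {−1, +1} approximately minimizing the weighted error under the given weights.

```python
import numpy as np

def adaboost(X, y, train_weak, T):
    """AdaBoost as described above. X: (N, d) data, y: (N,) labels in {-1, +1}."""
    N = len(y)
    w = np.full(N, 1.0 / N)                      # initialize uniform weights w_1(n) = 1/N
    classifiers, betas = [], []
    for t in range(T):
        h = train_weak(X, y, w)                  # step 1: weak learner on weighted data
        pred = h(X)
        eps = np.sum(w * (pred != y))            # weighted classification error
        eps = np.clip(eps, 1e-12, 1 - 1e-12)     # guard against division by zero (sketch)
        beta = 0.5 * np.log((1.0 - eps) / eps)   # step 2: contribution beta_t
        w = w * np.exp(-beta * y * pred)         # step 3: reweight the training points...
        w /= w.sum()                             # ...and normalize to sum to 1
        classifiers.append(h)
        betas.append(beta)
    def final(Xnew):                             # output: sign of the weighted vote
        return np.sign(sum(b * h(Xnew) for b, h in zip(betas, classifiers)))
    return final
```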

46 Example. 10 data points and 2 features. [Figure: initial distribution D_1] The data points are clearly not linearly separable. In the beginning, all data points have equal weights (the size of the data markers "+" or "−").

47 Example. 10 data points and 2 features. [Figure: initial distribution D_1] The data points are clearly not linearly separable. In the beginning, all data points have equal weights (the size of the data markers "+" or "−"). Base classifier h(·): horizontal or vertical lines ("decision stumps"). Depth-1 decision trees, i.e., classify data based on a single attribute.

48 Round 1: t = 1. [Figure: weak classifier h_1 and the reweighted distribution D_2]

49 Round 1: t = 1. [Figure: weak classifier h_1 and the reweighted distribution D_2] 3 misclassified (with circles): ε_1 = 0.3, β_1 = 0.42. Weights recomputed; the 3 misclassified data points receive larger weights.

50 Round 2: t = 2. [Figure: weak classifier h_2 and the reweighted distribution D_3]

51 Round 2: t = 2. [Figure: weak classifier h_2 and the reweighted distribution D_3] 3 misclassified (with circles): ε_2 = 0.21, β_2 = 0.65. Note that ε_2 < 0.3, as those 3 data points have weights less than 1/10. The 3 misclassified data points get larger weights. Data points classified correctly in both rounds have small weights.

52 Round 3: t = 3. [Figure: weak classifier h_3] ε_3 = 0.14, β_3 = 0.92.

53 Round 3: t = 3. [Figure: weak classifier h_3] 3 misclassified (with circles): ε_3 = 0.14, β_3 = 0.92. Previously correctly classified data points are now misclassified; since they carry small weights, the weighted error is low. What's the intuition?

54 Round 3: t = 3. [Figure: weak classifier h_3] 3 misclassified (with circles): ε_3 = 0.14, β_3 = 0.92. Previously correctly classified data points are now misclassified; since they carry small weights, the weighted error is low. What's the intuition? Since they have been consistently classified correctly, this round's mistake will hopefully not have a huge impact on the overall prediction.

55 Final classifier: combining 3 classifiers. H_final(x) = sign[β_1 h_1(x) + β_2 h_2(x) + β_3 h_3(x)]. All data points are now classified correctly!

56 Why does AdaBoost work? It minimizes a loss function related to classification error. Classification loss: Suppose we want to have a classifier h(x) = sign[f(x)] = {1 if f(x) > 0; −1 if f(x) < 0}. One seemingly natural loss function is the 0-1 loss: ℓ(h(x), y) = {0 if y f(x) > 0; 1 if y f(x) < 0}. Namely, the function f(x) and the target label y should have the same sign to avoid a loss of 1.

57 Surrogate loss. The 0-1 loss function ℓ(h(x), y) is non-convex and difficult to optimize. We can instead use a surrogate loss; what are examples?

58 Surrogate loss. The 0-1 loss function ℓ(h(x), y) is non-convex and difficult to optimize. We can instead use a surrogate loss; what are examples? Exponential loss: ℓ_exp(h(x), y) = e^{−y f(x)}. [Figure: exponential loss ℓ(h(x), y) as a function of y f(x)]

59 Choosing the t-th classifier. Suppose we have a classifier f_{t−1}(x) and want to add a weak learner h_t(x): f(x) = f_{t−1}(x) + β_t h_t(x). Note: h_t(·) outputs 1 or −1, as does sign[f_{t−1}(·)].

60 Choosing the t-th classifier. Suppose we have a classifier f_{t−1}(x) and want to add a weak learner h_t(x): f(x) = f_{t−1}(x) + β_t h_t(x). Note: h_t(·) outputs 1 or −1, as does sign[f_{t−1}(·)]. How can we optimally choose h_t(x) and the combination coefficient β_t? AdaBoost greedily minimizes the exponential loss function! (h_t*(x), β_t*) = arg min_{(h_t(x), β_t)} Σ_n e^{−y_n f(x_n)}

61 Choosing the t-th classifier. Suppose we have a classifier f_{t−1}(x) and want to add a weak learner h_t(x): f(x) = f_{t−1}(x) + β_t h_t(x). Note: h_t(·) outputs 1 or −1, as does sign[f_{t−1}(·)]. How can we optimally choose h_t(x) and the combination coefficient β_t? AdaBoost greedily minimizes the exponential loss function! (h_t*(x), β_t*) = arg min_{(h_t(x), β_t)} Σ_n e^{−y_n f(x_n)} = arg min_{(h_t(x), β_t)} Σ_n e^{−y_n [f_{t−1}(x_n) + β_t h_t(x_n)]}

62 Choosing the t-th classifier. Suppose we have a classifier f_{t−1}(x) and want to add a weak learner h_t(x): f(x) = f_{t−1}(x) + β_t h_t(x). Note: h_t(·) outputs 1 or −1, as does sign[f_{t−1}(·)]. How can we optimally choose h_t(x) and the combination coefficient β_t? AdaBoost greedily minimizes the exponential loss function! (h_t*(x), β_t*) = arg min_{(h_t(x), β_t)} Σ_n e^{−y_n f(x_n)} = arg min_{(h_t(x), β_t)} Σ_n e^{−y_n [f_{t−1}(x_n) + β_t h_t(x_n)]} = arg min_{(h_t(x), β_t)} Σ_n w_t(n) e^{−y_n β_t h_t(x_n)}, where we have used w_t(n) as a shorthand for e^{−y_n f_{t−1}(x_n)}.

63 The new classifier. We can decompose the weighted loss function into two parts: Σ_n w_t(n) e^{−y_n β_t h_t(x_n)} = Σ_n w_t(n) e^{β_t} I[y_n ≠ h_t(x_n)] + Σ_n w_t(n) e^{−β_t} I[y_n = h_t(x_n)]

64 The new classifier. We can decompose the weighted loss function into two parts: Σ_n w_t(n) e^{−y_n β_t h_t(x_n)} = Σ_n w_t(n) e^{β_t} I[y_n ≠ h_t(x_n)] + Σ_n w_t(n) e^{−β_t} I[y_n = h_t(x_n)] = Σ_n w_t(n) e^{β_t} I[y_n ≠ h_t(x_n)] + Σ_n w_t(n) e^{−β_t} (1 − I[y_n ≠ h_t(x_n)])

65 The new classifier. We can decompose the weighted loss function into two parts: Σ_n w_t(n) e^{−y_n β_t h_t(x_n)} = Σ_n w_t(n) e^{β_t} I[y_n ≠ h_t(x_n)] + Σ_n w_t(n) e^{−β_t} I[y_n = h_t(x_n)] = Σ_n w_t(n) e^{β_t} I[y_n ≠ h_t(x_n)] + Σ_n w_t(n) e^{−β_t} (1 − I[y_n ≠ h_t(x_n)]) = (e^{β_t} − e^{−β_t}) Σ_n w_t(n) I[y_n ≠ h_t(x_n)] + e^{−β_t} Σ_n w_t(n). We have used the following properties to derive the above: y_n h_t(x_n) is either 1 or −1, as h_t(x_n) is the output of a binary classifier; the indicator function I[y_n = h_t(x_n)] is either 0 or 1, so it equals 1 − I[y_n ≠ h_t(x_n)].

66 Finding the optimal weak learner. Summary: (h_t*(x), β_t*) = arg min_{(h_t(x), β_t)} Σ_n w_t(n) e^{−y_n β_t h_t(x_n)} = arg min_{(h_t(x), β_t)} (e^{β_t} − e^{−β_t}) Σ_n w_t(n) I[y_n ≠ h_t(x_n)] + e^{−β_t} Σ_n w_t(n). What term(s) must we optimize to choose h_t(x)?

67 Finding the optimal weak learner. Summary: (h_t*(x), β_t*) = arg min_{(h_t(x), β_t)} Σ_n w_t(n) e^{−y_n β_t h_t(x_n)} = arg min_{(h_t(x), β_t)} (e^{β_t} − e^{−β_t}) Σ_n w_t(n) I[y_n ≠ h_t(x_n)] + e^{−β_t} Σ_n w_t(n). What term(s) must we optimize to choose h_t(x)? h_t*(x) = arg min_{h_t(x)} ε_t = Σ_n w_t(n) I[y_n ≠ h_t(x_n)]

68 Finding the optimal weak learner. Summary: (h_t*(x), β_t*) = arg min_{(h_t(x), β_t)} Σ_n w_t(n) e^{−y_n β_t h_t(x_n)} = arg min_{(h_t(x), β_t)} (e^{β_t} − e^{−β_t}) Σ_n w_t(n) I[y_n ≠ h_t(x_n)] + e^{−β_t} Σ_n w_t(n). What term(s) must we optimize to choose h_t(x)? h_t*(x) = arg min_{h_t(x)} ε_t = Σ_n w_t(n) I[y_n ≠ h_t(x_n)]. Minimize the weighted classification error, as noted in step 1 of AdaBoost!

69 How to choose β_t? Summary: (h_t*(x), β_t*) = arg min_{(h_t(x), β_t)} Σ_n w_t(n) e^{−y_n β_t h_t(x_n)} = arg min_{(h_t(x), β_t)} (e^{β_t} − e^{−β_t}) Σ_n w_t(n) I[y_n ≠ h_t(x_n)] + e^{−β_t} Σ_n w_t(n). What term(s) must we optimize?

70 How to choose β_t? Summary: (h_t*(x), β_t*) = arg min_{(h_t(x), β_t)} Σ_n w_t(n) e^{−y_n β_t h_t(x_n)} = arg min_{(h_t(x), β_t)} (e^{β_t} − e^{−β_t}) Σ_n w_t(n) I[y_n ≠ h_t(x_n)] + e^{−β_t} Σ_n w_t(n). What term(s) must we optimize? We need to minimize the entire objective function with respect to β_t!

71 How to choose β_t? Summary: (h_t*(x), β_t*) = arg min_{(h_t(x), β_t)} Σ_n w_t(n) e^{−y_n β_t h_t(x_n)} = arg min_{(h_t(x), β_t)} (e^{β_t} − e^{−β_t}) Σ_n w_t(n) I[y_n ≠ h_t(x_n)] + e^{−β_t} Σ_n w_t(n). What term(s) must we optimize? We need to minimize the entire objective function with respect to β_t! We can do this by taking the derivative with respect to β_t, setting it to zero, and solving for β_t. After some calculation, and using Σ_n w_t(n) = 1, we find: β_t = (1/2) log((1 − ε_t)/ε_t), which is precisely step 2 of AdaBoost! (Exercise: verify the solution.)
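Carrying out the suggested exercise (my own worked steps, not from the slides), with ε_t = Σ_n w_t(n) I[y_n ≠ h_t(x_n)] and Σ_n w_t(n) = 1, the objective becomes a function of β_t alone:

```latex
\begin{aligned}
J(\beta_t) &= \left(e^{\beta_t} - e^{-\beta_t}\right)\epsilon_t + e^{-\beta_t},\\
\frac{dJ}{d\beta_t} &= \left(e^{\beta_t} + e^{-\beta_t}\right)\epsilon_t - e^{-\beta_t} = 0
\;\Longrightarrow\; \left(e^{2\beta_t} + 1\right)\epsilon_t = 1
\;\Longrightarrow\; e^{2\beta_t} = \frac{1-\epsilon_t}{\epsilon_t},\\
\beta_t &= \frac{1}{2}\log\frac{1-\epsilon_t}{\epsilon_t}.
\end{aligned}
```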

72 Updating the weights. Once we find the optimal weak learner, we can update our classifier: f(x) = f_{t−1}(x) + β_t* h_t*(x)

73 Updating the weights. Once we find the optimal weak learner, we can update our classifier: f(x) = f_{t−1}(x) + β_t* h_t*(x). We then need to compute the weights for the above classifier as: w_{t+1}(n) = e^{−y_n f(x_n)} = e^{−y_n [f_{t−1}(x_n) + β_t* h_t*(x_n)]}

74 Updating the weights. Once we find the optimal weak learner, we can update our classifier: f(x) = f_{t−1}(x) + β_t* h_t*(x). We then need to compute the weights for the above classifier as: w_{t+1}(n) = e^{−y_n f(x_n)} = e^{−y_n [f_{t−1}(x_n) + β_t* h_t*(x_n)]} = w_t(n) e^{−y_n β_t* h_t*(x_n)}

75 Updating the weights. Once we find the optimal weak learner, we can update our classifier: f(x) = f_{t−1}(x) + β_t* h_t*(x). We then need to compute the weights for the above classifier as: w_{t+1}(n) = e^{−y_n f(x_n)} = e^{−y_n [f_{t−1}(x_n) + β_t* h_t*(x_n)]} = w_t(n) e^{−y_n β_t* h_t*(x_n)} = {w_t(n) e^{β_t*} if y_n ≠ h_t*(x_n); w_t(n) e^{−β_t*} if y_n = h_t*(x_n)}. Intuition: misclassified data points will get their weights increased, while correctly classified data points will get their weights decreased.

76 Meta-Algorithm. Note that the AdaBoost algorithm itself never specifies how we would get h_t(x), as long as it minimizes the weighted classification error ε_t = Σ_n w_t(n) I[y_n ≠ h_t(x_n)]. In this respect, the AdaBoost algorithm is a meta-algorithm and can be used with any type of classifier.

77 E.g., Decision Stumps. How do we choose the decision stump classifier given the weights at the second round of the following distribution? [Figure: weak classifier h_1 and the reweighted distribution D_2]

78 E.g., Decision Stumps. How do we choose the decision stump classifier given the weights at the second round of the following distribution? [Figure: weak classifier h_1 and the reweighted distribution D_2] We can simply enumerate all possible ways of putting vertical and horizontal lines to separate the data points into two classes and find the one with the smallest weighted classification error! Runtime?

79 E.g., Decision Stumps. How do we choose the decision stump classifier given the weights at the second round of the following distribution? [Figure: weak classifier h_1 and the reweighted distribution D_2] We can simply enumerate all possible ways of putting vertical and horizontal lines to separate the data points into two classes and find the one with the smallest weighted classification error! Runtime? Presort the data by each feature in O(dN log N) time. Evaluate N + 1 thresholds for each feature at each round in O(dN) time. In total, O(dN log N + dNT) time; this efficiency is an attractive quality of boosting!
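Here is a minimal sketch (not from the slides) of a weighted decision stump that can serve as train_weak in the AdaBoost sketch above. For clarity it re-scores every threshold from scratch, which is O(dN^2) per round; the O(dN)-per-round version in the runtime analysis instead sweeps the presorted points once while maintaining running weighted sums.

```python
import numpy as np

def train_stump(X, y, w):
    """Pick the feature j, threshold theta, and sign s minimizing the weighted error
    of the stump h(x) = s * sign(x_j - theta). X: (N, d), y in {-1, +1}, w sums to 1."""
    N, d = X.shape
    best_err, best = np.inf, None
    for j in range(d):
        xs = np.sort(X[:, j])                                    # presort this feature
        thresholds = np.concatenate(([xs[0] - 1.0],
                                     (xs[:-1] + xs[1:]) / 2.0,
                                     [xs[-1] + 1.0]))            # N + 1 candidate cuts
        for theta in thresholds:
            base = np.where(X[:, j] > theta, 1, -1)
            for s in (1, -1):
                err = np.sum(w * (s * base != y))                # weighted error
                if err < best_err:
                    best_err, best = err, (j, theta, s)
    j, theta, s = best
    return lambda Xnew: s * np.where(Xnew[:, j] > theta, 1, -1)
```

With the earlier AdaBoost sketch, it can be invoked as strong = adaboost(X, y, train_stump, T=3).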

80 Interpreting boosting as learning a nonlinear basis. Two-stage process: Get sign[h_1(x)], sign[h_2(x)], ... Combine into a linear classification model: y = sign{Σ_t β_t sign[h_t(x)]} = sign{β^T φ(x)}

81 Interpreting boosting as learning a nonlinear basis. Two-stage process: Get sign[h_1(x)], sign[h_2(x)], ... Combine into a linear classification model: y = sign{Σ_t β_t sign[h_t(x)]} = sign{β^T φ(x)}. In other words, each stage learns a nonlinear basis φ_t(x) = sign[h_t(x)]. This is an alternative way to introduce non-linearity, aside from kernel methods. We could also try to learn the basis functions and the classifier at the same time, as we'll talk about with neural networks next class.
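As a small illustration of the two-stage view (my own sketch, not from the slides), the weak learners define a feature map, and a linear model can then be applied on top of it; classifiers and betas are assumed to come from a trained booster such as the sketches above.

```python
import numpy as np

def boosted_basis(X, classifiers):
    """phi(x) = (sign[h_1(x)], ..., sign[h_T(x)]): one +/-1 feature per weak learner."""
    return np.column_stack([np.sign(h(X)) for h in classifiers])

# Using beta_t as the linear weights recovers the boosted classifier exactly;
# alternatively the linear stage could be re-fit, e.g. by least squares on phi(X).
def boosted_predict(X, classifiers, betas):
    return np.sign(boosted_basis(X, classifiers) @ np.asarray(betas))
```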
