
1 ADVANCED MACHINE LEARNING Non-linear regression techniques

2 ADVANCED MACHINE LEARNING Regression: Principle
Map an N-dim. input x to a continuous output y. Learn a function of the type: $f: \mathbb{R}^N \rightarrow \mathbb{R}$ with $y = f(x)$.
[Figure: true function vs. estimate through the training points $(x^1, y^1), \dots, (x^4, y^4)$]
Estimate the f that best predicts the set of training points $\{(x^i, y^i)\}_{i=1,\dots,M}$?

3 ADVANCED MACHINE LEARNING Regression: Issues
Map an N-dim. input x to a continuous output y. Learn a function of the type: $f: \mathbb{R}^N \rightarrow \mathbb{R}$ with $y = f(x)$.
The fit is strongly influenced by the choice of:
- datapoints for training
- complexity of the model (interpolation)
[Figure: true function vs. estimate through the training points]
Estimate the f that best predicts the set of training points $\{(x^i, y^i)\}_{i=1,\dots,M}$?

4 ADVANCED MACHINE LEARNING Regression Algorithms in this Course
Support Vector Machine → support vector regression
Relevance Vector Machine → relevance vector regression
Boosting → boosting with random projections, boosting with random Gaussians, gradient boosting
Random forest
Gaussian process regression
Locally weighted projected regression

5 ADVANCED MACHINE LEARNING Today, we will see:
Support Vector Machine → support vector regression
Relevance Vector Machine → relevance vector regression
Boosting with random projections

6 ADVANCED MACHINE LEARNING Support Vector Regression

7 ADVANCED MACHINE LEARNING Support Vector Regression
Assume a nonlinear mapping f, s.t. $y = f(x)$.
How to estimate f to best predict the pairs of training points $(x^i, y^i)$, $i = 1, \dots, M$?
How to generalize the support vector machine framework for classification to estimate continuous functions?
1. Assume a non-linear mapping through feature space and then perform linear regression in feature space.
2. Supervised learning minimizes an error function. First determine a way to measure the error on the testing set in the linear case!

8 ADVANCED MACHINE LEARNING Support Vector Regression
Assume a linear mapping f, s.t. $y = f(x) = \langle w, x \rangle + b$.
How to estimate w and b to best predict the pairs of training points $(x^i, y^i)$, $i = 1, \dots, M$?
[Figure: measure the error on the prediction $y = f(x) = w^T x + b$]

9 ADVANCED MACHINE LEARNING Support Vector Regression
Set an upper bound $\varepsilon$ on the error and consider as correctly classified all points such that $|f(x) - y| \leq \varepsilon$.
Penalize only the datapoints that are not contained in the $\varepsilon$-tube.
[Figure: $\varepsilon$-tube around $y = f(x) = w^T x + b$]
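
To make the $\varepsilon$-tube concrete, here is a minimal sketch of the $\varepsilon$-insensitive loss it implies, written in Python with NumPy; the function name and the eps value are illustrative assumptions, not part of the slides.

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """epsilon-insensitive loss: zero for points inside the eps-tube, linear outside."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - eps)

# points with |f(x) - y| <= eps contribute no loss; only points outside the tube are penalized
print(eps_insensitive_loss(np.array([1.0, 1.0, 1.0]), np.array([1.05, 1.1, 1.5])))
# expected: [0., 0., 0.4]
```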

10 ADVANCED MACHINE LEARNING Support Vector Regression
The $\varepsilon$-margin is a measure of the width of the $\varepsilon$-insensitive tube. It is a measure of the precision of the regression.
A small $\|w\|$ corresponds to a small slope for f. In the linear case, f is more horizontal.
[Figure: $\varepsilon$-margin]

11 ADVANCED MACHINE LEARNING Support Vector Regression
A large $\|w\|$ corresponds to a large slope for f. In the linear case, f is more vertical.
The flatter the slope of the function f, the larger the margin. To maximize the margin, we must minimize the norm of w.
[Figure: $\varepsilon$-margin]

12 ADVANCED MACHINE LEARNING Support Vector Regression
This can be rephrased as a constraint-based optimization problem of the form:
minimize $\frac{1}{2}\|w\|^2$
subject to $\langle w, x^i \rangle + b - y^i \leq \varepsilon$ and $y^i - \langle w, x^i \rangle - b \leq \varepsilon$, $i = 1, \dots, M$
Consider as correctly classified all points such that $|f(x) - y| \leq \varepsilon$. Need to penalize the points outside the $\varepsilon$-insensitive tube.

13 ADVANCED MACHINE LEARNING Support Vector Regression
Introduce slack variables $\xi_i, \xi_i^*$ and $C > 0$:
minimize $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{M} (\xi_i + \xi_i^*)$
subject to $\langle w, x^i \rangle + b - y^i \leq \varepsilon + \xi_i$, $y^i - \langle w, x^i \rangle - b \leq \varepsilon + \xi_i^*$, $\xi_i \geq 0$, $\xi_i^* \geq 0$
Need to penalize the points outside the $\varepsilon$-insensitive tube.

14 ADVANCED MACHINE LEARNING Support Vector Regression
Introduce slack variables $\xi_i, \xi_i^*$ and $C > 0$:
minimize $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{M} (\xi_i + \xi_i^*)$
subject to $\langle w, x^i \rangle + b - y^i \leq \varepsilon + \xi_i$, $y^i - \langle w, x^i \rangle - b \leq \varepsilon + \xi_i^*$, $\xi_i \geq 0$, $\xi_i^* \geq 0$
All points outside the $\varepsilon$-tube become support vectors.
We now have the solution to the linear regression problem. How to generalize this to the nonlinear case?
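
As a quick check of the claim above, the following sketch (assuming scikit-learn as one concrete $\varepsilon$-SVR implementation; the toy data are made up) fits a linear $\varepsilon$-SVR and verifies that every point lying strictly outside the $\varepsilon$-tube is among the support vectors.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 30).reshape(-1, 1)
y = 0.8 * X.ravel() + 0.2 + 0.1 * rng.standard_normal(30)

eps = 0.1
svr = SVR(kernel="linear", C=1.0, epsilon=eps).fit(X, y)   # C and epsilon play the roles of C and eps above

n_outside = np.sum(np.abs(y - svr.predict(X)) > eps + 1e-9)   # points strictly outside the eps-tube
print(n_outside, svr.support_.size)   # every such point is a support vector (SVs may also sit on the tube boundary)
```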

15 ADVANCED MACHINE LEARNING Support Vector Regression
We can solve this quadratic problem by introducing sets of Lagrange multipliers $\alpha_i, \alpha_i^*$ and writing the Lagrangian:
Lagrangian = Objective function + multipliers × constraints
$L(w, b, \xi, \xi^*) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{M} \xi_i + C \sum_{i=1}^{M} \xi_i^* - \sum_{i=1}^{M} \alpha_i \left( \varepsilon + \xi_i - y^i + \langle w, x^i \rangle + b \right) - \sum_{i=1}^{M} \alpha_i^* \left( \varepsilon + \xi_i^* + y^i - \langle w, x^i \rangle - b \right)$

16 ADVANCED MACHINE LEARNING Support Vector Regression
$\alpha_i, \alpha_i^* = 0$ for all points that satisfy the constraints (points inside the $\varepsilon$-tube).
$\alpha_i \alpha_i^* = 0$: constraints on points lying on either side of the $\varepsilon$-tube.
$L(w, b, \xi, \xi^*) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{M} \xi_i + C \sum_{i=1}^{M} \xi_i^* - \sum_{i=1}^{M} \alpha_i \left( \varepsilon + \xi_i - y^i + \langle w, x^i \rangle + b \right) - \sum_{i=1}^{M} \alpha_i^* \left( \varepsilon + \xi_i^* + y^i - \langle w, x^i \rangle - b \right)$

17 ADVANCED MACHINE LEARNING Support Vector Regression
Requiring that the partial derivatives are all zero:
$\frac{\partial L}{\partial b} = \sum_{i=1}^{M} (\alpha_i^* - \alpha_i) = 0$ → rebalancing the effect of the support vectors on both sides of the $\varepsilon$-tube
$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{M} (\alpha_i - \alpha_i^*) x^i = 0 \;\Rightarrow\; w = \sum_{i=1}^{M} (\alpha_i - \alpha_i^*) x^i$ → linear combination of support vectors
The solution is given by: $y = f(x) = \langle w, x \rangle + b = \sum_{i=1}^{M} (\alpha_i - \alpha_i^*) \langle x^i, x \rangle + b$
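
The stationarity condition $w = \sum_i (\alpha_i - \alpha_i^*) x^i$ can be checked numerically; the sketch below (scikit-learn assumed, data invented) rebuilds w from the dual coefficients and the support vectors and compares it with the fitted weight vector.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(4)
X = rng.uniform(0, 5, size=(40, 2))
y = X @ np.array([1.5, -0.7]) + 0.3 + 0.05 * rng.standard_normal(40)

svr = SVR(kernel="linear", C=10.0, epsilon=0.05).fit(X, y)

# dual_coef_ stores (alpha_i - alpha_i^*) for the support vectors only
w_from_svs = svr.dual_coef_.ravel() @ svr.support_vectors_
print(w_from_svs)          # w reconstructed as a linear combination of support vectors
print(svr.coef_.ravel())   # the two vectors should agree
```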

18 ADVANCED MACHINE LEARNING Support Vector Regression
Lift x into feature space and then perform linear regression in feature space.
Linear case: $y = f(x) = \langle w, x \rangle + b$
Non-linear case: $x \rightarrow \phi(x)$, $y = f(x) = \langle w, \phi(x) \rangle + b$
w lives in feature space!

19 ADVANCED MACHINE LEARNING Support Vector Regression
In feature space, we obtain the same constrained optimization problem:
minimize $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{M} (\xi_i + \xi_i^*)$
subject to $\langle w, \phi(x^i) \rangle + b - y^i \leq \varepsilon + \xi_i$, $y^i - \langle w, \phi(x^i) \rangle - b \leq \varepsilon + \xi_i^*$, $\xi_i, \xi_i^* \geq 0$

20 ADVANCED MACHINE LEARNING Support Vector Regression
We can solve this quadratic problem by introducing sets of Lagrange multipliers $\alpha_i, \alpha_i^*$ and writing the Lagrangian:
Lagrangian = Objective function + multipliers × constraints
$L(w, b, \xi, \xi^*) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{M} \xi_i + C \sum_{i=1}^{M} \xi_i^* - \sum_{i=1}^{M} \alpha_i \left( \varepsilon + \xi_i - y^i + \langle w, \phi(x^i) \rangle + b \right) - \sum_{i=1}^{M} \alpha_i^* \left( \varepsilon + \xi_i^* + y^i - \langle w, \phi(x^i) \rangle - b \right)$

21 ADVANCED MACHINE LEARNING Support Vector Regression
Replacing in the primal Lagrangian, we get the dual optimization problem:
$\max_{\alpha, \alpha^*} \; -\frac{1}{2} \sum_{i,j=1}^{M} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \, k(x^i, x^j) - \varepsilon \sum_{i=1}^{M} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{M} y^i (\alpha_i - \alpha_i^*)$
subject to $\sum_{i=1}^{M} (\alpha_i - \alpha_i^*) = 0$ and $\alpha_i, \alpha_i^* \in [0, C]$
Kernel trick: $k(x^i, x^j) = \langle \phi(x^i), \phi(x^j) \rangle$

22 ADVANCED MACHINE LEARNING Support Vector Regression
The solution is given by: $y = f(x) = \sum_{i=1}^{M} (\alpha_i - \alpha_i^*) \, k(x^i, x) + b$
Linear coefficients (Lagrange multipliers for each constraint). If one uses the RBF kernel, these are un-normalized isotropic Gaussians centered on each training datapoint.
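
The sketch below (scikit-learn assumed; gamma, C, epsilon and the sine data are illustrative) evaluates $f(x) = \sum_i (\alpha_i - \alpha_i^*) k(x^i, x) + b$ by hand from the fitted dual coefficients, support vectors and intercept, and checks that it matches the library's own prediction, making the "weighted sum of Gaussians centered on the SVs" structure explicit.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, 40)).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(40)

gamma = 10.0   # RBF kernel k(x^i, x) = exp(-gamma * ||x - x^i||^2)
svr = SVR(kernel="rbf", C=100.0, epsilon=0.1, gamma=gamma).fit(X, y)

def f_manual(x):
    # one un-normalized Gaussian per support vector, weighted by (alpha_i - alpha_i^*)
    d2 = ((svr.support_vectors_ - x) ** 2).sum(axis=1)
    return (svr.dual_coef_ @ np.exp(-gamma * d2) + svr.intercept_)[0]

x_test = np.array([2.5])
print(f_manual(x_test), svr.predict(x_test.reshape(1, -1))[0])   # the two values should match
```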

23 ADVANCED MACHINE LEARNING Support Vector Regression
The solution is given by: $y = f(x) = \sum_{i=1}^{M} (\alpha_i - \alpha_i^*) \, k(x^i, x) + b$
[Figure: the kernel places a Gaussian function on each SV]

24 ADVANCED MACHINE LEARNING Support Vector Regression
The solution is given by: $y = f(x) = \sum_{i=1}^{M} (\alpha_i - \alpha_i^*) \, k(x^i, x) + b$
The Lagrange multipliers define the importance of each Gaussian function. f(x) converges to b where the effect of the SVs vanishes.
[Figure: y = f(x) built from Gaussians centered on the support vectors x1, ..., x6, converging to b in-between]

25 ADVANCED MACHINE LEARNING Support Vector Regression: Exercise I
SVR gives the following estimate for each pair of datapoints $(x^j, y^j)$:
$y^j = \sum_{i=1}^{M} \alpha_i \, k(x^i, x^j) + b, \quad j = 1, \dots, M$
For the set of 3 points drawn below,
a) compute an estimate of b using $b = \frac{1}{M} \sum_{j=1}^{M} \left( y^j - \sum_{i=1}^{M} \alpha_i \, k(x^i, x^j) \right)$ with the RBF kernel, and plot b.
b) Plot the regressive curve and show how it varies depending on the kernel width and $\varepsilon$.

26 ADVANCED MACHINE LEARNING Support Vector Regression: Exercise I Solution
a) To answer this question, you must first determine the number of SVs. This depends on epsilon. If we assume a small epsilon, all 3 points become SVs.
b is then influenced by the value of the kernel width. With a very small kernel width, $k(x^i, x^j) \approx 0$ for $x^i \neq x^j$, and then $b \approx \frac{1}{M} \sum_{j=1}^{M} y^j$ is the mean of the data.
As the kernel width grows, b is pulled toward the SVs, modulated by the kernel: $b = \frac{1}{M} \sum_{j=1}^{M} \left( y^j - \sum_{i=1}^{M} \alpha_i \, k(x^i, x^j) \right)$
With only 2 SVs, the influence of the SVs cancels out and we are back to the mean of the data.
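
A small numeric sketch of part (a): the alpha values below are made-up placeholders (in practice they come from solving the dual), and the three points do not reproduce the slide's figure; the point is only to show how the kernel width moves the estimate of b.

```python
import numpy as np

def rbf(xi, xj, width):
    return np.exp(-((xi - xj) ** 2) / (2.0 * width ** 2))

# hypothetical 3-point dataset and Lagrange multipliers (illustrative only)
x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, 1.5, 1.0])
alpha = np.array([0.3, -0.2, 0.4])   # stands in for (alpha_i - alpha_i^*)

for width in (1e-3, 0.5):
    K = rbf(x[:, None], x[None, :], width)     # K[i, j] = k(x^i, x^j)
    b = np.mean(y - K.T @ alpha)               # b = (1/M) sum_j ( y^j - sum_i alpha_i k(x^i, x^j) )
    print(width, b, y.mean())
# for a tiny width K is close to the identity, so b ~ mean(y - alpha);
# as the width grows, each SV pulls b further away through the kernel terms
```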

27 ADVANCED MACHINE LEARNING Support Vector Regression: Exercise I Solution
[Plots: kernel width = 0.001, epsilon = 0.1; kernel width = 0.1, epsilon = 0.1; kernel width = 0.001, epsilon = 0.24]
With a small kernel width, the effect of each SV is well separated and the curve comes back to b in-between two SVs.
With a large kernel width, b changes and the curve yields a smooth interpolation in-between SVs.
With a large epsilon, one point is absorbed in the epsilon tube and is no longer an SV.

28 ADVANCED MACHINE LEARNING Support Vector Regression: Exercise II
Recall the solution to SVR: $y = f(x) = \sum_{i=1}^{M} (\alpha_i - \alpha_i^*) \, k(x^i, x) + b$
a) What type of function f can you model with the homogeneous polynomial kernel?
b) What minimum order of a homogeneous polynomial kernel do you need to achieve a good regression on the set of 3 points below?

29 ADVANCED MACHINE LEARNING Support Vector Regression: Solution Exercise II
The equation for a homogeneous polynomial kernel in the 1-D case below is:
$y = \sum_{i=1}^{M} (\alpha_i - \alpha_i^*) (x^i x)^p + b$ : a single term, a scaled & shifted polynomial.
For the set of points below, we need at minimum p = 2 and 2 SVs.
See also the supplementary exercises posted on the class's website!
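
As a small illustration of parts (a) and (b): with a homogeneous polynomial kernel of degree p in 1-D, f(x) is a single scaled and shifted power of x. The sketch below (scikit-learn assumed; coef0 = 0 makes its polynomial kernel homogeneous, and the three points are invented to lie near a pure quadratic, not the slide's figure) contrasts p = 1 and p = 2.

```python
import numpy as np
from sklearn.svm import SVR

# three points close to y = x^2 + 0.1 (illustrative only)
X = np.array([[-1.0], [0.5], [2.0]])
y = np.array([1.1, 0.35, 4.1])

# coef0=0 gives the homogeneous polynomial kernel k(x, x') = (gamma * <x, x'>)**degree
for degree in (1, 2):
    svr = SVR(kernel="poly", degree=degree, gamma=1.0, coef0=0.0, C=100.0, epsilon=0.01).fit(X, y)
    print(degree, svr.predict(X), y)   # the degree-2 fit can follow the curvature; the degree-1 fit cannot
```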

30 ADVANCED MACHINE LEARNING SVR: Hyperparameters
The solution to SVR we just saw is referred to as $\varepsilon$-SVR. Two hyperparameters:
minimize $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{M} (\xi_i + \xi_i^*)$
subject to $\langle w, x^i \rangle + b - y^i \leq \varepsilon + \xi_i$, $y^i - \langle w, x^i \rangle - b \leq \varepsilon + \xi_i^*$, $\xi_i, \xi_i^* \geq 0$
C controls the penalty term on a poor fit. $\varepsilon$ determines the minimal required precision.

31 ADVANCED MACHINE LEARNING SVR: Effect of Hyperparameters
Effect of the RBF kernel width on the fit. Here fit using C = 100, $\varepsilon$ = 0.1, kernel width = 0.01.

32 ADVANCED MACHINE LEARNING SVR: Effect of Hyperparameters
Effect of the RBF kernel width on the fit. Here fit using C = 100, $\varepsilon$ = 0.01, kernel width = 0.01. Overfitting.

33 ADVANCED MACHINE LEARNING SVR: Effect of Hyperparameters
Effect of the RBF kernel width on the fit. Here fit using C = 100, $\varepsilon$ = 0.05, kernel width = 0.01.
Reduction of the effect of the kernel width on the fit by choosing appropriate hyperparameters.
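
Rather than tuning C, $\varepsilon$ and the kernel width by eye as in the plots above, one would usually cross-validate them; a minimal sketch (scikit-learn assumed, grid values and data purely illustrative):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.uniform(0, 6, 60).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(60)

# grid over the three epsilon-SVR hyperparameters discussed above (gamma plays the role of the kernel width)
grid = {"C": [1, 100], "epsilon": [0.01, 0.05, 0.1], "gamma": [0.1, 1.0, 100.0]}
search = GridSearchCV(SVR(kernel="rbf"), grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```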

34 ADVANCED MACHINE LEARNING SVR: Effect of Hyperparameters
Note: MLDemos does not display the support vectors if there is more than one point for the same x!

35 ADVANCED MACHINE LEARNING SVR: Hyperparameters
The solution to SVR we just saw is referred to as $\varepsilon$-SVR. Two hyperparameters:
minimize $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{M} (\xi_i + \xi_i^*)$
subject to $\langle w, x^i \rangle + b - y^i \leq \varepsilon + \xi_i$, $y^i - \langle w, x^i \rangle - b \leq \varepsilon + \xi_i^*$, $\xi_i, \xi_i^* \geq 0$
C controls the penalty term on a poor fit. $\varepsilon$ determines the minimal required precision.

36 ADVANCED MACHINE LEARNING Extensions of SVR
As in the classification case, the optimization framework used for support vector regression is extended with:
- $\nu$-SVR: yields a sparser version of SVR and relaxes the constraint of choosing $\varepsilon$, the width of the insensitive tube.
- Relevance Vector Regression: the regression version of RVM, which also provides a sparser version of SVR and offers a probabilistic interpretation of the solution. (see Tipping 2011, supplementary material to the class)

37 ADVANCED MACHINE LEARNING Support Vector Regression: $\nu$-SVR
As the number of data points grows, so does the number of support vectors. $\nu$-SVR puts a lower bound on the fraction of support vectors (see the analogous case for $\nu$-SVM), $\nu \in (0, 1]$.

38 ADVANCED MACHINE LEARNING Support Vector Regression: $\nu$-SVR
As for $\nu$-SVM, one can rewrite the problem as a convex optimization problem:
$\min_{w, \xi, \varepsilon} \; \frac{1}{2}\|w\|^2 + C \left( \nu \varepsilon + \frac{1}{M} \sum_{j=1}^{M} (\xi_j + \xi_j^*) \right)$
under the constraints:
$(w^T x^j + b) - y^j \leq \varepsilon + \xi_j$
$y^j - (w^T x^j + b) \leq \varepsilon + \xi_j^*$
$\xi_j \geq 0, \; \xi_j^* \geq 0, \; \varepsilon \geq 0$
The margin error is given by all the data points for which $\xi_j > 0$. $\nu$ is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors.
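
A short sketch of the role of $\nu$ (scikit-learn's NuSVR assumed as one implementation; data and values illustrative): as $\nu$ grows, the fraction of support vectors is forced up, while $\varepsilon$ is adapted automatically.

```python
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(2)
X = rng.uniform(0, 6, 100).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(100)

for nu in (0.05, 0.2, 0.5):
    model = NuSVR(kernel="rbf", C=10.0, nu=nu, gamma=1.0).fit(X, y)
    frac_sv = model.support_.size / len(X)
    print(nu, frac_sv)   # nu is (approximately) a lower bound on the fraction of support vectors
```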

39 ADVANCED MACHINE LEARNING $\nu$-SVR: Example
Effect of the automatic adaptation of $\varepsilon$ using $\nu$-SVR.

40 ADVANCED MACHINE LEARNING $\nu$-SVR: Example
Added noise on the data. Effect of the automatic adaptation of $\varepsilon$ using $\nu$-SVR.

41 ADVANCED MACHINE LEARNING Relevance Vector Regression (RVR)
Same principle as that described for RVM (see slides on SVM and extensions). The derivation of the parameters however differs (see Tipping 2011 for details).
To recall, we start from the solution of SVR: $y = f(x) = \sum_{i=1}^{M} \alpha_i \, k(x^i, x) + b$
Rewrite the solution of SVR as a linear combination over basis functions: $z(x) = [k(x^1, x), \dots, k(x^M, x), 1]^T$
In the (binary) classification case, $y \in [0; 1]$. In the regression case, $y \in \mathbb{R}$.
A sparse solution has a majority of entries with $\alpha_i$ equal to zero.
[Example: a sparse coefficient vector with most entries equal to zero]
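
The full RVR derivation is in Tipping's paper; purely as an illustration of the sparsity mechanism, here is a minimal sparse-Bayesian regression sketch in the RVM spirit (my own simplified re-estimation loop, not the course's or Tipping's reference implementation). Phi would be the design matrix built from z(x), one column per basis function; entries whose precision alpha_m blows up correspond to weights pruned to (near) zero.

```python
import numpy as np

def rvr_fit(Phi, y, n_iter=200, tol=1e-6):
    """Simplified RVM-style regression: y ~ N(Phi w, 1/beta), one precision alpha_m per weight."""
    N, M = Phi.shape
    alpha = np.ones(M)
    beta = 1.0 / (np.var(y) + 1e-12)
    for _ in range(n_iter):
        Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)   # posterior covariance of w
        mu = beta * Sigma @ Phi.T @ y                                # posterior mean of w
        gamma = 1.0 - alpha * np.diag(Sigma)                         # "well-determinedness" of each weight
        alpha_new = np.minimum(gamma / (mu ** 2 + 1e-12), 1e10)      # re-estimate precisions (clipped)
        beta = (N - gamma.sum()) / (np.sum((y - Phi @ mu) ** 2) + 1e-12)
        if np.max(np.abs(alpha_new - alpha)) < tol:
            alpha = alpha_new
            break
        alpha = alpha_new
    return mu, alpha   # large alpha_m  ->  weight m effectively pruned (sparse solution)
```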

42 ADVANCED MACHINE LEARNING Comparison $\varepsilon$-SVR, $\nu$-SVR, RVR
Solution with $\varepsilon$-SVR: RBF kernel, C = 3000, $\varepsilon$ = 0.08, s = 0.05, 37 support vectors.

43 ADVANCED MACHINE LEARNING Comparison $\varepsilon$-SVR, $\nu$-SVR, RVR
Solution with $\nu$-SVR: RBF kernel, C = 3000, $\nu$ = 0.04, s = 0.001, 17 support vectors.

44 ADVANCED MACHINE LEARNING Comparison $\varepsilon$-SVR, $\nu$-SVR, RVR
Solution with RVR: RBF kernel, $\varepsilon$ = 0.08, s = 0.05, 7 support vectors.

45 ADVANCED MACHINE LEARNING SVR: Examples of Applications

46 ADVANCED MACHINE LEARNING Catching Objects in Flight
Extremely fast computation (the object flies in half a second); re-estimation of the arm motion to adapt to noisy visual detection of the object.

47 ADVANCED MACHINE LEARNING Catching Objects in Flight
Learn a model of the translational and rotational motion of the object. Not at the center of mass.

48 ADVANCED MACHINE LEARNING Catching Objects in Flight
Gather demonstrations of the free-flying object. Precision (1 cm, 1 degree); computation 0.17-0.32 seconds ahead of time.
Build a model of the dynamics using Support Vector Regression: $\dot{x} = f(x) = \sum_{i=1}^{M} \alpha_i \, k(x^i, x) + b$
Compute the derivative (closed form). Use the model in an Extended Kalman Filter for real-time tracking.
Kim, S. and Billard, A. (2012) Estimating the non-linear dynamics of free-flying objects. Robotics and Autonomous Systems, Volume 60, Issue 9, pp. 1108-1122.
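
The "compute the derivative (closed form)" step has a simple expression for an RBF-kernel SVR model: since $f(x) = \sum_i (\alpha_i - \alpha_i^*) \exp(-\gamma \|x - x^i\|^2) + b$, its gradient is $\sum_i (\alpha_i - \alpha_i^*)(-2\gamma)(x - x^i)\exp(-\gamma \|x - x^i\|^2)$. Below is a minimal sketch of that gradient (scikit-learn assumed, toy 2-D data invented; this is not the paper's code), the kind of Jacobian one could feed to an Extended Kalman Filter.

```python
import numpy as np
from sklearn.svm import SVR

def svr_rbf_gradient(svr, gamma, x):
    """Closed-form gradient of an RBF-kernel SVR model at point x."""
    diff = x - svr.support_vectors_                 # shape (n_SV, d)
    k = np.exp(-gamma * (diff ** 2).sum(axis=1))    # k(x^i, x) for each support vector
    # d f / d x = sum_i (alpha_i - alpha_i^*) * (-2 * gamma) * (x - x^i) * k(x^i, x)
    return (svr.dual_coef_.ravel() * k) @ (-2.0 * gamma * diff)

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(50, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

gamma = 2.0
svr = SVR(kernel="rbf", C=100.0, epsilon=0.01, gamma=gamma).fit(X, y)
print(svr_rbf_gradient(svr, gamma, np.array([0.2, -0.3])))
```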

49 ADVANCED MACHINE LEARNING
[Plots: SVR with RBF kernel and SVR with polynomial kernel; axes: kernel width vs. C]
Systematic assessment of the sensitivity to the choice of hyperparameters and the choice of kernel.

50 ADVANCED MACHINE LEARNING
Kim, Shukla, Billard, IEEE Transactions on Robotics, 2014. 2015 IEEE Transactions on Robotics King-Sun Fu Memorial Best Paper Award.

51 ADVANCED MACHINE LEARNING
Kim, Shukla, Billard, IEEE Transactions on Robotics, 2014. 2015 IEEE Transactions on Robotics King-Sun Fu Memorial Best Paper Award.

52 ADVANCED MACHINE LEARNING Learning a Multi-Attractor System
Shukla and Billard, NIPS 2012 - http://asvm.epfl.ch

53 ADVANCED MACHINE LEARNING Crossing over

54 ADVANCED MACHINE LEARNING Learning a Multi-Attractor System
Build a partition with a Support Vector Machine (SVM).

55 ADVANCED MACHINE LEARNING Learning a Multi-Attractor System
Build a partition with a Support Vector Machine (SVM). The attractors are not located at the right place.

56 ADVANCED MACHINE LEARNING Learning a Multi-Attractor System
Extend the SVM optimization framework with new constraints:
- Maximize the classification margin
- All points correctly classified
- Follow the dynamics
- Stability at the attractor
Shukla and Billard, NIPS 2012 - http://asvm.epfl.ch

57 ADVANCED MACHINE LEARNING Learning a Multi-Attractor System
[Figure: decomposition of f(x) into the standard SVM $\alpha$-SVs, the new $\beta$-SVs, a non-linear bias and a constant bias]

58 ADVANCED MACHINE LEARNING Several possible grasping points

59 ADVANCED MACHINE LEARNING

60 ADVANCED MACHINE LEARNING Multiple Attractors System

61 ADVANCED MACHINE LEARNING