
1 ADVANCED MACHINE LEARNING Non-linear regression techniques

2 ADVANCED MACHINE LEARNING Regression: Principle
Map an N-dim. input x to a continuous output y. Learn a function of the type: $f: \mathbb{R}^N \rightarrow \mathbb{R}$ with $y = f(x)$.
[Figure: true function vs. estimate through the training points $(x^1, y^1), \dots, (x^4, y^4)$]
Estimate the f that best predicts the set of training points $\{(x^i, y^i)\}_{i=1,\dots,M}$?

3 ADVANCED MACHINE LEARNING Regression: Issues
Map an N-dim. input x to a continuous output y. Learn a function of the type: $f: \mathbb{R}^N \rightarrow \mathbb{R}$ with $y = f(x)$.
The fit is strongly influenced by the choice of:
- datapoints for training
- complexity of the model (interpolation)
[Figure: true function vs. estimate through the training points]
Estimate the f that best predicts the set of training points $\{(x^i, y^i)\}_{i=1,\dots,M}$?

4 ADVANCED MACHINE LEARNING Regression Algorithms in this Course
Support Vector Machine → support vector regression
Relevance Vector Machine → relevance vector regression
Boosting → boosting with random projections, boosting with random Gaussians, gradient boosting
Random forest
Gaussian process regression
Locally weighted projected regression

5 ADVANCED MACHINE LEARNING Today, we will see:
Support Vector Machine → support vector regression
Relevance Vector Machine → relevance vector regression
Boosting with random projections

6 ADVANCED MACHINE LEARNING Support Vector Regression

7 ADVANCED MACHINE LEARNING Support Vector Regression
Assume a nonlinear mapping f, s.t. $y = f(x)$.
How to estimate f to best predict the pairs of training points $(x^i, y^i)$, $i = 1, \dots, M$?
How to generalize the support vector machine framework for classification to estimate continuous functions?
1. Assume a non-linear mapping through feature space and then perform linear regression in feature space.
2. Supervised learning minimizes an error function. First determine a way to measure the error on the testing set in the linear case!

8 ADVANCED MACHINE LEARNING Support Vector Regression
Assume a linear mapping f, s.t. $y = f(x) = \langle w, x \rangle + b$.
How to estimate w and b to best predict the pairs of training points $(x^i, y^i)$, $i = 1, \dots, M$?
[Figure: measure the error on the prediction $y = f(x) = w^T x + b$]

9 ADVANCED MACHINE LEARNING Support Vector Regression
Set an upper bound $\varepsilon$ on the error and consider as correctly classified all points such that $|f(x) - y| \leq \varepsilon$.
Penalize only the datapoints that are not contained in the $\varepsilon$-tube.
[Figure: $\varepsilon$-tube around $y = f(x) = w^T x + b$]
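
To make the $\varepsilon$-tube concrete, here is a minimal sketch of the $\varepsilon$-insensitive loss it implies, written in Python with NumPy; the function name and the eps value are illustrative assumptions, not part of the slides.

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """epsilon-insensitive loss: zero for points inside the eps-tube, linear outside."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - eps)

# points with |f(x) - y| <= eps contribute no loss; only points outside the tube are penalized
print(eps_insensitive_loss(np.array([1.0, 1.0, 1.0]), np.array([1.05, 1.1, 1.5])))
# expected: [0., 0., 0.4]
```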

10 ADVANCED MACHINE LEARNING Support Vector Regression
The $\varepsilon$-margin is a measure of the width of the $\varepsilon$-insensitive tube. It is a measure of the precision of the regression.
A small $\|w\|$ corresponds to a small slope for f. In the linear case, f is more horizontal.
[Figure: $\varepsilon$-margin]

11 ADVANCED MACHINE LEARNING Support Vector Regression
A large $\|w\|$ corresponds to a large slope for f. In the linear case, f is more vertical.
The flatter the slope of the function f, the larger the margin. To maximize the margin, we must minimize the norm of w.
[Figure: $\varepsilon$-margin]

12 ADVANCED MACHINE LEARNING Support Vector Regression
This can be rephrased as a constraint-based optimization problem of the form:
minimize $\frac{1}{2}\|w\|^2$
subject to $\langle w, x^i \rangle + b - y^i \leq \varepsilon$ and $y^i - \langle w, x^i \rangle - b \leq \varepsilon$, $i = 1, \dots, M$
Consider as correctly classified all points such that $|f(x) - y| \leq \varepsilon$. Need to penalize the points outside the $\varepsilon$-insensitive tube.

13 ADVANCED MACHINE LEARNING Support Vector Regression
Introduce slack variables $\xi_i, \xi_i^*$ and $C > 0$:
minimize $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{M} (\xi_i + \xi_i^*)$
subject to $\langle w, x^i \rangle + b - y^i \leq \varepsilon + \xi_i$, $y^i - \langle w, x^i \rangle - b \leq \varepsilon + \xi_i^*$, $\xi_i \geq 0$, $\xi_i^* \geq 0$
Need to penalize the points outside the $\varepsilon$-insensitive tube.

14 ADVANCED MACHINE LEARNING Support Vector Regression
Introduce slack variables $\xi_i, \xi_i^*$ and $C > 0$:
minimize $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{M} (\xi_i + \xi_i^*)$
subject to $\langle w, x^i \rangle + b - y^i \leq \varepsilon + \xi_i$, $y^i - \langle w, x^i \rangle - b \leq \varepsilon + \xi_i^*$, $\xi_i \geq 0$, $\xi_i^* \geq 0$
All points outside the $\varepsilon$-tube become support vectors.
We now have the solution to the linear regression problem. How to generalize this to the nonlinear case?
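
As a quick check of the claim above, the following sketch (assuming scikit-learn as one concrete $\varepsilon$-SVR implementation; the toy data are made up) fits a linear $\varepsilon$-SVR and verifies that every point lying strictly outside the $\varepsilon$-tube is among the support vectors.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 30).reshape(-1, 1)
y = 0.8 * X.ravel() + 0.2 + 0.1 * rng.standard_normal(30)

eps = 0.1
svr = SVR(kernel="linear", C=1.0, epsilon=eps).fit(X, y)   # C and epsilon play the roles of C and eps above

n_outside = np.sum(np.abs(y - svr.predict(X)) > eps + 1e-9)   # points strictly outside the eps-tube
print(n_outside, svr.support_.size)   # every such point is a support vector (SVs may also sit on the tube boundary)
```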

15 ADVANCED MACHINE LEARNING Support Vector Regression
We can solve this quadratic problem by introducing sets of Lagrange multipliers $\alpha_i, \alpha_i^*$ and writing the Lagrangian:
Lagrangian = Objective function + multipliers × constraints
$L(w, b, \xi, \xi^*) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{M} \xi_i + C \sum_{i=1}^{M} \xi_i^* - \sum_{i=1}^{M} \alpha_i \left( \varepsilon + \xi_i - y^i + \langle w, x^i \rangle + b \right) - \sum_{i=1}^{M} \alpha_i^* \left( \varepsilon + \xi_i^* + y^i - \langle w, x^i \rangle - b \right)$

16 ADVANCED MACHINE LEARNING Support Vector Regression
$\alpha_i, \alpha_i^* = 0$ for all points that satisfy the constraints (points inside the $\varepsilon$-tube).
$\alpha_i \alpha_i^* = 0$: constraints on points lying on either side of the $\varepsilon$-tube.
$L(w, b, \xi, \xi^*) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{M} \xi_i + C \sum_{i=1}^{M} \xi_i^* - \sum_{i=1}^{M} \alpha_i \left( \varepsilon + \xi_i - y^i + \langle w, x^i \rangle + b \right) - \sum_{i=1}^{M} \alpha_i^* \left( \varepsilon + \xi_i^* + y^i - \langle w, x^i \rangle - b \right)$

17 ADVANCED MACHINE LEARNING Support Vector Regression
Requiring that the partial derivatives are all zero:
$\frac{\partial L}{\partial b} = \sum_{i=1}^{M} (\alpha_i^* - \alpha_i) = 0$ → rebalancing the effect of the support vectors on both sides of the $\varepsilon$-tube
$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{M} (\alpha_i - \alpha_i^*) x^i = 0 \;\Rightarrow\; w = \sum_{i=1}^{M} (\alpha_i - \alpha_i^*) x^i$ → linear combination of support vectors
The solution is given by: $y = f(x) = \langle w, x \rangle + b = \sum_{i=1}^{M} (\alpha_i - \alpha_i^*) \langle x^i, x \rangle + b$
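
The stationarity condition $w = \sum_i (\alpha_i - \alpha_i^*) x^i$ can be checked numerically; the sketch below (scikit-learn assumed, data invented) rebuilds w from the dual coefficients and the support vectors and compares it with the fitted weight vector.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(4)
X = rng.uniform(0, 5, size=(40, 2))
y = X @ np.array([1.5, -0.7]) + 0.3 + 0.05 * rng.standard_normal(40)

svr = SVR(kernel="linear", C=10.0, epsilon=0.05).fit(X, y)

# dual_coef_ stores (alpha_i - alpha_i^*) for the support vectors only
w_from_svs = svr.dual_coef_.ravel() @ svr.support_vectors_
print(w_from_svs)          # w reconstructed as a linear combination of support vectors
print(svr.coef_.ravel())   # the two vectors should agree
```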

18 ADVANCED MACHINE LEARNING Support Vector Regression
Lift x into feature space and then perform linear regression in feature space.
Linear case: $y = f(x) = \langle w, x \rangle + b$
Non-linear case: $x \rightarrow \phi(x)$, $y = f(x) = \langle w, \phi(x) \rangle + b$
w lives in feature space!

19 ADVANCED MACHINE LEARNING Support Vector Regression
In feature space, we obtain the same constrained optimization problem:
minimize $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{M} (\xi_i + \xi_i^*)$
subject to $\langle w, \phi(x^i) \rangle + b - y^i \leq \varepsilon + \xi_i$, $y^i - \langle w, \phi(x^i) \rangle - b \leq \varepsilon + \xi_i^*$, $\xi_i, \xi_i^* \geq 0$

20 ADVANCED MACHINE LEARNING Support Vector Regression
We can solve this quadratic problem by introducing sets of Lagrange multipliers $\alpha_i, \alpha_i^*$ and writing the Lagrangian:
Lagrangian = Objective function + multipliers × constraints
$L(w, b, \xi, \xi^*) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{M} \xi_i + C \sum_{i=1}^{M} \xi_i^* - \sum_{i=1}^{M} \alpha_i \left( \varepsilon + \xi_i - y^i + \langle w, \phi(x^i) \rangle + b \right) - \sum_{i=1}^{M} \alpha_i^* \left( \varepsilon + \xi_i^* + y^i - \langle w, \phi(x^i) \rangle - b \right)$

21 ADVANCED MACHINE LEARNING Support Vector Regression
Replacing in the primal Lagrangian, we get the dual optimization problem:
$\max_{\alpha, \alpha^*} \; -\frac{1}{2} \sum_{i,j=1}^{M} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \, k(x^i, x^j) - \varepsilon \sum_{i=1}^{M} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{M} y^i (\alpha_i - \alpha_i^*)$
subject to $\sum_{i=1}^{M} (\alpha_i - \alpha_i^*) = 0$ and $\alpha_i, \alpha_i^* \in [0, C]$
Kernel trick: $k(x^i, x^j) = \langle \phi(x^i), \phi(x^j) \rangle$

22 ADVANCED MACHINE LEARNING Support Vector Regression
The solution is given by: $y = f(x) = \sum_{i=1}^{M} (\alpha_i - \alpha_i^*) \, k(x^i, x) + b$
Linear coefficients (Lagrange multipliers for each constraint). If one uses the RBF kernel, these are un-normalized isotropic Gaussians centered on each training datapoint.
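
The sketch below (scikit-learn assumed; gamma, C, epsilon and the sine data are illustrative) evaluates $f(x) = \sum_i (\alpha_i - \alpha_i^*) k(x^i, x) + b$ by hand from the fitted dual coefficients, support vectors and intercept, and checks that it matches the library's own prediction, making the "weighted sum of Gaussians centered on the SVs" structure explicit.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, 40)).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(40)

gamma = 10.0   # RBF kernel k(x^i, x) = exp(-gamma * ||x - x^i||^2)
svr = SVR(kernel="rbf", C=100.0, epsilon=0.1, gamma=gamma).fit(X, y)

def f_manual(x):
    # one un-normalized Gaussian per support vector, weighted by (alpha_i - alpha_i^*)
    d2 = ((svr.support_vectors_ - x) ** 2).sum(axis=1)
    return (svr.dual_coef_ @ np.exp(-gamma * d2) + svr.intercept_)[0]

x_test = np.array([2.5])
print(f_manual(x_test), svr.predict(x_test.reshape(1, -1))[0])   # the two values should match
```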

23 ADVANCED MACHINE LEARNING Support Vector Regression
The solution is given by: $y = f(x) = \sum_{i=1}^{M} (\alpha_i - \alpha_i^*) \, k(x^i, x) + b$
[Figure: the kernel places a Gaussian function on each SV]

24 ADVANCED MACHINE LEARNING Support Vector Regression
The solution is given by: $y = f(x) = \sum_{i=1}^{M} (\alpha_i - \alpha_i^*) \, k(x^i, x) + b$
The Lagrange multipliers define the importance of each Gaussian function. f(x) converges to b where the effect of the SVs vanishes.
[Figure: y = f(x) built from Gaussians centered on the support vectors x1, ..., x6, converging to b in-between]

25 ADVANCED MACHINE LEARNING Support Vector Regression: Exercise I
SVR gives the following estimate for each pair of datapoints $(x^j, y^j)$:
$y^j = \sum_{i=1}^{M} \alpha_i \, k(x^i, x^j) + b, \quad j = 1, \dots, M$
For the set of 3 points drawn below,
a) compute an estimate of b using $b = \frac{1}{M} \sum_{j=1}^{M} \left( y^j - \sum_{i=1}^{M} \alpha_i \, k(x^i, x^j) \right)$ with the RBF kernel, and plot b.
b) Plot the regressive curve and show how it varies depending on the kernel width and $\varepsilon$.

26 ADVANCED MACHINE LEARNING Support Vector Regression: Exercise I Solution
a) To answer this question, you must first determine the number of SVs. This depends on epsilon. If we assume a small epsilon, all 3 points become SVs.
b is then influenced by the value of the kernel width. With a very small kernel width, $k(x^i, x^j) \approx 0$ for $x^i \neq x^j$, and then $b \approx \frac{1}{M} \sum_{j=1}^{M} y^j$ is the mean of the data.
As the kernel width grows, b is pulled toward the SVs, modulated by the kernel: $b = \frac{1}{M} \sum_{j=1}^{M} \left( y^j - \sum_{i=1}^{M} \alpha_i \, k(x^i, x^j) \right)$
With only 2 SVs, the influence of the SVs cancels out and we are back to the mean of the data.
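
A small numeric sketch of part (a): the alpha values below are made-up placeholders (in practice they come from solving the dual), and the three points do not reproduce the slide's figure; the point is only to show how the kernel width moves the estimate of b.

```python
import numpy as np

def rbf(xi, xj, width):
    return np.exp(-((xi - xj) ** 2) / (2.0 * width ** 2))

# hypothetical 3-point dataset and Lagrange multipliers (illustrative only)
x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, 1.5, 1.0])
alpha = np.array([0.3, -0.2, 0.4])   # stands in for (alpha_i - alpha_i^*)

for width in (1e-3, 0.5):
    K = rbf(x[:, None], x[None, :], width)     # K[i, j] = k(x^i, x^j)
    b = np.mean(y - K.T @ alpha)               # b = (1/M) sum_j ( y^j - sum_i alpha_i k(x^i, x^j) )
    print(width, b, y.mean())
# for a tiny width K is close to the identity, so b ~ mean(y - alpha);
# as the width grows, each SV pulls b further away through the kernel terms
```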

27 ADVANCED MACHINE LEARNING Support Vector Regression: Exercise I Solution
[Plots: kernel width = 0.001, epsilon = 0.1; kernel width = 0.1, epsilon = 0.1; kernel width = 0.001, epsilon = 0.24]
With a small kernel width, the effect of each SV is well separated and the curve comes back to b in-between two SVs.
With a large kernel width, b changes and the curve yields a smooth interpolation in-between SVs.
With a large epsilon, one point is absorbed in the epsilon tube and is no longer an SV.

28 ADVANCED MACHINE LEARNING Support Vector Regression: Exercise II
Recall the solution to SVR: $y = f(x) = \sum_{i=1}^{M} (\alpha_i - \alpha_i^*) \, k(x^i, x) + b$
a) What type of function f can you model with the homogeneous polynomial kernel?
b) What minimum order of a homogeneous polynomial kernel do you need to achieve a good regression on the set of 3 points below?

29 ADVANCED MACHINE LEARNING Support Vector Regression: Solution Exercise II
The equation for a homogeneous polynomial kernel in the 1-D case below is:
$y = \sum_{i=1}^{M} (\alpha_i - \alpha_i^*) (x^i x)^p + b$ : a single term, a scaled & shifted polynomial.
For the set of points below, we need at minimum p = 2 and 2 SVs.
See also the supplementary exercises posted on the class's website!
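
As a small illustration of parts (a) and (b): with a homogeneous polynomial kernel of degree p in 1-D, f(x) is a single scaled and shifted power of x. The sketch below (scikit-learn assumed; coef0 = 0 makes its polynomial kernel homogeneous, and the three points are invented to lie near a pure quadratic, not the slide's figure) contrasts p = 1 and p = 2.

```python
import numpy as np
from sklearn.svm import SVR

# three points close to y = x^2 + 0.1 (illustrative only)
X = np.array([[-1.0], [0.5], [2.0]])
y = np.array([1.1, 0.35, 4.1])

# coef0=0 gives the homogeneous polynomial kernel k(x, x') = (gamma * <x, x'>)**degree
for degree in (1, 2):
    svr = SVR(kernel="poly", degree=degree, gamma=1.0, coef0=0.0, C=100.0, epsilon=0.01).fit(X, y)
    print(degree, svr.predict(X), y)   # the degree-2 fit can follow the curvature; the degree-1 fit cannot
```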

30 ADVANCED MACHINE LEARNING SVR: Hyperparameters
The solution to SVR we just saw is referred to as $\varepsilon$-SVR. Two hyperparameters:
minimize $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{M} (\xi_i + \xi_i^*)$
subject to $\langle w, x^i \rangle + b - y^i \leq \varepsilon + \xi_i$, $y^i - \langle w, x^i \rangle - b \leq \varepsilon + \xi_i^*$, $\xi_i, \xi_i^* \geq 0$
C controls the penalty term on a poor fit. $\varepsilon$ determines the minimal required precision.

31 ADVANCED MACHINE LEARNING SVR: Effect of Hyperparameters
Effect of the RBF kernel width on the fit. Here fit using C = 100, $\varepsilon$ = 0.1, kernel width = 0.01.

32 ADVANCED MACHINE LEARNING SVR: Effect of Hyperparameters
Effect of the RBF kernel width on the fit. Here fit using C = 100, $\varepsilon$ = 0.01, kernel width = 0.01. Overfitting.

33 ADVANCED MACHINE LEARNING SVR: Effect of Hyperparameters
Effect of the RBF kernel width on the fit. Here fit using C = 100, $\varepsilon$ = 0.05, kernel width = 0.01.
Reduction of the effect of the kernel width on the fit by choosing appropriate hyperparameters.
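
Rather than tuning C, $\varepsilon$ and the kernel width by eye as in the plots above, one would usually cross-validate them; a minimal sketch (scikit-learn assumed, grid values and data purely illustrative):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.uniform(0, 6, 60).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(60)

# grid over the three epsilon-SVR hyperparameters discussed above (gamma plays the role of the kernel width)
grid = {"C": [1, 100], "epsilon": [0.01, 0.05, 0.1], "gamma": [0.1, 1.0, 100.0]}
search = GridSearchCV(SVR(kernel="rbf"), grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```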

34 ADVANCED MACHINE LEARNING SVR: Effect of Hyperparameters
Note: MLDemos does not display the support vectors if there is more than one point for the same x!

35 ADVANCED MACHINE LEARNING SVR: Hyperparameters
The solution to SVR we just saw is referred to as $\varepsilon$-SVR. Two hyperparameters:
minimize $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{M} (\xi_i + \xi_i^*)$
subject to $\langle w, x^i \rangle + b - y^i \leq \varepsilon + \xi_i$, $y^i - \langle w, x^i \rangle - b \leq \varepsilon + \xi_i^*$, $\xi_i, \xi_i^* \geq 0$
C controls the penalty term on a poor fit. $\varepsilon$ determines the minimal required precision.

36 ADVANCED MACHINE LEARNING Extensions of SVR
As in the classification case, the optimization framework used for support vector regression is extended with:
- $\nu$-SVR: yields a sparser version of SVR and relaxes the constraint of choosing $\varepsilon$, the width of the insensitive tube.
- Relevance Vector Regression: the regression version of RVM, which also provides a sparser version of SVR and offers a probabilistic interpretation of the solution. (see Tipping 2011, supplementary material to the class)

37 ADVANCED MACHINE LEARNING Support Vector Regression: $\nu$-SVR
As the number of data points grows, so does the number of support vectors. $\nu$-SVR puts a lower bound on the fraction of support vectors (see the analogous case for $\nu$-SVM), $\nu \in (0, 1]$.

38 ADVANCED MACHINE LEARNING Support Vector Regression: $\nu$-SVR
As for $\nu$-SVM, one can rewrite the problem as a convex optimization problem:
$\min_{w, \xi, \varepsilon} \; \frac{1}{2}\|w\|^2 + C \left( \nu \varepsilon + \frac{1}{M} \sum_{j=1}^{M} (\xi_j + \xi_j^*) \right)$
under the constraints:
$(w^T x^j + b) - y^j \leq \varepsilon + \xi_j$
$y^j - (w^T x^j + b) \leq \varepsilon + \xi_j^*$
$\xi_j \geq 0, \; \xi_j^* \geq 0, \; \varepsilon \geq 0$
The margin error is given by all the data points for which $\xi_j > 0$. $\nu$ is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors.
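
A short sketch of the role of $\nu$ (scikit-learn's NuSVR assumed as one implementation; data and values illustrative): as $\nu$ grows, the fraction of support vectors is forced up, while $\varepsilon$ is adapted automatically.

```python
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(2)
X = rng.uniform(0, 6, 100).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(100)

for nu in (0.05, 0.2, 0.5):
    model = NuSVR(kernel="rbf", C=10.0, nu=nu, gamma=1.0).fit(X, y)
    frac_sv = model.support_.size / len(X)
    print(nu, frac_sv)   # nu is (approximately) a lower bound on the fraction of support vectors
```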

39 ADVANCED MACHINE LEARNING $\nu$-SVR: Example
Effect of the automatic adaptation of $\varepsilon$ using $\nu$-SVR.

40 ADVANCED MACHINE LEARNING $\nu$-SVR: Example
Added noise on the data. Effect of the automatic adaptation of $\varepsilon$ using $\nu$-SVR.

41 ADVANCED MACHINE LEARNING Relevance Vector Regression (RVR)
Same principle as that described for RVM (see slides on SVM and extensions). The derivation of the parameters however differs (see Tipping 2011 for details).
To recall, we start from the solution of SVR: $y = f(x) = \sum_{i=1}^{M} \alpha_i \, k(x^i, x) + b$
Rewrite the solution of SVR as a linear combination over basis functions: $z(x) = [k(x^1, x), \dots, k(x^M, x), 1]^T$
In the (binary) classification case, $y \in [0; 1]$. In the regression case, $y \in \mathbb{R}$.
A sparse solution has a majority of entries with $\alpha_i$ equal to zero.
[Example: a sparse coefficient vector with most entries equal to zero]
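
The full RVR derivation is in Tipping's paper; purely as an illustration of the sparsity mechanism, here is a minimal sparse-Bayesian regression sketch in the RVM spirit (my own simplified re-estimation loop, not the course's or Tipping's reference implementation). Phi would be the design matrix built from z(x), one column per basis function; entries whose precision alpha_m blows up correspond to weights pruned to (near) zero.

```python
import numpy as np

def rvr_fit(Phi, y, n_iter=200, tol=1e-6):
    """Simplified RVM-style regression: y ~ N(Phi w, 1/beta), one precision alpha_m per weight."""
    N, M = Phi.shape
    alpha = np.ones(M)
    beta = 1.0 / (np.var(y) + 1e-12)
    for _ in range(n_iter):
        Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)   # posterior covariance of w
        mu = beta * Sigma @ Phi.T @ y                                # posterior mean of w
        gamma = 1.0 - alpha * np.diag(Sigma)                         # "well-determinedness" of each weight
        alpha_new = np.minimum(gamma / (mu ** 2 + 1e-12), 1e10)      # re-estimate precisions (clipped)
        beta = (N - gamma.sum()) / (np.sum((y - Phi @ mu) ** 2) + 1e-12)
        if np.max(np.abs(alpha_new - alpha)) < tol:
            alpha = alpha_new
            break
        alpha = alpha_new
    return mu, alpha   # large alpha_m  ->  weight m effectively pruned (sparse solution)
```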

42 ADVANCED MACHINE LEARNING Comparison $\varepsilon$-SVR, $\nu$-SVR, RVR
Solution with $\varepsilon$-SVR: RBF kernel, C = 3000, $\varepsilon$ = 0.08, s = 0.05, 37 support vectors.

43 ADVANCED MACHINE LEARNING Comparison $\varepsilon$-SVR, $\nu$-SVR, RVR
Solution with $\nu$-SVR: RBF kernel, C = 3000, $\nu$ = 0.04, s = 0.001, 17 support vectors.

44 ADVANCED MACHINE LEARNING Comparison $\varepsilon$-SVR, $\nu$-SVR, RVR
Solution with RVR: RBF kernel, $\varepsilon$ = 0.08, s = 0.05, 7 support vectors.

45 ADVANCED MACHINE LEARNING SVR: Examples of Applications

46 ADVANCED MACHINE LEARNING Catching Objects in Flight
Extremely fast computation (the object flies in half a second); re-estimation of the arm motion to adapt to noisy visual detection of the object.

47 ADVANCED MACHINE LEARNING Catching Objects in Flight
Learn a model of the translational and rotational motion of the object. Not at the center of mass.

48 ADVANCED MACHINE LEARNING Catching Objects in Flight
Gather demonstrations of the free-flying object. Precision (1 cm, 1 degree); computation 0.17-0.32 seconds ahead of time.
Build a model of the dynamics using Support Vector Regression: $\dot{x} = f(x) = \sum_{i=1}^{M} \alpha_i \, k(x^i, x) + b$
Compute the derivative (closed form). Use the model in an Extended Kalman Filter for real-time tracking.
Kim, S. and Billard, A. (2012) Estimating the non-linear dynamics of free-flying objects. Robotics and Autonomous Systems, Volume 60, Issue 9, pp. 1108-1122.
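
The "compute the derivative (closed form)" step has a simple expression for an RBF-kernel SVR model: since $f(x) = \sum_i (\alpha_i - \alpha_i^*) \exp(-\gamma \|x - x^i\|^2) + b$, its gradient is $\sum_i (\alpha_i - \alpha_i^*)(-2\gamma)(x - x^i)\exp(-\gamma \|x - x^i\|^2)$. Below is a minimal sketch of that gradient (scikit-learn assumed, toy 2-D data invented; this is not the paper's code), the kind of Jacobian one could feed to an Extended Kalman Filter.

```python
import numpy as np
from sklearn.svm import SVR

def svr_rbf_gradient(svr, gamma, x):
    """Closed-form gradient of an RBF-kernel SVR model at point x."""
    diff = x - svr.support_vectors_                 # shape (n_SV, d)
    k = np.exp(-gamma * (diff ** 2).sum(axis=1))    # k(x^i, x) for each support vector
    # d f / d x = sum_i (alpha_i - alpha_i^*) * (-2 * gamma) * (x - x^i) * k(x^i, x)
    return (svr.dual_coef_.ravel() * k) @ (-2.0 * gamma * diff)

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(50, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

gamma = 2.0
svr = SVR(kernel="rbf", C=100.0, epsilon=0.01, gamma=gamma).fit(X, y)
print(svr_rbf_gradient(svr, gamma, np.array([0.2, -0.3])))
```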

49 ADVANCED MACHINE LEARNING
[Plots: SVR with RBF kernel and SVR with polynomial kernel; axes: kernel width vs. C]
Systematic assessment of the sensitivity to the choice of hyperparameters and the choice of kernel.

50 ADVANCED MACHINE LEARNING
Kim, Shukla, Billard, IEEE Transactions on Robotics, 2014. 2015 IEEE Transactions on Robotics King-Sun Fu Memorial Best Paper Award.

51 ADVANCED MACHINE LEARNING
Kim, Shukla, Billard, IEEE Transactions on Robotics, 2014. 2015 IEEE Transactions on Robotics King-Sun Fu Memorial Best Paper Award.

52 ADVANCED MACHINE LEARNING Learning a Multi-Attractor System
Shukla and Billard, NIPS 2012 - http://asvm.epfl.ch

53 ADVANCED MACHINE LEARNING Crossing over

54 ADVANCED MACHINE LEARNING Learning a Multi-Attractor System
Build a partition with a Support Vector Machine (SVM).

55 ADVANCED MACHINE LEARNING Learning a Multi-Attractor System
Build a partition with a Support Vector Machine (SVM). The attractors are not located at the right place.

56 ADVANCED MACHINE LEARNING Learning a Multi-Attractor System
Extend the SVM optimization framework with new constraints:
- Maximize the classification margin
- All points correctly classified
- Follow the dynamics
- Stability at the attractor
Shukla and Billard, NIPS 2012 - http://asvm.epfl.ch

57 ADVANCED MACHINE LEARNING Learning a Multi-Attractor System
[Figure: decomposition of f(x) into the standard SVM $\alpha$-SVs, the new $\beta$-SVs, a non-linear bias and a constant bias]

58 ADVANCED MACHINE LEARNING Several possible grasping points

59 ADVANCED MACHINE LEARNING

60 ADVANCED MACHINE LEARNING Multiple Attractors System

61 ADVANCED MACHINE LEARNING