On non-parametric robust quantile regression by support vector machines


1 On non-parametric robust quantile regression by support vector machines

Andreas Christmann
Joint work with: Ingo Steinwart (Los Alamos National Lab), Arnout Van Messem (Vrije Universiteit Brussel)
ERCIM 2008, Neuchâtel, Switzerland, June 19-21, 2008

2 Example: linear quantile regression

[Figure: LIDAR data set, logratio vs. range, with fitted linear quantile lines for α = 0.95, 0.50, 0.05.]
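
As a minimal sketch (not from the talk), the linear fits above can be reproduced with Koenker's quantreg package; the LIDAR data, with variables range and logratio, is assumed to be the data frame shipped in the R package SemiPar:

    ## Linear quantile regression for alpha = 0.05, 0.50, 0.95 (sketch).
    library(quantreg)   # Koenker's quantile regression package
    library(SemiPar)    # assumed source of the 'lidar' data frame
    data(lidar)

    plot(logratio ~ range, data = lidar, pch = 16, col = "grey")
    for (tau in c(0.05, 0.50, 0.95)) {
      fit <- rq(logratio ~ range, tau = tau, data = lidar)  # Koenker & Bassett
      abline(fit)                                           # add fitted line
    }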

3 Example: SVM for quantile regression

[Figure: LIDAR data set, logratio vs. range, with fitted SVM quantile curves for α = 0.95, 0.50, 0.05.]

4 Nonparametric Quantile Regression

Assumptions:
- $X$ = complete measurable space, e.g. $X = \mathbb{R}^d$; $Y \subseteq \mathbb{R}$ closed
- data $D = D_n = (z_1, \dots, z_n)$, $z_i := (x_i, y_i) \in Z := X \times Y$; empirical measure $\mathrm{D} := \frac{1}{n} \sum_{i=1}^n \delta_{(x_i, y_i)}$
- $(X_i, Y_i)$ i.i.d. $\sim P \in M_1$, $P$ (totally) unknown

Goal: estimate the quantile function
$$f_{\alpha,P}(x) = \inf\{ q \in Y : P(Y \le q \mid X = x) \ge \alpha \}, \qquad x \in X.$$

Assumption: $f_{\alpha,P}$ is unique.
If $f_{\alpha,P}$ is linear: Koenker & Bassett (1978), Koenker (2005).
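
A one-line numerical illustration (my example, not from the talk): if $Y \mid X = x \sim N(x, 1)$, then $f_{\alpha,P}(x) = x + z_\alpha$, where $z_\alpha$ is the standard normal $\alpha$-quantile, which R confirms by Monte Carlo:

    ## Sanity check of the quantile function for Y | X = x ~ N(x, 1).
    alpha <- 0.95; x <- 2
    x + qnorm(alpha)                        # theoretical value: 3.645
    quantile(rnorm(1e5, mean = x), alpha)   # Monte Carlo approximation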

5 Loss Function and Risk

Pinball loss function:
$$L_\alpha(y, t) := \begin{cases} (\alpha - 1)(y - t), & y - t < 0 \\ \alpha (y - t), & y - t \ge 0 \end{cases}$$

[Figure: $L_\alpha$ plotted against $y - t$ for $\alpha = 0.1$ and $\alpha = 0.75$.]

Risk:
$$\mathcal{R}_{L_\alpha,P}(f) := \mathbb{E}_P\, L_\alpha\big(Y, f(X)\big), \qquad P \in M_1.$$
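
The pinball loss is straightforward to code; a minimal sketch of the definition above:

    ## Pinball loss L_alpha(y, t) as defined above.
    pinball <- function(y, t, alpha) {
      r <- y - t                                  # residual
      ifelse(r >= 0, alpha * r, (alpha - 1) * r)  # asymmetric absolute loss
    }

    ## For alpha = 0.75, undershooting costs three times as much as
    ## overshooting, which pushes the minimizer toward the 75% quantile.
    pinball(y = 0, t = -1, alpha = 0.75)  # undershoot: 0.75
    pinball(y = 0, t =  1, alpha = 0.75)  # overshoot:  0.25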

6 Support Vector Machine approach

Schölkopf et al. (2000) and Takeuchi et al. (2006) proposed
$$f_{D,\lambda} = \arg\min_{f \in H} \frac{1}{n} \sum_{i=1}^n L_\alpha\big(y_i, f(x_i)\big) + \lambda \|f\|_H^2,$$
where $H$ is a reproducing kernel Hilbert space (RKHS) and $\lambda > 0$. Its theoretical counterpart is
$$S(P) = f_{P,\lambda} = \arg\min_{f \in H} \mathbb{E}_P\, L_\alpha\big(Y, f(X)\big) + \lambda \|f\|_H^2, \qquad P \in M_1.$$
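
One available implementation of this estimator is kqr() (kernel quantile regression, following Takeuchi et al. 2006) in the R package kernlab. A sketch with hypothetical tuning values, again assuming the lidar data from SemiPar:

    ## Kernel quantile regression (SVM with pinball loss), conditional median.
    library(kernlab)
    library(SemiPar)
    data(lidar)

    fit <- kqr(as.matrix(lidar$range), lidar$logratio,
               tau    = 0.5,                  # quantile level alpha
               C      = 10,                   # regularization ~ 1/lambda (hypothetical)
               kernel = "rbfdot",             # Gaussian RBF kernel
               kpar   = list(sigma = 1e-4))   # bandwidth (hypothetical)

    ## Predicted conditional median curve on a grid.
    grid <- matrix(seq(min(lidar$range), max(lidar$range), length.out = 200))
    yhat <- predict(fit, grid)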

7 Kernel

- $k : X \times X \to K$ is a kernel if there exist a $K$-Hilbert space $H$ and a map $\Phi : X \to H$ such that $k(x, x') = \langle \Phi(x'), \Phi(x) \rangle_H$ for all $x, x' \in X$.
- $H$ is a reproducing kernel Hilbert space (RKHS) if for every $x \in X$ the evaluation functional $\delta_x(f) := f(x)$, $f \in H$, is continuous.
- Reproducing kernel: $k(\cdot, x) \in H$ for all $x \in X$, and $f(x) = \langle f, k(\cdot, x) \rangle_H$ for all $f \in H$, $x \in X$.
- Canonical feature map: $\Phi(x) = k(\cdot, x)$, $x \in X$.
- Bounded kernel: $\|k\|_\infty := \sup_{x \in X} \sqrt{k(x, x)} < \infty$.
- GRBF kernel: $k(x, x') = e^{-\gamma \|x - x'\|_2^2}$, $\gamma > 0$.
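
A quick sketch of the GRBF kernel and its Gram matrix in kernlab (note that kernlab names the parameter $\gamma$ "sigma"):

    ## Gaussian RBF kernel k(x, x') = exp(-sigma * ||x - x'||^2) in kernlab.
    library(kernlab)

    k <- rbfdot(sigma = 0.5)
    k(c(1, 2), c(2, 3))               # single kernel evaluation

    X <- matrix(rnorm(10), ncol = 2)  # five points in R^2
    K <- kernelMatrix(k, X)           # 5 x 5 Gram matrix, K[i,j] = k(x_i, x_j)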

8 Consistency

$f_{D,\lambda_n}$ is risk consistent if
$$\mathcal{R}_{L_\alpha,P}(f_{D,\lambda_n}) \to_P \mathcal{R}^*_{L_\alpha,P} := \inf_{f : X \to \mathbb{R} \text{ measurable}} \mathcal{R}_{L_\alpha,P}(f). \tag{2.1}$$
A key approximation condition is
$$\mathcal{R}^*_{L_\alpha,P,H} := \inf_{f \in H} \mathcal{R}_{L_\alpha,P}(f) = \mathcal{R}^*_{L_\alpha,P}. \tag{2.2}$$

Large RKHS (CHR & Steinwart 08). Let $H$ be the RKHS of a bounded kernel $k : X \times X \to \mathbb{R}$ and $\mu$ be a distribution on $X$. Then the following statements are equivalent:
1. $H$ is dense in $L_1(\mu)$.
2. (2.2) holds for all $P \in M_1$ with $P_X = \mu$ and $\mathbb{E}_P |Y| < \infty$.

9 Bounded L-risk (CHR & Steinwart 08)

Assume: $P \in M_1$ with $\mathbb{E}_P |Y| < \infty$, and $f : X \to \mathbb{R}$ with $f \in L_1(P)$.
Then: $\mathcal{R}_{L_\alpha,P}(f) < \infty$.

10 Existence and uniqueness (CHR & Steinwart 08)

Assume: $P \in M_1$ with $\mathbb{E}_P |Y| < \infty$, $H$ the RKHS of a bounded kernel $k$, $\lambda > 0$. Then:
1. there exists a unique minimizer $S(P) = f_{P,\lambda} \in H$;
2. $\|f_{P,\lambda}\|_H \le \sqrt{\mathcal{R}_{L_\alpha,P}(0)/\lambda}$.

11 Consistency (Steinwart & CHR 08a)

Assume: $H$ is a separable RKHS of a bounded measurable kernel $k$ such that $H$ is dense in $L_1(\mu)$ for all distributions $\mu$ on $X$, and $(\lambda_n)_{n \in \mathbb{N}}$ satisfies $\lambda_n \to 0$.

If $\lambda_n^2 n \to \infty$, then for all $P \in M_1$ with $\mathbb{E}_P |Y| < \infty$:
1. $\mathcal{R}_{L_\alpha,P}(f_{D,\lambda_n}) \to_P \mathcal{R}^*_{L_\alpha,P}$
2. $\|f_{D,\lambda_n} - f_{\alpha,P}\|_{L_0(P_X)} \to_P 0$

If $\delta > 0$ and $\lambda_n^{2+\delta} n \to \infty$, then for all $P \in M_1$ with $\mathbb{E}_P |Y| < \infty$:
3. $\mathcal{R}_{L_\alpha,P}(f_{D,\lambda_n}) \to \mathcal{R}^*_{L_\alpha,P}$ almost surely
4. $\|f_{D,\lambda_n} - f_{\alpha,P}\|_{L_0(P_X)} \to 0$ almost surely

(For example, $\lambda_n = n^{-1/3}$ gives $\lambda_n^2 n = n^{1/3} \to \infty$, and also $\lambda_n^{2+\delta} n \to \infty$ for every $0 < \delta < 1$.)

12 Rate of convergence

1. No-free-lunch theorem: there is no uniform rate of convergence! [Devroye 82]
2. Under many additional assumptions we have [Steinwart & CHR 08b]
$$\|f_{D,\lambda_n} - f_{\alpha,P}\|_{L_\infty(P_X)} \le c\, n^{-1/3}.$$

13 Robustness

What is the impact on $S(P)$ or $S(P_n)$ of violations of the assumption "$(X_i, Y_i)$ i.i.d. $\sim P$, $P \in M_1$ unknown"?

14 Bias, maxbias, sensitivity curve (CHR & Steinwart 07)

Assume: $\mathbb{E}_P |Y| < \infty$ and $\mathbb{E}_{\tilde{P}} |Y| < \infty$ (can be weakened), $H$ the RKHS of a continuous and bounded kernel $k$, $\lambda > 0$, $\varepsilon > 0$. Then, with $c := \lambda^{-1} \|k\|_\infty \max\{\alpha, 1-\alpha\}$:
1. Bias: $\|f_{(1-\varepsilon)P + \varepsilon \tilde{P},\,\lambda} - f_{P,\lambda}\|_H \le c\, \|\tilde{P} - P\|_{tv}\, \varepsilon$
2. Maxbias: $\sup_{Q \in N_\varepsilon(P)} \|f_{Q,\lambda} - f_{P,\lambda}\|_H \le 2 c \varepsilon$
3. Sensitivity curve: $\|SC_n(z; S_n)\|_H \le 2c$ for all $z \in X \times Y$
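
As a rough numerical illustration (my sketch, with hypothetical tuning values): add one gross outlier to the data, refit, and compare the two kernel quantile fits — the pinball loss keeps the change bounded.

    ## Empirical sensitivity of the kernel quantile fit to one outlier.
    library(kernlab); library(SemiPar)
    data(lidar)
    x <- as.matrix(lidar$range); y <- lidar$logratio; n <- length(y)

    f0 <- kqr(x, y, tau = 0.5, C = 10, kernel = "rbfdot",
              kpar = list(sigma = 1e-4))
    f1 <- kqr(rbind(x, 550), c(y, 100),   # one extreme outlier (y = 100)
              tau = 0.5, C = 10, kernel = "rbfdot",
              kpar = list(sigma = 1e-4))

    grid <- matrix(seq(min(x), max(x), length.out = 200))
    sc <- (n + 1) * (predict(f1, grid) - predict(f0, grid))  # sensitivity curve
    max(abs(sc))   # stays bounded, as the theorem above guarantees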

15/16 Bouligand Influence Function

Goal: bounded BIF.

$g : X \to Z$ is Bouligand-differentiable at $x_0 \in X$ if a positively homogeneous function $\nabla^B g(x_0) : X \to Z$ exists with
$$\lim_{h \to 0} \frac{\|g(x_0 + h) - g(x_0) - \nabla^B g(x_0)(h)\|_Z}{\|h\|_X} = 0.$$

Def. (CHR & Van Messem 08). The Bouligand influence function (BIF) of a map $S : M_1 \to H$ at a distribution $P$ in the direction of a distribution $Q \ne P$ is the special B-derivative (if it exists) satisfying
$$\lim_{\varepsilon \downarrow 0} \left\| \frac{S\big((1-\varepsilon)P + \varepsilon Q\big) - S(P)}{\varepsilon} - \mathrm{BIF}(Q; S, P) \right\|_H = 0.$$


17 Bounded BIF (CHR & Van Messem 08)

Assume: $k$ bounded and measurable; $\mathbb{E}_P |Y| < \infty$, $\mathbb{E}_Q |Y| < \infty$ (can be weakened); and there exist $\delta > 0$ and positive constants $\xi_P$, $\xi_Q$, $c_P$, $c_Q$ such that for all $t \in \mathbb{R}$ with $|t - f_{P,\lambda}(x)| \le \delta \|k\|_\infty$, the following inequalities hold for all $a \in [0, 2\delta\|k\|_\infty]$ and all $x \in X$:
$$P\big(Y \in [t, t+a] \mid x\big) \le c_P\, a^{1+\xi_P}, \qquad Q\big(Y \in [t, t+a] \mid x\big) \le c_Q\, a^{1+\xi_Q}.$$

Then $\mathrm{BIF}(Q; S, P)$, with $S(P) := f_{P,\lambda}$:
1. exists;
2. equals
$$\frac{1}{2\lambda} \int_X \big( P(Y \le f_{P,\lambda}(x) \mid x) - \alpha \big)\, \Phi(x)\, dP_X(x) \;-\; \frac{1}{2\lambda} \int_X \big( Q(Y \le f_{P,\lambda}(x) \mid x) - \alpha \big)\, \Phi(x)\, dQ_X(x);$$
3. is bounded.

18 Conclusions

Non-parametric quantile regression by SVMs:
1. has a unique solution;
2. is consistent: the $L_\alpha$-risk of $f_{D,\lambda_n}$ converges to the Bayes risk (in probability), $f_{D,\lambda_n}$ converges to the true quantile function (in probability or almost surely), and a rate of convergence is available;
3. is robust if the kernel is bounded: the Bouligand influence function, sensitivity curve, and maxbias are all bounded;
4. is computable for large, high-dimensional data sets.

19 References

- CHR & Steinwart (2007). Bernoulli 13(3), 799-819.
- CHR & Steinwart (2008). Appl. Stochastic Models Bus. Ind.
- CHR & Van Messem (2008). J. Mach. Learn. Res.
- Koenker (2005). Quantile Regression. Cambridge University Press.
- Koenker & Bassett (1978). Econometrica.
- Schölkopf & Smola (2002). Learning with Kernels. MIT Press.
- Steinwart & CHR (2008). Support Vector Machines. Springer.
- Steinwart & CHR (2008b). Advances in Neural Information Processing Systems 20.
- Takeuchi, Le, Sears, Smola (2006). J. Mach. Learn. Res.

20 Appendix: Example — nonparametric QR via quantile smoothing splines

[Figure: LIDAR data set, logratio vs. range, quantiles α = 0.95, 0.50, 0.05.]
[R package quantreg: rqss(logratio ~ qss(range, constraint = "N", lambda = 25), tau = 0.5)]
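
A runnable version of that call for the three quantile levels (a sketch; lambda = 25 as on the slide, the lidar data again assumed to come from SemiPar):

    ## Quantile smoothing splines on the LIDAR data.
    library(quantreg)
    library(SemiPar)
    data(lidar)

    plot(logratio ~ range, data = lidar, pch = 16, col = "grey")
    grid <- data.frame(range = seq(min(lidar$range), max(lidar$range),
                                   length.out = 200))
    for (tau in c(0.05, 0.50, 0.95)) {
      fit <- rqss(logratio ~ qss(range, constraint = "N", lambda = 25),
                  tau = tau, data = lidar)
      lines(grid$range, predict(fit, newdata = grid))  # fitted quantile curve
    }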
